Monday, June 3, 2013

From ray-tracing to smashing galaxies, a memory management issue - Part I

Ray-tracing vs Rasterization 

Rasterization: The term refers to the popular rendering algorithm for displaying three-dimensional shapes on a computer. Rasterization is currently the most popular technique for producing real-time 3D computer graphics. Real-time applications need to respond immediately to user input, and generally need to produce frame rates of at least 24 frames per second to achieve smooth animation. This is used in Games, CAD, medical imaging, etc. Any application that needs interactive rendering of complex 3D scenes.

 

Ray-tracing: Ray-tracing is a technique for generating an image by tracing the path of light through pixels in an image plane and simulating the effects of its encounters with virtual objects. It is meant to produce much better quality images than rasterization but it’s also much slower because it’s much more CPU consuming.

Appetite for high quality visualization


Why do we need real-time and interactive visualization?
I don’t know if you know about that story but there was an experiment done by, I think, the American intelligence. They were asked to write an application that could quickly identify buildings on satellite pictures. So they started working on complex algorithms involving machine learning, neural networks, image processing and anything that can help identifying the shape of a building on a 2D image. But then they realized that, with brain analysis and electrodes, using humans was a much simpler and cheaper way to achieve what they wanted. So they hired people for a few dollars an hours, and as they were showing them the photos, some parts of their brain were activated when they could identify a building. And this was done in a matter of milliseconds. The human brain is just good at that and has a very optimized recognition algorithm. All this to say that our unconscious is very good at picking things up and can understand more when facing realistic images.

Also, our brain understands dynamic scenes better than static ones, and because the quality of what we see directly influences our comprehension of complex events, it has to be as high as possible. 


Where is the challenge?

With Rasterization, every object is independent from each other. It’s very suitable for embarrassingly parallel algorithms and has a linear and predictable processing time,  depending on the number of objects to process.
With Ray-tracing, every object potentially interacts with all of the others. So what you get is an exponential processing time, as the number of objects increases.
Solving this problem requires a specific management of the objects to be rendered, and this is what I am going to show you now.


Six degrees of separation


One day, a friend told me about the theory of “Six degrees of separation”.  Do you know about this theory?
Six degrees of separation is the theory that everyone and everything is six or fewer steps away, by way of introduction, from any other person in the world, so that a chain of "a friend of a friend" statements can be made to connect any two people in a maximum of six steps
I did not take this sentence very seriously at the time but for some reason, it got stuck in my head. And a few years later, as I was encountering performance problems on one of my project, that theory came back to me as an evidence.




If we are 7 billion people on this planet, and if we never are more than 6 steps away from each other, why would it be different in the memory of my computer? If I have 7 billion objects to process in my rendering algorithm, it should not take more than 6 cycles to access the object that I need to work on…
Ok, that’s the theory. How do we do that in practice? And this is of course depending on the parameters that you can pass to this clever equation, as shown on the slide.


Bounding boxes

In order to apply this algorithm, the first thing to do is to group your objects, so that the group can be considered rather than the objects that it contains. In this example, we talk about “Bounding boxes”.
A bounding box is basically a 3D space that contains a number of objects.





Ray-tracing is all about intersections. In the previous slides, I told you that when you launch a ray, you need to find the intersection with the closest object. If you have 10’000 objects, then you need to compute the intersection between the ray and the object itself, for all of the objects,  and then keep the closest intersection. By grouping objects in boxes, you can start with computing intersection between the ray and the boxes, and if there is an intersection with a box, only then you will look into that box and compute the intersections with the objects it contains.


Fine tuning

A box intersection takes 17 cycles, using the robust and efficient algorithm  provided by the University of Utah ( http://people.csail.mit.edu/amy/papers/box-jgt.pdf )
A object intersection is a lot more expensive and can cost several hundred cycles. So then it’s a trade-off to identify between the number of boxes, and the number of objects.



OO vs Algorithm Specific Structure

Object Oriented Design is good, because it uses concepts that we, humans, are comfortable with.  A car object is a set of wheels, tyres, seats, and it as a color, etc. You can define inheritance that prevents from replicating the code.
With OO, you can write readable and maintainable code, which is essential. But OO is not optimal for performance because it fragments memory and forces one to consider an object systematically as a whole.
If you take the example of a person with three attributes: a first name, a second name, and an age. Now imagine that you have a collection of persons and you want to compute the average age of that population. With OO, you will not only take the age attribute down to the processor but you will also take the first and the second name down to the CPU, even if you do not need them in your algorithm. Only because OO forces you do so, by design.




On the other hand, if you have an algorithm specific data structure, you will separate you “person” object in two parts. One for the name and family name,  the other one for the age. And only the data structure containing the age will be taken down to the processor. This will dramatically increase the performance.
In the case of the ray-tracer, we do exactly the same.  A primitive such as a sphere for example, is composed of a number of attributes such as the center, the radius, the material properties, texture coordinates, etc. But when processing intersection on the first pass of the algorithm, only the center and the radius are needed, and this can be represented in the form of a float4.
For the boxes, only the two corners of the box are needed, meaning two float3.


Order in RAM

Now that we have defined our data structures, this needs to be optimized them for computers.
In our case, we have boxes and we need to compute the ray-box intersection for each of them. So let’s make sure that boxes are aligned in memory.




As we go through our boxes, if an intersection is found, we need to get deeper into the box to make sure that there is an intersection with a primitive. Meaning that ordering primitives also is essential. This is what is being done at a second level. And as explained in the previous slide, only me minimal information is taken down to the processor.