Sunday, August 24, 2014

The beginner's guide to Ray Tracing

So, I was asked recently on Twitter how Ray Tracing works, but it's not something that you can easily explain in 140 characters, so I thought I'd do a small post to explain it better.

So... the basic idea in rendering a scene in graphics is to have a near and a far plane. The corners of these two planes define a "box" in 3D space, and that's what you're going to render - everything in the box. Orthographic projections use a straight "box", where the far plane is the same size as the near plane, whereas perspective uses a far plane that is bigger than the near one - meaning more appears in the distance than closer to you (as you would expect).

Doing Ray Tracing is no different. First create a box aligned with the Z axis, so the planes lie flat in X and Y, each at a fixed Z. Something like... (values not real/worked out!)

Near plane  -100,-100,0  (top left)  100,100,0 (bottom right)
Far plane     -500,-500,200  (top left)  500,500,200 (bottom right)

This defines the perspective view we're going to render.

Now, how you "step" across the near plane determines your resolution. So, let's say we're going to render a 320x200 view, then we'd step 320 times across the X axis, while we step 200 times down the Y.
This means we have 320 steps from -100,-100,0 to 100,-100,0 (a step of 200/320 = 0.625 on X). And you then do the same on Y (-100 to 100 in 200 steps, giving a DeltaY of 1.0). With me so far?

So, this image shows the basics of what I'm talking about. The small rectangle at the front is the near plane, and the large one at the back is the far plane. Everything inside the white box in 3D is what we consider as part of our view. If we wanted to make a small 10x10 ray traced image, then we'd step along the top 10 times - as shown by the red lines. So we have the 8 steps and the edges, giving us 10 rays to cast.

So let's go back to our 320x200 image, and consider some actual code for this. (This code isn't tested, it's just pseudo code.)

float fx, fy, fdx, fdy;
float nz = 0.0f;                 // near plane Z
float fz = 200.0f;               // far plane Z
fy  = -500.0f;                   // top of the far plane
fdx = 1000.0f / 320.0f;          // far plane X step (3.125)
fdy = 1000.0f / 200.0f;          // far plane Y step (5.0)
for (float ny = -100.0f; ny < 100.0f; ny += 1.0f, fy += fdy) {
    fx = -500.0f;                // left of the far plane
    for (float nx = -100.0f; nx < 100.0f; nx += 0.625f, fx += fdx) {
        vec4 colour = CastRay(nx, ny, nz, fx, fy, fz);   // raytrace a pixel.
    }
}
NOTE: Blogspot's editor nuked the original listing, so the loop above is pieced back together from what survived.

As you can see, it's a pretty simple loop really. This gets a little more complicated once the view is rotated, as each X,Y,Z has its own delta, but it's still much the same, with an outer loop of X and Y dictating the final resolution. In fact, at that point the outer X and Y loops are usually just from 0 to 320, and 0 to 200 (or whatever the resolution is).
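As a rough sketch of that "0 to 320, 0 to 200" style of loop, the function below turns a pixel index into the two end points of a ray. PixelToRay, Ray and Vec3 are made-up names for illustration, and the plane sizes are just the example values from earlier in this post.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };
struct Ray  { Vec3 from, to; };   // hypothetical: start on the near plane, end on the far plane

// Map pixel (px, py) of a width x height image to a ray through the view box.
Ray PixelToRay(int px, int py, int width, int height) {
    assert(width > 0 && height > 0);
    const float nearHalf = 100.0f, farHalf = 500.0f;   // example plane half-sizes
    const float nearZ = 0.0f, farZ = 200.0f;
    // Fraction (0..1) across and down the image; the same fraction is used on both planes.
    const float u = static_cast<float>(px) / static_cast<float>(width);
    const float v = static_cast<float>(py) / static_cast<float>(height);
    const Vec3 from = { -nearHalf + u * 2.0f * nearHalf, -nearHalf + v * 2.0f * nearHalf, nearZ };
    const Vec3 to   = { -farHalf  + u * 2.0f * farHalf,  -farHalf  + v * 2.0f * farHalf,  farZ  };
    return { from, to };
}
```

Pixel 0,0 gives the top left corners of both planes, and moving one pixel right shifts the near point by the same 0.625 step as before.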

Assuming we have a background colour, then CastRay() would currently return only the background colour, as we have nothing in the scene! And when you get right down to it, this is the core of what ray tracing is. It's not that complicated to define a view, and start to render it. The tricky stuff comes when you start trying to "hit" things with your rays.

Let's assume we have a single sphere in this scene. The CastRay() function will then check for a collision between the line from nx,ny,nz to fx,fy,fz and our sphere (which is a position and radius). This is basic maths stuff now - Ray to Sphere collision. There's a reasonable example of ray to sphere HERE.
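For reference, here is one common way to write that test: plug the line into the sphere equation and solve the resulting quadratic. This is only a sketch, not the code from the linked example, and RaySphere, Vec3 and the helper names are invented for this post.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3  Sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Does the line from 'from' to 'to' hit the sphere?
// On a hit, t is how far along the line the first hit is (0 = from, 1 = to).
bool RaySphere(Vec3 from, Vec3 to, Vec3 centre, float radius, float& t) {
    const Vec3 d = Sub(to, from);        // line direction (not normalised)
    const Vec3 m = Sub(from, centre);    // sphere centre to line start
    const float a = Dot(d, d);
    const float b = 2.0f * Dot(m, d);
    const float c = Dot(m, m) - radius * radius;
    if (a == 0.0f) return false;         // zero length line
    const float disc = b * b - 4.0f * a * c;
    if (disc < 0.0f) return false;       // no real roots: the line misses
    const float root = std::sqrt(disc);
    const float t0 = (-b - root) / (2.0f * a);   // entry point
    const float t1 = (-b + root) / (2.0f * a);   // exit point
    t = (t0 >= 0.0f) ? t0 : t1;          // if we start inside the sphere, use the exit
    return t >= 0.0f && t <= 1.0f;
}
```

The hit position is then just from + (to - from) * t, which is the "point" of collision used for reflections, texturing and lighting later on.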

So once you have that code in place, and you decide you've hit your sphere, you'll return the colour of your sphere. Pretty easy so far. So what if we add another sphere into the scene? Well, then the ray cast will now simply loop through everything in the scene no matter what we add; spheres, planes, triangles, whatever. 

Now we have a scene with lots of things being checked and hit, and we return a single colour for the CLOSEST collision point (giving us the nearest thing we've hit). This is basically our Z-Buffer! It means things can intersect without us having to do anything else. The closest distance down the ray to the intersection point is the nearest thing to us, so that's the one we see first.
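A minimal sketch of that "closest hit wins" loop, assuming a scene made only of spheres and colours stored as plain ints. CastRay here is a simplified stand-in with a different signature from the loop earlier, and HitSphere is a hypothetical helper.

```cpp
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };
struct Sphere { Vec3 centre; float radius; int colour; };

static Vec3  Sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Fraction along from->to of the first hit on the sphere, or -1 on a miss.
static float HitSphere(Vec3 from, Vec3 to, const Sphere& s) {
    const Vec3 d = Sub(to, from), m = Sub(from, s.centre);
    const float a = Dot(d, d), b = 2.0f * Dot(m, d);
    const float c = Dot(m, m) - s.radius * s.radius;
    const float disc = b * b - 4.0f * a * c;
    if (a == 0.0f || disc < 0.0f) return -1.0f;
    const float t = (-b - std::sqrt(disc)) / (2.0f * a);
    return (t >= 0.0f && t <= 1.0f) ? t : -1.0f;
}

// Test the ray against everything and keep the colour of the nearest hit.
int CastRay(Vec3 from, Vec3 to, const std::vector<Sphere>& scene, int background) {
    float closest = 2.0f;          // valid hits are 0..1, so 2 means "nothing yet"
    int colour = background;
    for (const Sphere& s : scene) {
        const float t = HitSphere(from, to, s);
        if (t >= 0.0f && t < closest) {   // nearer than anything so far
            closest = t;
            colour = s.colour;
        }
    }
    return colour;
}
```

The order of objects in the scene doesn't matter, which is exactly the Z-Buffer behaviour described above.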

The next simplest thing we might try is reflections. In realtime graphics, this is pretty hard, but in Ray Tracing, it's really not. Let's say we have 2 spheres, and the ray we cast hits the 1st one. All we now need to do is calculate the "bounce", that is the new vector we get from the incoming ray and how it hits and bounces off the sphere. Once we have this, then all we do is call cast ray again, and it'll fly off and either hit nothing - or something. If it hits something, we return the colour and this is then merged into the initial collision and returned. Now, how you merge the colour is basically how shiny the thing is! So if it's VERY shiny, then a simple LERP with a 1.0 on the reflected colour would give a perfect reflection, whereas if it's 0.5, then you would get a mix of the first and second spheres, and if it's 1.0 on the original sphere, then you would only get the original sphere colour (and would probably never do the bounce in the 1st place!)
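The two bits of maths in that paragraph are small enough to sketch. Reflect uses the standard mirror formula r = d - 2(d.n)n, where n is the unit surface normal (for a sphere, the hit point minus the centre, divided by the radius). Lerp does the shininess blend. Both names, and Vec3, are made up for this example.

```cpp
struct Vec3 { float x, y, z; };

static float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// The "bounce": mirror the incoming direction d about the unit normal n.
Vec3 Reflect(Vec3 d, Vec3 n) {
    const float k = 2.0f * Dot(d, n);
    return { d.x - k * n.x, d.y - k * n.y, d.z - k * n.z };
}

// Blend the surface colour with the reflected colour.
// shininess 0 = only the surface colour, 1 = a perfect mirror.
Vec3 Lerp(Vec3 surface, Vec3 reflected, float shininess) {
    return { surface.x + (reflected.x - surface.x) * shininess,
             surface.y + (reflected.y - surface.y) * shininess,
             surface.z + (reflected.z - surface.z) * shininess };
}
```

So a ray travelling straight down Z that hits a surface facing back at it gets sent straight back, and a red sphere reflecting a blue one at 0.5 comes out purple.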

So the number of "bounces" you allow is basically the level of recursion you do for reflections. If you only want to "see" one thing in the reflection, then one bounce it is. Do you want to see a shiny object, in another shiny object? Then up the recursion count! This is one of the places Ray Tracing starts to hurt, as you can increase this number massively and doing so will also increase the number of rays being cast, and the amount of CPU time required.

We should now have a shiny scene with a few spheres all reflecting off each other. What about Texture Mapping? Well, this is also pretty simple. When a collision occurs, it'll also give you a "point" of collision, and now all you need to do is project that into the texture space and then return the pixel colour at that point. Hey presto - texturing. This is very simple on, say, a plane, but gets harder as you start to do triangles, or spheres - or displacement mapping, or Bezier patches!
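Here's a sketch of the simple plane case, assuming a flat floor in the XZ plane with the texture tiled across it. Texture, SamplePlane and tileSize are invented for this example, and a real texture would hold proper colours rather than bare numbers.

```cpp
#include <cmath>
#include <vector>

struct Texture {
    int width, height;
    std::vector<unsigned> pixels;   // row-major, width * height entries
};

// Project a hit point on the floor into texture space and fetch the texel.
// One copy of the texture covers tileSize x tileSize world units.
unsigned SamplePlane(const Texture& tex, float hitX, float hitZ, float tileSize) {
    float u = hitX / tileSize;
    float v = hitZ / tileSize;
    u -= std::floor(u);             // wrap into 0..1 so the texture repeats
    v -= std::floor(v);
    int tx = static_cast<int>(u * static_cast<float>(tex.width));
    int ty = static_cast<int>(v * static_cast<float>(tex.height));
    if (tx >= tex.width)  tx = tex.width - 1;    // guard against rounding up to the edge
    if (ty >= tex.height) ty = tex.height - 1;
    return tex.pixels[static_cast<std::size_t>(ty) * tex.width + tx];
}
```

Spheres and triangles follow the same idea, they just need a fancier way of getting from the hit point to u and v.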

For things like glass and water, you follow the same rule as reflection, except the "bounce" is a refract calculation. So instead of coming off, the ray bends a little as it carries on through the primitive you've hit. Aside from that, it's pretty much the same.
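The refract calculation is usually done with Snell's law. The sketch below uses the same form as the GLSL refract() built-in. It assumes d and n are unit length, that n points back towards where the ray came from, and that eta is the ratio of the two refractive indices (about 1.0/1.5 going from air into glass). Refract and Vec3 are illustrative names.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Bend direction d as it crosses a surface with normal n.
// Returns false when there is no refracted ray (total internal reflection).
bool Refract(Vec3 d, Vec3 n, float eta, Vec3& out) {
    const float cosi = -Dot(d, n);                              // cosine of the incoming angle
    const float k = 1.0f - eta * eta * (1.0f - cosi * cosi);
    if (k < 0.0f) return false;                                 // the ray can't get through
    const float s = eta * cosi - std::sqrt(k);
    out = { eta * d.x + s * n.x, eta * d.y + s * n.y, eta * d.z + s * n.z };
    return true;
}
```

When this returns false you'd normally fall back to the reflection bounce instead.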

Now... lighting. Again, in realtime graphics, this is tricky, but in Ray Tracing it's again pretty easy. So let's say you've just hit something with your first ray. All you now do is take this point, and create a new ray to your light source, and then cast again. If this new ray hits anything before it reaches the light, it means there's something in the way, and your pixel is in shadow. You'll do this for all light sources you want to consider, and merge the shadows together, blending with the surface pixel you hit in the first place, making it darker. And that's all lighting is (at a basic level). Area lights, spot lights and the rest get more interesting, but the basics are the same. If you can't see the light, you're in shadow.
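A sketch of that shadow test for a scene of spheres and a single point light. InShadow and Blocks are hypothetical names. The small epsilon is an assumption of mine, there so that the surface you've just hit doesn't count as blocking itself.

```cpp
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };
struct Sphere { Vec3 centre; float radius; };

static Vec3  Sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Does the sphere sit between the surface point and the light?
static bool Blocks(Vec3 point, Vec3 light, const Sphere& s) {
    const float kEpsilon = 1e-4f;   // ignore hits right at the start point
    const Vec3 d = Sub(light, point), m = Sub(point, s.centre);
    const float a = Dot(d, d), b = 2.0f * Dot(m, d);
    const float c = Dot(m, m) - s.radius * s.radius;
    const float disc = b * b - 4.0f * a * c;
    if (a == 0.0f || disc < 0.0f) return false;
    const float root = std::sqrt(disc);
    const float t0 = (-b - root) / (2.0f * a);
    const float t1 = (-b + root) / (2.0f * a);
    // Only hits between the point (t = 0) and the light (t = 1) cast a shadow.
    return (t0 > kEpsilon && t0 < 1.0f) || (t1 > kEpsilon && t1 < 1.0f);
}

// Cast a ray from the hit point to the light; anything in the way means shadow.
bool InShadow(Vec3 point, Vec3 light, const std::vector<Sphere>& scene) {
    for (const Sphere& s : scene)
        if (Blocks(point, light, s)) return true;
    return false;
}
```

Note that something sitting behind the light doesn't count, which is why the test stops at t = 1.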

So how do you do all this in realtime? Well, now it gets fun! The simple view is to test with everything! And this will work just fine, but it'll be slow. The 1st optimisation is to only test with things in your view frustum. This will exclude a massive amount - and even a simple test to make sure something is even close to it will help. After this, it gets tricky. A BSP style approach will help, as you then only test the regions that the ray itself intersects. This again will exclude lots, and help speed things up hugely.
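This isn't a BSP tree, but the building block for most of these schemes is the same: a cheap "does the ray even go through this region?" test. Below is a sketch of the usual slab test against an axis-aligned box. Box and SegmentHitsBox are made-up names. If the ray misses the box, you can skip everything inside it.

```cpp
#include <algorithm>
#include <utility>

struct Vec3 { float x, y, z; };
struct Box  { Vec3 min, max; };   // axis-aligned region of the scene

// Slab test: does the line from 'from' to 'to' pass through the box?
bool SegmentHitsBox(Vec3 from, Vec3 to, const Box& box) {
    const float o[3]  = { from.x, from.y, from.z };
    const float d[3]  = { to.x - from.x, to.y - from.y, to.z - from.z };
    const float lo[3] = { box.min.x, box.min.y, box.min.z };
    const float hi[3] = { box.max.x, box.max.y, box.max.z };
    float tmin = 0.0f, tmax = 1.0f;            // the part of the line still inside every slab
    for (int i = 0; i < 3; ++i) {
        if (d[i] == 0.0f) {
            // Parallel to this pair of faces: must already be between them.
            if (o[i] < lo[i] || o[i] > hi[i]) return false;
            continue;
        }
        float t0 = (lo[i] - o[i]) / d[i];
        float t1 = (hi[i] - o[i]) / d[i];
        if (t0 > t1) std::swap(t0, t1);
        tmin = std::max(tmin, t0);
        tmax = std::min(tmax, t1);
        if (tmin > tmax) return false;         // the slabs don't overlap along the line
    }
    return true;
}
```

A tree of these boxes (or BSP splits) lets you throw away whole chunks of the scene with a handful of tests per ray.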

(no idea whose image this is BTW)

After this... well, this is why we have large graphics conferences like SIGGRAPH. Professionals from the likes of Pixar and other studios and universities spend long hours trying to make this stuff better and faster, so reading up on ideas from these people would be a good idea.

Lastly... there is also some serious benefit to threading here. Parallelisation is a big win in Ray Tracing as each pixel you process is (usually) totally self-contained, meaning the more CPU cores you have, the faster you'll go! Or, if you're really adventurous you can stick it all on the GPU and benefit from some serious parallelisation!!
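As a rough idea of how little it takes on the CPU, here's a sketch using std::thread where each thread renders its own set of rows. RenderParallel is a made-up name, and TracePixel is a placeholder standing in for the real per-pixel ray cast.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Placeholder for the real ray cast: just encodes the pixel position.
static int TracePixel(int x, int y) { return y * 1000 + x; }

// Render a width x height image using threadCount worker threads.
std::vector<int> RenderParallel(int width, int height, unsigned threadCount) {
    std::vector<int> image(static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    if (threadCount == 0) threadCount = 1;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&image, width, height, threadCount, t] {
            // Thread t takes rows t, t + threadCount, t + 2 * threadCount, ...
            // so no two threads ever write to the same pixel and no locking is needed.
            for (int y = static_cast<int>(t); y < height; y += static_cast<int>(threadCount))
                for (int x = 0; x < width; ++x)
                    image[static_cast<std::size_t>(y) * width + x] = TracePixel(x, y);
        });
    }
    for (std::thread& w : workers) w.join();   // wait for every row to finish
    return image;
}
```

Passing std::thread::hardware_concurrency() as the thread count is a reasonable starting point. Interleaving rows rather than giving each thread one big block tends to spread the expensive parts of the image around a bit more evenly.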