Thursday 3 February 2011

Last words on my simple ray tracer

I have finally decided to publish some results from my simple ray tracer project. It's the kind of open-ended project that could lead to very interesting developments once you start digging into more complicated matters such as triangle meshes and acceleration structures. However, I have decided to put it on hold for the time being, as I have reached the initial goals for this project:
  • Improve my knowledge of C# and .NET
  • Learn the basics of ray tracing techniques
  • Push the basic algorithms to their best in C#
  • Make use of all the available cores on a machine to speed up the ray tracing process


I haven't yet set up a website where people can download my projects, so in the meantime, if you are interested in more details or want to take a look at the code, please send me an email and I'll get back to you as soon as possible.
So, as I said, one of the main goals was to maximize the use of the available cores during ray tracing. Even though the ray tracer is not very complex, I put a good deal of effort into making the prototype application very interactive, so that I could quickly test the implementation on different scenes, with different parameters, and above all with different numbers of cores performing the computation. The following graph summarizes the results:


I ran my tests on a machine with an Intel i7-930 processor. It has 4 physical cores and, thanks to Intel Hyper-Threading technology, exposes 8 logical cores in total. The x-axis represents the number of worker threads, and the y-axis the average time to ray trace the test scene, in milliseconds. Each average was computed over a set of 10 executions.
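For illustration, averaging over a set of executions could be sketched as follows. This is not the code from the project: `Benchmark`, `AverageMs` and the sleeping placeholder workload are hypothetical names standing in for whatever actually renders the scene.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class Benchmark
{
    // Runs `render` the given number of times and returns the mean wall-clock time in milliseconds.
    public static double AverageMs(Action render, int runs)
    {
        if (runs <= 0) throw new ArgumentOutOfRangeException("runs");

        double totalMs = 0.0;
        for (int i = 0; i < runs; i++)
        {
            Stopwatch watch = Stopwatch.StartNew();
            render();
            watch.Stop();
            totalMs += watch.Elapsed.TotalMilliseconds;
        }
        return totalMs / runs;
    }

    public static void Main()
    {
        // Placeholder workload (an assumption): a real test would render the scene here.
        double average = AverageMs(() => Thread.Sleep(5), 10);
        Console.WriteLine("average over 10 runs: {0:F1} ms", average);
    }
}
```

Timing each execution separately, rather than the whole loop, also makes it easy to record the minimum or the spread later if the averages turn out to be noisy.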
Not surprisingly, since ray tracing is an embarrassingly parallel problem, increasing the number of worker threads improves performance significantly. When the number of threads is '0', all the work is performed on the main application thread and the ray tracing process is completely serial. The best result is obtained with 4 worker threads, and it is quite evident that adding worker threads beyond the number of physical cores brings little benefit.
I'm not completely satisfied with these results, because I know they could be better. The 'pure' serial cost of presenting the image on a bitmap is around 2ms, pretty much negligible. So, given a cost of around 330ms with no worker threads, and assuming the work is split evenly between the workers and the main thread, I would expect to see around 165ms, 110ms and 85ms for 1, 2 and 3 workers respectively.
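The expected figures can be reproduced with a small calculation. The sketch below assumes that the main thread renders alongside the workers (so n workers means n + 1 rendering threads) and that only the 2ms bitmap presentation stays serial. Both assumptions are inferred from the numbers quoted above, and `IdealTiming` and `IdealMs` are hypothetical names.

```csharp
using System;

public static class IdealTiming
{
    // Ideal time in ms: the serial part is paid once, the rest is split evenly
    // across the workers plus the main thread (an assumption, see the text above).
    public static double IdealMs(double totalMs, double serialMs, int workers)
    {
        if (workers < 0) throw new ArgumentOutOfRangeException("workers");

        int renderingThreads = workers + 1; // workers plus the main thread
        return serialMs + (totalMs - serialMs) / renderingThreads;
    }

    public static void Main()
    {
        for (int workers = 0; workers <= 3; workers++)
        {
            Console.WriteLine("{0} workers: {1:F1} ms", workers, IdealMs(330.0, 2.0, workers));
        }
    }
}
```

With these inputs the ideal times come out at roughly 166ms, 111ms and 84ms for 1, 2 and 3 workers, which matches the rounded expectations above.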
More accurate measurements would be needed to confirm it, but my impression is that the behavior falls short of the ideal because of a combination of the following reasons:
  • The scheduler has considerable overhead, since it uses a queue shared between the threads and synchronized with mutexes.
  • Although I tried to minimize memory allocations while the ray tracing is computed, C# is a managed, garbage-collected language, so it is very difficult to have full control over memory usage, and memory is clearly a shared resource that introduces additional hidden synchronization.
  • The improvement seen with the introduction of the 4th worker is likely explained by the fact that the main thread has some additional work to do processing UI events while the workers perform the rendering.
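To make the first point concrete, here is a minimal sketch of a scheduler built on a shared queue protected by a lock. It is not the project's code: `RowScheduler`, `Render`, `TraceRow`, the choice of one image row per task, and having the calling thread help with the rendering are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class RowScheduler
{
    // Hypothetical stand-in for tracing one image row; returns a dummy checksum.
    private static int TraceRow(int y, int width)
    {
        int sum = 0;
        for (int x = 0; x < width; x++) sum += (x ^ y) & 0xFF;
        return sum;
    }

    public static int[] Render(int width, int height, int workerCount)
    {
        Queue<int> pending = new Queue<int>();
        for (int y = 0; y < height; y++) pending.Enqueue(y);

        int[] result = new int[height];
        object gate = new object();

        // Every thread runs this loop: take one row under the lock, trace it outside the lock.
        ThreadStart drain = () =>
        {
            while (true)
            {
                int y;
                lock (gate)
                {
                    if (pending.Count == 0) return;
                    y = pending.Dequeue();
                }
                result[y] = TraceRow(y, width); // each row is written by exactly one thread
            }
        };

        Thread[] workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            workers[i] = new Thread(drain);
            workers[i].Start();
        }

        drain(); // the calling thread helps too (assumption); with 0 workers this is fully serial
        foreach (Thread worker in workers) worker.Join();
        return result;
    }

    public static void Main()
    {
        int[] serial = Render(64, 48, 0);
        int[] parallel = Render(64, 48, 3);

        bool same = true;
        for (int y = 0; y < serial.Length; y++)
        {
            if (serial[y] != parallel[y]) same = false;
        }
        Console.WriteLine(same ? "serial and parallel results match" : "results differ");
    }
}
```

In a design like this, every task costs one lock acquisition, so the size of a task matters: the smaller the unit of work taken from the queue, the more often the threads contend for the lock.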
There is a lot of room for improvement, and maybe some day I'll go back to the project to try out some ideas.