Sunday, March 4, 2012

Blender Cycles - GCC Compiler Optimisations

Hello Blenderheads,

In this post, I'll try to see if DragonEgg (see "DragonEgg in Fedora 16") has any effect on the rendering time of the Cycles Benchmark Scene by MPan3. I'll also check other GCC flags that my old processor might accept.
MPan3's testing BMW scene.

Materials and Methods
For this purpose I'll compile several versions of Blender. Each version will have its own set of optimisation flags.

Once all builds are done, I'll reboot and launch the rendering of the benchmark scene with vanilla-blender. Once this is done, I'll reboot and do the same with tuned-blender and dragonegg-blender and compare times.
The computer running the tests is my good ol' Bamboo, so the render times are expected to be high.
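The render step can be scripted so that every build is launched the same way. What follows is only a sketch, not the procedure used for this post: the binary and scene paths are hypothetical, and it measures the wall-clock time of the whole process (including scene loading), whereas the figures below are the "Time:" values reported by Blender itself. The only Blender options assumed are `-b` (run in the background, without the UI) and `-f` (render a single frame).

```python
import subprocess
import time


def render_command(blender_binary, scene, frame=1):
    """Build a background render command for one frame."""
    # -b: no user interface, -f N: render frame N and exit.
    return [blender_binary, "-b", scene, "-f", str(frame)]


def time_command(command):
    """Run a command to completion and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start


# Hypothetical paths, for illustration only; nothing is executed here.
example = render_command("/opt/vanilla-blender/blender", "bmw_benchmark.blend")
print(" ".join(example))
```

Calling `time_command(example)` once per build, after a fresh reboot each time, would give one comparable number per set of flags.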

Software versions are:
  • Blender : svn rev. 44621
  • LLVM : svn rev. 151974
  • DragonEgg : svn rev. 151975
  • GCC : gcc (GCC) 4.6.2 20111027 (Red Hat 4.6.2-1)
These are the compilation flags (CFLAGS and CXXFLAGS):
  1. -O2 -DNDEBUG (normal "release" build)
  2. -O2 -DNDEBUG -fplugin=~/dev/llvm/dragonegg-src/dragonegg.so
  3. -O3 -DNDEBUG -march=athlon64 -mfpmath="sse"
  4. -O3 -DNDEBUG -march=athlon64 -mfpmath="both" -m3dnow -msse2
At the moment there is no point in combining DragonEgg with the other GCC optimisations because, by default, DragonEgg bypasses GCC's own optimisations. There is a flag that enables both the GCC and the DragonEgg optimisations (-fplugin-arg-dragonegg-enable-gcc-optzns), but although the compilation succeeds, the resulting binary segfaults in strange ways. The test using this option was therefore excluded.

Here are the rendering times for the builds above (in respective order):
  1. Time: 47:32.24
  2. Time: 42:30.68
  3. Time: 34:07.18
  4. Time: 33:59.39
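To make these times easier to compare, they can be converted to seconds and expressed relative to build 1. The snippet below is a small sketch of that arithmetic; the helper names and the build labels are mine.

```python
def to_seconds(stamp):
    """Convert a 'MM:SS.ss' render time into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def percent_saved(seconds, reference):
    """Percentage of render time saved compared with the reference build."""
    return 100.0 * (reference - seconds) / reference


# Render times listed above, in build order.
times = {
    "1: -O2": "47:32.24",
    "2: -O2 + DragonEgg": "42:30.68",
    "3: -O3 athlon64 sse": "34:07.18",
    "4: -O3 athlon64 both": "33:59.39",
}

reference = to_seconds(times["1: -O2"])
for name, stamp in times.items():
    secs = to_seconds(stamp)
    saved = percent_saved(secs, reference)
    print(f"{name:22s} {secs:8.2f} s  {saved:5.1f}% less time")
```

With these figures, build 2 saves about 10.6% of the render time, and builds 3 and 4 save about 28.2% and 28.5% respectively.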
It's difficult to say whether there are any differences between the renders, as they look extremely similar to the eye. However, taking them through some numpy magic and considering "1" as the reference, we get the following numbers:
  1. No difference to "1" obviously.
  2. 54% of differing pixel components, 2.19 e-5 intensity diff
  3. 54% of differing pixel components, 2.88 e-5 intensity diff
  4. 54% of differing pixel components, 3.12 e-5 intensity diff
Intensities are expressed between 0.0 and 1.0 (rescaled).
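The comparison script itself is not shown here, so the following is only one way such numbers could be computed. It assumes that both renders are already loaded as arrays of identical shape with values rescaled to the 0.0 to 1.0 range, and that the "intensity diff" is the mean absolute difference over all pixel components; both are assumptions, and the function name is made up.

```python
import numpy as np


def compare_renders(reference, candidate):
    """Return (fraction of differing components, mean absolute difference)."""
    ref = np.asarray(reference, dtype=np.float64)
    cand = np.asarray(candidate, dtype=np.float64)
    if ref.shape != cand.shape:
        raise ValueError("renders must have the same shape")
    diff = np.abs(ref - cand)
    # Share of pixel components (R, G, B values) that are not identical.
    differing = np.count_nonzero(diff) / diff.size
    return differing, float(diff.mean())


# Toy 2x2 RGB images standing in for two renders (not real data).
ref = np.zeros((2, 2, 3))
cand = ref.copy()
cand[0, 0, 0] = 0.5
cand[1, 1, 2] = 0.25

fraction, mean_diff = compare_renders(ref, cand)
print(f"{fraction:.1%} of components differ, mean diff {mean_diff:.4f}")
```

On real renders, the first value would correspond to the "differing pixel components" percentage and the second to the intensity difference, under the assumptions stated above.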

The default compilation parameters are not optimal for my platform. Better GCC parameters can be applied, and the speed-up is significant: the render time drops by about 28%, from 47:32 to roughly 34:00. This is no surprise.
The surprise comes from DragonEgg. I was expecting a huge boost compared to the default parameters. There is a speed-up of about 10%, but it is far from what can be achieved by tuning the GCC flags. Still, 10% is an interesting gain!

One question remains: what kind of speed-up could be achieved if the flag that combines both optimisers worked?
