Monday, March 28, 2011

New Register Allocator in the R300 Compiler

I'm mostly finished with a new and improved register allocator for fragment shaders in the R300 compiler. I still need to clean up the code and add comments, but otherwise it is ready for testing. The new allocator takes advantage of a register allocation algorithm designed for irregular architectures from a paper by Johan Runeson and Sven-Olof Nyström. Eric Anholt implemented this algorithm and added it to mesa, so all drivers could make use of it.

The new register allocator can pack one and two component register writes together into the same register to make full use of the four component temporary registers that the programs have access to. For example a program like this:

ADD TEMP[0].x, CONST[0].x CONST[0].x
MUL TEMP[1].x, TEMP[0].x, TEMP[0].x
MUL TEMP[2].x, TEMP[1].x, TEMP[1].x
MAD OUT[0].x, TEMP[0].x, TEMP[1].x, TEMP[2].x

will now be transformed to this:

ADD TEMP[0].x, CONST[0].x CONST[0].x
MUL TEMP[0].y, TEMP[0].x, TEMP[0].x
MUL TEMP[0].z, TEMP[0].y, TEMP[0].y
MAD OUT[0].x, TEMP[0].x, TEMP[0].y, TEMP[0].z

This will have a big impact on shaders that use a lot of scalar values. Some of the bigger shaders in Lightsmark use 30-50% less registers with the new register allocator on my RV515. I also get an improvement in fps from ~4.75 to ~5.30, which is about 10%, but with fps that low I'm not sure the difference is really significant. I'd be interested to see the results on other cards with different games and benchmarks. If anyone wants to test it out, the code is in the new-register-allocator branch here.

If you run programs with the environment variable RADEON_DEBUG=pstat they will print out statistics from the compiled shaders that are useful for evaluating the effectiveness of the new register allocator.