The new register allocator can pack one- and two-component register writes together into the same register, making full use of the four-component temporary registers that programs have access to. For example, a program like this:
ADD TEMP[0].x, CONST[0].x, CONST[0].x
MUL TEMP[1].x, TEMP[0].x, TEMP[0].x
MUL TEMP[2].x, TEMP[1].x, TEMP[1].x
MAD OUT[0].x, TEMP[0].x, TEMP[1].x, TEMP[2].x
will now be transformed to this:
ADD TEMP[0].x, CONST[0].x, CONST[0].x
MUL TEMP[0].y, TEMP[0].x, TEMP[0].x
MUL TEMP[0].z, TEMP[0].y, TEMP[0].y
MAD OUT[0].x, TEMP[0].x, TEMP[0].y, TEMP[0].z
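To make the packing idea concrete, here is a minimal Python sketch that reproduces the rewrite above on plain text. It is not the allocator from the branch: the function name `pack_scalar_temps` is made up for illustration, and it assumes that every scalar temporary stays live for the whole program, so each one simply gets the next free component. A real allocator would look at live ranges and could reuse components.

```python
import re

COMPONENTS = "xyzw"
# Matches single-component temporary references such as TEMP[2].x.
# Wider swizzles such as TEMP[0].xy are deliberately left untouched.
TEMP_REF = re.compile(r"TEMP\[(\d+)\]\.([xyzw])\b")


def pack_scalar_temps(program):
    """Rewrite scalar TEMP references so four of them share one register.

    Hypothetical helper for illustration only: no liveness analysis,
    slots are handed out in order of first use.
    """
    slots = {}  # (old index, old component) -> (new index, new component)

    def remap(match):
        key = (int(match.group(1)), match.group(2))
        if key not in slots:
            n = len(slots)
            slots[key] = (n // 4, COMPONENTS[n % 4])
        index, comp = slots[key]
        return f"TEMP[{index}].{comp}"

    return [TEMP_REF.sub(remap, line) for line in program]


program = [
    "ADD TEMP[0].x, CONST[0].x, CONST[0].x",
    "MUL TEMP[1].x, TEMP[0].x, TEMP[0].x",
    "MUL TEMP[2].x, TEMP[1].x, TEMP[1].x",
    "MAD OUT[0].x, TEMP[0].x, TEMP[1].x, TEMP[2].x",
]
packed = pack_scalar_temps(program)
for line in packed:
    print(line)
```

On this input the three scalar temporaries end up in TEMP[0].x, TEMP[0].y and TEMP[0].z, so the program needs one temporary register instead of three.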
This will have a big impact on shaders that use a lot of scalar values. Some of the bigger shaders in Lightsmark use 30-50% fewer registers with the new register allocator on my RV515. I also get an improvement in fps from ~4.75 to ~5.30, which is roughly 12%, but with fps that low I'm not sure the difference is really significant. I'd be interested to see the results on other cards with different games and benchmarks. If anyone wants to test it out, the code is in the new-register-allocator branch here.
If you run programs with the environment variable RADEON_DEBUG=pstat set, they will print out statistics from the compiled shaders, which are useful for evaluating the effectiveness of the new register allocator.
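If you want to collect those statistics from a script, something like the following Python sketch could work. The helper name `run_with_pstat` is hypothetical, and since I'm not assuming which stream the statistics are printed on, it captures both stdout and stderr. The child command here is just a stand-in that echoes the variable; replace it with the game or benchmark you want to test.

```python
import os
import subprocess
import sys


def run_with_pstat(command):
    """Run `command` with RADEON_DEBUG=pstat and return its combined output.

    Hypothetical helper: it overrides any RADEON_DEBUG value that is
    already set in the calling environment.
    """
    env = dict(os.environ, RADEON_DEBUG="pstat")
    result = subprocess.run(
        command,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the captured output
        text=True,
    )
    return result.stdout


# Stand-in child process that only shows the variable reached it.
child = [sys.executable, "-c", "import os; print(os.environ['RADEON_DEBUG'])"]
output = run_with_pstat(child)
print(output)
```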