A few weeks ago I began working on using the hardware loop capabilities for fragment shaders on R500 cards. My original plan was to use the specialized loop instructions provided by the graphics card, but as it turned out, the documentation for these instructions was a little confusing (or so I thought), and I could never get them to work the way I wanted. So, instead I ended up using JUMP instructions to execute loops the same way you would if you were generating code for a CPU. This is an OK solution, but it makes it very difficult to generate code for loops that have continue or break statements.
After taking a few days off from loops, I decided to give the specialized loop instructions another try. I went back and reviewed the documentation and still it did not make sense to me, so I decided to ask Alex Deucher, who works at AMD, for some clarification on the documentation. As it turns out the documentation was fine, Alex pointed out a short but very important part of the documentation that I had over-looked. I've probably read the documentation one hundred times, but I always missed that one crucial part!!! Thanks, Alex.
I will start working on hardware loop instructions again soon, but first I am going to take a little detour to fix a bug in the compiler's instruction scheduler that is preventing me from playing civ4 and causing problems with Compiz for some people.