Apologies if it's bad form to post a reference to my own GitHub.
The VPU documentation is now at a stage where enough instructions are described to do useful work.
I am about to begin writing up a set of tutorials for beginners who want to leverage the VPU.
We are looking to push forward with a robust port of GNU Binutils for VideoCore IV, and are seeking contributors.
Similarly, we are looking for contributors to help with ports of GCC and LLVM.
Some background: the SoC in the Raspberry Pi has 3 processors, each with its own instruction set: (1) an ARMv6 core @700MHz for userland, (2) a VideoCore IV VPU @250MHz, dual core and dual issue with a 16-way SIMD integer vector processor, for running the blob, codecs, 2D acceleration etc., and (3) a 24 GFLOPS shader processor (the QPUs) for OpenGL ES and OpenVG.
Work on understanding the QPUs is also under way at https://github.com/hermanhermitage/videocoreiv-qpu. Currently we are able to intercept and disassemble the QPU fragments generated when compiling OpenGL ES shaders. I have also begun injecting my own fragments, and am looking forward to creating a signal processing API or mini OpenCL-style library to exploit the 24 GFLOPS.