The trick is basically to take the C-like syntax of shader code and turn it into *actual* C code. The trouble is that C doesn't have vector operations built into the language (and you can't add them with operator overloading, as you can in C++). I used to write my own vector operations for this, but discovered someone else did a much better job:
Even with multithreading, the CPU shader code performs quite terribly, as you'd expect, which is why I don't even try to run the shaders in realtime. BUT if you are okay with that, you get some very portable code that doesn't require working GPU drivers.
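To make the idea concrete, here is a minimal sketch of what "shader code as plain C" can look like with hand-rolled vector helpers. Everything in it is made up for illustration: the `vec3` struct, the `vec3_*` helpers, the `shade` function and the `render` loop are my own hypothetical names, not the API of any particular library, and the gradient it draws is just a placeholder for a real shader body.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for GLSL's vec3; name and layout are assumptions. */
typedef struct { float x, y, z; } vec3;

static vec3 vec3_make(float x, float y, float z) { vec3 v = { x, y, z }; return v; }
static vec3 vec3_add(vec3 a, vec3 b) { return vec3_make(a.x + b.x, a.y + b.y, a.z + b.z); }
static vec3 vec3_scale(vec3 v, float s) { return vec3_make(v.x * s, v.y * s, v.z * s); }

/* Same formula as GLSL's mix(a, b, t): a * (1 - t) + b * t. */
static vec3 vec3_mix(vec3 a, vec3 b, float t) {
    return vec3_add(vec3_scale(a, 1.0f - t), vec3_scale(b, t));
}

/* The "fragment shader": maps normalized coordinates (u, v in [0, 1]) to a
 * color. Red-to-blue gradient from left to right, darkened towards the bottom. */
static vec3 shade(float u, float v) {
    vec3 left = vec3_make(1.0f, 0.0f, 0.0f);
    vec3 right = vec3_make(0.0f, 0.0f, 1.0f);
    return vec3_scale(vec3_mix(left, right, u), 1.0f - 0.5f * v);
}

/* Clamp a color channel to [0, 1] and convert it to an 8-bit value. */
static uint8_t to_byte(float c) {
    if (c < 0.0f) c = 0.0f;
    if (c > 1.0f) c = 1.0f;
    return (uint8_t)(c * 255.0f + 0.5f);
}

/* Runs shade() once per pixel and writes tightly packed RGB bytes.
 * The caller provides a buffer of width * height * 3 bytes. */
static void render(uint8_t *rgb, int width, int height) {
    assert(rgb != NULL && width > 0 && height > 0);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            /* Sample at the pixel center, like gl_FragCoord does. */
            float u = ((float)x + 0.5f) / (float)width;
            float v = ((float)y + 0.5f) / (float)height;
            vec3 c = shade(u, v);
            uint8_t *p = rgb + 3 * ((size_t)y * (size_t)width + (size_t)x);
            p[0] = to_byte(c.x);
            p[1] = to_byte(c.y);
            p[2] = to_byte(c.z);
        }
    }
}
```

You'd call `render(buffer, width, height)` with a buffer of `width * height * 3` bytes and then dump it to an image file. Since every pixel is computed independently, the outer loop over rows is the obvious place to split work across threads (for example with an OpenMP `#pragma omp parallel for`, if your compiler supports it), though as said above that won't get you anywhere near GPU speed.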