Something seems unhappy when I do #OpenCL in a different #thread: I get a random #crash that seems to go away if I run it on the main thread (which harms user interface responsiveness, but so be it).
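The cause of the crash is unknown, so this is only a sketch of one possible middle ground: keep every compute call on a single dedicated worker thread, so the work stays on one thread without blocking the user interface. The class name `ComputeThread` and its `submit` method are hypothetical, no OpenCL calls are shown, and jobs here simply return an `int`; real jobs would wrap whatever OpenCL calls are needed, and there is no guarantee this avoids the crash.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical helper: one long-lived thread that runs all submitted jobs
// in order, so all compute state is only ever touched from that thread.
class ComputeThread {
public:
    ComputeThread() : worker_([this] { run(); }) {}

    ~ComputeThread() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();  // remaining queued jobs are finished first
    }

    // Queue a job; the caller (e.g. the UI thread) gets a future and
    // does not block until it asks for the result.
    std::future<int> submit(std::function<int()> job) {
        std::packaged_task<int()> task(std::move(job));
        std::future<int> result = task.get_future();
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.push(std::move(task));
        }
        cv_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::packaged_task<int()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // shutting down, nothing left
                task = std::move(jobs_.front());
                jobs_.pop();
            }
            task();  // run outside the lock
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::packaged_task<int()>> jobs_;
    bool done_ = false;
    std::thread worker_;  // declared last so the members above exist before it starts
};
```

The UI thread would call `submit` and poll or wait on the returned future when it needs the result.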
Rough timings: my AMD Radeon RX 580 GPU running OpenCL is about the same overall speed on this workload as my AMD Ryzen 2700X CPU running compiled C++, even with all the back-and-forth host<->device memory copies that I haven't optimized yet.
A generic #x86_64 C++ build on an Intel Core 2 Duo runs at about half the speed of the POCL CPU OpenCL implementation. Maybe I can drop my attempts at #SIMD (which benefit strongly from the non-portable -march=native, as I haven't figured out runtime CPU detection and compiling multiple versions) and punt that to the #OpenCL runtime compiler(s). A fallback in case there is no OpenCL might still be handy though...
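For reference, runtime CPU detection with multiple compiled versions can be sketched roughly like this on GCC or Clang for x86_64, using the `__builtin_cpu_supports` builtin and a per-function `target` attribute instead of a global -march=native. The kernel (`scale_generic`, `scale_avx2`) and the selector `pick_scale` are hypothetical stand-ins, not code from the project, and whether the AVX2 variant actually gets vectorized depends on the optimization flags.

```cpp
#include <cstddef>

// Hypothetical kernel: scale an array in place. Baseline version, built
// with whatever generic flags the rest of the program uses.
static void scale_generic(float* x, std::size_t n, float k) {
    for (std::size_t i = 0; i < n; ++i) x[i] *= k;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
// Same source, but only this function is compiled with AVX2 enabled,
// so the rest of the binary stays portable.
__attribute__((target("avx2")))
static void scale_avx2(float* x, std::size_t n, float k) {
    for (std::size_t i = 0; i < n; ++i) x[i] *= k;
}
#endif

using scale_fn = void (*)(float*, std::size_t, float);

// Choose an implementation once at startup, based on what the CPU
// running the program reports; falls back to the generic version.
static scale_fn pick_scale() {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return scale_avx2;
#endif
    return scale_generic;
}
```

Calling `pick_scale()` once and storing the returned pointer keeps the detection cost out of the hot loop; on compilers or architectures without these builtins the sketch falls back to the generic version.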
Turns out I just needed to `make SIMD=0`, with no code changes necessary. A build from clean now takes ~15 minutes, which is still long, but most rebuilds during development hopefully won't need to recompile all of it.