I patched Debian's gcc-mingw-w64 to replace aligned moves with unaligned moves, rebuilt it, and ran the suite again. Good news: no more crashes.

Vector size 4 is only an improvement when compiling for a non-generic CPU target; otherwise 2 is best (a significant improvement over 1, i.e. no SIMD). On my AMD Ryzen 7 2700X, -march=native is a bit faster than targeting the intersection of haswell and bdver4.
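A rough sketch of what the vector-size knob could look like, assuming it means "number of doubles per GCC generic vector". `VEC_LANES` and `sum_doubles` are hypothetical names, not taken from the project. One plausible reason 4 only helps with a specific target: a generic x86-64 build can only assume SSE2 (128-bit registers), so GCC splits a 4-double vector into two 2-double operations, while haswell and bdver4 both have 256-bit AVX registers.

```c
#include <stddef.h>

/* Hypothetical knob: number of double lanes per vector (2 or 4 here).
   Assumption: this mirrors the "vector size" setting benchmarked above. */
#ifndef VEC_LANES
#define VEC_LANES 2
#endif

/* GCC/Clang generic vector type holding VEC_LANES doubles. If the target
   has no registers this wide, GCC lowers it to several narrower ops. */
typedef double vec_t __attribute__((vector_size(VEC_LANES * sizeof(double))));

/* Sum n doubles, processing VEC_LANES elements per vector step. */
double sum_doubles(const double *x, size_t n)
{
    vec_t acc = {0}; /* all lanes start at zero */
    size_t i = 0;
    for (; i + VEC_LANES <= n; i += VEC_LANES) {
        vec_t v;
        /* memcpy lets the compiler emit an unaligned load; x need not be
           aligned to sizeof(vec_t). */
        __builtin_memcpy(&v, x + i, sizeof v);
        acc += v; /* lane-wise add */
    }
    double total = 0.0;
    for (int l = 0; l < VEC_LANES; ++l)
        total += acc[l]; /* horizontal reduction of the lanes */
    for (; i < n; ++i)
        total += x[i];   /* scalar tail for leftover elements */
    return total;
}
```

Comparing, say, `gcc -O2 -DVEC_LANES=2` against `gcc -O2 -march=native -DVEC_LANES=4` and inspecting the generated assembly would show whether the 4-lane version really uses 256-bit instructions or gets split.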

