Well, the original example in this thread still fails spectacularly with scaled-double. I seem to have only fixed it for off-axis Burning Ship locations.
I think it's underflow to 0.0 of the reference orbit imaginary part `Xi` (stored in double precision without scaling), which causes problems in perturbed pixel orbits when the product of Xr and Xi is scaled up in `diffabs(Xr * S * Xi, ...)`.
In this case I think the terms should have magnitudes roughly:
Xi 1e-420 (underflows to 0)
So their product should have magnitude 1e-290 which is in the range of double.
diffabs(X,x) := |X+x|-|X|
When the first argument is roughly opposite the second argument, the result is negative:
diffabs(-x,x) = -|x|
If the first argument is 0 due to unwanted underflow, the result is positive and about the same size:
This makes it go wrong.
Mandelbrot at this location works fine with scaled-double, because the Xi does not need to be scaled up for diffabs().
Simple enough fix in the end.
The SIMD path iterates in chunks (default 64), and rolls the last chunk back when any pixel in the SIMD vector (default size 2) escapes or is detected as glitched, then finishes iterating each pixel in the vector on its own one by one.
I forgot to save/restore the 'test1' variable used for storing the squared magnitude for checking escape/glitch, so it or 'test2' (the previous value of 'test1', used for low bailout smooth colouring) contained garbage data, which sometimes made it through to the output pixels.
I think that happened when the pixel's iteration count was 1 more than an integer multiple of the chunk size, but I'm not sure.
Scaled-double with SIMD vector size 2 takes 10mins vs long double's 16mins (wall-clock time) at this Burning Ship location (zoom depth 1e410).
However, the scaled-double version looks a bit rougher even downscaled to thumbnail size. I'm not sure why, but it's a bit disappointing after all the effort of implementing it.
I think I fixed it!
When Xr and Xi are small, (Xr * Xi) can #underflow to 0, so ((Xr * Xi) * S) is 0 already, Changing the multiplication order to ((Xr * S) * Xi) seems to work much better!
The risk of #overflow to infinity in (Xr * S) is probably none, because Xr is small (less than escape radius) but I should double-check deep zooms near the 1e600 threshold for switching from scaled double to long double.
Trying to get "scaled double" #maths working for #BurningShip #fractal in #kf. Purpose of rescaling is to avoid floating point underflow to 0.0. Something is broken however, I suspect it has to do with glitch detection as dead-center miniships do much better).
Burning Ship iterations:
Xrn = Xr * Xr - Xi * Xi + Cr;
Xin = abs(2 * Xr * Xi) + Ci;
Perturbed Burning Ship iterations:
xrn = (2 * Xr + xr) * xr - (2 * Xi + xi) * xi + cr;
xin = 2 * diffabs(Xr * Xi, Xr * xi + xr * Xi + xr * xi) + ci;
Rescaled perturbed Burning Ship iterations (S * s = 1, actual orbit is X + x * s):
xrn = (2 * Xr + xr * s) * xr - (2 * Xi + xi * s) * xi + cr;
xin = 2 * diffabs(Xr * Xi * S, Xr * xi + xr * Xi + xi * xr * s) + ci;
diffabs(Z,z) evaluates abs(Z+z)-abs(Z) without catastrophic precision loss, by doing case analysis on the signs of Z and Z+z. Technique invented by laser blaster at fractalforums.com.
One Burning Ship location I tested gets a similarly significant speed-up to the Mandelbrot data points in my blog post. 45% faster wall clock (13m22 instead of 19m20) and 66% faster CPU time (137m54 instead of 228m37).
Zoom depth 1e150 or so, going deeper will be slower still (as iteration count increases), I don't know if anyone is rendering Burning Ships deeper than 1e300 regularly.
The programming effort involved implementing rescaled-double for depths between 1e300 - 1e600 might be quite high for Burning Ship, but then adding the SIMD stuff on top shouldn't be insurmountable.
It's probably more worth my time implementing SIMD for Mandelbrot rescaled-double (the series approximation stuff means Mandelbrot deep zooms are much cheaper than Burning Ship).
Still need to implement SIMD for the "type C" formulas too, and the versions with derivative calculations.
Example: `main: fast1 fast2 fast3 slow` should parallelize on two cores like (slow) (fast1 ; fast2 ; fast3) and not (fast1 ; fast3) (fast2 ; slow). This would give a 33% wall-clock speed boost when `fastX` takes 1/3 the time of `slow`, CPU time would stay the same.
In a book I inherited recently, "Computers at Work" by John O E Clark in 1969, pages 5-11 cover the relevant idea of "critical path analysis".
found a serious bug in the kf-2.14.7 release (iteration data channels are saved vertically flipped in EXR files)
also fixed an old performance regression vs 2.12 branch, so now there is no reason not to use the latest.
new release: https://mathr.co.uk/kf/kf.html#kf-184.108.40.206
https://mathr.co.uk/kf/kf.html#kf-2.14.7 new KF release with EXR support (now you can do postwork without getting banding or other colour artifacts so easily) and other features+bugfixes
rounded is a Haskell package for correctly-rounded arbitrary-precision floating-point arithmetic, binding to the MPFR C library. I took over the maintainership last year.
The next releases will be twofold, with branches for MPFR 3.1 (rounded-0.x) and MPFR 4.0 (rounded-1.x). Users should depend on the lowest version(s) that have all the features and bug-fixes they need.
Changes since the previous release are a bug-fix for an embarrassing crash in the long double support, and wrappers for more of the MPFR API.
You can check the release candidates at:
I use rounded in et, an escape time fractal explorer, as well as other projects. The image was made with et.