Show more

Well, the original example in this thread still fails spectacularly with scaled-double. I seem to have only fixed it for off-axis Burning Ship locations.

I think it's underflow to 0.0 of the reference orbit imaginary part `Xi` (stored in double precision without scaling), which causes problems in perturbed pixel orbits when the product of Xr and Xi is scaled up in `diffabs(Xr * S * Xi, ...)`.

In this case I think the terms should have magnitudes roughly:
Xr 1e0
S 1e130
Xi 1e-420 (underflows to 0)
So their product should have magnitude 1e-290 which is in the range of double.

diffabs(X,x) := |X+x|-|X|

When the first argument is roughly opposite the second argument, the result is negative:
diffabs(-x,x) = -|x|
If the first argument is 0 due to unwanted underflow, the result is positive and about the same size:
diffabs(0,x)= |x|
This makes it go wrong.

Mandelbrot at this location works fine with scaled-double, because the Xi does not need to be scaled up for diffabs().

Simple enough fix in the end.

The SIMD path iterates in chunks (default 64), and rolls the last chunk back when any pixel in the SIMD vector (default size 2) escapes or is detected as glitched, then finishes iterating each pixel in the vector on its own one by one.

I forgot to save/restore the 'test1' variable used for storing the squared magnitude for checking escape/glitch, so it or 'test2' (the previous value of 'test1', used for low bailout smooth colouring) contained garbage data, which sometimes made it through to the output pixels.

I think that happened when the pixel's iteration count was 1 more than an integer multiple of the chunk size, but I'm not sure.

Similar issue with a Mandelbrot location at 4e533 zoom depth; this is a regression from the previous release so it should be possible to fix it again...

Scaled-double with SIMD vector size 2 takes 10mins vs long double's 16mins (wall-clock time) at this Burning Ship location (zoom depth 1e410).

However, the scaled-double version looks a bit rougher even downscaled to thumbnail size. I'm not sure why, but it's a bit disappointing after all the effort of implementing it.

Be sure *not* to enable `-ffast-math` for this kind of code, as it can do optimizations that break it again, unpredictably.

claude boosted

I think I fixed it!

When Xr and Xi are small, (Xr * Xi) can to 0, so ((Xr * Xi) * S) is 0 already, Changing the multiplication order to ((Xr * S) * Xi) seems to work much better!

The risk of to infinity in (Xr * S) is probably none, because Xr is small (less than escape radius) but I should double-check deep zooms near the 1e600 threshold for switching from scaled double to long double.

Trying to get "scaled double" working for in . Purpose of rescaling is to avoid floating point underflow to 0.0. Something is broken however, I suspect it has to do with glitch detection as dead-center miniships do much better).

Burning Ship iterations:
Xrn = Xr * Xr - Xi * Xi + Cr;
Xin = abs(2 * Xr * Xi) + Ci;

Perturbed Burning Ship iterations:
xrn = (2 * Xr + xr) * xr - (2 * Xi + xi) * xi + cr;
xin = 2 * diffabs(Xr * Xi, Xr * xi + xr * Xi + xr * xi) + ci;

Rescaled perturbed Burning Ship iterations (S * s = 1, actual orbit is X + x * s):
xrn = (2 * Xr + xr * s) * xr - (2 * Xi + xi * s) * xi + cr;
xin = 2 * diffabs(Xr * Xi * S, Xr * xi + xr * Xi + xi * xr * s) + ci;

diffabs(Z,z) evaluates abs(Z+z)-abs(Z) without catastrophic precision loss, by doing case analysis on the signs of Z and Z+z. Technique invented by laser blaster at

oh, Gridflow just copy/pasted 4 passes like I'm doing already for 9 passes. slight disappoint

The .cpp file is huge (115k lines) so I'm compiling it with PASSX ... chunks so I can parallelize it and wait only 2.5mins instead of 20mins for it to finish. I picked up this trick from Gridflow's build system, maybe I should look to see how it did it...

can anyone help me simplify my repetitive ?

%.1.o: %.cpp

%.2.o: %.cpp

%.3.o: %.cpp


(it goes up to 9 at the moment....)

One Burning Ship location I tested gets a similarly significant speed-up to the Mandelbrot data points in my blog post. 45% faster wall clock (13m22 instead of 19m20) and 66% faster CPU time (137m54 instead of 228m37).

Zoom depth 1e150 or so, going deeper will be slower still (as iteration count increases), I don't know if anyone is rendering Burning Ships deeper than 1e300 regularly.

The programming effort involved implementing rescaled-double for depths between 1e300 - 1e600 might be quite high for Burning Ship, but then adding the SIMD stuff on top shouldn't be insurmountable.

It's probably more worth my time implementing SIMD for Mandelbrot rescaled-double (the series approximation stuff means Mandelbrot deep zooms are much cheaper than Burning Ship).

Still need to implement SIMD for the "type C" formulas too, and the versions with derivative calculations.

new blog post
in which I show how I made faster by adding a inside an , and even faster by adding explicit using extensions

Project idea: new for the `make` that can use of previous runs to invocations in subsequent runs.

Example: `main: fast1 fast2 fast3 slow` should parallelize on two cores like (slow) (fast1 ; fast2 ; fast3) and not (fast1 ; fast3) (fast2 ; slow). This would give a 33% wall-clock speed boost when `fastX` takes 1/3 the time of `slow`, CPU time would stay the same.

In a book I inherited recently, "Computers at Work" by John O E Clark in 1969, pages 5-11 cover the relevant idea of "critical path analysis".

made a little tool to extract channels and preview images from files
very basic but useful for (I used it to be able to view kf's `NF` channel in exrdisplay which led me to notice and fix vertical flipping)

found a serious bug in the kf-2.14.7 release (iteration data channels are saved vertically flipped in EXR files)

also fixed an old performance regression vs 2.12 branch, so now there is no reason not to use the latest.

new release:

rounded-0.2 and rounded-1.0 have been released now new KF release with EXR support (now you can do postwork without getting banding or other colour artifacts so easily) and other features+bugfixes

rounded is a Haskell package for correctly-rounded arbitrary-precision floating-point arithmetic, binding to the MPFR C library. I took over the maintainership last year.

The next releases will be twofold, with branches for MPFR 3.1 (rounded-0.x) and MPFR 4.0 (rounded-1.x). Users should depend on the lowest version(s) that have all the features and bug-fixes they need.

Changes since the previous release are a bug-fix for an embarrassing crash in the long double support, and wrappers for more of the MPFR API.

You can check the release candidates at:

I use rounded in et, an escape time fractal explorer, as well as other projects. The image was made with et.

Show more

Hometown is adapted from Mastodon, a decentralized social network with no ads, no corporate surveillance, and ethical design.