Took a couple of days of coding, and resulted in my biggest commit to KF in quite a while:

28 files changed
1489 insertions
926 deletions

I refactored the reference orbit storage into double precision plus floatexp (double precision with extended exponent) for only the iterations that need it, in preparation for adding scaled perturbation computations as described by Pauldelbrot in:
fractalforums.org/programming/

It seems to work ok, after some false starts with off-by-one in series approximation, but the extra logic involved in fetching reference data in the perturbed calculations probably slows it down a fraction (I didn't benchmark yet).

If I manage to complete the second phase, the calculation efficiency for zooms beyond the range of (long) double precision (1e308 or so for GPU, 1e4900 or so for x87 CPU) should be very significant, because the current floatexp renormalization after every arithmetic operation is super slow.

Got scaled iterations partly working. Only the code path without derivatives or SIMD so far.

Some trouble with glitch detection though :( Seems some glitches are not detected, leading to bad images...

Debugging this is painful because it's in an XSLT file that generates vast amounts of C++ code (all the inner loops for all the formulas) that takes over 10 minutes to compile.

Show thread

Turns out glitch detection was fine all along. The problem was transiting via low-range double between the series approximation output and perturbation input, thus the values underflowed to 0 causing problems. Passing through the extended range doubles ("floatexp", with a wider exponent stored separately) instead of converting fixed this bug.

Speed report: in one location, scaled double takes about 2/3 the time as x87 long double, and 1/4 the time as floatexp. Nice acceleration, albeit as yet only for power 2 and power 3 (untested so far) Mandelbrot set formula.

Show thread
Follow

fractal perturbation maths summary 

start with the iteration formula
[1]: Z -> Z^2 + C

perturb the variables with unevaluated sums
[2]: (Z + z) -> (Z + z)^2 + (C + c)

do symbolic algebra to avoid the catastrophic absorption when adding tiny values z to large values Z
[3]: z -> 2 Z z + z^2 + c

scale the values to avoid underflow (substitute S w = z and S d = c)
[4]: S w -> 2 Z S w + S^2 w^2 + S d

cancel out one scale factor S throughout
[5]: w -> 2 Z w + S w^2 + d

choose S so that |w| is around 1. when |w| is at risk of overflow (or underflow), redo the scaling; this is typically a few hundred iterations as |Z|<=2.

now C, Z is the "reference" orbit, computed in high precision using [1] and rounded to (unscaled) double, which works fine most of the time. c, z are the "pixel" orbit, you can do many of these near each reference (e.g. an entire image).

problem: if |Z+z| << |Z| at any iteration, glitches can occur. See fractalforums.com/announcement
solution: retry with a new reference, or (only works for some formulas) rebase to a new reference and carry on

problem: if |Z| is very small, it can underflow to 0 in unscaled double in [5], so one needs to do a full range (e.g. floatexp) iteration at those points. It also means that |w| can change dramatically so rescaling is necessary. See fractalforums.org/programming/

· · Web · 1 · 0 · 0

fractal perturbation maths summary 

optimization: if S underflowed to 0 in unscaled double, you don't need to calculate the + S w^2 term at all when Z is not small. when Z is small you need the full range S (floatexp)

optimization: similarly you can skip the + d if it underflowed.

optimization: for higher powers there will be terms involving S^2 w^3 (for example), which might not need to be calculated due to underflow.

ideally these tests would be performed once at rescaling time, instead of in every inner loop iteration (though they would be highly predictable I suppose).

Show thread

fractal perturbation maths summary 

the other part of the thing that K I Martin's sft_maths.pdf popularized was that iteration of [3] gives a polynomial series in c:

[6]: z_n = \sum A_{n,k} c^k

(with 0 constant term). This can be used to "skip" a whole bunch of iterations, assuming that truncating the series doesn't cause too much trouble(*)

substituting [6] into [3] gives

[7]: \sum A_{n+1},k c^k = 2 Z \sum A_{n,k} c^k + (\sum A_{n,k} c^k)^2 + c

equating coefficients of c^k gives recurrence relations for the series coefficients A_{n,k}, see mathr.co.uk/blog/2016-03-06_si

(*) the traditional way to evaluate that it's ok to do the series approximation at an iteration is to check whether it doesn't deviate too far from regular iterations (or perturbation iterations) at a collection of "probe" points. when it starts to deviate, roll back an iteration and initialize all the image pixels with [6] at that iteration

Show thread

fractal perturbation maths summary 

when perturbing the burning ship and other "abs variations", one ends up with things like

[8]: |XY + Xy + xY + xy| - |XY|

which naively gives 0 by catastrophic absorption and cancellation. laser blaster made a case analysis fractalforums.com/new-theories which can be rewritten as

[9]: diffabs(c, d) := |c+d| - |c| = if sign(c) == sign(c + d) then sign(c) * d else -sign(c) * (2*c+d)

when d is small the first case much more likely. with rescaling in the mix [8] works out as

[10]: diffabs(XY/s, Xy + xY + sxy)

which has the risk of overflow when 's' is small, but the signs work out ok even for infinite 'c' as 'd' is known to be finite.

moreover, if s = 0 due to underflow, the first branch will always be taken (except when XY is small, when a floatexp iteration will be performed instead). and as s >= 0 by construction, diffabs(XY/s, Xy + xY + sxy) reduces to

[11]: sign(X Y) * (X y + x Y)

with this implemented in KF, the first benchmarks are in at zoom depth 1e449:

floatexp 135s
long double 52.1s
scaled double 34.5s

a nice speedup!

I tried this before for fixed scaling per zoom depth (for 1e300-1e600), but I think the reason it didn't work was the lack of floatexp iterations when XY was small. need to test the new implementation on the needle to be sure...

Show thread

fractal perturbation maths summary 

I might have made a mistake, but it seem the optimisation in [11] is problematic after all. At a 1e807 zoom near the needle, it gives incorrect rendering.

Moreover, with this optimisation disabled, it's slow:

scaled double 45s
long double 42s
floatexp 35s

This is because near the needle all the iterations have an X or Y or G*(X^2+Y^2) near 0 (because Y is near 0), which means floatexp iterations will be done anyway. Using floatexp from the get go avoids many branches and rescaling in the inner loop, so it's significantly faster.

And, worse, the extra complexity makes it much slower than previous versions of KF:

scaled double (N/A)
long double 9.1s
floatexp 32s

Show thread

fractal perturbation maths summary 

aha, the mistake in [11] was sign(X*Y) instead of

[12]: sign(X)*sign(Y)

the latter works fine, but is still very slow. Presumably this is because X and Y are small enough that X*Y underflows to 0, while sign(X)*sign(Y) is non-zero.

Timing is about the same, because most iterations are the floatexp ones.

Show thread

fractal perturbation maths summary 

I found a bug, Burning Ship (scaled double + OpenCL + series approximation) is broken, disabling any one of the three seems to fix it. Hopefully something simple. With series approximation and guessing disabled, one test case (6400x3600 e407) rendered in 6m55 with OpenCL on my GPU vs 11m21 for the regular CPU implementation. Disabling guessing seems to speed up OpenCL a lot, at least when there is negligible interior.

Show thread

fractal perturbation maths summary 

KF scaled double progress: derivatives for analytic distance estimation now work on regular CPU and OpenCL implementations.

OpenCL speedups over the last released KF are about 2.5x, except for Burning Ship "deep needle" locations where the new version is slightly slower. Guessing was disabled in these tests, and series approximation was disabled for Burning Ship (still broken).

CPU speedups over the last released KF are about 2x, except for Burning Ship "deep needle" locations where the new version is 3x slower.

I added a summary of the maths to a page on the fractal wiki:
fractalwiki.org/wiki/Rescaled_

Show thread

fractal perturbation maths summary 

About glitches: perturbation assumes exact maths, but some images have glitches when naively using perturbation in low precision.

A solution was discovered/invented by Pauldelbrot in fractalforums.com/announcement

One has perturbed iteration as in [3] (recap: z -> 2 Z z + z^2 + c)

Then one perturbs this with z -> z + e, c -> c + f:

[13]: e -> (2 (Z + z) + e) e + f

We are interested what happens to the ratio e/z under iteration, so rewrite [3] as

[14]: z -> (2 Z + z) z + c

Pattern matching, the interesting part (assuming c and f are small) of e/z is 2(Z + z) / 2 Z.

When e/z is small, the nearby pixels "stick together" and there is not enough precision in the number type to distinguish them, which makes a glitch.

So a glitch can be detected when

[15]: |Z + z|^2 < G |Z|^2

Where G is a threshold (somewhere between 1e-2 and 1e-8, depending how strict you want to be).

This does not add much cost, as |Z+z|^2 already needs to be computed for escape test, and G|Z^2| can be computed once for each iteration of the reference orbit and stored.

The problem now is: How to choose G? Too big and it takes forever as glitches are detected all over, too small and some glitches can be missed leading to bad images.

The glitched pixels can be recalculated with a more appropriate reference point.

Show thread

fractal perturbation maths summary 

Here's an example of the kind of glitches that can happen (in this case, missing details in the right hand image with KF's default threshold), with a lower tolerance for glitches it takes much longer but the image is hopefully correct.

Test location in the Mandelbrot set taken from here deviantart.com/olbaid-st/art/D

Show thread

fractal perturbation maths summary 

ugh, left image is incorrect, right image is correct, at least the image descriptions are ok

Show thread

fractal perturbation maths summary 

Hybrid fractals in KF are built from stanzas, each has some lines, each line has two operators, and each operator has controls for absolute x, absolute y, negate x, negate y, integer power, complex multiplier. The two operators in a line can be combined by addition, subtraction or multiplication, and currently the number of lines in a stanza can be either 1 or 2 and there can be 1, 2, 3 or 4 stanzas. The output of each line is fed into the next, and at the end of each stanza the +c part of the formula happens. There are controls to choose how many times to repeat each stanza, and which stanza to continue from after reaching the end.

Implementing perturbation for this is quite methodical. Start from an operator, with input Z and z being the reference and delta.

Set mutable variables
z (as input)
W := Z + z
B := Z
If absolute x, then
re(z) := diffabs(re(Z), re(z))
re(W) := abs(W)
re(B) := abs(B)
similarly for y and im
If negate x, then
re(z) := -re(z)
etc for W and B
similarly for y and im
Now compute
S = sum_{i=0}^{power-1} W^i B^{power-1 - i}
and return a*z*S

Combining operators into lines may be done by mathr.co.uk/blog/2018-03-12_pe
Combining lines into stanzas can be done by iterating unperturbed Z alongsize perturbed Z, z; only the +C needs high precision.

Show thread

fractal perturbation maths summary 

Rescaling hybrid iterations seems like a big challenge, but might not be so hard after all (I've not implemented it yet but this is the algorithm/maths that I've worked out so far):

if either or both the real and imaginary parts of the reference orbit Z are small, one needs to do a full range iteration with floatexp and recalculate the scale factor afterwards, as with refgular formulas like Burning Ship.

otherwise, thread s through from the top level down to the operators.

initialize with
W := Z + z*s
if absolute x
re(z) := diffabs(re(Z/s), re(z))
similarly for aboslute y with im

When combining operators (this subterm only occurs with multiplication) replace
f(op1, Z + z) with op1(Z + z*s)

and that's all the changes that need to be made as far as I can tell!

I thought it'd be much harder.

Let's see what goes wrong when I try to implement it tomorrow...

Show thread

fractal perturbation maths summary 

For distance estimation of hybrid formulas I use dual numbers for automatic differentiation.

One small adjustment was needed for it to work with rescaled iterations: instead of initializing the dual parts (before iteration) with 1 and scaling by the pixel spacing at the end, initialize the dual parts with the pixel spacing and don't scale at the end. This avoids overflow of the derivative, and the same rescaling factor can be used for regular and dual parts.

I did try something else with rescaling the regular and dual parts with different factors, but didn't get it working.

Show thread

fractal perturbation maths summary 

A preliminary benchmark at one location (zoom 10^533) with derivatives:

scaled double: 3m50s
long double: 11m51s

So scaled double is a big win.

As it's just a Mandelbrot set (just configured with the hybrid formula editor) it can also be rendered with a hardcoded algorithm:

scaled double: 18.5s
long double: 54.1s

So the hardcoded algorithm is very much faster (it can use complex differentiability instead of 2x2 Jacobian, plus no branches or loops in the iteration step.

Complex differentiability also means that series approximation works, which speeds it up a bunch even more:

scaled double: 10.4s
long double: 19.4s

So altogether the hybrid is 24x slower than the built in thing for the same image. I think OpenCL might narrow the gap, so I'll implement that next.

Show thread

fractal perturbation maths summary 

Earlier I implemented rescaled iterations for hybrids with and without derivatives for OpenCL.

Meanwhile I discovered series approximation breaks if the reference orbit is stored in single precision; but later rounding the coefficients calculated with a double precision orbit to single precision seems to work ok. Yesterday I hacked the reference storage abstraction to store double in addition to single if single is requested.

Today I added detailed timing information collection. It reports wall clock time and CPU time for reference calculations, series approximation calculations, and perturbation calculations, as well as a total (which is more than the sum of parts because it include loading/colouring/saving).

Benchmarks on one location were revealing:
- increasing the number of series approximation terms is good for efficiency (lower CPU time) but bad for racing (higher wall clock time)
- series approximation takes a surprisingly large amount of time
- rescaled single precision float hybrids on GPU using OpenCL are competitive speed-wise with built in Mandelbrot on CPU (without OpenCL), even though neither threaded reference calculations nor series approximation can be used.
- hybrids without OpenCL are too slow and inefficient to be used at all

Show thread

fractal perturbation maths summary 

concretely, those benchmarks showed:

6.3s wall, 45.2s CPU for rescaled double with 3 series terms

10.0s wall, 21.1s CPU for rescaled double with 10 series terms

10.5s wall, 6.4s CPU for rescaled float hybrid on GPU with OpenCL

Show thread

fractal perturbation maths summary 

benchmarks for one Burning Ship showed:

rescaled double was fastest on CPU; location was not "deep needle" relative to double's range

extended float was fastest on GPU; location was "deep needle" relative to float's range so rescaled float performed poorly. rescaled double was a close second

rescaled double was fastest on GPU for hybrids

hybrids on CPU are super slow (they are "interpreted", there are many branches on the (constant) formula parameters in the inner loop; when generating OpenCL source code at runtime all these branches can be removed as it is "compiled").

Show thread

fractal perturbation maths summary 

I finally ixed the bug that made OpenCL crash when run in a background thread. Typically, it had nothing to do with OpenCL: when the user interface is wiggled during rendering it sets a "stop" flag to true which the various parts of the renderer inspect often to know their work is no longer needed, but the OpenCL part didn't notice, so after the reference and series approximation calculations had given up the internal state wasn't quite pristine.

The symptoms were crashing on an assertion failure (series approximation iteration count was not less than total number of iterations), but the fix was a few words, essentially "if (stop) return;" at the start of the OpenCL rendering function.

With OpenCL in a background thread, it actually becomes useable for exploration - previously the user interface blocked until rendering was complete (after all references).

Show thread
Sign in to participate in the conversation
post.lurk.org

Welcome to post.lurk.org, an instance for discussions around cultural freedom, experimental, new media art, net and computational culture, and things like that.