yesterday I failed at making some drone music, just didn't turn out any good.

today I failed at training a neural network (trying to eventually do deep dream stuff for audio with a music vs speech discriminator - currently testing with sine wave vs white noise), just wouldn't work. possibly I had some matrix or other transposed or my architecture was fatally flawed or something else. the vanishing gradient problem bit me as part of this, output was effectively random as the earlier layers wouldn't get going.

wondering what I can fail at next week...

So far I've implemented the timbre stamp algorithm:

c <- haar(control-input)

n <- haar(noise-input)

e <- calculate-energy-per-octave(c)

o <- amplify-octaves-by(n, e)

output <- unhaar(o)

(operating on windowed overlapped chunks)
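A rough Python sketch of one chunk of the pseudocode above. The post doesn't say how the energy is measured or what happens to the DC term, so the RMS-per-octave measure and the DC handling are my assumptions, and the windowing and overlap-add are left out. Function names are hypothetical.

```python
import math

SQRT_HALF = math.sqrt(0.5)

def haar(x):
    """Orthonormal Haar transform of a power-of-two-length chunk.
    Returns (dc, octaves) with octaves ordered coarsest first."""
    approx = list(x)
    octaves = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        octaves.append([(a - b) * SQRT_HALF for a, b in pairs])
        approx = [(a + b) * SQRT_HALF for a, b in pairs]
    octaves.reverse()
    return approx[0], octaves

def unhaar(dc, octaves):
    """Inverse of haar()."""
    level = [dc]
    for detail in octaves:  # coarsest to finest
        nxt = []
        for a, d in zip(level, detail):
            nxt.append((a + d) * SQRT_HALF)
            nxt.append((a - d) * SQRT_HALF)
        level = nxt
    return level

def energy_per_octave(octaves):
    """Assumption: 'energy' here is the RMS of each octave's coefficients."""
    return [math.sqrt(sum(v * v for v in o) / len(o)) for o in octaves]

def timbre_stamp_chunk(control, noise):
    c_dc, c_oct = haar(control)
    n_dc, n_oct = haar(noise)
    e = energy_per_octave(c_oct)
    o = [[gain * v for v in octave] for gain, octave in zip(e, n_oct)]
    # Assumption: the DC term is treated as one more bin, scaled by |control DC|.
    return unhaar(n_dc * abs(c_dc), o)
```

Each output octave is the noise octave scaled by the control's level in that octave, so a silent control chunk gives a silent output chunk.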

The attached audio uses a segment of The Archers (BBC Radio 4 serial) as control input, with white noise as noise input. The output is normalized afterwards, otherwise it is very quiet (I suspect because the white noise has little energy in the lower octaves to start with).

Starting from the Energy Per Octave Per Rhythm table, I tried synthesizing speech-like noise by applying the template to white noise. But this didn't work at all well as the white noise had no rhythmic content to speak of, so amplifying it didn't do much (0 * gain = 0).

Feeding back the output to the input, so the noise becomes progressively more rhythmic, worked a lot better - takes a couple of minutes to escape from silence, and then there are about 5 sweet minutes until it goes all choppy with very loud peaks separated by silences. I tested with the feedback delay synchronous to the analysis windows, trying a desynchronized delay next.

claude@mathr@post.lurk.org

Switched from Haar wavelets for energy per octave (11 bins) to Discrete Fourier Transform (via the fftw3 library) for energy spectrum (513 bins). Overlap factor 16, raised cosine window.
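A sketch of that analysis step in Python, with NumPy's FFT standing in for fftw3. The window length of 1024 is inferred from the 513 bins (a real-input DFT of N samples gives N/2 + 1 bins), and I'm assuming "overlap factor 16" means a hop of 1024/16 = 64 samples. The function name is hypothetical.

```python
import numpy as np

def energy_spectra(signal, size=1024, overlap=16):
    """Return one energy spectrum (size // 2 + 1 bins) per overlapped frame."""
    signal = np.asarray(signal, dtype=float)
    hop = size // overlap
    # Raised cosine (Hann) window, periodic form.
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(size) / size)
    frames = []
    for start in range(0, len(signal) - size + 1, hop):
        chunk = signal[start:start + size] * window
        spectrum = np.fft.rfft(chunk)          # complex, size // 2 + 1 bins
        frames.append(np.abs(spectrum) ** 2)   # energy per bin
    return np.array(frames)
```

fftw3's halfcomplex format packs the same real and imaginary parts into one real array, so computing the energy there needs its own indexing; see the first reference.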

Enlarged the self-organizing map from 8x8 to 16x16, using Earth-Mover's Distance instead of Euclidean Distance when choosing the best matching unit to update the SOM.
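For 1-D histograms with equal total mass, the EMD reduces to a running sum of the mass that still has to be carried to the next bin (the second reference describes this). A minimal Python sketch of picking the best matching unit that way; normalizing each spectrum to unit mass is my assumption, and the SOM is simplified to a flat list of weight vectors.

```python
def emd_1d(a, b):
    """Earth mover's distance between two equal-mass 1-D histograms,
    with unit distance between adjacent bins."""
    carried = 0.0  # mass that must still be moved past the current bin
    total = 0.0
    for x, y in zip(a, b):
        carried += x - y
        total += abs(carried)
    return total

def normalize(h):
    """Scale a non-negative histogram to unit total mass (assumption)."""
    s = sum(h)
    return [v / s for v in h] if s > 0 else list(h)

def best_matching_unit(som, spectrum):
    """Index of the SOM unit whose weights are closest to spectrum under EMD."""
    target = normalize(spectrum)
    return min(range(len(som)),
               key=lambda i: emd_1d(normalize(som[i]), target))
```

Unlike Euclidean distance, this counts a peak shifted by one bin as closer than a peak shifted by two: `emd_1d([1, 0, 0], [0, 1, 0])` is 1.0 while `emd_1d([1, 0, 0], [0, 0, 1])` is 2.0.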

Initial SOM weights are generated via Cholesky decomposition of the covariance matrix, which gives correlated Gaussian random variates (as before). Using GNU GSL to do the linear algebra and pseudo random number generation.
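The Cholesky trick in a few lines of Python, with NumPy standing in for GSL. The function name is hypothetical, and `mean` and `cov` are placeholders for whatever statistics the weights are drawn from: if cov = L Lᵀ and z is a vector of independent standard normals, then mean + L z has covariance cov.

```python
import numpy as np

def correlated_gaussians(mean, cov, count, rng):
    """Draw `count` Gaussian vectors with the given mean and covariance."""
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    L = np.linalg.cholesky(cov)                  # lower triangular, cov = L @ L.T
    z = rng.standard_normal((count, len(mean)))  # independent N(0, 1) variates
    return mean + z @ L.T                        # each row is mean + L @ z_row
```

For a 16x16 map this would be called with `count=256`, one row per unit. The decomposition only works if the covariance matrix is positive definite.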

Still using 1st-order Markov chain for the resynthesis.
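A minimal sketch of a 1st-order Markov chain in Python. The post doesn't say what the states are; here I assume each analysis frame is labelled with the index of its best matching SOM unit, and the chain is learned over that sequence of indices. Function names are hypothetical.

```python
import random
from collections import defaultdict

def train_transitions(sequence):
    """Count how often each state is followed by each other state."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return counts

def generate(counts, start, length, rng):
    """Random walk of up to `length` states, choosing each next state
    with probability proportional to its observed transition count."""
    state = start
    out = [state]
    for _ in range(length - 1):
        successors = counts.get(state)
        if not successors:
            break  # dead end: this state never had a successor in training
        states = list(successors)
        weights = [successors[s] for s in states]
        state = rng.choices(states, weights=weights)[0]
        out.append(state)
    return out
```

Usage would be something like `generate(train_transitions(labels), labels[0], 1000, random.Random())`, then turning each generated state back into a spectrum for resynthesis.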

Analysis pass takes 16 minutes per hour of input audio, single threaded. Thinking about parallelism as that's a long wait when experimenting.

Synthesis pass is very quick, less than a second per minute of output audio.

Refs:

http://www.fftw.org/fftw3_doc/The-Halfcomplex_002dformat-DFT.html

https://en.wikipedia.org/wiki/Earth_mover's_distance#Computing_the_EMD

https://en.wikipedia.org/wiki/Self-organizing_map#Algorithm

https://en.wikipedia.org/wiki/Cholesky_decomposition#Monte_Carlo_simulation