Slowly dipping my toes into Perry Cook's dissertation on singing synthesis:

@paul I did some research into the current state of the art in singing speech synthesis a few months back when I discovered that Vocaloid is "just" a diphone synthesizer with some post-processing. I even wrote some code, though it doesn't do anything useful yet. You can condition WaveNets on intermediate parameters instead of text, so my thinking was to make it so that I could alter the pitch and any other parameters before actual synthesis rather than post-processing the waveform.

@freakazoid if there is anything that's public, I'd love to see it!

PRC uses physically based techniques for singing synthesis, which still seems quite novel even today. I'm thinking it's due for a renaissance.

@paul I looked at physically-based synthesis at first, because I think it's really neat. I had been thinking perhaps one could play a human-sounding voice like a synthesizer, and in fact it turns out someone built such an instrument, though I'm blanking on what it was called. They may have just called it a vocoder. Physically-based synthesizers are well behind the state of the art at this point, but perhaps they're ripe for advancement. Or they may be good if you don't want too much realism!

@paul I put the code at . I didn't write any documentation, but maybe you'll find it useful. I'm happy to answer any questions about it, though I'll probably need to refresh my own memory a bit.

@paul One thing I was working on was segmentation of speech samples from LibriTTS (I started with LibriVox not realizing LibriTTS existed) for training WaveNets. I was also playing around with formant synthesis and pitch extraction. The ipython notebooks may be most useful.
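
For readers who haven't met pitch extraction before, here is a minimal sketch of one common approach, autocorrelation over a single voiced frame. It is not the code from the notebooks mentioned above; the function name `estimate_pitch`, its `fmin`/`fmax` search range, and the synthetic test frame are all assumptions made for illustration.

```python
import numpy as np

def estimate_pitch(signal, sample_rate, fmin=60.0, fmax=500.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame.

    Hypothetical helper: picks the autocorrelation peak inside the lag
    range that corresponds to fmin..fmax.
    """
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()  # remove DC offset before correlating
    # Full autocorrelation; keep only the non-negative lags.
    corr = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lag_min = int(sample_rate / fmax)                      # shortest period searched
    lag_max = min(int(sample_rate / fmin), len(corr) - 1)  # longest period searched
    lag = lag_min + int(np.argmax(corr[lag_min:lag_max + 1]))
    return sample_rate / lag

# Synthetic "voiced" frame: a 220 Hz fundamental plus one harmonic (assumed test data).
sample_rate = 16000
t = np.arange(2048) / sample_rate
frame = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)
f0 = estimate_pitch(frame, sample_rate)
print(f"estimated f0: {f0:.1f} Hz")
```

Because the lag is an integer number of samples, the estimate lands near 220 Hz rather than exactly on it. Real recordings would also need a voiced/unvoiced decision and some smoothing across frames, which this sketch leaves out.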

@freakazoid thanks for the links and explanations.

Curious to know what your end goal is for your project. Is it going to be a singing synthesizer with a TTS interface?

@paul I would be fine inputting phonemes as well, and that would give finer control anyway. Right now the project’s on hold until I have some more mental bandwidth, which seems unlikely before my stuff arrives in Pittsburgh from California. Just too much chaos. There’s also the possibility that I’ll start feeling more motivated once the kids are in school.


"it turns out someone built such an instrument, though I'm blanking on what it was called."

The Voder:

wanders off muttering something about Darth Voder
