
Chapter 8: Vocal Synthesis

So this chapter is a highly requested one, and it's something I researched a lot while I waited for DSN to be released. It's definitely more about linguistics than music, but it can be used in musical ways. If you've ever used the homebrew Game Boy synth LSDJ, you may have come across the vocal samples built into the recent updates. LSDJ has 42 "words" it can make. If you're interested, check out the list here, on page 57, but it's not necessary for us. There is also a popular YouTube video of somebody using vocal synthesis on DS-10; see it here. From what I can tell, he's only using DS-10 to make the vocals and not the rest of the song's sounds. His patch screen looks a bit different from what we will set up. Here is his, don't freak out:


Here is what recreating that patch would look like:


We don't need to do all that, though. Let's use "Disco" as our guide.

My interest was definitely piqued when I saw the "Disco" promotional video for DSN, in which the song ends with the synth speaking its name. The demo song comes built into DSN; check it out to hear some great examples of what this little app is capable of.

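Formant synthesis is the term usually used for replicating vowel sounds this way. Formants are the resonant peaks that the shape of your mouth and throat imposes on the sound, and each vowel has its own pattern of them. "Like an old Speak & Spell" is the best way to describe the result. I could discuss the different sounds language makes in terms of fricatives and sonorants, but that's a complex subject I won't get into.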

Let's start with vowels, because they are the clearest to hear and the easiest to replicate. Try making these sounds with your mouth (with nobody listening, so you don't look crazy). Keep your tongue pressed down and, in one continuous breath, say "ah, eh, ee, oo, uh". Notice how your mouth stays open and you are using how open your mouth is to filter the sound. Your vocal cords are the oscillator. Diphthongs like ai and ey are actually two different sounds next to each other, in this case ah/ee for ai and eh/ee for ey. They just sound like one sound because of how quickly you change your mouth filter.
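If you want to poke at the same source-and-filter idea away from the DS, here is a minimal Python sketch of formant synthesis. It rests on my own assumptions, not anything from DSN: the formant numbers are approximate textbook averages for adult male voices (after Peterson and Barney's classic vowel measurements), and the helper names `FORMANTS`, `diphthong`, `resonator` and `vowel` are made up for illustration. A triangle "vocal cord" runs through two band-pass resonators, one per formant. DSN only gives us one CUTOFF, so our patch is a rougher, single-peak version of this. The `diphthong` helper shows "ai" as a glide from ah to ee.

```python
import math

# Approximate average formants (F1, F2 in Hz) for adult male speakers,
# after Peterson & Barney (1952). Real voices vary a lot; starting points only.
# Hypothetical helper names throughout; nothing here is part of DSN.
FORMANTS = {
    "ah": (730, 1090),
    "eh": (530, 1840),
    "ee": (270, 2290),
    "oo": (300, 870),
    "uh": (640, 1190),
}


def diphthong(start, end, steps):
    """Glide F1/F2 linearly from one vowel to another over `steps` frames (steps >= 2)."""
    (a1, a2), (b1, b2) = FORMANTS[start], FORMANTS[end]
    frames = []
    for i in range(steps):
        t = i / (steps - 1)
        frames.append((round(a1 + (b1 - a1) * t), round(a2 + (b2 - a2) * t)))
    return frames


def triangle(freq, seconds, rate):
    """The 'vocal cords': a naive (non-band-limited) triangle oscillator in -1..1."""
    n = int(seconds * rate)
    return [4 * abs((i * freq / rate) % 1.0 - 0.5) - 1 for i in range(n)]


def resonator(samples, center, q, rate):
    """One formant: an RBJ-cookbook biquad band-pass with 0 dB peak gain."""
    w0 = 2 * math.pi * center / rate
    alpha = math.sin(w0) / (2 * q)
    b0, b2 = alpha, -alpha  # b1 is 0 for this band-pass
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = (b0 * x + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def vowel(name, pitch=110, seconds=0.25, rate=16000):
    """Source-filter vowel: triangle source through two parallel formant resonators."""
    src = triangle(pitch, seconds, rate)
    f1, f2 = FORMANTS[name]
    low = resonator(src, f1, 8, rate)
    high = resonator(src, f2, 12, rate)
    return [a + 0.5 * b for a, b in zip(low, high)]


print(diphthong("ah", "ee", 3))  # [(730, 1090), (500, 1690), (270, 2290)]
```

Moving from `FORMANTS["ah"]` to `FORMANTS["ee"]` while a note plays is basically what we do on DSN by turning CUTOFF, just with two peaks instead of one.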

Use the following patch. I noticed we get the best results with VCO2 set to triangle or square:




Now all we need to do is adjust the CUTOFF on the SYNTH page. Notice that only a small range makes the vowels: if you open or close the filter too much, it no longer sounds like a voice. In fact, if I set CUTOFF to change with the KX sequencer, these are the only positions that have usable vowels:



There are actually good vowel sounds between those steps that you can get by setting the CUTOFF knob manually; I just wanted to show you the limits here. Playing the sequence at a slow tempo will give you a good idea of what to listen for. If you set the CUTOFF knob manually to a spot between these positions and leave some steps in the KX sequence blank, the filter jumps back to your manually set spot on every step where the KX has no value.
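Here is that fallback behavior as a tiny Python model, in case it helps to see it spelled out. The 0 to 127 values are a hypothetical scale I picked for illustration (I don't know DSN's internal numbers), `effective_cutoff` is a made-up name, and `None` stands for a blank KX step.

```python
def effective_cutoff(knob, kx_steps):
    """CUTOFF heard on each step: the KX value if the step has one, else the knob.

    `knob` and the KX values use a hypothetical 0-127 scale; None = blank KX step.
    """
    return [knob if value is None else value for value in kx_steps]


# Knob parked between two vowel positions, KX filling in only some steps.
print(effective_cutoff(52, [40, None, 64, None]))  # prints [40, 52, 64, 52]
```

So the blank steps all land on the in-between vowel you dialed in by hand, which gives you more vowels than the KX positions alone.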

What I like to call the spitting consonants are made in a completely different way. Try making the following sounds with your mouth: "Kuh, Fuh, Huh, Puh, Suh, Tuh, SHuh, CHuh". Try it without the vowel sound at the end. Do you hear it? In addition to making you sound dismissively angry, these sounds are a lot like the hiss of the VCO1 noise wave, with different envelopes and filtering. Good, we already know how to change those. Let's do this in the same setup we used before, with VCO2 patched to VCA. You can have KY switch VCO1 to noise by setting KY to VCO1 WAVE and turning it to max in the part of the word that is noise. This works well for S sounds, at least. Setting GATE to 25% makes a sharper, T-like sound. As far as I can tell, telling S apart from SH comes down to a subtle change in CUTOFF.
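To get a feel for how short a 25% GATE really is, here is some quick arithmetic in Python. It assumes one step is a sixteenth note (four steps per beat) and that GATE is a percentage of one step's length; both are my assumptions, not something confirmed from DSN's documentation. It also sketches a KY lane that maxes out VCO1 WAVE on the hissy steps; the 127 maximum, the `NOISY` set and all the function names are hypothetical.

```python
# Assumptions (not from DSN docs): one step = a sixteenth note, GATE is a
# percentage of one step, and KY's top value (127 here) selects the noise wave.
KY_MAX = 127
NOISY = {"k", "f", "h", "p", "s", "t", "sh", "ch"}  # the "spitting" consonants


def step_ms(bpm, steps_per_beat=4):
    """Length of one sequencer step in milliseconds."""
    return 60000 / bpm / steps_per_beat


def gate_ms(bpm, gate_percent, steps_per_beat=4):
    """How long the note (and so the noise burst) actually sounds."""
    return step_ms(bpm, steps_per_beat) * gate_percent / 100


def ky_wave_lane(sounds):
    """KY lane for VCO1 WAVE: max (noise) on hissy consonants, blank elsewhere."""
    return [KY_MAX if s in NOISY else None for s in sounds]


print(gate_ms(120, 25))                # 31.25 ms: a short burst, more T-like
print(gate_ms(120, 100))               # 125.0 ms: a full step of hiss, more S-like
print(ky_wave_lane(["s", "ah", "t"]))  # [127, None, 127]
```

At 120 BPM under these assumptions, the 25% gate leaves barely 30 ms of hiss, which is why it reads as a T rather than an S.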

Kuh was a tough one, but I think I got it. The trick is to set GATE to 50% on the step that K is on and set KX to VCF EG INT. Set up your KX so that its value on that step sits six positions down, like this:



Other consonants sound more like waves: Buh, Duh, Guh (like the g in gum), Juh, Luh, Muh, Nuh, Ruh, Vuh, Wuh. You also don't spit these sounds out like we did with the first consonants. These last consonants are the most difficult part to synthesize, and without a vowel it's hard to tell them apart; they just sound like another synthy sound. Try saying them without a vowel sound at the end. The sound just stops. We make these sounds by using the filter EG INT on the vowel that comes after, reducing ATTACK, and sending the EG to VCO2 PITCH IN. We can't separate these consonants from the vowels, but we really don't need to. Here is DEE:
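The reason this works is that the envelope gives the start of the vowel a quick sweep in both pitch and filter, which your ear hears as a consonant onset. Here is a rough Python model of that idea, assuming a simple linear attack/decay envelope. DSN's real EG shape and units will differ; the semitone and cutoff amounts are hypothetical stand-ins for PITCH IN and VCF EG INT, and `ad_envelope` and `onset` are made-up names.

```python
def ad_envelope(t, attack, decay):
    """Linear attack/decay envelope in 0..1, evaluated t seconds after the note starts."""
    if t < 0:
        return 0.0
    if attack > 0 and t < attack:
        return t / attack
    since_peak = t - attack
    if decay <= 0 or since_peak >= decay:
        return 0.0
    return 1.0 - since_peak / decay


def onset(t, pitch_hz, pitch_int_semitones, cutoff, eg_int, attack=0.0, decay=0.05):
    """Pitch and cutoff at time t when one EG drives both VCO2 pitch and the VCF.

    With attack=0 the envelope starts at its peak, so a positive amount makes the
    sound start higher/brighter and fall into the plain vowel as it decays.
    The semitone and cutoff amounts are hypothetical, not DSN values.
    """
    env = ad_envelope(t, attack, decay)
    return pitch_hz * 2 ** (env * pitch_int_semitones / 12), cutoff + eg_int * env


print(onset(0.0, 110, 12, 40, 30))  # (220.0, 70.0): the consonant "kick"
print(onset(0.1, 110, 12, 40, 30))  # (110.0, 40.0): settled into the vowel
```

Reducing ATTACK matters because a slow attack would smear that sweep out until it no longer sounds like a consonant.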



On VCF EG INT, it's really easy to go from DEE to YAY with the smallest tweak, so finding the right points takes some careful movements. I also found a kind of WUH, the first part of the word "what", by messing with ATTACK, VCF EG INT, and the PITCH IN knob. Check it:




You can chain your sounds together into words in several ways. I just COPY the vowel setup and paste both the Sequence and the Tone across the patterns in the same Track row. From there, you can adjust each sound as necessary, but they all start from the same setup.
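The copy-then-tweak workflow maps neatly onto a shared base patch with per-sound overrides. Here is a small Python sketch of that idea; the parameter names, values and `build_word` are all hypothetical stand-ins for what you'd set on DSN's pages, not real DSN identifiers.

```python
# Hypothetical parameter names standing in for the DSN Tone settings.
BASE_TONE = {
    "vco2_wave": "triangle",
    "cutoff": 52,
    "vcf_eg_int": 0,
    "attack": 0,
    "gate": 100,
}


def build_word(sounds):
    """One pattern per sound: copy the shared base tone, then apply that sound's tweaks."""
    return [{**BASE_TONE, **tweaks} for tweaks in sounds]


# "DEE" then "AH": the consonant-vowel gets an EG kick, the plain vowel a new cutoff.
word = build_word([{"vcf_eg_int": 20, "cutoff": 40}, {"cutoff": 64}])
print(word[0]["vcf_eg_int"], word[1]["vcf_eg_int"])  # 20 0
```

Each pattern gets its own copy, so tweaking one sound never disturbs the base setup or its neighbors, which is the same reason pasting the Tone before adjusting works well on the DS.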

I haven't mapped out every sound, but I feel like we can get everything from the setups above. Please leave a comment if you run into any issues or need help with a specific word.


