Some Questions about VocalTract Speech Synthesis
Hi,
at the moment I am trying to figure out how the VocalTract.spin speech synthesis for the P1 works. I may want to rebuild it for the P2.
The code and a description of it are here:
https://forums.parallax.com/uploads/attachments/41345/62908.zip
(This is an interesting basic read: https://msp.ucsd.edu/syllabi/170.13f/course-notes/node5.html )
Chip's model of speech sound:
So we need three sources: 1. aspiration, 2. glottal with vibrato, and 3. frication.
Aspiration and frication are some sort of white noise.
My first question is about the "Glottal Pulse". As far as I understand from the external link, its waveform must be a pulse train repeated at the base frequency. Its FFT will show all the harmonic frequencies, which can then be filtered (emphasised) by the formant filters and the nasal filter.
What I don't understand is that the code seems to generate a pure sine wave (modulated by vibrato) instead of pulses. Is this true?
(I also don't understand why there are no ret instructions in the source code, but for now I just assume that they are hidden somehow. - Edit: Solved. The code unrolls its own loops and then appends ret.)
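To make the difference concrete, here is a small sketch (plain Python, not taken from VocalTract.spin) that compares the harmonic content of an ideal pulse train with that of a pure sine at the same base frequency. The names PERIOD, CYCLES and dft_magnitude are only chosen for this illustration.

```python
import cmath
import math

def dft_magnitude(signal, k):
    """Magnitude of DFT bin k of a real signal (naive sum, fine for short signals)."""
    n = len(signal)
    return abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                   for i, x in enumerate(signal)))

PERIOD = 32            # samples per glottal cycle (illustrative value)
CYCLES = 8             # number of cycles in the analysed block
N = PERIOD * CYCLES

# One unit impulse per cycle versus a pure sine with the same period.
pulse_train = [1.0 if i % PERIOD == 0 else 0.0 for i in range(N)]
sine = [math.sin(2 * math.pi * i / PERIOD) for i in range(N)]

def harmonic_magnitudes(signal, count=4):
    """Magnitudes at the fundamental and the next harmonics (bins h * CYCLES)."""
    return [dft_magnitude(signal, h * CYCLES) for h in range(1, count + 1)]

pulse_harmonics = harmonic_magnitudes(pulse_train)
sine_harmonics = harmonic_magnitudes(sine)

print("pulse train:", [round(m, 3) for m in pulse_harmonics])
print("pure sine:  ", [round(m, 3) for m in sine_harmonics])
```

The pulse train has equal energy at every harmonic, so the formant filters have something to emphasise. The pure sine has energy only at the fundamental, which is why a plain sine source would not be enough on its own.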
Thanks for some hints!
Christof
Comments
I think the answer to my question is that a pure sine wave (frequency-modulated with vibrato) is mixed with aspiration noise. The formant filters f1...f4 thus work on this noise. In Phil's talk software there is a whisper mode, which switches off the glottal sine and still produces speech.
Edit: No, these are pulses, but I don't understand them.
@cgracey
Hi Chip,
would you like to explain the 3 lines under "convert phase to glottal pulse sample"?
Somehow the phase angle gets distorted, but I do not understand how this works.
Thank you! Christof
I did some measurements. The sine of the distorted phase seems to generate a single positive asymmetric pulse per cycle, which rises slowly and then falls fast. This works because logarithms and 2*pi are both positive numbers scaled to 0...$FFFF.FFFF. The offset $4000.0000 in the third line selects the part of the sine that is slowed.
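The general idea of "sine of a distorted phase" can be sketched in floating point. This is not Chip's code: the real routine works on 32-bit phases with a logarithm, while this sketch assumes a simple power-law warp (the warp parameter and the function name glottal_pulse are made up for illustration). It only shows how warping the phase before the sine lookup gives one positive pulse per cycle with a slow rise and a fast fall.

```python
import math

def glottal_pulse(phase, warp=2.0):
    """Map a phase in [0, 1) to a pulse sample in [0, 1].

    Assumption: a power-law warp stands in for the log-based distortion
    in the original code. The warped phase advances slowly at first and
    quickly near the end of the cycle.
    """
    warped = phase ** warp
    # Raised cosine: 0 at both ends of the cycle, 1 where warped == 0.5.
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * warped))

STEPS = 1000
samples = [glottal_pulse(i / STEPS) for i in range(STEPS)]

# Position of the peak within the cycle, as a fraction of the period.
peak_index = max(range(STEPS), key=samples.__getitem__)
rise_fraction = peak_index / STEPS

print("rise takes", rise_fraction, "of the cycle, fall takes", 1 - rise_fraction)
```

With warp=2.0 the peak sits at roughly 71% of the cycle, so the rise is more than twice as long as the fall. A larger warp value makes the asymmetry stronger; warp=1.0 gives a symmetric raised cosine.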