Some Questions about VocalTract Speech Synthesis — Parallax Forums

Christof Eb. Posts: 1,195
edited 2024-07-12 08:41 in General Discussion

Hi,
at the moment I am trying to figure out how the VocalTract.spin speech synthesis for the P1 works. I may want to rebuild it for the P2.
The code and a description of it are here:
https://forums.parallax.com/uploads/attachments/41345/62908.zip

(An interesting introductory read: https://msp.ucsd.edu/syllabi/170.13f/course-notes/node5.html )

Chip's model of speech sound:

So we need three sources: 1. aspiration, 2. glottal source with vibrato, and 3. frication.
Aspiration and frication are both essentially white noise.
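Both noise sources only need a broadband random signal. As a minimal sketch, assuming a 32-bit xorshift generator (a hypothetical stand-in, not necessarily the generator VocalTract.spin uses), white noise can be produced with nothing but shifts and XORs, which map directly onto PASM instructions:

```python
MASK32 = 0xFFFFFFFF

def xorshift32(state):
    """One step of Marsaglia's xorshift32; state must be nonzero and stays 32-bit."""
    state ^= (state << 13) & MASK32
    state ^= state >> 17
    state ^= (state << 5) & MASK32
    return state

def white_noise(n, seed=2463534242):
    """Return n noise samples in [-1.0, 1.0) (hypothetical aspiration/frication source)."""
    out = []
    state = seed
    for _ in range(n):
        state = xorshift32(state)
        # Treat the 32-bit state as an offset-binary value centred on zero.
        out.append((state - 0x80000000) / 0x80000000)
    return out

noise = white_noise(10000)
mean = sum(noise) / len(noise)   # expected to be small: no DC component
```

In a synthesizer the same noise stream would then be scaled by separate aspiration and frication amplitudes before entering the filters.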

My first question is about the "Glottal Pulse". As far as I understand from the external link, its waveform must be a pulse train repeated at the fundamental frequency. Its FFT then shows all the harmonics of that frequency, which can be filtered (emphasised) by the formant filters and the nasal filter.
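The difference between a pulse train and a sine can be made concrete with a minimal sketch (plain Python with a naive DFT; the window length and period are arbitrary illustration values, not VocalTract parameters). The pulse train has components of equal magnitude at every multiple of the fundamental, which formant filters can then shape; the sine has energy only at the fundamental, so there would be nothing else for the formants to emphasise:

```python
import cmath
import math

def dft_magnitudes(signal, max_bin):
    """Naive DFT: magnitudes of bins 0..max_bin, normalised by the signal length."""
    n = len(signal)
    mags = []
    for k in range(max_bin + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
        mags.append(abs(acc) / n)
    return mags

N = 256        # samples in the analysis window
PERIOD = 32    # samples per glottal period -> fundamental at bin N // PERIOD = 8

pulse_train = [1.0 if i % PERIOD == 0 else 0.0 for i in range(N)]
sine = [math.sin(2 * math.pi * i / PERIOD) for i in range(N)]

pulse_spec = dft_magnitudes(pulse_train, 40)
sine_spec = dft_magnitudes(sine, 40)

# Pulse train: every multiple of bin 8 has the same magnitude (1 / PERIOD).
# Sine: only bin 8 is nonzero (magnitude 0.5).
for k in (8, 16, 24, 32, 40):
    print(k, pulse_spec[k], sine_spec[k])
```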

What I don't understand is that the code seems to generate a pure sine wave (modulated by vibrato) instead of pulses. Is this true?

(I also don't understand why there are no ret instructions in the source code, but for now I just assume that they are hidden somehow. Edit: Solved. The code unrolls its own loops at run time and then appends ret.)
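The self-unrolling idea can be sketched in plain Python with hypothetical placeholder instruction strings (not the actual VocalTract.spin instructions): the loop body is copied a fixed number of times into a code buffer and a single ret is appended at the end, which is why the ret for the generated routine does not appear next to the loop template in the source. In the real PASM the copies are written into cog RAM by the code itself.

```python
def unroll(body, count):
    """Build a straight-line routine: `count` copies of `body`, then one `ret`.

    `body` is a list of instruction strings (hypothetical placeholders).
    """
    code = []
    for _ in range(count):
        code.extend(body)   # no loop counter or jump needed at run time
    code.append("ret")      # the return is added only after unrolling
    return code

routine = unroll(["add acc,x", "shr x,#1"], 3)
for line in routine:
    print(line)
```

Unrolling trades cog RAM for speed: the generated routine has no loop counter or branch overhead per iteration.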

Thanks for any hints!
Christof

Comments

  • Christof Eb. Posts: 1,195
    edited 2024-07-20 15:56

    I think the answer to my question is that a pure sine wave (frequency-modulated with vibrato) is mixed with aspiration noise, so the formant filters f1...f4 work on this noise. In Phil's talk software there is a whisper mode, which switches off the glottal sine and still produces speech.

    Edit: No, these are pulses, but I don't understand how they are formed.

  •                         add     t1,tune                 'tune scale so that gp=100 produces 110.00Hz (A2)
    
                            call    #antilog                'convert pitch (log frequency) to phase delta t2(t1)
                            add     gphase,t2
    
                            mov     t1,gphase               'convert phase to glottal pulse sample
                            call    #antilog
                            sub     t2,h40000000
    
                            mov     t1,ga
                            call    #sine                   ' t1= ampl, t2= angle, result=t1
    

    @cgracey
    Hi Chip,
    could you explain the three lines under "convert phase to glottal pulse sample"?
    Somehow the phase angle gets distorted, but I do not understand how this works.
    Thank you! Christof
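The first three lines quoted above (add t1,tune / call #antilog / add gphase,t2) form a log-pitch phase accumulator: adding tune in the log domain multiplies the frequency by a constant, antilog converts the log pitch into a linear phase increment, and gphase wraps modulo 2^32, so one wrap is one glottal period. A minimal sketch, assuming a hypothetical antilog of the form 2^(t1/2^27) (one octave per 2^27 log units) and a hypothetical sample rate FS; the real routine uses the P1 ROM antilog table, and both its scaling and the actual sample rate may differ:

```python
TWO32 = 1 << 32
OCTAVE = 1 << 27   # assumed log units per octave (hypothetical scaling)
FS = 20000         # hypothetical sample rate in Hz

def antilog(t1):
    """Hypothetical model of call #antilog: 2**(t1 / OCTAVE), kept to 32 bits."""
    return int(2.0 ** (t1 / OCTAVE)) & (TWO32 - 1)

def phase_delta(log_pitch, tune):
    """add t1,tune / call #antilog: log pitch -> linear phase increment."""
    return antilog(log_pitch + tune)

def frequency_hz(delta):
    """An accumulator wrapping at 2**32 repeats FS * delta / 2**32 times per second."""
    return FS * delta / TWO32

def run_accumulator(delta, samples, gphase=0):
    """Repeat add gphase,t2 and count the wraps, i.e. glottal periods."""
    wraps = 0
    for _ in range(samples):
        gphase += delta
        if gphase >= TWO32:
            gphase -= TWO32
            wraps += 1
    return wraps

d1 = phase_delta(24 * OCTAVE, 0)
d2 = phase_delta(25 * OCTAVE, 0)
print(frequency_hz(d1), frequency_hz(d2))   # second is twice the first
```

With this model, adding OCTAVE to the log pitch doubles the phase increment and therefore the frequency, so a vibrato added in the log domain has the same musical depth at any base pitch.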

  • Christof Eb. Posts: 1,195
    edited 2024-07-22 09:22

    Did some measurements. The sine of the distorted phase seems to generate a single positive asymmetric pulse per cycle, which rises slowly and then falls fast. This works because the logarithms and the angle (2*pi for a full turn) are both positive numbers scaled to 0...$FFFF.FFFF. The offset $4000.0000 in the third line selects which part of the sine is traversed slowly.
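The measurement can be reproduced with a small model of the three lines. This is a minimal sketch, assuming antilog behaves like 2^(t1/2^27), which maps the full 32-bit log range onto the full 32-bit linear range (a hypothetical scaling; the real routine uses the P1 ROM antilog table), and that sine returns ga*sin(2π·t2/2^32). The exponential keeps t2 small for most of the period and sweeps it through a whole turn near the end; after subtracting $4000.0000 the slow part sits at the flat bottom of the sine, so the output stays near -ga and then produces one positive-going pulse that rises over more samples than it falls.

```python
import math

TWO32 = 1 << 32

def antilog(t1):
    """Hypothetical antilog: 2**(t1 / 2**27), mapping 0..$FFFF_FFFF to 1..~$FFFF_FFFF."""
    return int(2.0 ** (t1 / (1 << 27))) & (TWO32 - 1)

def sine(amplitude, angle):
    """Model of call #sine: amplitude * sin(angle), with 2**32 = one full turn."""
    return amplitude * math.sin(2 * math.pi * angle / TWO32)

def glottal_sample(gphase, ga=1.0):
    t2 = antilog(gphase)               # mov t1,gphase / call #antilog
    t2 = (t2 - 0x40000000) % TWO32     # sub t2,h40000000 (quarter turn, wraps)
    return sine(ga, t2)                # mov t1,ga / call #sine

def one_period(n, ga=1.0):
    """Sample one glottal period with gphase stepping linearly through 2**32."""
    step = TWO32 // n
    return [glottal_sample(i * step, ga) for i in range(n)]

wave = one_period(1024)
peak = max(range(len(wave)), key=wave.__getitem__)
above = [i for i, s in enumerate(wave) if s > 0]
print("peak at sample", peak)
print("rise:", peak - above[0], "samples, fall:", above[-1] - peak, "samples")
```

Changing the subtracted offset rotates which part of the sine lands on the slow section of the warped phase, which changes the pulse shape.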
