Speech Generator — Parallax Forums

Speech Generator

Dave Hein Posts: 6,347
edited 2011-05-30 00:21 in Propeller 1
I would like to convert text to speech on the Prop, and I'm trying to find out what's already been done. I do know about the VocalTract object in the OBEX. I have also tried Phil's text-to-speech demo that he did a few years ago. This stuff is amazing, but I find it hard to understand unless I read the text as I hear the audio. Has any more work been done on this to improve the quality?

Comments

  • Granz Posts: 179
    edited 2011-04-11 14:47
    Depending on your needs, you may want to consider just recording your voice saying individual words or phrases, and then playing them back in sequence. That tends to provide much better clarity, but it does limit your vocabulary. If you only need a limited amount of speech (for example voice alerts, prompts, cheering or jeering in a game, or what-not), then that may be the way to go.

    This will not be very useful for things like reading daily news articles, an audio Bible, or a book, though. For those types of things, you will need to go a different route. Possibly you can get a useful phoneme set using the above method and string phonemes together in the same way. That would also require a text-to-speech routine, and you will always need exceptions for things like personal names, foreign words, etc.
  • Dave Hein Posts: 6,347
    edited 2011-04-11 16:13
    I've thought about recording the phonemes and stringing them together to make words like you suggested. There are around 44 phonemes, and if I sample them at 8 kHz I figure I would need about 35 kbytes to store them. I should be able to compress them by at least two-to-one with a simple DPCM coding algorithm. I might be able to get more compression on some of the phonemes that contain a lot of redundancy.
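A rough check of the numbers above, assuming 8-bit samples (the post does not state the sample width): 35 kbytes spread over 44 phonemes is about 800 bytes each, or roughly 0.1 s per phoneme at 8 kHz. Coding each sample as a 4-bit difference would give the two-to-one reduction mentioned. The sketch below is one minimal fixed-step DPCM coder in Python, not the poster's code; `STEP`, the function names, and the 4-bit code width are illustrative assumptions.

```python
STEP = 4  # hypothetical quantizer step size (assumption, not from the thread)


def dpcm_encode(samples, step=STEP):
    """Encode unsigned 8-bit samples as signed 4-bit difference codes (-8..7)."""
    codes = []
    predicted = 128  # both encoder and decoder start at mid-scale
    for s in samples:
        diff = s - predicted
        code = max(-8, min(7, round(diff / step)))
        codes.append(code)
        # Track the decoder's reconstruction so quantization errors do not accumulate.
        predicted = max(0, min(255, predicted + code * step))
    return codes


def dpcm_decode(codes, step=STEP):
    """Rebuild 8-bit samples from the 4-bit difference codes."""
    out = []
    predicted = 128
    for code in codes:
        predicted = max(0, min(255, predicted + code * step))
        out.append(predicted)
    return out


def pack_nibbles(codes):
    """Pack two 4-bit codes per byte (two's-complement nibbles)."""
    padded = list(codes) + [0] * (len(codes) % 2)
    return bytes(((a & 0xF) << 4) | (b & 0xF)
                 for a, b in zip(padded[0::2], padded[1::2]))
```

A fixed step this small cannot follow fast transients (slope overload); an adaptive step size, as used in ADPCM coders, is the usual remedy.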
  • AntoineDoinel Posts: 312
    edited 2011-04-11 16:37
    Dave

    I downloaded this archive but cannot remember the place... don't know if you already have seen it.

    I just had time to quickly peek at the sources, and maybe it would be useable with Phil's phoneme-to-vocaltract if the phoneme representation, which is different, could be matched between the two.

    Licence is Public Domain, so I think it's ok to re-upload it.

    Alessandro
  • HollyMinkowski Posts: 1,398
    edited 2011-04-11 18:43
    Dave Hein wrote:
    I've thought about recording the phonemes and stringing them together to make words like you suggested.

    I actually tried this a couple of years ago....the resulting 'voice' was exceptionally
    creepy sounding. Everyone kept playing with it because it was so hilarious :-)

    It was like a cross between a jibbering chinaman and someone possessed by satan. LoL
  • Dave Hein Posts: 6,347
    edited 2011-04-12 08:36
    @Alessandro, I compiled the english2phoneme code on a Linux box, and it seems to work very well. It should be easy to translate between the phoneme codes it uses and Phil's phoneme codes. I think I will change it to use the phoneme representation that is used by dictionaries when showing a phonetic spelling. My previous sentence currently comes out looking like

    AY THIHNGk AY wIHl CHEYnj IHt tUW yUWz DHAX fOWnIYm rEHprIYzEHntEYSHAXn DHAEt IHz yUWzd bAY dIHkSHAXnEHrIYs WHEHn SHOWIHNG EY fOWnEHtIHk spEHlIHNG

    and I'll probably change it to generate

    ahy thingk ahy wil cheynj it too yooz thuh foh-neem rep-ri-zen-tay-shuhn that iz yoozd bahy dik-shuh-ner-eez hwen shoh-ing uh fo-ne-tik spel-ing

    It's a bit more readable to me, and easier to remember.

    @Holly, your comment makes me more interested in trying the recorded-phoneme method. It sounds like it could be quite entertaining. There is a tendency for synthesized stuff to come across as creepy if it is close to being natural, but not quite there. This is evident in the CGI material that is used in movies, where it almost looks real, and ends up looking odd instead.
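The translation between the two phoneme notations quoted in this post amounts largely to a table lookup. The following Python sketch infers a table only from the example sentence above; the `RESPELL` table and the function names are hypothetical, and this is not the english2phoneme code. It adds no syllable hyphens and does not reproduce the word-level vowel changes visible in the hand-written target line.

```python
# Hypothetical table: two-letter uppercase codes -> dictionary-style respelling.
# Entries are inferred from the single example sentence in the post (assumption).
RESPELL = {
    "AY": "ahy", "IH": "i", "NG": "ng", "TH": "th", "CH": "ch",
    "EY": "ey", "UW": "oo", "DH": "th", "AX": "uh", "OW": "oh",
    "IY": "ee", "EH": "e", "SH": "sh", "AE": "a", "WH": "hw",
}


def respell_word(word):
    """Convert one word; lowercase letters are single consonants and pass through."""
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in RESPELL:
            out.append(RESPELL[pair])
            i += 2
        else:
            # Unknown code or plain consonant: keep the letter, lowercased.
            out.append(word[i].lower())
            i += 1
    return "".join(out)


def respell(text):
    """Convert a space-separated phoneme string word by word."""
    return " ".join(respell_word(w) for w in text.split())
```

For example, `respell("AY THIHNGk AY wIHl CHEYnj IHt")` gives `ahy thingk ahy wil cheynj it`, matching the start of the target line.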
  • HollyMinkowski Posts: 1,398
    edited 2011-04-12 08:56
    Dave, the way I did it was to record and edit the small audio snippets using
    audio editing software. Then I made a simple app for Windows that could
    play sequences of them to make words...it really was funny :-)

    I recommend working with a PC app to get the phonemes right...it would be
    rough doing it on a uC.
  • StefanL38 Posts: 2,292
    edited 2011-04-12 10:22
    Does anybody remember SAM for the C64? It's 20 years ago that I tested it, so my acoustical memory of it has half faded away. But if I remember it right, it was medium quality.
    As the C64 had only 64 kB, I guess the RAM needed was quite low. Does anybody remember how many kB?

    Here's a video of the demo file: http://www.youtube.com/watch?v=Rm4ZCGgzeeU

    I just found these files for downloading: http://the-cbm-files.tripod.com/speak/

    sam64.zip is 28 kB when extracted.

    So there might be a chance to re-use the SAM phonemic data with SID-Cog. But I have no idea how much work this will be.

    best regards

    Stefan
  • Leon Posts: 7,620
    edited 2011-04-12 10:31
    Here is phoneme-based text to speech for the dsPIC:

    http://www.omniboard.be/projects/kit7/kit7.htm

    Quality is about the same as from the speech synthesis chips from 25 years ago.

    Something like it could perhaps be implemented on the Propeller.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2011-04-12 10:49
    I think Chip's Vocal Tract object is more than robust enough to make the necessary sounds. It's my phoneme synth object feeding it that needs the work. What would really be handy is a program that takes a waveform as input and cranks out the vocal tract parameter stream that can produce it. This is simply an optimization problem with an easy-to-quantify utility metric. It might even be grist for a genetic algorithm.

    -Phil
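The optimization idea above can be illustrated on a toy problem. The Python sketch below is not the Vocal Tract object and uses none of its parameters: `toy_spectrum` is a made-up two-peak stand-in for a synthesizer, the error measure is one possible utility metric, and the search is a mutation-only evolutionary loop, which is simpler than a full genetic algorithm with crossover. All names and constants are assumptions.

```python
import random

PROBES = range(100, 4000, 100)  # probe frequencies in Hz (arbitrary choice)


def toy_spectrum(params):
    """Toy stand-in for a synth: two resonance peaks at f1 and f2 (Hz)."""
    f1, f2 = params
    bw = 100.0  # fixed peak width, purely illustrative
    return [1.0 / (1.0 + ((f - f1) / bw) ** 2) +
            1.0 / (1.0 + ((f - f2) / bw) ** 2) for f in PROBES]


def error(params, target):
    """Utility metric: squared difference between candidate and target spectra."""
    return sum((a - b) ** 2 for a, b in zip(toy_spectrum(params), target))


def evolve(target, start=(1000.0, 2000.0), generations=200, children=20, seed=0):
    """Mutate the best candidate and keep a child only if it lowers the error."""
    rng = random.Random(seed)
    best, best_err = tuple(start), error(start, target)
    for _ in range(generations):
        for _ in range(children):
            child = tuple(p + rng.gauss(0.0, 50.0) for p in best)
            child_err = error(child, target)
            if child_err < best_err:
                best, best_err = child, child_err
    return best, best_err
```

Calling `evolve(toy_spectrum((700.0, 1200.0)))` searches for the two peak frequencies from the fixed starting guess. A real version would have to compare synthesized audio against a recorded waveform frame by frame, which is a much larger search.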
  • JT Cook Posts: 487
    edited 2011-04-12 16:21
    There is a Propeller text-to-speech object that I slapped a front end on for the C3. Though this was written for the C3, there isn't anything C3-specific in it, so you should be able to run it on any Prop setup with keyboard, TV, and audio out.

    ftp://ftp.propeller-chip.com/PropC3/Games/Jay_T_Cook/jtc_prop-talker_c3_v001/
  • Dave Hein Posts: 6,347
    edited 2011-04-13 15:14
    I've played around with Phil's program a bit more, and it performs well on most of the vowels and some of the consonants. Some consonants work well on the leading part of a syllable, but have problems on the trailing part. Other consonants are hard to understand. As Phil said, it just needs a bit of work. Once I understand how the sounds are generated, I'll try to tweak some of them to see if I can improve it a bit.
  • rogersyd Posts: 223
    edited 2011-04-14 15:48
    JT Cook, that is one fine piece of software. Nice job coding that one. Works great on my ppdb. Thank you for sharing.
  • Jack Buffington Posts: 115
    edited 2011-05-29 21:42
    Does anyone know of a good program where you can look at a spectrograph (is that the right word?) of speech samples so that I can work out the formants myself? I'm trying to make my own version of a text-to-speech program and would like to take a very close look at what my own voice is doing and what the output of the Propeller is doing so that I can model my voice more accurately. So far I have taken a look at Audacity, which has a decent spectrogram for figuring out sounds that use s, sh, th, f, etc., but doesn't provide much information about the formants because it has too few bins in the low frequencies. I have also found this: http://www.zelscope.com/ which gives a reasonably good live spectrogram but doesn't graph it out to figure out pacing. I like that I can zoom to the frequency range that I want, though. This one is pretty close to what I want but doesn't record and doesn't show high frequencies: http://wonderfl.net/c/bxCT All of them are pretty close to what I want but not quite right for careful study.
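On the "too few bins" point: in an FFT-based spectrogram the bin spacing is the sample rate divided by the window length, so, for example, a 512-point window at 44.1 kHz gives bins about 86 Hz apart, while a longer window gives finer low-frequency detail at the cost of time resolution. The Python sketch below is a slow, pure-Python illustration of zooming into a band by probing arbitrary frequencies directly; the function names and the Hann window choice are assumptions, and it does not describe any of the tools named above.

```python
import cmath
import math


def bin_width_hz(sample_rate, window_len):
    """Spacing between FFT bins for a given window length, in Hz."""
    return sample_rate / window_len


def band_magnitudes(samples, sample_rate, f_lo, f_hi, step_hz):
    """Return (frequency, magnitude) pairs for a zoomed band via direct DFT probes."""
    n = len(samples)
    # A Hann window reduces leakage from the edges of the analysis frame.
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(samples)]
    result = []
    f = f_lo
    while f <= f_hi:
        w = -2j * math.pi * f / sample_rate
        acc = sum(x * cmath.exp(w * i) for i, x in enumerate(windowed))
        result.append((f, abs(acc)))
        f += step_hz
    return result
```

Probing 100 ms frames of a vowel between roughly 200 Hz and 3500 Hz in 10 Hz steps would give a zoomed view of the range where the first formants usually sit; the frame length still limits how sharply nearby peaks can be separated.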
  • Humanoido Posts: 5,770
    edited 2011-05-30 00:21
    A machine is not human and I would not want recorded human phonemes inside it. I prefer a machine-sounding voice that's understandable. The Propeller chip does a great job. As Dave mentioned, the code will work even better with some fine tuning.