Does propeller do text-to-speech?
Drachs
Posts: 6
I saw an article in Make talking about the Propeller chip and text-to-speech. Is this something the chip does? If so, where can I get some information?
David
Comments
Which issue had the article? I didn't see it in the ones I have.
Nothing in the abstract really, except an MP3 of the chip in action.
The chip is a pre-programmed PIC chip sold as a product. You send it 9600 baud serial commands and the chip talks. It works, but the speech is rough. Some sounds are great, some are terrible. The commands are allophones, speed, inflection and so on. You build a special string that represents a word and send it via a serial pin. I used an 8-pin LM386 audio amp chip to drive a speaker on my robot.
The hookup to a microcontroller is very easy. Assign a serial TX pin plus 1 or 2 control pins and away you go.
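For anyone wondering what "9600 baud serial" means on the wire, here is a rough sketch in Python. It assumes the common 8N1 framing (one start bit, eight data bits sent LSB first, one stop bit); the post above only states the baud rate, so check the chip's datasheet for the real framing and logic levels. The function names are made up for illustration.

```python
BAUD = 9600
BIT_TIME_US = 1_000_000 / BAUD  # about 104.17 microseconds per bit


def frame_8n1(byte):
    """Return the line levels for one byte, assuming 8N1 framing.

    Order on the wire: start bit (0), 8 data bits LSB first, stop bit (1).
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]


def frame_message(payload):
    """Frame every byte of a command string back to back."""
    bits = []
    for b in payload:
        bits.extend(frame_8n1(b))
    return bits


def transmit_time_ms(payload):
    """Time on the wire: 10 bit periods per byte under 8N1."""
    return len(payload) * 10 * BIT_TIME_US / 1000
```

At ten bit periods per byte, 9600 baud moves at most 960 bytes per second, so a short command string for one word takes only a few milliseconds to send.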
I'm sure it's similar to the one from the previous post. I think Winbond makes the actual text-to-speech chip and then people integrate that into something that accepts serial input.
It is a good read; perhaps the Parallax guys can post it. I do not want to scan and post it myself in case the photocopy-police come to get me.
Anybody else have any experience with anything?
Phonemic Speech Synthesis
http://forums.parallax.com/showthread.php?p=613308
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Beau Schwabe
IC Layout Engineer
Parallax, Inc.
What is this, guys? Is this the code to make the Propeller into a voice synthesizer?
mix that with an affect model and you will really have something... and you can do it in C... and I'm sure the Prop can handle it.
Rich
I have a feeling that good text-to-speech would require much more memory than the Prop has for lookup tables. On the other hand, maybe it could be done using an SD card or extra EEPROMs.
What do you have to know?
Roughly, there are about 24 vowel sounds and 24 consonant sounds. The count varies a bit depending on who does the counting. And in some cases, the sounds are so similar that it is trivial to choose between them - like the two 'th' sounds.
In other cases, consonant blends such as 'bl' and 'fl' add to the inventory.
Once you get the hang of it, it has far more control than 'text-to-speech'. You can even make laughter, a howl or a yodel. Maybe your chip can sound like a chimpanzee.
Incidentally, British English requires more phonemes, so American English is likely the easiest start. But please realize that English does allow you to import words from other languages without changes in their sound. That is precisely why text-to-speech never gets it right.
Try 'bon voyage', 'piña colada', 'kreplach', or some other such imported term in your text-to-speech chip.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
"Everything in the world is purchased by labour; and our passions are the only causes of labor." -- David Hume (1711-76)
Post Edited (Kramer) : 10/4/2007 1:36:19 PM GMT
If you can get phoneme-to-speech working it's just another layer to add on to get text-to-phoneme. Whether that can be shoe-horned into a Propeller I don't know. Whether phoneme-based speech synthesis is good enough for what you want I don't know either.
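To make the "another layer" idea concrete, here is a rough Python sketch of a text-to-phoneme front end. It is not taken from any Propeller object. The phoneme labels are loosely ARPABET-style names picked for illustration, the letter table is deliberately partial and naive, and the exception entries are hypothetical. The point is the shape: look the word up in an exception dictionary first (imported words like 'voyage' live there), and only then fall back to spelling rules.

```python
# Hypothetical exception dictionary: words whose sound cannot be guessed
# from spelling. Labels are illustrative, not any chip's real allophone set.
EXCEPTIONS = {
    "voyage": ["V", "OY", "AA", "ZH"],
    "the": ["DH", "AH"],
}

# Deliberately naive fallback rules: digraphs first, then single letters.
DIGRAPHS = {"th": "TH", "sh": "SH", "ch": "CH"}
LETTERS = {
    "a": "AE", "b": "B", "d": "D", "e": "EH", "f": "F", "g": "G",
    "h": "HH", "i": "IH", "k": "K", "l": "L", "m": "M", "n": "N",
    "o": "AA", "p": "P", "r": "R", "s": "S", "t": "T", "u": "AH",
    "v": "V", "w": "W", "y": "Y", "z": "Z",
}


def word_to_phonemes(word):
    """Exception lookup first, then left-to-right spelling rules."""
    word = word.lower()
    if word in EXCEPTIONS:
        return list(EXCEPTIONS[word])
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
        elif word[i] in LETTERS:
            out.append(LETTERS[word[i]])
            i += 1
        else:
            i += 1  # no rule for this character: skip it
    return out


def text_to_phonemes(text):
    """Convert each word and put a pause marker after it."""
    result = []
    for word in text.split():
        result.extend(word_to_phonemes(word))
        result.append("PAUSE")
    return result
```

The fallback rules are where the memory goes: every word the rules get wrong needs its own dictionary entry, which is the lookup-table cost mentioned earlier in the thread.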
There's a PICmicro project, "PicTalk", which stores the SPO256 allophones as WAV samples in 32K of EEPROM and could probably be ported using minimal Propeller resources. The Propeller Proto Board has 32KB unused in the supplied boot EEPROM.
I'm trying to create an SPO256 emulator using Chip's VocalTract object to generate the 64 allophones but am going nowhere fast as it's all new and unfamiliar territory. I'm aware of the limitations of allophone-based speech and just 'chaining samples together', but I've got to start somewhere. These days it would probably make more sense to use any SpeakerJet allophone naming/numbering than SPO256.
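As a back-of-envelope check on the sample-based approach, here is a rough Python sketch of fitting 64 allophone clips into 32KB and chaining them together. The 8 kHz, 8-bit mono format is purely an assumption for the arithmetic, not PicTalk's actual format, and the budget ignores WAV headers and the space the index table itself would take. The function names are made up.

```python
EEPROM_BYTES = 32 * 1024
ALLOPHONE_COUNT = 64
SAMPLE_RATE_HZ = 8000   # assumption, not taken from PicTalk
BYTES_PER_SAMPLE = 1    # assumption: 8-bit mono


def average_budget():
    """Average bytes and milliseconds available per allophone."""
    bytes_each = EEPROM_BYTES // ALLOPHONE_COUNT
    ms_each = bytes_each * 1000 / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
    return bytes_each, ms_each


def build_index(lengths):
    """Pack variable-length clips back to back; return (offset, length) per clip."""
    index = []
    offset = 0
    for n in lengths:
        index.append((offset, n))
        offset += n
    if offset > EEPROM_BYTES:
        raise ValueError("clips do not fit in the EEPROM image")
    return index


def chain(image, index, allophone_ids):
    """'Chaining samples together': concatenate raw clip bytes in order."""
    out = bytearray()
    for a in allophone_ids:
        offset, n = index[a]
        out += image[offset:offset + n]
    return bytes(out)
```

Under those assumptions each allophone averages 512 bytes, or 64 ms of audio, which is why short consonants and long vowels would want variable-length clips and an offset table rather than fixed slots.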
Someone ( search.parallax.com ) created a tool which allows a keyboard / PropTerm to adjust all the vocal tract parameters which is not only fun but probably useful.
Post Edited (hippy) : 10/4/2007 3:06:56 PM GMT
The easiest thing to do is to borrow a complete inventory from your Collier's College Dictionary or your Merriam-Webster's Collegiate Dictionary, and then build the components using their text as a reference. At some point you'll find that you don't agree with everything, but by then you should be familiar enough with what the phonemes sound like to be able to select the appropriate correction.
If you just keep fooling with phoneme creation out of context, nothing will ever get done. It is too abstract and too variable. Working on reciting the alphabet is no good either, as the alphabet covers only a limited subset of the phoneme inventory of about 48 items. In some ways it is quite remarkable how we listen to all the variation and still sort out all the components to create meaning.
Don't give up. People and computers have at least one thing in common. It is easier for them to talk than to listen.
Post Edited (Kramer) : 10/5/2007 3:40:08 PM GMT