Does propeller do text-to-speech?

Drachs · 2007-10-03 21:28

I saw an article in Make talking about the Propeller chip and text-to-speech. Is this something the chip does? If so, where can I get some information?

David

Mark Bramwell · 2007-10-03 21:34

I read Make mag ( http://www.makezine.com/ )

What issue had the article? I didn't see it in the ones I have.

Drachs · 2007-10-03 21:38

Issue 10. Here's the online abstract: http://makezine.com/10/propeller/

There's nothing much in the abstract except an MP3 of the chip in action.

Mark Bramwell · 2007-10-03 21:49

Thanks for the link. I use a SpeakJet chip. Same idea. http://www.magnevation.com/

The chip is a pre-programmed PIC chip sold as a product. You send it 9600 baud serial commands and the chip talks. It works, but the speech is rough. Some sounds are great, some are terrible. The commands are allophones, speed, inflection and so on. You build a special string that represents a word and send it via a serial pin. I used an 8-pin LM386 audio amp chip to drive a speaker on my robot.

The hookup to a microcontroller is very easy. Assign a serial TX pin plus 1 or 2 control pins and away you go.
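
As a rough illustration of the "build a string and send it over a serial pin" step described above, here is a minimal Python sketch. The command byte and allophone codes are made-up placeholders, not values from the SpeakJet datasheet, and the 8N1 framing is an assumption. `send` writes to any file-like object instead of a real serial port.

```python
import io

# Placeholder values for illustration only. They are NOT real SpeakJet
# codes; look up the actual numbers in the datasheet before using hardware.
CMD_SPEED = 0xF0                 # hypothetical "set speed" command byte
ALLOPHONES = {                   # hypothetical allophone name -> code table
    "HH": 1,
    "EH": 2,
    "LL": 3,
    "OW": 4,
}

BAUD = 9600
BITS_PER_BYTE = 10               # assumes 8N1: start bit + 8 data bits + stop bit


def build_word(names, speed=None):
    """Return the bytes for one word: optional speed command, then allophones."""
    out = bytearray()
    if speed is not None:
        out += bytes([CMD_SPEED, speed])
    out += bytes(ALLOPHONES[name] for name in names)
    return bytes(out)


def transmit_time_ms(payload):
    """Time on the wire for the payload at 9600 baud, in milliseconds."""
    return len(payload) * BITS_PER_BYTE * 1000 / BAUD


def send(port, payload):
    """Write the payload to any object with a write() method."""
    port.write(payload)


hello = build_word(["HH", "EH", "LL", "OW"], speed=100)
fake_port = io.BytesIO()         # stands in for the microcontroller's TX pin
send(fake_port, hello)
```

The serial link is unlikely to be the bottleneck here: under these assumptions the six-byte word takes only a few milliseconds to transmit, far less than the time needed to speak it.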

Rayman · 2007-10-03 21:56

One of the examples somewhere has the Propeller saying "Seven" over and over again. But, I don't think it could do real text-to-speech. Parallax does sell a text-2-speech female voice chip. I've actually bought 2, but haven't tried them yet...

I'm sure it's similar to the one from the previous post. I think Winbond makes the actual text2speech chip and then people integrate that into something that accepts serial input.

Mark Bramwell · 2007-10-03 21:59

The propeller article in make #10 is long. ( 10 pages ? )
It is a good read, perhaps the parallax guys can post it. I do not want to scan and post it myself in case the photocopy-police come to get me.

Drachs · 2007-10-03 22:09

I did see the Emic Text-to-Speech module on the Parallax website, but it's on hold. I don't know what that means. I'll take a look at the SpeakJet and see what it can do for me. Thanks for the tip.

Anybody else have any experience with anything?

Drachs · 2007-10-03 22:15

Unfortunately the article is a biography of the propeller designer and not a DIY article on the chip itself.

Beau Schwabe · 2007-10-03 22:16

Follow this thread to the end...

Phonemic Speech Synthesis
http://forums.parallax.com/showthread.php?p=613308

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Beau Schwabe

IC Layout Engineer
Parallax, Inc.

Drachs · 2007-10-03 22:21

Oh hey, check this out: http://obex.parallax.com/objects/72/

What is this, guys? Is this the code to turn the Propeller into a voice synthesizer?

Drachs · 2007-10-03 22:33

Thanks Beau, that helps a lot. It seems like the Propeller community is really great! Makes me wonder if I should switch my projects over to the Propeller platform. I'm so used to C that I don't know if I want to learn something all new, though....

rjo_ · 2007-10-04 01:37

You won't have to ... there is a C compiler coming from ImageCraft. Why don't you see if you can beta test it for them?

rjo_ · 2007-10-04 01:50

Everybody is guessing about phonation. Good physical models are becoming available, here's one: http://meweb.ecn.purdue.edu/~voice/research.htm

mix that with an affect model and you will really have something... and you can do it in C... and I'm sure the Prop can handle it.

Rich

Rayman · 2007-10-04 13:07

So, is the bottom line that the Prop may be able to do phoneme-to-speech (if you're willing to develop the software), but not text-to-speech?

I have a feeling that good text-to-speech would require much more memory than the Prop has for lookup tables. On the other hand, maybe it could be done using an SD card or extra EEPROMs.

LoopyByteloose · 2007-10-04 13:29

Why not learn American phonemes? Just find a dictionary with IPA (the International Phonetic Alphabet) to convert words to phonemes. IPA is maintained by the International Phonetic Association and is meant to cover all languages of the world.
What do you have to know?
There are roughly 24 vowel sounds and 24 consonant sounds. The count varies a bit depending on who does the counting. And in some cases, the sounds are so similar that it is trivial to choose between them, like the two 'th' sounds.
In other cases, consonant blends such as 'bl' and 'fl' add to the inventory.

Once you get the hang of it, it gives you far more control than 'text-to-speech'. You can even make laughter, a howl or a yodel. Maybe your chip can sound like a chimpanzee.

Incidentally, British English requires more phonemes, so American English is likely the easiest start. But please realize that English does allow you to import words from other languages without changes in their sound. That is precisely why text-to-speech never gets it right.

Try 'bon voyage', 'piña colada', 'kreplach', or some other such imported term in your text-to-speech chip.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
"Everything in the world is purchased by labour; and our passions are the only causes of labor." -- David Hume (1711-76)

Tropically, G. Herzog [黃鶴] in Taiwan

Post Edited (Kramer) : 10/4/2007 1:36:19 PM GMT

hippy · 2007-10-04 14:59

Rayman said...
So, is the bottom line that the Prop may be able to do phoneme-to-speech (if you're willing to develop the software), but not text-to-speech?

If you can get phoneme-to-speech working it's just another layer to add on to get text-to-phoneme. Whether that can be shoe-horned into a Propeller I don't know. Whether phoneme-based speech synthesis is good enough for what you want I don't know either.
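
To make the "another layer on top" idea concrete, here is a minimal Python sketch of a text-to-phoneme front end feeding a phoneme-to-speech back end. Everything in it is an assumption for illustration: the two-word lexicon, the ARPAbet-style symbol names, the crude one-letter-one-phoneme fallback, and the `say_phoneme` callback that stands in for whatever synthesizer actually makes the sound.

```python
# Tiny illustrative pronunciation dictionary (ARPAbet-style symbols).
# A usable system would need a far larger lexicon or letter-to-sound rules.
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "seven": ["S", "EH", "V", "AH", "N"],
}

# Crude fallback for unknown words: one phoneme per letter.
# This is often wrong for English and is only here to show the layering.
LETTER_FALLBACK = {
    "a": "AE", "b": "B", "d": "D", "e": "EH", "g": "G", "i": "IH",
    "n": "N", "o": "AA", "p": "P", "r": "R", "t": "T",
}


def text_to_phonemes(text):
    """Convert text to a flat phoneme list, with a PAUSE after each word."""
    phonemes = []
    for raw in text.lower().split():
        word = "".join(ch for ch in raw if ch.isalpha())  # drop punctuation
        if not word:
            continue
        if word in LEXICON:
            phonemes.extend(LEXICON[word])
        else:
            phonemes.extend(
                LETTER_FALLBACK[ch] for ch in word if ch in LETTER_FALLBACK
            )
        phonemes.append("PAUSE")
    return phonemes


def speak(text, say_phoneme):
    """Top layer: text in, one call to the phoneme back end per phoneme."""
    for phoneme in text_to_phonemes(text):
        say_phoneme(phoneme)


spoken = []
speak("Hello, seven!", spoken.append)  # the list stands in for the synthesizer
```

The layering itself is cheap; the memory question raised earlier in the thread comes down to how large the lexicon or rule set has to be.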

There's a PICmicro project, "PicTalk", which stores the SP0256 allophones as WAV samples in 32K of EEPROM. It could probably be ported and would use minimal Propeller resources. The Propeller Proto Board has 32KB unused in the supplied boot EEPROM.
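
A quick back-of-the-envelope check of what 32 KB of sample storage buys, as a Python sketch. The 64-allophone count is the figure quoted elsewhere in this thread; the 8 kHz, 8-bit mono format and the equal-sized slots are assumptions, not details of the PicTalk project.

```python
EEPROM_BYTES = 32 * 1024     # spare 32 KB of EEPROM
ALLOPHONE_COUNT = 64         # allophone count quoted in this thread
SAMPLE_RATE_HZ = 8000        # assumption: 8 kHz playback
BYTES_PER_SAMPLE = 1         # assumption: 8-bit mono samples


def slot_size(total_bytes, count):
    """Bytes available per allophone if every slot is the same size."""
    return total_bytes // count


def duration_ms(num_bytes, rate_hz, bytes_per_sample):
    """Playback time of num_bytes of raw audio, in milliseconds."""
    return num_bytes * 1000 / (bytes_per_sample * rate_hz)


def slot_offset(index, size):
    """Start address of allophone number `index` in a fixed-slot layout."""
    return index * size


per_allophone = slot_size(EEPROM_BYTES, ALLOPHONE_COUNT)
average_ms = duration_ms(per_allophone, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE)
last_start = slot_offset(ALLOPHONE_COUNT - 1, per_allophone)
```

Under these assumptions each allophone gets 512 bytes, or about 64 ms of audio on average. A real layout would more likely use variable-length entries with an index table, since some sounds need more time than others.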

I'm trying to create an SP0256 emulator using Chip's VocalTract object to generate the 64 allophones, but I'm going nowhere fast as it's all new and unfamiliar territory. I'm aware of the limitations of allophone-based speech and just 'chaining samples together', but I've got to start somewhere. These days it would probably make more sense to use the SpeakJet allophone naming/numbering than the SP0256's.

Someone ( search.parallax.com ) created a tool which allows a keyboard / PropTerm to adjust all the vocal tract parameters which is not only fun but probably useful.

Post Edited (hippy) : 10/4/2007 3:06:56 PM GMT

LoopyByteloose · 2007-10-05 15:33

Basically, the creation of phonemes makes this a linguistics project first and an electronics project second. The greatest problem with creating a standard American phoneme set is that there really isn't any standard American sound to English. People in New York sound different from people in New Jersey or Boston, and so on.

The easiest thing to do is to borrow a complete inventory from your Colliers College Dictionary or your Merriam-Webster's Collegiate Dictionary, and then build the components using their text as a reference. At some point you'll find that you don't agree with everything, but by then you should be familiar enough with what the phonemes sound like to be able to select the appropriate correction.

If you just keep fooling with phoneme creation out of context, nothing will ever get done. It is too abstract and too variable. Working on reciting the alphabet is no good either, as the alphabet covers only a limited subset of the phoneme inventory of about 48 items. In some ways it is quite remarkable how we listen to all the variation and still sort out all the components to create meaning.

Don't give up. People and computers have at least one thing in common. It is easier for them to talk than to listen.

Post Edited (Kramer) : 10/5/2007 3:40:08 PM GMT

TransistorToaster · 2007-10-09 04:13

During my master's degree I took a course on speech processing, and my term paper was on text-to-speech. Today, efforts are geared toward concatenating recorded human speech (at a really high level, too) because memory is readily available and the resulting audio sounds more natural.
