Hanno's next "Grand Challenge": Text to Speech
Hanno
Hi,
Chip has written an awesome object that synthesizes speech, one formant at a time. He's very proud of and enthusiastic about it, but I haven't heard more than a few words from our Propellers. The reason is that currently each word has to be programmed one formant at a time. I think people want to pass a string to an object and have it translate characters to phonemes to formants. Twenty years ago I played with a simple set of chips that did this mapping, and there's also open-source software out there that does it.
So, here's my second grand challenge to the Parallax community:
Publish an MIT-licensed object that:
- uses Chip's VocalTract object
- uses <5KB of hub RAM
- uses 1 cog
- can speech-synthesize this post so that I can understand it
Submissions should include the code and an mp3 recording. Winner gets an ultimate license to ViewPort or 12Blocks...
Hanno
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Co-author of the official Propeller Guide- available at Amazon
Developer of ViewPort, the premier visual debugger for the Propeller (read the review here, thread here),
12Blocks, the block-based programming environment (thread here)
and PropScope, the multi-function USB oscilloscope/function generator/logic analyzer
Post Edited (Hanno) : 3/16/2010 8:36:45 AM GMT
Comments
-Phil
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
24-bit LCD Breakout Board now in: $24.99, with backlight driver and touch-sensitive decoder.
If you have not already, add yourself to the prophead map.
Sorry for not being clear...
Yes, I want an object that expands Chip's object to do text to speech.
Chip's object requires very little memory to produce very nice speech.
His object models the vocal tract and uses just 13 byte-sized parameters to define how the speech is generated. From Chip's chapter, it looks like he succeeded in replicating all the phonemes with his object; however, he left that part as an exercise for the reader. Getting all the phonemes right shouldn't take too long and should be fun.
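For a feel for the memory budget, here's a rough sketch in C (something Catalina could build; a real submission would more likely be Spin) of how a phoneme table could store those 13 bytes per phoneme. The field names are the VocalTract parameters as I remember them (aa, ga, gp, vp, vr, f1 to f4, na, nf, fa, ff), so check them against Chip's source. The table layout, the phoneme symbols, and find_phoneme are hypothetical, and the frame values are all-zero placeholders. Since every field is one byte, an entry with a 3-byte symbol is 16 bytes. A 64-entry table then costs about 1KB and leaves roughly 4KB of the 5KB budget for rules and code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One VocalTract parameter frame: 13 byte-sized values.
 * ASSUMPTION: names/order recalled from Chip's VocalTract object;
 * verify against the real source before relying on this layout. */
typedef struct {
    uint8_t aa;              /* aspiration amplitude */
    uint8_t ga;              /* glottal (voicing) amplitude */
    uint8_t gp;              /* glottal pitch */
    uint8_t vp;              /* vibrato pitch */
    uint8_t vr;              /* vibrato rate */
    uint8_t f1, f2, f3, f4;  /* formant frequencies */
    uint8_t na;              /* nasal amplitude */
    uint8_t nf;              /* nasal frequency */
    uint8_t fa;              /* frication amplitude */
    uint8_t ff;              /* frication frequency */
} vt_frame;

/* Hypothetical table entry: a short phoneme symbol plus its target frame. */
typedef struct {
    char sym[3];             /* up to 2 chars + NUL, e.g. "SH" */
    vt_frame frame;
} phoneme_entry;

/* Budget check, assuming ~64 entries (the SPO256-AL2 has 64 allophones). */
#define NUM_PHONEMES 64
#define HUB_BUDGET   5120u   /* the challenge's 5KB limit */
#define TABLE_BYTES  (NUM_PHONEMES * sizeof(phoneme_entry))  /* 64 * 16 = 1024 */
_Static_assert(TABLE_BYTES < HUB_BUDGET, "phoneme table alone exceeds the 5KB budget");

/* Placeholder entries: all-zero frames, to be tuned by ear. */
const phoneme_entry demo_table[] = {
    {"_",  {0}},   /* pause */
    {"AA", {0}},
    {"SH", {0}},
};

/* Linear search is fine for ~64 entries; returns NULL if sym is unknown. */
const phoneme_entry *find_phoneme(const phoneme_entry *table, size_t n, const char *sym) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].sym, sym) == 0)
            return &table[i];
    return NULL;
}
```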
Once you have phonemes, you need to map words to sequences of phonemes. As mentioned, there are existing sources for the rules to do this.
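The word-to-phoneme step is usually done with ordered letter-to-sound rules, the idea behind the classic NRL rule set: scan the text, take the first rule whose pattern matches at the current position, emit its phonemes, and skip past the match. Below is a minimal C sketch of that idea. It is much cruder than the real thing (no left/right context, only a few digraph rules), and the phoneme symbols, the rule list, and text_to_phonemes are made up for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* One letter-to-sound rule: if the text at the cursor starts with `match`
 * (case-insensitive), emit `phon` and advance past the match. */
typedef struct {
    const char *match;  /* lowercase letters */
    const char *phon;   /* hypothetical phoneme symbols, space-separated */
} lts_rule;

/* First match wins, so multi-letter patterns must come before single letters. */
static const lts_rule rules[] = {
    {"wor", "W ER"}, {"ll", "L"}, {"sh", "SH"}, {"th", "TH"}, {"ch", "CH"},
    {"ee", "IY"},    {"oo", "UW"},
    {"a", "AE"}, {"b", "B"},  {"c", "K"},  {"d", "D"},  {"e", "EH"}, {"f", "F"},
    {"g", "G"},  {"h", "HH"}, {"i", "IH"}, {"j", "JH"}, {"k", "K"},  {"l", "L"},
    {"m", "M"},  {"n", "N"},  {"o", "OW"}, {"p", "P"},  {"q", "K"},  {"r", "R"},
    {"s", "S"},  {"t", "T"},  {"u", "AH"}, {"v", "V"},  {"w", "W"},  {"x", "K S"},
    {"y", "Y"},  {"z", "Z"},
};
#define NUM_RULES (sizeof rules / sizeof rules[0])

/* Returns 1 if the text at p starts with pat (case-insensitive). */
static int matches_at(const char *p, const char *pat) {
    for (; *pat; p++, pat++)
        if (tolower((unsigned char)*p) != *pat)
            return 0;  /* also stops at end of text, since '\0' never matches */
    return 1;
}

/* Writes space-separated phonemes to out; a space in the text becomes "_".
 * Other non-letters are skipped. Returns 0, or -1 if out is too small. */
int text_to_phonemes(const char *text, char *out, size_t outsz) {
    size_t len = 0;
    if (outsz == 0) return -1;
    out[0] = '\0';
    while (*text) {
        const char *emit = NULL;
        size_t adv = 1;
        if (*text == ' ') {
            emit = "_";
        } else if (isalpha((unsigned char)*text)) {
            for (size_t i = 0; i < NUM_RULES; i++) {
                if (matches_at(text, rules[i].match)) {
                    emit = rules[i].phon;
                    adv = strlen(rules[i].match);
                    break;
                }
            }
        }
        if (emit) {
            size_t n = strlen(emit);
            /* room for separator (if not first) + symbols + NUL */
            if (len + (len ? 1 : 0) + n + 1 > outsz) return -1;
            if (len) out[len++] = ' ';
            memcpy(out + len, emit, n);
            len += n;
            out[len] = '\0';
        }
        text += adv;
    }
    return 0;
}
```

With these rules, "hello world" comes out as "HH EH L OW _ W ER L D", where "_" marks the word gap. A usable English rule set needs context-sensitive rules and an exception list, which is likely where most of the remaining memory would go.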
So, here's what the object should do:
- input: text2speech(string("hello world"))
- map text to phonemes: hello world -> h e l o~ wh o~ r ld (I'm making this up)
- call Chip's object with parameters for each phoneme
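The last step could look something like the C sketch below: split the phoneme string on spaces, look each symbol up, and hand its index to whatever actually drives VocalTract. That driver is a hypothetical vt_send_fn callback here, because I'm not reproducing the real method names of Chip's object; on the Propeller it would load that phoneme's 13 parameters and call Chip's object. The phoneme list, the "_" pause symbol, and the 80 ms / 100 ms durations are placeholders. trace_send only records the calls, so the logic can be checked off-chip.

```c
#include <stddef.h>
#include <string.h>

#define PHONEME_MS 80   /* placeholder duration per phoneme */
#define PAUSE_MS  100   /* placeholder duration for "_" word gaps */

/* Hypothetical hook: "play phoneme #idx for duration_ms". On the Propeller
 * this would load the phoneme's 13 parameters and call VocalTract. */
typedef void (*vt_send_fn)(int idx, int duration_ms, void *ctx);

/* Placeholder inventory; index 0 is the pause. */
static const char *const phoneme_names[] = {"_", "HH", "EH", "L", "OW", "W", "ER", "D"};
#define NUM_NAMES (sizeof phoneme_names / sizeof phoneme_names[0])

/* Returns the index of the n-character token, or -1 if unknown. */
static int lookup(const char *tok, size_t n) {
    for (size_t i = 0; i < NUM_NAMES; i++)
        if (strlen(phoneme_names[i]) == n && strncmp(phoneme_names[i], tok, n) == 0)
            return (int)i;
    return -1;
}

/* Sends each space-separated phoneme in order. Returns the number sent,
 * or -1 at the first unknown symbol (earlier ones have already been sent). */
int speak_phonemes(const char *phon, vt_send_fn send, void *ctx) {
    int count = 0;
    while (*phon) {
        while (*phon == ' ') phon++;          /* skip separators */
        if (*phon == '\0') break;
        const char *start = phon;
        while (*phon && *phon != ' ') phon++; /* find end of token */
        int idx = lookup(start, (size_t)(phon - start));
        if (idx < 0) return -1;
        send(idx, idx == 0 ? PAUSE_MS : PHONEME_MS, ctx);
        count++;
    }
    return count;
}

/* Example sink for off-chip testing: records up to 32 calls. */
typedef struct { int idx[32]; int ms[32]; int n; } vt_trace;

void trace_send(int idx, int duration_ms, void *ctx) {
    vt_trace *t = ctx;
    if (t->n < 32) {
        t->idx[t->n] = idx;
        t->ms[t->n] = duration_ms;
        t->n++;
    }
}
```

Keeping the output side behind a function pointer means the same loop could drive VocalTract on the chip or a recorder on a PC.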
It would be great for any project that needs to "display" something...
Hanno
text and converts it into commands.
This is a *significant* coding requirement, and would normally be written in C/C++ for a high-level OS; the most recent example is Android 1.6, on an embedded platform.
I would guess that a port via Catalina/LMM would be one route, using open-source software, for example from the Festival/CMU community.
Regards,
T o n y
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Short answers? Not available at this time since I think you deserve more information than you requested.
I've already got an object that does phonemic synthesis using Chip's object:
http://forums.parallax.com/showthread.php?p=613308
It's big and bad (neither meant in a good sense, unfortunately). 'Been meaning to work on it again someday. Maybe this is the kick I need.
-Phil
Great to see you've already completed the phonemes-to-VocalTract part of the problem. I'll check it out later today.
Here's the open-source text to speech project I mentioned earlier:
espeak.sourceforge.net/
Like VocalTract, it lets users define formants by parameters; it also lets you use WAV files. After a brief scan, that's all I found.
However, its text-to-phoneme work should be very applicable here. It has a command-line mode that converts text to phonemes...
Here's the VocalTract-equivalent chip I used ages ago: courses.cit.cornell.edu/ee476/Speech/SPO256-AL2.pdf
A separate chip, the CTS256, drives the SPO256 and converts text to phonemes. I can't find its datasheet, but a program that simulates the chip is here: www.speechchips.com/shop/item.aspx?itemid=13
Hanno
I wouldn't say "completed", necessarily. You'll see what I mean.
-Phil
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Don't click on this.....
Use the Propeller icon!!
http://www.haskins.yale.edu/featured/sws/sws.html
These guys are doing some great leading-edge research that applies directly to formants, which I found interesting and wanted to share.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Wiki: Share the coolness!
8x8 color 80 Column NTSC Text Object
Safety Tip: Life is as good as YOU think it is!
Whoa! That sinewave speech is amazing! I mean both amazingly weird sounding and amazing that it's so understandable.
-Phil
Graham