Hanno's next "Grand Challenge": Text to Speech

Hanno · 2010-03-16 05:57

Hi,
Chip has written an awesome object that synthesizes speech one formant at a time. He's very proud of it and enthusiastic about it, but I haven't heard more than a few words from our Propellers. The reason is that, currently, each word has to be programmed one formant at a time. I think people want to pass a string to an object and have it do the translation from characters to phonemes to formants. Twenty years ago I played with a simple set of chips that did this mapping. There's also open-source software out there that does this.

So, here's my second grand challenge to the Parallax community:

Publish an MIT licensed object that:
- uses Chip's VocalTract object
- uses <5KB of hub RAM
- uses 1 cog
- can speech synthesize this post- so that I can understand it

Submissions should include the code and an mp3 recording. Winner gets an ultimate license to ViewPort or 12Blocks...
Hanno

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Co-author of the official Propeller Guide- available at Amazon
Developer of ViewPort, the premier visual debugger for the Propeller (read the review here, thread here),
12Blocks, the block-based programming environment (thread here)
and PropScope, the multi-function USB oscilloscope/function generator/logic analyzer

Post Edited (Hanno) : 3/16/2010 8:36:45 AM GMT

Phil Pilgrim (PhiPi) · 2010-03-16 06:33

Hanno said...
- can speech synthesize this post- so that I can understand it

This could be taken to imply that you want text-to-speech. Is that what you meant?

-Phil

mctrivia · 2010-03-16 06:41

Looks like it.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
24 bit LCD Breakout Board now in. $24.99 has backlight driver and touch sensitive decoder.

If you have not already, add yourself to the prophead map

Hanno · 2010-03-16 08:34

Hi Phil,
Sorry for not being clear...
Yes, I want an object that expands Chip's object to do text to speech.
Chip's object requires very little memory to produce very nice speech.
His object models the vocal tract and uses just 13 byte-sized parameters to define how the speech will be generated. From Chip's chapter, it looks like he succeeded in replicating all phonemes with his object; however, he left that as an exercise for the reader. Getting all phonemes right shouldn't take too long and should be fun.
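To put a number on "very little memory", here is a minimal Python sketch (not Spin, and not Chip's code) that treats one speech frame as 13 bytes and estimates the size of a phoneme table. The helper names `pack_frame` and `table_size` are made up for illustration, the actual parameter names and meanings live in the VocalTract source, and the figure of 40 phonemes is only a rough assumption for English.

```python
PARAMS_PER_FRAME = 13  # from the post: 13 byte-sized parameters per frame

def pack_frame(values):
    """Pack one frame into 13 bytes (hypothetical helper).

    bytes() raises ValueError if any value falls outside 0..255.
    """
    if len(values) != PARAMS_PER_FRAME:
        raise ValueError("a frame needs exactly 13 parameters")
    return bytes(values)

def table_size(num_phonemes, frames_per_phoneme=1):
    """Bytes needed to store a table of phoneme frames."""
    return num_phonemes * frames_per_phoneme * PARAMS_PER_FRAME

frame = pack_frame([0] * PARAMS_PER_FRAME)  # an all-zero placeholder frame
print(len(frame))
print(table_size(40))  # assumed: about 40 phonemes, one frame each
```

Under those assumptions the frame table itself is a small fraction of a 5KB budget, which would leave most of the space for the text-to-phoneme rules.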
Once you have phonemes, you need to map words to a collection of phonemes. As mentioned, there are sources to get the rules for this.
So, here's what the object should do:
input: text2speech(string("hello world"))
map text to phonemes: hello world-> h e l o~ wh o~ r ld (I'm making this up)
call Chip's object with parameters for each phoneme
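The steps above can be sketched in a few lines. This is Python rather than Spin, purely to show the shape of the idea; the rule table, the phoneme symbols, and the `say_phoneme` callback are all invented for illustration, and a real object would look up VocalTract parameters for each phoneme instead of collecting symbols in a list.

```python
# Hypothetical letter-to-phoneme rules. Both the rules and the phoneme
# symbols are invented for illustration; a real table would be much larger.
RULES = {
    "h": ["h"], "e": ["eh"], "ll": ["l"], "l": ["l"], "o": ["oh"],
    "w": ["w"], "or": ["er"], "r": ["r"], "d": ["d"], " ": ["_"],
}
MAX_RULE_LEN = max(len(key) for key in RULES)

def text_to_phonemes(text):
    """Greedy longest-match translation of text into phoneme symbols."""
    text = text.lower()
    phonemes = []
    i = 0
    while i < len(text):
        # Try the longest letter group first, then shorter ones.
        for n in range(MAX_RULE_LEN, 0, -1):
            chunk = text[i:i + n]
            if chunk in RULES:
                phonemes.extend(RULES[chunk])
                i += len(chunk)
                break
        else:
            i += 1  # no rule matched: skip the character
    return phonemes

def text2speech(text, say_phoneme):
    """Translate text and hand each phoneme to say_phoneme, a stand-in
    for whatever would drive the synthesizer."""
    for phoneme in text_to_phonemes(text):
        say_phoneme(phoneme)

spoken = []
text2speech("hello world", spoken.append)
print(" ".join(spoken))
```

Matching the longest letter group first lets "ll" and "or" be treated as single sounds, which is the kind of rule a real text-to-phoneme table would need many of.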
Would be great for all projects that need to "display" something...
Hanno

TonyWaite · 2010-03-16 13:02

As Chip has written the speech engine, the missing link is the text engine, i.e. the software that analyzes the text and converts it into commands.

This is a *significant* coding requirement, and would normally be written in C/C++ for high-level OSes; most recently for Android 1.6, as an example of an embedded platform.

I would guess that a port via Catalina/LMM would be one route, using open source software, for example from the Festival/CMU community.

Regards,

T o n y

jazzed · 2010-03-16 14:04

TonyWaite said...

I would guess that a port via Catalina/LMM would be one route, using open source software, for example from the Festival/CMU community.

Except that Festival requires GNU. Catalina is not GNU. The closest thing we have is heater's ZOG emulator running GNU.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Short answers? Not available at this time since I think you deserve more information than you requested.

Phil Pilgrim (PhiPi) · 2010-03-16 16:00

Hanno,

I've already got an object that does phonemic synthesis using Chip's object:

http://forums.parallax.com/showthread.php?p=613308

It's big and bad (neither meant in a good sense, unfortunately). I've been meaning to work on it again someday. Maybe this is the kick I need.

-Phil

mctrivia · 2010-03-16 17:06

I think a preprocessing program on the computer that generates code to feed into Chip's object would be a good start.

Hanno · 2010-03-16 18:57

Hi Phil!
Great to see you've already completed the phonemes-to-VocalTract part of the problem. I'll check that out later today.
Here's the open-source text to speech project I mentioned earlier:
espeak.sourceforge.net/

Like VocalTract, it allows users to define formants by parameters. However, it also lets you use WAV files, and after a brief scan, that's all I found.

However, the text-to-phoneme work should be very applicable here. It has a command line mode where it converts text to phonemes...

Here's the VocalTract-equivalent chip I used ages ago: courses.cit.cornell.edu/ee476/Speech/SPO256-AL2.pdf
A separate chip, the CTS256, drives the SP0256: it converts text to phonemes. I can't find the datasheet, but a program that simulates the chip is here: www.speechchips.com/shop/item.aspx?itemid=13

Hanno

Phil Pilgrim (PhiPi) · 2010-03-16 19:43

Hanno,

I wouldn't say "completed", necessarily. You'll see what I mean.

-Phil

Microcontrolled · 2010-03-17 01:37

I have been using Phil's speech for at least a year. IT IS GREAT! For your challenge, I will modify Phil's program, but I do not need something else in return. I notice that this is the 2nd time I have piggybacked my challenge-entry on Phil's work, dang it. I need to learn how to write my own awesome programs.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Don't click on this.....

Use the Propeller icon!!

potatohead · 2010-03-17 05:04

In light of past speech recognition topics, and this one running right now, I have to link this:

http://www.haskins.yale.edu/featured/sws/sws.html

These guys are doing some great leading-edge research applied directly to formants, which I found interesting and wanted to share.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Wiki: Share the coolness!
8x8 color 80 Column NTSC Text Object
Safety Tip: Life is as good as YOU think it is!

Phil Pilgrim (PhiPi) · 2010-03-17 05:24

potatohead,

Whoa! That sinewave speech is amazing! I mean both amazingly weird sounding and amazing that it's so understandable.

-Phil

potatohead · 2010-03-17 06:51

Yeah, my thoughts too. After sitting with Chip, learning about the formants, then seeing this, I am intrigued.

Graham Stabler · 2010-03-17 10:15

It's clear there are still many potential ways to skin a cat

Graham
