Emic Text-To-Speech

Rayman · 2008-01-17 14:11

I just got my Emic module to work properly with the Prop... It took some work, as the Reset line is very picky about voltage levels and it is a 5V device...

Anyway, here's my code (I know somebody else posted code a long time ago):

Also, I noted my wiring in the driver file...

PS: I noticed that they are back in stock on the Parallax site...

Manual mentioned that more dialects may be coming, but since this device appears to be a few years old already, I'm not going to hold my breath...

Paul Baker · 2008-01-17 19:58

No more dialects are coming; in fact the situation is even more dire: the chip is no longer manufactured. What is already on the market is what is available. We have acquired what stock we can, and when it's gone, it's gone. For this reason we do not recommend using the part for new designs.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Paul Baker
Propeller Applications Engineer

Parallax, Inc.

Paul Sr. · 2008-01-17 20:24

Paul,

What are you folks recommending these days as a replacement?

Paul Baker · 2008-01-17 21:10

There is no drop-in replacement, but there are still a few companies which make TTS chips or systems. I have not evaluated any of them, so no recommendation can be made for any of them:

http://www.winbond-usa.com/en/content/view/55/159/
http://www.textspeak.com/products.htm
http://www.speakjet.com/

I think we used to carry the last chip, but I don't know our reason for no longer carrying it (I think it may have been more difficult to work with, but this is conjecture).
The second link is more of an end-use product and is beyond the budget of most hobbyists. The top link points to the most popular chip; a search for WTS701 shows nearly 200 sites which discuss it.


Paul Sr. · 2008-01-18 13:14

Thanks, Paul

Drone · 2008-01-18 19:23

Hi Paul,

Can't find info from Winbond on your link with respect to the WTS701. This venerable chip was introduced around 2001, but seems hard if not impossible to get.

Hi Rayman,

Text to speech chips seem to be a dying breed - can't even find the venerable SP0256-AL2 which used to be sold even by Radio Shack in the US.

www.speechchips.com covers what's available pretty well. They have a "front end" chip for the Magnevation SpeakJet, but the front end chip plus DIP SpeakJet (seemingly forever on backorder) will run you a total of $38 USD (ouch).

Then there's the DevanTech Speech Module www.acroname.com/robotics/parts/R184-SP03.html which does use the elusive WTS701 and integrates the Winbond chip, speaker, and I2C interface. But at $119 USD - ouch, again!

Certainly the demise of dedicated text-to-speech ICs is in part due to the fact that microcontrollers (even lowly PLDs or CPLDs) can play back stored speech from memory. I seem to remember seeing some ATtiny applications that did this from a small DIP with an attached serial EEPROM. Need to look further for code that does "real" text-to-speech on an embedded microcontroller though.

Perhaps someone else can suggest a better solution for text-to-speech. I only searched for around 15 minutes...

Regards, David

Rayman · 2008-01-18 19:29

I think you're right...

My fallback position is to use an SD card to store pre-generated wav files.
AT&T has a nice web page that will generate a wav file from any text you enter with several different voice/language options:

http://www.research.att.com/~ttsweb/tts/demo.php

Drone · 2008-01-18 20:43

Hi Rayman,

Cool link to the AT&T site for .wav text to speech file generation. But keep in mind that restrictions apply to .wav's generated by this site: (Snip) "Audio files produced on our site are intended only for private, non-commercial use."

You obviously get the picture from my previous post Rayman... but I'd like to add a bit to clarify for others that may be jumping in: There are two solutions to text-to-speech I'll address...

1. Make sound files that can be played back from memory, e.g. from an SD card via a DAC. This allows only pre-stored words, limited by storage space. Good for making a speaking clock, for example. You have a relatively small "dictionary" of stored words, like "one" through "twelve", "AM", "PM", "O'Clock" etc. The top object invokes playing of stored words by calling routines by their text name (e.g. talk.one, talk.oclock).

2. Use a dedicated text-to-speech IC. These are getting hard to find. With a text to speech IC you send data to the chip and it outputs speech based on the words (typically, if not only, in English). This solution gives you complete flexibility to "speak" anything you throw at the text to speech IC, and the heavy lifting is done in the speech IC, not the Propeller. Let's say you wanted to make a speaking terminal connected to the Internet for blind users. This is the way to go.
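
The first approach (a small dictionary of pre-recorded words) can be sketched as below. This is only an illustration in Python, not Propeller code: the clip names (`"oclock"`, `"oh"`, `"am"` and so on) are hypothetical, and each name is assumed to correspond to one stored .wav file that a player routine would look up.

```python
# Hypothetical clip names; each is assumed to match one pre-recorded .wav file.
NUMBER_WORDS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS_WORDS = {20: "twenty", 30: "thirty", 40: "forty", 50: "fifty"}


def minute_words(minute):
    """Return the clip names that speak a minute value (0..59)."""
    if minute == 0:
        return ["oclock"]
    if minute < 10:
        return ["oh", NUMBER_WORDS[minute]]
    if minute < 20:
        return [NUMBER_WORDS[minute]]
    tens, ones = divmod(minute, 10)
    words = [TENS_WORDS[tens * 10]]
    if ones:
        words.append(NUMBER_WORDS[ones])
    return words


def time_to_clips(hour24, minute):
    """Turn a 24-hour time into the ordered list of clips to play."""
    if not (0 <= hour24 < 24 and 0 <= minute < 60):
        raise ValueError("time out of range")
    suffix = "am" if hour24 < 12 else "pm"
    hour12 = hour24 % 12 or 12  # 0 and 12 are both spoken as "twelve"
    return [NUMBER_WORDS[hour12]] + minute_words(minute) + [suffix]
```

With this layout the whole clock needs fewer than thirty stored clips, which is why the fixed-vocabulary approach suits small storage.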

True text-to-speech is possible in Propeller-native code (IMHO), but you need a lot of knowledge about DSP etc. to build the app. This is where a C compiler for the Propeller may help, as there's likely C stuff out there already that may form a basis.

One more note: I took a quick look at some fundamental digital speech techniques with regards to patents. As suspected, the now dysfunctional (if not corrupt) US Patent and Trademark Office (USPTO) seems to have issued patents for stuff that's been "out there" for decades to the likes of Micro$oft, Apple, et al. This issue isn't limited to speech generation, but text-to-speech too. So if you want to go to "production" with your text to speech embedded in the Propeller, make sure you set aside a lot of money for greedy lawyers.

On this last paragraph regarding patents - I don't want this thread to blow up on this controversial issue. Hence I'll only stick to the topic of text to speech in any future posts to this thread.

Regards. David

VIRAND · 2008-01-18 20:57

For one, the vocal tract object.
http://forums.parallax.com/forums/default.aspx?f=25&p=1&m=152683

I have some phoneme sets that I made on the APPLE II about 20 years ago in about 10kbytes each.
It's on my to do list to port them to Spin. But what was awesome in 1984 may be awful in 2008.

Fred Hawkins · 2008-01-18 21:42

Drone said...

One more note: I took a quick look at some fundamental digital speech techniques with regards to patents. As suspected, the now dysfunctional (if not corrupt) US Patent and Trademark Office (USPTO) seems to have issued patents for stuff that's been "out there" for decades to the likes of Micro$oft, Apple, et al. This issue isn't limited to speech generation, but text-to-speech too. So if you want to go to "production" with your text to speech embedded in the Propeller, make sure you set aside a lot of money for greedy lawyers.

MS, Apple et al. may just be troll-proofing themselves. Look what a dead troll did to RIM.

Dgswaner · 2008-01-18 21:57

I have the AT&T Natural Voices TTS voices "Mike" and "Crystal", and I can use them without restriction. So if you need an MP3 of some text, let me know.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
"A complex design is the sign of an inferior designer." - Jamie Hyneman, Myth Buster

DGSwaner

VIRAND · 2008-01-19 03:36

I'm sure mine is patent-proof.

Rayman · 2008-01-29 16:57

I've found a better, free text-to-speech web site here:

http://www.ivosoftware.com/ivonaonline.php

I'm now doing a mix of Emic and wav file playback...

The Emic makes you tell it to accept audio input... I suppose that's a good idea...

Oldbitcollector (Jeff) · 2008-01-29 17:03

"world's famous sexperts..." Hah! Sounds good.

Question, with the ability to play wav files (thanks Ray) couldn't we just record the basic sounds
and play them back to create words as needed? The answer seems simple, but then that's because
I don't have expertise here.. :)

OBC

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
New to the Propeller?

Getting started with the Protoboard? - Propeller Cookbook
Got an SD card? - PropDOS
A Living Propeller FAQ - The Propeller Wiki
(Got the Knowledge? Got a Moment? Add something today!)

Tim-M · 2008-01-29 18:21

Don't forget about the V-Stamp from RCSystems. I like this product a lot and feel that it is the most understandable of all the text-to-speech synthesis chips or modules that are available. The output can be easily customized in many ways and it has a large recording capability integrated too. I'm not affiliated with these folks in any way, I just like this little module and feel it's a good value.

Tim

Edit: P.S. Look under the 'Downloads' heading for demo voice samples to check out. Oh, it comes in a 3.3 volt version too.

Post Edited (Tim-M) : 1/29/2008 6:39:08 PM GMT

VIRAND · 2008-01-30 06:36

OBC said...
Question, with the ability to play wav files (thanks Ray) couldn't we just record the basic sounds
and play them back to create words as needed? The answer seems simple, but then that's because
I don't have expertise here.. :)

That always works for me,
but you need to get each of the phonetic sounds all in the same monotone pitch to start with,
or else they sound like a skipping codec when you put them together.

Drone · 2008-01-30 12:22

Virand, I've been looking for .WAV snippets that match allophones or phonemes (or whatever you call them), just like you describe. Haven't been able to find them, but I have seen them somewhere on the Web before.

BTW - I found the venerable SP0256-AL2 speech ICs for sale at www.imagesco.com/semiconductors/index.html for $33.95 USD. Ouch - they used to cost $13 at Radio Shack decades ago.

Tim, Whoa! That V-Stamp is 84 bucks...

There's gotta be a way to get the Prop to do this in software.

Rgds, David

Post Edited (Drone) : 1/30/2008 12:33:54 PM GMT

Drone · 2008-01-30 12:39

Ha - found something...

Somebody made a PIC16F628 and a 24LC512 EEPROM speak by programming the allophones, seemingly scraped from an SP0256-AL2, into it. Seems like the stuff may be downloadable - but I don't have time to study the sites right now.

Links:

http://home.alphalink.com.au/~derekw/pictalker/main.htm

www.isk.kth.se/kursinfo/6b4059/pictalk/fonems/index.htm

Rgds, David

Rayman · 2008-01-30 13:28

Drone: Very interesting! I think one could make the Prop itself talk this way... I guess the hard part would be converting text to allophones...

Drone · 2008-01-30 13:38

Yes, if we can find the allophones and get them in .wav format, then I've seen a thread here with the Prop playing .wav files off an SD card (your project, Rayman?). The SP0256-AL2 seems to have used 64 English speech allophones, and it sounded ok. If I remember, there was a way to use the SP0256-AL2 part's built-in dictionary of words, but you could also build custom words by stringing allophones together. I think the SpeakJet part mentioned above does this; you can download a dictionary of the words it has. So for text to speech there is a dictionary of allophones that make up spoken word equivalents, and for building your own words you customize the dictionary further; although in the latter case, and for only a small set of words, it may be easier and better just to store whole recorded .wav words. So the questions are: where to get these 64 allophones, and how much memory will they take in .wav format? If indeed that's how it works. It might be that the allophones are built around a model of the human vocal tract (Linear Predictive Coding or LPC comes to mind), and the model must be coded as well. Hmmm...

Edited: The .wav's for the allophones are given in the page www.isk.kth.se/kursinfo/6b4059/pictalk/fonems/index.htm I linked above! Right-click and save them.
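
A minimal sketch of stringing allophone clips together into one word, written in Python for illustration rather than for the Propeller. It assumes every clip is an uncompressed PCM .wav with the same channel count, sample width and sample rate; Python's standard `wave` module only reads uncompressed PCM, so clips in other encodings would need converting first. The function name `concat_wavs` is hypothetical.

```python
import wave


def concat_wavs(sources, dest):
    """Append the audio frames of each source WAV, in order, into dest.

    sources: iterable of file paths or binary file objects (read).
    dest:    file path or binary file object (written).
    """
    fmt = None
    frames = []
    for src in sources:
        with wave.open(src, "rb") as reader:
            this_fmt = (
                reader.getnchannels(),
                reader.getsampwidth(),
                reader.getframerate(),
            )
            if fmt is None:
                fmt = this_fmt
            elif this_fmt != fmt:
                raise ValueError(
                    "all clips must share channels, sample width and rate"
                )
            frames.append(reader.readframes(reader.getnframes()))
    if fmt is None:
        raise ValueError("no clips given")
    with wave.open(dest, "wb") as writer:
        writer.setnchannels(fmt[0])
        writer.setsampwidth(fmt[1])
        writer.setframerate(fmt[2])
        writer.writeframes(b"".join(frames))
```

Butting clips end to end like this does nothing about pitch or the transitions between sounds, so some choppiness at the joins is to be expected.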

David

Post Edited (Drone) : 1/30/2008 1:46:00 PM GMT

Drone · 2008-01-30 16:48

Hi Rayman,

This thread may be turning into a cutting-edge thing: Making the Propeller "Speak", largely on-chip; with minimal external hardware (perhaps just a larger external EEPROM and DS-D/A-LPF)... of course founded on the works of others (.wav from SD card project for example, credit-due).

I downloaded the allophones and used a (seemingly buggy) utility to concatenate allophones in WinXP to speak "Propeller". Results were "choppy" at best but somewhat understandable, a good first step given I've only spent an hour or two on this. I don't want to give out the link for the WinXP concatenate utility yet because (as I said before) I think it is buggy and there are probably better alternatives out there (read below).

The 59 .wav allophones are around 68 kBytes total, too large for the likes of the Prop Protoboard, so we need more storage. Speed-wise, the PIC example seems to indicate reading an EEPROM may be ok, even over I2C, and an SD card should be ok too. I want to get these concatenated .wav allophones to at least speak one word (i.e. Propeller) in a relatively clear manner in a concatenated .wav on a PC before even thinking about moving to the Prop.

It is my nature to gravitate to shell/Perl scripts with the likes of SoX and MPlayer etc. in Linux/xBSD to solve this (these mostly command-line Linux/xBSD tools have a steep learning curve but are powerful).

However, as most posters here seem to be Windows users, maybe I (or somebody) should fire-up Audacity http://audacity.sourceforge.net/ the free and open audio editor app and crunch these allophone .wav files first and try to build a word from the allophones (it seems Audacity 1.3+ has a "Multiple clips per track" feature added that may help, current version is 1.3.4 Beta as of this writing, 1.2.6 is Stable).

Getting these scraped SP0256 allophones to first speak in Windows, concatenated into a word .wav file (first successful word challenge is "Propeller"), is IMHO a first step. But reverse-engineering the PICTalk source (links above in this thread) is another alternative, as there seems to be an allophone.bin that may be usable.

Regards, David (Drone)

Post Edited (Drone) : 1/30/2008 4:57:56 PM GMT

Rayman · 2008-01-30 17:02

I think we could use an SD card to store the waves and just buffer a few in Prop RAM while speaking...
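
The buffering idea can be sketched as a reader that hands out a clip in small fixed-size chunks, so only one chunk has to sit in RAM at a time. This is a Python illustration only; on the Propeller the equivalent would be Spin or assembly talking to an SD card driver. The name `stream_frames` and the 512-frame default are assumptions, not anything from an existing object.

```python
import wave


def stream_frames(source, frames_per_chunk=512):
    """Yield a WAV clip's audio data in chunks of at most frames_per_chunk frames.

    source: file path or binary file object containing uncompressed PCM audio.
    """
    with wave.open(source, "rb") as reader:
        while True:
            chunk = reader.readframes(frames_per_chunk)
            if not chunk:  # empty bytes means the end of the clip
                break
            yield chunk
```

A playback loop would consume one chunk while the next is being fetched, which keeps the RAM cost fixed regardless of clip length.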
I don't expect this will sound all that great (it will be like 80's text-to-speech), but the price is right!

KenLem · 2008-01-30 19:35

I have better quality wav's of the SP0256-AL2 included as part of my ChipTalk program available for free on my site. Check the install folder.

See http://www.speechchips.com/shop/item.aspx?itemid=13

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.speechchips.com

Speech & Video IC's for BasicStamps

Drone · 2008-01-31 18:58

Hi Everyone,

I spent a few more minutes trying to concatenate the allophones from the SP0256 site I linked above in this thread (thanks KenLem, I'll try your SP0256-AL2 .wav's when I have time). I was able to concatenate the files with a free utility from www.jhepple.com/FxConCat/fx_concat.htm and make them speak the word "Propeller". However, what gets saved after concatenation turns into a 44kHz sampling rate stereo .wav, not like the 7200 Hz 32-bit float .wav reported by Audacity (FLOSS audio editor for Windows from http://audacity.sourceforge.net/) for the individual allophones.

I tried stereo-to-mono and down-sampling the concatenated file in Audacity (latest 1.3+ beta), but when saving to .wav, Audacity crashed in WinXP SP2+. Anyway, I've attached the resultant (and larger than necessary) .wav saying "Propeller". Play it in Windows Media Player 9+, as there is no starting or ending silence yet and some versions of VLC Media Player (my preference) seem to balk at this sometimes.
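
For reference, the stereo-to-mono and down-sampling steps can be sketched in plain Python on lists of integer samples. This is a naive illustration under stated assumptions: the input is interleaved left/right integer samples, and the resampler uses simple linear interpolation with no anti-alias low-pass filter, which a proper tool would apply before reducing the rate. Both function names are hypothetical.

```python
def stereo_to_mono(interleaved):
    """Average each left/right pair of an interleaved stereo sample list."""
    if len(interleaved) % 2:
        raise ValueError("stereo data must contain an even number of samples")
    return [
        (interleaved[i] + interleaved[i + 1]) // 2
        for i in range(0, len(interleaved), 2)
    ]


def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation (no anti-alias filtering)."""
    if not samples:
        return []
    out_len = max(1, len(samples) * dst_rate // src_rate)
    last = len(samples) - 1
    out = []
    for n in range(out_len):
        pos = n * src_rate / dst_rate   # fractional position in the source
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, last)]  # clamp at the final sample
        out.append(round(samples[i] + (nxt - samples[i]) * frac))
    return out
```

Going from 44100 Hz stereo to a low mono rate such as 7200 Hz cuts the sample count by a factor of about twelve, before any change in sample width.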

KenLem, your new allophones are most welcome. Should improve quality.

David

Rayman · 2008-01-31 19:03

Yeah, it sounds about like I thought (like Stephen Hawking's voice...)...

RobotWorkshop · 2008-01-31 19:05

Don't forget to checkout what has already been done with the propeller on the thread below:

http://forums.parallax.com/showthread.php?p=613308

Robert

RobotWorkshop · 2008-01-31 19:11

store.servomagazine.com/product.php?productid=16626&cat=366&page=2

There is also an article in the December 2007 issue of SERVO that relates to phonetic speech. It explains how to use the SpeakJet chip as a replacement for the old SC-01 chip. It uses an SX28 processor as the translator. It worked out really well. From what I've been told the SpeakGin and the SpeakJet are the EXACT same chip so either one could be used. The SoundGin chip is a different one though.

Robert

Drone · 2008-02-01 14:52

I've downloaded and scraped KenLem's SP0256-AL2 allophones from ChipTalk. The results are better than the "Propeller-02.zip" I uploaded previously, but not by much.

The results with ChipTalk reveal that these SP0256-XXX allophones present a problem in true text-to-speech. If I build the word "Propeller" just by typing it in or using the best allophone representations of each letter, the results are far from ideal. You have to intersperse and/or concatenate more allophones in the .wav's that construct the word in order to get what I would call acceptable results. And (the crux of the problem) how you do this varies from word to word, especially for multi-syllabic words. One might improve on things by making more pre-stored allophones based on the combined 59 allophones, but this is a huge task for a human, and storage requirements will grow non-linearly. Another alternative (seemingly taken by other dedicated speech chips) is to build a dictionary of pre-assembled and optimized allophones for common words to complement the base allophones. Anyway, parsing the plain text is still an issue in any case.
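
The dictionary-plus-fallback idea can be sketched as follows. This is a Python illustration with loud assumptions: the allophone names follow the SP0256 naming style, but the word transcriptions and the one-letter-one-sound fallback table are illustrative guesses, not taken from any datasheet or existing dictionary. `text_to_allophones` is a hypothetical name.

```python
import re

# Hypothetical hand-tuned entries for words that matter (illustrative guesses).
WORD_DICTIONARY = {
    "propeller": ["PP", "RR2", "AX", "PP", "EH", "LL", "ER1"],
    "one": ["WW", "AX", "NN1"],
}

# Crude fallback: one allophone per letter. This is exactly the kind of
# letter-by-letter rendering that tends to sound poor for real words.
LETTER_FALLBACK = {
    "a": "AE", "b": "BB1", "c": "KK1", "d": "DD1", "e": "EH", "f": "FF",
    "g": "GG1", "h": "HH1", "i": "IH", "j": "JH", "k": "KK1", "l": "LL",
    "m": "MM", "n": "NN1", "o": "AA", "p": "PP", "q": "KK1", "r": "RR1",
    "s": "SS", "t": "TT1", "u": "AX", "v": "VV", "w": "WW", "x": "KK1",
    "y": "YY1", "z": "ZZ",
}

WORD_PAUSE = "PA3"  # assumed pause clip played between words


def text_to_allophones(text):
    """Map plain text to a flat list of allophone names.

    Dictionary words use their tuned transcription; anything else falls
    back to the per-letter table.
    """
    out = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if out:
            out.append(WORD_PAUSE)
        tuned = WORD_DICTIONARY.get(word)
        out.extend(tuned if tuned else [LETTER_FALLBACK[ch] for ch in word])
    return out
```

The quality of the result rests almost entirely on how many words have tuned dictionary entries, which matches the observation that the right allophone sequence varies from word to word.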

I would argue that for a speaking application with limited words, like a car alarm, or GPS, or perhaps a speaking clock, it is probably better to pre-record text-to-speech for the limited number of words from the likes of a Web-based text-to-speech converter, save them as .wav's on a PC and down-sample them to mono 7200 Hz 32-bit .wav's.

In conclusion, the propeller can certainly handle this natively with external storage. But most words must be parsed and then called for output from optimized constructs (a dictionary) based on optimized sets of allophones, or perhaps better-yet pre-stored stand-alone .wav words. True text-to-speech is probably not possible with this approach, and this has nothing to do with Propeller.

David

hippy · 2008-02-01 22:33

This might be of interest ... bbc.nvg.org/doc/Speech.html

I remember having the software and it really was quite phenomenal. Nowhere near as robotic as the SP0256 chained WAV files sound ( I've never heard a real SP0256 speak ), and all in 8KB/32KB ( can't remember, article not clear on that ), samples and code, including text to speech conversion !

I did have a complete core dump, samples, dictionary rules and disassembled code. Can I find it these days ? No, and I've been looking for near on ten years. It's on just a dozen pages of A4 and hiding somewhere.

Post Edited (hippy) : 2/1/2008 10:43:47 PM GMT

Oldbitcollector (Jeff) · 2008-02-01 22:37

Ah, a bbc link.. Does this mean that it has a British sounding Stephen Hawking voice?? :) <smirk>

OBC


hippy · 2008-02-01 22:40

I seem to recall it was a bit better than some of Dr H's earlier output
