Emic Text-To-Speech
Rayman
Posts: 14,877
I just got my Emic module to work properly with the Prop... Took some work, as the Reset line is very picky about voltage levels and it is a 5V device...
Anyway, here's my code (I know somebody else posted code a long time ago):
Also, I noted my wiring in the driver file...
PS: I noticed that they are back in stock on the Parallax site...
Manual mentioned that more dialects may be coming, but since this device appears to be a few years old already, I'm not going to hold my breath...
Comments
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Paul Baker
Propeller Applications Engineer
Parallax, Inc.
What are you folks recommending these days as a replacement?
http://www.winbond-usa.com/en/content/view/55/159/
http://www.textspeak.com/products.htm
http://www.speakjet.com/
I think we used to carry the last chip; I don't know what our reason was for no longer carrying it (I think it may have been more difficult to work with, but this is conjecture).
The second link is more of an end-use product and is beyond the budget of most hobbyists. The top link points to the most popular chip; a search for WTS701 shows nearly 200 sites which discuss it.
Can't find info from Winbond on your link with respect to the WTS701. This venerable chip was introduced around 2001, but seems hard if not impossible to get.
Hi Rayman,
Text to speech chips seem to be a dying breed - can't even find the venerable SPO256-AL2 which used to be sold even by Radio Shack in the US.
www.speechchips.com covers what's available pretty well. They have a "front end" chip for the Magnevation SpeakJet, but the front end chip plus DIP SpeakJet (seemingly forever on backorder) will run you a total of $38 USD (ouch).
Then there's the DevanTech Speech Module www.acroname.com/robotics/parts/R184-SP03.html which does use the elusive WTS701 and integrates the Winbond chip, speaker, and I2C interface. But at $119 USD - ouch, again!
Certainly the demise of dedicated text-to-speech IC's is in part due to the fact that microcontrollers (even lowly PLD's or CPLD's) can play back stored speech from memory. I seem to remember seeing some ATTiny applications that did this from a small DIP with attached serial EEPROM. Need to look further for code that does "real" text-to-speech on an embedded microcontroller though.
Perhaps someone else can suggest a better solution for text-to-speech. I only searched for around 15 minutes...
Regards, David
My fallback position is to use an SD card to store pre-generated wav files.
AT&T has a nice web page that will generate a wav file from any text you enter with several different voice/language options:
http://www.research.att.com/~ttsweb/tts/demo.php
Cool link to the ATT site for .wav text to speech file generation. But keep in mind that restrictions apply to .wav's generated by this site: (Snip) "Audio files produced on our site are intended only for private, non-commercial use."
You obviously get the picture from my previous post Rayman... but I'd like to add a bit to clarify for others that may be jumping in: There are two solutions to text-to-speech I'll address...
1. Make sound files that can be played back from memory, e.g. from an SD card via a DAC. This allows only pre-stored words, limited by storage space. Good for making a speaking clock, for example. You have a relatively small "dictionary" of stored words, like "one" through "twelve", "AM", "PM", "O'Clock" etc. The top object invokes playing of stored words by calling routines by their text name (e.g. talk.one, talk.oclock).
2. Use a dedicated text-to-speech IC. These are getting hard to find. With a text to speech IC you send data to the chip and it outputs speech based on the words (typically, if not only, in English). This solution gives you complete flexibility to "speak" anything you throw at the text to speech IC, and the heavy lifting is done in the speech IC, not the Propeller. Let's say you wanted to make a speaking terminal connected to the Internet for blind users. This is the way to go.
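To make option 1 concrete, here is a minimal sketch in Python (not Spin) of how a speaking clock might pick its stored words. The function name and the word names ("oclock", "am", "pm") are my own placeholders, assuming one pre-recorded sound file per word; it only handles whole hours.

```python
# Hypothetical names for the pre-recorded words; each would map to one stored sound file.
HOUR_WORDS = ["twelve", "one", "two", "three", "four", "five", "six",
              "seven", "eight", "nine", "ten", "eleven"]


def hour_announcement(hour24):
    """Return the stored-word names to play, in order, for a whole hour (0..23)."""
    if not 0 <= hour24 <= 23:
        raise ValueError("hour24 must be in 0..23")
    # 0 and 12 both speak as "twelve"; hours below 12 are AM, the rest PM.
    return [HOUR_WORDS[hour24 % 12], "oclock", "am" if hour24 < 12 else "pm"]
```

A player loop would then just walk the returned list and play each file back to back, which is why the vocabulary stays so small for this kind of application.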
True text-to-speech is possible in Propeller-native (IMHO), but you need a lot of knowledge about DSP etc. to build the app. This is where a C compiler for Propeller may help as there's likely C stuff out there already that may form a basis.
One more note: I took a quick look at some fundamental digital speech techniques with regards to patents. As suspected, the now dysfunctional (if not corrupt) US Patent and Trademark Office (USPTO) seems to have issued patents for stuff that's been "out there" for decades to the likes of Micro$oft, Apple, et al. This issue isn't limited to speech generation, but text-to-speech too. So if you want to go to "production" with your text to speech embedded in Propeller, make sure you set aside a lot of money for greedy lawyers.
On this last paragraph regarding patents - I don't want this thread to blow up on this controversial issue. Hence I'll only stick to the topic of text to speech in any future posts to this thread.
Regards. David
http://forums.parallax.com/forums/default.aspx?f=25&p=1&m=152683
I have some phoneme sets that I made on the APPLE II about 20 years ago in about 10kbytes each.
It's on my to do list to port them to Spin. But what was awesome in 1984 may be awful in 2008.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
"A complex design is the sign of an inferior designer." - Jamie Hyneman, Myth Buster
DGSwaner
http://www.ivosoftware.com/ivonaonline.php
I'm now doing a mix of Emic and wav file playback...
The Emic makes you tell it to accept audio input... I suppose that's a good idea...
Question, with the ability to play wav files (thanks Ray) couldn't we just record the basic sounds
and play them back to create words as needed? The answer seems simple, but then that's because
I don't have expertise here.. :)
OBC
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
New to the Propeller?
Getting started with the Protoboard? - Propeller Cookbook
Got an SD card? - PropDOS
A Living Propeller FAQ - The Propeller Wiki
(Got the Knowledge? Got a Moment? Add something today!)
Tim
Edit: P.S. Look under the 'Downloads' heading for demo voice samples to check out. Oh, it comes in a 3.3 volt version too.
Post Edited (Tim-M) : 1/29/2008 6:39:08 PM GMT
That always works for me,
but you need to get each of the phonetic sounds all in the same monotone pitch to start with,
or else they sound like a skipping codec when you put them together.
BTW - I found the venerable SPO256-AL2 speech IC's for sale at www.imagesco.com/semiconductors/index.html for $33.95 USD. Ouch - they used to cost $13 at Radio Shack decades ago.
Tim, Whoa! That VStamp is $84 bucks...
There's gotta be a way to get the Prop to do this in software.
Rgds, David
Post Edited (Drone) : 1/30/2008 12:33:54 PM GMT
Somebody made a PIC16F628 and a 24LC512 EEPROM speak by programming the allophones seemingly scraped from a SPO256-AL2 into it. Seems like the stuff may be downloadable - but I don't have time to study the sites right now.
Links:
http://home.alphalink.com.au/~derekw/pictalker/main.htm
www.isk.kth.se/kursinfo/6b4059/pictalk/fonems/index.htm
Rgds, David
Edited: The .wav's for the allophones are given in the page www.isk.kth.se/kursinfo/6b4059/pictalk/fonems/index.htm I linked above! Right-click and save them.
David
Post Edited (Drone) : 1/30/2008 1:46:00 PM GMT
This thread may be turning into a cutting-edge thing: Making the Propeller "Speak", largely on-chip; with minimal external hardware (perhaps just a larger external EEPROM and DS-D/A-LPF)... of course founded on the works of others (.wav from SD card project for example, credit-due).
I downloaded the allophones and used a (seemingly buggy) utility to concatenate allophones in WinXP to speak "Propeller". Results were "choppy" at best but somewhat understandable, a good first step given I've only spent an hour or two on this. I don't want to give out the link for the WinXP concatenate utility yet cause (as I said before) I think it is buggy and there are probably better alternatives out there (read below).
The 59 .wav allophones are around 68kBytes total, too large for the likes of the Prop Protoboard, so we need more storage. Speed-wise, the PIC example seems to indicate reading an EEPROM may be ok, even over I2C, and an SD card should be ok too. I want to get these concatenated .wav allophones to at least speak one word (i.e. Propeller) in a relatively clear manner in a concatenated .wav on a PC before even thinking about moving to the Prop.
It is my nature to gravitate to Shell/Perl scripts with the likes of SOX and Mplayer etc. in Linux/xBSD to solve this (these mostly command-line Linux/xBSD tools have a steep learning curve but are powerful).
However, as most posters here seem to be Windows users, maybe I (or somebody) should fire-up Audacity http://audacity.sourceforge.net/ the free and open audio editor app and crunch these allophone .wav files first and try to build a word from the allophones (it seems Audacity 1.3+ has a "Multiple clips per track" feature added that may help, current version is 1.3.4 Beta as of this writing, 1.2.6 is Stable).
Getting these scraped SPO256 allophones to first speak in Windows, concatenated into a word .wav file (first successful word challenge is "Propeller"), is IMHO a first step. But reverse-engineering the PICTalk source (links above in this thread) is another alternative, as there seems to be an allophone.bin that may be usable.
Regards, David (Drone)
Post Edited (Drone) : 1/30/2008 4:57:56 PM GMT
I don't expect this will sound all that great (it will be like 80's text-to-speech), but the price is right!
See http://www.speechchips.com/shop/item.aspx?itemid=13
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.speechchips.com
Speech & Video IC's for BasicStamps
I spent a few more minutes trying to concatenate the allophones from the SPO256 site linked to by me above in this thread (thanks KenLem, I'll try your SPO256-AL2 .wav's when I have time). I was able to concatenate the files with a free utility from www.jhepple.com/FxConCat/fx_concat.htm and make them speak the word "Propeller". However, what gets saved after concatenation turns into a 44kHz sampling rate stereo .wav, not like the 7200 Hz 32-bit float .wav reported by Audacity (FLOSS audio editor for Windows from http://audacity.sourceforge.net/) for the individual allophones.
I tried stereo-to-mono and down-sampling the concatenated file in Audacity (latest 1.3+ beta) then saving to .wav, but Audacity crashed in WinXPSP2+. Anyway, I've attached the resultant (and larger than necessary) .wav saying "Propeller". Play it in Windows Media Player 9+ as there is no starting or ending silence yet, and some versions of VLC Media Player (my preference) seem to balk at this sometimes.
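For anyone who would rather script the concatenation than fight a GUI tool, here is a rough Python sketch using only the standard library. It is not what the utilities above do internally; it assumes every input is plain 16-bit PCM with the same channel count and sample rate (Python's wave module does not read 32-bit float files, so those would need converting first). The helper names concat_wavs, stereo_to_mono and silence_wav are made up for illustration.

```python
import io
import struct
import wave


def silence_wav(n_frames, channels=1, rate=8000):
    """Build an in-memory 16-bit PCM WAV of silence (a stand-in for an allophone file)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * (n_frames * channels))
    buf.seek(0)
    return buf


def concat_wavs(sources, dest):
    """Join PCM WAV inputs (paths or file objects) end to end into dest, keeping their format."""
    fmt = None
    chunks = []
    for src in sources:
        with wave.open(src, "rb") as w:
            this_fmt = (w.getnchannels(), w.getsampwidth(), w.getframerate())
            if fmt is None:
                fmt = this_fmt
            elif this_fmt != fmt:
                raise ValueError("all inputs must share channels, sample width and rate")
            chunks.append(w.readframes(w.getnframes()))
    if fmt is None:
        raise ValueError("no input files given")
    with wave.open(dest, "wb") as out:
        out.setnchannels(fmt[0])
        out.setsampwidth(fmt[1])
        out.setframerate(fmt[2])
        out.writeframes(b"".join(chunks))


def stereo_to_mono(frames):
    """Average the left/right samples of interleaved 16-bit little-endian stereo PCM bytes."""
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    mono = [(samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples) - 1, 2)]
    return struct.pack("<%dh" % len(mono), *mono)
```

Because the output keeps the input format, nothing gets silently blown up to 44kHz stereo along the way. Down-sampling is deliberately left out here, since doing it properly needs a low-pass filter first.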
KenLem, your new allophones are most welcome. Should improve quality.
David
http://forums.parallax.com/showthread.php?p=613308
Robert
There is also an article in the December 2007 issue of SERVO that relates to phonetic speech. It explains how to use the SpeakJet chip as a replacement for the old SC-01 chip. It uses an SX28 processor as the translator. It worked out really well. From what I've been told the SpeakGin and the SpeakJet are the EXACT same chip so either one could be used. The SoundGin chip is a different one though.
Robert
The results with ChipTalk reveal these SPO256-XXX allophones present a problem in true text-to-speech. If I build the word "Propeller" just by typing it in or using the best allophone representations of each letter, the results are far from ideal. You have to intersperse and/or concatenate more allophones in the .wav's that construct the word in order to get what I would call acceptable results. And (the crux of the problem) how you do this varies from word to word, especially for multi-syllabic words. One might improve on things by making more pre-stored allophones based on the combined 59 allophones, but this is a huge task for a human, and storage requirements will grow non-linearly. Another alternative (seemingly taken by other dedicated speech chips) is to build a dictionary of pre-assembled and optimized allophones for common words to complement the base allophones. Anyway, parsing the plain text is still an issue in any case.
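The dictionary-plus-fallback idea can be sketched in a few lines of Python. Everything here is illustrative: the function name, the allophone sequence shown for "propeller" and the tiny letter map are placeholders I made up, not tuned values and not taken from ChipTalk or any chip's firmware.

```python
def text_to_allophones(text, dictionary, letter_map, pause="PA3"):
    """Translate text into a flat list of allophone names.

    Words found in `dictionary` use their hand-tuned sequence; any other word
    falls back to a naive letter-by-letter lookup in `letter_map`. A pause
    allophone is appended after every word.
    """
    out = []
    for raw in text.lower().split():
        word = raw.strip(".,!?;:")
        if not word:
            continue
        if word in dictionary:
            out.extend(dictionary[word])
        else:
            # Letters missing from the map are skipped rather than guessed.
            out.extend(letter_map[ch] for ch in word if ch in letter_map)
        out.append(pause)
    return out


# Illustrative tables only; a usable system needs far larger, hand-tuned ones.
WORDS = {"propeller": ["PP", "RR2", "OW", "PP", "EH", "LL", "ER1"]}
LETTERS = {"h": "HH1", "i": "IH"}
```

With these tables, text_to_allophones("Hi Propeller!", WORDS, LETTERS) gives the fallback sequence for "hi" followed by the dictionary entry for "propeller". The fallback branch is exactly where the quality problem described above shows up, so in practice most of the effort goes into growing the dictionary.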
I would argue that for a speaking application with limited words, like a car alarm, or GPS, or perhaps a speaking clock, it is probably better to pre-record text-to-speech for the limited number of words from the likes of a Web-based text-to-speech converter, save them as .wav's on a PC and down-sample them to mono 7200 Hz 32-bit .wav's.
In conclusion, the propeller can certainly handle this natively with external storage. But most words must be parsed and then called for output from optimized constructs (a dictionary) based on optimized sets of allophones, or perhaps better-yet pre-stored stand-alone .wav words. True text-to-speech is probably not possible with this approach, and this has nothing to do with Propeller.
David
I remember having the software and it really was quite phenomenal. Nowhere near as robotic as the SPO256 chained WAV files sound ( I've never heard a real SPO256 speak ), and all in 8KB/32KB ( can't remember, article not clear on that ), samples and code, including text to speech conversion !
I did have a complete core dump, samples, dictionary rules and disassembled code. Can I find it these days ? No, and I've been looking for near on ten years. It's on just a dozen pages of A4 and hiding somewhere.
Post Edited (hippy) : 2/1/2008 10:43:47 PM GMT
OBC