View Full Version : Questions about the speakjet



softcon
12-18-2011, 05:01 PM
I've been hunting for an IC speech chip, and although RC Systems is nice (I've used one of their DoubleTalk synthesizers for my screen reader needs in the past), 89 bucks is a bit too much for me to drop every time I need speech output for one of my projects. So, until Parallax gets their new chip to market, I'm hunting for an alternative. I found the SpeakJet (http://www.speechchips.com/shop/item.aspx?itemid=6), which appears to be an all-in-one solution, but it's not clear if this version can be added to the BS2 or Propeller. Plus, it's backordered, so there's no telling how long it'll be before more show up.
However, there's the surface mounted version in 18-sip format (http://www.speechchips.com/shop/item.aspx?itemid=18), which claims all you need is +5V and a speaker to hear output (just like above). But as far as I can tell (unless it's in the manuals), it doesn't say whether or not the pin configuration will fit the 0.3 inch spacing on the breadboards I have here.
Also, there's this: http://www.speechchips.com/shop/item.aspx?itemid=4 which seems to indicate it requires the 18-pin version in order to produce speech output. If that's the case, then why does the 18-pin version say it's ready to go?
I'm a little confused. Does anyone who has used these before have any advice and/or pointers on what to purchase? I'd like something that will work with both the BS2 and my QuickStart board, though I know it'll need external power if run from the Propeller.
I'm still relatively new to all this stuff, so I sure would appreciate some assistance and/or recommendations (other than the V86 chips from RC Systems, as I've already pointed out they're too expensive in this case).
Thanks.

Loopy Byteloose
12-18-2011, 05:41 PM
Welcome to where the whacky world of phonology meets the digital era.

There seem to be two kinds of chips sold at this site: [a] text-to-speech chips and [b] speech synthesis chips.

The text-to-speech chip is likely to be easier for a beginner to create something useful. But alas and alack, it appears that the text-to-speech chip requires the speech synthesis chip: both [a] and [b].

The speech synthesis chip will have a steeper learning curve, as you would have to provide a database of phonology associated with whatever you want to produce as coherent speech.

Just being able to spell words isn't enough. For instance, English has six generally accepted alphabetical indicators of vowels - a, e, i, o, u, and y. But these six letters represent roughly 25-26 vowel sounds (the British and Americans are still debating exactly how many vowel sounds English has - just pick a favorite dictionary and use that, it is close enough).

Consonants have their own problems, but are more alphabetically predictable than vowels.

But the worse news is that for the 25-26 vowel sounds your chip is going to produce, there are roughly 125 ways that spelling represents these. The text-to-speech chip has to sort all this out.

So it seems most of us just want a text-to-speech approach and hope that it will do the heavy lifting. Frankly, you are going to get what you pay for. A really good chip would have to have a rather huge database of words or spelling permutations. And you might find a real PC does it better than any 18-pin DIP might ever achieve - after all, there isn't much room to store data in that tiny piece of silicon.

Personally, I am doubtful that the audio will be very loud if any speaker is hooked directly to it - it may be better suited for an earphone unless you have an added output stage. I personally dislike 1 watt or less audio in a 5-10 watt world of noise. (Try to find a 120ohm audio speaker that isn't an ear phone. I suspect that is next to impossible. The implied audio power output is .125 watts.)

I guess what I am trying to say is that they write some very clever advertising that seems to offer an awful lot. Personally, I would prefer going with creating my own .wav files of words on an SDcard that are converted to audio with a Propeller and boosted to at least 3 watts of audio output.

Pin spacing is standard IC DIP. I believe that is .30 inch, and you mention a .30 breadboard. But that surface mount part is likely not to fit the .30 spacing - it is usually much tinier. Google 18-pin SOIC and I think you will find out what I mean.

It looks as though you would have to wait and pay out $45 USD for both chips to fool with this.

erco
12-18-2011, 06:20 PM
http://www.youtube.com/watch?v=PZw8_1fMx14

Duane Degn
12-18-2011, 06:42 PM
SparkFun also sells the chip to translate text to allophone addresses used by the SpeakJet.

http://www.sparkfun.com/products/9811

Yes, I'm pretty sure you need both chips to get speech from ASCII characters.

SparkFun also sells SpeakJet chips (http://www.sparkfun.com/products/9578) but they are also currently out of stock.

Both chips at SparkFun will be $47 (plus shipping).

It looks like they have (in current inventory) four shields (http://www.sparkfun.com/products/9799) that include a SpeakJet chip.

I like Loopy's idea of having wav files for each word. I don't know where one finds a dictionary of wav files though. It should be possible to write a computer program that creates them using the PC's text to speech software.

GordonMcComb
12-18-2011, 08:14 PM
SpeakJet uses 72 allophones plus complex transition analysis to produce speech. It uses basic serial (2400 or 9600 bps) for communications, and is compatible with the BS2. The primary source is speechchips.com, who also developed and sell the text-to-speech chip for it. You can also download a free text-to-speech translator from the developer's website, Magnevation.com. The application, written for Windows, produces the allophonic translation for text you enter into a "say it" box.

Another option is the Babblebot chip, also sold by Speechchips, which in addition to voice includes sound and music synthesis. Both the SpeakJet and Babblebot (used to be called Soundgin) were developed by the same person, and both use pre-programmed PICs. Go to babblebot.com where you can see what they offer. A second company makes an Arduino shield, plus an *extensive* C++-based library for it.

Both SpeakJet and Babblebot absolutely require an outboard amplifier. A simple LM386 is sufficient, but both should use a simple digital filter to get rid of noise. Sample circuits are provided for both of these products.

Speech is MUCH more than stringing allophones or phonemes together. Much of what makes speech intelligible is the transition between these basic vocal tract sounds, not just the sounds themselves. This is why simplistic speech synthesis using allophonic/phonemic concatenation simply doesn't work. Though both SpeakJet and Babblebot include transitional algorithms, neither creates what would be called super-intelligible speech. These chips are best used when the speech is short, like counting out numbers, or saying very short 3-4 syllable phrases.

If you're looking to repeat stock words or sentences, recording them as WAVs is a good alternative. If you still want a robotic voice you can process the sound clips through something like Goldwave, which has numerous presets for this sort of thing. Or, if you have a keyboard with a vocoder you can create Cylon-sounding voices.

You won't be able to play the WAVs with a BS2 unless you use an outboard WAV player that uses (perhaps) a serial interface. The Propeller can do it, though most of the samples I've heard include loud pops at the start and end of the clip. Experiment with the bitrates for best results.

-- Gordon

softcon
12-18-2011, 09:09 PM
Oh, I'm well versed in the world of synthesized speech. I've been using computers since 1986, and since I've been blind that whole time, I've run the gauntlet of all sorts of methods of speech generation. In 1986, there were very few options for speech output, and the most popular was Artic Technologies (http://www.artictech.com). They used an SSI263 chip, which used a total (yes, total for all sounds) of 256 phonemes. It did pretty well considering, and Artic products are still used here and there, though the company quit selling them 10 or more years ago (they were built very well). I still have a few here myself. Even VoiceOver (the screen reader on the Mac) has a phoneme mode where you can have the synthesizer build words from basic sounds. I've not counted to see how many Apple provides, but it's quite an extensive list.
The section of the speech manual discussing this topic is located at: http://developer.apple.com/library/mac/documentation/UserExperience/Conceptual/SpeechSynthesisProgrammingGuide/Phonemes/Phonemes.html#//apple_ref/doc/uid/TP40004365-CH9-SW1
But, anyhow, I was really hunting for an all-in-one solution, so guess this one won't quite fit the bill.
I want something I can add to my projects in place of a screen, to get output for myself, since in most cases, I'll be the only one using them, and screens do me no good anyhow. It's kind of a hassle to ask someone else to read the screen all the time, especially if it's something more involved than a simple temperature reading.
I'm by no means an expert on speech generation, but I have (mostly) managed to keep up with the general advances in speech generation technology over the years, so generating speech from basic vocal sounds is not new to me.
If this was a one-time shot, I'd probably not mind working out the details to get my project to talk properly, but I don't think I could use it as a general solution, since it's not really robust enough for that, and I doubt I could make it so in the available space provided on eeproms.
It might be fun to fiddle with it though, so maybe I'll obtain one anyhow, but I don't think so at this point.
I appreciate the help posted here, this forum is always full of knowledgeable folks.

erco
12-22-2011, 04:24 AM
(Try to find a 120ohm audio speaker that isn't an ear phone. I suspect that is next to impossible. The implied audio power output is .125 watts.)


GOLDMINE! OK, Electronic Goldmine. Get this, 10x 64-ohm speakers for fitty cent. Yep. a Nickel each! Put two in series. A dime, and 128 ohms! Sure they are probably headphone speakers, but stock up, they're rare!

http://www.goldmine-elec-products.com/prodinfo.asp?number=G18627&utm_source=Goldmine&utm_campaign=5bd7ae8eec-Dec+9+2011&utm_medium=email

While you're ordering, check out their other sale items:
http://www.goldmine-elec.com/?utm_source=Goldmine&utm_campaign=5bd7ae8eec-Dec+9+2011&utm_medium=email

VIRAND
12-30-2011, 08:00 AM
A couple of years ago someone posted a phonetic speech synthesizer demo in the Propeller forum, which was based on the vocal tract object, which is also used by the famous "seven" and "monks" demos.

Here it is, I think:
http://forums.parallax.com/showthread.php?89411

Sariel
01-12-2012, 07:25 PM
This must be going around like the flu. I have been thinking on this subject for a while now, and I am in the beginning stages of writing an object for the TTS256/SpeakJet pair. My idea is to have it be a serial handler that can output strings, decimal values, and maybe even hex values. Since I have not gotten my parts yet, I was wondering something...

It looks like from the data sheet that if you send it a "raw" number, it will mistake that for an op code. For example (and assuming that SPEAK is the TTS256/SpeakJet object) you send it:



SPEAK.dec(14)


Without proper formatting, per the data sheet, that would cause the TTS256 to "Play the next phoneme with a small amount of stress in the voice", rather than saying "Fourteen". Considering that a long can hold values up to 2 billion give or take, a large number would be broken up into a lot of different "word parts". And this is ignoring the fact that the TTS256 has a max buffer of 128 bytes that it will store before it starts talking (you can get output from the TTS256 before the buffer is full by sending a CR). And what about the Prop's zero termination in strings? Will that have to be stripped out?
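
The buffer and terminator questions above can be sketched in a few lines. This is only an illustration in Python (not Propeller code) of one way to handle it: strip any zero terminator, split the text on word boundaries, and end each piece with a CR so it fits the 128-byte buffer mentioned in the post. The function name chunk_for_tts and the constants are made up for this example, and the 128-byte figure should be checked against the data sheet.

```python
CR = b"\r"          # carriage return: per the post, makes the TTS256 start talking
BUFFER_LIMIT = 128  # buffer size quoted in the post (assumption: verify in the data sheet)


def chunk_for_tts(text, limit=BUFFER_LIMIT):
    """Split text into CR-terminated byte chunks that fit within limit bytes.

    Assumes no single word is longer than limit - 1 bytes.
    """
    # Drop any zero terminator so it is never sent to the chip.
    words = text.replace("\x00", "").split()
    chunks, current = [], ""
    for word in words:
        candidate = word if not current else current + " " + word
        # Reserve one byte for the trailing CR.
        if len(candidate.encode("ascii")) + 1 <= limit:
            current = candidate
        else:
            if current:
                chunks.append(current.encode("ascii") + CR)
            current = word
    if current:
        chunks.append(current.encode("ascii") + CR)
    return chunks
```

Each returned chunk would then be written to the serial line in order, ideally waiting for the chip to finish speaking between chunks.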

I am going to keep hammering on this, and if I have any great success, I think I may post it to the Obex.

Anyone out there care to collaborate on this idea?

RobotWorkshop
01-12-2012, 07:38 PM
I have the TTS256/SpeakJet combo in one of my Propeller-based robots. At the moment I've only been sending text to it but can try sending some numbers to see how it reacts. I believe you'll need to send it as ASCII text like "14" if you want it to speak the numbers. Also, for sending raw phonemes you should enable the passthrough mode first so it will send them along to the SpeakJet. I was running into problems sending the phonemes directly and need to take another look at my code to see if it is a programming issue. With the Propeller I am also monitoring the SpeakJet status lines like the BHF, since the Propeller still needs to be aware of whether it is OK to send more to the chip or not.

Robert

Duane Degn
01-12-2012, 07:51 PM
Sariel,

To add to what Robert said, you'd want to send numbers as ASCII characters. The "Dec" method in most of the serial objects does this for you.
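
To make the difference concrete, here is a small Python sketch (for illustration only, not Propeller code) comparing the raw byte 14 with the ASCII digits a "Dec"-style method would send. The helper name dec_bytes is hypothetical.

```python
def dec_bytes(value):
    """Return value as ASCII decimal digits, the way a serial 'Dec' method would."""
    return str(value).encode("ascii")


raw = bytes([14])      # a single byte with value 14, which the chip may treat as a control code
text = dec_bytes(14)   # two bytes: the characters "1" and "4" (0x31, 0x34)
```

Only the second form looks like text to the chip, which is why "14" sent as characters should be what gets spoken as a number.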

Sariel
01-12-2012, 08:35 PM
can try sending some numbers to see how it reacts
Please do. I am most curious on how this is going to work.


sending raw phonemes
I was not thinking about sending raw phonemes. I was talking about breaking numbers up, for example "194" into a 1, a 9, then a 4, with a small set of strings to conserve space and a set of if statements to put it all together based upon the location of each digit, so the output would be a group something to the effect of "One" "Hundred" "Nine" "Tee" "Four". I did not even think about the pass through mode, even though I read about it several times. Good point, and that is what I will most likely try first, depending on what you find out when sending your info to the pair.
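
The digit-by-position idea can be sketched like this. It is a Python illustration of the logic only (not Spin), limited to 0 through 999, and it uses whole words such as "ninety" from a lookup table rather than the "Nine" + "Tee" split described above. The names ONES, TENS and number_to_words are made up for this example.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer from 0 to 999 as a list of words."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    words = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        words += [ONES[hundreds], "hundred"]
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        words.append(TENS[tens])
        if ones:
            words.append(ONES[ones])
    elif rest or not words:
        # Covers 1..19, and a bare "zero" when nothing else was said.
        words.append(ONES[rest])
    return words
```

The resulting words could be joined with spaces and sent to the TTS256 as ordinary text.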


monitoring the Speakjet status lines
Oh yeah. I was putting lines in already that if it says the speakjet is doing something, keep trying until it is your turn.
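
A rough sketch of that "keep trying until it is your turn" loop, written in Python just to show the logic. The callables is_busy and write_byte are stand-ins for reading a status line and writing to the serial pin; they are assumptions for this example, not part of any existing object.

```python
import time


def send_when_ready(data, is_busy, write_byte, poll_s=0.001, timeout_s=2.0):
    """Send data one byte at a time, waiting while is_busy() returns True."""
    for byte in data:
        deadline = time.monotonic() + timeout_s
        while is_busy():
            # Give up rather than hang forever if the status line never clears.
            if time.monotonic() > deadline:
                raise TimeoutError("chip stayed busy too long")
            time.sleep(poll_s)
        write_byte(byte)
```

On real hardware is_busy would read the status pin and write_byte would call the serial transmit routine; the timeout is there so a wiring fault does not lock up the program.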


want to send numbers as ASCII characters

That was my question: whether the TTS would see that as an op code, or whether it would actually cause the SpeakJet to output the number correctly. Making a routine that (when in dec mode) quickly enables pass through mode, outputs the number directly to the SpeakJet, then shuts pass through off would be a very fast workaround.

erco
02-08-2012, 12:37 AM
Try to find a 120ohm audio speaker that isn't an ear phone.

Rare as hen's teeth. There's a 100 ohm, 2" diameter speaker for $2 at http://www.phanderson.com/picaxe near the bottom of the page

GordonMcComb
02-08-2012, 01:45 AM
Rare as hen's teeth.

Apparently hens are getting more teeth. Pololu has sold a 100 ohm 30mm speaker for some time. It's REALLY loud.

-- Gordon

erco
02-08-2012, 04:55 PM
Touché, Gordon! My learned opponent finds another 100-ohm speaker for $2.

Another rooster (and more teeth) in the henhouse !

RobotWorkshop
02-11-2012, 05:14 AM
When the TTS is active and I send a text string with a number it speaks the number properly. "Number 85" comes out spoken as "Number eighty five" I haven't tried any larger numbers so I don't know how it will react.

At the moment I'm running into odd issues when I try putting the TTS chip in passthrough mode in order to send codes directly to the speech chip. It's a bit frustrating and I think I have a spare set that I can use on the Propeller PPDB to see if I can replicate and resolve the problem there. At the moment I am avoiding the passthrough mode or just sending very small sections of codes direct.

Robert

hemavathy
11-02-2012, 08:06 AM
Hi,
Can anyone tell me the capability of the TTS256 interfaced with the SpeakJet, that is, how many words it can speak? Someone told me that it will speak only 8 words. Is that correct or not? Can you help me with this?

RobotWorkshop
11-02-2012, 01:19 PM
You'll want to check with the developer of the TTS256 chip which you can find here:

http://www.speechchips.com/shop/

They should be able to provide all the details you need since they made that chip.

Duane Degn
11-02-2012, 03:55 PM
I have a YouTube video (http://www.youtube.com/watch?v=8915Vs0mzck) with the TTS256 (https://www.sparkfun.com/products/9811?) interfaced with a SpeakJet. It uses a bunch of rules to convert the text to allophones. The TTS256 needs to be used with a chip like the SpeakJet (https://www.sparkfun.com/products/9578) in order to produce speech.

Speechchips.com has a better chip (and less expensive), the SP0-512 aka RoboVoice (http://www.speechchips.com/shop/item.aspx?itemid=22). erco has an article in this month's Servo Magazine showing how to use it.

The RoboVoice chip is on sale this month (thanks to erco's article) so now is the time to purchase several of them. I think the SpeakJet will soon be dead.

While I think the RoboVoice is a great chip, it doesn't come close to the quality of voice produced by Parallax's Emic 2 (http://www.parallax.com/StoreSearchResults/tabid/768/List/0/SortField/4/ProductID/105/Default.aspx?txtSearch=Emic+2). The Emic 2 is easier to use than any of the other options I've mentioned (no surprise, it's also more expensive (but still worth the price)).