Questions about the SpeakJet
softcon
I've been hunting for an IC speech chip, and although RC Systems is nice (I've used one of their DoubleTalk synthesizers for my screen reader needs in the past), 89 bucks is a bit too much for me to drop every time I need speech output for one of my projects. So, until Parallax gets their new chip to market, I'm hunting for an alternative. I found the SpeakJet (http://www.speechchips.com/shop/item.aspx?itemid=6), which appears to be an all-in-one solution, but it's not clear if this version can be added to the BS2 or Propeller. Plus, it's backordered, so there's no telling how long it'll be before more show up.
However, there's the surface-mount version in 18-SIP format (http://www.speechchips.com/shop/item.aspx?itemid=18),
which claims all you need is +5 V and a speaker to hear output (just like above). But as far as I can tell (unless it's in the manuals), it doesn't say whether or not the pin configuration will fit the 0.30-inch hole spacing on the breadboards I have here.
Also, there's this: http://www.speechchips.com/shop/item.aspx?itemid=4
which seems to indicate it requires the 18-pin version in order to produce speech output. If that's the case, then why does the 18-pin version say it's ready to go?
I'm a little confused. Does anyone who has used these before have any advice and/or pointers on what to purchase? I'd like something that will work with both the BS2 and my QuickStart board, though I know it'll need external power if run from the Propeller.
I'm still relatively new to all this stuff, so I'd sure appreciate some assistance and/or recommendations (other than the v86 chips from RC Systems, as I've already pointed out they're too expensive in this case).
Thanks.
Comments
There seem to be two kinds of chips sold at this site: [a] text-to-speech chips and [b] speech synthesis chips.
The text-to-speech chip is likely to be easier for a beginner to create something useful with. But alas and alack, it appears that the text-to-speech chip requires the speech synthesis chip: you need both [a] and [b].
The speech synthesis chip has a steeper learning curve, as you would have to provide a database of the phonology of whatever you want to produce as coherent speech.
Just being able to spell words isn't enough. For instance, English has six generally accepted letters for vowels: a, e, i, o, u, and y. But these six letters represent roughly 25-26 vowel sounds (the British and Americans are still debating exactly how many vowel sounds English has; just pick a favorite dictionary and use that, it is close enough).
Consonants have their own problems, but are more alphabetically predictable than vowels.
But the worse news is that for the 25-26 vowel sounds your chip is going to produce, there are roughly 125 spellings that represent them. The text-to-speech chip has to sort all this out.
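To make the spelling problem concrete, here is a small Python sketch built on five words that all share the letters "ough" yet each carry a different vowel sound. The word list and the ARPAbet vowel labels (the notation used by the CMU Pronouncing Dictionary) are an illustrative sample chosen here, not data from the thread; a text-to-speech chip needs rules or lookups for every case like this.

```python
# Illustrative sample only: five words spelled with "ough", each mapped to
# its stressed vowel in ARPAbet notation (General American pronunciation).
OUGH_WORDS: dict[str, str] = {
    "though": "OW",   # as in "go"
    "through": "UW",  # as in "too"
    "cough": "AO",    # as in "law"
    "tough": "AH",    # as in "cup"
    "bough": "AW",    # as in "cow"
}


def vowels_for_spelling(words: dict[str, str], spelling: str) -> set[str]:
    """Return the distinct vowel sounds found in words containing `spelling`."""
    return {vowel for word, vowel in words.items() if spelling in word}


if __name__ == "__main__":
    sounds = vowels_for_spelling(OUGH_WORDS, "ough")
    print(f"'ough' maps to {len(sounds)} vowel sounds: {sorted(sounds)}")
```

One letter group, five sounds: a chip cannot get this right from letter-by-letter rules alone.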
So it seems most of us just want a text-to-speech approach and hope that it will do the heavy lifting. Frankly, you are going to get what you pay for. A really good chip would need a rather huge database of words or spelling permutations. And you might find a real PC does it better than any 18-pin DIP might ever achieve; after all, there isn't much room to store data in that tiny piece of silicon.
Personally, I doubt the audio will be very loud if a speaker is hooked directly to it; it may be better suited for an earphone unless you add an output stage. I personally dislike 1 watt or less of audio in a 5-10 watt world of noise. (Try to find a 120-ohm speaker that isn't an earphone. I suspect that is next to impossible. The implied audio power output is 0.125 watts.)
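For anyone checking these numbers, the relationship between power, voltage and speaker impedance is P = V²/R, using RMS voltage and treating the speaker as a plain resistor (a simplification; a real speaker's impedance varies with frequency). A minimal Python sketch of the arithmetic:

```python
import math


def rms_voltage_for_power(watts: float, ohms: float) -> float:
    """RMS voltage needed to deliver `watts` into a resistive load (P = V^2 / R)."""
    return math.sqrt(watts * ohms)


def power_for_rms_voltage(volts_rms: float, ohms: float) -> float:
    """Power delivered into a resistive load for a given RMS voltage."""
    return volts_rms ** 2 / ohms


if __name__ == "__main__":
    # The 0.125 W into 120 ohms figure quoted above.
    v = rms_voltage_for_power(0.125, 120)
    print(f"0.125 W into 120 ohms needs about {v:.2f} V rms")
```

The same two functions make it easy to see why a low-impedance speaker needs an amplifier: the current the chip would have to supply grows as R shrinks.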
I guess what I am trying to say is that they write some very clever advertising that seems to offer an awful lot. Personally, I would prefer creating my own .wav files of words on an SD card, converting them to audio with a Propeller, and boosting that to at least 3 watts of audio output.
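If the words are recorded as separate .wav files, phrases can be built by joining clips. Here is a minimal PC-side sketch using Python's standard `wave` module; it assumes every clip was recorded with the same channel count, sample width and sample rate, and it does not cover the Propeller playback side. `make_silence` is a hypothetical stand-in for a recorded word so the example runs without real files.

```python
import io
import wave


def concatenate_wavs(clips: list[bytes]) -> bytes:
    """Join WAV clips (given as raw file bytes) into a single WAV file.

    Assumes all clips share channels, sample width and sample rate.
    """
    if not clips:
        raise ValueError("need at least one clip")
    out_buf = io.BytesIO()
    params = None
    with wave.open(out_buf, "wb") as out:
        for clip in clips:
            with wave.open(io.BytesIO(clip), "rb") as src:
                if params is None:
                    params = src.getparams()
                    out.setparams(params)  # header is patched on close
                elif src.getparams()[:3] != params[:3]:
                    raise ValueError("clips must share channels, width and rate")
                out.writeframes(src.readframes(src.getnframes()))
    return out_buf.getvalue()


def make_silence(n_frames: int, rate: int = 8000) -> bytes:
    """Hypothetical placeholder for a recorded word: mono 8-bit silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(1)
        w.setframerate(rate)
        w.writeframes(bytes([128]) * n_frames)  # 128 = silence in 8-bit PCM
    return buf.getvalue()


if __name__ == "__main__":
    phrase = concatenate_wavs([make_silence(800), make_silence(400)])
    print(f"joined phrase is {len(phrase)} bytes")
```

In practice the clip bytes would come from files read off the disk, e.g. `open("four.wav", "rb").read()` (file name hypothetical).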
Pin spacing is standard IC DIP. I believe that is .30, and you mention a .30 breadboard. But the surface-mount version is unlikely to fit the .30 spacing; surface-mount packages are usually much tinier. Google "18-pin SOIC" and I think you will see what I mean.
It looks as though you would have to wait and pay out $45 USD for both chips to fool with this.
http://www.sparkfun.com/products/9811
Yes, I'm pretty sure you need both chips to get speech from ASCII characters.
SparkFun also sells SpeakJet chips but they are also currently out of stock.
Both chips at SparkFun will be $47 (plus shipping).
It looks like they have (in current inventory) four shields that include a SpeakJet chip.
I like Loopy's idea of having wav files for each word. I don't know where one finds a dictionary of wav files though. It should be possible to write a computer program that creates them using the PC's text to speech software.
Another option is the Babblebot chip, also sold by Speechchips, which in addition to voice includes sound and music synthesis. Both the SpeakJet and Babblebot (previously called Soundgin) were developed by the same person, and both use pre-programmed PICs. Go to babblebot.com to see what they offer. A second company makes an Arduino shield, plus an *extensive* C++-based library for it.
Both SpeakJet and Babblebot absolutely require an outboard amplifier. A simple LM386 is sufficient, but both should use a simple digital filter to get rid of noise. Sample circuits are provided for both of these products.
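When sizing an output filter, a common starting point is a first-order RC low-pass, whose -3 dB cutoff is f = 1/(2πRC). Whether the sample circuits use exactly this topology is an assumption here; check them for the real part values. The resistor and capacitor values below are hypothetical examples only.

```python
import math


def rc_cutoff_hz(r_ohms: float, c_farads: float) -> float:
    """-3 dB cutoff frequency of a first-order RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)


def capacitor_for_cutoff(r_ohms: float, f_hz: float) -> float:
    """Capacitance giving a desired cutoff for a chosen resistor."""
    return 1.0 / (2.0 * math.pi * r_ohms * f_hz)


if __name__ == "__main__":
    # Hypothetical values: 1 kOhm and 0.1 uF.
    print(f"cutoff: {rc_cutoff_hz(1_000, 100e-9):.0f} Hz")
```

Speech energy sits mostly below a few kHz, so the cutoff is usually chosen to pass that band while attenuating higher-frequency switching noise.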
Speech is MUCH more than stringing allophones or phonemes together. Much of what makes speech intelligible is the transition between these basic vocal tract sounds, not just the sounds themselves. This is why simplistic speech synthesis by concatenating allophones/phonemes simply doesn't work. Though both SpeakJet and Babblebot include transitional algorithms, neither creates what would be called super-intelligible speech. These chips are best used when the speech is short, like counting out numbers or saying very short 3-4 syllable phrases.
If you're looking to repeat stock words or sentences, recording them as WAVs is a good alternative. If you still want a robotic voice you can process the sound clips through something like Goldwave, which has numerous presets for this sort of thing. Or, if you have a keyboard with a vocoder you can create Cylon-sounding voices.
You won't be able to play the WAVs with a BS2 unless you use an outboard WAV player, perhaps one with a serial interface. The Propeller can do it, though most of the samples I've heard include loud pops at the start and end of the clip. Experiment with the bitrates for best results.
-- Gordon
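One common cause of pops at the start and end of a clip is an abrupt jump between silence and the first or last sample. A widely used mitigation (not something suggested in the thread) is a short fade in and fade out applied to the WAV data before it goes on the card. A minimal Python sketch, assuming 8-bit unsigned PCM where 128 is silence:

```python
def apply_fade(samples: list[int], fade_len: int, midpoint: int = 128) -> list[int]:
    """Linearly fade the start and end of an 8-bit unsigned PCM clip.

    Each sample in the first and last `fade_len` positions is pulled toward
    `midpoint` (silence) with a gain ramping 0 -> 1 (start) or 1 -> 0 (end).
    """
    n = len(samples)
    fade_len = min(fade_len, n // 2)  # keep the two ramps from overlapping
    out = list(samples)
    for i in range(fade_len):
        gain = i / fade_len
        out[i] = round(midpoint + (samples[i] - midpoint) * gain)  # fade in
        j = n - 1 - i
        out[j] = round(midpoint + (samples[j] - midpoint) * gain)  # fade out
    return out


if __name__ == "__main__":
    clip = [255] * 10
    print(apply_fade(clip, 4))
```

A few milliseconds of ramp (e.g. 40-80 samples at 8 kHz) is usually inaudible as a fade; if the player's idle output is not at the midpoint, a jump can still occur when playback starts.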
The section of the speech manual discussing this topic is located at: http://developer.apple.com/library/mac/documentation/UserExperience/Conceptual/SpeechSynthesisProgrammingGuide/Phonemes/Phonemes.html#//apple_ref/doc/uid/TP40004365-CH9-SW1
But, anyhow, I was really hunting for an all-in-one solution, so I guess this one won't quite fit the bill.
I want something I can add to my projects in place of a screen, to get output for myself, since in most cases, I'll be the only one using them, and screens do me no good anyhow. It's kind of a hassle to ask someone else to read the screen all the time, especially if it's something more involved than a simple temperature reading.
I'm by no means an expert on speech generation, but I have (mostly) managed to keep up with the general advances in speech generation technology over the years, so generating speech from basic vocal sounds is not new to me.
If this was a one-time shot, I'd probably not mind working out the details to get my project to talk properly, but I don't think I could use it as a general solution, since it's not really robust enough for that, and I doubt I could make it so in the space available on EEPROMs.
It might be fun to fiddle with it though, so maybe I'll obtain one anyhow, but I don't think so at this point.
I appreciate the help posted here, this forum is always full of knowledgeable folks.
GOLDMINE! OK, Electronic Goldmine. Get this: 10x 64-ohm speakers for fitty cent. Yep, a nickel each! Put two in series: a dime, and 128 ohms! Sure, they are probably headphone speakers, but stock up, they're rare!
http://www.goldmine-elec-products.com/prodinfo.asp?number=G18627&utm_source=Goldmine&utm_campaign=5bd7ae8eec-Dec+9+2011&utm_medium=email
While you're ordering, check out their other sale items:
http://www.goldmine-elec.com/?utm_source=Goldmine&utm_campaign=5bd7ae8eec-Dec+9+2011&utm_medium=email
which is also used by the famous "seven" and "monks" demos.
Here it is, I think:
http://forums.parallax.com/showthread.php?89411
From the data sheet, it looks like if you send it a "raw" number, it will mistake that for an op code. For example (and assuming that SPEAK is the TTS256/SpeakJet object), you sent it:
Without proper formatting, per the data sheet, that would cause the TTS256 to "Play the next phoneme with a small amount of stress in the voice" rather than saying "Fourteen". Considering that a long can hold values up to about 2 billion, that would be broken up into a lot of different "word parts". And this ignores the fact that the TTS256 has a maximum buffer of 128 bytes that it will store before it starts talking (you can make it start talking before the buffer is full by sending a CR). And what about the Prop's zero termination in strings? Will that have to be stripped out?
I am going to keep hammering on this, and if I have any great success, I think I may post it to the OBEX.
Anyone out there care to collaborate on this idea?
Robert
To add to what Robert said, you'd want to send numbers as ASCII characters. The "Dec" method in most of the serial objects does this for you.
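Putting the two points together, text sent to the TTS256 should be plain ASCII, contain no zero bytes, and be split so that each piece fits the 128-byte buffer and ends with a CR. Here is a minimal Python sketch of that formatting (Python's `str()` plays the role of "Dec"). It assumes the 128-byte limit includes the CR; check the data sheet before relying on that.

```python
TTS_BUFFER_BYTES = 128  # buffer size quoted from the data sheet in this thread
CR = b"\r"


def tts_chunks(text: str, limit: int = TTS_BUFFER_BYTES) -> list[bytes]:
    """Split text into CR-terminated ASCII chunks of at most `limit` bytes.

    Splits on whitespace so words are not cut in half, and never sends a
    zero byte (no string terminator is transmitted).
    """
    data = text.encode("ascii")
    if b"\x00" in data:
        raise ValueError("text must not contain zero bytes")
    chunks: list[bytes] = []
    current = b""
    for word in data.split():
        candidate = word if not current else current + b" " + word
        if len(candidate) + len(CR) <= limit:
            current = candidate
            continue
        if current:
            chunks.append(current + CR)
        if len(word) + len(CR) > limit:
            raise ValueError("single word longer than the buffer")
        current = word
    if current:
        chunks.append(current + CR)
    return chunks


if __name__ == "__main__":
    for chunk in tts_chunks("Temperature is " + str(72) + " degrees"):
        print(chunk)
```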
I was not thinking about sending raw phonemes. I was talking about breaking numbers like "194" into a 1, a 9, then a 4, using a set of strings to conserve space, then having a set of if statements put it all together based on the position of each digit, so the output would be something to the effect of "One" "Hundred" "Nine" "Tee" "Four". I did not even think about the pass-through mode, even though I read about it several times. Good point, and that is what I will most likely try first, depending on what you find out when sending your info to the pair.
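The digit-position idea can be sketched like this in Python for 0 to 999. It uses whole tens words ("ninety") rather than the "Nine" + "Tee" trick, because that trick only works for some tens: twenty, thirty and fifty do not follow the pattern. The word list stands in for whatever stored strings or clips a real project would use.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_word_parts(n: int) -> list[str]:
    """Break 0..999 into the word parts to speak, based on digit position."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n == 0:
        return ["zero"]
    parts: list[str] = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        parts += [ONES[hundreds], "hundred"]
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        parts.append(TENS[tens])
        if ones:
            parts.append(ONES[ones])
    elif rest:
        parts.append(ONES[rest])  # 1..19 have their own words
    return parts


if __name__ == "__main__":
    print(number_to_word_parts(194))  # ['one', 'hundred', 'ninety', 'four']
```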
Oh yeah. I was already putting in lines so that if the SpeakJet reports it is doing something, the code keeps trying until it is your turn.
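A wait-until-ready loop is simple to write, but it is worth giving it a timeout so a miswired or unpowered chip does not hang the program forever. In this Python sketch, `is_busy` is a hypothetical placeholder for however your hardware reports that the speech chip is still working (for example, a status pin you have wired up).

```python
import time
from typing import Callable


def wait_until_ready(is_busy: Callable[[], bool], timeout_s: float = 5.0,
                     poll_s: float = 0.01) -> bool:
    """Poll `is_busy` until it returns False or the timeout expires.

    Returns True if the chip became ready, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while is_busy():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    return True
```

The caller can then decide what a timeout means: retry, skip the message, or flag an error.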
That was my question: would the TTS see that as an op code, or would it actually cause the SpeakJet to output the number correctly? Making a routine that (when in Dec mode) quickly enables pass-through mode, outputs the number directly to the SpeakJet, then shuts pass-through off would be a very fast workaround.
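The enable/send/disable routine is easiest to get right if the "disable" step is guaranteed to run. This Python sketch shows the pattern as a context manager. `enter_cmd` and `exit_cmd` are deliberately left as parameters: the real pass-through sequences come from the TTS256 data sheet and are not reproduced here, and `write` stands in for your serial transmit function.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def passthrough(write: Callable[[bytes], None], enter_cmd: bytes,
                exit_cmd: bytes) -> Iterator[None]:
    """Temporarily switch the TTS into pass-through mode.

    The exit command is sent even if the code in the `with` block raises,
    so the TTS is never left stuck in pass-through mode.
    """
    write(enter_cmd)
    try:
        yield
    finally:
        write(exit_cmd)
```

Usage: `with passthrough(port_write, ON_SEQ, OFF_SEQ): port_write(codes)`, where all four names are placeholders for your own serial function and data-sheet sequences.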
Rare as hen's teeth. There's a 100 ohm, 2" diameter speaker for $2 at http://www.phanderson.com/picaxe near the bottom of the page
Apparently hens are getting more teeth. Pololu has sold a 100 ohm 30mm speaker for some time. It's REALLY loud.
-- Gordon
When the TTS is active and I send a text string with a number, it speaks the number properly: "Number 85" comes out spoken as "Number eighty five". I haven't tried any larger numbers, so I don't know how it will react.
At the moment I'm running into odd issues when I try putting the TTS chip in passthru mode in order to send codes directly to the speech chip. It's a bit frustrating, and I think I have a spare set that I can use on the Propeller PPDB to see if I can replicate and resolve the problem there. For now I am avoiding pass-through mode or just sending very small sections of codes directly.
Robert
Can anyone tell me the capability of the TTS256 interfaced with the SpeakJet, that is, how many words it can speak? Someone told me it will speak only 8 words. Is that correct or not? Can you help me with this?
http://www.speechchips.com/shop/
They should be able to provide all the details you need since they made that chip.
I have a YouTube video with the TTS256 interfaced with a SpeakJet. It uses a bunch of rules to convert the text to allophones. The TTS256 needs to be used with a chip like the SpeakJet in order to produce speech.
Speechchips.com has a better chip (and less expensive), the SP0-512 aka RoboVoice. erco has an article in this month's Servo Magazine showing how to use it.
The RoboVoice chip is on sale this month (thanks to erco's article), so now is the time to purchase several of them. I think the SpeakJet will soon be dead.
While I think the RoboVoice is a great chip, it doesn't come close to the quality of voice produced by Parallax's Emic 2. The Emic 2 is easier to use than any of the other options I've mentioned (no surprise: it's also more expensive, but still worth the price).