Speech Recognition
Hi all,
How would I go about making my Stamp recognise what I say?
Any help would be appreciated.
Regards
Faisal
Comments
Sorry Faisal, I just couldn't resist.
Ken
The author posits the idea that for simpler 'bots it may be much much much easier for you to learn to speak rudimentary binary than for the 'bot to understand English (his analogy is that you wouldn't use full, unadjusted language to speak to a small child, why would you expect your even simpler 'bot to be any different).
His approach is as follows:
- the speaker (you) speaks an arbitrary tone of some loose duration (say anywhere from 1 to 4 seconds). The rough pitch of the tone is measured and set as the "baseline" (in software ONLY for the current run)
- the 'bot will then expect 4 subsequent "tones", again the durations and spaces are *very* loose. Any tone that is HIGHER in pitch than the baseline is considered binary "1"; any tone lower in pitch than the baseline is binary "0". The 'bot then has your Nibble recognized and decoded. In a sense it's an alternate form of auto-baud-detect serial communication, where the first "start" bit is used to determine the "settings" for the 4 bits to follow.
The idea here is that no matter what the speaker's cadences and pitch (a child has a much higher pitched voice) the 'bot will adjust and decode the Nibble.
I imagine it would be quite funny to hear in person -- you'd be saying things to your 'bot like "hooooom. ummmm. ahhhh. ahhhh. ummmm" (%0110).
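For anyone who wants to try the decode step in software, here is a minimal sketch of the baseline-and-four-tones idea described above. It is not DaCosta's circuit, just the comparison logic. The function name `decode_nibble`, the MSB-first bit order, and treating a tone exactly equal to the baseline as "0" are all my own assumptions; the pitch numbers in the example are made up.

```python
def decode_nibble(baseline_hz, tone_hz):
    """Decode four tone pitches into a 4-bit value.

    Assumptions (not from the book): the first tone is the most
    significant bit, and a tone equal to the baseline counts as 0.
    """
    if len(tone_hz) != 4:
        raise ValueError("expected exactly 4 tones after the baseline")
    value = 0
    for pitch in tone_hz:
        # Higher than the baseline is a 1, anything else is a 0.
        bit = 1 if pitch > baseline_hz else 0
        value = (value << 1) | bit
    return value


# Baseline "hooooom", then low, high, high, low (hypothetical pitches in Hz).
nibble = decode_nibble(200.0, [170.0, 260.0, 250.0, 180.0])
print(format(nibble, "04b"))  # prints 0110
```

Because only "higher or lower than the baseline" matters, the same code works whether the speaker's baseline is 200 Hz or 400 Hz.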
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
When the going gets weird, the weird turn pro. -- HST
1uffakind.com/robots/povBitMapBuilder.php
1uffakind.com/robots/resistorLadder.php
The book is an old Tab Book -- #1141 -- "How to Build Your Own Working Robot Pet", by Frank DaCosta. Circa 1979.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Interesting, so the speaker (you) would sound like the person on the other end of the phone in every Charlie Brown Episode?
Seriously, that's a neat idea. The amount of recognition that you want to do strictly depends on the vocabulary that you want to implement, as well as your memory limitations and processing power. When I was a kid I wrote a program with an ATARI computer that could distinguish the difference between "YES" and "NO", using the PADDLE controller as a means for the audio input. It was based on something very similar to what you are describing.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Beau Schwabe
IC Layout Engineer
Parallax, Inc.
Yes! Exactly! It's so simple and clever, really. And DaCosta implemented his idea nearly 30 years ago. He uses an op-amp and measures the zero-crossings to get a rough pitch. I read the whole chapter and he discards pitches below 160 Hz and above 1250 Hz (but he does his counting and frequency measurements with something like 10 DIP sockets' worth of latches, shift registers and 555 timers -- all of which could be done in firmware). His "ideal" length for the reference and bit pitches is ~1 second, with a 1-4 second pause between each. That gives about a 20 second window for receiving the nibble -- if all the bits aren't received within 20 seconds or so, the input is discarded as a bogus transmission.
The basic circuit is a crystal mic into a two-transistor buffer which feeds an op-amp for detecting the zero-crossing of the frequency.
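As a rough idea of what the firmware version of the zero-crossing counter might look like, here is a small sketch. It assumes the audio has already been digitised into a list of signed samples, which is not how the book's hardware does it. The 160 Hz and 1250 Hz limits are the ones quoted above; the names `zero_crossing_pitch` and `sine`, and the 8000 Hz sample rate, are my own choices for illustration.

```python
import math

MIN_HZ = 160.0    # pitches below this are discarded
MAX_HZ = 1250.0   # pitches above this are discarded


def zero_crossing_pitch(samples, sample_rate):
    """Rough pitch in Hz from rising zero crossings, or None if rejected."""
    if not samples:
        return None
    crossings = 0
    for prev, cur in zip(samples, samples[1:]):
        # Count only negative-to-non-negative transitions: one per cycle.
        if prev < 0 <= cur:
            crossings += 1
    duration = len(samples) / sample_rate
    hz = crossings / duration
    return hz if MIN_HZ <= hz <= MAX_HZ else None


def sine(freq, sample_rate=8000, seconds=1.0):
    """Hypothetical test tone standing in for the microphone signal."""
    n = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq * i / sample_rate + 0.1) for i in range(n)]


print(zero_crossing_pitch(sine(440), 8000))   # close to 440
print(zero_crossing_pitch(sine(100), 8000))   # None, below 160 Hz
```

A real voice is much messier than a sine wave, so expect the count to wander; the scheme only needs to tell "above the baseline" from "below the baseline".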
He chose Nibbles because he says 4 binary "digits" are pretty easy to remember -- going to 6 or 8 bits made it nearly impossible for him to "speak" without a chart. I would tend to agree -- 5-8 bits and you might need a cheat sheet. But given the non-limitations on the firmware, yeah, you could make the "vocabulary" as extensive as your brain could handle.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Even if you build a basic one word command vocabulary and just have a few words, there are certain recognizable "patterns" produced depending on the choice of words used. Obviously there will be several words that might have similar patterns that you will want to avoid. Instead of focusing on the actual frequency, focus on the change in frequency (set a threshold and interpret this as a HIGH or LOW)... sort of like FSK. Also combine this with amplitude patterns. Using just those two speech components might surprise you.
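One possible reading of the "change in frequency plus amplitude" idea, as a sketch only: take a pitch and an amplitude reading per short frame of audio, and turn each sequence into a string of H (rose by more than a threshold) and L (fell by more than a threshold). The function names, the thresholds, and the frame values below are all hypothetical, and skipping small changes is my own assumption.

```python
def change_pattern(values, threshold):
    """Map frame-to-frame changes to 'H' (rise) or 'L' (fall).

    Changes smaller than the threshold are ignored (an assumption).
    """
    pattern = []
    for prev, cur in zip(values, values[1:]):
        delta = cur - prev
        if delta > threshold:
            pattern.append("H")
        elif delta < -threshold:
            pattern.append("L")
    return "".join(pattern)


def word_signature(pitches_hz, amplitudes, pitch_step=30.0, amp_step=0.2):
    """Combine the pitch pattern and the amplitude pattern for one word."""
    return (change_pattern(pitches_hz, pitch_step),
            change_pattern(amplitudes, amp_step))


# Made-up per-frame readings for a single spoken word.
print(word_signature([200, 260, 255, 180], [0.1, 0.6, 0.5, 0.1]))  # ('HL', 'HL')
```

Matching a spoken word would then mean comparing its signature against a small table of stored signatures, and picking command words whose signatures differ clearly from each other.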
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Beau Schwabe
IC Layout Engineer
Parallax, Inc.
Post Edited (Beau Schwabe (Parallax)) : 1/28/2008 3:52:14 AM GMT
If I have time this week I'm going to try breadboarding something up. I think a Stamp can do this if it is dedicated to the task; my preference might be an SX.
Faisal -- sorry to get off on what may have been, for you, a not necessarily productive tangent. I will echo Mike's comments -- it's tricky. Others at the forums have used the VR Stamp (no relation) kit with some degrees of success. My impression is that programming it takes some careful planning, and it doesn't seem cheap.
My own laptop (a Mac) does a nice job of recognizing my voice (after having been trained). Many of the projects I've seen that use voice recognition (or machine vision, for that matter) seem to end up using some kind of microprocessor (i.e. a laptop or desktop PC type system) running higher-end software. Not sure it's something that can be tackled with a standalone microcontroller.