Goertzel-based speech "recognizer" (now with source code)
Phil Pilgrim (PhiPi)
Posts: 23,514
This is pretty much just a tease for now. I'm not even posting source code yet. But I'm curious to see what kind of accuracy people might get with the attached binary. It requires a Propeller demo board and an attached NTSC monitor. That's all.
When run, you will be prompted to say the words, "left", "right", "forward", "reverse", and "stop" three times each. Then you will be asked to say any of these words, one at a time. After each utterance, each word will be displayed with three scores, one for each training template, and a decision algorithm will determine which word you said. That's it. Nothing fancy. You can even substitute different words during training, if you like: e.g. their equivalents in another language, say, or the digits "one" through "five".
Try it at different microphone distances. It's easiest to watch the monitor if you're not bent over the board speaking directly into the mic. I'm typically leaning back in my chair with the board on the bench.
Anyway, enjoy, and report back if you think it's worth pursuing further.
Thanks,
-Phil
Update: Deleted the binary and added source code.
Post Edited (Phil Pilgrim (PhiPi)) : 9/4/2009 11:52:33 PM GMT
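Editor's sketch: the post describes three scores per word (one per training template) feeding a decision step, but not the rule itself. A minimal guess at such a rule is below, assuming that each score is a distance where lower means a closer match, and that a near-tie between words triggers a "say again, please" style rejection. The function name `decide` and the `min_margin` threshold are hypothetical and do not come from Phil's code.

```python
def decide(scores, min_margin=0.1):
    """Pick the best-matching word, or None if the result is too close to call.

    scores: dict mapping word -> list of distances, one per training template
            (ASSUMPTION: lower distance means a better match).
    min_margin: hypothetical relative gap required between the best word
                and the runner-up. Needs at least two words in `scores`.
    """
    # Best (smallest) template distance for each word, sorted ascending.
    ranked = sorted((min(dists), word) for word, dists in scores.items())
    (best_dist, best_word), (next_dist, _) = ranked[0], ranked[1]
    # Reject when the runner-up is within min_margin of the winner.
    if next_dist - best_dist <= min_margin * next_dist:
        return None
    return best_word
```

Returning `None` here stands in for asking the speaker to repeat the word.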
Comments
I've got a working implementation of the goertzel algorithm that is written in the processing environment if anyone wants another example for this kind of approach.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Not the fish.
sites.google.com/site/bitwinproject/
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve
Propeller Tools
Looking forward to testing this tonight after work.
OBC
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
New to the Propeller?
Visit the: The Propeller Pages @ Warranty Void.
I have worked with it for the past half hour.
What I have noticed is that "left" and "right" are not recognized about 50% of the time after training, or "right" is read as "left" a lot of the time. "Forward", "reverse", and "stop" are almost spot on, 98% of the time. Might it be that the sampling time of a one-syllable word is too short? But then "stop" is one syllable too, though it has very different dynamics from "left" and "right".
Going to play some more and see if I can get a better response pattern written down.
When training it on "right" and "left", try extending the length of your pronunciation, and emphasize the "l" and "r" sounds. That will provide a more detailed template. When testing, every utterance is stretched to fit the template, so the more detailed the template the better.
A friend of mine stopped by this afternoon, so I had him try it with templates trained on my voice. It got about 75%, not counting the "say again, please" responses.
-Phil
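Editor's sketch: Phil says every test utterance is stretched to fit the template. The thread does not say how the stretch or the comparison is done, so the following is only one plausible reading: a linear time stretch of a list of feature frames (for example, one value per filter per time slice) followed by a sum of absolute differences. The names `stretch` and `distance` are hypothetical.

```python
def stretch(frames, length):
    """Linearly resample a list of feature frames to exactly `length` frames."""
    if length == 1 or len(frames) == 1:
        return [list(frames[0]) for _ in range(length)]
    scale = (len(frames) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * scale                      # fractional position in the source
        lo = int(pos)
        hi = min(lo + 1, len(frames) - 1)
        frac = pos - lo
        # Interpolate each feature between the two neighbouring frames.
        out.append([a + (b - a) * frac for a, b in zip(frames[lo], frames[hi])])
    return out


def distance(utterance, template):
    """Sum of absolute differences after stretching the utterance to the template."""
    fitted = stretch(utterance, len(template))
    return sum(abs(a - b)
               for f, t in zip(fitted, template)
               for a, b in zip(f, t))
```

Under this reading, a longer, more carefully pronounced training word gives the template more frames, which is consistent with the advice to draw out "left" and "right" during training.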
Do the filters have (or need) moving center frequencies?
Another thing I've been wanting to ask: how do you get those spectrograms you've posted recently? What software or machines do this? They look very detailed.
thanks much
- Howard
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
And congratulations !!!
BTW - You might consider posting the mic circuit for anyone interested to build onto their boards - I am thinking especially the Prop Protoboard. (yes I know they could find it in the demo board circuit diagram)
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:
· Home of the MultiBladeProps: TriBladeProp, RamBlade, TwinBlade, SixBlade, website
· Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator)
· Prop Tools under Development or Completed (Index)
· Emulators: Micros eg Altair, and Terminals eg VT100 (Index) ZiCog (Z80), MoCog (6809)
· Search the Propeller forums (uses advanced Google search)
My cruising website is: www.bluemagic.biz  MultiBladeProp is: www.bluemagic.biz/cluso.htm
Just realized I had a 20” ventilation fan running in the window 2 feet from the demo board. Sure to be injecting some noise. More tests..
Must get better Mics in my studio hooked up to demo board.
Jim
The mic circuit is included with the Demo Board and shown in the Demo Board schematic.
CounterRot,
The filter center frequencies are fixed at: 300, 424, 600, 849, 1200, 1697, 2400, and 3394 Hz. This is a logarithmic progression. 'No particular reason for it; it just seemed like the right way to do it.
-Phil
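Editor's note: the eight center frequencies Phil lists are consistent with a half-octave spacing starting at 300 Hz, that is, each one is the previous one times the square root of 2. The short sketch below reproduces the list on that assumption; the function name and its parameters are hypothetical, and rounding to the nearest hertz is a guess that happens to match the posted values.

```python
def center_frequencies(base_hz=300.0, count=8, steps_per_octave=2):
    """Logarithmically spaced center frequencies, rounded to whole hertz."""
    return [round(base_hz * 2 ** (k / steps_per_octave)) for k in range(count)]


print(center_frequencies())
# [300, 424, 600, 849, 1200, 1697, 2400, 3394]
```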
It gets a little confused on "reverse" and "stop". Listens better than the kids! :)
OBC
Perhaps 2, 4, 8, 16, 32 ?
Maybe 2, 3, 5, 7, 11, 13, 17 ?
Or even 1, 2, 3, 5, 8, 13, 21 ?
Not exactly off topic :) A Fibonacci inspired sequence might work nice for frequencies. Consider the fractal nature of the conch shell design and other natural patterns.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve
Propeller Tools
(saves anyone having to find it)
Too soon to start bugging you for source code??? :)
OBC
No, of course not! Before I provide it, I need to whip my Goertzel code into a proper object and convert the front-end code to use it. Unless "real work" intrudes — and it might — I could have something by day's end.
-Phil
She made me get rid of the masterpieces (Can't live without them, and when you kill them the paperwork is just so tedious)
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Style and grace : Nil point
It is VERY quiet in here otherwise.
I am REALLY looking forward to the source code for this. Speech Recognition has been one of my great areas of hope recently, although I haven't had the opportunity to play with it much. I was going to buy the VR Stamp, but then the SayIt module was made available by Parallax. My testing with the SayIt module shows the GUI has comm issues with my PC, and the demo code with the built-in default command set ("robot") is much LESS sensitive and responsive than your routine here. I can speak to this routine in a normal and relaxed manner and it accurately recognizes the command more than 90% of the time, whereas the SayIt module only picks up about 1 in 7 trigger word utterances, and then it needs increased volume and significant attention to diction.
I personally think you're onto a great item here, thanks very much for sharing this with us!
Dave
PS., It also likes to interpret my typing on the keyboard as "left"... crunching candy, clicking keys.... "left". Hmmm...
-Phil
Post Edited (Phil Pilgrim (PhiPi)) : 9/4/2009 4:24:11 AM GMT
I also got about 90%. Will experiment a bit more; this is fun! My challenge still stands...
Hanno
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Download a free trial of ViewPort- the premier visual debugger for the Propeller
Includes full debugger, simulated instruments, fuzzy logic, and OpenCV for computer vision. Now a Parallax Product!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
My Prop Info&Apps: http://www.rayslogic.com/propeller/propeller.htm
The Goertzel output is the same as a DFT on a single frequency. I picked it because it's quick and easy to compute in real time. The thing you can't do with it, though, is adjust the shape of the passband — only its width. There are FIR and IIR passband filters which might be better suited to this sort of thing. A full-blown FFT is probably overkill, though, and I'm not sure that one could be accomplished on the Prop in real time.
-Phil
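Editor's sketch: to illustrate the claim that the Goertzel output matches a DFT at a single frequency, here is the textbook Goertzel recurrence next to a directly computed DFT bin. This is not Phil's Propeller code. It assumes an integer bin index `k` over a block of `N` samples; mapping one of the center frequencies to a bin (k = f * N / fs) needs a sample rate that the thread does not state.

```python
import cmath
import math


def goertzel_power(samples, k):
    """Squared magnitude of DFT bin k, computed with the Goertzel recurrence."""
    n = len(samples)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s1 = s2 = 0.0                      # the two previous filter states
    for x in samples:
        s0 = x + coeff * s1 - s2       # second-order resonator update
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def dft_power(samples, k):
    """Squared magnitude of DFT bin k, computed directly for comparison."""
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
              for i, x in enumerate(samples))
    return abs(acc) ** 2


# A unit-amplitude tone sitting exactly on bin 5 of a 64-sample block.
tone = [math.sin(2.0 * math.pi * 5 * i / 64) for i in range(64)]
print(goertzel_power(tone, 5), dft_power(tone, 5))
```

The recurrence costs one multiply and two adds per sample per frequency, which is why it suits a handful of fixed bands better than a full FFT.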
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve
Propeller Tools
-Phil
http://www.pulsedpower.net/Info/RC/RC_Filter.htm#Numerical_Low-Pass_RC_Filter
I don't imagine this type of thing needs sharp cutoffs and this one would be a lot faster...
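Editor's sketch: the usual discrete form of a single-pole RC low-pass filter is shown below as an illustration of the kind of cheap filter being suggested. It is written from the standard textbook recurrence, not copied from the linked page, and the cutoff and sample rate in the example call are arbitrary assumptions.

```python
import math


def rc_lowpass(samples, cutoff_hz, sample_rate_hz):
    """Single-pole RC low-pass: y[i] = y[i-1] + alpha * (x[i] - y[i-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)   # RC time constant for the cutoff
    dt = 1.0 / sample_rate_hz
    alpha = dt / (rc + dt)                   # smoothing factor, 0 < alpha < 1
    y = 0.0
    out = []
    for x in samples:
        y += alpha * (x - y)                 # move a fraction of the way to x
        out.append(y)
    return out


# Step response: the output rises smoothly toward the input level.
step = rc_lowpass([1.0] * 200, cutoff_hz=300.0, sample_rate_hz=8000.0)
```

Each sample needs only one multiply, but the roll-off is gentle, and a band-pass response would need more than this one stage (for instance, a high-pass stage as well).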
-Phil
-Phil
You have no idea how tempted I am to create a Propeller project which employs this
object along with your speech object. :)
"Propeller, get my coffee."
"I understood 'No'"
"Propeller get my coffee."
"I understood 'left'"
"Propeller get my coffee."
"Get your own coffee!"
OBC