OK! I now have the newest version of the recognizer! However, as usual, it is not fully working. It takes the voice sync files and "compresses" them (removes a fixed number of bytes between each byte read) and places them in their own buffer in internal memory. It seems to work great, but at a certain point the screen will blank and flicker. The time before it does this decreases as the "comp" CON (the number of bytes skipped) is raised. Does anyone know what could be causing this? Here is the code.
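For readers following along, here is a minimal sketch in C of the kind of "compression" being described - plain decimation that keeps one byte and skips `comp` bytes. This is only a guess at the scheme; the function name and details are mine, not from the posted Spin code.

```c
/* Naive decimation: copy every (comp+1)-th byte of `in` to `out`.
   Hypothetical illustration of the "comp" CON described above. */
int decimate(const unsigned char *in, int n, int comp,
             unsigned char *out)
{
    int kept = 0;
    for (int i = 0; i < n; i += comp + 1)
        out[kept++] = in[i];
    return kept;   /* number of bytes written to out */
}
```

With comp = 1 a 10-byte buffer shrinks to 5 bytes; raising comp shrinks the stored sample further, at the cost of audio detail.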
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔ Computers are microcontrolled.
Robots are microcontrolled. I am microcontrolled.
But you can call me micro.
If it's not Parallax then don't even bother.
I have changed my avatar so that I will no longer be confused with others who use generic avatars (and I'm more of a Prop head than a BS2 nut, anyway)
@Hanno: This is my progress so far on fitting your requirements:
-Hardware is limited to the Parallax DemoBoard -- I've got this down on all my versions except the SD card version.
-Uses 1 cog and less than 15KB global ram -- On the simplest one, no problem! Same on the EEPROM one, but it is currently not working. I have another version that is like the EEPROM one, only 50x faster, because it "compresses" the files (well, not really) into smaller files and fits them into HUB RAM. It has a small error that makes it freeze, but other than that it works fine.
-Uses 1 spin variable to indicate what word was recognized -- It just displays a message now, but that function is simple to add.
-Must understand either: "1,2,3,4,5,6,7,8,9,10" or "up,down,left,right,yes,no". -- I went with "up,down,left,right,yes,no" because that is fewer words that need to be stored.
-I should be able to take the code, speak the items in any order and not see a mistake. It's ok if I have to repeat myself, speak carefully, be in a quiet room... -- You actually have to speak fast for it to recognize your voice because of buffer storage limitations. Also, since it is a voice comparison system it needs you to sync your voice.
-Code must be MIT license -- Well, duh! Why else would I keep posting it?
These are all the requirements and how I've been doing at fitting them. I haven't had time to program yet today, but I will first chance I get. I should have this code completed in a week. Starting Saturday next week I will be leaving on vacation, so I hope to have it done before then. What encourages me most is whenever someone posts on this thread. It is disappointing when I come to post some new code and no one has commented on the previous versions. Thank you for posting here and keeping my hopes up for this project!
Micro
This is typically done by first recognising the vowels, usually by extracting what are known as forments. You need 3 forments to do a good job, but there is a simple paper out there (written a long time ago by someone from Sharp labs) that just measures the zero-crossing points to determine frequencies, and that could easily be implemented on a PIC. That would be able to recognize single words like "Lights", etc....
Getting the consonants then takes extra processing. I'm not familiar enough to tell you how that part is done....maybe by finding certain types of discontinuities in the wave envelope or something.
I just read all the other comments on this thread and it looks like nobody did any research before attempting this work....suggest you look for speech recognition on google before you attempt a project like this in future to see how it's normally done....
Look up forments. You should be able to recognize 1 through 10 using only forments pretty easily. Should take maybe under 1K of code. You don't even need to mess with the consonants. Just think in terms of the words not having consonants, so "Lights" will sound like "Eyes". As long as you don't have words that sound the same when you remove the consonants, you can have as big a vocabulary as you want. 5 and 9 might sound the same....
As I mentioned in my last post, there is a paper out there if you search for it. Consonants is another thing. I suspect it involves playing with the envelope around the wave.
If you want to compare a wave envelope with one from another person speaking, you could take a correlation between the 2 signals. This would be done by taking the peaks of the waveforms and calculating the sum of s1 x s2, where s1 and s2 are the samples from the 2 waveforms to be compared.
Maybe doing that, combined with comparing the forments as well, might be a good way to generalize detection and recognition of the words.
Note that a faster way of calculating correlation is what I think is called a sum of absolute differences.....where you just add up the differences between the values, so it would be the sum of |s1 - s2|.
For the first method (correlation), it's where the value peaks that you get a match.
For the second, it's where it is minimized that you get a better match.
You have to compare the input wave with all the stored waveforms.
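A bare-bones sketch of the two scores just described, assuming equal-length, time-aligned buffers (function names are mine): correlation peaks at a match, while the sum of absolute differences is smallest at a match.

```c
#include <stdlib.h>

/* Correlation score: sum of a[i]*b[i].  Larger = better match. */
long correlation_score(const int *a, const int *b, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += (long)a[i] * b[i];
    return sum;
}

/* Sum of absolute differences: sum of |a[i]-b[i]|.
   Smaller = better match. */
long sad_score(const int *a, const int *b, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += labs((long)a[i] - b[i]);
    return sum;
}
```

In practice you would slide the input against each stored template and keep the best score over all templates, as the post says.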
I actually haven't used the Propeller yet. Not sure if I want to....that's why I joined, to create a new thread asking a few questions about the architecture, as I don't see it being all that great (I have a strong supercomputer background, so I'm used to working with parallel processors, and a strong embedded background). Anyways, I saw this thread and thought I'd help.
www.microchip.com/stellent/idcplg?IdcService=SS_GET_PAGE&nodeId=1406&dDocName=en023596
Microchip's dsPIC speech recognition library uses the Hidden-Markov technique (basically a state machine with probabilistic features), in common with most other successful systems. A similar approach would be advisable for speech recognition on the Propeller.
It only requires 9 MIPS.
Leon
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Amateur radio callsign: G1HSM
Suzuki SV1000S motorcycle
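To put a little flesh on "a state machine with probabilistic features": the usual decoding step for a Hidden Markov Model is the Viterbi algorithm, which finds the most probable hidden-state path for an observed symbol sequence. The toy model below is entirely invented for illustration and has nothing to do with the dsPIC library's internals.

```c
/* Toy 2-state, 2-symbol HMM decoded with the Viterbi algorithm.
   All probabilities here are made up. */
#define NSTATES 2
#define NSYMS   2
#define MAXOBS  16

static const double init_p[NSTATES]         = {0.6, 0.4};
static const double trans[NSTATES][NSTATES] = {{0.7, 0.3}, {0.4, 0.6}};
static const double emit[NSTATES][NSYMS]    = {{0.9, 0.1}, {0.2, 0.8}};

/* Writes the most likely state path into path[0..n-1] (n <= MAXOBS)
   and returns that path's probability. */
double viterbi(const int *obs, int n, int *path)
{
    double v[MAXOBS][NSTATES];
    int back[MAXOBS][NSTATES];

    for (int s = 0; s < NSTATES; s++)
        v[0][s] = init_p[s] * emit[s][obs[0]];

    for (int t = 1; t < n; t++)
        for (int s = 0; s < NSTATES; s++) {
            double best = -1.0;
            int arg = 0;
            for (int p = 0; p < NSTATES; p++) {
                double cand = v[t - 1][p] * trans[p][s];
                if (cand > best) { best = cand; arg = p; }
            }
            v[t][s] = best * emit[s][obs[t]];
            back[t][s] = arg;
        }

    double best = -1.0;
    for (int s = 0; s < NSTATES; s++)
        if (v[n - 1][s] > best) { best = v[n - 1][s]; path[n - 1] = s; }
    for (int t = n - 1; t > 0; t--)
        path[t - 1] = back[t][path[t]];
    return best;
}
```

For speech, the observations would be quantized feature vectors (formant bins, say) and the states would be sub-word units; the Rabiner tutorial mentioned elsewhere in this thread covers the real thing.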
http://www.google.com/url?sa=t&source=web&ct=res&cd=3&url=http://ca.geocities.com/xxxtoytech/Stewart-91.pdf&ei=B5yaSsqAM4nQtgO0n6GnAg&usg=AFQjCNHLGxJrmFQ9GfKesWhL-xB7RwMOAw&sig2=cnoAE59xFSq3A84-jAfmNQ
That pdf is a Garcia Circuit Cellar article.....very good and easy to understand. Note this is just for forments.
Someone suggested looking at the dsPIC library. I have seen the term Hidden Markov models used when applied to speech recognition, so to make it more advanced, you could definitely look at that.
webmasterpdx said...
I just read all the other comments on this thread and it looks like nobody did any research before attempting this work....suggest you look for speech recognition on google before you attempt a project like this in future to see how it's normally done....
Everyone has their opinion, but there is nothing wrong with letting someone start from scratch on a project like this. Perhaps instead of doing it 'just like everyone else', some of the people taking up the challenge will find some shortcuts or new/innovative ways to approach the problem of speech recognition using a unique processor like the Propeller chip.
I find it interesting to see what comes out of this project.
Actually no, I disagree. By not looking at what's normally done, you ignore the physics of the problem. Things like: all vowels are represented by forments. You wouldn't know that if you didn't read it. Someone just looking at a waveform isn't going to be able to figure that out.
The only thing they came a little close to was examining the waveform peaks, but being aware of signal processing techniques like correlation will shorten the time to a solution by a factor of 100 to 1.
Bottom line, there are certain laws of physics about speech that you have to be aware of before you can write this kind of code.....so to remain ignorant is not a useful strategy.
Learn the laws of physics and ignore the algorithms....that's fine....then come up with new algorithms from a position of education......that actually makes sense, and then I'd agree with you, but you might as well hook 2 cans together with string if you want to remain ignorant of the physics of the whole thing.
The Propeller will allow you to optimize the signal processing operations.....that's probably how best to use the Propeller. Actually, if someone would write a good signal processing library that can use cogs in parallel, that'd be useful for everyone...
I have to agree with Robert (Robotworkshop) on this one - I do understand the hazard of 'not looking at what's normally done'. But you need to see this thread, this activity, in context. "OBC" posted some code and an idea, "Microcontrolled" (who is a younger fellow) picked up on it - and in a matter of *days* came up with something that works pretty well. He can speak for himself, of course, on what he may have researched "out there"... but I'll bet he came up with these things by a combination of smarts, hacking at its best, some exchanges here, and - most important - (re)prototyping his experiment until it worked. If he were to research all the things you mention, then he'd probably still be reading, with no results.
The merit in reinventing is learning the shape and function of wheels.
Bootstrapping from first principles (and stubbing your toes along the way) is a great way to immerse yourself in a difficult subject matter ... if you have the time and tenacity. I say this because, once you reach an impasse, you have an overwhelming incentive to see how others have done it, whereas that incentive may have been lacking before, resulting in mere spoonfeeding of information. By making your own mistakes, you learn why other techniques were explored and, hence, remember them better.
BTW, webmasterpdx, the word is "formant", not "forment".
I can't speak for microcontrolled, but I'm enjoying the exploration of the subject. If I fully understood all the variables and information involved at this point, this would cease to be a hobby project for relaxation and learning by doing. Don't get me wrong, I'm reading the materials provided, but I enjoy looking for missed methods by trying solutions which may or may not be a shortcut past some of the difficulties involved.
webmasterpdx said...
Actually no, I disagree. By not looking at whats normally done, you ignore the physics of the problem. Things like all vowels are represented by forments. You wouldn't know that if you didn't read it. Someone looking at a waveform isn't going to be able to do that.
The only thing they came a little close to was examining the waveform peaks, but being aware of signal processing techniques like correllation will shorten the time to a solution by a factor of 100 to 1.
Bottom line, there are certain laws of physics about speech that you have to be aware of before you can write this kind of code.....so to remain ignorant is not a useful strategy.
Learn the laws of physics and ignore the algorithms....thats fine....then come up with new algorithms from a position of education......that actually makes sense and then Id agree with you, but you might as well hook 2 cans together with string if you want to remain ignorant of the physics of the whole thing.
The propellor will allow you to optimize the signal processing operations.....thats probably how best to use the propellor. Actually, if someone would write a good signal processing library that can use cogs in parallel, that'd be useful for everyone...
-D
I never said or implied that someone should 'ignore the physics of the problem.' Nor did I ever state that someone shouldn't seek them out and read them (I already have much of this material in my library). You've missed the spirit of my post and this thread. What is wrong with experimenting to see if someone may find a completely new method for dealing with the issue?
Great discussion! Microcontrolled will be very happy with the level of activity on this thread.
I'm very happy that so much thought is going into this challenge - it'll be great to have speech recognition for the Propeller. I really like the Propeller because its hardware is powerful and flexible enough that pretty much anything just becomes a software problem - outputting graphics, grabbing video, computer vision, synthesizing speech, and now speech recognition. I agree that much research has already been done on understanding speech - and that eventually that knowledge should and will be used. I'm a big fan of iterative development, where you start by hacking something up quickly, learn what the problems are, do research, and stay passionate about the problem until it's completely solved. Let's see where this takes us!
Hanno
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔ Download a free trial of ViewPort- the premier visual debugger for the Propeller
Includes full debugger, simulated instruments, fuzzy logic, and OpenCV for computer vision. Now a Parallax Product!
I've been playing with Nick G's visualizer code (mentioned earlier in the thread) this afternoon. While I haven't locked in on speech patterns yet, I did stumble on a simple method of hand-clap controls. Interesting?
OBC,
Maybe the Propeller can turn lights on and off? Then we could write a "clapper" application! :)
Howard,
Sorry, not following you. PID isn't magic; neither is ViewPort or fuzzy logic. And for that matter, neither are the good doctor's neural networks. There have been decades of work on getting computers to understand speech - and yes, it typically involves working in the frequency domain to look for formants. Without AI, error rates are still unacceptably high. However, getting the Propeller to the level of my dog should be possible.
Hanno
Howard,
Sorry, not following you. PID isn't magic, neither is ViewPort or fuzzy logic. ...
No problem, Hanno - that was about as vague of a shot in the dark as it gets :))
I was musing: P.I.D. divides up the work. The " I " deals with the time and/or frequency domain with simple, discrete integrals. The setpoints could roll/move dynamically over the wave file, creating formant-like info on the fly; that would be the " P "... (ah, but non-linearity issues... I don't know if gain scheduling would have too much overhead). The " D " then acts in a way similar to what OBC and Microcontrolled have done. But then there's the 'noise' factor that dynamic setpoints would cause - like some kind of weird harmonic oscillation.
To me, extracting the 'correct' result from whatever sample you use is a PID process ... maybe Dr. YouKnowWho wasn't so far out with the idea of "error correction"?
I know I am articulating this very poorly - does this make any sense? I'd rather throw it out there now and refine it later ...
> ... However, getting the Propeller to the level of my dog should be possible.
Then you have one *very* smart dog !
Woof Grrr Woof = "No, Master - you go fetch your own slippers today" :))
First, I never said experimentation was a bad idea. I do it all the time. I just said....proceeding without looking it up on the web first isn't the most productive way to proceed. Anyways, I'm not interested in arguing research methodology.
However, you guys got me interested in speech recognition again, and I found some interesting links. Note that some of these links will eventually disappear, as OGI.COM will go away now that the university in question has been closed down by its "mother ship", OHSU.
First of all, for y'all, there is a great page on spectrograms for speech recognition. Basically these are 2D frequency graphs of sound. The red in the pictures shows the dominant frequencies for these sounds. They are concerned with the F1, F2 and F3 formants, since these are what are used. However, they also show how to recognize certain consonants. This page is at: http://cslu.cse.ogi.edu/tutordemos/SpectrogramReading/ipa/ipahome.html
Be aware that there are ranges of formants for male and female voices for each vowel sound (if you want to create a generalized speech recognition that doesn't require teaching).
Finally, I have a neat little algorithm I developed that I call the tri-band FIR algorithm. I developed it for an application on a PIC, to calculate a filter for 3 types of filtering: low pass (LPF), band pass (BPF) and high pass (HPF). First, I just found one of the many Java applet sites on the web that will generate coefficients for a FIR filter. I was using a simple FIR filter of the form a*S0 + b*S1 + c*S2. The Java program I was using gave me the parameters for the LPF and HPF. What I discovered first is that the same parameters were used in both the LPF and HPF, with just a minus sign here and there being all the difference (i.e. there was symmetry in the calculations).
Secondly, I found that I could round the coefficients (given as floating point) to values that are close to fractions involving powers of 2 (e.g. instead of 0.371, I could use 0.375, which is 3/8). So basically, that enabled me to estimate s * 0.371 by ((s << 1) + s) >> 3 (which is s * 3/8); instead of requiring floating-point multiplies, just shifts and adds would let me calculate my FIR filter. By storing intermediate values and reusing values between the LPF and HPF, I was able to reduce the calculations down further, and finally, just subtracting (LPF + HPF) from the original signal gave me the BPF values.
This can be applied to any calculations, actually, but for a FIR filter it was especially efficient. I had been using the state variable filter from Hal Chamberlin's book (Musical Applications of Microprocessors), but this algorithm was faster when implemented in assembly language, didn't have the scaling issues I had to take care of (which are a problem with the state variable filter), and gave much better results when I used it with known signals, which I created mathematically on a PC and plotted using gnuplot.
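To make the shift-and-add trick concrete, here is a small C sketch. The taps (1/4, 3/8, 1/4) are illustrative, not the original PIC coefficients.

```c
/* s * 3/8 (~0.375, standing in for 0.371) using only shifts and adds. */
int mul_3_8(int s)
{
    return ((s << 1) + s) >> 3;
}

/* 3-tap FIR a*s0 + b*s1 + c*s2 with example taps 1/4, 3/8, 1/4. */
int fir3(int s0, int s1, int s2)
{
    return (s0 >> 2) + mul_3_8(s1) + (s2 >> 2);
}

/* Band-pass by subtraction, as described: BPF = input - (LPF + HPF). */
int bandpass(int input, int lpf, int hpf)
{
    return input - (lpf + hpf);
}
```

One caveat: right-shifting negative values is implementation-defined in C, so on signed samples you would want an explicit arithmetic shift (which is what the PIC assembly version would use anyway).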
Thanks for all the links --- some good reading material in there indeed! The FIR filter looks intriguing --- I'll have to study what you say more...
> Anyways, I'm not interested in arguing research methodology.
When we all 'argue' here it is actually nearly always polite. As you hang out around these forums, I have no doubt you will be surprised by not only how nice and polite folks here *really* are, but also by how helpful everyone is. These are indeed not the ordinary internet-style of forums.
Thanks for the links. I've downloaded the Rabiner HMM paper mentioned in the Wikipedia article. It appears that some familiarity with linear predictive coding (LPC) is a prerequisite. 'More to explore there.
The February 1998 Circuit Cellar article mentioned earlier is perhaps the most interesting, mainly because so much was accomplished with such a primitive (by today's standards) microcontroller. Unfortunately, the author glosses over one key aspect of his algorithm (the time normalization step), making it necessary to dig into the assembly code to unravel it.
I've been working on a simple template matching scheme using the Goertzel algorithm to process incoming sound into discrete frequency bands. I've had a small measure of success, but nothing worth posting just yet. The Goertzel part works well in real time, so I should probably polish it enough for the OBEX. Then maybe someone else can use it to advantage.
Neat, Phil ... looking forward to seeing how you treat the cos() and " i " functions.
If you understand the Goertzel algorithm enough to code it, linear prediction should be a piece of cake. (I studied this stuff too long ago, and have forgotten most of it, but this thread and a few others have renewed my interest to relearn it.)
One thing that might be useful - if you've not thought of this already - is to have several user-settable runtime variables for:
- the number of filters
- filter center freq.
- filter Q
- and, maybe (probably!), gain/attenuation (per filter)
This would make it into a big, runtime parametric equalizer - or rather, a comb filter.
(RE PID: These parameters could be controlled as part of the feedback setpoints I mentioned earlier.)
Just to note, someone mentioned needing to calculate cos(). If you are referring to the trig function, the Propeller ROM has a full sine table of 2K values. You should be able to calculate cos quickly using a lookup into that table: cos(x) = sin(pi/2 - x).
For computer graphics, I've usually been able to use trig functions based on a 256-entry table of values. In this case, the angles are measured in what are known as bittrians, where 90 degrees = 256 bittrians. You convert your angles to bittrians, reduce the calculation to a quadrant, look it up in the bittrian table, and you are done....very fast.
Generally for graphics (even 3D) you usually don't need values with more than 8 bits of resolution. There are many other applications where you don't need high accuracy either.
I read that very good article on speech recognition, and it has given me an idea for how to do low-memory speech recognition. I will tell you the idea (which I don't really know if I got straight from the text) if it works. I took the day off from programming, so I will work on that tomorrow. Thanks for all your great ideas and input! Keep it up!
Please post your code as an archive. I can't find Dual_ADC1.spin anywhere. Nevermind, I found it.
-Phil
Post Edited (Phil Pilgrim (PhiPi)) : 8/29/2009 4:17:41 AM GMT
All of the objects you use have to be MIT-licensed as well. Rayman's dual ADC object is not. Maybe you can get him to add the appropriate boilerplate.
-Phil
Thanks!
Welcome to the forums!
cheers,
Howard
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
So, by being educated, it's not a problem.
Speech recognition is a complex thing to do.
I wasn't aware that he had actually got something that worked....?
-D
clap clap clap
"Yes Master, fetching your slippers now..."
@Hanno (and all): I just had a very crazy thought. Could PID techniques be applied to voice recognition?
I mean if Viewport "can land a man on the moon," why can't we put PID to work landing a few words in the Prop?
Is that far fetched?
- Howard
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Maybe the Propeller can turn lights on and off? Then we could write a "clapper" application! [noparse]:)[/noparse]
Howard,
Sorry, not following you. PID isn't magic; neither is ViewPort or fuzzy logic. And for that matter, neither are the good doctor's neural networks. There have been decades of work on getting computers to understand speech - and yes, it typically involves working in the frequency domain to look for formants. Without AI, error rates are still unacceptably high. However, getting the Propeller to the level of my dog should be possible.
Hanno
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I was musing: P.I.D. divides up the work. The "I" deals with the time and/or frequency domain with simple, discrete integrals. The setpoints could roll/move dynamically over the wave file, creating formant-like info on the fly; that would be the "P" (ah, but non-linearity issues... I don't know if gain scheduling would have too much overhead). The "D" would then act in a way similar to what OBC and Microcontrolled have done. But then there's the 'noise' factor that dynamic setpoints would cause - like some kind of weird harmonic oscillation.
To me, extracting the 'correct' result from whatever sample you use is a PID process ... maybe Dr. YouKnowWho wasn't so far out with the idea of "error correction"?
I know I am articulating this very poorly - does this make any sense? I'd rather throw it out there now and refine it later ...
> ... However, getting the Propeller to the level of my dog should be possible.
Then you have one *very* smart dog !
Woof Grrr Woof = "No, Master - you go fetch your own slippers today" :)
- H
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Finger-Snap Commander.
Detects 1-3 finger snaps (or table knocks) in succession.
Not perfect, but it does actually work.
Using the microphone on the Demoboard for sampling.
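For anyone curious how snap/knock detection like this can work, here is a minimal sketch in C - an assumed illustration, not OBC's actual code. The idea: flag any amplitude spike above a threshold as one event, then skip a short "dead time" so a single transient isn't counted twice. The function name, threshold, and dead-time values are all illustrative.

```c
#include <stdlib.h>

/* Count claps/snaps in a sampled audio buffer.  A sample whose
 * magnitude exceeds `threshold` starts an event; the next `dead_time`
 * samples are skipped so one transient registers only once. */
int count_claps(const int *samples, int n, int threshold, int dead_time)
{
    int count = 0;
    int i = 0;
    while (i < n) {
        if (abs(samples[i]) > threshold) {
            count++;
            i += dead_time;   /* skip past the rest of this transient */
        } else {
            i++;
        }
    }
    return count;
}
```

On real hardware the "buffer" would instead be a running stream from the Demoboard's microphone ADC, with the dead time measured in milliseconds rather than samples.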
OBC
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
You guys got me interested in speech recognition again, and I found some interesting links. Note that some of these links will disappear eventually, as OGI.COM will go away now that the university in question has been closed down by its "mother ship", OHSU.
First of all, for y'all there is a great page on spectrograms for speech recognition. Basically, these are 2D frequency graphs of sound. The red in the pictures shows the dominant frequencies for each sound. They are concerned with the F1, F2, and F3 formants, since these are what are used. However, they also show how to recognize certain consonants. This page is at:
http://cslu.cse.ogi.edu/tutordemos/SpectrogramReading/ipa/ipahome.html
Be aware that there are ranges of formants for male and female voices for each vowel sound (if you want to create a generalized speech recognition that doesn't require teaching).
Another useful one I found is on Hidden Markov models. I'm new to this too, so I'm just starting to read up on it... looks very interesting. It seems to be based on the probability that the sound being uttered is the one it is compared against:
http://en.wikipedia.org/wiki/Hidden_Markov_model#Architecture_of_a_hidden_Markov_model
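As a toy illustration of that probability idea (an assumed example, not taken from the Wikipedia article): the forward algorithm scores how likely an observation sequence is under a given word's model, and the word whose model scores highest wins. The `forward_prob` name and the 2-state / 2-symbol sizes are arbitrary choices to keep the sketch tiny.

```c
/* Forward algorithm for a 2-state HMM with 2 observation symbols.
 * Returns P(obs sequence | model).  To recognize a word, run this
 * against each word's model and pick the highest probability. */
double forward_prob(double init[2], double trans[2][2],
                    double emit[2][2], const int *obs, int t_len)
{
    double alpha[2], next[2];
    /* initialize with the first observation */
    for (int s = 0; s < 2; s++)
        alpha[s] = init[s] * emit[s][obs[0]];
    /* propagate forward through the rest of the sequence */
    for (int t = 1; t < t_len; t++) {
        for (int j = 0; j < 2; j++) {
            double sum = 0.0;
            for (int i = 0; i < 2; i++)
                sum += alpha[i] * trans[i][j];
            next[j] = sum * emit[j][obs[t]];
        }
        alpha[0] = next[0];
        alpha[1] = next[1];
    }
    return alpha[0] + alpha[1];
}
```

In real speech work the observations would be quantized acoustic features (e.g. formant bands) rather than raw symbols, and the math is done in log space to avoid underflow.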
Finally, I have a neat little algorithm I developed that I call the tri-band FIR algorithm. I developed it for an application on a PIC to calculate three types of filtering: low-pass (LPF), band-pass (BPF), and high-pass (HPF). First, I just found one of the many Java sites on the web that will generate coefficients for a FIR filter. I was using a simple FIR filter of the form aS0 + bS1 + cS2. The Java program I was using gave me the parameters for the LPF and HPF. What I discovered first is that the same parameters were used in both the LPF and HPF; just a minus sign here and there was all the difference (i.e. there was symmetry in the calculations).
Secondly, I found that I could round the coefficients (given as floating point) to values that are close to fractions involving powers of 2 (e.g. instead of 0.371, I could use 0.375, which is 3/8). So basically, that enabled me to estimate s * 0.371 by ((s << 1) + s) >> 3; instead of requiring floating-point multiplies, just shifts and adds let me do my FIR filter calculations. By storing intermediate values and reusing them for the LPF and HPF, I was able to reduce the calculations further, and finally, just subtracting (LPF + HPF) from the original signal gave me the BPF values.
This can actually be applied to any calculations, but for a FIR filter it was especially efficient. I had been using the state-variable filter by Hal Chamberlain from his book (Musical Applications of Microprocessors), but this algorithm was faster when implemented in assembly language, didn't have the scaling issues I had to take care of (a problem with the state-variable filter), and gave much better results when I used it with known signals which I created mathematically on a PC and plotted using gnuplot.
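Donald's shift-and-add idea can be sketched in C. This is a minimal illustration, not his PIC code: the 3-tap coefficients {0.375, 0.25, 0.375} and the `fir3` name are assumptions chosen only to show the power-of-two rounding trick and the LPF/HPF sign symmetry he describes.

```c
/* 0.371 rounds to 0.375 = 3/8, so  s * 3/8  ==  ((s << 1) + s) >> 3
 * - shifts and adds instead of a floating-point multiply. */
static int mul_0375(int s) { return ((s << 1) + s) >> 3; }

/* 0.25 = 1/4 is a single shift. */
static int mul_025(int s)  { return s >> 2; }

/* 3-tap FIR with assumed coefficients {0.375, 0.25, 0.375}.  The HPF
 * reuses the exact same terms with the outer signs flipped - the
 * symmetry Donald noticed - so the intermediate values are shared. */
int fir3(int s0, int s1, int s2, int highpass)
{
    int outer = mul_0375(s0) + mul_0375(s2);
    return highpass ? mul_025(s1) - outer
                    : mul_025(s1) + outer;
}
```

With these coefficients the low-pass gain at DC is 1.0, so a constant input passes through unchanged: `fir3(16, 16, 16, 0)` returns 16.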
Hopefully some of these may be of use to you.
Good Luck.
-Donald
Thanks for all the links --- some good reading material in there indeed! The FIR filter looks intriguing --- I'll have to study what you say more...
> Anyways, I'm not interested in arguing research methodology.
When we all 'argue' here, it is actually nearly always polite. As you hang out around these forums, I have no doubt you will be surprised not only by how nice and polite folks here *really* are, but also by how helpful everyone is. These are indeed not your ordinary internet-style forums.
Pull up a virtual chair, relax, and enjoy!
Looking forward to your contributions.
Cheers
- Howard
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Thanks for the links. I've downloaded the Rabiner HMM paper mentioned in the Wikipedia article. It appears that some familiarity with linear predictive coding (LPC) is a prerequisite. 'More to explore there.
The February 1998 Circuit Cellar article mentioned earlier is perhaps the most interesting, mainly because so much was accomplished with such a primitive (by today's standards) microcontroller. Unfortunately, the author glosses over one key aspect of his algorithm (the time normalization step), making it necessary to dig into the assembly code to unravel it.
I've been working on a simple template matching scheme using the Goertzel algorithm to process incoming sound into discrete frequency bands. I've had a small measure of success, but nothing worth posting just yet. The Goertzel part works well in real time, so I should probably polish it enough for the OBEX. Then maybe someone else can use it to advantage.
-Phil
Neat, Phil ... looking forward to seeing how you treat the cos() and "i" functions.
If you understand the Goertzel algorithm enough to code it, linear prediction should be a piece of cake. (I studied this stuff too long ago and have forgotten most of it, but this thread and a few others have renewed my interest to relearn it.)
One thing that might be useful - if you've not thought of this already - is to have several user-settable runtime variables for:
- the number of filters
- filter center freq.
- filter Q
- and, maybe (probably!), gain/attenuation (per filter)
This would make it into a big, runtime parametric equalizer - or rather a comb filter.
(RE PID: These parameters could be controlled as part of the feedback setpoints I mentioned earlier.)
thanks for your efforts!
- Howard
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For computer graphics, I've usually been able to use trig functions based on a 256-entry table of values. In this case, the angles are measured in what are known as bittrians, where 90 degrees = 256 bittrians. You convert your angles to bittrians, reduce the calculation to a quadrant, look it up in the bittrian table, and you are done... very fast.
Generally for graphics (even 3D) you usually don't need values with more than 8 bits of resolution. There are many other applications where you don't need high accuracy either.
enjoy...
I read that very good article on speech recognition, and it has given me an idea for how to do low-memory speech recognition. I will tell you the idea (which I don't really know if I got straight from the text) if it works. I took the day off from programming, so I will work on that tomorrow. Thanks for all your great ideas and input! Keep it up!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve
Propeller Tools