Speech Recognition for the Propeller: Collaborative Project??

Microcontrolled · 2009-08-25 21:25

I should really make an automated robot with it. YouTube upload in progress........

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Computers are microcontrolled.

Robots are microcontrolled.
I am microcontrolled.

But you can call me micro.

If it's not Parallax then don't even bother.

I have changed my avatar so that I will no longer be confused with others who use generic avatars (and I'm more of a Prop head than a BS2 nut, anyway)

Microcontrolled · 2009-08-25 21:47

www.youtube.com/watch?v=7X_0wLvo0SQ

Here's the video! It is a bad video camera-wise: when I lean in to the mic to speak, I move the camera so that the LEDs go out of view. Has anyone tried it yet? I suggest adjusting the accuracy between 1800 and 4000 if it starts mistaking different words for the word that was originally recorded.
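
A minimal sketch of how an accuracy setting like this could behave, assuming the matcher sums the absolute differences between a stored sample and the live sample and accepts a match when the total stays under the setting. The function names, the 8-bit sample values, and the direction of the setting (a larger number is looser) are assumptions for illustration, not the logic of the posted program.

```python
def total_difference(stored, live):
    """Sum of absolute differences between two equal-length sample lists."""
    if len(stored) != len(live):
        raise ValueError("samples must be the same length")
    return sum(abs(a - b) for a, b in zip(stored, live))


def is_match(stored, live, accuracy=1800):
    """Hypothetical matcher: accept when the total difference is below `accuracy`."""
    return total_difference(stored, live) < accuracy


# Hypothetical 8-bit samples (0-255), 100 values each.
stored_word = [128 + (i % 20) for i in range(100)]
same_word = [v + 5 for v in stored_word]     # small deviation, total difference 500
other_word = [v + 30 for v in stored_word]   # large deviation, total difference 3000

print(is_match(stored_word, same_word, accuracy=1800))   # True
print(is_match(stored_word, other_word, accuracy=1800))  # False
print(is_match(stored_word, other_word, accuracy=4000))  # True (false match)
```

Under these assumptions, a different word that slips through at 4000 is rejected at 1800, which is the kind of trade-off the setting would control.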

Microcontrolled · 2009-08-25 22:27

I'm going to start a new thread for this, so that people know that there is software (although simple) available. I will still be working on this as a group project.

Hanno · 2009-08-25 22:32

Great job Microcontrolled and OBC! The video is a good inspiration for things to come. I would love to include simple speech recognition in 12Blocks, so here's the challenge:
-Hardware is limited to the Parallax DemoBoard
-Uses 1 cog and less than 15KB global RAM
-Uses 1 Spin variable to indicate what word was recognized
-Must understand either: "1,2,3,4,5,6,7,8,9,10" or "up,down,left,right,yes,no". I should be able to take the code, speak the items in any order and not see a mistake. It's OK if I have to repeat myself, speak carefully, be in a quiet room...
-Code must be MIT license
Winner gets a ViewPort Ultimate license...
Hanno

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Download a free trial of ViewPort- the premier visual debugger for the Propeller
Includes full debugger, simulated instruments, fuzzy logic, and OpenCV for computer vision. Now a Parallax Product!

Post Edited (Hanno) : 8/26/2009 1:25:11 AM GMT

Microcontrolled · 2009-08-25 22:38

The thing about it is that it must be initialized: the stored sound waves must match the speaker's voice. That is what makes it possible to do on the Propeller. Also, the voice samples take about 10k of HUB RAM. However, I could add an SD card and do a one-time initialization by storing them on the SD card in WAV file format.
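
To get a feel for the numbers, here is a rough sketch of how a sample budget divides across a vocabulary. The 10k figure comes from the post above and the six-word vocabulary from Hanno's "up, down, left, right, yes, no" list; the 1-byte sample size and the 8 kHz sample rate are assumptions picked only for illustration.

```python
def per_word_budget(total_bytes, words, bytes_per_sample=1, sample_rate_hz=8000):
    """Return (bytes per word, seconds of raw audio per word) for a fixed budget."""
    bytes_per_word = total_bytes // words
    seconds = bytes_per_word / (bytes_per_sample * sample_rate_hz)
    return bytes_per_word, seconds


bytes_per_word, seconds = per_word_budget(10 * 1024, words=6)
print(bytes_per_word, round(seconds, 3))  # 1706 0.213
```

With those assumed numbers each word gets roughly a fifth of a second of raw audio, so a real implementation would likely need a lower sample rate or a more compact representation than raw samples.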

w8an · 2009-08-26 00:17

Hanno, when you say your challenge must be limited to the prop demo board, do you mean it cannot contain any discrete components that are not included (like maybe a 741 op-amp)?

Rayman · 2009-08-26 00:59

I wonder if wavelet transforms would be the way to go...

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
My Prop Info&Apps: http://www.rayslogic.com/propeller/propeller.htm

Microcontrolled · 2009-08-26 01:02

Limiting it to 5k of program space means that I will have to drop some functions, like playback, and compress some others to fit.

If it is limited to using only Demo Board components, then, since 12 Blocks loads to RAM only, can I store speech functions in the on-board EEPROM? If I cannot, then there is no way I can meet your requirements.

Hanno · 2009-08-26 01:22

Yeah! Great to see you thinking about my requirements! Yes, the only hardware allowed is the DemoBoard; that will allow more people to use it. It has a built-in mic and ADC, and this would be a great use of them. OK, I was a bit stingy: you now have 15 KB of Hub RAM.
Hanno

Microcontrolled · 2009-08-26 02:08

Thanks! That will open up a wide range of new possibilities! I think that I can minimize it to 15k, so that is great! I am still working on an I2C version (for the past 2 hours) and have run into some difficulty, so I am glad to know that I might be able to bypass that.

This is the first project that I have had fun with in the last year or so. That says a lot: almost everything I have built since then is either in progress with no visible hope of success, or is done and does not work. (Well, ONE thing got completed and worked. Rarely do my projects survive off a breadboard!) I'm glad that I can at last have something that works and is of interest to others!

--Micro

mctrivia · 2009-08-26 02:16

Too bad you can't use my super prop: 8mb RAM, 8mb flash, fits a DIP40 socket.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
propmod_us and propmod_1x1 are in stock. Only $30. PCB available for $5

Want to make projects and have Gadget Gangster sell them for you? propmod-us_ps_sd and propmod-1x1 are now available for use in your Gadget Gangster Projects.

Need to upload large images or movies for use in the forum. you can do so at uploader.propmodule.com for free.

Microcontrolled · 2009-08-26 02:19

What is a "super Prop"? Or are you kidding?

mctrivia · 2009-08-26 02:23

Look in the sandbox. I have been designing a Prop replacement with a lot more memory.

Dr_Acula · 2009-08-26 02:42

I spent 4 months researching this at the bionic ear institute in Melbourne some years ago. One of the problems they were having with processing for the bionic ear was that a male engineer would say "testing, testing" and it would work fine, and then they would put the implant into a child and the child could hear dad but not their friends. High pitched voices clearly have a waveform where the peaks are a lot closer together. The challenge we faced was to try to recognise words from male, female and child speakers as the same word. The abstract is here: ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=253692 but IEEE want you to pay to see the article. I might see if I have a copy at home.

In essence, a recognition system needs to be tolerant of deformity, and you need exact control over that process. Band pass filters are a start but the first harmonic of a male speaker might be the same as the fundamental of a child speaker, and the band pass filters can't adjust for that. But it ought to be possible to train something for one particular user.

To take it to the next level, FFT is needed. The cochlea has a very precise FFT system in that each hair is tuned to a particular length, and so this is a band pass filter with a very narrow width. That means you need to watch for ringing in a discrete filter, plus FFTs with narrow pass bands take more time to process. I did get an FFT working in C once on a CP/M computer but it was about 100x slower than real time. Maybe the prop can do FFT? I think the prop is faster than the DSPs from the early 90s.

Then you end up with a new waveform that looks like some of the pictures already posted. This is much easier to work with. We were using a neocognitron to pick out the peaks and other patterns in the signal and look for more and more complex patterns. The neocognitron is modelled on the visual system of the cat, but there is evidence that similar processes work for auditory signals; see www.scholarpedia.org/article/Neocognitron for an example. It can read handwriting: it can be trained with an A and can then recognise an upside down A with no further training. But a neocognitron takes a huge amount of processing power, particularly in the later layers. We were running a 20 MHz 486 all night to process 20 seconds of speech.

We had some success using correlation coefficient formulas rather than neocognitron formulas for the pattern matching, as these used simple formulas like (sum of x) and (sum of x squared). This was all working on a 4 MHz CP/M system and was using logs and antilogs for the multiply, as the Z80 can't do a hardware multiply.

We always wanted more neurons! Hmm - what is the brain - 100 billion neurons with 10,000 dendrites per neuron each at about 1000 Hz. 10^11 * 10^4 * 10^3 = 10^18. What is a propeller - 20 million instructions per second on 8 cogs - 1.6*10^8. Only need to make it 10^10 times faster to simulate a brain...
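
For anyone who wants to try the correlation coefficient idea mentioned above, here is a minimal sketch of the Pearson correlation built only from running sums (sum of x, sum of x squared, sum of x times y). The function name and the sample values are hypothetical, and this is not the code used in the original research.

```python
import math


def correlation(x, y):
    """Pearson correlation of two equal-length sequences, from running sums only."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("sequences must be non-empty and the same length")
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(v * v for v in x)
    sum_yy = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    if denominator == 0:
        return 0.0  # a flat signal has no shape to correlate against
    return numerator / denominator


# Hypothetical feature vectors: a template, a louder copy, and a different shape.
template = [10, 40, 90, 40, 10, 5]
louder = [2 * v + 3 for v in template]
different = [90, 10, 5, 10, 40, 40]

print(correlation(template, louder))
print(correlation(template, different))
```

The appeal for a small processor is that the score ignores overall loudness and offset (the louder copy still correlates at essentially 1.0) and needs only additions, multiplications and one square root per comparison.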

I guess the thing about speech recognition is that it is easy to teach a few words always said the same way by one speaker. Harder to do it with different speakers. And the big big thing we were working with for the bionic ear was working in the real world with a noisy background, particularly when the noisy background is other speech.

Start simple and work up. A few band pass filters will certainly be able to do something. FFT would be a start - is that possible on a prop?
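
As a rough illustration of the "few band pass filters" starting point, the sketch below measures signal energy near a handful of frequencies with the Goertzel algorithm. Goertzel is swapped in here as a cheap stand-in for a bank of narrow band-pass filters; it is not something proposed in the thread, and the sample rate, band frequencies and function names are all assumptions.

```python
import math


def goertzel_power(samples, target_hz, sample_rate_hz):
    """Signal power near target_hz, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * target_hz / sample_rate_hz)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def band_profile(samples, bands_hz, sample_rate_hz):
    """Power in each band; a crude fingerprint of one slice of sound."""
    return [goertzel_power(samples, f, sample_rate_hz) for f in bands_hz]


SAMPLE_RATE = 8000          # assumed sample rate in Hz
BANDS = [500, 1500, 2500]   # assumed band centres in Hz

# Test signal: 400 samples of a pure 500 Hz tone.
tone = [math.sin(2.0 * math.pi * 500 * n / SAMPLE_RATE) for n in range(400)]
profile = band_profile(tone, BANDS, SAMPLE_RATE)
strongest = max(range(len(profile)), key=profile.__getitem__)
print(BANDS[strongest])  # 500
```

Each band costs one multiply and two adds per sample, so a few bands are far cheaper than a full FFT, at the price of only seeing the frequencies chosen in advance.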

Oldbitcollector (Jeff) · 2009-08-26 03:06

@microcontrolled:

I attempted to clean up the code a bit... quite a bit actually..

I dumped the trigger because it seemed it was interfering with the samples themselves.
This one "should" toggle between LISTENING and COMPARING and light an LED when a match is heard.

OBC

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
New to the Propeller?

Visit the: The Propeller Pages @ Warranty Void.

Post Edited (Oldbitcollector) : 8/26/2009 4:01:05 AM GMT

Oldbitcollector (Jeff) · 2009-08-26 03:10

Curious...

Why did you fork the thread into a new one?

There's still a LOT of work to be done here.. :)

OBC

Bob Lawrence (VE1RLL) · 2009-08-26 03:12

Dr_Acula said...
FFT would be a start - is that possible on a prop?

Here's one example:
propeller.wikispaces.com/FFT

Hanno · 2009-08-26 04:56

Accurate speech recognition with a large vocabulary isn't here yet- not even with desktop power/memory. However, it should be possible to reliably differentiate between ~10 words with Propeller memory/cpu. Keep up the great work!
Hanno

Dr_Acula · 2009-08-26 05:02

Thanks for the FFT link. Looks neat. As Hanno says, 10 words with the prop ought to be possible - I'll be following this thread with interest.

Microcontrolled · 2009-08-26 10:55

I put up a new thread so that it would be easy to find in the future. It is not really for a discussion, just to post updates as they come. After school is out I will get back to work on this.

Microcontrolled · 2009-08-27 03:00

Okay, not as much accomplished as I hoped. The power went out just 30 minutes after I finished school and didn't come back on till 8:30 in the evening! I worked till 10 tonight on an EEPROM version that records sound samples and then writes them to the EEPROM on the demo board, overwriting the current program. But this is a copy for 12 Blocks, and it doesn't use EEPROM, so I'm good on that deal!

It gets stuck when it reaches the save-to-EEPROM part, so I will post the code tomorrow so that someone more experienced in that field can help me.

Thanks for all the support!

mctrivia · 2009-08-27 03:19

Check out my I'd object in the obex for how to save to EEPROM.

Beau Schwabe · 2009-08-27 04:12

Here is something I thought I would throw out there... this is just a little bit of my theory on speech recognition. I have used this method on other systems with success. You must choose your words carefully, because some words can have similar patterns. For example, you probably don't want to have the commands 'GO' and 'WHOA'... instead use 'GO' and 'STOP' to make more distinguishable patterns.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Beau Schwabe

IC Layout Engineer
Parallax, Inc.

Post Edited (Beau Schwabe (Parallax)) : 8/27/2009 4:37:57 AM GMT

Phil Pilgrim (PhiPi) · 2009-08-27 04:34

Good point, Beau. I think that's why "giddyup" and "whoa", along with "gee" and "haw" are used with language-challenged draft animals.

-Phil

Beau Schwabe · 2009-08-27 04:37

@Phil - "language-challenged draft animals." - I've been called worse

QuattroRS4 · 2009-08-27 09:47

"language-challenged draft animals." - Brilliant Phil !

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
'Necessity is the mother of invention'

'Those who can, do. Those who can’t, teach.'
'Convince a man against his will, he's of the same opinion still.'

BradC · 2009-08-27 11:24

Phil Pilgrim (PhiPi) said...
Good point, Beau. I think that's why "giddyup" and "whoa", along with "gee" and "haw" are used with language-challenged draft animals.

-Phil

Have you been watching the streaming version of Australian Parliament question time then?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
It's not particularly silly, is it?

Oldbitcollector (Jeff) · 2009-08-27 13:17

@Beau

Thanks for that! Sadly, thanks to taking a day off last week for expo, I spent yesterday
playing "catch up ball" with work until late, and didn't crack out the Prop to smack that
bug you mentioned in the code. (Hopefully tonight!)

@Phil: language-challenged draft animals LOL.

OBC

Microcontrolled · 2009-08-27 15:10

Thanks for the doc, Beau!

Is the waveform shown digital or analog? If it is digital then I might be able to accomplish it; I am not very good with analog signals. The object I am using converts them, which is why I was able to write my simple program: it only has to compare and transfer numbers.

Beau Schwabe · 2009-08-27 15:26

microcontrolled,

The waveform is just the digital ...0-255... representation of the incoming analog signal.
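
To make the 0-255 idea concrete, here is a minimal sketch of turning an analog level into an 8-bit value. The -1.0 to 1.0 input range, the clamping, and the function name are assumptions for illustration; the actual ADC object on the Demo Board may scale things differently.

```python
def to_8bit(level):
    """Map an analog level in -1.0..1.0 to an integer in 0..255, clamping overloads."""
    clamped = max(-1.0, min(1.0, level))
    return int((clamped + 1.0) * 127.5 + 0.5)


print([to_8bit(v) for v in (-1.0, -0.5, 0.0, 0.5, 1.0, 2.0)])
# [0, 64, 128, 191, 255, 255]
```

Silence sits near the middle of the range (128 here), and the speech shows up as numbers swinging above and below it, so comparing two recordings comes down to comparing lists of integers.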
