Speech Recognition for the Propeller: Collaborative Project?? — Parallax Forums

Oldbitcollector (Jeff) Posts: 8,091
edited 2009-10-14 01:50 in Propeller 1
After seeing Chip's demo at UPEW I wanted to take his idea to the next level for
some simple speech recognition for the Propeller.

NOTE: This project is in very early stages! At this point it simply displays the
recorded data and plays it back. It is based on Rayman's Parrot player.

I'm using Propterminal so I can work in my living room. (Included in the zip)
You'll need the demoboard with microphone (or equiv) to make this work.

I'm looking to "loosely" match high points in the waveform, for simple
one-word commands. The command "LIGHTS" might turn on an LED. :)

Posted for comments and suggestions!

OBC

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
New to the Propeller?

Visit the: The Propeller Pages @ Warranty Void.

Post Edited (Oldbitcollector) : 8/26/2009 3:17:26 AM GMT

Comments

  • jazzed Posts: 11,803
    edited 2009-07-06 01:14
    Maybe Chip has ideas on how this could be accomplished? I have ideas, but 0 experience in this realm.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    --Steve


    Propalyzer: Propeller PC Logic Analyzer
    http://forums.parallax.com/showthread.php?p=788230
  • mpark Posts: 1,305
    edited 2009-07-06 01:27
    Is there a schematic of the microphone interface somewhere for non-demoboard users?
  • Mike Green Posts: 23,101
    edited 2009-07-06 01:35
    The Demo Board schematic is posted on the Propeller Downloads page. You can also refer to the comments in the microphone-to-headphones and microphone-to-VGA objects in the Object Exchange.
  • hover1 Posts: 1,929
    edited 2009-07-06 01:39
    mpark said...
    Is there a schematic of the microphone interface somewhere for non-demoboard users?
    Here you go..
    http://www.parallax.com/Portals/0/Downloads/docs/prod/prop/PropDemoDschem.pdf

    It's also in the Propeller IDE Help

    Jim
  • Philldapill Posts: 1,283
    edited 2009-07-06 06:14
    Wow... I spent an entire semester racking my brain on how to do this (numbers 1 - 10), and that was with a desktop PC and MATLAB... If the Propeller can handle simple speech recognition, I can't say I will be too amazed. This is a wonderful chip... :)
  • Leon Posts: 7,620
    edited 2009-07-06 06:27
    Low-cost speech recognition systems like the ones that used to be available for the TRS-80 were very simple: they used a number of bandpass filters (implemented with op-amps) with the outputs fed to comparators. The comparator outputs were sampled using an input port, and the samples were compared to templates previously created in training sessions. IIRC, it could deal with 32 words. I spent some time playing with the TRS-80 unit and it worked quite well.

    Something like that could be implemented using an FFT. I keep meaning to try it with a dsPIC, it would be quite trivial.

    Leon
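
A minimal sketch of the filter-bank-plus-comparator idea Leon describes, written in Python for trying out on a PC. It swaps the op-amp bandpass filters for a plain DFT split into equal bands, turns each band into one comparator bit per frame, and picks the stored template with the fewest differing bits. The frame length, band count, threshold and all function names are illustrative assumptions, not details of the TRS-80 unit.

```python
import cmath
import math


def band_energies(frame, n_bands):
    """Sum DFT magnitudes of one frame into n_bands equal-width bands (DC skipped)."""
    n = len(frame)
    mags = []
    for k in range(1, n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    width = len(mags) // n_bands
    return [sum(mags[b * width:(b + 1) * width]) for b in range(n_bands)]


def comparator_bits(energies, threshold):
    """Software stand-in for the comparators: one bit per band."""
    return tuple(1 if e > threshold else 0 for e in energies)


def signature(samples, frame_len=32, n_bands=4, threshold=8.0):
    """Bit pattern per frame; frame_len, n_bands and threshold are assumed values."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return [comparator_bits(band_energies(f, n_bands), threshold) for f in frames]


def hamming(sig_a, sig_b):
    """Count differing bits; extra frames in the longer signature are ignored."""
    return sum(a != b for fa, fb in zip(sig_a, sig_b) for a, b in zip(fa, fb))


def best_match(sig, templates):
    """Return the name of the trained template closest to sig."""
    return min(templates, key=lambda name: hamming(sig, templates[name]))
```

Training amounts to storing `signature(...)` for each word in a dict and passing that dict to `best_match`. On a Propeller the DFT would be the expensive part, which is one reason a handful of cheap filters may be the better fit.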

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Amateur radio callsign: G1HSM
    Suzuki SV1000S motorcycle

    Post Edited (Leon) : 7/6/2009 6:41:24 AM GMT
  • Chris Micro Posts: 160
    edited 2009-07-06 10:07
    > ... - they used a number of bandpass filters

    I made some attempts to build a speech recognition system on an ATmega8, but only got as far as an early stage. I implemented 6 bandpass filters in software and controlled a robot by whistles.
  • Oldbitcollector (Jeff) Posts: 8,091
    edited 2009-07-06 13:39
    Leon said...
    Low-cost speech recognition system like that which used to be available for use with the TRS-80 were very simple - they used a number of bandpass filters.....

    Yup, that's the direction I'm heading.. with spin..
    As soon as I can narrow the data down a bit, then I will need to
    write a loose comparator routine.

    I'll post an update when I get a little further into this..

    OBC

  • Beau Schwabe Posts: 6,568
    edited 2009-07-06 16:32
    The discussion that pjv, Oldbitcollector, and I had at the EXPO about speech recognition relied on looking at the integral energy for pattern recognition more than at the frequency content of the speech itself. If you capture the integral energy at specific time intervals and compare that to incoming data, then the only normalizing you need to do is to adjust for the various rates at which the word is spoken. A fairly simple algorithm can "expand" or "compress" the buffer length before the comparison is made against the stored data. This method limits some of the words that can be used, i.e. similar words will have similar energy patterns and can be difficult to distinguish. But simple word commands will work just fine. Attached are examples of various words (Yes, No, Forward, Reverse) where you can see very specific patterns under the integral energy.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Beau Schwabe

    IC Layout Engineer
    Parallax, Inc.

    Post Edited (Beau Schwabe (Parallax)) : 7/6/2009 4:39:45 PM GMT
    [Attachments: four images, 1280 x 1024 (218K, 254K, 387K, 327K)]
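
A rough Python sketch of the integral-energy approach Beau outlines: sum the energy over fixed windows, stretch or compress the resulting envelope to a fixed length, then compare it against a stored one. It assumes 8-bit unsigned samples centered on 128, uses linear interpolation for the "expand"/"compress" step, and uses a mean absolute difference as the comparison; those choices and all names are assumptions, not the method discussed at the EXPO.

```python
def energy_envelope(samples, window, midpoint=128):
    """Integral energy per window: sum of |sample - midpoint| (midpoint assumed)."""
    return [sum(abs(s - midpoint) for s in samples[i:i + window])
            for i in range(0, len(samples) - window + 1, window)]


def resample(envelope, length):
    """Stretch or compress an envelope to `length` points by linear interpolation."""
    if length < 2 or not envelope:
        raise ValueError("need a non-empty envelope and length >= 2")
    last = len(envelope) - 1
    out = []
    for i in range(length):
        pos = i * last / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(envelope[lo] * (1 - frac) + envelope[hi] * frac)
    return out


def envelope_distance(a, b, length=16):
    """Mean absolute difference after both envelopes are time-normalized."""
    ra = resample(a, length)
    rb = resample(b, length)
    return sum(abs(x - y) for x, y in zip(ra, rb)) / length
```

A word spoken slowly yields a longer envelope than the same word spoken quickly, but after `resample` both have the same number of points, so the smallest `envelope_distance` against the stored words picks the candidate. Loudness is not normalized here, so a much louder or quieter utterance would still score as different.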
  • Chris Micro Posts: 160
    edited 2009-07-06 17:48
    Leon said...
    Something like that could be implemented using an FFT. I keep meaning to try it with a dsPIC, it would be quite trivial.

    During my experiments I found an interesting student project at an American university. They used an ATmega32 to compute an FFT. Then they saved the spectral peak over time for some words and afterwards compared this to newly spoken words.

    In my opinion, an equally spaced FFT is not the right way to go. It is better to use a few bandpass filters with center frequencies adjusted to the formant frequencies. This reduces the amount of data to be stored for one word and could give good results.

    By the way, here is the video of my sound controlled robot.
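
One way to try the "few tuned bands instead of a full FFT" idea on a PC is the Goertzel algorithm, which measures signal power at a single chosen frequency. This is a swapped-in technique, not the software bandpass filters Chris used, and the three center frequencies below are placeholders for illustration only; real values would have to be tuned to the speaker's formants.

```python
import math


def goertzel_power(samples, freq_hz, sample_rate_hz):
    """Signal power at one frequency, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate_hz)
    s_prev = 0.0
    s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


# Placeholder band centers in Hz (assumed, not measured formant values).
BAND_CENTERS_HZ = (500, 1500, 2500)


def band_profile(frame, sample_rate_hz, centers=BAND_CENTERS_HZ):
    """One power value per tuned band: the data stored per frame of a word."""
    return [goertzel_power(frame, f, sample_rate_hz) for f in centers]
```

Storing three numbers per frame rather than a whole spectrum is where the data reduction mentioned above comes from.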
  • jazzed Posts: 11,803
    edited 2009-07-06 17:53
    Wasn't there a neural net done with the Propeller? I guess it would be a hog for memory, especially if done with floating-point weights, but not such a big deal if the FFT "signature stream" is saved in EEPROM or on an SD card as scaled signed byte or word types.

  • TinkersALot Posts: 535
    edited 2009-07-06 21:14
    OBC:

    have you seen: http://www.circuitcellar.com/library/print/0298/stewart91/index.htm

    may be interesting in terms of techniques?
  • Oldbitcollector (Jeff) Posts: 8,091
    edited 2009-07-06 22:44
    TinkersALot said...
    OBC:

    have you seen: http://www.circuitcellar.com/library/print/0298/stewart91/index.htm

    may be interesting in terms of techniques?

    Good reading... Thank you!

    OBC

  • Mike Huselton Posts: 746
    edited 2009-07-07 02:40
    Good Lord, TinkersALot! That was an incredible article! I have to do the follow-up research now, just to prove the author wrong. If the author is correct, then I have learned something truly amazing. I did electronic circuit design for ten years before being sucked into the black hole of digital software design. I really cannot believe that I spent all that time remaining ignorant of this simple technique.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    JMH
  • Leon Posts: 7,620
    edited 2009-07-07 09:10
    It's a similar technique to something that was in DDJ 30 years ago for digitising and playing back speech. I tried it at the time, and the speech was recognisable, with what sounded like a loud buzz saw in the background.

    Leon

  • Oldbitcollector (Jeff) Posts: 8,091
    edited 2009-07-07 13:46
    @JMH

    I'm still digesting the material in that article, but you are dead on!
    A modification of this appears to be the direction to go...

    Can't wait to get caught up on work to dig out my demoboard again!

    Here's a direct link to the PDF.
    This put Circuit Cellar on my "must subscribe soon" list.

    OBC

  • Oldbitcollector (Jeff) Posts: 8,091
    edited 2009-08-24 19:53
    Guys,

    In the other "Dr. Jim" thread, I suggested that this might become a community project.
    There is just too much that is still over my head, but perhaps if several of us work on this
    we can make it work.

    I really believe we could pull off single word speech recognition for the Propeller using
    a simple demoboard and a connected SD card.

    Any takers on collaboration for this project?

    OBC

  • RobotWorkshop Posts: 2,307
    edited 2009-08-24 20:02
    Taken from my other thread about recognition:

    If you want to try to do some voice recognition on the Prop I can go through my old computer magazines and dig through some of my old computer relics. There were quite a few early Voice recognition boards and I am sure the Propeller today could do a better job. Although I no longer have an S-100 based system I still have one of these boards:

    http://www.computerhistory.org/collections/accession/102670945

    The software is on a paper tape but I think the docs explained how it worked. There was also the VOXBOX for the TRS-80 which ran on the Model I system. I also know of some code for a 6502 that worked reasonably well for speech recognition. Also, I think there may be an old article from Byte or Circuit Cellar that dealt with speech.

    Since I'm not up to speed yet on SPIN programming I don't know how much I could contribute other than research, ideas, and suggestions. I'd be glad to help where I can.

    I also remember another product for the IBM PC called "Hearsay" and one version was an ISA card that plugged into the machine. I'm sure the more I dig through my old stuff I will come across more examples too.

    Robert
  • DogP Posts: 168
    edited 2009-08-24 20:02
    I'm sure this isn't what you're looking for, but I see Parallax has this now: http://www.parallax.com/Store/Accessories/Communication/tabid/161/ProductID/589/List/1/Default.aspx?SortField=ProductName,ProductName , in case someone is looking for speech control and just needs to get it done.

    DogP
  • Oldbitcollector (Jeff) Posts: 8,091
    edited 2009-08-24 20:07
    I'm thinking that some code from this project may be helpful for the audio input side.
    They have already reduced the sound to simple patterns for display.


    @RobotWorkshop:

    'Speechlab Speech recognition board that occasionally "guessed" the correct word.' Wow.. LOL.


    OBC

  • lonesock Posts: 917
    edited 2009-08-24 20:51
    I've started dummying(?) up something similar to TinyVoice on my PC, with the intent of porting it to the Prop when I get it working. I'm actually using a cascaded set of simple 1st-order low-pass filters to split the signal into F1, F2, and "above that" bands. Then, for each band, I count zero crossings and use an average absolute value as a faux power tracker. That gives a 6-parameter vector per sampling point (collected approximately 50 times a second or so). I think it will be fairly easy to do a DTW-type (dynamic time warping) function on the Prop as well.

    The filtering simply uses a tracking variable, e.g.:

    delta := value - track1
    track1 += delta >> 2

    You control the corner frequency with the scaling factor (the shift amount in delta >> 2), relative to the original sampling frequency.

    Jonathan
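
For anyone who wants to experiment on a PC first, here is a hedged Python sketch of the scheme above: the tracking filter, a two-stage cascade split into three bands, and a zero-crossing count plus mean absolute value per band. How the cascade output is turned into F1, F2 and "above that" bands, the shift values, and every name here are assumptions rather than the actual prototype. Samples are assumed to be signed integers centered on zero.

```python
class Tracker:
    """First-order low-pass: track += (value - track) >> shift."""

    def __init__(self, shift):
        self.shift = shift
        self.track = 0

    def update(self, value):
        delta = value - self.track
        # Python's >> is an arithmetic shift, so negative deltas behave sensibly.
        self.track += delta >> self.shift
        return self.track


def split_bands(samples, shift_hi=1, shift_lo=3):
    """Cascade two trackers and take differences (assumed band-splitting scheme)."""
    hi_lp = Tracker(shift_hi)   # higher corner frequency
    lo_lp = Tracker(shift_lo)   # lower corner frequency, fed by the first stage
    f1, f2, above = [], [], []
    for x in samples:
        a = hi_lp.update(x)
        b = lo_lp.update(a)
        above.append(x - a)     # what the first low-pass removed
        f2.append(a - b)        # between the two corners
        f1.append(b)            # below the lower corner
    return f1, f2, above


def zero_crossings(band):
    """Number of sign changes between consecutive samples."""
    return sum(1 for p, q in zip(band, band[1:]) if (p < 0) != (q < 0))


def mean_abs(band):
    """Integer average of absolute values, the 'faux power tracker'."""
    return sum(abs(v) for v in band) // len(band)


def feature_vectors(samples, frame_len):
    """One 6-element vector per frame: (crossings, mean_abs) for each of 3 bands."""
    bands = split_bands(samples)
    vectors = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        vec = []
        for band in bands:
            chunk = band[start:start + frame_len]
            vec += [zero_crossings(chunk), mean_abs(chunk)]
        vectors.append(vec)
    return vectors
```

A larger shift moves the corner frequency lower, and everything stays in integer shifts and adds, so it should carry over to the Propeller without floating point.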

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    lonesock
    Piranha are people too.
  • w8an Posts: 176
    edited 2009-08-24 21:01
    I think I still have one of those SpeechLab boards and a couple S-100 machines to run it on.. Geez, I guess I'm getting behind on that basement cleanup...

    ..Steve

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    ..Steve

    Actual statement from an HP tech support chat session:
    "do you have an email address to help you send you what i need you to send me to help you"
  • Microcontrolled Posts: 2,461
    edited 2009-08-25 01:19
    I can do SPIN, not ASM. I will throw a few attempts at this project in my spare time, if you want. I am not an excellent programmer.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Computers are microcontrolled.

    Robots are microcontrolled.
    I am microcontrolled.

    But you can call me micro.

    If it's not Parallax then don't even bother.

    I have changed my avatar so that I will no longer be confused with others who use generic avatars (and I'm more of a Prop head than a BS2 nut, anyway)



  • Microcontrolled Posts: 2,461
    edited 2009-08-25 20:46
    OK! After about 3 1/2 hours of programming, I have finally got a super-simple speech recognition program! The comments at the top (that's about the only place they are :P) describe how it works.

    Now back to programming!!

    Edit: Oops! Forgot to mention that it is based solely off OBC's program. Without it I could not have made this.




  • Oldbitcollector (Jeff) Posts: 8,091
    edited 2009-08-25 20:48
    Did you really! "Experts" have been working on this for months.. :)
    (No, I'm not one of them!)

    Ok, spill the beans... I'm at work and can't test it yet! How does it work?

    BTW, my code was based on Rayman's code..

    OBC

  • Microcontrolled Posts: 2,461
    edited 2009-08-25 21:04
    Well, as I said, too simple. It sometimes will work great and sometimes won't work at all, so it's not very reliable.

    It works by taking your program and replacing "Play" with a new "CheckRecording" subroutine (although "Play" can be put back in if you want playback; I did that). It has not only a WAV byte array, but also a "Final" byte array. The extra subroutine takes the first part of the "Start" routine from your program and the "Play" subroutine and combines them after the "Record" subroutine call in the "Start" subroutine. It then replaces the playing part of the "Play" routine with a function that checks the sound within a range that is set in the CON block. It reads the data byte by byte as decimal values and, once checked, writes them to temporary VARs. This happens in a repeat loop, so the temporary VARs are overwritten on every pass. It reads the previous recording and the new recording at the same time, and those are what it compares. If the two are within the set range (say, "40"), it adds 1 to the global "sound" VAR.

    When it is done, it checks how consistent the match is by comparing the number of sound bytes that were within range against a value set in the CON block. The value can be anywhere from 0000 (none right) to 8000 (all right, an impossible match). The comparison uses the "=>" operator, so a match can be more exact than the set value, but not less exact.
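
The comparison described above boils down to a few lines. This Python sketch is a reading of that description and not the posted Spin code: walk both recordings together, count the samples whose difference is within a tolerance, and declare a match when the count reaches a threshold. The constant names are hypothetical, the tolerance of 40 comes from the post, and the threshold of 6000 is an arbitrary example within the 0 to 8000 range mentioned.

```python
TOLERANCE = 40          # per-sample difference still counted as "in range"
MATCH_THRESHOLD = 6000  # hypothetical value between 0 and 8000


def count_in_range(reference, candidate, tolerance=TOLERANCE):
    """Count positions where the two recordings differ by at most `tolerance`."""
    return sum(1 for r, c in zip(reference, candidate) if abs(r - c) <= tolerance)


def is_match(reference, candidate, tolerance=TOLERANCE, threshold=MATCH_THRESHOLD):
    """Greater-or-equal test, mirroring the "=>" comparison in the description."""
    return count_in_range(reference, candidate, tolerance) >= threshold
```

Because the samples are compared position by position, a word that starts a little earlier or later in the buffer is compared against shifted data, which may be one reason the results vary from try to try.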




  • jazzed Posts: 11,803
    edited 2009-08-25 21:05
    I think your effort is admirable.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    --Steve

    Propeller Tools
  • Oldbitcollector (Jeff) Posts: 8,091
    edited 2009-08-25 21:12
    An excellent start in the speech recognition direction!

    So we should add a warning not to use this code for failsafe shutdown of a mechanical device? :) :) :)

    OBC

  • Microcontrolled Posts: 2,461
    edited 2009-08-25 21:14
    DEFINITELY NOTHING THAT DEPENDENT!!! :) :)

    I should have a YouTube video up soon.




  • CounterRotatingProps Posts: 1,132
    edited 2009-08-25 21:18
    > ... not using this code for failsafe shutdown of a mechanical device?

    As in:

    "Open the Pod Bay doors HAL" ?

    LOL

    Nice work OBC and Microcontrolled ! This is fun stuff ...

    "f e t c h s l i p p e r s J e e v e s "
