just how much data can you get from a mic? — Parallax Forums

just how much data can you get from a mic?

Muncher Posts: 38
edited 2006-06-25 02:59 in BASIC Stamp
Can you retrieve enough info from a plain old mic to do voice recognition, and if so, how? I do not need any code, although code for the BS2 would be helpful.

Thanks in advance,
Muncher

Comments

  • Mike Green Posts: 23,101
    edited 2006-06-07 03:40
    Over the years, there have been a variety of programs (from the TRS-80 and Apple II to the PC) and chips that can take the output of a microphone, amplify it a little, normalize it (adjust the volume to a "standard" level, locate pauses in speech, and adjust the timing of the data to standardize that), and look for patterns in a small dictionary. These systems could usually distinguish 10-16 "words" (with a lot of mistakes), typically the digits and a few control words that sounded quite different from one another. You might be able to find these programs and chips on the internet. Some programs were developed as thesis projects and were published. It's much more complicated to break speech apart into phonemes or other units, and it takes a lot of sophisticated work to come up with reliable text from speech. This kind of work is hard even for a current high-end Pentium or PowerPC processor with hundreds of MB of memory.
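
A rough sketch of the pipeline described in the post above (level normalization, pause trimming, time normalization, then a lookup in a small dictionary), written in Python for readability rather than for a Stamp. The function names, the 0.1 silence threshold, the 8-bin envelope, and the squared-distance match are all assumptions made for illustration; they are not taken from any of the historical programs or chips mentioned.

```python
def normalize_level(samples, target_peak=1.0):
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)
    return [s * target_peak / peak for s in samples]


def trim_silence(samples, threshold=0.1):
    """Drop leading and trailing samples quieter than threshold."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    return samples[loud[0]:loud[-1] + 1]


def resample_envelope(samples, bins=8):
    """Average |sample| over `bins` equal time slices (crude time normalization)."""
    if not samples:
        return [0.0] * bins
    n = len(samples)
    out = []
    for b in range(bins):
        start = b * n // bins
        end = max(start + 1, (b + 1) * n // bins)
        chunk = samples[start:end]
        out.append(sum(abs(s) for s in chunk) / len(chunk))
    return out


def features(samples, bins=8):
    """Normalize volume, trim pauses, and reduce to a fixed-length envelope."""
    return resample_envelope(trim_silence(normalize_level(samples)), bins)


def recognize(samples, dictionary, bins=8):
    """Return the dictionary word whose stored features are closest to the input."""
    probe = features(samples, bins)

    def distance(word):
        return sum((a - b) ** 2 for a, b in zip(probe, dictionary[word]))

    return min(dictionary, key=distance)
```

To use it, one would record each word once, store `features(recording)` under that word in a dict, and then call `recognize(new_recording, dictionary)`. With only a loudness envelope as the feature, mistakes between similar-sounding words should be expected, which matches the "lot of mistakes" caveat above.
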
  • Mike Green Posts: 23,101
    edited 2006-06-07 03:59
    Here's one research paper on an implementation of a robust speech recognizer on a Sharp Zaurus under Linux:
    http://www.cs.cmu.edu/~awb/papers/ICASSP2006/0100185.pdf#search='embedded speech recognition'
    Also, check out www.sensoryinc.com. They make both a speech recognition chip and speech synthesis chips.
    A Stamp would be too slow to do the signal processing (usually recording the timing of zero crossings and average energy in the sounds), but an SX or PIC microprocessor (or the Propeller) could do the job.
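
For anyone unsure what "zero crossings and average energy" look like as numbers, here is a minimal sketch in Python. It assumes the audio has already been digitized into a list of signed samples centered on zero, and it counts crossings per frame rather than timestamping each one; the frame length of 80 samples and the function names are illustrative assumptions, not something from the post.

```python
def count_zero_crossings(frame):
    """Count sign changes between neighbouring samples in one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))


def average_energy(frame):
    """Mean of the squared sample values (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return sum(s * s for s in frame) / len(frame)


def frame_features(samples, frame_len=80):
    """Split samples into full frames; return (zero_crossings, energy) per frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        feats.append((count_zero_crossings(frame), average_energy(frame)))
    return feats
```

A higher crossing count roughly means higher-pitched or hissier sound, and the energy tracks loudness, so a sequence of these pairs gives a very compact description of an utterance. On a small microcontroller the same two loops would be done with integer math.
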
  • Muncher Posts: 38
    edited 2006-06-07 04:02
    I don't need to break words down into anything, I just need to distinguish 4-6 words (or sounds - a,e,i,o,u - if that is easier) from each other. For this, I need the dominant frequency and its volume (I think). I was simply wondering if someone could help me with this.

    I am still open to any advice (of course).

    -Muncher
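
One possible way to get "the dominant frequency and its volume" without a full FFT is to measure the signal strength at a handful of candidate frequencies with the Goertzel algorithm and keep the strongest. This is a swapped-in technique chosen for illustration, not something proposed in the thread. The sample rate, the candidate list, and the function names are assumptions, and the sketch expects a non-empty list of samples.

```python
import math


def goertzel_power(samples, sample_rate, freq):
    """Squared DFT magnitude at the bin nearest `freq` (Goertzel recurrence)."""
    n = len(samples)
    k = round(n * freq / sample_rate)        # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = 0.0
    s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    power = s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2
    return max(0.0, power)                   # guard against tiny negative rounding


def dominant_frequency(samples, sample_rate, candidates):
    """Return (frequency, amplitude) for the strongest candidate frequency."""
    best = max(candidates, key=lambda f: goertzel_power(samples, sample_rate, f))
    # For a pure sine of amplitude A landing exactly on a bin, |X[k]| = A * n / 2.
    amplitude = 2.0 * math.sqrt(goertzel_power(samples, sample_rate, best)) / len(samples)
    return best, amplitude
```

Each candidate costs one multiply and two adds per sample, so checking four to six frequencies is far cheaper than computing every bin. Whether a single dominant frequency is enough to tell the chosen words or vowels apart would still have to be checked against real recordings.
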
  • Harbor Posts: 73
    edited 2006-06-07 12:31
    If you just hope to mess around with a Stamp to learn something about voice recognition, I can't help. Darned if I know whether anyone ever tried something like that in Basic of any variety. The only stuff I ever dabbled in started with a running FFT of the signal and got more compute intensive from there. These days it's probably wavelets and C# on a DSP.

    On the other hand, if you have an application and are just looking for a part to get a solution, then try the Voice Direct 364. I haven't used it myself, but it specs at fifteen words max. I'm sure you can google that name and get enough hits to find a source for the part.

    You run the audio signal to this chip. I don't remember what conditioning is required. The way it looks to your system is one pin that goes high when it hears any word it's configured to listen for. Then one of eight pins goes high to indicate which word. You can drive other circuits from those outputs, like an on/off circuit, or make them inputs to a Stamp and do anything you like. (And no, I don't remember how they use eight pins to distinguish which of fifteen words was heard. Presumably a pin with a high order bit that means which bank of words caught the 'hit'.) I looked at this part for an application a couple of years ago but never got around to sampling it.

    It's about fifty bucks apiece in small quantities I think.
  • Beau Schwabe Posts: 6,568
    edited 2006-06-07 13:45
    When I was a kid, many moons ago on an ATARI computer in BASIC I wrote a small program that looked at the "paddle" controls (<--essentially a potentiometer) that was able to distinguish between "YES" and "NO" from my voice. Prior to making the distinction, I made a program that would display the average amplitude of sound over time. After studying this I was able to discern a pattern from this that I could use to determine a Yes or No response. From what I remember, graphically the word "YES" ramps up slowly and drops off quickly, whereas the word "NO" does virtually the opposite. It ramps up quickly, and then tapers off slowly.

    Depending on the actual words or sounds you want to detect, it could be doable but it will most likely be a meticulous iterative process looking for the specific patterns. ...And keep in mind what works for your voice may or may not work with someone else's.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Beau Schwabe

    IC Layout Engineer
    Parallax, Inc.
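
The YES/NO trick in the post above can be sketched as follows, in Python rather than Atari BASIC. It computes the average amplitude over time and looks at where the envelope peaks: a late peak (slow ramp up, quick drop) is called "YES" and an early peak (quick ramp up, slow taper) is called "NO". The window size, the halfway cut-off, and the function names are assumptions for illustration; the original program's actual rule is not given in the thread.

```python
def amplitude_envelope(samples, window=50):
    """Average absolute amplitude over consecutive windows of samples."""
    env = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        env.append(sum(abs(s) for s in chunk) / len(chunk))
    return env


def classify_yes_no(samples, window=50):
    """Guess "YES" or "NO" from where the loudness peaks; None if silent."""
    env = amplitude_envelope(samples, window)
    if not env or max(env) == 0:
        return None
    peak_index = env.index(max(env))
    # Position of the peak as a fraction of the utterance (0 = start, 1 = end).
    position = peak_index / (len(env) - 1) if len(env) > 1 else 0.5
    return "YES" if position > 0.5 else "NO"
```

The samples passed in are assumed to cover just the spoken word, with silence already trimmed. As noted above, the cut-off would likely need tuning per speaker.
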
  • Muncher Posts: 38
    edited 2006-06-07 14:15
    Thanks guys. I have no need to code an FFT, although, now that I think about it, I did see something a while ago about an alternative to FFT that is easier on the MCU... I'll have to google that. As for now, I'll have to settle for something other than voice control - I live on a shoestring budget, and $50 is a bit too much for one chip. I'll keep looking for easier solutions.
  • SciTech02 Posts: 154
    edited 2006-06-07 23:23
    Yeah, unfortunately voice recognition isn't as easy as it sounds (I know). I don't know any commands in PBASIC that could be used to monitor sound in an advanced manner (for speech recognition). However, you could get a voice recognition system. They used to have a voice recognition stamp called the voice direct 2, but it's discontinued (and it was $40 too). They have a new one that comes in a kit for $120 (or around that) at mikro electronica. Check out the "speech recognition (Has it been done?)" topic:

    http://forums.parallax.com/showthread.php?p=579853

    It's around page 2.

    I hope this helps.


    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    There is always an answer.

    There is always a way.
    There is always a reason. -SciTech02.
  • Harbor Posts: 73
    edited 2006-06-08 02:37
    Beau Schwabe said...
    When I was a kid, many moons ago on an ATARI computer in BASIC I wrote a small program that looked at the "paddle" controls (<--essentially a potentiometer) that was able to distinguish between "YES" and "NO" from my voice.
    Oh my. Talk about mixed emotions, Beau. My first reaction was "Wow. Good stuff. No wonder you have a job at a neat place like Parallax if you were writing code like that as a kid."
    My second one was dismay when I realized they brought out the Atari about the time I retired. Yikes. I'm feeling really really fatigued tonight... Think I'll take a liver pill and go lie down for awhile. Do you have an emoticon on the left here for <deep sigh>? My old eyes can't tell without a looking glass.
    Harbor (Who would love to know how you rigged the paddle controls to be sensitive enough for picking up sound intensity, but he probably needs to get to bed early tonight. Too much excitement just thinking about paddles and intense impulses.)
    Besides, I can't even get the editor to listen to me. Time to retire. Again...

    Post Edited (Harbor) : 6/8/2006 2:42:43 AM GMT
  • cdub Posts: 26
    edited 2006-06-25 02:59
    I've done a fair amount of voice rec on the PC side, so if you get to a point where you want to involve a computer, I'll be happy to share all the code I used to build an inventory system that uses a barcode scanner to scan grocery items and takes in new items using voice recognition. It's not too bad, but you get better and better recognition results the more you train the system. All code is VB6, which easily integrates with Basic Stamp projects.