Voice Recognition Analysis - Page 2 — Parallax Forums

Voice Recognition Analysis

245

Comments

  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-09-03 06:49
    jazzed,

    I think you nailed it. There can't be that many 8-pinners with Vdd on pin 6. It also explains the lack of resistors. But what an odd choice for a "filter" (if it even is one): the output could drive a speaker. Any idea what the passband of that circuit might look like? Or is someone having a little fun at our expense? :)

    -Phil
  • Dr_Acula Posts: 5,484
    edited 2009-09-03 06:59
    Well done jazzed. Supply on pin 6 got me too. With the cap across pins 1 and 8 it gives a gain of 200: www.josepino.com/?mini_amplifier_lm386

    Not sure about the input components - are they a high or low pass filter or neither?
  • heater Posts: 3,370
    edited 2009-09-03 07:25
    10K to Vs and 0.1uF to ground make a low pass filter.

    The frequency, in Hz, at which the output is attenuated to half power is

    fc = 1 / (2 * PI * R * C)

    which I make about 160Hz.

    Edit: That's probably nonsense. It totally depends on the output impedance of whatever is driving it.
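
As a quick check of the arithmetic, here is a minimal Python sketch of the half-power formula above. It assumes an ideal first-order RC low-pass with the 10K acting as the series resistance, which (as the edit notes) may not describe the real circuit. The function name `rc_cutoff_hz` is made up for illustration.

```python
import math


def rc_cutoff_hz(r_ohms: float, c_farads: float) -> float:
    """Half-power (-3 dB) frequency of an ideal first-order RC low-pass."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)


# Values from the post: 10K and 0.1uF.
fc = rc_cutoff_hz(10_000, 0.1e-6)
print(f"fc = {fc:.1f} Hz")  # prints: fc = 159.2 Hz
```

For 10K and 0.1uF this comes out at roughly 159 Hz, in line with the "about 160Hz" figure.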

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    For me, the past is not over yet.

    Post Edited (heater) : 9/3/2009 7:30:22 AM GMT
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-09-03 09:18
    Heater,

    LOL! I followed that exact same path in my thought process: LP filter -> oh, wait.

    -Phil
  • Leon Posts: 7,620
    edited 2009-09-03 10:12
    I had a PM from Beau about my surmise that it might be an op amp; he thought it was an LM386. It's not a logic chip, as Mark claimed, anyway.

    It looks very like the standard LM386 circuit to me, apart from the R and C on the input. That isn't an LP filter, as the resistor is effectively in parallel with the capacitor. The resistor must be the one usually used for supplying electret mics, although at 10k the value is rather high; they are usually 1k-2k. An LP filter has this configuration:

    ------R------
                |
                C
                |

    Leon

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Amateur radio callsign: G1HSM
    Suzuki SV1000S motorcycle
  • heater Posts: 3,370
    edited 2009-09-03 10:39
    It just occurred to me about the DC supply for the FET in the electret mic. I've seen a few mic amps with 10K to do that.

    Isn't it so that the R in your ASCII art circuit is actually the output impedance of the mic, and therefore the 0.1uF in Jazzed's picture makes a low pass filter? The roll-off would depend on the microphone.

  • Chris_D Posts: 305
    edited 2009-09-03 10:47
    Mallred,

    Are you and Jim both new to the internet and public presentation of big claims? There are a few things that trigger the responses you are getting, let me start with a few of the most basic....

    1) Company name is MIT - gee, doesn't this appear like someone without credentials trying to sound important? Come on guys, you surely picked that name with intent to capitalize on their efforts.

    2) The name "Dr. Jim", okay, is he a doctor of something or does he just play one on the forums to make himself sound more important?

    3) Big claims require big proof. UFOs, Bigfoot, Mind control, and Machine Intelligence are big claims. You will notice that you are having the same problems that people who have reported seeing UFOs, Bigfoot, etc. are having. You make a big claim, you better have big proof.

    4) When promoting products for sale through your company on public forums, expect to back up your claims. Someone else made a comparison to a "snake oil salesman". When you are profiting off of other people, it is their right to question your claims.

    So far, you and this person referred to as "Dr. Jim" have talked about something "Big", are selling products to people and have been met with a lot of questions that have never been answered to the customers (or audience's) satisfaction. You might want to learn how to satisfy customers (and potential customers) if you wish to succeed in business.

    Maybe it would be in the best interest of your company to step back, rethink what you are trying to do and then wait until you have something more than an idea to talk about before posting again. Rather than restarting with claims that you will be doing something big, start with some demonstrations of that technology actually working.

    Chris
  • Leon Posts: 7,620
    edited 2009-09-03 10:55
    Heater:

    That could be the case. I've just remembered that the output impedance of those mics is the value of the bias resistor.

    The carrier wave produced by the vocal cords is 200 Hz to 300 Hz for male speakers, which is different from your calculated value of 160 Hz. It's not a very good filter, anyway.

    Leon

  • Agent420 Posts: 439
    edited 2009-09-03 11:53
    mallred said...

    Personally I would like to share my enthusiasm about a possible breakthrough on tough issues. Unfortunately, it always ends in name calling and a shouting match.

    I am slowly coming to the conclusion that I do not have the technical ability to present what we have to offer in terms you appreciate, that is, very technical and detailed. I have access to Dr. Jim, but he is more interested in pushing forward on our project than in answering questions.

    With all due respect, if you are discussing technical breakthroughs on a controversial or difficult subject, you need to be able to articulate your work as clearly as possible, else risk what I would say to be understandable criticism. As you are additionally marketing products based on your research, how could you expect anything less than requests for detailed, technical information?

    I note that on your MIT company webpage, you state:
    6. Machine Intelligence software - already complete, but needs to be ported to our new platform (Propeller chip with KISS OS using SPIN code and Assembly Language, which is what the Propeller understands natively).
    

    But when requests regarding further information or demonstrations of this material are made, which I think is quite fair given you have used that statement in terms of advertising products you are selling, suddenly that work is classified and only operates on highly customized military PCs, so no information or demonstration is possible? Porting implies that you have running code on another architecture... Are we to surmise that Dr Jim has this classified system in his office, but it is not so classified that you can sell it on the Propeller?

    For that matter, Dr Jim has stated in the ROBOT interview that he has had verbal conversations with a system he built previously. If that is the case, has not the work on the VR filter and algorithms already been done? I do not see the need to hypothesize how to accomplish this; you've already claimed to have done it.

    Enthusiasm aside, you must admit that even the technical documentation for the KISS OS you have released contains very little information as well. Frankly, I am surprised that any of your customers have been able to accomplish much based on the resources you have provided them.




    Post Edited (Agent420) : 9/3/2009 3:02:33 PM GMT
  • Leon Posts: 7,620
    edited 2009-09-03 13:27
    One of their customers got very excited when KISS_OS ran on his hardware:
    said...
    This customer has successfully tested the KISS_OS with a 1TB hard drive. His response was "Wow, it works! This is exciting!"

    Leon


    Post Edited (Leon) : 9/3/2009 1:32:20 PM GMT
  • mallred Posts: 122
    edited 2009-09-03 16:35
    I'm not sure if that schematic is correct or not. I asked Dr. Jim if he would release it, since we have released the pictures, but he does not want to do that.

    I will try to post video of our filter working, with before and after shots, so you can see what is left after the signal passes through the filter.

    Other than that, I know we have a deficiency in our documentation. Dr. Jim admits to this. We will be working to flesh out our documentation.

    And in about a month we should be able to start demonstrations on machine intelligence as we train our robot brain.
  • jazzed Posts: 11,803
    edited 2009-09-03 18:09
    It's hard to tell from those pictures, so I don't know if the schematic is right either :) Dr. Jim is pretty self-absorbed to the point of ignoring all others (which is fine by me), so I don't expect him to comment or show his documented design :) Obviously this schematic is only for the "Chattering Class" to ponder ... good alliteration by the way (not better than "pompous premises" though :).

    The bottom of the board is the most important part other than the IC, and the wiring is hard to interpret. The IC seems pretty obvious to most of us, but "we" could be wrong. I seriously doubt that it's an A/D chip. It is curious to me why a filter might be on the input ... that's OK :) Looking at some condenser microphone descriptions, it seems that I missed the microphone ground connection.

    Phil, Leon, et al.: correct me if I'm wrong, but if you have Xc = 1 / (2 * PI * f * C), doesn't that mean impedance goes to 0 as frequency goes up, effectively shunting higher frequencies to ground? If the microphone has a series impedance around 50 ohms, the cutoff frequency would be close to 33Hz (or about 170Hz for 8 ohms).
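
A small Python sketch of the reactance formula, for anyone who wants to plug in numbers. It assumes the 0.1uF capacitor discussed earlier in the thread and treats the microphone as a pure series resistance; the helper names are hypothetical. Under those assumptions the corner for a 50 ohm source lands in the tens of kHz, so Hz-range figures like the ones quoted above would need a much larger R or C.

```python
import math

C = 0.1e-6  # assumed: the 0.1uF input capacitor from the thread


def capacitive_reactance(f_hz: float, c_farads: float) -> float:
    """Xc = 1 / (2 * PI * f * C), in ohms."""
    return 1.0 / (2.0 * math.pi * f_hz * c_farads)


def cutoff_hz(r_source_ohms: float, c_farads: float) -> float:
    """Corner frequency of the RC low-pass formed with a source resistance."""
    return 1.0 / (2.0 * math.pi * r_source_ohms * c_farads)


# Reactance falls as frequency rises, so the capacitor shunts
# more of the high-frequency signal to ground.
for f in (100, 1_000, 10_000):
    print(f"{f:>6} Hz: Xc = {capacitive_reactance(f, C):,.0f} ohms")

# Corner frequency for a few assumed source resistances.
for r in (8, 50, 10_000):
    print(f"R = {r:>6} ohms: fc = {cutoff_hz(r, C):,.0f} Hz")
```

The first loop shows the falling reactance the question describes; the second shows how strongly the corner depends on whatever resistance is actually in series with the capacitor.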

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    --Steve

    Propeller Tools
  • Agent420 Posts: 439
    edited 2009-09-03 18:29
    > Obviously this schematic is only for the "Chattering Class" to ponder

    I think if you analyze that filter with a sweep generator, you will discover that it removes the chattering fundamental. This explains why he can't hear you.

  • mallred Posts: 122
    edited 2009-09-03 18:36
    Dr. Jim said specifically that you want to filter out the carrier wave first. Then you manipulate the modulation wave, which is a much simpler wave, and can therefore be handled on a Propeller. This is why we have the filter first.
  • jazzed Posts: 11,803
    edited 2009-09-03 19:09
    Mark, the carrier would be filtered whether the filter is on the input or on the output of the OpAmp ... as long as the carrier is filtered before the Propeller sees it. My problem with an input filter is, you may end up filtering out a lot more pre-amplified signal than you might like. But like I said, that's OK too :)

  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-09-03 19:23
    Mark,

    I'm still at a loss about a number of things:

    1. "Carrier" vs. "Modulation": what are these exactly? Can Dr. Jim relate these to more commonly-used speech terms like "formants", "vocalization", and "amplitude envelope"? It does no good to give a presentation or demo if no one knows what the terms you're using refer to.

    2. What is the bandpass of the filter? If, as someone has suggested, it's a first-order lowpass filter with a cutoff of 170Hz, what's left? If it were simply an envelope detector, it would still need a diode somewhere.

    3. Why use an analog prefilter at all? You can hook a mic directly to a sigma-delta Prop input and do as much filtering as you want in software. Multiple projects presented in the forum have proven this.

    "Enquiring minds want to know!"(TM) :)

    -Phil
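
To illustrate the third point, that a first-order low-pass is easy to do in software once the mic signal is sampled, here is a rough Python sketch. It is not Propeller code and does not model the sigma-delta input: a plain list of floats stands in for the samples, the 8 kHz sample rate is an assumption, and the 170Hz cutoff is simply the figure floated in the thread. All function names are made up for the example.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz):
    """First-order IIR low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # equivalent RC time constant
    alpha = dt / (rc + dt)
    y = 0.0
    out = []
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def tone(freq_hz, sample_rate_hz, n):
    """Unit-amplitude sine test tone, n samples long."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz)
            for i in range(n)]


def peak(samples):
    return max(abs(s) for s in samples)


FS = 8000      # assumed sample rate
CUTOFF = 170   # cutoff suggested in the thread

low = one_pole_lowpass(tone(50, FS, 4000), CUTOFF, FS)
high = one_pole_lowpass(tone(2000, FS, 4000), CUTOFF, FS)

# Skip the first half so the start-up transient has settled.
print(f"50 Hz tone peak after filter:   {peak(low[2000:]):.2f}")
print(f"2000 Hz tone peak after filter: {peak(high[2000:]):.2f}")
```

The 50 Hz tone passes almost untouched while the 2 kHz tone, which sits in the range where formants are usually said to live, is cut to a small fraction. That is the "what's left?" question in numbers.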
  • Leon Posts: 7,620
    edited 2009-09-03 20:09
    The vocal cords provide a carrier wave (about 200 Hz to 350 Hz) which is modulated by the mouth, tongue etc. with frequencies from about 1 kHz to 2.5 kHz. The modulation signal is actually the speech formant. It's basically a different way of looking at speech production.

    Leon


    Post Edited (Leon) : 9/3/2009 8:16:22 PM GMT
  • mpark Posts: 1,305
    edited 2009-09-03 21:01
    That does seem to be the obvious translation from Dr. Jim-speak to commonly accepted speech terminology, but I wonder... Formants are a frequency domain thing; you see them in spectrographs, as Phil has shown. Dr. Jim (through Mark) claims that one can see the modulation wave on an oscilloscope, implying that it's a time-domain phenomenon. I'm very curious to see the promised video of a modulation wave.

    Also, if Dr. Jim's method discards the carrier wave (and "carrier wave" means what we guess it does), does that mean that his method won't work with tonal languages?
  • RobotWorkshop Posts: 2,307
    edited 2009-09-03 21:16
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-09-03 21:28
    The vocal cords provide a "buzzing" sound or impulse noise from their opening and closing that's rich in harmonics. If anything gets "modulated" by the remainder of the vocal tract to produce discernable formants, it's these harmonics, not the fundamental. That's why it seems so odd to filter out the "carrier" since, by extension, that has to include the harmonics as well. But then there's nothing left. I wish Dr. Jim would take some time out to explain in detail exactly what he means by "carrier", "modulation" and "filter".

    Mark, a simple clarification along these lines would go a long way towards removing the taint of what some might perceive to be "voodoo" or "snake oil".

    -Phil
    ___________________________

    "When I use a word," Humpty Dumpty said, in a rather scornful tone, "it means just what I choose it to mean - neither more nor less."
    "The question is," said Alice, "whether you can make words mean so many different things."
    "The question is," said Humpty Dumpty, "which is to be master - that's all."
    Through the Looking Glass.

    Sorry, 'couldn't resist! :)
  • CounterRotatingProps Posts: 1,132
    edited 2009-09-03 21:59
    Mr. Mallred,

    Repeating in my own words, an oversimplification of replies to this thread so far:

    Regardless of terminology, if that input filter you showed us is low pass, rolling off sharply at very low frequencies - it would be as if you were speaking into a microphone that was turned off! (As Phil just said, "there's nothing left.")

    Can you see here another reason why we are so skeptical? *Please* understand, I mean this with no ill will whatsoever.

    thanks
    - Howard

  • Leon Posts: 7,620
    edited 2009-09-03 22:03
    mpark said...
    That does seem to be the obvious translation from Dr. Jim-speak to commonly accepted speech terminology, but I wonder... Formants are a frequency domain thing; you see them in spectrographs, as Phil has shown. Dr. Jim (through Mark) claims that one can see the modulation wave on an oscilloscope, implying that it's a time-domain phenomenon. I'm very curious to see the promised video of a modulation wave.

    Also, if Dr. Jim's method discards the carrier wave (and "carrier wave" means what we guess it does), does that mean that his method won't work with tonal languages?

    It's nothing to do with Dr Jim; a carrier produced by the vocal cords which is modulated by the mouth etc. is how the vocal system works. Try singing a couple of different notes and you should get the idea.

    Leon


    Post Edited (Leon) : 9/3/2009 10:12:06 PM GMT
  • mallred Posts: 122
    edited 2009-09-03 22:04
    I would really like Dr. Jim to release the schematic. I'll see if I can push for that. I think it would do us (as a company) a lot of good. I'll also see if he can describe his terms a little more for the benefit of all (including myself).

    Thanks,

    Mark
  • CounterRotatingProps Posts: 1,132
    edited 2009-09-03 22:20
    That would be great - thanks!

  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-09-03 22:27
    Mark,

    I appreciate any and all efforts you can make in that direction! Thanks!

    -Phil
  • VIRAND Posts: 656
    edited 2009-09-03 22:29
    The Fundamental Buzz of the vocal cords is sharp pulses with harmonics (multiples) on all audible octaves,
    but the formant regions only bandpass those harmonics around the formant frequency.
    Filtering it out seems not important.
    It's not needed to understand voice, only inflection and singing notes.

    Smearing together the "stripes" in the formants by using slower (higher Q) filters
    should make it easier to process the formants and identify vowels I think,
    since they will appear more like inkblots around the formant frequencies.
    The stripes are the harmonics of the buzz and if the formant filter responds slower
    than the buzz frequency around 100-400 Hz then it won't give buzz in its output.
    Since the formant frequencies are higher than the buzz, slowing the filter is really
    slowing an envelope follower of the filter, so that the formant frequency is not filtered out too.
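
One way to picture the "slow envelope follower" idea is the Python sketch below. It is only an illustration under assumed numbers: a 1 kHz tone stands in for the output of one formant filter, its amplitude pulses at an assumed 120 Hz buzz rate, and the follower is a rectifier plus a one-pole smoother. None of these names or values come from the thread.

```python
import math


def envelope_follower(samples, cutoff_hz, sample_rate_hz):
    """Full-wave rectify, then smooth with a one-pole low-pass."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    env = 0.0
    out = []
    for x in samples:
        env += alpha * (abs(x) - env)
        out.append(env)
    return out


def ripple(env):
    """Peak-to-peak variation over the settled second half."""
    tail = env[len(env) // 2:]
    return max(tail) - min(tail)


FS = 8000          # assumed sample rate
BUZZ_HZ = 120      # assumed vocal-cord pulse rate
FORMANT_HZ = 1000  # assumed formant filter centre frequency

# Formant-band tone whose amplitude pulses at the buzz rate.
signal = [
    (0.5 + 0.5 * math.cos(2.0 * math.pi * BUZZ_HZ * i / FS))
    * math.sin(2.0 * math.pi * FORMANT_HZ * i / FS)
    for i in range(FS)
]

fast = envelope_follower(signal, 400, FS)  # faster than the buzz
slow = envelope_follower(signal, 20, FS)   # slower than the buzz

print(f"ripple with fast follower: {ripple(fast):.3f}")
print(f"ripple with slow follower: {ripple(slow):.3f}")
```

The fast follower tracks every buzz pulse, while the slow one settles to a nearly steady level, which is the "smearing the stripes" effect described above.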

    If the analyzer is spectrographic then plotting bigger dots simply smears out the buzz harmonics.

    I think vowels can be recognized with only two formant frequencies
    which are the most important for understanding speech. Using more
    formants plus the buzz is only important for realistic speech synthesis
    that sounds close to human.

    This is what I think I know about the subject that's relevant.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-09-03 23:23
    VIRAND,

    That, as I've stated elsewhere, is precisely my understanding, too. But smearing things in the frequency domain is not a job for a simple analog filter. Hence the mystery of Dr. Jim's microphone "prefilter".

    BTW, in my Goertzel speech recognizer posted elsewhere, I actually got poorer performance with low-Q bandpass filters, despite there being wide gaps between the passbands of the better-performing high-Q filters. I'm still trying to figure out why.
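
For readers who haven't met it, the Goertzel algorithm measures the energy at a single frequency without computing a full FFT. The sketch below is a generic textbook version in Python, not the recognizer mentioned above (which isn't shown in this thread); the 8 kHz sample rate and 400-sample block are assumptions. The block length sets the selectivity: a longer block gives a narrower, higher-Q response.

```python
import math


def goertzel_power(samples, target_hz, sample_rate_hz):
    """Squared magnitude of `samples` at the DFT bin nearest `target_hz`."""
    n = len(samples)
    k = round(n * target_hz / sample_rate_hz)  # nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = 0.0
    s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


FS = 8000  # assumed sample rate
N = 400    # assumed block length; bins are FS / N = 20 Hz apart

# Test signal: a pure 1 kHz tone.
sig = [math.sin(2.0 * math.pi * 1000 * i / FS) for i in range(N)]

on_target = goertzel_power(sig, 1000, FS)
off_target = goertzel_power(sig, 2000, FS)
print(f"power at 1000 Hz: {on_target:.1f}")
print(f"power at 2000 Hz: {off_target:.6f}")
```

A bank of these, one per band of interest, gives a coarse spectrum per block.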

    -Phil
  • lonesock Posts: 917
    edited 2009-09-03 23:24
    One empirical observation: humans are pretty good at recognizing human voices with music in the background. I would love to see Dr. Jim's system recognize the lyrics of a song, or even pick out a single keyword when it was sung with background music present. If your electronic ear can do that I am very impressed. (btw, A fairly common technique in mixing when an instrument is competing with the vocals is to attenuate the instrument track in the human vocal frequency range...helps the ear pick out the words. But this is very definitely done in frequency space...not in "envelope" space (for lack of a better term)).

    Jonathan

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    lonesock
    Piranha are people too.
  • CounterRotatingProps Posts: 1,132
    edited 2009-09-03 23:34
    Phil > passbands of the better-performing high-Q filters. I'm still trying to figure out why.

    just another good reason to add those parametric-like parms I suggested earlier (hint hint) :)

    - H

  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-09-03 23:39
    Jonathan,

    For a long time, before I knew its title, I thought the words to a certain Beach Boys song went something like, "Bob, bob, bob, bob-bob buran," which made it a very weird song, indeed. So I'm not sure that, unless you already know the lyrics, they're that easy even for humans to pick out from the instrumentals. What is remarkable, however, is our ability to focus on and follow a single conversation in a crowded room filled with chatter.

    -Phil

    P.S. There's an entire website devoted to misheard song lyrics: www.kissthisguy.com, a reference, I believe, to Jimi Hendrix's "Purple Haze".