Voice Recognition Analysis - Page 3 — Parallax Forums

  • lonesocklonesock Posts: 917
    edited 2009-09-03 23:46
    @Phil: I have a sneaking suspicion that for said Beach Boys' song (and every Nirvana song), even listening to an isolated vocal track would have led to the same confusion. [8^)

    Jonathan

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    lonesock
    Piranha are people too.
  • shanghai_foolshanghai_fool Posts: 149
    edited 2009-09-04 00:03
    Phil Pilgrim (PhiPi) said...
    . What is remarkable, however, is our ability to focus on and follow a single conversation in a crowded room filled with chatter.

    This ability, too, seems to diminish with age.

    Donald
  • Bob Lawrence (VE1RLL)Bob Lawrence (VE1RLL) Posts: 1,720
    edited 2009-09-04 00:06
    The values I see in the pictures are as follows:

    R1 10K

    Disc caps × 3, marked 104 = 100 nF (0.1 µF) @ 25 volts

    Electrolytic capacitor 10 µF × 1 @ 25 volts

    Electrolytic capacitor 8330 µF * 1

    The rest I can't tell.

    Post Edited (Bob Lawrence (VE1RLL)) : 9/4/2009 2:14:07 AM GMT
  • Bob Lawrence (VE1RLL)Bob Lawrence (VE1RLL) Posts: 1,720
    edited 2009-09-04 00:31
    Resistor 10 K - 5 %

    Post Edited (Bob Lawrence (VE1RLL)) : 9/4/2009 12:59:23 AM GMT
  • Bob Lawrence (VE1RLL)Bob Lawrence (VE1RLL) Posts: 1,720
    edited 2009-09-04 00:32
    8330 µF Capacitor ( I can't find such a cap in a Google search) ???

    Post Edited (Bob Lawrence (VE1RLL)) : 9/4/2009 2:19:31 AM GMT
  • Bob Lawrence (VE1RLL)Bob Lawrence (VE1RLL) Posts: 1,720
    edited 2009-09-04 00:33
    104 Cap @ 25 Volts
  • Bob Lawrence (VE1RLL)Bob Lawrence (VE1RLL) Posts: 1,720
    edited 2009-09-04 00:35
    10 UF Cap @ 25 Volts
  • Bob Lawrence (VE1RLL)Bob Lawrence (VE1RLL) Posts: 1,720
    edited 2009-09-04 00:36
    Mic
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-09-04 02:11
    Jonathan,

    I think you're right! :)

    shanghai_fool,

    I fear that you, too, are right. Even following a conversation that doesn't interest me, without any interference, can be a challenge!

    -Phil
  • mallredmallred Posts: 122
    edited 2009-09-04 02:25
    I was wrong about the circuit. There are actually two circuits. The one with pictures I posted was only an analog amplifier for the mic.

    The second circuit is the envelope extraction filter.

    And video.

    http://www.youtube.com/watch?v=th2SPT7zoJs

    I think they uploaded backwards, but you can figure out which is which.
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-09-04 02:32
    Mark,

    Now we're talking! This is real progress. Don't you feel better? I know I do!

    Thanks,
    -Phil
  • potatoheadpotatohead Posts: 10,261
    edited 2009-09-04 02:40
    :)

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Propeller Wiki: Share the coolness!
    Chat in real time with other Propellerheads on IRC #propeller @ freenode.net
    Safety Tip: Life is as good as YOU think it is!
  • mallredmallred Posts: 122
    edited 2009-09-04 02:41
    Yep, I was able to pull it out this time.

    Thanks.

    Mark
  • Bob Lawrence (VE1RLL)Bob Lawrence (VE1RLL) Posts: 1,720
    edited 2009-09-04 02:43
    Low Pass Filter - First Order
  • Bob Lawrence (VE1RLL)Bob Lawrence (VE1RLL) Posts: 1,720
    edited 2009-09-04 02:55
    @mallred

    Thanks for the schematic!

    @Phil

    We were getting close ;)
  • Bob Lawrence (VE1RLL)Bob Lawrence (VE1RLL) Posts: 1,720
    edited 2009-09-04 02:59
    CD4066 - Data Sheet:

    www.datasheetcatalog.org/datasheets/270/109221_DS.pdf

    ADC0820 Data Sheet:

    8-Bit, high-speed, µP-compatible A/D converter
    with track/hold function

    www.datasheetcatalog.org/datasheet/philips/ADC0820CNED.pdf

    LM386 - Data Sheet
    Low Voltage Audio Power Amplifier:

    www.national.com/ds/LM/LM386.pdf


    MPSA05 - Data Sheet:
    NPN General Purpose Amplifier

    www.fairchildsemi.com/ds/MP/MPSA05.pdf

    Electret Condenser Mic 47 dB - Data Sheet:
    media.digikey.com/pdf/Data%20Sheets/Horn%20Industrial%20PDFs/EM3015S-47-G.pdf

    Post Edited (Bob Lawrence (VE1RLL)) : 9/4/2009 3:53:25 AM GMT
  • jazzedjazzed Posts: 11,803
    edited 2009-09-04 03:09
    Mark, the preamp looks very familiar :) I was wondering how the A/D would be done. The minimalist sigma-delta method used often with 2 Propeller pins is OK, but the range and conversion rate are limited ... better to use a COG for something more interesting. There may still be distractions here, but I think you have come a long way toward repairing some damage with this thread. I look forward to a demo.

    BTW: I tried that Forth (uggh) based pseudo-AI thingy. It's not very intelligent :) Seems to just spew phrases with words given to it and some modifiers based on language rules. Surely someone can do better than that ... uggh :)

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    --Steve

    Propeller Tools
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-09-04 03:13
    The CD4066 is just a way of switching between two mics. The transistor circuit looks like an envelope detector. The base is biased to 0.6V by the forward-biased diode, so it will only conduct on positive excursions of the input. These are filtered by the two-stage lowpass filter going into the ADC. I'm not sure yet what the effects of the emitter resistor or feedback cap might be, except maybe to linearize the output. I'm guessing that all the formant info is being discarded and that only the amplitude profile is being used. (Beau would like this! :) )

    If my interpretation is correct, it also clears up what's meant by "carrier" and "modulation" in this context.
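For anyone who wants to try this interpretation in software, here is a minimal sketch of the same idea: keep only the positive excursions, then smooth them with two cascaded first-order lowpass stages. It is an idealized model, not the posted circuit. The sample rate, the 200 Hz "carrier" tone, and the 20 Hz cutoff are all assumed values chosen for illustration, and the emitter resistor and feedback cap are not modeled.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz):
    """Discrete first-order RC lowpass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    y = 0.0
    out = []
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out

def extract_envelope(samples, cutoff_hz, sample_rate_hz):
    # Idealized half-wave rectifier: pass positive excursions, block negative ones.
    rectified = [max(x, 0.0) for x in samples]
    # Two cascaded first-order stages stand in for the two-stage lowpass filter.
    stage1 = one_pole_lowpass(rectified, cutoff_hz, sample_rate_hz)
    return one_pole_lowpass(stage1, cutoff_hz, sample_rate_hz)

SAMPLE_RATE_HZ = 8000  # assumed, not from the thread
CARRIER_HZ = 200.0     # assumed stand-in for the voiced "carrier"
CUTOFF_HZ = 20.0       # assumed envelope bandwidth

def tone_burst(n):
    """Test input: silence, then a half-second tone, then silence."""
    t = n / SAMPLE_RATE_HZ
    amplitude = 1.0 if 0.25 <= t < 0.75 else 0.0
    return amplitude * math.sin(2.0 * math.pi * CARRIER_HZ * t)

signal = [tone_burst(n) for n in range(SAMPLE_RATE_HZ)]
envelope = extract_envelope(signal, CUTOFF_HZ, SAMPLE_RATE_HZ)
```

In this model the "modulation" is the slow on/off profile that survives in `envelope`, and the "carrier" is the 200 Hz tone that the filter removes. During the burst the output settles near 1/π of the peak amplitude, which is the average of a half-wave rectified sine.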

    It's a lot of circuitry, though, to throw at a problem that the Prop could solve on its own through software. But, hey, if it works, it works.

    -Phil

    Post Edited (Phil Pilgrim (PhiPi)) : 9/4/2009 3:19:49 AM GMT
  • SRLMSRLM Posts: 5,045
    edited 2009-09-04 03:20
    PhiPi said...
    It's a lot of circuitry, though, to throw at a problem that the Prop could solve on its own through software. But, hey, if it works, it works.

    I have no idea how complex the equivalent software would be, but it could be that Dr. Jim wants to conserve cog resources (time and memory) and so does what he can in hardware.
  • mallredmallred Posts: 122
    edited 2009-09-04 03:21
    Dr. Jim says there is an active low-pass filter and a clipper that extracts the modulation envelope, and then a passive Pi network that removes the carrier. There is a high-speed A-to-D converter after the Pi network. It has to sustain a conversion time of 1.5 microseconds per sample.
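To put the 1.5 microsecond figure in perspective, the arithmetic below converts it to a maximum sample rate. It assumes back-to-back conversions with no acquisition or bus overhead, which a real converter and host would add, so treat the result as an upper bound.

```python
# Conversion time quoted in the post above.
CONVERSION_TIME_S = 1.5e-6

# Upper bound on throughput if conversions run back to back (assumption).
max_rate_hz = 1.0 / CONVERSION_TIME_S

# Highest input frequency representable at that rate (Nyquist limit).
nyquist_hz = max_rate_hz / 2.0

print(f"max sample rate: {max_rate_hz / 1e3:.0f} kS/s")
print(f"Nyquist limit:   {nyquist_hz / 1e3:.0f} kHz")
```

That works out to roughly 667 thousand conversions per second.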
  • mallredmallred Posts: 122
    edited 2009-09-04 03:25
    @Phil

    Dr. Jim commented to me that the Prop could do it, but there was some kind of problem and he had to resolve it by building this circuit outside of the Prop. He has his reasons for doing this I suppose.

    Mark
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-09-04 03:48
    Mark,

    That's fine, and I respect his decision to proceed the way he has. Please tell him that if he could divulge the problem he was having using the Propeller alone, I'd be happy to help him work out a solution, if one is possible. But I prefer the openness of the forum to private communication, so that many can benefit from the discussion. :)

    -Phil
  • mparkmpark Posts: 1,305
    edited 2009-09-04 06:32
    Mark, thanks for posting the video.

    I believe Phil is correct in his interpretation of "carrier" and "modulation." Dr. Jim's method apparently discards not just the carrier but also formant information. If this approach works, I'll be very surprised. (I do like surprises, though!)
  • Nick MuellerNick Mueller Posts: 815
    edited 2009-09-04 07:59
    Re the discussion "carrier" and "envelope".
    What we finally saw is exactly what Dr. Jim described and how I interpreted it. This method will fail, as has been discussed in a different thread.
    At the very least, it can't distinguish vowels.

    As an example:
    "beer" vs. "bar" (OK, it somehow *is* related)
    "left" vs. "lift" (my robot always turns left if he has to lift his left arm).
    "step" vs. "stop" (whenever I say stop, he makes an extra step)

    And all that not with just a single trained speaker, but with many different speakers. The overall analyzed data of some (let's say 100) words will result in a low probability of correct recognition.
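The vowel concern can be illustrated with a toy example. The sketch below builds two sounds that share the same loudness contour but differ in frequency, then compares their amplitude envelopes. The sounds are plain sine tones at 300 Hz and 800 Hz, crude stand-ins chosen for illustration and not real vowels, and the per-frame RMS envelope is an assumed substitute for the hardware envelope detector.

```python
import math

SAMPLE_RATE_HZ = 8000  # assumed
N_SAMPLES = 3200       # 0.4 s of audio
FRAME = 160            # 20 ms frames

def shaped_tone(freq_hz):
    """A sine at freq_hz under a smooth rise-and-fall loudness contour."""
    out = []
    for n in range(N_SAMPLES):
        loudness = 0.5 * (1.0 - math.cos(2.0 * math.pi * n / N_SAMPLES))
        out.append(loudness * math.sin(2.0 * math.pi * freq_hz * n / SAMPLE_RATE_HZ))
    return out

def frame_rms(samples):
    """Amplitude envelope: one RMS value per non-overlapping frame."""
    env = []
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        frame = samples[start:start + FRAME]
        env.append(math.sqrt(sum(x * x for x in frame) / FRAME))
    return env

def zero_crossings(samples):
    """Count sign changes, a rough indicator of frequency content."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0.0)

sound_a = shaped_tone(300.0)  # hypothetical stand-in for one vowel
sound_b = shaped_tone(800.0)  # hypothetical stand-in for another
env_a = frame_rms(sound_a)
env_b = frame_rms(sound_b)
max_envelope_gap = max(abs(a - b) for a, b in zip(env_a, env_b))

print("largest envelope difference:", max_envelope_gap)
print("zero crossings:", zero_crossings(sound_a), "vs", zero_crossings(sound_b))
```

The two envelopes come out nearly identical even though the zero-crossing counts show the sounds are quite different, which is the kind of information an envelope-only front end throws away.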


    Nick

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Never use force, just go for a bigger hammer!

    The DIY Digital-Readout for mills, lathes etc.:
    YADRO
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-09-04 15:02
    This really begs an experiment: use the extracted envelope to modulate a constant-frequency source (maybe an "ah" sound to make it seem human), play it in a different room where the original speaker can't be heard, and see if a human listener can understand (or be trained to understand) what was being said. If a human can understand it, maybe a computer can, too; if not, there's probably not a chance for the computer.
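A rough sketch of how the listening test could be prepared in software follows. It imposes an envelope on a constant-frequency tone and packs the result as a WAV file in memory. A plain sine is used here instead of an "ah" sound to keep the example short. The 150 Hz tone, the 8 kHz sample rate, and the two-hump placeholder envelope are assumptions; a real test would use an envelope extracted from recorded speech.

```python
import io
import math
import struct
import wave

def remodulate(envelope, tone_hz, sample_rate_hz):
    """Impose a per-sample envelope on a constant-frequency sine."""
    return [
        level * math.sin(2.0 * math.pi * tone_hz * n / sample_rate_hz)
        for n, level in enumerate(envelope)
    ]

def to_wav_bytes(samples, sample_rate_hz):
    """Pack samples in [-1, 1] as 16-bit mono WAV data."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate_hz)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        wav.writeframes(frames)
    return buffer.getvalue()

SAMPLE_RATE_HZ = 8000  # assumed
TONE_HZ = 150.0        # assumed constant-frequency source

# Placeholder envelope with two "syllable" humps over half a second.
demo_envelope = [
    abs(math.sin(2.0 * math.pi * 2.0 * n / SAMPLE_RATE_HZ))
    for n in range(SAMPLE_RATE_HZ // 2)
]

test_signal = remodulate(demo_envelope, TONE_HZ, SAMPLE_RATE_HZ)
wav_bytes = to_wav_bytes(test_signal, SAMPLE_RATE_HZ)
```

Writing `wav_bytes` to a file gives something a listener can play back without ever hearing the original speaker.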

    One thing we do have to keep in mind — absent single-word commands — is context. Conversationally, the difference between "left" and "lift" for example may be obvious by how it's used. So it's not always necessary to fully recognize individual words, if they can be inferred from their context. I'm sure there are plenty of counter examples to go around, however. Linguists are really clever at producing them.

    It does seem that the envelope method deprives the Prop of necessary vocal cues, though, and I'm afraid I share Nick's concerns about it. But I guess we'll see...

    -Phil
  • potatoheadpotatohead Posts: 10,261
    edited 2009-09-04 15:12
    Wonder if that envelope couldn't be combined with a few frequency domain data points, or perhaps a sum?

    Seems to me, just a little bit more information would open the door considerably here. Something like fundamental and first two formants. This is only a few bytes of additional info. Takes computation though, or does it? Any clever circuits that can do that, or output a wave that is representative of that, added to the modulation one already being used?

    I'm thinking of something like 4 notch filters, one near the male primary frequency, another for female, and another two for the second harmonic formant. All those output a wave that equals the volume present in their passbands, sum two of them, difference the other two, and get another low bandwidth, easily encoded envelope. This would give information that could differentiate those words.

    Seems that computer code does this way better, but I was just musing about the electronics on the front end approach.
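As a sketch of the software side of this idea, the code below measures how much signal power falls near a handful of frequencies in one short frame, using the Goertzel algorithm in place of analog filters. The four band centers are placeholders picked so that they land on exact analysis bins for a 160-sample frame at 8 kHz; they are not tuned to real pitch or formant values, and the sum/difference step described above is left out.

```python
import math

def goertzel_power(frame, freq_hz, sample_rate_hz):
    """Squared magnitude of the frame's content at freq_hz (Goertzel algorithm)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate_hz)
    s_prev = 0.0
    s_prev2 = 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2

SAMPLE_RATE_HZ = 8000                            # assumed
FRAME = 160                                      # 20 ms, 50 Hz bin spacing
BAND_CENTERS_HZ = (150.0, 250.0, 700.0, 1100.0)  # placeholder bands

def band_powers(frame):
    """One power value per band: a few numbers per frame alongside the envelope."""
    return [goertzel_power(frame, f, SAMPLE_RATE_HZ) for f in BAND_CENTERS_HZ]

def tone(freq_hz):
    """Unit-amplitude test tone, one frame long."""
    return [
        math.sin(2.0 * math.pi * freq_hz * n / SAMPLE_RATE_HZ)
        for n in range(FRAME)
    ]

powers_low = band_powers(tone(700.0))
powers_high = band_powers(tone(1100.0))

print("700 Hz tone  ->", [round(p) for p in powers_low])
print("1100 Hz tone ->", [round(p) for p in powers_high])
```

Two tones with identical amplitude light up different bands, so even this small amount of frequency information separates inputs that an envelope alone would treat as the same.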

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Propeller Wiki: Share the coolness!
    Chat in real time with other Propellerheads on IRC #propeller @ freenode.net
    Safety Tip: Life is as good as YOU think it is!

    Post Edited (potatohead) : 9/4/2009 3:29:31 PM GMT
  • heaterheater Posts: 3,370
    edited 2009-09-04 15:49
    Context hmm...

    "Time flies like an arrow.
    Fruit flies like a banana."
    Groucho Marx

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    For me, the past is not over yet.
  • Agent420Agent420 Posts: 439
    edited 2009-09-04 16:09
    I may simulate these circuits in Proteus... It's pretty accurate, and it allows you to specify wav audio files as a source for the simulation, so you can graph or listen to what the result will be, in addition to conventional sweep analysis. It will be interesting to see the various amplitude / frequency responses...

  • MicrocontrolledMicrocontrolled Posts: 2,461
    edited 2009-09-04 16:52
    Nice video! Now we have some progress here!

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Computers are microcontrolled.

    Robots are microcontrolled.
    I am microcontrolled.

    But you can call me micro.

    If it's not Parallax then don't even bother.

    I have changed my avatar so that I will no longer be confused with others who use generic avatars (and I'm more of a Prop head than a BS2 nut, anyway)


