Voice Recognition Analysis

Bob Lawrence (VE1RLL) · 2009-09-03 23:46

Reference:

en.wikipedia.org/wiki/Electronic_filters#Passive_filters

Low Pass: en.wikipedia.org/wiki/Low-pass

High Pass: en.wikipedia.org/wiki/High-pass

Band Pass: en.wikipedia.org/wiki/Band-pass

lonesock · 2009-09-03 23:46

@Phil: I have a sneaking suspicion that for said Beach Boys' song (and every Nirvana song), even listening to an isolated vocal track would have led to the same confusion. [8^)

Jonathan

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
lonesock
Piranha are people too.

shanghai_fool · 2009-09-04 00:03

Phil Pilgrim (PhiPi) said...
What is remarkable, however, is our ability to focus on and follow a single conversation in a crowded room filled with chatter.

This ability, too, seems to diminish with age.

Donald

Bob Lawrence (VE1RLL) · 2009-09-04 00:06

The values I see in the pictures are as follows:

R1 10K

Disc caps × 3, marked 104 = 100 nF (0.1 µF) @ 25 Volts

Electrolytic capacitor 10 µF × 1 @ 25 Volts

Electrolytic capacitor 8330 µF × 1

The rest I can't tell

Post Edited (Bob Lawrence (VE1RLL)) : 9/4/2009 2:14:07 AM GMT

Bob Lawrence (VE1RLL) · 2009-09-04 00:31

Resistor 10 kΩ, 5%

Post Edited (Bob Lawrence (VE1RLL)) : 9/4/2009 12:59:23 AM GMT

Bob Lawrence (VE1RLL) · 2009-09-04 00:32

8330 µF capacitor??? (I can't find such a cap in a Google search.)

Post Edited (Bob Lawrence (VE1RLL)) : 9/4/2009 2:19:31 AM GMT

Bob Lawrence (VE1RLL) · 2009-09-04 00:33

104 (100 nF) cap @ 25 Volts

Bob Lawrence (VE1RLL) · 2009-09-04 00:35

10 µF cap @ 25 Volts

Bob Lawrence (VE1RLL) · 2009-09-04 00:36

Mic

Phil Pilgrim (PhiPi) · 2009-09-04 02:11

Jonathan,

I think you're right!

shanghai_fool,

I fear that you, too, are right. Even following a conversation that doesn't interest me, without any interference, can be a challenge!

-Phil

mallred · 2009-09-04 02:25

I was wrong about the circuit. There are actually two circuits. The one I posted pictures of was only an analog amplifier for the mic.

The second circuit is the envelope extraction filter.

And a video:

http://www.youtube.com/watch?v=th2SPT7zoJs

I think they were uploaded backwards, but you can figure out which is which.

Phil Pilgrim (PhiPi) · 2009-09-04 02:32

Mark,

Now we're talking! This is real progress. Don't you feel better? I know I do!

Thanks,
-Phil

potatohead · 2009-09-04 02:40

:)

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Wiki: Share the coolness!
Chat in real time with other Propellerheads on IRC #propeller @ freenode.net
Safety Tip: Life is as good as YOU think it is!

mallred · 2009-09-04 02:41

Yep, I was able to pull it out this time.

Thanks.

Mark

Bob Lawrence (VE1RLL) · 2009-09-04 02:43

Low Pass Filter - First Order
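As a rough illustration of what a first-order low-pass does, the sketch below computes the corner frequency f_c = 1/(2πRC) and the magnitude response of a single RC section. Pairing R1 (10K) with one of the 104 (100 nF) disc caps is purely hypothetical; the pictures don't show which parts actually form the filter.

```python
import math


def rc_lowpass_cutoff_hz(r_ohms: float, c_farads: float) -> float:
    """-3 dB corner of a first-order RC low-pass: f_c = 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)


def rc_lowpass_gain(f_hz: float, r_ohms: float, c_farads: float) -> float:
    """Magnitude response |H(f)| = 1 / sqrt(1 + (f / f_c)**2)."""
    fc = rc_lowpass_cutoff_hz(r_ohms, c_farads)
    return 1.0 / math.sqrt(1.0 + (f_hz / fc) ** 2)


# Hypothetical pairing (assumption): R1 = 10K with one 104 (100 nF) disc cap.
R1_OHMS = 10e3
C_104_FARADS = 100e-9

fc = rc_lowpass_cutoff_hz(R1_OHMS, C_104_FARADS)
print(f"f_c = {fc:.1f} Hz")  # f_c = 159.2 Hz
for f in (50.0, 159.2, 1000.0, 3000.0):
    print(f"{f:7.1f} Hz -> gain {rc_lowpass_gain(f, R1_OHMS, C_104_FARADS):.3f}")
```

Well above f_c the gain falls by roughly a factor of 10 per decade of frequency (-20 dB/decade), so cascading two sections gives a noticeably steeper roll-off than one.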

Bob Lawrence (VE1RLL) · 2009-09-04 02:55

@mallred

Thanks for the schematic!

@Phil

We were getting close

Bob Lawrence (VE1RLL) · 2009-09-04 02:59

CD4066 - Data Sheet:

www.datasheetcatalog.org/datasheets/270/109221_DS.pdf

ADC0820 Data Sheet:

8-Bit, high-speed, µP-compatible A/D converter
with track/hold function

www.datasheetcatalog.org/datasheet/philips/ADC0820CNED.pdf

LM386 - Data Sheet
Low Voltage Audio Power Amplifier:

www.national.com/ds/LM/LM386.pdf

MPSA05 - Data Sheet:
NPN General Purpose Amplifier

www.fairchildsemi.com/ds/MP/MPSA05.pdf

Electret Condenser Mic 47 dB - Data Sheet:
media.digikey.com/pdf/Data%20Sheets/Horn%20Industrial%20PDFs/EM3015S-47-G.pdf

Post Edited (Bob Lawrence (VE1RLL)) : 9/4/2009 3:53:25 AM GMT

jazzed · 2009-09-04 03:09

Mark, the preamp looks very familiar :) I was wondering how the A/D would be done. The minimalist sigma-delta method often used with two Propeller pins is OK, but its range and conversion rate are limited; better to use a COG for something more interesting. There may still be distractions here, but I think you have come a long way toward repairing some damage with this thread. I look forward to a demo.

BTW: I tried that Forth-based (uggh) pseudo-AI thingy. It's not very intelligent :) It seems to just spew phrases built from the words given to it, plus some modifiers based on language rules. Surely someone can do better than that... uggh :)

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve

Propeller Tools
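For anyone wondering why the two-pin sigma-delta trades range against conversion rate, here is an idealized first-order sigma-delta model. It is a sketch of the general technique, not the Propeller counter code: the input is assumed normalized to 0..1, and the 80 MHz figure is only a typical Propeller system clock used as an example. Each extra bit of resolution doubles the clocks needed per sample, which halves the sample rate.

```python
def sigma_delta_convert(x: float, n_clocks: int) -> int:
    """Idealized first-order sigma-delta ADC.

    x is the analog input normalized to [0, 1]. Returns the number of
    1-bits the comparator produced over n_clocks; ones / n_clocks ~= x.
    """
    integrator = 0.0
    ones = 0
    for _ in range(n_clocks):
        bit = 1 if integrator >= 0.0 else 0  # comparator on the integrator output
        ones += bit
        integrator += x - bit                # integrate input minus 1-bit feedback
    return ones


def samples_per_second(clock_hz: float, bits: int) -> float:
    """Sample rate when each sample needs 2**bits modulator clocks."""
    return clock_hz / (2 ** bits)


if __name__ == "__main__":
    for x in (0.25, 0.5, 0.75):
        print(x, sigma_delta_convert(x, 256))  # 64, 128, 192 counts
    clock = 80e6  # assumed system clock, for illustration only
    for bits in (8, 12):
        print(bits, "bits:", samples_per_second(clock, bits), "samples/s")
```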

Phil Pilgrim (PhiPi) · 2009-09-04 03:13

The CD4066 is just a way of switching between two mics. The transistor circuit looks like an envelope detector. The base is biased to 0.6 V by the forward-biased diode, so the transistor will only conduct on positive excursions of the input. These are filtered by the two-stage lowpass filter going into the ADC. I'm not sure yet what the effects of the emitter resistor or feedback cap might be, except maybe to linearize the output. I'm guessing that all the formant info is being discarded and that only the amplitude profile is being used. (Beau would like this!)

If my interpretation is correct, it also clears up what's meant by "carrier" and "modulation" in this context.

It's a lot of circuitry, though, to throw at a problem that the Prop could solve on its own through software. But, hey, if it works, it works.

-Phil

Post Edited (Phil Pilgrim (PhiPi)) : 9/4/2009 3:19:49 AM GMT
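Phil's reading of the hardware (conduct only on positive excursions, then a two-stage lowpass) maps onto a few lines of software. The sketch below is a minimal model under that assumption: half-wave rectification followed by two cascaded one-pole low-pass filters. The 50 Hz smoothing corner is an arbitrary illustrative choice, not a value from Dr. Jim's circuit, and the emitter resistor and feedback cap are not modeled.

```python
import math


def one_pole_lowpass(samples, fs_hz, fc_hz):
    """Discrete first-order (RC-like) low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * fc_hz / fs_hz)
    y = 0.0
    out = []
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out


def envelope(samples, fs_hz, fc_hz=50.0):
    """Half-wave rectify, then smooth with two cascaded one-pole low-pass stages."""
    rectified = [max(0.0, s) for s in samples]  # keep only positive excursions
    stage1 = one_pole_lowpass(rectified, fs_hz, fc_hz)
    return one_pole_lowpass(stage1, fs_hz, fc_hz)


if __name__ == "__main__":
    fs = 8000
    # 0.5 s of a 1 kHz tone followed by 0.1 s of silence
    tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs // 2)]
    env = envelope(tone + [0.0] * (fs // 10), fs)
    print(f"envelope during tone: {env[fs // 2 - 1]:.3f}")
    print(f"envelope after silence: {env[-1]:.6f}")
```

Because only the smoothed amplitude survives, two words with the same loudness contour but different vowels come out nearly identical, which is exactly the concern about discarded formant info.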

SRLM · 2009-09-04 03:20

PhiPi said...
It's a lot of circuitry, though, to throw at a problem that the Prop could solve on its own through software. But, hey, if it works, it works.

I have no idea how complex the equivalent software would be, but it could be that Dr. Jim wants to conserve cog resources (time and memory) and so does what he can in hardware.

mallred · 2009-09-04 03:21

Dr. Jim says there is an active low-pass filter and a clipper that extract the modulation envelope, followed by a passive pi network that removes the carrier. After the pi network there is a high-speed A-to-D converter, which has to be able to sustain a 1.5 µs conversion time.
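To put the 1.5 µs figure in perspective, here is the back-of-the-envelope arithmetic. The 80 MHz system clock is an assumption (a common Propeller setting), 32 KB is the Propeller's hub RAM, and one byte per sample follows from the ADC0820 being an 8-bit part. Whether samples are actually taken continuously at the full rate, or all stored, isn't stated.

```python
CONVERSION_TIME_S = 1.5e-6   # per-sample conversion time quoted in the post
SYSTEM_CLOCK_HZ = 80e6       # assumed Propeller system clock
HUB_RAM_BYTES = 32 * 1024    # Propeller hub RAM


def max_sample_rate_hz(conversion_time_s):
    """Upper bound on samples per second if conversions run back to back."""
    return 1.0 / conversion_time_s


def clocks_per_sample(conversion_time_s, clock_hz):
    """System clocks available to handle each sample."""
    return conversion_time_s * clock_hz


def seconds_to_fill(bytes_available, sample_rate_hz, bytes_per_sample=1):
    """How long until a buffer fills if every sample is stored."""
    return bytes_available / (sample_rate_hz * bytes_per_sample)


rate = max_sample_rate_hz(CONVERSION_TIME_S)
print(f"{rate:,.0f} samples/s")  # 666,667 samples/s
print(f"{clocks_per_sample(CONVERSION_TIME_S, SYSTEM_CLOCK_HZ):.0f} clocks per sample")  # 120
print(f"{seconds_to_fill(HUB_RAM_BYTES, rate) * 1e3:.1f} ms to fill hub RAM")  # 49.2 ms
```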

mallred · 2009-09-04 03:25

@Phil

Dr. Jim commented to me that the Prop could do it, but there was some kind of problem, and he had to resolve it by building this circuit outside of the Prop. He has his reasons for doing this, I suppose.

Mark

Phil Pilgrim (PhiPi) · 2009-09-04 03:48

Mark,

That's fine, and I respect his decision to proceed the way he has. Please tell him that if he could divulge the problem he was having using the Propeller alone, I'd be happy to help him work out a solution, if one is possible. But I prefer the openness of the forum to private communication, so that many can benefit from the discussion.

-Phil

mpark · 2009-09-04 06:32

Mark, thanks for posting the video.

I believe Phil is correct in his interpretation of "carrier" and "modulation." Dr. Jim's method apparently discards not just the carrier but also formant information. If this approach works, I'll be very surprised. (I do like surprises, though!)

Nick Mueller · 2009-09-04 07:59

Re the discussion of "carrier" and "envelope":
What we finally saw is exactly what Dr. Jim described and how I interpreted it. As has been discussed in a different thread, this method will fail.
At the very least, it can't distinguish vowels.

As an example:
"beer" vs. "bar" (OK, it somehow *is* related)
"left" vs. "lift" (my robot always turns left when he has to lift his left arm)
"step" vs. "stop" (whenever I say stop, he takes an extra step)

And all that not with just a single trained speaker, but with many different speakers. Over a vocabulary of, let's say, 100 words, the analyzed data will give a low probability of correct recognition.

Nick

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Never use force, just go for a bigger hammer!

The DIY Digital-Readout for mills, lathes etc.:
YADRO

Phil Pilgrim (PhiPi) · 2009-09-04 15:02

This really begs an experiment: use the extracted envelope to modulate a constant-frequency source (maybe an "ah" sound to make it seem human), play it in a different room where the original speaker can't be heard, and see if a human listener can understand (or be trained to understand) what was being said. If a human can understand it, maybe a computer can, too; if not, there's probably not a chance for the computer.

One thing we do have to keep in mind, absent single-word commands, is context. Conversationally, the difference between "left" and "lift", for example, may be obvious from how the word is used. So it's not always necessary to fully recognize individual words if they can be inferred from their context. I'm sure there are plenty of counterexamples to go around, however. Linguists are really clever at producing them.

It does seem that the envelope method deprives the Prop of necessary vocal cues, though, and I'm afraid I share Nick's concerns about it. But I guess we'll see...

-Phil
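Phil's listening experiment is easy to set up in software. The sketch below assumes the envelope is already available as a list of samples at rate fs_hz, multiplies it onto a fixed 120 Hz sine (an arbitrary stand-in for voice pitch; a plain sine won't sound like "ah", which would also need formant structure), and writes a 16-bit mono WAV file with Python's standard wave module so it can be played to a listener.

```python
import math
import struct
import wave


def modulate_envelope(env, fs_hz, carrier_hz=120.0):
    """Impose an amplitude envelope on a fixed-frequency sine carrier."""
    return [e * math.sin(2.0 * math.pi * carrier_hz * n / fs_hz)
            for n, e in enumerate(env)]


def write_wav_mono16(path, samples, fs_hz):
    """Normalize to full scale and write 16-bit mono PCM with the stdlib wave module."""
    peak = max((abs(s) for s in samples), default=0.0) or 1.0
    frames = b"".join(struct.pack("<h", int(32767 * s / peak)) for s in samples)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16-bit samples
        w.setframerate(int(fs_hz))
        w.writeframes(frames)
```

Usage would be along the lines of `write_wav_mono16("test.wav", modulate_envelope(env, fs), fs)`, where `env` comes from whatever envelope extraction is being evaluated and the file name is just an example.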

potatohead · 2009-09-04 15:12

I wonder if that envelope couldn't be combined with a few frequency-domain data points, or perhaps a sum?

Seems to me that just a little more information would open the door considerably here. Something like the fundamental and the first two formants. This is only a few bytes of additional info. It takes computation, though, or does it? Are there any clever circuits that can do that, or output a wave that is representative of that, added to the modulation one already being used?

I'm thinking of something like 4 notch filters: one near the male primary frequency, another for female, and another two for the second harmonic formant. Each outputs a wave that equals the volume present in its passband; sum two of them, difference the other two, and you get another low-bandwidth, easily encoded envelope. This would give information that could differentiate those words.

Seems that computer code does this way better, but I was just musing about the electronics on the front end approach.


Post Edited (potatohead) : 9/4/2009 3:29:31 PM GMT
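potatohead's filter-bank idea can be approximated in software. The sketch below swaps in the Goertzel algorithm (a single-bin DFT) for the analog filters, so each "filter" is a narrow band roughly fs/N wide rather than a true analog passband. The four center frequencies (120 and 220 Hz for male and female pitch, 700 and 1200 Hz for the formant region) are hypothetical placeholders, not values from this thread. As suggested, two band powers are summed and the other two differenced, giving two low-bandwidth features per frame.

```python
import math


def goertzel_power(frame, fs_hz, target_hz):
    """Power of the DFT bin nearest target_hz, via the Goertzel recurrence."""
    n = len(frame)
    k = round(n * target_hz / fs_hz)  # nearest bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


# Hypothetical band centers: male pitch, female pitch, two formant-region bands.
BANDS_HZ = (120.0, 220.0, 700.0, 1200.0)


def band_features(frame, fs_hz, bands=BANDS_HZ):
    """Sum the first two band powers and difference the last two."""
    p = [goertzel_power(frame, fs_hz, f) for f in bands]
    return p[0] + p[1], p[2] - p[3]
```

Run once per short frame (say 200 samples at 8 kHz), this yields two slowly varying numbers that could ride alongside the existing amplitude envelope.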

heater · 2009-09-04 15:49

Context hmm...

"Time flies like an arrow.
Fruit flies like a banana."
Groucho Marx

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

Agent420 · 2009-09-04 16:09

I may simulate these circuits in Proteus... It's pretty accurate, and it allows you to specify WAV audio files as a source for the simulation, so you can graph or listen to what the result will be, in addition to conventional sweep analysis. It will be interesting to see the various amplitude/frequency responses...


Microcontrolled · 2009-09-04 16:52

Nice video! Now we have some progress here!

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Computers are microcontrolled.

Robots are microcontrolled.
I am microcontrolled.

But you can call me micro.

If it's not Parallax then don't even bother.

I have changed my avatar so that I will no longer be confused with others who use generic avatars (and I'm more of a Prop head than a BS2 nut, anyway)
