@Phil: I have a sneaking suspicion that for said Beach Boys song (and every Nirvana song), even listening to an isolated vocal track would have led to the same confusion. [8^)
Jonathan
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
lonesock
Piranha are people too.
Phil Pilgrim (PhiPi) said...
... What is remarkable, however, is our ability to focus on and follow a single conversation in a crowded room filled with chatter.
Mark, the preamp looks very familiar :) I was wondering how the A/D would be done. The minimalist sigma-delta method often used with 2 Propeller pins is OK, but the range and conversion rate are limited ... better to use a COG for something more interesting. There may still be distractions here, but I think you have come a long way toward repairing some damage with this thread. I look forward to a demo.
BTW: I tried that Forth (uggh) based pseudo-AI thingy. It's not very intelligent :) It seems to just spew phrases built from the words given to it, plus some modifiers based on language rules. Surely someone can do better than that ... uggh :)
The CD4066 is just a way of switching between two mics. The transistor circuit looks like an envelope detector. The base is biased to 0.6V by the forward-biased diode, so the transistor will only conduct on positive excursions of the input. These are filtered by the two-stage lowpass filter going into the ADC. I'm not sure yet what the effects of the emitter resistor or feedback cap might be, except maybe to linearize the output. I'm guessing that all the formant info is being discarded and that only the amplitude profile is being used. (Beau would like this!)
If my interpretation is correct, it also clears up what's meant by "carrier" and "modulation" in this context.
It's a lot of circuitry, though, to throw at a problem that the Prop could solve on its own through software. But, hey, if it works, it works.
-Phil
Post Edited (Phil Pilgrim (PhiPi)) : 9/4/2009 3:19:49 AM GMT
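For anyone who wants to try the software route mentioned above, here is a minimal sketch of the envelope-detector reading of the circuit: half-wave rectification (conduct only on positive excursions) followed by two cascaded one-pole low-pass stages. The 50 Hz cutoff, the 8 kHz sample rate, and the function names are assumptions for illustration, not values taken from the schematic.

```python
import math

def envelope(samples, sample_rate, cutoff_hz=50.0):
    """Half-wave rectify, then smooth with two cascaded one-pole low-pass stages."""
    # Smoothing coefficient for the chosen cutoff (assumed value, not from the schematic).
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    stage1 = stage2 = 0.0
    out = []
    for x in samples:
        rectified = x if x > 0.0 else 0.0   # pass positive excursions only
        stage1 += alpha * (rectified - stage1)
        stage2 += alpha * (stage1 - stage2)
        out.append(stage2)
    return out

def tone(freq_hz, sample_rate, n, amplitude):
    """Synthetic test input: a steady sine wave."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

fs = 8000
loud = envelope(tone(300.0, fs, fs, 1.0), fs)
soft = envelope(tone(300.0, fs, fs, 0.25), fs)
print(round(loud[-1], 3), round(soft[-1], 3))
```

The output tracks loudness only: a steady tone settles near amplitude/pi whatever its frequency content, which is consistent with the guess that formant information is discarded.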
PhiPi said...
It's a lot of circuitry, though, to throw at a problem that the Prop could solve on its own through software. But, hey, if it works, it works.
I have no idea how complex the equivalent software would be, but it could be that Dr. Jim wants to conserve cog resources (time and memory) and so does what he can in hardware.
Dr. Jim says there is an active low-pass filter and a clipper that extracts the modulation envelope, followed by a passive Pi network that removes the carrier. A high-speed A/D converter follows the Pi network; it has to sustain a conversion time of 1.5 microseconds.
Dr. Jim commented to me that the Prop could do it, but there was some kind of problem and he had to resolve it by building this circuit outside of the Prop. He has his reasons for doing this, I suppose.
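To put the 1.5 microsecond figure in perspective, the arithmetic below converts it to a sample rate and to system clocks per sample. The 80 MHz clock is an assumption (a common Propeller setting); it is not stated anywhere in the thread, and the function names are only illustrative.

```python
def conversions_per_second(conversion_time_s):
    """Upper bound on the sample rate if conversions run back to back."""
    return 1.0 / conversion_time_s

def clocks_per_conversion(conversion_time_s, clock_hz):
    """System clocks that elapse between successive samples."""
    return conversion_time_s * clock_hz

CONVERSION_TIME_S = 1.5e-6   # figure quoted in the thread
ASSUMED_CLOCK_HZ = 80e6      # assumption: common Propeller clock, not from the thread

rate = conversions_per_second(CONVERSION_TIME_S)
clocks = clocks_per_conversion(CONVERSION_TIME_S, ASSUMED_CLOCK_HZ)
print(f"{rate / 1e3:.1f} kS/s, about {clocks:.0f} clocks per sample")
```

Under that assumption a cog would have only a few dozen instructions per sample, which may be part of why the work was pushed into hardware, though that is a guess.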
That's fine, and I respect his decision to proceed the way he has. Please tell him that if he could divulge the problem he was having using the Propeller alone, I'd be happy to help him work out a solution, if one is possible. But I prefer the openness of the forum to private communication, so that many can benefit from the discussion.
I believe Phil is correct in his interpretation of "carrier" and "modulation." Dr. Jim's method apparently discards not just the carrier but also formant information. If this approach works, I'll be very surprised. (I do like surprises, though!)
Re: the discussion of "carrier" and "envelope".
What we finally saw is exactly what Dr. Jim described and how I interpreted it. This method will fail, as has been discussed in a different thread.
At the very least, it can't distinguish vowels.
As an example:
"beer" vs. "bar" (OK, it somehow *is* related)
"left" vs. "lift" (my robot always turns left if he has to lift his left arm).
"step" vs. "stop" (whenever I say stop, he makes an extra step)
And all that not with just a single trained speaker, but with many different speakers. The overall analyzed data for some (let's say 100) words will result in a low probability of correct recognition.
Nick
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Never use force, just go for a bigger hammer!
The DIY Digital-Readout for mills, lathes etc.: YADRO
This really begs an experiment: use the extracted envelope to modulate a constant-frequency source (maybe an "ah" sound to make it seem human), play it in a different room where the original speaker can't be heard, and see if a human listener can understand (or be trained to understand) what was being said. If a human can understand it, maybe a computer can, too; if not, there's probably not a chance for the computer.
One thing we do have to keep in mind — absent single-word commands — is context. Conversationally, the difference between "left" and "lift" for example may be obvious by how it's used. So it's not always necessary to fully recognize individual words, if they can be inferred from their context. I'm sure there are plenty of counter examples to go around, however. Linguists are really clever at producing them.
It does seem that the envelope method deprives the Prop of necessary vocal cues, though, and I'm afraid I share Nick's concerns about it. But I guess we'll see...
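The listening experiment proposed above can be sketched in a few lines: multiply a constant-frequency source by the extracted envelope and save the result for playback. This is only a sketch under stated assumptions: a plain sine stands in for the suggested "ah" sound, the envelope here is synthetic (two raised-cosine "syllables") and not real speech, and the function names and the 150 Hz carrier are made up for illustration.

```python
import math
import struct
import wave

def remodulate(envelope, sample_rate, carrier_hz=150.0):
    """Multiply a fixed-frequency sine carrier by the envelope, sample by sample."""
    return [e * math.sin(2.0 * math.pi * carrier_hz * i / sample_rate)
            for i, e in enumerate(envelope)]

def write_wav(path, samples, sample_rate):
    """Save samples as 16-bit mono PCM, normalized to the peak, for listening tests."""
    peak = max(1e-9, max(abs(s) for s in samples))
    frames = b"".join(struct.pack("<h", int(32767 * s / peak)) for s in samples)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(frames)

# Stand-in envelope: two raised-cosine bumps separated by silence (synthetic data).
fs = 8000
syllable = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / 1600) for i in range(1600)]
env = syllable + [0.0] * 800 + syllable
audio = remodulate(env, fs)
# write_wav("envelope_only.wav", audio, fs)  # uncomment to produce a file to play back
```

Feeding in an envelope captured from real speech, then playing the file to someone who never heard the original, would be the actual test.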
Wonder if that envelope couldn't be combined with a few frequency domain data points, or perhaps a sum?
Seems to me, just a little bit more information would open the door considerably here. Something like the fundamental and first two formants. This is only a few bytes of additional info. Takes computation though, or does it? Are there any clever circuits that can do that, or output a wave that is representative of that, added to the modulation one already being used?
I'm thinking of something like 4 notch filters, one near the male primary frequency, another for female, and another two for the second harmonic formant. All those output a wave that equals the volume present in their passbands, sum two of them, difference the other two, and get another low bandwidth, easily encoded envelope. This would give information that could differentiate those words.
Seems that computer code does this way better, but I was just musing about the electronics on the front end approach.
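On the software side of that musing, one inexpensive way to get a few frequency-domain data points is the Goertzel recurrence, which measures signal power at a single chosen frequency without a full FFT. The sketch below reads the proposed filters as narrow band-pass filters (the post speaks of passbands). The four band centers and the two synthetic "vowels" are rough illustrative values, not measurements, and the function names are assumptions.

```python
import math

def goertzel_power(samples, sample_rate, freq_hz):
    """Power of `samples` at freq_hz, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2

def band_features(samples, sample_rate, bands_hz):
    """One power value per band: a few numbers to store alongside the envelope."""
    return [goertzel_power(samples, sample_rate, f) for f in bands_hz]

def two_tone(f1, f2, sample_rate, n):
    """Crude synthetic vowel: two equal-amplitude sines at the given frequencies."""
    return [math.sin(2.0 * math.pi * f1 * i / sample_rate) +
            math.sin(2.0 * math.pi * f2 * i / sample_rate) for i in range(n)]

fs, frame = 8000, 400                      # 50 ms frame (assumed)
bands = [300.0, 700.0, 1200.0, 2300.0]     # illustrative band centers
ah_like = band_features(two_tone(700.0, 1200.0, fs, frame), fs, bands)
ee_like = band_features(two_tone(300.0, 2300.0, fs, frame), fs, bands)
print([round(p) for p in ah_like])
print([round(p) for p in ee_like])
```

The two test signals have the same overall level but light up different bands, which is the kind of extra information that could separate "left" from "lift".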
I may simulate these circuits in Proteus... It's pretty accurate, and it allows you to specify WAV audio files as a source for the simulation, so you can graph or listen to what the result will be, in addition to conventional sweep analysis. It will be interesting to see the various amplitude/frequency responses...
Comments
en.wikipedia.org/wiki/Electronic_filters#Passive_filters
Low Pass: en.wikipedia.org/wiki/Low-pass
High Pass: en.wikipedia.org/wiki/High-pass
Band Pass: en.wikipedia.org/wiki/Band-pass
Jonathan
Donald
R1: 10K
Disc caps x 3: marked 104 = 100 nF (0.1 µF), 25 V
Electrolytic capacitor x 1: 10 µF, 25 V
Electrolytic capacitor x 1: 8330 µF
I can't make out the rest.
Post Edited (Bob Lawrence (VE1RLL)) : 9/4/2009 2:14:07 AM GMT
I think you're right!
shanghai_fool,
I fear that you, too, are right. Even following a conversation that doesn't interest me, without any interference, can be a challenge!
-Phil
The second circuit is the envelope extraction filter.
And video.
http://www.youtube.com/watch?v=th2SPT7zoJs
I think they uploaded backwards, but you can figure out which is which.
Now we're talking! This is real progress. Don't you feel better? I know I do!
Thanks,
-Phil
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Wiki: Share the coolness!
Chat in real time with other Propellerheads on IRC #propeller @ freenode.net
Safety Tip: Life is as good as YOU think it is!
Thanks.
Mark
Thanks for the schematic!
@Phil
We were getting close
www.datasheetcatalog.org/datasheets/270/109221_DS.pdf
ADC0820 Data Sheet:
8-Bit, high-speed, µP-compatible A/D converter
with track/hold function
www.datasheetcatalog.org/datasheet/philips/ADC0820CNED.pdf
LM386 - Data Sheet
Low Voltage Audio Power Amplifier:
www.national.com/ds/LM/LM386.pdf
MPSA05 - Data Sheet:
NPN General Purpose Amplifier
www.fairchildsemi.com/ds/MP/MPSA05.pdf
Electret Condenser Mic 47 dB - Data Sheet:
media.digikey.com/pdf/Data%20Sheets/Horn%20Industrial%20PDFs/EM3015S-47-G.pdf
Post Edited (Bob Lawrence (VE1RLL)) : 9/4/2009 3:53:25 AM GMT
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve
Propeller Tools
Post Edited (potatohead) : 9/4/2009 3:29:31 PM GMT
"Time flies like an arrow.
Fruit flies like a banana."
Groucho Marx
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Computers are microcontrolled.
Robots are microcontrolled.
I am microcontrolled.
But you can call me micro.
If it's not Parallax then don't even bother.
I have changed my avatar so that I will no longer be confused with others who use generic avatars (and I'm more of a Prop head than a BS2 nut, anyway)