Goertzel-based speech "recognizer" (now with source code)



  • NikosG Posts: 657
    edited 2011-03-11 - 11:33:57
    Hello everybody,

    This speech recognizer is amazing! I discovered this thread by chance last week. Phil's applications are always amazing, and this one is beyond imagination! I successfully ran the code posted above by PhiPi on my Propeller Demo Board, and at times the recognition accuracy was almost 100%!
    I was so excited that I decided to add a microphone to my Propeller Proto Board. My Proto Board doesn't have a TV-OUT, so I converted the code by adding “serial terminal” output commands. The new code runs properly on the Propeller Demo Board (file: goertzel_speech_with_serial_terminal.spin), and I was also able to test it on the Proto Board, which has no TV-OUT.
    My next step was to build the microphone circuit on the Propeller Proto Board. I used the mic circuit from the Demo Board (this circuit is also given above by “Closo99”).
    Here are some photos of my effort.
    Unfortunately, I didn't get the result I expected. The system behaves differently from the Propeller Demo Board. When I run the code, the system quickly accepts input words without my saying anything (I suppose it is very sensitive). When I increase the THRESHOLD constant from 500 to 1500, the system behaves almost normally: it accepts the input words (I have to speak loudly), but there is no accuracy. It is not able to recognize anything. What is going wrong?
  • localroger Posts: 3,196
    edited 2011-03-11 - 11:44:17
    Nikos, the mic circuit relies on the Propeller's ability to do delta-sigma ADC, but that only works if the wires between the resistors and capacitors are all kept very short; it doesn't work on a breadboard at all and is even iffy with a 40-pin DIP Propeller. It does work with the Proto Board if you solder the components close up to the chip breakout pads.
  • Phil Pilgrim (PhiPi) Posts: 22,275
    edited 2011-03-11 - 11:44:48
    For effective sigma-delta analog input, the passive components have to be located very close to the Propeller pins they connect to. Surface mount components, due to their diminutive size, are the best. Trying to accomplish it with a solderless breadboard will be an exercise in frustration, as you've discovered. Get some physically smaller components, solder them directly to the proto board close to the Prop, and you will get better results.

    “Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away.” -Antoine de Saint-Exupery
  • NikosG Posts: 657
    edited 2011-03-16 - 16:56:28
    Thank you localroger and Phil,
    Although I don't understand what “delta sigma” is, I am convinced that I must build a smaller circuit closer to the Prop. But I also found something else. In the document “spin tip, A/D and D/A for the Classroom” there is a microphone schematic from Andy's Prototyping Board that is slightly different from the microphone schematic of the Propeller Demo Board.
    Andy doesn't use the second capacitor. Also, looking at this schematic, I see that there are two different Vdd supplies, one 5V and the other 3.3V. Which Vdd is appropriate when building the mic circuit on the Proto Board?
  • Mike Green Posts: 22,891
    edited 2011-03-16 - 17:08:40
    The Vdd connected to the 1nF capacitor should be 3.3V. It's used to balance the common point (of P8, the 100K resistor, the two 1nF capacitors, and the input point of the ADC) at the midpoint voltage of the Propeller's 3.3V power supply. The power source attached to the 10K resistor and the microphone is used to power the microphone. Using 5V will get you a higher voltage signal unless the microphone is made for a 3.3V supply. Check the microphone's datasheet.

    Andy's circuit is not a sigma-delta ADC. It uses only one I/O pin and measures the voltage at the microphone using an RCTIME-like action where the Prop discharges the capacitor, then monitors the voltage on the capacitor while it charges up at a fixed rate through the 100K resistor. The amount of time it takes for the capacitor to charge to some fixed threshold depends on the voltage output of the microphone.

    There's a description of sigma-delta ADCs on Wikipedia.
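For readers who, like NikosG, are unsure what "sigma-delta" means, here is a minimal Python sketch of a generic first-order sigma-delta loop. It is a textbook numerical model, not the Propeller's counter hardware or the Demo Board circuit; the function name, the 3.3V default, and the cycle count are all assumptions made for illustration. The idea it shows: a 1-bit feedback signal is switched so that its average tracks the input, so counting how often the feedback is high over a fixed window gives a number proportional to the input voltage.

```python
def sigma_delta_count(vin, n_cycles=1000, vdd=3.3):
    """Count the cycles (out of n_cycles) in which the 1-bit feedback is high.

    Hypothetical model: 'integrator' stands in for the charge on the
    capacitors at the ADC input node; 0.0 stands in for the pin threshold.
    """
    integrator = 0.0
    ones = 0
    for _ in range(n_cycles):
        # 1-bit quantizer: compare the node against the threshold.
        bit = 1 if integrator >= 0.0 else 0
        # Feedback pulls the node back toward the threshold.
        feedback = vdd if bit else 0.0
        integrator += vin - feedback
        ones += bit
    return ones


if __name__ == "__main__":
    for volts in (0.5, 1.65, 2.5):
        count = sigma_delta_count(volts)
        print(f"vin={volts:.2f} V -> {count} high cycles out of 1000")
```

In this model the count divided by the number of cycles approximates vin / vdd. On real hardware the same balance depends on the feedback path being short and clean, which is consistent with the advice above to keep the passive components close to the pins.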
  • william chan Posts: 1,315
    edited 2011-05-09 - 17:18:07
    Hi guys,

    I use my own custom PCB, and the circuit is almost identical to the Demo Board's circuit except that I used a 3.3V switching boost regulator.
    I use the D40 chip, but with SMT resistors and capacitors located very close to the Propeller pins.

    I added a 100 ohm / 10uF RC power supply filter ahead of the 7.5k resistor that biases the mic.
    I added this RC filter to reduce noise as much as possible.
    I now have a biasing voltage of 1.5V across the mic.
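As a rough check on what a 100 ohm / 10uF filter can do, the following Python sketch computes the corner frequency of an ideal first-order RC low-pass and its attenuation at a few frequencies. It is only an estimate under stated assumptions: it ignores the loading of the 7.5k bias resistor and the mic, the function names are made up for this example, and the 100 kHz figure is a placeholder because the regulator's switching frequency is not given in the thread.

```python
import math


def rc_cutoff_hz(r_ohms, c_farads):
    """-3 dB corner frequency of an ideal first-order RC low-pass."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)


def rc_attenuation_db(f_hz, r_ohms, c_farads):
    """Gain in dB (negative means attenuation) of the ideal filter at f_hz."""
    fc = rc_cutoff_hz(r_ohms, c_farads)
    return -10.0 * math.log10(1.0 + (f_hz / fc) ** 2)


if __name__ == "__main__":
    R = 100.0   # ohms, from the post
    C = 10e-6   # farads, from the post
    print(f"corner frequency: {rc_cutoff_hz(R, C):.1f} Hz")
    # 100 kHz is an assumed placeholder for a switching frequency.
    for f in (1_000.0, 100_000.0):
        print(f"{f:>9.0f} Hz: {rc_attenuation_db(f, R, C):.1f} dB")
```

The corner frequency works out to roughly 159 Hz, so an ideal filter of this size would attenuate high-frequency ripple on the bias supply considerably. Noise can still reach the ADC by other routes (ground, or the supply of the Propeller itself), so this estimate does not rule the regulator out.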

    1. My problem is, I need to increase the Threshold constant from the default of 500 to about 5000 for it to stabilize.
    Otherwise, it keeps detecting voices even when nobody is talking. (the room is fairly quiet)
    Does this mean that the boost regulator is introducing too much noise into the mic circuit? (even with the RC filter?)

    2. What is the easiest way to get the individual power levels of any of the 8 sample frequencies?

    There is no such thing as bad news.
  • Phil Pilgrim (PhiPi) Posts: 22,275
    edited 2011-05-09 - 18:36:17

    1. You might be getting regulator hash in the mic circuit. The S2, which has a microphone, uses a switching regulator, but it's for the +5V supply. From there, it goes through an inductive filter to a linear 3.3V regulator, which powers the microphone. There's also a preamp, but that's mostly there as a lowpass filter and to drive the ribbon cable to the main board. Here's the schematic:


    2. It's all there in the Goertzel object:
    PUB start(inp_pin, fb_pin, freq_count, freq_addr, count_addr, sampling_rate, goertzel_rate, goertzel_n) | i
    '' Sets up the Goertzel analyzer:
    ''   inp_pin is the sigma-delta audio input.
    ''   fb_pin is the sigma-delta feedback pin.
    ''   freq_count is the number of frequencies being analyzed.
    ''   freq_addr points to an array of longs containing the frequencies being analyzed.
    ''     During operation this array will be continuously refreshed with the Goertzel power coefficients
    ''     for the frequencies selected.
    ''   count_addr points to a long which will be incremented after each result is posted at freq_addr.
    ''     This can be used to synchronize the reading of results.
    ''   sampling_rate is the frequency (Hz) at which the ADC is sampled.
    ''   goertzel_rate is the number of times per second to report results.
    ''   goertzel_n is the number of samples required to obtain each result. The higher this number is, the
    ''     narrower the passband of the consequent filters.
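
To make the "power coefficient" and the role of goertzel_n more concrete, here is a floating-point Python sketch of the textbook Goertzel recurrence for a single frequency. It is not the Spin/PASM code from the Goertzel object, which is not shown in this thread; the function names and the 8000 Hz / 200-sample test values are assumptions chosen for the example.

```python
import math


def goertzel_power(samples, target_hz, sampling_rate):
    """Squared magnitude of the signal at target_hz (textbook Goertzel)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * target_hz / sampling_rate)
    s_prev = 0.0   # s[n-1]
    s_prev2 = 0.0  # s[n-2]
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def bin_width_hz(sampling_rate, n):
    """Approximate passband width: more samples per result, narrower filter."""
    return sampling_rate / n


if __name__ == "__main__":
    rate, n = 8000, 200  # assumed values for illustration
    tone = [math.sin(2.0 * math.pi * 1000.0 * i / rate) for i in range(n)]
    print("power at 1000 Hz:", goertzel_power(tone, 1000.0, rate))
    print("power at 2000 Hz:", goertzel_power(tone, 2000.0, rate))
    print("bin width (Hz):", bin_width_hz(rate, n))
```

A tone at the analyzed frequency produces a large power value, while a tone elsewhere produces a value near zero. Doubling the number of samples halves the bin width, which matches the note above that a higher goertzel_n narrows the passband.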

  • william chan Posts: 1,315
    edited 2011-05-09 - 19:28:42
    Thanks for your kind reply.
    I will do more tests on the noise problem.
  • william chan Posts: 1,315
    edited 2011-05-10 - 15:54:42
    Hi Phil,

    I managed to get the Threshold level down to 1000 by adjusting the 8 frequencies.
    I got about 90% accuracy, but false acceptance is also high, about 40% (recognizing common music and ambient noise as a word).
    In your opinion, what is the easiest way to reduce false acceptance?

    Is it to
    1. Increase the number of sampled frequencies
    2. Increase the samples per second
    3. Implement hardware RC filters
    4. Add another rejection algorithm
  • Phil Pilgrim (PhiPi) Posts: 22,275
    edited 2011-05-10 - 22:57:30
    Try messing with this if condition to require a higher correlation:
        if (maxcorr => 75 or maxcorr * 100 / seccorr > 175 - maxcorr and maxcorr => 40)
          tv.str(string("You said, ", 34))
          ' (the matched word is printed here)
          tv.str(string(34, "."))
        else
          tv.str(string("Say again, please."))

    The other thing you could do, along with the above, is to take more samples of the utterances you're trying to match.
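
To experiment with a stricter acceptance rule away from the hardware, the condition above can be sketched in Python with its three constants exposed as parameters. This assumes that maxcorr and seccorr are the best and second-best correlation scores, which their names suggest but the thread does not state. The parameter names and the handling of a zero seccorr are inventions for this example and do not come from the original program.

```python
def accept_match(maxcorr, seccorr, strong=75, floor=40, margin=175):
    """Python rendering of the Spin acceptance test, using integer math.

    Accept when the best score is strong on its own, or when it is at
    least 'floor' and sufficiently ahead of the runner-up.
    """
    if maxcorr >= strong:
        return True
    if maxcorr < floor:
        return False
    if seccorr <= 0:
        # Assumption: no usable runner-up means the best match stands alone.
        return True
    return maxcorr * 100 // seccorr > margin - maxcorr


if __name__ == "__main__":
    # Same scores, default rule versus a stricter hypothetical rule.
    print(accept_match(60, 50))
    print(accept_match(60, 50, strong=85, margin=200))
```

Raising strong, floor, or margin makes the test harder to pass, which should reduce false acceptances at the cost of more "Say again, please." responses.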

  • Rayman Posts: 9,457
    edited 2011-05-11 - 06:33:18
    I'd really like to do a graphical interface for this cool code... Don't see having time for it though...
    Prop Info and Apps: http://www.rayslogic.com/