Rapid "real-time" Audio Frequency Detection challenge
I'm encouraged by recent discussions around the stellar improvements in the ADC realms of the P2. I'd like to get some feedback/ideas from folks while the topic is still fresh...
I have a challenge to identify frequencies out of acoustic music played in open air. I am aided by the fact that I only need to identify 6 particular frequencies and respond with a different frequency. I'm stymied by the real-time requirement: for 82Hz, for example, I need to detect it with confidence in only about 3 to 4 cycles, tops!
I've been exploring a few strategies, including:
1. parallel, looking for just those freqs using Goertzel Algorithm (built in, or handcoded), in parallel over 6 cogs using sliding windows around a shared ring buffer of audio data.
2. concurrent, LowPass filter -> FFT -> DSP... possibly Wavelet Transform - demanding high quality ADC, that may detect with few complete cycles.
3. phased array of microphones
I've had a chat with Perplexity.ai, if one cares to see some of what I explored: https://www.perplexity.ai/search/what-is-the-Ckt6C0GzQo6_aq1NTddO4A?s=c
Finally... will there be an Object, released on OBEX (beyond the file in the forum post), that incorporates all that has been learned on the P2 ADC that I can include/link into my FlexC code?
Comments
I'm also interested by this topic.
In the past Ken Gracey built a single-note detector based on code from these forums for the P1. I think he was using it to detect notes from his clarinet, and I copied it to make a game that flashed lights when the kids hummed notes in the right sequence. Yeah, hours of fun!
Would be cool to recreate that with the P2 now, for exactly the reasons you mention.
I wonder about detecting outdoor sound.... so much going on ! Would the P2 have the bandwidth to provide an array of dominant frequencies, in order of dominance (loudness I guess) ?
That way it should be possible to reliably detect your target freq. (or set of freq.) to better determine the specific signature of your sound source, and also to detect it "in the background" if other sound sources barge in (like a car driving past, or someone shouting, etc..)
What input did you have in mind ? Directly connected analog/electret mic (mics), or one of those digital mics?
@cgracey mentioned recently he would like to push the completed objects to OBEX when they are cleaned up. For now though, they might be in the latest PNUT download? That zip file contains a bunch of his latest demos.
This seems like a great topic to bring up at the next P2LF if you are ever able to drop in ?
Hi @ngeneer !
Some thoughts here:
1. The "stellar improvements" are about very low frequency measurements. They do a lot of averaging to improve the signal-to-noise ratio, which is not possible for higher frequencies. For your task, depending on the top frequency, the Sync2 method, 4048 clocks, 13bit, might be a good starting point.
As Goertzel algorithm gives exactly the same result for the selected frequencies as FFT, you can choose Goertzel or FFT just based on efficiency. I am sure that for 6 frequencies, Goertzel will be much more efficient. And you can do it on the fly without the need to buffer all the samples.
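To make the "on the fly, no buffer" point concrete, here is a minimal sketch of a software Goertzel bank that updates all target frequencies from each incoming sample. It is written in Python for readability, not as P2 code. The class name `GoertzelBank`, the 8 kHz sample rate and the six standard-tuning guitar frequencies are assumptions chosen for illustration; the thread does not fix any of them.

```python
import math

class GoertzelBank:
    """Sample-by-sample Goertzel detectors for a fixed set of frequencies."""

    def __init__(self, freqs_hz, sample_rate_hz):
        self.freqs = list(freqs_hz)
        # One coefficient 2*cos(w) per target frequency.
        self.coeffs = [2.0 * math.cos(2.0 * math.pi * f / sample_rate_hz)
                       for f in self.freqs]
        self.reset()

    def reset(self):
        """Start a new analysis window."""
        self.s1 = [0.0] * len(self.freqs)
        self.s2 = [0.0] * len(self.freqs)
        self.count = 0

    def push(self, x):
        """Feed one sample; only two state values per frequency are kept."""
        for i, c in enumerate(self.coeffs):
            s0 = x + c * self.s1[i] - self.s2[i]
            self.s2[i] = self.s1[i]
            self.s1[i] = s0
        self.count += 1

    def powers(self):
        """Squared magnitude per frequency, normalised by window length."""
        n = max(self.count, 1)
        return [(s1 * s1 + s2 * s2 - c * s1 * s2) / (n * n)
                for s1, s2, c in zip(self.s1, self.s2, self.coeffs)]


# Assumed targets: open strings of a standard-tuned guitar, in Hz.
TARGETS_HZ = [82.41, 110.00, 146.83, 196.00, 246.94, 329.63]

bank = GoertzelBank(TARGETS_HZ, 8000)
for n in range(400):                      # 50 ms of a 110 Hz test tone
    bank.push(math.sin(2.0 * math.pi * 110.0 * n / 8000))
strongest = max(zip(bank.powers(), TARGETS_HZ))[1]
print("strongest target:", strongest, "Hz")
```

The inner loop is one multiply and two add/subtract operations per frequency per sample, which is why a single cog handling six frequencies seems plausible at audio rates.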
I have spent a lot of time getting the hardware Goertzel going and was rather disappointed by its benefits in the end. You can only look for one frequency per cog, you have only limited resolution and you need to dedicate a cog to it. There might be cases with a single very high frequency where it makes sense, but for audio frequencies, I am sure you can use one cog to look for 6 frequencies with software Goertzel.
I have used the Sync2 method, 4048 clocks, 13bit, plus software Goertzel with Tachyon for my audio analyser. It is non-real-time and uses a relatively long buffer of 4000 samples to achieve high selectivity. https://forums.parallax.com/discussion/173880/picture-of-sound-parallax-propeller-p2-as-a-tool-to-analyse-a-guitar-effect-pedal-or-an-amp#latest For speed the Goertzel algorithm is coded in asm. It seems to need 125 cycles per sample.
https://forum.micropython.org/viewtopic.php?t=7290 was very helpful information for me.
((It might be possible also to use some autocorrelation method here?))
I think some sort of automatic gain amplifier (compressor) will be very helpful for this project. Perhaps something like this: http://beavisaudio.com/schematics/Dan-Armstrong-Orange-Squeezer-Schematic.htm
Have fun, Christof
I had in mind an analog mic with good 82-110 Hz frequency response, possibly at the end of a tuned port as a natural low-pass filter to accentuate the signal I'm looking for.
yep, got it. the notes are useful. will try to embed in my FlexC as is.
I've been to a few. will have more to talk about this time.
Stuff from this wake word thread might help:
https://forums.parallax.com/discussion/175564/
A simplified version, which is basically just Chip's example ported to VGA with a plot added on top of the screen, is here, with the Mic-->FFT-->VGA example:
https://rayslogic.com/Propeller2/SimpleP2/SimpleP2Plus/Code/SimpleP2PlusCode.html
the top end I had been concerned with was 110Hz in the ADC input sense side, and going down to under 10Hz, moreso on the DAC output driver side.
The absolute most critical aspect of this system is that it detects a plucked string (a high-energy onset that decays immediately) at those 6 low frequencies with ~8-10ms cycle time. I'm gathering that the in-built Goertzel is optimal for high frequencies and that a software implementation would be perfectly suitable. As this is the primary critical purpose of the device, I don't mind burning 6 cores constantly to complete the detection at the earliest possible moment. As I understand it, though, Goertzel requires at least one full cycle, and catching it dependably on one cycle in a noisy signal is sort of wishful thinking.
what I had only realized last night is that the higher order harmonics, which I'd been meaning to filter out to increase signal to noise around the target primary frequencies, may actually be the key to faster detection, as the cycle times for the higher order harmonics get much shorter, into the realm where I might be able to collect and analyze an adequate number of cycles within an acceptable delay. I see in my preliminary spectral visualizations, and understand, that the harmonics are usually relatively attenuated. but, it might make sense to specifically accentuate only the frequencies around the 1st, 2nd and 3rd harmonics and use the strength of all three as a more immediately recognizable "signature", if you will, that the particular string has been plucked, as opposed to another string/instrument whose 1st harmonic may nearly coincide with a 2nd or 3rd of my string of interest. There are some cases like that, I've already checked.
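One way to sketch the "signature" idea is to measure Goertzel power at the 1st, 2nd and 3rd harmonic and require all three to be present. The sketch below takes the weakest of the three as the score, so a note that only lands on one harmonic is rejected. The function names, the `min` scoring rule, the test amplitudes and the 8 kHz rate are assumptions for illustration; whether a real plucked string has enough 2nd and 3rd harmonic energy early on still has to be measured.

```python
import math

def goertzel_power(samples, freq_hz, sample_rate_hz):
    """Normalised squared magnitude of one frequency over the given samples."""
    c = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate_hz)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + c * s1 - s2, s1
    n = len(samples)
    return (s1 * s1 + s2 * s2 - c * s1 * s2) / (n * n)

def harmonic_signature(samples, fundamental_hz, sample_rate_hz, harmonics=3):
    """Weakest of the first few harmonic powers: high only if all are present."""
    return min(goertzel_power(samples, k * fundamental_hz, sample_rate_hz)
               for k in range(1, harmonics + 1))

def tone_mix(parts, sample_rate_hz, num_samples):
    """Sum of sines given as (frequency_hz, amplitude) pairs; test signal only."""
    return [sum(a * math.sin(2.0 * math.pi * f * n / sample_rate_hz)
                for f, a in parts)
            for n in range(num_samples)]

FS = 8000
E2 = 82.41
# Assumed harmonic amplitudes for a low E string; real strings will differ.
e_string = tone_mix([(E2, 1.0), (2 * E2, 0.5), (3 * E2, 0.3)], FS, 400)
# A lone B3 (246.94 Hz) sits almost exactly on the 3rd harmonic of E2.
b_string = tone_mix([(246.94, 1.0)], FS, 400)

sig_e = harmonic_signature(e_string, E2, FS)
sig_b = harmonic_signature(b_string, E2, FS)
print("E2 signature on E string:", sig_e)
print("E2 signature on B string:", sig_b)
```

The lone B3 produces energy near the third harmonic of E2 but almost none at the fundamental or second harmonic, so its score for the E2 signature stays far below that of the real E string.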
plenty of time to do multiple processes for the detection of multiple notes/frequencies.
that's feeling like it might work and I won't get to play with phased arrays on this project
I found that almost immediately when embarking on this project last Thursday. Very nice results there. It gave me confidence to invest more focus/time on this.
some good simplifying tips in there, thanks.
I expected, if I ended up on the FFT train, to be cycling through an assortment of signal extraction methods, which is where I learned that the Wavelet Transform might apply well. honestly, I don't yet understand the math behind most of those, just going on what research reading has turned up.
I was hoping to avoid much custom audio analog/digital circuits, though, in this case, pass-through fidelity is not a concern. I'll take a look to learn, anyway.
Hm, I have some heavy doubts that the base frequency of a plucked string can be detected in less than one cycle. Do you know https://en.m.wikipedia.org/wiki/Karplus–Strong_string_synthesis ?
This means that the frequency is not defined at the beginning of the sound. The wave has to be reflected at the ends of the string to filter out the resonance frequency of that string. Therefore the first cycle of the plucked string will not be useful for finding the resonance frequency.
interesting link, thanks! it'll be interesting to try it out and see if there is enough signal sooner from the higher freq harmonics available from a real string in the world. I'm taking that direction for the moment.
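The Karplus-Strong model linked above can be sketched in a few lines, and it illustrates the point: the first period of output is just the raw excitation (noise here), and the pitch only emerges as that burst recirculates through the delay line. This is a common circular-buffer variant, not a model of any particular instrument; the function name, the `decay` value and the fixed random seed are assumptions. It may also be handy as a source of repeatable test signals for the detectors discussed in this thread.

```python
import random

def karplus_strong(freq_hz, sample_rate_hz, num_samples, decay=0.996, seed=1):
    """Pluck a virtual string: a noise burst circulating in a delay line."""
    rng = random.Random(seed)
    period = int(round(sample_rate_hz / freq_hz))  # delay line length in samples
    line = [rng.uniform(-1.0, 1.0) for _ in range(period)]
    out = []
    for n in range(num_samples):
        i = n % period
        out.append(line[i])
        # Average two neighbouring samples: a gentle low-pass on every round trip.
        line[i] = decay * 0.5 * (line[i] + line[(i + 1) % period])
    return out

pluck = karplus_strong(82.41, 8000, 400)   # about 4 periods of a low E
print("samples per period:", int(round(8000 / 82.41)))
```

Feeding the first period alone into any frequency detector gives it only noise to work with, which is the reason for the doubt expressed above.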
I wish the FFT could be made to work on a log-frequency scale, rather than a linear-frequency scale. That way, each bin could be set up as, say, 1/4th of a musical note step. That could tell you if you were sharp or flat for any given note.
Maybe if you did 12 FFTs, each separated by the twelfth root of 2 in sampling time, you could achieve this.
I think it was Phil Pilgrim who posted a P1 wake word type detector using Goertzel . Just looked at a few frequencies, as I recall.
Might be worth digging that up and porting to P2....
Here is something about a logarithmic FFT:
https://homepages.dias.ie/~ajones/publications/28.pdf
This is it: https://forums.parallax.com/discussion/115725/goertzel-based-speech-recognizer-now-with-source-code/p1
Chip, you've conjured up an unmatched system, for sure! I just really looked into Spin2 and I'm blown away by the other graphical analysis "debug"ing capabilities built in. It really does seem to get out of the way and let me get right at the registers and make the most of the hardware features. Particularly the FFT and scope stuff. After 30 years of coding, I've been in the C camp generally, because much code and many algos are implemented there. But I'm no longer going to shy away from writing my own Spin2 stuff, even if it means working in Windows.
Thanks for the tips. I'll be keeping an eye out for your SimpleP2!
I second that. Also, keep in mind that there's Heisenberg's uncertainty principle. Resolution of frequency is inversely proportional to the sampling time period. The better the resolution you need, the more periods you need to sample.
So we are talking about the six natural tones of (full length) guitar strings? These run from 82Hz (E) to 330Hz (e). They are separated by about a factor of 4/3, so I think you need at least 3 cycles to identify one of them.
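The "at least 3 cycles" estimate can be checked with a little arithmetic, using the rule of thumb that two tones Δf apart need an observation window of roughly 1/Δf to be told apart. The sketch below assumes standard guitar tuning for the six targets, which is a guess from this thread and not something the original poster confirmed.

```python
# Open-string fundamentals of a standard-tuned guitar (assumed targets), in Hz.
STRINGS_HZ = [82.41, 110.00, 146.83, 196.00, 246.94, 329.63]

def cycles_to_separate(f_low_hz, f_high_hz):
    """Cycles of f_low that fit in a window of length 1 / (f_high - f_low)."""
    window_s = 1.0 / (f_high_hz - f_low_hz)
    return f_low_hz * window_s

for lo, hi in zip(STRINGS_HZ, STRINGS_HZ[1:]):
    window_ms = 1000.0 / (hi - lo)
    print(f"{lo:7.2f} Hz vs {hi:7.2f} Hz: window {window_ms:5.1f} ms, "
          f"{cycles_to_separate(lo, hi):.1f} cycles of the lower note")
```

For the lowest pair this works out to a window of about 36 ms, or about 3 cycles of the 82 Hz string, which is well above an 8-10 ms budget. The rule of thumb is not a hard limit, since strong signals and prior knowledge of the candidates can help, but it shows the scale of the problem.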
Hi,
sorry to say so, but in https://en.wikipedia.org/wiki/String_vibration the oscillation of the string is also described as a wave which moves along the string. This means that at the moment the string is plucked, a pulse starts in both directions. At this moment the length of the string is "not known" to the pulse. The resonance frequency only becomes effective through the filtering that happens after the pulse is reflected from the ends of the string. This is true not only for the base frequency but also for the overtones.
There are tuners on the market and also devices, which can produce second voices, harmonies. All of these have latency much greater than 10ms. In the case of voice processors, they use slow attack for the generated tone: The generated tone starts with low volume, which conceals the latency. Perhaps this is possible here too?
(((A 6 string Bass starts at 30.9Hz. 10Hz cannot be heard.)))
You want to do calculations with the data. As distortion adds harmonics, you don't want to have it, and you also want to have as many valid bits as possible. The built-in sigma-delta ADC can provide about 11 bits. A compressor can make sure that these bits are actually used most of the time. Or you can use an external audio chip, such as the one Parallax sells.
I do not fully understand the task: There is music playing? The low frequency content is just 6 frequencies below 82 Hz? Or is there a broad spectrum and certain frequencies have to be filtered? The system finds the frequencies and emits in real time new, close frequencies? Do those feed back to the microphone or is the modified signal transmitted to anywhere?
Is the amplitude of the low frequencies dominating? Imagine a higher frequency that is modulated (changing amplitude) contains lower frequencies at low volume.
Can you be more specific in the task?
Was just testing out my version of Chip's FFT code with baby grand piano.
This video starts with a bad rendition of Charlie brown followed by playing all the white keys.
Looks like keys from around middle C to all the way on the right can be differentiated.
But, keys on the left side (lower frequency) are all kind of squashed in together.
I was also testing this code with a cello and appears that stringed instruments are much more of a challenge. All kinds of harmonics there...
A reciprocal frequency counter could easily distinguish these frequencies, even with a very low counter speed. I don't know how it would cope with the harmonics.
With the response time requirement, the code should not need more than about 1500 samples (at 44100Hz) of data. The spacing of the FFT bins is totally inadequate at a 1024 sample size. Now, you can simply pad a bunch of zeros on the end of the time domain data to interpolate the frequency domain data. That would generate more, smaller FFT bins without increasing the requirement for more input samples. But how long would it take the P2 to compute an 8192 or 16384 FFT? Maybe it would be better in this case to interpolate the frequency domain data directly? It would be nice to have an algorithm that can process data as it arrives, instead of waiting for all the samples to be stored before starting computation. If you only care about a few frequencies the FFT is fairly inefficient.
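The bin-spacing numbers behind this are easy to reproduce, and for a handful of frequencies the spectrum can be evaluated directly at the frequencies of interest instead of zero-padding a large FFT. Evaluating the transform of the unpadded samples at frequency f gives the same value a zero-padded FFT would give at a bin centred on f. The helper names below are assumptions, and this is a plain-Python sketch with no claim about P2 timing.

```python
import cmath
import math

def bin_spacing_hz(sample_rate_hz, fft_size):
    """Distance between adjacent FFT bin centres."""
    return sample_rate_hz / fft_size

def dtft_magnitude(samples, freq_hz, sample_rate_hz):
    """|X(f)| evaluated directly at any frequency, not only at FFT bin centres."""
    w = -2.0 * math.pi * freq_hz / sample_rate_hz
    return abs(sum(x * cmath.exp(1j * w * n) for n, x in enumerate(samples)))

FS = 44100
for size in (1024, 8192, 16384):
    print(f"FFT size {size:5d}: bin spacing {bin_spacing_hz(FS, size):6.2f} Hz")

# 1500 samples (about 34 ms) of a 110 Hz test tone, probed at three targets.
tone = [math.sin(2.0 * math.pi * 110.0 * n / FS) for n in range(1500)]
for f in (82.41, 110.00, 146.83):
    print(f"{f:7.2f} Hz: magnitude {dtft_magnitude(tone, f, FS):7.1f}")
```

Note that zero-padding or direct evaluation only places the probe points where you want them. It does not narrow the main lobe, which is still set by the 1500 real samples.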
I might have more comments later after I run some simulations.
The problem in visualizing music via FFT is that lower notes are tightly banded together and harmonics are abundant. If we could have a magic Fourier transform that represented frequency in log, it would all be very straightforward. I don't know how to achieve this other than to run a whole bunch of discrete Goertzels at several octaves of note frequencies.
Here is a graph I made of four octaves of notes. Because there are 12 notes per octave, the ratio from each note to the next is the 12th root of 2, or about 1.059. After multiplying 12 times by 1.059, you arrive at twice the frequency you started at, or you've gone one whole octave. You can see that note 0 = 1, note 12 = 2, note 24 = 4, note 36 = 8, and note 48 = 16.
If we did a Goertzel at each exact note frequency, everything would be locked to the musical scale and harmonics could be perfectly registered.
Higher notes would need less sampling time than lower notes. Actually, to keep things even, the sampling period for each note would be inversely proportional to its frequency.
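A sketch of that idea: build one Goertzel per note, with note frequencies spaced by the twelfth root of 2 and each window sized to hold the same number of cycles, so higher notes get proportionally shorter windows. The function names, the 82.41 Hz base note, the 8 kHz sample rate and the choice of 17 cycles are assumptions for illustration. The figure of 17 comes from the same 1/Δf rule of thumb: adjacent semitones differ by about 5.9%, so separating them takes roughly 1/0.059, or about 17 cycles.

```python
import math

def note_frequency(base_hz, semitones):
    """Equal-tempered note: each semitone multiplies frequency by 2**(1/12)."""
    return base_hz * 2.0 ** (semitones / 12.0)

def window_length(freq_hz, sample_rate_hz, cycles):
    """Samples needed to hold a fixed number of cycles of freq_hz."""
    return int(round(cycles * sample_rate_hz / freq_hz))

def note_bank(base_hz, num_notes, sample_rate_hz, cycles):
    """(frequency, window length, Goertzel coefficient) for each note."""
    bank = []
    for n in range(num_notes):
        f = note_frequency(base_hz, n)
        coeff = 2.0 * math.cos(2.0 * math.pi * f / sample_rate_hz)
        bank.append((f, window_length(f, sample_rate_hz, cycles), coeff))
    return bank

# Two octaves upward from an assumed base note of 82.41 Hz.
for f, n_samples, coeff in note_bank(82.41, 25, 8000, 17)[::6]:
    print(f"{f:7.2f} Hz: window {n_samples:5d} samples, coeff {coeff:.5f}")
```

Each entry can drive an independent Goertzel that resets after its own window length, which gives the log-frequency layout without needing a modified FFT.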
There must be some way to economize the computations when it comes to harmonics. Wait, that's what the FFT already does. maybe we just need 12 different low-data-point FFTs.
Let me say it this way: the problem with FT is, that it never was meant to be used the way it is used. ;-) . The basic principle is: if there is a fundamental frequency, then all the harmonics are well determined. The fundamental determines the period of the signal. As the signal is periodical, a single interval completely defines the infinite length of the signal, like running in a circle.
So, if your signal's fundamental is 10 Hz and your interval is 0.1 second, the next possible harmonic is 20 Hz. If two notes differ by 1 Hz, you need an interval length of 1 second, so 10 Hz is the 10th harmonic and 11 Hz the 11th. So the lowest frequency of interest defines the smallest interval needed, while the resolution wanted defines the number of such intervals needed to separate such frequencies.
That shows: the problem of frequency analysis has no general solution, it always depends on the problem to solve, the information you want to get. That is one of the problems of speech recognition: You may identify a word, or a sentence, but not irony ;-) And that hints at why AI is in vogue: You never want to know how it really works.
That's a challenge, what is the relative signal level of the desired signal, amongst other signals ? If you can mic the instrument of interest itself, things will be easier.
You may need to use envelope information, as well as frequency information.
A 82-110Hz bandpass would help S/N challenges, but it would mask the envelope information so separate envelope and signal filter paths could be helpful at the experiment stage.
I'd build those externally in analog with scope triggers, for the experiment stage, and then you can do Sw filters later.
6 Envelope triggered auto-correlators might manage this ?
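A rough sketch of what an envelope-triggered autocorrelator could look like: wait for the signal to cross an onset threshold, then compare the signal with itself delayed by one period of each candidate frequency. All names, the threshold and the 8 kHz rate are assumptions. One known weakness is labeled in the code: a tone at a multiple of a candidate (for example an octave or two above it) also correlates well at that candidate's lag, so a real detector would need an extra rule to break such ties.

```python
import math

def onset_index(samples, threshold):
    """First sample whose magnitude reaches the threshold, or None."""
    for i, x in enumerate(samples):
        if abs(x) >= threshold:
            return i
    return None

def normalized_autocorr(samples, lag):
    """Correlation of the signal with itself delayed by `lag` samples (-1..1)."""
    a = samples[:-lag]
    b = samples[lag:]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0

def best_candidate(samples, candidates_hz, sample_rate_hz, threshold=0.1):
    """After an onset, pick the candidate whose period lag correlates best.

    Caveat: harmonically related candidates are ambiguous with this rule.
    """
    start = onset_index(samples, threshold)
    if start is None:
        return None
    seg = samples[start:]
    scores = {}
    for f in candidates_hz:
        lag = int(round(sample_rate_hz / f))
        if lag < len(seg):
            scores[f] = normalized_autocorr(seg, lag)
    return max(scores, key=scores.get) if scores else None

FS = 8000
CANDIDATES_HZ = [82.41, 110.00, 146.83, 196.00, 246.94, 329.63]  # assumed targets
# 50 samples of silence, then 50 ms of a 110 Hz test tone.
signal = [0.0] * 50 + [math.sin(2.0 * math.pi * 110.0 * n / FS) for n in range(400)]
print("detected:", best_candidate(signal, CANDIDATES_HZ, FS), "Hz")
```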
To be more precise: do you know in advance which frequencies have to be replaced by which frequencies and should the amplitude be conserved? Obviously phase relations will not be conserved.
thanks for your interest.
the extent of the challenge: identify, as soon as possible, the presence of one of 6 fundamental frequencies... which may, or may not, be there at any given instant. leave it at that.
All good comments. thanks.
Certainly there will be experimentation on the bench and in the field. There is plenty of research and suggestions on the fast part, the fast-as-possible-part will need some actual testing and comparison. It may come to mic'ing them directly.
Are we talking about the reciprocal frequency counter included in the P2 SmartPins? I may be confused about what they do. I thought they were well suited for binary sorts of datastreams, not picking frequencies out of noise. In any case, thanks for the table of timings for bin size and speeds for FFT. very useful.
hadn't the time to attempt any of this in code (I'm switching from C to Spin2, I think, to make better use of the examples given that twiddle register bits for more immediate/direct control of SmartPin features AND to make use of the nifty visual debug features with Spin) but, I still believe that the final answer will involve multiple simultaneous analyses happening concurrently on the same data stream, each processing its own time-window on the data, though with only a few clock cycles offset between them... that either minimally one of them will have arrived at the result sooner than the others and/or potentially, there is some further metadata that can be processed from the results of all of the concurrent analyses.
anyway, lots to play with. need to get my circuits down and on devices with good frequency response in that low zone.
I appreciate all you're doing to conjure up magic FFTs, and the visualization built in is a huge benefit.
I don't have the math chops to get overly creative in that realm.
I was counting on the brute parallel force I could apply to the problem given the unique strengths of the P2!
@cgracey , or anyone who might know... if I gang together 2 or 3 or 4 adjacent pins (all on the same internal reference voltage) with equal length traces to a single signal point for the purposes of starting/reading each of the ADCs one (or a few) clocks apart in time, is there some impedance or other impact on the signal that I should be aware of?
or... is the way these ADCs work internally anyway that they read on every clock cycle once they get spun up with the nifty new 17 layer averaging scheme noted recently?
and code wise, I hear tell of the ability to prep all smart pins to a state and start them on the exact same system clock cycle... am I making that up? where would I reference how to do that? and do you have some tips on doing the above mentioned few clock cycles different in time?
thanks, again, anyone/everyone for all your thoughts!
OK, what you ask for is likely a paradox. A signal only knows ONE fundamental. This is the frequency that defines the period of said signal. So by definition all other frequencies are harmonics of this fundamental. Here the theory of the Fourier transform ends.
But if, for example, the sound you analyze is created by different strings of a guitar, then for every string you have a spectrum of its own. so indeed there are multiple fundamentals.
I quote Chip: "the ratio from each note to the next is the 12th root of 2, or about 1.059." To identify two consecutive notes like 100Hz and 105.9Hz, the interval has to be expanded by about 106, then both notes fit into the interval.
As every string creates the fundamental frequency and the harmonics you now have to scan the spectrum for the existence of such a harmonic pattern.
But you will run into another problem, as you only can see the presence of a frequency and an average amplitude, and as the decay is increased with frequency, the spectrum is a function of the starting point of your signal acquisition, that is, how the single notes are located in the interval.
That said: solving your problem will imply a huge progress in signal processing, comparable to Fourier's fundamental work, I guess.
So, even at the beginning of the new Year, I'm not optimistic in this respect, but in all others. So I wish everybody a Happy New Year!
Ganging ADC pins can improve conversion quality, but it is not necessary.
Perhaps you could just use a period-measurement smart pin mode and select one of the digital-filtering modes for it. If the fundamental is very strong, that could do the trick.
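As a software analogue of that idea (not a description of how the smart pin mode works internally), period measurement can be sketched as timing the gaps between rising crossings, with a hysteresis band standing in for the digital filtering so small noise does not produce false edges. The function names, the hysteresis value and the test tone are assumptions.

```python
import math

def rising_crossings(samples, hysteresis):
    """Indices where the signal rises above +hysteresis after dipping below -hysteresis."""
    crossings = []
    armed = False
    for i, x in enumerate(samples):
        if x < -hysteresis:
            armed = True            # seen the negative half, ready for a rising edge
        elif armed and x > hysteresis:
            crossings.append(i)
            armed = False
    return crossings

def mean_period_hz(samples, sample_rate_hz, hysteresis=0.1):
    """Frequency from the average spacing of rising crossings, or None."""
    c = rising_crossings(samples, hysteresis)
    if len(c) < 2:
        return None
    period_samples = (c[-1] - c[0]) / (len(c) - 1)
    return sample_rate_hz / period_samples

FS = 44100
tone = [math.sin(2.0 * math.pi * 82.41 * n / FS) for n in range(2200)]  # ~50 ms
print("estimated frequency:", mean_period_hz(tone, FS), "Hz")
```

This only works when the fundamental dominates, as noted above. Strong harmonics add extra crossings per period and the estimate then locks onto the wrong spacing.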
Starting up pins can be done by ORing into DIRA, then DIRB, to enable all A pins at once and then all B pins two clocks later. That is close enough for most everything, I think.