ADC Sampling Breakthrough - Page 30 — Parallax Forums

ADC Sampling Breakthrough


Comments

  • cgracey wrote: »
    Can you guys please think about this? What are the ramifications of double-integrating the Goertzel summing terms?

    On each clock, the ADC bit is now used to add/subtract an 8-bit cosine value and an 8-bit sine value to/from the X and Y accumulators.

    If we did a SINC2 by integrating the accumulators, and then took their periodic readings and computed diffs, might we double the ENOB of our readings?

    And how would all this work with the adder terms and accumulators and final integrators being all signed?

    This would, at least, get us around the lack-of-windowing problem that we already have in the Goertzel.
    1. Yes.
    2. It would perform sinc2 filtering. Yay! Why not go to sinc3 if it's cheap enough? For software defined radio I would love to see a third order filter. Would this always be desirable? It would interfere with using the Goertzel hardware as a FIR window filter.

    3. I don't know yet. The improvement should be similar to that seen for DC input. It should greatly improve rejection of frequencies not measured.
    4. The bit growth calculations assume two's complement arithmetic. The input needs to be sign-extended. We were treating it as unsigned for the delta-sigma input, but our input is now 8 bits.

    sample interval, order, accumulator size
    256 clocks, 2nd order, 24 bits
    4096 clocks, 2nd order, 32 bits
    256 clocks, 3rd order, 32 bits

    Intervals could be lengthened slightly by reducing the number of active bits in the LUT.


    We'd have to read the accumulators 3 times to get one sample with sinc2. Or 4 times for sinc3. I'm not sure how this would work with clearing the accumulators upon read.
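A minimal Python sketch of points 2 and 4 above, under assumptions not stated in the thread: the accumulator inputs are signed 8-bit values (LUT entry times a ±1 ADC bit), both integrators run every clock and simply wrap at a fixed width, and software reads the second integrator every R clocks and forms the second difference from three consecutive reads. The function names are hypothetical. It reproduces the accumulator sizes in the table (input bits + order × log2 of the interval) and checks that the wrapped 24-bit result equals a direct triangular-weighted (sinc2) sum, i.e. two's-complement wraparound is harmless when the accumulators are at least that wide.

```python
import random

def acc_bits(input_bits, order, interval):
    """Accumulator width for a sinc^order filter over `interval` clocks:
    input bits + order * ceil(log2(interval))."""
    return input_bits + order * (interval - 1).bit_length()

def to_signed(value, bits):
    """Reinterpret an unsigned `bits`-wide value as two's complement."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def sinc2_cic(x, r, bits):
    """Two wrapping integrators run every clock; the second is read every r
    clocks, and each output is the second difference of three consecutive reads."""
    mask = (1 << bits) - 1
    i1 = i2 = 0
    reads = []
    for n, v in enumerate(x):
        i1 = (i1 + v) & mask
        i2 = (i2 + i1) & mask
        if (n + 1) % r == 0:
            reads.append(i2)
    return [to_signed((reads[k] - 2 * reads[k - 1] + reads[k - 2]) & mask, bits)
            for k in range(2, len(reads))]

def sinc2_direct(x, r):
    """Same outputs from the triangular weights 1, 2, ..., r, ..., 2, 1."""
    w = [min(j + 1, 2 * r - 1 - j) for j in range(2 * r - 1)]
    out = []
    for k in range(2, len(x) // r):
        t = (k + 1) * r - 1              # clock index of the k-th read
        out.append(sum(w[j] * x[t - j] for j in range(2 * r - 1)))
    return out

if __name__ == "__main__":
    for interval, order in [(256, 2), (4096, 2), (256, 3)]:
        print(interval, order, acc_bits(8, order, interval))
    random.seed(1)
    x = [random.randint(-128, 127) for _ in range(256 * 6)]
    print(sinc2_cic(x, 256, 24) == sinc2_direct(x, 256))   # True
```

The third-order case works the same way with a third difference (four reads), which is where the 32-bit figure for 256 clocks comes from.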
  • evanh Posts: 15,916
    So, bitstream driving Goertzel driving two Sinc2's which are then sampled. Is that right?

  • Ariba Posts: 2,690
    edited 2018-12-07 07:13
    cgracey wrote: »
    ...
    If we did a SINC2 by integrating the accumulators, and then took their periodic readings and computed diffs, might we double the ENOB of our readings?
    ...
    You can filter the accumulators in software when you read the accumulators, no need to add additional SINC2 hardware.
    ...
    This would, at least, get us around the lack-of-windowing problem that we already have in the Goertzel.
    Can you synchronize the reading/clearing of the accumulators with the wrap around of the LUT address that generates the sin/cosine output?
    If so, you can have multiple periods in the LUT and can calculate a window over these sin/cosine samples in the LUT RAM. The window size is then the chosen Goertzel loop size.

    GoertzelWindow.png

    Andy
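A rough Python illustration of Andy's suggestion, assuming the LUT holds a whole number of cycles shaped by one Hann window across the table, 8-bit signed entries, and an ideal analog input rather than a bitstream; the names and the choice of Hann (rather than Tukey) are illustrative. It compares how much an off-bin interferer leaks into the X/Y readings with a plain table versus a windowed one.

```python
import math

def goertzel_tables(n, cycles, hann=True, amp=127):
    """n-entry signed 8-bit cos/sin tables holding `cycles` periods,
    optionally multiplied by a Hann window spanning the whole table."""
    cos_t, sin_t = [], []
    for i in range(n):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * i / n) if hann else 1.0
        phase = 2 * math.pi * cycles * i / n
        cos_t.append(round(amp * w * math.cos(phase)))
        sin_t.append(round(amp * w * math.sin(phase)))
    return cos_t, sin_t

def measure(samples, tables):
    """X/Y accumulators after one pass through the tables, as a magnitude."""
    cos_t, sin_t = tables
    x = sum(s * c for s, c in zip(samples, cos_t))
    y = sum(s * t for s, t in zip(samples, sin_t))
    return math.hypot(x, y)

def leakage(hann, n=256, cycles=8, interferer=13.5):
    """Response to an off-bin interferer relative to the response to the target tone."""
    tables = goertzel_tables(n, cycles, hann)
    target = [math.cos(2 * math.pi * cycles * i / n) for i in range(n)]
    other = [math.cos(2 * math.pi * interferer * i / n + 0.3) for i in range(n)]
    return measure(other, tables) / measure(target, tables)

if __name__ == "__main__":
    print("rectangular:", leakage(False))
    print("Hann in LUT:", leakage(True))
```

With these parameters the windowed table should show an order of magnitude or more less leakage; the price is a wider main lobe and roughly half the response to the target tone.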
  • Ariba wrote: »
    cgracey wrote: »
    ...
    If we did a SINC2 by integrating the accumulators, and then took their periodic readings and computed diffs, might we double the ENOB of our readings?
    ...
    You can filter the accumulators in software when you read the accumulators, no need to add additional SINC2 hardware.
    ...
    This would, at least, get us around the lack-of-windowing problem that we already have in the Goertzel.
    Can you synchronize the reading/clearing of the accumulators with the wrap around of the LUT address that generates the sin/cosine output?
    If so, you can have multiple periods in the LUT and can calculate a window over these sin/cosine samples in the LUT RAM. The window size is then the chosen Goertzel loop size.

    GoertzelWindow.png

    Andy

    I'm going to do this when my P2 ES board arrives.

    Note that this does not restrict frequencies to a whole number of cycles per table. For a random frequency, the phase of the table output will differ each time it runs through compared to a continuous oscillator. Not a big problem, just rotate the Goertzel output to compensate and things should be fine. The only issue is not continuously responding to input. It might cost a little bit of sensitivity.
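A sketch of the phase compensation described above, assuming a Hann-windowed complex reference that restarts at phase 0 on every pass (so it may hold a non-integer number of cycles) and a steady input tone. Block k then comes out rotated by 2π·f·N·k relative to a continuous oscillator; multiplying by the conjugate rotation undoes it. Function names and parameters are hypothetical.

```python
import cmath
import math

def block_measurements(x, f, n):
    """Hann-windowed correlation against a table that restarts at phase 0
    every n samples; f is in cycles per sample and need not give whole cycles."""
    ref = [(0.5 - 0.5 * math.cos(2 * math.pi * i / n)) * cmath.exp(-2j * math.pi * f * i)
           for i in range(n)]
    return [sum(x[k * n + i] * ref[i] for i in range(n)) for k in range(len(x) // n)]

def rotate(z, f, n):
    """Undo the phase a continuous oscillator would have accumulated by block k."""
    return [zk * cmath.exp(-2j * math.pi * f * n * k) for k, zk in enumerate(z)]

if __name__ == "__main__":
    n, f, phi = 256, 0.1037, 0.7          # about 26.55 cycles per table
    x = [math.cos(2 * math.pi * f * t + phi) for t in range(n * 6)]
    z = block_measurements(x, f, n)
    print([round(cmath.phase(v), 3) for v in z])               # drifts block to block
    print([round(cmath.phase(v), 3) for v in rotate(z, f, n)]) # steady near phi
```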
  • cgracey Posts: 14,155
    edited 2018-12-07 14:00
    Ariba wrote: »
    cgracey wrote: »
    ...
    If we did a SINC2 by integrating the accumulators, and then took their periodic readings and computed diffs, might we double the ENOB of our readings?
    ...
    You can filter the accumulators in software when you read the accumulators, no need to add additional SINC2 hardware.
    ...
    This would, at least, get us around the lack-of-windowing problem that we already have in the Goertzel.
    Can you synchronize the reading/clearing of the accumulators with the wrap around of the LUT address that generates the sin/cosine output?
    If so, you can have multiple periods in the LUT and can calculate a window over these sin/cosine samples in the LUT RAM. The window size is then the chosen Goertzel loop size.

    GoertzelWindow.png

    Andy

    Andy, that is ingenious! Do the windowing operation via the LUT data. It never occurred to me before.
    Can you synchronize the reading/clearing of the accumulators with the wrap around of the LUT address that generates the sin/cosine output?

    That is exactly how it works. Furthermore, you can specify how many complete LUT cycles you want before X/Y accumulator posting and clearing. The upper two bytes of each LUT entry are the Goertzel adder values, while the bottom two bytes are what can be output, also, to the DACs. So, in the upper bytes, you can have your windowed sine/cosine pattern, while in the lower bytes you have your continuous sine/cosine pattern. This way, you can output steady sine/cosine signals of known phase and input windowed measurements that are a product of the simultaneous output.
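A sketch of the LUT layout Chip describes, assuming 8-bit two's-complement bytes throughout and an arbitrary assignment of cosine/sine to the two bytes of each half. Neither the byte order nor the encoding the DACs expect is specified above, so `pack_entry`, `unpack_entry` and `build_lut` are hypothetical helpers.

```python
import math

def pack_entry(win_a, win_b, dac_a, dac_b):
    """Pack four signed 8-bit values into one 32-bit LUT long:
    bits 31..16 = Goertzel adder bytes, bits 15..0 = DAC output bytes.
    Which byte holds sine vs cosine is an assumption here."""
    return (((win_a & 0xFF) << 24) | ((win_b & 0xFF) << 16) |
            ((dac_a & 0xFF) << 8) | (dac_b & 0xFF))

def unpack_entry(v):
    """Split a packed long back into its four signed bytes, high to low."""
    out = []
    for shift in (24, 16, 8, 0):
        byte = (v >> shift) & 0xFF
        out.append(byte - 256 if byte & 0x80 else byte)
    return out

def build_lut(n, cycles, amp=127):
    """Windowed cos/sin in the upper bytes, continuous cos/sin in the lower bytes."""
    lut = []
    for i in range(n):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * i / n)   # one Hann window per table
        c = math.cos(2 * math.pi * cycles * i / n)
        s = math.sin(2 * math.pi * cycles * i / n)
        lut.append(pack_entry(round(amp * w * c), round(amp * w * s),
                              round(amp * c), round(amp * s)))
    return lut

if __name__ == "__main__":
    lut = build_lut(256, 8)
    print(hex(lut[64]), unpack_entry(lut[64]))
```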
  • cgracey Posts: 14,155
    Do you guys have any further thoughts on how to minimize the logic required to compute the Tukey window output? This could amount to a substantial logic savings.
  • cgracey Posts: 14,155
    edited 2018-12-07 14:24
    Do you guys have any ideas on the performance improvements we could anticipate by using 8-bit Tukey samples, instead of 1-bit ADC samples, as input to the Goertzel computation? It would mean using 8x8 signed multipliers, instead of just 1-bit conditional negators, before the 32-bit accumulators.

    I'm thinking it could take things to a whole other level.
  • TonyB_ Posts: 2,178
    edited 2018-12-07 19:32
    cgracey wrote: »
    TonyB_ wrote: »
    Any news on logic size with Tukey in the smart pins?

    If it's just too big to fit, could we have a mode where groups of eight ADC bits from each of four pins can be read/streamed in one long?

    Putting a Tukey into every smart pin, taking advantage of the existing flops, was smaller than putting half that many into the cogs. Only by a little bit, though.

    Was that with the 45-bit adder Tukey? Have you tried with FPGA using a counter for the +32 and +16 values? Or the add times 3 idea?

    I think any savings are likely to be small, though. Adding 45 tap values will always need a substantial amount of logic. How does short Tukey/Hann-like compare to long Tukey for quality? The plateau could be longer without adding much logic by using a counter.
    cgracey wrote: »
    Do you guys have any further thoughts on how to minimize the logic required to compute the Tukey window output? This could amount to a substantial logic savings.

    I've been trying my utmost to make the Tukey smaller. Having a small number of Tukey pins is one option.

    We need a plan B if, as seems likely, it won't fit. Would sliding windows be so terrible in software? Obviously not ideal in terms of speed, but windows could be anything, stored in LUT. How to handle triggering?
  • cgracey wrote: »
    Do you guys have any ideas on the performance improvements we could anticipate by using 8-bit Tukey samples, instead of 1-bit ADC samples, as input to the Goertzel computation? It would mean using 8x8 signed multipliers, instead of just 1-bit conditional negators, before the 32-bit accumulators.

    I'm thinking it could take things to a whole other level.

    According to the convolution theorem, multiplication in the time domain is equivalent to convolution in the frequency domain, and vice versa. This means that if we could perform a fast Fourier transform on the one-bit samples, or on anything easily derived from them, then instead of multiplying each of the one-bit samples (or the eight-bit samples) by a raised cosine (read: Tukey), you might want to find some way, by hook or by crook, to get the ADC signal into the frequency domain, where the convolution kernel for a raised cosine is just the set {-1,2,-1}. One way to do this might be to store the precomputed FFT values for every possible 8-bit sequence in a table, so that you can simply pick off 8 bits of raw ADC output at a time, with 4 bits of overlap, look up the precomputed FFT, and then simply add: no multiplies required! Then you might try downsampling the precomputed FFT results, so that the next step would be to perform an inverse 4-point FFT on the downsampled, overlapped and anti-aliased summation, which for a 4-point FFT involves only additions and subtractions. According to Wikipedia, Winograd figured out sometime back in the '80s (I think) that it is possible to perform ANY FFT with nothing but a large number of additions and subtractions, provided you are willing to perform exactly 4*N multiplications at the very end. I don't know off the top of my head how to do any of the more advanced Winograd transforms, like the ones that involve cyclotomic polynomials derived from a transformation based on a Galois field, which in turn allows flipping between different prime-number factorings, but for the smaller transforms it is a slam dunk in terms of the theory. As with any software project, the devil is in the details when it comes time to debug.
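A small numerical check of the {-1,2,-1} claim, written in Python with a plain DFT. It holds exactly for a periodic Hann (raised-cosine) window spanning the whole DFT length, where the kernel is (-1, 2, -1)/4; a Tukey window's flat top does not reduce to a three-tap kernel. This only checks the identity; it does not implement the table-lookup FFT scheme proposed above.

```python
import cmath
import math
import random

def dft(x):
    """Plain O(N^2) DFT, enough for a small check."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def hann_in_frequency(X):
    """Apply a periodic Hann window by convolving DFT bins with (-1, 2, -1)/4."""
    n = len(X)
    return [(2 * X[k] - X[(k - 1) % n] - X[(k + 1) % n]) / 4 for k in range(n)]

if __name__ == "__main__":
    n = 32
    random.seed(2)
    bits = [random.choice((-1, 1)) for _ in range(n)]   # 1-bit samples as +/-1
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]
    direct = dft([b * w for b, w in zip(bits, hann)])
    via_kernel = hann_in_frequency(dft(bits))
    # Only floating-point rounding separates the two results.
    print(max(abs(a - b) for a, b in zip(direct, via_kernel)))
```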
  • cgracey Posts: 14,155
    edited 2018-12-07 15:48
    lazarus666 wrote: »
    cgracey wrote: »
    Do you guys have any ideas on the performance improvements we could anticipate by using 8-bit Tukey samples, instead of 1-bit ADC samples, as input to the Goertzel computation? It would mean using 8x8 signed multipliers, instead of just 1-bit conditional negators, before the 32-bit accumulators.

    I'm thinking it could take things to a whole other level.

    According to the convolution theorem, multiplication in the time domain is equivalent to convolution in the frequency domain, and vice versa. This means that if we could perform a fast Fourier transform on the one-bit samples, or on anything easily derived from them, then instead of multiplying each of the one-bit samples (or the eight-bit samples) by a raised cosine (read: Tukey), you might want to find some way, by hook or by crook, to get the ADC signal into the frequency domain, where the convolution kernel for a raised cosine is just the set {-1,2,-1}. One way to do this might be to store the precomputed FFT values for every possible 8-bit sequence in a table, so that you can simply pick off 8 bits of raw ADC output at a time, with 4 bits of overlap, look up the precomputed FFT, and then simply add: no multiplies required! Then you might try downsampling the precomputed FFT results, so that the next step would be to perform an inverse 4-point FFT on the downsampled, overlapped and anti-aliased summation, which for a 4-point FFT involves only additions and subtractions. According to Wikipedia, Winograd figured out sometime back in the '80s (I think) that it is possible to perform ANY FFT with nothing but a large number of additions and subtractions, provided you are willing to perform exactly 4*N multiplications at the very end. I don't know off the top of my head how to do any of the more advanced Winograd transforms, like the ones that involve cyclotomic polynomials derived from a transformation based on a Galois field, which in turn allows flipping between different prime-number factorings, but for the smaller transforms it is a slam dunk in terms of the theory. As with any software project, the devil is in the details when it comes time to debug.

    From what you are saying, it almost sounds like some kind of live FFT could be maintained in real time using one bit samples.
  • cgracey Posts: 14,155
    Saucy, about feeding 8-bit samples into the Goertzel...

    I think it was you who said it would be rather pointless, because these samples are actually just one new bit per clock, anyway. Is that what you suppose? It stands to reason. If there was higher entropy in those 8-bit samples, there would be an advantage to using them. As it is, probably not.
  • cgracey wrote: »
    Saucy, about feeding 8-bit samples into the Goertzel...

    I think it was you who said it would be rather pointless, because these samples are actually just one new bit per clock, anyway. Is that what you suppose? It stands to reason. If there was higher entropy in those 8-bit samples, there would be an advantage to using them. As it is, probably not.
    The existing Goertzel:
    NCO-->LUT-->multiply-->sum
                    ^
          ADC bit  _/
    
    
    The proposed Goertzel:
    NCO-->LUT-->multiply-->sum->integrate
                    ^
          ADC bit  _/
    
    
    Don't do this:
    NCO-->LUT-->multiply-->sum
                    ^
    ADC bit-->Tukey_/
    
    The entropy is not a bad way to explain it. The low pass Tukey or sinc filter does not increase the entropy of the signal. It might decrease it.

    The output of the Goertzel "multiplier" may have a greater need for windowing than the ADC output. The sine/cosine inputs are periodic and what part of the cycle we collect measurements does affect the readings.

    The Goertzel output should be low pass filtered the same as the ADC output. The multiplication simply shifts the frequency we want down to zero. We can work the other way too, shifting the frequency of a lowpass filter up to become a bandpass filter. The plots show what the response of the Goertzel should be like. This is not a suggestion to use the Tukey on the Goertzel output. I used the Tukey because we've been studying it closely.

    Whether receiving radio signals or doing Goertzel analysis, we want to reject the undesired frequencies to the greatest extent practical. At the very least, the Goertzel should keep DC from influencing the result. Is that why it adds and subtracts instead of just adding?
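On the add/subtract question, one possible reading (a sketch, not a statement of why the hardware was designed this way): a delta-sigma bitstream at mid-scale is 50% ones. Over a whole number of LUT cycles that sums to zero whether a 0 bit subtracts the LUT value or is skipped. Over a partial cycle, treating 0 as "subtract" maps mid-scale to zero input, while treating it as "skip" turns mid-scale into a DC offset of 0.5 that leaks through the LUT's nonzero sum. `correlate` is a hypothetical helper.

```python
import cmath
import math

def correlate(bits, cycles, n, add_subtract=True):
    """Accumulate the complex LUT value once per ADC bit.
    add_subtract=True: bit 1 adds, bit 0 subtracts (bit -> +1/-1).
    add_subtract=False: bit 1 adds, bit 0 does nothing (bit -> 1/0)."""
    acc = 0j
    for i, b in enumerate(bits[:n]):
        v = (1 if b else -1) if add_subtract else (1 if b else 0)
        acc += v * cmath.exp(-2j * math.pi * cycles * i / n)
    return acc

if __name__ == "__main__":
    n = 256
    midscale = [i % 2 for i in range(n)]   # 50% ones: zero input for a delta-sigma ADC
    for cycles in (8, 8.5):                # whole vs partial cycles per table
        print(cycles,
              abs(correlate(midscale, cycles, n, True)),
              abs(correlate(midscale, cycles, n, False)))
```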

    The diagram is from the article "The USRP under 1.5X Magnifying Lens!" That's basically what the Goertzel does. It's got some serious filtering to keep out-of-band signals out. The FPGA in the USRP1 does not have multipliers, so they used a CORDIC instead.
  • Don't do this: Seriously!
    NCO-->LUT-->multiply-->sum
                    ^
    ADC bit-->Tukey_/
    
    If you do this, the Tukey (or whatever) window on the Goertzel input will attenuate the high frequency you are trying to measure and pass the DC through at full strength. More DC will then be picked up by the sidelobes of the rectangular window used by the Goertzel. It's worse than useless.

    In practice the measured frequency would be in the passband of the Tukey. In that case the above doesn't apply, but there is still no benefit.
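The attenuation argument can be checked numerically. In this sketch a 45-tap moving average stands in for the Tukey lowpass (an assumption; the real window has different sidelobes), the target tone sits at 1/16 of the sample rate, well above that filter's cutoff, and a 0.3 DC offset rides along. Filtering first knocks the target response down to roughly 6%, while the filter itself passes DC at unity gain.

```python
import cmath
import math

def goertzel_mag(x, f):
    """Magnitude of the plain rectangular-window correlation at f cycles/sample."""
    return abs(sum(v * cmath.exp(-2j * math.pi * f * i) for i, v in enumerate(x)))

def moving_average(x, taps):
    """Stand-in lowpass (boxcar); output i averages inputs i .. i + taps - 1."""
    return [sum(x[i:i + taps]) / taps for i in range(len(x) - taps + 1)]

if __name__ == "__main__":
    f = 1 / 16                                   # target: 16 samples per cycle
    x = [0.3 + math.cos(2 * math.pi * f * i) for i in range(45 + 1024)]   # tone + DC
    raw = goertzel_mag(x[:1024], f)
    filtered = goertzel_mag(moving_average(x, 45)[:1024], f)
    print(raw, filtered, filtered / raw)
```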

  • cgracey Posts: 14,155
    Thanks, Saucy.

    I just remembered something about our Goertzel. We can play portions of the LUT. So, we can have a window open, plateau, and window close section in the LUT for making long measurements.
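A sketch of playing LUT sections to build a long window, assuming a hypothetical layout of [open | plateau | close] with a Hann half-ramp at each end. The constraint it illustrates: if the plateau section holds a whole number of reference cycles, repeating it M times keeps the reference phase-continuous, so the result equals a continuous tone times a window with an M×plateau flat top. Section sizes and the reference frequency are made up for illustration.

```python
import math

F = 1 / 16       # reference frequency in cycles per sample (hypothetical)
RAMP = 64        # open and close section length (hypothetical)
PLATEAU = 32     # plateau section length; PLATEAU * F must be a whole number

def ref(i):
    """Continuous reference at absolute sample i."""
    return math.cos(2 * math.pi * F * i)

def ramp(i):
    """Rising half of a Hann window over RAMP samples."""
    return 0.5 - 0.5 * math.cos(math.pi * i / RAMP)

# LUT layout: [open | plateau | close], phase continuous across the sections.
LUT = ([ramp(i) * ref(i) for i in range(RAMP)] +
       [ref(RAMP + i) for i in range(PLATEAU)] +
       [ramp(RAMP - 1 - i) * ref(RAMP + PLATEAU + i) for i in range(RAMP)])

def play(m):
    """Open section once, plateau section m times, close section once."""
    return LUT[:RAMP] + LUT[RAMP:RAMP + PLATEAU] * m + LUT[RAMP + PLATEAU:]

def expected(m):
    """Continuous reference times a long window with an m*PLATEAU flat top."""
    total = 2 * RAMP + m * PLATEAU
    out = []
    for n in range(total):
        if n < RAMP:
            w = ramp(n)
        elif n < RAMP + m * PLATEAU:
            w = 1.0
        else:
            w = ramp(total - 1 - n)
        out.append(w * ref(n))
    return out

if __name__ == "__main__":
    print(max(abs(a - b) for a, b in zip(play(5), expected(5))))
```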

    It's interesting in that thing you posted that they are doing the decimation on the sine and cosine sums. That must improve acquisition time down to the square root of the number of cycles it would take, otherwise. For picking data out of a carrier wave, that must be crucial.
  • I did a quick test on the P1. Breadboarded on my Activity Board with 220pF caps instead of 1nF. Input was grounded.
               Triangle    Rectangle
    Mean      1143.3       1153.9
    Std Dev      0.13057      4.64507
    
    
    Triangular window has a standard deviation 2.8% of the rectangular window. :sunglasses: Have we been doing ADC wrong for 12 years? I'm probably going to build a second or third order modulator next.

    That's very cool! For the second order, could you please try this code? (I don't have a board with an ADC on it atm.)
                  ' quadratic, 16 clocks per iteration, 4 groups of iterations
                  ' N1 = (sample_period - overhead) / 16 / 4
                  ' N2 = 2 * N1
                  ' N3 = N1
                  mov       accum, #0
            
    :accelloop1   add       frqa,accum
                  add       accum, #1
                  add       frqa,accum
                  djnz      N1, #:accelloop1
    
    :decelloop2   add       frqa,accum
                  sub       accum, #1
                  add       frqa,accum
                  djnz      N2, #:decelloop2
    
    :accelloop3   add       frqa,accum
                  add       accum, #1
                  add       frqa,accum
                  djnz      N3, #:accelloop3
    
                  mov       frqa, #0 
    
    thanks,
    Jonathan
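To see what window these loops produce, here is a Python model (not P1 code) that records FRQA after every instruction, assuming FRQA and accum start at 0 and N1/N2/N3 are set as in the comments; `lonesock_window` is a hypothetical name. The weight rises quadratically to N1², peaks at 2·N1² in the middle of the second loop, and returns to exactly 0 at the end, so the trailing `mov frqa, #0` is a safety net rather than a correction. The shape is a piecewise-quadratic bell, the second-order counterpart of the triangle.

```python
def lonesock_window(n1):
    """Record FRQA after each instruction of the three loops
    (each P1 instruction is 4 clocks, so each entry spans about 4 clocks)."""
    frqa, accum, trace = 0, 0, []
    for count, step in ((n1, +1), (2 * n1, -1), (n1, +1)):
        for _ in range(count):
            frqa += accum          # add  frqa, accum
            trace.append(frqa)
            accum += step          # add/sub accum, #1
            trace.append(frqa)
            frqa += accum          # add  frqa, accum
            trace.append(frqa)
            trace.append(frqa)     # djnz
    return trace

if __name__ == "__main__":
    w = lonesock_window(64)
    print(len(w), max(w), w[-1])
```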
  • evanh Posts: 15,916
    edited 2018-12-08 14:33
    Chip,
    I've just tried to use smartpin mode %01100, inc on A-rise & B-high, for use on an externally clocked bitstream (generated by an AD7400 isolated ADC). Mode %01100 is not entirely ideal for this job because the settable (X) measurement period is in sysclocks rather than bitstream clocks. Funny how such details don't pop out on first look. I know it never was intended for external synchronous clocking but I thought I should say something anyway.

    I actually ended up not using that mode at all. Just went back to mode %01111 instead, and ignored the external clock. A side effect is only one pin is used this way, so that's kind of cool. It seems to be functioning just fine this way.

  • cgracey Posts: 14,155
    edited 2018-12-08 15:32
    evanh wrote: »
    Chip,
    I've just tried to use smartpin mode %01100, inc on A-rise & B-high, for use on an externally clocked bitstream (generated by an AD7400 isolated ADC). Mode %01100 is not entirely ideal for this job because the settable (X) measurement period is in sysclocks rather than bitstream clocks. Funny how such details don't pop out on first look. I know it never was intended for external synchronous clocking but I thought I should say something anyway.

    I actually ended up not using that mode at all. Just went back to mode %01111 instead, and ignored the external clock. A side effect is only one pin is used this way, so that's kind of cool. It seems to be functioning just fine this way.

    So, the period needs to be in pin rises, not sysclks, right? It's not complicated to add some config bit to the mode to select such a thing.

    I will review the modes and see where such things can be added.
  • jmg Posts: 15,173
    cgracey wrote: »
    evanh wrote: »
    Chip,
    I've just tried to use smartpin mode %01100, inc on A-rise & B-high, for use on an externally clocked bitstream (generated by an AD7400 isolated ADC). Mode %01100 is not entirely ideal for this job because the settable (X) measurement period is in sysclocks rather than bitstream clocks. Funny how such details don't pop out on first look. I know it never was intended for external synchronous clocking but I thought I should say something anyway.

    I actually ended up not using that mode at all. Just went back to mode %01111 instead, and ignored the external clock. A side effect is only one pin is used this way, so that's kind of cool. It seems to be functioning just fine this way.

    So, the period needs to be in pin rises, not sysclks, right? It's not complicated to add some config bit to the mode to select such a thing.

    I will review the modes and see where such things can be added.
    External clock will be useful in most modes. (+ / -)
    For external ADC, the Din is a count-enable.
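A behavioral sketch of the requested behavior (measurement period counted in external-clock rises, DIN as the count enable), with clock and data levels sampled once per sysclock. This is not a model of any existing smart-pin mode; `count_on_ext_clock` and its arguments are hypothetical.

```python
def count_on_ext_clock(clk, din, period):
    """Count DIN highs at external-clock rising edges, posting a result
    every `period` rising edges (period in pin rises, not sysclocks).
    clk and din are 0/1 levels sampled once per sysclock."""
    results, count, rises, prev = [], 0, 0, 0
    for c, d in zip(clk, din):
        if c and not prev:          # rising edge of the external clock
            count += d              # DIN acts as the count enable
            rises += 1
            if rises == period:
                results.append(count)
                count = rises = 0
        prev = c
    return results

if __name__ == "__main__":
    bits = [1, 0, 1, 1] * 16                                  # 64 bitstream bits
    clk = [1 if i % 8 < 4 else 0 for i in range(8 * len(bits))]  # sysclk / 8
    din = [bits[i // 8] for i in range(8 * len(bits))]
    print(count_on_ext_clock(clk, din, 16))
```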

  • evanh Posts: 15,916
    cgracey wrote: »
    So, the period needs to be in pin rises, not sysclks, right?

    Right.

  • evanh Posts: 15,916
    Chip,
    Is there any detailed info on DAC configuration for the Prop123 board? So far I've managed to get one going but don't know much about the settings I've used. The channel numbering, for starters, seems to be mixed up.
    		wrpin   ##%1010000000000_01_00000_0, #1           'set DAC mode for DAC0 (maybe)
    

    And this seems to activate DIR as well. Not nice when trying to use the digital pins at the same time.

  • evanh Posts: 15,916
    edited 2018-12-09 02:11
    I've got some fast action going on with this AD7400. Even though the ADC is only operating at 10 Mbps, I'm making use of the 80 MHz sysclock to boost the existing sinc1 to a sinc3 emulation in a tight 100 ns loop.
    '==================================
    ' Sinc3 filter (cogexec in cog #1)
    '==================================
    ORG
    start_sinc3
    		cogid   cid
    		wrpin   #%00_01111_0, #tpin   'set adc/counter mode
    		wypin   #0, #tpin             'inc on high
    		wxpin   #0, #tpin             'totaliser
    		dirh    #tpin                 'enable smart pin
    
    'Sinc3 loop (8 sysclocks)
    		rep     @.lend, #0            'loop forever
    
    		rdpin   acc1, #tpin
    		add     acc2, acc1
    		add     acc3, acc2
    		wrlut   acc3, #(mailbox & $1ff)   'for the decimator (lut sharing is active)
    .lend
    		cogstop cid
    
    
    
    acc1		long    0
    acc2		long    0
    acc3		long    0
    cid		long    0
    
    ORG $3ff
    mailbox         long    0
    
    

    One question on pasm syntax: The mailbox ORG'ing of $3ff, is it useful in terms of allocation management? Is that the right way to handle lutRAM? I mean I could have just hard coded #$1ff for the WRLUT instruction.
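A Python model of the sinc3 loop above plus a decimator, assuming each loop iteration sees exactly one new ADC bit (8 sysclocks at 80 MHz is 100 ns, one bit at 10 Mbps) and that the decimating cog reads acc3 every R bits and takes a third-order difference. R, the register width and the function names are hypothetical. The check confirms that, despite the integrators wrapping, the decimated outputs equal a direct sinc3 (boxcar convolved three times) weighting of the bits.

```python
import random

def sinc3_decimate(bits, r, width=32):
    """acc1 is the pin's running total of ones (first integrator); acc2 and
    acc3 integrate again once per bit, all wrapping at `width` bits. Every r
    bits the decimator reads acc3 and takes a third-order difference."""
    mask = (1 << width) - 1
    acc1 = acc2 = acc3 = 0
    reads, out = [], []
    for n, bit in enumerate(bits):
        acc1 = (acc1 + bit) & mask
        acc2 = (acc2 + acc1) & mask
        acc3 = (acc3 + acc2) & mask
        if (n + 1) % r == 0:
            reads.append(acc3)
            if len(reads) >= 4:
                d = reads[-1] - 3 * reads[-2] + 3 * reads[-3] - reads[-4]
                out.append(d & mask)      # exact while r**3 < 2**width
    return out

def sinc3_direct(bits, r):
    """Reference: weights from convolving an r-long boxcar with itself three times."""
    w = [1]
    for _ in range(3):
        w = [sum(w[i] for i in range(len(w)) if 0 <= j - i < r)
             for j in range(len(w) + r - 1)]
    out = []
    for k in range(3, len(bits) // r):
        t = (k + 1) * r - 1               # bit index of the k-th read
        out.append(sum(w[j] * bits[t - j] for j in range(len(w))))
    return out

if __name__ == "__main__":
    random.seed(3)
    bits = [random.randint(0, 1) for _ in range(16 * 10)]
    print(sinc3_decimate(bits, 16, width=16) == sinc3_direct(bits, 16))   # True
```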

  • cgracey Posts: 14,155
    Evanh,

    On the Prop123 FPGA board, cog0 DAC channels 3/2/1 go to the DACs for R/G/B. Note that these are just the cog DAC channels, not particular smart pins.

    About your code above, your "org $3ff" doesn't actually load anything subsequent into LUT. It just tracks LUT addressing, while putting code/data into hub space.
  • evanh Posts: 15,916
    cgracey wrote: »
    About your code above, your "org $3ff" doesn't actually load anything subsequent into LUT. It just tracks LUT addressing, while putting code/data into hub space.

    Right, yep, I could have used res 1 instead of long 0. I'm not confident of the tracking though. Is that recognised as lut space? Given wrlut can't resolve the address directly.

  • evanh Posts: 15,916
    cgracey wrote: »
    On the Prop123 FPGA board, cog0 DAC channels 3/2/1 go to the DACs for R/G/B. Note that these are just the cog DAC channels, not particular smart pins.
    Okay, just three DACs. And should reference the RGB labels instead of the socket numbers on the board.

    And to enable them I have to set the smartpin I/O config as if the related DAC was really in that I/O pad. Which allows them to be operated by the smartpins. Right?

  • cgracey Posts: 14,155
    edited 2018-12-09 03:35
    evanh wrote: »
    cgracey wrote: »
    About your code above, your "org $3ff" doesn't actually load anything subsequent into LUT. It just tracks LUT addressing, while putting code/data into hub space.

    Right, yep, I could have used res 1 instead of long 0. I'm not confident of the tracking though. Is that recognised as lut space? Given wrlut can't resolve the address directly.

    The assembler will register that cog address as $3FF, which is a LUT address for branching. Meanwhile, it's address $1FF of the LUT. I know this may not be any new information to you.
  • evanh Posts: 15,916
    cgracey wrote: »
    Meanwhile, it's address $1FF of the LUT.

    I had to mask it to get that in the source above. So I'm thinking there is a better way.

  • evanh Posts: 15,916
    Cool, I've just tried the 16-bit dither smartpin mode. Looks nice.
    evanh wrote: »
    cgracey wrote: »
    On the Prop123 FPGA board, cog0 DAC channels 3/2/1 go to the DACs for R/G/B. Note that these are just the cog DAC channels, not particular smart pins.
    Okay, just three DACs. And should reference the RGB labels instead of the socket numbers on the board.

    And to enable them I have to set the smartpin I/O config as if the related DAC was really in that I/O pad. Which allows them to be operated by the smartpins. Right?

    Oh, I see now, that's done intentionally for 24-bit RGB colour. The least significant 8 bits of a 32-bit longword aren't used, so the OUT0 DAC is also skipped to suit.

  • evanh Posts: 15,916
    16-bit dither smartpin mode is working, btw. So I assume you've had different ways to control the DACs with different FPGA images.

  • cgracey Posts: 14,155
    I've found a way to compress the Tukey:
    tap    value      3's   2's
    ---------------------------
    0      000001     +1    -1
    1      000011           +1
    2      000101           +1
    3      000111           +1
    4      001010     +1
    5      001101     +1
    6      010000     +1
    7      010011     +1
    8      010110     +1
    9      011001     +1
    10     011011           +1
    11     011101           +1
    12     011111           +1
    13     100000     +1    -1
    14     100000
    15     100000
    16     100000
    17     100000
    18     100000
    19     100000
    20     100000
    21     100000
    22     100000
    23     100000
    24     100000
    25     100000
    26     100000
    27     100000
    28     100000
    29     100000
    30     100000
    31     100000
    32     011111     -1    +1
    33     011101           -1
    34     011011           -1
    35     011001           -1
    36     010110     -1
    37     010011     -1
    38     010000     -1
    39     001101     -1
    40     001010     -1
    41     000111     -1
    42     000101           -1
    43     000011           -1
    44     000001           -1
    (45)   (000000)   -1    +1
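One way to read the 3's and 2's columns (an interpretation, checked arithmetically below): each row decomposes the tap-to-tap step w[i] - w[i-1] as 3·(3's entry) + 2·(2's entry), with tap 45 the implicit zero. The 45-tap weighted sum can then be kept as a running total updated through only 16 ±1 taps weighted by 3 and 16 ±1 taps weighted by 2. The Python sketch below verifies the decomposition and compares the running total against the direct 45-tap sum; the names are hypothetical.

```python
import random

# Tap values 0..44 from the table; tap 45 is the implicit zero.
TUKEY = ([1, 3, 5, 7, 10, 13, 16, 19, 22, 25, 27, 29, 31] + [32] * 19 +
         [31, 29, 27, 25, 22, 19, 16, 13, 10, 7, 5, 3, 1])

# Nonzero entries of the 3's and 2's columns, keyed by tap (45 = the zero tap).
THREES = {0: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 13: 1,
          32: -1, 36: -1, 37: -1, 38: -1, 39: -1, 40: -1, 41: -1, 45: -1}
TWOS = {0: -1, 1: 1, 2: 1, 3: 1, 10: 1, 11: 1, 12: 1, 13: -1,
        32: 1, 33: -1, 34: -1, 35: -1, 42: -1, 43: -1, 44: -1, 45: 1}

def step(i):
    """w[i] - w[i-1], with w[-1] = w[45] = 0."""
    cur = TUKEY[i] if i < len(TUKEY) else 0
    prev = TUKEY[i - 1] if i > 0 else 0
    return cur - prev

def tukey_direct(bits):
    """y[n] = sum of TUKEY[i] * bits[n - i] over the 45 taps."""
    return [sum(w * bits[n - i] for i, w in enumerate(TUKEY) if n >= i)
            for n in range(len(bits))]

def tukey_running(bits):
    """y[n] = y[n-1] + 3*(signed sum at 3's taps) + 2*(signed sum at 2's taps)."""
    y, out = 0, []
    for n in range(len(bits)):
        s3 = sum(c * bits[n - i] for i, c in THREES.items() if n >= i)
        s2 = sum(c * bits[n - i] for i, c in TWOS.items() if n >= i)
        y += 3 * s3 + 2 * s2
        out.append(y)
    return out

if __name__ == "__main__":
    print(all(3 * THREES.get(i, 0) + 2 * TWOS.get(i, 0) == step(i) for i in range(46)))
    random.seed(4)
    bits = [random.randint(0, 1) for _ in range(300)]
    print(tukey_running(bits) == tukey_direct(bits))   # True
```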
    

  • SaucySoliton Posts: 521
    edited 2018-12-09 17:20
    lonesock wrote: »
    That's very cool! For the second order, could you please try this code? (I don't have a board with an ADC on it atm.)
                  ' quadratic, 16 clocks per iteration, 4 groups of iterations
                  ' N1 = (sample_period - overhead) / 16 / 4
                  ' N2 = 2 * N1
                  ' N3 = N1
                  mov       accum, #0
            
    :accelloop1   add       frqa,accum
                  add       accum, #1
                  add       frqa,accum
                  djnz      N1, #:accelloop1
    
    :decelloop2   add       frqa,accum
                  sub       accum, #1
                  add       frqa,accum
                  djnz      N2, #:decelloop2
    
    :accelloop3   add       frqa,accum
                  add       accum, #1
                  add       frqa,accum
                  djnz      N3, #:accelloop3
    
                  mov       frqa, #0 
    
    thanks,
    Jonathan
        Triangle     Rectangle
    Mean   1164.5     1166.5
    Std       0.20081    1.51132
    relative_noise =  0.13287
    
        Quadratic     Rectangle
    Mean   1171.0     1173.3
    Std       0.16607    1.59741
    relative_noise =  0.10396
    N1=N3=64
    N2=128  This was to get a 12 bit result from the rectangular window.
    
    Breadboarding the ADC circuit can be problematic. In my previous tests it was intermittently oscillating at 40MHz. Removing the caps seems to produce better results, but the scope probe loads the circuit enough to affect it. In this test the improvement is less. But it's still an order of magnitude or 3 bits.
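The reported numbers can be turned into a bit count with the usual rule of thumb that halving the noise is worth one bit (log2 of the standard-deviation ratio). The figures below are copied from the posts in this thread; the helper names are hypothetical.

```python
import math

def relative_noise(std_window, std_rect):
    """Window std dev as a fraction of the rectangular-window std dev."""
    return std_window / std_rect

def bits_gained(std_window, std_rect):
    """Noise reduction expressed in bits: log2 of the std-dev ratio."""
    return math.log2(std_rect / std_window)

# Reported standard deviations: (shaped window, rectangular window)
RUNS = {
    "triangle, first test": (0.13057, 4.64507),
    "triangle, second test": (0.20081, 1.51132),
    "quadratic": (0.16607, 1.59741),
}

if __name__ == "__main__":
    for name, (w, r) in RUNS.items():
        print(f"{name}: relative noise {relative_noise(w, r):.5f}, "
              f"about {bits_gained(w, r):.2f} bits")
```

That gives a little over 5 bits for the first triangle test and roughly 3 bits for the later triangle and quadratic runs, consistent with the "order of magnitude or 3 bits" estimate.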