ADC Sampling Breakthrough - Page 31 — Parallax Forums

• cgracey wrote: »
I've found a way to compress the Tukey:
```
tap	value		3's	2's
------------------------------------------
0	000001		+1	-1
1	000011			+1
2	000101			+1
3	000111			+1
4	001010		+1
5	001101		+1
6	010000		+1
7	010011		+1
8	010110		+1
9	011001		+1
10	011011			+1
11	011101			+1
12	011111			+1
13	100000		+1	-1
14	100000
15	100000
16	100000
17	100000
18	100000
19	100000
20	100000
21	100000
22	100000
23	100000
24	100000
25	100000
26	100000
27	100000
28	100000
29	100000
30	100000
31	100000
32	011111		-1	+1
33	011101			-1
34	011011			-1
35	011001			-1
36	010110		-1
37	010011		-1
38	010000		-1
39	001101		-1
40	001010		-1
41	000111		-1
42	000101			-1
43	000011			-1
44	000001			-1
(000000)		-1	+1
```
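The "3's" and "2's" columns can be read as per-tap increments. On that reading, each tap value is 3 times the running 3's count plus 2 times the running 2's count. The minimal Python sketch below rebuilds the value column from those increments. The dict names are illustrative and do not come from the thread.

```python
# Per-tap increments copied from the table above ("3's" and "2's" columns).
# Tap 45 stands for the "(000000)" row just past the end of the window.
THREES = {0: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 13: 1,
          32: -1, 36: -1, 37: -1, 38: -1, 39: -1, 40: -1, 41: -1, 45: -1}
TWOS = {0: -1, 1: 1, 2: 1, 3: 1, 10: 1, 11: 1, 12: 1, 13: -1,
        32: 1, 33: -1, 34: -1, 35: -1, 42: -1, 43: -1, 44: -1, 45: 1}


def tap_values(n_taps=45):
    """Rebuild each tap value as 3*(running 3's count) + 2*(running 2's count)."""
    threes = twos = 0
    values = []
    for tap in range(n_taps + 1):
        threes += THREES.get(tap, 0)
        twos += TWOS.get(tap, 0)
        values.append(3 * threes + 2 * twos)
    return values


values = tap_values()
print(values[:14])   # -> [1, 3, 5, 7, 10, 13, 16, 19, 22, 25, 27, 29, 31, 32]
print(sum(values))   # -> 1024
```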

Has this been tested? I want the Tukey logic to fit, but I wonder whether it will be too big even if we could reduce it by half.
• As I'm busy, I did not follow this development. It would be nice to have a short abstract of what is going on with this filter mechanism:

Every clock a 0/1 bit is generated. This bit goes to a filter. The filter creates a new output value every clock? Is that correct?
• ErNa wrote: »
As I'm busy, I did not follow this development. It would be nice to have a short abstract of what is going on with this filter mechanism:

Every clock a 0/1 bit is generated. This bit goes to a filter. The filter creates a new output value every clock? Is that correct?

A 45-sample sliding window is used to generate an 8-bit output on every clock:

* A 45-bit register is shifted by one bit: an ADC bit is shifted into bit 0, bit 0 is shifted to bit 1, ..., bit 44 is shifted out and discarded.
* Each of the 45 bits is multiplied by one of 45 taps containing fixed Tukey window values
* These multiplied values are added together, then divided by 4 to give an 8-bit output (a full-scale total of 256 is clamped to 255)
* The above logic is replicated four times, so that any four pins produce a combined 32-bit output
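The per-pin logic described above can be modelled in a few lines of Python. This minimal sketch assumes the 45 tap values from the earlier table. It also assumes that pin 0 lands in the low byte of the combined 32-bit word, because the thread does not say which byte order the hardware uses. `TukeyPinFilter` and `pack_four` are hypothetical names.

```python
# Hypothetical model of the 45-tap Tukey window described above.
RISE = [1, 3, 5, 7, 10, 13, 16, 19, 22, 25, 27, 29, 31]
TAPS = RISE + [32] * 19 + RISE[::-1]        # 45 taps, summing to 1024
MASK = (1 << len(TAPS)) - 1


class TukeyPinFilter:
    """One pin: a 45-bit shift register correlated with fixed Tukey taps."""

    def __init__(self):
        self.reg = 0  # bit k holds the ADC bit that arrived k clocks ago

    def clock(self, adc_bit):
        # Shift the new ADC bit into bit 0; bit 44 falls off the top.
        self.reg = ((self.reg << 1) | (adc_bit & 1)) & MASK
        # "Multiply" each bit by its tap: add the tap wherever the bit is 1.
        total = sum(tap for k, tap in enumerate(TAPS) if (self.reg >> k) & 1)
        # Divide by 4; the all-ones total of 1024 gives 256, clamped to 255.
        return min(total >> 2, 255)


def pack_four(samples):
    """Combine four 8-bit samples into one 32-bit word (pin 0 in the low byte: an assumption)."""
    word = 0
    for i, s in enumerate(samples):
        word |= (s & 0xFF) << (8 * i)
    return word


pin = TukeyPinFilter()
outputs = [pin.clock(1) for _ in range(45)]
print(outputs[-1])                                # -> 255
print(hex(pack_four([0x11, 0x22, 0x33, 0x44])))   # -> 0x44332211
```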
• Do the 45 taps represent a set of n-bit registers containing a rising slope, full scale, and a falling slope (rising/falling according to the Tukey definition)?
Is multiplying these register values by 0 or 1 and adding them just "integrating" the register values?
Does the "32-bit" output mean 4 independent 8-bit values? (Adding four 8-bit values would give a 10-bit value.)
As 45 taps means 45 counts in the worst case, is full scale = 256 / 45 ≈ 5?
• ErNa wrote: »
Do the 45 taps represent a set of n-bit registers containing a rising slope, full scale, and a falling slope (rising/falling according to the Tukey definition)?
Is multiplying these register values by 0 or 1 and adding them just "integrating" the register values?
Does the "32-bit" output mean 4 independent 8-bit values? (Adding four 8-bit values would give a 10-bit value.)
As 45 taps means 45 counts in the worst case, is full scale = 256 / 45 ≈ 5?

Yes.
Yes, maybe.
Yes.
No, worst-case is all 45 bits set, total = 1024, divided by 4 = 256, then adjusted to be 255.
• Breadboarding the ADC circuit can be problematic. In my previous tests it was intermittently oscillating at 40 MHz. Removing the caps seems to produce better results, but the scope probe loads the circuit enough to affect it. In this test the improvement is smaller, but it's still an order of magnitude, or about 3 bits.

I can run tests with my AD7400 eval board. See https://forums.parallax.com/discussion/comment/1456465/#Comment_1456465 I've actually just finished installing PropellerIDE and test-running that earlier code. I've got numbers scrolling up the terminal.

• TonyB_ wrote: »
ErNa wrote: »
Do the 45 taps represent a set of n-bit registers containing a rising slope, full scale, and a falling slope (rising/falling according to the Tukey definition)?
Is multiplying these register values by 0 or 1 and adding them just "integrating" the register values?
Does the "32-bit" output mean 4 independent 8-bit values? (Adding four 8-bit values would give a 10-bit value.)
As 45 taps means 45 counts in the worst case, is full scale = 256 / 45 ≈ 5?

Yes.
Yes, maybe.
Yes.
No, worst-case is all 45 bits set, total = 1024, divided by 4 = 256, then adjusted to be 255.

Multiplying these register values by 0 or 1 and adding them is just "integrating" the register values where the corresponding bits are set.

So, this is just a correlation of the bit stream with a well-shaped pulse.
I'm in a hurry now, so I'll come back later to understand what total = 1024 means. I just thought that 45 bits in the bitstream can count up to 45... Maybe the Tukey coefficients sum to 1024, hence the /4 to fit into 8 bits.
But why 45 taps?
• edited 2018-12-10 05:52
ErNa wrote: »
Multiplying these register values by 0 or 1 and adding them is just "integrating" the register values where the corresponding bits are set.

So, this is just a correlation of the bit stream with a well-shaped pulse.
I'm in a hurry now, so I'll come back later to understand what total = 1024 means. I just thought that 45 bits in the bitstream can count up to 45... Maybe the Tukey coefficients sum to 1024, hence the /4 to fit into 8 bits.
But why 45 taps?

45 taps seems sufficient for an 8-bit sample.

A total of 1024 >> 2 produces 256, which gets clamped to 255 to make an 8-bit sample. 1024 results if all taps are holding 1's.

I just made a little "RFO BASIC!" program to prove the new method:
```
dim t[46]

for x = 1 to 46
t[x] = 0
next x

th = 0
tw = 0

for y = 1 to 46*2

t[1] = 1
if y>46 then t[1] = 0

if t[1] = 1 then th = th + 1
if t[5] = 1 then th = th + 1
if t[6] = 1 then th = th + 1
if t[7] = 1 then th = th + 1
if t[8] = 1 then th = th + 1
if t[9] = 1 then th = th + 1
if t[10] = 1 then th = th + 1
if t[14] = 1 then th = th + 1
if t[33] = 1 then th = th - 1
if t[37] = 1 then th = th - 1
if t[38] = 1 then th = th - 1
if t[39] = 1 then th = th - 1
if t[40] = 1 then th = th - 1
if t[41] = 1 then th = th - 1
if t[42] = 1 then th = th - 1
if t[46] = 1 then th = th - 1

if t[1] = 1 then tw = tw - 1
if t[2] = 1 then tw = tw + 1
if t[3] = 1 then tw = tw + 1
if t[4] = 1 then tw = tw + 1
if t[11] = 1 then tw = tw + 1
if t[12] = 1 then tw = tw + 1
if t[13] = 1 then tw = tw + 1
if t[14] = 1 then tw = tw - 1
if t[33] = 1 then tw = tw + 1
if t[34] = 1 then tw = tw - 1
if t[35] = 1 then tw = tw - 1
if t[36] = 1 then tw = tw - 1
if t[43] = 1 then tw = tw - 1
if t[44] = 1 then tw = tw - 1
if t[45] = 1 then tw = tw - 1
if t[46] = 1 then tw = tw + 1

for x = 46 to 2 step -1
t[x] = t[x-1]
next x

sum = th*3 + tw*2
sample = int(sum/4)
if sample = 256 then sample = 255
print th,tw,sum,sample

next y
```

It maintains a running total by tracking 9-bit "twos" and "threes" terms, each updated by summing 16 tap bits. The old way had to sum over twice that many bits, so this should be close to half the previous logic size.
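The next block is a 0-based Python rendering of the running-total idea, checked against the direct 45-tap correlation. It assumes the tap increments from the table at the top of the page. `running_filter` and `direct_filter` are illustrative names. The sketch models the arithmetic only, not the smart pin logic.

```python
import random

# Per-tap increments of the "threes" and "twos" counts (from the table above).
# Tap 45 is the position just past the window, where both counts return to 0.
THREES = {0: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 13: 1,
          32: -1, 36: -1, 37: -1, 38: -1, 39: -1, 40: -1, 41: -1, 45: -1}
TWOS = {0: -1, 1: 1, 2: 1, 3: 1, 10: 1, 11: 1, 12: 1, 13: -1,
        32: 1, 33: -1, 34: -1, 35: -1, 42: -1, 43: -1, 44: -1, 45: 1}

RISE = [1, 3, 5, 7, 10, 13, 16, 19, 22, 25, 27, 29, 31]
TAPS = RISE + [32] * 19 + RISE[::-1]   # 45 taps, sum 1024


def running_filter(bits):
    hist = [0] * 46          # hist[k] = bit that entered k clocks ago (taps 0..45)
    th = tw = 0
    out = []
    for b in bits:
        hist = [b] + hist[:-1]
        # Each term only looks at the 16 tap bits where its count steps.
        th += sum(step * hist[k] for k, step in THREES.items())
        tw += sum(step * hist[k] for k, step in TWOS.items())
        out.append((th, tw, min((3 * th + 2 * tw) >> 2, 255)))
    return out


def direct_filter(bits):
    """Reference: full 45-tap correlation recomputed from scratch every clock."""
    hist = [0] * 45
    out = []
    for b in bits:
        hist = [b] + hist[:-1]
        out.append(min(sum(h * t for h, t in zip(hist, TAPS)) >> 2, 255))
    return out


random.seed(1)
bits = [random.randint(0, 1) for _ in range(2000)]
print([s for _, _, s in running_filter(bits)] == direct_filter(bits))  # -> True
print(running_filter([1] * 46)[-1])                                    # -> (256, 128, 255)
```

With all history bits set, `th` reaches 256 and `tw` reaches 128. Both values fit the 9-bit terms mentioned above, and 3*256 + 2*128 = 1024.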
• I've been focusing my analysis on filters constructed from moving averages. Why? Because moving average filters can be implemented cheaply. And because various shapes like a triangle, trapezoid, and Gaussian can be created by running the signals through repeated moving average filters. The sincN family uses all the same length averages, but interesting stuff happens when the lengths are not the same. I'll call them Cascaded Moving Average filters.

First stage averages 31 bits.
Second stage averages 11 numbers from the first stage.
Third stage averages 3 numbers from the second stage.

Convolution is sometimes expressed with the * symbol. We could describe this filter as CMA31*11*3.

I think it could be implemented as follows:
First stage: 31 register bits, 2-bit delta output. The full sum would be 5 bits, but the output only changes by 1 count each clock.
Second stage: 2x11 register bits, 3-bit delta output. The full sum output of this stage can go up or down by 2 counts or less, depending on the input.
Third stage: 3x3 register bits, 4-bit delta output.
Follow it all with 3 accumulators. These compensate for the diffs we added after each sum to reduce the amount of memory needed.

It might be better to just sum the length 3 term directly.

It would lend itself well to decimation by 3, but it's too late for me to figure out how tonight.
```
WindowFunction  UnquantizedVpp98%  UnquantizedStdDev  HFNoisePower  HighBitCount
tukey17_13_32         1.82812         0.427648         0.228645           95
cma31x11x3            1.65625         0.411339         0.347341           96
cma20x17x3            2.109375        0.5095961        0.1255628         104
```
The performance should be similar, but cma31x11x3 is 2 taps shorter.
```
  1    3    6    9   12   15   18   21   24   27   30   32
33   33   33   33   33   33   33   33   33   33   33   33   33   33   33   33   33   33   33
32   30   27   24   21   18   15   12    9    6    3    1
```
The sum is 1023. It turns out that the sum of the taps is the product of the lengths. So if you want to construct a filter with a certain sum, factor it and decide how to group the factors together.
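The short Python sketch below checks two claims. The first is that cascading unit moving sums of lengths 31, 11 and 3 yields the 43-tap table above. The second is that the tap sum equals the product of the lengths. The running-sum version is a generic add-newest/drop-oldest structure, not necessarily the delta-and-accumulator arrangement proposed above. The function names are illustrative.

```python
def convolve(a, b):
    """Plain discrete convolution of two integer lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def cma_taps(*lengths):
    """Impulse response of cascaded moving sums (unit boxcars) of the given lengths."""
    taps = [1]
    for n in lengths:
        taps = convolve(taps, [1] * n)
    return taps


def cma_filter(bits, *lengths):
    """Same filter as cascaded running sums: add the newest input, drop the one leaving."""
    xs = list(bits)
    for n in lengths:
        acc, ys = 0, []
        for i, x in enumerate(xs):
            acc += x - (xs[i - n] if i >= n else 0)
            ys.append(acc)
        xs = ys
    return xs


taps = cma_taps(31, 11, 3)
print(len(taps), sum(taps))   # -> 43 1023
print(taps[:13])              # -> [1, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 32, 33]
```

The same rule gives 32 * 8 * 4 = 1024 for lengths 32, 8 and 4 (a 42-tap window).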
• edited 2018-12-10 13:56
evanh wrote: »
I've got some fast action going on with this AD7400. Even though the ADC is only operating at 10 Mbps, I'm making use of the 80 MHz sysclock to boost the existing sinc1 to a sinc3 emulation in a tight 100 ns loop.
```
'==================================
' Sinc3 filter (cogexec in cog #1)
'==================================
ORG
start_sinc3
cogid   cid
wrpin   #%00_01111_0, #tpin   'set adc/counter mode
wypin   #0, #tpin             'inc on high
wxpin   #0, #tpin             'totaliser
dirh    #tpin                 'enable smart pin

'Sinc3 loop (8 sysclocks)
rep     @.lend, #0            'loop forever

rdpin   acc1, #tpin           '1st integrator: the smart pin's running count of highs
add     acc2, acc1            '2nd integrator
add     acc3, acc2            '3rd integrator
wrlut   acc3, #(mailbox & $1ff)   'for the decimator (lut sharing is active)
.lend
cogstop cid

acc1		long    0
acc2		long    0
acc3		long    0
cid		long    0

ORG $3ff
mailbox         long    0

```

Ah, after some solid testing, I've discovered that the decimator has to update on a multiple of 8 clocks, i.e., in sync with the accumulator update rate. Otherwise some sort of shadow phase rotation occurs.

• edited 2018-12-10 14:15
And if the signal is small, over-driving the accumulators by extending the decimation period beyond the calculated capacity limit works too.

• Saucy, I've been thinking about the same things over the last 24 hours. I realized I could do a double-integrating filter which is even cheaper than the one I last proposed, but then I realized the ultimate would be to go more natural: forget about the concept of specific taps and have some kind of integrating filter that just winds up being what you need.
• edited 2018-12-10 14:55
I'm not in the office, so I cannot verify what I say here.
Way back, I applied these filters:
ErNa wrote: »
The smart pin can create a new filtered value every clock (if I'm right), the way I showed in the Excel sheets. It is then up to the Propeller software how often the smart pin is read. It would be nice if someone could test with MATLAB or similar what the filter curve of such a cascaded averager looks like. I think this filter can be realized with very little hardware. The first graph shows the bitstream, values 0-1;
the second, the sum of 2 neighbours, values 0-2;
the third, the sum of 2 neighbours of the second, values 0-4, 3-bit resolution;
and so on. The second-to-last graph has 8-bit resolution, still at full data rate. We now clearly see a structure that allows a special type of filter to be applied; finally we see a nice structure in the bitstream data.
No, I do not create bits from nothing. The point is that in the first graph the signal changes abruptly between 0 and 1. In the filtered signals the signal cannot change by more than one level, so the slope is limited, and so is the variety of possible signals. That's the point. The filter converts signal information in time (streamed data) to signal information in amplitude. As you cannot read the streamer at full rate, it should be useful to read as many bits as fill the sampling period of the Propeller program loop.
The individual steps show how the signal evolves. There is no need to do this; in the end, it's just about knowing how many bits are set in a shift register of arbitrary length.

The last signal shows that there is structure in the noise. Since we know the signal being measured is band-limited (it makes no sense to try to measure 20 MHz with a 100 MHz clock and 10-bit resolution), we know that all the variations we see ARE ERRORS, and we can just clip them by limiting the slope.

It would be nice to have an actual signal in the bitstream, e.g. 2 MHz at 10% FS amplitude.

Remark: the adders only have to add 1 bit in the first stage, 2 bits in the second, etc., as the results are limited. I will do another simulation later to simplify the hardware BOM.
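Here is a minimal Python sketch of the repeated "sum of 2 neighbours" stages. It assumes each stage adds a sample to its immediate predecessor. After k stages the values span 0 to 2^k, so each stage adds roughly one bit of range, and the overall weights are binomial coefficients. `neighbour_sum` and `cascade` are hypothetical names.

```python
def neighbour_sum(xs):
    """One stage: each output is the sum of a sample and its predecessor."""
    prev, out = 0, []
    for x in xs:
        out.append(x + prev)
        prev = x
    return out


def cascade(bits, stages):
    """Apply `stages` neighbour-sum stages in a row."""
    xs = list(bits)
    for _ in range(stages):
        xs = neighbour_sum(xs)
    return xs


impulse = [1] + [0] * 8
print(cascade(impulse, 4)[:5])     # -> [1, 4, 6, 4, 1] (binomial weights)
print(cascade([1] * 20, 8)[-1])    # -> 256 (full scale after 8 stages)
```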
• edited 2018-12-10 15:21
ErNa,
I've got this AD7400 generating a noisy 50 Hz from a dangling wire. The analogue-filtered bitstream is about 60 mVp-p on a 3.0 V regulator, so that's maybe 2%. There is also notable 60 kHz carrier-like fuzz in amongst it, probably from a switch-mode supply.

I'll see how much of it I can record ...

• ErNa

I suppose that, in the above example, you used a bitstream that was meant to contain only "01010101" or "10101010" sequences, i.e. the expression of a "plateau"-like measurement: a steady voltage level that, were it not for the noise affecting its digital expression, would stay uniform, almost undisturbed, for a long time.

To ease any analysis, you could start the above sums at the first pair of samples that diverges from the "10" or "01" cadence, discarding the earlier ones.

You could freely establish a minimum sequence, such as "10", "1010" (or their inverses), or any other you feel is appropriate, as a "steadiness" delimiter, provided it can be repeatedly found within the bitstream.

IMHO, this could ease finding the center peak positions, help establish relations with interference such as 50 or 60 Hz, and perhaps allow a better observation of 1/f noise.
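The suggested trimming can be sketched in a few lines of Python. The sketch assumes that "pairs" means overlapping neighbour pairs. `first_cadence_break` is a hypothetical helper.

```python
def first_cadence_break(bits):
    """Index of the first neighbour pair that is not "10" or "01", or None."""
    for i in range(len(bits) - 1):
        if bits[i] == bits[i + 1]:
            return i
    return None


stream = [1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
start = first_cadence_break(stream)
print(start, stream[start:])   # -> 4 [1, 1, 0, 1, 0, 0, 1]
```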

Hope it could help

Henrique
• evanh

If you are also looking for 1/f noise interfering with your measurements, IMHO it would be better to record the bitstreams for at least 10 seconds, or even longer.

It's not unusual for that kind of noise to show observable peaks around 0.1 Hz, so it's advisable to extend the sampling period to catch those points too.

This would also serve the efforts being made by ErNa.

Henrique
• evanh wrote: »
Ah, after some solid testing, I've discovered that the decimator has to update on a multiple of 8 clocks. ie: In sync with the accumulator update rate. Otherwise there is some sort of shadow phase rotations occurring.
The gain is affected by decimation rate. So generally you want a fixed decimation rate to keep the gain constant.
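The effect of the decimation rate on gain can be shown with a textbook sinc3 (three-stage CIC) decimator in Python. This is a generic sketch, not evanh's PASM code, and `sinc3_decimate` is an illustrative name. With an all-ones input, the settled output is `rate**3`, so changing the rate changes the gain.

```python
def sinc3_decimate(bits, rate):
    """Textbook sinc3 (CIC, 3 stages, differential delay 1) decimating by `rate`."""
    i1 = i2 = i3 = 0          # integrators, updated on every input clock
    d1 = d2 = d3 = 0          # comb delay registers, updated once per output
    out = []
    for n, b in enumerate(bits, start=1):
        i1 += b
        i2 += i1
        i3 += i2
        if n % rate == 0:     # the decimator reads only on whole multiples of `rate`
            c1 = i3 - d1
            d1 = i3
            c2 = c1 - d2
            d2 = c1
            c3 = c2 - d3
            d3 = c2
            out.append(c3)
    return out


for rate in (8, 16):
    print(rate, sinc3_decimate([1] * (10 * rate), rate)[-1])   # -> 8 512, then 16 4096
```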
• Hmm, I don't have a way of synchronising to the AD7400's native 10 MHz bitstream clock. So at 80 Msps we're talking 10e6 bytes/s. Even at 10 Mbps that's still over 1 MB/s. Might have some trouble capturing 10 unbroken seconds using just hubRAM.

• edited 2018-12-10 17:33
The gain is affected by decimation rate. So generally you want a fixed decimation rate to keep the gain constant.

Oh, the software isn't just wandering on that parameter. I have it adjustable for diagnostics. As a result I learnt that I couldn't just use any old value for the rate.

It's not something of concern in the real hardware because that's updating the accumulators on every clock. Then there's no integer that isn't a multiple.

• edited 2018-12-10 18:00
As far as I know, a streamer can't lock onto an arbitrary external clock source of any speed. Its principle is for the whole Propeller to be locked to the same master clock as the external devices.

I guess the real silicon could be driven by the AD7400's clock.

With that in mind, the most efficient config for the FPGA would be a little over the 20 MHz Nyquist: 80 MHz / 3 = 26.67 MHz.

• evanh

I don't know if you can use them in some way, but there are two blog entries related to bitstream packing, whose full code is available on GitHub under the MIT license.

Perhaps one or even two cogs could run one of them in real time, to diminish the memory requirements for long-term sampling experiments.

Hope it helps

kinematicsoup.com/news/2016/9/6/data-compression-bit-packing-101

kinematicsoup.com/news/2017/8/17/data-compression-crushing-data-using-entropy
• edited 2018-12-10 18:12
Thanks, I'll look those up.

I've also remembered there is a 32 MB SDRAM on the P123 FPGA board. Chip will need to explain how to access it.
• There is some SDRAM code from Chip or Ariba from P2-hot I could post if useful. I don't think it would take too much to translate it.
• The recent Prop2 FPGA compiles didn't involve the SDRAM.

Right now, things are in flux with the FPGA code, but I could have a version in a week, or two.
• Here's the code from P2Hot fwiw.
• edited 2018-12-10 23:56
The performance should be similar, but cma31x11x3 is 2 taps shorter.
```
  1    3    6    9   12   15   18   21   24   27   30   32
33   33   33   33   33   33   33   33   33   33   33   33   33   33   33   33   33   33   33
32   30   27   24   21   18   15   12    9    6    3    1
```
The sum is 1023. It turns out that the sum of the taps is the product of the lengths. So if you want to construct a filter with a certain sum, factor it and decide how to group the factors together.

How do you derive your new 43-tap table from 31x11x3? What logic savings are we looking at with three cascaded stages?

A total of 1024 would not be a problem, and 1024 = 32x8x4. If the final output is > 255, then set it to 255.

Why are the sidelobes so high on the right-hand side of the frequency response?
Why does your impulse response not start and end at zero?

The centre notch can be removed now because overflow is easy to correct, as already mentioned.

• edited 2018-12-10 23:42
Here is the new triple-integrating Tukey 17/32 that's in the smart pins. It reduced each smart pin by 35 ALMs, which is good.

This is the concept:
```
tap	value	dv1	dv2
---------------------------
0	0	0	+1
1	1	+1	+1
2	3	+2
3	5	+2
4	7	+2	+1
5	10	+3
6	13	+3
7	16	+3
8	19	+3
9	22	+3
10	25	+3	-1
11	27	+2
12	29	+2
13	31	+2	-1
14	32	+1	-1
15	32	0
16	32	0
17	32	0
18	32	0
19	32	0
20	32	0
21	32	0
22	32	0
23	32	0
24	32	0
25	32	0
26	32	0
27	32	0
28	32	0
29	32	0
30	32	0
31	32	0
32	32	0	-1
33	31	-1	-1
34	29	-2
35	27	-2
36	25	-2	-1
37	22	-3
38	19	-3
39	16	-3
40	13	-3
41	10	-3
42	7	-3	+1
43	5	-2
44	3	-2
45	1	-2	+1
46	0	-1	+1
47	0	0
```

Here is the business part of the code:
And here's a "SmallBASIC" program that proves the concept:
```
dim t(47)

for x = 0 to 47
t(x) = 0
next x

inta = 0
intb = 0
intc = 0

for y = 0 to 46*2

t(00) = 1
if y>45 then t(00) = 0

delta = 0

if (t(00) = 1 and t(01) = 0) then delta = delta + 1
if (t(00) = 0 and t(01) = 1) then delta = delta - 1

if (t(01) = 1 and t(02) = 0) then delta = delta + 1
if (t(01) = 0 and t(02) = 1) then delta = delta - 1

if (t(04) = 1 and t(05) = 0) then delta = delta + 1
if (t(04) = 0 and t(05) = 1) then delta = delta - 1

if (t(10) = 1 and t(11) = 0) then delta = delta - 1
if (t(10) = 0 and t(11) = 1) then delta = delta + 1

if (t(13) = 1 and t(14) = 0) then delta = delta - 1
if (t(13) = 0 and t(14) = 1) then delta = delta + 1

if (t(14) = 1 and t(15) = 0) then delta = delta - 1
if (t(14) = 0 and t(15) = 1) then delta = delta + 1

if (t(32) = 1 and t(33) = 0) then delta = delta - 1
if (t(32) = 0 and t(33) = 1) then delta = delta + 1

if (t(33) = 1 and t(34) = 0) then delta = delta - 1
if (t(33) = 0 and t(34) = 1) then delta = delta + 1

if (t(36) = 1 and t(37) = 0) then delta = delta - 1
if (t(36) = 0 and t(37) = 1) then delta = delta + 1

if (t(42) = 1 and t(43) = 0) then delta = delta + 1
if (t(42) = 0 and t(43) = 1) then delta = delta - 1

if (t(45) = 1 and t(46) = 0) then delta = delta + 1
if (t(45) = 0 and t(46) = 1) then delta = delta - 1

if (t(46) = 1 and t(47) = 0) then delta = delta + 1
if (t(46) = 0 and t(47) = 1) then delta = delta - 1

for x = 47 to 1 step -1
t(x) = t(x-1)
next x

't(32) = t(16)
't(16) = 0

inta = inta + delta
intb = intb + inta
intc = intc + intb

sample = int(intc/4)
if sample = 256 then sample = 255
print inta,intb,intc,sample

next y
```
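The Python cross-check below runs the edge-based triple integrator against the direct 45-tap correlation, assuming the same Tukey 17/32 tap values. `edge_terms` derives the dv2 column from the taps instead of hard-coding it. The names are illustrative, and the outputs are raw sums (0 to 1024) before the /4 and clamp.

```python
import random

# 45-tap Tukey 17/32 window (same values as the earlier table).
RISE = [1, 3, 5, 7, 10, 13, 16, 19, 22, 25, 27, 29, 31]
TAPS = RISE + [32] * 19 + RISE[::-1]


def edge_terms(taps):
    """(position, dv2) pairs where the slope of the 'value' column changes."""
    v = [0] + list(taps) + [0, 0]          # value column: tap 0 = 0, as in the table
    terms = []
    for p in range(len(v) - 1):
        dv2 = v[p + 1] - 2 * v[p] + (v[p - 1] if p > 0 else 0)
        if dv2:
            terms.append((p, dv2))
    return terms


def triple_integrating_filter(bits, taps=TAPS):
    terms = edge_terms(taps)
    hist = [0] * (len(taps) + 3)             # t(0)..t(47) in the BASIC program
    inta = intb = intc = 0
    out = []
    for b in bits:
        hist = [b] + hist[:-1]
        # Each term adds dv2 * (t(p) - t(p+1)), like the paired IF lines in BASIC.
        delta = sum(w * (hist[p] - hist[p + 1]) for p, w in terms)
        inta += delta
        intb += inta
        intc += intb
        out.append(intc)
    return out


def direct_filter(bits, taps=TAPS):
    """Reference: full correlation recomputed every clock."""
    hist = [0] * len(taps)
    out = []
    for b in bits:
        hist = [b] + hist[:-1]
        out.append(sum(h * t for h, t in zip(hist, taps)))
    return out


random.seed(7)
bits = [random.randint(0, 1) for _ in range(3000)]
print([p for p, _ in edge_terms(TAPS)])   # -> [0, 1, 4, 10, 13, 14, 32, 33, 36, 42, 45, 46]
print(triple_integrating_filter(bits) == direct_filter(bits))   # -> True
```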
• cgracey wrote: »
Here is the new triple-integrating Tukey 17/32 that's in the smart pins. It reduced each smart pin by 35 ALMs, which is good.

That is good. Does it work with real bitstreams?
• edited 2018-12-11 00:55
TonyB_ wrote: »
That is good. Does it work with real bitstreams?

Oh, yeah. It's just like the others and produces exactly the same output. It takes a few more taps, though. I went to bed early to get back on a normal schedule, but I just thought about this all night until I figured it out. Getting the sample value to go back down was where I realized I needed to be looking at edges in the data, not just states.

You can see that the idea could be applied to any tap length. For every slope change, you need a set of delta terms. A trapezoidal window would take only four sets. We need twelve for the Tukey 17/32 (six slope changes going up, six going down).
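A tiny Python helper can count the delta sets a window needs, i.e. the positions where its slope changes (a non-zero second difference). The trapezoid here is a made-up example for comparison, and `delta_sets` is a hypothetical name.

```python
RISE = [1, 3, 5, 7, 10, 13, 16, 19, 22, 25, 27, 29, 31]
TUKEY = RISE + [32] * 19 + RISE[::-1]   # the 45-tap Tukey 17/32 window
TRAPEZOID = [1, 2, 3, 3, 3, 2, 1]       # hypothetical small trapezoid


def delta_sets(taps):
    """Positions where the window's slope changes; each needs one set of edge terms."""
    v = [0] + list(taps) + [0, 0]
    return [p for p in range(len(v) - 1)
            if v[p + 1] - 2 * v[p] + (v[p - 1] if p > 0 else 0) != 0]


print(len(delta_sets(TUKEY)))       # -> 12 (six slope changes up, six down)
print(len(delta_sets(TRAPEZOID)))   # -> 4
```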
• edited 2018-12-11 02:56
Tubular wrote: »
Here's the code from P2Hot fwiw.

Ah, I see. It used a chunk (22 pins) of the then portC for address and ctrl, plus 16 pins of portB for data. That's a hog!

EDIT: And that doesn't include the DRAM clock pin.