ADC Sampling Breakthrough

cgracey · 2018-12-04 02:09

Phil Pilgrim (PhiPi) wrote: »

Chip, please don't take my lead as an endorsement of this entire thread. But if it works and simplifies what you were going to go ahead and do anyway, I guess I'd be in favor of it.

-Phil

I know, Phil.

I've been studying that code and I kind of get it, but what I don't understand is how would you go from one bit in to something like 8 bits out? You must allow for some bit-length expansion along the way, right? Or, input $FF for 1 and $00 for 0?

Anything to reduce the logic in these 4-per-cog ADC channels would be welcome.

cgracey · 2018-12-04 02:16

TonyB_, thanks for trying out that idea of doubling acc1 before doing acc2 and acc3. It looks to me like it's maybe kind of a sticky compromise. What's your take?

jmg · 2018-12-04 02:47

cgracey wrote: »

TonyB_, thanks for trying out that idea of doubling acc1 before doing acc2 and acc3. It looks to me like it's maybe kind of a sticky compromise. What's your take?

It appears the non-doubling version (standard Sinc3) is the ideal, as those deviations are significant, so it comes down to logic cost.
With the P2 adders, how many gates/LUTs do they need relative to a ripple adder, or a MUX and a flip-flop?

SaucySoliton · 2018-12-04 06:22

I did a quick test on the P1. Breadboarded on my Activity Board with 220pF caps instead of 1nF. Input was grounded.

           Triangle    Rectangle
Mean      1143.3       1153.9
Std Dev      0.13057      4.64507

Triangular window has a standard deviation 2.8% of the rectangular window.

Have we been doing ADC wrong for 12 years? I'm probably going to build a second or third order modulator next.
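For anyone who wants to see the effect without a breadboard, here is a minimal Python sketch. It is a model, not the measurement above: it assumes an ideal first-order modulator with a constant input of 3141/10000 of full scale, a 256-bit rectangular window, and a triangular window built from two cascaded 128-bit boxcars. All function names and parameters are illustrative.

```python
from statistics import pstdev

def modulator_bits(p, q, n):
    """Ideal first-order sigma-delta bitstream for a DC input of p/q of full scale."""
    acc, bits = 0, []
    for _ in range(n):
        acc += p
        if acc >= q:
            acc -= q
            bits.append(1)
        else:
            bits.append(0)
    return bits

def boxcar(samples, length):
    """Sliding sum over `length` consecutive samples (rectangular window)."""
    total = sum(samples[:length])
    out = [total]
    for i in range(length, len(samples)):
        total += samples[i] - samples[i - length]
        out.append(total)
    return out

bits = modulator_bits(3141, 10000, 20000)

# Rectangular window: one 256-bit boxcar, normalised to 0..1.
rect = [v / 256 for v in boxcar(bits, 256)]

# Triangular window: two cascaded 128-bit boxcars (255 taps), normalised to 0..1.
tri = [v / (128 * 128) for v in boxcar(boxcar(bits, 128), 128)]

std_rect = pstdev(rect)
std_tri = pstdev(tri)
print(f"rectangle: std dev {std_rect:.6f}")
print(f"triangle:  std dev {std_tri:.6f}")
```

In this idealised model there is no analog noise at all, so the spread comes purely from where the window edges fall in the bit pattern. The triangular window's tapered edges are what reduce it.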

cgracey · 2018-12-04 08:10

SaucySoliton wrote: »

I did a quick test on the P1. Breadboarded on my Activity Board with 220pF caps instead of 1nF. Input was grounded.

           Triangle    Rectangle
Mean      1143.3       1153.9
Std Dev      0.13057      4.64507

Triangular window has a standard deviation 2.8% of the rectangular window.

Have we been doing ADC wrong for 12 years? I'm probably going to build a second or third order modulator next.

We've been doing it wrong.

I finally tested out the Sinc3 smart pin mode and it works really well.

It seems to me that a 2nd-order modulator would be very sloppy about tracking the input. I want to see if it works for you.

cgracey · 2018-12-04 08:19

Here's the test code that runs on the FPGA with the SINC3 smart pin and two real I/O pins on the pad ring test chip:

' SINC3 test program

con		adc_pin	= 4
		dac_pin = adc_pin+1
		exp	= 5			'exp = 4..10, period = 1<<exp

dat		org

		hubset	#$FF			'select 80MHz on FPGA

		wrpin	adc,#adc_pin		'set ADC+SINC3
		wxpin	##1<<exp,#adc_pin	'set period

		wrpin	dac,#dac_pin		'set DAC+dither
		wxpin	#1, #dac_pin		'always updateable

		dirh	#1<<6 + adc_pin		'enable ADC and DAC smart pins

		setse1	#%001<<6 + adc_pin	'event on ADC period completion


.loop		rep	#8,#0			'rep gets 16-clock loop for exp=4

		waitse1				'wait for ADC period

		rdpin	x,#adc_pin		'read acc3

		sub	x,diff1			'compute diff's
		add	diff1,x
		sub	x,diff2
		add	diff2,x

		ror	x,#(exp*3-16) & $1F	'scale sample
		wypin	x,#dac_pin		'write to DAC pin


adc		long	%100011_0000000_00_11000_0	'ADC + SINC3 mode
dac		long	%10110_00000000_01_00010_0	'DAC + noise dither mode

x		res	1
diff1		res	1
diff2		res	1
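
As a cross-check on the arithmetic in that loop, here is a minimal Python model of a textbook Sinc3: three integrators running at the bit rate, then three differences at the decimated rate. It is a sketch under assumptions. The thread does not show how the smart pin divides the work between hardware and software, so all three differences are done in one function here, and Python's unbounded integers stand in for the fixed-width accumulators. With a period of N bits the DC gain is N cubed, so 3*exp bits for period = 1<<exp, which appears to be what the `ror x,#(exp*3-16) & $1F` scaling accounts for.

```python
def sinc3_decimate(bits, period):
    """Decimate a 1-bit stream by `period` with a textbook Sinc3 (CIC) filter."""
    acc1 = acc2 = acc3 = 0          # integrators, updated on every input bit
    diff1 = diff2 = diff3 = 0       # previous values for the three differences
    out = []
    for i, bit in enumerate(bits, start=1):
        acc1 += bit
        acc2 += acc1
        acc3 += acc2
        if i % period == 0:         # once per decimation period
            y1, diff1 = acc3 - diff1, acc3
            y2, diff2 = y1 - diff2, y1
            y3, diff3 = y2 - diff3, y2
            out.append(y3)
    return out

period = 1 << 5                      # exp = 5, as in the test program
samples = sinc3_decimate([1] * (period * 4), period)
print(samples, "full scale =", period ** 3)
```

The first two outputs are still filling the filter. From the third sample on, an all-ones input settles at period cubed.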

jmg · 2018-12-04 08:34

cgracey wrote: »

I finally tested out the Sinc3 smart pin mode and it works really well.
Here's the test code that runs on the FPGA with the SINC3 smart pin and two real I/O pins on the pad ring test chip:

Great to hear.

What ENOB do you get on the test chip ? How does that compare with the logged P2 Devices ?

cgracey · 2018-12-04 12:07

jmg wrote: »

cgracey wrote: »

I finally tested out the Sinc3 smart pin mode and it works really well.
Here's the test code that runs on the FPGA with the SINC3 smart pin and two real I/O pins on the pad ring test chip:

Great to hear.

What ENOB do you get on the test chip ? How does that compare with the logged P2 Devices ?

Well, it's hard to say, because the FPGA board has about 50mV of 471KHz noise on its 3.3V supply that feeds the pins. Even so, the consistency of readings I'm getting is about like this:

16 counts = 7 bits
32 counts = 8 bits
64 counts = 9 bits
128 counts = 11 bits
256 counts = 12 bits
512 counts = 12 bits (1/f noise really increases)
1024 counts = 13 bits

Okay! I just wired in a quiet 3.3V regulator and things are looking WAY better:

16 counts = 8 bits
32 counts = 10 bits
64 counts = 12 bits
128 counts = 13 bits
256 counts = 13 bits
512 counts = 13 bits (1/f noise really increases)
1024 counts = 13 bits

So, noise doesn't let us get beyond ~13 bits.

evanh · 2018-12-04 12:53

Just from previous outcomes, I'm guessing Tukey will do better. At the cost of more lag.

Ramon · 2018-12-04 14:07

cgracey wrote: »

But did you see those later pics of the super accurate and quiet ramp and sine signals from the pad ring test chip running on the FPGA board? I was really surprised. This means that the digital ground noise in the substrate of the actual P2 die is demolishing our SNR. If we could quiet down that ground noise, the ADC performance would be fantastic. It should be possible in this next revision to improve noise isolation a little bit.

Does the new P2-Eval board have better ground noise performance than the P2D2 board? Is the layout and capacitors helping to get better SNR? I haven't seen any information about how that board performs. Nobody is using it yet?

Publison · 2018-12-04 14:10

Ramon wrote: »

cgracey wrote: »

But did you see those later pics of the super accurate and quiet ramp and sine signals from the pad ring test chip running on the FPGA board? I was really surprised. This means that the digital ground noise in the substrate of the actual P2 die is demolishing our SNR. If we could quiet down that ground noise, the ADC performance would be fantastic. It should be possible in this next revision to improve noise isolation a little bit.

Does the new P2-Eval board have better ground noise performance than the P2D2 board? Is the layout and capacitors helping to get better SNR? I haven't seen any information about how that board performs. Nobody is using it yet?

All the P2-Eval boards are still at Parallax. There are none in the wild yet. Chip is using a board for tests.

evanh · 2018-12-04 15:22

Ramon wrote: »

Does the new P2-Eval board have better ground noise performance than the P2D2 board? Is the layout and capacitors helping to get better SNR? I haven't seen any information about how that board performs.

Yes and yes. Chip has commented a couple of times but not detailed yet.

cgracey · 2018-12-04 16:02

I experimented with binomial filters, but they didn't seem to do much filtering for the amount of logic involved. I first made a "1 1" and that wasn't too hot, so I made this "1 2 1" which also was lousy. Perhaps I just didn't do it right. Can anyone look at this that knows about these things and see if there's a problem in my implementation?

reg [2:0][0:0] f1;
reg [2:0][2:0] f2;
reg [2:0][4:0] f3;
reg [2:0][6:0] f4;
reg [2:0][8:0] f5;
reg [2:0][10:0] f6;
reg [2:0][12:0] f7;
reg [2:0][14:0] f8;
reg [2:0][16:0] f9;
reg [2:0][18:0] f10;
reg [2:0][20:0] f11;
reg [2:0][22:0] f12;
reg [2:0][24:0] f13;
reg [2:0][26:0] f14;
reg [2:0][28:0] f15;
reg [2:0][30:0] f16;

`regscan (f1[0], 1'b0, ena, pin_in[cfg[5:0]])
`regscan (f1[1], 1'b0, ena, f1[0])
`regscan (f1[2], 1'b0, ena, f1[1])

`regscan (f2[0], 1'b0, ena, f1[0] + (f1[1] << 1) + f1[2])
`regscan (f2[1], 1'b0, ena, f2[0])
`regscan (f2[2], 1'b0, ena, f2[1])

`regscan (f3[0], 1'b0, ena, f2[0] + (f2[1] << 1) + f2[2])
`regscan (f3[1], 1'b0, ena, f3[0])
`regscan (f3[2], 1'b0, ena, f3[1])

`regscan (f4[0], 1'b0, ena, f3[0] + (f3[1] << 1) + f3[2])
`regscan (f4[1], 1'b0, ena, f4[0])
`regscan (f4[2], 1'b0, ena, f4[1])

`regscan (f5[0], 1'b0, ena, f4[0] + (f4[1] << 1) + f4[2])
`regscan (f5[1], 1'b0, ena, f5[0])
`regscan (f5[2], 1'b0, ena, f5[1])

`regscan (f6[0], 1'b0, ena, f5[0] + (f5[1] << 1) + f5[2])
`regscan (f6[1], 1'b0, ena, f6[0])
`regscan (f6[2], 1'b0, ena, f6[1])

`regscan (f7[0], 1'b0, ena, f6[0] + (f6[1] << 1) + f6[2])
`regscan (f7[1], 1'b0, ena, f7[0])
`regscan (f7[2], 1'b0, ena, f7[1])

`regscan (f8[0], 1'b0, ena, f7[0] + (f7[1] << 1) + f7[2])
`regscan (f8[1], 1'b0, ena, f8[0])
`regscan (f8[2], 1'b0, ena, f8[1])

`regscan (f9[0], 1'b0, ena, f8[0] + (f8[1] << 1) + f8[2])
`regscan (f9[1], 1'b0, ena, f9[0])
`regscan (f9[2], 1'b0, ena, f9[1])

`regscan (f10[0], 1'b0, ena, f9[0] + (f9[1] << 1) + f9[2])
`regscan (f10[1], 1'b0, ena, f10[0])
`regscan (f10[2], 1'b0, ena, f10[1])

`regscan (f11[0], 1'b0, ena, f10[0] + (f10[1] << 1) + f10[2])
`regscan (f11[1], 1'b0, ena, f11[0])
`regscan (f11[2], 1'b0, ena, f11[1])

`regscan (f12[0], 1'b0, ena, f11[0] + (f11[1] << 1) + f11[2])
`regscan (f12[1], 1'b0, ena, f12[0])
`regscan (f12[2], 1'b0, ena, f12[1])

`regscan (f13[0], 1'b0, ena, f12[0] + (f12[1] << 1) + f12[2])
`regscan (f13[1], 1'b0, ena, f13[0])
`regscan (f13[2], 1'b0, ena, f13[1])

`regscan (f14[0], 1'b0, ena, f13[0] + (f13[1] << 1) + f13[2])
`regscan (f14[1], 1'b0, ena, f14[0])
`regscan (f14[2], 1'b0, ena, f14[1])

`regscan (f15[0], 1'b0, ena, f14[0] + (f14[1] << 1) + f14[2])
`regscan (f15[1], 1'b0, ena, f15[0])
`regscan (f15[2], 1'b0, ena, f15[1])

`regscan (f16[0], 1'b0, ena, f15[0] + (f15[1] << 1) + f15[2])
`regscan (f16[1], 1'b0, ena, f16[0])
`regscan (f16[2], 1'b0, ena, f16[1])

wire [30:0] sum = (f16[0] + (f16[1] << 1) + f16[2] + 1'b1) >> 1;

`regscan (sample, 8'b0, ena, sum[30:23])

Our Tukey window is 1/4 the logic and works 10x better. Maybe I didn't do it right, though.

evanh · 2018-12-04 16:04

I don't even know what the double [][] is doing.

cgracey · 2018-12-04 16:13

evanh wrote: »

I don't even know what the double [][] is doing.

"reg [2:0][8:0] f5" means there are three f5 registers (f5[0], f5[1], f5[2]) that are each 9 bits wide.

jmg · 2018-12-04 18:33

cgracey wrote: »

Okay! I just wired in a quiet 3.3V regulator and things are looking WAY better:

16 counts = 8 bits
32 counts = 10 bits
64 counts = 12 bits
128 counts = 13 bits
256 counts = 13 bits
512 counts = 13 bits (1/f noise really increases)
1024 counts = 13 bits

So, noise doesn't let us get beyond ~13 bits.

Is that the same (low noise TI) regulator as on the Eval Boards ?

Those are good numbers, i.e. hitting around 13 bits at 1MHz (which I think those numbers say) is quite good.
It would be interesting to run the same test, with the same regulator, on the P2 EV board (even if via sampling) to compare the ADCs.

Q: if you can hit 8b with 16 samples, do you need the scope mode silicon ? - given the analog IP bandwidth seems to be the ceiling anyway ?
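
A quick arithmetic sketch of the conversion rates behind that "1MHz" reading. It assumes the ADC bitstream runs at the 80 MHz clock selected in Chip's test program and that one sample is produced per "counts" period. Both are assumptions about the test setup, not figures given with the table.

```python
CLOCK_HZ = 80_000_000  # assumed: the 80 MHz FPGA clock from the test program

def sample_rate_hz(counts, clock_hz=CLOCK_HZ):
    """Output sample rate when one sample is produced every `counts` clocks."""
    return clock_hz / counts

# Bits are the quiet-regulator readings quoted above.
quiet_bits = {16: 8, 32: 10, 64: 12, 128: 13, 256: 13, 512: 13, 1024: 13}

for counts, bits in quiet_bits.items():
    print(f"{counts:5d} counts -> {sample_rate_hz(counts) / 1e3:8.1f} kHz at ~{bits} bits")
```

Under those assumptions, 12 bits at 64 counts corresponds to 1.25 MHz and 13 bits at 128 counts to 625 kHz.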

cgracey · 2018-12-04 18:46

jmg wrote: »

cgracey wrote: »

Okay! I just wired in a quiet 3.3V regulator and things are looking WAY better:

16 counts = 8 bits
32 counts = 10 bits
64 counts = 12 bits
128 counts = 13 bits
256 counts = 13 bits
512 counts = 13 bits (1/f noise really increases)
1024 counts = 13 bits

So, noise doesn't let us get beyond ~13 bits.

Is that the same (low noise TI) regulator as on the Eval Boards ?

Those are good numbers, i.e. hitting around 13 bits at 1MHz (which I think those numbers say) is quite good.
It would be interesting to run the same test, with the same regulator, on the P2 EV board (even if via sampling) to compare the ADCs.

Q: if you can hit 8b with 16 samples, do you need the scope mode silicon ? - given the analog IP bandwidth seems to be the ceiling anyway ?

I jumpered a regulator from the new eval board over to the FPGA board for its 3.3 volt power.

I, too, am rethinking the live ADC. ON Semi is running some compilation tests today and I'm waiting to hear how much logic the whole live scope thing requires.

SaucySoliton · 2018-12-04 18:47

cgracey wrote: »

I experimented with binomial filters, but they didn't seem to do much filtering for the amount of logic involved. I first made a "1 1" and that wasn't too hot, so I made this "1 2 1" which also was lousy. Perhaps I just didn't do it right. Can anyone look at this that knows about these things and see if there's a problem in my implementation?

...

Our Tukey window is 1/4 the logic and works 10x better. Maybe I didn't do it right, though.

The binomial method seems unsuitable for longer filters. To match the length of the Tukey we would need 44 adders. The first and last 10 samples are basically too low to contribute much to the output. So the effective length is only 25 samples.

Then there is the problem of bit growth. It should be fine to round or truncate the rest once the sums get to 10 bits or so. Maybe that would reduce the logic by half.
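
Both points can be checked numerically. The sketch below is a plain Python model, not anyone's hardware: it builds the impulse response of a cascade of identical FIR stages and looks at how much weight the outer taps carry. Treating "44 adders" as 44 cascaded "1 1" stages is an assumption, and `cascade_kernel` is an illustrative name.

```python
def cascade_kernel(stages, taps=(1, 1)):
    """Impulse response of `stages` identical FIR stages connected in series."""
    kernel = [1]
    for _ in range(stages):
        new = [0] * (len(kernel) + len(taps) - 1)
        for i, k in enumerate(kernel):
            for j, t in enumerate(taps):
                new[i + j] += k * t
        kernel = new
    return kernel

kernel = cascade_kernel(44)                   # assumed: 44 cascaded "1 1" stages
total = sum(kernel)                           # worst-case output for an all-ones input
outer = sum(kernel[:10]) + sum(kernel[-10:])  # first and last 10 taps
print(len(kernel), "taps,", total.bit_length(), "bits needed for full scale")
print(f"outer 20 taps carry {outer / total:.6%} of the total weight")

# Chip's version: sixteen "1 2 1" stages, two bits of growth per stage.
chip = cascade_kernel(16, (1, 2, 1))
print(len(chip), "taps, worst-case sum is 4**16:", sum(chip) == 4 ** 16)
```

Each "1 1" stage adds one bit of growth and each "1 2 1" stage adds two, which is why the register widths in the Verilog step up by two per stage.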

TonyB_ · 2018-12-04 19:46

cgracey wrote: »

TonyB_, thanks for trying out that idea of doubling acc1 before doing acc2 and acc3. It looks to me like it's maybe kind of a sticky compromise. What's your take?

Chip, 1-bit input has the best quality, so that should be our choice, i.e. no change. The whole point of trying 2-bit was to reduce the logic, but we could do that with 1-bit and 24-bit counters to a similar extent. If somehow the Sinc3 adders could run at twice the ADC bit rate then I think 2-bit mode would use the least logic.

TonyB_ · 2018-12-04 19:56

cgracey wrote: »

jmg wrote: »

Q: if you can hit 8b with 16 samples, do you need the scope mode silicon ? - given the analog IP bandwidth seems to be the ceiling anyway ?

I, too, am rethinking the live ADC. ON Semi is running some compilation tests today and I'm waiting to hear how much logic the whole live scope thing requires.

Two channels would need less logic than four, although probably not exactly half.

cgracey wrote: »

Note to self: Make a trigger mechanism with hysteresis for scope-like triggering on ADC channel data. It drops the current write address into a register and causes an event.

If there is not enough room for any more, could we have one Tukey, perhaps quite crude, in each cog for triggering?
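
A minimal software sketch of the hysteresis trigger described in that note, assuming a rising-edge trigger with two thresholds. The names `low` and `high` are hypothetical, and the returned sample index stands in for the write address the hardware would latch.

```python
def find_triggers(samples, low, high):
    """Return the indices where a rising-edge trigger with hysteresis fires."""
    armed = False
    hits = []
    for index, value in enumerate(samples):
        if value <= low:
            armed = True            # signal dropped to or below the lower threshold
        elif value >= high and armed:
            hits.append(index)      # stands in for latching the write address
            armed = False           # wait for the next dip before re-triggering
    return hits

print(find_triggers([10, 50, 120, 130, 40, 90, 200], low=60, high=100))  # [2, 6]
```

The gap between the two thresholds is what keeps noise near a single level from firing the trigger repeatedly.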

lazarus666 · 2018-12-04 23:44

Actually, it's three bits: a lossless summation of 1+2+1=4. So the first summation (the ones in the odd positions) requires a half adder for the ones, which consists of an exclusive OR (which usually comes with an AND gate that also generates a carry for us into the even positions), for a result of 16 three-bit values for every 32 bits in. That gets a [1,2,1] kernel with decimation by 2 (to half sysclock) in one step:



#define EVEN_BITS	(0x55555555)
#define ODD_BITS   	(0xaaaaaaaa)
#define EVEN_PAIRS	(0x33333333)
#define ODD_PAIRS	(0xcccccccc)
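// note: EVEN_NIBBLES selects the high nibble of each byte, the reverse of
// the low-half convention used by the BITS, PAIRS and BYTES masks
#define EVEN_NIBBLES	(0xf0f0f0f0)
#define ODD_NIBBLES	(0x0f0f0f0f)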
#define EVEN_BYTES  (0x00ff00ff)
#define ODD_BYTES   (0xff00ff00)

#define UINT unsigned int
#define DWORD       unsigned int
#define MATH_TYPE   float

#define HIWORD(arg)   ((arg)&(0xffff0000))
#define LOWORD(arg)   ((arg)&(0x0000ffff))

class propeller_adc
{
public:
	bool carry;
	unsigned int REG[8];

	void reset ();
	unsigned int iterate (bool sample);
	DWORD decimate1 (DWORD input);
	DWORD decimate2 (DWORD input);
	DWORD decimate3 (DWORD input);
	void print_bytes (int regid);
	void print_bytes2 (int regid);
	void print_nibbles ();
};

DWORD propeller_adc::decimate1 (DWORD input)
{
//static bool carry;
DWORD  q0, q1, q2, q3;
DWORD  r0, r1, r2, r3;
DWORD  s0, s1;

// format b31.....b0, even bits have weight = 1
// odd bits have weight = 0.5*2, phase one - decimate by 2 using
// a [1,2,1] convolutional kernel yielding 16 3 bit values from a
// single DWORD containing 32 individual one bit input samples

REG[0] = input;

q0 = input&EVEN_BITS;
q1 = input&ODD_BITS;
// multiply all even bits by 2
q2 = q0<<1;
// add all of the odd bits with the appropriate
// alternate neighbor
q3 = (q1+(q1>>2))>>1;
// pick off odd pairs and even pairs since the 
// next addition can result in a carry into the
// third bit - so that in the end we want to pack
// the resulting 16 three bit values into nibbles
r0 = q2&EVEN_PAIRS;
r2 = q3&EVEN_PAIRS;
s0 = r0+r2;
r1 = q2&ODD_PAIRS;
r3 = q3&ODD_PAIRS;
s1 = (r1+r3)>>2;

REG[1]=s1;
REG[2]=s0;

return 0;
}

Here is one test case which shows that summation occurs with the correct weights. For those who REALLY want or need to sum 6116 bits to obtain a long-term moving average, you can still do so, but now with half the number of operations, since half the work has already been done. That is, try feeding this stream into a 3113 window where you are now doing 6116 additions; the result should be the same or better, even if I still need to work out carry propagation.

R0:00000000000000000000000000000001 R1/2:  0  0  0  0  0  0  0  0 
 R0:00000000000000000000000000000010 R1/2:  0  0  0  0  0  0  0  0 
 R0:00000000000000000000000000000100 R1/2:  0  0  0  0  0  0  0  0 
 R0:00000000000000000000000000001000 R1/2:  0  0  0  0  0  0  0  0 
 R0:00000000000000000000000000010000 R1/2:  0  0  0  0  0  0  0  4 
 R0:00000000000000000000000000100000 R1/2:  0  0  0  0  0  0  0  2 
 R0:00000000000000000000000001000000 R1/2:  0  0  0  0  0  0  2  2 
 R0:00000000000000000000000010000000 R1/2:  0  0  0  0  0  0  1  3 
 R0:00000000000000000000000100000000 R1/2:  0  0  0  0  0  0  4  0 
 R0:00000000000000000000001000000000 R1/2:  0  0  0  0  0  0  3  1 
 R0:00000000000000000000010000000000 R1/2:  0  0  0  0  0  0  2  2 
 R0:00000000000000000000100000000000 R1/2:  0  0  0  0  0  0  3  1 
 R0:00000000000000000001000000000000 R1/2:  0  0  0  0  0  4  0  0 
 R0:00000000000000000010000000000000 R1/2:  0  0  0  0  0  2  1  1 
 R0:00000000000000000100000000000000 R1/2:  0  0  0  0  2  2  0  0 
 R0:00000000000000001000000000000000 R1/2:  0  0  0  0  1  3  0  0 
 R0:00000000000000010000000000000000 R1/2:  0  0  0  0  4  0  0  0 
 R0:00000000000000100000000000000000 R1/2:  0  0  0  0  3  1  0  0 
 R0:00000000000001000000000000000000 R1/2:  0  0  0  0  2  2  0  0 
 R0:00000000000010000000000000000000 R1/2:  0  0  0  0  3  1  0  0 
 R0:00000000000100000000000000000000 R1/2:  0  0  0  4  0  0  0  0 
 R0:00000000001000000000000000000000 R1/2:  0  0  0  2  1  1  0  0 
 R0:00000000010000000000000000000000 R1/2:  0  0  2  2  0  0  0  0 
 R0:00000000100000000000000000000000 R1/2:  0  0  1  3  0  0  0  0 
 R0:00000001000000000000000000000000 R1/2:  0  0  4  0  0  0  0  0 
 R0:00000010000000000000000000000000 R1/2:  0  0  3  1  0  0  0  0 
 R0:00000100000000000000000000000000 R1/2:  0  0  2  2  0  0  0  0 
 R0:00001000000000000000000000000000 R1/2:  0  0  3  1  0  0  0  0 
 R0:00010000000000000000000000000000 R1/2:  0  4  0  0  0  0  0  0 
 R0:00100000000000000000000000000000 R1/2:  0  2  1  1  0  0  0  0 
 R0:01000000000000000000000000000000 R1/2:  2  2  0  0  0  0  0  0 
 R0:10000000000000000000000000000000 R1/2:  1  3  0  0  0  0  0  0 
 R0:00000000000000000000000000000011 R1/2:  0  0  0  0  0  0  0  0 
 R0:00000000000000000000000000000110 R1/2:  0  0  0  0  0  0  0  0 
 R0:00000000000000000000000000001100 R1/2:  0  0  0  0  0  0  0  0 
 R0:00000000000000000000000000011000 R1/2:  0  0  0  0  0  0  0  4 
 R0:00000000000000000000000000110000 R1/2:  0  0  0  0  0  0  0  6 
 R0:00000000000000000000000001100000 R1/2:  0  0  0  0  0  0  2  4 
 R0:00000000000000000000000011000000 R1/2:  0  0  0  0  0  0  3  5 
 R0:00000000000000000000000110000000 R1/2:  0  0  0  0  0  0  5  3 
 R0:00000000000000000000001100000000 R1/2:  0  0  0  0  0  0  7  1 
 R0:00000000000000000000011000000000 R1/2:  0  0  0  0  0  0  5  3 
 R0:00000000000000000000110000000000 R1/2:  0  0  0  0  0  0  5  3 
 R0:00000000000000000001100000000000 R1/2:  0  0  0  0  0  4  3  1 
 R0:00000000000000000011000000000000 R1/2:  0  0  0  0  0  6  1  1 
 R0:00000000000000000110000000000000 R1/2:  0  0  0  0  2  4  1  1 
 R0:00000000000000001100000000000000 R1/2:  0  0  0  0  3  5  0  0 
 R0:00000000000000011000000000000000 R1/2:  0  0  0  0  5  3  0  0 
 R0:00000000000000110000000000000000 R1/2:  0  0  0  0  7  1  0  0 
 R0:00000000000001100000000000000000 R1/2:  0  0  0  0  5  3  0  0 
 R0:00000000000011000000000000000000 R1/2:  0  0  0  0  5  3  0  0 
 R0:00000000000110000000000000000000 R1/2:  0  0  0  4  3  1  0  0 
 R0:00000000001100000000000000000000 R1/2:  0  0  0  6  1  1  0  0 
 R0:00000000011000000000000000000000 R1/2:  0  0  2  4  1  1  0  0 
 R0:00000000110000000000000000000000 R1/2:  0  0  3  5  0  0  0  0 
 R0:00000001100000000000000000000000 R1/2:  0  0  5  3  0  0  0  0 
 R0:00000011000000000000000000000000 R1/2:  0  0  7  1  0  0  0  0 
 R0:00000110000000000000000000000000 R1/2:  0  0  5  3  0  0  0  0 
 R0:00001100000000000000000000000000 R1/2:  0  0  5  3  0  0  0  0 
 R0:00011000000000000000000000000000 R1/2:  0  4  3  1  0  0  0  0 
 R0:00110000000000000000000000000000 R1/2:  0  6  1  1  0  0  0  0 
 R0:01100000000000000000000000000000 R1/2:  2  4  1  1  0  0  0  0 
 R0:11000000000000000000000000000000 R1/2:  3  5  0  0  0  0  0  0 
 R0:10000000000000000000000000000001 R1/2:  1  3  0  0  0  0  0  0 
 R0:00000000000000000000000000000111 R1/2:  0  0  0  0  0  0  0  0 
 R0:00000000000000000000000000001110 R1/2:  0  0  0  0  0  0  0  0 
 R0:00000000000000000000000000011100 R1/2:  0  0  0  0  0  0  0  4 
 R0:00000000000000000000000000111000 R1/2:  0  0  0  0  0  0  0  6 
 R0:00000000000000000000000001110000 R1/2:  0  0  0  0  0  0  2  8 
 R0:00000000000000000000000011100000 R1/2:  0  0  0  0  0  0  3  7 
 R0:00000000000000000000000111000000 R1/2:  0  0  0  0  0  0  7  5 
 R0:00000000000000000000001110000000 R1/2:  0  0  0  0  0  0  8  4 
 R0:00000000000000000000011100000000 R1/2:  0  0  0  0  0  0  9  3 
 R0:00000000000000000000111000000000 R1/2:  0  0  0  0  0  0  8  4 
 R0:00000000000000000001110000000000 R1/2:  0  0  0  0  0  4  5  3 
 R0:00000000000000000011100000000000 R1/2:  0  0  0  0  0  6  4  2 
 R0:00000000000000000111000000000000 R1/2:  0  0  0  0  2  8  1  1 
 R0:00000000000000001110000000000000 R1/2:  0  0  0  0  3  7  1  1 
 R0:00000000000000011100000000000000 R1/2:  0  0  0  0  7  5  0  0 
 R0:00000000000000111000000000000000 R1/2:  0  0  0  0  8  4  0  0 
 R0:00000000000001110000000000000000 R1/2:  0  0  0  0  9  3  0  0 
 R0:00000000000011100000000000000000 R1/2:  0  0  0  0  8  4  0  0 
 R0:00000000000111000000000000000000 R1/2:  0  0  0  4  5  3  0  0 
 R0:00000000001110000000000000000000 R1/2:  0  0  0  6  4  2  0  0 
 R0:00000000011100000000000000000000 R1/2:  0  0  2  8  1  1  0  0 
 R0:00000000111000000000000000000000 R1/2:  0  0  3  7  1  1  0  0 
 R0:00000001110000000000000000000000 R1/2:  0  0  7  5  0  0  0  0 
 R0:00000011100000000000000000000000 R1/2:  0  0  8  4  0  0  0  0 
 R0:00000111000000000000000000000000 R1/2:  0  0  9  3  0  0  0  0 
 R0:00001110000000000000000000000000 R1/2:  0  0  8  4  0  0  0  0 
 R0:00011100000000000000000000000000 R1/2:  0  4  5  3  0  0  0  0 
 R0:00111000000000000000000000000000 R1/2:  0  6  4  2  0  0  0  0 
 R0:01110000000000000000000000000000 R1/2:  2  8  1  1  0  0  0  0 
 R0:11100000000000000000000000000000 R1/2:  3  7  1  1  0  0  0  0 
 R0:11000000000000000000000000000001 R1/2:  3  5  0  0  0  0  0  0 
 R0:10000000000000000000000000000011 R1/2:  1  3  0  0  0  0  0  0 
 R0:00000000000000000000000000001111 R1/2:  0  0  0  0  0  0  0  0 
 R0:00000000000000000000000000011110 R1/2:  0  0  0  0  0  0  0  4 
 R0:00000000000000000000000000111100 R1/2:  0  0  0  0  0  0  0  6 
 R0:00000000000000000000000001111000 R1/2:  0  0  0  0  0  0  2  8 
 R0:00000000000000000000000011110000 R1/2:  0  0  0  0  0  0  3  b 
 R0:00000000000000000000000111100000 R1/2:  0  0  0  0  0  0  7  7 
 R0:00000000000000000000001111000000 R1/2:  0  0  0  0  0  0  a  6 
 R0:00000000000000000000011110000000 R1/2:  0  0  0  0  0  0  a  6 
 R0:00000000000000000000111100000000 R1/2:  0  0  0  0  0  0  c  4 
 R0:00000000000000000001111000000000 R1/2:  0  0  0  0  0  4  8  4 
 R0:00000000000000000011110000000000 R1/2:  0  0  0  0  0  6  6  4 
 R0:00000000000000000111100000000000 R1/2:  0  0  0  0  2  8  4  2 
 R0:00000000000000001111000000000000 R1/2:  0  0  0  0  3  b  1  1 
 R0:00000000000000011110000000000000 R1/2:  0  0  0  0  7  7  1  1 
 R0:00000000000000111100000000000000 R1/2:  0  0  0  0  a  6  0  0 
 R0:00000000000001111000000000000000 R1/2:  0  0  0  0  a  6  0  0 
 R0:00000000000011110000000000000000 R1/2:  0  0  0  0  c  4  0  0 
 R0:00000000000111100000000000000000 R1/2:  0  0  0  4  8  4  0  0 
 R0:00000000001111000000000000000000 R1/2:  0  0  0  6  6  4  0  0 
 R0:00000000011110000000000000000000 R1/2:  0  0  2  8  4  2  0  0 
 R0:00000000111100000000000000000000 R1/2:  0  0  3  b  1  1  0  0 
 R0:00000001111000000000000000000000 R1/2:  0  0  7  7  1  1  0  0 
 R0:00000011110000000000000000000000 R1/2:  0  0  a  6  0  0  0  0 
 R0:00000111100000000000000000000000 R1/2:  0  0  a  6  0  0  0  0 
 R0:00001111000000000000000000000000 R1/2:  0  0  c  4  0  0  0  0 
 R0:00011110000000000000000000000000 R1/2:  0  4  8  4  0  0  0  0 
 R0:00111100000000000000000000000000 R1/2:  0  6  6  4  0  0  0  0 
 R0:01111000000000000000000000000000 R1/2:  2  8  4  2  0  0  0  0 
 R0:11110000000000000000000000000000 R1/2:  3  b  1  1  0  0  0  0 
 R0:11100000000000000000000000000001 R1/2:  3  7  1  1  0  0  0  0 
 R0:11000000000000000000000000000011 R1/2:  3  5  0  0  0  0  0  0 
 R0:10000000000000000000000000000111 R1/2:  1  3  0  0  0  0  0  0

OK - this part is starting to look pretty solid. Phase two and Phase three are a work in progress.
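
For checking phase one independently, here is a small Python sketch that computes a [1,2,1] decimate-by-2 both ways: one output at a time, and on all 16 positions at once with masks and shifts. It is one possible reading of the scheme, not a port of the code above. Which bit is treated as the centre tap is an assumption, the sample past the word boundary is taken as 0 rather than carried in from the next word, and all names are illustrative.

```python
MASK_BITS = 0x55555555   # bit 0 of every 2-bit field
MASK_PAIRS = 0x33333333  # low 2 bits of every nibble

def decimate_121_reference(word):
    """Naive version: v[k] = b[2k] + 2*b[2k+1] + b[2k+2], with b[32] taken as 0."""
    bit = lambda n: (word >> n) & 1 if n < 32 else 0
    return [bit(2 * k) + 2 * bit(2 * k + 1) + bit(2 * k + 2) for k in range(16)]

def decimate_121_packed(word):
    """Same sums computed on all 16 positions at once, packed into nibbles."""
    word &= 0xFFFFFFFF
    outer = (word & MASK_BITS) + ((word >> 2) & MASK_BITS)  # b[2k] + b[2k+2], 0..2
    centre = (word >> 1) & MASK_BITS                         # b[2k+1], 0..1
    # Move alternate 2-bit fields into nibbles so a sum of 4 has room for its carry.
    even_k = (outer & MASK_PAIRS) + ((centre & MASK_PAIRS) << 1)
    odd_k = ((outer >> 2) & MASK_PAIRS) + (((centre >> 2) & MASK_PAIRS) << 1)
    return even_k, odd_k

def unpack(even_k, odd_k):
    """Interleave the two nibble-packed registers back into v[0..15]."""
    values = []
    for n in range(8):
        values.append((even_k >> (4 * n)) & 0xF)
        values.append((odd_k >> (4 * n)) & 0xF)
    return values

word = 0xDEADBEEF
print(decimate_121_reference(word))
print(unpack(*decimate_121_packed(word)))
```

The two printed lists should match for any 32-bit word. Splitting the result across two registers mirrors the REG[1]/REG[2] split above, since a value of 4 needs three bits and will not fit in a 2-bit field.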

DWORD propeller_adc::decimate2 (DWORD input)
{
// phase two - apply the same transformation
// to the 16 three bit values, yielding
// eight five bit values, having a range [0..16]
DWORD  q0, q1, q2, q3;
DWORD  s0, s1;

q0 = REG[1]&EVEN_NIBBLES;
q1 = (REG[1]&ODD_NIBBLES)>>4;
q2 = REG[2]&EVEN_NIBBLES;
q3 = (REG[2]&ODD_NIBBLES)>>4;

// TODO: fix nibble order to get things in the correct bins, although this gives an interesting
// high peaking response for quick settling time on transient input with no effect on
// the long term average

s0 = q0+q1+(q2<<1);
s1 = q0+q1+(q3<<1);
  REG[3]=s1>>4;
  REG[4]=s0>>4;
	return 0;
}

DWORD propeller_adc::decimate3 (DWORD input)
{
  DWORD acc;
// finally repack into a 32 bit register
// for an input rate of 250Mbps - this results
// in an initial output rate of 31.25M values/s (250/8)
DWORD  q0, q1, q2, q3;
DWORD  s0, s1;

q0 = REG[3]&EVEN_BYTES;
q1 = (REG[3]&ODD_BYTES)>>8; 
q2 = REG[4]&EVEN_BYTES;
q3 = (REG[4]&ODD_BYTES)>>8;

// HMMM... FIXME? DEFINITELY BROKEN HERE!!

s0 = q0+q1+(q2<<1);
s1 = q0+q1+(q3<<1);
  REG[5]=s1;
  REG[6]=s0;
  
  acc = (s0<<16)+s1;   // parenthesised: + binds tighter than <<
  REG[7]=acc;
	return acc;
}

cgracey · 2018-12-05 08:20

Wendy at ON Semi ran some test compiles to weigh the new design.

It seems that even without SINC3 and the 4-channel-scope-per-cog, we had already grown quite a bit.

Here is where we are at:

Note: 'sequential' means flipflop
Note: 'area' is square um, so 1,000,000 = 1 square mm

Original Design, current P2 silicon:
 
Type           Instances         Area  Area %
---------------------------------------------
timing_model          92 37049021.213   72.3
sequential         58246  4655514.931    9.1
inverter           74356   737807.974    1.4
buffer             15359   242666.189    0.5
logic             469886  8569315.686   16.7
physical_cells         0        0.000    0.0
---------------------------------------------
total             617939 51254325.994  100.0


New Design with SINC3 and SCOPE:
 
Type           Instances         Area  Area %
---------------------------------------------
timing_model          91 36815952.439   69.0
sequential         61112  4871372.083    9.1
inverter           92183   908669.798    1.7
buffer             21647   339857.101    0.6
logic             559646 10389258.163   19.5
physical_cells         0        0.000    0.0
---------------------------------------------
total             734679 53325109.584  100.0


New Design without SINC3 smart pin

Type           Instances         Area  Area %
---------------------------------------------
timing_model          91 36815952.439   69.1
sequential         61112  4881813.709    9.2
inverter           90045   889771.008    1.7
buffer             21978   340910.797    0.6
logic             554207 10318409.651   19.4
physical_cells         0        0.000    0.0
---------------------------------------------
total             727433 53246857.604  100.0


New Design without 4-channel SCOPE per cog

Type           Instances         Area  Area %
---------------------------------------------
timing_model          91 36815952.439   70.1
sequential         59129  4729178.317    9.0
inverter           83262   819040.410    1.6
buffer             19948   308145.869    0.6
logic             530745  9811739.930   18.7
physical_cells         0        0.000    0.0
---------------------------------------------
total             693175 52484056.964  100.0



	Cost of SINC3
	----------------------
	sequential           0
	inverter          2138
	buffer            -331
	logic             5439
	----------------------
	total             7246
	area             78252



	Cost of SCOPE
	----------------------
	sequential        1983
	inverter          8921
	buffer            1699
	logic            28901
	----------------------
	total            41504
	area            841052
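
The two cost tables are differences between the synthesis reports, so they can be recomputed from the instance counts. The sketch below does that in Python, using only the counts copied from three of the reports above. The per-pin and per-channel divisors (64 smart pins, 4 channels in each of 8 cogs) are the ones used in the discussion that follows.

```python
# Instance counts copied from the synthesis reports above.
reports = {
    "with SINC3 and SCOPE": {"sequential": 61112, "inverter": 92183, "buffer": 21647, "logic": 559646},
    "without SINC3":        {"sequential": 61112, "inverter": 90045, "buffer": 21978, "logic": 554207},
    "without SCOPE":        {"sequential": 59129, "inverter": 83262, "buffer": 19948, "logic": 530745},
}

def cost(full, reduced):
    """Per-type instance difference between two reports, plus their total."""
    delta = {kind: full[kind] - reduced[kind] for kind in full}
    delta["total"] = sum(delta.values())
    return delta

full = reports["with SINC3 and SCOPE"]
sinc3 = cost(full, reports["without SINC3"])
scope = cost(full, reports["without SCOPE"])

print("SINC3:", sinc3, "->", round(sinc3["logic"] / 64), "logic cells per smart pin")
print("SCOPE:", scope, "->", round(scope["logic"] / (4 * 8)), "logic cells per channel")
```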

Note that SINC3 turns out to be very little logic. 5439/64 is only 85 logic cells per smart pin added. And it didn't use any new flops, since the smart pin supplied them.

The 4-channel scope, on the other hand is a real pig. 28901/4channels/8cogs = 903 logic cells per channel, which seems WAY too big. I'm wondering two things:

(1) Is the tool generating a lot of extra circuitry in order to make timing? If I pipelined the 1's counts before final summing (requires < 36 flipflops per channel), might things relax and would net logic/buffering requirements go down?

(2) If this scope function went into each smart pin, it wouldn't need any new flops and things could be pipelined to relax timing. However, each cog would need to mux in 4 channels of a new 8-bit bus coming from each smart pin. And there would be twice as many Tukey filters, only half of which could ever be used at once. However, they could be instantly mux'd and filtered samples would be forthcoming, without waiting for the Tukey filter to refill. Maybe, rather than 4 random pins, you would select a group of 4 pins, differing only in the two LSBs of their pin numbers. That would lighten the mux'ing problem.

I think the first thing I need to do is see how much I can squeeze the Tukey filter logic.

cgracey · 2018-12-05 09:05

I'm putting the Tukey into the smart pin to see how it compiles.

Just had a realization that we don't need new 8-bit buses out of the smart pins. Instead, the pin just updates the result on every clock, so that a RDPIN at any time, from any cog, gets the immediate 8-bit conversion for that pin.

So, RDPIN would always read the ADC value and those same result outputs from the smart pins could be gathered, lower bytes, only, to get parallel ADC samples for streamer recording.

One other thing. The scope-trigger mechanism could go into the smart pin to alert when a trigger event occurs, raising IN.

jmg · 2018-12-05 09:48

cgracey wrote: »

I'm putting the Tukey into the smart pin to see how it compiles.

Just had a realization that we don't need new 8-bit buses out of the smart pins. Instead, the pin just updates the result on every clock, so that a RDPIN at any time, from any cog, gets the immediate 8-bit conversion for that pin.

So, RDPIN would always read the ADC value and those same result outputs from the smart pins could be gathered, lower bytes, only, to get parallel ADC samples for streamer recording.

One other thing. The scope-trigger mechanism could go into the smart pin to alert when a trigger event occurs, raising IN.

I'm not following all the details here, but the figures you reported earlier on the PAD-Ring test chip had quite small sample counts on Sinc3 (i.e. high ADC conversion rates, but low bit counts).
If Sinc3 can adjust to low bit values, is that not equivalent to an ADC-bandwidth-limited, low-bit scope Tukey pathway?
Then you just need a means to capture the 4 outputs? (Which I think is what you are saying above?)

Yanomani · 2018-12-05 10:05

cgracey wrote: »

I'm putting the Tukey into the smart pin to see how it compiles.

As for the new features related to the Tukey filters, could at least some part of the new 4-pin groupings and triggering stuff be leveraged by the streamers too, to somehow ease communication with QSPI, Octa SPI and even HyperBus-enabled devices?

TonyB_ · 2018-12-05 13:46

cgracey wrote: »

I'm putting the Tukey into the smart pin to see how it compiles.

Just had a realization that we don't need new 8-bit buses out of the smart pins. Instead, the pin just updates the result on every clock, so that a RDPIN at any time, from any cog, gets the immediate 8-bit conversion for that pin.

So, RDPIN would always read the ADC value and those same result outputs from the smart pins could be gathered, lower bytes, only, to get parallel ADC samples for streamer recording.

One other thing. The scope-trigger mechanism could go into the smart pin to alert when a trigger event occurs, raising IN.

Tukey had better quality than Sinc8.

Can the four 8-bit Tukey values still be read as one 32-bit value? And can this be streamed?

I'm wondering whether the pair symmetry in the ramp values is reflected in the logic minimization, e.g. 1 & 31, 3 & 29, 5 & 27, etc., all add up to 32. I've looked at making them add up to 31 with bits inverted to halve the taps, e.g. 1 & 30, 3 & 28, 5 & 26, so that adding a pair can be done by a simple OR. The problem is that the plateau value is now 31 and n+½ plateau bits are needed for the whole thing to sum to a multiple of 256 minus 1 when all bits are set.

Another thought I had was to set the midpoint of the ramp, currently 16, to zero and have -15 & +15, -13 & +13, etc., as the pairs, again to halve the taps. The previous max tap of 32 would then be +16. However, the arithmetic would not be two's complement as used by other smart pin modes.

I also had an idea for using a counter for the plateau values instead of adding them individually and I'll try to find my post about it.
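
A minimal Python sketch of the two pairing ideas above, under one reading of them: in 5 bits, a weight w and its partner 31 - w are bitwise complements, so the gated sum of the pair needs no adder, and with a zero-centred ramp a -w/+w pair collapses to one signed term. The function names and the 5-bit width are assumptions made for illustration, not anything from the actual smart pin logic.

```python
RAMP_MAX = 31  # assumed 5-bit maximum, so that each pair sums to 31


def gated_pair_sum(w, bit_a, bit_b):
    """Add w (if bit_a) and its partner 31 - w (if bit_b) using OR only.

    w and 31 - w share no set bits, so OR-ing them equals adding them.
    """
    partner = w ^ RAMP_MAX  # bit inversion; equals 31 - w for 0 <= w <= 31
    return (w if bit_a else 0) | (partner if bit_b else 0)


def centred_pair_sum(w, bit_a, bit_b):
    """Zero-centred variant: taps weighted -w and +w reduce to w * (b - a)."""
    return w * (bit_b - bit_a)


if __name__ == "__main__":
    # Check the OR shortcut against ordinary arithmetic for the quoted pairs.
    for w in (1, 3, 5):
        for a in (0, 1):
            for b in (0, 1):
                assert gated_pair_sum(w, a, b) == w * a + (31 - w) * b
    print("OR shortcut matches addition for pairs 1 & 30, 3 & 28, 5 & 26")
```

The zero-centred variant produces negative partial sums, which is the signed-arithmetic cost mentioned in the post.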

lazarus666 · 2018-12-05 14:16

Breaking my sinc decimator into stages, the first stage turns 32 input bits into sixteen 3-bit values, packed in odd and even groupings with 4-bit alignment. Hand optimization of the first stage brings it down to the code below; carry propagation and stages two and three are yet to be debugged. Eventually you get four 8-bit values from every 32 input bits, which can be summed by whatever windowing method you wish to use, in addition to the initial decimation by 2, 4, or 8, as desired. In the meantime I am developing this code both in Visual Studio and in SimpleIDE/Propeller GCC, so that I can also pry into the assembly that GCC is generating …

DWORD propeller_adc::decimate1B (DWORD input)
{
    // Masks, per the constants in the listing below: ODD_BITS = 0xAAAAAAAA,
    // EVEN_BITS = 0x55555555, ODD_PAIRS = 0xCCCCCCCC, EVEN_PAIRS = 0x33333333.
    DWORD q2, q3;
    REG[0] = input;
    q2 = (input&EVEN_BITS)<<1;
    q3 = ((input&ODD_BITS)+((input&ODD_BITS)>>2))>>1;
    REG[1] = ((q2&ODD_PAIRS) + (q3&ODD_PAIRS))>>2;
    REG[2] = (q2&EVEN_PAIRS) + (q3&EVEN_PAIRS);
    return 0;
}

386:FFT_test.c **** DWORD propeller_adc::decimate1B (DWORD input)
1208 0545 55AAAAAA mvi r5,#-1431655766
1208 AA
1209 054a 1514 and r5, r1
1211 054c 0B7065 xmov r7,r0 mov r6,r5
1212 054f 2740 add r7, #4
1214 0551 262A shr r6, #2
1215 0553 1650 add r6, r5
1216 0555 261A shr r6, #1
1218 0557 54CCCCCC mvi r4,#-858993460
1218 CC
1219 055c D55644 xmov r5,r6 and r5,r4
1220 055f E33080 xmov r3,r0 add r3,#8
1222 0562 20C0 add r0, #12
1225 0564 117F wrlong r1, r7
1227 0566 57555555 mvi r7,#1431655765
1227 55
1229 056b 1714 and r7, r1
1230 056d 2719 shl r7, #1
1233 056f 1474 and r4, r7
1234 0571 1540 add r5, r4
1235 0573 252A shr r5, #2
1236 0575 153F wrlong r5, r3
1238 0577 55333333 mvi r5,#858993459
1238 33
1239 057c 1654 and r6, r5
1241 057e 1754 and r7, r5
1243 0580 1670 add r6, r7
1244 0582 160F wrlong r6, r0
1247 0584 B0 mov r0, #0
1248 0585 02 lret

Thus it appears that the first stage requires about 40 words (~80 bytes) of unrolled Propeller 1 GCC assembly, comprising ~26 instructions, with stages two and three expected to be similar once fully debugged and optimized. At that point I expect to have noisy data of 7 bits or so running at sysclock/8, which can be summed to give numbers identical to what others are getting. Instead of counting 6116 samples at sysclock, you would get 4 pre-filtered values every 32 clocks, which cuts the number of filtered samples to be summed for a 12-bit window from 6116 to 764.5 (6116/8). Alternatively, you can store the samples and run an FFT, a linear regression, a median filter, Student's t-test, or whatever you want on the data.
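
For anyone wanting to poke at the first stage off-chip, here is a rough Python port of the routine above. The mask values are assumptions read off the immediate constants in the GCC listing, the function name is made up, and since the original is described as not yet fully debugged, this only mirrors the arithmetic as posted rather than a finished decimator.

```python
# Mask values inferred from the listing constants (assumption).
ODD_BITS = 0xAAAAAAAA
EVEN_BITS = 0x55555555
ODD_PAIRS = 0xCCCCCCCC
EVEN_PAIRS = 0x33333333
MASK32 = 0xFFFFFFFF  # emulate 32-bit DWORD wrap-around


def decimate1b(word):
    """Return (reg1, reg2) as computed by the posted first-stage routine."""
    word &= MASK32
    odd = word & ODD_BITS
    q2 = ((word & EVEN_BITS) << 1) & MASK32
    q3 = ((odd + (odd >> 2)) & MASK32) >> 1
    reg1 = (((q2 & ODD_PAIRS) + (q3 & ODD_PAIRS)) & MASK32) >> 2
    reg2 = ((q2 & EVEN_PAIRS) + (q3 & EVEN_PAIRS)) & MASK32
    return reg1, reg2


def nibbles(value):
    """Split a 32-bit word into eight 4-bit fields, least significant first."""
    return [(value >> shift) & 0xF for shift in range(0, 32, 4)]


if __name__ == "__main__":
    reg1, reg2 = decimate1b(0xFFFFFFFF)
    # Each register carries eight small counts on 4-bit boundaries.
    print([hex(n) for n in nibbles(reg1)])
    print([hex(n) for n in nibbles(reg2)])
```

The two result words together hold the sixteen small values on 4-bit alignment described in the post; an all-ones input gives fields of at most 4, which fits in 3 bits.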

evanh · 2018-12-05 14:18

Oh, intriguing what James did with the Prop1. https://forums.parallax.com/discussion/comment/1456711/#Comment_1456711

That's a whole new trick for the Prop1. The ramping isn't every sysclock but maybe that isn't so terrible. Certainly his results look great. And, in theory, the shape can be more complex.

Guess what? The Prop2 has no equivalent mode!

Here's a commented snippet: (Inputs A and B are the same pin in James's test code)

                mov      i, looplength
                mov      j, looplength
                mov      frqb, #1              ' start rectangular (flat +1 increment) for input B
uploop          add      frqa, #1              ' start triangular (ramp the increment up and down) for input A
                djnz     i, #uploop            ' ramp up
downloop        sub      frqa, #1
                djnz     j, #downloop          ' ramp down
                mov      frqb, #0              ' stop rectangular for input B
                mov      frqa, #0              ' stop triangular for input A

                wrlong   phsa, adcp_tri        ' post triangular sample
                wrlong   phsb, adcp_rect       ' post rectangular sample
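
To make the two window shapes concrete, here is an idealised Python model of the snippet above. It assumes a counter mode in which FRQx is added to PHSx on every clock the input pin reads high, and that each ADD/SUB plus DJNZ pass takes 8 clocks (two 4-clock instructions); setup and teardown clocks are ignored. The function names are made up for illustration.

```python
def window_weights(looplength, clocks_per_step=8):
    """Idealised per-clock FRQA (triangular) and FRQB (rectangular) values.

    clocks_per_step=8 assumes each ADD/SUB + DJNZ pass takes two 4-clock
    instructions; setup and teardown clocks are ignored.
    """
    tri, frqa = [], 0
    for _ in range(looplength):          # uploop: add frqa, #1
        frqa += 1
        tri.extend([frqa] * clocks_per_step)
    for _ in range(looplength):          # downloop: sub frqa, #1
        frqa -= 1
        tri.extend([frqa] * clocks_per_step)
    rect = [1] * len(tri)                # frqb stays at 1 throughout
    return tri, rect


def accumulate(bits, weights):
    """PHSx after the window: FRQx is added on every clock the input is high."""
    return sum(w for b, w in zip(bits, weights) if b)


def normalised_readings(weights, period):
    """Readings, scaled to 0..1, for a 1-in-`period` bitstream at each phase."""
    total = sum(weights)
    return [
        sum(w for i, w in enumerate(weights) if (i + phase) % period == 0) / total
        for phase in range(period)
    ]


if __name__ == "__main__":
    tri, rect = window_weights(looplength=16)
    for name, weights in (("triangle", tri), ("rectangle", rect)):
        readings = normalised_readings(weights, period=7)
        # Spread of the reading across bitstream phases for each window.
        print(name, max(readings) - min(readings))
```

The staircase in the triangular weights reflects the point that the ramp does not step on every sysclock.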

cgracey · 2018-12-05 14:33

TonyB_,

You can use RDPIN to read a sample, and then the streamer will be able to group four together, time-aligned.

Very interesting idea about offsetting.

Now that this thing is in the smart pin, it's pipelined, since we have the flops already there. That should relax any timing pressure. It's also no problem to check for $100 and substitute $FF, so I changed that center $1F to $20.

I made this diagram of inc's and dec's for when bits move around. There has got to be some good optimization possible here:

tap	value		bit5	bit4	bit3	bit2	bit1	bit0
--------------------------------------------------------------------
0	000001							+1
1	000011						+1	
2	000101					+1	-1	
3	000111						+1	
4	001010				+1	-1		-1
5	001101					+1	-1	+1
6	010000			+1	-1	-1		-1
7	010011						+1	+1
8	010110					+1		-1
9	011001				+1	-1	-1	+1
10	011011						+1	
11	011101					+1	-1	
12	011111						+1	
13	100000		+1	-1	-1	-1	-1	-1
14	100000							
15	100000							
16	100000							
17	100000							
18	100000							
19	100000							
20	100000							
21	100000							
22	100000							
23	100000							
24	100000							
25	100000							
26	100000							
27	100000							
28	100000							
29	100000							
30	100000							
31	100000							
32	011111		-1	+1	+1	+1	+1	+1
33	011101						-1	
34	011011					-1	+1	
35	011001						-1	
36	010110				-1	+1	+1	-1
37	010011					-1		+1
38	010000						-1	-1
39	001101			-1	+1	+1		+1
40	001010					-1	+1	-1
41	000111				-1	+1		+1
42	000101						-1	
43	000011					-1	+1	
44	000001						-1	
       (000000)							-1



value	#	position
------------------------
000001	2	0,44
000011	2	1,43
000101	2	2,42
000111	2	3,41
001010	2	4,40
001101	2	5,39
010000	2	6,38
010011	2	7,37
010110	2	8,36
011001	2	9,35
011011	2	10,34
011101	2	11,33
011111	2	12,32
100000	19	13..31

#bit5 = 19
#bit4 = 14
#bit3 = 12
#bit2 = 12
#bit1 = 14
#bit0 = 20
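
The tables above can be cross-checked with a short Python sketch. The tap values come straight from the table; dividing the weighted sum by 4 and clamping $100 to $FF is an assumed reading of the earlier remark, and the function names are made up for illustration.

```python
RAMP = [1, 3, 5, 7, 10, 13, 16, 19, 22, 25, 27, 29, 31]   # taps 0..12
PLATEAU = [32] * 19                                        # taps 13..31
WEIGHTS = RAMP + PLATEAU + RAMP[::-1]                      # taps 0..44


def bit_usage(weights, width=6):
    """Count how many taps have each bit set; index 0 is bit0."""
    return [sum((w >> bit) & 1 for w in weights) for bit in range(width)]


def bit_deltas(weights, width=6):
    """Per-tap +1/-1/0 change of each bit (bit0 first) versus the previous tap.

    A final row covers the return to 000000 after the last tap.
    """
    rows, prev = [], 0
    for w in weights + [0]:
        rows.append([((w >> b) & 1) - ((prev >> b) & 1) for b in range(width)])
        prev = w
    return rows


def tukey_sample(bits):
    """Weighted sum of 45 one-bit inputs, scaled to 8 bits.

    The >> 2 scaling and the $100 -> $FF clamp are assumptions.
    """
    total = sum(w for w, b in zip(WEIGHTS, bits) if b)
    return min(total >> 2, 0xFF)


if __name__ == "__main__":
    print("taps:", len(WEIGHTS), "sum of weights:", sum(WEIGHTS))
    print("taps using bit0..bit5:", bit_usage(WEIGHTS))
```

The weights sum to 1024, so an all-ones input scales to $100 before the clamp, which is why the $FF substitution is needed.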

cgracey · 2018-12-05 14:40

I sent the new scope-in-the-smart-pin file set to ON Semi for a test compile. I was surprised how little logic the SINC3 took in the smart pin, and I'm curious to see how the SCOPE may work there.

We really need to optimize the Tukey-filter summing logic. Some huge optimization(s) must be possible.
