ADC Sampling Breakthrough - Page 27 — Parallax Forums

ADC Sampling Breakthrough


Comments

  • cgracey Posts: 14,206
    edited 2018-12-04 02:10
    Chip, please don't take my lead as an endorsement of this entire thread. But if it works and simplifies what you were going to go ahead and do anyway, I guess I'd be in favor of it. :)

    -Phil

    I know, Phil.

    I've been studying that code and I kind of get it, but what I don't understand is how would you go from one bit in to something like 8 bits out? You must allow for some bit-length expansion along the way, right? Or, input $FF for 1 and $00 for 0?

    Anything to reduce the logic in these 4-per-cog ADC channels would be welcome.
  • cgracey Posts: 14,206
    TonyB_, thanks for trying out that idea of doubling acc1 before doing acc2 and acc3. It looks to me like it's maybe kind of a sticky compromise. What's your take?
  • jmg Posts: 15,175
    cgracey wrote: »
    TonyB_, thanks for trying out that idea of doubling acc1 before doing acc2 and acc3. It looks to me like it's maybe kind of a sticky compromise. What's your take?

    It appears the non-doubling version (standard Sinc3) is the ideal, as those deviations are significant, so it comes down to logic cost.
    With the P2 adders, how many gates / LUTs do they need relative to a ripple adder, or a MUX and a flip-flop?
  • I did a quick test on the P1. Breadboarded on my Activity Board with 220pF caps instead of 1nF. Input was grounded.
                 Triangle      Rectangle
    Mean         1143.3        1153.9
    Std Dev         0.13057       4.64507
    
    
    Triangular window has a standard deviation 2.8% of the rectangular window. :sunglasses: Have we been doing ADC wrong for 12 years? I'm probably going to build a second or third order modulator next.
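A rough way to see why the window shape matters is to run an idealized first-order sigma-delta bitstream through both windows. The sketch below is a noise-free software model, not the P1 circuit from the post. The input level `P/Q`, the window length, and all function names are assumptions chosen for illustration.

```python
import statistics

def sigma_delta_bits(p, q, n):
    """First-order sigma-delta bitstream for a DC input of p/q (0 <= p <= q)."""
    acc, bits = 0, []
    for _ in range(n):
        acc += p
        bit = 1 if acc >= q else 0
        acc -= bit * q
        bits.append(bit)
    return bits

def window_estimates(bits, weights, count):
    """Weighted average of the bitstream at `count` successive start positions."""
    total = sum(weights)
    span = len(weights)
    return [sum(w * b for w, b in zip(weights, bits[a:a + span])) / total
            for a in range(count)]

M = 64
rect = [1] * (2 * M - 1)                                  # flat window, 127 taps
tri = list(range(1, M + 1)) + list(range(M - 1, 0, -1))   # 1..64..1, also 127 taps

P, Q = 2584, 4181            # arbitrary DC level (assumption), pattern period Q
bits = sigma_delta_bits(P, Q, Q + len(rect))

rect_std = statistics.pstdev(window_estimates(bits, rect, Q))
tri_std = statistics.pstdev(window_estimates(bits, tri, Q))
print("rectangular std:", rect_std)
print("triangular  std:", tri_std)
```

In this model the rectangular estimate flips between two adjacent counts depending on where the window starts, while the triangular window averages those edge effects out, so its spread comes out much smaller.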
  • cgracey Posts: 14,206
    edited 2018-12-04 08:12
    I did a quick test on the P1. Breadboarded on my Activity Board with 220pF caps instead of 1nF. Input was grounded.
                 Triangle      Rectangle
    Mean         1143.3        1153.9
    Std Dev         0.13057       4.64507
    
    
    Triangular window has a standard deviation 2.8% of the rectangular window. :sunglasses: Have we been doing ADC wrong for 12 years? I'm probably going to build a second or third order modulator next.

    We've been doing it wrong.

    I finally tested out the Sinc3 smart pin mode and it works really well.

    It seems to me that a 2nd-order modulator would be very sloppy about tracking the input. I want to see if it works for you.
  • cgracey Posts: 14,206
    Here's the test code that runs on the FPGA with the SINC3 smart pin and two real I/O pins on the pad ring test chip:
    ' SINC3 test program
    
    con		adc_pin	= 4
    		dac_pin = adc_pin+1
    		exp	= 5			'exp = 4..10, period = 1<<exp
    
    dat		org
    
    		hubset	#$FF			'select 80MHz on FPGA
    
    		wrpin	adc,#adc_pin		'set ADC+SINC3
    		wxpin	##1<<exp,#adc_pin	'set period
    
    		wrpin	dac,#dac_pin		'set DAC+dither
    		wxpin	#1, #dac_pin		'always updateable
    
    		dirh	#1<<6 + adc_pin		'enable ADC and DAC smart pins
    
    		setse1	#%001<<6 + adc_pin	'event on ADC period completion
    
    
    .loop		rep	#8,#0			'rep gets 16-clock loop for exp=4
    
    		waitse1				'wait for ADC period
    
    		rdpin	x,#adc_pin		'read acc3
    
    		sub	x,diff1			'compute diff's
    		add	diff1,x
    		sub	x,diff2
    		add	diff2,x
    
    		ror	x,#(exp*3-16) & $1F	'scale sample
    		wypin	x,#dac_pin		'write to DAC pin
    
    
    adc		long	%100011_0000000_00_11000_0	'ADC + SINC3 mode
    dac		long	%10110_00000000_01_00010_0	'DAC + noise dither mode
    
    x		res	1
    diff1		res	1
    diff2		res	1
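For readers new to the structure behind the SINC3 mode, here is a minimal software model of a third-order sinc (CIC) decimator: three cascaded accumulators running at the bit rate, followed by differences taken once per period. How the real smart pin divides the work between hardware and the two software differences in the loop above is not spelled out in the post, so this sketch simply does all three differences in software. The function and variable names are hypothetical.

```python
def sinc3_decimate(bits, period):
    """Third-order sinc decimator: returns one sample per `period` input bits."""
    acc1 = acc2 = acc3 = 0          # integrators, updated on every bit
    diff1 = diff2 = diff3 = 0       # previous values for the differentiators
    samples = []
    for i, bit in enumerate(bits, start=1):
        acc1 += bit
        acc2 += acc1
        acc3 += acc2
        if i % period == 0:
            x = acc3
            # Same idiom as "sub x,diff / add diff,x": x becomes the
            # difference, diff becomes the value x had before subtracting.
            x, diff1 = x - diff1, x
            x, diff2 = x - diff2, x
            x, diff3 = x - diff3, x
            samples.append(x)
    return samples

# Python integers never wrap. Hardware accumulators do, which is harmless for
# this structure as long as they are wide enough to hold period**3.
period = 32
out = sinc3_decimate([1] * (period * 5), period)
print(out)   # the first outputs are settling transients
```

With a period of `1<<exp` the DC gain is `period**3`, i.e. `2**(3*exp)`, which is consistent with the `exp*3` term in the scaling rotate.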
    
  • jmg Posts: 15,175
    edited 2018-12-04 08:36
    cgracey wrote: »
    I finally tested out the Sinc3 smart pin mode and it works really well.
    Here's the test code that runs on the FPGA with the SINC3 smart pin and two real I/O pins on the pad ring test chip:
    Great to hear.

    What ENOB do you get on the test chip ? How does that compare with the logged P2 Devices ?

  • cgracey Posts: 14,206
    jmg wrote: »
    cgracey wrote: »
    I finally tested out the Sinc3 smart pin mode and it works really well.
    Here's the test code that runs on the FPGA with the SINC3 smart pin and two real I/O pins on the pad ring test chip:
    Great to hear.

    What ENOB do you get on the test chip ? How does that compare with the logged P2 Devices ?

    Well, it's hard to say, because the FPGA board has about 50 mV of 471 kHz noise on its 3.3V supply that feeds the pins. Even so, the consistency of readings I'm getting is about like this:

    16 counts = 7 bits
    32 counts = 8 bits
    64 counts = 9 bits
    128 counts = 11 bits
    256 counts = 12 bits
    512 counts = 12 bits (1/f noise really increases)
    1024 counts = 13 bits

    Okay! I just wired in a quiet 3.3V regulator and things are looking WAY better:

    16 counts = 8 bits
    32 counts = 10 bits
    64 counts = 12 bits
    128 counts = 13 bits
    256 counts = 13 bits
    512 counts = 13 bits (1/f noise really increases)
    1024 counts = 13 bits

    So, noise doesn't let us get beyond ~13 bits.
  • evanh Posts: 16,032
    Just from previous outcomes, I'm guessing Tukey will do better. At the cost of more lag.
  • cgracey wrote: »
    But did you see those later pics of the super accurate and quiet ramp and sine signals from the pad ring test chip running on the FPGA board? I was really surprised. This means that the digital ground noise in the substrate of the actual P2 die is demolishing our SNR. If we could quiet down that ground noise, the ADC performance would be fantastic. It should be possible in this next revision to improve noise isolation a little bit.

    Does the new P2-Eval board have better ground noise performance than the P2D2 board? Are the layout and capacitors helping to get better SNR? I haven't seen any information about how that board performs. Is nobody using it yet?
  • Publison Posts: 12,366
    edited 2018-12-04 14:10
    Ramon wrote: »
    cgracey wrote: »
    But did you see those later pics of the super accurate and quiet ramp and sine signals from the pad ring test chip running on the FPGA board? I was really surprised. This means that the digital ground noise in the substrate of the actual P2 die is demolishing our SNR. If we could quiet down that ground noise, the ADC performance would be fantastic. It should be possible in this next revision to improve noise isolation a little bit.

    Does the new P2-Eval board have better ground noise performance than the P2D2 board? Are the layout and capacitors helping to get better SNR? I haven't seen any information about how that board performs. Is nobody using it yet?

    All the P2-Eval boards are still at Parallax. There are none in the wild yet. Chip is using a board for tests.
  • evanh Posts: 16,032
    Ramon wrote: »
    Does the new P2-Eval board have better ground noise performance than the P2D2 board? Are the layout and capacitors helping to get better SNR? I haven't seen any information about how that board performs.

    Yes and yes. Chip has commented a couple of times but not detailed yet.

  • cgracey Posts: 14,206
    edited 2018-12-04 16:03
    I experimented with binomial filters, but they didn't seem to do much filtering for the amount of logic involved. I first made a "1 1" and that wasn't too hot, so I made this "1 2 1" which also was lousy. Perhaps I just didn't do it right. Can anyone look at this that knows about these things and see if there's a problem in my implementation?
    reg [2:0][0:0] f1;
    reg [2:0][2:0] f2;
    reg [2:0][4:0] f3;
    reg [2:0][6:0] f4;
    reg [2:0][8:0] f5;
    reg [2:0][10:0] f6;
    reg [2:0][12:0] f7;
    reg [2:0][14:0] f8;
    reg [2:0][16:0] f9;
    reg [2:0][18:0] f10;
    reg [2:0][20:0] f11;
    reg [2:0][22:0] f12;
    reg [2:0][24:0] f13;
    reg [2:0][26:0] f14;
    reg [2:0][28:0] f15;
    reg [2:0][30:0] f16;
    
    `regscan (f1[0], 1'b0, ena, pin_in[cfg[5:0]])
    `regscan (f1[1], 1'b0, ena, f1[0])
    `regscan (f1[2], 1'b0, ena, f1[1])
    
    `regscan (f2[0], 1'b0, ena, f1[0] + (f1[1] << 1) + f1[2])
    `regscan (f2[1], 1'b0, ena, f2[0])
    `regscan (f2[2], 1'b0, ena, f2[1])
    
    `regscan (f3[0], 1'b0, ena, f2[0] + (f2[1] << 1) + f2[2])
    `regscan (f3[1], 1'b0, ena, f3[0])
    `regscan (f3[2], 1'b0, ena, f3[1])
    
    `regscan (f4[0], 1'b0, ena, f3[0] + (f3[1] << 1) + f3[2])
    `regscan (f4[1], 1'b0, ena, f4[0])
    `regscan (f4[2], 1'b0, ena, f4[1])
    
    `regscan (f5[0], 1'b0, ena, f4[0] + (f4[1] << 1) + f4[2])
    `regscan (f5[1], 1'b0, ena, f5[0])
    `regscan (f5[2], 1'b0, ena, f5[1])
    
    `regscan (f6[0], 1'b0, ena, f5[0] + (f5[1] << 1) + f5[2])
    `regscan (f6[1], 1'b0, ena, f6[0])
    `regscan (f6[2], 1'b0, ena, f6[1])
    
    `regscan (f7[0], 1'b0, ena, f6[0] + (f6[1] << 1) + f6[2])
    `regscan (f7[1], 1'b0, ena, f7[0])
    `regscan (f7[2], 1'b0, ena, f7[1])
    
    `regscan (f8[0], 1'b0, ena, f7[0] + (f7[1] << 1) + f7[2])
    `regscan (f8[1], 1'b0, ena, f8[0])
    `regscan (f8[2], 1'b0, ena, f8[1])
    
    `regscan (f9[0], 1'b0, ena, f8[0] + (f8[1] << 1) + f8[2])
    `regscan (f9[1], 1'b0, ena, f9[0])
    `regscan (f9[2], 1'b0, ena, f9[1])
    
    `regscan (f10[0], 1'b0, ena, f9[0] + (f9[1] << 1) + f9[2])
    `regscan (f10[1], 1'b0, ena, f10[0])
    `regscan (f10[2], 1'b0, ena, f10[1])
    
    `regscan (f11[0], 1'b0, ena, f10[0] + (f10[1] << 1) + f10[2])
    `regscan (f11[1], 1'b0, ena, f11[0])
    `regscan (f11[2], 1'b0, ena, f11[1])
    
    `regscan (f12[0], 1'b0, ena, f11[0] + (f11[1] << 1) + f11[2])
    `regscan (f12[1], 1'b0, ena, f12[0])
    `regscan (f12[2], 1'b0, ena, f12[1])
    
    `regscan (f13[0], 1'b0, ena, f12[0] + (f12[1] << 1) + f12[2])
    `regscan (f13[1], 1'b0, ena, f13[0])
    `regscan (f13[2], 1'b0, ena, f13[1])
    
    `regscan (f14[0], 1'b0, ena, f13[0] + (f13[1] << 1) + f13[2])
    `regscan (f14[1], 1'b0, ena, f14[0])
    `regscan (f14[2], 1'b0, ena, f14[1])
    
    `regscan (f15[0], 1'b0, ena, f14[0] + (f14[1] << 1) + f14[2])
    `regscan (f15[1], 1'b0, ena, f15[0])
    `regscan (f15[2], 1'b0, ena, f15[1])
    
    `regscan (f16[0], 1'b0, ena, f15[0] + (f15[1] << 1) + f15[2])
    `regscan (f16[1], 1'b0, ena, f16[0])
    `regscan (f16[2], 1'b0, ena, f16[1])
    
    wire [30:0] sum = (f16[0] + (f16[1] << 1) + f16[2] + 1'b1) >> 1;
    
    `regscan (sample, 8'b0, ena, sum[30:23])
    

    Our Tukey window is 1/4 the logic and works 10x better. Maybe I didn't do it right, though.
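One way to sanity-check the structure is to expand the cascade in software. The Verilog applies the "1 2 1" sum 16 times (15 register stages, f2 through f16, plus the final `sum`), so the overall impulse response should be row 32 of Pascal's triangle. The sketch below only checks that arithmetic. It does not model `regscan` or how the synthesis tool sizes intermediate expressions, neither of which is shown in the post.

```python
from math import comb

def convolve(a, b):
    """Plain polynomial multiplication of two coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

kernel = [1]
for _ in range(16):                 # sixteen "1 2 1" stages
    kernel = convolve(kernel, [1, 2, 1])

dc_gain = sum(kernel)               # weighted sum for an all-ones input
bits_needed = dc_gain.bit_length()  # register width needed to hold it

print(len(kernel), "taps, DC gain", dc_gain, "needs", bits_needed, "bits")
print("centre tap share:", kernel[16] / dc_gain)
```

An all-ones input would drive the final weighted sum to 2**32 before the halving, which is wider than the 31-bit `sum` wire. That only bites near full scale, but it may be worth ruling out when looking for an implementation problem.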
  • evanh Posts: 16,032
    I don't even know what the double [][] is doing.

  • cgracey Posts: 14,206
    evanh wrote: »
    I don't even know what the double [][] is doing.

    "reg [2:0][8:0] f5" means there are three f5 registers (f5[0], f5[1], f5[2]) that are each 9 bits wide.
  • jmg Posts: 15,175
    cgracey wrote: »
    Okay! I just wired in a quiet 3.3V regulator and things are looking WAY better:

    16 counts = 8 bits
    32 counts = 10 bits
    64 counts = 12 bits
    128 counts = 13 bits
    256 counts = 13 bits
    512 counts = 13 bits (1/f noise really increases)
    1024 counts = 13 bits

    So, noise doesn't let us get beyond ~13 bits.

    Is that the same (low noise TI) regulator as on the Eval Boards ?

    Those are good numbers, ie hitting around 13 bits at 1MHz (which I think those numbers say) is quite good.
    Be interesting to run the same test, with same regulator, on the P2 EV board, (even if via sampling) to compare the ADCs

    Q: if you can hit 8b with 16 samples, do you need the scope mode silicon ? - given the analog IP bandwidth seems to be the ceiling anyway ?
  • cgracey Posts: 14,206
    jmg wrote: »
    cgracey wrote: »
    Okay! I just wired in a quiet 3.3V regulator and things are looking WAY better:

    16 counts = 8 bits
    32 counts = 10 bits
    64 counts = 12 bits
    128 counts = 13 bits
    256 counts = 13 bits
    512 counts = 13 bits (1/f noise really increases)
    1024 counts = 13 bits

    So, noise doesn't let us get beyond ~13 bits.

    Is that the same (low noise TI) regulator as on the Eval Boards ?

    Those are good numbers, ie hitting around 13 bits at 1MHz (which I think those numbers say) is quite good.
    Be interesting to run the same test, with same regulator, on the P2 EV board, (even if via sampling) to compare the ADCs

    Q: if you can hit 8b with 16 samples, do you need the scope mode silicon ? - given the analog IP bandwidth seems to be the ceiling anyway ?

    I jumpered a regulator from the new eval board over to the FPGA board for its 3.3 volt power.

    I, too, am rethinking the live ADC. ON Semi is running some compilation tests today and I'm waiting to hear how much logic the whole live scope thing requires.
  • cgracey wrote: »
    I experimented with binomial filters, but they didn't seem to do much filtering for the amount of logic involved. I first made a "1 1" and that wasn't too hot, so I made this "1 2 1" which also was lousy. Perhaps I just didn't do it right. Can anyone look at this that knows about these things and see if there's a problem in my implementation?

    ...

    Our Tukey window is 1/4 the logic and works 10x better. Maybe I didn't do it right, though.
    The binomial method seems unsuitable for longer filters. To match the length of the Tukey we would need 44 adders. The first and last 10 samples are basically too low to contribute much to the output. So the effective length is only 25 samples.

    Then there is the problem of bit growth. It should be fine to round or truncate the rest once the sums get to 10 bits or so. Maybe that would reduce the logic by half.
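The point about the outer samples can be checked numerically. The sketch below assumes that "44 adders" means 44 cascaded "1 1" stages, which gives 45 binomial weights, and measures how much of the total weight sits in the first and last 10 taps. The function name is hypothetical.

```python
from math import comb

def tail_fraction(order, skip):
    """Return (taps left after dropping `skip` from each end, weight share dropped)."""
    weights = [comb(order, k) for k in range(order + 1)]
    tails = sum(weights[:skip]) + sum(weights[-skip:])
    return len(weights) - 2 * skip, tails / sum(weights)

effective_taps, fraction = tail_fraction(44, 10)
print(effective_taps, "taps carry all but", fraction, "of the total weight")
```

Under that assumption the 20 outer taps together hold well under 0.1% of the weight, which matches the estimate of an effective length of about 25 samples.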
  • cgracey wrote: »
    TonyB_, thanks for trying out that idea of doubling acc1 before doing acc2 and acc3. It looks to me like it's maybe kind of a sticky compromise. What's your take?

    Chip, 1-bit input has the best quality, so that should be our choice, i.e. no change. The whole point of trying 2-bit was to reduce the logic, but we could do that with 1-bit and 24-bit counters to a similar extent. If somehow the Sinc3 adders could run at twice the ADC bit rate then I think 2-bit mode would use the least logic.
  • cgracey wrote: »
    jmg wrote: »
    Q: if you can hit 8b with 16 samples, do you need the scope mode silicon ? - given the analog IP bandwidth seems to be the ceiling anyway ?

    I, too, am rethinking the live ADC. ON Semi is running some compilation tests today and I'm waiting to hear how much logic the whole live scope thing requires.

    Two channels would need less logic than four, although probably not exactly half.
    cgracey wrote: »
    Note to self: Make a trigger mechanism with hysteresis for scope-like triggering on ADC channel data. It drops the current write address into a register and causes an event.

    If there is not enough room for any more, could we have one Tukey, perhaps quite crude, in each cog for triggering?
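As a purely illustrative software sketch of the quoted note, a rising-edge trigger with hysteresis could look like the following. The two thresholds, the function name, and the use of a list index as the "write address" are all assumptions, not a description of any planned hardware.

```python
def find_triggers(samples, arm_below, fire_at):
    """Rising-edge trigger with hysteresis.

    The trigger arms once a sample falls to `arm_below` or lower, then fires
    on the first later sample that reaches `fire_at`. Each firing records the
    sample's address (here simply its index) and disarms the trigger.
    """
    if arm_below >= fire_at:
        raise ValueError("arm_below must be lower than fire_at")
    armed = False
    addresses = []
    for addr, value in enumerate(samples):
        if value <= arm_below:
            armed = True
        elif armed and value >= fire_at:
            addresses.append(addr)
            armed = False
    return addresses

hits = find_triggers([0, 5, 9, 12, 11, 12, 3, 13], arm_below=4, fire_at=12)
print(hits)
```

The gap between the two thresholds is what keeps noise riding on a slow edge from firing the trigger repeatedly.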
  • Actually, it's three bits, a lossless summation of 1+2+1=4. The first summation (the ones in the odd positions) requires a half adder for the ones, which consists of an exclusive-OR (which usually comes with an AND gate that also generates a carry for us into the even positions). The result is 16 three-bit values for every 32 bits in, giving a [1,2,1] kernel with decimation by 2 (to half sysclock) in one step:
    
    
    #define EVEN_BITS	(0x55555555)
    #define ODD_BITS   	(0xaaaaaaaa)
    #define EVEN_PAIRS	(0x33333333)
    #define ODD_PAIRS	(0xcccccccc)
    #define EVEN_NIBBLES	(0xf0f0f0f0)
    #define ODD_NIBBLES	(0x0f0f0f0f)
    #define EVEN_BYTES  (0x00ff00ff)
    #define ODD_BYTES   (0xff00ff00)
    
    #define UINT unsigned int
    #define DWORD       unsigned int
    #define MATH_TYPE   float
    
    #define HIWORD(arg)   ((arg)&(0xffff0000))
    #define LOWORD(arg)   ((arg)&(0x0000ffff))
    
    class propeller_adc
    {
    public:
        bool carry;
        unsigned int REG[8];
    
        void reset ();
        unsigned int iterate (bool sample);
        DWORD decimate1 (DWORD input);
        DWORD decimate2 (DWORD input);
        DWORD decimate3 (DWORD input);
        void print_bytes (int regid);
        void print_bytes2 (int regid);
        void print_nibbles ();
    };
    
    DWORD propeller_adc::decimate1 (DWORD input)
    {
    //static bool carry;
    DWORD  q0, q1, q2, q3;
    DWORD  r0, r1, r2, r3;
    DWORD  s0, s1;
    
    // format b31.....b0, even bits have weight = 1
    // odd bits have weight = 0.5*2, phase one - decimate by 2 using
    // a [1,2,1] convolutional kernel yielding 16 3 bit values from a
    // single DWORD containing 32 individual one bit input samples
    
    REG[0] = input;
    
    q0 = input&EVEN_BITS;
    q1 = input&ODD_BITS;
    // multiply all even bits by 2
    q2 = q0<<1;
    // add all of the odd bits with the appropriate
    // alternate neighbor
    q3 = (q1+(q1>>2))>>1;
    // pick off odd pairs and even pairs since the 
    // next addition can result in a carry into the
    // third bit - so that in the end we want to pack
    // the resulting 16 three bit values into nibbles
    r0 = q2&EVEN_PAIRS;
    r2 = q3&EVEN_PAIRS;
    s0 = r0+r2;
    r1 = q2&ODD_PAIRS;
    r3 = q3&ODD_PAIRS;
    s1 = (r1+r3)>>2;
    
    REG[1]=s1;
    REG[2]=s0;
    
    return 0;
    }
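For comparison, here is a plain reference model of the same idea: a [1,2,1] kernel centred on the even bits with decimation by 2. It is not a port of the mask-and-shift code above, and it is not expected to reproduce its packing. The bit ordering (bit 0 is the oldest sample) and the `prev_msb` carry-in from the previous word are assumptions.

```python
def decimate_121(word, prev_msb=0):
    """Return 16 values y[k] = b[2k-1] + 2*b[2k] + b[2k+1] for a 32-bit word.

    b[-1] is taken from `prev_msb`, the last bit of the previous word.
    Each value lies in the range 0..4, so it fits in three bits.
    """
    bits = [(word >> i) & 1 for i in range(32)]
    out = []
    for k in range(16):
        left = bits[2 * k - 1] if k > 0 else prev_msb
        out.append(left + 2 * bits[2 * k] + bits[2 * k + 1])
    return out

print(decimate_121(1 << 4))   # an even bit lands in one bin with weight 2
print(decimate_121(1 << 5))   # an odd bit is shared by two neighbouring bins
```

Bit 31 of each word also belongs to the first bin of the next word, which is where the carry propagation between words comes in.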
    


    Here is one test case which shows that summation occurs with the correct weights. For those who REALLY want or need to sum 6116 bits to obtain a long-term moving average, you can still do so, but now with half the number of operations, since half the work has already been done. For example, try feeding this stream into a 3113 window where you are currently doing 6116 additions; the result should be the same or better, even if I still need to work out carry propagation.
    R0:00000000000000000000000000000001 R1/2:  0  0  0  0  0  0  0  0 
     R0:00000000000000000000000000000010 R1/2:  0  0  0  0  0  0  0  0 
     R0:00000000000000000000000000000100 R1/2:  0  0  0  0  0  0  0  0 
     R0:00000000000000000000000000001000 R1/2:  0  0  0  0  0  0  0  0 
     R0:00000000000000000000000000010000 R1/2:  0  0  0  0  0  0  0  4 
     R0:00000000000000000000000000100000 R1/2:  0  0  0  0  0  0  0  2 
     R0:00000000000000000000000001000000 R1/2:  0  0  0  0  0  0  2  2 
     R0:00000000000000000000000010000000 R1/2:  0  0  0  0  0  0  1  3 
     R0:00000000000000000000000100000000 R1/2:  0  0  0  0  0  0  4  0 
     R0:00000000000000000000001000000000 R1/2:  0  0  0  0  0  0  3  1 
     R0:00000000000000000000010000000000 R1/2:  0  0  0  0  0  0  2  2 
     R0:00000000000000000000100000000000 R1/2:  0  0  0  0  0  0  3  1 
     R0:00000000000000000001000000000000 R1/2:  0  0  0  0  0  4  0  0 
     R0:00000000000000000010000000000000 R1/2:  0  0  0  0  0  2  1  1 
     R0:00000000000000000100000000000000 R1/2:  0  0  0  0  2  2  0  0 
     R0:00000000000000001000000000000000 R1/2:  0  0  0  0  1  3  0  0 
     R0:00000000000000010000000000000000 R1/2:  0  0  0  0  4  0  0  0 
     R0:00000000000000100000000000000000 R1/2:  0  0  0  0  3  1  0  0 
     R0:00000000000001000000000000000000 R1/2:  0  0  0  0  2  2  0  0 
     R0:00000000000010000000000000000000 R1/2:  0  0  0  0  3  1  0  0 
     R0:00000000000100000000000000000000 R1/2:  0  0  0  4  0  0  0  0 
     R0:00000000001000000000000000000000 R1/2:  0  0  0  2  1  1  0  0 
     R0:00000000010000000000000000000000 R1/2:  0  0  2  2  0  0  0  0 
     R0:00000000100000000000000000000000 R1/2:  0  0  1  3  0  0  0  0 
     R0:00000001000000000000000000000000 R1/2:  0  0  4  0  0  0  0  0 
     R0:00000010000000000000000000000000 R1/2:  0  0  3  1  0  0  0  0 
     R0:00000100000000000000000000000000 R1/2:  0  0  2  2  0  0  0  0 
     R0:00001000000000000000000000000000 R1/2:  0  0  3  1  0  0  0  0 
     R0:00010000000000000000000000000000 R1/2:  0  4  0  0  0  0  0  0 
     R0:00100000000000000000000000000000 R1/2:  0  2  1  1  0  0  0  0 
     R0:01000000000000000000000000000000 R1/2:  2  2  0  0  0  0  0  0 
     R0:10000000000000000000000000000000 R1/2:  1  3  0  0  0  0  0  0 
     R0:00000000000000000000000000000011 R1/2:  0  0  0  0  0  0  0  0 
     R0:00000000000000000000000000000110 R1/2:  0  0  0  0  0  0  0  0 
     R0:00000000000000000000000000001100 R1/2:  0  0  0  0  0  0  0  0 
     R0:00000000000000000000000000011000 R1/2:  0  0  0  0  0  0  0  4 
     R0:00000000000000000000000000110000 R1/2:  0  0  0  0  0  0  0  6 
     R0:00000000000000000000000001100000 R1/2:  0  0  0  0  0  0  2  4 
     R0:00000000000000000000000011000000 R1/2:  0  0  0  0  0  0  3  5 
     R0:00000000000000000000000110000000 R1/2:  0  0  0  0  0  0  5  3 
     R0:00000000000000000000001100000000 R1/2:  0  0  0  0  0  0  7  1 
     R0:00000000000000000000011000000000 R1/2:  0  0  0  0  0  0  5  3 
     R0:00000000000000000000110000000000 R1/2:  0  0  0  0  0  0  5  3 
     R0:00000000000000000001100000000000 R1/2:  0  0  0  0  0  4  3  1 
     R0:00000000000000000011000000000000 R1/2:  0  0  0  0  0  6  1  1 
     R0:00000000000000000110000000000000 R1/2:  0  0  0  0  2  4  1  1 
     R0:00000000000000001100000000000000 R1/2:  0  0  0  0  3  5  0  0 
     R0:00000000000000011000000000000000 R1/2:  0  0  0  0  5  3  0  0 
     R0:00000000000000110000000000000000 R1/2:  0  0  0  0  7  1  0  0 
     R0:00000000000001100000000000000000 R1/2:  0  0  0  0  5  3  0  0 
     R0:00000000000011000000000000000000 R1/2:  0  0  0  0  5  3  0  0 
     R0:00000000000110000000000000000000 R1/2:  0  0  0  4  3  1  0  0 
     R0:00000000001100000000000000000000 R1/2:  0  0  0  6  1  1  0  0 
     R0:00000000011000000000000000000000 R1/2:  0  0  2  4  1  1  0  0 
     R0:00000000110000000000000000000000 R1/2:  0  0  3  5  0  0  0  0 
     R0:00000001100000000000000000000000 R1/2:  0  0  5  3  0  0  0  0 
     R0:00000011000000000000000000000000 R1/2:  0  0  7  1  0  0  0  0 
     R0:00000110000000000000000000000000 R1/2:  0  0  5  3  0  0  0  0 
     R0:00001100000000000000000000000000 R1/2:  0  0  5  3  0  0  0  0 
     R0:00011000000000000000000000000000 R1/2:  0  4  3  1  0  0  0  0 
     R0:00110000000000000000000000000000 R1/2:  0  6  1  1  0  0  0  0 
     R0:01100000000000000000000000000000 R1/2:  2  4  1  1  0  0  0  0 
     R0:11000000000000000000000000000000 R1/2:  3  5  0  0  0  0  0  0 
     R0:10000000000000000000000000000001 R1/2:  1  3  0  0  0  0  0  0 
     R0:00000000000000000000000000000111 R1/2:  0  0  0  0  0  0  0  0 
     R0:00000000000000000000000000001110 R1/2:  0  0  0  0  0  0  0  0 
     R0:00000000000000000000000000011100 R1/2:  0  0  0  0  0  0  0  4 
     R0:00000000000000000000000000111000 R1/2:  0  0  0  0  0  0  0  6 
     R0:00000000000000000000000001110000 R1/2:  0  0  0  0  0  0  2  8 
     R0:00000000000000000000000011100000 R1/2:  0  0  0  0  0  0  3  7 
     R0:00000000000000000000000111000000 R1/2:  0  0  0  0  0  0  7  5 
     R0:00000000000000000000001110000000 R1/2:  0  0  0  0  0  0  8  4 
     R0:00000000000000000000011100000000 R1/2:  0  0  0  0  0  0  9  3 
     R0:00000000000000000000111000000000 R1/2:  0  0  0  0  0  0  8  4 
     R0:00000000000000000001110000000000 R1/2:  0  0  0  0  0  4  5  3 
     R0:00000000000000000011100000000000 R1/2:  0  0  0  0  0  6  4  2 
     R0:00000000000000000111000000000000 R1/2:  0  0  0  0  2  8  1  1 
     R0:00000000000000001110000000000000 R1/2:  0  0  0  0  3  7  1  1 
     R0:00000000000000011100000000000000 R1/2:  0  0  0  0  7  5  0  0 
     R0:00000000000000111000000000000000 R1/2:  0  0  0  0  8  4  0  0 
     R0:00000000000001110000000000000000 R1/2:  0  0  0  0  9  3  0  0 
     R0:00000000000011100000000000000000 R1/2:  0  0  0  0  8  4  0  0 
     R0:00000000000111000000000000000000 R1/2:  0  0  0  4  5  3  0  0 
     R0:00000000001110000000000000000000 R1/2:  0  0  0  6  4  2  0  0 
     R0:00000000011100000000000000000000 R1/2:  0  0  2  8  1  1  0  0 
     R0:00000000111000000000000000000000 R1/2:  0  0  3  7  1  1  0  0 
     R0:00000001110000000000000000000000 R1/2:  0  0  7  5  0  0  0  0 
     R0:00000011100000000000000000000000 R1/2:  0  0  8  4  0  0  0  0 
     R0:00000111000000000000000000000000 R1/2:  0  0  9  3  0  0  0  0 
     R0:00001110000000000000000000000000 R1/2:  0  0  8  4  0  0  0  0 
     R0:00011100000000000000000000000000 R1/2:  0  4  5  3  0  0  0  0 
     R0:00111000000000000000000000000000 R1/2:  0  6  4  2  0  0  0  0 
     R0:01110000000000000000000000000000 R1/2:  2  8  1  1  0  0  0  0 
     R0:11100000000000000000000000000000 R1/2:  3  7  1  1  0  0  0  0 
     R0:11000000000000000000000000000001 R1/2:  3  5  0  0  0  0  0  0 
     R0:10000000000000000000000000000011 R1/2:  1  3  0  0  0  0  0  0 
     R0:00000000000000000000000000001111 R1/2:  0  0  0  0  0  0  0  0 
     R0:00000000000000000000000000011110 R1/2:  0  0  0  0  0  0  0  4 
     R0:00000000000000000000000000111100 R1/2:  0  0  0  0  0  0  0  6 
     R0:00000000000000000000000001111000 R1/2:  0  0  0  0  0  0  2  8 
     R0:00000000000000000000000011110000 R1/2:  0  0  0  0  0  0  3  b 
     R0:00000000000000000000000111100000 R1/2:  0  0  0  0  0  0  7  7 
     R0:00000000000000000000001111000000 R1/2:  0  0  0  0  0  0  a  6 
     R0:00000000000000000000011110000000 R1/2:  0  0  0  0  0  0  a  6 
     R0:00000000000000000000111100000000 R1/2:  0  0  0  0  0  0  c  4 
     R0:00000000000000000001111000000000 R1/2:  0  0  0  0  0  4  8  4 
     R0:00000000000000000011110000000000 R1/2:  0  0  0  0  0  6  6  4 
     R0:00000000000000000111100000000000 R1/2:  0  0  0  0  2  8  4  2 
     R0:00000000000000001111000000000000 R1/2:  0  0  0  0  3  b  1  1 
     R0:00000000000000011110000000000000 R1/2:  0  0  0  0  7  7  1  1 
     R0:00000000000000111100000000000000 R1/2:  0  0  0  0  a  6  0  0 
     R0:00000000000001111000000000000000 R1/2:  0  0  0  0  a  6  0  0 
     R0:00000000000011110000000000000000 R1/2:  0  0  0  0  c  4  0  0 
     R0:00000000000111100000000000000000 R1/2:  0  0  0  4  8  4  0  0 
     R0:00000000001111000000000000000000 R1/2:  0  0  0  6  6  4  0  0 
     R0:00000000011110000000000000000000 R1/2:  0  0  2  8  4  2  0  0 
     R0:00000000111100000000000000000000 R1/2:  0  0  3  b  1  1  0  0 
     R0:00000001111000000000000000000000 R1/2:  0  0  7  7  1  1  0  0 
     R0:00000011110000000000000000000000 R1/2:  0  0  a  6  0  0  0  0 
     R0:00000111100000000000000000000000 R1/2:  0  0  a  6  0  0  0  0 
     R0:00001111000000000000000000000000 R1/2:  0  0  c  4  0  0  0  0 
     R0:00011110000000000000000000000000 R1/2:  0  4  8  4  0  0  0  0 
     R0:00111100000000000000000000000000 R1/2:  0  6  6  4  0  0  0  0 
     R0:01111000000000000000000000000000 R1/2:  2  8  4  2  0  0  0  0 
     R0:11110000000000000000000000000000 R1/2:  3  b  1  1  0  0  0  0 
     R0:11100000000000000000000000000001 R1/2:  3  7  1  1  0  0  0  0 
     R0:11000000000000000000000000000011 R1/2:  3  5  0  0  0  0  0  0 
     R0:10000000000000000000000000000111 R1/2:  1  3  0  0  0  0  0  0 
    


    OK - this part is starting to look pretty solid. Phase two and Phase three are a work in progress.
    DWORD propeller_adc::decimate2 (DWORD input)
    {
    // phase two - apply the same transformation
    // to the 16 three bit values, yielding
    // eight five bit values, having a range [0..16]
    DWORD  q0, q1, q2, q3;
    DWORD  s0, s1;
    
    q0 = REG[1]&EVEN_NIBBLES;
    q1 = (REG[1]&ODD_NIBBLES)>>4;
    q2 = REG[2]&EVEN_NIBBLES;
    q3 = (REG[2]&ODD_NIBBLES)>>4;
    
    // TODO: fix nibble order to get things in the correct bins, although this gives an interesting
    // high peaking response for quick settling time on transient input with no effect on
    // the long term average
    
    s0 = q0+q1+(q2<<1);
    s1 = q0+q1+(q3<<1);
      REG[3]=s1>>4;
      REG[4]=s0>>4;
    	return 0;
    }
    
    DWORD propeller_adc::decimate3 (DWORD input)
    {
      DWORD acc;
    // finally repack into a 32 bit register
    // for an input rate of 250 Mbps this results
    // in an initial output rate of 31.25 Msamples/s
    DWORD  q0, q1, q2, q3;
    DWORD  s0, s1;
    
    q0 = REG[3]&EVEN_BYTES;
    q1 = (REG[3]&ODD_BYTES)>>8; 
    q2 = REG[4]&EVEN_BYTES;
    q3 = (REG[4]&ODD_BYTES)>>8;
    
    // HMMM... FIXME? DEFINITELY BROKEN HERE!!
    
    s0 = q0+q1+(q2<<1);
    s1 = q0+q1+(q3<<1);
      REG[5]=s1;
      REG[6]=s0;
      
      // '+' binds tighter than '<<', so the shift needs its own parentheses
      acc = (s0<<16)+s1;
      REG[7]=acc;
    	return acc;
    }
    





  • cgracey Posts: 14,206
    edited 2018-12-05 08:24
    Wendy at ON Semi ran some test compiles to weigh the new design.

    It seems that even without SINC3 and the 4-channel-scope-per-cog, we had already grown quite a bit.

    Here is where we are at:

    Note: 'sequential' means flipflop
    Note: 'area' is square um, so 1,000,000 = 1 square mm
    
    Original Design, current P2 silicon:
     
    Type           Instances         Area  Area %
    ---------------------------------------------
    timing_model          92 37049021.213   72.3
    sequential         58246  4655514.931    9.1
    inverter           74356   737807.974    1.4
    buffer             15359   242666.189    0.5
    logic             469886  8569315.686   16.7
    physical_cells         0        0.000    0.0
    ---------------------------------------------
    total             617939 51254325.994  100.0
    
    
    New Design with SINC3 and SCOPE:
     
    Type           Instances         Area  Area %
    ---------------------------------------------
    timing_model          91 36815952.439   69.0
    sequential         61112  4871372.083    9.1
    inverter           92183   908669.798    1.7
    buffer             21647   339857.101    0.6
    logic             559646 10389258.163   19.5
    physical_cells         0        0.000    0.0
    ---------------------------------------------
    total             734679 53325109.584  100.0
    
    
    New Design without SINC3 smart pin
    
    Type           Instances         Area  Area %
    ---------------------------------------------
    timing_model          91 36815952.439   69.1
    sequential         61112  4881813.709    9.2
    inverter           90045   889771.008    1.7
    buffer             21978   340910.797    0.6
    logic             554207 10318409.651   19.4
    physical_cells         0        0.000    0.0
    ---------------------------------------------
    total             727433 53246857.604  100.0
    
    
    New Design without 4-channel SCOPE per cog
    
    Type           Instances         Area  Area %
    ---------------------------------------------
    timing_model          91 36815952.439   70.1
    sequential         59129  4729178.317    9.0
    inverter           83262   819040.410    1.6
    buffer             19948   308145.869    0.6
    logic             530745  9811739.930   18.7
    physical_cells         0        0.000    0.0
    ---------------------------------------------
    total             693175 52484056.964  100.0
    
    
    
    	Cost of SINC3
    	----------------------
    	sequential           0
    	inverter          2138
    	buffer            -331
    	logic             5439
    	----------------------
    	total             7246
    	area             78252
    
    
    
    	Cost of SCOPE
    	----------------------
    	sequential        1983
    	inverter          8921
    	buffer            1699
    	logic            28901
    	----------------------
    	total            41504
    	area            841052
    


    Note that SINC3 turns out to be very little logic. 5439/64 is only 85 logic cells per smart pin added. And it didn't use any new flops, since the smart pin supplied them.


    The 4-channel scope, on the other hand, is a real pig. 28901 / 4 channels / 8 cogs = 903 logic cells per channel, which seems WAY too big. I'm wondering two things:

    (1) Is the tool generating a lot of extra circuitry in order to make timing? If I pipelined the 1's counts before final summing (requires < 36 flipflops per channel), might things relax and would net logic/buffering requirements go down?

    (2) If this scope function went into each smart pin, it wouldn't need any new flops and things could be pipelined to relax timing. However, each cog would need to mux in 4 channels of a new 8-bit bus coming from each smart pin. And there would be twice as many Tukey filters, only half of which could ever be used at once. However, they could be instantly mux'd and filtered samples would be forthcoming, without waiting for the Tukey filter to refill. Maybe, rather than 4 random pins, you would select a group of 4 pins, differing only in the two LSBs of their pin numbers. That would lighten the mux'ing problem.
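
    The pin-grouping idea in (2) amounts to masking off the two LSBs of a pin number. A minimal sketch, assuming 64 pins numbered 0..63; the function name `scope_group` is hypothetical and not part of any Propeller tool:

```python
def scope_group(pin):
    """Return the four pins that differ from `pin` only in the two LSBs."""
    base = pin & ~0b11              # clear the two least-significant bits
    return [base | n for n in range(4)]

# With 64 pins there are only 16 such groups, so each cog would select
# 1 of 16 groups instead of muxing 4 independent 1-of-64 choices.
NUM_GROUPS = 64 // 4
```

    For example, `scope_group(22)` gives pins 20 through 23.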

    I think the first thing I need to do is see how much I can squeeze the Tukey filter logic.
  • cgraceycgracey Posts: 14,206
    edited 2018-12-05 09:08
    I'm putting the Tukey into the smart pin to see how it compiles.

    Just had a realization that we don't need new 8-bit buses out of the smart pins. Instead, the pin just updates the result on every clock, so that a RDPIN at any time, from any cog, gets the immediate 8-bit conversion for that pin.

    So, RDPIN would always read the ADC value and those same result outputs from the smart pins could be gathered, lower bytes, only, to get parallel ADC samples for streamer recording.
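
    Gathering "lower bytes only" from four smart-pin results could look like the sketch below. The byte order (first pin in the least-significant byte) and the name `pack_samples` are assumptions for illustration; the post does not specify them.

```python
def pack_samples(results):
    """Pack the low byte of four 32-bit pin results into one 32-bit long.

    Assumed layout: results[0] lands in bits 7..0, results[3] in bits 31..24.
    """
    assert len(results) == 4
    word = 0
    for n, result in enumerate(results):
        word |= (result & 0xFF) << (8 * n)   # keep only the 8-bit sample
    return word
```

    Under that assumed layout, `pack_samples([0x12, 0x134, 0x56, 0xFF])` yields `0xFF563412`, with the upper bits of each result discarded.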

    One other thing. The scope-trigger mechanism could go into the smart pin to alert when a trigger event occurs, raising IN.
  • jmgjmg Posts: 15,175
    cgracey wrote: »
    I'm putting the Tukey into the smart pin to see how it compiles.

    Just had a realization that we don't need new 8-bit buses out of the smart pins. Instead, the pin just updates the result on every clock, so that a RDPIN at any time, from any cog, gets the immediate 8-bit conversion for that pin.

    So, RDPIN would always read the ADC value and those same result outputs from the smart pins could be gathered, lower bytes, only, to get parallel ADC samples for streamer recording.

    One other thing. The scope-trigger mechanism could go into the smart pin to alert when a trigger event occurs, raising IN.

    I'm not following all the details here, but the figures you reported earlier on the PAD-Ring test chip had quite small sample counts on Sinc3 (aka high ADC conversion rates, but low bit-counts).
    If Sinc3 can adjust to low bit values, is that not equivalent to an ADC-bandwidth-limited, low-bit scope Tukey pathway?
    Then you just need a means to capture the 4 outputs? (Which I think is what you are saying above?)
  • cgracey wrote: »
    I'm putting the Tukey into the smart pin to see how it compiles.

    As for the new features related to the Tukey filters, could at least part of the new 4-pin grouping and triggering logic also be leveraged by the streamers, to ease communication with QSPI, Octal SPI, and even HyperBus-enabled devices?
  • TonyB_TonyB_ Posts: 2,196
    edited 2018-12-05 14:25
    cgracey wrote: »
    I'm putting the Tukey into the smart pin to see how it compiles.

    Just had a realization that we don't need new 8-bit buses out of the smart pins. Instead, the pin just updates the result on every clock, so that a RDPIN at any time, from any cog, gets the immediate 8-bit conversion for that pin.

    So, RDPIN would always read the ADC value and those same result outputs from the smart pins could be gathered, lower bytes, only, to get parallel ADC samples for streamer recording.

    One other thing. The scope-trigger mechanism could go into the smart pin to alert when a trigger event occurs, raising IN.

    Tukey had better quality than Sinc8.

    Can the four 8-bit Tukey values still be read as one 32-bit value? And can this be streamed?

    I'm wondering whether the pair symmetry in the ramp values is reflected in the logic minimization, e.g. 1 & 31, 3 & 29, 5 & 27, etc., all add up to 32. I've looked at making them add up to 31 with bits inverted to halve the taps, e.g. 1 & 30, 3 & 28, 5 & 26, so that adding a pair can be done by a simple OR. The problem is that the plateau value is now 31 and n+½ plateau bits are needed for the whole thing to sum to a multiple of 256 minus 1 when all bits are set.

    Another thought I had was to set the midpoint of the ramp, currently 16, to zero and having -15 & +15, -13 & +13, etc., as the pairs, again to halve the taps. The previous max tap of 32 would then be +16. However, the arithmetic would not be two's complement as used by other smart pin modes.
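
    The zero-centred idea rests on a simple identity: subtracting 16 from every ramp weight and then adding back 16 times the number of set bits gives the same sum. A sketch of that check, assuming a 16-tap ramp of odd weights 1, 3, ..., 31 (as implied by the pairs quoted above); the function names are illustrative only:

```python
RAMP = list(range(1, 32, 2))        # assumed ramp: 1, 3, ..., 31 (16 taps)

def direct_sum(bits):
    """Plain weighted sum over all 16 ramp taps."""
    return sum(w * b for w, b in zip(RAMP, bits))

def centred_sum(bits):
    """Same result using 8 signed pair weights (15, 13, ..., 1)."""
    n = len(RAMP)
    signed = 0
    for k in range(n // 2):
        lo, hi = bits[k], bits[n - 1 - k]   # taps weighted -w and +w
        weight = 16 - RAMP[k]               # 15, 13, ..., 1
        signed += weight * (hi - lo)
    return signed + 16 * sum(bits)          # restore the removed midpoint

def bits_of(value):
    """Expand a 16-bit integer into a list of 16 tap bits."""
    return [(value >> i) & 1 for i in range(16)]
```

    Each pair needs only one multiplier weight applied to a value in {-1, 0, +1}, which is where the halving of taps would come from; the extra cost is the population count term.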

    I also had an idea for using a counter for the plateau values instead of adding them individually and I'll try to find my post about it.
  • Breaking my sinc decimator into stages, the first stage turns 32 input bits into sixteen 3-bit values, packed in odd and even groupings with 4-bit alignment. Hand optimization of the first stage brings it down to this, with carry propagation and stages two and three yet to be debugged. Eventually you get four 8-bit values from every 32 input bits, which can be summed by whatever windowing method you wish to use, in addition to the initial decimation by 2, 4, or 8, as desired. In the meantime I am developing this code both in Visual Studio and in SimpleIDE/Propeller GCC, so that I can also pry into the assembly that GCC is generating …
    // Mask values inferred from the constants in the listing below:
    //   ODD_BITS  = 0xAAAAAAAA, EVEN_BITS  = 0x55555555
    //   ODD_PAIRS = 0xCCCCCCCC, EVEN_PAIRS = 0x33333333
    DWORD propeller_adc::decimate1B (DWORD input)
    {
        DWORD q2, q3;
        REG[0] = input;                                     // keep the raw bitstream word
        q2 = (input&EVEN_BITS)<<1;                          // even bits, weighted x2
        q3 = ((input&ODD_BITS)+((input&ODD_BITS)>>2))>>1;   // sums of neighbouring odd bits
        REG[1] = ((q2&ODD_PAIRS) + (q3&ODD_PAIRS))>>2;      // odd groups, aligned to nibbles
        REG[2]  = (q2&EVEN_PAIRS) + (q3&EVEN_PAIRS);        // even groups
        return 0;
    }
    
    386:FFT_test.c **** DWORD propeller_adc::decimate1B (DWORD input)
    1208 0545 55AAAAAA mvi r5,#-1431655766
    1208 AA
    1209 054a 1514 and r5, r1
    1211 054c 0B7065 xmov r7,r0 mov r6,r5
    1212 054f 2740 add r7, #4
    1214 0551 262A shr r6, #2
    1215 0553 1650 add r6, r5
    1216 0555 261A shr r6, #1
    1218 0557 54CCCCCC mvi r4,#-858993460
    1218 CC
    1219 055c D55644 xmov r5,r6 and r5,r4
    1220 055f E33080 xmov r3,r0 add r3,#8
    1222 0562 20C0 add r0, #12
    1225 0564 117F wrlong r1, r7
    1227 0566 57555555 mvi r7,#1431655765
    1227 55
    1229 056b 1714 and r7, r1
    1230 056d 2719 shl r7, #1
    1233 056f 1474 and r4, r7
    1234 0571 1540 add r5, r4
    1235 0573 252A shr r5, #2
    1236 0575 153F wrlong r5, r3
    1238 0577 55333333 mvi r5,#858993459
    1238 33
    1239 057c 1654 and r6, r5
    1241 057e 1754 and r7, r5
    1243 0580 1670 add r6, r7
    1244 0582 160F wrlong r6, r0
    1247 0584 B0 mov r0, #0
    1248 0585 02 lret
    


    Thus it appears that the first stage requires about 40 words (~80 bytes) of unrolled Propeller 1 GCC assembly, comprising ~26 instructions, with stages two and three expected to be similar once fully debugged and optimized. At that point I expect to have noisy data of 7 bits or so running at sysclock/8, which can be summed to give numbers identical to what others are getting. Instead of counting 6116 samples at sysclock, you would get 4 pre-filtered values every 32 clocks, which reduces the number of filtered samples that need to be summed for the 12-bit window previously obtained by other means to 764.5 (6116/8). Or you can store the samples and run an FFT, a linear regression, a median filter, a Student's t-test, or whatever you want on the data.
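
    To see what the first stage actually leaves in REG[1] and REG[2], here is a bit-for-bit Python port of the C++ above. The four mask values are assumptions read off the constants in the GCC listing; the helper names are illustrative.

```python
MASK32 = 0xFFFFFFFF                       # emulate 32-bit DWORD arithmetic
ODD_BITS, EVEN_BITS = 0xAAAAAAAA, 0x55555555
ODD_PAIRS, EVEN_PAIRS = 0xCCCCCCCC, 0x33333333

def decimate_stage1(word):
    """Return (reg1, reg2), mirroring REG[1] and REG[2] in decimate1B."""
    word &= MASK32
    q2 = ((word & EVEN_BITS) << 1) & MASK32
    odd = word & ODD_BITS
    q3 = ((odd + (odd >> 2)) & MASK32) >> 1
    reg1 = (((q2 & ODD_PAIRS) + (q3 & ODD_PAIRS)) & MASK32) >> 2
    reg2 = ((q2 & EVEN_PAIRS) + (q3 & EVEN_PAIRS)) & MASK32
    return reg1, reg2

def nibbles(value):
    """Split a 32-bit value into its eight 4-bit fields, LSB first."""
    return [(value >> (4 * n)) & 0xF for n in range(8)]
```

    Each 4-bit field ends up holding a small count in the range 0..4, which is consistent with the "3-bit values with 4-bit alignment" description: eight fields in each of the two registers.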
  • evanhevanh Posts: 16,032
    edited 2018-12-05 14:40
    Oh, intriguing what James did with the Prop1. https://forums.parallax.com/discussion/comment/1456711/#Comment_1456711

    That's a whole new trick for the Prop1. The ramping isn't every sysclock but maybe that isn't so terrible. Certainly his results look great. And, in theory, the shape can be more complex.

    Guess what? The Prop2 has no equivalent mode!

    Here's a commented snippet: (Inputs A and B are the same pin in James's test code)
                    mov      i, looplength
                    mov      j, looplength
                    mov      frqb, #1              ' start rectangular (flat +1 increment) for input B
    uploop          add      frqa, #1              ' start triangular (ramp the increment up and down) for input A
                    djnz     i, #uploop            ' ramp up
    downloop        sub      frqa, #1
                    djnz     j, #downloop          ' ramp down
                    mov      frqb, #0              ' stop rectangular for input B
                    mov      frqa, #0              ' stop triangular for input A
    
                    wrlong   phsa, adcp_tri        ' post triangular sample
                    wrlong   phsb, adcp_rect       ' post rectangular sample
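
    The snippet ramps FRQA up and then down, so PHSA presumably accumulates the input bits under a triangular weight while PHSB accumulates them under a flat one. A minimal Python sketch of the two weightings, assuming an idealized first-order sigma-delta bitstream and one weight step per sample (the PASM loop only steps the ramp once per loop pass); all function names are illustrative:

```python
def sigma_delta_bits(p, q, count):
    """Idealized first-order modulator for a DC input of p/q (0 <= p < q)."""
    acc, bits = 0, []
    for _ in range(count):
        acc += p
        if acc >= q:
            acc -= q
            bits.append(1)
        else:
            bits.append(0)
    return bits

def tri_weights(length):
    """Triangular window 1, 2, ..., length, ..., 2, 1."""
    return list(range(1, length + 1)) + list(range(length - 1, 0, -1))

def rect_estimate(bits):
    """Flat window: plain average of the bits."""
    return sum(bits) / len(bits)

def tri_estimate(bits, length):
    """Triangular window: weighted average; the weights sum to length**2."""
    weights = tri_weights(length)
    assert len(bits) == len(weights)
    return sum(w * b for w, b in zip(weights, bits)) / (length * length)

if __name__ == "__main__":
    L, p, q = 64, 1143, 4096
    stream = sigma_delta_bits(p, q, 2000)
    window = 2 * L - 1
    rect = [rect_estimate(stream[s:s + window]) for s in range(1000)]
    tri = [tri_estimate(stream[s:s + window], L) for s in range(1000)]
    print("rect spread:", max(rect) - min(rect))
    print("tri spread: ", max(tri) - min(tri))
```

    Running the script prints how much each estimate wanders as the window slides along the bitstream, which is the same comparison as the mean / standard deviation table posted earlier in the thread, but on synthetic data.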
    
    
  • cgraceycgracey Posts: 14,206
    TonyB_,

    You can use RDPIN to read a sample, and then the streamer will be able to group four together, time-aligned.

    Very interesting idea about offsetting.

    Now that this thing is in the smart pin, it's pipelined, since we have the flops already there. That should relax any timing pressure. It's also no problem to check for $100 and substitute $FF instead, so I changed that center $1F to $20.

    I made this diagram of inc's and dec's for when bits move around. There has got to be some good optimization possible here:
    tap	value		bit5	bit4	bit3	bit2	bit1	bit0
    --------------------------------------------------------------------
    0	000001							+1
    1	000011						+1	
    2	000101					+1	-1	
    3	000111						+1	
    4	001010				+1	-1		-1
    5	001101					+1	-1	+1
    6	010000			+1	-1	-1		-1
    7	010011						+1	+1
    8	010110					+1		-1
    9	011001				+1	-1	-1	+1
    10	011011						+1	
    11	011101					+1	-1	
    12	011111						+1	
    13	100000		+1	-1	-1	-1	-1	-1
    14	100000							
    15	100000							
    16	100000							
    17	100000							
    18	100000							
    19	100000							
    20	100000							
    21	100000							
    22	100000							
    23	100000							
    24	100000							
    25	100000							
    26	100000							
    27	100000							
    28	100000							
    29	100000							
    30	100000							
    31	100000							
    32	011111		-1	+1	+1	+1	+1	+1
    33	011101						-1	
    34	011011					-1	+1	
    35	011001						-1	
    36	010110				-1	+1	+1	-1
    37	010011					-1		+1
    38	010000						-1	-1
    39	001101			-1	+1	+1		+1
    40	001010					-1	+1	-1
    41	000111				-1	+1		+1
    42	000101						-1	
    43	000011					-1	+1	
    44	000001						-1	
           (000000)							-1
    
    
    
    value	#	position
    ------------------------
    000001	2	0,44
    000011	2	1,43
    000101	2	2,42
    000111	2	3,41
    001010	2	4,40
    001101	2	5,39
    010000	2	6,38
    010011	2	7,37
    010110	2	8,36
    011001	2	9,35
    011011	2	10,34
    011101	2	11,33
    011111	2	12,32
    100000	19	13..31
    
    #bit5 = 19
    #bit4 = 14
    #bit3 = 12
    #bit2 = 12
    #bit1 = 14
    #bit0 = 20
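
    The tables above can be cross-checked with a short model. The tap weights are copied from the "value" column; the divide-by-4 scaling to 8 bits is an assumption inferred from the weights summing to 1024 and from the "$100 becomes $FF" remark, and the function names are illustrative.

```python
# 13-step ramp, 19-tap plateau of 32, mirrored ramp: 45 taps in total.
RAMP = [1, 3, 5, 7, 10, 13, 16, 19, 22, 25, 27, 29, 31]
TAPS = RAMP + [32] * 19 + RAMP[::-1]

def tukey_sample(bits):
    """Weighted sum of the 45 most recent ADC bits, scaled to 8 bits."""
    assert len(bits) == len(TAPS)
    total = sum(w for w, b in zip(TAPS, bits) if b)   # 0..1024
    value = total >> 2                                # assumed scaling: 0..$100
    return min(value, 0xFF)                           # swap $100 for $FF

def bit_counts():
    """Number of taps with each weight bit set (the #bitN list)."""
    return {n: sum((w >> n) & 1 for w in TAPS) for n in range(6)}

def bit_deltas(tap):
    """Per-bit +1/-1 change when a bit moves into `tap`, bit5 first."""
    prev = TAPS[tap - 1] if tap > 0 else 0
    cur = TAPS[tap]
    return [((cur >> n) & 1) - ((prev >> n) & 1) for n in range(5, -1, -1)]
```

    `bit_counts()` reproduces the #bit5..#bit0 figures, and `bit_deltas(4)` reproduces row 4 of the inc/dec diagram (+1 on bit3, -1 on bit2 and bit0).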
    
  • cgraceycgracey Posts: 14,206
    edited 2018-12-05 14:43
    I sent the new scope-in-the-smart-pin file set to ON Semi for a test compile. I was surprised how little logic the SINC3 took in the smart pin, and I'm curious to see how the SCOPE may work there.

    We really need to optimize the Tukey-filter summing logic. Some huge optimization(s) must be possible.