ADC Sampling Breakthrough

evanh · 2018-11-27 01:47

evanh wrote: »

Erna gave an alternative, a window of just 11 clocks, but it was expensive as hell.

Actually, in hindsight, I might have been too quick to dismiss that. There was a large number of adders in a pyramid to represent multipliers, but they were also very small at the bottom. And, not being circular, it can probably be optimised well.

cgracey · 2018-11-27 01:49

On line 92, ctr[10:06] would better read sq[15:11]. I changed it on my end.

cgracey · 2018-11-27 01:50

jmg wrote: »

cgracey wrote: »

I've made the NCO module adder generate a discrete Z[31:0]+Y[31:0] sum that the compiler should recognize as also existing in the SINC3 module (as Z[29:0] + Y[29:0]). Before, the NCO mux'd in what was going to be added to Z. This should help reduce logic if SINC3 is implemented. Compiling now...

cgracey wrote: »

Okay, the compile just finished. It didn't make any difference!

I think the compiler is already crunching things down pretty well.

Is there an easy way to confirm it has shared the adder between the two modes, as hoped?

Probably. I will look. Will be gone for two hours...

TonyB_ · 2018-11-27 02:05

evanh wrote: »

TonyB_ wrote: »

If acc3 is negative in the integrator, shouldn't it also be negative in the differentiator? It's something different to try with sign-extending.

The values aren't specifically signed. It's just a circle that goes round and round, that's why two's-complement suits it.

The integrator values are specifically signed in the sign-extension test. Here's how the Verilog might look:

assign r		= {zq[29],zq[29],zq[29:00]};

Probably not great Verilog but it shows what I mean.

evanh · 2018-11-27 02:13

What I mean is the Sinc function doesn't make a distinction. Signed or not, the value goes around and around in the accumulators.
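A small Python model may make this concrete. It is only a sketch: the 30-bit width, the decimation rate of 64 and the names (`sinc3`, `MASK`) are assumptions chosen for illustration, not taken from the Verilog. It runs the same three integrators and three differentiators twice, once with unbounded integers and once with every register wrapped to 30 bits, and compares the outputs.

```python
import random

WIDTH = 30                  # assumed register width, for illustration only
MASK = (1 << WIDTH) - 1

def sinc3(bits, rate, mask):
    """Three integrators at the bit rate, three differentiators every `rate` bits.
    Every register is ANDed with `mask`; mask = -1 leaves Python ints unbounded."""
    acc1 = acc2 = acc3 = 0
    d1 = d2 = d3 = 0
    out = []
    for i, bit in enumerate(bits, start=1):
        acc1 = (acc1 + bit) & mask
        acc2 = (acc2 + acc1) & mask
        acc3 = (acc3 + acc2) & mask
        if i % rate == 0:
            y1 = (acc3 - d1) & mask   # first differentiator
            d1 = acc3
            y2 = (y1 - d2) & mask     # second differentiator
            d2 = y1
            y3 = (y2 - d3) & mask     # third differentiator
            d3 = y2
            out.append(y3)
    return out

random.seed(1)
bits = [random.randint(0, 1) for _ in range(64 * 50)]
exact = sinc3(bits, 64, -1)       # nothing ever wraps
wrapped = sinc3(bits, 64, MASK)   # acc3 wraps past 2**30 along the way
print(exact == wrapped)           # True
```

The outputs agree because each true output lies between 0 and 64**3, well inside the 30-bit range, and addition and subtraction modulo 2**30 preserve it regardless of how often the accumulators wrap.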

Phil Pilgrim (PhiPi) · 2018-11-27 02:25

evanh wrote:

From my school days: The integral of a sine function is a cosine function. The integral of a cosine function is a sine function.

Not quite. The integral of a sine function is a negative cosine function.

-Phil

evanh · 2018-11-27 02:40

To make the circular function work at 32 bits, the RDPIN result should be aligned to the most significant bit of the 32-bit word:

assign r		= {zq[29:00], 2'b0};
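As a hedged illustration of why the alignment matters: a 30-bit accumulator wraps modulo 2**30, while a 32-bit subtract in the cog wraps modulo 2**32. The Python sketch below (names and sample values are made up for the example) compares a 32-bit subtract on right-aligned and on left-aligned readings taken either side of an accumulator wrap.

```python
M32 = (1 << 32) - 1
M30 = (1 << 30) - 1

def sub32(a, b):
    """Model of a 32-bit subtract that wraps modulo 2**32."""
    return (a - b) & M32

# Two successive 30-bit accumulator readings; the accumulator wrapped in between.
prev, curr = 0x3FFF_FFF0, 0x0000_0010
true_delta = (curr - prev) & M30                # 0x20, the wanted difference

unaligned = sub32(curr, prev)                   # right-aligned: 0xC000_0020
aligned = sub32(curr << 2, prev << 2) >> 2      # left-aligned:  0x20

print(hex(true_delta), hex(unaligned), hex(aligned))
```

With the value in bits 31..2, the 30-bit wrap point coincides with the 32-bit one, so ordinary 32-bit subtraction gives the right difference without extra masking.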

SaucySoliton · 2018-11-27 05:42

Here's an idea for how to quickly filter a sigma-delta bitstream using a look-up-table. Output sample rate is decimated by 8 from input bitrate. Supports filters up to 32 taps.

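uint8  in;         // next 8 bits of the sigma-delta bitstream
uint32 accum = 0;  // four packed partial sums, one per byte
uint8  out;
uint32 LUT[256];   // LUT[in] = this byte's contribution to four successive outputs

for each input byte
   accum += LUT[in];    // get 8 new bits, feed to LUT
   out = accum & 0xff;  // output filtered 8-bit sample
   accum >>= 8;         // shift right 8 bits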

A 256 entry look-up-table can perform arbitrary operations on 8 bits.
We can use it for FIR filtering.
Since we don't need all 32 bits of the output, we can pack more into each long.
Let's treat it as 4 separate tables, performing 4 different functions at the same time.
[31:24][23:16][15:8][7:0]
Each byte of the table contains the result of filtering each part of the filter.
If we are mindful of carry bits, we can operate on 4 bytes at the same time. This is known as soft-SIMD.
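The scheme can be sketched in Python and checked against a plain FIR. Everything specific here is an assumption for the sake of the example: the tap values (a capped triangle summing to 252), the bit order within a byte (bit 0 oldest) and the function names. Keeping the taps non-negative with a total of at most 255 is one simple way of being "mindful of carry bits", since no byte lane can then overflow into its neighbour.

```python
import random

# Hypothetical 32-tap filter: non-negative taps whose sum (252) fits in one byte.
TAPS = [min(a + 1, 32 - a, 12) for a in range(32)]

def build_lut(taps):
    """LUT[v] packs four byte lanes; lane k is v's contribution when v is k bytes old."""
    lut = []
    for v in range(256):
        word = 0
        for k in range(4):
            lane = sum(taps[8 * k + 7 - i] for i in range(8) if (v >> i) & 1)
            word |= lane << (8 * k)
        lut.append(word)
    return lut

def lut_filter(data, lut):
    """One table read, one add and one shift per 8 input bits."""
    accum, out = 0, []
    for byte in data:
        accum += lut[byte]
        out.append(accum & 0xFF)   # lowest lane is now complete
        accum >>= 8                # remaining lanes move down one place
    return out

def direct_filter(data, taps):
    """Reference: ordinary 32-tap FIR evaluated at every 8th bit."""
    bits = [(byte >> i) & 1 for byte in data for i in range(8)]
    out = []
    for j in range(len(data)):
        newest = 8 * j + 7
        out.append(sum(taps[a] * bits[newest - a]
                       for a in range(32) if newest - a >= 0))
    return out

random.seed(2)
data = [random.randrange(256) for _ in range(200)]
print(lut_filter(data, build_lut(TAPS)) == direct_filter(data, TAPS))   # True
```

In this packing a single read per input byte serves all four lanes; signed taps or a larger total would need guard bits between lanes, which the sketch does not attempt.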

This picture is for illustration and is not an optimal filter.

cgracey · 2018-11-27 06:27

Saucy, you are just talking about adding up weighted bits in a sample window, right? Is that really going to give us good performance, without any fancy filtering? I can see the concept is extremely simple.

cgracey · 2018-11-27 08:14

Saucy, I think the problem with your proposal is that it would take 4 discrete LUT reads to do a whole filter for one channel.

TonyB_ · 2018-11-27 12:18

Did sign-extending acc1 and acc2 work when the result in acc3 is sign-extended or shifted so the sign is bit 31?

TonyB_ · 2018-11-27 14:47

If acc3 in the smart pin is cleared after it is read then the diff1 differentiator can be omitted, saving two instructions and four cycles. The acc3 feedback from register to adder could be zeroed during the cycle after the read, so that acc3 is loaded with acc2+0.

cgracey · 2018-11-27 15:28

I'm using TonyB_'s Tukey 17*/32 to good effect on a windowed 8-bit-sample-per-clock ADC mode in the cog.

Here is the table. I needed to drop the total table sum by 1, and the only symmetrical point to do it at was the middle value:

tukey	long	01,03,05,07,10,13,16,19,22,25,27,29,31		'13 up, up/down sum = 208 * 2
	long	32[9],31,32[9]					'19 top, top sum = 607
	long	31,29,27,25,22,19,16,13,10,07,05,03,01		'13 down, total sum = 1023 (>>2 = 255)

This is definitely an application where the Tukey shines. So, all that work is not going to waste.

In order to get sufficient SNR for 8-bit usage, the window needed to be as wide as it is. The window's low-pass effect begins to kick in at around 1MHz at 180MHz Fsys.
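For anyone wanting to see the low-pass effect numerically, the magnitude response of the window can be evaluated directly from the table. This is a rough sketch that assumes one ADC bit per clock at the quoted 180MHz Fsys and looks at the window alone, ignoring the analog front end; the function name `gain` is made up for the example.

```python
import cmath
import math

TUKEY = ([1, 3, 5, 7, 10, 13, 16, 19, 22, 25, 27, 29, 31]
         + [32] * 9 + [31] + [32] * 9
         + [31, 29, 27, 25, 22, 19, 16, 13, 10, 7, 5, 3, 1])

def gain(freq_hz, fs_hz=180e6):
    """Magnitude response of the window at freq_hz, normalised to its DC gain."""
    w = 2 * math.pi * freq_hz / fs_hz
    h = sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(TUKEY))
    return abs(h) / sum(TUKEY)

for f in (100e3, 1e6, 2e6, 4e6):
    print(f"{f / 1e6:4.1f} MHz  {20 * math.log10(gain(f)):6.2f} dB")
```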

This scope mode will go into the cog, where it will run when enabled and the streamer will be able to write samples to memory at anything up to full speed. The samples are always available! It will be 4 lanes wide, like a 4-channel scope. It's just a 45-bit shifter with staged adders to compute the weighted bits' sum on each clock. Bits 9..2 of the sum make the result.
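A behavioural sketch of one lane, in Python, under the assumption that the newest bit enters at the first table entry and that one bit arrives per clock (the function name is hypothetical). It is not the staged-adder hardware, only the arithmetic it would compute.

```python
TUKEY = ([1, 3, 5, 7, 10, 13, 16, 19, 22, 25, 27, 29, 31]
         + [32] * 9 + [31] + [32] * 9
         + [31, 29, 27, 25, 22, 19, 16, 13, 10, 7, 5, 3, 1])

def scope_samples(bits):
    """Model one lane: a 45-bit shifter whose weighted bit sum is taken every clock."""
    window = [0] * len(TUKEY)
    out = []
    for b in bits:
        window = [b] + window[:-1]              # shift the new ADC bit in
        total = sum(w * x for w, x in zip(TUKEY, window))
        out.append((total >> 2) & 0xFF)         # bits 9..2 of the sum
    return out

print(len(TUKEY), sum(TUKEY))        # 45 1023
print(scope_samples([1] * 45)[-1])   # 255
```

Because the weights total 1023, a window full of ones gives exactly 255 after dropping the two low bits, which is why the middle entry was reduced from 32 to 31.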

Here is a picture of a 1.2MHz sawtooth recording that is getting played back at full-speed (250MSPS):

Here is a 1MHz square wave:

And here is a 1MHz sine wave:

The slew is not as fast as it was in an earlier version that used an 8-sample SINC3 filter, but the signal quality is better. I would rather do a SINC3, but it would take an inordinate amount of resources to implement the staggered stages. This is pretty cheap, but lower on bandwidth.

T Chap · 2018-11-27 15:37

Is there any potential distortion from in to out? That’s looking good. I’d like to see a comparison against a good scope!

Ken Gracey · 2018-11-27 15:52

Fun wrecker here. We'd really like Chip to return to the Spin interpreter development. Very soon people are going to have boards in their hands and want to get started.

Chip has to make the decision, but we can also encourage the transition.

Ken Gracey

cgracey · 2018-11-27 15:52

T Chap wrote: »

Is there any potential distortion from in to out? That’s looking good. I’d like to see a comparison against a good scope!

This is bandwidth-limited because of two things: The window's low-pass filter effect and the ADC's analog front end's slowness.

Roy Eltham · 2018-11-27 16:03

Ken,
Remember that we have fastspin, which targets P2, and the p2gcc thing that takes existing propgcc output and retargets it to the P2, plus the built-in ROM Forth (or Forth-like?). So people can use those if Chip's Spin2 isn't ready yet.
Also, people have already been using pnut to do PASM2 stuff for testing.

I do want Chip to get back onto Spin2 also, so I can get OpenSpin2 done in time. Porting is going to take a bit since he's changed a lot.

TonyB_ · 2018-11-27 16:08

cgracey wrote: »
I'm using TonyB_'s Tukey 17*/32 to good effect on a windowed 8-bit-sample-per-clock ADC mode in the cog.

Here is the table. I needed to drop the total table sum by 1, and the only symmetrical point to do it at was the middle value:
tukey	long	01,03,05,07,10,13,16,19,22,25,27,29,31		'13 up, up/down sum = 208 * 2
	long	32[9],31,32[9]					'19 top, top sum = 607
	long	31,29,27,25,22,19,16,13,10,07,05,03,01		'13 down, total sum = 1023 (>>2 = 255)
This is definitely an application where the Tukey shines. So, all that work is not going to waste.

Looks good and I'm pleased the Tukey work has not been wasted, but I don't know where we are now and what's in the smart pins and what's not. Is hardware Sinc3 dropped? The sign-extending test was doomed to fail without sign-extending everything.

cgracey · 2018-11-27 16:13

TonyB_ wrote: »

Looks good and I'm pleased the Tukey work has not been wasted, but I don't know where we are now and what's in the smart pins and what's not. Is hardware Sinc3 dropped? The sign-extending test was doomed to fail without sign-extending everything.

The SINC3 is in the smart pin. Before I move on from this ADC stuff, I want to get the scope mode working, too. We are on the same page, don't worry.

I'm doing the sign-extension test where I sign-extend everything, so that acc3 is full-size, acc2 is 1 bit less, and acc1 is two bits less. It's not working, unfortunately. Everything seems to need to be full-sized, which is too bad. Any other ideas about reducing these acc sizes?

cgracey · 2018-11-27 16:14

TonyB_ wrote: »

If acc3 in the smart pin is cleared after it is read then the diff1 differentiator can be omitted, saving two instructions and four cycles. The acc3 feedback from register to adder could be zeroed during the cycle after the read, so that acc3 is loaded with acc2+0.

We can do that, no problem. What would the code look like then, with and without the possible DIFF instruction?

TonyB_ · 2018-11-27 16:25

cgracey wrote: »

I'm doing the sign-extension test where I sign-extend everything, so that acc3 is full-size, acc2 is 1 bit less, and acc1 is two bits less. It's not working, unfortunately. Everything seems to need to be full-sized, which is too bad. Any other ideas about reducing these acc sizes?

Thanks for trying sign-extending again - it just doesn't work.

We could reduce the acc sizes by changing the decimation rate R from 1024 to 256. Is there any point or need for 20-bit resolution if only 16-bit values are written?

cgracey · 2018-11-27 16:39

TonyB_ wrote: »

We could reduce the acc sizes by changing the decimation rate R from 1024 to 256. Is there any point or need for 20-bit resolution if only 16-bit values are written?

Well, in externally-clocked mode, there could be need for 20-bit resolution.

cgracey · 2018-11-27 16:41

Is there much need for the DIFF instruction, anymore? Does it save only one instruction now if we are clearing ACC3 in the smart pin at each measurement start? Would it have much use outside of this application?

TonyB_ · 2018-11-27 16:45

cgracey wrote: »

TonyB_ wrote: »

We could reduce the acc sizes by changing the decimation rate R from 1024 to 256. Is there any point or need for 20-bit resolution if only 16-bit values are written?

Well, in externally-clocked mode, there could be need for 20-bit resolution.

But if it's the difference between fitting (comfortably) or not? R=256 reduces the acc2 and acc3 adders from 30-bit to 24-bit, assuming acc1 uses a counter. Sinc3 could be done in software for > 16-bit.
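The 30-bit and 24-bit figures follow from the DC gain of a third-order Sinc filter, which is R cubed for 1-bit input. A short Python check, offered as a sketch (the function name is made up, and it only handles power-of-two rates):

```python
def register_bits(rate, order=3):
    """log2 of the DC gain rate**order for an `order`-stage Sinc filter with
    1-bit input; `rate` is assumed to be a power of two."""
    assert rate & (rate - 1) == 0, "sketch only handles power-of-two rates"
    return order * (rate.bit_length() - 1)

for rate in (1024, 256):
    print(rate, register_bits(rate))   # 1024 -> 30, 256 -> 24
```

Strictly, an input of all ones would produce exactly 2**30 (or 2**24) and wrap to zero at these widths; the sketch assumes, as the thread seems to, that the bitstream never sits at that extreme.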

TonyB_ · 2018-11-27 16:52

cgracey wrote: »

TonyB_ wrote: »

If acc3 in the smart pin is cleared after it is read then the diff1 differentiator can be omitted, saving two instructions and four cycles. The acc3 feedback from register to adder could be zeroed during the cycle after the read, so that acc3 is loaded with acc2+0.

We can do that, no problem. What would the code look like then, with and without the possible DIFF instruction?

It's not my idea. I've been doing some reading and this is called Integrate and Dump. It could be used elsewhere in the smart pins, probably. In the differentiator diff1 stores the previous value of acc3, but if acc3 is reset after being read then diff1 is redundant.

DIFF as a separate instruction would save only one instruction and it's not worth the effort. Also, aren't the previously free slots used by SETDAC and another pin instruction?

Integrate and Dump Differentiator without DIFF

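	rdpin   z, #adcpin	'read acc3 (cleared by the smart pin after the read)
	sub	z, diff2	'z = new value - previous value
	add	diff2, z	'diff2 = new value, kept for the next pass
	sub	z, diff3	'second differentiator, same pattern
	add	diff3, z	'z now holds the filtered sample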

Integrate and Dump Differentiator with DIFF

	rdpin   z, #adcpin
	diff    diff2, 0-0
	diff    diff3, 0-0
	mov     z, 0-0
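To check that dumping acc3 really does make the first differentiator redundant, both arrangements can be modelled in Python. This is a sketch under assumptions: 30-bit registers, a decimation rate of 64, and a clear that takes effect immediately after the read; the function names are hypothetical. The second function mirrors the SUB/ADD pairs of the version without DIFF.

```python
import random

MASK = (1 << 30) - 1    # assumed 30-bit registers

def conventional(bits, rate):
    """Sinc3 with free-running acc3 and three differentiators."""
    acc1 = acc2 = acc3 = d1 = d2 = d3 = 0
    out = []
    for i, bit in enumerate(bits, start=1):
        acc1 = (acc1 + bit) & MASK
        acc2 = (acc2 + acc1) & MASK
        acc3 = (acc3 + acc2) & MASK
        if i % rate == 0:
            y1 = (acc3 - d1) & MASK
            d1 = acc3
            y2 = (y1 - d2) & MASK
            d2 = y1
            out.append((y2 - d3) & MASK)
            d3 = y2
    return out

def integrate_and_dump(bits, rate):
    """acc3 is cleared after each read, so only two differentiators remain."""
    acc1 = acc2 = acc3 = diff2 = diff3 = 0
    out = []
    for i, bit in enumerate(bits, start=1):
        acc1 = (acc1 + bit) & MASK
        acc2 = (acc2 + acc1) & MASK
        acc3 = (acc3 + acc2) & MASK
        if i % rate == 0:
            z, acc3 = acc3, 0              # rdpin z, then the pin dumps acc3
            z = (z - diff2) & MASK         # sub z, diff2
            diff2 = (diff2 + z) & MASK     # add diff2, z
            z = (z - diff3) & MASK         # sub z, diff3
            diff3 = (diff3 + z) & MASK     # add diff3, z
            out.append(z)
    return out

random.seed(3)
bits = [random.randint(0, 1) for _ in range(64 * 40)]
print(conventional(bits, 64) == integrate_and_dump(bits, 64))   # True
```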

cgracey · 2018-11-27 16:58

Looking at the Verilog, we don't exactly clear ACC3 at the measurement start, but we mask it away as we add in ACC2.

cgracey · 2018-11-27 17:03

I haven't put anything into those empty instruction slots, yet.

Electrodude · 2018-11-27 17:08

The DIFF instruction would also be useful for comparing values of CNT and other such circular types.
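The thread does not spell out what DIFF would compute beyond the differentiator step, so the following is only an illustration of the circular comparison being referred to, written in Python with made-up helper names. A free-running 32-bit counter such as CNT wraps, and differences taken modulo 2**32 stay correct across the wrap.

```python
M32 = (1 << 32) - 1

def elapsed(now, then):
    """Ticks from `then` to `now` on a free-running 32-bit counter."""
    return (now - then) & M32

def is_after(a, b):
    """True if a is later than b, assuming they are under 2**31 ticks apart."""
    return 0 < ((a - b) & M32) < (1 << 31)

start, end = 0xFFFF_FFF0, 0x0000_0010    # the counter wrapped in between
print(elapsed(end, start))               # 32
print(is_after(end, start), is_after(start, end))   # True False
```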

jmg · 2018-11-27 18:59

cgracey wrote: »

This scope mode will go into the cog, where it will run when enabled and the streamer will be able to write samples to memory at anything up to full speed. The samples are always available! It will be 4 lanes wide, like a 4-channel scope. It's just a 45-bit shifter with staged adders to compute the weighted bits' sum on each clock. Bits 9..2 of the sum make the result.

The slew is not as fast as it was in an earlier version that used an 8-sample SINC3 filter, but the signal quality is better. I would rather do a SINC3, but it would take an inordinate amount of resources to implement the staggered stages. This is pretty cheap, but lower on bandwidth.

Looks great, and sounds a good compromise.

cgracey wrote: »

In order to get sufficient SNR for 8-bit usage, the window needed to be as wide as it is. The window's low-pass effect begins to kick in at around 1MHz at 180MHz Fsys.

With this Streamer Capture + software filter, you will be able to trade off speed against ENOB, and also against processing time, right?
I.e. someone could go faster, but with fewer bits?
IIRC the analog limit was a rise time of ~50ns, and what you show is not far from that, maybe 2x? How many bits would the filter give for a ~50ns rise?

jmg · 2018-11-27 19:08

Ken Gracey wrote: »

Fun wrecker here. We'd really like Chip to return to the Spin interpreter development. Very soon people are going to have boards in their hands and want to get started.

Chip has to make the decision, but we can also encourage the transition.

Ken Gracey

Is that code for a Verilog freeze?

That must be very close now?

Spin is not mandatory to test P2 (it's just nice to have, and there are other P2 pathways now), so I would move Spin2's priority back somewhat, to after the rev B sign-off release.
Before rev B, Chip is surely better focused on the reported hardware issues, testing Verilog, and getting rev B as good as it can be?
Even better docs around smart pins would greatly help testing coverage there.
