ADC Sampling Breakthrough

evanh · 2018-11-25 14:06

The differentiation doesn't have to be done inline if just doing data acquisition. Acc3 can be sampled and stored in bulk form, saved to a file even. The diff'ing can be done later and maybe not by cog code.
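A minimal sketch of that split, assuming a three-integrator front end whose acc3 value is captured every `decimation` clocks and differenced afterwards. The function names, the test bitstream, and the decimation of 16 are illustrative choices, not anything taken from the P2 design.

```python
def capture_acc3(bits, decimation):
    """Run three cascaded integrators at full rate; keep acc3 every `decimation` clocks."""
    acc1 = acc2 = acc3 = 0
    snapshots = []
    for clock, bit in enumerate(bits, start=1):
        acc1 += bit
        acc2 += acc1
        acc3 += acc2
        if clock % decimation == 0:
            snapshots.append(acc3)      # this is all that needs to be stored
    return snapshots


def differentiate_offline(snapshots, stages=3):
    """Apply `stages` first-order differences to the stored acc3 snapshots."""
    data = list(snapshots)
    for _ in range(stages):
        data = [b - a for a, b in zip([0] + data[:-1], data)]
    return data


bits = [1, 0, 1, 1] * 64                # hypothetical bitstream, 75% ones
stored = capture_acc3(bits, decimation=16)
readings = differentiate_offline(stored)
print(readings)                         # the first two entries are start-up transients
```

Once the filter has filled, each reading is 0.75 * 16**3 = 3072 for this test pattern.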

cgracey · 2018-11-25 14:30

Yes, that makes sense. Thanks.

ErNa · 2018-11-25 16:08

I took a look at ozpropdev's data streaming file and created an Excel sheet to separate the bit streams.
As the input is ASCII, I first had to create a number value (it took several steps; I didn't find a function) and then extract the bits from the value. These bits are in columns I to Q; only column I is completed (value range: 0-1). Next I added two neighbours to form a new value (column S, value range 0-2), and so on. In the end I have a data stream at the sample frequency with a given resolution. So I create data from nothing. Like making money. The truth is: the data samples are not free, they have a history, a kind of filter. I don't know how this compares to the other filters that were studied here, so I attach the Excel file with the separated bitstream so that those interested can experiment. There seems to be a pattern in the structure of the bitstream that might be linked to the processor's activity, a.k.a. the current drawn.
Remove the .txt suffix to get access to the Excel file. As working with 64000 rows is no pleasure, shorter versions are available for experiments.

SaucySoliton · 2018-11-25 16:18

cgracey wrote: »

If you were to take ten 10-clock sinc3 readings and sum them all up, would it be equivalent to having waited 100 clocks and taken one reading?

It would be easier to sum readings over time, rather than maintain all these variables.

No. The filter window would look like this. Evan is right that we would need to filter the 10 samples, not just sum them up.

This is what the edge looks like:

Woah, that's almost a Tukey!
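The difference between the two cases can be checked by building both filter windows. This is a sketch under the assumption that a sinc3 reading of length R weights the bitstream with three length-R boxcars convolved together, and that the ten short readings are taken back to back; the helper names are made up for the example.

```python
def convolve(a, b):
    """Plain discrete convolution of two weight lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def sinc3_window(length):
    """Weights of one sinc3 reading: three boxcars of `length` convolved together."""
    box = [1] * length
    return convolve(convolve(box, box), box)


def summed_short_windows(length, count):
    """Window obtained by adding `count` back-to-back sinc3 readings of `length` clocks."""
    short = sinc3_window(length)
    total = [0] * (len(short) + length * (count - 1))
    for k in range(count):
        for i, w in enumerate(short):
            total[k * length + i] += w
    return total


summed = summed_short_windows(10, 10)   # ten 10-clock readings added up
single = sinc3_window(100)              # one 100-clock reading
print(summed[:20])                      # smooth rising edge, then a flat top
print(len(summed), len(single))
```

The summed window rises smoothly and then stays flat, which is the Tukey-like shape, while the single long reading has a bell-shaped window about three times as long.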

TonyB_ · 2018-11-25 19:06

To generate TukeyS(x), x=0..S-1, I use:

  Tukey(x)=0.5-0.5*cos(pi*(x/(S-1)))  ' if S odd
  Tukey(x)=0.5-0.5*cos(pi*(x+0.5)/S)   ' if S even
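The two formulas translate directly into a small helper. `tukey_edge` follows the post; `tukey_window`, which mirrors the edge around a flat section, is an assumption about how the edge would be used and is not stated in the thread.

```python
import math


def tukey_edge(S):
    """Rising half-cosine edge of S points, x = 0..S-1, per the two formulas above."""
    if S < 2:
        raise ValueError("S must be at least 2")
    if S % 2:   # S odd: runs from exactly 0 to exactly 1
        return [0.5 - 0.5 * math.cos(math.pi * x / (S - 1)) for x in range(S)]
    # S even: sample points are offset by half a step
    return [0.5 - 0.5 * math.cos(math.pi * (x + 0.5) / S) for x in range(S)]


def tukey_window(S, flat):
    """Hypothetical full window: rising edge, `flat` ones, mirrored falling edge."""
    rise = tukey_edge(S)
    return rise + [1.0] * flat + rise[::-1]


print([round(w, 3) for w in tukey_edge(5)])
print([round(w, 3) for w in tukey_window(4, 8)])
```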

SaucySoliton · 2018-11-25 21:04

cgracey wrote: »

I want it to run continuously, in "rolling mode".

I think the way to do this would be to run a full CIC decimator and interpolator back to back. The integrators would run at full speed, the diffs would be clocked at a reduced rate that could be adjusted to control the filter bandwidth. It might be too many flops.

Following this paper: https://dspguru.com/dsp/tutorials/cic-filter-introduction/
R is the decimation/interpolation factor
M is the comb delay length
N is the number of stages

The bandwidth is proportional to 1/RM.
The output sample rate is proportional to 1/R.
The memory required is proportional to NM.

Most designs use M=1 and R>1 because:
It reduces the memory/flops required.
It's wasteful to sample a signal at a rate much higher than the bandwidth.
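A rough software model of the decimator half shows how R, M and N enter. This is a textbook CIC sketch, not the proposed hardware; the function name and the test values are illustrative.

```python
def cic_decimate(samples, R, M=1, N=3):
    """N-stage CIC decimator: N integrators at the input rate, N combs at 1/R of it."""
    integrators = [0] * N
    comb_delays = [[0] * M for _ in range(N)]   # N*M words of comb memory
    output = []
    for clock, sample in enumerate(samples, start=1):
        value = sample
        for i in range(N):                      # integrators run on every input sample
            integrators[i] += value
            value = integrators[i]
        if clock % R == 0:                      # combs run once per R input samples
            y = value
            for delay in comb_delays:
                delayed = delay.pop(0)          # value from M output samples ago
                delay.append(y)
                y -= delayed
            output.append(y)
    return output


out = cic_decimate([1] * 128, R=8, M=2, N=3)
print(out)   # settles at the DC gain (R*M)**N
```

For a constant input the output settles at (R*M)**N, so each of the N stages behaves like a moving sum over R*M input samples, which is where the 1/RM bandwidth scaling comes from.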

jmg · 2018-11-25 21:10

SaucySoliton wrote: »

....
It's wasteful to sample a signal at a rate much higher than the bandwidth.

A problem in P2 is that the charge-balancing capacitor sample rate is fixed in the pad ring design, at sysCLK.

ErNa · 2018-11-25 22:01

The smartpin can create a new value every clock (if I'm right), filtered the way I showed in the Excel sheets. Then it is up to the Propeller software how often the smartpin is read. It would be nice if someone could test with MATLAB or similar what the filter curve of such a cascaded averager looks like. I think this filter can be realized with very little hardware.
SnSh25.11.2018-22.58.40.gif

The first graph shows the bitstream, values 0-1
the second the sum of 2 neighbours, values 0-2
the third the sum of 2 neighbours of the second, values 0-4, 3 bits resolution
and so on.
SnSh25.11.2018-22.59.49.gif

The last but one graph has 8 bits resolution, still full data rate. We now clearly see a structure that allows to apply a special type of filter, finally we see a nice structure of the bitstream data.
No, I do not create bits from nothing. The point is that in the first graph the signal changes abruptly between 0 and 1. In the filtered signals the signal cannot change by more than one level, so the slope is limited and so is the variety of possible signals. That's the point. The filter converts signal information in time (streamed data) into signal information in amplitude. As you cannot read the streamer at full rate, it should be useful to read as many bits as fill the sampling period of the Propeller program loop.
The individual steps show how the signal evolves. There is no need to do them one by one; in the end, it is just a matter of knowing how many bits are set in a shift register of arbitrary length.
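A small sketch of the cascade, assuming each stage simply adds every sample to its immediate neighbour; the spreadsheet may combine samples differently, so this is an interpretation. `window_count` is the plain shift-register bit count for comparison. All names are illustrative.

```python
def add_neighbours(values):
    """One stage: add each sample to the one next to it (output is one sample shorter)."""
    return [a + b for a, b in zip(values, values[1:])]


def cascade(bits, stages):
    """Apply `stages` neighbour-sum stages; the value range grows to 0..2**stages."""
    data = list(bits)
    for _ in range(stages):
        data = add_neighbours(data)
    return data


def window_count(bits, length):
    """Number of set bits in a sliding window (a shift register of `length` bits)."""
    return [sum(bits[i:i + length]) for i in range(len(bits) - length + 1)]


impulse = [0, 0, 0, 1, 0, 0, 0]
print(cascade(impulse, 3))        # effective weights of three stages
print(window_count(impulse, 4))   # weights of a 4-bit shift-register count
```

Under this assumption the cascade weights a single bit with binomial coefficients, whereas the shift-register count weights every bit in the window equally, so the two are related but not identical filters.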

SaucySoliton · 2018-11-25 22:16

cgracey wrote: »

Seairth wrote: »

cgracey wrote: »

It's amazing what we can get in 16 clocks, but it's kind of useless when the ADC's front-end analog circuit is only good for a few MHz.

I've only been marginally following this conversation. What do you mean when you say "only good for a few MHz"?

By 10MHz, there is severe phase lag and attenuation.

cgracey wrote: »

I would like an 8-bit sample EVERY clock, though. I think maybe that can be done by windowing (hardwired Tukey) a continuous span of ~16 samples, and instead of adding 0/1 into acc1, we'd add 0..7 on each clock. I will try this out with real ADC streams to see what kind of response we can get. This would be cool for scope-like applications and I imagine it would do wonders for the Goertzel circuit.

It would make no difference for the Goertzel, other than adding two 8x8 multipliers to it. Its purpose is to measure the amount of energy at a particular frequency. By adding a low-pass filter you reduce the amount of energy, forcing it to "work harder." The Goertzel operation is basically a band-pass filter.
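The band-pass behaviour can be illustrated with the textbook Goertzel recurrence. This sketch does not model the P2's Goertzel circuit; the test tone, the block length of 200 and the 16-tap circular moving average are arbitrary choices for the example.

```python
import math


def goertzel_power(samples, cycles):
    """Squared magnitude of the DFT bin with `cycles` whole cycles across the block."""
    n = len(samples)
    coeff = 2.0 * math.cos(2.0 * math.pi * cycles / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


n = 200
tone = [math.sin(2.0 * math.pi * 5 * i / n) for i in range(n)]
# Circular 16-tap moving average standing in for a low-pass stage ahead of the detector
smoothed = [sum(tone[(i - k) % n] for k in range(16)) / 16 for i in range(n)]

print(goertzel_power(tone, 5))       # energy at the tone frequency
print(goertzel_power(tone, 9))       # essentially nothing at another frequency
print(goertzel_power(smoothed, 5))   # less energy after low-pass filtering
```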

evanh · 2018-11-26 02:47

ErNa wrote: »

The last but one graph has 8 bits resolution, still full data rate.

Assuming that's column Y from the sheet, and, if I'm reading it correctly, then 11 bit-clocks is the overall window width.

evanh · 2018-11-26 02:54

SaucySoliton wrote: »

Woah, that's almost a Tukey!

Would, say, a 4 sample group, fit Vorbis? Vorbis seemed to have a notable flat top as well.

SaucySoliton · 2018-11-26 03:18

ErNa wrote: »

It would be nice if someone could test with MATLAB or similar what the filter curve of such a cascaded averager looks like. I think this filter can be realized with very little hardware.

Looks like we can get oscilloscope-quality traces at sysclock rate. The bandwidth might be 5-10MHz according to what Chip has said. It might be too resource-intensive to run in real time because of the number of registers needed to store samples for the moving average filters. Running a few moving averages on the cog should be quite fast, but nowhere near real time. The cog has plenty of memory for what's required.

Perhaps I misunderstood Chip. Trying to generate 8 bit samples continuously at sysclock rate would require more than a tiny bit of dedicated hardware. But reconstructing a few thousand samples like for an oscilloscope display is no problem.

For these plots I scaled the data to 0-256 to get an idea of how many unique values we can resolve.

cgracey · 2018-11-26 07:57

Last night, I added a smart pin mode to do SINC3 integration with three 30-bit accumulators, an 11-bit reporting counter, and an externally-clocked mode. I packed the four USB modes down to two modes ('host' and 'device') by remaking the slow/full-speed switch from X[15] via WXPIN (that NCO bit was always written to '0', anyway).

So, while this SINC3 mode didn't create any new flops, it did grow the smart pin logic by 20%, which is not trivial. This change might have singularly grown the overall P2 logic by 6%.

cgracey · 2018-11-26 08:07

Off-topic, but a hopeful perspective about the future:

https://www.forbes.com/sites/richkarlgaard/2018/02/09/why-technology-prophet-george-gilder-predicts-big-techs-disruption/#2ed0f46b2d21

cgracey · 2018-11-26 09:16

TonyB_ wrote: »

evanh wrote: »

cgracey wrote: »

What about diff1/2/3? It seems to me that they might all need 16 bits, but I'm probably not seeing something.

Yes, all 16-bit, equal acc3.

Only 16-bit?! Scrooge!

Does overflow/underflow in diffX not matter?

In the tests I did to determine bit-length requirements, the take-away was that for a given-sized measurement, all upper acc/diff bits could just be ignored (or not even implemented). For example, in an 8-sample conversion, only bits 8..0 mattered. So, rollover seems part of the natural accommodation, but only the Log2(#samples) LSBs matter.
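The rollover behaviour can be reproduced in a few lines. This is a sketch that assumes every accumulator and difference register wraps at the same width; the function name, the random test bitstream and the seed are illustrative. For the 8-sample case, bits 8..0 are 9 bits, which is 3 * log2(8).

```python
import random


def sinc3_reading(bits, R, width=None):
    """Last sinc3 reading; if `width` is given, every register wraps at 2**width."""
    mask = (1 << width) - 1 if width else -1    # -1 keeps full precision
    acc1 = acc2 = acc3 = 0
    diff1 = diff2 = diff3 = 0
    reading = 0
    for clock, bit in enumerate(bits, start=1):
        acc1 = (acc1 + bit) & mask
        acc2 = (acc2 + acc1) & mask
        acc3 = (acc3 + acc2) & mask
        if clock % R == 0:
            d1 = (acc3 - diff1) & mask
            d2 = (d1 - diff2) & mask
            reading = (d2 - diff3) & mask
            diff1, diff2, diff3 = acc3, d1, d2
    return reading


rng = random.Random(2018)
bits = [rng.randint(0, 1) for _ in range(8000)]
full = sinc3_reading(bits, 8)               # unlimited-width registers
narrow = sinc3_reading(bits, 8, width=9)    # 9-bit registers that roll over freely
print(full, narrow)
```

The narrow result equals the full result modulo 2**9. The one case that differs is a full-scale input of all ones, where the true reading of 512 wraps to 0 in 9 bits, so one extra bit is needed if that code has to be distinguished.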

I wish it were possible to reduce acc1 and acc2 bit-length, but they seem to need to be able to climb to acc3-length values.

cgracey · 2018-11-26 09:27

TonyB_ wrote: »

cgracey wrote: »

I'm looking at these DIFF registers, thinking about putting them into the smart pin for short high-speed conversions. Does anyone have any idea of the bit-requirement for the DIFF registers?

For a 63-sample Sinc3:

acc1 needs 6 bits
acc2 needs 11 bits
acc3 needs 16 bits

What about diff1/2/3? It seems to me that they might all need 16 bits, but I'm probably not seeing something.

diff1 16-bit, diff2 17-bit, diff3 18-bit?

Great. Any idea if the acc1/acc2 registers could be made smaller than acc3, as well?
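The quoted accumulator widths can be checked by brute force, assuming the three accumulators start at zero and the bitstream is all ones for the whole 63-clock conversion. The function name is illustrative.

```python
def accumulator_peaks(samples):
    """Largest values acc1..acc3 reach when started at zero and fed `samples` ones."""
    acc1 = acc2 = acc3 = 0
    for _ in range(samples):
        acc1 += 1
        acc2 += acc1
        acc3 += acc2
    return acc1, acc2, acc3


peaks = accumulator_peaks(63)
widths = tuple(p.bit_length() for p in peaks)
print(peaks, widths)   # (63, 2016, 43680) (6, 11, 16)
```

This only covers a conversion that starts from cleared accumulators; registers that run continuously roll over, so the bound does not describe that mode.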

cgracey · 2018-11-26 09:39

SaucySoliton wrote: »

ErNa wrote: »

It would be nice if someone could test with MATLAB or similar what the filter curve of such a cascaded averager looks like. I think this filter can be realized with very little hardware.

Looks like we can get oscilloscope-quality traces at sysclock rate. The bandwidth might be 5-10MHz according to what Chip has said. It might be too resource-intensive to run in real time because of the number of registers needed to store samples for the moving average filters. Running a few moving averages on the cog should be quite fast, but nowhere near real time. The cog has plenty of memory for what's required.

Perhaps I misunderstood Chip. Trying to generate 8 bit samples continuously at sysclock rate would require more than a tiny bit of dedicated hardware. But reconstructing a few thousand samples like for an oscilloscope display is no problem.

For these plots I scaled the data to 0-256 to get an idea of how many unique values we can resolve.

Saucy, I'm thinking that the place for scope-speed sampling is not in the smart pins, but in the streamer, where ADC bitstreams are readily available, along with full-speed hub writing. Imagine writing four 8-bit conversions on every clock!

evanh · 2018-11-26 09:52

cgracey wrote: »

Off-topic, but a hopeful perspective about the future:

That is rubbish talk in that article! What was even said about the future? He just dis'd a bunch of others is about all I could see there. Is that normal material for Forbes?

cgracey · 2018-11-26 10:03

evanh wrote: »

cgracey wrote: »

Off-topic, but a hopeful perspective about the future:

That is rubbish talk in that article! What was even said about the future? He just dis'd a bunch of others is about all I could see there. Is that normal material for Forbes?

I think he's an older guy that has the benefit of experience and is like a low-pass filter for the current alarmism. I thought he nailed a lot of stuff. I don't know what's normal for Forbes. My take was that he thinks sanity will prevail and nonsense will pass. I liked the message.

evanh · 2018-11-26 10:16

I certainly wouldn't want to be in his world.

cgracey · 2018-11-26 10:25

I'm pretty sure I know why the ADC sees more GIO noise than VIO noise.

It's because of the die substrate's many digital-ground tap connections. The substrate is full of digital ground noise and even though our analog ground is brought in via separate bond wires and tap-connected to private deep N-wells, those deep N-wells couple a lot of digital ground noise from the substrate.

Yanomani has pm'd me lots of stuff about on-chip noise-isolation, but I didn't quite get it.

The reason there's even as much noise as there is on the VIO reading is because GIO (local analog ground) is powering the inverters which make up the integrator sense amp, and the ground noise is causing the integrator cap to be read with uncertainty. For the GIO reading, you have the same inverter noise, plus the noise of the ADC input connected to GIO, instead of VIO, so there's even more.

So, the ADC's biggest noise source is from its local analog ground deep N-wells coupling digital-ground noise via the substrate. That's what's limiting ADC resolution right now.

evanh · 2018-11-26 10:44

cgracey wrote: »

So, the ADC's biggest noise source is from its local analog ground deep N-wells coupling digital-ground noise via the substrate. That's what's limiting ADC resolution right now.

Below 1 MHz too?

cgracey · 2018-11-26 10:52

evanh wrote: »

cgracey wrote: »

So, the ADC's biggest noise source is from its local analog ground deep N-wells coupling digital-ground noise via the substrate. That's what's limiting ADC resolution right now.

Below 1 MHz too?

There's 1/f noise, too. Maybe that is most significant, actually.

This reminds me of something... shorter measurements have less noise. With this SINC3 filter, we are getting in 256 clocks what used to take 64K clocks, so we might be getting better-quality measurements, already. I will test that out in the morning on the FPGA.

Ah, but comparisons require multiple measurements, which bring time back in. Nevertheless, calibrated measurements can happen much faster now. So there should be an improvement. I am anxious to find out.

rogloh · 2018-11-26 11:30

cgracey wrote: »

So, while this SINC3 mode didn't create any new flops, it did grow the smart pin logic by 20%, which is not trivial. This change might have singularly grown the overall P2 logic by 6%.

Chip do you have an inkling as to the budget of free logic resources actually available in the second spin of P2 to make it fit the die etc? These numbers seem like a rather large increase to me. What would you expect to have free, space wise to play with? Is that something already known in advance?

Cluso99 · 2018-11-26 12:29

While 6% more logic scares me, along with its impact on routing and hence timing, and a bit more power too, what worries me most is the possibility of introducing bugs in the existing working section.
Finally, it seems that the culprit behind the lower-than-hoped performance is silicon placement rather than logic.

Surely effort in finding why the power usage doesn't reduce with less cogs running and less Hub activity would be beneficial ???
Or more testing of what we have ???

Just my opinion though.

evanh · 2018-11-26 13:04

Full adders aren't cheap I guess. Hmm, there should have already been two 32-bit adders in each smartpin before the Sinc3 was included. Surely the optimiser can make use of those.

I wonder if the 20% is all from a single adder.

cgracey · 2018-11-26 13:50

evanh wrote: »

Full adders aren't cheap I guess. Hmm, there should have already been two 32-bit adders in each smartpin before the Sinc3 was included. Surely the optimiser can make use of those.

I wonder if the 20% is all from a single adder.

The pre-existing adders are inc/dec-type, not full-type. The growth makes perfect sense.

The added routing for this is all local, within each smart pin.

cgracey · 2018-11-26 13:55

Cluso99 wrote: »

While 6% more logic scares me, along with its impact on routing and hence timing, and a bit more power too, what worries me most is the possibility of introducing bugs in the existing working section.
Finally, it seems that the culprit behind the lower-than-hoped performance is silicon placement rather than logic.

Surely effort in finding why the power usage doesn't reduce with less cogs running and less Hub activity would be beneficial ???
Or more testing of what we have ???

Just my opinion though.

I've been extremely careful and conservative in all modifications to the design. Some basic tests should confirm whether or not I broke anything, and I'm testing along the way.

Getting the power down is coming up after I redo the streamer.

cgracey · 2018-11-26 14:06

rogloh wrote: »

cgracey wrote: »

So, while this SINC3 mode didn't create any new flops, it did grow the smart pin logic by 20%, which is not trivial. This change might have singularly grown the overall P2 logic by 6%.

Chip do you have an inkling as to the budget of free logic resources actually available in the second spin of P2 to make it fit the die etc? These numbers seem like a rather large increase to me. What would you expect to have free, space wise to play with? Is that something already known in advance?

I think our starting utilization was 65%. This will drive it up to about 69%, which is a little on the high side.

TonyB_ · 2018-11-26 14:11

cgracey wrote: »

evanh wrote: »

Full adders aren't cheap I guess. Hmm, there should have already been two 32-bit adders in each smartpin before the Sinc3 was included. Surely the optimiser can make use of those.

I wonder if the 20% is all from a single adder.

The pre-existing adders are inc/dec-type, not full-type. The growth makes perfect sense.

The added routing for this is all local, within each smart pin.

I can accept why each of the differentiators must be full-width, but it doesn't make sense to me for each of the integrators to be the same size. Sinc3 is Sinc2 + another stage and Sinc2 is Sinc1 + another stage and acc1 for Sinc1 need only be 10-bit if R=1024.

Whatever the size, acc1 can be an incrementer as only 0 or +1 are added to it. I think acc1 could be 10-bit up counter, acc2 a 20-bit adder+register and acc3 a 30-bit adder+register. Have you tried sign-extending 10-bit acc1 to 20-bit to add to acc2, then sign-extending acc2 to 30-bit to add to acc3?

EDIT:
Edited for clarity.
