I'm looking at these DIFF registers, thinking about putting them into the smart pin for short high-speed conversions. Does anyone have any idea of the bit-requirement for the DIFF registers?
What if we do the full computation on every clock? This would keep our DIFF's down, wouldn't it? We could kick out a sample every clock then, for whoever wants it. Could this Sinc3 thing work like that, or does it only work in batch mode, where you do a big computation to catch everything up periodically?
They call it decimation. From what little I can understand, I think the mathematical filter strength of the data in the accumulators is inherent to the decimation rate.
The faster the decimation rate (the fewer clocks per output sample), the higher the cut-off becomes.
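To put rough numbers on the cut-off remark, here is a minimal sketch. It assumes the textbook Sinc3 response (a moving average over R clocks, cascaded three times) and uses hypothetical names `fs` (bitstream clock) and `R` (clocks per output sample). It is not taken from the smart pin logic, just the standard formula.

```python
import math

def sinc3_gain(f, fs, R, order=3):
    """Normalised gain of an R-clock moving average cascaded `order` times."""
    if f == 0:
        return 1.0
    x = math.pi * f / fs
    return abs(math.sin(R * x) / (R * math.sin(x))) ** order

def cutoff_3db(fs, R, order=3):
    """Bisect for the -3 dB point inside the main lobe (first null is at fs/R)."""
    lo, hi = 0.0, fs / R
    target = 1 / math.sqrt(2)
    for _ in range(60):
        mid = (lo + hi) / 2
        if sinc3_gain(mid, fs, R, order) > target:
            lo = mid   # still above -3 dB, move up in frequency
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    fs = 240e6  # sysclock figure quoted in the thread
    for R in (8, 16, 64):
        print(R, "clocks/sample -> cut-off about", round(cutoff_3db(fs, R) / 1e6, 3), "MHz")
```

Fewer clocks per sample (smaller R) pushes the first null at fs/R upward, and the -3 dB point moves up with it.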
Differentiator clock is much slower than integrator clock.
I did some experiments to find the limits of high-speed A/D.
These are 8-clock samples @240MHz, making 30M samples/second.
The analog input is a 1.2MHz sawtooth.
Will the rev B silicon be able to run the diff code at 30M, to keep this real-time?
~50ns full scale is impressive for an ADC not focused on speed, and could be useful for current sense on high-speed switchers.
That does not need many bits, and overcurrent detection is a common requirement.
On that topic, what is the speed of the Threshold DAC + Comparator? I think you mentioned it before.
Do we have a silicon LUT adder cost yet for these adders in the smart pins?
I just grabbed a bunch of raw ADC bits and processed them afterwards. Running 8 clocks per sample would have to be totally done in hardware. Only 4 instructions could execute in that time. On the other hand, acc's and diff's would be low bit-count. It might be good to have modes in hardware for low-sample-count Sinc3 conversions. There will never be time for software to handle these computations. Then, the streamer could have modes to capture these sample runs and drop them into memory, thereby kicking the ball forward slightly and giving software another headache.
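The timing budget behind "only 4 instructions" can be written out as plain arithmetic. The 2-clocks-per-instruction figure is an assumption inferred from the post itself (8 clocks, 4 instructions). The other numbers are the ones quoted above.

```python
SYSCLK_HZ = 240_000_000        # from the post
CLOCKS_PER_SAMPLE = 8          # from the post
CLOCKS_PER_INSTRUCTION = 2     # assumption, implied by "only 4 instructions"

sample_rate = SYSCLK_HZ // CLOCKS_PER_SAMPLE
instructions_per_sample = CLOCKS_PER_SAMPLE // CLOCKS_PER_INSTRUCTION
sample_period_ns = 1e9 / sample_rate

if __name__ == "__main__":
    print("samples per second:", sample_rate)
    print("instructions available per sample:", instructions_per_sample)
    print("sample period in ns:", round(sample_period_ns, 2))
```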
My Sinc3 code optimization saves some cycles. I think a cog could do 16 clocks per sample, writing unshifted results to shared LUT with PTRx++. Its paired cog could read from the LUT, shift the data, write it back to LUT in packed longs and do fast block writes to hub RAM.
Chip,
For a little side info, JMG linked to a TI DSP earlier today. He quoted some specifications of integrated filter hardware that took a bitstream as input. JMG derived that the decimation interval was 256 bit-clocks in the quoted specs.
One of the specs was 16-bit data out. This would be post-differentiation. The decimator will probably be 24-bit, with only the upper 16 bits presented as user data. So, it's also a Sinc3 design.
It would be easier to sum readings over time, rather than maintain all these variables.
I think the adds need to be every sysclk, but the diff can be slower, and could even be serial if that helps logic size?
For some applications, it would be ok to burst capture sinc3, and then process later.
E.g., in an SMPS you know the shape of the current, so you do not need to follow the whole waveform; you just need to read it at the same aperture every SMPS cycle, and compare or react.
If you were to take ten 10-clock sinc3 readings and sum them all up, would it be equivalent to having waited 100 clocks and taken one reading?
It can be done, but the filter performance changes accordingly. That's what I did to solve for discontinuous single-point sampling. Aside from the cut-off, there was a loss of resolution in splitting the Sinc2 256-clock sample into 4 chunks of 64 clocks: it produced only 12-bit samples instead of 16-bit samples.
To make that loss back up would've needed another layered Sinc2 on top. In your case, Sinc3.
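The 16-bit versus 12-bit figures can be checked from the DC gain of a decimating Sinc filter, which is R^N for R clocks per sample and order N. The sketch below only counts output levels. It says nothing about noise shaping, and the function names are made up for illustration.

```python
import math

def full_scale(clocks, order):
    """DC gain (largest output) of an order-N Sinc filter decimating by `clocks`."""
    return clocks ** order

def bits_of_resolution(clocks, order):
    """Number of bits spanned by the full-scale output."""
    return math.log2(full_scale(clocks, order))

if __name__ == "__main__":
    # Sinc2 example from the post: one 256-clock sample vs. one 64-clock chunk
    print(bits_of_resolution(256, 2), bits_of_resolution(64, 2))
    # Sinc3 question: ten summed 10-clock readings vs. one 100-clock reading
    print(10 * full_scale(10, 3), "levels summed vs", full_scale(100, 3), "levels")
```

Summing short readings only adds their level counts, while one long reading multiplies them, which is where the resolution goes.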
It seems like it's probably best to just maintain the Sinc3 accumulators and make acc3 readable at all times. Forget about time base, too, since there are no flops left for that. The cog code must read it on schedule.
Wait... it can work like this for synchronous conversions: OUT will gate acc3 reporting. When OUT is low, acc3 is continually reported for RDPIN reading. While OUT is high, the reporting stops, allowing grouped pins to be read from the same point in time.
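A behavioural sketch of that gating idea, purely as a thought model: the class and method names are hypothetical, and this is not how the smart pin is actually built. Each pin copies acc3 into a report register while OUT is low and holds it while OUT is high, so a group of pins can be frozen together and then read one after another.

```python
class Acc3Reporter:
    """Hypothetical model of one smart pin's acc3 reporting path."""

    def __init__(self):
        self.acc3 = 0
        self.reported = 0

    def clock(self, new_acc3, out_high):
        """Advance one clock: acc3 always updates, the report only while OUT is low."""
        self.acc3 = new_acc3
        if not out_high:
            self.reported = self.acc3

    def rdpin(self):
        """What a RDPIN-style read would return in this model."""
        return self.reported


if __name__ == "__main__":
    pin_a, pin_b = Acc3Reporter(), Acc3Reporter()
    for t in range(1, 11):
        out_high = t > 6               # OUT raised on both pins after clock 6
        pin_a.clock(t * 10, out_high)  # made-up acc3 values
        pin_b.clock(t * 7, out_high)
    # Both reads reflect clock 6 even though they happen later, one after the other.
    print(pin_a.rdpin(), pin_b.rdpin())
```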
I just did a masking test and I've found that for 8-sample Sinc3 conversions, only 9 bits are needed for all variables (acc1/acc2/acc3/diff1/diff2/diff3).
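A masking test along those lines can be sketched as follows. This is a generic textbook Sinc3 (three integrators at the bit rate, three differentiators at the decimated rate), not the actual test code, and the bit pattern is made up. Because every register wraps modulo 2^width, the masked result always equals the full-precision result modulo 2^width. With 8 clocks per sample the full-precision output tops out at 8^3 = 512, so 9 bits are exact for everything except a stream of all ones.

```python
def sinc3_decimate(bits, R, width=None):
    """Sinc3 decimator; if `width` is given, every register wraps to that many bits."""
    mask = (1 << width) - 1 if width else None

    def wrap(v):
        return v & mask if mask is not None else v

    acc1 = acc2 = acc3 = 0
    d1 = d2 = d3 = 0          # previous inputs held by the three differentiators
    out = []
    for n, b in enumerate(bits, 1):
        acc1 = wrap(acc1 + b)
        acc2 = wrap(acc2 + acc1)
        acc3 = wrap(acc3 + acc2)
        if n % R == 0:        # decimation point: run the differentiators once
            c1 = wrap(acc3 - d1); d1 = acc3
            c2 = wrap(c1 - d2); d2 = c1
            c3 = wrap(c2 - d3); d3 = c2
            out.append(c3)
    return out


if __name__ == "__main__":
    # Made-up bitstream with a 60% ones density (never all ones)
    bits = [1 if (i * 7) % 10 < 6 else 0 for i in range(160)]
    full = sinc3_decimate(bits, R=8)
    masked = sinc3_decimate(bits, R=8, width=9)
    print(full == masked)
```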
The decimation is not required. But it has a very convenient property of reducing the length of delay on the comb part and consequently the amount of memory/flops required. The response should be the same whether we adjust R (the decimation rate) or M (the comb delay).
Think of a first order setup of an integrator followed by a comb. If you wanted to do a moving average, you would read the integrator, wait for how many samples you wanted to sum, then read the integrator again. Finally, calculate the difference of those two values you read.
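Here is that first-order picture as a small sketch (the function name and test pattern are made up). The integrator runs on every bit, and every R bits the comb subtracts the integrator value from M decimated samples ago, so each output is a plain sum of the last R*M bits. It also shows the R versus M point: 4 clocks with a comb delay of 2 gives the same numbers as 8 clocks with a comb delay of 1 wherever the two outputs line up.

```python
def cic1(bits, R, M):
    """First-order integrator + comb: decimate by R, comb delay of M decimated samples."""
    acc = 0
    taps = [0] * M            # last M decimated integrator readings (start at zero)
    out = []
    for n, b in enumerate(bits, 1):
        acc += b              # integrator runs on every input bit
        if n % R == 0:
            out.append(acc - taps[0])   # difference of two integrator readings
            taps = taps[1:] + [acc]
    return out


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 1, 1, 0, 0, 1] * 8
    a = cic1(bits, R=4, M=2)  # output every 4 clocks, sums the last 8 bits
    b = cic1(bits, R=8, M=1)  # output every 8 clocks, sums the last 8 bits
    print(a[1::2] == b)       # compare where both produce an output
    print(b[1] == sum(bits[8:16]))
```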
The logic required to do any real amount of filtering and output samples at sysclock rate is too much. It might be reasonable to decimate down to 10MS or so.
One idea to help reduce logic creepage: these filters do not need to be attached to all smart pin cells, do they?
I almost complained about that on a number of occasions today but checked myself each time when I realised that Sinc3 allows fewer clocks per sample for the same resolution. So, settling time to equivalent dB probably doesn't change from Sinc1 up.
You can see that for 8..32 samples, there are enough registers in a smart pin to do it ALL.
I would like an 8-bit sample EVERY clock, though. I think maybe that can be done by windowing (hardwired Tukey) a continuous span of ~16 samples, and instead of adding 0/1 into acc1, we'd add 0..7 on each clock. I will try this out with real ADC streams to see what kind of response we can get. This would be cool for scope-like applications and I imagine it would do wonders for the Goertzel circuit.
I've done some experiments and I see now that unless you do multiple iterations of Sinc3, you do not get the magical effect you're looking for. The only way you could get a unique sample every clock would be to pipeline the Sinc3, so that you are always working on maybe 8 stages of accumulation and differentiation. That is totally not practical in a smart pin, so we are only going to be getting ADC samples over numbers of clocks. It still seems amazing, anyway.
Comments
Groups of the existing modes have so much in common that they could easily be merged, using bits of Y to select inc/dec conditions.
For a 63-sample Sinc3:
acc1 needs 6 bits
acc2 needs 11 bits
acc3 needs 16 bits
What about diff1/2/3? It seems to me that they might all need 16 bits, but I'm probably not seeing something.
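The accumulator widths above can be reproduced with a quick worst-case run. This sketch assumes the accumulators start from zero, the input is all ones for 63 clocks, and each stage adds the already-updated value of the stage before it. Simultaneous updates would give slightly smaller peaks.

```python
def batch_acc_peaks(n_clocks):
    """Worst-case accumulator values after n_clocks of all-ones input from a cleared state."""
    acc1 = acc2 = acc3 = 0
    for _ in range(n_clocks):
        acc1 += 1        # bit input stuck at 1
        acc2 += acc1
        acc3 += acc2
    return acc1, acc2, acc3


if __name__ == "__main__":
    peaks = batch_acc_peaks(63)
    widths = [p.bit_length() for p in peaks]
    print(peaks, widths)
```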
They could, but I might throw more counter modes together using Y bits, to free up a nice 5-bit code, or two, for Sinc3.
diff1 16-bit, diff2 17-bit, diff3 18-bit?
Yes, all 16-bit, equal acc3.
Only 16-bit?! Scrooge!
Does overflow/underflow in diffX not matter?
Yes it does, that's why Chip said 63 clocks. 63 clocks would be the maximum software sampling (decimation) interval.
63 is the max for 16-bit acc3 and therefore diff1 is also 16-bit, but diff3 = acc3 - diff2 - diff1.
http://www.ti.com/lit/ds/symlink/ads1260.pdf
It says Sinc3 settling is ~3x the measurement time.
Calibration is ~17x the measurement time, so it takes some time to do.
One idea to help reduce logic creepage: these filters do not need to be attached to all smart pin cells, do they?
No pin should be more equal than any other pin.
Does that describe a layering arrangement, maybe?