Now I'm compiling a new version that is pipelined. It sums the 1's in each bit position, then adds them all together, staggered, in the next clock. The compiler will only build adders where they are needed, since many values have 0's in most bits:
Okay. The compiler liked that. It's now 92 ALMs, instead of 80, and there's tons of slack in the timing, which is great. And that's a complete scope channel with pin selector and a sample output on every clock!
Because these scope blocks are high-value, each one will have its own pin selector, rather than be stuck with a set of four contiguous pins. Each cog will be able to look at any four pins. I'll make an instruction for reading the four channels (GETADCS D) and the streamer will have access to them for storing in memory. As Saucy said, since a filtered sample is always available, you can get whatever sample rate you want through the streamer with its NCO.
Note to self: Make a trigger mechanism with hysteresis for scope-like triggering on ADC channel data. It drops the current write address into a register and causes an event.
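For anyone wanting to play with the idea in software first, here is a minimal Python sketch of a rising-edge trigger with hysteresis. It is only a behavioural model under assumptions: the name find_trigger, the two threshold parameters and the choice of a rising edge are all hypothetical, and the returned index merely stands in for the write address that the hardware would latch.

```python
def find_trigger(samples, arm_level, fire_level):
    """Return the index of the first rising trigger, or None.

    The trigger arms once a sample is at or below arm_level and fires on
    the first later sample at or above fire_level. The gap between the
    two levels is the hysteresis that rejects noise around one threshold.
    """
    armed = False
    for address, value in enumerate(samples):
        if not armed:
            if value <= arm_level:
                armed = True
        elif value >= fire_level:
            return address  # stands in for the latched write address
    return None
```

Noise that wobbles around the fire level without first dropping to the arm level never triggers, which is the point of the hysteresis.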
Mismatched accumulators seem to be unstable. The bigger the size difference, the larger the wobble amplitude. E.g., 30-27-24 has a much bigger peak-to-peak than either 24-23-22 or 30-29-28, which are identical to each other.
I am a bit lost in all this discussion here.
I'm using TonyB_'s Tukey 17*/32 to good effect on a windowed 8-bit-sample-per-clock ADC mode in the cog.
Here is the table. I needed to drop the total table sum by 1, and the only symmetrical point to do it at was the middle value:
tukey long 01,03,05,07,10,13,16,19,22,25,27,29,31 '13 up, up/down sum = 208 * 2
long 32[9],31,32[9] '19 top, top sum = 607
long 31,29,27,25,22,19,16,13,10,07,05,03,01 '13 down, total sum = 1023 (>>2 = 255)
This is definitely an application where the Tukey shines. So, all that work is not going to waste.
In order to get sufficient SNR for 8-bit usage, the window needed to be as wide as it is. The window's low-pass effect begins to kick in at around 1MHz at 180MHz Fsys.
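As a sanity check on the table, here is a minimal Python sketch that applies the 45 coefficients to a bitstream and scales the result to 8 bits. The coefficients are copied from the table above; the function names window_sample and filter_stream are hypothetical, and this is a software model only, not the cog logic.

```python
# Tukey 17*/32 window coefficients, copied from the table above (45 taps).
RAMP = [1, 3, 5, 7, 10, 13, 16, 19, 22, 25, 27, 29, 31]
TUKEY = RAMP + [32] * 9 + [31] + [32] * 9 + RAMP[::-1]


def window_sample(bits):
    """Weight the most recent 45 ADC bits and scale the sum to 8 bits."""
    assert len(bits) == len(TUKEY)
    total = sum(c for c, b in zip(TUKEY, bits) if b)
    return total >> 2  # 0..1023 becomes 0..255


def filter_stream(bitstream):
    """Return one 8-bit sample per input bit once the window is full."""
    taps = [0] * len(TUKEY)
    out = []
    for n, bit in enumerate(bitstream):
        taps = taps[1:] + [bit]  # shift in the newest bit
        if n >= len(TUKEY) - 1:
            out.append(window_sample(taps))
    return out
```

An all-ones window gives 1023 >> 2 = 255, which is why the middle value was dropped from 32 to 31.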
Chip, did you test Tukey13*/32?
tukey long 01,02,05,08,12,16,20,24,27,30,31 '11 up, up/down sum = 176 * 2
long 32[10],31,32[10] '21 top, top sum = 671
long 31,30,27,24,20,16,12,08,05,02,01 '11 down, total sum = 1023 (>>2 = 255)
The Tukey tap logic minimizes better, but I don't know about the quality.
The Tukey 17*/32 is good because we need that many side lobes. If the side lobes get too short, we see lots of ripple on DC readings because we are not covering that 7-cycle max period that the ADC data cycles in.
Hey, it might be good to try something longer than 17. What else have you got?
I've been looking only at Tukeys that are perfectly linear in the mid-range. Here is a new one below the existing:
Tukey17*/32, discarding 0 twice and 32 twice. Current
tukey long 01,03,05,07,10,13,16,19,22,25,27,29,31 '13 up, up/down sum = 208 * 2
long 32[9],31,32[9] '19 top, top sum = 607
long 31,29,27,25,22,19,16,13,10,07,05,03,01 '13 down, total sum = 1023 (>>2 = 255)
Tukey16*/33, discarding 0 and 33, flat top.
long 01,02,04,06,09,12,15,18,21,24,27,29,31,32 '14 up, up/down sum = 231 * 2 = 462
long 33[17] '17 top, top sum = 561
long 32,31,29,27,24,21,18,15,12,09,06,04,02,01 '14 down, total sum = 1023 (>>2 = 255)
1023 = 33 * 31
TonyB_, I don't know why it should be, but that Tukey 16*/33 is noisier than the Tukey 17*/32. It must have something to do with the frequency of data cycling in the ADC bitstream as it interacts with the side lobes of the window.
The linear section is one sample longer and maybe that doesn't help. If you think Tukey13*/32 is too short, then I say stick with 17*/32 and let's move on, because on paper 16*/33 looked very good!
But Sinc3 still is a good low-pass filter and may smooth things out.
We could certainly see better results in testing. And as JMG says, this can also be used with external bitstream ADCs, which can be extremely good for instrumentation because it's easy to place electrical isolation on the digital bitstream signal.
Obviously the die space costs have become a concern. I'm thinking reducing to 24-bit accumulators and staying with Sinc3 is the way forward. This allows up to 256 bit-clocks per sample.
EDIT: JMG mentioned current loop control in servo drives as a potential use. Now that's an electrically noisy environment! Strain gauges are another fast-acting transducer with potential for noise injection, because of the often long cable run from load cell to electronics.
Evanh, do you think this can be done without any numerical compromise?
Maybe if rounding can be done cheaply. Tony mentioned doing rounding.
Evan, thanks for all your tests.
The literature says the integrator and comb widths should be Bout = N * log2(RM) + Bin. For our Sinc3, Bin = 1, M = 1, N = 3 and R = 1024 max. Therefore Bout = 31 but we know it works if Bout = 30.
I've calculated how many bits are needed before and after each stage, for Bout = 30 and 31:
This LSB pruning is based on standard deviations and I prefer a definite 1 and a definite 0 when it comes to logic. When acc1 rolls over to zero, it takes two or three samples for acc1 to be large enough to affect acc2, which seems to be an insoluble problem.
I know where the sign-extending should be done now and it's at the input to acc1, so strictly speaking acc1 should count down when the ADC bit = 1, but I doubt that makes the slightest difference to the accuracy.
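The width formula quoted above is easy to check numerically. This is a minimal Python sketch of just that arithmetic; the function names are hypothetical, and it says nothing about where pruning is safe.

```python
import math


def cic_bit_growth(order, decimation, diff_delay=1):
    """Bit growth through the filter: N * log2(R * M), rounded up."""
    return math.ceil(order * math.log2(decimation * diff_delay))


def cic_register_bits(order, decimation, diff_delay=1, input_bits=1):
    """Bout = N * log2(R * M) + Bin, as given in the literature."""
    return cic_bit_growth(order, decimation, diff_delay) + input_bits


print(cic_register_bits(3, 1024))  # Sinc3, R = 1024, Bin = 1
print(cic_bit_growth(3, 1024))     # growth alone, without Bin
print(cic_bit_growth(3, 256))      # growth for 256 bit-clocks per sample
```

The growth term alone gives 30 bits at R = 1024 and 24 bits at R = 256, which lines up with the 30-bit and 24-bit figures mentioned in this thread.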
That's good info. I think I understand it. Thanks.
I definitely want the Sinc3 now, simply because the smartpin can be used for external bitstream timing. I hadn't pondered it much until now but I've always wanted to hook up external bitstream ADCs to get electrical isolation.
Evan, do you want to try acc2/acc3 = 28/25 bits wide, with acc1 decremented?
I've tried many combos. They all produce some ongoing oscillation. The bigger the truncation, the bigger the oscillation amplitude. EDIT: Only equal-sized accumulators produce a stable signal.
Decrementing/incrementing are one and the same in this context.
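For reference, here is a minimal Python model of the equal-width case only: three wrapping integrators, decimation by R, then three differences, all at the same register width. It is a sketch under assumptions (the function name sinc3 and its parameters are hypothetical, and it does not attempt to model the truncated, mismatched-width variants tested above).

```python
def sinc3(bitstream, decimation, width):
    """Three wrapping integrators, decimate by R, then three differences."""
    mask = (1 << width) - 1
    acc1 = acc2 = acc3 = 0
    d1 = d2 = d3 = 0  # previous values held by the three comb stages
    out = []
    for n, bit in enumerate(bitstream, start=1):
        # Integrators run at the bit rate and are allowed to wrap.
        acc1 = (acc1 + bit) & mask
        acc2 = (acc2 + acc1) & mask
        acc3 = (acc3 + acc2) & mask
        if n % decimation == 0:
            # Combs run at the decimated rate, modulo the same width.
            c1 = (acc3 - d1) & mask
            d1 = acc3
            c2 = (c1 - d2) & mask
            d2 = c1
            c3 = (c2 - d3) & mask
            d3 = c2
            out.append(c3)
    return out
```

With R = 16 and 13-bit registers, an all-ones stream settles at 16^3 = 4096 and an alternating 1,0 stream at 2048, even though the integrators wrap many times along the way. The wraps cancel in the differences as long as every stage uses the same modulus.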
Yes, sinc3 seems to be the standard used, and there are DSPs and high end MCUs with these filters in them, which will mean even more Analog parts designed to use them ...
The newest TI variant part (AMC1035) drops isolation and adds a 0.25% VREF, still in the same 8-pin package, for a lower price of about $1.50 (you can add external isolation).
Their data says
"When used with a digital filter (such as integrated in the TMS320F28004x, TMS320F2807x or TMS320F2837x microcontroller families) to decimate the output bitstream, the device can achieve 16 bits of resolution with a dynamic range of 87 dB at a data rate of 82 kSPS."
Digital Microphones seem to also have the PDM interface (CLK in, Data out)
I've seen D-FF and high end amplifiers, used to get > 20 bits of ADC, at quite low (and reducing all the time) costs, so I'd be wary of any Sinc3 size prune that limits P2 to 16b ADCs only.
We want P2 to have a long design life.
Another benefit of a standard-size Sinc3, is the existing analog parts (AMC1035, AD7403 etc) can be used to confirm FPGA operation.
It's a shame the SmartPins weren't each implemented as a tiny reduced P2 cog.
No, I am definitely not suggesting it now. It's way too late now!!!
The smart pins work far faster than a P2 COG could manage in software, but they are so smart, that I would make a variant of your comment.
It's a shame some tiny reduced P2 COGs were not implemented to start and service the smart pins.
Using a Full P2 COG to load some registers and poll some flags, is quite an overkill.
I recoded the Tukey filter to sum up the 1s', 2s', 4s', 8s', 16s', and 32s' bits, and then sum those summations up at appropriate bit offsets. This has plenty of slack, still, with NO pipelining. It's down to 88 ALMs.
This is a whole 8-bit-sample-per-clock scope channel with input selector, thought impossible (by me, anyway) just a week ago.
Thanks for all your help on this, Everyone.
// cog osc fil
module cog_osc_fil
(
input resn,
input clk,
input ena,
input set,
input [7:0] d,
input [63:0] pin_in,
output reg [7:0] sample
);
reg [7:0] cfg; // configuration
`regscan (cfg, 8'b0, !ena || set, !ena ? 8'b0 : d[7:0])
reg [44:0] tap; // Tukey window taps
`regscan (tap, 45'b0, cfg[7], {tap[43:0], pin_in[cfg[5:0]]})
wire [5:0][5:0] bits;
genvar i; // sum bits of Tukey values and'd with taps
generate
for (i = 0; i <= 5; i++)
begin : bitsgen
assign bits[i] = (6'h01 >> i & tap[44]) +
(6'h03 >> i & tap[43]) +
(6'h05 >> i & tap[42]) +
(6'h07 >> i & tap[41]) +
(6'h0A >> i & tap[40]) +
(6'h0D >> i & tap[39]) +
(6'h10 >> i & tap[38]) +
(6'h13 >> i & tap[37]) +
(6'h16 >> i & tap[36]) +
(6'h19 >> i & tap[35]) +
(6'h1B >> i & tap[34]) +
(6'h1D >> i & tap[33]) +
(6'h1F >> i & tap[32]) +
(6'h20 >> i & tap[31]) +
(6'h20 >> i & tap[30]) +
(6'h20 >> i & tap[29]) +
(6'h20 >> i & tap[28]) +
(6'h20 >> i & tap[27]) +
(6'h20 >> i & tap[26]) +
(6'h20 >> i & tap[25]) +
(6'h20 >> i & tap[24]) +
(6'h20 >> i & tap[23]) +
(6'h1F >> i & tap[22]) +
(6'h20 >> i & tap[21]) +
(6'h20 >> i & tap[20]) +
(6'h20 >> i & tap[19]) +
(6'h20 >> i & tap[18]) +
(6'h20 >> i & tap[17]) +
(6'h20 >> i & tap[16]) +
(6'h20 >> i & tap[15]) +
(6'h20 >> i & tap[14]) +
(6'h20 >> i & tap[13]) +
(6'h1F >> i & tap[12]) +
(6'h1D >> i & tap[11]) +
(6'h1B >> i & tap[10]) +
(6'h19 >> i & tap[09]) +
(6'h16 >> i & tap[08]) +
(6'h13 >> i & tap[07]) +
(6'h10 >> i & tap[06]) +
(6'h0D >> i & tap[05]) +
(6'h0A >> i & tap[04]) +
(6'h07 >> i & tap[03]) +
(6'h05 >> i & tap[02]) +
(6'h03 >> i & tap[01]) +
(6'h01 >> i & tap[00]) ;
end
endgenerate
wire [10:0] sum = {bits[5], 5'b0} +
{bits[4], 4'b0} +
{bits[3], 3'b0} +
{bits[2], 2'b0} +
{bits[1], 1'b0} +
{bits[0]};
`regscan (sample, 8'b0, cfg[7], sum[9:2])
endmodule
The Tukey 17*/32 is kind of special, aside from working the best, because its 1s', 2s', 4s', 8s', 16s' and 32s' bits are all about the same in population, which nicely distributes the bit-adders' workload, keeping them all around the same speed:
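The bit populations can be counted directly from the table. Below is a minimal Python sketch (function names are hypothetical) that counts how many coefficients have each bit set, and also checks that summing per bit position and then combining at bit offsets gives the same result as a plain weighted sum, which is what the Verilog above relies on.

```python
# Tukey 17*/32 coefficients, same values as in the Verilog above.
RAMP = [1, 3, 5, 7, 10, 13, 16, 19, 22, 25, 27, 29, 31]
TUKEY = RAMP + [32] * 9 + [31] + [32] * 9 + RAMP[::-1]


def bit_populations(coeffs, nbits=6):
    """Count how many coefficients have each bit set, LSB first."""
    return [sum((c >> i) & 1 for c in coeffs) for i in range(nbits)]


def bitplane_sum(coeffs, taps, nbits=6):
    """Sum each bit position separately, then combine at bit offsets."""
    planes = [
        sum(((c >> i) & 1) & t for c, t in zip(coeffs, taps))
        for i in range(nbits)
    ]
    return sum(p << i for i, p in enumerate(planes))


print(bit_populations(TUKEY))  # populations of the 1s, 2s, 4s, 8s, 16s, 32s bits
```

For this table the populations come out as 21, 15, 13, 13, 15 and 18, so no single bit-adder has to count far more inputs than the others.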
Comments
SETADCS {#D} - configure all four ADCs via the four bytes of D, where in each byte bit 7 = enable and bits 5..0 = pin
GETADCS D - read all 4 ADCs into D bytes
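To make the byte layout concrete, here is a minimal Python sketch that builds a SETADCS-style value and splits a GETADCS-style result. The helper names are hypothetical, and placing channel 0 in the lowest byte is an assumption; the thread only states the per-byte layout (bit 7 = enable, bits 5..0 = pin).

```python
def adc_config_byte(pin, enable=True):
    """One channel's config byte: bit 7 = enable, bits 5..0 = pin."""
    if not 0 <= pin <= 63:
        raise ValueError("pin must be 0..63")
    return (0x80 if enable else 0x00) | pin


def pack_setadcs(pins):
    """Pack four channels into D. Use None for a disabled channel.

    Assumption: channel 0 sits in the lowest byte of D.
    """
    if len(pins) != 4:
        raise ValueError("exactly four channels expected")
    d = 0
    for channel, pin in enumerate(pins):
        byte = 0 if pin is None else adc_config_byte(pin)
        d |= byte << (8 * channel)
    return d


def unpack_getadcs(d):
    """Split a 32-bit D value into four 8-bit samples, channel 0 first."""
    return [(d >> (8 * channel)) & 0xFF for channel in range(4)]
```

For example, pack_setadcs([0, 17, None, 63]) enables channels 0, 1 and 3 on pins 0, 17 and 63 and leaves channel 2 disabled.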
Are certain ADCs tied to certain cogs like DACs are? If so, can a cog's ADCs be on different pins than its DACs?
The four ADC inputs to each cog are fully selectable. They don't have to be in a row.
Could that also be a plan C, if Sinc3 is tight for ALL pins, to reduce it to some-number-of-pins-per-COG ?
Yes. Could be. The nice thing about smart pins is that they ALL could be doing conversions.
But I have a 'strange feeling' about this so maybe someone can help me get over it ;-).
Starting with a simple question:
My understanding was that the analog part in the smart pin ADC is a single-order sigma-delta modulator like (3) in this Wikipedia schema.
1. Is this correct?
The noise shaping is done in the 'loop-filter' and not in the decimation filter, according to: 9789400713864-c2.pdf
Sinc3 decimation filters are said to be close to optimal for a SECOND-order modulator but not for a single-order modulator.
So unless there is a second order Modulator the advanced Decimation filtering will not give the expected noise shaping.
But Sinc3 still is a good low-pass filter and may smooth things out.
2. so where did I miss the important link?
https://maximintegrated.com/en/app-notes/index.mvp/id/1870
Maybe I found it in the above document.
Even a single-order modulator already gives noise shaping, with a 9 dB/octave SNR improvement, compared to 15 dB/octave for a second-order modulator.
The high frequency quantization noise (from switching) is smoothed in the integrator/C - low-pass.
OK - I think that was it -
A Sinc3 is usually associated with a second order modulator
- that's what felt strange with the assumption the P2 only uses first order modulation.
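The two per-octave figures follow from the usual idealized model, where the quantization noise is treated as white and an order-L loop shapes it: the SNR then improves by (2L + 1) * 10 * log10(2) dB for every doubling of the oversampling ratio. A minimal Python sketch of that arithmetic (the function name is hypothetical, and real modulators fall short of the ideal):

```python
import math


def snr_gain_per_octave_db(order):
    """Ideal SNR gain per doubling of oversampling ratio, order-L modulator."""
    return (2 * order + 1) * 10 * math.log10(2)


print(round(snr_gain_per_octave_db(1), 2))  # first-order modulator
print(round(snr_gain_per_octave_db(2), 2))  # second-order modulator
```

That gives roughly 9 dB per octave for first order and 15 dB per octave for second order, matching the numbers above.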
Answer to (1) would be great anyhow.
thanks
MJB
That seems to have the UP not a mirror of the DOWN ?
At some logic size, it pays to change from a state engine / wider ROM to a smaller up/down counter and a smaller ROM.
Since Chip had to pipeline anyway for speed, the cross-over may be lower than appears at first glance.
EDIT: Chip here. I updated the table in your post because I kept looking at it and making mistakes. Better to have this data appear only in its correct form.
Yes, it was. Mistake corrected.
Maybe that will come later, in P3 ?
.... maybe P3 will have 32 simpler COGs and a RISC-V ?
Having 30-bit Sinc3 accumulators in the smart pin is not that much more expensive than what the pruning would provide, anyway.
What else could use some improvement on this front, while we're here?