...
Everything seems to need to be full-sized, which is too bad. Any other ideas about reducing these acc sizes?
...
Reducing the maximum decimation rate and pruning bits would save a fair amount of logic.
Interesting.
The example prunes one bit per stage (they have many stages), and we have first stage as a Counter anyway, and (hopefully) can share one full-adder with the NCO mode, which only leaves one stage for possible savings ?
The integrators seem to have more scope for bit pruning than the combs. If R=256 max, then we could use the following bit widths possibly:
acc1 = 24, acc2 = 21, acc3 = 18
Pruning bits before each comb (only two now with Integrate and Dump) doesn't save any logic and wastes time. I think we just want a final truncation to 16 bits but with rounding.
IMHO this is a waste of silicon (and power). It is just trying to cover up a limitation of the hand laid silicon. There is 64x silicon waste. And then there is the risk. There is always risk in any change. This increase in logic will surely affect routing, and in turn, max speed.
There are far more main stream changes that could be added with way less risk.
But in reality, all we need is for the power to be fixed, and the bugs fixed.
Chip, please get back to the things that matter and not the things that are interesting. We are on the home stretch.
I would be wary of forcing a 16 bit ceiling, as external parts could easily be more, as Chip has mentioned.
Also the first stage is a counter ? and the NCO has a full adder that can be 'borrowed' ? , so those save nothing on bits (except maybe some connections ?) leaving just the acc3 as a possible saving.
I was looking at the do-minimum option, logic-wise.
Last night, I added a smart pin mode to do SINC3 integration with three 30-bit accumulators, an 11-bit reporting counter, and an externally-clocked mode.
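For readers following along, here is a minimal Python sketch of what a SINC3 integrate-and-difference scheme like the one described above might look like. It is a behavioural model only, not the smart pin logic: the function name, the modulo test standing in for the reporting counter, and doing the three differences inside the same function are all assumptions made for illustration.

```python
MASK30 = (1 << 30) - 1  # three 30-bit accumulators, as stated in the post


def sinc3_decimate(bits, rate):
    """Model a SINC3 (third-order CIC) decimator on a 0/1 bitstream.

    Three wrapping integrators run on every input bit. Every `rate` bits
    the third accumulator is sampled and differenced three times.
    """
    acc1 = acc2 = acc3 = 0
    d1 = d2 = d3 = 0  # previous values used by the three difference stages
    out = []
    for n, b in enumerate(bits, start=1):
        acc1 = (acc1 + b) & MASK30
        acc2 = (acc2 + acc1) & MASK30
        acc3 = (acc3 + acc2) & MASK30
        if n % rate == 0:  # stand-in for the reporting counter (assumption)
            c1 = (acc3 - d1) & MASK30
            d1 = acc3
            c2 = (c1 - d2) & MASK30
            d2 = c1
            c3 = (c2 - d3) & MASK30
            d3 = c2
            out.append(c3)
    return out


if __name__ == "__main__":
    # The first two results are start-up transients; later ones are settled.
    print(sinc3_decimate([1, 0] * (128 * 6), 256))
```

In this model a constant input of all ones settles to rate**3, so at a rate of 1024 the full-scale value is 2**30, which no longer fits in 30 bits and wraps to zero.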
Checking this external clock mode details :
This clock can be generated by any smart-pin in Frequency-out mode right ?
How can that be shared across many ADCs - is that done by using the adjacent pin MUX ? Can that clock-in be chosen as either polarity ?
Looking at Digital Microphones, they have a PDM serial bitstream, with 3~6MHz Clocks, and most look to have a L:R mux done on the clock.
ie they can alternate L/R/L/R, which would just need a change in clock polarity to support. One ADC is _/= and one ADC is =\_
Rounding will actually cause overflow if the filter is receiving all ones.
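To make the overflow concrete, here is a small Python sketch of rounding a wide result down to 16 bits. The function names and the saturating variant are illustrative assumptions, not something proposed in the thread.

```python
def round_to_bits(value, in_bits, out_bits=16):
    """Drop the low (in_bits - out_bits) bits, rounding half up."""
    shift = in_bits - out_bits  # assumes in_bits > out_bits
    half = 1 << (shift - 1)
    return (value + half) >> shift


def round_to_bits_saturating(value, in_bits, out_bits=16):
    """Same rounding, but clamp to the largest out_bits-wide value."""
    return min(round_to_bits(value, in_bits, out_bits), (1 << out_bits) - 1)


if __name__ == "__main__":
    full_scale = (1 << 24) - 1  # a 24-bit result with every bit set
    print(hex(round_to_bits(full_scale, 24)))             # needs 17 bits
    print(hex(round_to_bits_saturating(full_scale, 24)))  # clamped to 16 bits
```

The point is only that adding the rounding constant to a value near full scale carries out of the top bit, so either the output needs one extra bit or the result has to be clamped.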
No problem. Smart pins can select polarity of their A and B inputs, as well as their relative -4..+4 location.
Great.
I was just unsure how many of those rules applied to the selections of Clock+Data analog mode Filtering case.
How many pins could share the same clock, in this case ?
eg Can one clock reach-to supply 4 each side, for a maximum of 8 analog cells - half of which would sample on CLK _/= and the other half on =\_
I agree with Cluso99. This effort is nothing more than a self-indulgent waste of time and resources. Chip needs to move on with making the P2 a commercial reality, not just a fun diversion.
-Phil
So noted, Phil. Of course, I think what I'm doing is important. Getting double the ENOB from our ADCs in the normal amount of time is awesome. So is being able to have 8-bit samples on every clock. Having a 4-channel scope built into each cog is going to be a whole new frontier.
This work is 80% done. The SINC3 stuff just needs to be tested, which will take a compile with the test-chip board and a half-hour of verification. The four-channel scope is a deterministic endeavor, at this point. When the streamer gets redesigned, which is basically next, it will be greatly improved by this scope facility, a nice complement to the existing 4-channel high-speed DAC output.
Rounding could actually cause overflow if the filter is receiving all ones.
Rounding seems to be required when using bit pruning, otherwise the truncated lsb can be wrong. There is scope to shrink acc2 and acc3 as we start off with a 30-bit acc1 for R=1024 and the final result is 20-bit max. Those 10+ bits chopped off at the end in one chunk could be removed in smaller pieces after acc1, e.g. acc2 might be 27-bit and acc3 24-bit.
The links I posted earlier and repeated below tell us how many bits can be sliced off at each stage. Understanding of the noise gain calculations is needed, though, which I'm still acquiring. Any help would be appreciated.
In the past, we've done things by brute-forcing testing and that would be the quickest way again, probably. For instance, make acc2 27-bit and acc3 24-bit and see whether the 20-bit result is the same as when all the acc's are 30-bit. The msb's must be propagated to the next acc, i.e. acc1[29:3] + acc2 and acc2[26:3] + acc3, and rounding may be necessary at the end.
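The brute-force comparison suggested above could be tried in software first. The following Python sketch is one way to set it up, under stated assumptions: the function names, the random test bitstream, and taking the top 20 bits of each decimated result are all illustrative choices, and the MSB-only hand-off follows the acc1[29:3] + acc2 and acc2[26:3] + acc3 wiring described in the post.

```python
import random


def sinc3(bits, rate, w1=30, w2=30, w3=30):
    """SINC3 model with a separate width for each accumulator.

    When a later accumulator is narrower, only the MSBs of the previous
    one are added in. Returns the top 20 bits of each decimated result
    (assumes w3 >= 20).
    """
    m1, m2, m3 = (1 << w1) - 1, (1 << w2) - 1, (1 << w3) - 1
    s12, s23 = w1 - w2, w2 - w3  # LSBs dropped between stages
    acc1 = acc2 = acc3 = 0
    d1 = d2 = d3 = 0
    out = []
    for n, b in enumerate(bits, start=1):
        acc1 = (acc1 + b) & m1
        acc2 = (acc2 + (acc1 >> s12)) & m2
        acc3 = (acc3 + (acc2 >> s23)) & m3
        if n % rate == 0:
            c1 = (acc3 - d1) & m3
            d1 = acc3
            c2 = (c1 - d2) & m3
            d2 = c1
            c3 = (c2 - d3) & m3
            d3 = c2
            out.append(c3 >> (w3 - 20))
    return out


def max_mismatch(rate=1024, periods=16, seed=1):
    """Largest difference, in 20-bit LSBs, between 30/30/30 and 30/27/24."""
    rng = random.Random(seed)
    bits = [rng.getrandbits(1) for _ in range(rate * periods)]
    full = sinc3(bits, rate)
    pruned = sinc3(bits, rate, 30, 27, 24)
    return max(abs(a - b) for a, b in zip(full, pruned))


if __name__ == "__main__":
    print("largest mismatch in 20-bit LSBs:", max_mismatch())
```

A random bitstream is only a stand-in for real modulator output. Because the bits dropped ahead of an integrator are themselves integrated by the later stages, an exact match with the full-width version is not guaranteed, which is what the experiment is meant to find out.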
We all want to see the P2 become a reality that we can put to use, so I get the sentiment, but it is ultimately up to Chip to say when it is ready.
I've been facing some challenges with staff recently at work, as well as getting to spend some time with my youngest, a pack of paper, and BIG box of crayons.
Seemingly unrelated, but in the end seeing that we need to sometimes have fun and let the creativity flow...balls to the wall production isn't everything.
This thread has seen a lot of collaborative effort toward a goal, the kind of thing that attracted me to the P1 and Parallax from the beginning.
C.W.
The inspiration, knowledge, and development efforts by forum members, into the P2, have been incredible. No company could assemble such a team of people at their peak of inclination. This is a blossom of free will here.
And, no chip has ever been designed this way before, because of the secretive nature of the semiconductor industry.
A 4 channel "scope" does sound pretty cool and even if the analog bandwidth is limited to several MHz I can imagine it could come in very handy for audio and for other signal processing.
It will be another very worthy P2 feature from our combined well of never-ending ideas, but I just hope all its extra logic is not going to increase the size of the P2 beyond what easily fits the die and/or reduce its final maximum speed (perhaps down to below 250MHz, which would be a shame and kill any HDMI), and that not too much time is lost for all the other important bug fixes/improvements, such as how the power scales with the number of active/waiting COGs.
The question to ask is what happens if all these changes combined start to impact the speed significantly during respin? Do enhancements then start being removed or left in? I suspect the latter as it may be difficult to decide to yank things out at that point as it could be an agonizing choice.
Rounding seems to be required when using bit pruning, otherwise the truncated lsb can be wrong.
Certainly don't want to truncate the output for the sake of it. Just msbit align for easy 32-bit operations.
There is scope to shrink acc2 and acc3 as we start off with a 30-bit acc1 for R=1024 and the final result is 20-bit max. Those 10+ bits chopped off at the end in one chunk could be removed in smaller pieces after acc1, e.g. acc2 might be 27-bit and acc3 24-bit.
I like those numbers better.
In the past, we've done things by brute-forcing testing ...
A 4 channel "scope" does sound pretty cool and even if the analog bandwidth is limited to several MHz I can imagine it could come in very handy for audio and for other signal processing.
Yes, and of course there is the 250 MHz streamer in there for the Logic Analyzer capture of digital signals too.
Also in there I think, is a threshold adjustable by DAC, but I've not seen the speed spec testing results on that yet, or the range ?
And, no chip has ever been designed this way before, because of the secretive nature of the semiconductor industry.
Heater might raise the RISC-V about now. Although, that isn't a collective effort for one implementation - It's just a collection for pillaging. Like BSD was before Linux brought GPL into the mix.
Hey Phil,
Not sure why you post here anymore, you have already said you aren't going to use the P2 most likely. Almost all you do is post negative unhelpful stuff here in the P2 forums. Maybe stay away instead?
You keep telling Chip to stop doing what he wants to do.... like you don't want him to make the P2 he wants to make. It makes no sense.
The question to ask is what happens if all these changes combined start to impact the speed significantly during respin? Do enhancements then start being removed or left in? I suspect the latter as it may be difficult to decide to yank things out at that point as it could be an agonizing choice.
There will be simulations again. Chip will get to make such choices before the respin.
Phil's concerns are very rational. I understand what he's saying. There is just some low-hanging fruit that we ought to grab before the door shuts again.
The inspiration, knowledge, and development efforts by forum members, into the P2, have been incredible. No company could assemble such a team of people at their peak of inclination. This is a blossom of free will here.
I can't agree with that more. But how does that get us any closer to the finish line? I would submit that it does not. What that takes is a dispassionate assessment of the cost/benefit ratio of additional, continual futzing with stuff that's ultimately unimportant to Parallax's bottom line versus getting something out there that works and is salable. Chip, you really have to embrace the concept of compromise, else Parallax and their family-wage-earner employees are going to suffer in ways that you obviously can't imagine.
Hey Phil,
Not sure why you post here anymore, you have already said you aren't going to use the P2 most likely. Almost all you do is post negative unhelpful stuff here in the P2 forums. Maybe stay away instead?
You keep telling Chip to stop doing what he wants to do.... like you don't want him to make the P2 he wants to make. It makes no sense.
I would hope that all opinions are welcome, except maybe the ones that turn into personal attacks.
In the past, we've done things by brute-forcing testing ...
Dust off!
Evan, do you have a simulated bitstream that you could input into Sinc3? If so, could you try acc1/2/3=30/27/24-bit?
Comments
Jonathan
http://www.informit.com/articles/article.aspx?p=361985&seqNum=4
Ignore upscaling.
https://www.dsprelated.com/showcode/269.php
http://www.jks.com/cic/cic.html
Reducing the maximum decimation rate and pruning bits would save a fair amount of logic.
I've successfully used a second order IIR filter for the feedback filter of a SigmaDelta DAC. Maybe this would also work well here for the ADC decimation filter.
Instead of using a multiplication, I just shifted right by 6, this needs no logic cells in an FPGA.
Here is a 16bit version of such a simplified IIR filter in Verilog:
'bit' is the current bit of the ADC bitstream; it switches the input to the filter between MIN and MAX of the desired ADC resolution ($FFFF : 0000 for 16 bits).
You can easily add a third stage to get a third-order filter. The code would be the same as the second stage. A right shift by 5 or 7 may also be worth trying.
Andy
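As a rough behavioural model of the kind of filter described above (this is not the poster's Verilog), here is a Python sketch of two cascaded one-pole low-pass stages that use a right shift by 6 in place of a multiplication. The structure, the names, and keeping each stage's state scaled up by 2**shift so that small differences are not lost to the shift are all assumptions made for illustration.

```python
SHIFT = 6            # right shift used in place of a multiplier
FULL_SCALE = 0xFFFF  # input level for a 1 bit; a 0 bit feeds in 0


def iir2_filter(bits, shift=SHIFT):
    """Run a 0/1 bitstream through two shift-based one-pole stages.

    Each accumulator holds its stage output scaled by 2**shift, so the
    update acc += x - (acc >> shift) is y += (x - y) / 2**shift without
    losing the fractional part. Returns the last 16-bit output value.
    """
    acc1 = acc2 = 0
    for b in bits:
        x = FULL_SCALE if b else 0
        acc1 += x - (acc1 >> shift)   # first stage
        y1 = acc1 >> shift
        acc2 += y1 - (acc2 >> shift)  # second stage, same form
    return acc2 >> shift


if __name__ == "__main__":
    print(hex(iir2_filter([1] * 4000)))     # settles at full scale
    print(hex(iir2_filter([1, 0] * 2000)))  # settles near mid scale
```

A third stage would repeat the second-stage lines with a third accumulator, and the shift can be passed in to try 5 or 7.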
The drifting problem still has to be looked into. It's independent of this filter work. I'm waiting for a real chip to start on this.
That actually makes sense for the combs at least. Adding gains an extra bit each operation. So I guess subtracting can lose a bit each operation too.
The integrators are more of a mystery.
Please explain more. I want to understand.