...Still, it does not work with the old parser (no connection in PropTool), yet when compiling the same sources with the new parser it just works!
Hmm, given that $readmemh() is much more 'sensitive', I wonder if it is still not quite loading the right info into the right places?
The bit streams will differ significantly, judging from the reports above, but maybe there is some easy way to compare the memory areas of two bit streams?
It would be interesting to nudge that 160MHz up and down, and see how that reflects on the results.
Some designs may prefer fewer MHz if that gives more spare logic.
Good idea there, jmg. I was actually thinking about this myself on the way to work today: maybe running a simulation and looking into what was really there and why it doesn't work, or, like you said, extracting the BlockRAM content from the actual bit file. I know there are tools to update BlockRAM content without recompiling the whole project. Thanks for the pointer. I'll look into that.
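As a first step before digging into the bit file itself, the memory images can be compared at the text level. The sketch below is only an illustration: it assumes both images are available as $readmemh-style text (hex words, optional `@address` markers, `//` and `/* */` comments), and the function names are made up for this example. It does not handle `x`/`z` digits and it does not read a bit stream.

```python
import re

def parse_readmemh(text):
    """Parse $readmemh-style text into a {word_address: value} dict."""
    # Remove // line comments and /* ... */ block comments.
    text = re.sub(r"//[^\n]*", "", text)
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.S)
    memory = {}
    address = 0
    for token in text.split():
        if token.startswith("@"):
            # "@hex" sets the address of the next word.
            address = int(token[1:], 16)
        else:
            # Underscores are allowed as digit separators.
            memory[address] = int(token.replace("_", ""), 16)
            address += 1
    return memory

def diff_memories(a, b):
    """List (address, value_a, value_b) for each differing word; None means not loaded."""
    return [(addr, a.get(addr), b.get(addr))
            for addr in sorted(set(a) | set(b))
            if a.get(addr) != b.get(addr)]

# Hypothetical sample data, not taken from the real project files.
expected = parse_readmemh("@0 DEADBEEF 00000001 // boot\n@10 0000_00FF")
observed = parse_readmemh("DEADBEEF 00000002\n@10 000000FF\n@11 12345678")
for addr, want, got in diff_memories(expected, observed):
    print(f"{addr:04X}: expected {want}, observed {got}")
```

The same diff could be run on a memory dump from simulation against the source hex file, which would show whether words end up at the wrong addresses.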
Another interesting notion (regardless of parser) is that it uses 32 of the 36 BlockRAMs in the 1600E, and then starts eating logic resources to implement the missing RAMs instead of using the last 4 BlockRAMs.
After looking at the result of the compilation, it actually uses BlockRAM only for the HubRAM, and implements all CogRAM in logic, as LutRAM.
This pushes the resource usage from 60% to 97%, so it still fits but gets... quite troublesome.
As one CogRAM (2K) fits into a single BlockRAM (2304 bytes), one would assume that in this case the first 4 CogRAMs would have been mapped to the 4 unused BlockRAMs and the remaining 4 would map into LutRAM. My guess is that this decision is made ONCE for the generate statement.
If we were to unroll the generate statement, this might be fixed.
In my case, this would probably use less logic, build faster, and reach a higher FMAX.
My guess is that it would land about halfway between 60% and 97%, so around 80%.
I'll try that! This codebase is a good example for testing lots of stuff.
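The "halfway" guess can be written out as a small calculation. This is only a rough sketch: it assumes that the jump from 60% to 97% is caused entirely by the 8 CogRAMs sitting in LutRAM and that each one costs the same share, which the tools do not guarantee. The function name is made up for illustration.

```python
def estimate_utilisation(base_pct, all_lutram_pct, total_cogs, cogs_in_lutram):
    """Linear estimate: every cog kept in LutRAM costs an equal share of the increase."""
    per_cog_pct = (all_lutram_pct - base_pct) / total_cogs
    return base_pct + per_cog_pct * cogs_in_lutram

# 4 of the 8 CogRAMs moved into the 4 spare BlockRAMs, 4 left in LutRAM.
print(estimate_utilisation(60, 97, 8, 4))  # 78.5
```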
Sometime in the future I will actually start writing some nice stuff for the Propeller!
I like assembler and also like SPIN, as little as I've seen of it :-)
But to make this interesting I need interfaces to screen, keyboard, joystick, SD card, MIDI and sound, which are waiting for me at home.
I'd check the speed effect first, but you may be able to use Cluso's Parameter scheme to direct some COGs each way.
Forcing all 8 into BlockRAM still gives 57344 bytes of HUB? And it would free up a LOT of logic.
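The 57344 figure follows from the BlockRAM count. A small sketch of the arithmetic, assuming 36 BlockRAMs on the 1600E (as stated above), 2048 usable data bytes per BlockRAM (2304 with parity bits), and one BlockRAM per 512-long CogRAM. It ignores port-width and dual-port constraints, and the names are illustrative only.

```python
TOTAL_BLOCKRAMS = 36          # 1600E, per the thread
BLOCKRAM_DATA_BYTES = 2048    # 18 Kbit block without the parity bits
COG_RAM_BYTES = 512 * 4       # 512 longs per cog

def hub_bytes_available(cogs_in_blockram):
    """Bytes of BlockRAM left for HUB after the given number of cogs take one block each."""
    blocks_per_cog = COG_RAM_BYTES // BLOCKRAM_DATA_BYTES
    blocks_left = TOTAL_BLOCKRAMS - cogs_in_blockram * blocks_per_cog
    return blocks_left * BLOCKRAM_DATA_BYTES

print(hub_bytes_available(8))  # 57344: all 8 cogs in BlockRAM
print(hub_bytes_available(4))  # 65536: 4 cogs in BlockRAM, full 64KB hub still fits
```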
I tried unrolling the generate statement (COG0-COG7), but it does not solve the RAM inference problem.
I also tried inline RAM inference directives (BLOCK for the first 4 and then DISTRIBUTED for the next 4), but it doesn't seem possible to give different inference directives to different instances of the same object.
The only way would probably be to have 2 different COG-mem implementations, each using a different memory inference method.
A better way is probably to reduce the HUB memory to the standard size for the 1600E.
After doing some more experiments with this code, fixed and built for the Spartan-3E, it does not seem to run stably on the Spartan-3E even at the default 160/80 MHz. I'm not sure why, but when lowering the speed to 120/60 MHz everything seems to work much better. I had problems with both Serial and VGA, and probably other protocols, at the default speed.
FYI, I've made some simple mods to the combined Altera/Xilinx GitHub project to make it work on the Digilent Nexys4, which is based on an Artix7-100. The changes were pretty much limited to:
1) Adjustments to ifdefs and timing loops, since the Nexys4 uses a 100 MHz external oscillator rather than 50 MHz.
2) Revised .UCF constraints to provide sane pinouts for the Nexys4. IO pins are assigned to various onboard switches and LEDs.
3) It can be programmed directly from the Propeller tool, provided that the option to use RTS rather than DTR is set in the tool, as the onboard UART only provides RTS/CTS to the Artix7.
I submitted this as a pull request to the base project; hopefully it will be accepted soon. For my next trick I'll probably try to integrate the full ROM including sine tables, etc., since the Artix7 has boatloads of available block RAM compared to the older chips. Total utilization of the current design on the Artix7-100 is only about 15% of LUTs, 4% of FFs, about 12% of block RAM, and 37% of BUFGs for clock distribution.
Hi overclock,
if you haven't tried this already: I was able to steer cog/hub memory allocation using the "ram_style" attribute, by following the instructions that I found here:
http://www.dilloneng.com/inferring-block-ram-vs-distributed-ram-in-xst-and-precision.html
I had the opposite problem: I couldn't get cog RAM to move onto LUTs and free up room for more hub.
This did the trick. I finally managed to build for the LX9 (LogiPi board from Kickstarter), with 3 COGs and the full 64KB hub, with the upper 4KB preloaded.
BTW, if anyone has that board, I can post the modified UCF file.
Has anyone incorporated the original full ROM (font + sine + other?) into this code? Because it is RAM, it can actually be overwritten if not used, so it would really come "for free".
If no one has done it yet, how were the current files created? Script/program/source?
http://forums.parallax.com/showthread.php/156866-Question-for-Chip-re-ROM-code
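For anyone who wants to script this, here is a minimal sketch of turning a binary ROM image into $readmemh text. It is not how the existing files were made (that is exactly the open question above). The layout is an assumption: one little-endian 32-bit long per line, with the `@` address counted in longs. The real project files may be organised differently, and the function name is made up.

```python
import struct

def bytes_to_readmemh(image, start_address=0):
    """Render a byte image as $readmemh text, one little-endian 32-bit word per line."""
    image = bytes(image)
    if len(image) % 4:
        # Zero-pad the tail so the image is a whole number of longs.
        image += bytes(4 - len(image) % 4)
    # The @address is in memory words (array elements), not bytes.
    lines = [f"@{start_address:X}"]
    for (word,) in struct.iter_unpack("<I", image):
        lines.append(f"{word:08X}")
    return "\n".join(lines) + "\n"

# Hypothetical 5-byte sample; a real run would read the ROM dump from a file.
print(bytes_to_readmemh(bytes([0x78, 0x56, 0x34, 0x12, 0xFF]), start_address=0x3000))
```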
I'll probably integrate this into GitHub later today. It needs a few small changes because your version has tabs (not spaces) and you started from a version of Magnus' code that he reverted. Thanks Andy!
===Jac