Note that TESTP needs one fewer clock than reading the INA/INB register.
And at 40 MHz sysclock, for the A9 FPGA at least, the turnaround drops by another clock in all cases.
Chip,
It still bothers me that we are seeing, even with both registered I/O and also the tight 1 ns constraints you tried on v31a, a difference between 40 MHz and 80 MHz sysclock. It should have locked in either a 4-clock or a 3-clock I/O turnaround, no matter the speed of the sysclock.
evan,
The vertical bar is in the middle of the clock. The out occurs 1/2 clock earlier and the input sample occurs 1/2 clock later. But there is some indeterminate time for the signal to exit the FPGA to the pin, and some indeterminate time for the pin to return into the FPGA. If these are too slow, then you may see an extra clock between the two. But in reality, because the clock strobing these occurs nearest the pin in the real silicon, these delays should be negligible, so the clock in which they occur should be the same.
Chip,
I proved the pin slew rate and input buffering weren't a problem in the FPGA. The turnaround time at the FPGA pins is a mere 2.5 ns. This fits into a consistent 1-clock in all measurements.
The problem is in the internal routing to/from the logic blob.
Smartpins are affected as badly as the Cogs are, which leads me to conclude they are not even slightly geolocated with their respective pins.
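To put rough numbers on why a fixed routing delay would show up as a sysclock-dependent clock count, here is a minimal sketch. It assumes a deliberately simple model: the output is launched on one clock edge and the input register captures it on the first later edge that falls after the round-trip delay. The 2.5 ns figure is the pin measurement quoted above; the 15 ns routed delay and the function name `turnaround_clocks` are hypothetical, chosen only for illustration.

```python
import math

def turnaround_clocks(round_trip_ns: float, sysclock_mhz: float) -> int:
    """Clock periods from the launching edge to the first edge that can sample the result.

    Simplified model: ignores setup/hold margins and clock skew.
    """
    period_ns = 1000.0 / sysclock_mhz
    return max(1, math.ceil(round_trip_ns / period_ns))

PIN_ONLY_NS = 2.5   # turnaround measured at the FPGA pins (from the thread)
ROUTED_NS = 15.0    # hypothetical delay including internal routing (assumption)

for mhz in (40, 80):
    print(f"{mhz} MHz: pins only = {turnaround_clocks(PIN_ONLY_NS, mhz)} clock(s), "
          f"with routing = {turnaround_clocks(ROUTED_NS, mhz)} clock(s)")
```

In this model the pin delay alone fits in one clock at both speeds, while a routed delay longer than one 80 MHz period (12.5 ns) but shorter than one 40 MHz period (25 ns) costs an extra clock only at 80 MHz, which is the kind of one-clock difference being reported.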
I see what you are getting at. Meeting timing only guarantees that setup and hold time requirements are met, so the actual path delays could be all over the place. On Semi said that we could do something called 'skew limiting' on a group of signals. I will ask them about this now. They are just about to push through the final netlist.
I'm still worried about why the sysclock affects this timing and what that means for final silicon.
Agreed.
We have the hope that the silicon will fix this, which seems to be an artifact of the FPGA release actually running above the design speed, i.e. that 80 MHz is actually overclocked.
This effect may be the first sign of margin being reached in an overclocked P2, so it gives a useful test point on final silicon.
I figured out how to make nice looking timing diagrams using monospaced fonts. The trick is to use underscores "_" on the line above for the high state:
Well, it looks better when you can get rid of extra line spacing.
I use this approach, which makes it clear which is hi, even on a long stable signal, and only needs one line, so cannot misalign. I've also seen ~~~ used to show tristate.
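For anyone who wants to generate the two-line underscore style mechanically, here is a minimal sketch of the trick described above: underscores on the upper line mark the high state, underscores on the lower line mark the low state. The helper name `waveform` and its `width` parameter are made up for this example.

```python
def waveform(bits: str, width: int = 2) -> str:
    """Render a string of '0'/'1' samples as a two-line ASCII timing diagram.

    High samples are drawn as underscores on the upper line,
    low samples as underscores on the lower line.
    """
    upper = "".join(("_" if b == "1" else " ") * width for b in bits)
    lower = "".join(("_" if b == "0" else " ") * width for b in bits)
    return upper + "\n" + lower

# Must be viewed in a monospaced font for the edges to line up.
print(waveform("0011100110"))
```

It only draws levels, not vertical edges, so it relies on tight line spacing to read well, as noted above.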
I tried to boot TAQOZ from the new FPGA images onto a BeMicro A2 but it doesn't have the CORDIC, which TAQOZ uses - argghhh! I did boot an older version of taqoz32k from an SD though. But this CORDIC thing is only a problem with some FPGA limitations and won't affect the silicon which this image is destined for. Perhaps I could give Chip a non-CORDIC version of the ROM so he could compile some DE0 and A2 images for us.
Oh, TAQOZ does boot, it just can't understand you. The CORDIC is used for the UMOD operation that stores the next character in a single word buffer, but it is stuck in the first position, so it will still react to control characters and possibly single-character words.
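A rough illustration of why a stuck write position would still let single-character words through. This is not TAQOZ code: the buffer size, the function name `type_word`, and the assumption that a UMOD without CORDIC hardware simply yields 0 are all hypothetical, used only to show the effect described above.

```python
def type_word(chars: str, size: int = 8, umod=lambda a, b: a % b) -> str:
    """Store typed characters in a word buffer, wrapping the write index with umod."""
    buf = [" "] * size
    idx = 0
    for ch in chars:
        buf[idx] = ch
        idx = umod(idx + 1, size)  # next write position
    return "".join(buf).rstrip()

def broken_umod(a: int, b: int) -> int:
    # Assumption for illustration: without the CORDIC the result is always 0.
    return 0

print(type_word("WORDS"))                     # working UMOD keeps the whole word
print(type_word("WORDS", umod=broken_umod))   # every character lands in position 0
print(type_word("X", umod=broken_umod))       # a one-character word survives
```

With the index pinned at the first position, each new character overwrites the previous one, so only the last character typed is ever in the buffer.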
Have we reached the final ROM and instruction set, or is there still a chance that something will change? If so, when will we be past that point?
There is always a non-zero chance a change is required, even after first engineering samples.
Some vendors need multiple die revisions to get to a shippable level of device errata.
Hey Chip, if I do up a version of the ROM just for non-CORDIC FPGAs such as the DE0 and A2, do you think you could compile a new image for those? This will still allow us to test it just like the real thing, and since TAQOZ is compiled after the booters and monitor, it doesn't change any of Ray's entry points for external calls.
FYI
Quartus allows you to do a ROM change without doing a full compile.
"Processing" > "Update Memory Initialization File"
then run the "Assembler" again to build a new programmer file.
IIRC the new ROM filename must be the same as the previous ROM filename.
Chip has the ROM configured as a sequential ROM, the same as OnSemi's. Does that work the same way? Anyway, the smaller FPGAs compile in a lot less time.
v32i seems to be working on my DE2-115; at least, I'm able to get into TAQOZ and the P2-MONITOR.
The only issue I had was some confusion at first as to whether autobaud was working, since there's no feedback after the "> " -- you have to type either ^D or ESC to get something printed. That's a relatively minor point, but might bite some newbies.
A ? command for the monitor to show the available commands would be handy, but perhaps too much room?
It boots and gets into TAQOZ on my Prop123-A9 board as well. I haven't tried the monitor.
Comments
Chip,
It still bothers me that we are seeing, even with both registered I/O and also the tight 1 ns constraints you tried on v31a, a difference between 40 MHz and 80 MHz sysclock. It should have locked in either a 4-clock or a 3-clock I/O turnaround, no matter the speed of the sysclock.
See https://forums.parallax.com/discussion/comment/1430499/#Comment_1430499
On Semi has been able to limit the clock skew among all pins to 300ps, which is pretty tight.
It contains the final ROM code with SD boot and TAQOZ Forth.
Please let me know if you discover any problems.
Both Nano images won't boot TAQOZ but will boot the monitor.
The P123-A7 image shows no cog LEDs and is unresponsive.
I suspect the Nano problem is that the ROM doesn't fit, plus no CORDIC.
Edit: Ah. CV-A2 boots but fails from there.
The DE0-Nano has only 32KB and no CORDIC.
The BeMicro-A2 has 128KB and no CORDIC.
The Prop123-A7 should be viable, though. I will look into this.
As far as we know, this is the final everything in v32i.
Thanks for checking these, Brian.
I'm recompiling the Prop123-A7 image now.
I think v32j is just the PNut version number (Chip created a special version that could handle more symbols for compiling the boot ROM, IIRC).