Brian found some trouble with the spinning Fozzy demo. Our 120MHz overclocking is too aggressive.
I'm recompiling everything now for 80MHz operation. This is below the Fmax of most compiles, but still 8MHz above the Fmax of the 16-cog compiles, which will be fine. Running 11% above Fmax is no problem, but asking for 67% more is just too much.
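Here is a quick check of those percentages. They are consistent with the 16-cog compiles having an Fmax of about 72MHz, which is 80MHz minus the 8MHz margin. That 72MHz figure is inferred here and is not stated in the post.

```python
# Overclock margin relative to Fmax. The 72MHz Fmax for the 16-cog compiles
# is inferred from "80MHz is 8MHz above" and is an assumption.
FMAX_16COG_MHZ = 80 - 8

def percent_over_fmax(clock_mhz, fmax_mhz):
    """Return how far clock_mhz is above fmax_mhz, as a percentage."""
    return (clock_mhz / fmax_mhz - 1.0) * 100.0

for clock in (80, 120):
    print(f"{clock}MHz is {percent_over_fmax(clock, FMAX_16COG_MHZ):.0f}% above Fmax")
```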
This new version will be out tomorrow and it will be v21a. So, back to 80MHz for reliable FPGA testing.
I'm not clear why you don't just make the clock more programmable, to get closer to what a real P2 will do?
A full PLL config (VCO/M = PFD = Xtal/N) is not possible, but you should be able to do (480MHz PLL VCO)/N, with N being, say, 5 bits?
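To illustrate the divider suggestion, here is a small Python sketch. It lists the frequencies a fixed 480MHz VCO would give when divided by an integer N. It assumes a 5-bit divider where N runs from 1 to 31; that field width and range are assumptions, not a spec.

```python
# Clock frequencies reachable from a fixed 480MHz PLL VCO divided by an
# integer N. Assumption: a 5-bit divider field, with N running 1..31.
VCO_MHZ = 480
DIVIDER_BITS = 5

def divider_table(vco_mhz, bits=DIVIDER_BITS):
    """Map each divider N (1 .. 2**bits - 1) to the resulting clock in MHz."""
    return {n: vco_mhz / n for n in range(1, 2 ** bits)}

table = divider_table(VCO_MHZ)

for target in (120, 96, 84, 80):
    # Dividers that hit the target exactly (empty list if none do).
    hits = [n for n, mhz in table.items() if mhz == target]
    print(f"{target}MHz -> N = {hits}")
```

Under these assumptions, 120MHz, 96MHz and 80MHz fall out at N = 4, 5 and 6. 84MHz, which is also mentioned in this thread, is not reachable with an integer divide.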
Will the pad ring test chip allow full testing of the (VCO/M = PFD = Xtal/N) config into the FPGA?
Has a breakout board for the FPGA-TestChip design been done?
Jmg, for our purposes now, I don't want to try to replicate what the actual silicon will do, clock-wise. What we have works like this:
CLKSET #0 - set 20MHz (internal RC mode, boot-up)
CLKSET #255 - set 80MHz (fast PLL mode, user code)
There are in-betweens, but they are jittery. For testing purposes, using just these two modes is sufficient.
But being this limited means 96MHz, which is a likely USB frequency, cannot be dialed up.
I think some code has also relied on the higher-than-80MHz clock speed?
The still somewhat undefined issues around USB code, and the other areas above, suggest that being able to readily vary the clock speed is an important part of testing.
I'm getting nothing running. PX is happy as usual. PNut is reporting a valid Prop2 as per Brian's observation.
I think it's probably a change in how PNut loads the obj into the Prop2. I'm using Dave's loadp2 program in place of PNut because of PNut's handshake glitch under Wine.
USB on P2v21a @80MHz is much better. The initial v18->v19+ transition code that failed miserably on P2v19-v21 @120MHz is working on P2v21a @80MHz for full-speed, but for some reason not low-speed. I'm not too concerned about that yet, as low-speed has usually been a lot less fussy when trying to get the timing dialed in.
It's going to be a bummer to lose the 120MHz clock, as that made full-speed a comfortable fit. It may be a while yet before I can verify with certainty whether or not the funky behavior I was seeing @120MHz is completely gone, but so far it's looking pretty positive.
Maybe the low-speed failure is just some opcode changes that were missed in the focus on FS USB?
It would be real nice to have both LS and FS all working, even if not as cleanly as at higher MHz.
Sounding good. Is it worth testing on 84MHz and 96MHz builds? How many more MHz are needed to make that 'comfortable fit'?
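One rough way to frame the 'comfortable fit' question is the number of sysclocks per USB bit time. The sketch below uses the USB 2.0 bit rates: 12 Mbit/s for full-speed and 1.5 Mbit/s for low-speed. The thread doesn't say how many clocks per bit the USB code actually needs, so this only shows the raw budget.

```python
# System clocks available per USB bit time at candidate FPGA clock rates.
# Bit rates are from the USB 2.0 spec: full-speed 12 Mbit/s, low-speed 1.5 Mbit/s.
FULL_SPEED_BPS = 12_000_000
LOW_SPEED_BPS = 1_500_000

def clocks_per_bit(sysclk_hz, bit_rate_bps):
    """Return the (possibly fractional) number of sysclocks per USB bit."""
    return sysclk_hz / bit_rate_bps

for mhz in (80, 84, 96, 120):
    hz = mhz * 1_000_000
    fs = clocks_per_bit(hz, FULL_SPEED_BPS)
    ls = clocks_per_bit(hz, LOW_SPEED_BPS)
    print(f"{mhz:>3}MHz: FS {fs:6.2f} clk/bit, LS {ls:6.2f} clk/bit")
```

At full-speed, 84MHz, 96MHz and 120MHz give exactly 7, 8 and 10 clocks per bit. 80MHz leaves a fractional 6.67, which may be part of why the full-speed timing felt tight there.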
A few minor Verilog changes were made to accommodate synthesis. These changes should not result in any functional differences.
The Prop123_A9_Prop2_v21b_64smartpins.rbf image contains these differences. Could some of you please run this image and see if you notice any difference/problem? It seems fine to me, but I would feel more confident if you could check it, too. Thanks.
This new PNut_v21b.exe properly reports FPGA frequency when doing a Ctrl-G.
Are you saying that there is no difference between 21a and 21b for versions other than the "64smartpins" version?
The only differences are the new "64smartpins" image and PNut_v21b.exe, which properly shows 80MHz, instead of 120MHz.
Thanks. I grabbed the new .zip file just for good measure. At least I can run the latest PNut!
Found the low-speed issue, and given my lack of skill/tools in the hardware debugging department, I'm *almost* certain it's related to the POLLCTx instruction. The attached test code repeatedly calls a poll-based version of WAITX, which I use in the USB host cog when I need to do a timed wait that might overlap with the interrupt-driven Start-of-Frame packet that must be transmitted on the USB every millisecond, +/- 0.0005ms.
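For scale, the SOF requirement quoted above (1ms +/- 0.0005ms) can be converted to sysclocks. This is a minimal sketch using integer division, so values are rounded down.

```python
# USB Start-of-Frame budget in sysclocks, from the figures quoted above:
# one SOF every 1ms, with a tolerance of +/- 0.0005ms (500ns).
def sof_budget(sysclk_hz):
    """Return (clocks per 1ms frame, +/- tolerance in clocks), rounded down."""
    frame_clocks = sysclk_hz // 1_000            # 1ms = 1/1,000 s
    tolerance_clocks = sysclk_hz // 2_000_000    # 500ns = 1/2,000,000 s
    return frame_clocks, tolerance_clocks

for mhz in (80, 120):
    frame, tol = sof_budget(mhz * 1_000_000)
    print(f"{mhz}MHz: {frame} clocks per frame, +/-{tol} clocks tolerance")
```

At 80MHz that is 80,000 clocks per frame with a window of about +/-40 clocks. At 120MHz it is 120,000 +/- 60 clocks. A timed wait that overshoots by more than that would push the SOF out of tolerance.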
Setup: Prop123-A9 board, Parallax Serial Terminal, Windows 10, and the current P2v21b FPGA image @80MHz. The test code does not use interrupts.
The poll_waitx routine is called repeatedly with a one-second delay value. When using the POLLCTx instruction, the first call delays for the expected one second, but the second call spins until the ~54 second timer wrap-around occurs. It then successfully completes five one-second delays, but wraps around again on the sixth attempt, and this pattern keeps repeating.
I went back to the P2v21 @120MHz image and got similar results. The difference was four successful calls before the first wrap-around stall, followed by a stall on every sixth call thereafter, as with the 80MHz test.
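The ~54 second stall matches one full wrap of the 32-bit system counter at 80MHz. A missed CT1 event only fires again when the counter comes back around to the same target. Here is a quick check:

```python
# Time for a free-running 32-bit system counter to wrap at a given sysclock.
def wrap_seconds(sysclk_hz, bits=32):
    """Seconds until a free-running counter of `bits` width wraps around."""
    return (1 << bits) / sysclk_hz

for mhz in (80, 120):
    print(f"{mhz}MHz: counter wraps every {wrap_seconds(mhz * 1_000_000):.1f} s")
```

At 120MHz the same kind of stall would last about 36 seconds.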
When the code is changed to use the J[N]CTx instruction, it works as expected.
Replacing POLLCTx with a J[N]CTx variant appears to have me back on par with the USB code I had running on P2v18.
I hope somebody else is able to replicate this; I'd hate to think it might be my Prop123-A9 board.
The CT1 event flag was being sampled into C on the first clock of the POLLCT1 instruction and then it was being cleared on the second clock. Once in a while, it would sample the event flag on the same clock that it was getting set on, returning C=0, and then clear the newly-set event flag on the next clock. This caused the event to get 'disappeared', so that it would not be registered again (and then only maybe) until the 32-bit counter wrapped around again and hit the target.
The JCT1 instruction didn't have this problem, because it both sampled and cleared the CT1 event flag on the first clock of the instruction.
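The race described above can be reproduced with a small cycle-level model in Python. This is a sketch under my own assumptions, not something derived from the Verilog. The event flag is a register sampled at the start of a clock. A set and a clear on the same clock resolve in favor of the set. Polls repeat on a hypothetical 6-clock loop.

With the clear delayed to the second clock, exactly one loop phase out of six loses the event (until the counter wraps), which fits the 'once in a while' symptom. With sample and clear on the same clock, as in JCT1 and the POLLx instructions after the fix, no phase loses it.

```python
# Cycle-level model of the POLLCT1 race (a sketch, not the actual Verilog).
# Assumptions: the event flag is a register sampled at the start of a clock,
# and when a set and a clear land on the same clock, the set wins.

def poll_until_event(set_clock, first_poll, period, sample_and_clear_together,
                     horizon=1000):
    """Return the clock at which a poll sees the CT1 event, or None if lost.

    set_clock: clock on which the CT1 event sets the flag.
    first_poll: clock of the first poll; later polls follow every `period` clocks.
    sample_and_clear_together: True models JCT1 (and POLLCT1 after the fix);
        False models the old POLLCT1, which cleared on its second clock.
    """
    flag = False
    clear_next_clock = False
    for clock in range(horizon):
        set_now = clock == set_clock
        clear_now = clear_next_clock
        clear_next_clock = False
        if clock >= first_poll and (clock - first_poll) % period == 0:
            if flag:                      # C=1: the event was seen
                return clock
            if sample_and_clear_together:
                clear_now = True          # sample and clear on the first clock
            else:
                clear_next_clock = True   # old behavior: clear on the second clock
        # End of clock: a set has priority over a simultaneous clear.
        flag = set_now or (flag and not clear_now)
    return None


# Try every phase of a hypothetical 6-clock polling loop against an event on clock 100.
PERIOD = 6
for together in (False, True):
    lost = [p for p in range(PERIOD)
            if poll_until_event(100, p, PERIOD, together) is None]
    label = "sample+clear same clock" if together else "clear on 2nd clock"
    print(f"{label}: lost at phases {lost}")
```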
This program demonstrates the problem:
dat             org
                getct   x                       'get initial counter
.next           addct1  x,##80_000_000/10       'set ct1 for 1/10s from now
                getrnd  y                       'wait some random amount of time
                and     y,#7                    '..to mix up the phase relationship between
                waitx   y                       '..the ct1 event and the pollct1 instruction
.loop           pollct1 wc                      'this will hang sometimes
        if_nc   jmp     #.loop                  '..until the counter wraps
                drvnot  #32                     'ct1 fired, toggle led
                jmp     #.next
x               long    0
y               long    0
I've got all the POLLx instructions now sampling and clearing on the first clock of the instruction.
Here is a Prop123-A9 image with 8 cogs, 512KB hub, and 64 smart pins - just like the silicon is shaping up to be. This runs at 80MHz:
It is a GOOD thing you discovered this. It would have been a tragedy had this slipped through. I think your code will start working reliably now. Thanks a bunch!
Thanks, Chip, I'll give the P2v22 image a go asap!
Super!
That bug crept in when I optimized the way the ALU-result bus worked. I captured every Z/C/result I could, in the first clock, so that there would be less to multiplex in the much-more-critical second clock. That introduced the phase error.
You living with that bug was demoralizing.
Don't be demoralized, Chip! Most of what you do is beyond my comprehension anyway, and we all know that sooner or later, some kind of s**t happens. I'm super happy just to be minimally involved in something as unique and challenging as this P2 project, which you and Parallax have taken on and are allowing us all to participate in :-D
And by the way -- woohoo!!!
The P2v18->v19 update code looks to be as solid as the v18 code was, whether it uses POLLCTx or J[N]CTx. In fact, I think your fix might have had wider implications. Yesterday, after I posted my test code, I was working on a post-v19 version of the code that I had updated to use only the J[N]CTx variant. That code would enumerate the mouse/keyboard, but would hang when transitioning to the timer-controlled device polling routine. Since it was an offshoot of the v19 code, I assumed at the time that I still had some non-timer-related problem in the polling routine. But after I finished testing the v19 code, I fired this version up and it, too, worked just fine!
If your fix did have wider implications, and if you have the time, might it be possible to revisit the land of 120MHz?
I've got the hub slot frequency increased to 1/cogs, instead of the old 1/16.
This means hub instructions don't have to wait so long to execute (COGINIT, QROTATE, etc.).
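Here is a rough way to see the benefit. Suppose each cog gets one hub slot every P clocks in strict rotation. A cog that just missed its slot then waits P - 1 clocks for the next one. This sketch ignores instruction overhead and assumes strict rotation, so it is an approximation, not a timing spec.

```python
# Approximate worst-case wait for a hub slot, assuming each cog gets one slot
# every `slot_period` clocks in strict rotation (instruction overhead ignored).
def worst_case_slot_wait(slot_period):
    """Clocks a cog waits for its next slot if it just missed its own."""
    return slot_period - 1

OLD_SLOT_PERIOD = 16  # old scheme: 1/16, regardless of cog count

for cogs in (1, 2, 4, 8, 16):
    old = worst_case_slot_wait(OLD_SLOT_PERIOD)
    new = worst_case_slot_wait(cogs)   # new scheme: 1/cogs
    print(f"{cogs:>2} cogs: old {old:>2} clocks, new {new:>2} clocks")
```

For the 8-cog configuration, the worst-case slot wait drops from 15 clocks to 7.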
As some of you may remember, there was a 2-bit CORDIC command-in-progress counter used to avoid hang-ups on GETQX/Y. That has been increased to 5 bits to accommodate 1- and 2-cog variants, where a cog can issue CORDIC instructions back-to-back. In an 8-cog variant, a cog can issue one every 8 clocks. You could pipeline your code and get huge throughput for batched operations.
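To make the pipelining point concrete, suppose a cog may issue a CORDIC command every I clocks (8 for an 8-cog variant, per the post) and each result takes L clocks to come back. Issuing a whole batch before collecting results then finishes in about (n - 1) * I + L clocks. The alternative is to wait for each result before issuing the next command. The latency L = 20 below is a made-up placeholder, not the FPGA's actual CORDIC latency. GETQX/GETQY collection time is also ignored.

```python
import math

# Placeholder CORDIC latency in clocks. This is NOT the real FPGA figure.
CORDIC_LATENCY = 20

def batch_clocks(n_ops, issue_interval, latency=CORDIC_LATENCY, pipelined=True):
    """Approximate clocks from the first issue until the last result is ready.

    pipelined=True: issue a command every `issue_interval` clocks, so the last
    command goes out at (n_ops - 1) * issue_interval and returns `latency` later.
    pipelined=False: wait for each result before issuing the next; the next
    issue must also wait for the cog's next slot, so each round trip is
    rounded up to a multiple of `issue_interval`.
    """
    if n_ops < 1:
        raise ValueError("n_ops must be at least 1")
    if pipelined:
        return (n_ops - 1) * issue_interval + latency
    round_trip = math.ceil(latency / issue_interval) * issue_interval
    return (n_ops - 1) * round_trip + latency

for interval in (1, 2, 8):  # roughly the 1-, 2- and 8-cog variants
    fast = batch_clocks(100, interval)
    slow = batch_clocks(100, interval, pipelined=False)
    print(f"every {interval} clk: pipelined {fast}, one-at-a-time {slow}")
```

With those placeholder numbers on an 8-cog variant, a batch of 100 operations takes about 812 clocks pipelined versus 2,396 issued one at a time.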
Also, I moved the debug interrupt vectors around a little so that they go from $FFFFC down, instead of from $FFFC0..$FFFFC. This tidies things up a bit.
Once I test this in every cog-count configuration and know it's okay, I'll make a new FPGA version and update the docs.
Yes, the pad ring test chip will allow the new PLL to be fully exercised. And it will use the same PCB as last time, so there's nothing new to make.
Addit: I found some comments about clock speed and USB code here:
http://forums.parallax.com/discussion/comment/1416453/#Comment_1416453
Good to know the PLL can be fully tested.
All v21a images load and run OK @80MHz.
BTW, PNut still shows 120MHz in the information dialog, though.
EDIT: Got it. Nice: the latest edition of loadp2, http://forums.parallax.com/discussion/comment/1416789/#Comment_1416789, has a bunch of parameters that can set it to match whatever the sysclock frequency is, and can also force a default first CLKSET value.
EDIT2: One detail: The mode value is in hexadecimal.
Re: losing the 120MHz clock: well, that and the uncertainty in the path forward.
Download link for the P2v22 image (Prop123-A9, 8 cogs, 512KB hub, 64 smart pins, 80MHz): https://drive.google.com/file/d/0B9NbgkdrupkHdmpGNFRqZXdEdTA/view?usp=sharing
I'm starting to get enthused again...