Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i


Comments

  • jmg Posts: 12,640
    edited September 2017
    cgracey wrote: »
    Brian found some trouble with the spinning Fozzy demo. Our 120MHz overclocking is too aggressive.

    I'm recompiling everything now for 80MHz operation. This is below the Fmax of most compiles, but still 8MHz above the 16-cog compiles, which will be fine. Getting 11% above Fmax is no problem, but asking for 67% more is just too much.

    This new version will be out tomorrow and it will be v21a. So, back to 80MHz for reliable FPGA testing.

    I'm not clear why you don't just make the clock more programmable, to get closer to what a real P2 will do?
    A full PLL config (VCO/M = PFD = Xtal/N) is not possible, but you should be able to do (480MHz PLL VCO) / N, with N of, say, 5 bits?


    Will the pad ring test chip allow full testing of the PLL config (VCO/M = PFD = Xtal/N) with the FPGA?
    Has a breakout board been done for the FPGA + test chip design?
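
A minimal sketch of the divide-by-N idea raised in this post, assuming a fixed 480MHz VCO and a 5-bit divider N in the range 1..31. The function name and the choice to skip N=0 are assumptions for illustration, not the FPGA's actual clock logic.

```python
VCO_HZ = 480_000_000  # VCO frequency suggested in the post (assumed fixed)

def divided_clocks_mhz(vco_hz=VCO_HZ, bits=5):
    """Map each divider N (1 .. 2**bits - 1) to the resulting clock in MHz."""
    return {n: vco_hz / n / 1e6 for n in range(1, 2 ** bits)}

clocks = divided_clocks_mhz()
for n in (4, 5, 6):
    print(f"N={n}: {clocks[n]:.1f} MHz")
```

In this model N=4, 5 and 6 give 120, 96 and 80MHz, so the 96MHz point mentioned for USB would be reachable.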
  • Jmg, for our purposes now, I don't want to try to replicate what the actual silicon will do, clock-wise. What we have works like this:

    CLKSET #0 - set 20MHz (internal RC mode, boot-up)
    CLKSET #255 - set 80MHz (fast PLL mode, user code)

    There are in-betweens, but they are jittery. For testing purposes, using just these two modes is sufficient.

    Yes, the pad ring test chip will allow the new PLL to be fully exercised. And it will use the same PCB as last time, so there's nothing new to make.
  • jmg Posts: 12,640
    edited September 2017
    cgracey wrote: »
    Jmg, for our purposes now, I don't want to try to replicate what the actual silicon will do, clock-wise. What we have works like this:

    CLKSET #0 - set 20MHz (internal RC mode, boot-up)
    CLKSET #255 - set 80MHz (fast PLL mode, user code)

    There are in-betweens, but they are jittery. For testing purposes, using just these two modes is sufficient.

    - but being this limited means 96MHz, which is a likely USB frequency, cannot be dialed up.
    I think some code has also relied on a clock speed higher than 80MHz?

    Addendum: Found some comments about CLK speed and USB code here:
    http://forums.parallax.com/discussion/comment/1416453/#Comment_1416453


    The still somewhat undefined issues around the USB code, and the other areas above, suggest that being able to readily vary the clock speed is an important part of testing.


    cgracey wrote: »
    Yes, the pad ring test chip will allow the new PLL to be fully exercised. And it will use the same PCB as last time, so there's nothing new to make.
    Good to know the PLL can be fully tested :)


  • There's a new v21a at the top of this thread that puts everything back to 80MHz for reliability.
  • Looks good Chip!
    All v21a images loaded and ran OK @80MHz. :)

    BTW PNut still shows 120MHz in the information dialog though.
    Melbourne, Australia
  • I'm getting nothing running. PX is happy as usual. PNut is reporting a valid Prop2 as per Brian's observation.

    I think it's probably a change in how PNut loads the obj into the Prop2. I'm using Dave's loadp2 program in place of PNut because of PNut's handshake glitch under Wine.
    Money is a placeholder for cooperation
  • evanh Posts: 5,953
    edited September 2017
    Oh, that's right. Last time I had to edit a binary file to set the baud numbers to suit the new sysclock or something ...

    EDIT: Got it. :) Nice, I found that the latest edition http://forums.parallax.com/discussion/comment/1416789/#Comment_1416789 has a bunch of parameters that can set loadp2 to match whatever the sysclock frequency is and also force a default first CLKSET value.

    EDIT2: One detail: The mode value is in hexadecimal.
    Money is a placeholder for cooperation
  • USB on P2v21a @80MHz is much better. The initial v18->v19+ transition code that failed miserably on P2v19-v21 @120MHz is working on P2v21a @80MHz for full-speed, but for some reason not low-speed. I'm not too concerned about that yet, as low-speed has usually been a lot less fussy when trying to get the timing dialed in.

    It's going to be a bummer to lose the 120MHz clock, as that made full-speed a comfortable fit. It may be a while yet before I can verify with certainty whether or not the funky behavior I was seeing @120MHz is completely gone, but so far it's looking pretty positive.
    garryj
  • Okay. That's good news, so far, Garryj.
  • Rayman Posts: 8,907
    edited September 2017
    Thanks for keeping up with this garryj! I've lost motivation without having USB...
    Well that and uncertainty in the path forward.
    Prop Info and Apps: http://www.rayslogic.com/
  • garryj wrote: »
    USB on P2v21a @80MHz is much better. The initial v18->v19+ transition code that failed miserably on P2v19-v21 @120MHz is working on P2v21a @80MHz for full-speed, but for some reason not low-speed. I'm not too concerned about that yet, as low-speed has usually been a lot less fussy when trying to get the timing dialed in.
    Maybe that's just some op-code changes that were missed in the focus on FS USB?
    It would be really nice to have both LS and FS working, even if not as cleanly as at higher MHz.
    garryj wrote: »
    It's going to be a bummer to lose the 120MHz clock, as that made full-speed a comfortable fit. It may be a while yet before I can verify with certainty whether or not the funky behavior I was seeing @120MHz is completely gone, but so far it's looking pretty positive.

    Sounding good - is it worth testing on 84MHz and 96MHz builds? How many more MHz are needed to make that 'comfortable fit'?

  • Remember, though, that we have an NCO generating the USB clock, not a divide-by-N counter.
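
To illustrate the difference, here is a minimal model of an NCO (a phase accumulator) producing a 12MHz full-speed USB bit clock from an 80MHz system clock. The 16-bit accumulator width and the function names are assumptions for illustration; the actual smart pin hardware may differ.

```python
def nco_increment(f_out_hz, f_sys_hz, bits=16):
    """Per-clock phase increment so the accumulator overflows f_out times per second on average."""
    return round(f_out_hz * (1 << bits) / f_sys_hz)

def nco_tick_intervals(increment, n_clocks, bits=16):
    """Run the accumulator and return the spacing, in system clocks, between overflows."""
    mask = (1 << bits) - 1
    acc = 0
    last = None
    intervals = []
    for clk in range(n_clocks):
        acc += increment
        if acc > mask:              # an overflow marks one output tick
            acc &= mask
            if last is not None:
                intervals.append(clk - last)
            last = clk
    return intervals

inc = nco_increment(12_000_000, 80_000_000)
intervals = nco_tick_intervals(inc, 8000)
mean_interval = sum(intervals) / len(intervals)
print(inc, sorted(set(intervals)), round(mean_interval, 3))
```

A divide-by-N counter at 80MHz could only produce 80/6 = 13.33MHz or 80/7 = 11.43MHz, whereas the accumulator alternates between 6- and 7-clock bit periods and averages very close to 12MHz, at the cost of one clock of jitter.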
  • And remember real silicon is expected to be 160+MHz
    My Prop boards: P8XBlade2, RamBlade, CpuBlade, TriBlade
    Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
    Website: www.clusos.com
    Prop Tools (Index) , Emulators (Index) , ZiCog (Z80)
  • cgracey Posts: 10,588
    edited September 2017
    I posted a new v21b at the top of this thread.

    A few minor Verilog changes were made to accommodate synthesis. These changes should not result in any functional differences.

    The Prop123_A9_Prop2_v21b_64smartpins.rbf image contains these differences. Could some of you please run this image and see if you notice any difference/problem? It seems fine to me, but I would feel more confident if you could check it, too. Thanks.

    This new PNut_v21b.exe properly reports FPGA frequency when doing a Ctrl-G.
  • cgracey wrote: »
    The Prop123_A9_Prop2_v21b_64smartpins.rbf image contains these differences. Could some of you please run this image and see if you notice any difference/problem? It seems fine to me, but I would feel more confident if you could check it, too. Thanks.
    Runs all my stuff Ok here, looks fine to me.


    Melbourne, Australia
  • cgracey wrote: »
    The Prop123_A9_Prop2_v21b_64smartpins.rbf image contains these differences. Could some of you please run this image and see if you notice any difference/problem? It seems fine to me, but I would feel more confident if you could check it, too. Thanks.
    First twenty XORO32 sums came out good. :)
    Money is a placeholder for cooperation
  • Thanks, Guys.
  • cgracey wrote: »
    I posted a new v21b at the top of this thread.

    A few minor Verilog changes were made to accommodate synthesis. These changes should not result in any functional differences.

    The Prop123_A9_Prop2_v21b_64smartpins.rbf image contains these differences. Could some of you please run this image and see if you notice any difference/problem? It seems fine to me, but I would feel more confident if you could check it, too. Thanks.

    This new PNut_v21b.exe properly reports FPGA frequency when doing a Ctrl-G.
    Are you saying that there is no difference between 21a and 21b for versions other than the "64smartpins" version?

  • David Betz wrote: »
    cgracey wrote: »
    I posted a new v21b at the top of this thread.

    A few minor Verilog changes were made to accommodate synthesis. These changes should not result in any functional differences.

    The Prop123_A9_Prop2_v21b_64smartpins.rbf image contains these differences. Could some of you please run this image and see if you notice any difference/problem? It seems fine to me, but I would feel more confident if you could check it, too. Thanks.

    This new PNut_v21b.exe properly reports FPGA frequency when doing a Ctrl-G.
    Are you saying that there is no difference between 21a and 21b for versions other than the "64smartpins" version?

    The only differences are the new "64smartpins" image and PNut_v21b.exe, which properly shows 80MHz, instead of 120MHz.
  • cgracey wrote: »
    David Betz wrote: »
    cgracey wrote: »
    I posted a new v21b at the top of this thread.

    A few minor Verilog changes were made to accommodate synthesis. These changes should not result in any functional differences.

    The Prop123_A9_Prop2_v21b_64smartpins.rbf image contains these differences. Could some of you please run this image and see if you notice any difference/problem? It seems fine to me, but I would feel more confident if you could check it, too. Thanks.

    This new PNut_v21b.exe properly reports FPGA frequency when doing a Ctrl-G.
    Are you saying that there is no difference between 21a and 21b for versions other than the "64smartpins" version?

    The only differences are the new "64smartpins" image and PNut_v21b.exe, which properly shows 80MHz, instead of 120MHz.
    Thanks. I grabbed the new .zip file just for good measure. At least I can run the latest PNut!
  • garryj Posts: 192
    edited October 2017
    garryj wrote: »
    USB on P2v21a @80MHz is much better. The initial v18->v19+ transition code that failed miserably on P2v19-v21 @120MHz is working on P2v21a @80MHz for full-speed, but for some reason not low-speed.
    Found the low-speed issue, and given my lack of skill/tools in the hardware debugging department, I'm *almost* certain it's related to the POLLCTx instruction. The attached test code repeatedly calls a poll-based version of WAITX, which I use in the USB host cog when I need to do a timed wait that might overlap with the interrupt-driven Start-of-Frame packet that must be transmitted on the USB every millisecond, +/- 0.0005ms.

    Prop123-A9 board + Parallax Serial Terminal, Windows10 and the current P2v21b FPGA image @80MHz. The test code does not utilize interrupts.

    The poll_waitx routine is repeatedly called with a one-second delay time value. When using the POLLCTx instruction, the first call delays for the expected one second, but the second call will spin until the ~54 second timer wrap-around occurs. Then it successfully repeats five one-second delays, but will wrap around again on the sixth attempt, and this pattern continues to repeat.

    I went back to the P2v21 @120MHz image and got similar results, with the difference being four successful calls before the first wrap-around stall and then a stall at six calls thereafter, as with the 80MHz test.

    When the code is changed to use the J[N]CTx instruction, it works as expected.

    Replacing POLLCTx with a J[N]CTx variant appears to have me back on par with the USB code I had running at P2v18 :smile:

    I hope somebody else is able to replicate this -- I'd hate to think it might be my Prop123-A9 board :confused:
    garryj
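
The ~54 second stall matches the time a free-running 32-bit counter takes to wrap at 80MHz. A quick check, which also converts the +/- 0.0005ms Start-of-Frame tolerance into system clocks (the helper names are illustrative, not part of the test code attached to the post):

```python
def wrap_seconds(sysclock_hz, counter_bits=32):
    """Seconds for a free-running counter of the given width to wrap once."""
    return (1 << counter_bits) / sysclock_hz

def tolerance_clocks(tolerance_ns, sysclock_hz):
    """Whole system clocks that fit inside a timing tolerance given in nanoseconds."""
    return tolerance_ns * sysclock_hz // 1_000_000_000

for hz in (80_000_000, 120_000_000):
    print(f"{hz // 1_000_000} MHz: wrap after {wrap_seconds(hz):.2f} s, "
          f"SOF tolerance = {tolerance_clocks(500, hz)} clocks")
```

At 80MHz the counter wraps after about 53.7 seconds, and the 500ns tolerance corresponds to 40 clocks.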
  • Garryj, good job! I'm on it.
  • Ah, this is sounding familiar. Chip fixed up something for Oz on this matter not long ago. Yes, excellent zeroing in there, Garry.
    Money is a placeholder for cooperation
  • cgracey Posts: 10,588
    edited October 2017
    Garryj, I found the problem!

    The CT1 event flag was being sampled into C on the first clock of the POLLCT1 instruction and then it was being cleared on the second clock. Once in a while, it would sample the event flag on the same clock that it was getting set on, returning C=0, and then clear the newly-set event flag on the next clock. This made the event disappear, so that it would not be registered again (and even then, only maybe) until the 32-bit counter wrapped around and hit the target again.

    The JCT1 instruction didn't have this problem, because it both sampled and cleared the CT1 event flag on the first clock of the instruction.

    This program demonstrates the problem:
    dat	org
    
    	getct	x			'get initial counter
    
    .next	addct1	x,##80_000_000/10	'set ct1 for 1/10s from now
    
    	getrnd	y			'wait some random amount of time
    	and	y,#7			'..to mix up the phase relationship between
    	waitx	y			'..the ct1 event and the pollct1 instruction
    
    .loop	pollct1	wc			'this will hang sometimes
    if_nc	jmp	#.loop			'..until the counter wraps
    
    	drvnot	#32			'ct1 fired, toggle led
    	jmp	#.next
    
    x	long	0
    y	long	0
    

    I've got all the POLLx instructions now sampling and clearing on the first clock of the instruction.
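
The race described above can be sketched as a small cycle-level model. This is an illustration under stated assumptions, not the actual Verilog: the poll loop is assumed to repeat every 6 clocks, the flag is modelled as a single register, and a set is assumed to win over a clear when both land on the same clock.

```python
def first_detection(event_clock, fixed, loop_clocks=6, max_clocks=1000):
    """Return the clock at which the poll loop sees C=1, or None if the event is lost.

    fixed=False: sample on the first clock of the poll, clear on the second (old behavior).
    fixed=True:  sample and clear on the first clock (new behavior).
    """
    flag = 0                                   # registered CT1 event flag
    for clk in range(max_clocks):
        phase = clk % loop_clocks              # phase 0 = first clock of the poll
        sampled = flag if phase == 0 else None # value seen before this clock edge
        clear = (phase == 0) if fixed else (phase == 1)
        if clk == event_clock:
            flag = 1                           # assumption: set wins over clear
        elif clear:
            flag = 0
        if sampled == 1:
            return clk
    return None

lost_old = [e for e in range(24) if first_detection(e, fixed=False) is None]
lost_new = [e for e in range(24) if first_detection(e, fixed=True) is None]
print("lost with old timing:", lost_old)
print("lost with new timing:", lost_new)
```

In this model the old timing loses exactly those events that land on the first clock of a poll (clocks 0, 6, 12 and 18 here), while the new timing loses none.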

    Here is a Prop123-A9 image with 8 cogs, 512KB hub, and 64 smart pins - just like the silicon is shaping up to be. This runs at 80MHz:

    https://drive.google.com/file/d/0B9NbgkdrupkHdmpGNFRqZXdEdTA/view?usp=sharing

    It is a GOOD thing you discovered this. It would have been a tragedy had this slipped through. I think your code will start working reliably now. Thanks a bunch!
  • Great! Sounds like USB is going to be all better again?

    I'm starting to get enthused again...
    Prop Info and Apps: http://www.rayslogic.com/
  • Thanks, Chip, I'll give the P2v22 image a go asap!
    garryj
  • cgracey Posts: 10,588
    edited October 2017
    garryj wrote: »
    Thanks, Chip, I'll give the P2v22 image a go asap!

    Super!

    That bug crept in when I optimized the way the ALU-result bus worked. I captured every Z/C/result I could, in the first clock, so that there would be less to multiplex in the much-more-critical second clock. That introduced the phase error.

    You living with that bug was demoralizing.
  • cgracey wrote: »
    garryj wrote: »
    Thanks, Chip, I'll give the P2v22 image a go asap!

    Super!

    That bug crept in when I optimized the way the ALU-result bus worked. I captured every Z/C/result I could, in the first clock, so that there would be less to multiplex in the much-more-critical second clock. That introduced the phase error.

    You living with that bug was demoralizing.
    Don't be demoralized, Chip! Most of what you do is beyond my comprehension anyway, and we all know that sooner or later, some kind of s**t happens. I'm super happy just to be minimally involved in something as unique and challenging as this P2 project that you and Parallax have taken on and allowing us all to participate in :-D

    And by the way -- woohoo!!!

    The P2v18->v19 update code looks to be as solid as the v18 code was, whether it be POLLCTx or J[N]CTx. In fact, I think there might have been wider implications of your fix, because yesterday, after I posted my test code, I was working on a post-v19 version of the code that I had updated to use just the J[N]CTx variant, and the code would enumerate the mouse/keyboard, but would hang when transitioning to the timer-controlled device polling routine. Since this was an offshoot of the v19 code, my thought at the time was that I still had a problem with something not timer-related in the polling routine. But after I finished testing the v19 code, I fired this version up and it, too, worked just fine!

    If your fix did have wider implications, and if you have the time, might it be possible to revisit the land of 120MHz?
    garryj
  • cgracey Posts: 10,588
    edited October 2017
    I've got the hub slot frequency increased to 1/cogs, instead of the old 1/16.

    This means hub instructions don't have to wait so long to execute (COGINIT, QROTATE, etc.).

    If some of you will remember, there was a 2-bit CORDIC command-in-progress counter used to avoid hang-ups on GETQX/Y. That got increased to 5 bits to accommodate 1- and 2-cog variants, where a cog can issue CORDIC instructions back-to-back. For an 8-cog variant, it's every 8 clocks. You could pipeline your code and get huge throughput for batched operations.

    Also, I moved the debug interrupt vectors around a little so that they go from $FFFFC down, instead of from $FFFC0..$FFFFC. This tidies things up a bit.

    Once I test this in every cog-count configuration and I know it's okay, I'll make a new FPGA version and update the docs.
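
A rough sketch of what the slot change means for latency, assuming a simple round-robin model in which each cog gets one hub slot per rotation. The function names are illustrative and the model ignores everything except the rotation length.

```python
def slot_period(cogs, per_cog_rotation=True):
    """Clocks between a cog's hub slots: 'cogs' with the new scheme, 16 with the old one."""
    return cogs if per_cog_rotation else 16

def worst_case_wait(period):
    """Clocks a cog waits if it just missed its slot."""
    return period - 1

for cogs in (1, 2, 4, 8, 16):
    old = worst_case_wait(slot_period(cogs, per_cog_rotation=False))
    new = worst_case_wait(slot_period(cogs))
    print(f"{cogs:2d} cogs: worst-case wait {old} clocks before, {new} clocks after")
```

Under this model an 8-cog variant sees its worst-case wait drop from 15 clocks to 7, and a 1-cog variant never waits, which is consistent with back-to-back CORDIC issue in the 1-cog case.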