Is the P1V bug free? — Parallax Forums

steddyman Posts: 91
edited 2015-03-10 17:36 in Propeller 1
The reason I ask is that I have recreated the DracBlade on the DE2-115 (originally with an unmodified P1V core), and it always crashes when emulating the Z80, with every program I have tried.

Every other application works fine, even my test ones. I have even set up the DracBlade in SimpleIDE and compiled XMMC memory model applications to the board, and it runs everything flawlessly, so the hardware emulation of the DracBlade is clearly correct.

However, every ZiCog / Z80 emulator I have tried that supports the DracBlade just crashes. Is it possible there are still bugs in the P1V Verilog, or was this the actual code used for the real P1 production run?

Comments

  • Heater. Posts: 21,230
    edited 2014-11-29 14:47
    steddyman,

    If I understand correctly, the P1 chip was not designed in a hardware description language. For this reason the P1 Verilog was released as the "P8X32A_Emulation".
    I always thought "emulation" was a bit of a weird way to describe a different hardware implementation; that term is so often used for software emulators, like ZiCog, say. But in this case it makes a bit of sense.

    Is it bug free? Who knows? It's Verilog. Verilog is software. Software basically cannot be proven correct. (Mind you, neither can a hand-made schematic of logic gates.)

    Can we get some things straight:

    1) Can we assume this mishap happens when using an out of the box P8X32A_Emulation configuration? I guess you have some external RAM attached in that case. We need to know if this is a problem in the original Verilog code or if it's due to something you have changed in it.

    2) What do you mean by "crashes"? Is it that the Z80 emulation hangs? Or is it that the whole Propeller itself gets into a mess? What actually happens?

    Hopefully this is some instruction that is not implemented correctly in Verilog, wrong flag setting or something, which just happens to be used by ZiCog in some rarely used way.

    I have a Nano board here. How can I recreate your setup? If you have the very latest ZiCog to package up, that would be great.
  • Cluso99 Posts: 18,069
    edited 2014-11-29 15:28
    There were two patches that needed to be applied to the original P1V code. IIRC one was to do with the code resetting. Sorry, but you will need to search these threads. The final correction was posted by Chip so that should help your search.
  • steddyman Posts: 91
    edited 2014-11-29 16:52
    To answer Heater's questions: yes, I originally ran the stock P1V core with no changes, apart from a schematic that links it to SRAM and latches. I have written extensive test apps for that memory interface, reading and writing the full memory size using the original DracBlade driver routines, and it works flawlessly. I have also compiled XMM memory model demos in PropGCC using the supplied DracBlade config file, and again it works flawlessly.

    I can't tell you exactly where it crashes, but in all emulator apps it hangs almost as soon as it hits the Z80 engine, and works fine up until that point. For example, the Spectrum emulator displays the menu of ROMs on the SD card and lets you scroll up and down them, viewing screenshots of each. All of that is just standard Spin and assembler code, but that code uses the external SRAM and SD along with everything else on my schematic. As soon as you select a ROM and press Enter to start it, the screen corrupts and it crashes. It is the same with the Colour Genie emulator and with other Z80-based apps.

    Cluso, are you saying that the P1V download source on the Parallax site has two known bugs that need to be manually patched?
  • Cluso99 Posts: 18,069
    edited 2014-11-29 18:27
    Yes. You need to find Chip's post with the fix to his original post. There is only one post from him where he defines the agreed patch. I am not on the PC that contains the fix atm.
  • ozpropdev Posts: 2,793
    edited 2014-11-29 19:07
    These are the only issues I am aware of.
    1. The reset bug fix.
    Change in cog.v in the outa/dira section
    always @(posedge clk_cog or negedge ena)
       if (!ena)
          outa <= 32'b0;
       else if (setouta)
          outa <= alu_r;
    
    always @(posedge clk_cog or negedge ena)
       if (!ena)
          dira <= 32'b0;
       else if (setdira)
          dira <= alu_r;
    
    

    2. The counter module PLL bug fix
    Change in cog_ctr.v in the simulated PLL section
    if (~|ctr[30:28] && |ctr[27:26])
    
    
    

    3. The cog led bug for the BEmicro CV version.

    Change in the top.tdf file, in the cog led section
    	ledg[]			= !core.cog_led[];
    
    
    and in the top.qsf file change the led assignments to
     set_location_assignment PIN_N1 -to ledg[0]
     set_location_assignment PIN_N2 -to ledg[1]
     set_location_assignment PIN_U1 -to ledg[2]
     set_location_assignment PIN_U2 -to ledg[3]
     set_location_assignment PIN_W2 -to ledg[4]
     set_location_assignment PIN_AA1 -to ledg[5]
     set_location_assignment PIN_AA2 -to ledg[6]
     set_location_assignment PIN_Y3 -to ledg[7]
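
To see which counter settings the condition in fix 2 covers, here is a small Python model of `~|ctr[30:28] && |ctr[27:26]`. It is only a sketch of that one expression, not of cog_ctr.v; the function name is made up for illustration, and it assumes (from my reading of the P8X32A datasheet) that ctr[30:26] is the 5-bit CTRMODE field.

```python
def pll_mode_selected(ctr):
    """Python model of the Verilog test `~|ctr[30:28] && |ctr[27:26]`."""
    upper_clear = ((ctr >> 28) & 0b111) == 0   # ~|ctr[30:28]: bits 30..28 are all zero
    lower_set = ((ctr >> 26) & 0b11) != 0      # |ctr[27:26]: bit 27 or bit 26 is set
    return upper_clear and lower_set

# Try all 32 values of the 5-bit field held in ctr[30:26].
matching_modes = [mode for mode in range(32) if pll_mode_selected(mode << 26)]
print([format(mode, "05b") for mode in matching_modes])
```

Only %00001, %00010 and %00011 pass the test, which are the three PLL modes in the datasheet's CTRMODE table, so the patched line limits the simulated PLL to those modes.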
    
    

    Hope this helps. :)
    Cheers
    Brian
  • pik33 Posts: 2,394
    edited 2014-11-30 01:14
    There may be some more bugs. I have seen some strange things while debugging a new blitting procedure for my system. For example, this:
    ' bad loop
    p1:
    (some code) 
        djnz counter1, #p1
    
    

    doesn't loop; the loop executes only once. But this
    'good loop
    p1:
    (some code) 
        nop
        djnz counter1, #p1
    
    

    works.

    Don't know why this nop makes the difference.
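
For comparison, this is what the loop should do on a real P1, as a minimal Python model of DJNZ (decrement the register, jump back while the result is non-zero). The function name is invented for illustration and the model ignores timing entirely; it only shows the expected iteration count.

```python
def djnz_loop_iterations(counter):
    """Model `p1: (some code) / djnz counter, #p1`; return how often the body runs."""
    iterations = 0
    while True:
        iterations += 1                        # "(some code)" runs here
        counter = (counter - 1) & 0xFFFFFFFF   # djnz decrements the 32-bit register...
        if counter == 0:                       # ...and falls through once it reaches zero
            break
    return iterations

print(djnz_loop_iterations(5))
```

A loop entered with counter1 = N should therefore run N times, with or without the nop. The fault described above behaves as if the jump were never taken, so the body runs once whatever N is.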

    Then this thing in Spin
       call_bad_loop
       call_bad_loop
       call_bad_loop
     
    

    doesn't work, every loop executes one time, but when

       call_bad_loop
       waitcnt(cnt+clkfreq)
       call_bad_loop
       waitcnt(cnt+clkfreq)
       call_bad_loop
       waitcnt(cnt+clkfreq)
     
    

    these bad loops are executed as if they were good.

    Of course I added this nop and forgot the problem, but... why? I can understand that something is wrong with this loop without the nop; for example, maybe the hardware needs some more time per loop iteration to write to the SRAM. But why, oh why, can this waitcnt make a difference? The loop is of course executed in another cog, but the Spin procedure waits for the end of the loop. I checked this: it really waits. Strange.

    But then I added a nop and forgot about the problem.
  • steddyman Posts: 91
    edited 2014-11-30 03:16
    Thanks Brian. That was really helpful. I've applied those patches and the behavior is slightly different but it still doesn't work.

    Thanks for the heads up Pik. I guess these issues are what is probably killing the emulator. Very hard in emulation like this to find out where it might be going wrong.
  • jmg Posts: 15,182
    edited 2014-11-30 14:40
    pik33 wrote: »
    There can be some more bugs. I have some strange things while debugging a new blitting procedure for my system. For example this:
    ' bad loop
    p1:
    (some code) 
        djnz counter1, #p1
    
    

    doesn't loop; the loop executes only once. But this
    'good loop
    p1:
    (some code) 
        nop
        djnz counter1, #p1
    
    

    works.

    Don't know why this nop makes the difference.

    Then this thing in spin
       call_bad_loop
       call_bad_loop
       call_bad_loop
     
    

    doesn't work, every loop executes one time, but when

       call_bad_loop
       waitcnt(cnt+clkfreq)
       call_bad_loop
       waitcnt(cnt+clkfreq)
       call_bad_loop
       waitcnt(cnt+clkfreq)
     
    

    these bad loops are executed as if they were good.

    They sound related, and could be something like the DJNZ needing more than 4 cycles to settle.
    If NOP is OK as a preceding opcode, are there any others?
    I.e., if DJNZ always needed a NOP, that would have been picked up before now, so the last real opcode before the DJNZ may be important. Only some opcode pairings with DJNZ may trigger this phase effect.
  • Tubular Posts: 4,706
    edited 2014-11-30 15:30
    Pik, does using a lower global clock rate (e.g. half) solve the NOP issue? Can you reveal what the instruction before the NOP is?
  • ozpropdev Posts: 2,793
    edited 2014-11-30 15:40
    I have encountered random issues with stability with clock speeds >100MHz.
    I run all my P1v's @ 80MHz now. Lots of different code with lots of DJNZ loops, no obvious issues so far. :)
    I am running code on a Nano, a DE2-115 and BEMicro CV too.
  • jac_goudsmit Posts: 418
    edited 2014-11-30 17:34
    steddyman wrote: »
    Is it possible there are still bugs in the P1V verilog, or was this the actual code used for the real P1 production run?

    It's not impossible that the P1V code still has bugs. There have been several bug fixes since it went onto GitHub; if you get the Master branch from Mindrobots, you will have the most stable version as far as we know.

    Any bugs that may still be in there are likely to be related to the timers. For example, the PLL is emulated in digital hardware, so there may be some jitter. Also, Quartus II generates many warnings and specifically states that the design doesn't meet timing requirements; we have to look into that some day, but Chip has no time and most of us are HDL noobs. We're happy it works as well as it does, and I'd certainly be interested in troubleshooting anything that works on the real P1 but not on the P1V.

    ===Jac
  • steddyman Posts: 91
    edited 2014-12-01 00:28
    Thanks Jac. Just to confirm, I am running my DracBlade emulator at the stock 80 MHz.
  • napablue Posts: 5
    edited 2015-01-17 12:14
    Propeller newbie here so I'm a dumb**** from out of town. :D

    I've implemented countless ARMs in Xilinx and on-chip and you have to 100% close timing on targeted clock rate with margin otherwise flakiness appears, even with golden code. Also, not sure if Propeller is same, but ARM is piped so compacted code may differ with instruction sequence used. Proper semaphoring must also be adhered to.
  • rjo__ Posts: 2,114
    edited 2015-02-11 09:20
    using p1v4cog on de2-115

    I suspect that we can say affirmative to this question:)

    I have just spent an incredible amount of time trying to do the simplest of things...
    write to the SRAM with direct indexing. I thought I had solved it in an earlier post and then moved on to VGA... but then as I was about to reach a milestone... I went back and thoroughly tested the SRAM implementation.

    The SRAM protocol is simple... but when you assume your own ignorance and try every
    perturbation... it can take time:)

    In this case, it turned out that if I tried to write to the SRAM (100 MHz independent PLL)
    with a Prop clock at 100 MHz... I often got good results... but occasionally bad results... each followed by furtive Verilog changes.

    Finally, I reduced the Prop clock to 80 MHz and on first testing, I got 350 million consecutive writes and reads without an error.

    BUT... if I restarted my Spin program repeatedly, I occasionally got errors.
    And the errors seemed patterned. Generally, there was a fixed rate of error for a particular run,
    but the error rate differed significantly between runs.

    Sometimes I would get one error per 35000 writes and sometimes 30000 errors in 35000 writes.
    (the 35000 writes/reads were in a continuous loop with 35000 writes followed by 35000 reads.)
    So, if a run started out bad, it stayed bad with the same amount of badness...
    If a run was perfect in the beginning it stayed that way for the entire run.

    What is interesting to me (but perfectly meaningless)... is that the errors didn't return as random 16 bit numbers... they always came back as "-1".
    I have no idea what this means, but it has to mean something... seems vaguely hardware-ish:)
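
The kind of write-then-read pass described above, and what a "-1" result looks like at the bit level, can be sketched in Python as follows. Everything here is hypothetical: `FakeSram`, `expected_word` and the fault injection stand in for the real board, and the test pattern is arbitrary. The only hard fact it relies on is that a 16-bit word read as -1 is 0xFFFF, i.e. all sixteen data bits came back high.

```python
def as_signed16(word):
    """Interpret a 16-bit word as a two's-complement signed value."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word

def expected_word(addr):
    """Hypothetical test pattern: any address-dependent 16-bit value would do."""
    return (addr * 7 + 1) & 0xFFFF

def run_pass(write, read, count=35000):
    """Write `count` words, read them all back, and tally the mismatches."""
    for addr in range(count):
        write(addr, expected_word(addr))
    errors = 0
    all_ones = 0
    for addr in range(count):
        got = read(addr) & 0xFFFF
        if got != expected_word(addr):
            errors += 1
            if as_signed16(got) == -1:   # every data bit read back as 1
                all_ones += 1
    return errors, all_ones

class FakeSram:
    """Stand-in for the real SRAM; `fail_every` injects all-ones reads."""
    def __init__(self, fail_every=0):
        self.cells = {}
        self.fail_every = fail_every
        self.reads = 0

    def write(self, addr, word):
        self.cells[addr] = word & 0xFFFF

    def read(self, addr):
        self.reads += 1
        if self.fail_every and self.reads % self.fail_every == 0:
            return 0xFFFF                # simulated bad read
        return self.cells[addr]

good = FakeSram()
bad = FakeSram(fail_every=1000)
print(run_pass(good.write, good.read))
print(run_pass(bad.write, bad.read))
```

Counting the all-ones reads separately from other mismatches makes it easy to see whether every failure has the same signature. Why the real board returns all ones (nothing driving the bus at sample time, a missed output enable, or something else) is not established in this thread.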


    Today, I reread this thread and reduced the Prop's clock to 50 MHz.

    I have been sitting here clicking the program button in PropellerIDE all morning without a direct indexing glitch.

    I still have occasional issues with auto-indexing, but now it returns 16 bit values... occasionally wrong but not -1:)
  • pik33 Posts: 2,394
    edited 2015-02-12 12:22
    rjo__ wrote: »

    In this case, it turned out that if I tried to write to the sram (100Mhz independent pll)

    Do not use an independent PLL; this is the mother of all SRAM/Propeller problems, and it stopped my retromachine project for a long time. There were random SRAM read/write errors. The solution is to use ONE PLL with synchronous frequencies. A Cyclone 4 PLL can have more than one output; use this.

    I have a 152 MHz pixel and SRAM clock, and a 114 MHz (0.75*152) Propeller clock, both from the same PLL.
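
The arithmetic behind "synchronous frequencies" can be checked with a few lines of Python. This is only a worked example of the two numbers quoted above; the variable names are invented, and it assumes that two outputs of one PLL keep a fixed phase relationship, which is the point of sharing the PLL.

```python
from fractions import Fraction
from math import gcd

sram_mhz, prop_mhz = 152, 114

ratio = Fraction(prop_mhz, sram_mhz)     # Propeller clock as a fraction of the SRAM clock
common_mhz = gcd(sram_mhz, prop_mhz)     # rate at which the two clocks' edges line up again
sram_cycles = sram_mhz // common_mhz     # SRAM clock cycles per alignment period
prop_cycles = prop_mhz // common_mhz     # Propeller clock cycles per alignment period

print(ratio, common_mhz, sram_cycles, prop_cycles)
```

The ratio reduces to 3/4, so every 4 SRAM clocks span exactly 3 Propeller clocks and the edge relationship repeats at 38 MHz. With two independent PLLs there is no such fixed pattern, which fits the random errors described above.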
  • rjo__ Posts: 2,114
    edited 2015-02-12 15:30
    lol... I had already decided to use a common PLL... because that is what you did.
    Initially I had problems at 80 MHz, but then I shifted the clock feeding the P1V by -90 degrees... and
    that seems to have fixed it. Humming along at 100 MHz.
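
To put the -90 degree shift into time units, here is a short Python calculation. The helper name is made up, and it assumes the shift is specified relative to the period of the clock being shifted, which is how PLL phase settings are usually expressed.

```python
def phase_shift_ns(phase_degrees, clock_mhz):
    """Convert a phase shift in degrees into nanoseconds at the given clock rate."""
    period_ns = 1000.0 / clock_mhz
    return phase_degrees / 360.0 * period_ns

shift_at_80 = phase_shift_ns(-90, 80)
shift_at_100 = phase_shift_ns(-90, 100)
print(shift_at_80, shift_at_100)
```

A quarter-period shift moves the P1V clock edge by about 3.1 ns at 80 MHz and 2.5 ns at 100 MHz, which is a large slice of the time available for the SRAM signals to settle.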

    Thanks for all the work you have done on the retromachine... none of what I am doing right now would have been possible without it.

    Rich
  • pik33 Posts: 2,394
    edited 2015-02-14 01:47
    rjo__ wrote: »
    I shifted the clock feeding the P1V by -90degrees... and
    that seems to have fixed it. Humming along at 100MHz.

    I use a DFF for this purpose. Recompile the project after changes: errors. Add a DFF here and there: it works.
    Recompile once again with some minor change: errors. What to do then? Remove the DFF, and then it works :)

    FPGA is fun :)
  • napablue Posts: 5
    edited 2015-03-10 14:37
    @jac_goudsmit: Coming on a month ago I downloaded, synth'd and worked on closing timing. I wish I had more time, and I will return to it in the future. As I'm not the implementer, I can only offer my background. A large chunk of the timing errors were false paths (FPs), and it seems some were multicycle paths (MCPs). With the proper constraints, timing errors were removed and implementation eased up a bit. I DID notice the emulated PLL. Not a good choice, but I guess it's there to emulate the programmable PLL in the original chip? It'd be MUCH better to use an on-chip DLL.

    Speaking of chip, it'd be fun to chat with Chip sometime.

    Not sure about any of the SRAM issues, but on the board, there are timing paths pin to pin which have to be considered. Tighter timing requires matched paths off-chip, to-chip, and on-chip. Try implementing 40GHz paths through a SERDES and onto an FPGA. :D

    A couple of things I wanted to mention. Verilog IS HDL in this form, not software. It is not being used in a Verilog testbench, it is being implemented in an FPGA, which means it is converted to (programmable) gates, which are then laid out and routed in the FPGA, and MUST meet all timing to be robust, just like the real ASIC. Also, the P1 (that's the ASIC, right?) had to have been designed in HDL.

    Cheers!
  • jmg Posts: 15,182
    edited 2015-03-10 17:36
    napablue wrote: »
    .... I DID notice the emulated PLL. Not a good choice, but I guess it's to emulate the programmable PLL in the original chip? It'd be MUCH better to use an on-chip DLL.

    IIRC, that was done for simplicity, but as most FPGAs have more than one PLL, it would be nice to make use of them, and in many use cases the Counter PLL is not needed.
    The ideal would be to have a proper PLL for SysCLK generation, and access to a proper PLL for Counter-NCO work, but of course that second item will be highly target dependent.