New FPGA files for next silicon version - 5th/final release - contains new ROM!! - Page 3 — Parallax Forums

Comments

  • cgraceycgracey Posts: 14,133
    Ok. Great!

    How about the jump-on-event bug?
  • cgracey wrote: »
    Ok. Great!

    How about the jump-on-event bug?
    About to start testing that now….
  • evanhevanh Posts: 15,126
    cgracey wrote: »
    If you can, please verify that the LUT-sharing bug is fixed, as well as the JMP-event-within-REP bug.

    Confirming all good with the lut-sharing. Tested all eight cogs, four patterns each.
  • Chip
    First test of "jump on event bug" looks good on V33i on the P123-A9 FPGA.
    I went back to Rev.A silicon to verify and the bug appeared, so it looks OK so far.

    I will run some more tests soon.
  • evanhevanh Posts: 15,126
    I had one version of my code that was totally 0% branching on an edge event, with only the sporadic fall-through breaking the REP. I've double-checked that it still does that on v32i.

    With the exact same code, just recompiled with Pnut33g and run on the v33i FPGA, it is flawless, 100% working. Not a single fall-through either. Perfect fix for me. :)
  • cgraceycgracey Posts: 14,133
    edited 2019-02-01 11:59
    Ozpropdev and Evanh, thank you for checking these things. I'm really glad these problems got addressed, thanks mainly to your persistence. To have LUT sharing working right, along with event jumps, is a big improvement.
  • V33i now reports 8 COGs, 512K, 63 Smart Pins. Thanks Chip!
  • cgraceycgracey Posts: 14,133
    Publison wrote: »
    V33i now reports 8 COGs, 512K, 63 Smart Pins. Thanks Chip!

    You bet. Thanks for trying it out.

    If there are no more logic changes, we'll have one more update with the final ROM installed.
  • TonyB_TonyB_ Posts: 2,108
    edited 2019-02-01 20:52
    cgracey wrote: »
    evanh wrote: »
    evanh wrote: »
    There was an idea or two I had but they weren't of much significance, or too big.
    One idea that would be nice to have is changing XORO32 and SCA results to feeding next D input instead of next S input.

    Yes. I looked into this. It's doable, but I wasn't convinced of its benefit. Could you please refresh me on this? A link would do. Thanks, Evanh.
    cgracey wrote: »
    evanh wrote: »

    Thanks, Evanh. I looked all that over. I also looked at the Verilog code. I don't feel like this would be worth doing, at this point. Thanks for bringing it up, again, though.

    Although I didn't think so at first, I believe Evan's idea is a very good one indeed. The rest of us can't see the Verilog but if all other things are equal in terms of the logic then I'm convinced that D is much better than S.

    D is ideal for direct arithmetic and writing to LUT or hub RAM or pins. It also gives us in effect three operands in one instruction, with benefits that are yet to be fully appreciated.

    If S is chosen there will be extra instructions that could have been avoided and other users in the future will be asking themselves "why wasn't D chosen instead?" but by then it will be too late to change.

    * * * * * * * * * *

    The Spin2 interpreter has SCA and SCAS but they use the Cordic. Are there any real-world PASM examples for the SCA instruction? Would it make any meaningful difference if the high word is not zero? Specifically, if the 16-bit right shift were replaced by a 16-bit rotate, then a 32x32 multiply with 64-bit result would be faster. Please see http://forums.parallax.com/discussion/169698/substantially-faster-shorter-multiply#latest

    The great thing about SCA is that it doesn't change the operands being multiplied, so a MUL plus a shift/rotation is not equivalent.
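To make the shift-versus-rotate question concrete, here is a rough Python model (not PASM). It assumes SCA forms the unsigned 16x16 product of D[15:0] and S[15:0] and passes that product, shifted right by 16, to the next instruction, which matches the 16-bit right shift mentioned above. `sca_rotated` is a hypothetical variant illustrating the proposed rotate, and `mul32x32` is only the textbook four-partial-product method, not the routine from the linked thread.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def mul16(d, s):
    """Unsigned 16x16 -> 32-bit product of the low words (model of MUL)."""
    return (d & MASK16) * (s & MASK16)

def sca(d, s):
    """Assumed current SCA result: product shifted right by 16 (high word is zero)."""
    return mul16(d, s) >> 16

def sca_rotated(d, s):
    """Hypothetical variant: product rotated right by 16 within 32 bits."""
    p = mul16(d, s)
    return ((p >> 16) | (p << 16)) & MASK32

def mul32x32(a, b):
    """Textbook 32x32 -> 64-bit multiply built from four 16x16 partial products."""
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16
    lo = mul16(a_lo, b_lo)
    mid = mul16(a_lo, b_hi) + mul16(a_hi, b_lo)
    hi = mul16(a_hi, b_hi)
    return (hi << 32) + (mid << 16) + lo

d, s = 0x1234, 0xABCD
print(hex(sca(d, s)), hex(sca_rotated(d, s)))
print(hex(mul32x32(0xDEADBEEF, 0x12345678)))
```

In this model the rotate keeps the product's low 16 bits (moved into the upper word) instead of discarding them, so the high word of the result is no longer zero.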
  • cgraceycgracey Posts: 14,133
    TonyB_ wrote: »
    Although I didn't think so at first, I believe Evan's idea is a very good one indeed. The rest of us can't see the Verilog but if all other things are equal in terms of the logic then I'm convinced that D is much better than S.

    D is ideal for direct arithmetic and writing to LUT or hub RAM or pins. It also gives us in effect three operands in one instruction, with benefits that are yet to be fully appreciated.

    If S is chosen there will be extra instructions that could have been avoided and other users in the future will be asking themselves "why wasn't D chosen instead?" but by then it will be too late to change.

    * * * * * * * * * *

    The Spin2 interpreter has SCA and SCAS but they use the Cordic. Are there any real-world PASM examples for the SCA instruction? Would it make any meaningful difference if the high word is not zero? Specifically, if the 16-bit right shift were replaced by a 16-bit rotate, then a 32x32 multiply with 64-bit result would be faster. Please see http://forums.parallax.com/discussion/169698/substantially-faster-shorter-multiply#latest

    The great thing about SCA is that it doesn't change the operands being multiplied, so a MUL plus a shift/rotation is not equivalent.

    The thing is, ON is now compiling what I last gave them and after looking at the areas in the Verilog that I would need to change, I'd need the room to take a deep breath and approach this. I don't have that right now and I don't want to restart the process of ON getting the layout going. If the opportunity arises, maybe we can change this, but I don't want to do it right now.
  • cgracey wrote: »
    The thing is, ON is now compiling what I last gave them and after looking at the areas in the Verilog that I would need to change, I'd need the room to take a deep breath and approach this. I don't have that right now and I don't want to restart the process of ON getting the layout going. If the opportunity arises, maybe we can change this, but I don't want to do it right now.

    Chip, I appreciate that the last train has probably left the station and unless something important needs fixing the design is done. I mentioned the SCA rotation only as a consequence of a post today about faster multiplies. The interesting point is this change to SCA saves code if the result goes to the next instruction's D, but I can find no saving if it goes to S. This confirms to me that D is intrinsically superior even though S seems the more obvious choice. These are my final thoughts on this subject and thanks for considering the various ideas that have been suggested by everyone.
  • cgraceycgracey Posts: 14,133
    TonyB_ wrote: »
    Chip, I appreciate that the last train has probably left the station and unless something important needs fixing the design is done. I mentioned the SCA rotation only as a consequence of a post today about faster multiplies. The interesting point is this change to SCA saves code if the result goes to the next instruction's D, but I can find no saving if it goes to S. This confirms to me that D is intrinsically superior even though S seems the more obvious choice. These are my final thoughts on this subject and thanks for considering the various ideas that have been suggested by everyone.

    TonyB_, it probably IS better. If we have the opportunity, we can do this. Let's see what happens with how things are going.
  • evanhevanh Posts: 15,126
    edited 2019-02-02 05:59
    Chip,
    What was the rationale behind CALLA/CALLB stacking upwards instead of downwards in hubRAM?

    PS: I note WRLONG can stack up or down, so PUSHA, for example, could be aliased either way around. But not so for CALLx and RETx.
  • cgraceycgracey Posts: 14,133
    evanh wrote: »
    Chip,
    What was the rationale behind CALLA/CALLB stacking upwards instead of downwards in hubRAM?

    PS: I note WRLONG can stack up or down, so PUSHA, for example, could be aliased either way around. But not so for CALLx and RETx.

    CALL = PUSH = add on top = grow upwards
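Put another way, a push writes at the pointer and then advances it, and a pop steps back and then reads. A minimal Python model of that behaviour, assuming hub RAM is byte-addressed with 4-byte longs; the `UpStack` class and its names are illustrative only and do not come from the P2 tools:

```python
class UpStack:
    """Upward-growing stack: write at the pointer, then post-increment."""

    def __init__(self, base):
        self.mem = {}        # sparse model of hub RAM: address -> long
        self.ptr = base      # stack pointer (PTRA or PTRB in the discussion)

    def push(self, value):
        self.mem[self.ptr] = value & 0xFFFFFFFF
        self.ptr += 4        # grows towards higher addresses

    def pop(self):
        self.ptr -= 4        # step back to the most recent item
        return self.mem[self.ptr]

stack = UpStack(base=0x1000)
stack.push(0x111)            # e.g. a return address written by a call
stack.push(0x222)
print(hex(stack.ptr))        # pointer now sits two longs above the base
print(hex(stack.pop()), hex(stack.pop()))
```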
  • evanhevanh Posts: 15,126
    Why up instead of down? Stacks aren't usually placed at start of memory space.
  • evanhevanh Posts: 15,126
    Hmm, maybe the hubRAM stack should be de-facto right at the beginning. That way the system parameters being discussed in the other topic can be the first items on it - https://forums.parallax.com/discussion/169714/p2-mailbox-and-parameters-where-to-place-and-what-is-needed/p1
    and https://forums.parallax.com/discussion/169697/rom-changes-for-next-silicon/p1
  • It is because SPIN did this on the P1, having the stack behind the program and pushing upwards.

    I guess

    Mike
  • evanhevanh Posts: 15,126
    Ah, I see. I never did get round to doing anything with the prop1.
  • You are missing out on something there. The P1 is a funny little beast and not that much different from the P2, at least from the point of view of programming a multi-core chip.

    I am just stepping up and have a lot of fun with the P2.

    Enjoy!

    Mike
  • Mark_TMark_T Posts: 1,981
    edited 2019-02-02 12:25
    cgracey wrote: »
    CALL = PUSH = add on top = grow upwards

    Note that on most modern architectures, stacks grow down and heaps grow up. I don't think I've seen a
    compiler that does it otherwise. If you only have positive-offset addressing, it pretty much forces
    the choice, as local variables are then addressed from the SP rather than from a separately maintained FP.
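The positive-offset point can be illustrated with a small Python model. It assumes a function reserves its frame by moving SP by the frame size, and that each local is a 4-byte long; the helper names are made up for the sketch.

```python
def alloc_frame_down(sp, frame_bytes):
    """Downward stack: reserve a frame by subtracting; SP points at its lowest byte."""
    return sp - frame_bytes

def alloc_frame_up(sp, frame_bytes):
    """Upward stack: reserve a frame by adding; SP points just past its highest byte."""
    return sp + frame_bytes

def local_offsets(sp, frame_base, n_locals):
    """Offset of each 4-byte local relative to SP."""
    return [frame_base + 4 * i - sp for i in range(n_locals)]

# Downward: the frame occupies [sp_new, sp_old), so locals sit at SP+0, SP+4, SP+8.
sp_old = 0x8000
sp_new = alloc_frame_down(sp_old, 12)
print(local_offsets(sp_new, sp_new, 3))

# Upward: the frame occupies [sp_old, sp_new), so locals sit at SP-12, SP-8, SP-4.
sp_old = 0x1000
sp_new = alloc_frame_up(sp_old, 12)
print(local_offsets(sp_new, sp_old, 3))
```

With only positive offsets available, the upward-growing case cannot reach its locals from SP directly and would need a separate frame pointer.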
  • RaymanRayman Posts: 13,805
    I think I remember that the 80x86 has a push instruction that adds to the stack pointer register...
  • Rayman wrote: »
    I think I remember that the 80x86 has a push instruction that adds to the stack pointer register...

    Z80, 8080, 8086, 68000 all pre-decrement SP on push. I think one notable exception in the microprocessor world is the 8051.
  • RaymanRayman Posts: 13,805
    You're right... Guess I remembered wrong, been a while..
  • RaymanRayman Posts: 13,805
    I just tried out V33 with my 90's 3D code...
    Didn't work at first, but after some WinDiff on the VGA example, I found I needed to change these lines to match and now it works:
    m_bs        long    $7F010000+16        'before sync
    m_sn        long    $7F010000+96        'sync
    m_bv        long    $7F010000+48        'before visible
    m_vi        long    $7F010000+640       'visible
    
    m_rf        long    $7F080000+640       'visible rlong 8bpp lut
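For anyone comparing against their own code: the four counts added to the `$7F010000` base are the usual 640x480 horizontal timing and sum to 800 pixel clocks per line. A quick Python check, assuming (this is not confirmed in the thread) that the low 16 bits of each long carry the count and the upper 16 bits the streamer mode:

```python
BASE = 0x7F010000           # mode bits as posted above (meaning not decoded here)

segments = {
    "m_bs": 16,             # before sync
    "m_sn": 96,             # sync
    "m_bv": 48,             # before visible
    "m_vi": 640,            # visible
}

words = {name: BASE + count for name, count in segments.items()}

for name, word in words.items():
    mode, count = word >> 16, word & 0xFFFF   # assumed 16/16 split
    print(f"{name}: ${word:08X}  mode=${mode:04X}  count={count}")

total = sum(segments.values())
print("clocks per line:", total)
```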
    
  • cgraceycgracey Posts: 14,133
    I posted the final logic version at the top of this thread, which includes, lastly, the extra register on each IN signal from the pins. This is to guard against metastability in the final silicon. Note that this adds one clock period to the IN signals.

    Peter and Ray, we need to use this version to finish developing the ROM code.
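The one-clock delay on IN mentioned above can be pictured with a toy Python model of registers in series. The stage counts used here are assumptions for illustration only; the thread does not say how many registers the IN path had before this change.

```python
def run_chain(samples, stages):
    """Clock `samples` through `stages` flip-flops in series; return the output per clock."""
    regs = [0] * stages
    out = []
    for bit in samples:
        regs = [bit] + regs[:-1]   # each register captures its predecessor on the clock edge
        out.append(regs[-1])
    return out

pin = [0, 0, 1, 1, 1, 0, 0, 0]
before = run_chain(pin, stages=2)   # stage count is an assumption
after = run_chain(pin, stages=3)    # same chain with one extra register
print(before)
print(after)
```

The rising edge shows up one clock later in `after` than in `before`, which is the extra latency any timing-sensitive pin code would have to absorb.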
  • evanhevanh Posts: 15,126
    Chip,
    The FPGA image files are all numbered v32j, not v33j. Is that just a typo or worse?
  • cgraceycgracey Posts: 14,133
    evanh wrote: »
    Chip,
    The FPGA image files are all numbered v32j, not v33j. Is that just a typo or worse?

    Whoops! That should be v33j. I will fix that this morning. Thanks for noticing.
  • Is the v32j a valid v33j image? Is it just mislabeled? It loads fine.
  • cgraceycgracey Posts: 14,133
    edited 2019-02-13 19:48
    Publison wrote: »
    Is the v32j a valid v33j image? Is it just mislabeled? It loads fine.

    It was just mislabeled. I fixed it. Sorry about that.
  • cgraceycgracey Posts: 14,133
    I just updated the main file at the top of the thread.

    There is a new PNut_v33h.exe which allows a full 1MB download on the -A9 boards to permit ROM updating. This is necessary for PeterJakacki and Cluso99. Others don't need to care about this.