P2 Backward Compatibility with P1 — Parallax Forums


I am not a Verilog guy, but I am a motivated fast learner.

What I really want is a P1 that runs at 2 clocks per instruction at 120 MHz, i.e. 60 MIPS cores (a 3X speedup).
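
A quick sketch of the arithmetic behind that 3X figure, assuming the stock P1 is taken as 80 MHz at 4 clocks per instruction (the usual case for non-hub instructions). The function name `mips` is made up for illustration, and hub stalls are ignored:

```python
def mips(clock_mhz: float, clocks_per_instruction: int) -> float:
    """Peak instruction rate of one cog in MIPS, ignoring hub stalls."""
    return clock_mhz / clocks_per_instruction


# Assumed baseline: stock P1 at 80 MHz, 4 clocks per instruction.
p1_stock = mips(80, 4)

# Target described in the post: 120 MHz, 2 clocks per instruction.
target = mips(120, 2)

speedup = target / p1_stock
print(f"stock: {p1_stock} MIPS, target: {target} MIPS, speedup: {speedup}x")
```

The same function can be reused to see how much of the gain survives if going to 2 clocks per instruction lowers the achievable clock, e.g. `mips(100, 2)`.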

Ideally this would be an actual Parallax ASIC, but until then I am looking at the MAX10 as the target (I saw P1V metrics at 133 MHz).

Am I better off 'upgrading' the P1V code to be 2 clocks per instruction (like the P2)?

OR

Am I better off 'downgrading' the P2 code, stripping it back to a P1, to benefit from the 2 clocks per instruction already in the P2?

The P2 FPGA release notes state:
>* The Verilog source code is now capable of making any sub-version of Prop2

Does this mean I can already configure the P2 Verilog to conditionally compile out the 'non P1' logic to strip back to a P1 backward compatible core ( or close enough to it )?

Thanks.

Comments

  • jmg Posts: 15,173
    VBB wrote: »
    What I really want is a P1 that runs at 2 clocks per instruction at 120 MHz, i.e. 60 MIPS cores (a 3X speedup).
    ...
    Am I better off 'upgrading' the P1V code to be 2 clocks per instruction (like the P2)?
    ...

    If what matters is Compatibility with P1, then P1V is the only choice.
    P2 is not binary compatible with P1.


    You could experiment with 2 clocks per instruction, but you may find the achievable clock frequency drops at the same time
    (i.e. you do not quite double the P1V's throughput).

    Another part to consider would be the iCE40 UltraPlus, with 128 kB of SRAM,
    though that will not fit 8 COGs.
    How many of these now-faster COGs do you need?
    With fewer COGs, the hub timing improves too.
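
A rough sketch of why fewer COGs help hub timing. It assumes the hub hands out access round-robin at 2 clocks per COG slot (which matches the stock P1's 16-clock rotation across 8 COGs) and that a cut-down P1V build shortens the rotation to match its COG count. That second point is an assumption about the modified Verilog, and `hub_rotation_clocks` is an illustrative name only:

```python
def hub_rotation_clocks(n_cogs: int, clocks_per_slot: int = 2) -> int:
    """Clocks between successive hub slots for any one cog (round-robin hub).

    Assumption: the rotation length scales with the number of cogs built in.
    """
    if n_cogs < 1:
        raise ValueError("need at least one cog")
    return n_cogs * clocks_per_slot


for n in (8, 4, 2, 1):
    print(f"{n} cogs: a hub slot every {hub_rotation_clocks(n)} clocks")
```

Under these assumptions, halving the COG count halves the longest time a COG can spend waiting for its hub slot.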
  • >If what matters is Compatibility with P1, then P1V is the only choice.

    Binary compatibility is not critical; I can retarget to a different binary output.

    >How many of these now-faster COGs do you need ?

    Yes, I was considering a partitioning approach. The application is a soft-peripheral co-processor, but by definition the future number and specifications of the COGs are unknown.

    For example, I want to be able to capture and buffer SPI. Even at the 'Arduino' default speed of 4 MHz this is a challenge, requiring 2 interleaved COGs. Yet I want to be able to offer a wider range of SPI speeds than just 4 MHz.
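
To put a rough number on the SPI problem, here is a minimal sketch of the per-bit instruction budget. It assumes one data bit per SPI clock and a single COG running at its peak rate (20 MIPS for a stock 80 MHz P1 is an assumption, 60 MIPS is the target discussed in this thread); `instructions_per_bit` is a hypothetical helper:

```python
def instructions_per_bit(core_mips: float, spi_mhz: float) -> float:
    """Instructions one cog can execute per SPI bit, ignoring hub stalls."""
    return core_mips / spi_mhz


stock_budget = instructions_per_bit(20, 4)    # assumed stock P1 cog, 4 MHz SPI
faster_budget = instructions_per_bit(60, 4)   # hoped-for 60 MIPS cog, 4 MHz SPI

print(f"stock: {stock_budget} instructions/bit")
print(f"60 MIPS: {faster_budget} instructions/bit")
```

A handful of instructions per bit leaves very little room to sample, shift and store, which is consistent with needing two interleaved COGs today.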

    I can live with less 'custom' hardware and more core speed to simplify soft-peripheral development. I can even throw away some instructions and code around them.

    If I were an expert Verilog guy I would add some DMA FIFOs; maybe that is what the P2 streamers are.

    But once you start unpacking Verilog, you might as well go 'all the way' to full custom.
  • VBB wrote: »
    Am I better to 'downgrade' the P2 Code to strip it back to a P1 to benefit from the 2 clocks per instruction already in the P2
    I don't think this is really an option. The P2 Verilog code has not yet been released.

  • OK, good to know. Not having looked at all these big P2 downloads, I didn't understand that they are released in some sort of compiled (synthesised?) format.

    I was leaning toward upgrading/enhancing P1V anyhow

    I think the suggested partitioning approach might be a place to start as it will give an immediate reduction in hub access times.

    I saw that ozprop created a configurable version of P1V, so that will be the place to start.

    Thanks for the feedback.
  • jmg Posts: 15,173
    edited 2017-02-06 00:42
    VBB wrote: »
    I saw that ozprop created a configurable version of P1V, so that will be the place to start.

    Sounds a good idea.

    There is a HyperMAX development board (HyperRAM + MAX10), but no sign of stock or prices yet.
    Lattice also have an iCE40 UltraPlus that comes in a nice QFN48, with eval board ICE40UP5K-B-EVN.
    This part has 128 kB of SRAM on board, but a more limited LE count.

    There have been reports of very compact cores for x86 and i51, in the region of 300+ LEs(!), using a microcode approach. This uses more clock cycles, and a local microcode ROM, to emulate all opcodes on a simpler, smaller core.

    A P1V that allowed a choice between Fast or Small COGS could allow smaller FPGAs to be used.

  • I obviously have a lot to learn about Verilog, but I can see it offers a lot of power. I will probably stick to the ASIC P1 for the first version, but I want to at least evaluate any low-hanging fruit in the P1V version.

    For example, I have a MAX10M08 dev kit which offers some accessible TQFP parts. I like the MAX 10 because of the onboard Flash so no extra EEPROM.

    One idea might be to run two P1Vs with 4 COGs each, so that each COG gets hub access twice as often.

    I can afford to drop the video hardware and maybe even conditionally compile out some instructions.

    If some instructions are particularly heavy, maybe I can microcode them. The Verilog remains mysterious on first read, but maybe the shifters are heavy, in which case I could limit the range of a single shift and have the code do repeated shifting instead.
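
The repeated-shifting idea can be sketched in software like this. It is only a behavioural model, not Verilog: `shl_iterative` and `max_step` are made-up names, and it assumes a 32-bit value with the shift count taken from the low 5 bits, as on the P1:

```python
MASK32 = 0xFFFFFFFF


def shl_iterative(value: int, count: int, max_step: int = 1):
    """Shift a 32-bit value left using only shifts of at most max_step bits.

    Returns (result, steps), where steps is how many small shifts were needed.
    Assumption: only the low 5 bits of count are used.
    """
    value &= MASK32
    count &= 31
    steps = 0
    while count > 0:
        step = min(count, max_step)
        value = (value << step) & MASK32
        count -= step
        steps += 1
    return value, steps


print(shl_iterative(1, 5))              # 1-bit shifter, five passes
print(shl_iterative(1, 5, max_step=4))  # 4-bit shifter, two passes
```

The trade-off is the one described for microcoding in general: a narrower shifter costs fewer logic elements but takes more passes (clocks) for large shift counts.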

    One idea that would require actual new Verilog (rather than conditionally compiling out working code) might be to have a one-level cache register under the wrxx instructions, which (somehow!?) holds the address/data and writes it to the hub once the hub slot aligns, so the COG does not have to wait.
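
A toy timing model of that one-entry write buffer, to get a feel for what it might save. Everything here is an assumption for illustration: a hub slot every 16 clocks for this COG, a buffer that holds exactly one pending write, and writes issued at the given nominal clock numbers. `next_slot` and `stall_clocks` are hypothetical names:

```python
def next_slot(t: int, period: int = 16, phase: int = 0) -> int:
    """First clock >= t at which this cog's hub slot comes round."""
    return t + (phase - t) % period


def stall_clocks(write_times, buffered: bool, period: int = 16) -> int:
    """Total clocks the cog spends waiting on hub writes.

    write_times are the clocks at which writes would issue with no stalls;
    any stall delays the writes that follow it.
    """
    stall = 0
    busy_until = -1  # clock at which the buffered write is committed to hub
    for t in write_times:
        now = t + stall
        if buffered:
            # Wait only if the single buffer entry is still occupied.
            enter = max(now, busy_until)
            # A slot already used by the previous write cannot be reused.
            earliest = enter + 1 if enter == busy_until else enter
            busy_until = next_slot(earliest, period)
            wait = enter - now
        else:
            # Without a buffer the cog waits for its hub slot every time.
            wait = next_slot(now, period) - now
        stall += wait
    return stall


spaced = [3, 40]  # two writes far apart
print(stall_clocks(spaced, buffered=False), stall_clocks(spaced, buffered=True))
```

In this model the buffer hides the wait completely when writes are spaced further apart than the hub rotation, and only helps partially for back-to-back writes.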

    So these might be some wins I can explore.. then again I just ordered a PIC32MZ..

  • jmg Posts: 15,173
    edited 2017-02-06 21:24
    VBB wrote: »
    One idea might be to run two P1Vs with 4 COGs each, so that each COG gets hub access twice as often.
    That needs a larger MAX10, but if hub delays are critical, you could reduce the number of COGs and allocate more slots to one COG.
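
Allocating more slots to one COG can be sketched as a slot table. This is a hypothetical feature of a modified P1V (the stock P1 hub is fixed round-robin); the table layout, the 2 clocks per slot, and the name `worst_case_wait` are all assumptions:

```python
def worst_case_wait(slot_table, cog: int, clocks_per_slot: int = 2) -> int:
    """Longest gap, in clocks, between consecutive hub slots owned by cog."""
    positions = [i for i, owner in enumerate(slot_table) if owner == cog]
    if not positions:
        raise ValueError("cog has no hub slot in this table")
    n = len(slot_table)
    gaps = []
    for k, p in enumerate(positions):
        q = positions[(k + 1) % len(positions)]  # next slot, wrapping round
        gaps.append(((q - p - 1) % n) + 1)       # a lone slot gives a gap of n
    return max(gaps) * clocks_per_slot


even = [0, 1, 2, 3, 4, 5, 6, 7]      # stock-style rotation, 8 cogs
favoured = [0, 1, 0, 2, 0, 3, 0, 4]  # cog 0 gets every other slot, 5 cogs total

print(worst_case_wait(even, 0))      # cog 0 in the even table
print(worst_case_wait(favoured, 0))  # cog 0 when favoured
print(worst_case_wait(favoured, 1))  # an ordinary cog in the favoured table
```

Here the favoured COG sees its hub slot four times as often, while the remaining COGs keep the original rotation at the cost of fewer COGs overall.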

    VBB wrote: »
    If some instructions are particularly heavy, maybe I can microcode them. The Verilog remains mysterious on first read, but maybe the shifters are heavy, in which case I could limit the range of a single shift and have the code do repeated shifting instead.

    Yes, that is what microcoding allows.
    Most of the candidate FPGAs also have hardware multipliers, which could be used to lower the LE count.
    VBB wrote: »
    So these might be some wins I can explore.. then again I just ordered a PIC32MZ..
    Pairing a P1V with a larger MCU can also make sense, even up to a Raspberry Pi.
    The P1V does the focused I/O work, and the larger MCU/MPU does the slower processing.


  • >The P1V does the focused IO stuff, and the larger MCU/MPU does the slower processing.

    This is exactly how I am using it but with a much lower cost PIC18. It's a revisit and expansion on the earlier ICEShield project which I outlined in this post along with some of the performance issues I have been dealing with.

    However, the dynamic reconfigurability of the P1 is going to win out, and I am all tooled up now with prototype hardware and custom emulators, compilers, debuggers, etc. to push out the virtual peripherals, so the P1 ASIC will end up in the first product iteration.

    The stripped down P1V+MAX10 or PIC32MZ will be the second iteration to expand capabilities but it's useful to scope it out.

  • jmg Posts: 15,173
    VBB wrote: »
    However, the dynamic reconfigurability of the P1 is going to win out, and I am all tooled up now with prototype hardware and custom emulators, compilers, debuggers, etc. to push out the virtual peripherals, so the P1 ASIC will end up in the first product iteration.

    What do you mean by 'P1 ASIC', exactly ?

    There are also now prices out on the very new ICE40UP5K-SG48 ($6~7), which has a somewhat modest 5280 LUTs but a nice large 1024 kbit (128 kB) of SRAM, and comes in a usable, compact QFN48 with 39 I/O.
    I think it is not as fast as the MAX10.

  • > 'P1 ASIC', exactly ?

    I mean the one you can 'buy off the shelf' from Parallax :-)

    >ICE40UP5K-SG48

    The trend continues - it's quite amazing really. I can't wait to put aside the time to really get under the hood with Verilog. Right now just trying to push the product to market.
  • jmg Posts: 15,173
    VBB wrote: »
    > 'P1 ASIC', exactly ?

    I mean the one you can 'buy off the shelf' from parallax :-)
    Ah ok, was not sure.
    Yes, it makes sense to start with a P1, and find what cannot quite be done using that.
    The P1 still represents the lowest price per COG of any solution.
    VBB wrote: »
    >ICE40UP5K-SG48

    The trend continues - it's quite amazing really. I can't wait to put aside the time to really get under the hood with Verilog. Right now just trying to push the product to market.

    The nice thing about the ICE40UP5K-SG48 QFN48 package is that you can probably allow for it in the PCB design of the first product iteration, without planning to use it immediately.
    Using something like a CP2105 allows one UART for the P1 and one UART for the P1V.
