Catalina and the P2 — Parallax Forums

I was wondering: what happened to the effort to produce P2 code with Catalina?

Comments

  • Cluso99 Posts: 18,069
    edited 2018-04-20 08:22
    Ross has been MIA for a number of years.
    I doubt there will be any P2 variant by Ross.
  • RossH Posts: 5,462
    Actually, Ross is back! ... well, maybe :)

    David Betz has recently informed me that the P2 is finally available, so if I can get my hands on one, there will be a P2 version of Catalina. However, I can't figure out how to order the evaluation board. The P2 discussion forums point you to the shop link for the "Propeller 2 ES Evaluation Board" but that page just says it is "unavailable". Does anyone know if it is actually available? Perhaps it is already sold out?

    Thanks in advance!

    Ross.
  • Welcome back, Ross. I'm sure we can get you something quickly.

  • jmg Posts: 15,173
    RossH wrote: »
    Actually, Ross is back! ... well, maybe :)

    David Betz has recently informed me that the P2 is finally available, so if I can get my hands on one, there will be a P2 version of Catalina. However, I can't figure out how to order the evaluation board. The P2 discussion forums point you to the shop link for the "Propeller 2 ES Evaluation Board" but that page just says it is "unavailable". Does anyone know if it is actually available? Perhaps it is already sold out?

    You need to talk with Chip, but there are about 100 P2 Eval boards in the wild, and some P2D2s as well.
    I believe some damaged P2 Eval boards have been swapped for refurbished ones, so you might score a repaired returned unit?
    In other words, most of the packaged P2 dies ended up on boards.

    There is a pause right now while P2+ is routed and taped out; surely a progress report from OnSemi is due any moment now?
    The target for P2+ was/is to hit 250MHz and the lowest HDMI speeds. Current P2s top out on the bench somewhere over 300MHz when overclocked on boards with good copper planes.

  • Welcome back, Ross! I should probably just send you my P2-Eval board since you're likely to make better use of it than I will. Actually, Peter Jakacki might have another P2D2 board available.
  • RossH Posts: 5,462
    Ok - thanks all. Clearly I need to do some more reading. I'm not sure what state the P2 evaluation board is actually in, or what a P2D2 is. Were there problems with the original eval board? Or could it just not be run at the full clock speed? (which wouldn't be a problem for me, at least).

    Ross.
  • cgracey Posts: 14,155
    RossH, welcome back!

    Everyone's been wondering for years if you were ever returning.

    If you can private-message me, I will see what we can do to get you a P2 Eval board.
  • evanh Posts: 15,916
    edited 2019-03-08 03:20
    RossH wrote: »
    Ok - thanks all. Clearly I need to do some more reading. I'm not sure what state the P2 evaluation board is actually in, or what a P2D2 is. Were there problems with the original eval board? Or could it just not be run at the full clock speed? (which wouldn't be a problem for me, at least).
    To answer: the existing Prop2 chips are "engineering samples", so they are not for general evaluation of the finished chip. There are some identified issues, but speed certainly wasn't one of them. It can reach over 300 MHz, roughly double the original estimate of 160 MHz, although it does run hot then.

    I think everyone's hoping next run will be the finished product.
  • jmg Posts: 15,173
    edited 2019-03-08 03:31
    evanh wrote: »
    To answer, the existing Prop2 chips are "engineering samples", so not for general evaluation of the finished chip.
    ? Engineering samples are certainly intended for evaluation, and while they may not operate exactly like the finished chip, they are certainly working now and are fine for much of the software and hardware development.
    The errata list is quite small, and the code changes are also small. Some new hardware is coming in P2+, but that will not affect a compiler.
    The biggest gotcha in the P2-ES, I think, is the Verilog sign-extension oops, which behaved differently in the FPGA and ASIC flows.

    IIRC, I saw that the small opcode extensions in P2+ can be made backward binary compatible.
    evanh wrote: »
    I think everyone's hoping next run will be the finished product.
    Of course :)

    Chip, any word on how OnSemi is getting on with the P&R of P2+, the clock gating, etc.?

  • RossH Posts: 5,462
    Thanks again, everyone. Fortuitously, a health issue has given me a few weeks where I can't be doing my normal work around our eco-retreat, so I have a window of opportunity to get up to speed on the P2.

    I wonder how David Betz knew???
  • Cluso99 Posts: 18,069
    Hi Ross,
    There are two pcbs...
    The P2D2 board, which Peter Jakacki (Brisbane) made, contains one of the first 10 P2-ES chips, which were epoxied into the IC package by OnSemi.
    The P2_EVAL board was made by Parallax and 100 were built with the P2-ES chips correctly packaged. There were a few more of these P2-ES chips sent to Peter. IIRC there were about 110 of these P2-ES chips.

    There are some minor problems with the P2-ES silicon, but IMHO we could have lived with them, although the sign problem was a real issue.

    There are threads for all of these topics.

    Now, the P2 is real and we're expecting the next run back somewhere around May if my memory serves me correctly.

    There are some really exciting instructions in the P2 :smiley:

    Here are the real P2 hardware basics...
    * 8 Cogs with 2KB Cog RAM and 2KB LUT RAM
    * 512KB of HUB RAM dual-mapped into 1MB. The top 16KB is preloaded from internal serial ROM and can be write protected.
    * Cog access to the hub is via the "egg-beater" (see below)
    * 64 I/O, all with "smart-pins" (see below) and ADC
    * inbuilt DACs
    * 100pin QFP
    * clock aimed at 160/180MHz but we've pushed this (with cooling) to over 340MHz :)
    * code can execute from cog, lut and hub (hubexec)
    * most instructions take 2 clocks

    egg-beater
    * the hub is divided into 8 x 64KB (16Kx32bits) blocks of ram
    * each block holds the next long address after the previous block, wrapping back to the first block after the eighth (so block = long address mod 8).
    * On every clock, the cog has access to one of the hub ram blocks!!!
    * on each clock the cog gains access to the next hub block
    * so, after 8 clocks the cog may have read in 8 longs!!!
    * there is a streamer to facilitate this fast access.
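    To make the egg-beater description concrete, here is a small Python model (an illustrative sketch, not Parallax code). It assumes 8 slices, that the slice holding a long is (byte address >> 2) mod 8, and, purely as a hypothetical, that the slice visible to a cog advances by one each clock with an offset equal to the cog number. It counts how many clocks a burst of consecutive longs takes, including the initial wait for the right slice to come round.

```python
# Illustrative model of the P2 hub "egg-beater" (not Parallax code).
# Assumptions: 8 cogs / 8 slices; slice = (byte_addr >> 2) mod 8;
# the rotation order in visible_slice() is hypothetical.

NUM_SLICES = 8


def slice_of(byte_addr: int) -> int:
    """Slice holding the long at this long-aligned byte address."""
    return (byte_addr >> 2) % NUM_SLICES


def visible_slice(cog: int, clock: int) -> int:
    """Hypothetical: the slice `cog` can reach on `clock` (advances by one per clock)."""
    return (clock + cog) % NUM_SLICES


def burst_read_clocks(start_addr: int, num_longs: int, cog: int = 0, clock: int = 0) -> int:
    """Clocks needed to read `num_longs` consecutive longs starting at `start_addr`.

    A long can only be transferred on a clock when its slice is the visible one,
    so after the first match every following long is available on the next clock.
    """
    elapsed = 0
    addr = start_addr
    remaining = num_longs
    while remaining > 0:
        if visible_slice(cog, clock + elapsed) == slice_of(addr):
            addr += 4          # this long transferred; move on to the next one
            remaining -= 1
        elapsed += 1
    return elapsed
```

    Under these assumptions, burst_read_clocks(0, 8) returns 8 (one long per clock), while burst_read_clocks(4, 8) returns 9, because the first clock is spent waiting for slice 1 to come round.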

    smart-pins
    * each pin has a smart-pin state machine to do all sorts of things

    And there is so much more.

    The only downside is that the P2 uses a lot of power. Hopefully the next P2 will use less when sections are not running.
    The aim is 200MHz.


  • RossH Posts: 5,462
    Thanks Cluso. I sure have a lot to catch up on!
  • jmg Posts: 15,173
    RossH wrote: »
    Thanks Cluso. I sure have a lot to catch up on!

    The docs are in the first post of this thread:
    http://forums.parallax.com/discussion/162298/prop2-fpga-files-updated-2-june-2018-final-version-32i/p1

    and this thread covers the improvements/fixes in P2+
    http://forums.parallax.com/discussion/169282/list-of-changes-in-next-p2-silicon/p1
  • evanh Posts: 15,916
    jmg wrote: »
    evanh wrote: »
    To answer, the existing Prop2 chips are "engineering samples", so not for general evaluation of the finished chip.
    ? Engineering samples are certainly intended for evaluation, and while they may not operate exactly like the finished chip, they are certainly working now and are fine for much of the software and hardware development.
    *Early*-development evaluation with lots of caveats, sure. By "general" and "finished" I mean a supported, production-ready product.
  • RossH Posts: 5,462
    Ok - first dumb question ...

    I found this in the P2 documentation ...

    "The globally-accessible hub RAM can be read and written as bytes, words, and longs. Hub addresses are always byte-oriented. There are no special alignment rules for words and longs in hub RAM. Cogs can read and write bytes, words, and longs at any hub address, as well as execute instruction longs from any hub address starting at $400."

    But then the description of the "egg beater" says this ...

    "Hub RAM is comprised of 32-bit-wide single-port RAMs with byte-level write controls. For each cog, there is one of these RAMs, but it is multiplexed among all cogs. Let’s call these separate RAMs “slices”. Each RAM slice holds every single/2nd/4th/8th/16th (depending on number of cogs) set of 4 bytes in the composite hub RAM. At every clock, each cog can access the “next” RAM slice, allowing for continuously-ascending bidirectional streaming of 32 bits per clock between the composite hub RAM and each cog."

    So, what happens if I read or write a long that is not aligned on a long boundary? Wouldn't this long contain bytes from different slices?

    (signed)

    Confused! :smile:
  • evanh Posts: 15,916
    RossH wrote: »
    So, what happens if I read or write a long that is not aligned on a long boundary? Wouldn't this long contain bytes from different slices?

    Yes, it adds one clock to the read/write time. The big feature of the egg-beater is that it provides fast, burst-like access to consecutive addresses.
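    Here is a sketch of that answer in Python, assuming slice = (byte address >> 2) mod 8 as in the egg-beater description: it reports which slices a (possibly unaligned) access touches and, following evanh's answer, charges one extra clock when the bytes straddle two slices. It is an illustrative model, not a cycle-accurate one, and the function names are hypothetical.

```python
# Illustrative model (not cycle-accurate): which hub RAM slices does an access touch?
# Assumes 8 slices with slice = (byte_addr >> 2) mod 8, as in the egg-beater description.

NUM_SLICES = 8


def slices_touched(byte_addr: int, size: int = 4) -> set:
    """Set of slices holding bytes byte_addr .. byte_addr + size - 1."""
    return {((byte_addr + i) >> 2) % NUM_SLICES for i in range(size)}


def extra_clocks(byte_addr: int, size: int = 4) -> int:
    """One extra clock if the access straddles two slices (per the forum answer), else 0."""
    return 1 if len(slices_touched(byte_addr, size)) > 1 else 0
```

    For example, a long at $1001 touches slices 0 and 1, so extra_clocks(0x1001) returns 1, while a word at $1002 stays within slice 0 and costs nothing extra.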
  • RossH Posts: 5,462
    evanh wrote: »
    RossH wrote: »
    So, what happens if I read or write a long that is not aligned on a long boundary? Wouldn't this long contain bytes from different slices?

    Yes, it adds one clock to the read/write time. The big feature of the egg-beater is that it provides fast, burst-like access to consecutive addresses.

    Ok, I guess that makes sense. But I can see that the timings of instructions on the P2 are going to be a headache! :(
  • cgracey Posts: 14,155
    RossH wrote: »
    evanh wrote: »
    RossH wrote: »
    So, what happens if I read or write a long that is not aligned on a long boundary? Wouldn't this long contain bytes from different slices?

    Yes, it adds one clock to the read/write time. The big feature of the egg-beater is that it provides fast, burst-like access to consecutive addresses.

    Ok, I guess that makes sense. But I can see that the timings of instructions on the P2 are going to be a headache! :(

    When executing from hub, yes, timing is hard to predict on branches, but straight-line code runs just as it does from cog/LUT RAM.
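    One way to see why branch timing varies is to model only the slice-alignment part of the delay. The sketch below, again assuming slice = (byte address >> 2) mod 8 and a visible slice that advances by one per clock (a hypothetical ordering), computes how many clocks a cog waits for the slice holding a branch target to come round. It ignores any fixed FIFO reload overhead, which is not described in this thread. Straight-line code avoids this wait because consecutive longs arrive on consecutive clocks.

```python
# Illustrative only: the slice-alignment wait after a hubexec branch.
# Assumptions: 8 slices, slice = (byte_addr >> 2) mod 8, and the slice visible
# to the cog advances by one each clock. FIFO reload overhead is not modelled.

NUM_SLICES = 8


def branch_wait_clocks(target_addr: int, current_slice: int) -> int:
    """Clocks until the slice holding target_addr becomes visible (0 to 7)."""
    target_slice = (target_addr >> 2) % NUM_SLICES
    return (target_slice - current_slice) % NUM_SLICES


# Every possible alignment shows up as the branch target varies:
waits = sorted(branch_wait_clocks(addr, current_slice=3) for addr in range(0x400, 0x420, 4))
print(waits)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

    In this model the wait depends only on where the target sits relative to the current slice, which is why a compiler cannot know it statically without knowing the exact clock on which the branch executes.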
  • RossH Posts: 5,462
    cgracey wrote: »
    RossH wrote: »
    When executing from hub, yes, timing is hard to predict on branches, but straight-line code runs just as it does from cog/LUT RAM.

    Yes, I think I get it. It probably won't matter for most compiler-generated code :smile:
  • Welcome back, @RossH! It's an interesting time for P2 tools development, and it'll be great to have your insights and experience!

    The P2 is a much nicer target for compilers than P1. No more worrying about LMM, and no hassles about getting large constants into registers.

    Cheers,
    Eric
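    On the P2, as I understand it, a 32-bit constant can be built from an AUGS prefix, which supplies the upper 23 bits, plus the normal 9-bit immediate field of the following instruction (AUGD does the same for the D field); PASM2's "##" immediate syntax asks the assembler to emit the prefix automatically. The Python sketch below shows only the bit arithmetic a code generator would do; the helper names are hypothetical, and instruction encoding is not shown.

```python
# Bit arithmetic for splitting a 32-bit constant into an AUGS payload (upper 23 bits)
# and a 9-bit immediate (lower 9 bits). Helper names are hypothetical.

IMM_BITS = 9
IMM_MASK = (1 << IMM_BITS) - 1   # 0x1FF


def needs_augment(value: int) -> bool:
    """True if the constant does not fit in a plain 9-bit unsigned immediate."""
    return not 0 <= value <= IMM_MASK


def split_for_augs(value: int) -> tuple:
    """Return (augs_payload, imm9) for an unsigned 32-bit constant."""
    if not 0 <= value <= 0xFFFF_FFFF:
        raise ValueError("constant must fit in 32 bits")
    return value >> IMM_BITS, value & IMM_MASK


def join_from_augs(augs_payload: int, imm9: int) -> int:
    """The value effectively reconstructed: payload in bits 31..9, immediate in bits 8..0."""
    return (augs_payload << IMM_BITS) | imm9
```

    For example, split_for_augs(0x12345678) gives a payload of 0x91A2B and an immediate of 0x078, and joining them gives back 0x12345678. On the P1 the same constant would have to live in a cog register of its own.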
  • RossH Posts: 5,462
    Thanks, Eric.

    Yes, I can already see several features that will make compilers easier ... but also quite a few that I can't figure out how a compiler could ever possibly use! :smile:
  • RossH wrote: »
    Thanks, Eric.

    Yes, I can already see several features that will make compilers easier ... but also quite a few that I can't figure out how a compiler could ever possibly use! :smile:
    That's often true. RISC instruction sets were designed to include only the instructions that were useful to compilers, but I guess we've moved way beyond that here. However, all of these fancy features will be great for PASM programmers.

  • RossH Posts: 5,462
    I have a couple of questions on LUT RAM ...

    The P2 documentation says "The lookup RAM must be read and written using RDLUT/WRLUT instructions."

    But the documentation also says you can access LUT using RDLONG and WRLONG, if you also use SETQ - e.g. "Use SETQ2+RDLONG to read multiple hub longs into cog lookup RAM". But I can't quite see how the RDLONG knows to use lookup RAM and not register RAM. Also, if it can do this, can you also just use individual RDLONG instructions to read into LUT RAM - i.e. without using SETQ?

    My next question is about LUT sharing. You can use SETLUTS to share (for example) the LUT RAM between cog 0 and cog 1. I just want to make sure I understand this. My reading is that before any LUT writes, the LUT RAM of cog 0 and cog 1 would both contain their respective original values for a specific LUT RAM location, which may be different. But if either one writes to a LUT RAM location, they will both thereafter read the same value from that location. What happens if both cogs write to the same LUT RAM location at the same time - which value ends up in LUT RAM?

  • evanh Posts: 15,916
    RossH wrote: »
    I have a couple of questions on LUT RAM ...

    The P2 documentation says "The lookup RAM must be read and written using RDLUT/WRLUT instructions."

    But the documentation also says you can access LUT using RDLONG and WRLONG, if you also use SETQ - e.g. "Use SETQ2+RDLONG to read multiple hub longs into cog lookup RAM". But I can't quite see how the RDLONG knows to use lookup RAM and not register RAM. Also, if it can do this, can you also just use individual RDLONG instructions to read into LUT RAM - i.e. without using SETQ?
    Short answer: no. SETQ2 + RD/WRLONG is a special case.

    Internally, SETQ2 must be setting a hidden flag that RD/WRLONG check for. When those two instructions see this flag waving at them, they become a different, "complex" instruction. Same story for SETQ, with another flag.
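    evanh's "hidden flag" explanation can be modelled in a few lines of Python. This toy model is not the real hardware logic; it assumes, based on my reading of the P2 documentation, that SETQ2 #n followed by RDLONG D,addr copies n+1 longs from hub into LUT RAM starting at LUT address D, while a plain RDLONG reads one long into a cog register. The class and method names are hypothetical, and SETQ (block copy into cog registers) and WRLONG are not modelled.

```python
# Toy model of SETQ2 arming a one-shot flag that changes what RDLONG does.
# Not the real P2 logic; names are hypothetical. Hub data is little-endian.

class CogModel:
    def __init__(self, hub: bytearray):
        self.hub = hub
        self.regs = [0] * 512   # cog register RAM (longs)
        self.lut = [0] * 512    # lookup RAM (longs)
        self.q = None           # pending SETQ2 count, or None if not armed

    def _hub_long(self, addr: int) -> int:
        return int.from_bytes(self.hub[addr:addr + 4], "little")

    def setq2(self, count_minus_one: int) -> None:
        self.q = count_minus_one          # arm the hidden flag

    def rdlong(self, dest: int, hub_addr: int) -> None:
        if self.q is not None:
            # SETQ2 + RDLONG: block copy of q+1 longs into LUT starting at `dest`
            for i in range(self.q + 1):
                self.lut[dest + i] = self._hub_long(hub_addr + 4 * i)
        else:
            # Plain RDLONG: one long into a cog register
            self.regs[dest] = self._hub_long(hub_addr)
        self.q = None                     # the flag only affects the next instruction

    def rdlut(self, dest: int, lut_addr: int) -> None:
        # RDLUT moves a LUT long into a cog register; it also cancels a pending flag
        self.regs[dest] = self.lut[lut_addr]
        self.q = None
```

    For example, after cog.setq2(3) the next cog.rdlong(0, 4) fills LUT longs 0 to 3 from hub bytes 4 to 19 and leaves cog register 0 untouched, whereas cog.rdlong(10, 0) on its own just loads register 10.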
  • evanh Posts: 15,916
    edited 2019-03-10 06:58
    RossH wrote: »
    My next question is about LUT sharing. You can use SETLUTS to share (for example) the LUT RAM between cog 0 and cog 1. I just want to make sure I understand this. My reading is that before any LUT writes, the LUT RAM of cog 0 and cog 1 would both contain their respective original values for a specific LUT RAM location, which may be different. But if either one writes to a LUT RAM location, they will both thereafter read the same value from that location. What happens if both cogs write to the same LUT RAM location at the same time - which value ends up in LUT RAM?

    LUTSON (SETLUTS #1) is a one-way control: it allows the other cog's WRLUT instructions to write to this cog's LUT RAM. But both cogs can issue it to get bidirectional sharing going. I don't know the answer ... and the behaviour won't be the same between the P2-ES and the final Prop2 hardware either, since a design flaw in this area got fixed.

    Two simultaneous writes to the same location are really not something that has been considered. The fix concerned a simultaneous read and write.
  • RossH Posts: 5,462
    Ah! Thanks. Yes, the "SETQ2" vs "SETQ" should have been a giveaway.
  • RossH Posts: 5,462
    evanh wrote: »
    LUTSON (SETLUTS #1) is a one-way control: it allows the other cog's WRLUT instructions to write to this cog's LUT RAM. But both cogs can issue it to get bidirectional sharing going. I don't know the answer ... and the behaviour won't be the same between the P2-ES and the final Prop2 hardware either, since a design flaw in this area got fixed.

    Two simultaneous writes to the same location are really not something that has been considered. The fix concerned a simultaneous read and write.

    I wonder if it might happen that each cog's write might end up in the LUT of the other cog? That would at least be symmetrical and consistent!
  • Cluso99 Posts: 18,069
    RossH wrote: »
    I have a couple of questions on LUT RAM ...

    The P2 documentation says "The lookup RAM must be read and written using RDLUT/WRLUT instructions."

    But the documentation also says you can access LUT using RDLONG and WRLONG, if you also use SETQ - e.g. "Use SETQ2+RDLONG to read multiple hub longs into cog lookup RAM". But I can't quite see how the RDLONG knows to use lookup RAM and not register RAM. Also, if it can do this, can you also just use individual RDLONG instructions to read into LUT RAM - i.e. without using SETQ?
    SETQ2 is a special case for RDLONG: it copies a block from HUB to LUT. SETQ2 with WRLONG copies a block from LUT to HUB.
    This is how you move HUB <--> LUT

    RD/WRLUT is for moving between COG <--> LUT and it has no SETQ/SETQ2 equivalent.

    The PTRA/B effects will also be valid for RD/WRLUT in the next silicon.
    My next question is about LUT sharing. You can use SETLUTS to share (for example) the LUT RAM between cog 0 and cog 1. I just want to make sure I understand this. My reading is that before any LUT writes, the LUT RAM of cog 0 and cog 1 would both contain their respective original values for a specific LUT RAM location, which may be different. But if either one writes to a LUT RAM location, they will both thereafter read the same value from that location. What happens if both cogs write to the same LUT RAM location at the same time - which value ends up in LUT RAM?
    COG0 & COG1 LUTs may contain different values.
    If COG0 has enabled sharing, then when COG1 writes to its LUT, that will also be written to COG0's LUT.
    Similarly, if COG1 has enabled sharing, then when COG0 writes to its LUT, that will also be written to COG1's LUT.

    There is currently a bug when reading a location at the same time as it is being written. This will be fixed in the next silicon. For now, you need to read it twice and compare.

    I don't think there is, or will be, any contention protection when both cogs write to the same LUT address.
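    The sharing rule described above boils down to: a WRLUT always writes the writer's own LUT, and also writes the partner cog's LUT if the partner has enabled sharing. The toy Python model below (hypothetical names, not Parallax code) captures just that rule for one cog pair. It deliberately does not model simultaneous writes to the same address or the P2-ES read/write bug, since their behaviour is exactly what is in question here.

```python
# Toy model of LUT sharing between a pair of cogs (index 0 and 1).
# Not Parallax code; names are hypothetical. Simultaneous writes are not modelled.

class LutPair:
    def __init__(self, size: int = 512):
        self.lut = [[0] * size, [0] * size]   # each cog starts with its own LUT contents
        self.sharing = [False, False]         # per-cog SETLUTS state

    def setluts(self, cog: int, enable: bool) -> None:
        """Allow (or stop) the partner cog's writes landing in this cog's LUT."""
        self.sharing[cog] = enable

    def wrlut(self, cog: int, addr: int, value: int) -> None:
        self.lut[cog][addr] = value           # own LUT is always written
        partner = 1 - cog
        if self.sharing[partner]:             # partner opted in, so mirror the write
            self.lut[partner][addr] = value

    def rdlut(self, cog: int, addr: int) -> int:
        return self.lut[cog][addr]
```

    For example, if only cog 0 enables sharing, a write by cog 1 shows up in both LUTs, but a write by cog 0 stays in cog 0's LUT; enabling sharing on both cogs gives the bidirectional case evanh describes.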

  • RossH wrote: »
    evanh wrote: »
    LUTSON (SETLUTS #1) is a one-way control: it allows the other cog's WRLUT instructions to write to this cog's LUT RAM. But both cogs can issue it to get bidirectional sharing going. I don't know the answer ... and the behaviour won't be the same between the P2-ES and the final Prop2 hardware either, since a design flaw in this area got fixed.

    Two simultaneous writes to the same location are really not something that has been considered. The fix concerned a simultaneous read and write.

    I wonder if it might happen that each cog's write might end up in the LUT of the other cog? That would at least be symmetrical and consistent!

    I'd prefer each cog's LUT to get its own data.

    What happens if there are simultaneous writes to different locations - does or will that work? And can we say for certain yet what will work in the final hardware for different combinations of LUT sharing and simultaneous reading and writing?
  • evanh Posts: 15,916
    edited 2019-03-10 11:42
    TonyB_ wrote: »
    What happens if there are simultaneous writes to different locations - does or will that work?
    That one's not an issue. The RAM is dual-ported, so there is no conflict there.
    And can we say for certain yet what will work in the final hardware for different combinations of LUT sharing and simultaneous reading and writing?
    Yes, Chip has it implemented in the FPGA. I've tested it. The read data corruption no longer occurs.

    It's a little difficult to prove, given how hard it is to coordinate two cogs precisely, but supposedly the reading cog receives the new data being written by the writing cog. This will be done with an extra mux circuit that bypasses the RAM when the two accesses coincide.
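    The bypass described here is a standard read-during-write forwarding arrangement. The Python sketch below is a generic, clock-by-clock toy model of a dual-ported RAM (not Chip's actual Verilog, and the names are hypothetical): when a read and a write hit the same address on the same clock, the mux forwards the incoming write data to the reader, while reads and writes to different addresses simply proceed independently.

```python
# Generic toy model of a dual-ported RAM with read-during-write forwarding.
# Not the P2 implementation; names are hypothetical.

class DualPortRam:
    def __init__(self, size: int = 512):
        self.mem = [0] * size

    def clock(self, write=None, read_addr=None):
        """Simulate one clock.

        write:     optional (addr, value) from the writing port
        read_addr: optional address for the reading port
        Returns the value seen by the reading port, or None if there is no read.
        """
        result = None
        if read_addr is not None:
            if write is not None and write[0] == read_addr:
                result = write[1]            # bypass mux: forward the new data
            else:
                result = self.mem[read_addr]
        if write is not None:
            addr, value = write
            self.mem[addr] = value
        return result
```

    Without the forwarding branch, a read coinciding with a write would return whatever the RAM array happened to deliver that clock, which is the kind of corruption the P2-ES workaround (read twice and compare) guards against.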