Propeller II update - BLOG - Page 166 — Parallax Forums

Propeller II update - BLOG


Comments

  • wmosscropwmosscrop Posts: 406
    edited 2014-01-21 10:42
    Tor wrote: »
    On the other hand.. instructions were also added to processor architectures with the intention to 'help' compiler writers.

    Back in the days of the minicomputers, Data General implemented their MV-Series CPU on a single microcoded chip.

    Unfortunately, there wasn't enough microcode space to implement all of the instructions. So, the designers chose some instructions that would be emulated by the CPU instead.

    Of course, the emulation would be much slower than when implemented in microcode, but if it's for instructions that aren't used that often, it won't make a difference.

    Unless you pick the wrong instructions.

    Apparently the designers didn't look at the code generated by all of their compilers... and the COBOL compiler output (quite often) one of those "helpful" instructions (I believe it was a numeric-to-text formatter, but I'm not sure).

    The end result was that some COBOL programs ran about 6X SLOWER on the new hardware.

    Walter
  • cgraceycgracey Posts: 14,133
    edited 2014-01-22 10:32
    I had to make lots of assembler changes to properly support hub exec mode. I think it's all done now. I'm just recompiling the FPGA, since I found a minor bug. Hopefully, I'll have an FPGA update in a day, or two.
  • mindrobotsmindrobots Posts: 6,506
    edited 2014-01-22 10:40
    cgracey wrote: »
    I had to make lots of assembler changes to properly support hub exec mode. I think it's all done now. I'm just recompiling the FPGA, since I found a minor bug. Hopefully, I'll have an FPGA update in a day, or two.

    Super, Chip!

    Do you think we'll still be able to have something that fits into a Nano or are we all destined to upgrade to a DE2-115 or drop out of the testing program?
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-01-22 10:41
    Thank you for the update - I have dusted off my DE2-115 in preparation for playing with hubexec :)

    Can you describe the final configuration?

    Last I heard:

    - 256KB hub
    - 4 lines icache with LRU (possibly increased to 8 lines)
    - prefetch
    - 1 line dcache

    cgracey wrote: »
    I had to make lots of assembler changes to properly support hub exec mode. I think it's all done now. I'm just recompiling the FPGA, since I found a minor bug. Hopefully, I'll have an FPGA update in a day, or two.
  • cgraceycgracey Posts: 14,133
    edited 2014-01-22 11:18
    Thank you for the update - I have dusted off my DE2-115 in preparation for playing with hubexec :)

    Can you describe the final configuration?

    Last I heard:

    - 256KB hub
    - 4 lines icache with LRU (possibly increased to 8 lines)
    - prefetch
    - 1 line dcache


    That's all correct. And it's 4 cogs on the DE2-115. I'll remove CTRB on the DE0-Nano compile to hopefully get a fit.
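    For readers unfamiliar with the term, the "4 lines icache with LRU" item above can be sketched roughly as follows. This is only an illustrative model, not Chip's actual logic: the class name `ICache`, the line size of 8 longs, and the long-addressed lookup are assumptions made for the example, and the prefetch and the 1-line dcache are not modeled.

```python
from collections import OrderedDict

NUM_LINES = 4    # from the thread: 4 icache lines
LINE_LONGS = 8   # ASSUMPTION: longs per cache line (not stated in the thread)


class ICache:
    """Hypothetical model of a small fully associative instruction cache with LRU."""

    def __init__(self, num_lines=NUM_LINES, line_longs=LINE_LONGS):
        self.num_lines = num_lines
        self.line_longs = line_longs
        # Keys are line tags, ordered from least to most recently used.
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def fetch(self, long_addr):
        """Return True on a cache hit, False on a miss (line gets loaded)."""
        tag = long_addr // self.line_longs
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)   # evict the least recently used line
        self.lines[tag] = True
        return False


cache = ICache()
for addr in range(32):                       # 32 sequential longs = 4 lines
    cache.fetch(addr)
print(cache.hits, cache.misses)
```

    With these assumed sizes, straight-line code misses once per line and hits on the remaining longs of that line, while a loop spanning more than four lines would keep evicting its own code.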
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-01-22 11:29
    It would be really nice if it fits in the Nano.

    If it is not enough to remove CTRB, perhaps remove CORDIC?

    One test I am planning is a mini-P2 network over the new high speed uarts between my DE2-115 and two of my DE0-Nano's...
    cgracey wrote: »
    That's all correct. And it's 4 cogs on the DE2-115. I'll remove CTRB on the DE0-Nano compile to hopefully get a fit.
  • ctwardellctwardell Posts: 1,716
    edited 2014-01-22 11:32
    It would be really nice if it fits in the Nano.

    If it is not enough to remove CTRB, perhaps remove CORDIC?

    One test I am planning is a mini-P2 network over the new high speed uarts between my DE2-115 and two of my DE0-Nano's...

    I wonder if the UARTs are subject to change with the possible SERDES work, or will those stay separate?

    C.W.
  • SapiehaSapieha Posts: 2,964
    edited 2014-01-22 11:41
    Hi Chip.

    Will it be possible to use even PORT C spare bits internally?

    cgracey wrote: »
    I had to make lots of assembler changes to properly support hub exec mode. I think it's all done now. I'm just recompiling the FPGA, since I found a minor bug. Hopefully, I'll have an FPGA update in a day, or two.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-01-22 11:42
    Last I heard, the uarts are staying, and SERDES will likely be an alternate mode, with shared circuitry. I am eagerly awaiting what Chip cooks up :)
    ctwardell wrote: »
    I wonder if the UARTs are subject to change with the possible SERDES work, or will those stay separate?

    C.W.
  • jmgjmg Posts: 15,155
    edited 2014-01-22 11:42
    It would be really nice if it fits in the Nano.

    The Nano is an obvious target, but there is also a low cost Cyclone V board with 12% more LEs

    Besides the 12% extra size, the speed from a Cyclone V build would be interesting to see.

    Or, optimizing for size may free up enough room, but the Nano build would then be slightly slower.
    If it is not enough to remove CTRB, perhaps remove CORDIC?

    Removing CORDIC could make more sense, as peripheral and Counter testing will need a burst of activity.

    P2 Counter docs are late arriving, so that compresses any testing time.
  • jmgjmg Posts: 15,155
    edited 2014-01-22 11:47
    Last I heard, the uarts are staying, and SERDES will likely be an alternate mode, with shared circuitry. I am eagerly awaiting what Chip cooks up :)

    Yes, needs more user control and Sync modes : SPI, I2S, QuadSPI (& JTAG?) with granular Baud/Length choices and 50MHz+ would be a solid base. Additional bit-level support for some USB primitives might also make it.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-01-24 07:43
    I am having trouble waiting for the next bitstream :)

    I too hope that the nano's will still get an update.

    That low cost board is interesting, and I also find the new Cyclone V GX Starter Kit for $179 VERY interesting. If Chip could map the HDMI output to the Prop component video mode, and the DDR2 to prop's DDR, it could be a very nice 2 cog 256KB hub no expansion board required platform.

    http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=167&No=830&PartNo=1

    (replying to your next message)

    I am REALLY looking forward to what Chip does for Serdes.

    At a minimum I'd like to see as fast as possible SPI master and slave modes; I2S and QSPI would be a nice bonus.
    jmg wrote: »
    The Nano is an obvious target, but there is also a low cost Cyclone V board with 12% more LEs

    Besides the 12% extra size, the speed from a Cyclone V build would be interesting to see.

    Or, optimizing for size may free up enough room, but the Nano build would then be slightly slower.



    Removing CORDIC could make more sense, as peripheral and Counter testing will need a burst of activity.

    P2 Counter docs are late arriving, so that compresses any testing time.
  • cgraceycgracey Posts: 14,133
    edited 2014-01-24 14:58
    I was recompiling all yesterday for the DE0-Nano. I had to remove more than was required to simply fit the circuitry, in order to get a high-speed fit that will run at 80MHz reliably.

    For the DE0-Nano, 1 cog configuration, the following were removed:

    32x32 multiply
    64/32 divide
    square root
    SERB
    CTRB
    CTRA's function generator

    I just need now to update the docs to cover the hub execution mode, which is working like a dream. It's really nice how you can call and jump anywhere, and in and out of cog and hub spaces. It's a whole new world for me. At this point, I could write on-chip tools in PASM, without needing to get a Spin compiler working first to accommodate the large code needed.

    Hopefully, by tonight I'll have the update posted.

    Thanks for your patience, Everyone.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-01-24 15:01
    Excellent news!

    Why do I have a feeling that a PASM (written in hubexec) is on the horizon?

    And it is a new world for everyone...
  • mindrobotsmindrobots Posts: 6,506
    edited 2014-01-24 15:03
    Great news! At least something will run in our Nanos!
    cgracey wrote: »
    ... the hub execution mode, which is working like a dream. It's really nice how you can call and jump anywhere, and in and out of cog and hub spaces. It's a whole new world for me.

    Careful there, Chip, next thing you know you'll sneak in an interrupt or two!! :lol:
  • jmgjmg Posts: 15,155
    edited 2014-01-24 15:33
    cgracey wrote: »
    I was recompiling all yesterday for the DE0-Nano. I had to remove more than was required to simply fit the circuitry, in order to get a high-speed fit that will run at 80MHz reliably.

    For the DE0-Nano, 1 cog configuration, the following were removed:

    32x32 multiply
    64/32 divide
    square root
    SERB
    CTRB
    CTRA's function generator

    Shame to lose 32x32 multiply 64/32 divide SERB CTRB

    How much extra space does relaxing Speed emphasis give you ?
    Is Cordic not as costly as it sounds, or just harder to remove ?

    How many more LE does it need for all the options - is the BEmicro board, with 12% more LEs a solution ?
  • cgraceycgracey Posts: 14,133
    edited 2014-01-24 15:51
    jmg wrote: »
    Shame to lose 32x32 multiply 64/32 divide SERB CTRB

    How much extra space does relaxing Speed emphasis give you ?
    Is Cordic not as costly as it sounds, or just harder to remove ?

    How many more LE does it need for all the options - is the BEmicro board, with 12% more LEs a solution ?


    I had to get below ~97% before I'd get a fast compile. I could make two versions for the Nano - one, as is, and another with CORDIC removed and the other math put back in. The way it is now, it still runs the balls.spin demo.

    The BEmicro board would accommodate a whole cog, barely.
  • rjo__rjo__ Posts: 2,114
    edited 2014-01-24 15:57
    So, so good:)
  • Heater.Heater. Posts: 21,230
    edited 2014-01-24 16:02
    Fantastic. Hub exec sounds great. Yep, I can see a self hosted PASM on the horizon.
  • jmgjmg Posts: 15,155
    edited 2014-01-24 17:22
    cgracey wrote: »
    I had to get below ~97% before I'd get a fast compile. I could make two versions for the Nano - one, as is, and another with CORDIC removed and the other math put back in. The way it is now, it still runs the balls.spin demo.

    Could be worth checking into, when the SerDes is done ( & Counters)

    cgracey wrote: »
    The BEmicro board would accommodate a whole cog, barely.

    'barely' is still a fit :) - but it needs to still fit after the serDes is expanded.

    Have you done any Cyclone V builds yet, to get a handle on the speed-change from Cyclone IV ?
  • BaggersBaggers Posts: 3,019
    edited 2014-01-25 04:52
    That's awesome news Chip, fun times ahead :D
  • SeairthSeairth Posts: 2,474
    edited 2014-01-25 05:34
    cgracey wrote: »
    I had to get below ~97% before I'd get a fast compile. I could make two versions for the Nano - one, as is, and another with CORDIC removed and the other math put back in. The way it is now, it still runs the balls.spin demo.

    The BEmicro board would accommodate a whole cog, barely.

    I'm curious about the compile process...

    What did you remove that affects the operating speed? Was it just that the fitter needed some "wiggle room" to efficiently lay out the circuits? Or was there something about the specific sections you removed?

    Also, I noticed that you are targeting 80MHz, though the xtal is 50MHz. I'm assuming you're using a PLL megafunction. If so, how's it configured?
  • cgraceycgracey Posts: 14,133
    edited 2014-01-25 06:35
    Seairth wrote: »
    I'm curious about the compile process...

    What did you remove that affects the operating speed? Was it just that the fitter needed some "wiggle room" to efficiently lay out the circuits? Or was there something about the specific sections you removed?

    Also, I noticed that you are targeting 80MHz, though the xtal is 50MHz. I'm assuming you're using a PLL megafunction. If so, how's it configured?


    It is so that the compiler needs some wiggle room to get the speed up. No wiggle room means that some signals get routed a long way and slow the whole circuit down.

    We are using the 50MHz input through a PLL to get 160MHz, which gets used in an NCO to make the Prop2 clock.
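    As a rough illustration of the NCO step (not the actual FPGA logic), a phase accumulator clocked at the 160MHz PLL output adds a fixed tuning word every cycle, and its top bit serves as the derived clock. The 32-bit accumulator width, the use of the MSB as the output, and the function name `nco_edges` are assumptions made for this sketch; the 80MHz target comes from earlier in the thread.

```python
def nco_edges(f_ref_hz, f_out_hz, cycles, width=32):
    """Simulate a phase-accumulator NCO for `cycles` reference clocks.

    Returns (tuning_word, rising_edges), where rising_edges counts the
    0 -> 1 transitions of the accumulator's top bit.
    ASSUMPTION: 32-bit accumulator, MSB used as the output clock.
    """
    modulus = 1 << width
    # f_out = f_ref * tuning_word / 2**width, solved for the tuning word
    tuning_word = round(f_out_hz * modulus / f_ref_hz)
    acc = 0
    msb_prev = 0
    rising = 0
    for _ in range(cycles):
        acc = (acc + tuning_word) % modulus   # accumulator wraps on overflow
        msb = acc >> (width - 1)
        if msb and not msb_prev:
            rising += 1
        msb_prev = msb
    return tuning_word, rising


# 160 reference cycles at 160MHz span 1 microsecond.
word, edges = nco_edges(160_000_000, 80_000_000, 160)
print(hex(word), edges)
```

    Under these assumptions, an 80MHz output from a 160MHz reference needs a tuning word of exactly half the accumulator range, so the top bit simply toggles on every reference cycle. Smaller tuning words give proportionally lower clock rates without touching the PLL.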

    Sorry this documentation is taking so long. I'm going through the whole document making lots of changes to reflect how things are working now. I'm not even to the point of explaining the hub execution, although I don't think it will take long.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-01-25 06:43
    No worries... it is well worth the wait :)
    cgracey wrote: »
    It is so that the compiler needs some wiggle room to get the speed up. No wiggle room means that some signals get routed a long way and slow the whole circuit down.

    We are using the 50MHz input through a PLL to get 160MHz, which gets used in an NCO to make the Prop2 clock.

    Sorry this documentation is taking so long. I'm going through the whole document making lots of changes to reflect how things are working now. I'm not even to the point of explaining the hub execution, although I don't think it will take long.
  • potatoheadpotatohead Posts: 10,260
    edited 2014-01-25 07:38
    Given the scope of the change, I think the time elapsed so far is nothing short of amazing. No worries here. It takes the time it takes.
  • User NameUser Name Posts: 1,451
    edited 2014-01-25 07:56
    Self-hosted PASM? Perhaps I haven't given hub execution enough consideration. Until this minute, it didn't seem like my feature.

    Is self-hosted PASM a likelihood?? Has it been mentioned before today? Would it be a third-party project like Spinix or an official Parallax tool? Too soon to say?
  • Dave HeinDave Hein Posts: 6,347
    edited 2014-01-25 08:04
    What does "self-hosted PASM" mean? Is it just a Prop assembler that runs on the Prop, or is it more like a Forth interpreter that can assemble PASM instructions?
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-01-25 08:16
    Knowing Chip, and knowing PNut is written in x86 assembler, I suspect that he might write one in hubexec assembly code - he has been wanting self-hosted PASM and Spin for a long time.

    Mind you, it really does not matter what the assembler is written in, as long as it is standard PASM syntax (instead of FORTH syntax). It would probably take less memory written in Forth.

    Btw, I look forward to seeing your Forth running on the P2, it ought to be a blast.

    Hmm... thinking about it, there is no reason why a Spin compiler could not also be written in Forth.

    Porting Sphinx to the P2 also seems like a good idea.
  • potatoheadpotatohead Posts: 10,260
    edited 2014-01-25 09:20
    I think that is precisely what Chip will do with Hubex.
  • Heater.Heater. Posts: 21,230
    edited 2014-01-25 09:24
    Ahhhgg! Stop it already with the Forth thing. It gives me a headache.

    David,
    What does "self-hosted PASM" mean?
    A very good question.

    Historically that means we need a simple text editor. The assembler of course. And a crude OS making all that usable and able to load and run programs.

    What we have here in the P2 is a device with 8 32 bit processors running AFAP into 256KB of RAM together with video capabilities and an easy means of attaching gigabytes of storage on an SD card. Oh, and not to mention the 32MB RAM that looks like it will be supplied as standard on the first dev boards.

    That makes the thing orders of magnitude more powerful than the old 8 bit personal computers or even the first IBM PC.

    That would suggest this is all more than possible. All we need is a couple of turbo assembler programmers, in the style of Gary Kildall (CP/M, PL/M), Leor Zolman (BDS C compiler), or indeed Chip himself (PASM).

    Whether it's worth anyone's time to do all this is another question of course.