HUB EXEC Update Here - Page 4 — Parallax Forums

HUB EXEC Update Here

Comments

  • Heater.Heater. Posts: 21,230
    edited 2014-01-31 13:28
    How does GCC and/or Catalina-style compressed code compare to Spin byte codes? CMM, or whatever it is called?

    Given that there is no Spin byte code interpreter in ROM on the PII one could imagine adopting something completely different than the old bytecodes.

    Is there any merit in deciding, on an object-by-object basis, whether to compile to byte/compressed code or to native PASM? Then no new syntax needs inventing, like CPUB/CPRI/CDAT/CVAR. Nothing would change in an object's source code to move it from being compiled to byte codes to being compiled to native code. I'm not sure where or how we would specify the compile mode, though.
  • jmgjmg Posts: 15,173
    edited 2014-01-31 13:28
    @Dave: The idea sounds good, but since then GCC has gained support for directing selected code to be compiled for a COG, I think (and it has inline PASM as well).

    Would it make sense to look at the controls in GCC, and apply the same/similar semantics to Spin, so there is not too much culture shock moving from one to the other ?
  • SRLMSRLM Posts: 5,045
    edited 2014-01-31 14:54
    Heater. wrote: »
    How does GCC and/or Catalina-style compressed code compare to Spin byte codes? CMM, or whatever it is called?

    Early benchmarks indicated the CMM to Spin was about 1:1 on size and 2:1 on speed (source)
  • SRLMSRLM Posts: 5,045
    edited 2014-01-31 14:55
    jmg wrote: »
    @Dave: The idea sounds good, but since then GCC has gained support for directing selected code to be compiled for a COG, I think (and it has inline PASM as well).

    GCC can support compiling an entire cog (direct), inlining LMM/CMM code, and inline code for the FCACHE (effectively inline cog code).
  • rjo__rjo__ Posts: 2,114
    edited 2014-02-03 08:47
    I'm using the Nano with the latest jic. Is there a list of commands that were excluded to squeeze the build into the Nano?
    I saw speculation on the required changes but no confirmed list.
  • ozpropdevozpropdev Posts: 2,792
    edited 2014-02-03 16:37
    rjo__ wrote: »
    I'm using the Nano with the latest jic. Is there a list of commands that were excluded to squeeze the build into the Nano?
    I saw speculation on the required changes but no confirmed list.

    Rich, See here

    Brian :)
  • rjo__rjo__ Posts: 2,114
    edited 2014-02-03 17:31
    Thanks Brian.
  • cgraceycgracey Posts: 14,151
    edited 2014-02-04 16:30
    I got these working the same way in all single-/multi-task modes:

    REPS
    <spacer instruction>
    <REPS block>

    REPD
    <spacer instruction>
    <spacer instruction>
    <spacer instruction>
    <REPD block>


    These should be working, as well, after the current compile completes:

    JMPD/CALLD/RETD
    <trailing instruction>
    <trailing instruction>
    <trailing instruction>


    These cog changes will make all code execute the same way, no matter the task mix, so that everything can be written for what was single-task mode. Now we'll write all future code using these spacer/trailer rules and it will run optimally in single-task mode, but still work in multi-task mode without any pipeline issues.

    I hope to have an update out tonight.

    Thanks for your patience.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-02-04 16:38
    Sounds great, fewer "Duh" moments for everyone!

    I am hoping to have some quality DE2-115 time in a couple of days :)
    cgracey wrote: »
    I got these working the same way in all single-/multi-task modes:

    REPS
    <spacer instruction>
    <REPS block>

    REPD
    <spacer instruction>
    <spacer instruction>
    <spacer instruction>
    <REPD block>


    These should be working, as well, after the current compile completes:

    JMPD/CALLD/RETD
    <trailing instruction>
    <trailing instruction>
    <trailing instruction>


    These cog changes will make all code execute the same way, no matter the task mix, so that everything can be written for what was single-task mode only. Now we'll write all future code using these spacer/trailer rules and it will run optimally in single-task mode, but still work in multi-task mode.

    I hope to have an update out tonight.

    Thanks for your patience.
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-02-04 16:51
    Fantastic news Chip. Life will be so much simpler this way!
  • SapiehaSapieha Posts: 2,964
    edited 2014-02-04 16:54
    Hi Chip.

    Thanks

    Looks good.

    Now P2 will be user friendly (compilers too)
  • cgraceycgracey Posts: 14,151
    edited 2014-02-04 16:56
    Cluso99 wrote: »
    Fantastic news Chip. Life will be so much simpler this way!


    The only other pipeline issue regarding spacer instructions that I can think of is the write-before-execute issue, which must be coded with two spacers to work in single-task mode. In multi-task mode, zero, one, or two spacers may be needed, with two working for every single- and multi-task case:

    ADD :i,#1
    <spacer instruction>
    <spacer instruction>
    :i MOV OUTA,0


    So, if these cases are coded with two spacers, they will work under any circumstance. I think that does it for unifying the coding rules across task mixes.

    Thanks for sticking around, Guys. I appreciate your help and enthusiasm.
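
The two-spacer rule above can be sketched as a small checker. This is only an illustration under stated assumptions: the function names are hypothetical, addresses are cog long addresses (so the next instruction is at address + 1), and only straight-line code where the modified instruction follows the writer is handled.

```python
# Hypothetical helper, not part of any real Propeller tool.
# Rule from the post: two spacers between the writing instruction and the
# instruction it modifies work in every single- and multi-task case.
REQUIRED_SPACERS = 2


def spacers_between(writer_addr, target_addr):
    """Count the instructions sitting between the writer and its target."""
    if target_addr <= writer_addr:
        raise ValueError("only straight-line code (target after writer) is handled")
    return target_addr - writer_addr - 1


def is_safe_self_modify(writer_addr, target_addr):
    """True when at least two spacer instructions separate writer and target."""
    return spacers_between(writer_addr, target_addr) >= REQUIRED_SPACERS


# ADD :i,#1 at address 0, two spacers at 1 and 2, ':i MOV OUTA,0' at address 3
print(is_safe_self_modify(0, 3))
```

With only one spacer (target at address 2) the same call returns False, which is the case that may or may not work depending on the task mix.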
  • ozpropdevozpropdev Posts: 2,792
    edited 2014-02-04 17:35
    @Chip

    Nice work! :)
    Did you manage to squeeze in the 3 additional REPx blocks?
    Brian
  • jmgjmg Posts: 15,173
    edited 2014-02-04 18:47
    cgracey wrote: »
    I got these working the same way in all single-/multi-task modes:

    REPS
    <spacer instruction>
    <REPS block>

    REPD
    <spacer instruction>
    <spacer instruction>
    <spacer instruction>
    <REPD block>


    These should be working, as well, after the current compile completes:

    JMPD/CALLD/RETD
    <trailing instruction>
    <trailing instruction>
    <trailing instruction>


    These cog changes will make all code execute the same way, no matter the task mix, so that everything can be written for what was single-task mode. Now we'll write all future code using these spacer/trailer rules and it will run optimally in single-task mode, but still work in multi-task mode without any pipeline issues.

    Great :)

    I can see a safe mnemonic form for the first two (borrowing from how other DSPs manage this opcode), like this:
    REPSs  LoopCount, BlockStartAdr, BlockEndAdr 
    <spacer instruction>
    BlockStartAdr:
      <REPS block>
      <REPS block>
      <REPS block>
    BlockEndAdr:
    
    
    REPDs LoopCount, BlockStartAdr, BlockEndAdr 
    <spacer instruction>
    <spacer instruction>
    <spacer instruction>
    BlockStartAdr:
      <REPD block>
      <REPD block>
    BlockEndAdr:
    

    A matching 'safe' version of delayed exit is also worth having.
    Perhaps something like this ?
      <preceding  instruction>
      <preceding  instruction>
    JMPDs/CALLDs/RETDs DestAddress, ActualDepartureAdr 
      <trailing instruction>
      <trailing instruction>
      <trailing instruction>
    ActualDepartureAdr :  ' actual departure point
    
      <other code>   
    DestAddress:  
    
    

    In operation, the assembler does a simple sanity check on BlockStartAdr and ActualDepartureAdr and flags an error if either is wrong for that opcode (just like the other checks for an Adr out of range).
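
A rough sketch of what such an assembler-side sanity check might look like. Everything here is an assumption for illustration: the function names are hypothetical, the 's'-suffixed mnemonics are only a proposal in this thread, and addresses are taken to be cog long addresses, so each instruction occupies one address.

```python
# Hypothetical check; spacer/trailer counts come from the rules quoted above.
SPACERS = {"REPS": 1, "REPD": 3}                # spacers before the repeat block
TRAILERS = {"JMPD": 3, "CALLD": 3, "RETD": 3}   # trailing instructions after the op


def expected_label_addr(op, op_addr):
    """Address where BlockStartAdr / ActualDepartureAdr must sit for this opcode."""
    if op in SPACERS:
        return op_addr + 1 + SPACERS[op]
    if op in TRAILERS:
        return op_addr + 1 + TRAILERS[op]
    raise ValueError(f"unknown delayed opcode: {op}")


def check_label(op, op_addr, label_addr):
    """Raise if the programmer's label is not where the pipeline rules require."""
    want = expected_label_addr(op, op_addr)
    if label_addr != want:
        raise ValueError(
            f"{op} at {op_addr}: label at {label_addr}, expected {want}"
        )
    return True


# REPD at address 10, three spacers at 11..13, block must start at 14
print(check_label("REPD", 10, 14))
```

The point of the proposal is that a miscounted spacer becomes an assembly-time error rather than a silent pipeline bug.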
  • cgraceycgracey Posts: 14,151
    edited 2014-02-04 20:52
    ozpropdev wrote: »
    @Chip

    Nice work! :)
    Did you manage to squeeze in the 3 additional REPx blocks?
    Brian


    Ah, yes. I forgot to mention that each task has its own REPS/REPD circuit now.
  • jmgjmg Posts: 15,173
    edited 2014-02-04 21:07
    cgracey wrote: »
    Ah, yes. I forgot to mention that each task has its own REPS/REPD circuit now.

    Nice. How much added-logic did that cost ? IIRC earlier comments had it not insignificant ?
  • cgraceycgracey Posts: 14,151
    edited 2014-02-04 22:08
    jmg wrote: »
    Nice. How much added-logic did that cost ? IIRC earlier comments had it not insignificant ?


    It added over 1,000 flipflops to the chip, which already has probably 60,000. Not a big deal.
  • Heater.Heater. Posts: 21,230
    edited 2014-02-04 22:08
    This is great stuff Chip. Having such regularity in behaviour is a huge win.

    Is it time to freeze things up a bit? That next shuttle run is coming fast, isn't it?
  • cgraceycgracey Posts: 14,151
    edited 2014-02-04 22:10
    Heater. wrote: »
    This is great stuff Chip. Having such regularity in behaviour is a huge win.

    Is it time to freeze things up a bit? That next shuttle run is coming fast, isn't it?


    I think things are getting very near to "done".

    Next, I want to add a CALLR instruction which writes the return address to a register, instead of a stack. The C compiler guys really want this for leaf functions. Spin could use it, too. It's handy for pulling arguments right after the CALLR, then jumping to the final register value.
  • AleAle Posts: 2,363
    edited 2014-02-05 01:18
    Hei Chip,

    I have sort of a general coding question regarding Verilog. You have added loads and loads of opcodes, they need arguments, and you have huge fan-outs and muxes for the results. How is it that it is so fast? (I mean 80+ MHz.)
    I got the idea of having more than one "opcode" register, so to say, so that each has a smaller fan-out. Probably nothing new...

    Thanks.

    Edit: Maybe the FPGAs are that fast... Let's see... I'll try to compile my (un-optimized and sub-par) 6809 for the Cyclone V and see :) (It can do 40 MHz in the MachXO2 and 67 MHz in the Spartan3E. It only has 8- and 16-bit paths, but many muxes :( )

    Edit: It can do 90 MHz on the cyclone V. (5CEFA2F23C8N).
  • ctwardellctwardell Posts: 1,716
    edited 2014-02-05 06:38
    cgracey wrote: »
    I think things are getting very near to "done".

    Next, I want to add a CALLR instruction which writes the return address to a register, instead of a stack. The C compiler guys really want this for leaf functions. Spin could use it, too. It's handy for pulling arguments right after the CALLR, then jumping to the final register value.

    Is there still a possibility of adding the non-hub flags and increasing the locks?

    http://forums.parallax.com/showthread.php/125543-Propeller-II-update-BLOG?p=1236830&viewfull=1#post1236830

    Thanks,

    Chris Wardell
  • mindrobotsmindrobots Posts: 6,506
    edited 2014-02-05 07:02
    Great stuff, Chip!

    The consistent operation will be a big help!

    So, before the first P2 developer boards are made, is there going to be a signature sheet passed around so that anyone who contributed a feature suggestion or helped with the development and testing can sign it? How cool would it be to have an autographed mask on the board with Chip's signature and all the contributors' signatures?
  • bartgranthambartgrantham Posts: 83
    edited 2014-02-05 10:56
    cgracey wrote: »
    Ah, yes. I forgot to mention that each task has its own REPS/REPD circuit now.

    That's fantastic! Having as much of the instruction set be task-agnostic in its usage is super helpful. I can imagine this resulting in a library of tight and simple background task code that can be quickly mixed in with larger cog programs, or used on its own.
  • Dave HeinDave Hein Posts: 6,347
    edited 2014-02-05 11:33
    cgracey wrote: »
    Next, I want to add a CALLR instruction which writes the return address to a register, instead of a stack. The C compiler guys really want this for leaf functions. Spin could use it, too. It's handy for pulling arguments right after the CALLR, then jumping to the final register value.
    Instead of a CALLR instruction maybe it would be useful to have an instruction that sets a register location that is written to when any of the CALL instructions are used. It would be something like "SETRETREG register_number". This would cause all of the CALL instructions to write the return address to the designated register instead of writing it to the return stack. Does this make sense, or would it cause confusion? I think this would satisfy the requirement for the C compiler. David Betz, what do you think?

    EDIT: I thought about it a bit more, and this might cause some problems when trying to mix code that expects to use the stack for the return address. So a CALLR instruction might be better as long as it works with both COG addresses and hub addresses.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-02-05 11:39
    Sorry, that would be confusing, and might interfere with the other modes of CALL operation - the intent is to NOT interfere with them, but provide something David/Eric/you need.

    I think Chip will either pick $1F1 as the link register, or provide a SETLR (or as you called it, SETRETREG) for the CALLR instruction.
    Dave Hein wrote: »
    Instead of a CALLR instruction maybe it would be useful to have an instruction that sets a register location that is written to when any of the CALL instructions are used. It would be something like "SETRETREG register_number". This would cause all of the CALL instructions to write the return address to the designated register instead of writing it to the return stack. Does this make sense, or would it cause confusion? I think this would satisfy the requirement for the C compiler. David Betz, what do you think?
  • Dave HeinDave Hein Posts: 6,347
    edited 2014-02-05 11:47
    Bill, I agree, it would be confusing. I was just thinking out loud. So will the CALLR instruction allow for calling 9-bit and 16-bit constant addresses, and also calling indirectly through a register? It seems like that might require 2, or maybe 3 different instructions.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-02-05 12:00
    I suspect that there will only be one:

    CALLR D/#16bitconst

    As that would allow addressing all 256KB (64KLONG) hub; the other CALLx's distinguish between cog/hub addresses as 0-511=cog, 512+=hub
    Dave Hein wrote: »
    Bill, I agree, it would be confusing. I was just thinking out loud. So will the CALLR instruction allow for calling 9-bit and 16-bit constant addresses, and also calling indirectly through a register? It seems like that might require 2, or maybe 3 different instructions.
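
The address arithmetic in this post can be checked with a few lines. This is a sketch only: the function names are hypothetical, and it assumes the 16-bit constant is a long index, which is what makes 64K longs equal 256 KB.

```python
# Hypothetical illustration of the cog/hub split described above.
COG_LONGS = 512          # addresses 0..511 refer to cog memory
ADDR_BITS = 16           # width of the proposed constant operand
BYTES_PER_LONG = 4


def address_space(addr):
    """Classify a call target the way the other CALLx instructions are said to."""
    if not 0 <= addr < (1 << ADDR_BITS):
        raise ValueError("address does not fit in a 16-bit constant")
    return "cog" if addr < COG_LONGS else "hub"


def reach_in_bytes(bits=ADDR_BITS):
    """Bytes reachable when the constant counts longs rather than bytes."""
    return (1 << bits) * BYTES_PER_LONG


print(address_space(511), address_space(512), reach_in_bytes() // 1024, "KB")
```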
  • DaveJensonDaveJenson Posts: 375
    edited 2014-02-05 12:24
    Wow, Chip.
    It just keeps getting better and better!
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-02-05 12:48
    I was thinking of Sapieha's request for a CALL-equivalent to the new list. Something like that could be useful for VM's, and even more so for operating systems / libraries.

    CALLVECT D,#n
    CALLVECT D,S

    CALLLIST looked weird with 3 L's, so I changed it to VECT for the example

    D holds the base address of a WORD table in the hub

    n or S are the index

    This way it could dispatch to 512 system/VM routines. I think only the 4-level hardware stack version would be needed.

    It bears thinking on, even if only for P3.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-02-05 13:01
    Addendum:

    If the addresses in the word table are relative to their position, this would get us DLL's.

    The calling task/cog would place the start of the DLL in the register "dllbase"

    Then it could call any routine in the DLL with

    CALLVECT dllbase,#routineindex ' (0..511)<<2

    A dll would be:

    WORD @function0
    WORD @function1
    ....
    WORD @functionN-1
    ' local DLL data area
    function0:

    function1:

    ...

    As relative addresses would be used, no need for a relocating loader
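
A minimal sketch of the relative word-table idea, modelled on a byte array standing in for hub memory. The details are assumptions, not anything confirmed in the thread: each table entry is a little-endian 16-bit word holding the distance from that entry's own byte address to the function, and `build_table`, `load` and `resolve` are hypothetical names.

```python
import struct

WORD = 2  # bytes per table entry


def build_table(function_offsets):
    """Build the word table; offsets are measured from the start of the DLL."""
    table = bytearray()
    for index, func_offset in enumerate(function_offsets):
        entry_pos = WORD * index
        # Store the distance from this entry to its function (assumed encoding).
        table += struct.pack("<H", func_offset - entry_pos)
    return bytes(table)


def load(memory, base, image):
    """Copy the DLL image into memory at an arbitrary base address."""
    memory[base:base + len(image)] = image


def resolve(memory, dllbase, index):
    """What a CALLVECT-style dispatch would compute: entry address + stored word."""
    entry_addr = dllbase + WORD * index
    (relative,) = struct.unpack_from("<H", memory, entry_addr)
    return entry_addr + relative


# Two functions located 8 and 20 bytes into the DLL image
image = build_table([8, 20])
hub = bytearray(0x3000)
load(hub, 0x1000, image)
load(hub, 0x2000, image)
print(hex(resolve(hub, 0x1000, 1)), hex(resolve(hub, 0x2000, 1)))
```

The same unmodified table resolves correctly at both base addresses, which is the property that would make a relocating loader unnecessary.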