Hub Execution Model Thread (split from blog) - Page 6 — Parallax Forums

Comments

  • Cluso99Cluso99 Posts: 18,069
    edited 2013-12-04 15:37
    David Betz wrote: »
    Yes, this is how I propose that BIG work. The BIG instruction supplies bits 31:9 to the following instruction's S field.

    What I was proposing wasn't this complicated. The 32 bit value would be used as an immediate value in the instruction that follows the BIG instruction. There would be no range checking and no hub accesses unless the instruction happened to be RDxxxx or WRxxxx. In that case, the 32 bit immediate value would be the hub address for the hub access. Really, nothing in the COG processor would need to change except the handling of immediate values in the S field.

    I don't think it would even affect the execution unit. It would only affect the instruction decoder that handles the forming of immediate operands.

    I think there should be one 23 bit "big" register for each thread. That way the BIG instruction could be used even when threading was in use.
    How do you guys do quotes within quotes copied from the post you are referring to? Reply with quote does not do that?

    Thanks for the answers David. Yes, I understand now. That sounds quite easy IMHO.

    Might be possible to actually extend it to loading a 32 bit field and just "OR" in the lower #S 9 bits. This could provide more uses such that the BIG instruction would be 32 bits and the #S could be #0. I am not going to suggest adding the #S as this most likely takes too long within the pipeline although it could be useful.

    BIG [#]D ' Loads a 32 bit register to be "OR"ed with the next instruction's #S (9 bits) to be used as a resultant 32 bit immediate S field.
    Presuming we can free up a full instruction, then an immediate value of
    xxxxxxx 00 x xxxx xxxxxxxxx SSSSSSSSS ' Loads the value stored in register S into the "BIG" register
    xxxxxxx 10 n nnnn nnnnnnnnn nnnnnnnnn ' Load 23 immediate bits into the lower "BIG" register bits 22..0 and zero bits 31..23.
    xxxxxxx 11 n nnnn nnnnnnnnn nnnnnnnnn ' Load 23 immediate bits into the upper "BIG" register bits 31..9 and zero bits 8..0

    We would require 4 such registers for use in multi-tasking.

    Bill, yes it could be used in RD/WRxxxx
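To make the encoding proposed above concrete, here is a minimal Python sketch of the two immediate load forms and the OR with the following instruction's 9-bit #S. It only models the bit arithmetic described in the post; the function names are hypothetical and nothing here reflects an actual P2 implementation.

```python
MASK32 = 0xFFFFFFFF
MASK23 = 0x7FFFFF
MASK9 = 0x1FF

def big_load_low(imm23):
    """Form '10': 23 immediate bits go to BIG bits 22..0, bits 31..23 are zero."""
    return imm23 & MASK23

def big_load_high(imm23):
    """Form '11': 23 immediate bits go to BIG bits 31..9, bits 8..0 are zero."""
    return ((imm23 & MASK23) << 9) & MASK32

def effective_s(big, s9):
    """OR the next instruction's 9-bit #S field into the BIG register value."""
    return (big | (s9 & MASK9)) & MASK32

# Full 32-bit constant: high form carries bits 31..9, #S carries bits 8..0.
value = 0x12345678
full = effective_s(big_load_high(value >> 9), value & MASK9)

# Hub address: low form carries the whole 23-bit value, and #S is #0.
hub_addr = effective_s(big_load_low(0x7ABCD), 0)
```

Because the combination is a plain OR, the following instruction does not need to know which form loaded the register, provided the low form is paired with #0 (or with an #S whose bits are meant to be merged in).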
  • jmgjmg Posts: 15,173
    edited 2013-12-04 15:41
    Cluso99 wrote: »
    How do you guys do quotes within quotes copied from the post you are referring to? Reply with quote does not do that?
    You can simply
    Nest Quotes
    as deep as you like
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-12-04 15:42
    As Sapieha pointed out, there is effectively no NOP instruction. There is a WAIT #n instruction.
    But you can no longer assume that an instruction with cccc=0000 will not execute (ie as a NOP).
    Currently the top 14 bits must be all zeros to ensure a NOP - well not precisely... this bit config
    0xxxxxx xx x 0000 xxxxxxxxx xxxxxxxxx ensures a NOP
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-12-04 15:46
    Ouch!

    I just checked the latest docs,

    0000000 ZC I CCCC DDDDDDDDD SSSSSSSSS RDBYTE D,S/PTRx

    So much for using NOP!
    BTW Bill, there never was a NOP in P1. It was any instruction with cccc=0000 and was typically a WRBYTE instruction.
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-12-04 15:51
    Reply with Quote from your post gets me nothing.
    So I have to manually cut & paste within manual quote and end quote tags? Or is there a simple way to copy someone's post that already includes quotes, while keeping those quotes?
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-12-04 15:59
    ozpropdev wrote: »
    Ray,

    DECOD5 takes the lower 5 bits and decodes it into a single bit mask.

    DECOD5 reg2

    replaces

    MOV reg1,#1
    SHL reg1,reg2

    DECOD4 takes the lower 4 bits and creates a 16 bit mask. The resulting mask is copied to the 2 word positions.
    DECOD3 takes the lower 3 bits and creates a 8 bit mask. The resulting mask is copied to the 4 byte positions.

    ENCOD does the reverse.

    Example

    ENCOD reg1,#000 would return the value 5 to represent the 5th bit is set.
    Values returned are in the range of 1 to 32.
    A zero result represents no bit is set.

    IIRC if multiple bits are set, it returns the most significant bit.

    Maybe these can be shrunk into one opcode by using WZ,WC as suggested.

    So WZ & WC are not required for DECOD3/4/5 but WZ looks like being required for ENCOD?
    That's OK as currently ENCOD is shared with BLMASK, but it only saves 2 instructions. I was hoping that I could find a place for BLMASK and free another instruction slot.
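The DECOD/ENCOD behaviour described in the quoted post can be sketched in Python as follows. This is only a model of the description given above (mask replication for DECOD4/DECOD3, a 1..32 result for ENCOD with 0 meaning no bit set, most significant bit winning); it has not been checked against the FPGA image, and the function names are hypothetical.

```python
def decod5(x):
    """Lower 5 bits select one bit of a 32-bit mask."""
    return 1 << (x & 0x1F)

def decod4(x):
    """Lower 4 bits select one bit of a 16-bit mask, copied to both words."""
    mask = 1 << (x & 0xF)
    return mask | (mask << 16)

def decod3(x):
    """Lower 3 bits select one bit of an 8-bit mask, copied to all four bytes."""
    mask = 1 << (x & 0x7)
    return mask * 0x01010101

def encod(x):
    """Position (1..32) of the most significant set bit, or 0 if no bit is set."""
    return (x & 0xFFFFFFFF).bit_length()
```

In this model `decod5(n)` matches the MOV/SHL pair it replaces, and a zero result from `encod` is the case a Z flag would need to report.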
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-04 16:03
    Nesting quotes requires cut&paste... at least how I do it, with a text editor beside the browser.

    I like your encoding! That would work well.
    Cluso99 wrote: »
    How do you guys do quotes within quotes copied from the post you are referring to? Reply with quote does not do that?

    Thanks for the answers David. Yes, I understand now. That sounds quite easy IMHO.

    Might be possible to actually extend it to loading a 32 bit field and just "OR" in the lower #S 9 bits. This could provide more uses such that the BIG instruction would be 32 bits and the #S could be #0. I am not going to suggest adding the #S as this most likely takes too long within the pipeline although it could be useful.

    BIG [#]D ' Loads a 32 bit register to be "OR"ed with the next instruction's #S (9 bits) to be used as a resultant 32 bit immediate S field.
    Presuming we can free up a full instruction, then an immediate value of
    xxxxxxx 00 x xxxx xxxxxxxxx SSSSSSSSS ' Loads the value stored in register S into the "BIG" register
    xxxxxxx 10 n nnnn nnnnnnnnn nnnnnnnnn ' Load 23 immediate bits into the lower "BIG" register bits 22..0 and zero bits 31..23.
    xxxxxxx 11 n nnnn nnnnnnnnn nnnnnnnnn ' Load 23 immediate bits into the upper "BIG" register bits 31..9 and zero bits 8..0

    We would require 4 such registers for use in multi-tasking.

    Bill, yes it could be used in RD/WRxxxx
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-04 16:04
    Ok, I am getting old and grey, and my memory is going... thanks for the refresh cycle!
    Cluso99 wrote: »
    BTW Bill, there never was a NOP in P1. It was any instruction with cccc=0000 and was typically a WRBYTE instruction.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-04 16:05
    I wonder if there are other opportunities to combine instructions that don't need one or more of WZ WC and I ?
    Cluso99 wrote: »
    So WZ & WC are not required for DECOD3/4/5 but WZ looks like being required for ENCOD?
    That's OK as currently ENCOD is shared with BLMASK, but it only saves 2 instructions. I was hoping that I could find a place for BLMASK and free another instruction slot.
  • ozpropdevozpropdev Posts: 2,792
    edited 2013-12-04 16:10
    Cluso99 wrote: »
    So WZ & WC are not required for DECOD3/4/5 but WZ looks like being required for ENCOD?
    That's OK as currently ENCOD is shared with BLMASK, but it only saves 2 instructions. I was hoping that I could find a place for BLMASK and free another instruction slot.

    WZ with ENCOD I don't think has any effect. I'm firing up the FPGA now to check.

    Edit: Just thinking about it.... It would reflect no bits set....Oops!
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-12-04 16:22
    I wonder if there are other opportunities to combine instructions that don't need one or more of WZ WC and I ?
    Cluso99 wrote: »
    So WZ & WC are not required for DECOD3/4/5 but WZ looks like being required for ENCOD?
    That's OK as currently ENCOD is shared with BLMASK, but it only saves 2 instructions. I was hoping that I could find a place for BLMASK and free another instruction slot.


    No, I have carefully scanned them. But there may be something that comes out of direct HUB-AUX transfers that could affect the RDBYTE/WORD/LONG Cache versions, or the RDAUX/RDAUXR, but I am not hopeful.

    There are a couple in the 1000011-1111110 area that might yield something.

    Then there is 1111111 & S=1xxxxxxxx that may also be available for an 8 bit S.

    And I have a REPS/REPD alternative that partially frees 1111110.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-04 16:26
    I'm hopeful...

    Sounds like DECODExx may save us two dual-op instructions (if I read the above correctly), and that's all we need for (HJMP/HCALL/HCALLA/HCALLB) and BIG, HRET / HRETA / HRETB don't need any arguments.
    Cluso99 wrote: »
    No, I have carefully scanned them. But there may be something that comes out of direct HUB-AUX transfers that could affect the RDBYTE/WORD/LONG Cache versions, or the RDAUX/RDAUXR, but I am not hopeful.

    There are a couple in the 1000011-1111110 area that might yield something.

    Then there is 1111111 & S=1xxxxxxxx that may also be available for an 8 bit S.

    And I have a REPS/REPD alternative that partially frees 1111110.
  • David BetzDavid Betz Posts: 14,516
    edited 2013-12-04 17:59
    Cluso99 wrote: »
    How do you guys do quotes within quotes copied from the post you are referring to? Reply with quote does not do that?

    Thanks for the answers David. Yes, I understand now. That sounds quite easy IMHO.

    Might be possible to actually extend it to loading a 32 bit field and just "OR" in the lower #S 9 bits. This could provide more uses such that the BIG instruction would be 32 bits and the #S could be #0. I am not going to suggest adding the #S as this most likely takes too long within the pipeline although it could be useful.

    BIG [#]D ' Loads a 32 bit register to be "OR"ed with the next instruction's #S (9 bits) to be used as a resultant 32 bit immediate S field.
    Presuming we can free up a full instruction, then an immediate value of
    xxxxxxx 00 x xxxx xxxxxxxxx SSSSSSSSS ' Loads the value stored in register S into the "BIG" register
    xxxxxxx 10 n nnnn nnnnnnnnn nnnnnnnnn ' Load 23 immediate bits into the lower "BIG" register bits 22..0 and zero bits 31..23.
    xxxxxxx 11 n nnnn nnnnnnnnn nnnnnnnnn ' Load 23 immediate bits into the upper "BIG" register bits 31..9 and zero bits 8..0

    We would require 4 such registers for use in multi-tasking.

    Bill, yes it could be used in RD/WRxxxx
    Sounds good although I'm not sure I see the value of being able to load the BIG register from another register. Also, I think Bill said that since BIG can't be encoded as a NOP, there may not be much reason to have the form that loads the low order bits with the BIG value. How would that even work? Would there be an extra bit to say which way BIG had been loaded so the instruction decode would know how to combine it with the S bits of the next instruction? That seems overly complicated to me. Maybe we'd better let Bill chime in on whether there is still value in loading the low bits rather than the high bits.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-04 18:21
    The low 23 bit option is still very useful as hub addresses ignore the high bits.
    David Betz wrote: »
    Sounds good although I'm not sure I see the value of being able to load the BIG register from another register. Also, I think Bill said that since BIG can't be encoded as a NOP, there may not be much reason to have the form that loads the low order bits with the BIG value. How would that even work? Would there be an extra bit to say which way BIG had been loaded so the instruction decode would know how to combine it with the S bits of the next instruction? That seems overly complicated to me. Maybe we'd better let Bill chime in on whether there is still value in loading the low bits rather than the high bits.
  • David BetzDavid Betz Posts: 14,516
    edited 2013-12-04 18:22
    The low 23 bit option is still very useful as hub addresses ignore the high bits.
    Okay but I would hate for this to be rejected because we tried to pile on too many features. Of course, I guess that's the standard approach with the P2 so far. :-)

    Anyway, if you allow either the low or high 23 bits to be loaded you'll need one extra bit to remember which was requested by the BIG instruction so it can be combined correctly with the S field of the next instruction.
  • YanomaniYanomani Posts: 1,524
    edited 2013-12-04 18:34
    Following the ongoing discussion, I've thought of some new ideas, targeted at the realm of the P2.

    I have a method, admittedly much abused, that I used in applications I crafted over the years to get some 'almost NOP' behaviour out of congested instruction set decoders.
    I'll try to describe it here, but due to my known difficulties writing in English, I beg your pardon and patience for any typos or confusing descriptions.

    Since OCT-related code, meant to be executed from HUB memory, must be aligned to eight consecutive longs, but need not start executing at xxxxxxxxxxxxxxxxxxxx000, then, for the first eight longs fetched from HUB memory, all the "needed" 32 bit constants that will be used "inside" that block of "fewer than eight" executable instructions are present in the first x longs that belong to that block.
    Since we don't need to use the full 32 bits of data, but only 23, there are 9 remaining "unused" bits.
    Three of them will be used to set the entry point for the next "eight long aligned" code block, which will be fetched in advance while the present block is executing. This provides enough space to represent any number of constants we could need to reference in the next code block, and so on.
    This allows straight-line execution of code blocks while providing ample room for placing 'almost' immediate values, without wasting a single JUMP to skip over data space.
    It's kind of unaligned inline immediate values placement, and sure, almost for free.

    Now the technique to fetch those values, usable from HUB, AUX, and even COG memory.

    Whenever an operation, references the same place, as the source and destination, for a read or write operation, they are to be treated as NOPS, inside the pipeline, in the aspect of doing their WRITE phase, at stage four.
    First, because it's worthless to write over the same place, a value that is already there.
    Second, because the full 32 bits of the gathered value, are at disposal, to be used elsewhere; in the present case, to load the BIG register.
    And this also gives us some 9 unused bits, three of them, sure, compromised as above.

    IMHO, the pipeline ALU will have no problem at all dealing with the operations described above, pending, of course, Chip's analysis and approval, and also the comments, additions and criticism of each of the many other participants of the forum.

    Naturally, the same technique still works, easily, for AUX and COG memories too.

    I hope it helps in the present situation.

    Yanomani

    P.S. When I wrote "Whenever an operation, references the same place, as the source and destination, for a read or write operation", I was talking about general memory, not the pin circuits, or any other special feature register, where writing over could be used for special purposes.

    P.S. 2 - Sure, "It's kind of unaligned inline immediate values placement, and sure, almost for free." is not true.
    You must place the WRLONG D,S, where D=S, in order to get the action done. My mistake and shame!

    P.S. 3 - "Second, because the full 32 bits of the gathered value, are at disposal, to be used elsewhere; in the present case, to load the BIG register." To be honest, the write phase will still exist, directed to the BIG register and to the three bit "next OCT entry point" register. This must not be cleared until it is used for the first time, at the next block execution entry.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-04 18:55
    Actually, that bit may not be needed...

    The usage case with the low 23 bits is basically meant for the table case, and directly encoded hub addresses. If BIG is OR'd with a #0 in the S, that works as well... so I don't think the extra bit is needed.
    David Betz wrote: »
    Okay but I would hate for this to be rejected because we tried to pile on too many features. Of course, I guess that's the standard approach with the P2 so far. :-)

    Anyway, if you allow either the low or high 23 bits to be loaded you'll need one extra bit to remember which was requested by the BIG instruction so it can be combined correctly with the S field of the next instruction.
  • David BetzDavid Betz Posts: 14,516
    edited 2013-12-04 19:00
    Actually, that bit may not be needed...

    The usage case with the low 23 bits is basically meant for the table case, and directly encoded hub addresses. If BIG is OR'd with a #0 in the S, that works as well... so I don't think the extra bit is needed.
    Could you give a concrete example of the table case? I'm having a hard time visualizing what you're talking about. Sorry to be so dense!
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-12-04 19:27
    David Betz wrote: »

    Cluso99 wrote: »
    Might be possible to actually extend it to loading a 32 bit field and just "OR" in the lower #S 9 bits. This could provide more uses such that the BIG instruction would be 32 bits and the #S could be #0. I am not going to suggest adding the #S as this most likely takes too long within the pipeline although it could be useful.

    BIG [#]D ' Loads a 32 bit register to be "OR"ed with the next instruction's #S (9 bits) to be used as a resultant 32 bit immediate S field.
    Presuming we can free up a full instruction, then an immediate value of
    xxxxxxx 00 x xxxx xxxxxxxxx SSSSSSSSS ' Loads the value stored in register S into the "BIG" register
    xxxxxxx 10 n nnnn nnnnnnnnn nnnnnnnnn ' Load 23 immediate bits into the lower "BIG" register bits 22..0 and zero bits 31..23.
    xxxxxxx 11 n nnnn nnnnnnnnn nnnnnnnnn ' Load 23 immediate bits into the upper "BIG" register bits 31..9 and zero bits 8..0

    We would require 4 such registers for use in multi-tasking.

    Bill, yes it could be used in RD/WRxxxx
    Sounds good although I'm not sure I see the value of being able to load the BIG register from another register. Also, I think Bill said that since BIG can't be encoded as a NOP, there may not be much reason to have the form that loads the low order bits with the BIG value. How would that even work? Would there be an extra bit to say which way BIG had been loaded so the instruction decode would know how to combine it with the S bits of the next instruction? That seems overly complicated to me. Maybe we'd better let Bill chime in on whether there is still value in loading the low bits rather than the high bits.
    The instruction "BIG" would load the 23 bits into the appropriate bits in the "BIG" register. The next executed instruction would not know, or care, where the bits were loaded. Its #S field would just be ORed with the BIG register.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-12-04 21:05
    Spending time with wifey, quick response only for now :) table is not as important as option to be in bits 0..22. More tomorrow.
    David Betz wrote: »
    Could you give a concrete example of the table case? I'm having a hard time visualizing what you're talking about. Sorry to be so dense!
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-12-04 23:31
    I have again re-read this thread.

    I am still not sure of the requirement regarding HJMP, HCALL and HRET, and how they get used.
    I presume you do not need to save/restore the Z/C flags with these instructions?

    Could we simplify this whole thing a bit, and disregard multi-tasking for this mode of operation? Might simplify it quite a bit for Chip, etc.

    Does the mapping/windowing of AUX into COG help if you could map larger blocks into COG?
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-12-05 00:17
    Looking at this another way.

    I have written LMM pasm, so I understand how we FJMP, FCALL and FRET. Also I know that in this mode, it is better to have constants embedded as NOP instructions (18 bit constants) than have to setup fixed constants in cog.

    I am presuming that the GCC compiler emits code in a similar fashion.
    I presume that is why the BIG instruction is important, and I understand that.

    Currently in LMM on P1 we run a tight 4 instruction loop. In P2 that loop is a 5 instruction loop - this is what I use in my P2 Debugger...
    ''-------[ LMM execution loop ]-------------------------
    LmmLoop         rdlong  lmm_opcode, lmm_pc              ' rdlong        (read LMM hub instr into OPCODE using PC)
                    add     lmm_pc, #4                      ' PC++          (inc PC to next LMM hub instr)
    lmm_op2         nop                                     ' rdlong delay  (optional 2nd instruction execution)
    lmm_opcode      nop                                     ' rdlong result (execute the LMM hub instr)
                    jmp     #LmmLoop                        ' loop
    

    By being able to window some AUX ram into COG ram, we can now execute in place, saving the LMM execution loop.
    Presuming we window only 8*Longs (the RDWIDE instruction width) of AUX into COG $1E0..$1E7 we might do something like this...
    xxx:  RDWIDE ddd,sss  'read 8*longs into aux which is windowed into the following instructions
    'some delay to ensure the aux has been read
    1e0:  instr1  '\\ 8*longs read in by the RDWIDE instruction  
    1e1:  instr2  '||
    ...
    1e7:  instr8  '//
    1e8:  JMP #xxx 'go fetch another 8*longs
    ... some instructions to accept the FJMP, FCALL, FRET instructions
    
    Am I on the right track?
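One rough way to see what the windowed scheme above buys is to count hub reads for a straight-line run of instructions. The Python sketch below assumes that the LMM loop issues one RDLONG per instruction and that RDWIDE fetches aligned blocks of 8 longs; both are simplifying assumptions for illustration, the function names are hypothetical, and the model ignores hub-window timing and the FJMP/FCALL/FRET handlers entirely.

```python
LONGS_PER_WIDE = 8  # RDWIDE width as stated in the post

def lmm_hub_reads(n_instructions):
    """LMM: one RDLONG per hub instruction executed."""
    return n_instructions

def windowed_hub_reads(start_long, n_instructions):
    """Windowed: one RDWIDE per aligned 8-long block touched.

    start_long is the long index of the first instruction (assumed
    hub address divided by 4); blocks are assumed 8-long aligned.
    """
    if n_instructions <= 0:
        return 0
    first_block = start_long // LONGS_PER_WIDE
    last_block = (start_long + n_instructions - 1) // LONGS_PER_WIDE
    return last_block - first_block + 1
```

Under these assumptions, 16 instructions starting on a block boundary cost 16 hub reads in the LMM loop but only 2 RDWIDE reads, while an entry point in the middle of a block costs one extra RDWIDE.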
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-12-05 00:43
    Extending the above HUBEXEC (named by Bill) model (replaces LMM model)...

    This method would permit a tight DJNZ style instruction loop
    1df:  RDWIDE ddd,sss WC 'read 8*longs into aux which is windowed into the following instructions at $1E0..$1E7; WC means stall until read.
    1e0:  instr1  '\\ 8*longs read in by the RDWIDE instruction  
    1e1:  instr2  '||
    ...
    1e7:  instr8  '//
    1e8:  JMP #$1df 'go fetch another 8*longs
    ... some instructions to accept the FJMP, FCALL, FRET instructions
    

    An alternative, but the REPS instruction terminates with DJNZ style instructions..
    1dd:  REPS #9  'repeat next 9 instructions until a JMP is executed
    1de:  NOP      'spacer instruction
    1df:  RDWIDE ddd,sss WC 'read 8*longs into aux which is windowed into the following instructions at $1E0..$1E7; WC means stall until read.
    1e0:  instr1  '\\ 8*longs read in by the RDWIDE instruction  
    1e1:  instr2  '||
    ...
    1e7:  instr8  '//
    ... some instructions to accept the FJMP, FCALL, FRET instructions
    

    I have asked Chip if it were possible to
    (1) Make the RDWIDE instruction capable of delivering up to a count of 32 x 8*Long reads into AUX in the background with a tiny state machine
    (2) Map up to the whole 32 x 8*Long AUX registers into COG ram

    By mapping a large Aux block into Cog, a good set of hub instructions could be executed inline at a time, and possibly small loops could be contained
    within those blocks read, giving an enormous boost to performance.
  • cgraceycgracey Posts: 14,151
    edited 2013-12-05 01:04
    I got rid of the SETPIX0/1/2/3 instructions and made a new SETPIXW instruction that loads all eight PIX terms from the WIDE registers, all at once. So, there are four 'D/#,S/#' opcodes available now.

    I've loosely read this thread and I understand that you are looking for some opcode space for HJMP/HCALL/...

    I also see there is talk about how to have a 32-bit constant in-line. About that: I think the idea has already been posited, but we could have a dummy instruction that doesn't do anything, though its 23 LSBs are free for data payload. Any instruction that has an immediate D or S, with priority going to S, can look for this dummy instruction in the next-lower stage of the pipeline. If it sees it, and it hasn't been cancelled as trailing branch code, it will use its 23 LSBs to augment the 9-bit immediate value it already has, giving it a full 32-bit immediate for D or S. This would solve the problem, would it not?
  • ozpropdevozpropdev Posts: 2,792
    edited 2013-12-05 01:15
    cgracey wrote: »
    I got rid of the SETPIX0/1/2/3 instructions and made a new SETPIXW instruction that loads all eight PIX terms from the WIDE registers, all at once. So, there are four 'D/#,S/#' opcodes available now.

    I've loosely read this thread and I understand that you are looking for some opcode space for HJMP/HCALL/...

    I also see there is talk about how to have a 32-bit constant in-line. About that: I think the idea has already been posited, but we could have a dummy instruction that doesn't do anything, though its 23 LSBs are free for data payload. Any instruction that has an immediate D or S, with priority going to S, can look for this dummy instruction in the next-lower stage of the pipeline. If it sees it, and it hasn't been cancelled as trailing branch code, it will use its 23 LSBs to augment the 9-bit immediate value it already has, giving it a full 32-bit immediate for D or S. This would solve the problem, would it not?

    That seems to fit the 32 bit constant concept quite well.
    That would work well with MUL / DIV as well.
    Now, what to name it? :)
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-12-05 01:17
    cgracey wrote: »
    I got rid of the SETPIX0/1/2/3 instructions and made a new SETPIXW instruction that loads all eight PIX terms from the WIDE registers, all at once. So, there are four 'D/#,S/#' opcodes available now.

    I've loosely read this thread and I understand that you are looking for some opcode space for HJMP/HCALL/...

    I also see there is talk about how to have a 32-bit constant in-line. About that: I think the idea has already been posited, but we could have a dummy instruction that doesn't do anything, though its 23 LSBs are free for data payload. Any instruction that has an immediate D or S, with priority going to S, can look for this dummy instruction in the next-lower stage of the pipeline. If it sees it, and it hasn't been cancelled as trailing branch code, it will use its 23 LSBs to augment the 9-bit immediate value it already has, giving it a full 32-bit immediate for D or S. This would solve the problem, would it not?
    Yes Chip. It's what was called the "BIG" instruction in that summary post of mine.
  • cgraceycgracey Posts: 14,151
    edited 2013-12-05 01:17
    ozpropdev wrote: »
    That seems to fit the 32 bit constant concept quite well.
    That would work well with MUL / DIV as well.
    Now, what to name it? :)

    It would probably never be used for in-cog code, as it would waste a cycle as the dummy data-payload instruction floated through the pipeline, but it would provide code executing from the hub a way to have 32-bit constants without resorting to complicated means.
  • cgraceycgracey Posts: 14,151
    edited 2013-12-05 01:25
    Cluso99 wrote: »
    Yes Chip. It's what was called the "BIG" instruction in that summary post of mine.

    Super!

    It would be used like this:

    ADD reg,#bigconstant & $1FF
    BIG #bigconstant >> 9

    That would add bigconstant to reg.

    Instead of BIG, we should probably give it a name like AUGI for 'augment immediate'.

    Any instruction having an immediate S or D would look for AUGI behind it. If it sees it and it's not cancelled, it extends the immediate value right in the pipeline, before it gets to stage 4. This was a really clever idea you guys came up with, and it turns out that it can be done by using the registers already in the pipeline, so it's almost free!
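The arithmetic behind the ADD/BIG pair above can be checked with a short Python sketch: the low 9 bits travel in the instruction's own immediate field and the upper 23 bits in the payload of the trailing dummy instruction. The function names are hypothetical, and the sketch models only the bit split and recombination, not the pipeline lookup or the cancellation rule.

```python
def split_immediate(value):
    """Split a 32-bit constant into (9-bit immediate, 23-bit augment payload)."""
    value &= 0xFFFFFFFF
    return value & 0x1FF, value >> 9

def augmented_immediate(s9, aug23=None):
    """Rebuild the operand; without a trailing augment it is just the 9-bit value."""
    if aug23 is None:
        return s9 & 0x1FF
    return ((aug23 & 0x7FFFFF) << 9) | (s9 & 0x1FF)

# Equivalent of: ADD reg,#bigconstant & $1FF  /  BIG #bigconstant >> 9
bigconstant = 0xDEADBEEF
s9, aug23 = split_immediate(bigconstant)
operand = augmented_immediate(s9, aug23)
```

Since 9 + 23 = 32, the two fields cover every bit exactly once, so the recombination is lossless for any 32-bit constant.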
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-12-05 02:16
    cgracey wrote: »
    Super!

    It would be used like this:

    ADD reg,#bigconstant & $1FF
    BIG #bigconstant >> 9

    That would add bigconstant to reg.

    Instead of BIG, we should probably give it a name like AUGI for 'augment immediate'.

    Any instruction having an immediate S or D would look for AUGI behind it. If it sees it and it's not cancelled, it extends the immediate value right in the pipeline, before it gets to stage 4. This was a really clever idea you guys came up with, and it turns out that it can be done by using the registers already in the pipeline, so it's almost free!
    Great news. That is clever just reversing the order of the two instructions. And working for #D as well. WTG Chip!
    It doesn't work for multi-tasking? (That's fine I think)

    Looks like we will get that "HUBEXEC" (execute in place) model working! What a performance boost over LMM!
  • ozpropdevozpropdev Posts: 2,792
    edited 2013-12-05 02:34
    Cluso99 wrote: »
    It doesn't work for multi-tasking? (That's fine I think)

    A small price to pay for a HUGE feature! :)