Where to grow the Prop1 architecture, instruction-wise — Parallax Forums

cgracey Posts: 14,155
edited 2014-09-05 18:45 in Propeller 1
In the current Prop1 chip, there are four unused instruction opcodes:

000100 ZCRICCCC DDDDDDDDD SSSSSSSSS <MUL> D,S/#
000101 ZCRICCCC DDDDDDDDD SSSSSSSSS <MULS> D,S/#
000110 ZCRICCCC DDDDDDDDD SSSSSSSSS <ENC> D,S/#
000111 ZCRICCCC DDDDDDDDD SSSSSSSSS <ONES> D,S/#

These instructions were planned for future implementation, but are unused in the current chip and tools.

Making new instructions that work in these spaces will not impede the operation of the current tool systems. If you were to make the ADD instruction behave differently, on the other hand, the ROM code would malfunction and you wouldn't even be able to download a program. By doing new things in these four instruction spaces, you'll encounter no problems with the ROM code or current tools.

I propose, for the purpose of varied development efforts, to rename these opcodes as follows:

000100 ZCRICCCC DDDDDDDDD SSSSSSSSS USR0 D,S/#
000101 ZCRICCCC DDDDDDDDD SSSSSSSSS USR1 D,S/#
000110 ZCRICCCC DDDDDDDDD SSSSSSSSS USR2 D,S/#
000111 ZCRICCCC DDDDDDDDD SSSSSSSSS USR3 D,S/#

The assemblers can be modified to recognize and assemble these instructions so that a way exists to grow the instruction set without any fixed functional requirements.
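
To make the field layout concrete, here is a minimal Python sketch (not part of any Parallax tool) that packs a USRn instruction into a 32-bit long using the bit order from the table above. The name `encode_usr`, its keyword arguments, and the default of setting the R bit are assumptions for illustration only; what a USRn instruction actually does is left to whatever Verilog sits behind it.

```python
# Field layout from the opcode table above:
#   6-bit opcode | Z C R I | CCCC | 9-bit D | 9-bit S
USR_OPCODES = {
    "USR0": 0b000100,
    "USR1": 0b000101,
    "USR2": 0b000110,
    "USR3": 0b000111,
}

IF_ALWAYS = 0b1111  # P1 condition code meaning "always execute"


def encode_usr(name, d, s, *, immediate=False, wz=False, wc=False,
               wr=True, cond=IF_ALWAYS):
    """Pack one USRn instruction into a 32-bit long (hypothetical helper)."""
    if not 0 <= d < 512 or not 0 <= s < 512:
        raise ValueError("D and S are 9-bit fields (0..511)")
    if not 0 <= cond < 16:
        raise ValueError("CCCC is a 4-bit field (0..15)")
    # Z, C, R, I occupy bits 25..22, in that order.
    zcri = (int(wz) << 3) | (int(wc) << 2) | (int(wr) << 1) | int(immediate)
    return (USR_OPCODES[name] << 26) | (zcri << 22) | (cond << 18) | (d << 9) | s
```

For example, `encode_usr("USR0", 0x1F0, 5, immediate=True)` evaluates to `0x10FFE005`.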

Comments

  • Bill Henning Posts: 6,445
    edited 2014-08-07 13:18
    Sounds great!

    Questions

    1) Would we also not have

    xxxxxx xxxx 0000 ddddddddd sssssssss

    also available for instructions that can execute unconditionally?

    2) Is Port B left as an exercise for the readers?
  • cgracey Posts: 14,155
    edited 2014-08-07 13:21
    Bill Henning wrote: »
    Sounds great!

    Questions

    1) Would we also not have

    xxxxxx xxxx 0000 ddddddddd sssssssss

    also available for instructions that can execute unconditionally?

    2) Is Port B left as an exercise for the readers?


    1) Yes, that would work, too. Then we could implement those four planned instructions, which would be useful.

    2) I didn't add a Port B because I figured it would be a good learning opportunity for others.


    I think many of you are going to get way into hardware, from now on.
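
As a rough illustration of the CCCC = 0000 space discussed in this exchange, the Python sketch below splits a 32-bit P1 instruction long into its fields and flags words whose condition field is 0000 (never execute on the current chip). The helper names `decode` and `is_never_execute` are made up for this example; it only classifies encodings and says nothing about how modified Verilog would decode them.

```python
def decode(word):
    """Split a 32-bit P1 instruction long into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,   # bits 31..26
        "zcri":   (word >> 22) & 0xF,    # bits 25..22
        "cond":   (word >> 18) & 0xF,    # bits 21..18 (CCCC)
        "d":      (word >> 9) & 0x1FF,   # bits 17..9
        "s":      word & 0x1FF,          # bits 8..0
    }


def is_never_execute(word):
    """True when CCCC == 0000, i.e. the current chip skips the instruction."""
    return decode(word)["cond"] == 0
```

Note that a long of zero, which is what NOP assembles to on the P1, falls into this space, so any reuse of it has to keep existing NOPs and disabled instructions harmless.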
  • mindrobots Posts: 6,506
    edited 2014-08-07 13:24
    cgracey wrote: »

    I propose, for the purpose of varied development efforts, to rename these opcodes as follows:

    000100 ZCRICCCC DDDDDDDDD SSSSSSSSS USR0 D,S/#
    000101 ZCRICCCC DDDDDDDDD SSSSSSSSS USR1 D,S/#
    000110 ZCRICCCC DDDDDDDDD SSSSSSSSS USR2 D,S/#
    000111 ZCRICCCC DDDDDDDDD SSSSSSSSS USR3 D,S/#

    The assemblers can be modified to recognize and assemble these instructions so that a way exists to grow the instruction set without any fixed functional requirements.

    Excellent idea! I had wondered about that a day or so ago in the "anticipation thread". To really get into the Verilog and make significant changes, at some point you would need to get into OpenSpin and PropGCC to have them recognize your new instructions. This would solve that problem.
  • Bill Henning Posts: 6,445
    edited 2014-08-07 13:45
    Perhaps leave MUL & DIV, and have ENC and ONES become USR0 & USR1?

    Plus we decode the "never execute" opcodes.

    Agreed port B is an excellent user exercise :)

    Dang it! There goes what little "free" time I had... where is my Arrow account... (at least I know where my DE0-Nano's are)
    cgracey wrote: »
    1) Yes, that would work, too. Then we could implement those four planned instructions, which would be useful.

    2) I didn't add a Port B because I figured it would be a good learning opportunity for others.


    I think many of you are going to get way into hardware, from now on.
  • jmg Posts: 15,173
    edited 2014-08-07 14:26
    mindrobots wrote: »
    ....to really get into the verilog and make significant changes, at some point you would need to get into OpenSpin and PropGCC to have it recognize your new instructions.

    In the medium term, a macro assembler can manage new opcodes at the ASM level, and I think PropGCC has a macro assembler.

    Of course, longer term, PropGCC and OpenSpin can also code generate for new opcodes, but that will need more care, and a target-version-management scheme.
  • David Betz Posts: 14,516
    edited 2014-08-07 14:29
    jmg wrote: »
    In the medium term, a macro assembler can manage new opcodes at the ASM level, and I think PropGCC has a macro assembler.

    Of course, longer term, PropGCC and OpenSpin can also code generate for new opcodes, but that will need more care, and a target-version-management scheme.

    I already modified PropGCC a while back to use MUL and DIV from the new P2 instruction set. It wasn't that hard.
  • SRLM Posts: 5,045
    edited 2014-08-07 14:35
    David Betz wrote: »
    I already modified PropGCC a while back to use MUL and DIV from the new P2 instruction set. It wasn't that hard.

    How hard is it to modify the compiler to accept different HUB sizes for the P1? The problem is that the compiler gives an error when 32KB is exceeded. IIRC, the hub size is hard coded in some file that is built into the compiler when propgcc is itself compiled.
  • jmg Posts: 15,173
    edited 2014-08-07 14:41
    David Betz wrote: »
    I already modified PropGCC a while back to use MUL and DIV from the new P2 instruction set. It wasn't that hard.

    The problem will be more in the 'which MUL and (any) DIV' management.
    All take room, and some ARMs, for example, either skip DIV or include it in ROM.
    What the compiler uses needs to match what the silicon does, and I can see that getting murky quickly.

    Another way to manage this is to treat the new opcodes as in-line HW function calls.
    For example, someone could code a scaled multiply (A * B) / C, where it is nice to have an interim result of 64 bits, but that does not need to be in register space.
    Likewise, a DivMod in-line HW function that returns both quotient and remainder can give tighter code.
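
A small Python model of the two "in-line HW function" ideas above may help show why the 64-bit interim result matters. This is only a behavioral sketch under assumed semantics (unsigned 32-bit operands, results truncated to 32 bits); the function names are hypothetical and do not correspond to any defined opcode.

```python
MASK32 = 0xFFFFFFFF


def muldiv_32bit_interim(a, b, c):
    """(a * b) / c when the product is truncated to 32 bits before dividing."""
    return ((a * b) & MASK32) // c


def muldiv_64bit_interim(a, b, c):
    """(a * b) / c when the full 64-bit product is kept internally."""
    return ((a * b) // c) & MASK32


def divmod_u32(n, d):
    """Model of a DivMod op: one operation yields quotient and remainder."""
    q, r = divmod(n, d)
    return q & MASK32, r & MASK32
```

With a = b = 100_000 and c = 1000, the 64-bit interim version returns 10_000_000, while the version that truncates the product to 32 bits first returns 1_410_065, which is the overflow the wider interim result avoids.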
  • jmg Posts: 15,173
    edited 2014-08-07 14:44
    SRLM wrote: »
    How hard is it to modify the compiler to accept different HUB sizes for the P1? The problem is that the compiler gives an error when 32KB is exceeded. IIRC, the hub size is hard coded in some file that is built into the compiler when propgcc is itself compiled.

    Most controller families have config files that set these resource-check warning limits, in a set of small files somewhere.
    Those would need to come for all the Verilog-P1 variants (and also P2).
  • David Betz Posts: 14,516
    edited 2014-08-07 14:55
    jmg wrote: »
    The problem will be more in the 'which MUL and (any) DIV' management.
    All take room, and some ARMs, for example, either skip DIV or include it in ROM.
    What the compiler uses needs to match what the silicon does, and I can see that getting murky quickly.

    Another way to manage this is to treat the new opcodes as in-line HW function calls.
    For example, someone could code a scaled multiply (A * B) / C, where it is nice to have an interim result of 64 bits, but that does not need to be in register space.
    Likewise, a DivMod in-line HW function that returns both quotient and remainder can give tighter code.

    This can be handled by compiler and linker options. For example, -p2 selects P2 code generation in the trunk version of PropGCC.
  • jazzed Posts: 11,803
    edited 2014-08-07 15:43
    SRLM wrote: »
    How hard is it to modify the compiler to accept different HUB sizes for the P1? The problem is that the compiler gives an error when 32KB is exceeded. IIRC, the hub size is hard coded in some file that is built into the compiler when propgcc is itself compiled.


    IIRC, all you need is a different .ld link file. I'm pretty sure the default .ld file set creates this restriction in CMM/LMM to keep people from getting horribly confused.
  • David Betz Posts: 14,516
    edited 2014-08-07 15:47
    jazzed wrote: »
    IIRC, all you need is a different .ld link file. I'm pretty sure the default .ld file set creates this restriction in CMM/LMM to keep people from getting horribly confused.

    Yes, a different linker script is all that is needed.
  • rogloh Posts: 5,794
    edited 2014-08-07 20:46
    Bill Henning wrote: »
    Perhaps leave MUL & DIV, and have ENC and ONES become USR0 & USR1?

    Plus we decode the "never execute" opcodes.

    Agreed port B is an excellent user exercise :)

    Hi Bill,
    there was no DIV opcode on the P1, only unimplemented MUL and MULS (signed) from what I can see.

    It would be really nice to gain the 16x16 multiplier operation using FPGA hardware - single instruction cycle if possible. I don't know much about how it all works yet.

    Port B would naturally make a good second IO port for FPGAs with enough pins brought out, or a quick way to control internal/custom FPGA peripherals without too much mucking about. Having all 3 registers (dirb, outb, and inb) each readable and writable gives plenty of flexibility there. You could perhaps use it to issue SDRAM operations if you wanted to, or try to use the hub to coordinate things better if multiple COGs are sharing it.
  • Cluso99 Posts: 18,069
    edited 2014-08-07 21:30
    I missed this new forum - I had a direct link to the old (now sticky) thread :(

    HUBEXEC

    IMHO this feature is going to be a must have feature for most users here.

    This requires at least 2 instructions with a large number of immediate bits. They are

    CALLX/JMPX/RETX #nnnnnnnnn_nnnnnnnnn (18 bits reserved for the absolute goto address)

    LOAD #nnnnnnnnn_nnnnnnnnn (18 bits reserved for the immediate load/address)

    I propose to use the two opcode instruction spaces reserved for ENC and ONES.
    Both these instructions will use the combined D & S fields to store the constant value.
    I propose that the 18 bits all be available. Currently that would permit hub being 18 bits = 256KB.

    The 'LOAD' instruction requirements are not fully defined yet. We know what was done in the P2 but that was quite a bit more complicated and there were a number of instructions. IMHO we don't have that luxury while keeping code backward compatible.
    This 'LOAD' type instruction may go thru a few iterations... simple at first to test the Verilog waters.

    There are also instructions in the SYSOP opcode that can be further utilised along the P2 lines (particularly the early versions). It might be necessary to put the RETX here.

    For now, utilising the CCCC field for other than conditional execution is problematic in the Verilog code, at least for me as a Verilog beginner.

    BTW I would not like to utilise the CCCC=0000 for further instructions. Remember, it was/is an easy way to disable instructions, so there is likely code relying on this.
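
The 18-bit immediate idea in the post above can be sketched in Python as follows. Because D occupies bits 17..9 and S bits 8..0, the combined D and S fields are simply the low 18 bits of the long. Which opcode slot would carry which instruction, and how ZCRI would be used, is not settled in the thread, so `encode_imm18`, its defaults, and the choice of the ENC slot in the usage note are assumptions for illustration.

```python
ENC_SLOT = 0b000110    # opcode space reserved for ENC in the original table
ONES_SLOT = 0b000111   # opcode space reserved for ONES

IMM_BITS = 18
HUB_BYTES_ADDRESSABLE = 1 << IMM_BITS   # 262144 bytes = 256KB


def encode_imm18(opcode, value, cond=0b1111, zcri=0b0000):
    """Place an 18-bit constant across the combined D and S fields."""
    if not 0 <= value < (1 << IMM_BITS):
        raise ValueError("constant must fit in 18 bits")
    return (opcode << 26) | (zcri << 22) | (cond << 18) | value


def decode_imm18(word):
    """Recover the 18-bit constant from the D and S fields."""
    return word & 0x3FFFF
```

For instance, `encode_imm18(ENC_SLOT, 0x3FFFF)` yields `0x183FFFFF`, and `decode_imm18` of that word returns `0x3FFFF`, the top of a 256KB byte-addressed hub.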
  • David Betz Posts: 14,516
    edited 2014-08-07 21:36
    Cluso99 wrote: »
    I missed this new forum - I had a direct link to the old (now sticky) thread :(

    HUBEXEC

    IMHO this feature is going to be a must have feature for most users here.

    This requires at least 2 instructions with a large number of immediate bits. They are

    CALLX/JMPX/RETX #nnnnnnnnn_nnnnnnnnn (18 bits reserved for the absolute goto address)

    LOAD #nnnnnnnnn_nnnnnnnnn (18 bits reserved for the immediate load/address)

    I propose to use the two opcode instruction spaces reserved for ENC and ONES.
    Both these instructions will use the combined D & S fields to store the constant value.
    I propose that the 18 bits all be available. Currently that would permit hub being 18 bits = 256KB.

    The 'LOAD' instruction requirements are not fully defined yet. We know what was done in the P2 but that was quite a bit more complicated and there were a number of instructions. IMHO we don't have that luxury while keeping code backward compatible.
    This 'LOAD' type instruction may go thru a few iterations... simple at first to test the Verilog waters.

    There are also instructions in the SYSOP opcode that can be further utilised along the P2 lines (particularly the early versions). It might be necessary to put the RETX here.

    For now, utilising the CCCC field for other than conditional execution is problematic in the Verilog code, at least for me as a Verilog beginner.

    BTW I would not like to utilise the CCCC=0000 for further instructions. Remember, it was/is an easy way to disable instructions, so there is likely code relying on this.

    One problem with adding hubexec to P1 is that it will require a new code generator for PropGCC and Catalina to make any use of it.
  • pik33 Posts: 2,366
    edited 2014-08-07 21:42
    MUL can be implemented "at once" without using many resources; the FPGA has multipliers built in. Many of them. Hardware accelerated 16x16 MUL should be as fast as ADD. 32x32 MUL may take longer, since 2 registers have to be written.
  • Cluso99 Posts: 18,069
    edited 2014-08-07 21:47
    David Betz wrote: »
    One problem with adding hubexec to P1 is that it will require a new code generator for PropGCC and Catalina to make any use of it.

    I coded my P2 debugger in PASM and used LMM. This was so much simpler when I did the same for an LCD driver.
  • David Betz Posts: 14,516
    edited 2014-08-07 21:48
    Cluso99 wrote: »
    I coded my P2 debugger in PASM and used LMM. This was so much simpler when I did the same for an LCD driver.

    Yes, that is certainly possible.
  • rogloh Posts: 5,794
    edited 2014-08-07 22:57
    pik33 wrote: »
    MUL can be implemented "at once" without using many resources; the FPGA has multipliers built in. Many of them. Hardware accelerated 16x16 MUL should be as fast as ADD. 32x32 MUL may take longer, since 2 registers have to be written.

    That's what I was hoping/anticipating. MUL in a single instruction cycle, just like ADD today. Saves code space/time vs the successive addition approach. Even the simple AVR already has MUL (albeit 8x8) and it came in handy before for me with some real time audio scaling stuff.
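
For comparison with a single-instruction MUL, here is a Python model of the kind of software multiply a hardware MUL would replace. It uses shift-and-add, one common form of the successive-addition approach mentioned above; it is an illustrative sketch, not code from any Propeller library, and the step count is just the loop iteration count, not a cycle measurement.

```python
def mul16_shift_add(a, b):
    """Unsigned 16x16 -> 32-bit multiply by shift-and-add.

    Returns (product, steps), where steps is the number of loop
    iterations taken (always 16 in this simple form).
    """
    if not (0 <= a < (1 << 16) and 0 <= b < (1 << 16)):
        raise ValueError("operands must be unsigned 16-bit values")
    product = 0
    steps = 0
    for _ in range(16):
        if b & 1:          # add the shifted multiplicand when this bit is set
            product += a
        a <<= 1            # next bit position of the multiplicand
        b >>= 1            # consume one bit of the multiplier
        steps += 1
    return product, steps
```

Each of the 16 iterations costs several instructions in software, which is the code space and time a hardware multiplier would save.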
  • Roy Eltham Posts: 3,000
    edited 2014-08-08 12:44
    Someone implement some form of indirection on this!

    :)
  • potatohead Posts: 10,261
    edited 2014-08-08 13:19
    A lot of instructions could be supported on real Props as LMM pseudo-ops.
  • Dr_Acula Posts: 5,484
    edited 2014-08-08 17:55
    pik33 wrote: »
    MUL can be implemented "at once" without using many resources; the FPGA has multipliers built in. Many of them. Hardware accelerated 16x16 MUL should be as fast as ADD. 32x32 MUL may take longer, since 2 registers have to be written.

    +1 to that as well. Could do some nifty and fast FFT and DSP.
  • User Name Posts: 1,451
    edited 2014-08-08 19:43
    Roy Eltham wrote: »
    Someone implement some form of indirection on this!

    :)

    +1 :)
  • overclocked Posts: 80
    edited 2014-08-09 00:32
    32x32 MUL will probably not be slow either, because the MUL instances are so quick that they can probably be clocked higher for internal multi-cycle work.
    My guess is you can have a 32x32 MUL running together with your Propeller at "full" speed.
  • pik33 Posts: 2,366
    edited 2014-08-09 00:42
    You have to write 2 cog registers after a 32x32 MUL, so it may need another clock cycle.
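
The point about two result registers can be seen in a short Python model. It builds a 32x32 product from four 16x16 partial products, which is one plausible way to map the operation onto narrower hardware multipliers (an assumption here, not something stated in the thread), and returns the 64-bit result as separate high and low 32-bit longs. The function name is hypothetical.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mul32_via_16x16(a, b):
    """Unsigned 32x32 -> 64-bit multiply from four 16x16 partial products.

    Returns (high_long, low_long): the two 32-bit values that would
    have to be written back to cog registers.
    """
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16
    ll = a_lo * b_lo            # contributes at bit 0
    lh = a_lo * b_hi            # contributes at bit 16
    hl = a_hi * b_lo            # contributes at bit 16
    hh = a_hi * b_hi            # contributes at bit 32
    product = ll + ((lh + hl) << 16) + (hh << 32)
    return (product >> 32) & MASK32, product & MASK32
```

Since the result no longer fits in one destination register, the instruction either needs a second write or a follow-up instruction to fetch the other half.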
  • rjo__ Posts: 2,114
    edited 2014-09-05 04:21
    Where are we with giving Saps like me access to usr0...usr3?
  • Cluso99 Posts: 18,069
    edited 2014-09-05 07:00
    You can code the verilog to implement the usr instructions.
    Meanwhile for testing, you can use a long to define the usr instructions - you manually compile each usr instruction. That's how I am testing the AUGDS and JMPRETX instructions.
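
The "use a long" workaround described above can be automated with a few lines of Python that print a `long` line to paste into a DAT section. This is a sketch of one possible helper, not the script Cluso99 used; the name `usr_long_directive`, the default ZCRI value, and the output spacing are all assumptions. It relies only on Spin's `%` binary literals with `_` separators.

```python
def usr_long_directive(n, d, s, zcri=0b0010, cond=0b1111):
    """Return a PASM 'long' line hand-assembling USRn D,S (hypothetical helper).

    n is 0..3 for USR0..USR3; d and s are the 9-bit D and S field values.
    """
    if n not in range(4):
        raise ValueError("n must be 0..3 (USR0..USR3)")
    if not (0 <= d < 512 and 0 <= s < 512):
        raise ValueError("D and S are 9-bit fields (0..511)")
    opcode = 0b000100 + n   # USR0..USR3 occupy opcodes 000100..000111
    return "long    %{:06b}_{:04b}_{:04b}_{:09b}_{:09b}".format(
        opcode, zcri, cond, d, s)
```

Calling `usr_long_directive(0, 0x1F0, 5, zcri=0b0011)` returns `long    %000100_0011_1111_111110000_000000101`, which an unmodified assembler will accept as plain data.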
  • rjo__ Posts: 2,114
    edited 2014-09-05 18:45
    Cluso99 et al

    Thanks. Sorry I didn't respond sooner. I should have said that I was on the road and wouldn't be able to check in for a day or two.

    From Chip's remarks and my own (occasionally misguided) understanding I thought that a somewhat generic compiler interface was possible.
    So, for instance, I could write code that made sense to my Verilog substrate (using usr0) and others could write code that made sense to their Verilog substrates (using usr0), and we could all use (pretty much) the same compiler. But since our Verilog implementations would vary, our results would be specific to our own Verilog implementation. I understand that there will be examples that fall outside of this, but is it generally correct?