Replacing the P1 JMPRET instruction in P2

Cluso99 · 2018-10-28 06:38

JMPRET D,{#}S {wc,wz,nr}

010111_zcri_cccc_ddddddddd_sssssssss    JMPRET D,{#}S  {wc,wz,nr}
  result:  Z=0;  C=1 unless PC+1=0
-----------------------------------------------------------------
010111_zc01_cccc_xxxxxxxxx_sssssssss    RET            {wc,wz}
010111_zc0i_cccc_xxxxxxxxx_sssssssss    JMP      {#}S  {wc,wz}
010111_zc1i_cccc_ddddddddd_sssssssss    CALL     {#}S  {wc,wz}
010111_zc1i_cccc_ddddddddd_sssssssss    JMPRET D,{#}S  {wc,wz}

P2

Sometimes we can replace the P1 JMPRET/CALL/RET instructions with the P2 CALL & RET instructions, which use the P2's internal stack.

However, sometimes this replacement is not possible. These are the cases where the 9-bit addresses are held in registers whose upper bits may be non-zero.
These cases have to be looked at individually.

Here is one example...
I have a table of vectors. Each 32-bit table entry contains three 9-bit register addresses and a 5-bit set of flags, so each entry represents up to three call addresses plus the flags. After I have located the respective table entry, I copy it and then jump/call to the lowest 9-bit entry (in the "S" field). I then ROR #9 to move the "D" entry into the "S" field, and again jump/call the lowest 9-bit entry. This may be repeated a third time.
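A small Python model of such a table entry may make the layout concrete. It assumes the flags sit in bits 31:27 (the post gives only the field sizes, not their positions), and it shifts with SHR as the interpreter code later in the thread does; ROR gives the same S fields for all three calls because the three vectors only occupy 27 bits.

```python
# Model of one vector-table entry. ASSUMED layout (only the field sizes are
# given in the post): flags[31:27] | v3[26:18] | v2[17:9] | v1[8:0]
MASK9 = 0x1FF

def pack_entry(flags, v1, v2, v3):
    """Pack 5 flag bits and three 9-bit cog addresses into one 32-bit long."""
    assert 0 <= flags < 32
    assert all(0 <= v <= MASK9 for v in (v1, v2, v3))
    return (flags << 27) | (v3 << 18) | (v2 << 9) | v1

def call_sequence(entry):
    """S field (bits 8:0) presented to each of the three jump/calls,
    shifting the copy right by 9 between calls."""
    targets = []
    for _ in range(3):
        targets.append(entry & MASK9)  # a P1 indirect jump uses S[8:0] only
        entry >>= 9                    # move the next vector into S
    return targets

entry = pack_entry(0b10101, 0x010, 0x1F0, 0x123)
print([hex(t) for t in call_sequence(entry)])  # ['0x10', '0x1f0', '0x123']
```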

In the P2, the CALLx instructions save the return address as the 32-bit value {C, Z, 10'b0, PC+1[19:0]}, and when the target is not immediate they jump to the address held in bits 19:0 of the register.

Because storing the return address overwrites all 32 bits of the register, it is not possible to store the return address in the RET (JMP) instruction itself as we did in P1.
So we need an extra register just to hold the return address.
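The difference can be shown with a short Python sketch of the two writes. The constant 0x5C7C0000 is the P1 encoding of RET (an immediate JMP with S = 0, i.e. 010111 0001 1111 followed by two zero 9-bit fields, per the table at the top); the P2 side follows the {C, Z, 10'b0, PC[19:0]} format described above.

```python
MASK9 = 0x1FF
MASK20 = 0xFFFFF
P1_RET = 0x5C7C0000  # 010111 0001 1111 000000000 000000000: JMP # with S = 0

def p1_jmpret_write(d_value, return_addr):
    """P1 JMPRET/CALL: only D[8:0] is replaced, so a RET stored in D keeps
    its opcode, condition and D field (bits 31:9)."""
    return (d_value & 0xFFFFFE00) | (return_addr & MASK9)

def p2_call_write(c, z, return_addr):
    """P2 CALLx: the whole long becomes {C, Z, 10'b0, PC[19:0]}."""
    return (c << 31) | (z << 30) | (return_addr & MASK20)

print(hex(p1_jmpret_write(P1_RET, 0x123)))  # 0x5c7c0123 (still a RET, now to $123)
print(hex(p2_call_write(1, 0, 0x123)))      # 0x80000123 (opcode bits are gone)
```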

In my example above, I also require an extra instruction to copy the 9-bit vector address from the "S" bits into another register that has bits 19:9 cleared (if bit 9 were set, the jump would land in LUT RAM rather than cog RAM). This could be the same register in which the return address will be stored.
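A Python sketch of that extra step, assuming SETS copies S[8:0] into D[8:0] and a P2 register-indirect branch uses bits 19:0. The region boundaries ($000-$1FF cog, $200-$3FF LUT, $400 and up hub) are the P2 address map; the vector value is a made-up example.

```python
MASK20 = 0xFFFFF

def sets(d_value, s_value):
    """SETS D,S: D = {D[31:9], S[8:0]}."""
    return (d_value & 0xFFFFFE00) | (s_value & 0x1FF)

def p2_indirect_target(reg):
    """A P2 register-indirect branch goes to reg[19:0]."""
    return reg & MASK20

def region(addr):
    """P2 execution region for a 20-bit address."""
    if addr <= 0x1FF:
        return "cog"
    if addr <= 0x3FF:
        return "lut"
    return "hub"

vector = 0xFFFFF0AB  # hypothetical: S field $0AB, upper bits non-zero
print(region(p2_indirect_target(vector)))               # hub (jumps to $FF0AB)
print(region(p2_indirect_target(sets(0, vector))))      # cog (jumps to $0AB)
print(region(p2_indirect_target(sets(0x200, vector))))  # lut (bit 9 was set)
```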

Cluso99 · 2018-11-09 19:46

Copied from the New Pin Instructions thread

I am concerned about the feature creep.

How is this going to impact the current silicon testing?

Having said this, I am finding a problem converting P1 code. The big "gotcha" is the missing equivalent JMPRET [D],[#]S instruction. The problem comes with self-modifying code, and the direct 9-bit jumps. There were other tricks done with the WC and WZ setting but these are rarer and can be overcome with additional instructions.

It would make a huge difference having a JMPRET instruction equivalent. What does it need?
It needs a bit for the NR case, where the instruction is just a JMP or RET and no return address is written. A bit for immediate #S (an "I" bit) is required. S is either an immediate goto address in cog, or a cog register storing a 9-bit cog address in its S bits.
For the JMPRET or CALL equivalent, the cog return address is written to the S bits in the cog register pointed to in the 9-bit D address.
Note this instruction only works in COG. Addresses are direct, not relative, because they can also be set using SETS and SETD (formerly MOVS and MOVD) instructions in self-modifying code.

Now, we can use the JMP {#}S direct 20-bit address instruction for the JMP and RET replacement. The SETS instruction works here too; the upper address bits just remain as 0's. BUT there is the case where we use JMP S to jump to an address stored in the S bits of the cog register pointed to by the S address. That S register might have higher bits set (i.e. not 0's), and in that case the JMP S direct 20-bit instruction doesn't work.

Next are the JMPRET and CALL equivalents, which are really just a JMPRET D,{#}S where D is a 9-bit cog register whose S field receives the 9-bit cog return address, and {#}S is the 9-bit cog address to jump to. Optional WC and WZ bits would be nice, but as I said they can be worked around. They would be set as per P1, not P2. A shortcut can be used to just set Z and C. In P1, WZ was used to set NZ, because the only way Z could be set is if the code wrapped to cog $000 from the shadow register $1FF, which is almost impossible. I cannot recall if WC was ever used and how it was set.

Thoughts please.

Cluso99 · 2018-11-09 19:47

Dave Hein replied

You can use CALLD in place of JMPRET. That's what I do for p2gcc. There are two forms of CALLD. One is CALLD D,{#}S. This allows the target address to be in cog RAM, or it can be an immediate value between 0 and 511. The other form is CALLD PA/PB/PTRA/PTRB,#A, where A is a 20-bit immediate value.

p2gcc always uses CALLD PA,#A. If you are running from cog memory you can use the first form. If the target address is in hub RAM, the return address will need to be in PA, PB, PTRA or PTRB.

Cluso99 · 2018-11-09 21:11

Here are the JMPRET and variant instructions in my Faster Spin Interpreter for P1 which I am trying to get running on P2.
It's a real P1 program and JMPRET is a real bugbear to convert.

The easy parts. There are lots of these and fortunately they directly translate and compile...

                        jmp     #xxxx

and this one...

        if_c            jmp     #callobj        wz      'obj[].sub? (z=0) i.e. c+nz

which can easily be converted into two instructions (the wz forces nz)...

        if_c            modz    _clr                    ' set nz
        if_c            jmp     #callobj                'obj[].sub? (z=0) i.e. c+nz

Next are these...

:restore                movs    pushret,#loop           'restore pushret, followed by write
where...
pushret                 jmp     #loop

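On the P2, jmp #loop compiles to an immediate 20-bit address with the top 11 bits zeroed, so this will work after converting MOVS to SETS, provided the jump is assembled with absolute addressing (jmp #\loop). PNut assembles a cog-to-cog jmp #loop as relative by default, and then SETS would overwrite an offset rather than an address.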
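A Python sketch of why SETS can patch an absolute P2 JMP #A. It assumes the field layout from the P2 instruction table, EEEE 1101100 R AAAAAAAAAAAAAAAAAAAA (condition, opcode, relative bit, 20-bit address), with R = 0 as forced by jmp #\loop; the cog addresses used are hypothetical.

```python
COND_ALWAYS = 0b1111   # EEEE = 1111: always execute
JMP_A = 0b1101100      # JMP #{\}A opcode, bits 27:21
MASK20 = 0xFFFFF

def encode_jmp(addr, relative=False):
    """EEEE 1101100 R AAAAAAAAAAAAAAAAAAAA (addr is an offset if relative)."""
    return (COND_ALWAYS << 28) | (JMP_A << 21) | (int(relative) << 20) | (addr & MASK20)

def sets(d_value, s_value):
    """SETS D,#S: D = {D[31:9], S[8:0]}."""
    return (d_value & 0xFFFFFE00) | (s_value & 0x1FF)

pushret = encode_jmp(0x040)     # hypothetical: loop at cog $040
pushret = sets(pushret, 0x055)  # movs pushret,#x becomes sets pushret,#x
print(hex(pushret & MASK20))    # 0x55: the absolute target was patched
print((pushret >> 20) & 1)      # 0: still absolute (R bit untouched)
```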

:restore                movs    pushret,#loop           'restore pushret, followed by write
where...
                        jmp     pushret
pushret                 long    loop                    'cog address of loop (bits 31:9 are 0)

These subroutines can easily use the internal stack instructions since they do not call any routines within them...

getadrs                 rdbyte  op2,pcurr               'get first byte
...
getret                  ret

popayx                  sub     dcurr,#4
                        rdlong  a,dcurr
popyx                   sub     dcurr,#4
                        rdlong  y,dcurr
popx                    sub     dcurr,#4
popxr                   rdlong  x,dcurr
popx_ret
popyx_ret
popayx_ret              ret

range   if_c            xor     a,y                     'if reverse range, swap range values
...
range_ret               ret

So this takes care of these call instructions...

                        call    #range                  'check if x in range y..a according to c
        if_z_and_nc     call    #popyx                  'if look true or casedone, pop target and address
        if_nc           call    #popx                   'register bit?
        if_c            call    #popyx                  'register range?
        if_c            call    #popx                   'yes, pop and scale
        if_nc_and_nz    call    #popx                   'write?
        if_z            call    #popx
                        call    #popayx                 'pop data (a=to, y=from, t1=step)
                        call    #range                  'check if x in range y..a according to c

Next we need to look closely at these... Note the wz can be fixed with an extra MODZ instruction.

jmpadr                  jmpret  getret,#getadrs         'get sign-extended address
j3_12                   jmpret  getret,#getadrs wz      'case, get sign-extended address (c=same, z=0)
        if_c_or_nz      jmpret  getret,#getadrs
                        jmpret  getret,#getadrs         'get address

Fortunately, after examining the GETADRS routine, we can again use the internal stack (as we did for the RANGE etc routines), so these JMPRET GETRET,#GETADRS become a simple CALL #GETADRS.

That leaves only these to look at closely...

                        jmpret  pushret,#read           'modifier or mathop, read var (c=1 if mathop)
                        jmpret  pushret,#vector1        'do mathop (via vector) (c=1)

When we check the instruction at pushret we find...

pushret                 jmp     #loop

We need to replace this instruction with the 2 instruction sequence...

                        jmp     pushret
pushret                 long    loop                    'cog address of loop (bits 31:9 are 0)

And we need to change the JMPRET instructions to...

                        calld   pushret,#read           'modifier or mathop, read var (c=1 if mathop)
                        calld   pushret,#vector1        'do mathop (via vector) (c=1)
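A Python sketch of the round trip, assuming CALLD writes {C, Z, 10'b0, PC[19:0]} into D and JMP D continues at D[19:0]. It also checks that the :restore step (movs converted to sets pushret,#loop) still works after CALLD has left C and Z in bits 31:30. The address of loop is hypothetical.

```python
MASK20 = 0xFFFFF
LOOP = 0x040  # hypothetical cog address of 'loop'

def calld_write(c, z, return_pc):
    """Value CALLD D,#S writes into D: {C, Z, 10'b0, PC[19:0]}."""
    return (c << 31) | (z << 30) | (return_pc & MASK20)

def sets(d_value, s_value):
    """SETS D,#S: D = {D[31:9], S[8:0]}."""
    return (d_value & 0xFFFFFE00) | (s_value & 0x1FF)

def jmp_reg(d_value):
    """JMP D: next PC = D[19:0]; C and Z in bits 31:30 are ignored."""
    return d_value & MASK20

pushret = LOOP                      # pushret  long  loop
assert jmp_reg(pushret) == LOOP     # jmp pushret goes to loop
pushret = calld_write(1, 1, 0x0A1)  # calld pushret,#read executed at $0A0
assert jmp_reg(pushret) == 0x0A1    # read returns via jmp pushret
pushret = sets(pushret, LOOP)       # :restore  sets pushret,#loop
assert jmp_reg(pushret) == LOOP     # leftover C/Z bits do no harm
```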

Finally, we get to the hardest parts...

This is the main loop, which is executed for every Spin bytecode. Each bytecode has a 32-bit table entry consisting of 5 flag bits plus three 9-bit subroutine addresses, which I've called vectors.

main_loop               mov     x,#0                    'reset x

                        rdbyte  op,pcurr                'get opcode
                        add     pcurr,#1
                        mov     a,op                    'preset a (for mathop)
                        test    op,#%01         wz      'get flags                          
                        test    op,#%10         wc      '(note varop requires c=1)

                        mov     vector,op               'get the offset (bytecode op)
vector1                 shl     vector,#2               'convert to longs (*4)
                        add     vector,vector_base      'add the hub base address
                        rdlong  vector,vector           'get the vector_table entry

                        jmpret  popx_ret,vector         'indirect call to 1st vector (return address in popx_ret=vector_ret)
vector2                 shr     vector,#9
                        jmpret  popx_ret,vector         'indirect call to 2nd vector (return address in popx_ret=vector_ret)
vector3                 shr     vector,#9
                        jmpret  popx_ret,vector         'indirect call to 3rd vector (return address in popx_ret=vector_ret)
                        jmp     #$                      'never gets here!!!  (can save this instr)   '<===
vector_base             long    0-0                     'base of vector_table
vector                  long    0-0                     'vector(s)

There is no equivalent for these three JMPRET instructions, so the whole loop has to be physically re-coded using extra MOVS (SETS) instructions.
The reason is that vector contains up to three 9-bit addresses, so when a JMPRET xxxx,vector is executed, an indirect jump through vector results. In P1, that indirect jump takes the lowest 9 bits of vector. In P2, it takes 20 bits, and the top 11 of those bits are not necessarily 0's !!!
I have not worked out the conversion for this code at this time.
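A Python sketch of the problem, using the same assumed entry layout as before (flags in bits 31:27) and example addresses. It computes the targets of the three jmpret popx_ret,vector calls with shr vector,#9 between them, once with the P1 9-bit mask and once with the P2 20-bit mask. Masking to 9 bits first (for example with a SETS into a register whose bits 19:9 are already zero) restores the P1 targets, at the cost of one extra instruction per call.

```python
MASK9 = 0x1FF
MASK20 = 0xFFFFF

def pack_entry(flags, v1, v2, v3):
    # ASSUMED layout: flags[31:27] | v3[26:18] | v2[17:9] | v1[8:0]
    return (flags << 27) | (v3 << 18) | (v2 << 9) | v1

def dispatch_targets(entry, addr_mask):
    """Targets of the three 'jmpret popx_ret,vector' calls, with
    'shr vector,#9' between them; addr_mask = bits the branch uses."""
    targets = []
    for _ in range(3):
        targets.append(entry & addr_mask)
        entry >>= 9
    return targets

entry = pack_entry(0b10101, 0x010, 0x1F0, 0x123)
p1 = dispatch_targets(entry, MASK9)   # the intended cog addresses
p2 = dispatch_targets(entry, MASK20)  # upper bits leak into every target
fixed = [t & MASK9 for t in p2]       # mask to 9 bits before each branch
print([hex(t) for t in p1], [hex(t) for t in p2])
```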

Please note none of this has been tested at this time

It would be so much easier if we had a JMPRET equivalent, so that we could just globally change the JMPRET, CALL, JMP and RET instructions to a new instruction that works the same as the P1 JMPRET instruction, i.e. uses 9-bit cog addresses only.

ozpropdev · 2018-11-10 02:40

Ray
FYI, you still need to include the 'wz' for 'modz' to work correctly.

        if_c            modz    _clr    wz              ' set nz
        if_c            jmp     #callobj                'obj[].sub? (z=0) i.e. c+nz

Maybe Pnut needs to automatically assume that's the case.

Cluso99 · 2018-11-10 03:01

Yes, I had presumed pnut would supply the WZ. It's just a sub-variant of the MODCZ instruction.

Cluso99 · 2018-11-15 07:02

Time to push this further because it is important for converting every P1 PASM program.

JMPRET Instructions

010111 zcri eeee ddddddddd sssssssss  JMPRET D,#/S
010111 zc0i eeee --------- sssssssss  JMP      #/S
010111 zc01 eeee --------- ---------  RET
010111 zc1i eeee ddddddddd sssssssss  CALL     #/S  'D is supplied by the compiler

We often also use this to change the return address

...... zc1i .... ddddddddd sssssssss  MOVS   D,#/S  'change the address in a JUMP or RET instruction

JMPRET and its forms do not exist in P2. There are a few problems with replacing them on the P2...

1. CALLs always save C & Z and 20-bit addresses. The return address cannot be placed into a JMP/RET instruction.
So every JMP/RET instruction needs to be identified by hand in P1 code and replaced with an indirect JMP/RET plus a new register to hold the address. This takes an extra long for every JMP/RET.

2. JMP, CALL & RET instructions quite often have their "goto" addresses modified at execution time using the MOVS instruction. Provided the replacement JMP/CALL/RET gets converted to use the absolute addressing, and the address is immediate, all will be fine with this.

3. If the JMP/CALL/RET "S" address is indirect, then this indirect register needs checking. If it is purely an address, all will be fine. However, if the address is part of an instruction or some other data where the upper bits [31:9] are non-zero, or are used, then this will fail, because JMP/CALL/RET instructions on P2 expect 20-bit addresses.

Conversion of every P1 PASM program will require close scrutiny for self-modifying code and every use of JMP/CALL/RET/JMPRET instructions.
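Some of that scrutiny could be triaged mechanically. Below is a rough Python sketch of a hypothetical helper (not an existing tool) that flags each P1 branch line as a RET, an immediate branch (check for MOVS/MOVD patching) or an indirect branch (check the upper bits of the register). It assumes one instruction per line and ' comments, and it does not find the MOVS/MOVD instructions themselves.

```python
BRANCHES = {"jmp", "call", "ret", "jmpret"}

def classify(line):
    """Return (mnemonic, kind) for a P1 branch line, or None.
    kind is 'ret', 'immediate' (#S: check for MOVS/MOVD patching)
    or 'indirect' (S is a register: check its upper bits)."""
    code = line.split("'", 1)[0]          # strip a PASM ' comment
    tokens = code.split()
    for i, tok in enumerate(tokens):
        if tok.lower() in BRANCHES:
            mnemonic = tok.lower()
            break
    else:
        return None                       # not a branch (or a bare label)
    if mnemonic == "ret":
        return (mnemonic, "ret")
    operands = " ".join(tokens[i + 1:]).split(",")
    index = 1 if mnemonic == "jmpret" else 0   # JMPRET D,S vs JMP/CALL S
    if len(operands) <= index or not operands[index].split():
        return (mnemonic, "unknown")
    source = operands[index].split()[0]   # drop trailing wz/wc/nr effects
    return (mnemonic, "immediate" if source.startswith("#") else "indirect")

source_lines = [
    "jmpadr   jmpret  getret,#getadrs   'get sign-extended address",
    "         jmpret  popx_ret,vector   'indirect call to 1st vector",
    "         jmp     pushret",
    "popayx_ret       ret",
    "         mov     x,#0              'reset x",
]
for text in source_lines:
    print(classify(text), "|", text.strip())
```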

A nice solution would be to have a somewhat compatible P2 JMPRET instruction...

eeee xxxxxxx xxi ddddddddd sssssssss  JMPRET D,#/S

This code could only run in cog-exec mode.
The return address (9-bit cog register) would be saved/written in Destination bits[8:0]. C & Z would not be saved/written.
If the jump to address is indirect, then only the source bits[8:0] would be used, permitting the upper bits to be ignored.

For the JMP and RET equivalents, since the P2 has no NR bit, I think JMPRET INA,#/S may work, because writes to INA/INB are only used for the debug registers. It may mean that debug will not work with these converted programs, but that is way better than the current conversion problems.
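The proposed semantics can be written down as a Python model. The instruction is hypothetical (it does not exist on the P2), and the model only reflects the rules stated above: cog-only 9-bit addresses, the return address written into D[8:0] with D[31:9] preserved, only S[8:0] used when S is indirect, and C/Z untouched. Reading the target before writing D is an assumption that makes D = S behave like the P1 coroutine idiom.

```python
MASK9 = 0x1FF

def proposed_jmpret(cog, pc, d, s, immediate):
    """HYPOTHETICAL cog-only JMPRET D,{#}S as proposed in this post.
    cog: list of 512 longs; pc: address of this instruction.
    Returns the next PC. C and Z are neither saved nor written."""
    # Read the target first, so D == S (coroutine style) behaves like P1.
    target = (s if immediate else cog[s]) & MASK9   # only S[8:0] is used
    # Write the 9-bit return address into D[8:0]; D[31:9] is preserved.
    cog[d] = (cog[d] & 0xFFFFFE00) | ((pc + 1) & MASK9)
    return target

cog = [0] * 512
cog[0x050] = 0xFFFFFE00  # placeholder for the patched return instruction
cog[0x060] = 0xFFFFF0AB  # vector register with non-zero upper bits
nxt = proposed_jmpret(cog, 0x010, 0x050, 0x060, immediate=False)
print(hex(nxt), hex(cog[0x050]))  # 0xab 0xfffffe11
```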

While there are other issues that may confront conversions from P1 PASM programs, this seems a "biggie" that could easily be solved.

cgracey · 2018-11-15 17:43

Cluso99, I hear you. I haven't wanted to tamper with PC-related instructions. Also, this would require a new instruction slot. One problem is that the cog PC range is now 10 bits with the LUT added. The overall PC is 20 bits. This instruction would only work for cog registers. In the P2, JMPRET became outmoded.

Electrodude · 2018-11-15 17:54

Can you add a trimmed-down JMPRET that can only do indirect jumps, for coroutines? You could make it a single-operand instruction that reads the jump target and writes the return address to and from the same cogram register instead of separate registers like the P1. A programmer could use ALTR to redirect the result somewhere else when there are more than two coroutines.

cgracey · 2018-11-15 17:57

Electrodude wrote: »

Can you add a trimmed-down JMPRET that can only do indirect jumps, for coroutines? You could make it a single-operand instruction that reads the jump target and writes the return address to and from the same cogram register instead of separate registers like the P1. A programmer could use ALTR to redirect the result somewhere else when there are more than two coroutines.

Can't you use CALLD to do that, already?

Electrodude · 2018-11-15 18:00

cgracey wrote: »

Electrodude wrote: »

Can you add a trimmed-down JMPRET that can only do indirect jumps, for coroutines? You could make it a single-operand instruction that reads the jump target and writes the return address to and from the same cogram register instead of separate registers like the P1. A programmer could use ALTR to redirect the result somewhere else when there are more than two coroutines.

Can't you use CALLD to do that, already?

You're right, sorry.
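A Python sketch of the CALLD-based coroutine switch Chip refers to, assuming CALLD reg,reg reads its jump target from reg[19:0] before writing {C, Z, 10'b0, PC} back into the same register. The yield addresses are hypothetical.

```python
MASK20 = 0xFFFFF

def calld_same_register(regs, name, return_pc, c=0, z=0):
    """Sketch of CALLD reg,reg: the target is read from reg[19:0] first,
    then {C, Z, 10'b0, return_pc} is written back into the same reg."""
    target = regs[name] & MASK20
    regs[name] = (c << 31) | (z << 30) | (return_pc & MASK20)
    return target

# Hypothetical addresses: coroutine A yields at $012 and $015,
# coroutine B starts at $080 and yields at $085.
regs = {"swap": 0x080}
pc = calld_same_register(regs, "swap", 0x013)       # A yields
assert pc == 0x080                                  # B starts
pc = calld_same_register(regs, "swap", 0x086, c=1)  # B yields
assert pc == 0x013                                  # A resumes after its yield
pc = calld_same_register(regs, "swap", 0x016)       # A yields again
assert pc == 0x086                                  # B resumes; C bit ignored
```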

Cluso99 · 2018-11-15 20:54

The crux of the problem is the intense manual effort required for every JMPRET/CALL/JMP/RET.
Each requires specifically checking what it does, and then different changes depending on what it does.

JMPRET/CALLs all save 9 bits only, no C or Z.

JMPRET/CALL/RET/JMP indirect always only use the bottom 9 bits of the indirect register, allowing the top bits to be anything (eg an instruction or a series of 9-bit addresses that get SHR #9 after each use).

JMPRET/CALL/RET/JMP immediate can have their addresses changed by MOVD or MOVS.

Without a new replacement instruction, it's just not worth the effort to convert any P1 PASM code !!!

I have spent days on my P1 Spin Interpreter and I am stuck on this part. There is just no easy way.

cgracey · 2018-11-15 21:03

Cluso99 wrote: »

The crux of the problem is the intense manual effort required for every JMPRET/CALL/JMP/RET.
Each requires specifically checking what it does, and then different changes depending on what it does.

JMPRET/CALLs all save 9 bits only, no C or Z.

JMPRET/CALL/RET/JMP indirect always only use the bottom 9 bits of the indirect register, allowing the top bits to be anything (eg an instruction or a series of 9-bit addresses that get SHR #9 after each use).

JMPRET/CALL/RET/JMP immediate can have their addresses changed by MOVD or MOVS.

Without a new replacement instruction, it's just not worth the effort to convert any P1 PASM code !!!
I have spent days on my P1 Spin Interpreter and I am stuck on this part. There is just no easy way.

But aren't there better ways to do things now in P2? Aside from translating old code, is there any new benefit to having an old-style JMPRET instruction?

Cluso99 · 2018-11-15 21:18

The benefit is being able to take a P1 PASM program and convert it to run on P2. There are heaps of programs in OBEX that should convert nicely, but JMPRET is a show-stopper, as converting this part requires a complete understanding of the program.
With a JMPRET replacement, it is possible to convert almost all JMPRET/CALL/JMP/RET instructions to the new JMPRET by a software program.
Without it, there are just too many variations, and more code space is needed because every RET has to be indirect.

There are other issues with converting, but JMPRET/etc is used throughout programs. It's probably the most used instruction in any program!

We don't need WZ/WC/WCZ. We don't need NR.
There are two slots vacant above SETPAT.

jmg · 2018-11-15 21:19

Cluso99 wrote: »

JMPRET/CALL/RET/JMP immediate can have their addresses changed by MOVD or MOVS.

Without a new replacement instruction, it's just not worth the effort to convert any P1 PASM code !!!
I have spent days on my P1 Spin Interpreter and I am stuck on this part. There is just no easy way.

That change is true of any self-modifying code, and any self-modifying code is always going to be tricky to port.
Something like a P1 Spin Interpreter is going to be hand-crafted ASM, and should really be hand-crafted P2-ASM for best performance, not a simple line-by-line port?

Cluso99 wrote: »

JMPRET/CALLs all save 9 bits only, no C or Z.
JMPRET/CALL/RET/JMP indirect always only use the bottom 9 bits of the indirect register, allowing the top bits to be anything (eg an instruction or a series of 9-bit addresses that get SHR #9 after each use).

Is there some middle ground - where maybe a P2 opcode can be told to limit/mask to 9 bits?
The downside is that you make a new opcode that can only be used in one place (COG), so you may just delay the pain if someone later tries to move that code....

I do not see that 'zero effort' porting is possible, as some change will always be needed in moving from P1 to P2.

__red__ · 2018-11-18 03:05

jmg wrote: »

Is there some middle ground - where maybe a P2 opcode can be told to limit/mask to 9 bits?

That sounds like something fairly simple to implement in a compiler, no?

Cluso99 · 2018-11-18 03:21

For me, without an easier way to convert P1 to P2, I am not going to bother. It's just too difficult.

That means a lot of code that could have been ported quickly will need to be totally re-written. That's a big job and will take a lot of time. From past computer history, it will not happen.

So we will be without objects for a long time and that will impact the P2 traction.

That's just my opinion though.

potatohead · 2018-11-18 03:27

Very early on, compatibility was taken off the table.

Honestly, many objects can be done in plain old Spin, which will run roughly on par with P1 PASM.

kwinn · 2018-11-18 04:28

Cluso99 wrote: »

For me, without an easier way to convert P1 to P2, I am not going to bother. It's just too difficult

That means a lot of code that could have been ported quickly will need to be totally re-written. That's a big job and will take a lot of time. From past computer history, it will not happen.

So we will be without objects for a long time and that will impact the P2 traction.

That's just my opinion though.

Leaving the JMPRET/CALL/JMP/RET instructions unchanged while converting the rest of the P1 code to P2 would still be a benefit even if the P2 equivalent code has to be done manually. To take advantage of the P2's added capabilities some additional manual coding would be necessary in many cases.
