CALLD D,{#}S {WC/WZ/WCZ}
Cluso99
Posts: 18,069
in Propeller 2
Question: Is there a case where WC or WZ only would be used ???
I can see the case for setting WCZ and the case for not setting either.
If not, could that be re-purposed?
I have been looking for a way to implement the P1 JMPRET instruction.
If there were an option to just overwrite the return address into D[19:0] without saving CZ to D[31:30] and without clearing D[29:20] then we could use the
JMP #A
instruction as the D register, which would work as the RET instruction variant of JMPRET.
The compiler would need to set the C & Z bits on the CALLD instruction to either 10 or 01 (ie WC or WZ).
Might this be possible?
Here are the two instructions involved. Only CALLD needs a silicon tweak.
EEEE 1011001 CZI DDDDDDDDD SSSSSSSSS    CALLD D,{#}S {WC/WZ/WCZ}    Call to S** by writing {C,Z,10'b0,PC[19:0]} to D. C=S[31],Z=S[30].
EEEE 1101100 RAA AAAAAAAAA AAAAAAAAA    JMP #A                      Jump to A. If R=1, PC+=A, else PC=A.
BTW - I think #S is #rel9. I am unsure when using S if it is relative (20bits) or not???
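To make the request concrete, here is a minimal Python sketch of the two write behaviours: the existing CALLD write as described in the encoding line above, and the hypothetical variant that replaces only D[19:0]. The function names are illustrative only, and the example JMP word is built purely from the opcode bits quoted above, assuming EEEE=1111 means "always" and R=0 means absolute.

```python
MASK20 = (1 << 20) - 1  # width of PC[19:0] and of the JMP A field

def calld_write(c, z, pc):
    """Existing behaviour per the encoding note: D = {C, Z, 10'b0, PC[19:0]}."""
    return ((c & 1) << 31) | ((z & 1) << 30) | (pc & MASK20)

def calld_write_proposed(d_old, pc):
    """Hypothetical variant from the post: replace D[19:0], keep D[31:20]."""
    return (d_old & ~MASK20 & 0xFFFFFFFF) | (pc & MASK20)

# D initially holds an absolute "JMP #0" (assumed EEEE=1111, R=0, A=0).
jmp_word = (0b1111 << 28) | (0b1101100 << 21)

# The existing write destroys the JMP opcode; the proposed one preserves it.
overwritten = calld_write(1, 0, 0x00123)
patched = calld_write_proposed(jmp_word, 0x00123)
```

With the proposed write, `patched` is still a JMP instruction whose A field now holds the return address, which is what lets it act as the RET half of a JMPRET pair.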
Comments
Plenty of P2 to keep me occupied
Ah, just done some testing and I've discovered that any immediate number entered in source is treated as an absolute address. The assembler then converts it to relative. This answers a puzzle I'd had for some time. I'd previously tried to hand code immediate offsets but it had always been rejected with error of must be within #0-511.
Any PC-relative encoding is resolved to absolute address upon execution. So, in your hypothetical change, when CALLD is executed it will write an absolute address to the JMP instruction. This would be okay as long as the JMP itself was specified to be an absolute in the first place.
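The absolute/relative point can be sketched in Python using only the JMP description quoted in the first post ("If R=1, PC+=A, else PC=A"). This is a simplified model: `encode_jmp` and `jmp_target` are hypothetical helper names, EEEE=1111 is assumed to mean "always", and exactly which PC value the hardware adds to, plus any cog/hub address scaling, is deliberately left out.

```python
MASK20 = (1 << 20) - 1  # 20-bit A field

def encode_jmp(target, pc, relative):
    """Assembler side: the source gives an absolute target; the tool picks the A field."""
    a = (target - pc) & MASK20 if relative else target & MASK20
    return (0b1111 << 28) | (0b1101100 << 21) | (int(relative) << 20) | a

def jmp_target(word, pc):
    """Execution side: resolve the A field back to an absolute address."""
    r = (word >> 20) & 1
    a = word & MASK20
    return (pc + a) & MASK20 if r else a

# Both encodings of the same source-level target resolve to the same address.
rel = encode_jmp(0x010, 0x100, relative=True)
ab = encode_jmp(0x010, 0x100, relative=False)
```

If a modified CALLD wrote an absolute return address into the A field of `rel`, the R bit would still be set and the jump would land at PC plus that address, which is why the JMP has to be encoded as absolute from the start.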
EDIT: Third edit!
It does cover the sets (movs) tho'.
The subroutine only requires the additional long if you really need the full power of jmpret. Most uses of jmpret can be changed into call/ret. (Distinguishing between the two automatically is probably non-trivial, but for a human programmer it shouldn't be too hard to manually convert code to use the appropriate form.)
For the spin1 interpreter, can you move some of the code into HUB or LUT to free up space?
But others are much more difficult and sometimes involve inline instruction modification by conditionally modifying the return address.
For specifics, I am using the example of my spin Interpreter. It will use half of LUT for jump tables (3 cog 9-bit addresses and 5 single flags/bits) for each bytecode. Speed is attained by unrolling and tweaking the PASM. Only some of this can work in LUT.
The thing is, this is the first real conversion of P1 PASM I have tried. Only by trying real code do you find the problems in converting P1 code. Most has converted ok, but there are a few stumbling blocks, and some are much more difficult, requiring substantial understanding of the code. I'd like to be able to automate as much as possible. JMPRET stood out as the real bugbear, and of course every program has a substantial number of them.
There are already a few P1 instructions that require more than one P2 instruction to simulate. But these usually only occur a few times in the code, so an auto converter should work here.
I know some P2 instructions can replace multiple P1 instructions. This again requires an understanding of the original code.
My hope is to be able to convert a reasonable number of P1 programs (objects) quickly with minimal resources (ie time). The quicker we get these done the better the P2 will be received. IMHO JMPRET is the biggest stumbling block to achieving this by far. I am looking at each and every possible solution to aid the conversion.
I just grepped through the "spin-standard-library" project from github.com/parallaxinc, and there were 77 instances of "jmpret", 1340 of "call", and 3155 of "jmp". Some of those may be in comments, but I think it's clear that uses of the full power of jmpret are actually pretty rare. The Spin interpreter is kind of a special case there. But even so, looking at Chip's original interpreter it looks like most of the time the jmpret dest register is the same (there are just "getret" and "pushret"). So I think some variant of my proposal should work, since you'll only need the extra long after those two labels (the actual "jmpret" gets translated into just "calld").
For your new Spin interpreter, have you looked at P2 XBYTE mode? That'll definitely give a big speed improvement.
In reality, it's the CALL form of JMPRET that causes the most problems. This writes a 9-bit cog return address into the low 9 bits (the S bits) of the D register, leaving all other bits intact. But it's also the RET (which is in fact a JMP), since this is the recipient of those 9-bit cog return address writes. Obviously the other bits must remain intact.
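As a rough Python model of the behaviour described here (not of any real tool), the P1 JMPRET write can be sketched as a masked update of the low 9 bits of D. The function name `p1_jmpret`, the list-based cog model and the example addresses are all illustrative assumptions.

```python
S_MASK = 0x1FF  # low 9 bits: the S field of a P1 instruction

def p1_jmpret(cog, pc, d, s):
    """Model P1 'JMPRET d, #s': store the return address in cog[d]'s S field, jump to s."""
    ret = (pc + 1) & S_MASK
    cog[d] = (cog[d] & ~S_MASK & 0xFFFFFFFF) | ret  # upper 23 bits untouched
    return s & S_MASK  # new PC

cog = [0] * 512
# The upper bits stand in for a JMP opcode; only the low 9 bits matter here.
cog[0x050] = 0x5C7C0000
new_pc = p1_jmpret(cog, pc=0x010, d=0x050, s=0x040)
```

After the call, the word at `cog[0x050]` keeps its opcode bits and now jumps back to `0x011`, which is the self-modifying CALL/RET pairing being discussed.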
The problem is self-modifying code doesn't work with the new call and ret instructions. That's where the MOVS/D fail. So you really do understand the problem, but are just seeing it from a different perspective.
As for XBYTE, that's too big a change to implement in my interpreter's case. I am not up for the rewrite that this would entail. You see, I broke down each bytecode into one to three subroutines. This is what my table houses: up to 3 subroutine calls in the form of 3 x 9-bit cog addresses. And there is one of these for each bytecode.
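A table entry of this shape (3 x 9-bit cog addresses plus 5 flag bits, which adds up to exactly one 32-bit long) could be packed and unpacked as in the Python sketch below. The bit positions are an assumption made for illustration; the thread does not say how the real interpreter lays out its entries, and the helper names are hypothetical.

```python
ADDR_MASK = 0x1FF  # one 9-bit cog address
FLAG_MASK = 0x1F   # five single-bit flags

def pack_entry(addrs, flags):
    """Pack three 9-bit addresses into bits 26:0 and five flags into bits 31:27 (assumed layout)."""
    a0, a1, a2 = (a & ADDR_MASK for a in addrs)
    return ((flags & FLAG_MASK) << 27) | (a2 << 18) | (a1 << 9) | a0

def unpack_entry(word):
    """Recover the (addresses, flags) pair from a packed 32-bit entry."""
    addrs = tuple((word >> shift) & ADDR_MASK for shift in (0, 9, 18))
    return addrs, (word >> 27) & FLAG_MASK

entry = pack_entry((0x012, 0x034, 0x156), 0b00101)
addrs, flags = unpack_entry(entry)
```

One such long per bytecode gives 256 entries, which matches the "half of LUT" figure mentioned earlier in the thread.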
Something doesn't quite fit in that statement. If all you want is a fast and compact CALL/RET function, then the Prop2's hardware-stack-based CALL/RET should be an easy drop-in replacement.
jmp xyz_ret (without the #)
to return, skipping the actual jump to the return instruction.
-Phil
And recursive subroutine calls are a no-go using the internal stack. I do it in the spin interpreter.
Phil,
Yes, I have used it sometimes, and also seen it used by others too.