Frankly, I am not sure RISC and CISC mean anything any more. Also, it was pointed at P3,
I don't expect it to make it into P2. You have to admit, being able to call DLL functions regardless of where they were loaded would be useful. It would also help C++ support.
Sounds good, but there is already this
JMPLIST jumps to a base address (S/@/@@) plus index (D).
JMPLIST D,@relative9 'jump to D plus 9-bit relative address
JMPLIST D,@@relative16 'jump to D plus 16-bit relative address
JMPLIST D,S 'jump to D plus S
and a call to a series of offsets will take the same time as a call to a series of jumps?
- and I think current opcodes can support a call to a series of jumps?
- the return address is not stored
- the function list is not word pointers at a fixed base address in the hub, but a long list (takes twice the hub ram)
However...
- SERDES is more important than this
- HUNGRY is more important than this
I note that the latest docs (Line 1589+) have conflicting definitions for the index in D or S.
Bill,
The pasm op name is currently limited to 7 characters. I like CALLVEC so perhaps we could rename JMPLIST to JMPVEC too?
Chip & all,
Could all the JMP instructions save a return address in the $1F1 register without penalty if the JMP is taken?
Could this solve the GCC request for CALLR by using JMPxx where the return address is saved in $1F1? Note if further JMP's were used, then the GCC would need to perform a MOV $1F0,$1F1 to save the return address before it is overwritten.
This would not be an acceptable substitute for CALLR. Chip and I talked about this when I was at Parallax last week. I don't want to have the side effect that the other CALL instructions would have when I'm using CALLR. All of the other CALL instructions write the return address to some stack and I don't want that behavior when I'm calling a leaf function.
I believe using a statically fixed high register address such as $1F1 for some return address storage (eg. LR equivalent) will limit the number of threads that can be running hubexec code when the compiler uses this feature in the call code it generates because it then would become a common resource shared by all threads and collisions could occur. Instead it's better to either have the ability to configure this LR register at some particular COG address, or, at the very least, use a low memory address instead of $1F1 so register remapping could also be used and give each thread its own context using the same register address offsets. That approach will allow multithreaded high level language execution sharing common hubexec code which needs to be independent of task ID.
Edit: I vaguely recall this was touched on in the past and I think Chip was even talking about using 0 for the LR. But if it is settable to something else, that could be useful too.
I don't understand your requirement then. Are you saying you want all the CALLs to save in a fixed location? (please no)
I thought you just wanted an additional CALLR instruction (hub/cog) where the return address was stored in a fixed location. I didn't think any other CALLs needed changing???
Possible instruction fixes/changes/suggestions/additions...
=======================================================================================================
Here is a possible fix required:
WAITCNT
Thread: http://forums.parallax.com/showthread.php/151904-Here-is-the-update-from-the-Big-Change!!!?p=1222701&viewfull=1#post1222701
=======================================================================================================
Reason: Add new pin-pair instruction for use with USB bit-banging receive (similar to GETP/GETNP)
The S value (sub-instruction bits) "yyyyyyyy" would use the next available slot after CACHEX
Thread: http://forums.parallax.com/showthread.php/151904-Here-is-the-update-from-the-Big-Change!!!?p=1222515&viewfull=1#post1222515
1111111 ZC L CCCC DDDDDDDDD xyyyyyyyy GETXP [#]D [WZ],[WC] ' set flags for the pin-pair for usb bit-banging
' D = PINx (0..127), PINy := PINx XOR $1 (it's complementary pin-pair)
' C = C XOR PINx via WC
' Z = !(PINx OR PINy) via WZ (ie Z is set if PINx and PINy are both zero == SE0 in USB)
PINx and PINy are a pair of pins. If PINx is even then PINy := PINx + 1 else if PINx is odd then PINy := PINx - 1
The allowance for the PINx/PINy pair to be reversed is for USB LS & HS where J/K are effectively swapped between D-/D+.
WZ & WC would normally be used.
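To make the proposed flag behaviour concrete, here is a minimal Python sketch of it. The function name `getxp` and the idea of passing the pin states as one integer (bit n = level of pin n) are my assumptions for illustration only; nothing here reflects how the hardware would actually sample the pins.

```python
def getxp(pins: int, pin_x: int, c_in: int) -> tuple:
    """Model the proposed GETXP flags for the pair (pin_x, pin_x XOR 1).

    pins  : integer whose bit n holds the level of pin n (assumed model)
    pin_x : pin number 0..127
    c_in  : current C flag (0 or 1)
    Returns (c_out, z_out).
    """
    pin_y = pin_x ^ 1                  # complementary pin of the pair
    x = (pins >> pin_x) & 1
    y = (pins >> pin_y) & 1
    c_out = c_in ^ x                   # C = C XOR PINx
    z_out = 1 if (x | y) == 0 else 0   # Z set when both pins are low (SE0)
    return c_out, z_out


# Pins 2/3 both low: SE0, so Z is set and C is unchanged.
print(getxp(0b0000, 2, 0))
# Pin 2 high, pin 3 low: C toggles, Z is clear.
print(getxp(0b0100, 2, 0))
```

Passing the odd pin of the pair as PINx simply swaps which line feeds C, which is the reversal mentioned for LS versus HS.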
=======================================================================================================
Reason: Add new instruction(s) for calculating/accumulating CRC for 1-bit using the Polynomial set in "ACCA"
The S value (sub-instruction bits) "yyyyyyyy" would use the next available slot after CACHEX
Thread: http://forums.parallax.com/showthread.php/151992-CRC-generation?p=1222728&viewfull=1#post1222728
1111111 xx x CCCC DDDDDDDDD xyyyyyyyy CRCBIT D ' accumulate CRC
' C = current data bit (to be accumulated)
' D = CRC Register
' ACCA = polynomial
The CRCBIT instruction performs the following...
(1) X := C XOR D[0]
(2) D := D >> 1
(3) if X == 1 then D := D XOR ACCA
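The three steps above are the usual bit-serial, LSB-first CRC update. As a sanity check, here is a small Python sketch that applies exactly those steps and compares the result with `zlib.crc32`. The choice of the reflected CRC-32 polynomial, the all-ones start value and the final inversion are my assumptions for the demonstration; the proposal itself leaves the polynomial to whatever is loaded in ACCA.

```python
import zlib

MASK32 = 0xFFFFFFFF


def crcbit(d: int, c: int, acca: int) -> int:
    """One CRCBIT step: d is the CRC register, c the data bit, acca the polynomial."""
    x = c ^ (d & 1)   # (1) X := C XOR D[0]
    d >>= 1           # (2) D := D >> 1
    if x:             # (3) if X == 1 then D := D XOR ACCA
        d ^= acca
    return d & MASK32


def crc32_bitwise(data: bytes) -> int:
    """Feed bytes LSB first through crcbit using the reflected CRC-32 polynomial."""
    d = MASK32                                  # conventional CRC-32 start value
    for byte in data:
        for i in range(8):
            d = crcbit(d, (byte >> i) & 1, 0xEDB88320)
    return d ^ MASK32                           # conventional final inversion


print(hex(crc32_bitwise(b"123456789")))
print(crc32_bitwise(b"123456789") == zlib.crc32(b"123456789"))
```

In PASM terms the inner loop would be one shift to get the data bit into C followed by one CRCBIT, so a byte costs roughly 16 instructions plus loop overhead.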
Alternately, a special register to hold the polynomial "POLY" could be used, requiring the instruction(s)
1111111 x0 x xxxx DDDDDDDDD xyyyyyyyy CRCBIT D ' accumulate CRC
1111111 x1 x xxxx DDDDDDDDD xyyyyyyyy SETPOLY D ' set the polynomial to be used in
=======================================================================================================
Reason: Add new pin-pair variants for use with complementary/differential I/O 2 wire protocols
Thread: http://forums.parallax.com/showthread.php/151904-Here-is-the-update-from-the-Big-Change!!!?p=1222689&viewfull=1#post1222689
For reference only...
ZCL- 1111111 ZC L CCCC DDDDDDDDD x00111000 SETZC D/# (D[1:0] into Z/C via WZ/WC)
presume this really means...(D[1:0] into !Z/C via WZ/WC)
Currently
ZCL- 1111111 ZC L CCCC DDDDDDDDD x00110000 GETP D/# (pin into !Z/C via WZ/WC)
ZCL- 1111111 ZC L CCCC DDDDDDDDD x00110001 GETNP D/# (pin into Z/!C via WZ/WC)
--L- 1111111 xx L CCCC DDDDDDDDD x10011000 OFFP D/#
--L- 1111111 xx L CCCC DDDDDDDDD x10011001 NOTP D/#
--L- 1111111 xx L CCCC DDDDDDDDD x10011010 CLRP D/#
--L- 1111111 xx L CCCC DDDDDDDDD x10011011 SETP D/#
--L- 1111111 xx L CCCC DDDDDDDDD x10011100 SETPC D/#
--L- 1111111 xx L CCCC DDDDDDDDD x10011101 SETPNC D/#
--L- 1111111 xx L CCCC DDDDDDDDD x10011110 SETPZ D/#
--L- 1111111 xx L CCCC DDDDDDDDD x10011111 SETPNZ D/#
Replace with...
ZCL- 1111111 00 L CCCC DDDDDDDDD x00110000 GETPP D/# (pin-pair PINy:PINx into !Z/C)
ZCL- 1111111 ZC L CCCC DDDDDDDDD x00110000 GETP D/# (pin into !Z/C via WZ/WC)
ZCL- 1111111 00 L CCCC DDDDDDDDD x00110001 GETNPP D/# (pin-pair PINy:PINx into Z/!C)
ZCL- 1111111 ZC L CCCC DDDDDDDDD x00110001 GETNP D/# (pin into Z/!C via WZ/WC)
These could share opcodes???
--L- 1111111 00 L CCCC DDDDDDDDD x10011000 OFFP D/# (pin#=0??? , dir#=0)
--L- 1111111 01 L CCCC DDDDDDDDD x10011000 NOTP D/# (pin#=!pin# , dir#=1)
--L- 1111111 10 L CCCC DDDDDDDDD x10011000 CLRP D/# (pin#=0 , dir#=1)
--L- 1111111 11 L CCCC DDDDDDDDD x10011000 SETP D/# (pin#=1 , dir#=1)
These could share opcodes???
--L- 1111111 00 L CCCC DDDDDDDDD x10011001 SETPC D/# (pin#=C , dir#=1)
--L- 1111111 01 L CCCC DDDDDDDDD x10011001 SETPNC D/# (pin#=!C , dir#=1)
--L- 1111111 10 L CCCC DDDDDDDDD x10011001 SETPZ D/# (pin#=Z , dir#=1)
--L- 1111111 11 L CCCC DDDDDDDDD x10011001 SETPNZ D/# (pin#=!Z , dir#=1)
New pin-pair instructions...(could use x10011010-x10011111 if freed above, or use new sub-opcodes avail following CACHEX)
--L- 1111111 00 L CCCC DDDDDDDDD x10011010 OFFPP D/# (pin-pair PINy:PINx=00??? , dir#=00)
--L- 1111111 01 L CCCC DDDDDDDDD x10011010 NOTPP D/# (pin-pair PINy:PINx=!PINy:!PINx), dir#=11)
--L- 1111111 10 L CCCC DDDDDDDDD x10011010 CLRPP D/# (pin-pair PINy:PINx=00 , dir#=11)
--L- 1111111 11 L CCCC DDDDDDDDD x10011010 SETPP D/# (pin-pair PINy:PINx=11 , dir#=11)
--L- 1111111 00 L CCCC DDDDDDDDD x10011011 SETPPLH D/# (pin-pair PINy:PINx=01 , dir#=11)
--L- 1111111 01 L CCCC DDDDDDDDD x10011011 SETPPHL D/# (pin-pair PINy:PINx=10 , dir#=11)
Note: SETPPHL could be achieved by using SETPPLH PINy
I don't really see the need for these 2, but put it here in case you think it desirable...
--L- 1111111 10 L CCCC DDDDDDDDD x10011011 SETPPZC D/# (pin-pair PINy:PINx=!Z/C , dir#=1)
--L- 1111111 11 L CCCC DDDDDDDDD x10011011 SETPPNF D/# (pin-pair PINy:PINx=Z/!C , dir#=1)
D/# specifies PINx (0..127). PINy := PINx XOR #1 (ie it's twin pin-pair)
(ie PINx and PINy are a pair of pins. If PINx is even then PINy := PINx + 1 else if PINx is odd then PINy := PINx - 1)
=======================================================================================================
Reason: Combine to use 1 instruction with variants
Frees up opcodes 1000000 & 1000001
Remove WZ/WC options
Providing ENCOD can remove WZ option, it can move from 1000011,
freeing BLMASK to share with another instruction variant
Currently...
ZCWS 1000000 ZC I CCCC DDDDDDDDD SSSSSSSSS DECOD3 D,S/#
ZCWS 1000001 ZC I CCCC DDDDDDDDD SSSSSSSSS DECOD4 D,S/#
ZCWS 1000010 ZC I CCCC DDDDDDDDD SSSSSSSSS DECOD5 D,S/#
Z-WS 1000011 Z0 I CCCC DDDDDDDDD SSSSSSSSS ENCOD D,S/# (shared with BLMASK)
Replace with...
--WS 1000010 00 I CCCC DDDDDDDDD SSSSSSSSS DECOD3 D,S/#
--WS 1000010 01 I CCCC DDDDDDDDD SSSSSSSSS DECOD4 D,S/#
--WS 1000010 10 I CCCC DDDDDDDDD SSSSSSSSS DECOD5 D,S/#
--WS 1000010 11 I CCCC DDDDDDDDD SSSSSSSSS ENCOD D,S/#
=======================================================================================================
Reason: Combine to use 1 instruction with variants
May facilitate later use of opcode 1111110
Currently...
-----------------------------------------------------------------------------------------------------
1111110 10 n nnnn nnnnnnnnn nnniiiiii REPS #n,#i 'execute 1..64 inst's 1..131072 times 1
1111111 00 0 CCCC 111111111 001iiiiii REPD #i 'execute 1..64 inst's infinitely 1
1111111 00 0 CCCC DDDDDDDDD 001iiiiii REPD D,#i 'execute 1..64 inst's D+1 times 1
1111111 00 1 CCCC nnnnnnnnn 001iiiiii REPD #n,#i 'execute 1..64 inst's 1..512 times 1
-----------------------------------------------------------------------------------------------------
Replace with...
fL * ' *=infinitely
1111111 00 0 xxxx DDDDDDDDD 001iiiiii REPS D,#i 'execute 1..64 inst's D+1 times 1+1
1111111 00 1 xxxx xxxxxxxxx 001iiiiii REPS #i 'execute 1..64 inst's infinitely 1+1
1111111 01 n nnnn nnnnnnnnn 001iiiiii REPS #n,#i 'execute 1..64 inst's 1..16384 times 1+1
1111111 10 0 CCCC DDDDDDDDD 001iiiiii REPD D,#i 'execute 1..64 inst's D+1 times 1+3
1111111 10 1 CCCC xxxxxxxxx 001iiiiii REPD #i 'execute 1..64 inst's infinitely 1+3
1111111 11 0 CCCC nnnnnnnnn 001iiiiii REPD #n,#i 'execute 1..64 inst's 1..512 times 1+3
=======================================================================================================
Reason: Swap instruction opcodes GETWORD/SETWORD, WAITPEQ and WAITPNE with TESTB, WRBYTE/WRWORD and SQRT64/QSINCOS
so that SETNIB works with these instructions (ie all nibble #6 bits other than "n/nn/nnn" bits are zeros)
Of the instructions that have n, nn & nnn in their opcodes & WZ fields, only GETWORD/SETWORD, WAITPEQ and WAITPNE
have opcodes that have "1" bits in the 6th nibble (other than "n" bits).
If these instruction opcodes were swapped with TESTB, WRBYTE/WRWORD and SQRT64/QSINCOS,
their 6th nibble bits would have "0" bits in the non "n" bit positions.
This would permit the SETNIB D,[#]S,#6 instruction to be used to set the "n/nn/nnn" bits,
providing the remaining nibble bits are "0".
Thread: http://forums.parallax.com/showthread.php/151904-Here-is-the-update-from-the-Big-Change!!!?p=1222324&viewfull=1#post1222324
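A quick Python sketch of why the other bits of nibble 6 need to be zero. The `setnib` helper, the nibble numbering (nibble 0 = bits 3..0, so nibble 6 = bits 27..24) and both instruction words are hypothetical values picked for illustration, not real P2 encodings.

```python
def setnib(d: int, s: int, nib: int) -> int:
    """Replace nibble number nib (0 = bits 3..0) of the 32-bit value d with s[3:0]."""
    shift = 4 * nib
    return ((d & ~(0xF << shift)) | ((s & 0xF) << shift)) & 0xFFFFFFFF


# Hypothetical instruction words; only nibble 6 (bits 27..24) matters here.
clean_op = 0x10000000   # nibble 6 holds nothing but the field to be patched
dirty_op = 0x1C000000   # nibble 6 also holds fixed opcode bits 1100

# Patching the field value 2 into the clean opcode works as intended.
print(hex(setnib(clean_op, 0b0010, 6)))
# The same patch wipes out the fixed opcode bits of the other word.
print(hex(setnib(dirty_op, 0b0010, 6)))
# To keep them, the caller would have to merge them into S first.
print(hex(setnib(dirty_op, 0b1100 | 0b0010, 6)))
```

SETNIB replaces the whole nibble, so any "1" opcode bits sharing it must be supplied again by the caller, which is the extra work the opcode swap is meant to avoid.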
=======================================================================================================
Reason: Suggested by David & Bill for GCC assistance
Thread: http://forums.parallax.com/showthread.php/152079-Hub-Execution-Model-Thread-(split-from-blog)?p=1224484&viewfull=1#post1224484
(and also a little earlier for the history)
Background: Any instruction with an immediate value for #S is limited to 9-bits.
GCC often needs to manipulate a larger value, and so performs a few instructions to utilise this.
David & Bill can explain the purpose better than I can.
What is desired is a way to utilise an instruction to set an internal register, which, when combined with
the following instruction, which will use an immediate #S value, the resultant S value is an immediate
value of 32 bits. This would only work for the following instruction after "BIG", and the BIG would then
be reset to zeros (or a flag cleared).
Originally what was asked for is this BIG instruction to set the upper bits 31..9 with the immediate 23-bit "n"
field, and the lower bits 8..0 = 000000000.
By making this more general purpose, perhaps the following might be implemented instead...
BIG #n sets an internal register "BIG" with the immediate 23 bits, either the top 23 bits or the bottom 23 bits,
depending on another instruction bit "Z". (ie Z indicates n<<9)
If the ALU now takes any #S instruction, and if the previous instruction was a "BIG", then the ALU will combine
the immediate 9 bits with the BIG register to form a new immediate value. Since there may be insufficient time
to add the BIG value to the #S value in the pipeline, it was thought that an "OR" of the bits might be simpler,
or alternatively, just use the upper 23 bits of BIG with the lower 9 bits of #S.
Presuming we can free up a full instruction, then...
xxxxxxx 10 n nnnn nnnnnnnnn nnnnnnnnn BIG #D ' Load 23 immediate bits into the lower "BIG" register bits 22..0 and zero bits 31..23.
xxxxxxx 11 n nnnn nnnnnnnnn nnnnnnnnn BIGU #D ' Load 23 immediate bits into the upper "BIG" register bits 31..9 and zero bits 8..0
4 such registers for use in multi-tasking.
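For anyone trying to picture the BIG/BIGU idea, here is a minimal Python sketch of the "OR" variant described above. The function names and the `split_constant` helper (what an assembler or GCC might do to split a constant) are my own assumptions; the real combining logic would be whatever fits in the pipeline.

```python
MASK32 = 0xFFFFFFFF


def big(n23: int) -> int:
    """BIG: 23 immediate bits into BIG register bits 22..0, bits 31..23 zero."""
    return n23 & 0x7FFFFF


def bigu(n23: int) -> int:
    """BIGU: 23 immediate bits into BIG register bits 31..9, bits 8..0 zero."""
    return (n23 & 0x7FFFFF) << 9


def extended_immediate(big_reg: int, imm9: int) -> int:
    """Combine the BIG register with the next instruction's 9-bit #S by OR."""
    return (big_reg | (imm9 & 0x1FF)) & MASK32


def split_constant(value: int) -> tuple:
    """Hypothetical assembler helper: split a 32-bit constant for BIGU + #S."""
    value &= MASK32
    return value >> 9, value & 0x1FF


n23, imm9 = split_constant(0xDEADBEEF)
print(hex(extended_immediate(bigu(n23), imm9)))
```

With BIGU the two fields do not overlap, so OR and "upper 23 bits plus lower 9 bits" give the same result; the two options only differ for the BIG (lower) form.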
=======================================================================================================
Aside from my occasional snipe I've not been paying much attention to this topic. I just now tried to work out why this desire for a link register. As far as I can tell, the main reason is for speed. If that's true then, for the most part, CALLD/RETD should achieve 1 clock equivalent speed. What advantage would a link register have here?
Is there room to make the 'stack save' optional ?
Then you could do a SLCALL (single level call) and a (following later) JMPLIST and finally a SLRET
No, the stack calls are extremely useful, not to be removed.
The reason a CALLVEC is useful is that it specifies in one instruction the table (DLL) base address, and the index of the function to call. Saves memory.
Yes, that is what I need. Someone suggested that we could get rid of CALLR by having the other CALL instructions optionally store their return addresses in a register. I'm just saying that won't do what is needed. The CALLR instruction is the solution I'm looking for.
It turns out that CALLR is more flexible. You can either return directly to the address in the register or you can push the register on a hub stack. If you use the CALLD/RETD instructions you'd have to pop the return address off the stack into a register before pushing it onto the hub stack. Anyway, this whole thing was discussed in great detail much earlier in this thread.
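A toy Python model of the calling convention being described, in case it helps. The class, its method names and the single shared link register are assumptions made for illustration; this is not how PropGCC or the hardware is implemented.

```python
class LinkRegisterModel:
    """Toy model: one link register plus a software stack in hub RAM (both assumed)."""

    def __init__(self):
        self.lr = 0            # link register written by CALLR
        self.hub_stack = []    # return addresses spilled by non-leaf functions
        self.hub_writes = 0    # counts hub stack pushes

    def callr(self, return_addr: int) -> None:
        self.lr = return_addr            # CALLR only writes the register

    def save_lr(self) -> None:
        self.hub_stack.append(self.lr)   # non-leaf prologue: push LR to hub
        self.hub_writes += 1

    def restore_lr(self) -> None:
        self.lr = self.hub_stack.pop()   # non-leaf epilogue: pop LR from hub

    def ret(self) -> int:
        return self.lr                   # return directly to the address in LR


def call_leaf(m: LinkRegisterModel, return_addr: int) -> int:
    m.callr(return_addr)
    return m.ret()


def call_nonleaf(m: LinkRegisterModel, return_addr: int, inner_return_addr: int) -> tuple:
    m.callr(return_addr)
    m.save_lr()                          # needed because the inner call reuses LR
    inner = call_leaf(m, inner_return_addr)
    m.restore_lr()
    return m.ret(), inner


m = LinkRegisterModel()
print(hex(call_leaf(m, 0x10)), m.hub_writes)
print(call_nonleaf(m, 0x20, 0x30), m.hub_writes)
```

The caller issues the same CALLR in both cases; only the callee decides whether to spill LR, so leaf functions cost no hub stack traffic.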
I note that the latest docs (Line 1589+) have conflicting definitions for the index in D or S.
It's a matter of interpretation as to what is considered a base. D+S form an absolute address, they are equals. The @@ is a relative constant and only forms a base after being added to PC. Some descriptions for other processors have the constant labelled as an offset and the register as the base, but it's still just a label.
Actually, the instruction description could be improved. I believe JMPLIST D,@@relative16 means PC + @@ + D but I could be wrong. Whereas JMPLIST D,S simply means D + S.
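Written out as a Python sketch, the two readings look like this. Both function names are made up, address units and wrap-around are ignored, and the relative form follows the unconfirmed "PC + @@ + D" interpretation.

```python
def jmplist_register_form(d: int, s: int) -> int:
    """JMPLIST D,S: the target is simply D + S."""
    return d + s


def jmplist_relative_form(pc: int, rel16: int, d: int) -> int:
    """JMPLIST D,@@relative16, read as PC + @@ + D (unconfirmed)."""
    return pc + rel16 + d


# D + S is symmetric, so either operand can be called the base.
print(jmplist_register_form(0x100, 3) == jmplist_register_form(3, 0x100))
# The relative constant only becomes a base once PC is added.
print(hex(jmplist_relative_form(0x400, 0x20, 3)))
```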
If you use the CALLD/RETD instructions you'd have to pop the return address off the stack into a register before pushing it onto the hub stack.
Ah, a hub stack, I see. I hadn't considered ever having one of those. Hmm, stacks are treated as unlimited too much of the time.
Actually, the real problem here is the combined data stack with the call-return stack. Keep them separate, problem solved. Hehe, I know, that would be hard for GCC to handle ... it's all been thrashed out before ...
100% guaranteed. I'm Superman! ... hehe, I wasn't being very serious proposing a split stack as an alternative solution. Like I wasn't serious about blasphemy of CISC features with Bill. High level languages come with certain baggage, I get that.
The link register business has already been discussed and settled last Friday in Rocklin. If Chip can think of a more useful form than what we discussed, I'm sure he will mention it.
Does the CALLR require 4 variants like the CALL instruction below?
I'm not exactly sure I know what you mean by "baggage". If you have a better idea of how to handle function calling conventions I'd be happy to hear it. The use of the link register helps speed up leaf functions without overly burdening non-leaf functions. The key is to use the same calling convention for all functions without having to know in advance whether they are leaf functions or not. If you can suggest a better calling convention that meets these requirements please let me know. We have to modify PropGCC to take advantage of Chip's new P2 instructions anyway so we could consider a new ABI if it will perform better than what we have now.
As it stands there would probably have to be explicit directives in the source; drawing a line between what uses an aux stack and what doesn't. This isn't unreasonable though and maybe an optimiser could automatically draw such a line.
I have a general coding question regarding Verilog. You have added loads and loads of opcodes, they need arguments, and you have huge fan-outs and muxes for the results. How is it that it is so fast (I mean 80+ MHz)?
I got the idea of having more than one "opcode" register, so to speak, so that each has a smaller fan-out; probably nothing new...
Thanks.
Edit: Maybe the FPGAs are just that fast... Let's see... I'll try to compile my (unoptimized and sub-par) 6809 for the Cyclone V and see. (It can do 40 MHz in the MachXO2 and 67 MHz in the Spartan3E; it only has 8 & 16 bit paths, but many muxes.)
Edit: It can do 90 MHz on the Cyclone V (5CEFA2F23C8N).
I spent a lot of time working out the mux tree for ALU results, as it is the biggest structure with heavy delays, and it is on par with other critical paths. The biggest mux (64 inputs) passes only registered signals. Instructions like SETS/SETD/SETI/MOVBYTS just re-orient signals, so they can go through this mux. There are two other smaller mux's for faster signals, and one final mux which inputs all three other mux's, along with some late-arriving signals from instructions like ADD/SUB/RDBYTE/ROL. This mux tree needed a lot of tuning to get optimized. The synthesis guys, early on, said to just make one huge mux and let the logic compiler sort it all out - that didn't work well. I needed to give some more guidance by forming a mux tree before the tools would compile/place/route it efficiently.
So, before the first P2 developer boards are made, is there going to be a signature sheet passed around so that anyone who contributed a feature suggestion or helped with the development and testing can sign it? How cool would it be to have an autographed mask on the board with Chip's signature and all the contributors' signatures?
We could do something like that, but it wouldn't be very legible over a mere square inch. This forum attests to all the contributions pretty well, at least.
I was thinking of Sapieha's request for a CALL-equivalent to the new list. Something like that could be useful for VM's, and even more so for operating systems / libraries.
CALLVECT D,#n
CALLVECT D,S
CALLLIST looked weird with 3 L's, so I changed it to VECT for the example
D holds the base address of a WORD table in the hub
n or S are the index
This way it could dispatch to 512 system/VM routines. I think only the 4-level hardware stack version would be needed.
It bears thinking on, even if only for P3.
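Roughly what the suggested CALLVECT would do, as a Python sketch. Everything concrete here is an assumption on my part: hub RAM modelled as a byte string, little-endian 16-bit table entries holding absolute targets, a return address of PC + 1, and the 4-level hardware stack modelled as a list.

```python
import struct


def callvect(hub: bytes, base: int, index: int, pc: int, stack: list) -> int:
    """Dispatch through a WORD table in hub RAM and return the new PC.

    base  : hub address of the table (the D operand)
    index : entry number, 0..511 for the 9-bit #n form (or the S operand)
    stack : models the 4-level hardware call stack
    """
    if len(stack) >= 4:
        raise OverflowError("4-level hardware stack is full")
    (target,) = struct.unpack_from("<H", hub, base + 2 * index)  # 2 bytes per entry
    stack.append(pc + 1)          # unlike JMPLIST, the return address is saved
    return target


# Build a tiny table at hub address 16 with two entries.
hub = bytearray(64)
struct.pack_into("<H", hub, 16, 0x0100)
struct.pack_into("<H", hub, 18, 0x0240)

stack = []
print(hex(callvect(bytes(hub), 16, 1, 0x20, stack)), [hex(a) for a in stack])
```

A 512-entry table of words takes 1 KB of hub RAM, half of what the same table of longs would need.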
I was thinking about this earlier because someone brought it up, and there is one big problem: CALLs need to specify which stack is to be used, so they take up a lot more space than JMPs do. We are pretty much out of room, already.
Comments
Frankly, I am not sure RISC and CISC mean anything any more. Also, it was pointed at P3,
I don't expect it to make it into P2. You have to admit, being able to call DLL functions regardless of where they were loaded would be useful. It would also help C++ support.
Sounds good, but there is already this
and Call to a series of offsets will take the same time as a call to a series of jumps ?
- and I think current opcodes can support a call to a series of jumps ?
With CALL D,index you directly call the function, with a return to the caller.
In your example you need to build a second table of calls to make it possible to return to the caller.
Otherwise, with JMPLIST, you will never return to the caller.
- the return address is not stored
- the function list is not word pointers at a fixed base address in the hub, but a long list (takes twice the hub ram)
However...
- SERDES is more important than this
- HUNGRY is more important than this
UN FUN DOGS:
unfundogs.jpg
FUN DOGS:
fundogs.jpg
Ok, carry on with the discussion!
JMPLIST D,S/@/@@
I note that the latest docs (Line 1589+) have conflicting definitions for the index in D or S.
Bill,
The pasm op name is currently limited to 7 characters. I like CALLVEC so perhaps we could rename JMPLIST to JMPVEC too?
Chip & all,
Could all the JMP instructions save a return address in the $1F1 register without penalty if the JMP is taken?
Could this solve the GCC request for CALLR by using JMPxx where the return address is saved in $1F1? Note if further JMP's were used, then the GCC would need to perform a MOV $1F0,$1F1 to save the return address before it is overwritten.
Sounds good to me!
I suspect your suggestion would work, as I don't think JMPx currently use the cog memory write port.
Would it be possible to squeeze in the USB bit read instruction GETXP #/D WZ,WC into the next release please?
Here is the reference (and others) http://forums.parallax.com/showthread.php/125543-Propeller-II-update-BLOG?p=1224661&viewfull=1#post1224661
(note some of these have been resolved)
Aside from my occasional snipe I've not been paying much attention to this topic. I just now tried to work out why this desire for a link register. As far as I can tell, the main reason is for speed. If that's true then, for the most part, CALLD/RETD should achieve 1 clock equivalent speed. What advantage would a link register have here?
Is there room to make the 'stack save' optional ?
Then you could do a SLCALL (single level call) and a (following later) JMPLIST and finally a SLRET
Agreed.
The reason a CALLVEC is useful is that it specifies in one instruction the table (DLL) base address, and the index of the function to call. Saves memory.
you could
mov dllindex,#fnix
call dllentry
...
dllentry: jmplist dllbase,dllindex
but that takes the extra mov
maybe CALLVEC will make P3...