HUB EXEC Update Here

evanh · 2014-02-05 14:21

Tsk, tsk, asking for CISC instructions now ... :P

Bill Henning · 2014-02-05 14:24

Frankly, I am not sure RISC and CISC mean anything any more. Also, it was aimed at P3; I don't expect it to make it into P2. You have to admit, being able to call DLL functions regardless of where they were loaded would be useful. It would also help C++ support.

evanh wrote: »

Tsk, tsk, asking for CISC instructions now ... :P

jmg · 2014-02-05 14:34

Bill Henning wrote: »

You have to admit, being able to call DLL functions regardless of where they were loaded would be useful. It would also help C++ support.

Sounds good, but there is already this

JMPLIST jumps to a base address (S/@/@@) plus index (D).

        JMPLIST D,@relative9        'jump to D plus 9-bit relative address
        JMPLIST D,@@relative16      'jump to D plus 16-bit relative address
        JMPLIST D,S                 'jump to D plus S

and Call to a series of offsets will take the same time as a call to a series of jumps ?
- and I think current opcodes can support a call to a series of jumps ?

Sapieha · 2014-02-05 14:43

Hi jmg.

With CALL D,index you call the function directly, with a return to the caller.

In your example you need to build a second table of calls to make it possible to return to the caller.

Otherwise, with JMPLIST, you will never return to the caller.
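The difference between an indexed jump and an indexed call can be shown with a toy model. This is a Python sketch, not PASM: `ToyCog`, `jmplist` and `callvec` are hypothetical names, and the behaviour of a CALL-through-table instruction is an assumption drawn from this thread rather than from the P2 documentation.

```python
class ToyCog:
    """Toy model: a program counter plus a return stack."""

    def __init__(self):
        self.pc = 0
        self.stack = []

    def jmplist(self, table, index):
        # Plain indexed jump: the caller's address is not saved anywhere.
        self.pc = table[index]

    def callvec(self, table, index):
        # Indexed call: save the address of the next instruction first.
        self.stack.append(self.pc + 1)
        self.pc = table[index]

    def ret(self):
        # Return to whatever address was saved last.
        self.pc = self.stack.pop()


table = [0x100, 0x200, 0x300]   # hypothetical function entry points

cog = ToyCog()
cog.pc = 0x40
cog.callvec(table, 1)
entered_at = cog.pc             # entry point taken from the table
cog.ret()
returned_to = cog.pc            # instruction after the call

cog.pc = 0x40
cog.jmplist(table, 1)
can_return = bool(cog.stack)    # nothing was saved, so there is no way back
```

With only the jump form, each table entry would itself have to be reached through a separate CALL to get a return address saved.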

jmg wrote: »
Sounds good, but there is already this
JMPLIST jumps to a base address (S/@/@@) plus index (D).

        JMPLIST D,@relative9        'jump to D plus 9-bit relative address
        JMPLIST D,@@relative16      'jump to D plus 16-bit relative address
        JMPLIST D,S                 'jump to D plus S
and Call to a series of offsets will take the same time as a call to a series of jumps ?
- and I think current opcodes can support a call to a series of jumps ?

Bill Henning · 2014-02-05 14:53

Not quite, as:

- the return address is not stored
- the function list is not word pointers at a fixed base address in the hub, but a long list (takes twice the hub ram)

However...

- SERDES is more important than this
- HUNGRY is more important than this

jmg wrote: »
Sounds good, but there is already this
JMPLIST jumps to a base address (S/@/@@) plus index (D).

        JMPLIST D,@relative9        'jump to D plus 9-bit relative address
        JMPLIST D,@@relative16      'jump to D plus 16-bit relative address
        JMPLIST D,S                 'jump to D plus S
and Call to a series of offsets will take the same time as a call to a series of jumps ?
- and I think current opcodes can support a call to a series of jumps ?

mindrobots · 2014-02-05 14:54

Do we need to take a step back to reflect on Chip's vision that started this all??

UN FUN DOGS:

unfundogs.jpg

FUN DOGS:

fundogs.jpg

Ok, carry on with the discussion!

Cluso99 · 2014-02-05 15:06

Chip,

JMPLIST D,S/@/@@

I note that the latest docs (Line 1589+) have conflicting definitions for the index in D or S.

Bill,

The pasm op name is currently limited to 7 characters. I like CALLVEC so perhaps we could rename JMPLIST to JMPVEC too?

Chip & all,

Could all the JMP instructions save a return address in the $1F1 register without penalty if the JMP is taken?
Could this solve the GCC request for CALLR by using JMPxx where the return address is saved in $1F1? Note if further JMP's were used, then the GCC would need to perform a MOV $1F0,$1F1 to save the return address before it is overwritten.

Bill Henning · 2014-02-05 15:25

Ray,

Sounds good to me!

I suspect your suggestion would work, as I don't think JMPx currently use the cog memory write port.

Cluso99 wrote: »

Bill,

The pasm op name is currently limited to 7 characters. I like CALLVEC so perhaps we could rename JMPLIST to JMPVEC too?

Chip & all,

Could all the JMP instructions save a return address in the $1F1 register without penalty if the JMP is taken?
Could this solve the GCC request for CALLR by using JMPxx where the return address is saved in $1F1? Note if further JMP's were used, then the GCC would need to perform a MOV $1F0,$1F1 to save the return address before it is overwritten.

David Betz · 2014-02-05 15:37

Cluso99 wrote: »

Chip,

JMPLIST D,S/@/@@

I note that the latest docs (Line 1589+) have conflicting definitions for the index in D or S.

Bill,

The pasm op name is currently limited to 7 characters. I like CALLVEC so perhaps we could rename JMPLIST to JMPVEC too?

Chip & all,

Could all the JMP instructions save a return address in the $1F1 register without penalty if the JMP is taken?
Could this solve the GCC request for CALLR by using JMPxx where the return address is saved in $1F1? Note if further JMP's were used, then the GCC would need to perform a MOV $1F0,$1F1 to save the return address before it is overwritten.

This would not be an acceptable substitute for CALLR. Chip and I talked about this when I was at Parallax last week. I don't want to have the side effect that the other CALL instructions would have when I'm using CALLR. All of the other CALL instructions write the return address to some stack and I don't want that behavior when I'm calling a leaf function.

rogloh · 2014-02-05 15:56

Cluso99 wrote: »

Chip & all,

Could all the JMP instructions save a return address in the $1F1 register without penalty if the JMP is taken?
Could this solve the GCC request for CALLR by using JMPxx where the return address is saved in $1F1? Note if further JMP's were used, then the GCC would need to perform a MOV $1F0,$1F1 to save the return address before it is overwritten.

I believe using a statically fixed high register address such as $1F1 for return address storage (e.g. an LR equivalent) will limit the number of threads that can run hubexec code when the compiler uses this feature in the call code it generates. The register would become a common resource shared by all threads, and collisions could occur. It would be better either to make the LR register configurable to some particular COG address or, at the very least, to use a low memory address instead of $1F1, so that register remapping could give each thread its own context using the same register address offsets. That approach would allow multithreaded high-level-language execution sharing common hubexec code that is independent of task ID.

Edit: I vaguely recall this was touched on in the past, and I think Chip was even talking about using 0 for the LR. But if it is settable to something else, that could be useful too.
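The collision described above can be sketched in Python. Everything here is illustrative: the `run` helper, the call-site addresses, the `+ 4` return offset and the 16-registers-per-task remapping are assumptions, not P2 behaviour.

```python
def run(lr_address_for):
    """Interleave two tasks that each make one call through a link register.

    lr_address_for(task) returns the cog register used as that task's LR.
    Returns the address each task actually returns to.
    """
    cog = {}                                  # sparse model of cog registers
    call_sites = {0: 0x1000, 1: 0x2000}       # hypothetical hub addresses

    # Task 0 calls, then task 1 calls before task 0 has returned.
    for task in (0, 1):
        cog[lr_address_for(task)] = call_sites[task] + 4

    # Each task now returns through its link register.
    return {task: cog[lr_address_for(task)] for task in (0, 1)}


# Fixed LR at $1F1, shared by every task.
shared = run(lambda task: 0x1F1)

# LR at offset 0 of a per-task remapped block (16 registers per task assumed).
remapped = run(lambda task: task * 16 + 0)
```

In the shared case task 0 returns into task 1's code, because the second call overwrote the single register. With remapping, each task gets its own return address back.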

Cluso99 · 2014-02-05 16:42

David Betz wrote: »

This would not be an acceptable substitute for CALLR. Chip and I talked about this when I was at Parallax last week. I don't want to have the side effect that the other CALL instructions would have when I'm using CALLR. All of the other CALL instructions write the return address to some stack and I don't want that behavior when I'm calling a leaf function.

I don't understand your requirement then. Are you saying you want all the CALLs to save in a fixed location? (please no)

I thought you just wanted an additional CALLR instruction (hub/cog) where the return address was stored in a fixed location. I didn't think any other CALLs needed changing???

Cluso99 · 2014-02-05 17:22

Chip,

Would it be possible to squeeze in the USB bit read instruction GETXP #/D WZ,WC into the next release please?
Here is the reference (and others) http://forums.parallax.com/showthread.php/125543-Propeller-II-update-BLOG?p=1224661&viewfull=1#post1224661
(note some of these have been resolved)

Possible instruction fixes/changes/suggestions/additions...
=======================================================================================================
Here is a possible fix required:
WAITCNT
Thread: http://forums.parallax.com/showthread.php/151904-Here-is-the-update-from-the-Big-Change!!!?p=1222701&viewfull=1#post1222701
=======================================================================================================
Reason: Add new pin-pair instruction for use with USB bit-banging receive (similar to GETP/GETNP)
        The S value (sub-instruction bits) "yyyyyyyy" would use the next available slot after CACHEX
Thread: http://forums.parallax.com/showthread.php/151904-Here-is-the-update-from-the-Big-Change!!!?p=1222515&viewfull=1#post1222515
1111111 ZC L CCCC DDDDDDDDD xyyyyyyyy       GETXP   [#]D [WZ],[WC]  ' set flags for the pin-pair for usb bit-banging  
                                                                    '   D = PINx (0..127), PINy := PINx XOR $1 (its complementary pin in the pair)
                                                                    '   C = C XOR PINx via WC
                                                                    '   Z = !(PINx OR PINy) via WZ (ie ZERO if PINx and PINy are both ZERO == SE0 in USB)
PINx and PINy are a pair of pins. If PINx is even then PINy := PINx + 1 else if PINx is odd then PINy := PINx - 1
The allowance for the PINx/PINy pair to be reversed is for USB LS & HS where J/K are effectively swapped between D-/D+.
WZ & WC would normally be used.
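The flag behaviour proposed for GETXP can be modelled in a few lines of Python. This is a sketch under assumptions: the function name `getxp` and the representation of pin states as one integer (bit n = pin n) are illustrative only.

```python
def getxp(pins, pin_x, c_in):
    """Model of the proposed GETXP: returns (C, Z) for the pair containing pin_x.

    pins  : integer whose bit n is the state of pin n (assumed representation)
    pin_x : pin number 0..127
    c_in  : current C flag (0 or 1)
    """
    pin_y = pin_x ^ 1                       # complementary pin of the pair
    x = (pins >> pin_x) & 1
    y = (pins >> pin_y) & 1
    c_out = c_in ^ x                        # C = C XOR PINx
    z_out = 1 if (x | y) == 0 else 0        # Z set when both pins are 0 (SE0)
    return c_out, z_out


pins = 0b01                                 # pin 0 high, pin 1 low
from_pin0 = getxp(pins, 0, 0)               # PINx = pin 0, PINy = pin 1
from_pin1 = getxp(pins, 1, 0)               # same pair, roles reversed
se0 = getxp(0, 0, 1)                        # both pins low
```

Naming the odd pin as PINx reverses the roles of the two pins, which is how the same instruction could serve both J/K polarities.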
=======================================================================================================
Reason: Add new instruction(s) for calculating/accumulating CRC for 1-bit using the Polynomial set in "ACCA"
        The S value (sub-instruction bits) "yyyyyyyy" would use the next available slot after CACHEX
        
Thread: http://forums.parallax.com/showthread.php/151992-CRC-generation?p=1222728&viewfull=1#post1222728
1111111 xx x CCCC DDDDDDDDD xyyyyyyyy       CRCBIT  D   ' accumulate CRC
                                                        '   C    = current data bit (to be accumulated)
                                                        '   D    = CRC Register
                                                        '   ACCA = polynomial
The CRCBIT instruction performs the following...
(1) X := C XOR D[0]
(2) D := D >> 1
(3) if X == 1 then D := D XOR ACCA
Alternately, a special register to hold the polynomial "POLY" could be used, requiring the instruction(s)
1111111 x0 x xxxx DDDDDDDDD xyyyyyyyy       CRCBIT  D   ' accumulate CRC
1111111 x1 x xxxx DDDDDDDDD xyyyyyyyy       SETPOLY D   ' set the polynomial to be used in 
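The three CRCBIT steps listed above match the usual right-shifting (reflected) bitwise CRC. A Python sketch, where `crcbit` and `crc_bytes` are hypothetical names and feeding each byte least significant bit first is an assumption about how the instruction would be used:

```python
def crcbit(d, c, acca):
    """One CRCBIT step as listed above: returns the new CRC register value."""
    x = c ^ (d & 1)        # (1) X := C XOR D[0]
    d >>= 1                # (2) D := D >> 1
    if x:                  # (3) if X == 1 then D := D XOR ACCA
        d ^= acca
    return d


def crc_bytes(data, acca, init=0):
    """Feed bytes through crcbit, least significant bit first."""
    d = init
    for byte in data:
        for i in range(8):
            d = crcbit(d, (byte >> i) & 1, acca)
    return d


# 0xA001 is the bit-reversed form of the CRC-16 polynomial 0x8005.
crc16 = crc_bytes(b"123456789", 0xA001)

# 0xEDB88320 is the bit-reversed CRC-32 polynomial; CRC-32 also inverts
# the initial value and the final result.
crc32 = crc_bytes(b"123456789", 0xEDB88320, init=0xFFFFFFFF) ^ 0xFFFFFFFF
```

For the standard test string these give 0xBB3D (CRC-16/ARC) and 0xCBF43926 (CRC-32), so the one-bit step is sufficient as a building block when the polynomial is supplied in reflected form.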
=======================================================================================================
Reason: Add new pin-pair variants for use with complementary/differential I/O 2 wire protocols
Thread: http://forums.parallax.com/showthread.php/151904-Here-is-the-update-from-the-Big-Change!!!?p=1222689&viewfull=1#post1222689

For reference only...
ZCL-            1111111 ZC L CCCC DDDDDDDDD x00111000           SETZC   D/#             (D[1:0] into Z/C via WZ/WC)
                                                            presume this really means...(D[1:0] into !Z/C via WZ/WC)
Currently
ZCL-            1111111 ZC L CCCC DDDDDDDDD x00110000           GETP    D/#             (pin into !Z/C via WZ/WC)
ZCL-            1111111 ZC L CCCC DDDDDDDDD x00110001           GETNP   D/#             (pin into Z/!C via WZ/WC)
--L-            1111111 xx L CCCC DDDDDDDDD x10011000           OFFP    D/#
--L-            1111111 xx L CCCC DDDDDDDDD x10011001           NOTP    D/#
--L-            1111111 xx L CCCC DDDDDDDDD x10011010           CLRP    D/#
--L-            1111111 xx L CCCC DDDDDDDDD x10011011           SETP    D/#
--L-            1111111 xx L CCCC DDDDDDDDD x10011100           SETPC   D/#
--L-            1111111 xx L CCCC DDDDDDDDD x10011101           SETPNC  D/#
--L-            1111111 xx L CCCC DDDDDDDDD x10011110           SETPZ   D/#
--L-            1111111 xx L CCCC DDDDDDDDD x10011111           SETPNZ  D/#
Replace with...
ZCL-            1111111 00 L CCCC DDDDDDDDD x00110000           GETPP   D/#     (pin-pair PINy:PINx into !Z/C)
ZCL-            1111111 ZC L CCCC DDDDDDDDD x00110000           GETP    D/#             (pin into !Z/C via WZ/WC)
ZCL-            1111111 00 L CCCC DDDDDDDDD x00110001           GETNPP  D/#     (pin-pair PINy:PINx into Z/!C)
ZCL-            1111111 ZC L CCCC DDDDDDDDD x00110001           GETNP   D/#             (pin into Z/!C via WZ/WC)
These could share opcodes???
--L-            1111111 00 L CCCC DDDDDDDDD x10011000           OFFP    D/#             (pin#=0???  , dir#=0)
--L-            1111111 01 L CCCC DDDDDDDDD x10011000           NOTP    D/#             (pin#=!pin# , dir#=1)
--L-            1111111 10 L CCCC DDDDDDDDD x10011000           CLRP    D/#             (pin#=0     , dir#=1)
--L-            1111111 11 L CCCC DDDDDDDDD x10011000           SETP    D/#             (pin#=1     , dir#=1)
These could share opcodes???
--L-            1111111 00 L CCCC DDDDDDDDD x10011001           SETPC   D/#             (pin#=C     , dir#=1)
--L-            1111111 01 L CCCC DDDDDDDDD x10011001           SETPNC  D/#             (pin#=!C    , dir#=1)
--L-            1111111 10 L CCCC DDDDDDDDD x10011001           SETPZ   D/#             (pin#=Z     , dir#=1)
--L-            1111111 11 L CCCC DDDDDDDDD x10011001           SETPNZ  D/#             (pin#=!Z    , dir#=1)
New pin-pair instructions...(could use x10011010-x10011111 if freed above, or use new sub-opcodes avail following CACHEX)
--L-            1111111 00 L CCCC DDDDDDDDD x10011010           OFFPP   D/#     (pin-pair PINy:PINx=00???       , dir#=00)
--L-            1111111 01 L CCCC DDDDDDDDD x10011010           NOTPP   D/#     (pin-pair PINy:PINx=!PINy:!PINx), dir#=11)
--L-            1111111 10 L CCCC DDDDDDDDD x10011010           CLRPP   D/#     (pin-pair PINy:PINx=00          , dir#=11)
--L-            1111111 11 L CCCC DDDDDDDDD x10011010           SETPP   D/#     (pin-pair PINy:PINx=11          , dir#=11)
--L-            1111111 00 L CCCC DDDDDDDDD x10011011           SETPPLH D/#     (pin-pair PINy:PINx=01          , dir#=11)
--L-            1111111 01 L CCCC DDDDDDDDD x10011011           SETPPHL D/#     (pin-pair PINy:PINx=10          , dir#=11)
                                                                  Note: SETPPHL could be achieved by using SETPPLH PINy
I don't really see the need for these 2, but put it here in case you think it desirable...
--L-            1111111 10 L CCCC DDDDDDDDD x10011011           SETPPZC D/#     (pin-pair PINy:PINx=!Z/C        , dir#=1)
--L-            1111111 11 L CCCC DDDDDDDDD x10011011           SETPPNF D/#     (pin-pair PINy:PINx=Z/!C        , dir#=1)
D/# specifies PINx (0..127). PINy := PINx XOR #1 (ie it's twin pin-pair)
(ie PINx and PINy are a pair of pins. If PINx is even then PINy := PINx + 1 else if PINx is odd then PINy := PINx - 1)
=======================================================================================================
Reason: Combine to use 1 instruction with variants
        Frees up opcodes 1000000 & 1000001
        Remove WZ/WC options
        Providing ENCOD can remove WZ option, it can move from 1000011,
         freeing BLMASK to share with another instruction variant
        
Currently...
ZCWS            1000000 ZC I CCCC DDDDDDDDD SSSSSSSSS           DECOD3  D,S/#
ZCWS            1000001 ZC I CCCC DDDDDDDDD SSSSSSSSS           DECOD4  D,S/#
ZCWS            1000010 ZC I CCCC DDDDDDDDD SSSSSSSSS           DECOD5  D,S/#
Z-WS            1000011 Z0 I CCCC DDDDDDDDD SSSSSSSSS           ENCOD   D,S/#   (shared with BLMASK)

Replace with...
--WS            1000010 00 I CCCC DDDDDDDDD SSSSSSSSS           DECOD3  D,S/#
--WS            1000010 01 I CCCC DDDDDDDDD SSSSSSSSS           DECOD4  D,S/#
--WS            1000010 10 I CCCC DDDDDDDDD SSSSSSSSS           DECOD5  D,S/#
--WS            1000010 11 I CCCC DDDDDDDDD SSSSSSSSS           ENCOD   D,S/#   
=======================================================================================================
Reason: Combine to use 1 instruction with variants
        May facilitate later use of opcode 1111110
Currently...        
-----------------------------------------------------------------------------------------------------
1111110 10 n nnnn nnnnnnnnn nnniiiiii        REPS    #n,#i   'execute 1..64 inst's 1..131072 times  1
1111111 00 0 CCCC 111111111 001iiiiii        REPD    #i      'execute 1..64 inst's infinitely       1
1111111 00 0 CCCC DDDDDDDDD 001iiiiii        REPD    D,#i    'execute 1..64 inst's D+1 times        1
1111111 00 1 CCCC nnnnnnnnn 001iiiiii        REPD    #n,#i   'execute 1..64 inst's 1..512 times     1
-----------------------------------------------------------------------------------------------------
Replace with...
        fL *                                                 ' *=infinitely
1111111 00 0 xxxx DDDDDDDDD 001iiiiii        REPS    D,#i    'execute 1..64 inst's D+1 times        1+1
1111111 00 1 xxxx xxxxxxxxx 001iiiiii        REPS    #i      'execute 1..64 inst's infinitely       1+1
1111111 01 n nnnn nnnnnnnnn 001iiiiii        REPS    #n,#i   'execute 1..64 inst's 1..16384 times   1+1
1111111 10 0 CCCC DDDDDDDDD 001iiiiii        REPD    D,#i    'execute 1..64 inst's D+1 times        1+3
1111111 10 1 CCCC xxxxxxxxx 001iiiiii        REPD    #i      'execute 1..64 inst's infinitely       1+3
1111111 11 0 CCCC nnnnnnnnn 001iiiiii        REPD    #n,#i   'execute 1..64 inst's 1..512 times     1+3
=======================================================================================================
Reason: Swap instruction opcodes GETWORD/SETWORD, WAITPEQ and WAITPNE with TESTB, WRBYTE/WRWORD and SQRT64/QSINCOS
          so that SETNIB works with these instructions (ie all nibble #6 bits other than "n/nn/nnn" bits are zeros)
Of the instructions that have n, nn & nnn in their opcodes & WZ fields, only GETWORD/SETWORD, WAITPEQ and WAITPNE
have opcodes that have "1" bits in the 6th nibble (other than "n" bits).
If these instruction opcodes were swapped with TESTB, WRBYTE/WRWORD and SQRT64/QSINCOS,
their 6th nibble bits would have "0" bits in the non "n" bit positions.
This would permit the SETNIB D,[#]S,#6 instruction to be used to set the "n/nn/nnn" bits,
providing the remaining nibble bits are "0".
Thread: http://forums.parallax.com/showthread.php/151904-Here-is-the-update-from-the-Big-Change!!!?p=1222324&viewfull=1#post1222324
=======================================================================================================
Reason: Suggested by David & Bill for GCC assistance
Thread: http://forums.parallax.com/showthread.php/152079-Hub-Execution-Model-Thread-(split-from-blog)?p=1224484&viewfull=1#post1224484
(and also a little earlier for the history)
Background: Any instruction with an immediate value for #S is limited to 9-bits.
            GCC often needs to manipulate a larger value, and so performs a few instructions to utilise this.
            David & Bill can explain the purpose better than I can.
What is desired is a way to utilise an instruction to set an internal register, which, when combined with
the following instruction, which will use an immediate #S value, the resultant S value is an immediate
value of 32 bits. This would only work for the following instruction after "BIG", and the BIG would then
be reset to zeros (or a flag cleared).
Originally what was asked for is this BIG instruction to set the upper bits 31..9 with the immediate 23-bit "n"
field, and the lower bits 8..0 = 000000000.
By making this more general purpose, perhaps the following might be implemented instead...
BIG #n sets an internal register "BIG" with the immediate 23 bits, either the top 23 bits or the bottom 23 bits,
depending on another instruction bit "Z". (ie Z indicates n<<9)
If the ALU now takes any #S instruction, and if the previous instruction was a "BIG", then the ALU will combine
the immediate 9 bits with the BIG register to form a new immediate value. Since there may be insufficient time
to add the BIG value to the #S value in the pipeline, it was thought that an "OR" of the bits might be simpler,
or alternately, just use the upper 23 bits of BIG with the lower 9 bits of #S.
              
Presuming we can free up a full instruction, then... 
xxxxxxx 10 n nnnn nnnnnnnnn nnnnnnnnn        BIG     #D      ' Load 23 immediate bits into the lower "BIG" register bits 22..0 and zero bits 31..23.
xxxxxxx 11 n nnnn nnnnnnnnn nnnnnnnnn        BIGU    #D      ' Load 23 immediate bits into the upper "BIG" register bits 31..9 and zero bits 8..0
4 such registers for use in multi-tasking.
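The prefix idea can be sketched in Python for a single task. The class `BigPrefix`, its method names and `split32` are hypothetical; the sketch uses the "OR" variant and assumes the prefix is consumed by exactly one following #S instruction.

```python
MASK32 = 0xFFFFFFFF


class BigPrefix:
    """Toy model of the proposed BIG register for one task."""

    def __init__(self):
        self.big = 0
        self.armed = False

    def big_lower(self, n):
        # BIG: 23 immediate bits into bits 22..0, bits 31..23 zeroed.
        self.big = n & 0x7FFFFF
        self.armed = True

    def big_upper(self, n):
        # BIGU: 23 immediate bits into bits 31..9, bits 8..0 zeroed.
        self.big = ((n & 0x7FFFFF) << 9) & MASK32
        self.armed = True

    def immediate(self, s9):
        # Value seen by the next instruction that uses #S.
        value = s9 & 0x1FF
        if self.armed:
            value |= self.big          # OR variant discussed above
            self.big = 0               # prefix applies to one instruction only
            self.armed = False
        return value


def split32(value):
    """Split a 32-bit constant into (BIGU field, 9-bit #S field)."""
    return (value >> 9) & 0x7FFFFF, value & 0x1FF


prefix = BigPrefix()
upper, lower = split32(0xDEADBEEF)
prefix.big_upper(upper)
first = prefix.immediate(lower)    # prefixed instruction sees all 32 bits
second = prefix.immediate(lower)   # next instruction sees only 9 bits again
```

Because BIGU leaves bits 8..0 zero, OR and "upper 23 bits plus lower 9 bits" give the same result in this case; the two variants only differ for the lower-bits BIG form.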
=======================================================================================================

evanh · 2014-02-05 17:53

David Betz wrote: »

This would not be an acceptable substitute for CALLR. Chip and I talked about this when I was at Parallax last week. I don't want to have the side effect that the other CALL instructions would have when I'm using CALLR. All of the other CALL instructions write the return address to some stack and I don't want that behavior when I'm calling a leaf function.

Aside from my occasional snipe I've not been paying much attention to this topic. I just now tried to work out why this desire for a link register. As far as I can tell, the main reason is for speed. If that's true then, for the most part, CALLD/RETD should achieve 1 clock equivalent speed. What advantage would a link register have here?

jmg · 2014-02-05 18:06

Cluso99 wrote: »

Could all the JMP instructions save a return address in the $1F1 register without penalty if the JMP is taken?
Could this solve the GCC request for CALLR by using JMPxx where the return address is saved in $1F1? Note if further JMP's were used, then the GCC would need to perform a MOV $1F0,$1F1 to save the return address before it is overwritten.

Is there room to make the 'stack save' optional ?
Then you could do a SLCALL (single level call) and a (following later) JMPLIST and finally a SLRET

jmg · 2014-02-05 18:07

Bill Henning wrote: »

However...
- SERDES is more important than this

Agreed.

Bill Henning · 2014-02-05 18:12

No, the stack calls are extremely useful, not to be removed.

The reason a CALLVEC is useful is that it specifies in one instruction the table (DLL) base address, and the index of the function to call. Saves memory.

You could write:

        mov     dllindex,#fnix
        call    dllentry

...

dllentry: jmplist dllbase,dllindex

but that takes the extra MOV.

Maybe CALLVEC will make it into P3...

jmg wrote: »

Is there room to make the 'stack save' optional ?
Then you could do a SLCALL (single level call) and a (following later) JMPLIST and finally a SLRET

David Betz · 2014-02-05 18:22

Cluso99 wrote: »

I don't understand your requirement then. Are you saying you want all the CALLs to save in a fixed location? (please no)

I thought you just wanted an additional CALLR instruction (hub/cog) where the return address was stored in a fixed location. I didn't think any other CALLs needed changing???

Yes, that is what I need. Someone suggested that we could get rid of CALLR by having the other CALL instructions optionally store their return addresses in a register. I'm just saying that won't do what is needed. The CALLR instruction is the solution I'm looking for.

David Betz · 2014-02-05 18:27

evanh wrote: »

Aside from my occasional snipe I've not been paying much attention to this topic. I just now tried to work out why this desire for a link register. As far as I can tell, the main reason is for speed. If that's true then, for the most part, CALLD/RETD should achieve 1 clock equivalent speed. What advantage would a link register have here?

It turns out that CALLR is more flexible. You can either return directly to the address in the register or you can push the register on a hub stack. If you use the CALLD/RETD instructions you'd have to pop the return address off the stack into a register before pushing it onto the hub stack. Anyway, this whole thing was discussed in great detail much earlier in this thread.

evanh · 2014-02-05 18:39

Cluso99 wrote: »

Chip,

JMPLIST D,S/@/@@

I note that the latest docs (Line 1589+) have conflicting definitions for the index in D or S.

It's a matter of interpretation as to what is considered a base. D+S form an absolute address, they are equals. The @@ is a relative constant and only forms a base after being added to PC. Some descriptions for other processors have the constant labelled as an offset and the register as the base, but it's still just a label.

Actually, the instruction description could be improved. I believe JMPLIST D,@@relative16 means PC + @@ + D but I could be wrong. Whereas JMPLIST D,S simply means D + S.
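That reading of the two address forms can be written out as a Python sketch. It encodes only the interpretation given in this post, which is itself tentative; the function names are hypothetical, and treating the relative field as a signed two's-complement value is an additional assumption.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def jmplist_reg(d, s):
    # JMPLIST D,S: target is simply D + S, so neither operand is "the" base.
    return d + s


def jmplist_rel(pc, d, rel, bits):
    # JMPLIST D,@rel9 / D,@@rel16: target is PC + relative field + D.
    return pc + sign_extend(rel, bits) + d


absolute = jmplist_reg(0x100, 3)
back_one = jmplist_rel(0x200, 3, 0x1FF, 9)     # 9-bit field holding -1
forward = jmplist_rel(0x200, 3, 0x0010, 16)    # 16-bit field holding +16
```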

evanh · 2014-02-05 18:42

David Betz wrote: »

If you use the CALLD/RETD instructions you'd have to pop the return address off the stack into a register before pushing it onto the hub stack.

Ah, a hub stack, I see. I hadn't considered ever having one of those. Hmm, stacks are treated as unlimited too much of the time.

Actually, the real problem here is the combined data stack with the call-return stack. Keep them separate, problem solved. Hehe, I know, that would be hard for GCC to handle ... it's all been thrashed out before ...

David Betz · 2014-02-05 19:02

evanh wrote: »

Ah, a hub stack, I see. I hadn't considered ever having one of those. Hmm, stacks are treated as unlimited too much of the time.

Actually, the real problem here is the combined data stack with the call-return stack. Keep them separate, problem solved.

I'm sure you know better than anyone else.

evanh · 2014-02-05 19:29

100% guaranteed. I'm Superman! ... hehe, I wasn't being very serious proposing a split stack as an alternative solution. Like I wasn't serious about blasphemy of CISC features with Bill. High level languages come with certain baggage, I get that.

jazzed · 2014-02-05 19:32

The link register business has already been discussed and settled last Friday in Rocklin. If Chip can think of a more useful form than what we discussed, I'm sure he will mention it.

Cluso99 · 2014-02-05 19:47

David Betz wrote: »

Yes, that is what I need. Someone suggested that we could get rid of CALLR by having the other CALL instructions optionally store their return addresses in a register. I'm just saying that won't do what is needed. The CALLR instruction is the solution I'm looking for.

Does the CALLR require 4 variants like the CALL instruction below?

----  1111110 01 1 CCCC 00 nnnnnnnnnnnnnnnn     CALL    #abs
----  1111110 01 1 CCCC 01 nnnnnnnnnnnnnnnn     CALL    @rel
----  1111110 01 1 CCCC 10 nnnnnnnnnnnnnnnn     CALLD   #abs
----  1111110 01 1 CCCC 11 nnnnnnnnnnnnnnnn     CALLD   @rel

David Betz · 2014-02-05 19:51

Cluso99 wrote: »

Does the CALLR require 4 variants like the CALL instruction below?

----  1111110 01 1 CCCC 00 nnnnnnnnnnnnnnnn     CALL    #abs
----  1111110 01 1 CCCC 01 nnnnnnnnnnnnnnnn     CALL    @rel
----  1111110 01 1 CCCC 10 nnnnnnnnnnnnnnnn     CALLD   #abs
----  1111110 01 1 CCCC 11 nnnnnnnnnnnnnnnn     CALLD   @rel

I'm not sure how Chip plans to implement it.

David Betz · 2014-02-05 20:18

evanh wrote: »

100% guaranteed. I'm Superman! ... hehe, I wasn't being very serious proposing a split stack as an alternative solution. Like I wasn't serious about blasphemy of CISC features with Bill. High level languages come with certain baggage, I get that.

I'm not exactly sure I know what you mean by "baggage". If you have a better idea of how to handle function calling conventions I'd be happy to hear it. The use of the link register helps speed up leaf functions without overly burdening non-leaf functions. The key is to use the same calling convention for all functions without having to know in advance whether they are leaf functions or not. If you can suggest a better calling convention that meets these requirements please let me know. We have to modify PropGCC to take advantage of Chip's new P2 instructions anyway so we could consider a new ABI if it will perform better than what we have now.
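The calling convention described here can be illustrated with a small Python model that counts hub stack writes. The `Machine` class, its method names and the return addresses are hypothetical; this is not how PropGCC implements anything, only a sketch of the idea that the caller always uses the same call while each callee decides whether to save the link register.

```python
class Machine:
    """Counts hub stack writes under a link-register calling convention."""

    def __init__(self):
        self.lr = None          # link register (written by CALLR)
        self.hub_stack = []     # software stack in hub RAM
        self.hub_writes = 0

    def callr(self, func, return_addr):
        # CALLR: only the link register is written, no stack access.
        self.lr = return_addr
        return func(self)

    def push_lr(self):
        self.hub_stack.append(self.lr)
        self.hub_writes += 1

    def pop_lr(self):
        self.lr = self.hub_stack.pop()


def leaf(m):
    # Leaf: never calls anything, so LR stays valid and is used directly.
    return m.lr


def non_leaf(m):
    # Non-leaf: must save LR before the nested call overwrites it.
    m.push_lr()
    m.callr(leaf, return_addr=0x204)
    m.pop_lr()
    return m.lr


m = Machine()
leaf_return = m.callr(leaf, return_addr=0x104)
writes_after_leaf = m.hub_writes
non_leaf_return = m.callr(non_leaf, return_addr=0x108)
writes_after_non_leaf = m.hub_writes
```

The leaf call costs no hub write at all, and the non-leaf call costs one push that it would have needed under any convention.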

evanh · 2014-02-05 21:47

As it stands there would probably have to be explicit directives in the source; drawing a line between what uses an aux stack and what doesn't. This isn't unreasonable though and maybe an optimiser could automatically draw such a line.

cgracey · 2014-02-06 00:35

Ale wrote: »

Hei Chip,

I have sort of a general coding question regarding Verilog. You have added loads and loads of opcodes, they need arguments, and you have huge fan-outs and muxes for the results. How is it that it is so fast? (I mean 80+ MHz.)
I had the idea of having more than one "opcode" register, so to say, so that each would have a smaller fan-out. Probably nothing new...

Thanks.

Edit: Maybe the FPGAs are just that fast... Let's see... I'll try to compile my (un-optimized and sub-par) 6809 for the Cyclone V and see. (It can do 40 MHz in the MachXO2 and 67 MHz in the Spartan3E; it only has 8 & 16 bit paths, but many muxes.)

Edit: It can do 90 MHz on the Cyclone V (5CEFA2F23C8N).

I spent a lot of time working out the mux tree for ALU results, as it is the biggest structure with heavy delays, and it is on par with other critical paths. The biggest mux (64 inputs) passes only registered signals. Instructions like SETS/SETD/SETI/MOVBYTS just re-orient signals, so they can go through this mux. There are two other smaller mux's for faster signals, and one final mux which inputs all three other mux's, along with some late-arriving signals from instructions like ADD/SUB/RDBYTE/ROL. This mux tree needed a lot of tuning to get optimized. The synthesis guys, early on, said to just make one huge mux and let the logic compiler sort it all out - that didn't work well. I needed to give some more guidance by forming a mux tree before the tools would compile/place/route it efficiently.

cgracey · 2014-02-06 00:38

mindrobots wrote: »

Great stuff, Chip!

The consistent operation will be a big help!

So, before the first P2 developer boards are made, is there going to be a signature sheet passed around so that anyone who contributed a feature suggestion or helped with the development and testing can sign it? How cool would it be to have an autographed mask on the board with Chip's signature and all the contributors' signatures?

We could do something like that, but it wouldn't be very legible over a mere square inch. This forum attests to all the contributions pretty well, at least.

cgracey · 2014-02-06 00:46

Bill Henning wrote: »

I was thinking of Sapieha's request for a CALL-equivalent to the new list. Something like that could be useful for VM's, and even more so for operating systems / libraries.

CALLVECT D,#n
CALLVECT D,S

CALLLIST looked weird with 3 L's, so I changed it to VECT for the example

D holds the base address of a WORD table in the hub

n or S are the index

This way it could dispatch to 512 system/VM routines. I think only the 4-level hardware stack version would be needed.

It bears thinking on, even if only for P3.

I was thinking about this earlier because someone brought it up, and there is one big problem: CALLs need to specify which stack is to be used, so they take up a lot more space than JMPs do. We are pretty much out of room, already.
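For reference, the WORD-table dispatch quoted above can be sketched in Python. `build_word_table` and `callvect` are hypothetical helpers, and the little-endian layout, the `pc + 1` return address and the use of a simple list as the stack are assumptions.

```python
import struct


def build_word_table(entries):
    """Pack 16-bit entry addresses into hub RAM (little-endian assumed)."""
    return struct.pack("<%dH" % len(entries), *entries)


def callvect(hub, base, index, pc, stack):
    """Fetch the WORD at base + 2*index, save the return address, return the target."""
    (target,) = struct.unpack_from("<H", hub, base + 2 * index)
    stack.append(pc + 1)            # return address; the +1 step is assumed
    return target


entries = [0x0400, 0x0480, 0x0520]            # hypothetical routine addresses
base = 0x20
hub = bytearray(base) + bytearray(build_word_table(entries))

stack = []
target = callvect(hub, base, 2, pc=0x0100, stack=stack)

word_bytes = 2 * 512     # full 512-entry table of WORDs
long_bytes = 4 * 512     # same table as LONGs
```

A full 512-entry table takes 1 KB of hub RAM as WORDs against 2 KB as LONGs, which is the saving mentioned earlier in the thread.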
