Fast Bytecode Interpreter - Page 12 — Parallax Forums

Fast Bytecode Interpreter


Comments

  • Cluso99 wrote: »
    BTW What about JMPSKIP or JMPSKP and CALLSKP ?

    +1. So much more clear.
  • Chip
    That's my favourite type of shredding! :cool:
  • Cluso99Cluso99 Posts: 18,066
    Knees won't take that type of shredding anymore :(
  • cgraceycgracey Posts: 14,131
    edited 2017-03-23 12:28
    Seairth wrote: »
    Cluso99 wrote: »
    BTW What about JMPSKIP or JMPSKP and CALLSKP ?

    +1. So much more clear.

    I think I agree, but SKIP is going to have to become SKP to make:

    SKP
    JMPSKP
    CALLSKP

    ...Wait. There's a reason I didn't do that, in the first place. These instructions are not regular jumps, since they are confined to cog memory. Also, they do not use separate jump addresses and skip patterns, as one would expect. They are a special case of instruction that exists to exploit two things: fast branching and fast skipping, only possible inside cog memory.

    ROBOJMP
    ROBOSUB
  • JMPPAT
    CALLPAT
  • cgracey wrote: »
    I moved the RET/RETA/RETB instructions to overlap with CALL/CALLA/CALLB, since they could be differentiated by opposite immediate bits. This freed up three {#}D instruction slots, into which I placed SKIP and two other new ones that really take care of business:
    Ah, more instruction encoding changes. So much for a frozen design. :-)
    It's a good thing that I wrote a program to parse your instruction spreadsheet to generate the tables to drive the assembler.

  • Cluso99Cluso99 Posts: 18,066
    cgracey wrote: »
    Seairth wrote: »
    Cluso99 wrote: »
    BTW What about JMPSKIP or JMPSKP and CALLSKP ?

    +1. So much more clear.

    I think I agree, but SKIP is going to have to become SKP to make:

    SKP
    JMPSKP
    CALLSKP

    ...Wait. There's a reason I didn't do that, in the first place. These instructions are not regular jumps, since they are confined to cog memory. Also, they do not use separate jump addresses and skip patterns, as one would expect. They are a special case of instruction that exists to exploit two things: fast branching and fast skipping, only possible inside cog memory.

    ROBOJMP
    ROBOSUB
    I don't have a problem with JMPSKP and CALLSKP and being limited to cog/lut addresses. They are quite specific (not generic) instructions.
  • Well, whatever the name, not SHRED. That doesn't indicate that it's branching or skipping. It sounds more like a variant of the SEUSS instructions.
  • RaymanRayman Posts: 13,767
    how about HOPSKP...
  • cgracey wrote: »
    Seairth wrote: »
    Cluso99 wrote: »
    BTW What about JMPSKIP or JMPSKP and CALLSKP ?

    +1. So much more clear.


    CALLSKP

    That sounds like "Call Skype" :)

  • MJBMJB Posts: 1,235
    edited 2017-03-23 15:46
    jmg wrote: »
    ersmith wrote: »
    For example, IIRC the original Tachyon interpreter executes bytecodes very quickly because each bytecode is basically the address of the routine implementing it. The outer loop for such an interpreter is extremely small and fast, and skip is unlikely to be of much help.
    Yes, IIRC Peter changed to a 16b-code in V4 to get more speed, and it would be interesting to see what his inner loops look like on the newest P2.
    It is a good idea to test this on more than one 'byte-code' engine.
    E.g. I see in the Lua thread that Lua has a 6b instruction choice.

    @jmg, you have asked ...
    @Peter_Jakacki might not have seen your question.

    So I take the liberty of presenting the latest Tachyon-P2 kernel snippet I found in Peter's DROPBOX.

    It is a wordcode interpreter that uses the first 256 addresses to directly address COG memory.
    Actually this code seems a little 'old' - 17 days ;-). The code for Prop1 below for Tachyon V4.2 shows some enhancements.
    DAT
    '*************** TACHYON COG KERNEL for Propeller 2 *******************
    
    		org 0
    
    RESET
    		call	#INITCOG		' run non-time critical init from hubexec
    		jmp	#doNEXT
    '
    ' main Forth wordcode interpreter
    '
    
    exec
    		cmp 	X,wordcodes wc
    	if_nc	jmp 	#ENTER
    	 	call	X                     	' could call cog or hub code - use ret to return
    doNEXT          rdword	X,PTRA++   ' read word code instruction address, 
                                                                ' PTRA used as instruction pointer
    	       	cmp	ops,X wc
    	if_nc	jmp	#exec
    		call	#dolit8            ' handle 8 bit literal inside wordcode, saves a rdbyte
    		jmp	#doNEXT
    
    ops 		long 	$FF00-1
    wordcodes	long 	endcode
    
    ' Save IP and load with new IP from call
    ENTER 		test 	X,#1 wz
    		andn 	X,#1
    	if_z	wrlut	PTRA,retptr		' save IP onto return stack
    	if_z  	add	retptr,#1
     		mov 	PTRA,X          	' jump to new wordcode
    		jmp 	#doNEXT
    
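    To make the dispatch decisions in the P2 listing above easier to follow, here is a rough Python model of what doNEXT, exec and ENTER decide for each wordcode. It is only a sketch: the function name `classify` and the `endcode` parameter are made up for illustration (the real value of `endcode` is not given in the thread), and `dolit8` is not shown in the listing, so the literal case just hands back the raw wordcode.

```python
# Hypothetical model of the branch decisions in the P2 inner loop above.
# It mirrors only the compares and the bit-0 test; it is not Peter's code.

OPS_LIMIT = 0xFF00 - 1  # the 'ops' constant from the listing


def classify(wordcode, endcode):
    """Return (kind, value, save_ip) for one 16-bit wordcode.

    endcode stands in for the 'wordcodes long endcode' constant; its real
    value is not given in the thread, so the caller has to supply one.
    """
    x = wordcode & 0xFFFF

    # cmp ops,X wc : carry is set when ops < X, so the loop falls through
    # to 'call #dolit8'. dolit8 is not shown, so the raw value is returned.
    if x > OPS_LIMIT:
        return ("literal8", x, False)

    # cmp X,wordcodes wc : carry is set when X < wordcodes, so 'call X' runs
    # the code at that address directly.
    if x < endcode:
        return ("call_code", x, False)

    # ENTER: bit 0 clear means the IP is saved on the return stack first;
    # bit 0 set means a plain jump. Bit 0 is cleared from the target either way.
    save_ip = (x & 1) == 0
    return ("enter", x & ~1, save_ip)
```

    For example, with a made-up `endcode` of $200, `classify(0x1001, 0x200)` gives `("enter", 0x1000, False)`: a jump to the wordcode list at $1000 without pushing the IP.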

    Prop1 V4.2 code for inner loop
    { *** RUNTIME BYTECODE INTERPRETER for Propeller 1 *** }
    '       *       *       *       *
    ' Fetch the next byte code instruction in hub RAM pointed to by the instruction pointer IP
    ' This is the very heart of the runtime interpreter
    '
    doNEXT                  rdword  R0,IP            ' read word code instruction
                            add     IP,#2 wc        ' advance IP to next wordcode (clears the carry too!)
                            shr     R0,#9 nr,wz      ' cog or hub?
                     if_z   jmp     R0               ' execute the code by directly indexing the first 512 longs in cog
                            ' literal?
                            mov     X,R0
                            shr     X,#15 nr,wz     ' embedded 15-bit literal?
                     if_nz  and     X,mask15
                     if_nz  jmp     #PUSHX          ' push this literal without having to do a call/return
                            ' conditional jump?
                            cmp     jumpop,X wc     ' test for jump + disp
                     if_c   jmp     #doJUMP
                            ' register op?
                            cmp     regop,X wc      ' test for REG + disp
                     if_c   jmp     #doREG
                            ' must be a call/jump
                            test    X,#1 wz         ' skip pushing the IP if bit 0 is set
                     if_nz  andn    X,#1            ' and fixup
                     if_z   call    #SAVEIP         ' otherwise this is an address of a high-level word code
                            mov     IP,X            ' so after saving the IP, load it with new address
                            jmp     #doNEXT
    
    ' Conditional - Jumps are encoded with an 8-bit signed word displacement
    doJUMP                                          ' unconditional jump ? c=1
                            tjnz    tos,#DROP       ' continue without jumping - discard flag
                            test    X,#$80 wz       ' reverse jump? nz
                            and     X,#$7F          ' mask displacement
                            shl     X,#1            ' index words
                            sumnz    IP,X           ' +/- jump
                            jmp     #DROP           ' discard flag if conditional
    
                            ' any opcodes from here on are jumps with encoded +/-128 displacement
    jumpop          long    $7F00-1
    
    
    doREG '( -- addr )
                            and     X,#$FF
                            add     X,regptr
                            jmp     #PUSHX
    
                            ' address 256 bytes of task registers per cogid
    regop           long    rg-1
    
    
    
    
  • Roy ElthamRoy Eltham Posts: 2,996
    edited 2017-03-23 17:09
    I like JMPSKP & CALLSKP much more than SHRED & SHREDR, and am fine with them being cog/lut only while the other JMP/CALL instructions are not.

    Also, SKIP can stay SKIP, it doesn't have to shorten to SKP.

    Some day I'll convince you that the instructions can be longer than 7 characters. Fitting in a single 8 char tab stop space is not important. It's a detriment.

    I would use JUMPSKIP, CALLSKIP, etc.
  • Hub jump could be added with probably little effort with something like.
    SETQ #base
    CALLSKP reg
    

    Where the resulting branch address would be (base << 10) + reg[9:0]. If base is zero, then it would act as it does now (cog/lut exec). This would give you a skip table of 256 entries (because of the byte addressing).

    The question is: do we need it?
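    The address arithmetic in this proposal can be checked with a few lines of Python. This is only a sketch of the formula as written above; `skip_branch_address` and `table_entries` are illustrative names, and nothing here says how the hardware would actually do it.

```python
# Sketch of the proposed SETQ #base / CALLSKP reg address calculation.
# Only the formula (base << 10) + reg[9:0] comes from the post above.

def skip_branch_address(base, reg):
    """Combine the SETQ base with the low 10 bits of the register."""
    return (base << 10) + (reg & 0x3FF)


def table_entries(address_bits=10, bytes_per_long=4):
    """10 bits of byte address cover 1024 bytes, i.e. 256 longs."""
    return (1 << address_bits) // bytes_per_long
```

    With base = 0 the result is just reg[9:0], which matches the "acts as it does now" case; with base = 1 and reg = 0 the target is $400.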
  • Cluso99Cluso99 Posts: 18,066
    Seairth wrote: »
    Hub jump could be added with probably little effort with something like.
    SETQ #base
    CALLSKP reg
    

    Where the resulting branch address would be (base << 10) + reg[9:0]. If base is zero, then it would act as it does now (cog/lut exec). This would give you a skip table of 256 entries (because of the byte addressing).

    The question is: do we need it?
    If hub jump/call was to be supported (I'm quite happy without it) then it would be better to be consistent and use the AUG + JMPSKP/CALLSKP sequence.
  • Cluso99Cluso99 Posts: 18,066
    edited 2017-03-23 21:09
    Roy Eltham wrote: »
    I like JMPSKP & CALLSKP much more than SHRED & SHREDR, and am fine with them being cog/lut only while the other JMP/CALL instructions are not.

    Also, SKIP can stay SKIP, it doesn't have to shorten to SKP.

    Some day I'll convince you that the instructions can be longer than 7 characters. Fitting in a single 8 char tab stop space is not important. It's a detriment.

    I would use JUMPSKIP, CALLSKIP, etc.

    +1

    Perhaps it's more a restriction in pnut that's clouding Chip's view?

    Anyway this topic can wait until the silicon is underway.

    Twice in a week we agree. What happened?
  • Cluso99 wrote: »
    If hub jump/call was to be supported (I'm quite happy without it) then it would be better to be consistent and use the AUG + JMPSKP/CALLSKP sequence.

    Except AUG is for extending immediate values, whereas SETQ is often used to modify the behavior of an instruction.
  • jmgjmg Posts: 15,140
    cgracey wrote: »
    Whoa! This would blow up in cog-exec mode, because the REP logic counts instructions. If SKIP is causing fewer or variable numbers of instructions to execute, it's not going to work, unless you planned very carefully with perhaps constant instruction counts in every SKIP case, and set the REP block size accordingly.

    All REP does is get you out of needing a 4-clock branch instruction. It's only useful/beneficial for blocks of only several instructions, I think. There's probably no need for SKIP inside REP. But, you may find some value in it. Seems like a recipe for madness, to me.

    That means code that works in HUBexec, will break if moved into COG ? (and vice versa)
    That rather breaks one of the fundamental, hard won P2 gains, where binary code did not care where it ran, it just ran a bit slower in HUB. Nothing broke.

  • jmgjmg Posts: 15,140
    David Betz wrote: »
    Ah, more instruction encoding changes. So much for a frozen design. :-)
    It's a good thing that I wrote a program to parse your instruction spreadsheet to generate the tables to drive the assembler.
    Hehe, now that is smart thinking !

  • jmgjmg Posts: 15,140
    Roy Eltham wrote: »
    Some day I'll convince you that the instructions can be longer than 7 characters. Fitting in a single 8 char tab stop space is not important. It's a detriment.

    I would use JUMPSKIP, CALLSKIP, etc.

    Yup, the days of needing to be 8 modulus paranoid, are well behind us.
    Better to design for the human eyeball, not some archaic TAB setting.

    Easy to read, & maintain, always trumps terse.
  • jmgjmg Posts: 15,140
    MJB wrote: »
    So I take the liberty of presenting the latest Tachyon-P2 kernel snippet I found in Peter's DROPBOX.

    It is a wordcode interpreter that uses the first 256 addresses to directly address COG memory.
    Actually this code seems a little 'old' - 17 days ;-).
    DAT
    '*************** TACHYON COG KERNEL for Propeller 2 *******************
    		org 0
    RESET
    		call	#INITCOG		' run non-time critical init from hubexec
    		jmp	#doNEXT
    '
    ' main Forth wordcode interpreter
    
    exec
    		cmp 	X,wordcodes wc
    	if_nc	jmp 	#ENTER
    	 	call	X                     	' could call cog or hub code - use ret to return
    doNEXT          rdword	X,PTRA++   ' read word code instruction address, 
                                                                ' PTRA used as instruction pointer
    	       	cmp	ops,X wc
    	if_nc	jmp	#exec
    		call	#dolit8            ' handle 8 bit literal inside wordcode, saves a rdbyte
    		jmp	#doNEXT
    
    ops 		long 	$FF00-1
    wordcodes	long 	endcode
    
    ' Save IP and load with new IP from call
    ENTER 		test 	X,#1 wz
    		andn 	X,#1
    	if_z	wrlut	PTRA,retptr		' save IP onto return stack
    	if_z  	add	retptr,#1
     		mov 	PTRA,X          	' jump to new wordcode
    		jmp 	#doNEXT
    

    Thanks, so how much can that improve, with Chip's new SKIP variants ?

  • Not much, unless Peter also merges multiple subroutines (words) into fewer ones, interleaving code and using skip.

    Mike

  • MJBMJB Posts: 1,235
    jmg wrote: »
    MJB wrote: »
    So I take the liberty of presenting the latest Tachyon-P2 kernel snippet I found in Peter's DROPBOX.

    It is a wordcode interpreter that uses the first 256 addresses to directly address COG memory.
    Actually this code seems a little 'old' - 17 days ;-).
    DAT
    '*************** TACHYON COG KERNEL for Propeller 2 *******************
    		org 0
    RESET
    		call	#INITCOG		' run non-time critical init from hubexec
    		jmp	#doNEXT
    '
    ' main Forth wordcode interpreter
    
    exec
    		cmp 	X,wordcodes wc
    	if_nc	jmp 	#ENTER
    	 	call	X                     	' could call cog or hub code - use ret to return
    doNEXT          rdword	X,PTRA++   ' read word code instruction address, 
                                                                ' PTRA used as instruction pointer
    	       	cmp	ops,X wc
    	if_nc	jmp	#exec
    		call	#dolit8            ' handle 8 bit literal inside wordcode, saves a rdbyte
    		jmp	#doNEXT
    
    ops 		long 	$FF00-1
    wordcodes	long 	endcode
    
    ' Save IP and load with new IP from call
    ENTER 		test 	X,#1 wz
    		andn 	X,#1
    	if_z	wrlut	PTRA,retptr		' save IP onto return stack
    	if_z  	add	retptr,#1
     		mov 	PTRA,X          	' jump to new wordcode
    		jmp 	#doNEXT
    

    Thanks, so how much can that improve, with Chip's new SKIP variants ?

    ok then - let's see how the amateur (me) and the P2EG (Prop2 experts group) can improve Peter's Tachyon Forth word code engine for Prop2 before Peter wakes up and joins in ;-)

    As Mike writes, this requires combining the COG code of some words in order to use SKIP.

    Let's take this snippet:
    ' C@  ( caddr -- byte ) Fetch a byte from hub memory
    CFETCH                  rdbyte  tos,tos
                            jmp     unext
    
    ' W@  ( waddr -- word ) Fetch a word from hub memory
    WFETCH                  rdword  tos,tos
                            jmp     unext
    
    ' @  ( addr -- long ) Fetch a long from hub memory
    FETCH                   rdlong  tos,tos
                            jmp     unext
    
    
    could be rewritten to:
    ' Skip bits per word: 1 = skip, 0 = execute                     | C@ | W@ | @  |
    ' C@  ( caddr -- byte ) Fetch a byte from hub memory
    CFETCH                  rdbyte  tos,tos                         '  0    1    1

    ' W@  ( waddr -- word ) Fetch a word from hub memory
    WFETCH                  rdword  tos,tos                         '  1    0    1

    ' @  ( addr -- long ) Fetch a long from hub memory
    FETCH                   rdlong  tos,tos                         '  1    1    0
                            jmp     unext                           '  0    0    0
      
    
    But we use the lower 9 bits of the wordcode to address COG memory directly, and there is currently no redirection via LUT decoding. Bit 15 flags 15-bit immediate literals; other bits encode relative jumps and register access.
    So there is no space (no free bits) in the wordcode to store a SKIP map.
    Also, most kernel words are only 2 or 3 PASM instructions anyhow, so there is not really much to save.
    I just spent quite some time reading the V4.2 kernel code again ... I do NOT see right now where SKIP would help much with the current architecture. :-(
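    For anyone who wants to play with the idea anyway, here is a small Python model of how a skip pattern would pick one fetch out of the merged C@ / W@ / @ sequence. It assumes that bit 0 of the pattern applies to the first instruction and that a 1 bit means "skip"; the names `SEQUENCE`, `PATTERNS` and `apply_skip` are made up for this sketch, and the pattern values are my own reading of the table, not anything from the Tachyon kernel.

```python
# Hypothetical model of skip-pattern selection over the merged fetch words.
# Assumption: pattern bit n (LSB first) set means instruction n is skipped.

SEQUENCE = [
    "rdbyte tos,tos",
    "rdword tos,tos",
    "rdlong tos,tos",
    "jmp unext",
]

# One pattern per Forth word; the final jmp is never skipped.
PATTERNS = {
    "C@": 0b0110,  # skip rdword and rdlong
    "W@": 0b0101,  # skip rdbyte and rdlong
    "@": 0b0011,   # skip rdbyte and rdword
}


def apply_skip(sequence, pattern):
    """Return the instructions that still execute under the given pattern."""
    return [ins for n, ins in enumerate(sequence) if not (pattern >> n) & 1]
```

    Each word ends up executing exactly two instructions, the same as the separate versions, which is why there is not much to gain here: the saving is two cog longs, paid for with a pattern that has to be stored somewhere.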
  • Cluso99Cluso99 Posts: 18,066
    edited 2017-03-24 02:18
    Rayman wrote: »
    how about HOPSKP...

    or HOPSKPNJMP
    LOL
  • Cluso99Cluso99 Posts: 18,066
    cgracey wrote: »
    Seairth wrote: »
    @cgracey, what happens if there's an active skip mask inside of a rep block? Will the rep still work correctly?

    When REP is running in hub-exec, it actually does a JMP to get back to the top of the block, which entails resetting and reloading the FIFO. In cog-exec, the PC just gets changed in order to loop. So, I think SKIP would work in either case, but it may not work the same, as there'd be an extra SKIP bit consumed on each block iteration in hub-exec mode. In any case, SKIP would behave the same in both modes if it was just used within the code block, and not when it looped.

    Whoa! This would blow up in cog-exec mode, because the REP logic counts instructions. If SKIP is causing fewer or variable numbers of instructions to execute, it's not going to work, unless you planned very carefully with perhaps constant instruction counts in every SKIP case, and set the REP block size accordingly.

    All REP does is get you out of needing a 4-clock branch instruction. It's only useful/beneficial for blocks of only several instructions, I think. There's probably no need for SKIP inside REP. But, you may find some value in it. Seems like a recipe for madness, to me.

    Obviously skip works out a count to know how many instructions to skip over. Could that value be used to decrement the REP block size?
    If yes, is it easy or complex to do? If it's complex, then it's a caveat for skip instructions.
  • cgraceycgracey Posts: 14,131
    Cluso99 wrote: »
    Obviously skip works out a count to know how many instructions to skip over. Could that value be used to decrement the REP block size?
    If yes, is it easy or complex to do? If it's complex, then it's a caveat for skip instructions.

    It's way too complicated. It's like mixing polar and Cartesian domains. I imagine brushing my teeth while riding a dirt bike.
  • jmgjmg Posts: 15,140
    Seairth wrote: »
    Well, whatever the name, not SHRED. That doesn't indicate that it's branching or skipping. It sounds more like a variant of the SEUSS instructions.
    Rayman wrote: »
    how about HOPSKP...
    HOPSKP or HOPSKIP I quite like, as it is at least close to what actually happens.
    SHRED does not sound like a serious suggestion, more whimsical.
  • cgraceycgracey Posts: 14,131
    edited 2017-03-24 02:51
    jmg wrote: »
    Seairth wrote: »
    Well, whatever the name, not SHRED. That doesn't indicate that it's branching or skipping. It sounds more like a variant of the SEUSS instructions.
    Rayman wrote: »
    how about HOPSKP...
    HOPSKP or HOPSKIP I quite like, as it is at least close to what actually happens.
    SHRED does not sound like a serious suggestion, more whimsical.

    If we weren't hidebound to keeping it under 7 characters, we could just call it CHA_CHA_CHA. But I don't know what we'd call the one that returns.
  • cgracey wrote: »
    jmg wrote: »
    Seairth wrote: »
    Well, whatever the name, not SHRED. That doesn't indicate that it's branching or skipping. It sounds more like a variant of the SEUSS instructions.
    Rayman wrote: »
    how about HOPSKP...
    HOPSKP or HOPSKIP I quite like, as it is at least close to what actually happens.
    SHRED does not sound like a serious suggestion, more whimsical.

    If we weren't hidebound to keeping it under 7 characters, we could just call it CHA_CHA_CHA. But I don't know what we'd call the one that returns.

    WOCKA_WOCKA_WOCKA
  • Chip,
    Why are we "hidebound to keeping it under 7 characters" ???? I think only you are thinking that is a requirement, and it's just NOT.
  • cgraceycgracey Posts: 14,131
    Roy Eltham wrote: »
    Chip,
    Why are we "hidebound to keeping it under 7 characters" ???? I think only you are thinking that is a requirement, and it's just NOT.

    I know. I'm just used to an 8-spaces-per-tab world.

    Upon someone pointing out that we could use longer names, I reviewed all of our mnemonics and I think they are sufficiently expressive at under eight characters, each.

    The only problem is this new instruction that CALLs and SKIPs, but with a single operand.