Fast Bytecode Interpreter

Seairth · 2017-03-23 11:40

Cluso99 wrote: »

BTW What about JMPSKIP or JMPSKP and CALLSKP ?

+1. So much more clear.

ozpropdev · 2017-03-23 11:55

Chip
That's my favourite type of shredding! :cool:

Cluso99 · 2017-03-23 12:02

Knees won't take that type of shredding anymore

cgracey · 2017-03-23 12:18

Seairth wrote: »

Cluso99 wrote: »

BTW What about JMPSKIP or JMPSKP and CALLSKP ?

+1. So much more clear.

I think I agree, but SKIP is going to have to become SKP to make:

SKP
JMPSKP
CALLSKP

...Wait. There's a reason I didn't do that, in the first place. These instructions are not regular jumps, since they are confined to cog memory. Also, they do not use separate jump addresses and skip patterns, as one would expect. They are a special case of instruction that exists to exploit two things: fast branching and fast skipping, only possible inside cog memory.

ROBOJMP
ROBOSUB

ozpropdev · 2017-03-23 12:28

JMPPAT
CALLPAT

David Betz · 2017-03-23 13:31

cgracey wrote: »

I moved the RET/RETA/RETB instructions to overlap with CALL/CALLA/CALLB, since they could be differentiated by opposite immediate bits. This freed up three {#}D instruction slots, into which I placed SKIP and two other new ones that really take care of business:

Ah, more instruction encoding changes. So much for a frozen design. :-)
It's a good thing that I wrote a program to parse your instruction spreadsheet to generate the tables to drive the assembler.

Cluso99 · 2017-03-23 13:38

cgracey wrote: »

Seairth wrote: »

Cluso99 wrote: »

BTW What about JMPSKIP or JMPSKP and CALLSKP ?

+1. So much more clear.

I think I agree, but SKIP is going to have to become SKP to make:

SKP
JMPSKP
CALLSKP

...Wait. There's a reason I didn't do that, in the first place. These instructions are not regular jumps, since they are confined to cog memory. Also, they do not use separate jump addresses and skip patterns, as one would expect. They are a special case of instruction that exists to exploit two things: fast branching and fast skipping, only possible inside cog memory.

ROBOJMP
ROBOSUB

I don't have a problem with JMPSKP and CALLSKP and being limited to cog/lut addresses. They are quite specific (not generic) instructions.

Seairth · 2017-03-23 14:37

Well, whatever the name, not SHRED. That doesn't indicate that it's branching or skipping. It sounds more like a variant of the SEUSS instructions.

Rayman · 2017-03-23 14:50

how about HOPSKP...

Publison · 2017-03-23 15:01

cgracey wrote: »

Seairth wrote: »

Cluso99 wrote: »

BTW What about JMPSKIP or JMPSKP and CALLSKP ?

+1. So much more clear.

CALLSKP

That sounds like "Call Skype"

MJB · 2017-03-23 15:40

jmg wrote: »

ersmith wrote: »

For example, IIRC the original Tachyon interpreter executes bytecodes very quickly because each bytecode is basically the address of the routine implementing it. The outer loop for such an interpreter is extremely small and fast, and skip is unlikely to be of much help.

Yes, IIRC Peter changed to a 16b-code in V4 to get more speed, and it would be interesting to see what his inner loops look like on the newest P2.
It is a good idea to test this on more than one 'byte-code' engine.
eg I see in the Lua thread, that has a 6b instruction choice.

@jmg, you have asked ...
@Peter_Jakacki might not have seen your question.

So I take the liberty to present you the latest Tachyon-P2 kernel snippet I found on Peter's DROPBOX.

It is a wordcode interpreter, uses the first 256 addresses to directly address COG memory.
Actually this code seems a little 'old' - 17 days ;-). The code for Prop1 below for Tachyon V4.2 shows some enhancements.

DAT
'*************** TACHYON COG KERNEL for Propeller 2 *******************

		org 0

RESET
		call	#INITCOG		' run non-time critical init from hubexec
		jmp	#doNEXT
'
' main Forth wordcode interpreter
'

exec
		cmp 	X,wordcodes wc
	if_nc	jmp 	#ENTER
	 	call	X                     	' could call cog or hub code - use ret to return
doNEXT          rdword  X,PTRA++        ' read word code instruction address,
                                        ' PTRA used as instruction pointer
	       	cmp	ops,X wc
	if_nc	jmp	#exec
		call	#dolit8            ' handle 8 bit literal inside wordcode, saves a rdbyte
		jmp	#doNEXT

ops 		long 	$FF00-1
wordcodes	long 	endcode

' Save IP and load with new IP from call
ENTER 		test 	X,#1 wz
		andn 	X,#1
	if_z	wrlut	PTRA,retptr		' save IP onto return stack
	if_z  	add	retptr,#1
 		mov 	PTRA,X          	' jump to new wordcode
		jmp 	#doNEXT

Prop1 V4.2 code for inner loop

{ *** RUNTIME BYTECODE INTERPRETER for Propeller 1 *** }
'       *       *       *       *
' Fetch the next byte code instruction in hub RAM pointed to by the instruction pointer IP
' This is the very heart of the runtime interpreter
'
doNEXT                  rdword  R0,IP            ' read word code instruction
                        add     IP,#2 wc        ' advance IP to next wordcode (clears the carry too!)
                        shr     R0,#9 nr,wz      ' cog or hub?
                 if_z   jmp     R0               ' execute the code by directly indexing the first 512 longs in cog
                        ' literal?
                        mov     X,R0
                        shr     X,#15 nr,wz     ' embedded 15-bit literal?
                 if_nz  and     X,mask15
                 if_nz  jmp     #PUSHX          ' push this literal without having to do a call/return
                        ' conditional jump?
                        cmp     jumpop,X wc     ' test for jump + disp
                 if_c   jmp     #doJUMP
                        ' register op?
                        cmp     regop,X wc      ' test for REG + disp
                 if_c   jmp     #doREG
                        ' must be a call/jump
                        test    X,#1 wz         ' skip pushing the IP if bit 0 is set
                 if_nz  andn    X,#1            ' and fixup
                 if_z   call    #SAVEIP         ' otherwise this is an address of a high-level word code
                        mov     IP,X            ' so after saving the IP, load it with new address
                        jmp     #doNEXT

' Conditional - Jumps are encoded with an 8-bit signed word displacement
doJUMP                                          ' unconditional jump ? c=1
                        tjnz    tos,#DROP       ' continue without jumping - discard flag
                        test    X,#$80 wz       ' reverse jump? nz
                        and     X,#$7F          ' mask displacement
                        shl     X,#1            ' index words
                        sumnz    IP,X           ' +/- jump
                        jmp     #DROP           ' discard flag if conditional

                        ' any opcodes from here on are jumps with encoded +/-128 displacement
jumpop          long    $7F00-1


doREG '( -- addr )
                        and     X,#$FF
                        add     X,regptr
                        jmp     #PUSHX

                        ' address 256 bytes of task registers per cogid
regop           long    rg-1

Roy Eltham · 2017-03-23 17:08

I like JMPSKP & CALLSKP much more than SHRED & SHREDR, and am fine with them being cog/lut only while the other JMP/CALL instructions are not.

Also, SKIP can stay SKIP, it doesn't have to shorten to SKP.

Some day I'll convince you that the instructions can be longer than 7 characters. Fitting in a single 8 char tab stop space is not important. It's a detriment.

I would use JUMPSKIP, CALLSKIP, etc.

Seairth · 2017-03-23 17:32

Hub jump could probably be added with little effort, with something like:

SETQ #base
CALLSKP reg

Where the resulting branch address would be (base << 10) + reg[9:0]. If base is zero, then it would act as it does now (cog/lut exec). This would give you a skip table of 256 entries (because of the byte addressing).
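As a worked example of that address calculation (a Python sketch of the proposal only; `branch_address` is a hypothetical name and nothing here describes actual hardware):

```python
def branch_address(base: int, reg: int) -> int:
    """Proposed target: (base << 10) + reg[9:0]."""
    return (base << 10) + (reg & 0x3FF)  # keep only the low 10 bits of reg


# base = 0 leaves the low 10 bits untouched (the current cog/lut behaviour)
print(hex(branch_address(0, 0x123)))  # 0x123
# base = 2 moves the same 10-bit window up to start at $800
print(hex(branch_address(2, 0x7FF)))  # 0xbff
```

If the 10-bit field is a byte address, it spans 1024 bytes, or 256 longs, which is presumably where the 256-entry figure comes from.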

The question is: do we need it?

Cluso99 · 2017-03-23 20:48

Seairth wrote: »
Hub jump could be added with probably little effort with something like.
SETQ #base
CALLSKP reg
Where the resulting branch address would be (base << 10) + reg[9:0]. If base is zero, then it would act as it does now (cog/lut exec). This would give you a skip table of 256 entries (because of the byte addressing).

The question is: do we need it?

If hub jump/call was to be supported (I'm quite happy without it) then it would be better to be consistent and use the AUG + JMPSKP/CALLSKP sequence.

Cluso99 · 2017-03-23 21:05

Roy Eltham wrote: »

I like JMPSKP & CALLSKP much more than SHRED & SHREDR, and am fine with them being cog/lut only while the other JMP/CALL instructions are not.

Also, SKIP can stay SKIP, it doesn't have to shorten to SKP.

Some day I'll convince you that the instructions can be longer than 7 characters. Fitting in a single 8 char tab stop space is not important. It's a detriment.

I would use JUMPSKIP, CALLSKIP, etc.

+1

Perhaps it's more a restriction in pnut that's clouding Chip's view?

Anyway this topic can wait until the silicon is underway.

Twice in a week we agree. What happened?

Seairth · 2017-03-23 21:20

Cluso99 wrote: »

If hub jump/call was to be supported (I'm quite happy without it) then it would be better to be consistent and use the AUG + JMPSKP/CALLSKP sequence.

Except AUG is for extending immediate values, whereas SETQ is often used to modify the behavior of an instruction.

jmg · 2017-03-23 21:28

cgracey wrote: »

Whoa! This would blow up in cog-exec mode, because the REP logic counts instructions. If SKIP is causing fewer or variable numbers of instructions to execute, it's not going to work, unless you planned very carefully with perhaps constant instruction counts in every SKIP case, and set the REP block size accordingly.

All REP does is get you out of needing a 4-clock branch instruction. It's only useful/beneficial for blocks of only several instructions, I think. There's probably no need for SKIP inside REP. But, you may find some value in it. Seems like a recipe for madness, to me.

That means code that works in HUBexec, will break if moved into COG ? (and vice versa)
That rather breaks one of the fundamental, hard won P2 gains, where binary code did not care where it ran, it just ran a bit slower in HUB. Nothing broke.

jmg · 2017-03-23 21:29

David Betz wrote: »

Ah, more instruction encoding changes. So much for a frozen design. :-)
It's a good thing that I wrote a program to parse your instruction spreadsheet to generate the tables to drive the assembler.

Hehe, now that is smart thinking !

jmg · 2017-03-23 21:32

Roy Eltham wrote: »

Some day I'll convince you that the instructions can be longer than 7 characters. Fitting in a single 8 char tab stop space is not important. It's a detriment.

I would use JUMPSKIP, CALLSKIP, etc.

Yup, the days of needing to be 8 modulus paranoid, are well behind us.
Better to design for the human eyeball, not some archaic TAB setting.

Easy to read, & maintain, always trumps terse.

jmg · 2017-03-23 21:37

MJB wrote: »

So I take the liberty to present you the latest Tachyon-P2 kernel snippet I found on Peter's DROPBOX.

It is a wordcode interpreter, uses the first 256 addresses to directly address COG memory.
Actually this code seems a little 'old' - 17 days ;-).

DAT
'*************** TACHYON COG KERNEL for Propeller 2 *******************
		org 0
RESET
		call	#INITCOG		' run non-time critical init from hubexec
		jmp	#doNEXT
'
' main Forth wordcode interpreter

exec
		cmp 	X,wordcodes wc
	if_nc	jmp 	#ENTER
	 	call	X                     	' could call cog or hub code - use ret to return
doNEXT          rdword  X,PTRA++        ' read word code instruction address,
                                        ' PTRA used as instruction pointer
	       	cmp	ops,X wc
	if_nc	jmp	#exec
		call	#dolit8            ' handle 8 bit literal inside wordcode, saves a rdbyte
		jmp	#doNEXT

ops 		long 	$FF00-1
wordcodes	long 	endcode

' Save IP and load with new IP from call
ENTER 		test 	X,#1 wz
		andn 	X,#1
	if_z	wrlut	PTRA,retptr		' save IP onto return stack
	if_z  	add	retptr,#1
 		mov 	PTRA,X          	' jump to new wordcode
		jmp 	#doNEXT

Thanks, so how much can that improve, with Chip's new SKIP variants ?

msrobots · 2017-03-23 22:36

Not much, unless Peter also reduces multiple subroutines (words) into fewer ones, interleaving code and using skip.

Mike

MJB · 2017-03-24 00:22

jmg wrote: »

MJB wrote: »

So I take the liberty to present you the latest Tachyon-P2 kernel snippet I found on Peter's DROPBOX.

It is a wordcode interpreter, uses the first 256 addresses to directly address COG memory.
Actually this code seems a little 'old' - 17 days ;-).

DAT
'*************** TACHYON COG KERNEL for Propeller 2 *******************
		org 0
RESET
		call	#INITCOG		' run non-time critical init from hubexec
		jmp	#doNEXT
'
' main Forth wordcode interpreter

exec
		cmp 	X,wordcodes wc
	if_nc	jmp 	#ENTER
	 	call	X                     	' could call cog or hub code - use ret to return
doNEXT          rdword  X,PTRA++        ' read word code instruction address,
                                        ' PTRA used as instruction pointer
	       	cmp	ops,X wc
	if_nc	jmp	#exec
		call	#dolit8            ' handle 8 bit literal inside wordcode, saves a rdbyte
		jmp	#doNEXT

ops 		long 	$FF00-1
wordcodes	long 	endcode

' Save IP and load with new IP from call
ENTER 		test 	X,#1 wz
		andn 	X,#1
	if_z	wrlut	PTRA,retptr		' save IP onto return stack
	if_z  	add	retptr,#1
 		mov 	PTRA,X          	' jump to new wordcode
		jmp 	#doNEXT

Thanks, so how much can that improve, with Chip's new SKIP variants ?

ok then - let's see how the amateur (me) and the P2EG (Prop2 experts group) can improve Peter's Tachyon Forth word code engine for Prop2 before Peter wakes up and joins in ;-)

As Mike writes, this requires combining the COG code of some words in order to use SKIP.

Let's take this snippet:

' C@  ( caddr -- byte ) Fetch a byte from hub memory
CFETCH                  rdbyte  tos,tos
                        jmp     unext

' W@  ( waddr -- word ) Fetch a word from hub memory
WFETCH                  rdword  tos,tos
                        jmp     unext

' @  ( addr -- long ) Fetch a long from hub memory
FETCH                   rdlong  tos,tos
                        jmp     unext

could be rewritten to:

' skip pattern per word, one bit per instruction (1 = skip, 0 = execute):
'                                                   | C@ | W@ | @  | JMP
' C@  ( caddr -- byte ) Fetch a byte from hub memory
CFETCH                  rdbyte  tos,tos             '   0    1    1    0

' W@  ( waddr -- word ) Fetch a word from hub memory
WFETCH                  rdword  tos,tos             '   1    0    1    0

' @  ( addr -- long ) Fetch a long from hub memory
FETCH                   rdlong  tos,tos             '   1    1    0    0
                        jmp     unext

But we use the lower 9 bits of the wordcode directly to address COG, and there is currently no redirection via LUT decoding. Bit 15 flags 15-bit immediate literals; other bits encode relative jumps and register access.
So no space/bits free in the wordcode to store a SKIP map.
Also most kernel words are only 2 or 3 PASM instructions anyhow, so not really much to save.
I just spent quite some time reading V4.2 kernel code again ... I do NOT see right now where SKIP would help a lot with the current architecture. :-(
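To make the merged-routine idea concrete, here is a toy Python model of the three fetch words sharing one instruction sequence. It is not P2 hardware behaviour: the names `MERGED`, `SKIP_PATTERNS` and `executed` are invented for illustration, and the patterns are listed in instruction order (1 = skip, 0 = execute) rather than in whatever bit order the real SKIP mask uses.

```python
# The four instructions of the merged routine, in program order
MERGED = ["rdbyte tos,tos", "rdword tos,tos", "rdlong tos,tos", "jmp unext"]

# One skip flag per instruction for each Forth word (1 = skip, 0 = execute)
SKIP_PATTERNS = {
    "C@": [0, 1, 1, 0],
    "W@": [1, 0, 1, 0],
    "@":  [1, 1, 0, 0],
}


def executed(word: str) -> list[str]:
    """Return the instructions that actually run for the given word."""
    pattern = SKIP_PATTERNS[word]
    return [ins for ins, skip in zip(MERGED, pattern) if not skip]


for word in SKIP_PATTERNS:
    print(word, "->", executed(word))
```

Each word still runs exactly two instructions, so the merge saves two longs of cog memory (the duplicated `jmp unext`) but no execution time, and each word now needs a skip pattern stored somewhere.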

Cluso99 · 2017-03-24 02:18

Rayman wrote: »

how about HOPSKP...

or HOPSKPNJMP
LOL

Cluso99 · 2017-03-24 02:22

cgracey wrote: »

Seairth wrote: »

@cgracey, what happens if there's an active skip mask inside of a rep block? Will the rep still work correctly?

When REP is running in hub-exec, it actually does a JMP to get back to the top of the block, which entails resetting and reloading the FIFO. In cog-exec, the PC just gets changed in order to loop. So, I think SKIP would work in either case, but it may not work the same, as there'd be an extra SKIP bit consumed on each block iteration in hub-exec mode. In any case, SKIP would behave the same in both modes if it was just used within the code block, and not when it looped.

Whoa! This would blow up in cog-exec mode, because the REP logic counts instructions. If SKIP is causing fewer or variable numbers of instructions to execute, it's not going to work, unless you planned very carefully with perhaps constant instruction counts in every SKIP case, and set the REP block size accordingly.

All REP does is get you out of needing a 4-clock branch instruction. It's only useful/beneficial for blocks of only several instructions, I think. There's probably no need for SKIP inside REP. But, you may find some value in it. Seems like a recipe for madness, to me.

Obviously skip works out a count to know how many instructions to skip over. Could that value be used to decrement the REP block size?
If yes, is it easy or complex to do? If it's complex, then it's a caveat for skip instructions.

cgracey · 2017-03-24 02:46

Cluso99 wrote: »

Obviously skip works out a count to know how many instructions to skip over. Could that value be used to decrement the REP block size?
If yes, is it easy or complex to do? If it's complex, then it's a caveat for skip instructions.

It's way too complicated. It's like mixing polar and Cartesian domains. I imagine brushing my teeth while riding a dirt bike.

jmg · 2017-03-24 02:47

Seairth wrote: »

Well, whatever the name, not SHRED. That doesn't indicate that it's branching or skipping. It sounds more like a variant of the SEUSS instructions.

Rayman wrote: »

how about HOPSKP...

HOPSKP or HOPSKIP I quite like, as it is at least close to what actually happens.
SHRED does not sound like a serious suggestion, more whimsical.

cgracey · 2017-03-24 02:49

jmg wrote: »

Seairth wrote: »

Well, whatever the name, not SHRED. That doesn't indicate that it's branching or skipping. It sounds more like a variant of the SEUSS instructions.

Rayman wrote: »

how about HOPSKP...

HOPSKP or HOPSKIP I quite like, as it is at least close to what actually happens.
SHRED does not sound like a serious suggestion, more whimsical.

If we weren't hidebound to keeping it under 7 characters, we could just call it CHA_CHA_CHA. But I don't know what we'd call the one that returns.

Seairth · 2017-03-24 02:56

cgracey wrote: »

jmg wrote: »

Seairth wrote: »

Well, whatever the name, not SHRED. That doesn't indicate that it's branching or skipping. It sounds more like a variant of the SEUSS instructions.

Rayman wrote: »

how about HOPSKP...

HOPSKP or HOPSKIP I quite like, as it is at least close to what actually happens.
SHRED does not sound like a serious suggestion, more whimsical.

If we weren't hidebound to keeping it under 7 characters, we could just call it CHA_CHA_CHA. But I don't know what we'd call the one that returns.

WOCKA_WOCKA_WOCKA

Roy Eltham · 2017-03-24 03:00

Chip,
Why are we "hidebound to keeping it under 7 characters" ???? I think only you are thinking that is a requirement, and it's just NOT.

cgracey · 2017-03-24 03:09

Roy Eltham wrote: »

Chip,
Why are we "hidebound to keeping it under 7 characters" ???? I think only you are thinking that is a requirement, and it's just NOT.

I know. I'm just used to an 8-spaces-per-tab world.

Upon someone pointing out that we could use longer names, I reviewed all of our mnemonics and I think they are sufficiently expressive at under eight characters, each.

The only problem is this new instruction that CALLs and SKIPs, but with a single operand.
