Fast Bytecode Interpreter

cgracey · 2017-03-14 01:07

I've made some progress on the interpreter and it's going to run quite fast.

Here is the main loop:

'
' Interpreter loop
'
inst		rfbyte	p		'get bytecode
		rdlut	x,p		'lookup long from lut
		alti	x		'execute long as next instruction
		nop			'(instruction placeholder)
		jmp	#inst		'loop

In case the instruction from the LUT is 2 clocks (not a CALL), the entire bytecode loop is 13 clocks.

Here is what the LUT table looks like:

'
' Bytecode-to-instruction lookup in LUT
'
		org	$200

		push	neg1		'constant -1
		pusha	#0		'constant 0
		pusha	#1		'constant 1
		pusha	#2		'constant 2
		pusha	#3		'constant 3
		pusha	#4		'constant 4
		pusha	#7		'constant 7
		pusha	#8		'constant 8
		pusha	#15		'constant 15
		pusha	#16		'constant 16
		pusha	#31		'constant 31
		pusha	#32		'constant 32

		call	rd1xx		'read register $1xx
		call	wr1xx		'write register $1xx

ops		call	#op1		'unary operator		NOT	boolean not
		call	#op1		'unary operator		!	bitwise not
		call	#op1		'unary operator		-	negate
		call	#op1		'unary operator		||	absolute
		call	#op1		'unary operator		>|	encode
		call	#op1		'unary operator		|<	decode
		call	#op1		'unary operator		~	sign extend byte
		call	#op1		'unary operator		~~	sign extened word

		call	#op2		'binary operator	AND	boolean and
		call	#op2		'binary operator	OR	boolean or
		call	#op2		'binary operator	&	bitwise and
		call	#op2		'binary operator	|	bitwise or
		call	#op2		'binary operator	^	bitwise xor
		call	#op2		'binary operator	>>	shift right
		call	#op2		'binary operator	<<	shift left
		call	#op2		'binary operator	~>	arithmetic shift right
		call	#op2		'binary operator	<~	arithmetic shift left
		call	#op2		'binary operator	->	rotate right
		call	#op2		'binary operator	<-	rotate left
		call	#op2		'binary operator	><	reverse LSBs
		call	#op2		'binary operator	#>	limit minimum (signed)
		call	#op2		'binary operator	<#	limit maximum (signed)
		call	#op2		'binary operator	+	add
		call	#op2		'binary operator	-	subtract
		call	#op2		'binary operator	*	multiply (signed)
		call	#op2		'binary operator	/	divide (signed)
		call	#op2		'binary operator	//	remainder (signed)
		call	#op2		'binary operator	**	scale (unsigned)
		call	#op2		'binary operator	*/	fraction (unsigned)

		call	#op1		'SQRT(x)			math terms, 1 result
		call	#op1		'LOG(x)
		call	#op1		'EXP(x)
		call	#op2		'NEGBOOL(x,bool)
		call	#op3		'MUXBOOL(x,mux,bool)

		call	#op2		'POLCAR(r,t : x,y)		math terms, 2 results
		call	#op2		'CARPOL(x,y : r,t)
		call	#op3		'ROTCAR(x,y,t : x,y)

Each of those longs in the table executes as an instruction, in place of the NOP in the interpreter loop. For most bytecodes, more needs to be accomplished than what can happen in one instruction, so a CALL is placed in the LUT table. The return address is the 'jmp #inst' in the main loop.

Here's what the math operations look like:

'
' Math operators
'
op3		popa	z		'pop z
op2		popa	y		'pop y
op1		popa	x	wc, wz	'pop x, set flags

		altd	p,#.op - (ops & $FF)	'point next d to op instruction
		alti	0			'execute d as next instruction
		nop				'(instruction executes here)

	_ret_	pusha	x		'push x, return to bytecode loop


.op		call	#.not		'NOT		unaries
		not	x		'!
		neg	x		'-
		abs	x		'||
		topone	x		'>|
		decod	x		'|<
		call	#.sxb		'~~
		call	#.sxw		'~

		call	#.and		'AND		binaries
		call	#.or		'OR
		and	x,y		'&
		or	x,y		'|
		xor	x,y		'^
		shr	x,y		'>>
		shl	x,y		'<<
		sar	x,y		'~>
		sal	x,y		'<~
		ror	x,y		'->
		rol	x,y		'<-
		call	#.rev		'><
		mins	x,y		'#>
		maxs	x,y		'<#
		add	x,y		'+
		sub	x,y		'-
		call	#.mul		'*
		call	#.div		'/
		call	#.rem		'//
		call	#.scl		'**
		call	#.fra		'*/

		call	#.sqrt		'SQRT()			1 result
		call	#.log		'LOG()
		call	#.exp		'EXP()
		call	#.negbool	'NEGBOOL(,)
		call	#.muxbool	'MUXBOOL(,,)

		call	#.polcar	'POLCAR(r,t : x,y)	2 results
		call	#.carpol	'CARPOL(x,y : r,t)
		call	#.rotcar	'ROTCAR(x,y,t : x,y)


.not	if_nz	not	x,#0
	_ret_	not	x,#0

.sxb		shl	x,#24
	_ret_	sar	x,#24

.sxw		shl	x,#16
	_ret_	sar	x,#16

.and	if_nz	cmp	y,#0	wz
	if_nz	not	x,#0
		ret

.or	if_z	cmp	y,#0	wz
	if_nz	not	x,#0
		ret

.rev		rev	x
		rol	x,y
		rol	x,#1
	_ret_	triml	x,y

.mul		testb	y,#31	wz
		abs	x
		abs	y
		qmul	x,y
.mul2		getqx	x
    if_c_eq_z	neg	x
		ret

.div		testb	y,#31	wz
		abs	x
		abs	y
		qdiv	x,y
		jmp	#.mul2

.rem		abs	x
		abs	y
		qdiv	x,y
		getqy	x
    if_c	neg	x
		ret

.scl		qmul	x,y
	_ret_	getqy	x

.fra		qfrac	x,y
	_ret_	getqx	x

.sqrt		qsqrt	x
	_ret_	getqx	x

.log		qlog	x
	_ret_	getqx	x

.exp		qexp	x
	_ret_	getqx	x

.negbool	cmp	y,#0	wz
	_ret_	negnz	x

.muxbool	cmp	z,#0	wz
	_ret_	muxnz	x,y

.polcar		qrotate	x,y
		jmp	#.results

.carpol		qvector	x,y
		jmp	#.results

.rotcar		setq	y
		qrotate	x,z

.results	getqy	x
    		pusha	x
	_ret_	getqx	x

Most operations wind up being just one in-lined instruction. The more complex stuff CALLs to routines.

This '_RET_' is an implied return that uses code %0000 of the conditional field. This means NOP cannot be $00000000, anymore, as it would become a RET instruction. Having implied RET's saves lots of instructions, especially in a situation like this where you have lots of snippets of code that get CALL'd. Each _RET_ saves two clocks and a long.

If the _RET_ is used on a branch instruction, the branch instruction takes precedence, unless the branch doesn't occur. Then, the _RET_ happens. Here are two routines from one of the VGA demos:

'
' Subroutines
'
blank		call	#hsync			'blank lines
		xcont	m_vi,#0
	_ret_	djnz	x,#blank

hsync		xcont	m_bs,#0			'horizontal sync
		xzero	m_sn,#1
	_ret_	xcont	m_bv,#0

As long as the DJNZ is looping, the _RET_ is ignored. When the DJNZ would otherwise fall through, the _RET_ is executed.

Sorry for having made another change. I just realized this was going to make big improvements in code density, so I did it.

What makes things go really fast is the ALTI instruction. Being able to in-line a random instruction has lots of speed and density benefits.

Roy Eltham · 2017-03-14 01:20

This is looking really slick. The _ret_ thing seems like a reasonable change, no need to apologize.

Seems like Spin2 is going to be a lot more capable than Spin1 was. I often resort to PASM to do most things on P1, might not need as much PASM on P2.

cgracey · 2017-03-14 01:29

The %0000 code was not very useful. Any instruction that had that would never execute. There might have been some utility in it, but it's way more useful being used as an 'always' condition, like %1111, but with some special behavior.

ozpropdev · 2017-03-14 01:33

Looks good Chip.
I like the "_ret_ " addition too.

Peter Jakacki · 2017-03-14 01:35

Vectored execution with ALTI to handle an instruction or a call, SWEET!

David Betz · 2017-03-14 01:41

cgracey wrote: »

The %0000 code was not very useful. Any instruction that had that would never execute. There might have been some utility in it, but it's way more useful being used as an 'always' condition, like %1111, but with some special behavior.

True but what is the instruction encoding for NOP now?

cgracey · 2017-03-14 01:49

David Betz wrote: »

cgracey wrote: »

The %0000 code was not very useful. Any instruction that had that would never execute. There might have been some utility in it, but it's way more useful being used as an 'always' condition, like %1111, but with some special behavior.

True but what is the instruction encoding for NOP now?

I was thinking 'WAITX #0'.

jmg · 2017-03-14 01:53

cgracey wrote: »

David Betz wrote: »

cgracey wrote: »

The %0000 code was not very useful. Any instruction that had that would never execute. There might have been some utility in it, but it's way more useful being used as an 'always' condition, like %1111, but with some special behavior.

True but what is the instruction encoding for NOP now?

I was thinking 'WAITX #0'.

What about making blank Flash a NOP, which means 0xffffffff ?

cgracey · 2017-03-14 01:56

jmg wrote: »

cgracey wrote: »

David Betz wrote: »

cgracey wrote: »

The %0000 code was not very useful. Any instruction that had that would never execute. There might have been some utility in it, but it's way more useful being used as an 'always' condition, like %1111, but with some special behavior.

True but what is the instruction encoding for NOP now?

I was thinking 'WAITX #0'.

What about making blank Flash a NOP, which means 0xffffffff ?

That would be 'AUGD #all_ones'. This NOP thing is the biggest problem now.

David Betz · 2017-03-14 02:00

cgracey wrote: »

David Betz wrote: »

cgracey wrote: »

The %0000 code was not very useful. Any instruction that had that would never execute. There might have been some utility in it, but it's way more useful being used as an 'always' condition, like %1111, but with some special behavior.

True but what is the instruction encoding for NOP now?

I was thinking 'WAITX #0'.

Makes sense.

jmg · 2017-03-14 02:03

cgracey wrote: »

jmg wrote: »

What about making blank Flash a NOP, which means 0xffffffff ?

That would be 'AUGD #all_ones'. This NOP thing is the biggest problem now.

There are likely plenty of opcodes that can alias to a NOP, but it is nice to have blank memory behave in a predictable manner too.
What happens now, if an errant branch lands on a whole chain of 'AUGD #all_ones' ?

cgracey · 2017-03-14 02:10

jmg wrote: »

cgracey wrote: »

jmg wrote: »

What about making blank Flash a NOP, which means 0xffffffff ?

That would be 'AUGD #all_ones'. This NOP thing is the biggest problem now.

There are likely plenty of opcodes that can alias to a NOP, but it is nice to have blank memory behave in a predictable manner too.
What happens now, if an errant branch lands on a whole chain of 'AUGD #all_ones' ?

They pretty much are NOPs. The next instruction afterward, though, with a literal D would get the top 27 bits as highs. May or may not be a problem.

cgracey · 2017-03-14 02:12

Here are all the constant routines. These are going to be very fast:

'
' Constants
'
con8		rfbyte	x
	_ret_	pusha	x

con8n		rfbyte	x
		or	x,_FFFFFF00
	_ret_	pusha	x

con16		rfword	x
	_ret_	pusha	x

con16n		rfword	x
		or	x,_FFFF0000
	_ret_	pusha	x

con32		rflong	x
	_ret_	pusha	x

conexp		rfbyte	y
		decod	x,y
		testb	y,#5	wc
	if_c	sub	x,#1
		testb	y,#6	wc
	if_c	not	x
	_ret_	pusha	x

_FFFFFFFF	long	$FFFFFFFF
_FFFFFF00	long	$FFFFFF00
_FFFF0000	long	$FFFF0000

Here's the constant portion of the LUT table:

		push	_FFFFFFFF	'constant -1
		pusha	#0		'constant 0
		pusha	#1		'constant 1
		pusha	#2		'constant 2
		pusha	#3		'constant 3
		pusha	#4		'constant 4
		pusha	#7		'constant 7
		pusha	#8		'constant 8
		pusha	#15		'constant 15
		pusha	#16		'constant 16
		pusha	#31		'constant 31
		pusha	#32		'constant 32
		call	#con8		'constant $000000xx
		call	#con8n		'constant $FFFFFFxx
		call	#con16		'constant $0000xxxx
		call	#con16n		'constant $FFFFxxxx
		call	#con32		'constant $xxxxxxxx
		call	#conexp		'constant decod/dec/not

The most common constants are single instructions.

cgracey · 2017-03-14 02:46

25% faster:

'
' Interpreter loop
'
inst		rfbyte	p		'get bytecode
		rdlut	x,p		'lookup long from lut
		alti	x		'execute long as next instruction
		nop			'(instruction placeholder)
		rfbyte	p		'get bytecode
		rdlut	x,p		'lookup long from lut
		alti	x		'execute long as next instruction
		nop			'(instruction placeholder)
		rfbyte	p		'get bytecode
		rdlut	x,p		'lookup long from lut
		alti	x		'execute long as next instruction
		nop			'(instruction placeholder)
		rfbyte	p		'get bytecode
		rdlut	x,p		'lookup long from lut
		alti	x		'execute long as next instruction
		nop			'(instruction placeholder)
		jmp	#inst		'loop

potatohead · 2017-03-14 14:25

Sorry for having made another change.

Don't be. Worth it. This exercise is important.

Seairth · 2017-03-14 14:58

I don't suppose there's an easy way to move the explicit RET instruction to $x0000000? That way, $00000000 could still be NOP, as "_ret_ RET" would be somewhat nonsensical otherwise.

ersmith · 2017-03-14 15:07

Another option would be to swap the _ret_ prefix and the ALWAYS prefix, so that 0 is ALWAYS, and then re-arrange the opcodes so that instruction field 0 is something like AND or OR; that way 00000000 would be AND 0,0, which would work fine as a nop.

Or, don't bother re-arranging anything and just define NOP to be an alias for OR 0,0.

Rayman · 2017-03-14 15:15

There's just something wrong with 0 not being NOP...

Hope you're not thinking about changing that...

David Betz · 2017-03-14 15:20

Rayman wrote: »

There's just something wrong with 0 not being NOP...

Hope you're not thinking about changing that...

Wouldn't it actually be better if zero was something like "software reset"? Do you really want code that wanders off into uninitialized memory to just keep running? Wouldn't it be better for an embedded system if it reset to a known state?

Seairth · 2017-03-14 15:32

David Betz wrote: »

Rayman wrote: »

There's just something wrong with 0 not being NOP...

Hope you're not thinking about changing that...

Wouldn't it actually be better if zero was something like "software reset"? Do you really want code that wanders off into uninitialized memory to just keep running? Wouldn't it be better for an embedded system if it reset to a known state?

That's an excellent point. Possibly better would be to trigger a debug interrupt. Instead of a debugger, startup code could set the debug interrupt to call cogstop or whatever other behavior is desirable (cogatn a watchdog cog, for instance).

At which point, yeah, just make NOP an alias for an existing instruction.

ke4pjw · 2017-03-14 15:40

Isn't 0x00 an ADD on x86 and a NEG on the 6809? Seem like NOP is all over the place on different architectures. It's invisible to people like me, because I have no idea what object code is being generated by the assembler. That's my $.002 (two tens of 1 cent)

PS: Way to go Chip!!! W000000T!

Heater. · 2017-03-14 15:52

It's not clear to me that having zero defined as software reset or debugger interrupt, etc helps much.

If your program is buggy and runs off into the weeds, those weeds are quite likely to be some other part of your code or your data somewhere, which it will then try to execute.

David Betz · 2017-03-14 16:18

Heater. wrote: »

It's not clear to me that having zero defined as software reset or debugger interrupt, etc helps much.

If your program is buggy and runs off into the weeds, those weeds are quite likely to be some other part of your code or your data somewhere, which it will then try to execute.

Yeah, it would only really help if you wandered off into memory that the boot up code had zeroed. On the other hand, I suspect zero is probably a pretty likely value in the weeds so you might not stop immediately but probably wouldn't get too far.

Rayman · 2017-03-14 16:32

Well, changing NOP will break the code I'm working on and also garryj's USB code.
Could be fixed, but inconvenient...

One thing garryj did is load the top of his cog code with variables initialized to 0. As such, they are just NOPs and skipped over. Can't do this if NOP changes:

dat       'UsbHostEntry 
                org
UsbHostEntry 'RJA                  
' /* Host common registers
hr0             long    0                               ' Multi-purpose registers
hr1             long    0
hpar1           long    0                               ' Routine entry/exit parameters
hpar2           long    0
...
etc...
...
pkt_tmp         long    0                               ' Tmp storage for routines that deal with datax packets
' */
' /* usb_host_start
'------------------------------------------------------------------------------
' The USB host cog.
'------------------------------------------------------------------------------
usb_host_start
                call    #load_lut

cgracey · 2017-03-14 16:56

Rayman wrote: »

Well, changing NOP will break the code I'm working on and also garryj's USB code.
Could be fixed, but inconvenient...

One thing garryj did is load the top of his cog code with variables initialized to 0. As such, they are just NOPs and skipped over. Can't do this if NOP changes:

dat       'UsbHostEntry 
                org
UsbHostEntry 'RJA                  
' /* Host common registers
hr0             long    0                               ' Multi-purpose registers
hr1             long    0
hpar1           long    0                               ' Routine entry/exit parameters
hpar2           long    0
...
etc...
...
pkt_tmp         long    0                               ' Tmp storage for routines that deal with datax packets
' */
' /* usb_host_start
'------------------------------------------------------------------------------
' The USB host cog.
'------------------------------------------------------------------------------
usb_host_start
                call    #load_lut

I did that once a couple of times, myself.

We just can't do that anymore.

potatohead · 2017-03-14 16:58

And not doing it is worth the improvement.

Seairth · 2017-03-14 17:25

That could have been done as

dat       'UsbHostEntry 
                org
UsbHostEntry 'RJA                  
' /* Host common registers
hr0             nop                                     ' Multi-purpose registers
hr1             nop      
hpar1           nop                                     ' Routine entry/exit parameters
hpar2           nop      
...
etc...
...
pkt_tmp         nop                                     ' Tmp storage for routines that deal with datax packets
' */
' /* usb_host_start
'------------------------------------------------------------------------------
' The USB host cog.
'------------------------------------------------------------------------------
usb_host_start
                call    #load_lut

It would have generated the same exact code, and wouldn't have broken when Chip made this change.

David Betz · 2017-03-14 17:47

I guess it was premature to declare the instruction set frozen. Maybe it's best to assume that won't happen until after the Spin2 interpreter is complete.

garryj · 2017-03-14 19:08

cgracey wrote: »

Rayman wrote: »

Well, changing NOP will break the code I'm working on and also garryj's USB code.
Could be fixed, but inconvenient...

One thing garryj did is load the top of his cog code with variables initialized to 0. As such, they are just NOPs and skipped over. Can't do this if NOP changes:

I did that once a couple of times, myself.

We just can't do that anymore.

My intent was to have a block of cog registers that were in a known fixed location to allow multiple USB host cogs to use the same register names. If this itself is a no-no (I never tested it), things can easily be changed. Or, wouldn't this fix the issue too? :

dat
                org
usb_host_entry  jmp    #usb_host_start                  ' Skip the static reg block

' /* Host common registers
hr0             long    0                               ' Multi-purpose registers
hr1             long    0
hpar1           long    0                               ' Routine entry/exit parameters
hpar2           long    0
...
etc...
...
pkt_tmp         long    0                               ' Tmp storage for routines that deal with datax packets
' */
' /* usb_host_start
'------------------------------------------------------------------------------
' The USB host cog.
'------------------------------------------------------------------------------
usb_host_start
                call    #load_lut

In any case, the changes to make it a non-issue are minimal.

Ariba · 2017-03-14 19:12

How about this idea?

_ret_ only works if the internal call-stackpointer is not at bottom, otherwise it's ignored and therfore a NOP.
So as long as you not have done a CALL in your code, the %0000 condition works as a NOP and only inside subroutines it is a _ret_.

Andy

Rayman · 2017-03-14 19:33

Does ROR has same opcode as NOP?

Fast Bytecode Interpreter

Comments