Fast Bytecode Interpreter

cgracey · 2017-03-14 19:39

Ariba wrote: »

How about this idea?

_ret_ only works if the internal call-stackpointer is not at bottom, otherwise it's ignored and therefore a NOP.
So as long as you have not done a CALL in your code, the %0000 condition works as a NOP, and only inside subroutines is it a _ret_.

Andy

I could add a 3-bit counter for this purpose, but for the benefit, it might only cause more confusion. There would need to be a way to read it, in case something like an 'abort' was done and normal stack usage resumed, but at an offset. This stack is pure flops and muxes. Not much to it:

reg [7:0][21:0] stk;

wire stk_push = push || callpa || callpb || callr || calli;

`regscan (stk, 176'b0,								// init
ptr_w || go && (stk_push || pop || ret || ret_auto),				// ena
    ptr_w || stk_push	? {stk[6:0], push && !ptr_w ? d[21:0] : {c, z, p}}	// push/call, ptr_w traps execution address
			: {stk[7], stk[7:1]})					// pop/ret/ret_auto

I think the interpreter can be made even faster by looking up a CALL address, JMPing to it, and then an implied RETurn brings execution back to the top of the loop. This will require pre-stuffing the stack with the top-of-loop address, so that infinite RETurns keep it returning to the top of the loop. When the stack pops, the top value gets copied down. Once pre-stuffed, it will forever deliver the same return address. Any intervening CALLs only temporarily affect the bottom levels.
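The pre-stuffing trick can be checked with a small software model. This is only a sketch in Python, not the hardware: the class name, the LOOP address and the list layout are made up for illustration, and the flag bits stored alongside each return address are ignored. Index 0 stands for the entry a RET reads, index 7 for the deepest entry that gets duplicated on every pop.

```python
# Hypothetical software model of the 8-level hardware stack described above.
# Index 0 is the entry a RET would read; index 7 is the deepest entry.
DEPTH = 8


class CallStack:
    def __init__(self):
        self.stk = [0] * DEPTH           # the hardware initialises to zero

    def push(self, value):
        # The new value enters at index 0; the deepest entry falls off the end.
        self.stk = [value] + self.stk[:DEPTH - 1]

    def pop(self):
        # Entries shift down one place and the deepest entry is duplicated.
        value = self.stk[0]
        self.stk = self.stk[1:] + [self.stk[DEPTH - 1]]
        return value


LOOP = 0x010                             # assumed address of the loop label

stack = CallStack()
for _ in range(DEPTH):                   # pre-stuff every level
    stack.push(LOOP)

returns = [stack.pop() for _ in range(20)]   # far more RETs than levels
stack.push(0x123)                        # an intervening CALL
after_call = [stack.pop(), stack.pop()]  # its RET, then one more RET
```

In this model every one of the 20 pops yields LOOP, and after the extra CALL has returned the stack is back to eight copies of LOOP.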

Rayman · 2017-03-14 19:49

One nice thing about having NOP as 0 is that you can turn any register into a NOP by just writing a 0 to it. I'm using this now...

If it were something else, it would likely be >512 and then need an extra register to hold it, right?

jmg · 2017-03-14 20:02

Rayman wrote: »

There's just something wrong with 0 not being NOP...
Hope you're not thinking about changing that...

NOP can be anything, but there is benefit in having Zeroed data (0x00) and blank Flash (0xff) as at least predictable operations.
In many CPUs the 0xff fetches a simple (almost NOP) opcode, until the PC wraps to the reset vector - that is tolerable, provided that "PC wraps to the reset vector" is predictable.

What does a P2 do, when it runs over the top ?
( I guess in P2, there are 3 tops... TopOfCog, TopOfLUT, TopOfHub?)

David Betz wrote: »

Wouldn't it actually be better if zero was something like "software reset"? Do you really want code that wanders off into uninitialized memory to just keep running? Wouldn't it be better for an embedded system if it reset to a known state?

See below.

Seairth wrote: »

That's an excellent point. Possibly better would be to trigger a debug interrupt. Instead of a debugger, startup code could set the debug interrupt to call cogstop or whatever other behavior is desirable (cogatn a watchdog cog, for instance).

At which point, yeah, just make NOP an alias for an existing instruction.

There is a software/system standard, called IEC60730, that should be considered in P2 design steps.
It covers many of the above points.

Here is an example of the IEC60730 coverage a recent MCU release claims
http://www.lapis-semi.com/en/company/news/news2017/r201702_1.html

note items like "detecting access to unused ROM areas", and "RAM parity error" (unclear if that one is SW or hardware?) and "RAM guard" and "SFR guard"
Other common items are things like being able to confirm a Clock is present before you switch to it, and many parts have a Missing Clock Detector Reset watchdog, essentially a simple monostable that fires if no edges are seen after some tens of microseconds (usually enough µs to allow 32kHz and LF Osc clocking).

With already having two Clocks, a P2 could add this fairly easily ?

cgracey wrote: »

.... There would need to be a way to read it, in case something like an 'abort' was done and normal stack usage resumed, but at an offset...

See above. A software reset, or watchdog, could occur at any time; how does the present stack clear manage that?

Rayman · 2017-03-14 20:23

There's a wiki page on NOP

Looks to me that 0 is pretty popular.

This might be why:
"From a hardware design point of view, unmapped areas of a bus are often designed to return zeroes; since the NOP slide behavior is often desirable, it gives a bias to coding it with the all-zeroes opcode."

whatever that means...

KeithE · 2017-03-14 20:26

I can't remember how fuse programming works, but that's one thing to worry about. You don't want to accidentally trigger fuse programming due to running garbage code or due to a brownout. In a brownout some instructions may work and others may not work, so you wouldn't want to get to the fuse programming code due to a missed branch. You can search the literature for examples.

potatohead · 2017-03-14 20:33

Ariba wrote: »

How about this idea?

_ret_ only works if the internal call-stackpointer is not at bottom, otherwise it's ignored and therefore a NOP.
So as long as you have not done a CALL in your code, the %0000 condition works as a NOP, and only inside subroutines is it a _ret_.

Andy

I'm in favor of this.

Seairth · 2017-03-14 20:57

Rayman wrote: »

Does ROR have the same opcode as NOP?

It had the same opcode as "if_never ROR 0,0". The important part was the "if_never". It just happened to be that ROR was instruction %0000000.

Seairth · 2017-03-14 20:59

deleted.

Seairth · 2017-03-14 21:04

@jmg umm... you somehow reversed attribution of those quotes.

cgracey · 2017-03-14 21:08

Okay, I made $00000000 into NOP. I agree that NOP should be a simple value. Nothing is simpler than 0.

This means that if anyone ever wrote this:

      _RET_   ROR     0,0

...then nothing would happen.

It makes no sense to rotate a register by itself, anyway, though it would be some kind of state machine. That they would want to do that on register 0 and do a _RET_ is even more remote of a possibility. So, this should be a good NOP. Note that it needs to be $0000_0000, exactly. High bits in the LSBs will not be a NOP.

jmg · 2017-03-14 21:16

Seairth wrote: »

@jmg umm... you somehow reversed attribution of those quotes.

oops, fixed.

Rayman · 2017-03-14 21:17

Good news! I don't know why I care so much, but I really like having 0 be NOP.

jmg · 2017-03-14 21:19

cgracey wrote: »

Okay, I made $00000000 into NOP. I agree that NOP should be a simple value. Nothing is simpler than 0.

0 is also a common NOP value, and is what RAM usually clears to on init. (relevant on parts that can run code from RAM)
That leaves blank code space, of 0xff - what exactly does P2 do now, on landing into 0xff, and then wrapping PC ?

ke4pjw · 2017-03-14 21:29

cgracey wrote: »
Okay, I made $00000000 into NOP. I agree that NOP should be a simple value. Nothing is simpler than 0.

This means that if anyone ever wrote this:
      _RET_   ROR     0,0
...then nothing would happen.

It makes no sense to rotate a register by itself, anyway, though it would be some kind of state machine. That they would want to do that on register 0 and do a _RET_ is even more remote of a possibility. So, this should be a good NOP. Note that it needs to be $0000_0000, exactly. High bits in the LSBs will not be a NOP.

Are we going to lose the code size gains or performance increase that you initially talked about?

cgracey · 2017-03-14 21:37

ke4pjw wrote: »
cgracey wrote: »
Okay, I made $00000000 into NOP. I agree that NOP should be a simple value. Nothing is simpler than 0.

This means that if anyone ever wrote this:
      _RET_   ROR     0,0
...then nothing would happen.

It makes no sense to rotate a register by itself, anyway, though it would be some kind of state machine. That they would want to do that on register 0 and do a _RET_ is even more remote of a possibility. So, this should be a good NOP. Note that it needs to be $0000_0000, exactly. High bits in the LSBs will not be a NOP.
Are we going to lose the code size gains or performance increase that you initially talked about?

Nope. 0 is still NOP, like it was before. That's all. Hopefully nobody ever has a bug because '_ret_ ror 0,0' isn't working. Chances are about zero.

cgracey · 2017-03-14 21:40

jmg wrote: »

cgracey wrote: »

Okay, I made $00000000 into NOP. I agree that NOP should be a simple value. Nothing is simpler than 0.

0 is also a common NOP value, and is what RAM usually clears to on init. (relevant on parts that can run code from RAM)
That leaves blank code space, of 0xff - what exactly does P2 do now, on landing into 0xff, and then wrapping PC ?

$FFFFFFFF is AUGD #$7FFFFF. It will sail through a bunch of those without doing anything.

cgracey · 2017-03-14 21:49

This interpreter takes at least 14 clocks per bytecode, but is faster than the one posted above in cases where more than one instruction must be executed, as no more branches are needed:

'
'
' Interpreter loop
'
		rep	#1,#8		'pre-stuff stack with loop address
		push	#loop		'all _ret_'s will jump to loop

loop		rfbyte	p		'get bytecode
		altgb	p,#ibase	'ready to lookup byte
		getbyte	x		'lookup byte
		jmp	x		'jump to snippet, _ret_ returns to loop

ibase		byte	con_n1,	con_0,	con_1,	con_2,	con_3,	con_4,	con_7,	con_8
		byte	con_15,	con_16,	con_31,	con_32,	con_bp,	con_bn,	con_wp,	con_wn
		byte	con_lg,	con_bx

con_n1	_ret_	pusha	_FFFFFFFF
con_0	_ret_	pusha	#0
con_1	_ret_	pusha	#1
con_2	_ret_	pusha	#2
con_3	_ret_	pusha	#3
con_4	_ret_	pusha	#4
con_7	_ret_	pusha	#7
con_8	_ret_	pusha	#8
con_15	_ret_	pusha	#15
con_16	_ret_	pusha	#16
con_31	_ret_	pusha	#31
con_32	_ret_	pusha	#32

con_bp		rfbyte	x
	_ret_	pusha	x

con_bn		rfbyte	x
		or	x,_FFFFFF00
	_ret_	pusha	x

con_wp		rfword	x
	_ret_	pusha	x

con_wn		rfword	x
		or	x,_FFFF0000
	_ret_	pusha	x

con_lg		rflong	x
	_ret_	pusha	x

con_bx		rfbyte	y
		decod	x,y
		testb	y,#5	wc
	if_c	sub	x,#1
		testb	y,#6	wc
	if_c	not	x
	_ret_	pusha	x

It also only takes one byte per instruction for the address lookup, though the address is limited to $FF.

The LUT could be used as either an address table or initial jump location. Then, you'd get all the address bits or faster single-instruction snippets.
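For anyone wanting to see what the con_bx snippet produces from its operand byte, here is a rough Python model. It is a sketch that assumes DECOD uses only the low five bits of its source and that registers are 32 bits wide; the function name simply mirrors the label.

```python
MASK32 = 0xFFFFFFFF


def con_bx(y):
    """Model of the con_bx snippet: build a constant from operand byte y."""
    x = 1 << (y & 31)            # decod x,y  (assumes low five bits select the bit)
    if y & (1 << 5):             # testb y,#5 wc / if_c sub x,#1
        x = (x - 1) & MASK32
    if y & (1 << 6):             # testb y,#6 wc / if_c not x
        x = ~x & MASK32
    return x


samples = {y: con_bx(y) for y in (0x04, 0x24, 0x44, 0x64)}
```

Under those assumptions a single operand byte gives a power of two, a power of two minus one, or the bitwise inverse of either, which covers most common masks.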

jmg · 2017-03-14 22:27

cgracey wrote: »

jmg wrote: »

..
That leaves blank code space, of 0xff - what exactly does P2 do now, on landing into 0xff, and then wrapping PC ?

$FFFFFFFF is AUGD #$7FFFFF. It will sail through a bunch of those without doing anything.

Ok so far, then what happens next ? does the PC wrap to the reset opcode, & how does it change with a primed AUGD pending ?

cgracey · 2017-03-14 22:37

jmg wrote: »

cgracey wrote: »

jmg wrote: »

..
That leaves blank code space, of 0xff - what exactly does P2 do now, on landing into 0xff, and then wrapping PC ?

$FFFFFFFF is AUGD #$7FFFFF. It will sail through a bunch of those without doing anything.

Ok so far, then what happens next ? does the PC wrap to the reset opcode, & how does it change with a primed AUGD pending ?

It depends what's in memory, I guess. A primed AUGD with all the bits set would append 23 one's above the next 9-bit #D.
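As a worked example of that arithmetic, here is a small Python sketch. The function name is made up, and it assumes the 23 augment bits sit directly above the 9-bit immediate to form a 32-bit value.

```python
def augmented_d(aug23, d9):
    """Combine a pending 23-bit AUGD value with the next 9-bit #D."""
    return ((aug23 & 0x7FFFFF) << 9) | (d9 & 0x1FF)


all_ones_low = augmented_d(0x7FFFFF, 0x000)    # 23 ones above a zero #D
all_ones_high = augmented_d(0x7FFFFF, 0x1FF)   # 23 ones above an all-ones #D
```

So with AUGD #$7FFFFF pending, the next immediate D ends up somewhere between $FFFFFE00 and $FFFFFFFF, depending on the instruction that follows.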

jmg · 2017-03-14 23:06

cgracey wrote: »

jmg wrote: »

cgracey wrote: »

jmg wrote: »

..
That leaves blank code space, of 0xff - what exactly does P2 do now, on landing into 0xff, and then wrapping PC ?

$FFFFFFFF is AUGD #$7FFFFF. It will sail through a bunch of those without doing anything.

Ok so far, then what happens next ? does the PC wrap to the reset opcode, & how does it change with a primed AUGD pending ?

It depends what's in memory, I guess. A primed AUGD with all the bits set would append 23 one's above the next 9-bit #D.

So the PC wraps from 512k to 00, or does the PC inc to 512k+1, and the memory wraps to read 00 ?
What about if the P2 memory becomes a value other than 512k, (since that will be based on final, routed die room), what will the not-physically-present memory read as ?
Can the opcode at 00 be anything, depending on the users code ?

cgracey · 2017-03-15 00:11

jmg wrote: »

cgracey wrote: »

jmg wrote: »

cgracey wrote: »

jmg wrote: »

..
That leaves blank code space, of 0xff - what exactly does P2 do now, on landing into 0xff, and then wrapping PC ?

$FFFFFFFF is AUGD #$7FFFFF. It will sail through a bunch of those without doing anything.

Ok so far, then what happens next ? does the PC wrap to the reset opcode, & how does it change with a primed AUGD pending ?

It depends what's in memory, I guess. A primed AUGD with all the bits set would append 23 one's above the next 9-bit #D.

So the PC wraps from 512k to 00, or does the PC inc to 512k+1, and the memory wraps to read 00 ?
What about if the P2 memory becomes a value other than 512k, (since that will be based on final, routed die room), what will the not-physically-present memory read as ?
Can the opcode at 00 be anything, depending on the users code ?

That's just too much to think about. The programmer should never allow things to get out of control, anyway, and if they do get out of control, the degree to which things crash is mostly uncontrollable.

David Betz · 2017-03-15 00:19

So what is the final outcome of the instruction set changes? Since $00000000 was one of the values for NOP before, is the only real change that the $0000 execution condition now means _RET_?

cgracey · 2017-03-15 00:31

David Betz wrote: »

So what is the final outcome of the instruction set changes? Since $00000000 was one of the values for NOP before, is the only real change that the $0000 execution condition now means _RET_?

That's right.

Instead of $0xxxxxxx all being NOP's, now $00000000 is a dedicated NOP.

Condition-field value %0000 now means _RET_, which only RETurns if a branch is not also occurring because of the actual instruction.
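The new rule can be summarised in a few lines of Python. This is a sketch only: it assumes the condition field is the top four bits of the 32-bit instruction, and the returned strings are just labels for illustration.

```python
def classify(instr):
    """Classify a 32-bit instruction word under the new encoding rule."""
    instr &= 0xFFFFFFFF
    if instr == 0x00000000:
        return "NOP"                 # only the all-zero long is a NOP
    cond = instr >> 28               # assumed: condition field is bits 31..28
    if cond == 0b0000:
        return "_RET_"               # execute, then return unless it branches
    return "conditional"             # ordinary if_xxx / always encodings


examples = [classify(v) for v in (0x00000000, 0x00000001, 0xFFFFFFFF)]
```

Before the change, every value of the form $0xxxxxxx would have landed in the first case.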

jmg · 2017-03-15 00:34

cgracey wrote: »

That's just too much to think about. The programmer should never allow things to get out of control, anyway, and if they do get out of control, the degree to which things crash is mostly uncontrollable.

Of course you cannot cover all possible crashes, but you can design in some protections.
Most serious MCU vendors already address IEC60730 coverage that I linked above (as just one example).
That is all about designing in simple protections, and they all do add up.

For many customers, if they do not see IEC60730 mentioned, they will simply choose another controller.
'too much to think about' can quickly become "I wish I had thought about that"

David Betz · 2017-03-15 00:56

cgracey wrote: »

David Betz wrote: »

So what is the final outcome of the instruction set changes? Since $00000000 was one of the values for NOP before, is the only real change that the $0000 execution condition now means _RET_?

That's right.

Instead of $0xxxxxxx all being NOP's, now $00000000 is a dedicated NOP.

Condition-field value %0000 now means _RET_, which only RETurns if a branch is not also occurring because of the actual instruction.

Thanks! I notice you've already updated the instruction spreadsheet.

Seairth · 2017-03-15 03:01

Out of curiosity, is the trailing underline necessary for "_ret_"? Could it be just "_ret"? Actually, I'd prefer a different syntax altogether, but I realize it needs to be mutually exclusive of the "if_" predicates. Maybe "iret" (for immediate return or implied return), though that might be confused with return-from-interrupt. Or maybe just make it "ret" and change the actual instruction to "retc" (return from call, making it more consistent with the other RET instructions).

jmg · 2017-03-15 03:28

Seairth wrote: »

Out of curiosity, is the trailing underline necessary for "_ret_"? Could it be just "_ret"? Actually, I'd prefer a different syntax altogether, but I realize it needs to be mutually exclusive of the "if_" predicates. Maybe "iret" (for immediate return or implied return), though that might be confused with return-from-interrupt. Or maybe just make it "ret" and change the actual instruction to "retc" (return from call, making it more consistent with the other RET instructions).

But it is not an immediate return when attached to a conditional instruction, which makes iret confusing.
Maybe cret, for conditional return, if you really need a modifier.

Given the context though, could ret alone not do? A ret in front of an opcode is distinct from ret alone on a line.
It is placed in the condition column, and where it tags a following opcode, the intent is clear.

I'd also be happy with clearer code and very slightly higher IQ in the tools, so that this is supported:

blank		call	#hsync			'blank lines
		xcont	m_vi,#0
		djnz	x,#blank
		ret

hsync		xcont	m_bs,#0			'horizontal sync
		xzero	m_sn,#1
		xcont	m_bv,#0
		ret

compiles to a list file like this

blank		call	#hsync			'blank lines
		xcont	m_vi,#0
	_ret_	djnz	x,#blank

hsync		xcont	m_bs,#0			'horizontal sync
		xzero	m_sn,#1
	_ret_	xcont	m_bv,#0

Macros already expand in assembler list files, so this is similar to that.
The result is code that is very easy to read and safe to edit, and it will be more tolerant of HLL-generated ASM.

Seairth · 2017-03-15 03:46

I had thought about that approach too. It's certainly more natural. But it would also complicate the assembler, which I was trying to avoid. I guess the thing is that the current "_ret_" looks like a kludge, and I was hoping for something a bit more natural looking.

Heater. · 2017-03-15 04:01

I do hope there is some way of getting rid of the horribly ugly _underscores_.

Although I'm not keen on putting too many smarts into an assembler language either. Not past aliasing some instructions. If I write an instruction I should get an instruction.

jmg · 2017-03-15 04:17

Seairth wrote: »

I had thought about that approach too. It's certainly more natural. But it would also complicate the assembler....

Only very slightly.
When any RET is found, the assembler knows whether it has a tag label.
If it does, it keeps the ret form; if instead the ret simply follows another opcode in line, it flips to become a condition-code patch on the prior opcode.
The LST file clearly shows what was done. It can even add a comment to that effect.
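A rough sketch of that pass, in Python rather than in any real assembler. The tuple layout (label, condition, mnemonic, operands) and the function name are invented for illustration, and it also assumes the fold is skipped when the prior opcode already carries a condition, since the condition field can only hold one thing.

```python
def fold_rets(lines):
    """Fold an unlabelled 'ret' into a _ret_ tag on the previous opcode.

    lines: list of (label, cond, mnemonic, operands) tuples (hypothetical layout).
    """
    out = []
    for label, cond, mnem, ops in lines:
        prev = out[-1] if out else None
        can_fold = (
            mnem == "ret" and not label and not cond
            and prev is not None
            and not prev[1]              # prior opcode has a free condition field
            and prev[2] != "ret"
        )
        if can_fold:
            out[-1] = (prev[0], "_ret_", prev[2], prev[3])
        else:
            out.append((label, cond, mnem, ops))
    return out


source = [
    ("blank", "", "call", "#hsync"),
    ("", "", "xcont", "m_vi,#0"),
    ("", "", "djnz", "x,#blank"),
    ("", "", "ret", ""),
    ("hsync", "", "xcont", "m_bs,#0"),
    ("", "", "xzero", "m_sn,#1"),
    ("", "", "xcont", "m_bv,#0"),
    ("", "", "ret", ""),
]
listing = fold_rets(source)
```

A real tool would need more checks than this, for instance data directives or opcodes where a _ret_ tag is not usable, but the core rule is only a few lines.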

Heater. wrote: »

Although I'm not keen on putting too many smarts into an assembler language either. Not past aliasing some instructions. If I write an instruction I should get an instruction.

So you never use macros then ?
Or include files ?
Or generic calls ?
Or conditional code ?

There are already many cases where any exact ASM source-line-to-bytes rule is broken.

Heater. wrote: »

I do hope there is some way of getting rid of the horribly ugly _underscores_.

Since the approach I suggest (which does get rid of the horribly ugly _underscores_) is not mutually exclusive, you are of course free to use the _ret_ form. A user taste choice.
