Fast Bytecode Interpreter

David Betz · 2017-03-21 18:02

Roy Eltham wrote: »

David Betz,
I think that ship already sailed with the self modifying and all the ALT* instructions. This is a neat code density trick, and with a little smarts an editor could show the executed snippets in a tooltip or some form of expanded view mode. Heck, we might even be able to let you edit the "expanded" form and have it auto-generate the condensed form.

Maybe I am biased, since I have to deal with templates, lambdas, and what not in C++ all the time, and those are equally hard (if not far worse in some cases) to parse and debug. I agree that it will make PASM2 a bit harder to read and debug when it has them, but I think the benefits far outweigh that cost.

In my opinion it is way worse than lambdas and templates. Having a bit mask determine which instructions are executed means you can't look at code and figure out what it does without finding every use of it and examine the bit mask. And if you modify the code you have to make sure all of the bit masks still work as intended or modify them. It's one of the ugliest features I've ever seen in an instruction set. Just my opinion of course.

potatohead · 2017-03-21 18:09

This is another one of those choices, like interrupts are.

I second Roy's comments. This is a seriously great optimization.

We can present it in ways that help people.

No way this is a bad idea.

MJB · 2017-03-21 18:21

cgracey wrote: »
...

Well, we can now get ultimate compactness AND highest speed:
		skip	##%001111110111100	'execute jmp/rfbyte/add/rdlong/pusha
		jmp	#rw_mem

...then this executes, pieced together from the rw_mem code...

		rfbyte	m
		add	m,dbase
		rdlong	x,m
	_ret_	pusha	x		'only the four instructions we wanted, zero time overhead!
SKIP uses bits in D, LSB first, to selectively cancel instructions in the order they occur in memory, starting with the next instruction. It works through branches, too. And it doesn't just cancel instructions. In cog-exec mode, it actually increments the PC by 1..8, not just by 1, in order to get to the next instruction, saving time. It works in hub-exec mode by just cancelling instructions, always stepping by 4, to not reset the FIFO.

so skip takes the 15 bits in the above example as an immediate into D&S ? or generates an AUGD for skip patterns > 9 up to 32 bits/instructions?

so when the pattern is in a register we can really assemble it dynamically ... so we can write self modifying code with it ;-) ??
wow

Seairth · 2017-03-21 18:38

The fact that SKIP continues to apply after a branch makes me a little nervous. I know that the above code example made use of it, but is that really necessary? From what I'm seeing it could have also been done with the SKIP being executed immediately before the instructions it applies to. True, this would require a little extra setup and a 2-clock penalty if the first instruction is skipped. But it also localizes what SKIP can affect. Similarly, imagine trying to troubleshoot code that doesn't seem to be executing because a previously called block happened to set a SKIP before returning.

jmg · 2017-03-21 19:08

Seairth wrote: »

It might be worthwhile to make this two separate instructions (same encoding, though):

SETCZ #
RORCZ D

This makes it a bit clearer when reading code. Also, that aligns it with "ROR D, #1 WC" (which itself might benefit from an alias RORC).

Good point, certainly, now the opcode modifies D, it needs a rename to reflect that.

jmg · 2017-03-21 19:11

Rayman wrote: »

This is going to allow for some really obfuscated assembly code...

hehe, yes...

Rayman wrote: »

What happens if there's a second skip when one is already in effect?

My guess is it simply reloads the skip queue, but it would be rare that one skip would be 'alive' when hitting the next ?
I think also no-more-ones clears everything too.

jmg · 2017-03-21 19:21

cgracey wrote: »
		skip	##%001111110111100	'execute jmp/rfbyte/add/rdlong/pusha
SKIP uses bits in D, LSB first, to selectively cancel instructions in the order they occur in memory, starting with the next instruction. It works through branches, too. And it doesn't just cancel instructions. In cog-exec mode, it actually increments the PC by 1..8, not just by 1, in order to get to the next instruction, saving time. It works in hub-exec mode by just cancelling instructions, always stepping by 4, to not reset the FIFO.

This is pretty cool ( but I would say that, as I've suggested a SKIP before

)

Skip is also useful for streaming memory, as in XIP designs.
Condition codes can be used for some simplest skip effects, but this is more powerful.

Questions:
- Is there a 9-bit immediate version, or does it always cost 64b opcode ?
- Example shows Const, text says D, so I guess both #Immed, and Reg versions exist ?
- If there is a 9-bit version, does it make sense to nudge the 8-quanta, to 9 ?
- How does this behave crossing code boundaries ?
- Is there a SKIP #0 that cancels any queued Skip ?

I can also see a use in Libraries, where you have skip-based preambles, and skip-based returns for handling differing memory regions, and different types. The SKIP field is passed as a type param

jmg · 2017-03-21 19:30

David Betz wrote: »

I still think it might be better to just rearchitect the byte code VM instruction set to fit into COG/LUT RAM. Are you attempting to keep the byte codes the same as the ones used by the P1 VM?

I think the byte-codes are already changed for P2, as the capabilities are very different.
Usually the most-used byte code would pack into COG and LUT, but Hub exec allows the rare ones to reside in HUB.
The main issue with overflow into HUB, I see as management of HUB memory maps, so there is strong pressure to avoid this, but it is no longer a brick wall.
With a software based Spin, there was talk about a Spin compiler that extracted and packed the used calls only.
'Protecting users from themselves' is one chestnut to solve when taking that path

JasonDorie · 2017-03-21 19:31

At the very least I would've done it by moving the skip value into a known memory location (or pushing to the stack) then calling the function, and have that function pull the skip register from the stack and apply it. As Seairth says, it would limit the scope. Alternately, if any RET instruction killed it, that would also suffice, as that would avoid the push/pop for the setup, but still limit the scope.

cgracey · 2017-03-21 19:56

David Betz wrote: »

cgracey wrote: »

David Betz wrote: »

It looks clever but also a source of virtually unreadable code. Is it really faster than the one-routine-per-instruction implementation if you put those routines in hub memory? How about paring down the instruction set so you don't need so many cases?

You don't want those routines in the hub because the FIFO will always be slowing down the RDxxxx/WRxxxx instructions. This allows for smallest code, actually just the bare set of instructions needed to build any variant routine, and then a mask for the SKIP instruction.

I still think it might be better to just rearchitect the byte code VM instruction set to fit into COG/LUT RAM. Are you attempting to keep the byte codes the same as the ones used by the P1 VM?

The interpreter will fit in the cog and lut, no problem. The byte codes are new, compared to Spin1.

cgracey · 2017-03-21 20:00

Rayman wrote: »

This is going to allow for some really obfuscated assembly code...

What happens if there's a second skip when one is already in effect?

There are techniques for handling that. I will show the interpreter, as it progresses. It's almost a new way of programming, where you throw all the instructions you'll need into a loose sequence, and then at runtime, skip those you don't want. You can come up with all kinds of permutations.

Any new SKIP overrides the current SKIP.

cgracey · 2017-03-21 20:12

jmg wrote: »
cgracey wrote: »
		skip	##%001111110111100	'execute jmp/rfbyte/add/rdlong/pusha
SKIP uses bits in D, LSB first, to selectively cancel instructions in the order they occur in memory, starting with the next instruction. It works through branches, too. And it doesn't just cancel instructions. In cog-exec mode, it actually increments the PC by 1..8, not just by 1, in order to get to the next instruction, saving time. It works in hub-exec mode by just cancelling instructions, always stepping by 4, to not reset the FIFO.
This is pretty cool ( but I would say that, as I've suggested a SKIP before )

Skip is also useful for streaming memory, as in XIP designs.
Condition codes can be used for some simplest skip effects, but this is more powerful.

Questions:
- Is there a 9-bit immediate version, or does it always cost 64b opcode ?
- Example shows Const, text says D, so I guess both #Immed, and Reg versions exist ?
- If there is a 9-bit version, does it make sense to nudge the 8-quanta, to 9 ?
- How does this behave crossing code boundaries ?
- Is there a SKIP #0 that cancels any queued Skip ?

I can also see a use in Libraries, where you have skip-based preambles, and skip-based returns for handling differing memory regions, and different types. The SKIP field is passed as a type param

SKIP {#}D

It's LSB-first, so leading 0's end the skipping in cases of 9-bit #D values.

SKIP #0 would end any skipping pattern.

This is maybe a lot simpler than I conveyed. And the really cool thing is that when executing from cog/lut, the PC is stepped by 1...8 to get to the next instruction to execute, making things really fly. In cases where steps greater than 8 are needed, a step of 8 will occur, but that instruction will be cancelled, taking two clocks. I'll see about increasing the max step size to 16. It would take more logic, but mainly more settling time.

cgracey · 2017-03-21 20:23

jmg wrote: »

Rayman wrote: »

This is going to allow for some really obfuscated assembly code...

hehe, yes...

Rayman wrote: »

What happens if there's a second skip when one is already in effect?

My guess is it simply reloads the skip queue, but it would be rare that one skip would be 'alive' when hitting the next ?
I think also no-more-ones clears everything too.

Something else: a branch is seen as just another skippable instruction. If your SKIP list allows a branch, the next bits above will apply to the instruction stream at the new address. This code is all protected from interrupts, until the SKIP queue has no more 1's in it. This is a whole new way to program.

Roy Eltham · 2017-03-21 20:27

It is a new way to program, and I think it needs a new way of presenting things to make it less confusing (especially when branches are tossed into the mix).

David Betz · 2017-03-21 20:40

cgracey wrote: »

Rayman wrote: »

This is going to allow for some really obfuscated assembly code...

What happens if there's a second skip when one is already in effect?

There are techniques for handling that. I will show the interpreter, as it progresses. It's almost a new way of programming, where you throw all the instructions you'll need into a loose sequence, and then at runtime, skip those you don't want. You can come up with all kinds of permutations.

Any new SKIP overrides the current SKIP.

You may have created something even more to be avoided than GOTO! :-)

jmg · 2017-03-21 21:03

cgracey wrote: »

This is maybe a lot simpler than I conveyed. And the really cool thing is that when executing from cog/lut, the PC is stepped by 1...8 to get to the next instruction to execute, making things really fly. In cases where steps greater than 8 are needed, a step of 8 will occur, but that instruction will be cancelled, taking two clocks. I'll see about increasing the max step size to 16. It would take more logic, but mainly more settling time.

16 is worth checking, but if there is an opcode-change at 9, it would make sense to check a 9-base ?
- only slightly more logic than 8, and I think enables the more compact form, without an unexpected step ?

jmg · 2017-03-21 21:06

cgracey wrote: »

Something else: a branch is seen as just another skippable instruction. If your SKIP list allows a branch, the next bits above will apply to the instruction stream at the new address. This code is all protected from interrupts, until the SKIP queue has no more 1's in it. This is a whole new way to program.

How exactly is 'protected' applied - are interrupts disabled ? (sounds risky)
Does RET clear Skip ? - If not, a SKIP #0 could always be used, but that pushes up code size.

Seairth · 2017-03-21 21:22

If SKIP is going to survive branches, then in the case of RET, you could reuse the _ret_ flag to modify SKIP behavior. I'd recommend RET cancels SKIP by default (the safe option), but can keep the skip active with %0000. Maybe alias as RETSKIP or something.

jmg · 2017-03-21 21:23

David Betz wrote: »

In my opinion it is way worse than lambdas and templates. Having a bit mask determine which instructions are executed means you can't look at code and figure out what it does without finding every use of it and examine the bit mask. And if you modify the code you have to make sure all of the bit masks still work as intended or modify them. It's one of the ugliest features I've ever seen in an instruction set. Just my opinion of course.

Random-bit patterns are unlikely to be used by a compiler, and the skip-list management would make it tricky to use in Libraries, tho I could see benefits for compact libraries.

I could see if-else block management, with some helpers in the tool flows. Compilers could possibly manage that, but assemblers would ideally need a means to generate the skip-mask based on labels.

' psuedo listing code of no-jumps if-else
if_z   skip #%011111000   ' built from if-labels
if_nz skip #%000001111   ' built from else-labels
' 5 lines of If
' 4 lines of Else

You do NOT want programmers squinting at bit masks, and trying to count lines of code.
Is a double opcode, one or two skip bits ?

Roy Eltham wrote: »

It is a new way to program, and I think it needs a new way of presenting things to make it less confusing (especially when branches are tossed into the mix).

This looks more like an ASM-only use, and the P2 Assembler support is already at the poor end of the scale.
There is a risk the P2 becomes an i860 ?

Roy Eltham · 2017-03-21 21:27

jmg,
That's a bit of a stretch... i860... LMAO.

Heater. · 2017-03-21 21:40

jmg,

There is a risk the P2 becomes an i860 ?

Funny you should say that. The i860 also came to my mind.

Has anyone tried to program that thing in assembler? I have not, but I did sit through an hour long presentation by an Intel guy about i860 assembly language programming when it was launched.

What with it's parallel streams of integer and float operations, written on the same source line, and it's pipelining that caused the results of one instruction to come out in the destination register of another instruction 4 lines down the code, it was obviously impossible to work with. The guy giving the presentation did not show how to reach max flops performance on the i860 and admitted that Intel itself did not know how to do it for the FFT case he was discussing. That's OK "The compilers will take care of it".

Yeah, sure. The chip flopped.

Seairth · 2017-03-21 21:52

jmg wrote: »
I could see if-else block management, with some helpers in the tool flows. Compilers could possibly manage that, but assemblers would ideally need a means to generate the skip-mask based on labels.
' psuedo listing code of no-jumps if-else
if_z   skip #%011111000   ' built from if-labels
if_nz skip #%000001111   ' built from else-labels
' 5 lines of If
' 4 lines of Else

See? You already got it wrong! It should be:

' psuedo listing code of no-jumps if-else
if_z   skip ##%1111000001   ' built from if-labels
if_nz  skip #%11111         ' built from else-labels
' 5 lines of If
' 4 lines of Else

The LSB in the first skip could actually be either zero or one in this case, since the second skip is predicated by the if_nz. This variant might be a bit more clear, though:

if_z   skip ##%1111000001   ' built from if-labels
       skip #%11111         ' built from else-labels
' 5 lines of If
' 4 lines of Else

Note the use of AUGD. You would have to keep the total number of instructions in the if/else blocks to 8 to avoid that.

Seairth · 2017-03-21 21:54

Another hard-to-find bug that will likely be caused with SKIP: accounting for implied AUGx instructions.

jmg · 2017-03-21 21:55

jmg wrote: »
I could see if-else block management, with some helpers in the tool flows. Compilers could possibly manage that, but assemblers would ideally need a means to generate the skip-mask based on labels.
' psuedo listing code of no-jumps if-else
if_z   skip #%011111000   ' built from if-labels
if_nz skip #%000001111   ' built from else-labels
' 5 lines of If
' 4 lines of Else
You do NOT want programmers squinting at bit masks, and trying to count lines of code.
Is a double opcode, one or two skip bits ?

Thinking some more on this, the above example 'skips a skip', so is more compact, but is harder to read.
Another more general and easier to read/general form could be like this

' psuedo listing code of no-jumps if-else, simpler serial design, auto-label mask generate.
if_nz   skip skip_if_end    ' uses Name_else, Name_end to auto-build skip masks
' 1-32 lines of If  (< 9 lines is faster)
skip Skip_end   ' ** PASM generated, if Skip_else present
Skip_else:         ' optional Skip, not needed with no else body
' optional 1-32 lines of Else (< 9 lines is faster)
Skip_end:    'mandatory end  label

Seairth · 2017-03-21 22:00

In a similar vein, what would happen if SKIP enountered something like:

SKIP #%010
OR   reg, ##big_number
AND  reg, #small_number

Presumably, the AUGS would be executed, the OR would be skipped and the AND would be executed. Would the outstanding AUG apply to the AND?

David Betz · 2017-03-21 22:08

Just say "no" to SKIP.

jmg · 2017-03-21 22:10

Seairth wrote: »

See? You already got it wrong! It should be:

Well, yes

- I did not drill to LSB-first detail, but see already the other form I posted, using Assembler to fill-out common skip masks. That would be a more common method.

Seairth wrote: »

Another hard-to-find bug that will likely be caused with SKIP: accounting for implied AUGx instructions.

Yes, see my question above about 'Is a double opcode, one or two skip bits ?'
I think skip can only count 32b chunks, and so the labels are the best way to manage any double opcodes.

I see this as a similar problem to REP, where first versions had a complicated offset-line-count, but it was fixed to be a simpler
REP label
structure.

Random Skip patterns would always be possible, but would not be applied in most general code.
Only cases like Chip's example of complex-code-packing would use complex skips, and even those could have software helpers.

Seairth wrote: »
In a similar vein, what would happen if SKIP enountered something like:
SKIP #%010
OR   reg, ##big_number
AND  reg, #small_number
Presumably, the AUGS would be executed, the OR would be skipped and the AND would be executed. Would the outstanding AUG apply to the AND?

Good point, worth testing this type of case. It would not usually be deliberate code, but it is possible.

Roy Eltham · 2017-03-21 22:12

David Betz,
You are free to not use it.

ctwardell · 2017-03-21 22:23

Obviously some details need to be worked out, but so far the SKIP concept looks very compelling.

David Betz · 2017-03-21 22:23

Roy Eltham wrote: »

David Betz,
You are free to not use it.

Yup, you're right about that. I'm beginning to think my best option will be to stick with P1.

Fast Bytecode Interpreter

Comments