Fast Bytecode Interpreter

cgracey · 2018-09-12 12:22

In this section:

pushv		pusha	x		'a b c d e f g h		a: read INA
	_ret_	mov	x,ina		'a | | | | | | |		b: read INB
	_ret_	mov	x,inb		'  b | | | | | |		c: GETRND
	_ret_	getrnd	x		'    c | | | | |		d: GETCT
	_ret_	getct	x		'      d | | | |		e: COGID
	_ret_	cogid	x		'        e | | |		f: LOCKNEW
		locknew	x	wc	'          f | |		g: POLLATN
		pollatn		wc	'          | g |		h: POLLPAT
		pollpat		wc	'          | | h		i: LOCKCHK(lock)
lockchk_	lockrel	x	wc	'          | | | i  		j: COGCHK(cog)
	_ret_	bitc	x,#31		'          f | | i  		k: LOCKTRY(lock)
cogchk_		cogid	x	wc	'            | |   j
locktry_	locktry	x	wc	'            | |   | k
	_ret_	muxc	x,_FFFFFFFF	'            g h   j k

You can see that there are special single-byte instructions for reading INA and INB. They just push X (stack top) onto the stack and then move either INA or INB into X. Just two instructions, with a 6-clock overhead from the bytecode executor.

There are fast setup instructions for DIRA/DIRB/OUTA/OUTB/INA/INB, since they will be most-often operated on at the bitfield level. INA and INB may be often read whole, though.

There are also instructions for reading and writing basic pins, as well as smart pins, using atomic instructions. They are all single byte codes. Spin2 is going to be really compact and fast.
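
The push-then-load pattern described in this post can be modeled in a few lines. This is only an illustrative Python sketch, not the interpreter's code. The class name `StackMachine`, its methods and the port values are hypothetical. The one detail taken from the post is that the stack top lives in X and reading a port first pushes X, then moves the port value into X.

```python
class StackMachine:
    """Toy model of a stack machine that caches the top of stack in X."""

    def __init__(self, ina=0, inb=0):
        self.x = 0            # cached top of stack (the X register)
        self.memory = []      # everything below the top of stack
        self.ports = {"INA": ina, "INB": inb}  # hypothetical port values

    def pusha(self):
        # Spill the current top of stack so X can take a new value.
        self.memory.append(self.x)

    def read_port(self, name):
        # Models the two-step sequence: push X, then move the port into X.
        self.pusha()
        self.x = self.ports[name] & 0xFFFFFFFF

    def pop(self):
        # Return the top of stack and refill X from memory.
        value = self.x
        self.x = self.memory.pop()
        return value


vm = StackMachine(ina=0x0000FCF0, inb=0x12345678)
vm.read_port("INA")
vm.read_port("INB")
print(hex(vm.pop()), hex(vm.pop()))  # prints: 0x12345678 0xfcf0
```

Keeping the top of stack in a register means a port read costs one spill plus one move, which is the shape of the `pusha` / `_ret_ mov x,ina` pair in the listing.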

jmg · 2018-09-12 14:01

cgracey wrote: »

I've got everything into the Spin2 bytecode interpreter that I think is needed, except the bytecode snippet for starting a new Spin2 process.
Once I get everything else working, it will become more apparent how to handle that.

I need to go through this and tidy it up a little bit, but I don't feel like anything needs changing. After that, I'll start working on the compiler to generate the bytecode from source code.

Neat to see...

Can you add a MAP file with the source posting ? (& build command line(s)) - so users can confirm their build is ok ?
& a brief summary table ?

Given this looks like a release candidate, maybe start a new thread with more obvious release software name ?
Is that going to be 'Spin2 bytecode interpreter (coded in p2asm) ' here ?

I notice this has 38 references to 'cog' and only 3 of those are assembler mnemonics ?
Can you do an edit pass to do the COG -> CORE semantics change, so that drops to 3 or 0 ?

cgracey · 2018-09-12 15:05

Ooh, that's like telling Trump (or Samson) he needs to change his hairdo, just because. I need a 3-letter word that means cog, but sounds like processor or core. Plus, I need some time to acclimate to the change.

COG --> COR

jmg · 2018-09-12 15:43

cgracey wrote: »

I need a 3-letter word that means cog, but sounds like processor or core. Plus, I need some time to acclimate to the change.

COG --> COR

sure, for mnemonics I'd say COR is fine.

I thought the change of COG to CORE in the public facing docs and all explanations, and deprecating COG in P2 was decided a while back, and Ken/you were fine with ?
Made sense to me, as huge numbers of coders already have 2 or 4 core parts on their desks.

jmg · 2018-09-12 15:53

cgracey wrote: »

Ooh, that's like telling Trump (or Samson) he needs to change his hairdo, just because.

Nah, quite the exact opposite: more like putting the school uniform on him and watching him walk through those school gates... a lump in the throat, quietly thinking 'wait'll they see what he can do'!

ersmith · 2018-09-12 16:42

Looks pretty nice, Chip. I just have a few comments:

(1) Lots of languages need unsigned operations. Unsigned multiply and divide are pretty much free. Unsigned comparisons would be very straightforward too, you'd just need some alternate patterns in op_rel that use cmp instead of comps.

Unsigned divide and remainder are useful even for Spin: they make number printing, for example, much easier without ugly code for $8000_0000. I've recently added "+/" and "+//" operators to fastspin for just that reason.

(2) If you find opcode space tight it should be easy to have the compiler do the swap for comparisons, i.e. convert > to < and >= to <= by pushing the arguments in the opposite order. Or, it could convert > to NOT <=.

(3) The calling convention looks to be getting pretty complicated now -- besides pbase, vbase, dbase we now have dcall too? Does the "call @ptr" opcode let compilers skip all that and just emit absolute addresses for the functions? In fact it almost seems like the compiler should do the address lookups for the array versions of call (and again save on opcode space)

Eric
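
Point (2) above can be sketched as a toy compiler pass. This is a minimal Python illustration under stated assumptions: the opcode names `PUSH`, `LT` and `LE` and both function names are hypothetical, and nothing here reflects the real Spin2 bytecode encoding. It only shows that `>` and `>=` need no opcodes of their own if the operands are pushed in the opposite order.

```python
def compile_compare(op, left, right):
    """Emit toy stack code for `left op right` using only LT and LE opcodes."""
    if op in ("<", "<="):
        first, second = left, right
    elif op in (">", ">="):
        # a > b is b < a, and a >= b is b <= a, so push in the opposite order.
        first, second = right, left
    else:
        raise ValueError(f"unsupported operator: {op}")
    opcode = "LT" if op in ("<", ">") else "LE"
    return [("PUSH", first), ("PUSH", second), (opcode,)]


def run(code):
    """Evaluate the toy stack code and return the boolean result."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            b = stack.pop()
            a = stack.pop()
            stack.append(a < b if instr[0] == "LT" else a <= b)
    return stack.pop()


print(compile_compare(">", 3, 7))  # [('PUSH', 7), ('PUSH', 3), ('LT',)]
print(run(compile_compare(">", 7, 3)), run(compile_compare(">=", 3, 7)))  # True False
```

One thing to watch with this approach: swapping the push order also swaps the evaluation order of the two operands, which matters if either one has side effects.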

octetta · 2018-09-12 18:11

cgracey wrote: »

Ooh, that's like telling Trump (or Samson) he needs to change his hairdo, just because. I need a 3-letter word that means cog, but sounds like processor or core. Plus, I need some time to acclimate to the change.

COG --> COR

I'm still not sold on the need for "core" versus "cog" but don't really want to argue it here.

As for a three-letter variant though, that's a puzzle I like and therefore, I suggest using "cpu" or "mpu" (since there's no "central", "mpu" could imply multi-processing-unit or micro-processing-unit).

Anyway, good luck with the hairdo change!

David Betz · 2018-09-12 18:31

octetta wrote: »

cgracey wrote: »

Ooh, that's like telling Trump (or Samson) he needs to change his hairdo, just because. I need a 3-letter word that means cog, but sounds like processor or core. Plus, I need some time to acclimate to the change.

COG --> COR

I'm still not sold on the need for "core" versus "cog" but don't really want to argue it here.

As for a three-letter variant though, that's a puzzle I like and therefore, I suggest using "cpu" or "mpu" (since there's no "central", "mpu" could imply multi-processing-unit or micro-processing-unit).

Anyway, good luck with the hairdo change!

CPU could stand for "COG Processing Unit". It still makes a reference to COG without being in your face.

potatohead · 2018-09-12 19:52

COR = Cog Originally, Right?

Yeah, I will easily admit I prefer COG. And I will say it right now:

Changing this will just trade one set of questions for another set. Best case? Some people who didn't feel good about it might, and others who did, might not.

Net gain? Zero.

Roy Eltham · 2018-09-12 19:55

Every time I explain the Prop to friends that are already MCU savvy, I have to explain the terminology (e.g. cog = core). It will certainly not be a net gain of zero for anyone outside the core Prop fans.

Rayman · 2018-09-12 20:20

Maybe it was different with P1, without a HUBEXEC mode, like regular cores...
But, isn't it now like everybody else (with cores)?

Heater. · 2018-09-12 20:35

"cog" makes no sense. Nobody out there would immediately intuit what that means.

"core" makes no sense. There are 8 of them, none of them is the core of the machine.

"CPU" makes no sense. There are 8 of them, none of them is central.

Do I have a better idea? No. Perhaps Spud is right, changing this will just trade one set of questions for another set.

David Betz · 2018-09-12 21:07

Heater. wrote: »

"cog" makes no sense. Nobody out there would immediately intuit what that means.

"core" makes no sense. There are 8 of them, none of them is the core of the machine.

"CPU" makes no sense. There are 8 of them, none of them is central.

Do I have a better idea? No. Perhaps Spud is right, changing this will just trade one set of questions for another set.

But surely there are many multi-core processors. Why would that be confusing?

Tubular · 2018-09-12 21:45

cgracey wrote: »

Ooh, that's like telling Trump (or Samson) he needs to change his hairdo, just because. I need a 3-letter word that means cog, but sounds like processor or core. Plus, I need some time to acclimate to the change.

COG --> COR

Cores are fine, and are on the rise - hexa-cores, octa-cores in new phones, thousands of cores in modern GPUs

But now you have potential confusion with Cordic

msrobots · 2018-09-12 21:54

nah, core is OK, we have 64 smartpins, 8 cores and one cordic. Who else has that, right?

and CORSTART works as fine as COGSTART.

My favorite name was P64 but that sounds too 8-bitty.

Mike

Tubular · 2018-09-12 21:54

How about CRU - "Cruncher". It has eight crunchers and sixty four smart pins

jmg · 2018-09-12 21:57

Tubular wrote: »

Cores are fine, and are on the rise - hexa-cores, octa-cores in new phones, thousands of cores in modern GPUs

But now you have potential confusion with Cordic

Valid point, does that make this David Betz's comment from above

"CPU could stand for "COG Processing Unit". It still makes a reference to COG without being in your face."
a good mnemonic fit then ?

I'm fine with a watermark / hat-tip to the historic pathway; it's the 9-digit sea of users out there, who already know what cores are, that is the better focus...

jmg · 2018-09-12 22:23

msrobots wrote: »

nah, core is OK, we have 64 smartpins, 8 cores and one cordic. Who else has that, right?
and CORSTART works as fine as COGSTART.

I'd agree, but I can see Tubular's point around Cordic, in that someone might confuse CORSTART with some kind of 'start the Cordic engine' operation.
His suggestion would give CPUSTART, & seems ok to me.

TonyB_ · 2018-09-12 22:46

Mnemonics should be short and snappy. How about replacing COG with just C? Also change to QROT and QVEC.

potatohead · 2018-09-12 23:26

It's all good. I made the snarky "Cog Originally Right?" up for the purpose of merging it all. There is a legacy to understand and it won't all see revision.

This way, we have a bit of fun.

I am perfectly happy to be in the minority opinion as to the net gain. On that, we all shall see. Will be a very nice problem to have.

COR is just fine. Frankly, anything else will definitely be a net loss because then we will face both COG questions Roy identified, and CORe questions, because we will have latched onto yet another odd, or new thing! Hello regression city, here we come!

LMAO

We should do COR. Make sure "Cog Originally, Right?" is out there, for when someone writes COG, and hope for a gain.

(And people are gonna forget, muscle memory and write COG. Best be in front of it for max net gain potential.)

cgracey · 2018-09-12 23:34

The cogs work like team members, where the boss is only the boss by his orders (coding). It's peer to peer. A cog is like an enlisted person with orders to follow.

potatohead · 2018-09-12 23:39

Great point Chip. Never thought of it like that. To me, it always was like a custom gear box.

In any case...

Onward!

cgracey · 2018-09-12 23:56

ersmith wrote: »

Looks pretty nice, Chip. I just have a few comments:

(1) Lots of languages need unsigned operations. Unsigned multiply and divide are pretty much free. Unsigned comparisons would be very straightforward too, you'd just need some alternate patterns in op_rel that use cmp instead of comps.

Unsigned divide and remainder are useful even for Spin: they make number printing, for example, much easier without ugly code for $8000_0000. I've recently added "+/" and "+//" operators to fastspin for just that reason.

(2) If you find opcode space tight it should be easy to have the compiler do the swap for comparisons, i.e. convert > to < and >= to <= by pushing the arguments in the opposite order. Or, it could convert > to NOT <=.

(3) The calling convention looks to be getting pretty complicated now -- besides pbase, vbase, dbase we now have dcall too? Does the "call @ptr" opcode let compilers skip all that and just emit absolute addresses for the functions? In fact it almost seems like the compiler should do the address lookups for the array versions of call (and again save on opcode space)

Eric

1) Good idea about unsigned operations. Looking over all the math operations, it seems that DIVIDE, REMAINDER and RELATIONAL operators need unsigned variants. I agree this is really important. Maybe something like this in source code:

x +< y (unsigned less-than)
x +<= y (unsigned less-than-or-equal)
x +> y (unsigned greater-than)
x +>= y (unsigned greater-than-or-equal)

x +/ y (unsigned divide)
x +// y (unsigned remainder)

DIVIDE and REMAINDER take 3 bytecode slots each, due to normal/write/write-push variants. That means 6 codes. The relational operators need only 4 codes, as there are no variants.
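
To make the proposed semantics concrete, here is a small Python model of what `+<`, `+/` and `+//` would compute on 32-bit longs, next to the signed view. It is a sketch only: the function names are hypothetical and say nothing about how the interpreter implements the operators. The last function shows the number-printing case ersmith mentioned, where $8000_0000 needs no special handling once unsigned divide and remainder exist.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a signed two's-complement long."""
    value &= MASK32
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def unsigned_lt(x, y):
    """Models x +< y: compare the raw 32-bit patterns."""
    return (x & MASK32) < (y & MASK32)


def unsigned_div(x, y):
    """Models x +/ y."""
    return (x & MASK32) // (y & MASK32)


def unsigned_rem(x, y):
    """Models x +// y."""
    return (x & MASK32) % (y & MASK32)


def to_decimal(value):
    """Format a 32-bit value as unsigned decimal using only +/ and +//."""
    digits = ""
    while True:
        digits = str(unsigned_rem(value, 10)) + digits
        value = unsigned_div(value, 10)
        if value == 0:
            return digits


print(to_signed(0x8000_0000) < to_signed(1))  # True: signed view, $8000_0000 is negative
print(unsigned_lt(0x8000_0000, 1))            # False: unsigned view, it is large
print(to_decimal(0x8000_0000))                # 2147483648
```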

2) Good idea!

3) From the source-code perspective, it will be simple.

Rayman · 2018-09-13 00:07

unsigned spin? Sounds strange to me...

So, byte and word are already unsigned and only long is signed, right?

ersmith · 2018-09-13 00:13

Rayman wrote: »

unsigned spin? Sounds strange to me...

So, byte and word are already unsigned and only long is signed, right?

Maybe the interpreter could be used for other languages too?

Spin1 already had signed and unsigned shifts, so this is just extending it to some other operations where signedness matters. The longs will remain signed by default, but where you want them treated as unsigned it will be possible.
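
For comparison, the existing signed/unsigned split for right shifts looks like this when modeled in Python. The helper names `shr` and `sar` are hypothetical, and shift counts are assumed to be in the range 0 to 31.

```python
MASK32 = 0xFFFFFFFF


def shr(value, count):
    """Logical shift right: zeros enter from the top (unsigned view)."""
    return (value & MASK32) >> count


def sar(value, count):
    """Arithmetic shift right: the sign bit is replicated (signed view)."""
    value &= MASK32
    if value & 0x8000_0000:
        value -= 0x1_0000_0000
    return (value >> count) & MASK32


print(hex(shr(0x8000_0000, 4)))  # 0x8000000
print(hex(sar(0x8000_0000, 4)))  # 0xf8000000
```

The same bit pattern gives two different answers depending on the view, which is the situation divide, remainder and the relational operators are in as well.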

jmg · 2018-09-13 00:22

ersmith wrote: »

Maybe the interpreter could be used for other languages too?

Oh, I'm sure the core elements will be used many times.

You may have seen my comment in your other thread, around a merger of their frontend and this backend (tweaked as needed), to create a Prop 2 Python

https://github.com/adafruit/circuitpython/releases
& highly optimized ByteCode PASM backend
...
Dropping a Prop2 release into the https://github.com/adafruit/circuitpython pool would be fun to watch !

David Betz · 2018-09-13 00:37

ersmith wrote: »

Rayman wrote: »

unsigned spin? Sounds strange to me...

So, byte and word are already unsigned and only long is signed, right?

Maybe the interpreter could be used for other languages too?

Spin1 already had signed and unsigned shifts, so this is just extending it to some other operations where signedness matters. The longs will remain signed by default, but where you want them treated as unsigned it will be possible.

Won't the unusual calling sequence make it difficult to use for languages like C?

ozpropdev · 2018-09-13 01:10

Chip
Thanks for posting the Spin2 interpreter.

I just ran your Spin2 interpreter through my new debugger and it works great!
The single stepping through bytecode programs is working well.
Bytecodes can be disassembled in compressed and expanded formats. (see end of output)
As you can see the debugger uses debug information from Pnut.

<LOG>>watch x
~~~~~~~~~~~~~~~~ Propeller 2 Debugger 2.2 File: spin2_test1.obj ~~~~~~~~~~~~~~~~
           x COG $1E8:  $0003FB2C %000000000_00000011_11111011_00101100 260908
TOS: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
Cog #0 FLAGS: nc nz BRK:00
000: FD67FE29 spin2                          SETQ2   #$1ff
001: FF000005                                AUGS    $5 ($a00) 
002: FB040164                                RDLONG  spin2,PTRA++[4]
<LOG>>watch y
~~~~~~~~~~~~~~~~ Propeller 2 Debugger 2.2 File: spin2_test1.obj ~~~~~~~~~~~~~~~~
           x COG $1E8:  $0003FB2C %000000000_00000011_11111011_00101100 260908
           y COG $1E7:  $00000314 %000000000_00000000_00000011_00010100 788
TOS: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
Cog #0 FLAGS: nc nz BRK:00
000: FD67FE29 spin2                          SETQ2   #$1ff
001: FF000005                                AUGS    $5 ($a00) 
002: FB040164                                RDLONG  spin2,PTRA++[4]
<LOG>>step
~~~~~~~~~~~~~~~~ Propeller 2 Debugger 2.2 File: spin2_test1.obj ~~~~~~~~~~~~~~~~
           x COG $1E8:  $0003FB2C %000000000_00000011_11111011_00101100 260908
           y COG $1E7:  $00000314 %000000000_00000000_00000011_00010100 788
TOS: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
Cog #0 FLAGS: nc nz BRK:00
003: FD900000                                JMP     #beginit
<LOG>(step) >
~~~~~~~~~~~~~~~~ Propeller 2 Debugger 2.2 File: spin2_test1.obj ~~~~~~~~~~~~~~~~
           x COG $1E8:  $0003FB2C %000000000_00000011_11111011_00101100 260908
           y COG $1E7:  $00000314 %000000000_00000000_00000011_00010100 788
TOS: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
Cog #0 FLAGS: nc nz BRK:00
004: FF000009 beginit                        AUGS    $9 ($1200) 
005: FC7C0134                                RDFAST  #$0,##$1334
<LOG>(step) >
~~~~~~~~~~~~~~~~ Propeller 2 Debugger 2.2 File: spin2_test1.obj ~~~~~~~~~~~~~~~~
           x COG $1E8:  $0003FB2C %000000000_00000011_11111011_00101100 260908
           y COG $1E7:  $00000314 %000000000_00000000_00000011_00010100 788
TOS: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
Cog #0 FLAGS: nc nz BRK:00
006: F9CFF40F                                BMASK   DIRA,#$f
<LOG>(step) >
~~~~~~~~~~~~~~~~ Propeller 2 Debugger 2.2 File: spin2_test1.obj ~~~~~~~~~~~~~~~~
           x COG $1E8:  $0003FB2C %000000000_00000011_11111011_00101100 260908
           y COG $1E7:  $00000314 %000000000_00000000_00000011_00010100 788
TOS: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
Cog #0 FLAGS: nc nz BRK:00
007: FF00007E                                AUGS    $7e ($fc00) 
008: F607F8F0                                MOV     OUTA,##$fcf0
<LOG>(step) >
~~~~~~~~~~~~~~~~ Propeller 2 Debugger 2.2 File: spin2_test1.obj ~~~~~~~~~~~~~~~~
           x COG $1E8:  $0003FB2C %000000000_00000011_11111011_00101100 260908
           y COG $1E7:  $00000314 %000000000_00000000_00000011_00010100 788
TOS: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
Cog #0 FLAGS: nc nz BRK:00
009: FD67FE2A                                PUSH    #$1ff
<LOG>(step) >
~~~~~~~~~~~~~~~~ Propeller 2 Debugger 2.2 File: spin2_test1.obj ~~~~~~~~~~~~~~~~
           x COG $1E8:  $0003FB2C %000000000_00000011_11111011_00101100 260908
           y COG $1E7:  $00000314 %000000000_00000000_00000011_00010100 788
TOS: 000001FF 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
Cog #0 FLAGS: nc nz BRK:00
00A: 0D658228                          _RET_ SETQ    #$c1
00B: 00000000                                NOP     
PA = $20172 (bytecode) PB = $00179 GETPTR = 01334
<LOG>(step) >
~~~~~~~~~~~~~~~~ Propeller 2 Debugger 2.2 File: spin2_test1.obj ~~~~~~~~~~~~~~~~
           x COG $1E8:  $0003FB2C %000000000_00000011_11111011_00101100 260908
           y COG $1E7:  $00000314 %000000000_00000000_00000011_00010100 788
TOS: 000001FF 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
SKIP pattern: %110000110 SKIPM:0 CALL depth = 0
Cog #0 FLAGS: nc Z BRK:00
361: FD63BA14 var_cog                        RFVARS  a
PA = $45 (bytecode) PB = $01335 GETPTR = 01335
<LOG>(step) >bytecode
================================================================================
Lut base $000, 8 Bit bytecode, BC = $45 = BC[7:0] = $45 (69)
LUT $45: $00061b61 = JMP #var_cog Skip mask = %110000110
361: FD63BA14 var_cog                        RFVARS  a
364: F603B5CF                                MOV     rd,rd_reg
365: F603B7D5                                MOV     wr,wr_reg
366: F9BBB5DD                                SETS    rd,a
367: F9B3B7DD                                SETD    wr,a
36A: 0607B81F                          _RET_ MOV     sz,#$1f
<LOG>>bytecode *
================================================================================
Lut base $000, 8 Bit bytecode, BC = $45 = BC[7:0] = $45 (69)
LUT $45: $00061b61 = JMP #var_cog Skip mask = %110000110
361: FD63BA14 var_cog                        RFVARS  a
362: FB07CF5F                      <skipped> RDLONG  y,--PTRA
363: F103BBE7                      <skipped> ADD     a,y
364: F603B5CF                                MOV     rd,rd_reg
365: F603B7D5                                MOV     wr,wr_reg
366: F9BBB5DD                                SETS    rd,a
367: F9B3B7DD                                SETD    wr,a
368: F603B5D0                      <skipped> MOV     rd,rd_lut
369: F603B7D6                      <skipped> MOV     wr,wr_lut
36A: 0607B81F                          _RET_ MOV     sz,#$1f
<LOG>>logoff
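
The `bytecode` output in the log above can be cross-checked by hand. The Python sketch below assumes the LUT long packs the jump address in its low 10 bits and the skip pattern in the bits above that. That layout is inferred from the values in this log ($00061b61, address $361, mask %110000110) rather than taken from the interpreter source, and the function names are hypothetical.

```python
def decode_lut_entry(entry):
    """Split a 32-bit LUT long into (address, skip_pattern)."""
    address = entry & 0x3FF      # low 10 bits: where to jump (assumed layout)
    skip_pattern = entry >> 10   # remaining bits: a 1 skips that instruction
    return address, skip_pattern


def executed_addresses(entry, length):
    """List the addresses that run when `length` instructions are walked."""
    address, pattern = decode_lut_entry(entry)
    return [address + i for i in range(length) if not (pattern >> i) & 1]


address, pattern = decode_lut_entry(0x00061B61)
print(hex(address), bin(pattern))  # 0x361 0b110000110
print([f"{a:03X}" for a in executed_addresses(0x00061B61, 10)])
# ['361', '364', '365', '366', '367', '36A']
```

Under that assumption the result matches the compressed listing: 362, 363, 368 and 369 are the lines marked `<skipped>` in the expanded listing.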

whicker · 2018-09-13 01:27

If it has to be 3 letters, then just consider renaming COG to CPU.

ersmith · 2018-09-13 01:34

jmg wrote: »

ersmith wrote: »

Maybe the interpreter could be used for other languages too?

Oh, I'm sure the core elements will be used many times.

You may have seen my comment in your other thread, around a merger of their frontend and this backend (tweaked as needed), to create a Prop 2 Python

Python has its own bytecode, based on its very different needs (it's a dynamically typed language), so I doubt the Spin2 interpreter would be applicable there. Some of Chip's optimization techniques will obviously be useful for any bytecode interpreter, of course.

More "traditional" imperative languages like C, BASIC, Pascal, Ada might want to use the Spin2 interpreter, or a tweaked version of it, and many of those want unsigned operations.

Plus, unsigned operations are useful for working with counters and other objects which are intrinsically unsigned.
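
A free-running 32-bit counter is a typical case. The Python sketch below uses hypothetical helper names and made-up counter values. It shows that the elapsed count survives the counter crossing $8000_0000, while a signed comparison of the two raw readings gives the wrong ordering.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a signed two's-complement long."""
    value &= MASK32
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def elapsed(start, now):
    """Ticks between two readings of a wrapping 32-bit counter."""
    return (now - start) & MASK32


start = 0x7FFF_FFF0  # reading taken just before the sign bit flips
now = 0x8000_0010    # reading taken 32 ticks later

print(elapsed(start, now))                  # 32
print(to_signed(now) >= to_signed(start))   # False: signed compare misorders them
print((now & MASK32) >= (start & MASK32))   # True: unsigned compare is correct
```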
