idea: customized Spin2 interpreters mapping unused bytecodes to user-defined PASM2

Conga · 2017-05-09 02:16

Not sure it's worth doing, but here is the idea.

Since there are bytecode values not used by the standard Spin2 interpreter,
the Spin2 compiler could have the ability to automatically allocate them to frequently called user-defined PASM2.

Who decides what is needed frequently enough?

(1) For starters, it would be enough to allow the user to annotate a PASM2 procedure with some directive meaning "make it a bytecode".
The programmer is responsible for following the bytecode calling conventions (e.g. use of PA and PB, and leaving the return value on top of the data stack, I guess).

Not sure it's worth the trouble; things get complicated:
How would parameters be handled?
Is the top-of-data-stack variable (named 'x' in the current Spin2 implementation, if I remember correctly) accessible?
If so, the user-defined PASM2 would have to push when returning a value, unless there was exactly one parameter, already held in 'x'.
The stack state, and the interpreter's integrity in general, would be at risk.
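To make the calling-convention worry concrete, here is a toy model in Python (not PASM2) of a stack VM that caches the top of the data stack in a register called x, as described above. The opcode values 0xF0/0xF1 and the handler names are hypothetical; nothing here reflects the real Spin2 encoding. A one-parameter, one-result op can work on x in place, while a three-parameter op must pop exactly two slots before writing its result, or the stack depth drifts.

```python
# Toy model (Python, not PASM2) of a stack VM with the top of the data stack
# cached in a register "x". Opcode values and handler names are hypothetical.

class ToyVM:
    def __init__(self):
        self.x = 0          # cached top of the data stack
        self.stack = []     # entries below x
        self.depth = 0      # number of live values on the data stack
        self.handlers = {}  # bytecode value -> handler(vm, code, pc) -> next pc

    def push(self, value):
        self.stack.append(self.x)  # spill the old top into memory
        self.x = value
        self.depth += 1

    def pop(self):
        value = self.x
        self.x = self.stack.pop()  # refill x from memory
        self.depth -= 1
        return value

    def run(self, code):
        pc = 0
        while pc < len(code):
            op = code[pc]
            pc = self.handlers[op](self, code, pc + 1)
        return self


def op_push_byte(vm, code, pc):   # standard: push the operand byte
    vm.push(code[pc])
    return pc + 1

def op_add(vm, code, pc):         # standard: 2 params -> 1 result
    b = vm.pop()
    vm.x = vm.x + b               # result overwrites the new top in place
    return pc

def user_square(vm, code, pc):    # user: 1 param -> 1 result, touches only x
    vm.x = vm.x * vm.x
    return pc

def user_mac(vm, code, pc):       # user: 3 params (a, b, c) -> a + b * c
    c = vm.pop()
    b = vm.pop()
    vm.x = vm.x + b * c           # net effect: depth drops by exactly 2
    return pc


STANDARD = {0x01: op_push_byte, 0x02: op_add}
USER = {0xF0: user_square, 0xF1: user_mac}      # hypothetical "unused" values
assert not STANDARD.keys() & USER.keys()        # never redefine a standard value

vm = ToyVM()
vm.handlers.update(STANDARD)
vm.handlers.update(USER)
vm.run([0x01, 2, 0x01, 3, 0x01, 4, 0xF1, 0xF0])  # (2 + 3 * 4) ** 2
print(vm.x, vm.depth)  # 196 1
```

If user_mac forgot one of its pops, the result would still land in x but the depth would be off by one, which is exactly the integrity risk mentioned above.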

(2) Detecting the most frequently called user-defined procedures with a PASM2 body and making them callable as bytecodes seems nice, but it assumes too much about the PASM2 code (which might not follow the bytecode calling conventions).
This is about the most frequent static calls, not profile-driven optimization.
It optimizes for size (of the generated Spin2 bytecode) first; some speed improvement is expected but is not the focus.

In any case, the allocation needs to be automatic so that the programmer cannot accidentally change the meaning of an existing (standard) bytecode value.
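Approach (2) with automatic allocation might look roughly like this Python sketch. It assumes the compiler can list every static call site of a PASM2-bodied procedure, that a generic call costs 3 bytes of bytecode (a made-up figure), and that values 250 to 255 happen to be unused; none of these numbers come from the real Spin2 compiler. Because the allocator only draws from values absent from the standard set, it cannot redefine a standard bytecode by construction.

```python
from collections import Counter

def allocate_opcodes(call_sites, standard_opcodes, max_new):
    """Map the most frequently *statically* called PASM2 procedures to bytecode
    values that the standard interpreter does not use.

    call_sites: one procedure name per call site found in the compiled program.
    """
    free = [v for v in range(256) if v not in standard_opcodes]
    counts = Counter(call_sites)
    # Most call sites first; ties broken by name so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    n = min(max_new, len(free))
    return {name: free[i] for i, (name, _) in enumerate(ranked[:n])}


def bytes_saved(call_sites, mapping, generic_call_bytes=3):
    """Bytecode bytes saved if each mapped call site shrinks from a generic
    call sequence (assumed length) to a single opcode byte."""
    return sum(generic_call_bytes - 1 for name in call_sites if name in mapping)


standard = set(range(250))          # pretend only 250..255 are unused
sites = ["crc8"] * 10 + ["clamp"] * 7 + ["lerp"] * 4 + ["swap"]
mapping = allocate_opcodes(sites, standard, max_new=2)
print(mapping)                      # {'crc8': 250, 'clamp': 251}
print(bytes_saved(sites, mapping))  # 34
```

Rewriting each mapped call site to the single-byte form would then be a separate pass over the compiled bytecode.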

It could be a problem if multiple Spin objects want different extensions.
Combining extensions is definitely possible but harder; (2) above ("Detecting most frequently called user-defined [...]") seems the best approach here, given the very limited bytecode space and Cog/LUT memory.
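For combining extensions from several objects, one option is to key each candidate by (object, procedure) so that two objects can both define a procedure named tx without clashing, and then fill the free opcodes greedily by static call count while respecting a LUT budget. This is a simple greedy heuristic, not an optimal packing (the underlying problem is a small knapsack), and the object names, sizes, opcode values and budget below are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    obj: str     # owning Spin2 object
    proc: str    # PASM2 procedure name inside that object
    calls: int   # static call sites across the whole program
    longs: int   # cog/LUT longs the procedure body would occupy

def merge_extensions(candidates, free_opcodes, lut_budget):
    """Greedy merge: most-called first; skip anything that no longer fits."""
    free = list(free_opcodes)
    chosen, used_longs = {}, 0
    for c in sorted(candidates, key=lambda c: (-c.calls, c.obj, c.proc)):
        if not free:
            break                       # bytecode space exhausted
        if used_longs + c.longs > lut_budget:
            continue                    # too big for what is left; try the next
        chosen[(c.obj, c.proc)] = free.pop(0)
        used_longs += c.longs
    return chosen, used_longs

cands = [
    Candidate("uart", "tx", calls=40, longs=12),
    Candidate("crc", "tx", calls=25, longs=8),          # same name, other object
    Candidate("fft", "butterfly", calls=30, longs=200),
    Candidate("crc", "update", calls=10, longs=20),
]
chosen, used = merge_extensions(cands, [0xFC, 0xFD, 0xFE], lut_budget=64)
print(chosen, used)
# {('uart', 'tx'): 252, ('crc', 'tx'): 253, ('crc', 'update'): 254} 40
```

Here fft.butterfly is called often but is skipped because it alone would exceed the LUT budget, which shows how memory, not just opcode count, limits what can be combined.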

This begins to look like an optimization in the linking phase for building an embedded system image (but not exactly, since the compiled bytecode will need adjustment).

Heater. · 2017-05-09 04:32

Neat idea.

I presume such op code definitions would be on a per object basis. Else my program that defines an opcode to do X could not use your object that defines the same opcode to do Y.

Could get messy.

jmg · 2017-05-09 04:53

Heater. wrote: »

Neat idea.

I presume such op code definitions would be on a per object basis. Else my program that defines an opcode to do X could not use your object that defines the same opcode to do Y.

Could get messy.

Certainly could, that's why it's more a compiler/linker issue. Spin is now totally soft, so it does not matter what the precise byte-codes are.
Source libraries will just compile new every time anyway.

Just as some P1 flows have dead-code removal, it has been mentioned before for P2 that Spin core code that is never called could be pruned to make more room for user code.

That does mean everyone's final Spin footprint would likely differ, and it also introduces the risk that a small change suddenly enables more of the Spin core, so code size jumps unexpectedly.

Of course, with Chip's latest SKIP overlaid-many-times code, deciding just what is not called is a challenge.

Also, the cost of not using code inside a skip block is close to zero, as that block is there for other skip combinations anyway...

Conga · 2017-05-09 05:19

Thinking some more:

What I suggested looks too much like building an embedded system image.

It could be based on a special file, something like a linker script combined with a deployment configuration (mapping Spin2 objects to cogs).

Hub memory for customized Spin2 should not be a problem: the base interpreter would be unchanged, and
the customizations (possibly varying per cog) would be like binary patches to the bytecode lookup table, plus
extra code or data to put in LUT or cog RAM (memory availability can be statically determined).
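The per-cog patching described above can be sketched like this, again in Python. The table layout, the handler names, the assumption that values 0xF0 to 0xFF are unused and the 64-long LUT figure are all hypothetical. Each cog gets its own copy of the lookup table with a small patch applied, the shared base table is never modified, and the memory check happens statically, before anything is loaded.

```python
# Hypothetical base table: 0x00..0xEF are standard bytecodes, 0xF0..0xFF unused.
BASE_TABLE = tuple([f"std_{i:02x}" for i in range(0xF0)] + [None] * 0x10)

def build_cog_table(base, patch, lut_free_longs):
    """Return a per-cog lookup table with `patch` applied.

    patch: {opcode: (handler_name, longs_needed)}. The base table is left
    untouched, so it can stay shared in hub RAM.
    """
    table = list(base)
    needed = 0
    for opcode, (handler, longs) in patch.items():
        if base[opcode] is not None:
            raise ValueError(f"{opcode:#04x} is a standard bytecode")
        table[opcode] = handler
        needed += longs
    if needed > lut_free_longs:
        raise ValueError(f"patch needs {needed} longs, only {lut_free_longs} free")
    return table

patches = {
    0: {0xF0: ("user_crc", 20)},
    1: {0xF0: ("user_fft", 48), 0xF1: ("user_mac", 6)},  # 0xF0 differs per cog
}
tables = {cog: build_cog_table(BASE_TABLE, p, lut_free_longs=64)
          for cog, p in patches.items()}
print(tables[0][0xF0], tables[1][0xF0])  # user_crc user_fft
```

Because the same value can mean different things on different cogs, bytecode compiled for one cog's customization is only valid on cogs loaded with the same patch.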

I'm really not sure this is the intended use of Spin...

jmg · 2017-05-09 05:28

Conga wrote: »

...
I'm really not sure this is the intended use of Spin...

Spin2 is supposed to support/allow in-line assembler, which has to go some way to getting fast user code, to complement Spin.
We will need to see some examples of how that all plays together.

Conga · 2017-05-09 05:56

jmg wrote: »

Conga wrote: »

...
I'm really not sure this is the intended use of Spin...

Spin2 is supposed to support/allow in-line assembler, which has to go some way to getting fast user code, to complement Spin.

I know, and I look forward to this.

I'm not sure that a static mapping like a linker script / deployment configuration
is the intended style of P2 (in general) and especially for Spin2 programming.

Conga · 2017-05-09 06:10

jmg wrote: »

Spin2 is supposed to support/allow in-line assembler, which has to go some way to getting fast user code, to complement Spin.

This means that Spin2-called PASM2 will need a calling convention to give read access to the parameters and write access for the return value(s).

I guess a single return value, but multiple return values are not out of the question in a stack-based VM.
This is not a wish, much less a request, for multiple return values at Spin2 level.

Is it possible that a single Spin2 value would need more than one long, and therefore multiple slots in the data stack?
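On the multi-long question: in a stack VM a value wider than one long can simply occupy two slots, and an op can consume and produce several slots as long as the compiler knows its signature. A Python sketch, assuming 32-bit longs, a plain list as the data stack, and a low-long-first, high-long-on-top layout (that ordering is an arbitrary choice, not anything Spin2 defines); user_mul64 is a hypothetical user op taking two longs and returning one 64-bit product in two slots.

```python
MASK32 = 0xFFFF_FFFF  # one 32-bit long

def push_u64(stack, value):
    """Push a 64-bit value as two longs: low long first, high long on top."""
    stack.append(value & MASK32)
    stack.append((value >> 32) & MASK32)

def pop_u64(stack):
    """Inverse of push_u64."""
    high = stack.pop()
    low = stack.pop()
    return (high << 32) | low

def user_mul64(stack):
    """Hypothetical user op: 2 params (unsigned longs) -> 1 value in 2 slots."""
    b = stack.pop() & MASK32
    a = stack.pop() & MASK32
    push_u64(stack, a * b)

stack = [MASK32, MASK32]
user_mul64(stack)
print([hex(v) for v in stack])   # ['0x1', '0xfffffffe']
print(hex(pop_u64(stack)))       # 0xfffffffe00000001
```

The stack depth is unchanged here (two slots in, two slots out), which is why the caller has to know each op's parameter and result counts in advance.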

Dave Hein · 2017-05-09 11:40

I sort of did something like this in my SpinLMM object for the P1. SpinLMM uses the single unused bytecode in the P1 Spin interpreter to jump to an LMM interpreter loop. SpinLMM patches the Spin interpreter at run-time to provide an LMM loop. It moves the code for some of the lesser used bytecodes out to hub RAM, and executes them from the LMM interpreter.

The P2 has a hub exec mode, so it won't need an LMM interpreter. Also, I believe Chip is including the capability to do hub execution in the P2 Spin interpreter. All that is needed is a single bytecode to do this. You just have to provide an address in hub RAM to jump to. The code in hub RAM will need to understand the Spin calling convention to be able to get parameters off of the stack and return a result. It should be very straightforward once the calling method is documented.
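The single-bytecode approach could look something like the toy below: one opcode carries a hub address as its operand, and the interpreter hands control to whatever code lives there, which must itself pop its parameters and push its result. Hub RAM is modelled as a Python dict from address to function; the opcode values, the 4-byte little-endian address operand and the address 0x1000 are assumptions for illustration, not the P2 Spin interpreter's actual encoding.

```python
PUSH_BYTE = 0x01  # hypothetical: push the next byte
HUB_CALL = 0xFF   # hypothetical: call code at the 4-byte hub address that follows

def run(code, hub):
    """Interpret `code` (bytes); `hub` maps hub addresses to callables."""
    stack, pc = [], 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if op == PUSH_BYTE:
            stack.append(code[pc])
            pc += 1
        elif op == HUB_CALL:
            addr = int.from_bytes(code[pc:pc + 4], "little")
            pc += 4
            hub[addr](stack)  # hub code must follow the calling convention
        else:
            raise ValueError(f"unknown opcode {op:#04x}")
    return stack

def hub_multiply(stack):
    """Stand-in for PASM2 in hub RAM: 2 params -> 1 result."""
    b = stack.pop()
    a = stack.pop()
    stack.append(a * b)

hub = {0x1000: hub_multiply}
program = bytes([PUSH_BYTE, 6, PUSH_BYTE, 7, HUB_CALL]) + (0x1000).to_bytes(4, "little")
print(run(program, hub))  # [42]
```

Compared with per-procedure opcodes, this uses only one bytecode value but pays for the address operand at every call site.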

Conga · 2017-05-11 10:32

Thanks Dave,

What you describe seems to be an interconnect mechanism, required for that purpose.

I was thinking of an optimization, entirely optional.
The idea was to have the unused bytecode values execute user-defined PASM2.

I was exploring the implications; the way it could be used seems to result in a heavyweight setup:
the programmer must specify a static mapping like a linker script / deployment configuration.

Therefore I said I'm not sure that this is the intended style of P2 (in general) and especially for Spin2 programming.

I may be wrong on all points: what it requires, whether it's worth doing, whether it's a desired/accepted style, etc.
