TrimSpin
Dave Hein
I have been trying out an idea for speeding up the Spin interpreter by supporting only a subset of the Spin bytecodes. This allows the interpreter to be optimized for a smaller number of bytecodes. The attached file contains a program that demonstrates the results I have so far. The TrimSpin interpreter executes random code almost twice as fast as the standard interpreter. Some functions, such as BYTEMOVE or the multiply operator, run only slightly faster since they use code similar to the standard interpreter's. The STRSIZE instruction runs about 2.5 times faster because it uses a dedicated routine instead of sharing code with STRCOMP.
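For anyone who wants to reproduce the comparison, a timing loop along these lines is all that is needed; the method and string names below are placeholders of my own and are not taken from the attached program. The loop simply measures repeated STRSIZE calls with the system counter under whichever interpreter is loaded.

CON
  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

DAT
  msg   byte  "Hello, Propeller!", 0

PUB time_strsize | t, n
  ' Measure 1000 STRSIZE calls using the system counter (CNT).
  ' The same source is timed under the standard interpreter and TrimSpin.
  t := cnt
  repeat 1000
    n := strsize(@msg)
  result := cnt - t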
My motivation in doing this is mostly to see if it's possible to develop a virtual machine on the Prop that runs faster than the current Spin VM and is as compact, or even more compact. This might be useful for improving the performance of Spin, or maybe it could be used for C.
Attachment: zip (27K)
Comments
Dave, I would be interested in finding out which bytecodes were eliminated, though, and what might have to be changed in some Spin programs to accommodate them.
Thanks,
-Phil
The following features were eliminated and must be avoided in Spin programs that are executed by TrimSpin (a short sketch of the pointer workarounds follows the list):

VAR variables cannot be accessed directly, but they can be accessed through pointers, such as BYTE[@var_variable].

Only the first 8 local stack variables can be accessed directly. However, all local variables can be accessed through pointers, such as WORD[@loc_variable].

Indexed access can only be used with the BYTE, WORD and LONG operators, such as WORD[ptr][index].

The operator-assignment operators can only be used on long variables, such as x++ and x *= 10. They cannot be used on byte or word variables.

The loop index used with the repeat-from-to instruction must be a long variable.

An array of objects is not supported.

Registers can only be accessed as longs. Bit modes are not supported for registers, and registers cannot be used with extended operators, such as ++ and |=.

BYTEMOVE does not handle the case where the source and destination buffers overlap and the source address is greater than the destination address.
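As a rough sketch of the pointer workarounds described above (the variable and method names are invented for illustration and are not part of TrimSpin itself):

VAR
  byte  buf[10]
  word  count

PUB demo | i, sum
  ' VAR variables are not read or written by name; they are accessed
  ' through their addresses with the BYTE/WORD/LONG operators.
  WORD[@count] := WORD[@count] + 1

  ' Indexed access is allowed with the BYTE, WORD and LONG operators.
  sum := 0
  repeat i from 0 to 9
    sum := sum + BYTE[@buf][i]

  ' Operator-assignment is fine on long locals such as "sum",
  ' but not on byte or word variables.
  sum *= 2
  result := sum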
A lot of what was eliminated could be restored by a Spin preprocessor that would convert unsupported Spin expressions to something that is supported. But some exclusions present insurmountable difficulties, especially COGNEW -- hard to live without that one!
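As a hypothetical illustration of the kind of source-to-source rewrite such a preprocessor might perform (the names are invented, and whether TrimSpin accepts exactly this output is an assumption on my part):

VAR
  word  count

PUB tick_original
  ' Original form: operator-assignment on a word VAR variable.
  ' TrimSpin supports neither the direct VAR access nor += on a word.
  count += 1

PUB tick_rewritten
  ' One possible preprocessor output: the same update expressed through
  ' a pointer and a plain assignment, which the restrictions above allow.
  WORD[@count] := WORD[@count] + 1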
-Phil
If you are going to do the housekeeping to construct and transport a 'NewSpin', then a smarter approach, I would suggest, would be to 'smarten up' the Spin compiler so that it creates both a NewSpin image and the user bytecodes.
This would be multipass and optimising, both very cheap on a PC.
e.g. if your code never uses a particular Spin bytecode, its handler could be removed and a new, smaller custom NewSpin built. That could then free room to allow some (small) in-line user functions (effectively user bytecodes).
This would be a simple 'scan and trim' algorithm, but it would run multi-pass until two passes gave the same size.
It could also report how many times each bytecode resource was called, to show the user possible candidates for further trimming.
Also, a user could know they will use most of the Spin byte codes, but prefer 'small' on some, in order to allow 'fast' on others.
The idea is that you never get a result that is incompatible with present Spin, but you can craft programs that would be much faster, each using a project-specific subset of Spin, and you can free up space for user bytecodes and/or faster (but larger) NewSpin bytecodes.
The ideal is to always have a ~100% full COG, but filled with truly useful stuff.
This is an interesting thought: optimizing the interpreter for the program being loaded. We have to load a Spin interpreter on the Prop2 anyway - so why not integrate a custom one? Not an easy task, I guess, but for sure worth a try.
A multi-stage loader like Catalina has will be useful anyway, to save/recover the hub space used to load any Spin interpreter...
Enjoy!
Mike
and somehow tell you the less-used codes, so you can optimize your Spin to reduce the total number ...
cool.
Enjoy!
Mike
The regular Spin interpreter is 2K. Does TrimSpin free up any memory?