william chan said...
Why is toggling certain IO pins (P0 and P1) faster than toggling other IO pins? It shouldn't be the case.
I believe that is down to the way the numbers which specify the pin are encoded in the bytecode.
Constants 0 and 1 ( plus -1 ) are encoded as a single byte; all others are encoded as two or more bytes. This explains why pins 0 and 1 are fastest: one fewer bytecode fetch, so one fewer hub access.
Single-bit numbers, and numbers with all their low-order bits set, are encoded as two bytes to save code space ( eg, $8000_0000 and $7FFF_FFFF are two-byte constants ), but this adds a little extra decoding time over non-encoded numbers held in two bytes. So, for constant-numbered pins:
0 and 1 - Single byte, fastest
2, 3, 4 etc - Two byte encoded, slowest
5 and 6 etc - Two byte non-encoded, second fastest
The interpreter is very efficiently coded and reasonably few PASM instructions need to be executed for each bytecode. Just a few extra instructions to handle each case add little time to each, but proportionally that can become significant.
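A rough way to see this effect is to time the same operation on three constant pin numbers. The following is only a sketch under stated assumptions, not something taken from the thread: the method name Measure and the variables e0, e4 and e5 are hypothetical, and each reading also includes the fixed cost of reading cnt and storing the result, so only the differences between the three readings are meaningful.

```spin
VAR
  long  e0, e4, e5              ' hypothetical: elapsed system clocks for pins 0, 4 and 5

PUB Measure | t
  dira[0]~~                     ' make the three test pins outputs
  dira[4]~~
  dira[5]~~

  t := cnt
  !outa[0]                      ' pin 0: single-byte constant
  e0 := cnt - t

  t := cnt
  !outa[4]                      ' pin 4: two-byte encoded constant
  e4 := cnt - t

  t := cnt
  !outa[5]                      ' pin 5: two-byte non-encoded constant
  e5 := cnt - t
```

The results are in system clock ticks and can be printed through whatever serial or display object is already in use.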
william chan said...
You need a "compiler" to make changes to the SPIN interpreter?
It's not so much changing the interpreter as adding to it. The interpreter code could perhaps be re-ordered to optimise references to Special Purpose Registers (SPR) but that could then have a detrimental impact on other execution timing.
The Post Clear and Post Set Operators may not be most efficient in the particular case identified here but they are efficient in other cases and these operators are 'shared code' so any change is complicated.
Alternatively, new bytecodes can be added to allow SPR accesses to be faster, but only an altered compiler would ever generate those bytecodes. There is almost no free space in the interpreter which makes this awkward but it could be done if more severe slowing of execution were accepted for less frequently used bytecodes.
Another alternative would be to alter the compiler and leave the interpreter alone. If the compiler never generated encoded number tokens where a constant is used as a pin index, execution would be faster for those pins which are currently slowest to access. Likewise, the compiler could generate a fast assignment rather than a Post Clear or Post Set opcode. Unfortunately the compiler source code is in the hands of Parallax, so only they can update it; any other compiler has to be written by the user community from first principles.
In the fullness of time, third-party compilers will emerge, along with optimisation, but there's no way to say when that will be.
Altering the interpreter is a much more specialised task: it means understanding how it works, what changes are needed and what impact those changes will have. It's a lot of work to solve one person's problem where that change may be considered detrimental by many others. Even if the interpreter were to change there's no way of predicting how long that might take.
Using Spin means accepting the implicit inefficiencies of interpreted bytecode and being at the mercy of how the Spin Interpreter is coded. In the great majority of cases it is an acceptable trade-off between those issues and the ease of use of a higher level language.
Where the inefficiencies are unacceptable the simplest and quickest way forward is to code part of the program in Assembler. The effort involved will be considerably less than that needed from others to change the interpreter or rewrite compilers.
Did you take a look at the assembly example in the manual on page 340?
I started with this example and changed it step by step to test and learn other commands.
If you try to write down the whole bit-banging for your ADC routine at once, I would expect it will not work, because Prop assembly is different from other assembly languages.
Sometimes you have to change only one effect flag or a condition.
For a non-infinite loop you can use the command DJNZ.
For debug purposes I used things like this:
              mov       LoopCntr, #4            'Number of toggles
              mov       _Time, cnt              'Calculate delay time
              add       _Time, #9               'Set minimum delay here
'just a test loop making LEDs flash for saying i'm running this part
testloop      waitcnt   _Time, _Delay2          'Wait
              xor       outa, _Pin              'Toggle Pin
              djnz      LoopCntr, #testloop     'Repeat until LoopCntr reaches zero
best regards
Stefan
Post Edited (StefanL38) : 3/13/2008 9:25:41 PM GMT
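The fragment above relies on symbols (_Pin, _Delay2, _Time and LoopCntr) that have to be declared elsewhere in the same DAT block. A minimal sketch of a complete program built around it might look like the following. The pin (P16), the delay value, the Main method and the FlashTest and done labels are all assumptions made for illustration, not values from the post.

```spin
PUB Main
  cognew(@FlashTest, 0)                         ' run the PASM routine in a new cog

DAT
              org       0
FlashTest     mov       dira, _Pin              ' make the LED pin an output
              mov       LoopCntr, #4            ' number of toggles
              mov       _Time, cnt              ' read the system counter
              add       _Time, #9               ' minimum delay before the first wait
testloop      waitcnt   _Time, _Delay2          ' wait, then add _Delay2 to _Time
              xor       outa, _Pin              ' toggle the pin
              djnz      LoopCntr, #testloop     ' repeat until LoopCntr reaches zero
done          jmp       #done                   ' park the cog when finished

_Pin          long      |< 16                   ' assumed: LED on P16
_Delay2       long      6_000_000               ' assumed: clocks between toggles
_Time         res       1                       ' wake-up time workspace
LoopCntr      res       1                       ' loop counter workspace
```

The initialised longs come before the res lines because res only reserves cog registers and emits no data, so anything declared after it would no longer line up with what is loaded into the cog.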
Comments
I don't know how to share variables between assembly and spin get_xxx routines yet.
Where does _Delay2 come from?
Look at my CogArrays example in the obex. http://obex.parallax.com/objects/268/
It shows one way of reading/writing to and from the hub. Look also at deSilva's excellent tutorial!
J
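One common way to share a variable between Spin and assembly is to pass its hub address through the PAR register when the cog is launched. The sketch below is not taken from the CogArrays object or from deSilva's tutorial; it is a minimal illustration, and the names shared, Start, get_value, entry, addr and value are all hypothetical.

```spin
VAR
  long  shared                                  ' hub variable visible to both sides

PUB Start
  cognew(@entry, @shared)                       ' hub address of shared arrives in PAR

PUB get_value
  return shared                                 ' the Spin side simply reads the variable

DAT
              org       0
entry         mov       addr, par               ' hub address passed by cognew
:loop         rdlong    value, addr             ' fetch the current value from hub RAM
              add       value, #1               ' do some work on it
              wrlong    value, addr             ' publish the result back to hub RAM
              jmp       #:loop

addr          res       1                       ' hub address of shared
value         res       1                       ' cog-local working copy
```

Here the assembly cog keeps incrementing the hub variable, so repeated calls to get_value from Spin return a rising count. For several variables, the usual approach is to declare them as consecutive longs and add 4 to the address for each one.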