Either you have to add types to Spin, which makes the whole language a lot more complex, or you have to add weird float operators, which is an ugly kludge.
Spin already has types; Chip supports the variable types byte/word/long/register now.
The discussion is around adding more types to those that exist now.
It is not a none-to-some problem (I agree, that would be complex); it is a 4-to-5 problem (somewhat less complex).
So far I have not needed floats with the Propeller, but that may be due to the type and small number of projects I have done, so I am more or less neutral on the idea. I do think that not having floats in Spin while they are available in C will be a disincentive for anyone not already familiar with Spin to learn it.
The fact that F32 exists proves users do need floats.
You are right that not only will the lack of float make users avoid Spin, it may also give the impression the P2 is lousy at float.
There are many sensor & T&M designs where the P2 could be expected to report float values to an upstream PC.
No one is saying the P2 needs to do all calculations in float, but the ability to report in float is useful.
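To make that concrete, here is a minimal C sketch of such a reporting path; the units, scaling, and names below are invented for illustration and are not from any Propeller library.

    #include <stdio.h>
    #include <stdint.h>

    /* Hypothetical reporting path: a cog has measured a voltage as an
       integer number of microvolts (fixed-point). It is converted to a
       float once, at the reporting boundary, and sent as text upstream. */
    static float microvolts_to_volts(int32_t uv)
    {
        return (float)uv * 1.0e-6f;   /* the only float operation needed */
    }

    int main(void)
    {
        int32_t reading_uv = 3141593;                 /* placeholder sensor value */
        float volts = microvolts_to_volts(reading_uv);
        printf("V=%.6f\r\n", volts);                  /* a line a PC-side tool can parse */
        return 0;
    }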
Is it really true that Spin already has types? I know that Spin can store bytes, words, and longs, but as far as I know all expressions are evaluated in terms of longs. The other values are promoted to longs before any operation is performed on them. Is that wrong?
I'm just going by what Chip said earlier in this thread: "This will take some thought, since we don't yet have variable types beyond byte/word/long/register."
His code seems to load/store each of byte/word/long, so it must track when each of those is needed; register, I think, is a means to choose between HUB and COG placement for VARs.
Sure you can define bytes, words and longs in memory. You can even use floating point literals.
But when it comes to it:
PUB doSomething (x, y, z)
...
doSomething has no idea of the types of x, y, z.
Yes, I like it that way personally. It's lean, easy to write in, and yes, it requires one to think sometimes, but doing that is like an inversion of the more significant thought and planning needed with stronger types.
I prefer to just test.
Either way gets us working, robust code.
And I want the lean, wild SPIN to exist. It doesn't need much fixing. It could use a couple of features. I'm not convinced floats are one of them.
Not now. Maybe later.
For Spin on P1, local variables, parameters, and the return value of functions are all LONGs.
Operations, except those explicitly calling out bytes or words, are done as LONGs. If you do a basic math operation with something that isn't a long, it will be implicitly promoted to LONG for the duration of the operation, and then reduced to the destination size for any assignment.
Spin really doesn't have types; it is just able to read and write bytes and words along with LONGs, and it has a couple of things designed to work with bytes or words (like the sign extension stuff, which has now been generalized for any word size).
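The same promote-then-truncate behaviour, sketched in C purely as an analogy (the variable names and values are made up):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint8_t b = 200;                 /* stand-in for a Spin BYTE variable */
        int32_t k = 1000;

        /* The operation happens at 32-bit width, like Spin promoting to LONG... */
        int32_t wide = (int32_t)b * k;   /* 200000 */

        /* ...and assigning back to the narrow variable truncates to its size. */
        b = (uint8_t)wide;               /* 200000 mod 256 = 64 */

        printf("wide=%ld b=%u\n", (long)wide, (unsigned)b);
        return 0;
    }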
I know P2 has better stuff. I'm just thinking that adding FLOAT as a type to Spin will add some complication that no one here has really thought all the way through. I think it's better to leave that until later. Since Spin is not in ROM, it can evolve, and someone can make a version that does have floats, but they will also need to make a compiler for it.
A path I would rather think about is making it possible to extend Spin with user "stuff" that would allow a better mechanism for handling FLOATs than what we have on P1 with the F32/etc. stuff.
Perhaps with STRUCTs, we could have a mechanism to provide "handlers" for math operations done on STRUCTs?
Maybe that is too much of a mess, but it's something to think about.
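If such handlers existed, they might look something like this C sketch; the type, the handler table, and all function names here are invented for illustration, and nothing like this exists in Spin today.

    #include <stdio.h>
    #include <stdint.h>

    /* Invented example: a user value type plus a table of handler functions
       that a compiler could dispatch math operators to. */
    typedef struct { uint32_t bits; } f32_t;              /* raw IEEE-754 payload */

    typedef struct {
        f32_t (*add)(f32_t a, f32_t b);
        f32_t (*mul)(f32_t a, f32_t b);
    } f32_ops_t;

    static f32_t f32_add(f32_t a, f32_t b)
    {
        union { uint32_t u; float f; } ua = { a.bits }, ub = { b.bits }, r;
        r.f = ua.f + ub.f;            /* could instead hand off to an F32-style cog */
        return (f32_t){ r.u };
    }

    static f32_t f32_mul(f32_t a, f32_t b)
    {
        union { uint32_t u; float f; } ua = { a.bits }, ub = { b.bits }, r;
        r.f = ua.f * ub.f;
        return (f32_t){ r.u };
    }

    static const f32_ops_t f32_ops = { f32_add, f32_mul };

    int main(void)
    {
        f32_t x = { 0x3F800000u }, y = { 0x40000000u };   /* 1.0 and 2.0 */
        f32_t z = f32_ops.mul(x, y);                      /* "x * y" would dispatch here */
        printf("0x%08X\n", (unsigned)z.bits);             /* 0x40000000 = 2.0 */
        return 0;
    }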
This is the funny thing about not having the Spin interpreter in ROM.
@Roy said it can evolve.
I think about it as mutating, but basically it's the same thing.
Just imagine the P1 had 64K of RAM and all those sin/font tables just got loaded when needed. Most Spin programs do not use them, so that address space is wasted.
On the P2 we will now need to load the interpreter every time we start a Spin cog. So calling COGNEW on a Spin method like on the P1 will require that we have an image of the interpreter in HUB (or external flash?) all the time, or that the currently running Spin cog can fork itself into a new COG.
Every SPIN2 program will need to include the bytecode engine it uses. The same might go for sub-objects, needing Spin2 V1.23 and not Spin2 V3.21.
So every SPIN2 program or sub-module will have to have an OBJ declaration stating something like system := "SPIN2-EngineV1.23",
to have access to the interpreter binary (and source, for the programmer!) to start a new Spin cog.
And if someone needs to optimize some Spin2 code he CAN reuse a bytecode he doesn't need for his goal, so it WILL happen.
But it does not matter; we need to load the SPIN2 engine anyway.
I am begging again: do not overlook the LINKING of P2 binaries.
A SPIN2 program is an executable PASM2 interpreter and a DATA segment for the SPIN2 bytecode.
A C program in CMM will have a PASM2 interpreter and compacted bytecode, like SPIN2 does, or will just run PASM2 from HUB.
Now is the time to take care of the interoperability between SPIN2 and PropGCC. It might be helpful for other languages too.
It would be very nice if SPIN2 could declare a function/method as accessible from the outside and provide a PASM2-linkable address/symbol for the linker, so it will be possible to call this public Spin function/method from other cogs.
The same goes the other way around: SPIN2 needs to be able to call some external function defined by an address/symbol, so it will be possible to call a public PropGCC function from other cogs.
This 'from other cogs' part is important and needs to be included in SPIN2; the same goes for PropGCC.
Now is the time to define a multiprocessor call mechanism allowing any COG to either run (HUBEXEC) the code directly or have some stub in HUB calling and waiting for the other byte-engine/COG to respond.
PLEASE get support for that vision into SPIN2 and PropGCC right from the start.
Mike
One thing to consider is that both Spin and C will most likely have dedicated code and registers residing in cog memory. When Spin bytecodes are used, an interpreter will run from cog memory. A C program will have certain key routines executing from cog memory, such as memcpy, strlen, and strcmp, plus the 16-odd registers that it uses. CMM C code will need an interpreter similar to the Spin bytecode interpreter. So C and Spin code will almost certainly have to run in different cogs. This means that a C program will not be able to directly call a Spin method, and vice versa. So a mailbox along with a polling loop would be needed to make that happen.
Of course, it's possible to merge the Spin and C interpreters into one cog image, but that would probably reduce the efficiency of the two interpreters.
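A rough sketch of that mailbox-plus-polling idea, in plain C; the struct layout and names are invented and are not an actual Spin2 or PropGCC interface. In a real system the two routines below would run on different cogs.

    #include <stdint.h>

    /* Invented mailbox living in hub RAM, shared by a C cog and a Spin cog.
       The caller publishes a request and polls; the serving cog polls for
       work, runs the method, and clears the command word. */
    typedef struct {
        volatile uint32_t cmd;      /* 0 = idle, otherwise a method id */
        volatile uint32_t arg[2];
        volatile uint32_t result;
    } mailbox_t;

    static mailbox_t mbox;          /* would sit at a hub address both sides agree on */

    /* Caller side: e.g. C code asking the Spin-interpreter cog to run a method. */
    uint32_t call_across(uint32_t method_id, uint32_t a0, uint32_t a1)
    {
        mbox.arg[0] = a0;
        mbox.arg[1] = a1;
        mbox.cmd = method_id;        /* publish the request last */
        while (mbox.cmd != 0)        /* poll until the serving cog clears it */
            ;
        return mbox.result;
    }

    /* Serving side: the other cog's idle loop would poll like this. */
    void serve_one_request(void)
    {
        while (mbox.cmd == 0)
            ;                        /* wait for work */
        mbox.result = mbox.arg[0] + mbox.arg[1];   /* stand-in for dispatching the method */
        mbox.cmd = 0;                /* signal completion */
    }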
GCC already has a linker, so adding linking of P2 binaries would require generating a compatible object format, without absolute code locations.
I'm not sure how the GCC linker copes with mapping to 8 or 16 COGs/LUT; has that output issue been solved?
I would hope that most P2 programs will be compiled to native PASM and not use either the Spin byte code VM or the CMM VM. We have quite a bit more memory now. It seems as though P2 won't be thought of very well if we're still stuck using interpretive code most of the time like we had to on P1. It will probably be easier to get C native and Spin native (through fastspin) to interoperate than code that uses the VMs. That should be the normal way of using P2. I would think using interpreters would only be done for programs that need to cram a lot of code and/or data into hub memory and that seems like it will be a minority of P2 programs.
I beg to differ. The Prop tends to have one large, non-timing-critical piece of code, supported by other cogs that are usually time-critical. This was, and likely still will be, OK to run via a quick interpreter. It conserves hub space. While the first P2 is expected to have 512KB of hub, a number of variants have been put on the table, all with less hub space. Even 512KB is no longer considered large.
As soon as you add color video into the mix, 512KB is not that big at all.
Both scenarios are likely, and so long as users can choose which mix of Byte-Code and Native-Code is right for their app, the final % of one or the other does not really matter.
Speed will demand Native-Code, Size will demand Byte-Code, and external memory code storage for on-demand loading or XIP will also be required.
Seems P2 has good support for both Native-Code & Byte-Code, but I've yet to see numbers for XIP/on-demand code loading tests.
Those will be interesting...
There should really be one of those, and a lean SPIN helps with that.
Can't wait to see XBYTE perform.
Tomorrow, I'll add the XBYTE stuff in and execute some bytecodes on I/O pins to see how the speed compares to Prop1. With this new setup of XBYTE, which includes EXECF, there's about zero clock cycles wasted on anything. It's all business. I'm anxious to see what it does.
Looking forward to reading the results. Seems like a metric tonne of progress has been made on the interpreter and language definition. I realize that the float issue is still up in the air, but lots of other things have fallen into place (though not without a lot of mental jostling on Chip's part, of course).
When I got to the variable modifiers, I had to rethink how things work, in order to keep the user bytecode small. I think I got it nailed, so maybe later today I'll have a simple bytecode program running.
Do you have a definition of the byte code instruction set you can post?
It's coming together. One big development was realizing how to best partition variable reads, writes, and math operations. I keep these things separated now:
- variable setup, with address, type (byte/word/long/cog/lut), and size mask
- shortcuts for local variables (0..15) setup, read, and write
- three different bytecodes for each math operator: regular op, variable assignment with pop, variable assignment without pop
"i *= 7", where "i" is a local variable, would require three bytecodes:
constant_7
setup_local (i)
assign_mul_with_pop
"i++" inside an expression would be two bytecodes:
It's going to take me another day to get these variable and math operators all tidied up.
Does "setup_local" basically push the address of the local variable on the stack?
No. It puts the variable data into some registers. One of those registers contains the read instruction and one contains the write instruction. So, to read the variable, for example, you just do:
ALTI rd
NOP
That will execute the read instruction in lieu of the NOP.
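To make the setup/operate/assign split concrete, here is a toy C model of such a bytecode loop. The opcode names, the operand encoding, and the reading of "with pop" (consume the operand, leave nothing on the expression stack) are guesses for illustration only, not Chip's actual interpreter.

    #include <stdio.h>
    #include <stdint.h>

    /* Toy model only: locals live in a small table, a "setup" step latches
       which variable later assignments touch, and assignment ops pop their
       operand from an expression stack. */
    enum { CONSTANT_7, CONSTANT_3, SETUP_LOCAL, READ_LOCAL, ADD,
           ASSIGN_MUL_WITH_POP, WRITE_LOCAL_WITH_POP, END };

    int main(void)
    {
        int32_t locals[16] = { [0] = 5, [1] = 4 };   /* i = locals[0], y = locals[1], x = locals[2] */
        int32_t stack[8];
        int sp = 0, cur = 0;

        /* "i *= 7" followed by "x := y + 3" */
        const uint8_t code[] = { CONSTANT_7, SETUP_LOCAL, 0, ASSIGN_MUL_WITH_POP,
                                 READ_LOCAL, 1, CONSTANT_3, ADD, WRITE_LOCAL_WITH_POP, 2, END };

        for (const uint8_t *pc = code; *pc != END; pc++) {
            switch (*pc) {
            case CONSTANT_7:           stack[sp++] = 7; break;
            case CONSTANT_3:           stack[sp++] = 3; break;
            case SETUP_LOCAL:          cur = *++pc; break;                 /* latch the target variable */
            case READ_LOCAL:           stack[sp++] = locals[*++pc]; break;
            case ADD:                  sp--; stack[sp - 1] += stack[sp]; break;
            case ASSIGN_MUL_WITH_POP:  locals[cur] *= stack[--sp]; break;  /* result not left on stack */
            case WRITE_LOCAL_WITH_POP: locals[*++pc] = stack[--sp]; break;
            }
        }
        printf("i=%ld x=%ld\n", (long)locals[0], (long)locals[2]);   /* i=35 x=7 */
        return 0;
    }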
I'm not following this topic at all but I get the impression there is quite a lot of new instructions ... yes, no?
Depends on your values of 'new' since 'when'?
Chip has not added much since the 'Updated 10 April 2017' release, and that tuned some details...
From the 1st post in the FPGA files thread (older news below; newest features are at the top):
* New custom bytecode executor with 6-clock overhead (see xbyte.spin2 in zip file)
* SKIPF now behaves like SKIP during hub-exec
* PRNG upgraded to Xoroshiro128+ (accessed via GETRND, was 32-bit LFSR)
* New _RET_ instruction prefix for automatic RETurn, adds 2 clocks in cog exec mode.
* New SKIP/SKIPF instructions for bitmask-based instruction skipping.
* New EXECF instruction for branching plus fast skipping in cog memory.
* Additional ALTxx instructions for reading and writing nibbles, bytes, and words within cog registers.
* Single-stepping/interrupts around REP blocks made consistent between cog and hub execution modes
* 'FLTxx D/#' instructions clear DIR bit and affect OUT bit, read IN bit into C
* 'DRVxx D/#' instructions set DIR bit and affect OUT bit, read IN bit into C
* 2-clock RDPIN/WRPIN/WXPIN/WYPIN with automatic acknowledge
* 2-clock RQPIN ('read quiet') like RDPIN without acknowledge, allows concurrent reading
* Improved booter ROM now runs at 2M baud, thanks to Jmg's ongoing efforts
* ALTB added to facilitate accessing multi-register bit fields (SETBYTS removed)
* Event jumps added: JINT/JNINT/JCT1/JNCT1...
* SETPEQ/SETPNE replaced with SETPAT, C flag picks INA/INB, Z flags picks equal/not-equal
* Improved booter ROM, now supports 3-pin SPI and half-duplex serial
* ADRA/ADRB renamed to PA/PB
* New 'CALLPA/CALLPB D/#,S/#' instructions write D/# to PA/PB and call S/#
* 4 selectable events for pins, locks, and LUT r/w's
* Direct pin DIR instructions: DIRL/DIRH/DIRC/DIRNC/DIRZ/DIRNZ/DIRN D/#
* Direct pin OUT instructions: OUTL/OUTH/OUTC/OUTNC/OUTZ/OUTNZ/OUTN D/#
* Direct pin IN instructions: TESTIN/TESTNIN D/#
* Increment-test jumps: IJZ/IJNZ/IJS/IJNS D,S/#rel9
* Interrupt-triggering instructions: TRGINT1/TRGINT2/TRGINT3
* Support for Prop123-A7 boards added after Tubular fixed PLL problem
* Support for the BeMicro CV A9 board was added
* Hub/eggbeater can now be 16, 8, 4, 2, or 1 slice of cog and hub RAM
* Fewer slices means lower latency
* Cogs' FIFO's are reduced to match slices now, saving logic
* FPGA images are optimized for their number of slices (not all 16, anymore)
* The Verilog source code is now capable of making any sub-version of Prop2
These are Spin bytecodes we're talking about.
Evanh, it might be a good idea to summarize, in a new post in the PRNG thread, the final findings on the xoroshiro32+, where you show the C code with those optimal ROTL and SHL values. Maybe put the word "implementation" in the post somewhere, so that future Google searches for "xoroshiro32 implementation" will land there. I'm sure that work will get used by people if they can stumble into it. That's just the solution many people need, as it works easily in 16 or 32-bit architectures.
I'll try out my documentation skills.
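For readers who land here, the general shape of a xoroshiro32+ generator in C is sketched below; the rotation and shift constants are placeholders for illustration, not the tuned values from the PRNG thread.

    #include <stdint.h>
    #include <stdio.h>

    /* Generic xoroshiro32+ skeleton: two 16-bit state words, 16-bit output.
       A, B, C are placeholder constants; the thread's tuned values would go here. */
    #define A 13
    #define B 5
    #define C 10

    static uint16_t s0 = 1, s1 = 2;      /* state must not be all zero */

    static uint16_t rotl16(uint16_t x, int k)
    {
        return (uint16_t)((x << k) | (x >> (16 - k)));
    }

    static uint16_t xoroshiro32plus(void)
    {
        uint16_t result = (uint16_t)(s0 + s1);    /* the '+' output function */
        s1 ^= s0;
        s0 = (uint16_t)(rotl16(s0, A) ^ s1 ^ (uint16_t)(s1 << B));
        s1 = rotl16(s1, C);
        return result;
    }

    int main(void)
    {
        for (int i = 0; i < 4; i++)
            printf("%04X\n", xoroshiro32plus());
        return 0;
    }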
Ah, it's all just sorting out the ROM then I gather.
Not quite: unlike the P1, Spin2 does not reside in ROM; the P2 ROM is for the loader only.
It is proving out the P2 engine in a complex use case, and opening up other byte-engine uses.