Just a thought. Can all this discussion about enhancements to the Spin language wait until after the chip goes into synthesis? Any time spent on making Spin look more like C++ is just delaying the date when we can all get our hands on a P2 chip. Our efforts may be better spent on testing the latest and "final" FPGA image.
An implied library which is an object, but does not need the object.method() syntax, just the method() syntax. If any method uses an unknown keyword, the implied object's methods get checked for a name match. Any object that uses any of those methods will have that implied object included, which is just a 2-long cost. The top-level file can specify this implied object. That way, Spin can get extended without any tool changes.
This is almost exactly how fastspin implements most of the standard Spin functions like stringlen, waitcnt, lockclr, etc; there is a "system" object that's automatically included by the compiler and has definitions for all of those. When it sees a function call without an explicit object it looks first in the current object's methods, then in the system object. The optimizer removes all the unused methods, so most of the system functions don't actually have to be placed in the final output (and many of them are small and get inlined anyway).
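The two-level lookup described here can be sketched in a few lines of Python. This is a hedged illustration of the resolution order only (current object first, then the implied/system object); the name `resolve_call` is hypothetical, and fastspin's actual implementation is of course its own code:

```python
# Hypothetical model of the lookup order: check the current object's
# methods first, then fall back to the implied "system" object.
def resolve_call(name, current_obj_methods, system_methods):
    if name in current_obj_methods:
        return current_obj_methods[name]   # user method wins
    if name in system_methods:
        return system_methods[name]        # implied-object fallback
    raise NameError("unknown method: " + name)
```

A user-defined method shadows a same-named system method, which is what makes the scheme safe to extend without tool changes.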
My understanding is that these things are happening in parallel. One is not dependent on the other. The silicon (PASM) is done and is being synthesized while Chip-and-the-forum work on building the SPIN language and compiler/interpreter using the "locked in" PASM opcodes.
Chip also mentioned some layout changes that were needed still, I assume that is with the frame that will contain the synthesized verilog stuff.
Chip isn't going to send the stuff to synthesis until he is confident in the vetting of the FPGA version, and part of that for him is getting Spin2 up and running.
That's right.
We're going to have to spend a lot of money to get through synthesis and fabrication. When we pull that trigger, we want to know, more than we do now, that things are in proper order. Having Spin working will get the confidence level up to where it needs to be.
This is the path Chip wants to take, why do you keep saying to stop doing what Chip wants?
The reason I'm asking Chip to stop doing what he wants is a selfish one on my part. I want to see the P2 as soon as possible and I'm not interested in seeing an enhanced Spin running under a bytecode interpreter -- at least not until the P2 development goes into synthesis and fab. I think then Chip will have lots of time on his hands to work on enhancing Spin. In the meantime, I think Chip can test how well the P2 works with the Spin bytecode interpreter by just porting the P1 interpreter and optimizing it for the P2. I really don't understand why Chip is going down the New Spin tangent at this time.
I seriously doubt that synthesis is happening at this time. I think there is a lot of work to verify the latest FPGA before that can happen. Also, Chip wants to verify that the Spin bytecode interpreter works efficiently with the latest design. As I said many times, the bytecode interpreter can be tested using a P1-like Spin interpreter. All the work on enhancing the Spin syntax and functionality is happening right in the middle of the P2 design critical path, and is pushing out the P2 delivery date day-by-day for every day that New Spin is being pursued. Please correct me if I'm wrong.
My understanding is that the Verilog for the P2 is not fixed yet.
Chip just stated in the FPGA thread that the Verilog was DONE. Heater, didn't you get a piece of the cake from the celebration? Chip did add that the booter code may get some minor changes.
Yes, so we should all be testing out the latest FPGA image with all the stuff we've run before to make sure there aren't any problems. I agree on testing with Spin2, but not with all the bells and whistles that people have been kicking around.
I agree, but it seems like the basic P1 Spin functionality plus P2-specific features should be sufficient at this time. Many of the new features that have been proposed are enhancements to the language, but do not change the bytecode interpreter. These features can be added later on after the Verilog has been tested.
BTW, I ported the P1 bytecode interpreter to the P2 back in November of 2015. Maybe this can be optimized and used to test out bytecode processing on the P2.
The code is contained in the p1spin thread at http://forums.parallax.com/discussion/162858/p1spin . Actually, "porting" is probably not the right term in this case. I basically wrote PASM code to implement each of the Spin bytecodes, and then used a jump table to execute each fetched bytecode. At one time I did attempt to port the interpreter contained in the ROM code, but I gave up because it relies heavily on self-modifying code specific to the P1.
It's doubtful that the November 2015 code will run on the latest FPGA image. However, maybe someone would like to give it a try. I am away from my FPGA board currently, but I'll try to get it to run next week when I get home.
EDIT: I found an earlier thread from March 2014 -- 3 years ago! -- when I first wrote p1spin for the P2. It's located at http://forums.parallax.com/discussion/154460/p1spin . Boy, time flies when you're getting old having fun.
In starting the math stack operations for the Spin2 interpreter, I realized that it was taking three instructions to do a cog-register-stack pop:
sub stkptr,#1
alts stkptr
mov x,0
In the current v16, instructions ALTSN..ALTB do not alter their D register.
I changed these instructions to use S[17:09] as a signed adder value for their D register. Under normal usage, where S is zero or #imm9, these S[17:09] bits are always 0, making the instruction behave as currently documented. When those S[17:09] bits are non-0, they get sign-extended and added to D. This makes indexing possible with just two instructions.
Here's what a cog-register-stack pop looks like now:
alts stkptr,stkpop
mov x,0
...
stkpop long $1FF<<9 + (stkbase-1) & $1FF 'use stkbase-1 (like --stkptr) and decrement D
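The new behavior can be modeled in Python. This is an illustrative simulation, not the silicon logic: the function name `alts` and the decomposition are mine, and the detail that the next instruction's S field becomes (D + S[8:0]) & $1FF is my reading of the stkpop encoding above:

```python
# Hypothetical model of the modified ALTS: the next instruction's S
# field becomes (D + S[8:0]) & $1FF, and D is then bumped by the
# sign-extended 9-bit adder held in S[17:09].
def alts(d, s):
    eff_s = (d + (s & 0x1FF)) & 0x1FF   # address substituted into the next instruction
    adder = (s >> 9) & 0x1FF            # the S[17:09] field
    if adder & 0x100:                   # sign bit of the 9-bit field set?
        adder -= 0x200                  # sign-extend to a negative value
    new_d = (d + adder) & 0x1FF         # updated index register (e.g. stkptr)
    return eff_s, new_d

# The stkpop pattern from the post, with a $000-based stack (stkbase = 0):
spop = (0x1FF << 9) + ((0 - 1) & 0x1FF)  # adder = -1, offset = stkbase - 1
```

With stkptr = 3, `alts(3, spop)` yields effective address 2 and a new stkptr of 2: a pre-decrement pop in one ALTS plus one MOV, as described.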
There will be a v17 release soon which contains this change, along with the xoroshiro128+ PRNG.
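For reference, the published xoroshiro128+ algorithm (Blackman/Vigna, with the original rotation constants 55, 14, 36) looks like this in Python. Whether the v17 hardware matches this exact variant bit-for-bit is an assumption here; this sketch just shows the algorithm the name refers to:

```python
MASK64 = (1 << 64) - 1

def rotl(x, k):
    # 64-bit rotate left
    return ((x << k) | (x >> (64 - k))) & MASK64

class Xoroshiro128Plus:
    """Reference xoroshiro128+ (original 55/14/36 constants); the P2's
    internal implementation may differ in seeding and output scrambling."""
    def __init__(self, s0, s1):
        self.s0, self.s1 = s0 & MASK64, s1 & MASK64
    def next(self):
        s0, s1 = self.s0, self.s1
        result = (s0 + s1) & MASK64      # the "+" in xoroshiro128+
        s1 ^= s0
        self.s0 = rotl(s0, 55) ^ s1 ^ ((s1 << 14) & MASK64)
        self.s1 = rotl(s1, 36)
        return result
```

The state is two 64-bit words; each step costs a handful of shifts and XORs, which is why it maps well onto simple hardware.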
There may be more simple enhancements along the way to getting Spin2 working. This is why I don't want to jump to synthesis immediately. Little things come up when you start writing software.
I thought you had decided to put the stack in hub memory?
True; on the other hand there's always "just one more thing" that you can do to improve the chip. At some point it does have to ship!
Nice, stack operations are so common that saving 1 instruction per stack op is a huge deal.
This should be a bonus for C/C++ too.
I very much doubt that the C compiler will use this. The C stack will (normally) be in hub memory, so the ptra/ptrb instructions would be used for stack operations. Even if we did want to add a mode to allow the stack to be in cog RAM, I'm not sure the compiler would be able to use ALTS, since it requires the register to be set up in a very particular way. We couldn't use it for an arbitrary predecrement unless the bottom 9 bits of the S register are treated as signed.
The stack will be in hub memory, but I'm thinking that the current stack frame can be cached in cog RAM.
Here are some stack-based math operations:
'
'
' Add
'
_add alts sp,spop
mov y,0
alts sp,spop
mov x,0
add x,y
altd sp,spush
mov 0,x
ret
'
'
' Sub
'
_sub alts sp,spop
mov y,0
alts sp,spop
mov x,0
sub x,y
altd sp,spush
mov 0,x
ret
'
'
' Logical AND
'
_logand alts sp,spop
mov x,0 wz
alts sp,spop
if_nz mov x,0 wz
if_nz not x,#0
altd sp,spush
mov 0,x
ret
'
'
' Logical OR
'
_logor alts sp,spop
mov x,0 wz
alts sp,spop
if_z mov x,0 wz
if_nz not x,#0
altd sp,spush
mov 0,x
ret
'
'
' Logical NOT
'
_lognot alts sp,spop
mov x,0 wz
if_z not x,#0
if_nz mov x,#0
altd sp,spush
mov 0,x
ret
'
'
' Stack setup
'
spop long $1FF<<9 + $1FF '$000-based stack pop [--sp]
spush long $001<<9 + $000 '$000-based stack push [sp++]
sp long $000 '$000-based stack pointer
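The net effect of these snippets can be modeled with a small Python stack machine. This is a hedged sketch of the semantics only (class and method names are mine): note how the logical operations yield all-ones ($FFFFFFFF, i.e. the `not x,#0` idiom) for true and 0 for false:

```python
class CogStack:
    """Hypothetical model of a cog-register stack with wrap at 512 longs."""
    def __init__(self):
        self.regs = [0] * 512            # cog RAM model
        self.sp = 0
    def push(self, v):                   # post-increment push [sp++]
        self.regs[self.sp] = v & 0xFFFFFFFF
        self.sp = (self.sp + 1) & 0x1FF
    def pop(self):                       # pre-decrement pop [--sp]
        self.sp = (self.sp - 1) & 0x1FF
        return self.regs[self.sp]
    def add(self):
        y, x = self.pop(), self.pop()
        self.push((x + y) & 0xFFFFFFFF)  # 32-bit wraparound add
    def logand(self):
        y, x = self.pop(), self.pop()
        self.push(0xFFFFFFFF if (x and y) else 0)   # true = all ones
    def logor(self):
        y, x = self.pop(), self.pop()
        self.push(0xFFFFFFFF if (x or y) else 0)
```

Each PASM snippet above is two pops, one operation, and one push; the model makes that data flow explicit.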
The instruction(s) at their heart could be modified to save lots of instances of almost the same code.
If you have the stack in cog RAM, then you can do the calculation directly on the stack (the stack is made of registers). That way you save a pop and a push.
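The net effect of that idea can be sketched in Python (the original PASM snippet is not shown here, so this is a hedged model of the outcome only; `add_in_place` is a hypothetical name): a binary operation reads the top stack slot and combines it into the slot below, sparing the second pop and the final push.

```python
def add_in_place(regs, sp):
    # Model: regs is the cog-register stack, sp points past the top.
    y = regs[(sp - 1) & 0x1FF]                            # top of stack
    below = (sp - 2) & 0x1FF
    regs[below] = (regs[below] + y) & 0xFFFFFFFF          # combine in place
    return (sp - 1) & 0x1FF                               # one slot consumed
```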
Andy, thanks. That should have occurred to me, too.
I've been working on the code snippets today that make up the interpreter operations. I've switched to hub push/pop, for now, since it seems too early to know how to optimize for caching.
I think the idea of structures was discussed earlier in this thread, and I believe the consensus was not to add it to Spin2. If you're going to add structures you might as well just program in C. The concept of passing an object pointer to a method was discussed, and there seemed to be general agreement to add that. The object VAR space is kind of like a structure. However, because longs, words and bytes are grouped together in VAR space it messes up the order of elements with different sizes.
OK, so on the P2 the Spin compiler will not re-order the VAR data. And unaligned longs and words will be allowed. Got it. Structs will be a nice addition to Spin. I've got a few more suggestions if you're interested. Pretty soon Spin will look just like C, but with indentation instead of braces.
libSPIN
On second thought, maybe we should NOT go there. Sounds like an avenue for rapid baggage accumulation.
Sounds like a good way to go!
ersmith,
That's pretty much what I was thinking then, with the system library thing.
Correct me?
It's only a few days ago that a new PRNG was added to it.
And some instruction or other that was supposed to help the Spin byte code.
.....
Oh yeah. I missed that "DONE" announcement. The date on the last FPGA release did not move so I did not check that thread.
Hurrah! I'm going to get me some cake!
Worth it.
C style structures
I have no idea how hard that is to add, but would be very useful...
-Phil
I believe Chip still wants to add structures. I think you are wrong about the consensus.
Also, with the P2 the VARs do not need to be reordered anymore, and I think Chip already made that change.
That's right, VARs can be assembled in order of declaration, without any alignment concerns.
I'm working on the call/return code in the interpreter now.
I'm interested!