A thought that I had, and this might be totally off the wall, so I'll have no hurt feelings if you all point fingers and laugh at me...
The SP address passed in PAR to the code in crt0_lmm has the dual purpose of setting the stack address AND signaling if it is a primary or secondary thread by setting SP to something other than 0x8000.
This has the side effect of forcing the primary thread to use a stack address of 0x8000.
So...
How about using bit 31 of the value passed in PAR to signal primary vs. secondary thread?
This would just imply testing bit 31 and then masking it before using the value of PAR as the actual stack address.
How do you feel about changing crt0_*.s by default to read __hub_end from the linker rather than hard-coding to 0x8000? Did you say this is not possible?
I think there's a misunderstanding here. crt0_*.s is not setting the stack pointer to 0x8000. It always sets the stack pointer to whatever is passed in PAR. I think we want to keep that behavior, as it's nice and general.
The problem is that crt0 is testing to see if this is the first run of the program by comparing PAR to 0x8000. That's what we need to change. Dave Hein has already suggested a reasonable solution (setting a bit in the _C_LOCK variable to indicate that we've already set things up). Another option is to set a bit in PAR to indicate that this is not a first run, although that could be a bit tricky to arrange because Spin is already restricting the bits of PAR. Yet another would be to pass a NULL pointer for the entry point (i.e. to require that a 0 be pushed onto the stack before we first start the C program).
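A rough sketch of the lock-word idea, written as plain host-side C rather than the actual crt0 assembly. The bit position, the names INIT_DONE_BIT and first_run, and the ordinary variable standing in for the lock word are all assumptions made for illustration; the real startup code in crt0_lmm.s may look quite different.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit; the real crt0 may use a different bit of its lock word. */
#define INIT_DONE_BIT (1u << 31)

/*
 * Returns true only for the first caller, then marks the lock word so that
 * every later caller (a secondary thread) sees false. PAR is not inspected
 * at all, so it stays free to carry whatever stack address the caller chose.
 * Not atomic: acceptable for a sketch in which the boot cog always runs first.
 */
bool first_run(uint32_t *lock_word)
{
    if (*lock_word & INIT_DONE_BIT)
        return false;
    *lock_word |= INIT_DONE_BIT;
    return true;
}
```

The point of the design is that the "have we initialized yet?" question no longer depends on the stack address, so 0x8000 stops being a magic value.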
This is a little OT .... In a general sense we need a very simple way to do this for a COG without threading. This was not very obvious until I got a chance to port some ICC code - it's not multi-threaded of course, and ICC used something backwards from what we do by using COGNEW to launch an LMM program and COGNEW_NATIVE launching a PASM program.
You mean compiling an LMM program as a binary lump instead of linking it in? We already have ways to start PASM code, and this is really no different (except we have to skip over the spin boot code that's placed as a header on LMM code). It does require that the linker place everything in the two programs in separate address spaces, though, e.g. program A occupies 0x0000-0x3fff, and program B gets 0x4000-0x8000. It also has the big disadvantage that the two programs aren't sharing code. The threaded solution has them share all the library code, which is a pretty big win in LMM.
The actual meaning of COGNEW doesn't seem to matter too much, and using it to just launch PASM is fine. I guess it would be nice if we had an LMMNEW or something that just used the start address of a function and stack for running LMM code non-threaded. It may just be another onion layer though - yikes. Any thoughts?
I'm not sure what you're getting at. I've used ICC to run LMM code in a separate cog using this form:
cog = cognew(funcPtr, stackTop);
In PropGCC, I run LMM code in a separate cog using this form:
These two methods are almost 100% identical, except that PropGCC allows you to pass an argument to the function that runs on the new cog and the PropGCC C library is thread safe in the new cog because it uses the supplied local storage.
So I'm confused. Is the need to supply threadLocalStorage causing you trouble? If not, then what exactly are you pining for?
The same macros would be used for xmm modes right?
There are no threads in XMM modes, unfortunately :-(.
What happens if malloc fails?
Bad Things (tm). So actually we should probably make this a function rather than a macro, and test the return from malloc. Or, we could allocate the thread structure from the thread's stack, and require everyone to provide a certain minimum stack size. Either way this would probably be a better fit for a function. Anyway, I'm just trying to sketch the solution here... the tools are already in the library, I think, and if I understand it correctly you're just asking for some syntactic sugar on top.
I've pushed a change to the crt0_lmm.s in the repository so that it now sets a bit in the lock word to indicate that we've initialized, rather than relying on the value of the stack pointer. Thanks to Dave Hein for this excellent suggestion. The debug version of crt0 needs to be adapted to this as well (for now it's still using the stack to determine whether this is a first run or not).
I'm very attracted to allocating space on the thread stack. That way we don't have to worry about free once the thread is done.
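A minimal sketch of what "allocate the thread structure from the thread's stack" could look like, assuming a stack that grows downward from the top of a caller-supplied buffer. The type tls_stub_t is a hypothetical stand-in for the library's real per-thread structure, and carve_stack is not an existing PropGCC function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the library's per-thread state structure. */
typedef struct { uint32_t fields[16]; } tls_stub_t;

typedef struct {
    tls_stub_t *tls;       /* thread state, placed at the very top of the buffer */
    uint32_t   *stack_top; /* initial stack pointer, just below the thread state */
} carved_stack_t;

/*
 * Reserve room for the thread state at the top of a stack buffer of `words`
 * 32-bit words. Fails with -1 if fewer than `min_free_words` would be left
 * for the stack itself, which is the "minimum stack size" requirement.
 */
int carve_stack(uint32_t *stack, size_t words, size_t min_free_words,
                carved_stack_t *out)
{
    size_t tls_words = (sizeof(tls_stub_t) + sizeof(uint32_t) - 1) / sizeof(uint32_t);

    if (stack == NULL || out == NULL || words < tls_words + min_free_words)
        return -1;

    out->tls = (tls_stub_t *)(stack + words - tls_words);
    out->stack_top = stack + words - tls_words;
    return 0;
}
```

Nothing is malloc'ed, so there is nothing to free when the cog stops; the failure case becomes "stack too small", which the caller can check up front.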
Since Eric has implemented the lock bit, I think we can now proceed with a GCC approach to satisfy the topic of this thread.
In the near future we can have a new thread or threads if necessary that leverages the ideas posted here and elsewhere.
Most likely it will involve a boot-loader like Spinix or something similar. It is possible to have a pure C/GCC solution. At this point it's up to the community to produce one, two, or more variations as the community sees fit.
There are a few possible approaches to this. None of these ideas are being funded by Parallax since all development is essentially done for Propeller-GCC P8x32a except for critical bug fixes.
Is there a solution to allow stand alone LMM programs to use a stack other than 0x8000 based on Eric's changes?
It seems we would need to pass a value to the linker that tells the loader to pass the desired stack address to the primary thread.
Hi C.W.
I was kind of waiting for some proposals from one or more of our crew on this subject.
I'll share my own thoughts here on this subject.
This is one way to go about it without having to change any Propeller GCC tools.
Let's define "it": An "Embedded Operating Environment" or OE.
That is: a procedure for booting a system and providing services for running re-loadable programs with GCC. Why not call this an O/S? Well Linux, Unix, Windows, or even CPM are Operating Systems. It is "impractical" to provide a "full service" O/S environment.
Personally I believe Dave Hein's Spinix can serve almost everyone's needs, however there are some cases where that may not be appropriate.
So an Operating Environment can be set up as follows:
1. The main program downloaded to EEPROM (or HUB RAM) would be called a BootLoader.
2. The BootLoader is responsible for setting up services in COGs for user application programs to access.
3. The BootLoader should be re-startable via cold start (power on reset) or warm start (reset without power cycle).
This is just one approach that can be implemented without making loader changes.
There is another approach that requires loader changes. I had mentioned the possibility of having the loader start-up COG services for us based on build linkages before. While this will not happen soon, it is still doable and it eliminates the need for some kind of registry.
Steve I think these steps define exactly what I'm talking about.
The bootloader would be a PropGCC-created LMM standalone application. The only requirement that I see missing at this point is a way to set the stack pointer to an address below 0x8000.
One item for others to consider when commenting (I know Steve already understands my goal): the bootloader will be blown away in its last step by loading a new image into its cog.
That is my reason for leaving space above the stack: I don't want any of the bootloader's hub "footprint" to remain once the bootloader is complete.
I realize I could probably patch the files created by PropGCC to make this happen, but it would be nice to be able to do dev and testing within SimpleIDE without any extra steps.
Great!
Yes, the only footprint left over from the BootLoader would be the COG services mailboxes ... and you can manage that in any way you like without Parallax or anyone else having to change anything in the compiler or loader tools. Linker scripts give us great flexibility.
As I understand it, we push the stack pointer and other arguments to be used by the application when we launch the application from the BootLoader so that the mailboxes are not disturbed.
When Spinix runs a stand-alone Spin or LMM C file it runs a small loader program in the upper 512 bytes of hub RAM. This allows programs to be as large as 31.5K in size, which shouldn't be a problem in practice since programs require some stack space to operate. The small runner program shuts down all of the cogs except itself and the SD driver. After the target program is loaded in RAM it stops the SD driver cog, starts the target program in cog 0 and stops its own cog if it was not running in cog 0.
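The 31.5K figure follows directly from the 32 KB hub and the 512-byte runner. A small sketch of that arithmetic, with made-up names (runner_base and image_fits are not part of Spinix):

```c
#include <stdint.h>

#define HUB_RAM_SIZE 0x8000u /* 32 KB of hub RAM on the P8X32A */
#define RUNNER_SIZE  512u    /* runner size quoted above */

/* Lowest address used by a runner placed at the top of hub RAM. */
uint32_t runner_base(void)
{
    return HUB_RAM_SIZE - RUNNER_SIZE;
}

/* Nonzero if an image of image_size bytes loaded at address 0 stays below the runner. */
int image_fits(uint32_t image_size)
{
    return image_size <= runner_base();
}
```

Here runner_base() works out to 0x7E00, i.e. 32256 bytes or 31.5 * 1024.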
Currently I don't leave any drivers running for stand-alone programs, but I plan on keeping the display and keyboard drivers running when I add that capability. I have also considered a zero-footprint runner program that runs entirely in a cog. That could be done by adding a program-load command to the SD driver.
Ahh, so the loader script you provided in post #24 of this thread should work along with the updated crt0_lmm file that includes Dave Hein's suggested change.
I'll give it a whirl this evening.
Thanks,
Chris Wardell
Probably. We have to relaunch the program in another COG with a different stack though since we have no control over that from our propeller-load program.
We may not have to be concerned about this with the linker script I provided .... read on.
Steve: perhaps I'm misunderstanding, but it looks like your demo links the blink and mbox code together, and the main mbox code still uses 0x8000 as its stack. To use a different stack location with the default loader would require changes to the spin boot code, wouldn't it?
Eric, I was looking at spinboot.s last night, and I noticed it uses __hub_end+N for the stack pointer and other pointers. Doesn't this mean that if the .hub section is only 31K that the stack pointer can not start above 31K ? My linker script for the mailboxes in post #24 mbox.zip limits .hub to 31K and sets mbox to $7F00.
I'm not a Spin expert, but I think the Spin startup code hard-codes PAR (and hence the C stack) to 0x8000. The __hub_end variables are only used for setting up the initial Spin stack, which will go away anyway once the C code gets started.
If there's some way we can set PAR to be one of those Spin variables that would be great, but again, I'm not a Spin expert so I'm not sure how to do that.
The C startup code doesn't use any of that now; all it cares about is what value is placed in PAR (i.e. the last thing passed to CogInit).
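To make the concern concrete, here is a small sketch that checks whether a downward-growing stack would run through the mailbox area. The $7F00 base comes from the linker script described above; the assumption that the mailboxes extend to the end of hub RAM, and the function name stack_hits_mbox, are mine.

```c
#include <stdint.h>

#define MBOX_BASE 0x7F00u /* mailbox address from the linker script described above */
#define MBOX_END  0x8000u /* assumption: mailboxes run to the end of hub RAM */

/*
 * The stack occupies [stack_top - stack_bytes, stack_top) and grows downward.
 * Returns nonzero if that range overlaps the mailbox area.
 */
int stack_hits_mbox(uint32_t stack_top, uint32_t stack_bytes)
{
    uint32_t stack_low;

    if (stack_bytes > stack_top)
        stack_bytes = stack_top; /* clamp so the subtraction cannot wrap */
    stack_low = stack_top - stack_bytes;
    return stack_low < MBOX_END && stack_top > MBOX_BASE;
}
```

With PAR fixed at 0x8000 the very first push lands inside the mailbox area, whereas a stack top at or below $7F00 stays clear of it.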
Eric
Hmmm... I guess you're right. I'm also not entirely sure that the __hub_end expressions are correct. It would really be nice if Parallax would publish a definitive document describing the Spin instruction set and binary file format. I've been unable to find any documentation other than what people have generated by reverse engineering. It is amazing to me that this information is not available even to the people who are building official Parallax tools like PropGCC.
Also is the following code from spinstart.s the section of code being discussed pertaining to 0x8000 being used as the initial PAR value?
'' here is the spin code to switch to pasm mode
.byte 0x35 ' constant 1 $00000000 (id)
.byte 0xc7 ' memory op: push PBASE + next byte
.byte 0x10
.byte 0x37 ' constant mask Y= 14 0x00008000
.byte 14
.byte 0x2C ' CogInit(Id, Addr, Ptr)
.byte 0x32 ' Return (unused)
.byte 0x00 ' padding
Thanks,
C.W.
Yes, that is the section of code in question. What we need to do is to replace the code sequence "0x37, 14" with an instruction that will load the value of a variable that we can patch in the loader. Unfortunately, I haven't found a good enough description of the Spin bytecodes to figure out how to do that. Do you know?
Yes, that's correct.
I suspect one could use Ray's Spin compiler to figure out how to push a different value than 0x8000 for the PAR value of CogInit. Possibly pushing DCURR or DBASE might work, if there's a simple opcode for this (probably, but I don't know the Spin interpreter well enough to know for sure).
If the length of that code changes from 8 bytes then various other things in the header will probably have to be changed as well. I generated the original version of spinboot.s by compiling a very simple piece of spin code that loaded the LMM PASM kernel. Various people (most recently Bill Henning, I think) have modified it since then.
Parallax has never produced a document that describes the Spin bytecodes in detail. The closest thing to a document are the comments in the interpreter source code. There aren't even official names for the bytecode instructions, which is why the BST listing has to resort to words describing each operation. I've attempted to define a Spin bytecode assembly language called Spasm. The Spasm document is attached below.
The bytecodes that we use to load $8000 to the stack are $37, 14, which is "ldlip $8000" in Spasm. This instruction is a compact form of loading constants that are a power of 2. The value of $8000 is produced by shifting 2 to the left by 14 places. This instruction can also complement and/or negate the number to produce bit masks and other numbers, such as $ffffff and -256.
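A small sketch of just the base case described above, where the operand byte is a shift count applied to 2. Taking the count from the low five bits of the operand is my assumption, and the complement/negate modifiers are deliberately not modeled because their exact encoding is not given here. The name ldlip_base_value is hypothetical.

```c
#include <stdint.h>

/*
 * Base case of the "ldlip" constant form: value = 2 << n.
 * Assumption: n is the low five bits of the operand byte; any modifier
 * bits in the upper part of the operand are ignored in this sketch.
 */
uint32_t ldlip_base_value(uint8_t operand)
{
    return (uint32_t)2 << (operand & 0x1F);
}
```

For the boot code's operand of 14 this gives 2 << 14 = $8000, matching the value currently passed as PAR.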
We could load any 16-bit number by using the opcode $39 followed by a two-byte value. This is "ldwi value" in Spasm.
There is a definition for 128 stack load / save opcodes. Based on that info, it looks like the opcode $A0 followed by an offset of $0A would push the DBASE value onto the stack.
This would be the "ldwa" (load word absolute) opcode from spasm.txt provided in Dave's post #59 of this thread.
So I'm going to try replacing the bytes 0x37, 14 with 0xA0, 0x0A:
.byte 0x3F ' Register op $1E9 Read - cogid
.byte 0x89
.byte 0xc7 ' memory op: push PBASE + next byte
.byte 0x10
.byte 0xA0 ' Load word absolute, was 0x37 ' constant mask Y= 14 0x00008000
.byte 0x0A ' Offset to DBASE, was 14
.byte 0x2C ' CogInit(Id, Addr, Ptr)
.byte 0x32 ' Return (unused)
The ldwa instruction gets the absolute address from the stack. So you need to load the stack with $A first and then do ldwa. In Spasm this would be
ldbi $A
ldwa
This would be the bytecode sequence $38, $0A, $A0. We don't need the return code at the end since we are coginit'ing our own cog, so this will still fit in the same number of bytes. We actually don't need the method table either because we aren't calling any methods, so if we need more space we could eliminate the method table.
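As a sanity check on the "same number of bytes" claim, here is the bookkeeping written out as C arrays. The first array is the listing quoted from spinstart.s earlier in the thread; the second is only my reading of the proposal (ldbi $A, ldwa, return dropped, one padding byte kept), not a tested boot header, and it assumes $000A is where the header keeps DBASE as suggested above.

```c
#include <stddef.h>
#include <stdint.h>

/* Tail of the boot code as listed from spinstart.s earlier in the thread. */
static const uint8_t boot_original[] = {
    0x35,       /* constant $00000000 (id) */
    0xC7, 0x10, /* memory op: push PBASE + next byte */
    0x37, 14,   /* ldlip $8000 */
    0x2C,       /* CogInit(Id, Addr, Ptr) */
    0x32,       /* return (unused) */
    0x00        /* padding */
};

/* Assumed layout of the proposed patch; untested. */
static const uint8_t boot_patched[] = {
    0x35,       /* constant $00000000 (id) */
    0xC7, 0x10, /* memory op: push PBASE + next byte */
    0x38, 0x0A, /* ldbi $A: push the address $000A */
    0xA0,       /* ldwa: load the word stored at that address */
    0x2C,       /* CogInit(Id, Addr, Ptr) */
    0x00        /* padding (return dropped to make room) */
};

size_t boot_original_size(void) { return sizeof boot_original; }
size_t boot_patched_size(void)  { return sizeof boot_patched; }
```

Both sequences come to 8 bytes, so under these assumptions nothing else in the header would need to move.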
Comments
It looks like that would work as long as the startup code for standalone LMM programs is also changed to pass __hub_end as the stack address in PAR.
I thought at first this might break using _start_cog_thread, but since the caller is in control of the SP value that is passed it should be OK.
Thanks for all the thought you folks are putting into this, I really appreciate it.
C.W.
As to my O.T. side-track, I only pine for something simpler than what we have. If that is not possible, then so be it.
How about: If we add that to <propeller.h> then you'll be able to call ''cognew_lmm'' just like the ICC cognew.
Hi Eric,
Going further ...
I suppose one could add cognew_lmm_parm(functPtr, stackTop, parm)
We need a corresponding cogstop_lmm too to free the thread struct.
The same macros would be used for xmm modes right?
What happens if malloc fails?
I can add these.
Would there be issues in adding that to propeller.h ?
Regretting not making a separate thread right now ...
Thanks,
--Steve
Thanks for pushing the lock bit change.
--Steve
Feedback?
Thanks,
--Steve
From spinboot.s
If I'm understanding this correctly, we don't have to worry about the stack over-writing the mailbox area with my linker script. No?
Thanks,
--Steve
Well, once I finish some other things, I'll focus on this.
C.W. have you looked at it yet?
Steve, unfortunately I haven't had the opportunity, I'll try to look at it this afternoon, or later in the evening.
Thanks,
C.W.
The one at http://code.google.com/p/propgcc/source/browse/gcc/gcc/config/propeller/?name=a1967a69d2&r=a1967a69d24eea2600d18ace06e16eca48c52b04 is dated December 1, 2011.
No, this is the first time I looked into anything about the underlying guts of SPIN.
Do we know who wrote the above mentioned code? Was it provided by Parallax?
It would be nice to just go directly to Chip, but I know he is deep into the Prop II and we shouldn't bother him.
I'd love to dig in and do some reverse engineering, but my work load for "work" work is way too high right now.
Thanks,
C.W.
Somehow you've got a specific revision into the URL (the r=) . Try:
http://code.google.com/p/propgcc/source/browse/#hg%2Fgcc%2Fgcc%2Fconfig%2Fpropeller
instead.
http://forums.parallax.com/showthread.php?96211-Spin-Bytecode-Disassembler&p=665019#post665019