Recover space lost to LMM/CMM kernel?

Rayman · 2012-09-09 17:59

Just wondering...

Can we use the upper 32k of a 64k eeprom for code without the kernel and then
use the lower 32k for the loader and kernel?

Maybe this would make CMM come out even closer to Spin?

David Betz · 2012-09-09 18:04

Rayman wrote: »

Just wondering...

Can we use the upper 32k of a 64k eeprom for code without the kernel and then
use the lower 32k for the loader and kernel?

Maybe this would make CMM come out even closer to Spin?

Yes, that is possible. I even started writing the code to do that a while back. The problem is that if you load the kernel in a first-stage loader and then load the program from the upper 32k of the EEPROM you end up with a hub memory image without the LMM or CMM kernel. Of course, that's what you want in order to save the space normally taken by the kernel. However, this causes a problem if you want to launch code in another COG. There is no longer a kernel image in hub memory so there is no way to launch a second LMM or CMM COG. The Spin interpreter doesn't have this problem because its interpreter lives in hub ROM. So, in answer to your question, yes, it would be quite easy to arrange to load the kernel before loading the user program but you would give up the ability to start a second LMM or CMM COG.

ersmith · 2012-09-10 02:46

David Betz wrote: »

Yes, that is possible. I even started writing the code to do that a while back. The problem is that if you load the kernel in a first-stage loader and then load the program from the upper 32k of the EEPROM you end up with a hub memory image without the LMM or CMM kernel. Of course, that's what you want in order to save the space normally taken by the kernel. However, this causes a problem if you want to launch code in another COG. There is no longer a kernel image in hub memory so there is no way to launch a second LMM or CMM COG.

Hmmm... that's an interesting problem. Perhaps the solution would be to have the COG clone itself, i.e. write its internal memory out to a (temporary) hub ram buffer and then do a cognew from there? It would require a change to the kernel's initialization code, of course (it could no longer live in the registers) but it seems like it should be quite do-able.

Heater. · 2012-09-10 03:27

Having a COG clone itself simply like that would still require a 2K buffer to which the kernel copies itself.
The trick would be to use a much smaller buffer, say 16 longs or so.
That would have a "bootstrap" written into it which is used to start the second COG.
The bootstrap running in the second COG then receives the kernel copy in chunks posted to that buffer by the first.
Nice puzzle.

Do we have a fork() function in the C libs anywhere?
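
The chunked hand-off described above can be modelled on a host machine. What follows is only a simulation sketch, not Propeller code: the `chunk_window_t` layout, the function names, and the lock-step driver loop are all invented for illustration, and the 496-long image size assumes the kernel fills every general-purpose long of a COG. A real 16-long buffer would also have to hold the bootstrap code itself, which this model ignores.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KERNEL_LONGS 496u  /* general-purpose longs in one COG */
#define CHUNK_LONGS   16u  /* size of the small shared buffer */

/* Hypothetical shared window in hub RAM (layout is an assumption). */
typedef struct {
    volatile uint32_t ready;      /* 1 = sender filled data, 0 = receiver took it */
    uint32_t count;               /* number of valid longs in data[] */
    uint32_t data[CHUNK_LONGS];
} chunk_window_t;

/* Sender: post the next chunk if the window is free. Returns the new offset. */
static uint32_t post_chunk(chunk_window_t *w, const uint32_t *image, uint32_t offset)
{
    if (w->ready || offset >= KERNEL_LONGS)
        return offset;
    uint32_t n = KERNEL_LONGS - offset;
    if (n > CHUNK_LONGS)
        n = CHUNK_LONGS;
    memcpy(w->data, image + offset, n * sizeof(uint32_t));
    w->count = n;
    w->ready = 1;
    return offset + n;
}

/* Receiver: copy a posted chunk into place. Returns the new offset. */
static uint32_t take_chunk(chunk_window_t *w, uint32_t *dest, uint32_t offset)
{
    if (!w->ready)
        return offset;
    assert(w->count <= CHUNK_LONGS);
    memcpy(dest + offset, w->data, w->count * sizeof(uint32_t));
    offset += w->count;
    w->ready = 0;
    return offset;
}

/* Run both sides in lock-step; returns how many chunks were needed. */
static uint32_t clone_kernel(const uint32_t *image, uint32_t *dest)
{
    chunk_window_t w = {0};
    uint32_t sent = 0, received = 0, chunks = 0;
    while (received < KERNEL_LONGS) {
        uint32_t before = sent;
        sent = post_chunk(&w, image, sent);
        if (sent != before)
            chunks++;
        received = take_chunk(&w, dest, received);
    }
    return chunks;
}
```

With these sizes the image crosses the window in 496 / 16 = 31 chunks, so the hub cost is the window rather than a full 2K copy.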

Rayman · 2012-09-10 03:34

Couldn't you just load the kernel from the lower EEPROM again, if you need to start a new cog?

Actually, this is all sounding like more trouble than it's worth...

David Betz · 2012-09-10 04:36

Rayman wrote: »

Couldn't you just load the kernel from the lower EEPROM again, if you need to start a new cog?

Actually, this is all sounding like more trouble than it's worth...

Yes, that would be possible. You'd have to have EEPROM reading code to do that though and that would either take up some hub memory cutting into the 1984 bytes you save by not having the kernel or you'd have to sacrifice a COG to run the EEPROM driver.

ersmith · 2012-09-10 05:05

Heater. wrote: »

Having a COG clone itself simply like that would still require a 2K buffer to which the kernel copies itself.

True, but the buffer could be reused for other things too -- we could put it on the stack, or malloc/free it.

The trick would be to use a much smaller buffer, say 16 longs or so.
That would have a "bootstrap" written into it which is used to start the second COG.
The bootstrap running in the second COG then receives the kernel copy in chunks posted to that buffer by the first.
Nice puzzle.

Wow, that would be cool, although a lot more work.

altosack · 2012-09-10 11:34

I've been pondering this, myself, although I haven't run out of space on any of my projects yet... I know I eventually will, since I will always ask the Prop to do a little bit more since it's so much fun !

I think we're trying to make too generic a solution here. If we need to fork a new CMM or LMM cog, simply require the programmer to do that before he recovers the Hub space used for the kernel; it would be similar to the boot I2C methods already in place.

David Voss

David Betz · 2012-09-10 11:44

David Betz wrote: »

Yes, that would be possible. You'd have to have EEPROM reading code to do that though and that would either take up some hub memory cutting into the 1984 bytes you save by not having the kernel or you'd have to sacrifice a COG to run the EEPROM driver.

If we know how many COGs we want to run LMM or CMM kernels we could have the first-stage loader load all of them before loading the main program. All but one of the LMM/CMM COGs would come up in an idle state waiting for the "main COG" to start them. This would eliminate the need for the kernel image to be in hub memory and also wouldn't require some complex scheme for cloning a running COG.

David Betz · 2012-09-10 13:33

David Betz wrote: »

If we know how many COGs we want to run LMM or CMM kernels we could have the first-stage loader load all of them before loading the main program. All but one of the LMM/CMM COGs would come up in an idle state waiting for the "main COG" to start them. This would eliminate the need for the kernel image to be in hub memory and also wouldn't require some complex scheme for cloning a running COG.

To expand on this a little, the first-stage loader could load all of the COGs that the main program wants to use, including drivers. The only problem with this is that there would need to be a mailbox for each COG that gets loaded so that the main program would know how to communicate with it. This is what Ross calls a registry, but I'm not sure it would even have to be as complicated as the scheme he has set up. It could be more static so that the binding between mailboxes and specific drivers or kernels is done at link time rather than at runtime. This, however, would place constraints on how the drivers are written. For instance, they would have to start in an idle state spinning on their mailbox and only wake up when prodded by the main program.

altosack · 2012-09-10 14:56

David Betz wrote: »

To expand on this a little, the first-stage loader could load all of the COGs that the main program wants to use, including drivers. The only problem with this is that there would need to be a mailbox for each COG that gets loaded so that the main program would know how to communicate with it. This is what Ross calls a registry, but I'm not sure it would even have to be as complicated as the scheme he has set up. It could be more static so that the binding between mailboxes and specific drivers or kernels is done at link time rather than at runtime. This, however, would place constraints on how the drivers are written. For instance, they would have to start in an idle state spinning on their mailbox and only wake up when prodded by the main program.

I always want as much control over the process as I can get, so I'd rather it be done at run-time. I already pass a pointer to a mailbox for each cog I start up, and the first parameter in that structure is usually the CNT that the cog should start at (usually for the purposes of synchronization between cogs), or a command that will be zero until the cog is needed. While there could be default examples (or recommendations) that people can follow if they want to, ultimately, I'd want to be able to define this myself.
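
A mailbox of the kind described above might look roughly like the following host-side C sketch. Everything here is an assumption made for illustration: the `mailbox_t` field names, the choice to carry both a start count and a command in one structure, and the `driver_poll` helper are not taken from propgcc or from any existing driver. The counter comparison uses a signed difference so that it stays correct when the 32-bit system counter wraps, as long as the target is less than 2^31 ticks away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mailbox layout; field names are illustrative only. */
typedef struct {
    volatile uint32_t start_cnt;  /* counter value at which the COG may begin */
    volatile uint32_t command;    /* stays 0 until the main program needs the COG */
    volatile uint32_t argument;   /* optional parameter for the command */
} mailbox_t;

/* True once `now` has reached `target`, even across a 32-bit wrap.
 * Relies on the usual two's-complement conversion to int32_t. */
static bool cnt_reached(uint32_t now, uint32_t target)
{
    return (int32_t)(now - target) >= 0;
}

/* One pass of a driver's idle loop. Returns the command to run, or 0
 * when the driver should keep waiting. A fetched command is cleared
 * so the main program can see that it was accepted. */
static uint32_t driver_poll(mailbox_t *mb, uint32_t now)
{
    assert(mb != NULL);
    if (!cnt_reached(now, mb->start_cnt))
        return 0;
    uint32_t cmd = mb->command;
    if (cmd != 0)
        mb->command = 0;
    return cmd;
}
```

On real hardware `now` would be read from the system counter on each pass and the loop would run inside the driver COG; here it is passed in so the logic can be exercised on a PC.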

It's not very much trouble to add in our own code to start device drivers in cogs. When I'm done doing that (including starting any additional CMM or LMM cogs), I would also like to recover the memory used for the kernel in Hub ram using a library call for that purpose. Actually, there could be two different library calls, depending on how directly we'd like to manage memory. One of them would return the memory to the heap, and malloc would control it; the other would return a pointer (and a length) to the memory block that I could then use directly.

David Betz · 2012-09-10 15:04

altosack wrote: »

I always want as much control over the process as I can get, so I'd rather it be done at run-time. I already pass a pointer to a mailbox for each cog I start up, and the first parameter in that structure is usually the CNT that the cog should start at (usually for the purposes of synchronization between cogs), or a command that will be zero until the cog is needed. While there could be default examples (or recommendations) that people can follow if they want to, ultimately, I'd want to be able to define this myself.

It's not very much trouble to add in our own code to start device drivers in cogs. When I'm done doing that (including starting any additional CMM or LMM cogs), I would also like to recover the memory used for the kernel in Hub ram using a library call for that purpose. Actually, there could be two different library calls, depending on how directly we'd like to manage memory. One of them would return the memory to the heap, and malloc would control it; the other would return a pointer (and a length) to the memory block that I could then use directly.

I'm not sure what you're suggesting. Are you saying you want to write the first-stage loader yourself so you can start all of the COGs and then load the main program?

jazzed · 2012-09-10 16:15

@altosack

Welcome to the forum.

altosack wrote: »

I always want as much control over the process as I can get, so I'd rather it be done at run-time.

That's what Parallax said they wanted when we started Propeller-GCC development. They didn't want magic driver loading, and other things. Being able to load all the COGs first saves HUB memory, but it also causes problems and invites overly complicated solutions.

It is theoretically possible today for a PropellerGCC program to start all COGs with some reserved mailbox memory, then read the application from EEPROM or other media, overwriting most of HUB memory, and start a COG. This was the subject of a discussion several months ago. The issue of having the kernel available for starting multiple COG threads remains. No one ever created a demo AFAIK - I assume that they just lost interest, and we didn't have much time.

altosack wrote: »

.... When I'm done doing that (including starting any additional CMM or LMM cogs), I would also like to recover the memory used for the kernel in Hub ram using a library call for that purpose.

There is no library call for recovering HUB RAM at the moment. COG drivers are by default just memory blocks. One could provide a linker script for loading such blocks however.

We provide a method where all COG drivers can be loaded into EEPROM rather than HUB RAM - sometimes it's called ECOG mode. In SimpleIDE this can be done with COGC drivers by using the extension .ecogc. It can also be done with linker scripts to take advantage of all COG drivers. The amount of code and number of drivers being used has to outweigh the startup needs though - the I2C driver space gets recycled by the cog loader. It is all done in keeping with the idea of the programmer having direct control over the loading. See the cog_load demo for an example of using an EEPROM-based COG library.

Hope this helps.

altosack · 2012-09-10 16:17

David Betz wrote: »

I'm not sure what you're suggesting. Are you saying you want to write the first-stage loader yourself so you can start all of the COGs and then load the main program?

No, I don't want to write the first-stage loader myself; I just want to define the interface to the device drivers myself, which would entail me creating my own mailbox structures, and passing a pointer to the mailbox to the device driver when I start it with coginit (or the equivalent ecog library call). In practice, it seems to me that the most common interfaces will be a busy loop waiting for a command, and a synchronized start at a given system clock CNT, but there could also be shared memory that should best be defined by the developer, rather than trying to make some sort of standard mailbox interface that will work for any eventuality.

At the same time, it would be useful to provide one or two default ways of doing this in examples so people don't have to re-invent the wheel, and so device drivers are more easily shared, because they have a relatively standard interface that people will recognize. None of my applications currently use HMI, but I guess the people that do use it might want a relatively standard mailbox for that.

Ideally, from the main program, I would first start any additional CMM/LMM cogs that I want, recover the memory from where the CMM/LMM kernel was stored in hub RAM with a library call, use that memory to temporarily store ecog device drivers as they are started, and then use that memory for my own purposes (or it could be returned to the heap to be managed by malloc). While this could be automated in initialization code at link time, I'd rather do it myself from the main program at run-time so I can define the interface between the main program, the additional CMM or LMM cog(s), and the cog device drivers myself.

By the way, I really want to commend everyone involved with propgcc for a job extremely well done; it has worked for me from the start (about 2 weeks ago), and I'm already using it as the only way I program my Props. My code is now more professional, modular, and maintainable (with the godsends of conditional compilation and make files, not to mention C) than was possible before.

Thanks !!!

Rayman · 2012-09-10 16:52

Actually, it would be kind of nice (in case this idea doesn't go anywhere) to be able to lump all of my PASM drivers (and perhaps also the CMM kernel) into one contiguous memory space and then be able to repurpose it. I know this is an old idea, but maybe it's easier in C++?

David Betz · 2012-09-10 20:00

Rayman wrote: »

Actually, it would be kind of nice (in case this idea doesn't go anywhere) to be able to lump all of my PASM drivers (and perhaps also the CMM kernel) into one contiguous memory space and then be able to repurpose it. I know this is an old idea, but maybe it's easier in C++?

That is certainly possible. In fact, we already do that with drivers we load into high EEPROM. They all get put into a special linker section. We could do that for drivers that load into hub memory as well if that would be useful. However, that is not what I'm proposing. I'm thinking that the first-stage loader could load all of the COGs so that the main program, once it is loaded, doesn't have to include any COG images at all. That would require that the drivers that get loaded use some standard way of starting them. I was proposing a single long to be used as a flag to prod the driver out of its idle loop once the main program starts. This could just be a long that contains zero when the COG is loaded and the main program could write a mailbox address to that long to start the driver. The whole purpose of this is to allow the main program to use all available hub memory for any purpose without having to explicitly recycle space that once contained COG images. This may not be an achievable goal though if it puts too many constraints on how users can interface with their drivers. As altosack pointed out, there must be a lot of flexibility to allow different approaches to driver interfaces.
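
The single-long start flag proposed above could be sketched on a host like this. The names `start_flag_t`, `driver_mailbox_t`, `driver_idle_step` and `start_driver` are hypothetical and exist only for this illustration. The flag is typed as `uintptr_t` so the example also runs on a 64-bit PC; on the Propeller a pointer fits in one 32-bit long.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mailbox; its contents are up to the driver author. */
typedef struct {
    volatile uint32_t command;
} driver_mailbox_t;

/* The single long: zero when the COG is loaded, later set by the
 * main program to the address of the driver's mailbox. */
typedef volatile uintptr_t start_flag_t;

/* Driver side: one pass of the idle loop. Returns the mailbox once
 * the main program has prodded the driver, or NULL while still idle. */
static driver_mailbox_t *driver_idle_step(start_flag_t *flag)
{
    uintptr_t value = *flag;
    return value ? (driver_mailbox_t *)value : NULL;
}

/* Main-program side: writing a non-zero mailbox address wakes the driver. */
static void start_driver(start_flag_t *flag, driver_mailbox_t *mb)
{
    assert(mb != NULL);
    *flag = (uintptr_t)mb;
}
```

A driver written this way needs no COG image in the main program. The constraint noted in the post still applies: the driver has to agree to idle on that long before doing anything else.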

Heater. · 2012-09-10 20:34

David Betz,

I'm thinking that the first-stage loader could load all of the COGs so that the main program, once it is loaded, doesn't have to include any COG images at all. That would require that the drivers that get loaded use some standard way of starting them

Which brings us neatly back to the old and long debate about standardizing how cog code is written: how the mailboxes are specified, how the cogs are started, how they can be used from different languages, etc., and the wonderful world of "plugins" and first-stage loaders, as exemplified by RossH's efforts to document such a standard. (Sorry, I can't find the thread now.)

Personally I think Parallax was wise to want GCC to keep away from all that. It presents an altogether too complex face to the user. I would like to see all COG code written to such a standard mailbox with a standardized way of being started and usable from different languages, but I think propgcc is right in keeping with simplicity and flexibility. As Jazzed says above "...They didn't want magic driver loading ...it also causes problems and invites overly complicated solutions."

A halfway house here is to concatenate all drivers into a single contiguous area and leave it to the programmer to reuse that space for data if he likes; after all, he knows where it is and how big it is. Yes, that space may not be contiguous with other data areas, heap or stack, but so what? Data can be overlaid against it.

David Betz · 2012-09-11 05:07

Heater. wrote: »

A halfway house here is to concatenate all drivers into a single contiguous area and leave it to the programmer to reuse that space for data if he likes; after all, he knows where it is and how big it is. Yes, that space may not be contiguous with other data areas, heap or stack, but so what? Data can be overlaid against it.

That, of course, is already possible if you write a custom linker script and put the right section pragmas in your code. This is one advantage we gain from using GCC. There are already very powerful facilities for taking control of how things are laid out in memory. If you don't want the user to have to deal with the complexities of linker scripts then I guess our default linker scripts could provide a way to place COG images in a special predefined section. Then it would just be up to the programmer to use the _start and _end symbols of that section to reclaim the space.
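
As a rough host-side sketch of reclaiming such a section from its start and end addresses: the `hub_region_t` type and both helpers below are invented for illustration and are not part of propgcc, and the symbol names in the comment are hypothetical stand-ins for whatever a linker script actually defines. The carve-out helper assumes the section starts on a long boundary and rounds each request up to 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A reclaimable block of hub RAM: base address plus remaining length. */
typedef struct {
    uint8_t *base;
    size_t   length;
} hub_region_t;

/* Build a region from a section's start and end addresses.
 * On the Propeller these would come from linker-defined symbols, e.g.
 *   extern uint8_t _cogimg_start[], _cogimg_end[];   (hypothetical names)
 * Only call this after every COG has been started from the images. */
static hub_region_t region_from_bounds(uint8_t *start, uint8_t *end)
{
    assert(start != NULL && end >= start);
    hub_region_t r = { start, (size_t)(end - start) };
    return r;
}

/* Carve `size` bytes (rounded up to a multiple of 4) off the front of
 * the region. Returns NULL when the request does not fit. */
static void *region_take(hub_region_t *r, size_t size)
{
    size_t rounded = (size + 3u) & ~(size_t)3u;
    if (rounded < size || rounded > r->length)
        return NULL;
    void *p = r->base;
    r->base += rounded;
    r->length -= rounded;
    return p;
}
```

For example, a section holding one full COG image of 496 longs yields a 1984-byte region, from which buffers can be taken until it is exhausted.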

David Betz · 2012-09-11 05:10

Heater. wrote: »

As Jazzed says above "...They didn't want magic driver loading ...it also causes problems and invites overly complicated solutions."

There would never be any magic loading of drivers. All of the drivers that would be loaded by a first-stage loader would be directly controlled by which code the user chooses to link with his/her application. To be fair, I don't think Catalina forces the use of its registry or automatically loaded drivers either. It's just that there are few if any examples of how to use it without that infrastructure. I don't think there is anything in the compiler or the "binder" that mandates the use of the registry and its standardized drivers.

Rayman · 2012-09-11 06:32

The "Graphics Demo" for TV might be a great test of this second idea, if the math works out...
I think I saw that it needs to be changed to single buffering because there is not enough room (even in CMM mode?) for the second buffer.
But, what if we were to define a 12,000 byte space that would initially hold the cmm interpreter, TV driver, mouse driver, and graphics driver?
Then, after starting all the drivers, use that for the second buffer...

Any chance that would work?

(or, maybe just doing the drivers (without the cmm interpreter) would be enough to make that work?)

David Betz · 2012-09-11 06:41

Rayman wrote: »

The "Graphics Demo" for TV might be a great test of this second idea, if the math works out...
I think I saw that it needs to be changed to single buffering because there is not enough room (even in CMM mode?) for the second buffer.
But, what if we were to define a 12,000 byte space that would initially hold the cmm interpreter, TV driver, mouse driver, and graphics driver?
Then, after starting all the drivers, use that for the second buffer...

Any chance that would work?

(or, maybe just doing the drivers (without the cmm interpreter) would be enough to make that work?)

I would think that would work if all of the COG driver sections were collected together. You can try this yourself by writing a custom linker script.

Rayman · 2012-09-11 08:49

I don't think I'd want to wade that deep into the muck...
But, I think I could almost group the drivers myself... Maybe if I made a structure that contained all the driver cog code and the extra space to make up 12,000 bytes it would work...
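
One way to picture that structure is a C union, so the padding never has to be computed by hand. This is only a sketch of the overlay arithmetic under loud assumptions: each driver is given a full 496-long COG image, the member names are made up, and in a real propgcc build the images come from assembled sections rather than plain arrays, so getting them into such a union is the part this example does not solve.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COG_IMAGE_LONGS 496u    /* assumed worst-case size of one COG image */
#define OVERLAY_BYTES   12000u  /* size of the second buffer from the post */

/* Driver images and the second buffer share the same storage. */
typedef union {
    struct {
        uint32_t tv[COG_IMAGE_LONGS];
        uint32_t mouse[COG_IMAGE_LONGS];
        uint32_t graphics[COG_IMAGE_LONGS];
    } images;                              /* valid until every COG is started */
    uint8_t second_buffer[OVERLAY_BYTES];  /* valid afterwards */
} overlay_t;

/* The buffer member sets the union's size; the images must fit inside it. */
_Static_assert(sizeof(overlay_t) == OVERLAY_BYTES, "overlay must be exactly 12000 bytes");

/* Call only after all drivers have been started: wipes the stale
 * images and hands the whole block back as a byte buffer. */
static uint8_t *release_images(overlay_t *o)
{
    assert(o != NULL);
    memset(o->second_buffer, 0, sizeof o->second_buffer);
    return o->second_buffer;
}
```

Three full-size images take 3 x 1984 = 5952 bytes, and even a fourth (for the kernel) would bring the total to 7936, still inside the 12,000-byte block.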

Rayman · 2012-09-13 10:56

How about giving us a pointer to the kernel's address in HUB RAM?
With that and some idea how big it is, that should be enough info to use that space for something else, right?

David Betz · 2012-09-13 11:04

Rayman wrote: »

How about giving us a pointer to the kernel's address in HUB RAM?
With that and some idea how big it is, that should be enough info to use that space for something else, right?

The start is at __load_start_kernel (from C that would be _load_start_kernel without the initial "_"). It doesn't look like the default linker scripts provide a size or end address but that could easily be added.

jazzed · 2012-09-13 11:09

Rayman wrote: »

How about giving us a pointer to the kernel's address in HUB RAM?
With that and some idea how big it is, that should be enough info to use that space for something else, right?

In SimpleIDE right-click on project file -> Show Map File. It tells you addresses of all symbols.

Rayman · 2012-09-14 11:38

Maybe it would help if the kernel could be forced to be all the way at the end of used HUB RAM...

The graphics_demo wants to use the top of HUB ram for buffers and maybe this way it could just overlap the kernel after the cogs were started?

Maybe I need to figure out where the C++ stack is though...
