Compile - just strip out all the spin and compile with the propeller tool.
Load - the above creates a .binary file - rename it to something with a 3-letter extension (I chose .cog), copy it over to the SD card, and then in C open the file, load the data into an array, close the file, and send it out to the cog. Hmm. That makes it sound too easy. I do have some code demonstrating this somewhere.
I also want to be able to load and run c objects (or programs) without reloading the whole prop. Have you done this?
No but Ross might have. Sounds interesting. Not quite sure I understand.
Thanks for the tips Drac. You saved me a lot of time trying to figure this out. (compile and load objects)
re C programs in Catalina, they are designed atm to be standalone. I want to be able to compile C programs (or rather modules) that use the already loaded objects. So, I would like to be able to make, for example, a DIR program using C, and then be able to load this and run it. So it's just like Sphinx and some parts (I gather) of KyeDos.
In other words, I am really trying to make a proper OS where drivers (equivalent to our cog objects) can be loaded initially, then unloaded and replaced dynamically (such as switching from PC serial to VGA and then back to PC serial), and where programs or modules (such as DIR, LS, Pascal, ZiCog/CPM, etc.) can be run using those drivers, all without rebooting. I also want to run the Sphinx compiler (a prop spin/pasm compiler by mpark).
Of course, ZiCog/CPM requires SRAM. But so will some C programs which will use XMM.
BTW All of this can be done now. But I am looking for the best general way so that we all can contribute and benefit.
I have some very specific applications and hardware in mind
Hmm, tricky. So you take the existing TV object, split it into spin and pasm, and convert the pasm to cog code, and pass the parameters via a list in a known order. That all works fine. However, what do you do with the spin bit, eg the code that prints a decimal number on the screen. You can translate it to C and that works fine. But is it possible to load and unload bits of compiled C programs without rebooting?
I don't believe it is with current compiled programs.
I think there are some intriguing possibilities running interpreted script languages and I think down the track this could be made part of the html browser that I have half working. The interpreter for script languages could be written in C, and it could interpret C, or it could be written in any language the prop can run and interpret any language.
Having said it can't be done, I wonder if Ross might chime in with running several C programs at once. I think he has done that, and I think he has done Spin running as well. I think that just needs each running in its own memory block. Maybe a common cache handler (and caching could get around the problem of multiple programs accessing external ram at the same time).
So yes, maybe it can be done. But then again, if you have C programs being loaded and unloaded asynchronously, do you need the cogs to be separate any more? Or just build the code as part of the whole program like spin works??
So do you need two protocols, one for inter cog comms, and another for "inter C program" comms?
Ross has multithreaded c programs running afaik. I will need to extract the registry build section from c code to make c programs work under the os. What Ross has done is actually more complex.
What I am proposing is that only the cog code is loaded, like Sphinx does. Then the access is character based from spin or C programs. It is just like outputting to the stdio in CP/M. I have PC comms, 1-pin kbd and 1-pin TV working under Sphinx, and Michael had VGA, keyboard and maybe TV working (all working 18 months ago). ZiCog has been using the 1-pin drivers too.
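To make the "character based, like stdio" idea concrete, here is a minimal sketch of a one-long rendezvous between a program and a resident driver cog. The encoding (0 means empty, bit 8 set means a character is waiting) and the names `put_char`, `take_char` and `CHAR_READY` are my own assumptions for illustration, not what Sphinx or Catalina actually use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-long rendezvous: 0 means empty, otherwise bit 8 is set
   and bits 0-7 hold the character. */
#define CHAR_READY 0x100u

/* Program side: post one character. Returns 0 if the mailbox is still full. */
static int put_char(volatile uint32_t *mailbox, unsigned char c)
{
    if (*mailbox != 0)
        return 0;               /* driver has not taken the previous character */
    *mailbox = CHAR_READY | c;  /* the flag bit distinguishes a NUL byte from "empty" */
    return 1;
}

/* Driver side: take one character if present. Returns -1 when empty. */
static int take_char(volatile uint32_t *mailbox)
{
    uint32_t v = *mailbox;
    if (v == 0)
        return -1;
    assert(v & CHAR_READY);     /* anything non-zero must carry the flag bit */
    *mailbox = 0;               /* hand the mailbox back to the program */
    return (int)(v & 0xFFu);
}
```

Because the program only ever touches the one long, the driver behind it can be swapped (say TV for VGA) without the program knowing, which is the point of the proposal.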
I know catalina can run spin and pasm programs, but they are currently compiled with the c program.
Sphinx effectively separates the spin from pasm in its cog drivers. So, in fact it is simple to reload a new driver to swap say, tv to vga on the fly. The trick is to load these from sd card on the fly and I believe you have done this?
I am not trying to run multiple C programs using XMM concurrently as this has problems to overcome. But to load and run new C programs should be easier.
Once I get the basics running - loading cog drivers which remain resident and then being able to run ZiCog using these - ZiCog should be able to return to the command prompt of the prop OS instead of rebooting the prop as I am now doing.
I guess that I need a spin program that will locate (using Kye's SD driver) a driver by filename and load it into a cog.
Define memory in external ram (no need to waste precious hub ram!)
#include <stdio.h>
#include <stdlib.h> // needed for exit()
#include <string.h>
// cogjects - the sd card driver in catalina is a little slow, so load up once into external ram, then (re)load quicker from external ram
unsigned long cogject_color[512]; // external memory for color vga driver
Load the file into external ram - call this from "main"
readcog("vgagraph.cog",cogject_color); // read in kye's graphics driver
which calls this function
void readcog(char *filename, unsigned long external_cog[]) // read in a .cog file into external memory array
{
    int i;
    int b0, b1, b2, b3;
    FILE *FP1;
    if ((FP1 = fopen(filename, "rb")) == NULL) // open the file
    {
        fprintf(stderr, "Can't open file %s\n", filename);
        exit(1);
    }
    fseek(FP1, 0, SEEK_SET);
    for (i = 0; i < 24; i++)
    {
        getc(FP1); // read in the first 24 bytes and discard
    }
    i = 0;
    while (i < 505) // run until end of file or 511-6 longs
    {
        // read the bytes in separate statements - C does not define the order of several getc() calls in one expression
        b0 = getc(FP1);
        b1 = getc(FP1);
        b2 = getc(FP1);
        b3 = getc(FP1);
        if (b3 == EOF) break; // end of file (or an incomplete final long)
        external_cog[i] = (unsigned long)b0 | ((unsigned long)b1 << 8) | ((unsigned long)b2 << 16) | ((unsigned long)b3 << 24); // assemble the little-endian long
        i += 1;
    }
    fclose(FP1); // close the file
    //printf("external array cog first long = 0x%x \n",external_cog[0]); // hex value
}
and this is Ross' cog loader. Need to move the data from external ram to hub ram before loading
void external_memory_cog_load(int cognumber, unsigned long cogdata[], unsigned long parameters_array[]) // load a cog from external memory
{
    unsigned long hubcog[512]; // create a local array, this is in hub ram, not external ram (512 longs to match the copy loop below)
    int i;
    for (i = 0; i < 512; i++)
    {
        hubcog[i] = cogdata[i]; // move from external memory to a local array in hub
    }
    _coginit((int)parameters_array >> 2, (int)hubcog >> 2, cognumber); // load the cog
}
Thanks Drac. That's exactly what I am after. I would have forgotten to ignore the first 24 bytes. Of course we only require 496 longs to load into the cog.
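For anyone wanting to check the byte handling off the Prop, here is a small sketch of the same idea working on an image already in memory: skip the 24 header bytes, then assemble little-endian longs, stopping at 496. The function name `parse_cog_image` and the two constants are just illustrative names, not part of Catalina.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BINARY_HEADER_BYTES 24   /* bytes discarded at the start of the .binary file */
#define COG_LONGS           496  /* longs a cog actually loads */

/* Convert a .binary image held in memory into cog longs.
   Returns the number of complete longs written to out[]. */
static size_t parse_cog_image(const unsigned char *image, size_t len,
                              uint32_t *out, size_t max_longs)
{
    size_t n = 0;
    size_t pos = BINARY_HEADER_BYTES;

    assert(image != NULL && out != NULL);
    if (max_longs > COG_LONGS)
        max_longs = COG_LONGS;           /* never produce more than a cog can hold */

    while (n < max_longs && pos + 4 <= len) {
        out[n++] = (uint32_t)image[pos]
                 | ((uint32_t)image[pos + 1] << 8)
                 | ((uint32_t)image[pos + 2] << 16)
                 | ((uint32_t)image[pos + 3] << 24);
        pos += 4;
    }
    return n;                            /* a trailing partial long is ignored */
}
```

Capping at 496 rather than 512 would also let the hub copy buffer shrink by 64 bytes.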
I wonder if a short cog stub would be useful to load the code into the cog and eliminate the requirement for 2KB buffer in hub? Only the 512B buffer for the sd sector would be required.
I am actually thinking that with a stub cog loader (which would make the cog load slower, but I don't think that would matter) just the one 512B buffer for the SD card could be used.
The stub cog loader would be only about 30 instruction max. This would be loaded into the cog (496 longs actually loaded, but only say 30 required). This code would then set a hub flag to indicate to the OS it is loaded, the OS would then load the first 512B sector into hub and set a flag, the cog would copy this into cog and set a flag, the OS would load the next 512B sector and flag, cog copies into next section of cog and flag, etc. Now, because we know how many bytes actually need to be loaded into the cog, the OS and stub would only load this portion, thereby reducing the secondary load time. Now, before you say that the cog will be running and overloading itself, remember I can run LMM with zero footprint using the shadow registers
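The flag handshake described above can be sketched as a simulation on a PC. This is only a model under my own assumptions: the struct layout, the flag values and the function names are hypothetical, both sides run in turn in one thread, and it ignores the fact that on a real Prop the stub itself lives in the cog being filled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_BYTES 512
#define SECTOR_LONGS (SECTOR_BYTES / 4)
#define COG_LONGS    496

enum { BUF_EMPTY = 0, BUF_FULL = 1 };    /* hypothetical hub flag values */

struct loader {
    uint32_t sector[SECTOR_LONGS];       /* the single 512-byte hub buffer */
    int      flag;                       /* who owns the buffer right now */
    uint32_t cog[COG_LONGS];             /* stands in for cog RAM */
    size_t   loaded;                     /* longs copied into the cog so far */
};

/* OS side: put the next sector of the driver image into the hub buffer. */
static void os_fill_sector(struct loader *l, const uint32_t *image, size_t total)
{
    size_t n = total - l->loaded;
    assert(l->flag == BUF_EMPTY);        /* the stub must have released the buffer */
    if (n > SECTOR_LONGS) n = SECTOR_LONGS;
    memcpy(l->sector, image + l->loaded, n * sizeof(uint32_t));
    l->flag = BUF_FULL;
}

/* Stub side: copy the buffer into the next section of the cog, then release it. */
static void stub_copy_sector(struct loader *l, size_t total)
{
    size_t n = total - l->loaded;
    assert(l->flag == BUF_FULL);         /* the OS must have filled the buffer */
    if (n > SECTOR_LONGS) n = SECTOR_LONGS;
    memcpy(l->cog + l->loaded, l->sector, n * sizeof(uint32_t));
    l->loaded += n;
    l->flag = BUF_EMPTY;
}

/* Run the whole exchange; only `total` longs are moved, not the full cog. */
static int load_driver(struct loader *l, const uint32_t *image, size_t total)
{
    int sectors = 0;
    assert(total <= COG_LONGS);
    l->loaded = 0;
    l->flag = BUF_EMPTY;
    while (l->loaded < total) {
        os_fill_sector(l, image, total);
        stub_copy_sector(l, total);
        sectors++;
    }
    return sectors;
}
```

A 300-long driver would need three sector transfers here instead of four, which is the saving from knowing the driver's real length.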
Once we understand Catalina better, and with Ross's help, we can most likely run more than one Catalina program concurrently from different cogs, provided that the extra Catalina programs do not use XMM (i.e. they are self-contained within cog/hub).
That means a VT100 driver in C, or whatever else we want.
As promised elsewhere, I have updated the attachments in the first post in this thread to match what has actually been implemented in the newly released Catalina 3.5. The differences are small, and mostly to do with the way memory is managed during initialization - now the InitializeRegistry function stores the address of the lowest currently used Hub RAM Address in a fixed location in upper Hub RAM (address $7FFC on a Prop I), and each plugin allocates any buffer space it needs downwards from there as part of its Setup function and updates the value for the next plugin to use (when its Setup function is called in turn). There are also a few minor additions/changes to some of the layer 1, 2 and 3 methods that I found to be useful.
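The downward allocation from $7FFC can be modelled in a few lines of C. This is a sketch against a simulated Hub RAM array, not Catalina's code: `init_free_pointer` and `plugin_alloc` are made-up names, and rounding each request up to a whole long is my own assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HUB_SIZE      0x8000u
#define FREE_PTR_ADDR 0x7FFCu            /* fixed location named in the post above */

static unsigned char hub[HUB_SIZE];      /* simulated Hub RAM */

static uint32_t read_long(uint32_t addr)
{
    uint32_t v;
    memcpy(&v, &hub[addr], sizeof v);
    return v;
}

static void write_long(uint32_t addr, uint32_t v)
{
    memcpy(&hub[addr], &v, sizeof v);
}

/* Stand-in for registry initialisation: record the lowest Hub RAM address in use. */
static void init_free_pointer(uint32_t lowest_used)
{
    write_long(FREE_PTR_ADDR, lowest_used);
}

/* Called from a plugin's setup: carve `bytes` off below the current lowest
   address (rounded up to a long) and publish the new value for the next plugin. */
static uint32_t plugin_alloc(uint32_t bytes)
{
    uint32_t top  = read_long(FREE_PTR_ADDR);
    uint32_t size = (bytes + 3u) & ~3u;
    assert(size <= top);                 /* refuse to run off the bottom of hub */
    top -= size;
    write_long(FREE_PTR_ADDR, top);
    return top;                          /* address of the new buffer */
}
```

Each plugin only needs to know the one fixed address, so the order in which plugins are set up does not matter.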
Catalina 3.5 now provides a fairly complete implementation of layers 1,2,3 & 4. While there are still some "legacy" bits and pieces inside Catalina that do not fully adopt these techniques (this is because Catalina adopted many Spin/PASM drivers that themselves do not always conform) I will continue to work to make the internals align with the proposal whenever I find the time to do so.
Thanks Ross. Hub Ram $7FFC is a great location for storing the allocated hub ram. I am using the same location for my prop os version, so they should be compatible.
1c) If a plugin provides more than one service, its wrapper should allocate multiple service directory entries (chained together as usual) and pass the first one to the COG. The COG can then use the chain to find where the service entries are. Alternatively, the initialization code can allocate one large block (big enough for all the services provided) and pass that to the COG, allowing the COG to subdivide the block into service directory entries.
Sorry it took so long to reply to this but I'm just getting to the point where I need this information. I have a question about your proposal. The service directory entries contain the communications block used to invoke the service as I understand your proposal. So how can a COG provide multiple services by adding multiple service directory entries? Won't that mean that the COG will have to monitor multiple mailboxes at the same time? Shouldn't there be a way to indicate that one COG can handle multiple services through a single mailbox? I think Ross covers this with the indirection through his level 1 table allowing multiple service directory entries to point to the same mailbox. Maybe instead of the sdentry field being followed immediately by the mailbox address, it should be followed by a pointer to the mailbox address so the same mailbox could be shared by multiple services.
I just finished scanning your document attached to the first message in this thread. Mostly, it looks good to me although I have a concern about the fact that the service table is directly indexed. This seems like it could become a problem when the number of possible services becomes large but any individual program needs only a small subset of them. Also, it presents a problem of how to allocate service index numbers. There has to be some central authority to allocate numbers or people will step on each other. I wonder if some combination of what you've done and what Eric proposed earlier in this thread might solve this problem. I realize it introduces the need to search the service directory to find a specific entry but the results of that search could be cached by the application to avoid having to repeat the search each time the service is needed. The idea would be to use Eric's linked list approach but to have each service table entry contain not the mailbox itself but instead a pointer to the mailbox. This would allow multiple services to be handled by the same COG/mailbox and also allow a sparse list of services rather than an array where many of the entries are NULL. Does that seem reasonable? If so, I can write up a more complete proposal.
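To give a feel for the linked-list-plus-pointer idea, here is a rough sketch. The struct fields, the 16-bit service numbering and the function names are placeholders of mine, not a worked-out proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mailbox {
    volatile uint32_t request;
    volatile uint32_t result;
};

/* One entry per service; the entry points at a mailbox rather than containing
   it, so several services can share the mailbox of a single COG. */
struct service_entry {
    uint16_t              service_id;    /* hypothetical numbering scheme */
    struct mailbox       *mailbox;
    struct service_entry *next;          /* sparse list, no NULL-filled array */
};

/* Linear search of the directory; returns NULL if the service is not loaded. */
static struct service_entry *find_service(struct service_entry *head, uint16_t id)
{
    struct service_entry *e;
    for (e = head; e != NULL; e = e->next)
        if (e->service_id == id)
            return e;
    return NULL;
}

/* Application-side cache so the search is paid once per service, not per call. */
struct service_cache {
    uint16_t        service_id;
    struct mailbox *mailbox;
};

static struct mailbox *cached_mailbox(struct service_cache *c,
                                      struct service_entry *head, uint16_t id)
{
    assert(c != NULL);
    if (c->mailbox == NULL || c->service_id != id) {
        struct service_entry *e = find_service(head, id);
        c->service_id = id;
        c->mailbox = e ? e->mailbox : NULL;
    }
    return c->mailbox;
}
```

The cost is one extra pointer per entry and a search on first use; it does not remove the need for someone to hand out service numbers.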
Hi David,
By all means propose a concrete alternative. I'd be happy to consider anything that encourages wider adoption of a common solution (since it would save me work in the long run!).
However, after implementing a working version of the current proposal, I'd have to say that even this relatively simple one turned out to involve more complexity and overhead than I was really comfortable with. I am also acutely aware that many people already find it too complex, and can't see any need for it. Eric's proposal is many times more complex than mine, and would also be many times less efficient in both time and space. On that basis, the additional benefits would seem very hard to justify - especially as you still need a central authority for allocating plugin and service types. I've not found a way around that, and I'm not sure there is one.
But if you come up with a proposal that solves all these problems without incurring so much additional overhead that it becomes effectively unusable, I'm sure everyone (including me!) would adopt it.
Ross.
Okay, I'll put something together. You're right though that this has to be as simple as possible.
I have suggested that the initial allocation of 1 long and then a further allocation of 2 longs (IIRC) and then a further allocation (if required) is already overly complex. I would just rather that 4 longs be allocated initially for each cog, and then allocate another block if required. This would permit a bit of simplification in the current method Ross is using in Catalina.
Currently, the way a further block is allocated means the cog has to hold it while waiting for Catalina to complete the loading process, which unnecessarily wastes space in the cog.
If 4 longs per cog were allocated, then this would be my preference for its use...
0: b24-31 service type (as per Catalina)
...b23-0 pointer to additional hub allocation (if allocated)
1: b31-0 service mailbox as per Catalina's secondary service call (sorry, I cannot recall its name as it is on my other laptop - it is where you use the 2 long block for the service call)
2: b31-0 parameter for "1" call
3: b31-0 return parameter if required
This method also permits other uses such as a bi-directional data flow (e.g. an input and output device with a single buffer for transmit and a single buffer for receive). Here, I am thinking of a single display and keyboard driver or FDX with internal cog buffering.
My reasoning for the above is that the overhead for allocation of two buffers (which is what Catalina 3.4 presently does) is more than the 8 extra longs consumed by 4 per cog. And the extra long permits some other things to be done with this allocation.
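In C terms, the 4-longs-per-cog layout above might look like the sketch below. The struct and helper names are only for illustration; the bit split of long 0 follows the list above (b31-24 service type, b23-0 pointer).

```c
#include <assert.h>
#include <stdint.h>

#define NUM_COGS 8

/* One block per cog: 4 longs = 16 bytes, so 8 cogs take 128 bytes. */
struct cog_block {
    uint32_t type_and_ptr;  /* long 0: b31-24 service type, b23-0 pointer to extra hub allocation */
    uint32_t mailbox;       /* long 1: service mailbox */
    uint32_t parameter;     /* long 2: parameter for the call in long 1 */
    uint32_t result;        /* long 3: return parameter if required */
};

/* Build long 0 from a service type and a hub pointer (0 if nothing extra is allocated). */
static uint32_t pack_long0(uint8_t type, uint32_t hub_ptr)
{
    assert(hub_ptr <= 0xFFFFFFu);        /* pointer must fit in 24 bits */
    return ((uint32_t)type << 24) | hub_ptr;
}

static uint8_t service_type(uint32_t long0)
{
    return (uint8_t)(long0 >> 24);
}

static uint32_t extra_alloc(uint32_t long0)
{
    return long0 & 0xFFFFFFu;
}
```

With longs 2 and 3 used as separate transmit and receive slots, the same block could serve the bi-directional case without a second allocation.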
The top position should be reserved for the allocation pointer. The next question is whether this points to the next available free hub space or to the last used one. And what do we call this pointer?
eg: the pointer is at $7FFC (top of hub)
eg cog table is at $ (8 x 4longs = 32 longs = 128 bytes)
This begs the question... could we overlap these and define cog 7 as a special case ??? where the last long is reserved for the top of hub pointer???
Here is my hub allocation for my OS version...
Note I allocate 8 x 8 longs (256 bytes) in total, i.e. 8 longs (32 bytes) per cog
' OS Hub Definitions...
' ------------------ bytes
_HubFree = $7FFC ' 4 ' \ stores total hub available (typ $7800)
' | (hub is allocated to the OS above this value)
' / (eg $7000 means $0000-$6FFF is available)
_OS_DateTime = $7FF8 ' 4 'Year20xx Month Date Hours Minutes Seconds
' (00-63) (1-12) (1-31) (00-23) (00-59) (00-59)
' 000000___0000____00000___00000____000000____000000
'To convert to FAT16/32 format, shift >>1 then add 20<<25 (2000-1980)
' --> Y(7)M(4)D(5)H(5)M(6)S*2(5bits) Base=1980
' (FYI seconds in 4yrs = 126,230,400)
_OS_Rows = $7FF7 ' 1 ' \ combined <lf> and rows
_LF_MASK = $80 ' | b7 : 1= <lf> ON; 0= strip <lf>
_ROW_MASK = $7F ' / b0-6: no of rows on screen (0-127)
_OS_Columns = $7FF6 ' 1 ' no of cols on screen (0-255)
_OS_Cogs = $7FF5 ' 1 ' stay resident cogs: 1= don't stop on reboot
_OS_Clkmode = $7FF4 ' 1 ' clkmode: saved fm hub byte $0004
_OS_Clkfreq = $7FF0 ' 4 ' clkfreq(Hz): saved fm hub long $0000
_SDpins = $7FEC ' 4 ' \ sd pins packed 4 bytes
' ' / Byte 3=/CS, 2=DI, 1=CLK, 0=DO
_SIOpins = $7FE8 ' 4 ' \ serial pins and mode settings (and cog#)
' ' / Byte 3=cog, 2=mode, 1=SO, 0=SI
_SIObaud = $7FE4 ' 4 ' serial baud (speed typ 115,200)
_Hardware = $7FE0 ' 4 ' \ hardware: hi-word=company, lo-word=config
' | $0001 = Cluso99 $0001 = RamBlade 1
' | $0001 = Cluso99 $0002 = TriBlade#2
' / $0001 = Cluso99 $0003 = RamBlade3
' ^^^ may be more valuable for something else??
_AuxIn = $7FDC ' 4 ' auxiliary input rendezvous
_AuxOut = $7FD8 ' 4 ' auxiliary output rendezvous
_StdIn = $7FD4 ' 4 ' standard input rendezvous
_StdOut = $7FD0 ' 4 ' standard output rendezvous
_OSreserved = $7FC0 ' 16 ' undefined
_pBuffer_C = $7F80 ' 64 '| ' \ user buffers...
_pBuffer_B = $7F40 ' 64 '| ' | (maybe serial in and serial out buffers?)
_pBuffer_A = $7F00 ' 64 '| ' / (used to pass parameters between modules)
_pBuffer = $7F00 ' 192 '^ ' / ...joins all 3 buffers A+B+C
_pCogTables = $7E00 ' 256 ' \ 8*32B cog usage tables
' | TBD
' | Current thinking is...
' | 1 byte for "type"
' | 1 long for sector address to retrieve filename
' | However, could also be used as 2*16byte buffers
' / with the head+tail(s) kept in _StdIn & _StdOut
_pSectorBuf = $7C00 ' 512 ' SD card i/o buffer
_pSpinVector = $7800 ' 1024 ' vector table for Cluso's faster spin interpreter
_HUB_RESERVED = $7800 ' $7800-$7FFF currently reserved by the OS
_HUB_RAMSIZE = $8000 ' Total hub ram
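As a cross-check of the _OS_DateTime packing, here is a sketch in C of the 6/4/5/5/6/6-bit layout and the shift-and-add conversion to the FAT date/time long. I am assuming the year field counts from 2000, so 20 years separate it from FAT's 1980 base; the function names are mine.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a date and time as Year(6) Month(4) Date(5) Hours(5) Minutes(6) Seconds(6),
   with the year stored as an offset from 2000 (assumption, from "Year20xx"). */
static uint32_t pack_os_datetime(unsigned year, unsigned month, unsigned day,
                                 unsigned hours, unsigned minutes, unsigned seconds)
{
    assert(year >= 2000 && year <= 2063);
    assert(month >= 1 && month <= 12 && day >= 1 && day <= 31);
    assert(hours <= 23 && minutes <= 59 && seconds <= 59);
    return ((uint32_t)(year - 2000) << 26) | ((uint32_t)month << 22)
         | ((uint32_t)day << 17) | ((uint32_t)hours << 12)
         | ((uint32_t)minutes << 6) | (uint32_t)seconds;
}

/* Convert to FAT16/32 form: date word in the high 16 bits, time word in the low 16.
   The shift halves the seconds field; the add rebases the year from 2000 to 1980. */
static uint32_t os_to_fat_datetime(uint32_t os)
{
    return (os >> 1) + (20u << 25);
}
```

The nice property of this layout is that the conversion needs no field-by-field unpacking, just one shift and one add.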
Does the Catalina registry work with your EEPROM loader? How does the loader know what size mailboxes to setup for drivers that it loads in advance of loading the user code from the upper part of EEPROM? Does the first-stage loader setup the registry so it is in place when the user program starts?
Thanks,
David
I think I've answered my own question by reading further in the Catalina documentation. It looks like the first stage loaders do setup the initial registry before starting the user code. That is my plan as well.
Yes, that's right. The first stage of the load sets up the registry and loads all the drivers. But instead of then loading and starting the kernel, it then loads a second stage loader that does the rest.
Ross.
I guess I misunderstood again. I thought the first stage loader was the one built into the Propeller chip and that the second stage was the part that loaded the drivers and started the kernel. What does this extra stage you mentioned do and how does it operate if the user's drivers are already running and possibly taking up all of the COGs?
Yes, apologies - the terminology is perhaps a bit loose here. Some of the Catalina loaders actually have more than two phases - they may have up to four, depending on the source (EEPROM, SDCARD, serial) and the destination (Hub RAM, XMM RAM, Flash). I just tend to lump them all together and call them "two-stage" loaders, but "multi-stage" loaders would probably be a better term.
Your description of the first phase (the built-in loader) should probably be considered "phase 0" - its job is to load the phase 1 loader - it knows nothing about any languages at all (Spin, C, Forth, native PASM, LMM PASM). Phase 0 just knows about bytes - but it also knows how to start phase 1. Phase 1 (always a Spin program) knows only about plugins and registries, and puts those in place - but it also knows how to start phase 2. Phase 2 knows about program segments, and how to put those in the appropriate places in Hub RAM and/or XMM RAM. The last thing phase 2 does is start the kernel.
The most complex is probably the SDCARD loader - a description of the phases it uses is given in the Catalina Reference Manual, and your phase description corresponds most closely to phases 3 and 4 - i.e. it is phase 3 that loads the plugins, and phase 4 that loads the code into its final destination and starts the kernel.
The problem you mention about plugins taking up almost all the Hub RAM does occur, and is what makes the loading so complex. It is important to make the whole load process use no additional cogs (since that would prevent you loading all the user's plugins), and I also try to make it use less than 1kb of Hub RAM, including the registry (which has to be there from phase 1 onwards).
As I said, the most complex loader is the SDCARD loader, where most of the complexity (it is now even more complex than the description given in the reference manual, which I need to update!) derives from my desire to keep the overhead even lower - down to around 512 bytes if possible, allowing C programs to occupy nearly the full 32kb of Hub RAM.
Ross.
Thanks Ross! I think I'm finally starting to understand your Catalina runtime structure better now that I'm looking at how to do similar things for PropGCC. Targets and Platforms make sense to me now, as do your multiple load phases and what you do in each phase. I actually did read the section of the Catalina manual about load phases, but only after I posted the message you originally replied to, and I had forgotten which phase numbers were which. Anyway, thanks for taking time to explain this.
No worries, David. I look forward to GCC eventually catching up to what Catalina has been able to do for several years now
Ross.
Yes, it does take some time to work these things out. We have some of this already but we're missing the loader stage where all of the COG drivers are loaded in advance of the user program to save hub memory. We do have a way to load COGs from images in high EEPROM but the loading is done by the user program not by another loader stage.
Sorry it took so long to reply to this but I'm just getting to the point where I need this information. I have a question about your proposal. The service directory entries contain the communications block used to invoke the service as I understand your proposal. So how can a COG provide multiple services by adding multiple service directory entries? Won't that mean that the COG will have to monitor multiple mailboxes at the same time? Shouldn't there be a way to indicate that one COG can handle multiple services through a single mailbox? I think Ross covers this with the indirection through his level 1 table allowing multiple service directory entries to point to the same mailbox. Maybe instead of the sdentry field being followed immediately by the mailbox address, it should be followed by a pointer to the mailbox address so the same mailbox could be shared by multiple services.
I just finished scanning your document attached to the first message in this thread. Mostly, it looks good to me although I have a concern about the fact that the service table is directly indexed. This seems like it could become a problem when the number of possible services becomes large but any individual program needs only a small subset of them. Also, it presents a problem of how to allocate service index numbers. There has to be some central authority to allocate numbers or people will step on each other. I wonder if some combination of what you've done and what Eric proposed earlier in this thread might solve this problem. I realize it introduces the need to search the service directory to find a specific entry but the results of that search could be cached by the application to avoid having to repeat the search each time the service is needed. The idea would be to use Eric's linked list approach but to have each service table entry contain not the mailbox itself but instead a pointer to the mailbox. This would allow multiple services to be handled by the same COG/mailbox and also allow a sparse list of services rather than an array where many of the entries are NULL. Does that seem reasonable? If so, I can write up a more complete proposal.
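As a rough illustration of the idea (not a proposal for the actual layout), here is a small C sketch of a linked-list service directory where each entry holds a pointer to a mailbox rather than the mailbox itself. The struct fields, the function names and the service ID values used in any example are all hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical mailbox: one request long and one result long. */
typedef struct {
    volatile uint32_t request;
    volatile uint32_t result;
} mailbox_t;

/* One directory entry per service. The entry points at a mailbox,
   so several services can share the mailbox of a single COG. */
typedef struct sd_entry {
    struct sd_entry *next;
    uint32_t         service_id;
    mailbox_t       *mailbox;
} sd_entry_t;

/* Link a new entry at the head of the list and return the new head. */
static sd_entry_t *sd_register(sd_entry_t *head, sd_entry_t *e,
                               uint32_t service_id, mailbox_t *mb)
{
    assert(e != NULL && mb != NULL);
    e->next = head;
    e->service_id = service_id;
    e->mailbox = mb;
    return e;
}

/* Linear search. The caller keeps the returned pointer, so the
   search is only paid once per service. */
static mailbox_t *sd_find(const sd_entry_t *head, uint32_t service_id)
{
    for (; head != NULL; head = head->next)
        if (head->service_id == service_id)
            return head->mailbox;
    return NULL;
}
```

An application would call `sd_find` once at startup, store the result in a variable, and use that cached pointer for every later request. A COG offering, say, both a transmit and a receive service would register two entries that point at the same mailbox, so it still only has one mailbox to monitor.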
Thanks,
David
Hi David,
By all means propose a concrete alternative. I'd be happy to consider anything that encourages wider adoption of a common solution (since it would save me work in the long run!).
However, after implementing a working version of the current proposal, I'd have to say that even this relatively simple one turned out to involve more complexity and overhead than I was really comfortable with. I am also acutely aware that many people already find it too complex, and can't see any need for it. Eric's proposal is many times more complex than mine, and would also be many times less efficient in both time and space. On that basis, the additional benefits would seem very hard to justify - especially as you still need a central authority for allocating plugin and service types. I've not found a way around that, and I'm not sure there is one.
But if you come up with a proposal that solves all these problems without incurring so much additional overhead that it becomes effectively unusable, I'm sure everyone (including me!) would adopt it.
Ross.
I have suggested that the initial allocation of 1 long, then a further allocation of 2 longs (IIRC), and then a further allocation (if required) is already overly complex. I would rather that 4 longs be allocated initially for each cog, with another block allocated only if required. This would permit a bit of simplification in the method Ross is currently using in Catalina.
Currently, the way in which a further block is allocated means it has to be held by the cog while waiting for Catalina to complete the loading process, which unnecessarily wastes space in the cog.
If 4 longs per cog were allocated, then this would be my preference for its use...
0: b24-31 service type (as per Catalina)
...b23-0 pointer to additional hub allocation (if allocated)
1: b31-0 service mailbox as per Catalina's secondary service call (sorry, cannot recall its name (other laptop) - where you use the 2 long block for the service call)
2: b31-0 parameter for "1" call
3: b31-0 return parameter if required
This method also permits other uses such as a bi-directional data flow (eg: an input and output device with a single buffer for transmit and a single buffer for receive). Here, I am thinking of a single display and keyboard driver or FDX with internal cog buffering.
My reasoning for the above is that the overhead for allocation of two buffers (which is what Catalina 3.4 presently does) is more than the 8 extra longs consumed by 4 per cog. And the extra long permits some other things to be done with this allocation.
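To show what the 4-long entry above might look like from C, here is a minimal sketch. The bit positions of long 0 (service type in b31-24, pointer in b23-0) and the roles of longs 1 to 3 are taken from the list above; the struct and function names are hypothetical, and nothing here is Catalina's actual registry code.

```c
#include <stdint.h>
#include <assert.h>

#define NUM_COGS 8

/* One 4-long entry per cog, following the list above. */
typedef struct {
    uint32_t type_and_ptr;  /* long 0: b31-24 service type,
                                       b23-0 pointer to extra hub block */
    uint32_t mailbox;       /* long 1: service mailbox */
    uint32_t param;         /* long 2: parameter for the long 1 call */
    uint32_t result;        /* long 3: return parameter if required */
} cog_entry_t;

/* Pack the service type and the 24-bit pointer into long 0. */
static uint32_t pack_long0(uint8_t service_type, uint32_t extra_ptr)
{
    assert(extra_ptr <= 0x00FFFFFFu);   /* must fit in b23-0 */
    return ((uint32_t)service_type << 24) | extra_ptr;
}

/* Extract the service type (b31-24) from long 0. */
static uint8_t long0_type(uint32_t long0)
{
    return (uint8_t)(long0 >> 24);
}

/* Extract the pointer (b23-0) from long 0; 0 could mean "none". */
static uint32_t long0_ptr(uint32_t long0)
{
    return long0 & 0x00FFFFFFu;
}
```

A table declared as `cog_entry_t table[NUM_COGS]` comes to 8 x 4 longs = 128 bytes, and a 24-bit pointer field is far more than is needed to address the 32KB hub.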
The top position should be reserved for the allocation pointer. Next is the question: does this point to the next available free hub space, or the last used ??? And what do we call this pointer???
eg: the pointer is at $7FFC (top of hub)
eg cog table is at $ (8 x 4 longs = 32 longs = 128 bytes)
This begs the question... could we overlap these and define cog 7 as a special case ??? where the last long is reserved for the top of hub pointer???
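One possible reading of the overlap idea, worked through as address arithmetic in C. This is purely an assumption for illustration: it supposes the 128-byte table sits at the very top of the 32KB hub with the cogs in ascending order, which is not necessarily the layout intended above. All names are hypothetical.

```c
#include <stdint.h>
#include <assert.h>

#define HUB_TOP       0x8000u   /* first address past 32KB of hub */
#define NUM_COGS      8u
#define LONGS_PER_COG 4u
#define TABLE_BYTES   (NUM_COGS * LONGS_PER_COG * 4u)   /* 128 bytes */

/* Assumed layout: the table occupies the top 128 bytes of hub. */
static uint32_t table_base(void)
{
    return HUB_TOP - TABLE_BYTES;
}

/* Hub address of long `index` (0..3) in the entry for `cog` (0..7). */
static uint32_t entry_long_addr(uint32_t cog, uint32_t index)
{
    assert(cog < NUM_COGS && index < LONGS_PER_COG);
    return table_base() + (cog * LONGS_PER_COG + index) * 4u;
}
```

Under that assumption the last long of cog 7's entry falls at $7FFC, the same address as the top-of-hub pointer, so cog 7 would effectively have only 3 usable longs.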
Here is my hub allocation for my OS version...
Note I allocate 8 longs per cog (8 x 8 longs = 256 bytes in total)
Does the Catalina registry work with your EEPROM loader? How does the loader know what size mailboxes to setup for drivers that it loads in advance of loading the user code from the upper part of EEPROM? Does the first-stage loader setup the registry so it is in place when the user program starts?
Thanks,
David
Yes, that's right. The first stage of the load sets up the registry and loads all the drivers. But instead of then loading and starting the kernel, it then loads a second stage loader that does the rest.
Ross.
Yes, apologies - the terminology is perhaps a bit loose here. Some of the Catalina loaders actually have more than two phases - they may have up to four, depending on the source (EEPROM, SDCARD, serial) and the destination (Hub RAM, XMM RAM, Flash). I just tend to lump them all together and call them "two-stage" loaders, but "multi-stage" loaders would probably be a better term.
Your description of the first phase (the built-in loader) should probably be considered "phase 0" - its job is to load the phase 1 loader - it knows nothing about any languages at all (Spin, C, Forth, native PASM, LMM PASM). Phase 0 just knows about bytes - but it also knows how to start phase 1. Phase 1 (always a Spin program) knows only about plugins and registries, and puts those in place - but it also knows how to start phase 2. Phase 2 knows about program segments, and how to put those in the appropriate places in Hub RAM and/or XMM RAM. The last thing phase 2 does is start the kernel.
The most complex is probably the SDCARD loader - a description of the phases it uses is given in the Catalina Reference Manual, and your phase description corresponds most closely to phases 3 and 4 - i.e. it is phase 3 that loads the plugins, and phase 4 that loads the code into its final destination and starts the kernel.
The problem you mention about plugins taking up almost all the Hub RAM does occur, and is what makes the loading so complex. Including the registry (which has to be there from phase 1 onwards) it is important to make the whole load process use no additional cogs (since that would prevent you loading all the user's plugins) and I also try to make the loaders use less than 1kb of Hub RAM.
As I said, the most complex loader is the SDCARD loader, where most of the complexity (it is now even more complex than the description given in the reference manual, which I need to update!) derives from my desire to keep the overhead even lower - down to around 512 bytes if possible, allowing C programs to occupy nearly the full 32kb of Hub RAM.
Ross.
No worries, David. I look forward to GCC eventually catching up to what Catalina has been able to do for several years now.
Ross.