Questions regarding Catalina
Dr_Acula
I've been thinking about propeller languages that can use more memory on the propeller. There are languages that can use more memory but they seem to have limitations. Spin runs out at 32k (less if you have leftover memory from loading cogs). CP/M runs out at about 48k. I wonder if there is a language that can truly take advantage of a large memory space that is 'flat', i.e. with no complexities in moving around within the memory space, so that you could simply write something like DIM MYARRAY(200000)
This is a bit complicated to explain, and in particular, to explain where existing languages fall short. I'll start with this description of Catalina, as I think this language might be getting the closest to achieving a large flat memory model. From the Catalina website:
* Tiny - all code and data share the 32Kb of Hub RAM. This mode is used by all LMM and EMM programs, and is suitable for use on any Propeller platform.
* Small - code can be up to 16Mb, but all data (including the stack and heap) must still share the 32kb of Hub RAM. This is the original XMM mode as implemented in the various beta releases. This mode requires dedicated XMM hardware (i.e. external SRAM). Currently supported are the Hydra and Hybrid (using the HX512 external SRAM card), the TriBladeProp, RamBlade, DracBlade and Morpheus.
* Large - code, data and heap can be up to 16Mb (combined), and only the stack uses the 32Kb of Hub RAM. This mode uses a completely new code generator (the previous code generator remains in use for the other modes), and also an enhanced XMM Kernel. This requires the same dedicated XMM hardware as the Small mode. When the Prop II eventually surfaces, the space available for stack under the Large addressing model is expected to be increased to 256Mb. However, note that 'larger' is not always 'better' - programs that use the larger addressing modes will generally be slower than programs that use the smaller addressing modes. A program should always use the smallest addressing mode it can.
Spin is great, but up to 14k of the 32k can end up being wasted loading up cogs and that memory ends up at random bits scattered all over the 32k, so it is hard (though not impossible) to reuse. Propeller ram is precious though as it can be used for far more productive things like big screen buffers. Standalone pasm cog loaders are difficult as virtually every object in the Obex is a mixture of pasm and spin, and these are inextricably linked.
In an attempt to articulate the design challenge, consider a propeller where you have an sd card, external memory (serial/parallel) and no code is allowed to reside in hub. The only thing the hub can be used for is as a temporary store for moving data into a cog. After that has been done, all the hub is then used as a screen buffer.
This might seem crazy, but it forces a rethinking of how a language works, and in particular, it takes one down a path that I think is the same as the 'large' memory model for Catalina.
Ok, perhaps to explain this better, consider a typical object. Let's take the keyboard object. In spin we have
PUB key : keycode

'' Get key (never waits)
'' returns key (0 if buffer empty)

  if par_tail <> par_head
    keycode := par_keys.word[par_tail]
    par_tail := ++par_tail & $F
in the VAR section we have
VAR

  long  cog

  long  par_tail        'key buffer tail        read/write      (19 contiguous longs)
  long  par_head        'key buffer head        read-only
  long  par_present     'keyboard present       read-only
  long  par_states[8]   'key states (256 bits)  read-only
  long  par_keys[8]     'key buffer (16 words)  read-only       (also used to pass initial parameters)
and when the cog loads up we have
okay := cog := cognew(@entry, @par_tail) + 1
and then we have some PASM
What is going on here is really quite subtle. The startup passes a location @par_tail, which on the face of it is passing the 'tail' pointer, but it isn't, it is actually passing the entire list of variables and arrays in the VAR section, which just happens to start with the par_tail variable.
This list becomes the glue that binds Spin and PASM together and enables easy passing of values from Spin (via a calling method) into PASM, and to get data out again.
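To make the "glue" concrete, here is a minimal sketch of how the same 19-long block might look from C. The type name `kb_block_t` and the function `kb_key` are hypothetical, made up for illustration; only the field order and the logic of the Spin `key` method come from the object quoted above. It assumes the Propeller's little-endian layout, so word 0 of `par_keys` is the low half of the first long.

```c
#include <stdint.h>

/* Hypothetical C mirror of the 19 contiguous longs that start at par_tail.
   The order must match the Spin VAR section, because the PASM driver is
   only given the base address and finds everything else by offset. */
typedef struct {
    volatile uint32_t tail;       /* key buffer tail (read/write)            */
    volatile uint32_t head;       /* key buffer head (read-only from C)      */
    volatile uint32_t present;    /* keyboard present (read-only from C)     */
    volatile uint32_t states[8];  /* key states, 256 bits                    */
    volatile uint32_t keys[8];    /* key buffer, 16 words packed two per long */
} kb_block_t;

/* Compile-time check that the block really is 3 + 8 + 8 = 19 longs. */
_Static_assert(sizeof(kb_block_t) == 19 * sizeof(uint32_t),
               "keyboard block must be 19 longs");

/* Line-for-line equivalent of the Spin `key` method:
   returns the next key code, or 0 if the buffer is empty. Never waits. */
uint32_t kb_key(kb_block_t *kb)
{
    uint32_t keycode = 0;
    uint32_t t = kb->tail;

    if (t != kb->head) {
        /* par_keys.word[t]: word t sits in long t/2; even words are the low half. */
        keycode = (kb->keys[t >> 1] >> ((t & 1u) * 16u)) & 0xFFFFu;
        kb->tail = (t + 1u) & 0xFu;   /* same as ++par_tail & $F */
    }
    return keycode;
}
```

On real hardware the address of such a block would be what gets passed as the PAR value when the driver cog is started, in place of `@par_tail`.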
It is this glue that means very few PASM programs can be considered on their own, and hence any BIG program has to have a high level language glued to PASM. So we don't see much in the way of PASM programs being loaded off an SD card, and hence virtually all the propeller programs end up with up to 14k of wasted space inside the hub ram from loading up all the cogs.
Is there another way??
I don't know if the large memory model of Catalina can support inline PASM code, but even if it can't, then perhaps this might be possible.
Consider a program in C that is 400k long and is loaded from SD card into a 512k memory chip. This program has code which replicates existing objects (translating line for line from Spin to C ought to be possible). First bit is a main, and that calls a 'keyboard load' function and that function is a replica of the Spin keyboard object. It has a VAR section with a list of arrays and variables. It has the equivalent of PUB methods called from other functions, such as GetKey. And it has some PASM and a cog loader.
Armed with this information, the compiler ought to be able to glue the PASM and C together. When a pseudo cogstart is executed in C, this is now code running in external memory. So the first thing is to take the next 2k of data (which will be PASM code) and move it from external memory to a fixed location in the propeller hub, and to then execute a real cogstart, to move that 2k into the cog. Of course, the second time this is done, the same hub location can be used as a temporary buffer.
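The staging idea can be sketched in plain C. This is a host-side simulation, not Propeller code: `xmm_ram`, `hub_staging`, `cog_ram`, `fake_coginit` and `pseudo_cogstart` are all hypothetical names, and the arrays simply stand in for external RAM, the one reusable hub buffer and the eight cogs.

```c
#include <stdint.h>
#include <string.h>

#define COG_IMAGE_BYTES 2048u   /* 512 longs: the "2k" chunk mentioned above */
#define NUM_COGS        8

static uint8_t xmm_ram[64 * 1024];                /* stand-in for external RAM      */
static uint8_t hub_staging[COG_IMAGE_BYTES];      /* single hub buffer, reused      */
static uint8_t cog_ram[NUM_COGS][COG_IMAGE_BYTES];/* stand-in for the cogs' own RAM */

/* Hypothetical stand-in for the real coginit: the hardware copies a
   cog image out of hub RAM into the chosen cog. */
static void fake_coginit(int cog, const uint8_t *hub_image)
{
    memcpy(cog_ram[cog], hub_image, COG_IMAGE_BYTES);
}

/* Copy one PASM image from external RAM into the hub staging buffer,
   then start it in a cog. Returns the cog number, or -1 on bad arguments. */
int pseudo_cogstart(int cog, uint32_t xmm_offset)
{
    if (cog < 0 || cog >= NUM_COGS)
        return -1;
    if (xmm_offset > sizeof xmm_ram - COG_IMAGE_BYTES)
        return -1;

    memcpy(hub_staging, &xmm_ram[xmm_offset], COG_IMAGE_BYTES);
    fake_coginit(cog, hub_staging);
    return cog;
}
```

One thing the simulation hides: on the real chip the cog takes a little while to pull its image out of hub after coginit is issued, so the staging buffer should not be overwritten with the next image until that load has finished.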
Another thing to consider is how cogs interact with external memory. They might have to include some ram driver code (20 longs?). Or maybe they do interact via hub ram locations, though it would make sense to try to keep these to a minimum. Perhaps allocate 1k of hub ram for data, flags and for the Catalina stack.
The last cog one might load would be the display driver, which then uses 31k of hub ram for a big screen buffer (which then translates to more pixels, more color depth etc).
I think I can see a way to write functions in C itself to load up cogs (via hub), even when the C is running in external memory. And perhaps it is possible to make it easier to conserve precious hub ram for communications, maybe by putting all the list of communication arrays in one location within the C program so you can see the list growing and keep an eye on its size (e.g. with the keyboard example above, we can see it is 3+8+8=19 longs).
What I'm not sure about is whether Catalina can compile PASM into long bytecodes? Even if the bytecode means nothing to Catalina, it can then be moved as a block into cogs. Does Catalina contain a PASM compiler, and if not, is it possible to add one in from other sources (eg BST)?
Comments
I'm not sure you will ever see languages that treat external memory and HUB memory as continuous and "flat". I thought about doing this for the Zog interpreter for C code. I rejected the idea because:
1) There is a performance hit in the required checking of every memory access to see whether it is for HUB or Ext RAM and mapping addresses appropriately.
2) When Zog is running C code from Ext RAM it is possible, and quite likely desirable, to have normal Spin objects running in HUB as well. So it's better if Zog cannot easily tramp over the HUB with such things as declaring large arrays.
3) Zog gives programs running from external RAM a way to read/write LONGs from HUB with minimal impact on code execution performance. This is sufficient to enable use of shared memory between C and Spin code. And sufficient to allow Zog to load PASM code blobs which have to be in HUB at the time.
I think a flat HUB/Ext memory model is not necessarily useful or desirable.
and a lot of relevant stuff...
Yes. Thinking about just Spin and HUB for a moment imagine this scenario:
1) You know your application needs;
a) A serial console/debug port. (Uses 1 Cog for PASM)
b) A floating point co-processor (1 Cog for PASM)
c) Perhaps some video ( 1 Cog for PASM)
d) Whatever other PASM drivers you need (n COGs)
2) On start up your application has a "bootstrap" mechanism that fetches all the PASM binary blobs required, one at a time, from the high end of a 64K EEPROM or SD card or whatever. It then starts that PASM in a COG and moves on to the next one.
3) Each PASM blob is started by the bootstrap loader with the required PAR parameters and is given a fixed address in HUB memory to use for its communication with the outside world when it is running. This area could contain a simple "mailbox" and/or buffer area.
4) When the loader is done it can forget about whatever buffer space was used to load the COGs. It then starts up a second stage of "bootstrap" which is entirely in PASM in a COG. This second stage no longer needs to care about the HUB memory occupied by the 1st stage loader code. As it is running from COG PASM it has total control over all of the HUB space, except the predefined "mailbox" areas. This 2nd stage loader can pull in the actual Spin application and load it to HUB starting at address zero. When the 2nd stage is done it starts the application in its own COG, thus recycling the loader COG, and the application can be as big as 32K minus a little bit for the "mailbox" areas.
The problem is that the whole Spin object model works against this. As you point out, PASM code is tightly bound to the Spin object that starts it and the interface methods that use it. Reality is that PASM blobs don't need those Spin object wrappers. All it needs is for PASM drivers to be coded in such a way that they only need their PAR parameters fed in on start up and a memory area to interface to the world with when running.
Bill Henning and others have thought about this and Bill at least is tackling the problem.
Right now there is a ZOG loader that can do most of this for C code. It can start FullDuplexSerial, Float32 and VMCog running in COGs. Then it can start the Zog interpreter giving C code free rein over the 32K of HUB. All that is missing is the ability to get the PASM blobs or Zog executable from SD card, high end EEPROM or whatever storage.
I don't know but I suspect Catalina already does something similar.
Yes. The entire scenario I outline above relies on a feature of the BST compiler. There is an option to BSTC that causes it to compile Spin objects and throw all the Spin byte codes away outputting a binary "blob" containing only the compiled PASM instructions.
Once you have that you can throw those blobs around as you please, including in Zog's case linking them into C objects.
There is more to be done here...
ICC can do that too. All of the ICC compatible modules relying on PASM in OBEX use this method. It should be possible to put all the PASM arrays into one place and recycle them in C.
Catalina already does something quite close to what you are suggesting. With the LARGE memory model, I use a truly flat addressing scheme - all memory addresses start from zero, but the first 32k (from 0000 hex to 7FFF hex) refer to hub addresses. The Catalina XMM kernel checks for this in all RAM accesses, and if the address is less than 8000h it uses hub RAM. If the address is greater or equal to 8000h it uses XMM RAM (subtracting 8000h to avoid wasting the first 32k of XMM RAM).
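The address check described here can be sketched in a few lines of C. This is only an illustration of the rule as stated (below 8000h is hub, otherwise XMM with 8000h subtracted), not the kernel's actual code, which is PASM; `flat_decode`, `flat_read_byte`, `flat_write_byte` and the two arrays are hypothetical names, and the 64k XMM size is an arbitrary choice for the simulation.

```c
#include <stdint.h>

#define HUB_SIZE 0x8000u           /* 32k of hub RAM: flat addresses 0000h..7FFFh */

static uint8_t hub_ram[HUB_SIZE];  /* stand-in for hub RAM                        */
static uint8_t xmm_ram[0x10000];   /* stand-in for 64k of external (XMM) RAM      */

typedef struct {
    int      in_hub;   /* 1 if the flat address maps to hub RAM, 0 for XMM */
    uint32_t offset;   /* offset inside whichever memory was selected      */
} phys_addr_t;

/* Split a flat address into (which memory, offset within it). */
phys_addr_t flat_decode(uint32_t addr)
{
    phys_addr_t p;
    if (addr < HUB_SIZE) {
        p.in_hub = 1;
        p.offset = addr;
    } else {
        p.in_hub = 0;
        p.offset = addr - HUB_SIZE;   /* so XMM offset 0 is not wasted */
    }
    return p;
}

/* Read one byte through the flat address space (0 if out of range). */
uint8_t flat_read_byte(uint32_t addr)
{
    phys_addr_t p = flat_decode(addr);
    if (p.in_hub)
        return hub_ram[p.offset];
    return (p.offset < sizeof xmm_ram) ? xmm_ram[p.offset] : 0;
}

/* Write one byte through the flat address space (ignored if out of range). */
void flat_write_byte(uint32_t addr, uint8_t value)
{
    phys_addr_t p = flat_decode(addr);
    if (p.in_hub)
        hub_ram[p.offset] = value;
    else if (p.offset < sizeof xmm_ram)
        xmm_ram[p.offset] = value;
}
```

The compare-and-branch in `flat_decode` is the per-access cost Heater mentioned: every load and store has to go through it.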
Heater is correct that there is a slight performance penalty for this, but there is such a large performance penalty in executing from XMM RAM anyway that I don't think it matters much. With fast XMM RAM the result is that C executes from XMM RAM at speeds comparable to SPIN executing from Hub RAM, so I think the overhead is worthwhile.
Also, it is possible to include hand-crafted PASM in a Catalina program - but if you want this PASM callable directly from C like a C function, then it has to be written in LMM PASM - which means there are a few instructions that are not permitted, and a few conventions that have to be adopted. This is described in the Catalina Reference Manual. You can also write a normal PASM program and simply launch it from C - an example of this is given in the demo\spinc folder (which illustrates the mechanism Jazzed describes, and actually uses his "spinc" program).
Note that all the Catalina drivers started life as normal PASM drivers, and therefore communicating with them from a C program executing from XMM is a little complex. But the key thing to remember is that even when using the LARGE memory model, both the stack and all local variables are stored in hub RAM - so when you need to call a driver function, you must do it using only local variables. You can see examples of this in the Catalina t_string function:
Of course, from C this is completely invisible to you, and you just call the t_string function as normal, passing in the address of a string either in Hub RAM or XMM RAM.
Ross.
Forgot to answer this bit. Catalina does not produce objects or binaries like a normal compiler. Instead it generates pure PASM, and does all its library management and linking at the PASM source level. Once it has generated all the PASM it needs, and combined it into a single file, it then uses Homespun to compile it. It originally could use either BST or Homespun (or in fact the Parallax Propeller Tool) but I found supporting multiple PASM compilers was getting too complex, so I settled on Homespun.
I intend to revisit this at some point since Michael Park has indicated he probably won't be supporting the Prop II in Homespun, whereas Brad intends to in BST.
Ross.
With fast XMM RAM the result is that C executes from XMM RAM at speeds comparable to SPIN executing from Hub RAM
That will be more than fast enough for me.
Ok, I'm still a bit confused as to how Catalina works with large memory. Say we have a program written in C. And say it is 400k long. And say it fits into memory where the first 32k is hub and the rest is external memory. And say we start our program with dimensioning a big byte array that happens to be 32768 bytes long. Then we have some code, which I presume the compiler has turned into assembly. Is there some sort of program, possibly running in a cog, that takes the first long of that program, moves it from external ram into hub or a cog, and runs it? (Is this the HMI manager?) Similar to the way the spin interpreter gets longs from hub and interprets them? Is this program the XMM manager, or linked to it in some way?
I'm intrigued that cogs can be loaded. I don't think I need LMM - just the ability to load up cogs with code, but load them in such a way that there are no bits of essential link code left in hub ram. Ideally I want to be able to completely (or almost completely) wipe the hub ram after all the cogs are loaded.
I say 'almost completely' because I note there is not just the stack, but there are also local variables stored in hub. Does this mean one would need to be careful not to declare large local arrays within a function? Would it be better to declare large common arrays, and share data that way? And if you did declare a large common array at the beginning of a C program, does this mean it ends up in hub, and if so, can it be forced to be in external memory, by, say, declaring a dummy 32k array as the first declaration?
Of course, one alternative to all this is to code objects with no supporting high level code. There is an example of this in pullmoll's CP/M emulation where the keyboard has been stripped right back to just having one bit of high level code, which is there as a dummy bit of code to pass the link of the start of the arrays/variable list.
Not impossible to do with some objects, but I think it might be daunting to do it for everything in the Obex. I have a crazy idea it might be easier to use Obex code as it is and translate (or write a translator) that converts Spin to C.
Or... maybe think about just doing the essential Obex code, which I think would be keyboard, mouse, serial, display driver (and more, no doubt).
I think all these would still need to communicate through the hub though, as only one bit of code can handle external ram access (unless you have locks) and so this XMM manager would act as a postbox and check periodically for new data in certain hub locations and pass it through to external ram.
Hmm - this is complicated to think about. Though I think it could be possible to create a skeleton program that comes up when you type "new" and then you build on that by copying how things are done - eg a skeleton program with mouse, keyboard, serial and display driver. I'm thinking about not only whether this is possible, but whether it is also easy to use (particularly for newcomers).
Where I'd like to see this heading is a generic board with serial ram, so there are lots of pins free, and a generic program that provides all the basic I/O ready to go, so you can get straight into programming "hello world" without having to think about which buffer the 'h' goes to before it goes to the screen buffer.
I'm afraid it gets complex. Catalina divides RAM up into 5 types (usually called 'segments'):
- CODE: (all compiled PASM)
- CNST: Constant Static Data (string literals and other constant values)
- INIT: Read/Write Static Data (static variables, similar to the SPIN 'VAR' segment)
- DATA: Dynamic Data (heap - i.e. space allocated with 'malloc')
- (unnamed): Stack (call stack and also local variables)
The arrangement of these segments in memory is what the Catalina memory models are about:
- The TINY model puts all segments in Hub RAM.
- The SMALL model puts CODE in XMM RAM and all the rest in Hub RAM.
- The LARGE model puts CODE, CNST, INIT and DATA in XMM RAM, and only leaves the stack in Hub RAM. One implication of this is that the total of the call stack and local variables MUST be less than 32k.
Let's assume you are using the LARGE model. In your example, the CODE segment would be 400k - it would start at address 8000h and go to 6C000h. Then would come the CNST segment. Then would come the INIT segment. The remainder of available XMM RAM is used for the DATA segment. If you declared a 32k array outside all functions it would end up in the INIT segment. If you declared a 32k array within a function, it would (by default) be allocated on the stack and would blow the 32k stack/local variable limit. To avoid this, it would have to be declared 'static'.
The use of the code segment is fairly obvious. To illustrate the use of the various types of data segment, consider the following C code (which I am assuming is compiled using the LARGE memory model):
So you can see that you can have arrays of 32k (or larger) - but they must be allocated in XMM RAM, not in the Hub RAM (i.e. if they are local to a function, they have to be declared as 'static' - which forces them into XMM RAM).
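The listing that went with this post is not shown here. As a stand-in, here is a minimal sketch of a C file that touches each kind of data segment. All the names (`global_buffer`, `greeting`, `scratch_static`, `scratch_heap`, `small_local`) are made up for illustration, and the segment named in each comment is a reading of the description above under the LARGE model, not output from Catalina.

```c
#include <stdlib.h>
#include <string.h>

#define BIG 32768

/* Declared outside all functions: read/write static data, so INIT segment
   (XMM RAM under the LARGE model). */
char global_buffer[BIG];

/* The string literal is constant data (CNST segment); the pointer
   variable itself is read/write static data (INIT segment). */
const char *greeting = "hello";

/* A big array inside a function must be 'static' to stay off the stack.
   Declared like this it is static data (INIT), not a local. */
char *scratch_static(void)
{
    static char scratch[BIG];
    return scratch;
}

/* Space from malloc comes from the heap, i.e. the DATA segment.
   The caller is responsible for calling free(). */
char *scratch_heap(void)
{
    return malloc(BIG);
}

/* An ordinary (automatic) local lives on the stack, which stays in
   Hub RAM even under the LARGE model, so it has to be kept small. */
size_t small_local(void)
{
    char small[64];
    strcpy(small, greeting);
    return strlen(small);
}
```

Removing the `static` keyword from `scratch` would turn it into a 32k automatic array, which is exactly the case that would exceed the stack limit described above.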
If you were to save the above program as test.c, and then compile it using a command like:
Then look at the resulting test.lst file you will begin to get more of an understanding. In particular, look for the occurrences of the following symbols in the listing:
The last segment - i.e. the Stack - is unnamed because it is constructed "on the fly" by the program at runtime, so you won't see any reference to it in the listing.
However, unlike SPIN programs, the resulting Catalina program cannot simply be loaded and executed as it is shown in the listing. Catalina must do a lot of messing about to take the binary program and get it to the point where it can be executed - this is the job of the various Catalina loaders. What they do is:
- Read the binary (e.g. from SD or EEPROM)
- Rearrange the segments into Hub RAM and XMM RAM as determined by the memory model in use, and also the values of the various symbols listed above
- Load the XMM interpreter into a cog.
- Start the XMM interpreter, pointed at the program entry point.
Of course, this is a gross over-simplification - there is a lot of other messing about that needs to be done, such as loading all the drivers and other plugins into the other cogs - but you should get the idea.
Once the program is loaded, you can dynamically load other cogs with additional C code or PASM if you like - provided the cog is not already used by a driver or by the Kernel. Examples of doing this are given in the Catalina demos.
Catalina also provides a 'registry', which provides a fixed area of Hub RAM used for communications between the executing Kernel and the other cogs - these cogs might contain drivers or other plugins (such as the floating point support cogs). All 'plugins' must register themselves on startup, and then monitor their registry entry. When the Kernel needs to invoke a plugin function, it does so by putting a request in the registry and then waiting for the result to appear.
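The request/response pattern described here can be sketched as a tiny mailbox. This is not Catalina's actual registry layout, which is documented in the Catalina Reference Manual; `mailbox_t`, `SVC_DOUBLE`, `mailbox_post`, `mailbox_done` and `plugin_poll` are hypothetical names, and the "service" just doubles its argument so there is something to check.

```c
#include <stdint.h>

#define SVC_NONE   0u   /* mailbox idle                               */
#define SVC_DOUBLE 1u   /* hypothetical service: result = 2 * argument */

/* One fixed block in hub RAM shared by the caller and one plugin cog. */
typedef struct {
    volatile uint32_t request;   /* SVC_NONE, or the service being asked for */
    volatile uint32_t argument;  /* input to the service                     */
    volatile uint32_t result;    /* output, valid once request is SVC_NONE   */
} mailbox_t;

/* Caller side: write the argument first, then the request code,
   so the plugin never sees a request without its argument. */
void mailbox_post(mailbox_t *mb, uint32_t service, uint32_t argument)
{
    mb->argument = argument;
    mb->request  = service;
}

/* Caller side: has the plugin finished the last request? */
int mailbox_done(const mailbox_t *mb)
{
    return mb->request == SVC_NONE;
}

/* Plugin side: one pass of its monitoring loop.
   Returns 1 if a request was serviced, 0 if the mailbox was idle. */
int plugin_poll(mailbox_t *mb)
{
    if (mb->request == SVC_NONE)
        return 0;

    if (mb->request == SVC_DOUBLE)
        mb->result = mb->argument * 2u;
    else
        mb->result = 0;            /* unknown service */

    mb->request = SVC_NONE;        /* clearing the request signals completion */
    return 1;
}
```

On the Propeller the plugin would run `plugin_poll` in an endless loop in its own cog while the caller spins on `mailbox_done`; here both sides are ordinary functions so the handshake can be stepped through by hand.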
Finally, you could write a translator that takes a SPIN program and converts it to C - but only a crazy person would want to do that :smilewinkgrin:. Why not just write the program in C?
Ross.
Why not just write the program in C?
Indeed. And maybe you already have some drivers written! This is all quite exciting.
Ok, another crazy idea. Program starts up and loads up the C cog with a loader program and then moves a big binary into external ram. This big binary contains pasm and C glue code and loads up the other cogs. (or you could do it through the startup program, but I'm thinking it could be helpful to have one big C program with all the pasm visible as well, so you can experiment and tweak things with the cog code. Then the 'loader' code never changes).
So - our big C program starts at the beginning of external ram and it grabs some 2k chunks of pasm data from itself and moves it into hub and does a series of cogstarts and hence loads up all the cogs.
Then we completely wipe the hub ram, and then fill it with a bitmap background screen (320x200 = 64,000 pixels at one 4-bit colour nibble per pixel, i.e. two pixels per byte, which I think is about 32 kilobytes = nearly all the hub ram). The display cog displays this.
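As a quick check of the arithmetic, here is a small helper, assuming 4 bits per pixel (the 320x200x4 figure used later in the thread). The function name `framebuffer_bytes` is made up for illustration.

```c
#include <stdint.h>

#define HUB_BYTES 32768u   /* total hub RAM on the Propeller */

/* Bytes needed for a packed bitmap, rounded up to a whole byte. */
uint32_t framebuffer_bytes(uint32_t width, uint32_t height, uint32_t bits_per_pixel)
{
    return (width * height * bits_per_pixel + 7u) / 8u;
}
```

With these assumptions `framebuffer_bytes(320, 200, 4)` comes to 32,000 bytes, which leaves 768 bytes of the 32,768 in hub, and each display line costs `framebuffer_bytes(320, 1, 4)` = 160 bytes if more has to be given up.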
Each cog has a tiny bit of pasm to talk to a few fixed locations in external ram (not hub). 2 bytes - at a minimum one flag and one data byte, and repeat for comms the other way, so 4 bytes.
Ok, so what cannot be done that way and has to use hub ram?
1) I think you need locks so two cogs don't try to access external ram at the same time. The lock can't be in external ram, so it has to be in hub.
2) I think you need your stack because that is the way Catalina works.
3) Any other essential things?
So maybe we sacrifice the bottom line on a 320x200 display for essential hub ram.
Now it becomes a matter of which drivers are essential. Keyboard is done already in pasm. There is a 4 port serial object in almost all pasm, and a modified version with bigger buffers for 2 serial ports. I think I recently saw a mouse/keyboard combo that was mostly pasm. Display drivers are already mostly pasm due to speed requirements.
That ought to be bare bones enough to get a 'hello world' working.
This might seem convoluted, but what you could end up with is the highest resolution display possible on the prop, plus C code that starts in external ram and just keeps on going to as big as your ram can be.
You could do some pretty cool things with that. Store a background screen as a bitmap from within C itself, then some overlays also in C and start doing some fancy graphics and animation without ever having to read data (slowly) off an sd card. Store lots of data. Write really complicated programs.
If you are prepared to put up with a screen glitch, you could also read data in and out of cogs on the fly (which is hard in spin because each time you gobble up another precious 2k of the 32k of hub ram).
I'm excited so much of this already seems to exist for catalina. I get a vague feeling we haven't really pushed catalina to its limits yet.
I'll second that. RossH has done a tremendous job on Catalina. Not just a language compiler but a huge array of support for different platforms, hardware driver objects and tools. Every time I start to ferret around in there I find something new and amazing.
Ideally all PASM codes, UART, SPI, Float32, etc etc would be written in such a way that they are usable on their own. Just taking their parameters on start up via PAR and communicating through a memory area when running. This way they could easily be used by Catalina, Zog, PropBASIC or whatever other language. Sadly the Spin object system encourages authors to produce PASM code that is tightly bound to the enclosing Spin object.
The worst cases are such that the Spin object sets up some LONGs within the PASM DAT section prior to loading it to COG. This makes it hard to use from other languages as unlike Spin they have no idea of the labels used in the PASM.
Still, many times it is possible to extract the PASM binary from an object and use it without the Spin code. Perhaps after some modification of the original object. Both Catalina and Zog are using this trick.
Nirvana. I think you'll find that Bill Henning is also working on such an idea.
Loading up all the cogs with PASM code would be easy enough, and reserving the lower part of hub RAM is also fairly simple - when using the LARGE memory model Catalina won't use it unless its stack grows down into it (which is easy enough to avoid by making everything 'static' and not calling too many levels of functions - but note that you'd have to avoid all library functions, since their usage of the stack is unpredictable).
However, only one cog can currently access XMM RAM - adding the ability to access XMM RAM from multiple cogs would (as you point out) require an additional cog dedicated to XMM access, or cog interlocks - both would add to the size of the code, and also slow the whole thing down significantly.
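To show what "cog interlocks" would mean in practice, here is a host-side sketch using a C11 `atomic_flag` as a stand-in for one of the Propeller's hardware locks (whose LOCKSET instruction likewise sets the lock and reports its previous state). The names `xmm_try_lock`, `xmm_unlock` and `xmm_locked_write` are hypothetical, and this is not how Catalina currently works, since only one cog accesses XMM RAM.

```c
#include <stdatomic.h>
#include <stdint.h>

static atomic_flag xmm_lock = ATOMIC_FLAG_INIT;  /* stand-in for a hardware lock */
static uint8_t     xmm_ram[1024];                /* stand-in for external RAM    */

/* Try to take the lock. Returns 1 if we now own it, 0 if it was already held. */
int xmm_try_lock(void)
{
    /* test_and_set returns the previous state: false means it was free. */
    return !atomic_flag_test_and_set(&xmm_lock);
}

/* Release the lock so another cog (here: another caller) can take it. */
void xmm_unlock(void)
{
    atomic_flag_clear(&xmm_lock);
}

/* Write one byte to external RAM while holding the lock.
   Returns 0 on success, -1 if the offset is out of range. */
int xmm_locked_write(uint32_t offset, uint8_t value)
{
    if (offset >= sizeof xmm_ram)
        return -1;

    while (!xmm_try_lock()) {
        /* spin until the current owner releases the lock */
    }
    xmm_ram[offset] = value;
    xmm_unlock();
    return 0;
}
```

Every access pays for taking and releasing the lock, on top of the time spent waiting for other cogs, which is the slowdown Ross is warning about.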
As you noted, Catalina requires the upper area of Hub RAM for the registry, and also a minimum amount of Hub RAM to use as stack space for each C cog. However, with careful planning you could probably count on having 30 or even 31k available. Of course, some of this space is normally used by other drivers - keyboard, mouse, serial drivers etc - but this is fairly small - if what you are after is maximum display capability, and perhaps just a keyboard, then you could have nearly all of the Hub RAM dedicated to it.
My big question is what would you want this for? If you want fully pixel addressable high resolution graphics, then 320x200x4 isn't really "high resolution". On the other hand, if you just want high resolution text, with the ability to have each character cell in a different colour (allowing you to do both text and "chunky" block graphics), then you don't need to do this - just write a new improved hi-res text driver. The current one is designed to support only one colour setting per row of text - but that's because otherwise it would consume too much Hub RAM to be useful. No one has yet written a driver specifically intended for use with a program executing from XMM RAM, where dedicating 24k of Hub RAM to the display would not be a problem.
With 8k of hub RAM for the character buffer, and another 4k of hub RAM for the colour information, you could have 128 columns x 64 rows of 8x12 characters, with each cell able to independently have one of 4 foreground and one of 4 background colours. Some of those characters can be "block graphics" characters, enabling (say) 256 x 128 "chunky" graphics as well as text.
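The arithmetic behind those figures can be checked with a few lines of C. This is only a sketch: it assumes one byte per character code, 2 bits each for foreground and background (4 colours each), and 2x2 "chunky" blocks per character cell, which is what the 256 x 128 figure implies. The names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

enum {
    COLS = 128, ROWS = 64,        /* character cells                      */
    CELL_W = 8, CELL_H = 12,      /* pixels per cell                      */
    FG_BITS = 2, BG_BITS = 2,     /* 4 foreground and 4 background colours */
    BLOCKS_X = 2, BLOCKS_Y = 2    /* assumed chunky blocks per cell       */
};

/* One byte per character code (an assumption of this sketch). */
size_t char_buffer_bytes(void)
{
    return (size_t)COLS * ROWS;
}

/* Colour attributes packed at FG_BITS + BG_BITS bits per cell. */
size_t colour_buffer_bytes(void)
{
    size_t bits = (size_t)COLS * ROWS * (FG_BITS + BG_BITS);
    assert(bits % 8 == 0);
    return bits / 8;
}

size_t pixel_width(void)   { return (size_t)COLS * CELL_W; }
size_t pixel_height(void)  { return (size_t)ROWS * CELL_H; }
size_t chunky_width(void)  { return (size_t)COLS * BLOCKS_X; }
size_t chunky_height(void) { return (size_t)ROWS * BLOCKS_Y; }
```

Under those assumptions the character buffer is 8192 bytes, the colour buffer is 4096 bytes, and the text grid covers 1024 x 768 pixels.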
Would this be sufficient for your purposes? If not, I'd suggest you instead wait for the Prop II!
Ross.
P.S. Heater - thanks for the kind words about Catalina!
It is basically:
1) Load and start all your PASM drivers at start up.
Normally apps have no need to dynamically load/unload COGs once they are running. You know what hardware support you need and it is fixed. Those PASM images need not occupy HUB space forever.
2) The loader that does 1) could get the PASM blobs from the second half of a 64K EEPROM or elsewhere.
3) The loader then replaces itself (and any PASM COG images) in HUB with the actual application code.
4) The app code runs, using almost all HUB space for its own code and data. Just a kilobyte or so is required to communicate with the other running COGs.
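The four steps above can be sketched as a small host-side simulation in C. This is not real loader code: the `prop_sim_t` model, `sim_coginit` and `staged_boot` are hypothetical names, the EEPROM is just an array of fixed-size blobs, and starting a cog is modelled as a copy of 496 longs from hub RAM into that cog's private RAM.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HUB_BYTES (32 * 1024)   /* Propeller hub RAM                    */
#define COG_BYTES (496 * 4)     /* longs copied into a cog when started */
#define NUM_COGS  8

typedef struct {
    uint8_t hub[HUB_BYTES];
    uint8_t cog[NUM_COGS][COG_BYTES];
    int     running[NUM_COGS];
} prop_sim_t;

/* Model of starting a cog: its RAM is filled from hub RAM at hub_addr. */
static void sim_coginit(prop_sim_t *p, int cog, size_t hub_addr)
{
    assert(cog >= 0 && cog < NUM_COGS);
    assert(hub_addr + COG_BYTES <= HUB_BYTES);
    memcpy(p->cog[cog], p->hub + hub_addr, COG_BYTES);
    p->running[cog] = 1;
}

/* blobs: blob_count images of COG_BYTES each (e.g. from high EEPROM).
   app:   the application image that finally takes over hub RAM.
   mailbox_bytes: hub RAM kept free for talking to the driver cogs.
   Returns 0 on success, -1 if the pieces cannot fit. */
int staged_boot(prop_sim_t *p,
                const uint8_t *blobs, int blob_count,
                const uint8_t *app, size_t app_len,
                size_t mailbox_bytes)
{
    if (blob_count < 0 || blob_count > NUM_COGS - 1)
        return -1;
    if (mailbox_bytes > HUB_BYTES || app_len > HUB_BYTES - mailbox_bytes)
        return -1;

    for (int i = 0; i < blob_count; i++) {
        /* Step 2: fetch one blob into a scratch area of hub RAM. */
        memcpy(p->hub, blobs + (size_t)i * COG_BYTES, COG_BYTES);
        /* Step 1: start it in its own cog (cog 0 is the loader). */
        sim_coginit(p, i + 1, 0);
    }

    /* Step 3: the application overwrites the loader and the scratch area. */
    memcpy(p->hub, app, app_len);
    /* Step 4: cog 0 carries on as the application. */
    p->running[0] = 1;
    return 0;
}
```

The point of the model is that the driver images only ever pass through one reusable scratch area, so none of them stays resident in hub RAM once its cog is running.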
I say "sadly" above because the Spin object model provides no support for this. We end up with all that "dead wood" of PASM images sitting around in random places in HUB wasting space. The current Spin/PASM object model has encouraged the authoring of hundreds of PASM drivers that cannot be directly used in that way.
How brilliant would it be if the current Spin loader sucked COG images from the top of a 64K EEPROM and started them in COGs and THEN sucked your Spin application from low EEPROM into HUB to run. Then of course the Prop tool would have to support programming this way.
In my small way I am aiming to get Zog to do this in C. Already I can start Zog in such a way that C code has all of HUB to itself. What it needs now is starting the required hardware support COGs, SD, UART, etc before Zog C code is started.
Much of this is already possible. Even in SPIN/PASM, it would be simple to emulate what Catalina already does (with its various EMM/XMM/SD and Hub Loaders) which is to load an "initial loader" that loads all the cogs up with drivers (in Catalina's case it also loads XMM RAM), then starts another loader to load Hub RAM with the final application program, which then finishes up by turning itself into a C Kernel or a SPIN interpreter.
Of course, you may need to load a "modified" SPIN interpreter to prevent it from trashing all the setup work you've just done - as you point out, SPIN is not very "friendly" to other applications - but Cluso99 already has one of those.
Should be relatively easy - either in C or SPIN. Of course, the devil is in the detail!
Ross.
I think that's my point about the current Spin object model.
Doing this multi-stage bootstrap is possible but you get no help from Spin the language or the Spin tool.
When you have it working you find you can't just grab any old driver out of OBEX and use it straight off.
You end up with a custom home grown solution that no one else uses so the "ecosystem" does not flourish.
Simple things in the Spin system would help. Like being able to direct the Spin compiler to bunch all the Cog PASM sections from different objects together in one place at the top of memory. That big contiguous space could then be reused after everything is started. Spin gives no control over this kind of build process.
Can't disagree with anything you say. That's why I no longer program in SPIN (if I can possibly help it!). SPIN is a good beginner language, but it is by no means an "all purpose" language. Also, while it is easy to learn, it encourages beginners to program the Prop in ways that fail to show off the Prop hardware to best advantage. It also doesn't make it easy to use SPIN where it shines, and other languages where it doesn't.
IMHO, the things SPIN most desperately needs are:
1. A language independent object format.
2. Some memory management.
3. A standard for application/driver communication.
I know there have been some attempts by various forum members to fill these gaps - but without at least some backing from Parallax, they are doomed to wither.
The Catalina approach is simply to "do it all" so you can program in C without ever having to worry about any of this stuff. However, this is not necessarily the best approach - it is just the easiest one at the moment, given the limitations of SPIN.
Ross.
Back when Homespun and BST were being written, I said the BIN format needed restructuring to allow the possibilities you describe --- BUT most people on the forum said -- IT IS NOT NECESSARY
You know, with BST I think it is just about possible:
1) Write all your drivers in two files. A Spin only file and a PASM only file. The only interaction between them will be through PAR.
2) Build your application using only the Spin files in BST as a binary file
3) Compile the PASM files with BSTL using the option to emit the resulting PASM binary "blob"
4) Create a little program or script that appends all the PASM blobs together and places them at the end of an EEPROM image file, with the compiled Spin binary at the beginning.
5) In the Spin code is a start up section that finds all the PASM blobs and loads them
Now there is lots of contiguous free space at the end of the Spin code.
How the Spin finds the PASM blobs is an exercise for the reader:)
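One possible answer to that exercise, sketched in C on the host side: put a small directory at a fixed, agreed address in the upper half of the EEPROM image, listing where each blob sits. Everything about this layout (the `COGS` magic value, the directory position at 0x8000, the entry format, the function names) is an assumption made up for this example and not something BST or the Propeller Tool produces.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EEPROM_BYTES 0x10000u      /* 64K EEPROM image                  */
#define UPPER_HALF   0x8000u       /* directory lives at a fixed offset */
#define DIR_MAGIC    0x53474F43u   /* bytes 'C','O','G','S' in memory   */
#define MAX_BLOBS    7u            /* one per cog besides the loader    */

static void put_u32(uint8_t *p, uint32_t v)   /* little-endian store */
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

static uint32_t get_u32(const uint8_t *p)     /* little-endian load */
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Step 4: Spin binary at the start, directory plus blobs in the upper half.
   Directory: magic, count, then (offset, length) pairs. Returns 0 or -1. */
int image_build(uint8_t *image, const uint8_t *spin, size_t spin_len,
                const uint8_t *const blobs[], const size_t lens[], size_t count)
{
    if (spin_len > UPPER_HALF || count > MAX_BLOBS)
        return -1;
    memset(image, 0, EEPROM_BYTES);
    memcpy(image, spin, spin_len);

    uint8_t *dir = image + UPPER_HALF;
    size_t next = UPPER_HALF + 8 + 8 * count;  /* first byte after directory */
    put_u32(dir, DIR_MAGIC);
    put_u32(dir + 4, (uint32_t)count);
    for (size_t i = 0; i < count; i++) {
        if (lens[i] > EEPROM_BYTES - next)
            return -1;                         /* blob does not fit */
        memcpy(image + next, blobs[i], lens[i]);
        put_u32(dir + 8 + 8 * i, (uint32_t)next);
        put_u32(dir + 12 + 8 * i, (uint32_t)lens[i]);
        next += lens[i];
    }
    return 0;
}

/* Step 5: what the start-up code would do to locate blob number index. */
int image_find_blob(const uint8_t *image, size_t index,
                    uint32_t *offset, uint32_t *length)
{
    const uint8_t *dir = image + UPPER_HALF;
    if (get_u32(dir) != DIR_MAGIC || index >= get_u32(dir + 4))
        return -1;
    *offset = get_u32(dir + 8 + 8 * index);
    *length = get_u32(dir + 12 + 8 * index);
    return 0;
}
```

The start-up code on the Propeller would do the equivalent of `image_find_blob` by reading the directory out of the EEPROM, then read each blob into a scratch buffer and start a cog from it.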
I know it is possible ----> BUT can you say it is possible for BEGINNERS, without help from the compiler?
IMHO I don't think Spin will be able to do this in the foreseeable future. But it appears that Catalina can do much of this. And under these circumstances, it may be that the argument of 'Spin vs C' (vs any other language) is less important than the ability to achieve real practical outcomes.
Brainstorming here. We can replace external ram with code loaded from high eeprom and we can even replace sd card with code loaded from high eeprom. This is a very simple model to start with but it can be extended easily to something much bigger. And I think it comes down to cog code that has no supporting high level language at all.
I recall discussing this about a year ago - back then it was a bit of an academic topic. Since then, Pullmoll has coded some real pasm-only objects for the CP/M emulation - a keyboard object, and a vga/vt100 object and an (almost bare) dual serial port object. I'd like to think about ways to pull those into Catalina too.
Indeed, one thing about pasm only objects that has just struck me is that you can use them for lots of things. CP/M. Catalina. Zog. Spin even. You just have to stay disciplined and not be tempted to add any high level code. None, I say!
I'm still pondering the 'ideal' interface. Is it a single long? That can be done - 2 bytes for comms (one each way), and flags in the other two bytes that the Tx and Rx programs can set/reset to say that bytes have been read. The SIMH does an entire disk drive access through two virtual ports. Or maybe it is better to relax those rules and allow a few variables and a buffer. On the one hand, it is so easy to borrow a bit more hub ram. But on the other hand (I keep reminding myself), every long borrowed from hub is another few pixels deleted off the bottom of the screen.
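As a rough sketch of the single-long idea, here is one way the four bytes could be laid out, written in C. The bit assignments and function names are invented for this example. It also runs both sides in one thread; on real hardware each side would probably want to use byte-wide hub writes so that it never rewrites the other side's byte, and that detail is left out here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the shared long (illustrative only):
   byte 0: data, host -> driver      byte 2: "host byte waiting" flag
   byte 1: data, driver -> host      byte 3: "driver byte waiting" flag */
#define TX_SHIFT 0u
#define RX_SHIFT 8u
#define TX_FULL  (1u << 16)
#define RX_FULL  (1u << 24)

/* Place a byte in one direction's slot. Fails (returns 0) while the
   previous byte in that direction has not been read yet. */
static int box_put(volatile uint32_t *box, unsigned shift, uint32_t flag, uint8_t b)
{
    if (*box & flag)
        return 0;
    *box = (*box & ~(0xFFu << shift)) | ((uint32_t)b << shift) | flag;
    return 1;
}

/* Take a byte from one direction's slot and clear its flag to say
   "read". Returns 0 if nothing was waiting. */
static int box_take(volatile uint32_t *box, unsigned shift, uint32_t flag, uint8_t *out)
{
    if (!(*box & flag))
        return 0;
    *out = (uint8_t)(*box >> shift);
    *box &= ~flag;
    return 1;
}

int host_send(volatile uint32_t *box, uint8_t b)       { return box_put(box, TX_SHIFT, TX_FULL, b); }
int driver_receive(volatile uint32_t *box, uint8_t *b) { return box_take(box, TX_SHIFT, TX_FULL, b); }
int driver_send(volatile uint32_t *box, uint8_t b)     { return box_put(box, RX_SHIFT, RX_FULL, b); }
int host_receive(volatile uint32_t *box, uint8_t *b)   { return box_take(box, RX_SHIFT, RX_FULL, b); }
```

The cost is exactly one long of hub RAM per driver, at the price of moving only one byte per handshake in each direction.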
Maybe it does not matter for the moment. What intrigues me is the possibility of combining some 'pasm only' objects with Catalina and loading them from within the catalina code (ie from external ram. Or high eeprom as heater suggests). And then erasing the code from hub ram and replacing it with something new.
I wonder what a bare bones demo of this would look like? A vga cog, keyboard cog, the glue between them two longs at the top of hub ram, and some catalina code running somewhere in hub and/or external memory.
I think I need to understand the stack more. RossH, is there a good reason the stack has to be in hub? Could it be in, say, external ram?
C is a stack based language and the stack gets a lot of use, when calling functions, when working with local variables, when returning results etc. So I'm sure he has put the stack in HUB for performance reasons.
I have considered this for Zog as well but I'm kind of wedded to the idea that the Zog Cog can run C code from external RAM and leave the HUB mostly untouched. This way one can still run normal Spin from the HUB at the same time. Or start up another HUB based Zog interpreter.
Ok, well Zog and Catalina may well have something in common - both need more 'cogjects' - pasm only objects. How do we progress this? Pool what we already have and build it from there?
Spin cannot execute code from external RAM, but you can group and reuse the hub RAM of several Assembly DAT sections relatively easily.
Just separate the cog-assembly part and the Spin part of a driver in two objects, and add a start-PUB to the assembly object which loads the cog. Then, after starting the cog, the whole assembly object can be overwritten.
If you do that for all the keyboard, mouse, tv... objects and include the assembly objects at the end of the object list of the top object, all these objects are grouped together and can be re-used as a whole blob of memory.
It took me 5 minutes to modify the attached keyboard object, including the demo. As you see, this is also supported by the PropellerTool.
But if you work with an SD card, then this is not necessary anyway. I have a lot of *.cog files on the SD card and load the cogs from the card. All you need is a small hub buffer (max 2kB); I normally use the screen buffer for that. Or you can execute a chain of programs with an SD card, like Sphinx does. So you can make very big programs with Spin only.
Andy
Yes, I said previously that SPIN could be made to do much of what we are talking about. But it tends to require a degree of knowledge about the "plumbing" of SPIN and PASM that should not really be expected of newcomers - and which is not expected when using most other languages. It also exposes this plumbing unnecessarily - ideally this kind of thing should stay safely hidden until you really need to mess about with it.
Also, there is still the need to come up with a framework that can be used both with SPIN and with other languages - so that we only ever need to write drivers once no matter what language you want to work in. For that you need some kind of interface contracts, which SPIN doesn't naturally provide. And some rudimentary memory management would assist here as well, to help different runtime environments avoid tripping over each other.
SPIN is a very good language for small self-contained projects, but it does not scale well, or interoperate well. For larger projects I would prefer to use a language with better capabilities for program modularization and re-use.
I should point out that C is not particularly good at this either - but at least it has #include files (which represent interface contracts) and library management that does not demand recompiling all source each time, and also only includes code that is necessary. It even supports primitive specialization/generalization (via #define and #ifdef).
Despite SPIN being often touted as an "object oriented" language, it doesn't support any of the basic OO infrastructure that replaces these things in most OO languages - and I'd hate to tackle a really big project without them.
BST improves things somewhat - it has #includes, #defines and #ifdefs, and I believe it also does elimination of unused objects and methods. These go some way towards fixing two of SPIN's main problems - i.e. that you tend to end up with a proliferation of "similar" objects, and also when you include an object you get all its methods (and all the objects it references) whether you need them or not. Neither of these are problems in a small project, but they can both become so in large projects.
Ross.
P.S. Heater is correct about the stack. Catalina puts both the call stack and the stack frame in hub RAM. This is done entirely for performance reasons. I think performance would be between 2 and 4 times slower with the stack in XMM RAM. I do not think there is any alternative to this. I certainly would not want to end up with a version of C that was that much slower than SPIN!
Catalina already has "PASM only objects" - they're called "plugins". And the interface to every plugin is the registry.
Ross.