Catalina 3.7 BETA
RossH
All,
Catalina 3.8 has now been released, and supersedes this beta - see here for more detail.
A new BETA release (3.7) of Catalina is now available here. Catalina is an ANSI compatible C compiler for the Propeller. This release adds support for CMM (Compact Memory Model) programs to the existing Catalina support for LMM and XMM programs. CMM programs require no source code changes, but a C program compiled as CMM will often be well under half the size of the equivalent program compiled as LMM. And while the resulting CMM program will generally execute more slowly than the equivalent LMM program, it will still generally be much faster than the equivalent Spin program.
But the main advantage of CMM mode is that it allows much larger C programs to run on a Propeller - programs that would normally be far too large to run without resorting to expensive and cumbersome XMM RAM solutions (and the CMM program could well execute just as fast as the XMM program would in any case!).
This finally makes C a practical alternative to Spin on the Propeller - especially in situations where adding XMM RAM is not feasible (e.g. because of the cost, or because there are not enough available I/O pins). Of course, if raw speed is required then LMM is probably still a better option - but in many non-compute-intensive cases (or where the parts of the program requiring speed are done in PASM anyway) the CMM version of a program may not run appreciably slower than the LMM version of the same program - and it will need less than half the memory space!
To see what the tradeoff looks like in practice, here is an actual result taken from one of the standard C/Spin benchmark programs (XXTEA) showing the size and performance of both the CMM and LMM versions of the program, relative to Spin:
Relative to Spin (Spin = 1.0):
Mode    Size   Speed
Spin    1.0    1.0
CMM     1.4    2.7
LMM     3.3    9.3
In practice, the actual results for CMM tend to be better for large, complex C programs (this is because Spin is reasonably efficient at small, simple programs). Overall, a CMM program typically runs between 1.3 and 2.7 times faster than Spin, and is typically between 1.4 and 1.8 times larger.
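Read against LMM rather than Spin, the same table gives the size and speed cost of choosing CMM. The C sketch below just does that division; the struct and function names are illustrative and are not part of Catalina.

```c
#include <assert.h>

/* One row of the XXTEA benchmark table, relative to Spin (Spin = 1.0). */
struct bench_row {
    const char *mode;
    double size;   /* code size relative to Spin */
    double speed;  /* execution speed relative to Spin */
};

/* Figures copied from the benchmark table. */
static const struct bench_row xxtea[] = {
    { "Spin", 1.0, 1.0 },
    { "CMM",  1.4, 2.7 },
    { "LMM",  3.3, 9.3 },
};

/* Express one figure as a fraction of another, e.g. CMM size / LMM size. */
static double relative_to(double value, double baseline)
{
    assert(baseline != 0.0);
    return value / baseline;
}

/* CMM code size as a fraction of LMM code size (1.4 / 3.3, about 0.42). */
double cmm_size_vs_lmm(void)
{
    return relative_to(xxtea[1].size, xxtea[2].size);
}

/* CMM speed as a fraction of LMM speed (2.7 / 9.3, about 0.29). */
double cmm_speed_vs_lmm(void)
{
    return relative_to(xxtea[1].speed, xxtea[2].speed);
}
```

On this benchmark, then, the CMM build is about 42% of the size of the LMM build and runs at about 29% of its speed, while still running 2.7 times faster than Spin.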
Compact mode is easily enabled - just define the COMPACT symbol (e.g. by using the option -C COMPACT on the Catalina command line).
For example, the following command might be used to compile a normal LMM program (this is an actual example included in the release):
catalina test_suite.c -lci
This results in a code size of 6812 bytes.
To compile the same program as a CMM program, the following command could be used:
catalina test_suite.c -lci -C COMPACT
This results in a code size of 3408 bytes.
The CMM version of the Catalina Optimizer (included free with this release) can reduce code sizes and increase performance even further (although the savings are not currently as dramatic as they are for LMM programs). To compile the same program as an "optimized" CMM program, the following command could be used:
catalina test_suite.c -lci -C COMPACT -O3
This results in a code size of 3348 bytes.
Compact mode is fully supported by the CodeBlocks graphical IDE, and the Payload loader and the BlackBox debugger also fully support CMM. For example, to compile, load and then debug a CMM program, the following commands could be used:
catalina test_suite.c -lci -C COMPACT -g3
payload test_suite
blackbox test_suite
The current BETA release has some known limitations in its support of CMM mode:
- Programs that use FCACHE cannot use the debugger. The CMM code generator does not currently generate any FCACHE instructions, but some of the graphics library functions do use FCACHE (for speed). In future releases, the code generator will probably still not generate FCACHE instructions; instead they are likely to be added by the Optimizer, so debugging should always be done on non-optimized programs.
- The threads library is not yet supported by the CMM kernel. Just haven't had time for this yet :frown:
- XMM RAM is not yet supported by the CMM kernel (and it may never be, since if XMM RAM is available, then using XMM mode is a far better option).
- Only serial loading is currently supported for CMM programs. However, any existing serial loader can be used - including Payload, Propellent or the Parallax Propeller tool.
- The documentation has had only a rudimentary update - it mentions CMM in the appropriate places, but does not go into much detail yet.
Ross.
Comments
For those interested in the internals of CMM, I thought it might be worth posting a more complex example, which also illustrates the "hybrid" nature of the new CMM kernel. Here is how to compile the graphics_demo.c program (provided in the demos\graphics folder in the release):
Looks complex? Let's break that down:
catalina is the command line compiler (you could also use Code::Blocks).
-lci means use the integer version of the standard C library
-lgraphics means use the graphics library
-y means produce a listing (we'll look at that in a minute)
-C NO_HMI means don't include any HMI drivers (this program doesn't use stdin, stdout etc)
-C DOUBLE_BUFFER means use two graphics buffers (for smoother graphics)
-C COMPACT means generate a CMM program (i.e. rather than LMM - which would be too big to execute!)
-C C3 means compile for a C3 (this sets the clock and pins etc - replace this with any platform with a TV output!)
graphics_demo.c is the name of the C program we want to compile
-C PAL means generate a PAL TV output (rather than NTSC)
-C NO_INTERLACE means turn off interlacing on the TV output
That's traditional LMM PASM, executed via a traditional unrolled LMM loop embedded within the CMM kernel.
You'll also see things like this:
It may look a bit odd, but that's bog-standard COG PASM, loaded into the CMM kernel cog via FCACHE for execution.
Finally, you'll see a lot of slightly strange looking stuff like this:
That's the CMM code - essentially compressed PASM, expanded and interpreted "on the fly" by the CMM kernel.
That's why I call CMM a "hybrid" approach. CMM programs are partly interpreted, partly LMM PASM, and partly COG PASM. A CMM kernel can trade off speed vs space. More CMM code means less space ... but also less speed. More LMM code means more speed ... but also more space!
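As a rough picture of what "expanded and interpreted on the fly" means, here is a toy interpreter in C that fetches 16-bit words, splits each into fields and performs the corresponding full 32-bit register operation. Everything about it is an assumption made for illustration: the opcodes, the field layout and the function name are invented, and the real CMM kernel is written in PASM and runs inside a cog, alongside the LMM loop and FCACHE support described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy "compressed instruction" format (hypothetical, not Catalina's):
   bits 15..12 opcode, bits 11..8 destination register, bits 7..0 operand. */
enum { OP_HALT = 0, OP_LOADI = 1, OP_ADDI = 2, OP_ADD = 3 };

#define NUM_REGS 16

/* Interpret 16-bit words one at a time, expanding each into the
   32-bit register operation it stands for. Returns register 0. */
uint32_t run_toy_kernel(const uint16_t *code, size_t len)
{
    uint32_t reg[NUM_REGS] = {0};

    for (size_t pc = 0; pc < len; pc++) {
        unsigned op  = code[pc] >> 12;
        unsigned dst = (code[pc] >> 8) & 0xFu;
        unsigned arg = code[pc] & 0xFFu;

        switch (op) {
        case OP_LOADI: reg[dst] = arg; break;              /* zero-extended 8-bit constant */
        case OP_ADDI:  reg[dst] += arg; break;             /* add 8-bit constant */
        case OP_ADD:   reg[dst] += reg[arg & 0xFu]; break; /* operand names a source register */
        case OP_HALT:
        default:       return reg[0];                      /* stop on HALT or unknown opcode */
        }
    }
    return reg[0];
}
```

Each 16-bit word here does the work of a 32-bit instruction, but only after software decoding, which is the speed-for-space trade that a hybrid kernel can make differently for different parts of a program.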
I'm still working on finding the right balance in the CMM code generator. Currently, I always optimize for space over speed. When I think I have achieved the minimum space, I will start looking at offering options to optimize for speed instead. The CMM kernel won't need to change - only the Catalina code generator.
Ross.
CMM looks great, but I'm puzzled how the CMM binary is less than half the size of LMM. With 16-bit codes it seems that the best it would be is exactly half the size of LMM, if all of the instructions were compressed. However, not all of the instructions are compressed, so I would expect it to be slightly larger than half the size of LMM. Does CMM use additional things to get below 50%, such as a limited library?
Dave
No. To see where other efficiencies can be gained, consider loading a constant value into a register ...
With LMM PASM, you only have two choices for constant sizes - 9 bits or 32 bits. Loading up to 9 bits requires 32 bits, and loading 10 to 32 bits requires a full 64 bits to encode - very inefficient!
But with CMM, you can have many more choices. In the case of the Catalina CMM, I can encode loading 5 bit, 9 bit, 24 bit or 32 bit constants much more efficiently. Loading up to 5 bits takes only 16 bits to encode, loading 6 to 24 bits takes only 32 bits to encode, and loading a full 32 bits takes only 48 bits to encode - much more efficient!
There are other reasons, but this one alone probably accounts for additional code size reductions of another 10% to 20% over a simple halving of code sizes due to compression.
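The constant-loading costs described above can be written down as a simple cost model. This is a sketch only: it treats constants as unsigned, models only the three CMM sizes quoted (16, 32 and 48 bits), so the separate 9-bit form mentioned in the text is folded into the 32-bit case, and it makes no claim about how Catalina's code generator actually selects an encoding. The function names are hypothetical.

```c
#include <stdint.h>

/* Number of significant bits in an unsigned constant (0 needs 0 bits). */
static unsigned bits_needed(uint32_t value)
{
    unsigned bits = 0;
    while (value != 0) {
        bits++;
        value >>= 1;
    }
    return bits;
}

/* LMM model: up to 9 bits fits in one 32-bit PASM instruction;
   anything larger costs 64 bits in total. */
unsigned lmm_load_cost_bits(uint32_t value)
{
    return bits_needed(value) <= 9 ? 32 : 64;
}

/* CMM model (hypothetical, sizes taken from the text):
   up to 5 bits -> 16, 6 to 24 bits -> 32, more than 24 bits -> 48. */
unsigned cmm_load_cost_bits(uint32_t value)
{
    unsigned bits = bits_needed(value);
    if (bits <= 5)
        return 16;
    if (bits <= 24)
        return 32;
    return 48;
}
```

Under this model, a constant of up to 5 bits, or of 10 to 24 bits, costs exactly half its LMM size; a 6 to 9 bit constant costs the same as in LMM; and a full 32-bit constant costs 48 bits instead of 64. Constant encoding alone therefore approaches, but never beats, 2:1, which matches the correction in the next post.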
UPDATE: Edited this answer, since it was incomplete and incorrect - see my next post for more details.
Ross.
Having the code be 3X bigger is a real problem. But, 1.4X isn't so bad given that it runs faster...
One possibility is that there are a range of jumps that are replaced by adding and subtracting offsets instead of using a 32-bit absolute jump address. But it seems like that is also 2:1 at best. So I still don't understand where it would get better than 2:1 compression to offset the cases where it gets less than 2:1 compression.
Hi Dave,
Yes, I realized after I went to bed last night that my answer was incomplete. I should have said that you can approach 2:1 compression, even in the face of apparently incompressible 32 bit constants (i.e. most 32 bit constants don't really need 32 bits - they can be conveniently represented as 24 bits, and most 9 bit constants don't need 9 bits - they can be represented in 5 bits or less) - but (as you point out) that's not going to get you any better than 2:1.
To see how better than 2:1 is possible, you have to look at the "primitives" (or "macros" as I think GCC calls them?). In an LMM kernel, all instructions have to be valid PASM (for speed) - so primitives are generally represented by JMP instructions to code within the kernel. That takes 32 bits. Then, if the primitive requires a parameter, loading it can take another 32 bits, for a total of 64 bits to encode. And if the parameter is more than 9 bits, the primitive can take 96 bits to encode.
But in a CMM kernel you have no such restrictions. In a CMM kernel you can implement the primitives such that common ones take only 16 bits even if they require a parameter of up to about 10 bits. That means 16 bits instead of 64 - i.e. 4:1. And even if the primitive requires a 24 bit parameter, this can be implemented in 32 bits instead of 96 - i.e. 3:1.
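To make those ratios concrete, here is one way a 16-bit primitive with a 10-bit parameter, and a 32-bit primitive with a 24-bit parameter, could be packed. The field layout (a 6-bit or 8-bit primitive number in the high bits) and all function names are assumptions made for illustration; this is not Catalina's actual CMM instruction format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical short form: 6-bit primitive number, 10-bit parameter. */
uint16_t pack_short(unsigned primitive, unsigned param)
{
    assert(primitive < (1u << 6));
    assert(param < (1u << 10));
    return (uint16_t)((primitive << 10) | param);
}

unsigned short_primitive(uint16_t word) { return word >> 10; }
unsigned short_param(uint16_t word)     { return word & 0x3FFu; }

/* Hypothetical long form: 8-bit primitive number, 24-bit parameter. */
uint32_t pack_long(unsigned primitive, uint32_t param)
{
    assert(primitive < (1u << 8));
    assert(param < (1u << 24));
    return ((uint32_t)primitive << 24) | param;
}

unsigned long_primitive(uint32_t word) { return word >> 24; }
uint32_t long_param(uint32_t word)     { return word & 0xFFFFFFu; }
```

With a layout like this, the short form replaces a 64-bit LMM sequence (JMP plus parameter long) with 16 bits, and the long form replaces a 96-bit sequence with 32 bits, giving the 4:1 and 3:1 ratios quoted above.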
I think GCC takes a different approach from Catalina in the LMM kernel, and makes less use of primitives - in this way it gets a speed advantage. But this comes at the cost of code size, and to counter that GCC requires a much more sophisticated code optimizer than Catalina does. In a CMM kernel, however, you want to use as many primitives as you can - more primitives not only means smaller code, it also increases speed - and you can achieve better than 2:1 compression.
Ross.
I just got a Hydra Xtreme 512K SRAM card for my Hydra and was wondering if at least the first 64K of it can be used with Catalina. Thanks
Hi blittled,
The answer is yes - but to be honest, I'm struggling to try and remember the details. Almost immediately after I bought my Hydra Xtreme I reloaded the firmware using Eric Moyer's upgrade to allow random access to the full 512k - this firmware upgrade is included in the Catalina release, but you have to build or buy a special programming cable to install it.
This is what my original note in the Catalina release says (this is in the file README.HX512 in the main Catalina directory):
This would imply that Catalina can (or at least could at one point!) use the first 64kb of the card even with the original firmware - but I can't retest it with the original firmware, since I have already updated the firmware on my Hydra Xtreme. The best thing to do is try it yourself.
Here is what I just did - start out in the main Catalina directory (e.g. C:\Program Files\Catalina) ...
If you get an error message at this point ("Access is denied"), please make sure you have write access to the Catalina\bin directory (the build_all script copies the utilities in there) and then run the command again.
Then compile a test program - I used othello.c ...
Then try loading it - note that this requires a special (but quite simple) "mouse" serial cable, as described on page 22 of the Catalina Reference Manual (the normal serial port cannot be used once the Xtreme is in use, so we have to load serial programs into the Xtreme via the mouse port) ...
This program uses only about 10k of the Xtreme - but if this one loads and runs correctly (it will use the Hydra TV output and keyboard) then you can probably rely on using the first 64kb for anything.
Ross.
Edit: I just read your post about the errata for the programmer, so that answered my question. Thanks
Just discovered (on the weekend) that I forgot to include the new CMM kernel in the library function that launches a C function on a new cog - so at the moment, the multi-kernel demos will still launch an LMM kernel even if it is trying to execute CMM code - which of course won't work. D'oh!
I'm in the middle of some really impressive stuff with Catalina that I hope to be able to release soon, so please let me know if anyone needs this functionality in the short term - otherwise I will just include it in the next release.
Ross.
And for those of you wondering what I'm working on .... well, if you have a Hydra, here is a sneak peek. Prepare to be amazed! :cool:.
This program is written in C and takes under 2.5k of code space when compiled to use the new CMM kernel.
You will need a VGA monitor connected to a Hydra.
Ross.
I probably should have realized that there aren't that many people who own a Hydra ... so here is the same demo compiled for a C3. You must have a VGA monitor connected:
Thanks, Ken. This program could have been programmed in Spin, but it could not have been programmed in a language that compiles to LMM. The Propeller simply doesn't have enough RAM.
But with CMM, Catalina finally makes C competitive with Spin.
Ross.
Once you have the C compiler, the graphics library, and the option to use CMM (so that you know the final code sizes will be small enough to actually execute) then all this stuff just becomes a couple of lines of C code.
When I started out with the Propeller, I got frustrated at having to try and code complex programs in Spin - especially since there were no proper support tools (such as a source level debugger). Now I can go back and finish some of the programs I started to write but then had to abandon when they got too complex to develop or debug using Spin.
Finally, I can now write them, debug them and then execute them entirely in C instead.
Ross.
Congratulations, Ross! That's very nice work. This is an exciting time for C users on the Propeller, to be sure.
Eric
Sure is!
Just to finish off this series, attached is another version for the C3 - this time with a VGA resolution of 800x600, and using 64 colors.
Here is the actual C code:
Here is a C version of the classic SpaceWar arcade game, compiled for a HYDRA (sorry, no C3 version yet, since it currently requires a gamepad). The VGA resolution is 640x480. The game will also run in 800x600. It does not require any XMM RAM - it could be compiled to run on any barebones propeller with suitable VGA and gamepad pinouts.
The Spin version of this program was written by Eric Moyer. I loved this old game so much that adding VGA support and the Cinematronics graphics enhancements was one of the first things I did when I bought my HYDRA. I (reluctantly) abandoned it when it became clear that the Spin version wasn't really fast enough to support the enhanced graphics and the more sophisticated gameplay. Now I can finally go back to improving it as I had originally planned.
The code size of the Spin version was 10k.
The code size of the CMM version is 14k - and it runs faster than the Spin version. With a bit of tweaking I expect to be able to reduce the code size enough to also support 1024x768 resolution (I can do so now, but it currently requires me to remove a couple of kilobytes of gameplay code).
The code size of the LMM version is over 30k - completely useless, since it leaves no space for any data or video buffers!
Also attached is the actual C code for those who are interested - it's a reasonably faithful translation of the Spin version. I realize it's not much use to anyone but me yet, but it is a good illustration of the size and complexity of C programs that can now be run on a "bare bones" Propeller. There are also some instructions at the top of the file that may be useful.
I'll release another beta of Catalina with enhanced CMM support and the VGA version of the graphics library soon.
Ross.
I have removed these and updated the binary. The code size has also come down - to just over 14k. Not sure why that is - most likely I also forgot to compile the previous version with the optimizer :frown:.
Ross.
I have made significant reductions in the Catalina CMM code size, and also added full keyboard and mouse support to both the vector (VGA) and raster (TV) graphics libraries. Catalina 4 will soon allow all possible combinations of text, vector graphics, raster graphics, mouse, keyboard and gamepad support - on any Propeller platforms with the appropriate devices connected.
With these improvements, the code size of the C version of the Spacewar! program is less than 1.5 times the size of the Spin version - even with full keyboard support added (something the Spin version never had!). And it's still faster!
I have attached new versions compiled for both the Hydra and the C3. Neither one requires XMM RAM. Both generate VGA output at 800x600 resolution. On the Hydra you can use either the NES gamepads, or the keyboard, or any combination of these. On the C3 the only option is for both players to share the keyboard.
Here is the new mapping of game functions to gamepad buttons and keyboard keys:
I can't honestly say it is easy for two players to play Spacewar! when both are sharing the same keyboard - but it is possible!
Ross.
I was originally thinking I would release one more beta, but CMM is working out so well that I'm not sure there is any need.
Catalina 4 should be ready to go as soon as I have time to get the multi-threading stuff working again.
Ross.