Catalina - a self-hosted PASM assembler and C compiler for the Propeller 2
I thought I would try something that has been on my "to do" list for a while ...
Here is a completely trivial Propeller program:
First, in PASM:
CON
  LED_PIN = 38              ' Pin 38 is LED on the P2 EDGE
  TIME    = 180000000/2     ' 1/2 second @ 180 MHz

DAT
        org    0
Loop    drvnot #LED_PIN
        waitx  ##TIME       ' Toggle pin every half second
        jmp    #Loop
And here it is again, this time in C:
#define LED_PIN 38          // Pin 38 is LED on the P2 EDGE
#define TIME 180000000/2    // 1/2 second at 180 MHz

void main() {
   while(1) {
      _pinnot(LED_PIN);     // Toggle pin every half second
      _waitx(TIME);
   }
}
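The TIME constant in both programs is simply the clock frequency multiplied by the delay in seconds. As a minimal sketch of that arithmetic in standard C (the helper name delay_cycles is hypothetical and is not part of Catalina):

```c
#include <stdint.h>

/* Number of clock cycles in `ms` milliseconds at `clock_hz`.
   Hypothetical helper for illustration only.
   The 64-bit intermediate avoids overflow in clock_hz * ms. */
static uint32_t delay_cycles(uint32_t clock_hz, uint32_t ms)
{
    return (uint32_t)(((uint64_t)clock_hz * ms) / 1000u);
}
```

For a 180 MHz clock and a 500 ms delay this gives 90000000 cycles, the same value as 180000000/2 in the programs above.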
"So, what!", I hear you say?
Well, these just happen to be the first programs compiled on my P2 Edge, using a new version of Catalina compiled to run on the P2.
Both programs were created on the P2 using the vi text editor, assembled using p2asm or compiled using catalina, and executed on the P2 using the catalyst program loader.
Self-hosted PASM and C development on the Propeller 2 is here!
A working preview that runs on a P2 EDGE board equipped with PSRAM (i.e. a P2-EC32MB) is available here.
Here is an extract from the README.TXT file:
This image contains a preview of Catalina and Catalyst 6.0. It is intended to demonstrate Catalina's new ability to self-host - i.e. that you can now edit, compile and execute C programs entirely on the Propeller 2.

This demo requires a P2_EDGE with 32MB PSRAM installed (P2-EC32MB), an SD card, and a VT100 compatible terminal emulator. Strictly speaking, only the vi text editor actually REQUIRES a VT100 compatible terminal emulator, but it is recommended.

For a Windows terminal emulator, you can use payload. If you do not have Catalina installed, the payload executable is included on the SD Card. To use it, copy it to your PC and enter a command like:

   payload -pX -b230400 -i -q1

where X is the port your P2 is connected to.

You can also use any VT100 compatible terminal emulator, like Catalina's Comms program, or PuTTY or Tera Term. Select the appropriate serial port and set the baud rate to 230400 baud.

Copy the contents of this zip file to an SD card, insert it into the P2 and set the DIP switches to boot from the SD Card. And start compiling C programs! (but please read the rest of this file first).

This demo includes all the usual Catalyst commands, but they are stored in the bin folder, along with the new Catalina executables:

   cpp.bin      - C preprocessor.
   rcc.bin      - C compiler.
   bcc.bin      - Catalina's binder and library manager.
   spp.bin      - PASM preprocessor.
   p2asm.bin    - PASM assembler.
   pstrip.bin   - a utility to reduce the size of PASM files.
   catalina.lua - a Lua version of the PC 'catalina' command, which manages the compilation process.
   xl_vi        - The vi text editor for very large files (the normal Catalyst vi is fine for editing C files, but the intermediate files generated by Catalina can be too large for it).

You will also see the following folders:

   bin     - Catalyst and Catalina executables
   tmp     - a folder used to store temporary files
   include - the C include files
   lib     - the C libraries
   target  - Catalina's runtime support files

See the file CATALYST.TXT for more information on the usual Catalyst commands, but note that the usual Catalyst demo programs have been omitted, since this demo is intended to demonstrate only the new C functionality.

Here are some commands that can be used to compile the demo C programs:

   catalina hello.c -lci -v
   catalina othello.c -lci -v
   catalina startrek.c -lc -lmc -v
   catalina chimaera.c -lc -lmc -v
   catalina diners.c -lci -lthreads -v
   catalina intrrpt.c -lci -lthreads -lint -v
   catalina psram.c -lci -lpsram -C P2_EDGE -v

The -C CR_ON_LF option can be added when using a VT100 terminal emulator, or you can instead adjust the settings of your emulator to implicitly execute a CR every time it receives an LF. Payload, Comms, PuTTY, and Tera Term all have such a configuration option.

The -v flag is optional, but is recommended.

Note that Catalyst has a limit of 23 arguments that can be specified in a single command, including in a script. If this number is exceeded when generating the compilation script, the catalina command will print the message "too many command line arguments" and stop. This will usually be because there are too many -C or -D options being specified. This limit may be increased in the full release of Catalina 6.0.

The Lua version of the catalina command supports many but not all of the command-line options of the PC version. Use the command catalina -h to see what is currently implemented.

Note that the self-hosted version of Catalina currently supports compiling programs only for the Propeller 2, and only the TINY, NATIVE, or COMPACT modes. It cannot build XMM SMALL or LARGE programs. While this is technically possible, any program that requires XMM RAM would take too long to compile on the Propeller 2.
However, the PSRAM installed on the P2 EDGE can be USED by a C program compiled on the Propeller - see the demo program psram.c for an example.

WARNING: The larger C demo programs can take a long time to compile. Even the standard C "hello world" program (hello.c) takes 10 minutes. Compiling the chimaera game (approx 5,500 lines of C code) takes about 170 minutes. This is why the -v option is recommended - without it, you would have no way of telling that anything is actually happening until the compilation completes.

ANOTHER WARNING: There is currently no error detection done by the catalina command, so if (for example) you specified the wrong file name in your compilation command, the catalina script will still execute all the sub processes - but you can now terminate an executing script by holding down a key while rebooting the Propeller.

I will repeat the main warning here for those who don't bother to read the whole README.TXT:
The larger C demo programs can take a long time to compile. Even the standard C "hello world" program (hello.c) takes nearly 10 minutes. Compiling the "chimaera" game (chimaera.c, approx 5,500 lines of C code) takes about 170 minutes.
Ross.
EDIT: Revised post and thread title now that a preview version is available.
Comments
Well done. Seems like pretty good progress so far to me.
So did you find you had to write your own custom driver for PSRAM or could you use mine to achieve this? If you do use mine then you'll be able to share the PSRAM with graphics video (in theory), and/or other COGs. Of course performance will take somewhat of a hit though as expected if memory is shared this way amongst other COGs, and real-time video would need to take priority when it reads its scan line data to avoid graphics glitches.
Yes, I use your PSRAM driver, essentially unmodified (except for making the PASM names local). I just added a C interface. Catalina itself can only use 16MB of external RAM for code space, so the other 16MB can be used as either data storage (you can access it from C) or as graphics video. Catalina caches its XMM access, so if the video has to be given priority it should still work ok. But I've not tried it. I have tested it using the PSRAM on the P2 Edge and also using the HyperRAM add-on board on both the P2 Edge and the P2 Eval board.
Ross.
Wanna cop one of @Rayman's 96MB PSRAM boards? Maybe you can find a use that doesn't involve 20 year old arcade games?
Nothing wrong with 20 year old arcade games!
But it wouldn't make much difference at this stage - it was a Catalina design decision back in the P1 days to limit some addresses to 16MB (i.e. 24 bits). Seemed like it would be more than enough. It would be possible to remove the limit specifically for the P2 for LARGE mode programs, but I don't want to get into this yet - I still have too much to do as it is.
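The link between a 24-bit address and a 16MB limit is plain arithmetic: 2^24 bytes is 16 x 1024 x 1024 bytes. A small sketch in standard C, with hypothetical helper names that are not taken from Catalina's sources:

```c
#include <stdint.h>

/* Bytes addressable with `bits` address bits (valid for bits < 64). */
static uint64_t addressable_bytes(unsigned bits)
{
    return (uint64_t)1 << bits;
}

/* Nonzero if `addr` fits in a 24-bit address field. */
static int fits_24_bits(uint32_t addr)
{
    return (addr >> 24) == 0;
}
```

The highest address that fits is 0x00FFFFFF. Anything at or above 0x01000000 needs a 25th bit, which is why half of a 32MB PSRAM falls outside a 24-bit code address range.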
I'll add it to my "to do" list!
Ross.
cool
Mike
Nice work Ross!
A quick update: I now have a first cut of both the C libraries and Catalina's linker working on the P2. So now I can develop C programs using only the Propeller 2 itself.
It is still basically a manual process, with each step (i.e. preprocessing, compiling, linking, assembling) requiring a separate command. This is basically what Catalina's "wrapper" program (i.e. the 'catalina' command itself) manages for you on the PC. But on the PC there are lots of operating system facilities available to assist - e.g. spawning subprocesses, executing shell commands as a subprocess and piping data between them, etc etc. None of these facilities are available on the P2 (at least not in the same form).
While the process will never be as seamless on the P2 as it is on a PC, there is still some work I can do to make it as easy as possible.
Is it ultimately going to be worth all the effort? I doubt it - in fact, I suspect I may be the only one that ever uses it!
Ross.
You never know. If you can make it seamless, with a basic IDE for editing and compiling files (a bit like the QBASIC or TURBO-C IDEs) so that people can edit, build and execute their code, and have the Prop2 restart back into this environment within a second or so of a P2 reset, it could be quite a useful little standalone setup for people to write and test their programs. Like the PCs of old used to be.
Ross, just saw this! Excellent!
https://github.com/jbruchon/elks
Maybe something like ELKS would fill the gap. Edit: Maybe not. It seems very close to the X86...
Still, maybe there is something out there nice and small enough to make sense.
It may not be necessary. After a bit more tidying up, here are the commands I now execute in Catalyst to compile a C program called "hello.c" on the P2:
I could simplify this further, but I don't really need to because I can already script this in Catalyst - all I need to do is to generate the script. I could do this manually using the vi text editor but I plan to write a simple Lua program to do it. I will call it 'catalina.lua'. It will parse its parameters and use them to generate the script, so that in Catalyst I will be able to just say ...
catalina hello.c -lci
... just as I would on the PC.
I will release it when I get to that point, and then start adding all the bells and whistles needed for more complex cases.
Someone else can then write the operating system!
Ross.
EDIT: commands updated again!
So, you can edit a c program, compile, and execute all on P2, right?
Do you have some kind of OS to enter those commands? Or, is it rigged to boot into vi and then automatically compile after that?
Yes, Catalyst is my OS for the Propeller 1 and Propeller 2. This is an updated list of features that will be in the next Catalyst reference manual:
Ross.
@RossH I am ashamed to admit I had no idea about Catalyst. That's really full featured and cool! I am going to have to try it!
PS- As a fan of VI, it makes me happy you have that as the editor.
The self-hosted C development will be in the next release, but everything else is there already.
Ah, thank you. I was a bit shy to ask how come we were able to achieve so much on an 8MHz PC, developing and running code in the (amazing) QuickBasic 4.5 IDE.
Craig
Seems more young people use Python instead of Basic these days...
@RossH Can Python already work in your Catalyst setup? You have an editor... Think there was some micropython already working on P2...
@RossH What about pure pasm2 files? Is it possible to edit, compile, and run them via Catalyst?
Sure. Catalyst doesn't care about the language programs have been written in. It's basically a glorified SD Card program loader plus some associated SD card utilities. Sure, it knows about all the Catalina complexities such as XMM RAM etc, but it can also just be used to load and execute any binary file that can run on the Propeller.
Ross.
In the next release, yes. I have ported Dave Hein's p2asm to the P2 because that's what Catalina uses to assemble its pasm output.
Ross.
Ah, the good old days when computers actually worked for us rather than the other way around!
Well ...
The good news is that it all works.
The bad news is that it takes 25 minutes for the P2 to compile a simple "Hello, World" C program!
Not sure whether to say or
That still sounds like great progress, so maybe "say" the average ....?
Goes to show that software is still bloating out hand in hand with hardware capabilities. The Moore's Law of Software, maybe?
Yeah seems like there is probably something drastically wrong there. Something could be thrashing badly. When I ran P2 native Micropython from PSRAM instead of HUB (with a cache) it seemed to operate at about 20-40% performance vs HUB exec (with simulated hit rates of 50-100%). Even with a 0% hit rate (all execution in a page triggering a new PSRAM read) I could still get my benchmarks running around 12% native.
Some stuff I did is covered here:
https://forums.parallax.com/discussion/comment/1536446/#Comment_1536446
Of course this performance will depend on the code itself and how well it maps to the cache size available. Perhaps the working set your compiler tools need is much larger in the situation you are testing, in which case some further tuning may be needed. Another difference is that my data was still being read from HUB RAM - I needed more gcc patching work to move that into the PSRAM, and that could slow things down considerably further, depending on cache performance.
I know what the problem is, and I can improve things quite a lot.
As @rogloh suggested, the compilation process is thrashing, but it is thrashing the SD Card, not the RAM.
It is the C preprocessor that is taking nearly all the time. It has to open many files simultaneously because of the number of nested #include statements. I also use the C preprocessor to pre-process the assembly language runtime support files, because it allows me to have all the myriad options that Catalina supports in just a few key files, surrounded by #if .. #else ... #endif constructs. Doing it any other way would be a maintenance nightmare.
It is reading and writing multiple files on the SD Card simultaneously that is the bottleneck. SD Cards are slow, and are also not optimized for this type of intensive random interleaved read/write access!
Preprocessing the C sources is slow but not unbearable, because C source files tend to be both small and sparse (typically a few hundred lines, for a total file size of a few kilobytes). But the assembly language files generated, and the runtime support files, are typically dense (e.g. thousands of lines, with a total file size of dozens or even hundreds of kilobytes). Even a trivial C "Hello, World" program which is only about 50 characters, and which generates a final code size of only a few kilobytes, can generate intermediate files over 200KB! It is preprocessing these files that is taking most of the time.
Some of this is unavoidable, but there are things I can do to improve things. For instance, the assembly runtime support code is full of comments. This can be 2/3 of the size of the file. Of course, I don't want to lose the comments, but I can easily remove them from the versions of the files on the SD card. Just doing this will probably halve the compilation time!
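As a rough illustration of the comment-stripping idea, here is a minimal sketch in standard C. It assumes PASM line comments start with an apostrophe and that strings are double-quoted. The function name strip_pasm_comment is hypothetical, block comments are not handled, and this is not how Catalina's own pstrip utility is known to work.

```c
#include <stddef.h>

/* Truncate `line` in place at the first apostrophe that is not inside
   a double-quoted string, then trim trailing spaces and tabs.
   Returns the new length. Hypothetical sketch: { ... } block comments
   are not handled, and a trailing newline after a comment is dropped. */
static size_t strip_pasm_comment(char *line)
{
    int in_string = 0;   /* nonzero while between double quotes */
    size_t len = 0;      /* length of the text kept so far */

    for (size_t i = 0; line[i] != '\0'; i++) {
        if (line[i] == '"') {
            in_string = !in_string;
        } else if (line[i] == '\'' && !in_string) {
            break;       /* start of a line comment */
        }
        len = i + 1;
    }
    while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t')) {
        len--;
    }
    line[len] = '\0';
    return len;
}
```

Applied to a line such as "Loop drvnot #LED_PIN ' toggle", this leaves "Loop drvnot #LED_PIN". A line that is only a comment shrinks to an empty string, which a real tool would probably drop from the output altogether.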
I can also simplify the support files dramatically - the Windows and Linux versions of Catalina are cross-compilers that have to be able to compile programs for any supported platform, but the P2 version needs to support only the one it is actually running on. I can also modify the compilation process so that it preprocesses the runtime files only once, rather than re-doing it every time.
Another possibility is using the spare PSRAM space as a file system for temporary files. That would speed up the whole process dramatically.
However, this is turning out to be much more work than I expected, and I am no longer sure it is worth doing.
Ross.
This is something worth looking at in general. I think a temporary file system for compilation and other tool use involving files would be very handy. It would not be difficult to map sector accesses from some filesystem to memory burst reads/writes, and character I/O is easily done too. APIs for block and character reads are easily translated into the driver's different memory accesses.
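The sector mapping described here can be sketched in a few lines of standard C. In this sketch an ordinary array stands in for the PSRAM, and memcpy stands in for the driver's burst reads and writes. The names ramdisk_read and ramdisk_write, the 512-byte sector size and the tiny disk size are all assumptions made for illustration, not part of any existing driver.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u
#define NUM_SECTORS 64u   /* tiny 32 KB disk, just for the sketch */

/* Ordinary array standing in for external PSRAM (assumption). */
static uint8_t ram_backing[SECTOR_SIZE * NUM_SECTORS];

/* Copy one sector out of the backing store into buf.
   Returns 0 on success, -1 if the sector is out of range. */
static int ramdisk_read(uint32_t sector, uint8_t *buf)
{
    if (sector >= NUM_SECTORS) return -1;
    memcpy(buf, &ram_backing[sector * SECTOR_SIZE], SECTOR_SIZE);
    return 0;
}

/* Copy one sector from buf into the backing store.
   Returns 0 on success, -1 if the sector is out of range. */
static int ramdisk_write(uint32_t sector, const uint8_t *buf)
{
    if (sector >= NUM_SECTORS) return -1;
    memcpy(&ram_backing[sector * SECTOR_SIZE], buf, SECTOR_SIZE);
    return 0;
}
```

A file system layer that already talks to an SD card through sector read and write calls could, in principle, be pointed at a pair of functions like these for its temporary files, with the memcpy calls replaced by the PSRAM driver's own transfers.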
I hope you can persevere a bit longer to see how much of an improvement can be made....but I fully understand if you don't have the time to commit to it all right now.
An update: a few small tweaks (such as removing comments from the runtime support files to reduce their file size) have brought the compile time down to 15 minutes. As I had thought, it is the SD card throughput which is the bottleneck.
The bulk of the time (8 minutes) is now spent doing the final pasm assembly, which at least allows me to salvage a modicum of self-respect.
Ross.
Does your SD driver have one global sector buffer or one per open file? If the former, that would explain great slowdown when operating on multiple files character-wise.
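The effect this question is getting at can be shown with a small simulation in standard C. It counts how many sector loads are needed when several files are read one byte at a time in round-robin order, either through one shared sector buffer or through one buffer per file. Everything here (the function name, the 512-byte sector size, the access pattern) is an assumption for illustration and is not taken from any real driver.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u
#define MAX_FILES   4u
#define EMPTY       0xFFFFFFFFu   /* marks a buffer that holds nothing yet */

/* Count simulated sector loads when `nfiles` files are each read for
   `bytes_each` bytes, one byte at a time, round-robin across the files.
   shared != 0: all files go through a single sector buffer.
   shared == 0: each file has its own sector buffer. */
static uint32_t count_sector_loads(unsigned nfiles, uint32_t bytes_each,
                                   int shared)
{
    uint32_t cached[MAX_FILES];   /* (file, sector) tag held by each buffer */
    uint32_t loads = 0;

    if (nfiles > MAX_FILES) nfiles = MAX_FILES;
    for (unsigned f = 0; f < MAX_FILES; f++) cached[f] = EMPTY;

    for (uint32_t pos = 0; pos < bytes_each; pos++) {
        for (unsigned f = 0; f < nfiles; f++) {
            unsigned buf = shared ? 0u : f;
            /* Tag combines file number and sector number. */
            uint32_t tag = ((uint32_t)f << 24) | (pos / SECTOR_SIZE);
            if (cached[buf] != tag) {   /* miss: sector must be reloaded */
                cached[buf] = tag;
                loads++;
            }
        }
    }
    return loads;
}
```

With two files of 1024 bytes each, per-file buffers need 4 loads in total, while a single shared buffer needs 2048, because every byte read evicts the other file's sector.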
DOSFS has separate buffers per open file, so I think the bottleneck is either the SD card driver, or the SD card itself. I will investigate the driver, which is a fairly straightforward SPI driver written originally by @Cluso99 - maybe it can be improved.
I wonder if anyone has written a driver that uses the higher speed modes available on some SD cards? If anyone knows of one, please let me know!
Ross.