I can't reproduce your results exactly - that is probably due to the use of a different type of SD card. However, the important thing here is how much using the cache speeds things up.
These results use a SanDisk 32GB Extreme, at 300MHz, in Catalina NATIVE mode:
As expected, using the cache speeds things up quite dramatically, and shows that this is a much more significant way to improve performance than just improving the low level SD card driver speed.
One other thing to note - you MUST format the SD card between each test, otherwise the tests are not using the same sectors on the SD card - and this makes a MASSIVE difference to the speeds - SD card manufacturers seem to cheat by making the first sectors much faster than subsequent sectors - presumably to give good results on simple speed tests.
Still working on the LARGE mode testing - I can't seem to get consistent results, which indicates I may have a bug with the cache in LARGE mode.
@RossH said:
One other thing to note - you MUST format the SD card between each test, otherwise the tests are not using the same sectors on the SD card - and this makes a MASSIVE difference to the speeds - SD card manufacturers seem to cheat by making the first sectors much faster than subsequent sectors - presumably to give good results on simple speed tests.
I would argue the opposite then. If there is something the cards are doing to make it faster at the start then messing it up plenty before the real test would be in order. Otherwise you'd be falling for a cheat.
I suppose you could start with writing a known number of files to the card, rather than an empty card. The main thing is that it be redone exactly the same way before each test (and using the same formatting and file writing program), otherwise you may be using different clusters (in different sections of the card) for different tests - and you will get different results.
Even so, if you are using different file system implementations (e.g. DOSFS vs FatFS) they may allocate clusters to files differently, so you may still get differences. But it is probably fairest to start with an empty card, or at least write a test that utilizes more of the card than just a few sectors (in this case, about 200 - see next post).
This test program only uses about 200 sectors (204 I think), so any cache of 100KB or so would give the same results - which would easily fit in Hub RAM. But my program uses PSRAM for the cache, so the cache could be up to 16MB. A smaller cache (say 64KB or 32KB) would still give good - but slightly slower - results.
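For anyone who hasn't met the idea, here is a minimal sketch of the kind of write-back sector cache being discussed. It is purely illustrative and is not Catalina's actual code: the sd_read_sector/sd_write_sector helpers are hypothetical stand-ins for the low-level driver, the lines sit in a small static array rather than PSRAM, and the hit/miss counters are just simple instrumentation.

/*
 * Illustrative sketch only - NOT Catalina's actual cache code.
 * A direct-mapped, write-back cache of 512-byte SD sectors.
 */
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE   512
#define CACHE_SECTORS 128                 /* ~64KB of cached data in this sketch */

typedef struct {
    uint32_t lba;                         /* sector number currently held */
    uint8_t  valid;
    uint8_t  dirty;                       /* modified but not yet flushed */
    uint8_t  data[SECTOR_SIZE];
} cache_line_t;

static cache_line_t cache[CACHE_SECTORS];
static uint32_t cache_hits, cache_misses; /* simple instrumentation */

extern int sd_read_sector(uint32_t lba, uint8_t *buf);        /* hypothetical driver call */
extern int sd_write_sector(uint32_t lba, const uint8_t *buf); /* hypothetical driver call */

/* Read one sector through the cache; only a miss touches the SD card. */
int cache_read_sector(uint32_t lba, uint8_t *buf)
{
    cache_line_t *line = &cache[lba % CACHE_SECTORS];

    if (line->valid && line->lba == lba) {
        cache_hits++;                                     /* hit: no SD access at all */
    } else {
        cache_misses++;
        if (line->valid && line->dirty)
            sd_write_sector(line->lba, line->data);       /* write back the old data first */
        if (sd_read_sector(lba, line->data) != 0)
            return -1;
        line->lba   = lba;
        line->valid = 1;
        line->dirty = 0;
    }
    memcpy(buf, line->data, SECTOR_SIZE);
    return 0;
}

A write path would be the mirror image: update the cached line, mark it dirty, and defer the physical SD write until the line is evicted or flushed - which is why a cache large enough to hold every sector a test touches approaches a 100% hit rate.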
That's a page of vague fear-mongering. There's not one reason given there as to why. Although there is a hint about size boundaries. GParted has used 1 MB partitioning boundaries forever, i.e. the first partition starts at a 1 MB offset by default.
I always now use the SD Association format program, but out of interest I re-ran your program using a blank 32GB SanDisk Extreme SD card formatted with different programs - the SD Association one, the Windows one, and the Linux one (gparted). All different. The differences were generally small (between 5% and 10%), but the SD Association one was fastest, then gparted, then Windows.
I note it has a --discard option. I coded a full card discard for some of my cards not long ago - to see if that made any difference to testing speed but it didn't. Probably because it only matters after the card has been written 100%.
Well, I've learnt a good place to put these manually added binaries is in /usr/local/bin. It was, surprisingly, a totally empty directory. There is meant to be a ~/.local/bin as well but, although the directory already existed, the default path doesn't search it.
Disk /dev/sdg: 29.72 GiB, 31914983424 bytes, 62333952 sectors
Which is exactly 30436.5 MB.
SDA formatted:
Device Boot Start End Sectors Size Id Type
/dev/sdg1 8192 62333951 62325760 29.7G c W95 FAT32 (LBA)
GParted formatted:
Device Boot Start End Sectors Size Id Type
/dev/sdg1 2048 62332927 62330880 29.7G b W95 FAT32
GParted formatted, with additional LBA flag set:
Device Boot Start End Sectors Size Id Type
/dev/sdg1 2048 62332927 62330880 29.7G c W95 FAT32 (LBA)
GParted formatted, LBA flag set, no alignment, 4 MB unused at beginning:
Device Boot Start End Sectors Size Id Type
/dev/sdg1 8192 62333951 62325760 29.7G c W95 FAT32 (LBA)
Ha! I found the issue that was giving me such inconsistent results when using the cache in XMM programs!
I have to thank you for your test program, @evanh - it drew my attention to a piece of horribly inefficient code in my cache functions. It bamboozled me for a while because the effect on NATIVE programs is so small as to be almost unnoticeable - but in XMM programs (which execute from PSRAM and therefore execute more slowly) it was inefficient enough that in some cases it made the cached version of the function execute SLOWER than the non-cached version!
Still testing, but rewriting just a couple of lines has so far improved the performance of the self-hosted version of Catalina (which executes from PSRAM) by another 20%!
However, it does make me wonder what other silly things I have yet to find!
I've done some benchmarking on my new SD card cache, and the results are looking pretty good.
Here are the latest times for compiling some Catalina demo programs, using the self-hosted version of Catalina running at 300MHz on the P2 Eval board equipped with the HyperRAM add-on:
hello.c (~5 SLOC) - 3 mins
othello.c (~500 SLOC) - 4 mins
startrek.c (~2200 SLOC) - 18 mins
chimaera.c (~5500 SLOC) - 55 mins
dbasic.c (~13500 SLOC) - 104 mins
Compile times are now between 2 and 3 times faster than the initial release. I always wanted to get down to 3 minutes for a simple "Hello, World!" type C program, and this version finally achieves that.
All compiles done using Catalina's Quick Build option - which in hindsight was perhaps not the best name to choose!
The improved cache will be included in the next release.
That's a nice improvement but I'm kind of surprised it's not better than this. If hello.c has 5 lines of code do you know how many lines of additional header files are being parsed?
Have you looked into what sort of cache hit rate you are getting and what proportion of the 3 mins is just waiting to read/write external memory? I guess it would be possible to instrument your cache driver to spit out hits/miss ratios during a compile and the average time taken for a hit and a miss.
With the old 20-33MHz DOS machines I used to use back in the day running Turbo Pascal and Turbo C etc, compiling a hello world type of program would maybe only take like 10-30secs or so (just roughly guessing from my old memory). It certainly was a lot less than 3mins. I'd imagine a 300MHz P2 should be somewhere in the vicinity of that depending on the cache performance.
Is most of the time looking up data structures, or blocked doing file IO, or from lots of cache misses? It'd be interesting to know where the bottleneck is if you wanted to speed it up further.
Update: Actually if you have a 70% cache hit rate and a cache miss takes 1us to resolve (approx), then the resulting MIP rate drops from 150MIPs max (@300MHz) down to about 1MIP for each miss. So 30% of the instructions you run at ~1MIP and 70% you run at 150MIPs. Avg instruction rate is then 100 instructions serviced in 30/1us+70/150 microseconds or 100/30.467us = 3.28MIPs. This is ~10% of the ballpark of a slow old PC which I guess is more in line with your recent results - although this is with a RISC vs CISC ISA so still not really a proper comparison.
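To make that arithmetic concrete, here is a tiny sketch of the same calculation; the hit rate, miss penalty and peak rate are the assumed figures from the paragraph above, not measurements:

/*
 * Rough model of the arithmetic above: average instruction throughput when
 * some fraction of instructions stall on a cache miss.
 */
#include <stdio.h>

int main(void)
{
    double hit_rate  = 0.70;     /* 70% of instructions hit the cache        */
    double hit_mips  = 150.0;    /* ~150 MIPs peak at 300MHz                 */
    double miss_mips = 1.0;      /* ~1 MIP effective while servicing a miss  */

    /* microseconds to retire 100 instructions */
    double us = 100.0 * ((1.0 - hit_rate) / miss_mips + hit_rate / hit_mips);

    printf("average rate = %.2f MIPs\n", 100.0 / us);     /* prints ~3.28 */
    return 0;
}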
@rogloh said:
That's a nice improvement but I'm kind of surprised it's not better than this. If hello.c has 5 lines of code do you know how many lines of additional header files are being parsed?
About 200.
Have you looked into what sort of cache hit rate you are getting and what proportion of the 3 mins is just waiting to read/write external memory? I guess it would be possible to instrument your cache driver to spit out hits/miss ratios during a compile and the average time taken for a hit and a miss.
Yes, the cache hit rate is good. In most of these programs, the cache is large enough to hold all the sectors used, and it is a "write-back" cache, so in fact the hit rate approaches 100%. I have not measured the time taken to actually read/write from PSRAM, since there is probably no way I could improve it much even if I found out that was a significant bottleneck.
With the old 20-33MHz DOS machines I used to use back in the day running Turbo Pascal and Turbo C etc, compiling a hello world type of program would maybe only take like 10-30secs or so (just roughly guessing from my old memory). It certainly was a lot less than 3mins. I'd imagine a 300MHz P2 should be somewhere in the vicinity of that depending on the cache performance.
Even with a cache hit rate of 100%, you still have to ultimately read and write hundreds of kilobytes - even megabytes (the core LCC program is nearly 2MB) - from the SD card to compile even a 5 line program. Doing that from an SD card is going to take more time than doing it from a hard disk.
Then there is the fact that some of the work is done using Catalyst's scripting capabilities, which are quite slow because (for example) they have to reboot the Propeller and reload Catalyst (and then possibly also Lua) to execute each and every line of script. It's clunky, but it works.
Overall, I'd guess that somewhere between 30 seconds and 1 minute of every compilation time - even for just a 5 line program - is taken up by things that are just not that easy to speed up. The cache (once I got it working right) was actually a very easy win.
Is most of the time looking up data structures, or blocked doing file IO, or from lots of cache misses? It'd be interesting to know where the bottleneck is if you wanted to speed it up further.
Much of the compilation time is spent in the C preprocessor (cpp). The one LCC uses is grindingly slow. I have thought about replacing it - there are open source ones that would be faster. That might save another 15 seconds or so.
Another big issue is the way Catalina does linking - the C library is not actually compiled to binary - it is just translated to PASM. That means every compilation - even for a 5 line program - is actually concatenating and then assembling a large chunk of the standard C library. This design decision probably looks a little crazy, but at the time it was not - in the very early days of the Propeller 1 there was simply no other choice. There was not even a publicly documented executable format for Propeller programs, and no tools available to use it if there had been - i.e. no intermediate object formats and no linker! Even Spin's executable format had to be reverse engineered. Luckily, there were some clever people around at the time who did that, and who produced alternative Spin tools that expanded on the Parallax offerings. The very first version of Catalina actually spat out the complete program in a combination of Spin and PASM, and then used the Parallax Spin tool to compile/assemble the final result. Catalina soon outgrew the Parallax tools, but luckily by that time there were alternatives (bstc, HomeSpun, OpenSpin etc). On the P2, Catalina no longer uses any Parallax or Spin tools at all, since someone (Dave Hein) finally produced a proper assembler.
Finally, there is the insurmountable issue that because these programs are too large to fit in Hub RAM, they have to execute from external RAM. So the PSRAM is not only being used as an SD cache - it is also being used to serve up the executable code itself. I also cache that, of course - but even so, executing code from external RAM can be anywhere from 2 to 10 times slower than executing the same code from Hub RAM.
Update: Actually if you have a 70% cache hit rate and a cache miss takes 1us to resolve (approx), then the resulting MIP rate drops from 150MIPs max (@300MHz) down to about 1MIP for each miss. So 30% of the instructions you run at ~1MIP and 70% you run at 150MIPs. Avg instruction rate is then 100 instructions serviced in 30/1us+70/150 microseconds or 100/30.467us = 3.28MIPs. This is ~10% of the ballpark of a slow old PC which I guess is more in line with your recent results - although this is with a RISC vs CISC ISA so still not really a proper comparison.
I will of course look at further speed improvements, but from here on in every improvement will be harder work for less benefit. And I have other things I'd rather be doing.
@rogloh said:
That's a nice improvement but I'm kind of surprised it's not better than this. If hello.c has 5 lines of code do you know how many lines of additional header files are being parsed?
About 200.
Have you looked into what sort of cache hit rate you are getting and what proportion of the 3 mins is just waiting to read/write external memory? I guess it would be possible to instrument your cache driver to spit out hits/miss ratios during a compile and the average time taken for a hit and a miss.
Yes, the cache hit rate is good. In most of these programs, the cache is large enough to hold all the sectors used, and it is a "write-back" cache, so in fact the hit rate approaches 100%. I have not measured the time taken to actually read/write from PSRAM, since there is probably no way I could improve it much even if I found out that was a significant bottleneck.
At 300MHz my rough rule of thumb is about 1us of time to read in a "small" block of data like 32 bytes or less for a transfer. If your transfer sizes are larger then add another 2*transfer size/sysclk micro seconds to the service time to account for the extra transfer time. If you write a custom memory driver that directly couples your cache to external memory interface and avoids the COG service polling overhead you can speed it up a bit, but you'll then lose the ability to easily share the memory with multiple COGs (or will have to invent your own memory driver sharing scheme which likely slows it down again as well).
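Expressed as a helper function (purely illustrative - the 1us base cost and the 2 cycles per byte are the assumed figures from the rule of thumb above, not measured values):

/*
 * The rule of thumb above as a helper: ~1us fixed cost per request, plus
 * roughly 2 sysclk cycles per byte once the transfer is bigger than a
 * small burst.
 */
static double xfer_service_time_us(unsigned bytes, double sysclk_mhz)
{
    double t = 1.0;                           /* base cost of a small request */
    if (bytes > 32)
        t += 2.0 * bytes / sysclk_mhz;        /* extra transfer time          */
    return t;
}

/* e.g. xfer_service_time_us(512, 300.0) is about 4.4us for a full SD sector */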
With the old 20-33MHz DOS machines I used to use back in the day running Turbo Pascal and Turbo C etc, compiling a hello world type of program would maybe only take like 10-30secs or so (just roughly guessing from my old memory). It certainly was a lot less than 3mins. I'd imagine a 300MHz P2 should be somewhere in the vicinity of that depending on the cache performance.
Even with a cache hit rate of 100%, you still have to ultimately read and write hundreds of kilobytes - even megabytes (the core LCC program is nearly 2Mb) - from the SD card to compile even a 5 line program. Doing that from an SD card is going to take more time than doing it from a hard disk.
Yeah, it will need to be read into PSRAM/HyperRAM initially. If the tools get swapped frequently during overall compilation that will add up.
Then there is the fact that some of the work is done using Catalyst's scripting capabilities, which are quite slow because (for example) they have to reboot the Propeller and reload Catalyst (and then possibly also Lua) to execute each and every line of script. It's clunky, but it works.
Sounds like it.
Overall, I'd guess that somewhere between 30 seconds and 1 minute of every compilation time - even for just a 5 line program - is taken up by things that are just not that easy to speed up. The cache (once I got it working right) was actually a very easy win.
Is most of the time looking up data structures, or blocked doing file IO, or from lots of cache misses? It'd be interesting to know where the bottleneck is if you wanted to speed it up further.
Much of the compliation time is spent in the C preprocessor (cpp). The one LCC uses is grindingly slow. I have thought about replacing it - there are open source ones that would be faster. That might save another 15 seconds or so.
Another big issue is the way Catalina does linking - the C library is not actually compiled to binary - it is just translated to PASM. That means every compilation - even for a 5 line program - is actually concatenating and then assembling a large chunk of the standard C library. This design decision probably looks a little crazy, but at the time it was not - in the very early days of the Propeller 1 there was simply no other choice. There was not even a publicly documented executable format for Propeller programs, and no tools available to use it if there had been - -i.e. no intermediate object formats and no linker! Even Spin's executable format had to be reverse engineered. Luckily, there were some clever people around at the time who did that, and who produced alternative Spin tools that expanded on the Parallax offerings. The very first version of Catalina actually spat out the complete program in a combination of Spin and PASM, and then used the Parallax Spin tool to compile/assemble the final result. Catalina soon outgrew the Parallax tools, but luckily by that time there were alternatives (bstc, HomeSpun, OpenSpin etc). On the P2, Catalina no longer uses any Parallax or Spin tools at all, since someone (Dave Hein) finally produced a proper assembler.
Yeah I agree, additional steps like that would have to slow it down.
Finally, there is the insurmountable issue that because these programs are too large to fit in Hub RAM, they have to execute from external RAM. So the PSRAM is not only being used as an SD cache - it also being used to serve up the executable code itself. I also cache that, of course - but even so, executing code from external RAM can be anywhere from 2 to 10 times slower than executing the same code from Hub RAM.
Yeah the external memory operation does slow it down a fair bit especially during cache misses.
I will of course look at further speed improvements, but from here on in every improvement will be harder work for less benefit. And I have other things I'd rather be doing.
This is a full release. It significantly improves the speed of the P2 self-hosted version of Catalina, and provides better support for the Parallax HyperFlash and HyperRAM add-on board on all P2 platforms. Please note especially the change of the default pins used for this add-on board, and also the VGA and USB adaptor boards. Plus a few other minor bug fixes and improvements.
Here is the relevant extract from the README.TXT:
New Functionality
-----------------
1. On all Propeller 2 boards, the Parallax HyperFlash/HyperRAM add-on board
is now configured to use pin 0 by default rather than pin 16. This change
was made because pin 0 allows a wider range of clock frequencies to be
used on most boards, allowing (for example) the self-hosted version of
Catalina itself to be compiled at 300MHz. To accommodate the change, the
VGA base pin has been changed to pin 16 and the USB base pin to pin 24.
2. A new utility to test the delay setting used for the PSRAM and HYPER RAM
drivers has been added in the file /demos/p2_ram/delay.c. This utility can
be used to verify that the current delay setting (configured in the platform
configuration file) works across a selected range of clock frequencies.
See the README.TXT file in that folder for more details.
3. Add a -Q option to the catalina command, which can also be specified by
defining the Catalina symbol QUICKFORCE (e.g. -C QUICKFORCE), to force a
Quick Build. This option is similar to the -q option except that it will
re-build the target file even if one already exists. If the target file
does not exist, then the -Q option has the same effect as the -q option.
4. For the Propeller 2, Catalyst can now be built for the P2 Edge to use
either the PSRAM (if installed - i.e. the P2-EC32MB) or the HyperRAM
add-on board (on base pin 0 by default). To specify that Catalyst and
associated utilities should use the HyperRAM rather than on-board PSRAM,
specify HYPER when building it using the build_all script. You can also
specify PSRAM to use the PSRAM on the P2-EC32MB, but this is not necessary
since this is the default if HYPER is not specified. If specified, HYPER
or PSRAM should be the second parameter. For example:
cd demos\catalyst
build_all P2_EDGE SIMPLE VT100 OPTIMIZE MHZ_200
or
cd demos\catalyst
build_all P2_EDGE PSRAM SIMPLE VT100 OPTIMIZE MHZ_200
or
cd demos\catalyst
build_all P2_EDGE HYPER SIMPLE VT100 OPTIMIZE MHZ_200
The catalyst build scripts have been amended to facilitate building for
the P2 Edge demos using either HYPER or PSRAM by adding a new script
called 'build_p2' which accepts one or two parameters - the platform and
(optionally) the type of XMM RAM to use. Do not specify any other options.
For example:
build_p2 P2_EDGE <-- for P2_EDGE using on-board PSRAM
build_p2 P2_EDGE PSRAM <-- ditto
build_p2 P2_EDGE HYPER <-- for P2_EDGE using HYPER RAM add-on board
build_p2 P2_EVAL <-- for P2_EVAL using HYPER RAM add-on board
build_p2 P2_EVAL HYPER <-- for P2_EVAL using HYPER RAM add-on board
build_p2 P2_EVAL PSRAM <-- will generate an error (not supported)
This script will build two ZIP files - one that uses the SIMPLE serial
HMI option named for the platform (e.g. P2_EDGE.ZIP) and one that uses
the VGA HMI option (e.g. P2_EDGE_VGA.ZIP). Note that the 'p2_edge' and
'p2_eval' scripts have now been removed, but the same function can be
achieved using the new 'build_p2' script. Also note that the 'build_demos'
script still only builds demos for the P2_EDGE using PSRAM and for the
P2_EVAL using HYPER RAM - but building for the P2_EDGE using HYPER RAM
instead can be done with the following command:
build_p2 P2_EDGE HYPER
5. Added a new demo (in the folder demos/sd_cache) to provide an example of
how to build a local custom version of the Catalina C library to implement
some platform-specific functionality, such as (in this case) enabling the
PSRAM-based SD card cache. This folder also contains a test program that
can be used to demonstrate the SD cache. See the README.TXT file in that
folder for more details.
6. Removed the file demos/catalyst/README.SMM_Loader. The information it
contained has been added to the Catalyst Reference Manual.
Other Changes
-------------
1. Fixed an issue with Catalina only supporting 200MHz when the HyperRAM
add-on board is used on the P2 Evaluation board. The platform configuration
files (target/p2/P2EVAL.inc and also target/p2/P2EDGE.inc) have been
updated to disable the use of Fast Reads (HYPER_FASTREAD) and also modify
the RAM read delay (HYPER_DELAY_RAM). Note that the default delay (10)
works from 150 to 260MHz, but outside that range it may need to be adjusted.
The new utility (demos/p2_ram/delay.c) included in this release can be used
to verify the specified delay works across a range of clock frequencies.
See also the platform configuration file (e.g. P2EVAL.inc) for more details.
Affected the Propeller 2 Evaluation board only.
2. Fix some non 8.3 filenames in library files:
getrealrand.c -> getrrand.c
hub_malloc.c -> hmalloc.c
registerx.e -> register.e
Affected the Propeller 2 under Catalyst only.
3. Fix the C include file stdint.h to specifically identify Catalina as one
of the platforms with 32 bit pointers.
Affected The Propeller 2 under Catalyst only.
4. Fix LCC bug in pointer comparisons which led to the compiler emitting
erroneous warning messages such as "overflow in converting constant
expression from `pointer to char' to `pointer to void".
Affected The Propeller 2 under Catalyst only.
5. Fixed an issue with the COMPACT version of the getcnt() function on the
Propeller 2. Affected COMPACT mode programs on the Propeller 2 only.
6. The pre-built Catalyst demo (P2_DEMO.ZIP) was incorrectly claiming it was
built for a P2_EVAL board (in the CATALYST.ENV file, as displayed by the
'set' command). It now says it was built for the P2_CUSTOM platform.
Affected the Propeller 2 only.
The number 8 is lucky in Chinese, so release 8.8 must be double lucky! I really hope it is, because I will be putting Catalina into "maintenance" mode for the next few months while I attend to some mundane personal stuff. But I will still do bug fixes if anyone finds an issue that really needs it.
Hmmm. Something seems to have changed recently on Windows 10. Catalina's version of telnet no longer works - but everything else seems to be fine. It is most likely due to a recent Windows update, but it may take me a while to pin it down and (if necessary) fix it.
However, everything (including telnet) works fine under Windows 11. Since Windows 10 is due to have support pulled later this year, it may not be worth fixing.
@RossH said:
Hmmm. Something seems to have changed recently on Windows 10. Catalina's version of telnet no longer works
Found the problem. It was not Windows 10, it was Oracle VirtualBox. For reasons unknown it had created and enabled another virtual Ethernet adapter, which interferes with local Windows network connections even if they use WiFi and not Ethernet, and even if VirtualBox is not running. Windows 11 worked because that machine did not have VirtualBox installed. I seem to recall discovering this particular issue once before.
Disabling this virtual network adapter using Windows Device Manager fixes the problem (note - you must disable it, not uninstall it - otherwise VirtualBox just creates another one).
I don't much like Windows 11, and I use it as little as possible - but while messing about with it today (to try and find the problem with telnet, which turned out to be a VirtualBox problem anyway, not a Windows problem) I noticed that in Windows 11, the default application used when executing batch files etc. is the Terminal app, not the standard Windows command processor (i.e. cmd.exe).
This is generally ok, but there are a few things (such as the payloadi batch script) that won't work correctly under the Terminal app. To make them work correctly, you can go to "Settings" in the Terminal app, and set the "Default Terminal Application" to Windows Console Host instead of Windows Terminal or Let Windows Decide (but see EDIT1 and EDIT2, below!).
Ross.
EDIT 1: I have found a way to do this that does not require modifying the Windows Terminal app. Instead, open the properties of the "Catalina Command Line" desktop shortcut (and/or the Start menu entry) and change the Target from: "C:\Program Files (x86)\Catalina_8.8\bin\catalina_cmd.bat"
to conhost.exe "C:\Program Files (x86)\Catalina_8.8\bin\catalina_cmd.bat"
Works a treat!
EDIT 2: Also, at least one of the Catalina Geany Build commands needs updating. Change the Download and Interact command to:
conhost payloadi.bat %x "%p\%e" -b%b
Note the inclusion of the ".bat" suffix - this is required.
This is a full release. It adds support for Cake (version 0.12.05) as a stand-alone preprocessor that can convert C99, C11 & C23 programs to C89, suitable for compilation by Catalina. Cake is supported on Windows, Linux and Catalyst.
Here is the relevant extract from the README.TXT:
New Functionality
-----------------
1. Cake is now included as a component of Catalina. Cake is a preprocessor
that can take C99, C11 or C23 source programs and translate them to C89,
suitable for compiling with Catalina.
Eventually, Cake may be fully integrated into Catalina, but at present it
is a separate component that must be invoked manually to convert a C99 (or
C11 or C23) program to C89, which can then be compiled with Catalina as
normal. A demo C99 program called hello_99.c is provided. To convert this
to C89 and compile it with Catalina, use Cake first and then compile the
result with Catalina as normal. For example (on the Propeller 2):
cake hello_99.c -o hello_89.c
catalina -p2 -lc hello_89.c
See the file source\cake\README.TXT for a brief introduction to Cake and
how to use it with Catalina. See the Cake manual in source\cake\manual.md
for full details on Cake.
2. Increased the maximum size of a string cpp can process from 4096 to 8192
bytes. This was required for Cake support.
3. Added va_copy macro to stdarg.h, and a new program to demonstrate its use
(demos/examples/ex_va_copy.c). This was required for Cake support.
4. Added minimum and maximums for long longs and unsigned long longs, and the
ability for CPP to parse constants of the form ddddLL or ddddULL. Currently,
long longs are the same size as longs and unsigned long longs are the same
size as unsigned longs, but some programs (like Cake) require that these
are explicitly supported.
5. Added a dummy wchar.h header file, which (at present) defines wchar_t but
does nothing else. This was required for Cake support.
6. Added an implementation of asprintf, vasprintf, snprintf and vsnprintf
to stdio, and a new program to demonstrate their use
(demos/examples/ex_snprintf.c). This was required for Cake support.
7. Added the ability for printf and family to process z or t format
specifiers - e.g. %dz, which means to print/scan something of size size_t,
or %ut, which means to print/scan something of size ptrdiff_t, and a new
program to demonstrate them (demos/examples/ex_printf_tz.c). These format
specifiers were introduced in C99, but some C89 programs assume this
functionality exists. Note that these work only in the standard C stdio
libraries (i.e. libc, libci, libcx, libcix) but not in the tiny I/O
library (libtiny). This was required for Cake support.
8. Two new functions for creating directories have been added:
mkdir(const char *name, mode_t *mode);
mkdirr(const char *name, mode_t *mode);
The mode parameter is not currently used, but is included for compatibility
with other C compilers. These new functions are defined in sys/stat.h, and
the type mode_t is defined in the new include file sys/types.h for the same
reason. Equivalent functions _mkdir and _mkdirr are defined in fs.h.
This was required for Cake support.
A new program to demonstrate the functions is included in
demos/file_systems/test_mkdir.c, and a minimal usage sketch is shown
after this list.
The mkdir function creates only the leaf directory - e.g. mkdir "a/b/c"
will make only directory "c" assuming that "a/b" already exist - whereas
mkdirr "a/b/c" will make directory "a" if it does not exist, then "a/b"
(ditto) then "a/b/c". If the functions succeed they return zero, and if
they fail (e.g. because a directory that was expected to exist does not,
or that was not expected to exist already does) then they return -1.
9. Added definitions of EOVERFLOW and EDEADLK to sys/errno.h. These are not
used by Catalina, but are expected to be defined by some C programs. This
was required for Cake support.
10. Added definitions of HUGE_VALF and HUGE_VALL. These are not used by
Catalina, but are expected to be defined by some C programs. This was
required for Cake support.
11. A version of strdup and strndup has been added to the C libraries.
These are defined in the include file string.h as:
char *strdup(const char *_s);
char *strndup(const char *_s, size_t _n);
Some Catalina components and demo programs had their own versions of
strdup and/or strndup which have now been replaced with the library
versions. This was required for Cake support.
12. Catalina's version of stdbool.h was defining bool to be unsigned int, but
was not also defining _Bool. Now it defines both as unsigned char, which
is intended to make Catalina more compatible with C99 onwards for programs
compiled using Cake. However, this may break some existing C89 programs
which expect type bool to be compatible with bit-fields (i.e. to be of an
integer type) - so stdbool.h will revert to the previous definitions of
bool and _Bool if the C symbol __INT_BOOL__ is defined (e.g. using -D on
the catalina command line). For example:
catalina my_program.c -lc -D__INT_BOOL__
13. Catalina does not support 64 bit integers, but LCC does accept the
declaration of long long and unsigned long long, and so some programs
will expect it to support them and therefore also expect the types and
values associated with them (such as INT64_MAX and UINT64_MAX) to be
defined. The stdint.h include file has been modified to define 64 bit
types and values but make them all the same as 32 bit types and values.
This makes programs (like Cake) work, but may confuse other programs
which assume that if these values are defined then 64 bit integers are
supported.
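As a minimal usage sketch of the mkdir/mkdirr functions described in item 8 above (the paths are made up, and NULL is passed for the currently unused mode parameter, matching the prototypes given there):

/*
 * Hypothetical usage of the new mkdir()/mkdirr() calls from item 8.
 * Follows the documented behaviour: 0 on success, -1 on failure.
 */
#include <stdio.h>
#include <sys/stat.h>    /* mkdir, mkdirr */

int main(void)
{
    /* create the whole path, making any missing parent directories */
    if (mkdirr("logs/2024/jan", NULL) != 0) {
        printf("could not create logs/2024/jan\n");
        return 1;
    }

    /* create a single leaf directory - its parents must already exist */
    if (mkdir("logs/2024/feb", NULL) != 0) {
        printf("could not create logs/2024/feb\n");
        return 1;
    }
    return 0;
}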
Other Changes
-------------
1. Some of the values of the fprintf and fscanf format macros defined in
inttypes.h were incorrect. Affected Windows, Linux and Catalyst.
2. The library functions sprintf and vsprintf were being limited to writing
up to 32767 characters in each call. This limit has been increased to
INT_MAX (2147483647) characters. Affected Windows, Linux and Catalyst.
3. The Catalyst build_demos and build_p2 scripts were not working under
Linux, leading to missing files in the ZIP files they created. Affected
Linux only.
This is a full release. It adds integrated support for using Cake (version 0.12.24) as an alternative to Catalina's default C preprocessor (i.e. cpp). This means that Catalina can now compile "out of the box" programs that use C99, C11 & C23 features. Cake is supported on Windows, Linux, and also on the self-hosted version of Catalina for the Propeller 2 OS (Catalyst).
I have been working with the Cake developer to test/fix many minor Cake issues, just waiting till Cake is ready to be fully integrated into Catalina. The most recent version of Cake correctly compiles all the Catalina demo programs except for those that use C89 features no longer supported in C23 (i.e. jzip and xvi/vi - but such programs can still be compiled with Catalina simply by not using cake). Significantly, Cake now correctly compiles Lua, which is a major milestone given the extensive use Lua makes of the C preprocessor - so this is an appropriate point to do an initial integrated release.
Here is the relevant extract from the README.TXT:
New Functionality
-----------------
1. Cake is now fully integrated into Catalina. While Cake can still be used as
a stand-alone preprocessor that can take C99, C11 or C23 source programs
and translate them to C89 suitable for compiling with Catalina, it can
now also be used in place of the default Catalina preprocessor (cpp) by
using the -C option to specify the appropriate C standard using a 2 digit
numeric value. For example:
catalina -C99 -lci hello_99.c
catalina -C 23 -lci hello_99.c
Not specifying any C standard, or specifying -C89 or -C90 means Catalina
will use cpp as the C preprocessor, whereas specifying -C94, -C95, -C99,
-C11, -C17, -C18 or -C23 will use cake as the C preprocessor instead.
Other than determining whether cpp or cake is used, there is currently no
difference between the various standards except that cake will define the
symbol __STDC_VERSION__ to be 202311L (no matter which C standard is
specified) but this symbol will not be defined when using cpp (a short
example appears at the end of this entry). However,
eventually cake may behave differently according to which standard is
specified.
Note that the use of cake as a preprocessor is currently not compatible
with the Catalina debugger or the Catalina parallelizer. If -g or -Z is
specified along with -CXX (where XX != 89 or 90) then Catalina will issue
a warning and ignore the option. The reason for disallowing this is that
Cake may re-write the C program, which means the line numbers will not
match the original program (which would interfere with the use of the
debugger) and/or it may re-arrange lines in the C source program (which
would interfere with the operation of the parallelizer). However, Cake
can be used stand-alone to first (for example) convert a C99 program to
C89, which can then be edited, modified and compiled with Catalina using
the debug or parallelizer options.
Cake is fully compatible with catapult - simply add the appropriate C
standard (e.g. -C99) to the appropriate catapult pragma (typically, it
would be added to the "common" pragma).
Cake is fully compatible with geany - simply add the appropriate C
standard (e.g. -C99) to the Catalina Options field in the "Project->
Properties" dialog box.
Cake is fully compatible with the optimizer - simply add the appropriate
option (e.g. -O5 or -C OPTIMIZE). However, note that Cake may re-write
the C code and (similar to the optimizer itself) this may sometimes
result in it omitting code that it thinks is not used by the program (e.g.
because it appears only in the string arguments to "PASM" statements, which
are not understood by either Cake or the Optimizer). In such cases, it may
be necessary to add some dummy C code to prevent the code from being
omitted (for an example, see demos\inline_pasm\test_inline_pasm_7.c).
Note that Cake is much more rigorous in parsing C source code than cpp,
and it may issue errors, warnings or information notes in cases where cpp
simply passed the result silently on to the compiler. Also, there may be
identifiers that were valid in a C89 program that are not valid according
to later C standards, so a valid C89 program may not compile if cake is
used. However, the Catalina and Catalyst demo programs** have been updated
to compile correctly using EITHER cpp OR cake, albeit sometimes generating
additional warning messages when cake is used.
** All programs except for the Catalyst jzip and xvi (aka vi) programs.
While these programs COULD be updated to C99, they both use old-style C
function definitions extensively, which were dropped in C23 and which
Cake does not support. These programs remain as C89 only programs and
must still be compiled using -C89, -C90 or with no C standard specified.
The same is currently true of Catalina itself and the Catalina C library,
except that the C library include files have been updated where necessary
so that they can be used with any of the C standards.
A section on Cake has been added to the Catalina Reference Manuals.
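As a small illustration of the __STDC_VERSION__ behaviour described above (an assumed example, not one of the shipped demos):

/*
 * Detect which preprocessor handled this file: cake defines
 * __STDC_VERSION__ (as 202311L), the default cpp leaves it undefined.
 */
#include <stdio.h>

int main(void)
{
#ifdef __STDC_VERSION__
    printf("preprocessed by cake, __STDC_VERSION__ = %ld\n",
           (long)__STDC_VERSION__);
#else
    printf("preprocessed by the default cpp (C89/C90 build)\n");
#endif
    return 0;
}

Building it with 'catalina -C99 -lci' would take the first branch; building with no C standard specified (i.e. using cpp) would take the second.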
Other Changes
-------------
1. The "Catalina Command Line" option on the geany build menu was not working
in Windows 11. It has now been changed to:
cmd.exe /k "call ""%LCCDIR%\\use_catalina"" && cd %p"
Note that this command must be entered with a double backslash (i.e \\)
when manually edited in %LCCDIR%\catalina_geany\data\filedefs\filetypes.c
but as a single backslash (i.e. \) when modified using the geany
"Build->Set Build Commands" menu & dialog box.
First rule of software releases - as soon as you do one, you will find something you missed!
In release 8.8.2 I forgot to add building Cake to the build_binaries script (it was added to the build_all script). You don't normally need to use these scripts if you use the Windows installer, but if you are rebuilding Catalina, or installing it manually on Linux or Pi OS, then if you use build_binaries instead of build_all, then in Catalina's source directory you should also manually do:
@RossH said:
Even with a cache hit rate of 100%, you still have to ultimately read and write hundreds of kilobytes - even megabytes (the core LCC program is nearly 2Mb) - from the SD card to compile even a 5 line program. Doing that from an SD card is going to take more time than doing it from a hard disk.
I suppose there is the question of whether a higher SD throughput could make a big difference. The testing shows, compared with Flexspin's C file handling, there is still a lot of headroom for improving the SD driver's data rate. In raw throughput, reads can be more than 10x faster in 1-bit SPI mode and 100x faster in 4-bit SD mode. And writes can be 100x faster in SPI mode, 1000x faster in SD mode.
Would it make a difference? Yes, of course it would.
Would it make a big difference? Maybe, maybe not - the SD card I/O is only a small part of the whole compilation process.
I'll have a look at it when I get some time, but that won't be soon since I'm busy with other stuff just at the moment. But someone else is welcome to do so, since it involves rewriting a single, entirely self-contained plugin of just a few hundred lines of PASM (see target/p2/cogsd.t - ignore the clock support, which is implemented in the same cog just because there is spare space).
Comments
Excellent results for sure. How much RAM is used by the cache?
For details on why the format program matters, see https://www.sdcard.org/press/thoughtleadership/the-sd-memory-card-formatterhow-this-handy-tool-solves-your-memory-card-formatting-needs/.
Nice! Which also explains why you don't have it always compiled in.
Huh, there is a Linux edition, and it looks to be quite new - https://www.sdcard.org/downloads/sd-memory-card-formatter-for-linux/
Not FAT32-on-SDXC option though, lame.
Oh, yeah, from 64 GB upwards it shoves you with exFAT. That's a fail.
SDXC formatting of "/dev/sdg" was successfully completed.
Volume information:
File system: exFAT
Capacity: 59.5 GiB (63847792640 bytes)
Free space: 59.4 GiB (63830622208 bytes)
Cluster size: 128.0 kiB (131072 bytes)
Volume label: san64
Reformatted to FAT32 with GParted: LBA flag set, no alignment, 16 MB unused at beginning:
Same behaviour with the Samsung 128 GB. SDA formatted:
And redone to FAT32:
Catalina 8.8 has been released on GitHub and SourceForge.
This is a full release. It significantly improves the speed of the P2 self-hosted version of Catalina, and provides better support for the Parallax HyperFlash and HyperRAM add-on board on all P2 platforms. Please note especially the change of the default pins used for this add-on board, and also the VGA and USB adaptor boards. Plus a few other minor bug fixes and improvements.
Here is the relevant extract from the README.TXT:
New Functionality
-----------------

1. On all Propeller 2 boards, the Parallax HyperFlash/HyperRAM add-on board is now configured to use pin 0 by default rather than pin 16. This change was made because pin 0 allows a wider range of clock frequencies to be used on most boards, allowing (for example) the self-hosted version of Catalina itself to be compiled at 300Mhz. To accommodate the change, the VGA base pin has been changed to pin 16 and the USB base pin to pin 24.

2. A new utility to test the delay setting used for the PSRAM and HYPER RAM drivers has been added in the file /demos/p2_ram/delay.c. This utility can be used to verify that the current delay setting (configured in the platform configuration file) works across a selected range of clock frequencies. See the README.TXT file in that folder for more details.

3. Added a -Q option to the catalina command, which can also be specified by defining the Catalina symbol QUICKFORCE (e.g. -C QUICKFORCE), to force a Quick Build. This option is similar to the -q option except that it will re-build the target file even if one already exists. If the target file does not exist, then the -Q option has the same effect as the -q option.

4. For the Propeller 2, Catalyst can now be built for the P2 Edge to use either the PSRAM (if installed - i.e. the P2-EC32MB) or the HyperRAM add-on board (on base pin 0 by default). To specify that Catalyst and associated utilities should use the HyperRAM rather than on-board PSRAM, specify HYPER when building it using the build_all script. You can also specify PSRAM to use the PSRAM on the P2-EC32MB, but this is not necessary since this is the default if HYPER is not specified. If specified, HYPER or PSRAM should be the second parameter. For example:

      cd demos\catalyst
      build_all P2_EDGE SIMPLE VT100 OPTIMIZE MHZ_200
   or
      cd demos\catalyst
      build_all P2_EDGE PSRAM SIMPLE VT100 OPTIMIZE MHZ_200
   or
      cd demos\catalyst
      build_all P2_EDGE HYPER SIMPLE VT100 OPTIMIZE MHZ_200

   The catalyst build scripts have been amended to facilitate building the P2 Edge demos using either HYPER or PSRAM by adding a new script called 'build_p2' which accepts one or two parameters - the platform and (optionally) the type of XMM RAM to use. Do not specify any other options. For example:

      build_p2 P2_EDGE        <-- for P2_EDGE using on-board PSRAM
      build_p2 P2_EDGE PSRAM  <-- ditto
      build_p2 P2_EDGE HYPER  <-- for P2_EDGE using HYPER RAM add-on board
      build_p2 P2_EVAL        <-- for P2_EVAL using HYPER RAM add-on board
      build_p2 P2_EVAL HYPER  <-- for P2_EVAL using HYPER RAM add-on board
      build_p2 P2_EVAL PSRAM  <-- will generate an error (not supported)

   This script will build two ZIP files - one that uses the SIMPLE serial HMI option named for the platform (e.g. P2_EDGE.ZIP) and one that uses the VGA HMI option (e.g. P2_EDGE_VGA.ZIP).

   Note that the 'p2_edge' and 'p2_eval' scripts have now been removed, but the same function can be achieved using the new 'build_p2' script. Also note that the 'build_demos' script still only builds demos for the P2_EDGE using PSRAM and for the P2_EVAL using HYPER RAM - but building for the P2_EDGE using HYPER RAM instead can be done with the following command:

      build_p2 P2_EDGE HYPER

5. Added a new demo (in the folder demos/sd_cache) to provide an example of how to build a local custom version of the Catalina C library to implement some platform-specific functionality, such as (in this case) enabling the PSRAM-based SD card cache. This folder also contains a test program that can be used to demonstrate the SD cache. See the README.TXT file in that folder for more details.

6. Removed the file demos/catalyst/README.SMM_Loader. The information it contained has been added to the Catalyst Reference Manual.

Other Changes
-------------

1. Fixed an issue with Catalina only supporting 200Mhz when the HyperRAM add-on board is used on the P2 Evaluation board. The platform configuration files (target/p2/P2EVAL.inc and also target/p2/P2EDGE.inc) have been updated to disable the use of Fast Reads (HYPER_FASTREAD) and also modify the RAM read delay (HYPER_DELAY_RAM). Note that the default delay (10) works from 150 to 260Mhz, but outside that range it may need to be adjusted. The new utility (demos/p2_ram/delay.c) included in this release can be used to verify the specified delay works across a range of clock frequencies. See also the platform configuration file (e.g. P2EVAL.inc) for more details. Affected the Propeller 2 Evaluation board only.

2. Fixed some non 8.3 filenames in library files:
      getrealrand.c -> getrrand.c
      hub_malloc.c  -> hmalloc.c
      registerx.e   -> register.e
   Affected the Propeller 2 under Catalyst only.

3. Fixed the C include file stdint.h to specifically identify Catalina as one of the platforms with 32 bit pointers. Affected the Propeller 2 under Catalyst only.

4. Fixed an LCC bug in pointer comparisons which led to the compiler emitting erroneous warning messages such as "overflow in converting constant expression from `pointer to char' to `pointer to void'". Affected the Propeller 2 under Catalyst only.

5. Fixed an issue with the COMPACT version of the getcnt() function on the Propeller 2. Affected COMPACT mode programs on the Propeller 2 only.

6. The pre-built Catalyst demo (P2_DEMO.ZIP) was incorrectly claiming it was built for a P2_EVAL board (in the CATALYST.ENV file, as displayed by the 'set' command). It now says it was built for the P2_CUSTOM platform. Affected the Propeller 2 only.

The number 8 is lucky in Chinese, so release 8.8 must be double lucky! I really hope it is, because I will be putting Catalina into "maintenance" mode for the next few months while I attend to some mundane personal stuff. But I will still do bug fixes if anyone finds an issue that really needs it.
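As a quick illustration of the new -Q option (item 3 above) - a minimal sketch only, with the program name and platform used here as placeholders rather than anything from the release notes:

   catalina -p2 -lc hello_world.c -C P2_EDGE -Q

or, equivalently:

   catalina -p2 -lc hello_world.c -C P2_EDGE -C QUICKFORCE

Either form should force the Quick Build to re-build the target file even if one already exists.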
Ross.
Hmmm. Something seems to have changed recently on Windows 10. Catalina's version of telnet no longer works - but everything else seems to be fine. It is most likely due to a recent Windows update, but it may take me a while to pin it down and (if necessary) fix it.
However, everything (including telnet) works fine under Windows 11. Since Windows 10 is due to have support pulled later this year, it may not be worth fixing.
Ross.
Found the problem. It was not Windows 10, it was Oracle VirtualBox. For reasons unknown, it had created and enabled another virtual Ethernet adapter, which interferes with local Windows network connections even if they use WiFi and not Ethernet, and even if VirtualBox is not running. Windows 11 worked because that machine did not have VirtualBox installed. I seem to recall discovering this particular issue once before.
Disabling this virtual network adapter using Windows Device Manager fixes the problem (note - you must disable it, not uninstall it - otherwise VirtualBox just creates another one).
Release 8.8 is all good.
Ross.
I don't much like Windows 11, and I use it as little as possible - but while messing about with it today (to try and find the problem with telnet, which turned out to be a VirtualBox problem anyway, not a Windows problem) I noticed that in Windows 11 the default application used when executing batch files etc. is the Terminal app, not the standard Windows command processor (i.e. cmd.exe).
This is generally ok, but there are a few things (such as the payloadi batch script) that won't work correctly under the Terminal app. To make them work correctly, you can go to "Settings" in the Terminal app, and set the "Default Terminal Application" to Windows Console Host instead of Windows Terminal or Let Windows Decide (but see EDIT 1 and EDIT 2, below!).
Ross.
EDIT 1: I have found a way to do this that does not require modifying the Windows Terminal app. Instead, open the properties of the "Catalina Command Line" desktop shortcut (and/or the Start menu entry) and change the Target from:
"C:\Program Files (x86)\Catalina_8.8\bin\catalina_cmd.bat"to
conhost.exe "C:\Program Files (x86)\Catalina_8.8\bin\catalina_cmd.bat"Works a treat!
EDIT 2: Also, at least one of the Catalina Geany Build commands needs updating. Change the Download and Interact command to:
   conhost payloadi.bat %x "%p\%e" -b%b

Note the inclusion of the ".bat" suffix - this is required.
Just found an obscure bug with Catalina 8.8 ...
When compiling a program for a Propeller 2 to use XMM LARGE mode and a 1K cache in the LUT you get an error message. For example:
   catalina -p2 hello_world.c -lc -C P2_EDGE -C LARGE -C CACHED_1K -C LUT_CACHE

Gives:

   3309: ERROR: Cog address exceeds FIT limit. fit $1ea

There is an easy workaround. Just also define the Catalina symbol EXTERNAL_FLT_CMP - for example:

   catalina -p2 hello_world.c -lc -C P2_EDGE -C LARGE -C CACHED_1K -C LUT_CACHE -C EXTERNAL_FLT_CMP

Having a 1K cache in the LUT is probably not a combination of options used by anyone except me, but I will fix this properly in the next release.
Ross.
Catalina 8.8.1 has been released on GitHub and SourceForge.
This is a full release. It adds support for Cake (version 0.12.05) as a stand-alone preprocessor that can convert C99, C11 & C23 programs to C89, suitable for compilation by Catalina. Cake is supported on Windows, Linux and Catalyst.
Here is the relevant extract from the README.TXT:
New Functionality
-----------------

1. Cake is now included as a component of Catalina. Cake is a preprocessor that can take C99, C11 or C23 source programs and translate them to C89, suitable for compiling with Catalina. Eventually, Cake may be fully integrated into Catalina, but at present it is a separate component that must be invoked manually to convert a C99 (or C11 or C23) program to C89, which can then be compiled with Catalina as normal. A demo C99 program called hello_99.c is provided. To convert this to C89 and compile it with Catalina, use Cake first and then compile the result with Catalina as normal. For example (on the Propeller 2):

      cake hello_99.c -o hello_89.c
      catalina -p2 -lc hello_89.c

   See the file source\cake\README.TXT for a brief introduction to Cake and how to use it with Catalina. See the Cake manual in source\cake\manual.md for full details on Cake.

2. Increased the maximum size of a string cpp can process from 4096 to 8192 bytes. This was required for Cake support.

3. Added the va_copy macro to stdarg.h, and a new program to demonstrate its use (demos/examples/ex_va_copy.c). This was required for Cake support.

4. Added minimums and maximums for long longs and unsigned long longs, and the ability for CPP to parse constants of the form ddddLL or ddddULL. Currently, long longs are the same size as longs and unsigned long longs are the same size as unsigned longs, but some programs (like Cake) require that these are explicitly supported.

5. Added a dummy wchar.h header file, which (at present) defines wchar_t but does nothing else. This was required for Cake support.

6. Added an implementation of asprintf, vasprintf, snprintf and vsnprintf to stdio, and a new program to demonstrate their use (demos/examples/ex_snprintf.c). This was required for Cake support.

7. Added the ability for printf and family to process z or t format specifiers - e.g. %dz, which means to print/scan something of size size_t, or %ut, which means to print/scan something of size ptrdiff_t, and a new program to demonstrate them (demos/examples/ex_printf_tz.c). These format specifiers were introduced in C99, but some C89 programs assume this functionality exists. Note that these work only in the standard C stdio libraries (i.e. libc, libci, libcx, libcix) but not in the tiny I/O library (libtiny). This was required for Cake support.

8. Two new functions for creating directories have been added:

      mkdir(const char *name, mode_t *mode);
      mkdirr(const char *name, mode_t *mode);

   The mode parameter is not currently used, but is included for compatibility with other C compilers. These new functions are defined in sys/stat.h, and the type mode_t is defined in the new include file sys/types.h for the same reason. Equivalent functions _mkdir and _mkdirr are defined in fs.h. This was required for Cake support. A new program to demonstrate the functions is included in demos/file_systems/test_mkdir.c. The mkdir function creates only the leaf directory - e.g. mkdir "a/b/c" will make only directory "c", assuming that "a/b" already exists - whereas mkdirr "a/b/c" will make directory "a" if it does not exist, then "a/b" (ditto), then "a/b/c". If the functions succeed they return zero, and if they fail (e.g. because a directory that was expected to exist does not, or one that was not expected to exist already does) then they return -1.

9. Added definitions of EOVERFLOW and EDEADLK to sys/errno.h. These are not used by Catalina, but are expected to be defined by some C programs. This was required for Cake support.

10. Added definitions of HUGE_VALF and HUGE_VALL. These are not used by Catalina, but are expected to be defined by some C programs. This was required for Cake support.

11. Versions of strdup and strndup have been added to the C libraries. These are defined in the include file string.h as:

      char *strdup(const char *_s);
      char *strndup(const char *_s, size_t _n);

    Some Catalina components and demo programs had their own versions of strdup and/or strndup, which have now been replaced with the library versions. This was required for Cake support.

12. Catalina's version of stdbool.h was defining bool to be unsigned int, but was not also defining _Bool. Now it defines both as unsigned char, which is intended to make Catalina more compatible with C99 onwards for programs compiled using Cake. However, this may break some existing C89 programs which expect type bool to be compatible with bit-fields (i.e. to be of an integer type) - so stdbool.h will revert to the previous definitions of bool and _Bool if the C symbol __INT_BOOL__ is defined (e.g. using -D on the catalina command line). For example:

      catalina my_program.c -lc -D__INT_BOOL__

13. Catalina does not support 64 bit integers, but LCC does accept the declaration of long long and unsigned long long, and so some programs will expect it to support them and therefore also expect the types and values associated with them (such as INT64_MAX and UINT64_MAX) to be defined. The stdint.h include file has been modified to define 64 bit types and values but make them all the same as 32 bit types and values. This makes programs (like Cake) work, but may confuse other programs which assume that if these values are defined then 64 bit integers are supported.

Other Changes
-------------

1. Some of the values of the fprintf and fscanf format macros defined in inttypes.h were incorrect. Affected Windows, Linux and Catalyst.

2. The library functions sprintf and vsprintf were being limited to writing up to 32767 characters in each call. This limit has been increased to INT_MAX (2147483647) characters. Affected Windows, Linux and Catalyst.

3. The Catalyst build_demos and build_p2 scripts were not working under Linux, leading to missing files in the ZIP files they created. Affected Linux only.

NOTE: Cake needs to be configured before it can be used. Otherwise you will get an error message that it cannot find the C include files. For example:
   cd %LCCDIR%\source\cake
   cake hello_99.c -o hello_89.c
   Cake 0.12.05
   %LCCDIR%\source\cake\hello_99.c:9:19: error: file stdio.h not found
    9 |#include <stdio.h>
      |         ~
   1 files in 0.01 seconds
   1 errors 0 warnings 0 notes

To configure Cake, use the -autoconfig option:

   cake -autoconfig
   Cake 0.12.05
   file '%LCCDIR%\bin/cakeconfig.h' successfully generated

Then Cake should work ok:

   cake hello_99.c -o hello_89.c
   Cake 0.12.05
   1 files in 0.01 seconds
   0 errors 0 warnings 0 notes

This information, plus a few other things you need to know to use Cake, is in the file source\cake\README.TXT.
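As an aside, here is a minimal sketch (not from the release notes) of how the mkdir and mkdirr functions described in item 8 above might be called. The directory names are just placeholders, and the sketch assumes the functions return an int (zero on success, -1 on failure) as described - the real demo program is demos/file_systems/test_mkdir.c:

   #include <stdio.h>
   #include <sys/types.h>
   #include <sys/stat.h>

   int main(void) {
      mode_t mode = 0; /* the mode parameter is not currently used */

      /* mkdir creates only the leaf directory, so "a/b" must already exist */
      if (mkdir("a/b/c", &mode) != 0) {
         printf("mkdir failed - does a/b exist?\n");
      }

      /* mkdirr creates each missing directory in turn: "x", then "x/y", then "x/y/z" */
      if (mkdirr("x/y/z", &mode) != 0) {
         printf("mkdirr failed\n");
      }

      return 0;
   }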
Catalina 8.8.2 has been released on GitHub and SourceForge.
This is a full release. It adds integrated support for using Cake (version 0.12.24) as an alternative to Catalina's default C preprocessor (i.e. cpp). This means that Catalina can now compile programs that use C99, C11 & C23 features "out of the box". Cake is supported on Windows, Linux, and also on the self-hosted version of Catalina for the Propeller 2 OS (Catalyst).
I have been working with the Cake developer to test and fix many minor Cake issues, waiting until Cake was ready to be fully integrated into Catalina. The most recent version of Cake correctly compiles all the Catalina demo programs except for those that use C89 features no longer supported in C23 (i.e. jzip and xvi/vi - but such programs can still be compiled with Catalina simply by not using cake). Significantly, Cake now correctly compiles Lua, which is a major milestone given the extensive use Lua makes of the C preprocessor - so this is an appropriate point to do an initial integrated release.
Here is the relevant extract from the README.TXT:
New Functionality
-----------------

1. Cake is now fully integrated into Catalina. While Cake can still be used as a stand-alone preprocessor that can take C99, C11 or C23 source programs and translate them to C89 suitable for compiling with Catalina, it can now also be used in place of the default Catalina preprocessor (cpp) by using the -C option to specify the appropriate C standard using a 2 digit numeric value. For example:

      catalina -C99 -lci hello_99.c
      catalina -C 23 -lci hello_99.c

   Not specifying any C standard, or specifying -C89 or -C90, means Catalina will use cpp as the C preprocessor, whereas specifying -C94, -C95, -C99, -C11, -C17, -C18 or -C23 will use cake as the C preprocessor instead. Other than determining whether cpp or cake is used, there is currently no difference between the various standards except that cake will define the symbol __STDC_VERSION__ to be 202311L (no matter which C standard is specified) but this symbol will not be defined when using cpp. However, eventually cake may behave differently according to which standard is specified.

   Note that the use of cake as a preprocessor is currently not compatible with the Catalina debugger or the Catalina parallelizer. If -g or -Z is specified along with -CXX (where XX != 89 or 90) then Catalina will issue a warning and ignore the option. The reason for disallowing this is that Cake may re-write the C program, which means the line numbers will not match the original program (which would interfere with the use of the debugger) and/or it may re-arrange lines in the C source program (which would interfere with the operation of the parallelizer). However, Cake can be used stand-alone to first (for example) convert a C99 program to C89, which can then be edited, modified and compiled with Catalina using the debug or parallelizer options.

   Cake is fully compatible with catapult - simply add the appropriate C standard (e.g. -C99) to the appropriate catapult pragma (typically, it would be added to the "common" pragma).

   Cake is fully compatible with geany - simply add the appropriate C standard (e.g. -C99) to the Catalina Options field in the "Project->Properties" dialog box.

   Cake is fully compatible with the optimizer - simply add the appropriate option (e.g. -O5 or -C OPTIMIZE). However, note that Cake may re-write the C code and (similar to the optimizer itself) this may sometimes result in it omitting code that it thinks is not used by the program (e.g. because it appears only in the string arguments to "PASM" statements, which are not understood by either Cake or the Optimizer). In such cases, it may be necessary to add some dummy C code to prevent the code from being omitted (for an example, see demos\inline_pasm\test_inline_pasm_7.c).

   Note that Cake is much more rigorous in parsing C source code than cpp, and it may issue errors, warnings or information notes in cases where cpp simply passed the result silently on to the compiler. Also, there may be identifiers that were valid in a C89 program that are not valid according to later C standards, so a valid C89 program may not compile if cake is used. However, the Catalina and Catalyst demo programs** have been updated to compile correctly using EITHER cpp OR cake, albeit sometimes generating additional warning messages when cake is used.

   ** All programs except for the Catalyst jzip and xvi (aka vi) programs. While these programs COULD be updated to C99, they both use old-style C function definitions extensively, which were dropped in C23 and which Cake does not support. These programs remain as C89 only programs and must still be compiled using -C89, -C90 or with no C standard specified. The same is currently true of Catalina itself and the Catalina C library, except that the C library include files have been updated where necessary so that they can be used with any of the C standards.

   A section on Cake has been added to the Catalina Reference Manuals.

Other Changes
-------------

1. The "Catalina Command Line" option on the geany build menu was not working in Windows 11. It has now been changed to:

      cmd.exe /k "call ""%LCCDIR%\\use_catalina"" && cd %p"

   Note that this command must be entered with a double backslash (i.e. \\) when manually edited in %LCCDIR%\catalina_geany\data\filedefs\filetypes.c but as a single backslash (i.e. \) when modified using the geany "Build->Set Build Commands" menu & dialog box.

First rule of software releases - as soon as you do one, you will find something you missed!
In release 8.8.2 I forgot to add building Cake to the build_binaries script (it was added to the build_all script). You don't normally need to use these scripts if you use the Windows installer, but if you are rebuilding Catalina or installing it manually on Linux or Pi OS and you use build_binaries instead of build_all, then in Catalina's source directory you should also manually do:
EDIT: Fixed on GitHub.
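While on the subject of 8.8.2 and Cake: since cake defines __STDC_VERSION__ and cpp does not (as noted in the release notes above), a program can check at compile time which preprocessor was used. This is just a minimal sketch, not something from the release notes:

   #include <stdio.h>

   int main(void) {
   #ifdef __STDC_VERSION__
      /* cake was used as the preprocessor (e.g. -C99, -C11 or -C23 was specified) */
      printf("preprocessed by cake: %ld\n", (long)__STDC_VERSION__);
   #else
      /* cpp was used (no C standard specified, or -C89/-C90) */
      printf("preprocessed by cpp\n");
   #endif
      return 0;
   }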
Ross.
I suppose there is the question of whether higher SD throughput could make a big difference. The testing shows, compared with Flexspin's C file handling, that there is still a lot of headroom for improving the SD driver's data rate. In raw throughput, reads can be more than 10x faster in 1-bit SPI mode and 100x faster in 4-bit SD mode, and writes can be 100x faster in SPI mode and 1000x faster in SD mode.
Would it make a difference? Yes, of course it would.
Would it make a big difference? Maybe, maybe not - the SD card I/O is only a small part of the whole compilation process.
I'll have a look at it when I get some time, but that won't be soon since I'm busy with other stuff just at the moment. But someone else is welcome to do so, since it involves rewriting a single, entirely self-contained plugin of just a few hundred lines of PASM (see target/p2/cogsd.t - ignore the clock support, which is implemented in the same cog just because there is spare space).
Ross.