Catalina - C and Lua for the Propeller 1 & 2

RossH · 2025-04-18 22:53

I've not had time to do much more on Catalina yet (Easter is one of our busy periods!) but I discovered recently that if I do not do regular posts on the page I use for hosting videos, then Patreon suspends the page.

So I have added a short video on the above jQuery demo ...

See https://www.patreon.com/posts/catalina-jquery-126965669

Ross.

dgately · 2025-04-19 15:08

Well worth the visit to the demo... High quality video with an in-depth audio explanation. Great work, Ross!

dgately

RossH · 2025-05-06 06:49

Catalina 8.6 has been released on GitHub and SourceForge.

This is a full release. It adds some minor eLua enhancements and bug fixes. The main reason for the release is to add a new version of Catalina's Windows telnet client, which has enhanced support for the Parallax WiFi module's Serial to TCP Bridge (under Linux you can use the standard telnet client, but the Microsoft Windows telnet client does not work well with the Parallax WiFi module).

This allows an internet interface to be used in place of the serial interface for Catalyst, as well as any other program that uses one of the serial HMI options.

There is a new video that demonstrates the new telnet support here.

Here is the relevant extract from the README.TXT:

RELEASE 8.6

New Functionality
-----------------

1. A new eLua variant (sluarfx.c) has been added which can be used when
   a slave WiFi RPC propeller needs only a server and not a client. This 
   variant executes the server in Hub RAM, making it significantly faster 
   for Lua servers small enough to fit.

2. A new eLua variant (rlua2.c) has been added which includes only WiFi RPC 
   support and omits Serial ALOHA support. This allows it to omit the aloha 
   protocol code, and also use the 2 port serial plugin in place of the 8 
   port serial plugin, which saves Hub RAM. This variant includes the Lua 
   parser. The corresponding variant that omits the Lua parser has been 
   renamed as rlua2x.c for consistency (it was previously named rluax2.c).

3. The eLua/ALOHA WiFi RPC HTTP demo (demos/eLua/http) now includes a complete
   set of definitions for all the Lua WiFi functions. As demonstrated by the 
   demo, these allow a client Lua program to use the WiFi functions even 
   though those functions are implemented in the server, not in the client.

4. The MAX_SERVICES constant in all the eLua/ALOHA variants has been increased
   from 20 to 50. This number is arbitrary, but it has been expanded to 50 to
   accommodate the 20 WiFi services in the updated HTTP demo (demos/elua/http).
   It can be increased further if required.

5. The eLua/ALOHA WiFi RPC variant rlua2x.c is now so similar to the rluasx.c
   variant that there is no need to have both, so rluasx.c has been removed. 
   The build scripts and Aloha documentation have been updated accordingly.

6. A demo of using jQuery with Catalina has been added. The demos/wifi/gauge 
   demo uses jQuery and a custom jQuery widget to demonstrate exchanging data
   between a Propeller and a browser-based user interface. Both a Lua and a C 
   version of the demo are provided. An eLua/ALOHA WiFi version is also
   provided (in demos/eLua/gauge).

7. The Catalina telnet binary has been copied to the bin directory (it was
   previously only in the source/comms/bin directory). This is because the 
   telnet client was not previously much use, but it has now been enhanced 
   to work with the Parallax WiFi module's Serial Bridge TCP Server. This 
   enables either Catalyst, or a downloaded program, to be used via the 
   WiFi module using the telnet client. This is currently supported on a 
   Propeller 2 only. 

   Catalina's telnet client has been enhanced to work with the Parallax WiFi 
   module's Serial Bridge TCP Server by:

   - Increasing the size and number of comms buffers, to account for the 
     fact that the Serial Bridge TCP Server may send larger packets than 
     a telnet server, and may also send more small packets than a telnet 
     server.

   - adding a new mode selection telnet command ("mode char" or "mode line"), 
     which can also be specified on the command line (as "/mode=char" or
     "/mode=line"). When specified on the command line, the "/mode=char" 
     option forces the telnet client into character mode without using 
     telnet option negotiation, which is not supported by the WiFi module's 
     Serial Bridge TCP Server (which is not actually a telnet server, it
     simply echoes any characters sent, including any telnet commands).

     When using another telnet client, specifying character mode will try
     to do the same using telnet option negotiation, which may or may not 
     work, depending on how the telnet client responds to having all the
     option negotiations simply echoed by the Serial Bridge TCP Server. It
     SHOULD work, but it may not - e.g. it does not work with the Microsoft
     telnet client, which does not properly support either character mode 
     or disabling local echo - so all characters entered end up being echoed 
     twice.

   To use the telnet client via WiFi instead of a hard-wired serial port, the 
   Parallax WiFi module must be installed on pin group 56..63. However, note
   that using this pin group on a P2 EDGE with PSRAM (i.e. P2-EC32MB) means 
   the WiFi module's RESET and PGM pins are not connected, so the WiFi module
   cannot be reset programmatically and may have to be power-cycled manually 
   whenever it needs to be reset. But it does allow a telnet client program 
   to be used as Catalyst's user interface.

   Also note that when the WiFi module is installed on pin group 56 .. 63, 
   it is possible to download software to the Propeller 2 by opening a 
   browser to the page http://xxx.xxx.xxx.xxx//p2-ddloader.html (where
   xxx.xxx.xxx.xxx is the WiFi module's IP address) - and then use the
   telnet client to interact with the downloaded software. But to download 
   software the P2 module's switch 3 must be up and switch 4 must be down, 
   whereas to run software from the SD card using Catalyst, switch 3 must 
   be down and switch 4 must be up. So you cannot do both without changing 
   the switch settings.

   Under Windows, you can build Catalyst to use one of the serial HMI options
   and a VT100 as usual, such as:

      build_all P2_EDGE SIMPLE VT100 USE_COLOR

   Then, to use the Catalina telnet client, specify the IP address of the
   WiFi module, and the following options (where xxx.xxx.xxx.xxx is the IP 
   address of the WiFi module):

      telnet /mode=char /uselfaseol /autocronlf /host=xxx.xxx.xxx.xxx

   The Linux telnet client does not have comparable options, so instead you 
   must build Catalyst to use one of the serial HMI options but also add the 
   CR_ON_LF option:

      build_all P2_EDGE SIMPLE VT100 USE_COLOR CR_ON_LF

   Then, use the Linux telnet client specifying the IP address of the WiFi 
   module (e.g. where xxx.xxx.xxx.xxx is the IP address of the WiFi module):

      telnet xxx.xxx.xxx.xxx

   Once in telnet (Linux only), enter the telnet escape character ("Ctrl+]") 
   and then enter the telnet command "mode char". This will use telnet 
   negotiations to enter character mode, which may result in the program 
   receiving a few spurious characters - but it should work ok after that.

   Note that the WiFi module's Serial Bridge TCP Server times out after 5 
   minutes of inactivity. The connection can be re-opened again using the 
   telnet "open" command (i.e. enter telnet command mode by pressing "Ctrl+]" 
   and enter the command "open xxx.xxx.xxx.xxx" at the "Telnet>" prompt).
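The "spurious characters" mentioned above are the telnet option negotiations being echoed back by the Serial Bridge TCP Server. As a minimal sketch only (this is not something Catalina or Catalyst is known to do, and the function name is hypothetical), a program could recognise and discard such sequences using the command codes defined in RFC 854:

```c
#include <stddef.h>

/* Telnet command bytes, as defined in RFC 854. */
enum {
    TN_SE = 240, TN_SB = 250,
    TN_WILL = 251, TN_WONT = 252, TN_DO = 253, TN_DONT = 254,
    TN_IAC = 255
};

/* Hypothetical helper (not part of Catalina): copy in[0..n) to out,
 * dropping telnet command sequences. out must have room for n bytes.
 * Returns the number of data bytes kept. */
size_t strip_telnet_commands(const unsigned char *in, size_t n,
                             unsigned char *out)
{
    size_t i = 0, kept = 0;
    unsigned char cmd;

    while (i < n) {
        if (in[i] != TN_IAC) {          /* ordinary data byte */
            out[kept++] = in[i++];
            continue;
        }
        if (i + 1 >= n) break;          /* truncated command: drop it */
        cmd = in[i + 1];
        if (cmd == TN_IAC) {            /* IAC IAC is an escaped 0xFF data byte */
            out[kept++] = TN_IAC;
            i += 2;
        } else if (cmd >= TN_WILL && cmd <= TN_DONT) {
            i += 3;                     /* IAC WILL/WONT/DO/DONT <option> */
        } else if (cmd == TN_SB) {      /* subnegotiation: skip up to IAC SE */
            i += 2;
            while (i + 1 < n) {
                if (in[i] == TN_IAC) {
                    if (in[i + 1] == TN_SE) break;
                    i += 2;             /* IAC IAC (or other pair) inside SB */
                } else {
                    i++;
                }
            }
            i += 2;                     /* step over IAC SE */
        } else {
            i += 2;                     /* any other two-byte command */
        }
    }
    return kept;
}
```

The filter is stateless, so a command split across two reads would not be recognised. A real program would need to carry state between calls.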

Other Changes
-------------

1. The eLua/ALOHA custom dispatcher was not using the correct length of the
   RPC data it pushed onto the Lua stack when processing an RPC call. This may
   have led to RPC calls failing or Lua running out of memory when processing
   the RPC call. Affected the Propeller 2 only on Windows and Linux.

2. The definition of the WiFi RECV function in the eLua http demo was wrong.
   Affected the Propeller 1 and Propeller 2 on Windows and Linux.

3. The serial.h include file did not allow for the use of the Propeller 1 
   serial4x library. Affected the Propeller 1 only on Windows and Linux.

4. The P2_EVAL platform definition file (P2EVAL.inc) was missing the WiFi and 
   2 Port Serial definitions. Affected the Propeller 2 only on Windows and 
   Linux.

5. A small delay was added to the eLua/ALOHA custom dispatcher, to prevent the 
   program monopolizing the WiFi module by continuously polling it for new
   events. Affected the Propeller 2 only on Windows and Linux.

6. Some Catalina makefiles were not correctly detecting whether or not to use
   DOS or Linux commands (e.g. "del" vs "rm"). Most affected was the 
   "clean_all" command, which would never complete when used in some demos 
   directories. Affected Windows only.
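As an illustration of change 5 above (a delay between polls so the dispatcher does not monopolize the WiFi module), here is a minimal sketch of a polling loop that waits between attempts. All names are hypothetical and this is not the actual dispatcher code. The poll and delay functions are passed in, and the stand-ins at the bottom only exist so the sketch can be tried on a PC:

```c
#include <stddef.h>

typedef int  (*poll_fn)(void *ctx);      /* non-zero when an event is pending */
typedef void (*delay_fn)(unsigned ms);   /* blocks for roughly ms milliseconds */

/* Poll until an event arrives or max_polls attempts have been made,
 * waiting delay_ms after each unsuccessful poll.
 * Returns the number of polls it took, or 0 if no event arrived. */
unsigned poll_with_delay(poll_fn poll, void *ctx,
                         delay_fn delay, unsigned delay_ms,
                         unsigned max_polls)
{
    unsigned n;

    for (n = 1; n <= max_polls; n++) {
        if (poll(ctx)) return n;
        delay(delay_ms);                 /* give the module time for other work */
    }
    return 0;
}

/* Host-side stand-ins (assumptions, not real hardware access). */
static unsigned total_delay_ms = 0;

static int fake_poll(void *ctx)
{
    int *remaining = (int *)ctx;         /* polls left before an event appears */
    if (*remaining == 0) return 1;
    (*remaining)--;
    return 0;
}

static void fake_delay(unsigned ms)
{
    total_delay_ms += ms;                /* just record the time we would wait */
}
```

On a Propeller the delay function would wrap whatever millisecond wait the platform provides. How long to wait is a trade-off between event latency and how much of the module's time the polling consumes.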

RossH · 2025-05-10 08:16

I've just updated the Catalina Reference Manuals for the Propeller 1 and the Propeller 2. The main change is to add in to the reference manuals a lot of stuff that only existed in the Release Notes or in various README.TXT files.

The updated documents are available on GitHub here, or on Sourceforge here.

Ross.

RossH · 2025-05-20 12:11

Just an oddity worth being aware of if you intend to use telnet as your means of interacting with the P2.

Doing so requires you to have a Parallax WiFi adapter on pin group 56 .. 63, and some odd things happen if you also have a Prop Plug plugged in (to pins 62 & 63). If the Prop Plug is NOT also connected to a USB port, then the telnet connection may not work correctly. I have tried many different combinations of WiFi adapter, P2 Edge board and Prop Plug, but I am yet to figure out what is going on. Some combinations always work, some combinations work sometimes, and some combinations never do.

However, it can always be made to work simply by either removing the Prop Plug, or else connecting the Prop Plug to a USB port.

Ross.

RossH · 2025-05-25 07:38

I just noticed that the versions of the Propeller Reference Manual (both for the Propeller 1 and the Propeller 2) in the SourceForge distributions of Catalina release 8.6 were not the latest ones. I have updated both the Linux and Windows distributions on SourceForge. The GitHub versions were correct.

The versions in the 'documentation' section of SourceForge were correct, so you can also just overwrite the versions in your installed documents folder with the ones from here. You only need the Reference Manuals.

Ross.

RossH · 2025-05-26 06:35

I kept losing track of all the Lua variants Catalina now includes, so I started using a simple cheat sheet. It proved to be so useful that I turned it into a document called Who is who in the Lua Zoo. I have uploaded it to GitHub and also to Sourceforge here.

It is also attached to this post.

Ross.

RossH · 2025-06-16 12:20

Just noticed an oddity with Catalina - XMM programs (SMALL or LARGE) compiled for a P2 Evaluation board equipped with the P2 HyperRAM add-on board and using a 32Kb or 64Kb cache may not run when the default clock speed (180Mhz) is used. But they run fine if 200Mhz (or higher) clock speed is used.

So, the following do not work for me ...

catalina -p2 hello_world.c -lci -C P2_EVAL -C LARGE -C CACHED_64K
catalina -p2 hello_world.c -lci -C P2_EVAL -C LARGE -C CACHED_32K

but all the following do work ...

catalina -p2 hello_world.c -lci -C P2_EVAL -C LARGE -C CACHED_64K -C MHZ_200
catalina -p2 hello_world.c -lci -C P2_EVAL -C LARGE -C CACHED_32K -C MHZ_200
catalina -p2 hello_world.c -lci -C P2_EVAL -C LARGE -C CACHED_16K
catalina -p2 hello_world.c -lci -C P2_EVAL -C LARGE -C CACHED_8K

This doesn't happen on my P2 Edge boards (which have on-board PSRAM).

I can't think of any reason why simply increasing the cache size would cause this, or why increasing the clock speed would fix it - if anything, I would expect a larger cache size should reduce the load on the PSRAM, and the increased clock speed should increase it. It may only be my particular P2 Eval board or P2 HyperRAM board that have the issue, but if you experience a similar problem, try adding -C MHZ_200 to your catalina command.

Ross.

RossH · 2025-06-16 23:55

Just a bit more information on the issue I reported in the previous post - it appears to be unique to the P2 Evaluation board (and may be unique to my board, which is an older RevB board). When I move the P2 HyperRAM add-on board to my P2 Edge and use it in place of the onboard PSRAM everything works ok.

It's a Mysteron.

RossH · 2025-07-01 04:43

Just had an issue reported on the VGA version of Catalyst for the Propeller 2 - it no workee!

The problem is with the linenoise library, which adds command line editing and command history to Catalyst (Propeller 2 version only). I enable this by default, but it relies on using a VT100-compatible serial terminal (or telnet client) as the user interface, and must be disabled when using a local VGA display and USB keyboard.

This issue affects two of the pre-built Catalyst demos (i.e. P2_EVAL_VGA.ZIP and P2_EDGE_VGA.ZIP).

The workaround is to re-build Catalyst (this only needs to be done for Catalyst itself, not all the Catalyst applications) adding the NO_LINENOISE option. For example, for the P2_EDGE, you might open a Catalina Command Line window and use commands like:

Windows:

cd "%LCCDIR%\demos\catalyst"
build_all P2_EDGE VGA COLOR_4 MHZ_200 NO_LINENOISE

Linux:

cd "$LCCDIR/demos/catalyst"
build_all P2_EDGE VGA COLOR_4 MHZ_200 NO_LINENOISE

Then replace the relevant binaries (e.g. in the /bin folder on your SD card) with the new ones that will be built in demos/catalyst/image

I will fix this in the next release.

Ross.

evanh · 2025-07-01 06:49

@RossH said:
I can't think of any reason why simply increasing the cache size would cause this, or why increasing the clock speed would fix it - if anything, I would expect a larger cache size should reduce the load on the PSRAM, and the increased clock speed should increase it. It may only be my particular P2 Eval board or P2 HyperRAM board that have the issue, but if you experience a similar problem, try adding -C MHZ_200 to your catalina command.

Sounds suspiciously like a timing mismatch resulting from a poorly written clock gen solution. Each edit of the source code and recompile is a roll of the dice. It's something I've had to work quite hard at in my generic solutions when providing a runtime settable clock divider.

Many of the existing PSRAM drivers rely on using smartpin TRANSITION mode and a fixed clock divider of 2 to dodge this issue. But HyperRAM's DDR makes a divider of 2 into a transfer rate of sysclock/1, which is just too fast to be reliable. So a clock divider of 4 is required ... and this is prone to the misaligned clock gen timing problem, and therefore needs to be specifically addressed to achieve reliable operation.

evanh · 2025-07-01 07:27

The problem stems from the way the smartpin cycles internally. It starts cycling when DIR goes high and internally continues, whether pulses are produced or not, until DIR is lowered again. When a WYPIN is issued to the smartpin, the new pulse, or pulses, is not sent to the pin immediately but rather on the smartpin's next internal cycle. The pulse output is not timing aligned by the WYPIN instruction.

I've worked out that the earliest pulse that can be generated by these two smartpin modes (TRANSITION and PULSE) is on the second cycle, after a DIRH ... and that the timing of this second cycle can be predicted.

Here's the generic code I use, as taken from my recent SD card driver. This one is writing a data block to an SD card. The main parameter to be computed is m_align.

        xinit   m_align, #0         // lead-in delay from here at sysclock/1
        setq    v_nco               // streamer transfer rate (takes effect with buffered command below)
        xzero   m_dat, #0           // buffered-op, aligned to clock via lead-in
        dirh    p_clk               // clock timing starts here
        wypin   clocks, p_clk       // first pulse outputs during second clock period

Here's the calculation line for m_align. This is computed only when adapting to an environment change like divider or clock polarity change.

    uint32_t  poldiv = (CLK_POL ? clkdiv : clkdiv/2);

    txblkset.m_align = X_IMM_32X1_1DAC1 | 5 + CLK_REG - CMDAT_REG + clkdiv + poldiv;  // start(S)-bit is ahead of first streamer bit

The components of the calculation vary depending on which data bit is to be clocked in or out first. Eg: In the same written data block the CRC gets immediately tacked on the end of the block without any extra framing bits. A pause in streamer activity is required in case CRC is not computed in time ... and therefore a restart but with different timing:

    txblkset.m_align2 = X_IMM_32X1_1DAC1 | 5 + CLK_REG - CMDAT_REG + poldiv;    // no S-bit, CRC abuts end of data block
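To make the arithmetic in the two m_align lines easier to follow, here is a stand-alone restatement as a sketch. It assumes CLK_REG and CMDAT_REG are 0/1 flags (registered pin or not), which the post does not actually state. The function names and the mode value used in the usage note are placeholders, not taken from the driver:

```c
#include <stdint.h>

/* Lead-in delay in sysclock ticks, following the formula in the post:
 *   5 + CLK_REG - CMDAT_REG + poldiv  (+ clkdiv when a start bit leads the data)
 * clk_reg and cmdat_reg are ASSUMED to be 0 or 1. */
uint32_t lead_in_ticks(uint32_t clkdiv, int clk_pol,
                       int clk_reg, int cmdat_reg, int start_bit)
{
    uint32_t poldiv = clk_pol ? clkdiv : clkdiv / 2;
    uint32_t ticks  = 5u + (uint32_t)clk_reg - (uint32_t)cmdat_reg + poldiv;

    if (start_bit)
        ticks += clkdiv;    /* start(S)-bit is ahead of the first streamer bit */
    return ticks;
}

/* In C, '+' and '-' bind tighter than '|', so the original expression
 * already parses as mode | (5 + ...). The parentheses here only make
 * that explicit. */
uint32_t make_align(uint32_t mode, uint32_t ticks)
{
    return mode | (ticks);
}
```

For example, with clkdiv = 4, normal clock polarity and both flags 0, the first form gives 5 + 4 + 2 = 11 ticks and the second (no S-bit) gives 5 + 2 = 7 ticks.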

RossH · 2025-07-01 10:54

@evanh, @rogloh

Thanks, I'll investigate as far as I am able. But an issue like this would probably be deep in rogloh's HyperRAM driver, which I just use without understanding all the internals. However, you have just reminded me to check that driver and I've just noticed I am still using the 0.9b BETA version. I'll see if there is a later version that addresses any timing issues. The only configuration differences I am aware of between the P2 Edge and P2 Evaluation boards are the pin numbers used. I will try changing them to use the same pins on both boards and see what that does.

Ross.

evanh · 2025-07-01 13:50

@RossH said:
... between the P2 Edge and P2 Evaluation boards are the pin numbers used. I will try changing them to use the same pins on both boards and see what that does.

If moving pin group works you're likely then dealing with the same precarious balance that sysclock/1 transfers are up against. Which would imply an off-by-one timing bug for sysclock/2 transfers.

RossH · 2025-07-03 00:01

@evanh said:

@RossH said:
... between the P2 Edge and P2 Evaluation boards are the pin numbers used. I will try changing them to use the same pins on both boards and see what that does.

If moving pin group works you're likely then dealing with the same precarious balance that sysclock/1 transfers are up against. Which would imply an off-by-one timing bug for sysclock/2 transfers.

I am beginning to think there are problems with my P2 Eval board - or maybe it's a problem with the early versions of the board (mine is a Rev B). I tried putting the HyperRAM board on the same pins I use on the P2 Edge and it does not work at all. Nor does it work on my Rev A board.

Back on the original pins on the P2 Eval board, some clock speeds work (e.g. 200Mhz, 240Mhz) but many do not (e.g. anything below 200Mhz or above 240Mhz), whereas any clock speed from 100Mhz to 260Mhz works on the P2 Edge. It appears I may have just got lucky with the pins and clock speed I chose to use!

I will have to put this aside for now - I want to get the new Catalina release out. But if anyone has a Rev C P2 Eval board and a HyperFlash/HyperRAM add-on board, let me know and I will post some software to try.

Ross.

evanh · 2025-07-03 01:41

I have two Eval revB (one known damaged) and two HyperRAM add-ons that I can easily try combinations with. I haven't actually used Roger's driver - maybe I should put the effort in starting with that alone ...

rogloh · 2025-07-03 03:15

@RossH said:
@evanh, @rogloh

Thanks, I'll investigate as far as I am able. But an issue like this would probably be deep in rogloh's HyperRAM driver, which I just use without understanding all the internals. However, you have just reminded me to check that driver and I've just noticed I am still using the 0.9b BETA version. I'll see if there is a later version that addresses any timing issues. The only configuration differences I am aware of between the P2 Edge and P2 Evaluation boards are the pin numbers used. I will try changing them to use the same pins on both boards and see what that does.

Ross.

The P2 Edge and P2 Eval boards will have slightly different round trip IO timing to the HyperRAM device and this could account for your observations. Due to the limits of the P2's fixed sampling intervals for the memory data read at the pins this means their "most reliable" frequency band regions will be different for a given sample delay value. You could try to run my delaytest.spin utility to see where any sweet/bad spots are in the output over the operating frequencies and see how it differs for your pins and boards. There are a couple of settings in the delaytest program you can try out, namely Sysclk/1 and Sysclk/2 as well as registered/unregistered clocks that could tweak things slightly if you happen to need to operate in the middle of a "bad" range for the delay value chosen. Ideally it determines a delay value that fully works both above and below the frequency, but a bad gap will be widened if there is a lot of IO skew and this may make more frequencies unreliable.

Note that Sysclk/1 operation with HyperRAM is not recommended if you need to cover the full frequency range as it doesn't provide enough timing options to be 100% reliable. It is just an extreme performance / experimental setting. Also there is going to be some temperature variation effects too and the driver delay timing is set once at startup and not currently adaptive - that would need external management to continuously sample memory with different timing values and adjust timing as the temperature varies for the best output. This effect is going to be more noticeable if you are already operating right near the edge of a working frequency band for the configured delay. The driver does actually allow the delay to be changed on the fly, but there is nothing controlling that API right now unless the user's application does it.

RossH · 2025-07-03 03:44

@rogloh said:
You could try to run my delaytest.spin utility to see where any sweet/bad spots are in the output over the operating frequencies and see how it differs for your pins and boards. There are a couple of settings in the delaytest program you can try out, namely Sysclk/1 and Sysclk/2 as well as registered/unregistered clocks that could tweak things slightly if you happen to need to operate in the middle of a "bad" range for the delay value chosen. Ideally it determines a delay value that fully works both above and below the frequency, but a bad gap will be widened if there is a lot of IO skew and this may make more frequencies unreliable.

Thanks, @rogloh - will try this once I get the current release out (probably later today). Is there a later version of your driver than 0.9b BETA?

Ross.

RossH · 2025-07-03 03:53

@evanh said:
I have two Eval revB (one known damaged) and two HyperRAM add-ons that I can easily try combinations with. I haven't actually used Roger's driver - maybe I should put the effort in starting with that alone ...

Thanks, @evanh - I will post a simple test program later today.

Ross.

evanh · 2025-07-03 04:17

I've coded the SD driver to runtime auto-calibrate itself both at init time and if a CRC read error occurs. A memory expansion driver could at least perform an auto calibrate upon init. That way it will adapt to differing boards at least.

rogloh · 2025-07-03 06:04

@RossH said:
Thanks, @rogloh - will try this once I get the current release out (probably later today). Is there a later version of your driver than 0.9b BETA?

Probably some WIP somewhere on my PC for a couple of PSRAM driver changes but nothing new has been released. I always keep the latest released version on the first page of my memory driver thread which still shows 0.9b.

rogloh · 2025-07-03 06:05

@evanh said:
I've coded the SD driver to runtime auto-calibrate itself both at init time and if a CRC read error occurs. A memory expansion driver could at least perform an auto calibrate upon init. That way it will adapt to differing boards at least.

Yeah a useful thing to add sometime. It would be good if this calibration could be run at startup and then exposed as an API that the app may wish to call periodically, although it may require some particular reserved memory space for its writes. I know I coded it to allow additional banks to have their memory timing changed without affecting other active banks until they need to be changed (ie. timing is bank specific). This allows tests to try out delay values on either side of the current delay to determine if they are "better" than the current (which is not always that easy to determine IMO when there are only two working timing values; 3 is easier just pick the middle one).
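The "pick the middle one" idea can be sketched as follows. This is only an illustration with hypothetical names, not code from the memory driver. It assumes a test pass has already recorded, for each candidate delay value, whether the memory test succeeded:

```c
#include <stddef.h>

/* ok[i] is non-zero if delay value i passed the memory test.
 * Returns the middle index of the longest run of passing values
 * (the lower of the two middle indices when the run length is even),
 * or -1 if no delay value passed. */
int pick_delay(const int *ok, size_t n)
{
    size_t i;
    size_t run_start = 0, run_len = 0;
    size_t best_start = 0, best_len = 0;

    for (i = 0; i < n; i++) {
        if (ok[i]) {
            if (run_len == 0) run_start = i;   /* a new passing run begins */
            run_len++;
            if (run_len > best_len) {
                best_len = run_len;
                best_start = run_start;
            }
        } else {
            run_len = 0;                       /* run broken by a failure */
        }
    }
    if (best_len == 0) return -1;
    return (int)(best_start + (best_len - 1) / 2);
}
```

With three passing values in a row this returns the middle one. With only two it arbitrarily returns the lower, which is exactly the hard case described above, and a single short test pass cannot tell which of the two is stable over time or temperature.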

RossH · 2025-07-03 07:22

@evanh

Well, this is annoying. My test program now works at both 180Mhz and 200Mhz, whereas it used to only work at 200Mhz! But it still doesn't work at 260Mhz on the P2 Eval board (whereas it does on the P2 Edge board).

Anyway, if you want to try it, the test program is included with Catalina. It is a C replica of one of @rogloh's own programs. To use it, just open a Catalina Command Line window, and then ...

... on Windows:

cd %LCCDIR%\demos\p2_ram
build_all P2_EVAL HYPER SIMPLE
payloadi test

... on Linux:

cd $LCCDIR/demos/p2_ram
build_all P2_EVAL HYPER SIMPLE
payloadi test

The test program defaults to 180Mhz. You can specify 200Mhz by adding MHZ_200 to the build command, or 260Mhz by adding MHZ_260. For example:

build_all P2_EVAL HYPER SIMPLE MHZ_200
or
build_all P2_EVAL HYPER SIMPLE MHZ_260

Also, note the use of payloadi rather than payload. This is because some test commands (like S for settings) show too much information to fit on the default terminal size, and payloadi allows you to set it via the -g option. You may also need to reduce your font size (on Debian Linux, use the "Zoom Out" feature of the terminal window). For example ...

payloadi test -g120,60

Here is what I see under Windows when I display the settings:

The default target configuration file (/target/p2/P2EVAL.inc) expects the P2 HyperFlash/HyperRAM board to be on pins 32 .. 48. You can change this by editing this file (search for HYPER) and then re-compiling.

Ross.

Edit: Add the SIMPLE option to the commands.

rogloh · 2025-07-03 10:04

@RossH That output above looks like the test harness program I used to test individual driver functions. The other delaytest.spin2 file is the one that is best for memory testing over frequency.

RossH · 2025-07-03 11:41

@rogloh said:
@RossH That output above looks like the test harness program I used to test individual driver functions. The other delaytest.spin2 file is the one that is best for memory testing over frequency.

Yes, I unashamedly stole it and converted it to C

Will check out your delaytest.spin2 program, and probably do the same!

Ross.

Wuerfel_21 · 2025-07-03 14:04

I think I've said it before, but I'm not very confident in PSRAM auto-calibration being very reliable. With the SDSD driver it can work because fails are caught by the CRC check.
In particular:

- async/unregistered timings will slightly differ based on cogid, so the cog doing calibration needs to be the same one that ends up actually doing reads.
- overall timing changes based on active cogs / power draw (?) - this one nearly drove me mad when I started adding sound (at that point without ADPCM streaming or any other PSRAM interaction) to the NeoGeo emulator and suddenly it started to crash at random due to PSRAM transfer corruption. (The fix was to enable sync io, which wasn't an option previously). I think this has something to do with temperature or the core power rail. In my RAM tester program I deliberately filled up all otherwise unused cogs with some terrible idle loop, because otherwise it would report OKs for timings that fail when put into practice. After that, certain 3rd party boards (such as earlier Rayslogic boards) could no longer start the test program at all (insufficient voltage regulation?).
- there are cases when two timing settings appear valid at first, but one is actually stable and the other fails after a few minutes of RAM test.

It'd be very good to have some sort of way to auto-config this stuff though, even if that's in the form of a separate profiling tool.

RossH · 2025-07-04 02:54

Catalina 8.7 has been released on GitHub and SourceForge.

This is a full release. It significantly improves the speed of COMPACT programs (by up to 7%) and XMM SMALL and LARGE programs (by up to 30%). It also contains some minor bug fixes and tidies up the documentation, including removing many old README files (where still relevant, their content has been moved into the appropriate Catalina documentation).

Here is the relevant extract from the README.TXT:

New Functionality
-----------------

1. The Propeller 2 COMPACT kernels have been updated to enable using the FIFO
   by default. Enabling the use of the FIFO means the kernel will use RDFAST 
   and RFLONG instructions in place of RDLONG to read instructions. This can 
   result in a typical speed improvement between 4% and 7%, but it means the 
   FIFO cannot be used for other purposes by the C program. To disable the 
   use of the FIFO (essentially reverting to pre-8.7 behaviour) define the
   Catalina symbol NO_FIFO (e.g. -C NO_FIFO). For example:

      cd demos\benchmarks
      catalina -p2 -lci -C CLOCK -p2 fibo.c -C COMPACT -C NO_FIFO

   Note that COMPACT programs that use interrupts MUST specify NO_FIFO, since
   interrupts disrupt the FIFO. 

2. The LUT_PAGE option introduced in Catalina 7.4 improved performance for
   XMM SMALL programs, but for XMM LARGE programs it sometimes reduced
   performance, so at the time it was not made the default. In this release
   it has been updated so that the performance of LARGE programs is also 
   improved by up to 25% for cache sizes above 2K, so it is now enabled by 
   default for all XMM programs. This means it is no longer necessary to 
   define the LUT_PAGE Catalina symbol, and doing so has no effect. 
   However, it can now be disabled if required (e.g. in programs that want 
   to use the LUT themselves, or have only 2K available for the cache) by
   defining the new Catalina symbol NO_LUT (e.g. -C NO_LUT). For example:

      cd demos\benchmarks
      catalina -p2 -lci -C CLOCK -p2 fibo.c -C LARGE -C NO_LUT

   Note that for programs that have only a 1K cache, the CACHE_LUT option
   is recommended, and if this is specified the LUT is used to hold the
   entire cache instead of just a 1K page. Also note that the other options 
   introduced in Catalina 7.4 - CACHE_PINS and FLOAT_PINS - are still 
   available and can be used with or without using the LUT. They are not 
   enabled by default since they require suitable pins to be available, but
   can offer additional speed improvements of around 10%. See the platform
   configuration files (e.g. P2EVAL.inc, P2EDGE.inc or P2CUSTOM.inc) for
   more details on allocating suitable pins.

   Using the LUT, CACHE_PINS and FLOAT_PINS options can improve the 
   performance of benchmark programs (e.g. Whetstone) by up to 30%, and 
   real-world programs (such as the self-hosted version of Catalina itself) 
   by about 25%.

3. The COMPACT Catalina code generator has been updated to emit different
   alignl options in different circumstances:

      alignl       : align a long as required for both the P1 and P2
      alignl_p1    : align a long as required for the P1 only
      alignl_debug : align a statement for debugging (i.e. when -g is used)
      alignl_label : align a label

   On the Propeller 1, these all do the same thing, which is to align the 
   next code or data element on the next long (4 byte) boundary. On the 
   P2 they also currently all do the same thing, but this may be modified 
   in a future release since the Propeller 2 does not always require code 
   or data to be aligned on long (4 byte) boundaries.
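For readers unfamiliar with long alignment, the rounding that "align on the next long (4 byte) boundary" implies can be sketched in C as below. This only illustrates the arithmetic. It is not the code generator's implementation, and the function name is made up for the example:

```c
#include <stdint.h>

/* Round addr up to the next multiple of 4 (a long boundary).
 * An address that is already aligned is returned unchanged. */
uint32_t align_long(uint32_t addr)
{
    return (addr + 3u) & ~3u;
}
```

So an element that would otherwise start at address 5, 6 or 7 is moved to 8, while one at address 4 stays where it is.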

Other Changes
-------------

1. A known bug occurs when an XMM program (LARGE or SMALL) is compiled for 
   the P2 Evaluation board (i.e. P2_EVAL) when the program uses the default 
   clock frequency (180Mhz). In this case the program may not work correctly.
   This problem occurs on some P2 Evaluation boards that use the HyperFlash/
   HyperRAM add-on board - it does not occur on the P2 Edge board, either 
   with the HyperFlash/HyperRAM add-on board, or with the on-board PSRAM. 
   This issue occurs on some Rev B boards, and may also occur on other boards. 
   The program will work correctly if 200Mhz is used as the clock frequency
   (i.e. MHZ_200), which is the default frequency used for Catalyst and 
   its programs. Affects the Propeller 2 Evaluation board only.

2. The VGA versions of the Catalyst demos would not execute correctly because
   they were not being compiled with the NO_LINENOISE option. The linenoise library
   relies on the use of a VT100 compatible serial terminal emulator, which is 
   not applicable when using the VGA option. Affected the Propeller 2 only.

3. The Propeller 2 version of Catalina was erroneously executing Quick Build
   processing if both -c (compile only) and -q (quick build) were specified, 
   even though the quick build would fail because no binary file was generated
   by the compile. Now specifying -c causes -q to be ignored, which matches 
   the PC version of Catalina. Affected the Propeller 2 on Catalyst only.

4. The Windows create_shortcuts.bat file was not creating a link to the 
   Catalina Release History document. Affected the Propeller 1 and 2 on 
   Windows only.

5. The Propeller 2 versions of the binbuild and binstats utilities were
   not setting their exit codes, which meant Catalyst scripts could not 
   detect when they failed. Affected the Propeller 2 on Catalyst only.

6. Various README files have been removed, with the relevant information
   consolidated into one file, or else included in the various reference 
   and tutorial documents.

7. The Windows installer no longer offers the option of installing Make
   and other GNU utilities.
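
Item 5 above depends on utilities setting their exit codes. As a general illustration (this is not the binbuild or binstats source, and the function names are hypothetical), a C tool can report failure to a calling script by returning a non-zero status:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical worker: returns 0 on success, non-zero on failure.
   Here "work" is just checking that the file can be opened. */
static int process_file(const char *path) {
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        return -1;
    }
    fclose(f);
    return 0;
}

/* Map the worker's result onto a process exit status. */
int tool_exit_code(const char *path) {
    assert(path != NULL);
    if (process_file(path) != 0) {
        fprintf(stderr, "cannot process %s\n", path);
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```

A main function would end with `return tool_exit_code(argv[1]);` so that the status reaches whatever script invoked the tool.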

Ross.

RossH · 2025-07-06 07:21

After advice from @rogloh and @evanh, I made some adjustments to the HYPER configuration section of the P2 Evaluation board platform configuration file (i.e. target/p2/P2EVAL.inc) and it now works across a much wider clock range (I tested 100Mhz - 300Mhz).

Unfortunately, there is no one set of values that works across all clock frequencies on this board. Here are the values that work for me ...

' RAM options
HYPER_FASTREAD   = 0 ' 0 disables, 1 enables <-- was 1
HYPER_FASTWRITE  = 0 ' 0 disables, 1 enables
HYPER_UNREGCLK   = 0 ' 0 disables, 1 enables

' RAM latency, burst size and delay
HYPER_LATENCY_RAM   = 6
HYPER_BURST_RAM     = $0280
HYPER_DELAY_RAM     = 10 ' 11 if clock > 260Mhz, 10 if 150-260Mhz, 9 if < 150Mhz

Some additional tweaking of HYPER_DELAY_RAM may be required on other boards.
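
The frequency bands in the HYPER_DELAY_RAM comment above can be written as a small lookup. This is only a sketch in C of the values reported for this one P2 Eval board; the function name is hypothetical, and the boundary clocks (exactly 150 and 260) are assumed to belong to the middle band.

```c
#include <assert.h>

/* Hypothetical helper: pick a HYPER_DELAY_RAM value for a clock in MHz,
   using the bands reported above for one P2 Eval board.
   Other boards may need different values. */
int hyper_delay_ram_for_mhz(int mhz) {
    assert(mhz > 0);
    if (mhz > 260) {
        return 11;   /* above 260 */
    }
    if (mhz >= 150) {
        return 10;   /* 150 to 260 */
    }
    return 9;        /* below 150 */
}
```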

The previous values seem to work OK on the P2 Edge. I must have tested my HyperRAM/HyperFlash add-on board on that and did not realize the same values would not work on the P2 Eval board across all clock speeds - it was apparently just pure luck that those values DID work at my default clock speed (200Mhz).

Have updated GitHub - SourceForge users should edit the platform configuration file manually.

Ross.

evanh · 2025-07-06 15:01

Ya, the HYPER_DELAY_RAM is what an auto calibration would set.

RossH · 2025-07-07 00:42

Small benchmark programs are fine, but as a real-world test to make sure the HyperRAM is now working 100% on my P2 Eval board, I recompiled Catalina itself to run on the P2 Eval at 300Mhz. Catalina uses the lower part of the Hyper RAM as code and data space, and the upper part as an SD card sector cache, so it really does exercise nearly all of it.

Using Catalina's Quick Build option, I can now compile "Hello World" on the P2 itself in 3.5 minutes!

EDIT: A significant chunk of that 3.5 minute compile time is the overhead of Catalyst executing the script that loads the various Catalina components - a better indicator is probably the time taken to compile the game Othello, which is about 500 lines of C code. That takes 5 minutes.
