Catalina - ANSI C and Lua for the Propeller 1 & 2

Wingineer19 · 2023-06-26 22:09

Hi RossH,

I've taken a break from working on the Prop2 and decided to revisit my past work done on the Prop1.

A few years ago I built an add-on memory module with a socket that allows the FLiP module to simply plug into it (i.e. piggyback). This module contains two 512KB SRAM SPI memory chips and some additional EEPROMs.

I decided to re-write the memory driver for this board, and this brought up a couple of questions:

While running build_utilities:

A. I'm first given the choice of several pre-configured Prop CPU boards: HYDRA, HYBRID, DEMO, TRIBLADEPROP, ASC, DRACBLADE, RAMBLADE, RAMBLADE3, C3, PP, QUICKSTART, ACTIVITY, FLIP, P2_CUSTOM, P2D2, P2_EDGE, P2_EVAL. Or enter CUSTOM if none of the above.

B. Then I'm presented with a choice of several pre-configured add-on memory boards: SUPERQUAD, RAMPAGE, HX512, RP2, PMC, HYPER. Or just hit the ENTER key if none of the above.

C. Then I'm asked (or told) if I want to enable the cache, and what size, depending upon the memory board chosen for both SRAM and FLASH and whether or not the board includes them.

Question: Instead of defining an entirely new Prop1 board called CUSTOM and configuring the driver code under that name, is there a way to keep the Prop1 board type as a FLIP and just create a new memory add-on board instead?

For example, I would like to create a memory module called DUALSRAM. Then when I run build_utilities, I would select the Platform as FLIP and the XMM memory module as DUALSRAM. Thus I wouldn't have to create an "all-in-one" board called CUSTOM to include all of the CPU info along with the memory driver under that description.

I've attempted to create a separate XMM add-on board called DUALSRAM by modifying the XMM.INC, CFG.INC, DEF.INC, and build_utilities and build_utilities.bat files without success.

When executing build_utilities after these modifications, it did allow me to select the FLIP module as the Platform and DUALSRAM as the add-on. It then asked about enabling cache for DUALSRAM. After making the selection, it appeared to go through the motions of creating the necessary files but then ended the adventure with this error:

Payload_EEPROM_Loader.spin
|-Catalina_Common.spin
Catalina_Common.spin(678:19) : error : Expected a unique name, BYTE, WORD, LONG, or assembly instruction
Line:
THREAD_BLOCK_SIZE = 35         ' size (LONGs) of thread_block
Offending Item: =

Copying EEPROM binaries to bin ...

        1 file(s) copied.

                      ============================
                      Building utilities completed
                      ============================

The utility programs should have been copied to the Catalina 'bin' directory.
If you saw error messages from the copy commands, ensure you have write
permission to the 'bin' directory and then rerun this utility.

Other errors may indicate the options you selected are not supported by
your Propeller platform - check the options and then rerun this utility.

While running build_ram_test:

A. For the CPU platform I can choose: HYDRA, HYBRID, RAMBLADE, DRACBLADE, DEMO, C3, CUSTOM, TRIBLADEPROP CPU_1, TRIBLADEPROP CPU_2, SUPERQUAD, RAMPAGE.

B. For Options I can choose: NO_RAM, FLASH, CACHED, CACHED_1K, CACHED_2K, CACHED_4K, CACHED_8K.

C. I initially tested with cache enabled, so I typed: build_ram_test CUSTOM CACHED_4K PRINT_DETAIL. Ramtest was successful as it passed both the Trivial and Complex tests.

D. Next I wanted to test without cache enabled (i.e. the XMM Direct functions), so I typed: build_ram_test CUSTOM PRINT_DETAIL. Again, Ramtest was successful as it passed both Trivial and Complex tests.

E. When testing with cache enabled, I assume that ramtest only activates the XMM_READPAGE and XMM_WRITEPAGE functions which only work with individual bytes. Hence there is no movement of words or longs to/from XMM memory.

F. When testing with cache disabled, I assume that ramtest only activates the XMM_READLONG, XMM_READMULT, XMM_WRITELONG, and XMM_WRITEMULT functions. Data is moved to/from XMM memory as individual bytes, words, or longs as the Kernel requests and there is no caching of the data.

Question: When operating without cache enabled, does ramtest verify proper transfer of individual bytes, words, and longs, or does it only test the movement of longs?

I ask because I have a simple "Hello,world!" type program that works fine with cache enabled but doesn't work at all without cache. This has me wondering if there's a problem with the XMM Direct functions even though they passed ramtest.

Wingineer19 · 2023-06-27 00:41

Hi RossH,

I've enclosed a copy of my latest Prop1 XMM memory driver for review at your leisure.

The Catalina Manual warns about messing with the C and Z flags within the driver.

Note that I did just that within the MakeQuadMode function, but it doesn't seem to have an adverse effect when running my "Hello,world!" program using the cache.

The program won't run at all when I'm using the XMM Direct functions without the cache. This despite the fact that it passes the ramtest.

That's why I'm wondering if ramtest only tests LONG variables and doesn't check if WORD and BYTE data is being stored and retrieved correctly. If my XMM Direct functions aren't working with WORD and BYTE data that could explain the program failure.

After running build_utilities and configuring the XMM driver to use 4K cache, the program ran fine when I compiled it this way: catalina hello.c -p1 -lc -C TTY -C CUSTOM -C SMALL -C CACHED_4K -y

This also worked: catalina hello.c -p1 -lc -C TTY -C CUSTOM -C LARGE -C CACHED_4K -y

But after running build_utilities and configuring the XMM driver to not use cache, the program didn't run when I compiled it this way: catalina hello.c -p1 -lc -C TTY -C CUSTOM -C SMALL -y

This didn't work either: catalina hello.c -p1 -lc -C TTY -C CUSTOM -C LARGE -y

Note that the attached driver is actually called CUSTOM_XMM.inc on my computer. I just changed the name here to DUALSRAM_XMM.inc to convey the idea that it's using two SPI SRAM chips.

Here's the simple test program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>
#include <math.h>
#include <time.h>
#include <propeller.h>

void main(void)
{
 short Times=0;
 for(;;)
  {
   printf("Printed Hello World %d Times\r",Times++);
  }
}

UPDATE: OK, it looks like I'm up against the 96 longs limit for the XMM Direct functions. It's interesting that I don't remember seeing any warning from the compiler that the limit was exceeded. Anyway, I now remember that I've been here before years ago but was able to streamline the functions to ensure they fall within this limit. Hopefully I can do the same now. I also think the final result last time was quite slow unless one enjoys watching paint dry...

RossH · 2023-06-27 05:31

Hello @Wingineer19

It sounds like you are doing the right things to create a new add-on board. I'd say your Spin error comes from an error in your CFG file, which is included just before the line the Spin compiler is complaining about (see constants.inc, which includes CFG.inc, which will include your file, which I assume you have called DUALSRAM_CFG.inc). Email me your files and I will check them.

I will try to find time to answer the rest of your questions later today or tomorrow.

Ross.

EDIT: Corrected the file name. I see you already posted DUALSRAM_XMM.inc, can you also post DUALSRAM_CFG.inc?

Wingineer19 · 2023-06-27 06:36

Hi RossH,

I've enclosed a couple more files for your review.

Unfortunately it appears that I screwed up the build_utilities settings with my modifications such that I had to uninstall Catalina then reinstall it to get it to even work again.

Once I get a handle on what I'm doing I can revisit the build_utilities modifications to include the DUALSRAM memory module.

But that brings up another question: If I'm able to successfully use build_utilities to generate the correct settings for the FLiP board and the DUALSRAM memory module, how will I be able to get ramtest to work with this arrangement without modifying it too?

As mentioned previously I'm up against the 96 longs limit for the XMM Direct functions so I will need to squeeze them down. The ramtest probably didn't care about this but the actual XMM Kernel definitely does.

Assuming the XMM_WRITELONG, XMM_WRITEMULT, XMM_READLONG, and XMM_READMULT functions are properly written, I think the 96 long issue is why XMM Direct didn't work. I'm puzzled that Catalina didn't flag the compiling with an error because I think there is a FIT designation within your Kernel that would catch the oversize.

It's late night here so I will have to resume this task tomorrow morning. I will post the new functions once I've compressed everything down. Talk to you later.

Wingineer19 · 2023-06-28 03:50

Hi RossH,

OK, I redid the XMM functions, and I've gone back to baseline and called everything CUSTOM instead of DUALSRAM.

For quick reference again, here is the simple "Hello,world!" program used for this test:

//program is hello.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>
#include <math.h>
#include <time.h>
#include <propeller.h>

void main(void)
{
 short Times=0;
 for(;;)
  {
   printf("Printed Hello World %d Times\r",Times++);
  }
}

To test the XMM caching functions, I ran build_ram_test CUSTOM CACHED_4K and it passed ramtest.
I then did build_utilities and configured it to use CACHE_4K. I compiled the hello.c two different ways:
1. catalina hello.c -p1 -lc -C TTY -C CUSTOM -C SMALL -C CACHED_4K -y. The program ran successfully.
2. catalina hello.c -p1 -lc -C TTY -C CUSTOM -C LARGE -C CACHED_4K -y. The program also ran successfully.

To test the XMM Direct functions, I ran build_ram_test CUSTOM and it passed ramtest.
I then did build_utilities and configured it to not use any cache. I compiled the hello.c two different ways:
1. catalina hello.c -p1 -lc -C TTY -C CUSTOM -C SMALL -y. The program didn't run at all.
2. catalina hello.c -p1 -lc -C TTY -C CUSTOM -C LARGE -y. This didn't run either.

It looks like all of my XMM Direct functions take about 89 longs, which is just under the 96 long limit. So it's probably safe to rule out the failure of the XMM Direct functions due to oversize.

If ramtest only performs long testing it's possible that the XMM Direct functions passed ramtest but failed in real world testing due to incorrect operation with byte and word variables. At this point I can't narrow it down because I don't know if ramtest performs testing on all three data types.

I've enclosed the various CUSTOM files for your review.

RossH · 2023-06-28 05:11

@Wingineer19 said:

I'm puzzled that Catalina didn't flag the compiling with an error because I think there is a FIT designation within your Kernel that would catch the oversize.

It may depend on how you compile it - I just downloaded and added your files to my target directory and modified XMM.inc, DEF.inc and CFG.inc to include them.

When I compile hello_world.c in LARGE mode using the command:

catalina hello_world.c -lc -C FLIP -C DUALSRAM -C LARGE

I get:

Catalina Compiler 5.9.2
Catalina_XMM.spin(1723:49) : error : Origin exceeds FIT limit
Line:
              fit       $1eb                    ' last 5 longs reserved for debug overlay vectors
Offending Item: ' last 5 longs reserved for debug overlay vectors

Which is what I would expect if the XMM routines were longer than 96 longs. But if I compile it in SMALL mode using the command:

catalina hello_world.c -lc -C FLIP -C DUALSRAM -C SMALL

I get:

Catalina Compiler 5.9.2

code = 20988 bytes
cnst = 144 bytes
init = 180 bytes
data = 352 bytes
file = 54512 bytes

This is because your XMM code is 98 longs long. That is 2 longs too large for XMM LARGE mode, where the limit is 96 longs, but just small enough to sneak into XMM SMALL mode, because in SMALL mode the XMM kernel itself is a few longs smaller.

If you add -C CACHED, it compiles in both LARGE and SMALL modes, because the caching XMM code is small enough to fit in both modes.

I have attached my (untested) modified version of build_utilities.bat which knows about the DUALSRAM add-on board.

To go back to a few points from your earlier posts:

When operating without cache enabled, does ramtest verify proper transfer of individual bytes, words, and longs, or does it only test the movement of longs?

Ramtest tests only longs. It probably should test both XMM_WriteMult and XMM_ReadMult with a length of 1, 2 and 4 to test bytes and words as well as longs. I will add that as another test option.

The Catalina Manual warns about messing with the C and Z flags within the driver.

Yes, you cannot modify the C or Z flags in your XMM functions. If you do so, the results will be very unpredictable. Some programs may work sometimes, while others will not work at all. Also, cached mode may work but direct mode will not.

If I'm able to successfully use build_utilities to generate the correct settings for the FLiP board and the DUALSRAM memory module, how will I be able to get ramtest to work with this arrangement without modifying it too?

You should be able to just use the build_ram_test script in the utilities folder unmodified. For instance:
build_ram_test FLIP DUALSRAM
or
build_ram_test FLIP DUALSRAM CACHED

Ross.

RossH · 2023-06-28 08:51

Hello @Wingineer19

Attached is a new version of Catalina's ram test program, which can be configured to test byte and word accesses in place of long accesses. To do so, simply add READ_WORD, READ_BYTE or READ_LONG and/or WRITE_WORD, WRITE_BYTE or WRITE_LONG to the build_ram_test command. If all goes well, the program should function identically to the way it does now.

For example:

build_ram_test FLIP DUALSRAM READ_WORD
or
build_ram_test FLIP DUALSRAM READ_BYTE WRITE_WORD

More detail in the file itself, but here is the main change:

'  The program normally uses XMM_ReadLong and XMM_WriteLong to access the
'  XMM RAM, but it can be configured to use XMM_ReadMult and XMM_WriteMult
'  instead, by defining one of the following for reads:
'     READ_LONG - use XMM_ReadMult with XMM_LEN = 4 
'     READ_WORD - use XMM_ReadMult with XMM_LEN = 2 
'     READ_BYTE - use XMM_ReadMult with XMM_LEN = 1 
'  and/or one of the following for writes:
'     WRITE_LONG - use XMM_WriteMult with XMM_LEN = 4 
'     WRITE_WORD - use XMM_WriteMult with XMM_LEN = 2 
'     WRITE_BYTE - use XMM_WriteMult with XMM_LEN = 1

This will be included in the next release.

Ross.

EDIT: Updated to allow separate methods for read and write, and also to fix a bug when using word read/writes.

RossH · 2023-06-28 10:03

Hello @Wingineer19

Your routines still change the Z and/or C flags (e.g. in SendReadRequest and SendWriteRequest). This is not allowed.

If you need to use the flags, save their current values on entry and restore them on exit.

Ross.

Wingineer19 · 2023-06-28 22:20

Hi RossH,

I think I'm able to work around the C and Z flag problem within the XMM Direct functions.

Suppose RamData contains 0x03:

Instead of doing this:

Function        cmp RamData,#03  wc,wz
         if_e   jmp #Function_ret
                mov RamData,#01
                shl RamData,#16
Function_ret    ret

I did something like this:

Function        xor RamData,#03
                tjz RamData,#Function_ret
                mov RamData,#01
                shl RamData,#16
Function_ret    ret
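
One thing worth checking with this substitution: cmp only compares, whereas xor overwrites RamData before tjz tests it. The following is a minimal C model of the two snippets above (the function names with_cmp and with_xor_tjz are made up for illustration, and it assumes the caller looks at RamData after the return). It suggests the two versions agree when RamData is not 3, but differ when it is 3.

```c
#include <stdint.h>

/* Model of the flag-based version: cmp + if_e jmp leaves RamData
   untouched when it already equals 3. */
uint32_t with_cmp(uint32_t ram_data)
{
    if (ram_data == 3u)
        return ram_data;        /* still 3 on return */
    ram_data = 1u;
    ram_data <<= 16;
    return ram_data;
}

/* Model of the flag-free version: xor rewrites RamData, then tjz
   tests the rewritten value for zero. */
uint32_t with_xor_tjz(uint32_t ram_data)
{
    ram_data ^= 3u;             /* becomes 0 only if it was 3 */
    if (ram_data == 0u)
        return ram_data;        /* 0 on return, not 3 */
    ram_data = 1u;
    ram_data <<= 16;
    return ram_data;
}
```

If the value 3 has to survive the early exit, the model suggests working on a scratch copy of RamData rather than RamData itself.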

Now, on to ramtest. In this case I'm still working with CUSTOM and haven't moved on to DUALSRAM yet. (In reality, they are the same functions, but CUSTOM is treated as a complete project board whereas DUALSRAM will be treated as an add-on for the FLiP module.)

Just like before, my newest CUSTOM driver passed ramtest when reading/writing LONGs, but failed with WORDs and BYTEs.

I don't want to relitigate the Endian issues, but do want to make sure I fully understand what the Kernel expects when invoking the XMM Direct functions.

For this example let's start with the XMM_Write functions. The XMM_Read functions can be addressed later.

Suppose the Kernel has a register called CogReg consisting of a long variable containing 0xdeadbeef and wants to store it in external memory.

Because of Little Endian this variable is actually stored internally like this:

CogReg: 0xefbeadde

But if I wanted to print out CogReg then I would see 0xdeadbeef on the screen.

Now, let's invoke the XMM_Write functions to send CogReg to external memory:

1. Save CogReg in external memory variable Mem as a LONG:
-Kernel calls either XMM_WriteLong or XMM_WriteMult with XMM_Len set to 4.
-XMM_Write function will grab a byte at a time from CogReg, starting from left to right, until all four have been sent
-The memory databus will see these bytes flowing in this order: ef be ad de
-External Memory variable Mem now contains 0xefbeadde
-If I want Mem to print what it contains, then I will see 0xdeadbeef on the screen.

2. Save CogReg in external memory variable Mem as a WORD:
-Kernel calls XMM_WriteMult with XMM_Len set to 2.
-XMM_Write will grab and send the two most leftward bytes from CogReg to external memory location Mem.
-The memory databus will see these bytes flowing in this order: ef be
-External Memory variable Mem now contains 0xefbe
-If I want Mem to print what it contains, then I will see 0xbeef on the screen.
-Note: This function will only send out the most leftward WORD in CogReg to external memory.

3. Save CogReg in external memory variable Mem as a BYTE:
-Kernel calls XMM_WriteMult with XMM_Len set to 1.
-XMM_Write will grab and send the leftmost byte from CogReg to external memory location Mem.
-The memory databus will see this byte flowing: ef
-External Memory variable Mem now contains 0xef
-If I want Mem to print what it contains, then I will see 0xef on the screen.
-Note: This function will only send out the most leftward BYTE in CogReg to external memory.

Question: Is this how the XMM_Write functions are expected to perform?

RossH · 2023-06-29 00:41

Hello @Wingineer19

Data stored in Cog RAM is not byte addressable, so it doesn't make sense to consider it as being big or little endian - i.e. $deadbeef is $deadbeef, not $efbeadde. Byte "endianness" only has significance when you store data longer than one byte in byte-addressable memory, such as Hub RAM or XMM RAM.

However, storing a Cog RAM register in XMM RAM must be done exactly the same way as it would be in Hub RAM - i.e. as little-endian, so storing the long value $deadbeef at address X becomes the four successive bytes $ef $be $ad $de. If you then read address X as a byte you will get $ef, if you read it as a word you will get $beef and if you read it as a long you will get $deadbeef
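
The byte order described above can be sketched in plain C. This is only a model: the array xmm_ram and the functions xmm_write and xmm_read are hypothetical stand-ins for XMM RAM and for XMM_WriteMult/XMM_ReadMult with a length of 1, 2 or 4, not code from Catalina.

```c
#include <stdint.h>
#include <stddef.h>

static uint8_t xmm_ram[16];   /* hypothetical stand-in for XMM RAM */

/* Store the low `len` bytes (1, 2 or 4) of `value` at `addr`,
   least significant byte first (little-endian). */
void xmm_write(size_t addr, uint32_t value, unsigned len)
{
    for (unsigned i = 0; i < len; i++) {
        xmm_ram[addr + i] = (uint8_t)(value & 0xFFu);
        value >>= 8;
    }
}

/* Read `len` bytes (1, 2 or 4) from `addr`; the byte at the lowest
   address ends up in the least significant position. */
uint32_t xmm_read(size_t addr, unsigned len)
{
    uint32_t value = 0;
    for (unsigned i = 0; i < len; i++)
        value |= (uint32_t)xmm_ram[addr + i] << (8 * i);
    return value;
}
```

After xmm_write(0, 0xdeadbeef, 4), reading address 0 with a length of 1, 2 and 4 in this model gives $ef, $beef and $deadbeef respectively.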

Have a look at the changes I just made to the ram test program (see the updated version in the link above), which can now use byte, word or long operations separately for read and write. They illustrate how to read and write a Cog register in XMM RAM with the correct "endianness". No matter which operations you choose for read and which for write, the program should always produce the same output.

Ross.

evanh · 2023-06-29 02:00

Prop2 instruction set does technically have the ability to address cogRAM (register space) on smaller than longword boundaries using the ALT prefixing instructions. But usually everyone wants a longword register printed as big-endian ordered hexadecimal nibbles so that's what is done.

Wingineer19 · 2023-06-29 05:44

Hello RossH,

OK, I found that part of the problem why ramtest was failing was because the XMM Direct functions were altering the XMM_Len value.

I fixed that so now MULT_LONG and MULT_WORD pass ramtest.

The MULT_BYTE appears to pass the TRIV test, but doesn't print DEADBEEF for some reason:

But fails the COMPLEX test:

I've gone over both the XMM_WriteLong/XMM_WriteMult and XMM_ReadLong/XMM_ReadMult functions and am puzzled why the BYTE test failed.

Here's XMM_WriteLong/XMM_WriteMult:

XMM_WriteLong                mov   XMM_Len,#04                 'XMM_Len=4 (Write Long)
XMM_WriteMult               call   #SendWriteReq               'Send Memory Write Request
                             add   XMM_Addr,XMM_Len            'XMM_Addr=XMM_Addr + XMM_Len
                             mov   RamLoop,XMM_Len             'RamLoop=XMM_Len
XMM_Src                      mov   RamData,0-0                 'CogRam -> RamData
:XMMWriteLoop               andn   outa,RamCsClkBus            'RamCS=L,RamCLK=L,RamBus=L
                             mov   RamCopy,RamData             'RamCopy=RamData
                             and   RamCopy,#$FF                'RamCopy=RamCopy & 0xff
                             shl   RamCopy,#Ram_D0_Pin         'RamCopy=RamCopy << Ram_D0_Pin
                              or   outa,RamCopy                'RamBus=RamCopy
                             shr   RamData,#08                 'RamData=RamData >> 0x08
                              or   outa,RamCLK                 'RamCLK=H
                            djnz   RamLoop,#:XMMWriteLoop      'if(--RamLoop) goto XMMWriteLoop
                            andn   outa,RamCsClkBus            'RamCS=L,RamCLK=L,RamBus=L
                              or   outa,RamCS                  'RamCS=H
XMM_WriteMult_ret
XMM_WriteLong_ret            ret                               'Return

It takes the CogRam value and places it in RamData. It then ships out the bytes in RamData starting with the least significant byte0 (i.e. the one furthest right). RamData is then shifted right by 8 bits thus moving byte3 into byte2, byte2 into byte1, and byte1 into byte0. The new byte0 is then shipped out. This shifting and outputting process repeats until XMM_Len bytes have shipped out.

Here's XMM_ReadLong/XMM_ReadMult

XMM_ReadLong                 mov   XMM_Len,#04                 'XMM_Len=4 (Read A Long)
XMM_ReadMult                call   #SendReadReq                'Send Memory Read Request
                             add   XMM_Addr,XMM_Len            'XMM_Addr=XMM_Addr + XMM_Len
                             mov   RamLoop,XMM_Len             'RamLoop=XMM_Len
                             mov   RamData,#0                  'RamData=0
                             mov   RamSlide,#0                 'RamSlide=0
:XMMReadLoop                 mov   RamCopy,INA                 'RamCopy=INA
                             shr   RamCopy,#Ram_D0_Pin         'RamCopy=RamCopy >> Ram_D0_Pin
                             and   RamCopy,#$FF                'RamCopy=RamCopy & 0xff
                              or   outa,RamCLK                 'RamCLK=H
                             shl   RamCopy,RamSlide            'RamCopy=RamCopy << RamSlide
                              or   RamData,RamCopy             'RamData=RamData | RamCopy
                             add   RamSlide,#08                'RamSlide=RamSlide + 0x08
                            andn   outa,RamCsClkBus            'RamCS=L,RamCLK=L,RamBus=L
                            djnz   RamLoop,#:XMMReadLoop       'if(--RamLoop) goto XMMReadLoop
XMM_Dst                      mov   0-0,RamData                 'XMM_Dst=RamData
                              or   outa,RamCS                  'RamCS=H
XMM_ReadMult_ret
XMM_ReadLong_ret             ret                               'Return

It reads the first byte from XMM memory and places it in byte0 of RamData. The following bytes read from XMM Memory will be shifted left by 8, 16, or 24 bits depending upon how many iterations of the loop as determined by XMM_Len. Hence, the first read byte will be placed in byte0 of RamData, the second read byte will be placed in byte1 of RamData, the third read byte will be placed in byte2 of RamData, and the fourth read byte will be placed in byte3 of RamData. Then RamData is copied into CogRam.

This process appears to work for LONG and WORD values but now fails when working with BYTE values? I'm clearly missing something here...

RossH · 2023-06-29 08:29

Hello @Wingineer19

No, that's a failure of even the trivial test. Here is what a successful "trivial" test should look like, using any combination of byte, word or long reads and/or byte, word or long writes:

I'll try and get a chance to go over your code tonight or tomorrow.

Ross.

Wingineer19 · 2023-06-29 19:58

Hi RossH,

The byte transfer problem is a result of both my driver and the hardware design.

My XMM memory module is essentially the "Rampage" design without the flash memory or the SD Card. I use two 512 kilobyte SPI SRAMs. The Quad data bus from each chip is combined into a single 8-bit data bus. Here's the schematic:

The P2EDGE does something similar but uses four 8 megabyte SPI PSRAMs. The Quad data bus from each chip is combined into a single 16-bit data bus. Here's the schematic:

Regardless of using either the SRAMs or PSRAMs, the chips initially power up into SPI mode. Once the memory driver switches all chips into Quad mode, all memory accesses must be done using nibbles on their respective Quad data bus.

Memory access is accomplished by sending a total of four bytes after Chip Select (CS) is activated. The first byte is a read or write command. The remaining three bytes comprise the 24-bit memory address to be accessed. Since these are Quad chips, not Octal, this information must be sent a nibble at a time, resulting in a total of 8 nibbles.
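
As a rough illustration of that request, here is a C sketch that splits one command byte and a 24-bit address into the eight nibbles a single Quad chip would see. The function name quad_request_nibbles is hypothetical, and sending the most significant nibble first is an assumption that should be checked against the chip's datasheet.

```c
#include <stdint.h>

/* Split an 8-bit command and a 24-bit address into 8 nibbles,
   in the order they would be clocked out.
   ASSUMPTION: most significant nibble first. */
void quad_request_nibbles(uint8_t command, uint32_t address, uint8_t out[8])
{
    /* Pack command and address into one 32-bit frame: CC AA AA AA */
    uint32_t frame = ((uint32_t)command << 24) | (address & 0x00FFFFFFu);

    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)((frame >> (28 - 4 * i)) & 0x0Fu);
}
```

With a sample command value of 0x02 and address 0x012345, the model yields the nibble sequence 0 2 0 1 2 3 4 5, i.e. 8 CLK pulses per request.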

Each block of memory consists of bytes, not nibbles, so a READ or WRITE operation MUST consist of transferring TWO nibbles.

Referring back to my design, at first glance it might appear that one can perform a write operation by just placing an 8-bit value on the bus, pulse the CLK, and data has been stored. That's incorrect because only a nibble has been stored. Another nibble is still required to perform a proper write operation. If that doesn't happen, and CS is deactivated, the result could be a complete abort of the write operation, or memory corruption at best. Again, Quad chips require a complete byte transfer to/from memory. Just sending a nibble and ending the write cycle will not work.

Simply put, what this means is that read/write operations must happen in nibble pairs. So send/receiving a WORD would work because 4 total nibbles are transferred, two to SRAM1 (thus comprising a byte), and two to SRAM2 (also comprising a byte). Thus the byte transfer requirement is met for both chips. Sending a LONG would also work because 8 total nibbles are transferred, four to SRAM1 (thus two bytes transferred), and four to SRAM2 (also two bytes transferred).

Running the XMM driver in Caching mode will work, but ONLY if an EVEN number of BYTEs is always transferred. XMM Direct moving only WORDs or LONGs will also work because an even number of bytes is moved in each case.

This means that my existing memory driver won't work when wanting to move a single byte at a time. The simple solution to this problem is to just send two CLK pulses instead of only one to memory after each 8-bit read/write operation. This ensures that each memory location will have read/stored a byte, but doing this will cause the two nibbles within each byte to be exactly the same all the time.
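
The nibble-pair behaviour can be modelled in C. This is a simplified sketch, not the driver: the quad_chip type and the functions below are hypothetical, it assumes the low four bus bits go to one chip and the high four bits to the other, and it assumes each chip treats the first nibble it receives as the high nibble of the stored byte.

```c
#include <stdint.h>

typedef struct {
    uint8_t  mem[8];        /* bytes the chip has actually stored */
    unsigned count;         /* number of stored bytes */
    uint8_t  pending;       /* first nibble, waiting for its partner */
    int      have_pending;  /* 1 if a lone nibble is waiting */
} quad_chip;

/* One CLK pulse: the chip latches one nibble. Only the second nibble
   of a pair completes a stored byte. */
void chip_clock_in(quad_chip *c, uint8_t nibble)
{
    nibble &= 0x0Fu;
    if (!c->have_pending) {
        c->pending = nibble;
        c->have_pending = 1;
    } else {
        c->mem[c->count++] = (uint8_t)((c->pending << 4) | nibble);
        c->have_pending = 0;
    }
}

/* One CLK pulse on the shared 8-bit bus (assumed wiring:
   low nibble to chips[0], high nibble to chips[1]). */
void bus_clock_out(quad_chip chips[2], uint8_t bus)
{
    chip_clock_in(&chips[0], bus & 0x0Fu);
    chip_clock_in(&chips[1], bus >> 4);
}

/* Write one bus byte using `clocks` CLK pulses, bus value held steady. */
void write_byte(quad_chip chips[2], uint8_t value, unsigned clocks)
{
    for (unsigned i = 0; i < clocks; i++)
        bus_clock_out(chips, value);
}
```

In this model, write_byte with one clock leaves both chips holding a lone nibble and no stored byte, while two clocks store one byte per chip with both nibbles identical (0xA5 on the bus becomes 0x55 in one chip and 0xAA in the other).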

I will definitely make this two CLK pulse change to my XMM caching functions to ensure proper read/write operations always take place, whether or not an even number of caching bytes is accessed.

This may be more difficult with my XMM Direct functions because the Cog memory containing my driver is already tight. But I will take a look and see if I can make it happen. Will keep you posted.

Wingineer19 · 2023-06-29 22:18

Hi RossH,

OK, my supposition was correct. After performing a data read or write operation on the 8-bit bus I send two CLK pulses to the chips.

It worked well. I can now get ramtest to pass with LONGs, WORDs, and BYTEs. It works with or without the cache.

More importantly, I'm able to use the XMM Direct functions. I was able to compile and run the "Hello,world!" program using either SMALL or LARGE memory models, and with or without cache.

So my driver is small enough to fit within the XMM Kernel cog to even support running LARGE programs without using the cache. But, I think I can make it even better. I'll see if I can shrink it down further.

Originally my driver would decrement XMM_Len which ramtest didn't like. The reason is because you only set the XMM_Len variable once before calling the BYTE and WORD tests:

' read multiple bytes to MultTmp
ReadMult
        movd   XMM_Dst,#MultTmp
        call   #XMM_ReadMult
ReadMult_ret
        ret
' write multiple bytes from MultTmp
WriteMult
        movs   XMM_Src,#MultTmp
        call   #XMM_WriteMult
WriteMult_ret
        ret
' read long from XMM_Addr to Data, using XMM_ReadMult or XMM_ReadLong
ReadLong
#ifdef MULT_BYTE
        mov    XMM_Len,#1
        call   #ReadMult
        mov    Data,MultTmp
        call   #ReadMult
        shl    MultTmp,#8
        or     Data,MultTmp
        call   #ReadMult
        shl    MultTmp,#16
        or     Data,MultTmp
        call   #ReadMult
        shl    MultTmp,#24
        or     Data,MultTmp
#elseifdef MULT_WORD
        mov    XMM_Len,#2
        call   #ReadMult
        ror    MultTmp,#16
        mov    Data,MultTmp
        call   #ReadMult
        or     Data,MultTmp
#elseifdef MULT_LONG
        mov    XMM_Len,#4
        movd   XMM_Dst,#Data
        call   #XMM_ReadMult
#else
        movd   XMM_Dst,#Data
        call   #XMM_ReadLong
#endif
ReadLong_ret
        ret

' write the long in Data to XMM_Addr, using XMM_WriteMult or XMM_WriteLong
WriteLong
#ifdef MULT_BYTE
        mov    MultTmp,Data
        mov    XMM_Len,#1
        call   #WriteMult
        mov    MultTmp,Data
        shr    MultTmp,#8
        call   #WriteMult
        mov    MultTmp,Data
        shr    MultTmp,#16
        call   #WriteMult
        mov    MultTmp,Data
        shr    MultTmp,#24
        call   #WriteMult
#elseifdef MULT_WORD
        mov    XMM_Len,#2
        mov    MultTmp,Data
        shr    MultTmp,#16
        call   #WriteMult
        mov    MultTmp,Data
        call   #WriteMult
#elseifdef MULT_LONG
        mov    XMM_Len,#4
        movs   XMM_Src,#Data
        call   #XMM_WriteMult
#else
        movs   XMM_Src,#Data
        call   #XMM_WriteLong
#endif
WriteLong_ret
        ret

MultTmp long 0

If you reloaded XMM_Len before each call, that would allow me to decrement XMM_Len and save me a couple of LONGs in my code.

I may modify your Catalina_XMM_RamTest.spin to allow me to do just that.

I don't think the XMM_Len will be an actual issue with the Kernel because I assume it loads the desired XMM_Len value every time before calling my functions?
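
To make the XMM_Len interaction concrete, here is a small C model of a caller that sets the length once and then makes four byte writes, as the MULT_BYTE test above does. All names are hypothetical and the model is simplified: a real djnz loop entered with a count of 0 would wrap around rather than skip, whereas this sketch simply writes nothing.

```c
#include <stdint.h>

static uint8_t  ram[8];     /* stand-in for XMM RAM */
static uint32_t xmm_addr;   /* models XMM_Addr */
static uint32_t xmm_len;    /* models XMM_Len */
static uint32_t mult_tmp;   /* models MultTmp */

/* Driver variant that counts XMM_Len itself down to zero. */
void write_mult_clobbering(void)
{
    uint32_t data = mult_tmp;
    while (xmm_len != 0) {
        ram[xmm_addr++] = (uint8_t)data;
        data >>= 8;
        xmm_len--;          /* caller's length is destroyed */
    }
}

/* Driver variant that counts a scratch copy (like RamLoop) instead. */
void write_mult_preserving(void)
{
    uint32_t data = mult_tmp;
    for (uint32_t loop = xmm_len; loop != 0; loop--) {
        ram[xmm_addr++] = (uint8_t)data;
        data >>= 8;
    }
}

/* Caller that sets the length once, then writes four single bytes.
   Returns the number of bytes actually written. */
uint32_t write_long_bytewise(uint32_t value, void (*write_mult)(void))
{
    xmm_addr = 0;
    xmm_len = 1;            /* set once, never reloaded */
    for (int i = 0; i < 4; i++) {
        mult_tmp = value >> (8 * i);
        write_mult();
    }
    return xmm_addr;
}
```

In this model the clobbering variant writes only the first byte, while the preserving variant writes all four, which is consistent with a test that sets XMM_Len once failing against a driver that decrements it.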

Anyway, I've attached the CUSTOM_XMM.inc driver for anyone who is interested.

Again, many thanks for your help on this.

RossH · 2023-06-29 23:53

Hello @Wingineer19

Glad you got it working.

Just on this issue ...

@Wingineer19 said:

I don't think the XMM_Len will be an actual issue with the Kernel because I assume it loads the desired XMM_Len value every time before calling my functions?

The kernel is ok, and I will update the Ram Test program, but I just checked other places it is used, and I found a case where changing XMM_Len will cause problems. This is in the cache, in the file target\Cached_XMM.inc. I have attached an updated version, which you need to test on a program that uses the cache, and which is large enough to require paging.

Let me know when you think it is all good, and I will add your driver to the next release as an XMM add-on board enabled by the Catalina symbol DUALSRAM.

Ross.

Wingineer19 · 2023-06-30 04:39

Hi RossH,

Sounds good. I should be able to work on this again tomorrow and see if I can streamline the driver and run the test with the caching enabled.

After I got the driver to work today I compiled the startrek.c program in LARGE with no caching and much to my dismay I immediately succeeded in getting the Enterprise destroyed in a Klingon ambush. It's been decades since I've played that game and it showed.

I'll keep you posted on the testing so I can give you the "all clear" signal so you can include the driver with the next release of Catalina.

RossH · 2023-06-30 05:35

@Wingineer19 said:
After I got the driver to work today I compiled the startrek.c program in LARGE with no caching and much to my dismay I immediately succeeded in getting the Enterprise destroyed in a Klingon ambush. It's been decades since I've played that game and it showed

Them Klingons is tricksy!

RossH · 2023-06-30 12:04

Just something I had never noticed before ...

The Propeller 2 will only boot from an SD card formatted as FAT32, not FAT16. But Catalyst can read and execute programs from either FAT16 or FAT32 SD cards. So, if instead of booting Catalyst from the SD card, you program it into Flash and boot from that, you can use FAT16 formatted SD cards as easily as FAT32.

Not sure how useful this is, but it confused me at first when I tried to use some old SD cards formatted as FAT16 on my Propeller 2 and it wouldn't boot from them, so I will add a note to the Catalina and Catalyst documentation for the next release.

Wuerfel_21 · 2023-06-30 14:41

P2 can also boot from MBR if you really want to, which is independent of the file system.

RossH · 2023-06-30 23:08

@Wuerfel_21 said:
P2 can also boot from MBR if you really want to, which is independent of the file system.

Thanks. Should have remembered that thread. Have added a note to my own documentation so as not to forget again in future!

Wingineer19 · 2023-07-03 05:02

Hi RossH,

I cleaned up and updated my driver, but apparently broke several things in the process.

I first used ramtest on the XMM Direct functions to see if they work. Result is that they passed both the trivial and complex tests for CUSTOM, CUSTOM MULT_LONG, CUSTOM MULT_WORD, and CUSTOM MULT_BYTE. This was after I modified the Catalina_XMM_RamTest.spin to reload XMM_Len prior to calling each function.

Then I used build_utilities to use the XMM Direct functions (i.e. no caching). So far, so good.

But when I compiled startrek.c with the LARGE memory model without cache and attempted to upload I got this error:

I then compiled startrek.c with the SMALL memory model without cache and also got an error when attempting to upload:

I don't know what an LRC error is, but everything appeared great until that time.

Next I used ramtest for the caching functions. The result is that CACHED_1K, CACHED_2K, CACHED_4K, and CACHED_8K all passed the trivial test but all failed the complex test. Because of these failures there was no need to attempt compiling startrek.c using them.

I am decrementing XMM_Len within each called function so that might be causing a problem with the caching cog? Maybe that's also causing a problem within the XMM Kernel that contains the XMM Direct functions?

I can't see anything suspicious in the code to account for these failures, but I'll continue to test using GEAR to verify that the functions are doing what they are supposed to do instead of something odd...

Wingineer19 · 2023-07-03 06:54

Hi RossH,

OK, thanks to GEAR I found a major issue with XMM_ReadPage. Essentially the problem was that after sending the memory address request to XMM Memory the function wasn't switching the databus to input nor sending the required two clock pulses to confirm it. That's been fixed, and now all of the caching settings (1K, 2K, 4K, and 8K) pass ramtest with both the trivial and complex tests.

And with that repair done, I was able to compile startrek.c and run it with any choice of SMALL or LARGE and any cache size (1K, 2K, 4K, or 8K).

So, with the caching problem fixed, both the SMALL and LARGE functions for XMM Direct work now too. It looks like the suspicion about XMM_Len was unfounded.

I've attached the working driver for your review. I will continue to look it over and see if I can make it more efficient.

Wingineer19 · 2023-07-07 04:35

Hi RossH,

Just a quick update to let you know I'm still working on this. The driver appears stable so I've attached the files below.

Right now let's just classify it as "semi-finalist" because I'm still testing it. Most tests so far have focused upon running the driver in XMM Direct mode. Very little testing has been done in caching mode which I certainly need to do before reclassifying it as a "finalist" for release with a future Catalina update.

The driver assumes that Pins 27 to 18 are exclusively dedicated to the two SPI SRAMs. Once the Kernel calls XMM_Activate I see no reason for it to ever call XMM_TriState, hence I'm thinking about replacing this:

XMM_TriState                  mov  RamData,#QuadExit            'RamData=QuadExit (Exit Quad Mode)
                              mov  RamLoop,#02                  'RamLoop=2
                             call  #SendRamCmd                  'Send QuadExit Cmd To Memory
                               or  outa,RamCS                   'RamCS=H
                             andn  outa,RamCsClkBus             'RamCS=I,RamCLK=I,RamBus=IIIIIIII
XMM_TriState_ret              ret                               'Return

With this:

XMM_TriState                  nop
XMM_TriState_ret              ret

There's one important point that also must be considered. Although these SRAM chips are advertised as 512 kilobytes in size, internally they consist of two stacked 256 kilobyte chips. This means that reading/writing cannot cross the boundary between them.
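As a rough illustration of the boundary restriction described above, here is a minimal C sketch that splits one transfer into pieces that each stay inside a single die. It is not part of the DUALSRAM driver. The names `DIE_SIZE`, `xfer_piece`, and `split_at_die_boundary` are hypothetical, and the 256KB boundary value is an assumption that depends on how the two chips are mapped into the driver's address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed boundary: each 512KB part is described as two stacked 256KB dies. */
#define DIE_SIZE 0x40000UL

/* Hypothetical descriptor for one transfer that stays inside a single die. */
typedef struct {
    uint32_t addr;  /* start address of this piece */
    uint32_t len;   /* number of bytes in this piece */
} xfer_piece;

/*
 * Split [addr, addr+len) into pieces that never cross a DIE_SIZE boundary.
 * Stores at most max_pieces entries in out and returns how many pieces
 * the transfer needs in total.
 */
size_t split_at_die_boundary(uint32_t addr, uint32_t len,
                             xfer_piece *out, size_t max_pieces)
{
    size_t count = 0;

    assert(out != NULL || max_pieces == 0);
    while (len > 0) {
        /* Bytes left before the next die boundary. */
        uint32_t room  = (uint32_t)(DIE_SIZE - (addr % DIE_SIZE));
        uint32_t chunk = (len < room) ? len : room;

        if (count < max_pieces) {
            out[count].addr = addr;
            out[count].len  = chunk;
        }
        count++;
        addr += chunk;
        len  -= chunk;
    }
    return count;
}
```

A 32-byte transfer starting 16 bytes below the boundary comes back as two 16-byte pieces, one on each side, while a transfer that lies wholly inside one die comes back as a single piece.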

Because of this limitation, and the fact that right now I don't anticipate my code to exceed 256 kilobytes, in the future I will install the 256 kilobyte chips on my memory board instead of the 512 kilobyte ones. Please review the definition settings within the DUALSRAM files to verify that I properly configured them for 256KB instead of 512KB. At some point I would like to try the driver with the 8MB PSRAM chips and see if I can get it to work. But not right now.

I've enclosed a revised version of Catalina_XMM_RamTest wherein I just reload the XMM_Len value within the MULT_LONG, MULT_WORD, and MULT_BYTE functions to ensure proper testing operation.

I've also done a slight modification to the build_utilities batch file to permit the DUALSRAM driver to operate in either XMM Direct or caching mode. Originally it would only permit caching mode operation.

I'll keep you posted as things progress.

RossH · 2023-07-07 06:44

Hello @Wingineer19

Will review in the next few days.

Ross.

Wingineer19 · 2023-07-10 04:42

Hi RossH,

I'm continuing to test the DUALSRAM driver and came across an interesting problem.

First, consider the following program:

//Segment Is NewTest.c
//Last Revision On 09Jul23
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>
#include <math.h>
#include <time.h>
#include <propeller.h>

#define  MaxStations    10

struct Yaler                   
{
  char      ServName[16];        
 short      RSSI;              
 short      Xpos;              
 short      Ypos;              
 short      Zpos;              
 union
  {
   unsigned char      Chr[8];      
   unsigned int       LV[2];
  } MAC;
};

union Noitats
{
 struct   Yaler     Relay[22];
 unsigned char      Chr[758];
};

union Noitats Station;

void LoadStations(void)
{
 strcpy(Station.Relay[0].ServName,"First");
 Station.Relay[0].MAC.LV[1]=0x00111111;
 Station.Relay[0].MAC.LV[0]=0x11111111;
 Station.Relay[0].Xpos=1;
 Station.Relay[0].Ypos=1;
 Station.Relay[0].Zpos=1;

 strcpy(Station.Relay[1].ServName,"Second");
 Station.Relay[1].MAC.LV[1]=0x00222222;
 Station.Relay[1].MAC.LV[0]=0x22222222;
 Station.Relay[1].Xpos=2;
 Station.Relay[1].Ypos=2;
 Station.Relay[1].Zpos=2;

 strcpy(Station.Relay[2].ServName,"Third");
 Station.Relay[2].MAC.LV[1]=0x00333333;
 Station.Relay[2].MAC.LV[0]=0x33333333;
 Station.Relay[2].Xpos=3;
 Station.Relay[2].Ypos=3;
 Station.Relay[2].Zpos=3;

 strcpy(Station.Relay[3].ServName,"Fourth");
 Station.Relay[3].MAC.LV[1]=0x00444444;
 Station.Relay[3].MAC.LV[0]=0x44444444;
 Station.Relay[3].Xpos=4;
 Station.Relay[3].Ypos=4;
 Station.Relay[3].Zpos=4;

 strcpy(Station.Relay[4].ServName,"Fifth");
 Station.Relay[4].MAC.LV[1]=0x00555555;
 Station.Relay[4].MAC.LV[0]=0x55555555;
 Station.Relay[4].Xpos=5;
 Station.Relay[4].Ypos=5;
 Station.Relay[4].Zpos=5;

 strcpy(Station.Relay[5].ServName,"Sixth");
 Station.Relay[5].MAC.LV[1]=0x00666666;
 Station.Relay[5].MAC.LV[0]=0x66666666;
 Station.Relay[5].Xpos=6;
 Station.Relay[5].Ypos=6;
 Station.Relay[5].Zpos=6;
}

void ShowStations(void)
{
 short idex;
 for(idex=0; idex<MaxStations; idex++)
  {
   if(strlen(Station.Relay[idex].ServName) < 1) continue;
   printf("%s\t\t%2X:%2X:%2X:%2X:%2X:%2X\t%d\t%d\t%d\r\n",
                                         (char *) Station.Relay[idex].ServName,
                                         Station.Relay[idex].MAC.Chr[5],
                                         Station.Relay[idex].MAC.Chr[4],
                                         Station.Relay[idex].MAC.Chr[3],
                                         Station.Relay[idex].MAC.Chr[2],
                                         Station.Relay[idex].MAC.Chr[1],
                                         Station.Relay[idex].MAC.Chr[0],
                                         Station.Relay[idex].Xpos,
                                         Station.Relay[idex].Ypos,
                                         Station.Relay[idex].Zpos);
  }
 printf("\r\n");
}

void main(void)
{
 short idex;
 short jdex;
 char  DelStr[20];
 for(;;)
  {
   printf("Hit ENTER Key To Retry");
   getchar();
   printf("%c[2J",0x1b);
   LoadStations(); 
   printf("\r\nShowing Initial Station List\r\n\r\n");
   ShowStations();
   strcpy(DelStr,"Third");
   printf("\r\nNow Removing The Third Station\r\n\r\n");
   for(idex=0; idex<MaxStations; idex++)
    {
     if(strcmp(Station.Relay[idex].ServName,DelStr) != 0) continue;
     for(jdex=idex; jdex<MaxStations; jdex++) Station.Relay[jdex]=Station.Relay[jdex+1];
    }
   printf("Showing Revised Station List\r\n\r\n");
   ShowStations();
  }
}

Essentially, the program contains a simulated list of WiFi stations, including their Names, MAC addresses, and (X,Y,Z) coordinates. Upon startup, the Program lists all six stations. It then eliminates the Third station, moves the remaining ones up the list, then displays the revised list. It then loops around and does the whole thing again.
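For comparison, the remove-and-shift step can also be written with a single memmove instead of one struct assignment per element. The sketch below uses a hypothetical cut-down record (`station`) and hypothetical helpers (`remove_station`, `find_station`), not the structures from NewTest.c. Whether this form behaves any differently on the Propeller was not tested in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cut-down record; the struct in NewTest.c has more fields. */
typedef struct {
    char  name[16];
    short x, y, z;
} station;

/*
 * Remove entry del from the first count entries of list by sliding the
 * tail up one slot with a single memmove, then clearing the vacated
 * last slot. Returns the new count.
 */
size_t remove_station(station *list, size_t count, size_t del)
{
    assert(list != NULL && del < count);
    memmove(&list[del], &list[del + 1], (count - del - 1) * sizeof list[0]);
    memset(&list[count - 1], 0, sizeof list[0]);
    return count - 1;
}

/* Return the index of the entry whose name matches, or count if absent. */
size_t find_station(const station *list, size_t count, const char *name)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (strcmp(list[i].name, name) == 0)
            return i;
    return count;
}
```

Calling `remove_station(list, n, find_station(list, n, "Third"))` on a list that contains "Third" leaves the later entries one slot higher and an empty record at the end.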

After doing the proper build_utilities configuration, compiling, and executing the code for each of these settings:

catalina newtest.c -lc -lma -C TTY -C FLIP -C DUALSRAM -C SMALL

catalina newtest.c -lc -lma -C TTY -C FLIP -C DUALSRAM -C SMALL -C CACHED_1K
catalina newtest.c -lc -lma -C TTY -C FLIP -C DUALSRAM -C SMALL -C CACHED_2K
catalina newtest.c -lc -lma -C TTY -C FLIP -C DUALSRAM -C SMALL -C CACHED_4K
catalina newtest.c -lc -lma -C TTY -C FLIP -C DUALSRAM -C SMALL -C CACHED_8K

catalina newtest.c -lc -lma -C TTY -C FLIP -C DUALSRAM -C LARGE -C CACHED_1K
catalina newtest.c -lc -lma -C TTY -C FLIP -C DUALSRAM -C LARGE -C CACHED_2K
catalina newtest.c -lc -lma -C TTY -C FLIP -C DUALSRAM -C LARGE -C CACHED_4K
catalina newtest.c -lc -lma -C TTY -C FLIP -C DUALSRAM -C LARGE -C CACHED_8K

The program properly displays the following for each of the above:

Note above that running the program in XMM SMALL without the cache worked just as well as the settings that used the cache.

Now, running build_utilities and configuring it to run without the cache, compiling, and executing the code using this setting:

catalina newtest.c -lc -lma -C TTY -C FLIP -C DUALSRAM -C LARGE

Produces this result:

The code apparently locks up during the portion where the stations are moved up in the list after the Third station has been removed.

I've been testing various programs using XMM LARGE without cache and haven't encountered this problem before.

Could there still be a problem within the driver even though it passed the MULT_LONG, MULT_WORD, and MULT_BYTE settings when running ramtest?

Is there some difference between the XMM SMALL Kernel and the XMM LARGE Kernel that could account for this behavior? It doesn't seem likely to me.

It's strange because if there was a driver issue then why did the program run at all in XMM LARGE mode and run perfectly without the lockup in XMM SMALL? Right now it's a mystery.

RossH · 2023-07-10 06:42

@Wingineer19 said:

Could there still be a problem within the driver even though it passed the MULT_LONG, MULT_WORD, and MULT_BYTE settings when running ramtest?

Yes, there could - the ram test program is far from exhaustive.

Is there some difference between the XMM SMALL Kernel and the XMM LARGE Kernel that could account for this behavior? It doesn't seem likely to me.

Yes, the kernels are different. While there could be a problem in the LARGE kernel, it is also possible that it is exposing a problem in the driver because of differences in the way it uses XMM RAM. There was a problem with copying structs that I found and fixed in Catalina 5.9.1, but my notes say it affected only the Propeller 2. I will review that code and see if it also affects the Propeller 1.

It's strange because if there was a driver issue then why did the program run at all in XMM LARGE mode and run perfectly without the lockup in XMM SMALL? Right now it's a mystery.

I will try running your program on a known good XMM platform. But I won't get a chance till later this week - I'm flat out at the moment (sadly, on non-Propeller stuff).

Ross.

Wingineer19 · 2023-07-10 21:40

Hi RossH,

I understand.

Here in the States, the only reason why I'm inside working on this code is because it's too blistering hot to work outside. I've got lots to do out there but everything is on hold right now. I can't get anything done in my WorkShop either because its old air conditioner can't keep the place cool. But I'm confident this heat wave will be over by December.

Looking back over the test program code I posted above, as well as my driver, I'm still mystified.

The program works with XMM SMALL without cache or with any choice of cache.

It works fine with XMM LARGE with any choice of cache, but not without cache.

Well let me qualify the previous statement: It was working fine using XMM LARGE without cache until it hit the portion of the program where I'm moving things around within the union. Then the program locked up and required a Prop reset.

Maybe there's something lurking in Catalina that affects unions within the Prop1 while running XMM LARGE as you discovered within the Prop2 regarding structs as mentioned above. If not, the mystery continues.

Until the problem is resolved, I'll switch over to XMM SMALL without cache so I can continue working on my code, but I really need XMM LARGE because I have several routines that need as much HubRam memory as possible to run CMM code using the Multi Memory Model feature.

RossH · 2023-07-10 23:31

Hello @Wingineer19

Just did a quick test on my C3, and I see similar (but not identical) behavior to what you see on your platform. So it definitely looks like a bug in Catalina related to copying structs in XMM RAM. This is where I found a problem in the Propeller 2 kernel, so it may indeed be that the Propeller 1 kernel has a similar problem and I hadn't realized it. This problem is hidden by using the cache, and using the cache is required on most XMM boards, so this bug went unnoticed.

If this is the case, it should be easy to find and fix. I have to be out for most of today, but will check that when I get back.

However, note that on the C3 I also have problems in XMM SMALL mode - so until I find the problem, I'd recommend using the cache!

Ross.

RossH · 2023-07-11 11:49

Hello @Wingineer19

I believe I have found the problem in the Propeller 1 XMM kernel - it turns out I was wrong in thinking that the kernel would not be affected if the XMM access routines change the value of XMM_Len. Doing so causes exactly the kind of problem you are seeing in LARGE mode. The cached versions of the XMM access routines do not change XMM_Len, which is why the problem is hidden when the cache is used.
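The hazard can be modelled in plain C. The sketch below is not the Catalina kernel code. It assumes a caller that sets a shared length once and then makes several calls, with `xmm_len` standing in for XMM_Len and `sim_ram` standing in for XMM RAM. All function names are hypothetical. The short copy it produces is only there to make the effect observable; the failure reported in this thread was a lockup.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for XMM RAM and the shared length register (both assumptions). */
uint8_t  sim_ram[64];
uint32_t xmm_len;                       /* models XMM_Len */

/* Access routine that leaves xmm_len untouched. */
void write_preserving(uint32_t dst, const uint8_t *src)
{
    uint32_t n;

    for (n = 0; n < xmm_len; n++)
        sim_ram[dst + n] = src[n];
}

/* Access routine that counts xmm_len down to zero as it works. */
void write_clobbering(uint32_t dst, const uint8_t *src)
{
    while (xmm_len > 0) {
        sim_ram[dst++] = *src++;
        xmm_len--;
    }
}

typedef void (*write_fn)(uint32_t dst, const uint8_t *src);

/*
 * Caller that sets the length once and assumes it survives every call.
 * Returns how many of the blocks * block_len bytes arrived correctly.
 */
uint32_t copy_blocks(write_fn write, const uint8_t *src,
                     uint32_t blocks, uint32_t block_len)
{
    uint32_t i, good = 0, total = blocks * block_len;

    assert(total <= sizeof sim_ram);
    memset(sim_ram, 0, sizeof sim_ram);
    xmm_len = block_len;                /* set once, never reloaded */
    for (i = 0; i < blocks; i++)
        write(i * block_len, src + i * block_len);
    for (i = 0; i < total; i++)
        if (sim_ram[i] == src[i])
            good++;
    return good;
}
```

With a non-zero source buffer, the preserving routine delivers every block, while the clobbering routine delivers only the first because the length is already zero on the second call. Reloading `xmm_len` before each call, as the revised ram test does, removes the difference.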

I can fix it, but my current fix makes the kernel slightly larger, so I will try to find a better solution before I release it.

Ross.
