Revisiting XMM Mode Using Modern Memory Chips

Wingineer19 · 2019-06-19 23:00

Yes, that chip is what I installed on my Propeller Project USB Board.

I used this pinout:

P27 - "D3" = SIO3 = (HOLD On SRAM) ( HOLD#/RESET# On Flash)
P26 - "D2" = SIO2 = (DNU On SRAM) (WP# On Flash)
P25 - "D1" = SIO1 = SO On Both SRAM And Flash
P24 - "D0" = SIO0 = SI On Both SRAM And Flash
P23 - SCLK (Common To SRAM and FLASH)
P22 - RAMCS (Selects SRAM)
P21 - FLCS (Selects FLASH)
P20 - SDCS (Selects SD Card)

Which probably should look like this instead to more closely match the PMC layout:

P27 - SCLK (Common To SRAM and FLASH)
P26 - RAMCS (Selects SRAM)
P25 - FLCS (Selects FLASH)
P24 - SDCS (Selects SD Card)
P23 - "D3" = SIO3 = (HOLD On SRAM) ( HOLD#/RESET# On Flash)
P22 - "D2" = SIO2 = (DNU On SRAM) (WP# On Flash)
P21 - "D1" = SIO1 = SO On Both SRAM And Flash
P20 - "D0" = SIO0 = SI On Both SRAM And Flash

The XMM driver under Catalina allowed me to change the PMC pinouts to the first arrangement shown above.

I used the XMM Memory Test Utility included with the Catalina compiler. It passed the Trivial XMM Memory Test but failed the Complex XMM Memory Test.

So, I'm going to change it to the second arrangement above as recommended by @RossH and try it again. Unfortunately I probably won't be able to get to it until next week though.

I'm also looking over the XMM API code and thinking about making some tweaks to it as well, assuming that the revised pinout arrangement doesn't work.

This project is moving along, albeit much more slowly than I would like...

David Betz · 2019-06-20 01:06

That chip should be pretty easy to get working. Where did you buy yours? Are they available in small quantities?

yeti · 2019-06-20 01:18

So far I see them here (€urope) only via mouser.de for 4.45€ per chip plus 20.00€ S&H (and currently not even being available). :-(

If someone sees a cheaper and/or faster source for them, please mention it.

David Betz · 2019-06-20 01:22

Is this chip faster than the SRAM chip that is on the PMC?

yeti · 2019-06-20 01:25

David Betz wrote: »

Is this chip faster than the SRAM chip that is on the PMC?

The datasheet mentions 30 and 45 MHz variants.

David Betz · 2019-06-20 01:27

yeti wrote: »

David Betz wrote: »

Is this chip faster than the SRAM chip that is on the PMC?

The datasheet mentions 30 and 45 MHz variants.

Yeah but the Propeller probably can't drive it that fast anyway.

jmg · 2019-06-20 01:33

David Betz wrote: »

yeti wrote: »

David Betz wrote: »

Is this chip faster than the SRAM chip that is on the PMC?

The datasheet mentions 30 and 45 MHz variants.

Yeah but the Propeller probably can't drive it that fast anyway.

There is one usage mode where you might get close : if you use 1 or 2 of these parts, for direct video streaming, and P1 provides the pixel clock, in gated bursts.
A blanking buffer allows the P1 to load the RAMs during the flyback times, which it does at whatever speed SW can manage.

Wingineer19 · 2019-06-20 02:23

@"David Betz",
I bought 5 of them from Mouser last month here in the States when they still had hundreds in inventory:
https://www.mouser.com/ProductDetail/ISSI/IS62WVS5128GBLL-45NLI?qs=/ha2pyFaduiECgjpBtghA0mW54CYEHGamCnWs5RVIb7SQZYtPxUDGCG%2BIbKMUbcx

I also bought 5 of these 4MB flash chips from Mouser. Looks like they still have over 1000 in stock:
https://www.mouser.com/ProductDetail/ISSI/IS25LP032D-JNLA3?qs=sGAEpiMZZMtI%2BQ06EiAoG%2BAe0qJpJIMLFjUhKemt/SI=

What I found really intriguing was this 8MB SRAM part:
https://lcsc.com/product-detail/RAM_Lyontek-Inc-LY68L6400SLIT_C261881.html
But I see no vendor on this side of the pond selling them yet...

Cluso99 · 2019-06-20 04:49

I am curious to know what else you have hooked to the P1? You mentioned GPS. I presume that’s just a serial in to the P1?

yeti · 2019-06-20 07:58

Wingineer19 wrote: »

What I found really intriguing was this 8MB SRAM part:
https://lcsc.com/product-detail/RAM_Lyontek-Inc-LY68L6400SLIT_C261881.html
But I see no vendor on this side of the pond selling them yet...

It looks like I can get them here within 2 to 3 weeks and much cheaper than the 512k*8 ones.

Wingineer19 · 2019-06-22 00:18

XMM Memory Status Update: Getting real close to having Catalina work with the 512KB SPI SRAM installed on my Propeller Project USB Board.

Bottom line? Very close but not quite there yet as shown in the next post.

Of course once it does, then the next task will be to get the 4MB SPI flash memory to work with it as well...

Wingineer19 · 2019-06-22 00:39

@RossH,

Here's a Status Report for this 512KB SPI/SQI SRAM integration onto the Propeller Project USB Board:

I modified the pinouts as you suggested to more closely parallel the PMC layout:

P27 - SCLK (Common To SRAM and FLASH)
P26 - RAMCS (Selects SRAM)
P25 - FLCS (Selects FLASH)
P24 - SDCS (Selects SD Card)
P23 - "D3" = SIO3 = (HOLD On SRAM) ( HOLD#/RESET# On Flash)
P22 - "D2" = SIO2 = (DNU On SRAM) (WP# On Flash)
P21 - "D1" = SIO1 = SO On Both SRAM And Flash
P20 - "D0" = SIO0 = SI On Both SRAM And Flash

I then modified the Custom_XMM_DEF.Inc file to look like this:

{
'--------------------------- Custom XMM Definitions ----------------------------
}
'============================= XMM DEFINITIONS =================================
'
' XMM Base Address. Catalina currently requires one contiguous block 
' of XMM RAM - Note that this is the internal hardware address, not 
' the address the Catalina XMM Kernel uses:
'
XMM_BASE_ADDRESS = $0000_0000   ' XMM addressing from 0
'
' XMM RW & RO Base Addresses - Note that these are the addresses used
' by the Catalina XMM Kernel - they typically start AFTER the last
' Hub address:
'
XMM_RW_BASE_ADDRESS = $0000_8000 ' Read-Write Base address
XMM_RO_BASE_ADDRESS = $0008_8000 ' Read-Only Base address
'
' XMM RW & RO Sizes (in bytes):
'
XMM_RW_SIZE = $0008_0000         ' Read-Write Size
XMM_RO_SIZE = $0040_0000         ' Read-Only Size
'
' This value determines the size of the cache index:
'
CACHE_INDEX_LOG2 = 7         ' log2 of entries in cache index
'
'
' CUSTOM QUAD SPI RAM Definitions (NOTE you also need to set the symbols 
' defined below these pin definitions to appropriate values): 
'
QSPI_VDD    = -1                ' PIN (PMC) - see below to enable
QSPI_VSS    = -1                ' PIN (PMC) - see below to enable
'
QSPI_CLK    = 27                ' PIN (PMC) Common Clock
QSPI_SCEN   = 26                ' PIN (PMC) SRAM Chip Enable
QSPI_FCEN   = 25                ' PIN (PMC) FLASH Chip Enable
QSPI_SDCEN  = 24                ' PIN (PMC) SD Card Chip Enable
'
QSPI_SIO3   = 23                ' PIN (PMC) \
QSPI_SIO2   = 22                ' PIN (PMC)  | MUST BE CONTIGUOUS
QSPI_SIO1   = 21                ' PIN (PMC)  | AND IN THIS ORDER 
QSPI_SIO0   = 20                ' PIN (PMC) /
'
SSPI_SI     = QSPI_SIO0         ' PIN (PMC)
SSPI_SO     = QSPI_SIO1         ' PIN (PMC)
'
XMM_DEBUG_PIN = 11              ' PIN (PMC) Used only for debugging
'
'
' Define this to enable applying power to the QSPI_VDD & QSPI_VSS Pins 
' (i.e. if they are connected to Propeller pins, and not directly to the
' appropriate power rails):
'
'#ifndef QUAD_POWER_PINS
'#define QUAD_POWER_PINS
'#endif
'
' Since Homespun/Openspin have no general "#if" capability, we cannot tell 
' whether or not we have to shift bits left or right to make the nibbles align
' with the pins QSPI_SIO0 .. QSPI_SIO3 when outputting data - so we have 
' to explicitly define whether or not we need to shift each nibble left 
' or right (but since the lower nibble would never have to be shifted 
' right, we only have three possibilities to worry about):
'
' Define this symbol if the lower nibble has to be 
' shifted LEFT for output (i.e. QSPI_SIO0 is > 0):
'
#ifndef QUAD_LOWER_NIBBLE_LEFT
#define QUAD_LOWER_NIBBLE_LEFT
#endif
'
' Define this symbol if the upper nibble has to be 
' shifted LEFT for output (i.e. QSPI_SIO0 is > 4):
'
#ifndef QUAD_UPPER_NIBBLE_LEFT
#define QUAD_UPPER_NIBBLE_LEFT
#endif
'
' Define this symbol if the upper nibble has to be 
' shifted RIGHT for output (i.e. QSPI_SIO0 is < 4):
'
'#ifndef QUAD_UPPER_NIBBLE_RIGHT
'#define QUAD_UPPER_NIBBLE_RIGHT
'#endif

'#ifndef QUAD_LOWER_NIBBLE_RIGHT
'#define QUAD_LOWER_NIBBLE_RIGHT
'#endif

'
'========================== END OF XMM DEFINITIONS =============================
'
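
The nibble-shift symbols in the file above follow directly from where QSPI_SIO0 sits. Here is a minimal C sketch of that arithmetic, assuming the driver sends each byte as two 4-bit transfers on SIO0..SIO3. The function names are hypothetical, and this is not Catalina's actual PASM, only an illustration of which shift each symbol selects:

```c
#include <stdint.h>

#define QSPI_SIO0 20u  /* lowest data pin, taken from the definitions above */

/* Pin-register bits for the lower nibble (byte bits 0..3) of b. */
static uint32_t lower_nibble_on_pins(uint8_t b, unsigned sio0)
{
    return (uint32_t)(b & 0x0Fu) << sio0;      /* shifted LEFT whenever sio0 > 0 */
}

/* Pin-register bits for the upper nibble (byte bits 4..7) of b. */
static uint32_t upper_nibble_on_pins(uint8_t b, unsigned sio0)
{
    uint32_t n = b & 0xF0u;
    if (sio0 > 4u) return n << (sio0 - 4u);    /* the QUAD_UPPER_NIBBLE_LEFT case  */
    if (sio0 < 4u) return n >> (4u - sio0);    /* the QUAD_UPPER_NIBBLE_RIGHT case */
    return n;                                  /* sio0 == 4: already aligned       */
}
```

With QSPI_SIO0 = 20, both nibbles have to move left to land on P20..P23, which is consistent with QUAD_LOWER_NIBBLE_LEFT and QUAD_UPPER_NIBBLE_LEFT being the two symbols left enabled.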

Then I modified the Custom_XMM.inc file and essentially copied the PMC API calls into every section that said "Insert Code Here". Essentially a one-for-one copy of the PMC code. It's way too long to post the text here, so I've included it as an attachment below.

Then I ran the Build RAM Test batch file: build_ram_test CUSTOM CACHED_1K

It created the output files without any problems or errors.

Then I ran the loader: payload RAM_Test_PC -I

Result of TRIV Test:

And the result of the CMPX Test:

Awesome! Every location within the 512KB SRAM was written to and read back successfully.

Next, I compiled this simple program to run under CMM:

// Program Is Hello.c

#include <stdio.h>

void main(void)
{
 short Times=0;
 for(;;) printf("Printed Hello Word %d Times\n",Times++);
}

Catalina CMM compile results:

Build started on: 21-06-2019 at 17:03.51
Build ended on: 21-06-2019 at 17:03.54
-------------- Build: default in Hello (compiler: Catalina C Compiler)---------------
catalina.exe -CPC -CCOMPACT -Clibc -CCUSTOM -IC:\Programs\Compiler\Catalina\include -IZ:\WorkCode\Catalina\Hello -c Hello.c -o .objs\Hello.obj
Catalina Compiler 3.13
catalina.exe -o Hello.binary .objs\Hello.obj -CPC -CCOMPACT -lc -CCUSTOM
Catalina Compiler 3.13
code = 7160 bytes
cnst = 160 bytes
init = 204 bytes
data = 940 bytes
file = 15920 bytes
Output file is Hello.binary with size 15.55 KB
Process terminated with status 0 (0 minute(s), 2 second(s))
0 error(s), 0 warning(s) (0 minute(s), 2 second(s))

After uploading and running this program here is the result:

Then I recompiled it to run under the XMM SMALL Memory Model:

Build started on: 21-06-2019 at 17:14.21
Build ended on: 21-06-2019 at 17:14.24
-------------- Build: default in Hello (compiler: Catalina C Compiler)---------------
catalina.exe -CPC -CCACHED_1K -CSMALL -Clibc -CCUSTOM -IC:\Programs\Compiler\Catalina\include -IZ:\WorkCode\Catalina\Hello -c Hello.c -o .objs\Hello.obj
Catalina Compiler 3.13
catalina.exe -o Hello.binary .objs\Hello.obj -CPC -CCACHED_1K -CSMALL -lc -CCUSTOM
code = 13348 bytes
cnst = 160 bytes
init = 208 bytes
data = 940 bytes
Catalina Compiler 3.13
file = 47504 bytes
Output file is Hello.binary with size 46.39 KB
Process terminated with status 0 (0 minute(s), 2 second(s))
0 error(s), 0 warning(s) (0 minute(s), 2 second(s))

But when I chose Download to XMM RAM and interact I got this:

Zip. Zero. Nada. I tried every CACHE option under SMALL memory model and recompiled. Same result: Nothing.

I then recompiled using the XMM LARGE Memory Model:

Build started on: 21-06-2019 at 17:15.56
Build ended on: 21-06-2019 at 17:15.59
-------------- Build: default in Hello (compiler: Catalina C Compiler)---------------
catalina.exe -CPC -CCACHED_1K -CLARGE -Clibc -CCUSTOM -IC:\Programs\Compiler\Catalina\include -IZ:\WorkCode\Catalina\Hello -c Hello.c -o .objs\Hello.obj
Catalina Compiler 3.13
catalina.exe -o Hello.binary .objs\Hello.obj -CPC -CCACHED_1K -CLARGE -lc -CCUSTOM
code = 14572 bytes
cnst = 160 bytes
init = 208 bytes
data = 940 bytes
file = 49168 bytes
Catalina Compiler 3.13
Output file is Hello.binary with size 48.02 KB
Process terminated with status 0 (0 minute(s), 3 second(s))
0 error(s), 0 warning(s) (0 minute(s), 3 second(s))

When I chose Download to XMM RAM and interact I still got:

I also tried recompiling using every CACHE option on the XMM LARGE Memory Model with the same result: Nothing.

This is puzzling considering that it passed the XMM Memory Test.

Thoughts?

RossH · 2019-06-22 04:13

Hello Wingineer19

Did you recompile the XMM loader to use your new working XMM code? Go to the utilities folder and use the command

build_utilities

Or, select the "Build XMM Utilities" option from the Code::Blocks tools menu.

(Edited: the build_all script in the utilities folder is deprecated!)

Wingineer19 · 2019-06-22 05:54

@RossH,

Oops, I forgot to do that.

I just ran the build_utilities and told it to create the loader for SRAM.

Upon completion, I did the "Download to XMM RAM and interact" and when it finished I got the scrolling output just like I did for CMM.

I'm very impressed. It was amazing to watch it work. This is a MAJOR step forward thanks to you.

My next step will be to re-configure it to load to the XMM Flash and try that out. But that will have to wait for another day as it's very, very late here. Time to call it quits for today.

Once again, thanks for your help on this.

tritonium · 2019-06-22 11:17

Hi

That's brilliant!
I wonder- how fast does it run?

Dave

Wingineer19 · 2019-06-22 16:24

@tritonium,

There was no noticeable speed difference running this program in XMM compared to CMM.

But it was simply printing out the string repeatedly via the Prop Serial Console port running at 115.2K baud, so I wasn't really expecting much visible difference there anyway.

To see the actual speed difference I would need to modify my code to read the system counter and then display the result.
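
A rough sketch of the arithmetic such a measurement would need, assuming an 80 MHz system clock and a free-running 32-bit counter. How the counter is actually read depends on the toolchain and is not shown; the helper names below are hypothetical:

```c
#include <stdint.h>

#define CLOCK_HZ 80000000u  /* assumed 80 MHz system clock */

/* Ticks between two readings of a free-running 32-bit counter.
   Unsigned subtraction stays correct across a single wraparound. */
static uint32_t ticks_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a tick count to whole microseconds at CLOCK_HZ. */
static uint32_t ticks_to_us(uint32_t ticks)
{
    return (uint32_t)(((uint64_t)ticks * 1000000u) / CLOCK_HZ);
}
```

At 80 MHz a 32-bit counter wraps roughly every 53.7 seconds, so a single start/end pair is only meaningful for intervals shorter than that.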

But the big challenge over the next week or so is to get the code stored into flash memory and then have it execute upon system power up. I have a feeling getting the flash configuration to work won't be as easy as the SRAM...

Wingineer19 · 2019-06-22 17:56

Cluso99 wrote: »

I am curious to know what else you have hooked to the P1? You mentioned GPS. I presume that’s just a serial in to the P1?

Correct, using this uBlox GPS receiver:
https://www.sparkfun.com/products/15193

Along with a Wifi radio:
https://www.amazon.com/Sunhokey-Wireless-Transmission-External-Interface/dp/B07BRL4W9M/ref=sr_1_14?keywords=esp8285&qid=1561221297&s=gateway&sr=8-14

And rounding it up by controlling two bi-polar stepper motors. To that point, I have a couple of these boards in my possession:
https://www.amazon.com/Controller-DROK-H-Bridge-Brushed-Regulator/dp/B078TFLD7Q/ref=sr_1_1_sspa?keywords=drok+motor+controller&qid=1561221453&s=gateway&sr=8-1-spons&psc=1

This board has two outputs and was designed to control two brushed DC motors. Why not repurpose it and have each output control a coil on a bipolar stepper motor instead? One controller board per stepper motor required. I've already written the code to do this but haven't tested it yet.
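
For illustration only (this is not the untested code mentioned above), here is a minimal sketch of a two-phase-on full-step sequence, assuming each H-bridge output drives one coil and its polarity can be reversed. The type and function names are hypothetical, and the mapping from polarity to the board's control pins is left out because it depends on the board:

```c
#include <stdint.h>

/* Polarity of each coil: +1 = forward current, -1 = reversed. */
typedef struct { int8_t coil_a; int8_t coil_b; } step_phase;

/* Two-phase-on full stepping: four phases per electrical cycle,
   and exactly one coil changes polarity between neighbouring phases. */
static const step_phase FULL_STEP[4] = {
    { +1, +1 }, { -1, +1 }, { -1, -1 }, { +1, -1 }
};

/* Advance the phase index by one step; dir > 0 is forward, otherwise reverse. */
static unsigned next_phase(unsigned phase, int dir)
{
    return (phase + (dir > 0 ? 1u : 3u)) & 3u;
}
```

Stepping forward walks the table 0, 1, 2, 3, 0, and stepping in reverse walks it backwards.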

On another topic, I do have concerns about losing the ability to start up new cogs while running in Catalina XMM mode.

As it stands now, I'm using the 4-port Serial driver for the serial ports:

Port0: Prop1 Console (Yes, I replaced the normal HMI driver with this one)
Port1: GPS Receiver
Port2: Wifi Radio
Port3: Unused

In Catalina CMM mode I started up a new cog to manage all of the I/O traffic among these three ports, unpack received data, and package up new data ready for transmit, etc, etc.

On the Prop2, with its generous 512KB of RAM on board, losing the ability to start new cogs isn't an issue because code can be executed natively due to HUBEXEC capability and thus no LMM type of virtual machine will be required.

But with XMM running on the Prop1, it is because the new cog won't be able to execute out of XMM memory, and I don't think there's a feature to enable it to execute code from Hub RAM, either.

I might be able to work around this limitation by having the traffic management function placed within a loop in the main() part of the program...

Or maybe I can create some type of plug-in that can perform this traffic management function and include it during the "build_utilities" process...

If what I want to do won't work on the Prop1, then I'll move over to the Prop2 when the project boards are released. No biggie.

In the meantime, I'm having fun and learning a lot about the propeller in the process.

Right now I need to get the XMM Flash capability working so I can continue code development on the Prop1...

Cluso99 · 2019-06-22 22:04

"On the Prop2, with its generous 512KB of RAM on board, losing the ability to start new cogs isn't an issue because code can be executed natively due to HUBEXEC capability and thus no LMM type of virtual machine will be required."

I don't understand the "losing the ability to start new cogs" part. I presume this is just a current limitation of Catalina as it's certainly not a P2 problem???

Wingineer19 · 2019-06-22 23:23

Cluso99 wrote: »

"On the Prop2, with its generous 512KB of RAM on board, losing the ability to start new cogs isn't an issue because code can be executed natively due to HUBEXEC capability and thus no LMM type of virtual machine will be required."

I don't understand the "losing the ability to start new cogs" part. I presume this is just a current limitation of Catalina as it's certainly not a P2 problem???

Correct, it is a limitation of the Catalina compiler at this time when running the Prop1 in XMM mode and not the Prop2.

With 512KB of HubRAM, the Prop2 doesn't need an XMM memory driver (at least for most users I presume).

Of course this cogstart limitation doesn't exist with Catalina when running in CMM or LMM mode.

I saw somewhere that the PropGCC compiler does allow the startup of new cogs while in XMM mode, and the user could designate that new cog functions could be executed within HubRAM.

But I'm not working with the PropGCC at this time because I don't think it has the necessary drivers to work with my 512KB SRAM and 4MB Flash SPI chips. If there was a tutorial somewhere on how to write a memory driver I might try to write one myself if at all possible.

RossH · 2019-06-23 00:10

Wingineer19 wrote: »

Cluso99 wrote: »

"On the Prop2, with its generous 512KB of RAM on board, losing the ability to start new cogs isn't an issue because code can be executed natively due to HUBEXEC capability and thus no LMM type of virtual machine will be required."

I don't understand the "losing the ability to start new cogs" part. I presume this is just a current limitation of Catalina as it's certainly not a P2 problem???

Correct, it is a limitation of the Catalina compiler at this time when running the Prop1 in XMM mode and not the Prop2.

You can certainly start new Spin or PASM cogs while in XMM mode. I presume you mean that you cannot start multiple cogs running Catalina code while using XMM. This is true. The XMM cache cannot support access from multiple cogs. The overhead of making it able to do so never seemed worth the effort.

Wingineer19 · 2019-06-23 03:11

RossH wrote: »

You can certainly start new Spin or PASM cogs while in XMM mode. I presume you mean that you cannot start multiple cogs running Catalina code while using XMM. This is true. The XMM cache cannot support access from multiple cogs. The overhead of making it able to do so never seemed worth the effort.

Yes, I was referring to running C code in the other cogs while the Prop1 is operating in XMM mode.

It didn't really dawn on me until now that the other cogs can be started assuming they are going to run Spin or PASM code.

I'm assuming right now that I need another cog to manage the I/O traffic to/from both the GPS receiver and the radio. It would need to get/send data from/to the cog running the Quad Serial Port driver, then populate various data structures in Hub RAM for use by the cog running the main() C code. I've written code to do this in C using structures and unions. I wonder if I could translate this into Spin or PASM and have another cog execute it...

I couldn't imagine the nightmare involved in trying to manage XMM memory among all 8 cogs.

Not being constrained by the facts of knowing all the nuances yet within the propeller, I'm free to speculate, so I guess I would attempt something like this:

1. Assign one cog as an XMM memory manager exclusively. It would allow the other cogs to access XMM memory on a round-robin fashion similar to how the HubRAM manager works.

A mailbox scheme would have to be configured in HubRAM for each cog to communicate with the XMM management cog. Memory read/write requests as well as the data bytes/words/long words would have to be routed through HubRAM using this scheme.

Each cog wanting access to XMM memory would have to wait its turn, and combined with the time the XMM manager requires to toggle the various I/O pins in order to perform the read/write operation, then upload/download this information into HubRAM, it seems overall performance of each cog would be well south of 5 MIPS.

2. The above assumes a fast external memory access design. Parallel would be ideal. Quad SPI might provide decent throughput. Standard serial SPI would probably be a disaster.

3. Throw in a caching arrangement and I think the above scheme gets trashed and completely unworkable. So caching would have to be off the table. Access to XMM memory by each cog would have to be done upon demand on a request/response basis to the XMM memory management cog.

4. Each cog would have to be assigned a particular block of memory in the XMM in which to execute by the compiler. Again, fetching the instructions and data within this block would require requests routed through the XMM memory management cog.

5. Theoretically a block within XMM could be configured as common memory for global variables and maybe a stack pointer for each cog, but it would be faster to do this in HubRAM.

6. So I think what I'm trying to say is that each cog would have to be running in LMM mode, but instead of directly reading the next instruction from HubRAM like it currently does via

rdlong instr,pc

it would have to place a mailbox request in HubRAM to have the XMM Memory cog fetch it instead. And then wait its turn for a response. Right there we can see this is much slower than what LMM does now with direct access to HubRAM.

7. I suppose if the microcode within the propeller itself was rewritten to perform the external read/write operation the process would be much smoother and faster, and access to external memory would appear transparent to the cogs (but still slower than HubRAM access). But then you would have to call this thing a microprocessor and not a microcontroller I reckon.

This is just speculation on the very rudimentary aspects of what would need to get done. So yeah, doing this would be very involved.
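
The mailbox idea in points 1 and 6 can be sketched as a host-side simulation. Everything here is an assumption for illustration: the names are hypothetical, a plain array stands in for the external SRAM and its SPI transfers, and this is not how Catalina or PropGCC actually implement XMM access:

```c
#include <stdint.h>
#include <string.h>

#define NUM_COGS 8
#define XMM_SIZE 1024u              /* tiny stand-in for the external SRAM */

enum { MB_IDLE = 0, MB_READ = 1, MB_WRITE = 2 };

/* One mailbox per client cog; on real hardware these would sit in Hub RAM. */
typedef struct {
    volatile uint32_t cmd;          /* MB_IDLE when free or completed     */
    uint32_t addr;                  /* byte address of a long in XMM      */
    uint32_t data;                  /* value to write, or value read back */
} xmm_mailbox;

static xmm_mailbox mailbox[NUM_COGS];
static uint8_t xmm[XMM_SIZE];       /* stands in for the SPI transfer     */

/* Client side: post a request. A real client would then spin until
   cmd returns to MB_IDLE before using the result. */
static void xmm_post(int cog, uint32_t cmd, uint32_t addr, uint32_t data)
{
    mailbox[cog].addr = addr;
    mailbox[cog].data = data;
    mailbox[cog].cmd  = cmd;        /* written last: publishes the request */
}

/* Manager side: one round-robin pass over all mailboxes.
   Returns the number of requests completed in this pass. */
static int xmm_service_pass(void)
{
    int served = 0;
    for (int cog = 0; cog < NUM_COGS; cog++) {
        xmm_mailbox *mb = &mailbox[cog];
        uint32_t cmd = mb->cmd;
        if (cmd == MB_IDLE) continue;
        if (mb->addr <= XMM_SIZE - 4u) {          /* ignore out-of-range requests */
            if (cmd == MB_READ)  memcpy(&mb->data, &xmm[mb->addr], 4);
            if (cmd == MB_WRITE) memcpy(&xmm[mb->addr], &mb->data, 4);
        }
        mb->cmd = MB_IDLE;                        /* signal completion */
        served++;
    }
    return served;
}
```

Even in this toy form the cost is visible: every instruction fetch becomes a post, a wait for the manager to reach that mailbox, and a read of the result, instead of a single hub read.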

I'm glad I'm not the one who would have to do it.

Hats off to the PropGCC developers for getting this done.

I'm curious what the throughput penalty would be for having multiple cogs accessing the XMM memory. Seems to me the more cogs needing access, the slower the overall throughput would be to the point of a totally diminished return...

Can we say hello to the Prop2?

RossH · 2019-06-23 03:57

Wingineer19 wrote: »

Hats off to the PropGCC developers for getting this done.

Indeed. This was not the case last time I looked at PropGCC - mind you, that's quite a few years ago now!

Has anyone actually used this feature? How is the performance?

Wingineer19 · 2019-06-23 04:25

David Betz wrote: »

One problem with PropGCC XMM mode is that the interface to the "cache drivers" changed between the version of GCC that is distributed with SimpleIDE and the one that is in the main branch of the current PropGCC repository. Parallax asked that we make it possible to run XMM code in more than one COG at the same time and that required some refactoring. However, they never released that new version of PropGCC and then later dropped XMM entirely.

@"David Betz" ,

So did PropGCC actually reach the point of supporting XMM in multiple cogs or am I reading this wrong?

If it does indeed support XMM in multiple cogs, were any benchmarks done to gauge its overall performance with multiple cogs running?

It's been said that a single cog running in LMM mode can do close to 5 MIPS as opposed to one running self-contained PASM code at 20 MIPS.
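
Those two figures can be reproduced with simple arithmetic, assuming an 80 MHz clock, 4 clocks per typical instruction, and a hub access slot for each cog once every 16 clocks. The function names are hypothetical, and the LMM figure is an upper bound rather than a measurement:

```c
#include <stdint.h>

#define CLOCK_HZ          80000000u  /* assumed 80 MHz system clock                */
#define CLOCKS_PER_INSTR  4u         /* typical Prop1 instruction time             */
#define HUB_WINDOW_CLOCKS 16u        /* each cog gets a hub slot every 16 clocks   */

/* Native PASM in cog RAM: one instruction every 4 clocks. */
static uint32_t pasm_ips(void)
{
    return CLOCK_HZ / CLOCKS_PER_INSTR;
}

/* LMM: every instruction costs one hub read, so the rate is bounded by
   how many hub windows go by per instruction executed. */
static uint32_t lmm_ips(unsigned windows_per_instr)
{
    return CLOCK_HZ / (HUB_WINDOW_CLOCKS * windows_per_instr);
}
```

Here pasm_ips() gives 20 MIPS and lmm_ips(1) gives 5 MIPS, while a fetch loop that misses every other hub window, lmm_ips(2), drops to 2.5 MIPS.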

Any idea on the performance of multiple cogs running in XMM mode?

jmg · 2019-06-23 04:51

Wingineer19 wrote: »

7. I suppose if the microcode within the propeller itself was rewritten to perform the external read/write operation the process would be much smoother and faster, and access to external memory would appear transparent to the cogs (but still slower than HubRAM access). But then you would have to call this thing a microprocessor and not a microcontroller I reckon.

This is just speculation on the very rudimentary aspects of what would need to get done. So yeah, doing this would be very involved.

P1V could be a candidate for "microcode within the propeller itself was rewritten to perform the external read/write operation" - but with P2 very close, that may make less sense ?

Wuerfel_21 · 2019-06-23 05:35

Random vaguely related ramblings follow:

I assume that multiple XMM cogs (besides being questionably useful) would run really slow. In my experience with my homegrown XMM solution, the vast majority of time is spent transferring data or waiting for the storage device (an SD card in my case, bare flash and SRAM don't have this problem as much) to rummage through itself to pull out the requested data.
Also, only one transfer can happen at a time, so in a multi-cog XMM system, each transfer may be prefaced by a rather long waitstate.

On the aforementioned SD card, empirical testing (by the rather primitive method of changing the screen border color before and after each transfer, like in ye olden days...) shows that reading a 512-byte page takes roughly 11 NTSC scanlines (1/15734 seconds each), of which the actual data transfer takes up maybe 5 or so. (Again, flash and SRAM don't have that much latency.) So, assuming no cache hits, 1430 pages (×128 instructions = 183 kIPS) can be executed per second. That sounds like a lot until you notice it's only ~23×128 instructions per frame. The theoretical maximum for SPI, assuming an average read speed of 10 MBit/s (including the time to execute the instructions), is 2441 pages per second; double that for the absolute maximum theoretical SPI burst read speed (using multiple cogs...).
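
The figures in the paragraph above can be re-derived with integer arithmetic. This is only a check of the numbers under stated assumptions: 4-byte instructions, roughly 60 frames per second, and 1 MBit taken as 10^6 bits. The function names are hypothetical:

```c
#include <stdint.h>

#define NTSC_LINE_HZ   15734u  /* scanlines per second                 */
#define LINES_PER_PAGE 11u     /* measured time to read one page       */
#define PAGE_BYTES     512u
#define INSTR_BYTES    4u      /* one 32-bit instruction               */
#define FRAMES_PER_SEC 60u     /* approximate NTSC frame rate, assumed */

/* Pages that can be fetched per second with no cache hits. */
static uint32_t pages_per_sec(void)   { return NTSC_LINE_HZ / LINES_PER_PAGE; }

/* Instructions held in one page. */
static uint32_t instr_per_page(void)  { return PAGE_BYTES / INSTR_BYTES; }

/* Instructions per second if every page is executed once. */
static uint32_t instr_per_sec(void)   { return pages_per_sec() * instr_per_page(); }

/* Whole pages fetched per video frame. */
static uint32_t pages_per_frame(void) { return pages_per_sec() / FRAMES_PER_SEC; }

/* Pages per second for a given sustained SPI bit rate. */
static uint32_t spi_pages_per_sec(uint32_t bits_per_sec)
{
    return bits_per_sec / (PAGE_BYTES * 8u);
}
```

This gives 1430 pages per second, 128 instructions per page, about 183,000 instructions per second, 23 pages per frame, and 2441 pages per second at 10 MBit/s, matching the quoted numbers.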

I believe GCC and Catalina use smaller page sizes on memories with more fine-grained access, which has the upside of making it more likely that the cache hits and the downside of requiring more, shorter individual transactions (= more overhead), more end-of-page instructions (I assume they use those, otherwise they'd only get a meager 2.5 MIPS...) and more long (aka slow) jumps instead of short jumps. Small pages do work well when using XMM memory to store scalar variables, I guess? I don't know.

Cluso99 · 2019-06-23 06:58

I have a few cogs running with Catalina on P1. Ross tried without success to have a standardised mailbox in hub to pass data between cogs and/or Catalina. It works fine.
IIRC I have FullDuplexSerial mod’d to run. I cannot recall if Ross or I got the SD/FAT32 working. I would have needed to do the raw driver as I share the pins with SRAM and EEPROM and possibly the RTC too.

Wingineer19 · 2019-06-23 16:04

Yes, the various plugins I'm experimenting with, like the 4Port Serial driver, work fine under XMM.

I do need a traffic manager to handle all the I/O traffic to/from the GPS receiver, the radio, and the stepper motors. Ideally that would be handled by its own cog, with commands, replies, and data shared with the main() function via global variables, but since the code is written in C that's not an option in XMM mode.

This project has lots of Menus available to the operator. The main() function has a loop which calls these Menu functions. Since these Menus are updated once per second on the serial console screen, having them pulled from XMM memory won't be a problem.

I only need a maximum of 10Hz data throughput for the traffic messages, so it's possible this traffic manager might work by calling its functions from within the same loop in main() that calls the Menu functions. Won't know until I try it.
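
A minimal sketch of that single-loop approach, assuming a millisecond tick source is available. The names are hypothetical and the counters stand in for the real traffic-manager and Menu functions:

```c
#include <stdint.h>

#define TICKS_PER_SEC  1000u                  /* assumed millisecond tick source */
#define TRAFFIC_PERIOD (TICKS_PER_SEC / 10u)  /* 10 Hz traffic handling          */
#define MENU_PERIOD    TICKS_PER_SEC          /* 1 Hz menu refresh               */

typedef struct {
    uint32_t next_traffic;   /* tick at which traffic is next due */
    uint32_t next_menu;      /* tick at which the menu is next due */
    unsigned traffic_runs;
    unsigned menu_runs;
} loop_state;

/* True once 'now' has reached 'deadline'; safe across counter wraparound. */
static int due(uint32_t now, uint32_t deadline)
{
    return (int32_t)(now - deadline) >= 0;
}

/* One pass of the main() loop body. */
static void loop_once(loop_state *s, uint32_t now)
{
    if (due(now, s->next_traffic)) {
        s->traffic_runs++;                    /* service GPS / radio / steppers here */
        s->next_traffic += TRAFFIC_PERIOD;
    }
    if (due(now, s->next_menu)) {
        s->menu_runs++;                       /* redraw the current Menu here */
        s->next_menu += MENU_PERIOD;
    }
}

/* Run the loop once per tick for 'ticks' ticks, starting from tick 0. */
static loop_state simulate(uint32_t ticks)
{
    loop_state s = { 0, 0, 0, 0 };
    for (uint32_t t = 0; t < ticks; t++) loop_once(&s, t);
    return s;
}
```

Simulating one second runs the traffic handler ten times and the menu refresh once. On real hardware the open question is whether a Menu redraw pulled from XMM finishes within the 100 ms traffic period.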

Or, I suppose I can attempt to create a driver in Spin or PASM in another cog that will do these traffic management functions.

At this point this whole exercise is a hobby so I'm not under a deadline to get it done like I would be if this was a commercial project.

So I'm free to experiment and try different approaches. The more I learn about the Prop1 the better.

That being said, patience is not one of my virtues, so I would like to wrap up this project as soon as possible...

Wingineer19 · 2019-06-25 06:34

P1V could be a candidate for "microcode within the propeller itself was rewritten to perform the external read/write operation" - but with P2 very close, that may make less sense ?

Yeah, that ship has sailed, and the Prop2 is heading into port soon.

I don't know how much space is available within the Prop1 to modify the microcode, but if it could be done I would go with an SPI/SQI SRAM chip as XMM memory. It's the one option that would give a decent amount of memory but require the fewest number of pins.

The largest I2C EEPROM I've seen so far is 2Mbit, so I would use that for bootup.

The microcode would force the SPI/SQI SRAM chip into Quad Mode to get the best throughput from it. It wouldn't be as fast as an 8-bit parallel memory arrangement but hopefully good enough to be useful.

All user code would execute from this SRAM chip. The microcode would permit the cogs access to it using a round-robin approach just like HubRAM.

New instructions would have to be added to give the cogs a choice of reading from HubRAM or XMM.

With the microcode handling all of the XMM memory management, the process would be transparent to the various compilers, so there shouldn't be any problem executing C code on different cogs using this arrangement.

I suppose someone wanting to attempt this could do it with an FPGA, but at this point it's an academic exercise as the Prop2 is just a whisper away...

jmg · 2019-06-25 06:57

Wingineer19 wrote: »

I don't know how much space is available within the Prop1 to modify the microcode,...

Note the Prop1 does not have microcode, it is a full custom layout.

Wingineer19 wrote: »

I suppose someone wanting to attempt this could do it with an FPGA, but at this point it's an academic exercise as the Prop2 is just a whisper away...

Any changes to Prop1 would be via FPGAs as a proving vehicle anyway.
FPGAs continue to be interesting, as there are solutions for prices not too different to a Prop1, but they give a different mix of outcomes.
eg They can deliver more memory, but fewer COGs.
examples of simpler packages :
ICE40UP5K-SG48I comes in 7x7 QFN48, with 39io and 128kB RAM, but cannot fit 8 COGs and might not quite hit 80MHz 32b counters either.
MachXO3D-9400 (no price yet?) in 10x10 QFN72, for 58io, 54kRAM, has 200~800MHz VCO, 400MHz max SysCLKs & 133MHz MCLK.

Wingineer19 · 2019-06-25 16:09

I think for this thought experiment I would try to duplicate the Prop1 exactly, i.e. 8 cogs with 32KB of HubRAM. The only changes would be to implement the XMM memory management previously discussed, along with some new instructions to allow the cogs to access the XMM just like they do the HubRAM.

I haven't looked into the availability of FPGAs to do this, but I wonder if the ICE40UP5K-SG48I you mentioned could do all of this.

I don't have a development system to burn FPGAs anyway, so this is just a thought experiment (others might say an exercise in futility).

Even if these modifications were done and burned into an FPGA and it worked perfectly, one would still need the software tools (compilers) to exploit this new feature in user applications. I think the compiler developers have already run their course on the Prop1 and are now focused on the Prop2.

Returning back to reality, I'll continue to experiment with XMM on my Prop1 board to see what I can do with it.

Ross provided much appreciated help to get my 512KB SPI SRAM to work as XMM memory.

I now need to focus on getting the 4MB SPI Flash to work so it can load and run my code upon power up.
