Revisiting XMM Mode Using Modern Memory Chips

jmg · 2019-06-25 19:29

Wingineer19 wrote: »

I think for this thought experiment I would try to duplicate the Prop1 exactly, i.e. 8cogs with 32KB of HubRAM. The only changes would be to implement the XMM memory management previously discussed, along with some new instructions to allow the cogs to access the XMM just like they do the HubRAM.

I haven't looked into the availability of FPGAs to do this, but I wonder if the ICE40UP5K-SG48I you mentioned could do all of this.

Not quite 'all of this' - it's too small to fit 8 COGs, with something just under 2k LEs needed per COG.
See the threads on P1V around what FPGAs can fit a full P1, tho at those sizes the price and packages are somewhat above P1 region.

This one was interesting
https://forums.parallax.com/discussion/165645/p1v-with-lattice-ecp5-fpgas/p1

A check on the price: it seems to have come down very slightly, and for the I/O count you get, it's not too bad.
LFE5U-25F-6BG256I Lattice FPGA $8.69 @ 100 24000 LUT 1032192bits RAM 197 io 256-CABGA (14x14)
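
A rough fit check for the numbers above, as a sketch only. It rounds the "just under 2k" per-COG estimate up to 2000, treats LEs and LUTs as interchangeable, and assumes 5280 LUTs for the iCE40UP5K (a figure from Lattice's product table, not from this thread). Hub logic, routing and block RAM are ignored.

```python
# Rough logic-capacity check; every figure here is an estimate.
LUTS_PER_COG = 2000          # "just under 2k" per COG, rounded up
COGS = 8

DEVICES = {
    "ICE40UP5K": 5280,       # assumption: taken from the Lattice product table
    "LFE5U-25F": 24000,      # figure quoted in the post above
}

def cogs_that_fit(luts, luts_per_cog=LUTS_PER_COG):
    """Whole number of COGs that fit, ignoring hub and glue logic."""
    return luts // luts_per_cog

for name, luts in DEVICES.items():
    fit = cogs_that_fit(luts)
    verdict = "enough" if fit >= COGS else "not enough"
    print(f"{name}: room for about {fit} COGs, {verdict} for {COGS}")
```

On these assumptions the UP5K holds about two COGs, while the ECP5-25F has headroom beyond eight.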

I think someone may have had SDRAM hardware support, on a different FPGA.
This will be somewhat driven by what boards have what mix of fitted memories.
eg this board is mentioned in that thread
https://www.crowdsupply.com/tinyfpga/tinyfpga-ex
and I see an update has the mix of 3 changed a little...
https://www.crowdsupply.com/tinyfpga/tinyfpga-ex/updates/new-prototypes-and-updated-variants

The tools side of this is interesting, as there is an open-source tool flow mentioned.

Wingineer19 · 2019-06-27 05:44

@RossH,

Quick update: With the exception of attempting to use 8K cache, I've had zero problems getting the code to load and execute from SPI SRAM.

Why it didn't like the 8K cache setting, but was OK with 1K, 2K, or 4K is unknown. I've used the build_utilities application to configure for each cache option I wanted to try.

Now, I'm attempting to use the FLASH loader to have my code load from the SPI FLASH instead of the SPI SRAM.

I've run the build_utilities to configure it also for 4K cache and chose the Flash_Options to use the lserial4 library along with NO_HMI since my code uses Four Serial Ports (with Port0 used as the Console on pins 30 & 31 instead of the normal PC or TTY option).

Anyway, when I attempt to compile using the -C FLASH option I get:

Build started on: 26-06-2019 at 23:29.38
Build ended on: 26-06-2019 at 23:29.39
Build: default in MenuTest (compiler: Catalina C Compiler)
catalina.exe -y -O5 -CNO_HMI -CFLASH -CCACHED_4K -CLARGE -Clibserial4 -Clibm -Clibcx -CCUSTOM -IC:\Programs\Compiler\Catalina\include -IC:\WorkCode\Catalina\MenuTest\OneCog\25Jun19 -c MenuTest.c -o .objs\MenuTest.obj
Catalina Compiler 3.13
catalina.exe -o MenuTest.binary .objs\MenuTest.obj -y -O5 -CNO_HMI -CFLASH -CCACHED_4K -CLARGE -lserial4 -lm -lcx -CCUSTOM
Catalina_SPI_Cache.spin(1958:37) : error : Origin exceeds FIT limit
Line:
FIT 496 ' out of 496
Catalina Optimizer 3.13
Offending Item: ' out of 496
Catalina Compiler 3.13
Process terminated with status 0 (0 minute(s), 0 second(s))
1 error(s), 0 warning(s) (0 minute(s), 0 second(s))

I get this error regardless of whether I try the SMALL or the LARGE memory model (each with 4K cache).

Any idea why it's giving me a FIT limit error? I've never seen this error before...

Thanks

RossH · 2019-06-27 06:02

Wingineer19 wrote: »

Catalina_SPI_Cache.spin(1958:37) : error : Origin exceeds FIT limit

It looks like your SPI code is slightly too large to fit in the space available in the cog used for the cache. If you post (or email me) your code, I will have a look at it. I will send you my email address in a PM.

Ross.

RossH · 2019-06-28 07:27

Hello @Wingineer19

In your Custom_XMM_DEF.inc file, change the following line (line 26):

CACHE_INDEX_LOG2 = 7         ' log2 of entries in cache index

Change it to

CACHE_INDEX_LOG2 = 6         ' log2 of entries in cache index

The problem is that the XMM access code is quite large for this particular setup, so the cache size has to be reduced to accommodate it.
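
To see what that one-line change does, here is a small sketch. The number of index entries follows directly from the comment ("log2 of entries in cache index"). The space saved assumes one cog long per index entry, which is a guess for illustration and is not confirmed anywhere in this thread.

```python
COG_LONGS = 496  # limit named in the "FIT 496" error message

def index_entries(cache_index_log2):
    """Number of entries in the cache index for a given log2 setting."""
    return 1 << cache_index_log2

def longs_freed(old_log2, new_log2, longs_per_entry=1):
    # longs_per_entry=1 is an assumption made for illustration only
    return (index_entries(old_log2) - index_entries(new_log2)) * longs_per_entry

print("entries at 7:", index_entries(7))
print("entries at 6:", index_entries(6))
print("longs freed (assuming 1 long per entry):", longs_freed(7, 6))
```

Halving the index from 128 to 64 entries is what leaves room for the larger XMM access code in the cache cog.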

Ross.

Wingineer19 · 2019-06-28 15:56

Hi @RossH,

That definitely fixed the compile error I was getting. Thanks.

So, the next step was to run build_utilities again to reconfigure the settings for the FLASH loader instead of the SRAM loader:

Let's get started:

OK, let's choose the CUSTOM option:

There is no other Memory Board attached. All memory settings are within the CUSTOM configuration, so just hit ENTER:

Yes, CUSTOM does have FLASH Memory installed:

Let's go with 4K of Cache:

OK, yes, I want the Flash Booter to load in the libserial4 driver. Don't need the HMI driver since Port0 of the libserial4 driver will serve this purpose:

And yes, I do have SRAM memory installed as well:

Sure, let's go with 4K of Cache for SRAM as well:

Let's configure the Bootloader to load the FLASH option this time:

Fantastic. The build_utilities process was successful.

Now, let's upload this to the Prop1:

Ouch! I'm getting a Sync Error. Looks like I screwed up somewhere.

It's telling me there's a problem uploading the code to the FLASH chip for some reason.

Maybe it's a hardware problem.

Or, maybe I don't have the XMM FLASH API settings configured correctly?

Let me take a look at the build_ram_test utility and see if it can be configured to perform testing on the flash chip...

Will keep you posted.

UPDATE:

Was able to run the Ram Test with the Flash Option included. As before, all 512KB of the SRAM passed the test. However, starting at address 0x80000 (where Flash starts), it's showing all Zeros. So something is definitely wrong here:
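
The address layout described here can be sketched as follows. The two constants come from the post; the helper name `region` is made up for this illustration, and no upper bound is checked for the Flash region.

```python
SRAM_SIZE = 512 * 1024       # 512KB of SRAM, per the post
FLASH_BASE = 0x80000         # where the post says Flash starts

def region(addr):
    """Name the XMM region an address falls in (hypothetical helper)."""
    if addr < 0:
        raise ValueError("address must be non-negative")
    return "SRAM" if addr < FLASH_BASE else "FLASH"

# 512KB of SRAM ends exactly where Flash begins.
print(hex(SRAM_SIZE), region(0x7FFFF), region(0x80000))
```

So the last SRAM byte is at 0x7FFFF, and the all-zero readings begin with the very first Flash address.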

RossH · 2019-06-28 23:25

Hi @Wingineer19

One thing to try is using SPI mode instead of QPI mode for programming the flash. Uncomment line 96 in Custom_XMM.inc and recompile everything - i.e. make line 96:

#define USE_SPI_FOR_FLASH_WRITE

Also, do you have a data sheet for your Flash chip? Please check that it supports all the commands listed in Custom_XMM.def - i.e.

'
' QUAD FLASH commands:
'
JDEC_ID       = $9F
PAGE_PROGRAM  = $02
RD_STATUS_1   = $05
RD_STATUS_2   = $35
RD_STATUS_3   = $15
WR_STATUS_1   = $01
WR_STATUS_2   = $31
WR_STATUS_3   = $11
WRITE_ENABLE  = $06
WRITE_DISABLE = $04
QUAD_ENABLE   = $38
QUAD_RESET    = $FF
HI_SPEED_READ = $0B
CHIP_ERASE    = $C7
SECTOR_ERASE  = $20
GLOBAL_UNPROT = $98
ENABLE_RESET  = $66
RESET_DEVICE  = $99
QUAD_EXIT     = $FF

Or post the data sheet and I will check.
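
The datasheet check can be done mechanically once the chip's opcode list has been typed in. Below is a minimal sketch: the required opcodes are copied from the list above, while `EXAMPLE_DATASHEET` is a made-up placeholder set, not taken from any real datasheet. A matching opcode value does not prove the command behaves the same way, so this only flags commands that are definitely absent.

```python
# Opcodes copied from the command list above.
REQUIRED = {
    "JDEC_ID": 0x9F, "PAGE_PROGRAM": 0x02,
    "RD_STATUS_1": 0x05, "RD_STATUS_2": 0x35, "RD_STATUS_3": 0x15,
    "WR_STATUS_1": 0x01, "WR_STATUS_2": 0x31, "WR_STATUS_3": 0x11,
    "WRITE_ENABLE": 0x06, "WRITE_DISABLE": 0x04,
    "QUAD_ENABLE": 0x38, "QUAD_RESET": 0xFF,
    "HI_SPEED_READ": 0x0B, "CHIP_ERASE": 0xC7, "SECTOR_ERASE": 0x20,
    "GLOBAL_UNPROT": 0x98, "ENABLE_RESET": 0x66, "RESET_DEVICE": 0x99,
    "QUAD_EXIT": 0xFF,
}

def missing_commands(required, supported_opcodes):
    """Names of required commands whose opcode is not in the datasheet set."""
    return sorted(name for name, op in required.items()
                  if op not in supported_opcodes)

# Hypothetical datasheet opcode set, for illustration only.
EXAMPLE_DATASHEET = set(REQUIRED.values()) - {0x31, 0x11, 0x98}

print(missing_commands(REQUIRED, EXAMPLE_DATASHEET))
```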

Ross.

Wingineer19 · 2019-06-29 00:07

Hello @RossH,

Sure, I'll make the changes to the file to force the Flash chip into SPI mode, and give it another try.

I've enclosed the Datasheet for the 32Mb FLASH memory chip.

Glancing over it I cannot readily see the defines for:
WR_STATUS2 0x31
WR_STATUS3 0x11
GLOBAL_UNPROT 0x98

They might be there, but my eyes can't seem to find them right now...

Wingineer19 · 2019-06-29 00:20

@RossH,

Bummer, it didn't work.

Same Sync Error as before when using Payload, and same field of zeros when using Ram Test.

I'm going to take a dinner break.

When I get back I will move the SRAM chip over to the socket that currently contains the FLASH chip.

Then I'll have the Memory Scan program that I wrote access that socket just to verify that the wiring setup on that socket is correct.

Will keep you posted.

RossH · 2019-06-29 00:34

Wingineer19 wrote: »

@RossH,

Bummer, it didn't work.

Same Sync Error as before when using Payload, and same field of zeros when using Ram Test.

The sync error implies a communications problem, which would imply that the flash loader is perhaps simply crashing. I'll have a look at what else could cause this.

Ross.

Wingineer19 · 2019-06-29 02:40

@RossH,

OK, I'm back.

1. I removed the FLASH chip from its socket and set it aside.
2. I plugged the SRAM chip into its socket instead.
3. I ran my Memory Test program and verified that what was written out to this socket was read back correctly.

So, I know the wiring is correct. The FLCS, SCLK, and D0->D3 lines are correct.

I removed the SRAM chip and put it back in its socket and plugged the FLASH chip back into its proper socket.

My Memory Test program was designed to only test SRAM chips. I haven't modified it yet for the insane Read/Write command sequences to validate Flash operation.

I did run your Ram Test program again and still got all zeros for each block of memory it tested.

Also, when I attempted to end the program, it appeared to hang in the Erasing cycle for some reason and won't go back to the console prompt.

RossH · 2019-06-29 06:47

Wingineer19 wrote: »

Hello @RossH,

Sure, I'll make the changes to the file to force the Flash chip into SPI mode, and give it another try.

I've enclosed the Datasheet for the 32Mb FLASH memory chip.

Glancing over it I cannot readily see the defines for:
WR_STATUS2 0x31
WR_STATUS3 0x11
GLOBAL_UNPROT 0x98

They might be there, but my eyes can't seem to find them right now...

Yes, this chip seems to be different to program than the one for which the code you are using was written. I cannot recall what that chip was, but it was whatever was on the PMC board. As you have pointed out, there is no GLOBAL_UNPROT function, and the READ_STATUS, QUAD_ENABLE and QUAD_EXIT operations are also different. You will need to write new code specifically for this chip (or modify the PMC code) to make it work.

The RAM Test program is still your best way forward - get that working, and the rest should follow.

Wingineer19 · 2019-06-29 18:25

RossH wrote: »

...You will need to write new code specifically for this chip (or modify the PMC code) to make it work...

That was my conclusion as well after scanning through the manual and seeing that it was missing several of the operational codes needed to emulate the Flash chip used on the PMC board.

It's been quite some time since I've programmed in Assembly, and that's what will be needed to create the new XMM Flash API functions.

I've been wanting to go deep on the Prop1, but I didn't know I would have the chance to attempt it so soon.

Another possibility, if I want to go the hardware approach only, is to replace the 64KB EEPROM with a higher capacity one, like the AT24CM02. I have several of these AT24CM02 (256KB) EEPROMs in my possession to do just that.

However, I don't know if the Prop1 itself would have a problem with that sized EEPROM during bootup, since the Device Address byte is different: A1 and A0 in the AT24C512 are replaced with A17 and A16 in the AT24CM02. I suppose if these values were always Zeros when the Prop1 attempts bootup access then it would be OK.

Would Catalina have a problem working with an EEPROM that size? If not, then perhaps I could use the EEPROM option when compiling my XMM code and have Catalina store it there?

My finished C code will likely be well under 128KB, so having Catalina transfer it to the 512KB SRAM from EEPROM and execute in XMM mode seems doable.

The RAM Test program is still your best way forward - get that working, and the rest should follow.

Indeed it is. I very much appreciate your help during this adventure. I must say that Catalina is an excellent C compiler, and your very extensive documentation on it has been most welcome.

I'll keep working on this and attempt to reach a good resolution.

Hopefully you won't mind if I pose another question or two, from time to time, as I progress.

Thanks again for your help.

hinv · 2019-06-30 04:05

Wingineer19 wrote: »

I think for this thought experiment I would try to duplicate the Prop1 exactly, i.e. 8cogs with 32KB of HubRAM. The only changes would be to implement the XMM memory management previously discussed, along with some new instructions to allow the cogs to access the XMM just like they do the HubRAM.

I always thought it would be great to have a portB prop, but with the following additions:
full SerDes in the counters
bare minimum ROM for booting, all the rest of the 64K being RAM
ROM masked to the faster Spin variant that was developed since the P1
working fast mul/div instructions
higher clocks
64 I/Os, of course, or a 32-I/O version in a 40-pin package with the other 32 I/Os not brought out to pins. Port B can then be used as a fast mailbox between cogs
I'm guessing I could do that in FPGA if I ever get the time. Maybe after I play with the P2, I will lose all interest.

hinv · 2019-06-30 04:08

What does Catalina actually compile on? Windows? Linux? Dos?

RossH · 2019-06-30 04:13

Wingineer19 wrote: »

Would Catalina have a problem working with an EEPROM that size? If not, then perhaps I could use the EEPROM option when compiling my XMM code and have Catalina store it there?

I'd have to see the spec sheet to be sure. Another route might be to just use a smaller Flash chip - one that has been used on another board that Catalina supports (or that is higher capacity, but compatible with one of those).

In any case, good luck, and I'm happy to answer any more questions.

Ross.

Wingineer19 · 2019-06-30 05:05

hinv wrote: »

What does Catalina actually compile on? Windows? Linux? Dos?

I'm running it on a Windows 10 machine.

Catalina is very much worth a look if you're a C programmer.

Lots of ready to go software plug-ins available for video, serial, KVMs, etc, along with various sample code, configuration files for various boards you can modify or customize for your particular design (that's what I'm doing now), diagnostic utilities, plus lots and lots of documentation to help you through it all.

It supports LMM, CMM, and XMM modes in various flavors (Small or Large Memory models and a selection of cache options). It can store your code in Flash or EEPROM for execution upon startup, or you can dump your code into HubRAM (or XMM SRAM in my case) for testing and debugging before saving it to non-volatile memory.

It made a seamless integration into Code::Blocks for a complete and ready to run IDE.

Plus @RossH has been there to provide help and answer any questions I've had.

It really is a top of the line product. And you certainly can't beat the price!

Wingineer19 · 2019-06-30 05:53

RossH wrote: »

I'd have to see the spec sheet [on the AT24CM02] to be sure...[to see if it might work with Catalina for program storage]

I've enclosed one on the AT24CM02 for your review.

I'm thinking it should work for the Prop bootup just as long as the Device Address values for A17 and A16 bits are zero. A user application (like Catalina) can later set these bits to access memory beyond 64KB for code storage.
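
The addressing idea can be sketched like this. It assumes the device address byte is laid out as 1 0 1 0 A2 A17 A16 R/W, with the low 16 address bits sent as two word-address bytes; that layout should be verified against the datasheet before relying on it. The function names are invented for the example.

```python
def device_address_byte(mem_addr, read=False, a2=0):
    """Build the I2C device address byte for an 18-bit memory address.

    Assumed layout (verify against the datasheet): 1 0 1 0 A2 A17 A16 R/W.
    """
    if not 0 <= mem_addr < 256 * 1024:
        raise ValueError("address outside 256KB")
    high_bits = (mem_addr >> 16) & 0b11      # A17 and A16
    return 0xA0 | ((a2 & 1) << 3) | (high_bits << 1) | (1 if read else 0)

def word_address(mem_addr):
    """Low 16 bits, sent as two address bytes after the device byte."""
    return mem_addr & 0xFFFF

for addr in (0x00000, 0x0FFFF, 0x10000, 0x3FFFF):
    print(hex(addr), hex(device_address_byte(addr)), hex(word_address(addr)))
```

Under that assumption, every address in the first 64KB produces the same device byte as a 64KB part would (0xA0 for writes), which is why bootup should be unaffected as long as the boot code never sets the upper bits.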

Replacing the new Flash chip with one that already has proven drivers is certainly one approach to the problem.

But since I want to learn Prop Assembly anyway, I'm going to make an attempt to write my own API functions for it.

I've already written some sample code to force Quad Mode operation, but haven't tested it yet.

I must say that the Prop Assembly Language is a whole different animal than what I dealt with in the past. I'm used to fixed register names. Never dealt with a device where I can assign memory locations as registers, and where I can add suffixes to the instructions telling it not to modify the memory (register) location after performing some operation (or leave the C and Z bits alone).

Couple these features with the ability to read/write from/to HubRAM and now I understand how @"Bill Henning" was able to exploit all of this capability to introduce LMM mode (thus opening the door to C compilers). This is a very interesting critter.

I've been looking for a Prop1 emulator so I can load and test my code and look at the "registers" (memory locations) after each instruction execution.

I found GEAR along with pPropellerSim, which looks very interesting. I just downloaded the latest version of Java to take the pPropellerSim for a spin (no pun intended).

Anyway, I'm temporarily digressing from my C coding efforts to experiment with Prop Assembly, and it looks very interesting...

RossH · 2019-06-30 06:56

Wingineer19 wrote: »

Couple these features with the ability to read/write from/to HubRAM and now I understand how @"Bill Henning" was able to exploit all of this capability to introduce LMM mode (thus opening the door to C compilers). This is a very interesting critter.

Indeed. I wonder what happened to Bill. Is he still around? Also, if anyone is in touch with Bob Anderson these days, I'd appreciate an email address or something. He does not reply to the one I have.

RossH · 2019-06-30 07:03

hinv wrote: »

What does Catalina actually compile on? Windows? Linux? Dos?

I compile it on Windows and Linux. I used to also compile it on MacOS but I gave up because I don't actually own a Mac (I used a virtual Mac, but it got too far out of date).

David Betz · 2019-07-01 00:13

RossH wrote: »

Wingineer19 wrote: »

Hats off to the PropGCC developers for getting this done.

Indeed. This was not the case last time I looked at PropGCC - mind you, that's quite a few years ago now!

Has anyone actually used this feature? How is the performance?

I doubt anyone has used this feature as it is only available in the main branch of the PropGCC repository and Parallax has never released SimpleIDE using that version of PropGCC. Also, it is only possible to run XMM C code in 7 COGs since one has to run the cache driver.

Wingineer19 · 2019-07-01 03:58

David Betz wrote: »

RossH wrote: »

Wingineer19 wrote: »

Hats off to the PropGCC developers for getting this done.

Indeed. This was not the case last time I looked at PropGCC - mind you, that's quite a few years ago now!

Has anyone actually used this feature? How is the performance?

I doubt anyone has used this feature as it is only available in the main branch of the PropGCC repository and Parallax has never released SimpleIDE using that version of PropGCC. Also, it is only possible to run XMM C code in 7 COGs since one has to run the cache driver.

@"David Betz" ,

Fascinating. I would like to test this capability in PropGCC, but I would first need to have an XMM memory driver for it.

I have no idea where to even start doing that in PropGCC. How would I configure the XMM driver for PropGCC?

In Catalina there's a pretty good selection of XMM drivers to choose from, or modify to suit your needs if desired.

There's a Catalina file I'm using with the XMM API calls. For each function it calls (XMM initialize, byte read/write, word read/write, or long read/write) you simply add the Assembly code under each section to make it happen. How do I do this with PropGCC?

As of now I've modified the XMM API file in Catalina to use a variant of the PMC driver, but so far I've only got the 512KB SRAM to work. Getting the Flash memory chip to work is more involved than I had hoped.

Nevertheless, even without a working Flash chip, Catalina allowed me to compile the XMM C code then dump it into the 512KB SRAM memory for testing.

Obviously I can't save the code yet for bootup due to the Flash problem, but at least I'm able to run it from the SRAM memory and test it which is a great help. At this point in the C code testing and debugging, the Flash problem isn't critical. This Catalina feature of compiling then loading into SRAM for testing and debugging C code is awesome.

So, bottom line, I would like to try PropGCC in XMM Mode, but I need the following before I can:

1. The current location of the PropGCC Repository that supports multi-cog XMM C code. Is this it? https://code.google.com/archive/p/propgcc/downloads

2. A means to create/update/modify the XMM API to work with my 512KB SRAM chip at the very minimum (until I figure out how to work with the Flash chip).

3. The ability to load the XMM C Code into the 512KB SRAM (at least initially) so I can test and debug my code (until the Flash driver issue is resolved).

Can you point me in the right direction to get this done?

Wingineer19 · 2019-07-09 06:27

@RossH,

Remember in an earlier post when I cautioned you that I would likely have some more questions about the XMM API?

Well I'm back and I have one about the Direct API functions.

But first, let me digress. Recall that the previous issue was I couldn't get the Flash RAM to work because it has some different commands than the ones used with the PMC. Hence the XMM API functions for it need to be rewritten.

Well, before I embark on that adventure, I decided to rewrite ALL of the XMM API functions for the SRAM from scratch. Once I've mastered that I will tackle the Flash API stuff.

The SRAM XMM API rewrite is done and I've successfully tested the new Cached functions (ReadPage and WritePage). They work fine with both RamTest and the Menu application I sent to you.

Now, I want to test the Direct API functions.

When I run build_ram_test CUSTOM (but leave out any CACHE options) I get this:
[Screenshot: build_ram_test CUSTOM output]

It keeps barking at me about XMM_ReadMult already defined and I don't know why.

I've attached my CUSTOM files. Could you glance over them and see if there's something obvious that's causing this error?

Thanks.

RossH · 2019-07-09 08:05

Wingineer19 wrote: »

It keeps barking at me about XMM_ReadMult already defined and I don't know why.

I've attached my CUSTOM files. Could you glance over them and see if there's something obvious that's causing this error?

Thanks.

You have XMM_ReadMult defined on line 163 and also on line 196 in CUSTOM_XMM.def. If you define both NEED_XMM_READLONG and NEED_XMM_WRITELONG (which would be usual) then you will end up with it defined twice.

If you need both, then rename one or the other, or else take them both out of their corresponding "#defines" and just define it once elsewhere.
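
The way a label ends up defined twice only when both symbols are set can be shown with a toy model. The `SOURCE` text below is a hypothetical, heavily simplified stand-in for the real definitions file, and the parser handles only flat (non-nested) `#ifdef` / `#endif` pairs.

```python
from collections import Counter

# Hypothetical stand-in for the XMM definitions file: one label per line.
SOURCE = """\
#ifdef NEED_XMM_READLONG
XMM_ReadLong
XMM_ReadMult
#endif
#ifdef NEED_XMM_WRITELONG
XMM_WriteLong
XMM_ReadMult
#endif
"""

def active_labels(source, defined):
    """Labels on lines that survive the (non-nested) #ifdef blocks."""
    labels = []
    keep = True
    for line in source.splitlines():
        text = line.strip()
        if text.startswith("#ifdef"):
            keep = text.split()[1] in defined
        elif text.startswith("#endif"):
            keep = True
        elif text and keep:
            labels.append(text.split()[0])
    return labels

def duplicates(labels):
    """Label names that appear more than once."""
    return sorted(name for name, n in Counter(labels).items() if n > 1)

both = {"NEED_XMM_READLONG", "NEED_XMM_WRITELONG"}
print(duplicates(active_labels(SOURCE, both)))
print(duplicates(active_labels(SOURCE, {"NEED_XMM_READLONG"})))
```

With only one symbol defined the clash disappears, which is why a build that enables just one of the two blocks would not show the error.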

Ross.

Wingineer19 · 2019-07-09 20:07

Hi @RossH,

Good catch! That second XMM_ReadMult should have been XMM_WriteMult instead.

Somehow I duplicated it when copying from my primary coding file instead of copying the XMM_WriteMult.

I should have caught that before making the post. I guess that's what happens when coding around midnight.

I fixed that duplicate issue. The good news is that build_ram_test no longer has that particular problem. The bad news is that it says the code is about 20 longs beyond the 96 longs limit.

Question: Does the 96 longs limit apply only to these four Direct functions, or does it also include the variables (RamCsHigh,RamCsLow,MemClkHigh,MemClkLow,etc) used within them?

I ask because these variables are also used in the Cached XMM_ReadPage and XMM_WritePage functions so it doesn't seem they would be included within the Direct 96 longs limit, but would be included for the overall limit of all the API functions.

I was glad to see the Cached functions worked fine. At least some progress has been made.

It's been many, many years since I've programmed in Assembly. Now that I've caught my second wind I'm actually enjoying it on the Prop1.

I do have some questions about the Flash API functions, along with an idea I want to bounce off you about them, but that can wait for a future post.

In the meantime, back to work on the Direct functions to see how much I can shrink them down...

RossH · 2019-07-10 05:14

Wingineer19 wrote: »

Question: Does the 96 longs limit apply only to these four Direct functions, or does it also include the variables (RamCsHigh,RamCsLow,MemClkHigh,MemClkLow,etc) used within them?

I ask because these variables are also used in the Cached XMM_ReadPage and XMM_WritePage functions so it doesn't seem they would be included within the Direct 96 longs limit, but would be included for the overall limit of all the API functions.

Yes, any variables you use in the direct functions must fit in the number of available longs in the XMM Kernel (I don't recall the exact number of longs available, but 96 could be correct). This is because if you do not use the cache, the XMM direct access functions are included in the kernel itself in place of the cached versions of the access functions, not alongside them.

I'm afraid that if you want to have access to XMM direct from the kernel (i.e. not via the cache cog), then yes, you will need to prune your access functions down until they fit. This is always a struggle.

But you may want to think about whether having only cached access is enough. It generally is.

If you decide you do need direct access, then look at it this way - by the time you have managed to squeeze it in, you will be well on the way to being an expert PASM programmer!

Wingineer19 · 2019-07-11 04:44

Hi @RossH,

If you decide you do need direct access, then look at it this way - by the time you have managed to squeeze it in, you will be well on the way to being an expert PASM programmer!

Indeed. I'm getting quite a crash course on Prop Assembly, having put in around 10 hours per weekday since starting this Project.

Right now I've squeezed all four XMM Direct functions into 51 Longs. Add to this the 16 Longs of Variables. And I still need to add functions to force the SRAM into Quad Mode and configure it for Sequential Access. It's getting very tight.

BTW, I've been using PASD to perform realtime debugging of my code. It's been a lifesaver. Kudos to Andy (@Ariba) for this very helpful tool.

Question: Does the 96 Longs limit for XMM Direct also include the XMM_Activate and XMM_Tristate functions as well?

I'm assuming, due to the mutually exclusive memory access feature, either Cached or Direct ( but not both), this means these two functions must be part of the Direct suite as well.

So if this assumption is correct, I will have to squeeze the four XMM_Direct functions (and their sub-functions), the QuadMode function, the Sequential Access function, the XMM_Activate function, the XMM_Tristate function, and all the variables needed, into the 96 Longs.
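
The budget at this point works out as follows, using only the figures given in the post (a 96-long limit, 51 longs of code, 16 longs of variables). The helper name is invented for the example.

```python
LIMIT = 96           # limit reported in this thread
code_longs = 51      # the four Direct functions, per the post
var_longs = 16       # variables, per the post

def remaining(limit, *used):
    """Longs still free after subtracting everything already used."""
    return limit - sum(used)

left = remaining(LIMIT, code_longs, var_longs)
print(f"{left} longs left")   # 29 longs left
```

Those 29 longs would have to hold the QuadMode, Sequential Access, XMM_Activate and XMM_Tristate functions.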

Yikes!

I guess I'm attempting to do this because I want to gauge, if possible, any performance difference between Cached versus Direct access.

The Caching functions I wrote work perfectly. But even they can get squeezed down given what I've done with the Direct access functions. So that's next on the agenda once I'm through with the Direct issue.

Anyway, even if this Direct access attempt fails, it's been a great educational experience as I'm getting quite familiar with Prop Assembly (whether I wanted to or not).

But I'm certainly looking forward to returning to my C programming project.

RossH · 2019-07-11 07:17

Wingineer19 wrote: »

Question: Does the 96 Longs limit for XMM Direct also include the XMM_Activate and XMM_Tristate functions as well?

I'm assuming, due to the mutually exclusive memory access feature, either Cached or Direct ( but not both), this means these two functions must be part of the Direct suite as well.

Yes, if these are required by the Direct Access API functions - which they would be unless you have permanent access to the XMM RAM from the kernel cog, with none of the I/O lines shared with any other devices.

Ross.

Wingineer19 · 2019-07-12 01:18

Hi @RossH,

I was able to squeeze all of the Direct API functions into 96 Longs. In addition to the Activate and Read/Write functions, I had to include one to force the SRAM into Quad Mode, and another to force it in Sequential Access Mode. This last function is optional as the default mode of the SRAM is sequential anyway.

I've enclosed a copy of all of these functions, plus the WritePage and ReadPage ones used during caching.

The good news is that RamTest seemed to work with both Caching and Direct Modes.

The bad news is that the Catalina Compiler itself had some issues.

Let's start with the good stuff:

A. Caching Mode
1. Run build_ram_test CUSTOM CACHED_2K

2. Passed RamTest TRIV

3. Passed RamTest CMPX

4. Then I ran build_utilities, CUSTOM platform, no addons, no flash, and SRAM with 2K cache
5. Then I loaded the MenuTest program into Catalina, chose SMALL memory model with 2K Cache. No Problems during compile:

6. Then finally load to XMM Memory and interact:

Awesome. The Caching Mode is available when needed and works.

B. NoCache Mode
1. build_ram_test CUSTOM

2. Passed RamTest TRIV

3. Passed RamTest CMPX

4. Then I ran build_utilities, CUSTOM platform, no addons, no flash, and SRAM with NO cache

5. Then reloaded MenuTest program into Catalina, chose SMALL memory model but didn't check any of the cache options. And we got a FIT error

OK, so I edited my CUSTOM_XMM.inc file, and just as a test, I commented out the function (MakeSeqMode) that forces Sequential Access, since that's the default mode anyway. Then ran build_utilities again.

6. Hey, Catalina seems to be happy now. The previous FIT error said the last 5 Longs are reserved for Vector Diagnostics. So I guess I really have 91 Longs to work with and not 96?

7. Finally, send to XMM Memory and Interact...And we get nothing

This seems really odd. Any idea why these functions pass RamTest in both Cached and NoCache modes, yet fail to run when compiled by Catalina?

I'm puzzled.

RossH · 2019-07-12 01:50

Hi @Wingineer19

The last 5 kernel longs are normally reserved for use by the debugger. You can use them yourself if you don't intend to use the debugger - just comment out the following lines in Catalina_XMM.spin in the "target" subdirectory:

              fit       $1eb                    ' last 5 longs reserved for debug overlay vectors
              org       $1eb
DEBUG_VECTORS long      0,0,0,0,0

This may temporarily solve your "fit" problem. I will have a look at the rest of your code later today.
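
The arithmetic behind those three lines, as a quick check. The 496-long ceiling ($1F0) is the one named in the earlier "FIT 496" error; the 16 cog addresses above it are the Prop1's special purpose registers.

```python
COG_LIMIT = 0x1F0    # 496 longs, as in "FIT 496"
DEBUG_ORG = 0x1EB    # from "fit $1eb" / "org $1eb"

reserved = COG_LIMIT - DEBUG_ORG
print(DEBUG_ORG, reserved)   # 491 5
```

That matches the five zeros in DEBUG_VECTORS: commenting the lines out hands those five longs back to the kernel.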

Ross.

RossH · 2019-07-12 05:32

Hi @Wingineer19

Nothing obvious I can see yet. When using direct access and the SMALL memory mode, the Kernel really only uses two of the XMM functions: XMM_Activate and XMM_ReadLong, both of which are also used in the RAM test program, so they would appear to be working ok.

So it's something a little unusual.

The first thing to do is to re-check your "build_utilities" options are correct - just in case! If they are not, the program would not be loaded correctly.

If you are sure this is ok, then the next possibility is that XMM_ReadLong is not actually working reliably, but is failing in a way that is not being uncovered by the RAM Test program. (Note that XMM_ReadLong is not used by the cached version of your program, which would explain why that one works even if XMM_ReadLong is not reliable.)

So try the following:

1. In Catalina_XMM_RamTest.spin, enable the INTENSIVE option (or compile the program with -C INTENSIVE). If this causes the RAM Test program to fail the complex test, it tells us that XMM_ReadLong is not working reliably. Similarly, enable the INVERT option.

2. Another possibility is that it is a timing issue. In the RamTest program there is a long delay after XMM_Activate before any other memory access is performed. You could try putting a delay in XMM_Activate (just before it returns) to see if that helps.

3. If you don't need them, remove the -lmb and -lcx options (unless you need floating point or SD card access, just using -lc alone should be fine). This will free up some Hub RAM space, which may help if your program is running out of Hub RAM.

4. Try compiling in LARGE mode instead of SMALL. Again, this might help if your program is running out of Hub RAM.

5. Try a much smaller, much simpler program. Use one of the existing demo programs, or create a small test program that just outputs one line of text in a loop.

Ross.
