Looking at the 256KB AT24CM02 EEPROM, it appears to support sequential reads up to the maximum 256KB limit, then rolls back over to location $0. Page writes are limited to 256 bytes each.
Seeing that each 64kB requires a different device address I would have expected that the rollover would be contained to 64kB, so I am surprised that it is a bit more practical in this regard.
If you can't wait, I have zipped up the contents of my current P1 target directory and attached it. I have left out the "CUSTOM" files so as not to overwrite yours, but please make a backup copy of all your files just in case ... and be aware that you USE THIS AT YOUR OWN RISK!!!
An indicator could be how fast you saw the various MenuTest screens updating when you were running XEPROM. If the GPS Command Screen was updating pretty quickly then there's a chance the EEPROM approach might work. If it was moving at a snail's pace, then Flash wins.
So what was your feel for the Menu display speed? Fast, slow, molasses?
That there could give me a clue as to whether or not the EEPROM API approach should even be pursued...
The 512KB SRAMs I use consist of two, 256KB SRAMs on the same die. I believe they will support sequential access up to the 256KB limit, in which case you must terminate the process and restart since you can't cross the die boundary.
Looking at the 256KB AT24CM02 EEPROM, it appears to support sequential reads up to the maximum 256KB limit, then rolls back over to location $0. Page writes are limited to 256 bytes each.
The EEPROM page size is separate to the cache page size. The default EEPROM page size Catalina uses is 32 bytes, and it must be a divisor of 512, so it can never cross the EEPROM size boundary. I think this means that 2 sequential EEPROMs will work ok, whether they are in the same chip or in different chips.
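The alignment argument above can be checked with a few lines of arithmetic. This is only a sketch, assuming two 256KB devices mapped back to back and power-of-two page sizes; the function names are made up for illustration and are not part of Catalina.

```python
DEVICE_SIZE = 256 * 1024  # one 256KB EEPROM, as discussed in the thread


def write_crosses_device(start, length, device_size=DEVICE_SIZE):
    """Return True if a write of `length` bytes at `start` spans two devices."""
    last = start + length - 1
    return start // device_size != last // device_size


def any_aligned_page_crosses(page_size, device_size=DEVICE_SIZE, devices=2):
    """Check every page-aligned write across `devices` back-to-back chips."""
    total = device_size * devices
    return any(
        write_crosses_device(addr, page_size, device_size)
        for addr in range(0, total, page_size)
    )


if __name__ == "__main__":
    # Aligned pages whose size divides the device size never straddle a chip.
    for page_size in (32, 64, 128, 256, 512):
        print(page_size, any_aligned_page_crosses(page_size))
    # An unaligned 32-byte write that starts 8 bytes before the boundary does.
    print(write_crosses_device(DEVICE_SIZE - 8, 32))
```

So the only way to hit the boundary is to start a write at an address that is not a multiple of the page size.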
If you can't wait, I have zipped up the contents of my current P1 target directory and attached it. I have left out the "CUSTOM" files so as not to overwrite yours, but please make a backup copy of all your files just in case ... and be aware that you USE THIS AT YOUR OWN RISK!!!
Understood.
Do I need to do anything with build_utilities in order to try this? Like saying it has Flash memory and setting the cache size?
An indicator could be how fast you saw the various MenuTest screens updating when you were running XEPROM. If the GPS Command Screen was updating pretty quickly then there's a chance the EEPROM approach might work. If it was moving at a snail's pace, then Flash wins.
So what was your feel for the Menu display speed? Fast, slow, molasses?
That there could give me a clue as to whether or not the EEPROM API approach should even be pursued...
I wasn't really paying attention to the speed. I think it was usable at 4K or 8K cache sizes, but I wouldn't have called it fast. But some of that is just down to the serial I/O speed, not the execution speed.
Do I need to do anything with build_utilities in order to try this? Like saying it has Flash memory and setting the cache size?
No, I don't think so. Just compile with -C XEPROM -C SMALL -C CACHED_4K and then use the EEPROM loader in payload. For example, here are the commands I used:
Cluso99 already has tiny P8XBlade2 modules and could supply them with 256kB I'm sure but I have a ton of various modules I have designed such as the P8 which also has Flash. What are your requirements in terms of size, power, I/O etc?
Essentially, something like the FLiP module, but with 256KB EEPROM, would be the minimum.
If it also had on-board SRAM and/or Flash, that would be even better. That would give me the option of running XMM code directly from EEPROM, or from SRAM and/or Flash if desired.
At this point I don't need a micro-SD Card, but I wouldn't mind if one was included.
What types of P1 modules have you designed? Tell me more, please.
My P8XBlade2 is linked in my signature.
If you want a different EEPROM fitted I can do that, provided it is the same footprint and you can send me the EEPROM. I use the TSSOP8 4.4x3@0.65 CAT24C512YI-GT3. I use a stencil and oven to assemble, and it’s not hand-soldering friendly! It has a handful of 0402 parts too.
I'm finding XEPROM fascinating. Since my EEPROM supports Fast Mode Plus (1MHz) I've been able to speed it up somewhat by adjusting the delay in the API, as well as tweaking some of the Menu display stuff in my code. I'm using the 8K cache. The Menu display speed is slow but usable. If I found an EEPROM that supported high speed mode (3.4MHz) the performance would likely be even better. I've found some FRAM ones that support that speed, but at $20 each that's pretty steep. I may ultimately still have to use external Flash to get where I need to go, but I'm going to continue experimenting with XEPROM to see what can be done. I'm captivated by it.
Anyway, that's not what this post is about. I'm working on something else that uses your standard Serial plugins (PC, PropTerminal, TTY, or TTY-VT100).
I'm facing the age-old problem of needing to scan the keyboard for input, but not wait for a carriage return.
Essentially I need something like the kbhit() or getkey() function, but there doesn't appear to be one available. Is it there but I'm missing it?
The EEPROM page size is separate to the cache page size. The default EEPROM page size Catalina uses is 32 bytes, and it must be a divisor of 512, so it can never cross the EEPROM size boundary. I think this means that 2 sequential EEPROMs will work ok, whether they are in the same chip or in different chips.
Of course, I have never actually tried it.
I now have two 256KB EEPROMs on the same I2C bus on my USB Project Board.
One is physically mapped from $0000_0000 to $0003_FFFF while the other is mapped from $0004_0000 to $0007_FFFF.
Whatever reading/writing scheme Tachyon uses apparently worked fine.
I loaded Tachyon into HubRam, then issued this command:
Don't know what the story is with these Zeros, but Tachyon apparently had no problem transitioning from one EEPROM to the other across the 256KB boundary, so I think your suspicions might be correct:
The default EEPROM page size Catalina uses is 32 bytes, and it must be a divisor of 512, so it can never cross the EEPROM size boundary. I think this means that 2 sequential EEPROMs will work ok, whether they are in the same chip or in different chips.
Where in this case, we have two separate chips that are sequentially mapped on the same bus.
So I guess the next question deals with the caching page sizes, and would they attempt to cross the 256KB boundary, in which case they would fail.
Obviously, if there were such a critter as a 512KB EEPROM that consists of a single die allowing sequential access across the entire memory range, then neither your EEPROM page sizes nor the actual XEPROM memory caching scheme would encounter any problem.
Haven't seen such a critter on the market, though...
I've got another crazy question, something that is totally hypothetical, but I would appreciate your thoughts on it nevertheless.
If I understand it correctly, CMM is a type of hybrid between pure LMM code and a series of tokens, resulting in code which is much more compact than LMM code but also slower to execute.
In contrast, the XMM kernel is similar to the LMM kernel but with the added overhead of fetching LMM code from external memory instead of HubRam?
Would it be possible to have a hybrid XMM kernel that fetches tokenized code from external memory, similar to how CMM does it from HubRam?
I realize there would likely be another speed penalty, as the tokens were fetched from external memory, decoded, then executed, but perhaps this could be partially offset by having smaller code that would require fewer memory fetches?
Just wondering about this and curious as to your thoughts on such a thing.
Your understanding is correct. However, your suggestion is not practical - at least not on the P1. There is simply not enough space in a single cog to do it all. The CMM kernel only barely fits in a cog as it is - I even had to move some of the very basic floating point support routines out to another cog to make it fit (which I hated to do!).
However, I have been giving some thought to how XMM might work on the P2, where 2 cogs can closely co-operate and even share RAM. I think having a combined 2-cog "kernel and cache" might mean XMM would work very well on the P2, but perhaps a 2 cog kernel could also combine CMM and XMM functionality.
I like your way of thinking. I also think that this would be a major thing to figure out. How to build a two COG object working in tandem fashion. The basics seem to be there, gosh I wish I had more time for the P2.
For some reason I'm still having problems getting the 8K cache option working in Catalina when using my external SRAMs.
Please check your email for the test files used, including the CUSTOM_XMM memory driver.
Program storage is achieved using the EEPROM option, but for ongoing testing and debugging, I'm just dumping the code directly to XMM RAM after compilation via the Code::Blocks Menus.
I'm using a variant of the "RamPage2" external memory arrangement for this test, but without any Flash memory. The two SRAMs are running in Quad Mode. Each SRAM is connected to a 4-bit bus, yielding 8 bits total. So after the addressing information is strobed in for a read/write operation, 8 data bits are transferred on each clock pulse.
Transferring large blocks of data amortizes the initial read/write command overhead, which results in decent transfer speed. Short of using HyperRam, I'm guessing this is the most efficient external memory arrangement to quickly transfer 8-bit blocks of data using the fewest number of pins (10 total: 8 data, plus CS and CLK).
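To put a rough number on the amortization, here is a small sketch. It assumes one byte moves per clock once a burst is under way (two SRAMs, 4 bits each) and uses a purely hypothetical command/address overhead of 16 clocks; the real figure depends on the SRAM and the driver.

```python
def bus_efficiency(block_bytes, overhead_clocks):
    """Fraction of clock pulses that carry payload in a single burst.

    Assumes one byte is transferred per clock after the command and
    address phase, which costs `overhead_clocks` clocks per burst.
    """
    return block_bytes / (block_bytes + overhead_clocks)


if __name__ == "__main__":
    OVERHEAD = 16  # hypothetical value, for illustration only
    for block in (1, 16, 64, 512):
        print(block, round(bus_efficiency(block, OVERHEAD), 3))
```

The larger the block, the closer the bus gets to one payload byte per clock, which is why cache-line-sized transfers suit this arrangement.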
Anyway, Catalina works fine (as does RamTest) using the 1K, 2K, and 4K cache options.
RamTest also works fine with 8K cache.
I just can't get Catalina to work with the 8K cache, regardless of what I try.
This 8K cache problem persists if I compile the program for SMALL or LARGE memory models. No difference, the program just won't run. I get a blank screen.
Originally I thought the issue might be too much HubRam consumed. Here's what compiling using SMALL shows:
In each case I'm seeing the DATA portion being 3512 bytes. Does this exceed the amount of HubRam that can be used?
If it's not a HubRam memory overrun, any idea what's going on here?
Whatever is happening, it only manifests when using the 8K cache option...
This "Rampage2" XMM memory arrangement works perfectly in cache mode provided that the caching cog always transfers an even number of bytes. If it doesn't, the whole scheme breaks down, and that right there would cause the program not to run.
However, I'm assuming that an even number of bytes is always transferred to/from the cache irrespective of the cache size used (1K, 2K, 4K, or 8K).
I'm just perplexed that I can't get the program to run using 8K cache.
For some reason I'm still having problems getting the 8K cache option working in Catalina when using my external SRAMs.
Please check your email for the test files used, including the CUSTOM_XMM memory driver.
Or, we can cut through the clutter and take a look at the CUSTOM_XMM driver right now, since I assume that's where the problem resides. I've included it here for anyone who wants to take a look.
I don't see why it works perfectly for cache sizes of 1K, 2K, and 4K, but mysteriously doesn't work for 8K, at least not with Catalina, but yet it does work fine with RamTest.
Why it doesn't work with one, but does with the other, is what has really thrown me...
I've also sent you an email on this. But here is the gist:
It is probably not your driver, or the program size.
I found and fixed a problem with Catalina's caching. It was a bug I introduced way back in Catalina 3.12 - I never spotted it because it only affects certain programs with certain cache sizes - your program with an 8K cache just happened to be one such combination!
I have attached a new version of Cached_XMM.inc, which you should put in Catalina's "target" directory. Then recompile your program (and the utilities) to use an 8K cache.
Let me know if it works.
Ross.
Yes sir, it works like a charm!
As an added bonus, there also appears to be a noticeable increase in execution speed due to the additional caching.
I have a question about how Tachyon writes to and reads from EEPROM.
But first, let me describe what I'm doing here.
For my Project I want to use a FLiP module instead of a USB Project Board. However, the FLiP doesn't have sufficient memory, so I designed a small mezzanine/carrier board that the FLiP plugs in to, which in turn plugs into a 40-pin socket on my main board.
Here's what it looks like:
Side by side with the FLiP:
FLiP plugged in to the carrier board:
The mezzanine board contains 3 EEPROMs (the top 3 chips) and 2 SPI SRAMs (the lower 2 chips). This arrangement brings FLiP addressable EEPROM memory up to 512KB, while the SPI SRAMs (256KB total) serve as XMM memory in which my program will run.
I only had some oversized Arduino Headers on hand, so I had to use them instead of some preferred low profile ones. It's also not the cleanest job since I used a soldering iron to install all components.
When (if?) @RossH returns to the forums in a few months (assuming his Retreat survived the fires and customers have returned), he might be interested to know that the 256KB SRAMs work perfectly using the Catalina RamTest program and 8K of cache.
Unfortunately, the Catalina EEPROM Loader gave me an error when I attempted to save an XMM program in EEPROM:
It's almost like the Loader is trying to write across the EEPROM physical boundary which obviously won't work.
So, I loaded up Tachyon and took advantage of its excellent diagnostic capability. I had it fill the entire 512KB EEPROM memory with different characters (with the last tested value of $5A shown here), and it worked perfectly.
The EEPROM page size is separate to the cache page size. The default EEPROM page size Catalina uses is 32 bytes, and it must be a divisor of 512, so it can never cross the EEPROM size boundary. I think this means that 2 sequential EEPROMs will work ok, whether they are in the same chip or in different chips.
Of course, I have never actually tried it.
If I'm reading this post correctly he's using 32-byte writes/reads but for some reason the Catalina EEPROM Loader isn't working correctly and outputs the previously mentioned error. If I can find the default settings for this loader maybe I can try different values and see if I can find something that works.
So my question to you is what size write/read blocks does Tachyon use, because whatever it's using works, and whatever the Catalina EEPROM loader uses, doesn't.
Tachyon will fill and copy to EEPROM in a page size to suit the device which is typically 128 bytes for 64KB chips. Each time it performs a write it calculates the device address so it will cross device boundaries seamlessly. So each page write is:
<START> <DEVICE ADDR> <WRITE> <16-bit MEMORY ADDRESS> <128 BYTES> <STOP>
Because the device may be busy programming a page of data, the device address is polled for an ack to indicate when it is ready along with a timeout to prevent hanging.
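As a rough illustration of the scheme described above (this is not Tachyon's actual code), the sketch below splits a buffer into page-aligned writes and recomputes the device address for every page. It assumes the usual 24-series control code of %1010 in the top four bits, with A16..A18 placed in bits 1..3 of the device byte; `device_byte`, `page_writes` and `wait_until_ready` are hypothetical names, and the bus itself is replaced by a `probe` callable.

```python
def device_byte(addr, read=False):
    """8-bit device address: %1010, then A18..A16, then the R/W bit."""
    return 0xA0 | (((addr >> 16) & 0x7) << 1) | int(read)


def page_writes(start, data, page_size=128):
    """Yield (device_byte, 16-bit memory address, chunk) for each page write.

    The first chunk is shortened if `start` is not page-aligned, so that
    every following write begins on a page boundary.
    """
    offset = 0
    while offset < len(data):
        addr = start + offset
        room = page_size - (addr % page_size)
        chunk = data[offset:offset + room]
        yield device_byte(addr), addr & 0xFFFF, chunk
        offset += len(chunk)


def wait_until_ready(probe, max_polls=100):
    """Poll `probe` (True once the device ACKs) with a simple poll limit."""
    for _ in range(max_polls):
        if probe():
            return True
    return False


if __name__ == "__main__":
    # 256 bytes starting 64 bytes below the first 64KB boundary.
    for dev, mem, chunk in page_writes(0xFFC0, bytes(256)):
        print(hex(dev), hex(mem), len(chunk))
```

Because the device byte is derived from the full address on every page, the move from one chip to the next needs no special handling.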
I use the LY68L6400, an 8MB SPI RAM as an option on the P2. It's cheap!
Thanks for the heads up on the LY68L6400 SPI SRAMs. I couldn't find a US distributor for them yet, but I did find them on Aliexpress. A batch of 10 goes for about $17.50.
How are they working out for you? Any troubles with them? I've had no problems with the ISSI 512KB SPI SRAMs so far...
I want to dig a little deeper on how Tachyon does EEPROM writes/reads as well as the techniques you employ to get past the EEPROM physical barriers in a ganged EEPROM arrangement.
Since I'm using a potpourri of EEPROMs (64KB on board the FLiP, and 64KB, 128KB, and 256KB on my expansion board), Tachyon has no a priori knowledge of how many EEPROMs it's working with, nor the memory size of each one. Yet your algorithm effectively determines this and adjusts accordingly to seamlessly write/read across EEPROM physical boundaries.
I've written code to write to a GPS receiver on an I2C bus, but not EEPROMs. There's no such thing as Page reads/writes in this case, as I just send a string of characters to the GPS receiver and look for ACK/NACK responses to verify reception of the string.
If you're using EEPROMs <= 64KB in size, the entire A15->A0 physical address is contained within the Memory Address bytes, with the Device Address byte containing device select fields A2, A1, A0 (allowing a total of 8 of these sized devices on the I2C bus).
However, if you're using a 128KB EEPROM, then device select field A0 is replaced with physical address A16 (thus restricting you to a total of 4 of these devices on the bus).
Finally, if you're using a 256KB EEPROM, then device select fields A1 and A0 are replaced with physical addresses A17 and A16, respectively, which limits you to a maximum of 2 such devices on the bus.
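The three cases above can be written down as one small function. This is a sketch of the layout exactly as I described it, assuming the standard %1010 control code; `control_byte` and `max_devices` are illustrative names, and some parts place the high address bits differently, so check the datasheet.

```python
# Number of device-select bits taken over by memory address bits.
ADDR_BITS_IN_DEVICE_BYTE = {64: 0, 128: 1, 256: 2}


def control_byte(size_kb, select_pins, addr, read=False):
    """Build the 8-bit device address for a 64KB, 128KB or 256KB EEPROM.

    `select_pins` is the 3-bit A2/A1/A0 strap value; the low bits that the
    larger devices reuse for A16/A17 are ignored.
    """
    n = ADDR_BITS_IN_DEVICE_BYTE[size_kb]
    mask = (1 << n) - 1
    high = (addr >> 16) & mask          # A16 (and A17) from the memory address
    select = select_pins & 0x7 & ~mask  # remaining hardware select bits
    return 0xA0 | ((select | high) << 1) | int(read)


def max_devices(size_kb):
    """How many devices of this size fit on one I2C bus."""
    return 8 >> ADDR_BITS_IN_DEVICE_BYTE[size_kb]


if __name__ == "__main__":
    print(hex(control_byte(64, 0b000, 0x1234)))
    print(hex(control_byte(128, 0b010, 0x1ABCD)))
    print(hex(control_byte(256, 0b100, 0x3FFFF, read=True)))
    print(max_devices(64), max_devices(128), max_devices(256))
```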
OK, with that out of the way, and since you're using a 128-byte Page Write scheme for the EEPROMs, I'm going to make the following assumptions and you can tell me if I'm on the right track or not:
1. Tachyon will send this packet to the EEPROM as you described:
<START> <DEVICE ADDR> <WRITE> <16-bit MEMORY ADDRESS> <128 BYTES> <STOP>
2. If the bytes you attempt to write fit within its physical boundary, the EEPROM will respond with ACKs for each byte you write.
3. Tachyon will then send this command to verify the write was successful:
<START> <DEVICE ADDR> <READ> <16-bit MEMORY ADDRESS> <128 BYTES> <STOP>
4. Assuming all is well, Tachyon will advance the memory address to the start of the next block of data and continue the write process until the task is complete.
5. However, if Tachyon attempts to write past its physical boundary, the EEPROM will respond with a NACK for the last byte it attempted to write.
6. If byte N failed, then Tachyon will adjust the number of bytes to send downward to 128 - (128 - N) = N bytes and retry.
7. If successful, Tachyon will send the Read command to verify N bytes were written.
8. If the Read was successful, then Tachyon will advance the memory address to the start of the next block of data and attempt to write 128 bytes again.
9. But, if Step 6 fails, then Tachyon will attempt again X number of times until a timeout value is reached, in which case it will quit and provide an error message.
So, if this process is correct, then Tachyon will dynamically adjust the number of bytes it will attempt to write until it achieves success or a timeout error occurs.
This technique thus allows Tachyon to seamlessly bridge across physical boundaries of multiple EEPROMs regardless of how many physical devices are on the bus (up to 8 total), and/or their memory size.
Does this make sense, and is this similar to how you achieve the successful bridging?
Practically all modern I2C EEPROMs accept 16 bits for their address, even 128kB and larger devices. Whereas you would expect A16 to be b1, or the lsb of the 7-bit address, it is in fact b3 for Atmel/Microchip x1025 devices, which is rather non-standard. Tachyon doesn't perform any voodoo, it just translates A16, A17, A18 into the device address, which most normal larger devices handle, well, normally. I favor the M24M0x EEPROMs.
Page writes can never cross boundaries because they always start on an aligned page address. You cannot start a page on the last byte for instance. But page writes take the same amount of time as a byte write, probably because that's what they are doing internally. I found that the guaranteed "maximum" endurance cycle of 1M writes typical is in fact a very minimum figure that you can count on. I endurance tested an EEPROM once and gave up when it was still working normally after several million cycles.
If you are trying to mix 32kB (256kb) devices with larger devices then that's your problem and the solution is simple, just don't.
I buy the SPI RAM from LCSC, who I can highly recommend. To me they are a bit like an Asian Mouser; they are not guaranteed to have every part, but the parts they do have are at very good prices. For instance, if you buy 10 of those chips they are 61 cents each. Much cheaper than Aliex.
P.S. Tachyon doesn't need "expanded memory" as it is very compact and fast and runs directly out of 32k RAM, even for very large applications which I have written. It does however use the upper 32k of a normal 64kB EEPROM to hold ROMS which are the named binary images that can be loaded at runtime into cogs such as VGA, F32, SIDcog, UARTS, etc. That way unlike the image taking up RAM space as in Spin, you can have all that functionality but not use up any RAM.
EDIT: I've just attached a capture of the console startup and a quick look around including mounting an SD and opening and examining a file.
If you are trying to mix 32kB (256kb) devices with larger devices then that's your problem and the solution is simple, just don't.
Fortunately I'm not using 32KB EEPROMs. What I did do is expand total EEPROM memory space to 512KB by using an expansion board that has 64KB, 128KB, and 256KB EEPROMs installed. But doing this allows, in fact requires, the use of the existing 64KB EEPROM on the FLiP as well.
Which means that all four EEPROMs used are mapped like this:
As we know, for EEPROMs 64KB and smaller, the Device Byte A2,A1,A0 fields are external physical inputs to essentially select up to 8 of these on the I2C bus.
For the 128KB EEPROM, Device Bit A0 has been replaced with memory location A16, while the physical pin formerly known as A0 became NC.
And for the 256KB EEPROM, Device Bit A1 has also been replaced with memory location A17, with physical pins formerly known as A1 and A0 replaced with NC.
So, looking at the above table, the EEPROMs are properly segmented and mapped into their expected ranges.
Indeed, Tachyon confirmed this by successfully writing to and reading from all EEPROM memory locations from $00000 to $7FFFF.
However, the Catalina EEPROM loader failed while trying to access memory locations from 64KB to 128KB. @RossH had mentioned the Catalina loader is quite old and was only designed to work with EEPROMs up to 128KB. It also clearly has problems working with multiple EEPROMs on the I2C bus.
Hopefully in due time, when (if?) he returns to this forum he can take a closer look.
Of course, if I disable the FLiP internal 64KB EEPROM, and use only the 256KB EEPROM on the expansion board the Catalina loader will work. That's essentially what I did with my USB Project Board and it works fine.
The caveat is that the 256KB EEPROM will have to be placed on the expansion board where the 64KB one currently is. This is certainly doable, and will allow me to proceed with my Project, but the intent was to not mutilate the FLiP but instead allow full 512KB EEPROM access by simply plugging it into the expansion board.
Until the Catalina loader is fixed (or another software workaround is found), it's time to mutilate a FLiP, remove the 64KB and 128KB EEPROMs on the expansion board, and install the 256KB EEPROM instead...
I buy the SPI RAM from LCSC, who I can highly recommend. To me they are a bit like an Asian Mouser; they are not guaranteed to have every part, but the parts they do have are at very good prices. For instance, if you buy 10 of those chips they are 61 cents each. Much cheaper than Aliex.
Wow, I priced out 10 of these chips and I can have them delivered to the USA for $11.19, including shipping. Amazing!
Comments
No, I don't think so. Just compile with -C XEPROM -C SMALL -C CACHED_4K and then use the EEPROM loader in payload. For example, here are the commands I used:
It works!
For some reason it didn't like the CUSTOM platform so I changed it to QUICKSTART.
No problems with compiling or uplinking.
But, it is pretty slow, even with 8K cache.
I will continue to experiment with it and see what, if anything, I can do to speed it up...
Anyway, that's not what this post is about. I'm working on something else that uses your standard Serial plugins (PC, PropTerminal, TTY, or TTY-VT100).
I'm facing the age-old problem of needing to scan the keyboard for input, but not wait for a carriage return.
Essentially I need something like the kbhit() or getkey() function, but there doesn't appear to be one available. Is it there but I'm missing it?
I think k_ready() is what you are looking for. See "catalina_hmi.h" for all the HMI (Human/Machine Interface) functions.
I loaded Tachyon into HubRam, then issued this command:
$0000 $7FFFF $FF EFILL --> ok
Then This:
$0000 $7FFFF EE DUMP
And got this:
All the way up to here, where I got some Zeros at the end:
Don't know what the story is with these Zeros, but Tachyon apparently had no problem transitioning from one EEPROM to the other across the 256KB boundary, so I think your suspicions might be correct:
Where in this case, we have two separate chips that are sequentially mapped on the same bus.
Your understanding is correct. However, your suggestion is not practical - at least not on the P1. There is simply not enough space in a single cog to do it all. The CMM kernel only barely fits in a cog as it is - I even had to move some of the very basic floating point support routines out to another cog to make it fit (which I hated to do!).
However, I have been giving some thought to how XMM might work on the P2, where 2 cogs can closely co-operate and even share RAM. I think having a combined 2-cog "kernel and cache" might mean XMM would work very well on the P2, but perhaps a 2 cog kernel could also combine CMM and XMM functionality.
So many possibilities, so little time ...
I like your way of thinking. I also think that this would be a major thing to figure out. How to build a two COG object working in tandem fashion. The basics seem to be there, gosh I wish I had more time for the P2.
Me too! I think "2 cog" objects are going to be a big deal on the P2.
For some reason I'm still having problems getting the 8K cache option working in Catalina when using my external SRAMs.
Please check your email for the test files used, including the CUSTOM_XMM memory driver.
Program storage is achieved using the EEPROM option, but for ongoing testing and debugging, I'm just dumping the code directly to XMM RAM after compilation via the Code::Blocks Menus.
I'm using a variant of the "RamPage2" external memory arrangement for this test, but without any Flash memory. The two SRAMs are running in Quad Mode. Each SRAM is connected to a 4-bit bus, yielding 8-bits total. So after the addressing information is strobed in for a read/write operation, 8 data bits are transferred upon each clock pulse.
Transferring large blocks of data causes the amortization of the initial read/write command overhead thus resulting in decent transfer speed. Short of using HyperRam, I'm guessing this is the most efficient external memory arrangement to quickly transfer 8-bit blocks of data using the fewest number of pins (10 total: 8 data, plus CS and CLK).
Anyway, Catalina works fine (as does RamTest) using the 1K, 2K, and 4K cache options.
RamTest also works fine with 8K cache.
I just can't get Catalina to work with the 8K cache, regardless of what I try.
This 8K cache problem persists whether I compile the program for the SMALL or the LARGE memory model. Either way, the program just won't run and I get a blank screen.
Originally I thought the issue might be too much HubRam consumed. Here's what compiling using SMALL shows:
And here's what compiling using LARGE shows:
In each case I'm seeing the DATA portion being 3512 bytes. Does this exceed the amount of HubRam that can be used?
If it's not a HubRam memory overrun, any idea what's going on here?
Whatever is happening, it only manifests when using the 8K cache option...
This "Rampage2" XMM memory arrangement works perfectly in cache mode provided that the caching cog always transfers an even number of bytes. If it doesn't, the whole scheme breaks down, and that right there would cause the program not to run.
However, I'm assuming that an even number of bytes is always transferred to/from the cache irrespective of the cache size used (1K, 2K, 4K, or 8K).
I'm just perplexed that I can't get the program to run using 8K cache.
Thoughts?
I don't see why it works perfectly for cache sizes of 1K, 2K, and 4K but mysteriously doesn't work for 8K, at least not with Catalina, and yet works fine with RamTest.
Why it doesn't work with one, but does with the other, is what has really thrown me...
I've also sent you an email on this. But here is the gist:
It is probably not your driver, or the program size.
I found and fixed a problem with Catalina's caching. It was a bug I introduced way back in Catalina 3.12 - I never spotted it because it only affects certain programs with certain cache sizes - your program with an 8K cache just happened to be one such combination!
I have attached a new version of Cached_XMM.inc, which you should put in Catalina's "target" directory. Then recompile your program (and the utilities) to use an 8K cache.
Let me know if it works.
Ross.
Yes sir, it works like a charm!
As an added bonus, there also appears to be a noticeable increase in execution speed due to the additional caching.
Thanks again for your help.
I have a question about how Tachyon writes to and reads from EEPROM.
But first, let me describe what I'm doing here.
For my Project I want to use a FLiP module instead of a USB Project Board. However, the FLiP doesn't have sufficient memory, so I designed a small mezzanine/carrier board that the FLiP plugs into, which in turn plugs into a 40-pin socket on my main board.
Here's what it looks like:
Side by side with the FLiP:
FLiP plugged in to the carrier board:
The mezzanine board contains 3 EEPROMs (the top 3 chips) and 2 SPI SRAMs (the lower 2 chips). This arrangement brings FLiP addressable EEPROM memory up to 512KB, while the SPI SRAMs (256KB total) serve as XMM memory in which my program will run.
I only had some oversized Arduino Headers on hand, so I had to use them instead of some preferred low profile ones. It's also not the cleanest job since I used a soldering iron to install all components.
Here's how the EEPROM mapping is arranged:
When (if?) @RossH returns to the forums in a few months (assuming his Retreat survived the fires and customers have returned), he might be interested to know that the 256KB SRAMs work perfectly using the Catalina RamTest program and 8K of cache.
Unfortunately, the Catalina EEPROM Loader gave me an error when I attempted to save an XMM program in EEPROM:
It's almost like the Loader is trying to write across the EEPROM physical boundary, which obviously won't work.
So, I loaded up Tachyon and took advantage of its excellent diagnostic capability. I had it fill the entire 512KB EEPROM memory with different characters (with the last tested value of $5A shown here), and it worked perfectly.
Here are some snippets to summarize what I saw:
As you can see, Tachyon was able to seamlessly write and read across EEPROM physical boundaries.
@RossH mentioned this earlier in this thread:
If I'm reading this post correctly he's using 32-byte writes/reads but for some reason the Catalina EEPROM Loader isn't working correctly and outputs the previously mentioned error. If I can find the default settings for this loader maybe I can try different values and see if I can find something that works.
So my question to you is what size write/read blocks does Tachyon use, because whatever it's using works, and whatever the Catalina EEPROM loader uses, doesn't.
Thanks.
<START> <DEVICE ADDR> <WRITE> <16-bit MEMORY ADDRESS> <128 BYTES> <STOP>
Because the device may be busy programming a page of data, the device address is polled for an ack to indicate when it is ready along with a timeout to prevent hanging.
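The page write and ACK polling described above can be sketched roughly as follows. This is not Tachyon's code: `FakeEeprom` is a made-up stand-in for a real I2C device, and the 20 ms timeout is an assumed value chosen only for illustration.

```python
import time

PAGE_SIZE = 128  # page-write size quoted above

class FakeEeprom:
    """Hypothetical stand-in for an I2C EEPROM: NACKs while 'programming'."""
    def __init__(self, busy_polls=3):
        self.mem = {}
        self.busy_polls = busy_polls   # polls that NACK after each page write
        self._busy = 0

    def address_acked(self):
        """Simulate <START> <DEVICE ADDR>: True means the device ACKed."""
        if self._busy > 0:
            self._busy -= 1
            return False
        return True

    def page_write(self, mem_addr, data):
        """Simulate <16-bit MEMORY ADDRESS> <data bytes> <STOP>."""
        for i, b in enumerate(data):
            self.mem[mem_addr + i] = b
        self._busy = self.busy_polls   # internal write cycle begins at STOP

def wait_ready(dev, timeout_s=0.02):
    """Poll the device address for an ACK; give up after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if dev.address_acked():
            return True
    return False

def write_page(dev, mem_addr, data):
    """Write at most one page, waiting for any earlier write cycle to finish."""
    if len(data) > PAGE_SIZE:
        raise ValueError("more than one page of data")
    if not wait_ready(dev):
        raise TimeoutError("EEPROM did not ACK before the timeout")
    dev.page_write(mem_addr, data)
```

The timeout is what keeps a missing or stuck device from hanging the caller: `write_page` raises instead of polling forever.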
I use the LY68L6400, an 8MB SPI RAM as an option on the P2. It's cheap!
Thanks for the heads up on the LY68L6400 SPI SRAMs. I couldn't find a US distributor for them yet, but I did find them on Aliexpress. A batch of 10 goes for about $17.50.
How are they working out for you? Any troubles with them? I've had no problems with the ISSI 512KB SPI SRAMs so far...
I want to dig a little deeper on how Tachyon does EEPROM writes/reads as well as the techniques you employ to get past the EEPROM physical barriers in a ganged EEPROM arrangement.
Since I'm using a potpourri of EEPROMs (64KB on board the FLiP, and 64KB, 128KB, and 256KB on my expansion board), Tachyon has no a priori knowledge of how many EEPROMs it's working with, nor the memory size of each one. Yet your algorithm effectively determines this and adjusts accordingly to seamlessly write/read across EEPROM physical boundaries.
I've written code to write to a GPS receiver on an I2C bus, but not EEPROMs. There's no such thing as Page reads/writes in this case, as I just send a string of characters to the GPS receiver and look for ACK/NACK responses to verify reception of the string.
If you're using EEPROMs <= 64KB in size, the entire A15->A0 physical address is contained within the Memory Address bytes, with the Device Address byte containing device select fields A2, A1, A0 (allowing a total of 8 of these sized devices on the I2C bus).
However, if you're using a 128KB EEPROM, then device select field A0 is replaced with physical address A16 (thus restricting you to a total of 4 of these devices on the bus).
Finally, if you're using a 256KB EEPROM, then device select fields A1 and A0 are replaced with physical addresses A17 and A16, respectively, which limits you to a maximum of 2 such devices on the bus.
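The addressing scheme above can be illustrated with a small sketch. It assumes the usual `1010` device-type prefix used by 24-series EEPROMs, and it assumes the chips are strapped so that linear address bits 18..16 fall into the A2/A1/A0 positions of the device byte. The function name `split_address` is hypothetical.

```python
def split_address(addr, read=False):
    """Map a linear address to (device address byte, 16-bit memory address).

    Assumption of this sketch: address bits 18..16 occupy the three
    select positions of the device byte, whatever each chip calls them
    (A2,A1,A0 for 64KB parts; A2,A1,A16 for 128KB; A2,A17,A16 for 256KB).
    """
    if not 0 <= addr < 0x80000:
        raise ValueError("address outside the 512KB range")
    select = (addr >> 16) & 0x7
    device_byte = 0xA0 | (select << 1) | (1 if read else 0)
    return device_byte, addr & 0xFFFF

# Example: first byte of the second 64KB block, as a write
device_byte, mem_addr = split_address(0x10000)
print(hex(device_byte), hex(mem_addr))
```

Seen this way, a 128KB or 256KB part simply answers to two or four consecutive device bytes, so software that puts the upper address bits into the device byte does not need to know where one chip ends and the next begins.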
OK, with that out of the way, and since you're using a 128-byte Page Write scheme for the EEPROMs, I'm going to make the following assumptions and you can tell me if I'm on the right track or not:
1. Tachyon will send this packet to the EEPROM as you described:
<START> <DEVICE ADDR> <WRITE> <16-bit MEMORY ADDRESS> <128 BYTES> <STOP>
2. If the bytes you attempt to write fit within its physical boundary, the EEPROM will respond with ACKs for each byte you write.
3. Tachyon will then send this command to verify the write was successful:
<START> <DEVICE ADDR> <READ> <16-bit MEMORY ADDRESS> <128 BYTES> <STOP>
4. Assuming all is well, Tachyon will advance the memory address to the start of the next block of data and continue the write process until the task is complete.
5. However, if Tachyon attempts to write past its physical boundary, the EEPROM will respond with a NACK for the last byte it attempted to write.
6. If byte N failed, then Tachyon will reduce the number of bytes to send to 128 - (128 - N), i.e. N bytes, and retry.
7. If successful, Tachyon will send the Read command to verify N bytes were written.
8. If the Read was successful, then Tachyon will advance the memory address to the start of the next block of data and attempt to write 128 bytes again.
9. But, if Step 6 fails, then Tachyon will attempt again X number of times until a timeout value is reached, in which case it will quit and provide an error message.
So, if this process is correct, then Tachyon will dynamically adjust the number of bytes it will attempt to write until it achieves success or a timeout error occurs.
This technique thus allows Tachyon to seamlessly bridge across physical boundaries of multiple EEPROMs regardless of how many physical devices are on the bus (up to 8 total), and/or their memory size.
Does this make sense, and is this similar to how you achieve the successful bridging?
Page writes can never cross boundaries because they always start on an aligned page address. You cannot start a page on the last byte, for instance. But page writes take the same amount of time as a byte write, probably because that's what they are doing internally. I found that the rated endurance of 1M write cycles (typical) is in fact a bare minimum figure that you can count on. I endurance tested an EEPROM once and gave up when it was still working normally after several million cycles.
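As a rough illustration of why aligned pages stay inside a device, here is a sketch that splits an arbitrary write into chunks that never span a page boundary. It is not Tachyon's code; `page_chunks` is a hypothetical helper, and the 128-byte default comes from the page size mentioned earlier in the thread.

```python
def page_chunks(start, data, page_size=128):
    """Yield (address, chunk) pairs that never span a page boundary."""
    offset = 0
    while offset < len(data):
        addr = start + offset
        room = page_size - (addr % page_size)   # bytes left in this page
        chunk = data[offset:offset + room]
        yield addr, chunk
        offset += len(chunk)

# Example: a 300-byte write that starts 8 bytes before a page boundary
for addr, chunk in page_chunks(120, bytes(300)):
    print(addr, len(chunk))
```

Because each device's size is a whole multiple of the page size, a chunk that stays inside one page also stays inside one device.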
If you are trying to mix 32kB (256kb) devices with larger devices then that's your problem and the solution is simple, just don't.
I buy the SPI RAM from LCSC, whom I can highly recommend. To me they are a bit like an Asian Mouser; they are not guaranteed to have every part, but the parts they do have are at very good prices. For instance, if you buy 10 of those chips they are 61 cents each. Much cheaper than Aliex.
P.S. Tachyon doesn't need "expanded memory" as it is very compact and fast and runs directly out of 32k RAM, even for very large applications which I have written. It does however use the upper 32k of a normal 64kB EEPROM to hold ROMS which are the named binary images that can be loaded at runtime into cogs such as VGA, F32, SIDcog, UARTS, etc. That way unlike the image taking up RAM space as in Spin, you can have all that functionality but not use up any RAM.
EDIT: I've just attached a capture of the console startup and a quick look around including mounting an SD and opening and examining a file.
Which means that all four EEPROMs used are mapped like this:
As we know, for EEPROMs 64KB and smaller, the Device Byte A2,A1,A0 fields are external physical inputs to essentially select up to 8 of these on the I2C bus.
For the 128KB EEPROM, Device Bit A0 has been replaced with memory location A16, while the physical pin formerly known as A0 became NC.
And for the 256KB EEPROM, Device Bit A1 has also been replaced with memory location A17, with physical pins formerly known as A1 and A0 replaced with NC.
So, looking at the above table, the EEPROMs are properly segmented and mapped into their expected ranges.
Indeed, Tachyon confirmed this by successfully writing to and reading from all EEPROM memory locations from $00000 to $7FFFF.
However, the Catalina EEPROM loader failed while trying to access memory locations from 64KB to 128KB. @RossH had mentioned the Catalina loader is quite old and was only designed to work with EEPROMs up to 128KB. It also clearly has problems working with multiple EEPROMs on the I2C bus.
Hopefully in due time, when (if?) he returns to this forum he can take a closer look.
Of course, if I disable the FLiP internal 64KB EEPROM, and use only the 256KB EEPROM on the expansion board the Catalina loader will work. That's essentially what I did with my USB Project Board and it works fine.
The caveat is that the 256KB EEPROM will have to be placed on the expansion board where the 64KB one currently is. This is certainly doable, and will allow me to proceed with my Project, but the intent was to not mutilate the FLiP but instead allow full 512KB EEPROM access by simply plugging it into the expansion board.
Until the Catalina loader is fixed (or another software workaround is found), it's time to mutilate a FLiP, remove the 64KB and 128KB EEPROMs on the expansion board, and install the 256KB EEPROM instead...
Wow, I priced out 10 of these chips and I can have them delivered to the USA for $11.19, including shipping. Amazing!