Revisiting XMM Mode Using Modern Memory Chips


Comments

  • Looking at the 256KB AT24CM02 EEPROM, it appears to support sequential reads up to the maximum 256KB limit, then rolls back over to location $0. Page writes are limited to 256 bytes each.

    Since each 64kB block requires a different device address, I would have expected the rollover to be confined to 64kB, so I am surprised that it is a bit more practical in this regard.

    Tachyon Forth - compact, fast, forthwright and interactive
    P2 --- The LOT --- TAQOZ INTRO & LINKS --- P2 SHORTFORM DATASHEET --- TAQOZ RELOADED - 64kB binary with room to spare
    P1 --- Latest Tachyon with EASYFILE --- Tachyon Forth News Blog --- More
    PayPal me
    Brisbane, Australia
  • You really expect me to wait? :)

    If you can't wait, I have zipped up the contents of my current P1 target directory and attached it. I have left out the "CUSTOM" files so as not to overwrite yours, but please make a backup copy of all your files just in case ... and be aware that you USE THIS AT YOUR OWN RISK!!! :)

    Catalina - a FREE ANSI C compiler for the Propeller.
    Download it from http://catalina-c.sourceforge.net/
  • @RossH,
    An indicator could be how fast you saw the various MenuTest screens updating when you were running XEPROM. If the GPS Command Screen was updating pretty quickly then there's a chance the EEPROM approach might work. If it was moving at a snail's pace, then Flash wins.

    So what was your feel for the Menu display speed? Fast, slow, molasses?

    That there could give me a clue as to whether or not the EEPROM API approach should even be pursued...

  • The 512KB SRAMs I use consist of two 256KB SRAMs on the same die. I believe they will support sequential access up to the 256KB limit, at which point you must terminate the process and restart, since you can't cross the die boundary.

    Looking at the 256KB AT24CM02 EEPROM, it appears to support sequential reads up to the maximum 256KB limit, then rolls back over to location $0. Page writes are limited to 256 bytes each.

    The EEPROM page size is separate to the cache page size. The default EEPROM page size Catalina uses is 32 bytes, and it must be a divisor of 512, so it can never cross the EEPROM size boundary. I think this means that 2 sequential EEPROMs will work ok, whether they are in the same chip or in different chips.

    Of course, I have never actually tried it.
  • RossH wrote: »
    You really expect me to wait? :)

    If you can't wait, I have zipped up the contents of my current P1 target directory and attached it. I have left out the "CUSTOM" files so as not to overwrite yours, but please make a backup copy of all your files just in case ... and be aware that you USE THIS AT YOUR OWN RISK!!! :)

    Understood.

    Do I need to do anything with build_utilities in order to try this? Like saying it has Flash memory and setting the cache size?
  • @RossH,
    An indicator could be how fast you saw the various MenuTest screens updating when you were running XEPROM. If the GPS Command Screen was updating pretty quickly then there's a chance the EEPROM approach might work. If it was moving at a snail's pace, then Flash wins.

    So what was your feel for the Menu display speed? Fast, slow, molasses?

    That there could give me a clue as to whether or not the EEPROM API approach should even be pursued...

    I wasn't really paying attention to the speed. I think it was usable at 4K or 8K cache sizes, but I wouldn't have called it fast. But some of that is just down to the serial I/O speed, not the execution speed.
  • Do I need to do anything with build_utilities in order to try this? Like saying it has Flash memory and setting the cache size?

    No, I don't think so. Just compile with -C XEPROM -C SMALL -C CACHED_4K and then use the EEPROM loader in payload. For example, here are the commands I used:
    catalina menutest.c -lc -lserial4 -C CUSTOM -C XEPROM -C SMALL -C CACHED_4K 
    payload EEPROM menutest.binary
    
  • RossH wrote: »
    Just compile with -C XEPROM -C SMALL -C CACHED_4K and then use the EEPROM loader in payload. For example, here are the commands I used:
    catalina menutest.c -lc -lserial4 -C CUSTOM -C XEPROM -C SMALL -C CACHED_4K 
    payload EEPROM menutest.binary
    

    It works!

    For some reason it didn't like the CUSTOM platform so I changed it to QUICKSTART.

    No problems with compiling or uplinking.

    But it is pretty slow, even with the 8K cache.

    I will continue to experiment with it and see what, if anything, I can do to speed it up...

  • Cluso99 already has tiny P8XBlade2 modules and could supply them with 256kB, I'm sure, but I have a ton of various modules I have designed, such as the P8, which also has Flash. What are your requirements in terms of size, power, I/O, etc.?

    Essentially, something like the FLiP module, but with 256KB EEPROM, would be the minimum.

    If it also had on-board SRAM and/or Flash, that would be even better. That would give me the option of running XMM code directly from EEPROM, or from SRAM and/or Flash if desired.

    At this point I don't need a micro-SD Card, but I wouldn't mind if one was included.

    What types of P1 modules have you designed? Tell me more, please.
  • Cluso99 Posts: 15,477
    edited 2019-09-13 - 08:53:52
    My P8XBlade2 is linked in my signature.
    If you want a different EEPROM fitted, I can do that, provided it has the same footprint and you can send me the EEPROM. I use the CAT24C512YI-GT3 in a TSSOP-8 package (4.4 x 3 mm, 0.65 mm pitch). I use a stencil and oven to assemble, and it's not hand-soldering friendly! It has a handful of 0402 parts too.
    My Prop boards: P8XBlade2 , RamBlade , CpuBlade , TriBlade
    P1 Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
    Website: www.clusos.com
    P1: Tools (Index) , Emulators (Index) , ZiCog (Z80)
    P2: Tools & Code , Tricks & Traps
  • Hi @RossH,

    I'm finding XEPROM fascinating. Since my EEPROM supports Fast Mode Plus (1MHz) I've been able to speed it up somewhat by adjusting the delay in the API, as well as tweaking some of the Menu display stuff in my code. I'm using the 8K cache. The Menu display speed is slow but usable. If I found an EEPROM that supported high speed mode (3.4MHz) the performance would likely be even better. I've found some FRAM ones that support that speed, but at $20 each that's pretty steep. I may ultimately still have to use external Flash to get where I need to go, but I'm going to continue experimenting with XEPROM to see what can be done. I'm captivated by it. :)

    Anyway, that's not what this post is about. I'm working on something else that uses your standard Serial plugins (PC, PropTerminal, TTY, or TTY-VT100).

    I'm facing the age-old problem of needing to scan the keyboard for input, but not wait for a carriage return.

    Essentially I need something like the kbhit() or getkey() function, but there doesn't appear to be one available. Is it there but I'm missing it?

  • Essentially I need something like the kbhit() or getkey() function, but there doesn't appear to be one available. Is it there but I'm missing it?

    I think k_ready() is what you are looking for. See "catalina_hmi.h" for all the HMI (Human/Machine Interface) functions.
  • Wingineer19 Posts: 148
    edited 2019-09-14 - 22:09:41
    RossH wrote: »
    The EEPROM page size is separate to the cache page size. The default EEPROM page size Catalina uses is 32 bytes, and it must be a divisor of 512, so it can never cross the EEPROM size boundary. I think this means that 2 sequential EEPROMs will work ok, whether they are in the same chip or in different chips.

    Of course, I have never actually tried it.
    I now have two 256KB EEPROMs on the same I2C bus on my USB Project Board.

    One is physically mapped from $0000_0000 to $0003_FFFF while the other is mapped from $0004_0000 to $0007_FFFF.

    Whatever reading/writing scheme Tachyon uses apparently worked fine.

    I loaded Tachyon into HubRam, then issued this command:

    $0000 $7FFFF $FF EFILL --> ok

    Then this:

    $0000 $7FFFF EE DUMP

    And got this:
    0000.0000:   FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF    ................
    0000.0010:   FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF    ................
    0000.0020:   FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF    ................
    0000.0030:   FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF    ................
    0000.0040:   FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF    ................
    0000.0050:   FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF    ................
    
    All the way up to here, where I got some zeros at the end:
    0007.FF00:   FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF    ................
    0007.FF10:   FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF    ................
    0007.FF20:   FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF    ................
    0007.FF30:   FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF    ................
    0007.FF40:   FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF    ................
    0007.FF50:   FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF    ................
    0007.FF60:   FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF    ................
    0007.FF70:   FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF    ................
    0007.FF80:   00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00    ................
    0007.FF90:   00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00    ................
    0007.FFA0:   00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00    ................
    0007.FFB0:   00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00    ................
    0007.FFC0:   00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00    ................
    0007.FFD0:   00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00    ................
    0007.FFE0:   00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00    ................
    0007.FFF0:   00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00    ................ ok
    
    Don't know what the story is with these zeros, but Tachyon apparently had no problem transitioning from one EEPROM to the other across the 256KB boundary, so I think your suspicions might be correct:
    The default EEPROM page size Catalina uses is 32 bytes, and it must be a divisor of 512, so it can never cross the EEPROM size boundary. I think this means that 2 sequential EEPROMs will work ok, whether they are in the same chip or in different chips.
    In this case, we have two separate chips mapped sequentially on the same bus.

    So I guess the next question concerns the cache page sizes: would they ever attempt to cross the 256KB boundary, in which case they would fail?

    Obviously, if there were such a critter as a 512KB EEPROM that consists of a single die allowing sequential access across the entire memory range, then neither your EEPROM page sizes nor the actual XEPROM memory caching scheme would encounter any problem.

    Haven't seen such a critter on the market, though...

  • Hi @RossH,

    I've got another crazy question, something that is totally hypothetical, but I would appreciate your thoughts on it nevertheless.

    If I understand it correctly, CMM is a type of hybrid between pure LMM code and a series of tokens, resulting in code which is much more compact than LMM code but also slower to execute.

    In contrast, the XMM kernel is similar to the LMM kernel but with the added overhead of fetching LMM code from external memory instead of HubRam?

    Would it be possible to have a hybrid XMM kernel that fetches tokenized code from external memory, similar to how CMM does it from HubRam?

    I realize there would likely be another speed penalty, as the tokens were fetched from external memory, decoded, then executed, but perhaps this could be partially offset by having smaller code that would require fewer memory fetches?

    Just wondering about this and curious as to your thoughts on such a thing.
  • Just wondering about this and curious as to your thoughts on such a thing.

    Your understanding is correct. However, your suggestion is not practical - at least not on the P1. There is simply not enough space in a single cog to do it all. The CMM kernel only barely fits in a cog as it is - I even had to move some of the very basic floating point support routines out to another cog to make it fit (which I hated to do!).

    However, I have been giving some thought to how XMM might work on the P2, where 2 cogs can closely co-operate and even share RAM. I think having a combined 2-cog "kernel and cache" might mean XMM would work very well on the P2, but perhaps a 2 cog kernel could also combine CMM and XMM functionality.

    So many possibilities, so little time ... :(
  • Oh @RossH

    I like your way of thinking. I also think this would be a major thing to figure out: how to build a two-COG object that works in tandem. The basics seem to be there; gosh I wish I had more time for the P2.

    I am just another Code Monkey.
    A determined coder can write COBOL programs in any language. -- Author unknown.
    Press any key to continue, any other key to quit

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this post are to be interpreted as described in RFC 2119.
  • msrobots wrote: »
    gosh I wish I had more time for the P2.

    Me too! I think "2 cog" objects are going to be a big deal on the P2.
  • Wingineer19 Posts: 148
    edited 2019-10-17 - 19:38:02
    Hi @RossH,

    For some reason I'm still having problems getting the 8K cache option working in Catalina when using my external SRAMs.

    Please check your email for the test files used, including the CUSTOM_XMM memory driver.

    Program storage is achieved using the EEPROM option, but for ongoing testing and debugging, I'm just dumping the code directly to XMM RAM after compilation via the Code::Blocks Menus.

    I'm using a variant of the "RamPage2" external memory arrangement for this test, but without any Flash memory. The two SRAMs are running in Quad Mode. Each SRAM is connected to a 4-bit bus, yielding 8-bits total. So after the addressing information is strobed in for a read/write operation, 8 data bits are transferred upon each clock pulse.

    Transferring large blocks of data amortizes the initial read/write command overhead, resulting in decent transfer speed. Short of using HyperRAM, I'm guessing this is the most efficient external memory arrangement for quickly transferring 8-bit blocks of data with the fewest pins (10 total: 8 data, plus CS and CLK).

    Anyway, Catalina works fine (as does RamTest) using the 1K, 2K, and 4K cache options.

    RamTest also works fine with 8K cache.

    I just can't get Catalina to work with the 8K cache, regardless of what I try.

    This 8K cache problem persists whether I compile the program for the SMALL or the LARGE memory model. No difference: the program just won't run, and I get a blank screen.

    Originally I thought the issue might be excessive HubRam usage. Here's what compiling using SMALL shows:

    Build: default in MenuTest (compiler: Catalina C Compiler)
    catalina.exe -CCLOCK -O5 -CNO_SCREEN -CNO_KEYBOARD -CNO_MOUSE -CNO_HMI -CCACHED_8K -CSMALL -Clibserial4 -Clibma -Clibc -CCUSTOM -p1 -IC:\Programs\Compiler\Catalina\include -IC:\WorkCode\Catalina\Test -c MenuTest.c -o .objs\MenuTest.obj
    Catalina Compiler 3.17
    catalina.exe -o MenuTest.binary .objs\MenuTest.obj -CCLOCK -O5 -CNO_SCREEN -CNO_KEYBOARD -CNO_MOUSE -CNO_HMI -CCACHED_8K -CSMALL -lserial4 -lma -lc -CCUSTOM -p1
    Catalina Optimizer 3.16
    Catalina Optimizer 3.16
    Catalina Compiler 3.17
    code = 33520 bytes
    cnst = 1960 bytes
    init = 524 bytes
    data = 3512 bytes
    file = 72364 bytes
    Output file is MenuTest.binary with size 70.67 KB
    Process terminated with status 0 (0 minute(s), 10 second(s))
    0 error(s), 0 warning(s) (0 minute(s), 10 second(s))

    And here's what compiling using LARGE shows:

    Build: default in MenuTest (compiler: Catalina C Compiler)
    catalina.exe -CCLOCK -O5 -CNO_SCREEN -CNO_KEYBOARD -CNO_MOUSE -CNO_HMI -CCACHED_8K -CLARGE -Clibserial4 -Clibma -Clibc -CCUSTOM -p1 -IC:\Programs\Compiler\Catalina\include -IC:\WorkCode\Catalina\Test -c MenuTest.c -o .objs\MenuTest.obj
    Catalina Compiler 3.17
    catalina.exe -o MenuTest.binary .objs\MenuTest.obj -CCLOCK -O5 -CNO_SCREEN -CNO_KEYBOARD -CNO_MOUSE -CNO_HMI -CCACHED_8K -CLARGE -lserial4 -lma -lc -CCUSTOM -p1
    Catalina Optimizer 3.16
    Catalina Optimizer 3.16
    Catalina Compiler 3.17
    code = 37484 bytes
    cnst = 1957 bytes
    init = 524 bytes
    data = 3512 bytes
    file = 76768 bytes
    Output file is MenuTest.binary with size 74.97 KB
    Process terminated with status 0 (0 minute(s), 11 second(s))
    0 error(s), 0 warning(s) (0 minute(s), 11 second(s))

    In each case I'm seeing the DATA portion being 3512 bytes. Does this exceed the amount of HubRam that can be used?

    If it's not a HubRam memory overrun, any idea what's going on here?

    Whatever is happening, it only manifests when using the 8K cache option...

    This "Rampage2" XMM memory arrangement works perfectly in cache mode provided that the caching cog always transfers an even number of bytes. If it doesn't, the whole scheme breaks down, and that right there would cause the program not to run.

    However, I'm assuming that an even number of bytes is always transferred to/from the cache irrespective of the cache size used (1K, 2K, 4K, or 8K).

    I'm just perplexed that I can't get the program to run using 8K cache.

    Thoughts?


  • Hi @RossH,

    For some reason I'm still having problems getting the 8K cache option working in Catalina when using my external SRAMs.

    Please check your email for the test files used, including the CUSTOM_XMM memory driver.
    Or, we can cut through the clutter and take a look at the CUSTOM_XMM driver right now, since I assume that's where the problem resides. I've included it here for anyone who wants to take a look.

    I don't see why it works perfectly for cache sizes of 1K, 2K, and 4K but mysteriously doesn't work for 8K, at least not with Catalina, even though it works fine with RamTest.

    Why it doesn't work with one, but does with the other, is what has really thrown me...
  • Hello @Wingineer19

    I've also sent you an email on this. But here is the gist:

    It is probably not your driver, or the program size.

    I found and fixed a problem with Catalina's caching. It was a bug I introduced way back in Catalina 3.12 - I never spotted it because it only affects certain programs with certain cache sizes - your program with an 8K cache just happened to be one such combination!

    I have attached a new version of Cached_XMM.inc, which you should put in Catalina's "target" directory. Then recompile your program (and the utilities) to use an 8K cache.

    Let me know if it works.

    Ross.
  • RossH wrote: »
    Hello @Wingineer19

    I've also sent you an email on this. But here is the gist:

    It is probably not your driver, or the program size.

    I found and fixed a problem with Catalina's caching. It was a bug I introduced way back in Catalina 3.12 - I never spotted it because it only affects certain programs with certain cache sizes - your program with an 8K cache just happened to be one such combination!

    I have attached a new version of Cached_XMM.inc, which you should put in Catalina's "target" directory. Then recompile your program (and the utilities) to use an 8K cache.

    Let me know if it works.

    Ross.

    Yes sir, it works like a charm!

    As an added bonus, there also appears to be a noticeable increase in execution speed due to the additional caching.

    Thanks again for your help.
