Revisiting XMM Mode Using Modern Memory Chips


Comments

  • I think you are running into a problem with P2load's terminal.

    I sometimes see the opposite effect: chars with spaces in between. As much as I praise Spin2Gui and Fastspin, there is some issue with the terminal of P2load. I had no time to figure this out, but I had some strange experiences last week while fighting to integrate TAQOZ and Fastspin-compiled stuff.

    My current theory is that the terminal is effing up on some character combinations. A couple of times I needed to hex out the data, or the terminal would misbehave.

    Since my goal is somewhere else I just live with it and hex out when problems arise.

    But maybe you are fighting windmills: your program is fine, and the output of the terminal f..es up sometimes.

    I have not investigated this deeply, but P2load's terminal is - hmm - misbehaving sometimes.

    Mike.
    I am just another Code Monkey.
    A determined coder can write COBOL programs in any language. -- Author unknown.
    Press any key to continue, any other key to quit

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this post are to be interpreted as described in RFC 2119.
  • Wingineer19 Posts: 70
    edited 2019-07-29 - 17:49:15
    Mike,

    It could be some problem with the loader, but it's more likely that some memory within the XMM Kernel itself is being overwritten somehow.

    Here's some sample code that writes to XMM Memory via the Cache (which works fine):
     XMM_WritePage     andn  outa,RamSel           'RamSel=Lo (SRAM Active)
                       call  #SendWriteCmd         'Configure SRAM For Writing
     :HubData        rdbyte  RamData,Hub_Addr      'HubRam -> RamData (This Is A Byte)
                        mov  RamLoop,#02           'RamLoop=2 (Send 2 Nibbles To SRAM)
                       call  #SendRamData          'RamData -> SRAM
                        add  Hub_Addr,#01          'Hub_Addr=Hub_Addr + 1
                        add  XMM_Addr,#01          'XMM_Addr=XMM_Addr + 1
                       djnz  XMM_Len,#:HubData     'if(--XMM_Len > 0) goto HubData
                         or  outa,RamSel           'RamSel=Hi (SRAM Inactive)
     XMM_WritePage_ret  ret                        'Return
    

    Notations:
    XMM_Len: Number of Bytes to write to SRAM
    Hub_Addr: Hub Address location from which the Bytes are to be fetched.
    XMM_Addr: The SRAM location to which the Bytes are to be written.

    The SendWriteCmd function is benign. It simply sends the SRAM Write Command ($02) with the address appended to it (for example, $0207ABCD) to the SRAM stating that it wants to write data to it at address $07ABCD.

    The SendRamData function writes a Byte to this location (with the Byte sent as 2 Nibbles because this is a 4-bit data bus).
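
    As a rough illustration of the two helpers described above (a Python model, not the PASM driver itself): the command word is the command byte followed by a 24-bit address, and each data byte goes out as two nibbles. The function names here are made up for the sketch, and the high-nibble-first order is an assumption, since the post doesn't say which nibble is sent first.

```python
def command_word(cmd, addr):
    """Pack an 8-bit SRAM command and a 24-bit address into one 32-bit word."""
    if not 0 <= cmd <= 0xFF or not 0 <= addr <= 0xFFFFFF:
        raise ValueError("cmd must fit in 8 bits, addr in 24 bits")
    return (cmd << 24) | addr


def byte_to_nibbles(value):
    """Split a byte into two 4-bit values (ASSUMED order: high nibble first)."""
    return [(value >> 4) & 0xF, value & 0xF]


WRITE_CMD = 0x02  # write command prefix, as given in the post
READ_CMD = 0x03   # read command prefix, as given in the post

print(hex(command_word(WRITE_CMD, 0x07ABCD)))
print(byte_to_nibbles(0xA5))
```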

    Now, here's the code to write to XMM Memory in Direct Mode (which doesn't work):
     XMM_WriteLong        mov   XMM_Len,#04       'XMM_Len=4 (Write Long)
     XMM_WriteMult       andn   outa,RamSel       'RamSel=Lo (SRAM Active)
                         call   #SendWriteCmd     'Configure SRAM For Writing
                          add   XMM_Addr,XMM_Len  'XMM_Addr=XMM_Addr + XMM_Len
     XMM_Src              mov   RamData,0-0       'CogRam -> RamData
                          shl   XMM_Len,#01       'XMM_Len (Bytes) -> XMM_Len(Nibbles)
                          mov   RamLoop,XMM_Len   'RamLoop=XMM_Len (Write Nibbles)
                         call   #SendRamData      'RamData -> SRAM
                           or   outa,RamSel       'RamSel=Hi (SRAM Inactive)
     XMM_WriteMult_ret
     XMM_WriteLong_ret    ret                     'Return
    

    In Direct Mode, XMM_Len can only have values of 1,2, or 4.

    Notice that the exact same two functions used during Cache write (SendWriteCmd and SendRamData) are used here as well.

    In this case the HubRam isn't used, so the Byte(s) are fetched from a CogRam location specified by the Kernel:
    XMM_Src     mov   RamData,0-0
    

    With the 0-0 overwritten by the Kernel via the MOVS instruction, thus pointing to the CogRam location.

    The XMM Read Function, which works in Cached mode is this:
     XMM_ReadPage        andn  outa,RamSel         'RamSel=Lo (SRAM Active)
                         call  #SendReadCmd        'Configure SRAM For Reading
     :XmmData             mov  RamLoop,#02         'RamLoop=2 (Grab 2 SRAM Nibbles)
                         call  #GetRamData         'SRAM -> RamData
                       wrbyte  RamData,Hub_Addr    'RamData -> HubRam
                          add  Hub_Addr,#01        'Hub_Addr=Hub_Addr + 1
                          add  XMM_Addr,#01        'XMM_Addr=XMM_Addr + 1
                         djnz  XMM_Len,#:XmmData   'if(--XMM_Len > 0) goto XmmData
                           or  outa,RamSel         'RamSel=Hi (SRAM Inactive)
     XMM_ReadPage_ret     ret                      'Return
    

    Notations:
    XMM_Len: Number of Bytes to read from SRAM
    XMM_Addr: The SRAM location from which the Bytes are to be fetched.
    Hub_Addr: Hub Address location to which the Bytes are to be written


    The SendReadCmd is just like the SendWriteCmd, except it uses a $03 (read) instead of the $02 (write) command prefix. It's followed by the address (for example, $0307ABCD) and sent to the SRAM stating that it wants to read data from this address.

    The GetRamData function reads a Byte from this location (with the Byte reconstructed from two successive Nibble reads, again due to the 4-bit data bus)

    Now, here's the Direct Read Function, which doesn't work:
     XMM_ReadLong         mov   XMM_Len,#04        'XMM_Len=4 (Read A Long)
     XMM_ReadMult        andn   outa,RamSel        'RamSel=Lo (SRAM Active)
                         call   #SendReadCmd       'Configure SRAM For Reading
                          add   XMM_Addr,XMM_Len   'XMM_Addr=XMM_Addr + XMM_Len
                          shl   XMM_Len,#01        'XMM_Len (Bytes)->XMM_Len (Nibbles)
                          mov   RamLoop,XMM_Len    'RamLoop=XMM_Len (Read Nibbles)
                         call   #GetRamData        'SRAM -> RamData
     XMM_Dst              mov   0-0,RamData        'RamData -> CogRam
                           or   outa,RamSel        'RamSel=Hi (SRAM Inactive)
     XMM_ReadMult_ret
     XMM_ReadLong_ret     ret                      'Return
    

    With XMM_Len once again restricted to values of 1,2, or 4 in Direct Mode.

    Notice how it uses the exact same two functions (SendReadCmd and GetRamData) that the Cached function uses.

    Again, the HubRam isn't used, so the Byte(s) are to be written to a CogRam location specified by the Kernel:
    XMM_Dst     mov 0-0,RamData
    

    With the 0-0 overwritten by the Kernel via the MOVD instruction thus pointing to the CogRam location.
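
    For reference, MOVS and MOVD on the Propeller 1 each replace only a 9-bit field of the target register (bits 8..0 for the source field, bits 17..9 for the destination field) and leave the other bits alone. A small Python model of that behaviour, with helper names invented for this sketch and an arbitrary example instruction word:

```python
FIELD = 0x1FF          # cog addresses are 9 bits wide
MASK32 = 0xFFFFFFFF    # keep results within 32 bits


def movs(instr, value):
    """Model MOVS: replace bits 8..0 (source field) of a 32-bit instruction."""
    return (instr & ~FIELD & MASK32) | (value & FIELD)


def movd(instr, value):
    """Model MOVD: replace bits 17..9 (destination field) of a 32-bit instruction."""
    return (instr & ~(FIELD << 9) & MASK32) | ((value & FIELD) << 9)


template = 0xA0BC0000  # example word whose source and destination fields are zero
patched = movd(movs(template, 0x1F0), 0x045)
print(hex(patched))
```

    In this model, which register gets modified depends only on where the patch is aimed (XMM_Src or XMM_Dst here); the upper bits of the patched instruction are untouched.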

    There's nothing mysterious with the SendWriteCmd, SendRamData, SendReadCmd, and GetRamData functions, so I'm just wondering if the MOVS and MOVD process within the Kernel is somehow overwriting portions of the Kernel itself and trashing the screen...

  • jmg Posts: 13,778
    msrobots wrote: »
    ..
    But maybe you are fighting windmills, your program is fine, the output of the Terminal f..es up sometimes.
    I have not investigated this deep, but P2loads Terminal is - hmm - misbehaving sometimes.

    Yes, it is a good idea to have more than one terminal, for simple 'sanity checks'.

    So something is definitely going on here but RamTest can't complete the TRIV confirmation as shown by the Screen trashing.
    Likewise, if we look at the CMPX Screen:
    It seems as if CMPX is attempting to write "Passed" but only the "asd" portion is shown mixed in with a lot of other garbage.
    ..
    msrobots wrote: »
    to me it looks like every second letter is missing
    DEADBEEF - dEaDbEeF - EDEF

    Well spotted. It is not 'trashing' in the classic sense, as there is no garbage appearing. It's a somewhat well behaved failure.
    The effect is consistent with code being too fast for a serial reporting link - how does it change, if you change the BAUD rate ?
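
    The pattern msrobots spotted can be checked quickly: keeping only every second character of the expected strings reproduces what appeared on the screen. A short Python check (the helper name is made up for this illustration):

```python
def every_second_char(text):
    """Return the characters that survive if every other one is dropped."""
    return text[1::2]


print(every_second_char("DEADBEEF"))
print(every_second_char("Passed"))
```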


  • We don't see the code that uses XMM_Addr so I can't tell if it is truly relevant, but in the Cache version it is incremented by one for each time through the loop, while in the Direct version it is updated once prior to the call that transfers the data. For bytes XMM_Addr should update at the same rate, so do byte transfers work with both methods? What size transfers are used in the test that corrupts the display?
  • Wingineer19 Posts: 70
    edited 2019-07-30 - 00:55:38
    AJL wrote: »

    We don't see the code that uses XMM_Addr so I can't tell if it is truly relevant, but in the Cache version it is incremented by one for each time through the loop, while in the Direct version it is updated once prior to the call that transfers the data. For bytes XMM_Addr should update at the same rate, so do byte transfers work with both methods? What size transfers are used in the test that corrupts the display?

    I've attached the latest XMM driver functions for review.

    The XMM_Addr, Hub_Addr, and XMM_Len variables are provided to my functions by the Kernel and outside the control of my driver. If there was a way to capture this information it would be helpful for troubleshooting.

    In Direct Mode the XMM_Len value is restricted to 1,2, or 4 bytes per function call. I know for instruction fetches this value is always 4, but I don't know what the most common value is for data transfers. That's a question @RossH can answer.

    Since my Direct Mode functions don't mess with XMM_Len (aside from copying its value into the RamLoop variable -- then converting RamLoop into Nibbles), it doesn't really matter where in the function that XMM_Len is added to XMM_Addr. In the examples above it's added near the start of the function, but in my latest code it's added at the end of the function. It makes no difference as far as the test failure goes.

    In Cached Mode the value of XMM_Len can be much greater than 4, likely hundreds (or maybe thousands) of Bytes, as requested by the Kernel. In this case my functions simply increment both Hub_Addr and XMM_Addr by one during each iteration through the loop. The final result is that both XMM_Addr and Hub_Addr have both been incremented by XMM_Len, similar to how the XMM_Addr value has been incremented by XMM_Len in the Direct Mode functions.
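
    The equivalence described above can be sketched in Python. This is only a model of the address bookkeeping, with invented function names, and it assumes XMM_Len is at least 1 on entry:

```python
def cached_advance(xmm_addr, hub_addr, xmm_len):
    """Model the Cached Mode loop: one byte per pass, both addresses step by 1."""
    while xmm_len > 0:
        hub_addr += 1
        xmm_addr += 1
        xmm_len -= 1
    return xmm_addr, hub_addr


def direct_advance(xmm_addr, xmm_len):
    """Model the Direct Mode update: a single add of XMM_Len (1, 2, or 4)."""
    if xmm_len not in (1, 2, 4):
        raise ValueError("Direct Mode only transfers 1, 2, or 4 bytes")
    return xmm_addr + xmm_len


print(cached_advance(0x07ABCD, 0x1000, 4))
print(direct_advance(0x07ABCD, 4))
```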

    The Loader uses the Cached Mode functions to transfer the program to the propeller. In my latest code it transfers single bytes, so the Loader is responsible for arranging these bytes in the proper Little-Endian order for the program.

    However, I have used the Long version of the Cached Mode functions (commented out in the latest code), and they work as well. In fact, there's a noticeable speed difference within my program when the Cached Mode functions are operating using Long transfers as opposed to Bytes.

    Catalina provides a very nice utility called RamTest that allows you to verify operation of your memory drivers. In every case I've tried, the Cached Mode functions work, but the Direct Mode ones don't.

    It's a mystery to me because both the Cached Mode and Direct Mode functions use the exact same memory command functions (SendWriteCmd or SendReadCmd) and data functions (SendRamData or GetRamData).

    As you can see from the code listing there's not a lot of difference between the Cached Mode functions (XMM_WritePage or XMM_ReadPage) versus the Direct Mode functions (XMM_WriteLong/XMM_WriteMult or XMM_ReadLong/XMM_ReadMult), yet the former work fine while the latter don't. Or they are working but trash the serial display in the process..

    Oh, in answer to a previous question, I haven't been able to try different serial baud rates because I don't know how to configure the RamTest Display to do that.

  • RossH Posts: 4,329
    edited 2019-08-03 - 08:06:44
    Hello @Wingineer19

    Sorry not to respond sooner - been away for a few days. And tomorrow I have to travel again, but I will take a few hours tonight to look at your code.

    First, to respond to your very last issue (serial baud rates) - you can adjust the baud rate used by the RAM Test program by adjusting the value of SIO_BAUD in your CUSTOM.DEF file. You may want to try 230400 and see if that solves your problem. However, be aware that this will change ALL your serial baud rates, including the baud rate used by the various program loaders. So if you change it for RAM Test, you may also need to rebuild all the utilities, and also specify the new baud rate when using payload etc.

    Your problem with the output of the RAM Test program could be simply that it is accessing RAM too fast and trying to output the serial I/O too fast for the baud rate to keep up. The serial I/O plugin used by the RAM Test program is specifically intended to be consumed by PropTerminal (available from http://www.insonix.ch/propeller/prop_term.html), and may not work well with other terminal emulators. I didn't write either the serial I/O plugin or the PropTerminal program, so I am not able to help much if this is misbehaving.
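
    To put rough numbers on the baud rate question: assuming 8N1 framing (10 bits on the wire per character, which is an assumption and not something stated in this thread), the link's upper limit can be estimated like this:

```python
def chars_per_second(baud, bits_per_char=10):
    """Upper bound on characters per second (ASSUMES 8N1: 10 bits per char)."""
    return baud / bits_per_char


for baud in (115200, 230400):
    print(baud, chars_per_second(baud))
```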

    I will look at the other issues and post again in a few hours.

    Ross.

    OOPS - AN UPDATE: It looks like PropTerminal cannot be used with baud rates greater than 115200 baud. Upping the baud rate is still worth a try, but you will have to use another terminal program.

    Catalina - a FREE ANSI C compiler for the Propeller.
    Download it from http://catalina-c.sourceforge.net/
  • I've attached the latest XMM driver functions for review.

    Hello @Wingineer.

    I've been looking at this code for a couple of hours now, and I can't see what is going wrong :(

    I'll try again tomorrow.

    Ross.


  • Hi @Wingineer19

    One thing occurred to me over night - you might try installing a clean copy of Catalina and trying your Custom XMM code with that. In all your experimenting, it seems possible you may have edited the source of either the kernel or the RAM test program.

    Ross.
  • Hi @RossH,

    Welcome back!

    I uninstalled Catalina and installed your latest release (3.16).

    I copied over all of my CUSTOM files to the Target folder.

    I went to the Utilities folder and typed:

    build_ram_test CUSTOM CACHED_4K

    (I wanted to verify RamTest is still happy with the Cached functions before attempting the Direct ones)

    When it completed the task, without any errors by the way, I then typed: RamTest

    And my PC responded:

    'ramtest' is not recognized as an internal or external command, operable program or batch file

    Sigh. I went back and checked all of the environment variables and both Catalina and Catalina/bin are still within the Windows PATH.

    I then did a search for RamTest within the entire Catalina folder and Windows showed that the file didn't exist.

    Any idea what happened to it? This is odd.

    OK, so the RamTest idea didn't work because my PC can't find it.

    I then decided to give build_utilities a try. I chose CUSTOM, no Flash, yes SRAM, and NO Cache.

    Then I started Catalina, pulled up my MenuTest program, unchecked the 4K cache box, and compiled. No warnings, no errors, no problem. Then I chose Download To XMM RAM And Interact and...Nothing. So Catalina doesn't like the Direct functions.

    I then went back to the build_utilities process but this time chose the 4K cache option. Went back to Catalina, checked the 4K Cache box, recompiled, chose Download to XMM RAM And Interact, and voila!, the Main Menu appeared.

    So as before, the Cached functions work, but Direct didn't.

    If I can ever locate RamTest maybe it can shed some light on this issue...

    I miss RamTest :(

    Jeff
  • RossH Posts: 4,329
    edited 2019-08-04 - 05:45:53
    Hi @Wingineer19

    There is a separate "build_ram_test" batch file in the utilities folder. It will leave the binaries in that folder - you may have to manually copy them elsewhere.

    Ross.
  • Hi @RossH,

    Or, one can use this command after running build_ram_test CUSTOM (for testing in Direct Mode instead of Cached)

    payload ram_test_pc -i

    (BTW, I still can't find anything called RamTest anywhere within the Catalina folder. I don't know what happened to it, but I suppose I can create it as a batch file containing the above command)

    Anyway, I was able to use the above command, and here's what I got for the TRIV Test:
    newtriv.jpg

    And for the CMPX Test:
    newcmpx.jpg

    So the insanity persists. And I can't for the life of me figure out how my code is trashing the screen.

    But whatever is happening, Catalina doesn't like Direct Mode either since, as I posted earlier, I get Nothing on the Interact Screen after downloading the compiled program to XMM memory.

    I don't think the problem is that the RAM updates are being sent to the screen too fast, and thus causing the trashing. It's like a memory overwrite within RamTest itself is happening.

    One other interesting issue: The Cached Functions don't work if I choose an 8K cache buffer. They work fine for 1K, 2K, or 4K, but not for 8K.

    I don't know if this 8K cached problem is related, even tangentially, to the Direct Mode problem, but I find it to be strange as well...
  • RossH Posts: 4,329
    edited 2019-08-04 - 06:54:52
    Hello @Wingineer19

    The reason CACHED_8K doesn't work is probably unrelated. It may simply be a space issue. Does your program have 8kb of Hub RAM spare?

    The other problem is also driving me crazy. But we'll solve it! :)

    And, just thinking about it - have you tried a cache size of CACHED_1K to see if that works?

    Ross.
  • Hi @RossH,

    I'm running the MenuTest program that I posted earlier on this forum. Not a lot going on with that program, so I assume that there's sufficient HubRam to work with. But I have no way to prove it right now.

    I just tried the 1K cache option, and it did work for both RamTest and Catalina. However, the MenuTest screens have slowed to a crawl. In CMM the Fibonacci Menu (Menu 1) updates in about 1 sec. In XMM with 4K cache it updates in about 2 sec. And in XMM with 1K cache it takes 4.5 sec :(

    My testing with the Cached functions indicates that a 4K cache is the "sweet spot". Reasonable performance while consuming only 4K of HubRam. I'll do some more experiments with 2K as it might be workable in the end. But 1K is definitely out, and 8K won't work.

    Regarding the Direct Mode problem, is there any way I can comment out the XMM_Src and XMM_Dst lines and somehow pass the data to a known variable within the Kernel? I just want to verify that these items aren't somehow involved in overwriting critical memory within the Kernel and trashing the screen.

    Maybe within the next few days we'll solve this screen trashing issue and get those Direct functions working...

    Well, that's it for me tonight. It's past 1:00 AM here so I'm calling it quits for now.


  • Hi @RossH,

    Just a quick update here.

    I ran build_ram_test CUSTOM to create the binaries for testing in Direct Mode.

    I then used Prop Terminal to load and execute the ram_test_pc.binary, and the result is the exact same screen trashing that your default terminal emulator had.

    So it seems unlikely that the screen trashing is caused by the terminal emulators themselves.

    It's odd how the screen output gets trashed by some simple routines that only fetch or put data to the XMM memory.

    The XMM driver code has no interaction with the screen printing routines at all.

    Yet, when I switch over to Cached Mode, which uses the exact same functions to read/write from/to the XMM Memory as Direct Mode, screen output is normal.

    Maybe this one is auditioning for the next episode of Unsolved Mysteries :)
  • Hi @RossH,

    I found something in CUSTOM_XMM.inc.

    Take a look here:
    XMM_Activate        or   dira,RamSigs            'Enable RamClk,RamSel,D3,D0
                        or   outa,RamSigs            'Set RamClk=1,RamSel=1,D3=1,D0=1
                      call   #MakeQuadMode           'Switch To Quad Mode
    XMM_Activate_ret
    

    Notice anything missing?

    Like maybe a ret at the end of this function?

    I just added it to my code, and now it passes RamTest in Direct Mode.

    How in the blazes did I miss that?

    And how in the world did it work in Cached Mode with this critical piece missing?

    I'm past due for an Eye Exam so maybe I better snap to it...

    But, it still doesn't work in Catalina.

    So, I'll continue to experiment...
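
    On why a missing ret can do damage beyond simply not returning: on the Propeller 1, CALL #X stores the return address in the source field of whatever register carries the label X_ret and then jumps to X. If there is no ret at that label, the label lands on the next instruction in the file, which gets patched and then executed as execution falls through. The toy Python model below illustrates this; the "neighbour" instruction is hypothetical, since the thread doesn't show what actually followed XMM_Activate_ret in the driver.

```python
def run(program, labels, max_steps=20):
    """Run a toy cog program; return (addresses visited, final memory)."""
    mem = [dict(ins) for ins in program]      # private copy of the program
    pc, trace = 0, []
    for _ in range(max_steps):
        ins = mem[pc]
        trace.append(pc)
        if ins["op"] == "halt":
            break
        if ins["op"] == "call":
            # CALL #X stores the return address in the source field of the
            # register labelled X_ret, then jumps to X.
            mem[labels[ins["target"] + "_ret"]]["src"] = pc + 1
            pc = labels[ins["target"]]
        elif ins["op"] == "ret":
            pc = ins["src"]                   # jump to the patched address
        else:
            pc += 1                           # any other instruction


    return trace, mem


caller = [{"op": "call", "target": "XMM_Activate"}, {"op": "halt"}]
body = [{"op": "or"}, {"op": "or"}]           # stand-ins for the function body
neighbour = [{"op": "mov", "src": 7}, {"op": "halt"}]  # hypothetical next code
labels = {"XMM_Activate": 2, "XMM_Activate_ret": 4}

with_ret = caller + body + [{"op": "ret", "src": 0}] + neighbour
without_ret = caller + body + neighbour       # label now sits on the neighbour

good_trace, good_mem = run(with_ret, labels)
bad_trace, bad_mem = run(without_ret, labels)
print(good_trace, good_mem[5]["src"])
print(bad_trace, bad_mem[4]["src"])
```

    In the version without the ret, the neighbouring instruction's source field is overwritten with the return address and control never goes back to the caller.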
  • Hi @RossH,

    Success!

    I got Catalina to work with the Direct Mode functions!

    But the speed makes watching paint dry seem lightning fast by comparison :blush:

    I ran my MenuTest program (posted earlier on this forum), and here's the Benchmark Speeds I get for displaying Menu 1 (the Fibonacci Screen):

    CMM: 1.06 seconds

    XMM SMALL Direct: 9.7 seconds
    XMM SMALL Cached 1K: 4.5 seconds
    XMM SMALL Cached 2K: 2.76 seconds
    XMM SMALL Cached 4K: 1.94 seconds
    XMM SMALL Cached 8K: Doesn't Work. No Response.

    XMM LARGE Direct: 16.08 seconds
    XMM LARGE Cached 1K: 6.29 seconds
    XMM LARGE Cached 2K: 5.53 seconds
    XMM LARGE Cached 4K: 4.63 seconds
    XMM LARGE Cached 8K: Doesn't Work. No Response.
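
    For a quick sense of scale, the timings above can be expressed as a slowdown relative to CMM. A small Python sketch using only the figures listed (the 8K rows are left out because they produced no result):

```python
CMM_SECONDS = 1.06

timings = {
    "XMM SMALL Direct": 9.7,
    "XMM SMALL Cached 1K": 4.5,
    "XMM SMALL Cached 2K": 2.76,
    "XMM SMALL Cached 4K": 1.94,
    "XMM LARGE Direct": 16.08,
    "XMM LARGE Cached 1K": 6.29,
    "XMM LARGE Cached 2K": 5.53,
    "XMM LARGE Cached 4K": 4.63,
}


def slowdown(seconds, baseline=CMM_SECONDS):
    """How many times slower a run is than the CMM baseline."""
    return seconds / baseline


for name, seconds in timings.items():
    print(f"{name}: {slowdown(seconds):.2f}x CMM")
```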

    The only question left is why doesn't the 8K Cache Mode work.

    But I'm not going to worry about it at this point. I got the answer to my question regarding Cached vs Direct Mode.

    And my choice will be to go with the XMM SMALL Cached 4K option. This seems to be the sweet spot for acceptable speed with minimal HubRam usage.

    This exercise also answered the question as to whether or not the Quad SRAM could operate in Direct Mode. It can, if you like watching paint dry. Otherwise, stick with the Cached Mode functions.

    Now I will turn to writing the functions for the Flash RAM so I can store my XMM program and execute upon startup.

    Many, many thanks to you for helping me resolve this issue. It's been a long time in coming.

    Anybody who is crazy enough to want to use this code for anything is free to do so. I consider this code to be public domain. Why anyone would want to is another issue within itself that is beyond the scope of this posting...

    Time for me to celebrate now with a nice cheeseburger :)

    Cheers,

    Jeff





  • Hello @RossH,

    Well I'm back at it again, stirring up trouble :)

    I've been thinking about the Flash RAM stuff, but then went back and re-read the Catalina manual discussing the various loaders. I came across this:
    EEPROM.binary – TINY or CMM programs can be loaded to any 32kb EEPROM –
    they do not need any special compile commands. XMM programs (SMALL or
    LARGE) can be loaded to EEPROMs of 64Kb or larger, provided they are compiled
    with the -C EEPROM option
    ...XMM programs (SMALL or LARGE) can be loaded into EEPROM if compiled with
    the EEPROM command line option:
    catalina othello.c -lci -C C3 -C SMALL -C EEPROM
    payload EEPROM Othello

    Is this saying that if I replace the 64KB EEPROM on my Propeller Board with a larger capacity one (like the 256KB AT24CM02 that I have in my possession) that Catalina can store my program in it, then transfer it to SRAM upon startup and execute?

    If so, then I don't need to fool with the Flash RAM stuff because the program can be stored in EEPROM.

    I currently have 512KB of SRAM on my board (that's what this whole Cached vs Direct Mode circus has been about).

    I could remove the existing 64KB EEPROM and replace it with two of these 256KB AT24CM02 EEPROMS on the I2C bus yielding a total of 512KB.

    Then maybe Catalina can store my program in EEPROM, transfer it to SRAM upon bootup, and run it from there?

    Am I understanding this correctly?

    Thoughts?
  • Notice anything missing?

    Like maybe a ret at the end of this function?

    I just added it to my code, and now it passes RamTest in Direct Mode.

    How in the blazes did I miss that?

    Don't feel too bad. I missed it too! :)

    However, I'm glad you finally got it sorted. I was beginning to worry you had found something fundamentally wrong with Catalina.

    But I have spent a large part of today standing by the side of the road waiting for roadside assistance to turn up to tow my broken down car to the auto mechanic, so now that I am finally home I am going to have a couple of large whiskeys and go to bed :(

    I'll look at the rest of your posts tomorrow!

    Ross.
  • RossH Posts: 4,329
    edited 2019-08-07 - 05:55:48
    While this should be possible in theory, I just checked the EEPROM loader code (Catalina_EEPROM_SIO_Loader.spin) and it says it can only handle EEPROMS up to 128k. I am not sure why - perhaps there simply were no larger EEPROMS available at the time.

    Also, it may need tweaking to handle programming multiple EEPROMS.

    Ross.

    Catalina - a FREE ANSI C compiler for the Propeller.
    Download it from http://catalina-c.sourceforge.net/
  • @RossH,

    The largest I2C EEPROM I've found is the 256KB Atmel AT24CM02.

    I have several in my possession right now. These new chips also support a 1MHz "Fast Mode Plus" transfer rate.

    A total of 2 of these can be placed on the same I2C bus, provided that the A2 pin on one is connected to ground, and the A2 pin on the other is connected to Vcc. This affects the addressing scheme so when you work it all out you will have a total of 512KB of EEPROM to play with.

    Assuming the read and write command sequences for these new chips are backwards compatible with the previous generation, the biggest difference would be in the addressing bits sent as part of the command sequence.
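    To make the addressing point concrete, here is a rough sketch in C of how a linear 512KB address might be split across the two chips. It assumes the device address byte is laid out as 1 0 1 0 A2 a17 a16 R/W, followed by two word-address bytes; that layout should be checked against the AT24CM02 datasheet before relying on it. The names `eeprom_addr_t` and `eeprom_split_address` are made up for illustration and are not part of Catalina or its loader.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes: two 256KB chips strapped with A2 = GND and A2 = Vcc. */
#define EEPROM_CHIP_BYTES  0x40000UL   /* 256KB per chip  */
#define EEPROM_TOTAL_BYTES 0x80000UL   /* 512KB for both  */

/* Hypothetical container for the three address bytes of one I2C transfer. */
typedef struct {
    uint8_t device;   /* device address byte, including the R/W bit */
    uint8_t addr_hi;  /* word address bits a15..a8 */
    uint8_t addr_lo;  /* word address bits a7..a0  */
} eeprom_addr_t;

/* Split a linear address (0 .. 512KB-1) into the bytes sent on the bus.
 * Assumed device byte layout: 1 0 1 0 A2 a17 a16 R/W. */
static eeprom_addr_t eeprom_split_address(uint32_t linear, int is_read)
{
    eeprom_addr_t out;
    uint32_t chip;     /* 0 = chip with A2 low, 1 = chip with A2 high */
    uint32_t a17_16;   /* top two address bits inside the chip        */

    assert(linear < EEPROM_TOTAL_BYTES);

    chip   = (linear >> 18) & 0x1u;
    a17_16 = (linear >> 16) & 0x3u;

    out.device  = (uint8_t)(0xA0u | (chip << 3) | (a17_16 << 1) | (is_read ? 1u : 0u));
    out.addr_hi = (uint8_t)((linear >> 8) & 0xFFu);
    out.addr_lo = (uint8_t)(linear & 0xFFu);
    return out;
}
```

    Under these assumptions the first chip covers linear addresses 0x00000 to 0x3FFFF and the second covers 0x40000 to 0x7FFFF, so the loader would see one flat 512KB space.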

    Maybe later this week I can scan through the EEPROM loader code, find the 128KB restriction, and see if I can raise it.

    I also need to verify that the Propeller can work with this larger 256KB EEPROM. Hopefully it can.

    Although I want the full 512KB storage capability, I seriously doubt my finished code will be anywhere near that size. Even the 256KB is likely overkill.

    Nevertheless, I still like the idea of 512KB just in case I need that extra margin.

    Jeff

  • Nevertheless, I still like the idea of 512KB just in case I need that extra margin.

    Well, all I can say is that it is certainly feasible, and I am happy to work with you to make it practical :)
    Catalina - a FREE ANSI C compiler for the Propeller.
    Download it from http://catalina-c.sourceforge.net/
  • Hi @RossH,

    Success!

    I got Catalina to work with the Direct Mode functions!

    But the speed makes watching paint dry seem lightning fast by comparison :blush:

    I ran my MenuTest program (posted earlier on this forum), and here are the benchmark times I get for displaying Menu 1 (the Fibonacci Screen):

    CMM: 1.06 seconds

    XMM SMALL Direct: 9.7 seconds
    XMM SMALL Cached 1K: 4.5 seconds
    XMM SMALL Cached 2K: 2.76 seconds
    XMM SMALL Cached 4K: 1.94 seconds
    XMM SMALL Cached 8K: Doesn't Work. No Response.

    XMM LARGE Direct: 16.08 seconds
    XMM LARGE Cached 1K: 6.29 seconds
    XMM LARGE Cached 2K: 5.53 seconds
    XMM LARGE Cached 4K: 4.63 seconds
    XMM LARGE Cached 8K: Doesn't Work. No Response.
    ...
    Cheers,
    Jeff

    Congratulations. How many pins are you using? Do they have to be special pins like 0 and up for these speeds (I seem to remember those needs for ramblade). Please define SMALL and LARGE. What is CMM? Am I correct in assuming that the cache is in hub ram (don't know where else you would put 4k)? What do these numbers mean in MB/sec? What chips are you using?

    Thanks,
    Hinv
  • hinv wrote: »
    Congratulations. How many pins are you using? Do they have to be special pins like 0 and up for these speeds (I seem to remember those needs for ramblade). Please define SMALL and LARGE. What is CMM? Am I correct in assuming that the cache is in hub ram (don't know where else you would put 4k)? What do these numbers mean in MB/sec? What chips are you using?

    Thanks,
    Hinv

    Thanks. Getting the Cached Mode driver to work was relatively easy. The Direct Mode driver posed more challenges because of the available space within the Kernel to get it to work. But in the end both were working so I could run my comparison tests.
    How many pins are you using?
    I'm using a Quad SPI SRAM chip, so 6 pins total: 1 CS, 1 CLK, 4 Address/Command/Data.
    Do they have to be special pins like 0 and up for these speeds (I seem to remember those needs for ramblade)
    No. The CS and CLK pins can pretty much be whatever you want. The Data pins, however, must be in a contiguous block of 4. They don't have to start at 0. In fact, I will be moving them up to occupy pins 25 to 22. As long as the driver code knows where pin D0 is, it takes care of the I/O positioning issues.
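    As a rough illustration of the positioning idea (written in C for readability, not the actual PASM driver), the only thing the code needs is the pin number of D0; everything else is a shift and a mask. The function names below are hypothetical, and `outa`/`ina` simply stand in for the 32-bit output and input register values.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering the four contiguous data pins D0..D3.
 * d0_pin is the pin number of D0 (22 for a block on pins 22..25). */
static uint32_t qspi_data_mask(unsigned d0_pin)
{
    assert(d0_pin <= 28);              /* D3 must still fit in 32 bits */
    return (uint32_t)0xF << d0_pin;
}

/* Return the output-register value with the nibble placed on the data
 * pins and every other pin left unchanged. */
static uint32_t qspi_put_nibble(uint32_t outa, unsigned d0_pin, uint8_t nibble)
{
    uint32_t mask = qspi_data_mask(d0_pin);
    return (outa & ~mask) | (((uint32_t)nibble & 0xFu) << d0_pin);
}

/* Extract the nibble currently present on the data pins. */
static uint8_t qspi_get_nibble(uint32_t ina, unsigned d0_pin)
{
    assert(d0_pin <= 28);
    return (uint8_t)((ina >> d0_pin) & 0xFu);
}
```

    With D0 on pin 22 the mask works out to 0x03C00000, and moving the block elsewhere only changes the `d0_pin` value passed in.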
    Please define SMALL and LARGE. What is CMM?
    These are memory models supported within Catalina. PropGCC does something similar:
    Compact Memory Model (CMM)=Code, Data, and Stack are within HubRam.
    Small Memory Model=Code is in XMM memory, Data and Stack are within HubRam.
    Large Memory Model=Code & Data in XMM memory, Stack within HubRam.

    Note that CMM will be the fastest because everything is within HubRam. Large Memory Model will be slowest since only the Stack is within HubRam.
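    To put the benchmark times quoted above on a common scale, a small helper like this one (a sketch, with a made-up function name) expresses each time as a slowdown factor relative to the 1.06 second CMM run. Times are passed in milliseconds and the result is in hundredths, so integer math is enough.

```c
#include <assert.h>

/* Slowdown relative to a baseline, in hundredths (183 means 1.83x).
 * Both times are in milliseconds; the result is rounded to nearest. */
static unsigned slowdown_x100(unsigned time_ms, unsigned baseline_ms)
{
    assert(baseline_ms > 0);
    return (100u * time_ms + baseline_ms / 2u) / baseline_ms;
}
```

    Using the posted figures, XMM SMALL Cached 4K (1940 ms) comes out at roughly 1.83x the CMM time, XMM SMALL Direct (9700 ms) at roughly 9.15x, and XMM LARGE Direct (16080 ms) at roughly 15.17x.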
    Am I correct in assuming that the cache is in hub ram (don't know where else you would put 4k)?
    Yes, that is correct. The XMM data is copied into the HubRam Cache via a dedicated Caching Cog, and the Kernel executing your program fetches this information from the Cache.
    What do these numbers mean in MB/sec?
    I haven't done an actual benchmark on execution time, but we're looking at KB/sec not MB/sec.

    I'm estimating that with the Caching enabled I'm seeing throughput rates of around 300KB/sec, with the rate varying somewhat depending upon how many bytes are fetched by the Caching Cog per cycle.

    The XMM memory operates in Sequential Access Mode so fetching 512 bytes instead of 128 bytes per cycle will yield a slightly higher throughput because you will only have to send the read/write command and address location once per cycle as opposed to 4 times if 128 bytes are fetched.
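    A back-of-the-envelope model of that effect, as a C sketch: it assumes quad mode moves one nibble per clock (2 clocks per byte) and treats the per-burst overhead (command, address, and any dummy cycles) as a parameter in byte-times. The 4-byte overhead used in the note below (1 command byte plus a 3-byte address) is an assumption, and the model ignores the Propeller's own loop overhead, so it only shows the bus-level trend. The function names are hypothetical.

```c
#include <assert.h>

#define CLOCKS_PER_BYTE 2u   /* quad mode: one nibble per clock */

/* Bus clocks for one burst: overhead bytes (command, address, dummy)
 * followed by the payload bytes. */
static unsigned burst_clocks(unsigned payload_bytes, unsigned overhead_bytes)
{
    return (overhead_bytes + payload_bytes) * CLOCKS_PER_BYTE;
}

/* Share of bus clocks that carry payload, in parts per thousand
 * (truncated), so 969 means about 96.9%. */
static unsigned payload_efficiency_permille(unsigned payload_bytes,
                                            unsigned overhead_bytes)
{
    assert(payload_bytes > 0);
    return (1000u * payload_bytes) / (overhead_bytes + payload_bytes);
}
```

    With an assumed 4 bytes of overhead, moving 512 bytes as four 128-byte bursts costs 4 x 264 = 1056 clocks versus 1032 clocks for a single 512-byte burst, which is about 96.9% versus 99.2% efficiency. That matches the "slightly higher" throughput described above.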

    At first glance the 300KB/sec speed isn't all that impressive, but it should be sufficient for my application.
    What chips are you using?
    I'm using the ISSI IS62/65WVS5128GBLL 512KX8 SPI SRAM chip that supports Quad Mode transfers. I bought them earlier this year from Mouser.com for a little more than $5 each. They appear to be a high demand item because they are still out of stock.

    Regarding Cached versus Direct Mode: Unless you are using a Parallel Interface with separate Address and Data lines (not multiplexed and/or using latches) the Direct Mode won't buy you anything.

    In fact, I found Direct Mode to be much slower than using the Cached Mode with these SPI SRAM chips.

    Furthermore, given the Sequential Access Mode feature of these SPI SRAMs, I wouldn't be surprised if Cached Mode would beat the Parallel multiplexed/latched arrangement on data throughput.

    In my case I can't spare the pins to go with separate Address and Data lines, so the Quad Mode SPI SRAM operating in Cached Mode was a reasonable compromise.







  • RossH wrote: »
    Well, all I can say is that it is certainly feasible, and I am happy to work with you to make it practical :)

    Sounds good. I'll take you up on this offer at the appropriate time.

    I bought a new USB Project Board from Parallax and decided to rearrange the XMM Memory and other components into a cleaner layout on this new board instead of the rat's nest I have on my existing board.

    As I await delivery of the ordered parts, I've moved on to writing various aspects of my Main application code...

    … While keeping a close eye on the P2 testing and troubleshooting issues...