Revisiting XMM Mode Using Modern Memory Chips - Page 4 — Parallax Forums

Revisiting XMM Mode Using Modern Memory Chips


Comments

  • Hi @RossH,

    I'm getting ready to sign off for the night, but I ran a quick RamTest after enabling the INTENSIVE and INVERT options in Catalina_XMM_RamTest.spin.

    Result? It passed both the TRIV and CMPX tests again.

    I'm wondering about this stuff I put in the code:

    Under ReadLong/ReadMult:
    XMM_Dst     mov    XMM_Temp,MemData
    
    Is this correct? I see under your example code you had this:
    XMM_Dst     mov    0-0,XMM_Temp
    
    Maybe I have them reversed? But if so, why did RamTest work?

    Also, is the 0-0 a placeholder for a variable? Not sure what it means.

    Likewise, under WriteLong/WriteMult I have:
    XMM_Src      mov    MemData,XMM_Temp
    
    and your example code has this:
    XMM_Src      mov    XMMTemp,0-0
    
    Again, maybe a reversal of some sort?

    Tomorrow I can try slipping a NOP or two in XMM_Activate and see what it does.

    For my MenuTest program I don't need the floating point library, so I can remove it.

    As far as removing the lcx library, I tried that some time ago and my va_list, va_start, va_end, and vsprintf functions under CduPrint stopped working for some reason. So I've kept this library ever since.

    Just tried compiling using the LARGE memory model with no caching and got this:
    LargeMemError.jpg

    OK, tomorrow I'll revert to a simple "Hello,World!" test and see where it takes me.

    It's very late here (or very early, depending upon how you want to look at it), so I'm calling it a night.

    I'll keep you posted tomorrow on the results of this ongoing adventure :smile:

    Again, thanks for your help with this issue.
  • RossH Posts: 5,462
    I'm wondering about this stuff I put in the code:

    Under ReadLong/ReadMult:
    XMM_Dst     mov    XMM_Temp,MemData
    
    Is this correct? I see under your example code you had this:
    XMM_Dst     mov    0-0,XMM_Temp
    
    Maybe I have them reversed? But if so, why did RamTest work?

    Also, is the 0-0 a placeholder for a variable? Not sure what it means.

    Likewise, under WriteLong/WriteMult I have:
    XMM_Src      mov    MemData,XMM_Temp
    
    and your example code has this:
    XMM_Src      mov    XMMTemp,0-0
    
    Again, maybe a reversal of some sort?

    The "0-0" is just a notation convention - it indicates an instruction field (src or dst) that is modified at run time. In this instance, you are supposed to modify these via a MOVD or MOVS instruction prior to calling the functions.

    For example, here is how XMM_ReadLong might get called:
            mov    XMM_Addr,Addr
            movd   XMM_Dst,#Data
            call   #XMM_ReadLong
    

    And here is how XMM_WriteLong might get called:
            mov    XMM_Addr,Addr
            movs   XMM_Src,#Data
            call   #XMM_WriteLong
    

    In both of the cases above, the cog location of "Data" is being written over the "0-0" field of the instruction at XMM_Dst (or XMM_Src) to produce the correct outcome. It doesn't matter what your program has there. So you don't actually need your "XMM_Temp" variable - the references to it get overwritten at run-time anyway.
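
    To make the run-time patching concrete, here is a small host-side C sketch (not Catalina or PASM code) of what MOVD and MOVS do to a 32-bit Propeller 1 instruction word: MOVD replaces the 9-bit destination field (bits 17..9) and MOVS replaces the 9-bit source field (bits 8..0). The helper names patch_dst and patch_src are made up for illustration.

```c
#include <stdint.h>

#define FIELD_MASK 0x1FFu   /* cog register addresses are 9 bits wide */
#define DST_SHIFT  9        /* destination field occupies bits 17..9  */

/* Model of MOVD: overwrite only the destination field of an instruction. */
uint32_t patch_dst(uint32_t instr, uint32_t cog_addr)
{
    return (instr & ~(FIELD_MASK << DST_SHIFT))
         | ((cog_addr & FIELD_MASK) << DST_SHIFT);
}

/* Model of MOVS: overwrite only the source field (bits 8..0). */
uint32_t patch_src(uint32_t instr, uint32_t cog_addr)
{
    return (instr & ~FIELD_MASK) | (cog_addr & FIELD_MASK);
}
```

    Only the patched field changes; the opcode and the other operand are left alone, so whatever placeholder was assembled into that field (0-0 or a variable name) is discarded at run time.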
    As far as removing the lcx library, I tried that some time ago and my va_list, va_start, va_end, and vsprintf functions under CduPrint stopped working for some reason. So I've kept this library ever since.

    Hmm. I should investigate that. Those functions should work under all the library variants. If they didn't, then "printf" would not work. Do you by any chance use "sscanf"? I seem to recall an issue where this function was only implemented in the "libcx" library variant.

    Your "fit" problems remind me why I never implemented direct access for the PMC - there just isn't enough kernel space!

    But we are not yet much closer to solving your actual problem :(
  • Hi @RossH,

    I've attached the latest revision of the CUSTOM_XMM.inc file.

    I was able to squeeze the Direct functions and variables down to 91 Longs. This includes the optional function to configure for Sequential Access Mode. Remove that optional function and we're down to 84 Longs.

    As before, this latest revision passed RamTest in both Cached and NonCached modes.

    I also switched over to this simple program for Catalina:
    void main(void)
    {
     short Times=0;
     for(;;) printf("Printed Hello World %d Times\n",Times++);
    }
    

    I did a series of tests to see if this simple program will run:
    1. Configure build_utilities for CUSTOM, no Flash Ram, and SRAM with 2K Cache
    
        a. Program compiled fine under the SMALL memory model and executed flawlessly.
    
        b. Program wouldn't compile under LARGE memory model. Got FIT error.
    
    2. Configure build_utilities for CUSTOM, no Flash Ram, and SRAM with NO Cache
    
        a. Program compiled fine under SMALL memory model but wouldn't execute.
            Nothing was displayed on the Interact screen.
    
        b. Program wouldn't compile under LARGE memory model. Got FIT error.
    

    There's a glimmer of hope for the SMALL memory model, but the LARGE one is a lost cause.

    SMALL will compile and execute fine when using Cache. It will compile fine without Cache but simply won't run.

    LARGE won't even compile, Cached or NonCached, and this really has me puzzled.

    I disabled the Sequential Access Mode function, reducing the overall size to 84 Longs, and it made no difference. Still wouldn't even compile for LARGE.

    Any idea what can be interfering with compiling for the LARGE memory model?

  • RossH Posts: 5,462
    Any idea what can be interfering with compiling for the LARGE memory model?

    Obviously, the LARGE kernel is larger than the SMALL kernel :)

    But seriously, this is quite true - the additional code required to support the LARGE memory model means you have even less space for your own XMM code when using that memory model.

    Could you compile your program (even just your small test program) to produce a listing (use the -y option) and post it to me? I'll have a look at it and see if I can spot anything. If it is not your XMM access code, then something odd is going on. I will try and reproduce the problem here, but using different hardware.

    Did you try putting a delay after XMM_Activate? You will need more than just a NOP instruction - you may need milliseconds or longer. Use WAITCNT or something similar.

    Ross.
  • @RossH,

    Well I guess it makes sense that LARGE is well, larger, than SMALL :)

    But I was surprised that it wouldn't even compile when I chose the Caching option with it. Unless my Direct routines were overwriting part of the kernel itself, even when they weren't supposed to be in use while caching...

    No, I haven't tried putting the delay after, or within, the XMM_Activate code yet. That's still on the list of things to do.

    I'm not concerned about the LARGE memory model problem at this time because for the Project I'm working on I plan on using the SMALL memory model. Here are a couple of reasons for this choice:

    1. I will need to write a serial data port Traffic Manager in Assembly, and it will need to share variables with those within the C portion of the code. I'm thinking about grouping all of the needed variables into a structure, and then passing a pointer to that structure when doing the coginit of the PASM program. I don't know if this approach will work but I certainly want to try it.

    2. The Traffic Manager itself will need access to the serial port Tx and Rx registers that are managed by the 4-Port serial plug-in. These registers have to be in HubRam somewhere so it's a matter of finding them and interacting with them.

    But both of the above agenda items are reserved for another day.

    In the meantime, there's a SMALL non-caching problem that needs to be addressed :wink:

    I would enclose the LST files but here's another mystery:

    1. I ran build_utilities, configured it for CUSTOM, no flash, SRAM with 2K cache, compiled the Hello program, and got the LST file. I then renamed it Hello-small-cached-working-lst.lst

    2. I ran build_utilities again, configured it for CUSTOM, no flash, SRAM with no cache, compiled the Hello program, and got the LST file. I then renamed it Hello-small-nocache-notworking-lst.lst

    3. I then ran a File Compare on these two files, and they are identical! How can that be?

    Any idea why Catalina would generate two LST files that are identical, even though build_utilities was supposed to create a different configuration, and one compile instance generated code for cached, while the other didn't?

    This gets curiouser and curiouser as we go...
  • RossH Posts: 5,462
    edited 2019-07-13 04:37
    Any idea why Catalina would generate two LST files that are identical, even though build_utilities was supposed to create a different configuration, and one compile instance generated code for cached, while the other didn't?

    This gets curiouser and curiouser as we go...

    The build_utilities batch file does not alter your program in any way. All it does is build some useful utilities - including those used by the payload loader. Specifically, it builds FLASH.binary and SRAM.binary programs, which are both hardware-dependent (they use your XMM code!). One of these two will also be copied to XMM.binary. These are used when loading your program. For example, when you load an XMM program (SMALL or LARGE) you use a command like:
    payload xmm my_program
    

    This is essentially a two-stage load - first the payload program loads XMM.binary - which is itself a loader - then that program loads my_program.binary. If you told build_utilities you wanted to load sram by default then XMM.binary will be the same as SRAM.binary. If you told it to load flash by default, then XMM.binary will be the same as FLASH.binary.
  • @RossH,

    OK, that makes sense about the build_utilities batch file.

    I've enclosed the two LST files for your review.

    Hopefully they contain some clues about what is going on.

  • @RossH,
    As far as removing the lcx library, I tried that some time ago and my va_list, va_start, va_end, and vsprintf functions under CduPrint stopped working for some reason. So I've kept this library ever since.
    RossH wrote: »
    Hmm. I should investigate that. Those functions should work under all the library variants. If they didn't, then "printf" would not work. Do you by any chance use "sscanf"? I seem to recall an issue where this function was only implemented in the "libcx" library variant.

    Not to overload you with too much today, but I ran my MenuTest program again without the libcx library and it didn't work. Just a couple of garbage characters on the Interact screen.

    Keep in mind that this program is not using printf(), putchar(), scanf(), sscanf(), etc. In fact the HMI drivers were disabled in the compiler in favor of the libserial4 driver.

    To print something to the screen I have a CduPrint function that performs in a similar fashion to printf() but it uses va_list, va_start, vsprintf, and va_end to dump the output contents into a string. A separate function then outputs one character at a time from the string to Port0 of the libserial4 driver.
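
    A minimal host-side sketch of that pattern is shown below. It is not the actual CduPrint source: the names cdu_print, port0_tx and tx_log are hypothetical, port0_tx only collects characters where the real code would call the libserial4 driver, and vsnprintf is used in place of vsprintf purely to bound the buffer.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the serial driver's transmit call: it just
   collects characters so the sketch can run on a host. */
static char   tx_log[256];
static size_t tx_len = 0;

static void port0_tx(char c)
{
    if (tx_len < sizeof tx_log - 1) {
        tx_log[tx_len++] = c;
        tx_log[tx_len] = '\0';
    }
}

/* printf-style wrapper: format into a local buffer, then emit the
   characters one at a time. Returns the number of characters sent. */
int cdu_print(const char *fmt, ...)
{
    char buf[128];
    va_list args;
    int n;

    va_start(args, fmt);
    n = vsnprintf(buf, sizeof buf, fmt, args);  /* bounded vsprintf */
    va_end(args);

    if (n < 0)
        return n;                               /* formatting error */
    if ((size_t)n >= sizeof buf)
        n = (int)sizeof buf - 1;                /* output was truncated */

    for (int i = 0; i < n; i++)
        port0_tx(buf[i]);
    return n;
}
```

    Because the formatting step goes through the vsprintf family, a wrapper like this depends on whichever library variant implements those functions, even though printf() itself is never called.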

    For some reason if I don't use the libcx library it won't work.

    Here are the sample MenuTest files for your review in your rapidly shrinking spare time :smiley:
  • RossH Posts: 5,462
    I've enclosed the two LST files for your review.

    Hopefully they contain some clues about what is going on.

    Drat! I had forgotten that on the P1, the listing only includes the compiled C code (which is why they are the same, BTW - it is the kernel and runtime support code that is different).

    On the P2 the listing includes all the code - including the kernel and runtime support code, which is what I wanted.

    I will think of some other way to see what is going on.
  • RossH Posts: 5,462
    Ok ... got an idea ...

    Can you please execute the following command (copy and paste it) in your working directory:
    spinnaker -p "%LCCDIR%\target\XMM_default.spin" -I "%LCCDIR%" -D CUSTOM -D CACHED_1K -D SMALL -l -o xxxxxx
    

    The list of -D options should reflect the options you specify (but using -C) on the Catalina command line.

    This will produce a listing file (called xxxxxx.lst). Please send me that file.

  • @RossH,

    Sure, here is the first command sent with no cache option:
    spinnaker -p "c:\programs\compiler\catalina\target\xmm_default.spin" -I "c:\programs\compiler\catalina" -D CUSTOM -D SMALL -l -o xxxxx
    Filename: hello-small-nocache.lst

    And the second one with the 2K cache option:
    spinnaker -p "c:\programs\compiler\catalina\target\xmm_default.spin" -I "c:\programs\compiler\catalina" -D CUSTOM -D CACHED_2K -D SMALL -l -o yyyyyy
    Filename: hello-small-cache2K.lst

    Hopefully this helps.

    It's approaching 1:00 AM here so I'm calling it quits for tonight.

  • RossH Posts: 5,462
    Hello @Wingineer19

    I think I have found at least one of your problems - i.e. why your program works in cached mode but not in direct mode.

    The problem is that the Propeller is a "little-endian" architecture, but your XMM functions assume it is "big-endian". You can get away with this as long as you are always reading and writing longs, but it will fail if (for example) you write a value as a long but then read it as a word. You end up with the bytes in the word reversed.

    When you use the cache, it takes care of this for you because the underlying cache code always reads and writes all values to your XMM RAM as longs. But when you use direct mode, you have to take care of this yourself.
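
    The effect can be reproduced on a host with a few lines of C. This is only a sketch under the assumption that the long is stored fully big-endian (most significant byte at the lowest address); the array xmm and the function names are stand-ins, not part of Catalina or of the driver under discussion.

```c
#include <stdint.h>

static uint8_t xmm[8];  /* stand-in for a few bytes of XMM RAM */

/* Big-endian long store: most significant byte at the lowest address. */
void write_long_be(uint32_t addr, uint32_t v)
{
    xmm[addr]     = (uint8_t)(v >> 24);
    xmm[addr + 1] = (uint8_t)(v >> 16);
    xmm[addr + 2] = (uint8_t)(v >> 8);
    xmm[addr + 3] = (uint8_t)v;
}

/* Big-endian long read: the mirror image of the store above. */
uint32_t read_long_be(uint32_t addr)
{
    return ((uint32_t)xmm[addr] << 24) | ((uint32_t)xmm[addr + 1] << 16)
         | ((uint32_t)xmm[addr + 2] << 8) | (uint32_t)xmm[addr + 3];
}

/* Little-endian word read, as code compiled for the Propeller expects. */
uint16_t read_word_le(uint32_t addr)
{
    return (uint16_t)(xmm[addr] | (xmm[addr + 1] << 8));
}
```

    Writing $12345678 and reading it back as a long round-trips, which is why a long-only test passes. Reading the same address as a little-endian word returns $3412 in this model, where the compiled program expects $5678.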

    I should have spotted this sooner. Apologies. I will have to update my RamTest program to detect this case!

    Ross.

    P.S. I have yet to look into your other problem - i.e. why you have to use "libcx". Perhaps I will get time tomorrow.
  • Hi @RossH,

    No need to apologize. You confirmed what I was beginning to suspect: It's an Endian problem.

    Maybe just add a Word store and retrieve function to your RamTest code to detect issues like this in the future...

    This takes me back to the old Motorola versus Intel duel...

    That would mean Word swapping is at play within the Longs, and Byte swapping is at play within the Words.

    So a nice long that appears like this in the "real world":
    Long = B3 B2 B1 B0      "real world" MSB to LSB
           \W1/  \W0/
    
    Would be stored as:
    Long = W0 W1

    And with the Bytes within the Words swapped too we get:
    Long = B0 B1 B2 B3      actual stored value
            \W0/ \W1/
    

    So if you gave me a Long which I thought was this:
    Long = B3 B2 B1 B0
    But was really this:
    Long = B0 B1 B2 B3
    It would be stored in that order (B0 -> B3), and retrieved in that order (B0 -> B3), so your code would be happy because it would get back what it gave, in the same order.

    But if it wanted W0 it would get B2 B3 instead of B0 B1, and W1 would yield B0 B1 instead of B2 B3?

    My code stores and retrieves data starting at the Highest Nibble first (because the Memory data bus transports Nibbles, not Bytes).

    If I changed it to store and retrieve starting at the Lowest Nibble I wonder if that would work. Hopefully there's a way to fix this with the code space I have left...

    Time to review my code, but not until this evening...

  • RossH Posts: 5,462
    If I changed it to store and retrieve starting at the Lowest Nibble I wonder if that would work. Hopefully there's a way to fix this with the code space I have left...

    The order of the nibbles shouldn't matter since they are not individually addressable. As long as you read back what you write, that's fine. It should be just a matter of adding a couple of shift operations and rotating the nibbles the opposite way as you read/write them.
  • RossH Posts: 5,462
    Hello @Wingineer19

    Just a quick update regarding the need to use libcx. I can confirm that this is required if your code uses vsprintf.

    The libcx library variant is the only one that fully implements stdio. The libc and other library variants only support I/O to stdin, stdout and stderr. To minimize the program size, they do not support I/O to arbitrary streams ( which is what vsprintf uses internally).

    This is mentioned in the Catalina reference manual, but the implications are not fully spelled out. I will have a think about what to do for future releases - but in the meantime you should just continue to use libcx.
  • @RossH,

    I'm still trying to get a grasp on this issue. I'm even wondering if the caching portion is working correctly.

    After activating the SRAM by dropping the CS low, it expects a 1 byte command, followed by a 3 byte memory address, starting with the most significant byte.

    I do this by grouping the Command and 24-bit address into a single Long value structured like this:
    CHML, where C=Command, H=MSB of Address, M=next MSB of Address, L=LSB of address.

    This Long is populated by ORing the value of C (shifted left by 24 bits) with the XMM_Addr provided by your kernel. The Long is then sent to the SRAM by grabbing C and sending it, then grabbing H and sending it, then grabbing M and sending it, then finally grabbing L and sending it. In reality I'm grabbing the individual Nibbles within each of these values and sending them out, but you get the point.

    In my tests using PASD, I loaded a register with 0x0007ABCD as an Address to ship out, and that's how PASD displayed this register on the screen. The SRAM Write Command is 0x02, so this Long actually looked like 0x0207ABCD, and that's what PASD showed on the screen. Single stepping through my code showed it transmitting a 0,2,0,7,A,B,C,D in exactly that order. No apparent Word or Byte swapping in this example.
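
    The packing and send order described above can be sketched in host-side C as follows. The function names and the SRAM_CMD_WRITE constant name are illustrative; only the $02 write command value and the CHML layout come from the description.

```c
#include <stdint.h>

#define SRAM_CMD_WRITE 0x02u  /* write command value quoted in the post */

/* Pack the command byte above a 24-bit address: C H M L. */
uint32_t make_command(uint8_t cmd, uint32_t xmm_addr)
{
    return ((uint32_t)cmd << 24) | (xmm_addr & 0x00FFFFFFu);
}

/* Split a long into 8 nibbles, most significant first (send order). */
void nibbles_msb_first(uint32_t value, uint8_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)((value >> (28 - 4 * i)) & 0xFu);
}
```

    For the address $0007ABCD this yields $0207ABCD and the nibble sequence 0, 2, 0, 7, A, B, C, D, matching what PASD showed.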

    But with XMM_Addr Word and Byte swapped, this whole transmission scheme falls apart, and who knows where in memory the data is actually sent. Apparently it still falls within the 512KB range of the SRAM, otherwise RamTest wouldn't be happy.

    I've pretty much confirmed that my code retrieves exactly what it writes, in the exact byte order starting from MSB to LSB. So to focus on the problem, here's what appears to be the case:

    1. OK: Writing a single Byte then reading it back.
    2. OK: Writing a single Word then reading it back.
    3. OK: Writing a single Long then reading it back.
    4. Not OK: Writing a single Long then attempting to read it back as two separate Words.

    Maybe there are some more additional Write/Read operations that won't match as expected?

    How often does the Kernel perform a Long write then two separate Word reads?

    I'll have to look at all of this tomorrow when I have fresh eyes and hopefully a clearer mind. I'm not messing with this anymore tonight.

  • RossH Posts: 5,462
    Maybe there are some more additional Write/Read operations that won't match as expected?

    Looking at the LCC code generator, I can see that it is possible for it to generate a rdbyte of a long address when there is an indirect conversion (e.g. through a pointer) of a long to a byte. Similarly for a short to a byte. Also, it might do a rdword if there is an indirect conversion of a long to a short.

    Note: I cannot seem to make it do this in my first few attempts to generate the particular cases involved, but it looks like it can do so if it wants to. This is perfectly legitimate when you know you are working on a little-endian machine, but all these would fail if the endianness of the stored values is not what the compiler expects.

    Also of concern - because it could be deeply buried in the library code, even if you don't use them in your program - is the possible failure of unions and bit fields. In particular, the initialization code of bit fields seems to be dependent on knowing the endianness of the machine on which the code is to be run.

    So both unions and bit fields could fail if the endianness of the machine does not match what the compiler expects.
  • For reading/writing single words/bytes, you can just XOR the address with 2 or 3, respectively, and keep all the other code, if that is easier.
    What will still fail then is a non-long-aligned block copy, but I have no idea if that ever happens (I just skimmed the code).
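
    A host-side C sketch of the XOR trick, assuming the existing routines store and fetch everything big-endian and that word accesses are word-aligned. All names here are illustrative stand-ins.

```c
#include <stdint.h>

static uint8_t xmm[8];  /* stand-in for a few bytes of XMM RAM */

/* Existing (unchanged) big-endian routines. */
void write_long_be(uint32_t addr, uint32_t v)
{
    xmm[addr]     = (uint8_t)(v >> 24);
    xmm[addr + 1] = (uint8_t)(v >> 16);
    xmm[addr + 2] = (uint8_t)(v >> 8);
    xmm[addr + 3] = (uint8_t)v;
}

uint16_t read_word_be(uint32_t addr)
{
    return (uint16_t)((xmm[addr] << 8) | xmm[addr + 1]);
}

uint8_t read_byte_raw(uint32_t addr)
{
    return xmm[addr];
}

/* Adjusted accessors: flip the low address bits, keep the rest as is. */
uint16_t read_word_fixed(uint32_t addr)
{
    return read_word_be(addr ^ 2u);   /* swap the two words in the long */
}

uint8_t read_byte_fixed(uint32_t addr)
{
    return read_byte_raw(addr ^ 3u);  /* reverse the four bytes in the long */
}
```

    With $12345678 stored at address 0, the adjusted word reads return $5678 at offset 0 and $1234 at offset 2, and the adjusted byte reads return $78, $56, $34, $12 at offsets 0 to 3, which is the little-endian view.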
  • RossH Posts: 5,462
    edited 2019-07-16 00:06
    Hello @Wingineer19

    I've just been looking through Catalina's XMM code again, including the XMM loader code, and the reason cached access works is that Catalina only ever uses XMM_ReadPage and XMM_WritePage to access the XMM RAM when you use the cache. These are byte-oriented functions, so they would work whether the program is little-endian or big-endian. But the cache must provide little-endian access functions because the program was compiled to be little-endian.

    When using direct access (as you point out) everything would be fine if you used XMM_ReadLong and XMM_WriteLong exclusively to access your XMM RAM, even if your functions are big-endian and the program was compiled to be little-endian (NOTE: apart from bit fields and unions).

    However, this is not the case - when you use your direct API functions, you are essentially using XMM_ReadLong to access a program that was written to the XMM RAM using XMM_WritePage (this write happens during the load process, not during program execution). So the "endianness" of the direct API functions matters and must match the "endianness" of the compiled program. That is, they must be little-endian.
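
    As a rough host-side model of that situation: the loader copies the little-endian program image byte by byte, and the direct read function then has to reassemble each long from those bytes. The names below are stand-ins, not the real XMM API.

```c
#include <stdint.h>
#include <string.h>

static uint8_t xmm[16];  /* stand-in for a few bytes of XMM RAM */

/* Model of a byte-oriented page write, as used during the load. */
void write_page(uint32_t addr, const uint8_t *src, uint32_t len)
{
    memcpy(&xmm[addr], src, len);
}

/* Little-endian long read: lowest address holds the low byte. */
uint32_t read_long_le(uint32_t addr)
{
    return (uint32_t)xmm[addr] | ((uint32_t)xmm[addr + 1] << 8)
         | ((uint32_t)xmm[addr + 2] << 16) | ((uint32_t)xmm[addr + 3] << 24);
}

/* Big-endian long read: lowest address holds the high byte. */
uint32_t read_long_be(uint32_t addr)
{
    return ((uint32_t)xmm[addr] << 24) | ((uint32_t)xmm[addr + 1] << 16)
         | ((uint32_t)xmm[addr + 2] << 8) | (uint32_t)xmm[addr + 3];
}
```

    If the image holds the long $12345678 as the bytes 78 56 34 12, the little-endian read recovers $12345678 while the big-endian read returns $78563412, so a big-endian direct reader would fetch scrambled instructions and data even though its own write/read round trips succeed.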
  • RossH Posts: 5,462
    On the "vsprintf" issue, I have decided that for the next release I will add full support for streams back into the libc library. This means that vsprintf etc will be fully implemented in both libc and libcx, but not in libci or libcix.

    I think in hindsight that it would have been better to have a policy of "no surprises" than to try and save the few hundred extra longs of code space. If code size was so critical you would probably already be using libci and/or libtiny anyway. In your case, this change will actually save you program space, because you can use libc instead of having to use libcx.

    However, I will also describe how to recompile the libraries in case this change breaks anyone's existing programs (I don't think it will).
  • RossH wrote: »
    On the "vsprintf" issue, I have decided that for the next release I will add full support for streams back into the libc library. This means that vsprintf etc will be fully implemented in both libc and libcx, but not in libci or libcix.

    Excellent. That will likely save me some code space because at this time I'm not using the vast majority of the library functions anyway. Just need the vsprintf stuff at this time...

    Of course if one was totally insane, one could compile EACH function within the various libraries from its source code into its own object. Then the compiler would have to flag which of these library functions were included within the program, and the linker would then have to include them to generate the final executable. It would be a mess and completely insane, but I wonder how much program space would be saved since only the specific functions used within the program would be linked and not an entire library. Does that make any sense? I didn't think so.

    On the SRAM access issue, I've rewritten nearly all of the functions, both Cached and Direct.

    What complicates things is that the SRAM command and address functions must be sent Big Endian, while storing the actual data itself into memory has to be Little Endian. Fortunately, I only needed to add about 5 lines of code to my Send function to differentiate between sending a command to the memory vs storing data in it.

    Even so, I still can't get the Direct functions to work.

    This has me perplexed. Here's what I did, using the same memory location:

    1. Wrote a Long to memory: $12345678
    2. Read the Long from memory: $12345678
    3. Read a Word from memory: $5678
    4. Read a Byte from memory: $78

    If I understand Little Endian correctly, this seems right. Or is it?
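
    The reads listed above are all at offset 0 of the long. A fuller check would also read the word at offset 2 and the bytes at offsets 1 to 3. The host-side C sketch below (hypothetical helper names) computes what a little-endian machine should see at each offset, for comparison against what the hardware returns.

```c
#include <stdint.h>

/* Byte a little-endian machine sees at byte offset 0..3 of a long. */
uint8_t le_byte_at(uint32_t value, unsigned offset)
{
    return (uint8_t)(value >> (8 * (offset & 3u)));
}

/* Word a little-endian machine sees at byte offset 0 or 2 of a long. */
uint16_t le_word_at(uint32_t value, unsigned offset)
{
    return (uint16_t)(value >> (8 * (offset & 2u)));
}
```

    For $12345678 the expected words are $5678 at offset 0 and $1234 at offset 2, and the expected bytes are $78, $56, $34, $12 at offsets 0 to 3.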

    And of course the Direct access passed RamTest again. BTW, I volunteer to try your revised RamTest whenever you have it ready to go. No rush, just whenever you can get around to adding the new features to it.

    I've used 94 Words so far for these revised Direct functions, so I don't have any more wiggle room if I need to change it.

    I'm mystified right now, so once again I need to look over the code and continue working with PASD so I can see what is happening in real time. But not until tomorrow at the earliest...

    The good news is that the new cached functions work fine, so I can go that route if necessary.

    In fact, if the cached functions prove to be faster than the Direct functions (if I ever get them to work), then my code will use the cached functions instead.

    If that's the case, then the pursuit of the Direct API was wasted effort anyway...

    ...Well not really, because I've actually had fun writing this stuff in Assembly. Takes me back to what I was doing 30+ years ago as a young man coding a Z80 when our development system only had an Assembler and no C compiler :smile:

  • RossH Posts: 5,462
    Of course if one was totally insane, one could compile EACH function within the various libraries from its source code into its own object. Then the compiler would have to flag which of these library functions were included within the program, and the linker would then have to include them to generate the final executable. It would be a mess and completely insane, but I wonder how much program space would be saved since only the specific functions used within the program would be linked and not an entire library. Does that make any sense? I didn't think so.

    Essentially, this already happens. This is what linkers/binders do. But most of the code used by vsprintf is already included because it is already pulled in by other functions (e.g. printf, snprintf etc). The only additional code that will now be pulled in is the code that sets up an internal buffer as a stream.
    On the SRAM access issue, I've rewritten nearly all of the functions, both Cached and Direct.

    What complicates things is that the SRAM command and address functions must be sent Big Endian, while storing the actual data itself into memory has to be Little Endian. Fortunately, I only needed to add about 5 lines of code to my Send function to differentiate between sending a command to the memory vs storing data in it.

    Even so, I still can't get the Direct functions to work.

    This has me perplexed. Here's what I did, using the same memory location:

    1. Wrote a Long to memory: $12345678
    2. Read the Long from memory: $12345678
    3. Read a Word from memory: $5678
    4. Read a Byte from memory: $78

    If I understand Little Endian correctly, this seems right. Or is it?

    Yes, that looks correct.
    And of course the Direct access passed RamTest again. BTW, I volunteer to try your revised RamTest whenever you have it ready to go. No rush, just whenever you can get around to adding the new features to it.

    That would be great. I'm snowed under trying to reconcile my P1 version of Catalina with the P2 version for the next release. Every time I think I'm getting close, I find more things that need doing :(
    I've used 94 Words so far for these revised Direct functions, so I don't have any more wiggle room if I need to change it.

    Easy - just post your code here and challenge the forum to improve it!
    I'm mystified right now, so once again I need to look over the code and continue working with PASD so I can see what is happening in real time. But not until tomorrow at the earliest...

    The good news is that the new cached functions work fine, so I can go that route if necessary.

    How about looking at some of the other implementations of direct functions? Look at C3 XMM code, or the RamBlade, or the RamPage 2 (RP2). The C3 code is perhaps the simplest example. You may spot something we have both missed.
    In fact, if the cached functions prove to be faster than the Direct functions (if I ever get them to work), then my code will use the cached functions instead.

    If that's the case, then the pursuit of the Direct API was wasted effort anyway...

    ...Well not really, because I've actually had fun writing this stuff in Assembly. Takes me back to what I was doing 30+ years ago as a young man coding a Z80 when our development system only had an Assembler and no C compiler :smile:

    You had an assembler? What luxury! I programmed my first computer in machine code! :)
  • RossH wrote: »
    You had an assembler? What luxury! I programmed my first computer in machine code! :)

    Oh, yeah. If memory serves correctly, it was an HP-9000 development system. The mass storage device was connected via HPIB and was the size of an end table, consisting of a set of removable disk platters. I think the overall storage capacity was under 100MB, and it was slow and noisy.

    I don't remember the overall cost, but just the Z80 development package was $50,000. That included about 120 hours of technical help from HP if we needed it (and we did).

    Those were the days...

  • RossH Posts: 5,462
    Hello @Wingineer19

    Just a reminder - you must rebuild the utilities every time you tweak the XMM code, or even just to switch between a cached and non-cached load of the same code. Otherwise your new code may not be used!
  • Hi @RossH,

    Yes, I remember you had mentioned this to me before.

    After each RamTest, I've made sure to either select a cache option or none in build_utilities to configure it for cached or direct mode prior to compiling my test program using Catalina.

    For some reason I still can't get the Direct functions working with Catalina.

    I'll revisit my code later this week.

    Hopefully by then I will figure out why PASD stopped working. I really need that real time operation with the actual hardware to help with debugging.

    I want to exercise all of the functions, cached and direct, on the actual hardware using PASD to see what's going on.

    One test will be to send a string of bytes to memory using the cache write function, then reading the data back with the Direct read function.

    Hopefully then I can shed some light on this mystery and finally solve it.

    Once again many thanks for your help with this issue.

    Jeff
  • Wingineer19 Posts: 291
    edited 2019-07-26 06:56
    I've been working on slimming down my SRAM driver code for the past week to make it work with Catalina. The latest is attached.

    The good news is the Cached Functions work fine. In fact, it appears that moving data as either Bytes or Longs works now.

    Running my MenuTest program in Catalina with 4K Cache, the Long transfer mode trims about 0.25 seconds off the Fibonacci Screen update time compared to the Byte transfer mode.

    One problem which was addressed in this latest code was the fact that the commands sent to the SRAM had to be sent MSB first (Big Endian), while the data sent to the memory had to be sent LSB first (Little Endian).

    With the SRAM forced into Quad Mode the data is sent in Nibbles, starting from highest to lowest for Commands, and lowest to highest for Data.

    If we have a Register called RamData and we want to store its contents in XMM memory, here's the process:

    RamData=$12345678

    XMM_Addr=$07ABCD

    SRAM Write Command=$02

    Command String Is $0207ABCD

    The SendWriteCmd function sends Command String Nibbles to XMM Memory MSB first:

    $0207ABCD -> 0 2 0 7 A B C D

    Then the SendRamData function sends the RamData Nibbles to XMM Memory LSB first:

    $12345678 -> 8 7 6 5 4 3 2 1

    Note that if we were sending Bytes instead of Nibbles we would send this:

    $12345678 -> 78 56 34 12

    So with this storage scheme the Nibbles end up swapped within each Byte compared to their normal placement: the Byte $78 goes out as 8 then 7 and is stored as $87. (Here's to wishing I had an 8-bit bus to transfer Bytes instead of Nibbles...)
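    The two send orders above can be sketched in a few lines of Python. This is only a model of the nibble sequencing, not the PASM driver: the helper names nibbles_msb_first and nibbles_lsb_first are made up for illustration, and the values are the ones from the example above.

```python
def nibbles_msb_first(value, count):
    """Split value into `count` 4-bit nibbles, highest nibble first."""
    return [(value >> (4 * i)) & 0xF for i in reversed(range(count))]


def nibbles_lsb_first(value, count):
    """Split value into `count` 4-bit nibbles, lowest nibble first."""
    return [(value >> (4 * i)) & 0xF for i in range(count)]


WRITE_CMD = 0x02          # SRAM write command from the example
xmm_addr = 0x07ABCD       # 24-bit XMM address from the example
ram_data = 0x12345678     # contents of RamData from the example

# Command byte followed by the 24-bit address: $0207ABCD
command_string = (WRITE_CMD << 24) | xmm_addr

# Commands go out highest nibble first, data goes out lowest nibble first.
cmd_nibbles = nibbles_msb_first(command_string, 8)
data_nibbles = nibbles_lsb_first(ram_data, 8)

print(" ".join(f"{n:X}" for n in cmd_nibbles))
print(" ".join(f"{n:X}" for n in data_nibbles))
```

    Comparing the two printed lines against a simulator trace is a quick way to confirm which order each function really uses.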

    However, this Nibble swap shouldn't be an issue because the GetRamData function puts them back into their proper order when read back from XMM memory:

    XMM=$87654321 -> $12345678 (Placed into correct order by GetRamData function)

    Reading back a Word would yield:

    XMM=$87654321 -> $5678 (via GetRamData function)

    Likewise, reading back a Byte would yield:

    XMM=$87654321 -> $78 (via GetRamData function)
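    As a rough model of the read-back, assume one Nibble arrives per clock and the first Nibble read is placed in the lowest position of the result. The function below is a hypothetical Python stand-in for GetRamData, not the actual PASM code, and it only mirrors the Nibble placement described above.

```python
def get_ram_data(stream, nibble_count):
    """Rebuild a value from the first `nibble_count` nibbles of `stream`.

    The first nibble read lands in bits 3..0, the next in bits 7..4,
    and so on, which undoes the LSB-first send order.
    """
    value = 0
    for position, nibble in enumerate(stream[:nibble_count]):
        value |= (nibble & 0xF) << (4 * position)
    return value


# Nibbles as they come back from XMM memory for the stored $87654321
stored = [8, 7, 6, 5, 4, 3, 2, 1]

as_long = get_ram_data(stored, 8)   # 8 clocks
as_word = get_ram_data(stored, 4)   # 4 clocks
as_byte = get_ram_data(stored, 2)   # 2 clocks

print(f"${as_long:08X} ${as_word:04X} ${as_byte:02X}")
```

    The Nibble counts (8, 4, 2) match the number of clock pulses expected for a Long, a Word, and a Byte.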

    Anyway, that's all well and good, but the XMM Direct functions still don't work. They totally trash the RamTest screen.

    The TRIV screen should look like this:
    CacheTrivTest.jpg

    But looks like this:
    DirectTrivTest.jpg

    And the CMPX screen should look like this:
    CacheCmpxTest.jpg

    But looks like this:
    DirectCmpxTest.jpg

    It's as if the Kernel is getting overwritten by my code, but looking at the size of all Direct functions, their called functions, and variables, I count 88 Longs (Propeller Tool says 91), which falls within the 96 Long limit.

    I've run simulations using both Gear and PropSim and verified that the Commands are sent out in the proper order and the Output Data is in the proper order.

    I can't simulate the Input Data from memory, but I did verify that the code was outputting 8 clock pulses when reading in a Long, 4 when reading in a Word, and 2 when reading in a Byte.

    I also used PASD for realtime testing as much as possible, but it tended to crash while testing the MOVS and MOVD instructions.

    Note that I can still run the code in Cached mode so it's not catastrophic if Direct mode doesn't work. I just wanted to see what, if anything, Direct mode would buy me.

    I don't think there's much more that @RossH or I can do with this code to make it work in Direct access mode, so hopefully some Forum Members can take a look with some fresh eyes and maybe see something that's been missed...



  • RossH Posts: 5,462
    Odd. I should have time to have a closer look in the next day or two.
  • Hi @RossH,

    I would very much appreciate your review of this.

    I've taken the CUSTOM_XMM.Inc segment and converted it into a Spin program (see attached).

    I've slightly modified the XMM_ReadLong/XMM_ReadMult and XMM_WriteLong/XMM_WriteMult functions to redirect their read/write requests from/to external memory to a Cog register.

    The Cog register, called MemData, is written to by the XMM_Write functions and also contains the value requested by the XMM_Read functions.

    I've tested this Spin program using PropSim, and each time I can read back what I've written (long, word, byte) in the exact order expected.

    So this adds even more to the mystery as these Direct functions appear to be working as expected.

    Let's take a quick look back at the RamTest TRIV Screen:
    DirectTrivTest.jpg
    If you look closely, it appears that RamTest is attempting to write "DEADBEEF" (indicating that the simple write/read test worked) but the Screen is scrambled for some reason.

    I've run RamTest before when my functions weren't working at all, and the result was:
    TRIV:
    A 00000000

    And that's it. Nothing more.

    So something is definitely going on here but RamTest can't complete the TRIV confirmation as shown by the Screen trashing.

    Likewise, if we look at the CMPX Screen:
    DirectCmpxTest.jpg
    It seems as if CMPX is attempting to write "Passed" but only the "asd" portion is shown mixed in with a lot of other garbage.

    Hopefully your review will uncover something.
  • To me it looks like every second letter is missing:

    DEADBEEF - dEaDbEeF - EDEF
    same with
    To begin, press enter - obgen rs e but then more is missing
    TRIV: - RV

    interesting,

    Mike
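    The "every second letter" pattern can be checked with a tiny Python sketch. This only models the symptom seen on the screen (keeping the 2nd, 4th, 6th... characters); it says nothing about the cause, and the function name is made up for illustration.

```python
def drop_every_other(text):
    """Keep only the characters at odd indices (the 2nd, 4th, 6th, ...)."""
    return text[1::2]


trashed_deadbeef = drop_every_other("DEADBEEF")
trashed_triv = drop_every_other("TRIV:")

print(trashed_deadbeef)
print(trashed_triv)
```

    The two short strings fit the pattern exactly. The longer prompt line fits it only at the start, with more characters going missing after that.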
  • @msrobots,

    Indeed, Mike, it is really weird.

    Even more puzzling is that the PASM functions which read and write to the XMM memory during Direct Access Mode are exactly the same ones used for reading/writing when in Cached Mode.

    Except that in Cached Mode everything works fine.

    Normally Cached Mode transfers single Bytes. I expected that having it fetch Longs instead would break it, but it worked.

    Direct Access Mode fetches instructions using Longs, but has the option of fetching Data as Longs, Words, or Bytes.

    The Data is stored in memory LSB first (Little Endian).

    Hence I can write a Long, then read it back as a Long. Or a Word. Or a Byte. The type of stuff you would do if casting a Long into a Short, or into a Char if desired. This worked fine.

    I can write a Word to memory. Then read it back as a Word. Or a Byte. Like casting a Short into a Char for example. This also worked fine.

    I'm able to read and write single Bytes without a problem.
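    The write-then-read-back checks described above can be modelled in Python with a plain bytearray standing in for the XMM memory. This is a sketch of the Little Endian behaviour being tested, not the driver itself: xmm_write, xmm_read, and the addresses are hypothetical.

```python
memory = bytearray(16)   # stand-in for a small region of XMM memory


def xmm_write(addr, value, size):
    """Store `value` as `size` bytes (4, 2 or 1) at `addr`, LSB first."""
    memory[addr:addr + size] = value.to_bytes(size, "little")


def xmm_read(addr, size):
    """Read `size` bytes (4, 2 or 1) at `addr`, LSB first."""
    return int.from_bytes(memory[addr:addr + size], "little")


# Write a Long, then read it back as a Long, a Word and a Byte.
xmm_write(0, 0x12345678, 4)
long_back = xmm_read(0, 4)
word_back = xmm_read(0, 2)   # like casting to a short
byte_back = xmm_read(0, 1)   # like casting to a char

# Write a Word, then read it back as a Word and a Byte.
xmm_write(8, 0xBEEF, 2)
word2_back = xmm_read(8, 2)
byte2_back = xmm_read(8, 1)

print(hex(long_back), hex(word_back), hex(byte_back))
print(hex(word2_back), hex(byte2_back))
```

    Because the lowest Byte sits at the lowest address, a narrower read at the same address returns the low part of the wider value, which is what the casts rely on.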

    The XMM memory read and write functions are not complex and are quite easy to follow.

    Whatever is going on in Direct Mode has me completely baffled.

    Maybe I'm too close to this and can't see something obvious. The forest versus the trees argument perhaps...

    Jeff

