64 MB PSRAM module using 16 pins? --> 96 MB w/16 pins or 24 MB w/8 pins - Page 17 — Parallax Forums

Comments

  • evanhevanh Posts: 13,424

    Yup, for that particular run.

  • YanomaniYanomani Posts: 1,507
    edited 2022-07-23 14:24

    Then I'm almost sure your Eval setup is suffering from Clock-Jitter Syndrome (frequency- and temperature-related, for sure), and this also reveals some extra information. You're relying on the Smart Pins to get proper PSRam CLK and CE# signals, but can't do the same for the PSRam I/Os, because the Streamer is needed to produce the data signals. Although the clocked action occurs at the pad ring, the resulting timings are NOT aligned, due to subtle differences between the signals generated in the pad ring region itself (pure 3.3 V logic: CLK and CE#) and the ones that come from the raw OR'ed OUT bus (driven by Cogs/Streamers, so 1.8 V logic that must be voltage-translated to 3.3 V before hitting the last latching stage at the pad ring).

  • evanhevanh Posts: 13,424

    It'll be transmission line effects. The paths are much longer out to the add-on boards. I had previously thought attenuation was the biggest issue but maybe bus terminations might do wonders.

  • evanhevanh Posts: 13,424

    Here's Rayman's 48 MB add-on attached to the same Eval Board. It has fewer ICs, only three per bus instead of six. The tracks are shorter too. The results are much improved over the 96 MB add-on.

    usb-Parallax_Inc_Propeller_P2-ES_EVAL_P23YOO42-if00-port0 -t -b 230400
     Chip ID is:  0d 5d 52 f6 08 37 b5 6c
    DATA_PINS = 200    CE_PIN =  13    CLK_PIN =  12
    SPI cmode=3  CLK_REGD = 1  TX_REGD = 1  RX_REGD = 1
    SPI clock ratio: 2 (sysclock/2)
    Test data length: 100 x 1024 = 102400 bytes
    
        Frequency dependent lag compensation
                  0    1    2    3    4    5
      60 MHz    100% 100%   0%   0%   0%   0%
      62 MHz    100% 100%   0%   0%   0%   0%
      64 MHz    100% 100%   0%   0%   0%   0%
      66 MHz    100% 100%   0%   0%   0%   0%
      68 MHz    100% 100%   0%   0%   0%   0%
      70 MHz    100% 100%   0%   0%   0%   0%
      72 MHz    100% 100%   0%   0%   0%   0%
      74 MHz    100% 100%   0%   0%   0%   0%
      76 MHz    100% 100%   0%   0%   0%   0%
      78 MHz    100% 100%   0%   0%   0%   0%
      80 MHz    100% 100%   0%   0%   0%   0%
      82 MHz    100% 100%   0%   0%   0%   0%
      84 MHz    100% 100%   0%   0%   0%   0%
      86 MHz    100% 100%   0%   0%   0%   0%
      88 MHz    100% 100%   0%   0%   0%   0%
      90 MHz    100% 100%   0%   0%   0%   0%
      92 MHz    100% 100%   0%   0%   0%   0%
      94 MHz    100% 100%   0%   0%   0%   0%
      96 MHz    100% 100%   0%   0%   0%   0%
      98 MHz    100% 100%   0%   0%   0%   0%
     100 MHz    100% 100%   0%   0%   0%   0%
     102 MHz    100% 100%   0%   0%   0%   0%
     104 MHz    100% 100%   0%   0%   0%   0%
     106 MHz     99% 100%   0%   0%   0%   0%
     108 MHz     89% 100%   0%   0%   0%   0%
     110 MHz     50% 100%   0%   0%   0%   0%
     112 MHz     13% 100%   0%   0%   0%   0%
     114 MHz      3% 100%   0%   0%   0%   0%
     116 MHz      0% 100%   0%   0%   0%   0%
     118 MHz      0% 100%   0%   0%   0%   0%
     120 MHz      0% 100%   0%   0%   0%   0%
     122 MHz      0% 100%   0%   0%   0%   0%
     124 MHz      0% 100%   0%   0%   0%   0%
     126 MHz      0% 100%   0%   0%   0%   0%
     128 MHz      0% 100%   0%   0%   0%   0%
     130 MHz      0% 100%   0%   0%   0%   0%
     132 MHz      0% 100%   0%   0%   0%   0%
     134 MHz      0% 100%   6%   0%   0%   0%
     136 MHz      0% 100%  96%   0%   0%   0%
     138 MHz      0% 100% 100%   0%   0%   0%
     140 MHz      0% 100% 100%   0%   0%   0%
     142 MHz      0% 100% 100%   0%   0%   0%
     144 MHz      0% 100% 100%   0%   0%   0%
     146 MHz      0% 100% 100%   0%   0%   0%
     148 MHz      0% 100% 100%   0%   0%   0%
     150 MHz      0% 100% 100%   0%   0%   0%
     152 MHz      0% 100% 100%   0%   0%   0%
     154 MHz      0% 100% 100%   0%   0%   0%
     156 MHz      0% 100% 100%   0%   0%   0%
     158 MHz      0% 100% 100%   0%   0%   0%
     160 MHz      0% 100% 100%   0%   0%   0%
     162 MHz      0% 100% 100%   0%   0%   0%
     164 MHz      0% 100% 100%   0%   0%   0%
     166 MHz      0% 100% 100%   0%   0%   0%
     168 MHz      0% 100% 100%   0%   0%   0%
     170 MHz      0% 100% 100%   0%   0%   0%
     172 MHz      0% 100% 100%   0%   0%   0%
     174 MHz      0% 100% 100%   0%   0%   0%
     176 MHz      0% 100% 100%   0%   0%   0%
     178 MHz      0% 100% 100%   0%   0%   0%
     180 MHz      0% 100% 100%   0%   0%   0%
     182 MHz      0% 100% 100%   0%   0%   0%
     184 MHz      0% 100% 100%   0%   0%   0%
     186 MHz      0% 100% 100%   0%   0%   0%
     188 MHz      0% 100% 100%   0%   0%   0%
     190 MHz      0% 100% 100%   0%   0%   0%
     192 MHz      0% 100% 100%   0%   0%   0%
     194 MHz      0% 100% 100%   0%   0%   0%
     196 MHz      0% 100% 100%   0%   0%   0%
     198 MHz      0% 100% 100%   0%   0%   0%
     200 MHz      0% 100% 100%   0%   0%   0%
     202 MHz      0% 100% 100%   0%   0%   0%
     204 MHz      0% 100% 100%   0%   0%   0%
     206 MHz      0% 100% 100%   0%   0%   0%
     208 MHz      0% 100% 100%   0%   0%   0%
     210 MHz      0% 100% 100%   0%   0%   0%
     212 MHz      0% 100% 100%   0%   0%   0%
     214 MHz      0%  99% 100%   0%   0%   0%
     216 MHz      0%  96% 100%   0%   0%   0%
     218 MHz      0%  78% 100%   0%   0%   0%
     220 MHz      0%  45% 100%   0%   0%   0%
     222 MHz      0%  16% 100%   0%   0%   0%
     224 MHz      0%   4% 100%   0%   0%   0%
     226 MHz      0%   1% 100%   0%   0%   0%
     228 MHz      0%   0% 100%   0%   0%   0%
     230 MHz      0%   0% 100%   0%   0%   0%
     232 MHz      0%   0% 100%   0%   0%   0%
     234 MHz      0%   0% 100%   0%   0%   0%
     236 MHz      0%   0% 100%   0%   0%   0%
     238 MHz      0%   0% 100%   0%   0%   0%
     240 MHz      0%   0% 100%   0%   0%   0%
     242 MHz      0%   0% 100%   0%   0%   0%
     244 MHz      0%   0% 100%   0%   0%   0%
     246 MHz      0%   0% 100%   0%   0%   0%
     248 MHz      0%   0% 100%   0%   0%   0%
     250 MHz      0%   0% 100%   0%   0%   0%
     252 MHz      0%   0% 100%   0%   0%   0%
     254 MHz      0%   0% 100%   0%   0%   0%
     256 MHz      0%   0% 100%   0%   0%   0%
     258 MHz      0%   0% 100%   0%   0%   0%
     260 MHz      0%   0% 100%   0%   0%   0%
     262 MHz      0%   0% 100%   0%   0%   0%
     264 MHz      0%   0% 100%   0%   0%   0%
     266 MHz      0%   0% 100%   4%   0%   0%
     268 MHz      0%   0% 100%  72%   0%   0%
     270 MHz      0%   0% 100% 100%   0%   0%
     272 MHz      0%   0% 100% 100%   0%   0%
     274 MHz      0%   0% 100% 100%   0%   0%
     276 MHz      0%   0% 100% 100%   0%   0%
     278 MHz      0%   0% 100% 100%   0%   0%
     280 MHz      0%   0% 100% 100%   0%   0%
     282 MHz      0%   0% 100% 100%   0%   0%
     284 MHz      0%   0% 100% 100%   0%   0%
     286 MHz      0%   0% 100% 100%   0%   0%
     288 MHz      0%   0% 100% 100%   0%   0%
     290 MHz      0%   0% 100% 100%   0%   0%
     292 MHz      0%   0% 100% 100%   0%   0%
     294 MHz      0%   0% 100% 100%   0%   0%
     296 MHz      0%   0% 100% 100%   0%   0%
     298 MHz      0%   0% 100% 100%   0%   0%
     300 MHz      0%   0% 100% 100%   0%   0%
     302 MHz      0%   0% 100% 100%   0%   0%
     304 MHz      0%   0% 100% 100%   0%   0%
     306 MHz      0%   0% 100% 100%   0%   0%
     308 MHz      0%   0% 100% 100%   0%   0%
     310 MHz      0%   0% 100% 100%   0%   0%
     312 MHz      0%   0%  99% 100%   0%   0%
     314 MHz      0%   0%  99% 100%   0%   0%
     316 MHz      0%   0%  99% 100%   0%   0%
     318 MHz      0%   0%  96% 100%   0%   0%
     320 MHz      0%   0%  87% 100%   0%   0%
     322 MHz      0%   0%  72% 100%   0%   0%
     324 MHz      0%   0%  54% 100%   0%   0%
     326 MHz      0%   0%  33% 100%   0%   0%
     328 MHz      0%   0%  19% 100%   0%   0%
     330 MHz      0%   0%  10% 100%   0%   0%
     332 MHz      0%   0%   5% 100%   0%   0%
     334 MHz      0%   0%   3% 100%   0%   0%
     336 MHz      0%   0%   1% 100%   0%   0%
     338 MHz      0%   0%   1% 100%   0%   0%
     340 MHz      0%   0%   0% 100%   0%   0%
     342 MHz      0%   0%   0% 100%   0%   0%
     344 MHz      0%   0%   0% 100%   0%   0%
     346 MHz      0%   0%   0% 100%   0%   0%
     348 MHz      0%   0%   0% 100%   0%   0%
     350 MHz      0%   0%   0%  98%   0%   0%
     352 MHz      0%   0%   0%  38%   0%   0%
     354 MHz      0%   0%   0%  25%   0%   0%
     356 MHz      0%   0%   0%  22%   0%   0%
     358 MHz      0%   0%   0%  19%   0%   0%
     360 MHz      0%   0%   0%  16%   0%   0%
     362 MHz      0%   0%   0%  14%   0%   0%
     364 MHz      0%   0%   0%  11%   0%   0%
     366 MHz      0%   0%   0%   8%   0%   0%
     368 MHz      0%   0%   0%   6%   0%   0%
     370 MHz      0%   0%   0%   4%   0%   0%
     372 MHz      0%   0%   0%   3%   0%   0%
     374 MHz      0%   0%   0%   2%   1%   0%
     376 MHz      0%   0%   0%   0%   1%   0%
     378 MHz      0%   0%   0%   0%   1%   0%
     380 MHz      0%   0%   0%   0%   1%   0%
     382 MHz      0%   0%   0%   0%   9%   0%
     384 MHz      0%   0%   0%   0%  86%   0%
     386 MHz      0%   0%   0%   0%  99%   0%
     388 MHz      0%   0%   0%   0% 100%   0%
     390 MHz      0%   0%   0%   0% 100%   1%
    Done
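
A rough way to read the table: each column is a lag compensation step and each cell is the percentage of test runs that passed at that frequency. The sketch below copies a handful of rows from the log and lists which steps passed every run. The function name `working_lags` and the dictionary layout are made up for illustration and are not part of the test program.

```python
# A few rows copied from the log above:
# sysclock in MHz -> pass rate (%) for lag compensation steps 0..5.
SAMPLE_ROWS = {
    100: [100, 100, 0, 0, 0, 0],
    110: [50, 100, 0, 0, 0, 0],
    200: [0, 100, 100, 0, 0, 0],
    220: [0, 45, 100, 0, 0, 0],
    300: [0, 0, 100, 100, 0, 0],
    352: [0, 0, 0, 38, 0, 0],
}


def working_lags(pass_rates):
    """Return the lag steps that passed every test run at one frequency."""
    return [lag for lag, pct in enumerate(pass_rates) if pct == 100]


if __name__ == "__main__":
    for mhz, rates in SAMPLE_ROWS.items():
        print(f"{mhz} MHz: fully working lag steps {working_lags(rates)}")
```

Frequencies where two adjacent steps both reach 100% are the comfortable ones. A row with no fully passing step, such as 352 MHz here, has no safe setting at all.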
    
  • roglohrogloh Posts: 4,461

    @evanh said:
    Crossing page boundaries is crappy with these chips. I'm getting errors as low as 70 MHz SPI clock when doing larger than 1024 byte bursts. Smaller length page crossings can actually achieve 133 MHz, just. Whereas staying within a page works right up to Prop2's limits, ~200 MHz SPI clock when chilled. Definitely a good rule to split them up.

    Of course. That is why my driver always follows this rule and splits it up. Break the rules and it's anyone's guess as to the result. PSRAM seems solid as anything when you keep within the 1kB page size (or 2kB for 8bit, 4kB for 16bit).
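
The splitting rule can be sketched in a few lines. This is only an illustration of the arithmetic, not the driver's actual code: `split_burst` and `PAGE_SIZE` are hypothetical names, and the 1024-byte default is the single-chip page size quoted above (2 kB and 4 kB for the wider buses).

```python
PAGE_SIZE = 1024  # bytes per page for one 4-bit chip, as quoted in the post


def split_burst(addr, length, page_size=PAGE_SIZE):
    """Split a transfer into (address, byte_count) chunks that stay within one page each."""
    chunks = []
    while length > 0:
        room = page_size - (addr % page_size)  # bytes left before the page boundary
        count = min(room, length)
        chunks.append((addr, count))
        addr += count
        length -= count
    return chunks


if __name__ == "__main__":
    # A 100-byte burst starting 24 bytes before a boundary becomes two transfers.
    print(split_burst(1000, 100))
```

Each chunk would then be issued as its own CE#-framed transaction, so no single burst runs across a page boundary.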

  • YanomaniYanomani Posts: 1,507
    edited 2022-07-23 14:49

    @evanh said:
    It'll be transmission line effects. The paths are much longer out to the add-on boards. I had previously thought attenuation was the biggest issue but maybe bus terminations might do wonders.

    Which P2 I/O pin numbers are being used to drive the PSRam's SIO[3:0]?

    P.S. I understand the CLK and CE#, but the DATA_PINS numbering system is messing with my brain...

    DATA_PINS = 232 CE_PIN = 57 CLK_PIN = 56

  • roglohrogloh Posts: 4,461

    DATA_PINS = 232
    232 = 192 + 40 which is a pin group of 4 based at P40.

  • evanhevanh Posts: 13,424

    That's a good idea - Split off the ADDPINS component ...
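
For anyone else squinting at these numbers, here is a small sketch of the decoding. It assumes the usual Spin2 pin-field layout, where the low six bits hold the base pin and the next five bits hold the count of additional pins (what ADDPINS adds in). The helper names are invented for this example.

```python
def decode_pin_field(value):
    """Split a pin-field constant into (base_pin, pin_count)."""
    base = value & 0x3F          # bits 5..0: base pin number
    extra = (value >> 6) & 0x1F  # bits 10..6: additional pins (the ADDPINS part)
    return base, extra + 1


def encode_pin_field(base, pin_count):
    """Build the pin-field constant for pin_count pins starting at base."""
    return (base & 0x3F) | ((pin_count - 1) << 6)


if __name__ == "__main__":
    for field in (232, 200):
        base, count = decode_pin_field(field)
        print(f"DATA_PINS = {field}: {count} pins based at P{base}")
```

So 232 is four pins from P40, and the 200 shown in the earlier test log is four pins from P8.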

  • YanomaniYanomani Posts: 1,507
    edited 2022-07-23 15:00

    @rogloh said:
    DATA_PINS = 232
    232 = 192 + 40 which is a pin group of 4 based at P40.

    @evanh said:
    That's a good idea - Split off the ADDPINS component ...

    Thanks guys! I was at the nearmost strabismus-contraption-point of my ever-suffering eyes here... :lol:

  • YanomaniYanomani Posts: 1,507
    edited 2022-07-24 01:14

    @rogloh said:

    @evanh said:
    Crossing page boundaries is crappy with these chips. I'm getting errors as low as 70 MHz SPI clock when doing larger than 1024 byte bursts. Smaller length page crossings can actually achieve 133 MHz, just. Whereas staying within a page works right up to Prop2's limits, ~200 MHz SPI clock when chilled. Definitely a good rule to split them up.

    Of course. That is why my driver always follows this rule and splits it up. Break the rules and it's anyone's guess as to the result. PSRAM seems solid as anything when you keep within the 1kB page size (or 2kB for 8bit, 4kB for 16bit).

    You and @evanh can also try splitting the CLK and DATA transactions in two, while keeping PSRam CE# low and providing a rest period with CLK low in the interval between them; this would give the PSRam more time to prepare the next data buffer, ready to continue the operation at the next row. The interval would also leave enough time for another command to be sent to the Streamer, without being forced to pre-buffer it.

    Getting the interval timing (sysclk cycle count) right is of prime importance, so that the new series of Smart Pin-generated CLK pulses stays in phase with the Streamer I/O data transfers.

    There would be no hiccup, and you'd avoid the burden of ending the current CE#-bounded transaction just to start a new one soon afterwards. It would be a kind of "command chaining", but without needing to re-send the same command and a new (and unneeded) address phase.

  • evanhevanh Posts: 13,424
    edited 2022-07-26 20:30

    Okay, the 4-bit code is all sorted. I'm really happy - Managed to eliminate all uncalculated (hand-coded) timing values!

    EDIT: I guess I should add a boilerplate ... done
    EDIT2: Comments improved
    EDIT3: Rename of udec() to udeci() to avoid name clash in Pnut
    EDIT4: This one has a bug, the newer release supersedes it - https://forums.parallax.com/discussion/comment/1541443/#Comment_1541443

  • YanomaniYanomani Posts: 1,507
    edited 2022-07-24 07:56

    As for the tests of nibble-oriented devices with 1 kB-long rows, maybe I can add some "little bits" (sic) of my own.

    The attached file contains 2 kBytes of non-random, nibble-oriented data, composed of 64 different 32-byte-wide "mini-rows" of data patterns, intended to produce a deterministic set of frequencies at the PSRam's SIO[3:0].

    Each "mini-row" is identified by a one-byte Gray code that represents its number in the sequence, spanning from 'h00 thru 'h28, and located at the 18th byte position of each 32-byte-long sequence.

    The following excerpt (four mini-rows) shows an example of how they are individually coded:

    "A5 5A A5 5A 01 80 FF 08 10 FF 01 80 5A A5 00 00 00 00 5A A5 FE 7F 00 FE 7F 00 FE 7F 5A A5 5A A5

    A5 5A A5 5A 01 80 FF 08 10 FF 01 80 5A A5 00 00 00 01 5A A5 FE 7F 00 FE 7F 00 FE 7F 5A A5 5A A5

    A5 5A A5 5A 01 80 FF 08 10 FF 01 80 5A A5 00 00 00 03 5A A5 FE 7F 00 FE 7F 00 FE 7F 5A A5 5A A5

    A5 5A A5 5A 01 80 FF 08 10 FF 01 80 5A A5 00 00 00 02 5A A5 FE 7F 00 FE 7F 00 FE 7F 5A A5 5A A5"

    Hope it helps a bit :smile:
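
A sketch of how such a pattern could be regenerated. It assumes that only the 18th byte differs between mini-rows (true for the four rows shown, unverified for the rest of the file) and that the identifier is the standard reflected binary Gray code of the row index. `HEAD`, `TAIL`, `gray` and `mini_row` are names made up here.

```python
# Fixed bytes before and after the identifier, copied from the excerpt above.
HEAD = bytes.fromhex("A5 5A A5 5A 01 80 FF 08 10 FF 01 80 5A A5 00 00 00")
TAIL = bytes.fromhex("5A A5 FE 7F 00 FE 7F 00 FE 7F 5A A5 5A A5")


def gray(n):
    """Reflected binary Gray code of n."""
    return n ^ (n >> 1)


def mini_row(index):
    """One 32-byte mini-row with its Gray-coded identifier as the 18th byte."""
    return HEAD + bytes([gray(index)]) + TAIL


# 64 mini-rows of 32 bytes each give the 2 kByte test pattern.
pattern = b"".join(mini_row(i) for i in range(64))

if __name__ == "__main__":
    for i in range(4):
        print(" ".join(f"{b:02X}" for b in mini_row(i)))
```

The first four rows printed this way match the excerpt (identifiers 00, 01, 03, 02). With this encoding the last row's identifier comes out as 'h20; the excerpt alone does not show whether the attached file uses exactly this code.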

  • evanhevanh Posts: 13,424
    edited 2022-07-24 13:31

    Damn, my newly fleshed out stdlib.spin2 has a namespace conflict with udec() in Pnut ... and I've used it lots too.

    EDIT: Now updated above.

  • evanhevanh Posts: 13,424
    edited 2022-07-26 10:59

    Aargh! Somehow I've made the timing different between Flexspin and Pnut. :( I was comparing them too, must have been a late change. Pnut is acting like it has one less instruction in the critical section ... EDIT: Bah! It's Flexspin trying to optimise by using an immediate operand when it determines there is a constant value. Which can be either # or ## depending on how large that constant is!

    So my workaround for that issue didn't work either. I guess I hadn't checked it carefully enough. Time for a follow-up Flexspin bug report ...

    EDIT: This is triggering a rethink. I might try introducing a block of parameters in hubRAM that can be fast copied into register space ...

    EDIT2: Ha! Not bad at all. I can happily tack it on the end of the inline code. :) eg:

    PUB  tx_cmd( cmd )
    
        org
                    setxfrq xfrq                ' set sysclock/1 for lead in timing
                    rolnib  cmd, cmd, #1        ' big-endian nibble swap
    
                    xinit   leadin, #0          ' lead-in timing, at sysclock/1
                    setq    nco                 ' streamer transfer rate
                    xcont   ca8, cmd            ' tx Command only
                    waitx   #2 ' hacked in for quick test
                    drvl    datp                ' active for tx CA phase
                    drvl    #PSRAM_CE_PIN
                    dirh    #PSRAM_CLK_PIN      ' start smartpin internally cycling at SPI clock rate
                    wypin   #2, #PSRAM_CLK_PIN  ' 2 SPI clocks for Command only
    
                    waitx   #2 * CLK_DIV - 4
                    dirl    #PSRAM_CLK_PIN      ' reset smartpin
                    dirl    datp                ' tristate the databus upon completion
            _ret_   drvh    #PSRAM_CE_PIN
    
    xfrq            long    $8000_0000
    leadin          long    M_LEADIN
    nco             long    M_NCO
    ca8             long    M_CA8
    datp            long    PSRAM_DATA_PIN
        end
    
  • evanhevanh Posts: 13,424
    edited 2022-07-26 12:02

    Nice! It's solved the new problem completely. And it behaves the same in Pnut again now.

    EDIT: Err, not completely. There is still a variant where when the function parameters have constants passed to them, it can have the same issue. Problem here is said function parameters need to be able to modify that inline data block. I don't know if that's possible. Certainly no direct symbol access in Flexspin.

  • roglohrogloh Posts: 4,461

    Yeah that is handy. I think returning like that only works in the org/end not asm/endasm blocks, right?

  • Yes. ORG/END is an interpreter-style loaded ASM block; ASM/ENDASM just dumps your assembly into the compiler IR (where it can participate in inlining, constant propagation, CORDIC reordering, etc.)

  • evanhevanh Posts: 13,424
    edited 2022-07-26 14:33

    This is rearing its head mostly now because I'm trying to move to doing the 16-bit wide bus. And what's changed is the defined constant for the 16 data pins. It now has an ADDPINS 15 so it exceeds the 9-bit limit of a single # immediate. This has had catastrophic repercussions on timing when compiling with Flexspin.

    And there is another issue, actually even worse, where a local register variable will be used if the passed in constant is non-zero. But if it's zero then the optimiser will generate an immediate operand with ## instead. This one occurs when selecting individual chips in SPI interface mode.
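
To make the size jump concrete, here is a small sketch of the arithmetic, assuming the usual pin-field layout (base pin in the low six bits, additional-pin count shifted up by six). A plain # immediate holds 9 bits, so anything above 511 needs the ## form, which as I understand it makes the assembler emit an extra AUGS/AUGD prefix instruction and so shifts the timing. The function names and the P40 base are illustrative only.

```python
SHORT_IMMEDIATE_MAX = 0x1FF  # largest value a 9-bit # immediate can hold


def pin_field(base, pin_count):
    """Pin-field constant for pin_count pins starting at base (base ADDPINS pin_count-1)."""
    return base | ((pin_count - 1) << 6)


def needs_long_immediate(value):
    """True when the constant no longer fits a single # immediate and needs ##."""
    return value > SHORT_IMMEDIATE_MAX


if __name__ == "__main__":
    for pins in (4, 8, 16):
        value = pin_field(40, pins)
        form = "##" if needs_long_immediate(value) else "#"
        print(f"{pins:2d} data pins at P40 -> {value:4d} uses {form}")
```

With a 4-pin or 8-pin group the constant stays below 512, while the 16-pin group (ADDPINS 15) always lands at 960 or above whatever the base pin is.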

  • evanhevanh Posts: 13,424
    edited 2022-07-26 14:43

    There's something else going on too. And I better make a backup of this one. If I shift position of certain data in DAT section the program crashes or does stupid things. It's a little like a buffer overrun but I'm pretty certain that isn't the reason. Just swapping two items ahead of the buffers can still blow it up. And moving a non-sensitive, but important, item to after the buffers is fine.

    PS: I suspect this is the elusive change-one-innocuous-thing-and-it-goes-bats. The one that will vanish without a trace again.

  • evanhevanh Posts: 13,424

    Okay, a newer more generic release that handles one, two and four chips in parallel. Only have to adjust the pin constants to suit.

    I'll work on the comments.

  • evanhevanh Posts: 13,424

    Roger,
    Thanks for putting me onto using lutRAM for mapping the CA phase of QPI interface onto multiple chips. It was a learning experience just in treating SPI and QPI interface modes so differently. It didn't take long to decide to access one chip at a time when in SPI mode. SPI is always only going to be for config here anyway.

  • roglohrogloh Posts: 4,461

    @evanh said:
    Roger,
    Thanks for putting me onto using lutRAM for mapping the CA phase of QPI interface onto multiple chips. It was a learning experience just in treating SPI and QPI interface modes so differently. It didn't take long to decide to access one chip at a time when in SPI mode. SPI is always only going to be for config here anyway.

    No problem. In my driver suite I've coded up 3 different PSRAM drivers so far plus some other special variants for Ada so I guess I should probably know a little bit about getting them working on the P2 by now. You'll have fun with the RMW aspects during write bursts :smile:

  • evanhevanh Posts: 13,424
    edited 2022-07-27 04:07

    @rogloh said:
    You'll have fun with the RMW aspects during write bursts :smile:

    Heh, not my problem. I don't intend to make a finished product. It's mostly about demonstrating the streamer/smartpin aligning in software, instruction counting, and doing the timing calculations. It's a lot more important for perfection at sysclock/2 than at sysclock/8 with the SD cards.

    EDIT: That might be the next direction to go in. Use this knowledge to implement the 4-bit SD protocol at sysclock/2. Operate 50 MHz SD cards with just 100 MHz sysclock.

  • roglohrogloh Posts: 4,461

    Yeah fair enough. A spin2 based PSRAM driver is good for education and for some single COG use, but once you need a couple of COGs sharing the memory, or add real-time constraints, it starts to be a bit of a limiting factor and you need an arbiter COG that can fragment and control access to the shared memory. e.g. video use.

  • evanhevanh Posts: 13,424

    I tell you what. Having both QPI and SPI modes done with the streamer, and the equivalent timing calculations, provided a huge amount of troubleshooting. It was amazing how many times I got disparate outcomes between SPI and QPI. Which meant I could compare and come up with ideas for why and what was wrong. Very rarely did I revert back to older code to dig out of a hole.

  • roglohrogloh Posts: 4,461

    Yeah it's nice to have a good working reference to compare against. You are not starting out from scratch.

  • evanhevanh Posts: 13,424
    edited 2022-07-27 04:44

    Oh, I pillaged the nibble-swap code from your driver. I couldn't get my head around using those split/merge instructions in combo like that. It looks like something Ariba came up with.

    EDIT: Although I did once use them for the EPROM's DPI mode ... ah, it was merge-only:

    read_byte4
                    waitse1                     ' wait for smartpin (spi_do) buffer full event
    
                    rdpin   pa, #spi_do         ' 16-bit shift-in as little-endian (odd bits)
                    rdpin   pb, #spi_di         ' (even bits)
                    rev     pa                  ' but SPI data is stored as big-endian (odd bits)
                    rev     pb                  ' (even bits)
                    rolword pa, pb, #0          ' combine to a single 32-bit word
            _ret_   mergew  pa                  ' untangle the odd-even pattern
    
  • roglohrogloh Posts: 4,461
    edited 2022-07-27 05:15

    You'll need to use the splitb, rev, movbyts, mergeb combination when the "a" bit is not available in the streamer command - i.e. for hub burst transfers in the 8 and 16 bit bus modes that can't use pure immediate nibble mode. This is required for correct address endianness.
    For 4 bit mode, you can use the simpler movbyts command to swap bytes and have the "a" bit do the nibble reversals for you.

    I find coming up with these bit twiddling sequences difficult when you need multiple in a row to achieve the final result. For some reason, what you need to do is not something that just comes to my mind. In fact in the past I've even resorted to writing some code that brute force exercises a bunch of these commands in different sequences until it comes up with the result. E.g. trying out different combinations of split, rev, movbyts, rol, merge in different orders etc until it eventually stumbles onto what you want, lol. They are highly versatile instructions.

  • evanhevanh Posts: 13,424
    edited 2022-07-27 05:14

    I've used the same for all now, i.e. the nibble swapper below. The table in lutRAM is built according to the number of parallel chips and, in the case of a single chip, which half of the 8-bit pin group it sits on.

                    splitb  paddr               ' big-endian nibble swapping
                    rev     paddr               ' big-endian nibble swapping
                    movbyts paddr, #$1b         ' big-endian nibble swapping
                    mergeb  paddr               ' big-endian nibble swapping
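
For readers who, like some of us, can't picture what that four-instruction combo does, here is a bit-level model in Python. The four helper functions follow my reading of the P2 instruction descriptions (SPLITB gathers every 4th bit into bytes, MERGEB is its inverse, MOVBYTS picks each result byte with a 2-bit selector); treat them as assumptions, not verified hardware behaviour.

```python
def splitb(d):
    """Model of SPLITB: bit b of nibble n moves to bit n of byte b."""
    out = 0
    for i in range(32):
        if (d >> i) & 1:
            out |= 1 << (8 * (i & 3) + (i >> 2))
    return out


def mergeb(d):
    """Model of MERGEB, the inverse of SPLITB: bit n of byte b moves to bit b of nibble n."""
    out = 0
    for j in range(32):
        if (d >> j) & 1:
            out |= 1 << (4 * (j & 7) + (j >> 3))
    return out


def rev(d):
    """Model of REV: reverse all 32 bits."""
    return int(format(d, "032b")[::-1], 2)


def movbyts(d, sel):
    """Model of MOVBYTS: result byte k is source byte sel[2k+1:2k]."""
    src = [(d >> (8 * k)) & 0xFF for k in range(4)]
    out = 0
    for k in range(4):
        out |= src[(sel >> (2 * k)) & 3] << (8 * k)
    return out


def nibble_swap(paddr):
    """The splitb / rev / movbyts #$1b / mergeb sequence from the post."""
    return mergeb(movbyts(rev(splitb(paddr)), 0x1B))


if __name__ == "__main__":
    print(f"${nibble_swap(0x12345678):08X}")
```

Under this model the net effect is to reverse the order of the eight nibbles while leaving the bits inside each nibble alone, which fits the "big-endian nibble swapping" comments.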
    

    Here's the table builder (Still needs comments added)

        ' Preset spare lutRAM with the streamer's decoding LUT for parallel issue Command-Address phase
        ' Complete table is:  $0000,$1111,$2222,$3333,$4444,$5555,$6666,$7777,$8888,$9999,$aaaa,$bbbb,$cccc,$dddd,$eeee,$ffff
        if CHIPS == 1 and PSRAM_DATA_PIN & 4
            org
                    mov     idx, #0
                    rep     @.rend, #16
                    mov     pb, idx
                    shl     pb, #4
                    wrlut   pb, idx
                    add     idx, #1
    .rend
            end
        else
            idx := CHIPS
            org
                    cmp     idx, #2   wcz
                    mov     idx, #0
                    rep     @.rend, #16
                    mov     pb, idx
            if_ae   setnib  pb, pb, #1
            if_a    setbyte pb, pb, #1
                    wrlut   pb, idx
                    add     idx, #1
    .rend
            end
    
  • roglohrogloh Posts: 4,461
    edited 2022-07-27 05:23

    Looks unwieldy, why not just use mul to compute the values? Multiply idx by $1111 as you increment idx from 0-$F and write the word or byte to LUT.

    In fact I don't think it hurts to replicate the pattern to all 4 nibbles in all cases, as the streamer commands will select the appropriate number of pins that receive the data.
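
A quick sketch of that arithmetic, comparing the multiply approach with the values the branchy builder produces. `branchy_lut` mirrors my reading of the inline code above (shifted nibble for a single chip on the upper half, otherwise replicate to a byte for two chips and to a word for four). The masks stand in for whichever pins the streamer command selects, which is an assumption of this sketch.

```python
def replicated_lut():
    """All 16 entries with the index copied into four nibbles: idx * $1111."""
    return [idx * 0x1111 for idx in range(16)]


def branchy_lut(chips, upper_half=False):
    """Values built by the conditional table builder, as I read it."""
    lut = []
    for idx in range(16):
        if chips == 1 and upper_half:
            value = idx << 4        # single chip on the upper nibble of the pin group
        elif chips == 1:
            value = idx             # single chip on the lower nibble
        elif chips == 2:
            value = idx * 0x11      # setnib: copy into nibble 1
        else:
            value = idx * 0x1111    # setnib then setbyte: copy into all four nibbles
        lut.append(value)
    return lut


if __name__ == "__main__":
    print(", ".join(f"${v:04x}" for v in replicated_lut()))
```

Masking the replicated table down to the pins in use gives the same values as each branch, which is why replicating to all four nibbles should be harmless if the streamer only drives the selected pins.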
