64 MB PSRAM module using 16 pins? --> 96 MB w/16 pins or 24 MB w/8 pins

evanh · 2022-07-23 14:01

Yup, for that particular run.

Yanomani · 2022-07-23 14:23

Then I'm almost sure your Eval setup is suffering from Clock_Jittery-Syndrome (frequency- and temperature-related, for sure), and this also reveals some extra information. You're relying on the Smart Pins to get proper PSRam CLK and CE# signals, but can't do the same for the PSRam I/Os, because the Streamer is needed to produce proper data signals. Although the clocked action occurs at the pad ring, the resulting timings are NOT aligned, due to subtle differences between the signals that originate in the pad ring region itself (pure 3.3 V logic: CLK and CE#) and the ones that come from the raw OR'ed OUT bus (driven by Cogs/Streamers, so 1.8 V logic that must be voltage-translated to 3.3 V before hitting the last latching stage at the pad ring).

evanh · 2022-07-23 14:28

It'll be transmission line effects. The paths are much longer out to the add-on boards. I had previously thought attenuation was the biggest issue but maybe bus terminations might do wonders.

evanh · 2022-07-23 14:44

Here's Rayman's 48 MB add-on attached to the same Eval Board. It has fewer ICs, only three per bus instead of six. The tracks are shorter too. The results are much improved over the 96 MB add-on.

usb-Parallax_Inc_Propeller_P2-ES_EVAL_P23YOO42-if00-port0 -t -b 230400
 Chip ID is:  0d 5d 52 f6 08 37 b5 6c
DATA_PINS = 200    CE_PIN =  13    CLK_PIN =  12
SPI cmode=3  CLK_REGD = 1  TX_REGD = 1  RX_REGD = 1
SPI clock ratio: 2 (sysclock/2)
Test data length: 100 x 1024 = 102400 bytes

    Frequency dependent lag compensation
              0    1    2    3    4    5
  60 MHz    100% 100%   0%   0%   0%   0%
  62 MHz    100% 100%   0%   0%   0%   0%
  64 MHz    100% 100%   0%   0%   0%   0%
  66 MHz    100% 100%   0%   0%   0%   0%
  68 MHz    100% 100%   0%   0%   0%   0%
  70 MHz    100% 100%   0%   0%   0%   0%
  72 MHz    100% 100%   0%   0%   0%   0%
  74 MHz    100% 100%   0%   0%   0%   0%
  76 MHz    100% 100%   0%   0%   0%   0%
  78 MHz    100% 100%   0%   0%   0%   0%
  80 MHz    100% 100%   0%   0%   0%   0%
  82 MHz    100% 100%   0%   0%   0%   0%
  84 MHz    100% 100%   0%   0%   0%   0%
  86 MHz    100% 100%   0%   0%   0%   0%
  88 MHz    100% 100%   0%   0%   0%   0%
  90 MHz    100% 100%   0%   0%   0%   0%
  92 MHz    100% 100%   0%   0%   0%   0%
  94 MHz    100% 100%   0%   0%   0%   0%
  96 MHz    100% 100%   0%   0%   0%   0%
  98 MHz    100% 100%   0%   0%   0%   0%
 100 MHz    100% 100%   0%   0%   0%   0%
 102 MHz    100% 100%   0%   0%   0%   0%
 104 MHz    100% 100%   0%   0%   0%   0%
 106 MHz     99% 100%   0%   0%   0%   0%
 108 MHz     89% 100%   0%   0%   0%   0%
 110 MHz     50% 100%   0%   0%   0%   0%
 112 MHz     13% 100%   0%   0%   0%   0%
 114 MHz      3% 100%   0%   0%   0%   0%
 116 MHz      0% 100%   0%   0%   0%   0%
 118 MHz      0% 100%   0%   0%   0%   0%
 120 MHz      0% 100%   0%   0%   0%   0%
 122 MHz      0% 100%   0%   0%   0%   0%
 124 MHz      0% 100%   0%   0%   0%   0%
 126 MHz      0% 100%   0%   0%   0%   0%
 128 MHz      0% 100%   0%   0%   0%   0%
 130 MHz      0% 100%   0%   0%   0%   0%
 132 MHz      0% 100%   0%   0%   0%   0%
 134 MHz      0% 100%   6%   0%   0%   0%
 136 MHz      0% 100%  96%   0%   0%   0%
 138 MHz      0% 100% 100%   0%   0%   0%
 140 MHz      0% 100% 100%   0%   0%   0%
 142 MHz      0% 100% 100%   0%   0%   0%
 144 MHz      0% 100% 100%   0%   0%   0%
 146 MHz      0% 100% 100%   0%   0%   0%
 148 MHz      0% 100% 100%   0%   0%   0%
 150 MHz      0% 100% 100%   0%   0%   0%
 152 MHz      0% 100% 100%   0%   0%   0%
 154 MHz      0% 100% 100%   0%   0%   0%
 156 MHz      0% 100% 100%   0%   0%   0%
 158 MHz      0% 100% 100%   0%   0%   0%
 160 MHz      0% 100% 100%   0%   0%   0%
 162 MHz      0% 100% 100%   0%   0%   0%
 164 MHz      0% 100% 100%   0%   0%   0%
 166 MHz      0% 100% 100%   0%   0%   0%
 168 MHz      0% 100% 100%   0%   0%   0%
 170 MHz      0% 100% 100%   0%   0%   0%
 172 MHz      0% 100% 100%   0%   0%   0%
 174 MHz      0% 100% 100%   0%   0%   0%
 176 MHz      0% 100% 100%   0%   0%   0%
 178 MHz      0% 100% 100%   0%   0%   0%
 180 MHz      0% 100% 100%   0%   0%   0%
 182 MHz      0% 100% 100%   0%   0%   0%
 184 MHz      0% 100% 100%   0%   0%   0%
 186 MHz      0% 100% 100%   0%   0%   0%
 188 MHz      0% 100% 100%   0%   0%   0%
 190 MHz      0% 100% 100%   0%   0%   0%
 192 MHz      0% 100% 100%   0%   0%   0%
 194 MHz      0% 100% 100%   0%   0%   0%
 196 MHz      0% 100% 100%   0%   0%   0%
 198 MHz      0% 100% 100%   0%   0%   0%
 200 MHz      0% 100% 100%   0%   0%   0%
 202 MHz      0% 100% 100%   0%   0%   0%
 204 MHz      0% 100% 100%   0%   0%   0%
 206 MHz      0% 100% 100%   0%   0%   0%
 208 MHz      0% 100% 100%   0%   0%   0%
 210 MHz      0% 100% 100%   0%   0%   0%
 212 MHz      0% 100% 100%   0%   0%   0%
 214 MHz      0%  99% 100%   0%   0%   0%
 216 MHz      0%  96% 100%   0%   0%   0%
 218 MHz      0%  78% 100%   0%   0%   0%
 220 MHz      0%  45% 100%   0%   0%   0%
 222 MHz      0%  16% 100%   0%   0%   0%
 224 MHz      0%   4% 100%   0%   0%   0%
 226 MHz      0%   1% 100%   0%   0%   0%
 228 MHz      0%   0% 100%   0%   0%   0%
 230 MHz      0%   0% 100%   0%   0%   0%
 232 MHz      0%   0% 100%   0%   0%   0%
 234 MHz      0%   0% 100%   0%   0%   0%
 236 MHz      0%   0% 100%   0%   0%   0%
 238 MHz      0%   0% 100%   0%   0%   0%
 240 MHz      0%   0% 100%   0%   0%   0%
 242 MHz      0%   0% 100%   0%   0%   0%
 244 MHz      0%   0% 100%   0%   0%   0%
 246 MHz      0%   0% 100%   0%   0%   0%
 248 MHz      0%   0% 100%   0%   0%   0%
 250 MHz      0%   0% 100%   0%   0%   0%
 252 MHz      0%   0% 100%   0%   0%   0%
 254 MHz      0%   0% 100%   0%   0%   0%
 256 MHz      0%   0% 100%   0%   0%   0%
 258 MHz      0%   0% 100%   0%   0%   0%
 260 MHz      0%   0% 100%   0%   0%   0%
 262 MHz      0%   0% 100%   0%   0%   0%
 264 MHz      0%   0% 100%   0%   0%   0%
 266 MHz      0%   0% 100%   4%   0%   0%
 268 MHz      0%   0% 100%  72%   0%   0%
 270 MHz      0%   0% 100% 100%   0%   0%
 272 MHz      0%   0% 100% 100%   0%   0%
 274 MHz      0%   0% 100% 100%   0%   0%
 276 MHz      0%   0% 100% 100%   0%   0%
 278 MHz      0%   0% 100% 100%   0%   0%
 280 MHz      0%   0% 100% 100%   0%   0%
 282 MHz      0%   0% 100% 100%   0%   0%
 284 MHz      0%   0% 100% 100%   0%   0%
 286 MHz      0%   0% 100% 100%   0%   0%
 288 MHz      0%   0% 100% 100%   0%   0%
 290 MHz      0%   0% 100% 100%   0%   0%
 292 MHz      0%   0% 100% 100%   0%   0%
 294 MHz      0%   0% 100% 100%   0%   0%
 296 MHz      0%   0% 100% 100%   0%   0%
 298 MHz      0%   0% 100% 100%   0%   0%
 300 MHz      0%   0% 100% 100%   0%   0%
 302 MHz      0%   0% 100% 100%   0%   0%
 304 MHz      0%   0% 100% 100%   0%   0%
 306 MHz      0%   0% 100% 100%   0%   0%
 308 MHz      0%   0% 100% 100%   0%   0%
 310 MHz      0%   0% 100% 100%   0%   0%
 312 MHz      0%   0%  99% 100%   0%   0%
 314 MHz      0%   0%  99% 100%   0%   0%
 316 MHz      0%   0%  99% 100%   0%   0%
 318 MHz      0%   0%  96% 100%   0%   0%
 320 MHz      0%   0%  87% 100%   0%   0%
 322 MHz      0%   0%  72% 100%   0%   0%
 324 MHz      0%   0%  54% 100%   0%   0%
 326 MHz      0%   0%  33% 100%   0%   0%
 328 MHz      0%   0%  19% 100%   0%   0%
 330 MHz      0%   0%  10% 100%   0%   0%
 332 MHz      0%   0%   5% 100%   0%   0%
 334 MHz      0%   0%   3% 100%   0%   0%
 336 MHz      0%   0%   1% 100%   0%   0%
 338 MHz      0%   0%   1% 100%   0%   0%
 340 MHz      0%   0%   0% 100%   0%   0%
 342 MHz      0%   0%   0% 100%   0%   0%
 344 MHz      0%   0%   0% 100%   0%   0%
 346 MHz      0%   0%   0% 100%   0%   0%
 348 MHz      0%   0%   0% 100%   0%   0%
 350 MHz      0%   0%   0%  98%   0%   0%
 352 MHz      0%   0%   0%  38%   0%   0%
 354 MHz      0%   0%   0%  25%   0%   0%
 356 MHz      0%   0%   0%  22%   0%   0%
 358 MHz      0%   0%   0%  19%   0%   0%
 360 MHz      0%   0%   0%  16%   0%   0%
 362 MHz      0%   0%   0%  14%   0%   0%
 364 MHz      0%   0%   0%  11%   0%   0%
 366 MHz      0%   0%   0%   8%   0%   0%
 368 MHz      0%   0%   0%   6%   0%   0%
 370 MHz      0%   0%   0%   4%   0%   0%
 372 MHz      0%   0%   0%   3%   0%   0%
 374 MHz      0%   0%   0%   2%   1%   0%
 376 MHz      0%   0%   0%   0%   1%   0%
 378 MHz      0%   0%   0%   0%   1%   0%
 380 MHz      0%   0%   0%   0%   1%   0%
 382 MHz      0%   0%   0%   0%   9%   0%
 384 MHz      0%   0%   0%   0%  86%   0%
 386 MHz      0%   0%   0%   0%  99%   0%
 388 MHz      0%   0%   0%   0% 100%   0%
 390 MHz      0%   0%   0%   0% 100%   1%
Done

rogloh · 2022-07-23 14:45

@evanh said:
Crossing page boundaries is crappy with these chips. I'm getting errors as low as 70 MHz SPI clock when doing larger than 1024 byte bursts. Smaller length page crossings can actually achieve 133 MHz, just. Whereas staying within a page works right up to Prop2's limits, ~200 MHz SPI clock when chilled. Definitely a good rule to split them up.

Of course. That is why my driver always follows this rule and splits it up. Break the rules and it's anyone's guess as to the result. PSRAM seems solid as anything when you keep within the 1kB page size (or 2kB for 8bit, 4kB for 16bit).
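The splitting rule can be sketched on the host side in Python. This is only an illustration, not code from either driver: `split_burst` and its parameters are hypothetical names, and the page size is assumed to follow the 1 kB / 2 kB / 4 kB figures quoted above for 4-bit, 8-bit and 16-bit buses.

```python
def split_burst(addr, length, page_size=1024):
    """Split a burst into pieces that never cross a page boundary.

    Returns a list of (start_address, byte_count) tuples.
    page_size is an assumption: 1024 for a 4-bit bus,
    2048 for 8-bit, 4096 for 16-bit.
    """
    pieces = []
    while length > 0:
        room = page_size - (addr % page_size)   # bytes left in the current page
        chunk = min(room, length)
        pieces.append((addr, chunk))
        addr += chunk
        length -= chunk
    return pieces


if __name__ == "__main__":
    # A 3000-byte burst that starts 24 bytes before a page boundary
    for start, count in split_burst(1000, 3000):
        print(f"addr={start:6d}  bytes={count}")
```

Each tuple would become one CE#-framed transaction, so no single transfer runs past the end of a page.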

Yanomani · 2022-07-23 14:46

@evanh said:
It'll be transmission line effects. The paths are much longer out to the add-on boards. I had previously thought attenuation was the biggest issue but maybe bus terminations might do wonders.

Which P2_IO pin numbers are being used to drive the PSRam's DSIO[3:0]???

P.S. I understand the CLK and CE#, but the DATA_PINS numbering system is messing with my brain contents...

DATA_PINS = 232 CE_PIN = 57 CLK_PIN = 56

rogloh · 2022-07-23 14:50

DATA_PINS = 232
232 = 192 + 40 which is a pin group of 4 based at P40.
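The same decoding can be written out as a small Python sketch. The helper names are hypothetical; the layout assumed is the P2 pin-field encoding that the ADDPINS operator produces, with the base pin in bits 5..0 and the count of additional pins in bits 10..6.

```python
def decode_pin_field(value):
    """Split a P2 pin field into (base_pin, pin_count).

    Bits 5..0 hold the base pin; bits 10..6 hold the ADDPINS value,
    i.e. the number of extra pins above the base.
    """
    base = value & 0x3F
    extra = (value >> 6) & 0x1F
    return base, extra + 1


def encode_pin_field(base, pin_count):
    """Inverse of decode_pin_field: same as `base ADDPINS (pin_count - 1)`."""
    return (base & 0x3F) | ((pin_count - 1) << 6)


if __name__ == "__main__":
    for field in (232, 200):
        base, count = decode_pin_field(field)
        print(f"DATA_PINS = {field}: {count} pins, P{base}..P{base + count - 1}")
```

For the two logs in this thread, 232 decodes to four pins based at P40 and 200 decodes to four pins based at P8.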

evanh · 2022-07-23 14:53

That's a good idea - Split off the ADDPINS component ...

Yanomani · 2022-07-23 14:59

@rogloh said:
DATA_PINS = 232
232 = 192 + 40 which is a pin group of 4 based at P40.

@evanh said:
That's a good idea - Split off the ADDPINS component ...

Thanks guys! I was at the nearmost strabismus-contraption-point of my ever-suffering eyes here...

Yanomani · 2022-07-23 15:22

@rogloh said:

@evanh said:
Crossing page boundaries is crappy with these chips. I'm getting errors as low as 70 MHz SPI clock when doing larger than 1024 byte bursts. Smaller length page crossings can actually achieve 133 MHz, just. Whereas staying within a page works right up to Prop2's limits, ~200 MHz SPI clock when chilled. Definitely a good rule to split them up.

Of course. That is why my driver always follows this rule and splits it up. Break the rules and it's anyone's guess as to the result. PSRAM seems solid as anything when you keep within the 1kB page size (or 2kB for 8bit, 4kB for 16bit).

You and @evanh can also try splitting the CLK and DATA transactions in two, while keeping PSRam CE# = "Low" and providing a rest period of CLK = "Low" in the interval between them; this would give the PSRam more time to "prepare" the next data buffer, ready to continue the operation at the next row. The interval would also leave enough time for another command to be sent to the Streamer, without having to pre-buffer it.

Getting the interval timing right (as a count of sysclock cycles) is of prime importance, so that the new series of Smart Pin-generated CLK pulses stays in phase with the Streamer I/O data transfers.

There would be no hiccup, and you'd avoid the burden of ending the current CE#-framed transaction just to start a new one soon afterwards. It would be a kind of "command chaining", but without needing to re-send the same command or provide a new (and unneeded) addressing phase.

evanh · 2022-07-24 02:32

Okay, the 4-bit code is all sorted. I'm really happy - Managed to eliminate all uncalculated (hand-coded) timing values!

EDIT: I guess I should add a boilerplate ... done
EDIT2: Comments improved
EDIT3: Rename of udec() to udeci() to avoid name clash in Pnut
EDIT4: This one has a bug, the newer release supersedes it - https://forums.parallax.com/discussion/comment/1541443/#Comment_1541443

Yanomani · 2022-07-24 07:53

As for the tests of nibble-oriented devices with 1kB-long rows, maybe I can add some "little bits" (sic) of my own.

The attached file contains 2 kB of non-random, nibble-oriented data, composed of 64 different 32-byte-wide "mini-rows" of data patterns, intended to produce a deterministic set of frequencies at the PSRam's SIO[3:0].

Each "mini-row" is identified by a one-byte Gray code which represents its number in the sequence, spanning from 'h00 thru 'h28, and located at the 18th byte position of each 32-byte-long sequence.

The following excerpt (four mini-rows) shows an example of how they are individually coded:

"A5 5A A5 5A 01 80 FF 08 10 FF 01 80 5A A5 00 00 00 00 5A A5 FE 7F 00 FE 7F 00 FE 7F 5A A5 5A A5

A5 5A A5 5A 01 80 FF 08 10 FF 01 80 5A A5 00 00 00 01 5A A5 FE 7F 00 FE 7F 00 FE 7F 5A A5 5A A5

A5 5A A5 5A 01 80 FF 08 10 FF 01 80 5A A5 00 00 00 03 5A A5 FE 7F 00 FE 7F 00 FE 7F 5A A5 5A A5

A5 5A A5 5A 01 80 FF 08 10 FF 01 80 5A A5 00 00 00 02 5A A5 FE 7F 00 FE 7F 00 FE 7F 5A A5 5A A5"

Hope it helps a bit
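A pattern of this shape could be regenerated with a few lines of Python. This is a sketch under assumptions, not the attached file: it assumes a standard reflected binary Gray code and that only the 18th byte changes from one mini-row to the next, which holds for the four rows shown but is a guess for the rest. The names `gray`, `mini_row` and `build_pattern` are hypothetical.

```python
# The 32-byte template is the first mini-row from the excerpt above.
BASE_ROW = bytes.fromhex(
    "A5 5A A5 5A 01 80 FF 08 10 FF 01 80 5A A5 00 00 "
    "00 00 5A A5 FE 7F 00 FE 7F 00 FE 7F 5A A5 5A A5"
)


def gray(n):
    """Standard reflected binary Gray code of n (assumed encoding)."""
    return n ^ (n >> 1)


def mini_row(index):
    """Return one 32-byte mini-row with its Gray-coded ID in the 18th byte."""
    row = bytearray(BASE_ROW)
    row[17] = gray(index)          # 18th byte, counting from 1
    return bytes(row)


def build_pattern(rows=64):
    """Concatenate the mini-rows: 64 x 32 bytes = 2048 bytes."""
    return b"".join(mini_row(i) for i in range(rows))


if __name__ == "__main__":
    print([f"{gray(i):02X}" for i in range(4)])
    print(len(build_pattern()), "bytes")
```

Under this assumed encoding the first four IDs match the excerpt (00, 01, 03, 02) and the 64th mini-row gets ID $20, so the 'h28 end value mentioned in the post may follow a different numbering.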

evanh · 2022-07-24 13:17

Damn, my newly fleshed out stdlib.spin2 has a namespace conflict with udec() in Pnut ... and I've used it lots too.

EDIT: Now updated above.

evanh · 2022-07-26 09:13

Aargh! Somehow I've made the timing different between Flexspin and Pnut. I was comparing them too, must have been a late change. Pnut is acting like it has one less instruction in the critical section ... EDIT: Bah! It's Flexspin trying to optimise by using an immediate operand when it determines there is a constant value. Which can be either # or ## depending on how large that constant is!

So my workaround for that issue didn't work either. I guess I hadn't checked it carefully enough. Time for a follow-up Flexspin bug report ...

EDIT: This is triggering a rethink. I might try introducing a block of parameters in hubRAM that can be fast copied into register space ...

EDIT2: Ha! Not bad at all. I can happily tack it on the end of the inline code. eg:

PUB  tx_cmd( cmd )

    org
                setxfrq xfrq                ' set sysclock/1 for lead in timing
                rolnib  cmd, cmd, #1        ' big-endian nibble swap

                xinit   leadin, #0          ' lead-in timing, at sysclock/1
                setq    nco                 ' streamer transfer rate
                xcont   ca8, cmd            ' tx Command only
                waitx   #2                  ' hacked in for quick test
                drvl    datp                ' active for tx CA phase
                drvl    #PSRAM_CE_PIN
                dirh    #PSRAM_CLK_PIN      ' start smartpin internally cycling at SPI clock rate
                wypin   #2, #PSRAM_CLK_PIN  ' 2 SPI clocks for Command only

                waitx   #2 * CLK_DIV - 4
                dirl    #PSRAM_CLK_PIN      ' reset smartpin
                dirl    datp                ' tristate the databus upon completion
        _ret_   drvh    #PSRAM_CE_PIN

xfrq            long    $8000_0000
leadin          long    M_LEADIN
nco             long    M_NCO
ca8             long    M_CA8
datp            long    PSRAM_DATA_PIN
    end

evanh · 2022-07-26 11:03

Nice! It's solved the new problem completely. And it behaves the same in Pnut again now.

EDIT: Err, not completely. There is still a variant where, when constants are passed as function parameters, the same issue can occur. The problem here is that said function parameters need to be able to modify that inline data block. I don't know if that's possible. Certainly no direct symbol access in Flexspin.

rogloh · 2022-07-26 11:28

Yeah that is handy. I think returning like that only works in the org/end not asm/endasm blocks, right?

Wuerfel_21 · 2022-07-26 11:48

Yes. ORG/END is an interpreter-style loaded ASM block, whereas ASM/ENDASM just dumps your assembly into the compiler IR (so it can participate in inlining, constant propagation, CORDIC reordering, etc.)

evanh · 2022-07-26 14:32

This is rearing its head mostly now because I'm trying to move to doing the 16-bit wide bus. And what's changed is the defined constant for the 16 data pins. It now has an ADDPINS 15 so it exceeds the 9-bit limit of a single # immediate. This has had catastrophic repercussions on timing when compiling with Flexspin.

And there is another issue, actually even worse, where a local register variable will be used if the passed in constant is non-zero. But if it's zero then the optimiser will generate an immediate operand with ## instead. This one occurs when selecting individual chips in SPI interface mode.
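The size limit is easy to check numerically. The sketch below is illustrative only: the helper names and the base pins are hypothetical, and it assumes the ADDPINS encoding (extra pin count in bits 10..6) together with the 9-bit range of a plain `#` immediate. A larger constant needs `##`, which the assembler emits as an extra AUGS/AUGD prefix instruction, and that extra instruction is what shifts the timing.

```python
def pin_field(base, pin_count):
    """Pin-group constant as produced by `base ADDPINS (pin_count - 1)`."""
    return base | ((pin_count - 1) << 6)


def needs_augmented_immediate(value):
    """True when a constant does not fit a 9-bit # immediate and needs ##."""
    return not 0 <= value <= 0x1FF


if __name__ == "__main__":
    # Base pins here are only examples, not taken from the 16-bit test setup.
    for base, count in ((40, 4), (40, 8), (40, 16)):
        value = pin_field(base, count)
        kind = "##" if needs_augmented_immediate(value) else "#"
        print(f"{count:2d} pins at P{base}: {value:4d} -> {kind}")
```

Any group of nine or more pins sets bit 9 of the pin field, so a 16-pin data bus constant always lands in `##` territory regardless of its base pin.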

evanh · 2022-07-26 14:40

There's something else going on too. And I better make a backup of this one. If I shift the position of certain data in the DAT section the program crashes or does stupid things. It's a little like a buffer overrun but I'm pretty certain that isn't the reason. Just swapping two items ahead of the buffers can still blow it up. And moving a non-sensitive, but important, item to after the buffers is fine.

PS: I suspect this is the elusive change-one-innocuous-thing-and-it-goes-bats. The one that will vanish without a trace again.

evanh · 2022-07-26 19:03

Okay, a newer more generic release that handles one, two and four chips in parallel. Only have to adjust the pin constants to suit.

I'll work on the comments.

evanh · 2022-07-27 03:38

Roger,
Thanks for putting me onto using lutRAM for mapping the CA phase of QPI interface onto multiple chips. It was a learning experience just in treating SPI and QPI interface modes so differently. It didn't take long to decide to access one chip at a time when in SPI mode. SPI is always only going to be for config here anyway.

rogloh · 2022-07-27 03:44

@evanh said:
Roger,
Thanks for putting me onto using lutRAM for mapping the CA phase of QPI interface onto multiple chips. It was a learning experience just in treating SPI and QPI interface modes so differently. It didn't take long to decide to access one chip at a time when in SPI mode. SPI is always only going to be for config here anyway.

No problem. In my driver suite I've coded up 3 different PSRAM drivers so far plus some other special variants for Ada so I guess I should probably know a little bit about getting them working on the P2 by now. You'll have fun with the RMW aspects during write bursts

evanh · 2022-07-27 04:03

@rogloh said:
You'll have fun with the RMW aspects during write bursts

Heh, not my problem. I don't intend to make a finished product. It's mostly about demonstrating the streamer/smartpin aligning in software, instruction counting, and doing the timing calculations. It's a lot more important for perfection at sysclock/2 than at sysclock/8 with the SD cards.

EDIT: That might be the next direction to go in. Use this knowledge to implement the 4-bit SD protocol at sysclock/2. Operate 50 MHz SD cards with just 100 MHz sysclock.

rogloh · 2022-07-27 04:07

Yeah fair enough. A Spin2-based PSRAM driver is good for education and for some single-COG use, but once you need a couple of COGs sharing the memory, or add real-time constraints, it starts to be a bit of a limiting factor and you need an arbiter COG that can fragment and control access to the shared memory, e.g. for video use.

evanh · 2022-07-27 04:18

I tell you what. Having both QPI and SPI modes done with the streamer, and the equivalent timing calculations, provided a huge amount of troubleshooting. It was amazing how many times I got disparate outcomes between SPI and QPI. Which meant I could compare and come up with ideas for why and what was wrong. Very rarely did I revert back to older code to dig out of a hole.

rogloh · 2022-07-27 04:34

Yeah it's nice to have a good working reference to compare against. You are not starting out from scratch.

evanh · 2022-07-27 04:36

Oh, I pillaged the nibble-swap code from your driver. I couldn't get my head around using those split/merge instructions in combo like that. It looks like something Ariba came up with.

EDIT: Although I did once use them for the EPROM's DPI mode ... ah, it was merge-only:

read_byte4
                waitse1                     ' wait for smartpin (spi_do) buffer-full event

                rdpin   pa, #spi_do         ' 16-bit shift-in as little-endian (odd bits)
                rdpin   pb, #spi_di         ' (even bits)
                rev     pa                  ' but SPI data is stored as big-endian (odd bits)
                rev     pb                  ' (even bits)
                rolword pa, pb, #0          ' combine to a single 32-bit word
        _ret_   mergew  pa                  ' untangle the odd-even pattern

rogloh · 2022-07-27 05:07

You'll need to use the splitb, rev, movbyts, mergeb combination when the "a" bit is not available in the streamer command - i.e. for hub burst transfers in the 8 and 16 bit bus modes that can't use pure immediate nibble mode. This is required for correct address endianness.
For 4 bit mode, you can use the simpler movbyts command to swap bytes and have the "a" bit do the nibble reversals for you.

I find coming up with these bit-twiddling sequences difficult when you need several in a row to achieve the final result. For some reason, what you need to do just doesn't come to mind. In fact in the past I've even resorted to writing some code that brute-force exercises a bunch of these commands in different sequences until it comes up with the result. E.g. trying out different combinations of split, rev, movbyts, rol, merge in different orders etc. until it eventually stumbles onto what you want, lol. They are highly versatile instructions.

evanh · 2022-07-27 05:13

I've used the same for all now, i.e. the nibble swapper below. The table in lutRAM is built according to the number of parallel chips and, in the case of a single chip, which half of the 8-bit pin group it is on.

                splitb  paddr               ' big-endian nibble swap: gather bit n of every nibble into byte n
                rev     paddr               ' reverse all 32 bits
                movbyts paddr, #$1b         ' reverse the byte order
                mergeb  paddr               ' re-interleave the bytes: nibble order is now reversed
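To see what the four-instruction sequence does as a whole, it can be modelled bit by bit in Python. The four helper functions are a reading of the SPLITB, REV, MOVBYTS and MERGEB descriptions in the P2 instruction documentation, so treat them as assumptions rather than a verified emulator.

```python
def splitb(d):
    """SPLITB model: bit k of nibble j moves to bit j of byte k."""
    r = 0
    for k in range(4):
        for j in range(8):
            r |= ((d >> (4 * j + k)) & 1) << (8 * k + j)
    return r


def mergeb(d):
    """MERGEB model: inverse of splitb."""
    r = 0
    for k in range(4):
        for j in range(8):
            r |= ((d >> (8 * k + j)) & 1) << (4 * j + k)
    return r


def rev(d):
    """REV model: reverse all 32 bits."""
    return int(f"{d & 0xFFFFFFFF:032b}"[::-1], 2)


def movbyts(d, s):
    """MOVBYTS model: result byte i is source byte S[2i+1:2i]."""
    src = [(d >> (8 * i)) & 0xFF for i in range(4)]
    return sum(src[(s >> (2 * i)) & 3] << (8 * i) for i in range(4))


def nibble_swap(paddr):
    """splitb / rev / movbyts #$1b / mergeb, in that order."""
    return mergeb(movbyts(rev(splitb(paddr)), 0x1B))


if __name__ == "__main__":
    print(f"{nibble_swap(0x12345678):08X}")
```

In this model the sequence reverses the order of the eight nibbles in the long while leaving the bits inside each nibble untouched, which is what sending the address most-significant nibble first requires.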

Here's the table builder (Still needs comments added)

    ' Preset spare lutRAM with the streamer's decoding LUT for parallel issue Command-Address phase
    ' Complete table is:  $0000,$1111,$2222,$3333,$4444,$5555,$6666,$7777,$8888,$9999,$aaaa,$bbbb,$cccc,$dddd,$eeee,$ffff
    if CHIPS == 1 and PSRAM_DATA_PIN & 4
        org
                mov     idx, #0
                rep     @.rend, #16
                mov     pb, idx
                shl     pb, #4
                wrlut   pb, idx
                add     idx, #1
.rend
        end
    else
        idx := CHIPS
        org
                cmp     idx, #2   wcz
                mov     idx, #0
                rep     @.rend, #16
                mov     pb, idx
        if_ae   setnib  pb, pb, #1
        if_a    setbyte pb, pb, #1
                wrlut   pb, idx
                add     idx, #1
.rend
        end

rogloh · 2022-07-27 05:19

Looks unwieldy, why not just use mul to compute the values? Multiply idx by $1111 as you increment idx from 0-$F and write the word or byte to LUT.

In fact I don't think it hurts to replicate the pattern to all 4 nibbles in all cases, as the streamer commands will select the appropriate number of pins that receive the data.
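The two approaches can be compared with a quick Python model. `evanh_entry` is a hypothetical host-side rendering of the table builder above (SHL for a single chip on the upper nibble, otherwise SETNIB and SETBYTE gated on the chip count), and `mul_entry` is the multiply suggestion; the masks are assumptions about which pins each configuration drives.

```python
def evanh_entry(idx, chips, high_nibble=False):
    """Model of the table builder's value for one LUT index (0..15)."""
    if chips == 1 and high_nibble:
        return idx << 4                                  # shl pb, #4
    value = idx
    if chips >= 2:
        value |= idx << 4                                # if_ae setnib pb, pb, #1
    if chips > 2:
        value = (value & 0xFF) | ((value & 0xFF) << 8)   # if_a setbyte pb, pb, #1
    return value


def mul_entry(idx, mask=0xFFFF):
    """Multiply suggestion: replicate the nibble to all four positions."""
    return (idx * 0x1111) & mask


if __name__ == "__main__":
    print(", ".join(f"${mul_entry(i):04x}" for i in range(16)))
```

Masking the multiplied value to the driven pins reproduces every variant of the table, which is consistent with the point that replicating the pattern to all four nibbles is harmless when the streamer command only outputs to the selected pins.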