Now that LUT as RAM has settled down, I'm looking at LUT as LUT.
Let's take a real example, and start simple:
800x480 is a common modern LCD resolution, with a choice of pixel depths.
Using 4, 8 or 16b pixels and a 32b lookup, the RAM bytes needed are:
4bp : 800*480*4/8 = 192000
8bp : 800*480*8/8 = 384000
16bp: 800*480*16/8 = 768000 (Outside current HUB budget)
From what I can see in the DOCs, 4b & 8b LUT indexing is possible, and 16b can feed direct (skipping the LUT).
If you want fewer than 32 digital bits out, the DOCs say:
1000 = enable output on pins 31..0 32 pins
1001 = enable output on pins 39..8 32 pins
1010 = enable output on pins 47..16 32 pins
1011 = enable output on pins 55..24 32 pins
1100 = enable output on pins 63..32 32 pins
1101 = enable output on pins 63..40 24 pins, 8 MSBs not output
1110 = enable output on pins 63..48 16 pins, 16 MSBs not output
1111 = enable output on pins 63..56 8 pins, 24 MSBs not output
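To make the pin question easier to check, here is a rough Python sketch of the table above. The function names are mine, and the formula (low pin = low 3 mode bits * 8, width clipped at pin 63) is only inferred from the eight rows listed, not taken from the DOCs. The pin list passed to `overlaps` is a made-up placeholder, since I don't know the real boot pin numbers.

```python
def pin_window(mode):
    """Return (low_pin, high_pin, pin_count) for a 4-bit mode 0b1000..0b1111.

    Inferred from the table only: the window slides up 8 pins per step
    and is clipped at pin 63, so the top modes lose 8/16/24 MSBs.
    """
    if not 0b1000 <= mode <= 0b1111:
        raise ValueError("only modes 1000..1111 are listed in the table")
    low = (mode & 0b0111) * 8
    count = min(32, 64 - low)
    return low, low + count - 1, count


def overlaps(mode, pins):
    """List which of the given pins fall inside the mode's output window."""
    low, high, _ = pin_window(mode)
    return sorted(p for p in pins if low <= p <= high)


if __name__ == "__main__":
    for mode in range(0b1000, 0b10000):
        low, high, count = pin_window(mode)
        print(f"{mode:04b}: pins {high}..{low}, {count} pins")
    # Hypothetical pin numbers, purely to show the check.
    print(overlaps(0b1111, [10, 60]))
```

Swap in the real boot SPI / UART pin numbers once they are known, and the conflict question answers itself for each mode.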
Which pins have the boot SPI & UART? Is there a conflict in the 24, 16 and 8 pin modes?
8bp is perhaps the most natural choice, but even that is not great for colour blend backgrounds - you can see the steps instead of a smooth gradient.
It also wastes half the LUT, which otherwise just sits there....
4bp is quite spartan, even with a lookup table.
Besides those simple choices, there are a couple of others which the P2 resources might naturally support: 6 & 10 bit pixels?
4bp : 800*480*4/8 = 192000 (std)
6bp : 800*480*6/8 = 288000 (packed)
6bp : 800*480*(32/5)/8 = 307200 (unpacked, 5 pixels per u32)
8bp : 800*480*8/8 = 384000 (std)
10bp: 800*480*10/8 = 480000 (packed), spare: 2^19 - 480000 = 44288
10bp: 800*480*(32/3)/8 = 512000 (unpacked, 3 pixels per u32), spare: 2^19 - 512000 = 12288
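The numbers above can be reproduced with a few lines of Python. This is just a sketch of the same arithmetic: the function names are mine, HUB is taken as 2^19 bytes as in the spare figures, and the frame is treated as one continuous run of pixels (no per-line padding), which is what the formulas above assume.

```python
WIDTH, HEIGHT = 800, 480
HUB_BYTES = 2 ** 19  # 524288, as used in the spare calculations above


def packed_bytes(bpp):
    """Bytes for a tightly packed frame (pixels may straddle u32 boundaries)."""
    bits = WIDTH * HEIGHT * bpp
    return (bits + 7) // 8


def unpacked_bytes(bpp):
    """Bytes when each u32 holds only whole pixels and the leftover bits are unused."""
    per_word = 32 // bpp
    pixels = WIDTH * HEIGHT
    words = (pixels + per_word - 1) // per_word  # round up to whole u32s
    return words * 4


def spare(frame_bytes):
    """Bytes of HUB left over; negative means the frame does not fit."""
    return HUB_BYTES - frame_bytes


if __name__ == "__main__":
    for bpp in (4, 6, 8, 10, 16):
        p, u = packed_bytes(bpp), unpacked_bytes(bpp)
        print(f"{bpp:2d}bp packed {p:6d} (spare {spare(p):7d})  "
              f"unpacked {u:6d} (spare {spare(u):7d})")
```

For 4, 8 and 16bp the packed and unpacked figures are the same, because those depths divide 32 exactly.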
4,6,8 index the LUT easily.
10b would work well as a 1024 x 16 LUT, but the DOCs are unclear on whether there is a 1024 x 16 mode.
That gives 4 times as many colour choices as 8bp, and it more fully uses LUT and HUB memory.
Unpacked versions of 6,10 bp simply fill the u32 with whole pixels and discard the leftover bits (2 bits per u32 in both cases).
Packed versions use less RAM, but require a simple 6,10 bit collector for the leftover bits.
At 10 bp, packing makes a big difference, going from 12288 free bytes to 44288 free bytes.
Is it possible to add 6,10bp to the Streamer, to better match available / mainstream displays?
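To show what I mean by unpacked, packed and "collector", here is a software model in Python. It is only a sketch: the function names are mine, and the bit order (first pixel in the low bits of each u32) is an assumption, not something from the DOCs. A real Streamer would do the collecting in hardware.

```python
def pack_words(pixels, bpp):
    """Unpacked layout: whole pixels only in each u32, leftover bits left at 0."""
    per_word = 32 // bpp
    mask = (1 << bpp) - 1
    words = []
    for i in range(0, len(pixels), per_word):
        word = 0
        for slot, p in enumerate(pixels[i:i + per_word]):
            word |= (p & mask) << (slot * bpp)
        words.append(word)
    return words


def pack_stream(pixels, bpp):
    """Packed layout: one continuous bitstream, pixels may straddle u32s."""
    mask = (1 << bpp) - 1
    acc, nbits, words = 0, 0, []
    for p in pixels:
        acc |= (p & mask) << nbits
        nbits += bpp
        while nbits >= 32:
            words.append(acc & 0xFFFFFFFF)
            acc >>= 32
            nbits -= 32
    if nbits:
        words.append(acc)  # final partial word, upper bits are zero
    return words


def unpack_stream(words, bpp, count):
    """The 'collector': carry leftover bits from one u32 into the next."""
    mask = (1 << bpp) - 1
    acc, nbits, out = 0, 0, []
    for w in words:
        acc |= w << nbits
        nbits += 32
        while nbits >= bpp and len(out) < count:
            out.append(acc & mask)
            acc >>= bpp
            nbits -= bpp
    return out


if __name__ == "__main__":
    pixels = list(range(16))
    for bpp in (6, 10):
        packed = pack_stream(pixels, bpp)
        print(bpp, len(pack_words(pixels, bpp)), len(packed),
              unpack_stream(packed, bpp, len(pixels)) == pixels)
```

The collector is just a shift register with a bit count, which is why I call it simple: the state carried between u32s is never more than bpp-1 bits.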
Comments
I encountered this "conflict" of the upper 4 bits in an early experiment.
I was sure I posted something about this but a search fails to find any matches.
Anyhow, I suggested maybe the 24,16 & 8 bit modes be relocated to the top of PortA.
Maybe this has changed now that Chip has added 4,2 & 1 bit modes in the next release.
That much is in the data, but they slide-up, then compress.....
One SDRAM solution could need 4b + 16b + 16b Streamers, and it is not clear how that can map to pins either?
Hopefully the data is just delayed relative to the design ....
For streaming: 32-/16-/8-pin groups can start at any 8-pin offset (0,8,16,24...), while nibbles can start at any 4-pin offset, twits can start at any 2-pin offset, and single pins can be anywhere.
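As a quick sanity check of that rule, here it is as a small Python sketch. The function names are mine, the 0..63 pin range is assumed from the earlier table, and this only models the alignment rule as stated here, nothing else about the Streamer.

```python
def start_step(width):
    """Pin-offset granularity for a group of the given width.

    32/16/8-pin groups step by 8, nibbles by 4, 2-pin groups by 2,
    single pins by 1, per the rule described above.
    """
    if width not in (1, 2, 4, 8, 16, 32):
        raise ValueError("width must be 1, 2, 4, 8, 16 or 32 pins")
    return min(width, 8)


def valid_start(width, first_pin):
    """True if a group of `width` pins may start at `first_pin`."""
    return 0 <= first_pin <= 63 and first_pin % start_step(width) == 0


if __name__ == "__main__":
    for width, pin in [(32, 24), (16, 4), (4, 12), (2, 3), (1, 37)]:
        print(width, pin, valid_start(width, pin))
```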
And no bits could be called zits.
Back in Oz Wednesday, so raring to go.
4b + 16b + 16b Streamers,
which looks like one way to get decent SDRAM speed.
rogloh has worked out a compact frame sequence using P1V for R/W and refresh, so it looks like SRAM.
I think that could 'play-back' in a streamer ?