Fastest way to get INA Nibbles shifted to 32bit?

tonyp12 · 2011-05-04 14:49

Some SPI memory devices are now starting to show up with 100 MHz Quad-SPI.

I was happy about getting 4 times the bandwidth, but then I thought that
shifting 4 bits in from INA is not as simple as reading one pin and using RCL (or maybe muxc).

So some of the gain could be lost to extra software overhead.
Examples should assume that we are using a counter to generate the clock.

The code only needs to shift the 4 pins in as fast as possible, but at a fixed, synchronous interval
so that it does not get out of step with CLK.

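movs buffer,ina         ' copy INA[8..0] into bits 8..0 of buffer
ror buffer,#1 wc        ' rotate bit 0 of buffer out into C
rcl buffer2,#1          ' shift C into the bottom of buffer2
ror buffer,#1 wc        ' next pin bit into C
rcl buffer2,#1          ' and into buffer2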
...
...
and so on, using unrolled loops.
But the above code is not faster than reading 1 bit at a time from regular SPI.
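
To see why the ROR/RCL approach gains nothing, the unrolled step can be modelled off-chip. The following is a minimal Python sketch of the instruction semantics only (MOVS copies the low 9 source bits into bits 8..0, ROR with WC puts the original bit 0 into C, RCL shifts C in at bit 0). The helper names read_nibble and read_long are hypothetical, and the comparison assumes a plain 1-bit read costs one TEST/RCL pair per bit.

```python
MASK32 = 0xFFFFFFFF

def movs(dest, src):
    """MOVS: copy the low 9 bits of src into bits 8..0 of dest."""
    return (dest & ~0x1FF & MASK32) | (src & 0x1FF)

def ror1_wc(value):
    """ROR value,#1 WC: rotate right by one; C receives the original bit 0."""
    carry = value & 1
    return ((value >> 1) | (carry << 31)) & MASK32, carry

def rcl1(value, carry):
    """RCL value,#1: shift left by one, filling bit 0 from C."""
    return ((value << 1) | carry) & MASK32

def read_nibble(buffer2, ina):
    """One nibble as in the post: 1 MOVS, then 4 ROR/RCL pairs (hypothetical helper)."""
    buffer = movs(0, ina)        # the model starts from a cleared buffer
    instructions = 1
    for _ in range(4):
        buffer, c = ror1_wc(buffer)
        buffer2 = rcl1(buffer2, c)
        instructions += 2
    return buffer2, instructions

def read_long(samples):
    """Eight INA samples -> one 32-bit long plus the instruction count."""
    buffer2, total = 0, 0
    for ina in samples:
        buffer2, n = read_nibble(buffer2, ina)
        total += n
    return buffer2, total

value, count = read_long([0x1] * 8)
one_bit_spi = 32 * 2             # assumed TEST + RCL per bit
print(hex(value), count, one_bit_spi)
```

In this model a long costs 72 instructions against 64 for the assumed 1-bit read, which agrees with the observation above. The model also shows that P0 is shifted in first, so each nibble arrives bit-reversed in buffer2 unless the pins are wired in the opposite order.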

I could see running 4 cogs synchronously,
each treating it as 1-bit SPI but on a different one of the quad pins.

If it's used for VGA color data, you are allowed to use some tricks that
take advantage of the fact that only 24 bits are used (H & V are always 0), so it should only need 6 nibbles.

Phil Pilgrim (PhiPi) · 2011-05-04 15:03

This is how I do it with a 10 MHz clock:

              movi      line_buf+0,ina
              shr       line_buf+0,#4
              movi      line_buf+0,ina
              shr       line_buf+0,#4
              movi      line_buf+0,ina
              shr       line_buf+0,#4
              movi      line_buf+0,ina
              shr       line_buf+0,#11
              
              movi      line_buf+1,ina
              shr       line_buf+1,#4
              movi      line_buf+1,ina
              shr       line_buf+1,#4
              movi      line_buf+1,ina
              shr       line_buf+1,#4
              movi      line_buf+1,ina
              shl       line_buf+1,#5
              ...

Later in the program, the lower 16 bits of the even longs have to be masked and OR'd with the masked upper 16 bits of the odd longs before transfer to the hub. Or you could just collect 16 lower bits, as in the first block above and transfer the data as words.
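
The MOVI/SHR trick and the later mask-and-OR step can be checked with a small model. This is a Python sketch of the instruction semantics only, assuming the nibble is on P0..P3 and that MOVI copies INA[8..0] into bits 31..23 of the destination; capture_even, capture_odd and combine are hypothetical names, and the sample values are made up so that P4..P8 carry junk.

```python
MASK32 = 0xFFFFFFFF

def movi(dest, src):
    """MOVI: copy the low 9 bits of src into bits 31..23 of dest."""
    return (dest & 0x007FFFFF) | ((src & 0x1FF) << 23)

def capture_even(samples, start=0):
    """First block: MOVI/SHR #4 three times, then MOVI/SHR #11.
    The four nibbles end up in bits 15..0, first sample lowest;
    INA[8..4] of the last sample is left as junk in bits 20..16."""
    buf = start
    for i, ina in enumerate(samples):
        buf = movi(buf, ina)
        buf >>= 4 if i < 3 else 11
    return buf

def capture_odd(samples, start=0):
    """Second block: same, but the final SHL #5 moves the nibbles to bits 31..16.
    Whatever was in the register beforehand survives as junk in bits 15..5."""
    buf = start
    for i, ina in enumerate(samples):
        buf = movi(buf, ina)
        if i < 3:
            buf >>= 4
        else:
            buf = (buf << 5) & MASK32
    return buf

def combine(even, odd):
    """Mask the low 16 bits of the even long and OR in the high 16 of the odd long."""
    return (even & 0x0000FFFF) | (odd & 0xFFFF0000)

# Nibbles 1..8 on P0..P3, with P4..P8 all high as junk.
even = capture_even([0x1F1, 0x1F2, 0x1F3, 0x1F4], start=0xFFFFFFFF)
odd = capture_odd([0x1F5, 0x1F6, 0x1F7, 0x1F8], start=0xFFFFFFFF)
print(hex(even), hex(odd), hex(combine(even, odd)))
```

Each nibble costs two instructions here. With the Propeller's 4 clocks per instruction and an assumed 80 MHz system clock, that works out to one nibble every 100 ns, which is consistent with the 10 MHz clock mentioned above. The model also shows why both halves need masking: the even long carries junk above bit 15 and the odd long carries junk below bit 16.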

-Phil

jazzed · 2011-05-04 15:24

Which device are you using? I have a driver for dual (2x) W25Q parts, and Rayman has something for nibbles on the ST parts. Some of the logic for writing nibbles from INA to HUB has already been done. See this thread: http://forums.parallax.com/showthread.php?130648-Sweet-spot-INA-to-HUB-Byte-Transfers-(6MB-s-or-48Mbit-s-1-COG-96MHz)

We're all patiently waiting for SQI RAM

Rayman · 2011-05-04 21:51

I think there is some truth to your point. Unless the SQI chip is on P0..P3, I don't think you can get 4X the speed with one cog.
As you point out, it's easy to realize 4X the speed on any pins using 4 cogs. But, I think 4 cogs is just too much.
For FlashPoint, I'm currently working on drivers for 2 and 3 cog fast reads.

With 3 cogs, I think I can do a 256x240 "full color" bitmap (8-bit) display on TV.
With 2 cogs, I can now show a 480x272 (6-bit) image on my 4.3" touchscreens at 25 Hz (fast enough to make it flicker free) from either SQI flash or 4X SRAM chips.
