Looking for methods for fast SDI or SQI
ctwardell
Posts: 1,716
I'm looking for methods for fast dual (SDI) or quad (SQI) writes.
I'm aware of the method for using Counters A & B to get 20 MHz SPI, but I don't see a way to expand that to SDI or SQI.
I see that Ahle2 has discussed using the VDG to do 20 MHz SPI, and it looks like maybe that could be expanded to SDI or SQI, but I don't know enough about the VDG to give it a go.
Any ideas?
I'm looking for a way to blast out a command byte and 2 address bytes at high speed to either an SDI or SQI RAM for the COSMACog project.
I'm planning on using a single 23LC512 device at this point, but I'm willing to change to one of the dual SQI solutions a few forum members have mentioned if needed.
Thanks,
C.W.
Comments
There are several examples of SPI/SQI in the propeller-gcc cache drivers.
1 color VGA mode using regular PLL single-ended, instead of PLL internal (video mode)
Color is shifted out on falling edge of PLL.
If that clock edge is not good, use PLL differential mode.
Like to get 20 MB/s (the max. the chips allow) working...
Just started working on it though...
Here is another example (single bit only with clock, effectively what tonyp12 was proposing) I prepared recently for someone:
First of all, it requires 2 bit mode; That's the whole point, to send clock and data bits simultaneously.
Secondly, it requires a pre calculated LUT at memory position 0 to 255.
Thirdly, it is not limited to 20 MHz operation, it can go as high as the video generator allows.
Lastly, and maybe MOST importantly, sending a single byte doesn't "clog the CPU" more than a single waitvid instruction does.
It's almost orthogonal. And you are free to do purely parallel work while the video generator does all the work.
The sad part is that it only works for sending data... ... that's a bummer!!
(btw, did I say that I had this working some years ago, so I know it works seamlessly)
OK, so you need a lookup table and are limited to max fPLL/2. Counter mode 2 (as suggested) gives you simultaneous clock (pin A) and data (waitvid) at max fPLL without a LUT which sounds more agreeable to me.
In my world, without the knowledge I have now, my solution was great. My technique was a clever solution to a problem that didn't exist. Now I just feel ignorantly stupid.
I'm sorry!
C.W.
The basic premise is:
256 Long LUT with a fixed pixel pattern to do SQI output
The frame clock would be set to just allow 4 pixels of 4 color VGA per frame.
For a 20 MHz SQI device the pixel clock would be 40 MHz.
The LUT would encode nibbles of data with clock:
a low clock low nibble pixel would be 0 0 0 0 bit3 bit2 bit1 bit0
a high clock low nibble pixel would be 0 0 0 1 bit3 bit2 bit1 bit0
a low clock high nibble pixel would be 0 0 0 0 bit7 bit6 bit5 bit4
a high clock high nibble pixel would be 0 0 0 1 bit7 bit6 bit5 bit4
so the LUT would be
0x00 - 00010000 00000000 00010000 00000000
0x01 - 00010000 00000000 00010001 00000001
...
0x10 - 00010001 00000001 00010000 00000000
0x11 - 00010001 00000001 00010001 00000001
...
0xFE - 00011111 00001111 00011110 00001110
0xFF - 00011111 00001111 00011111 00001111
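A quick way to sanity-check the table above is to generate it on a PC. This is only a sketch in Python, not Propeller code: the names `lut_entry`, `as_table_row` and `emitted_pin_states` are made up for illustration, and the output order assumes the video generator shifts pixel 0 (the two LSBs of the pixel long) out first with a %%3210 frame.

```python
CLOCK_BIT = 0x10  # bit 4 of each colour byte is the clock line in the layout above


def lut_entry(value):
    """Build the 32-bit colour long for one data byte, laid out as in the table."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    low = value & 0x0F
    high = value >> 4
    color0 = low                # clock low,  low nibble
    color1 = CLOCK_BIT | low    # clock high, low nibble
    color2 = high               # clock low,  high nibble
    color3 = CLOCK_BIT | high   # clock high, high nibble
    return (color3 << 24) | (color2 << 16) | (color1 << 8) | color0


LUT = [lut_entry(v) for v in range(256)]


def as_table_row(value):
    """Format one entry as four binary bytes, most significant byte first."""
    entry = LUT[value]
    return " ".join(format((entry >> shift) & 0xFF, "08b") for shift in (24, 16, 8, 0))


def emitted_pin_states(value, pixels=0b11100100):
    """Pin bytes in output order for a 4-pixel frame (%%3210 by default).

    Assumption: pixel 0 (the two least significant bits) is shifted out first.
    """
    entry = LUT[value]
    states = []
    for i in range(4):
        color_index = (pixels >> (2 * i)) & 0b11
        states.append((entry >> (8 * color_index)) & 0xFF)
    return states


if __name__ == "__main__":
    for v in (0x00, 0x01, 0x10, 0x11, 0xFE, 0xFF):
        print(f"0x{v:02X} - {as_table_row(v)}")
    print([f"{s:05b}" for s in emitted_pin_states(0xA5)])
```

Under that shift-order assumption the low nibble goes out before the high nibble, so the nibble order is worth checking against what the RAM expects before committing the table to cog memory.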
the pixel frame would be:
xx xx xx xx xx xx xx xx xx xx xx xx 11 10 01 00
xx = don't care, because we only send a 4 pixel frame (a 4 color pixel long holds 16 two-bit pixels)
so a write would look like
waitvid 0-0, pixels
Where 0-0 is the byte to be sent.
Does this look feasible?
Thanks,
C.W.
Edit: I suppose you could do a 16 entry LUT, send frames of only 2 pixels, and send nibbles instead of bytes, but I think the overhead of feeding it might be too high.
Instead you can use the regular PLL that toggles a pin, an undocumented use for VGA.
And 2color (1bit) mode
I can see how I could maybe do nibble output using 2 color, one color representing a nibble with low clock and one color representing a nibble with high clock, but it seems that going with 4 color will allow a full byte per waitvid.
I think I might see what you are getting at, are you saying the clock output is driven directly from the PLL and data comes from the pixel data?
Keep in mind I want clock and either 2 or preferably 4 data lines to drive dual or quad SPI.
More rambling thoughts...
I see what Tony and Kuroneko are saying, the clock comes from the PLL and pixel can be set directly to the data value to send, however that only supports single bit + clock output, great for driving SPI and no LUT needed.
However, for Quad SPI (SQI) we need four data lines and clock, so writing a byte to the device only takes two clock (the SQI clock) cycles.
Since we need to drive multiple pins it seems we need to do that by manipulating the color data, which is where the LUT comes in as a means to properly set the data lines.
C.W.
With QuadSPI use 4 color mode, with PLL-single ended (A-pin) as your device reads data on rising edge (arrows in below pic).
and then use waitvid pixels, #%%3210
Normally the 4 color groups each use 8 bits, you will only use 4 groups of the lower 4bits and refer to them as pixel instead.
You will shift out 16 bits for each waitvid, not too bad.
C.W.
And I would also make bit5 of the VGA color connected to the CS pin, so you make each byte after last nibble the color of % 00011111
If CS line needs to be toggled before you can do the reset back to SPI, end with 00011111, 00001111, 00001111, 00011111 for color data
And with VGA, if no waitvid update the last byte will be repeated forever = good as CS will remain high.
I have a request: since the official documentation of the counter modules, video generator and PLL (phase issues) doesn't tell the whole story, it would be nice to have your "findings" (as well as what others have found) in a nicely written unofficial document.
Tricks to do stuff like that discussed in this thread could be included as well. I for one would find such a document very valuable. In my view the so-called "video generator" is much more than what the name suggests, and it's time to let the masses know that as well.
Does anyone know if start and stop behavior is documented anywhere for the video generator?
Some specific questions:
1) If VCFG is written, say directly after a WAITVID, with a VMODE of disabled, will the generator stop mid-frame or will the current frame finish?
2) Assuming that setting VMODE to disabled stops the generator mid-frame, will setting it back to VGA mode resume mid-frame where it left off?
I guess I've reached the point where I need a good logic analyzer...
Thanks,
C.W.
Update: The -3/-1 may seem odd. This is due to the pixels alternating between 0 and 1. With a solid line of 16 one pixels, any 4n delay between waitvid and disabling the video h/w steals a whole pixel (60 instead of 64 total) [A]. Anyway - while interesting - this seems like too much trouble to be in any way useful for this particular use case. Why do you think you need this?
[A] This is for back-to-back disable/enable calls. Varying the time between stop and (re)start will add additional complications.
Thanks for looking into this.
I need to send out 3 bytes to an SQI device very quickly, faster than can be done using SPI and the method of using the two counters as a clock and shifter.
I'd like to get the time down into the 16 to 20 instruction cycle range where the normal SPI takes something like 28 instruction cycles.
There isn't time within the code to keep maintaining waitvids when I'm not sending data.
My plan is something like:
PLL setup properly on cog startup
...
...
Set VSCL to XMIT value
Turn on VG
Do waitvid for first byte
Do waitvid for second byte
Do waitvid for third byte
Set VSCL to SHUTDOWN value
Do waitvid for SQI_WAIT
Turn off VG
The waitvids that send byte data use the method I discussed in post #10
The XMIT VSCL value is 1 clock per pixel with 4 pixel frames
The SHUTDOWN value for VSCL is to be determined, it needs to allow enough time to shutdown and then restart without timing out the frame prior to the first waitvid when restarted.
I don't care if the overall time varies by an instruction cycle or two due to synchronization issues, etc.
The SQI_WAIT waitvid values will output low on the SQI data and clock pins.
Because I'm only sending 4 pixel frames, I may need to slow down the VG clock from my preferred 40 MHz, because there needs to be time to do the VSCL update after the third byte and before the shutdown waitvid.
Edit: I did some calculations and it looks like a 30 MHz pixel clock would work to allow time for one non-hub instruction between waitvids, so this will give time for the VSCL update after the 3rd byte.
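For anyone wanting to redo that arithmetic, here is a rough Python sketch of the frame budget. It assumes an 80 MHz system clock and 4 clocks per non-hub instruction, neither of which is stated above, and `frame_budget` is just an illustrative name. It does not model how many clocks the waitvid itself needs, so it only shows how much room a 4 pixel frame leaves in total.

```python
def frame_budget(pixel_hz, sys_hz=80_000_000, pixels_per_frame=4,
                 clocks_per_instruction=4):
    """Return how many system clocks one video frame lasts.

    Assumptions (not from the thread): 80 MHz system clock and
    4 clocks per non-hub instruction.
    """
    clocks_per_frame = pixels_per_frame * sys_hz / pixel_hz
    return {
        # Two pixels (clock low, clock high) make one SQI clock period.
        "sqi_clock_hz": pixel_hz / 2,
        "clocks_per_frame": clocks_per_frame,
        "instruction_slots_per_frame": clocks_per_frame / clocks_per_instruction,
        # Command byte plus two address bytes, one frame each.
        "clocks_for_three_bytes": 3 * clocks_per_frame,
    }


if __name__ == "__main__":
    for hz in (40_000_000, 30_000_000):
        b = frame_budget(hz)
        print(f"{hz / 1e6:.0f} MHz pixel clock: "
              f"{b['clocks_per_frame']:.2f} clocks per frame, "
              f"{b['instruction_slots_per_frame']:.2f} instruction slots, "
              f"SQI clock {b['sqi_clock_hz'] / 1e6:.0f} MHz")
```

With those assumptions a 40 MHz pixel clock leaves exactly two instruction slots per frame, while 30 MHz leaves about 2.67, which is the extra slack being relied on for the VSCL update.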
I'm not using the method mentioned in post #11 where the clock is supplied by the PLL because I need to handle the clock and chip select by other methods when I read from the SQI device.
C.W.
The confusing bit is, you mention 3 waitvids but your vscl setup is for a 4 pixel frame? Which would mean 12 bytes (4 per waitvid).
I'm using the video data to generate the clock as well, so two pixels per byte.
I'm doing the clock that way because I don't want to have to kill the PLL as well as the VG to stop the clock.
I'll need to manually toggle the clock when I do a read from the device.
Basic steps are:
Output the command byte and 16 bit address using the VG as described
Read in a byte "manually" since I only need to read in two nibbles.
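As a concrete picture of the first step, this Python sketch lists the three bytes that would go out for a read and the nibbles they turn into on the four data lines. It is illustrative only: `header_bytes` and `nibble_stream` are made-up names, the READ opcode value 0x03 and the high-nibble-first order are assumptions that should be checked against the 23LC512 datasheet, and any dummy cycles the device may want before data are not covered.

```python
READ = 0x03  # assumed READ opcode; verify against the 23LC512 datasheet


def header_bytes(address, command=READ):
    """Command byte followed by the 16 bit address, high byte first."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    return [command, (address >> 8) & 0xFF, address & 0xFF]


def nibble_stream(data):
    """Split bytes into nibbles, high nibble first (assumed SQI order)."""
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    return nibbles


if __name__ == "__main__":
    header = header_bytes(0x1234)
    print([f"0x{b:02X}" for b in header])
    # Six nibbles means six SQI clocks for the whole header.
    print([f"{n:X}" for n in nibble_stream(header)])
```

Three bytes come to six SQI clocks, compared with 24 clocks for the same header over single bit SPI.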
This whole thing is for using the 23LC512 as RAM for the COSMACog project, so I need high speed random access to the 23LC512 that allows access within the emulated 1802's cycle time.
C.W.
It looks like this depends on video clock and system clock being sync'ed, I wonder if that is reliable enough for data transfer over long periods?
My RAM's and logic analyzer should be here late in the week, so hopefully I'll get a chance to start trying this out over the weekend.
C.W.
So does this method depend on setting a *long* frame at the fade out and then I need to execute *exactly* that many cycles before being back at the entry point? It looks like that is the way it would work.
I'll need to be able to stop the VG completely because not every emulated machine cycle will have a memory access.
Thanks,
C.W.
Changing vscl is kind of optional but this will affect the start-up sequence as we have to deal with the remaining frame count.
OT: 2600 posts
Ah, cool. So as far as startup are you referring to cog start-up? The cog startup will be doing some housekeeping to setup some mailboxes, other than that I can do whatever code you need to get the VG clock sync'ed to the system clock. Do we need to resync every time we restart the VG?
Update: Looks like we can get away with vscl being set up once (p1f4). IOW, as long as we adhere to the 2n+m alignment for consecutive calls it just works. Below is the send code. This has been called with the following example sequence (giving the expected result): Synchronization boils down to manipulating the frame counter to look as if we just left send.