Using unused I/O pins for inter-cog comms?
porcupine
Posts: 80
This is likely a very naive question, but I'm still a newb with this platform.
HUB memory is slow. I have a DAC COG that is spending a lot of its time consulting HUB memory to pull attributes (samples, frequencies) out. I also have a lot of unused I/O lines.
My question is this: can unused I/O lines be used to communicate from COG to COG more quickly than HUB memory? I.e. designate 8, 12, 16 lines as a parallel output from one COG, and then have the DAC COG read the state of those lines for its data? Can this be performed more quickly than access to HUB memory?
Comments
My code is a mixture of C++ and inline assembly. My biggest bottleneck is the SPI code, which I can't get any faster than 2 Mbps. Some of this is because it's running in LMM mode; I'm working on getting the code size under 2k so I can use COG mode for the DAC code. I'm also trying to learn how the fast SPI techniques that use CTRA/CTRB work, to see if they can apply. I want to be able to drive my DAC at 15 Mbps or more using SPI. But even if I optimize my SPI-send routine, I will still need to optimize the data retrieval: I'm using 13 instructions just to pull from a ring buffer in HUB memory right now, so it's quite costly.
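For reference, here is a minimal sketch of one common way to keep a ring-buffer pull cheap: free-running indices and a power-of-two size, so wrapping is a single AND. This is not the poster's code. The name `SampleRing` and its layout are hypothetical, and it assumes one producer cog, one consumer cog, and that an aligned 32-bit write to shared memory is atomic. Whether it compiles to fewer than 13 instructions depends on the compiler and memory model.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical single-producer / single-consumer ring buffer.
// N must be a power of two so the index wrap is a single AND.
template <std::size_t N>
struct SampleRing {
    static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");

    volatile std::uint16_t data[N];
    volatile std::uint32_t head = 0;  // written only by the producer
    volatile std::uint32_t tail = 0;  // written only by the consumer

    // Producer side: returns false when the buffer is full.
    bool push(std::uint16_t value) {
        std::uint32_t h = head;
        if (h - tail == N) return false;  // unsigned wraparound keeps this valid
        data[h & (N - 1)] = value;
        head = h + 1;                     // publish after the data is stored
        return true;
    }

    // Consumer side: returns false when the buffer is empty.
    bool pop(std::uint16_t &out) {
        std::uint32_t t = tail;
        if (t == head) return false;
        out = data[t & (N - 1)];
        tail = t + 1;
        return true;
    }
};
```

The consumer only ever writes `tail` and the producer only ever writes `head`, so no lock is needed under the assumptions above.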
See Kye's FAT16/32 driver (for SD cards). IIRC he uses counters for the SPI code.
If your code will pause for more than 1 hub cycle then waitpeq/waitpne should be faster, and it reduces power while waiting. If your transfer is via pins then you can read the pins immediately; otherwise your code needs to wait for the next hub cycle.
15 Mbps is going to be quite difficult to sustain. For a byte and inline PASM, a minimum will be...
[code]
xxx 'extract bit by shift or mask
mov outa,yyy
xor outa,
[/code]
for 1 bit of the byte... you may be able to use muxc or muxz rather than mov.
These are just ideas to get you thinking, so I hope they help.
First, a +1 to Cluso's suggestion of looking at Kye's FAT16/32 driver for SPI technique. That's how I started my SPI driver and using the counter module is on my todo list as well.
Second, you may need to give up the 15 MHz dream. As far as I know, you can't get any faster than ~4 MHz with an 80 MHz clock. You might be able to increase that a bit if you unroll the loop (I think 4 MHz is achieved with a djnz loop over each bit in a data word), but I don't think 15 is reachable.
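The ~4 MHz figure can be sanity-checked with a little arithmetic. This is a rough sketch, not a measurement: it assumes every instruction in the bit loop takes 4 system clocks (true for most Propeller 1 instructions), and the count of 5 instructions per bit is only an illustrative guess at a djnz-style loop. The function name is hypothetical.

```cpp
#include <cstdint>

// Most Propeller 1 instructions execute in 4 system clocks.
constexpr std::uint32_t kClocksPerInstruction = 4;

// Upper bound on a bit-banged bit rate, given how many instructions
// the loop spends on each bit (assumed, not measured).
constexpr std::uint32_t bitbang_bits_per_second(std::uint32_t sys_clock_hz,
                                                std::uint32_t instructions_per_bit) {
    return sys_clock_hz / (kClocksPerInstruction * instructions_per_bit);
}

// 80 MHz clock, 5 instructions per bit: 4 Mbps.
static_assert(bitbang_bits_per_second(80000000u, 5) == 4000000u, "4 Mbps");
// Even a single instruction per bit would top out at 20 Mbps.
static_assert(bitbang_bits_per_second(80000000u, 1) == 20000000u, "20 Mbps");
```

By the same arithmetic, 15 Mbps at 80 MHz leaves fewer than two instructions per bit, which is why the counter and video-hardware techniques come up in this thread.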
Yes, reading pins with waitpne/waitpeq (or even with two instructions) is faster than reading HUB RAM.
With a 96 MHz clock speed it is possible to read 6 MB per second bursts from HUB RAM using counters with a free pin.
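One way to arrive at a number of that size, as a sketch: on the Propeller 1 each cog gets a hub access window once every 16 system clocks, so the burst rate is bounded by one hub read per window. That the 6 MB/s figure corresponds to one byte per window is my assumption, not something stated above. The function name is hypothetical.

```cpp
#include <cstdint>

// Each cog gets one hub access window every 16 system clocks.
constexpr std::uint32_t kHubWindowClocks = 16;

// Best-case hub read throughput if every window is used.
// bytes_per_access: 1 for rdbyte, 2 for rdword, 4 for rdlong.
constexpr std::uint32_t hub_bytes_per_second(std::uint32_t sys_clock_hz,
                                             std::uint32_t bytes_per_access) {
    return sys_clock_hz / kHubWindowClocks * bytes_per_access;
}

// 96 MHz, one byte per window: 6,000,000 bytes per second.
static_assert(hub_bytes_per_second(96000000u, 1) == 6000000u, "6 MB/s");
```

Hitting every window requires the loop between reads to fit in the clocks left over, which is a tight fit for anything beyond moving the data out.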
This all assumes a PASM style driver.
There are several examples of using the video circuit for high speed output (12Mbps+ with a 96MHz clock), but input speeds are limited.
That may eliminate 1 bottleneck
CogSaver
I read 128-byte samples, process them, and then send them to my DAC COG. It sends a burst of 16 bits to the DAC, then pauses for X cycles, where X is my clock rate / target frequency in hz / 128.
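The formula for X works out like this in code. A small sketch only: the function and constant names are made up for illustration, integer division truncates, and `tone_hz` must be non-zero.

```cpp
#include <cstdint>

// Samples per waveform cycle, as described in the post.
constexpr std::uint32_t kSamplesPerCycle = 128;

// X = clock rate / target frequency in Hz / 128, truncated to whole cycles.
// tone_hz must be non-zero.
constexpr std::uint32_t cycles_per_sample(std::uint32_t sys_clock_hz,
                                          std::uint32_t tone_hz) {
    return sys_clock_hz / (tone_hz * kSamplesPerCycle);
}

// 80 MHz clock, 1000 Hz tone: 625 clocks per sample.
static_assert(cycles_per_sample(80000000u, 1000) == 625u, "625 clocks");
```

At 440 Hz the same call gives 1420 clocks, with the fractional remainder dropped, so the played pitch will be very slightly sharp unless the remainder is tracked.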
This is to avoid downsampling my waveform data, basically to run the DAC at a variable sample rate. This is how the old PPG wavetable synthesizers worked in the 80s: the read rate from memory and the DAC sample rate both varied with the frequency being played, which avoids some aliasing artifacts. Back then it was done with custom circuitry; now I'm trying to do it with serial comms: http://www.electricdruid.net/index.php?page=info.wavetableoscs
Right now I am getting about 2-3 Mbps on my SPI output to the DAC, bitbanging in assembly. That maxes me out at about a 1000 Hz tone unless I downsample.
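The 1000 Hz ceiling follows from the numbers already given: 128 samples per cycle at 16 bits each is 2048 bits per waveform cycle. A sketch of that arithmetic, with hypothetical names, ignoring any per-sample overhead such as chip-select toggling:

```cpp
#include <cstdint>

constexpr std::uint32_t kSamplesPerCycle = 128;  // from the post
constexpr std::uint32_t kBitsPerSample = 16;     // one 16-bit burst per sample
constexpr std::uint32_t kBitsPerCycle = kSamplesPerCycle * kBitsPerSample;

// SPI bit rate needed to play a tone without downsampling.
constexpr std::uint32_t required_bits_per_second(std::uint32_t tone_hz) {
    return tone_hz * kBitsPerCycle;
}

// Highest tone reachable at a given SPI bit rate (truncated to whole Hz).
constexpr std::uint32_t max_tone_hz(std::uint32_t spi_bits_per_second) {
    return spi_bits_per_second / kBitsPerCycle;
}

// A 1000 Hz tone needs 2.048 Mbps, matching the 2-3 Mbps observation.
static_assert(required_bits_per_second(1000) == 2048000u, "2.048 Mbps");
// The 15 Mbps target would reach roughly 7.3 kHz.
static_assert(max_tone_hz(15000000u) == 7324u, "about 7.3 kHz");
```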
I don't understand the WAITVID technique and can't find an example of it. I was hoping to make the dual-counter SPI output technique work, and worked from example code but can't seem to get it to work. I'll paste my code later after I get the kids to bed.
Any recommendations on a 12 or 16 bit parallel DAC?
12 bit through hole DAC with parallel interface, 1.7 million samples per second max throughput, 3 or 5v operation. If I can get it to work I'll put my C++ code into OBEX maybe.