Using unused I/O pins for inter-cog comms?

porcupine · 2014-05-05 04:31

This is likely a very naive question, but I'm still a newb with this platform.

HUB memory is slow. I have a DAC COG that is spending a lot of its time consulting HUB memory to pull attributes (samples, frequencies) out. I also have a lot of unused I/O lines.

My question is this: can unused I/O lines be used to communicate from COG to COG more quickly than HUB memory? I.e. designate 8, 12, 16 lines as a parallel output from one COG, and then have the DAC COG read the state of those lines for its data? Can this be performed more quickly than access to HUB memory?

T Chap · 2014-05-05 05:23

Are both cogs running in PASM? It may be better to post the code so it can be better understood. How many bits of resolution do you really need?

Cluso99 · 2014-05-05 05:36

Most definitely you can use unused pins. You can even exploit the fact that when 2 cogs write to the same pins, the result is an OR of their values. You can also use waitpeq and waitpne.
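To make the OR behaviour concrete, here is a host-side C++ model of how cog outputs combine on the pins. It assumes the usual P8X32A rule (each cog contributes OUTA & DIRA, and the pin state is the OR of all contributions) and a hypothetical 16-bit bus on P8..P23; it ignores timing entirely and only illustrates the combining and masking logic.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Simplified model of one cog's output registers.
struct CogPins {
    uint32_t outa = 0;
    uint32_t dira = 0;
};

// Pin states as seen on INA: OR of every cog's (OUTA & DIRA).
uint32_t pinStates(const std::array<CogPins, 8>& cogs) {
    uint32_t pins = 0;
    for (const CogPins& c : cogs) {
        pins |= c.outa & c.dira;  // a cog only drives pins it has set as outputs
    }
    return pins;
}

// Hypothetical bus placement: 16 data lines on P8..P23.
constexpr uint32_t BUS_SHIFT = 8;
constexpr uint32_t BUS_MASK = 0xFFFFu << BUS_SHIFT;

// Producer cog: drive a 16-bit value onto the bus, leaving other OUTA bits alone.
void writeBus(CogPins& cog, uint16_t value) {
    cog.dira |= BUS_MASK;
    cog.outa = (cog.outa & ~BUS_MASK) | (uint32_t(value) << BUS_SHIFT);
}

// Consumer cog: extract the 16-bit value from an INA snapshot.
uint16_t readBus(uint32_t ina) {
    return uint16_t((ina & BUS_MASK) >> BUS_SHIFT);
}
```

The OR rule is also why only one cog should drive a given bus line at a time: a second cog setting the same bit in OUTA and DIRA forces that line high regardless of the producer's data.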

porcupine · 2014-05-05 06:08

Yeah I guess the question is whether the WAIT[PEQ|PNE] + read from the pins is faster than a couple of hub memory accesses.

porcupine · 2014-05-05 06:49

T Chap wrote: »

Are both cogs running in PASM? It may be better to post the code so it can be better understood. How many bits of resolution do you really need?

My code is a mixture of C++ and inline assembly. My biggest bottleneck is the SPI code, which I can't get any faster than 2 Mbps. Some of this is because it's running in LMM mode; I'm working on getting the code size under 2K so I can use COG mode for the DAC code. I'm also trying to learn how the fast SPI techniques that use CTRA/CTRB work, to see whether they apply. I want to be able to drive my DAC at 15 Mbps or more using SPI. But even if I optimize my SPI send routine, I will still need to optimize the data retrieval: I'm currently using 13 instructions just to pull a sample from a ring buffer in HUB memory, so it's quite costly.
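One common way to trim a ring-buffer read is to make the buffer size a power of two, so wrapping the index is a single AND instead of a compare and branch. The sketch below is a hypothetical single-producer/single-consumer C++ ring (the name `SampleRing` and the sizes are illustrative, not the poster's code). It only shows the index arithmetic; real cross-cog use would also need `volatile` hub variables and a fixed memory layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Power-of-two ring buffer with free-running 32-bit counters.
// Because N divides 2^32, (head - tail) stays correct across counter wraparound.
template <std::size_t N>
struct SampleRing {
    static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");
    uint16_t data[N] = {};
    uint32_t head = 0;  // next slot the producer writes
    uint32_t tail = 0;  // next slot the consumer reads

    bool push(uint16_t s) {
        if (head - tail == N) return false;  // full
        data[head & (N - 1)] = s;            // mask replaces the wrap test
        ++head;
        return true;
    }

    bool pop(uint16_t& s) {
        if (head == tail) return false;      // empty
        s = data[tail & (N - 1)];            // mask replaces the wrap test
        ++tail;
        return true;
    }
};
```

In PASM the consumer side of this reduces to roughly: read tail, AND with N-1, add the buffer base, one hub read for the sample, increment tail and write it back.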

Cluso99 · 2014-05-05 12:19

Do you have spare cogs? If so, your SPI driver can be done in a separate cog in PASM.
See Kye's FAT16/32 driver (for SD cards). IIRC he uses counters for the SPI code.

If your code will pause for more than 1 hub cycle, then waitpeq/waitpne should be faster, and it reduces power while waiting. If your transfer is via pins, you can read the pins immediately; otherwise your code needs to wait for its next hub window.
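A quick back-of-envelope comparison of the two paths, assuming the usual P8X32A figures: the hub grants each cog one access window every 16 system clocks, and an ordinary instruction such as `mov x, ina` takes 4 clocks. These are upper bounds for a tight loop, not measured throughput.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t HUB_ROTATION_CLOCKS = 16;  // one hub window per cog per rotation
constexpr uint32_t INSTRUCTION_CLOCKS = 4;    // ordinary (non-hub) instruction

// Upper bound on hub reads per second for one cog.
constexpr uint32_t maxHubReadsPerSecond(uint32_t clkfreq) {
    return clkfreq / HUB_ROTATION_CLOCKS;
}

// Upper bound on INA samples per second (one mov per sample, no loop overhead).
constexpr uint32_t maxPinReadsPerSecond(uint32_t clkfreq) {
    return clkfreq / INSTRUCTION_CLOCKS;
}

// A hub instruction that just misses its window stalls almost a full rotation.
constexpr uint32_t worstCaseHubWaitClocks() {
    return HUB_ROTATION_CLOCKS - 1;
}
```

At 80 MHz that is at most 5 million hub reads per second versus 20 million pin reads, and a hub read can add up to 15 clocks of waiting, which is the stall the pin path avoids.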

15 Mbps is going to be quite difficult to sustain. For a byte with inline PASM, the minimum will be...
 xxx 'extract bit by shift or mask
 mov outa,yyy
 xor outa,

Cluso99 · 2014-05-05 12:27

Sorry, my Xoom croaked.

For 1 bit of the byte:

 xxx 'extract next bit
 mov outa,yyyy 'output bit and clock
 xor outa,clockbit 'toggle clock (sample edge)
 xor outa,clockbit 'toggle (may save this in inline code - use the "mov" to do this and set the data bit)

You may be able to use muxc or muxz rather than mov.
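The per-bit sequence above can be modelled on a host to check the bit order and clock edges. This C++ sketch assumes MSB-first transfer, a DAC that latches data on the rising clock edge, and hypothetical pin assignments (`DATA_MASK` on P0, `CLK_MASK` on P1). It uses the muxc variant: `shl data,#1 wc` moves the next bit into C, and `muxc outa,datamask` copies C to the data pin.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t DATA_MASK = 1u << 0;  // hypothetical data pin
constexpr uint32_t CLK_MASK  = 1u << 1;  // hypothetical clock pin

// Stand-in for the DAC's input shift register: latches DATA on each rising CLK edge.
struct FakeDac {
    uint32_t prevOut = 0;
    uint16_t shiftReg = 0;
    int bitsLatched = 0;

    void observe(uint32_t outa) {
        bool rising = (outa & CLK_MASK) && !(prevOut & CLK_MASK);
        if (rising) {
            shiftReg = uint16_t((shiftReg << 1) | ((outa & DATA_MASK) ? 1 : 0));
            ++bitsLatched;
        }
        prevOut = outa;
    }
};

// Bit-bang one 16-bit word, MSB first; each step mirrors one PASM instruction.
void shiftOut16(uint16_t word, uint32_t& outa, FakeDac& dac) {
    uint32_t data = uint32_t(word) << 16;  // MSB of word at bit 31
    for (int i = 0; i < 16; ++i) {
        bool carry = (data & 0x80000000u) != 0;                    // shl data,#1 wc
        data <<= 1;
        outa = carry ? (outa | DATA_MASK) : (outa & ~DATA_MASK);   // muxc outa,datamask
        dac.observe(outa);
        outa ^= CLK_MASK;                                          // xor outa,clockbit (rising)
        dac.observe(outa);
        outa ^= CLK_MASK;                                          // xor outa,clockbit (falling)
        dac.observe(outa);
    }
}
```

Each bit costs four instructions here (16 clocks), before any loop overhead.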

These are just ideas to get you thinking, so I hope they help.

Cluso99 · 2014-05-05 12:28

You might also consider overclocking.

DavidZemon · 2014-05-05 13:32

porcupine wrote: »

My code is a mixture of C++ and inline assembly. My biggest bottleneck is the SPI code, which I can't get any faster than 2 Mbps. Some of this is because it's running in LMM mode; I'm working on getting the code size under 2K so I can use COG mode for the DAC code. I'm also trying to learn how the fast SPI techniques that use CTRA/CTRB work, to see whether they apply. I want to be able to drive my DAC at 15 Mbps or more using SPI. But even if I optimize my SPI send routine, I will still need to optimize the data retrieval: I'm currently using 13 instructions just to pull a sample from a ring buffer in HUB memory, so it's quite costly.

First, a +1 to Cluso's suggestion of looking at Kye's FAT16/32 driver for SPI technique. That's how I started my SPI driver and using the counter module is on my todo list as well.

Second, you may need to give up the 15 MHz dream. As far as I know, you can't get any faster than ~4 MHz with an 80 MHz clock. You might be able to increase that a bit if you unroll the loop (I think 4 MHz is achieved with a djnz loop over each bit in a data word), but I don't think 15 is reachable.
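The ~4 MHz figure falls out of simple arithmetic if every instruction in the loop takes 4 clocks (true for ordinary instructions, not hub ops or waits). The instruction counts below are estimates for the sequences discussed in this thread, not measurements.

```cpp
#include <cassert>
#include <cstdint>

// Bit-rate ceiling for a bit-banged loop with a given number of 4-clock instructions per bit.
constexpr uint32_t bitsPerSecond(uint32_t clkfreq, uint32_t instructionsPerBit) {
    return clkfreq / (instructionsPerBit * 4);
}

// Whole clocks available per bit at a target rate.
constexpr uint32_t clocksPerBit(uint32_t clkfreq, uint32_t targetBps) {
    return clkfreq / targetBps;
}

// 5 instr/bit (e.g. shl, muxc, xor, xor, djnz) -> 4 Mbps at 80 MHz
static_assert(bitsPerSecond(80000000, 5) == 4000000, "djnz loop");
// 4 instr/bit (same, loop fully unrolled) -> 5 Mbps at 80 MHz
static_assert(bitsPerSecond(80000000, 4) == 5000000, "unrolled");
// 15 Mbps leaves only 5 clocks per bit: fewer than 2 instructions.
static_assert(clocksPerBit(80000000, 15000000) == 5, "15 Mbps budget");
```

Overclocking to 100 MHz only lifts the unrolled 4-instruction loop to 6.25 Mbps, so reaching 15 Mbps needs hardware help (counters or the video generator) rather than a faster software loop.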

jazzed · 2014-05-05 14:21

Hi.

Yes, you can use waitpne/peq for reading pins (or even using two instructions) faster than reading HUB ram.

With a 96 MHz clock it is possible to read 6 MB per second bursts from HUB RAM using counters (with a free pin).

This all assumes a PASM style driver.

There are several examples of using the video circuit for high speed output (12Mbps+ with a 96MHz clock), but input speeds are limited.

CogSaver · 2014-05-05 15:33

I do not know any details about your hardware, but since you seem to have spare I/O, is there an option to use a parallel-load DAC rather than SPI?
That may eliminate one bottleneck.

CogSaver

kuroneko · 2014-05-05 16:16

porcupine wrote: »

I want to be able to drive my DAC at 15 Mbps or more using SPI. But even if I optimize my SPI send routine, I will still need to optimize the data retrieval: I'm currently using 13 instructions just to pull a sample from a ring buffer in HUB memory, so it's quite costly.

How many bits go out in one go or is it a continuous data stream? As others have said, it would really help to see something, like what you pull from hub (byte/word/long), queue sizes etc.

porcupine · 2014-05-05 16:28

I don't need input at all, just output to my DAC. A separate COG reads the wavetable data from the SD card.

I read 128-byte samples, process them, and then send them to my DAC COG. It sends a burst of 16 bits to the DAC, then pauses for X cycles, where X = clock rate / target frequency in Hz / 128.
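The delay formula above is easy to check numerically. This C++ sketch assumes a 128-sample waveform and integer cycle counts (as a waitcnt-based loop would use); since integer division truncates, the played pitch comes out slightly sharp, and the helper reports the pitch actually produced.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t SAMPLES_PER_CYCLE = 128;

// X = clkfreq / f / 128, truncated to whole clocks.
constexpr uint32_t cyclesPerSample(uint32_t clkfreq, uint32_t freqHz) {
    return clkfreq / (freqHz * SAMPLES_PER_CYCLE);
}

// Pitch actually produced by a given integer delay.
double actualPitchHz(uint32_t clkfreq, uint32_t cyclesPerSamp) {
    return double(clkfreq) / (double(cyclesPerSamp) * SAMPLES_PER_CYCLE);
}
```

For example, at 80 MHz an A440 needs X = 1420 clocks per sample, which actually plays about 440.14 Hz. Those 1420 clocks must also cover the 16-bit SPI burst plus the hub fetch, which is what limits the highest playable note.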

This is to avoid downsampling my waveform data, basically to run the DAC at a variable sample rate. This is how the old PPG wavetable synthesizers worked in the '80s: the memory read rate and the DAC sample rate varied with the frequency being played, which avoids some aliasing artifacts. Back then it was done with custom circuitry; now I'm trying to do it with serial comms: http://www.electricdruid.net/index.php?page=info.wavetableoscs

Right now I am getting about 2-3 Mbps on my SPI output to the DAC, bit-banging in assembly. That maxes me out at a 1000 Hz tone unless I downsample.
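The 1000 Hz ceiling matches the arithmetic: a 128-sample waveform with one 16-bit DAC word per sample needs 128 × 16 × f bits per second. This sketch assumes no extra framing or command bits per word.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t SAMPLES_PER_CYCLE = 128;
constexpr uint32_t BITS_PER_SAMPLE = 16;

// SPI bit rate needed to play a tone at freqHz.
constexpr uint32_t requiredBps(uint32_t freqHz) {
    return freqHz * SAMPLES_PER_CYCLE * BITS_PER_SAMPLE;
}

// Highest tone a link of linkBps can sustain (rounded down).
constexpr uint32_t maxToneHz(uint32_t linkBps) {
    return linkBps / (SAMPLES_PER_CYCLE * BITS_PER_SAMPLE);
}
```

So 1000 Hz already needs 2.048 Mbps, and even a sustained 15 Mbps link would top out around 7.3 kHz.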

I don't understand the WAITVID technique and can't find an example of it. I was hoping to make the dual-counter SPI output technique work and worked from example code, but I can't seem to get it to work. I'll paste my code later after I get the kids to bed.
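One way to understand the dual-counter idea is to simulate it clock by clock. This C++ model rests on stated assumptions, not on any particular driver: counter A runs in NCO mode on the clock pin with FRQA = $4000_0000, so its pin (PHSA bit 31) completes a period every 4 system clocks; counter B runs in NCO mode on the data pin with FRQB = 0, so its pin is just PHSB bit 31, and the cog shifts in the next bit with one `shl phsb,#1` (4 clocks) per bit. Real code must also align PHSA's phase with the first shift; here the starting phase (PHSA = 0) is simply chosen so each rising edge lands one clock after the data changes.

```cpp
#include <cassert>
#include <cstdint>

// Returns the 16-bit word a rising-edge-latching DAC would receive.
uint16_t simulateCounterSpi(uint16_t sample, int* edgesSeen = nullptr) {
    const uint32_t frqa = 0x40000000u;        // clock NCO: period of 4 system clocks
    uint32_t phsa = 0;                        // assumed starting phase
    uint32_t phsb = uint32_t(sample) << 16;   // MSB of the sample at bit 31 (data pin)
    bool prevClk = false;
    uint16_t received = 0;
    int edges = 0;
    for (int clk = 0; clk < 16 * 4; ++clk) {
        if (clk > 0 && clk % 4 == 0) phsb <<= 1;  // shl phsb,#1 once per 4-clock instruction
        bool data = (phsb >> 31) != 0;            // NCO pin = PHSB[31] (FRQB = 0)
        phsa += frqa;                             // NCO adds FRQA every clock
        bool sck = (phsa >> 31) != 0;             // NCO pin = PHSA[31]
        if (sck && !prevClk) {                    // DAC latches on rising edge
            received = uint16_t((received << 1) | (data ? 1 : 0));
            ++edges;
        }
        prevClk = sck;
    }
    if (edgesSeen) *edgesSeen = edges;
    return received;
}
```

In this model the counters, not the instruction stream, generate the clock, so one instruction moves one bit: roughly 20 Mbps at 80 MHz, if the DAC accepts that rate.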

jazzed wrote: »

Hi.

Yes, you can use waitpne/peq for reading pins (or even using two instructions) faster than reading HUB ram.

With a 96 MHz clock it is possible to read 6 MB per second bursts from HUB RAM using counters (with a free pin).

This all assumes a PASM style driver.

There are several examples of using the video circuit for high speed output (12Mbps+ with a 96MHz clock), but input speeds are limited.

porcupine · 2014-05-05 16:51

Yes, I've thought about this, and I think it could be a win. But I can't find a cheap parallel DAC (12-bit or greater) that is available in through-hole. For SMT there are lots of options, of course, but I was hoping for something I could breadboard.

Any recommendations on a 12 or 16 bit parallel DAC?

CogSaver wrote: »

I do not know any details about your hardware, but since you seem to have spare I/O, is there an option to use a parallel-load DAC rather than SPI?
That may eliminate one bottleneck.

CogSaver

porcupine · 2014-05-06 07:02

Just ordered this: http://ca.mouser.com/Search/ProductDetail.aspx?R=AD7948BNZvirtualkey58430000virtualkey584-AD7948BNZ

12-bit through-hole DAC with a parallel interface, 1.7 million samples per second max throughput, 3 V or 5 V operation. If I can get it to work, I may put my C++ code into OBEX.
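For comparison with SPI, here is a sketch of the OUTA values one parallel write could take. It assumes a hypothetical wiring (D0..D11 on P0..P11, an active-low write strobe on P12) and a DAC that latches on the strobe's rising edge. The real part's datasheet governs the details: some parallel DACs load 12 bits as two bytes and need extra control lines.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t DATA_SHIFT = 0;                     // hypothetical: D0 on P0
constexpr uint32_t DATA_MASK  = 0x0FFFu << DATA_SHIFT; // 12 data lines
constexpr uint32_t WR_MASK    = 1u << 12;              // hypothetical active-low /WR on P12

// The three OUTA values written, in order, for one sample.
struct ParallelWrite {
    uint32_t setData;    // data on the bus, /WR still high
    uint32_t strobeLow;  // /WR low
    uint32_t strobeHigh; // /WR high again (assumed latch edge)
};

ParallelWrite writeSample(uint32_t outa, uint16_t sample) {
    assert(sample <= 0x0FFF);  // 12-bit DAC
    ParallelWrite w{};
    w.setData    = (outa & ~DATA_MASK) | (uint32_t(sample) << DATA_SHIFT) | WR_MASK;
    w.strobeLow  = w.setData & ~WR_MASK;
    w.strobeHigh = w.setData;
    return w;
}
```

That is a handful of 4-clock instructions per sample, instead of roughly four instructions per bit for a 16-bit bit-banged SPI word.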

