You need to make receive and transmit FIFOs in cog RAM using ALTDS to get deterministic timing. At least, this would be ideal for using the streams in the same cog.
Do you think it is worth doing, to allow an effective background fetch in order to boost some cogs' code throughput?
The only solution I've been able to think of so far is to actually sample RX much closer to the beginning of each bit time. As long as the bit transitions are fast, you could reasonably sample only a few clock cycles into each bit time. This would make the maximum baud rate dependent on ~(sample_delay+36) instead of ~(2*36). With clean signalling, you should be able to get back up to the ~1Mbps rate with the current 50MHz clock.
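To put rough numbers on that, here is a small back-of-envelope sketch. It assumes the expressions above are the minimum number of clocks per bit, so the maximum baud is simply clock divided by that figure. The 50MHz clock and the 36-clock figure are from the post; the sample_delay of 4 is only an illustrative guess.

```python
def max_baud(clock_hz, clocks_per_bit):
    """Highest bit rate if every bit needs at least `clocks_per_bit` clocks."""
    return clock_hz / clocks_per_bit

CLOCK_HZ = 50_000_000   # "current 50MHz clock" from the post
LOOP_CLOCKS = 36        # per-bit figure quoted in the post
SAMPLE_DELAY = 4        # hypothetical: sample 4 clocks into each bit

mid_bit = max_baud(CLOCK_HZ, 2 * LOOP_CLOCKS)            # ~(2*36) case
early = max_baud(CLOCK_HZ, SAMPLE_DELAY + LOOP_CLOCKS)   # ~(sample_delay+36) case

# Largest sample_delay that still leaves room for 1 Mbps under this model
max_delay_for_1mbps = CLOCK_HZ // 1_000_000 - LOOP_CLOCKS

print(f"mid-bit sampling:  {mid_bit:,.0f} baud")
print(f"early sampling:    {early:,.0f} baud")
print(f"sample_delay budget for 1 Mbps: {max_delay_for_1mbps} clocks")
```

Under those assumptions, sampling in the middle of the bit lands a bit under 700 kbaud, while any sample_delay up to 14 clocks keeps 1 Mbps within reach.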
This needs a little care, as not all incoming signals are 'edge-perfect'.
If the incoming UART is an FT232H with a crystal and HW dividers, then yes, you can expect edges precise to better than 1 SysCLK.
If it is a SW Tx, that can have edge jitter of its own, and if you add AutoBAUD and/or an RC Osc into the mix, there can be baud offsets in play as well.
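A toy timing model makes the concern concrete. This is only a sketch under stated assumptions: the receiver re-syncs on the start-bit edge, samples bit k at k*rx_bit_clocks + sample_delay, and the transmitter's bit k is stable from k*tx_bit_clocks + jitter to (k+1)*tx_bit_clocks - jitter. The function name and all the numbers are made up for illustration, and it ignores whether the receive loop could keep up at a given sample point.

```python
def first_bad_bit(rx_bit_clocks, tx_bit_clocks, sample_delay, jitter=0.0, bits=10):
    """Return the index of the first bit sampled outside its stable window.

    Returns None if all `bits` (start + 8 data + stop by default) are sampled
    inside their stable windows. Times are in system clocks from the start edge.
    """
    for k in range(bits):
        sample_t = k * rx_bit_clocks + sample_delay
        stable_from = k * tx_bit_clocks + jitter
        stable_to = (k + 1) * tx_bit_clocks - jitter
        if not (stable_from <= sample_t <= stable_to):
            return k
    return None

# Receiver expects 50 clocks per bit; sample 4 clocks in; 1 clock of edge jitter.
print(first_bad_bit(50, 50, sample_delay=4, jitter=1))    # exact baud match
print(first_bad_bit(50, 51, sample_delay=4, jitter=1))    # transmitter 2% slow
print(first_bad_bit(50, 51, sample_delay=25, jitter=1))   # same offset, mid-bit sample
```

In this model an early sample point has almost no margin against a transmitter that runs slow: the sample creeps ahead of the edge after a few bits, whereas a mid-bit sample survives the same offset for the whole frame.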
Also, can the local buffer be in LUT?
Double buffering could be used, where a fast COG or LUT buffer is used to allow the fastest bit timing, and then bytes move to the HUB (and on to other COGs) when time is available.
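As a rough illustration of that two-stage idea, here is a minimal model. It is not P2 code: the class and method names are hypothetical, a deque stands in for the COG/LUT buffer and a plain list stands in for the HUB buffer.

```python
from collections import deque

class TwoStageRx:
    """Bytes land in a small fast buffer, then drain to a hub buffer later."""

    def __init__(self, local_size=16):
        self.local = deque()          # stands in for a COG/LUT buffer
        self.local_size = local_size
        self.hub = []                 # stands in for a HUB RAM buffer
        self.overruns = 0

    def on_byte(self, b):
        """Timing-critical path: touches only the local buffer."""
        if len(self.local) >= self.local_size:
            self.overruns += 1        # local buffer full, byte is dropped
            return
        self.local.append(b)

    def drain(self, max_bytes):
        """Background path: move up to max_bytes to the hub when time allows."""
        moved = 0
        while self.local and moved < max_bytes:
            self.hub.append(self.local.popleft())
            moved += 1
        return moved

rx = TwoStageRx(local_size=4)
for b in b"abcd":
    rx.on_byte(b)
rx.drain(2)                # only a little spare time: move two bytes
for b in b"ef":
    rx.on_byte(b)
rx.drain(10)               # more spare time: empty the local buffer
print(bytes(rx.hub), rx.overruns)
```

The local buffer only has to be deep enough to cover the longest stretch during which the drain cannot run; if it is too shallow, the overrun counter shows bytes being lost.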
You need to make receive and transmit FIFOs in cog RAM using ALTDS to get deterministic timing. At least, this would be ideal for using the streams in the same cog.
Do you think it is worth doing, to allow an effective background fetch in order to boost some cogs' code throughput?
I'm not sure what you mean.
It was two comments of mine on the first page, which I've snipped here:
78rpm wrote:
This is relevant to the discussion so far, but not just about fds-demo.
What if you synchronise with the hub-access slot, and set a counter to interrupt at the point when hub data can be read or written? That would mean buffering the data internally in the cog/LUT. The interrupt is timed early by enough clocks to fetch the cog data that will be passed to the hub. Then, at the moment the hub slot is available, the next instruction is the read or write. If it is a read, save the data in cog/LUT for the TX routine. IRETn.
There would be some overhead, but it would also mean not waiting for the hub as such. Your pure cog code can run merrily along until it runs out of data, or has generated enough of it, and then has to wait.
Then the second comment:
Actually, suppose there were a Special Function Register on which you could do a RDBYTE/WORD/LONG or WRBYTE/WORD/LONG. It would store the data, and some hidden bits would store the registers involved, including the ptr. When hub access came round, the transfer would be executed *automatically*. You could then have a higher throughput of cog code without waiting on the hub so much, sometimes by reading data in advance. If WAITxxx/POLLxxx were extended to report the status of this, it would be a useful background transfer mechanism.
I.e., read/write a bidirectional Special Function Register to read/write the hub while you get on with something else. Come and collect when ready, or write another.
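To make the proposal easier to picture, here is a toy software model of such a register. Everything in it is hypothetical: the class, its method names, and the hub_period of 8 clocks are illustration only, not an existing P2 feature or a real hub timing figure.

```python
class DeferredHubPort:
    """Toy model: post a hub read/write, keep working, collect it later."""

    def __init__(self, hub, hub_period=8):
        self.hub = hub                # list standing in for hub RAM
        self.hub_period = hub_period  # assumed clocks between this cog's hub windows
        self.pending = None           # ("rd", addr) or ("wr", addr, value)
        self.result = None
        self.done = True

    def _post(self, request):
        if not self.done:
            raise RuntimeError("previous transfer still pending")
        self.pending, self.done = request, False

    def post_read(self, addr):
        self._post(("rd", addr))

    def post_write(self, addr, value):
        self._post(("wr", addr, value))

    def tick(self, clock):
        """Advance one clock; a pending request completes on a hub window."""
        if self.pending is not None and clock % self.hub_period == 0:
            if self.pending[0] == "rd":
                self.result = self.hub[self.pending[1]]
            else:
                self.hub[self.pending[1]] = self.pending[2]
            self.pending, self.done = None, True

    def poll(self):
        """Status check, in the spirit of an extended POLLxxx."""
        return self.done

    def collect(self):
        return self.result

hub = [0] * 8
hub[3] = 99
port = DeferredHubPort(hub, hub_period=8)
port.post_read(3)
useful_clocks = 0
clock = 1
while not port.poll():
    port.tick(clock)
    useful_clocks += 1    # the cog keeps running its own code meanwhile
    clock += 1
value = port.collect()
print(value, useful_clocks)
```

The point of the model is only that the clocks between posting and collecting are spent on useful cog code rather than stalled on the hub; whether that is cheap to do in silicon is a separate question.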
You could synchronize to the hub using a CTx interrupt, but you'd need to keep your mainline code free of any 3+-clock instructions, so that the interrupt could occur on the intended clock edge, every time. It would take a few cycles to get into, and a few cycles to get out of, the ISR. I don't know if it would be a net win.
I've thought about buffering WRxxxx and RDxxxx instructions, but have never done it. In theory, it would make deterministic execution possible. That's maybe too big a bite to take at the moment, but we should keep it in mind, as it would be very nice to be able to code deterministic software that reads and writes the hub.
Where I was leading was that RDFAST and WRFAST already provide that deterministic behaviour. But one can't flip back and forth between them without incurring a delay on each RDFAST.
RDFAST is all that needs a bit of love.
RDFAST kind of needs to wait for the (re)configuration to complete, which might involve writing any remaining long fragment from a previous WRFAST, and certainly involves getting the first long from hub into the FIFO. Otherwise, you'd come out of RDFAST right away and then RFBYTE/RFWORD/RFLONG would have to wait.
Were you thinking of something where, some guaranteed number of clocks later, RDFAST will be ready, allowing RDFAST to be just 2 clocks? I think that might be reasonable to do. We'd just need a RDFASTX for RDFAST-exit!
Yes. I thought maybe even a new event flag could be generated from this.