A different way to access smart pins
If we dropped the shared LUT thing, I wonder if there would be enough room to double the bus width to the smart pins. That would speed up the special link mode AND all other smart pin interactions.
@cgracey, what is the current bus width to the smart pins? I have another thought (which I'll have to wait and share in a few hours).
I've been thinking about the use of smart pins for fast data sharing. One issue with them is that you have to trade off latency against data width (e.g. 32 bits stall a cog much longer than 8 bits do). By contrast, with something like the hub, careful timing can maximize data width while minimizing latency. So the question is: what if we were to do the same with the smart pins...
Suppose the following changes:
* Get rid of the variable-length timing, returning to the old method where smart pin access always took 10 clocks
* Group the smart pins into 8 groups
* Widen the data path to 32 bits (which, if @jmg is correct, is effectively how many data lines 8 cogs have anyhow)
* Access to each smart pin becomes round-robin, with the entire 32 bits of each successive smart pin transferred on each clock cycle
This approach has a few nice properties:
1. It's already a familiar access pattern for a shared resource (e.g. the hub).
2. Cogs can do other things between accesses to a smart pin.
3. The worst-case latency is no worse than the current worst case, and by timing accesses, the best-case latency is the same as the current best case. In other words, latency depends on timing, not data width.
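To make the timing argument concrete, here is a small Python model of the proposed round-robin access. It is a sketch under assumptions: an 8-clock rotation, and a hypothetical mapping in which pin n gets its slot whenever clock % 8 == n % 8. The names `slot_phase` and `stall_clocks` are illustrative only, not part of any existing tool or design.

```python
# Toy model of the proposed round-robin smart-pin access. Assumptions, not a
# real Propeller 2 spec: 8 groups, one slot per clock, so each pin comes
# around once every ROTATION clocks and moves all 32 bits in that one slot.
ROTATION = 8


def slot_phase(pin: int) -> int:
    """Hypothetical mapping: pin n's slot comes up when clock % 8 == n % 8."""
    return pin % ROTATION


def stall_clocks(pin: int, issue_clock: int, data_bits: int = 32) -> int:
    """Clocks an access issued at issue_clock waits for its pin's slot.

    data_bits is deliberately ignored: the whole 32-bit value moves in a
    single slot, so the width never changes the wait.
    """
    return (slot_phase(pin) - issue_clock) % ROTATION


if __name__ == "__main__":
    waits = [stall_clocks(32, t) for t in range(ROTATION)]
    print("pin 32 stall by issue phase:", waits)
    print("best:", min(waits), "worst:", max(waits))
    print("by width:", [stall_clocks(32, 3, bits) for bits in (8, 16, 32)])
```

Under these assumptions the stall ranges from 0 to 7 clocks depending only on when the access is issued, and asking for 8, 16, or 32 bits gives the same wait.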
What can we do with this?
Well, I'm sure you all will be able to think of some great stuff! But here are a couple of simple ones...
A tiny serial echo block:
[code]
echo_loop
        REP     #rep_end, #0            ' loop forever
        WAITEDG                         ' wait for RX
        PINACK  #32                     ' ack (still 2 clocks)
        PINGETZ t0, #32                 ' grab the incoming data (waits 6 clocks)
        PINSETY t0, #34                 ' send to TX (0 extra clocks due to alignment)
rep_end
[/code]
Actually, this one could be even more interesting if PINGETZ could also indicate whether the pin event was active. Because PINGETZ would wait until it gets the right slot, you could change the code as follows:
[code]
echo_loop
        REP     #rep_end, #0            ' loop forever
        PINGETZ t0, #32         WC      ' grab the incoming data (C = event flag)
if_c    PINSETY t0, #34                 ' send to TX (0 extra clocks due to alignment)
if_c    PINACK  #32                     ' ack the RX (still 2 clocks)
rep_end
[/code]
Of course, this would be much less energy efficient than using WAITEDG, but it would be crazy fast!
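The polling loop's timing can be sanity-checked with a rough trace. This Python sketch assumes the same hypothetical 8-clock rotation (pin n's slot at clock % 8 == n % 8), 2 clocks per instruction, that only PINGETZ and PINSETY wait for a slot while PINACK stays at 2 clocks, that REP adds no per-iteration overhead, and that skipped if_c instructions still cost their 2 clocks. The `run` helper and these costs are illustrative assumptions, not measured behavior.

```python
# Rough timing trace of the polling echo loop under assumed costs:
# 8-clock slot rotation, 2 clocks per instruction, and only PINGETZ/PINSETY
# stalling for their pin's slot. Skipped if_c instructions are assumed to
# cost the same 2 clocks, so the trace is identical with or without an event.
ROTATION = 8
INSTR_CLOCKS = 2
SLOTTED = {"PINGETZ", "PINSETY"}

ECHO_LOOP = [("PINGETZ", 32), ("PINSETY", 34), ("PINACK", 32)]


def run(program, iterations, start_clock=0):
    """Return ([(op, pin, stall), ...], clock after the last instruction)."""
    clock = start_clock
    trace = []
    for _ in range(iterations):
        for op, pin in program:
            stall = (pin % ROTATION - clock) % ROTATION if op in SLOTTED else 0
            trace.append((op, pin, stall))
            clock += stall + INSTR_CLOCKS
    return trace, clock


if __name__ == "__main__":
    trace, end = run(ECHO_LOOP, iterations=3)
    for op, pin, stall in trace:
        print(f"{op:8} #{pin}  stall {stall}")
    print("clocks for 3 iterations:", end)
```

In this model the PINSETY to pin 34 never stalls, because it issues 2 clocks after pin 32's slot, which is exactly pin 34's slot. Once the loop settles, each pass takes 8 clocks, so RX would be polled once per rotation.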
How about another example... fast cog-to-cog data sharing of multiple 32-bit values. On the sender's side:
[code]
send_data
        PINSETY t0, #32                 ' write 32 bits to smart pin (triggers edge event in receiver)
        ' grab next data into t0
        WAITEDG                         ' wait for falling edge (receiver ACKed)
        PINSETY t0, #32                 ' write t0 to smart pin (waits 6 clocks)
        ' next three instructions set next t0
        PINSETY t0, #32                 ' write t0 to smart pin (waits 0 clocks due to alignment)
        ' next three instructions set next t0
        PINSETY t0, #32                 ' write t0 to smart pin (waits 0 clocks due to alignment)
        ' repeat as necessary...
[/code]
On the receiver's side:
[code]
get_data
        WAITEDG                         ' wait for other cog to write to smart pin
        PINACK  #32                     ' ack (still 2 clocks)
        PINGETZ t0, #32                 ' grab the incoming data (waits 6 clocks)
        ' next three instructions put t0 somewhere...
        PINGETZ t0, #32                 ' grab next incoming data (waits 0 clocks due to alignment)
        ' next three instructions put t0 somewhere...
        ' repeat as necessary...
[/code]
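The "waits 6 clocks, then 0 clocks" pattern in these bursts can be explored with a small Python model. It assumes the same hypothetical 8-clock rotation with pin 32's slot at clock % 8 == 0, 2 clocks per instruction (including each transfer), and `fillers` ordinary instructions between consecutive transfers. `burst_stalls` is an illustrative name, and the same arithmetic applies to the receiver's PINGETZ burst.

```python
# Stall before each transfer in a PINSETY (or PINGETZ) burst, assuming an
# 8-clock slot rotation, 2 clocks per instruction, and pin n's slot at
# clock % 8 == n % 8. All of these are assumptions for illustration.
ROTATION = 8
INSTR_CLOCKS = 2


def burst_stalls(first_issue_clock, fillers, transfers, pin=32):
    """Return the stall before each of `transfers` accesses to `pin` when
    `fillers` ordinary instructions sit between consecutive accesses."""
    clock = first_issue_clock
    stalls = []
    for _ in range(transfers):
        stall = (pin % ROTATION - clock) % ROTATION
        stalls.append(stall)
        # Advance past the stall, the access itself, then the filler instructions.
        clock += stall + INSTR_CLOCKS + fillers * INSTR_CLOCKS
    return stalls


if __name__ == "__main__":
    # Issuing the first transfer 2 clocks past the slot reproduces the
    # "waits 6 clocks" case; compare 2, 3 and 4 filler instructions.
    for fillers in (2, 3, 4):
        print(fillers, "fillers:", burst_stalls(2, fillers, transfers=4))
```

Under these assumptions, three 2-clock fillers make each round exactly 8 clocks, so every later access lands on its slot with no stall; two fillers leave a 2-clock stall each time, and a fourth filler costs 6 clocks of stall per transfer. That is the kind of hand-tuning the hub already rewards.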
Anyhow, I think you get the idea. Just as with the hub, you can always safely write your code for the worst-case timing. But, in those few spots where latency is critical, you still have the ability to tune your code.