Faster than hub
A million years later, a sort-of accidental capability of P1 is still the salvation of many of my projects. Today's pinch goes something like this: a cog has only 25 instructions (1.25 µs) in which to fetch a five-bit value from another cog, do a local table look-up, do a quick calculation, clock the results out serially, and finish a waitcnt (for sync purposes).
There is absolutely no time to wait for a variable-length rdlong to execute. Grabbing the 5-bit word from port A, written there by another cog, is the only way forward. I LOVE that hack.
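For anyone who wants the arithmetic behind that, here is a rough back-of-envelope sketch in Python. The numbers are assumptions on my part: an 80 MHz system clock (which is what 25 instructions in 1.25 µs implies at 4 clocks each), a hub access cost of 8 to 23 clocks for rdlong as I remember it from the P1 manual, and 4 clocks for a plain port read such as mov x, ina. The names are made up for illustration.

```python
# Back-of-envelope timing budget for a P1 cog loop.
# All constants are assumptions, not measurements.
CLOCK_HZ = 80_000_000      # assumed 80 MHz system clock
CLOCKS_PER_INSTR = 4       # typical P1 cog instruction
HUB_MIN_CLOCKS = 8         # rdlong best case (recalled from the P1 manual)
HUB_MAX_CLOCKS = 23        # rdlong worst case (recalled from the P1 manual)
PORT_READ_CLOCKS = 4       # e.g. mov x, ina


def clocks_to_ns(clocks, clock_hz=CLOCK_HZ):
    """Convert a clock count to nanoseconds at the given system clock."""
    return clocks * 1_000_000_000 / clock_hz


def instr_slots(clocks):
    """Express a clock count in units of ordinary 4-clock instructions."""
    return clocks / CLOCKS_PER_INSTR


budget_clocks = 25 * CLOCKS_PER_INSTR
hub_jitter_clocks = HUB_MAX_CLOCKS - HUB_MIN_CLOCKS

print("loop budget:    ", clocks_to_ns(budget_clocks), "ns")
print("rdlong best:    ", clocks_to_ns(HUB_MIN_CLOCKS), "ns")
print("rdlong worst:   ", clocks_to_ns(HUB_MAX_CLOCKS), "ns")
print("port read:      ", clocks_to_ns(PORT_READ_CLOCKS), "ns")
print("rdlong jitter:  ", instr_slots(hub_jitter_clocks), "instruction slots")
```

Under those assumptions the worst-case rdlong eats nearly six instruction slots out of 25, and the uncertainty alone is close to four, while the port read always costs exactly one.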
I was a stinker campaigning for something like that on P2, which Chip accommodated by letting neighboring LUTs talk. I hope I'm not the only one who has benefited from its inclusion.