Faster than hub

K2 · 2024-01-23 22:53

A million years later, a sort-of accidental capability of P1 is still the salvation of many of my projects. Today's pinch goes something like this: A cog has only 25 instructions (1.25 µs) with which to fetch a five-bit value from another cog, do a local table look-up, a quick calculation, clock the results out serially, and finish a waitcnt (for sync purposes).

There is absolutely no time to wait for a variable length rdlong to execute. Grabbing the 5-bit word from port A, written there by another cog, is the only way forward. I LOVE that hack.
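To put rough numbers on that budget, here is a small sketch of the arithmetic. It assumes an 80 MHz system clock and 4 clocks per ordinary instruction (which is what 25 instructions in 1.25 µs works out to), and the 8 to 23 clock range for a P1 hub instruction; the function names are made up for illustration.

```python
# Assumed P1 figures: 80 MHz clock, 4 clocks per ordinary instruction,
# 8..23 clocks for a hub instruction such as rdlong.
CLKFREQ_HZ = 80_000_000
CLOCKS_PER_INSTR = 4
RDLONG_MIN, RDLONG_MAX = 8, 23

def budget_us(instructions):
    """Time available for a given number of instruction slots, in microseconds."""
    return instructions * CLOCKS_PER_INSTR * 1_000_000 / CLKFREQ_HZ

def slots_left(total_slots, fetch_clocks):
    """Instruction slots remaining after spending fetch_clocks on the fetch."""
    return total_slots - fetch_clocks / CLOCKS_PER_INSTR

print(budget_us(25))               # 1.25
print(slots_left(25, 4))           # reading INA (one 4-clock mov): 24.0
print(slots_left(25, RDLONG_MIN))  # rdlong, best case: 23.0
print(slots_left(25, RDLONG_MAX))  # rdlong, worst case: 19.25
```

The slower fetch is only part of the problem: the rdlong cost can land anywhere in that range, so the rest of the loop cannot count on a fixed number of slots.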

I was a stinker campaigning for something like that on P2, which Chip accommodated by letting neighboring LUTs talk. I hope I'm not the only one that has benefited from its inclusion.

RossH · 2024-01-24 02:08

I always wanted port B implemented for much the same reason, even if it had no actual pins connected. If the P1 is ever "re-spun" I hope Parallax adds that.

Ross.

evanh · 2024-01-24 05:13

I believe Chip has said many times that the Prop1 would've been revised if it was possible. But the tools used were obsolete even back then. It was always a one-off design.

The only thing stopping the Prop2 getting a raft of siblings with added or reduced features is the cost of making each mask set. The existing Prop2 needs to sell more first.

macca · 2024-01-24 08:19

@K2 said:
A million years later, a sort-of accidental capability of P1 is still the salvation of many of my projects. Today's pinch goes something like this: A cog has only 25 instructions (1.25 µs) with which to fetch a five-bit value from another cog, do a local table look-up, a quick calculation, clock the results out serially, and finish a waitcnt (for sync purposes).

There is absolutely no time to wait for a variable length rdlong to execute. Grabbing the 5-bit word from port A, written there by another cog, is the only way forward. I LOVE that hack.

Hub access can be synchronized. The first access has a more or less random wait, but after that the timing of every subsequent access can be calculated. If I remember correctly, a hub access has no wait when there are 2 instructions between accesses, or 2 plus any multiple of 4 (6, 10, and so on).

rdlong <- random wait
ins1
ins2
rdlong <- no wait

or

rdlong <- random wait
ins1
ins2
ins3
ins4
ins5
ins6
rdlong <- no wait

The P2 is a bit more tricky with the weird hub access scheme, but I think something like that can be used.

waitcnt could be a problem since it may disrupt the synchronization.

Maybe you already tested that and found it doesn't work for your needs; this is just a reminder.
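The rdlong spacing described above, and the waitcnt concern, can be checked with a simplified timing model. This is only a sketch: it assumes a cog's hub window comes around every 16 clocks, that an rdlong costs 8 clocks plus however long it waits for the window, and that ordinary instructions take 4 clocks. The function names and the `phase` parameter are invented for the example.

```python
HUB_PERIOD = 16   # assumed clocks between one cog's hub windows on P1
HUB_MIN = 8       # assumed clocks for an rdlong that hits its window exactly
INSTR = 4         # assumed clocks for an ordinary instruction

def rdlong_clocks(start, phase=0):
    """Cost of an rdlong issued at clock `start` (simplified model)."""
    return HUB_MIN + (phase - start) % HUB_PERIOD

def wait_after_gap(first_start, n_between, phase=0):
    """Extra wait for a second rdlong issued n_between instructions after the first."""
    t = first_start + rdlong_clocks(first_start, phase)
    t += n_between * INSTR
    return rdlong_clocks(t, phase) - HUB_MIN

def costs_in_paced_loop(period, iterations, first_start=0, phase=0):
    """rdlong cost on each pass when the loop restarts every `period` clocks."""
    return [rdlong_clocks(first_start + k * period, phase)
            for k in range(iterations)]

print([wait_after_gap(0, n) for n in range(7)])  # [8, 4, 0, 12, 8, 4, 0]
print(costs_in_paced_loop(100, 4))               # [8, 20, 16, 12]
print(costs_in_paced_loop(96, 4))                # [8, 8, 8, 8]
```

Under these assumptions, gaps of 2 or 6 instructions give a zero wait regardless of where the first rdlong landed. A loop paced by waitcnt to 100 clocks (25 instructions) is not a multiple of 16, so in this model the rdlong cost changes from pass to pass, whereas a 96-clock loop would stay in step with the hub.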

evanh · 2024-01-24 13:08

On the subject of chip making progress, I recently read some info, on Wikipedia, that I wasn't expecting. ASML being the sole manufacturer of EUV lithographic equipment for chip making has actually come from fundamental research in the USA. There were only two licenses granted from DOE/Congress joint signing after the conclusion of the research back in the 1990s. One was to Intel, which doesn't seem to have developed it; the other went to a small outfit called Silicon Valley Group. SVG was later bought out by ASML.

https://en.wikipedia.org/wiki/Extreme_ultraviolet_lithography

What the article didn't say was whether any special signing off was required for the license/knowledge to be transferred to Europe. I guess there is a lot of sharing of fundamental research between the two anyway.
