Is LUT sharing between adjacent cogs very important?

Phil Pilgrim (PhiPi) · 2016-05-19 04:24

My gut tells me that, starting from reset it won't matter. But once things get fragmented, preferring singleton cogs with already-allocated odd/even mates is definitely the way to go. (I can't believe I'm still abetting this discussion. I should be flogged.)

-Phil

jmg · 2016-05-19 04:26

cgracey wrote: »

Good thinking. What I got out of that is that when a single cog is requested, an unused cog within an even/odd pair in which the other cog is in use is preferred over an unused cog whose even/odd neighbor is also unused. This will tend to leave unused even/odd pairs available for double-cog requests. Neat! Phil, would it really work?

That checking-two-deep makes good sense, I guess it has fairly low logic cost ?

cgracey · 2016-05-19 04:26

Phil Pilgrim (PhiPi) wrote: »

My gut tells me that, starting from reset it won't matter. But once things get fragmented, preferring singleton cogs with already-allocated odd/even mates is definitely the way to go. (I can't believe I'm still abetting this discussion. I should be flogged.)

-Phil

I'm glad you're so excited about things.

Phil Pilgrim (PhiPi) · 2016-05-19 04:45

cgracey wrote:

I'm glad you're so excited about things.

There's a scene near the end of Kurt Vonnegut's Player Piano that I'll never forget. The Luddites have destroyed the robots and all they represent, including an automated orange juice vending machine in a station somewhere. One of them, fresh from victory, looks at the orange juice machine and says, "I think I can make it work again."

It's a sickness, really.

-Phil

cgracey · 2016-05-19 04:53

Phil Pilgrim (PhiPi) wrote: »

cgracey wrote:

I'm glad you're so excited about things.

There's a scene near the end of Kurt Vonnegut's Player Piano that I'll never forget. The Luddites have destroyed the robots and all they represent, including an automated orange juice vending machine in a station somewhere. One of them, fresh from victory, looks at the orange juice machine and says, "I think I can make it work again."

It's a sickness, really.

-Phil

That's really funny.

I don't want this whole thing to drag on any longer, but there are just a few minor things that need to be put right before it's done.

Seairth pointed out that the hub r/w events are now antiquated by ATN (attention). They can go away and save a nice chunk of logic, while freeing up two events. I hope to get all the current changes done tonight.

evanh · 2016-05-19 04:53

Democracy?

cgracey · 2016-05-19 04:58

evanh wrote: »

Democracy?

Yes. With all of you guys and your ideas, there's always a lot to consider.

jmg · 2016-05-19 04:59

cgracey wrote: »

Seairth pointed out that the hub r/w events are now antiquated by ATN (attention). They can go away and save a nice chunk of logic, while freeing up two events. I hope to get all the current changes done tonight.

The ATN also needed a 'who called' information field, to make it more usable.

cgracey · 2016-05-19 05:00

jmg wrote: »

cgracey wrote: »

Seairth pointed out that the hub r/w events are now antiquated by ATN (attention). They can go away and save a nice chunk of logic, while freeing up two events. I hope to get all the current changes done tonight.

The ATN also needed a 'who called' information field, to make it more usable.

Yes, I believe Seairth has made a nice proposal about how to handle that.

Electrodude · 2016-05-19 05:06

Since there are now two extra events, can you add one that fires every time a cog stops? This would be useful for situations such as webservers with a master cog that spawns lots of cogs with jobs and that wants to be woken up whenever new cogs become available to start. This event should fire on every cogstop, and not just on cogstops when all cogs were previously running but now cogs are available, so that it can be used when it is not desirable for the jobserver cog to use all possible cogs, and so that watchdog cogs can use it as a hint that something is going wrong.

And, if there are any extra event slots left over when the chip is otherwise finished, please fill them with more timers.

jmg · 2016-05-19 05:11

Electrodude wrote: »

Since there are now two extra events, can you add one that fires every time a cog stops?

Wouldn't the COG doing the STOP fire an event just before it stops ?
Rather than more events, maybe this needs another small bit field to that 'Who called' group, with a simple 'What' Sticky OR.
Would 4 bits be enough ? - allows 4 independent boolean signals, or 1 of 16 ?

cgracey · 2016-05-19 05:37

RDL/WRL sensors and events are gone now. This is going to save some logic. And documentation.

cgracey · 2016-05-19 08:07

I hopped up COGNEW (COGINIT with D[4] set) to be able to launch one or two cogs.

To launch 1 cog, D[4] is set and D[0] is clear. It gives priority to cogs whose odd/even partners are already enabled, preserving unused odd/even pairs for future 2-cog COGNEWs, when possible.

To launch 2 cogs, D[4] is set and D[0] is set. If an unused odd/even pair of cogs is found, it starts them both up with the same parameters.
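The allocation rule described in this post can be sketched as a small Python model. This is only an illustration, not the actual hardware logic: the function name `cognew`, the `in_use` list, and the lowest-number-first scan order are all assumptions made for the example.

```python
def cognew(in_use, pair=False):
    """Model of the COGNEW allocation rule described in the post.

    in_use: list of booleans, one per cog (True = running); its length
            is assumed to be even. The list is updated in place.
    pair:   False models D[0] clear (1 cog), True models D[0] set (2 cogs).
    Returns the list of cog numbers started, or [] if nothing fits.
    Scanning from the lowest cog number up is an assumption.
    """
    n = len(in_use)
    if pair:
        # A 2-cog request needs an even/odd pair with both cogs unused.
        for even in range(0, n, 2):
            if not in_use[even] and not in_use[even + 1]:
                in_use[even] = in_use[even + 1] = True
                return [even, even + 1]
        return []
    # 1-cog request, first pass: prefer an unused cog whose even/odd
    # partner (cog number with the LSB flipped) is already running.
    for cog in range(n):
        if not in_use[cog] and in_use[cog ^ 1]:
            in_use[cog] = True
            return [cog]
    # Second pass: fall back to any unused cog, breaking up a free pair.
    for cog in range(n):
        if not in_use[cog]:
            in_use[cog] = True
            return [cog]
    return []
```

With only cog 2 running, a single-cog request in this model picks cog 3 rather than cog 0, which leaves the 0/1 pair intact for a later 2-cog request.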

Compiling is done, so I'll test it now...

jmg · 2016-05-19 08:18

cgracey wrote: »

I hopped up COGNEW (COGINIT with D[4] set) to be able to launch one or two cogs.

To launch 1 cog, D[4] is set and D[0] is clear. It gives priority to cogs whose odd/even partners are already enabled, preserving unused odd/even pairs for future 2-cog COGNEWs, when possible.

To launch 2 cogs, D[4] is set and D[0] is set. If an unused odd/even pair of cogs is found, it starts them both up with the same parameters.

Sounds good

cgracey wrote: »

Compiling is done, so I'll test it now...

Fingers crossed...

What settings are needed for this to Compile now ? (ie all/some Pin Cells?)
What is the Logic used in A9 ? Compile Times ?

cgracey · 2016-05-19 08:19

jmg wrote: »

What settings are needed for this to Compile now ? (ie all/some Pin Cells?)
What is the Logic used in A9 ? Compile Times ?

I'm only compiling two cogs and eight smart pins, just to keep the compile time short. The cog-allocation circuitry doesn't know those cogs are missing, though, so COGNEW can be tested.

jmg · 2016-05-19 08:25

cgracey wrote: »

smart pins, just to keep the compile time short. The cog-allocation circuitry doesn't know those cogs are missing, though, so COGNEW can be tested.

Makes sense of course

How will the shared LUT write be managed - can it be 'live' via either a flag, or an address bit, or must it be some preset-mode-bit.

If it is a preset mode bit, what happens during interrupts ?

Peter Jakacki · 2016-05-19 08:32

Now that the Gracey Look Up Table is being tested I think the P2 is shaping up to be the best chip we've never had.

cgracey · 2016-05-19 08:37

jmg wrote: »

How will the shared LUT write be managed - can it be 'live' via either a flag, or an address bit, or must it be some preset-mode-bit.

If it is a preset mode bit, what happens during interrupts ?

I was just working that out in order to implement it.

I'm thinking three bits are needed:

0xx = don't allow LUT writes from other cog (default)
1xx = allow LUT writes from other cog

x00 = local writes only affect local LUT (default)
x01 = local writes only affect other LUT (if enabled in other cog)
x1x = local writes affect both local LUT and other LUT (if enabled in other cog)
x11 = (what can we use this mode for?)

On second thought, we don't need that top bit, as applications must behave themselves and not rely on partial stop-gap hardware measures to prevent mayhem.

macca · 2016-05-19 08:44

cgracey wrote: »

To launch 2 cogs, D[4] is set and D[0] is set. If an unused odd/even pair of cogs is found, it starts them both up with the same parameters.

You mean that they both load the same code ? If so, correct me if I'm wrong but doesn't look very good to me for the intended application.

cgracey · 2016-05-19 08:48

macca wrote: »

cgracey wrote: »

To launch 2 cogs, D[4] is set and D[0] is set. If an unused odd/even pair of cogs is found, it starts them both up with the same parameters.

You mean that they both load the same code ? If so, correct me if I'm wrong but doesn't look very good to me for the intended application.

That single program will be written for both cogs. The first thing it does is a COGID to check the LSB of the cog number. The even cog branches one way and the odd cog goes another. There are no hardware facilities for pointing to two different programs, so the program must figure it out.
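As a rough illustration of that branch, here is a Python sketch of one shared entry point that splits on the LSB of the cog number. The names `shared_entry`, `even_task`, and `odd_task` are hypothetical; on the chip this would be a COGID followed by a conditional jump.

```python
def shared_entry(cog_id, even_task, odd_task):
    """Single program image run by both cogs of a pair.

    cog_id stands in for the value COGID would return. The LSB selects
    the role: even cogs run even_task, odd cogs run odd_task.
    """
    if cog_id & 1:
        return odd_task(cog_id)
    return even_task(cog_id)
```

For example, `shared_entry(4, producer, consumer)` would run `producer` and `shared_entry(5, producer, consumer)` would run `consumer`, given two such functions.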

cgracey · 2016-05-19 08:55

Maybe this is all we need for modes:

0 = other cog's LUT writes are ignored (default)
1 = other cog's LUT writes are permitted

This way, all cogs send their LUT writes to their companion cog, but the companion cog must permit them. This keeps both LUTs the same, as writes occur.
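A minimal Python model of this one-bit scheme, assuming a 512-long LUT ($000..$1FF) per cog. The class and attribute names (`Cog`, `permit_partner_writes`, `write_lut`) are made up for the sketch and do not correspond to real instructions.

```python
LUT_SIZE = 512  # assumed LUT size in longs ($000..$1FF)

class Cog:
    def __init__(self):
        self.lut = [0] * LUT_SIZE
        # Mode bit: 0 = other cog's LUT writes are ignored (default),
        #           1 = other cog's LUT writes are permitted.
        self.permit_partner_writes = False
        self.partner = None

    def write_lut(self, addr, value):
        # Every write lands in the local LUT...
        self.lut[addr] = value
        # ...and is also offered to the companion cog, which takes it
        # only if it has set its own permit bit.
        if self.partner is not None and self.partner.permit_partner_writes:
            self.partner.lut[addr] = value

def make_pair():
    """Wire up an even/odd pair of cogs as companions."""
    even, odd = Cog(), Cog()
    even.partner, odd.partner = odd, even
    return even, odd
```

In this model the permit bit is per receiver, so one cog can mirror its writes into its companion without accepting any writes back.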

Is there much value in being able to control whether you write to your own LUT, the other LUT, or both? It makes for more memory, but then the juggling becomes more complex.

mindrobots · 2016-05-19 09:10

If you keep two control bits, then the 01 setting above allows COG to use its LUT as private space and LUT+1 as a place to pass data to COG+1. Seems like a useful mode. No?

jmg · 2016-05-19 09:15

cgracey wrote: »

Is there much value in being able to control whether you write to your own LUT, the other LUT, or both? It makes for more memory, but then the juggling becomes more complex.

Yes, that is why the suggestion to make this Address-bit based removes the juggling.

How does the method above cope with an interrupt into code that expects other-way operation ?

mindrobots · 2016-05-19 09:15

Spare mode - 11
Is there any time when you'd want the data automatically inverted? Normal data written to your LUT, inverted data written to LUT+1? Sprite manipulation? (Have no clue, just don't want to leave any modes unused!)

Heater. · 2016-05-19 09:17

macca,

You mean that they both load the same code?

Yep. But it's not a problem see below:

If so, correct me if I'm wrong but doesn't look very good to me for the intended application.

Here is what you do.

1) Start two COGS with the same code.
2) Have that code, running in both COGs, check its COGID for odd/even.
3) One of them can now reload itself with an entire COGS worth of code with COGINIT.
4) The other just gets on and runs the code it was loaded with.

Or of course one could start two COGs with a tiny little bit of code that checks odd/even COGID and then they both reload themselves with the real COG codes you want to run. The addresses of the codes to run would be passed in as PAR parameters.

Heck, this could be generalized and built into Spin as a dual COGSTART feature.

Edit: The above assumes we are talking about PASM. Where you want to fill both COGS with as much of your own PASM as possible. If you are just starting two COGs to run SPIN this reloading is not required, they both run the same Spin engine. But then Spin is as slow as hell so you probably would not be into this LUT sharing and dual COG starting anyway.

Hey, it's a bit like doing a fork() in Unix. A single program calls fork(), the system then starts another process to run exactly the same code. But the code knows if it is a parent or child so it branches and runs appropriate parts of itself.

cgracey · 2016-05-19 09:18

jmg wrote: »

cgracey wrote: »

Is there much value in being able to control whether you write to your own LUT, the other LUT, or both? It makes for more memory, but then the juggling becomes more complex.

Yes, that is why the suggestion to make this Address-bit based removes the juggling.

How does the method above cope with an interrupt into code that expects other-way operation ?

Can you give an example of what you are thinking about here?

cgracey · 2016-05-19 09:19

mindrobots wrote: »

If you keep two control bits, then the 01 setting above allows COG to use its LUT as private space and LUT+1 as a place to pass data to COG+1. Seems like a useful mode. No?

LUT+1? Do you mean higher addresses than $000..$1FF.

cgracey · 2016-05-19 09:24

I can see a problem during an interrupt where there's a possibility that the LUT-write mode is not what you want. You really want to avoid that, altogether.

Maybe we simply need a 5-bit value that tells how many contiguous 16-long blocks are shared, starting from $000 in the LUT. There's some flexibility, but without a need to toggle contexts, possibly even in interrupt code. It needs to be set-and-forget, I think. That would make life easiest.
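The arithmetic behind the 5-bit idea can be sketched in Python. This assumes the value is a plain count of 16-long blocks (0..31); under that reading the shared region tops out at 31 * 16 = 496 longs, so the last 16 longs of a 512-long LUT would always stay private. The function names are hypothetical.

```python
BLOCK_LONGS = 16   # block size named in the post
MAX_BLOCKS = 31    # largest value a 5-bit field can hold

def shared_limit(block_count):
    """First LUT address that is NOT shared, for a given 5-bit setting.

    Assumes the field is a straight block count starting from $000.
    """
    if not 0 <= block_count <= MAX_BLOCKS:
        raise ValueError("block_count must fit in 5 bits")
    return block_count * BLOCK_LONGS

def is_shared(addr, block_count):
    """True if a write to addr would also go to the companion LUT."""
    return 0 <= addr < shared_limit(block_count)
```

Because the setting depends only on the address, code interrupted mid-stream and the interrupt handler see the same behaviour without saving or restoring a mode.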

Rayman · 2016-05-19 10:00

With 5 bit value don't need any control bits anymore?

jmg · 2016-05-19 10:03

cgracey wrote: »

jmg wrote: »

How does the method above cope with an interrupt into code that expects other-way operation ?

Can you give an example of what you are thinking about here?

Where you have a table like this:
0xx = don't allow LUT writes from other cog (default)
1xx = allow LUT writes from other cog

x00 = local writes only affect local LUT (default)
x01 = local writes only affect other LUT (if enabled in other cog)
x1x = local writes affect both local LUT and other LUT (if enabled in other cog)
x11 = (what can we use this mode for?)

If some code is set one way (LUT allow), and then interrupt vectors to code that is expecting the other way, does the interrupt code have to read/save/change/restore those bits for any LUT writing ?

cgracey wrote: »

Is there much value in being able to control whether you write to your own LUT, the other LUT, or both?

more on this :
Write-both, may have some uses :
That can be polled, in case the Signaling is used for something else.
ie pass params on a write-both basis, with one as a flags bitset.
then poll that flags location until it flips, and you can then read the result passed back.
that could help memory management, where params to/from are common, and a location signals 'done'.

or,
if COG1 is very short on time, it can write both, and COG2 flips its local value and polls.
I think this limits to 16,8b params, but it means COG1 can avoid any flags.
A New value arriving is clear to COG2.
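One way to picture the first write-both scheme is the Python model below: parameters and a flags long are written to both LUTs, and each cog polls only its own copy. The slot addresses, flag values, and the doubling "work" are placeholders invented for the example.

```python
FLAGS, PARAM, RESULT = 0, 1, 2   # hypothetical LUT addresses
IDLE, REQUEST, DONE = 0, 1, 2    # hypothetical flag values

class PairedLuts:
    """Two LUTs where every write goes to both (write-both mode)."""
    def __init__(self, size=512):
        self.lut = [[IDLE] * size, [IDLE] * size]

    def write_both(self, addr, value):
        self.lut[0][addr] = value
        self.lut[1][addr] = value

def cog1_post(luts, param):
    # COG1 passes a parameter, then raises the request flag last.
    luts.write_both(PARAM, param)
    luts.write_both(FLAGS, REQUEST)

def cog2_poll(luts):
    # COG2 polls its own copy of the flags location.
    if luts.lut[1][FLAGS] != REQUEST:
        return False
    result = luts.lut[1][PARAM] * 2      # stand-in for the real work
    luts.write_both(RESULT, result)
    luts.write_both(FLAGS, DONE)         # signals 'done' to COG1
    return True

def cog1_collect(luts):
    # COG1 polls its own copy until the flag flips to DONE.
    if luts.lut[0][FLAGS] != DONE:
        return None
    return luts.lut[0][RESULT]
```

Writing the flag after the data is what lets the polling side trust the parameter once it sees the flag change.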
