Is LUT sharing between adjacent cogs very important?

cgracey · 2016-05-13 10:30

Seairth wrote: »

evanh wrote: »

I like the edge detection. That saves an instruction at the sending end. Enough to make the new "Attention" facilities redundant even?

Hardly! The "attention" facilities will still be faster when that's all you need.

Also, with the transponder mode, you will need to call PINACK to reset IN each time. The "attention" is likely an auto-reset, so no additional ACK instruction needed.

This is true. There's a lot of overlap in how to do things in the Prop2.

evanh · 2016-05-13 10:32

Cool, very flexible.

Everything done then.

cgracey · 2016-05-13 10:35

Seairth wrote: »

cgracey wrote: »

evanh wrote: »

cgracey wrote: »

... When someone does a 'SETPINX data,pin' on one of those, it updates the Z register to what was written and raises IN, alerting any interested parties that there is new data ready to be grabbed via PINGETZ. If the recipients are waiting in a WAITEDG for that pin's IN, the whole timing will be deterministic.

I like the edge detection. That saves an instruction at the sending end. Enough to make the new "Attention" facilities redundant even?

Perhaps, sort of. The 'COGATN mask16' instruction can set the attention flag in any number of cogs at once, without using up their edge detectors. Maybe what is needed, instead, is simply a second edge event. That would be universally useful.

Maybe some more people could comment on this. I'm too tired to trust my own judgment at the moment.

Can you provide the instruction set for COGATN?

The cog that wants to get others' attention just does 'COGATN mask16'. It's a 2-clock instruction that generates a 1-clock strobe to all other cogs in the mask. Each cog has an ATN event that will register the strobe. There's no data involved. The cogs have to know why they are being strobed for attention.

The 'SETPINX byte,pin' instruction only takes 4 clocks and sends a byte.
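
As a rough illustration of the data-node behaviour described above, here is a toy Python model. It is only a sketch: the class and method names are hypothetical, the width is passed in rather than derived from the smart pin mode, and it assumes that reading Z does not clear IN (an explicit PINACK does, as mentioned earlier in the thread).

```python
class DataNode:
    """Toy model of one smart pin used as a data node (hypothetical names)."""

    def __init__(self, width_bits=8):
        self.width_bits = width_bits  # 8, 16 or 32 for byte/word/long nodes
        self.z = 0                    # Z register: last value written
        self.in_flag = False          # IN: raised when new data has arrived

    def write(self, data):
        """Models 'SETPINX data,pin': update Z and raise IN."""
        self.z = data & ((1 << self.width_bits) - 1)
        self.in_flag = True

    def read(self):
        """Models PINGETZ: return Z. Assumed not to clear IN."""
        return self.z

    def ack(self):
        """Models PINACK: lower IN so the next write makes a fresh edge."""
        self.in_flag = False


node = DataNode(width_bits=8)
node.write(0x1A5)        # only the low byte fits in a byte node
received = node.read()   # a receiver that saw IN rise grabs the data
still_raised = node.in_flag
node.ack()
```

A receiver would wait for IN to rise (WAITEDG in the thread), read Z, then acknowledge before the next message.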

ozpropdev · 2016-05-13 10:39

evanh wrote: »

ozpropdev wrote: »

Also once a pin is configured in a smartpin mode the only way to read its pin state is by another smartpin that's within range.

As JMG has pointed out more than once, pins get firmly mapped for board layout. That's a necessity. Are you suggesting we should change Cog allocation to all Cogs must be firm mapped as well? Ie: Eliminate COGNEW altogether.

PS: I'm actually okay with this but it is a significant change that we should be clearly aware of.

??? woah! I don't think I suggested anything like ELIMINATING anything at all.
My point was simply that mapping pins also now has a new variable in the mix, that being smartpin input selector range.

Seairth · 2016-05-13 10:40

cgracey wrote: »

Seairth wrote: »

cgracey wrote: »

evanh wrote: »

cgracey wrote: »

... When someone does a 'SETPINX data,pin' on one of those, it updates the Z register to what was written and raises IN, alerting any interested parties that there is new data ready to be grabbed via PINGETZ. If the recipients are waiting in a WAITEDG for that pin's IN, the whole timing will be deterministic.

I like the edge detection. That saves an instruction at the sending end. Enough to make the new "Attention" facilities redundant even?

Perhaps, sort of. The 'COGATN mask16' instruction can set the attention flag in any number of cogs at once, without using up their edge detectors. Maybe what is needed, instead, is simply a second edge event. That would be universally useful.

Maybe some more people could comment on this. I'm too tired to trust my own judgment at the moment.

Can you provide the instruction set for COGATN?

The cog that wants to get others' attention just does 'COGATN mask16'. It's a 2-clock instruction that generates a 1-clock strobe to all other cogs in the mask. Each cog has an ATN event that will register the strobe. There's no data involved. The cogs have to know why they are being strobed for attention.

The 'SETPINX byte,pin' instruction only takes 4 clocks and sends a byte.

So the incoming cog won't know who strobed it? Hmm...

cgracey · 2016-05-13 10:42

ozpropdev wrote: »

evanh wrote: »

ozpropdev wrote: »

Also once a pin is configured in a smartpin mode the only way to read its pin state is by another smartpin that's within range.

As JMG has pointed out more than once, pins get firmly mapped for board layout. That's a necessity. Are you suggesting we should change Cog allocation to all Cogs must be firm mapped as well? Ie: Eliminate COGNEW altogether.

PS: I'm actually okay with this but it is a significant change that we should be clearly aware of.

??? woah! I don't think I suggested anything like ELIMINATING anything at all.
My point was simply that mapping pins also now has a new variable in the mix, that being smartpin input selector range.

An object that uses multiple pins would probably stipulate that they be in a row, if they are bound by smart pins.

cgracey · 2016-05-13 10:43

Seairth wrote: »

cgracey wrote: »

Seairth wrote: »

cgracey wrote: »

evanh wrote: »

cgracey wrote: »

... When someone does a 'SETPINX data,pin' on one of those, it updates the Z register to what was written and raises IN, alerting any interested parties that there is new data ready to be grabbed via PINGETZ. If the recipients are waiting in a WAITEDG for that pin's IN, the whole timing will be deterministic.

I like the edge detection. That saves an instruction at the sending end. Enough to make the new "Attention" facilities redundant even?

Perhaps, sort of. The 'COGATN mask16' instruction can set the attention flag in any number of cogs at once, without using up their edge detectors. Maybe what is needed, instead, is simply a second edge event. That would be universally useful.

Maybe some more people could comment on this. I'm too tired to trust my own judgment at the moment.

Can you provide the instruction set for COGATN?

The cog that wants to get others' attention just does 'COGATN mask16'. It's a 2-clock instruction that generates a 1-clock strobe to all other cogs in the mask. Each cog has an ATN event that will register the strobe. There's no data involved. The cogs have to know why they are being strobed for attention.

The 'SETPINX byte,pin' instruction only takes 4 clocks and sends a byte.

So the incoming cog won't know who strobed it? Hmm...

If there's a possibility that multiple cogs might be strobing it asynchronously, you've invented a time bomb that will probably blow up within 1ms.

Rayman · 2016-05-13 10:44

I like the data via smartpin mode.
I assume it doesn't take much logic to add that.

Only down side there is that it uses a pin.
Can that pin be used for anything else?

Can it be an output only pin maybe?

I think you said it couldn't be a DAC output pin too, right?

BTW: I hope these are the very last things that get added.

cgracey · 2016-05-13 10:45

Rayman wrote: »

I like the data via smartpin mode.
I assume it doesn't take much logic to add that.

Only down side there is that it uses a pin.
Can that pin be used for anything else?

I think you said it couldn't be a DAC output pin too, right?

Yes. It just can't be a DAC.

evanh · 2016-05-13 10:46

ozpropdev wrote: »

My point was simply that mapping pins also now has a new variable in the mix, that being smartpin input selector range.

Well, I guess requiring a pin pair to physically be on neighbouring pins could be called a restriction. I don't think it's one that is going to be a gotcha.

EDIT: Point is, advocating for LUT sharing (Cluso) is advocating for elimination of COGNEW as an instruction. That needs to be clearly understood.

ozpropdev · 2016-05-13 10:49

cgracey wrote: »

Tonight I made smart pin modes %00001..%00011, when the DAC is not enabled, into byte/word/long data nodes. When someone does a 'SETPINX data,pin' on one of those, it updates the Z register to what was written and raises IN, alerting any interested parties that there is new data ready to be grabbed via PINGETZ. If the recipients are waiting in a WAITEDG for that pin's IN, the whole timing will be deterministic. This isn't LUT mind meld, but it gets messages across quickly.

Sounds good.
I assume this replaces the WAITATN mechanism?

Edit: Wow! A lot of posts happened while I was typing this one.

evanh · 2016-05-13 10:55

cgracey wrote: »

Yes. It just can't be a DAC.

Neat, wow, I hadn't noticed that. There is SETPINX and PINSETX.

cgracey · 2016-05-13 10:58

evanh wrote: »

cgracey wrote: »

Yes. It just can't be a DAC.

Neat, wow, I hadn't noticed that. There is SETPINX and PINSETX.

My mistake. It's PINSETX. I actually changed them back to their old names because I think, long term, they are better:

WRPIN (was PINSETM)
WXPIN (was PINSETX)
WYPIN (was PINSETY)
RDPIN (was PINGETZ)

This will be in the next release. Sorry for any headaches here.

David Betz · 2016-05-13 11:00

cgracey wrote: »

Cluso99 wrote: »

My preference is to have shared LUT, whether or not we get another way using smart pins etc.

Since shared LUT has been done already, could we get a set of fpga code to test?

I put it all in and then took it all out, already. It could go back in, but I would have to rearrange the instructions again. Unless we can get around this cog allocation problem, I don't want to go through all that, again.

Tonight I made smart pin modes %00001..%00011, when the DAC is not enabled, into byte/word/long data nodes. When someone does a 'SETPINX data,pin' on one of those, it updates the Z register to what was written and raises IN, alerting any interested parties that there is new data ready to be grabbed via PINGETZ. If the recipients are waiting in a WAITEDG for that pin's IN, the whole timing will be deterministic. This isn't LUT mind meld, but it gets messages across quickly.

It seems pretty obvious to me that there is no way around the COG allocation problem if you want to avoid the problem of "fragmented COG space". If that is a show-stopper, then I guess we can just forget shared LUTs. BTW, I'm not arguing for or against shared LUTs. My comments were just an attempt to resolve the allocation issue. I don't personally have an application for LUT sharing.

evanh · 2016-05-13 11:04

Compile time static mapping isn't silly. It might even help with keeping order with attention strobing.

David Betz · 2016-05-13 11:09

evanh wrote: »

Compile time static mapping isn't silly. It might even help with keeping order with attention strobing.

Static mapping seems like it would be fine for 99% of applications as we envision them now but it seems a dynamic solution to the problem is being required and I don't think there is one that avoids fragmentation.

Rayman · 2016-05-13 11:11

I like WRPIN and RDPIN.
WXPIN and WYPIN don't seem particularly intuitive...

Don't know what might be better though.
WPINX WRPINX SETPINX...

T Chap · 2016-05-13 11:36

Chip, forgive my ignorance on this stuff, but what I am proposing allows the user to set a Lock flag. The user can define how the bus will work, so it is on the user to use it any way he wants. There is a 32-bit bus (OR'd) with flops. There is a DataAvailable flag, a DataRead flag, and a 4-bit ID. The user can then use software to send data from any number of cores to any other core. If desired, the user can send an ID and any core can scan for data on that ID. The user can Lock the bus, and software can check for a Lock set on that line before using the bus. Or the user can know that there will be no other cores using the bus and make it very fast by just loading up any amount of data up to 32 bits and setting the DataAvailable flag, and that's it. Any receiver can watch for DataAvailable, then read the data, then tell the sender it was received with the DataRead flag. The original sender can clear the DataRead flag and post more data. Very flexible, one or many users.
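
To make the handshake in this proposal concrete, here is a minimal Python sketch of such a bus. Everything in it is an assumption for illustration: the class and method names are hypothetical, and it assumes the receiver clears DataAvailable when it takes the data, which the proposal does not spell out.

```python
class SharedBus:
    """Toy model of the proposed 32-bit bus with flags (hypothetical names)."""

    def __init__(self):
        self.data = 0                # 32-bit payload
        self.ident = 0               # 4-bit ID chosen by the sender
        self.data_available = False  # set by the sender
        self.data_read = False       # set by the receiver
        self.lock_owner = None       # cog currently holding the Lock, if any

    def try_lock(self, cog):
        """Claim the bus if nobody holds it; return True if 'cog' owns it."""
        if self.lock_owner is None:
            self.lock_owner = cog
        return self.lock_owner == cog

    def unlock(self, cog):
        if self.lock_owner == cog:
            self.lock_owner = None

    def post(self, data, ident):
        """Sender: load data and ID, then raise DataAvailable."""
        if self.data_available:
            return False             # previous message not yet taken
        self.data = data & 0xFFFFFFFF
        self.ident = ident & 0xF
        self.data_read = False
        self.data_available = True
        return True

    def take(self, ident):
        """Receiver: if data for this ID is waiting, read it and flag DataRead."""
        if not self.data_available or self.ident != ident:
            return None
        self.data_available = False  # assumption: receiver clears the flag
        self.data_read = True
        return self.data

    def clear_read(self):
        """Sender: clear DataRead before posting more data."""
        self.data_read = False


bus = SharedBus()
got_lock = bus.try_lock(3)           # cog 3 claims the bus
posted = bus.post(0xCAFE, ident=2)   # send to whoever scans for ID 2
wrong_id = bus.take(ident=7)         # a receiver scanning another ID sees nothing
value = bus.take(ident=2)
```

In the lock-free fast path, a single known sender would skip `try_lock` entirely and just post.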

evanh · 2016-05-13 11:55

Mr T,
Already covered with a small tweak to add (extra DAC like, I think) a Smartpin mode for quick sharing. - http://forums.parallax.com/discussion/comment/1375498/#Comment_1375498

Seairth · 2016-05-13 11:56

cgracey wrote: »
Seairth wrote: »
So...
GETSTB [WZ]	' Gets own strobe state (WZ: Z = !strobe), sets to zero unless being strobed again at the same time
WAITSTB		' Waits for strobe signal, sets to zero unless being strobed again at the same time
STROBE D/#n	' D: lower 16 bits mask cogs to strobe
		' #n: index (0-15) of other single cog to strobe
* Assuming the strobe event can be handled by an interrupt
* What would it mean for D to be zero? Just a NOP?

Out of curiosity, how much circuitry would it be for each cog to receive 16 strobe lines? In other words, the strobe output from each cog fans out to their respective cogs. So each cog would see which other cog strobed them. In which case, the instructions would look like:
GETSTB D [WZ]	' Gets incoming strobes (WZ: 1 if no active strobes), sets to zero unless being strobed again at the same time
WAITSTB		' Waits for strobe signal (does not reset strobes, must use GETSTB)
STROBE D/#n	' D: lower 16 bits mask cogs to strobe
		' #n: index (0-15) of other single cog to strobe
It would take a little more hardware, but then I could see it taking a lot more software to deal with WHO strobed you.

Bringing this conversation back up, in light of COGATN. The point is that you have the option to figure out who strobed you. If you don't care, then all you are looking for is the event.

Edit: Okay. Now that I'm in front of a computer instead of typing on my phone, let me expand on this a bit...

Rewording the original instructions, you would have:

GETATN D [WZ]   ' Gets incoming strobes (WZ: 1 if no active strobes), sets strobe bits to zero unless being strobed again at the same time
WAITATN         ' Waits for strobe signal (does not reset strobes, must use GETATN)
COGATN D        ' D: lower 16 bits mask cogs to strobe (note: it would still be nice to have #n indicate a single cog instead of a mask)

Each cog would have its own set of 16 strobe bits, where each bit indicates which other cog had strobed it.

Here are a couple use cases for this:

* Producer/consumer: one cog is producing data, multiple cogs are consuming the data. When a consumer is ready for data, it strobes the producer. Whether multiple consumers are strobing at the same time, or not, the producer needs to know which consumer cog strobed it. The producer then transfers the next chunk of data (via hub, smart pin, etc.)

* Barrier/synchronization point: when multiple cogs need to synchronize, each starts by strobing the barrier cog with COGATN, then waits with WAITATN. The barrier cog waits until all participating cogs have strobed it, at which point it turns around and performs a single COGATN with a mask of the participating cogs to release them all at the same time.

* Guard: when a cog is responding to a strobe, it may need to make sure that the strobe came from one of a specific set of cogs. In this case, the strobed cog would simply use GETATN, then AND it with a mask.
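
The use cases above can be sketched with a small Python model of the proposed per-cog strobe bits. This is only an illustration under stated assumptions: `AtnFabric` and `run_barrier` are hypothetical names, the cogs are stepped sequentially rather than in parallel, and GETATN is assumed to clear the bits it returns.

```python
NUM_COGS = 16


class AtnFabric:
    """Each cog has 16 strobe bits; bit s set means cog s strobed it."""

    def __init__(self):
        self.strobes = [0] * NUM_COGS

    def cogatn(self, sender, mask16):
        """Models 'COGATN D': strobe every cog whose bit is set in mask16."""
        for target in range(NUM_COGS):
            if (mask16 >> target) & 1:
                self.strobes[target] |= 1 << sender

    def getatn(self, cog):
        """Models 'GETATN D': return pending strobe bits and clear them."""
        bits = self.strobes[cog]
        self.strobes[cog] = 0
        return bits


def run_barrier(fabric, barrier_cog, participants):
    """Barrier use case: collect a strobe from every participant, then release."""
    wanted = sum(1 << cog for cog in participants)
    for cog in participants:                  # each participant checks in
        fabric.cogatn(cog, 1 << barrier_cog)
    seen = 0
    while (seen & wanted) != wanted:          # barrier cog accumulates strobes
        seen |= fabric.getatn(barrier_cog)
    fabric.cogatn(barrier_cog, wanted)        # one COGATN releases them all
    return seen


fabric = AtnFabric()
seen = run_barrier(fabric, barrier_cog=0, participants=[1, 2, 3])
released = [fabric.getatn(cog) for cog in (1, 2, 3)]

# Guard use case: cog 4 only accepts strobes from cogs 5 and 6.
fabric.cogatn(5, 1 << 4)
fabric.cogatn(9, 1 << 4)
accepted = fabric.getatn(4) & ((1 << 5) | (1 << 6))
```

With a single shared ATN flag per cog, the `seen` mask would not exist, which is the difference being argued for here.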

Seairth · 2016-05-13 13:09

cgracey wrote: »

My mistake. It's PINSETX. I actually changed them back to their old names because I think, long term, they are better:

WRPIN (was PINSETM)
WXPIN (was PINSETX)
WYPIN (was PINSETY)
RDPIN (was PINGETZ)

Maybe:

PINMODE
WRPINX
WRPINY
RDPIN
PINACK

Cluso99 · 2016-05-13 14:48

cgracey wrote: »

ozpropdev wrote: »

evanh wrote: »

ozpropdev wrote: »

Also once a pin is configured in a smartpin mode the only way to read its pin state is by another smartpin that's within range.

As JMG has pointed out more than once, pins get firmly mapped for board layout. That's a necessity. Are you suggesting we should change Cog allocation to all Cogs must be firm mapped as well? Ie: Eliminate COGNEW altogether.

PS: I'm actually okay with this but it is a significant change that we should be clearly aware of.

??? woah! I don't think I suggested anything like ELIMINATING anything at all.
My point was simply that mapping pins also now has a new variable in the mix, that being smartpin input selector range.

An object that uses multiple pins would probably stipulate that they be in a row, if they are bound by smart pins.

While smart pins may be more likely to use a contiguous group of pins, it is certainly not true for all P1 objects.

Cluso99 · 2016-05-13 14:53

evanh wrote: »

ozpropdev wrote: »

My point was simply that mapping pins also now has a new variable in the mix, that being smartpin input selector range.

Well, I guess requiring a pin pair to physically be on neighbouring pins could be called a restriction. I don't think it's one that is going to be a gotcha.

EDIT: Point is, advocating for LUT sharing (Cluso) is advocating for elimination of COGNEW as an instruction. That needs to be clearly understood.

NO WAY AM I ADVOCATING GETTING RID OF COGNEW !!!

Heater. · 2016-05-13 15:05

Cluso99,

NO WAY AM I ADVOCATING GETTING RID OF COGNEW !!!

Indirectly you are doing exactly that.

The logic goes something like this:

1) Let's have a high speed cog-to-cog communication mechanism that requires adjacent COG IDs.

2) In the general case one cannot start up adjacent COGS with COGNEW. As Chip has pointed out.

3) That implies having to allocate all COGs statically at start up time. Or having a software COG allocation manager. That uses COGINIT and needs a global lock to work.

4) That implies COGNEW is not needed anymore.
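
Step 2 can be illustrated with a short Python sketch. It assumes COGNEW always starts the lowest-numbered free cog and that "adjacent" means IDs n and n+1 without wraparound; both are assumptions made for this example, and the function names are hypothetical.

```python
NUM_COGS = 16


def cognew(busy):
    """Start the lowest-numbered free cog (assumed policy); return its ID or None."""
    for cog in range(NUM_COGS):
        if not busy[cog]:
            busy[cog] = True
            return cog
    return None


def find_adjacent_pair(busy):
    """Return (n, n+1) if two neighbouring cogs are both free, else None."""
    for cog in range(NUM_COGS - 1):
        if not busy[cog] and not busy[cog + 1]:
            return cog, cog + 1
    return None


busy = [False] * NUM_COGS
started = [cognew(busy) for _ in range(NUM_COGS)]  # fill all 16 cogs

for cog in range(0, NUM_COGS, 2):                  # every even cog stops later
    busy[cog] = False

free_count = busy.count(False)                     # half the chip is free...
pair = find_adjacent_pair(busy)                    # ...but no two free cogs touch
```

Here eight cogs are free, yet a driver needing an adjacent pair cannot start. That is the fragmentation problem in its simplest form.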

Rayman · 2016-05-13 15:17

We need to remove the cogid command so that no code depends on actual cog#

David Betz · 2016-05-13 15:31

Heater. wrote: »

Cluso99,

NO WAY AM I ADVOCATING GETTING RID OF COGNEW !!!

Indirectly you are doing exactly that.

The logic goes something like this:

1) Let's have a high speed cog-to-cog communication mechanism that requires adjacent COG IDs.

2) In the general case one cannot start up adjacent COGS with COGNEW. As Chip has pointed out.

3) That implies having to allocate all COGs statically at start up time. Or having a software COG allocation manager. That uses COGINIT and needs a global lock to work.

4) That implies COGNEW is not needed anymore.

This is not true *IF* you're willing to allocate COG pairs during application initialization and not dynamically at runtime. The main program can do a COGINIT on COGs 1 and 2 at startup and then the remaining COGs can be either statically or dynamically allocated as needed at runtime. The only thing you give up with this approach is dynamically starting dual-COG drivers. That doesn't seem like a big restriction to me. However, it *IS* a restriction.

David Betz · 2016-05-13 15:56

Here's another ugly idea of how to solve the COGNEW2 problem. First, add an 8-bit register available through the hub that acts as a mask for COGNEW and COGNEW2. Each bit controls a pair of COGs. If bit 0 is set, it means COGs 0 and 1 can be allocated by COGNEW. If it is clear, that pair of COGs can only be allocated by COGNEW2. This means that you partition your 16 COGs at initialization time into ones that can be allocated in pairs and ones that can be allocated individually. Then you can go your merry way and dynamically allocate using COGNEW2 or COGNEW as required, with the understanding that you can't allocate any more pairs of COGs than you allowed with the mask setting, and the same with individual COGs. That means that an application can use any OBEX objects it wants as long as it counts up the number of single-COG objects and dual-COG objects and guarantees that the pool for each is of sufficient size. What this doesn't handle is the case where a pair of COGs might be used by a dual-COG driver one minute and two single-COG drivers the next minute.
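
A minimal Python sketch of this mask idea follows, only to show how the two pools stay separate. The class name, the lowest-free-first search order, and the `cogstop` helper are assumptions for illustration; COGNEW2 itself is the hypothetical instruction from the post.

```python
class MaskedCogAllocator:
    """16 cogs in 8 pairs; a set mask bit gives that pair to COGNEW,
    a clear bit reserves it for COGNEW2 (hypothetical model)."""

    def __init__(self, single_mask):
        self.single_mask = single_mask & 0xFF
        self.busy = [False] * 16

    def _pair_is_single(self, pair):
        return bool((self.single_mask >> pair) & 1)

    def cognew(self):
        """Allocate one cog from the single pool, or return None."""
        for cog in range(16):
            if self._pair_is_single(cog // 2) and not self.busy[cog]:
                self.busy[cog] = True
                return cog
        return None

    def cognew2(self):
        """Allocate an adjacent pair from the pair pool, or return None."""
        for pair in range(8):
            a, b = 2 * pair, 2 * pair + 1
            if (not self._pair_is_single(pair)
                    and not self.busy[a] and not self.busy[b]):
                self.busy[a] = self.busy[b] = True
                return a, b
        return None

    def cogstop(self, cog):
        self.busy[cog] = False


# Partition at initialization: cogs 0-7 for singles, cogs 8-15 for pairs.
alloc = MaskedCogAllocator(0b00001111)
singles = [alloc.cognew() for _ in range(9)]   # ninth request fails
pairs = [alloc.cognew2() for _ in range(5)]    # fifth request fails
```

Because single allocations never touch the pair pool, a freed pair is always available again as a pair. The cost is the limitation already noted: a pair cannot be lent out as two singles.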

Heater. · 2016-05-13 16:51

@Rayman,

We need to remove the cogid command so that no code depends on actual cog#

Problem with that is that one may want to restart the current COG running different code.

Actually I don't remember exactly now but I think my Zog ZPU instruction set emulator did exactly that so that it could replace a COG running Spin with a COG running ZPU instructions. I believe prop-gcc used that idea.

@David,

This is not true *IF* ... The only thing you give up with this approach is dynamically starting dual-COG drivers.

Notice how I always include the phrase "In the general case.." in my comments about this.

Philosophically some don't care about the general case, Cluso and others, but some do, Chip, Eric and myself.

I'm all for simplicity and the "Principle of least surprise".

Seairth · 2016-05-13 16:53

Heater. wrote: »

@Rayman,

We need to remove the cogid command so that no code depends on actual cog#

Problem with that is that one may want to restart the current COG running different code.

I think that was a joke...

David Betz · 2016-05-13 17:02

Heater. wrote: »

@Rayman,

We need to remove the cogid command so that no code depends on actual cog#

Problem with that is that one may want to restart the current COG running different code.

Actually I don't remember exactly now but I think my Zog ZPU instruction set emulator did exactly that so that it could replace a COG running Spin with a COG running ZPU instructions. I believe prop-gcc used that idea.

@David,

This is not true *IF* ... The only thing you give up with this approach is dynamically starting dual-COG drivers.

Notice how I always include the phrase "In the general case.." in my comments about this.

Philosophically some don't care about the general case, Cluso and others, but some do, Chip, Eric and myself.

I'm all for simplicity and the "Principle of least surprise".

Yup, there is no way to solve the general case. The dual-COG drivers would have to be handled as an exception.
