Is LUT sharing between adjacent cogs very important?

cgracey · 2016-05-13 04:00

T Chap wrote: »

Chip, I am curious why you can't put an OR bus connected to all cogs. One cog could load a byte/word/long onto the 32-bit bus, and any cog could read it immediately. Wouldn't that be faster than using a smartpin mode?

It would be fast, but there'd need to be some way to allocate that resource, like LOCKs have.

jmg · 2016-05-13 04:09

cgracey wrote: »

What if we had a smart-pin mode that had byte/word/long sub modes, so that a smart pin could just act as a data transponder. Anyone could write it (one at a time), but all could read it concurrently. For bytes, especially, this would be fast.

How does "all could read it concurrently" handle skew?
I can see that if they all sync first and then do a PINGET, they will get the same result, but what happens if a user gets the sync wrong and the PINGETs are done at varying phases?

cgracey · 2016-05-13 04:25

jmg wrote: »

cgracey wrote: »

What if we had a smart-pin mode that had byte/word/long sub modes, so that a smart pin could just act as a data transponder. Anyone could write it (one at a time), but all could read it concurrently. For bytes, especially, this would be fast.

How does "all could read it concurrently" handle skew?
I can see that if they all sync first and then do a PINGET, they will get the same result, but what happens if a user gets the sync wrong and the PINGETs are done at varying phases?

If all cogs waited for attention (WAITATN) and then did a PINGETZ, they would all get the same data. They would be in identical phase.

Smart pin modes %00001..%00011 are DAC modes. Those modes can be byte/word/long transponder modes when the low-level pin is not also configured for DAC operation. These are our three data node modes. The I/O pin, meanwhile, will act normally, but with its output enable controlled by bit 6 of the mode setting.
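A minimal Python model of the three proposed data-node modes may help pin down what "byte/word/long" means for a reader. It assumes %00001 = byte, %00010 = word and %00011 = long (the order is taken from "byte/word/long" and is not confirmed), and that bits above the node width are dropped; `is_data_node` and `node_value` are hypothetical names, not P2 instructions.

```python
# Hypothetical model of the proposed byte/word/long data-node modes.
# ASSUMPTION: %00001 = byte, %00010 = word, %00011 = long (order taken from
# "byte/word/long" in the post); the actual encoding may differ.
DATA_NODE_WIDTH = {0b00001: 8, 0b00010: 16, 0b00011: 32}


def is_data_node(mode: int, dac_enabled: bool) -> bool:
    """Modes %00001..%00011 act as data nodes only when the DAC is not enabled."""
    return mode in DATA_NODE_WIDTH and not dac_enabled


def node_value(mode: int, written: int) -> int:
    """Value a reader would see after `written` is stored in a node of this mode.

    ASSUMPTION: bits above the node width are simply dropped.
    """
    if mode not in DATA_NODE_WIDTH:
        raise ValueError(f"mode {mode:#07b} is not a data-node mode")
    return written & ((1 << DATA_NODE_WIDTH[mode]) - 1)


if __name__ == "__main__":
    for mode in sorted(DATA_NODE_WIDTH):
        print(f"{mode:#07b}: {node_value(mode, 0x12345678):#x}")
```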

jmg · 2016-05-13 04:39

cgracey wrote: »

If all cogs waited for attention (WAITATN) and then did a PINGETZ, they would all get the same data. They would be in identical phase.

What happens if a user, say, adds another line of code between the WAIT and the GET?

cgracey wrote: »

Smart pin modes %00001..%00011 are DAC modes.
Those modes can be byte/word/long transponder modes when the low-level pin is not also configured for DAC operation.
These are our three data node modes.
The I/O pin, meanwhile, will act normally, but with its output enable controlled by bit 6 of the mode setting.

Sounds promising.

Could the I/O also be any of the side-looking pin-cell inputs?
Or the streamer I/O?

What is the cost/practicality of moving the links above pin-space (e.g., pins 65..80)?

cgracey · 2016-05-13 05:01

jmg wrote: »

cgracey wrote: »

If all cogs waited for attention (WAITATN) and then did a PINGETZ, they would all get the same data. They would be in identical phase.

What happens if a user, say, adds another line of code between the WAIT and the GET?

cgracey wrote: »

Smart pin modes %00001..%00011 are DAC modes.
Those modes can be byte/word/long transponder modes when the low-level pin is not also configured for DAC operation.
These are our three data node modes.
The I/O pin, meanwhile, will act normally, but with its output enable controlled by bit 6 of the mode setting.

Sounds promising.

Could the I/O also be any of the side-looking pin-cell inputs?
Or the streamer I/O?

What is the cost/practicality of moving the links above pin-space (e.g., pins 65..80)?

Yes, on both counts.

There is practically no logic cost to have the existing smart pins perform these functions. Having higher-numbered pins do this would take a lot of additional logic.

evanh · 2016-05-13 05:06

jmg wrote: »

What is the cost/practicality of moving the links above pin-space (e.g., pins 65..80)?

There are plenty of Smartpins to go around, I'd think. Pins 59(?) to 63 will have dormant DACs.

jmg · 2016-05-13 05:13

evanh wrote: »

No, by normal, Chip means it is just an IN/OUT pin for the Cog.

I think the test is that the other uses of that pin do not need that pin's Pin Cell links.

That means side-looking mapping is OK (used for adjacent-pin IN), and so is the streamer.

cgracey wrote: »

There is practically no logic cost to have the existing smart pins perform these functions.

Sounds like a good place to start.

I like the sound of 'practically no logic cost'.

evanh · 2016-05-13 05:14

cgracey wrote: »

Yes, on both counts.

Oh, so the input A-B combination produces the IN state on initial power-up?

cgracey · 2016-05-13 05:19

evanh wrote: »

cgracey wrote: »

Yes, on both counts.

Oh, so the input A-B combination produces the IN state on initial power-up?

Pin modes are initialized to $0000001 on reset, which makes them act like simple I/O pins.

evanh · 2016-05-13 05:43

cgracey wrote: »

Pin modes are initialized to $0000001 on reset, which makes them act like simple I/O pins.

Okay, I didn't ask that very well at all. I'll try again... Actually, I just found a detail in the docs. The following is written after the A-B logic combinator details: "The resultant ‘A’ will drive the IN signal in non smart pin modes."

That mostly answers my question, but it's a little unclear whether "resultant 'A'" means the A-input selector or the logic combination result, given its placement in the documentation.

cgracey · 2016-05-13 05:48

evanh wrote: »

cgracey wrote: »

Pin modes are initialized to $0000001 on reset, which makes them act like simple I/O pins.

Okay, I didn't ask that very well at all. I'll try again... Actually, I just found a detail in the docs. The following is written after the A-B logic combinator details: "The resultant ‘A’ will drive the IN signal in non smart pin modes."

That mostly answers my question, but it's a little unclear whether "resultant 'A'" means the A-input selector or the logic combination result, given its placement in the documentation.

The logic combination result will drive IN. For FFF = %000, the IN from the pin is passed directly out.

Brian Fairchild · 2016-05-13 06:04

As others have said, isn't the problem with fragmented cog allocation only going to occur once you start dynamically loading and reloading objects at run-time? So what is needed is a mechanism to...

a) reserve a number of consecutive cogs for objects that need multiple adjacent COGs.
b) reserve COGs a specific number of 'slots' apart to make use of optimised HUB access timing.

In my head, that sounds like a couple of registers that are normally clear but can have bits set to make COGNEW skip those COGs when allocating, while COGNEWX ignores them.

In a system where clever COG allocation is going to be needed, the designer will need to set the registers. Normal single-COG objects simply use COGNEW as now, but multi-COG objects use COGNEWX.
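Brian's item (a) can be sketched as a toy allocator: a reservation mask that COGNEW honours and COGNEWX ignores. This is a hypothetical Python model, not a proposed hardware design; it assumes 16 cogs (matching the 'COGATN mask16' mentioned later in the thread) and lowest-numbered-first allocation, and it does not model item (b), the spaced-apart hub-timing reservation.

```python
# Toy model of the proposal: COGNEW skips reserved cogs, COGNEWX ignores the
# reservation and looks for a block of adjacent free cogs.
# ASSUMPTIONS: 16 cogs, lowest-numbered free cog is chosen first.
class CogAllocator:
    def __init__(self, num_cogs: int = 16):
        self.num_cogs = num_cogs
        self.running = 0   # bit n set => cog n is in use
        self.reserved = 0  # bit n set => cog n is held back from COGNEW

    def reserve(self, mask: int) -> None:
        """Set bits in the (hypothetical) reservation register."""
        self.reserved |= mask

    def cognew(self):
        """Start one cog: lowest cog that is neither running nor reserved."""
        for cog in range(self.num_cogs):
            bit = 1 << cog
            if not (self.running | self.reserved) & bit:
                self.running |= bit
                return cog
        return None  # no free, unreserved cog

    def cognewx(self, count: int):
        """Start `count` adjacent cogs, allowed to use reserved cogs."""
        if count < 1:
            raise ValueError("count must be at least 1")
        block_mask = (1 << count) - 1
        for first in range(self.num_cogs - count + 1):
            block = block_mask << first
            if not self.running & block:
                self.running |= block
                return first
        return None  # no run of `count` adjacent free cogs
```

Reserving cogs 4..7 up front keeps ordinary COGNEW calls from fragmenting that block, so a later COGNEWX(4) still finds four adjacent cogs.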

The dream of being able to simply use any object from OBEX2 without any restrictions is just that, a dream. EVERYTHING in life has restrictions, even simple things; I can't do cheese-on-toast in my toaster because the bread is vertical and the cheese would slide off. I accept that and move on.

Heater. · 2016-05-13 06:37

Brian,

The dream of being able to simply use any object from OBEX2 without any restrictions is just that, a dream.

Why do you say so?

Provided I have memory enough to use the object in my program, provided I have enough free COGs for it, and provided I can tell it what pins to use, I should be good to go.

I should not need to know anything more about an object.

Those constraints are kind of fundamental to any software and there is no way around it.

What else do I need to worry about?

Brian Fairchild · 2016-05-13 06:54

Heater. wrote: »

What else do I need to worry about?

Other than...

Heater. wrote: »

Provided I have memory enough...provided I have enough free COGs...provided I can tell it what pins to use...

...nothing at all

My point, probably not well made, is that Objects already come with some constraints. It would not be unreasonable for Objects that need a specific allocation for their COGs to have one more.

ozpropdev · 2016-05-13 06:57

OBEX2 objects will also have to caution users on potential smartpin reach issues.
Smartpins only have a reach of +/- 3 pins (e.g., pin 10 can't cooperate with pin 20).
Smartpin allocation will need management too.
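The +/- 3 pin reach can be expressed as a quick check that an object (or a build tool) might run before assigning a two-pin mode. This is an illustrative Python sketch; it assumes 64 pins numbered 0..63 with no wrap-around at either end, and `within_reach` / `reachable_pins` are hypothetical helpers.

```python
# Sketch of the +/-3 pin reach described above (illustration only).
# ASSUMPTIONS: 64 pins numbered 0..63, and no wrap-around at either end.
REACH = 3
NUM_PINS = 64


def within_reach(smart_pin: int, other_pin: int, reach: int = REACH) -> bool:
    """True if `smart_pin` could use `other_pin` as an A/B input."""
    return abs(smart_pin - other_pin) <= reach


def reachable_pins(smart_pin: int, reach: int = REACH, num_pins: int = NUM_PINS) -> list:
    """Other pins that `smart_pin` can see, clipped to the valid pin range."""
    lo = max(0, smart_pin - reach)
    hi = min(num_pins - 1, smart_pin + reach)
    return [p for p in range(lo, hi + 1) if p != smart_pin]
```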

evanh · 2016-05-13 07:07

ozpropdev wrote: »

OBEX2 objects will also have to caution users on potential smartpin reach issues.
Smartpins only have a reach of +/- 3 pins (e.g., pin 10 can't cooperate with pin 20).
Smartpin allocation will need management too.

I wouldn't call that cooperation. The input selectors can't read other Smartpin states. I think they can only see the raw pins.

evanh · 2016-05-13 07:09

Not that I have any experience.

Cluso99 · 2016-05-13 07:36

My preference is to have shared LUT, whether or not we get another way using smart pins etc.

Since shared LUT has been done already, could we get a set of FPGA code to test?

ozpropdev · 2016-05-13 07:48

evanh wrote: »

I think they can only see the raw pins.

Exactly.
For the A/B counter, quadrature-encoder, and synchronous TX/RX modes that require two pins, allocation is important.

Also, once a pin is configured in a smartpin mode, the only way to read its pin state is by another smartpin that's within range.

jmg · 2016-05-13 09:17

cgracey wrote: »

If all cogs waited for attention (WAITATN) and then did a PINGETZ, they would all get the same data. They would be in identical phase.

I think this could also be used for 'collected printf'-style debug reporting?

In this, a master COG would manage the PC link (USB or UART) and watch signals from multiple slaves, each of which would use an available pin channel to send data.

This would be a Wait on one of N signals, then select that Pin cell for GET.

For highest speed (lowest impact), one channel per COG would be used, but for the least resource use, a queued method could be OK.

If another pin has the Pin Cell connection, do the other slaves get a 'busy' flag so they can wait?
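The one-channel-per-COG variant of the collected-printf idea can be modelled in a few lines. This Python sketch is hypothetical: each `Channel` stands in for a pin-cell data node with a "new data" flag, a sender that finds the flag still set treats it as 'busy' and retries, and the master polls all channels in index order instead of the hardware "wait on one of N" described above.

```python
# Hypothetical model of one-channel-per-COG debug collection.
# Each Channel stands in for a pin-cell data node plus its 'new data' flag.
class Channel:
    def __init__(self):
        self.value = 0
        self.pending = False  # set by the slave, cleared by the master

    def send(self, value: int) -> bool:
        """Slave side: returns False ('busy') if the last message is uncollected."""
        if self.pending:
            return False
        self.value = value
        self.pending = True
        return True


def collect(channels):
    """Master side: take every pending message, lowest channel first."""
    out = []
    for index, ch in enumerate(channels):
        if ch.pending:
            out.append((index, ch.value))
            ch.pending = False  # acknowledge so that slave may send again
    return out
```

A queued variant would trade this per-COG channel for one shared node plus arbitration, which is where the 'busy' question becomes important.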

Cluso99 wrote: »

My preference is to have shared LUT, whether or not we get another way using smart pins etc.

Since shared LUT has been done already, could we get a set of FPGA code to test?

Yes, LUT sharing will have the highest bandwidth, and so complements the pin-cell connection, which is intermediate in speed but very general in use.

evanh · 2016-05-13 09:28

Chip,
Are RD/WRLUT 2-clock or 4-clock instructions?

evanh · 2016-05-13 09:37

ozpropdev wrote: »

Also, once a pin is configured in a smartpin mode, the only way to read its pin state is by another smartpin that's within range.

As JMG has pointed out more than once, pins get firmly mapped for board layout. That's a necessity. Are you suggesting we should change Cog allocation so that all Cogs must be firmly mapped as well? I.e., eliminate COGNEW altogether?

PS: I'm actually okay with this, but it is a significant change that we should be clearly aware of.

evanh · 2016-05-13 10:01

JMG also suggested the compile-time tools could manage it statically by default, just as they already handle static memory mapping. I.e., still no explicitly numbered Cogs. EDIT: Although I have no idea how this would be achieved.

cgracey · 2016-05-13 10:06

Cluso99 wrote: »

My preference is to have shared LUT, whether or not we get another way using smart pins etc.

Since shared LUT has been done already, could we get a set of FPGA code to test?

I already put it all in and then took it all out. It could go back in, but I would have to rearrange the instructions again. Unless we can get around this cog allocation problem, I don't want to go through all that again.

Tonight I made smart pin modes %00001..%00011, when the DAC is not enabled, into byte/word/long data nodes. When someone does a 'SETPINX data,pin' on one of those, it updates the Z register to what was written and raises IN, alerting any interested parties that there is new data ready to be grabbed via PINGETZ. If the recipients are waiting in a WAITEDG for that pin's IN, the whole timing will be deterministic. This isn't LUT mind meld, but it gets messages across quickly.
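The data-node handshake (one writer does SETPINX, every waiting reader wakes on the edge and does PINGETZ) can be modelled with a condition variable. This is a Python stand-in, not P2 code: `DataNode`, `setpinx` and `wait_edge_and_get` are hypothetical names, and instead of modelling IN plus PINACK the sketch counts writes, so a change in the count plays the role of the rising edge.

```python
import threading


class DataNode:
    """Software stand-in for the proposed smart-pin data node (not real P2 code)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._z = 0    # models the Z register written by SETPINX
        self._seq = 0  # counts writes; a change models the edge on IN

    def setpinx(self, data: int) -> None:
        """Writer: store a long and signal every waiting reader at once."""
        with self._cond:
            self._z = data & 0xFFFFFFFF
            self._seq += 1
            self._cond.notify_all()

    def wait_edge_and_get(self, last_seq: int):
        """Reader: block until a write newer than last_seq, then return (seq, Z)."""
        with self._cond:
            self._cond.wait_for(lambda: self._seq != last_seq)
            return self._seq, self._z


if __name__ == "__main__":
    node = DataNode()
    readers = [threading.Thread(target=lambda: print(node.wait_edge_and_get(0)))
               for _ in range(3)]
    for t in readers:
        t.start()
    node.setpinx(0xDEADBEEF)
    for t in readers:
        t.join()
```

Every reader gets the same value, even one that starts waiting late, as long as no second write lands first; that second-write case is the skew jmg asked about earlier in the thread.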

cgracey · 2016-05-13 10:07

evanh wrote: »

Chip,
Are RD/WRLUT 2-clock or 4-clock instructions?

RDLUT is 3 clocks, while WRLUT is 2 clocks.
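Those timings allow a rough instruction-count estimate for a LUT handoff. The Python sketch below is arithmetic only and assumes a shared-LUT transfer would cost the same as local WRLUT/RDLUT; it ignores loop overhead and any signalling between the two cogs, and the function names are hypothetical.

```python
# Instruction timings quoted above; everything else here is illustrative.
WRLUT_CLOCKS = 2
RDLUT_CLOCKS = 3


def lut_handoff_clocks(n_longs: int) -> tuple:
    """(sender_clocks, receiver_clocks) spent in WRLUT/RDLUT alone for n_longs.

    ASSUMPTION: shared-LUT access costs the same as local access; loop
    overhead and inter-cog signalling are ignored.
    """
    return n_longs * WRLUT_CLOCKS, n_longs * RDLUT_CLOCKS


def receiver_bytes_per_clock() -> float:
    """Upper bound on the receive rate: one 4-byte long per RDLUT."""
    return 4 / RDLUT_CLOCKS
```

For a 256-long buffer that is 512 clocks of WRLUT on the sending side and 768 clocks of RDLUT on the receiving side, so the reader, not the writer, sets the pace.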

evanh · 2016-05-13 10:11

Thanks Chip, that's faster than I was expecting. It could definitely be effective.

evanh · 2016-05-13 10:16

cgracey wrote: »

... When someone does a 'SETPINX data,pin' on one of those, it updates the Z register to what was written and raises IN, alerting any interested parties that there is new data ready to be grabbed via PINGETZ. If the recipients are waiting in a WAITEDG for that pin's IN, the whole timing will be deterministic.

I like the edge detection. That saves an instruction at the sending end. Enough to make the new "Attention" facilities redundant even?

cgracey · 2016-05-13 10:25

evanh wrote: »

cgracey wrote: »

... When someone does a 'SETPINX data,pin' on one of those, it updates the Z register to what was written and raises IN, alerting any interested parties that there is new data ready to be grabbed via PINGETZ. If the recipients are waiting in a WAITEDG for that pin's IN, the whole timing will be deterministic.

I like the edge detection. That saves an instruction at the sending end. Enough to make the new "Attention" facilities redundant even?

Perhaps, sort of. The 'COGATN mask16' instruction can set the attention flag in any number of cogs at once, without using up their edge detectors. Maybe what is needed, instead, is simply a second edge event. That would be universally useful.

Maybe some more people could comment on this. I'm too tired to trust my own judgment at the moment.
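To show how one 'COGATN mask16' could alert several cogs at once without touching their edge detectors, here is a toy Python model. It assumes 16 cogs, that bit n of the mask selects cog n, and that a cog's flag clears when that cog consumes it; `Attention`, `take` and `pending` are hypothetical names, and `take` is a non-blocking stand-in for WAITATN.

```python
# Toy model of 'COGATN mask16' (not real P2 code).
# ASSUMPTIONS: 16 cogs, bit n of the mask selects cog n, and a cog's flag
# clears when that cog consumes it.
NUM_COGS = 16


class Attention:
    def __init__(self):
        self.flags = 0  # bit n set => cog n has a pending attention request

    def cogatn(self, mask16: int) -> None:
        """Raise attention in every cog whose bit is set, in one operation."""
        if not 0 <= mask16 <= 0xFFFF:
            raise ValueError("mask must fit in 16 bits")
        self.flags |= mask16

    def take(self, cog: int) -> bool:
        """Non-blocking stand-in for WAITATN: report and clear this cog's flag."""
        bit = 1 << cog
        if self.flags & bit:
            self.flags &= ~bit
            return True
        return False

    def pending(self) -> list:
        """Cogs that still have an unconsumed attention request."""
        return [cog for cog in range(NUM_COGS) if (self.flags >> cog) & 1]
```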

Seairth · 2016-05-13 10:28

evanh wrote: »

I like the edge detection. That saves an instruction at the sending end. Enough to make the new "Attention" facilities redundant even?

Hardly! The "attention" facilities will still be faster when that's all you need.

Also, with the transponder mode, you will need to call PINACK to reset IN each time. The "attention" flag is likely auto-reset, so no additional ACK instruction would be needed.

Seairth · 2016-05-13 10:30

cgracey wrote: »

evanh wrote: »

cgracey wrote: »

... When someone does a 'SETPINX data,pin' on one of those, it updates the Z register to what was written and raises IN, alerting any interested parties that there is new data ready to be grabbed via PINGETZ. If the recipients are waiting in a WAITEDG for that pin's IN, the whole timing will be deterministic.

I like the edge detection. That saves an instruction at the sending end. Enough to make the new "Attention" facilities redundant even?

Perhaps, sort of. The 'COGATN mask16' instruction can set the attention flag in any number of cogs at once, without using up their edge detectors. Maybe what is needed, instead, is simply a second edge event. That would be universally useful.

Maybe some more people could comment on this. I'm too tired to trust my own judgment at the moment.

Can you provide the instruction set for COGATN?
