Is LUT sharing between adjacent cogs very important?

Cluso99 · 2016-05-17 11:10

Chip,
Nice to see that an image (or images) is on its way for testing.

We can at least begin testing while LUT chatter continues. The new LUT access is somewhat akin to the random slot access discussed a couple of years ago.

evanh · 2016-05-17 11:10

Cluso99 wrote: »

The LUT is almost an equivalent multi port hub ram.

That's a radical rethink that will be a whole other design. Worth exploring for a Prop3 proposal.

Heater. · 2016-05-17 11:22

K2,

It seems strange that Heater, the poster child for "A Boy and his Soldering Iron" should be so quick to emasculate the P2 architecture and instruction set in order to protect us from ourselves.

Thanks for the characterization. I like it.

Not sure what you mean though.

Rayman · 2016-05-17 11:26

I'm trying hard not to hope that we can't compile with any new features...

evanh · 2016-05-17 11:48

Heater. wrote: »

K2,

It seems strange that Heater, the poster child for "A Boy and his Soldering Iron" should be so quick to emasculate the P2 architecture and instruction set in order to protect us from ourselves.

Thanks for the characterization. I like it.

Not sure what you mean though.

I didn't understand it either. JMG seems to be the one leading the charge on accident prevention.

evanh · 2016-05-17 11:57

It's 2.5 hours now. Maybe Chip went to bed while waiting.

Ramon · 2016-05-17 13:53

cgracey wrote: »

In case this LUT-write matter is still unclear:

A conflict only exists when multiple cogs write the same LUT on the same clock cycle using WRLUTX. In this case, addresses and write data are each OR'd together, causing errant data and an errant address. By using WRLUTS, this problem can be completely avoided, since each cog's write output will only occur during its unique 1-of-16 timeslot, thereby singulating in time the various writes.

Someone will need to remind you that YOU were the one that raised the concern about post-sales support in case some OBEX objects overuse or misuse LUT sharing. So if this PDUIWS (please-don't-use-it-wrong-s.....) WRLUTX is OK for you, then it is OK for us too.

There is also another forum member who has said more than one hundred times that the wonderful part of the Propeller is that you can start a cog with any OBEX object written by another person and you will be completely sure that the Propeller will execute both cogs in peace and harmony without any fear of CPU hang, data corruption or loss of determinism. Is he still here? What does he think about the "LUT-write matter"?

David Betz · 2016-05-17 14:03

Ramon wrote: »

cgracey wrote: »

In case this LUT-write matter is still unclear:

A conflict only exists when multiple cogs write the same LUT on the same clock cycle using WRLUTX. In this case, addresses and write data are each OR'd together, causing errant data and an errant address. By using WRLUTS, this problem can be completely avoided, since each cog's write output will only occur during its unique 1-of-16 timeslot, thereby singulating in time the various writes.

Someone will need to remind you that YOU were the one that raised the concern about post-sales support in case some OBEX objects overuse or misuse LUT sharing. So if this PDUIWS (please-don't-use-it-wrong-s.....) WRLUTX is OK for you, then it is OK for us too.

There is also another forum member who has said more than one hundred times that the wonderful part of the Propeller is that you can start a cog with any OBEX object written by another person and you will be completely sure that the Propeller will execute both cogs in peace and harmony without any fear of CPU hang, data corruption or loss of determinism. Is he still here? What does he think about the "LUT-write matter"?

I don't understand the concern here. If an OBEX object makes use of this feature, it will be entirely restricted to interaction between COGs started by the object. I don't imagine that an object will publish its own LUT address where clients can write data unless it is essentially being used as a mailbox and the object is documented as only being usable by one client. Generally, LUT writing will be entirely used for internal communications within an object. This means that no one outside the object need worry about corruption as long as the object itself is correctly coded. Of course, there could be bugs in OBEX objects. One that would be particularly bad would be if an object sets the LUT write mask incorrectly so that it writes to some COG that it doesn't "own". Object authors would have to be particularly careful about that.
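As a rough illustration of the mask bug described above, here is a minimal Python sketch of a check an object author could run on paper or in a test harness. It assumes a 16-bit write mask with one bit per COG; the function name `stray_targets` and the example masks are hypothetical and not part of any Parallax tool.

```python
NUM_COGS = 16  # the thread describes 16 cogs; one mask bit per cog is assumed


def stray_targets(write_mask, owned_mask):
    """Return the cog numbers selected by write_mask that the object does not own."""
    stray = write_mask & ~owned_mask & ((1 << NUM_COGS) - 1)
    return [cog for cog in range(NUM_COGS) if (stray >> cog) & 1]


# Hypothetical object that started cogs 4 and 5.
owned = 0b0000_0000_0011_0000

good_mask = 0b0000_0000_0010_0000  # writes only to cog 5, which the object owns
bad_mask = 0b0000_0000_0110_0000   # also selects cog 6, which it does not own

good_result = stray_targets(good_mask, owned)
bad_result = stray_targets(bad_mask, owned)
```

An empty list means every targeted COG belongs to the object; any entry is a COG whose LUT would be written by mistake.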

rjo__ · 2016-05-17 14:19

Every time I start my browser, there are at least 50 new comments in this thread.

I give up... I'm going to wait for Chip's implementation and then worry about it.

I hope Brian is following this.

ozpropdev · 2016-05-17 14:31

rjo__ wrote: »

Every time I start my browser, there are at least 50 new comments in this thread.

I give up... I'm going to wait for Chip's implementation and then worry about it.

I hope Brian is following this.

Always lurking....

Heater. · 2016-05-17 14:45

Ramon,

There is also another forum member who has said more than one hundred times that the wonderful part of the Propeller is that you can start a cog with any OBEX object written by another person and you will be completely sure that the Propeller will execute both cogs in peace and harmony without any fear of CPU hang, data corruption or loss of determinism. Is he still here?

That could well be me. I'm still here.

What does he think about the "LUT-write matter"?

My point has always been that on the Propeller you can take an object I have written and drop it into your project very easily. Because it runs in its own COG and we don't use interrupts, it is guaranteed not to mess up the timing of anything else in your project. This only requires that you have a COG or COGs free to run it and enough memory for it to fit. And perhaps pins free for its use. These are physical constraints; we can't beat the laws of physics here.

With "LUT-write" my object can now use two or more COGs and have them communicate without going through HUB. Nothing has changed. You can still drop my object into your project as easily as before. It's all care free for you and life is good.

Note: This is not true of the old LUT shared between adjacent COGs idea.

Now of course, I may have multiple COGs writing to a single LUT. And I might have to make sure that does not end up in a race condition. None of this is of any concern to you, the user. My code is not going to be trampling on the LUTs of your COG. You have no need to know or care whether I used LUT-write or not. Not any more than you know or care whether I use counters in my code or not.

Heater. · 2016-05-17 14:52

David Betz,

Yeah, pesky bugs. An object author could screw up his use of the LUT mask and hit the LUT of a COG that is not part of his object.

I don't see this as any worse than making errors in pin masks or pointer arithmetic or using COGINIT.

User Name · 2016-05-17 15:02

Thank you for including WRLUTX, even though improperly used it would cause c

Phil Pilgrim (PhiPi) wrote: »

I think this will turn out to be a case like coginit vs. cognew, but much worse: "Yes, wrlutx is there, and you could use it, but please just don't." Anything whose misuse might cause problems will get misused and will cause problems. How many times have we had to correct programmers for their use of coginit? I think it's important to consider the customer-support consequences of every feature embodied in the P2. Features designed for the priesthood will get used by the laity, whether they fully understand them or not. And this one is particularly pernicious, since the bad consequences of misuse will occur only rarely, making them almost impossible to debug.

I say just get rid of it. Take the performance hit. That matters way less than the support issues.

If Parallax has identified the target of the P2 to be noobs and hobbyists, why have they spent all the time and money making it what it is? Just to tantalize the rest of us? Put the WRLUTX documentation at the back of the PASM portion of the manual and include appropriate warnings. A compiler could also generate warnings when it sees multiple cogs executing WRLUTX instructions.

Having programmed many microprocessors over many years, this is nothing. We are making a mountain out of a molehill. This forum is becoming a society of hand-wringers.

cgracey · 2016-05-17 15:47

evanh wrote: »

cgracey wrote: »

I just took out 8 more smart pins and restarted it.

Bugger, that's half of 'em! 400 x 32 = 12800 ALMs. I suspect ALMs are not the problem. Long run routes maybe? EDIT: The failed case is actually 400 x 24 = 9600 ALMs. Maybe that much is needed ... we'll find out soon enough I suppose. We're up to 90 minutes now ...

This might be the death of the feature right here.

Yeah, we will see. If it is a huge resource hog, it might not be worth keeping.

Rayman · 2016-05-17 15:48

I could live without it... Maybe we'll think of something simpler to do with the LUT's second port...

Maybe LUT could be tied to special mailbox HUB RAM addresses?
Whenever a cog writes to those HUB addresses, it also gets copied straight to LUT.
Then, you don't have to read from HUB, it's already in LUT?

David Betz · 2016-05-17 16:58

Heater. wrote: »

David Betz,

Yeah, pesky bugs. An object author could screw up his use of the LUT mask and hit the LUT of a COG that is not part of his object.

I don't see this as any worse than making errors in pin masks or pointer arithmetic or using COGINIT.

Exactly.

Phil Pilgrim (PhiPi) · 2016-05-17 17:10

Okay, as long as it's no worse than that.

What concerned me was the inherent sporadic nature of any bug manifestations, due to any misconfigured, but infrequent coincident writes. It'd be like tracking down the rare noise burst in an industrial control system. 'Been there: not fun.

-Phil

Marcus76 · 2016-05-17 17:40

You could more easily diagnose the problem by finding where your data is missing in the LUT. If you have a collision with a write to locations A and B, mangled data will end up in location C, but the data you meant to write will be missing from locations A and B.
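The diagnosis above can be sketched with a small Python model. It assumes the collision behaviour described earlier in the thread (addresses and data each OR'd together when two WRLUTX writes land on the same LUT in the same clock) and an assumed LUT depth of 512 longs; the helper names `wrlutx_cycle` and `find_missing` are hypothetical.

```python
LUT_SIZE = 512  # assumed LUT depth in longs, for illustration only


def wrlutx_cycle(lut, writes):
    """Apply every write that reaches one LUT in the same clock cycle.

    writes is a list of (address, data) pairs. With more than one pair,
    the addresses are OR'd together and the data values are OR'd together,
    so a single mangled long lands at a single mangled address.
    """
    if not writes:
        return
    addr = 0
    data = 0
    for a, d in writes:
        addr |= a
        data |= d
    lut[addr] = data


def find_missing(lut, intended):
    """Return the intended (address, data) pairs that are not present in the LUT."""
    return [(a, d) for a, d in intended if lut[a] != d]


lut = [0] * LUT_SIZE
# Two cogs write locations A = 0x011 and B = 0x024 in the same clock.
intended = [(0x011, 0x0000_00FF), (0x024, 0x0000_FF00)]
wrlutx_cycle(lut, intended)

location_c = 0x011 | 0x024          # 0x035, where the mangled long ends up
missing = find_missing(lut, intended)
```

In this model both intended writes show up as missing from A and B, while location C holds the OR of the two data values, which matches the symptom to look for.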

potatohead · 2016-05-17 17:47

There is a "my lut was written" event too. Mask errors can be found this way.

cgracey · 2016-05-17 18:13

I don't know why there is so much ongoing concern over this LUT write error stuff. For cogs to write each others' LUTs, they will have been programmed together, carefully, in a known-possibility context. Cogs aren't going to write others' LUTs outside of such a context any more than you are going to go up to a stranger on the street and stick your finger in his ear.

MJB · 2016-05-17 19:00

After my naive early morning question about the READ vs WRITE

I just have another one ...

Now that we have those wide busses all in place,
with the addressing as described ...
wouldn't adding a READ on top of the WRITE just be one single bus line per COG ????
i.e basically for free ???

giving us both:
- data flow/data driven programming (forward)
PLUS
- query based programming (backward)
e.g. debug
the option to 'assign' many LUTs to a COG if one COG needs more room and another COG does not need the LUT (of course all in collaboration and done by s.o. who knows what s/he does - with all the options for ab-use ;-) )

sorry for asking naive questions
- just trying to make sense of this great 'reality TV' ('the making of a new multicog MCU')
thanks Chip for sharing this unique process.

Heater. · 2016-05-17 19:14

MJB,

As it stands a WRITE can write to many other COGs LUTs at the same time. Specified by the LUT mask.

Doing a READ would end up with all those LUTs being returned in some mangled fashion.
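A small Python sketch may make the asymmetry clearer. The masked write follows the description above. The masked read is only an assumption: the thread says the result would be "mangled" without saying how, so the model below guesses at a simple OR of every selected LUT's output. The function names and the 512-long LUT depth are likewise hypothetical.

```python
NUM_COGS = 16   # cog count as discussed in the thread
LUT_SIZE = 512  # assumed LUT depth in longs


def write_masked(luts, mask, addr, data):
    """Write the same long into the LUT of every cog whose mask bit is set."""
    for cog in range(NUM_COGS):
        if (mask >> cog) & 1:
            luts[cog][addr] = data


def read_masked_naive(luts, mask, addr):
    """Assumed model of a masked read: the selected LUT outputs are OR'd together."""
    result = 0
    for cog in range(NUM_COGS):
        if (mask >> cog) & 1:
            result |= luts[cog][addr]
    return result


luts = [[0] * LUT_SIZE for _ in range(NUM_COGS)]

# One write reaches cogs 1 and 2 at once, which is well defined.
write_masked(luts, 0b0110, 0x10, 0xCAFE)

# The same two LUTs hold different data at another address.
luts[1][0x20] = 0x0F0F
luts[2][0x20] = 0xF0F0
mangled = read_masked_naive(luts, 0b0110, 0x20)
```

With a single bit set in the mask the read would return that one LUT's data; with two or more bits set, the value returned matches neither source.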

cgracey · 2016-05-17 19:45

MJB wrote: »

After my naive early morning question about the READ vs WRITE

I just have another one ...

Now that we have those wide busses all in place,
with the addressing as described ...
wouldn't adding a READ on top of the WRITE just be one single bus line per COG ????
i.e basically for free ???

giving us both:
- data flow/data driven programming (forward)
PLUS
- query based programming (backward)
e.g. debug
the option to 'assign' many LUTs to a COG if one COG needs more room and another COG does not need the LUT (of course all in collaboration and done by s.o. who knows what s/he does - with all the options for ab-use ;-) )

sorry for asking naive questions
- just trying to make sense of this great 'reality TV' ('the making of a new multicog MCU')
thanks Chip for sharing this unique process.

I'm sorry if I made anyone feel their questions were naïve.

To add read-back capability would take a 16x32-bit mux for each cog. It would increase what is there, already, by 80%. The reason for this is that in FPGAs and ASICs, there are typically no such things as tri-state or bidirectional buses.

I'm having a lot of trouble just getting the compiler to do the LUT-write sharing. It never finished the other day (bad sign), and last night it took 9.5 hours to route what only used 89% of the ALM's, coming in at a lousy Fmax of 60MHz. This indicates that we have a routing nightmare that will carry into the ASIC. I changed the fitter seed and am running it again. If this goes badly, this whole LUT-write sharing thing is coming out and we are moving forward without any kind of inter-cog LUT access.

Rayman · 2016-05-17 19:47

Still thinking of other things to do, in case can't compile...

Maybe can tie second LUT port to the fast smartpin transfer?

Perhaps rig it though after some setup data written by one cog to smartpin gets placed in LUT of another cog?

David Betz · 2016-05-17 19:56

How many ports does the LUT have? The video hardware can read from the LUT and you can execute instructions from the LUT. I guess that implies two read ports. You can also read and write the LUT using special instructions. Does that mean it has a third read port and a write port? And then, on top of all of that, it has an "extra" port we're trying to find a use for?

cgracey · 2016-05-17 19:57

Rayman wrote: »

Still thinking of other things to do, in case can't compile...

Maybe can tie second LUT port to the fast smartpin transfer?

Perhaps rig it though after some setup data written by one cog to smartpin gets placed in LUT of another cog?

There's no cheap way to do LUT sharing outside of the adjacent-cog topology, which is too problematic on the software side, with acknowledgment to all those who say it isn't.

The hub egg-beater is a very fast system which was designed to be THE conduit of choice. It just has high latency.

cgracey · 2016-05-17 20:03

David Betz wrote: »

How many ports does the LUT have? The video hardware can read from the LUT and you can execute instructions from the LUT. I guess that implies two read ports. You can also read and write the LUT using special instructions. Does that mean it has a third read port and a write port? And then, on top of all of that, it has an "extra" port we're trying to find a use for?

It has two read/write ports. The first port is used by the cog for RDLUT/WRLUT and LUT exec. The second port has been only used as a read port for the streamer. We could use that second port as an alternate read/write port for inter-cog LUT access, giving it priority over the streamer's read activity. You would not be using the streamer read if you were also using the inter-cog LUT access, so there's no problem here. The problem is that we can't route the inter-cog LUT sharing in a way that allows cogs to be independent of one another.

David Betz · 2016-05-17 20:05

cgracey wrote: »

David Betz wrote: »

How many ports does the LUT have? The video hardware can read from the LUT and you can execute instructions from the LUT. I guess that implies two read ports. You can also read and write the LUT using special instructions. Does that mean it has a third read port and a write port? And then, on top of all of that, it has an "extra" port we're trying to find a use for?

It has two read/write ports. The first port is used by the cog for RDLUT/WRLUT and LUT exec. The second port has been only used as a read port for the streamer. We could use that second port as an alternate read/write port for inter-cog LUT access, giving it priority over the streamer's read activity. You would not be using the streamer read if you were also using the inter-cog LUT access, so there's no problem here. The problem is that we can't route the inter-cog LUT sharing in a way that allows cogs to be independent of one another.

Thanks for the explanation.

jmg · 2016-05-17 20:08

cgracey wrote: »

The problem is that we can't route the inter-cog LUT sharing in a way that allows cogs to be independent of one another.

I'm not seeing that as a problem that cannot be managed ?

Seems one thing this Any-LUT exercise has shown, is everyone is actually fine with some form of management by users, no one expects using the P2 to be thinking-free.

Tubular · 2016-05-17 20:09

That ability to 'read back' own LUT isn't causing a circular issue, is it, Chip? Ie can each cog mux see the LUT buses from 15 rather than 16 cogs?
