Chip,
Nice to see that an image (or images) is on its way for testing.
We can at least begin testing while LUT chatter continues. The new LUT access is somewhat akin to the random slot access discussed a couple of years ago.
It seems strange that Heater, the poster child for "A Boy and his Soldering Iron" should be so quick to emasculate the P2 architecture and instruction set in order to protect us from ourselves.
Thanks for the characterization. I like it.
Not sure what you mean though.
I didn't understand it either. JMG seems to be the one leading the charge on accident prevention.
A conflict only exists when multiple cogs write the same LUT on the same clock cycle using WRLUTX. In this case, addresses and write data are each OR'd together, causing errant data and an errant address. By using WRLUTS, this problem can be completely avoided, since each cog's write output will only occur during its unique 1-of-16 timeslot, thereby singulating in time the various writes.
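To make the collision concrete, here is a minimal Python model of the behaviour described above. It is a sketch, not the actual hardware logic: the LUT depth of 512 longs, the function names, and the mapping of cog n to timeslot n are assumptions for illustration.

```python
NUM_COGS = 16
LUT_SIZE = 512  # assumed LUT depth in longs


def wrlutx_cycle(lut, writes):
    """Model one clock of unsynchronised WRLUTX writes to one LUT.

    All addresses are OR'd together and all data are OR'd together,
    so only a single (possibly wrong) write lands.
    """
    if not writes:
        return
    addr = 0
    data = 0
    for a, d in writes:
        addr |= a
        data |= d
    lut[addr % LUT_SIZE] = data


def wrluts_frame(lut, writes_by_cog):
    """Model one 16-slot frame of WRLUTS writes to one LUT.

    Cog n only drives the bus in slot n (assumed mapping), so at most
    one write reaches the LUT on any clock.
    """
    for slot in range(NUM_COGS):
        if slot in writes_by_cog:
            a, d = writes_by_cog[slot]
            lut[a] = d


# Two cogs target different addresses on the same clock.
lut_x = [0] * LUT_SIZE
wrlutx_cycle(lut_x, [(0x005, 0x0000_00F0), (0x00A, 0x0000_000F)])

# The same two writes, separated into each cog's own timeslot.
lut_s = [0] * LUT_SIZE
wrluts_frame(lut_s, {1: (0x005, 0x0000_00F0), 2: (0x00A, 0x0000_000F)})
```

With these inputs, the WRLUTX model leaves $005 and $00A untouched and puts $FF at $00F, while the WRLUTS model lands both writes intact.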
Someone will need to remind you that YOU were the one who raised the concern about post-sales support in case some OBEX objects overused or misused LUT sharing. So if this PDUIWS (please-don't-use-it-wrong-s.....) WRLUTX is OK for you, then it is OK for us too.
There is also another forum member who has said more than a hundred times that the wonderful part of the Propeller is that you can start a cog with any OBEX object written by another person and be completely sure that the Propeller will execute both cogs in peace and harmony, without any fear of CPU hang, data corruption or loss of determinism. Is he still here? What does he think about the "LUT-write matter"?
I don't understand the concern here. If an OBEX object makes use of this feature, it will be entirely restricted to interaction between COGs started by the object. I don't imagine that an object will publish its own LUT address where clients can write data unless it is essentially being used as a mailbox and the object is documented as only being usable by one client. Generally, LUT writing will be entirely used for internal communications within an object. This means that no one outside the object need worry about corruption as long as the object itself is correctly coded. Of course, there could be bugs in OBEX objects. One that would be particularly bad would be if an object sets the LUT write mask incorrectly so that it writes to some COG that it doesn't "own". Object authors would have to be particularly careful about that.
There is also another forum member who has said more than a hundred times that the wonderful part of the Propeller is that you can start a cog with any OBEX object written by another person and be completely sure that the Propeller will execute both cogs in peace and harmony, without any fear of CPU hang, data corruption or loss of determinism. Is he still here?
That could well be me. I'm still here.
What does he think about the "LUT-write matter"?
My point has always been that on the Propeller you can take an object I have written and drop it into your project very easily. Because it runs in its own COG and we don't use interrupts, it is guaranteed not to mess up the timing of anything else in your project. This only requires that you have a COG or COGs free to run it and memory enough for it to fit. And perhaps pins free for its use. These are physical constraints; we can't beat the laws of physics here.
With "LUT-write" my object can now use two or more COGs and have them communicate without going through HUB. Nothing has changed. You can still drop my object into your project as easily as before. It's all care free for you and life is good.
Note: This is not true of the old LUT shared between adjacent COGs idea.
Now of course, I may have multiple COGs writing to a single LUT. And I might have to make sure that does not end up in a race condition. None of this is of any concern to you, the user. My code is not going to be trampling on the LUTs of your COG. You have no need to know or care if I used LUT-write or not. Not any more than you need to know or care if I use counters in my code or not.
I think this will turn out to be a case like coginit vs. cognew, but much worse: "Yes, wrlutx is there, and you could use it, but please just don't." Anything whose misuse might cause problems will get misused and will cause problems. How many times have we had to correct programmers for their use of coginit? I think it's important to consider the customer-support consequences of every feature embodied in the P2. Features designed for the priesthood will get used by the laity, whether they fully understand them or not. And this one is particularly pernicious, since the bad consequences of misuse will occur only rarely, making them almost impossible to debug.
I say just get rid of it. Take the performance hit. That matters way less than the support issues.
If Parallax has identified the target of the P2 to be noobs and hobbyists, why have they spent all the time and money making it what it is? Just to tantalize the rest of us? Put the WRLUTX documentation at the back of the PASM portion of the manual and include appropriate warnings. A compiler could also generate warnings when it sees multiple cogs executing WRLUTX instructions.
Having programmed many microprocessors over many years, this is nothing. We are making a mountain out of a molehill. This forum is becoming a society of hand-wringers.
I just took out 8 more smart pins and restarted it.
Bugger, that's half of 'em! 400 x 32 = 12800 ALMs. I suspect ALMs are not the problem. Long run routes maybe? EDIT: The failed case is actually 400 x 24 = 9600 ALMs. Maybe that much is needed ... we'll find out soon enough I suppose. We're up to 90 minutes now ...
This might be the death of the feature right here.
Yeah, we will see. If it is a huge resource hog, it might not be worth keeping.
I could live without it... Maybe we'll think of something simpler to do with the LUT's second port...
Maybe LUT could be tied to special mailbox HUB RAM addresses?
Whenever a cog writes to those HUB addresses, it also gets copied straight to LUT.
Then, you don't have to read from HUB, it's already in LUT?
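A rough Python sketch of how that mailbox idea might behave, purely as an illustration. The `Hub` class, the `mailbox_base` window, long-indexed addressing, and mirroring into a single receiving cog's LUT are all assumptions, since the suggestion above doesn't pin any of that down.

```python
class Hub:
    """Toy hub RAM where one address window is mirrored into a LUT."""

    def __init__(self, size_longs, mailbox_base, lut):
        self.ram = [0] * size_longs
        self.mailbox_base = mailbox_base  # hypothetical start of the window
        self.lut = lut                    # LUT of the receiving cog

    def write_long(self, index, value):
        # A normal hub write always happens.
        self.ram[index] = value
        # If the address falls inside the mailbox window, the same value
        # is also copied straight into the matching LUT location.
        offset = index - self.mailbox_base
        if 0 <= offset < len(self.lut):
            self.lut[offset] = value


receiver_lut = [0] * 512
hub = Hub(size_longs=1024, mailbox_base=512, lut=receiver_lut)

hub.write_long(512 + 3, 0xCAFE)  # inside the window: mirrored to LUT[3]
hub.write_long(10, 0x1234)       # outside the window: hub RAM only
```

The receiving cog would then read `receiver_lut[3]` locally instead of going back to the hub for it.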
What concerned me was the inherent sporadic nature of any bug manifestations, due to any misconfigured, but infrequent coincident writes. It'd be like tracking down the rare noise burst in an industrial control system. 'Been there: not fun.
You could more easily diagnose the problem by finding where your data is missing in the LUT. If you have a collision with a write to locations A and B, mangled data will end up in location C, but the data you meant to write will be missing from locations A and B.
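That diagnosis could be automated along these lines. This is only a sketch: `audit_lut`, the snapshot lists, and the example addresses and values are hypothetical, and the mangled location is built by OR'ing as described for colliding writes.

```python
def audit_lut(before, after, intended):
    """Compare LUT snapshots against the writes the cogs meant to make.

    intended maps address -> value. Returns (missing, stray):
    missing = intended addresses that do not hold their value,
    stray   = addresses that changed but were never targeted.
    """
    missing = sorted(a for a, v in intended.items() if after[a] != v)
    stray = sorted(
        a for a in range(len(after))
        if after[a] != before[a] and a not in intended
    )
    return missing, stray


# Hypothetical collision: writes to A=$011 and B=$022 merged into C=$033.
before = [0] * 512
after = list(before)
after[0x011 | 0x022] = 0x1234 | 0x4321

missing, stray = audit_lut(before, after, {0x011: 0x1234, 0x022: 0x4321})
```

Here `missing` comes back as the two intended addresses and `stray` as the single mangled location, which is the A, B, C signature described above.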
I don't know why there is so much ongoing concern over this LUT write error stuff. For cogs to write each others' LUTs, they will have been programmed together, carefully, in a known-possibility context. Cogs aren't going to write others' LUTs outside of such a context any more than you are going to go up to a stranger on the street and stick your finger in his ear.
After my naive early morning question about the READ vs WRITE
I just have another one ...
Now that we have those wide busses all in place,
with the addressing as described ...
wouldn't adding a READ on top of the WRITE just be one single bus line per COG ????
i.e basically for free ???
giving us both:
- data flow/data driven programming (forward)
PLUS
- query based programming (backward)
e.g. debug
the option to 'assign' many LUTs to a COG if one COG needs more room and another COG does not need the LUT (of course all in collaboration and done by s.o. who knows what s/he does - with all the options for ab-use ;-) )
sorry for asking naive questions
- just trying to make sense of this great 'reality TV' ('the making of a new multicog MCU')
thanks Chip for sharing this unique process.
I'm sorry if I made anyone feel their questions were naïve.
To add read-back capability would take a 16x32-bit mux for each cog. It would increase what is there, already, by 80%. The reason for this is that in FPGAs and ASICs, there are typically no such things as tri-state or bidirectional buses.
I'm having a lot of trouble just getting the compiler to do the LUT-write sharing. It never finished the other day (bad sign), and last night it took 9.5 hours to route what only used 89% of the ALMs, coming in at a lousy Fmax of 60MHz. This indicates that we have a routing nightmare that will carry into the ASIC. I changed the fitter seed and am running it again. If this goes badly, this whole LUT-write sharing thing is coming out and we are moving forward without any kind of inter-cog LUT access.
How many ports does the LUT have? The video hardware can read from the LUT and you can execute instructions from the LUT. I guess that implies two read ports. You can also read and write the LUT using special instructions. Does that mean it has a third read port and a write port? And then, on top of all of that, it has an "extra" port we're trying to find a use for?
Still thinking of other things to do, in case can't compile...
Maybe can tie second LUT port to the fast smartpin transfer?
Perhaps rig it so that, after some setup, data written by one cog to a smartpin gets placed in the LUT of another cog?
There's no cheap way to do LUT sharing outside of the adjacent-cog topology, which is too problematic on the software side, with acknowledgment to all those who say it isn't.
The hub egg-beater is a very fast system which was designed to be THE conduit of choice. It just has high latency.
How many ports does the LUT have? The video hardware can read from the LUT and you can execute instructions from the LUT. I guess that implies two read ports. You can also read and write the LUT using special instructions. Does that mean it has a third read port and a write port? And then, on top of all of that, it has an "extra" port we're trying to find a use for?
It has two read/write ports. The first port is used by the cog for RDLUT/WRLUT and LUT exec. The second port has been only used as a read port for the streamer. We could use that second port as an alternate read/write port for inter-cog LUT access, giving it priority over the streamer's read activity. You would not be using the streamer read if you were also using the inter-cog LUT access, so there's no problem here. The problem is that we can't route the inter-cog LUT sharing in a way that allows cogs to be independent of one another.
The problem is that we can't route the inter-cog LUT sharing in a way that allows cogs to be independent of one another.
I'm not seeing that as a problem that cannot be managed ?
Seems one thing this Any-LUT exercise has shown, is everyone is actually fine with some form of management by users, no one expects using the P2 to be thinking-free.
Comments
I give up... I'm going to wait for Chip's implementation and then worry about it.
I hope Brian is following this.
Yeah, pesky bugs. An object author could screw up his use of the LUT mask and hit the LUT of a COG that is not part of his object.
I don't see this as any worse than making errors in pin masks or pointer arithmetic or using COGINIT.
What concerned me was the inherent sporadic nature of any bug manifestations, due to any misconfigured, but infrequent coincident writes. It'd be like tracking down the rare noise burst in an industrial control system. 'Been there: not fun.
-Phil
As it stands, a WRITE can write to many other COGs' LUTs at the same time, as specified by the LUT mask.
Doing a READ would end up with all those LUTs being returned in some mangled fashion.
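A small Python model of that asymmetry, offered as a sketch only. The function names and the one-bit-per-cog mask layout are assumptions, and the "mangled" read is modelled as an OR of every selected LUT's output, which is a guess at what a mux-less read-back would produce.

```python
NUM_COGS = 16
LUT_SIZE = 512  # assumed LUT depth in longs


def masked_write(luts, mask, addr, value):
    """Broadcast one value to the LUT of every cog whose mask bit is set."""
    for cog in range(NUM_COGS):
        if (mask >> cog) & 1:
            luts[cog][addr] = value


def masked_read(luts, mask, addr):
    """Hypothetical read through the same mask.

    With no per-cog mux to pick a single source, the outputs of all
    selected LUTs are assumed to merge (modelled here as a bitwise OR).
    """
    result = 0
    for cog in range(NUM_COGS):
        if (mask >> cog) & 1:
            result |= luts[cog][addr]
    return result


luts = [[0] * LUT_SIZE for _ in range(NUM_COGS)]

# One write reaches cogs 1 and 3 cleanly.
masked_write(luts, 0b1010, 0x003, 0x55)

# Two LUTs hold different data at the same address...
luts[2][0x007] = 0x0F
luts[5][0x007] = 0xF0
# ...so a read selecting both returns neither value.
merged = masked_read(luts, (1 << 2) | (1 << 5), 0x007)
```

A write fans out without ambiguity, but the read returns $FF, which matches neither LUT; a single-bit mask would read back correctly.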