better smart pin interaction

Seairth · 2016-05-23 14:03

I originally presented this idea in a comment elsewhere, but I recently noticed some documentation that made me feel it was worth re-posting in its own thread. Note that this is not exactly the same suggestion as in the referenced comment, but a similar one that should be a bit clearer to understand and implement.

Currently, the smart pins signal events to cogs by raising their associated IN bit. Once a cog has dealt with the event, it calls PINACK to clear the event. There are a few issues with this approach:

While PINACK takes only 2 clock cycles, it takes another 5 clock cycles for IN to be cleared.
If multiple cogs are interested in the same event, they can all safely call PINGETZ, but they cannot all safely call PINACK.

So, here's my suggestion:

In each cog, add a 64-bit pin-event register (one bit per pin). The pin-event register captures the high state of the IN line and the bits are "sticky". Once set, the bit requires an explicit unset from user code.
When a smart pin signals an event, it no longer holds IN high; it only pulses the IN signal.
Because of this arrangement, reading IN on the same clock as the pulse will detect the pulse directly. Reading IN after the pulse will read the sticky bit.
Get rid of PINACK as an alias for "PINSETM #1". Instead, make PINACK a new instruction that clears the sticky bit(s). Really, it re-samples the IN signal(s), which will be 1 if another event is occurring on the same clock as the call to PINACK, and 0 otherwise.
With the removal of PINACK as an alias, this frees up one bit in the PINSETM pattern. Repurpose that bit to indicate whether IN reflects the current state of the pin or the current state of the pin-event register.
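
To make the proposed behavior concrete, here is a minimal software model of the per-cog register described above. It is only a sketch: the class and method names (`CogPinEvents`, `clock`, `read_in`, `pinack`) are invented for illustration and do not correspond to any actual P2 instruction encoding or Verilog.

```python
MASK64 = (1 << 64) - 1  # one bit per pin


class CogPinEvents:
    """Hypothetical model of one cog's 64-bit sticky pin-event register."""

    def __init__(self):
        self.sticky = 0

    def clock(self, in_pulses):
        """Advance one clock: latch any IN bits that are pulsing high."""
        self.sticky |= in_pulses & MASK64

    def read_in(self, in_pulses=0):
        """IN as the cog sees it: the live pulse OR the latched sticky bit."""
        return (self.sticky | in_pulses) & MASK64

    def pinack(self, mask, in_pulses=0):
        """Re-sample the masked bits from the live IN pulses.

        A bit ends up 1 only if its pin is pulsing on this same clock.
        """
        self.sticky = (self.sticky & ~mask & MASK64) | (in_pulses & mask)


cog_a, cog_b = CogPinEvents(), CogPinEvents()
pin5 = 1 << 5
for cog in (cog_a, cog_b):
    cog.clock(pin5)      # smart pin 5 pulses IN for one clock
cog_a.pinack(pin5)       # cog A acknowledges; cog B is unaffected
a_sees = bool(cog_a.read_in() & pin5)
b_sees = bool(cog_b.read_in() & pin5)
print(a_sees, b_sees)    # False True
```

Because each cog owns its register, cog A's acknowledge leaves cog B's copy of the event intact, and an acknowledge that coincides with a new pulse keeps the bit set.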

What this gives us in return:

The new PINACK does not have the 5 clock penalty, since you are now updating the pin-event register instead of the smart pin.
Each cog has its own pin-event register, so it can safely call PINACK without affecting any other cog.
It would now be possible to react to smart pins using only the pulse. In other words, in cases where a cog is waiting (WAITEDG or WAITINT) for a smart pin, it can be released with the pulse and not have to call PINACK at all.
Because this change would also work with the "default" pin mode, you can optionally use the pin-event register to capture rising edges (or falling edges if configured for inverted signals) without the use of a smart pin.
This improves the ability for smart pins to be used as an alternative to ATN when you need to pass more than the ATN event itself. In addition to the one-to-one "mailbox" usage that's already supported, it can also be safely used for one-to-many mailboxes. This will pair nicely with the current ATN event, which covers the many-to-one case (and the slightly faster one-to-one, event-only case).

cgracey · 2016-05-23 15:45

I like this idea. It would cost a lot of flops, though, and might necessitate more mapped registers. Right now, the smart pin is being the pulse collector and you must manually clear it using PINACK (now AKPIN). By centralizing this pulse-capture/clear function in the smart pin, the cogs don't need 64 flops each (1024 total) to track these edges. Also, the AKPIN is a single-clock event in the smart pin, so if many cogs were to AKPIN a smart pin around the same clock cycle, it wouldn't hurt anything.

It would be nice to have all those edge sensors. To do it right, we might need some mapped registers, which would be disruptive, at this point. Let me get the current release out and maybe there will be occasion for this.

Seairth · 2016-05-23 18:07

cgracey wrote: »

It would be nice to have all those edge sensors. To do it right, we might need some mapped registers, which would be disruptive, at this point. Let me get the current release out and maybe there will be occasion for this.

Sounds good. I'll leave this for a later discussion...

jmg · 2016-05-23 22:31

Seairth wrote: »

While PINACK takes only 2 clock cycles, it takes another 5 clock cycles for IN to be cleared.

Can that be sped up ?

Seairth wrote: »

If multiple cogs are interested in the same event, they can all safely call PINGETZ, but they cannot all safely call PINACK.

I think any multiple-COG reading situation is going to need some co-operation rules,
e.g. one COG is agreed on as the ACK COG in this case.

If you allow any-ack, the problem shifts downstream, where a slower COG could miss a read.
The flag is active, but nothing tells it that two lots of data have gone by.
That means you still need some means to manage slowest COG, and a Sticky-IN has not helped much ?

In MCU SPI/UART Ports, this often has an overrun flag.

It is also important that the Flags and Buffers can interleave, to allow continual data flows.
ie buffers act like a shallow FIFO. Any change to flags may affect that careful sequencing ?
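
As a rough illustration of the overrun idea mentioned above, here is a sketch of a single-entry receive buffer that records when a value was overwritten before being read. The `RxBuffer` class and its fields are hypothetical and are not taken from any smart pin mode or from a specific MCU peripheral.

```python
class RxBuffer:
    """Hypothetical single-entry receive buffer with an overrun flag."""

    def __init__(self):
        self.data = None
        self.ready = False
        self.overrun = False

    def deliver(self, value):
        """Hardware side: store a new value and raise the ready flag."""
        if self.ready:
            self.overrun = True  # the previous value was never read
        self.data = value
        self.ready = True

    def read(self):
        """Reader side: return (value, lost) and clear both flags."""
        value, lost = self.data, self.overrun
        self.ready = False
        self.overrun = False
        return value, lost


buf = RxBuffer()
buf.deliver(1)
buf.deliver(2)        # arrives before the reader got to the first value
first = buf.read()    # (2, True): data is valid, but something was lost
buf.deliver(3)
second = buf.read()   # (3, False): reader kept up this time
```

With only a ready flag, the slow reader would see the value 2 and have no way to know that 1 had gone by; the extra flag is what reports the loss.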

Seairth · 2016-05-23 23:39

jmg wrote: »

If you allow any-ack, the problem shifts downstream, where a slower COG could miss a read.
The flag is active, but nothing tells it that two lots of data have gone by.
That means you still need some means to manage slowest COG, and a Sticky-IN has not helped much ?

But this is already the case for smart pin events, as I understand it. A slow cog will miss events right now. Actually, this is potentially worse than what I'm suggesting, because if it's the slowest cog that's responsible for ACKing in a multi-cog scenario, then all of the cogs can potentially miss an event!
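
Here is a toy timing model of that argument. It assumes a flag that stays set until acknowledged and that any event arriving while the flag is still set goes unnoticed. The event spacing and acknowledge delays are made-up numbers, and `events_seen` is a hypothetical helper, not a model of real smart pin timing.

```python
def events_seen(event_times, ack_delay):
    """Count events that arrive while the flag is clear.

    The flag is set by an event and cleared ack_delay ticks later;
    events that arrive while it is still set are merged and missed.
    """
    seen = 0
    flag_clear_at = 0
    for t in event_times:
        if t >= flag_clear_at:
            seen += 1
            flag_clear_at = t + ack_delay
    return seen


events = range(0, 100, 10)      # 10 events, one every 10 ticks
FAST_ACK, SLOW_ACK = 4, 25      # hypothetical acknowledge latencies

# Shared flag in the smart pin, slow cog is the designated ACK cog:
# every cog sees only what the slow cog's acknowledge rate lets through.
shared = {"fast": events_seen(events, SLOW_ACK),
          "slow": events_seen(events, SLOW_ACK)}

# Per-cog sticky bit: each cog is limited only by its own ack latency.
per_cog = {"fast": events_seen(events, FAST_ACK),
           "slow": events_seen(events, SLOW_ACK)}

print(shared)   # {'fast': 4, 'slow': 4}
print(per_cog)  # {'fast': 10, 'slow': 4}
```

In this model the slow cog misses events either way, but with a shared flag it drags the fast cog down with it.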

jmg · 2016-05-23 23:45

Seairth wrote: »

But this is already the case for smart pin events, as I understand it. A slow cog will miss events right now. Actually, this is potentially worse than what I'm suggesting, because if it's the slowest cog that's responsible for ACKing in a multi-cog scenario, then all of the cogs can potentially miss an event!

Well, yes if that COG is too slow...
My point was that Multi-COG management is going to be tricky in any context, but it is important that the interaction of Flags and next-data should not be overlooked or lost in any Flag changes.
This small-FIFO action is important for highest data rates.

cgracey · 2016-05-24 00:12

It seems to me that a particular cog will manage its set of I/O pins and other cogs won't be reading them. Think objects and encapsulation.

Seairth · 2016-05-24 00:38

cgracey wrote: »

It seems to me that a particular cog will manage its set of I/O pins and other cogs won't be reading them. Think objects and encapsulation.

That's certainly likely to be the most common case, but I think you're selling the versatility of the smart pins short if that's all you expect. There are the non-DAC byte/word/long modes where one cog is sharing data to multiple cogs. Then there are high-rate input streams that require heavy processing, where a series of cogs will take round-robin turns to directly read and process the data. Or, the I/O is a simple shared resource and cogs take turns controlling it (instead of dedicating a single cog and communicating through that cog). And don't forget about debugging/monitoring, where you will want to use another cog to sniff raw incoming data. Not to mention the miraculous feats that PhiPi's likely to demonstrate!

But, again... this can be a later discussion... after we have the current image in hand and have played with it a bit.

jmg · 2016-05-24 00:41

cgracey wrote: »

It seems to me that a particular cog will manage its set of I/O pins and other cogs won't be reading them. Think objects and encapsulation.

'Normally', yes - however it should be possible for multiple COGs to poll a Smart Pin, and with some rules, even (commonly) read the result.

One example I can think of, is a Smart-pin connected to GPS 1pps signal.

That can capture either Period, or run in reciprocal mode, and be used as a live calibrate value.
Rates are very low, so it should be possible for many COGS to check and read the same pin - This could simplify software elsewhere.

If 6 COGS set a Pin cell to the same mode, I presume any mix of that is 'safe' ?

More important, is that a single COG should be able to read or feed a smart pin, at full link speeds with no gaps.

Phil Pilgrim (PhiPi) · 2016-05-24 01:42

Seairth, et al.,

Please try to remember that we're not trying to create the best P2 possible -- that would take forever, and very well could, from what I've seen in the last ten years. What Parallax needs more than anything right now is a P2 -- probably chock-full of design compromises -- that will actually get finished and that will sell to help Parallax's bottom line.

I seriously wonder if any of you have ever been in business. You obviously don't know where to draw the line on development. Of course it's not your money at stake with these endless suggestions for incremental design improvements. But there is a serious cost here from continued delay, no matter how well-intentioned and well-thought-out your ideas may be from an engineering perspective.

And Chip has shown time and again that he is a sucker for all of your ideas and the delays they engender. So give it up! Please! Or there will never be a P2 that embodies your ideas -- or anyone else's for that matter.

I wish I could get paid for all the projects I start and not just for the ones I finish. But life and commerce do not work that way -- for me or for Parallax. Chip has to finish the P2, and distractions like this are constantly keeping that from happening.

-Phil

Rayman · 2016-05-24 02:11

At this point, I think things that add a lot of flops or logic should be avoided.
We're at the top of A9 logic and now have to reduce smart pins out of new P2 builds.
This makes me a bit uncomfortable, although Chip seems Ok with it.

There are things like the LUT sharing that seem like we had to add though... This leverages a powerful interface that already existed with just some more flops.

I think it's healthy to think about new ideas though. Seems like only Chip knows how much a new feature would cost in terms of flops or other logic...

cgracey · 2016-05-24 02:20

I think what we have right now is really good. I don't feel like there's anything critical missing. And this HyperRAM is a huge piece of the puzzle that we needed. The Prop2 and HyperRAM together make quite a good system. If we can later move the design to 28nm and run it at 2GHz, we would be in a really great place. For now, what we will have in 180nm will be plenty. 10x faster can come later.

Seairth · 2016-05-24 02:55

@moderators, this thread has moved on from the original topic. Please close and/or sink it.

Alexander (Sandy) Hapgood · 2016-05-24 03:06

cgracey wrote: »

I think what we have right now is really good. I don't feel like there's anything critical missing. And this HyperRAM is a huge piece of the puzzle that we needed. The Prop2 and HyperRAM together make quite a good system. If we can later move the design to 28nm and run it at 2GHz, we would be in a really great place. For now, what we will have in 180nm will be plenty. 10x faster can come later.

Please speak into the microphone :-)

jmg · 2016-05-24 04:08

cgracey wrote: »

I think what we have right now is really good. I don't feel like there's anything critical missing. And this HyperRAM is a huge piece of the puzzle that we needed. The Prop2 and HyperRAM together make quite a good system.

Certainly true

The HyperRAM is also looking useful for P1 life extension too !

Those with FPGAs need to get some HyperRAM talking to P2 asap, to shake out any connection finer points.

cgracey · 2016-05-24 05:06

Alexander (Sandy) Hapgood wrote: »

cgracey wrote: »

I think what we have right now is really good. I don't feel like there's anything critical missing. And this HyperRAM is a huge piece of the puzzle that we needed. The Prop2 and HyperRAM together make quite a good system. If we can later move the design to 28nm and run it at 2GHz, we would be in a really great place. For now, what we will have in 180nm will be plenty. 10x faster can come later.

Please speak into the microphone :-)

I THINK WHAT WE HAVE RIGHT NOW IS REALLY GOOD. I DON'T FEEL LIKE THERE'S ANYTHING CRITICAL MISSING. AND THIS HYPERRAM IS A HUGE PIECE OF THE PUZZLE THAT WE NEEDED. THE PROP2 AND HYPERRAM TOGETHER MAKE QUITE A GOOD SYSTEM. IF WE CAN LATER MOVE THE DESIGN TO 28NM AND RUN IT AT 2GHZ, WE WOULD BE IN A REALLY GREAT PLACE. FOR NOW, WHAT WE WILL HAVE IN 180NM WILL BE PLENTY. 10X FASTER CAN COME LATER.

ErNa · 2016-05-24 07:01

AMEN

potatohead · 2016-05-24 07:30

Attaboy Chip!
