To make something completely flexible, you'd need to get pretty low-level. That would mean BIG in 180nm.
I figure each smart pin will have two 32-bit registers and some extra flops to track configuration. There will be a 32-bit adder and maybe an incrementer, too. I think targeting a dozen or so modes, and implementing them so that each mode is selected by a single code, is going to be the most compact and fastest implementation.
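To make the "two registers, one adder, one mode code" idea concrete, here is a minimal behavioral sketch. The register names Y and Z, the mode codes and the three example modes are all hypothetical, chosen only for illustration. It models what a mode does, not gates or timing.

```python
from enum import IntEnum

MASK32 = 0xFFFF_FFFF  # both registers and the adder are 32 bits wide


class Mode(IntEnum):
    # Hypothetical mode codes; the real encoding is not defined in this thread.
    OFF = 0
    NCO = 1         # Z += Y every clock, pin output = Z[31]
    COUNT_RISE = 2  # Z += 1 on each rising edge of the pin input
    ACCUM_HIGH = 3  # Z += Y on every clock while the pin input is high


class SmartPin:
    """Behavioral model: two 32-bit registers (Y, Z), one adder, one mode code."""

    def __init__(self, mode=Mode.OFF, y=0):
        self.mode = Mode(mode)
        self.y = y & MASK32  # e.g. frequency/increment, written by the cog
        self.z = 0           # accumulator/counter, read back by the cog
        self._last_in = 0    # one extra flop for edge detection

    def clock(self, pin_in=0):
        """Advance one system clock and return the pin's output bit."""
        # The mode code only steers the operands of the single shared adder.
        if self.mode == Mode.NCO:
            self.z = (self.z + self.y) & MASK32
        elif self.mode == Mode.COUNT_RISE and pin_in and not self._last_in:
            self.z = (self.z + 1) & MASK32
        elif self.mode == Mode.ACCUM_HIGH and pin_in:
            self.z = (self.z + self.y) & MASK32
        self._last_in = pin_in
        return (self.z >> 31) & 1 if self.mode == Mode.NCO else 0
```

For example, `SmartPin(Mode.NCO, y=1 << 30)` produces a square wave with a period of 4 clocks. Adding a mode means adding one more operand-selection case, not more registers.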
Is this still the target list of operations for the Counter section of Smart Pins?
* Quadrature input and CLK/DIRN modes (32b)
* True PWM, with Period and Duty registers (32b?)
* Counting with Capture, with Period/ReLoad (32b)
* P1 modes? Including LOGIC gates and Adder/NCO (but no PLL?)
* Internal/External Clock, with edge control Rise/Fall/Both
* Duty Capture -- Dual capture into 2 registers from Rise and Fall (32b)
* Atomic control of capture enables
* Frequency Counter - Dual capture of Cycles and Time into 2 registers, under Atomic enable.
* Pin Toggle modes for Frequency Generate
* Clock In or Clock Out...
* Using an internal counter or an external pin as the Clock
* Bits OUT or IN to/from a PIN
* And read/write those bits to a 32bit register
* Therefore 2 x 32b registers would be required for some form of buffering
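As an illustration of the first item on that list, here is a sketch of X4 quadrature decoding and CLK/DIR counting into a 32-bit register. The Gray-code transition table is a common software technique, not a description of the actual pin-cell logic. Treating an invalid double step as "no count", and assuming CLK starts low, are both assumptions.

```python
MASK32 = 0xFFFF_FFFF

# X4 quadrature transition table, indexed by (previous_AB << 2) | current_AB,
# with A as bit 1 and B as bit 0. Forward order is 00 -> 01 -> 11 -> 10 -> 00.
# Invalid double steps (e.g. 00 -> 11) count as 0 here, which is an assumption.
QDEC = (
     0, +1, -1,  0,
    -1,  0,  0, +1,
    +1,  0,  0, -1,
     0, -1, +1,  0,
)


def quadrature_count(samples, count=0):
    """samples: (a, b) pin levels taken once per clock; returns the 32-bit count."""
    it = iter(samples)
    a, b = next(it)
    prev = (a << 1) | b
    for a, b in it:
        cur = (a << 1) | b
        count = (count + QDEC[(prev << 2) | cur]) & MASK32
        prev = cur
    return count


def clk_dir_count(samples, count=0):
    """samples: (clk, dir) levels per clock; count up/down on each rising CLK edge.

    CLK is assumed to start low.
    """
    last_clk = 0
    for clk, direction in samples:
        if clk and not last_clk:
            count = (count + (1 if direction else -1)) & MASK32
        last_clk = clk
    return count
```

Counting backwards from zero wraps to 0xFFFFFFFF and below, so the cog reads the 32-bit register as a signed (two's complement) position.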
This would permit a lot of flexible serial-type I/O to be performed. Obviously we are not going to get this running at full cog speed, due mainly to the MSG restrictions (i.e. serial control at 200MHz). But I am sure there would be a lot of general-purpose things that could be done.
I am really interested to hear what Phil has to suggest. His use of the P1 was brilliant.
I love this idea and have just started playing with a Cypress PSoC 4 dev board that came in a few days ago. Having a little bit of programmable hardware can go a LONG way in some applications, but most of those can be met equally well by having the extra cores in a Propeller chip. This is definitely something I'd be all for revisiting a few years down the road for the next Parallax chip, but I think the smart pins are a really good idea and would like to see where they go in silicon first. Who knows, maybe with a higher density process we could end up with specialized data serialization/de-serialization cores on all the pins/pairs/n-wide-groups with general purpose processing cores gluing it all together...
... Who knows maybe with a higher density process we could end up with specialized data serialization/de-serialization cores on all the pins/pairs/n-wide-groups..
I think SerDes will still be there; in #30 & #34 I was focusing on the Counter portion, and what is needed.
I think that was what Chip was looking at too?
You can reconfigure a counter as a shifter, but the more muxes, the slower it gets, and the speed tests I did suggest the signal muxes/adder in #30 pull it down to just under 200MHz, so it may need a separate shift FF string.
All of the config and holding registers can be common, so most of the FFs are still shared.
A separate shift FF string does make development easier, as counters can be 'signed off', for field testing, whilst the Serial side is fleshed out.
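A rough sketch of what a separate shift FF string with a one-word holding register could look like, kept apart from any counter/adder path. The class, the LSB-first bit order and the "return False when the buffer is full" handshake are all hypothetical.

```python
class Serializer:
    """Hypothetical output shifter kept separate from the counter's adder path.

    `holding` is the buffer register the cog writes; `shift` is the FF string
    that drives the pin, one bit per bit-time, LSB first (an assumption).
    """

    def __init__(self, width=32):
        self.width = width
        self.holding = None  # None means the buffer register is empty
        self.shift = 0
        self.bits_left = 0

    def write(self, word):
        """Cog-side write; returns False if the buffer register is still full."""
        if self.holding is not None:
            return False
        self.holding = word & ((1 << self.width) - 1)
        return True

    def clock(self):
        """One bit-time: returns the bit driven on the pin, or None on underrun."""
        if self.bits_left == 0:
            if self.holding is None:
                return None
            # Reload from the buffer, which frees it for the next write.
            self.shift, self.holding = self.holding, None
            self.bits_left = self.width
        bit = self.shift & 1
        self.shift >>= 1
        self.bits_left -= 1
        return bit
```

With the holding register, the cog has a whole word-time to supply the next word after the previous one moves into the shifter. That is the kind of buffering the two 32b registers in the earlier list were meant to provide.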
Several of the functional blocks could be registers mapped into cog memory space, similar to DIRx, INx, and OUTx.
-Phil
I'm confused how this bit would work, physically.
Would that register mapping occur over the configuration bus (pin agnostic) or would pins "local" to a cog (like high speed dac connections) have that wide parallel connection between cog and pin count registers. (Or both, like DACs can have both?)
Both options would be interesting - not sure how practical that is, but if it is there already for DACs it would seem easy to extend it.
This could address one issue with moving the NCO from within the COGs to the pins, which is that it then becomes harder to port P1 code.
If a default register mapping that 'looked' like the P1 Counter/NCO could be provided, with more registers 'behind' it, porting P1 code would be a lot easier, and faster response to flags would be possible.
Users could still create a 'sea of PWM' or a 'sea of captures' over the configuration bus (pin agnostic).
Yes, you could have 3 registers per counter like P1, the "FRQA" and "PHSA" registers being your 2 main 32 bit registers. Keep the configuration register (logic modes) similar as far as practicable.
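For reference, here is what those two P1 registers do in NCO mode: every clock PHSA += FRQA, and the pin follows PHSA[31], so f_out = FRQA × f_clk / 2^32. The function names below are illustrative only, and the 80MHz clock in the usage note is just a typical P1 figure.

```python
MASK32 = 0xFFFF_FFFF


def frq_for(f_out_hz, f_clk_hz):
    """FRQ value for an NCO output of f_out: f_out = FRQ * f_clk / 2**32."""
    return round(f_out_hz * (1 << 32) / f_clk_hz) & MASK32


def run_nco(frq, clocks, phs=0):
    """PHS += FRQ every clock; the output is PHS[31]. Returns (rising edges, PHS)."""
    edges, last = 0, (phs >> 31) & 1
    for _ in range(clocks):
        phs = (phs + frq) & MASK32
        msb = (phs >> 31) & 1
        if msb and not last:
            edges += 1
        last = msb
    return edges, phs
```

For example, `frq_for(1_000_000, 80_000_000)` gives 53687091 for a 1MHz output from an 80MHz clock. A P1-compatible default mapping would only need to preserve this add-and-take-the-MSB behavior.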
But how would you resolve the ratio of pins to cogs being 4 (presumably). Do you have 4 sets of registers (12 gets a bit much in a 512 long cog), or perhaps 2 sets that can select what pins they connect to? Or just expose the 2 registers * 4 near pins = 8 registers? Is it necessary to be able to read/write more than 1 register per pin, once things are configured and running?
Did this kind of thing already get solved for the DACs, Chip/Phil? Or do they still need a solution too?
You're asking the wrong person, if you're asking me. I'm just a broad-brush guy -- no chip design experience. However, what I thought might be practical are two or three undedicated registers for each cog, mapped into its RAM space, and coupled with the programmable logic on the periphery. What those registers connect to would be up to the programmer.
But I think this idea is effectively dead, per Chip's comments, due to the 180nm feature size and the limitations that imposes on logic density. However, I still wonder if that could be ameliorated by sharing the logic among a group of pins, instead of dedicating a large programmable logic block to each one.
Yes, you could have 3 registers per counter like P1, the "FRQA" and "PHSA" registers being your 2 main 32 bit registers. Keep the configuration register (logic modes) similar as far as practicable.
But how would you resolve the ratio of pins to cogs being 4 (presumably). Do you have 4 sets of registers (12 gets a bit much in a 512 long cog), or perhaps 2 sets that can select what pins they connect to? Or just expose the 2 registers * 4 near pins = 8 registers? Is it necessary to be able to read/write more than 1 register per pin, once things are configured and running?
I would keep the register impact in the COG memory map to an absolute minimum.
Even 6 is a lot, but there is a case for a default mapping the same as P1. (which uses 6)
Above that, a simple paging system, via CTRx, would open more data/control if needed.
CTRx would keep the Mode bits the same, but the PinA/PinB pairing would no longer have a sensible meaning, nor would PLLDIV.
If a group of 2-4 pins is COG-related in a fast parallel sense, the diverse PinA/PinB choices would have to go, but most uses would be of close-by pins, and any P1 code is clearly going to need new pin assignments anyway.
Even over Config Bus, any pin-cell could be set for NCO, but the PinB detail would have to be more constrained than on P1.
But I think this idea is effectively dead, per Chip's comments, due to the 180nm feature size and the limitations that imposes on logic density.
The Programmable Gate Array (sea of gates) idea is dead, being impractically large and slow and expensive to support.
- but the idea of flexible configuration can still be done, as that leaves the critical speed blocks in custom silicon, just the mode and signal-paths are user selected.
Ok. I think it's an interesting idea; perhaps we can revisit it later, or perhaps some kind of minimal version could still be implemented but just applied to the key dozen or so signals associated with a couple of 32 bit registers (that can count or shift, and perhaps add).
I went on a hunt to find some very early Altera chips, with quartz windows. They had something like 64 macrocells, too, and let's face it, any IP would be well expired by now. I found them hiding in the EEPROM programmer. I had forgotten the die size was so large.
... or perhaps some kind of minimal version could still be implemented but just applied to the key dozen or so signals associated with a couple of 32 bit registers (that can count or shift, and perhaps add)
..
That's no longer a Programmable Gate Array problem, that is more what I would call a Cross-point problem.
i.e. almost no AND/OR/FF macrocell-style logic is needed, just a signal-mapping system and mode choices, and uCs have been doing that in SFRs for generations.
Some newer uCs have more comprehensive Cross-point control, to avoid usage contention in smaller packages.
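In software terms, a cross-point is just select fields in a config word that steer which signals reach a block's inputs. The sketch below is purely hypothetical. The field layout, the 3-bit selects and the limited set of neighboring-pin offsets echo the "PinB would have to be more constrained" point above, but none of them are a real Propeller register format.

```python
# Hypothetical config word: bits 0-4 mode, bits 5-7 A-input select,
# bits 8-10 B-input select. Select codes index nearby pins only.
NEIGHBOUR_OFFSETS = (0, +1, +2, +3, -1, -2, -3, -4)


def pack_config(mode, a_sel, b_sel):
    return (mode & 0x1F) | ((a_sel & 0x7) << 5) | ((b_sel & 0x7) << 8)


def unpack_config(cfg):
    return cfg & 0x1F, (cfg >> 5) & 0x7, (cfg >> 8) & 0x7


def route_inputs(cfg, pin_levels, here):
    """Cross-point: pick the A and B input levels for the pin cell at `here`."""
    _, a_sel, b_sel = unpack_config(cfg)
    n = len(pin_levels)
    a = pin_levels[(here + NEIGHBOUR_OFFSETS[a_sel]) % n]
    b = pin_levels[(here + NEIGHBOUR_OFFSETS[b_sel]) % n]
    return a, b
```

The hardware cost is a few small muxes per input rather than a macrocell array, which is the distinction being drawn here.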
However, what I thought might be practical are two or three undedicated registers for each cog, mapped into its RAM space, and coupled with the programmable logic on the periphery.
Or possibly have them mapped to hub ram elements instead? Might routing at least some pin functionality this way make it easier than routing them to cogs?
I think the idea of partial parallel pathways was to coat-tail on the path/routing that is already there for the DAC groupings.
That means very little added routing/area cost, as it already takes DAC/DMA from the COG to the pin rim.
I was wondering if this was actually a potentially suitable alternative to that routing, since the "current" setup is limited to 4 pins per cog for high BW DACs.
It should be noted that the fast connect for the DACs is an exclusive four pin grouping per Cog. All other DACs are accessed via the slower MSG instructions. That's quite a limiting factor on which Cog is used for which pins, if this level of bandwidth is needed.
Is there likely to be a real need to have better than the MSG instructions for access to the shifters/counters? Meaning: Should there be effort into buffering the MSG instructions, or providing a similar fast access to the same nearby four pins as the fast DACs?
It should be noted that the fast connect for the DACs is an exclusive four pin grouping per Cog. All other pins are accessed via the slower MSG instructions.
Is there likely to be a real need to have better than the MSG instructions for access to the shifters/counters?
There are capture and generate cases where bandwidth matters; if the bus is already there for the DACs, it seems a low-cost means to give some Counters higher bandwidth.
For most generic apps, the MSG pathway will be OK. For those where it is not OK, the lack of a faster path is a deal breaker.
Meaning: Should there be effort into buffering the MSG instructions, or providing a similar fast access to the same nearby four pins as the fast DACs.
This will depend on how many Pin-cells Chip chooses.
There are many use cases where pairs of pins make more sense than single pins, and that changes the Pin-cells to 32 pairs.
Once you have that, 4-pin groupings naturally match the COG count.
Hehe, I was still editing ... I need to do more reviewing before first posting. I wasn't entirely clear on my line of thought at first edit.
The four-pin grouping is quite a restriction on Cog allocation in application code. And both serial throughput and counter sampling are never really full-bandwidth applications. Eg: a 32-bit serialiser is going to take at least 64 system clocks to empty/fill. That's enough for the MSG instructions to deal with, especially if there is a little buffering added. This then eliminates the allocation/mapping issues.
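Putting numbers on that example: 64 clocks for a 32-bit word implies 2 clocks per bit. That ratio is an inference from the figure above, not a stated spec. With the 200MHz clock mentioned earlier in the thread, it gives the cog roughly this window per word. The helper name is hypothetical.

```python
def refill_window(word_bits=32, clocks_per_bit=2, f_sys_hz=200_000_000):
    """Clocks and nanoseconds available to refill (or empty) one serialiser
    word, plus the resulting line rate in Mbit/s."""
    clocks = word_bits * clocks_per_bit
    ns = clocks * 1e9 / f_sys_hz
    mbit_s = word_bits * f_sys_hz / clocks / 1e6
    return clocks, ns, mbit_s
```

Under those assumptions, `refill_window()` returns `(64, 320.0, 100.0)`: 320 ns per word at a 100 Mbit/s line rate. Whether that is "enough for MSG" depends on the MSG transfer cost, which the thread does not pin down.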
I think the nature of MSG being serial introduces issues for some operations, and the solution is a parallel read/latch, even if confined to "local" pins.
Eg, say you have a counter that is increasing at a decent rate. To read its value at an instant, you start shifting data out on the MSG bus, assuming the bus is free when you need it. By the time it arrives and is decoded at your cog, it's already different from the counter at the pin. By a constant amount? That depends on the toggling rate driving the counter (which may not be constant), whether the MSG bus was free, etc.
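A small sketch shows why a latch helps. If bits are shifted straight from the live counter, each bit comes from a different instant and the word can be torn. If the pin first copies the counter into a holding register at the moment of the request, the word is coherent even though it arrives later. The one-bit-per-clock, LSB-first transfer and latch-at-request behavior are assumptions, not a description of the real MSG path.

```python
MASK32 = 0xFFFF_FFFF


def read_over_serial(counter_at, t_request, bus_wait=0, bits=32, latch=True):
    """Word a cog would assemble from a pin-side counter read over a serial path.

    counter_at(t) gives the live counter value at clock t. One bit is shifted
    per clock, LSB first, starting when the bus becomes free (assumptions).
    With latch=True the pin copies the counter into a holding register at the
    moment of the request and shifts that copy out instead.
    """
    t_start = t_request + bus_wait
    snapshot = counter_at(t_request) & MASK32
    word = 0
    for i in range(bits):
        source = snapshot if latch else counter_at(t_start + i) & MASK32
        word |= ((source >> i) & 1) << i
    return word
```

With a counter that increments every clock and a 5-clock bus wait, the latched read returns exactly the value at the request time. The unlatched read mixes bits from 32 different clocks, so the error depends on both the bus wait and the count rate, as described above.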
... Eg: 32 bit serialiser is going to take at least 64 system clocks to empty/fill. That's enough for the MSG instructions to deal with. Especially if there is a little buffering added. This then eliminates the allocation/mapping issues.
Buffering is always a good idea, and some uC have small FIFOs on their Counter capture paths, to allow very narrow pulses to be managed.
Chip has already mentioned Duty Capture, which sends Rise and Fall captures to two separate registers; that form of buffering allows resolution down to 1 fSys clock, with an update rate determined by the data path.
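Here is how a cog might turn edge captures into high time, period and duty. This assumes the capture registers latch a free-running 32-bit fSys count on each edge, and that the cog sees a rise, the following fall and the next rise. The thread does not say exactly what gets captured, so treat both assumptions and the function name as hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def duty_from_captures(rise, fall, next_rise):
    """Duty cycle from three captured timestamps, in fSys clocks.

    Assumes the capture registers hold a free-running 32-bit clock count latched
    on a rising edge, the following falling edge and the next rising edge.
    Masked subtraction stays correct if the count wraps between captures.
    """
    high = (fall - rise) & MASK32
    period = (next_rise - rise) & MASK32
    return high, period, high / period
```

Because each value is latched in hardware on its edge, the 1-clock resolution does not depend on how quickly the cog gets around to reading the registers.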
There are also some PWM designs where fault tripping is important for overcurrent sensing.
Should that be allowed for within the Pin-cell (at the cost of a trip pin per PWM group), or is the Prop SW + MSG fast enough?
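To compare the two options, here is a sketch of the in-cell version: a period/duty PWM whose output is forced low in the same clock the trip input goes active, and stays low until software re-arms it. The class and the latch-until-cleared policy are hypothetical. Whether this belongs in the pin cell at all is exactly the open question.

```python
class TripPWM:
    """Hypothetical pin-cell PWM (period/duty registers) with a latched trip input."""

    def __init__(self, period, duty):
        self.period = period
        self.duty = duty
        self.count = 0
        self.tripped = False

    def clock(self, trip_in=0):
        """One PWM clock; returns the pin output level."""
        if trip_in:
            self.tripped = True  # latched in the same clock, no software in the loop
        out = 0 if self.tripped else int(self.count < self.duty)
        self.count = (self.count + 1) % self.period
        return out

    def clear_fault(self):
        """Software re-arms the output after dealing with the fault."""
        self.tripped = False
```

The software-only alternative would add the cog's reaction time plus the MSG round trip between the fault and the output going low.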
I think the nature of MSG being serial introduces issues for some operations, and the solution is a parallel read/latch, even if confined to "local" pins
Eg say you have a counter that is increasing at a decent rate. To read its value at an instant you start shifting data out on the MSG bus, assuming the bus is free when you need it. By the time it arrives and is decoded at your cog it's already different to the counter at the pin. By a constant amount? Depends on the toggling rate driving the counter (may not be constant), and whether the MSG bus was free etc.
Chip has mentioned that the PinIN (single bit) pathway from a pin cell, can be optioned to be other things.
- eg on a Capture channel, it could be the Capture Flag.
That does give a fast path (and even a WAITxx path) for a single signal from the Pin Cell.
Likewise OUT could be used as a trigger/start for sync-ing across many 'primed' pins.
Those paths are only one-bit wide per pin, but they can assist bandwidth issues. (and they are 'free')
Next question is: Can MSGOUT or equivalent read these same registers? I'm guessing there is still much to be done around MSG functionality.
I think I may have missed reading a topic about this ...
EDIT: Okay, now having read the opening post from that same topic, I see Chip was already thinking of using all three pin control bits. And that it's not anywhere near a done thing ...
QUOTE: "The DIR pin would be held high to keep the pin in the special mode. The OUT signal would then be a live input to the configured pin. Pins could send messages back via the IN signals. A simple shifter in each cog could receive a message from any pin, shifted back at the clock rate. If DIR were to go low, the pin would revert back to normal mode. This way, cogs that configure pins, but then shut down, release the pins they put into special modes.
What stuff can we make the pin state machines do? Once configured, we have OUT going to the pin and IN coming back from the pin."
I suspect he'll need mode control bits in each smart-pins device rather than trying to use DIR for maintaining the mode.
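The scheme in Chip's quote can be sketched as a small pin-side state machine. While DIR is high, the pin stays in special mode and shifts OUT bits into a config register. When DIR drops, it reverts to a normal pin. Fixed 32-bit frames, LSB first, are an assumption, since no framing is described, and the class is hypothetical.

```python
class PinCell:
    """Hypothetical pin-side receiver for the DIR/OUT scheme quoted above.

    While DIR is high the pin stays in special mode and shifts OUT bits into a
    32-bit config register (fixed 32-bit frames, LSB first: an assumption).
    When DIR drops the pin reverts to normal mode, so a cog that stops
    releases the pins it configured.
    """

    def __init__(self):
        self.special = False
        self.config = 0
        self._shift = 0
        self._nbits = 0

    def clock(self, dir_bit, out_bit):
        if not dir_bit:  # DIR low: revert to a normal pin
            self.special = False
            self.config = self._shift = self._nbits = 0
            return
        self.special = True
        self._shift |= (out_bit & 1) << self._nbits
        self._nbits += 1
        if self._nbits == 32:  # full frame received: latch it
            self.config, self._shift, self._nbits = self._shift, 0, 0
```

Because the mode here lives entirely in DIR, any glitch on DIR wipes the configuration. That is one way to read the concern that separate mode bits in each smart pin may be needed.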
Comments
Those sound right.
Cool. Look forward to the details.
I was straightened out on that one a few days back - http://forums.parallax.com/showthread.php/155145-Putting-smarts-into-the-I-O-pins?p=1269744&viewfull=1#post1269744 It's not a bus. It's a bit-stream over the output DIR control bit on a per-pin basis. A very efficient way of dynamically mapping multiple processors to multiple devices!
Knowing that detail makes jmg's comments about also re-purposing the IN/OUT pin bits a lot clearer.