Putting smarts into the I/O pins

jmg · 2014-04-07 21:44

cgracey wrote: »

Yes, each (rising) edge increments an edge count, and each edge can also (under SW message control) capture values from a SysCLKd TIME counter and the Edge Counter.
Capture of both values needs to be atomic, so the T,E values are free of aperture errors and are known to be for the same edge. Those are then read at leisure, and the calculations done.

Usually, on a classic uC, you can set up 2 timers to do this, but often the atomic capture is tricky if the capture enables are in two separate SFR registers.

But why is one measurement or the other not adequate? Both values may not even correlate well.

This allows high precision, wide dynamic range work, but is also universally flexible.

A single cycle may be too short for pulse-width to give precise enough frequency, and counting cycles alone has best precision at higher frequencies, poorer at low ones.

By capturing both, and in an atomic way, you can calculate to roughly equal precision no matter what the frequency is, and the 'no missing edges' nature of this also allows even higher precision in the background, limited only by the maths limits.
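A minimal sketch of the calculation described above, assuming two atomic (time, edge count) capture pairs and free-running 32-bit counters. The function name, the 200 MHz clock and the sample values are illustrative assumptions, not part of any proposed hardware.

```python
def frequency_from_captures(t0, e0, t1, e1, sysclk_hz, width=32):
    """Estimate input frequency from two atomic (time, edge-count) captures.

    t0/t1 are SysCLK counter values latched on an input edge; e0/e1 are the
    edge counts latched on the same edges. Differences are taken modulo
    2**width so a counter wrap between the two captures is harmless.
    """
    mask = (1 << width) - 1
    dt = (t1 - t0) & mask   # elapsed SysCLK ticks
    de = (e1 - e0) & mask   # whole input cycles in that interval
    if dt == 0:
        raise ValueError("captures are identical; no time elapsed")
    return de * sysclk_hz / dt


SYSCLK = 200_000_000  # assumed system clock

# Slow input: 3 cycles spanning 60,000,000 ticks
slow = frequency_from_captures(1000, 5, 60_001_000, 8, SYSCLK)

# Fast input: 1,000,000 cycles spanning 20,000,000 ticks, with the time
# counter wrapping past 2**32 between the two captures
t_start = 0xFFFFFF00
t_end = (t_start + 20_000_000) & 0xFFFFFFFF
fast = frequency_from_captures(t_start, 0, t_end, 1_000_000, SYSCLK)
```

Because both values come from the same edge, the result is an exact ratio of whole cycles to elapsed ticks; a longer gap between captures simply adds resolution.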

Working on a local register basis, this needs 4: a Time-Counter, a Cycles Ctr, and two capture registers
(plus the config bits to control them, of course), so that is the same local register count as true PWM.
(PWM also needs a 32b compare)

A standard P1 timer needs 2 local registers, and a 32b adder. (plus the config bits of course)

Tubular · 2014-04-07 21:53

Being able to set the voltage drive levels via DAC outputs would be useful, so a high output might put out +0.4v, and a low output 0 volts (for lvds). This would also let us interface with future lower voltage cores and peripherals

For LVDS displays, you need to drive low voltages (fast), and you need to arrange bits across 3-5 lanes.

The other half of the problem, arranging bits, would traditionally be performed by the cog, but using smart i/o pads you could squirt an identical 24/32 bits to all pads, and each pad would know how to assemble the bits in its own lane, pick them and clock them out at a fast rate.

It would make slave mode (externally clocked) data handling easier too, since the external clock could advance the state machine.

Definitely almost infinite potential for feature creep, however. What's our appetite for that, again?

Lawson · 2014-04-07 21:53

cgracey wrote: »

I was thinking today that CTRs do a lot of things to eventuate a state on an I/O pin, or take some reading. These complicate the cogs, of course.

How about making a simple, flexible state machine that is built onto each pin? It could do PWM, duty modulation, frequency measurement, state timing, ADC accumulation, even A/B encoding between two pins. It could be a UART, as well!

On the Prop2, there is a special signal that goes to each pin, in addition to DIR and OUT, called ALT. It is used to send serial messages to configure the pin for special modes. Prop2 has a CFGPINS instruction which sends a serial bit stream to any number of I/O's on a given port via the ALT signals, using a 32-bit mask. We could get by with using the DIR signals, instead, for this purpose, since software (taking 2 clocks per instruction) could never cause such a rapid %010xxxxx_xxxxxxxx_xxxxxxxx_xxxxxxxx message to initiate. The state machine on the pin would configure itself according to the message and operate accordingly. The DIR pin would be held high to keep the pin in the special mode. The OUT signal would then be a live input to the configured pin. Pins could send messages back via the IN signals. A simple shifter in each cog could receive a message from any pin, shifted back at the clock rate. If DIR were to go low, the pin would revert back to normal mode. This way, cogs that configure pins, but then shut down, release the pins they put into special modes.

What stuff can we make the pin state machines do? Once configured, we have OUT going to the pin and IN coming back from the pin.

Are you thinking something like a few registers, some LUTs and some flops? I.e. something like 5 or fewer FPGA LUs with a fixed interconnect?

Say an increment, count, and compare/buffer register outputting MSB, LSB, carry, count == compare, etc. to the LUTs/Pin with the LUTs/Flops/Pin being able to drive the MSB, LSB, etc. or shift the count register, add increment to count, move count to buffer, etc. Hm... I'd also make the left/right pin state accessible so Phill (and friends) can chain pins to do "magic" tricks? To make this easy to approach, I'd add some "setup pin as X" macros to the assembler. Could this be kept within 29 or 58-bits of setup info?

Marty

jmg · 2014-04-07 22:01

cgracey wrote: »

Like this:

1) You set PWM mode with a frame_count and, for the sake of discussion, an initial on_time.

2) X=0, Y=on_time

3) if X == frame_count then X=0 and Y=on_time, else (X++, if Y<>0 then Y--, pin output = Y<>0)

4) accept new on_time via OUTMSG, while staying in (3)

OK, you only mentioned a single ONCOUNT in your first example, and using WAITxx, so I thought the HW was simpler.

Here, you have frame_count and on_time sent, and two counters which is also 4 pin-local registers.

This dual-counter topology would morph to the Frequency-Capture example, relatively easily, with some muxes and state logic, as both use 4 pin-local registers. (2 ctrs, 2 RW storage)

Instead of a Compare, you have two counters running in parallel, one a saturating one.
An advantage of this design is (on_time = 0) => always low, and (on_time >frame_count) => always high, so PWM covers ALL values. (most uC cannot do this)

[ aside : Slightly simpler is this equivalent variant which tests for zero, not X=V, so saves a little logic.

3) if X == 0 then X=frame_count and Y=on_time, else (X--, if Y<>0 then Y--, pin output = Y<>0) ]
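The count-up rule and the zero-test variant can be compared in a small simulation. This is a sketch only: it assumes every register update on a clock uses the pre-update values, and that the pin holds its level on the reload clock. The name simulate_pwm and the test_for_zero flag are made up for illustration.

```python
def simulate_pwm(frame_count, on_time, clocks, test_for_zero=False):
    """Return the pin level on each clock of the two-counter PWM.

    X is the frame counter and Y is the saturating on-time counter.
    test_for_zero=False counts X up to frame_count; True counts X down to 0.
    """
    x = 0
    y = on_time
    pin = 0
    trace = []
    for _ in range(clocks):
        reload = (x == 0) if test_for_zero else (x == frame_count)
        if reload:
            # Start a new frame; the pin keeps its previous level (assumption)
            x = frame_count if test_for_zero else 0
            y = on_time
        else:
            x += -1 if test_for_zero else 1
            pin = 1 if y != 0 else 0   # uses Y before it is decremented
            if y != 0:
                y -= 1                 # saturates at zero
        trace.append(pin)
    return trace


up = simulate_pwm(9, 3, 100)
down = simulate_pwm(9, 3, 100, test_for_zero=True)
```

Under these assumptions a frame lasts frame_count + 1 clocks, so frame_count = 9 and on_time = 3 gives 3 high clocks in every 10 for both variants; they differ only in phase. on_time = 0 never raises the pin, and on_time > frame_count never lets Y reach zero, so the pin stays high.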

cgracey · 2014-04-07 22:04

jmg wrote: »

OK, you only mentioned a single ONCOUNT in your first example, and using WAITxx, so I thought the HW was simpler.

Here, you have frame_count and on_time sent, and two counters which is also 4 pin-local registers.

Instead of a Compare, you have two counters running in parallel, one a saturating one.
An advantage of this design is (on_time = 0) => always low, and (on_time >frame_count) => always high, so PWM covers ALL values.

[ aside : Slightly simpler is this equivalent variant which tests for zero, not X=V, so saves a little logic.

3) if X == 0 then X=frame_count and Y=on_time, else (X--, if Y<>0 then Y--, pin output = Y<>0) ]

You're right. Better to test for zero. Less logic.

cgracey · 2014-04-07 22:08

Lawson wrote: »

Are you thinking something like a few registers, some LUTs and some flops? I.e. something like 5 or fewer FPGA LUs with a fixed interconnect?

Say an increment, count, and compare/buffer register outputting MSB, LSB, carry, count == compare, etc. to the LUTs/Pin with the LUTs/Flops/Pin being able to drive the MSB, LSB, etc. or shift the count register, add increment to count, move count to buffer, etc. Hm... I'd also make the left/right pin state accessible so Phill (and friends) can chain pins to do "magic" tricks? To make this easy to approach, I'd add some "setup pin as X" macros to the assembler. Could this be kept within 29 or 58-bits of setup info?

Marty

Interesting idea about sharing signals with neighboring pins. The setup could be variably sized, based on a preamble. Definitely, it should fit in a long, just to put a sanity cap on things.

jmg · 2014-04-07 22:10

Lawson wrote: »

Hm... I'd also make the left/right pin state accessible so Phill (and friends) can chain pins to do "magic" tricks?

Chip has hinted at this, at least : even A/B encoding between two pins

which I read as Quadrature counting, on a Pin pair.

Another common use is to count on one pin and capture on another, while maybe outputting a PWM or TC to another pin.
So that is up to 3 pins in the mix.

Another common Quadrature form has A/B/I and the index pulse resets the Counter.
That is also a 3 pin example.

The Classic ADC of P1 is a two pin example.

Adjacent-Pin is Probably ok ?

( P1 has fields for A & B pins, but that is not a pin-located counter.)
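For reference, the A/B/I quadrature form mentioned above can be sketched as a transition table over sampled pin states. The direction convention (00, 01, 11, 10 counts up), the sample format and the function name are assumptions for illustration; the thread does not specify how a pin pair would implement it.

```python
# Transitions of the 2-bit state (A << 1) | B that count as one step.
# Any other change (no change, or both bits flipping) adds nothing.
_STEP = {
    (0, 1): +1, (1, 3): +1, (3, 2): +1, (2, 0): +1,
    (1, 0): -1, (3, 1): -1, (2, 3): -1, (0, 2): -1,
}


def quadrature_count(samples):
    """Count quadrature steps from (a, b, index) samples.

    A high index sample resets the count to zero, as in the A/B/I form.
    """
    count = 0
    prev = None
    for a, b, index in samples:
        state = (a << 1) | b
        if index:
            count = 0
        elif prev is not None:
            count += _STEP.get((prev, state), 0)
        prev = state
    return count


forward = [(0, 0, 0), (0, 1, 0), (1, 1, 0), (1, 0, 0), (0, 0, 0)]
reverse = list(reversed(forward))
with_index = forward + [(0, 1, 1), (1, 1, 0)]
```

Here forward counts to 4, reverse to -4, and with_index ends at 1 because the index pulse clears the count before the final step.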

Lawson · 2014-04-07 22:34

cgracey wrote: »

Interesting idea about sharing signals with neighboring pins. The setup could be variably sized, based on a preamble. Definitely, it should fit in a long, just to put a sanity cap on things.

So that'd directly configure a simple 8 in 4 out crossbar switch. Set up the switch so that if 2 or more inputs feed an output they are AND'd or OR'd. Make the outputs a 1 clock delay back to the input, the pin, add the increment register to count, SHL count, move count to compare, and the LSB of count. (I know that's six) For inputs, have normal and inverted from the flop, the pin state, MSB count, LSB count, add carry, count > compare, count == compare, count < compare, left pin, right pin. (doh, that's 11) And it looks like my crazy idea would need some compression or a multi-level cross-bar switch to fit in a 32-bit word.

Marty
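The size problem is easy to check with a quick count, assuming a full crossbar that spends one enable bit per input/output pair. The optional extra bit per output for the AND/OR choice is also an assumption; the helper name is hypothetical.

```python
def crossbar_config_bits(inputs, outputs, combine_select=False):
    """Config bits for a full crossbar with one enable bit per crosspoint.

    combine_select adds one bit per output to choose AND or OR when
    several inputs feed the same output.
    """
    bits = inputs * outputs
    if combine_select:
        bits += outputs
    return bits


planned = crossbar_config_bits(8, 4)    # the 8 in, 4 out starting point
listed = crossbar_config_bits(11, 6)    # the 11 inputs and 6 outputs listed
```

The 8 x 4 case lands exactly on 32 bits, while the 11 x 6 list needs 66, which is why some compression or a multi-level switch would be needed to stay within one long.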

cgracey · 2014-04-07 22:37

This pin state machine could also set up the DAC for dithered output from ANY cog, rather than have the directly-connected cog do it for one of its four fast-access DACs:

DIRMSG dacpinsmask,dacconfig

...then repeat the following, as needed:

OUTMSG somedacpinmask,daclevel16

The pin state machine would dither the 9-bit 75ohm pin DAC with 7 sub-bits of dither to average a 16-bit level in 128 clocks (640ns).
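One way such a dither could work is sketched below, assuming a simple 7-bit carry accumulator; the post does not say how the sub-bits would be spread across the 128 clocks, and the function name is made up.

```python
def dither_frame(level16):
    """Return 128 nine-bit DAC codes whose sum equals level16.

    The top 9 bits set the base code; the low 7 bits are fed to a 7-bit
    accumulator whose carry bumps the code by one on some clocks.
    """
    base = level16 >> 7
    frac = level16 & 0x7F
    acc = 0
    codes = []
    for _ in range(128):
        acc += frac
        carry = acc >> 7       # 1 when the 7-bit accumulator overflows
        acc &= 0x7F
        codes.append(min(base + carry, 511))  # clamp to the 9-bit range
    return codes


codes = dither_frame(0x1234)
```

For 0x1234 the base code is 36 and the fraction is 52, so 52 of the 128 clocks output 37 and the rest output 36. The clamp means levels above 511 * 128 cannot be averaged exactly. At a 200 MHz clock, 128 clocks is the 640 ns quoted above.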

JRetSapDoog · 2014-04-07 23:14

cgracey wrote: »

How about making a simple, flexible state machine that is built onto each pin? It could do PWM, duty modulation, frequency measurement, state timing, ADC accumulation, even A/B encoding between two pins. It could be a UART, as well!

Could this flexible state machine also be smart enough to allow direct digital video to the pins (bypassing the DACs)? Yeah, I know that's not a counter application. And, no, I don't really have any idea where the video data would be pulled from (a long buffer stuffed by a cog/waitvid perhaps?) or any data bus details. I've probably wandered into the wrong thread. I was just hoping that maybe this is why you've held off on commenting on digital video (after all, you've had a full 24 hours to consider all of the ramifications of this new design, lol).

jmg · 2014-04-07 23:19

jmg wrote: »

This dual-counter topology would morph to the Frequency-Capture example, relatively easily, with some muxes and state logic, as both use 4 pin-local registers. (2 ctrs, 2 RW storage)

Expanding on this, the P1 CTR is equivalent to a counter loaded by Cv+FRQx, which becomes a standard counter when FRQx = 1 or -1
- that's a simple mux [FRQx : 0x001] to reconfig from P1_CTR to one of the 2 counters above. ( or +/-1 for all uses )

The other Counter can have the INC/DEC control for Quadrature, and this still fits in 4 pin-local registers (excluding the config)
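The add-FRQx view of the counter can be sketched in a few lines, assuming a 32-bit accumulator that wraps. The names ctr_step and run are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def ctr_step(count, frq):
    """One clock of the accumulator: count = count + frq, modulo 2**32."""
    return (count + frq) & MASK32


def run(count, frq, clocks):
    """Apply ctr_step for the given number of clocks."""
    for _ in range(clocks):
        count = ctr_step(count, frq)
    return count


up = run(0, 1, 5)               # FRQx = 1: plain up-counter
down = run(0, -1, 1)            # FRQx = -1: plain down-counter, wraps
nco = run(0, 0x4000_0000, 2)    # larger FRQx: bit 31 acts as an NCO output
```

With FRQx fixed at +1 or -1 the same adder serves as an ordinary counter, which is the mux described above; any other FRQx value gives the NCO behaviour.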

jmg · 2014-04-07 23:37

JRetSapDoog wrote: »

Could this flexible state machine also be smart enough to allow direct digital video to the pins (bypassing the DAC's)? Yeah, I know that's not a counter application. And, no, I don't really have any idea where the video data would be pulled from (a long buffer stuffed by a cog/waitvid perhaps?) or any data bus details.

The Pin-counters have a serial config link to the COG, so they are not high bandwidth items.
There are already request/questions for LCD style parallel output (ie bypassing the DAC's), P1 had something simple, so I'd expect this to improve a little.

Chip mentioned tasking was not complex, with no cache, and as there is no separate CLUT in this, one suggestion I made was to effectively 'task-toggle' between an Opcode and a Video 'CLUT' Fetch from data in COG RAM.
This Keeps the DIE saving of removing the CLUT, but allows one to be patched in, by using some COG bandwidth & RAM.

This form of 'task' toggle does not need a second copy of any PC.Z.C, so maybe it is better called something like
Video Memory Sharing
to use a term more from the PC world on cheap systems where they have one memory array and Code & Video share the bus.

cgracey · 2014-04-08 00:45

I looked at the Prop2 CTRs and found that we could do everything they do, right in the pin circuits, excepting the function generator and Goertzel, which we could do in software with a larger granularity. Anyway, the pin smarts can handle all of these functions from the Prop2 CTRs:

PWM
NCO
DUTY
time positive edges
time negative edges
time highs
time lows
count positive edges
count negative edges
count highs (ADC summation)
count lows

Additionally, these functions can be accomplished:

time highs AND lows, with IN serial message indicating what state happened last, along with duration value.
asynchronous UART w/16-bit baud
SYNCHRONOUS IN/OUT SHIFTER USING ADJACENT PIN AS CLOCK - this is the way to implement SERDES!
drive 9-bit DAC with 7 bits of sub-dither for 16-bit average
set I/O pin drive modes: fast, slow, resistor, current, inverter, Schmitt, feedback, etc.

I just realized that once a pin is configured, it needs to stay in its mode, regardless of DIR, because there are many pin driver configurations that use DIR and OUT together, just like under normal operation. So, the DIR serial message receiver must not affect the internal DIR state if it sees that fast message preamble. This is good for ADC modes, where you'll be able to enable calibration with DIR and calibration state with OUT.

Brian Fairchild · 2014-04-08 00:47

cgracey wrote: »

I looked at the Prop2 CTRs and found that we could do everything they do, right in the pin circuits, excepting the function generator and Goertzel,...

Then don't put function generation or Goertzel on the pins. KISS.

cgracey · 2014-04-08 00:54

Brian Fairchild wrote: »

Then don't put function generation or Goertzel on the pins. KISS.

Those are so complex that they need to be in the CTRs - but not in this chip's CTRs.

JRetSapDoog · 2014-04-08 01:02

jmg wrote: »

Chip mentioned tasking was not complex, with no cache, and as there is no separate CLUT in this, one suggestion I made was to effectively 'task-toggle' between an Opcode and a Video 'CLUT' Fetch from data in COG RAM. This Keeps the DIE saving of removing the CLUT, but allows one to be patched in, by using some COG bandwidth & RAM.

Some kind of dynamic patching or on-the-fly changing (for coloring) would be good.

jmg wrote: »

The Pin-counters have a serial config link to the COG, so they are not high bandwidth items.

Thanks, jmg. Carry on with the lower-speed stuff discussions.

cgracey · 2014-04-08 01:41

I'm listing out all the instructions that we need for the new chip and I just realized that the two instructions OUTMSG and WAITMSG, which communicate with pins in special modes, would also be prop-to-prop comms in normal pin mode, as their data would actually go over the pin.

Cluso99 · 2014-04-08 02:51

cgracey wrote: »

I looked at the Prop2 CTRs and found that we could do everything they do, right in the pin circuits, excepting the function generator and Goertzel, which we could do in software with a larger granularity. Anyway, the pin smarts can handle all of these functions from the Prop2 CTRs:

PWM
NCO
DUTY
time positive edges
time negative edges
time highs
time lows
count positive edges
count negative edges
count highs (ADC summation)
count lows

Additionally, these functions can be accomplished:

time highs AND lows, with IN serial message indicating what state happened last, along with duration value.
asynchronous UART w/16-bit baud
SYNCHRONOUS IN/OUT SHIFTER USING ADJACENT PIN AS CLOCK - this is the way to implement SERDES!
drive 9-bit DAC with 7 bits of sub-dither for 16-bit average
set I/O pin drive modes: fast, slow, resistor, current, inverter, Schmitt, feedback, etc.

I just realized that once a pin is configured, it needs to stay in its mode, regardless of DIR, because there are many pin driver configurations that use DIR and OUT together, just like under normal operation. So, the DIR serial message receiver must not affect the internal DIR state if it sees that fast message preamble. This is good for ADC modes, where you'll be able to enable calibration with DIR and calibration state with OUT.

This sounds really neat!

SYNCHRONOUS IN/OUT SHIFTER USING ADJACENT PIN AS CLOCK - this is the way to implement SERDES!

Does this mean that we could use the adjacent pin as an input or an output? ie externally clocked or internally clocked which also goes out?
This could then work for I2C and SPI ?

jmg · 2014-04-08 02:59

Cluso99 wrote: »

Does this mean that we could use the adjacent pin as an input or an output? ie externally clocked or internally clocked which also goes out?
This could then work for I2C and SPI ?

SPI slaves usually have a lower clock ceiling (higher min divide), because they cross clock domains.
With that proviso, SPI slave should be possible, but SPI usually maps to 3 pins.
There are other pin-each-side mappings mentioned, so that does not add any limits.

Heater. · 2014-04-08 03:13

Chip,

I just realized that the two instructions OUTMSG and WAITMSG, which communicate with pins in special modes, would also be prop-to-prop comms in normal pin mode, as their data would actually go over the pin.

Good grief, what have you stumbled into?

How fast are the messages?

How many channels could this provide?

Am I now seeing arrays of Propellers connected with multiple high speed serial channels?

Pinch me someone, I'm dreaming.

jmg · 2014-04-08 03:19

cgracey wrote: »

I looked at the Prop2 CTRs and found that we could do everything they do, right in the pin circuits, excepting the function generator and Goertzel, which we could do in software with a larger granularity. Anyway, the pin smarts can handle all of these functions from the Prop2 CTRs:

PWM
NCO
DUTY
time positive edges
time negative edges
time highs
time lows
count positive edges
count negative edges
count highs (ADC summation)
count lows

Additionally, these functions can be accomplished:

time highs AND lows, with IN serial message indicating what state happened last, along with duration value.
asynchronous UART w/16-bit baud
SYNCHRONOUS IN/OUT SHIFTER USING ADJACENT PIN AS CLOCK - this is the way to implement SERDES!
drive 9-bit DAC with 7 bits of sub-dither for 16-bit average
set I/O pin drive modes: fast, slow, resistor, current, inverter, Schmitt, feedback, etc.

I just realized that once a pin is configured, it needs to stay in its mode, regardless of DIR, because there are many pin driver configurations that use DIR and OUT together, just like under normal operation. So, the DIR serial message receiver must not affect the internal DIR state if it sees that fast message preamble. This is good for ADC modes, where you'll be able to enable calibration with DIR and calibration state with OUT.

All sounding great

I presume simpler edge Captures are included ? - and you could continually time-stamp edges
(_/= -> CaptA, =\_ -> CaptB) to any pulse width and a period limited by the config-bus read rate.

What is the serial config-bus read rate, across 1 COG and over all COGs?

This may limit the upper speeds of SPI ?

How many 32 bit registers map for this, (ignoring config bits for now) ?

( The above examples were 4 x 32 as RMW )

I think a UART can fit into 4 x 32 bits, as can SPI up to 32 bits.
The shifter-registers are hidden from the pin-serial bus, 32b UART needs 2 shifters.

Baud Config can include Bit-Width, 1..32 bits, and fractional Baud is common these days.

jmg · 2014-04-08 03:23

Heater. wrote: »

Chip,

Good grief, what have you stumbled into?

How fast are the messages?

How many channels could this provide?

Am I now seeing arrays of Propellers connected with multiple high speed serial channels?

Pinch me someone, I'm dreaming.

It does sound too good to be true

I think the gotcha is that OUTMSG and WAITMSG, which communicate with pins in special modes, are themselves serial, and it is not clear how many COGs can talk at the same time in this special mode?

Cluso99 · 2014-04-08 03:32

Heater. wrote: »

Chip,

Good grief, what have you stumbled into?

How fast are the messages?

How many channels could this provide?

Am I now seeing arrays of Propellers connected with multiple high speed serial channels?

Pinch me someone, I'm dreaming.

Consider yourself pinched - go back to the opium den now, do not pass go!

From my limited understanding, the clocking is 200MHz (2x cog speed)

But as jmg said, I don't know how many OUTMSG/WAITMSG instructions plus processing or forwarding to/from hub you could perform. But we have 16 cogs!!!

This is way better than abusing the video gen in P1 to output serial

Baggers · 2014-04-08 04:00

OMG, enough said!

Chip, this is AWESOME!

Kerry S · 2014-04-08 05:15

Chip,

This would have AMAZING applications in process control. It would take my little project to a whole new level that I never thought would be possible.

This would be worth having 2 P1+ chips being used ( 1 for HMI/SDRAM and 1 for I/O ) so the +16 I/O to get to 80 per chip would not be as important to me. Though I still think, from a marketing standpoint, you need at least 32 I/O left after SDRAM.

Brian Fairchild · 2014-04-08 05:23

cgracey wrote: »

...I just realized that the two instructions OUTMSG and WAITMSG, which communicate with pins in special modes, would also be prop-to-prop comms in normal pin mode, as their data would actually go over the pin.

Is this some new feature that Chip has just slipped into the conversation or is it something from the P2 ? I've searched the other topics for those instructions and nothing comes up.

cgracey · 2014-04-08 05:25

jmg wrote: »

It does sound too good to be true

I think the gotcha is that OUTMSG and WAITMSG, which communicate with pins in special modes, are themselves serial, and it is not clear how many COGs can talk at the same time in this special mode?

If a cog doesn't configure a pin into some special mode that traps IN/OUT signals for modal use, any OUTMSG packets will travel over the pins, as long as the corresponding DIR bits are high to enable output. Another cog (in another chip) in a WAITMSG instruction can receive it, as if it was coming from his own pin's state machine. The messages are 32 bits and encoded as %010_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx_0, with bits advancing on every clock (100 Mbaud @ 200MHz clock). So, each cog can receive and send, but in a half-duplex fashion, as these instructions stall the cog.
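The framing described above (a %010 preamble, 32 data bits, and a trailing 0) can be sketched as a pair of helpers. Sending the data most significant bit first is an assumption, as are the function names; the post does not state the bit order.

```python
PREAMBLE = (0, 1, 0)   # the %010 start pattern
FRAME_BITS = 36        # 3 preamble + 32 data + 1 trailing zero


def encode_msg(value):
    """Return the 36 frame bits for a 32-bit value, data MSB first (assumed)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 32 bits")
    data = [(value >> i) & 1 for i in range(31, -1, -1)]
    return list(PREAMBLE) + data + [0]


def decode_msg(bits):
    """Recover the 32-bit value from a frame, checking preamble and tail."""
    if len(bits) != FRAME_BITS:
        raise ValueError("frame must be 36 bits long")
    if tuple(bits[:3]) != PREAMBLE or bits[-1] != 0:
        raise ValueError("bad preamble or trailing bit")
    value = 0
    for bit in bits[3:35]:
        value = (value << 1) | bit
    return value


frame = encode_msg(0xDEADBEEF)
roundtrip = decode_msg(frame)
```

Each 32-bit message therefore occupies 36 bit times on the wire, before any gap between frames.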

Anyway, the pin state machines, when in their modes, can do all sorts of things involving themselves and their neighbors. The cogs' only interface to whatever this may become is these three instructions:

DIRMSG - configure pin(s) based on port mask via DIR bit(s)
OUTMSG - send message to pin(s) based on port mask via OUT bit(s)
INMSG - wait for incoming message on a pin via its IN bit (instruction was called WAITMSG just a minute ago)

I really like having a simple interface on the cog side, allowing the pin brains to be developed separately. They are not going to be that complicated, but trying to design them into the cog would be a tenuous effort. Also, ALL pins can be put into PWM, DAC, NCO, whatever mode, and run continuously on their own, accepting updated settings from any cog.

T Chap · 2014-04-08 05:47

I would like to have a Pos Edge CTR mode that allows you to input a value for the counter to count up to, fire a pin high, reset the counter, take the pin low. There are cases where in PASM the time for code to do this takes too long.

cgracey · 2014-04-08 05:57

T Chap wrote: »

I would like to have a Pos Edge CTR mode that allows you to input a value for the counter to count up to, fire a pin high, reset the counter, take the pin low. There are cases where in PASM the time for code to do this takes too long.

So, is the high pin pulse just one clock?

T Chap · 2014-04-08 07:36

Well, to be honest, that is subject to the application, clock speed etc. as to how long the pulse would need to be to be recognized by the device watching it. In my case, I have not determined the shortest possible clock pulse that would work. The application is that the Prop is watching a clock from another device. The Prop is to fire every X pulses. On the Prop 1, with a 12.288 xtal, several PASM instructions are too long to track the incoming pulses. This is really just a wish list; maybe someone else could make use of it. I am happy to just get 16 cogs! Perhaps one clock is sufficient. Probably some capacitance would help solve it if it is too short.
