Putting smarts into the I/O pins

cgracey · 2014-04-09 15:14

tonyp12 wrote: »

>This is the table from the preliminary P2 pdf:

I don't see an open-drain mode. If you bit-bang a pin you can use DIR to do it, but if every type of digital serial output only has a push-pull mode, that is not good.

Look at the %111 pattern in the table. You can select 'float' for high or low OUT conditions.

tonyp12 · 2014-04-09 15:28

>Look at the %111 pattern in the table. You can select 'float' for high or low OUT conditions.
I did not know that you could mix and match those for high and low side of the totem pole, but Tubular explained that.

Rayman · 2014-04-09 15:51

Here's a possibly dumb idea for the smart pins:

Would it make sense to specify, say a CON long value, that at startup would set the pull-up resistor state of each pin?

I've sometimes thought it would be nice if all the Prop pins had a weak pull up.
But, maybe you don't always want that and not on all pins anyway...

jmg · 2014-04-09 15:51

jmg wrote: »

I focused on the inner Cell, with Boolean and Data Paths needed for all operations. Two of these 32b cells are needed.

I recoded the Verilog to move the ClockEnable to a more explicit node, and the tools did better.
This splits the logic, so Clock enable does not widen the Data Path, but has its own logic tree.
Might play with the RST line next, as I think a DFFCER register would code the smallest.

The report improved: fewer slices and a faster clock reported
Number of SLICEs: 77 out of 4864 (2%)
Number of Clock Enables: 1
Net QC_CE: 24 loads, 24 LSLICEs
Number of ripple logic: 16 (32 LUT4s)

Re-coding to focus on _CE and _SR shrinks it down to 63 Slices, and also boosts CLK speeds again.

Then I realised another Boolean (ForceRL) is needed, to support this PWM mode From #35

3) if X == 0 then X=frame_count and Y=on_time, else (X--, if Y<>0 then Y--, pin output = Y<>0) ]

and the report is now
// Number of SLICEs: 67 out of 4864 (1%)

Control Booleans are now
RST, SW_RMW, Enable, Dirn, SatMode, ForceRL
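
For anyone following along, the PWM rule quoted above can be modelled in a few lines of Python. This is only a sketch and not the Verilog being discussed: the function name `pwm_trace` is hypothetical, the pin is assumed to hold its previous value on the reload clock, and the test `Y<>0` is assumed to see the value of Y before the decrement (as a clocked register would).

```python
def pwm_trace(frame_count, on_time, clocks):
    """Model: if X == 0 then reload X and Y, else count down and drive the pin."""
    x = y = 0
    pin = 0
    trace = []
    for _ in range(clocks):
        if x == 0:
            # Reload clock: X = frame_count, Y = on_time, pin assumed unchanged.
            x, y = frame_count, on_time
        else:
            # Assumption: pin is driven from Y as it was before this clock's decrement.
            pin = 1 if y != 0 else 0
            x -= 1
            if y != 0:
                y -= 1
        trace.append(pin)
    return trace


if __name__ == "__main__":
    print(pwm_trace(frame_count=4, on_time=2, clocks=10))
```

Under those assumptions the frame repeats every frame_count + 1 clocks, with the pin high for on_time of them once the first frame has started.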

jmg · 2014-04-09 15:57

Rayman wrote: »

Here's a possibly dumb idea for the smart pins:

Would it make sense to specify, say a CON long value, that at startup would set the pull-up resistor state of each pin?

I've sometimes thought it would be nice if all the Prop pins had a weak pull up.
But, maybe you don't always want that and not on all pins anyway...

Usually a default port state is applied during reset, but I think you are talking about something loaded during boot?

Floating pins during reset are not ideal, as the power can be higher or less defined, and you force users to place resistors on all pins that must be defined.

tonyp12 · 2014-04-09 16:09

>and you force users to place resistors on all pins that must be defined
It would be cool if you could write to a "flash" register that holds weak pull-up/down settings that take effect immediately on power-up, saving the user from having to install external ones

jmg · 2014-04-09 16:24

tonyp12 wrote: »

>and you force users to place resistors on all pins that must be defined
It would be cool if you could write to a "flash" register that holds weak pull-up/down settings that take effect immediately on power-up, saving the user from having to install external ones

There is no flash, but there are Fuses, which could possibly define a global Pin action, on Async Reset
(ie ideally, this applies to the pins, even with no clocks running)
RST pins should also have non-clocked noise-filters, so spikes under ~ 50ns are ignored.

2 Fuses could cover AllPinsRESET states of [LightPullDown / LightPullUp / Floating / PinKeep]

tonyp12 · 2014-04-09 16:34

>There is no flash,
I know there is no flash, so I used "". Would it be hard to get 4 longs' worth of magnetic RAM or any type of NV RAM in the design?
And then one long is used for the pin state at reset/power-up, to take effect immediately.

Fuses should at least be for groups of 16 pins, as some devices need to be kept high and some low at boot-up.
Or make 2 groups pull-up and 2 groups pull-down by default without the use of fuses, and the user simply has to use the groups that fit the needs.

jmg · 2014-04-09 16:50

tonyp12 wrote: »

>There is no flash,
I know there is no flash, so I used "". Would it be hard to get 4 longs' worth of magnetic RAM or any type of NV RAM in the design?
And then one long is used for the pin state at reset/power-up, to take effect immediately.

Yes, it would be hard to get magnetic ram or any type of NV ram in the design

tonyp12 wrote: »

Fuses should at least be for groups of 16 pins, as some devices need to be kept high and some low at boot-up.
Or make 2 groups pull-up and 2 groups pull-down by default without the use of fuses, and the user simply has to use the groups that fit the needs.

I guess this comes down to how many fuses are 'spare' and how easy they are to link in with RST ?

Grouping by 16 pins only bumps the fuses needed to 8.

Split default is not a bad idea, but does require users read the manual when doing a PCB design - optimistic ?
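
The fuse arithmetic in this exchange can be sketched as follows. This is an illustration only: the 64-pin total is an assumption inferred from "groups of 16" needing 8 fuses, and the 2-bit encoding order, the `ResetState` enum, and the helper names are hypothetical.

```python
from enum import Enum


class ResetState(Enum):
    # The four states named in the thread; the numeric codes are assumed.
    LIGHT_PULL_DOWN = 0
    LIGHT_PULL_UP = 1
    FLOATING = 2
    PIN_KEEP = 3


PINS = 64            # assumption: inferred from 8 fuses / 2 per group * 16 pins
GROUP_SIZE = 16
FUSES_PER_GROUP = 2  # 2 fuses select one of the 4 reset states


def fuses_needed(pins=PINS, group_size=GROUP_SIZE):
    """Number of fuses when every group of pins gets its own 2-bit setting."""
    groups = -(-pins // group_size)  # ceiling division
    return groups * FUSES_PER_GROUP


def reset_state(pin, fuse_bits, group_size=GROUP_SIZE):
    """Look up the reset state of one pin from the packed fuse bits."""
    group = pin // group_size
    code = (fuse_bits >> (group * FUSES_PER_GROUP)) & 0b11
    return ResetState(code)


if __name__ == "__main__":
    print(fuses_needed())                   # per-16-pin grouping
    print(fuses_needed(group_size=PINS))    # one global setting
    print(reset_state(17, 0b11_10_01_00))   # pin 17 is in group 1
```

With these assumed numbers a single global setting costs 2 fuses and per-16-pin grouping costs 8, which matches the figures quoted above.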

tonyp12 · 2014-04-09 16:53

I want a few longs of these in the P1+

Pros with SONOS technology is its ease of integration in CMOS (only two additional masks).
This allows the NV cell to be located immediately adjacent to the 6T SRAM cell in each memory bit making the transfers between SRAM to NV all happen in parallel and at very low power levels.

http://www.cypress.com/?docID=45736

jmg · 2014-04-09 17:02

tonyp12 wrote: »

I want a few longs of these in the P1+

Pros with SONOS technology is its ease of integration in CMOS (only two additional masks).
This allows the NV cell to be located immediately adjacent to the 6T SRAM cell in each memory bit making the transfers between SRAM to NV all happen in parallel and at very low power levels.

Sounds nice, but whilst that (only two additional masks) is very easy to say, it has a massive cost impact.

Chip could do a 'reality check' with OnSemi, to confirm his fuses are still the smallest/cheapest/(lowest risk) way to do some config?

Rayman · 2014-04-10 03:26

Wonder if it would make any sense to have 32 pins be just digital I/O...
Would leave more room for the other pins to be smarter maybe...

dMajo · 2014-04-10 05:29

Ref. http://forums.parallax.com/showthread.php/155145-Putting-smarts-into-the-I-O-pins?p=1258113&viewfull=1#post1258113

In the table below, I do not understand how I can compare an analog input (1111_0LLLLLLLL) to a value, and read or wait in one place on the comparator logic output, while I am also reading the incoming value using the ADC on the same pin (1111_10010xxxx). Can these modes be mixed in some way to route the comparator logic output to an external pin? It seems not.
It seems that when using comparator mode no ADC is possible because IN is used for the comparator output, isn't it?

I think that a pin can always be fed to the comparator, so when it is used as an ADC its DAC can set the compare level, and the comparator output can be made available to the external world through the adjacent pin, the same one that will also bring the digital state into the cogs through the IN register, which acts as feedback of the real pin state when the pin is configured as an output.
It's true that two pins can be externally shorted, one for the ADC and the other for the comparator function, but the comparator's quick response to the external world is still missing.

Am I wrong?

Ariba wrote: »

This is the table from the preliminary P2 pdf:

Seairth · 2014-04-10 05:42

cgracey wrote: »

The user doesn't know about the preamble bits. They are generated and received quietly.

When a pin is in some special modes, it streams 32-bit messages, separated by 32 or more clocks, so when you do a MSGINA/MSGINB, the cog waits for the dead space and then receives the next message. If it never gets a message in time, it returns C=1. Messages are always coming via INA/INB for pins configured to modes where messaging is needed.

If configuring a pin for 100k pull-up / strong low output, there would be no ongoing messaging from INA because the pin would just need to behave logically as a normal I/O.

Okay. This is starting to make a bit more sense. Now, for the next set of questions (bear with me):

Does every pin have its own smarts?
When MSGINx is called, is it looking at just one pin?
Is it possible to have some pins in INA (or INB) configured for smarts while others are raw digital I/O?
If the message is being serially transmitted to the waiting COG, does that mean that MSGINx will always block for at least 32 (36?) clock cycles?

You mentioned earlier that the messages from a smart pin are repeatedly sent to any listening cogs.

If the message is repeated without delay, would that mean that MSGINx could, at best, receive every other message (due to missing the beginning of the next preamble)?
Also, if the message is repeated without delay, why wait up to 64 clock cycles to time out?
If there can be a delay between messages, couldn't you have a potential race condition, where you have just missed the preamble of a message and there is a 32-clock delay between that message and the next?
For data that was received serially on the smart pin (UART, SPI, I2C, etc.), how do we distinguish between a new 32-bit value and a repeated 32-bit value?

Or do I have this all wrong, and there is a 32-bit buffer for each pin that holds the current message, which is either overwritten by the smart pin if there is new data or cleared by a COG when it is read via MSGINx?

(sorry. this is all still kind of fuzzy to me.)

Jack Buffington · 2014-04-10 08:41

Rayman wrote: »

Wonder if it would make any sense to have 32 pins be just digital I/O...
Would leave more room for the other pins to be smarter maybe...

One of the things that I really like a lot about the propeller is that layout is a breeze because every pin is identical. Your board layout isn't defined by pin locations for the various peripherals.

jmg · 2014-04-10 14:46

Rayman wrote: »

Wonder if it would make any sense to have 32 pins be just digital I/O...
Would leave more room for the other pins to be smarter maybe...

Depends on the area of each cell.

I've been trialing merging NCO into a Pin Cell, and there is some 'bobble' on the Lattice tools reports, but I can get > 210MHz with a non-NCO merged cell using Lattice LFE3 -9 grade.

Adding NCO as a merged option needs a MUX and a wide adder, plus another control signal, and seems to add ~30% to the Logic, and slows down by 8~20%

Addit : If I pipeline the MUX, it adds 32 FF, but helps the speed issue.
Report now has 68 Slices, ( 75FF 120 LUT4) and ~223MHz with NCO adder merged.

Addit2: Adding a CY out on the Adder for other NCO modes, bobbles the data again, and it now has 98 Slices, ( 193 LUT4) and ~246MHz with CY added
Went from
ripple logic: 17 (34 LUT4s) to ripple logic: 34 (68 LUT4s)
so looks like the tools duplicated some paths, maybe to meet timing (set to 220MHz )
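
For readers unfamiliar with the term, an NCO here is a phase accumulator: a wide adder adds a frequency word every clock, and the carry out (the "CY" mentioned above) marks each wrap. The sketch below is a generic software model, not the cell being synthesized; the name `nco_carries` and the 32-bit default width are assumptions.

```python
def nco_carries(freq_word, clocks, width=32):
    """Return the adder's carry-out bit for each clock of a phase accumulator."""
    mask = (1 << width) - 1
    acc = 0
    carries = []
    for _ in range(clocks):
        total = acc + freq_word
        carries.append(total >> width)  # 1 when the accumulator wraps
        acc = total & mask
    return carries


if __name__ == "__main__":
    # A frequency word of 2**30 wraps a 32-bit accumulator every 4 clocks.
    print(nco_carries(1 << 30, 8))
```

The average carry rate is f_clk * freq_word / 2**width, which is why the adder has to be as wide as the register and ends up on the critical timing path.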

cgracey · 2014-04-11 08:00

dMajo wrote: »

Ref. http://forums.parallax.com/showthread.php/155145-Putting-smarts-into-the-I-O-pins?p=1258113&viewfull=1#post1258113

In the table below, I do not understand how I can compare an analog input (1111_0LLLLLLLL) to a value, and read or wait in one place on the comparator logic output, while I am also reading the incoming value using the ADC on the same pin (1111_10010xxxx). Can these modes be mixed in some way to route the comparator logic output to an external pin? It seems not.
It seems that when using comparator mode no ADC is possible because IN is used for the comparator output, isn't it?

I think that a pin can always be fed to the comparator, so when it is used as an ADC its DAC can set the compare level, and the comparator output can be made available to the external world through the adjacent pin, the same one that will also bring the digital state into the cogs through the IN register, which acts as feedback of the real pin state when the pin is configured as an output.
It's true that two pins can be externally shorted, one for the ADC and the other for the comparator function, but the comparator's quick response to the external world is still missing.

Am I wrong?

I never thought about running the DAC and level comparator at the same time, but it could be done. Those modes in the list will change quite a bit now. We can have a mode where the DAC outputs while the IN signal feeds back the level comparator's state. You could make a closed-loop voltage source with current measurement on one pin then.

cgracey · 2014-04-11 08:09

Seairth wrote: »

Okay. This is starting to make a bit more sense. Now, for the next set of questions (bear with me):
Does every pin have its own smarts?

When MSGINx is called, is it looking at just one pin?

Is it possible to have some pins in INA (or INB) configured for smarts while others are raw digital I/O?

If the message is being serially transmitted to the waiting COG, does that mean that MSGINx will always block for at least 32 (36?) clock cycles?

You mentioned earlier that the messages from a smart pin are repeatedly sent to any listening cogs.
If the message is repeated without delay, would that mean that MSGINx could, at best, receive every other message (due to missing the beginning of the next preamble)?

Also, if the message is repeated without delay, why wait up to 64 clock cycles to time out?

If there can be a delay between messages, couldn't you have a potential race condition, where you have just missed the preamble of a message and there is a 32-clock delay between that message and the next?

For data that was received serially on the smart pin (UART, SPI, I2C, etc.), how do we distinguish between a new 32-bit value and a repeated 32-bit value?

Or do I have this all wrong, and there is a 32-bit buffer for each pin that holds the current message, which is either overwritten by the smart pin if there is new data or cleared by a COG when it is read via MSGINx?

(sorry. this is all still kind of fuzzy to me.)

Every pin has smarts, ideally, but it doesn't need to be the case.

MSGIN waits briefly for a message coming from a single pin. It returns C=1 if no message received after ~32 clocks of waiting for one to start. Otherwise, C=0 and D=message. MSGIN stalls the cog.

Every pin is in raw digital mode until you configure it otherwise.

I'm thinking that messages need to be encoded in such a way that you'd never get a partial message. This could be done by inserting a '0' bit between every Nth data bit, and use a string of N+1 '1' bits to start the message.

Distinguishing between new and repeated values can be handled differently, according to what you're doing. If you set up a bunch of pins for ADC integration, you'll know based on CNT when new samples would be ready. Otherwise, you could strobe the OUT pin to reset a counter in a mode, etc.
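
The encoding idea above (a '0' stuffed after every N data bits, with a run of N+1 '1' bits as the start marker) can be sketched in Python. This is one possible reading, not a defined format: N=4, MSB-first order, the extra '0' delimiter after the start marker, and the names `encode_message` / `decode_message` are all assumptions made for the example.

```python
def encode_message(value, n=4, width=32):
    """Frame a value: N+1 ones, an assumed '0' delimiter, then N data bits + '0', repeated."""
    bits = format(value & ((1 << width) - 1), f"0{width}b")  # MSB first (assumed)
    out = ["1" * (n + 1), "0"]
    for i in range(0, width, n):
        out.append(bits[i:i + n])
        out.append("0")  # stuffed bit: data can never show N+1 ones in a row
    return "".join(out)


def decode_message(stream, n=4, width=32):
    """Find the first start marker in a bit string and return the value, or None."""
    start = stream.find("1" * (n + 1))
    if start < 0:
        return None  # no message start seen
    pos = start + n + 2  # skip the marker and the assumed delimiter
    data = []
    while len(data) < width:
        take = min(n, width - len(data))
        chunk = stream[pos:pos + take + 1]
        if len(chunk) < take + 1 or chunk[-1] != "0":
            return None  # truncated or corrupted frame
        data.extend(chunk[:take])
        pos += take + 1
    return int("".join(data), 2)


if __name__ == "__main__":
    frame = encode_message(0x12345678)
    # A listener that joins halfway through one frame still locks onto the next.
    late = encode_message(0xFFFFFFFF)[7:] + "000" + frame
    print(len(frame), hex(decode_message(late)))
```

With N=4 a 32-bit message costs 46 bit times under these assumptions, so the price of never seeing a partial message is a frame noticeably longer than 32 clocks.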

ctwardell · 2014-04-11 08:20

cgracey wrote: »

MSGIN waits briefly for a message coming from a single pin. It returns C=1 if no message received after ~32 clocks of waiting for one to start. Otherwise, C=0 and D=message. MSGIN stalls the cog.

cgracey wrote: »

I'm thinking that messages need to be encoded in such a way that you'd never get a partial message. This could be done by inserting a '0' bit between every Nth data bit, and use a string of N+1 '1' bits to start the message.

How about just having the messages synchronized to a signal that occurs every 32 clocks. That would avoid needing the preamble and all the logic to make sure a frame is properly aligned.

C.W.

Lawson · 2014-04-11 08:56

cgracey wrote: »

I never thought about running the DAC and level comparator at the same time, but it could be done. Those modes in the list will change quite a bit now. We can have a mode where the DAC outputs while the IN signal feeds back the level comparator's state. You could make a closed-loop voltage source with current measurement on one pin then.

Using the DAC and comparator to set a variable pin threshold will be very useful for level translation. For me, being able to set a 10-100mV input threshold level would be great for optical triggers.

A voltage source with simultaneous current measurement is also super interesting for device self-test. It would allow measurement of the impedance as a function of frequency of all circuits connected to the chip. With a testing schedule and some logging, it should allow the chip to spot circuit degradation before failure. (impedance at DC and 2-3 frequencies would likely be sufficient)

Marty

cgracey · 2014-04-11 09:13

ctwardell wrote: »

How about just having the messages synchronized to a signal that occurs every 32 clocks. That would avoid needing the preamble and all the logic to make sure a frame is properly aligned.

C.W.

Excellent idea!!! This would simplify things quite a bit, speed things up, and make things deterministic again. It would, though, disallow phenomena that weren't some multiple of 32 clocks. That needs some considering.
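
The timing consequence of a fixed 32-clock frame signal can be worked out with a short sketch. It assumes frames start whenever the system counter is a multiple of 32 and that a reader must wait for the next frame start and then take 32 clocks to receive it; the helper names are hypothetical.

```python
FRAME_CLOCKS = 32


def clocks_until_frame(cnt, frame=FRAME_CLOCKS):
    """Clocks to wait from counter value cnt until the next frame boundary."""
    return (-cnt) % frame


def read_latency(cnt, frame=FRAME_CLOCKS):
    """Wait for the boundary, then receive one whole frame."""
    return clocks_until_frame(cnt, frame) + frame


if __name__ == "__main__":
    worst = max(read_latency(c) for c in range(FRAME_CLOCKS))
    best = min(read_latency(c) for c in range(FRAME_CLOCKS))
    print(best, worst)
```

Under these assumptions a read takes between 32 and 63 clocks and is fully predictable from CNT, but anything a pin measures is only reported on 32-clock boundaries.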

cgracey · 2014-04-11 09:26

Lawson wrote: »

Using the DAC and comparator to set a variable pin threshold will be very useful for level translation. For me, being able to set a 10-100mV input threshold level would be great for optical triggers.

A voltage source with simultaneous current measurement is also super interesting for device self-test. It would allow measurement of the impedance as a function of frequency of all circuits connected to the chip. With a testing schedule and some logging, it should allow the chip to spot circuit degradation before failure. (impedance at DC and 2-3 frequencies would likely be sufficient)

Marty

So, we need a mode where the 8-bit level comparator output goes to IN and when DIR=1, OUT controls the DAC where OUT=1 means output preset DAC value and OUT=0 means output DAC level 0. So, the DIR/OUT/IN relationships are like that of a normal logic pin, but the output is one of two DAC levels. We could actually set the high and low output values in the config command.
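
A small behavioural model may make the proposed DIR/OUT/IN relationship easier to picture. This is a sketch of the idea as described, not a register definition: the class `DacComparatorPin`, its field names, and treating the pin voltage as an 8-bit code are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DacComparatorPin:
    high_level: int        # 8-bit DAC code driven when OUT=1 (preset in config)
    low_level: int = 0     # DAC code driven when OUT=0
    threshold: int = 128   # 8-bit comparator level (hypothetical config field)
    dir_bit: int = 0
    out_bit: int = 0

    def driven_level(self) -> Optional[int]:
        """DAC code on the pin, or None when DIR=0 and the pin is not driven."""
        if not self.dir_bit:
            return None
        return self.high_level if self.out_bit else self.low_level

    def read_in(self, pin_level: int) -> int:
        """IN reflects the level comparator, whatever the DAC is doing."""
        return 1 if pin_level > self.threshold else 0


if __name__ == "__main__":
    pin = DacComparatorPin(high_level=200)
    pin.dir_bit, pin.out_bit = 1, 1
    # A heavy load might pull the actual pin level below what the DAC asks for.
    print(pin.driven_level(), pin.read_in(150), pin.read_in(100))
```

Reading IN while driving is what would let software notice that the load is dragging the pin away from the requested level.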

dMajo · 2014-04-11 09:47

cgracey wrote: »

I never thought about running the DAC and level comparator at the same time, but it could be done. Those modes in the list will change quite a bit now. We can have a mode where the DAC outputs while the IN signal feeds back the level comparator's state. You could make a closed-loop voltage source with current measurement on one pin then.

Yes.
But my primary intention was to be able to read the analog value on the pin while at the same time having a comparator on it. I thought that the DAC of the same pin (routed to the comparator) could be used to set the compare reference/level, since the pin is an input. If there is another register to set the comparator level then there is no need to use the DAC.
I'd like the comparator output to be directly available externally through, e.g., the adjacent pin's OR circuitry, where all the cogs' outputs get summed. The comparator can have its own enable control, of course. The comparator output would still be available to the cogs this way, because the IN of this adjacent pin always shows the real state on the pin, and thus the comparator output possibly summed with other cogs' outputs. This is then up to the programmer.

This can be used for diagnostics: one cog can sample the ADC even at slow rates (waiting on a clock), while another cog can wait on the adjacent pin (comparator output) to see if something is wrong with the incoming signal and react fast. The whole design can be power aware (power saving). Think of weather sensors/stations on battery and/or small solar cells, portable data loggers.... Now you have a one-chip digital and analog solution with considerably bigger RAM and no external components wasting power; the uC should also be green.
Another example is the control of power electronics with IGBTs: the ADC can, e.g., read the load current for process control and the adjacent pin can output PWM to the IGBT, but if you have an overcurrent or a short, the OR-ed comparator output can immediately shut down the PWM pulses, saving the power components. Otherwise you need a bigger inductance to slow the current rise and give the software time to react, or you need external hardware for the protection.
I know that this can be done by polling the ADC channel and comparing in software, but the point here is to do it quickly without polling and to have the information available both internally and externally.

The same can apply to 4..20mA loop conversion: the ADC can read the signal value when needed, at constant rates, but the comparator can immediately flag an under-4mA condition.
Perhaps it is worthwhile considering how two adjacent pins, using only, e.g., a transistor and a shunt resistor, could use their ADC and DAC to output a closed-loop 4..20mA signal, reducing the external component count.
In the industrial field 4..20mA is used almost everywhere in place of 1V/10V analog signals because it is immune to electrical noise.
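
As a worked illustration of the 4..20mA case: the slow ADC path converts loop current to a process value, while the fast comparator path only has to answer "is the current below 4mA?". The sketch below shows just that arithmetic; the function names and the optional margin are hypothetical and nothing here models the pin hardware.

```python
def loop_percent(current_ma):
    """Map a 4..20 mA loop current onto 0..100 % of the measured range."""
    return (current_ma - 4.0) / 16.0 * 100.0


def loop_fault(current_ma, margin_ma=0.0):
    """True when the current is below the live-zero level (broken wire, dead sensor)."""
    return current_ma < 4.0 - margin_ma


if __name__ == "__main__":
    for ma in (3.5, 4.0, 12.0, 20.0):
        print(ma, loop_percent(ma), loop_fault(ma))
```

The comparator threshold would correspond to the 4mA level across the shunt, so the fault check needs no ADC conversion at all.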

dMajo · 2014-04-11 09:51

Chip
BTW, is the comparator working in the analog or digital domain? I mean, is it an analog comparator for which one input is the pin and the other a variable voltage reference, or is it a digital comparator that compares the ADC output with a latch/register?

Seairth · 2014-04-11 10:38

cgracey wrote: »

Every pin has smarts, ideally, but it doesn't need to be the case.

MSGIN waits briefly for a message coming from a single pin. It returns C=1 if no message received after ~32 clocks of waiting for one to start. Otherwise, C=0 and D=message. MSGIN stalls the cog.

Every pin is in raw digital mode until you configure it otherwise.

I'm thinking that messages need to be encoded in such a way that you'd never get a partial message. This could be done by inserting a '0' bit between every Nth data bit, and use a string of N+1 '1' bits to start the message.

Distinguishing between new and repeated values can be handled differently, according to what you're doing. If you set up a bunch of pins for ADC integration, you'll know based on CNT when new samples would be ready. Otherwise, you could strobe the OUT pin to reset a counter in a mode, etc.

I suggest placing a 32-bit input buffer on every pin:

The smart pin would set this buffer, plus a "data ready" flag, every time it has new data.
If there was already a value in the buffer, it just gets overwritten. I suppose you could add configuration to drop the new message instead, but that can be a future improvement.
The value wouldn't necessarily need to be 32 bits long (the code that set up the pin would know how many bits to expect). Which means that a new value can be available more frequently than every 32 clock cycles.
MSGINx would read the buffer (and clear the flag), or set C to indicate that the buffer was empty. Since this is really more like the WAITxxx instructions, you could still allow for a blocking mode (without timeout), while the polling mode would always return immediately.
I wouldn't expect multiple cores to be reading the same smart pin, so this doesn't necessarily need to be protected like a SYSOP.
During the MSGINx call, I'd think you could use the 32-bit INx data channel to read the buffer (since no other instruction would be reading INx at that time anyhow). This might be complicated a bit if the CTRx stuff stays in the core.
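
The buffer-plus-flag suggestion above can be sketched as a tiny model. It is only an illustration of the polling behaviour described: the class `SmartPinBuffer` and its method names are hypothetical, and the return value mimics the C flag and D register as a `(c, d)` tuple.

```python
class SmartPinBuffer:
    """One 32-bit input buffer with a data-ready flag, as suggested for each pin."""

    def __init__(self):
        self.value = 0
        self.ready = False

    def pin_write(self, value):
        # The smart pin stores new data; an unread older value is simply overwritten.
        self.value = value & 0xFFFFFFFF
        self.ready = True

    def msgin_poll(self):
        """Non-blocking read: returns (C, D). C=1 means the buffer was empty."""
        if not self.ready:
            return 1, 0
        self.ready = False  # reading clears the flag
        return 0, self.value


if __name__ == "__main__":
    buf = SmartPinBuffer()
    print(buf.msgin_poll())   # nothing received yet
    buf.pin_write(5)
    buf.pin_write(7)          # overwrites the unread 5
    print(buf.msgin_poll())
    print(buf.msgin_poll())   # already consumed
```

Because the flag is cleared on read, a repeated value and a new value are distinguishable without any timing knowledge, which addresses the earlier question about new versus repeated messages.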

jmg · 2014-04-11 12:10

cgracey wrote: »

Excellent idea!!! This would simplify things quite a bit, speed things up, and make things deterministic again. It would, though, disallow phenomena that weren't some multiple of 32 clocks. That needs some considering.

How do these MSG pathways choose the Pin, and the control register in the Pin ?
You would want to have atomic R/W on 32 bit values, so that makes the frame > 32b I think ?

dnalor · 2014-04-11 12:32

Ariba wrote: »

This is the table from the preliminary P2 pdf:

What do the SDRAM modes do? Would I not need an "SDRAM Address" mode, too? Or will they not be in this (P1+) chip?

ctwardell · 2014-04-11 12:37

jmg wrote: »

How do these MSG pathways choose the Pin, and the control register in the Pin ?
You would want to have atomic R/W on 32 bit values, so that makes the frame > 32b I think ?

After some thought it would need to be more than 32 clocks because we can't eliminate the preamble at least when writing to a smart pin.

The smart pin has no way to know data is coming, so it depends on the 'fast wiggle' preamble on the DIR line to indicate that DIR is sending a message.

C.W.

Seairth · 2014-04-11 12:50

dnalor wrote: »

What do the SDRAM modes do? Would I not need an "SDRAM Address" mode, too? Or will they not be in this (P1+) chip?

SDRAM addressing is just done with the pins in normal mode, set as outputs.

jmg · 2014-04-11 15:44

The ~ 205~245MHz numbers I was getting from P&R reports, were for LatticeECP3 ( 65 nm)
Their simpler logic yardsticks are like this
16-bit adder 500 MHz
64-bit adder 336 MHz
16-bit counter 500 MHz
32-bit counter 500 MHz
64-bit counter 350 MHz
64-bit accumulator 326 MHz

That compares with these process nodes for Cyclone's

Cyclone IV (60nm)
Cyclone V (28 nm)
