Starting work on smart pins

Comments

  • Cluso99 wrote: »
    Might it be possible (and worth it) to window the smart pins into a small block in the hub ram ???

    One of the original ideas was to have pin independence so Cogs aren't competing for what shouldn't be a shared resource.

    On the other hand, we are probably now talking more about config than throughput. Compared to the Prop1, we're ending up with a lot of automatic hardware that just needs configured now.
  • jmg Posts: 14,540
    evanh wrote: »
    On the other hand, we are probably now talking more about config than throughput. Compared to the Prop1, we're ending up with a lot of automatic hardware that just needs configured now.
    Yes, I think Chip is considering mainly Config/setup.

    I think there still needs to be a simple, direct path to allow (for example) multiple COGs to SYNC, at SysCLK granularity, to a single shared external edge.
    IIRC, that was via a MUX to swap-in a flag to the usual simple direct pin IO that is there now.
  • cgracey wrote: »

    PinA = this pin
    PinB = neighbor pin (pin pairs are even/odd)

    The "Input" column shows what is returned via INA/INB.

    Mode %0000_000000000 (default) is a normal digital I/O pin.

    Mode %0010_001011011 would be a dual-pin, unclocked, inverter with 100k-ohm feedback.

    Mode %0100_001110110 would be a single-pin, unclocked, Schmitt-trigger relaxation oscillator with +/-10uA current feedback.


    WOW. <remembers scene in front of police station in the movie "Tank"> I think ya got me covered.
  • Are we going to need smart pins to talk to the external memory on the P123 A9?
    What kind of benchmarks are we talking about for this?
  • rjo__ wrote: »
    Are we going to need smart pins to talk to the external memory on the P123 A9?
    What kind of benchmarks are we talking about for this?

    One smartpin will be needed to output NCO MSB to generate the SDRAM clock, but the rest will be handled by instructions and the streamer.
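
For readers unfamiliar with the "NCO MSB" idea, here is a minimal Python sketch of a numerically controlled oscillator. It assumes a 32-bit accumulator that adds a fixed value every system clock and outputs its top bit; the 160 MHz and 40 MHz figures are made-up illustration values, not numbers from this thread.

```python
ACC_BITS = 32  # assumption: 32-bit phase accumulator
ACC_MASK = (1 << ACC_BITS) - 1

def nco_adder(f_out_hz, f_sys_hz):
    """Adder value whose accumulator MSB toggles at roughly f_out_hz."""
    return round(f_out_hz / f_sys_hz * (1 << ACC_BITS))

def nco_msb_stream(adder, clocks):
    """MSB of the accumulator after each of `clocks` additions."""
    acc = 0
    out = []
    for _ in range(clocks):
        acc = (acc + adder) & ACC_MASK
        out.append(acc >> (ACC_BITS - 1))
    return out

# Hypothetical example: a 40 MHz clock from a 160 MHz system clock.
adder = nco_adder(40e6, 160e6)
print(hex(adder), nco_msb_stream(adder, 8))
```

With a power-of-two ratio like this the output is a clean square wave; other ratios average to the right frequency but show up to one system clock of edge jitter.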
  • I found a better way to talk to smartpins: 4-bit pathways that can move 32 bits in 8 clocks.

    These will be OR'd together from all cogs and won't use the hub for coordination.

    They are fast enough that no interrupts will be needed to optimally use them.

    Because they happen within an instruction, they don't require separate flops to work in the background.

    Pins will feed back longs using the 3 LSBs of the system counter to time the nibbles.

    This seems to make the best balance of performance and silicon.
  • jmg Posts: 14,540
    cgracey wrote: »
    I found a better way to talk to smartpins: 4-bit pathways that can move 32 bits in 8 clocks.
    ...

    This seems to make the best balance of performance and silicon.

    Sounds good.

    There is still a Flag to Pin path Mux, for shared polling for example ?

    Is there an address field sent too, for 36 or 40b frames to the pins - some means to select which PinCell.reg is used would be needed ? - that will add a few more clocks ?

    With a slowish read path, more buffering will be needed at the pins.

    eg for the Toggle-at-match PWM variant mentioned above, that is 2 compare registers, and it is also useful to have at least 2 capture registers (usually shared with compare), to allow for _/= and =\_ capture, down to 1 sysclk wide.
    I think Microchip has a small FIFO on some capture units, which of course is nice, but may be too costly.
  • jmg Posts: 14,540
    edited 2015-12-03 - 20:00:11
    cgracey wrote: »
    rjo__ wrote: »
    Are we going to need smart pins to talk to the external memory on the P123 A9?
    What kind of benchmarks are we talking about for this?

    One smartpin will be needed to output NCO MSB to generate the SDRAM clock, but the rest will be handled by instructions and the streamer.

    For the newer clk-echo memories,(ISSI,Micron,Spansion et al) one more pin would be needed for the clk back.

    Also, a simple Pin-Mux-reg is needed for DDR@Pins to wider Clk sync'd data paths inside P2
    (or, a /2 at the pin, but that halves the data rate)

    Looking at the data on those parts, this CLK-echo has other uses besides timing closure.
    It will pause when data is not valid, and so neatly manages the latencies memories have on initial memory read, and the few-clock pauses some have on crossing page boundaries when streaming.

    Good data examples are here ( see RWDS)
    http://www.issi.com/US/product-flash.shtml

  • Hi Chip

    Perhaps one of the first questions I asked myself when I was introduced to the Propeller can now be answered.

    Is there a fixed relationship between the 4 LSBs of the system counter and the COG that is getting its HUB slot?

    I'm asking this because, if there is one (perhaps equality), predicting when a specific COG will get its slot, or mentally following the parade of nibbles that completes 32 bits, will become very simple and straightforward tasks.

    Henrique

    cgracey wrote: »
    I found a better way to talk to smartpins: 4-bit pathways that can move 32 bits in 8 clocks.

    These will be OR'd together from all cogs and won't use the hub for coordination.

    They are fast enough that no interrupts will be needed to optimally use them.

    Because they happen within an instruction, they don't require separate flops to work in the background.

    Pins will feed back longs using the 3 LSBs of the system counter to time the nibbles.

    This seems to make the best balance of performance and silicon.

  • jmg Posts: 14,540
    Yanomani wrote: »
    Is there a fixed relationship between the 4 LSBs of the system counter and the COG that is getting its HUB slot?
    Close, but not quite.
    I think the HUB slot used is governed by the 4 lower bits of the variable R/W address, not the COG number.
    This enables fast streaming, but access times are a little less deterministic.

  • potatohead Posts: 10,121
    edited 2015-12-03 - 20:17:02


    On 0, all cogs can access addresses with nibbles equal to their cogid.

    On clock 1, all cogs access address+1 modulo 15

    Etc...

  • Rayman Posts: 11,494
    edited 2015-12-03 - 20:36:23
    I like the sound of NCO MSB output. Will give me the pixel clock I need to get LCD refresh rate up.
    Also will allow output to DVI encoder and other things like that...

    Will we be able to flip the polarity if needed?
  • Yanomani Posts: 1,080
    edited 2015-12-03 - 20:50:25
    Hi potatohead

    Modulo 15 or 7, as they are being selected by the 3 LSBs of the system counter?

    potatohead wrote: »

    On 0, all cogs can access addresses with nibbles equal to their cogid.

    On clock 1, all cogs access address+1 modulo 15

    Etc...

  • cgracey wrote: »
    I found a better way to talk to smartpins: 4-bit pathways that can move 32 bits in 8 clocks.

    These will be OR'd together from all cogs and won't use the hub for coordination.

    Because they happen within an instruction, they don't require separate flops to work in the background.

    Pins will feed back longs using the 3 LSBs of the system counter to time the nibbles.

    How in the world do you keep innovating and developing, Chip? It tires my brain just to think of all that is involved in SmartPins, and yet it sits atop an enormous amount of work you've done already on P2, P2 Hot, and P2 mask-error. It is high time for a big payoff to come your way!

  • The smart pins are sounding great!
    cgracey wrote: »
    I found a better way to talk to smartpins: 4-bit pathways that can move 32 bits in 8 clocks.

    These will be OR'd together from all cogs and won't use the hub for coordination.

    They are fast enough that no interrupts will be needed to optimally use them.

    Because they happen within an instruction, they don't require separate flops to work in the background.

    Pins will feed back longs using the 3 LSBs of the system counter to time the nibbles.

    This seems to make the best balance of performance and silicon.

  • potatohead Posts: 10,121
    edited 2015-12-03 - 21:47:56
    15, due to the action of the fifo, etc...

    That's what I understand. Need to go back to find the egg beater thread again. There was a cool chart showing it.

    For instructions, it is just a longer cycle, due to their two clock time.
  • Isn't it modulo 16 because of the 16 cogs?
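
A rough Python sketch of the rotation being discussed, assuming 16 cogs and 16 hub slices so that the arithmetic wraps modulo 16, as the question above suggests. The thread has not settled this, and the function names are hypothetical.

```python
NUM_COGS = 16  # assumption: 16 cogs and 16 hub slices

def slice_for(cog_id, clock):
    """Address nibble (hub slice) a cog can reach on a given clock."""
    return (cog_id + clock) % NUM_COGS

def clocks_until(cog_id, clock, address_nibble):
    """How many clocks a cog waits before it can reach address_nibble."""
    return (address_nibble - cog_id - clock) % NUM_COGS

# On any one clock, every cog sees a different slice.
print(sorted(slice_for(cog, 5) for cog in range(NUM_COGS)))
```

Under this assumption a cog waits between 0 and 15 clocks for a random address, while sequential addresses line up with consecutive clocks.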
  • cgracey Posts: 13,125
    edited 2015-12-03 - 22:51:55
    I've got a better plan: 3-pins!!!

    Smart pins will look at an incoming 3-bit code on every clock. If it is non-%000, this means the start of a command. The command length varies, according to this initial 3-bit code, which will be followed by some fixed number of 3-bit payloads:
    Smart pin configuration instructions:
    
    (1)	PINDAC  D/#,S/#		- D[07:0] = DAC value,		S[5:0] = pin number,	3 clocks
    (2)	PINCFG  D/#,S/#		- D[18:0] = Pin config,		S[5:0] = pin number,	7 clocks
    (3)	PINX    D/#,S/#		- D[31:0] = register X data,	S[5:0] = pin number,	12 clocks
    (4)	PINY    D/#,S/#		- D[31:0] = register Y data,	S[5:0] = pin number,	12 clocks
    
    3-bit-per-clock smart pin commands
    
    (1)	1DD DDD DDD - update 8-bit DAC value (lower 8 bits of pin configuration)
    (2)	01M MMM MMP PPP PPP PPP PPP - update 6-bit mode and 13-bit pin configuration
    (3)	001 0DD DDD DDD DDD DDD DDD DDD DDD DDD DDD DDD - update 32-bit register X with data
    (4)	001 1DD DDD DDD DDD DDD DDD DDD DDD DDD DDD DDD - update 32-bit register Y with data
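
The four command layouts above can be sketched in Python as follows. This is only an illustration: it assumes the groups are sent in the left-to-right order shown, most significant bits first, and the helper names are hypothetical.

```python
def to_groups(bits, nbits):
    """Split an nbits-wide value into 3-bit groups, top bits first."""
    assert nbits % 3 == 0
    return [(bits >> shift) & 0b111 for shift in range(nbits - 3, -1, -3)]

def pindac(value):
    # 1DD DDD DDD: prefix 1 + 8 data bits = 9 bits, 3 clocks
    return to_groups((0b1 << 8) | (value & 0xFF), 9)

def pincfg(cfg):
    # 01 + 6-bit mode + 13-bit config = 21 bits, 7 clocks
    return to_groups((0b01 << 19) | (cfg & 0x7FFFF), 21)

def pinx(data):
    # 001 0 + 32 data bits = 36 bits, 12 clocks
    return to_groups((0b0010 << 32) | (data & 0xFFFFFFFF), 36)

def piny(data):
    # 001 1 + 32 data bits = 36 bits, 12 clocks
    return to_groups((0b0011 << 32) | (data & 0xFFFFFFFF), 36)

for name, groups in [("PINDAC", pindac(0xA5)), ("PINCFG", pincfg(0)),
                     ("PINX", pinx(0)), ("PINY", piny(0))]:
    print(name, len(groups), "clocks", [format(g, "03b") for g in groups])
```

Note that the first group of every command is non-zero, which is what lets a pin treat %000 as idle.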
    

    Each cog outputs 3 bits per pin (3 x 32 = 96 bits). The cogs' sixteen sets of 96 bits get OR'd together to form a composite 96-bit set that gives each pin 3 bits. It's like how OUT and DIR signals are OR'd together, except 3 bits per pin, instead of 1.
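
A small Python sketch of the OR-combining described above, using the 32-pin, 3-bits-per-pin figures from this post. The packing of pin N into bits [3N+2:3N] and the function names are assumptions made for illustration.

```python
from functools import reduce
from operator import or_

PINS = 32          # figure used in the post (3 x 32 = 96 bits)
BITS_PER_PIN = 3

def drive(pin, code):
    """One cog's bus with a 3-bit code placed on a single pin's lanes."""
    return (code & 0b111) << (pin * BITS_PER_PIN)

def composite(cog_buses):
    """OR all cogs' buses together; an idle cog contributes 0."""
    return reduce(or_, cog_buses, 0)

def seen_by_pin(bus, pin):
    """The 3-bit code a given pin sees on the composite bus."""
    return (bus >> (pin * BITS_PER_PIN)) & 0b111

buses = [0] * 16
buses[2] = drive(5, 0b101)          # cog 2 talks to pin 5
print(seen_by_pin(composite(buses), 5))
```

As with OUT and DIR, nothing arbitrates here: if two cogs drive the same pin on the same clock, the pin sees the OR of both codes, so software has to keep cogs from addressing one pin at the same time.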

    I think two 32-bit registers are plenty for data setting, while we have a 6-bit mode and a 13-bit pin configuration. That should provide enough raw input conduit for anything. For example, that dual-output triangle-wave PWM mode could use one 32-bit register as an NCO adder value (frequency) and the other 32-bit register as two 16-bit words that provide the thresholds.

    Anyway, I feel like this is snapping to grid now, and this part of the problem is maybe solved.

    To input from pins, I was thinking that we could have 4 pins coming out of each smart pin and they could be correlated to the system counter's 3 LSBs, so that to read a smart pin, you would just gather nibbles from one of 64 sets of 4 pins, as the system counter's 3 LSBs ran from %000 to %111. I don't think that smart pins need to return more than 32 bits, but we could make them do so by using bit 3 of the system counter to parse longs.
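
Here is a minimal Python sketch of gathering a long from nibbles timed by the system counter's 3 LSBs. It assumes that counter phase n carries bits [4n+3:4n] of the long; the post does not specify the ordering, and `nibble_at` is a hypothetical stand-in for sampling the pin's 4 wires.

```python
def read_long(nibble_at, start_count):
    """Collect 8 nibbles over 8 consecutive clocks, starting at any phase.

    nibble_at(n) returns the nibble the smart pin presents while the
    system counter's 3 LSBs equal n.
    """
    value = 0
    for i in range(8):
        phase = (start_count + i) & 0b111
        value |= (nibble_at(phase) & 0xF) << (4 * phase)
    return value

pin_value = 0xDEADBEEF
source = lambda phase: (pin_value >> (4 * phase)) & 0xF
print(hex(read_long(source, 3)))
```

Because each nibble's position is implied by the counter phase, the read can begin on any clock and still reassemble the same long after 8 clocks.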
  • jmg Posts: 14,540
    cgracey wrote: »
    I don't think that smart pins need to return more than 32 bits, but we could make them do so by using bit 3 of the system counter to parse longs.
    There is a Capture case of simple Pulse Width measurement where you could want to read 64b ie 2 x 32b values being time-stamps of Rise and Fall.
    Some simple Arm/trigger logic is needed to ensure those stamps are on the same cycle.
    If you have Capture and Clear option on one edge, then 2 x 32b captures can give Edge position and Period

    Wide dynamic range Frequency counting needs to capture Time and Fi Cycles on an Arm/trigger basis.
    That's 2 x 32b captures and 2 counters, one for time, one for Fi Cycles.

  • cgracey wrote: »
    I've got a better plan: 3-pins!!!

    Smart pins will look at an incoming 3-bit code on every clock. If it is non-%000, this means the start of a command. The command length varies, according to this initial 3-bit code, which will be followed by some fixed number of 3-bit payloads: ...
    Hmmmm... Maybe you can use a similar technique to give variable-byte-length COG instructions so we get better code density! :-)

  • jmg Posts: 14,540
    cgracey wrote: »
    I think two 32-bit registers are plenty for data setting, while we have a 6-bit mode and a 13-bit pin configuration. That should provide enough raw input conduit for anything. For example, that dual-output triangle-wave PWM mode could use one 32-bit register as an NCO adder value (frequency) and the other 32-bit register as two 16-bit words that provide the thresholds.
    That may be a little 'light' ?
    PWM you need to set the Period and Thresholds, and 16b is maybe just enough if you have a prescaler too.
    (some PWM control schemes keep the threshold fixed and vary the total period )

  • jmg wrote: »
    cgracey wrote: »
    I think two 32-bit registers are plenty for data setting, while we have a 6-bit mode and a 13-bit pin configuration. That should provide enough raw input conduit for anything. For example, that dual-output triangle-wave PWM mode could use one 32-bit register as an NCO adder value (frequency) and the other 32-bit register as two 16-bit words that provide the thresholds.
    That may be a little 'light' ?
    PWM you need to set the Period and Thresholds, and 16b is maybe just enough if you have a prescaler too.
    (some PWM control schemes keep the threshold fixed and vary the total period )

    The period would be a function of the NCO frequency and the two 16-bit thresholds would get compared to NCO[30:15]. Well, before the comparison, the NCO value would be NOT'd if the MSB was set. That would give the triangle waveform. That would be sufficient, wouldn't it?
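
The folding step described above can be sketched in Python like this. The sense of the comparison (output high while the triangle value is below the threshold), the order of the two 16-bit words and the function names are assumptions, not something stated in the post.

```python
MASK32 = 0xFFFFFFFF

def triangle16(nco):
    """Fold a 32-bit NCO value into a 16-bit triangle sample."""
    if nco & 0x80000000:           # MSB set: NOT the value first
        nco = ~nco & MASK32
    return (nco >> 15) & 0xFFFF    # NCO[30:15]

def pwm_outputs(nco, thresholds):
    """Compare the triangle against two 16-bit thresholds packed in a long."""
    t = triangle16(nco)
    hi_word, lo_word = (thresholds >> 16) & 0xFFFF, thresholds & 0xFFFF
    return (t < hi_word, t < lo_word)

def duty(adder, threshold, clocks):
    """Fraction of clocks on which the triangle is below threshold."""
    nco, high = 0, 0
    for _ in range(clocks):
        if triangle16(nco) < threshold:
            high += 1
        nco = (nco + adder) & MASK32
    return high / clocks

print(duty(1 << 20, 0x8000, 4096))
```

The adder sets the period (here 4096 clocks for one full up/down sweep) and each threshold sets a duty cycle independently of it.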
  • jmg Posts: 14,540
    edited 2015-12-04 - 00:01:10
    cgracey wrote: »
    The period would be a function of the NCO frequency and the two 16-bit thresholds would get compared to NCO[30:15]. Well, before the comparison, the NCO value would be NOT'd if the MSB was set. That would give the triangle waveform. That would be sufficient, wouldn't it?

    Hmmm, I'm not sure about how NCO multiple bits meshes with PWM.

    That would have jitter, and not make for easy 'live' modulation of the period, and would need >= compares rather than = ?

    In PWM designs, usually the setpoints (Periods, compares) buffer and update only on Counter=0, which is also not quite a NCO concept.
    The period needs to be fully granular and stable for some modulation schemes. (ie allow 1024, 1003, 1047 or whatever)

  • jmg wrote: »
    cgracey wrote: »
    The period would be a function of the NCO frequency and the two 16-bit thresholds would get compared to NCO[30:15]. Well, before the comparison, the NCO value would be NOT'd if the MSB was set. That would give the triangle waveform. That would be sufficient, wouldn't it?

    Hmmm, I'm not sure about how NCO multiple bits meshes with PWM.

    That would have jitter, and not make for easy 'live' modulation of the period, and would need >= compares rather than = ?

    In PWM designs, usually the setpoints (Periods, compares) buffer and update only on Counter=0, which is also not quite a NCO concept.
    The period needs to be fully granular and stable for some modulation schemes. (ie allow 1024, 1003, 1047 or whatever)

    It's true there would be 1 clock period of jitter for most values, but it would average out to be really precise. Adders would make it expensive.

    In either case, you would have to update the thresholds synchronous to the period, right? Well, I can see where a single equality event could get around the need for that. Is that what you were implying?
  • jmg Posts: 14,540
    edited 2015-12-04 - 01:26:36
    cgracey wrote: »
    In either case, you would have to update the thresholds synchronous to the period, right?
    Yes, that avoids missing compares.
    cgracey wrote: »
    It's true there would be 1 clock period of jitter for most values, but it would average out to be really precise. Adders would make it expensive.
    ... Well, I can see where a single equality event could get around the need for that. Is that what you were implying?

    Where I can see NCO/adder prescalers having issues: they are OK for small increments, but if you want a shorter period than 16b, the values effectively left-justify. E.g. adding 62.25 gives an average period of 1020, but that skips many values as it adds, and so simple equality (which is smaller in logic) will not work.
    Most PWMs always change counters by +/-1 and so can use the smaller == test on compares.
    (and if they update on period-end, that ensures there is always a match)
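
To illustrate the equality-versus-magnitude point, here is a small Python sketch with made-up numbers: a 16-bit count range, a step of 64 and a threshold of 1000. None of these values come from the thread.

```python
MODULUS = 1 << 16  # assumption: 16-bit count range

def equality_hits(step, threshold, clocks=MODULUS):
    """Count clocks on which an accumulator stepping by `step` equals threshold."""
    acc, hits = 0, 0
    for _ in range(clocks):
        acc = (acc + step) % MODULUS
        if acc == threshold:
            hits += 1
    return hits

def first_at_or_above(step, threshold):
    """Number of additions before a >= compare first fires."""
    acc, clock = 0, 0
    while acc < threshold:
        acc += step
        clock += 1
    return clock

print(equality_hits(1, 1000))      # +1 counter: one match per period
print(equality_hits(64, 1000))     # step of 64: the value 1000 is skipped
print(first_at_or_above(64, 1000)) # a >= compare still fires
```

A counter that moves by 1 visits every value, so an == compare is enough; an accumulator that moves by a larger step needs the wider >= compare to catch thresholds it jumps over.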

  • I've got the initial modes planned out. They fit neatly into 5 bits. This can be easily expanded to whatever we need, to accommodate USB, for example.
    instructions
    --------------------------------------------------------------------------------------------------------------------------------------
    WSBYTE	D/#,S/#		'write D[07:0] to pin S[5:0] data, mode dependent
    WSWORD	D/#,S/#		'write D[15:0] to pin S[5:0] data, mode dependent
    WSLONG	D/#,S/#		'write D[31:0] to pin S[5:0] data, mode dependent
    WSMODE	D/#,S/#		'write D[31:0] to pin S[5:0] mode %MMMMM_FFFFCIOHHHLLL
    
    RSBYTE	D,S/#		'read byte from pin S[5:0] into D, mode dependent
    RSLONG	D,S/#		'read long from pin S[5:0] into D, mode dependent
    
    A = IN from this pad, B = IN from other pad, B OUT = OUT to other pad
    
    				pad	pad
    MMMMM	Description		DIR	OUT	Pattern			Setup				Update
    --------------------------------------------------------------------------------------------------------------------------------------
    00000	OUT (default)		DIR	OUT
    00001	B OUT			DIR	B OUT
    00010	CLK			DIR	CLK
    00011 *	transitions		DIR	mode	update-period-repeat	WSBYTE=prescaler		WSLONG=transitions
    
    00100 *	duty			DIR	mode	update-period-repeat	WSBYTE=prescaler		WSLONG=adder ~
    00101 *	nco			DIR	mode	update-period-repeat	WSBYTE=prescaler		WSLONG=adder ~
    00110 *	pwm sawtooth 16:16	DIR	mode	update-period-repeat	WSBYTE=prescaler		WSLONG=F:T, WSWORD=T ~
    00111 *	pwm triangle 16:16	DIR	mode	update-period-repeat	WSBYTE=prescaler		WSLONG=F:T, WSWORD=T ~
    
    01000 *	count highs		DIR **	OUT	period-update-repeat	WSLONG=period (0=cont)		RSLONG=count ~
    01001 *	count lows		DIR **	OUT	period-update-repeat	WSLONG=period (0=cont)		RSLONG=count ~
    01010 *	count all edges		DIR **	OUT	period-update-repeat	WSLONG=period (0=cont)		RSLONG=count ~
    01011 *	count positive edges	DIR **	OUT	period-update-repeat	WSLONG=period (0=cont)		RSLONG=count ~
    
    01100 *	time highs		DIR **	OUT	event-update-repeat					RSLONG=count ~
    01101 *	time lows		DIR **	OUT	event-update-repeat					RSLONG=count ~
    01110 *	time highs/lows		DIR **	OUT	event-update-repeat					RSLONG=count ~ (MSB=state)
    01111 *	time positive edges	DIR **	OUT	event-update-repeat					RSLONG=count ~
    
    10000 *	DAC cog channel		DIR	OUT	event-update-repeat	WSLONG=period
    10001 *	DAC random per period	DIR	OUT	event-update-repeat	WSLONG=period
    10010 *	DAC 16-bit dither	DIR	OUT	event-update-repeat	WSLONG=period			WSWORD=value ~
    10011 *	DAC 16-bit pwm LSB	DIR	OUT	event-update-repeat	WSLONG=period			WSWORD=value ~
    
    10100 *	A-high inc, B-high dec	DIR **	OUT	period-update-repeat	WSLONG=period (0=cont)		RSLONG=count ~
    10101 *	A-rise inc, B-rise dec	DIR **	OUT	period-update-repeat	WSLONG=period (0=cont)		RSLONG=count ~
    10110 *	A-B encoder		DIR **	OUT	period-update-repeat	WSLONG=period (0=cont)		RSLONG=count ~
    10111 *	pulse, wait B		DIR	mode	period-update-repeat	WSLONG=16:16 H:L period		RSLONG=last wait for B ~
    
    11000 *	sync tx byte, B clk	DIR	mode	transmit-wait-repeat	WSWORD=baud ***			WSBYTE=data ~~
    11001 *	sync tx long, B clk	DIR	mode	transmit-wait-repeat	WSWORD=baud ***			WSLONG=data ~~
    11010 *	sync rx byte, B clk	DIR **	OUT	wait-receive-repeat	WSWORD=baud ***			RSBYTE=data ~
    11011 *	sync rx long, B clk	DIR **	OUT	wait-receive-repeat	WSWORD=baud ***			RSLONG=data ~
    
    11100 *	async tx byte		DIR	mode	transmit-wait-repeat	WSWORD=baud			WSBYTE=data ~~
    11101 *	async tx long		DIR	mode	transmit-wait-repeat	WSWORD=baud			WSLONG=data ~~
    11110 *	async rx byte		DIR **	OUT	wait-receive-repeat	WSWORD=baud			RSBYTE=data ~
    11111 *	async rx long		DIR **	OUT	wait-receive-repeat	WSWORD=baud			RSLONG=data ~
    
      * DIR from cogs: 0=reset, 1=start; IN to cogs: 1=done; !OUT from cogs clears done
     ** set %HHHLLL to %111111 (float/float) if your intent is to input
    *** for tx, update data after B-rise; for rx, sample data before B-rise (delay input data by one clk)
      ~ data is buffered
     ~~ data is double buffered
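
As a reading aid for the WSMODE layout %MMMMM_FFFFCIOHHHLLL, here is a Python sketch that splits a mode word into fields by letter. It takes only the bit positions from the pattern above; what F, C, I and O mean is not spelled out in this post, so the sketch attaches no meaning to them, and the function name is hypothetical.

```python
LAYOUT = "MMMMM_FFFFCIOHHHLLL"  # pattern from the WSMODE line

def split_mode(word):
    """Return a dict mapping each layout letter to its field value."""
    letters = LAYOUT.replace("_", "")
    fields = {}
    for pos, letter in enumerate(letters):
        bit = (word >> (len(letters) - 1 - pos)) & 1
        fields[letter] = (fields.get(letter, 0) << 1) | bit
    return fields

# Example: mode %11110 (async rx byte) with %HHHLLL = %111111 (float/float).
word = (0b11110 << 13) | 0b111111
print(split_mode(word))
```

Driving the split from the pattern string means the sketch only needs a new LAYOUT if the field widths change.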
    
  • jmg Posts: 14,540
    Sounds good.
    It would help to include the resource size for each mode.
    I'm guessing 16b counters/setpoints for PWM ( & 16b prescale?) and 32b counters for Timers and capture ?
    How many captures are there ?
    eg can it capture both Period and mid edge, to extract duty cycle as M/P ?
    Likewise narrow pulse width capture can capture _/= and =\_ into separate registers, to allow down to 1 SysCLK width capture.

    Do the Sync Tx modes include 2w and 4w for Dual/Quad SPI ?
    ( HW that supports Dual SPI can also do JTAG, and P2 should make quite a good JTAG engine.)
    A bit-count that covers 1..32 is the most flexible.

    Note SPI is usually duplex, and code often decides to discard Rx, but it is there as part of the process.
    Above list seems to be Tx or Rx ?
  • We could refine the capture modes to support short events, and also midpoint and period.

    I don't have any immediate plans for 2w and 4w modes, but they could be added. Right now the smart pins are even/odd-paired for signal sharing. Handling 3 or 5 pins (2w/4w+clk) would need another topology. I will get this working first, and then it should be more obvious how to arrange more pins into a smart pin.

    I'm anxious to see how much logic this will all take. There is a lot of sharing of flops, etc.
  • jmg Posts: 14,540
    cgracey wrote: »
    We could refine the capture modes to support short events, and also midpoint and period.
    Some MCUs use FIFOs, but I think here just 2 captures would be enough to manage short events, and also midpoint and period.

    cgracey wrote: »
    I don't have any immediate plans for 2w and 4w modes, but they could be added. Right now the smart pins are even/odd-paired for signal sharing. Handling 3 or 5 pins (2w/4w+clk) would need another topology. I will get this working first, and then it should be more obvious how to arrange more pins into a smart pin.

    I'm anxious to see how much logic this will all take. There is a lot of sharing of flops, etc.

    Yes, the wider modes overlap somewhat with the Streamer, but it will be important to stream 4w and 8w memories.
    Logic needed will be interesting, as there are many of these.
  • jmg Posts: 14,540
    cgracey wrote: »
    I've got the initial modes planned out. They fit neatly into 5 bits.

    What is the Baud Rate formula for Sync/Async ?
