New Pin Instructions

cgracey · 2018-11-09 16:52

Good ideas, TonyB_. Two WRPINs from one instruction. We could even do four with a few extra clocks.

ikemschn · 2018-11-09 18:27

Hmmm... how many clocks does it save? A full 2-clock cycle, or even 4 clocks? If so, that would be a tremendous speed benefit, e.g. for a high-velocity closed-loop controller: a higher loop frequency without needing a higher clock rate. I'd buy it.

ozpropdev · 2018-11-09 22:45

We only have one streamer per cog, so to achieve multi-channel DDS we use bit-banging.
A SETDAC instruction would tighten things up, so timer-interrupt-driven DDS performance would benefit.

cgracey · 2018-11-10 04:03

TonyB_ wrote: »

Option 1 above is a sort of GETBYTE for DACs, with optional inversion for AC outputs. Data could be packed into a long to save space with no time penalty.

Being able to WRPIN/WXPIN/WYPIN more than one pin per instruction is a neat idea.

Bits 13..8 of S (normally 0) could express how many sequential pins, 1..64, are going to receive the same W?PIN operation, beginning at pin S[5:0].

I've noticed in my code that I'll often need to have four identical instructions on subsequent pins to set up four cog DACs, or something.

Also, these instructions could benefit from bits 13..8 of S expressing HOW MANY pins to operate on, but these would always take only two cycles, since we could use a thermometer decoder to get the pattern:

dirl/dirh/dirc/dirnc/dirz/dirnz/dirrnd/dirnot
outl/outh/outc/outnc/outz/outnz/outrnd/outnot
fltl/flth/fltc/fltnc/fltz/fltnz/fltrnd/fltnot
drvl/drvh/drvc/drvnc/drvz/drvnz/drvrnd/drvnot

Would you like to enable pins 32..47 for 256-step triangle PWM output and start them up, initialized to 0? Just do this:

	FLTL	pwms			'reset smart pins
	WRPIN	#%01_01000_0,pwms	'set triangle pwm mode
	WXPIN	##$0100_0001,pwms	'256 counts in a frame, base period = 1 clock
	WYPIN	#0,pwms			'initialize pwm values to 0
	DRVL	pwms			'release smart pins from reset
...
pwms	LONG	$0F_20			'16 pins starting at 32
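As a sanity check on the `pwms` value, here is a small Python model of the encoding proposed above (bits 13..8 hold the number of extra pins, bits 5..0 the base pin). It is only an illustration: `decode_span_13_8` is a hypothetical name, and the model ignores the DIRA/DIRB boundary issue raised later in the thread.

```python
# Hypothetical model of the span encoding proposed above (not silicon behavior):
#   value[5:0]  = base pin
#   value[13:8] = extra pins beyond the base (0 -> 1 pin ... 63 -> 64 pins)

def decode_span_13_8(value: int) -> list:
    """Return the pins a span-aware W?PIN/FLTL/DRVL would touch under this proposal."""
    base = value & 0x3F            # bits 5..0
    extra = (value >> 8) & 0x3F    # bits 13..8
    # No wrap-around or 32-pin-half limits are modeled here.
    return [base + i for i in range(extra + 1)]


pwms = 0x0F_20                       # the 'pwms LONG $0F_20' value
pins = decode_span_13_8(pwms)
print(pins[0], pins[-1], len(pins))  # 32 47 16
```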

jmg · 2018-11-10 04:22

cgracey wrote: »

Being able to WRPIN/WXPIN/WYPIN more than one pin per instruction is a neat idea.

Bits 13..8 of S (normally 0) could express how many sequential pins, 1..64, are going to receive the same W?PIN operation, beginning at pin S[5:0].

I've noticed in my code that I'll often need to have four identical instructions on subsequent pins to set up four cog DACs, or something.

Also, these instructions could benefit from bits 13..8 of S expressing HOW MANY pins to operate on, but these would always take only two cycles, since we could use a thermometer decoder to get the pattern:

dirl/dirh/dirc/dirnc/dirz/dirnz/dirrnd/dirnot
outl/outh/outc/outnc/outz/outnz/outrnd/outnot
fltl/flth/fltc/fltnc/fltz/fltnz/fltrnd/fltnot
drvl/drvh/drvc/drvnc/drvz/drvnz/drvrnd/drvnot

That's certainly flexible, but is the logic cost starting to climb here?
Adding a variable width to a variable base and extracting a field up to 64 bits wide does not feel cheap.

Maybe if a lot of opcodes can leverage this, it could be worth the logic cost?

cgracey · 2018-11-10 04:27

jmg wrote: »

That's certainly flexible, but is the logic cost starting to climb here?
Adding a variable width to a variable base and extracting a field up to 64 bits wide does not feel cheap.

Maybe if a lot of opcodes can leverage this, it could be worth the logic cost?

I'm realizing it could only be made to work on pins 0..31 or pins 32..63, due to data-forwarding limitations on the instructions which affect the DIR and/or OUT bits.

So, the range of pins needs to be bound within DIRA/OUTA or DIRB/OUTB.

How to specify that in some fool-proof way? It means that we'd only need 5 bits for the span, not 6.

WRPIN/WXPIN/WYPIN could span 64 pins, though.

How to unify all this?

Logic cost is just a thermometer decoder for the DIR/OUT-affecting instructions.

W?PIN would require a 6-bit counter, is all.

Might as well limit W?PIN to 32-pin spans, too, for consistency, with consistent rules regarding wrap-around.

jmg · 2018-11-10 04:42

cgracey wrote: »

I'm realizing it could only be made to work on pins 0..31 or pins 32..63, due to data-forwarding limitations on the instructions which affect the DIR and/or OUT bits.

So, the range of pins needs to be bound within DIRA/OUTA or DIRB/OUTB.

How to specify that in some fool-proof way? It means that we'd only need 5 bits for the span, not 6.

It's easy enough to specify, as either a 32-pin or a 64-pin total reach.

cgracey wrote: »

WRPIN/WXPIN/WYPIN could span 64 pins, though.

How to unify all this?

Logic cost is just a thermometer decoder for the DIR/OUT-affecting instructions.

W?PIN would require a 6-bit counter, is all.

Counter? Is this a multi-clock opcode taking up to 32, 64, or 128 sysclks? Is that interruptible?

cgracey · 2018-11-10 04:46

This is crazy simple.

To make these instructions...

dirl/dirh/dirc/dirnc/dirz/dirnz/dirrnd/dirnot
outl/outh/outc/outnc/outz/outnz/outrnd/outnot
fltl/flth/fltc/fltnc/fltz/fltnz/fltrnd/fltnot
drvl/drvh/drvc/drvnc/drvz/drvnz/drvrnd/drvnot

...work on not just single pins, but on up to 32 pins within either DIRA/OUTA or DIRB/OUTB, this line of Verilog...

wire [31:0] pindcd	= 32'b1 << d[4:0];

Just needs to be changed to this...

wire [31:0] pindcd	= ~(32'hFFFF_FFFE << d[12:8]) << d[4:0];

That is almost nothing, in the big picture.
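The Verilog change can be checked with a small Python model that mimics the 32-bit truncation of the `pindcd` wire. This is only a sketch (the function names are made up for illustration). Note that in this form the mask does not wrap: bits shifted past bit 31 are simply lost.

```python
MASK32 = 0xFFFF_FFFF


def pindcd_old(d: int) -> int:
    """wire [31:0] pindcd = 32'b1 << d[4:0];"""
    return (1 << (d & 0x1F)) & MASK32


def pindcd_new(d: int) -> int:
    """wire [31:0] pindcd = ~(32'hFFFF_FFFE << d[12:8]) << d[4:0];"""
    span = (d >> 8) & 0x1F                    # d[12:8] = extra bits
    thermo = ~(0xFFFF_FFFE << span) & MASK32  # span+1 ones, starting at bit 0
    return (thermo << (d & 0x1F)) & MASK32    # shift to base; bits past 31 drop


print(hex(pindcd_new((3 << 8) | 4)))  # 0xf0 -> bits 4..7
```

With `d[12:8] = 0` the new expression reduces to the old single-bit decode, which is why existing code would be unaffected.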

cgracey · 2018-11-10 04:48

The W?PIN with span would take 1 extra clock for each extra pin.

cgracey · 2018-11-10 05:09

Instead of using d[12:8], it would be better to use d[10:6], since that would enable 1..8 pins to be called out in a 9-bit immediate.

        FLTL    #%111_001000      'float pins 8..15

        FLTL    #8                'float pin 8

The programmer would have to be aware that D[10:6] now specifies the span, and that D[5:0] is now the base pin rather than the only pin.
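To make the packing concrete, here is a Python sketch of the D[10:6]/D[5:0] split (the helper names are hypothetical). It also shows why a 9-bit immediate tops out at 8 pins: only D[8:6] of the span field fits, and a count of 9 needs bit 9.

```python
def encode_pin_span(base: int, count: int) -> int:
    """Pack base pin into D[5:0] and (count - 1) into D[10:6]."""
    if not (0 <= base <= 63 and 1 <= count <= 32):
        raise ValueError("base must be 0..63 and count 1..32")
    return ((count - 1) << 6) | base


def decode_pin_span(d: int) -> tuple:
    """Return (base pin, pin count) from a D value."""
    return d & 0x3F, ((d >> 6) & 0x1F) + 1


d = encode_pin_span(8, 8)
print(bin(d), d <= 0x1FF)  # 0b111001000 True  (i.e. #%111_001000)
```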

jmg · 2018-11-10 05:10

cgracey wrote: »
Instead of using d[12:8], it would be better to use d[10:6], since that would enable 1..8 pins to be called out in a 9-bit immediate.
        FLTL    #%111_001000      'float pins 8..15

Yes, that makes sense.
What happens to interrupts with this new variable-time opcode?

cgracey · 2018-11-10 05:13

jmg wrote: »
Yes, that makes sense.
What happens to interrupts with this new variable-time opcode?

This FLTL example is a 2-clock instruction, always.

The W?PIN instruction would take 2 clocks, plus 1 more clock for each extra pin.
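Put as arithmetic, the timing stated above works out as in this small Python sketch (`span_clocks` is a hypothetical helper that encodes only the figures given in this thread).

```python
W_PIN_OPS = {"WRPIN", "WXPIN", "WYPIN"}


def span_clocks(op: str, pin_count: int) -> int:
    """Clock cycles for a spanned pin instruction, per the figures in this thread."""
    if op in W_PIN_OPS:
        return 2 + (pin_count - 1)  # 2 clocks, plus 1 per extra pin
    return 2                        # DIRx/OUTx/FLTx/DRVx: always 2 clocks


print(span_clocks("WXPIN", 16), span_clocks("FLTL", 16))  # 17 2
```

If W?PIN spans are capped at 32 pins, as suggested earlier, the longest case would be 33 clocks, which roughly bounds how long such an instruction could hold off an interrupt.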

Yanomani · 2018-11-10 05:28

Kind of like a ninja warrior, hurling a stream of shuriken right at the targets, in perfect sequence!

But until now, there haven't been ninja warriors capable of doing it that fast!

If that unit could be encapsulated into some state machine (a kind of limited-function mini-streamer), then once started it could be left unattended until the burst is exhausted.

If applicable, IN could be raised at the end, so it would be available to be sampled, during the burst, by any RDPIN or RQPIN whose target pin is within the burst's range.

This could let another cog wait for the end of the burst and take over from there, almost dovetailing the next burst.

Just a thought...

jmg · 2018-11-10 05:44

cgracey wrote: »
This FLTL example is a 2-clock instruction, always.

The W?PIN instruction would take 2 clocks, plus 1 more clock for each extra pin.

So that means very large interrupt jitter is possible when using this opcode?
The pins will also update sequentially, rather than all on the same clock cycle?

I presume there is still some means to update all pins/config, on the same SysCLK ?

cgracey · 2018-11-10 05:48

jmg wrote: »
So that means very large interrupt jitter is possible when using this opcode?
The pins will also update sequentially, rather than all on the same clock cycle?

I presume there is still some means to update all pins/config, on the same SysCLK ?

The smart pins are released from reset when their DIRs go high. That can be done all at once to align their states.

cgracey · 2018-11-10 05:52

W?PIN affecting multiple pins would delay interrupts.

cgracey · 2018-11-12 23:51

Which of the following ideas would be better for the pin instructions (DIRx/OUTx/FLTx/DRVx), which use S[5:0] to call out a (base) pin?

a) Use S[10:6] to specify how many extra pins above pin S[5:0] will be affected by the operation:

        DRVRND  #7<<6+16   'drive P[23:16] to random states

b) Ignore bits above S[5:0], but become sensitive to a just-prior SETQ whose D[4:0] would specify the number of extra pins to affect:

        SETQ    #7
        DRVRND  #16        'drive P[23:16] to random states

For pins, I think it's pretty safe to use (a), though (b) could always act as an override.
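The two DRVRND forms above should select the same pins; a minimal Python model (hypothetical function names, mask arithmetic only) makes that explicit.

```python
def pins_form_a(s: int) -> list:
    """(a) span in S[10:6], base pin in S[5:0]."""
    base, extra = s & 0x3F, (s >> 6) & 0x1F
    return [base + i for i in range(extra + 1)]


def pins_form_b(setq_d: int, s: int) -> list:
    """(b) span from a just-prior SETQ's D[4:0]; bits above S[5:0] ignored."""
    base, extra = s & 0x3F, setq_d & 0x1F
    return [base + i for i in range(extra + 1)]


print(pins_form_a((7 << 6) + 16))  # [16, 17, 18, 19, 20, 21, 22, 23]
print(pins_form_b(7, 16))          # same pins
```

Note the parentheses: in Python `+` binds tighter than `<<`, so `7 << 6 + 16` would mean `7 << 22`. The thread's comments on `#7<<6+16` imply that the assembler applies the shift first.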

For BITL/BITH/BITC/BITNC/BITZ/BITNZ/BITRND/BITNOT which use S[4:0] to specify a (base) bit, using S[9:5] to specify how many extra bits to operate on would be problematic, since S might hold a bit address which spans several registers (used with ALTB) and has other meaningful bits in S[9:5], already, which should not be interpreted to mean 'extra bits'. There are two ways to get around this:

a) Always interpret bits S[9:5] as 'number of extra bits to affect', unless WCZ is used to write the old bit to C and Z, since there'd be little reason to copy what amounts to the LSB of a span of bits into C and Z:

        BITRND  reg,#15<<5+16                'set bits 31..16 to random states
        BITRND  reg,#15<<5+16    WCZ         'set only bit 16 to random state, get prior bit 16 into C and Z

b) Ignore bits above S[4:0], but become sensitive to a just-prior SETQ whose D[4:0] would specify the number of extra bits to affect:

        SETQ    #15
        BITRND  reg,#16                      'set bits 31..16 to random states

How should all this be handled?

Sometimes, it might be more convenient to use a separate SETQ, instead of having a compound value in a register or needing to use a ##, anyway.

ozpropdev · 2018-11-12 23:59

IMHO the SETQ variants are more code friendly.

cgracey · 2018-11-13 00:01

ozpropdev wrote: »

IMHO the SETQ variants are more code friendly.

I agree. Should we permit the compact form, though? It is twice as fast and half the code.

ozpropdev · 2018-11-13 00:04

cgracey wrote: »

I agree. Should we permit the compact form, though? It is twice as fast and half the code.

Both would be nice. The SETQ appealed because the span can be easily controlled by a register.

cgracey · 2018-11-13 00:08

Maybe we need a hybrid approach:

SETQ preceding a pin/bit instruction always overrides the span. This way, if you've got random junk in the bits that specify the span in the pin/bit instruction, you can precede with 'SETQ #0' and always operate on a single pin/bit, if that is your intent. That would keep two simple rules:

a) The five bits above the pin/bit number ALWAYS control the span of a pin/bit instruction.
b) SETQ before a pin/bit instruction ALWAYS overrides the span bits.

How would that be?
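The two hybrid rules can be sketched in Python as follows. `affected` is a hypothetical helper. It models only which pins or bits are selected, not wrap-around, and shows how 'SETQ #0' neutralizes junk in the span bits of an ALTB-style bit index.

```python
from typing import Optional


def affected(s: int, base_bits: int, setq: Optional[int] = None) -> list:
    """Hybrid rule: the 5 bits above the base field set the span,
    unless a just-prior SETQ is given, which always overrides them.
    base_bits = 5 for BITx (S[4:0]), 6 for DIRx/OUTx/FLTx/DRVx (S[5:0])."""
    base = s & ((1 << base_bits) - 1)
    span_src = setq if setq is not None else (s >> base_bits)
    extra = span_src & 0x1F
    return [base + i for i in range(extra + 1)]  # wrap-around not modeled


index = (3 << 5) | 4                  # bit 4, with junk (3) in S[9:5]
print(affected(index, 5))             # [4, 5, 6, 7]
print(affected(index, 5, setq=0))     # [4]  'SETQ #0' forces a single bit
```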

ozpropdev · 2018-11-13 00:25

cgracey wrote: »

Maybe we need a hybrid approach:

SETQ preceding a pin/bit instruction always overrides the span. This way, if you've got random junk in the bits that specify the span in the pin/bit instruction, you can precede with 'SETQ #0' and always operate on a single pin/bit, if that is your intent. That would keep two simple rules:

a) The five bits above the pin/bit number ALWAYS control the span of a pin/bit instruction.
b) SETQ before a pin/bit instruction ALWAYS overrides the span bits.

How would that be?

That would work fine.

cgracey · 2018-11-13 00:28

It's compiling now.

This feels like the right solution. You get the best of both possibilities, without being stuck if you only want one pin/bit to be affected.

TonyB_ · 2018-11-13 00:40

Are all these changes a consequence of my post here?

What does SETDAC/PINDAC or what I call WRDAC look like now?

cgracey · 2018-11-13 00:45

TonyB_ wrote: »

Are all these changes a consequence of my post here?

What does SETDAC/PINDAC or what I call WRDAC look like now?

Yes, you brought up the notion of doing a span of operations via a single instruction.

No movement on SETDAC stuff, yet. Still working on this pin/bit-span stuff. W?PIN is next.

Cluso99 · 2018-11-13 01:40

Again, caution about adding extra features!

I am a bit late to the party, but here goes anyway...

On the P1 we use a 32-bit mask, although that is obviously limited to 32 successive pins.

What if for DIR/OUT/FLT/DRV we had...

Same as current silicon: DIR/OUT/FLT/DRV S[6:5]=0 & S[4:0] (yes it's actually D)
New SETM & SETM2 D/# instructions to set the mask(s) for pins [31:0] & [63:32].
When DIR/OUT/FLT/DRV S[6:5] = 00: no masks, single pin; 01: use mask[31:0]; 10: use mask[63:32]; 11: use masks [63:32] + [31:0].

The SETM & SETM2 instructions load internal 32-bit registers which remain set until modified. They must be set before first use (i.e. they are not necessarily zeroed on coginit).

Not sure of the silicon cost here. What it gives us is the concept of masks like P1. We can set the mask(s) once, so no need to repeat them each time the DIR/OUT/FLT/DRV is used.

I wonder if this could be used with SETPAT ?

cgracey · 2018-11-13 06:37

Cluso99 wrote: »

Again, caution about adding extra features!

I am a bit late to the party, but here goes anyway...

On the P1 we use a 32-bit mask, although that is obviously limited to 32 successive pins.

What if for DIR/OUT/FLT/DRV we had...

Same as current silicon: DIR/OUT/FLT/DRV S[6:5]=0 & S[4:0] (yes it's actually D)
New SETM & SETM2 D/# instructions to set the mask(s) for pins [31:0] & [63:32].
When DIR/OUT/FLT/DRV S[6:5] = 00: no masks, single pin; 01: use mask[31:0]; 10: use mask[63:32]; 11: use masks [63:32] + [31:0].

The SETM & SETM2 instructions load internal 32-bit registers which remain set until modified. They must be set before first use (i.e. they are not necessarily zeroed on coginit).

Not sure of the silicon cost here. What it gives us is the concept of masks like P1. We can set the mask(s) once, so no need to repeat them each time the DIR/OUT/FLT/DRV is used.

I wonder if this could be used with SETPAT ?

The trouble with registers like this is that they'd need to be readable and restorable on interrupts, which is expensive to do.

cgracey · 2018-11-13 07:03

In the next silicon rev, these instructions will be able to affect spans of 1..32 bits by using S[9:5] or Q[4:0] as an additional bit count:

BITL/BITH/BITC/BITNC/BITZ/BITNZ/BITRND/BITNOT D,{#}S

	BITNOT  reg,#8			'flip reg[8]

	BITNOT  reg,#15<<5 + 8		'flip reg[15+8:8]

	SETQ	#3
	BITNOT	reg,index		'flip reg[3+index[4:0]:index[4:0]]

	BITNOT	reg,index		'flip reg[index[9:5]+index[4:0]:index[4:0]]

These instructions will be able to affect spans of 1..32 pins by using S[10:6] or Q[4:0] as an additional pin count:

DIRL/DIRH/DIRC/DIRNC/DIRZ/DIRNZ/DIRRND/DIRNOT {#}D
OUTL/OUTH/OUTC/OUTNC/OUTZ/OUTNZ/OUTRND/OUTNOT {#}D
FLTL/FLTH/FLTC/FLTNC/FLTZ/FLTNZ/FLTRND/FLTNOT {#}D
DRVL/DRVH/DRVC/DRVNC/DRVZ/DRVNZ/DRVRND/DRVNOT {#}D

	DRVL	#20			'drive pin[20] low

	DRVL	#5<<6 + 20		'drive pins[5+20:20] low

	SETQ	#11
	DRVL	index			'drive pins[11+index[5:0]:index[5:0]] low

	DRVL	index			'drive pins[index[10:6]+index[5:0]:index[5:0]] low

In both sets of instructions, the extra bits that exceed the MSB will wrap around, starting at the LSB, in the register(s) being affected.

The pin-modifying instructions (2nd set) will not span between DIRA/OUTA and DIRB/OUTB, but will wrap within DIRA/OUTA or DIRB/OUTB.
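The wrap-around rule can be modeled in a few lines of Python. This is an illustrative sketch, not Parallax code. `span_mask`, `bitnot` and `pin_mask` are hypothetical names, and only the mask arithmetic is modeled (no DIR/OUT side effects or timing).

```python
from typing import Optional


def span_mask(base: int, extra: int) -> int:
    """32-bit mask for bits base..base+extra, wrapping past bit 31 to bit 0."""
    mask = 0
    for i in range(extra + 1):
        mask |= 1 << ((base + i) & 31)
    return mask


def bitnot(reg: int, s: int, setq: Optional[int] = None) -> int:
    """Model of BITNOT reg,S: base bit S[4:0], extra bits from S[9:5] or SETQ's Q[4:0]."""
    extra = (setq if setq is not None else s >> 5) & 0x1F
    return reg ^ span_mask(s & 0x1F, extra)


def pin_mask(d: int, setq: Optional[int] = None) -> tuple:
    """Which register pair a pin op hits, and the 32-bit mask within it."""
    pin = d & 0x3F
    port = "DIRB/OUTB" if pin >= 32 else "DIRA/OUTA"
    extra = (setq if setq is not None else d >> 6) & 0x1F
    return port, span_mask(pin & 0x1F, extra)


print(hex(bitnot(0, (15 << 5) + 24)))  # 0xff0000ff: bits 24..31, then 0..7
port, mask = pin_mask((5 << 6) + 20)
print(port, hex(mask))                 # DIRA/OUTA 0x3f00000: pins 20..25
print(hex(pin_mask(62, setq=3)[1]))    # 0xc0000003: pins 62,63 wrap to 32,33
```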

jmg · 2018-11-13 08:20

cgracey wrote: »

In the next silicon rev, these instructions will be able to affect spans of 1..32 bits by using S[9:5] or Q[4:0] as an additional bit count:

Can you add the timing equations of these opcodes ?
To me, they are cute, but not in the 'must have' column, and I hope they do not squeeze the logic into lower sysclk speeds due to more congested routing.

More fundamental and widely useful, as I see it, are small details like being able to output a SysCLK clock alongside the streamer data flow at SysCLK speed,
i.e. being able to generate a clock that actually matches the top speed at which the P2 can emit data.

cgracey · 2018-11-13 08:22

jmg wrote: »

Can you add the timing equations of these opcodes ?
To me, they are cute, but not in the 'must have' column, and I hope they do not squeeze the logic into lower sysclk speeds due to more congested routing.

More fundamental and widely useful, as I see it, are small details like being able to output a SysCLK clock alongside the streamer data flow at SysCLK speed,
i.e. being able to generate a clock that actually matches the top speed at which the P2 can emit data.

Outputting SysCLK would require some special timing assignments. Not sure how viable that is, but when we get into the respin work with ON Semi, I'll ask about it.
