There was some concern about the gated-counter mode.
What we have just accumulates A-rises when B is high, reporting periodically or continuously.
I think what is wanted is a mode where Z is incremented each time A rises when B is high. When B falls, the Z is reported. A new count begins in Z when B goes high. Is that right?
For a few extra gates, I just added some functionality to the mode %10010.
Y[1:0] now selects what gets tallied:
%11 = a-edges
%01 = a-rises
%x0 = a-highs
This gives it a little more flexibility.
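As a rough illustration of the three tally choices, here is a small Python model (not device code). The function name `tally` and the per-clock sample list are assumptions made for this sketch; only the Y[1:0] encodings come from the post above.

```python
def tally(a_samples, y_sel):
    """Tally A-input activity over successive clock samples.

    a_samples: list of 0/1 samples of pin A, one per clock (hypothetical input).
    y_sel:     2-bit Y[1:0] value: 0b11 = edges, 0b01 = rises, 0bx0 = highs.
    The first sample only sets the initial state and is not tallied.
    """
    total = 0
    prev = a_samples[0]
    for cur in a_samples[1:]:
        if (y_sel & 1) == 0:        # %x0: count clocks on which A is high
            total += cur
        elif y_sel == 0b01:         # %01: count rising edges only
            total += int(prev == 0 and cur == 1)
        else:                       # %11: count both edges
            total += int(prev != cur)
        prev = cur
    return total


samples = [0, 1, 1, 0, 1, 0, 0, 1]
print(tally(samples, 0b11), tally(samples, 0b01), tally(samples, 0b00))
```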
That sounds more useful, I prefer more user control over fixed, canned choices.
So mode %10010 is now: Window-By-A
Where Am is one of Ah,Ae,Ar
Repeat until Sum(Am) >=N; Report dT
- dT is the time for last instance of Sum(Am) >=N;
The edge versions give Time for N cycles
Sounds like this needs a paired version that does Window-By-Time
Where Am is one of Ah,Ae,Ar
Repeat Sum(Am) until dT >=N; Report Sum
The edge versions give Cycles per Nt.
Seems those could merge with other modes ?
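To make the pairing concrete, here is a minimal Python sketch of the two windowing ideas. It is only a model of the pseudocode above: `a_events` is a hypothetical per-clock list of 0/1 flags for whichever A event (Ah, Ae or Ar) is selected, and the function names are invented for the example.

```python
def window_by_a(a_events, n):
    """Window-By-A: repeat until Sum(Am) >= n, then report dT (in clocks)."""
    reports = []
    total = 0
    dt = 0
    for ev in a_events:
        dt += 1
        total += ev
        if total >= n:
            reports.append(dt)   # time taken for n events
            total = 0
            dt = 0
    return reports


def window_by_time(a_events, n):
    """Window-By-Time: sum Am until dT >= n clocks, then report the sum."""
    reports = []
    total = 0
    dt = 0
    for ev in a_events:
        dt += 1
        total += ev
        if dt >= n:
            reports.append(total)  # events seen in n clocks
            total = 0
            dt = 0
    return reports


# One A event every third clock.
events = [0, 0, 1] * 4
print(window_by_a(events, 2))     # time for 2 events
print(window_by_time(events, 6))  # events per 6 clocks
```

The two loops differ only in which quantity ends the window and which one is reported, which is probably why they look mergeable with other modes.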
There was some concern about the gated-counter mode.
What we have just accumulates A-rises when B is high, reporting periodically or continuously.
I think what is wanted is a mode where Z is incremented each time A rises when B is high. When B falls, the Z is reported. A new count begins in Z when B goes high. Is that right?
Yes, that is a classic Gated Counter with Capture, for N=1 case.
For N>1, it reports Z, after N falls on B.
Because B is the clock enable, effectively yes, 'A new count begins in Z when B goes high', however if B is high when the state starts, that should be valid.
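A small Python model of the gated counter with capture described here may help. It is a sketch under assumptions: `a` and `b` are hypothetical per-clock sample lists, `gated_counter` is an invented name, and the report is taken on the N-th fall of B as stated above. B being high at the start is treated as a valid open gate.

```python
def gated_counter(a, b, n=1):
    """Count A rises while B is high; report Z after n falls of B.

    a, b: equal-length lists of 0/1 samples, one per clock.
    Returns the list of reported Z values. Z restarts from zero after a report.
    """
    reports = []
    z = 0
    falls = 0
    prev_a, prev_b = a[0], b[0]
    for cur_a, cur_b in zip(a[1:], b[1:]):
        if cur_b and not prev_a and cur_a:   # A rises while the gate is open
            z += 1
        if prev_b and not cur_b:             # B falls: gate closes
            falls += 1
            if falls >= n:
                reports.append(z)
                z = 0
                falls = 0
        prev_a, prev_b = cur_a, cur_b
    return reports


a = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
b = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]   # B already high at the start
print(gated_counter(a, b, n=1))
print(gated_counter(a, b, n=2))
```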
In some cases, the Gate is a trailing edge only.
In all modes, I think a flag should signal if a second capture occurred before the user read the signal.
That tells them they either need to change ranges, or discard/reality check that reading.
Leading and trailing edge glitches will be common, you do not want them invisible.
Don't remove the totalizer feature from mode %01100.
I think that is the x sets N=0 case ?
The mode %01100 = Count A-input positive edges when B-input is high
X[31:0] establishes a measurement period in clock cycles.
If zero is used for the period, the measurement operation will not be periodic, but continuous, like a totalizer, and the current 32-bit high count can always be read via RDPIN/RQPIN.
so that's not quite a simple Gated Counter, but one that is Time-gated as well.
An artifact of this, is if dT happens to hit a Bh state, you get a partial gate effect, but if dT happens to hit !Bh, you get full-gate counts, (but do not know how many Bh are included)
This comes back to the general premise that xNT should have a user choice of SysCLKs or Pin-events as a Clock.
This operation is what the DOCs say now :
If set as xNT.clk = SysCLK, mode %01100 runs for a X-set dT, and reports whatever 'B Enables A' has counted to in that time, including partial gates. Pin cell captures every dT
If set xNT.clk = Bevent, now X sets the number of B falls before the capture occurs. Pin Cell captures after Sum(Bf) >=X
This is the change Chip mentions above, I think.
In both cases, I think X=0 sets the same, continually readable ('continuous, like a totalizer,') operation. Any xNT effects are disabled.
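The partial-gate artifact of the SysCLK-timed case can be seen in a short Python model. This is only a sketch of the behaviour as described in the quoted doc text: the function name and the sample lists are assumptions, and a real pin cell may differ in detail.

```python
def count_a_rises_while_b_high(a, b, x):
    """Model of mode %01100 with a SysCLK measurement period.

    a, b: equal-length lists of 0/1 samples, one per clock.
    x:    measurement period in clocks; 0 means continuous (totalizer).
    Returns (captures, running_count).
    """
    captures = []
    count = 0
    elapsed = 0
    prev_a = a[0]
    for cur_a, cur_b in zip(a[1:], b[1:]):
        if cur_b and not prev_a and cur_a:   # A rise while B is high
            count += 1
        prev_a = cur_a
        if x:                                # periodic capture only when x != 0
            elapsed += 1
            if elapsed >= x:
                captures.append(count)
                count = 0
                elapsed = 0
    return captures, count


a = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
b = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
print(count_a_rises_while_b_high(a, b, 4))  # second capture lands while B is high
print(count_a_rises_while_b_high(a, b, 0))  # totalizer: nothing captured, count keeps running
```

With `x = 4` the second capture falls in the middle of a B-high window, so that window's counts are split across two reports, which is the partial-gate effect mentioned above.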
Today, I improved the noise filtering on the smart pins. Before, we had four different modes that were fixed and assumed a 160MHz clock rate.
Now, we have four system-level settable filter modes that pick clock-over-1/2/4/8/...2G as the sample rate and 2/3/5/8 bits of sample pipeline that must be all high or low to change the filter output.
The four different filter settings are established by SETCLK:
SETCLK ##%01nn0000_00000000_00000000_0pprrrrr
%nn = filter setting to establish (0..3)
%pp = sample pipeline depth of 2/3/5/8 bits (0..3)
%rrrrr = sample rate of system clock divided by 1, 2, 4, 8, ...2G (0..31)
Patterns %100..%111 in the %FFF bits in the WRPIN data select filter settings 0..3 for each smart pin.
The four modes are initialized on reset as follows:
mode 0 = %00_00000 = depth of 2 at clock/1 = 12.5ns @160MHz
mode 1 = %01_00101 = depth of 3 at clock/32 = 600ns @160MHz
mode 2 = %10_10011 = depth of 5 at clock/512K = 16ms @160MHz
mode 3 = %11_10110 = depth of 8 at clock/4096K = 210ms @160MHz
I think this should cover filtering pretty well. This added a negligible amount of logic to the hub and then 1 flop per smart pin. I felt, as did others, that the previous implementation was rather lacking.
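The reset values above can be cross-checked with a little arithmetic: the filter time is the pipeline depth times the sample period. The Python helpers below are a sketch; the function names are invented, and the bit positions are read directly off the `%01nn0000_00000000_00000000_0pprrrrr` pattern.

```python
DEPTHS = (2, 3, 5, 8)  # sample pipeline depth selected by %pp = 0..3


def setclk_filter_value(nn, pp, rrrrr):
    """Build the 32-bit SETCLK operand %01nn0000_00000000_00000000_0pprrrrr."""
    assert 0 <= nn <= 3 and 0 <= pp <= 3 and 0 <= rrrrr <= 31
    return (0b01 << 30) | (nn << 28) | (pp << 5) | rrrrr


def filter_time_seconds(pp, rrrrr, sysclk_hz=160e6):
    """Time the input must hold steady: depth * (2**rrrrr) / sysclk."""
    return DEPTHS[pp] * (1 << rrrrr) / sysclk_hz


# Reset defaults from the post: (pp, rrrrr) for filter settings 0..3.
for nn, (pp, r) in enumerate([(0, 0), (1, 5), (2, 19), (3, 22)]):
    print(nn, hex(setclk_filter_value(nn, pp, r)), filter_time_seconds(pp, r))
```

At 160 MHz this gives 12.5 ns, 600 ns, about 16.4 ms and about 210 ms, matching the table.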
Today, I improved the noise filtering on the smart pins. Before, we had four different modes that were fixed and assumed a 160MHz clock rate.
Sounds useful.
What is the fastest External Clock toggle frequency in Counter modes ?
Is that < SysCLK/2 (one sample high, one sample low), or something slower ?
I could have done it using a wait-event, but I didn't know the format at that moment, so I just typed a two instruction loop.
Phew, I was worried that granularity was gone, in the latest Pin-opcode round...
I have a design case here, typical of software assisted links, where it needs to
- Wait for an Edge on a Pin, or wait for N-edges on a Pin
- Output, or read, data on the data pins
Very similar to software based SPI.
With these latest filtering choices, and opcode delays, what are the SysCLK delays in P2, for Pin-Edge-Wait to Data-Change or Data-Sample ?
Out of curiosity, has anyone played with the smart pins for controlling things like servos, steppers, and motor controllers? Or implemented PID control? With the consensus that the P2 will be well suited for machine control, it stands to reason that the smart pins should be easy to use for this purpose. With the recent discussions about needed pin modes, I figured it would be a good idea to make sure we have excellent coverage for these use cases (as long as it doesn't delay the chip).
DAC and PWM outputs are excellent. Stepper pulses will be a little more involved around the pulse rate modes. I note there are comments about specifying a number of pulses, so it should be possible to control positional distances as well as speed.
I've always worked at the DAC level and using positional feedback so haven't thought about steppers much historically.
Seairth,
I've just poked modes %00100 "pulse/cycle output" and %00101 "transition output". Both are pulse streaming output modes.
Neither of them are buffered, so not particularly great for stepper use.
%00101 counts out transitions and is not too bad at chaining successive segments. To prevent any glitch in the pulse timing, when directly waiting, it requires a minimum of 12 clocks setting for the transition period. Longer periods would be required when chaining segments via interrupts.
%00100 appears to be flawed, it has a built-in trailing bias that prohibits seamless chaining of segments. See attached snap, I've intentionally used lower frequency to highlight that the bias is based on the length of the pulse period.
Here's the source code. Mode is set to %00100. It's three pulses at 250 kHz followed immediately by three pulses at 500 kHz. There shouldn't be any gap but the WAITSE1 doesn't fire until after a trailing period.
setse1 #(%001_000000 | tpin) 'set a rising edge event on the test IN (pin 4)
mov ticks1, ##$0078_00f0 '240 clocks per pulse
mov ticks2, ##$003c_0078 '120 clocks per pulse
.main1
'Test the chosen Smartpin
pollse1 'clear possible pending event
wxpin ticks1, #tpin '
wypin #3, #tpin '3 pulses
waitse1 'wait for final pulse
wxpin ticks2, #tpin
wypin #3, #tpin
waitx one_sec
jmp #.main1
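For reference, the two X constants can be reproduced from the pulse rates. The Python helper below is only a sketch and rests on two assumptions not stated in this post: a 60 MHz system clock (taken from the comment in the later %00101 listing), and an X layout with the period in the low word and half the period in the high word, which is simply what `$0078_00f0` and `$003c_0078` look like. The function name is invented.

```python
def pulse_x_value(sysclk_hz, pulse_hz):
    """Pack an X value as (half_period << 16) | period, in system clocks.

    Assumed layout, inferred from the constants in the listing above.
    """
    period = round(sysclk_hz / pulse_hz)   # clocks per pulse
    half = period // 2                     # assumed high-word value
    return (half << 16) | period


print(hex(pulse_x_value(60e6, 250e3)))  # 240 clocks per pulse
print(hex(pulse_x_value(60e6, 500e3)))  # 120 clocks per pulse
```

Three pulses at 240 clocks each is 720 clocks per segment.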
I think using a Streamer instead of any Smartpins would be more suited to driving a stepper. Streamers have the useful chained buffering as a hardware feature so, using DDS techniques, a single Streamer can handle multiple synchronised motors at once and even could do micro-stepping if you didn't mind throwing lots of pins at the job.
In a similar vein, for individual Smartpins, it would be possible to manage chaining using timed, rather than event based, updates. Basically, change the X,Y parameters a few clocks before the previous pulse stream completes. Not unlike optimising HubRAM accesses. EDIT: Actually, that's exactly what ended up happening in the fastest Prop1 driver code, the WAITs were mostly thrown out in favour of counting instructions.
Hmm. That gap is really big (~720 clock cycles?) . I haven't set up my new PropScope yet, so can you try the timing variant instead of the event and see if the delay goes away? I'm curious if the delay is in the event part or the subsequent calls to WXPIN/WYPIN.
The problem is specific to that %00100 mode, and it's dependent on the period set in X. I chose a long period to highlight the problem. Mode %00101 is perfect up to its timing limit.
Here's the code for mode %00101. I don't have scope output on hand but it didn't even miss one clock tick as far as I could tell. Not until I reduced the transition period to 11 that is. Then I got an output of a whole extra static period at the new rate before the new transition count could be loaded.
setse1 #(%001_000000 | tpin) 'set a rising edge event on the test IN (pin 4)
.main1
'Test the chosen Smartpin
pollse1 'clear possible pending event
wxpin #12, #tpin '12 clocks per transition (60 MHz / 12 = 200 ns)
wypin #5, #tpin '5 transitions = 1 microsecond
waitse1 'wait for final transition (X = 12 is minimum ...
wxpin #30, #tpin '... for appending to both parameters without any glitching)
wypin #3, #tpin
waitx one_sec
jmp #.main1
%00100 appears to be flawed, it has a built-in trailing bias that prohibits seamless chaining of segments. See attached snap, I've intentionally used lower frequency to highlight that the bias is based on the length of the pulse period.
That does look broken.
Did you check SPI Data flows, for gapless output ?
ISTR talking with Chip about gapless UART/SPI, and he tuned the data paths to allow that (I'm rusty on exact final result)
So it looks like the data path is buffered enough that it can do gapless, but the Control side may not be buffered enough ?
From your notes before, it sounds like 12 SysCLKs is the control/command/config response delay ?
That means %00100 should be able to reduce to that much gap ?
Hmm. That gap is really big (~720 clock cycles?) . I haven't set up my new PropScope yet, so can you try the timing variant instead of the event and see if the delay goes away? I'm curious if the delay is in the event part or the subsequent calls to WXPIN/WYPIN.
A delay that large, has to be data/state engine related - ie takes an extra whole or half of defined period, before accepting next command.
If that can be fixed to the correct boundary, the gap can shrink to the time to apply a new command, I think was mentioned as 12 SysCLKs.
Shrinking the gap right down to zero, would need to queue the commands, and I'm not sure how much logic that adds ?
How important is this ?
I think using a Streamer instead of any Smartpins would be more suited to driving a stepper..
Even the streamer will have some Config-Reconfig delay ?
It could be worth testing rapid FM modulation of NCO for streamer etc, to confirm that has no glitches or large gaps
JMG,
You chopped off the reason why I said that: "Streamers have the useful chained buffering as a hardware feature so, using DDS techniques, a single Streamer can handle multiple synchronised motors at once and even could do micro-stepping if you didn't mind throwing lots of pins at the job."
A Streamer can double/triple/quadruple/limited-only-by-HubRAM buffer to suit the application.
Yes, but I was thinking about the time it takes the Streamer to change clock speeds.
That's a config-side issue, not a data-flow issue.
Or are you thinking of a single clock speed, and having massively duplicated data for lower speeds ?
No great need to change clocking at all. Just map the pulse rate like a bitmap, that's the DDS part. It might be both a tad processor and memory hungry though.
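A minimal Python sketch of the "pulse rate as a bitmap" idea, using one phase accumulator per motor. Everything here is illustrative: the function name, the 16-bit accumulator width and the one-bit-per-motor word layout are assumptions, and feeding the resulting words to a Streamer is not shown.

```python
def dds_step_bitmap(increments, n_slots, acc_bits=16):
    """Generate one output word per time slot; bit k is motor k's step pulse.

    increments: per-motor phase increments. A motor steps whenever its
                accumulator overflows, so rate = inc / 2**acc_bits steps per slot.
    """
    modulus = 1 << acc_bits
    accs = [0] * len(increments)
    bitmap = []
    for _ in range(n_slots):
        word = 0
        for k, inc in enumerate(increments):
            accs[k] += inc
            if accs[k] >= modulus:      # overflow: emit a step for motor k
                accs[k] -= modulus
                word |= 1 << k
        bitmap.append(word)
    return bitmap


# Motor 0 steps every 2 slots, motor 1 every 4 slots, in lock step.
print(dds_step_bitmap([0x8000, 0x4000], 8))
```

The slot clock never changes; only the increments do, which is why this costs memory and processor time rather than reconfiguration delay.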
Or are you thinking of a single clock speed, and having massively duplicated data for lower speeds ?
Yes, for synchronous motion, that's probably the sensible method.
EDIT: Mode %00101 works fine for single channel stepper output. But then so does bit-bashing.
@evanh
Here's what I found when using smartpins in transition mode.
When you try to chain commands with new base periods the new base period is not set until the current base period has expired.
setse1 #(%001_000000 | tpin) 'set a rising edge event on the test IN (pin 4)
wrpin #_trans,#tpin
wxpin #90, #tpin
dirh #tpin 'init smartpin
wypin #2, #tpin '2 transitions @ 90 clocks
outh #marker1
nop
outl #marker1
waitse1 'wait for final pulse
wxpin #15, #tpin '4 transitions @ 15 clocks
wypin #4, #tpin
outh #marker1
nop
outl #marker1
waitse1 'wait for final pulse
wxpin #30, #tpin '4 transitions @ 30 clocks
wypin #4, #tpin
outh #marker1
nop
outl #marker1
waitse1 'wait for final pulse
In the included screenshot the divisions are 30 clocks.
I've included a "marker" pulse to show when a new smartpin command was issued.
See how the second set waits exactly 90 clocks before starting and the third set starts 15 clocks after starting.
I believe the base timer is a countdown counter and reloads when zero is reached.
Toggling the smartpin's DIR bit is the only way to reset the counter.
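The countdown-and-reload belief can be checked against the observed timing with a tiny Python model. This is a sketch of the hypothesis only, not of the actual pin logic: it assumes each new X/Y pair is written immediately after the previous segment's last transition, ignores instruction latency, and uses an invented function name.

```python
def transition_times(segments):
    """Return the clock times at which the pin toggles.

    segments: list of (period, count) pairs, i.e. successive WXPIN/WYPIN values.
    Model: a free-running down-counter reloads with the current X each time it
    reaches zero, so a new X only takes effect one old period later.
    """
    times = []
    now = 0
    loaded_period = segments[0][0]      # counter starts with the first X after a DIR reset
    for period, count in segments:
        for _ in range(count):
            now += loaded_period        # counter reaches zero
            times.append(now)           # pin toggles
            loaded_period = period      # reload with the X in force at this moment
    return times


print(transition_times([(90, 2), (15, 4), (30, 4)]))
```

In this model the second segment's first transition comes 90 clocks after the first segment ends, and the third segment's first transition comes 15 clocks after the second ends, which matches the description of the screenshot.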
From the docs
Whenever Y[31:0] is written with a non-zero value, the pin will begin toggling for
Y transitions at each base period, starting at the next base period.
"starting at the next base period" seems to only be the case after a smartpin reset.
When you try to chain commands with new base periods the new base period is not set until the current base period has expired.
That's a good thing. That's exactly why I've been saying it's a clock perfect mode.
I've included a "marker" pulse to show when a new smartpin command was issued.
See how the second set waits exactly 90 clocks before starting and the third set starts 15 clocks after starting.
I like the marker, does a good job of showing the timing of when the new segments are issued.
From the docs
Whenever Y[31:0] is written with a non-zero value, the pin will begin toggling for
Y transitions at each base period, starting at the next base period.
"starting at the next base period" seems to only be the case after a smartpin reset.
I interpreted Chip's description as meaning it'll complete the previous period before reloading with the new value. And in your example, same for me, it only generates exactly what you defined. That seems ideal to me.
%00100 appears to be flawed, it has a built-in trailing bias that prohibits seamless chaining of segments. See attached snap, I've intentionally used lower frequency to highlight that the bias is based on the length of the pulse period.
I'm not clear here, how does the earlier image, which seems to show much more than a half-period gap, reconcile with ozpropdev's images ?
@evanh
Here's what I found when using smartpins in transition mode.
When you try to chain commands with new base periods the new base period is not set until the current base period has expired.
That sounds like what you'd want the operation to be - seems to give no unexpected gaps ?
Comments
Is there no single opcode, 1 clock granular exiting, Wait for Pin Capture opcode ?
Seems the 2 lines opcode polling will be 4 sysclk granular ?
EDIT: Ah, when I said just in my prior reply to Oz, "same for me", I meant same for my own testing. I didn't mean same in my earlier snapshot example.