Prop123_A9_Prop2_v7z Intermediate Release - 25 March 2016 - USB, improved serial

evanh · 2016-04-04 11:20

I've realised my earlier assumption on how the filter works was flawed, believing it was false biased rather than evenly biased binary version of a Schmitt Trigger. I've gone back to the source reference that Mickster dug up from Microchip - see attached, page 15-8 - and they demo a great use of a J-K flip-flop for producing this.

I'm not happy with the fact that the filter's divider creates massive sampling holes. In the Microchip part the divider is shared between multiple pins but this is not the case for the Prop2. The Prop2 has a divider per pin. It seems to me the divider could be better used to accumulate repeating input states rather than just pacing out a few samples.

Ie: When the current result state is false, then the counter (currently the divider) will accumulate true inputs, a false input resets the counter, until terminal count reached where by the result flips to a true. Then the counter will accumulate false inputs, a true input resets the counter, until terminal count reached where by the result flips back to a false.

ErNa · 2016-04-04 11:48

I do not know, if this applies, as I just read filter and sampling without deeper understanding. In my buttons debouncer I count up and down to a +/-limit and output high/low whenever a +- threshold is reached. This works according to the motto: wait until high stays for a while to output high and wait until low stays for a while and wait to output low. Changing count limit and threshold makes behavior very flexible.

evanh · 2016-04-04 12:03

Yep, spot on. Currently the pin input filter can optionally take only three samples (effective counts) rather than potentially much more given the amount of hardware already thrown at it.

evanh · 2016-04-04 12:38

Bugger, I just realised the divider is shared between A and B input sources. So, to make it a filter per pin, the logic for this would need shifted closer to the pin. That's always an option but maybe JMG is on to something ... base it off the serial divider circuit instead of having a dedicated divider. Would save a chuck of flops and be more programmable to boot.

cgracey · 2016-04-04 16:37

evanh wrote: »

Bugger, I just realised the divider is shared between A and B input sources. So, to make it a filter per pin, the logic for this would need shifted closer to the pin. That's always an option but maybe JMG is on to something ... base it off the serial divider circuit instead of having a dedicated divider. Would save a chuck of flops and be more programmable to boot.

Here is the current pin input selection and filtering:

// pin inputs

reg [8:0]	filt_ct;
reg [2:0]	filt_ax, filt_bx;
reg		filt_a, filt_b;

wire [7:0] pad_pix	= {pad_pi[6:4], pin_out, pad_pi[3:0]};

wire a_in		= cfg[31] ^ pad_pix[cfg[30:28]];
wire b_in		= cfg[27] ^ pad_pix[cfg[26:24]];

wire [8:0] filt_ctx	= filt_ct + 1'b1;

wire filt_step		= cfg[22] ? cfg[21] ?	filt_ct[8] && !filt_ctx[8]
					    :	filt_ct[5] && !filt_ctx[5]
				  : cfg[21] ?	filt_ct[2] && !filt_ctx[2]
					    :	1'b1;

always @(posedge clk or negedge resn)
if (!resn)
begin
	filt_ct		<= 8'b0;
	filt_ax		<= 3'b0;
	filt_bx		<= 3'b0;
	filt_a		<= 1'b0;
	filt_b		<= 1'b0;
end
else if (cfg[23])
begin
	filt_ct		<= filt_ctx;
	filt_ax		<= filt_step ? {filt_ax[1:0], a_in} : filt_ax;
	filt_bx		<= filt_step ? {filt_bx[1:0], b_in} : filt_bx;
	filt_a		<= filt_ax == 3'b111 ? 1'b1 : filt_ax == 3'b000 ? 1'b0 : filt_a;
	filt_b		<= filt_bx == 3'b111 ? 1'b1 : filt_bx == 3'b000 ? 1'b0 : filt_b;
end

wire [15:0] a_log	= 16'b0110_1110_1000_1010;	// a^b, a|b, a&b, a

wire a_final		= cfg[23] ? filt_a : a_log[{cfg[22:21], b_in, a_in}];
wire b_final		= cfg[23] ? filt_b : m_usb ? pad_pi[1] : b_in;

It does throw many samples away.

Counting contiguous states is good for debouncing a button, but not good for filtering noise out.

We could do a running inventory, where we increment the filter on 1 and decrement the filter on 0, with saturation at both ends, then some hysteresis. That would filter well and also keep the sampling granularity fine, preserving phase better.

Rayman · 2016-04-04 22:40

Ok, I'm at a loss...

Tried adding pingetz and it doesn't seem to work at all. Actually it seems to cause extra characters to be transmitted... But, maybe I'm doing something wrong... Here's what I have:

send_char
                pinsety tx_char,#TX_PIN  'send  character
send_char_wait                
                pingetz temp,#TX_PIN
                testb   temp,#31 wz,wc
         if_nz  jmp #send_char_wait                
                waitx   ##120000'300
                ret

Even with the delay, I get the extra characters. Supposed to get "41" and instead get "4k1h"
It does seem to sorta work, because the strings I send after this come out Ok... strange...

Seems like calling Pingetz more than once causes the extra characters...

jmg · 2016-04-04 22:50

Rayman wrote: »

Seems like calling Pingetz more than once causes the extra characters...

My reading of the docs has 2 choices, of
Buffer-Ready polling via IN, and if you want
End-Of-Transmit, you can use what Chip calls 'busy', via PINGETZ.
Buffer polling should have smaller gaps between characters.

- but one would not expect PINGETZ to give bonus chars ?

Rayman · 2016-04-04 23:17

Ok, this is weird, seems my real problem is with the "testb" command.

For some reason I don't understand, it's fine with "wz" but "wz,wc" doesn't work.
But, why should that make a difference when the next command conditional is "if_nz"?

I don't see yet where else in the code having carry set would change anything...
But, that there must be something... Need to look at more of the code.
So, nevermind...

Found it... Code doesn't expect flags to be changed... Need to try that push/pop stuff...

Rayman · 2016-04-04 23:26

Ok, I think I see that

ret  wz,wc

restores the original z and c flags when calling a subroutine. Can that be?

Seems to work anyway

ozpropdev · 2016-04-04 23:59

deleted.

ozpropdev · 2016-04-05 01:55

Rayman
Are you still having to add waitx delays?

 waitx   ##120000'300

rjo__ · 2016-04-05 02:56

Rayman,

I got this mostly from Brian,

send_char       
             
.loop1	pingetz	tx_ready,#tx_pin wc
	if_c	jmp	#.loop1
                pinsety Tx_char,#TX_PIN
                waitx ##20000
                ret

Rich

ozpropdev · 2016-04-05 03:08

Rayman,_rjo
I'm a little mystified why you guys have to put waitx instruction in your code?

evanh · 2016-04-05 04:27

cgracey wrote: »

It does throw many samples away.

Counting contiguous states is good for debouncing a button, but not good for filtering noise out.

I'm not that upset about the sampling holes but I could see the amount of hardware already thrown at it would allow more extensive capability.

In hindsight, I'm liking JMG's suggestion simply because it gets the job done with very few flip-flops. For serial use, always want a fixed amount of filtering. And when not using the divider for serial it is entirely free for an arbitrary filter. Or does that divider have other uses?

rjo__ · 2016-04-05 04:32

Brian,

Remember, we are using 3M as our base baud... No matter what I do, I can't get my main development environment(Processing.org) to accept the byte rate... the baud is fine... and Processing.org sends images etc at the right rate and the P2 gets the data... going the other way (p2->computer) is just a mystery. The waitx fixes it... I don't know exactly why:)... got a few ideas here, but no definite understanding. I was at a waitx of about 2000 before I hooked up a leap motion camera... now it is really slow:)

This is exactly why I think the USB guys are contributing very worthwhile input. Get rid of the computer, with all of its black holes and boxes.

I hope Chip has his noise-blocking ear buds ready for Tuesday:)

rjo__ · 2016-04-05 04:35

By the way... I have been trying to follow the conversation and I have the basic idea, but to use the USB, we are going to need some really good case studies.

jmg · 2016-04-05 04:45

rjo__ wrote: »

... going the other way (p2->computer) is just a mystery. The waitx fixes it... I don't know exactly why:)... got a few ideas here, but no definite understanding.

So this is more a PC side problem ? Is this Duplex ?
How much data and what size buffers do you have set ?

Did you test COG-COG Tx-Rx on the P2, counting bytes and checking a pattern for losses, as the P2 should be able to sustain high speeds. (much higher than 3MBd, it should manage COG-COG towards 40MBd)

Then, I'd suggest doing more on each Rx and adding handshake lines so the system is forced to wait and confirm the hardware can operate in handshake situations.

For the P2 -> PC, You could try handshakes, or use a FT2232H / FT232H which have larger buffers and a 480MBd transport, so are easier to sustain 3MBd on.

ozpropdev · 2016-04-05 05:16

Rich
I too am running at 3M at the moment.
I have run my code from 9600 to 3M using the on board FTDI device on the A9 board,
The only issue I encountered running at 3M ended up being solved by replacing my USB cable.
The same code when used with a Ft2232H runs at 12M no problems.
The only difference might be that I am only sending data from P2 to PC.
A bit of a mystery....

Rayman · 2016-04-05 10:26

I'm not using wait anymore now that I have pingetz figured out

Changing z flag messes with other parts of code
But using ret wc,wz fixed that

rjo__ · 2016-04-05 15:04

jmg

First, thank you for your tireless contributions. An entire generation of users will benefit.

Yes I am confident that this has nothing to do with the P2 and has everything to do with the PC side.

I have an FT223H in my jacket pocket, but have had less than an hour to play since grandma Libby hit town. This week looks like it was made for the P2:) Will post an update on my progress asap.

Rich

cgracey · 2016-04-05 18:05

I've been working on speeding up the pin messaging by sending only words and bytes, when sufficient. This will cut reaction times by more than half, in many cases. I'm about 2/3 there.

Seairth · 2016-04-05 19:28

@cgracey,

Is it possible to set the C flag with PINGETZ?

Here's why I ask...

(note: to avoid confusion between C and Z flags and the C and Z pin registers, I will refer to the later as pinC and pinZ. PINGETZ would read pinZ into a register and set the C flag to pinC.)

Suppose you were to change modes %10001 (Time A-input high and low states) as follows:

* On state change, capture prior A state to pinC (instead of pinZ[31])
* On state change, capture sysclk (not the running count since the last state change) to pinZ
* If sysclk loops around without A changing state, set IN without changing pinC or pinZ.

With this:

* Software can still detect a timeout because neither pinC nor pinZ changed.
* Conversely, the "timeout" can be used to maintain an extended clock to perform longer measurements.
* Because all pins running the same mode are using the same sysclck, the pin state change vents can be correlated.
* Duration can still be calculated by keeping track of the last pinZ. With this, you wouldn't technically need mode %10011

You could also enable PINSETX for this mode to indicate the resolution of the sysclk:

0: Evaluate sysclk[7:0]. Timeout occurs after 2^8 clock cycles without state change. One byte of pinZ would be transferred on PINGETZ.
1: Evaluate sysclk[15:0]. Timeout occurs after 2^16 clock cycles without state change. Two bytes of pinZ would be transferred on PINGETZ.
2: etc.
3: etc.

If the pin messaging bus is still 4 bits wide, you could actually have the granularity be 4 bits instead of 8 bits, allowing fairly fine-grained control over the timeout period. This would allow the user to trade off between faster data retrieval and fewer timeout events.

(Edit: I forgot to point out that this would mean that the suggested "capture A or B state change" mode would not be needed.)

jmg · 2016-04-05 20:15

Seairth wrote: »

...
Suppose you were to change modes %10001 (Time A-input high and low states) as follows:

* On state change, capture prior A state to pinC (instead of pinZ[31])
-ctd-
* If sysclk loops around without A changing state, set IN without changing pinC or pinZ.

With this:

* Software can still detect a timeout because neither pinC nor pinZ changed.

Could this see a rapid dual-change as a timeout ?

Seairth wrote: »

* On state change, capture sysclk (not the running count since the last state change) to pinZ
...
With this:
...
* Because all pins running the same mode are using the same sysclck, the pin state change vents can be correlated.

I like this detail, because it helps multiple-pin deployment, and I think sync action across many pins needs a little work.
This also has better noise features, as keeping absolute time allows SW sanity checks, whilst a reset-on-change can give very small values on any edge noise, and lose the more important between-far-edge time.

Seairth · 2016-04-06 11:54

jmg wrote: »

Seairth wrote: »

...
Suppose you were to change modes %10001 (Time A-input high and low states) as follows:

* On state change, capture prior A state to pinC (instead of pinZ[31])
-ctd-
* If sysclk loops around without A changing state, set IN without changing pinC or pinZ.

With this:

* Software can still detect a timeout because neither pinC nor pinZ changed.

Could this see a rapid dual-change as a timeout ?

I don't see how that could happen. It is possible to mistake 2n transitions as a timeout if you don't handle the first 2n-1 events and the 2nth event happens to fall exactly on the same value of sysclk as the last event that you had processed. If the sysclk granularity feature were implemented, this could certainly happen for more frequent timeout periods (e.g. 2^4). However, if you are trying to handle edge events that are that frequent (i.e. you are missing some of them), my guess is that this is neither the mode nor the chip you should be using.

In reality, I don't see anyone going any lower than a timeout of 2^8 cycles, since 2^4 wouldn't give enough time to process the event before the next one was received. Note, however, that this is just the timeout interval. You could still receive actual events more frequently than that. Supposing you set your timeout to 2^8 (and therefore also set the PINGETZ to pass only 8 bits, for a total of 4 clock cylces), you would need another dozen instructions to test for timeout and calculate duration. Let's estimate 30-40 clock cycles for minimal processing (including the PINGETZ). On the final chip, you are realistically limited to edge events <40MHz (assuming 160MHz sysclk). If you are handling multiple pin events on the same cog, this will be even lower.

(Incidentally, this would be a nice mode for making a logic analyzer! Assuming the above estimates are somewhat accurate, you could do a 16-pin logic analyzer with 4 cogs processing 4 pins each at an effective sample rate of about 10Msps. Or 8 cogs processing 2 pins apiece for about 20Msps. In fact, you could allow the user to trade off between pin count and effective sample rate, which is a feature I don't think I've seen before.)

MJB · 2016-04-06 14:35

Seairth wrote: »

(Incidentally, this would be a nice mode for making a logic analyzer! Assuming the above estimates are somewhat accurate, you could do a 16-pin logic analyzer with 4 cogs processing 4 pins each at an effective sample rate of about 10Msps. Or 8 cogs processing 2 pins apiece for about 20Msps. In fact, you could allow the user to trade off between pin count and effective sample rate, which is a feature I don't think I've seen before.)

wouldn't you just wait for the trigger pattern und use the streamer to capture the data as fast as possible?
just my naive thinking ;-)

Seairth · 2016-04-06 16:12

MJB wrote: »

Seairth wrote: »

(Incidentally, this would be a nice mode for making a logic analyzer! Assuming the above estimates are somewhat accurate, you could do a 16-pin logic analyzer with 4 cogs processing 4 pins each at an effective sample rate of about 10Msps. Or 8 cogs processing 2 pins apiece for about 20Msps. In fact, you could allow the user to trade off between pin count and effective sample rate, which is a feature I don't think I've seen before.)

wouldn't you just wait for the trigger pattern und use the streamer to capture the data as fast as possible?
just my naive thinking ;-)

Sure! There are a variety of ways that a logic analyzer could be implemented. I'm not suggesting that the above approach (mode %10001 variant) is the best way to do it, just one way.

cgracey · 2016-04-06 18:33

I'm still working on getting the smart-pin-><-cog communication sped up. Because there are 4-bit busses going each direction, already, I'll send an initial nibble which tells how many data nibbles are coming, so that we can not waste time with leading 0's.

For the serial receive modes, if you have a byte or less, the new bit is shifted into bit 7, not bit 31. If you have a word or less, the new bit is shifted into bit 15, not bit 31, and so on.

Seairth · 2016-04-06 19:26

cgracey wrote: »

I'm still working on getting the smart-pin-><-cog communication sped up. Because there are 4-bit busses going each direction, already, I'll send an initial nibble which tells how many data nibbles are coming, so that we can not waste time with leading 0's.

For the serial receive modes, if you have a byte or less, the new bit is shifted into bit 7, not bit 31. If you have a word or less, the new bit is shifted into bit 15, not bit 31, and so on.

Nice! This is going to be a crazy-fast chip when it's all done! Well, at least compared to P1.

Any thoughts on the mode %10001 idea?

cgracey · 2016-04-06 20:53

Seairth wrote: »

cgracey wrote: »

I'm still working on getting the smart-pin-><-cog communication sped up. Because there are 4-bit busses going each direction, already, I'll send an initial nibble which tells how many data nibbles are coming, so that we can not waste time with leading 0's.

For the serial receive modes, if you have a byte or less, the new bit is shifted into bit 7, not bit 31. If you have a word or less, the new bit is shifted into bit 15, not bit 31, and so on.

Nice! This is going to be a crazy-fast chip when it's all done! Well, at least compared to P1.

Any thoughts on the mode %10001 idea?

Sorry I'm slow to respond to the mode %10001 idea. I read it a few times, but it never sunk in until I reread it very slowly just now. That's a neat idea. It gets things down to an absolute time. That could be done, and I agree that some timescale is necessary. The only problem I see with that is the need to run the whole 32-bit counter to every smart pin. That may be no big deal, but it almost doubles the number of hub/cog signals going into the smart pin. After I get this communication sped up, we'll look more into this.

jmg · 2016-04-06 20:54

Seairth wrote: »

jmg wrote: »

Could this see a rapid dual-change as a timeout ?

I don't see how that could happen. It is possible to mistake 2n transitions as a timeout if you don't handle the first 2n-1 events and the 2nth event happens to fall exactly on the same value of sysclk as the last event that you had processed. If the sysclk granularity feature were implemented, this could certainly happen for more frequent timeout periods (e.g. 2^4). However, if you are trying to handle edge events that are that frequent (i.e. you are missing some of them), my guess is that this is neither the mode nor the chip you should be using.

The important bit, is it IS possible, being rare does not help the user.
Also users may not think they have fast edge events at all - they may miss double pulses on their external comparators, for example.

Seairth wrote: »

Let's estimate 30-40 clock cycles for minimal processing (including the PINGETZ).
On the final chip, you are realistically limited to edge events <40MHz (assuming 160MHz sysclk). If you are handling multiple pin events on the same cog, this will be even lower.

Did you mean 4MHz from that 30-40 clock cycles ?

Yes and no.
This is partly why I push for better co-operation & between multiple pins.

With the Time Interval A-B and/or having one pin cell per edge, you can resolve down to one SysCLK for narrow pulse cases.

Even if those pulses come in above your lower read rate, good capture arming can still get a valid snapshot.
For time-of-flight type uses, you would take a great many such samples, and average for higher resolution.

Prop123_A9_Prop2_v7z Intermediate Release - 25 March 2016 - USB, improved serial

Comments