FDS demo with interrupts

Seairth · 2015-10-28 03:00

Here's a simple FDS implementation that partially uses interrupts and the new timers. As it currently stands, it works reliably up to 460,800 bps (well, up to that rate in PST). If those numbers hold, this should easily work at 1Mbps once the PLL is working.

Right now, the driver has three major components:

* Edge interrupt to detect the beginning of a start bit.
* WRLONG interrupt to detect new data to be sent.
* A loop that uses CT1 and CT2 to do the bit-banging. When neither RX nor TX is active, this loop is paused with a WAITINT.

Enjoy!

Edit: Added a second demo that entirely uses interrupts for TX. RX starts via interrupt, then uses timer polling for the rest of the receive period.
Edit: Added a third demo that entirely uses interrupts for both TX and RX.
Edit: Updated all three versions to match 2015-10-29 release.

cgracey · 2015-10-28 05:26

Great! Did you find that the hub read and write interrupts work okay now?

cgracey · 2015-10-28 05:28

You were right that we needed more timers. I think the three we have now are plenty. Two may not have been enough, though.

Peter Jakacki · 2015-10-28 06:23

Thanks, I'll look at integrating that into Tachyon as I am using a dedicated cog at present just for buffered receive and bit-bashing transmit data directly from the Tachyon cog. I haven't had a look at how interrupts are used on the P2 yet so this will be a great intro for me.

Seairth · 2015-10-28 11:25

cgracey wrote: »

Great! Did you find that the hub read and write interrupts work okay now?

I haven't tested read yet, but write is working great!

Seairth · 2015-10-28 11:43

Peter Jakacki wrote: »

Thanks, I'll look at integrating that into Tachyon as I am using a dedicated cog at present just for buffered receive and bit-bashing transmit data directly from the Tachyon cog. I haven't had a look at how interrupts are used on the P2 yet so this will be a great intro for me.

Go for it! Note that, at the moment, this is not as flexible as the P1 FDS. It's hardwired to 8N1 and it doesn't yet have a real send or receive buffer. I'll be adding the buffers soon (unless someone beats me to it).

Seairth · 2015-10-29 13:33

I started working on a slight variation of the design that used even more interrupts! However, I have hit an issue that I cannot seem to resolve. In the attached code, I have a receive-only version. Like the original version, it uses an interrupt to detect the start bit. From there, it uses timer polling to read the remaining bits. (The "more interrupts" part is in the TX code, which I did not include.)

The problem is occurring with the POLLCT1 at the .read_bit label. I am calling ADDCT1 just before it, then entering into a tight polling loop. The loop does eventually exit, but only after the counter has wrapped around and hit the timer the second time. At least, I think that's what's happening. The value of rx_cnt+full_bit_time (with value of 434) should be greater than (later than) the current counter.
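To make the suspected failure mode concrete, here is a minimal Python sketch. It assumes a simplified model in which the timer event fires only when the free-running 32-bit counter equals the target exactly; that is an assumption for illustration, not a confirmed description of the hardware. The names clocks_until_match, rx_cnt and MASK32 are hypothetical.

```python
# Illustrative model only: assumes the event fires when the free-running
# 32-bit counter equals the target exactly (an assumption, not confirmed).
SYS_CLK = 50_000_000
BAUD_RATE = 115_200
MASK32 = 0xFFFF_FFFF

# Same integer arithmetic as full_bit_time in the listing: 434 clocks per bit.
full_bit_time = SYS_CLK // BAUD_RATE


def clocks_until_match(now, target):
    """Clocks the counter must advance before it equals target (mod 2**32)."""
    return (target - now) & MASK32


rx_cnt = 1_000_000                          # hypothetical counter snapshot
target = (rx_cnt + full_bit_time) & MASK32  # next sample time

# Target is still ahead of the counter: short wait.
on_time = clocks_until_match(rx_cnt + 100, target)

# Counter has already passed the target: wait for a full wrap-around.
too_late = clocks_until_match(rx_cnt + 500, target)

print(full_bit_time)                # 434
print(on_time)                      # 334
print(too_late)                     # 4294967230
print(round(too_late / SYS_CLK, 1)) # 85.9 (seconds at 50 MHz)
```

Under this model, a target that is even a few clocks in the past turns a sub-bit wait into a wait of nearly 86 seconds, which would look like the loop "eventually" exiting.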

I'm hoping I just have a bug in my code that I can't see.

con
	sys_clk = 50_000_000
	baud_rate = 115_200
	
	rx_pin = 63
	
	rx_reg = $FFF80

	ct1_int = 1
	edg_int = 5
	
dat
		orgh
		org 0

		setb	dirb, #8

		setwrl	#(rx_reg - $FFF80) >> 2		' set wrl-event for the receive buffer
		
		call	#start_fds

wait4rx		waitwrl					' wait for a long to be written to receive buffer

		rdlong	.buff, ##rx_reg			' read from RX buffer
		notb	outb, #8
		
		jmp	#wait4rx
		
.buff		long	0		
		
start_fds	coginit #16, #@fds
		ret

'==================================================================================================
		org 0
fds		
		clrb	dirb, #rx_pin			' configure RX pin

		mov	ijmp1, #rx_isr			' ISR for RX start
		
		setedg	#(%0_10_000000 | rx_pin)	' set up for falling edge
		setint1	#edg_int			' enable interrupt
		
rx_loop		waitint					' wait for one of the interrupts to occur
		tjz	rx_buff, #rx_loop		' if not receiving, go back to waiting

.check_start	pollct1	wc				' tight loop, waiting for half_bit_time
	if_nc	jmp	#.check_start

		jnp	#rx_pin, #.good_start		' pin is low, good start
		mov	rx_buff, ##-1			' bad start bit
		jmp	#.rx_end

.good_start	addct1	rx_cnt, full_bit_time		' get the first bit sample time

.read_bit	pollct1	wc				' FAIL: missing the event (until the counter wraps around)
	if_nc	jmp	#.read_bit

		testb	inb, #rx_pin wc			' get pin state to C

		testb	rx_buff, #8 wz			' get buffer state to Z	
	if_nz	jmp	#.check_end			' buffer is full (z=1)

		rcl	rx_buff, #1			' store the bit
		addct1	rx_cnt, full_bit_time		' get the next bit sample time
		jmp	#.read_bit

.check_end
	if_nc	mov	rx_buff, ##-2			' missing stop bit (c=0)
	if_c	rev	rx_buff				' since we received LSB first, reverse the bits
	if_c	getbyte	rx_buff, rx_buff, #3		' and get the byte (which is at 31..24 due to the REV)

.rx_end		wrlong	rx_buff, ##rx_reg		' store value to hub
		mov	rx_buff, #0			' clear (which indicates read is no longer active)

		jmp	#rx_loop

'--------------------------------------------------------------------------------------------------
rx_isr		tjnz	rx_buff, #.ret			' see if we are already receiving a byte
							' buffer is non-zero. this is not the start bit.
		getcnt	rx_cnt
		addct1	rx_cnt, half_bit_time		' add 0.5 bit periods (to test valid start bit)
		
		mov	rx_buff, #1			' set rx_buff to non-zero.  This bit will
							' be shifted out of the lower byte. It will be
							' used to detect when a full byte is received.
.ret		reti1
'--------------------------------------------------------------------------------------------------
full_bit_time	long	sys_clk / baud_rate
half_bit_time	long	(sys_clk / baud_rate) >> 1

rx_buff		res	1
rx_cnt		res	1
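For readers following the shift logic in .read_bit and .check_end, here is a small Python model of the intended behaviour. It is a sketch of the technique as the comments describe it (a sentinel 1 that is shifted up to bit 8, followed by a 32-bit reverse and a top-byte extract), not a cycle-accurate model of the cog. The helper names rev32, receive_byte and frame_bits are hypothetical.

```python
def rev32(x):
    """Reverse the bit order of a 32-bit value (models REV)."""
    return int(f"{x & 0xFFFFFFFF:032b}"[::-1], 2)


def receive_byte(samples):
    """Model the receive loop.

    samples: pin levels at each bit time, 8 data bits LSB first,
    followed by the stop bit.
    Returns the byte, or -2 if the stop bit is missing.
    """
    rx_buff = 1                          # sentinel set by the start-bit ISR
    for c in samples:                    # c models the sampled pin in C
        if rx_buff & (1 << 8):           # sentinel reached bit 8: 8 bits stored
            if not c:
                return -2                # stop bit was low
            # Bits arrived LSB first, so reverse and take bits 31..24.
            return (rev32(rx_buff) >> 24) & 0xFF
        rx_buff = ((rx_buff << 1) | c) & 0xFFFFFFFF   # models RCL rx_buff, #1
    raise ValueError("ran out of samples before the stop bit")


def frame_bits(byte, stop=1):
    """Data bits of an 8N1 frame (LSB first) plus the stop bit."""
    return [(byte >> i) & 1 for i in range(8)] + [stop]


print(hex(receive_byte(frame_bits(0x41))))   # 0x41
print(receive_byte(frame_bits(0x41, stop=0)))  # -2
```

The sentinel avoids a separate bit counter: the same register holds both the partial byte and the "how many bits so far" state.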

Seairth · 2015-10-30 10:57

As pointed out by Chip in the ISR strangeness thread, it turns out that the above code needed a NOP immediately after the WAITINT.

With that fixed, I have now updated the OP with another attachment: fds_demo2. This one entirely uses interrupts for TX, while still using a combination of interrupt and timer polling for RX. This version has been successfully tested to 1Mbps (with PST, that is)!

evanh · 2015-10-30 12:05

Wow, that's pretty damn cool and insane use of interrupts. Masking and unmasking on a per byte basis ... and even more crazy, firing on a per bit basis to perform the bit by bit bashing. Interrupts, as per general computing at least, are a real-time managerial feature rather than a hit the metal type feature. I guess it's not uncommon territory for microcontrollers but I still doubt there is that much code out there doing bit timing directly with the interrupts.

And that's just the TX code. I note you've very effectively used another timer to regulate the RX code so that cycles stolen by the TX code don't throw out the RX timing. Using the more precise mechanism for the TX means the electrical timing is very stable.

That's some nice code ... making excellent use of some nice flexible hardware I suppose ... And so readable to boot.

Seairth · 2015-10-30 12:27

Moar interrupts!

Just updated the OP with a third version that is entirely interrupt driven. It doesn't run any faster, though. On the up side, this should be more power-friendly, as all of the polling has been removed.

Seairth · 2015-10-30 12:28

evanh wrote: »

Wow, that's pretty damn cool and insane use of interrupts.

Even more importantly, it's damned fun to write!

Seairth · 2015-10-30 12:44

Technical note on the second and third version:

In both cases, I think the safe baud rate is limited by the TX_start ISR. Because of the RDLONG, you must assume that TX_start will always take at least 34-36 clock cycles. And since this ISR can execute in the middle of the RX bit sampling (either the polled version in V2 or the interrupt version in V3), the RX sample could occur at least 36 clocks later than expected. Since the RX attempts to sample at the middle of the bit period, this means that the bit period must be at least 72 clock cycles wide (plus some for margin of error). On the current 50MHz clock, this means a maximum safe baud rate of ~625Kbps (assuming 80 clock cycles per bit period).

The only solution I've been able to think of so far is to actually sample RX much closer to the beginning of each bit time. As long as the bit transitions are fast, you could reasonably sample only a few clock cycles into each bit time. This would make the maximum baud rate dependent on ~(sample_delay+36) instead of ~(2*36). With clean signalling, you should be able to get back up to the ~1Mbps rate with the current 50MHz clock.
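The arithmetic behind those two estimates can be sketched as follows. The 36-clock ISR latency, the 80-clock bit period and the 50MHz clock come from the note above; the 4-clock sample_delay is a hypothetical value chosen only to illustrate "a few clock cycles into each bit time".

```python
SYS_CLK = 50_000_000   # current clock, from the post
ISR_DELAY = 36         # worst-case TX_start latency in clocks, from the post


def min_bit_clocks_mid(isr_delay):
    """Mid-bit sampling: the delay must fit in half a bit period."""
    return 2 * isr_delay


def min_bit_clocks_early(sample_delay, isr_delay):
    """Early sampling: the delay must fit in the rest of the bit period."""
    return sample_delay + isr_delay


def max_baud(clocks_per_bit):
    """Highest baud rate for a given bit period in clocks."""
    return SYS_CLK / clocks_per_bit


print(min_bit_clocks_mid(ISR_DELAY))   # 72
print(max_baud(80))                    # 625000.0 (72 clocks plus margin)

sample_delay = 4                       # hypothetical: a few clocks into the bit
early = min_bit_clocks_early(sample_delay, ISR_DELAY)
print(early)                           # 40
print(max_baud(early))                 # 1250000.0 before any margin
print(max_baud(50))                    # 1000000.0 with 10 clocks of margin
```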

Electrodude · 2015-10-30 13:50

Can't you just make the RX sample interrupt higher priority than the TX_start one? After all, receiving data is more urgent than sending it.

78rpm · 2015-10-30 13:56

Seairth wrote: »

Technical note on the second and third version:

In both cases, I think the safe baud rate is limited by the TX_start ISR. Because of the RDLONG, you must assume that TX_start will always take at least 34-36 clock cycles. And since this ISR can execute in the middle of the RX bit sampling (either the polled version in V2 or the interrupt version in V3), the RX sample could occur at least 36 clocks later than expected. Since the RX attempts to sample at the middle of the bit period, this means that the bit period must be at least 72 clock cycles wide (plus some for margin of error). On the current 50MHz clock, this means a maximum safe baud rate of ~625Kbps (assuming 80 clock cycles per bit period).

The only solution I've been able to think of so far is to actually sample RX much closer to the beginning of each bit time. As long as the bit transitions are fast, you could reasonably sample only a few clock cycles into each bit time. This would make the maximum baud rate dependent on ~(sample_delay+36) instead of ~(2*36). With clean signalling, you should be able to get back up to the ~1Mbps rate with the current 50MHz clock.

Do not forget sampling RX bits via interrupt on bit time periods may cause problems if any of the instructions mentioned in this post to you are executing at the time of interrupt.

forums.parallax.com/discussion/comment/1351880/#Comment_1351880

evanh · 2015-10-30 13:58

It's the Hub access that's the problem. There must be a way to use RDFAST and WRFAST; I believe they can eliminate the long instructions. Would have to alternate between them though. Sort of a page flipping exercise but with two way streaming instead.

Seairth · 2015-10-30 14:03

evanh wrote: »

It's the Hub access that's the problem. There must be a way to use RDFAST and WRFAST; I believe they can eliminate the long instructions. Would have to alternate between them though. Sort of a page flipping exercise but with two way streaming instead.

Interesting idea! At the very least, using WRFAST instead of WRLONG would be a good idea. I hadn't considered that the WRLONG could cause a TX timer event to get delayed for the same reason described above. Because the RX and TX are asynchronous, I doubt there's a reliable way to safely switch back and forth between RDFAST and WRFAST.

evanh · 2015-10-30 14:04

Bugger, RDFAST is a long instruction. So can't be reissuing it.

I wonder if a reliable way can be made to change RDFAST to be a fast two-clock instruction ... Chip, any suggestions? an alternative maybe? That FIFO is just begging to be used for this sort of thing.

Seairth · 2015-10-30 14:05

78rpm wrote: »

Do not forget sampling RX bits via interrupt on bit time periods may cause problems if any of the instructions mentioned in this post to you are executing at the time of interrupt.

forums.parallax.com/discussion/comment/1351880/#Comment_1351880

True. There are a few AUGS that I could get rid of.

evanh · 2015-10-30 14:12

AUGS only stretches an extra two clocks. It's no worse than a branch instruction, unless it's being applied to a branch that is.

BTW: What indicates an AUGS is used?

78rpm · 2015-10-30 14:16

*** EDITED 40 minutes later because, TWIT that I am, I forgot to scroll down and put my comment inside the quote! ***

Seairth wrote: »

78rpm wrote: »

Do not forget sampling RX bits via interrupt on bit time periods may cause problems if any of the instructions mentioned in this post to you are executing at the time of interrupt.

forums.parallax.com/discussion/comment/1351880/#Comment_1351880

True. There are a few AUGS that I could get rid of.

I was thinking with respect to receiving the RX bits and the code currently executing at the time. The RX start bit could be delayed by, say, a long REP sequence in non-ISR code. If the non-ISR code has further long REP sequences, the counter interrupt for RX mid-bit sampling would be delayed, possibly missing the bit. You may need to stay within the ISR for the loop, or modify the return address in the IRETn register, saving the interrupted program's return address in a normal long, so that it returns to your own code (possibly a dummy) whilst RX bits are incoming; then, when all are received, return to the original non-ISR code. This method would allow other interrupts to occur, not just higher priority ones.

Seairth · 2015-10-30 14:17

evanh wrote: »

AUGS only stretches an extra two clocks. It's no worse than a branch instruction, unless it's being applied to a branch that is.

BTW: What indicates an AUGS is used?

Anywhere I am using "##". In all of those cases, I can replace them with a register value instead of an immediate value.

Seairth · 2015-10-30 14:21

I just updated the OP for the 2015-10-29 release.

78rpm · 2015-10-30 14:39

Seairth wrote: »

I just updated the OP for the 2015-10-29 release.

Just updated my comment so it does not appear as a quote!

evanh · 2015-10-30 14:42

A way to change RDFAST might be to add some more event flags that can be polled/triggered on. One for FIFO full would allow the RDFAST instruction to return immediately and then software can also know when the FIFO is ready to be read from without any delays.

Seairth · 2015-10-30 14:44

evanh wrote: »

A way to change RDFAST might be to add some more event flags that can be polled/triggered on. One for FIFO full would allow the RDFAST instruction to return immediately and then software can also know when the FIFO is ready to be read from without any delays.

Maybe. But don't encourage Chip down this route, at least not yet! I want those smart pins finished! That might give us some other avenues for performance improvement.

evanh · 2015-10-30 15:19

But, but ... this is the generic coding side ... about smoothing the state-machine emulation capabilities. I'm really keen to find a way to take the bumps out of Hub accesses. That was a battle even for Prop1 but it's worse for the Prop2 with its faster Cogs. It'll be worth it.

Smartpins is hardware for hardware. That's cool and all but is not the same thing.

78rpm · 2015-10-30 15:31

evanh wrote: »

But, but ... this is the generic coding side ... about smoothing the state-machine emulation capabilities. I'm really keen to find a way to take the bumps out of Hub accesses. That was a battle even for Prop1 but it's worse for the Prop2 with its faster Cogs. It'll be worth it.

Smartpins is hardware for hardware. That's cool and all but is not the same thing.

This is relevant to the discussion so far, but not just about fds-demo.

What if you synchronise with the hub-access slot, and set a counter to interrupt at the point when hub data can be read or written? That would mean buffering the data internally in the cog/lut. When the interrupt occurs, you have already deducted an amount of time to allow reading the cog data to pass to the hub. Then, at the moment the hub slot is available, the next instruction is the read or write. If it is a read, save the data in cog/lut for the TX routine. IRETn.

It would be overhead to some degree, but it would also mean not waiting for the hub as such. Your pure cog code can run merrily along until it runs out of or has generated enough data, and then has to wait.

78rpm · 2015-10-30 15:52

78rpm wrote: »

evanh wrote: »

But, but ... this is the generic coding side ... about smoothing the state-machine emulation capabilities. I'm really keen to find a way to take the bumps out of Hub accesses. That was a battle even for Prop1 but it's worse for the Prop2 with its faster Cogs. It'll be worth it.

Smartpins is hardware for hardware. That's cool and all but is not the same thing.

This is relevant to the discussion so far, but not just about fds-demo.

What if you synchronise with the hub-access slot, and set a counter to interrupt at the point when hub data can be read or written? That would mean buffering the data internally in the cog/lut. When the interrupt occurs, you have already deducted an amount of time to allow reading the cog data to pass to the hub. Then, at the moment the hub slot is available, the next instruction is the read or write. If it is a read, save the data in cog/lut for the TX routine. IRETn.

It would be overhead to some degree, but it would also mean not waiting for the hub as such. Your pure cog code can run merrily along until it runs out of or has generated enough data, and then has to wait.

Actually, if there were a Special Function Register on which you could do a RDBYTE/WORD/LONG or WRBYTE/WORD/LONG, which stored the data, with some hidden bits storing the registers involved (including the ptr), so that the access was *automatically* executed when the hub slot came round, you could have a higher throughput of cog code without waiting on the hub so much, by sometimes reading data in advance. If the WAITxxx and POLLxxx instructions were extended to provide the status of this, it would be a useful background transfer mechanism.

evanh · 2015-10-30 15:53

That'll be harder now (Prop2), methinks. The Hub block mapping likely makes for varying timing slots depending on the target address. Anyone care to test with a P123 or DE2?

And even synchronised still incurs a minimum number of clocks that will probably be 6 or so for a RDLONG.

78rpm · 2015-10-30 15:55

evanh wrote: »

That'll be harder now (Prop2) me thinks. The Hub block mapping likely makes for varying timing slots depending on the target address. Anyone care to test with a P123 or DE2?

And even synchronised still incurs a minimum number of clocks that will probably be 6 or so for a RDLONG.

Indeed, then I thought of the background read/write via SFR.

evanh · 2015-10-30 16:35

That's what the FIFO already does, well, in incrementing address order at least. RDFAST/WRFAST enable using the FIFO with RDLONG and co.
