FDS demo with interrupts — Parallax Forums

FDS demo with interrupts

Seairth Posts: 2,474
edited 2015-10-30 14:21 in Propeller 2
Here's a simple FDS implementation that partially uses interrupts and the new timers. As it currently stands, it works reliably up to 460,800 bps (well, up to that rate in PST). If those numbers hold, this should easily work at 1 Mbps once the PLL is working.


Right now, the driver has three major components:

* Edge interrupt to detect the beginning of a start bit.
* WRLONG interrupt to detect new data to be sent.
* A loop that uses CT1 and CT2 to do the bit-banging. When neither RX nor TX is active, this loop is paused with a WAITINT.

Enjoy!

Edit: Added a second demo that entirely uses interrupts for TX. RX starts via interrupt, then uses timer polling for the rest of the receive period.
Edit: Added a third demo that entirely uses interrupts for both TX and RX.
Edit: Updated all three versions to match 2015-10-29 release.

Comments

  • cgracey Posts: 14,243
    Great! Did you find that the hub read and write interrupts work okay now?
  • cgracey Posts: 14,243
    You were right that we needed more timers. I think the three we have now are plenty. Two may not have been enough, though.
  • Thanks, I'll look at integrating that into Tachyon as I am using a dedicated cog at present just for buffered receive and bit-bashing transmit data directly from the Tachyon cog. I haven't had a look at how interrupts are used on the P2 yet so this will be a great intro for me.
  • cgracey wrote: »
    Great! Did you find that the hub read and write interrupts work okay now?

    I haven't tested read yet, but write is working great!
  • Thanks, I'll look at integrating that into Tachyon as I am using a dedicated cog at present just for buffered receive and bit-bashing transmit data directly from the Tachyon cog. I haven't had a look at how interrupts are used on the P2 yet so this will be a great intro for me.

    Go for it! Note that, at the moment, this is not as flexible as the P1 FDS. It's hardwired to 8N1 and it doesn't yet have a real send or receive buffer. I'll be adding the buffers soon (unless someone beats me to it).
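
For readers unfamiliar with the term, 8N1 means one start bit (low), eight data bits sent LSB first, no parity bit, and one stop bit (high). A minimal Python sketch of that framing follows. The function names are hypothetical and are not part of the demo; this only models the bit order on the wire, not the driver's timing.

```python
def frame_8n1(byte):
    """Return the 10 line levels for one 8N1 character, in transmit order."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in 0..255")
    data_bits = [(byte >> i) & 1 for i in range(8)]  # data bits, LSB first
    return [0] + data_bits + [1]                      # start bit, data, stop bit


def unframe_8n1(levels):
    """Decode 10 line levels back to a byte; raise on a bad start or stop bit."""
    if len(levels) != 10 or levels[0] != 0 or levels[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(levels[1:9]))
```

Each character therefore occupies ten bit periods on the line, which is why a driver fixed to 8N1 can hardwire its bit counts.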
  • I started working on a slight variation of the design that uses even more interrupts! However, I have hit an issue that I cannot seem to resolve. In the attached code, I have a receive-only version. Like the original version, it uses an interrupt to detect the start bit. From there, it uses timer polling to read the remaining bits. (The "more interrupts" part is in the TX code, which I did not include.)

    The problem is occurring with the POLLCT1 at the .read_bit label. I am calling ADDCT1 just before it, then entering a tight polling loop. The loop does eventually exit, but only after the counter has wrapped around and hit the timer value a second time. At least, I think that's what's happening. The value of rx_cnt + full_bit_time (where full_bit_time is 434) should be greater than (later than) the current counter.

    I'm hoping I just have a bug in my code that I can't see.
    con
    	sys_clk = 50_000_000
    	baud_rate = 115_200
    	
    	rx_pin = 63
    	
    	rx_reg = $FFF80
    
    	ct1_int = 1
    	edg_int = 5
    	
    dat
    		orgh
    		org 0
    
    		setb	dirb, #8
    
    		setwrl	#(rx_reg - $FFF80) >> 2		' set wrl-event to the receive buffer
    		
    		call	#start_fds
    
    wait4rx		waitwrl					' wait for a long to be written to receive buffer
    
    		rdlong	.buff, ##rx_reg			' read from RX buffer
    		notb	outb, #8
    		
    		jmp	#wait4rx
    		
    .buff		long	0		
    		
    start_fds	coginit #16, #@fds
    		ret
    
    '==================================================================================================
    		org 0
    fds		
    		clrb	dirb, #rx_pin			' configure RX pin
    
    		mov	ijmp1, #rx_isr			' ISR for RX start
    		
    		setedg	#(%0_10_000000 | rx_pin)	' set up for falling edge
    		setint1	#edg_int			' enable interrupt
    		
    rx_loop		waitint					' wait for one of the interrupts to occur
    		tjz	rx_buff, #rx_loop		' if not receiving, go back to waiting
    
    .check_start	pollct1	wc				' tight loop, waiting for half_bit_time
    	if_nc	jmp	#.check_start
    
    		jnp	#rx_pin, #.good_start		' pin is low, good start
    		mov	rx_buff, ##-1			' bad start bit
    		jmp	#.rx_end
    
    .good_start	addct1	rx_cnt, full_bit_time		' get the first bit sample time
    
    .read_bit	pollct1	wc				' FAIL: missing the event (until the counter wraps around)
    	if_nc	jmp	#.read_bit
    
    		testb	inb, #rx_pin wc			' get pin state to C
    
    		testb	rx_buff, #8 wz			' get buffer state to Z	
    	if_nz	jmp	#.check_end			' buffer is full (z=1)
    
    		rcl	rx_buff, #1			' store the bit
    		addct1	rx_cnt, full_bit_time		' get the next bit sample time
    		jmp	#.read_bit
    
    .check_end
    	if_nc	mov	rx_buff, ##-2			' missing stop bit (c=0)
    	if_c	rev	rx_buff				' since we received LSB first, reverse the bits
    	if_c	getbyte	rx_buff, rx_buff, #3		' and get the byte (which is at 31..24 due to the REV)
    
    .rx_end		wrlong	rx_buff, ##rx_reg		' store value to hub
    		mov	rx_buff, #0			' clear (which indicates read is no longer active)
    
    		jmp	#rx_loop
    
    '--------------------------------------------------------------------------------------------------
    rx_isr		tjnz	rx_buff, #.ret			' see if we are already receiving a byte
    							' buffer is non-zero. this is not the start bit.
    		getcnt	rx_cnt
    		addct1	rx_cnt, half_bit_time		' add 0.5 bit periods (to test valid start bit)
    		
    		mov	rx_buff, #1			' set rx_buff to non-zero.  This bit will
    							' be shifted out of the lower byte. It will be
    							' used to detect when a full byte is received.
    .ret		reti1
    '--------------------------------------------------------------------------------------------------
    full_bit_time	long	sys_clk / baud_rate
    half_bit_time	long	(sys_clk / baud_rate) >> 1
    
    rx_buff		res	1
    rx_cnt		res	1
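
The listing above relies on two pieces of arithmetic: the bit times derived from sys_clk and baud_rate, and a marker bit in rx_buff that reaches bit 8 once eight data bits have been shifted in. The Python sketch below models both. It is a model under stated assumptions, not the driver itself: the function names are hypothetical, REV is assumed to reverse all 32 bits (as the listing's comments describe), and the counter is assumed to be 32 bits wide.

```python
SYS_CLK = 50_000_000
BAUD_RATE = 115_200

FULL_BIT_TIME = SYS_CLK // BAUD_RATE   # clocks per bit, as in full_bit_time
HALF_BIT_TIME = FULL_BIT_TIME >> 1     # clocks to mid start bit, as in half_bit_time

# Assumption: a 32-bit counter. A poll that waited for a full wrap would stall this long.
COUNTER_WRAP_SECONDS = 2**32 / SYS_CLK


def reverse32(value):
    """Reverse the bit order of a 32-bit value (models REV as commented in the listing)."""
    return int(format(value & 0xFFFFFFFF, "032b")[::-1], 2)


def receive_byte(data_bits):
    """Shift in data bits (LSB arrives first) using a marker bit, as rx_buff does."""
    rx_buff = 1                                        # marker bit, set by the ISR
    for bit in data_bits:
        if rx_buff & (1 << 8):                         # marker reached bit 8: byte is full
            break
        rx_buff = ((rx_buff << 1) | bit) & 0xFFFFFFFF  # models RCL with C = sampled pin
    return (reverse32(rx_buff) >> 24) & 0xFF           # after REV the byte sits at bits 31..24
```

With these constants FULL_BIT_TIME is 434 and HALF_BIT_TIME is 217. Under the 32-bit assumption, a wait for a full counter wrap would take roughly 86 seconds at 50 MHz, which gives a rough way to check the wrap-around hypothesis against the observed delay.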
    
  • As pointed out by Chip in the ISR strangeness thread, it turns out that the above code needed a NOP immediately after the WAITINT.

    With that fixed, I have now updated the OP with another attachment: fds_demo2. This one entirely uses interrupts for TX, while still using a combination of interrupt and timer polling for RX. This version has been successfully tested to 1Mbps (with PST, that is)!
  • evanh Posts: 16,088
    Wow, that's pretty damn cool and insane use of interrupts. Masking and unmasking on a per-byte basis ... and even more crazy, firing on a per-bit basis to do the bit-bashing. Interrupts, in general computing at least, are a real-time managerial feature rather than a hit-the-metal type of feature. I guess it's not uncommon territory for microcontrollers, but I still doubt there is much code out there doing bit timing directly with interrupts.

    And that's just the TX code. I note you've very effectively used another timer to regulate the RX code so that cycles stolen by the TX code don't throw out the RX timing. Using the more precise mechanism for the TX means the electrical timing is very stable.

    That's some nice code ... making excellent use of some nice flexible hardware I suppose ... And so readable to boot.
  • Moar interrupts!

    Just updated the OP with a third version that is entirely interrupt driven. It doesn't run any faster, though. On the up side, this should be more power-friendly, as all of the polling has been removed.
  • evanh wrote: »
    Wow, that's pretty damn cool and insane use of interrupts.

    Even more importantly, it's damned fun to write! :lol:

  • Technical note on the second and third version:

    In both cases, I think the safe baud rate is limited by the TX_start ISR. Because of the RDLONG, you must assume that TX_start will always take at least 34-36 clock cycles. And since this ISR can execute in the middle of the RX bit sampling (either the polled version in V2 or the interrupt version in V3), the RX sample could occur at least 36 clocks later than expected. Since the RX attempts to sample at the middle of the bit period, this means that the bit period must be at least 72 clock cycles wide (plus some for margin of error). On the current 50MHz clock, this means a maximum safe baud rate of ~625Kbps (assuming 80 clock cycles per bit period).

    The only solution I've been able to think of so far is to actually sample RX much closer to the beginning of each bit time. As long as the bit transitions are fast, you could reasonably sample only a few clock cycles into each bit time. This would make the maximum baud rate dependent on ~(sample_delay+36) instead of ~(2*36). With clean signalling, you should be able to get back up to the ~1Mbps rate with the current 50MHz clock.
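
The baud-rate estimates in this note can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, the 36-clock ISR length is the figure assumed in the note, and the 14-clock sample delay in the usage note is an example value, not a measured one.

```python
SYS_CLK = 50_000_000
TX_START_CLOCKS = 36   # worst-case TX_start ISR length assumed in the note


def max_baud(clocks_per_bit, sys_clk=SYS_CLK):
    """Baud rate whose bit period is exactly clocks_per_bit system clocks."""
    return sys_clk / clocks_per_bit


def min_bit_clocks_mid_sample(isr_clocks=TX_START_CLOCKS):
    """Sampling at mid-bit: half a bit period must absorb the ISR delay."""
    return 2 * isr_clocks


def min_bit_clocks_early_sample(sample_delay, isr_clocks=TX_START_CLOCKS):
    """Sampling sample_delay clocks into the bit: the rest must absorb the ISR delay."""
    return sample_delay + isr_clocks
```

For example, max_baud(80) gives 625,000 bps, matching the ~625 Kbps figure for an 80-clock bit period, while an assumed sample delay of 14 clocks gives a 50-clock bit period, or 1 Mbps at 50 MHz.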
  • Can't you just make the RX sample interrupt higher priority than the TX_start one? After all, receiving data is more urgent than sending it.
  • Seairth wrote: »
    Technical note on the second and third version:

    In both cases, I think the safe baud rate is limited by the TX_start ISR. Because of the RDLONG, you must assume that TX_start will always take at least 34-36 clock cycles. And since this ISR can execute in the middle of the RX bit sampling (either the polled version in V2 or the interrupt version in V3), the RX sample could occur at least 36 clocks later than expected. Since the RX attempts to sample at the middle of the bit period, this means that the bit period must be at least 72 clock cycles wide (plus some for margin of error). On the current 50MHz clock, this means a maximum safe baud rate of ~625Kbps (assuming 80 clock cycles per bit period).

    The only solution I've been able to think of so far is to actually sample RX much closer to the beginning of each bit time. As long as the bit transitions are fast, you could reasonably sample only a few clock cycles into each bit time. This would make the maximum baud rate dependent on ~(sample_delay+36) instead of ~(2*36). With clean signalling, you should be able to get back up to the ~1Mbps rate with the current 50MHz clock.


    Do not forget sampling RX bits via interrupt on bit time periods may cause problems if any of the instructions mentioned in this post to you are executing at the time of interrupt.

    forums.parallax.com/discussion/comment/1351880/#Comment_1351880
  • evanh Posts: 16,088
    It's the Hub access that's the problem. There must be a way to use RDFAST and WRFAST; I believe they can eliminate the long instructions. Would have to alternate between them though. Sort of a page flipping exercise but with two way streaming instead.
  • evanh wrote: »
    It's the Hub access that's the problem. There must be a way to use RDFAST and WRFAST; I believe they can eliminate the long instructions. Would have to alternate between them though. Sort of a page flipping exercise but with two way streaming instead.

    Interesting idea! At the very least, using WRFAST instead of WRLONG would be a good idea. I hadn't considered that the WRLONG could cause a TX timer event to get delayed for the same reason described above. Because the RX and TX are asynchronous, I doubt there's a reliable way to safely switch back and forth between RDFAST and WRFAST.
  • evanh Posts: 16,088
    edited 2015-10-30 14:10
    Bugger, RDFAST is a long instruction. So can't be reissuing it. :( I wonder if a reliable way can be made to change RDFAST to be a fast two-clock instruction ... Chip, any suggestions? an alternative maybe? That FIFO is just begging to be used for this sort of thing.
  • 78rpm wrote: »
    Do not forget sampling RX bits via interrupt on bit time periods may cause problems if any of the instructions mentioned in this post to you are executing at the time of interrupt.

    forums.parallax.com/discussion/comment/1351880/#Comment_1351880

    True. There are a few AUGS that I could get rid of.
  • evanh Posts: 16,088
    edited 2015-10-30 14:14
    AUGS only stretches an extra two clocks. It's no worse than a branch instruction, unless it's being applied to a branch, that is.

    BTW: What indicates an AUGS is used?
  • 78rpm Posts: 264
    edited 2015-10-30 14:38
    *** EDITED 40 minutes later because, TWIT that I am, I forgot to scroll down and so put my comment inside the quote! ***
    Seairth wrote: »
    78rpm wrote: »
    Do not forget sampling RX bits via interrupt on bit time periods may cause problems if any of the instructions mentioned in this post to you are executing at the time of interrupt.

    forums.parallax.com/discussion/comment/1351880/#Comment_1351880
    True. There are a few AUGS that I could get rid of.

    I was thinking with respect to receiving the RX bits and the code currently executing at the time. The RX start bit could be delayed by, say, a long REP sequence in non-ISR code. If the non-ISR code has further long REP sequences, the counter interrupt for RX mid-bit sampling would be delayed, possibly missing the bit. You may need to stay within the ISR for the loop, or modify the return address in the IRETn register, saving the interrupted program's return address in a normal long, so that it returns to your own code (possibly a dummy) whilst RX bits are incoming; then, when all bits are received, return to the original non-ISR code. This method would allow other interrupts to occur, not just higher-priority ones.
  • evanh wrote: »
    AUGS only stretches an extra two clocks. It's no worse than a branch instruction, unless it's being applied to a branch, that is.

    BTW: What indicates an AUGS is used?

    Anywhere I am using "##". In all of those cases, I can replace them with a register value instead of an immediate value.
  • I just updated the OP for the 2015-10-29 release.
  • Seairth wrote: »
    I just updated the OP for the 2015-10-29 release.

    Just updated my comment so it does not appear as a quote! :blush:
  • evanh Posts: 16,088
    A way to change RDFAST might be to add some more event flags that can be polled/triggered on. One for FIFO full would allow the RDFAST instruction to return immediately and then software can also know when the FIFO is ready to be read from without any delays.
  • evanh wrote: »
    A way to change RDFAST might be to add some more event flags that can be polled/triggered on. One for FIFO full would allow the RDFAST instruction to return immediately and then software can also know when the FIFO is ready to be read from without any delays.

    Maybe. But don't encourage Chip down this route, at least not yet! I want those smart pins finished! That might give us some other avenues for performance improvement.
  • evanh Posts: 16,088
    But, but ... this is a generic coding side ... about smoothing the state-machine emulation capabilities. I'm really keen to find a way to take the bumps out of Hub accesses. That was a battle even for Prop1 but it's worse for the Prop2 with its faster Cogs. It'll be worth it.

    Smartpins is hardware for hardware. That's cool and all but is not the same thing.
  • evanh wrote: »
    But, but ... this is a generic coding side ... about smoothing the state-machine emulation capabilities. I'm really keen to find a way to take the bumps out of Hub accesses. That was a battle even for Prop1 but it's worse for the Prop2 with its faster Cogs. It'll be worth it.

    Smartpins is hardware for hardware. That's cool and all but is not the same thing.

    This is relevant to the discussion so far, but not just about fds-demo.

    What if you synchronise with the hub-access slot and set a counter to interrupt at the point when hub data can be read / written? That would mean buffering the data internal to the cog/lut. The interrupt occurs early, since you have deducted an amount of time to allow reading the cog data to pass to the hub. Then, at the moment the hub slot is available, the next instruction is the read / write. If read, save data in cog/lut for the TX routine. IRETn.

    It would be overhead to some degree, but it would also mean not waiting for the hub as such. Your pure cog code can run merrily along until it runs out of or has generated enough data, and then has to wait.
  • 78rpm wrote: »
    evanh wrote: »
    But, but ... this is a generic coding side ... about smoothing the state-machine emulation capabilities. I'm really keen to find a way to take the bumps out of Hub accesses. That was a battle even for Prop1 but it's worse for the Prop2 with its faster Cogs. It'll be worth it.

    Smartpins is hardware for hardware. That's cool and all but is not the same thing.

    This is relevant to the discussion so far, but not just about fds-demo.

    What if you synchronise with the hub-access slot and set a counter to interrupt at the point when hub data can be read / written? That would mean buffering the data internal to the cog/lut. The interrupt occurs early, since you have deducted an amount of time to allow reading the cog data to pass to the hub. Then, at the moment the hub slot is available, the next instruction is the read / write. If read, save data in cog/lut for the TX routine. IRETn.

    It would be overhead to some degree, but it would also mean not waiting for the hub as such. Your pure cog code can run merrily along until it runs out of or has generated enough data, and then has to wait.

    Actually, if there were a Special Function Register on which you could do a RDBYTE/WORD/LONG or WRBYTE/WORD/LONG, which stored the data, with some hidden bits storing the registers involved (including the ptr), then when the hub access came round it would be *automatically* executed. You could have a higher throughput of cog code without waiting on the hub so much, by sometimes reading data in advance. If the WAITxxx/POLLxxx instructions were extended to provide the status of this, it would be a useful background transfer mechanism.
  • evanh Posts: 16,088
    edited 2015-10-30 15:54
    That'll be harder now (Prop2), methinks. The Hub block mapping likely makes for varying timing slots depending on the target address. Anyone care to test with a P123 or DE2?

    And even synchronised still incurs a minimum number of clocks that will probably be 6 or so for a RDLONG.
  • evanh wrote: »
    That'll be harder now (Prop2), methinks. The Hub block mapping likely makes for varying timing slots depending on the target address. Anyone care to test with a P123 or DE2?

    And even synchronised still incurs a minimum number of clocks that will probably be 6 or so for a RDLONG.

    Indeed, then I thought of the background read/write via SFR.
  • evanh Posts: 16,088
    That's what the FIFO already does, well, in incrementing address order at least. RDFAST/WRFAST enable using the FIFO with RDLONG and co.