cogserial - fullduplex smart serial using interrupt - Page 2 — Parallax Forums

Comments

  • jmg Posts: 15,173
    msrobots wrote: »
    as for 2 stop bits, might be a try, I just don't know how to do that with smart pins, must read a bit about that.

    IIRC, I think you just define TX as 9 bits, and align so the final sent bit is 1. With the smart pins, you can thus define any number of stop bits this way, up to the 32b field width.
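    A minimal Python sketch of that idea, not tested on hardware. It assumes the word is shifted out LSB first, so any bits above bit 7 go out after the data and, being 1, look like extra stop time. The helper name `tx_word` is hypothetical.

```python
def tx_word(data_byte, extra_stop_bits=1):
    """Return (word, word_length_in_bits) for a TX word padded with 1 bits.

    Assumption: the transmitter sends LSB first and idles high, so 1 bits
    placed above the 8 data bits extend the stop period.
    """
    if not 0 <= data_byte <= 0xFF:
        raise ValueError("data_byte must fit in 8 bits")
    if extra_stop_bits < 0 or 8 + extra_stop_bits > 32:
        raise ValueError("word must fit in the 32-bit field")
    pad = (1 << extra_stop_bits) - 1          # e.g. 1 extra bit -> %1
    return data_byte | (pad << 8), 8 + extra_stop_bits


word, length = tx_word(0x55, extra_stop_bits=1)
print(hex(word), length)
```

    The word length would then be configured as 9 bits instead of 8, giving the normal stop bit plus one more bit time at the idle level.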

  • jmg Posts: 15,173
    msrobots wrote: »
    ...
    the current version does this when using just one rx/tx pair and the echo server
    running at baud 691200
      45061683 - PASS - 639204 - 146
      45061619 - PASS - 639204 - 146
      45061723 - PASS - 639204 - 146
    
    
    the first number is the sysclocks taken for the test, thus negative on errors
    the number after PASS is the effective baud rate including code overhead, and the third number is the deviation in sysclocks per byte because of that overhead.

    Interesting effect, - that seems quite a few Sysclks overhead, for a modest baud rate for P2 ?
    Your times :  
     45061683/180M  = 0.25034268333333333333
     45061619/180M  = 0.25034232777777777778
     45061723/180M  = 0.25034290555555555556
    
    Possible TX times (following usual UARTS granularity )
     16k*10/691200  = 0.23148148148148148148
     16k*11/691200  = 0.25462962962962962963 - hmm, you get somewhere in the middle
    
    Equivalent Stop Bit time
    
     16k*10.814813/691200  = 0.25034289351851851852
    
    Expressing that as SysCLKs 
      (180M/691200)*0.814813 = 212.190  (not quite your 146?)
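    The arithmetic above can be re-run with a short Python sketch. It assumes a 10-bit frame (start + 8 data + stop) and the 180 MHz clock used in the thread. The byte count is the uncertain input: the figures above take "16k" as 16000, while 16384 is only a guess at what the test might have sent.

```python
SYSCLK_HZ = 180_000_000
BAUD = 691_200
BITS_PER_CHAR = 10            # assumed frame: start + 8 data + 1 stop


def effective_bits_per_char(total_sysclks, byte_count):
    """Measured time per byte, expressed in bit times."""
    return total_sysclks / byte_count * BAUD / SYSCLK_HZ


def overhead_sysclks_per_byte(total_sysclks, byte_count):
    """Measured sysclocks per byte minus the ideal 10-bit frame time."""
    ideal = BITS_PER_CHAR * SYSCLK_HZ / BAUD
    return total_sysclks / byte_count - ideal


measured = 45_061_683         # first reported test time, in sysclocks
for count in (16_000, 16_384):
    print(count,
          round(effective_bits_per_char(measured, count), 4),
          round(overhead_sysclks_per_byte(measured, count), 2))
```

    With 16000 bytes this gives the same ~212 sysclocks per byte as above. With 16384 bytes it comes out at about 146, which would match the reported figure if that is what the test actually transferred.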
    

    Notice that elapsed time, is not a whole bit time. Most uarts derive a BAUD clock, and all TX's align to that.
    That means sending "UUUUUUU" gives an exact baud/2 (5 pulses in 10 bit times) on most UARTS I've tested.

    At 691200, you should have ~ a whole bit time (~130 opcodes) from char-done to load-next-char for the interrupt, & more if the P2 interrupts on Tx buffer emptied.
    What is the exact timing of the TX interrupt ?

    Does P2 reset the TX timing on every byte ? (or is it jittering between 10 & 11 bits/char)


    I would have expected P2 to be able to pack bytes continually in Tx and Rx. (certainly at 691200)
    It certainly needs to be able to receive bytes continually (no gaps)

    Any same-COG test is going to somewhat naturally self-pace. but a 2 COG test might have skews in paths in echo ?


  • cgracey Posts: 14,153
    The smart pin serial modes can absolutely send and receive gapless data.
  • jmg Posts: 15,173
    cgracey wrote: »
    The smart pin serial modes can absolutely send and receive gapless data.

    I thought they could/should.
    Is the UART TX baud-aligned between bytes ? (ie are fractional bit times between bytes impossible ?)

    How much time margin is there on the TX side, and RX side, for interrupts ?
    That would be useful in the DOCs, to see exactly when the TX and RX interrupts fire, and the best way to manage normal data, and RS485 data (which needs to wait for end of stop bit, before change of direction)
  • msrobots Posts: 3,709
    edited 2019-02-03 08:50
    yeah, I think my measured times are not correct: it is the time needed to send 16K and receive 16K async, plus the time to read and write to and from the HUB with wrbyte/rdbyte.

    I think the pins are transmitting gapless. The 1-COG talking-to-itself driver runs at an astonishing 90 Mbaud with a 180 MHz clock. It just breaks down when I use my echo server on another COG in between.

    That might be a problem of my echo server, I just threw it together in Spin, maybe doing inline PASM can do better than fastspin, but I think the problem is still a foot away from the screen I am looking at.

    EDIT: I should be more precise here: the pins are transmitting at 90 Mbaud with a 180 MHz clock, but my driver is not fast enough to feed them constantly, so the driver maxes out at around 70 Mbaud or so.

    I am still working on it,

    Enjoy!

    Mike
  • evanh Posts: 15,916
    JMG,
    You don't seem to be using your board. How about I take it off your hands.
  • msrobots Posts: 3,709
    edited 2019-02-03 09:46
    I use INT 1 for RX1 and INT 2 for RX2.

    I was not able to even envision how to use an interrupt for sending, because when it fires at the time it can send and I have nothing to send, everything stops.

    So I used INT 3 in mode #1, just firing every x clocks (currently 100) and checking whether there is something to output in its buffer and whether it can actually output on the smartpin, for both TX1 and TX2.

    The rest of the COG just takes care of the mailbox and transferring data from/to buffers and HUB.

    Sadly I am running out of space and have to rethink, because I currently use the LUT as a buffer for bytes but save them as longs in the LUT, thus wasting a lot of buffer space. Currently I have four 128-byte buffers for RX1, TX1, RX2, TX2, but if I could address the LUT byte-wise I could have four 1K buffers.

    I just need to figure out some small way to replace wrlut x,y/rdlut x,y with some call to something addressing bytes in the lut. And I am at 480 longs right now …

    I do have an index from 0 to buffer-size for each buffer (currently in longs) and would like to access the LUT byte-wise. I have very little code space left, and I have already reused init code space for variables.
    ' I want to replace all wrlut's and rdlut's used right now
    'current code something like this
    
    .rx1block	cmp	rx1cmd,		#0 		wz	'need more bytes?
    	if_z 	jmp	#.done					'no - done
    '				
    		cmp	rx1_head, 	rx1_tail	wz	'byte received?
    	if_z	ret						'no - try again don't block the rest
    '
    		mov	rx_address, 	rx1_tail		'adjust to buffer start
    		add	rx_address, 	rx1_lut_buff		'by adding rx1_lut_buff
    		rdlut	rx_char, 	rx_address		'get byte from circular buffer in lut
    		incmod	rx1_tail, 	rx1_lut_btop		'increment buffer tail
    		wrbyte  rx_char, 	rx1param		'write byte to Block
    		add	rx1param, 	#1			'adjust Block address
    	_ret_	sub	rx1cmd,		#1			'adjust count - try again don't block the rest
    '
    ' now I want to use rx1_head rx1_tail,  rx1_lut_btop as bytes not longs as they are now
    '
    .rx1block	cmp	rx1cmd,		#0 		wz	'need more bytes?
    	if_z 	jmp	#.done					'no - done
    '				
    		cmp	rx1_head, 	rx1_tail	wz	'byte received?
    	if_z	ret						'no - try again don't block the rest
    '
    'new
    *		mov byte_index,       rx1_tail
    *		and byte_index,        #%11
    *		shl  byte_index,       #3			'byte number * 8 = bit shift
    		mov	rx_address, 	rx1_tail		'adjust to buffer start
    *		shr	rx_address, 	#2
    		add	rx_address, 	rx1_lut_buff	'and adding rx1_lut_buff
    		rdlut	rx_char, 	        rx_address	'get long from circular buffer in lut
    *		shr  	rx_char, 	        byte_index
    *		and  	rx_char, 	        #$FF
    'new
    		incmod	rx1_tail, 	rx1_lut_btop	'increment buffer tail
    		wrbyte  rx_char, 	rx1param		'write byte to Block
    		add	rx1param, 	#1			'adjust Block address
    	_ret_	sub	rx1cmd,		#1			'adjust count - try again don't block the rest
    

    this adds 6 instructions; can I do this shorter?
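    For reference, the byte-wise read can be modeled in a few lines of Python. This is only a sketch of the index arithmetic, not PASM: `lut` is a plain list standing in for LUT RAM and `read_packed_byte` is a hypothetical name. The long address is the byte index shifted right by 2, and the bit shift is the byte number times 8 (a left shift by 3).

```python
def read_packed_byte(lut, buff_base, tail):
    """Model of reading one byte from a buffer packed four bytes per long.

    lut       : list of 32-bit values standing in for LUT RAM
    buff_base : long index where the buffer starts (like rx1_lut_buff)
    tail      : byte index into the buffer (like rx1_tail)
    """
    long_index = buff_base + (tail >> 2)   # which long holds the byte
    bit_shift = (tail & 0b11) << 3         # byte 0..3 -> shift 0, 8, 16, 24
    return (lut[long_index] >> bit_shift) & 0xFF


lut = [0] * 4 + [0x44332211, 0x88776655]
print([hex(read_packed_byte(lut, 4, i)) for i in range(8)])
```

    Byte 0 lands in the low 8 bits of each long here (little-endian order within the long); a driver could equally pick the opposite order as long as reader and writer agree.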

    Enjoy

    Mike
  • evanh Posts: 15,916
    edited 2019-02-03 12:15
    You can free up a bunch of cogRAM by putting the code in lutRAM. Here's an example wrapper around your above code:
    '-------- Copy lut code into position --------
    		setq2	#(LUT_CODE_END - LUT_CODE_START - 1)	'copy length, in longwords
    		rdlong	0, ##@LUT_CODE_START			'the "0" is lutRAM zero, or $200 in memory map
    		jmp	#\LUT_CODE_START			'jump into the lutRAM copy
    
    
    ORG   $200                                    'longword addressing
    LUT_CODE_START
    
    ' I want to replace all wrlut's and rdlut's used right now
    'current code something like this
    
    .rx1block	cmp	rx1cmd,		#0 		wz	'need more bytes?
    	if_z 	jmp	#.done					'no - done
    '				
    		cmp	rx1_head, 	rx1_tail	wz	'byte received?
    	if_z	ret						'no - try again don't block the rest
    '
    		mov	rx_address, 	rx1_tail		'adjust to buffer start
    		add	rx_address, 	rx1_lut_buff		'by adding rx1_lut_buff
    		rdlut	rx_char, 	rx_address		'get byte from circular buffer in lut
    		incmod	rx1_tail, 	rx1_lut_btop		'increment buffer tail
    		wrbyte  rx_char, 	rx1param		'write byte to Block
    		add	rx1param, 	#1			'adjust Block address
    	_ret_	sub	rx1cmd,		#1			'adjust count - try again don't block the rest
    '
    ' now I want to use rx1_head rx1_tail,  rx1_lut_btop as bytes not longs as they are now
    '
    .rx1block	cmp	rx1cmd,		#0 		wz	'need more bytes?
    	if_z 	jmp	#.done					'no - done
    '				
    		cmp	rx1_head, 	rx1_tail	wz	'byte received?
    	if_z	ret						'no - try again don't block the rest
    '
    'new
    *		mov byte_index,       rx1_tail
    *		and byte_index,        #%11
    *		shl  byte_index,       #3			'byte number * 8 = bit shift
    		mov	rx_address, 	rx1_tail		'adjust to buffer start
    *		shr	rx_address, 	#2
    		add	rx_address, 	rx1_lut_buff	'and adding rx1_lut_buff
    		rdlut	rx_char, 	        rx_address	'get long from circular buffer in lut
    *		shr  	rx_char, 	        byte_index
    *		and  	rx_char, 	        #$FF
    'new
    		incmod	rx1_tail, 	rx1_lut_btop	'increment buffer tail
    		wrbyte  rx_char, 	rx1param		'write byte to Block
    		add	rx1param, 	#1			'adjust Block address
    	_ret_	sub	rx1cmd,		#1			'adjust count - try again don't block the rest
    
    LUT_CODE_END
    FIT   $400
    

    EDIT: Added the absolute addressing to the jump. Avoids a bug in Pnut.
  • evanh Posts: 15,916
    After that you can then move the buffers into cogRAM and use the more powerful ALTxx + GETBYTE/SETBYTE combos.
  • hmm, I already use the complete LUT as buffer for my four serial ports, so I cannot put my code there.

    My question was more whether there is a faster way to access bytes out of a long in the LUT, something like wrlut_byte(lutaddress, byte0-3).

    But GETBYTE/SETBYTE only work on COG RAM. Still, maybe faster than my current attempt, will test.

    Thanks,

    Mike
  • evanh Posts: 15,916
    What I'm saying is you'll get better performance if you swap those over. Put the buffers in cogRAM and code in lutRAM.
  • hmm - sounds wrong.

    Code execution from LUT is slower than code execution from RAM, and if I use alts/d + getbyte I can also use rdlut + getbyte, so no code space savings but slower execution?

    confused

    Mike
  • evanh Posts: 15,916
    edited 2019-02-04 03:43
    Code execution in lutRAM is full speed with no penalties. Same as cogRAM. Only limitation is self-modifying doesn't have the flexibility of cogRAM.

    RDLUT is the one that's slower. Although the biggest factor is GETBYTE can only be used upon cogRAM so any such use on data from lutRAM needs load and store operations around it.

    The ALTxx prefixing instructions provide cogRAM table/buffer indexing in a very convenient package. The extra two clocks are easily made up for by their abilities.
  • msrobots Posts: 3,709
    edited 2019-02-04 03:55
    hmm - I need to think about this.

    I do know that rdlut needs 3 clocks instead of two, but I currently use all 512 LUT longs as - guess - a Look Up Table, and am in the middle of reworking my code; I am down to 438 longs with long buffer addressing.

    I am reworking the code to find any differences between the 1 pair of RX/TX and 2 pairs of RX/TX. Found some typos, but the main issue of 1 and 2 ports failing with different errors has not lifted its head to greet me.

    I slowly think that the serial driver is OK but the echo server is too slow. But with 4 times the buffer size in the driver it should go further, and that would prove that the issue is in the echo server.

    But when I save 4 bytes per long in my buffer, not 1 byte per long, I will be able to move data faster between HUB and LUT; that will make a huge difference.

    At least this is my current working plan.

    Enjoy!

    Mike
  • jmg Posts: 15,173
    msrobots wrote: »
    I slowly think that the serial driver is OK but the echo server is too slow. But with 4 times the buffer size in the driver it should go further, and that would prove that the issue is in the echo server.

    Chip has said the Smart Pins can manage gapless send and receive, (at least up to some high baud speeds).
    It may be that echo needs asm coding, to copy incoming Rx byte to echo-Tx ?

  • evanh Posts: 15,916
    Gapless UART transmission requires monitoring the smartpin IN status, which is intended for event/IRQ generation. Using RDPIN can only tell when transmission has ceased.
  • jmg Posts: 15,173
    evanh wrote: »
    Gapless UART transmission requires monitoring the smartpin IN status, which is intended for event/IRQ generation. Using RDPIN can only tell when transmission has ceased.

    Most of the P2 Smart Pin docs are rather cryptic, but they do say this:

    "X[5] selects the update mode:

    X[5] = 0 sets continuous mode, where a first word is written via WYPIN during reset (DIR=0) to prime the shifter. Then, after reset (DIR=1), the second word is buffered via WYPIN and continuous clocking is started. Upon shifting each word, the buffered data written via WYPIN is advanced into the shifter and IN is raised, indicating that a new output word can be buffered via WYPIN. This mode allows steady data transmission with a continuous clock, as long as the WYPIN’s after each IN-rise occur before the current word transmission is complete.

    X[5] = 1 sets start-stop mode, where the current output word can always be updated via WYPIN before the first clock, flowing right through the buffer into the shifter. Any WYPIN issued after the first clock will be buffered and loaded into the shifter after the last clock of the current output word, at which time it could be changed again via WYPIN. This mode is useful for setting up the output word before a stream of clocks are issued to shift it out.

    X[4:0] sets the number of bits, minus 1. For example, a value of 7 will set the word size to 8 bits.

    WYPIN is used to load the output words. The words first go into a single-stage buffer before being advanced to the shifter for output. Each time the buffer is advanced into the shifter, IN is raised, indicating that a new output word can be written via WYPIN. During reset, the buffer flows straight into the shifter.
    "


    That does mention a separate buffer and shifter, so they should have a queue of about 1 char time, so update jitter within that window, should still give gapless transmit.
    Hence this statement "This mode allows steady data transmission with a continuous clock, as long as the WYPIN’s after each IN-rise occur before the current word transmission is complete."
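    To put a number on that window, here is a rough Python sketch. It assumes the deadline for the next WYPIN is about one word time after IN rises, and that an async frame is 10 bits (start + 8 data + stop); the quoted text describes the buffer/shifter pair but does not state this for the async mode, so treat both as assumptions.

```python
def refill_window_sysclks(sysclk_hz, baud, bits_per_word=10):
    """Approximate sysclocks available to reload the buffer per word.

    Assumes a single-stage buffer in front of the shifter, so the next
    word must be written before the current one finishes shifting out.
    """
    return bits_per_word * sysclk_hz / baud


for baud in (691_200, 90_000_000):
    print(baud, round(refill_window_sysclks(180_000_000, baud), 1))
```

    At 691200 baud the window is roughly 2604 sysclocks, which is generous for an interrupt handler. At 90 Mbaud it shrinks to 20 sysclocks per word, which may be why the driver reportedly tops out below the pin's maximum rate.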
  • msrobots Posts: 3,709
    edited 2019-02-04 05:28
    jmg wrote: »
    msrobots wrote: »
    I slowly think that the serial driver is OK but the echo server is too slow. But with 4 times the buffer size in the driver it should go further, and that would prove that the issue is in the echo server.

    Chip has said the Smart Pins can manage gapless send and receive, (at least up to some high baud speeds).
    It may be that echo needs asm coding, to copy incoming Rx byte to echo-Tx ?

    since fastspin produces pasm I was not thinking so, but I can use the serial driver directly from pasm, so that is one of the next options
    evanh wrote: »
    Gapless UART transmission requires monitoring the smartpin IN status, which is intended for event/IRQ generation. Using RDPIN can only tell when transmission has ceased.

    yes, I do use events/interrupts for reading the serial RX pins, int1 for RX1 and int2 for RX2, and that seems to work flawlessly and gapless (as long as I can keep up reading my buffer).

    but a big setback is this

    error: Third operand to setbyte must be an immediate

    same with getbyte. That is bad.

    because now I need 4 cmp and 4 getbytes/setbytes

    so
    rx1_isr		rdpin	rx1_char,	rx1_pin			'get received chr
    		shr	rx1_char,	#32-8			'shift to lsb justify
    		mov	rx1_byte_index, rx1_head
    		and	rx1_byte_index, #%11
    		mov	rx1_address,	rx1_head		'adjust to buffer start
    		shr	rx1_address,	#2
    		add	rx1_address,	rx1_lut_buff 		'by adding rx1_lut_buff
    		rdlut	rx1_lut_value,	rx1_address
    		setbyte rx1_lut_value,	rx1_char, rx1_byte_index
    		wrlut	rx1_lut_value,	rx1_address		'write byte to circular buffer in lut
    		incmod	rx1_head, 	rx1_lut_btop		'increment buffer head
    		cmp	rx1_head, 	rx1_tail 	wz	'hitting tail is bad
    	if_z	incmod	rx1_tail, 	rx1_lut_btop		'increment tail  - I am losing received chars at the end of the buffer because the buffer is full
    		reti1						'exit
    

    does not compile. Will need to do
    rx1_isr		rdpin	rx1_char,	rx1_pin			'get received chr
    		shr	rx1_char,	#32-8			'shift to lsb justify
    		mov	rx1_byte_index, rx1_head
    		and	rx1_byte_index, #%11
    		mov	rx1_address,	rx1_head		'adjust to buffer start
    		shr	rx1_address,	#2
    		add	rx1_address,	rx1_lut_buff 		'by adding rx1_lut_buff
    		rdlut	rx1_lut_value,	rx1_address
    		cmp	rx1_byte_index,	#0		wz
    	if_z	setbyte rx1_lut_value,	rx1_char, #0
    		cmp	rx1_byte_index,	#1		wz
    	if_z	setbyte rx1_lut_value,	rx1_char, #1
    		cmp	rx1_byte_index,	#2		wz
    	if_z	setbyte rx1_lut_value,	rx1_char, #2
    		cmp	rx1_byte_index,	#3		wz
    	if_z	setbyte rx1_lut_value,	rx1_char, #3
    		wrlut	rx1_lut_value,	rx1_address		'write byte to circular buffer in lut
    		incmod	rx1_head, 	rx1_lut_btop		'increment buffer head
    		cmp	rx1_head, 	rx1_tail 	wz	'hitting tail is bad
    	if_z	incmod	rx1_tail, 	rx1_lut_btop		'increment tail  - I am losing received chars at the end of the buffer because the buffer is full
    		reti1						'exit
    

    instead?

    well I am getting there, just running out of longs...

    maybe I can use altd/s/I to shorten that up

    Mike
  • evanh Posts: 15,916
    ALTGB/ALTSB solves all.
  • evanh wrote: »
    ALTGB/ALTSB solves all.

    I am not following you, wtf is ALTGB/ALTSB?

    Mike
  • Electrodude Posts: 1,657
    edited 2019-02-04 20:17
    See the Parallax Propeller 2 Instructions v32 spreadsheet starting at row 103.

    The ALTGB and ALTSB instructions allow you to override the fixed third argument of GETBYTE and SETBYTE instructions. There are other similar instructions to override fixed fields of other instructions too.

    EDIT: Those instructions override both the D and N fields, allowing you to access all of cogram as a word, byte, or nibble array with only two instructions per access.
  • msrobots Posts: 3,709
    edited 2019-02-05 02:55
    ohh, good, I missed that link and the google doc I know of does not describe the instructions, so I am flying blind, mostly.

    Thank you @Electrodude,

    Mike

    .
  • OK,
    I read the docs but I am doing something wrong
    rx2_isr		rdpin	rx2_char,	rx2_pin			'get received chr
    		shr	rx2_char,	#32-8			'shift to lsb justify
    		mov	rx2_byte_index, rx2_head
    		and	rx2_byte_index, #%11
    		mov	rx2_address,	rx2_head		'adjust to buffer start
    		shr	rx2_address,	#2
    		add	rx2_address,	rx2_lut_buff 		'by adding rx2_lut_buff
    		rdlut	rx2_lut_value,	rx2_address
    
    '		neg	rx2_byte_index
    '		add	rx2_byte_index,	#4
    '		add	rx2_byte_index,	#rx2_lut_value<<2
    '		altsb	rx2_byte_index
    '		setbyte 0-0,		rx2_char, #0-0
    
    		cmp	rx2_byte_index,	#0		wz
    	if_z	setbyte rx2_lut_value,	rx2_char, #3
    		cmp	rx2_byte_index,	#1		wz
    	if_z	setbyte rx2_lut_value,	rx2_char, #2
    		cmp	rx2_byte_index,	#2		wz
    	if_z	setbyte rx2_lut_value,	rx2_char, #1
    		cmp	rx2_byte_index,	#3		wz
    	if_z	setbyte rx2_lut_value,	rx2_char, #0
    '
    		wrlut	rx2_lut_value,	rx2_address		'write byte to circular buffer in lut
    		incmod	rx2_head, 	rx2_lut_btop		'increment buffer head
    		cmp	rx2_head, 	rx2_tail 	wz	'hitting tail is bad
    	if_z	incmod	rx2_tail, 	rx2_lut_btop		'increment tail  - I am losing received chars at the end of the buffer because the buffer is full
    		reti2						'exit
    

    I want to replace the 8 lines following the commented-out altsb block to save 3 longs, but it does not work. What am I doing wrong with altsb and setbyte?

    unsure,

    Mike
  • Cluso99 Posts: 18,069
    Didn’t look properly, but you require an ALTSB before each SETBYTE instruction. The ALT and AUG only apply to the following instruction.
  • jmg Posts: 15,173
    msrobots wrote: »
    I want to replace the 8 lines following the commented-out altsb block to save 3 longs, but it does not work. What am I doing wrong with altsb and setbyte?
    There is code in the ROM_Booter source that shuffles bytes into a long, for the checksum, so you could check that ?

    You could also look at
    RCZR D {WC/WZ/WCZ} Rotate C,Z right through D. D = {C, Z, D[31:2]}. C = D[1], Z = D[0].
    Not sure if there is any non-destructive version of that ?
    Which gets 2 bits into CZ, you can test for 4 packed statements.

    Or, maybe this pair can be even faster ?
    DECOD D,{#}S Decode S[4:0] into D. D = 1 << S[4:0].
    and
    SKIPF   {#}D Skip cog/LUT instructions fast per D. Like SKIP, but instead of cancelling instructions, the PC leaps over them.

  • Cluso99 Posts: 18,069
    Looking closer, i am unsure what is not working.

    Are those 8 lines working and you need to find a 5 instruction replacement for them?
  • evanh Posts: 15,916
    Mike,
    I'm not sure why the RDLUT code is there but here's all I think you need in there:
    rx2_isr
    		rdpin	rx2_char,	rx2_pin			'get received chr
    		shr	rx2_char,	#32-8			'shift to lsb justify
    		altsb	rx2_head, #rx2_buffer
    		setbyte	rx2_char
    		incmod	rx2_head, 	rx2_lut_btop		'increment buffer head
    		cmp	rx2_head, 	rx2_tail 	wz	'hitting tail is bad
    	if_z	incmod	rx2_tail, 	rx2_lut_btop		'increment tail  - I am losing received chars at the end of the buffer because the buffer is full
    		reti2						'exit
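    What that ISR does to the buffer can be modeled in Python as a sanity check. This is a sketch under assumptions, not a description of the hardware: it takes ALTSB as selecting register base + (index >> 2) and byte number index & 3, and INCMOD as wrapping to 0 when the index equals the top value. `ByteRing` and `setbyte` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def setbyte(value, byte, n):
    """Replace byte n (0..3) of a 32-bit value, like SETBYTE."""
    shift = n * 8
    return (value & ~(0xFF << shift) & MASK32) | ((byte & 0xFF) << shift)


class ByteRing:
    """Circular byte buffer packed four bytes per 32-bit register."""

    def __init__(self, size_bytes):
        self.longs = [0] * ((size_bytes + 3) // 4)   # stands in for cogRAM
        self.top = size_bytes - 1                    # like rx2_lut_btop
        self.head = 0
        self.tail = 0

    def _incmod(self, index):
        # Assumed INCMOD behaviour: wrap to 0 once the top index is reached.
        return 0 if index == self.top else index + 1

    def push(self, byte):
        reg = self.head >> 2        # register offset from the buffer base
        n = self.head & 3           # byte number within that register
        self.longs[reg] = setbyte(self.longs[reg], byte, n)
        self.head = self._incmod(self.head)
        if self.head == self.tail:  # buffer full: drop the oldest byte
            self.tail = self._incmod(self.tail)


ring = ByteRing(8)
for b in (0x11, 0x22, 0x33, 0x44, 0x55):
    ring.push(b)
print([hex(v) for v in ring.longs], ring.head, ring.tail)
```

    The overflow branch mirrors the `if_z incmod rx2_tail` line: when the head catches up with the tail, the oldest unread byte is discarded rather than blocking the ISR.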
    
    
  • Be aware that ALTxx is broken on P2-ES.
    IIRC sign extension (negative deltas)?
  • Hi ozpropdev. Do we know which particular ALTxx instructions are broken? I think we might have been using some of them for HDMI bitbang, though there are a few variants.
  • evanh Posts: 15,916
    Only going to affect negative indexing from the base. That's not a very common action.