cogserial - fullduplex smart serial using interrupt

evanh · 2019-02-08 22:19

Huh, I hadn't noticed the differences either. I've been happily using an ALTxx + SETNIB/GETNIB combo without problem to do decimal strings. Fastspin hasn't given me any grief that I know of so I assume that Fastspin's error is unique to SETBYTE.

SETNIB  {#}S	Set S[3:0] into nibble established by prior ALTSN instruction.
GETNIB  D	Get nibble established by prior ALTGN instruction into D.

Amusingly, I've been using Pnut to produce my binaries for the last week due to having destroyed my P2ES chip and then updated the FPGA to v33i.

EDIT: Oh, no, I was using the full version of SETNIB until recently. I hadn't realised I'd changed it back to the alias since destroying the P2ES chip. I didn't have any comments as to why ... and now I remember, it was because, when I changed from Pnut to p2asm, p2asm would throw an error on that alias. But I'd long replaced p2asm with fastspin.

msrobots · 2019-02-09 07:34

aah, that makes sense.

And it is good to find a bug, that is why we are testing all of this.

Sadly I somehow messed up my program and just one of the two pairs of RX and TX wants to work. I am staring at my code to no avail. It was working before, I just broke it shaving longs away to make space for string input.

This is extremely frustrating and I am at the point to start all over again. The sad thing is that I am right now even unsure if I broke the PASM driver, the spin object or my test routines.

The only thing I am sure about is that it was working before, is not working now and stupid me broke it without keeping copies of the working code.

Except what I posted here about a week ago...

Mike

evanh · 2019-02-09 07:54

msrobots wrote: »

The only thing I am sure about is that it was working before, is not working now and stupid me broke it without keeping copies of the working code.

Ya, done that too many times myself. Saved myself a number of times with editor undo's. When I remember, I often have hunks of disabled duplicated code when working on an alternative. Or I might duplicate the sources completely for the alternative approach.

I presume that's one big reason why source control systems were devised.

msrobots · 2019-02-09 08:06

evanh wrote: »

Mike,
What is the content of those bytes? Doing endian reversal without a good reason is just making extra work.
...

Give this a try anyway:

rx1_isr		rdpin	rx1_char,	rx1_pin			'get received chr
		shr	rx1_char,	#32-8			'shift to lsb justify

		mov	rx1_address,	rx1_head		'adjust to buffer start
		shr	rx1_address,	#2
		add	rx1_address,	rx1_lut_buff 		'by adding rx1_lut_buff
		rdlut	rx1_lut_value,	rx1_address

		mov	rx1_byte_index,	rx1_head
		and	rx1_byte_index,	#%11			'now 0 to 3
		xor	rx1_byte_index,	#%11			'now 3 to 0

		altsb 	rx1_byte_index,	#rx1_lut_value
		setbyte	rx1_char

		wrlut	rx1_lut_value,	rx1_address		'write byte to circular buffer in lut
		incmod	rx1_head, 	rx1_lut_btop		'increment buffer head
		cmp	rx1_head, 	rx1_tail 	wz	'hitting tail is bad
	if_z	incmod	rx1_tail, 	rx1_lut_btop		'increment tail  - I am losing received chars at the end of the buffer because the buffer is full
		reti1						'exit

Currently I am not even sure about the endian reversal. On the P1 a long in HUB ram has a different byte order than in COG ram; I am not sure about the P2.

My plan was to write bytes into my LUT buffers in the RX interrupt, write bytes from my LUT buffers in the TX interrupt, but when transferring LUT to HUB and HUB to LUT being able to write longs after adjusting to the first complete LUT long by writing/reading bytes. Maybe even use the streamer. Or rdfast or whatever.

I never got that far because of running out of space and then breaking it.

My thinking was that when using getbyte/setbyte I am able to reverse the order and as soon as I write/read the first long I will find out if the reversal is needed or not. And if not shave off the reversal and save two more longs.
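One way to settle the reversal question before the first long transfer is to model the packing off-chip. The sketch below is Python, not PASM2, and rests on one assumption that is not confirmed in this thread: that P2 hub RAM is little-endian, so a long write puts bits 7..0 at the lowest hub address. The helper names `setbyte` and `pack` are made up for the model.

```python
# Model only: packs four received bytes into one 32-bit long the way
# SETBYTE would (byte lane n occupies bits 8*n+7 .. 8*n), then shows the
# byte order a long write would leave in hub RAM.
# ASSUMPTION: hub RAM is little-endian (lowest address holds bits 7..0).

def setbyte(value, byte, n):
    """Return value with byte lane n (0..3) replaced by byte."""
    shift = 8 * n
    return (value & ~(0xFF << shift) & 0xFFFFFFFF) | ((byte & 0xFF) << shift)

def pack(chars, reverse):
    """Pack up to four bytes, lane chosen from the byte index like the ISR."""
    long_value = 0
    for head, ch in enumerate(chars):
        lane = head & 3
        if reverse:
            lane ^= 3              # the XOR #%11 step from the ISR
        long_value = setbyte(long_value, ch, lane)
    return long_value

received = b"ABCD"
plain = pack(received, reverse=False)
swapped = pack(received, reverse=True)

# Bytes as a long write would leave them in hub, lowest address first
hub_plain = plain.to_bytes(4, "little")
hub_swapped = swapped.to_bytes(4, "little")
print(hex(plain), hub_plain)
print(hex(swapped), hub_swapped)
```

Under that assumption the un-reversed packing is the one that reaches hub in arrival order, which would make the XOR step unnecessary. A real long transfer on hardware is still the deciding test.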

I am very sure that IF I find my stupid mistake I made, it will be the classical DOH moment.

Enjoy!

Mike

msrobots · 2019-02-09 08:16

evanh wrote: »

msrobots wrote: »

The only thing I am sure about is that it was working before, is not working now and stupid me broke it without keeping copies of the working code.

Ya, done that too many times myself. Saved myself a number of times with editor undo's. When I remember, I often have hunks of disabled duplicated code when working on an alternative. Or I might duplicate the sources completely for the alternative approach.

I presume that's one big reason why source control systems were devised.

Yes on all points, but undo's are broken in Spin2Gui, at least at my current version under windows, and since at work I am bound to use subversion and not allowed to use GIT because of company policy, I never bothered to use GIT. My private stuff is also not allowed on the company servers, and gosh who bothers about source control for files with just 500 lines of code.

I am a COBOL programmer, source files are supposed to be HUGE, with 500 lines you do not even get a DB access running in COBOL.

Hmm - I do not find any other excuse for not using source control, maybe I should look into GIT.

Enjoy!

Mike

msrobots · 2019-02-09 08:41

AJL wrote: »
msrobots wrote: »
Thanks for all your input, but I am still stuck

I tried this
rx1_isr		rdpin	rx1_char,	rx1_pin			'get received chr
		shr	rx1_char,	#32-8			'shift to lsb justify
		mov	rx1_byte_index, rx1_head
		and	rx1_byte_index, #%11			'now 0 to 3
		mov	rx1_address,	rx1_head		'adjust to buffer start
		shr	rx1_address,	#2
		add	rx1_address,	rx1_lut_buff 		'by adding rx1_lut_buff
		rdlut	rx1_lut_value,	rx1_address
		
		neg	rx1_byte_index				' now 0 to -3
		add	rx1_byte_index,	#3			' now 3 to 0
		altsb 	rx1_byte_index,	#rx1_lut_value
  		setbyte rx1_char

'		cmp	rx1_byte_index,	#0		wz
'	if_z	setbyte rx1_lut_value,	rx1_char, #3
'		cmp	rx1_byte_index,	#1		wz
'	if_z	setbyte rx1_lut_value,	rx1_char, #2
'		cmp	rx1_byte_index,	#2		wz
'	if_z	setbyte rx1_lut_value,	rx1_char, #1
'		cmp	rx1_byte_index,	#3		wz
'	if_z	setbyte rx1_lut_value,	rx1_char, #0

		wrlut	rx1_lut_value,	rx1_address		'write byte to circular buffer in lut
		incmod	rx1_head, 	rx1_lut_btop		'increment buffer head
		cmp	rx1_head, 	rx1_tail 	wz	'hitting tail is bad
	if_z	incmod	rx1_tail, 	rx1_lut_btop		'increment tail  - I am losing received chars at the end of the buffer because the buffer is full
		reti1						'exit
but it does not work. And I really need these 4 longs each per pair...

I am losing faith in being a worthy programmer,

Mike
It's been suggested before, but I'll mention it again: Have you tried moving your code to LUT RAM and placing your buffers in COG RAM?

It seems when this has been mentioned previously you have stated that you can't because LUT RAM is full: of buffers.

But if the buffers are moved to COG RAM, you have that space for code, and you'll be able to pack your bytes into longs with ALTSB in COG RAM buffers.

Please correct me if I'm off base here.

Well yes @AJL I think @evanh mentioned that, but I am not sure why this would help. Maybe you can elaborate. My point of view here is that I have 512 longs of LUT ram that would nicely fit 4 512 byte buffers for RX1/TX1/RX2/TX2.

Cog ram is not 512 longs in my understanding, because of special registers at the end of COG ram or is that different on the P2 vs the P1? I did ask that question before and found no answer yet.

Since I am considering to rewrite this completely if I can't find the stupid mistake I made I am really interested about why two people now recommend to use LUT ram for code and COG ram as LUT/buffer.

I am sometimes quite slow to understand things, so please bear with me and explain further. I seem to miss some point of the argument why I should try this.

Sure I can copy my code from COG to LUT and run it there, and reuse the COG space as buffer, but why should I?

I currently reuse all initialization code space for register variables. To speed things up I pre-calculate pointers and have them ready to use. That is about 150 registers, ready to use because they are in COG ram.

If I have the code in LUT and my buffers in COG how to handle those variables I need to do rdbyte/rdlong buffer positions/sizes whatever.

Keeping them in COG ram would reduce the available buffer size, having them in LUT ram and accessing with rdlut wrlut seems impossible to me.

confused,

Mike

msrobots · 2019-02-09 09:04

@ersmith,

I have a stupid question.

In spin on the P1 I can do something like this and it seems to work with fastspin too, but maybe not and that is my problem.

'
' buffered smart pin serial object for P2 Eval board, buffering rx/tx in the Cog, supporting 2 full-duplex connections
'
CON
  _txmode       = %0000_0000_000_0000000000000_01_11110_0 	'async tx mode, output enabled for smart output
  _rxmode       = %0000_0000_000_0000000000000_00_11111_0 	'async rx mode, input  enabled for smart input

OBJ
  serpasm:	"cogserialpasm.spin2"				'this is the PASM2 COG doing all the work
'
'-----------------------------------------------------------------------
'
VAR
  long rx1_cmd, rx1_param, tx1_cmd, tx1_param, rx2_cmd, rx2_param, tx2_cmd, tx2_param	'mailbox of this instance 8 longs

DAT
outchar byte "H",0,0,0
'
'-----------------------------------------------------------------------
'stop pasm cog if already running
'-----------------------------------------------------------------------
'
PUB stop
  serpasm.stop
'
'-----------------------------------------------------------------------
'return parameter address of instance
'-----------------------------------------------------------------------
'
PUB mailboxaddress
RETURN @rx1_cmd
'
'-----------------------------------------------------------------------
'use this to start a 1 port driver with 255 bytes buffer each channel
'-----------------------------------------------------------------------
'
PUB start(rxpin = 63, txpin = 62, mode = -1, baudrate = 230_400)  | bitrate
  bitrate := 7 + ((CLKFREQ / baudrate) << 16)
RETURN startpasm(@rx1_cmd, rxpin, bitrate, _rxmode, 0, $FF, txpin, bitrate, _txmode, $100, $FF, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 100)
'
'-----------------------------------------------------------------------
'use this to start a 2 port driver with 128 bytes buffer each channel
'-----------------------------------------------------------------------
'
PUB start2(rxpin1, txpin1, mode1, baudrate1, rxpin2, txpin2, mode2, baudrate2) | bitrate1, bitrate2
  bitrate1 := 7 + ((CLKFREQ / baudrate1) << 16)
  bitrate2 := 7 + ((CLKFREQ / baudrate2) << 16)
RETURN startpasm(@rx1_cmd, rxpin1, bitrate1, _rxmode, 0, $7F, txpin1, bitrate1, _txmode, $80, $7F, rxpin2, bitrate2, _rxmode, $100, $7F, txpin2, bitrate2, _txmode, $180, $7F, 100)
'
'-----------------------------------------------------------------------
'use this to start whatever combination you want
'-----------------------------------------------------------------------
'
PUB startExt(rxpin1, txpin1, rxbaudrate1, txbaudrate1, rxmode1, txmode1, rxlutstart1, rxlutsize1, txlutstart1, txlutsize1, rxpin2, txpin2, rxbaudrate2, txbaudrate2, rxmode2, txmode2, rxlutstart2, rxlutsize2, txlutstart2, txlutsize2, txclocks)  | rxbitrate1, txbitrate1, rxbitrate2, txbitrate2 
  rxbitrate1 := 7 + ((CLKFREQ / rxbaudrate1) << 16)
  txbitrate1 := 7 + ((CLKFREQ / txbaudrate1) << 16)
  rxbitrate2 := 7 + ((CLKFREQ / rxbaudrate2) << 16)
  txbitrate2 := 7 + ((CLKFREQ / txbaudrate2) << 16)
  RETURN startpasm(@rx1_cmd, rxpin1, rxbitrate1, rxmode1, rxlutstart1, rxlutsize1, txpin1, txbitrate1, txmode1, txlutstart1, txlutsize1, rxpin2, rxbitrate2, rxmode2, rxlutstart2, rxlutsize2, txpin2, txbitrate2, txmode2, txlutstart2, txlutsize2, txclocks)
'
'-----------------------------------------------------------------------
'this provides the parameter array needed to start the pasm cog on the stack
'-----------------------------------------------------------------------
'
PRI startpasm(mailboxaddress, rxpin1, rxbitrate1, rxmode1, rxlutstart1, rxlutsize1, txpin1, txbitrate1, txmode1, txlutstart1, txlutsize1, rxpin2, rxbitrate2, rxmode2, rxlutstart2, rxlutsize2, txpin2, txbitrate2, txmode2, txlutstart2, txlutsize2, txclocks)
RETURN serpasm.start(@mailboxaddress)

So I am using the parameters of the PRI method startpasm as a contiguous block of 22 longs, addressed through the first parameter, to provide the start parameter block to my PASM driver.
On startup the driver reads in those 22 longs, then sets a sync value so that startpasm returns only after the COG has started and has read its parameters.
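The `7 + ((CLKFREQ / baudrate) << 16)` expression used in the start methods above can be checked off-chip. The Python sketch below only mirrors that arithmetic. The reading of the fields (bits 31..16 as clocks per bit, the low bits as data bits minus one) is my understanding of the smart pin async modes, and the 160 MHz clock is an assumed example value, not something stated in the thread. `async_x_value` is a made-up name.

```python
def async_x_value(clkfreq, baudrate, data_bits=8):
    """Mirror of the Spin expression (data_bits - 1) + ((CLKFREQ / baudrate) << 16)."""
    clocks_per_bit = clkfreq // baudrate      # Spin integer division truncates
    return (data_bits - 1) + (clocks_per_bit << 16)

# ASSUMPTION: 160 MHz system clock, purely for illustration
x = async_x_value(160_000_000, 230_400)
print(hex(x), x >> 16, x & 0xFFFF)
```

Because the division truncates, the fractional part of the bit period is lost, which matters more as the baud rate approaches the system clock.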

'=======================================================================
'
' buffered smart pin serial object for P2 Eval board, buffering rx/tx in the Cog, supporting 2 full-duplex connections
'
'=======================================================================
VAR
  long cog								'cog id of this instance
'
'-----------------------------------------------------------------------
'stop cog if already running
'-----------------------------------------------------------------------
'
PUB stop
  if cog
    cogstop(cog-1)
    cog := 0
'
'-----------------------------------------------------------------------
'start COG with startparameter block address
'-----------------------------------------------------------------------
'
PUB start(startparameteraddress)									'the first long in the startparameter block contains the address of the later used
  stop													'Mailbox. So now I use long[long[]] to access the first long in the later used mailbox as Flag
  long[long[startparameteraddress]] := 0								'set flag (0) to know if the started cog has read its parameters
  cog := cognew(@cogserial_init,startparameteraddress) + 1
  if cog												'if I was able to start the COG
    repeat until long[long[startparameteraddress]] == -1						'I wait until Flag states cog is done reading parameter and ready to roll  (-1)
    RESULT := 1												'now start done

the first parameter of the start parameter block contains the address of the mailbox to use, thus the long[long[xx]]

can I assume that the parameter of my function

PRI startpasm(mailboxaddress, rxpin1, rxbitrate1, rxmode1, rxlutstart1, rxlutsize1, txpin1, txbitrate1, txmode1, txlutstart1, txlutsize1, rxpin2, rxbitrate2, rxmode2, rxlutstart2, rxlutsize2, txpin2, txbitrate2, txmode2, txlutstart2, txlutsize2, txclocks)
RETURN serpasm.start(@mailboxaddress)

are 22 contiguous longs in memory, addressable as @mailboxaddress like they would be on a P1, or is it possible that those parameters are not in a contiguous block in memory?

pulling at straws here...

Mike

jmg · 2019-02-09 09:20

msrobots wrote: »

... I am really interested about why two people now recommend to use LUT ram for code and COG ram as LUT/buffer.

I am sometimes quite slow to understand things, so please bear with me and explain further. I seem to miss some point of the argument why I should try this.

Sure I can copy my code from COG to LUT and run it there, and reuse the COG space as buffer, but why should I?

I think the idea behind doing that, is these opcodes can work accessing into COG, and not LUT

106	.	Register Indirection	EEEE 1001011 00I DDDDDDDDD SSSSSSSSS	ALTSB   D,{#}S	Alter subsequent SETBYTE instruction. Next D field = (D[10:2] + S) & $1FF, N field = D[1:0]. D += sign-extended S[17:9].	2	same	2	same	D
107	alias	Register Indirection	EEEE 1001011 001 DDDDDDDDD 000000000	ALTSB   D	Alter subsequent SETBYTE instruction. Next D field = D[10:2], N field = D[1:0].	2	same	2	same	D
108	.	Register Indirection	EEEE 1001011 01I DDDDDDDDD SSSSSSSSS	ALTGB   D,{#}S	Alter subsequent GETBYTE/ROLBYTE instruction. Next S field = (D[10:2] + S) & $1FF, N field = D[1:0]. D += sign-extended S[17:9].	2	same	2	same	D
109	alias	Register Indirection	EEEE 1001011 011 DDDDDDDDD 000000000	ALTGB   D	Alter subsequent GETBYTE/ROLBYTE instruction. Next S field = D[10:2], N field = D[1:0].	2	same	2	same	D
90	.	Math and Logic	EEEE 1000110 NNI DDDDDDDDD SSSSSSSSS	SETBYTE D,{#}S,#N	Set S[7:0] into byte N in D, keeping rest of D same.	2	same	2	same	D
91	alias	Math and Logic	EEEE 1000110 00I 000000000 SSSSSSSSS	SETBYTE {#}S	Set S[7:0] into byte established by prior ALTSB instruction.	2	same	2	same	D
92	.	Math and Logic	EEEE 1000111 NNI DDDDDDDDD SSSSSSSSS	GETBYTE D,{#}S,#N	Get byte N of S into D. D = {24'b0, S.BYTE[N]}.	2	same	2	same	D
93	alias	Math and Logic	EEEE 1000111 000 DDDDDDDDD 000000000	GETBYTE D	Get byte established by prior ALTGB instruction into D.	2	same	2	same	D

ie you can create a byte pointer.

msrobots · 2019-02-09 09:38

sure, I get that part.

But I can not keep variables in LUT space, how to handle that?

and with a working setbyte /altsb I seem to be able to byte-address my LUT buffer with two instructions too. What am I missing?

Mike

jmg · 2019-02-09 10:07

msrobots wrote: »

sure, I get that part.

But I can not keep variables in LUT space, how to handle that?

and with a working setbyte /altsb I seem to be able to byte-address my LUT buffer with two instructions too. What am I missing?

You may be missing that the opcode fields are only 9 bits, and so have a 512L reach : that covers all of COG, but does not reach into LUT
If all opcodes could reach COG and LUT equally, there would be no need to call it LUT

Variables stay where opcodes best access them, in COG and code can go where it only needs to execute, in LUT.

Some Assemblers allow the idea of CSEG and DSEG and they place code and data where they best fit.
Maybe P2 needs something similar, so you can just write, and then the assembler splits as needed, checking for anything that may be illegal.

Other approaches to this would be to code your buffers as byte only, and not pack them. Yes, that's wasteful, but it is also fast and simple - and allows easier testing.
Later, you can tune the buffers to be less wasteful and store more bytes, if you find you need to.
Faster interrupts do not need the buffers to be as large.

msrobots · 2019-02-09 10:40

Yes, good points, I was thinking along the same lines but came to a different answer.

Since I do need COG ram for my variables, my buffers need to be in the LUT or the buffers have to be significantly smaller.

And on my first (working) example I do not pack the bytes in the LUT and waste 3 of four bytes in a long.

And yes, you are right the interrupts fill the buffer quite fast, the best working version I had worked up to sysclock baud on RX and TX.

In my current approach RX1 uses int1 RX2 uses int2 and I have int3 just running every x sysclock, checking if it can transfer on TX1 or TX2.

The main COG just checks the mailboxes and transfers data from buffer to HUB and vice versa.

That worked up to sysclock baud on all four lines talking to themselves on smartpins using one COG.

Getting the bytes into the HUB or from the HUB was and is the bottleneck. When I save my bytes in longs in LUT I can just read a long and write one byte into HUB.

When I on the other hand can pack 4 bytes in the right order in my LUT buffer long, I need just to transfer single bytes to/from HUB until I am at the start of a new LUT long and then can start moving longs to/from HUB until I need to step down to bytes on the last long. I think I might even be able to use rdfast/wrfast, when transferring big enough data.

This is about optimizing the speed when transferring blocks of data, instead of single bytes.

For a simple ser.TX(65) it does not really make a difference, but for ser.str() or ser.writeblock(…) it will make a huge difference whether writing, say, 30 bytes takes 30 rdbytes/wrbytes to HUB, or only a few rdbytes/wrbytes at the unaligned ends (at most 3 at each end) plus 6 or 7 rdlongs/wrlongs.
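The split into leading bytes, whole longs and trailing bytes described above can be worked out with a few lines of arithmetic. This is a Python sketch of the bookkeeping only, with a made-up helper name `transfer_plan`. It assumes the buffer index counts bytes and that four consecutive byte indices share one LUT long.

```python
def transfer_plan(start_index, count):
    """Split `count` buffer bytes starting at byte index `start_index`
    into (leading single bytes, whole longs, trailing single bytes)."""
    lead = min((-start_index) & 3, count)   # bytes needed to reach a long boundary
    longs, tail = divmod(count - lead, 4)
    return lead, longs, tail

# 30 bytes, starting at each possible offset within a LUT long
for offset in range(4):
    lead, longs, tail = transfer_plan(offset, 30)
    print(f"offset {offset}: {lead} bytes, {longs} longs, {tail} bytes")
```

For 30 bytes this gives between 2 and 6 single-byte transfers plus 6 or 7 long transfers, depending on where the run starts, instead of 30 single-byte transfers.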

But to do so I need 4 bytes not one in my LUT, and then I started packing bytes and broke everything. Well not everything RX1/TX1 are working RX2/TX2 are broken.

Mike

evanh · 2019-02-09 11:07

I presumed, since you were quadrupling the buffer density, that you could easily get away with just doubling the number of entries while halving its total size and gaining benefits all round. Including more available space for code.

msrobots · 2019-02-09 12:06

evanh wrote: »

I presumed, since you were quadrupling the buffer density, that you could easily get away with just doubling the number of entries while halving its total size and gaining benefits all round. Including more available space for code.

huh?, No I am not just thinking about quadrupling the size but faster access between HUB and LUT thru reading and writing longs instead of bytes. But I still need way more code space than I have to do so.

Refactoring and refactoring, shaving off longs and hoping to stumble upon why RX2 misbehaves, by using the same code for both channels.

I can at least report some success, thanks to all of you

This does work, and replaces my 8 lines of code with 3 of them.

rx1_isr		rdpin	rx1_char,	rx1_pin			'get received chr
		shr	rx1_char,	#32-8			'shift to lsb justify
		mov	rx1_byte_index, rx1_head
		and	rx1_byte_index, #%11			'now 0 to 3
		mov	rx1_address,	rx1_head		'adjust to buffer start
		shr	rx1_address,	#2
		add	rx1_address,	rx1_lut_buff 		'by adding rx1_lut_buff
		rdlut	rx1_lut_value,	rx1_address

		xor	rx1_byte_index,#3
		altsb 	rx1_byte_index,	#rx1_lut_value
  		setbyte 0,		rx1_char, #0
'
'		cmp	rx1_byte_index,	#0		wz
'	if_z	setbyte rx1_lut_value,	rx1_char, #3
'		cmp	rx1_byte_index,	#1		wz
'	if_z	setbyte rx1_lut_value,	rx1_char, #2
'		cmp	rx1_byte_index,	#2		wz
'	if_z	setbyte rx1_lut_value,	rx1_char, #1
'		cmp	rx1_byte_index,	#3		wz
'	if_z	setbyte rx1_lut_value,	rx1_char, #0

		wrlut	rx1_lut_value,	rx1_address		'write byte to circular buffer in lut
		incmod	rx1_head, 	rx1_lut_btop		'increment buffer head
		cmp	rx1_head, 	rx1_tail 	wz	'hitting tail is bad
	if_z	incmod	rx1_tail, 	rx1_lut_btop		'increment tail  - I am losing received chars at the end of the buffer because the buffer is full
		reti1						'exit

now I need to do the same with getbyte to shave off some more longs and my first attempt there failed also, will report soon,
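For testing the buffer logic away from the chip, the receive path and its GETBYTE counterpart can be modelled in a few lines. This is a Python sketch under stated assumptions, not the driver: the class and method names are invented, the LUT base address (`rx1_lut_buff`) and the XOR byte reversal are left out, and INCMOD is taken to wrap to 0 when the value equals the top index.

```python
class PackedRing:
    """Model of one receive buffer: byte indices head/tail, storage in longs."""

    def __init__(self, btop):
        self.btop = btop                       # highest byte index, as given to INCMOD
        self.lut = [0] * ((btop + 4) // 4)     # longs needed for btop + 1 bytes
        self.head = 0
        self.tail = 0

    def _incmod(self, value):
        # Assumed INCMOD D,S behaviour: wrap to 0 when D == S, else D + 1
        return 0 if value == self.btop else value + 1

    def put(self, char):
        """What the rx ISR does: read-modify-write one long of the buffer."""
        addr, lane = self.head >> 2, self.head & 3
        shift = 8 * lane
        kept = self.lut[addr] & ~(0xFF << shift) & 0xFFFFFFFF
        self.lut[addr] = kept | ((char & 0xFF) << shift)
        self.head = self._incmod(self.head)
        if self.head == self.tail:             # buffer full: the oldest byte is lost
            self.tail = self._incmod(self.tail)

    def get(self):
        """GETBYTE counterpart; returns None when the buffer is empty."""
        if self.head == self.tail:
            return None
        addr, lane = self.tail >> 2, self.tail & 3
        char = (self.lut[addr] >> (8 * lane)) & 0xFF
        self.tail = self._incmod(self.tail)
        return char

ring = PackedRing(btop=7)                      # 8 byte slots in 2 longs
for value in range(10):                        # overfill on purpose
    ring.put(value)
drained = []
while (c := ring.get()) is not None:
    drained.append(c)
print(drained)
```

Overfilling the 8-slot model keeps only the newest 7 bytes, which matches the "losing received chars" comment on the tail increment in the ISR.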

Mike

msrobots · 2019-02-09 12:26

ersmith wrote: »
evanh wrote: »

Tony,
I think Mike is trying to set the content of rx2_lut_value itself. Without the # there, ALTSB would use the value stored in rx2_lut_value as a cogRAM address.

Yes, sorry, I misunderstood what Mike was trying to do. Evanh has the right idea, I think.

To summarize:

(1) If you're trying to set byte N of register X, do
  altsb N, #X
  setbyte value
(2) If P is a pointer with a COG address in it, and you want to modify byte N of the register P points to, then do:
  altsb N, P
  setbyte value
Note that P is a COG address, so it does not (necessarily) have bits 0..1 as 0; that is, we could write example (1) above as:
  mov tmp, #X
  altsb N, tmp
  setbyte value
(3) If P is a pointer with a byte address in it (that is, a COG address * 4 + a byte offset within the COG register) then just do:
   altsb P, #0
   setbyte value
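The three forms in the summary above can be compared with a small model of the field arithmetic. This is a Python sketch built from the ALTSB description "next D field = (D[10:2] + S) & $1FF, N field = D[1:0]". It ignores the optional D increment from S[17:9], and the register address `X`, the helper name and the byte values are hypothetical.

```python
cog = [0] * 512                      # model of the 512 COG register addresses

def altsb_setbyte(d_value, s_value, byte):
    """Model ALTSB D,S followed by SETBYTE.
    d_value / s_value are the values ALTSB sees: register contents,
    or the literal itself for an immediate #S."""
    reg = ((d_value >> 2) + s_value) & 0x1FF     # next D field
    lane = d_value & 3                            # next N field
    shift = 8 * lane
    cog[reg] = (cog[reg] & ~(0xFF << shift) & 0xFFFFFFFF) | ((byte & 0xFF) << shift)
    return reg, lane

X = 0x50                             # hypothetical register address
N = 2                                # byte index, already limited to 0..3

case1 = altsb_setbyte(N, X, 0xAA)    # (1) altsb N, #X
tmp = X                              #     mov tmp, #X
case2 = altsb_setbyte(N, tmp, 0xBB)  # (2) altsb N, tmp
P = X * 4 + N                        # byte pointer: COG address * 4 + byte offset
case3 = altsb_setbyte(P, 0, 0xCC)    # (3) altsb P, #0

print(case1, case2, case3, hex(cog[X]))
```

All three forms resolve to the same register and byte lane in this model. The difference is only where the register address comes from.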

Now I am thinking about this

(1) If you're trying to set byte N of register X, do

		rdlut	rx1_lut_value,	rx1_address
		xor	rx1_byte_index,#3
		altsb 	rx1_byte_index,	#rx1_lut_value
  		setbyte 0,		rx1_char, #0
		wrlut	rx1_lut_value,	rx1_address		'write byte to circular buffer in lut

that is what I currently do, corrected to the names and needed changed syntax

But is this possible instead? I need to try...

(2) If P is a pointer with a COG address in it, and you want to modify byte N of the register P points to, then do:

		xor	rx1_byte_index,#3
		altsb 	rx1_byte_index,	rx1_address
  		setbyte 0,		rx1_char, #0

I love PASM, somehow,

Mike

evanh · 2019-02-09 12:28

Here's where I came in - https://forums.parallax.com/discussion/comment/1463469/#Comment_1463469

I do have a index from 0 to buffer-size for each buffer (currently in longs) and would like to access the LUT byte-wise. I do have very less code space left, but I have still reused init code space for variables.

That's why I say you've increased density.

evanh · 2019-02-09 12:36

msrobots wrote: »

(1) If you're trying to set byte N of register X, do

		rdlut	rx1_lut_value,	rx1_address
		xor	rx1_byte_index,#3
		altsb 	rx1_byte_index,	#rx1_lut_value
  		setbyte 0,		rx1_char, #0
		wrlut	rx1_lut_value,	rx1_address		'write byte to circular buffer in lut

You still need the AND in there to limit the range to one longword:

		rdlut	rx1_lut_value,	rx1_address
		and	rx1_byte_index,#3
		xor	rx1_byte_index,#3
		...

Otherwise the ALTSB will cause SETBYTE to write to whatever register the index would point to. #rx1_lut_value is just the base address that rx1_byte_index is added to, in byte scale addressing. Hence why this can handle the complete buffer in cogRAM.
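A short model shows what the missing AND does. This is a Python sketch of the ALTSB field arithmetic only. The register address `BASE` and the helper name are hypothetical.

```python
def altsb_target(index, base):
    """Register address and byte lane the following SETBYTE would hit."""
    return ((index >> 2) + base) & 0x1FF, index & 3

BASE = 0x60        # hypothetical COG address of rx1_lut_value
head = 6           # a byte index past the first long of the buffer

unmasked = altsb_target(head ^ 3, BASE)          # XOR only
masked = altsb_target((head & 3) ^ 3, BASE)      # AND, then XOR

print(unmasked, masked)
```

Without the AND, the upper bits of the index move the write to the register after `rx1_lut_value`. With it, the write stays inside the one long that was read from the LUT.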

msrobots · 2019-02-09 12:43

ah, ok, language barrier, my first language is German, the second choice was Latin for reasons my parents never really explained to me, and English is my third one.

This is something I got used to on the P1. You usually have some init routines, just running once before your PASM program goes into its main loop. So I regularly use this code space as space for variables normally declared as RES, so not needing any init value.

But on the P2 this really makes fun, because you can read a parameter with one instruction using PTRA++ into the same space where the instruction is.

something like this

DAT
		org	0
'
cogserial_init
'
'-----------------------------------------------------------------------
'
'loading parameters and reusing init code space for variables
'
'-----------------------------------------------------------------------
rx1_mailbox_ptr	rdlong	rx1_mailbox_ptr,ptra++			'pointer to mailbox in hub and rx1 mailbox hub address
rx1_pin		rdlong	rx1_pin,	ptra++			'serial1 rxpin1
rx1_bitperiod	rdlong	rx1_bitperiod,	ptra++			'bitperiod := 7 + ((CLKFREQ / baudrate) << 16)
rx1_mode	rdlong	rx1_mode,	ptra++			'configure rx1_pin for asynchronous receive, always input
rx1_lut_buff	rdlong	rx1_lut_buff,	ptra++			'lut rx1 receive buffer address in lut
rx1_lut_btop	rdlong	rx1_lut_btop,	ptra++			'lut rx1 receive buffer top address in lut (size for rx1)
tx1_pin		rdlong	tx1_pin,	ptra++			'serial1 txpin
tx1_bitperiod	rdlong	tx1_bitperiod,	ptra++			'bitperiod := 7 + ((CLKFREQ / baudrate) << 16)
tx1_mode	rdlong	tx1_mode,	ptra++			'configure tx1_pin for asynchronous transmit, always output
tx1_lut_buff	rdlong	tx1_lut_buff,	ptra++			'lut tx1 send buffer address in lut
tx1_lut_btop	rdlong	tx1_lut_btop,	ptra++			'lut tx1 send buffer top address in lut (size for tx1)
rx2_pin		rdlong	rx2_pin,	ptra++			'serial2 rxpin
rx2_bitperiod	rdlong	rx2_bitperiod,	ptra++			'bitperiod := 7 + ((CLKFREQ / baudrate) << 16)
rx2_mode	rdlong	rx2_mode,	ptra++			'configure rx2_pin for asynchronous receive, always input
rx2_lut_buff	rdlong	rx2_lut_buff,	ptra++			'lut rx2 receive buffer address in lut
rx2_lut_btop	rdlong	rx2_lut_btop,	ptra++			'lut rx2 receive buffer top address in lut (size for rx2)
tx2_pin		rdlong	tx2_pin,	ptra++			'serial2 txpin
tx2_bitperiod	rdlong	tx2_bitperiod,	ptra++			'bitperiod := 7 + ((CLKFREQ / baudrate) << 16)
tx2_mode	rdlong	tx2_mode,	ptra++			'configure tx2_pin for asynchronous transmit, always output
tx2_lut_buff	rdlong	tx2_lut_buff,	ptra++			'lut tx2 send buffer address in lut
tx2_lut_btop	rdlong	tx2_lut_btop,	ptra++			'lut tx2 send buffer top address in lut (size for tx2)
tx_ct1wait	rdlong	tx_ct1wait,	ptra++			'sysclocks to wait between calls to tx interrupt
'-----------------------------------------------------------------------

you need the longs for your parameter anyways and now you reuse the code space to load them, to store them there also. Code density, yeah that fits it. Well I learn every day a bit more about this confusing illogical language called English...

Enjoy!

Mike

evanh · 2019-02-09 12:56

Oh, yes, the multitude of borrowed and reused words.

Your english seems perfect, I wouldn't have guessed it wasn't your first language.

msrobots · 2019-02-09 13:01

well another thing

		'altsb 	rx1_byte_index,	rx1_address
  		'setbyte 0,		rx1_char, #0

		rdlut	rx1_lut_value,	rx1_address
		altsb 	rx1_byte_index,	#rx1_lut_value
  		setbyte 0,		rx1_char, #0
		wrlut	rx1_lut_value,	rx1_address		'write byte to circular buffer in lut

Sadly this does not replace the other. Shouldn't it, going by @ersmith's description?

Mike

ersmith · 2019-02-09 13:01

msrobots wrote: »
@ersmith,

I have a stupid question.

In spin on the P1 I can do something like this and it seems to work with fastspin too, but maybe not and that is my problem.
'
PRI startpasm(mailboxaddress, rxpin1, rxbitrate1, rxmode1, rxlutstart1, rxlutsize1, txpin1, txbitrate1, txmode1, txlutstart1, txlutsize1, rxpin2, rxbitrate2, rxmode2, rxlutstart2, rxlutsize2, txpin2, txbitrate2, txmode2, txlutstart2, txlutsize2, txclocks)
RETURN serpasm.start(@mailboxaddress)
So I am using the parameters of the PRI method startpasm as a contiguous block of 22 longs, addressed through the first parameter, to provide the start parameter block to my PASM driver,

That's a good question, and it's actually important. Yes, that should work in fastspin: I had to add that quite a while ago in order to support a lot of existing Spin code. If you take the address of a parameter fastspin makes sure all of the values are in HUB memory and can be passed to another COG.

I really suggest not doing it that way in new code, though, because it kills performance. Normally fastspin likes to keep local variables in COG memory rather than HUB -- obviously this is much more efficient. But if you use @ on any local variable or parameter, we can't do that. If there's an @ on a variable it has to go in HUB so it can be read by another COG. And if any variable goes in HUB they all have to go in HUB, because existing Spin code often makes assumptions about how things are laid out in memory and does some weird tricks with that.

fastspin has to be very conservative about anything in HUB memory, because it could be changed by other COGs at any time. So the optimizer pretty much can't do anything with the variables if they end up in HUB; no common sub-expression elimination, no combination of instructions, etc.

So for new code, I suggest just explicitly setting an array in HUB (something in the VAR section) with the values you want to pass to the other COG. That way you can control which values are put in HUB, and let the compiler put anything else it can into local COG memory.

Eric

ersmith · 2019-02-09 13:05

@msrobots: is your altsb code in a DAT block? If so then it should work. I found another setbyte bug in inline assembly, but it only affects inline assembly (and I think you'd get an error that says "setbyte is not supported in inline assembly" if you tried it).

I'll have a new fastspin very soon that definitely works (I have a sample program that tests setbyte, and it did change the right byte).

msrobots · 2019-02-09 13:10

evanh wrote: »

Oh, yes, the multitude of borrowed and reused words.

Your english seems perfect, I wouldn't have guessed it wasn't your first language.

Yeah, I have been stateside for about 12 years and I am trying to better my English skills by listening to stand-up comedians, mostly the older ones one can find on YouTube, like George Carlin, W.C. Fields, Richard Pryor, and even Ronald R. at Correspondents' Dinners.

It helps to better your skills and is fun too.

- I am currently working on a joke about cocaine. - I just need two more lines...

Enjoy!

Mike

evanh · 2019-02-09 13:15

You're still missing the needed AND to limit addressing range of the index. See https://forums.parallax.com/discussion/comment/1464096/#Comment_1464096

EDIT: Oh, I see you've got that earlier in the source code.

evanh · 2019-02-09 13:24

msrobots wrote: »
(2) If P is a pointer with a COG address in it, and you want to modify byte N of the register P points to, then do:
		xor	rx1_byte_index,#3
		altsb 	rx1_byte_index,	rx1_address
  		setbyte 0,		rx1_char, #0

No, that won't work, because rx1_address is in lutRAM and SETBYTE can't access lutRAM. EDIT: rx1_address will be treated as a cogRAM address, so the write will overwrite something unexpected.

msrobots · 2019-02-09 13:32

ersmith wrote: »

@msrobots: is your altsb code in a DAT block? If so then it should work. I found another setbyte bug in inline assembly, but it only affects inline assembly (and I think you'd get an error that says "setbyte is not supported in inline assembly" if you tried it).

I'll have a new fastspin very soon that definitely works (I have a sample program that tests setbyte, and it did change the right byte).

Oh no, don't panic. The altsb is working fine right now using the longer form of setbyte, as you explained. That works.

I just tried it with one more level of indirection, and that does not seem to do what I expected.

As for my init code in PRI startpasm(xxxxx): my thinking was that it saves code space compared to doing it like this:

PRI startpasm(mailboxaddress, rxpin1, rxbaudrate1, rxmode1, rxlutstart1, rxlutsize1, txpin1, txbaudrate1, txmode1, txlutstart1, txlutsize1, rxpin2, rxbaudrate2, rxmode2, rxlutstart2, rxlutsize2, txpin2, txbaudrate2, txmode2, txlutstart2, txlutsize2, txclocks) | startparams[22]
  startparams[0]	:= @rx1_cmd					'hub address mailbox
  startparams[1]	:= rxpin1					'pin rx1
  startparams[2]	:=  7 + ((CLKFREQ / rxbaudrate1) << 16)		'bitrate rx1
  startparams[3]	:= rxmode1					'mode rx1
  startparams[4]	:= rxlutstart1					'start lutbuffer rx1
  startparams[5]	:= rxlutsize1					'size lutbuffer rx1
  startparams[6]	:= txpin1					'pin tx1
  startparams[7]	:=  7 + ((CLKFREQ / txbaudrate1) << 16)		'bitrate tx1
  startparams[8]	:= txmode1					'mode tx1
  startparams[9]	:= txlutstart1					'start lutbuffer tx1
  startparams[10]	:= txlutsize1					'size lutbuffer tx1
  startparams[11]	:= rxpin2					'pin rx2
  startparams[12]	:=  7 + ((CLKFREQ / rxbaudrate2) << 16)		'bitrate rx2
  startparams[13]	:= rxmode2					'mode rx2
  startparams[14]	:= rxlutstart2					'start lutbuffer rx2
  startparams[15]	:= rxlutsize2					'size lutbuffer rx2
  startparams[16]	:= txpin2					'pin tx2
  startparams[17]	:=  7 + ((CLKFREQ / txbaudrate2) << 16)		'bitrate tx2
  startparams[18]	:= txmode2					'mode tx2
  startparams[19]	:= txlutstart2					'start lutbuffer tx2
  startparams[20]	:= txlutsize2					'size lutbuffer tx2
  startparams[21]	:= tx_ct1wait					'number of sysclocks to wait between TX interrupts 

RETURN serpasm.start(@startparams)

And I only need those 22 longs while the COG is starting, never again?

I need my mailbox of 8 longs in DAT/HUB, but the start parameters are never needed later; they are only needed to start the PASM COG.

So what would you advise as the smallest way to achieve that? It does not need to be fast; it happens just once, at the start of the COG.
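For anyone checking the bitrate entries in that list, here is a small Python model of the expression `7 + ((CLKFREQ / baud) << 16)`. It is only a sketch: the function name `async_serial_x` and the 160 MHz clock are made up for illustration, and reading the 7 as "8 data bits minus one" is an assumption about how the smart pin interprets the low bits, not something stated in this thread.

```python
def async_serial_x(clkfreq, baud, data_bits=8):
    """Model of 7 + ((CLKFREQ / baud) << 16) from the Spin snippet above.

    Assumption: the upper 16 bits carry the clocks per bit and the low
    bits carry (data_bits - 1).
    """
    clocks_per_bit = clkfreq // baud          # Spin's "/" is an integer divide
    if not 1 <= clocks_per_bit <= 0xFFFF:
        # The shifted value would no longer fit in a 32-bit long.
        raise ValueError("bit period does not fit in 16 bits")
    return (clocks_per_bit << 16) + (data_bits - 1)


# Hypothetical numbers: 160 MHz system clock, 115200 baud.
x = async_serial_x(160_000_000, 115_200)
clocks_per_bit = x >> 16
word_length_field = x & 0xFFFF
```

The range check is the useful part: at low baud rates the quotient no longer fits in 16 bits, and the Spin expression would silently wrap.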

still refactoring,

On the P1 it is FIT 496; how far can I go on the P2? I seem to corrupt res variables (yes, I still need some) somehow when adding code, hence my goal of shortening things.

Enjoy!

Mike

msrobots · 2019-02-09 13:35

ersmith wrote: »

@msrobots: is your altsb code in a DAT block? If so then it should work. I found another setbyte bug in inline assembly, but it only affects inline assembly (and I think you'd get an error that says "setbyte is not supported in inline assembly" if you tried it).

I'll have a new fastspin very soon that definitely works (I have a sample program that tests setbyte, and it did change the right byte).

Oh, I forgot: yes, this is a complete PASM COG, with just a small stub to start it from Spin.

Mike

msrobots · 2019-02-09 15:12

Hah,

and that seems to have been exactly the problem. I am now down to 487 longs, and both pairs of RX/TX are working with byte-addressed LUT buffers.

I'll post the current code before I destroy it again. I still need to shorten it; I really want some sort of string input supported by PASM in the driver.

But hell yeah, it passes all the tests. It is just slower than envisioned because I still read and write bytes to the HUB, but all 4 lines are working. I am getting somewhere...
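As a rough illustration of what "byte-addressed LUT buffers" involves, here is a Python model of packing four bytes into each 32-bit LUT long. It is a sketch under assumptions: the helper names and the start offset `0x80` are hypothetical, the byte order (lane 0 in bits 7..0) is one possible choice, and the attached driver may well do it differently.

```python
def lut_write_byte(lut, lut_start, byte_index, value):
    """Read-modify-write one byte inside a 32-bit LUT long."""
    long_addr = lut_start + (byte_index >> 2)   # which long holds the byte
    shift = 8 * (byte_index & 3)                # which byte lane inside it
    cleared = lut[long_addr] & ~(0xFF << shift) & 0xFFFFFFFF
    lut[long_addr] = cleared | ((value & 0xFF) << shift)


def lut_read_byte(lut, lut_start, byte_index):
    """Fetch one byte back out of the packed long."""
    long_addr = lut_start + (byte_index >> 2)
    return (lut[long_addr] >> (8 * (byte_index & 3))) & 0xFF


lut = [0] * 512                  # model of the 512-long LUT RAM
LUT_START = 0x80                 # hypothetical buffer start
for i, ch in enumerate(b"P2!ok"):
    lut_write_byte(lut, LUT_START, i, ch)

readback = bytes(lut_read_byte(lut, LUT_START, i) for i in range(5))
```

A 64-long region holds 256 bytes this way, at the cost of a read-modify-write per received character.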

I attached all needed files, testserial.spin2 is the main program.

Enjoy!

Mike

msrobots · 2019-02-09 15:19

evanh wrote: »
msrobots wrote: »
(2) If P is a pointer with a COG address in it, and you want to modify byte N of the register P points to, then do:
		xor	rx1_byte_index,#3
		altsb 	rx1_byte_index,	rx1_address
  		setbyte 0,		rx1_char, #0
No, that won't work, because rx1_address is in lutRAM and SETBYTE can't access lutRAM. EDIT: rx1_address will be treated as a cogRAM address, so the write will overwrite something unexpected.

Oh yes, it did overwrite something unexpected.

Mike

jmg · 2019-02-09 20:20

msrobots wrote: »

...
Oh yes, it did overwrite something unexpected.

It's a big hammer, but if used with care, it can index an array of bytes, so you do not need separate word and byte pointers.

See here how it primes both the byte and long pointer sections; it is up to the user to keep the range of the long pointer safe.
(Yes, this does mean your arrays need to be in COG, not LUT, but overall the code can be smaller and faster.)

106 | .     | Register Indirection | EEEE 1001011 00I DDDDDDDDD SSSSSSSSS | ALTSB   D,{#}S | Alter subsequent SETBYTE instruction. Next D field = (D[10:2] + S) & $1FF, N field = D[1:0]. D += sign-extended S[17:9]. | 2 | same | 2 | same | D
107 | alias | Register Indirection | EEEE 1001011 001 DDDDDDDDD 000000000 | ALTSB   D      | Alter subsequent SETBYTE instruction. Next D field = D[10:2], N field = D[1:0]. | 2 | same | 2 | same | D
108 | .     | Register Indirection | EEEE 1001011 01I DDDDDDDDD SSSSSSSSS | ALTGB   D,{#}S | Alter subsequent GETBYTE/ROLBYTE instruction. Next S field = (D[10:2] + S) & $1FF, N field = D[1:0]. D += sign-extended S[17:9]. | 2 | same | 2 | same | D
109 | alias | Register Indirection | EEEE 1001011 011 DDDDDDDDD 000000000 | ALTGB   D      | Alter subsequent GETBYTE/ROLBYTE instruction. Next S field = D[10:2], N field = D[1:0]. | 2 | same | 2 | same | D
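To make the ALTSB row concrete, here is a small Python model of the field arithmetic it describes, followed by the byte write that SETBYTE then performs. This is only a sketch of the documented formula; the function names, `table_base` and the sample values are invented for illustration. Note that the target is always masked to 9 bits, which is why a LUT address ends up hitting a cog register instead.

```python
def sign_extend9(value):
    """Interpret a 9-bit field as a signed number (-256..255)."""
    value &= 0x1FF
    return value - 0x200 if value & 0x100 else value


def altsb(d, s):
    """Model ALTSB D,S: return (register, lane, new_d)."""
    register = (((d >> 2) & 0x1FF) + s) & 0x1FF       # Next D field = (D[10:2] + S) & $1FF
    lane = d & 0x3                                     # N field = D[1:0]
    new_d = (d + sign_extend9(s >> 9)) & 0xFFFFFFFF    # D += sign-extended S[17:9]
    return register, lane, new_d


def setbyte(cog, register, lane, value):
    """Replace one byte lane of a 32-bit cog register (lane 0 = bits 7..0)."""
    shift = 8 * lane
    cleared = cog[register] & ~(0xFF << shift) & 0xFFFFFFFF
    cog[register] = cleared | ((value & 0xFF) << shift)


cog = [0] * 512            # cog register file; LUT is a separate memory
byte_index = 6             # byte 6 of a table -> long 1, lane 2
table_base = 0x100         # hypothetical cog address of the table
register, lane, new_d = altsb(byte_index, table_base)
setbyte(cog, register, lane, 0x41)
```

With these values the write lands in cog register `$101`, byte lane 2, and the index is left unchanged because S[17:9] is zero.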

Another comment. To me, this seems risky:

		wrlut	rx1_lut_value,	rx1_address		'write byte to circular buffer in lut
		incmod	rx1_head, 	rx1_lut_btop		'increment buffer head
		cmp	rx1_head, 	rx1_tail 	wz	'hitting tail is bad
	if_z	incmod	rx1_tail, 	rx1_lut_btop		'increment tail  - I am losing received chars at the end of the buffer because the buffer is full
		reti1						'exit

Here you change both the WR and RD pointers, but with multiple interrupts and non-interrupt code in the mix, that means any pointer can be changed from multiple places.
I prefer to clamp only the WRITE pointer, as that is safer.
On the P2 I think that means this minor change:

		wrlut	rx1_lut_value,	rx1_address		'write byte to circular buffer in lut
		incmod	rx1_head, 	rx1_lut_btop		'increment WR buffer head
		cmp	rx1_head, 	rx1_tail 	wz	'hitting RD pointer is bad
	if_nz	reti1                                           'faster return
	if_z	decmod	rx1_head, 	rx1_lut_btop		'rare safety case, undo overflow, next WR char will overwrite this one, RD ptr unaffected. 
        '   Can flag over-run here too if needed.
		reti1						'exit

msrobots · 2019-02-09 21:10

Well @jmg, that is a good point, and I also thought about how to handle this.

On the TX side I can just throttle the input when the buffer is full.

On the RX side I can't, so I WILL lose bytes when the buffer is full.

Your version will lose bytes at the front of the buffer; my version loses bytes at the end of the buffer.

I think mine is better, because you might miss something if you do not read fast enough, but you can catch up on reading and still have a valid stream of data.

Losing bytes at the front of the buffer is more fatal, because you would have an inconsistent stream with data missing somewhere in between.

Losing bytes at the end of the buffer is better, or - hmm - would also happen if you did not have a buffer. But the data stream is not corrupted.

That is my reasoning for just advancing the tail pointer when the write pointer hits it at RX. I throw away the last entry to keep all buffer entries in sync.
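The two overflow policies discussed here can be compared with a small Python model of the receive interrupt. It is a sketch, not the driver: the class `RxRing` and the policy names are invented, and `incmod`/`decmod` only mimic the wrap behaviour of the PASM instructions of the same name. In this model, advancing the tail discards the oldest unread byte and leaves a contiguous run of the newest ones, while clamping the head keeps the oldest bytes and drops later arrivals until the reader frees a slot.

```python
def incmod(value, top):
    """Like INCMOD: wrap to 0 when value equals top, otherwise add 1."""
    return 0 if value == top else value + 1


def decmod(value, top):
    """Like DECMOD: wrap to top when value is 0, otherwise subtract 1."""
    return top if value == 0 else value - 1


class RxRing:
    """Model of the RX circular buffer with a selectable overflow policy."""

    def __init__(self, size, policy):
        self.buf = [None] * size
        self.top = size - 1
        self.head = 0            # write index, moved by the "interrupt"
        self.tail = 0            # read index, moved by the reader
        self.policy = policy     # "advance_tail" or "clamp_head"
        self.overruns = 0

    def isr_receive(self, byte):
        self.buf[self.head] = byte
        self.head = incmod(self.head, self.top)
        if self.head == self.tail:               # buffer just became full
            self.overruns += 1
            if self.policy == "advance_tail":    # drop the oldest unread byte
                self.tail = incmod(self.tail, self.top)
            else:                                # undo; next byte overwrites this one
                self.head = decmod(self.head, self.top)

    def read_all(self):
        out = []
        while self.tail != self.head:
            out.append(self.buf[self.tail])
            self.tail = incmod(self.tail, self.top)
        return out


def run(policy):
    ring = RxRing(4, policy)
    for b in range(6):           # six bytes arrive with nobody reading
        ring.isr_receive(b)
    first = ring.read_all()      # reader catches up
    ring.isr_receive(6)          # one more byte arrives afterwards
    return first + ring.read_all(), ring.overruns


tail_stream, tail_overruns = run("advance_tail")
head_stream, head_overruns = run("clamp_head")
```

With a 4-entry buffer the reader sees 3, 4, 5, 6 under the advance-tail policy and 0, 1, 2, 6 under the clamp-head policy, so the gap sits before the buffered data in the first case and inside the stream in the second.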

Enjoy!

Mike
