Huh, I hadn't noticed the differences either. I've been happily using an ALTxx + SETNIB/GETNIB combo without problem to do decimal strings. Fastspin hasn't given me any grief that I know of so I assume that Fastspin's error is unique to SETBYTE.
SETNIB {#}S Set S[3:0] into nibble established by prior ALTSN instruction.
GETNIB D Get nibble established by prior ALTGN instruction into D.
Amusingly, I've been using Pnut to produce my binaries for the last week due to having destroyed my P2ES chip and then updated the FPGA to v33i.
EDIT: Oh, no, I was using the full version of SETNIB until recently. I hadn't realised I'd changed it back to the alias since destroying the P2ES chip. I didn't have any comments as to why ... and now I remember, it was because, when I changed from Pnut to p2asm, p2asm would throw an error on that alias. But I'd long replaced p2asm with fastspin.
And it is good to find a bug, that is why we are testing all of this.
Sadly I somehow messed up my program and only one of the two pairs of RX and TX works. I am staring at my code to no avail. It was working before; I just broke it shaving longs away to make space for string input.
This is extremely frustrating and I am at the point of starting all over again. The sad thing is that right now I am not even sure whether I broke the PASM driver, the Spin object or my test routines.
The only thing I am sure about is that it was working before, is not working now, and stupid me broke it without keeping copies of the working code.
Ya, done that too many times myself. Saved myself a number of times with editor undos. When I remember, I often have hunks of disabled duplicated code when working on an alternative. Or I might duplicate the sources completely for the alternative approach.
I presume that's one big reason why source control systems were devised.
Mike,
What is the content of those bytes? Doing endian reversal without a good reason is just making extra work.
...
Give this a try anyway:
rx1_isr rdpin rx1_char, rx1_pin 'get received chr
shr rx1_char, #32-8 'shift to lsb justify
mov rx1_address, rx1_head 'adjust to buffer start
shr rx1_address, #2
add rx1_address, rx1_lut_buff 'by adding rx1_lut_buff
rdlut rx1_lut_value, rx1_address
mov rx1_byte_index, rx1_head
and rx1_byte_index, #%11 'now 0 to 3
xor rx1_byte_index, #%11 'now 3 to 0
altsb rx1_byte_index, #rx1_lut_value
setbyte rx1_char
wrlut rx1_lut_value, rx1_address 'write byte to circular buffer in lut
incmod rx1_head, rx1_lut_btop 'increment buffer head
cmp rx1_head, rx1_tail wz 'hitting tail is bad
if_z incmod rx1_tail, rx1_lut_btop 'increment tail - I am losing received chars at the end of the buffer because the buffer is full
reti1 'exit
Currently I am not even sure about the endian reversal. On the P1 a long in HUB RAM has a different byte order than in COG RAM; I am not sure about the P2.
My plan was to write bytes into my LUT buffers in the RX interrupt and read bytes from my LUT buffers in the TX interrupt, but when transferring LUT to HUB and HUB to LUT to be able to move whole longs, after first reading/writing single bytes until I reach a complete LUT long. Maybe even use the streamer, or rdfast, or whatever.
I never got that far because of running out of space and then breaking it.
My thinking was that when using getbyte/setbyte I am able to reverse the order and as soon as I write/read the first long I will find out if the reversal is needed or not. And if not shave off the reversal and save two more longs.
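On the endian question: as far as I know the P2 is little-endian throughout, so the byte at the lowest HUB address of a long should land in bits 7:0 of the register after RDLONG, which is byte 0 for GETBYTE/SETBYTE. A small sketch to check that on hardware, assuming hypothetical registers tmp, b0 and b3, and PTRA pointing at a long in HUB:

```spin2
        rdlong  tmp, ptra           ' fetch one long from HUB
        getbyte b0, tmp, #0         ' expected: the byte stored at ptra+0
        getbyte b3, tmp, #3         ' expected: the byte stored at ptra+3

tmp     res     1                   ' hypothetical scratch registers
b0      res     1
b3      res     1
```

If that holds, packing with byte index 0 first needs no reversal of the index.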
I am very sure that IF I find the stupid mistake I made, it will be the classic DOH moment.
Yes on all points, but undo is broken in Spin2Gui, at least in my current version under Windows. And since at work I am bound to use Subversion and not allowed to use GIT because of company policy, I never bothered to use GIT. My private stuff is also not allowed on the company servers, and gosh, who bothers about source control for files with just 500 lines of code?
I am a COBOL programmer, source files are supposed to be HUGE, with 500 lines you do not even get a DB access running in COBOL.
Hmm - I do not find any other excuse for not using source control, maybe I should look into GIT.
rx1_isr rdpin rx1_char, rx1_pin 'get received chr
shr rx1_char, #32-8 'shift to lsb justify
mov rx1_byte_index, rx1_head
and rx1_byte_index, #%11 'now 0 to 3
mov rx1_address, rx1_head 'adjust to buffer start
shr rx1_address, #2
add rx1_address, rx1_lut_buff 'by adding rx1_lut_buff
rdlut rx1_lut_value, rx1_address
neg rx1_byte_index ' now 0 to -3
add rx1_byte_index, #3 ' now 3 to 0
altsb rx1_byte_index, #rx1_lut_value
setbyte rx1_char
' cmp rx1_byte_index, #0 wz
' if_z setbyte rx1_lut_value, rx1_char, #3
' cmp rx1_byte_index, #1 wz
' if_z setbyte rx1_lut_value, rx1_char, #2
' cmp rx1_byte_index, #2 wz
' if_z setbyte rx1_lut_value, rx1_char, #1
' cmp rx1_byte_index, #3 wz
' if_z setbyte rx1_lut_value, rx1_char, #0
wrlut rx1_lut_value, rx1_address 'write byte to circular buffer in lut
incmod rx1_head, rx1_lut_btop 'increment buffer head
cmp rx1_head, rx1_tail wz 'hitting tail is bad
if_z incmod rx1_tail, rx1_lut_btop 'increment tail - I am losing received chars at the end of the buffer because the buffer is full
reti1 'exit
But it does not work. And I really need these 4 longs per pair...
I am losing faith in being a worthy programmer,
Mike
It's been suggested before, but I'll mention it again: Have you tried moving your code to LUT RAM and placing your buffers in COG RAM?
It seems when this has been mentioned previously you have stated that you can't because LUT RAM is full: of buffers.
But if the buffers are moved to COG RAM, you have that space for code, and you'll be able to pack your bytes into longs with ALTSB in COG RAM buffers.
Please correct me if I'm off base here.
Well yes @AJL I think @evanh mentioned that, but I am not sure why this would help. Maybe you can elaborate. My point of view here is that I have 512 longs of LUT ram that would nicely fit 4 512 byte buffers for RX1/TX1/RX2/TX2.
Cog ram is not 512 longs in my understanding, because of special registers at the end of COG ram or is that different on the P2 vs the P1? I did ask that question before and found no answer yet.
Since I am considering rewriting this completely if I can't find the stupid mistake I made, I am really interested in why two people now recommend using LUT RAM for code and COG RAM as LUT/buffer.
I am sometimes quite slow to understand things, so please bear with me and explain further. I seem to miss some point of the argument why I should try this.
Sure I can copy my code from COG to LUT and run it there, and reuse the COG space as buffer, but why should I?
I currently reuse all initialization code space for register variables. To speed things up I precalculate pointers and have them ready to use. That is about 150 registers, ready to use because they are in COG RAM.
If I have the code in LUT and my buffers in COG, how do I handle those variables I need for rdbyte/rdlong, buffer positions, sizes, whatever?
Keeping them in COG RAM would reduce the available buffer size; having them in LUT RAM and accessing them with rdlut/wrlut seems impossible to me.
In spin on the P1 I can do something like this and it seems to work with fastspin too, but maybe not and that is my problem.
'
' buffered smart pin serial object for P2 Eval board, buffering rx/tx in the Cog, supporting 2 full-duplex connections
'
CON
_txmode = %0000_0000_000_0000000000000_01_11110_0 'async tx mode, output enabled for smart output
_rxmode = %0000_0000_000_0000000000000_00_11111_0 'async rx mode, input enabled for smart input
OBJ
serpasm: "cogserialpasm.spin2" 'this is the PASM2 COG doing all the work
'
'-----------------------------------------------------------------------
'
VAR
long rx1_cmd, rx1_param, tx1_cmd, tx1_param, rx2_cmd, rx2_param, tx2_cmd, tx2_param 'mailbox of this instance 8 longs
DAT
outchar byte "H",0,0,0
'
'-----------------------------------------------------------------------
'stop pasm cog if already running
'-----------------------------------------------------------------------
'
PUB stop
serpasm.stop
'
'-----------------------------------------------------------------------
'return parameter address of instance
'-----------------------------------------------------------------------
'
PUB mailboxaddress
RETURN @rx1_cmd
'
'-----------------------------------------------------------------------
'use this to start a 1 port driver with 255 bytes buffer each channel
'-----------------------------------------------------------------------
'
PUB start(rxpin = 63, txpin = 62, mode = -1, baudrate = 230_400) | bitrate
bitrate := 7 + ((CLKFREQ / baudrate) << 16)
RETURN startpasm(@rx1_cmd, rxpin, bitrate, _rxmode, 0, $FF, txpin, bitrate, _txmode, $100, $FF, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 100)
'
'-----------------------------------------------------------------------
'use this to start a 2 port driver with 128 bytes buffer each channel
'-----------------------------------------------------------------------
'
PUB start2(rxpin1, txpin1, mode1, baudrate1, rxpin2, txpin2, mode2, baudrate2) | bitrate1, bitrate2
bitrate1 := 7 + ((CLKFREQ / baudrate1) << 16)
bitrate2 := 7 + ((CLKFREQ / baudrate2) << 16)
RETURN startpasm(@rx1_cmd, rxpin1, bitrate1, _rxmode, 0, $7F, txpin1, bitrate1, _txmode, $80, $7F, rxpin2, bitrate2, _rxmode, $100, $7F, txpin2, bitrate2, _txmode, $180, $7F, 100)
'
'-----------------------------------------------------------------------
'use this to start whatever combination you want
'-----------------------------------------------------------------------
'
PUB startExt(rxpin1, txpin1, rxbaudrate1, txbaudrate1, rxmode1, txmode1, rxlutstart1, rxlutsize1, txlutstart1, txlutsize1, rxpin2, txpin2, rxbaudrate2, txbaudrate2, rxmode2, txmode2, rxlutstart2, rxlutsize2, txlutstart2, txlutsize2, txclocks) | rxbitrate1, txbitrate1, rxbitrate2, txbitrate2
rxbitrate1 := 7 + ((CLKFREQ / rxbaudrate1) << 16)
txbitrate1 := 7 + ((CLKFREQ / txbaudrate1) << 16)
rxbitrate2 := 7 + ((CLKFREQ / rxbaudrate2) << 16)
txbitrate2 := 7 + ((CLKFREQ / txbaudrate2) << 16)
RETURN startpasm(@rx1_cmd, rxpin1, rxbitrate1, rxmode1, rxlutstart1, rxlutsize1, txpin1, txbitrate1, txmode1, txlutstart1, txlutsize1, rxpin2, rxbitrate2, rxmode2, rxlutstart2, rxlutsize2, txpin2, txbitrate2, txmode2, txlutstart2, txlutsize2, txclocks)
'
'-----------------------------------------------------------------------
'this provides the parameter array needed to start the pasm cog on the stack
'-----------------------------------------------------------------------
'
PRI startpasm(mailboxaddress, rxpin1, rxbitrate1, rxmode1, rxlutstart1, rxlutsize1, txpin1, txbitrate1, txmode1, txlutstart1, txlutsize1, rxpin2, rxbitrate2, rxmode2, rxlutstart2, rxlutsize2, txpin2, txbitrate2, txmode2, txlutstart2, txlutsize2, txclocks)
RETURN serpasm.start(@mailboxaddress)
So I am using the parameters of the PRI method startpasm as a contiguous block of 21 longs, given by the address of the first parameter, to provide the start parameter block to my PASM driver.
The start routine reads in 21 longs, then sets a sync value so that startpasm returns only after the COG has started and has read its parameters.
'=======================================================================
'
' buffered smart pin serial object for P2 Eval board, buffering rx/tx in the Cog, supporting 2 full-duplex connections
'
'=======================================================================
VAR
long cog 'cog id of this instance
'
'-----------------------------------------------------------------------
'stop cog if already running
'-----------------------------------------------------------------------
'
PUB stop
  if cog
    cogstop(cog-1)
  cog := 0
'
'-----------------------------------------------------------------------
'start COG with startparameter block address
'-----------------------------------------------------------------------
'
PUB start(startparameteraddress) 'the first long in the startparameter block contains the address of the later used
  stop 'Mailbox. So now I use long[long[]] to access the first long in the later used mailbox as Flag
  long[long[startparameteraddress]] := 0 'set flag (0) to know if the started cog has read its parameters
  cog := cognew(@cogserial_init,startparameteraddress) + 1
  if cog 'if I was able to start the COG
    repeat until long[long[startparameteraddress]] == -1 'I wait until Flag states cog is done reading parameter and ready to roll (-1)
    RESULT := 1 'now start done
The first parameter of the start parameter block contains the address of the mailbox to use, thus the long[long[xx]].
Are the 21 parameters contiguous longs in memory, addressable via @mailboxaddress as they would be on a P1, or is it possible that those parameters are not in a contiguous block in memory?
... I am really interested about why two people now recommend to use LUT ram for code and COG ram as LUT/buffer.
I am sometimes quite slow to understand things, so please bear with me and explain further. I seem to miss some point of the argument why I should try this.
Sure I can copy my code from COG to LUT and run it there, and reuse the COG space as buffer, but why should I?
I think the idea behind doing that is that these opcodes can access COG RAM, but not LUT:
106 . Register Indirection EEEE 1001011 00I DDDDDDDDD SSSSSSSSS ALTSB D,{#}S Alter subsequent SETBYTE instruction. Next D field = (D[10:2] + S) & $1FF, N field = D[1:0]. D += sign-extended S[17:9]. 2 same 2 same D
107 alias Register Indirection EEEE 1001011 001 DDDDDDDDD 000000000 ALTSB D Alter subsequent SETBYTE instruction. Next D field = D[10:2], N field = D[1:0]. 2 same 2 same D
108 . Register Indirection EEEE 1001011 01I DDDDDDDDD SSSSSSSSS ALTGB D,{#}S Alter subsequent GETBYTE/ROLBYTE instruction. Next S field = (D[10:2] + S) & $1FF, N field = D[1:0]. D += sign-extended S[17:9]. 2 same 2 same D
109 alias Register Indirection EEEE 1001011 011 DDDDDDDDD 000000000 ALTGB D Alter subsequent GETBYTE/ROLBYTE instruction. Next S field = D[10:2], N field = D[1:0]. 2 same 2 same D
90 . Math and Logic EEEE 1000110 NNI DDDDDDDDD SSSSSSSSS SETBYTE D,{#}S,#N Set S[7:0] into byte N in D, keeping rest of D same. 2 same 2 same D
91 alias Math and Logic EEEE 1000110 00I 000000000 SSSSSSSSS SETBYTE {#}S Set S[7:0] into byte established by prior ALTSB instruction. 2 same 2 same D
92 . Math and Logic EEEE 1000111 NNI DDDDDDDDD SSSSSSSSS GETBYTE D,{#}S,#N Get byte N of S into D. D = {24'b0, S.BYTE[N]). 2 same 2 same D
93 alias Math and Logic EEEE 1000111 000 DDDDDDDDD 000000000 GETBYTE D Get byte established by prior ALTGB instruction into D. 2 same 2 same D
But I can not keep variables in LUT space, how to handle that?
and with a working setbyte /altsb I seem to be able to byte-address my LUT buffer with two instructions too. What am I missing?
You may be missing that the opcode fields are only 9 bits, and so have a 512L reach : that covers all of COG, but does not reach into LUT
If all opcodes could reach COG and LUT equally, there would be no need to call it LUT
Variables stay where opcodes best access them, in COG and code can go where it only needs to execute, in LUT.
Some Assemblers allow the idea of CSEG and DSEG and they place code and data where they best fit.
Maybe P2 needs something similar, so you can just write, and then the assembler splits as needed, checking for anything that may be illegal.
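To make the difference concrete, here is a sketch of the same increment done on a variable in COG RAM and on one kept in LUT RAM. The names count, tmp and count_addr are made up for illustration; count_addr is assumed to hold the LUT address of the LUT copy.

```spin2
            ' variable in COG RAM: the 9-bit D field names it directly
            add     count, #1

            ' variable in LUT RAM: it must be copied through a COG register
            rdlut   tmp, count_addr     ' tmp := LUT[count_addr]
            add     tmp, #1
            wrlut   tmp, count_addr     ' LUT[count_addr] := tmp

count       long    0
count_addr  long    0                   ' LUT address (hypothetical)
tmp         res     1
```

Code, on the other hand, only has to be fetched, which is why it can live in LUT without this overhead.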
Other approaches to this would be to code your buffers as byte only, and not pack them. Yes, that's wasteful, but it is also fast and simple - and allows easier testing.
Later, you can tune the buffers to be less wasteful and store more bytes, if you find you need to.
Faster interrupts do not need the buffers to be as large.
Yes, good points, I was thinking along the same lines but came to a different answer.
Since I do need COG RAM for my variables, my buffers need to be in the LUT, or the buffers have to be significantly smaller.
And in my first (working) example I do not pack the bytes in the LUT and waste 3 of the four bytes in a long.
And yes, you are right the interrupts fill the buffer quite fast, the best working version I had worked up to sysclock baud on RX and TX.
In my current approach RX1 uses int1 RX2 uses int2 and I have int3 just running every x sysclock, checking if it can transfer on TX1 or TX2.
The main COG just checks the mailboxes and transfers data from buffer to HUB and vice versa.
That worked up to sysclock baud on all four lines talking to themselves on smartpins using one COG.
Getting the bytes into the HUB or from the HUB was and is the bottleneck. When I save my bytes in longs in LUT I can just read a long and write one byte into HUB.
When I on the other hand can pack 4 bytes in the right order in my LUT buffer long, I need just to transfer single bytes to/from HUB until I am at the start of a new LUT long and then can start moving longs to/from HUB until I need to step down to bytes on the last long. I think I might even be able to use rdfast/wrfast, when transferring big enough data.
This is about optimizing the speed when transferring blocks of data, instead of single bytes.
For a simple ser.TX(65) it does not really make a difference, but for ser.str() or ser.writeblock(…) it makes a huge difference whether writing, say, 30 bytes takes 30 rdbytes/wrbytes to HUB, or up to 4 rdbytes/wrbytes plus at least 7 rdlongs/wrlongs.
But to do so I need 4 bytes not one in my LUT, and then I started packing bytes and broke everything. Well not everything RX1/TX1 are working RX2/TX2 are broken.
I presumed, since you were quadrupling the buffer density, that you could easily get away with just doubling the number of entries while halving its total size and gaining benefits all round. Including more available space for code.
Huh? No, I am not just thinking about quadrupling the size, but about faster access between HUB and LUT through reading and writing longs instead of bytes. But I still need way more code space than I have to do so.
Refactoring and refactoring, shaving off longs and hoping to stumble upon why RX2 misbehaves, by using the same code for both channels.
I can at least report some success, thanks to all of you.
This does work and replaces my 8 lines of code with 3.
rx1_isr rdpin rx1_char, rx1_pin 'get received chr
shr rx1_char, #32-8 'shift to lsb justify
mov rx1_byte_index, rx1_head
and rx1_byte_index, #%11 'now 0 to 3
mov rx1_address, rx1_head 'adjust to buffer start
shr rx1_address, #2
add rx1_address, rx1_lut_buff 'by adding rx1_lut_buff
rdlut rx1_lut_value, rx1_address
xor rx1_byte_index,#3
altsb rx1_byte_index, #rx1_lut_value
setbyte 0, rx1_char, #0
'
' cmp rx1_byte_index, #0 wz
' if_z setbyte rx1_lut_value, rx1_char, #3
' cmp rx1_byte_index, #1 wz
' if_z setbyte rx1_lut_value, rx1_char, #2
' cmp rx1_byte_index, #2 wz
' if_z setbyte rx1_lut_value, rx1_char, #1
' cmp rx1_byte_index, #3 wz
' if_z setbyte rx1_lut_value, rx1_char, #0
wrlut rx1_lut_value, rx1_address 'write byte to circular buffer in lut
incmod rx1_head, rx1_lut_btop 'increment buffer head
cmp rx1_head, rx1_tail wz 'hitting tail is bad
if_z incmod rx1_tail, rx1_lut_btop 'increment tail - I am losing received chars at the end of the buffer because the buffer is full
reti1 'exit
Now I need to do the same with getbyte to shave off some more longs. My first attempt there failed too; I will report soon.
Tony,
I think Mike is trying to set the content of rx2_lut_value itself. Without the # there, ALTSB would use the value stored in rx2_lut_value as a cogRAM address.
Yes, sorry, I misunderstood what Mike was trying to do. Evanh has the right idea, I think.
To summarize:
(1) If you're trying to set byte N of register X, do
altsb N, #X
setbyte value
(2) If P is a pointer with a COG address in it, and you want to modify byte N of the register P points to, then do:
altsb N, P
setbyte value
Note that P is a COG address, so it does not (necessarily) have bits 0..1 as 0; that is, we could write example (1) above as:
mov tmp, #X
altsb N, tmp
setbyte value
(3) If P is a pointer with a byte address in it (that is, a COG address * 4 + a byte offset within the COG register) then just do:
altsb P, #0
setbyte value
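The read direction should mirror this with ALTGB and GETBYTE, going by the instruction descriptions quoted earlier in the thread. A minimal sketch with a byte index, assuming hypothetical COG registers idx, value and buf:

```spin2
        ' fetch byte number idx from a byte-packed buffer in COG RAM
        altgb   idx, #buf           ' next S = buf + idx[10:2], N = idx[1:0]
        getbyte value               ' value := selected byte, upper 24 bits zero

idx     long    0                   ' byte index, 0..15 for this buffer size
value   res     1
buf     res     4                   ' 4 longs = 16 bytes (hypothetical size)
```

If the assembler rejects the one-operand alias, the long form getbyte value, 0, #0 should assemble to the same encoding.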
Now I am thinking about this
(1) If you're trying to set byte N of register X, do
I do have an index from 0 to buffer-size for each buffer (currently in longs) and would like to access the LUT byte-wise. I have very little code space left, although I already reuse the init code space for variables.
You still need the AND in there to limit the range to one longword:
rdlut rx1_lut_value, rx1_address
and rx1_byte_index,#3
xor rx1_byte_index,#3
...
Otherwise the ALTSB will cause SETBYTE to write to whatever register the index would point to. #rx1_lut_value is just the base address that rx1_byte_index is added to, in byte scale addressing. Hence why this can handle the complete buffer in cogRAM.
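Put differently, with the buffer in cogRAM the whole receive store could shrink to something like the following. This is only a sketch: buf, head, btop and rx_char are hypothetical names, btop is assumed to hold the buffer size in bytes minus one, and the bytes end up in natural order (byte 0 first) rather than reversed.

```spin2
        altsb   head, #buf          ' D = buf + head[10:2], N = head[1:0]
        setbyte 0, rx_char, #0      ' long form; ALTSB substitutes D and N
        incmod  head, btop          ' advance, wrapping to 0 after btop

head    long    0                   ' byte index of the next free slot
btop    long    63                  ' 64-byte buffer (hypothetical size)
rx_char res     1
buf     res     16                  ' 16 longs = 64 bytes
```

No AND/XOR masking is needed here because the index is allowed to run over the full buffer; it only has to stay within 0..btop.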
ah, ok, language barrier, my first language is German, the second choice was Latin for reasons my parents never really explained to me, and English is my third one.
This is something I got used to on the P1. You usually have some init routines that run just once before your PASM program goes into its main loop. So I regularly use this code space as space for variables normally declared as RES, which do not need any init value.
But on the P2 this is really fun, because you can read a parameter with one instruction using PTRA++ into the same register that holds the instruction.
Something like this:
DAT
org 0
'
cogserial_init
'
'-----------------------------------------------------------------------
'
'loading parameters and reusing init code space for variables
'
'-----------------------------------------------------------------------
rx1_mailbox_ptr rdlong rx1_mailbox_ptr,ptra++ 'pointer to mailbox in hub and rx1 mailbox hub address
rx1_pin rdlong rx1_pin, ptra++ 'serial1 rxpin1
rx1_bitperiod rdlong rx1_bitperiod, ptra++ 'bitperiod := 7 + ((CLKFREQ / baudrate) << 16)
rx1_mode rdlong rx1_mode, ptra++ 'configure rx1_pin for asynchronous receive, always input
rx1_lut_buff rdlong rx1_lut_buff, ptra++ 'lut rx1 receive buffer address in lut
rx1_lut_btop rdlong rx1_lut_btop, ptra++ 'lut rx1 receive buffer top address in lut (size for rx1)
tx1_pin rdlong tx1_pin, ptra++ 'serial1 txpin
tx1_bitperiod rdlong tx1_bitperiod, ptra++ 'bitperiod := 7 + ((CLKFREQ / baudrate) << 16)
tx1_mode rdlong tx1_mode, ptra++ 'configure tx1_pin for asynchronous transmit, always output
tx1_lut_buff rdlong tx1_lut_buff, ptra++ 'lut tx1 send buffer address in lut
tx1_lut_btop rdlong tx1_lut_btop, ptra++ 'lut tx1 send buffer top address in lut (size for tx1)
rx2_pin rdlong rx2_pin, ptra++ 'serial1 rx2pin
rx2_bitperiod rdlong rx2_bitperiod, ptra++ 'bitperiod := 7 + ((CLKFREQ / baudrate) << 16)
rx2_mode rdlong rx2_mode, ptra++ 'configure rx2_pin for asynchronous receive, always input
rx2_lut_buff rdlong rx2_lut_buff, ptra++ 'lut rx2 receive buffer address in lut
rx2_lut_btop rdlong rx2_lut_btop, ptra++ 'lut rx2 receive buffer top address in lut (size for rx2)
tx2_pin rdlong tx2_pin, ptra++ 'serial1 tx2pin
tx2_bitperiod rdlong tx2_bitperiod, ptra++ 'bitperiod := 7 + ((CLKFREQ / baudrate) << 16)
tx2_mode rdlong tx2_mode, ptra++ 'configure tx2_pin for asynchronous transmit, always output
tx2_lut_buff rdlong tx2_lut_buff, ptra++ 'lut tx2 send buffer address in lut
tx2_lut_btop rdlong tx2_lut_btop, ptra++ 'lut tx2 send buffer top address in lut (size for tx2)
tx_ct1wait rdlong tx_ct1wait, ptra++ 'sysclocks to wait between calls to tx interrupt
'-----------------------------------------------------------------------
You need the longs for your parameters anyway, and now you reuse the code space that loads them to store them as well. Code density, yeah, that fits it. Well, I learn a bit more every day about this confusing, illogical language called English...
So I am using the parameters of the PRI method startpasm as a contiguous block of 21 longs, given by the address of the first parameter, to provide the start parameter block to my PASM driver.
That's a good question, and it's actually important. Yes, that should work in fastspin: I had to add that quite a while ago in order to support a lot of existing Spin code. If you take the address of a parameter fastspin makes sure all of the values are in HUB memory and can be passed to another COG.
I really suggest not doing it that way in new code, though, because it kills performance. Normally fastspin likes to keep local variables in COG memory rather than HUB -- obviously this is much more efficient. But if you use @ on any local variable or parameter, we can't do that. If there's an @ on a variable it has to go in HUB so it can be read by another COG. And if any variable goes in HUB they all have to go in HUB, because existing Spin code often makes assumptions about how things are laid out in memory and does some weird tricks with that.
fastspin has to be very conservative about anything in HUB memory, because it could be changed by other COGs at any time. So the optimizer pretty much can't do anything with the variables if they end up in HUB; no common sub-expression elimination, no combination of instructions, etc.
So for new code, I suggest just explicitly setting an array in HUB (something in the VAR section) with the values you want to pass to the other COG. That way you can control which values are put in HUB, and let the compiler put anything else it can into local COG memory.
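A minimal sketch of that suggestion, cut down to four values to keep it short. The method name startsmall and the array startparams are made up; serpasm.start is the method from the object posted above, which expects the mailbox address in the first long of the block.

```spin2
VAR
  long startparams[4]                  ' explicit parameter block in HUB

PRI startsmall(mailbox, rxpin, bitrate, rxmode)
  startparams[0] := mailbox            ' first long: mailbox address
  startparams[1] := rxpin
  startparams[2] := bitrate
  startparams[3] := rxmode
  RETURN serpasm.start(@startparams)   ' no @ on a parameter or local needed
```

The parameters themselves can then stay in COG registers; the cost is that the array in VAR stays allocated after startup.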
@msrobots: is your altsb code in a DAT block? If so then it should work. I found another setbyte bug in inline assembly, but it only affects inline assembly (and I think you'd get an error that says "setbyte is not supported in inline assembly" if you tried it).
I'll have a new fastspin very soon that definitely works (I have a sample program that tests setbyte, and it did change the right byte).
Oh, yes, the multitude of borrowed and reused words.
Your english seems perfect, I wouldn't have guessed it wasn't your first language.
Yeah, I have been stateside for about 12 years and I am trying to better my English skills by listening to stand-up comedians, mostly the older ones one can find on YouTube, like George Carlin, WC Fields, Richard Pryor, and even Ronald R. at Correspondents' Dinners.
It helps to better your skills and is fun too.
- I am currently working on a joke about cocaine. - I just need two more lines...
No that won't work because rx1_address is in lutRAM and SETBYTE can't access lutRAM. EDIT: rx1_address will be treated as a cogRAM address and overwrite something unexpected.
Oh no, don't panic here, the altsb is currently working fine when using the longer form of setbyte as you explained. That works.
I just tried it with one more level of indirection and that seems not to do what I was thinking.
As for my init code in PRI startpasm(xxxxx), my thinking was that this saves code space compared to doing it like this,
and I just need those 21 longs while the COG is starting, never again.
I need my mailbox of 8 longs in DAT/HUB, but the start parameters are never needed later; they are just needed to start the PASM COG.
So what would you advise as the smallest way to achieve that? It does not need to be fast; it just happens once at the start of the COG.
still refactoring,
On the P1 it is FIT 496; how far can I go on the P2? I seem to corrupt res variables (yes, I still need some) somehow when adding code, thus my goal to shorten things.
Oh, I forgot: yes, this is a complete PASM COG, with just a small stub to start it from Spin,
and that exactly seems to have been the problem. I am now down to 487 longs and both pairs of RX/TX are working with byte addressed LUT buffers.
I post the current code before I destroy it again. I still need to shorten it; I really want some sort of string input supported by PASM in the driver.
But hell yeah, it passes all the tests. It is just slower than envisioned because I still read and write bytes to the HUB, but all 4 lines are working. I am getting somewhere...
I attached all needed files, testserial.spin2 is the main program.
It's a big hammer, but if used with care, it can index an array of bytes, so you do not need separate word and byte pointers
see here how it primes both the byte and long pointer sections, but it is up to the user to keep the range of the long pointer safe.
(yes, this does mean your arrays need to be in COG not LUT, but overall code can be smaller and faster)
106 . Register Indirection EEEE 1001011 00I DDDDDDDDD SSSSSSSSS ALTSB D,{#}S Alter subsequent SETBYTE instruction. Next D field = (D[10:2] + S) & $1FF, N field = D[1:0]. D += sign-extended S[17:9]. 2 same 2 same D
107 alias Register Indirection EEEE 1001011 001 DDDDDDDDD 000000000 ALTSB D Alter subsequent SETBYTE instruction. Next D field = D[10:2], N field = D[1:0]. 2 same 2 same D
108 . Register Indirection EEEE 1001011 01I DDDDDDDDD SSSSSSSSS ALTGB D,{#}S Alter subsequent GETBYTE/ROLBYTE instruction. Next S field = (D[10:2] + S) & $1FF, N field = D[1:0]. D += sign-extended S[17:9]. 2 same 2 same D
109 alias Register Indirection EEEE 1001011 011 DDDDDDDDD 000000000 ALTGB D Alter subsequent GETBYTE/ROLBYTE instruction. Next S field = D[10:2], N field = D[1:0]. 2 same 2 same D
another comment :
To me, this seems risky
wrlut rx1_lut_value, rx1_address 'write byte to circular buffer in lut
incmod rx1_head, rx1_lut_btop 'increment buffer head
cmp rx1_head, rx1_tail wz 'hitting tail is bad
if_z incmod rx1_tail, rx1_lut_btop 'increment tail - I am losing received chars at the end of the buffer because the buffer is full
reti1 'exit
Here you change the WR and RD pointers, but with multiple interrupts/non ints in the mix, that gives multiple places that can change any pointer.
I prefer to clamp the WRITE pointer only, as safer.
In P2 I think that means this minor change
wrlut rx1_lut_value, rx1_address 'write byte to circular buffer in lut
incmod rx1_head, rx1_lut_btop 'increment WR buffer head
cmp rx1_head, rx1_tail wz 'hitting RD pointer is bad
if_nz reti1 'faster return
if_z decmod rx1_head, rx1_lut_btop 'rare safety case, undo overflow, next WR char will overwrite this one, RD ptr unaffected.
' Can flag over-run here too if needed.
reti1 'exit
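If an over-run flag is wanted, a variant without the early return could look like this. rx1_overrun is a hypothetical register that the main loop would have to poll and clear; everything else uses the names from the code above.

```spin2
        wrlut   rx1_lut_value, rx1_address  'write byte to circular buffer in lut
        incmod  rx1_head, rx1_lut_btop      'increment WR buffer head
        cmp     rx1_head, rx1_tail wz       'Z set if head caught up with tail
if_z    decmod  rx1_head, rx1_lut_btop      'undo the increment, RD ptr unaffected
if_z    mov     rx1_overrun, #1             'remember that a char was dropped
        reti1                               'exit
```

Neither DECMOD nor MOV touches Z without a WZ effect, so both conditionals see the result of the CMP. The normal path costs two skipped instructions instead of one early RETI1.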
Well @jmg, that is a good point, and I also thought about how to handle this.
On the TX side I can just throttle the input when the buffer is full.
On the RX side I can't, so I WILL lose bytes when the buffer is full.
Your version will lose bytes at the front of the buffer; my version loses bytes at the end of the buffer.
I think mine is better, because you might miss something if you do not read fast enough, but you can catch up reading and still have a valid stream of data.
Losing bytes at the front of the buffer is more fatal, because you would have an inconsistent stream with data missing somewhere in between.
Losing bytes at the end of the buffer is better, or - hmm - would also happen if you did not have a buffer. But the data stream is not corrupted.
That is my reasoning for why I just advance the tail pointer when the write pointer hits it at RX. I throw away the last entry to keep all buffer entries in sync.
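The two overflow policies discussed here can be compared with a small Python model. This is only a sketch: the class and policy names are invented, the buffer holds one entry per slot, and it ignores the concurrency concern (several places touching the same pointer) that motivated the suggestion to clamp only the write pointer. The incmod/decmod helpers follow the wrap-around behaviour the code above relies on.

```python
def incmod(value, top):
    """Increment, wrapping to 0 after reaching top."""
    return 0 if value == top else value + 1

def decmod(value, top):
    """Decrement, wrapping to top after reaching 0."""
    return top if value == 0 else value - 1

class Ring:
    def __init__(self, size, policy):
        # policy is "advance_tail" (push the read pointer on overflow)
        # or "undo_head" (take back the write pointer on overflow)
        self.buf = [None] * size
        self.top = size - 1
        self.head = 0
        self.tail = 0
        self.policy = policy
        self.overruns = 0

    def put(self, value):
        self.buf[self.head] = value
        self.head = incmod(self.head, self.top)
        if self.head == self.tail:            # write pointer hit read pointer
            self.overruns += 1
            if self.policy == "advance_tail":
                self.tail = incmod(self.tail, self.top)
            else:
                self.head = decmod(self.head, self.top)

    def drain(self):
        out = []
        while self.tail != self.head:
            out.append(self.buf[self.tail])
            self.tail = incmod(self.tail, self.top)
        return out

def overflow_then_read(policy):
    """Write 6 values into a 4-slot ring without reading, drain, then write one more."""
    ring = Ring(4, policy)
    for value in range(1, 7):
        ring.put(value)
    first = ring.drain()
    ring.put(7)
    return first + ring.drain()
```

With these assumptions, advancing the tail keeps the most recent run of values contiguous, while undoing the head keeps the oldest values and leaves a gap before whatever arrives after the reader catches up.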
And it is good to find a bug; that is why we are testing all of this.
Sadly I somehow messed up my program and just one of the two pairs of RX and TX wants to work. I am staring at my code to no avail. It was working before; I just broke it shaving longs away to make space for string input.
This is extremely frustrating and I am at the point of starting all over again. The sad thing is that right now I am even unsure whether I broke the PASM driver, the Spin object or my test routines.
The only thing I am sure about is that it was working before, is not working now, and stupid me broke it without keeping copies of the working code.
Except what I posted here about a week ago...
Mike
Currently I am not even sure about the endian reversal. On the P1 a long in HUB ram has a different byte order than in COG ram; not sure about the P2.
My plan was to write bytes into my LUT buffers in the RX interrupt and send bytes from my LUT buffers in the TX interrupt, but when transferring LUT to HUB and HUB to LUT to be able to move longs, after first getting aligned to a complete LUT long by writing/reading single bytes. Maybe even use the streamer. Or rdfast or whatever.
I never got that far because of running out of space and then breaking it.
My thinking was that when using getbyte/setbyte I am able to reverse the order, and as soon as I write/read the first long I will find out whether the reversal is needed or not. And if not, shave off the reversal and save two more longs.
I am very sure that IF I find the stupid mistake I made, it will be the classic DOH moment.
Enjoy!
Mike
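One way to reason about whether the reversal is needed is to model the packing in Python. The sketch below assumes that byte N in setbyte/getbyte means bits [8N+7:8N] of the long, and that hub memory stores a long with its least-significant byte at the lowest address (little-endian). Both are assumptions to confirm on real hardware, and the helper names are invented here.

```python
MASK32 = 0xFFFFFFFF

def setbyte(long_value, byte_value, n):
    """Return long_value with byte n (0 = least significant) replaced."""
    shift = 8 * n
    return (long_value & ~(0xFF << shift) & MASK32) | ((byte_value & 0xFF) << shift)

def getbyte(long_value, n):
    """Return byte n (0 = least significant) of long_value."""
    return (long_value >> (8 * n)) & 0xFF

# Pack four received characters into one long, in arrival order.
packed = 0
for n, ch in enumerate(b"ABCD"):
    packed = setbyte(packed, ch, n)

# What the four hub bytes would look like if the long is stored little-endian.
hub_image = packed.to_bytes(4, "little")
```

Under those assumptions the hub image reads back in arrival order, so filling byte lanes 0, 1, 2, 3 in sequence would need no reversal.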
Yes on all points, but undos are broken in Spin2Gui, at least in my current version under Windows, and since at work I am bound to use Subversion and not allowed to use Git because of company policy, I never bothered to use Git. My private stuff is also not allowed on the company servers, and gosh, who bothers about source control for files with just 500 lines of code?
I am a COBOL programmer; source files are supposed to be HUGE. With 500 lines you do not even get a DB access running in COBOL.
Hmm - I cannot find any other excuse for not using source control, maybe I should look into Git.
Enjoy!
Mike
Well yes @AJL, I think @evanh mentioned that, but I am not sure why this would help. Maybe you can elaborate. My point of view here is that I have 512 longs of LUT ram that would nicely fit four 512-byte buffers for RX1/TX1/RX2/TX2.
Cog ram is not 512 longs in my understanding, because of the special registers at the end of COG ram, or is that different on the P2 vs the P1? I asked that question before and have found no answer yet.
Since I am considering rewriting this completely if I can't find the stupid mistake I made, I am really interested in why two people now recommend using LUT ram for code and COG ram as LUT/buffer.
I am sometimes quite slow to understand things, so please bear with me and explain further. I seem to be missing some point of the argument for why I should try this.
Sure, I can copy my code from COG to LUT and run it there, and reuse the COG space as buffer, but why should I?
I currently reuse all initialization code space for register variables. To speed things up I precalculate pointers and have them ready to use. That is about 150 registers, ready to use because they are in COG ram.
If I have the code in LUT and my buffers in COG, how do I handle those variables I need for rdbyte/rdlong, buffer positions/sizes, whatever?
Keeping them in COG ram would reduce the available buffer size; having them in LUT ram and accessing them with rdlut/wrlut seems impossible to me.
confused,
Mike
I have a stupid question.
In Spin on the P1 I can do something like this, and it seems to work with fastspin too, but maybe not, and that is my problem.
So I am using the parameters of the PRI method startpasm as a contiguous block of 21 longs, given by the address of the first parameter, to provide the start parameter block to my PASM driver.
The startup code reads in 21 longs, then sets a sync value so that startpasm returns after the COG has started and has read its parameters.
The first parameter of the start parameter block contains the address of the mailbox to use, thus the long[long[xx]].
Can I assume that the parameters of my function
PRI startpasm(mailboxaddress, rxpin1, rxbitrate1, rxmode1, rxlutstart1, rxlutsize1, txpin1, txbitrate1, txmode1, txlutstart1, txlutsize1, rxpin2, rxbitrate2, rxmode2, rxlutstart2, rxlutsize2, txpin2, txbitrate2, txmode2, txlutstart2, txlutsize2, txclocks)
RETURN serpasm.start(@mailboxaddress)
are 21 contiguous longs in memory, addressable as @mailboxaddress like they would be on a P1, or is it possible that those parameters are not in a contiguous block in memory?
pulling at straws here...
Mike
I think the idea behind doing that, is these opcodes can work accessing into COG, and not LUT
ie you can create a byte pointer.
But I can not keep variables in LUT space, how to handle that?
and with a working setbyte /altsb I seem to be able to byte-address my LUT buffer with two instructions too. What am I missing?
Mike
You may be missing that the opcode fields are only 9 bits, and so have a 512L reach : that covers all of COG, but does not reach into LUT
If all opcodes could reach COG and LUT equally, there would be no need to call it LUT
Variables stay where opcodes best access them, in COG and code can go where it only needs to execute, in LUT.
Some Assemblers allow the idea of CSEG and DSEG and they place code and data where they best fit.
Maybe P2 needs something similar, so you can just write, and then the assembler splits as needed, checking for anything that may be illegal.
Other approaches to this would be to code your buffers as byte only, and not pack them. Yes, that's wasteful, but it is also fast and simple - and allows easier testing.
Later, you can tune the buffers to be less wasteful and store more bytes, if you find you need to.
Faster interrupts do not need the buffers to be as large.
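The reach argument can be pictured with a toy Python model: two separate 512-long memories, where a register field is 9 bits wide and therefore always selects a cog register, and LUT is only reachable through explicit read/write calls. Everything here (class name, scratch register $1EF, helper name) is invented for illustration, and the read-modify-write helper is just one conceivable way to get byte writes into LUT, not the driver's actual code.

```python
MASK32 = 0xFFFFFFFF

class CogModel:
    def __init__(self):
        self.cog = [0] * 512   # reachable by 9-bit D/S fields
        self.lut = [0] * 512   # reachable only via rdlut/wrlut

    def setbyte(self, d_field, value, n):
        """Byte write through a 9-bit register field: always lands in cog."""
        reg = d_field & 0x1FF
        shift = 8 * n
        self.cog[reg] = (self.cog[reg] & ~(0xFF << shift) & MASK32) | ((value & 0xFF) << shift)

    def rdlut(self, address):
        return self.lut[address & 0x1FF]

    def wrlut(self, value, address):
        self.lut[address & 0x1FF] = value & MASK32

def lut_put_byte(model, byte_index, value, scratch=0x1EF):
    """Byte write into LUT done as read, modify in a cog scratch register, write back."""
    long_index, lane = byte_index >> 2, byte_index & 3
    model.cog[scratch] = model.rdlut(long_index)
    model.setbyte(scratch, value, lane)
    model.wrlut(model.cog[scratch], long_index)

m = CogModel()
m.setbyte(5, 0xAA, 1)        # a "LUT index" used as a register field hits cog[5]
lut_put_byte(m, 22, 0x55)    # byte 22 of the LUT buffer = long 5, lane 2
```

In this model a LUT index passed where a register field is expected silently changes a cog register instead, which is the failure mode to watch for when mixing the two address spaces.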
Since I do need COG ram for my variables, my buffers need to be in the LUT or the buffers have to be significantly smaller.
And in my first (working) example I do not pack the bytes in the LUT and waste 3 of 4 bytes in a long.
And yes, you are right, the interrupts fill the buffer quite fast; the best working version I had worked up to sysclock baud on RX and TX.
In my current approach RX1 uses int1, RX2 uses int2, and I have int3 just running every x sysclocks, checking whether it can transfer on TX1 or TX2.
The main COG just checks the mailboxes and transfers data from buffer to HUB and vice versa.
That worked up to sysclock baud on all four lines talking to themselves on smartpins using one COG.
Getting the bytes into the HUB or from the HUB was and is the bottleneck. When I save my bytes in longs in LUT I can only read a long and write one byte into HUB.
When, on the other hand, I can pack 4 bytes in the right order into my LUT buffer long, I just need to transfer single bytes to/from HUB until I am at the start of a new LUT long, and then I can start moving longs to/from HUB until I need to step down to bytes on the last long. I think I might even be able to use rdfast/wrfast when transferring big enough blocks of data.
This is about optimizing the speed when transferring blocks of data, instead of single bytes.
For a simple ser.TX(65) it does not really make a difference, but for ser.str() or ser.writeblock(…) it will make a huge difference whether writing, say, 30 bytes takes 30 rdbytes/wrbytes to HUB or at most 6 rdbytes/wrbytes plus 6 or 7 rdlongs/wrlongs.
But to do so I need 4 bytes, not one, in each LUT long, and then I started packing bytes and broke everything. Well, not everything: RX1/TX1 are working, RX2/TX2 are broken.
Mike
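The byte/long/byte split described in this post can be sketched as a small planning function in Python. The names are hypothetical, alignment is taken relative to the LUT long boundary, and the sketch assumes the hub side accepts a long at any byte address, which is worth checking before relying on it.

```python
def plan_transfer(lut_byte_offset, count):
    """Split a block into (leading bytes, whole longs, trailing bytes).

    lut_byte_offset is the byte position of the first byte inside the
    packed LUT buffer; count is the number of bytes to move.
    """
    lead = min((-lut_byte_offset) % 4, count)   # bytes until the next LUT long boundary
    longs = (count - lead) // 4                 # whole longs that can move in one go
    tail = count - lead - 4 * longs             # leftover bytes in the last long
    return lead, longs, tail

def hub_ops(lut_byte_offset, count):
    """Total number of hub accesses for the plan."""
    return sum(plan_transfer(lut_byte_offset, count))
```

For a 30-byte block this gives 9 hub accesses when the block starts on a LUT long boundary and 12 when it starts one byte past a boundary, against 30 when every byte moves on its own.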
Huh? No, I am not just thinking about quadrupling the size but about faster access between HUB and LUT through reading and writing longs instead of bytes. But I still need way more code space than I have to do so.
Refactoring and refactoring, shaving off longs and hoping to stumble upon why RX2 misbehaves, by using the same code for both channels.
I can at least report some success, thanks to all of you.
This does work and replaces my 8 lines of code with 3.
Now I need to do the same with getbyte to shave off some more longs, and my first attempt there also failed; will report soon,
Mike
Now I am thinking about this
(1) If you're trying to set byte N of register X, do
that is what I currently do, corrected to the names and needed changed syntax
But is this possible instead? I need to try...
(2) If P is a pointer with a COG address in it, and you want to modify byte N of the register P points to, then do:
I love PASM, somehow,
Mike
That's why I say you've increased density.
You still need the AND in there to limit the range to one longword: Otherwise the ALTSB will cause SETBYTE to write to whatever register the index would point to. #rx1_lut_value is just the base address that rx1_byte_index is added to, in byte scale addressing. Hence why this can handle the complete buffer in cogRAM.
This is something I got used to on the P1. You usually have some init routines that run just once before your PASM program goes into its main loop. So I regularly use this code space as space for variables normally declared as RES, so not needing any init value.
But on the P2 this is really fun, because you can read a parameter with one instruction using PTRA++ into the same space where the instruction is.
something like this
You need the longs for your parameters anyway, and now you reuse the code space that loads them to store them there too. Code density, yeah, that fits it. Well, I learn a bit more every day about this confusing illogical language called English...
Enjoy!
Mike
Your English seems perfect, I wouldn't have guessed it wasn't your first language.
Sadly this does not replace the other; shouldn't it, according to @ersmith's description?
Mike
I really suggest not doing it that way in new code, though, because it kills performance. Normally fastspin likes to keep local variables in COG memory rather than HUB -- obviously this is much more efficient. But if you use @ on any local variable or parameter, we can't do that. If there's an @ on a variable it has to go in HUB so it can be read by another COG. And if any variable goes in HUB they all have to go in HUB, because existing Spin code often makes assumptions about how things are laid out in memory and does some weird tricks with that.
fastspin has to be very conservative about anything in HUB memory, because it could be changed by other COGs at any time. So the optimizer pretty much can't do anything with the variables if they end up in HUB; no common sub-expression elimination, no combination of instructions, etc.
So for new code, I suggest just explicitly setting an array in HUB (something in the VAR section) with the values you want to pass to the other COG. That way you can control which values are put in HUB, and let the compiler put anything else it can into local COG memory.
Eric
I'll have a new fastspin very soon that definitely works (I have a sample program that tests setbyte, and it did change the right byte).
Yeah, I have been stateside for about 12 years and I am trying to better my English skills by listening to stand-up comedians, mostly the older ones one can find on YouTube, like George Carlin, WC Fields, Richard Pryor, and even Ronald R. at Correspondents' Dinners.
It helps to better your skills and is fun too.
- I am currently working on a joke about cocaine. - I just need two more lines...
Enjoy!
Mike
EDIT: Oh, I see you've got that earlier in the source code.
No that won't work because rx1_address is in lutRAM and SETBYTE can't access lutRAM. EDIT: rx1_address will be treated as a cogRAM address and overwrite something unexpected.
Oh no, don't panic here, the altsb is working fine currently, using the longer form of setbyte as you explained. That works.
I just tried it with one more level of indirection and that seems not to do what I was thinking.
As for my init code for PRI startpasm(xxxxx), my thinking was that this saves code space compared to doing it like this
and I just need those 21 longs while the COG is starting, never again?
I need my mailbox of 8 longs in DAT/HUB, but the start parameters are never needed later, they are just needed to start the PASM COG.
So what would you advise as the smallest way to achieve that? It does not need to be fast, it just happens once at start of the COG.
still refactoring,
On the P1 it is FIT 496; how far can I go on the P2? I seem to corrupt res variables (yes, I still need some) somehow when adding code, thus my goal to shorten things up.
Enjoy!
Mike
O I forgot, yes this is a complete PASM COG, just a small stub to start it from SPIN
Mike
And that exactly seems to have been the problem. I am now down to 487 longs and both pairs of RX/TX are working with byte-addressed LUT buffers.
I am posting the current code before I destroy it again. I still need to shorten it; I really want some sort of string input supported by PASM in the driver.
But hell yeah, it passes all the tests. Just slower than envisioned because I still read and write bytes to the HUB, but all 4 lines are working, I am getting somewhere...
I attached all needed files, testserial.spin2 is the main program.
Enjoy!
Mike
oh yes it did overwrite something unexpected.
Mike