I was thinking the same thing... It's essentially the same as UTF-8.
No, it isn't -- in UTF-8 the first byte (if it's not 0-127) carries what's essentially a count of the remaining bytes in its upper bits, and the remaining bytes store 6 bits each (their top two bits must be 1 and then 0). UTF-8 was pretty specifically designed so that ASCII bytes (0-127) do not appear at all in the representation of values above 127, and also so that finding character boundaries is very simple (characters start with a byte whose top bits are either 0 or 11, never 10, and such bytes never appear other than as the first byte of a character). The RFVAR representation is much simpler (7 bits per byte, upper bit is 0 when finished).
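For anyone who wants the UTF-8 side spelled out, here is a minimal C sketch of the boundary rules described above. The function names are made up for illustration and it does not validate overlong or out-of-range sequences.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: continuation bytes always have top bits 10. */
static int utf8_is_continuation(uint8_t b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: total sequence length implied by a lead byte.
   Returns 0 if the byte cannot start a character. */
static int utf8_sequence_length(uint8_t lead)
{
    if ((lead & 0x80) == 0x00) return 1;  /* 0xxxxxxx: plain ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                             /* 10xxxxxx or invalid lead */
}

/* Step backwards from any index to the start of the current character. */
static int utf8_char_start(const uint8_t *s, int i)
{
    assert(s != 0 && i >= 0);
    while (i > 0 && utf8_is_continuation(s[i]))
        i--;
    return i;
}
```

The point of the design is visible in utf8_char_start: you can land anywhere in a buffer and resynchronise without decoding from the beginning, which the RFVAR format does not attempt to offer.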
Except this is limited to 24 bits? Why not allow 32-bits like UTF-8?
I think it's actually limited to 28 bits (4 bytes). I'm not sure why the option for 5 bytes wasn't provided -- it would have been nice to be able to cover the whole range of 32 bit values, but I presume something internally made it hard to go above 4 bytes per read.
RFVAR/RFVARS limit is 29 bits, as 4th byte can use all eight bits. I think FIFO being long-wide probably prevented a 5th byte and there is always RFLONG if 29 bits aren't enough. At first sight perhaps not very useful, RFVAR and RFVARS are actually a handy pair of space-saving instructions that take two cycles whether 1, 2, 3 or 4 bytes are read.
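A rough software model of the RFVAR format as described here, in C. This is only a sketch: rfvar_decode is a made-up name, and it assumes the low-order 7-bit group comes first in the byte stream, which is not stated in this thread. It covers the zero-extended RFVAR case only, not RFVARS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder: returns the number of bytes consumed (1..4)
   and stores the decoded value in *out.
   Assumption: least significant 7-bit group is stored first. */
static size_t rfvar_decode(const uint8_t *p, uint32_t *out)
{
    uint32_t value = 0;
    size_t i;

    assert(p != NULL && out != NULL);

    /* Bytes 1..3: seven data bits each, bit 7 set means "more follows". */
    for (i = 0; i < 3; i++) {
        value |= (uint32_t)(p[i] & 0x7F) << (7 * i);
        if ((p[i] & 0x80) == 0) {
            *out = value;
            return i + 1;
        }
    }

    /* Byte 4: no continuation flag is needed, so all eight bits are data.
       3 * 7 + 8 = 29 bits in total. */
    value |= (uint32_t)p[3] << 21;
    *out = value;
    return 4;
}
```

With this layout the largest encodable value is 0x1FFFFFFF, which matches the 29-bit limit mentioned above.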
Is there any way to prevent the flexspin optimizer from removing inline PASM register assignments on an individual method basis? Or do I have to resort to removing all optimizations for the whole project with the "-O0" option ?
In my inline PASM code I'm writing to sequential local registers and then performing a setq burst transfer to write all these registers into HUB. I don't explicitly identify the consecutive register names after the first one written by the burst transfer. I'm finding the flexspin optimizer is then removing the instruction assignments to these registers because it believes the registers are no longer accessed so there is no need to perform the instructions, but this is bad as they are indirectly accessed in the setq wrlong transfer, and I get corrupted output. I've been trying to add some dummy register access instructions at the end of my loop that uses all these registers and sometimes it tricks the optimizer into believing the registers are used and so won't remove the code, but it's a little bit hit and miss. I'm thinking it's safer to just disable optimizations in these cases, but ideally do this on a per method basis. Is there a way?
Yeah I was the same and thought it would only work for other languages but SPIN is included as well now. I found it buried in the docs, seems to work okay.
Spin2-style ORG/END inline ASM is not optimized and always loads into cog RAM.
Flexspin-style ASM/ENDASM is treated just like compiler-generated code.
The 3rd inline ASM style (__asm volatile in C) isn't supported in Spin because uhhhhhhhhh.
Unrelatedly, do not assume that local variables defined consecutively actually end up in consecutive registers. Register renaming will assign them in order of use. Should work if you use an array though?
@Wuerfel_21 said:
Spin2-style ORG/END inline ASM is not optimized and always loads into cog RAM.
Flexspin-style ASM/ENDASM is treated just like compiler-generated code.
Ok
Unrelatedly, do not assume that local variables defined consecutively actually end up in consecutive registers. Register renaming will assign them in order of use. Should work if you use an array though?
Arggh, what is the syntax for the register array? Is it a local or put in the VAR block? I need to do setq burst transfers on it so it needs to be a local register accessible from inline ASM/ENDASM block.
@Wuerfel_21 said:
Unrelatedly, do not assume that local variables defined consecutively actually end up in consecutive registers. Register renaming will assign them in order of use.
Hmm, I've seen many examples of "official" code in the library from Parallax where the addresses of parameters on the stack and also local variables were assumed to be in the order of declaration. Since Spin does not support any composed data types such as structs in C this makes sense although it makes some optimisations difficult.
If the variables are forced onto the stack (as happens when you ask for the address of one), then yes, they are in order, but the whole function gets slow and bloated, so don't do that if writing for flexspin specifically.
Yeah I just hit the problem with the ordering of registers....I had to trick the optimizer again through referencing them explicitly in sequential order. You can't define a pix[8] array here instead, the inline PASM code doesn't deal with indexes, so I need explicit names. I don't think this would be even possible with FlexProp - too many args+local variables...?
PRI {++opt(!regs)} gfxBlend32(src1, stride1, src2, stride2, destaddr, stride3, w, h, blendval, mixval) | pix1, pix2, pix3, pix4, pix5, pix6, pix7, pix8, len, srcptr, destptr, src, dest
    w <#= LINEBUFSIZE ' stay inside scratch buffer
    if w == 0 or h == 0
        return
    src := @srcbuf
    dest := @textbuf
    asm
            mov pix1, #0 ' only put here to ensure optimizer assigns registers in the usable order
            mov pix2, #0
            mov pix3, #0
            mov pix4, #0
            mov pix5, #0
            mov pix6, #0
            mov pix7, #0
            mov pix8, #0
    endasm
    repeat h
        mem.read(@srcbuf, src1, w<<2)
        src1 += stride1
        mem.read(@textbuf, src2, w<<2)
        src2 += stride2
        asm
                mov len, w
                setpiv blendval
                setpix mixval
                mov srcptr, src
                mov destptr, dest
mixloop         setq #3
                rdlong pix1, srcptr
                add srcptr, #16
                setq #3
                rdlong pix5, destptr
                mixpix pix5, pix1
                mixpix pix6, pix2
                mixpix pix7, pix3
                mixpix pix8, pix4
                cmp len, #4 wc ' check if less than 4 pixels
    if_c        altd len, #$1ff
                setq #3
                wrlong pix5, destptr
                add destptr, #16
                sub len, #4 wcz
    if_nc_and_nz jmp #mixloop
endofline
        endasm
        'write back the rendered buffer data
        mem.write(@textbuf, destaddr, w<<2)
        destaddr += stride3
@rogloh : I think if you use ORG/END instead of ASM/ENDASM you won't run into so many optimizer issues (and your code will be compatible with PropTool/PNut).
@ersmith said:
@rogloh : I think if you use ORG/END instead of ASM/ENDASM you won't run into so many optimizer issues (and your code will be compatible with PropTool/PNut).
Yeah it can compile that way but I see in the listing that would drag in the fcached inline PASM code for every scan line, slowing down performance during rendering. It might have to be the way though for a PropTool variant if that's the way inline code works there and if this code ever wants to be executed on that platform.
UPDATE: Actually in this case, if the outer loop SPIN functions are reasonably slow, it may not be that much slower overall. I'll have to benchmark it at some point.
Hmm, Eric/Ada,
I've just re-run my speed tester C program that I had used heavily during the revamping of the SD driver. Problem is it is now getting consistent file open/close errors at the same place each run. Out of 24 files written and 20 files read there are exactly two file open-to-read fails and one file open-to-write fail. It's always the same ones that fail. PS: And I've reformatted two SD cards to verify it wasn't down to corrupt cards. PPS: And all three fails are within repeating loops of a file, i.e. they aren't unique file fails. The single open-to-read fail is the final repeat of a 200 KB file. The open-to-write fail (which has a subsequent open-to-read fail) is the first repeat of a 200 byte file.
I haven't used the speed tester in a few weeks. The June update of Flexspin I had also had the issue. I've already deleted that one so can't be sure of its exact date. At any rate, this issue wasn't present during any of the driver work back in May.
I've forgotten how to find commits to bisect. Some hints on using Git again would be helpful. Never mind, Uncle Google told me - git log
Okay, first bad commit is:
commit a69d1e401d53f9a747bcaae37c5876ba0dc55815
Author: Eric Smith <ersmith@totalspectrum.ca>
Date: Sat Jun 4 16:37:42 2022 -0300
re-factored SD card code to support multiple cards; this required changing C struct __using code as well to be more class-like
And I've attached my test program. The errors stand out.
@Wuerfel_21 said:
Speaking of, I just tried loading NeoYume over the host VFS and it doesn't work
CRLF issue perhaps? I do open files in binary mode, but idk if that translates across.
Any clues as to what's going wrong? Are the files not being found, or are they found but not read correctly? The binary mode should be carried across, but I don't test as much on Windows, so it's possible I missed something there.
Also, regarding the last couple int64 related commits - do note that long is not a 64bit type on Windows and it can and will blow up if you try to printf("%ld\n",(int64_t)something);. You need long long
Comments
No, it isn't -- in UTF-8 the first byte (if it's not 0-127) contains what's essentially a count of the remaining bytes in the upper bits of the first byte, and the remaining bytes store 6 bits per byte (upper bits must be 1 and then 0). UTF-8 was pretty specifically designed so that ASCII bytes (0-127) do not appear at all in the representation of values above 127, and also so that finding character boundaries is very simple (characters start with either 0x or 11, never 10, and bytes starting with 0x or 11 never appear as other than the first byte of a character). The RFVAR representation is much simpler (7 bits per byte, upper bit is 0 when finished).
I think it's actually limited to 28 bits (4 bytes). I'm not sure why the option for 5 bytes wasn't provided -- it would have been nice to be able to cover the whole range of 32 bit values, but I presume something internally made it hard to go above 4 bytes per read.
RFVAR/RFVARS limit is 29 bits, as 4th byte can use all eight bits. I think FIFO being long-wide probably prevented a 5th byte and there is always RFLONG if 29 bits aren't enough. At first sight perhaps not very useful, RFVAR and RFVARS are actually a handy pair of space-saving instructions that take two cycles whether 1, 2, 3 or 4 bytes are read.
@ersmith, @TonyB_ Thanks for explaining rfvar/rfvars. I wonder now if it'd be better to just copy UTF-8... But, I guess it's a done deal...
@ersmith,
but @@@data1 does what I think and puts in absolute addresses, doesn't it? I am quite annoyed with relative addressing,
mike
Yes, @@@data1 is always absolute; but that's only available in flexspin, I don't think pnut/proptool support it.
Is there any way to prevent the flexspin optimizer from removing inline PASM register assignments on an individual method basis? Or do I have to resort to removing all optimizations for the whole project with the "-O0" option?
In my inline PASM code I'm writing to sequential local registers and then performing a setq burst transfer to write all these registers into HUB. I don't explicitly identify the consecutive register names after the first one written by the burst transfer. I'm finding the flexspin optimizer is then removing the instruction assignments to these registers because it believes the registers are no longer accessed so there is no need to perform the instructions, but this is bad as they are indirectly accessed in the setq wrlong transfer, and I get corrupted output. I've been trying to add some dummy register access instructions at the end of my loop that uses all these registers and sometimes it tricks the optimizer into believing the registers are used and so won't remove the code, but it's a little bit hit and miss. I'm thinking it's safer to just disable optimizations in these cases, but ideally do this on a per method basis. Is there a way?
Here's an example of the problem...
Source is this:
and the bad code generated is this, which is missing several intermediate instructions:
It should be this (which is what happens when the optimizer is globally disabled):
Ok, found my answer I think. I might need to use this:
PUB {++opt(!regs)} methodName(args...)
My code seems to work now.
__asm volatile {
}
Oh, is that Spin? I thought Flexspin didn't optimise inline assembly there.
Yeah I was the same and thought it would only work for other languages but SPIN is included as well now. I found it buried in the docs, seems to work okay.
Spin2-style ORG/END inline ASM is not optimized and always loads into cog RAM.
Flexspin-style ASM/ENDASM is treated just like compiler-generated code.
The 3rd inline ASM style (__asm volatile in C) isn't supported in Spin because uhhhhhhhhh.
Unrelatedly, do not assume that local variables defined consecutively actually end up in consecutive registers. Register renaming will assign them in order of use. Should work if you use an array though?
Ok
Arggh, what is the syntax for the register array? Is it a local or put in the VAR block? I need to do setq burst transfers on it so it needs to be a local register accessible from inline ASM/ENDASM block.
I think just a regular local array is sufficient? May not actually work like that, idk.
Hmm, I've seen many examples of "official" code in the library from Parallax where the addresses of parameters on the stack and also local variables were assumed to be in the order of declaration. Since Spin does not support any composed data types such as structs in C this makes sense although it makes some optimisations difficult.
If the variables are forced onto the stack (as happens when you ask for the address of one), then yes, they are in order, but the whole function gets slow and bloated, so don't do that if writing for flexspin specifically.
Yeah I just hit the problem with the ordering of registers....I had to trick the optimizer again through referencing them explicitly in sequential order. You can't define a pix[8] array here instead, the inline PASM code doesn't deal with indexes, so I need explicit names. I don't think this would be even possible with FlexProp - too many args+local variables...?
@rogloh : I think if you use ORG/END instead of ASM/ENDASM you won't run into so many optimizer issues (and your code will be compatible with PropTool/PNut).
That makes sense now - I've always used ORG/END for Spin based inline assembly.
Yeah it can compile that way but I see in the listing that would drag in the fcached inline PASM code for every scan line, slowing down performance during rendering. It might have to be the way though for a PropTool variant if that's the way inline code works there and if this code ever wants to be executed on that platform.
UPDATE: Actually in this case, if the outer loop SPIN functions are reasonably slow, it may not be that much slower overall. I'll have to benchmark it at some point.
Hmm, Eric/Ada,
I've just re-run my speed tester C program that I had used heavily during the revamping of the SD driver. Problem is it is now getting consistent file open/close errors at the same place each run. Out of 24 files written and 20 files read there are exactly two file open-to-read fails and one file open-to-write fail. It's always the same ones that fail. PS: And I've reformatted two SD cards to verify it wasn't down to corrupt cards. PPS: And all three fails are within repeating loops of a file, i.e. they aren't unique file fails. The single open-to-read fail is the final repeat of a 200 KB file. The open-to-write fail (which has a subsequent open-to-read fail) is the first repeat of a 200 byte file.
I haven't used the speed tester in a few weeks. The June update of Flexspin I had also had the issue. I've already deleted that one so can't be sure of its exact date. At any rate, this issue wasn't present during any of the driver work back in May.
I've forgotten how to find commits to bisect. Some hints on using Git again would be helpful. Never mind, Uncle Google told me - git log
Okay, first bad commit is:
And I've attached my test program. The errors stand out.
What error code is it giving you? I think that commit increased heap memory usage, so try with increased heap size.
It's a null pointer. Searching "heap" in docs ...
enum {HEAPSIZE = 8192}; fixes it.
EDIT: 4400 bytes is enough.
The error code is in errno when a library call returns null. You can call strerror to get the printable string version.
Ah, interesting, errno is zero too. Definitely raising the heap size fixes it.
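Putting those two suggestions together, a minimal C sketch might look like the following. try_open is a made-up helper name, the HEAPSIZE line is simply the one quoted in this thread (on compilers other than flexspin it is just an unused constant), and as noted above errno may still read zero when the real cause is heap exhaustion.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Heap size setting as quoted in the thread; treat the value as a starting point. */
enum { HEAPSIZE = 8192 };

/* Hypothetical helper: try to open a file and describe the outcome in msg.
   Returns 0 on success, -1 on failure. */
static int try_open(const char *path, const char *mode, char *msg, size_t msglen)
{
    FILE *f;

    assert(path != NULL && mode != NULL && msg != NULL);

    errno = 0;                 /* clear stale values before the call */
    f = fopen(path, mode);
    if (f == NULL) {
        /* errno == 0 here would hint at a cause the library did not report */
        snprintf(msg, msglen, "open %s failed: errno=%d (%s)",
                 path, errno, strerror(errno));
        return -1;
    }
    fclose(f);
    snprintf(msg, msglen, "open %s ok", path);
    return 0;
}
```

Logging both the numeric errno and the strerror text makes a "null pointer with errno zero" case like this one stand out straight away.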
Speaking of, I just tried loading NeoYume over the host VFS and it doesn't work
CRLF issue perhaps? I do open files in binary mode, but idk if that translates across.
I checked in a change to bump the default heap size on P2; the new file system code does need a bit more RAM sometimes.
Any clues as to what's going wrong? Are the files not being found, or are they found but not read correctly? The binary mode should be carried across, but I don't test as much on Windows, so it's possible I missed something there.
It is reading fine (but of course really slow at 230500 baud), but the data seems to be bad (black screen)
Actually, I'm just an idiot and set a bad memory config. Ignore me.
Also, regarding the last couple int64 related commits - do note that long is not a 64bit type on Windows and it can and will blow up if you try to printf("%ld\n",(int64_t)something);. You need long long
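As a small illustration of that point, here are two ways to print an int64_t without depending on the width of long. The helper names are made up, and whether a given C library supports the PRId64 macro is an assumption to check for your toolchain.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: cast to long long and use %lld. */
static int format_i64_cast(char *buf, size_t len, int64_t v)
{
    assert(buf != NULL);
    return snprintf(buf, len, "%lld", (long long)v);
}

/* Hypothetical helper: let <inttypes.h> pick the right conversion. */
static int format_i64_macro(char *buf, size_t len, int64_t v)
{
    assert(buf != NULL);
    return snprintf(buf, len, "%" PRId64, v);
}
```

Either form avoids passing a 64-bit argument where printf expects a 32-bit long, which is what goes wrong with %ld on Windows.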