However, with -O1 all the calculations are skipped:
01f64 | ' PUB {++opt(1)} setWindow(xs, ys, xe, ye)
01f64 | _LCD_Graphics_Driver_setWindow
01f64 | ' ' Set Screen Window (column start, row start, column end, row end)
01f64 | ' curx:= xs
01f64 24 B4 04 F1 | add objptr, #36
01f68 5A F4 60 FC | wrlong arg01, objptr
01f6c | ' cury:= ys
01f6c 04 B4 04 F1 | add objptr, #4
01f70 5A F6 60 FC | wrlong arg02, objptr
01f74 28 B4 84 F1 | sub objptr, #40
01f78 | ' WaitMailbox()
01f78 B4 FE BF FD | call #_LCD_Graphics_Driver_WaitMailbox
01f7c 7A F8 80 F1 | sub arg03, arg01
01f80 01 F8 04 F1 | add arg03, #1
01f84 7B FA 80 F1 | sub arg04, arg02
01f88 01 FA 04 F1 | add arg04, #1
01f8c 7D F8 00 FD | qmul arg03, arg04
01f90 | ' mailbox2s:= (xs<<16) | ys ' xs, ys
01f90 10 F4 64 F0 | shl arg01, #16
01f94 7B F4 40 F5 | or arg01, arg02
01f98 08 B6 04 F1 | add ptr__LCD_Graphics_Driver_dat__, #8
01f9c 5B F4 60 FC | wrlong arg01, ptr__LCD_Graphics_Driver_dat__
01fa0 | ' mailbox2e:= (xe<<16) | ye ' xe, ye
01fa0 04 B6 04 F1 | add ptr__LCD_Graphics_Driver_dat__, #4
01fa4 5B F4 60 FC | wrlong arg01, ptr__LCD_Graphics_Driver_dat__
01fa8 | ' mailbox4 := (xe - xs +1)*(ye - ys +1) ' calc no. of pixels
01fa8 08 B6 04 F1 | add ptr__LCD_Graphics_Driver_dat__, #8
01fac | ' mailbox := $81_00_0000
01fac 18 FA 60 FD | getqx arg04
01fb0 5B FA 60 FC | wrlong arg04, ptr__LCD_Graphics_Driver_dat__
01fb4 14 B6 84 F1 | sub ptr__LCD_Graphics_Driver_dat__, #20
01fb8 00 80 C0 FF
01fbc 5B 00 68 FC | wrlong ##-2130706432, ptr__LCD_Graphics_Driver_dat__
01fc0 | _LCD_Graphics_Driver_setWindow_ret
01fc0 2D 00 64 FD | ret
I think FlexSpin detects that the mailbox2s/2e and mailbox4 variables are never used (by the Spin code) and therefore thinks the calculations can be optimized away. However, the PASM part of the code relies on the results! I hope I don't have to mark them as "volatile" or something.
The only calculation that's missing is the one for mailbox2e, which it apparently decides can be the same as mailbox2s. Try -O1,~cordic-reorder or -O1,~local-reuse, it's likely one of those having a normal one.
Yes, you're correct, the same value is stored into mailbox2e as into mailbox2s, which explains why it works for single-pixel units. But the calculation of mailbox4 (no. of pixels) is totally skipped.
@Wuerfel_21 said:
Try -O1,~cordic-reorder or -O1,~local-reuse, it's likely one of those having a normal one.
You mean as a temporary workaround? If the mailbox variables were local, I'd agree that the compiler could optimize away calculations whose results are never used. But writing to a DAT section should clearly signal "hey, this data might be used somewhere else, don't optimize it away!".
Anyway, replacing an expression mailbox2e:= (xe<<16) | ye with a different one mailbox2e:= (xs<<16) | ys is definitely a compiler bug and can't be explained with "too much optimization". It's simply wrong. Or am I missing something?
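For comparison, here is a host-side C model of what setWindow should leave in the mailbox longs, usable as a known-good reference when reading a listing. It is a sketch, not the driver: `lcd_mailbox_t`, `set_window_ref` and `CMD_SET_WINDOW` are hypothetical names, the struct does not reproduce the driver's DAT layout, and `volatile` stands in for the fact that the PASM cog reads these values, so every store has to happen.

```c
#include <stdint.h>

/* Hypothetical model of the driver's mailbox longs. The real driver keeps
   them in its DAT section; this struct does not reproduce that layout.
   volatile: another cog reads these values, so no store may be dropped. */
typedef struct {
    volatile uint32_t mailbox;    /* command word, written last */
    volatile uint32_t mailbox2s;  /* (xs << 16) | ys */
    volatile uint32_t mailbox2e;  /* (xe << 16) | ye */
    volatile uint32_t mailbox4;   /* number of pixels in the window */
} lcd_mailbox_t;

/* $81_00_0000 from the Spin source. The listing encodes the same 32 bits
   as the signed long ##-2130706432. */
#define CMD_SET_WINDOW 0x81000000u

/* Reference for what setWindow should store (the WaitMailbox() call of the
   Spin version is left out). */
static void set_window_ref(lcd_mailbox_t *mb,
                           uint32_t xs, uint32_t ys, uint32_t xe, uint32_t ye)
{
    mb->mailbox2s = (xs << 16) | ys;
    mb->mailbox2e = (xe << 16) | ye;             /* xe/ye, not xs/ys */
    mb->mailbox4  = (xe - xs + 1) * (ye - ys + 1);
    mb->mailbox   = CMD_SET_WINDOW;              /* same order as the Spin code */
}
```

For example, set_window_ref(&mb, 10, 20, 30, 40) should leave mailbox2s = 0x000A0014, mailbox2e = 0x001E0028 and mailbox4 = 441; code in which both stores use the same register value cannot produce that.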
Ok, next problem. I can't find anything in the docs about how DEBUG is supported. It seems to work in Spin, e.g. if I put something like debug ("paintIcon", udec(bg), udec(fg)) into the assembler part in the DAT section inside the Spin2 code and switch on "BRK debug (P2 only)" in the options menu of FlexProp I can see the debug output in the terminal window.
But if I try the same in the assembler part inside a C file, I get error messages like "unexpected '(' in line..." after the "debug" statement. I think PASM sections should be treated the same regardless of whether they are in a Spin or a C file, shouldn't they?
@ManAtWork said:
Yes, you're correct, the same value is stored into mailbox2e as into mailbox2s, which explains why it works for single-pixel units. But the calculation of mailbox4 (no. of pixels) is totally skipped.
Where do you see it skipped? It just moves it up due to the pipeline delay on the multiply (or, more accurately, stuff from above and below is pulled into the gap between QMUL/GETQX):
01f7c 7A F8 80 F1 | sub arg03, arg01
01f80 01 F8 04 F1 | add arg03, #1
01f84 7B FA 80 F1 | sub arg04, arg02
01f88 01 FA 04 F1 | add arg04, #1
01f8c 7D F8 00 FD | qmul arg03, arg04
[other stuff here]
01fac 18 FA 60 FD | getqx arg04
01fb0 5B FA 60 FC | wrlong arg04, ptr__LCD_Graphics_Driver_dat__
@Wuerfel_21 said:
Try -O1,~cordic-reorder or -O1,~local-reuse, it's likely one of those having a normal one.
You mean as a temporary workaround? If the mailbox variables were local, I'd agree that the compiler could optimize away calculations whose results are never used. But writing to a DAT section should clearly signal "hey, this data might be used somewhere else, don't optimize it away!".
To figure out which optflag is buggy. Alternatively, post a complete sample that we can try locally (doesn't have to actually do anything, just enough to get the buggy codegen to show up)
Anyway, replacing an expression mailbox2e:= (xe<<16) | ye with a different one mailbox2e:= (xs<<16) | ys is definitely a compiler bug and can't be explained with "too much optimization". It's simply wrong. Or am I missing something?
Yes, that is indeed wrong. Are you using the very latest version, btw? I think a bug kinda like that was fixed recently.
@Wuerfel_21 said:
Where do you see it skipped? It just moves it up due to the pipeline delay on the multiply (or, more accurately, stuff from above and below is pulled into the gap between QMUL/GETQX):
Uhh! I didn't look carefully enough. I just saw that a lot of code was missing between the wrlong to mailbox2e and the wrlong to mailbox4. I hadn't noticed that the code had moved upward.
@Wuerfel_21 said:
... Try -O1,~cordic-reorder or -O1,~local-reuse, it's likely one of those having a normal one.
To figure out which optflag is buggy. Alternatively, post a complete sample that we can try locally (doesn't have to actually do anything, just enough to get the buggy codegen to show up)
I've extracted a reduced version that can show the bug but doesn't output anything. It's not as short as it could be. I simply crippled the assembler code so that it doesn't output anything to the pins but instead signals completion through the mailbox immediately. Compile the demo and search the listing file for "PUB setWindow(".
Okay, -O1,~cordic-reorder does in fact bypass the issue. Hmm. Though I have no idea how a bad reorder would lead it to believe that the second expression can be removed and replaced with the result of the first. My money is on "other bug that only happens in the state that the reorder creates".
Edit: Or maybe not, both expressions pass through the same temporary, so if the second expression disappears, they are actually the same
Okay, so here's what's going on. Here's the code of interest, as compiled with -O1,~cordic-reorder,~local-reuse. Read it bottom to top (as the block-finding algorithm goes):
mov local02, arg01 ' \
shl local02, #16 ' |
or local02, arg02 ' | IT REORDERS THIS BLOCK WTF
add ptr__LCD_Opt_Driver_dat__, #8 ' |
wrlong local02, ptr__LCD_Opt_Driver_dat__ ' /
mov local02, arg03 ' \
shl local02, #16 ' / Can't reorder because dependency
or local02, arg04 ' Can't reorder due to arg4 use
add ptr__LCD_Opt_Driver_dat__, #4 ' Can reorder
wrlong local02, ptr__LCD_Opt_Driver_dat__ ' Can reorder
sub arg03, arg01 ' \
add arg03, #1 ' | QMUL dependencies, can't reorder
sub arg04, arg02 ' |
add arg04, #1 ' /
qmul arg03, arg04
getqx local03
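The rule being broken here can be sketched with a textbook dependency check for swapping two adjacent instructions. This is not FlexSpin's optimizer code; the resource bitmasks, `insn_t`, `can_swap` and `can_move_over` are hypothetical, and hub memory is modelled as a single resource (MEM), which is conservative. It shows that `or local02, arg02` from the first expression may not be moved past `mov local02, arg03` from the second, because both write local02.

```c
#include <stdbool.h>

/* Hypothetical resource bits for the registers in the excerpt above.
   R_MEM stands for hub memory, written by wrlong. */
enum {
    R_ARG01 = 1 << 0, R_ARG02 = 1 << 1, R_ARG03 = 1 << 2, R_ARG04 = 1 << 3,
    R_LOCAL02 = 1 << 4, R_PTR = 1 << 5, R_MEM = 1 << 6
};

typedef struct {
    const char *text;   /* for humans only */
    unsigned reads;     /* resources the instruction reads */
    unsigned writes;    /* resources the instruction writes */
} insn_t;

/* Two adjacent instructions may swap only if:
     a.writes & b.reads  == 0   (no read-after-write)
     a.writes & b.writes == 0   (no write-after-write: final value unchanged)
     a.reads  & b.writes == 0   (no write-after-read: b won't clobber a's input) */
static bool can_swap(const insn_t *a, const insn_t *b)
{
    return (a->writes & (b->reads | b->writes)) == 0
        && (a->reads & b->writes) == 0;
}

/* Can `moved` be pulled past every instruction in `over`? */
static bool can_move_over(const insn_t *moved, const insn_t *over, int n)
{
    for (int i = 0; i < n; i++)
        if (!can_swap(moved, &over[i]))
            return false;
    return true;
}

/* A few instructions from the excerpt, with their register effects. */
static const insn_t OR_FIRST   = {"or local02, arg02",   R_LOCAL02 | R_ARG02, R_LOCAL02};
static const insn_t MOV_SECOND = {"mov local02, arg03",  R_ARG03,             R_LOCAL02};
static const insn_t WRLONG_2E  = {"wrlong local02, ptr", R_LOCAL02 | R_PTR,   R_MEM};
static const insn_t SUB_ARG03  = {"sub arg03, arg01",    R_ARG03 | R_ARG01,   R_ARG03};
```

Moving the first `or` over the independent `sub arg03, arg01` is fine, but moving it over anything that writes local02 (or reads it) is not; missing one of these checks lets the first expression's value be overwritten by the second, or vice versa.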
@evanh said:
I've managed to save one case - full source code attached - of weird behaviour by making a backup of it at that time. It's a different problem I think but one that has cropped up in completely different code before ... In the DAT section below, if I swap the two lines io_delay long 1 and prblob long 0[4] with each other then I get a repeating-print crash. Otherwise it runs fine.
DAT
prblob long 0[4]
io_delay long 1
txdata byte 0[BLOCKSIZE]
rxdata byte 0[BLOCKSIZE]
PS: The crash occurs at the first block copy to/from the PSRAM, after it has reset and ID-queried each chip.
PPS: And a little debug shows it is during the transmit routine. So it's the first attempt to block write from the buffers. But not the first use of either io_delay or prblob.
I can't actually run the code right now (I'm away from my P2), but I see that io_delay is used in some calculations that use multiplies, so it's possible this bug may be the one that Ada just fixed where sometimes pulling calculations into CORDIC delay slots didn't check dependencies correctly.
@ManAtWork said:
Ok, next problem. I can't find anything in the docs about how DEBUG is supported. It seems to work in Spin, e.g. if I put something like debug ("paintIcon", udec(bg), udec(fg)) into the assembler part in the DAT section inside the Spin2 code and switch on "BRK debug (P2 only)" in the options menu of FlexProp I can see the debug output in the terminal window.
But if I try the same in the assembler part inside a C file, I get error messages like "unexpected '(' in line..." after the "debug" statement. I think PASM sections should be treated the same regardless of whether they are in a Spin or a C file, shouldn't they?
DEBUG() is a Spin2 feature, so it's not available (yet) in C.
As to how to treat PASM sections in other languages... that's a big headache. The definition of PASM is tied up with the definition of Spin2 in unfortunate ways. This makes things like expression evaluation very complicated: do we use the native language's order of evaluation, or the Spin2 one? I've tried to work around this to some degree in C by having different keywords (__asm and __pasm), but this is still not a complete solution.
Ok, I understand. So the __pasm{} sections are not parsed by the same parser as the DAT sections are for Spin2 code? Surprises me but I do not know all details of the implementation.
Is there a way to get around this limitation? I could declare the assembler code in a spin2 file, export the entry point address and use that in C code.
Easier debugging saves a lot of time. Especially cog memory is not accessible from the outside and you can't just put a printf there.
@ManAtWork said:
Ok, I understand. So the __pasm{} sections are not parsed by the same parser as the DAT sections are for Spin2 code? Surprises me but I do not know all details of the implementation.
Yes, unfortunately the whole C file is parsed by the C parser and so the __pasm syntax is hacked in. Adding DEBUG for it is on my todo list, but I don't know when it will happen.
Is there a way to get around this limitation? I could declare the assembler code in a spin2 file, export the entry point address and use that in C code.
Yes, that's exactly what I would do. You can have the assembler code in a file that's imported into C via struct using.
@ersmith said:
I can't actually run the code right now (I'm away from my P2), but I see that io_delay is used in some calculations that use multiplies, so it's possible this bug may be the one that Ada just fixed where sometimes pulling calculations into CORDIC delay slots didn't check dependencies correctly.
Pik's post reminded me to test this ... No change. Still doing the repeat print crashing with latest build of master. Version 5.9.15-beta-v5.9.14-9-g03841438 Compiled on: Aug 1 2022
@pik33 said:
I tried to compile and use this minimp3 decoder.
I had to get rid of the 64-bit integer constants in "1LL" format first, but this was no problem (replacing them with a float)
Replacing with a float will probably lose precision. As of 5.9.14 the 1LL format constants should be supported now. Did you find some that aren't?
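A quick host-side illustration of the precision loss: IEEE 754 single precision has a 24-bit significand, so not every integer above 2^24 survives a round trip through float. Powers of two such as 1LL<<32 are exact, but their neighbours are not. `via_float` is a hypothetical helper; this runs on a PC compiler and says nothing about FlexSpin's own float handling.

```c
#include <stdint.h>

/* Round-trip an integer through float (24-bit significand). */
static int64_t via_float(int64_t v)
{
    return (int64_t)(float)v;
}
```

via_float(1LL << 32) returns the value unchanged, while via_float((1LL << 24) + 1) comes back as 1LL << 24.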
Then I tried to add main() and call some functions from it. All my attempts to call mp3_decode() end with this:
"/home/pik33/Programy/flexprop/bin/flexspin" -2 -l --tabs=8 -D_BAUD=230400 -O1 --charset=utf8 -I "/home/pik33/Programy/flexprop/include" "/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3.c"
Propeller Spin/PASM Compiler 'FlexSpin' (c) 2011-2022 Total Spectrum Software Inc.
Version 5.9.13-beta-HEAD-v5.9.12-3-ge286147b Compiled on: Jun 21 2022
child killed: segmentation violation
Finished at Mon Aug 1 11:49:15 2022
This happens on both Linux and Windows (official 5.9.14).
Are you sure you posted the version that you were compiling? In both Linux and Windows I did not get any segmentation fault when compiling minimp3.c. I did get some errors about missing 64 bit arithmetic functions; I've fixed those and checked in the fix to github. Now it complains about a missing main(), which seems to be correct.
@ersmith said:
I can't actually run the code right now (I'm away from my P2), but I see that io_delay is used in some calculations that use multiplies, so it's possible this bug may be the one that Ada just fixed where sometimes pulling calculations into CORDIC delay slots didn't check dependencies correctly.
Pik's post reminded me to test this ... No change. Still doing the repeat print crashing with latest build of master. Version 5.9.15-beta-v5.9.14-9-g03841438 Compiled on: Aug 1 2022
Does it still happen with optimization turned off? How about in bytecode mode (-2nu)?
@ersmith said:
Does it still happen with optimization turned off? How about in bytecode mode (-2nu)?
It heavily relies on Fcache to place the inline Pasm in cogRAM, so I kind of need the optimiser. Or is there a way to specify where the Pasm is placed irrespective of optimiser setting?
As for -2nu, I get this error: psram16_batty_crashing.spin2:122: error: identifier getrnd not found
EDIT: Replaced the randfill() function with the newer pasm version and got that compiling with -2nu ... but no joy with the critical part, it crashes at the same place I think ... err, no, the repeat print crashing is much earlier, at the init_smartpins() function .. which is the first piece of inline pasm.
Now it complains about a missing main(), which seems to be correct.
That's why I added a simple main() to this. It compiles until the main tries to call mp3_decode().
The result is the same if I add this to BASIC as a class: I can call mp3_create(), but not mp3_decode().
The first try was the official 5.9.14 on Windows. Then I moved to Linux and used this 5.9.13. The last try was with a freshly compiled 5.9.15. All of these versions give the same result: call mp3_decode() and the compiler crashes with a segmentation fault.
I will recompile the compiler and try the newest version.
Recompiled.
Version 5.9.15-beta-HEAD-v5.9.14-11-g2f5736d1 Compiled on: Aug 2 2022
Now 1LL works, but calling mp3_decode() still generates a segfault. I will try to debug this by commenting out...
I commented out the body of static int mp3_decode_frame(), leaving only the header and the return statement: no segfault, but
/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:802: error: global initializers are evaluated at compile time and therefore must be constant
/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:802: error: Bad constant expression
/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:802: error: Bad constant expression
/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:803: error: global initializers are evaluated at compile time and therefore must be constant
/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:803: error: Bad constant expression
/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:803: error: Bad constant expression
/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:804: error: global initializers are evaluated at compile time and therefore must be constant
/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:804: error: Bad constant expression
/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:804: error: Bad constant expression
/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:805: error: global initializers are evaluated at compile time and therefore must be constant
/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:805: error: Bad constant expression
/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:805: error: Bad constant expression
/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:806: error: global initializers are evaluated at compile time and therefore must be constant
/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:806: error: Bad constant expression
/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:806: error: Bad constant expression
/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:807: error: global initializers are evaluated at compile time and therefore must be constant
/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:807: error: Bad constant expression
/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:807: error: Bad constant expression
/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:808: error: global initializers are evaluated at compile time and therefore must be constant
/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:808: error: Bad constant expression
/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:808: error: Bad constant expression
/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:809: error: global initializers are evaluated at compile time and therefore must be constant
/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:809: error: Bad constant expression
/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:809: error: Bad constant expression
I now managed to compile a first test, replacing these FIXHR with precomputed constant integers.
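One way to produce those precomputed integers is to let a PC compiler evaluate the macro and print an initializer list. This sketch assumes FIXHR has the FFmpeg-style definition `((int)((a) * (1LL<<32) + 0.5))`; check that against the FIXHR in your copy of minimp3.c before trusting the output. `fixhr` and `print_fixhr_table` are hypothetical helpers.

```c
#include <stdio.h>

/* ASSUMED definition (FFmpeg-style); verify against your minimp3.c. */
#define FIXHR(a) ((int)((a) * (1LL << 32) + 0.5))

/* Evaluate the macro at run time on the PC. */
static int fixhr(double a)
{
    return FIXHR(a);
}

/* Print an initializer list to paste into the P2 build in place of the
   FIXHR(...) expressions that FlexSpin rejects. */
static void print_fixhr_table(FILE *out, const double *vals, int n)
{
    fputs("{\n", out);
    for (int i = 0; i < n; i++)
        fprintf(out, "    %d, /* FIXHR(%.17g) */\n", fixhr(vals[i]), vals[i]);
    fputs("};\n", out);
}
```

With this definition, FIXHR(0.25) is 1073741824 (2^30) and FIXHR(0.1) is 429496730.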
Something seems to work... Time to try connecting the audio output to the decoder...
Why is the pointer returned by calloc in mp3_create 6382bb88?
Minimp3 is too slow for a P2... One mp3 block (26 ms of audio) takes 121 ms to decode (113 ms with -O2) at 337 MHz, about 5x too slow to use.
Also, looking through the macros, I think MULS is only used for 16x16 multiplication, so you can reimplement it as such (normal multiply operator only ever generates QMUL/GETQX)
static inline int32_t MULS(int16_t a, int16_t b) {
int32_t tmp = a;
__asm {
muls tmp,b
}
return tmp;
}
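To check the inline-asm MULS against test vectors on a PC, a portable model of the instruction can help. This sketch assumes the documented P2 behaviour of MULS D,S (signed 16x16 multiply of D[15:0] and S[15:0] giving a 32-bit result); `sext16` and `muls_ref` are hypothetical names. It also makes visible that bits above 15 are ignored, which is only safe if MULS really is only ever called with 16-bit operands.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of x without implementation-defined casts. */
static int32_t sext16(uint32_t x)
{
    return (int32_t)((x & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Host-side model of P2 MULS: low words of both operands, signed,
   full 32-bit product (at most 2^30 in magnitude, so no overflow). */
static int32_t muls_ref(uint32_t d, uint32_t s)
{
    return sext16(d) * sext16(s);
}
```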
All this MULS/MULH stuff has to be rewritten, which I already tried with MULH (asm with qmul, which worked but did nothing for the speed). It's not only about speed: I don't know why they decided to work in 15:17 format. It generates noise (I have this minimp3 compiled for Ultibo) and the noise is easily audible in silent parts of music. I ended up with libmad on Ultibo. It is also integer-based, but it uses 4:28 fixed point and it is silent (enough). Maybe I'll try to compile libmad if I can...
I also have an mp2 decoder in C; it's simpler, so I may try that too.
There's also libtremor lowmem branch for integer-only ogg vorbis. I tried compiling that at some point, but the 64 bit support (and general C compliance) just wasn't there yet.
Semi-relatedly, if the format doesn't matter: I've been experimenting for a while to conjure up a decent ADPCM-based music compression scheme with P1 in mind (though I haven't actually developed a non-PC/C++ decoder). I'm on like the 4th rewrite of that and it doesn't totally suck now (i.e. the quantization noise is low enough that it isn't distracting). But that's at ~240 kbps (slightly less than 60 512-byte SD sectors per second) for 32 kHz stereo, so perhaps not efficient enough for whatever you want to do.
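The quoted figures work out as back-of-envelope arithmetic, assuming 1 kbps = 1000 bit/s: 240,000 / (8 × 512) ≈ 58.6 sectors per second, and 240,000 / (32,000 × 2 channels) = 3.75 bits per sample. A tiny C version, with hypothetical helper names:

```c
/* Sectors per second needed to stream a given bit rate from SD. */
static double sd_sectors_per_second(double bits_per_second, int sector_bytes)
{
    return bits_per_second / (8.0 * sector_bytes);
}

/* Average coded bits per sample across all channels. */
static double bits_per_sample(double bits_per_second, double sample_rate, int channels)
{
    return bits_per_second / (sample_rate * channels);
}
```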
@ersmith said:
Does it still happen with optimization turned off? How about in bytecode mode (-2nu)?
It heavily relies on Fcache to place the inline Pasm in cogRAM, so I kind of need the optimiser. Or is there a way to specify where the Pasm is placed irrespective of optimiser setting?
You can turn off the optimizer and turn on just fcache with a command line option of something like -O0 --fcache=128.
As for -2nu, I get this error: psram16_batty_crashing.spin2:122: error: identifier getrnd not found
EDIT: Replaced the randfill() function with the newer pasm version and got that compiling with -2nu ... but no joy with the critical part, it crashes at the same place I think ... err, no, the repeat print crashing is much earlier, at the init_smartpins() function .. which is the first piece of inline pasm.
I don't have the hardware to test this, and I also seem to have an older stdlib.spin2 that's missing some functions you're using, so I can't even compile it. But here are things to try:
(1) As a sanity check: does this exact code work correctly in PNut / PropTool?
(2) Does the inline pasm depend on any hard coded registers? It doesn't look like it but it's worth double checking.
(3) I don't see any printing in the init_smartpins() function. What exactly is getting stuck printing? What's it printing? Is it possible that it's stuck in some kind of reset loop?
Comments
Hmm, the optimizer seems to be "too smart" and optimizes away important parts of the code. This is the source:
With optimization off this is compiled correctly:
Thus, after a reordering pass we get:
Which is obviously very wrong. After a dead code removal pass:
... we end up with the garbage we get.
Aaaand fix submitted: "Fix incorrect reorder when code being reordered over sets a non-dead value that would be clobbered"
Wow, that was fast, thank you. (check your inbox for reward)
Version 5.9.15-beta-HEAD-v5.9.14-9-g03841438 (compiled Aug 1 2022) also gives a segmentation fault.
Is there a way to split a long p2asm source code line over two or more lines?
As inline Pasm within a C function, yep, the backslash of ordinary C formatting works.
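In standard C, a backslash that is the last character of a line splices that line with the next one before tokenizing, which is what makes this work. A generic illustration (plain C, not FlexSpin-specific; `PACK_XY` and `spliced_sum` are made-up names):

```c
/* A macro spread over three physical lines; the backslashes join them
   into one logical line before the preprocessor sees the #define. */
#define PACK_XY(x, y) \
    (((unsigned)(x) << 16) | \
     ((unsigned)(y) & 0xFFFFu))

/* Splicing works in ordinary code too, not only in macros. */
static int spliced_sum(void)
{
    return 1 + 2 + \
           3 + 4;
}
```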
These are
How much I hate preprocessor programming abuse... They are f* constants, so they could be declared as constants instead...
This line (1635):
buf[ + j] = MULH(t0, win[18 + j]);
caused a segfault.
Correcting it to
buf[ 0 + j] = MULH(t0, win[18 + j]);
makes the segfault disappear.
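For the bug report, note that `buf[ + j]` is valid C: `+ j` is the unary plus operator, so it indexes the same element as `buf[j]`, and a compiler should accept it rather than crash. A minimal reproduction could look like this (the function name is hypothetical; it compiles and runs with a PC compiler):

```c
/* Unary plus inside an array index: equivalent to buf[j]. */
static int unary_plus_index(const int *buf, int j)
{
    return buf[ + j];
}
```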
What remains: the compiler doesn't like the constants at lines 802-809, but at least it doesn't crash.
Looking at the code, you haven't redefined MULL/MULH, which causes it to go through a slow path, probably. Would need to look at disassembly...
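If MULH is rewritten around the CORDIC, keep in mind that QMUL is an unsigned 32x32 multiply, so a signed high word needs a correction: subtract (unsigned) b from the high word when a is negative, and a when b is negative. The sketch below models this on a PC; it assumes MULH means "high 32 bits of the signed 64-bit product" (the FFmpeg-style definition), and `umulh32`, `mulh_ref` and `mulh_via_unsigned` are hypothetical names.

```c
#include <stdint.h>

/* Signed reference: high 32 bits of the 64-bit product.
   >> on a negative int64_t is implementation-defined in C; GCC and Clang
   shift arithmetically, which is what MULH-style macros rely on. */
static int32_t mulh_ref(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}

/* Unsigned high word, i.e. what QMUL followed by GETQY delivers. */
static uint32_t umulh32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Signed high word built from the unsigned one plus a sign correction. */
static int32_t mulh_via_unsigned(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t hi = umulh32(ua, ub);
    if (a < 0) hi -= ub;   /* a was read as a + 2^32 */
    if (b < 0) hi -= ua;   /* b was read as b + 2^32 */
    /* Converting values >= 2^31 back to int32_t is implementation-defined;
       two's complement wrap on GCC/Clang. */
    return (int32_t)hi;
}
```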
Thanks,