flexspin compiler for P2: Assembly, Spin, BASIC, and C in one compiler - Page 98 — Parallax Forums


Comments

  • Hmm, the optimizer seems to be "too smart" and optimizes away important parts of the code. This is the source:

    PUB {++opt(1)} setWindow(xs, ys, xe, ye)
    ' Set Screen Window (column start, row start, column end, row end)
      curx:= xs
      cury:= ys
      WaitMailbox()
      mailbox2s:= (xs<<16) | ys                              ' xs, ys
      mailbox2e:= (xe<<16) | ye                              ' xe, ye
      mailbox4 := (xe - xs +1)*(ye - ys +1)                  ' calc no. of pixels
      mailbox  := $81_00_0000
    

    With optimization off this is compiled correctly:

    01f64                 | ' PUB  setWindow(xs, ys, xe, ye)
    01f64                 | _LCD_Graphics_Driver_setWindow
    01f64     07 72 04 F6 |     mov COUNT_, #7
    01f68     3C 00 A0 FD |     call    #pushregs_
    01f6c     7A 00 01 F6 |     mov local01, arg01
    01f70     7B 02 01 F6 |     mov local02, arg02
    01f74     7C 04 01 F6 |     mov local03, arg03
    01f78     7D 06 01 F6 |     mov local04, arg04
    01f7c                 | ' ' Set Screen Window (column start, row start, column end, row end)
    01f7c                 | '   curx:= xs
    01f7c     24 B4 04 F1 |     add objptr, #36
    01f80     5A 00 61 FC |     wrlong  local01, objptr
    01f84     24 B4 84 F1 |     sub objptr, #36
    01f88                 | '   cury:= ys
    01f88     28 B4 04 F1 |     add objptr, #40
    01f8c     5A 02 61 FC |     wrlong  local02, objptr
    01f90     28 B4 84 F1 |     sub objptr, #40
    01f94                 | '   WaitMailbox()
    01f94     98 FE BF FD |     call    #_LCD_Graphics_Driver_WaitMailbox
    01f98                 | '   mailbox2s:= (xs<<16) | ys                              ' xs, ys
    01f98     80 08 01 F6 |     mov local05, local01
    01f9c     10 08 65 F0 |     shl local05, #16
    01fa0     81 08 41 F5 |     or  local05, local02
    01fa4     08 B6 04 F1 |     add ptr__LCD_Graphics_Driver_dat__, #8
    01fa8     5B 08 61 FC |     wrlong  local05, ptr__LCD_Graphics_Driver_dat__
    01fac     08 B6 84 F1 |     sub ptr__LCD_Graphics_Driver_dat__, #8
    01fb0                 | '   mailbox2e:= (xe<<16) | ye                              ' xe, ye
    01fb0     82 08 01 F6 |     mov local05, local03
    01fb4     10 08 65 F0 |     shl local05, #16
    01fb8     83 08 41 F5 |     or  local05, local04
    01fbc     0C B6 04 F1 |     add ptr__LCD_Graphics_Driver_dat__, #12
    01fc0     5B 08 61 FC |     wrlong  local05, ptr__LCD_Graphics_Driver_dat__
    01fc4     0C B6 84 F1 |     sub ptr__LCD_Graphics_Driver_dat__, #12
    01fc8                 | '   mailbox4 := (xe - xs +1)*(ye - ys +1)                  ' calc no. of pixels
    01fc8     82 08 01 F6 |     mov local05, local03
    01fcc     80 08 81 F1 |     sub local05, local01
    01fd0     01 08 05 F1 |     add local05, #1
    01fd4     83 0A 01 F6 |     mov local06, local04
    01fd8     81 0A 81 F1 |     sub local06, local02
    01fdc     01 0A 05 F1 |     add local06, #1
    01fe0     85 08 01 FD |     qmul    local05, local06
    01fe4     18 0C 61 FD |     getqx   local07
    01fe8     14 B6 04 F1 |     add ptr__LCD_Graphics_Driver_dat__, #20
    01fec     5B 0C 61 FC |     wrlong  local07, ptr__LCD_Graphics_Driver_dat__
    01ff0     14 B6 84 F1 |     sub ptr__LCD_Graphics_Driver_dat__, #20
    01ff4                 | '   mailbox  := $81_00_0000
    01ff4     00 80 40 FF 
    01ff8     00 08 05 F6 |     mov local05, ##-2130706432
    01ffc     5B 08 61 FC |     wrlong  local05, ptr__LCD_Graphics_Driver_dat__
    02000     3B F0 03 F6 |     mov ptra, fp
    02004     46 00 A0 FD |     call    #popregs_
    02008                 | _LCD_Graphics_Driver_setWindow_ret
    02008     2D 00 64 FD |     ret
    

    However with -O1 all the calculations are skipped:

    01f64                 | ' PUB {++opt(1)} setWindow(xs, ys, xe, ye)
    01f64                 | _LCD_Graphics_Driver_setWindow
    01f64                 | ' ' Set Screen Window (column start, row start, column end, row end)
    01f64                 | '   curx:= xs
    01f64     24 B4 04 F1 |     add objptr, #36
    01f68     5A F4 60 FC |     wrlong  arg01, objptr
    01f6c                 | '   cury:= ys
    01f6c     04 B4 04 F1 |     add objptr, #4
    01f70     5A F6 60 FC |     wrlong  arg02, objptr
    01f74     28 B4 84 F1 |     sub objptr, #40
    01f78                 | '   WaitMailbox()
    01f78     B4 FE BF FD |     call    #_LCD_Graphics_Driver_WaitMailbox
    01f7c     7A F8 80 F1 |     sub arg03, arg01
    01f80     01 F8 04 F1 |     add arg03, #1
    01f84     7B FA 80 F1 |     sub arg04, arg02
    01f88     01 FA 04 F1 |     add arg04, #1
    01f8c     7D F8 00 FD |     qmul    arg03, arg04
    01f90                 | '   mailbox2s:= (xs<<16) | ys                              ' xs, ys
    01f90     10 F4 64 F0 |     shl arg01, #16
    01f94     7B F4 40 F5 |     or  arg01, arg02
    01f98     08 B6 04 F1 |     add ptr__LCD_Graphics_Driver_dat__, #8
    01f9c     5B F4 60 FC |     wrlong  arg01, ptr__LCD_Graphics_Driver_dat__
    01fa0                 | '   mailbox2e:= (xe<<16) | ye                              ' xe, ye
    01fa0     04 B6 04 F1 |     add ptr__LCD_Graphics_Driver_dat__, #4
    01fa4     5B F4 60 FC |     wrlong  arg01, ptr__LCD_Graphics_Driver_dat__
    01fa8                 | '   mailbox4 := (xe - xs +1)*(ye - ys +1)                  ' calc no. of pixels
    01fa8     08 B6 04 F1 |     add ptr__LCD_Graphics_Driver_dat__, #8
    01fac                 | '   mailbox  := $81_00_0000
    01fac     18 FA 60 FD |     getqx   arg04
    01fb0     5B FA 60 FC |     wrlong  arg04, ptr__LCD_Graphics_Driver_dat__
    01fb4     14 B6 84 F1 |     sub ptr__LCD_Graphics_Driver_dat__, #20
    01fb8     00 80 C0 FF 
    01fbc     5B 00 68 FC |     wrlong  ##-2130706432, ptr__LCD_Graphics_Driver_dat__
    01fc0                 | _LCD_Graphics_Driver_setWindow_ret
    01fc0     2D 00 64 FD |     ret
    

    I think FlexSpin detects that the mailbox2s/2e and mailbox4 variables are never used (by the Spin code) and therefore thinks the calculations can be optimized away. However, the PASM part of the code relies on the results! I hope I don't have to mark them as "volatile" or something.

  • The only calculation that's missing is the one for mailbox2e, which it apparently decides can be the same as mailbox2s. Try -O1,~cordic-reorder or -O1,~local-reuse, it's likely one of those having a normal one.

  • Yes, you're correct, the same value is stored into mailbox2e as in mailbox2s, which explains why it works for single pixel units. But the calculation of mailbox4 (no. of pixels) is totally skipped.

    @Wuerfel_21 said:
    Try -O1,~cordic-reorder or -O1,~local-reuse, it's likely one of those having a normal one.

    You mean as a temporary workaround? If the mailbox variables were local then I'd agree that the compiler could optimize away calculations whose result is never used. But writing to a DAT section should clearly signal "hey, this data might be used somewhere else, don't optimize away!".

    Anyway, replacing an expression mailbox2e:= (xe<<16) | ye with a different one mailbox2e:= (xs<<16) | ys is definitely a compiler bug and can't be explained with "too much optimization". It's simply wrong. Or am I missing something?

  • Ok, next problem. I can't find anything in the docs about how DEBUG is supported. It seems to work in Spin, e.g. if I put something like debug ("paintIcon", udec(bg), udec(fg)) into the assembler part in the DAT section inside the Spin2 code and switch on "BRK debug (P2 only)" in the options menu of FlexProp I can see the debug output in the terminal window.

    But if I try the same in the assembler part inside a C file then I get error messages like "unexpected '(' in line..." after the "debug" statement. I think PASM sections should be treated the same no matter if they are in a Spin or C file, shouldn't they?

  • Wuerfel_21 Posts: 3,038
    edited 2022-07-27 13:09

    @ManAtWork said:
    Yes, you're correct, the same value is stored into mailbox2e as in mailbox2s, which explains why it works for single pixel units. But the calculation of mailbox4 (no. of pixels) is totally skipped.

    Where do you see it skipped? It just moves it up due to the pipeline delay on the multiply (or, more accurately, stuff from above and below is pulled into the gap between QMUL/GETQX):

    01f7c     7A F8 80 F1 |     sub arg03, arg01
    01f80     01 F8 04 F1 |     add arg03, #1
    01f84     7B FA 80 F1 |     sub arg04, arg02
    01f88     01 FA 04 F1 |     add arg04, #1
    01f8c     7D F8 00 FD |     qmul    arg03, arg04
                                [other stuff here]
    01fac     18 FA 60 FD |     getqx   arg04
    01fb0     5B FA 60 FC |     wrlong  arg04, ptr__LCD_Graphics_Driver_dat__
    

    @Wuerfel_21 said:
    Try -O1,~cordic-reorder or -O1,~local-reuse, it's likely one of those having a normal one.

    You mean as a temporary workaround? If the mailbox variables were local then I'd agree that the compiler could optimize away calculations whose result is never used. But writing to a DAT section should clearly signal "hey, this data might be used somewhere else, don't optimize away!".

    To figure out which optflag is buggy. Alternatively, post a complete sample that we can try locally (doesn't have to actually do anything, just enough to get the buggy codegen to show up)

    Anyway, replacing an expression mailbox2e:= (xe<<16) | ye with a different one mailbox2e:= (xs<<16) | ys is definitely a compiler bug and can't be explained with "too much optimization". It's simply wrong. Or am I missing something?

    Yes, that is indeed wrong. Are you using the very latest version, btw? I think a bug kinda like that was fixed recently.

  • @Wuerfel_21 said:
    Where do you see it skipped? It just moves it up due to the pipeline delay on the multiply (or, more accurately, stuff from above and below is pulled into the gap between QMUL/GETQX):

    Uhh! I didn't look carefully enough. I just saw that a lot of code is missing between the wrlong to mailbox2e and the wrlong to mailbox4. I hadn't noticed that the code had moved upward.

    @Wuerfel_21 said:
    ... Try -O1,~cordic-reorder or -O1,~local-reuse, it's likely one of those having a normal one.
    To figure out which optflag is buggy. Alternatively, post a complete sample that we can try locally (doesn't have to actually do anything, just enough to get the buggy codegen to show up)

    I've extracted a reduced version that can show the bug but doesn't output anything. It's not as short as it could be. I simply crippled the assembler code so that it doesn't output anything to the pins but instead signals completion through the mailbox immediately. Compile the demo and search the listing file for "PUB setWindow(".

  • Wuerfel_21 Posts: 3,038
    edited 2022-07-27 14:53

    Okay, -O1,~cordic-reorder does in fact bypass the issue. Hmm. Though no idea how a bad reorder would lead it to believe that the second expression can be removed and replaced with the result of the first. My money is on "other bug that only happens in the state that the reorder creates"

    Edit: Or maybe not, both expressions pass through the same temporary, so if the second expression disappears, they are actually the same

  • Wuerfel_21 Posts: 3,038
    edited 2022-07-27 15:18

    Okay, so here's what's going on: here's the code of interest, as compiled with -O1,~cordic-reorder,~local-reuse. Read bottom to top (as the block find algorithm goes):

        mov local02, arg01                         ' \
        shl local02, #16                           ' |
        or  local02, arg02                         ' | IT REORDERS THIS BLOCK WTF
        add ptr__LCD_Opt_Driver_dat__, #8          ' |
        wrlong  local02, ptr__LCD_Opt_Driver_dat__ ' /
        mov local02, arg03 ' \ 
        shl local02, #16   ' / Can't reorder because dependency
        or  local02, arg04 ' Can't reorder due to arg4 use
        add ptr__LCD_Opt_Driver_dat__, #4  ' Can reorder
        wrlong  local02, ptr__LCD_Opt_Driver_dat__ ' Can reorder
        sub arg03, arg01    ' \
        add arg03, #1       ' | QMUL dependencies, can't reorder
        sub arg04, arg02    ' |
        add arg04, #1       ' /
        qmul    arg03, arg04
        getqx   local03
    

    Thus after a reordering pass we get:

        mov local02, arg03
        shl local02, #16
        or  local02, arg04
        sub arg03, arg01
        add arg03, #1
        sub arg04, arg02
        add arg04, #1
        qmul    arg03, arg04
        mov local02, arg01
        shl local02, #16
        or  local02, arg02
        add ptr__LCD_Opt_Driver_dat__, #8
        wrlong  local02, ptr__LCD_Opt_Driver_dat__
        add ptr__LCD_Opt_Driver_dat__, #4
        wrlong  local02, ptr__LCD_Opt_Driver_dat__
        getqx   local03
    

    Which is obviously very wrong. After a dead code removal pass:

        sub arg03, arg01
        add arg03, #1
        sub arg04, arg02
        add arg04, #1
        qmul    arg03, arg04
        mov local02, arg01
        shl local02, #16
        or  local02, arg02
        add ptr__LCD_Opt_Driver_dat__, #8
        wrlong  local02, ptr__LCD_Opt_Driver_dat__
        add ptr__LCD_Opt_Driver_dat__, #4
        wrlong  local02, ptr__LCD_Opt_Driver_dat__
        getqx   local03
    

    ... we end up with the garbage we get.
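
    For anyone who wants to see the effect without a P2 at hand, the final sequence above can be modelled in plain C. This is only a sketch: the function and struct names are made up for illustration, registers are modelled as 32-bit unsigned variables, and the write of the QMUL result to mailbox4 is taken from the -O1 listing earlier in the thread rather than from the excerpt above.

```c
#include <stdint.h>

/* The three values the Spin2 method is supposed to leave in the mailbox. */
typedef struct {
    uint32_t mailbox2s;  /* (xs << 16) | ys */
    uint32_t mailbox2e;  /* (xe << 16) | ye */
    uint32_t mailbox4;   /* (xe - xs + 1) * (ye - ys + 1) */
} Mailbox;

/* Straight transcription of the Spin2 statements (hypothetical helper). */
Mailbox set_window_expected(uint32_t xs, uint32_t ys, uint32_t xe, uint32_t ye)
{
    Mailbox m;
    m.mailbox2s = (xs << 16) | ys;
    m.mailbox2e = (xe << 16) | ye;
    m.mailbox4  = (xe - xs + 1) * (ye - ys + 1);
    return m;
}

/* Instruction-by-instruction model of the sequence left after dead code removal. */
Mailbox set_window_miscompiled(uint32_t arg01, uint32_t arg02,
                               uint32_t arg03, uint32_t arg04)
{
    Mailbox m;
    uint32_t local02, local03;

    arg03 -= arg01;              /* sub   arg03, arg01 */
    arg03 += 1;                  /* add   arg03, #1    */
    arg04 -= arg02;              /* sub   arg04, arg02 */
    arg04 += 1;                  /* add   arg04, #1    */
    local03 = arg03 * arg04;     /* qmul ... getqx local03 */

    local02 = arg01;             /* mov   local02, arg01 */
    local02 <<= 16;              /* shl   local02, #16   */
    local02 |= arg02;            /* or    local02, arg02 */
    m.mailbox2s = local02;       /* first wrlong  */
    m.mailbox2e = local02;       /* second wrlong: same register, never recomputed */
    m.mailbox4  = local03;
    return m;
}
```

    With xs=10, ys=20, xe=30, ye=40 the model stores the start corner into both mailbox2s and mailbox2e while the pixel count stays correct. When xs=xe and ys=ye the two versions agree, which matches the observation that single pixel windows still worked.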

  • Wow, that was fast, thank you. :) (check your inbox for reward)

  • ersmith Posts: 5,404

    @evanh said:
    I've managed to save one case - full source code attached - of weird behaviour by making a backup of it at that time. It's a different problem I think but one that has cropped up in completely different code before ... In the DAT section below, if I swap the two lines io_delay long 1 and prblob long 0[4] with each other then I get a repeating-print crash. Otherwise it runs fine.

    DAT
    prblob          long    0[4]
    io_delay        long    1
    txdata          byte    0[BLOCKSIZE]
    rxdata          byte    0[BLOCKSIZE]
    

    PS: The crash occurs at the first block copying to/from the PSRAM. After it has reset and ID queried each chip.
    PPS: And a little debug shows it is during the transmit routine. So it's the first attempt to block write from the buffers. But not the first use of either io_delay or prblob.

    I can't actually run the code right now (I'm away from my P2), but I see that io_delay is used in some calculations that use multiplies, so it's possible this bug may be the one that Ada just fixed where sometimes pulling calculations into CORDIC delay slots didn't check dependencies correctly.

  • ersmith Posts: 5,404

    @ManAtWork said:
    Ok, next problem. I can't find anything in the docs about how DEBUG is supported. It seems to work in Spin, e.g. if I put something like debug ("paintIcon", udec(bg), udec(fg)) into the assembler part in the DAT section inside the Spin2 code and switch on "BRK debug (P2 only)" in the options menu of FlexProp I can see the debug output in the terminal window.

    But if I try the same in the assembler part inside a C file then I get error messages like "unexpected '(' in line..." after the "debug" statement. I think PASM sections should be treated the same no matter if they are in a Spin or C file, shouldn't they?

    DEBUG() is a Spin2 feature, so it's not available (yet) in C.

    As to how to treat PASM sections in other languages... that's a big headache. The definition of PASM is tied up with the definition of Spin2 in unfortunate ways. This makes things like for example how to evaluate expressions very complicated; do we use the native language order of evaluation, or the Spin2 one? I've tried to work around this to some degree in C by having different keywords (__asm and __pasm) but this is still not a complete solution.

  • Ok, I understand. So the __pasm{} sections are not parsed by the same parser as the DAT sections are for Spin2 code? Surprises me but I do not know all details of the implementation.

    Is there a way to get around this limitation? I could declare the assembler code in a spin2 file, export the entry point address and use that in C code.

    Easier debugging saves a lot of time. Especially cog memory is not accessible from the outside and you can't just put a printf there.

  • ersmith Posts: 5,404

    @ManAtWork said:
    Ok, I understand. So the __pasm{} sections are not parsed by the same parser as the DAT sections are for Spin2 code? Surprises me but I do not know all details of the implementation.

    Yes, unfortunately the whole C file is parsed by the C parser and so the __pasm syntax is hacked in. Adding DEBUG for it is on my todo list, but I don't know when it will happen.

    Is there a way to get around this limitation? I could declare the assembler code in a spin2 file, export the entry point address and use that in C code.

    Yes, that's exactly what I would do. You can have the assembler code in a file that's imported into C via struct using.

  • pik33 Posts: 1,735
    edited 2022-08-01 10:11

    I tried to compile and use this minimp3 decoder.

    I had to get rid of 64bit integer constants first in "1LL" format, but this was no problem (replacing with a float)

    Then I tried to add main() and call some functions from it. Every attempt to call mp3_decode() ends with this:

    "/home/pik33/Programy/flexprop/bin/flexspin" -2 -l --tabs=8 -D_BAUD=230400 -O1    --charset=utf8 -I "/home/pik33/Programy/flexprop/include"  "/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3.c" 
    Propeller Spin/PASM Compiler 'FlexSpin' (c) 2011-2022 Total Spectrum Software Inc.
    Version 5.9.13-beta-HEAD-v5.9.12-3-ge286147b Compiled on: Jun 21 2022
    child killed: segmentation violation
    Finished at Mon Aug  1 11:49:15 2022
    

    either in Linux or in Windows (official 5.9.14)


    Version 5.9.15-beta-HEAD-v5.9.14-9-g03841438 Compiled on: Aug 1 2022 also gives a segmentation fault

  • evanh Posts: 13,424
    edited 2022-08-01 10:15

    @ersmith said:
    I can't actually run the code right now (I'm away from my P2), but I see that io_delay is used in some calculations that use multiplies, so it's possible this bug may be the one that Ada just fixed where sometimes pulling calculations into CORDIC delay slots didn't check dependencies correctly.

    Pik's post reminded me to test this ... No change. Still doing the repeat print crashing with latest build of master. :(
    Version 5.9.15-beta-v5.9.14-9-g03841438 Compiled on: Aug 1 2022

  • TonyB_ Posts: 1,952
    edited 2022-08-01 10:26

    Is there a way to split a long p2asm source code line over two or more lines?

  • evanh Posts: 13,424

    @TonyB_ said:
    Is there a way to split a long p2asm source code line over two or more lines?

    As inline Pasm within a C function, yep, the backslash of ordinary C formatting works.

  • ersmith Posts: 5,404

    @pik33 said:
    I tried to compile and use this minimp3 decoder.

    I had to get rid of 64bit integer constants first in "1LL" format, but this was no problem (replacing with a float)

    Replacing with a float will probably lose precision. As of 5.9.14 the 1LL format constants should be supported now. Did you find some that aren't?

    Then I tried to add main() and call some functions from it. Every attempt to call mp3_decode() ends with this:

    "/home/pik33/Programy/flexprop/bin/flexspin" -2 -l --tabs=8 -D_BAUD=230400 -O1    --charset=utf8 -I "/home/pik33/Programy/flexprop/include"  "/home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3.c" 
    Propeller Spin/PASM Compiler 'FlexSpin' (c) 2011-2022 Total Spectrum Software Inc.
    Version 5.9.13-beta-HEAD-v5.9.12-3-ge286147b Compiled on: Jun 21 2022
    child killed: segmentation violation
    Finished at Mon Aug  1 11:49:15 2022
    

    either in Linux or in Windows (official 5.9.14)

    Are you sure you posted the version that you were compiling? In both Linux and Windows I did not get any segmentation fault when compiling minimp3.c. I did get some errors about missing 64 bit arithmetic functions; I've fixed those and checked in the fix to github. Now it complains about a missing main(), which seems to be correct.

  • ersmith Posts: 5,404

    @evanh said:

    @ersmith said:
    I can't actually run the code right now (I'm away from my P2), but I see that io_delay is used in some calculations that use multiplies, so it's possible this bug may be the one that Ada just fixed where sometimes pulling calculations into CORDIC delay slots didn't check dependencies correctly.

    Pik's post reminded me to test this ... No change. Still doing the repeat print crashing with latest build of master. :(
    Version 5.9.15-beta-v5.9.14-9-g03841438 Compiled on: Aug 1 2022

    Does it still happen with optimization turned off? How about in bytecode mode (-2nu)?

  • evanh Posts: 13,424
    edited 2022-08-02 03:47

    @ersmith said:
    Does it still happen with optimization turned off? How about in bytecode mode (-2nu)?

    It heavily relies on Fcache to place the inline Pasm in cogRAM, so I kind of need the optimiser. Or is there a way to specify where the Pasm is placed irrespective of optimiser setting?

    As for -2nu, I get this error: psram16_batty_crashing.spin2:122: error: identifier getrnd not found

    EDIT: Replaced the randfill() function with the newer pasm version and got that compiling with -2nu ... but no joy with the critical part, it crashes at the same place I think ... err, no, the repeat print crashing is much earlier, at the init_smartpins() function .. which is the first piece of inline pasm.

  • pik33 Posts: 1,735
    edited 2022-08-02 09:08

    Now it complains about a missing main(), which seems to be correct.

    That's why I added a simple main() to this. It compiles until the main tries to call mp3_decode().
    Or if I add this to the Basic as a class, the result is the same. I can call mp3_create( ), but not mp3_decode()
    The first try was official 5.9.14 in Windows. Then I moved to Linux and used this 5.9.13. The last try was to recompile a new 5.9.15. All of these versions give the same result: call mp3_decode() and the compiler segfaults.

    I will recompile the compiler and try the newest version.


    Recompiled.

    Version 5.9.15-beta-HEAD-v5.9.14-11-g2f5736d1 Compiled on: Aug 2 2022

    Now 1LL works, but calling mp3_decode() still generates a segfault. I will try to debug this by commenting out...

  • pik33 Posts: 1,735
    edited 2022-08-02 13:22

    I commented out static int mp3_decode_frame() leaving only header and return - no segfault, but

    /home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:802: error: global initializers are evaluated at compile time and therefore must be constant
    /home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:802: error: Bad constant expression
    /home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:802: error: Bad constant expression
    /home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:803: error: global initializers are evaluated at compile time and therefore must be constant
    /home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:803: error: Bad constant expression
    /home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:803: error: Bad constant expression
    /home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:804: error: global initializers are evaluated at compile time and therefore must be constant
    /home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:804: error: Bad constant expression
    /home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:804: error: Bad constant expression
    /home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:805: error: global initializers are evaluated at compile time and therefore must be constant
    /home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:805: error: Bad constant expression
    /home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:805: error: Bad constant expression
    /home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:806: error: global initializers are evaluated at compile time and therefore must be constant
    /home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:806: error: Bad constant expression
    /home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:806: error: Bad constant expression
    /home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:807: error: global initializers are evaluated at compile time and therefore must be constant
    /home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:807: error: Bad constant expression
    /home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:807: error: Bad constant expression
    /home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:808: error: global initializers are evaluated at compile time and therefore must be constant
    /home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:808: error: Bad constant expression
    /home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:808: error: Bad constant expression
    /home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:809: error: global initializers are evaluated at compile time and therefore must be constant
    /home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:809: error: Bad constant expression
    /home/pik33/Programowanie/P2-retromachine/Propeller/P2P16/minimp3/minimp3.c:809: error: Bad constant expression
    

    These are

    static const int icos36h[9] = {
        FIXHR(0.50190991877167369479/2),            //--> line 802
        FIXHR(0.51763809020504152469/2), //0
        FIXHR(0.55168895948124587824/2),
        FIXHR(0.61038729438072803416/2),
        FIXHR(0.70710678118654752439/2), //1
        FIXHR(0.87172339781054900991/2),
        FIXHR(1.18310079157624925896/4),
        FIXHR(1.93185165257813657349/4), //2
    //    FIXHR(5.73685662283492756461),
    };
    

    How much I hate preprocessor programming :( abuse.... They are f* constants, so they could be declared as constants instead....
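
    One way around the "must be constant" errors is to compute the integers once on a PC and paste the results into the table. A minimal sketch, assuming FIXHR(a) is defined as ((int)((a) * (1LL<<32) + 0.5)); check that against the macro in your copy of minimp3.c. The helper names fixhr and print_icos36h are made up here, and the eight inputs are the ones from the table above.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed equivalent of FIXHR(a): scale by 2^32 and round (valid for 0 <= a < 0.5). */
int32_t fixhr(double a)
{
    return (int32_t)(a * 4294967296.0 + 0.5);
}

/* Print the table entries as plain integer literals, ready to paste into the source. */
void print_icos36h(void)
{
    static const double src[8] = {
        0.50190991877167369479 / 2,
        0.51763809020504152469 / 2,
        0.55168895948124587824 / 2,
        0.61038729438072803416 / 2,
        0.70710678118654752439 / 2,
        0.87172339781054900991 / 2,
        1.18310079157624925896 / 4,
        1.93185165257813657349 / 4,
    };
    for (int i = 0; i < 8; i++) {
        printf("    %" PRId32 ",\n", fixhr(src[i]));
    }
}
```

    Calling print_icos36h() from a small host-side main() gives eight literals that need no floating point evaluation at compile time.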

  • pik33 Posts: 1,735

    This line: (1635)

    buf[ + j] = MULH(t0, win[18 + j]);

    caused a segfault.

    Correcting this to

    buf[ 0 + j] = MULH(t0, win[18 + j]);

    makes the segfault disappear.

    What is left, the compiler doesn't like these constants at 802-809, but at least it doesn't crash.
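
    Note that buf[ + j] is legal C: the + is just unary plus applied to j, so the original line and the workaround mean the same thing and the crash is a compiler problem, not a source error. A tiny sketch (the function names are made up for illustration):

```c
/* "+ j" is unary plus applied to j, so this indexes element j. */
void store_unary_plus(int *buf, int j, int v)
{
    buf[ + j] = v;
}

/* The workaround spelling; it indexes the very same element. */
void store_zero_plus(int *buf, int j, int v)
{
    buf[0 + j] = v;
}
```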

  • pik33 Posts: 1,735
    edited 2022-08-02 13:55

    I now managed to compile a first test, replacing these FIXHR with precomputed constant integers.
    Something seems to work.... Time to try, connect the audio output to the decoder...
    Why is the pointer returned by calloc in mp3_create 6382bb88?


    Minimp3 is too slow for a P2... One mp3 block, 26 ms of audio, is decoded in 121/113 (with -O2) ms :( at 337 MHz - 5x too slow to use.

  • Looking at the code, you haven't redefined MULL/MULH, which causes it to go through a slow path, probably. Would need to look at disassembly...

  • Also, looking through the macros, I think MULS is only used for 16x16 multiplication, so you can reimplement it as such (normal multiply operator only ever generates QMUL/GETQX)

    static inline int32_t MULS(int16_t a, int16_t b) {
      int32_t tmp = a;
      __asm {
        muls tmp,b
      }
      return tmp;
    }
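
    When swapping in hand-written versions it helps to have a portable reference to compare against on a PC. A sketch, assuming MULH(a, b) is meant to return the upper 32 bits of the signed 64-bit product and MULS(a, b) is a plain signed 16x16 multiply; that is my reading, so verify it against the macros in your minimp3.c. The names mulh_ref and muls_ref are hypothetical.

```c
#include <stdint.h>

/* Reference for the assumed MULH: upper 32 bits of the signed 64-bit product.
   Right-shifting a negative value is implementation-defined in C; common
   compilers perform an arithmetic shift here. */
static inline int32_t mulh_ref(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}

/* Reference for the assumed MULS: signed 16x16 -> 32 multiply, which cannot overflow. */
static inline int32_t muls_ref(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}
```

    Feeding the same inputs to these and to the P2 versions is a cheap way to catch sign or shift mistakes before listening for them.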
    
  • pik33 Posts: 1,735

    All this MULS/MULH stuff has to be rewritten, which I already tried with MULH (asm with qmul, which worked but did nothing for the speed). Not only does it take time, but I also don't know why they decided to work in 15:17 format. It generates noise: I have this minimp3 compiled for Ultibo, and the noise is easily audible in quiet parts of the music. I ended up with libmad on Ultibo. It is also integer based, but it uses 4:28 fixed point and is silent (enough). Maybe I'll try to compile libmad if I can...

    I have also mp2 decoder in C, this is simpler so I may try this too :)

  • There's also libtremor lowmem branch for integer-only ogg vorbis. I tried compiling that at some point, but the 64 bit support (and general C compliance) just wasn't there yet.

    Semi-relatedly, if the format is of no matter, I've been experimenting for a while to conjure up a decent ADPCM-based music compression scheme with P1 in mind (though I haven't actually developed a non-PC/C++ decoder). I'm on like the 4th rewrite of that and it doesn't totally suck now (i.e. to the point where the quantization noise is low enough that it isn't distracting). But that's at some ~240kbps (slightly less than 60 512 byte SD sectors per second) for 32kHz stereo so perhaps not efficient enough for whatever you want to do.
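
    The bitrate figures above are easy to check. A small sketch, assuming "kbps" means 1000 bits per second; the function names are made up for illustration.

```c
/* SD sectors that must be read per second to sustain a given bitrate. */
double sectors_per_second(double bits_per_second, double sector_bytes)
{
    return bits_per_second / 8.0 / sector_bytes;
}

/* Average number of bits spent on each sample of each channel. */
double bits_per_sample(double bits_per_second, double sample_rate_hz, int channels)
{
    return bits_per_second / (sample_rate_hz * channels);
}
```

    With 240000 bits/s and 512-byte sectors this comes out a little under 60 sectors per second, and for 32 kHz stereo it works out to somewhat under 4 bits per sample.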

  • ersmith Posts: 5,404

    @evanh said:

    @ersmith said:
    Does it still happen with optimization turned off? How about in bytecode mode (-2nu)?

    It heavily relies on Fcache to place the inline Pasm in cogRAM, so I kind of need the optimiser. Or is there a way to specify where the Pasm is placed irrespective of optimiser setting?

    You can turn off the optimizer and turn on just fcache with a command line option of something like -O0 --fcache=128.

    As for -2nu, I get this error: psram16_batty_crashing.spin2:122: error: identifier getrnd not found

    EDIT: Replaced the randfill() function with the newer pasm version and got that compiling with -2nu ... but no joy with the critical part, it crashes at the same place I think ... err, no, the repeat print crashing is much earlier, at the init_smartpins() function .. which is the first piece of inline pasm.

    I don't have the hardware to test this, and I also seem to have an older stdlib.spin2 that's missing some functions you're using, so I can't even compile it :(. But things to try:
    (1) As a sanity check: does this exact code work correctly in PNut / PropTool?
    (2) Does the inline pasm depend on any hard coded registers? It doesn't look like it but it's worth double checking.
    (3) I don't see any printing in the init_smartpins() function. What exactly is getting stuck printing? What's it printing? Is it possible that it's stuck in some kind of reset loop?

    Thanks,
