Inline PASM, WOW! — Parallax Forums

Inline PASM, WOW!

I am reviewing pages 16 and 17 of the Spin2 manual, and all I can say is "WOW!"

If this works as easily as it appears in the manual, this is amazing. You can take those time critical parts of your code and write them in PASM, inline within the object. And it appears that you can do it easily even allowing access to the local variables.

I am excited to optimize some of my objects using this!!

The documentation is, as usual, very terse. Which is good. I need time to digest the syntax but am very excited to try it!

Are flags cleared when entering in-line PASM? Also, regardless of how they are manipulated in the in-line PASM, when returned back to spin, it won't cause problems with the interpreter?

Comments

  • JonnyMac Posts: 9,102
    edited 2022-01-11 19:32

    Inline PASM2 is one of my favorite things about Spin2, and I use it quite a lot.

    Caveat: If you want your code to work under Propeller Tool and under FlexProp, you need to access clkfreq as a high-level construct. Why? Well, Official Spin2 and FlexProp store that value in different locations. I ran into this when working on a WAV playback demo. This was for testing; it will play a wave file that is embedded in the code with the file directive.

    pub play_wav(p_wav, level) : result | systix, t1, mode, nsmpls, smpltix, ls, rs
    
    '' Play WAV file from RAM
    '' -- p_wav is pointer to embedded file
    ''    * must be canonical WAV with header
    '' -- level is volume control, 0..100 (%)
    
      systix := clkfreq                                             ' fix FlexProp incompatibility
    
      org
                    mov       t1, p_wav
                    add       t1, #20                               ' audio format
                    rdbyte    t1, t1
                    cmp       t1, #1                        wcz     ' must be 1 (PCM)
        if_ne       jmp       #.done
    
                    mov       mode, #0                              ' wav configuration mode
    
                    mov       t1, p_wav
                    add       t1, #22                               ' channels
                    rdbyte    t1, t1
                    cmp       t1, #2                        wcz     ' max is 2 channels
        if_a        jmp       #.done
    
                    bitz      mode, #1                              ' 0 = mono, 1 = stereo
    
                    mov       t1, p_wav
                    add       t1, #34                               ' bits per sample
                    rdbyte    t1, t1
                    shr       t1, #3                                ' convert to bytes/sample
                    cmp       t1, #2                        wcz     ' max is 2 bytes/channel
        if_a        jmp       #.done
    
                    bitz      mode, #0                              ' 0 = 8-bit, 1 = 16-bit
    
                    mov       t1, p_wav
                    add       t1, #40
                    rdlong    nsmpls, t1                            ' read sub chunk 2 bytes
    
                    mov       t1, p_wav
                    add       t1, #32                               ' block align (bytes per sample)
                    rdbyte    t1, t1
                    qdiv      nsmpls, t1
                    getqx     nsmpls
    
                    mov       result, nsmpls                        ' return samples in file
    
    '               rdlong    smpltix, #$44                         ' read clkfreq
                    mov       smpltix, systix                       ' fix FlexProp incompatibility
    
                    mov       t1, p_wav
                    add       t1, #24                               ' sample rate
                    rdlong    t1, t1
    
                    qdiv      smpltix, t1                           ' clkfreq/sample rate
                    getqx     smpltix
    
    
    .fix_level      fge       level, #0                             ' keep 0..100
                    fle       level, #100
                    mul       level, #$7FFF/100                     ' set scale factor
    
    
    .start          add       p_wav, #44                            ' point to audio
                    getct     t1                                    ' start sample timer
    
    
    .loop           jmprel    mode
                    jmp       #.mono8
                    jmp       #.mono16
                    jmp       #.stereo8
                    jmp       #.stereo16
    
    
    .mono8          rdbyte    ls, p_wav                             ' read sample
                    add       p_wav, #1                             ' point to next
                    shl       ls, #8                                ' expand sample, make 1v p-p
                    bitnot    ls, #15
                    mov       rs, ls                                ' copy to right channel
                    jmp       #.set_volume
    
    
    .mono16         rdword    ls, p_wav                             ' read sample
                    add       p_wav, #2                             ' point to next
                    mov       rs, ls                                ' copy to right
                    jmp       #.set_volume
    
    
    .stereo8        rdbyte    ls, p_wav                             ' read left sample
                    add       p_wav, #1                             ' point to right
                    rdbyte    rs, p_wav                             ' read right sample
                    add       p_wav, #1                             ' point to next left
                    shl       ls, #8                                ' expand sample, make 1v p-p
                    bitnot    ls, #15                               ' fix sign bit
                    shl       rs, #8                                ' expand sample, make 1v p-p
                    bitnot    rs, #15                               ' fix sign bit
                    jmp       #.set_volume
    
    
    .stereo16       rdword    ls, p_wav                             ' read left sample
                    add       p_wav, #2                             ' point to right
                    rdword    rs, p_wav                             ' read right sample
                    add       p_wav, #2                             ' point to next left
    
    
    .set_volume     muls      ls, level                             ' scale volume
                    shr       ls, #16                               ' adjust
                    bitnot    ls, #15                               ' center at 1v
    
                    muls      rs, level
                    shr       rs, #16
                    bitnot    rs, #15
    
    
    .set_dacs       wypin     ls, #46                               ' *** hard-coded for testing! ***
                    wypin     rs, #47
    
                    addct1    t1, smpltix                           ' update sample timer
                    waitct1                                         ' let it expire
    
                    djnz      nsmpls, #.loop                        ' next sample
    
    
    .quiet          wypin     ##$8000, #46                          ' set to midpoint of DAC
                    wypin     ##$8000, #47
    
    .done           ret
      end
    

    Full disclosure: I ran into a couple of hiccups that Chip helped me through. It works great now, though.

    Are flags cleared when entering in-line PASM? Also, regardless of how they are manipulated in the in-line PASM, when returned back to spin, it won't cause problems with the interpreter?

    I think the answer is yes, but it would be best if @cgracey clarified. I called this...

    pub flags() : result
    
      org
                    muxz      result, #%01
                    muxc      result, #%10
      end
    

    ...from inside a code loop doing various things and the result was always %00.

  • @ke4pjw said:
    Are flags cleared when entering in-line PASM? Also, regardless of how they are manipulated in the in-line PASM, when returned back to spin, it won't cause problems with the interpreter?

    From a study of the spin interpreter code, cleared on entry and don't care on exit.

  • @TonyB_ said:

    @ke4pjw said:
    Are flags cleared when entering in-line PASM? Also, regardless of how they are manipulated in the in-line PASM, when returned back to spin, it won't cause problems with the interpreter?

    From a study of the spin interpreter code, cleared on entry and don't care on exit.

    Beware, "cleared on entry" hasn't been guaranteed in any Spin2 documentation, nor is it necessarily true in other compilers (like flexspin). So if you need a flag to be 0, clear it yourself.
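
    A minimal sketch of "clear it yourself", offered only as an illustration (the method name flags_cleared is hypothetical, and nothing here comes from the Spin2 documentation): the flag test from earlier in the thread can start with a MODCZ so that it no longer depends on whatever state the caller left behind.

    ```spin2
    pub flags_cleared() : result

      org
                    modcz     _clr, _clr              wcz     ' force C = 0 and Z = 0
                    muxz      result, #%01                    ' bit 0 := Z
                    muxc      result, #%10                    ' bit 1 := C
      end
    ```

    With the explicit clear the method should return %00 whichever compiler built it, at the cost of one extra long in the inline block.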

    "Don't care on exit" is also not guaranteed, but given that we haven't been given any explicit instructions to leave the flags in a particular state (or to save/restore them) it's probably OK to assume this one.

  • evanh Posts: 15,915

    @ke4pjw said:
    I am reviewing pages 16 and 17 of the Spin2 manual, and all I can say is "WOW!"

    If this works as easily as it appears in the manual, this is amazing. You can take those time critical parts of your code and write them in PASM, inline within the object. And it appears that you can do it easily even allowing access to the local variables.

    Too right! The environment with inline assembly hides many of the little pitfalls that assembly programming brings just because it usually also drops you on the metal. The simplicity of locals all being cogRAM eases learning of assembly for everyone wanting to give it a go.

  • evanh Posts: 15,915
    edited 2022-01-12 00:14

    It's surprising what can be dabbled with too. A good example is the hubexec FIFO hardware. Inline assembly is 100% cogexec (lutexec I think for FlexSpin), which means the FIFO can be temporarily used (eg: RDFAST, RFLONG) by the inline assembly without worrying about crashing the Spin2 calling code.

  • cgracey Posts: 14,152

    I will document flag behavior for in-line PASM.

  • TonyB_ Posts: 2,178
    edited 2022-01-12 12:07

    Here is the Spin2 interpreter code that calls a PASM routine:

    inline
    'some lines omitted, C/Z not written
            call    w           'call pasm code (can use pa/pb/ptra/ptrb/stack, C/Z=0)
            mov     pb,y    wc  'restore bytecode ptr
    if_c    setq    #16-1       'if inline_pasm, restore local variables to hub
    if_c    wrlong  buff,dbase  '
    _ret    mov     ptra,z      'restore ptra
    

    EDIT:
    I think inline (in hub RAM) is called by the following code, which clears C,Z:

    ' Call hub bytecode routine
    '
    hub_code
            rfbyte  pa          'get function index byte
            getptr  pb          'get updated bytecode pointer
            rdword  v,pa    wcz 'lookup function address
            call    v           'call function in hub, c/z/v[31]=0
    resume
    _ret_   rdfast  #0,pb       'resume bytecode stream
    

    The RDFAST at the end allows RDFAST & RFxxxx to be used safely by the inline PASM.
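
    A rough sketch of what that permits, assuming the Propeller Tool interpreter behaviour shown above (FlexProp may behave differently). The names sum_longs, p_buf, n and t are hypothetical:

    ```spin2
    pub sum_longs(p_buf, n) : result | t

    '' Sum n longs starting at hub address p_buf

      org
                    tjz       n, #.done                       ' nothing to do (REP with a count of 0 would repeat forever)
                    rdfast    #0, p_buf                       ' start the FIFO reading at p_buf
                    rep       #2, n                           ' repeat the next two instructions n times
                    rflong    t                               ' fetch the next long through the FIFO
                    add       result, t                       ' accumulate
    .done           ret
      end
    ```

    The inline code leaves the FIFO pointing somewhere in the buffer, which is harmless here because the interpreter reissues RDFAST before fetching the next bytecode.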

  • @cgracey : Please don't tie your hands by documenting that inline asm starts with flags clear. You may come to regret it later -- at the very least it means you can never change that part of the code. If inline assembly needs the flags to be in a particular state, they should explicitly put them in that state. That way it's obvious to the reader what's going on, and if you ever change the interpreter the code will still work.

    In practice the cost of getting into inline assembly in the interpreter is already fairly high (high enough that any inline assembly that can't afford to spend 4 cycles clearing the flags itself is probably not going to work correctly in future interpreter releases anyway). In a compiler situation, though, the inline assembly may have 0 overhead (if the code was already running in COG or LUT for some other reason), and then the additional cost of clearing the flags should only be borne by code that really needs it, not by every inline assembly snippet.

  • evanh Posts: 15,915
    edited 2022-01-12 12:41

    In that case, a getrnd inb wcz should be added before the inline call. Otherwise, some coders will use the incidental state of flags. It happens all the time in programming.

  • @evanh said:
    In that case, a getrnd inb wcz should be added before the inline call. Otherwise, some coders will use the incidental state of flags. It happens all the time in programming.

    Adding two cycles and one long to randomize C/Z for every inline call seems over the top. MODCZ in inline PASM adds two cycles but only when needed.

  • @evanh said:
    In that case, a getrnd inb wcz should be added before the inline call. Otherwise, some coders will use the incidental state of flags. It happens all the time in programming.

    I have some sympathy with your suggestion, but you can't always stop people from shooting themselves in the foot. Frankly it would be more useful if the compiler could give a warning like "C flag used without being set". Something to add to my todo list, I guess...

  • People who arbitrarily use flags without setting them first need to be taught a hard lesson. I say no warning should be given, and they will have to track down their bug manually. :wink:

  • pik33 Posts: 2,366
    edited 2022-01-12 21:22

    @rogloh said:
    People who arbitrarily use flags without setting them first need to be taught a hard lesson. I say no warning should be given, and they will have to track down their bug manually. :wink:

    P2 makes things somewhat harder. In "normal" CPUs there are sub and cmp. cmp is the kind of sub which doesn't write the result, but writes the flags and writing these flags is the only thing this instruction is intended to do.
    In P2 you have to write cmp a,b wcz - cmp a,b is simply nop.

    Yes, I already made this mistake.
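
    To illustrate the difference, a small sketch with a hypothetical method name (same), not tested on hardware:

    ```spin2
    pub same(a, b) : result

      org
                    cmp       a, b                            ' no wc/wz/wcz: flags untouched, effectively a NOP
                    cmp       a, b                    wz      ' Z = 1 when a == b
                    wrz       result                          ' result := Z (1 if equal, else 0)
      end
    ```

    Only the second CMP has any effect; the first assembles without complaint but changes nothing.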

  • rogloh Posts: 5,786
    edited 2022-01-12 21:44

    @pik33 said:
    P2 makes things somewhat harder. In "normal" CPUs there are sub and cmp. cmp is the kind of sub which doesn't write the result, but writes the flags and writing these flags is the only thing this instruction is intended to do.
    In P2 you have to write cmp a,b wcz - cmp a,b is simply nop.

    Yes, I already made this mistake.

    Good point, though I bet it won't happen too many more times now. :smile: You've now learned about what wc/wz/wcz really do. Admittedly, cmp a,b is a strange instruction to actually exist in the ISA. Not sure why it is there without wc/wz/wcz.

  • @rogloh said:

    @pik33 said:
    P2 makes things somewhat harder. In "normal" CPUs there are sub and cmp. cmp is the kind of sub which doesn't write the result, but writes the flags and writing these flags is the only thing this instruction is intended to do.
    In P2 you have to write cmp a,b wcz - cmp a,b is simply nop.

    Yes, I already made this mistake.

    Good point, though I bet it won't happen too many more times now. :smile: You've now learned about what wc/wz/wcz really do. Admittedly, cmp a,b is a strange instruction to actually exist in the ISA. Not sure why it is there without wc/wz/wcz.

    MODCZ is another one that really doesn't need flag write bits, but has them anyways.

    On P1, CMP is not a separate instruction, but just an alias for SUB d,s NR.

  • cgracey Posts: 14,152
    edited 2022-01-14 10:18

    By still requiring wc/wz/wcz for the few instructions that always need them, it keeps the rule that all flag-affecting operations will have viewable flag-writes, hopefully in a consistent column, making them easy to visually scan for.

  • Chip, what's your view on clearing C,Z before calling inline PASM?

  • cgracey Posts: 14,152

    @TonyB_ said:
    Chip, what's your view on clearing C,Z before calling inline PASM?

    I did it because there was no cost in doing it. I just added WCZ to an existing instruction.

    It is better to have flags in known states on entry, but I understand ersmith's desire to avoid this, since it would force unnecessary code emission in his compiler.
