Inline PASM, WOW!

ke4pjw · 2022-01-11 18:59

I am reviewing pages 16 and 17 of the Spin2 manual, and all I can say is "WOW!"

If this works as easily as it appears in the manual, this is amazing. You can take the time-critical parts of your code and write them in PASM, inline within the object. And it appears you can do so easily, even with access to the local variables.

I am excited to optimize some of my objects using this!!

The documentation is, as usual, very terse. Which is good. I need time to digest the syntax but am very excited to try it!

Are flags cleared when entering in-line PASM? Also, regardless of how they are manipulated in the in-line PASM, when returned back to spin, it won't cause problems with the interpreter?

JonnyMac · 2022-01-11 19:11

Inline PASM2 is one of my favorite things about Spin2, and I use it quite a lot.

Caveat: If you want your code to work under Propeller Tool and under FlexProp, you need to access clkfreq as a high-level construct. Why? Official Spin2 and FlexProp store that value in different locations. I ran into this when working on a WAV playback demo. This was for testing; it plays a WAV file that is embedded in the code with the file directive.

pub play_wav(p_wav, level) : result | systix, t1, mode, nsmpls, smpltix, ls, rs

'' Play WAV file from RAM
'' -- p_wav is pointer to embedded file
''    * must be canonical WAV with header
'' -- level is volume control, 0..100 (%)

  systix := clkfreq                                             ' fix FlexProp incompatibility

  org
                mov       t1, p_wav
                add       t1, #20                               ' audio format
                rdbyte    t1, t1
                cmp       t1, #1                        wcz     ' must be 1 (PCM)
    if_ne       jmp       #.done

                mov       mode, #0                              ' wav configuration mode

                mov       t1, p_wav
                add       t1, #22                               ' channels
                rdbyte    t1, t1
                cmp       t1, #2                        wcz     ' max is 2 channels
    if_a        jmp       #.done

                bitz      mode, #1                              ' 0 = mono, 1 = stereo

                mov       t1, p_wav
                add       t1, #34                               ' bits per sample
                rdbyte    t1, t1
                shr       t1, #3                                ' convert to bytes/sample
                cmp       t1, #2                        wcz     ' max is 2 bytes/channel
    if_a        jmp       #.done

                bitz      mode, #0                              ' 0 = 8-bit, 1 = 16-bit

                mov       t1, p_wav
                add       t1, #40
                rdlong    nsmpls, t1                            ' read sub chunk 2 bytes

                mov       t1, p_wav
                add       t1, #32                               ' block align (bytes per sample)
                rdbyte    t1, t1
                qdiv      nsmpls, t1
                getqx     nsmpls

                mov       result, nsmpls                        ' return samples in file

'               rdlong    smpltix, #$44                         ' read clkfreq
                mov       smpltix, systix                       ' fix FlexProp incompatibility

                mov       t1, p_wav
                add       t1, #24                               ' sample rate
                rdlong    t1, t1

                qdiv      smpltix, t1                           ' clkfreq/sample rate
                getqx     smpltix


.fix_level      fge       level, #0                             ' keep 0..100
                fle       level, #100
                mul       level, #$7FFF/100                     ' set scale factor


.start          add       p_wav, #44                            ' point to audio
                getct     t1                                    ' start sample timer


.loop           jmprel    mode
                jmp       #.mono8
                jmp       #.mono16
                jmp       #.stereo8
                jmp       #.stereo16


.mono8          rdbyte    ls, p_wav                             ' read sample
                add       p_wav, #1                             ' point to next
                shl       ls, #8                                ' expand sample, make 1v p-p
                bitnot    ls, #15
                mov       rs, ls                                ' copy to right channel
                jmp       #.set_volume


.mono16         rdword    ls, p_wav                             ' read sample
                add       p_wav, #2                             ' point to next
                mov       rs, ls                                ' copy to right
                jmp       #.set_volume


.stereo8        rdbyte    ls, p_wav                             ' read left sample
                add       p_wav, #1                             ' point to right
                rdbyte    rs, p_wav                             ' read right sample
                add       p_wav, #1                             ' point to next left
                shl       ls, #8                                ' expand sample, make 1v p-p
                bitnot    ls, #15                               ' fix sign bit
                shl       rs, #8                                ' expand sample, make 1v p-p
                bitnot    rs, #15                               ' fix sign bit
                jmp       #.set_volume


.stereo16       rdword    ls, p_wav                             ' read left sample
                add       p_wav, #2                             ' point to right
                rdword    rs, p_wav                             ' read right sample
                add       p_wav, #2                             ' point to next left


.set_volume     muls      ls, level                             ' scale volume
                shr       ls, #16                               ' adjust
                bitnot    ls, #15                               ' center at 1v

                muls      rs, level
                shr       rs, #16
                bitnot    rs, #15


.set_dacs       wypin     ls, #46                               ' *** hard-coded for testing! ***
                wypin     rs, #47

                addct1    t1, smpltix                           ' update sample timer
                waitct1                                         ' let it expire

                djnz      nsmpls, #.loop                        ' next sample


.quiet          wypin     ##$8000, #46                          ' set to midpoint of DAC
                wypin     ##$8000, #47

.done           ret
  end

Full disclosure: I ran into a couple of hiccups that Chip helped me through. It works great now, though.

@ke4pjw said:
Are flags cleared when entering in-line PASM? Also, regardless of how they are manipulated in the in-line PASM, when returned back to spin, it won't cause problems with the interpreter?

I think the answer is yes, but it would be best if @cgracey clarified. I called this...

pub flags() : result

  org
                muxz      result, #%01
                muxc      result, #%10
  end

...from inside a code loop doing various things and the result was always %00.

TonyB_ · 2022-01-11 19:43

@ke4pjw said:
Are flags cleared when entering in-line PASM? Also, regardless of how they are manipulated in the in-line PASM, when returned back to spin, it won't cause problems with the interpreter?

From a study of the spin interpreter code, cleared on entry and don't care on exit.

ersmith · 2022-01-11 20:22

@TonyB_ said:

@ke4pjw said:
Are flags cleared when entering in-line PASM? Also, regardless of how they are manipulated in the in-line PASM, when returned back to spin, it won't cause problems with the interpreter?

From a study of the spin interpreter code, cleared on entry and don't care on exit.

Beware, "cleared on entry" hasn't been guaranteed in any Spin2 documentation, nor is it necessarily true in other compilers (like flexspin). So if you need a flag to be 0, clear it yourself.

"Don't care on exit" is also not guaranteed, but given that we haven't been given any explicit instructions to leave the flags in a particular state (or to save/restore them) it's probably OK to assume this one.
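
As a minimal sketch of "clear it yourself" (the method name contains and its parameters are hypothetical, not taken from the manual or from this thread), the routine below forces C and Z to a known state with MODCZ before any conditional instruction depends on them. If the buffer is empty the loop is skipped, so the final MUXZ sees only the state the routine set up itself, not whatever the caller happened to leave behind.

```spin2
pub contains(p_buf, count, target) : result | b

'' Hypothetical example: returns 1 if byte target (0..255) is in buffer, else 0

  org
                modcz     _clr, _clr            wcz     ' force C=0, Z=0; do not rely on entry state
                tjz       count, #.done                 ' empty buffer: skip loop, Z is still 0
.loop           rdbyte    b, p_buf                      ' read next byte
                add       p_buf, #1                     ' point to next (no flag write)
                cmp       b, target             wz      ' Z=1 on match
    if_nz       djnz      count, #.loop                 ' keep scanning while no match
.done           muxz      result, #1                    ' copy Z into result bit 0
  end
```

The MODCZ costs two cycles and is paid only by code that actually needs a known flag state.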

evanh · 2022-01-11 23:27

@ke4pjw said:
I am reviewing pages 16 and 17 of the Spin2 manual, and all I can say is "WOW!"

If this works as easily as it appears in the manual, this is amazing. You can take the time-critical parts of your code and write them in PASM, inline within the object. And it appears you can do so easily, even with access to the local variables.

Too right! The inline assembly environment hides many of the little pitfalls that assembly programming brings, simply because assembly usually also drops you on the metal. The simplicity of locals all being cogRAM eases learning of assembly for everyone wanting to give it a go.

evanh · 2022-01-11 23:34

It's surprising what can be dabbled with too. A good example is the hubexec FIFO hardware. Inline assembly is 100% cogexec (lutexec I think for FlexSpin), which means the FIFO can be temporarily used (e.g. RDFAST, RFLONG) by the inline assembly without worrying about crashing the Spin2 calling code.
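
A rough sketch of that idea, assuming the inline block runs from cog (or LUT) RAM as described above; the method name sum_longs and its parameters are made up for illustration. It points the FIFO at a hub buffer with RDFAST and then pulls longs with RFLONG instead of issuing individual RDLONGs.

```spin2
pub sum_longs(p_buf, count) : result | x

'' Hypothetical example: sum count longs starting at hub address p_buf

  org
                mov       result, #0                    ' start with a known total
                tjz       count, #.done                 ' nothing to do for an empty buffer
                rdfast    #0, p_buf                     ' start FIFO reads at p_buf (no wrapping)
.loop           rflong    x                             ' next long from the FIFO
                add       result, x
                djnz      count, #.loop
.done           ret
  end
```

Called as total := sum_longs(@buffer, 64), where buffer is whatever long array the caller owns.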

cgracey · 2022-01-12 05:59

I will document flag behavior for in-line PASM.

TonyB_ · 2022-01-12 11:32

Here is the Spin2 interpreter code that calls a PASM routine:

inline
'some lines omitted, C/Z not written
        call    w           'call pasm code (can use pa/pb/ptra/ptrb/stack, C/Z=0)
        mov     pb,y    wc  'restore bytecode ptr
if_c    setq    #16-1       'if inline_pasm, restore local variables to hub
if_c    wrlong  buff,dbase  '
_ret_   mov     ptra,z      'restore ptra

EDIT:
I think inline (in hub RAM) is called by the following code, which clears C,Z:

' Call hub bytecode routine
'
hub_code
        rfbyte  pa          'get function index byte
        getptr  pb          'get updated bytecode pointer
        rdword  v,pa    wcz 'lookup function address
        call    v           'call function in hub, c/z/v[31]=0
resume
_ret_   rdfast  #0,pb       'resume bytecode stream

The RDFAST at the end allows RDFAST & RFxxxx to be used safely by the inline PASM.

ersmith · 2022-01-12 12:21

@cgracey : Please don't tie your hands by documenting that inline asm starts with flags clear. You may come to regret it later -- at the very least it means you can never change that part of the code. If inline assembly needs the flags to be in a particular state, they should explicitly put them in that state. That way it's obvious to the reader what's going on, and if you ever change the interpreter the code will still work.

In practice the cost of getting into inline assembly in the interpreter is already fairly high (high enough that any inline assembly that can't afford to spend 4 cycles clearing the flags itself is probably not going to work correctly in future interpreter releases anyway). In a compiler situation, though, the inline assembly may have 0 overhead (if the code was already running in COG or LUT for some other reason), and then the additional cost of clearing the flags should only be borne by code that really needs it, not by every inline assembly snippet.

evanh · 2022-01-12 12:39

In that case, a getrnd inb wcz should be added before the inline call. Otherwise, some coders will use the incidental state of flags. It happens all the time in programming.

TonyB_ · 2022-01-12 13:02

@evanh said:
In that case, a getrnd inb wcz should be added before the inline call. Otherwise, some coders will use the incidental state of flags. It happens all the time in programming.

Adding two cycles and one long to randomize C/Z for every inline call seems over the top. MODCZ in inline PASM adds two cycles but only when needed.

ersmith · 2022-01-12 13:05

@evanh said:
In that case, a getrnd inb wcz should be added before the inline call. Otherwise, some coders will use the incidental state of flags. It happens all the time in programming.

I have some sympathy with your suggestion, but you can't always stop people from shooting themselves in the foot. Frankly it would be more useful if the compiler could give a warning like "C flag used without being set". Something to add to my todo list, I guess...

rogloh · 2022-01-12 21:02

People who arbitrarily use flags without setting them first need to be taught a hard lesson. I say no warning should be given, and they will have to track down their bug manually.

pik33 · 2022-01-12 21:20

@rogloh said:
People who arbitrarily use flags without setting them first need to be taught a hard lesson. I say no warning should be given, and they will have to track down their bug manually.

P2 makes things somewhat harder. In "normal" CPUs there are sub and cmp. cmp is the kind of sub which doesn't write the result, but writes the flags and writing these flags is the only thing this instruction is intended to do.
In P2 you have to write cmp a,b wcz - cmp a,b is simply nop.

Yes, I already made this mistake.
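
To make the pitfall concrete, here is a small sketch (the method name is_equal is hypothetical). The commented-out line assembles without complaint but leaves the flags untouched, so the conditional MOV after it would act on a stale Z.

```spin2
pub is_equal(a, b) : result

'' Hypothetical example: returns 1 if a == b, else 0

  org
                mov       result, #0
'               cmp       a, b                          ' assembles, but writes no flags
                cmp       a, b                  wz      ' Z=1 when a == b
    if_z        mov       result, #1
  end
```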

rogloh · 2022-01-12 21:44

@pik33 said:
P2 makes things somewhat harder. In "normal" CPUs there are sub and cmp. cmp is the kind of sub which doesn't write the result, but writes the flags and writing these flags is the only thing this instruction is intended to do.
In P2 you have to write cmp a,b wcz - cmp a,b is simply nop.

Yes, I already made this mistake.

Good point, though I bet it won't happen too many more times now. You've now learned about what wc/wz/wcz really do. Admittedly, cmp a,b is a strange instruction to actually exist in the ISA. Not sure why it is there without wc/wz/wcz.

Wuerfel_21 · 2022-01-12 21:50

@rogloh said:

@pik33 said:
P2 makes things somewhat harder. In "normal" CPUs there are sub and cmp. cmp is the kind of sub which doesn't write the result, but writes the flags and writing these flags is the only thing this instruction is intended to do.
In P2 you have to write cmp a,b wcz - cmp a,b is simply nop.

Yes, I already made this mistake.

Good point, though I bet it won't happen too many more times now. You've now learned about what wc/wz/wcz really do. Admittedly, cmp a,b is a strange instruction to actually exist in the ISA. Not sure why it is there without wc/wz/wcz.

MODCZ is another one that really doesn't need flag write bits, but has them anyways.

On P1, CMP is not a separate instruction, but just an alias for SUB d,s NR.

cgracey · 2022-01-14 10:17

By still requiring wc/wz/wcz for the few instructions that always need them, it keeps the rule that all flag-affecting operations will have viewable flag-writes, hopefully in a consistent column, making them easy to visually scan for.

TonyB_ · 2022-01-14 10:23

Chip, what's your view on clearing C,Z before calling inline PASM?

cgracey · 2022-01-14 10:36

@TonyB_ said:
Chip, what's your view on clearing C,Z before calling inline PASM?

I did it because there was no cost in doing it. I just added WCZ to an existing instruction.

It is better to have flags in known states on entry, but I understand ersmith's desire to avoid this, since it would force unnecessary code emission in his compiler.
