Inline PASM, WOW!

in Propeller 2
I am reviewing pages 16 and 17 of the Spin2 manual, and all I can say is "WOW!"
If this works as easily as it appears in the manual, this is amazing. You can take the time-critical parts of your code and write them in PASM, inline within the object. And it appears that you can do it easily, even with access to the local variables.
I am excited to optimize some of my objects using this!!
The documentation is, as usual, very terse. Which is good. I need time to digest the syntax but am very excited to try it!
Are flags cleared when entering in-line PASM? Also, regardless of how they are manipulated in the in-line PASM, will their state cause problems with the interpreter on return to Spin?
Comments
Inline PASM2 is one of my favorite things about Spin2, and I use it quite a lot.
Caveat: If you want your code to work under Propeller Tool and under FlexProp, you need to access clkfreq as a high-level construct. Why? Well, Official Spin2 and FlexProp store that value in different locations. I ran into this when working on a WAV playback demo. This was for testing; it will play a wave file that is embedded in the code with the file directive.
pub play_wav(p_wav, level) : result | systix, t1, mode, nsmpls, smpltix, ls, rs

'' Play WAV file from RAM
'' -- p_wav is pointer to embedded file
''    * must be canonical WAV with header
'' -- level is volume control, 0..100 (%)

  systix := clkfreq                               ' fix FlexProp incompatibility

  org
                mov       t1, p_wav
                add       t1, #20                 ' audio format
                rdbyte    t1, t1
                cmp       t1, #1            wcz   ' must be 1 (PCM)
  if_ne         jmp       #.done

                mov       mode, #0                ' wav configuration mode

                mov       t1, p_wav
                add       t1, #22                 ' channels
                rdbyte    t1, t1
                cmp       t1, #2            wcz   ' max is 2 channels
  if_a          jmp       #.done
                bitz      mode, #1                ' 0 = mono, 1 = stereo

                mov       t1, p_wav
                add       t1, #34                 ' bits per sample
                rdbyte    t1, t1
                shr       t1, #3                  ' convert to bytes/sample
                cmp       t1, #2            wcz   ' max is 2 bytes/channel
  if_a          jmp       #.done
                bitz      mode, #0                ' 0 = 8-bit, 1 = 16-bit

                mov       t1, p_wav
                add       t1, #40
                rdlong    nsmpls, t1              ' read sub chunk 2 bytes
                mov       t1, p_wav
                add       t1, #32                 ' block align (bytes per sample)
                rdbyte    t1, t1
                qdiv      nsmpls, t1
                getqx     nsmpls
                mov       result, nsmpls          ' return samples in file

'               rdlong    smpltix, #$44           ' read clkfreq
                mov       smpltix, systix         ' fix FlexProp incompatibility
                mov       t1, p_wav
                add       t1, #24                 ' sample rate
                rdlong    t1, t1
                qdiv      smpltix, t1             ' clkfreq/sample rate
                getqx     smpltix

  .fix_level    fge       level, #0               ' keep 0..100
                fle       level, #100
                mul       level, #$7FFF/100       ' set scale factor

  .start        add       p_wav, #44              ' point to audio
                getct     t1                      ' start sample timer

  .loop         jmprel    mode
                jmp       #.mono8
                jmp       #.mono16
                jmp       #.stereo8
                jmp       #.stereo16

  .mono8        rdbyte    ls, p_wav               ' read sample
                add       p_wav, #1               ' point to next
                shl       ls, #8                  ' expand sample, make 1v p-p
                bitnot    ls, #15
                mov       rs, ls                  ' copy to right channel
                jmp       #.set_volume

  .mono16       rdword    ls, p_wav               ' read sample
                add       p_wav, #2               ' point to next
                mov       rs, ls                  ' copy to right
                jmp       #.set_volume

  .stereo8      rdbyte    ls, p_wav               ' read left sample
                add       p_wav, #1               ' point to right
                rdbyte    rs, p_wav               ' read right sample
                add       p_wav, #1               ' point to next left
                shl       ls, #8                  ' expand sample, make 1v p-p
                bitnot    ls, #15                 ' fix sign bit
                shl       rs, #8                  ' expand sample, make 1v p-p
                bitnot    rs, #15                 ' fix sign bit
                jmp       #.set_volume

  .stereo16     rdword    ls, p_wav               ' read left sample
                add       p_wav, #2               ' point to right
                rdword    rs, p_wav               ' read right sample
                add       p_wav, #2               ' point to next left

  .set_volume   muls      ls, level               ' scale volume
                shr       ls, #16                 ' adjust
                bitnot    ls, #15                 ' center at 1v
                muls      rs, level
                shr       rs, #16
                bitnot    rs, #15

  .set_dacs     wypin     ls, #46                 ' *** hard-coded for testing! ***
                wypin     rs, #47

                addct1    t1, smpltix             ' update sample timer
                waitct1                           ' let it expire

                djnz      nsmpls, #.loop          ' next sample

  .quiet        wypin     ##$8000, #46            ' set to midpoint of DAC
                wypin     ##$8000, #47

  .done         ret
  end
Full disclosure: I ran into a couple of hiccups that Chip helped me through. It works great now, though.
I think the answer is yes, but it would be best if @cgracey clarified. I called this...
pub flags() : result

  org
                muxz      result, #%01
                muxc      result, #%10
  end
...from inside a code loop doing various things and the result was always %00.
From a study of the Spin2 interpreter code: the flags are cleared on entry, and their state on exit doesn't matter.
Beware, "cleared on entry" hasn't been guaranteed in any Spin2 documentation, nor is it necessarily true in other compilers (like flexspin). So if you need a flag to be 0, clear it yourself.
"Don't care on exit" is also not guaranteed, but given that we haven't been given any explicit instructions to leave the flags in a particular state (or to save/restore them) it's probably OK to assume this one.
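For what it's worth, clearing the flags yourself costs a single instruction. Here is a minimal sketch that reuses the MUXZ/MUXC readback from the flags() test above; the method name flags_cleared is made up for illustration.

```spin2
pub flags_cleared() : result

'' Hypothetical example: force C and Z to 0 instead of relying on the caller

  org
                modcz     _clr, _clr        wcz   ' C = 0 and Z = 0, whatever state we were entered with
                muxz      result, #%01            ' copy Z into bit 0
                muxc      result, #%10            ' copy C into bit 1
  end
```

With the MODCZ in place, the method should return %00 no matter how the interpreter or compiler enters the inline block.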
Too right! The environment with inline assembly hides many of the little pitfalls that assembly programming brings, just because it usually also drops you on the metal. The simplicity of locals all being in cogRAM eases learning of assembly for everyone wanting to give it a go.
It's surprising what can be dabbled with too. A good example is the hubexec FIFO hardware. Inline assembly is 100% cogexec (lutexec, I think, for FlexSpin), which means the FIFO can be temporarily used (e.g. RDFAST, RFLONG) by the inline assembly without worrying about crashing the calling Spin2 code.
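As a rough illustration of borrowing the FIFO, here is a sketch that sums a block of longs from hub RAM with RDFAST and RFLONG. The method name sum_longs and its parameters are invented for this example, and it assumes n is at least 1 (DJNZ would wrap around if n were 0).

```spin2
pub sum_longs(p_buf, n) : result | t

'' Hypothetical example: sum n longs starting at hub address p_buf
'' -- assumes n >= 1

  org
                mov       result, #0              ' start from zero, don't rely on the caller
                rdfast    #0, p_buf               ' point the FIFO at p_buf for reading
  .loop         rflong    t                       ' fetch next long from the FIFO
                add       result, t               ' accumulate
                djnz      n, #.loop               ' repeat until n longs are done
  end
```

Usage would be something like total := sum_longs(@buffer, 64), with buffer being a long array declared elsewhere in the object.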
I will document flag behavior for in-line PASM.
Here is the Spin2 interpreter code that calls a PASM routine:
inline          'some lines omitted, C/Z not written
                call    w               'call pasm code (can use pa/pb/ptra/ptrb/stack, C/Z=0)
                mov     pb,y    wc      'restore bytecode ptr
        if_c    setq    #16-1           'if inline_pasm, restore local variables to hub
        if_c    wrlong  buff,dbase      '
        _ret_   mov     ptra,z          'restore ptra

EDIT: I think inline (in hub RAM) is called by the following code, which clears C,Z:

' Call hub bytecode routine
'
hub_code        rfbyte  pa              'get function index byte
                getptr  pb              'get updated bytecode pointer
                rdword  v,pa    wcz     'lookup function address
                call    v               'call function in hub, c/z/v[31]=0
resume  _ret_   rdfast  #0,pb           'resume bytecode stream
The RDFAST at the end allows RDFAST & RFxxxx to be used safely by the inline PASM.
@cgracey : Please don't tie your hands by documenting that inline asm starts with flags clear. You may come to regret it later -- at the very least it means you can never change that part of the code. If inline assembly needs the flags to be in a particular state, it should explicitly put them in that state. That way it's obvious to the reader what's going on, and if you ever change the interpreter the code will still work.
In practice the cost of getting into inline assembly in the interpreter is already fairly high (high enough that any inline assembly that can't afford to spend 4 cycles clearing the flags itself is probably not going to work correctly in future interpreter releases anyway). In a compiler situation, though, the inline assembly may have 0 overhead (if the code was already running in COG or LUT for some other reason), and then the additional cost of clearing the flags should only be borne by code that really needs it, not by every inline assembly snippet.
In that case, a getrnd inb wcz should be added before the inline call. Otherwise, some coders will use the incidental state of flags. It happens all the time in programming.
Adding two cycles and one long to randomize C/Z for every inline call seems over the top. MODCZ in inline PASM adds two cycles but only when needed.
I have some sympathy with your suggestion, but you can't always stop people from shooting themselves in the foot. Frankly it would be more useful if the compiler could give a warning like "C flag used without being set". Something to add to my todo list, I guess...
People who arbitrarily use flags without setting them first need to be taught a hard lesson. I say no warning should be given, and they will have to track down their bug manually.
P2 makes things somewhat harder. In "normal" CPUs there are sub and cmp. cmp is the kind of sub which doesn't write the result, but writes the flags and writing these flags is the only thing this instruction is intended to do.
In P2 you have to write cmp a,b wcz - cmp a,b is simply nop.
Yes, I already made this mistake.
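To make the pitfall concrete, here is a small sketch of a comparison done in inline PASM. The method name is_below is made up for this example; the point is only that the WC effect has to be written out for the CMP to do anything.

```spin2
pub is_below(a, b) : result

'' Hypothetical example: returns 1 if a < b (unsigned), else 0

  org
                cmp       a, b              wc    ' WC is required; without it no flag is written
                wrc       result                  ' result := C (1 if a < b, else 0)
  end
```

If the wc is dropped, the CMP changes nothing and WRC just reports whatever C happened to hold on entry.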
Good point, though I bet it won't happen too many more times now.
You've now learned about what wc/wz/wcz really do. Admittedly, cmp a,b is a strange instruction to actually exist in the ISA. Not sure why it is there without wc/wz/wcz.
MODCZ is another one that really doesn't need flag write bits, but has them anyways.
On P1, CMP is not a separate instruction, but just an alias for SUB d,s NR.
By still requiring wc/wz/wcz for the few instructions that always need them, it keeps the rule that all flag-affecting operations will have viewable flag-writes, hopefully in a consistent column, making them easy to visually scan for.
Chip, what's your view on clearing C,Z before calling inline PASM?
I did it because there was no cost in doing it. I just added WCZ to an existing instruction.
It is better to have flags in known states on entry, but I understand ersmith's desire to avoid this, since it would force unnecessary code emission in his compiler.