WAV Play Experiment — Parallax Forums


JonnyMac Posts: 9,104
edited 2021-02-05 16:30 in Propeller 2

I was inspired by @Baggers to create easy code for playing a WAV file from RAM -- the experimental result is attached. It will play 8- or 16-bit, mono or stereo files. Now that it all seems to be working, I am moving it into a proper object (which will use one cog) so that I can change stereo volume and playback speed on the fly.

While we don't have structures in Spin, it is nice that with the P2 we no longer have to worry about word or long alignment, so a group of variables can be kept in their declared order. WAV files have a 44-byte header that mixes longs and words, so we can declare matching variables and simply copy the header into place with a single move. Neat. I used bytemove to reinforce the idea of the 44-byte header (wordmove and longmove work, too).

Here are the header variables (notice the size mix):

  ' WAV header structure
  ' -- do not change order of these variables

  long  chunkId
  long  chunkSize
  long  format
  long  subChunk1Id
  long  subChunk1Size
  word  audioFormat
  word  numChannels
  long  sampleRate
  long  byteRate
  word  blockAlign
  word  bitsPerSample
  long  subChunk2Id
  long  subChunk2Size 

Copying the header from the embedded WAV is this easy now:

  bytemove(@chunkId, p_dat, 44)                                 ' copy WAV header to variables  

The program sends its output to PST (Parallax Serial Terminal); ANSI terminals seem to work, too (change the T_TYPE constant).

Code updated 05 FEB 2021. Includes bug fix (thanks, Chip!) and simplification of the volume control code.

Comments

  • That's awesome @JonnyMac :D

  • FlexProp may not agree with Propeller Tool when byte-moving the WAV header structure...

    Copying the variables one at a time results in a good display (though the WAV file does not play correctly).

    Two sounds can be heard (one loud, one quiet), but each is just a "PHHHT"-like noise.

  • JonnyMac Posts: 9,104
    edited 2021-02-04 03:07

    The ZERP file has a 1kHz sine on one channel and a 440Hz sine on the other -- the 16-bit version should sound very clear. Perhaps @ersmith can sort out why this program doesn't work in FlexProp -- it's really pretty straightforward. I only tested with Propeller Tool, but did check the ANSI output using PuTTY.

    The audio playback gets what it needs from the header without using any of the displayed global variables. The object I'm writing is based on that inline assembly.

  • JonnyMac Posts: 9,104
    edited 2021-02-04 03:37

    I thought I'd have a look to see if I could help out, but I was only reminded why I have been so turned off by Spin compilers. How does a line of code that seems like it could translate to this:

    __byte_move     rdbyte    tmp, arg02
                    wrbyte    tmp, arg01
                    add       arg02, #1
                    add       arg01, #1
                    djnz      arg03, #__byte_move
                    ret
    

    ... become this?

    __system__bytemove
            mov     _var01, arg01
            cmps    arg01, arg02 wcz
     if_ae  jmp     #LR__0142
            loc     pa,     #(@LR__0136-@LR__0135)
            call    #FCACHE_LOAD_
    LR__0135
            cmps    arg03, #3 wcz
     if_be  jmp     #LR__0137
            rdlong  _var02, arg02
            wrlong  _var02, arg01
            add     arg01, #4
            add     arg02, #4
            sub     arg03, #4
            jmp     #LR__0135
    LR__0136
    LR__0137
            mov     _var03, arg03 wz
     if_e   jmp     #LR__0148
            loc     pa,     #(@LR__0140-@LR__0138)
            call    #FCACHE_LOAD_
    LR__0138
            rep     @LR__0141, _var03
    LR__0139
            rdbyte  _var02, arg02
            wrbyte  _var02, arg01
            add     arg01, #1
            add     arg02, #1
    LR__0140
    LR__0141
            jmp     #LR__0148
    LR__0142
            add     arg01, arg03
            add     arg02, arg03
            mov     _var04, arg03 wz
     if_e   jmp     #LR__0147
            loc     pa,     #(@LR__0145-@LR__0143)
            call    #FCACHE_LOAD_
    LR__0143
            rep     @LR__0146, _var04
    LR__0144
            sub     arg01, #1
            sub     arg02, #1
            rdbyte  _var02, arg02
            wrbyte  _var02, arg01
    LR__0145
    LR__0146
    LR__0147
    LR__0148
            mov     result1, _var01
    __system__bytemove_ret
            ret
    

    The truth is, I don't care because I'm not using anything but official Parallax tools at the moment, but those using non-Parallax compilers might want to know why it takes so much code for such a simple function.

  • Based on the example by dgately, it looks like FlexProp is reordering the variables so that longs come first, followed by words. There's no other context to tell it not to.

    This kind of variable reordering is more "normal" in the world of compilers. It's for this reason that the struct keyword was invented: it means "don't mess with this."

    https://stackoverflow.com/questions/9486364/why-cant-c-compilers-rearrange-struct-members-to-eliminate-alignment-padding#9487640

  • It's too bad computing has diluted the meaning of terminology. I went down the trail of researching record as opposed to structure. But even with records usually being more rigid in their implementation, they still come with system-dependent padding and alignment concerns, so you can't be sure what you ultimately get.

    A data block might still mean an exact and unambiguous representation of how variables are laid out in memory. Consistent in the order they're declared and their size and their absence of secret padding... but even then I'm not so sure.

  • Wuerfel_21 Posts: 5,053
    edited 2021-02-04 12:13

    @whicker said:
    Based on the example by dgately, it looks like Flexprop is reordering the variables so that longs are first, followed by words. There's no other context to tell it not to.

    This is intended behavior for Spin1, but apparently not for Spin2.

    @JonnyMac said:
    I thought I'd have a look to see if I could help out, but I was only reminded why I have been so turned off by Spin compilers. How does a line of code that seems like it could translate to this:
    [...]
    ... become this?
    [...]
    The truth is, I don't care because I'm not using anything but official Parallax tools at the moment, but those using non-Parallax compilers might want to know why it takes so much code for such a simple function.

    Well, it's simple:

    • If you wrote bytemove like you did there, it wouldn't work in all cases. Read the documentation or the Spin interpreter source: it copies upward or downward depending on where the source and destination are, to avoid overwriting something that it will later want to read.
    • It's faster that way: the function as generated by flexspin will be vastly faster than a simple rdbyte/wrbyte loop for big bytemoves. (Smaller constant-length bytemoves are instead inlined into the caller.)

    So pack your prejudice away for today.

  • Wow, so glad I use PASM. It's like the early days of C compilers: they were terrible compared to pure asm, and it took many, many years before they became good.

  • FlexProp is optimized for speed, not for simplicity of the generated code. In the case of bytemove, it has to handle both upwards and downwards moves (that's a Spin language requirement). For the P2 it takes advantage of the ability to do long moves to speed up the copy in a common case. Finally there's also the FCACHE code that copies the loop into local memory so it can run even faster. All of this adds complication. Yes, if we didn't care about correctness or about speed we could do the straightforward translation you proposed @JonnyMac .

    The WAV file problem is, as @Wuerfel_21 and @whicker figured out, due to the object variables being reordered the same way as in Spin1. There are some distinct performance advantages to this (unaligned reads/writes are possible on P2, but slower). If enough objects rely on variable ordering, I'll revisit this.

    @Baggers: FlexProp isn't for everyone, but performance wise it's not "terrible". The code is complicated and "ugly" precisely because it's trying to be fast.

  • Or to bring the topic back to audio playback: I think P2 is powerful enough to run Tremor, the integer-only Ogg Vorbis decoder. Someone should try that.

  • JonnyMac Posts: 9,104
    edited 2021-02-04 16:37

    @ersmith said:
    The WAV file problem is, as @Wuerfel_21 and @whicker figured out, due to the object variables being reordered the same way as in Spin1.

    That only explains the broken display of the header information, which Dennis sorted on his own. The playback code is atomic and not reliant on the variables used in the display.

  • JonnyMac Posts: 9,104
    edited 2021-02-04 17:16

    @dgately I know why audio playback is not working: the Spin compiler stores the system frequency (clkfreq) at hub address $44, while FlexProp puts the system frequency at hub address 20.

    Here's the fix. I'm asking you to apply it manually to verify that what I did works (I do get proper audio from FlexProp after these changes):
    -- Add a local variable to the play_wav() method called systix.
    -- Add this line before the inline pasm code
    systix := clkfreq

    -- Look for this line:
    rdlong smpltix, #$44

    ...which you'll find about six lines above the .fix_level label. Change that line to this:
    mov smpltix, systix

    I will try to figure out an elegant way to deal with this incompatibility issue with my WAV player object.

  • Wuerfel_21 Posts: 5,053
    edited 2021-02-04 17:19

    Yeah, looks fine to me, so that not working is probably an actual bug to add to the pile. (Written before the post above.)

    Unrelatedly: SAL is not the way to scale a sample. Use SHL. Or, if you want proper levels, MUL/MULS or ROLBYTE+SAR.

  • JonnyMac Posts: 9,104
    edited 2021-02-04 18:03

    Did I misunderstand SAL? It appears to be a left shift that fills with the original bit 0, so $7F is promoted to $7FFF instead of $7F00 as with SHL (assuming a shift value of 8). I've tried both and there is no discernible difference in the audio.

    Assembly is not my core strength, and as most of my code is for public consumption I tend to keep things very simple (i.e., public code is not highly optimized). This is the code in question:

    .mono8          rdbyte    ls, p_wav                             ' read sample
                    add       p_wav, #1                             ' point to next
                    subs      ls, #$80                              ' convert to signed
                    sal       ls, #7                                ' expand sample, make 1v p-p
                    mov       rs, ls                                ' copy to right channel
                    jmp       #.set_volume
    

    My thoughts: This is an unsigned byte so subtracting $80 gets us to a signed long. The SAL #7 promotes it to 16 bits and then divides by two; this gives a maximum of 1v peak-to-peak from the DAC into the external amplifier.

    You have a lot more experience with PASM and A/V coding than I have. I'd love to see how you would do this. I put up this experimental code so that it could be improved before being folded into a general-purpose object.

  • You understood SAL correctly; it's just that doing it that way effectively halves the resolution because, for example, $71 becomes $71FF and $72 becomes $7200, which are almost the same value.

    I think this would be more correct (untested, though):

    .mono8          rdbyte    ls, p_wav                             ' read sample
                    add       p_wav, #1                             ' point to next
                    subs      ls, #$80                              ' convert to signed
                    muls      ls, #$0101                            ' expand sample
                    sar       ls, #1                                ' make 1v p-p
                    mov       rs, ls                                ' copy to right channel
                    jmp       #.set_volume
    
  • @JonnyMac said:
    @dgately I know why audio playback is not working: the Spin compiler stores the system frequency (clkfreq) at hub address $44, while FlexProp puts the system frequency at hub address 20.

    The fix (I'm asking you to do this manually to verify what I did works [I do have proper audio from FlexProp after these changes]).
    -- Add a local variable to the play_wav() method called systix.
    -- Add this line before the inline pasm code
    systix := clkfreq

    -- Look for this line:
    rdlong smpltix, #$44

    ...which you'll find about six lines above the .fix_level label. Change that line to this:
    mov smpltix, systix

    I will try to figure out an elegant way to deal with this incompatibility issue with my WAV player object.

    Yes, this gets WavFile3 (zerp-16.wav) to play, and it sounds the same as playing that file from my Mac desktop. The other three files are distorted and either very loud (01_milshot.wav) or very quiet (zerp-8 & 8.wav).

  • Wuerfel_21 Posts: 5,053
    edited 2021-02-04 18:17

    You might also want to add support for A-law compression (logarithmic PCM). It's 8 bits per sample, but it sounds so much better than straight 8 bit PCM and takes little code to decompress.
    Here's the code for that (in P1 ASM, can be optimized a bit for P2):

                  '' Decompress A-law
                  '' Compressed sample is in atmp1
                  '' output is in sfxsample, scale determined by ALAW_BASESHL (= 19 for 31 bit, would be 3 for 15 bit)
                  '' alaw_bias is a constant: 1<<(ALAW_BASESHL-1)  
                  '' alaw_leading is a constant: $10<<ALAW_BASESHL
                  xor atmp1,#$55
                  mov sfxsample,atmp1
                  and sfxsample,#$0F ' mantissa isolated
                  shl sfxsample,#ALAW_BASESHL
                  add sfxsample,alaw_bias
                  mov atmp2,atmp1
                  shr atmp2,#4
                  and atmp2,#7 wz ' exponent isolated
            if_nz add sfxsample,alaw_leading
            if_nz sub atmp2,#1
                  shl sfxsample,atmp2
                  test atmp1,#$80 wc
                  negnc sfxsample,sfxsample
    
    
  • @Wuerfel_21 said:
    You understood SAL correctly, it's just that doing it that way effectively halves the resolution, because, for example, $71 becomes $71FF and $72 becomes $7200, which is almost the same value.

    I think this would be more correct (untested though)

    .mono8          rdbyte    ls, p_wav                             ' read sample
                    add       p_wav, #1                             ' point to next
                    subs      ls, #$80                              ' convert to signed
                    muls      ls, #$0101                            ' expand sample
                    sar       ls, #1                                ' make 1v p-p
                    mov       rs, ls                                ' copy to right channel
                    jmp       #.set_volume
    

    Thanks, I will give that a try.

  • JonnyMac Posts: 9,104
    edited 2021-02-04 20:32

    @Wuerfel_21 said:
    You might also want to add support for A-law compression (logarithmic PCM). It's 8 bits per sample, but it sounds so much better than straight 8 bit PCM and takes little code to decompress.
    Here's the code for that (in P1 ASM, can be optimized a bit for P2):

                  '' Decompress A-law
                  '' Compressed sample is in atmp1
                  '' output is in sfxsample, scale determined by ALAW_BASESHL (= 19 for 31 bit, would be 3 for 15 bit)
                  '' alaw_bias is a constant: 1<<(ALAW_BASESHL-1)  
                  '' alaw_leading is a constant: $10<<ALAW_BASESHL
                  xor atmp1,#$55
                  mov sfxsample,atmp1
                  and sfxsample,#$0F ' mantissa isolated
                  shl sfxsample,#ALAW_BASESHL
                  add sfxsample,alaw_bias
                  mov atmp2,atmp1
                  shr atmp2,#4
                  and atmp2,#7 wz ' exponent isolated
            if_nz add sfxsample,alaw_leading
            if_nz sub atmp2,#1
                  shl sfxsample,atmp2
                  test atmp1,#$80 wc
                  negnc sfxsample,sfxsample
    
    

    Again, thank you. The 8-bit uncompressed audio sounds a bit ratty, so I'll give this a try. Even running at 200 MHz (my standard), there are about 4500 system ticks per 44.1 kHz sample period, so there is plenty of time to do a bit of simple decompression. That said, my goal is always simplicity for the sake of teaching and inspiring. That code doesn't look bad, though.

  • JonnyMac Posts: 9,104
    edited 2021-02-05 16:02

    I had a late-night code review with Chip; he helped me track down a bug (I was setting mode bits backward) and simplify the volume control code. It sounds better and looks good on a 'scope (1 V p-p, centered at 1 V). I will move this code into a cog that will allow the user to start and stop playback at will, to put playback on hold for later release, to change left and right volume levels (0% to 100%) on the fly, and to change the playback rate (50% to 200%) on the fly.

    Updated code is in the first post. If you're a FlexProp user, you'll have to make a small adjustment -- it's noted in the file.
