Paula (Amiga) inspired audio driver [0.95 - envelopes] — Parallax Forums

pik33 Posts: 2,366
edited 2023-08-25 08:55 in Propeller 2

0.95 beta attached (this post originally announced the first working alpha, 0.08).

Not optimal for time or memory use; optimization is a TODO for the next versions. Earlier versions also had a problem which could make the driver stop working after about 1000 seconds due to an internal counter overflow. This is solved now.

Paula is the Amiga's 4-channel, sample-based audio chip. Every channel can have its own sample rate. This means:

  • no resampling: no interpolation or filtering, and no aliases added by resampling. Instead, every sample is output "as is"
  • if the sample is a single-period waveform (for chiptunes) or contains an exact integer number of wave periods, the aliases introduced by Paula (because of the low sample rate) are harmonics. This means they add to the sound rather than being annoying, alien, non-harmonic noise

So, the goal was to make a synthesizer which can do exactly that, and at exactly the original Paula sample rate, so it can be easily used for Amiga modules.

The problem was, of course, computation time. At a sample rate of about 3.5 MHz, the program has to determine whether a new sample from each channel needs to be output and, if so, retrieve that sample from hub memory. At 320 MHz I have only 90 clocks for this. It is simply not possible: one RDxxx can take 17 cycles, and if all 4 channels have to be updated at the same time, much more than 90 clocks are needed to compute this.

And the P2 cog has only 1 streamer and 1 FIFO.

The solution is: the Amiga uses low sample rates. Even at 35 kHz there are about a hundred Paula cycles, or 9000 P2 clocks, between samples. This means I have a lot of time to compute them. The 3.5 MHz procedure only has to output the sample at the proper time.
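The clock budget described above can be checked with a few lines of arithmetic. This is only a sketch: the exact Paula clock value is an assumption (the PAL figure), since the post just says "about 3.5 MHz", and the function names are made up for illustration.

```python
# Assumption: PAL Paula clock. The post only says "about 3.5 MHz".
PAULA_CLOCK_HZ = 3_546_895
# From the post: 90 P2 clocks per Paula tick (roughly 320 MHz system clock).
CLOCKS_PER_TICK = 90


def p2_clock_hz(paula_hz=PAULA_CLOCK_HZ, ratio=CLOCKS_PER_TICK):
    """System clock needed to get `ratio` P2 clocks per Paula tick."""
    return paula_hz * ratio


def budget(sample_rate_hz, paula_hz=PAULA_CLOCK_HZ, ratio=CLOCKS_PER_TICK):
    """Return (Paula ticks, P2 clocks) available between two samples of one channel."""
    paula_ticks = paula_hz // sample_rate_hz
    return paula_ticks, paula_ticks * ratio


if __name__ == "__main__":
    print(p2_clock_hz())      # system clock in Hz
    print(budget(35_000))     # ticks and clocks between samples at 35 kHz
```

With these assumed numbers the result is close to the "hundred Paula cycles, 9000 P2 clocks" quoted above.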

So the main sample outputting procedure uses interrupts from a DAC channel. It reads the next sample to output and its time from LUT circular buffer. In the meantime the synthesizer computes the sample values and times to output and places them in the buffer.
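The split between the synthesizer (producer) and the interrupt routine (consumer) can be modelled as a circular buffer of (sample, time) pairs. The sketch below is a Python model, not the driver's PASM code; the buffer size of 512 entries and the names `front` and `tail` are assumptions for illustration.

```python
BUF_SIZE = 512  # assumed size of the LUT buffer, in longs


class SampleQueue:
    """Circular buffer holding (sample, output time) pairs in consecutive slots."""

    def __init__(self, size=BUF_SIZE):
        self.buf = [0] * size
        self.size = size
        self.front = 0  # next free slot, advanced by the synthesizer
        self.tail = 0   # next slot to read, advanced by the output routine

    def put(self, sample, time):
        """Synthesizer side: queue one pair, or return False if the buffer is full."""
        nxt = (self.front + 2) % self.size
        if nxt == self.tail:
            return False
        self.buf[self.front] = sample
        self.buf[(self.front + 1) % self.size] = time
        self.front = nxt
        return True

    def get(self):
        """Output side: fetch the next pair, or None if the buffer is empty."""
        if self.tail == self.front:
            return None
        sample = self.buf[self.tail]
        time = self.buf[(self.tail + 1) % self.size]
        self.tail = (self.tail + 2) % self.size
        return sample, time
```

The output side then only has to compare its tick counter with the queued time and write the sample when they match, which is cheap enough to run at the full Paula rate.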

This allowed me to write an enhanced version of a "Paula-like thing". It can play 8- and 16-bit samples (the sample type is selectable per channel) and stereo pan them, in 8 channels.

The driver, with a test sine wave sample, is attached below.

The "official" driver repository: https://github.com/pik33/P2-retromachine/tree/main/Propeller/Audiodriver https://gitlab.com/pik33/P2-retromachine/-/tree/main/Propeller/Audiodriver

This is the part of the project: https://github.com/pik33/P2-retromachine https://gitlab.com/pik33/P2-retromachine


Comments

  • hinv Posts: 1,255

    I've been poking around on your github link. I really like what you are doing there! I too would love to see a P2 retromachine (which is a great name, BTW). One thing I want, though, and I don't know if you have the same desire, is to have everything contained within the retromachine itself. To me, a real retrocomputer would need a local editor and compiler/assembler. Back in the day we used to program on the computer we used... now the computer I use is far too distracting, or maybe it is that I am just far too easily distracted. It would be cool to also have code repository (github or gitlab) access and forum access right from the P2 as well... maybe over USB to a phone. The lack of ability to play videos on the P2 is actually a plus.

    One question, why the 960x540 resolution? Is there some emulation that works well at this res?

  • pik33 Posts: 2,366
    edited 2022-01-02 22:01

    960x540 is simply fullHD/2

    As it is now, the video driver doesn't work at this resolution (where did I leave this 960x540 in the code? I wanted it, but I don't have it yet).

    Edit: found and removed the reference in the header text. The driver has actually maximum resolution of 1024x576 @ 50 Hz (needs 360 MHz) and several other (less than this) to select. At 320 MHz there is 896x496 resolution available @ 50 Hz and 800x480 @ 60 Hz.

    Yes, this should be a self-contained machine with the Basic interpreter (for a start). That's why I used a Pi for kbd/mouse. I also have a MIDI shield and I bought several DB9, 15, 25 type connectors. I am waiting for HDMI breakout boards which should be available tomorrow, then I want to 3D print a box for the machine. I will start a topic for it then.

    I tried to attach Ahle2's tracker player to this "Paula": it (kind of) works, but I don't yet understand when I have to retrigger samples.

    Edit: I actually managed to play a module using this. Too late to play further, the actual code with the module is in Github.

  • rogloh Posts: 5,786
    edited 2022-01-03 04:34

    Nice work @pik33 . I just tried it out.

    I think you can save about 15 P2 instructions (and 30 clock cycles) in your main loop where you branch to your different COG addresses for computing each channel's samples. Instead of writing a value to "cn" which you test and branch on later you can move the address of the branch target directly into cn right away and then jump to it in the end of the time tests (indirect jump).

    Instead of:

                fle     ct,time2 wcz    ' TODO: THIS WILL FAIL AFTER 1210 (or 605? )seconds when overflow
        if_c    mov     cn,#1
                fle     ct,time3 wcz
        if_c    mov     cn,#2
                fle     ct,time4 wcz
    

    do this:

                fle     ct,time2 wcz    ' TODO: THIS WILL FAIL AFTER 1210 (or 605? )seconds when overflow
        if_c    mov     cn,#p202
                fle     ct,time3 wcz
        if_c    mov     cn,#p203
                fle     ct,time4 wcz
                ...(etc)
    
                jmp     cn
    
  • pik33 Posts: 2,366
    edited 2022-01-03 08:53

    Instead of writing a value to "cn" which you test and branch on later you can move the address of the branch target directly into cn right away and then jump to it in the end of the time tests (indirect jump).

    Implemented. I still have to repair the overflow problem. Time counts at the Paula rate, 3.5 MHz, and when it rolls over, this fle sequence will not work.

  • pik33 Posts: 2,366
    edited 2022-01-03 14:50

    0.10.

    Implemented the optimization above

    Optimized ISR (2 nops gained)

    The driver no longer fails when the counter overflows. This cost 17 instructions in the main loop and one in the ISR (1 nop lost; 2 nops left there).

    The tracker player is now fully working. The main file in the attached zip is player.bas. Insert the module name in shared asm section. HDMI at 0, AV board at 8 (audio at P14,P15). HDMI now displays some debug. As the FlexBasic can read files from SD card and the retrocog can get data from FlexProp serial terminal, the tracker player with modules on SD is now possible :)

    Edit: the driver can still fail on the rollover. The bug is yet to be found.

  • pik33 Posts: 2,366

    Corrected (?) - cmp time, #$80000000 wc doesn't set c if time=$80000000. Changed to $7FFFFFFF. Also initial frequencies for channels were set way too low. To make debugging easier I set the rollover to $20000000 instead of $80000000. I am listening to the module and the counter rolled over several times. Maybe the bug is fixed.
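For reference, a common way to compare free-running timestamps across a wrap is to look at the sign of their 32-bit difference instead of comparing the raw values. This is a general technique, not necessarily what the driver does (the fix described above compares against a rollover threshold). The sketch assumes 32-bit counters and that any two timestamps being compared are less than 2**31 ticks apart; the function names are hypothetical.

```python
MASK = 0xFFFFFFFF  # 32-bit counter width


def is_before(a, b):
    """True if timestamp a is earlier than b, even when the counter has wrapped.

    Valid only while a and b are less than 2**31 ticks apart.
    """
    return ((a - b) & MASK) >= 0x80000000


def earliest(times):
    """Index of the earliest timestamp in a list of 32-bit channel times."""
    best = 0
    for i in range(1, len(times)):
        if is_before(times[i], times[best]):
            best = i
    return best
```

For example, with channel times `[0x10, 0xFFFFFFF0, 0x05]` a plain unsigned minimum picks the third entry, while the wrap-aware version picks the second one, which is the time just before the rollover.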

  • evanh Posts: 15,912
    edited 2022-01-04 00:53

    Assuming you're using HDMI monitor, try this for 960x540 mode:

    dvi50_960x540_timing    ' 50 Hz with 32.0 MHz pixel clock
                long   CLK320MHz
                long   320_000_000
                       '_HSyncPolarity___FrontPorch__SyncWidth___BackPorch__Columns
                       '     1 bit        7 bits      8 bits      8 bits    8 bits
                long   (SYNC_NEG<<31) | ( 8<<24) | ( 64<<16) | ( 8<<8 ) | (960/8)
    
                       '_VSyncPolarity___FrontPorch__SyncWidth___BackPorch__Visible
                       '     1 bit       8 bits      3 bits     9 bits   11 bits
                long   (SYNC_NEG<<31) | (1<<23) | (  2<<20) | ( 72<<11) | 540
                long   10 << 8     ' sys-clock to pixel-clock ratio
                long   0
                long   0   ' reserved for CFRQ parameter
    
  • evanh Posts: 15,912
    edited 2022-01-04 01:06

    If you want to adjust the pixel clock to suit a different sysclock then it's as simple as updating the clock frequencies and the ( 72<<11) for vertical blank time.

    For example:

    dvi50_960x540_timing    ' 50 Hz with 30.0 MHz pixel clock
                long   CLK300MHz
                long   300_000_000
                       '_HSyncPolarity___FrontPorch__SyncWidth___BackPorch__Columns
                       '     1 bit        7 bits      8 bits      8 bits    8 bits
                long   (SYNC_NEG<<31) | ( 8<<24) | ( 64<<16) | ( 8<<8 ) | (960/8)
    
                       '_VSyncPolarity___FrontPorch__SyncWidth___BackPorch__Visible
                       '     1 bit       8 bits      3 bits     9 bits   11 bits
                long   (SYNC_NEG<<31) | (1<<23) | (  2<<20) | ( 34<<11) | 540
                long   10 << 8     ' sys-clock to pixel-clock ratio
                long   0
                long   0   ' reserved for CFRQ parameter
    

    Calculated with 30e6 / (960 + 80 = 1040) / 50 - 540 - 3 = 33.9, rounded to 34

  • evanh Posts: 15,912
    edited 2022-01-19 00:46

    So there is a total of four effective parameters to build complete HDMI timings from:
    1 - sysclock frequency (sysfrq)
    2 - horizontal resolution (hres)
    3 - vertical resolution (vres)
    4 - vertical frequency (vfrq)

    Remaining timings are:
    dotfrq (>= 25 MHz) = sysfrq / 10
    htot = hres + 80
    hfp = 8
    hsw = 64
    hbp = 8
    vtot = dotfrq / htot / vfrq
    vfp = 1
    vsw = 2
    vbp (>= 9) = vtot - vres - 3

    EDIT: Added the minimum for dotfrq. DVI/HDMI minimum link speed is 250 MHz.
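The four-parameter recipe above is easy to turn into a small calculator. This is a sketch in Python, not part of any driver; the function name is made up, and rounding vtot to the nearest integer is an assumption (it reproduces the back porch values 72 and 34 used in the two timing tables earlier in the thread).

```python
def hdmi_timings(sysfrq, hres, vres, vfrq):
    """Derive the remaining timing values from the four input parameters."""
    dotfrq = sysfrq // 10                    # pixel clock is sysclock / 10
    if dotfrq < 25_000_000:
        raise ValueError("pixel clock below the 25 MHz minimum")
    htot = hres + 80                         # hfp 8 + hsw 64 + hbp 8
    vtot = round(dotfrq / (htot * vfrq))     # assumption: round to nearest line
    vbp = vtot - vres - 3                    # vfp 1 + vsw 2
    if vbp < 9:
        raise ValueError("vertical back porch below the minimum of 9 lines")
    return {
        "dotfrq": dotfrq,
        "htot": htot, "hfp": 8, "hsw": 64, "hbp": 8,
        "vtot": vtot, "vfp": 1, "vsw": 2, "vbp": vbp,
    }


if __name__ == "__main__":
    print(hdmi_timings(320_000_000, 960, 540, 50))
    print(hdmi_timings(300_000_000, 960, 540, 50))
```

The returned values would still have to be packed into the driver's timing longs by hand, as in the tables above.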

  • @pik33 I'm looking at whether the larger mod file samples can be played directly from external PSRAM....got to understand your code and @Ahle2's tracker audio sample addressing and how frequently samples need to be read. Am hoping there may be some scope to do this. Perhaps the music sequence data would still have to come from HUB if it is accessed a lot of the time and randomly but it might be possible to have the sample data in external RAM.

  • rogloh Posts: 5,786
    edited 2022-01-04 07:01

    @pik33 You might be able to read from external memory in your Paula driver with something like this...duplicate this for other channels. Whether it would keep up and handle the extra read latency of about 1us per sample I'm not sure.

    ' ------------  Channel 1
    
    p201        mov     dt0,time1      ' compute the delta to add to the global time
                sub     dt0,time0
                add     time1,freq1    ' compute the next channel time  
    
                add     p1,askip1      ' update the phase accumulator
                cmp     p1,lend1 wcz   ' subtract the loop length if over the loop end
        if_ge   sub     p1,lend1            
        if_ge   add     p1,lstart1       
                mov     qq,p1          ' compute the pointer to the next sample
                add     qq,sstart1
    
                callpa  type1, #readsample
    
                scas    spl,vol1       ' apply the volume
                mov     spl,0-0
    
                scas    spl,apan1      ' apply the pan
                mov     ls1,0-0
                mov     qq,##16384
                sub     qq,apan1
                scas    spl,qq
                mov     rs1,0-0
    
                jmp     #p101              
    
    readsample
                cmp     pa, #0 wz
        if_nz   setnib  qq,#%1000,#7 'read byte from extmem
        if_z    setnib  qq,#%1001,#7 'read word from extmem
                rep     #2,#1   ' disable interrupt for next 3 long write burst
                setq    #2
                wrlong  qq, mailbox ' write command to ext mem driver
                setq    #1
                rdlong  qq, mailbox wc 'poll mailbox for result
        if_c    jmp     #$-2
        if_nz   shl     spl, #8     ' 8 bit sample
                ret
    
    ' temporary variables 
    
    qq          long 0 ' mailbox command and external address
    spl         long 0 ' sample returned here
    zero        long 0 ' keep zero here to avoid Read-modify-write of data
    
    

    EDIT: with some dummy read requests to external PSRAM as well as the real HUB read the Paula code seems to keep up with the mod file. :smile: Am now trying to copy the 512k of actual HUB data over to PSRAM and read from there (at the same address) but it is just muting the output for some reason - bad addressing issue of some type or the write is failing.

    Update: Found the fix! The wc in the mailbox read doesn't work with the setq burst so I just changed the jmp condition following the read to a tjs instruction instead. Now it is playing back mod file data from a PSRAM cached copy of the instrument samples (at the same address offset from the start of memory as used in the HUB RAM version, just for proof-of-concept convenience). :smiley:
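The volume and pan steps in the channel code above rely on SCAS, which (as far as I understand the P2 instruction documentation) multiplies two signed 16-bit values and shifts the result right by 14, so that 16384 ($4000) means 1.0. A Python model of that part, with hypothetical function names and assuming the inputs are already in signed 16-bit range:

```python
UNITY = 16384  # $4000 represents 1.0 in the SCAS fixed-point scheme


def scas(d, s):
    """Model of SCAS: signed multiply, then arithmetic shift right by 14."""
    return (d * s) >> 14


def mix_channel(spl, vol, apan):
    """Apply volume, then split the sample between the two outputs using apan."""
    scaled = scas(spl, vol)                # apply the volume
    left = scas(scaled, apan)              # ls1 in the channel code
    right = scas(scaled, UNITY - apan)     # rs1 uses 16384 - apan
    return left, right
```

With `apan = 8192` the sample is split equally between both sides, and `apan = 16384` sends everything to the first output.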

  • rogloh Posts: 5,786
    edited 2022-01-04 07:19

    Here's the new magic code to read mod data from my PSRAM driver (or SRAM/HyperRAM for that matter).

  • pik33 Posts: 2,366

    The main loop code has to avoid big setq/rdlong/wrlong bursts or anything else which prevents the interrupt from being processed. If you managed to make this work, it is now possible to play big modules from the PSRAM. I want to rewrite the tracker player in FlexBasic, to better understand it and to have full control of what it does. Maybe writing a tracker itself can be a good idea. I have to attach some PSRAMs to the P2. I have several chips in the drawer.

  • rogloh Posts: 5,786
    edited 2022-01-04 07:22

    Yes it is worth playing with. PSRAM is very flexible on the P2. I just updated the code a second time in the zip above with a player fix if you already grabbed it.

  • pik33 Posts: 2,366

    I am learning FlexBasic and a module structure at the same time. The code in https://github.com/pik33/P2-retromachine/tree/main/Propeller/Tracker player now displays a module name and a sample list.

    All retro computer mix: Atari 8bit colors, Atari ST font and Amiga module. As there is Sidcog available, the C64 part will be added to this mix in a short time :)

  • Wuerfel_21 Posts: 5,051
    edited 2022-01-04 17:33

    @pik33 said:
    All retro computer mix: Atari 8bit colors, Atari ST font and Amiga module. As there is Sidcog available, the C64 part will be added to this mix in a short time :)

    I've been thinking about modifying OPN2Cog into "OPNACog" (would entail adding rhythm channels and replacing SN76489 PSG with AY-3-8910), with that you get some PC-98 up in there, too. Hey, wasn't it you whomst asked for P2 Bad Apple once? Could play the original chiptune (which most people don't even know is a thing, lol) with that, haha.

  • pik33 Posts: 2,366
    edited 2022-01-04 20:39

    I have a strange, hard to catch timing problem in the driver. In one of the modules, after several (2, maybe 3) counter rollovers, one of the channels stopped playing. This was repeatable (after recompiling and reloading, the bug always appeared in the same place).

    I tried to debug this by dumping the channels' counters from the cog to the hub in the main loop and displaying them on the screen from the Basic main program code.

    After this change the module has now played for 3 hours without a glitch... But at least I can see these counters, and I have several hundred modules to play and test.

    Edit... OOPS!!! I am not allowed to add #1 to the "front" variable and then add #1 again 3 instructions later (including this debug WRLONG), as there may be an interrupt between these adds, and it can then see an odd value there and fail. I moved this WRLONG to the end of the loop, which magically repaired the program, but the problem still exists... and it is easy to correct.
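The hazard described in the edit above can be shown with a small model: if the write index is advanced once per word, an interrupt landing between the two adds sees an odd index and reads a pair whose time slot has not been written yet. The sketch below is Python, not the driver code; the names and the STALE marker are made up, and the interrupt is simulated by a call at the point where it would fire.

```python
STALE = -1  # marks a buffer slot that has not been written yet


def isr_peek(buf, tail, front):
    """Model of the interrupt: read a (sample, time) pair if the buffer looks non-empty."""
    if tail == front:
        return None
    return buf[tail], buf[tail + 1]


def producer_two_adds(buf, front, sample, time):
    """Advance the index once per word; an interrupt can land between the two adds."""
    buf[front] = sample
    front += 1
    seen = isr_peek(buf, 0, front)   # interrupt fires here, front is odd
    buf[front] = time
    front += 1
    return front, seen


def producer_single_add(buf, front, sample, time):
    """Write both words first, then publish them with a single add."""
    buf[front] = sample
    buf[front + 1] = time
    seen = isr_peek(buf, 0, front)   # interrupt fires here, front is unchanged
    front += 2
    return front, seen
```

In the first variant the simulated interrupt gets the new sample paired with a stale time; in the second it simply sees an empty buffer and does nothing.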

  • pik33 Posts: 2,366
    edited 2022-01-04 21:33

    @evanh said:
    So there is a total of four effective parameters to build complete HDMI timings from:
    1 - sysclock frequency (sysfrq)
    2 - horizontal resolution (hres)
    3 - vertical resolution (vres)
    4 - vertical frequency (vfrq)

    Remaining timings are:
    dotfrq = sysfrq / 10
    htot = hres + 80
    hfp = 8
    hsw = 64
    hbp = 8
    vtot = dotfrq / htot / vfrq
    vfp = 1
    vsw = 2
    vbp (>= 9) = vtot - vres - 3

    960x540 achieved. I forgot to update one of my timing constants and this created an unstable picture. My HDMI driver now has a lot of flexibility: I have to add a procedural timing calculator for it, to enable user-defined mode timings.
    And it is displaylisted, so I can mix text and graphic modes on the same screen.

    What is visible now in the player is 960x540 with active 896x496 with a border. The Basic module player uses 112x31 256-color text mode.

  • evanh Posts: 15,912
    edited 2022-01-04 23:51

    Nice one. Glad someone else has tested the method now.

    I think I may have been using your earlier display driver to experiment with myself. I have it calculating from six parameters, including the graphics area parameters too.

  • rogloh Posts: 5,786
    edited 2022-01-05 01:04

    @pik33 said:
    I am learning FlexBasic and a module structure at the same time. The code in https://github.com/pik33/P2-retromachine/tree/main/Propeller/Tracker player now displays a module name and a sample list.

    All retro computer mix: Atari 8bit colors, Atari ST font and Amiga module. As there is Sidcog available, the C64 part will be added to this mix in a short time :)

    Looks (and sounds) cool, you'll have to add the obligatory scrolling note history too like the classic trackers had.

  • evanh Posts: 15,912

    @rogloh said:
    ... you'll have to add the obligatory scrolling note history too like the classic trackers had.

    That was the list of instrument names! Demo writers repurposed it by replacing all the names with a blurb that could be viewed like a scroller.

  • I meant something like this where the note data scrolls:

  • evanh Posts: 15,912

    Oh, that's the score editor itself. Yeah, they scrolled when in playback.

  • Ahle2 Posts: 1,179

    I was thinking that you would do a complete emulation of the Paula (except the floppy controller, UART and IO stuff). I was looking forward to that.
    Even implementing the PWM volume control that ran at 1/64 the color clock (55.9 kHz) and the audio filter. The PWM volume interferes a little bit with the audio signal in the below 20 kHz range. These things all add to the well-known "Amiga sound".

    Anyway, great work! :smile:

  • pik33 Posts: 2,366
    edited 2022-01-05 09:24

    @Ahle2 said:
    I was thinking that you would do a complete emulation of the Paula (except the floppy controller, UART and IO stuff). I was looking forward to that.
    Even implementing the PWM volume control that ran at 1/64 the color clock (55.9 kHz) and the audio filter. The PWM volume interferes a little bit with the audio signal in the below 20 kHz range. These things all add to the well-known "Amiga sound".

    Anyway, great work! :smile:

    I had to start somewhere. The problem is that timings are tight, the ISR is full, and all these things need to be computed at full Paula speed. The solution may be to go from a Paula * 90 to a Paula * 100 system clock. This is at the extreme of the P2 clock range. Going to Paula * 100 gives another 5 nops for the ISR. The setq/rdlong in the main loop also interferes with the ISR. Scattering these RDLONGs through the code instead can give me another 4 nops for the ISR. 5 + 4 + the 2 I actually have = 11 nops. I need 6 for a filter; 5 can be enough to implement the PWM.

    Maybe somewhere in the future :) I will have to stop working on this in the near future, as the new (also P2 based) robot boards arrived and I have to switch to the professional work... The tracker/player/Paula is a playground for me to learn: the robot has to say something, now, instead of simple 1-voice mono driver I can use this. The main robot control program I wrote using Spin made me hate Spin with its lack of proper string handling, multidimensional arrays and indenting - the next version of the robot code will be rewritten in FlexBasic...

    Edit: tested this at 354693878 Hz. As I thought, I have 5 more nops, which should be enough for the filter. As far as I understand, there is one filter for all channels... PWMed volume control would have to be computed for each channel independently - I don't know yet how I can do this.

  • rogloh Posts: 5,786
    edited 2022-01-05 09:29

    @pik33
    You can save some cycles in your ISR.

    Use ptrb for the LUT address tail register with auto increment.
    Use the incmod instruction. Might need to be a2000000 instead of a1ffffff.

    isr1        wypin   lsample,#left        '2     The sample has to be outputted every 90 cycles     
                wypin   rsample,#right       '4
    
                incmod  counter,a1fffffff    '6
                cmp     counter,irqtime wcz  '8     Check if it is time for the next sample
        if_ne   reti1                        '10  If not, do nothing
    
                getword rsample,lsnext,#1    '12
                getword lsample,lsnext,#0    '14
                and     ptrb, #511           '16        
                cmp     ptrb,front wcz       '18    If the buffer is empty, do nothing 
    
     if_ne      rdlut   lsnext,ptrb++        '20/21    else read the sample and its time from LUT
     if_ne      rdlut   irqtime,ptrb++       '22/24  Read the time for this sample
                reti1                        '24/26
    
  • pik33 Posts: 2,366

    I already tried ptrb - it didn't work and I don't know why. Now I have this 354 MHz setup to experiment with. I haven't tried incmod here yet. I have had bad experiences with incmod: it has never worked for me whenever I tried it. There is something I don't understand.

    So, ptrb and incmod will be tested now until they work :)

  • I used RDLUT with ptrb and auto increment in my driver - it works. And INCMOD also works - I just can't recall offhand if it needs the final value or final value +1 setup in the S register (might be the final loop value, so a1ffffff).
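On the INCMOD question: as far as I understand the P2 instruction documentation, INCMOD D,S sets D to 0 when D already equals S and otherwise adds 1, so S holds the final value and not the final value + 1. A Python model of that behaviour (the helper names are made up for illustration):

```python
def incmod(d, s):
    """Model of INCMOD D,S: wrap to 0 when D already equals S, otherwise add 1."""
    return 0 if d == s else d + 1


def count_period(s):
    """Number of INCMOD steps before a counter starting at 0 returns to 0."""
    d, steps = incmod(0, s), 1
    while d != 0:
        d = incmod(d, s)
        steps += 1
    return steps
```

Under this reading, a counter that should wrap after $20000000 ticks needs $1FFFFFFF in S.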

  • Also there should be plenty of time to execute the main (non-ISR) sample reading code for normal mod files. Even if the ISR chews up 50% of the cycles. I think the Amiga HW couldn't generate its samples faster than about 29kHz. So this is four samples (or 8 in your case) in 34us. At over 300MHz you are still talking something above 5k P2 cycles to do your per channel calculations and sample reading with 50% of the COG in 34us. I found there is plenty of time to go read PSRAM samples even with a 1us latency or so per read operation.
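The estimate above can be reproduced with a short calculation. This is a sketch with hypothetical names; the 300 MHz clock, 29 kHz maximum sample rate, 8 channels and 50% ISR load are the round figures from the comment, not measured values.

```python
def main_loop_budget(sysclk_hz, sample_rate_hz, channels, isr_percent):
    """Return (cycles left for the main loop per sample period, cycles per channel)."""
    period_cycles = sysclk_hz // sample_rate_hz          # P2 clocks in one sample period
    available = period_cycles * (100 - isr_percent) // 100
    return available, available // channels


if __name__ == "__main__":
    print(main_loop_budget(300_000_000, 29_000, 8, 50))
```

With these figures there are a little over 5000 cycles per sample period for the main loop, or roughly 650 cycles per channel when all 8 channels need a new sample in the same period.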
