Paula (Amiga) inspired audio driver [0.93 - non-integer skip=fine tuning enabled] - Page 2 — Parallax Forums


  • pik33 Posts: 1,517
    edited 2022-01-05 11:10

    Now it works. I simply forgot to zero the ptrb at the start.

    There are 11 NOPs free at 354 MHz and 6 NOPs at 320 MHz. After an hour of playing at 354 MHz the P2 Edge is hot: not too hot to touch, slightly more than 40 °C (about 100 °F). Still stable.

    isr1        wypin   lsample,#left        '2     The sample has to be outputted every 90 cycles     
                wypin   rsample,#right       '4
    
                incmod  counter,a20000000    '6 
                cmp     counter,irqtime wcz  '8     Check if it is time for the next sample
        if_ne   reti1                        '10/12 If not, do nothing
    
                getword rsample,lsnext,#1    '12
                getword lsample,lsnext,#0    '14
                cmp     ptrb,front wcz       '16    If the buffer is empty, do nothing 
        if_e    reti1                        '18/20
    
                rdlut   lsnext,ptrb++        '21    else read the sample and its time from LUT
                rdlut   irqtime,ptrb++       '23    Read the time for this sample
    
                reti1  
    
  • pik33 Posts: 1,517
    edited 2022-01-05 12:31

    .. the "6 nops" low pass filter started to work....

    .. and I have to measure the filter and/or find the error in the calculations, as what is supposed to be a 17 kHz low-pass filter seems to have a much lower cutoff frequency. Then the filter has to be made switchable via the player.

  • rogloh Posts: 4,027

    Just double check your INCMOD "S" value is correct. I think it is meant to be the last value in the incrementing sequence after which it wraps to zero. So INCMOD D, #0 would always remain at zero, while INCMOD D, #1 would count 0,1,0,1, etc. At least that's how I think it works.
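
The wrap-around behaviour described above can be modelled in a few lines of Python. This is only a sketch of the rule as stated in the post (wrap to zero after the counter has reached S); the helper names `incmod` and `run` are made up for the illustration and are not part of any P2 tool.

```python
def incmod(d: int, s: int) -> int:
    """Model of INCMOD D,S as described above: wrap to 0 once D equals S."""
    return 0 if d == s else (d + 1) & 0xFFFFFFFF


def run(s: int, steps: int) -> list:
    """Return the first `steps` counter values, starting from 0."""
    d, seq = 0, []
    for _ in range(steps):
        seq.append(d)
        d = incmod(d, s)
    return seq


print(run(1, 6))  # [0, 1, 0, 1, 0, 1]
print(run(0, 4))  # [0, 0, 0, 0]
```

Under this model a counter with S = $20000000 takes $20000001 distinct values (0 through $20000000 inclusive) before it wraps.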

  • pik33 Posts: 1,517
    edited 2022-01-05 20:46

    incmod works with $20000000


    The filters are not stable at 320 MHz. Some modules can make the player stop playing, while at 354 MHz they work (more room available in the ISR).


    I made a big mistake implementing this filter and that's why calculations failed...


    A new idea: precompute all these samples outside the ISR, storing not sample-time pairs but every single sample. That is, compute the sample as it is done now, determine how many samples with the same value need to be placed in the buffer, place them there, and let the ISR do nothing but push them to the DACs.

    This may make it possible to implement these PWMs, the Amiga filter and postprocessing...

  • pik33 Posts: 1,517
    edited 2022-01-10 11:25

    A major rewrite in progress, but the changed driver (v 014) actually works.

    As described in the previous post, the main program computes all samples and puts them into the buffer.
    The interrupt only pushes these samples into the DACs.

    A disadvantage: a more complex main loop that takes longer to execute (to be optimized), but Amiga modules still play.
    Advantages:

    • much simpler ISR
    • all samples are available in the main loop, so filters can be added, an on-screen oscilloscope can be made, etc.

    I still have no time for, and no idea how to add, a Paula-type PWM volume. This has low priority, as it is complex to implement and I have no real Amiga to compare the result with.

  • rogloh Posts: 4,027
    edited 2022-01-10 12:24

    @pik33 said:
    A major rewrite in progress, but the changed driver (v 014) actually works.

    As described in the previous post, the main program computes all samples and puts them into the buffer.
    The interrupt only pushes these samples into the DACs.

    A disadvantage: a more complex main loop that takes longer to execute (to be optimized), but Amiga modules still play.
    Advantages:

    • much simpler ISR
    • all samples are available in the main loop, so filters can be added, an on-screen oscilloscope can be made, etc.

    I still have no time for, and no idea how to add, a Paula-type PWM volume. This has low priority, as it is complex to implement and I have no real Amiga to compare the result with.

    See if you can come up with a scheme that still allows samples to be read in from PSRAM (aim for about 1 µs of access latency per random sample access in setups above 250 MHz). This way you'll hopefully be able to play larger mod files (or maybe one day even s3m files LOL) and fit them in external memory.

  • pik33 Posts: 1,517
    edited 2022-01-10 15:00

    I need to connect PSRAMs to test. I have no Edge with PSRAM (and no hope of getting one in the near future) and I don't know what I can achieve with these chips connected using wires (speed?)

    I can, however, simulate 1 µs of latency while getting a sample to test whether it still works.

    Edit: 1 µs of latency per sample should still work. This loop

    delay   mov qq,##90   ' about 360 cycles (?)
    p900    djnz qq,#p900
    

    caused the player to distort at a value of about 300 without the filter and about 200 with the filter. There is still a lot of code to optimize in the main loop. These setq/rdlongs will be replaced with something less time-consuming: I don't need to read all these parameters for every sample.
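
As a rough check of the "about 360 cycles (?)" estimate for the delay loop, the clock count can be worked out in Python. The instruction timings here are assumptions and should be verified against the P2 documentation: a taken DJNZ is assumed to cost 4 clocks, the final fall-through 2 clocks, and `mov qq,##90` 4 clocks (AUGS prefix plus MOV). The function names are made up for the illustration.

```python
def delay_clocks(count: int) -> int:
    """Estimated clocks for: mov qq,##count / djnz qq,#self (assumed timings)."""
    mov = 4                      # assumption: AUGS + MOV, 2 clocks each
    taken = 4 * (count - 1)      # assumption: DJNZ costs 4 clocks when it jumps
    fall_through = 2             # assumption: 2 clocks on the last pass
    return mov + taken + fall_through


def delay_us(count: int, sysclk_hz: float) -> float:
    """Convert the estimated clock count to microseconds."""
    return delay_clocks(count) / sysclk_hz * 1e6


for count in (90, 200, 300):
    print(count, delay_clocks(count), round(delay_us(count, 354e6), 2))
```

Under these assumptions a count of 90 gives 362 clocks, which is close to 1 µs at 354 MHz, and the loop values of 200 and 300 mentioned above correspond to a few microseconds of added latency.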

  • pik33 Posts: 1,517

    0.15.

    A lot of time (and also size) optimizations: this version doesn't hang while running interrupts at Paula x2 speed! (It distorts, but the previous version didn't run at all.)

    This means there will be no problem with getting samples from PSRAM or (and?) adding filters to the audio at the standard Paula clock.

    The main difference: the driver no longer reads all registers on every loop. Instead, only the channel that has to be computed reads its registers. About 160 clocks (on average) are saved on every loop.
    The API difference: the cmd register, which is used to reset the sample phase accumulator (= to start playing the note), now has to be set to $FFFFFFFF instead of any non-zero value to allow the sample to run. This made it possible to remove 8 instructions (16 clocks) from the main loop. It also added an unneeded (?) "feature": setting the register to any value other than $FFFFFFFF will cause the sample to loop when the phase accumulator reaches the zero bit in the CMD register (the accumulator is ANDed with this value every loop).

    Several other changes also stripped clocks from the loop; the loop time is now about 200 clocks shorter.

  • evanh Posts: 12,903
    edited 2022-01-11 12:07

    With the RAMs being managed from a dedicated cog, it should be possible to perform a prefetching scheme for read data. The tricky part is making it deterministic. Which basically means application driven - the developer has to specify what to prefetch, and preallocate the space for it, before actually needing the data.

    Write data is much easier. The regular write-this-block-of-data can be a background operation. As long as data rate stays sane then no issue.

    PS: I've not tried reading up on Roger's driver as yet. No idea what flexibility it might have.

  • pik33 Posts: 1,517
    edited 2022-01-11 14:03

    While playing an Amiga module, sample data are read sequentially, byte by byte and every byte, so a prefetch is possible. The exceptions are starting a new sample and looping a sample. The driver can skip samples and use 16-bit samples, but these features are not present in the original Paula, so they are not needed to play a module. The read, however, is still sequential and predictable, so we can tell the RAM driver to do this.

    The audio driver's LUT-based buffer now contains about 140 microseconds of audio data.

    Maybe the module can be played directly from Edge's or Eval's flash memory using such a dedicated driver/cache? I have to try this :)

  • rogloh Posts: 4,027

    @evanh said:
    With the RAMs being managed from a dedicated cog, it should be possible to perform a prefetching scheme for read data. The tricky part is making it deterministic. Which basically means application driven - the developer has to specify what to prefetch, and preallocate the space for it, before actually needing the data.

    You can try to prefetch if you can and this will reduce the number of requests needed which is helpful. Also my driver has some in-built capability to skip some bytes of memory between read portions with its graphics copy stuff, and I think this might be useful for audio, however any wrapping back around to lower memory addresses during audio sample looping still has to be dealt with by the caller.

  • pik33 Posts: 1,517
    edited 2022-01-15 19:31

    A huge size optimization (445->190 longs) and a major rewrite for v. 0.18 (use player2.bas to test this: the registers changed). No more code repeated for each of the 8 channels, no more 72 longs in cog RAM for registers. ALTS/ALTD are used instead, with only one procedure and one register set.

    https://github.com/pik33/P2-retromachine/tree/main/Propeller/Tracker player

    I have to clean this up moving all old versions to recycle folder :)

    This change makes room for more functions:

    • there is a current pointer available in the hub for every channel, so the main program can now stream a long sample to one looped channel buffer (wav playing from SD/Flash/PSRAM)
    • there is a current sample value available in the hub, so things like oscilloscopes can be done without messing with the driver (simply read these samples in the main program)
    • there is room for more channels in the driver
  • evanh Posts: 12,903
    edited 2022-01-15 22:01

    DAC update rate of sysclock/90 !? .... I was going in my head "That's insane! Why?" ... Then it dawned on me you may not know that the smartpin DACmodes have an integral DAC update timer with buffering. If you were setting the DACs directly then you've coded using the IRQ correctly.

    However, as long as the program feeds the next sample into the smartpin sometime before the next DAC update period, then it'll be a clean metronomic interval between the two samples.

    PS: Eliminating interrupts altogether will free up the smaller 14 of 90 clocks, 15%, of cog time.

  • evanh Posts: 12,903
    edited 2022-01-15 22:07

    Which, in turn, means the lutRAM buffer handling can be compacted right down to just waiting for smartpin ready. No software buffering at all.

  • Ariba Posts: 2,584

    @evanh said:
    DAC update rate of sysclock/90 !? .... I was going in my head "That's insane! Why?" ... Then it dawned on me you may not know that the smartpin DACmodes have an integral DAC update timer with buffering. If you were setting the DACs directly then you've coded using the IRQ correctly.

    However, as long as the program feeds the next sample into the smartpin sometime before the next DAC update period, then it'll be a clean metronomic interval between the two samples.

    PS: Eliminating interrupts altogether will free up the smaller 14 of 90 clocks, 15%, of cog time.

    Interesting idea! But you need a DAC output per channel, and you have to mix them together in the analog domain with resistors. With CMOS switches (4016 or 4066) and another 4 smart pins, you can also do the PWM volume control.
    So 8 pins + some external circuitry for a Paula emulation, but this may be very close to the original.

    Andy

  • evanh Posts: 12,903
    edited 2022-01-16 00:18

    Pik's doing all that in software I think. It's just the two 16-bit dithered DAC smartpins being fed by the ISR.

    '--------------------------------------------------------------------------
    '------ Interrupt service -------------------------------------------------
    '------ Output the sample, get the next one if exists ---------------------
    '--------------------------------------------------------------------------
    
    isr1        wypin   lsample,#left        '2     The sample has to be outputted every 90 cycles     
                wypin   rsample,#right       '4
    
                cmp     ptrb,front wcz       '6    If the buffer is empty, do nothing 
        if_e    reti1                        '8/10
    
                rdlut   lsnext,ptrb++        '11    else read the sample and its time from LUT
                getword rsample,lsnext,#1    '13
                getword lsample,lsnext,#0    '15
                reti1                        '17/19 
    

    PS: His program is whizzing along doing the mixing at over 300 MHz sysclock! There's quite some MIPS to burn.

  • rogloh Posts: 4,027

    @pik33 Instead of this:

                testb   sstart0,#30 wz
        if_nz   jmp     #p403
    
                mov    pointer0,#0  
                bitl   sstart0,#30
                add    ptra,#8
                wrlong sstart0,ptra
                sub    ptra,#8
    
    p403        setq #1
    

    you could just do this (C flag is affected too but your code doesn't need to preserve it here anyway):

                bitl   sstart0, #30 wcz            
        if_z    mov    pointer0, #0  
        if_z    wrlong sstart0, ptra[2]
    
    p403        setq #1
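
The equivalence of the two sequences can be checked with a small Python model. It is a sketch under two assumptions taken from the post: `bitl ... wcz` sets the flags from the bit's value before it is cleared, and `ptra[2]` in `wrlong` addresses long index 2, i.e. byte offset 8. Hub memory is modelled as a plain dict, and the function names are made up for the illustration.

```python
BIT30 = 1 << 30
MASK32 = 0xFFFFFFFF


def original(sstart0, pointer0, ptra, hub):
    """Model of the 7-instruction version."""
    hub = dict(hub)
    if sstart0 & BIT30:                  # testb sstart0,#30 wz / if_nz jmp #p403
        pointer0 = 0                     # mov pointer0,#0
        sstart0 &= ~BIT30 & MASK32       # bitl sstart0,#30
        hub[ptra + 8] = sstart0          # add ptra,#8 / wrlong / sub ptra,#8
    return sstart0, pointer0, ptra, hub


def shortened(sstart0, pointer0, ptra, hub):
    """Model of the 3-instruction version."""
    hub = dict(hub)
    z = bool(sstart0 & BIT30)            # bitl ... wcz: flag = bit value before clearing
    sstart0 &= ~BIT30 & MASK32
    if z:
        pointer0 = 0                     # if_z mov pointer0,#0
        hub[ptra + 2 * 4] = sstart0      # if_z wrlong sstart0,ptra[2]
    return sstart0, pointer0, ptra, hub


for value in (0x40001234, 0x00001234, 0xFFFFFFFF, 0):
    assert original(value, 77, 0x100, {}) == shortened(value, 77, 0x100, {})
print("both sequences agree")
```

When bit 30 is already clear, the unconditional `bitl` changes nothing, which is why the conditional jump can be dropped.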
    
  • pik33 Posts: 1,517
    edited 2022-01-16 09:08

    Applied the patch: it works. 3 instructions instead of 7! I didn't realize there is something like ptra[2] available, after nearly a full year (since February 8th) with a P2 on my desk... Too old or too busy, or both...

    This was also the first time I used the ALTx instructions (that's why I wrote this topic about aliases).

    Now I have to try to write a player with an oscilloscope on the screen. This can be a good example of using the display-list-based driver: I can replace several text lines with graphics to display the oscilloscope while keeping most of the screen in text mode, saving hub RAM space.

  • pik33 Posts: 1,517
    edited 2022-01-18 21:48

    A little offtopic...

    These 4 lines in the player2.bas

        dlptr=v030.dl_ptr
        olddl=lpeek(dlptr)
        for i=0 to 539: lpoke dlptr+4*i,lpeek(dlptr+4*i+4) : next i
        lpoke dlptr+539*4,olddl
    
    

    do this to the display:

    The display is in character mode... (112x31 chars @ 8x16 pixels). I had to recall how to use the DL and then I discovered I still have a lot of unimplemented features. A MOD player using the SD card and a keyboard for control, connected either via a serial terminal or via the RPi-based interface, is now possible; all the needed elements are in place and ready to use.

    The module is http://modarchive.org/index.php?request=view_by_moduleid&query=35071 - worth listening to.

    It seems connecting the power amplifier to RCAs of AV board instead of audio jack gives slightly better audio quality... one IC less in the audio path :)

  • @pik33 said:
    It seems connecting the power amplifier to RCAs of AV board instead of audio jack gives slightly better audio quality... one IC less in the audio path :)

    Yes, the audio amplifier is a bit poor and not needed when going into a line input. And for when you do plug in headphones, it is WAY TOO LOUD.

  • pik33 Posts: 1,517

    The driver was upgraded, now without any problems, to 16 channels.

  • rogloh Posts: 4,027
    edited 2022-01-29 00:52

    Wow, maybe soon you will be able to play that DOPE.MOD file with 28 channels. Although it would need external memory. I wonder if it could work across two COGs somehow...and still be mixed somewhere?

  • evanh Posts: 12,903
    edited 2022-01-29 02:10

    A couple of optimised ISR options:

    isr1
                cmp     ptrb,front wz        '6         If the buffer is empty, do nothing
        if_z    reti1                        '8/10
    
                rdlut   lsnext,ptrb++        '11        read the stereo samples from LUT
                wypin   lsnext,#left         '13        The sample has to be outputted every 100 cycles
                getword lsnext,lsnext,#1     '15
                wypin   lsnext,#right        '17
                reti1                        '21
    
    isr1
                cmp     ptrb,front wz        '6         If the buffer is not empty
        if_nz   rdlut   lsnext,ptrb++        '8/9       read the stereo samples from LUT
        if_nz   wypin   lsnext,#left         '10/11     The sample has to be outputted every 100 cycles
        if_nz   getword lsnext,lsnext,#1     '12/13
        if_nz   wypin   lsnext,#right        '14/15
                reti1                        '18/19
    

    Original:

    isr1        wypin   lsample,#left        '6     The sample has to be outputted every 100 cycles     
                wypin   rsample,#right       '8
    
                cmp     ptrb,front wcz       '10    If the buffer is empty, do nothing 
        if_e    reti1                        '12/14
    
                rdlut   lsnext,ptrb++        '15    else read the sample and its time from LUT
                getword rsample,lsnext,#1    '17
                getword lsample,lsnext,#0    '19
                reti1                        '23
    
  • pik33 Posts: 1,517

    This if_nz thing can speed the ISR up - I will implement this :)

    Adding more channels is possible but I have to get rid of this:

                mov     rs,#0            ' Mix all channels to rs and ls
                mov     ls,#0
                add     rs,rs1
                add     rs,rs2            'todo: in channel computing, mov rs to oldrs, then sub oldrs, add newrs
                add     rs,rs3
                add     rs,rs4
                add     rs,rs5
                add     rs,rs6
                add     rs,rs7
                add     rs,rs8
                add     rs,rs9
                add     rs,rs10
                add     rs,rs11
                add     rs,rs12
                add     rs,rs13
                add     rs,rs14
                add     rs,rs15
                add     rs,rs16
    
    
                add     ls,ls1
                add     ls,ls2
                add     ls,ls3
                add     ls,ls4
                add     ls,ls5
                add     ls,ls6
                add     ls,ls7
                add     ls,ls8
                add     ls,ls9
                add     ls,ls10
                add     ls,ls11
                add     ls,ls12
                add     ls,ls13
                add     ls,ls14
                add     ls,ls15
                add     ls,ls16
    

    Instead, let every channel keep its own sample, subtract it from the sum and add the new one.
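
A minimal Python sketch of this running-sum idea, for one output side (left or right). The class and method names are hypothetical; in the driver the same thing would be done per channel in cog registers, so this only illustrates the bookkeeping.

```python
class RunningMix:
    """Keeps a running sum so that updating one channel costs two operations."""

    def __init__(self, channels: int):
        self.samples = [0] * channels   # last sample contributed by each channel
        self.total = 0                  # current mixed output

    def update(self, channel: int, new_sample: int) -> int:
        """Replace one channel's contribution and return the new mix."""
        self.total += new_sample - self.samples[channel]   # sub old, add new
        self.samples[channel] = new_sample
        return self.total


mix = RunningMix(16)
mix.update(0, 100)
mix.update(5, -30)
mix.update(0, 40)                       # channel 0 changes from 100 to 40
print(mix.total)                        # 10
assert mix.total == sum(mix.samples)
```

The cost per recalculated channel stays constant regardless of the channel count, whereas the full re-summation grows with every added channel.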

  • pik33 Posts: 1,517

    @rogloh said:
    Wow, maybe soon you will be able to play that DOPE.MOD file with 28 channels. Although it would need external memory. I wonder if it could work across two COGs somehow...and still be mixed somewhere?

    It's big! 28 channels - so there IS a reason to make this driver 32 chn....

    And, at last, try to solder these PSRAMs... Mouser PL still doesn't have Edge 32MB in stock.

  • evanh Posts: 12,903
    edited 2022-01-29 09:59

    @pik33 said:
    Adding more channels is possible but I have to get rid of this:
    ...

    Assuming I've read the code correctly ... I think the whole thing needs to be restructured. Move away from processing one channel at a time and just do them all every loop:

    • The channel selector naturally vanishes.
    • The SCA instructions are begging to feed summing instead of wasted on MOVs. The channel mixing can be integrated here.
    • Both the lutRAM buffering and interrupt can and need to vanish. There is already hardware sample buffering and the timing in the smartpins. TESTP or WAITSE1 for waiting on buffer ready.
    • Overall loop time becomes constant.
  • pik33 Posts: 1,517
    edited 2022-01-29 11:20

    just do them all every loop:

    No way :) This is a 3.5 MHz sample rate. The channel selector and buffering work around the lack of time: a single channel needs to be recalculated only every 100-800 samples, so on average one recalculation every 400/16 = 25 samples for 16 channels, but there can be a moment when all channels need to be recalculated at once. That's why I have to buffer this and optimize where possible. While all these channels are being calculated, the ISR has 512 samples in the buffer to play, about 140 microseconds. In this situation there will be no less than 100 samples' time to fill the buffer again.
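
The figures in this post can be checked with a few lines of Python. The pairing of a 354 MHz sysclock with one sample per 100 clocks is an assumption pieced together from values mentioned elsewhere in the thread, not something stated as a single configuration; the variable names are made up for the illustration.

```python
SYSCLK_HZ = 354e6          # assumption: one of the clock rates mentioned in the thread
CLOCKS_PER_SAMPLE = 100    # assumption: the ISR period from the later listings
BUFFER_SAMPLES = 512
CHANNELS = 16
AVG_PERIOD_SAMPLES = 400   # "every 100-800 samples", taken as about 400 on average

sample_rate_hz = SYSCLK_HZ / CLOCKS_PER_SAMPLE
buffer_us = BUFFER_SAMPLES / sample_rate_hz * 1e6
samples_between_recalcs = AVG_PERIOD_SAMPLES / CHANNELS
clocks_between_recalcs = samples_between_recalcs * CLOCKS_PER_SAMPLE

print(f"sample rate: {sample_rate_hz / 1e6:.2f} MHz")
print(f"buffer holds: {buffer_us:.1f} us")
print(f"average gap between channel recalculations: {samples_between_recalcs:.0f} samples "
      f"({clocks_between_recalcs:.0f} clocks)")
```

With these inputs the sample rate comes out at 3.54 MHz and the 512-entry buffer at roughly 145 µs, consistent with the "3.5 MHz" and "about 140 microseconds" quoted above.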

  • evanh Posts: 12,903

    Thanks for the detail.

    I found three ANDs that can be deleted. I've commented them out below:

    p101        cmp     oldt0,time0 wz   ' There must not be 2 entries with the same time
        if_z    sub     front,#1         ' 
    '    if_z    and     front,#511     
    
                    ...
    
    p301        mov     t2,ptrb            ' Check if the buffer is full    
                sub     t2,#1
    '            and     t2,#511
                cmp     t2,front wcz
        if_e    jmp     #p301    
    
                wrlut   newsample, front
                add     front,#1
    '            and     front,#511
                djnz    t1,#p301
    
  • pik33 Posts: 1,517

    I found three ANDs that can be deleted

    To be experimentally checked.

  • evanh Posts: 12,903

    I wonder if this zero check should be up at the start rather than down at the buffering? If there's nothing to buffer then why was the sample computed?

                cmp      dt0,#0 wz
        if_z    jmp      #loop1
                mov      t1,dt0
    