Paula (Amiga) inspired audio driver - Page 2 — Parallax Forums

Paula (Amiga) inspired audio driver

Comments

  • pik33 Posts: 1,226
    edited 2022-01-05 11:10

    Now it works. I simply forgot to zero the ptrb at the start.

    There are 11 NOPs free at 354 MHz and 6 NOPs at 320 MHz. After an hour of playing at 354 MHz the P2 Edge is hot: not too hot to touch, slightly above 40 °C (about 100 °F). Still stable.

    isr1        wypin   lsample,#left        '2     The sample has to be outputted every 90 cycles     
                wypin   rsample,#right       '4
    
                incmod  counter,a20000000    '6 
                cmp     counter,irqtime wcz  '8     Check if it is time for the next sample
        if_ne   reti1                        '10/12 If not, do nothing
    
                getword rsample,lsnext,#1    '12
                getword lsample,lsnext,#0    '14
                cmp     ptrb,front wcz       '16    If the buffer is empty, do nothing 
        if_e    reti1                        '18/20
    
                rdlut   lsnext,ptrb++        '21    else read the sample and its time from LUT
                rdlut   irqtime,ptrb++       '23    Read the time for this sample
    
                reti1  
    
  • pik33 Posts: 1,226
    edited 2022-01-05 12:31

    .. the "6 nops" low pass filter started to work....

    .. and I have to measure the filter and/or find the error in the calculations, as what is supposed to be a 17 kHz low-pass filter seems to have a much lower cutoff frequency. Then the filter has to be made switchable via the player.

  • rogloh Posts: 3,794

    Just double check your INCMOD "S" value is correct. I think it is meant to be the last value in the incrementing sequence after which it wraps to zero. So INCMOD D, #0 would always remain at zero, while INCMOD D, #1 would count 0,1,0,1, etc. At least that's how I think it works.
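
A small Python model of the INCMOD behaviour described above may help when checking the wrap value. It is only a sketch of the semantics as described in this post (wrap to zero after D equals S, otherwise increment); the function names `incmod` and `run` are made up for illustration.

```python
def incmod(d: int, s: int) -> tuple[int, bool]:
    """Model of INCMOD D,S: if D == S, wrap to 0 and set carry; else D + 1."""
    if d == s:
        return 0, True
    return (d + 1) & 0xFFFFFFFF, False


def run(s: int, steps: int, d: int = 0) -> list[int]:
    """Return the first `steps` values that D takes, starting from `d`."""
    values = []
    for _ in range(steps):
        values.append(d)
        d, _carry = incmod(d, s)
    return values


print(run(1, 4))  # counts 0, 1, 0, 1
print(run(0, 3))  # stays at 0
```

Under this model the counter has a period of S + 1, so S = $20000000 gives $20000001 distinct values before it wraps.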

  • pik33 Posts: 1,226
    edited 2022-01-05 20:46

    incmod works with $20000000


    The filters are not stable at 320 MHz. Some modules make the player stop playing, while at 354 MHz they work (more time available in the ISR).


    I made a big mistake implementing this filter and that's why calculations failed...


    A new idea: precompute all the samples outside the ISR, not sample/time pairs but every output sample. That is, compute the sample as it is done now, determine how many samples with the same value need to be placed in the buffer, place them there, and let the ISR do nothing but write them to the DACs.

    This may make it possible to implement the PWM volume, the Amiga filter and postprocessing...
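
The idea above can be sketched in Python as follows. This is an illustrative model only, not the driver's code: `expand` and `DacFeeder` are hypothetical names, and a `deque` stands in for the LUT ring buffer.

```python
from collections import deque


def expand(events):
    """Turn (sample, repeat_count) pairs into one value per DAC update."""
    for sample, count in events:
        for _ in range(count):
            yield sample


class DacFeeder:
    """Stand-in for the simplified ISR: one buffer entry per tick."""

    def __init__(self):
        self.buffer = deque()  # hypothetical stand-in for the LUT buffer
        self.current = 0       # value currently held on the DAC

    def tick(self):
        # If the buffer is empty, keep outputting the previous value.
        if self.buffer:
            self.current = self.buffer.popleft()
        return self.current


feeder = DacFeeder()
feeder.buffer.extend(expand([(5, 2), (7, 3)]))
print([feeder.tick() for _ in range(6)])
```

Because every output value passes through the buffer, the main loop sees the complete stream, which is what makes filtering or other postprocessing possible before the values reach the DACs.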

  • pik33 Posts: 1,226
    edited 2022-01-10 11:25

    A major rewrite in progress, but the changed driver (v 014) actually works.

    As in the previous post: the main program computes all samples and puts them into the buffer.
    The interrupt only pushes these samples to the DACs.

    A disadvantage: more complex main loop which executes longer - to be optimized, but Amiga modules still play
    Advantages:

    • much simpler ISR
    • all samples available in the main loop so filters can be added, on-screen oscilloscope can be made, etc

    Still no time, and no idea how, to add a Paula-type PWM volume. This has low priority, as it is complex to implement and I have no real Amiga to compare the result against.

  • rogloh Posts: 3,794
    edited 2022-01-10 12:24

    @pik33 said:
    A major rewrite in progress, but the changed driver (v 014) actually works.

    As in the previous post: the main program computes all samples and puts them into the buffer.
    The interrupt only pushes these samples to the DACs.

    A disadvantage: more complex main loop which executes longer - to be optimized, but Amiga modules still play
    Advantages:

    • much simpler ISR
    • all samples available in the main loop so filters can be added, on-screen oscilloscope can be made, etc

    Still no time, and no idea how, to add a Paula-type PWM volume. This has low priority, as it is complex to implement and I have no real Amiga to compare the result against.

    See if you can come up with a scheme that still allows samples to be read in from PSRAM (aim for about 1 µs of access latency per random sample access in setups above 250 MHz). This way you'll hopefully be able to play larger mod files (or maybe one day even s3m files LOL) and fit them in external memory.

  • pik33 Posts: 1,226
    edited 2022-01-10 15:00

    I need to connect PSRAMs to test. I have no Edge with PSRAM (and no hope to get one in the near future) and I don't know what I can achieve with these chips connected using wires (speed?)

    I can, however, simulate 1 µs of latency while getting a sample, to test if it still works.

    Edit: 1 µs of latency per sample should still work. This loop

    delay   mov qq,##90   ' about 360 cycles (?)
    p900    djnz qq,#p900
    

    caused the player to distort at a loop count of about 300 without the filter and 200 with the filter. There is still a lot of code to optimize in the main loop. These setq/rdlongs will be replaced with something less time-consuming; I don't need to read all these parameters every sample.
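
As a rough check of the "about 360 cycles" estimate for the delay loop, here is a short Python calculation. It assumes cog execution with the usual timings (2 clocks each for the AUGS and MOV generated by `##`, 4 clocks for a taken DJNZ, 2 clocks for the final fall-through); those timings are an assumption here, and the function names are made up.

```python
def djnz_loop_cycles(count: int) -> int:
    """Clocks for `mov qq,##count` followed by `djnz qq,#self`."""
    load = 4                   # AUGS + MOV, 2 clocks each (assumed)
    taken = 4 * (count - 1)    # every DJNZ that jumps back (assumed 4 clocks)
    fall_through = 2           # the last DJNZ, which does not jump
    return load + taken + fall_through


def micros(cycles: int, sysclk_hz: float) -> float:
    """Convert a clock count to microseconds at the given system clock."""
    return cycles / sysclk_hz * 1e6


for n in (90, 200, 300):
    c = djnz_loop_cycles(n)
    print(n, c, round(micros(c, 354e6), 2), round(micros(c, 320e6), 2))
```

With these assumptions a count of 90 gives 362 clocks, close to 1 µs at 354 MHz, so the counts of 200 and 300 mentioned above correspond to roughly two to three times the 1 µs target.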

  • pik33 Posts: 1,226

    0.15.

    A lot of time (and also size) optimizations - this version doesn't hang while running interrupts at Paula x2 speed! (It distorts..... but the previous version didn't run at all.)

    This means there will be no problem with getting samples from PSRAM or (and?) adding filters to the audio at the standard Paula clock.

    The main difference: the driver no longer reads all registers on every loop. Instead, only the channel that has to be computed reads its registers. About 160 clocks (on average) are saved on every loop.
    The API difference: the cmd register, which is used to reset the sample phase accumulator (= to start playing the note), now has to be set to $FFFFFFFF instead of any non-zero value to allow the sample to run. This removed 8 instructions (16 clocks) from the main loop. It also added an unneeded (?) "feature": setting the register to any value other than $FFFFFFFF will cause the sample to loop when the phase accumulator reaches a zero bit in the CMD register (the accumulator is ANDed with this value on every loop).
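
The looping side effect can be illustrated with a toy Python model. This is a guess at the mechanism from the description above, not the driver's code: the real accumulator layout (for example, any fractional bits) is not given in the thread, and `advance` and `trace` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def advance(phase: int, step: int, cmd: int) -> int:
    """Add the step, then AND the accumulator with the cmd register."""
    return ((phase + step) & MASK32) & cmd


def trace(phase: int, step: int, cmd: int, n: int) -> list[int]:
    """Return the accumulator value after each of n updates."""
    out = []
    for _ in range(n):
        phase = advance(phase, step, cmd)
        out.append(phase)
    return out


# cmd = $FFFFFFFF: the AND changes nothing, the sample runs freely.
print(trace(0, 1, MASK32, 5))

# cmd with bit 4 cleared: the accumulator falls back to 0 on reaching 16.
print(trace(0, 1, MASK32 & ~(1 << 4), 18))
```

In this model a cleared bit acts as a ceiling: as soon as the addition carries into that bit, the AND clears it and playback restarts from the lower bits.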

    Several other changes also stripped several clocks from the loop; the loop time is now about 200 clocks shorter.

  • evanh Posts: 12,230
    edited 2022-01-11 12:07

    With the RAMs being managed from a dedicated cog, it should be possible to perform a prefetching scheme for read data. The tricky part is making it deterministic. Which basically means application driven - the developer has to specify what to prefetch, and preallocate the space for it, before actually needing the data.

    Write data is much easier. The regular write-this-block-of-data can be a background operation. As long as data rate stays sane then no issue.

    PS: I've not tried reading up on Roger's driver as yet. No idea what flexibility it might have.

  • pik33 Posts: 1,226
    edited 2022-01-11 14:03

    While playing an Amiga module, sample data are read sequentially, byte by byte, with no bytes skipped, so a prefetch is possible. The exceptions are starting a new sample and looping the sample. The driver can skip samples and use 16-bit samples, but these features are not present in the original Paula, so they are not needed to play a module. The read, however, is still sequential and predictable, so we can tell the RAM driver to do this.

    The audio driver's LUT-based buffer now contains about 140 microseconds of audio data.
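
For a sense of scale, the buffer duration can be estimated in Python. The figures below are assumptions, not taken from the driver source: one LUT long per output sample, the whole 512-long LUT used as the buffer, and one output sample every 90 clocks as in the ISR comments earlier in the thread.

```python
def buffer_duration_us(entries: int, cycles_per_sample: int,
                       sysclk_hz: float) -> float:
    """Playback time held by the buffer, in microseconds."""
    return entries * cycles_per_sample / sysclk_hz * 1e6


print(round(buffer_duration_us(512, 90, 320e6), 1))  # at 320 MHz
print(round(buffer_duration_us(512, 90, 354e6), 1))  # at 354 MHz
```

Under these assumptions the result lands between roughly 130 and 144 µs, which is in line with the "about 140 microseconds" quoted above.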

    Maybe the module can be played directly from Edge's or Eval's flash memory using such a dedicated driver/cache? I have to try this :)

  • rogloh Posts: 3,794

    @evanh said:
    With the RAMs being managed from a dedicated cog, it should be possible to perform a prefetching scheme for read data. The tricky part is making it deterministic. Which basically means application driven - the developer has to specify what to prefetch, and preallocate the space for it, before actually needing the data.

    You can try to prefetch if you can and this will reduce the number of requests needed which is helpful. Also my driver has some in-built capability to skip some bytes of memory between read portions with its graphics copy stuff, and I think this might be useful for audio, however any wrapping back around to lower memory addresses during audio sample looping still has to be dealt with by the caller.
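
One way the caller could handle the wrap-around is to split each prefetch into linear pieces before handing them to the memory driver. The Python sketch below only illustrates that splitting; `linear_reads` is a hypothetical helper and is not part of any existing driver API.

```python
def linear_reads(pos: int, length: int, loop_start: int, loop_end: int):
    """Split a read of `length` bytes starting at `pos` into linear chunks.

    Playback runs up to loop_end (exclusive) and then continues from
    loop_start. Returns a list of (address, count) pairs, none of which
    crosses loop_end.
    """
    if not (0 <= loop_start < loop_end and 0 <= pos < loop_end):
        raise ValueError("position and loop points are inconsistent")
    chunks = []
    while length > 0:
        count = min(length, loop_end - pos)
        chunks.append((pos, count))
        length -= count
        pos += count
        if pos == loop_end:
            pos = loop_start  # wrap back to the start of the loop
    return chunks


print(linear_reads(90, 30, 40, 100))  # crosses the loop end once
```

Each returned pair can then be issued as an ordinary sequential read, so the memory driver itself never needs to know about the loop points.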

  • pik33 Posts: 1,226
    edited 2022-01-15 19:31

    A huge size optimization (445->190 longs) and a major rewrite for v. 0.18 (use player2.bas to test this - the registers changed). No more code repeated for each of the 8 channels, no more 72 longs in cog RAM for registers. ALTS/ALTD are used instead, with only one procedure and one register set.

    https://github.com/pik33/P2-retromachine/tree/main/Propeller/Tracker player

    I have to clean this up moving all old versions to recycle folder :)

    This change makes room for more functions:

    • there is a current pointer available in the hub for every channel, so the main program can now stream a long sample to one looped channel buffer (wav playing from the SD/Flash/PSRAM)
    • there is a current sample value available in the hub, so things like oscilloscopes can be done without messing with the driver: simply read these samples in the main program
    • there is room for more channels in the driver
  • evanh Posts: 12,230
    edited 2022-01-15 22:01

    DAC update rate of sysclock/90 !? .... I was going in my head "That's insane! Why?" ... Then it dawned on me you may not know that the smartpin DACmodes have an integral DAC update timer with buffering. If you were setting the DACs directly then you've coded using the IRQ correctly.

    However, as long as the program feeds the next sample into the smartpin sometime before the next DAC update period, then it'll be a clean metronomic interval between the two samples.

    PS: Eliminating interrupts altogether will free up the smaller 14 of 90 clocks, 15%, of cog time.
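
The buffering argument can be illustrated with a toy timing model in Python. It is a sketch of the behaviour described in this post (a write lands in a holding buffer and is latched at the next period boundary), not of the smart pin's actual implementation; `dac_latch_times` is a made-up name and the times are in system clocks.

```python
def dac_latch_times(write_times, period):
    """Return the clock at which each buffered write reaches the DAC.

    A write at time t is latched at the next multiple of `period`
    strictly after t.
    """
    return [(t // period + 1) * period for t in write_times]


# Software writes arrive with jitter, one per 90-clock period...
writes = [10, 95, 200, 289]
latched = dac_latch_times(writes, 90)
print(latched)

# ...but the output intervals are all exactly one period long.
print([b - a for a, b in zip(latched, latched[1:])])
```

As long as exactly one write falls inside each period, the jitter of the software loop never reaches the output. For scale, the 14 of 90 clocks mentioned above work out to about 15.6% of the cog's time.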

  • evanh Posts: 12,230
    edited 2022-01-15 22:07

    Which, in turn, means the lutRAM buffer handling can be compacted right down to just waiting for smartpin ready. No software buffering at all.

  • Ariba Posts: 2,567

    @evanh said:
    DAC update rate of sysclock/90 !? .... I was going in my head "That's insane! Why?" ... Then it dawned on me you may not know that the smartpin DACmodes have an integral DAC update timer with buffering. If you were setting the DACs directly then you've coded using the IRQ correctly.

    However, as long as the program feeds the next sample into the smartpin sometime before the next DAC update period, then it'll be a clean metronomic interval between the two samples.

    PS: Eliminating interrupts altogether will free up the smaller 14 of 90 clocks, 15%, of cog time.

    Interesting idea! But you need a DAC output per channel, and have to mix them together in analog with resistors. With CMOS switches (4016 or 4066) and another 4 smart pins, you can also do the PWM volume control.
    So 8 pins + some external circuitry for a Paula emulation, but this may be very close to the original.

    Andy

  • evanh Posts: 12,230
    edited 2022-01-16 00:18

    Pik's doing all that in software I think. It's just the two 16-bit dithered DAC smartpins being fed by the ISR.

    '--------------------------------------------------------------------------
    '------ Interrupt service -------------------------------------------------
    '------ Output the sample, get the next one if exists ---------------------
    '--------------------------------------------------------------------------
    
    isr1        wypin   lsample,#left        '2     The sample has to be outputted every 90 cycles     
                wypin   rsample,#right       '4
    
                cmp     ptrb,front wcz       '6    If the buffer is empty, do nothing 
        if_e    reti1                        '8/10
    
                rdlut   lsnext,ptrb++        '11    else read the sample and its time from LUT
                getword rsample,lsnext,#1    '13
                getword lsample,lsnext,#0    '15
                reti1                        '17/19 
    

    PS: His program is whizzing along doing the mixing at over 300 MHz sysclock! There's quite some MIPS to burn.
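
The ISR above unpacks both channels from a single LUT long with two GETWORD instructions. A Python model of that packing is sketched below; `pack` is a hypothetical helper showing what the mixer would have to write, with the left channel in word 0 and the right channel in word 1, as the ISR reads them.

```python
def getword(value: int, n: int) -> int:
    """Model of GETWORD: extract 16-bit word n (0 = low, 1 = high)."""
    return (value >> (16 * n)) & 0xFFFF


def pack(left: int, right: int) -> int:
    """Pack two 16-bit samples into one long: right high, left low."""
    return ((right & 0xFFFF) << 16) | (left & 0xFFFF)


entry = pack(0x1234, 0xABCD)
print(hex(entry), hex(getword(entry, 0)), hex(getword(entry, 1)))
```

Storing a stereo pair per long means one RDLUT per output sample, which is part of why this ISR is so much shorter than the earlier one.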

  • rogloh Posts: 3,794

    @pik33 Instead of this:

                testb   sstart0,#30 wz
        if_nz   jmp     #p403
    
                mov    pointer0,#0  
                bitl   sstart0,#30
                add    ptra,#8
                wrlong sstart0,ptra
                sub    ptra,#8
    
    p403        setq #1
    

    you could just do this (C flag is affected too but your code doesn't need to preserve it here anyway):

                bitl   sstart0, #30 wcz            
        if_z    mov    pointer0, #0  
        if_z    wrlong sstart0, ptra[2]
    
    p403        setq #1
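
To check that the two versions above behave the same, here is a small Python model of both. It relies on two assumptions stated in the comments: that BITL with WCZ reports the bit's original state in Z, and that ptra[2] on a long access means an offset of 2 x 4 = 8 bytes. A dict stands in for hub RAM, and all names are illustrative.

```python
BIT30 = 1 << 30


def original(sstart, pointer, hub, ptra):
    """testb / if_nz jmp / mov / bitl / add / wrlong / sub (7 instructions)."""
    if sstart & BIT30:            # bit clear -> jump straight to p403
        pointer = 0
        sstart &= ~BIT30
        hub[ptra + 8] = sstart    # ptra is adjusted by 8 and then restored
    return sstart, pointer


def shortened(sstart, pointer, hub, ptra):
    """bitl wcz / if_z mov / if_z wrlong (3 instructions)."""
    was_set = bool(sstart & BIT30)  # assumed: Z = original state of the bit
    sstart &= ~BIT30                # clearing a clear bit changes nothing
    if was_set:
        pointer = 0
        hub[ptra + 2 * 4] = sstart  # assumed: ptra[2] scales by long size
    return sstart, pointer


for start in (0x40001000, 0x00001000):
    hub_a, hub_b = {}, {}
    a = original(start, 77, hub_a, 0x100)
    b = shortened(start, 77, hub_b, 0x100)
    print(hex(start), a == b, hub_a == hub_b)
```

Both paths leave the same register values and the same hub contents whether bit 30 starts set or clear, which is why the shorter form is a drop-in replacement.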
    
  • pik33 Posts: 1,226
    edited 2022-01-16 09:08

    Applied the patch: it works. 3 instructions instead of 7! I didn't realize there is something like ptra[2] available, after nearly a full year (since February 8th) with a P2 on my desk... Too old or too busy, or both...

    This was also the first time when I utilized ALTx instructions (that's why I wrote this topic about aliases).

    Now I have to try to write a player with an oscilloscope on the screen. This can be a good example of using the display-list-based driver: I can replace several text lines with graphics to display the oscilloscope while keeping most of the screen in text mode, saving hub RAM space.

  • pik33 Posts: 1,226
    edited 2022-01-18 21:48

    A little offtopic...

    These 4 lines in the player2.bas

        dlptr=v030.dl_ptr
        olddl=lpeek(dlptr)
        for i=0 to 539: lpoke dlptr+4*i,lpeek(dlptr+4*i+4) : next i
        lpoke dlptr+539*4,olddl
    
    

    do this to the display:

    The display is in character mode... (112x31 chars @ 8x16 pixels). I had to recall how to use the DL, and then I discovered I still have a lot of unimplemented features. A MOD player using the SD card and a keyboard for control, connected either via a serial terminal or via the RPi-based interface, is now possible; all the needed elements are in place and ready to use.
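
The four BASIC lines above rotate the 540-entry display list by one position: every entry moves one slot toward the start and the old first entry goes to the end. The same operation in Python, with a plain list standing in for the display list in hub RAM (presumably one entry per scan line, which is an assumption here), looks like this:

```python
def rotate_in_place(dl: list) -> None:
    """Shift every entry one slot toward the start; old entry 0 goes last."""
    if not dl:
        return
    first = dl[0]
    for i in range(len(dl) - 1):
        dl[i] = dl[i + 1]
    dl[-1] = first


display_list = list(range(5))  # short stand-in for the 540 entries
rotate_in_place(display_list)
print(display_list)
```

Calling this once per frame would scroll the picture by one entry at a time without touching the text buffer itself.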

    The module is http://modarchive.org/index.php?request=view_by_moduleid&query=35071 - worth listening to.

    It seems connecting the power amplifier to RCAs of AV board instead of audio jack gives slightly better audio quality... one IC less in the audio path :)

  • @pik33 said:
    It seems connecting the power amplifier to RCAs of AV board instead of audio jack gives slightly better audio quality... one IC less in the audio path :)

    Yes, the audio amplifier is a bit poor and not needed when going into a line input. And for when you do plug in headphones, it is WAY TOO LOUD.
