Paula (Amiga) inspired audio driver [0.95 - envelopes]

Wuerfel_21 · 2022-01-29 12:49

Better yet, a sequence such as this (including the AND)

p301        mov     t2,ptrb            ' Check if the buffer is full    
            sub     t2,#1
            and     t2,#511
            cmp     t2,front wcz

can be refactored to

p301        alts    ptrb,#511            ' Check if the buffer is full    
            cmpr    front,#0-0 wcz
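
For readers who want to check the equivalence without P2 hardware, here is a minimal Python sketch of both sequences. It assumes the documented behaviour of ALTS (the next instruction's S field is replaced by (D + S) AND $1FF, while that instruction's immediate flag is left alone) and of CMPR (S is compared against D). The function names are made up for illustration, and front is assumed to already lie in 0..511.

```python
MASK9 = 0x1FF  # instruction S fields are 9 bits wide

def check_original(ptrb, front):
    """mov t2,ptrb / sub t2,#1 / and t2,#511 / cmp t2,front wcz -> (C, Z)."""
    t2 = (ptrb - 1) & MASK9
    return (t2 < front, t2 == front)            # CMP D,S: C = D < S, Z = D == S

def check_alts(ptrb, front):
    """alts ptrb,#511 / cmpr front,#0-0 wcz -> (C, Z)."""
    s_field = (ptrb + 511) & MASK9              # ALTS: next S field = (D + S) & $1FF
    return (s_field < front, s_field == front)  # CMPR D,S: C = S < D, Z = S == D

# Compare both versions over a range of pointer values, masked or not.
mismatches = [(p, f) for p in range(1024) for f in range(512)
              if check_original(p, f) != check_alts(p, f)]
print(len(mismatches))  # prints 0
```

Since 512 is 0 modulo 512, adding 511 and subtracting 1 give the same 9-bit result, which is why the mismatch list comes out empty.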

pik33 · 2022-01-29 12:54

dt0 is the interval, in samples, between the last computed sample and the new one. This number of samples now has to be put in the buffer. It may be any number, including zero (which means another channel reached its sample-change point and has to be computed immediately). But then there is a bug in this code: I have to put the last sample, not the new one, into the buffer. The new one has to be remembered and put into the buffer next time.

evanh · 2022-01-29 12:57

@Wuerfel_21 said:
can be refactored to

p301        alts    ptrb,#511            ' Check if the buffer is full    
            cmpr    front,#0-0 wcz

Tricky! Currently, PTRB isn't masked in the existing code. The ISR just keeps incrementing it until register rollover.

EDIT: Oh! Only the low 9 bits feed the compare!
EDIT2: Bah, which will fail at zero wrap because 511 is higher than 0.

pik33 · 2022-01-29 12:57

@Wuerfel_21 said:
Better yet, a sequence such as this (including the AND)

p301        mov     t2,ptrb            ' Check if the buffer is full    
            sub     t2,#1
            and     t2,#511
            cmp     t2,front wcz

can be refactored to

p301        alts    ptrb,#511            ' Check if the buffer is full    
            cmpr    front,#0-0 wcz

To be tested. There are now a lot of fixes to implement.

pik33 · 2022-01-29 13:01

... and then try to use these:

pik33 · 2022-01-29 13:07

p301        alts    ptrb,#511            ' Check if the buffer is full    
            cmpr    front,#0-0 wcz

Now I know how it can work (after looking into the PASM instruction list), but this piece of code is an example of heavy P2 black magic.

evanh · 2022-01-29 13:16

@pik33 said:
p301        alts    ptrb,#511            ' Check if the buffer is full    
            cmpr    front,#0-0 wcz
Now I know how it can work (after looking into the PASM instruction list), but this piece of code is an example of heavy P2 black magic.

Indeed, using the ALTx instructions on immediate mode fields was not their intended purpose.

For the equality compare to work (I'd forgotten that's all that's needed), front will need to be masked to 9 bits as well.

evanh · 2022-01-29 13:18

Oh, and that can be combined too ... change

            wrlut   newsample, front
            add     front,#1
            and     front,#511
            djnz    t1,#p301

to

            wrlut   newsample, front
            incmod  front,#511
            djnz    t1,#p301

And change

p101        cmp     oldt0,time0 wz   ' There must not be 2 entries with the same time
    if_z    sub     front,#1         ' 
    if_z    and     front,#511

to

p101        cmp     oldt0,time0 wz   ' There must not be 2 entries with the same time
    if_z    decmod  front,#511

evanh · 2022-01-29 14:08

@evanh said:
Tricky! Currently, PTRB isn't masked in the existing code. The ISR just keeps incrementing it until register rollover.

That's a good point, I think there's a bug in the ISR. PTRB never gets masked, therefore it will fail the equality compare against front inside the ISR.

EDIT: Removing all the masks would've worked before that ALTS was added.

pik33 · 2022-01-29 17:10

These incmod, decmod - don't they have to be ##512?

Wuerfel_21 · 2022-01-29 17:37

No, the value has to be the last one in the range
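
To illustrate the off-by-one, here is a small Python sketch that counts which values a pointer visits for each limit. The incmod helper is a hand-written model of the instruction (assumed behaviour: reset to 0 when D equals S, otherwise increment), and visited is a name invented for this example.

```python
def incmod(d, limit):
    """Model of INCMOD: reset to 0 when d equals limit, otherwise add 1."""
    return 0 if d == limit else d + 1

def visited(limit, steps=2000):
    """Collect every value the pointer takes when stepped from 0."""
    seen, d = set(), 0
    for _ in range(steps):
        seen.add(d)
        d = incmod(d, limit)
    return seen

print(len(visited(511)), max(visited(511)))  # prints 512 511
print(len(visited(512)), max(visited(512)))  # prints 513 512
```

With a limit of 512 the pointer takes 513 different values, and since LUT addressing only uses the low 9 bits, index 512 would land on the same long as index 0. A limit of 511 also fits a plain 9-bit immediate, so no ## is needed.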

pik33 · 2022-01-29 17:38

The result: I applied Wuerfel's magic and it works. The ANDs cannot be removed; on the contrary, after some debugging I had to restore the AND in the ISR code. The buffer pointers have to be kept 9 bits wide: ptrb++ wraps around at 20 bits, not 9, so the wrluts still work but the cmps do not.

pik33 · 2022-01-29 17:39

@Wuerfel_21 said:
No, the value has to be the last one in the range

So that's why I had problems with these instructions earlier.

Wuerfel_21 · 2022-01-29 17:45

Note that the ALTS immediate trick implicitly ANDs with 511 (which is why using S = #511 works to subtract one)

pik33 · 2022-01-29 17:53

@Wuerfel_21 said:
Note that the ALTS immediate trick implicitly ANDs with 511 (which is why using S = #511 works to subtract one)

It compares (ptrb + 511) AND 511 with front. So the AND is done on the fly and doesn't modify either register. Using inc/decmod makes the ANDs on front unnecessary, but ptrb is incremented in the ISR and, when not ANDed, wraps over a 20-bit range (checked experimentally: I expected 9 or 32 bits and found 20).

evanh · 2022-01-29 23:36

If you'd not used the ALTS trick, then removing all the ANDs would've worked. Now your ISR has grown!

pik33 · 2022-01-30 09:17

AND has to remain until I fully understand what is going on inside a P2 with this code. The driver can work without it and I don't know why. PTRB++ overflows at 20 bits, while adding to the standard register overflows at 32 bits.
I have to understand all these instructions fully.

evanh · 2022-01-30 09:26

PTRB should be 32-bit as well. The upper bits just get ignored when addressing hubRAM. It's the same when addressing cogRAM and lutRAM, except that there only the lower 9 bits are used.

The compares are fine comparing at 32 bits as long as all registers contain data of the same width, i.e. none of them is masked with an AND. This works, for equality compare at least, in concert with the smaller addressing ranges because they are all powers of two. The circular (unsigned two's-complement) maths works out.
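
The point about consistent widths can be checked with a small simulation. This is a Python sketch under simplifying assumptions: one entry is written and then read per step, each mask stands for how far that pointer is allowed to count before wrapping, and false_results is a name invented here.

```python
def false_results(front_mask, ptrb_mask, steps=3000):
    """Count how often 'ptrb == front' gives the wrong empty/not-empty answer."""
    front = ptrb = 0
    errors = 0
    for _ in range(steps):
        front = (front + 1) & front_mask   # producer queues one entry
        if ptrb == front:                  # compare claims "empty" with one entry queued
            errors += 1
        ptrb = (ptrb + 1) & ptrb_mask      # consumer takes that entry
        if ptrb != front:                  # compare claims "not empty" on a drained buffer
            errors += 1
    return errors

print(false_results(0xFFFFFFFF, 0xFFFFFFFF))  # both free-running 32-bit: prints 0
print(false_results(0x1FF, 0x1FF))            # both masked to 9 bits: prints 0
print(false_results(0x1FF, 0xFFFFF) > 0)      # 9-bit front vs 20-bit ptrb: prints True
```

In the mixed case the low 9 bits of both pointers still address the same LUT slot, which is why the reads and writes keep working while the equality compare breaks after the first wrap of front.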

pik33 · 2022-01-30 10:04

Yesterday I discovered experimentally, by doing a wrlong of ptrb to the hub and displaying its value on the screen, that when ptrb is incremented via ptrb++ its upper 12 bits always remain 0.

evanh · 2022-01-30 10:27

@pik33 said:
Yesterday I discovered experimentally, by doing a wrlong of ptrb to the hub and displaying its value on the screen, that when ptrb is incremented via ptrb++ its upper 12 bits always remain 0.

Ah, it'll be that the adder is truncated rather than the register itself. In that case, masking everything to 9 bits is the best way.

I guess there's no choice but to have the masking AND in the ISR. The ISR had been buggy as it was earlier anyway.

pik33 · 2022-01-30 11:00

This is the current version of it:

isr1        wypin   lsample,#left        '2     The sample has to be outputted every 100 cycles     
            wypin   rsample,#right       '4

            cmp     ptrb,front wcz       '6    If the buffer is empty, do nothing 
    if_ne   rdlut   lsnext,ptrb          '9   else read the sample from LUT
    if_ne   incmod  ptrb, #511           '11
    if_ne   getword rsample,lsnext,#1    '13
    if_ne   getword lsample,lsnext,#0    '15
            reti1                        '17/19
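
For anyone following along without the hardware, the behaviour of this ISR can be modelled in a few lines of Python. This is only a sketch: the class and method names are invented, the producer-side packing (right sample in the high word) is inferred from the two getword lines, and timing is not modelled at all.

```python
class IsrModel:
    """Toy model of the 512-entry LUT ring buffer and the sample ISR."""

    def __init__(self):
        self.lut = [0] * 512
        self.front = 0      # producer index
        self.ptrb = 0       # consumer index
        self.lsample = 0
        self.rsample = 0

    def push(self, left, right):
        """Producer: pack two 16-bit samples into one long and store it."""
        self.lut[self.front] = ((right & 0xFFFF) << 16) | (left & 0xFFFF)
        self.front = 0 if self.front == 511 else self.front + 1   # incmod front,#511

    def isr(self):
        """One interrupt: output the current pair, then fetch the next one if any."""
        out = (self.lsample, self.rsample)     # the two wypin instructions
        if self.ptrb != self.front:            # cmp ptrb,front ... if_ne
            lsnext = self.lut[self.ptrb]       # rdlut lsnext,ptrb
            self.ptrb = 0 if self.ptrb == 511 else self.ptrb + 1  # incmod ptrb,#511
            self.rsample = lsnext >> 16        # getword rsample,lsnext,#1
            self.lsample = lsnext & 0xFFFF     # getword lsample,lsnext,#0
        return out

m = IsrModel()
m.push(1, 2)
m.push(3, 4)
print([m.isr() for _ in range(4)])  # prints [(0, 0), (1, 2), (3, 4), (3, 4)]
```

The repeated (3, 4) at the end shows what happens when the buffer runs empty: the last pair is simply output again.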

evanh · 2022-01-30 11:14

Heh, you've taken a liking to INCMOD. Traded in the post-incrementing RDLUT.

I don't think WYPIN needs repeating with the same value over and over. The smartpin Y register should maintain its existing value.

Couple of corrections with the comments on used clock cycles: Missing 4 ticks for entry. The RETI1 is always 4 ticks.

pik33 · 2022-01-30 11:16

No wypin=no interrupt called

evanh · 2022-01-30 11:22

Oh, right, needs an ACK on the smartpin to lower IN state. In that case just one AKPIN on the event source will do ...
EDIT: Ha! But that takes an extra instruction. Lol, shows I've not done any interrupts myself.

EDIT2: Here we go. Need to also change the IRQ source to the "right" smartpin:

isr1
            cmp     ptrb,front wz        '6         If the buffer is not empty
    if_nz   rdlut   lsnext,ptrb          '8/9       read the stereo samples from LUT
    if_nz   incmod  ptrb, #511           '10/11
    if_nz   wypin   lsnext,#left         '12/13
    if_nz   getword lsnext,lsnext,#1     '14/15
            wypin   lsnext,#right        '16/17     To rearm event, the smartpin has to be acknowledged every 100 cycles
            reti1                        '20/21

            setse1  #%001<<6 + right ' Set the event - DAC empty

pik33 · 2022-01-31 13:30

0.22

Got rid of this awful channel mixer and these 32 wasted longs for channel values

... edit..... and now the driver can work at full Amiga (Paula x2, 7.09 MHz) frequency for the first time - the minimum value is now 48. Interrupts at over 7 MHz - try this with other architectures
This means at the normal 3.5 MHz 32 channels should be possible for this DOPE.MOD

... edit 2.... having the IRQ work at 50 clocks also means the driver can now run at a lower clock speed (starting from 175 MHz) using the standard Paula frequency

.... then... what if I average 3 or 4 samples and output the result at clock/256? An averaging filter at ~1 MHz should not be audible at all in the acoustic band, but moving to clock/256 will reduce DAC noise by 25 dB... To be tested.
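
The clock figures above follow from one multiplication: system clock = sample rate x clocks per sample. The Python sketch below works it through, assuming the PAL Paula clock of 3,546,895 Hz (the thread rounds this to 3.5 MHz and its double to 7.09 MHz); the function name is invented for the example.

```python
PAULA_PAL_HZ = 3_546_895  # assumption: PAL Amiga Paula clock

def min_sysclk_hz(sample_rate_hz, clocks_per_sample):
    """System clock needed to fire one interrupt per output sample."""
    return sample_rate_hz * clocks_per_sample

cases = [
    ("Paula x1, 50 clocks", PAULA_PAL_HZ, 50),
    ("Paula x2, 48 clocks", 2 * PAULA_PAL_HZ, 48),
]
for name, rate, clocks in cases:
    print(f"{name}: {min_sysclk_hz(rate, clocks) / 1e6:.1f} MHz")
```

Under these assumptions the first case comes out a little above 177 MHz, which is in line with the "starting from 175 MHz" estimate.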

pik33 · 2022-01-31 19:53

0.23.
32 channels, but it cannot run at 7.09 MHz. This now opens the possibility of playing DOPE.MOD if...

(1) PSRAM.... and its loading time doesn't destroy timings.
(2) the file format needs to be decoded, as it is not a standard mod file.

Is there any PSRAM 1-chip driver for initial experiment? Or maybe try and write one to understand how it works....

Ahle2 · 2022-01-31 20:00

@pik33

Great work... I love seeing so much oldschool retro stuff (a la Jay Miner) in a single project. A display list (next, implement a real copper) and a sound system locked to the pixel clock which doesn't add any aliasing. That was the reason why the Amiga had way better sound quality in REALITY compared to the SNES. A first look at the specifications for the two systems (without a deeper technical understanding) would seem to indicate the SNES was the clear winner. But the SNES downmixed everything to 32 kHz and added an awful lowpass filter to hide the aliasing. Then it could only access 64 kB of sample data, so most music sounded more like "chip tunes". If a game had sound FX as well, the music and the sound FX had to share that tiny 64 kB (tiny for a sample based sound system) data.

If your driver works the same as the Amiga, you should be able to skip the long[freq] := amigaPeriodToFreq(currPeriod[channelNumber] + deltaPeriod[channelNumber]) and do long[freq] := currPeriod[channelNumber] + deltaPeriod[channelNumber]. I added that conversion to convert from the real Amiga periods to reSound's phase accumulator frequency format. The tracker player was made to handle everything according to the Amiga standard internally and then just convert to reSound-compatible parameters in the API's get methods. But I don't even understand how that conversion can work with your driver at all; it shouldn't!!
The new version of the player (it will be released with the next version of simpleSound) will have a lot of small fixes, which are actually mostly implementations of bugs from the real ProTracker play routine on the Amiga. It will play many more tunes correctly. It's extremely interesting that the .mod standard was so EXTREMELY buggy and that the music created for it made use of that to do great things that were not intended. (reminds me of something...)

Btw, that "dope.mod" is not a .mod at all; it's actually a .xm module and is not compatible with the ProTracker format at all. (I'm already looking into a .xm and .dbm tracker player that will work fine with the 64 available channels in reSound. Stay tuned)
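
The difference between a period-driven driver and a phase-accumulator one, as discussed in the post above, can be sketched in Python. Everything here is an illustration under stated assumptions: the PAL Paula clock of 3,546,895 Hz, a 32-bit accumulator, and a 44,100 Hz mixer rate are all made-up parameters for the example, and neither function is the real amigaPeriodToFreq or anything from reSound.

```python
PAULA_PAL_HZ = 3_546_895  # assumption: PAL Amiga Paula clock

def period_to_rate_hz(period):
    """Paula plays one sample every 'period' clock ticks."""
    return PAULA_PAL_HZ / period

def rate_to_phase_increment(rate_hz, mixer_rate_hz, acc_bits=32):
    """Hypothetical conversion for a mixer that steps a phase accumulator."""
    return round(rate_hz / mixer_rate_hz * (1 << acc_bits))

period = 428  # a commonly used ProTracker period value
rate = period_to_rate_hz(period)
print(f"period {period} -> {rate:.1f} Hz")
print("phase increment:", rate_to_phase_increment(rate, 44_100))
```

A driver that counts Paula clock ticks itself can take the period value as it is; only a mixer with a fixed output rate needs the second step.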

pik33 · 2022-01-31 20:07

you should be able to skip the long[freq] := amigaPeriodToFreq(currPeriod[channelNumber] + deltaPeriod[channelNumber]) and do long[freq] := currPeriod[channelNumber] + deltaPeriod[channelNumber].

It is already skipped; the driver gets periods exactly as they are. Look at player4.bas (a very messy, experimental field). Basic is another good old retro thing. The 10*(...)/10 gives me a tool to experiment with different driver base frequencies. That conversion makes no sense at all for this driver. I have to add a few hacker's lines to the tracker player to tell the driver the sound has to start from 0

dpoke base+24, 10*(tracker.currPeriod(0)+tracker.deltaperiod(0))/10

This is one channel:

    if tracker.trigger(0)<>old1 then
      old1=tracker.trigger(0)
      lpoke base+8,tracker.currSamplePtr(0) or $40000000
      lpoke base+12,tracker.currsamplelength(0)-tracker.currrepeatLength(0)
      lpoke base+16,tracker.currsamplelength(0)
    endif
    dpoke base+20, (tracker.currVolume(0)+tracker.deltavolume(0))*mainvolume
    dpoke base+22, 8192-2048
    dpoke base+24, 10*(tracker.currPeriod(0)+tracker.deltaperiod(0))/10
    dpoke base+26, 1

Wuerfel_21 · 2022-01-31 20:46

@Ahle2 said:

That was the reason why the Amiga had way better sound quality in REALITY compared to the SNES. A first look at the specifications for the two systems (without a deeper technical understanding) would seem to indicate the SNES was the clear winner. But the SNES downmixed everything to 32 kHz and added an awful lowpass filter to hide the aliasing. Then it could only access 64 kB of sample data, so most music sounded more like "chip tunes". If a game had sound FX as well, the music and the sound FX had to share that tiny 64 kB (tiny for a sample based sound system) data.

Don't diss SNES too hard though - how many games for unexpanded Amiga actually have ~128k of space left for audio data (factoring in that Paula uses 8 bit uncompressed samples vs 4 bit compressed on S-DSP)? And 32kHz is waaay enough if what you're doing is anywhere near band-limited. They really should have added a bit to switch to simple 2-sample linear interpolation though instead of that 4-sample gaussian filter that makes everything sound muffled.

But yeah, SEGA had better sound. Not sure where I'd place Amiga in relation (4 channels is just a bit meager), but given that A1000 is from '85 I feel like it doesn't need to be in that competition. Really should've doubled to 8 channels for ECS or AGA though.

rogloh · 2022-01-31 23:05

@pik33 said:
0.23.
32 channels, but it cannot run at 7.09 MHz. This now opens the possibility of playing DOPE.MOD if...

(1) PSRAM.... and its loading time doesn't destroy timings.
(2) the file format needs to be decoded, as it is not a standard mod file.

Is there any PSRAM 1-chip driver for initial experiment? Or maybe try and write one to understand how it works....

I am working on a mod to my quad PSRAM driver for single chip - it is mostly working (tested with graphics) but needs final release and that could be soon.

As to (1) above: PSRAM access latency can be a killer for random reads, but I do have a way to generate a request list for accesses, so the requesting COG can make multiple requests in a single query and the driver can generate back-to-back memory accesses at all the addresses in the list as it works its way through. This could happen in the background for your next channel sample while the current one is being processed, for example. Reads are delivered to the hub as they arrive, and the list pointer is also updated in HUB RAM so you know which results are already present (like scatter/gather I/O). I think this would help when the read algorithm is designed to work with it, and should allow multiple audio channels instead of just a handful. The bandwidth is there with PSRAM; you just have to work with the latency it has. I've done so successfully with real-time video, for example.
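
The request-list idea described above can be pictured with a toy Python model. To be clear, this is not the API of the PSRAM driver; the class, its methods, and the done counter are all hypothetical names, and the "memory" is just a Python list standing in for external RAM.

```python
class RequestListModel:
    """Toy model: a client posts a list of addresses, a driver fills results in order."""

    def __init__(self, memory):
        self.memory = memory
        self.requests = []
        self.results = []
        self.done = 0          # how many results are already valid (the "list pointer")

    def submit(self, addresses):
        """Client side: post all addresses in one query."""
        self.requests = list(addresses)
        self.results = [None] * len(self.requests)
        self.done = 0

    def service_one(self):
        """Driver side: complete the next pending read, if any."""
        if self.done < len(self.requests):
            self.results[self.done] = self.memory[self.requests[self.done]]
            self.done += 1

psram = [v * 10 for v in range(100)]   # stand-in for external memory contents
drv = RequestListModel(psram)
drv.submit([5, 17, 42])                # e.g. next sample address for three channels
drv.service_one()
drv.service_one()
print(drv.done, drv.results)           # prints 2 [50, 170, None]
```

The client can keep mixing the current samples and only consume entries below done, which is how the latency of each individual read gets hidden.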
