Paula (Amiga)-inspired audio driver [0.93: non-integer skip = fine tuning enabled]
First working alpha (0.08) attached. It is not optimal in time or memory use; optimization is a TODO for the next versions. There is also an unsolved problem which can make this driver stop working after about 1000 seconds due to an internal counter overflow.
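For scale, the overflow time follows from the counter width and the tick rate. This sketch assumes a 32-bit counter ticking at the PAL Paula clock of 3,546,895 Hz (the post only says "about 3.5 MHz"); the function name is illustrative.

```python
# Assumption: the driver's time counter ticks at the PAL Paula clock.
PAULA_PAL_HZ = 3_546_895  # NTSC machines use 3_579_545 Hz


def rollover_seconds(bits: int, tick_hz: int = PAULA_PAL_HZ) -> float:
    """Seconds until an unsigned counter `bits` wide wraps back to zero."""
    return (1 << bits) / tick_hz


for bits in (32, 31, 29):
    print(f"{bits}-bit counter wraps after {rollover_seconds(bits):.1f} s")
```

A full 32-bit wrap at this rate takes roughly 1211 s (about 20 minutes), the same order as the figure above. A counter that wraps at $20000000 (2^29 ticks) would do so roughly every 151 s.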
Paula is the Amiga's 4-channel, sample-based audio chip. Every channel can have its own sample rate. This means:
- no resampling: no interpolation, no filtering and no aliasing artifacts added by resampling. Instead, every sample is output "as is"
- if the sample is a single-period waveform (as in chiptunes) or contains an exact integer number of wave periods, the aliases introduced by Paula (due to its low sample rate) are harmonics. This means they add to the sound rather than being annoying, alien, non-harmonic noise
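Why the aliases become harmonics can be checked with a little arithmetic. Holding each sample for one sample period, as Paula does, creates images of a tone f around every multiple of the sample rate R, at k*R - f and k*R + f. The sketch below is an idealized model that ignores the roll-off of the hold (that changes image levels, not their frequencies). It expresses each image as a multiple of f. For a waveform whose period is an integer number of samples, every image is an exact harmonic. At 22.5 samples per period, the odd-k images are not.

```python
from fractions import Fraction


def image_ratios(period_samples, k_max=3):
    """Frequencies of the sample-and-hold images k*R - f and k*R + f,
    expressed as multiples of the tone frequency f.

    With a period of P samples at rate R, f = R / P, so k*R / f = k * P
    and the result does not depend on R.
    """
    p = Fraction(period_samples)
    ratios = []
    for k in range(1, k_max + 1):
        ratios += [k * p - 1, k * p + 1]
    return ratios


def all_harmonic(ratios):
    """True if every image lands on an integer multiple of f."""
    return all(r.denominator == 1 for r in ratios)


print([str(r) for r in image_ratios(32)])              # 32-sample single cycle
print([str(r) for r in image_ratios(Fraction(45, 2))])  # 22.5 samples/period
```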
So the goal was to make a synthesizer that does exactly that, at the original Paula sample rate, so it can easily be used for Amiga modules.
The problem, of course, was computation time. At a sample rate of about 3.5 MHz, the program has to determine whether a new sample from each channel needs to be output and, if so, retrieve that sample from hub memory. At 320 MHz I have only 90 clocks for this. That is simply not possible: one RDxxx can take 17 cycles, and if all 4 channels happen to need an update at the same time, this takes far more than 90 clocks.
And a P2 cog has only 1 streamer and 1 FIFO.
The solution: the Amiga uses low sample rates. Even at 35 kHz there are about a hundred Paula cycles, or about 9000 P2 clocks, between samples. This means I have plenty of time to compute them. The 3.5 MHz procedure only has to output each sample at the proper time.
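The budget figures above can be reproduced directly. The sketch assumes the PAL Paula clock of 3,546,895 Hz for "about 3.5 MHz"; function names are illustrative.

```python
PAULA_PAL_HZ = 3_546_895  # assumed PAL Paula clock ("about 3.5 MHz")


def clocks_per_paula_tick(sysclk_hz):
    """P2 system clocks available per Paula tick."""
    return sysclk_hz / PAULA_PAL_HZ


def paula_ticks_per_sample(sample_rate_hz):
    """Paula ticks between two samples of a channel at this rate."""
    return PAULA_PAL_HZ / sample_rate_hz


sysclk = 320_000_000
print(f"{clocks_per_paula_tick(sysclk):.1f} P2 clocks per Paula tick")
print(f"{paula_ticks_per_sample(35_000):.0f} Paula ticks per 35 kHz sample")
print(f"{sysclk / 35_000:.0f} P2 clocks per 35 kHz sample")
```

At exactly 100 times the Paula clock (about 354.7 MHz), the per-tick budget becomes 100 clocks.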
So the main sample output procedure is driven by interrupts from a DAC channel. It reads the next sample to output, and its time, from a circular buffer in LUT. Meanwhile, the synthesizer computes the sample values and their output times and places them in the buffer.
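The hand-off between the synthesizer and the DAC interrupt is a classic single-producer/single-consumer ring buffer. The Python model below is a sketch under my own assumptions, not the driver's LUT layout. The class and method names are hypothetical, entries are (due_time, sample) pairs, and at most one entry is consumed per tick. The time test is a plain <= that ignores counter wraparound. Only the producer writes `head` and only the consumer writes `tail`, which is what makes the scheme safe without locks.

```python
class TimedSampleRing:
    """Single-producer/single-consumer ring of (due_time, sample) pairs.

    The producer (synthesizer main loop) appends entries in time order; the
    consumer (the per-tick ISR) emits the head entry once its time is due.
    """

    def __init__(self, size=256):
        self.size = size
        self.slots = [(0, 0)] * size
        self.head = 0      # next slot the producer writes
        self.tail = 0      # next slot the consumer reads
        self.current = 0   # value currently held on the DAC

    def free(self):
        """Free slots; one slot stays empty to tell 'full' from 'empty'."""
        return self.size - 1 - (self.head - self.tail) % self.size

    def push(self, due_time, sample):
        if self.free() == 0:
            return False   # buffer full: the producer must wait
        self.slots[self.head] = (due_time, sample)
        self.head = (self.head + 1) % self.size
        return True

    def on_tick(self, now):
        """Called once per Paula tick; returns the value to put on the DAC."""
        if self.tail != self.head:
            due, sample = self.slots[self.tail]
            if due <= now:
                self.current = sample
                self.tail = (self.tail + 1) % self.size
        return self.current


ring = TimedSampleRing(8)
ring.push(3, 100)
ring.push(5, -50)
print([ring.on_tick(t) for t in range(7)])
```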
This allowed me to write an enhanced version of a "Paula-like thing". It can play 8- and 16-bit samples (the sample type is selectable per channel) and stereo pan them, in 8 channels.
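Mixing mixed-width channels with panning can be sketched as follows. Everything here is an illustrative assumption rather than the driver's actual arithmetic: 8-bit samples are promoted to 16 bits by shifting left 8, pan runs 0..256 with a linear law, and the 8-channel sum is shifted right by 3 to keep headroom.

```python
def to16(sample, is_8bit):
    """Promote a signed 8-bit sample to 16-bit range; 16-bit passes through."""
    return sample << 8 if is_8bit else sample


def mix(channels):
    """Mix (sample, is_8bit, pan) channel tuples into one (left, right) pair.

    Assumptions for illustration only: pan runs 0 (hard left) .. 256 (hard
    right) with a linear pan law, and the sum of 8 channels is shifted right
    by 3 so that full-scale input cannot overflow 16 bits.
    """
    left = right = 0
    for sample, is_8bit, pan in channels:
        s = to16(sample, is_8bit)
        left += (s * (256 - pan)) >> 8   # arithmetic shift, floors negatives
        right += (s * pan) >> 8
    return left >> 3, right >> 3


# Channel 0: 8-bit sample hard left; channel 1: 16-bit sample centred.
print(mix([(127, True, 0), (-20000, False, 128)]))
```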
The driver with a test sine wave sample attached.
The "official" driver repository: https://github.com/pik33/P2-retromachine/tree/main/Propeller/Audiodriver
This is part of the project: https://github.com/pik33/P2-retromachine
I've been poking around your GitHub link. I really like what you are doing there! I too would love to see a P2 retromachine (which is a great name, BTW). One thing I want, though, and I don't know if you share this desire, is to have everything contained within the retromachine itself. To me, a real retrocomputer needs a local editor and compiler/assembler. Back in the day we programmed on the computer we used... now the computer I use is far too distracting, or maybe I am just far too easily distracted. It would also be cool to have access to code repositories (GitHub or GitLab) and the forum right from the P2... maybe over USB to a phone. The inability to play videos on the P2 is actually a plus.
One question, why the 960x540 resolution? Is there some emulation that works well at this res?
960x540 is simply Full HD (1920x1080) divided by 2 in each direction.
As it is now, the video driver doesn't work at this resolution (where did I leave this 960x540 in the code? I wanted it, but I don't have it yet.)
Edit: found and removed the reference in the header text. The driver actually has a maximum resolution of 1024x576 @ 50 Hz (needs 360 MHz) and several smaller ones to select from. At 320 MHz, 896x496 @ 50 Hz and 800x480 @ 60 Hz are available.
Yes, this should be a self-contained machine with a Basic interpreter (for a start). That's why I used a Pi for the keyboard/mouse. I also have a MIDI shield, and I bought several DB9, DB15 and DB25 type connectors. I am waiting for HDMI breakout boards, which should be available tomorrow; then I want to 3D print a box for the machine. I will start a topic for it then.
I tried to attach Ahle2's tracker player to this "Paula": it (kind of) works, but I don't understand when I have to retrigger samples.
Edit: I actually managed to play a module using this. It's too late to play further; the current code with the module is on GitHub.
Nice work @pik33 . I just tried it out.
I think you can save about 15 P2 instructions (and 30 clock cycles) in your main loop, where you branch to the different cog addresses that compute each channel's samples. Instead of writing a value to "cn" which you test and branch on later, you can move the address of the branch target directly into cn right away and then jump to it at the end of the time tests (an indirect jump).
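The suggestion can be illustrated with a Python analogue. This is not P2 code; the channel handlers and the time list are hypothetical stand-ins. The second version drops the final compare-and-branch chain because the "which channel is next" decision already yields the target. In PASM that corresponds to loading a code address into cn during the time tests and ending with an indirect jump through it.

```python
def compute_channel(n):
    return f"channel {n} computed"


# One handler per channel; in PASM each would be a cog code address.
HANDLERS = [lambda n=n: compute_channel(n) for n in range(8)]


def dispatch_via_index(times):
    """Original pattern: remember the earliest channel's index in cn, then
    test cn against each value to find the branch target."""
    cn = 0
    for n in range(1, len(times)):
        if times[n] < times[cn]:
            cn = n
    for n, handler in enumerate(HANDLERS):   # the extra compare-and-branch chain
        if cn == n:
            return handler()


def dispatch_via_target(times):
    """Suggested pattern: keep the handler itself in cn and jump to it once."""
    cn, best = HANDLERS[0], times[0]
    for n in range(1, len(times)):
        if times[n] < best:
            cn, best = HANDLERS[n], times[n]
    return cn()


print(dispatch_via_target([40, 12, 33, 12, 90, 7, 50, 7]))
```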
Implemented. I still have to repair the overflow problem. Time counts at the Paula rate, 3.5 MHz, and when it rolls over, this fle sequence will not work.
Implemented the optimization above
Optimized ISR (2 nops gained)
The driver no longer fails when the counter overflows. This cost 17 instructions in the main loop and one in the ISR (1 nop lost; 2 nops left there).
The tracker player is now fully working. The main file in the attached zip is player.bas. Insert the module name in the shared asm section. HDMI is at pin 0, the AV board at pin 8 (audio at P14, P15). HDMI now displays some debug output. As FlexBasic can read files from an SD card and the retrocog can get data from the FlexProp serial terminal, a tracker player with modules on SD is now possible.
Edit: the driver can still fail on rollover. The bug has yet to be found.
Corrected (?): cmp time, #$80000000 wc doesn't set C if time = $80000000. Changed to $7FFFFFFF. Also, the initial frequencies for the channels were set way too low. To make debugging easier I set the rollover to $20000000 instead of $80000000. I am listening to the module and the counter has rolled over several times. Maybe the bug is fixed.
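A common way to avoid rollover bugs entirely is to never compare two raw times. Instead, subtract them modulo 2^32 and look only at the top (sign) bit of the difference. This is a generic sketch, not the driver's actual fix. In P2 assembly it would map to a subtraction followed by a test of bit 31; the Python below models the same arithmetic.

```python
MASK32 = 0xFFFF_FFFF


def is_due_naive(now, due):
    """Plain unsigned comparison: breaks when `now` wraps past 2**32."""
    return now >= due


def is_due_wrapsafe(now, due):
    """True once `now` has reached `due`, even across a 32-bit wrap,
    provided the two times are less than 2**31 ticks apart."""
    diff = (now - due) & MASK32   # modular subtraction, as 32-bit hardware does
    return diff < 0x8000_0000     # top bit clear: now is at or after due


due = 0xFFFF_FFF0                 # scheduled 16 ticks before the wrap
now = 0x0000_0010                 # 16 ticks after the wrap
print(is_due_naive(now, due), is_due_wrapsafe(now, due))
```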
Assuming you're using an HDMI monitor, try this for 960x540 mode:
If you want to adjust the pixel clock to suit a different sysclock, it's as simple as updating the clock frequencies and the (72<<11) for the vertical blank time.
Calculated as 30e6 / (960 + 80) / 50 - 540 - 3 = 30e6 / 1040 / 50 - 543 ≈ 34
So there is a total of four effective parameters to build complete HDMI timings from:
1 - sysclock frequency (sysfrq)
2 - horizontal resolution (hres)
3 - vertical resolution (vres)
4 - vertical frequency (vfrq)
Remaining timings are:
dotfrq (>= 25 MHz) = sysfrq / 10
htot = hres + 80
hfp = 8
hsw = 64
hbp = 8
vtot = dotfrq / htot / vfrq
vfp = 1
vsw = 2
vbp (>= 9) = vtot - vres - 3
EDIT: Added the minimum for dotfrq. The DVI/HDMI minimum link speed is 250 MHz (a 25 MHz pixel clock x 10 bits per TMDS character).
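The rules above fit in a small timing calculator. This is a sketch; the function and field names are my own. Rounding vtot to the nearest whole line is my assumption, since the rules do not say whether to round or truncate. Rounding reproduces the back porch of 34 from the 300 MHz, 960x540 @ 50 Hz example.

```python
from dataclasses import dataclass


@dataclass
class HdmiTiming:
    dotfrq: float
    htot: int
    hfp: int
    hsw: int
    hbp: int
    vtot: int
    vfp: int
    vsw: int
    vbp: int


def hdmi_timing(sysfrq, hres, vres, vfrq):
    """Derive the remaining timings from the four inputs, per the rules above."""
    dotfrq = sysfrq / 10              # 10 TMDS bits per pixel, one per sysclock
    if dotfrq < 25e6:
        raise ValueError("pixel clock below the 25 MHz DVI/HDMI minimum")
    htot = hres + 80                  # hfp 8 + hsw 64 + hbp 8
    vtot = round(dotfrq / htot / vfrq)
    vbp = vtot - vres - 3             # vfp 1 + vsw 2
    if vbp < 9:
        raise ValueError(f"vertical back porch {vbp} is below the minimum of 9")
    return HdmiTiming(dotfrq, htot, 8, 64, 8, vtot, 1, 2, vbp)


t = hdmi_timing(300e6, 960, 540, 50)
print(t.vbp, t.dotfrq / (t.htot * t.vtot))   # back porch, actual refresh (Hz)
```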
@pik33 I'm looking at whether the larger mod file samples can be played directly from external PSRAM... I've got to understand your code, @Ahle2's tracker audio sample addressing, and how frequently samples need to be read. I'm hoping there may be some scope to do this. Perhaps the music sequence data would still have to come from HUB if it is accessed frequently and randomly, but it might be possible to keep the sample data in external RAM.
@pik33 You might be able to read from external memory in your Paula driver with something like this; duplicate it for the other channels. I'm not sure whether it would keep up and handle the extra read latency of about 1us per sample.
EDIT: with some dummy read requests to external PSRAM in addition to the real HUB read, the Paula code seems to keep up with the mod file. I'm now trying to copy the 512k of actual HUB data over to PSRAM and read from there (at the same address), but it just mutes the output for some reason: a bad addressing issue of some type, or the write is failing.
Update: Found the fix! The wc in the mailbox read doesn't work with the setq burst so I just changed the jmp condition following the read to a tjs instruction instead. Now it is playing back mod file data from a PSRAM cached copy of the instrument samples (at the same address offset from the start of memory as used in the HUB RAM version, just for proof-of-concept convenience).
Here's the new magic code to read mod data from my PSRAM driver (or SRAM/HyperRAM for that matter).
The main loop code has to avoid big setq/rdlong/wrlong bursts or anything else that prevents the interrupt from being processed. If you managed to make this work, it is now possible to play big modules from PSRAM. I want to rewrite the tracker player in FlexBasic, to understand it better and to have full control of what it does. Maybe writing a tracker itself could be a good idea. I have to attach some PSRAMs to the P2; I have several chips in the drawer.
Yes it is worth playing with. PSRAM is very flexible on the P2. I just updated the code a second time in the zip above with a player fix if you already grabbed it.
I am learning FlexBasic and a module structure at the same time. The code in https://github.com/pik33/P2-retromachine/tree/main/Propeller/Tracker player now displays a module name and a sample list.
An all-retro-computer mix: Atari 8-bit colors, the Atari ST font and Amiga modules. As Sidcog is available, the C64 part will be added to this mix soon.
I've been thinking about modifying OPN2Cog into "OPNACog" (this would entail adding rhythm channels and replacing the SN76489 PSG with an AY-3-8910); with that you get some PC-98 up in there, too. Hey, wasn't it you whomst asked for P2 Bad Apple once? Could play the original chiptune (which most people don't even know is a thing, lol) with that, haha.
I have a strange, hard-to-catch timing problem in the driver. In one of the modules, after several (2, maybe 3) counter rollovers, one of the channels stopped playing. This was repeatable (after recompiling and reloading, the bug always appeared in the same place).
I tried to debug this by dumping the channel counters from the cog to the hub in the main loop and displaying them on screen from the Basic main program.
After this change the module has now played for 3 hours without a glitch... But at least I can see these counters, and I have several hundred modules to play and test.
Edit... OOPS!!! I must not add #1 to the "front" variable and then add #1 again 3 instructions later, with this debug WRLONG among them!!! An interrupt may occur between these adds, get an odd value there, and fail. I moved this RDLONG to the end of the loop, which magically repaired the program, but the problem still exists... and it is easy to correct.
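The race can be reproduced in a few lines of Python. The sketch assumes, though the post does not say so explicitly, that each buffer entry occupies two longs, so a consistent front index is always even. The `interrupt` callback is a hypothetical stand-in for the ISR firing between instructions. The fix is to build the new value privately and publish it with a single write.

```python
def producer_two_step(state, interrupt):
    """Bug pattern: `front` is bumped twice, so an interrupt taken between the
    two adds sees an odd index that points into the middle of an entry."""
    state["front"] += 1
    interrupt(state)              # the ISR may fire here
    state["front"] += 1


def producer_one_step(state, interrupt):
    """Fix: compute the new index privately, then publish it with one write."""
    new_front = state["front"] + 2
    interrupt(state)              # the ISR still sees the old, even index
    state["front"] = new_front


def recording_isr(seen):
    """Hypothetical ISR stand-in that records every `front` value it reads."""
    def isr(state):
        seen.append(state["front"])
    return isr


for producer in (producer_two_step, producer_one_step):
    seen = []
    producer({"front": 0}, recording_isr(seen))
    print(producer.__name__, "ISR saw front =", seen)
```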
960x540 achieved. I forgot to update one of my timing constants, and this created an unstable picture. My HDMI driver now has a lot of flexibility: I have to add a procedural timing calculator to it, enabling user-defined mode timings.
And it is displaylisted, so I can mix text and graphic modes on the same screen.
What is visible now in the player is 960x540, with an active 896x496 area inside a border. The Basic module player uses a 112x31 256-color text mode.
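The 112x31 text grid follows from the 896x496 active area if the font cell is 8x16 pixels. That is an assumption here, though it fits the Atari ST font mentioned below. The display-list format in this sketch is hypothetical: a list of (mode, scanlines) bands from top to bottom. Mixing text and graphics would then just mean replacing some text bands with graphics bands.

```python
FONT_W, FONT_H = 8, 16  # assumed text cell size


def build_display_list(total_lines, text_rows, cell_h=FONT_H):
    """Hypothetical display list of (mode, scanlines) entries, top to bottom:
    a border band, one entry per text row, and a border band."""
    active = text_rows * cell_h
    top = (total_lines - active) // 2
    return ([("border", top)]
            + [("text", cell_h)] * text_rows
            + [("border", total_lines - top - active)])


cols, rows = 896 // FONT_W, 496 // FONT_H
display_list = build_display_list(540, rows)
print(cols, rows, display_list[0], display_list[-1])
```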
Nice one. Glad someone else has tested the method now.
I think I may have been using your earlier display driver to experiment with myself. I have it calculating from six parameters, including the graphics area parameters too.
Looks (and sounds) cool, you'll have to add the obligatory scrolling note history too like the classic trackers had.
That was the list of instrument names! Demo writers repurposed it by replacing all the names with a blurb that could be viewed like a scroller.
I meant something like this where the note data scrolls:
Oh, that's the score editor itself. Yeah, they scrolled when in playback.
Example snippet of playback:
I was thinking that you would do a complete emulation of the Paula (except the floppy controller, UART and IO stuff). I was looking forward to that.
Even implementing the PWM volume control that ran at 1/64 of the color clock (55.9 kHz), and the audio filter. The PWM volume interferes a little with the audio signal in the range below 20 kHz. These things all add to the well-known "Amiga sound".
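The idea of PWM volume can be modelled very simply. This is an idealized sketch, not a verified model of Paula's circuitry. Within each 64-tick PWM period, a channel at volume v (0..64) outputs its sample for v ticks and zero for the rest, so its average level is sample * v / 64. Each channel would run this independently. The 55.9 kHz figure corresponds to the NTSC colour clock of 3,579,545 Hz divided by 64.

```python
PWM_STEPS = 64  # volume 0..64, one PWM period = 64 colour-clock ticks


def pwm_volume_ticks(sample, volume):
    """Per-tick DAC values for one PWM period at the given volume."""
    if not 0 <= volume <= PWM_STEPS:
        raise ValueError("volume must be in 0..64")
    return [sample if tick < volume else 0 for tick in range(PWM_STEPS)]


def average(ticks):
    return sum(ticks) / len(ticks)


print(average(pwm_volume_ticks(100, 32)))  # mean level over one PWM period
print(3_579_545 / PWM_STEPS)               # NTSC colour clock / 64, in Hz
```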
Anyway, great work!
I had to start somewhere. The problem is that timings are tight, the ISR is full, and all these things need to be computed at full Paula speed. The solution may be to go from a Paula * 90 to a Paula * 100 system clock. This is at the extreme end of the P2 clock range. Going to Paula * 100 gives another 5 nops for the ISR. The setq/rdlong in the main loop also interferes with the ISR; scattering these RDLONGs through the code instead can give me another 4 nops for the ISR. 5 + 4 + the 2 I already have = 11 nops. I need 6 for a filter; 5 may be enough to implement the PWM.
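For a sense of how little a basic filter can cost, here is a fixed-point one-pole low-pass, y += (x - y) >> k. It needs only a subtract, an arithmetic shift and an add per sample. This is a generic technique, not a model of the Amiga's actual analog filters, and k = 7 is only an example value. Run at the Paula rate, its cutoff is roughly fs / (2 * pi * 2^k).

```python
import math


def one_pole_lowpass(samples, shift):
    """Fixed-point one-pole low-pass: y += (x - y) >> shift, once per input."""
    y = 0
    out = []
    for x in samples:
        y += (x - y) >> shift     # arithmetic shift, like P2 SAR
        out.append(y)
    return out


def cutoff_hz(sample_rate, shift):
    """Approximate -3 dB point, valid for small alpha = 2**-shift."""
    alpha = 2.0 ** -shift
    return sample_rate * alpha / (2 * math.pi)


print(round(cutoff_hz(3_546_895, 7)))
```

With plain integer state, the output can settle up to 2^k - 1 short of a rising input, because (x - y) >> k becomes zero. Keeping a few extra fractional bits in y avoids that dead zone.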
I may have to stop working on this in the near future, as the new (also P2-based) robot boards have arrived and I have to switch to professional work... The tracker/player/Paula is a playground for me to learn: the robot has to say something, and now, instead of a simple 1-voice mono driver, I can use this. The main robot control program I wrote in Spin made me hate Spin with its lack of proper string handling, multidimensional arrays and indenting; the next version of the robot code will be rewritten in FlexBasic...
Edit: tested this at 354.693878 MHz. As I expected, I have 5 more nops, which should be enough for the filter. As I understand it, there is one filter for all channels... The PWM volume control has to be computed for each channel independently, and I don't know yet how I can do this.
You can save some cycles in your ISR.
Use ptrb for the LUT address tail register with auto increment.
Use the incmod instruction. Might need to be a2000000 instead of a1ffffff.
I already tried ptrb; it didn't work and I don't know why. Now I have this 354 MHz setup to experiment with. I haven't tried incmod yet. I've had bad experiences with incmod: it has never worked for me any time I tried it. There is something I don't understand.
So ptrb and incmod will now be tested until they work.
I used RDLUT with ptrb and auto-increment in my driver, and it works. INCMOD also works; I just can't recall offhand whether it needs the final value or the final value + 1 set up in the S register (it might be the final loop value, so a1ffffff).
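Going by my reading of the P2 instruction description for INCMOD (worth checking against the current Parallax documentation), the comparison happens before the increment. S therefore holds the final value rather than final + 1. The model below shows a counter with S = 3 cycling through 0, 1, 2, 3 and back to 0, with C set on the wrap.

```python
MASK32 = 0xFFFF_FFFF


def incmod(d, s):
    """Model of P2 INCMOD D,S (flags reduced to C): if D == S then D = 0 and
    C = 1, else D = D + 1 and C = 0. S therefore holds the last value D takes."""
    if d == s:
        return 0, 1
    return (d + 1) & MASK32, 0


d, seen = 0, []
for _ in range(6):
    seen.append(d)
    d, c = incmod(d, 3)
print(seen)
```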
Also, there should be plenty of time to execute the main (non-ISR) sample reading code for normal mod files, even if the ISR chews up 50% of the cycles. I think the Amiga HW couldn't generate its samples faster than about 29 kHz. So this is four samples (or 8 in your case) in 34 us. At over 300 MHz you still have more than 5k P2 cycles for your per-channel calculations and sample reading, with 50% of the cog, in 34 us. I found there is plenty of time to read PSRAM samples even with a latency of 1 us or so per read operation.
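The budget in the paragraph above can be written out explicitly. The inputs are the figures from the post: 300 MHz, about 29 kHz maximum rate, 50% of the cog taken by the ISR, 8 channels and about 1 us per PSRAM read. Treating the reads as fully serialized, with the cog idle while waiting, is my own worst-case assumption.

```python
def frame_budget(sysclk_hz, max_rate_hz, isr_share, channels, read_latency_s):
    """Time per output sample at the highest rate, main-loop cycles left
    after the ISR's share, and fraction of that time spent waiting on
    external RAM if every channel does one serialized read per sample."""
    frame_s = 1 / max_rate_hz
    cycles_left = sysclk_hz * frame_s * (1 - isr_share)
    wait_fraction = channels * read_latency_s / frame_s
    return frame_s, cycles_left, wait_fraction


frame_s, cycles, wait = frame_budget(300e6, 29e3, 0.5, 8, 1e-6)
print(f"{frame_s * 1e6:.1f} us per sample, {cycles:.0f} cycles, "
      f"{wait:.0%} of the frame waiting on PSRAM")
```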