HDMI discussion

rogloh · 2024-10-26 05:29

Further to this. I should be able to get rid of this runt pulse if I can align the CMOD register change with the streamer using a waitx. I found that a waitx #22 preceding the setcmod call will not show it on the scope while #21 does show a small glitch.
Putting this in takes an extra COGRAM long though. With this fix I have only one spare COGRAM long left now. But if all features are in and working that's okay. If I shuffle things around to poll audio here too I could use a smaller wait value so might not waste all of these clock cycles.

x5                          xcont   m_imm2, dataguard       '   *        |       *      send leading guard band
x6                          xcont   m_imm2, dataguard_vsync '   |        *       |      send leading guard band (vsync)
x7                          xcont   m_hubpkt, hsync1        '   *        *       *      send two packets
                            waitx   #22                     '   *        *       *
vsyncinstr                  setcmod cmodval                 '   *        *       *      patched to drvl/drvh or setcmod cmodval
extrapkts   if_z            xcont   m_hubpkt, #0            '   |        *       |      send two more packets
                            call    #load_mouse_palette     '   *        |       |      read mouse code & large palette

Update: actually this waitx instruction can be skipped on active lines, so there is no need to incur this delay in all cases. That will give more time for region processing and mouse sprite code to be loaded etc and to operate then.

rogloh · 2024-10-27 03:54

Was thinking about that spare COGRAM and what use to put it to. Once I know it's free and doesn't need to be applied for fixing bugs etc, I think I'll try to add a COGATN into the path to notify other COGs of either the last VSYNC line being sent or any new scan line starting, configurable at startup time along with the ATN mask. This could be very handy for sprite COGs using the driver's transparent region mode which would otherwise need to repeatedly poll the status long in HUB RAM to see which line of video is being sent to remain in sync. Alternatively a per VSYNC ATN can be useful for games or other applications requiring buffer flipping etc.

Additionally I am now trying to get the mouse sprite workload split up into two phases, one setup part to execute in HSYNC while the HDMI packets are being sent from HUB (we still have some free clocks there), and its other loop part mixing pixels and rendering back to the scan line running in the back porch. Achieving this should buy us just enough clocks to run with up to 4 audio samples processed as well under worst case situations of pixel doubling the 24bpp graphics mode while the region changes for the next line. That's the hope anyway. This would be great as it would allow a 96kHz audio sample rate (actually up to ~126kHz input rate) which is the highest that can be guaranteed in VGA timing when a single audio packet is sent per island when the second packet is dedicated to clock regen. This would be really pushing the COG towards the max, something like 99.25% COG timing utilization with the earlier numbers IIRC, although there are typically creative ways to improve things further. It just gets more difficult and makes the code even harder to understand.

Update: OMG worst case 96kHz enabled system from what I can tell takes ~7985! clocks with a rather hacked up build optimizing the sync interval time out of the budget of 8000 clocks (99.8% of timing budget). Doing any further processing likely risks overrunning the budget as there could still be some hub slot alignment variation here. I've put the region processing into the horizontal back porch. There's still a bit of free space left in the blanking (and will be more at 27MHz pixel clock resolutions), but not much else I can load into there for now without really splitting up existing subroutines into smaller chunks which complicates the code quite a bit although is still doable if required.

For standard 48kHz enabled setups, it's a much safer ~500 clock margin at the end of each active scan line.

rogloh · 2024-10-28 03:51

Far out, I think I finally solved all the polarity stuff. That was a mess. Found two more bugs, one from flags alteration during a subroutine call affecting things after moving some code order around, and another due to an uninitialized variable. Unfortunately all these things including the earlier ones compounded on top of each other making solving it a bit of a nightmare and taking hours because each individual fix would change the behaviour but not solve it until all were made together and analyzing the output patterns for all possible polarity cases was tedious. That'll teach me to take shortcuts trying to quickly hack things up late at night! LOL.

evanh · 2024-10-28 04:24

Hehe, now you're done, I'm with Ada. I doubt polarity has mattered, even for analogue monitors, for many decades.

rogloh · 2024-10-28 06:39

@evanh said:
Hehe, now you're done, I'm with Ada. I doubt polarity has mattered, even for analogue monitors, for many decades.

I know, but I just like to have things working. Total unnecessary mental punishment agreed.

Right now I'm splitting the video driver into different parts which can be conditionally included to save on HUBRAM space when not required. If the required driver for the output type requested is not present in the build the initDisplay call returns with an error code. This approach cleans up the code as there is a hierarchy now and you can independently access the low level driver in your own code without bringing in all my SPIN2 APIs - you just need to use my display data structure format. This is useful for really low memory overhead setups and the framework should allow further driver expansion later if new output types are needed.

p2videodrv.spin2
|-hdmidrv.spin2
|-basicdrv.spin2

 ' error codes returned
    ERR_NONE = 0 ' no error
    ERR_NO_FREE_COGS = -1 ' can't spawn a new COG
    ERR_NOT_INCLUDED = -2 ' feature not included in this build
    ERR_NOT_SUPPORTED = -3 ' combination of parameters not supported
    ERR_INVALID_PARAMS = -4 ' general parameter error
    ERR_BAD_CLOCK_TIMING = -5 ' clock timing not possible

You can include either component by defining a variable in Flexspin - not sure how to do it for PNut or if that's possible yet. This frees up space if you don't need these. The basic (existing) raw driver code is 5488 bytes and the HDMI compatible driver is 6912 bytes when built with flexspin (including its own overheads), however most of this space can be reclaimed for scan line buffers after spawning the COG if you only need to start the driver once after boot. The SPIN2 API layer code if used takes more space but thankfully flexspin removes dead code. Unfortunately PNut still doesn't though but does compress this interpreted SPIN2 code at least.

BASICDRIVER - existing P2 video driver
supports VGA/Composite/Component/S-video/DVI output options
HDMICOMPATIBLE - new experimental video driver with HDMI compatibility and optional simultaneous output
supports new HDMI, HDMI+VGA, DVI+VGA output options

#define HDMICOMPATIBLE = 1  ' define this to include experimental HDMI compatibility
#define BASICDRIVER = 1 ' define this to include basic VGA/COMPONENT/TV/DVI driver

OBJ
#ifdef HDMICOMPATIBLE
    hdmidrv:"hdmidrv"
#else
    hdmidrv:"nodrv"  ' not included
#endif
#ifdef BASICDRIVER
    basicdrv:"basicdrv"
#else
    basicdrv:"nodrv"  ' not included
#endif

Wuerfel_21 · 2024-10-28 12:04

With #define you don't need the equals sign. In fact, if you're just going to #ifdef, you don't need to define a symbol to anything in particular. Just being defined is enough, i.e. just #define BASICDRIVER.

Also, if you're going to release an update, look into getting the NTSC/PAL coefficients right. Note that the ones in the current driver are designed for the 124Ω DAC. Sadly using it causes a notable bit of banding on smooth gradients. If you look into old versions you can find the 75Ω coefficients (which have other issues)

rogloh · 2024-10-28 12:27

@Wuerfel_21 said:
With #define you don't need the equals sign. In fact, if you're just going to #ifdef, you don't need to define a symbol to anything in particular. Just being defined is enough, i.e. just #define BASICDRIVER.

Yeah I know and I fixed that an hour ago, was just detritus accumulating.
I also found out about the {$flexspin ... } thing too so I can limit PNUT complaining. Unfortunately it doesn't seem like Chip has added the #if stuff though despite mentioning it here
https://forums.parallax.com/discussion/174674/mixing-pnut-and-flexspin-specific-code

Also, if you're going to release an update, look into getting the NTSC/PAL coefficients right. Note that the ones in the current driver are designed for the 124Ω DAC. Sadly using it causes a notable bit of banding on smooth gradients. If you look into old versions you can find the 75Ω coefficients (which have other issues)

Yeah this has been one of the things I've been waiting to get around to doing. I might even look at redefining the timing long structure too - though it increases the memory usage a bit. There was also my LCD panel work too if I can dig that up from wherever I left it.

rogloh · 2024-11-01 07:00

Hi @Wuerfel_21
I am thinking of adding the option of notification of audio samples via a HUB address as an alternative to REPO pins (which are still supported). I'm wondering if it's possible to still have the resampling process work correctly if a client COG feeds new audio sample data after each sample is accepted by the video driver and an ATN is issued back to the client COG. There would need to be a notification (via ATN) to the video driver from the client COG indicating the next sample has been written and is ready for reading and the driver would reply via ATN after each sample is read into the local history buffer (still accepting up to a maximum of 4 buffered samples). The intent is that this gets driven by the output process back to the input COG that feeds the audio samples. So you could have another COG process that might be reading a file off a disk or larger buffer and just gets an interrupt on each ATN and feeds the next sample into the same HUB address. So long as no ATNs are missed, which should not be the case if the startup order is managed, I think this would allow a COG to be driven by the output. It would be more bursty, filling in new samples as soon as the video driver has taken the last one. Am just wondering how this may affect the resampler if data is continuously appearing - maybe it needs to be locked to use a fixed 32/44.1/48kHz output rate in this case (which would be fine).

Wuerfel_21 · 2024-11-01 10:17

?????? it just reduces samples by a certain ratio. Do you not get it? I had a version with linear interpolation somewhere, that was maybe easier to understand conceptually... Linear interpolation is essentially drawing a straight line between samples and then sampling that line. Cubic interpolation is drawing a spline instead, with the next two adjacent samples as control points. The sampling point is the fractional sample position.
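To make the description above concrete, here is a small host-side Python model of stepping through a sample stream at a fractional position. It is only a sketch: the function names are hypothetical, the cubic variant assumes a Catmull-Rom spline (the post does not say which spline the driver uses), and it works in floating point rather than the fixed-point phase registers the COG code keeps.

```python
def lerp(s0, s1, frac):
    """Linear interpolation: sample the straight line between s0 and s1."""
    return s0 + (s1 - s0) * frac


def cubic(sm1, s0, s1, s2, frac):
    """Cubic interpolation between s0 and s1 (Catmull-Rom is an assumption).

    sm1 and s2 are the neighbouring samples acting as control points.
    """
    t = frac
    return 0.5 * (
        2 * s0
        + (s1 - sm1) * t
        + (2 * sm1 - 5 * s0 + 4 * s1 - s2) * t * t
        + (3 * s0 - sm1 - 3 * s1 + s2) * t * t * t
    )


def resample_linear(samples, ratio):
    """Consume input at `ratio` input samples per output sample."""
    out = []
    pos = 0.0
    while int(pos) + 1 < len(samples):
        i = int(pos)           # integer part selects the sample pair
        frac = pos - i         # fractional part is the sampling point
        out.append(lerp(samples[i], samples[i + 1], frac))
        pos += ratio
    return out
```

With a ratio above 1 the output has fewer samples than the input, which is the "reduces samples by a certain ratio" behaviour being discussed.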

Having to do an ATN back and forth for each sample individually doesn't seem very nice. As I said, you'd really want a big ring buffer.

rogloh · 2024-11-01 10:51

@Wuerfel_21 said:
?????? it just reduces samples by a certain ratio.

Ok if it only reduces by a constant ratio but we blast samples for every one read then this idea might not have legs, as we'll be sending too much data.

Do you not get it? I had a version with linear interpolation somewhere, that was maybe easier to understand conceptually... Linear interpolation is essentially drawing a straight line between samples and then sampling that line. Cubic interpolation is drawing a spline instead, with the next two adjacent samples as control points. The sampling point is the fractional sample position.

I know about cubic vs linear interpolation. I'm just thinking about the sample timing/rate control here.

Having to do an ATN back and forth for each sample individually doesn't seem very nice. As I said, you'd really want a big ring buffer.

Yeah a ring buffer is nice however that's harder to manage and I might not have the free space for doing that right now. I'm trying to come up with something that still works but could fit in the COG footprint with small changes. I'm considering something not for emulators that emit their samples at some fixed rate, but for apps that want to deliver bursts of audio samples - basically a pull model where the video driver outputs samples at some fixed rate like 44.1 or 48kHz. But I guess it has to be tied back to the line frequency somehow (instead of to the source COG) which is the rate that drives the output process. We'd need to work out how many samples should have been sent by now and pull either 1,2 audio samples per line from the FIFO buffer depending on the ratio between line rate and desired audio sample rate. The other idea I had was watermark levels that combined with ATNs indicate whether to resume or pause sample feeding, or you could just read the level.

Update: I guess if the resampling/interpolation part of the overlay code gets removed due to locking the input & output rates at some fixed value then there would be more room for some HUB ring buffer management at the output side. The input side taking in samples from HUB would have to remain simple though as there are not many spare COGRAM longs left to manage that, plus the instruction overhead at polling time needs to be kept low.

rogloh · 2024-11-01 15:08

Been thinking more about this ring buffer idea. There should be enough space for it. I can patch over all the resampling code and re-use the poller area plus the history buffer for state.

No audio polling has to happen while the video is rendered, all sample extraction from the fifo/ring buffer can happen during packet encoding. This means a ret instruction can be patched over the audio_poll_repo long, and the 5 longs following it are freed.

audio_poll_repo             cmp     audio_inbuf_level,#4 wc
.pptchd     if_c            testp   #0-0 wc ' TESTP is a D-only instr
            if_c            altd    audio_inbuf_level,.inbuf_incr
.pptchs     if_c            rdpin   0-0,#0-0
.retins                     ret     wcz
.inbuf_incr long audio_inbuffer + (1<<9)

Also all these longs are freed as well in COGRAM.

audio_hist                  long    0[4]
audio_inbuffer              long    0[4]
audio_inbuf_level           long    0
resample_ratio_frac         long    0
resample_ratio_int          long    0
resample_antidrift_period   long    0
resample_phase_frac         long    0
resample_phase_int          long    0
resample_antidrift_phase    long    0

I believe all I really need to maintain are fifo start and end pointers, the current read position (tail pointer) and a phase accumulator which I use along with two constants based on N/CTS values.

Each 800 pixel scan line adds a constant (800 * N) to the phase accumulator, and this constant can be precomputed at startup. On each scan line when I build an audio packet I check if the accumulator value is positive, if so I can read another audio sample from the fifo. Then I subtract another precomputed constant (CTS*128) from the accumulator when I read a sample. If the result is still positive I can read another audio sample from the fifo to encode into the same audio packet and subtract again. Repeat this until the accumulator goes negative. No more than 2 samples per scan line would need to be sent if the line rate is 31.5kHz and when the output sample rate is 48kHz. Update the fifo tail pointer position back to HUB RAM per sample read so clients can tell where it's up to, and also wrap the pointer on the fifo end address back to the start address when needed.
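The accumulator scheme above can be checked with a quick Python model. This is a sketch under assumptions: N=6144 and CTS=25200 are the commonly published values for 48kHz audio at a 25.2MHz pixel clock (the post does not name the values in use), the accumulator starts at zero, and "positive" is taken to mean "not negative", matching a TJS-style sign test.

```python
def samples_per_line(n, cts, pixels_per_line, lines):
    """Return how many audio samples get pulled on each scan line."""
    add_per_line = pixels_per_line * n   # precomputed constant (800 * N)
    sub_per_sample = 128 * cts           # precomputed constant (CTS * 128)
    acc = 0
    counts = []
    for _ in range(lines):
        acc += add_per_line
        taken = 0
        while acc >= 0:                  # sign bit clear: a sample is due
            acc -= sub_per_sample
            taken += 1
        counts.append(taken)
    return counts


if __name__ == "__main__":
    counts = samples_per_line(6144, 25200, 800, 31500)  # one second of lines
    print(max(counts), min(counts), sum(counts))
```

With these numbers each line pulls either one or two samples, and one second of 31.5kHz lines pulls 48000 samples give or take one for the startup phase.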

The audio client feeding data can then monitor the fifo tail pointer being updated by the video driver, maintaining its own head pointer and attempting to keep ahead of the tail pointer and not let wrap around occur. No COGATNs would be needed with such a scheme. Only about 6 longs of actual state are needed, and the algorithm is dead simple vs the resampling interpolation code. I expect it should be readily doable in fact so I'm going to attempt something like this as a selectable option instead of the pin REPO mode at startup.
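A client following this scheme might look roughly like the Python model below. It is a sketch only: the class and method names are hypothetical, positions are slot indices rather than HUB addresses, and one slot is kept empty so that the head can never catch up with the tail.

```python
class RingWriter:
    """Client-side view of the ring: owns the head, only observes the tail."""

    def __init__(self, size):
        self.size = size   # number of sample slots in the ring
        self.head = 0      # next slot this client will write

    def free_slots(self, tail):
        """Slots that can be written without wrapping onto unread data."""
        return (tail - self.head - 1) % self.size

    def write(self, ring, tail, samples):
        """Write as many samples as currently fit; return how many were written."""
        count = min(len(samples), self.free_slots(tail))
        for sample in samples[:count]:
            ring[self.head] = sample
            self.head = (self.head + 1) % self.size
        return count
```

The client would re-read the tail position published by the video driver before each burst and retry later with whatever did not fit.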

Wuerfel_21 · 2024-11-01 15:23

Ideally you'd also be able to write to DAC pins when you're managing the ring buffer. In that case you'd need a poll function in reverse. But it's still simpler.

rogloh · 2024-11-02 00:15

@Wuerfel_21 said:
Ideally you'd also be able to write to DAC pins when you're managing the ring buffer. In that case you'd need a poll function in reverse. But it's still simpler.

Yes I see that doing simultaneous output on analog DAC pins is an issue, the video driver certainly can't write to them synchronously so the client source COG would need to do that by somehow resynchronizing itself based on the fifo capacity as it slightly starts to get out of sync. Tracking ring buffer size changes using head/tail pointers might let it increase or decrease the P2 clocks between samples to accommodate that. Basically the client COG/writer task treats it as an elastic buffer and compensates its own inter-sample timing very slightly if its capacity starts to increase or decrease over the long term. I could also put in an option to issue a COGATN for each sample read by the video driver (or each time it wraps?), but if done per sample these ATN pulse timings won't be distributed equally, they'll be very bursty, one or two per scan line for example. They still could be useful in some cases for counting for the purpose of remaining in sync or to restart some feed process. I do think the ring buffer/fifo solution is more useful for applications that want to feed audio from WAV files etc directly to a HDMI output port or when they can generate samples much faster than the output rate, and are happy to let the video driver read out samples at its own pace.

One downside is that latency increases with the ring buffer size which may be an issue in some cases, like for streaming video and lip sync etc. A small buffer is probably okay though. Probably something under 4kB or so, that is only 21msec of 48kHz stereo audio. A small ring buffer just means the writer has to stay busy and keep up and will need to be operated far more real-time in nature. For a WAV player or some application like that the buffer could be configured larger to ease the burden on a slower filesystem. In some ways it'd be nice to be able to dynamically change the actual model in use, the sample rate and buffer size etc on the fly but that gets far more complex to deal with for now. Even just changing the HDMI sample rate is tricky enough in a limited COG if you want the HDMI sync to remain while you do this.
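The 21msec figure follows from simple arithmetic, shown here as a small Python helper. The function name is hypothetical, and it assumes 4 bytes per stereo frame (two 16-bit samples packed in one long, as in the fifo code in this thread).

```python
def ring_latency_ms(buffer_bytes, sample_rate_hz, bytes_per_frame=4):
    """Worst-case delay added by a completely full ring buffer, in milliseconds."""
    frames = buffer_bytes // bytes_per_frame
    return 1000.0 * frames / sample_rate_hz


if __name__ == "__main__":
    print(round(ring_latency_ms(4096, 48000), 1))  # 1024 frames at 48kHz
```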

Wuerfel_21 · 2024-11-02 00:22

@rogloh said:
the video driver certainly can't write to them synchronously

That's exactly what I was suggesting. It'd be the exact opposite of how the repo mailbox thing works - every time you read a sample from the RAM ring, you shift it into a small buffer. Then in your poll function you check if the DACs are ready and pull one out.

rogloh · 2024-11-02 00:36

Oh okay, now I see, but won't there be some jitter if you write to DACs at these semi random times on the scan line. Or are they synchronously updated? I've not actually messed with P2 DACs before so am not sure what they are capable of.

Wuerfel_21 · 2024-11-02 00:38

The smart DACs are mailbox-based. IN raises when they're ready to receive a new sample (as per the configured period).

rogloh · 2024-11-02 00:46

Ok so we check that bit and output the next sample. There is a risk that the P2 clocks per sample is not a discrete integer when samples are delivered with N/CTS counting so there would be some underflow/overflow cases to deal with. The output side running in the overlay with more time/instructions would need to try to increase samples via duplication or decrease via occasional drops to accommodate this.

Wuerfel_21 · 2024-11-02 00:48

@rogloh said:
Ok so we check that bit and output the next sample. There is a risk that the P2 clocks per sample is not a discrete integer when samples are delivered with N/CTS counting so there would be some underflow/overflow cases to deal with. The output side running in the overlay with more time/instructions would need to try to increase samples via duplication or decrease via occasional drops to accommodate this.

You can always calculate an N/CTS for an integer number of clocks-per-sample (in fact, that's what my code does), since it's just a rational factor to the TMDS clock
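As a rough illustration of that point, the ratio can be derived directly from the sample period in P2 clocks. This Python sketch assumes the standard HDMI relation 128 * fs = f_TMDS * N / CTS and a P2 clock of 10x the TMDS (pixel) clock, as implied by the 8000 clock budget for an 800 pixel line earlier in the thread. The function names are hypothetical, and it does not check the result against the N limits in the HDMI spec.

```python
from fractions import Fraction


def n_cts_ratio(sysclks_per_sample, sysclks_per_tmds_clock=10):
    """Exact N/CTS for an audio sample period given in P2 system clocks."""
    # fs = f_TMDS * sysclks_per_tmds_clock / sysclks_per_sample, so
    # N / CTS = 128 * fs / f_TMDS = 128 * sysclks_per_tmds_clock / sysclks_per_sample
    return Fraction(128 * sysclks_per_tmds_clock, sysclks_per_sample)


def pick_n_cts(ratio, scale):
    """Scale the reduced ratio up to a concrete (N, CTS) pair."""
    return ratio.numerator * scale, ratio.denominator * scale


if __name__ == "__main__":
    ratio = n_cts_ratio(5250)        # 252MHz P2 clock / 5250 = 48kHz
    print(ratio, pick_n_cts(ratio, 48))
```

For 5250 clocks per sample the reduced ratio is 128/525, and scaling by 48 lands on N=6144, CTS=25200. Any other integer period gives an equally exact ratio, just not one of the commonly published ones.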

rogloh · 2024-11-02 00:54

@Wuerfel_21 said:

@rogloh said:
Ok so we check that bit and output the next sample. There is a risk that the P2 clocks per sample is not a discrete integer when samples are delivered with N/CTS counting so there would be some underflow/overflow cases to deal with. The output side running in the overlay with more time/instructions would need to try to increase samples via duplication or decrease via occasional drops to accommodate this.

You can always calculate an N/CTS for an integer number of clocks-per-sample (in fact, that's what my code does), since it's just a rational factor to the TMDS clock

Yeah true. I guess you don't have to use the published recommended values. Hopefully all sinks can deal with arbitrary values for these parameters and not the expected values.

Wuerfel_21 · 2024-11-02 00:56

@rogloh said:
Yeah true. I guess you don't have to use the published recommended values. Hopefully all sinks can deal with arbitrary values for these parameters and not the expected values.

Going to the absolute limits of valid N/CTS is part of the conformance test for sinks, so they really should. And so far it appears to be the case.

rogloh · 2024-11-02 00:59

@Wuerfel_21 said:

@rogloh said:
Yeah true. I guess you don't have to use the published recommended values. Hopefully all sinks can deal with arbitrary values for these parameters and not the expected values.

Going to the absolute limits of valid N/CTS is part of the conformance test for sinks, so they really should. And so far it appears to be the case.

Ok that's good then. It'd avoid nastiness with messing with fractional clock adjustments etc. Hopefully this DAC update code would be able to fit into the same footprint as the REPO poller. Want to write one?

Wuerfel_21 · 2024-11-02 01:03

I don't think so, it needs to be a little bit longer, since you need to write the second pin.

rogloh · 2024-11-02 01:13

Some extra instructions are probably okay if padding space can be made available. Right now I think I have about 4-5 free longs since I did some more rearranging.
I guess it's 3 extra, something like this, after your RDPIN is replaced by the MOV, and your ALTD is replaced by ALTS
ALTS audio_inbuf_level, .inbuf_decr
MOV sample, 0-0
GETWORD temp,sample, #1
WYPIN temp, #leftpin
WYPIN sample, #rightpin

rogloh · 2024-11-02 01:28

BTW one other minor thing I realized was that it makes sense to write back to the FIFO a zero over the audio sample just read in by the video driver. This lets the sample ring buffer "age out" to a muted output condition in case the source ever stops feeding it data (hopefully with a nice volume ramp down at the end). When a source restarts again it would just resync itself to start somewhere sufficiently ahead of the current tail pointer. This seems like a sane way to go, and prevents those nasty repeat loop audio effects when things hang.
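A tiny Python model of the driver side shows the age-out effect. It is only a sketch with hypothetical names, using a list of slots in place of HUB RAM and an index in place of the fifo tail address.

```python
def driver_read(ring, tail):
    """Read one sample at the tail, clear the slot, and return (sample, new_tail)."""
    sample = ring[tail]
    ring[tail] = 0                    # write a zero back over the consumed sample
    new_tail = (tail + 1) % len(ring)
    return sample, new_tail


def drain(ring, tail, count):
    """Pull `count` samples the way the driver would on successive packets."""
    out = []
    for _ in range(count):
        sample, tail = driver_read(ring, tail)
        out.append(sample)
    return out, tail
```

If the writer stops, the second pass around the ring returns only zeros instead of replaying stale audio.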

rogloh · 2024-11-02 04:38

I just hacked up the ring fifo and it fits the LUT overlay footprint with room to spare. Here are the two versions side by side in a pretty small font to make it fit. I also added volume mixing to the fifo version on the right (I also have it for the resampler but it's not shown here for brevity and to fit the entire thing on screen).

The spare room should be useful for copying the small number of generated samples into a small temporary fifo buffer for a future analog DAC process that would be done instead of audio polling assuming I can fit the needed state in the free COGRAM area - audio COGRAM registers will need to be aliased/reclaimed for that when patched for fifo mode operation vs repo mode. This is basically a push(repo) vs pull(fifo) model choice. The resampling function is lost when you run in the pull model and it'll just be operating at a constant 32/44.1/48kHz etc and the source audio will be pulled from the ring fifo at that rate.

rogloh · 2024-11-02 10:09

So I found I can probably reclaim the needed space for DAC output in FIFO mode by placing the 4 long audio_hist array directly after the polling code as this array is only needed when resampling in REPO mode and can be repurposed otherwise.

So this REPO mode polling code:

audio_poll_repo             cmp     audio_inbuf_level,#4 wc
.pptchd     if_c            testp   #0-0 wc ' TESTP is a D-only instr
            if_c            altd    audio_inbuf_level,.inbuf_incr
.pptchs     if_c            wypin   0-0,#0-0 ' group
.retins                     ret     wcz
.inbuf_incr long audio_inbuffer + (1<<9)
audio_hist  long    0[4] ' reclaimed in fifo mode

would be patched to this in FIFO + simultaneous analog DAC output mode and the DAC pins would be initially setup to run in Pseudo Random Dither mode using the correct sample period when the COG is initializing.

audio_poll_repo             and     audio_inbuf_level, #$1ff wz
.pptchd     if_nz           testp   #0-0 wc 'test a single DAC pin for readiness
            if_nz_and_c     alts    audio_inbuf_level,.inbuf_decr
            if_nz_and_c     mov     spright, 0-0
            if_nz_and_c     getword spleft, spright, #1
.pptchs1    if_nz_and_c     wypin   spright, #0-0 ' patch right DAC pin   
.pptchs2    if_nz_and_c     wypin   spleft, #0-0 ' patch left DAC pin
.retins                     ret     wcz
.inbuf_decr long audio_inbuffer + ($1ff<<9) ' subtract one

The extra code needed for patching the audio encoding overlay to run in FIFO mode is also possible to fit as well, just by jumping to where the constructed audio packets are to be buffered once encoded before writing to HUB (a 32 long area at the end of LUT RAM). This new code becomes the new entry point instead of calling build_audio directly and it decides whether to load in a secondary (short) overlay on top of the existing REPO mode code when running in FIFO mode. Due to not running the resampler, there are plenty of extra cycles to load in these extra 14 longs on the fly. Ideally this overlay code held in HUB RAM could be patched first in HUB according to the operating mode but that prevents the driver from spawning more than once, so patching dynamically at run time is easier for this short amount of code as we have the cycles to do it. There are still free longs left here before LUT RAM ends and they might be useful to enable future features like dynamic changes to sample rate for example or re-encoding infoframe data etc, by reading in and executing other code overlays during video blanking periods. It'd be handy to keep that possibility for later...

            fit $3e0 ' ensure we keep 32 longs spare at end of LUTRAM overlay for the constructed audio packet buffer
packet_audio  
            tjs MIN31, #build_audio ' do repo mode case

            loc pb, #\fifoencode  ' locate fifo based audio encoder
            setq2 #fifocode_end-fifocode_start ' get length of LUTRAM code in longs-1
            rdlong build_audio:sploop & $1ff, pb ' block read into the LUT RAM to replace REPO mode code
            add phaseacc, npixval   ' add to phase accumulator based on number of pixels/line and N value
end_audio   jmp #build_audio    ' process audio

            fit $400

            orgh

fifoencode
              org build_audio:sploop

fifocode_start
              tjs phaseacc, #build_audio:validsample
              sub phaseacc, ctsval ' 128xcts

              rdlong spright, fifoptr ' read next sample
              wrlong #0, fifoptr ' clear out data

              cmp fifoptr, fifolimit wz ' increment and wrap
        if_nz add fifoptr, #4
        if_z  mov fifoptr, fifostart
              wrlong fifoptr, tailptraddr ' write back current fifo tail position

              getword spleft, spright, #1 ' get left sample
              muls spleft, lvol ' attenuate left sample
              muls spright, rvol ' attenuate right sample

              shr spleft,#8  ' align samples
              shr spright,#8
fifocode_end  jmp #build_audio:validsample

              fit build_audio:validsample

Not shown here yet but I probably also need to DC offset the DAC samples before filling the audio_inbuffer array - convert from signed to unsigned by adding $8000 to each value. Thankfully it could be done on the fill side not in the poller code.

Update: that DC offset step is something like this when filling the small internal buffer for the DACs to use.

              decod temp3, #31 ' create #$80000000 value
              mov temp2, temp3
              add temp2, spleft ' add DC offset to left sample
              add temp3, spright ' add DC offset to right sample
              shr temp3, #16
              setword temp2, temp3, #0 ' create a 2x16 bit stereo sample
              add audio_inbuf_level, #1 ' increase inbuffer size
              and audio_inbuf_level, #3  ' ensure we remain in range
              altd audio_inbuffer, audio_inbuf_level
              mov 0-0,temp2

Update 2: Not sure that the internal audio_inbuffer sample order handling is correct. Needs more work to get that correct so it's a fifo not a stack.

Wuerfel_21 · 2024-11-02 12:55

Yeah, was about to say, one side needs to shift the buffer. Also you technically need to compensate for the DAC clipping at $FF00, but that's optional-ish.

Also note that adding $80000000 is the same as XORing with it (which is the same as bitnot #31)

Also, you want PWM dither mode, the random mode is bad.

rogloh · 2024-11-02 13:47

I read that PWM dither mode says the sample rate needs to be a multiple of 256 P2 clocks, which is too restrictive.
Good point about the XOR, that can save work if the samples are packed together, I can simply XOR with $80008000. Didn't realize about the clipping at $FF00. Lots of real world issues to deal with I guess.
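The XOR shortcut is easy to sanity check on the host. This Python sketch uses hypothetical helper names and assumes the left sample sits in the upper word of the packed long, as the GETWORD ..., #1 in the fifo code suggests.

```python
def to_unsigned16(sample):
    """Signed 16-bit sample to offset binary by adding the $8000 DC offset."""
    return (sample + 0x8000) & 0xFFFF


def pack_stereo(left, right):
    """Pack two signed 16-bit samples into one long, left in the upper word."""
    return ((left & 0xFFFF) << 16) | (right & 0xFFFF)


def add_dc_offset_packed(packed):
    """Apply the DC offset to both halves at once with a single XOR."""
    return packed ^ 0x80008000


if __name__ == "__main__":
    # Adding $8000 modulo 2^16 only ever flips bit 15, so it matches the XOR.
    ok = all(to_unsigned16(s) == ((s & 0xFFFF) ^ 0x8000) for s in range(-32768, 32768))
    print(ok)
```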

Wuerfel_21 · 2024-11-02 14:25

@rogloh said:
I read that PWM dither mode says the sample rate needs to be a multiple of 256 P2 clocks, which is too restrictive.

It works best with a multiple of 256, but I've found that it's always better than noise mode. Documentation drank too much.

Good point about the XOR, that can save work if the samples are packed together, I can simply XOR with $80008000. Didn't realize about the clipping at $FF00. Lots of real world issues to deal with I guess.

I think at least one person once stubbornly rejected the idea of dealing with this and just wrote full range 16 bit samples...
One way of dealing with it (on the assumption of a 16 bit signed sample)

              mov out,##$7F80
              scas in,##$3FC0
              add out,0-0
[...]

The downside here is that the quantization becomes a bit uneven (since some input values get mapped to the same output value). Ideally a higher-resolution input would be used, but there's no instructions for that. The alternative is to clamp to +/- $7F80 (so clipping becomes even on both sides) and then add the $7F80 bias. IME pre-recorded audio tends to be as hot as possible, so you'd rather do the SCAS thing, whereas any audio you generate at runtime is likely far quieter (and will likely run into clipping issues at some point anyways), so the clamping is better.
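The two options can be compared numerically with a short Python model. This is a sketch under the assumption that SCAS yields the signed 16x16 product shifted right by 14 bits, so $3FC0 scales by 255/256. The function names are hypothetical.

```python
def scas_scale_to_dac(sample):
    """Scale a signed 16-bit sample by $3FC0/$4000, then add the $7F80 bias."""
    scaled = (sample * 0x3FC0) >> 14   # models the SCAS result (arithmetic shift)
    return 0x7F80 + scaled


def clamp_to_dac(sample):
    """Clamp to +/- $7F80 so clipping is even on both sides, then add the bias."""
    limit = 0x7F80
    clamped = max(-limit, min(limit, sample))
    return clamped + limit


if __name__ == "__main__":
    for s in (-32768, 0, 1, 32767):
        print(s, hex(scas_scale_to_dac(s)), hex(clamp_to_dac(s)))
```

Scaling keeps full-scale input below the $FF00 ceiling at the cost of occasional duplicate output codes (inputs 0 and 1 both land on $7F80), while clamping leaves quieter audio untouched and only alters the peaks.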

rogloh · 2024-11-02 14:29

Ok I think I might have sorted it - apart from this $ff00 DAC clipping thing you mentioned. Here are some snippets. It actually simplified the code a bit and thankfully I still had one spare long left in the reclaimed audio_hist array after patching which lets me store a constant for the ALTI 4 register looping constant in the DAC poller code. In theory this should work but the reader and writer must always remain processing samples at the same rate so both level counters (writer pos and reader side audio_inbuf_level) don't somehow slide past each other. Also I may need to ensure that the audio buffer is 4 long aligned in COGRAM.

' POLLER to write DACs when in FIFO mode

audio_poll_repo             cmpsub  audio_inbuf_level, #1 wc
.pptchd     if_c            testp   #0-0 wz 'test a single DAC pin for readiness
            if_z_and_c      alti    .inbuf_incr, .inbuf_loop
            if_z_and_c      mov     spright, 0-0
            if_z_and_c      getword spleft, spright, #1
.pptchs1    if_z_and_c      wypin   spright, #0-0 ' patch right DAC pin   
.pptchs2    if_z_and_c      wypin   spleft, #0-0 ' patch left DAC pin
.retins                     ret     wcz
.inbuf_incr     long audio_inbuffer 
.inbuf_loop     long %111_000_000_111 '4 register loop


audio_inbuffer              long    0[4]
audio_inbuf_level           long    0

' COGRAM memory location aliases for FIFO mode vs REPO mode

fifostart 'alias
resample_phase_int          long    0
fifolimit 'alias
resample_phase_frac         long    0
wrbuf_pos ' alias
resample_ratio_frac         long    0
fifoptr 'alias
resample_antidrift_phase    long    0
phaseacc 'alias
resample_antidrift_period   long    0

'  >> FIFO mode overlay in LUT RAM

fifoencode
              org build_audio:sploop

fifocode_start
              tjs phaseacc, #build_audio:validsample
              sub phaseacc, ctsval ' 128xcts

              rdlong spright, fifoptr ' read next sample
              wrlong #0, fifoptr ' clear out data

              cmp fifoptr, fifolimit wz ' increment and wrap
        if_nz add fifoptr, #4
        if_z  mov fifoptr, fifostart
              wrlong fifoptr, tailptraddr ' write back current fifo tail position

              getword spleft, spright, #1 ' get left sample
              muls spleft, lvol ' attenuate left sample
              muls spright, rvol ' attenuate right sample

              rolword temp2, spleft, #1
              rolword temp2, spright, #1
              xor temp2, ##$80008000 ' add DC offsets
              add audio_inbuf_level, #1 
              incmod wrbuf_pos, #3 
              altd audio_inbuffer, wrbuf_pos
              mov 0-0,temp2 

              shr spleft,#8  ' align samples
              shr spright,#8
fifocode_end  jmp #build_audio:validsample

              fit build_audio:validsample
