@Wuerfel_21 said:
???? No, that seems too complicated. You want the sliding buffer thing that I used. You just use 3 MOVs to shift up the buffer before adding a new sample. Much simpler and less likely to get itself into weird states.
Yeah I know, any DAC stuff needs more to make it work. I'm not yet sure it is even going to be possible to make it work correctly, given the semi-randomized times at which both the poller and the encoder run on the scan lines, and it would need a careful sync-up at the start. The only other DAC related thing I was considering was a mono-mix option for L&R channels to be combined if there is only a single analog DAC pin configured. But analog DAC output is not really going to be the main focus right now. I need to test all the other driver init changes I've made that have been accumulating in the code, which seem to be quite a lot when I diff against an older working version. It's quite likely that something has been broken by now after all those changes.
I may add an option of a COGATN notification at FIFO wraparound and when crossing halfway through the buffer. This might be of use to COGs that want to just ping-pong buffer their samples in the output FIFO, although they can still poll the current tail pointer address at any time and use that also.
As things stand, here are the features of this experimental video driver work. Not everything audio related is fully tested yet, but confidence is very high that it will work:
uses existing data structures of existing P2 video driver with all existing features still included
all P2 colour modes for graphics, HUB RAM or external RAM sourced frame buffers, optional mouse sprite, text renderer & dual cursors, pixel/line doubling, multiple regions
DVI or HDMI compatible output formats
optional simultaneous VGA DAC output - only restrictions are the VGA pins must be contiguous around 4 pin boundaries and not overlap HDMI output pins and no SyncOnGreen is possible (though both 4 and 5 pin VGA is supported)
HDMI compatible output mode can have
no audio - just HDMI format video framing only, with NULL packets generated instead of audio & clock regen packets
REPO mode audio - basically identical to Ada’s NeoVGA implementation - stereo 16 bit samples arrive on a pin in REPO mode and can be optionally resampled down to 48 kHz, or already arrive at 32/44.1/48kHz, REPO pin can overlap with HDMI (in which case the BITDAC is not used for differential "CML style" signal output levels, which hopefully does not cause a problem)
FIFO mode audio - audio samples arrive using a circular buffer FIFO and are delivered at the configured sample rate of 32/44.1/48kHz over the HDMI pins
optional COGATN notification upon FIFO wrap and 1/2 size crossings, for ping-pong buffering
I’m still hopeful to add analog DAC output if possible (TBD) in this audio mode on either one or two pins, that way a combined HDMI+VGA/AV accessory setup or DVI-I + optional L/R audio jacks could be used for all applications
AVI infoFrame source data is read from HUB RAM, passed at startup time for encoding to TERC packet and sent out during last vsync line
a volume control parameter (containing independent words for left/right channels) is updated after vertical sync which can be used as a master volume for muting or ramping up/down audio to prevent clicks/pops etc while audio samples start or stop
includes optional COGATN notification of either the last vsync line or all scan lines at hsync time for COG application use (ideally this feature could also be put into the existing P2 video driver if possible).
Other/future ideas:
It’d be neat to support audio sample rate changes, at least in FIFO mode, in case an application wanted to dynamically choose between 44.1kHz or 48kHz output for example without restarting the driver. Not sure if that’s going to be doable yet. Changing the video infoFrame data contents between YUV and RGB formats on the fly without a restart could also be useful in a video player application. That requires a packet re-encoding step which could happen by the caller but nicer if the driver can do it as the TERC encoding already exists there anyway.
It could be handy to have the ability to send other HDMI packet formats out on the vsync line, like general control packets or other infoFrame types or just for experimental use. There is some time during blanking for doing other things while still encoding audio samples for the next line. Approximately two thirds of the active time in these scan lines is free and we could potentially read in some special overlay to LUTRAM to TERC encode two additional packets in this time for example. Once the audio has been encoded during blanking we’d have the 512 longs of LUTRAM fully available for overlay use until the active video portion starts because the 256 entry palette is not needed at this time (apart from during the first vertical back porch line while the last visible line is still being displayed which might still be using it).
@Wuerfel_21 said:
IME pre-recorded audio tends to be as hot as possible, so you'd rather do the SCAS thing, whereas any audio you generate at runtime is likely far quieter (and will likely run into clipping issues at some point anyways), so the clamping is better.
Only just saw this as our posts crossed paths last night and then the page break occurred but maybe given my volume control step I could always limit the sample range by clamping the volume multiplier to something under ff00, or at least just for the DAC based samples.
@Wuerfel_21 said:
???? No, that seems too complicated. You want the sliding buffer thing that I used. You just use 3 MOVs to shift up the buffer before adding a new sample. Much simpler and less likely to get itself into weird states.
Ok, after taking that into consideration here's the updated code which might work now. It just fits inside the footprint after reclaiming some longs and I even had an instruction free to exit early if the poller is not ready which will be much of the time.
The REPO code: I keep the audio_hist and audio_inbuffer arrays directly following it. The audio_hist array and .inbuf_incr can get reclaimed in FIFO mode while audio_inbuffer needs to remain in place. I had planned a COGATN here too in an earlier scheme but that's not really needed now.
audio_poll_repo             cmp     audio_inbuf_level,#4 wc
.pptchd         if_c        testp   #0-0 wc                         ' TESTP is a D-only instr
                if_c        altd    audio_inbuf_level,.inbuf_incr
.pptchs         if_c        rdpin   0-0,#0-0
.pptatn         if_c        cogatn  #0-0                            ' probably redundant now
.retins                     ret     wcz
.inbuf_incr                 long    audio_inbuffer + (1<<9)
audio_hist                  long    0[4]                            ' keep after above code
audio_inbuffer              long    0[4]                            ' keep together with above
The FIFO code once patched over the top of REPO code:
audio_poll_repo             testp   #0-0 wz                         'test a single DAC pin for readiness
                if_z        cmpsub  audio_inbuf_level, #1 wc        ' don't decrement unless DAC was actually ready
                if_nc_or_nz ret     wcz                             ' exit early
                            mov     spright, audio_inbuffer
                            mov     audio_inbuffer, audio_inbuffer+1
                            mov     audio_inbuffer+1, audio_inbuffer+2
                            mov     audio_inbuffer+2, audio_inbuffer+3
                            getword spleft, spright, #1
.pptchs1                    wypin   spright, #0-0                   ' patch right DAC pin
.pptchs2                    wypin   spleft, #0-0                    ' patch left DAC pin
                            ret     wcz
audio_inbuffer              long    0[4]                            ' keep together with above
The writer side will check the audio_inbuf_level and write at that offset in the array and increment this value. If the value of the audio_inbuf_level ever reaches 4 it will drop the sample (DAC was too slow/late to take it).
Something like this (omitting the DAC level stuff which I still have to figure out)
                            muls    spleft, lvol                    ' attenuate left sample
                            muls    spright, rvol                   ' attenuate right sample
                            rolword temp2, spleft, #1
                            rolword temp2, spright, #1
                            xor     temp2, ##$80008000              ' add DC offsets
                            cmp     audio_inbuf_level, #4 wc        ' check for room left in buffer
                if_c        altd    audio_inbuffer, audio_inbuf_level ' write at next location
                if_c        mov     0-0, temp2
                if_c        add     audio_inbuf_level, #1
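To make the push/pull behaviour easier to reason about, here is a rough Python model of the scheme described above. It is only a sketch, not the driver code: the names `SlidingBuffer`, `push`, `pull` and `pack_stereo` are made up for illustration. It assumes the poller always takes slot 0 and shifts the remaining slots down, and that the writer drops a sample once the level reaches 4.

```python
def pack_stereo(left, right):
    """Pack two signed 16-bit samples as left:right words, then XOR in the
    $80008000 DC offset, mirroring the rolword/xor steps of the writer."""
    packed = ((left & 0xFFFF) << 16) | (right & 0xFFFF)
    return packed ^ 0x80008000


class SlidingBuffer:
    """Hypothetical model of the small COG-side sample buffer."""

    def __init__(self, slots=4):
        self.slots = [0] * slots
        self.level = 0  # number of valid samples currently held

    def push(self, sample):
        """Writer side: store at offset `level`; drop the sample if full."""
        if self.level >= len(self.slots):
            return False  # DAC was too slow/late, sample is dropped
        self.slots[self.level] = sample
        self.level += 1
        return True

    def pull(self):
        """Poller side: take slot 0 and slide the rest down (the 3 MOVs)."""
        if self.level == 0:
            return None  # nothing to give the DAC
        sample = self.slots[0]
        self.slots = self.slots[1:] + [self.slots[-1]]
        self.level -= 1
        return sample
```

Pushing five samples into an empty buffer accepts four and drops the fifth; pulling then returns them oldest first, which is the ordering property the later posts worry about.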
@TonyB_ said:
Is exiting early essential time-wise? If not another instruction could be saved, from the look of it. (I haven't studied the code in any detail.)
@Wuerfel_21 said:
With things like this I'd prefer a constant runtime, actually, since that reduces the possibility of unfortunate interactions.
Yeah I can save that instruction - I had a spare slot from this redundant COGATN I'd added, so I used this but I did wonder at the time if it was just safer to burn the cycles to keep the execution time constant too - it adds more jitter to sometimes exit early and sometimes not. I should be able to poll a scan line about 5 times in the code at a minimum when heavily loaded doubling 24bpp pixels (approximately 1600-2000 cycles apart) so it could save a little bit of time but you shouldn't really bank on it. If the shifting was moved it probably could reduce this. There are plenty of cycles left with 32/44.1/48kHz audio output for doing this, though a 96kHz output rate might be pushing it - I know it was already a bit too close before. We will see, actually perhaps FIFO mode can help save some extra time during packet encoding due to not resampling like REPO mode does, in fact that'll probably buy us a lot actually.
@Wuerfel_21 said:
I'd have done the shifting on the input side...
Did look at that, seemed bit messier that way but probably also doable. Problem is you need to take either one or two samples per line before the audio buffer gets shifted down again for you by the writer and you don't really know which will occur ahead of time without doing timing calculations.
Also you can save an instruction on both sides:
I see the ALTR use does save a cycle, but the poller one as coded won't work. I think you probably meant this:
@Wuerfel_21 said:
No, it's really the same thing. When you push one you adjust the buffer level. I guess it's hard to describe.
Yeah if you wanted to hack up the idea in a code snippet it might be easier to explain that way. I found it tricky the first time to do it that way and just moved onto this simpler solution. I might have given up too easily by switching out the buffer rotation to the poller, but was probably missing something basic.
Hard to say that it's correct without testing. Also a bit tired right now. I think either side being able to drop a sample is needed to be able to always recover into a good state.. Though in practice the samples should be added at the same pace as they're pulled.
The problem I had was on lines with two samples emitted and occurs when the buffer level jumps up by two. You'd then read a fresher sample out before the older one at the pull side because it was indexing off decreasing audio_inbuf_level. But you solved that by reversing the buffer order. If there are no samples you don't update the DAC until there are, and if the DAC hasn't been keeping up (it should if coded right), then you simply drop a sample. I believe such an approach should recover. The main issue is probably in the initial samples, you sort of want to preload something before the scan line processing starts if no samples are present because once initialized the DACs will always want to take something potentially before you've even created the first sample.
It'll be interesting to hear the audio from the DACs, hopefully this will be ready to test soon. Time wise, it's going to be a very jittery execution process both when reading from the FIFO source, and also in polling for new sample data to write to the DACs. The internal buffer could get loaded as soon as the start of one scan line after hsync (when build audio runs early on blanking lines), and as late as the end of the next scan line (once processing doubled pixels). That's a lot of variance in time. Ideally it'd be pre-filled with zeroes at the start to prime it. Another idea is to always run the build_audio process at the start of the scan lines and then do pixels which will at least refill consistently in time, but I don't want to do that because I need to issue PSRAM requests as early as possible to give the data time to arrive and this is part of my graphics processing code.
Actually by moving the buffer rotation to the writer side, I can increase its size again (since the footprint in COGRAM is reduced) and that helps the internal FIFO deal with larger amounts of jitter. So there are more advantages in doing this.
4 slots really ought to be enough, maybe 5. The core problem is that both pushing/pulling happens synchronized to something else, so as a result, the buffer level stays constant (averaged over time). So if the FIFO process is late once (and the DAC doesn't get a new sample), the average buffer level increases. If the FIFO process is too early and the buffer is full, the average level decreases. To note here is that the DAC will simply repeat the previous sample if it doesn't get a new one in time. Also the IN flag will obviously only lower on WYPIN.
...
That earlier Idea I had about locking HSync == sample rate seems increasingly attractive. Not good for general purpose video or emulators though.
@Wuerfel_21 said:
4 slots really ought to be enough, maybe 5. The core problem is that both pushing/pulling happens synchronized to something else, so as a result, the buffer level stays constant (averaged over time). So if the FIFO process is late once (and the DAC doesn't get a new sample), the average buffer level increases. If the FIFO process is too early and the buffer is full, the average level decreases. To note here is that the DAC will simply repeat the previous sample if it doesn't get a new one in time. Also the IN flag will obviously only lower on WYPIN.
Yeah slips will change the notional steady state level. Am hopeful once synced up at the start that if you always poll faster than the actual DAC sample interval (which is every 5250 clocks for 48kHz on a 252MHz P2) and don't underflow the sample buffer it'd still keep up and we'd have no drops anywhere. Obviously the DAC sample rate and FIFO dequeue rate have to match perfectly down to the P2 clock cycle for that to be the case. Having a larger internal DAC buffer would allow for more randomized fill times by soaking up more variation.
We just need to ensure we don't get into some pathological condition where it slips continuously because the internal buffer level is too low and can't ever refill to a good level in time. I think it would still probably be able to recover from that because it would eventually advance an entire sample interval's worth of time every time that occurred giving the internal buffer time to build up again if the fill rate matches the drain rate. A glitch like that should hopefully just be transient in nature and then die out.
...
That earlier Idea I had about locking HSync == sample rate seems increasingly attractive. Not good for general purpose video or emulators though.
Yeah that'd be helpful but only if you had full control of the sample rate, in some dedicated application perhaps.
In other news, I've added the ability to send multiple pre-encoded infoFrames on the last Vsync line (currently limited to 4 extra packets after clk regen and audio, but it could still be increased further). Right now only AVI and audio infoFrames are actually useful but it could in theory be configured to send other packet types like Vendor Specific or Source Product Descriptor packets. I did this with the intent of allowing more dynamic data island packets to be sent in the future, beyond just infoFrames. When dynamically sent they'd still need to be TERC encoded first. Ideally I'll add something to do that work maybe in the last vertical front porch blanking line if there is free code space to do so.
When the driver is initialized it would be sent an extra pointer (configInfo) to a structure containing HDMI specific information, which is set up by the initHdmiConfig method. If passed a null instead it would just default to DVI output. Here's the basic idea, args might still be changing to include pins independently. Not sure what the actual arg limit is so I kept it packed for now. I convert sample rate internally to sample period for end user convenience.
PUB initDisplay(cog, display, output, basePin, syncPin, flags, lineBuf, maxLineSize, userTiming, mbox, firstRegion, configInfo) : id | syncFlags, pin, timing, newfreq, newmode, sampleRate
...
PUB initHdmiConfig(configPtr, sampleRate, audioPins, atnMask, pktBufArea, infoFrameDataAddr, count, audioFifoStart, audioFifoEnd, audioFifoPtr) : r
    long[configPtr][0] := sampleRate ' audio sample rate in Hz
    long[configPtr][1] := (atnMask<<24) | (audioPins & $ffffff) ' pins are {repo basepin, leftchnl, rightchnl}, 8 bits each, 255 if not used
    long[configPtr][2] := pktBufArea ' must have room for at least 7x 128 byte packets (896 bytes) when count=2, more if count is larger, and 64 byte aligned.
    long[configPtr][3] := count<<24 | infoFrameDataAddr & $fffff 'hub address where the (currently static) AVI infoframe data is located to be TERC4 encoded + count<<24
    long[configPtr][4] := audioFifoStart 'hub address where the audio fifo starts
    long[configPtr][5] := audioFifoEnd 'hub address where the audio fifo ends - last fifo location (long samples)
    long[configPtr][6] := audioFifoPtr 'hub address where the current fifo tail pointer resides
    return configPtr
DAT
orgh
' HDMI AVI infoframe data
infoFrames
byte $82,02,$0D ' 24 bit header data for AVI infoframe
byte $0,$01,$88,$88, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0 ' 28 bytes of data incl chksum space
byte $84,01,$0A ' 24 bit header data for audio infoframe
byte $0,$01,$00,$00, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0 ' 28 bytes of data incl chksum space
byte $0
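The "chksum space" in the data above is left as zero. As a hedged illustration of what would go there, the sketch below applies the usual InfoFrame rule that the header bytes, the checksum byte and the payload bytes sum to zero modulo 256. It assumes the first of the 28 data bytes is the checksum slot; whether the driver or the caller fills it in is not stated here, and the function name `infoframe_checksum` is hypothetical.

```python
def infoframe_checksum(header, payload):
    """Return the byte that makes header + checksum + payload sum to 0 mod 256.

    payload[0] is assumed to be the (currently zero) checksum slot and is
    excluded from the sum.
    """
    total = sum(header) + sum(payload[1:])
    return (-total) & 0xFF


# Byte values copied from the DAT block above
AVI_HEADER = [0x82, 0x02, 0x0D]
AVI_PAYLOAD = [0x00, 0x01, 0x88, 0x88] + [0] * 24
AUDIO_HEADER = [0x84, 0x01, 0x0A]
AUDIO_PAYLOAD = [0x00, 0x01, 0x00, 0x00] + [0] * 24

avi_sum = infoframe_checksum(AVI_HEADER, AVI_PAYLOAD)        # 0x5E
audio_sum = infoframe_checksum(AUDIO_HEADER, AUDIO_PAYLOAD)  # 0x70
```

If any payload byte is changed at runtime (for example switching between RGB and YUV), the checksum has to be recomputed before the packet is TERC encoded again.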
For repo/resample mode, you should really accept the sample rate as an integer period, it needs to be an exact match with the generator cog.
Also, when I was last messing with the upstairs Samsung 3DTV I did realize that it treats 640x480 a bit specially. If I do 854x480 I can force 3D on with the remote. So I should try the vendor packet to that extent again.
@Wuerfel_21 said:
For repo/resample mode, you should really accept the sample rate as an integer period, it needs to be an exact match with the generator cog.
Yeah it should match your numbers. I do this in the code. If sample rate == 0 then I disable all audio output (but keep HDMI framing).
if output == HDMI or output == HDMI_VGA
    sampleRate := long[configInfo][0]
    if sampleRate ' need to convert sampling rate into a sample period based on current P2 clock frequency
        long[configInfo][0]:= (clkfreq + (sampleRate>>1))/sampleRate ' round to nearest number of P2 clocks
    long[display][2] := configInfo
...
@Wuerfel_21 said:
Maybe make it so negative numbers bypass the conversion and just specify an actual period.
I can do that yeah.
if output == HDMI or output == HDMI_VGA
    sampleRate := long[configInfo][0]
    if sampleRate>0 ' need to convert sampling rate into a sample period based on current P2 clock frequency
        long[configInfo][0]:= (clkfreq + (sampleRate>>1))/sampleRate ' round to nearest number of P2 clocks
    else
        long[configInfo][0]:=-sampleRate ' use direct period value (zero value is unchanged)
    long[display][2] := configInfo
else
    long[display][2] := 0 'reserved
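For anyone wanting to check the numbers offline, the same conversion can be mirrored in a few lines of Python. This is only a model of the Spin2 above, with `sample_period` as an illustrative name: positive values are treated as a rate in Hz and rounded to the nearest number of P2 clocks, negative values are taken as a direct period, and zero stays zero (audio disabled).

```python
def sample_period(clkfreq, sample_rate):
    """Model of the rate-to-period conversion shown above."""
    if sample_rate > 0:
        # round to the nearest whole number of P2 clocks
        return (clkfreq + (sample_rate >> 1)) // sample_rate
    # negative: caller supplied the period directly; zero is left unchanged
    return -sample_rate


CLKFREQ = 252_000_000  # the 252MHz P2 clock used as the example earlier

periods = {rate: sample_period(CLKFREQ, rate) for rate in (32_000, 44_100, 48_000)}
```

At 252MHz the 32kHz and 48kHz rates divide exactly (7875 and 5250 clocks), while 44.1kHz does not and rounds to 5714 clocks, which is one reason a generator cog would want to pass the exact period instead.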
Another bonus of the FIFO based scheme is that in principle it allows other audio formats (high bit rate, DST, one bit audio) to be sent out within the audio packets defined for that type. A different LUTRAM overlay that is aware of the desired audio format can be read in and generate the samples. Because there won't be any resampling going on, it should have space to format the data to the packet if the data arriving through the FIFO is already of a suitable type. The other audio packet formats look somewhat simpler than packing in SPDIF channel status for L-PCM, so I'd expect it could fit the footprint available. Not sure how much a 640x480 signal will be able to do with high bit rate audio though, there's not really the bandwidth. Also to do much of this speciality audio beyond L-PCM requires more information which is locked away for content protection etc, and also likely needs more processing of actual EDID stuff from the sink.
FIFO mode audio seems to be working now which is good. I can hear the nice pure tone over HDMI (I dug up a portable LCD panel I can use that nicely has an audio output from HDMI). However the DAC output from the P2 has a really nasty buzz to it, with the tone present as well, when I feed that to the same amplified speakers instead of the HDMI LCD panel's audio output. I sort of wonder if I might be overdriving this little headphone amp on the VGA/AV board or it could be jitter perhaps (but jitter doesn't usually create a nasty buzz). I also tried with battery power due to the possible ground loops I might have here but it didn't help. The 75R DAC setting seems louder so I dropped down to 600 ohm output impedance hoping for a better result but it still buzzes hard.
wrpin ##P_DAC_DITHER_PWM|P_OE|P_DAC_600R_2V, b
Also I've scaled the sound generator down to use a smaller $FFF amplitude instead of $7FFF but that horrid buzz remains. I do XOR with $80008000 to DC offset it. So that doesn't seem to be it. Without the XOR it's a lot worse.
sample,_ := polxy($FFF,phase+=freq)
                            mov     temp2, spright                  ' just take pure L&R samples for now
                            xor     temp2, ##$80008000              ' add DC offsets
                            'debug("writing ", uhex(temp2), udec(audio_inbuf_level))
                            and     audio_inbuf_level, #$1ff        ' ensure we don't jump into higher values
                            cmp     audio_inbuf_level, #4 wc        ' check for room left in buffer
                if_c        mov     audio_inbuffer+3, audio_inbuffer+2 ' make room for new sample
                if_c        mov     audio_inbuffer+2, audio_inbuffer+1
                if_c        mov     audio_inbuffer+1, audio_inbuffer+0
                if_c        altd    audio_inbuf_level, #audio_inbuffer ' write into buffer
                if_c        mov     0-0, temp2
                if_c        add     audio_inbuf_level, #1           ' increase buffer level
                if_c        drvnot  #57
UPDATE: Fixed! I messed up above and still had left some older altd code in the writer side during testing, code should have been this:
Now the DAC output sounds a lot cleaner. Not perfect though, there's still another buzz of sorts. The tone is stronger now and this buzz is lessened, but a little more crackly now.
@Wuerfel_21 said:
I just saw the EDIT. Can you post both ends of the current code?
Yeah here it was. In other news before I finished up last night I did measure the push/pull operations with some drvnots and the scope and I never saw it trigger when set to trigger with transitions over 11us apart. So we are polling at least at 90kHz or so. I don't think it's dropping samples due to being late. I realized overnight I should scope the analog output too. I wonder if it's driving out a valid sample, zero, sample, zero, etc. Something that's chopping up the samples and causing a buzz. It was way worse before I fixed the altd mistake but there is still a buzz. I also realized there is no need to check the push side for the inbuf level. You can simply rotate in the new sample from the FIFO every time. Just only need to clip the inbuf_level to at most 4 samples (or perhaps it's 3).
patch_audio
                            org     audio_poll_repo
start_patch_audio           testp   #0-0 wc                         'test a single DAC pin for readiness (use second DAC pin from below)
                if_c        and     audio_inbuf_level, #$1ff wz     ' see if we can take a sample (examine the 9 bit value)
                if_c_and_nz alts    audio_inbuf_level, .inbuf_decr
                if_c_and_nz mov     spright, 0-0
                if_c_and_nz getword spleft, spright, #1
.pptchs1        if_c_and_nz wypin   spright, #0-0                   ' patch right DAC pin
.pptchs2        if_c_and_nz wypin   spleft, #0-0                    ' patch left DAC pin
                            ' debug("new dac poll", uhex(spright),udec(audio_inbuf_level))
                if_c_and_nz drvnot  #56
.retins                     ret     wcz
.inbuf_decr
end_patch_audio             long    audio_inbuffer + ($1ff<<9)      'adding $1ff to a value "decreases" it if treated as 9 bits only
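The comment on the last line relies on 9-bit wraparound arithmetic. A tiny Python check of that claim is below; `add9` and `pack_base_and_step` are illustrative names, and the base address used in the example is a made-up value, not the real COG address of audio_inbuffer.

```python
MASK9 = 0x1FF  # COG register addresses are 9 bits wide


def add9(value, delta):
    """Add two values and keep only the low 9 bits."""
    return (value + delta) & MASK9


def pack_base_and_step(base, step):
    """Build a constant like `audio_inbuffer + ($1ff<<9)`:
    base address in bits 8..0, 9-bit step in bits 17..9."""
    return (base & MASK9) | ((step & MASK9) << 9)


EXAMPLE_BASE = 0x1F0  # hypothetical register address, for illustration only
decr_constant = pack_base_and_step(EXAMPLE_BASE, -1)
```

So a level of 3 becomes 2 after adding $1ff, and a level of 0 would wrap to $1ff, which is why the poller masks and tests the level for zero before using it.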
@Wuerfel_21 said:
I just saw the EDIT. Can you post both ends of the current code?
also realized there is no need to check the push side for the inbuf level. You can simply rotate in the new sample from the FIFO every time. Just only need to clip the inbuf_level to at most 4 samples (or perhaps it's 3).
It's really the same, it just changes which sample you drop (the one you're pushing out or the one you would've pushed), so it's a matter of what's more convenient.
Hey do you know how to DEBUG output the flags?
You don't. (i.e. use WRZ/WRC to get them into a temp register). Yes, it's stupid. (Though flexspin's DEBUG code is easy to edit, so it should be possible to add a flag dump command)
Just scoped the analog pins playing back the samples. One of them doesn't seem good. Spikes every 22ms or so. At 48kHz that's around 22*48 samples or ~1024. I have a 1024 long sample buffer so it could be a fifo wrap around issue. Funny, I don't seem to hear it via HDMI output. I'll need to retest that.
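The arithmetic behind that hunch, as a quick sanity check in Python (helper names are made up for this sketch): a 1024-long FIFO at 48kHz wraps roughly every 21.3ms, close to the ~22ms spike spacing seen on the scope.

```python
def fifo_wrap_period_ms(fifo_longs, sample_rate_hz):
    """Time for the reader to go once around the FIFO, one long per sample."""
    return 1000.0 * fifo_longs / sample_rate_hz


def samples_in(interval_ms, sample_rate_hz):
    """Number of samples played during the given interval."""
    return interval_ms * sample_rate_hz / 1000.0


wrap_ms = fifo_wrap_period_ms(1024, 48_000)   # about 21.33 ms
spike_samples = samples_in(22, 48_000)        # 1056 samples
```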
It'll be due to my laziness here in the test code...
PUB soundgen2(fifostart,fifoend, fifoptr, period) | time,phase,sample,freq, inc, tailptr, p
    time := getct()
    freq := 1<<26
    inc:=1<<24
    p:=fifostart
    repeat
        tailptr:=long[fifoptr]
        if abs(tailptr - p) > 16
            sample,_ := polxy($FFF,phase+=freq)
            sample.word[1] := sample.word[0]
            long[p]:=sample
            if p <> fifoend
                p+=4
            else
                p:=fifostart
Interestingly the lower DAC pin shows the reverse transition making it hard to capture. But I see it on the scope and caught one.
Update: it happens at the same time as the other glitch when I triggered on that and captured both channels.
Fixed: it was fifo related but not the code above. Bug was wrong fifo end size parameter. I'd shrunk the fifo size to 512 to get the debug code to fit, but forgot to shrink the fifo end param as well. DAC code is now working. Don't like the DAC output that much on the scope though, it's not very clean and looks pretty noisy for both DAC_DITHER_PWM and DAC_DITHER_RND, especially with a square wave. Hopefully all that high frequency noise is out of hearing range so it still works out ok in real life.
HDMI FIFO and REPO mode are both working with audio. Am now trying to set up a nice volume ramp. Using SCAS to adjust sample amplitude in the buffer appears to be working, but it needs to have some suitable mapping as it's not ideal simply multiplying the amplitude and much of the scale is wasted. I read somewhere that 50% volume should be about -20dB which is approx 6 * 3dB or something like 0.707^6 I guess with respect to amplitude. Need to come up with some decent mapping there. Maybe needs to use the CORDIC to avoid a look up table.
                            andn    volumectl, ##$80008000
                            getword rvol, volumectl, #0
                            getword lvol, volumectl, #1
                            ...
                            getword spleft,audio_hist+3, #1         'left value
                            scas    spleft, lvol
                            rolword audio_hist+3, 0-0, #0
                            getword spleft, audio_hist+3, #1        ' now right value
                            scas    spleft, rvol
                            rolword audio_hist+3, 0-0, #0
Maybe the formula would be something like this: SCAS multiplier value = 4096 * 0.707^((100-vol%)*6/50) where vol% is the volume level from 0-100
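To see what that mapping produces, here is a small Python sketch of the proposed formula. The function name `scas_multiplier` is made up, and the 4096 full-scale value is simply taken from the line above; it would need adjusting if the multiplier's unity value in the driver turns out to be different.

```python
def scas_multiplier(vol_percent, full_scale=4096):
    """Map a 0-100 volume setting to a multiplier using
    full_scale * 0.707^((100 - vol) * 6 / 50), as proposed above."""
    vol = min(max(vol_percent, 0), 100)  # clamp to the 0-100 range
    return round(full_scale * 0.707 ** ((100 - vol) * 6 / 50))


# Could be precomputed into a table to avoid doing the maths on the P2
table = [scas_multiplier(v) for v in range(0, 101, 10)]
```

With these assumptions 100% gives the full 4096, 50% gives roughly an eighth of that (about -18dB), and 0% still leaves a small non-zero multiplier, so a true mute would need to be handled as a special case.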
Comments
???? No, that seems too complicated. You want the sliding buffer thing that I used. You just use 3 MOVs to shift up the buffer before adding a new sample. Much simpler and less likely to get itself into weird states.
Yeah I know, any DAC stuff needs more to make it work. If it is even going to be possible to work correctly I'm not yet sure given the semi randomized times both the poller and the encoder run on the scan lines and it would need careful sync up at the start. The only other DAC related thing I was considering was a mono-mix option for L&R channels to be combined if there is only a single analog DAC pin configured. But analog DAC output is not really going to be the main focus right now. I need to test all the other driver init changes I've made that have been accumulating in the code which seem to be quite a lot when I diff against an older working version. It's probably quite likely that something would have been broken by now after all those changes.
I may add an option of a COG ATN notification at FIFO wraparound as when crossing 1/2 way through the buffer. This might be of use to COGs that want to just ping-pong buffer their samples in the output FIFO, although they can still poll the current tail pointer address at any time and use that also.
As things stand here are the features of this experimental video driver work, not everything audio related is fully tested yet but confidence is very high that it will work:
uses existing data structures of existing P2 video driver with all existing features still included
DVI or HDMI compatible output formats
HDMI compatible output mode can have
FIFO mode audio - audio samples arrive using a circular buffer FIFO and are delivered at the configured sample rate of 32/44.1/48kHz over the HDMI pins
AVI infoFrame source data is read from HUB RAM, passed at startup time for encoding to TERC packet and sent out during last vsync line
a volume control parameter (containing independent words for left/right channels) is updated after vertical sync which can be used as a master volume for muting or ramping up/down audio to prevent clicks/pops etc while audio samples start or stop
Other/future ideas:
It’d be neat to support audio sample rate changes, at least in FIFO mode, in case an application wanted to dynamically choose between 44.1kHz or 48kHz output for example without restarting the driver. Not sure if that’s going be doable yet. Changing the video infoFrame data contents between YUV and RGB formats on the fly without a restart could also be useful in a video player application. That requires a packet re-encoding step which could happen by the caller but nicer if the driver can do it as the TERC encoding already exists there anyway.
It could be handy to have the ability to send other HDMI packet formats out on the vsync line, like general control packets or other infoFrames types or just for experimental use. There is some time during blanking for doing other things while still encoding audio samples for the next line. Approximately 2/3’s of the active time in these scan lines are free and we could potentially read in some special overlay to LUTRAM to TERC encode two additional packets in this time for example. Once the audio has been encoded during blanking we’d have the 512 longs of LUTRAM fully available for overlay use until the active video portion starts because the 256 entry palette is not needed at this time (apart from during the first vertical back porch line while the last visible line is still being displayed which might still be using it).
Only just saw this as our posts crossed paths last night and then the page break occurred but maybe given my volume control step I could always limit the sample range by clamping the volume multiplier to something under ff00, or at least just for the DAC based samples.
Ok, after taking that into consideration here's the updated code which might work now. It just fits inside the footprint after reclaiming some longs and I even had an instruction free to exit early if the poller is not ready which will be much of the time.
The REPO code: I keep the audio_hist and audio.inbuffer arrays directly following it. The audio_hist array and .inbuf_incr can get reclaimed in FIFO mode while audio_inbuffer needs to remain in place. I had planned a COGATN here too in an earlier scheme but that's not really needed now.
The FIFO code once patched over the top of REPO code:
The writer side will check the audio_inbuf_level and write at that offset in the array and increment this value. If the value of the audio_inbuf_level ever reaches 4 it will drop the sample (DAC was too slow/late to take it).
Something like this (omitting the DAC level stuff which I still have to figure out)
I'd have done the shifting on the input side...
Also you can save an instruction on both sides:
Is exiting early essential time-wise? If not another instruction could be saved, from the look of it. (I haven't studied the code in any detail.)
With things like this I'd prefer a constant runtime, actually, since that reduces the possibility of unfortunate interactions.
Yeah I can save that instruction - I had a spare slot from this redundant COGATN I'd added, so I used this but I did wonder at the time if it was just safer to burn the cycles to keep the execution time constant too - it adds more jitter to sometimes exit early and sometimes not. I should be able to poll a scan line about 5 times in the code at a minimum when heavily loaded doubling 24bpp pixels (approximately 1600-2000 cycles apart) so it could save a little bit of time but you shouldn't really bank on it. If the shifting was moved it probably could reduce this. There are plenty of cycles left with 32/44.1/48kHz audio output for doing this, though a 96kHz output rate might be pushing it - I know it was already a bit too close before. We will see, actually perhaps FIFO mode can help save some extra time during packet encoding due to not resampling like REPO mode does, in fact that'll probably buy us a lot actually.
Did look at that, it seemed a bit messier that way but is probably also doable. The problem is that you need to take either one or two samples per line before the audio buffer gets shifted down again for you by the writer, and you don't really know which will occur ahead of time without doing timing calculations.
I see the ALTR use does save a cycle, but the poller one as coded won't work. I think you probably meant this:
No, it's really the same thing. When you push one you adjust the buffer level. I guess it's hard to describe.
Yes.
Yeah if you wanted to hack up the idea in a code snippet it might be easier to explain that way. I found it tricky the first time to do it that way and just moved onto this simpler solution. I might have given up too easily by switching out the buffer rotation to the poller, but was probably missing something basic.
Something to this extent
Hard to say that it's correct without testing. Also a bit tired right now. I think either side being able to drop a sample is needed to be able to always recover into a good state, though in practice the samples should be added at the same pace as they're pulled.
Ok I see.
The problem I had was on lines with two samples emitted and occurs when the buffer level jumps up by two. You'd then read a fresher sample out before the older one at the pull side because it was indexing off decreasing audio_inbuf_level. But you solved that by reversing the buffer order. If there are no samples you don't update the DAC until there are, and if the DAC hasn't been keeping up (it should if coded right), then you simply drop a sample. I believe such an approach should recover. The main issue is probably in the initial samples, you sort of want to preload something before the scan line processing starts if no samples are present because once initialized the DACs will always want to take something potentially before you've even created the first sample.
It'll be interesting to hear the audio from the DACs, hopefully this will be ready to test soon. Time wise, it's going to be a very jittery execution process both when reading from the FIFO source, and also in polling for new sample data to write to the DACs. The internal buffer could get loaded as soon as the start of one scan line after hsync (when build audio runs early on blanking lines), and as late as the end of the next scan line (once processing doubled pixels). That's a lot of variance in time. Ideally it'd be pre-filled with zeroes at the start to prime it. Another idea is to always run the build_audio process at the start of the scan lines and then do pixels which will at least refill consistently in time, but I don't want to do that because I need to issue PSRAM requests as early as possible to give the data time to arrive and this is part of my graphics processing code.
Actually by moving the buffer rotation to the writer side, I can increase its size again (since the footprint in COGRAM is reduced) and that helps the internal FIFO deal with larger amounts of jitter. So there are more advantages in doing this.
4 slots really ought to be enough, maybe 5. The core problem is that both pushing and pulling happen synchronized to something else, so as a result the buffer level stays constant (averaged over time). So if the FIFO process is late once (and the DAC doesn't get a new sample), the average buffer level increases. If the FIFO process is too early and the buffer is full, the average level decreases. Note that the DAC will simply repeat the previous sample if it doesn't get a new one in time. Also the IN flag will obviously only lower on WYPIN.
...
That earlier idea I had about locking HSync == sample rate seems increasingly attractive. Not good for general purpose video or emulators though.
Yeah slips will change the notional steady state level. Am hopeful once synced up at the start that if you always poll faster than the actual DAC sample interval (which is every 5250 clocks for 48kHz on a 252MHz P2) and don't underflow the sample buffer it'd still keep up and we'd have no drops anywhere. Obviously the DAC sample rate and FIFO dequeue rate have to match perfectly down to the P2 clock cycle for that to be the case. Having a larger internal DAC buffer would allow for more randomized fill times by soaking up more variation.
We just need to ensure we don't get into some pathological condition where it slips continuously because the internal buffer level is too low and can't ever refill to a good level in time. I think it would still probably be able to recover from that because it would eventually advance an entire sample interval's worth of time every time that occurred giving the internal buffer time to build up again if the fill rate matches the drain rate. A glitch like that should hopefully just be transient in nature and then die out.
Yeah that'd be helpful but only if you had full control of the sample rate, in some dedicated application perhaps.
In other news, I've added the ability to send multiple pre-encoded infoFrames on the last Vsync line (currently limited to 4 extra packets after clk regen and audio, but it could still be increased further). Right now only AVI and audio infoFrames are actually useful but it could in theory be configured to send other packet types like Vendor Specific or Source Product Descriptor packets. I did this with the intent of allowing more dynamic data island packets to be sent in the future, beyond just infoFrames. When dynamically sent they'd still need to be TERC encoded first. Ideally I'll add something to do that work, maybe in the last vertical front porch blanking line if there is free code space to do so.
When the driver is initialized it would be sent an extra pointer (configInfo) to a structure containing HDMI specific information, which is set up by the initHdmiConfig method. If passed a null instead it would just default to DVI output. Here's the basic idea, args might still be changing to include pins independently. Not sure what the actual arg limit is so I kept it packed for now. I convert sample rate internally to sample period for end user convenience.
For repo/resample mode, you should really accept the sample rate as an integer period, since it needs to be an exact match with the generator cog.
Also, when I was last messing with the upstairs Samsung 3DTV I did realize that it treats 640x480 a bit specially. If I do 854x480 I can force 3D on with the remote. So I should try the vendor packet to that extent again.
Yeah it should match your numbers. I do this in the code. If sample rate == 0 then I disable all audio output (but keep HDMI framing).
Maybe make it so negative numbers bypass the conversion and just specify an actual period.
I can do that yeah.
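To make the rate/period convention from the last few posts concrete, here is a rough host-side sketch in Python (not the driver's Spin2 code). The function name `sample_period` and the round-to-nearest behaviour are assumptions for illustration; the thread only states that a positive value is converted from a rate, zero disables audio, and a negative value would specify the period directly.

```python
def sample_period(clkfreq_hz: int, rate: int):
    """Return (period_in_clocks, exact), or None when audio is disabled.

    rate > 0  : sample rate in Hz, converted to a period in P2 clocks
    rate == 0 : audio output disabled
    rate < 0  : magnitude is used directly as the period in clocks
    """
    if rate == 0:
        return None
    if rate < 0:
        # Caller supplied the period itself, so no conversion is needed.
        return (-rate, True)
    period, remainder = divmod(clkfreq_hz, rate)
    # Round to the nearest whole clock (assumed; the real driver may truncate).
    if remainder * 2 >= rate:
        period += 1
    # 'exact' is False when the rate does not divide the clock evenly.
    return (period, remainder == 0)


if __name__ == "__main__":
    for r in (48000, 44100, 32000, -5250, 0):
        print(r, sample_period(252_000_000, r))
```

With a 252MHz clock, 48kHz lands exactly on the 5250 clocks mentioned earlier, while 44.1kHz does not divide evenly, which is why passing an explicit period matters when the generator cog has to match exactly.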
Another bonus of the FIFO based scheme is that in principle it allows other audio formats (high bit rate, DST, one bit audio) to be sent out within the audio packets defined for that type. A different LUTRAM overlay that is aware of the desired audio format can be read in and generate the samples. Because there won't be any resampling going on, it should have space to format the data to the packet if the data arriving through the FIFO is already of a suitable type. The other audio packet formats look somewhat simpler than packing in SPDIF channel status for L-PCM, so I'd expect it could fit the footprint available. Not sure how much a 640x480 signal will be able to do with high bit rate audio though, there's not really the bandwidth. Also to do much of this speciality audio beyond L-PCM requires more information which is locked away for content protection etc, and also likely needs more processing of actual EDID stuff from the sink.
FIFO mode audio seems to be working now which is good. I can hear the nice pure tone over HDMI (I dug up a portable LCD panel I can use that nicely has an audio output from HDMI). However for the DAC output from the P2 it has a really nasty buzz to it with the tone present as well when I feed that to the same amplified speakers instead of the HDMI LCD panel's audio output. I sort of wonder if I might be overdriving this little headphone amp on the VGA/AV board or it could be jitter perhaps (but jitter doesn't usually create a nasty buzz). I also tried with battery power due to the possible ground loops I might have here but it didn't help. The 75R DAC setting seems louder so I dropped down to 600 ohm output impedance hopeful of a better result but it still buzzes hard.
wrpin ##P_DAC_DITHER_PWM|P_OE|P_DAC_600R_2V, b
Also I've scaled the sound generator down to use a smaller $FFF amplitude instead of $7FFF but that horrid buzz remains. I do XOR with $80008000 to DC offset it. So that doesn't seem to be it. Without the XOR it's a lot worse.
sample,_ := polxy($FFF,phase+=freq)
UPDATE: Fixed! I messed up above and still had left some older altd code in the writer side during testing, code should have been this:
Now the DAC output sounds a lot cleaner. Not perfect though, there's still another buzz of sorts. The tone is stronger now and this buzz is lessened, but a little more crackly now.
If you're shifting the buffer, you always put the new sample into the lowest slot. You shouldn't use ALTD/ALTR.
I just saw the EDIT. Can you post both ends of the current code?
Yeah here it was. In other news before I finished up last night I did measure the push/pull operations with some drvnots and the scope and I never saw it trigger when set to trigger with transitions over 11us apart. So we are polling at least at 90kHz or so. I don't think it's dropping samples due to being late. I realized overnight I should scope the analog output too. I wonder if it's driving out a valid sample, zero, sample, zero, etc. Something that's chopping up the samples and causing a buzz. It was way worse before I fixed the altd mistake but there is still a buzz. I also realized there is no need to check the push side for the inbuf level. You can simply rotate in the new sample from the FIFO every time. Just only need to clip the inbuf_level to at most 4 samples (or perhaps it's 3).
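For anyone following along without the PASM2 in front of them, the rotate-in scheme can be modelled on a PC like this. It is only a Python sketch of the behaviour discussed in this thread, not the driver code: the class and method names are made up, and the pull side (reading at index level-1 and repeating the last value when empty) is reconstructed from the descriptions above.

```python
class SlidingBuffer:
    """Host-side model of a small shift-in sample buffer (not driver code)."""

    def __init__(self, slots: int = 4):
        self.buf = [0] * slots   # slot 0 always holds the newest sample
        self.level = 0           # samples pushed but not yet taken by the DAC
        self.last = 0            # value the DAC keeps repeating when empty

    def push(self, sample: int) -> None:
        # Shift every slot up by one (3 moves for 4 slots), newest into slot 0.
        self.buf[1:] = self.buf[:-1]
        self.buf[0] = sample
        # Clamp the level: if the buffer was already full, the oldest
        # sample has just been shifted out, i.e. dropped.
        self.level = min(self.level + 1, len(self.buf))

    def pull(self) -> int:
        # The oldest pending sample sits at index level-1.
        if self.level > 0:
            self.level -= 1
            self.last = self.buf[self.level]
        # With nothing pending the previous sample is simply repeated.
        return self.last


if __name__ == "__main__":
    b = SlidingBuffer()
    for s in (1, 2, 3, 4, 5):    # five pushes into four slots drops sample 1
        b.push(s)
    print([b.pull() for _ in range(5)])
```

Because the push always shifts and only the level is clamped, an overflow drops the oldest sample rather than the incoming one, which is the "which sample you drop" difference mentioned in this thread.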
Hey do you know how to DEBUG output the flags?
It's really the same, it just changes which sample you drop (the one you're pushing out or the one you would've pushed), so it's a matter of what's more convenient.
You don't. (i.e. use WRZ/WRC to get them into a temp register). Yes, it's stupid. (Though flexspin's DEBUG code is easy to edit, so it should be possible to add a flag dump command)
Just scoped the analog pins playing back the samples. One of them doesn't look good. Spikes every 22ms or so. At 48kHz that's around 22*48 samples or ~1024. I have a 1024 long sample buffer so it could be a fifo wrap around issue. Funny, I don't seem to hear it via HDMI output. I'll need to retest that.
It'll be due to my laziness here in the test code...
Interestingly the lower DAC pin shows the reverse transition making it hard to capture. But I see it on the scope and caught one.
Update: it happens at the same time as the other glitch when I triggered on that and captured both channels.
Fixed: it was fifo related but not the code above. Bug was wrong fifo end size parameter. I'd shrunk the fifo size to 512 to get the debug code to fit, but forgot to shrink the fifo end param as well. DAC code is now working. Don't like the DAC output that much on the scope though, it's not very clean and looks pretty noisy for both DAC_DITHER_PWM and DAC_DITHER_RND, especially with a square wave. Hopefully all that high frequency noise is out of hearing range so it still works out ok in real life.
HDMI FIFO and REPO mode are both working with audio. Am now trying to set up a nice volume ramp. Using SCAS to adjust sample amplitude in the buffer appears to be working, but it needs to have some suitable mapping as it's not ideal simply multiplying the amplitude and much of the scale is wasted. I read somewhere that 50% volume should be about -20dB which is approx 6 * 3dB or something like 0.707^6 I guess with respect to amplitude. Need to come up with some decent mapping there. Maybe needs to use the CORDIC to avoid a look up table.
Maybe the formula would be something like this: SCAS multiplier value = 4096 * 0.707^((100-vol%)*6/50) where vol% is the volume level from 0-100
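As a quick sanity check of that mapping, here is the formula evaluated on a PC in Python. This is only a sketch of the arithmetic: the function name `scas_multiplier` is hypothetical, the 4096 full-scale value is taken straight from the formula above, and clamping vol% to 0-100 is an added assumption.

```python
def scas_multiplier(vol_percent: float, full_scale: int = 4096) -> int:
    """Evaluate full_scale * 0.707^((100 - vol%) * 6 / 50)."""
    # Clamp to the 0-100 range (assumption; the formula itself has no clamp).
    vol = min(max(vol_percent, 0.0), 100.0)
    # Number of roughly 3 dB steps below full volume: 6 steps per 50%.
    steps = (100.0 - vol) * 6.0 / 50.0
    return round(full_scale * 0.707 ** steps)


if __name__ == "__main__":
    for v in (100, 75, 50, 25, 0):
        print(v, scas_multiplier(v))
```

At 50% the exponent is 6, which works out to about -18 dB of amplitude, close to the -20dB target mentioned above. On the P2 itself this would still need either a small table or a CORDIC-based exponent, since floating point is not available in the driver.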