So I've been trying to improve the PAL and NTSC stuff in my video driver in the last couple of days. I took Chip's more accurate timing calculations from this thread https://forums.parallax.com/discussion/173102/ntsc-pal-driver-without-dot-crawl-new-version/p1
and recalculated timing parameters for these four SDTV modes output using composite/S-video:
Progressive: 288p (PAL), 240p (NTSC)
Interlaced: 576i (PAL), 480i (NTSC)
I also took @Wuerfel_21's updated colourspace matrix settings for NTSC/PAL and added the 3.3V DAC pin setting for improved yellow output range.
Final result is a lot better than my original settings which we knew were less than ideal.
For NTSC and PAL via S-video, results are very good as expected - it looked amazing on my Plasma actually. And for composite NTSC both progressive and interlaced look pretty good too. For PAL the results are still not ideal, with visible chroma crawl in interlaced mode, despite tweaking the timing to get the colour subcarrier frequency as close as I could to the 4433618.7516 Hz value and computing XFREQ accurately with respect to this frequency in an attempt to lock to it. While the text is crisp and the colour is good, it shimmers with visible colour dots (and I've set up 40 column text instead of 80, which helps reduce the high resolution of the text) and the vertical graphical lines also crawl upwards. A video is attached of it in action. Progressive mode PAL does look better at least: there is no obvious crawl and everything looks frozen, but it does show the dot pattern - see photo. Actually it does crawl very slowly, but the cycle time is tens of seconds so it's not really noticeable. On my Dell LCD (photographed) these dots are highly noticeable, but the same signal into the Plasma TV looks less dotty and more like it's wavy and underwater, so it's been filtered somewhat.
Yeah, the exact length and number of scanlines is what affects it. For lowres NTSC you ideally want each scanline to be exactly N+0.5 color cycles and an odd number of total scanlines (263 probably). That way the errors even out the best (for static images). Each scanline will have roughly the opposite error of its upper/lower neighbor and the errors invert every frame. For interlace it's a 90° shift every field, which I guess is good?
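The effect of line length and line count on subcarrier phase can be checked by computing the fractional number of colour cycles accumulated per frame or field. The sketch below is illustrative only: it assumes the standard NTSC figure of 227.5 colour cycles per line, and the function name is made up for this example.

```python
from fractions import Fraction

def phase_step(cycles_per_line, lines):
    """Fraction of a subcarrier cycle left over after `lines` scanlines (0 <= result < 1)."""
    return (Fraction(cycles_per_line) * Fraction(lines)) % 1

# Assumed: standard NTSC line length of 227.5 colour cycles (N + 0.5)
NTSC_CYCLES_PER_LINE = Fraction(455, 2)

progressive_263 = phase_step(NTSC_CYCLES_PER_LINE, 263)                 # odd line count
progressive_262 = phase_step(NTSC_CYCLES_PER_LINE, 262)                 # even line count
interlaced_field = phase_step(NTSC_CYCLES_PER_LINE, Fraction(525, 2))   # 262.5 lines per field

print("263 lines/frame :", progressive_263)   # half a cycle: phase inverts every frame
print("262 lines/frame :", progressive_262)   # zero: the same phase every frame
print("262.5 lines/field:", interlaced_field) # three quarters: a quarter-cycle step per field
```

With 263 lines the leftover is half a cycle, so the pattern inverts every frame; with 262 lines it would repeat unchanged. For interlace the leftover is three quarters of a cycle, i.e. a 90° step (in the negative direction) per field.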
Right now I'm not sure what's ideal for PAL (since the phase is flipping on its own to begin with).
PAL60 is kinda messed up in this regard from the beginning, nothing matches up nicely. Even "real" PAL60 sources like the Wii have annoying dot crawl. Though that uses a really nice video DAC chip ("AVE-RVL") that filters the signal to avoid artifacts in the first place, so it's not as bad.
Maybe that could be built externally i.e. output S-Video from the P2 chip and put a switchable 3.58/4.43 MHz trap filter on the Y line before mixing the signals. The common AD725 encoder has something like that built-in, so I think that works in practice. You'd really want to filter IQ before modulation, too, but oh well.
@TonyB_ said:
What PAL colour subcarrier frequency (fcsc) and line frequency (fline) are you using?
Well I first started out with what Chip used which was an integer 4433618Hz and then tweaked it further by adjusting to 4433618.75 and even 4433618.7516 (true frequency that re-aligns colour phase every 8 fields with that extra Fline/625 offset added).
Line frequency was computed based on 283.75 times this in Chip's code below.
pal_cf = 4_433_618 'colorburst frequency
pal_cc = round(283.75 * 4.0) 'color cycles per line * 4 to preserve fraction
...
dotf := muldiv64(x_total, pal ? pal_cf * 4 * 128 : ntsc_cf * 4 * 128, pal ? pal_cc : ntsc_cc) 'compute pixel clock * 128
cf := (pal ? pal_cf : ntsc_cf) frac freq
...
i := 31 - encod freq 'compute very accurate streamer frequency to stop dot-crawl
xf := ((dotf >> (7-i)) frac (freq << i) + 1) >> 1
EDIT:
The exact relationship is fcsc = 283.75 * fline + 25 Hz
where fline = 15625 Hz and fcsc = 4.43361875 MHz
The extra 25 Hz introduces one or two horrible big prime numbers when synthesizing fcsc
Just wondering whether you've tried fcsc = 283.75 * fline
If sysclk = 283.75 MHz and fline = 15625 Hz then fcsc = sysclk / 64 = 4.43359375 MHz
20.0 MHz * 227 / 16 = 283.75 MHz
283.75 MHz / 19 (or 20) pixel clock outputs very close (or close) to square pixels for 768x576 res.
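The frequency relationships quoted above can be verified with exact rational arithmetic. This is only a numerical check, not driver code; the variable names are made up for the example, and the 50 Hz field rate is the standard PAL value rather than something stated in the post.

```python
from fractions import Fraction

FLINE = Fraction(15625)        # PAL line frequency in Hz
FIELD_RATE = Fraction(50)      # assumed: standard PAL field rate in Hz

fcsc_simple = Fraction(1135, 4) * FLINE        # 283.75 * fline
fcsc_exact = fcsc_simple + 25                  # 283.75 * fline + 25 Hz
sysclk = Fraction(20_000_000) * 227 / 16       # 20.0 MHz * 227 / 16

# Size of the 25 Hz offset relative to the subcarrier, in parts per million
offset_ppm = float((fcsc_exact - fcsc_simple) / fcsc_exact * 1_000_000)

# Subcarrier cycles per field; the denominator is the number of fields
# needed before the subcarrier phase lines up with the field start again
cycles_per_field = fcsc_exact / FIELD_RATE

print("fcsc exact        :", float(fcsc_exact), "Hz")
print("sysclk            :", float(sysclk), "Hz")
print("sysclk / 64       :", float(sysclk / 64), "Hz")
print("offset            :", round(offset_ppm, 2), "ppm")
print("fields to realign :", cycles_per_field.denominator)
```

The check confirms that sysclk / 64 equals 283.75 * fline exactly, that the 25 Hz offset is roughly 5.6 ppm, and that with the offset included the subcarrier phase repeats every 8 fields.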
I think the original PAL timing was probably somewhat better before I started tweaking it so I'll probably revert back to that for now. IIRC it showed the text dots (perhaps more of an LCD monitor artefact with poor filtering) and jagged vertical lines but less actual visible crawl.
I will also try out your P2 clock numbers TonyB_, but if they don't improve things further than Chip's stuff, for PAL use I'd probably just stick with S-video for the output with PAL. Composite PAL seems too hard to get spot on with the P2 HW even when you make all attempts at getting everything computed correctly, unless PAL always has this amount of crawl and relies on very good filtering from devices to reduce it.
Here was the hacked up vsync handling I'd tested - copied it into the mouse sprite code for now to make room for testing until I remove DVI to free COGRAM space it needs. When I scoped it I captured all four cases of the colour bursts at sync time for the four fields and saw it was doing the correct thing. Also the PAL phase (CQ value) gets reset to the same initial value before the back porch blanking begins after this code, so the burst has the correct polarity, and is XORd to flip it to the other polarity per each scan line after that. This is what is meant to occur AFAIK.
interlacedcode 'some different sync code patches
testb fieldcount,#0 wz
testb fieldcount,#1 wc
if_c_ne_z sets doburst, #hsync0 'disable colourburst fields 2,3
callpa #1, #blank 'send a blank line
sets doburst, #hsync0 'disable all colourburst cases
if_nz call #\hsync 'send
if_nz xcont m_slim, hsync0 'send 1/2 line for field 1,3
rep #2, #5-0 'defaults to PAL
xcont sync_000, hsync1 'generate horizontal blanking/sync
xcont sync_001, hsync0
decod status, #31 'update status - in vertical sync
setbyte status, fieldcount, #2 'update field counter in status
wrlong status, statusaddr
rep #2, #5-0 'defaults to PAL
xcont sync_002, hsync1 'generate horizontal blanking/sync
xcont sync_003, hsync0
cogatn #0-0
rep #2, #5-0 'defaults to PAL
xcont sync_000, hsync1 'generate horizontal blanking/sync
xcont sync_001, hsync0
if_z xcont m_half, hsync0 'remainder after half line (line 318, field 2,4)
if_c_eq_z callpa #1,#blank 'send a full line (fields 1,4, lines 6,319)
sets doburst, #colourburst 'enable all colourburst outputs from now
if_00 setd patchvbp, #15 ' 7-21 22-309 (288 lines)
if_01 setd patchvbp, #15 ' 319 - 333 inclusive 334-621 (288 lines)
if_10 setd patchvbp, #16 ' 6-21 22-309 (288 lines)
if_11 setd patchvbp, #14 ' 320-333 inclusive 334-621 (288 lines)
jmp #fieldloop
@TonyB_ said:
Just wondering whether you've tried fcsc = 283.75 * fline
If sysclk = 283.75 MHz and fline = 15625 Hz then fcsc = sysclk / 64 = 4.43359375 MHz
20.0 MHz * 227 / 16 = 283.75 MHz
283.75 MHz / 19 (or 20) pixel clock outputs very close (or close) to square pixels for 768x576 res.
Just tried this, it does give good results but still has that bad dot crawl I already see with my own tests. A brief video is attached showing the output on my plasma using these settings - gives similar results to mine.
@Wuerfel_21 said:
Maybe that could be built externally i.e. output S-Video from the P2 chip and put a switchable 3.58/4.43 MHz trap filter on the Y line before mixing the signals. The common AD725 encoder has something like that built-in, so I think that works in practice. You'd really want to filter IQ before modulation, too, but oh well.
Yeah I think many of the problems now probably mainly come down to combining Luma+Chroma inside the P2 without filtering because the S-video output looks really nice.
@TonyB_ said:
Just wondering whether you've tried fcsc = 283.75 * fline
If sysclk = 283.75 MHz and fline = 15625 Hz then fcsc = sysclk / 64 = 4.43359375 MHz
20.0 MHz * 227 / 16 = 283.75 MHz
283.75 MHz / 19 (or 20) pixel clock outputs very close (or close) to square pixels for 768x576 res.
Just tried this, it does give good results but still has that bad dot crawl I already see with my own tests. A brief video is attached showing the output on my plasma using these settings - gives similar results to mine.
Thanks for testing. I think fcsc = 283.75 * fline is the best that can be achieved. It seems that's what Chip's code is doing and results using his code should look similar. To minimize PAL dot crawl, a "precision offset" of 1/625 * fline needs to be added to 283.75 * fline but it's a tiny addition, only 5.6 ppm. The insurmountable problem on the P2 is that fline and fcsc are both integer fractions of sysclk, which would need to be 10 times faster for the precision offset to be possible.
I assume you tried the obvious "just add/subtract 1 from CFREQ" thing?
Another factor in this is that XZERO resets the pixel NCO phase (so if used, each scanline is an integer number of cycles), but not the carrier phase. So XFREQ can be an exact fraction of the master clock, but CFREQ can only be multiples of clkfreq/2^32. Smart choice of system clock can help here. IMO at the clock ratios involved in SDTV (<= clkfreq/16) the XFREQ does not need to be an exact fraction, as long as XZERO is used to keep everything aligned.
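To get a feel for the CFREQ granularity mentioned here, the sketch below computes the nearest 32-bit increment for a target subcarrier and the frequency that increment actually produces. It assumes CFREQ is simply fcsc / clkfreq * 2^32 rounded to nearest, and it uses the 283.75 MHz clock from earlier in the thread; the helper names are hypothetical. It says nothing about whether fline can be held in the right relationship at the same time.

```python
from fractions import Fraction

def cfreq_for(fcsc_hz, clkfreq_hz):
    """Nearest 32-bit NCO increment for a target carrier (assumed: round to nearest)."""
    return round(Fraction(fcsc_hz) * 2**32 / Fraction(clkfreq_hz))

def actual_hz(cfreq, clkfreq_hz):
    """Carrier frequency actually produced by a given increment."""
    return Fraction(cfreq) * Fraction(clkfreq_hz) / 2**32

CLKFREQ = 283_750_000                      # sysclk suggested earlier in the thread
step_hz = Fraction(CLKFREQ, 2**32)         # smallest possible change in carrier frequency

cfreq_simple = cfreq_for(Fraction(17734375, 4), CLKFREQ)   # 4433593.75 Hz = 283.75 * fline
cfreq_exact = cfreq_for(Fraction(17734475, 4), CLKFREQ)    # 4433618.75 Hz, with the 25 Hz offset
error_hz = actual_hz(cfreq_exact, CLKFREQ) - Fraction(17734475, 4)

print("step size   :", float(step_hz), "Hz")
print("CFREQ simple:", cfreq_simple)
print("CFREQ exact :", cfreq_exact)
print("error       :", float(error_hz), "Hz")
```

At this clock the carrier step is about 0.066 Hz, and 283.75 * fline lands exactly on an increment of 2^26 because it equals clkfreq / 64.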
@Wuerfel_21 said:
I assume you tried the obvious "just add/subtract 1 from CFREQ" thing?
You mean keep tweaking its value until I see it working nicer? Or something dynamic in the code? What is obvious here?
Another factor in this is that XZERO resets the pixel NCO phase (so if used, each scanline is an integer number of cycles), but not the carrier phase. So XFREQ can be an exact fraction of the master clock, but CFREQ can only be multiples of clkfreq/2^32. Smart choice of system clock can help here. IMO at the clock ratios involved in SDTV (<= clkfreq/16) the XFREQ does not need to be an exact fraction, as long as XZERO is used to keep everything aligned.
Yeah I do an xzero per scanline so the phase error doesn't accumulate. It's always the correct number of clock cycles per line.
hsync xzero m_sn, hsync1 'generate the sync pulse
wrlong status, statusaddr 'update the sync status per line
dobreeze xcont m_br, hsync0 'do breezeway before colour burst
setcq cq 'reapply CQ for PAL colour changes
doburst xcont m_cb, colourburst 'do the PAL/NTSC colour burst
flipref xor cq, palflipcq 'toggle PAL colour output per scanline
bp xcont m_bv, hsync0 'generate the back porch
notify _ret_ cogatn #0-0 'notify new scan line status during hsync
Obvious in the sense of "just wiggle the CFREQ value up and down and see if that improves it". Also, remember that the XFREQ needs to be rounded up if you want an exact fraction, otherwise there's an extra cycle after XZERO.
Hey @Wuerfel_21 , are these colorspace values for component SD and component HD known to be correct in NeoVGA? I was going to use these too but didn't know if they were calculated as the working wasn't shown unlike what you did for PAL/NTSC, it's just raw values. Am talking about your params for BT.601 and Rec.709 below.
cmode_ypbpr601 ' YPbPr (with Rec. 601 matrix for SDTV)
long $00_00_00_00 ' DAC blanking value
long Y_BLANK ' Blanking color
long Y_SYNC ' HSync color
long Y_SYNC ' VSync color
long Y_BLANK ' HSync+VSync color
long 0 ' Color burst color (NTSC/PAL only)
long 0 ' Color burst frequency (NTSC/PAL only)
long ( 45&$FF) << 24 + (-38&$FF) << 16 + ( -7&$FF) << 8 + 128 ' CY
long ( 27&$FF) << 24 + ( 53&$FF) << 16 + ( 10&$FF) << 8 + BLANK_LEVEL ' CI
long (-15&$FF) << 24 + (-30&$FF) << 16 + ( 45&$FF) << 8 + 128 ' CQ
long 0 ' CQ XOR value (NTSC/PAL only)
cmode_ypbpr709 ' YPbPr (with Rec. 709 matrix for HDTV)
long $00_00_00_00 ' DAC blanking value
long Y_BLANK ' Blanking color
long Y_SYNC ' HSync color
long Y_SYNC ' VSync color
long Y_BLANK ' HSync+VSync color
long 0 ' Color burst color (NTSC/PAL only)
long 0 ' Color burst frequency (NTSC/PAL only)
long ( 45&$FF) << 24 + (-41&$FF) << 16 + ( -4&$FF) << 8 + 128 ' CY
long ( 19&$FF) << 24 + ( 64&$FF) << 16 + ( 7&$FF) << 8 + BLANK_LEVEL ' CI
long (-10&$FF) << 24 + (-35&$FF) << 16 + ( 45&$FF) << 8 + 128 ' CQ
long 0 ' CQ XOR value (NTSC/PAL only)
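Since the working for these values isn't shown, here is one way they can be cross-checked. The sketch derives YPbPr rows from the published luma coefficients (Kr, Kb) of Rec. 601 and Rec. 709 and scales them to integers. Two things are assumptions inferred from the numbers rather than taken from NeoVGA: a scale factor of 90, and the byte order R, G, B with CY carrying Pr, CI carrying Y and CQ carrying Pb.

```python
def ypbpr_rows(kr, kb, scale=90):
    """Integer YPbPr matrix rows (R, G, B order) from luma coefficients.

    scale=90 is an assumption inferred from the listed values, not a documented constant.
    """
    kg = 1.0 - kr - kb
    y = (kr, kg, kb)
    pb = (-kr / (2 * (1 - kb)), -kg / (2 * (1 - kb)), 0.5)
    pr = (0.5, -kg / (2 * (1 - kr)), -kb / (2 * (1 - kr)))

    def to_int(row):
        return tuple(round(c * scale) for c in row)

    return {"Pr": to_int(pr), "Y": to_int(y), "Pb": to_int(pb)}

rec601 = ypbpr_rows(0.299, 0.114)      # Rec. 601 luma coefficients
rec709 = ypbpr_rows(0.2126, 0.0722)    # Rec. 709 luma coefficients

for name, rows in (("Rec. 601", rec601), ("Rec. 709", rec709)):
    print(name)
    for label in ("Pr", "Y", "Pb"):
        print("  ", label, rows[label])
```

Under those assumptions the Rec. 601 rows come out identical to the listing. The Rec. 709 rows match too, except that the blue luma term rounds to 6 here against 7 in the listing (using 7 makes the luma row sum to 90).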
I think SDTV/EDTV analog progressive component video like 480p/576p uses BT601 and HDTV resolutions like 720p/1080i/p use the Rec709 one. However it's somewhat confusing. In fact after DVB-t DTV was initially released in Australia around 2001 some channel (7?) that broadcast 576p actually had the gall to call it "HD" so they didn't have to invest in more expensive equipment at the time proper HDTV came in. What a joke that was.
That link to that Demystified book I posted above had more details listed in Chapters 3 and 5.
@rogloh
According to the BBC TV Signal Coding document, there should be diagonal lines moving slowly up or down when using an odd number of quarter cycles, which is true when fcsc = 283.75 * fline.
Does NTSC composite have less dot crawl than PAL? Just wondering what happens if you keep sysclk = 283.75 MHz and fcsc = sysclk / 64 and adjust fline a little:
Hi @TonyB_
I tested your other two variants with 283.5 and 284.0 as the divisors. 283.5 was probably quite a bit better than the 284.0 case. I'd have to say on first impression of using 283.5, it probably looks better than 283.75 on my LCD monitor - I still have to drag it over and take a look with the Plasma in the other room. I really need a good way to switch out these values dynamically, or show two different outputs from two COGs side by side on two monitors, so I can compare visually faster rather than recompute values, rebuild and reload etc and in the meantime forget what the last one looked like. But that will take too long to set up for now. I'll see if I can reload XFREQ from a keypress or something...
Update: yeah on the Plasma the 283.5 setting looked quite a bit better than the original 283.75 setting. I just can't understand why the correct value of 283.75 is not working out for PAL timing on the P2. I think I'll keep this 283.5 value for now as the best yet.
One thing I'm sort of wondering is if I do go and start to tweak the XFREQ value slightly until I think it's best visually, whether that will only work "best" on my own P2 given its crystal frequency will be different to other people's P2s, or whether it will still be applicable on other systems.
For our reference, these are the values I used that Chip's timing generator code computes for the modified 283.5 colour cycles/line and a 283.75MHz clock (with 908 total pixels/line) for PAL 50Hz:
ntsc.start(1, 1, 908, 720, 0, 288, 0, 283750000, @timing[0])
@rogloh said:
Update: yeah on the Plasma the 283.5 setting looked quite a bit better than the original 283.75 setting. I just can't understand why the correct value of 283.75 is not working out for PAL timing on the P2. I think I'll keep this 283.5 value for now as the best yet.
One thing I'm sort of wondering is if I do go and start to tweak the XFREQ value slightly until I think it's best visually, whether that will only work "best" on my own P2 given its crystal frequency will be different to other people's P2s, or whether it will still be applicable on other systems.
For our reference, these are the values I used that Chip's timing generator code computes for the modified 283.5 colour cycles/line and a 283.75MHz clock (with 908 total pixels/line) for PAL 50Hz:
ntsc.start(1, 1, 908, 720, 0, 288, 0, 283750000, @timing[0])
I calculate that xfreq = 107468869 is more accurate for fline = fcsc / 283.5.
18144 / 908 = 19.9823788546
2^31 / 19.9823788546 = 107468868.628
Round up to 107468869
Might not make any difference. Daft question, you are testing interlaced video?
EDIT:
xfreq for 283.75 = 107374183 = $666_6666 + 1
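The two XFREQ figures above can be reproduced with exact arithmetic. This sketch follows the manual calculation in the post (2^31 divided by sysclk cycles per pixel, rounded up) and is not Chip's code; the function name is made up for the example.

```python
from fractions import Fraction
import math

def xfreq_round_up(sysclk_cycles_per_line, pixels_per_line):
    """XFREQ as 2^31 / (sysclk cycles per pixel), rounded up to the next integer."""
    clocks_per_pixel = Fraction(sysclk_cycles_per_line, pixels_per_line)
    return math.ceil(Fraction(2**31) / clocks_per_pixel)

PIXELS_PER_LINE = 908

# With fcsc = sysclk / 64, one line lasts 64 * (colour cycles per line) sysclk cycles
x_2835 = xfreq_round_up(18144, PIXELS_PER_LINE)   # 64 * 283.5  = 18144
x_28375 = xfreq_round_up(18160, PIXELS_PER_LINE)  # 64 * 283.75 = 18160

print(x_2835)    # 107468869
print(x_28375)   # 107374183
```

The 283.75 case works out to exactly 20 sysclk cycles per pixel, which is why its XFREQ is $666_6666 + 1 after rounding up.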
My P2 Eval B has been out of action for a while. It was setup to work with an old Windows PC but that or the Eval was giving off a nasty burning smell and smoke. I switched both off and haven't looked for the cause yet.
I calculate that xfreq = 107468869 is more accurate for fline = fcsc / 283.5.
18144 / 908 = 19.9823788546
2^31 / 19.9823788546 = 107468868.628
Round up to 107468869
Hmm, yeah I've also noticed that Chip's calculations don't give exactly the same results as something done manually using higher precision, e.g. via a Numbers spreadsheet on Mac. Could be down to single precision floats in the toolchain perhaps.
EDIT: actually looking at the code again, that doesn't make sense as the computation is done using MULDIV64 in SPIN2 and the frac operator. I wonder if some precision is getting lost there somehow.
Might not make any difference. Daft question, you are testing interlaced video?
Yes.
EDIT:
xfreq for 283.75 = 107374183 = $666_6666 + 1
BTW I tried out @Wuerfel_21 's SDTV and HDTV component colourspace settings on my LCD and Plasma TV. RGBI colour swatches look reasonable on first impression so there's not some gross error apparent. I don't do colorimetry or have calibration gear etc so wouldn't know if it's accurate but it looks okay to me. Photo shows it as very grainy, looks far better in real life although I wasn't using a proper coax - this is 720p50 component HD.
I've also added a bunch of new stock resolution timings to the driver code. Not all are supported by my test gear here so YMMV. I did test 720p50, 720p60, 1080p24, 1080i50 and 1080i60, and they worked on my plasma TV. Not sure progressive 1080p50 and 1080p60 are supported over component on my plasma but I should try that out too. Update: nope, nothing displayed, unless it needs a proper coax instead of the cheap old DVD component cables I used. I recall you weren't allowed to do 1080p50/60 over component with consumer gear. FTS.
Just wondering whether dot crawl for non-interlaced PAL composite is better, the same or worse than for interlaced.
From my recollection non-interlaced was always better than interlaced with PAL in terms of quality, but I've not worked on it for a couple of weeks. I believe it's looking much better in general now vs what I had before and I've revamped the default driver timings - of course it can still be tweaked manually if required. Right now I have some other stuff I want to do before Xmas so won't have time to look at it any more for now.
LOL, yeah that can't be done at the moment. In theory it might be doable with the other analog outputs if the code is changed to use the streamer rate change for pixel doubling and I'm at least moving in that direction.
To double DVI you'd need to have a three line scan buffer. One filling with PSRAM contents, one being doubled, and another being output. My driver uses a two scan line buffer.
Something that may ameliorate this restriction is that if you want to pixel double, it's more likely you can probably use HUB RAM given it takes less memory to hold a 320x240 frame for example. And if you want to use larger resolutions with PSRAM you probably don't need to double in the first place. Not perfect I know, but at least it helps a little. Also you can still line double with PSRAM IIRC, just not double the pixel widths - though this will create non-square pixels.
DVI/HDMI monitors are far more forgiving of wild timings. Try just outputting a 320x240 mode straight. It does depend on the particular monitor but many will happily scale up something very low res for you. They certainly can all know what you're putting out, just a question as to whether it wants to accept it or not.
Minimum dotclock is still 25MHz ... so blanking would need to be extended accordingly to retain 50 or 60 Hz refresh. I found Hsync fixed at 64 dots wide and centred in the blanking worked well.
Vsync of 2 lines did the job too. Vsync placement can be anywhere that suits you. I placed it in the middle of blanking when blanking was large to fully utilise Roger's API limitations. Otherwise I used a front porch of 1 line.
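As a rough illustration of how much the blanking has to grow, the sketch below picks a horizontal total that keeps a 320x240 mode at about 60 Hz with a 25 MHz dot clock, then centres a 64-dot hsync in the blanking. The vertical total of 262 lines is an arbitrary assumption for the example, not a figure from the posts, and the function name is hypothetical.

```python
def htotal_for(dotclock_hz, refresh_hz, vtotal):
    """Dots per line needed to hit the target refresh at a given dot clock."""
    return round(dotclock_hz / (refresh_hz * vtotal))

DOTCLOCK = 25_000_000      # minimum dot clock mentioned above
H_ACTIVE = 320
V_ACTIVE = 240
V_TOTAL = 262              # assumed for illustration only
HSYNC_WIDTH = 64           # hsync width quoted above

htotal = htotal_for(DOTCLOCK, 60, V_TOTAL)
h_blank = htotal - H_ACTIVE
h_front = (h_blank - HSYNC_WIDTH) // 2        # centre the sync pulse in the blanking
h_back = h_blank - HSYNC_WIDTH - h_front
refresh = DOTCLOCK / (htotal * V_TOTAL)

print("htotal:", htotal, "front:", h_front, "sync:", HSYNC_WIDTH, "back:", h_back)
print("refresh: %.3f Hz" % refresh)
```

With these assumed numbers almost four fifths of each line is blanking, which shows why wide sync timing parameters matter for this kind of mode.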
@evanh said:
Vsync of 2 lines did the job too. Vsync placement can be anywhere that suits you. I placed it in the middle of blanking when blanking was large to fully utilise Roger's API limitations. Otherwise I used a front porch of 1 line.
The updated version I am now working with uses 16 bits per sync timing parameter and should allow way more vertical and horizontal blanking - far more than makes sense even.
@rogloh said:
Something that may ameliorate this restriction is that if you want to pixel double, it's more likely you can probably use HUB RAM given it takes less memory to hold a 320x240 frame for example. And if you want to use larger resolutions with PSRAM you probably don't need to double in the first place. Not perfect I know, but at least it helps a little. Also you can still line double with PSRAM IIRC, just not double the pixel widths - though this will create non-square pixels.
Double 320x240 16bpp buffers do fit in Hub RAM, but leave little space for anything else. The idea is to reclaim the second buffer by doing a sort of triple-buffer scheme with one hub buffer and two external ones that finished frames can be copied into.
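The memory budget behind this is simple arithmetic. A minimal sketch, assuming the P2's 512 KB of Hub RAM and ignoring everything else a program needs (code, stack, other buffers):

```python
HUB_RAM_BYTES = 512 * 1024            # P2 Hub RAM size
WIDTH, HEIGHT = 320, 240
BYTES_PER_PIXEL = 2                   # 16bpp

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
left_with_two_buffers = HUB_RAM_BYTES - 2 * frame_bytes
left_with_one_buffer = HUB_RAM_BYTES - frame_bytes

print("one frame       :", frame_bytes, "bytes")
print("free, 2 buffers :", left_with_two_buffers, "bytes")
print("free, 1 buffer  :", left_with_one_buffer, "bytes")
```

Keeping only one buffer in Hub RAM frees an extra 153600 bytes, which is the gain the triple-buffer idea is after.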
@Wuerfel_21 said:
Double 320x240 16bpp buffers do fit in Hub RAM, but leave little space for anything else. The idea is to reclaim the second buffer by doing a sort of triple-buffer scheme with one hub buffer and two external ones that finished frames can be copied into.
Yeah that could work out. You'd just have to manage the copying into HUBRAM from PSRAM yourself. Given you can do block copies in the background if you use my driver, that may not be too onerous, though it can burn one extra COG.
@rogloh said:
The updated version I am now working with uses 16 bits per sync timing parameter and should allow way more vertical and horizontal blanking - far more than makes sense even.
@Wuerfel_21 said:
Double 320x240 16bpp buffers do fit in Hub RAM, but leave little space for anything else. The idea is to reclaim the second buffer by doing a sort of triple-buffer scheme with one hub buffer and two external ones that finished frames can be copied into.
Yeah that could work out. You'd just have to manage the copying into HUBRAM from PSRAM yourself. Given you can do block copies in the background if you use my driver, that may not be too onerous, though it can burn one extra COG.
Block copy really doesn't take that long, so for this simple testbed/demo program it's fine to do it synchronously for now (currently runs at ~15 FPS). I swapped in some old version of my video driver that I already modified for PSRAM operation and that can scale the framebuffer just fine.
A video driver I use in my stuff (Basic interpreter, player) works like this:
in line #x:
- preload line #x+2 from the PSRAM to the hub ram cache. The cache keeps 4 lines.
- draw sprites on line #x+1
- stream the current line #x to the video output
I don't do the pixel doubling in the current version of the driver but having the line in the hub should make it easy - I have older versions of my drivers that have no sprites, but can multiply pixels vertically and horizontally (x1,2,4,8)
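The per-line schedule described above can be modelled as a small ring of cache slots. This is only a sketch of the idea: mapping scanlines to slots with a modulo is an assumption, and the function name is made up.

```python
CACHE_LINES = 4   # the cache keeps 4 lines

def slots_for(line):
    """Cache slots touched while scanline `line` is on screen.

    Returns (stream_slot, sprite_slot, preload_slot) for lines x, x+1 and x+2.
    """
    stream_slot = line % CACHE_LINES
    sprite_slot = (line + 1) % CACHE_LINES
    preload_slot = (line + 2) % CACHE_LINES
    return stream_slot, sprite_slot, preload_slot

for line in range(6):
    stream, sprites, preload = slots_for(line)
    print("line %d: stream slot %d, sprites into slot %d, preload into slot %d"
          % (line, stream, sprites, preload))
```

The three activities never share a slot on the same line, and the slot streamed on line x is the one preloaded two lines earlier and given sprites one line earlier.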
I also spent time on this today and experimentally added in the special burst gate blanking PAL uses (Bruch blanking) in the hope that might have helped out with displays that don't like seeing colour bursts on incorrect lines, but still no improvement so far unfortunately. I'm at a loss as to what can actually improve this more. It seems like there is still something wrong but I can't tell what when looking on the scope.
See this link for details in the useful Video Demystified book online:
https://archive.org/details/video-demystified-5th-edition/page/n305/mode/2up (page 306 via browser, real page 286 in book)
and more details here:
http://www.bbceng.info/additions/2019/ETD Books/TV Signal Coding - Supplemenary information - ETD Training Book.pdf
This webpage is worth a read:
The 625/50 PAL Video Signal and TV Compatible Graphics Modes
Ought to be correct, but test it if unsure.
I wonder when you're actually supposed to use which. Where does 480p "EDTV" fall? What decode matrices are TVs using de facto?
I think SDTV/EDTV analog progressive component video like 480p/576p uses BT601 and HDTV resolutions like 720p/1080i/p use the Rec709 one. However it's somewhat confusing. In fact after DVB-T DTV was initially released in Australia around 2001 some channel (7?) that broadcast 576p actually had the gall to call it "HD" so they didn't have to invest in more expensive equipment at the time proper HDTV came in. What a joke that was.
That link to that Demystified book I posted above had more details listed in Chapters 3 and 5.
@rogloh
According to the BBC TV Signal Coding document, there should be diagonal lines moving slowly up or down when using an odd number of quarter cycles, which is true when fcsc = 283.75 * fline.
Does NTSC composite have less dot crawl than PAL? Just wondering what happens if you keep sysclk = 283.75 MHz and fcsc = sysclk / 64 and adjust fline a little:
Dot crawl might be reduced and replaced by vertical colour banding for the last example.
Hi @TonyB_
I tested your other two variants with 283.5 and 284.0 as the divisors. 283.5 was probably quite a bit better than the 284.0 case. I'd have to say on first impression of using 283.5, it probably looks better than 283.75 on my LCD monitor - still to drag it over and take a look with the Plasma in the other room. I really need a good way to switch out these values dynamically or show two different outputs from two COGs side by side on two monitors to compare visually faster rather than recompute values, rebuild and reload etc and in the meantime forget what the last one looked like. But that will take too long to set up for now. I'll see if I can reload XFREQ from keypress or something...
Update: yeah on the Plasma the 283.5 setting looked quite a bit better than the original 283.75 setting. I just can't understand why the correct value of 283.75 is not working out for PAL timing on the P2. I think I'll keep this 283.5 value for now as the best yet.
One thing I'm sort of wondering is if I do go and start to tweak the XFREQ value slightly until I think it's best visually, whether that will only work "best" on my own P2 given its crystal frequency will be different to other people's P2s, or whether it will still be applicable on other systems.
For our reference, these are the values I used that Chip's timing generator code computes for the modified 283.5 colour cycles/line and a 283.75MHz clock (with 908 total pixels/line) for PAL 50Hz
ntsc.start(1, 1, 908, 720, 0, 288, 0, 283750000, @timing[0])
xfreq 107469456
cfreq 67109231
breeze = 13 burst = 32
horiz 30 67 91 activepixels =720
I calculate that xfreq = 107468869 is more accurate for fline = fcsc / 283.5.
18144 / 908 = 19.9823788546
2^31 / 19.9823788546 = 107468868.628
Round up to 107468869
Might not make any difference. Daft question, you are testing interlaced video?
EDIT:
xfreq for 283.75 = 107374183 = $666_6666 + 1
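The XFREQ arithmetic above can be reproduced with exact fractions, which avoids any floating-point rounding. This is only a sketch of the hand calculation, not of what the timing generator code does: the function name is made up, and the 64 clocks per colour cycle assumes fcsc = sysclk / 64 as discussed earlier.

```python
from fractions import Fraction
from math import ceil

def xfreq_for(cycles_per_line, pixels_per_line, clocks_per_cycle=64):
    """XFREQ = 2**31 / (sysclks per pixel), rounded up as suggested in the thread."""
    clocks_per_line = Fraction(cycles_per_line) * clocks_per_cycle
    clocks_per_pixel = clocks_per_line / pixels_per_line
    return ceil(Fraction(2**31) / clocks_per_pixel)

# 283.5 cycles/line: 18144 clocks per line over 908 pixels
print(xfreq_for(Fraction("283.5"), 908))    # 107468869
# 283.75 cycles/line: exactly 20 clocks per pixel
print(xfreq_for(Fraction("283.75"), 908))   # 107374183
```

Both results match the hand-calculated values, including the $666_6666 + 1 case.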
My P2 Eval B has been out of action for a while. It was set up to work with an old Windows PC but that or the Eval was giving off a nasty burning smell and smoke. I switched both off and haven't looked for the cause yet.
Hmm, yeah I've also noticed that Chip's calculations don't give the exact same results as something done manually using higher precision e.g. via a Numbers spreadsheet on Mac. Could be down to single precision floats in toolchain perhaps.
EDIT: actually looking at the code again, that doesn't make sense as the computation is done using MULDIV64 in SPIN2 and the frac operator. I wonder if some precision is getting lost there somehow.
Yes.
BTW I tried out @Wuerfel_21 's SDTV and HDTV component colourspace settings on my LCD and Plasma TV. RGBI colour swatches look reasonable on first impression so there's not some gross error apparent. I don't do colorimetry or have calibration gear etc so wouldn't know if it's accurate but it looks okay to me. Photo shows it as very grainy, looks far better in real life although I wasn't using a proper coax - this is 720p50 component HD.
I've also added a bunch of new stock resolution timings to the driver code. Not all are supported by my test gear here so YMMV. I did test 720p50, 720p60, 1080p24, 1080i50, 1080i60 and they worked on my plasma TV. Not sure progressive 1080p50 and 1080p60 are supported over component on my plasma but I should try that out too. Update: nope, nothing displayed, unless it needs a proper coax instead of the cheap old DVD component cables I used. I recall you weren't allowed to do 1080p50/60 over component with consumer gear. FTS.
Just wondering whether dot crawl for non-interlaced PAL composite is better, the same or worse than for interlaced.
From my recollection non-interlaced was always better than interlaced with PAL in terms of quality, but I've not worked on it for a couple of weeks. I believe it's looking much better in general now vs what I had before and I've revamped the default driver timings - of course it can still be tweaked manually if required. Right now I have some other stuff I want to do before Xmas so won't have time to look at it any more for now.
I am once again violently reminded of the lack of doubling for PSRAM source buffers
LOL, yeah that can't be done at the moment. In theory it might be doable with the other analog outputs if the code is changed to use the streamer rate change for pixel doubling and I'm at least moving in that direction.
To double DVI you'd need to have a three line scan buffer. One filling with PSRAM contents, one being doubled, and another being output. My driver uses a two scan line buffer.
Something that may ameliorate this restriction is that if you want to pixel double, it's more likely you can probably use HUB RAM given it takes less memory to hold a 320x240 frame for example. And if you want to use larger resolutions with PSRAM you probably don't need to double in the first place. Not perfect I know, but at least it helps a little. Also you can still line double with PSRAM IIRC, just not double the pixel widths - though this will create non-square pixels.
DVI/HDMI monitors are far more forgiving of wild timings. Try just outputting a 320x240 mode straight. It does depend on the particular monitor but many will happily scale up something very low res for you. They certainly can all know what you're putting out, just a question as to whether it wants to accept it or not.
Minimum dotclock is 25MHz still ... so blanking would need extended accordingly to retain 50 or 60 Hz refresh. I found Hsync fixed at 64 dots wide and centred in the blanking worked well.
Vsync of 2 lines did the job too. Vsync placement can be anywhere that suits you. I placed it in the middle of blanking when blanking was large to fully utilise Roger's API limitations. Otherwise I used a front porch of 1 line.
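To get a feel for how much blanking a 25 MHz dot clock implies for a 320x240 mode, here is a rough sketch. The total of 262 lines and the helper name are hypothetical choices for illustration, not values anyone above reported using.

```python
def htotal_for(dotclock_hz, refresh_hz, vtotal):
    """Pixels per line (active + blanking) needed to hit the refresh rate."""
    return round(dotclock_hz / (refresh_hz * vtotal))

DOTCLOCK = 25_000_000   # minimum dot clock mentioned above
H_ACTIVE = 320
VTOTAL = 262            # hypothetical: 240 active lines plus vertical blanking

htotal = htotal_for(DOTCLOCK, 60, VTOTAL)
hblank = htotal - H_ACTIVE
refresh = DOTCLOCK / (htotal * VTOTAL)

print(f"htotal = {htotal}, hblank = {hblank}, refresh = {refresh:.3f} Hz")
```

With these assumed numbers most of each line is blanking, which is why wide sync timing parameters matter for this kind of mode.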
The updated version I am now working with uses 16 bits per sync timing parameter and should allow way more vertical and horizontal blanking - far more than makes sense even.
Double 320x240 16bpp buffers do fit in Hub RAM, but leave little space for anything else. The idea is to reclaim the second buffer by doing a sort of triple-buffer scheme with one hub buffer and two external ones that finished frames can be copied into.
Yeah that could work out. You'd just have to manage the copying into HUBRAM from PSRAM yourself. Given you can do block copies in the background if you use my driver, that may not be too onerous, though it can burn one extra COG.
Gimme!
Block copy really doesn't take that long, so for this simple testbed/demo program it's fine to do it synchronously for now (currently runs at ~15 FPS). I swapped in some old version of my video driver that I already modified for PSRAM operation and that can scale the framebuffer just fine.
A video driver I use in my stuff (Basic interpreter, player) works like this:
- preload line #x+2 from the PSRAM to the hub ram cache. The cache keeps 4 lines.
- draw sprites on line #x+1
- stream the current line #x to the video output
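The three stages above can be sketched as slot indices into the 4-line cache. Mapping a line to slot `line % 4` is an assumption made for this illustration; the actual driver may index its cache differently.

```python
CACHE_LINES = 4  # the cache keeps 4 lines, per the description above

def slots_for(x):
    """Cache slots touched while scanline x is on screen (assumed line % 4 mapping)."""
    return {
        "stream": x % CACHE_LINES,          # line x goes to the video output
        "sprites": (x + 1) % CACHE_LINES,   # sprites are drawn on line x+1
        "preload": (x + 2) % CACHE_LINES,   # line x+2 arrives from PSRAM
    }

for x in range(4):
    print(x, slots_for(x))
```

Under that assumption the three stages never touch the same slot at once, and one slot is always spare.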
I don't do the pixel doubling in the current version of the driver but having the line in the hub should make it easy - I have older versions of my drivers that have no sprites, but can multiply pixels vertically and horizontally (x1,2,4,8)