[FYI] waitvid setup timing & Co (updated 20111119)



kuroneko
11-05-2010, 06:49 AM
This week I had some time to spend and looked into the issue raised in the first thread. Using waitvid should just work, running it at its minimum cycle time or not. So I grabbed my test from the second thread and modified it as follows:

run PLL at 80MHz
setup VGA output, frame rate at 4 times pixel rate
keep the timing code but additionally sample the VGA pins after the final waitvid

To have some room for the delay code I picked a frame length of 48. The pixel pattern generated by the final waitvid is $AA_AA_AA_AA. Here are the results:

f/i/o: 48 11 00AA0000 -7
f/i/o: 48 10 00AA0000 -6
f/i/o: 48 9 00AA0000 -5
f/i/o: 48 8 00AA0000 -4
f/i/o: 48 7 00AA0000 -3
f/i/o: 48 6 00EC0000 -2
f/i/o: 48 5 003F0000 -1
f/i/o: 48 4 00EC0000 +0
f/i/o: 49 4 003F0000 +1
f/i/o: 96 50 00AA0000 +2
f/i/o: 96 49 00AA0000 +3
The numeric columns are the frame length in cycles, the cycles spent in waitvid, the bit pattern obtained from ina immediately after waitvid and finally the (theoretical) distance to the minimum cycle count. The last two lines have missed the deadline, therefore we measure two frames. The blue line (frame length 49, distance +1) is effectively one cycle late but is still accepted as 4, with the added side effect of increasing the frame length by one [C]. For this exercise it can be ignored.

Of interest are the 3 red coloured lines (distances -2, -1 and +0). They produce garbage, which suggests that waitvid should be given at least 3 extra cycles to safely provide the pixel/colour information to the h/w. I'd like a second opinion on this one.
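
As a quick cross-check, the result rows can be screened mechanically. The following is only a sketch in Python (it is not part of the original test program, and the helper names are made up for illustration). The rows are copied from the table above, and any sample other than 00AA0000, the value seen in every row with ample setup time, is treated as garbage.

```python
# Rows copied from the result table above:
# frame length, cycles in waitvid, ina sample (hex), distance to minimum.
ROWS = """\
48 11 00AA0000 -7
48 10 00AA0000 -6
48 9 00AA0000 -5
48 8 00AA0000 -4
48 7 00AA0000 -3
48 6 00EC0000 -2
48 5 003F0000 -1
48 4 00EC0000 +0
49 4 003F0000 +1
96 50 00AA0000 +2
96 49 00AA0000 +3
"""

GOOD = 0x00AA0000  # sample seen in every row with ample setup time


def parse(text):
    """Turn the table text into (frame, cycles, sample, distance) tuples."""
    rows = []
    for line in text.splitlines():
        frame, cycles, sample, dist = line.split()
        rows.append((int(frame), int(cycles), int(sample, 16), int(dist)))
    return rows


rows = parse(ROWS)

# Rows whose sample differs from the expected pattern.
garbage = [r for r in rows if r[2] != GOOD]

# Shortest waitvid that met the deadline (single 48-cycle frame) and was clean.
min_clean = min(c for f, c, s, d in rows if f == 48 and s == GOOD)

for frame, cycles, sample, dist in garbage:
    print(f"garbage: frame {frame}, waitvid {cycles} cycles, distance {dist:+d}")
print("shortest clean on-time waitvid:", min_clean, "cycles")
```

Run as is, it flags the rows at distances -2, -1 and +0 plus the late +1 row, and reports 7 cycles as the shortest clean case, i.e. the 4 cycle minimum plus 3 cycles of setup.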

Update 20110127: It appears that certain PLL settings (e.g. 40MHz or 128MHz [A]) can get away with a 2 cycle setup (6 total). The 3+4 cycle requirement so far only holds true for fPLL == clkfreq which should be seen as the worst case scenario.

Update 20110303: Did some more tests re: waitvid-less operations. Turns out 2 cycles are not always enough. Period. With a modified test program which runs the sequence once per key press I got the following sequence for 128MHz PLL. Similar patterns turn up for other non-1:1 ratios. Note that this is all about asynchronous hand-off. A 1:1 ratio isn't in any way special, it just shows less 6-cycle-success than other ratios.

f/i/o: 60 8 00550000 00550000 00AA0000 2F2EF175
f/i/o: 60 7 00550000 00550000 00AA0000 2F364E66
f/i/o: 60 6 00550000 00550000 00AA0000 2F3DA934
f/i/o: 60 5 00F40000 00F40000 00910000 2F450622
f/i/o: 60 4 00F40000 00F40000 003F0000 2F4C5E91

f/i/o: 60 8 00550000 00550000 00AA0000 3A0FED99
f/i/o: 60 7 00550000 00550000 00AA0000 3A174CC8
f/i/o: 60 6 00550000 00550000 00AA0000 3A1EA9A8
f/i/o: 60 5 00910000 00910000 003F0000 3A2608D8
f/i/o: 60 4 00910000 00910000 00F40000 3A2D6149

f/i/o: 60 8 00550000 00550000 00AA0000 40DE2116
f/i/o: 60 7 00550000 00550000 00AA0000 40E57BC6
f/i/o: 60 6 003F0000 003F0000 00EC0000 40ECD8B7
f/i/o: 60 5 003F0000 003F0000 00F40000 40F43C85
f/i/o: 60 4 00F40000 00F40000 00EC0000 40FB9973

f/i/o: 60 8 00550000 00550000 00AA0000 4FF75FBF
f/i/o: 60 7 00550000 00550000 00AA0000 4FFEC12E
f/i/o: 60 6 00550000 00550000 00AA0000 5006229C
f/i/o: 60 5 00910000 00910000 00F40000 500D7B0B
f/i/o: 60 4 00910000 00910000 00910000 5014D5AB

f/i/o: 60 8 00550000 00550000 00AA0000 586EB5C6
f/i/o: 60 7 00550000 00550000 00AA0000 587612A6
f/i/o: 60 6 003F0000 003F0000 003F0000 587D6B17
f/i/o: 60 5 003F0000 003F0000 00910000 5884C825
f/i/o: 60 4 00F40000 00F40000 003F0000 588C2093
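
The five runs above can be tallied per waitvid length. This is a rough Python sketch, not the original test program. It rests on one assumption that the post does not state: a row counts as good when its three sampled values match those of the 8-cycle row of the same run. The last column is ignored, and the function name is made up for illustration.

```python
from collections import Counter

# Rows copied from the five runs above (128MHz PLL).
RUNS = """\
60 8 00550000 00550000 00AA0000 2F2EF175
60 7 00550000 00550000 00AA0000 2F364E66
60 6 00550000 00550000 00AA0000 2F3DA934
60 5 00F40000 00F40000 00910000 2F450622
60 4 00F40000 00F40000 003F0000 2F4C5E91
60 8 00550000 00550000 00AA0000 3A0FED99
60 7 00550000 00550000 00AA0000 3A174CC8
60 6 00550000 00550000 00AA0000 3A1EA9A8
60 5 00910000 00910000 003F0000 3A2608D8
60 4 00910000 00910000 00F40000 3A2D6149
60 8 00550000 00550000 00AA0000 40DE2116
60 7 00550000 00550000 00AA0000 40E57BC6
60 6 003F0000 003F0000 00EC0000 40ECD8B7
60 5 003F0000 003F0000 00F40000 40F43C85
60 4 00F40000 00F40000 00EC0000 40FB9973
60 8 00550000 00550000 00AA0000 4FF75FBF
60 7 00550000 00550000 00AA0000 4FFEC12E
60 6 00550000 00550000 00AA0000 5006229C
60 5 00910000 00910000 00F40000 500D7B0B
60 4 00910000 00910000 00910000 5014D5AB
60 8 00550000 00550000 00AA0000 586EB5C6
60 7 00550000 00550000 00AA0000 587612A6
60 6 003F0000 003F0000 003F0000 587D6B17
60 5 003F0000 003F0000 00910000 5884C825
60 4 00F40000 00F40000 003F0000 588C2093
"""


def tally(text):
    """Count, per waitvid length, how many runs match their 8-cycle baseline."""
    passes, total = Counter(), Counter()
    reference = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # tolerate blank separators between runs
        cycles, samples = int(fields[1]), tuple(fields[2:5])
        if cycles == 8:
            reference = samples  # first row of each run is the baseline
            continue
        total[cycles] += 1
        passes[cycles] += int(samples == reference)
    return passes, total


passes, total = tally(RUNS)
for cycles in sorted(total, reverse=True):
    print(f"waitvid {cycles} cycles: {passes[cycles]} of {total[cycles]} runs clean")
```

Under that assumption 7 cycles are clean in all five runs, 6 cycles only in three of them, and 5 or 4 cycles in none.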
Furthermore, when using PLL == clkfreq the hand-off point is predictable and easily sync'd to the hub window [B], which voids the requirement for using waitvid in the first place. For example, a well placed 8-cycle waitvid colour, pixel (between two hub ops) can simply be replaced by a cmp colour, pixel, which gives you an extra 4 cycles before the next hub op.

Update 20110305: Looking at the pixel pattern for a given number of ratios (1:1, 1:2 .. 1:32) it turns out that the first pixel is output starting one frame clock cycle after a common reference point. That reference point marks the hand-off and is located at the 4th to last cycle of the waitvid. Based on the frame stretch effect this unblock signal may well reach into the 3rd to last cycle but this is not significant (hand-off miss for minimum timing). That, coupled with the fact that full input latch is only achieved during e-phase (SDeR), leads me to the conclusion that 7 cycles are the safe minimum to use waitvid (if it has to be used). As for 6 cycles working most of the time, in this case input latch (active edge) and hand-off are both located in the 4th to last cycle. It only takes a few nanoseconds between system and frame clock to make this work (contributed by PLL, clock divider, internal delays etc). Unfortunately it's not 100% reliable.

Update 20110404: Coined the acronym WHOP (waitvid hand-off point). Bill's driver is working now (I got a 2nd hand VGA monitor to do video related stuff).

Update 20110408: Added sample code for cog frame counter synchronisation.

CON
  zero = $1F0                                   ' par

  vref = 12
  user = 47

DAT             org     0

entry           rdlong  wait, #0                ' clkfreq
                shr     wait, #10               ' ~1ms (%%)

                rdlong  cref, par               ' initial sync target
                waitcnt cref, wait              ' sync cogs

                movi    ctra, #%0_00001_111     ' PLL, VCO/1
                movi    frqa, #%0001_00000      ' 5MHz * 16 / 1 = 80MHz

                mov     vscl, #vref             ' 12 rather than the 4K default

                movd    vcfg, #2                ' pin group
                movs    vcfg, #%0_11111111      ' pins
                movi    vcfg, #%0_01_0_00_000   ' VGA, 32 pixels

' vref = 12 (this places the adjustment WHOP in the first cycle of the user vscl transfer),
' the extra +2 advance for cref is to adjust the WHOP for the first user waitvid so that we
' don't violate its 3 cycle setup time.

                waitcnt cref, #vref*3+2         ' PLL settled (%%)
                waitvid zero, #0                ' catch frame stretch
                waitvid zero, #0                ' proper sync
                sub     cref, cnt
                mov     vscl, cref              ' transfer adjustment
                mov     vscl, #user             ' transfer user value
                waitvid colour, pixels          ' latch user value

...
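
The frequency and delay comments in the listing can be checked with a little arithmetic. The Python below is only a sketch: it assumes an 80MHz system clock (the standard setup) and relies on the documented Propeller behaviour that movi writes bits 31..23, that the counter PLL multiplies the NCO rate by 16, and that PLLDIV %111 selects VCO/1. The variable names are mine.

```python
CLKFREQ = 80_000_000  # assumed system clock (standard 80MHz setup)

# movi writes its 9-bit source into bits 31..23 of the destination register.
frqa = 0b0001_00000 << 23          # value loaded by: movi frqa, #%0001_00000
ctra_field = 0b0_00001_111         # value loaded by: movi ctra, #%0_00001_111
plldiv = ctra_field & 0b111        # low three bits of the field are PLLDIV

nco_hz = CLKFREQ * frqa // 2**32   # NCO rate = clkfreq * frqa / 2^32
vco_hz = nco_hz * 16               # counter PLL multiplies the NCO rate by 16
pll_hz = vco_hz >> (7 - plldiv)    # PLLDIV: %111 = VCO/1 ... %000 = VCO/128

# shr wait, #10 turns clkfreq into a delay of roughly one millisecond.
wait_cycles = CLKFREQ >> 10
wait_ms = wait_cycles * 1000 / CLKFREQ

print(f"frqa = ${frqa:08X}, NCO = {nco_hz} Hz, PLL out = {pll_hz} Hz")
print(f"wait = {wait_cycles} cycles = {wait_ms} ms")
```

With these assumptions frqa comes out as $10000000, the NCO runs at 5MHz and the PLL output at 80MHz, matching the comment in the listing. The initial delay is 78125 cycles, a little under 1ms.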

Update 20110409: The code above - while working in general - suffers from jitter as it relies on constant waitvid intervals. While it works for certain NCO setups (frqa == power of 2) it doesn't for all, so we need a better solution. It is listed below:

CON
  user = 47

DAT             org     0

entry           rdlong  wait, #0                ' clkfreq
                shr     wait, #10               ' ~1ms (%%)

                rdlong  cref, par               ' initial sync target
                waitcnt cref, wait              ' sync cogs

                movi    ctra, #%0_00001_111     ' PLL, VCO/1
                movi    frqa, #%0001_00000      ' 5MHz * 16 / 1 = 80MHz

                mov     vscl, #1                ' reload as fast as possible

                movd    vcfg, #2                ' pin group
                movs    vcfg, #%0_11111111      ' pins
                movi    vcfg, #%0_01_0_00_000   ' VGA, 32 pixels

                waitcnt cref, #0                ' PLL settled (%%)
                                                ' frame counter flushed
                ror     vcfg, #1                ' freeze video h/w
                mov     vscl, #user             ' transfer user value
                rol     vcfg, #1                ' unfreeze
                nop                             ' get some distance
                waitvid colour, pixels          ' latch user value
To overcome the jitter issue we make sure the initial (unknown) frame counter value is flushed and then loaded with 1 (fastest reload). Then we freeze the video h/w, set the initial user value and go from there.

References

1. Waitvid glitch - request for more timing information.
2. [resolved] waitvid minimum timing
3. Hanno's IODreamKit - DreamKit:TNG Discussion Thread
4. 50 column VGA text driver in one cog - 60/50/40/30/25/20...rows!


[A] standard 80MHz setup
[B] it is also predictable for e.g. a 1:2 ratio but not as easily shifted in reference to the hub window (see reference 4)
[C] this is only true as far as cycle measurement is concerned, the video h/w generates a frame according to vscl