Jeff reported that the issue disappeared, which is probably as close as we can get to a solution (I did this as a blind fix, as I don't have a VGA monitor available).
This is based on the driver in the package dated Sunday, November 21, 2010 at 7:52:15 PM.
The single res goes at the end of the file.
blank_hsync     mov     vscl,_hf                'hsync, do invisible front porch pixels
                mov     vscl,_hs                'do invisible sync pixels
                neg     href, cnt               ' waitvid reference ...
                cogid   $ nr
                add     href, cnt               ' ... relative to hub window
                sub     href, #(76 + 42) * 4    ' pixels start in (_hs+_hb)*4
                shr     href, #2                ' delta in frame clocks
                and     href, #%11              ' limit to 0..3
                add     href, _hb               ' add base -> _hb + 0..3
                mov     vscl, href              'do invisible back porch pixels
href res 1
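For anyone who wants to check the arithmetic off-chip, here is a rough Python model of the sub/shr/and/add steps. It is only a sketch: `href_from_delta` and its `hb` parameter are names made up for this illustration, `delta` stands for whatever cnt difference the neg/add pair leaves in href, and I'm assuming 32-bit wrap-around and a logical shift for shr.

```python
MASK32 = 0xFFFFFFFF  # cog registers are 32 bits wide

def href_from_delta(delta, hb):
    """Hypothetical model of the sub/shr/and/add sequence on href."""
    href = delta & MASK32
    href = (href - (76 + 42) * 4) & MASK32  # sub href, #(76 + 42) * 4
    href >>= 2                              # shr href, #2 (logical shift)
    href &= 0b11                            # and href, #%11 -> 0..3
    return (href + hb) & MASK32             # add href, _hb

# hb = 0 is a placeholder so only the 0..3 adjustment is printed.
for delta in range(472, 492, 4):
    print(delta, href_from_delta(delta, hb=0))
```

With hb = 0 the result steps through 0, 1, 2, 3 and wraps back to 0 for every 4 cycles of delta, which is the 0..3 adjustment that ends up on top of _hb.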
Pixels are latched between two hub ops which are in fact two slots apart (32 cycles), meaning the waitvid has 16 cycles of leeway in there somewhere. The code above makes sure that the waitvid stays in the middle of this window [A]. Previously there was a condition where you got a non-blocking waitvid, which means the pixel info was taken from the or instruction just before the waitvid. In this case the active value for pixels is $03030303 (coloror), which means colour 3 is used once and colour 0 three times. When you look at the relevant image you'll find that this is exactly what happens: two colours with a 1:3 ratio.
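The 1:3 ratio can be checked by unpacking the value by hand. The snippet below is a small Python sketch, assuming 2-bit pixels shifted out LSB first (four-colour mode); `decode_pixels` is a helper name invented for this example.

```python
from collections import Counter

def decode_pixels(value, bits_per_pixel=2, width=32):
    """Split a pixel long into colour indices, lowest bits first (assumed order)."""
    mask = (1 << bits_per_pixel) - 1
    return [(value >> shift) & mask for shift in range(0, width, bits_per_pixel)]

pixels = decode_pixels(0x03030303)
counts = Counter(pixels)

print(pixels)   # colour index per pixel, in output order
print(counts)   # how often each colour index occurs
```

Under that assumption every group of four pixels comes out as 3, 0, 0, 0, so colour 3 covers a quarter of the pixels and colour 0 the rest.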
[A] Or rather, it finishes 8..11 cycles before the next rdlong, which also means the unrolled loop could in fact be a loop, as add only consumes 4 cycles of the gap.