Chip's scancode shenanigans
escher
in Propeller 1
In Chip's VGA high res text driver, he builds a routine to perform the shifting and waitviding of the line buffer into a region of memory labeled "scancode", then jmps to it when necessary for execution. What I don't understand is how this is faster... you still have to take each 32-bit operation and fetch, decode, execute, etc. So besides saving room by putting it all into memory space instead of explicitly defining the entire unrolled loop, I don't see the benefit:
' Build scanbuff display routine into scancode

:waitvid                mov     scancode+0,i0           'org     scancode
:shr                    mov     scancode+1,i1           'waitvid color,scanbuff+0
                        add     :waitvid,d1             'shr     scanbuff+0,#8
                        add     :shr,d1                 'waitvid color,scanbuff+1
                        add     i0,#1                   'shr     scanbuff+1,#8
                        add     i1,d0                   '...
                        djnz    scan_ctr,#:waitvid      'waitvid color,scanbuff+cols-1

                        mov     scancode+cols*2-1,i2    'mov     vscl,#hf
                        mov     scancode+cols*2+0,i3    'waitvid hvsync,#0
                        mov     scancode+cols*2+1,i4    'jmp     #scanret
What am I missing here?
Comments
On a better assembler with for-loop macros, a macro would generate all of this at compile time. But since Spin/PASM has no macros, it has to be done at run time.
Big savings, small driver HUB footprint.
It's all about the initial HUB image. Keeping that small can matter.
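To make the size argument concrete, here is a rough Python model (not PASM, and not taken from the driver) of the instruction sequence that the builder loop leaves in scancode, reconstructed from the right-hand comments in the listing above. The function name unrolled_scancode and the sample value of cols are made up for illustration.

```python
def unrolled_scancode(cols):
    """Model the instruction text the builder loop writes into scancode.

    Reconstructed from the comments in the posted listing; 'cols' is the
    number of waitvid/shr pairs (a hypothetical parameter here).
    """
    lines = []
    for n in range(cols):
        lines.append(f"waitvid color,scanbuff+{n}")
        lines.append(f"shr     scanbuff+{n},#8")

    # The builder then overwrites slot cols*2-1 (the final shr) and
    # appends two more instructions after it.
    lines[cols * 2 - 1] = "mov     vscl,#hf"
    lines.append("waitvid hvsync,#0")
    lines.append("jmp     #scanret")
    return lines


if __name__ == "__main__":
    # Small illustrative column count so the output stays readable.
    for line in unrolled_scancode(4):
        print(line)
```

Written out explicitly in the source, this routine would occupy cols*2+2 longs of the hub image, whereas the builder shown in the question is ten instructions (plus whatever constants it references, which are not shown in the excerpt) regardless of cols.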
Correct me if I'm wrong here, but wouldn't performing the scancode routine in a closed loop only add 4 cycles (for a successful DJNZ jump) per iteration? Is the insinuation here that that 5 nanoseconds means the difference between an on-time and de-synced waitvid?
You can't just add the djnz - you also have to insert an add to increment the pointer in the waitvid, and another add to increment the pointer in the shr that shifts the font data to the next line of the font. That adds up to 12 cycles/iteration.
A resolution of 1024x768 at 60 Hz has a pixel rate of 60 MHz according to Chip's VGA_HighRes_Text.spin. 80 MHz / ((60M pixels/second) / (16 pixels/waitvid)) ≈ 21 ticks/waitvid.
A waitvid and a shr take 10 cycles, which leaves 11 cycles to spare. The 12 cycles for the djnz and the two adds won't fit.
Well there you go, the (somehow) dropped factor of 10 was throwing me off; thanks for the analysis!
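For anyone who wants to re-run the arithmetic from this thread, here is a small Python sketch. The figures (80 MHz system clock, 60 MHz pixel rate, 16 pixels per waitvid, 10 cycles for the waitvid/shr pair, 4 cycles per added instruction) are the ones quoted in the replies above, not measurements, and the function names are made up for illustration.

```python
SYSCLK_HZ = 80_000_000        # system clock quoted in the thread
PIXEL_HZ = 60_000_000         # pixel rate quoted in the thread
PIXELS_PER_WAITVID = 16       # pixels per waitvid quoted in the thread
UNROLLED_PAIR_CYCLES = 10     # waitvid + shr, as stated in the reply
CYCLES_PER_INSTRUCTION = 4    # add / successful djnz, as stated in the thread


def cycles_per_waitvid(sysclk_hz=SYSCLK_HZ, pixel_hz=PIXEL_HZ,
                       pixels_per_waitvid=PIXELS_PER_WAITVID):
    """System clock cycles available between consecutive waitvids."""
    return sysclk_hz * pixels_per_waitvid / pixel_hz


def loop_fits(extra_instructions):
    """True if waitvid/shr plus the extra loop instructions fit the budget."""
    needed = UNROLLED_PAIR_CYCLES + extra_instructions * CYCLES_PER_INSTRUCTION
    return needed <= cycles_per_waitvid()


def cycles_to_ns(cycles, sysclk_hz=SYSCLK_HZ):
    """Convert a cycle count to nanoseconds at the given clock."""
    return cycles * 1_000_000_000 / sysclk_hz


if __name__ == "__main__":
    print("budget per waitvid:", cycles_per_waitvid())
    print("djnz only fits:", loop_fits(1))
    print("djnz + two adds fits:", loop_fits(3))
    print("4 cycles in ns:", cycles_to_ns(4))
```

Under these assumptions the djnz alone would fit, but the djnz plus the two pointer-increment adds brings the loop body to 22 cycles against a budget of about 21, and a 4-cycle instruction at 80 MHz costs 50 ns rather than 5 ns.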