Shifting consecutive LONGs
escher
Posts: 138
I have n consecutive LONGs in Cog RAM, representing a bitmap scan line, which I want to shift en masse left by some amount in order to implement pixel-fine scrolling.
Each pixel is represented by 8 bits, and there are 320 pixels visible per scan line, so at 4 pixels per LONG (32/8=4) I have a buffer of 80 LONGs (320/4=80).
The issue is that due to the long-aligned memory model, I can only "scroll" (i.e. shift left or right) 1 LONG or 4 pixels at a time trivially.
To scroll 8 bits at a time, as far as I can tell, would require instruction-intensive offset calculation, shifting, and masking in order to move all pixels left or right by an arbitrary amount (i.e. not a multiple of 4 pixels).
Is there any efficient way via PASM or otherwise to shift all longs from 0 to n by 3, 2, or 1 pixels (i.e. 24, 16, or 8 bits)?
Comments
The simplest approach would be to read longs from the hub, then shift and write them back as bytes. That could possibly shift 4 pixels in 5 or 6 hub cycles.
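To make the read-long / write-bytes idea concrete, here is a rough model in Python rather than PASM. It is only a sketch under assumptions: the bytearray stands in for hub RAM, the function name and parameters are made up for illustration, and pixel 0 of each long is assumed to sit in the lowest-addressed (least significant) byte. Each long is read once and its four bytes are written back xs byte addresses lower, which works because hub RAM is byte-addressable.

```python
def scroll_left_bytes(hub, base, n_longs, xs):
    """Scroll a scan line left by xs pixels (1..3), in place.

    hub     -- bytearray standing in for hub RAM (hypothetical model)
    base    -- byte address of the first long of the scan line
    n_longs -- number of longs in the scan line
    xs      -- pixels (bytes) to scroll left
    """
    if not 1 <= xs <= 3:
        raise ValueError("xs must be 1, 2 or 3")
    for i in range(n_longs):
        addr = base + 4 * i
        # One long read (what rdlong would do), little-endian.
        value = int.from_bytes(hub[addr:addr + 4], "little")
        for b in range(4):
            dst = addr + b - xs
            if dst >= base:            # pixels scrolled off the left edge are dropped
                hub[dst] = value & 0xFF  # one byte write (what wrbyte would do)
            value >>= 8
    return hub
```

Doing it in place is safe here because every write lands below the data that has not been read yet. The last xs bytes of the line keep their old contents, so something still has to fill them with the incoming pixels.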
The alternative would be to write the PASM code so that you are interleaving reading and writing longs from/to the hub with rearranging the data in the cog.
I don't know which would be more efficient without writing the code and analyzing the timing.
That's what I'm using in my game console (full code on the repository):
sbuf is the scanline buffer, xs is the number of pixels to scroll.
There are a few tricks here. The tile rendering must be aligned to 8 pixels (2 longs), so if it needs to scroll 4 pixels it just copies one long over the other; for any other amount a shift / or is done. The coarse scrolling (in 8-pixel steps) is done by the tile rendering code. To take into account the additional pixels on the left or right side (if your visible resolution is actually 320 pixels), the rendering code renders one additional tile so there are no blanks (you always need some extra tiles for scrolling, unless you display fewer pixels on both sides). The scanline buffer is also a couple of longs larger to accommodate the additional pixels.
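For anyone following along, the shift / or step can be sketched like this. This is a Python model, not the PASM from the repository, and it rests on assumptions: the function name is made up, sbuf is modeled as a list of 32-bit integers, and pixel 0 of each long is taken to be the least significant byte. Scrolling left by xs pixels then means shifting each long right by 8*xs bits and or-ing in the low pixels of the next long at the top.

```python
MASK32 = 0xFFFFFFFF

def scroll_left_longs(sbuf, xs):
    """Return sbuf scrolled left by xs pixels (0..3), 8 bits per pixel.

    Assumes pixel 0 of each long is in the least significant byte.
    The last long is padded with zero pixels.
    """
    if not 0 <= xs <= 3:
        raise ValueError("xs must be in 0..3")
    if xs == 0:
        return list(sbuf)
    shift = 8 * xs
    out = []
    for i, cur in enumerate(sbuf):
        nxt = sbuf[i + 1] if i + 1 < len(sbuf) else 0
        # Keep the upper pixels of this long, pull in the lower pixels of the next.
        out.append(((cur >> shift) | (nxt << (32 - shift))) & MASK32)
    return out

line = [0x03020100, 0x07060504, 0x0B0A0908]
shifted = scroll_left_longs(line, 1)
```

This only covers the sub-long part (1 to 3 pixels). A scroll of 4 pixels or more is a whole-long move, which matches the "copies one long over the other" case described above. In PASM the inner step would be roughly two shifts and an or per long.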
This loop takes a lot of time and I'm sure there is a more efficient way to do that, like shifting each pixel into a long variable and storing it in the scanline buffer once 4 pixels are available. I never had the time to fully explore this approach.
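A rough idea of what that alternative could look like, again as a Python model and not tested PASM. The function name and the pixel list are hypothetical, and pixel 0 is assumed to go in the least significant byte of each long. The renderer skips the first xs pixels, packs the rest into an accumulator one at a time, and stores a long whenever 4 pixels have been collected, so no second pass over the buffer is needed.

```python
def pack_pixels(pixels, xs):
    """Pack 8-bit pixels into longs, 4 per long, skipping the first xs pixels.

    Assumes pixel 0 of each long goes in the least significant byte.
    A partial final long is padded with zero pixels.
    """
    longs = []
    acc = 0      # accumulator long being filled
    count = 0    # pixels currently in the accumulator
    for p in pixels[xs:]:
        acc |= (p & 0xFF) << (8 * count)
        count += 1
        if count == 4:          # long is full, store it in the scanline buffer
            longs.append(acc)
            acc = 0
            count = 0
    if count:
        longs.append(acc)
    return longs
```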
I'm pretty sure that's how they achieved parallax on the NES.