Shifting consecutive LONGs

escher · 2018-03-06 01:35

I have n consecutive LONGs in Cog RAM, representing a bitmap scan line, which I want to shift en masse left by some amount in order to implement pixel-fine scrolling.

Each pixel is represented by 8 bits, and there are 320 pixels visible per scan line, so at 4 pixels per LONG (32/8=4) I have a buffer of 80 LONGs.

The issue is that due to the long-aligned memory model, I can only "scroll" (i.e. shift left or right) 1 LONG or 4 pixels at a time trivially.
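For comparison, the "trivial" whole-LONG case just moves words. This is a minimal C sketch (a model, not PASM) with a hypothetical function name. It assumes the scan line is an array of 32-bit words and that the words vacated on the right are simply cleared:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Coarse scroll: shift a scan line left by k whole LONGs (4 pixels each).
 * Hypothetical helper; vacated words on the right are cleared to 0. */
void scroll_left_longs(uint32_t *line, size_t n, size_t k)
{
    assert(k <= n);
    memmove(line, line + k, (n - k) * sizeof line[0]); /* overlap-safe word copy */
    memset(line + (n - k), 0, k * sizeof line[0]);     /* blank the tail */
}
```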

As far as I can tell, scrolling 8 bits at a time would require instruction-intensive offset calculation, shifting, masking and so on in order to move all pixels left or right by arbitrary amounts (i.e. not multiples of 4 pixels).
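To make that cost concrete, here is a C sketch of the per-pixel approach. The helper names are hypothetical, and it assumes pixel i lives in byte i % 4 of word i / 4, lowest byte first (leftmost pixel in the low byte). Every pixel moved costs an index calculation, a shift and a read-modify-write mask:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read pixel i from a LONG-packed line (assumed: lowest byte = leftmost pixel). */
uint8_t get_pixel(const uint32_t *line, size_t i)
{
    return (uint8_t)(line[i / 4] >> ((i % 4) * 8));
}

/* Write pixel i: clear its byte with a mask, then OR in the new value. */
void set_pixel(uint32_t *line, size_t i, uint8_t v)
{
    unsigned shift = (unsigned)(i % 4) * 8;
    line[i / 4] = (line[i / 4] & ~(UINT32_C(0xFF) << shift))
                | ((uint32_t)v << shift);
}

/* Scroll left by px pixels, one pixel at a time; vacated pixels become 0. */
void scroll_left_pixels(uint32_t *line, size_t npix, size_t px)
{
    assert(px <= npix);
    for (size_t i = 0; i + px < npix; i++)
        set_pixel(line, i, get_pixel(line, i + px));
    for (size_t i = npix - px; i < npix; i++)
        set_pixel(line, i, 0);
}
```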

Is there any efficient way via PASM or otherwise to shift all longs from 0 to n by 3, 2, or 1 pixels (i.e. 24, 16, or 8 bits)?

kwinn · 2018-03-06 04:54

I can come up with 2 methods of doing that.

The simplest would be to read longs from hub RAM, then shift them and write them back as bytes. That could possibly shift 4 pixels in 5 or 6 hub cycles.
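A C model of that first method follows, with hypothetical names. It assumes the hub is little-endian, so byte 0 of each LONG is its leftmost pixel. Each group of 4 pixels costs one long read plus four byte writes, which is presumably where the 5 or 6 hub cycle estimate comes from:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of "read longs, write back as bytes": each LONG is read once and its
 * four bytes are written individually, px byte positions earlier.
 * Bytes shifted off the left edge are dropped; the last px bytes of dst
 * are left untouched. */
void shift_via_byte_writes(const uint32_t *src, size_t nlongs,
                           uint8_t *dst, size_t px)
{
    assert(px <= nlongs * 4);
    for (size_t w = 0; w < nlongs; w++) {
        uint32_t v = src[w];                      /* one long read */
        for (unsigned b = 0; b < 4; b++) {        /* four byte writes */
            size_t pos = w * 4 + b;
            if (pos >= px)
                dst[pos - px] = (uint8_t)(v >> (b * 8));
        }
    }
}
```

Writing into a separate buffer keeps the model simple. For a left shift the same scheme also works in place in hub RAM, because every byte write lands in or below the LONG that was just read.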

The alternative would be to write the PASM code so that you are interleaving reading and writing longs from/to the hub with rearranging the data in the cog.

I don't know which would be more efficient without writing the code and analyzing the timing.

Peter Jakacki · 2018-03-06 05:11

In Forth this is a simple CMOVE or <CMOVE, depending on the direction, for 8-bit pixels. In Spin I guess you could use the equivalent, bytemove.

...  BUFFERS 64 DUMP 
0000.7800:   04 42 4F 4F  54 00 36 3F  02 3F 3F 00  FC 3E 05 44    .BOOT.6?.??..>.D
0000.7810:   45 46 45 52  DC 3E 46 3F  44 45 46 45  52 00 B4 3E    EFER.>F?DEFER..>
0000.7820:   46 64 65 66  73 74 6B 00  82 F6 46 2C  57 4F 52 44    Fdefstk...F,WORD
0000.7830:   24 00 8E 3E  04 21 56 47  41 00 6E 3E  44 65 76 67    $..>.!VGA.n>Devg ok
...  BUFFERS BUFFERS 1+ 64 <CMOVE  ok
...  BUFFERS $40 DUMP 
0000.7800:   04 04 42 4F  4F 54 00 36  3F 02 3F 3F  00 FC 3E 05    ..BOOT.6?.??..>.
0000.7810:   44 45 46 45  52 DC 3E 46  3F 44 45 46  45 52 00 B4    DEFER.>F?DEFER..
0000.7820:   3E 46 64 65  66 73 74 6B  00 82 F6 46  2C 57 4F 52    >Fdefstk...F,WOR
0000.7830:   44 24 00 8E  3E 04 21 56  47 41 00 6E  3E 44 65 76    D$..>.!VGA.n>Dev ok
...  BUFFERS DUP 1+ 64 LAP <CMOVE LAP .LAP 2,928 cycles = 30.500us  ok
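The direction matters because source and destination overlap. Below is a C sketch of the two copy orders. The function names are hypothetical. The top-down word is <CMOVE here, and ANS Forth names it CMOVE>:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Forward byte copy, low to high addresses (like CMOVE). */
void cmove(const uint8_t *src, uint8_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Backward byte copy, high to low addresses (like <CMOVE / CMOVE>). */
void cmove_from_top(const uint8_t *src, uint8_t *dst, size_t n)
{
    while (n-- > 0)
        dst[n] = src[n];
}
```

Shifting up by one byte, as in the dump above, needs the top-down copy. The forward copy would smear the first byte through the whole range. Scrolling left is the mirror case: the destination is below the source, so the forward copy is the safe one.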

macca · 2018-03-06 10:14


That's what I'm using in my game console (the full code is in the repository):

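                        movs    _src1h,#sbuf      ' point 1st read at sbuf[0]
                        movs    _src2h,#sbuf+1    ' point 2nd read at sbuf[1]
                        movd    _dst1h,#sbuf      ' point write at sbuf[0]

                        mov     b, xs             ' b = scroll amount in pixels
                        and     b, #3 wz          ' sub-LONG part 0..3, Z=1 if zero
                        shl     b, #3             ' pixels -> bits (8 bpp)
                        mov     x, #32
                        sub     x, b              ' x = 32 - b

                        mov     ecnt, #H_RES/4    ' one pass per LONG

_src1h                  mov     colors1, 0-0      ' colors1 = sbuf[i] (patched at run time)
_src2h                  mov     colors2, 0-0      ' colors2 = sbuf[i+1]

            if_nz       shr     colors1, b        ' drop the leftmost pixels
            if_nz       shl     colors2, x        ' bring in pixels from the next LONG
            if_nz       or      colors1, colors2  ' merge

_dst1h                  mov     0-0, colors1      ' sbuf[i] = result

                        add     _src1h, #1        ' advance both source fields
                        add     _src2h, #1
                        add     _dst1h, inc_dest  ' inc_dest assumed = $200 (D field += 1)

                        djnz    ecnt, #_src1h     ' loop H_RES/4 times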

sbuf is the scanline buffer and xs is the number of pixels to scroll.
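A C model of the loop above is handy for checking the bit arithmetic off-target. The function name is hypothetical. It assumes little-endian packing with the leftmost pixel in the low byte, as the shr/shl direction implies:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the PASM loop: shift a LONG-packed scan line left by 0..3 pixels
 * in place. sbuf must hold nlongs + 1 words, because each output word also
 * pulls bytes from the following word. */
void fine_scroll(uint32_t *sbuf, size_t nlongs, unsigned xs)
{
    unsigned b = (xs & 3u) * 8u;   /* and b,#3 wz / shl b,#3 */
    if (b == 0)
        return;                    /* the PASM copies each long onto itself instead */
    unsigned x = 32u - b;          /* mov x,#32 / sub x,b */
    for (size_t i = 0; i < nlongs; i++) {
        uint32_t colors1 = sbuf[i];
        uint32_t colors2 = sbuf[i + 1];
        sbuf[i] = (colors1 >> b) | (colors2 << x);
    }
}
```

The early return mirrors the if_nz guards, and the guard matters. In C a 32-bit shift by 32 is undefined. On the Propeller, shl only uses the low 5 bits of the count, so without the guard a zero scroll would OR the next long straight in.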

There are a few tricks here. The tile rendering must be aligned to 8 pixels (2 longs), so if it needs to scroll by 4 pixels it just copies one long over the other; for any other amount a shift/or is done. The coarse scrolling (in multiples of 8 pixels) is done by the tile rendering code. To account for the additional pixels on the left or right side (if your visible resolution is actually 320 pixels), the rendering code renders one additional tile so there are no blanks (you always need some extra tiles for scrolling, unless you display fewer pixels on both sides). The scanline buffer is also a couple of longs larger to accommodate the additional pixels.
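The split of a scroll position described above can be sketched in C. The struct, the function names and the 8-pixel TILE_PX constant are hypothetical. The sketch also checks the buffer size: one extra tile for 320 visible pixels gives 82 LONGs, two more than the 80 that are shown. That is consistent with the "couple of longs larger" remark:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TILE_PX 8  /* hypothetical: 8-pixel tiles = 2 LONGs per tile row */

typedef struct {
    unsigned tile; /* coarse: tile the renderer starts from               */
    unsigned lng;  /* 0 or 1: skip one whole LONG (the 4-pixel case)      */
    unsigned pix;  /* 0..3: sub-LONG shift done with shr/shl/or           */
} scroll_split;

scroll_split split_scroll(unsigned xs)
{
    scroll_split s = { xs / TILE_PX, (xs % TILE_PX) / 4, xs % 4 };
    assert(s.tile * TILE_PX + s.lng * 4 + s.pix == xs);
    return s;
}

/* LONGs in the scan line buffer: visible tiles plus one extra tile.
 * With lng = 1 the shift loop reads up to index width / 4 + 1. */
unsigned buffer_longs(unsigned width)
{
    return (width / TILE_PX + 1) * 2;
}
```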

This loop takes a lot of time, and I'm sure there is a more efficient way to do it, such as shifting each pixel into a long variable and storing it in the scanline buffer once 4 pixels are available. I never had the time to fully explore that approach.
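That alternative could look roughly like this in C. It hypothetically assumes the renderer can hand over unpacked 8-bit pixels, so fine scrolling reduces to starting `first` pixels later. Each pixel enters at the top byte of the accumulator, so after four steps the first one sits in the low byte:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack npix pixels, starting at pixels[first], into LONGs: one shift and
 * one OR per pixel, plus a store every fourth pixel. Hypothetical helper. */
void pack_from_offset(const uint8_t *pixels, size_t first, size_t npix,
                      uint32_t *out)
{
    uint32_t acc = 0;
    unsigned count = 0;
    size_t w = 0;
    for (size_t i = 0; i < npix; i++) {
        acc = (acc >> 8) | ((uint32_t)pixels[first + i] << 24);
        if (++count == 4) {
            out[w++] = acc;  /* 4 pixels collected: store one LONG */
            count = 0;
        }
    }
    assert(count == 0);      /* npix is expected to be a multiple of 4 */
}
```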

mikeologist · 2018-03-06 19:34

Would it be possible to shift how you read them, like an offset, rather than shifting the actual data?
I'm pretty sure that's how they achieved parallax on the NES.
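A minimal C sketch of that idea follows. It assumes a hypothetical ring buffer of `width` pixels that the display stage reads starting at a movable offset. Scrolling then only changes `scroll`, and only the column that wraps around needs redrawing:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read output column x of a scrolled view into a ring buffer of width pixels. */
uint8_t read_scrolled(const uint8_t *ring, size_t width, size_t scroll, size_t x)
{
    assert(width > 0);
    return ring[(x + scroll) % width];
}

/* Fill n output pixels without moving any data in the ring buffer. */
void emit_line(const uint8_t *ring, size_t width, size_t scroll,
               uint8_t *out, size_t n)
{
    for (size_t x = 0; x < n; x++)
        out[x] = read_scrolled(ring, width, scroll, x);
}
```

Since the buffer in this thread is packed 4 pixels per LONG, a start offset that is not a multiple of 4 pixels would still need realignment somewhere before the pixels are output.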
