Successfully implemented smooth horizontal scrolling w/ wrapping!

escher Posts: 120
edited 2019-05-18 - 03:51:18 in Propeller 1
I know video has been pretty much beaten to death on the Propeller, but I'm feeling really satisfied with what I've been able to accomplish with the P8X, basically from scratch:



My system is a dual-Propeller setup: a CPU and a GPU. I'm using the classic tile-sprite paradigm for my graphics. The CPU holds the game code as well as the tile map, color palettes, sprite attribute table, and other variables, which are sent to the GPU over a high-speed serial link (thanks Marko Lukat!). The GPU uses one cog to receive the data, six cogs for rendering (each one handling every sixth scanline), and one cog to fetch the rendered data from Hub RAM and display it in full 8-bit color at 640x480, upscaled from 320x240.
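
As a rough illustration of how the scanlines might be divided up (a Python sketch, not the actual driver code; the round-robin assignment `line % 6` and the helper names are assumptions for illustration, and the 2x upscale is modelled as plain pixel and line doubling):

```python
RENDER_COGS = 6          # render cogs described above
SRC_W, SRC_H = 320, 240  # internal resolution
OUT_W, OUT_H = 640, 480  # displayed resolution


def render_cog_for(scanline):
    """Assumed round-robin split: cog k renders scanlines k, k+6, k+12, ..."""
    return scanline % RENDER_COGS


def source_pixel(out_x, out_y):
    """Map a 640x480 output pixel back to the 320x240 pixel it doubles."""
    return out_x // (OUT_W // SRC_W), out_y // (OUT_H // SRC_H)
```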

Though the render and display code for the graphics driver is all 100% from scratch, it draws on a lot of inspiration and guidance from this community for which I am extremely grateful.

With smooth scrolling enabled, I can now make side-scrolling games, which is a huge milestone for my project. And with wrapping, I can eventually implement real-time loading of levels outside the visible area from external RAM.

Next I am implementing vertical scrolling, as well as some sprite sizing.

Comments

  • Good work!
    How many sprites (per line) do you get with your driver? 6 cogs can go a long way.
    I recall implementing horizontal scrolling in my "JET Engine" driver was a headache. (The rendering only scrolls in 16-pixel steps, the rest is handled in the output cog by messing with VSCL). Vertical scrolling was significantly easier.
  • Peter Jakacki Posts: 8,641
    edited 2019-05-18 - 02:37:30
    The link seems to work here ok:



    So when are you going to do a retro game machine for the P2? (without all the wires and breadboard) :)

    Tachyon Forth - compact, fast, forthwright and interactive
    useforthlogo-s.png
    --->CLICK THE LOGO for more links<---
    P2 +++++ TAQOZ INTRO & LINKS +++++ P2 SHORTFORM DATASHEET
    P1 +++++ Latest Tachyon includes EASYFILE +++++ Tachyon Forth News Blog
    Brisbane, Australia
  • escher Posts: 120
    edited 2019-05-18 - 04:04:34
    Wuerfel_21 wrote: »
    Good work!
    How many sprites (per line) do you get with your driver? 6 cogs can go a long way.
    I recall implementing horizontal scrolling in my "JET Engine" driver was a headache. (The rendering only scrolls in 16-pixel steps, the rest is handled in the output cog by messing with VSCL). Vertical scrolling was significantly easier.

    So before scrolling support I was hitting 16 full sprites per scanline, but with it I'm down to ~14. I'm considering splitting up my feature set into the classic "modes", where e.g. you can have Mode 1 with no scrolling and 16 sprites per line (spl), horizontal or vertical scrolling and 12 spl, bidirectional scrolling and 8 spl, etc.

    I am absolutely certain however that I am not maximizing my efficiency with how I'm rendering my tiles and sprites, and potentially a lot of performance can be recovered from using some more intuitive PASM. I was actually considering putting out a $100 bounty to whoever could get me 16 spl with horizontal scrolling haha. But for now I think I'll wait until all of my features are fleshed out before banging my head against that wall. For anyone interested though, you can see my WIP rendering code here.

    And I empathize with you on the scrolling. At first I was trying to solve the problem from a memory standpoint, i.e. rendering everything normally and then simply "shifting" the memory the appropriate amount somehow to spoof the scroll. Due to the `long`-aligned nature of the P1's memory however, this wasn't doable. Ultimately I approached the problem from a more mathematical standpoint, during real-time rendering: using a variable representing the left-most boundary of the visible screen, and dividing and flooring and modulo'ing it with the bit-width of my tiles to get everything in the right place. The biggest realization for me was that, from a rendering standpoint, the only janky bit is the very first tile you render, because it's the only one where you might start rendering partway through. Every subsequent tile, you can start rendering at its normal 0 index. The bit of overhead it DOES introduce however is that you could transition into a new tile on any given pixel, so performing a check for that condition and calling a routine to load the next tile does tax some cycles. You can see my thought process on solving these problems in the issue itself for the feature.
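
    The divide/floor/modulo arithmetic described above can be sketched in Python (a minimal model, not the driver's PASM; the 8-pixel tile width is assumed from the `#8` compare quoted later in the thread, and `map_w_tiles`, `first_tile`, and `visible_columns` are hypothetical names):

```python
TILE_W = 8  # tile width in pixels (assumed)


def first_tile(scroll_x, map_w_tiles):
    """Return (tile column, starting pixel within that tile) for the left edge."""
    col = (scroll_x // TILE_W) % map_w_tiles  # divide and floor, then wrap
    start_px = scroll_x % TILE_W              # only the first tile starts partway in
    return col, start_px


def visible_columns(scroll_x, map_w_tiles, screen_w=320):
    """List the (column, first pixel, pixel count) spans covering one scanline."""
    col, px = first_tile(scroll_x, map_w_tiles)
    spans, remaining = [], screen_w
    while remaining > 0:
        count = min(TILE_W - px, remaining)
        spans.append((col, px, count))
        remaining -= count
        col = (col + 1) % map_w_tiles  # wrap around the tile map
        px = 0                         # every later tile starts at index 0
    return spans
```

    With a scroll offset that is not a multiple of 8, the scanline touches 41 tiles instead of 40: a partial tile at each edge and 39 whole tiles in between.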

    And @Peter Jakacki the very instant the P2 is in silicon, I'm jumping on it. The power of a GPU/CPU P2 system would be outstanding. And for the record, my ultimate goal with my project is to develop a final PCB that's compatible with the standard arcade edge connector format.

    You can read about my project and development log here.
  • If you don't mind a P2D2 mounted onto a JAMMA card with tinned edge connections then I can easily include that in another batch of PCBs I can get from JLCPCB. The main cost is in the freight so I will just combine it.

  • Peter Jakacki wrote: »
    If you don't mind a P2D2 mounted onto a JAMMA card with tinned edge connections then I can easily include that in another batch of PCBs I can get from JLCPCB. The main cost is in the freight so I will just combine it.

    Wow that would be outstanding! I'd be happy to help foot the bill as well!
  • Wuerfel_21 Posts: 351
    edited 2019-05-18 - 06:56:48
    escher wrote: »
    So before scrolling support I was hitting 16 full sprites per scanline, but after I'm down to ~14. I'm considering splitting up my feature set into the classic "modes", where e.g. you can have Mode 1 with no scrolling and 16 spl, horizontal or vertical scrolling and 12 spl, bidirectional scrolling and 8 spl, etc.
    You may want to look into a cool thing I did. My driver can split the screen between different modes/settings, like one would do on most retro platforms using scanline interrupts. So one can have a textmode status bar and a scrolling playfield, for example. Or parallax-ish scrolling. And many similar effects.
    escher wrote: »
    I am absolutely certain however that I am not maximizing my efficiency with how I'm rendering my tiles and sprites, and potentially a lot of performance can be recovered from using some more intuitive PASM.
    A lot of my driver's performance comes from a rather novel pixel encoding, where all the masking can be taken care of by the hardware and a pixel can be poked out every 32 cycles. It only works for 4-color tiles though.
    :tilepixel
            shl pattern,#1 wc
    if_c    ror palette,#16
            shl pattern,#1 wc
    if_c    ror palette,#8
            wrbyte palette,pixel_ptr  
            add pixel_ptr,pixel_ptr_stride
            djnz pixel_iter,#:tilepixel
    
    The version for sprites has an extra compare between palette and a copy of it before any rotation, to mask the wrbyte.
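
    A rough Python model of the loop above may make the encoding easier to follow (a sketch based only on the snippet shown, not Wuerfel_21's code; the function names and the sample palette value are made up for illustration). Each pixel consumes two pattern bits from the top of the long, a set bit rotates the 32-bit palette right by 16 or by 8, and the low byte of the palette is what `wrbyte` stores. Since the palette is not reloaded inside the loop, the rotations accumulate from pixel to pixel.

```python
MASK32 = 0xFFFFFFFF


def ror32(value, amount):
    """Rotate a 32-bit value right, like PASM's ror."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def render_pixels(pattern, palette, pixel_count):
    """Return the bytes the loop would write, plus the final palette state."""
    out = []
    for _ in range(pixel_count):
        carry = (pattern >> 31) & 1        # shl pattern,#1 wc
        pattern = (pattern << 1) & MASK32
        if carry:
            palette = ror32(palette, 16)   # if_c ror palette,#16
        carry = (pattern >> 31) & 1        # shl pattern,#1 wc
        pattern = (pattern << 1) & MASK32
        if carry:
            palette = ror32(palette, 8)    # if_c ror palette,#8
        out.append(palette & 0xFF)         # wrbyte stores the low byte
    return out, palette
```

    With `palette = 0x44332211`, the pattern bits `00 01 10 11` produce the bytes `0x11, 0x22, 0x44, 0x33`, because each bit pair rotates the palette relative to where the previous pixel left it.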

    I had a brief look at your code and I see there's a lot of
            add             pxindx, #1
            cmp             pxindx, #8 wz
            if_z call       #tld
    
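    If you "invert" pxindx (so it usually starts at 8 and counts down), I think you could optimize it to
            djnz            pxindx, #:next          ' Count down; keep going while pixels remain in this tile
            call            #tld                    ' Tile exhausted: load the next one (tld would also need to reset pxindx)
    :next
    
    Which, instead of always being 12 cycles, is 4 cycles when the branch is taken. Also fewer instructions.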
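
    To check that the two index schemes call the tile loader at the same pixels, and to compare the bookkeeping cost, here is a small Python model (a sketch only; it assumes the usual P1 timings of 4 clocks per instruction and 8 clocks for a `djnz` that falls through, assumes `tld` resets the index, and ignores the cost of `tld` itself):

```python
TILE_W = 8


def count_up(start, pixels):
    """add / cmp / if_z call: returns (x positions where tld runs, clocks spent)."""
    pxindx, loads, clocks = start, [], 0
    for x in range(pixels):
        pxindx += 1
        clocks += 12                 # add + cmp + conditional call, 4 clocks each
        if pxindx == TILE_W:
            loads.append(x)
            pxindx = 0               # assumed to happen inside tld
    return loads, clocks


def count_down(start, pixels):
    """djnz / call: same return values for the inverted counter."""
    remaining, loads, clocks = TILE_W - start, [], 0
    for x in range(pixels):
        remaining -= 1
        if remaining != 0:
            clocks += 4              # djnz with the branch taken
        else:
            clocks += 8 + 4          # djnz falling through, then the call
            loads.append(x)
            remaining = TILE_W       # assumed reload inside tld
    return loads, clocks
```

    Over one 8-pixel tile this model gives 96 clocks of index bookkeeping for the count-up version and 40 for the count-down version.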


  • escher Posts: 120
    edited 2019-05-18 - 17:29:18
    I am definitely implementing parallax scrolling via scanline interrupts - pretty much required for things like HUD graphics, status bars, etc.

    Your pixel poking solution is definitely awesome, however I'm working with full 8-bit colors for all of my graphics.

    Your solution to pxindx tracking is great - exactly the kind of cycle-shaving I'm working towards.
  • Right now this is the critical section of my sprite rendering code which has the biggest impact on my sprites-per-line:
            ' Parse sprite pixel palette line
            mov             findx,  #sprSzX         ' Store sprite horizontal size into index
    :sprite mov             temp,   curpt           ' Load current sprite pixel palette line into temp variable
            and             temp,   #15 wz          ' Mask out current pixel
            if_z  jmp       #:trans                 ' Skip if pixel is transparent
            add             temp,   cpindx          ' Calculate color palette offset
            rdbyte          curcp,  temp            ' Load color
            mov             temp,   spxpos          ' Store sprite horizontal position into temp variable
            cmp             spxmir, #1 wz           ' Check for horizontal mirroring
            if_z  mov       tmpmir, #sprSzX         ' If so store sprite horizontal size into temp variable
            if_z  sub       tmpmir, findx           ' And invert current index
            if_z  add       temp,   tmpmir          ' And add inverted pixel index
            if_nz add       temp,   findx           ' Or add current pixel index
            if_nz sub       temp,   #1              ' And decrement for inclusivity
            mov             slboff, temp            ' Store scanline buffer offset
            shr             slboff, #2              ' slboff /= 4
            cmp             slboff, #80 wc          ' Check if pixel out of bounds
            if_nc jmp       #:trans                 ' Skip if out of bounds
            add             slboff, #slbuff         ' slboff += @slbuff
            movs            :slbget, slboff         ' Move target scanline buffer segment source
            movd            :slbput, slboff         ' Move target scanline buffer segment destination
            and             temp,   #3              ' temp %= 4
            shl             temp,   #3              ' temp *= 8
            shl             curcp,  temp            ' Shift pixel color to calculated pixel location
    :slbget mov             tmpslb, 0-0             ' Store target scanline buffer segment into temp variable
            mov             slboff, pxmask          ' Temporarily store scanline buffer segment mask
            rol             slboff, temp            ' Rotate mask to calculated pixel location
            and             tmpslb, slboff          ' Mask away calculated pixel location
            or              tmpslb, curcp           ' Insert pixel
    :slbput mov             0-0,    tmpslb          ' Re-store target scanline buffer segment
    :trans  shr             curpt,  #4              ' Shift palette line right 4 bits to next pixel
            djnz            findx,  #:sprite        ' Repeat for all pixels on sprite palette line
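
    The read-modify-write at the heart of this loop can be modelled in Python as follows (a sketch for illustration, not a replacement for the PASM; `PXMASK = 0xFFFFFF00` is an assumed value for `pxmask`, the 8-pixel sprite width is assumed from the 4-bit shifts of a 32-bit `curpt`, and the function names are hypothetical):

```python
MASK32 = 0xFFFFFFFF
PXMASK = 0xFFFFFF00   # assumed value of pxmask: clears the lowest byte


def rol32(value, amount):
    """Rotate a 32-bit value left, like PASM's rol."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & MASK32


def dest_x(spxpos, findx, mirrored, spr_w=8):
    """Screen x for the pixel handled when the loop counter equals findx."""
    if mirrored:
        return spxpos + (spr_w - findx)   # inverted index
    return spxpos + findx - 1             # findx counts down from spr_w to 1


def put_pixel(slbuff, x, color):
    """Insert one 8-bit pixel into a buffer of longs holding 4 pixels each."""
    x &= MASK32                    # registers are unsigned 32-bit
    index = x >> 2                 # slboff = x / 4
    if index >= len(slbuff):       # cmp slboff,#80 wc / if_nc jmp #:trans
        return False
    shift = (x & 3) << 3           # (x % 4) * 8
    mask = rol32(PXMASK, shift)    # move the cleared byte into position
    slbuff[index] = (slbuff[index] & mask) | ((color & 0xFF) << shift)
    return True
```

    Every opaque pixel pays for this whole sequence, which is why the loop dominates the sprites-per-line figure.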
    