A display list is fun :) [0.50a beta+0.07g - PSRAM command list used] - Page 3 — Parallax Forums

Comments

  • pik33pik33 Posts: 2,388

    It works without the PSRAM attached - it displays moving sprites on a static background.

  • evanhevanh Posts: 16,040

    @pik33 said:
    It works without the PSRAM attached - it displays moving sprites on a static background.

    Okay, so videotest2.bas is the top level right? I've found function startvideo(mode=64, pin=0, mb=0) in retromachine.bi and edited it to function startvideo(mode=64, pin=16, mb=0)

    Compiled and loaded ... no picture as yet.

  • pik33pik33 Posts: 2,388
    edited 2022-03-31 09:38

    Something is wrong with the code. It doesn't work at pin #16 - I will try to find why.

    Edit1: When I changed the streamer mode from imm->LUT->output to rflong->LUT->output, I didn't add a pin group to the streamer config value. Fixing that alone didn't solve the problem, but it is bug #1 found.

  • pik33pik33 Posts: 2,388
    edited 2022-03-31 09:55

    Seems to be fixed now. It is time to clean this mess before going further.

  • evanhevanh Posts: 16,040

    :) I've moved to pin 0 anyway. Working on the cheap TV now. Although it sadly only reports "576p" as the resolution.

    Found the timings in hg004.spin2. What's the difference between "total lines" and "scanlines"? Both are set to 576.

  • pik33pik33 Posts: 2,388
    edited 2022-03-31 10:39

    Now there is no difference. The "full" DL driver (050) can make top/bottom borders, so these were the line counts with and without those borders.
    This simplified driver only loads the first 10 of these numbers (now 9 - I am cleaning this up right now, and the L/R border is also unused here); the rest are either unused or used by the high-level Spin code (clkfreq, hubset, etc.).
    Can you see these sprites?

  • evanhevanh Posts: 16,040
    edited 2022-03-31 11:41

    @pik33 said:
    Can you see these sprites?

    Yep, resolution looks fine.

    I note fast moving sprites are wrapping around from right to left sides of screen a little. EDIT: Should add playfields (virtual resolutions) larger than the viewport (actual resolution).

  • evanhevanh Posts: 16,040
    edited 2022-03-31 12:54

    Definitely something wrong with the modeline handling or timing gen ...

    There's 0.5 us to 1.5 us of jitter ... no, correction, it's just a delay on the hsync once per field. Nearly a whole line interval is skipped. That's bad! You need to tidy up the convolution with your XCONTs.

    PS: My diag is not actually measuring the real hsync signal, it's relying on embedding a pin #56 toggle at each of two hsync XZEROs that I've found in your code. One in p301, the other in blank. Eg:

    blank           xcont   m_bs,hsync0     ' horizontal sync
                    xzero   m_sn,hsync1
                    xcont   m_bv,hsync0
                    drvnot  #56             ' diagnostic: toggle pin #56 at this hsync
                    xcont   m_vi,hsync0
            _ret_   djnz    pa,#blank
    
  • pik33pik33 Posts: 2,388

    The wrapping around is a bug I already found and removed (the sprite number was added to sprite_x; I did it for testing and forgot to remove it).

    What did you use to measure the jitter? A DL interpreter and a sprite drawer called between blanks may take too long to finish in the time between two short xconts.
    Instead of calling blank, I am now calling every xcont individually, interleaved with these procedures.

  • evanhevanh Posts: 16,040
    edited 2022-03-31 13:43

    @pik33 said:
    What did you use to measure the jitter? A DL interpreter and a sprite drawer called between blanks may take too long to finish in the time between two short xconts.
    Instead of calling blank, I am now calling every xcont individually, interleaved with these procedures.

    Lol, too few and too many syncs both now. In the attached scope snapshot each hsync is represented by an edge. The top half of the image is a little more than one field of time. The bottom half is zoomed in at each end of the top half. Bottom-left shows the extra delay, and bottom-right shows the extra syncs that are now occurring on alternate fields.

    PS: I've reduced the blanking a little to make refresh faster than 50 Hz.
    timings long 8, 80, 8, 1024, 8, 2, 8, 128, 576, 336956522, %1_101101__11_0000_0110__1111_1011, 576, 0, 192, 0, 0

  • pik33pik33 Posts: 2,388
    edited 2022-03-31 13:58

    The xconts were called in the wrong order while displaying picture, and in the right order while displaying blanks!!!
    I don't know why the monitors (3 different!!) display anything on them.

    And yes, I introduced the bug in 002....

  • evanhevanh Posts: 16,040
    edited 2022-03-31 14:22

    Well, that's finally now working, as is, on my old DVI monitor. :)

    EDIT: And no longer any long duration between hsyncs for the scope to trigger on. :) And no jitter nor short hsyncs either.

  • pik33pik33 Posts: 2,388

    I hope that was the problem which caused problems at home, these monitors weren't "good enough" to display this jittery picture.
    I have to find a way to move these animated sprites to PSRAM - vblank is too short to load them using 4-bit PSRAM configuration.

  • evanhevanh Posts: 16,040
    edited 2022-03-31 14:38

    @pik33 said:
    I hope that was the problem which caused problems at home, these monitors weren't "good enough" to display this jittery picture.

    I'm confident it'll work.

    I have to find a way to move these animated sprites to PSRAM - vblank is too short to load them using 4-bit PSRAM configuration.

    Don't bother. They're better off left in fast access RAM, imho.

    However, if you really want to do it then just give yourself more vblanking. Trade in some resolution. EDIT: The sucky part about this is my old DVI monitor is limited to max of 36 vblanking lines when doing arbitrary resolutions.

  • pik33pik33 Posts: 2,388

    Static sprites can remain in the hub. I have to keep them there, as I use setq2+wmlong to display them so that I can have transparent pixels.
    Animated ones, like these balls, are too big.
    One possible solution is to remove animation from the driver altogether and let the main program do it if needed, preloading the new frame and changing the pointer in vblank.

  • evanhevanh Posts: 16,040
    edited 2022-03-31 14:54

    Yeah, do option #2, separate task for animating. It could even be real-time rendering.

  • pik33pik33 Posts: 2,388
    edited 2022-04-21 11:04

    007e.

    The sprite animation was removed from the driver: the main program can change the pointer in vblank, which is easy.
    Instead of a constant 64x64 size, the width and height are now parametrized: you can set the width and the height of each sprite individually. The width, however, has to be n*4, as one long is 4 pixels. Now there are only 2 factors that limit the sprite size: the space in HUB and the time needed to rdlong/wmlong them.

    I also prepared the DL to change the display address in the middle of the line to enable implementing a window manager.

  • pik33pik33 Posts: 2,388
    edited 2022-05-13 13:41

    The PSRAM driver has its own command list, so one command can move data from several different places to several other places. This makes it possible to change the display address in the middle of a display line.

    The example below is the test of this possibility:

    • the DL now has a command "display a line from the PSRAM list"
    • there is a PSRAM list prepared so it loads data from the framebuffer, then 256 bytes from the $100000 address where I poked bytes from 0 to 255, then it loads again from the framebuffer
    • every vblank switches the changing positions

    This opens the possibility of making a realtime window manager even with the 4-bit bus. I have to determine how fragmented a line can be and still load in time.

    The result (4-bit bus!). The music in the background is SunVox playing "8-bit Tale" by NightRadio. SunVox is a powerful tracker/modular synth.

  • That's kinda cool. Was this done using the request list feature in my PSRAM driver? I've not ever coupled request lists to my own video driver like that but it might be useful to try it out and it could give additional interesting effects or uses. It could be useful to bring up overlays of debug data or other state etc on top of the regular frame buffer data.

  • pik33pik33 Posts: 2,388
    edited 2022-05-13 14:49

    Yes, it is.

    First, the display list. I added a command to it: "display a line from the PSRAM request list". The hub address of the request list is the command parameter.
    Then the driver, instead of loading the line data from PSRAM directly, executes the request list.

    These are the first tests, but I can then build a window system from it. Every window can have its own base address, width and height. The window manager has to compute the PSRAM request lists for every display line based on the window parameters.

    There can be a problem when, with too many windows, the request list becomes too long (= too slow to execute in time), but I have to write test code first to determine what can be done.
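
Editor's sketch of the per-line request list computation described above, in Python rather than PASM so it can be run on a PC. Everything here is an assumption for illustration: the `Window` fields, the `line_requests` name, an 8 bpp framebuffer laid out as one byte per pixel, and the rule that later windows in the list are drawn on top. It is not the driver's actual data format. Each returned pair is (PSRAM source address, byte count), which is the kind of entry a request list for one display line would need.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Window:
    base: int    # PSRAM address of the window's first pixel (hypothetical layout)
    x: int       # left edge on screen, in pixels
    y: int       # top edge on screen, in lines
    width: int   # visible width in pixels
    height: int  # visible height in lines
    stride: int  # bytes between consecutive window lines in PSRAM


def line_requests(y: int, screen_width: int, fb_base: int,
                  windows: List[Window]) -> List[Tuple[int, int]]:
    """Return [(psram_addr, byte_count), ...] covering display line y at 8 bpp."""
    # Decide which source owns every pixel of this line (-1 = background framebuffer).
    owner = [-1] * screen_width
    for i, w in enumerate(windows):
        if w.y <= y < w.y + w.height:
            for px in range(max(w.x, 0), min(w.x + w.width, screen_width)):
                owner[px] = i  # later windows overwrite earlier ones

    # Merge runs of pixels with the same owner into one transfer each.
    requests = []
    start = 0
    for px in range(1, screen_width + 1):
        if px == screen_width or owner[px] != owner[start]:
            o = owner[start]
            if o < 0:
                addr = fb_base + y * screen_width + start
            else:
                w = windows[o]
                addr = w.base + (y - w.y) * w.stride + (start - w.x)
            requests.append((addr, px - start))
            start = px
    return requests


if __name__ == "__main__":
    wins = [Window(base=0x100000, x=100, y=10, width=256, height=50, stride=256)]
    for addr, count in line_requests(10, 1024, 0, wins):
        print(hex(addr), count)
```

With a single 256-pixel window this gives three transfers per covered line (framebuffer, window, framebuffer), the same shape as the test with the 256 bytes at $100000. The length of the returned list is the fragmentation figure that has to be checked against the time available per line.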

    This is the driver PASM code as it is now:

    All my p2 related mess including this is in the github repository: https://github.com/pik33/P2-retromachine

  • Very nice

  • Wonder how many small reads can really be reliably done on a per-scanline basis.
    I've been thinking about maybe emulating some other old machines and NeoGeo would be neat (partially because I can just recycle the 68k, Z80 and most of the YM2612 core). Now, NeoGeo uses enormous sprite ROMs (16 MB for Metal Slug) and can draw 96 sprites per line. 96 x 8 byte fetches per line? (While still leaving enough bandwidth for CPU ROM reads and ADPCM streaming). Would probably use a shared driver, since the sprite and ADPCM reads are really not that latency-sensitive.

  • pik33pik33 Posts: 2,388
    edited 2022-05-13 20:16

    This seems to be 320x224 pixels, so the scanline seems to be 64 us. 96 transfers is too many, but maybe you can cache them as I did for audio samples. I have 8 256-byte buffers (one per channel) and very simple cache code which gets the sample from the cache on a hit, or gets 256 samples from PSRAM on a miss. My sprites are still in the hub; I have no time to load them from PSRAM.
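
Editor's sketch of a one-block-per-channel cache of the kind described here, in Python for illustration only. The names (`ChannelCache`, `read_block`) are hypothetical, and aligning each block to a 256-byte boundary is an assumption; the post only says that a miss reloads a 256-byte buffer from PSRAM.

```python
BLOCK = 256  # buffer size per channel, as stated in the post


class ChannelCache:
    """Single-block cache for one audio channel."""

    def __init__(self, read_block):
        # read_block(addr, n) -> bytes stands in for a PSRAM burst read.
        self.read_block = read_block
        self.base = None   # PSRAM address of the cached block, None if empty
        self.data = b""
        self.misses = 0

    def get(self, addr: int) -> int:
        base = addr & ~(BLOCK - 1)       # assumed: blocks are 256-byte aligned
        if base != self.base:            # miss: reload the whole block
            self.data = self.read_block(base, BLOCK)
            self.base = base
            self.misses += 1
        return self.data[addr - base]    # hit: serve from the local buffer


if __name__ == "__main__":
    psram = bytes(i & 0xFF for i in range(1024))   # fake PSRAM contents
    channels = [ChannelCache(lambda a, n: psram[a:a + n]) for _ in range(8)]
    print(channels[0].get(5), channels[0].get(6), channels[0].misses)
```

A sample stream that advances sequentially pays for one burst read per 256 bytes, which is why this works well for audio and less well for sprites that are fetched from scattered addresses.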

    This console has 16x16 4bpp sprites, 128 bytes , so maybe make 96 buffers @ 128 bytes each, and then load them in advance, several lines before they have to be displayed.

    My current driver preloads the pixel data 2 lines before and draws sprites one line before it is displayed. The pipeline architecture: preload linenum#+2, make sprites for linenum#+1, display linenum#. As preload is done by PSRAM cog and displaying is done by the streamer, main work for the CPU is drawing sprites and all 3 tasks can be done near simultaneously.
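
Editor's sketch of the three-stage schedule described above (preload line+2, sprites for line+1, display line), in Python for illustration. The function name and the use of `None` for an idle stage are assumptions; the real driver runs these stages on the PSRAM cog, the CPU and the streamer rather than in a loop.

```python
def pipeline_schedule(num_lines: int):
    """Return one (display, sprites, preload) tuple per scanline period.

    Each entry holds the line number handled by that stage, or None if idle.
    Two warm-up periods come first so that line 0 is ready before display.
    """
    schedule = []
    for t in range(-2, num_lines):
        display = t if t >= 0 else None
        sprites = t + 1 if 0 <= t + 1 < num_lines else None
        preload = t + 2 if t + 2 < num_lines else None
        schedule.append((display, sprites, preload))
    return schedule


if __name__ == "__main__":
    for period in pipeline_schedule(4):
        print(period)
```

In every period the three stages touch three different lines, so they need three separate line buffers and never wait on each other as long as each stage finishes within one scanline.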

  • evanhevanh Posts: 16,040

    Overlapping windows won't really suit but partitioned windows should be fine. And box outlines, like for selection or window dragging, could be done like sprites I suspect.

  • pik33pik33 Posts: 2,388
    edited 2022-05-13 21:27

    It can be a good idea to do window decoration out of sprites. The problem is I need up to 6 of them (they have to be <512 pixels wide) to decorate one window. I don't know yet how to decorate the window, or whether the decoration has to be a part of the window, a part of the background or something else. I cannot use the approach I used on the RPi: the 4-bit memory bandwidth is too low for this. This has to be done one step at a time :)

  • @pik33 said:
    This seems to be 320x224 pixels, so the scanline seems to be 64 us. 96 transfers is too many, but maybe you can cache them as I did for audio samples. I have 8 256-byte buffers (one per channel) and very simple cache code which gets the sample from the cache on a hit, or gets 256 samples from PSRAM on a miss. My sprites are still in the hub; I have no time to load them from PSRAM.

    This console has 16x16 4bpp sprites, 128 bytes , so maybe make 96 buffers @ 128 bytes each, and then load them in advance, several lines before they have to be displayed.

    Yeah, thought so. Caching isn't terribly easy. 96 is the per-scanline limit, but the system can handle 380 total sprites and they can get up to 512 px tall (stacking multiple 16px tiles) and can be squished down (skipping graphic lines).

  • evanhevanh Posts: 16,040

    @pik33 said:
    It can be a good idea to do window decoration out of sprites.

    Tiling (I should have said tiles rather than partitions earlier) doesn't need border decorations. It won't be like a desktop GUI, but is that really desired?

  • Actually, now that I think about it, 96 reads per scanline should be totally possible. Actual read (8 bytes on 16 bit bus) is 47 cycles (2*(8+10+4)+3), but let's assume I need 80 with overhead. Scanline has ~21000 cycles. 21000/80 = 262. That's actually plenty!
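
Editor's sketch of the arithmetic in this post, in Python for illustration. The numbers are the ones quoted above; the `line_budget` helper and its name are hypothetical, and the terms of the 47-cycle formula are copied as given without interpreting what each one stands for.

```python
RAW_READ_CYCLES = 2 * (8 + 10 + 4) + 3   # formula quoted in the post
CYCLES_PER_READ = 80                     # assumed cost per read including overhead
CYCLES_PER_LINE = 21000                  # approximate cycles per scanline, from the post


def line_budget(reads: int, cycles_per_read: int, cycles_per_line: int):
    """Return (cycles used, cycles left, fits) for a number of reads per scanline."""
    used = reads * cycles_per_read
    left = cycles_per_line - used
    return used, left, left >= 0


if __name__ == "__main__":
    print(RAW_READ_CYCLES)
    print(CYCLES_PER_LINE // CYCLES_PER_READ)
    print(line_budget(96, CYCLES_PER_READ, CYCLES_PER_LINE))
```

Changing `cycles_per_read` shows how quickly the margin shrinks if the real per-transfer overhead turns out to be higher than the assumed 80 cycles.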

  • pik33pik33 Posts: 2,388

    Rogloh's driver has more overhead for starting a transfer: I assumed 300 cycles of overhead. If you write your own driver without all this mailbox-interpreting stuff, the overhead should be lower. 80 cycles for an 8-byte transfer? This can solve a lot of problems :)

    I haven't yet measured how fast this request list interpreter is: does it still require 300 cycles to start every transfer in it, or is it faster?

  • I think from memory each request list item whose transfer is completed then defers back to the polling loop but you should try it out to be sure. This step would be needed for relinquishing control to the highest priority COG to be serviced and not hogging the processor.

    However in the future maybe if the COG was marked as LOCKED (i.e. a video driver reader COG client but not a regular writer COG client), then there could be scope to have the driver skip some re-polling here, assuming free space in the driver for this feature, (maybe...)
