A display list is fun :) [0.50a beta+0.07g - PSRAM command list used] - Page 2 — Parallax Forums



Comments

  • Wuerfel_21 Posts: 4,374
    edited 2022-01-29 11:49

    DJNZ (and similar instructions) takes 4 cycles (plus a FIFO reload in hubexec) if the branch is taken, and 2 if it falls through.

  • TonyB_ Posts: 2,108
    edited 2022-01-29 18:24

    @pik33 said:
    sub/if_z are 4 clocks, while djnz is also 4 clocks, and reloading the pipeline after the jump may add time. The experiment will show: I have no time left, so if it adds to the loop time, the driver will stop working.

    I wrote this loop several months ago, and I didn't know about the ptrb++ syntax then (I got a hint about it in the audio driver topic and didn't notice I could use it here). This will save 2 (or 3) nops, which could be used for more useful things.

    I think that when you say "nop" what you mean is "two cycles". DJNZ would add two cycles if not zero, so it's not useful here.

    Worst-case loop timing could be reduced by four cycles if this

    p400            cmp     cpl2,lc_char     wz     ' check if switch needed '5
    
            if_nz   skipf   #%111                   ' if not, skip the next 3 instructions; skipf instead of if_nz or jmp saves a nop
                    sub     t2,fontline             ' switch the fontline
                    add     t2,lc_fontline     
                    mov     framebuf2,lc_address    ' update the address
    

    were changed to this

                mov     tmp,fontline            ' tmp = fontline - lc_fontline,
                sub     tmp,lc_fontline         ' precomputed before p400
    
    p400            cmp     cpl2,lc_char     wz     ' check if switch needed '5
    
            if_z    sub     t2,tmp                  ' switch the fontline
            if_z    mov     framebuf2,lc_address    ' update the address
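    A small Python model (not P2 code) can confirm that precomputing tmp = fontline - lc_fontline gives the same t2 and framebuf2 as the original sequence on 32-bit registers. The cycle constants assume that each instruction costs 2 clocks even when its condition cancels it; only the worst case (Z set, nothing skipped) is modeled, since that is the path the four-cycle saving refers to. Function names are hypothetical.

```python
MASK = 0xFFFF_FFFF  # P2 registers are 32 bits wide

def original_p400(cpl2, lc_char, t2, fontline, lc_fontline, framebuf2, lc_address):
    """cmp / if_nz skipf #%111 / sub / add / mov"""
    if cpl2 == lc_char:                  # Z set: skipf is cancelled, all 3 run
        t2 = (t2 - fontline) & MASK
        t2 = (t2 + lc_fontline) & MASK
        framebuf2 = lc_address
    return t2, framebuf2

def reworked_p400(cpl2, lc_char, t2, tmp, framebuf2, lc_address):
    """cmp / if_z sub / if_z mov, with tmp precomputed before p400"""
    if cpl2 == lc_char:                  # Z set: both conditional instructions run
        t2 = (t2 - tmp) & MASK
        framebuf2 = lc_address
    return t2, framebuf2

# Worst case (Z set), assuming 2 clocks per instruction, cancelled or not
ORIGINAL_WORST = 2 * 5   # cmp, if_nz skipf (cancelled), sub, add, mov
REWORKED_WORST = 2 * 3   # cmp, if_z sub, if_z mov
```

    Both functions return identical tuples for the same inputs when tmp = (fontline - lc_fontline) & MASK; the price is that tmp has to be refreshed whenever fontline or lc_fontline changes.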
    
  • pik33 Posts: 2,347

    Fixed in 0.35: a bug that prevented anything from being displayed after a repeat command.

  • May I suggest edge smoothing for the higher scale factors?

    I implemented it with a simple algorithm some time ago; the principle matters more than the exact source. The source is at https://github.com/knivd/C.impl/blob/main/RIDE/graphics.c, function drawChar(), and the fonts are defined in the structure font_t.

    The difference it makes:

  • pik33 Posts: 2,347

    The attached code is the source of many graphic procedures; it can be translated to Basic or Spin and included as high-level procedures for drawing in hi-res graphic modes. I have only a very basic set of such procedures now. But smoothing needs a graphic canvas to draw on.

    In the realtime driver code it cannot be done, at least not with 1 cog, a display list and realtime pixel scaling. Text modes have fonts in an 8x16 or 8x8 matrix and zoom characters in real time using pixel and line multiplication. I have no computing power left in the cog to do realtime smoothing in text modes. Maybe a second cog could smoothly resize characters on demand and feed them to the main driver: the cog I used for the driver has no space or cycles left for this task.

    In graphic modes, big characters can be smoothed if the mode has a high enough resolution. But I can also have, for example, a 240x135 mode at 1 bpp which, while coarse, needs only 4050 bytes of pixel data and 2 longs for the display list; those pixels are big. On a classic 320x200 with border, the resolution may be enough to apply this smoothing.

  • pik33 Posts: 2,347

    The fine scroll command now works in non-zoomed text modes.

  • pik33 Posts: 2,347

    Only one DL command is left on the TODO list: changing the palette on the fly. Then, after cleaning up and commenting, the driver will go beta.

  • pik33 Posts: 2,347
    edited 2022-02-03 15:07

    A lot of strange bugs were fixed in 0.37, and several commands were added to the DL. Now it can do these:

    '' - repeat                 %nnnn_nnnn_nnnn_qqqq_mmmm_mmmm_mmmm_0111    repeat the next dl line n times, after q lines add offset m
    
    '' - set border color       %rrrr_rrrr_gggg_gggg_bbbb_bbbb_0001_0011    set border to rgb
    '' - set border color       %0000_0000_0000_0000_pppp_pppp_0010_0011    set border color to palette entry #p
    '' - set font size          %0000_0000_0000_0000_0000_ssss_0011_0011    ssss = log2 of font height: 3 = 8, 4 = 16
    '' - set font pointer       %aaaa_aaaa_aaaa_aaaa_aaaa_0000_0100_0011
    '' - set live change        %aaaa_aaaa_aaaa_aaaa_nnnn_cccc_cccc_1111    a: 16-bit addr, 00 appended (long aligned, 18 bits), 2 upper bits = 11; nnnn: new font line (char modes); cccc: char position (cpl) at which to change
    '' - set hscroll            %0000_0000_0000_0000_0000_ssss_0101_0011
    '' - set hi bit for live    %0000_0000_0000_bb00_0000_0000_0110_0011    
    
    • reload palette - %mmmm_mmmm_nnnn_nnnn_qqqq_qqqq_qqqq_1011: reload n palette entries starting at color m from palette_ptr+q. This one is still to do, and I have no room for it now. Only 9 longs are left, so cleaning and optimizing are now important.
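    To make the bit layouts above concrete, here is a hypothetical Python encoder for the implemented commands. Field positions are read straight from the comment table; the function names are illustrative, and raw field values (for example the 20-bit font pointer) are packed as given, since how the driver maps them to hub addresses is not spelled out here.

```python
def _fits(value, width, name):
    """Reject values that do not fit in a field of `width` bits."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{name}={value} does not fit in {width} bits")
    return value

def dl_repeat(n, q, m):
    # %nnnn_nnnn_nnnn_qqqq_mmmm_mmmm_mmmm_0111
    return (_fits(n, 12, "n") << 20) | (_fits(q, 4, "q") << 16) | (_fits(m, 12, "m") << 4) | 0b0111

def dl_border_rgb(r, g, b):
    # %rrrr_rrrr_gggg_gggg_bbbb_bbbb_0001_0011
    return (_fits(r, 8, "r") << 24) | (_fits(g, 8, "g") << 16) | (_fits(b, 8, "b") << 8) | 0x13

def dl_border_palette(p):
    # %0000_0000_0000_0000_pppp_pppp_0010_0011
    return (_fits(p, 8, "p") << 8) | 0x23

def dl_font_size(s):
    # %0000_0000_0000_0000_0000_ssss_0011_0011 (3 = 8 lines, 4 = 16 lines)
    return (_fits(s, 4, "s") << 8) | 0x33

def dl_font_pointer(a):
    # %aaaa_aaaa_aaaa_aaaa_aaaa_0000_0100_0011 (raw 20-bit field)
    return (_fits(a, 20, "a") << 12) | 0x43

def dl_live_change(a, n, c):
    # %aaaa_aaaa_aaaa_aaaa_nnnn_cccc_cccc_1111
    return (_fits(a, 16, "a") << 16) | (_fits(n, 4, "n") << 12) | (_fits(c, 8, "c") << 4) | 0b1111

def dl_hscroll(s):
    # %0000_0000_0000_0000_0000_ssss_0101_0011
    return (_fits(s, 4, "s") << 8) | 0x53

def dl_live_hibits(b):
    # %0000_0000_0000_bb00_0000_0000_0110_0011
    return (_fits(b, 2, "b") << 18) | 0x63
```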
  • pik33 Posts: 2,347

    I have to do something with those high-level functions; they are way too slow. Displaying 4 short lines of text on a graphic canvas takes 8 ms, which is 100x too long for this task. I have to use inline assembler and inline the low-level functions.

  • pik33 Posts: 2,347
    edited 2022-03-09 13:24

    0.38.

    PSRAM enabled!

    A display list command %aaaa_aaaa_aaaa_aaaa_aaaa_nnnn_0111_0011 switches between PSRAM and HUB RAM.
    If a=0 then the driver uses HUB as before for display data
    If a<>0 then a is the line buffer address in the hub and n is the line buffer length in 64-byte units, where 0 = 64 bytes and $F = 1024 bytes
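    The PSRAM switch command and its line buffer length can be sketched the same way. This assumes the 20-bit a field holds the hub address of the line buffer directly; the helper names are hypothetical.

```python
def _fits(value, width, name):
    """Reject values that do not fit in a field of `width` bits."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{name}={value} does not fit in {width} bits")
    return value

def dl_psram_switch(hub_buffer, n):
    """%aaaa_aaaa_aaaa_aaaa_aaaa_nnnn_0111_0011

    hub_buffer == 0: display data comes from HUB as before.
    hub_buffer != 0: PSRAM mode; hub_buffer is the HUB line buffer address
    (assumed to be stored directly in the 20-bit field).
    """
    return (_fits(hub_buffer, 20, "hub_buffer") << 12) | (_fits(n, 4, "n") << 8) | 0x73

def line_buffer_bytes(n):
    """n counts 64-byte units minus one: 0 -> 64 bytes, $F -> 1024 bytes."""
    return (_fits(n, 4, "n") + 1) * 64

def n_for_line_buffer(length):
    """Inverse of line_buffer_bytes; length must be 64..1024 in 64-byte steps."""
    if length % 64 or not 64 <= length <= 1024:
        raise ValueError("line buffer must be 64..1024 bytes in 64-byte steps")
    return length // 64 - 1
```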

    If PSRAM is enabled, the addresses in the DL become PSRAM addresses shifted right by 7 (SHR #7, a 128-byte pitch), so 18 bits of DL address can reach all 32 MB of PSRAM in the 4-chip variant (which I don't have yet).
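    The 18-bit / 32 MB figure follows from the SHR #7: 2^18 pitch units of 128 bytes each is 2^25 bytes. A minimal Python sketch of the conversion (hypothetical helper names), which also shows that only 128-byte aligned PSRAM addresses can be expressed:

```python
PITCH_SHIFT = 7      # DL addresses are PSRAM addresses SHR #7 (128-byte pitch)
DL_ADDR_BITS = 18    # width of the DL address field

def dl_address(psram_addr):
    """PSRAM byte address -> DL address (must be 128-byte aligned)."""
    if psram_addr % (1 << PITCH_SHIFT):
        raise ValueError("PSRAM address is not 128-byte aligned")
    return psram_addr >> PITCH_SHIFT

def psram_address(dl_addr):
    """DL address -> PSRAM byte address."""
    return dl_addr << PITCH_SHIFT

# 2**18 units of 128 bytes = 2**25 bytes = 32 MB (the 4-chip variant)
REACH = (1 << DL_ADDR_BITS) << PITCH_SHIFT
```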

    The 0111 (PSRAM switch) command can be used before any display line, so interleaving is possible (several lines from PSRAM, several lines from HUB).

    Live change will work but after the switch the rest of the line data will be loaded from the HUB as before. Switching from one PSRAM address to another is not possible (yet?)

    All these changes and experiments made a big mess of the driver code, which now needs to be cleaned up. Also, new high-level functions have to be written to set modes and to print or draw in PSRAM.

    The driver is available here https://github.com/pik33/P2-retromachine/tree/main/Propeller/Videodriver.

    Now... cleaning time. The next version will be cleaned and commented.

  • You will have fun with this memory :smile:

  • pik33 Posts: 2,347
    edited 2022-03-09 14:41

    It is incredibly fast! At 896x496 active resolution it can redraw the whole screen in one frame, and this is only the 4-bit bus version.

    And I don't wait for the command to complete. I simply start the transfer and point the driver to the hub RAM buffer. The buffer is filled faster than the driver consumes the bytes, so no double buffering or preloading is needed. A QoS setting was needed to keep the picture stable while writing to PSRAM; I copied this setting from the VGA video demo.

    The DL interpreter has now become too long for borderless graphics: at 1024x576 it needs at least a 12-pixel left/right border or there is no picture. Maybe after cleaning I will find a way to display the full 1024x576 screen too.

    This is of course great fun. After these experiments I know I cannot make a 32-bit version usable at high speed: the wires would be way too long. Maybe I will try a 3-chip version on one 8-pin slot if I manage to learn how to configure the PSRAM driver for that.

  • rogloh Posts: 5,122
    edited 2022-03-09 14:58

    @pik33 said:
    It is incredibly fast! At 896x496 active resolution it can redraw the whole screen in one frame, and this is only the 4-bit bus version.

    Cool

    And I don't wait for the command to complete. I simply start the transfer and point the driver to the hub RAM buffer. The buffer is filled faster than the driver consumes the bytes, so no double buffering or preloading is needed. A QoS setting was needed to keep the picture stable while writing to PSRAM; I copied this setting from the VGA video demo.

    Yes, that is how to do it. You may need more burst size tweaks once audio transfers are included. You can also stop servicing the unnecessary COGs, which speeds it up slightly by reducing polling overhead. My code dynamically adjusts its polling loop depending on the number of COGs allowed to make requests, and this can be changed on the fly.

    This is of course great fun. After these experiments I know I cannot make a 32-bit version usable at high speed: the wires would be way too long. Maybe I will try a 3-chip version on one 8-pin slot if I manage to learn how to configure the PSRAM driver for that.

    Yes, that can be done with the larger full-blown driver and some bank mapping. However, you will need to manage the banks individually, as transfers crossing bank boundaries are not currently contiguous in the SPIN2 APIs (such transfers would need to be split into separate portions, which gets more complex with the special graphics copies etc.). Addresses wrap within a bank during transfers. It is best to keep each frame buffer within a single bank.
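    Splitting a transfer at bank boundaries, as described above, might look like the following. This is a hypothetical Python helper, not part of the PSRAM driver's SPIN2 API, and it assumes 8 MB banks (one 64 Mbit PSRAM chip per bank, as on the 8 MB single-chip setup mentioned in this thread).

```python
BANK_SIZE = 8 * 1024 * 1024   # assumption: one 64 Mbit (8 MB) PSRAM chip per bank

def split_by_bank(addr, length, bank_size=BANK_SIZE):
    """Split [addr, addr + length) into (bank, offset, count) pieces.

    No piece crosses a bank boundary, so each can be issued as its own
    transfer instead of silently wrapping around inside one bank.
    """
    pieces = []
    while length > 0:
        bank, offset = divmod(addr, bank_size)
        count = min(length, bank_size - offset)
        pieces.append((bank, offset, count))
        addr += count
        length -= count
    return pieces
```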

  • pik33 Posts: 2,347

    Full screen now works when done from PSRAM, with a 576 kB frame buffer. There was no timing problem; I had simply forgotten about the too-big framebuffer. As an experiment I shortened the screen to 400 lines and the borderless version started.
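    The arithmetic behind the hub limit: at 8 bpp a 1024x576 frame is 589,824 bytes (576 KiB), more than the P2's 512 KB of hub RAM even before code and other data are counted, while 400 lines (409,600 bytes) fit. A quick Python check (the helper name is hypothetical):

```python
HUB_RAM_BYTES = 512 * 1024   # P2 hub RAM, shared with code and other data

def framebuffer_bytes(width, height, bpp):
    """Bytes needed for a packed framebuffer."""
    return width * height * bpp // 8

FULL = framebuffer_bytes(1024, 576, 8)    # 589,824 bytes = 576 KiB: too big for hub
SHORT = framebuffer_bytes(1024, 400, 8)   # 409,600 bytes: fits
TINY = framebuffer_bytes(240, 135, 1)     # 4,050 bytes, the 1 bpp mode mentioned earlier
```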

    The plan now is: (1) clean, (2) write high-level functions for PSRAM, (3) use them in the player, (4) make a graphics accelerator cog which can do lines, circles, etc., and sprites... I noticed that graphics blitting is already implemented in the PSRAM driver.

  • @pik33 said:
    The plan now is: (1) clean, (2) write high-level functions for PSRAM, (3) use them in the player, (4) make a graphics accelerator cog which can do lines, circles, etc., and sprites... I noticed that graphics blitting is already implemented in the PSRAM driver.

    Yes, graphics transfers are supported, as well as an experimental line drawing accelerator in the PSRAM driver itself (which can only do 8, 16 and 24 bpp modes, not yet 1, 2 or 4 bpp). Line drawing can be accelerated because the driver can compute the pixels itself, which saves issuing individual pixel write requests; those could be slightly less efficient depending on the current burst settings. But graphics drawing from other COGs to PSRAM is still quite fast, as you've already seen. I'm looking at other accelerations, like font drawing in graphics modes, which can be slow if implemented inefficiently. Burst transfers from hub are the way to go instead of individual pixel accesses, wherever possible.
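    For reference, the classic integer Bresenham algorithm shows why a driver-side line command is cheap: every pixel position comes from additions and comparisons, so the driver can generate them itself instead of receiving one request per pixel. This is a generic Python sketch of Bresenham's algorithm, not the code in the PSRAM driver.

```python
def bresenham(x0, y0, x1, y1):
    """Integer-only line rasterisation for all octants, endpoints included."""
    points = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:        # step in x
            err += dy
            x0 += sx
        if e2 <= dx:        # step in y
            err += dx
            y0 += sy
    return points
```

    A line of len(points) pixels then costs one command to the driver instead of len(points) single-pixel requests.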

  • pik33 Posts: 2,347

    0.39

    This is a cleanup version before 3 things I have to do:

    (1) removing a bug which prevents the DL from setting display parameters while changing the color depth (if you change the color depth with the DL, garbage is displayed)
    (2) enabling PSRAM based standard mode setting and adding PSRAM support to the graphic/text functions (= make PSRAM usable)
    (3) forking the driver and creating its simplified PSRAM based graphics only version - with sprites

    Why?

    I simply don't have room in COG RAM for more code, and even if I can free some space, sprites are not possible with the current driver structure
    PSRAM is big and fast enough, even the 8 MB 1-chip version, that text and reduced color depth modes are not needed, and they eat a lot of COG RAM
    So I want to leave only 8 bpp graphics, with a simplified DL, to free room for several sprites.

  • pik33 Posts: 2,347
    edited 2022-03-18 16:00

    0.42a

    A display list in standard modes now uses repeat commands, so up to 9 longs are enough to define a simple mode with borders, and only 4 longs are needed for a borderless mode, instead of 1 long per line.
    I managed to move the hsync call before the DL interpreter starts. This gives the interpreter much more time to process several additional DL commands before it starts displaying a line
    Text modes in PSRAM are now supported with all implemented text functions. To set PSRAM based modes, bit #9 (=1024) has to be set when calling setmode()
    In text modes with BASIC, it is possible to

    open SendRecvDevice(@v.putchar, nil, nil) as #0

    and then simply use "print something" in BASIC, and it will be printed on the HDMI monitor instead of a serial port.
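    One possible reading of the repeat command (%nnnn_nnnn_nnnn_qqqq_mmmm_mmmm_mmmm_0111: repeat the next DL line n times, adding offset m after every q lines) is sketched below in Python. It assumes q >= 1 and that m is added in the same units as the line address; the driver's exact handling may differ, and the function name is hypothetical. It shows why a whole screen needs only a few DL longs instead of one per line.

```python
def expand_repeat(line_addr, n, q, m):
    """Addresses of the n display lines produced by one repeat command.

    Assumed semantics: the line at line_addr is shown n times, and after
    every q lines the address advances by m.
    """
    if q < 1:
        raise ValueError("this sketch assumes q >= 1")
    return [line_addr + (i // q) * m for i in range(n)]
```

    Under this reading, q = 2 shows every source line twice (vertical zoom), while q = 1 with m equal to one line's size (in the driver's units) steps through a linear framebuffer.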

    Next, I have to prepare DLs for the PSRAM-based graphic modes and adapt the graphic functions to them.

    I moved the driver developing to its own directory - https://github.com/pik33/P2-retromachine/tree/main/Propeller/Videodriver develop - there is "videotest.bas" to test the driver.

  • pik33 Posts: 2,347

    0.42c

    PSRAM is fully supported now. Graphic functions need optimizing and speeding up where possible. I cannot do much with putpixel, which is a one-byte-at-a-time transfer (>500 clocks), but boxes and filled circles at 8bpp can be sped up by using bursts instead of putpixel calls.
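    The burst idea for filled circles can be sketched as follows: compute one horizontal span per row and write each span in a single operation, instead of one putpixel call per pixel. This is a hypothetical Python illustration on a bytearray framebuffer; in the driver each slice write would correspond to one burst write to PSRAM or a bytefill in HUB. It assumes the circle lies fully on screen (no clipping).

```python
from math import isqrt

def circle_spans(cx, cy, r):
    """One (y, x_start, x_end) span per row of a filled circle of radius r."""
    spans = []
    for dy in range(-r, r + 1):
        dx = isqrt(r * r - dy * dy)      # widest x with dx*dx + dy*dy <= r*r
        spans.append((cy + dy, cx - dx, cx + dx))
    return spans

def fill_circle(fb, width, cx, cy, r, color):
    """Fill an on-screen circle in an 8bpp framebuffer, one slice write per row.

    Returns the number of writes (2*r + 1), versus one putpixel per pixel.
    """
    spans = circle_spans(cx, cy, r)
    for y, x0, x1 in spans:
        start = y * width + x0
        length = x1 - x0 + 1
        fb[start:start + length] = bytes([color]) * length
    return len(spans)
```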

    The DL bug (changing bpp not possible) is still here and is the next thing to fix in the driver

  • pik33 Posts: 2,347
    edited 2022-03-19 20:07

    0.42f

    It seems the known bugs in the PASM part are fixed. One long still has to be freed to make it fit: it is 497 longs now.
    The last planned but unimplemented DL command is now implemented, with simpler functionality than planned (no cog RAM left). A DL command can now reload 16 palette entries, selected in blocks of 16.
    This enables changing the palette on the fly and using more than 256 colors on an 8-bit screen (or more than 16 colors on a 4-bit screen, where the full palette can be switched on every line). The 16-entry limit is a timing limit: if too many longs have to be read (RDLONG) into LUT RAM, it takes too long and synchronization is lost.
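    A rough Python model of per-line 16-entry palette reloads shows how more than 256 colors can appear over one 8-bit frame. The function names, the block numbering (entries block*16 to block*16+15) and the color values are assumptions for illustration; the actual DL command encoding is not given in this post.

```python
def reload_block(lut, block, colors):
    """Replace LUT entries block*16 .. block*16+15 with 16 new colors."""
    if len(lut) != 256 or not 0 <= block < 16 or len(colors) != 16:
        raise ValueError("need a 256-entry LUT, block 0..15 and 16 colors")
    lut[block * 16:(block + 1) * 16] = colors

def colors_over_frame(base_palette, per_line_reloads):
    """Count distinct colors available on at least one line of the frame.

    per_line_reloads holds one (block, colors) pair per display line, standing
    in for a DL palette reload command placed before that line.
    """
    lut = list(base_palette)
    seen = set(lut)
    for block, colors in per_line_reloads:
        reload_block(lut, block, colors)
        seen.update(colors)
    return len(seen)
```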

    Now I have to cleanup high level procedures and speed optimize 8 bpp graphics

  • pik33 Posts: 2,347
    edited 2022-03-21 15:55

    0.43
    The 8bpp filled boxes and circles are now fast, using burst writes to PSRAM or bytefill to HUB
    putchar now works in graphic modes, enabling BASIC's print in all modes. I still have to write scrollup/scrolldown for these modes to complete the "print enable" task
    I also have to find and remove something that causes non-repeatable glitches when changing graphic modes: sometimes the new mode doesn't look correct.
    Edit: there is something wrong with the additional repeat DL command...

    Edit: the glitches are now stabilized (?) by setting an empty DL before switching modes. The repeat DL command error is fixed, but the driver is now 498 longs
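    The missing scrollup for graphic modes amounts to moving the framebuffer up by one text line (the font height in pixel lines) and clearing the bottom. A minimal Python sketch on an 8bpp bytearray, with a hypothetical function name; a PSRAM version would do the same with burst reads and writes.

```python
def scroll_up(fb, width, height, rows, fill=0):
    """Scroll an 8bpp framebuffer up by `rows` pixel lines, clearing the bottom.

    For text printed in a graphic mode, rows would be the font height (8 or 16).
    """
    rows = min(rows, height)
    shift = rows * width
    total = width * height
    fb[0:total - shift] = fb[shift:total]            # move everything up
    fb[total - shift:total] = bytes([fill]) * shift  # blank the freed lines
```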

  • pik33 Posts: 2,347
    edited 2022-03-27 11:43

    It is time to go beta

    https://github.com/pik33/P2-retromachine/tree/main/Propeller/Videodriver

    Edit: several things fixed (there was an old version of putchar2); 0.50a (the driver .spin2 file only) added here

  • pik33 Posts: 2,347
    edited 2022-03-30 18:39

    (3) forking the driver and creating its simplified PSRAM based graphics only version - with sprites

    The first sprite test is complete :) (use videotest2 from the attached zip). I removed nearly everything from 0.50; only 1024x576 @ 8bpp from PSRAM is left.

    The driver can now display 64x64 8bpp sprites. For now they are HUB based, but these 64x64 animated sprites are too big for the P2 hub, so the next step will be to remove the HUB-based animation and reload a static sprite from PSRAM during vblank. One static sprite is 4 kB, so 16 of them will use up to 64 kB of HUB RAM. Maybe I can also use variable sizes. As this is work in progress, the code is messy again :)
    I have only 32x32 definitions of these balls, so they are upscaled here.

    Edit:

    WTF???

    I had a working driver and demo: I pushed it to GitHub, went home and pulled it down. It doesn't work anymore :'(

    I had this kind of bug earlier in another program. It depended on something from its previous version that stayed in P2 RAM until the board was powered off. After the hard reset the program stopped working...

    Edit: WTF2??? The monitor says it is 2048x576@50 Hz.

  • evanh Posts: 15,126
    edited 2022-03-31 04:06

    Yes, sometimes I've seen way wrong h-resolution with DVI/HDMI link. Didn't try to nail it down but I think it's when blanking limits are exceeded.

    Blanking limits, as a fixed pixel/line count, are very much a thing with digital links. And very arbitrary too. Like the VGA days when 30 kHz scan rate was an arbitrary minimum. Although you could count on that number at least being consistently always 30 kHz, whereas blanking limits are different for each model now. And no way to report them, even if they wanted to. EDID has no descriptor for blanking ranges explicitly.

    Bring on VRR's inherently wider limits I say.

  • pik33 Posts: 2,347
    edited 2022-03-31 06:13

    The problem is: I didn't change timings. At least I didn't intend to change timings. There are 4 versions of the driver I made yesterday, and only 001 works at home. I will check it again at the university, where it worked yesterday. Maybe I removed the loading of something important into memory, which had been loaded earlier and remained after soft resets. If so, it is not the first time I have done this. In one of my previous video drivers I removed the upload of the palette to LUT RAM while cleaning the code. The driver worked until the P2 board was powered off, as the palette was already in LUT RAM.

  • evanh Posts: 15,126
    edited 2022-03-31 08:10

    But you changed monitor, right? I'm talking about each monitor/TV limits. Each model has different limits on min/max blankings over digital links.

  • pik33 Posts: 2,347

    Yes, I changed monitors. I have 2 at home and neither works, but version 001 works on both, so if it is a timing problem, I must have accidentally changed the timings between 001, which works, and 002, which doesn't.

    Now, at the university, it works on 3 different monitors.

    I have changed the timings slightly, but I cannot test any further here because it simply works with the monitors I have here, so I will continue cleaning the code; maybe I will find the bug along the way. This time I will put the code and the working binary on a pendrive instead of pulling them from GitHub.

  • evanh Posts: 15,126

    Have you got a version setup for hubRAM or HyperRAM accessory? I'd be able to test either.

  • pik33 Posts: 2,347
    edited 2022-03-31 08:17

    This version is for PSRAM; it uses Rogloh's memory driver through its mailbox interface. Hub RAM cannot fit a 1024x576 8bpp framebuffer...

  • @evanh said:
    Have you got a version setup for hubRAM or HyperRAM accessory? I'd be able to test either.

    @evanh, you need to get yourself some PSRAM so you can use that as well as HyperRAM with the P2. If you can source some QSPI PSRAM chips from anywhere these days, I can probably mail you a blank PCB you can use for the P2-Eval if you'd like as I have a spare board remaining. It can take from 1 up to 8 PSRAM chips, though I've only tested it with 4 chips populated. The CLK and CE pins for each bank are separate but can be combined in the driver for parallel/wider data buses. Being SOP8 parts they are pretty easy to solder.

  • evanh Posts: 15,126

    Doesn't have to be the same demo at 8bpp. Just wanting to test your display mode. So it can be a text demo instead.

    I'm sunk with where to find config in the source code. Too much to look through and don't have the hardware to test it.
