I tried the simple triangle drawing. The time depends on the size: the bigger the triangle, the lower the time per pixel, as expected. For a triangle 100 pixels high and 200 pixels wide at the bottom, I got 7 clocks per pixel for the execList call (which I made PUB so I could call it from the test program).
The biggest test triangle I tried was 512 px high, 1024 px wide and it took about 5 ms to draw, about 6 clocks per pixel.
Now I need a procedure that prepares the list for any given triangle, and it should be as fast as possible (i.e. written in assembly).
The program in the attached video displays some test text to check that the driver works, waits 5 seconds and then draws 256 triangles to test whether drawing in a loop can corrupt the driver (that hasn't happened yet).
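```
sub testtriangle
  var pixels=0
  for j=0 to 255                          ' draw the same triangle 256 times with value j
    pixels=0                              ' count pixels per triangle, not across all 256 passes
    for i=0 to 511                        ' one 4-long list entry per scan line
      list(4*i)=$C000_0000+v.buf_ptr+1024*i+(511-i)  ' request + address of line i, column 511-i
      list(4*i+1)=j                       ' value written
      list(4*i+2)=2*(i+1)                 ' span length: 2*(i+1) pixels
      list(4*i+3)=varptr(list(4*i+4))     ' link to the next entry
      pixels+=2*(i+1)
    next i
    list(4*511+3)=0                       ' terminate the list: the last entry's link is list(2047)
    i=getct()
    psram.execlist(mbox,varptr(list),0)
    ' i/337 converts clocks to microseconds (assumes a 337 MHz clock)
    'i=getct()-i: print i: print i/337 : print i/pixels
  next j
end sub
```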
@Wuerfel_21 said: Finally made it work. I only wrote the converter tool and changed the Spin code to match, but that's already enough to push it to 30 FPS (2 vsync per rendered frame).
Neat. I wonder whether resolutions higher than 320x240 could work if the final image is stored in PSRAM, where there is plenty of space. Does the entire frame buffer need to be rendered into HUB first, or could each scan line (or some smaller portion of the frame) be copied into PSRAM as it is rendered, reducing the HUB RAM requirements? Would you expect bandwidth issues with that?
You could have it write the individual spans to the PSRAM, but that has big overhead and won't work with any sort of masking/transparency. Another option is to render 320x240 quarter screens and combine them into a final 640x480 frame. Will need additional bucket memory for that. Either way filling 4x the pixels + extra overhead -> very slow -> dubiously useful.
@pik33 said:
Now I need a fast procedure to prepare the list for any given triangle and that procedure should be as fast as possible (=asm)
Doing this correctly (i.e. such that adjacent triangles meet their edges exactly, without gaps or overdrawn pixels) is actually not easy, especially when you add sub-pixel precision into the mix (which is needed to avoid jittery animation). For single triangles the correct rasterization tends to look weird.