3D teapot demo
Wuerfel_21
in Propeller 2
Here are some fruits of what I've been working on: a teapot model with 2464 triangles, a 256x256 sphere environment texture and vertex AO, rendered to 320x240 16bpp at 20 FPS
This is mostly a demo of the optimized and correct 4-cog triangle rasterizer - the transform and setup is currently unoptimized. I believe 60 FPS will be achieved for this demo when I implement optimized geometry processing (currently raster takes only ~10% of the frame time!). Also currently Z-sort is implemented using linked lists in hub RAM, which is unwieldy. I need to replace it with big command blocks stored to PSRAM...
If you want to run it:
- You need a board with PSRAM (only for framebuffering)
- use very recent flexspin
- The pin settings likely don't match your board, but it's almost 4AM and I'm too lazy to dig out the EDGE board now
- Video driver in use is jaunty, don't use video modes other than 'HDMI' or 'VGA2X'
- Keyboard (or gamepad) lets you interact:
  - Tab to toggle texture/solid mode
  - Arrow keys and S/D to rotate manually
  - Space to resume spinning automatically
  - Esc to reset rotation
zip
258K
Comments
This looks amazing Ada, looking forward to running it
Yes, this looks really neat!
for the EDGE P2-EC32MB, this works for PSRAM cfg:
Looks awesome...can't believe this is running on a microcontroller
Got it rotating with an ADXL345, kind of like Adafruit demos with their IMUs (https://www.adafruit.com/product/2472, 1st thumbnail), except it looks like theirs runs on a Mac, whereas this is nice and self-contained!
Being a total noob to 3D, what sort of scale are the pitch, roll, etc? I just scaled up my accelerometer data until it worked, but obviously that's not very scientific.
Nice @Wuerfel_21
Combining with IMU is cool too @avsa242
BTW: Guess forgot about this way of overriding object settings:
exmem : "exmem_mini" | PSRAM_DELAY = 11, MEMORY_TYPE = 8, PSRAM_CLK = 40 addpins 1, PSRAM_SELECT = 42, PSRAM_BASE = 32, PSRAM_BANKS = 6
Do all the spin2 compilers support this, or just flexprop?
Have to remember this one...
This is only the garbage proof-of-concept version...
In the end the graphics library should be able to render full 3D worlds at 30 FPS or more.
They're 32-bit binary angles, so 2^32 is a full rotation. This is just the native format of the P2 rotate instructions.
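For anyone else scaling sensor data by trial and error, here is a minimal host-side sketch of the conversion, assuming only what the post above states (2^32 units per full turn). The function names are made up for illustration and are not part of the demo.

```python
FULL_TURN = 1 << 32  # 2^32 units = one full rotation, per the post above

def degrees_to_binary_angle(deg: float) -> int:
    """Convert degrees to a 32-bit binary angle, wrapping like an unsigned 32-bit register."""
    return round(deg / 360.0 * FULL_TURN) & 0xFFFFFFFF

def binary_angle_to_degrees(angle: int) -> float:
    """Convert a 32-bit binary angle back to degrees in the range [0, 360)."""
    return (angle & 0xFFFFFFFF) * 360.0 / FULL_TURN

print(hex(degrees_to_binary_angle(90)))   # 0x40000000
print(hex(degrees_to_binary_angle(-90)))  # 0xc0000000
print(hex(degrees_to_binary_angle(360)))  # 0x0
```

A nice property of this format is that wrap-around is free: adding two angles and keeping the low 32 bits always gives a valid angle.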
Yes, this is universal
This will be really interesting for live data visualization.
Nice one @Wuerfel_21 👏
Got it working here with PSRAM. I wonder what sort of 3D games could be developed with such a capability. Are there any existing older open source games that might suit this sort of performance level and could then run on a P2, or would something using it need to be home grown? I know it's only 2400*20 triangles/sec, so it's not some super HW accelerated thing, but maybe that's still enough for something reasonably simple without too many triangles. Something like those old mechwarrior games perhaps.
This parameter override approach should help me improve my multiple external drivers wrapper code to select memory type when I next update the code for release. Ideally the lower level driver wrapper could also cull the unused memory driver objects based on some #ifdef style macro derived from these parameters so as not to incur a memory footprint penalty. I think that could already be done with flexspin but I'm still waiting for PNut to include some sort of conditional code inclusion before heading down that path...
The current bottleneck is the super unoptimized geometry handling; the triangle fill can go toe-to-toe with (very) early accelerators. 56 P2 cycles per pixel, IIRC it gets to 60-70 with overhead factored in. That's ~16 cycles globally with 4 cogs in parallel. Though the optimized geo code will eat into the same cycle budget, along with any audio (all other cogs are taken)
(btw, the demo only runs at 252MHz, so "free" 25% improvement going to 320MHz)
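To put those figures in perspective, here is a rough back-of-the-envelope sketch of how long it would take to fill every pixel of the 320x240 frame once. It assumes 65 cycles per pixel per cog (the midpoint of the quoted 60-70 range), perfect scaling across the 4 raster cogs, and no overdraw; none of this is measured from the demo.

```python
WIDTH, HEIGHT = 320, 240         # render resolution from the first post
CYCLES_PER_PIXEL_ONE_COG = 65    # assumption: midpoint of the quoted 60-70 range
RASTER_COGS = 4                  # the rasterizer uses 4 cogs

def fill_time_ms(clock_hz: float, pixels: int = WIDTH * HEIGHT) -> float:
    """Time to fill `pixels` pixels once, assuming ideal scaling across cogs."""
    effective_cycles = CYCLES_PER_PIXEL_ONE_COG / RASTER_COGS
    return pixels * effective_cycles / clock_hz * 1000.0

for mhz in (252, 320):
    print(f"{mhz} MHz: {fill_time_ms(mhz * 1e6):.2f} ms per full-screen fill")
```

At 252MHz this comes out just under 5 ms, which is in the same ballpark as the "~10% of the frame time" figure from the first post (a 20 FPS frame is 50 ms).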
I can't think of any existing game that could be easily fitted. Most 3D games use floating point and a lot of RAM - imagine your average mid-90s low end PC, probably has a Pentium (faster at floats than ints!) and between 8 and 32MB of directly addressable RAM.
Yeah that's probably the sort of machine & game era I was thinking about. Those 3d polygon style games from back then. I guess without floating point there's a bit of a limitation there. Need to use fixed point/integer math and leverage HW multiply wherever possible.
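As an illustration of the fixed-point idea, here is a minimal sketch using a 16.16 format. The 16.16 split is my assumption; the thread does not say which format the renderer uses. The pattern is the usual one: take the full-width product, then shift back down by the number of fraction bits.

```python
FRAC_BITS = 16           # assumption: 16.16 fixed point
ONE = 1 << FRAC_BITS     # fixed-point representation of 1.0

def to_fixed(x: float) -> int:
    """Convert a float to 16.16 fixed point (rounded to nearest)."""
    return round(x * ONE)

def to_float(a: int) -> float:
    """Convert 16.16 fixed point back to a float."""
    return a / ONE

def fixed_mul(a: int, b: int) -> int:
    """Multiply two 16.16 values: full-width product, then arithmetic shift right."""
    return (a * b) >> FRAC_BITS

print(to_float(fixed_mul(to_fixed(1.5), to_fixed(2.0))))    # 3.0
print(to_float(fixed_mul(to_fixed(-0.5), to_fixed(0.5))))   # -0.25
```

On real hardware the intermediate product needs 64 bits (or a multiply that returns the high half), which is where the hardware multiplier earns its keep.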
Doom?
I, Robot would work, right?
https://en.m.wikipedia.org/wiki/I,_Robot_(video_game)
Tested this at last. Works on EC32 with USB at 16 and HDMI at 0 - my standard setup, so only the "exmem" line had to be modified. The aspect ratio on my monitor is strange in either setting I can choose: "wide" seems to be too wide, 4:3 is too narrow. The monitor reports 720x480 on HDMI input.
And the program outputs something on the serial terminal.
The garbolium video driver I lazily grabbed off the "old projects" pile can't do a true 640 mode without borders (not enough time after processing the last pixel), so 720 it is. I should hook up the newer one for 16bpp PSRAM operation, but not today.
It's at 2000000 baud - it prints end-to-end frame time and raster µcode work time
Unrelatedly, next step is to figure out how to turn models into command streams that can be easily worked on in parallel. i.e. each command has some large N of like items to process and it can be split such that each cog handles ~N/4 of them.
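The "each cog handles ~N/4" split can be sketched as below. This is only an illustration of the partitioning arithmetic; `split_work` is a hypothetical name and the actual command format is not described in the thread.

```python
def split_work(n_items: int, n_workers: int = 4) -> list[tuple[int, int]]:
    """Return one (start, count) range per worker; counts differ by at most one."""
    base, extra = divmod(n_items, n_workers)
    ranges = []
    start = 0
    for worker in range(n_workers):
        # The first `extra` workers each take one leftover item.
        count = base + (1 if worker < extra else 0)
        ranges.append((start, count))
        start += count
    return ranges

print(split_work(10))  # [(0, 3), (3, 3), (6, 2), (8, 2)]
```

Contiguous ranges keep each worker's memory accesses sequential, which should suit block transfers better than interleaving items.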
The current idea is to have a buffer of 256 transformed/lit/etc vertices (stored in the at-that-point unused framebuffer area).
For a small model, all vertices can be transformed in one command and then the next command builds all the triangles.
For a larger model, this needs to be (smartly) split into multiple chunks. (see also: https://www.researchgate.net/publication/6979989_An_improved_vertex_caching_scheme_for_3D_mesh_rendering )
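A naive way to do the splitting, as a baseline: walk the triangles in order and start a new chunk whenever the next triangle would push the chunk past the buffer capacity. This is a plain greedy sketch, not the scheme from the linked paper, and `chunk_triangles` is a hypothetical offline tool function rather than anything from the demo.

```python
BUFFER_SIZE = 256  # vertex buffer capacity mentioned above

def chunk_triangles(triangles, capacity=BUFFER_SIZE):
    """Greedily group indexed triangles so each chunk uses at most `capacity` unique vertices.

    Returns a list of (vertex_ids, local_triangles). vertex_ids lists the model
    vertices to transform for the chunk, in buffer-slot order; local_triangles
    hold indices into that list. `capacity` must be at least 3.
    """
    chunks = []
    slot_of = {}     # model vertex id -> slot in the current chunk's buffer
    local_tris = []
    for tri in triangles:
        new_verts = {v for v in tri if v not in slot_of}
        if len(slot_of) + len(new_verts) > capacity:
            # Buffer would overflow: close this chunk and start a fresh one.
            chunks.append((list(slot_of), local_tris))
            slot_of, local_tris = {}, []
        for v in tri:
            slot_of.setdefault(v, len(slot_of))
        local_tris.append(tuple(slot_of[v] for v in tri))
    if local_tris:
        chunks.append((list(slot_of), local_tris))
    return chunks

# Tiny example with capacity 4: vertex 3 ends up transformed in both chunks.
print(chunk_triangles([(0, 1, 2), (2, 1, 3), (3, 4, 5)], capacity=4))
```

Vertices shared across a chunk boundary get transformed more than once, so the quality of the split depends heavily on triangle order; that is the part a smarter scheme would improve.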
Interesting: With this approach simple animation skinning is essentially free, since you can load matrix A, transform some verts, load matrix B, transform some more, then draw triangles that span the gap. An idea with legs and feet.
I'm not sure what to do with texture UVs though: If they are processed alongside the vertex, that will cause a lot of duplicates (same position, different UV). But Envmapping (as shown in the demo) requires this - UV is made up from vertex normal. Could separate position from UV/lighting, but then lighting gets into trouble when it depends on position (i.e. fog/depth-cue). Not that transforming a position is super expensive to begin with, but it also reduces the efficiency of the buffer.
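For readers new to environment mapping, here is a simplified sketch of why the UV has to be computed per vertex: it is derived from the transformed normal, so it changes whenever the model rotates. The mapping below (view-space normal x/y scaled into texel range) is a common simplification and an assumption on my part; the demo's actual formula is not given in the thread.

```python
import math

TEX_SIZE = 256  # environment texture is 256x256 per the first post

def envmap_uv(nx: float, ny: float, nz: float) -> tuple[int, int]:
    """Map a non-zero view-space normal to sphere-map texel coordinates.

    Simplified mapping (an assumption): normalize, then scale x/y from
    [-1, 1] into [0, TEX_SIZE - 1]. +y in view space maps to the top row.
    """
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    nx, ny = nx / length, ny / length
    u = int((nx * 0.5 + 0.5) * (TEX_SIZE - 1))
    v = int((0.5 - ny * 0.5) * (TEX_SIZE - 1))
    return u, v

print(envmap_uv(0.0, 0.0, 1.0))  # (127, 127): facing the viewer hits the texture center
print(envmap_uv(1.0, 0.0, 0.0))  # (255, 127): facing right hits the right edge
```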
Very cool demo, Ada!
If you thought this would be fun but don't have a board with PSRAM, try this version. It sends the output images as a USB Video Class device. The PSRAM is only used to store completed frames for the display driver. That prevents some pretty bad flickering. The JPEG artifacts are pretty visible with this kind of source material. The UVC output cuts the framerate in half since the rendering is stopped while the frame is JPEG encoded and sent via USB. There aren't enough cogs or on-chip memory to run both at the same time. I think one cog does double duty as a render cog and JPEG encoder.
@SaucySoliton neat!
Eventually PSRAM will store basically everything (models, textures and raster command buckets), so that trick won't work anymore...
How does this compare with small3dlib?
Better one would presume?
https://forums.parallax.com/discussion/172200/3d-graphics-with-small3dlib-and-flexc