Prop2 Texture Mapping — Parallax Forums

Prop2 Texture Mapping

cgracey Posts: 14,133
edited 2013-03-27 17:25 in Propeller 2
I got the texture mapping documented. It will undergo some refinement in the coming days, but all the information is there. There's also an example program:

Prop2_Docs.zip

I would probably never have known how to formulate a simple texture mapper, but Andre LaMothe spent a lot of time explaining it to me, and then Roy Eltham came around later and added pixel-based alpha blending and mirroring.

If any of you guys are very versed in 3D graphics, there should be all the info you need here to implement a 3D rendering engine.

Comments

  • Bill Henning Posts: 6,445
    edited 2013-03-13 19:04
    Thanks for the updated docs!

    The texture mapping sounds great.
    cgracey wrote: »
    I got the texture mapping documented. […]
  • Sapieha Posts: 2,964
    edited 2013-03-13 19:19
    Hi Chip.

    Thanks.

    Do you maybe have values to change NTSC to PAL?


    cgracey wrote: »
    I got the texture mapping documented. […]
  • Tubular Posts: 4,622
    edited 2013-03-13 19:25
    Wow!

    Thanks for the timely example. I've been wrestling with mode F and trying to get the data into the stack ram in between waitvids, and this example shows exactly how to do it. I had been going down a multi threaded path but this is much better. Very nice.
  • cgracey Posts: 14,133
    edited 2013-03-13 19:59
    Sapieha wrote: »
    Hi Chip.

    Thanks.

    Do you maybe have values to change NTSC to PAL?

    I'll make a VGA example, since that will be sharpest and work everywhere.
  • potatohead Posts: 10,254
    edited 2013-03-13 20:06
    While you wait, it's also possible to take the changes I made to get component NTSC and run this example unchanged otherwise. (I believe anyway... I'm about to do just that.)

    http://forums.parallax.com/attachment.php?attachmentid=99833&d=1362968699

    Let's just say I'm a fan of the lower sweep rates. :)
  • potatohead Posts: 10,254
    edited 2013-03-13 20:10
    WOW, BTW.

    Seems to me, something like WOLF3D can be done in hardware now, not a draw list like Baggers did. I just read that again and it sounds crappy. I don't mean that. The code Baggers posted is awesome for P1. Not supposed to be possible.
  • cgracey Posts: 14,133
    edited 2013-03-13 20:17
    potatohead wrote: »
    While you wait, it's also possible to take the changes I made to get component NTSC and run this example unchanged otherwise. […]


    You probably could get that texture demo working on VGA, unless, of course, it's impossible.
  • Bob Lawrence (VE1RLL) Posts: 1,720
    edited 2013-03-13 20:23
    Thanks Chip. More fun.
    :cool:

    Here's the results from my test:


    Displayed on a Hisense LED LCD TV Model H32K38E

    (From left to right) Picture 1: NTSC 256 x 192 - luma/color bars. Picture 2: NTSC 256 x 192 - texture (the edges are actually square; the camera angle makes it look squished in on the bottom sides).
    Prop2C.jpg
    Prop2T.jpg
  • potatohead Posts: 10,254
    edited 2013-03-13 22:03
    Here it is for NTSC component: I need to get some serious time to play with this. :) I've no VGA display here at the moment.

    FWIW, I like NTSC component, not just for the nice, slow sweeps where there is the max time to do things, but also for the portability. I can set up an NTSC composite or S-video display and run that up to 640x400 or so. The composite / s-video will not render color at that detail, and pixels will be lost, etc... but overall the display works just fine, meaning I can capture it, work on the go, whatever. When I get to a better quality display, I switch to the component and work at full detail.

    This is darn cool. Other display mappings require different sweep timings which will eventually impact tighter code. At least with these two, it really is just a color space mapping and pin setup, little else. Just an FYI as to why I went this way right away. I can get full color resolution on the component, and a reasonable pixel density with no meaningful code changes. Nice. The colorburst and such are present in the signal, but ignored by the Y input, FYI.

    Edit: Just thought of something. It's very highly likely that PAL-compatible sets with component inputs will display this just fine. Anyone have a device to test? If they won't, I'm thinking a simple change to 50 Hz will fix that, and it will render just fine. Also, many American sets will display 50 Hz signals, and they do so even when formatted NTSC. Some computers were capable of this; two I can think of were the C= Amiga and Tandy Color Computer 3, both able to output 50/60 Hz NTSC, or PAL, depending on where they were made and the software options selected.

    A prop could output 50/60 Hz component signals and that signal might just display anywhere there are component inputs. If so, that's a near universal 640x200 or 400-420 line interlaced display.
  • Baggers Posts: 3,019
    edited 2013-03-19 02:21
    Thanks for the reference potatohead :D and don't worry, I knew what you meant haha

    Anyway, thought I'd give a heads up, as I ( very soon ) will be a DE2 owner :D and will be able to start joining in with the fun.

    As for Wolfenstein, that could be a good starting point :)

    I also live in PAL world, and also have component TVs so will be able to test this for you, once my boards arrive and are set up!

    PS, it's great to be back!

    Cheers,
    Jim.
  • potatohead Posts: 10,254
    edited 2013-03-19 11:28
    Good times ahead! The new video system is fast and capable, even at a mere 60 MHz!

    Cheers, and likewise Baggers.
  • Baggers Posts: 3,019
    edited 2013-03-19 15:01
    Yeah, I'm looking forward to having a good play with it :D
  • Rayman Posts: 13,903
    edited 2013-03-19 17:32
    I'm curious to see how texture mapping can be applied for 3d graphics here...

    I've always thought that 3D graphics was mostly about polygon rendering.
    Texture mapping onto the polygons is the next step up in quality from rendering with solid, possibly shaded colors.
  • Roy Eltham Posts: 2,996
    edited 2013-03-19 21:15
    Rayman,
    What the texture mapper is doing is one step of an inner loop to a span rasterizer. When you do 3D graphics it all boils down to spans of pixels that comprise a triangle (usually) or polygon.

    You will still need to do all the math and setup for a triangle/polygon, and then walk the edges. For each step along the edge(s) you would setup the texture mapper registers, and then loop across the screen pixels for the span.
    You could also walk the edges using these instructions, but I'm not sure if it's a win, because you would need to do a pass walking the edges and saving the values, then loop back over those values again and run the spans.

    Roy
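    Roy's outline above can be sketched in ordinary Python (illustrative only; on the P2 the per-span pixel work would be done by GETPIX, and every name here is invented):

```python
# Minimal edge-walking span rasterizer: sort the triangle's vertices by y,
# then for each scanline interpolate x along the long edge and along the
# split edge, yielding one horizontal span per line.

def edge_x(p0, p1, y):
    """Interpolated x on edge p0->p1 at scanline y."""
    (x0, y0), (x1, y1) = p0, p1
    if y1 == y0:
        return x0
    return x0 + (x1 - x0) * (y - y0) / (y1 - y0)

def rasterize(tri):
    """Yield (y, x_start, x_end) spans for a triangle of three (x, y) points."""
    a, b, c = sorted(tri, key=lambda p: p[1])    # sort vertices by y
    for y in range(int(a[1]), int(c[1])):
        xl = edge_x(a, c, y)                     # long edge a->c
        xr = edge_x(a, b, y) if y < b[1] else edge_x(b, c, y)
        if xr < xl:
            xl, xr = xr, xl
        yield (y, int(xl), int(xr))

spans = list(rasterize([(0, 0), (8, 0), (0, 8)]))
```

    Per Roy's description, the texture-mapper register setup (SETPIXU/SETPIXV/SETPIXZ, etc.) would happen once per yielded span, with GETPIX then looping from x_start to x_end.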
  • cgracey Posts: 14,133
    edited 2013-03-19 22:30
    Roy Eltham wrote: »
    Rayman,
    What the texture mapper is doing is one step of an inner loop to a span rasterizer. […]

    Roy, did the texture mapping docs make complete sense to you? (I ask Roy because he knows this stuff inside and out.)
  • Pharseid380 Posts: 26
    edited 2013-03-19 22:37
    I wrote a 3D routine back in the '90s. After it transformed, projected, and clipped a polygon, it would go through the vertices of the polygon pairwise using Bresenham's algorithm to fill in values in an array which had as many entries in one direction as the height of the screen and 2 values in the other dimension, which among other things would hold the starting and ending x values for that raster line (so the y values were implicit in the position in the array). It didn't do texture mapping, but each entry was a structure which also contained the r,g,b color of the surface and a value for lighting. Oh yes, and z values for the z-buffer algorithm (although I also had a depth-sort version). If it had been upgraded for texture mapping, each entry would contain starting and ending texture coordinates for a line and starting and ending light values.

    The point being, to fill in this array you're just doing a lot of interpolating between vertex values: interpolating x and y values, texture coordinates, and lighting values. Once the array was "filled" (in the general case the whole array wouldn't necessarily be filled; I would note the highest and lowest y values), it would go through the appropriate values of the array and successively draw each line. It didn't look that bad, although it was pretty slow on the computers of that time.

    On a Prop ][ you could probably crank polygons out at a pretty good rate. Given that you could have different cogs doing different steps in the rendering pipeline, you could just add cogs until you got the performance you needed.
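    The z-buffer step mentioned above boils down to a per-pixel depth compare before each write; a tiny Python sketch (all names invented, smaller z = nearer):

```python
# Tiny z-buffer sketch: a pixel is written only if it is nearer than
# whatever is already stored at that screen location.

W, H = 4, 4
INF = float('inf')
zbuf = [[INF] * W for _ in range(H)]         # depth buffer, starts "infinitely far"
fb = [[0] * W for _ in range(H)]             # framebuffer of color values

def plot(x, y, z, color):
    if z < zbuf[y][x]:                       # depth test
        zbuf[y][x] = z
        fb[y][x] = color

plot(1, 1, 10.0, 0xFF0000)                   # far red pixel
plot(1, 1, 5.0, 0x00FF00)                    # nearer green pixel wins
plot(1, 1, 7.0, 0x0000FF)                    # farther blue pixel is rejected
```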
  • Roy Eltham Posts: 2,996
    edited 2013-03-20 02:17
    Yes, Chip, they made sense to me.

    Also, I keep forgetting that the registers aren't all mapped, so you can't use these to walk the edges, since you can't read back the values to "save" them at each edge step. So doing all the math for the edge walking is going to eat available cog memory. We'll probably want to use another cog for that and just feed the rendering cog with the values.
  • cgracey Posts: 14,133
    edited 2013-03-20 06:24
    Roy Eltham wrote: »
    Yes, Chip, they made sense to me.

    Also, I keep forgetting that the registers aren't all mapped, so you can't use these to walk the edges, since you can't read back the values to "save" them at each edge step. So doing all the math for the edge walking is going to eat available cog memory. We'll probably want to use another cog for that and just feed the rendering cog with the values.

    Because you can't read them back, you'll have to compute terminal=initial+delta*steps on your own. That's three instructions per parameter.
  • Roy Eltham Posts: 2,996
    edited 2013-03-20 10:30
    For a single triangle, we'll have 3 sets of X,Y,Z,U,V,R,G,B,A data. We'll need to calculate 2 or 3 sets of deltas for the edge(s), depending on the triangle orientation. Then we'll need 2 sets of X,Z,U,V,R,G,B,A data, plus a shared Y for both, as the current values as we walk down the edges. Those current values will be used to calculate the values to load into the setpix_ instructions. That's (9*6)+(8*2)+1 = 71 longs. The largest single texture we can use takes 256 longs. So we have like 179 longs available for code and any other data needed.

    Roy
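    Roy's long-count can be restated as plain arithmetic (his figures, simply recomputed):

```python
# Restating Roy's cog-RAM budget for a single textured triangle.
vertex_sets = 3 * 9          # three vertices of X,Y,Z,U,V,R,G,B,A
delta_sets = 3 * 9           # up to three sets of edge deltas
current = 2 * 8 + 1          # two sets of X,Z,U,V,R,G,B,A plus the shared Y
data_longs = vertex_sets + delta_sets + current   # (9*6)+(8*2)+1
texture_longs = 256          # largest single texture
```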
  • Rayman Posts: 13,903
    edited 2013-03-20 11:36
    Ok, sounds like you guys have some plans on how to do all this. Glad to hear that.

    BTW: Another way to do 3D is raytracing... It'd be interesting to see if Prop2 could do real-time raytracing at some low resolution...
  • Roy Eltham Posts: 2,996
    edited 2013-03-20 12:35
    Ray,
    Re: realtime ray tracing:

    Seriously doubt it could be anything near realtime (even at very low res). Maybe if your scene was ultra simple (like no mesh data, just planes and spheres, and no textures), static, the resolution was extremely low (16x16?), and you only did like 1 maybe 2 ray bounces.

    Modern GPUs barely achieve it with reasonable scenes at medium resolution, and they are using massively parallel processing (on the order of 512 to 2048 cores), resulting in 2-4 TFLOPS (teraFLOPS) of throughput.

    However, I bet we could get a Ray Tracer working that would produce a pleasing image in a few seconds.
  • Rayman Posts: 13,903
    edited 2013-03-20 12:50
    You're probably right... :( Still, maybe there's some cordic magic that helps. Plus, you can buy a lot of Prop chips for the price of 1 modern GPU...
  • Roy Eltham Posts: 2,996
    edited 2013-03-20 14:20
    It also depends on what you mean by realtime. Some people would accept 5-10 frames per second as real time. For me it needs to be at least 30, but preferably 60, frames per second, so that means you have 33.33 or 16.66 ms to render the complete frame.
  • Cluso99 Posts: 18,069
    edited 2013-03-20 16:11
    Roy: I see a "stack" of P2's in your future ;)

    This is some serious stuff here. Sorry, I have never done anything like this, so I am just enjoying reading what you guys are up to :)
  • cgracey Posts: 14,133
    edited 2013-03-20 18:06
    Here are some slightly updated TEXTURE MAPPER docs:
    TEXTURE MAPPER
    --------------
    
    Each cog has a texture mapper (PIX) which can sequentially navigate a rectangular 2D texture
    map with Z-perspective correction to locate a texture pixel, translate that texture pixel into
    A:R:G:B (Alpha:Red:Green:Blue) pixel data, perform discrete scaling on those A:R:G:B components,
    and then alpha-blend the resulting pixel with another pixel for multi-layered 3D effects.
    
    A texture map is stored in register RAM as a sequence of 1/2/4/8-bit texture pixels which build
    from the bottom bits of an initial register, upward, then into subsequent registers. They are
    ordered, in contiguous sequence, from top-left to top-right down to bottom-left to bottom-right.
    These texture pixels get used as offsets into stack RAM to look up A:R:G:B pixel data. Texture
    map width and height are individually settable to 1/2/4/8/16/32/64/128 pixel(s).
    
    The SETPIX instruction is used to configure PIX:
    
        SETPIX  D/#n    - Set PIX configuration to %UUU_VVV_PP_W_H_V_xxxx_AAAAAAAA_RRRRRRRRR
    
              %UUU = texture map width, %VVV = texture map height
    
                     %000 =   1 pixel
                     %001 =   2 pixels
                     %010 =   4 pixels
                     %011 =   8 pixels
                     %100 =  16 pixels
                     %101 =  32 pixels
                     %110 =  64 pixels
                     %111 = 128 pixels
    
               %PP = texture pixel size
    
                     %00 = 1 bit
                     %01 = 2 bits
                     %10 = 4 bits
                     %11 = 8 bits
    
                %W = stack RAM pixel data offset/size
    
                     %0 = long offset, 8:8:8:8 bit A:R:G:B data
                     %1 = word offset, 1:5:5:5 bit A:R:G:B data (gets expanded to 8:8:8:8)
    
                %H = horizontal mirroring
    
                     %0 = OFF, image repeats when U'[15] set
                     %1 = ON,  image mirrors when U'[15] set
    
                %V = vertical mirroring
    
                     %0 = OFF, image repeats when V'[15] set
                     %1 = ON,  image mirrors when V'[15] set
    
         %AAAAAAAA = base address in stack RAM of A:R:G:B pixel data
    
        %RRRRRRRRR = base address in register RAM of texture pixels
    
    
    Aside from SETPIX, which configures PIX's base metrics, there are seven other instructions
    which establish initial values and deltas for the (U,V) texture coordinates, Z perspective,
    and A/R/G/B scalers. These instructions are likely to be used before every sequence of GETPIX
    instructions. They each set the value of their respective 16-bit parameter to the low word of
    their operand, while the high word sets the 16-bit delta which gets added to the parameter
    upon every GETPIX instruction:
    
        SETPIXU D/#n    - Set U to low word and DU to high word
        SETPIXV D/#n    - Set V to low word and DV to high word
        SETPIXZ D/#n    - Set Z to low word and DZ to high word
        SETPIXA D/#n    - Set A to low word and DA to high word
        SETPIXR D/#n    - Set R to low word and DR to high word
        SETPIXG D/#n    - Set G to low word and DG to high word
        SETPIXB D/#n    - Set B to low word and DB to high word
    
    
    Once PIX is configured and initial parameters are set, the GETPIX instruction may be used to
    look up the current texture pixel, scale its A/R/G/B components, blend it with a pixel in D,
    and update the U/V/Z/A/R/G/B parameters with their deltas. GETPIX takes 3 clocks and also
    needs 3 clocks in pipeline stages 2 and 3:
    
            NOP     #2              'ready pipeline, GETPIX needs 3 clocks in pipeline stage 2
            NOP     #2              'ready pipeline, GETPIX needs 3 clocks in pipeline stage 3
            GETPIX  pixel           'execute GETPIX, GETPIX takes 3 clocks in pipeline stage 4
    
    
    To make GETPIX more efficient, it can be repeated using REPD to perform a sequence of pixel
    operations:
    
            REPD    #64,#1          'render 64 texture pixels and blend them with 'pixels'
            SETINDA #pixels         'point INDA to pixels
            NOP     #2              'ready pipeline, 3 clocks in initial pipeline stage 2
            NOP     #2              'ready pipeline, 3 clocks in initial pipeline stage 3
            GETPIX  INDA++          'execute GETPIX, 3 clocks per repeating GETPIX
    
    
    As GETPIX executes, the following sequence occurs over three pipeline stages:
    
    
        In pipeline stage 2:
    
            Z-perspective correction
            ------------------------
            Z' = 256 - Z[15:8]
            U' = (U[15:0] / Z') MOD 256
            V' = (V[15:0] / Z') MOD 256
    
            A texture pixel is read from register RAM at texture map location (U',V'), with
            the U' and V' top-most bits being used as coordinates. For example, if the texture
            size is 32x8, then the top 5 bits of U' and the top 3 bits of V' would be used to
            locate the texture pixel.
    
            parameter updating
            ------------------
            Z = Z + DZ
            U = U + DU
            V = V + DV
    
    
        In pipeline stage 3:
    
            The texture pixel is used as an offset to look up A:R:G:B pixel data in stack RAM,
            which gets assigned to TA:TR:TG:TB.
    
    
        In pipeline stage 4:
    
            pixel scaling
            -------------
            A' = (TA * A[15:8]  +  255) / 256
            R' = (TR * R[15:8]  +  255) / 256
            G' = (TG * G[15:8]  +  255) / 256
            B' = (TB * B[15:8]  +  255) / 256
    
            pixel blending
            --------------
            D[31..24] = 0
            D[23..16] = (A' * R'  +  (255 - A') * D[23..16]  +  255) / 256
            D[15..8]  = (A' * G'  +  (255 - A') * D[15..8]   +  255) / 256
            D[7..0]   = (A' * B'  +  (255 - A') * D[7..0]    +  255) / 256
    
            C = A' <> 0     (for GETPIX D/#n WC, C = texture pixel opacity <> 0)
    
            parameter updating
            ------------------
            A = A + DA
            R = R + DR
            G = G + DG
            B = B + DB
    
    
    Note that if Z[15:8] = 0, no scaling occurs, so (U',V') = (U[15:8],V[15:8]). The bigger
    Z[15:8] gets, the more compressed the texture rendering becomes, until, when Z[15:8] = 255,
    (U',V') = (U[7:0],V[7:0]).
    

    I realized today that the next thing I must do is make a driver for the SDRAM chip that is on the DE0 and DE2 boards, as well as the Prop2 module we are building at Parallax. We need to get high-resolution bit-mapped displays going to graphically demonstrate a lot of the Prop2's features.
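    The GETPIX arithmetic above can be captured as a behavioral model. This is an illustrative Python sketch, not P2 code and not cycle-accurate; it assumes the %W = 0 palette format (long entries, 8:8:8:8 A:R:G:B) and a power-of-two texture size, and the texture, palette, and register values are invented for the example:

```python
# Behavioral model of one GETPIX step, following the formulas in the doc.

def getpix(st, texmap, tex_w, tex_h, palette, d):
    # pipeline stage 2: Z-perspective correction
    zp = 256 - ((st['Z'] >> 8) & 0xFF)           # Z' = 256 - Z[15:8]
    up = ((st['U'] & 0xFFFF) // zp) % 256        # U' = (U[15:0] / Z') mod 256
    vp = ((st['V'] & 0xFFFF) // zp) % 256
    # the top-most bits of U'/V' locate the texel (width 8 -> top 3 bits)
    u_i = up >> (8 - (tex_w.bit_length() - 1))
    v_i = vp >> (8 - (tex_h.bit_length() - 1))
    texel = texmap[v_i * tex_w + u_i]            # top-left..bottom-right order
    for k in ('Z', 'U', 'V'):                    # stage 2 parameter updating
        st[k] = (st[k] + st['D' + k]) & 0xFFFF

    # pipeline stage 3: texel offsets into the palette -> TA:TR:TG:TB
    ta, tr, tg, tb = palette[texel]

    # pipeline stage 4: pixel scaling, e.g. A' = (TA * A[15:8] + 255) / 256
    def scale(t, p):
        return (t * ((p >> 8) & 0xFF) + 255) // 256
    ap = scale(ta, st['A'])
    rp, gp, bp = scale(tr, st['R']), scale(tg, st['G']), scale(tb, st['B'])

    # alpha-blend with the pixel in D ($00_RR_GG_BB)
    def blend(new, old):
        return (ap * new + (255 - ap) * old + 255) // 256
    r = blend(rp, (d >> 16) & 0xFF)
    g = blend(gp, (d >> 8) & 0xFF)
    b = blend(bp, d & 0xFF)
    for k in ('A', 'R', 'G', 'B'):               # stage 4 parameter updating
        st[k] = (st[k] + st['D' + k]) & 0xFFFF
    return (r << 16) | (g << 8) | b

# Made-up example: 8x8 texture of texel 1, palette entry 1 = opaque red,
# all four scalers at full strength (high byte 255), blended over blue.
st = dict(U=0, V=0, Z=0, A=0xFF00, R=0xFF00, G=0xFF00, B=0xFF00,
          DU=0, DV=0, DZ=0, DA=0, DR=0, DG=0, DB=0)
out = getpix(st, [1] * 64, 8, 8, {1: (255, 255, 0, 0)}, 0x0000FF)
```

    With those values `out` comes back as $00FF0000: the fully opaque red texel replaces the blue background pixel, exactly as the blending equations above dictate.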
  • John A. Zoidberg Posts: 514
    edited 2013-03-22 07:24
    Could this be comparable to a, uh... normal ISA-level video card which processes a moderate amount of polygons? Or an S3 Trio kind of performance? (The S3 Trio is a classic graphics card commonly used back in the late '90s.)
  • Bob Lawrence (VE1RLL) Posts: 1,720
    edited 2013-03-23 19:33
    Just having fun :cool:
    Prop2_Stripes.jpg
  • Baggers Posts: 3,019
    edited 2013-03-26 18:04
    Question for Chip,

    Is the output from GETPIX always 32-bit?
    i.e., can it output 16-bit so it can then be fed into a bitmap area? Or would it have to be converted to 16-bit? If so, is there an instruction to do this quickly, or is it a case of shifts, ANDs, and ORs per word?
  • cgracey Posts: 14,133
    edited 2013-03-27 15:04
    Baggers wrote: »
    Question for Chip,

    Is the output from GETPIX always 32-bit? […]

    GETPIX always outputs $00_RR_GG_BB data, or 8:8:8 RGB. It's 24-bit, anyway, with 8 leading 0 bits.

    I never thought to make it less than 8:8:8 because 5 bits per color produces obvious gradients. At 7 bits you can hardly see gradients and at 8 they disappear. So, I left it 8:8:8, only.
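    For the %W = 1 word format, the docs say the 1:5:5:5 data "gets expanded to 8:8:8:8", but the expansion method isn't spelled out there. A common scheme (assumed here, not confirmed for the P2 hardware) replicates each 5-bit channel's top bits into the low bits:

```python
def expand_1555(w):
    """Expand a 1:5:5:5 A:R:G:B word to 8:8:8:8 by bit replication (assumed scheme)."""
    a = 0xFF if (w >> 15) & 1 else 0x00          # 1-bit alpha -> fully opaque/clear
    def c5to8(c):
        return (c << 3) | (c >> 2)               # replicate top 3 bits into the bottom
    r = c5to8((w >> 10) & 0x1F)
    g = c5to8((w >> 5) & 0x1F)
    b = c5to8(w & 0x1F)
    return (a << 24) | (r << 16) | (g << 8) | b
```

    Bit replication maps 5-bit 0 to 0 and 5-bit 31 to 255, which avoids the dimming you'd get from a plain left shift.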
  • Tubular Posts: 4,622
    edited 2013-03-27 16:30
    Baggers wrote: »
    Question for Chip,

    Is the output from GETPIX always 32-bit? […]

    You might not need it for this, but have a look at the MovF bitfield mover, it's nice and quick and has some auto increment features.