Yeah, that is about the limit of it. Helps to have a blurry TV.
Only other thing I can think of gets complex: run higher-bit-depth scanline buffers and use a shift and lookup table to convert the 8-bit pixels to their lit, higher-bit-depth values on the fly.
Without limiting colors, dithering, or using a bigger buffer, dynamic display seems like the answer.
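A minimal sketch of the bigger-buffer idea, assuming a 256-entry 0xRRGGBB palette, RGB565 output and 16 light levels (all assumptions, not what the P2 demo uses). The table is indexed by (light << 8) | pixel, which is the "shift and lookup" step. At 16 levels it costs 8 kB of hub RAM. build_shade_lut and expand_scanline are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>

#define LIGHT_SHIFT  8    /* 256 palette entries per light level */
#define LIGHT_LEVELS 16   /* assumed number of brightness steps */

/* Entry (light << 8) | index = RGB565 value of palette entry `index`
   at brightness light / (LIGHT_LEVELS - 1). */
static uint16_t shade_lut[LIGHT_LEVELS << LIGHT_SHIFT];

/* palette: 256 entries of 0xRRGGBB (assumed 24-bit source palette) */
void build_shade_lut(const uint32_t palette[256])
{
    for (uint32_t light = 0; light < LIGHT_LEVELS; light++) {
        for (uint32_t i = 0; i < 256; i++) {
            uint32_t r = ((palette[i] >> 16) & 0xFF) * light / (LIGHT_LEVELS - 1);
            uint32_t g = ((palette[i] >> 8) & 0xFF) * light / (LIGHT_LEVELS - 1);
            uint32_t b = (palette[i] & 0xFF) * light / (LIGHT_LEVELS - 1);
            shade_lut[(light << LIGHT_SHIFT) | i] =
                (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
        }
    }
}

/* Expand an 8-bit scanline to 16 bits. light[x] (0..LIGHT_LEVELS-1) is the
   brightness of each pixel: one shift/OR and one table read per pixel. */
void expand_scanline(const uint8_t *src, const uint8_t *light,
                     uint16_t *dst, size_t width)
{
    for (size_t x = 0; x < width; x++)
        dst[x] = shade_lut[((uint32_t)light[x] << LIGHT_SHIFT) | src[x]];
}
```

The 8-bit frame buffer stays the same size. Only the scanline buffer that is handed to the video cog grows to 16 bits.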
Just got my frame rate back to ~10 fps (from 8 fps), but I'm not fully clear why... Except, it looks like you can't jmp into a hubexec call...
I turned some code (updateScreen) into a subroutine to move outside of cog for hubexec.
Then, I brought it back into cog.
Anyway, I just noticed this jmp in front of the RET and commented it out:
Funny thing is that the jmp should have the same effect as RET because the caller just jumps to the same place.
But, "DrawBackground" is now a hubexec routine. It must be that the RET in DrawBackground in hubexec mode goes to a different place if jumped to rather than called...
Wait, I think I get it now... I was doing DrawBackground twice...
332 is 4 shades of gray (including black). I just picked on Blue because our eyes are least sensitive, but using a packed color mode gives you the simplest linear color conversion. Unless you want to implement HSV colorspace and spread the bits out appropriately.
The P2 has a nice LUT for proper 8-bit indexed color, doesn't it? No need for garish 332.
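Here is what "simplest linear color conversion" means in practice. With a packed RRRGGGBB pixel, darkening is per-channel arithmetic and needs no table. The bit order is an assumption here. Blue only has 4 levels, so it quantizes first. pack332 and darken332 are hypothetical helpers.

```c
#include <stdint.h>

/* Assumed RGB332 layout: RRRGGGBB (red in the top 3 bits). */
static inline uint8_t pack332(unsigned r3, unsigned g3, unsigned b2)
{
    return (uint8_t)(((r3 & 7u) << 5) | ((g3 & 7u) << 2) | (b2 & 3u));
}

/* Darken a packed pixel to light/8 of its brightness (light in 0..8).
   Each channel is scaled independently and truncated. */
uint8_t darken332(uint8_t c, unsigned light)
{
    unsigned r = ((c >> 5) & 7u) * light / 8;
    unsigned g = ((c >> 2) & 7u) * light / 8;
    unsigned b = (c & 3u) * light / 8;
    return pack332(r, g, b);
}
```

An indexed palette avoids the garish colors, but darkening then needs a lookup table per light level instead of this arithmetic.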
P2 could run DOOM, I think. Main problem is RAM: DOS DOOM needs some 4 MB. However, that includes all sorts of stuff (I don't think it uses the first 640k at all), so by simplifying the levels a little (like on the console ports of the day) and streaming the sound effects from SD card, it should work out fairly well. I'd use one cog for running the game logic, one for sound, one for SD card handling, one for TV output, and two for rendering.
Thinking about this again...
Say we want several wall tiles and the ability to darken them.
But, we also want colors for floor and especially background...
Think I'd make an image with all the wall tiles and darkened versions of them.
Then, I'd have Photoshop pick the best palette using 128 colors.
Then, do the same with just floor tiles and background.
Then, combine those two palettes.
Finally, create darkening lookup tables for wall tiles. Just a 128-entry byte table...
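A rough sketch of that last step. It assumes the wall colors occupy palette entries 0..127 and that "darker" means every RGB channel scaled by num/den. For each wall color it searches the wall half of the palette for the closest match to the darkened version, using plain squared RGB distance (a simplification). This runs once at build or load time. The renderer then only does table[texel]. All names are illustrative.

```c
#include <stdint.h>

#define WALL_COLORS 128

typedef struct { uint8_t r, g, b; } rgb_t;

/* table[i] = wall palette entry closest to entry i darkened by num/den. */
void build_darken_table(const rgb_t pal[WALL_COLORS], uint8_t table[WALL_COLORS],
                        unsigned num, unsigned den)
{
    for (int i = 0; i < WALL_COLORS; i++) {
        int tr = (int)(pal[i].r * num / den);
        int tg = (int)(pal[i].g * num / den);
        int tb = (int)(pal[i].b * num / den);
        long best_d = -1;
        int best = 0;
        for (int j = 0; j < WALL_COLORS; j++) {
            int dr = pal[j].r - tr, dg = pal[j].g - tg, db = pal[j].b - tb;
            long d = (long)dr * dr + (long)dg * dg + (long)db * db;
            if (best_d < 0 || d < best_d) { best_d = d; best = j; }
        }
        table[i] = (uint8_t)best;
    }
}
```

Several tables (for example 3/4, 1/2, 1/4) can be built the same way for different distance bands.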
Looks like the "COLORMAP" in doom is how they did shading. It's similar to the look up table scheme I was thinking of...
But, I see they also dynamically changed the palette to ones that were redder or yellower.
The redder one used when player is damaged. (I think I remember seeing that!).
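Swapping palettes is cheap because nothing in the frame buffer has to change. As far as I know, DOOM keeps its tinted palettes precomputed in the PLAYPAL lump. The sketch below instead blends a base palette toward a tint color at run time. It assumes 0xRRGGBB entries and an amount of 0..256. tint_palette is a made-up name.

```c
#include <stdint.h>

/* Blend every 0xRRGGBB entry of src toward `tint` by amount/256
   (0 = unchanged, 256 = solid tint) and write the result to dst. */
void tint_palette(const uint32_t src[256], uint32_t dst[256],
                  uint32_t tint, uint32_t amount)
{
    if (amount > 256) amount = 256;
    uint32_t keep = 256 - amount;
    uint32_t tr = (tint >> 16) & 0xFF, tg = (tint >> 8) & 0xFF, tb = tint & 0xFF;
    for (int i = 0; i < 256; i++) {
        uint32_t r = (src[i] >> 16) & 0xFF, g = (src[i] >> 8) & 0xFF, b = src[i] & 0xFF;
        r = (r * keep + tr * amount) >> 8;
        g = (g * keep + tg * amount) >> 8;
        b = (b * keep + tb * amount) >> 8;
        dst[i] = (r << 16) | (g << 8) | b;
    }
}
```

For a damage flash, call it with a red tint and an amount that decays over a few frames, then hand dst to the video cog's LUT.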
I was looking at using QROTATE instead of sine and cosine tables in the floorcast section of code.
Was disappointed to see that this was actually slower...
Floorcasting doesn't need any sines anyway, just some vector*matrix multiplications. When doing it in columns, one of the scalars is constant and the other comes from a LUT indexed by Y position, so only two multiplies and two additions are required per pixel. (Does the P2 have an FMA instruction?) When doing it in rows, the constant scalar comes from a LUT (or is calculated, allowing for vertical camera movement) and the other one is a linear function of the X position, so there are only 2 additions per pixel.
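Here's a sketch of the row-wise version, using the common Wolf3D-style camera (position, direction vector, camera-plane vector). The per-row distance comes from row_distance, which could just as well be a LUT indexed by y. Per pixel, the texture coordinate advances by two Q16.16 additions. The 64x64 texture, the float setup math and all names are assumptions for illustration. On the P2 the per-row part would presumably be a LUT read plus a couple of multiplies.

```c
#include <stdint.h>

#define TEX_SHIFT 6                     /* assumed 64x64 floor texture */
#define TEX_SIZE  (1 << TEX_SHIFT)

/* Floor distance for screen row y (below the horizon), camera at
   mid-height: 0.5 * screen_h / (y - screen_h / 2). Could be a LUT. */
float row_distance(int y, int screen_h)
{
    return (0.5f * (float)screen_h) / (float)(y - screen_h / 2);
}

/* Texture one floor row. One world unit = one tile = one texture repeat. */
void floor_row(uint8_t *dst, int width,
               float pos_x, float pos_y, float dir_x, float dir_y,
               float plane_x, float plane_y, float row_dist,
               const uint8_t tex[TEX_SIZE * TEX_SIZE])
{
    /* World position under the left and right screen edges */
    float lx = pos_x + row_dist * (dir_x - plane_x);
    float ly = pos_y + row_dist * (dir_y - plane_y);
    float rx = pos_x + row_dist * (dir_x + plane_x);
    float ry = pos_y + row_dist * (dir_y + plane_y);

    /* Start point and per-pixel step in Q16.16 */
    int32_t fx = (int32_t)(lx * 65536.0f);
    int32_t fy = (int32_t)(ly * 65536.0f);
    int32_t sx = (int32_t)((rx - lx) * 65536.0f / (float)width);
    int32_t sy = (int32_t)((ry - ly) * 65536.0f / (float)width);

    for (int x = 0; x < width; x++) {
        /* Top TEX_SHIFT fraction bits select the texel; the uint32_t cast
           makes negative coordinates wrap correctly. */
        uint32_t tu = ((uint32_t)fx >> (16 - TEX_SHIFT)) & (TEX_SIZE - 1);
        uint32_t tv = ((uint32_t)fy >> (16 - TEX_SHIFT)) & (TEX_SIZE - 1);
        dst[x] = tex[(tv << TEX_SHIFT) | tu];
        fx += sx;   /* the only per-pixel math: two additions */
        fy += sy;
    }
}
```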
Well, I was able to move sin&cos outside of the inner, vertical loop.
Still need it for the column loop though...
Although LUTs could help. I have one already, but with more memory I could speed it up...
Most code fits in that cog, but had to move some out to hubexec space.
Added in two darker versions of wall tile to make far walls a hair darker.
New binary in first post.
I think I've decided that the trig table use could be improved for P2...
Right now, it takes up 60kB of space to cover angles from 0..360.
Think we can trim that down to 0..90 and use only 1/4 of the space (saving 3/4), with a penalty of only a few clocks of extra instructions.
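A sketch of the 0..90 idea. It assumes integer angles with QUARTER steps per 90 degrees (1024 is an arbitrary choice) and Q1.31 values. The other quadrants follow from sin(180-a) = sin(a) and sin(a+180) = -sin(a), and cos(a) = sin(a+90). So one quarter-wave table serves both sin and cos, at the cost of a divide-by-power-of-two, a subtract and a negate per lookup. Storing QUARTER+1 entries avoids special-casing 90 degrees. The table builder uses a Taylor series only so the sketch needs no libm.

```c
#include <stdint.h>

#define QUARTER 1024               /* assumed table steps per 90 degrees */
#define FULL    (4 * QUARTER)      /* angle units per full turn */
#define Q31_ONE 0x7FFFFFFF         /* closest Q1.31 value to 1.0 */

/* sin(0..90 deg) in Q1.31; QUARTER + 1 entries so 90 deg is stored too */
static int32_t quarter_sin[QUARTER + 1];

/* Taylor series, accurate on 0..pi/2; build-time only
   (with <math.h> you would just call sin()). */
static double taylor_sin(double x)
{
    double x2 = x * x, term = x, sum = x;
    for (int n = 1; n <= 9; n++) {
        term *= -x2 / (double)((2 * n) * (2 * n + 1));
        sum += term;
    }
    return sum;
}

void init_quarter_sin(void)
{
    const double half_pi = 1.57079632679489661923;
    for (int i = 0; i <= QUARTER; i++) {
        double v = taylor_sin(half_pi * i / QUARTER) * (double)Q31_ONE;
        if (v > (double)Q31_ONE) v = (double)Q31_ONE;   /* guard rounding */
        quarter_sin[i] = (int32_t)(v + 0.5);
    }
}

/* sin for any angle in table units, using quadrant symmetry */
int32_t sin_q31(uint32_t angle)
{
    uint32_t a = angle % FULL;
    uint32_t idx = a % QUARTER;
    switch (a / QUARTER) {
    case 0:  return  quarter_sin[idx];
    case 1:  return  quarter_sin[QUARTER - idx];
    case 2:  return -quarter_sin[idx];
    default: return -quarter_sin[QUARTER - idx];
    }
}

int32_t cos_q31(uint32_t angle)
{
    return sin_q31(angle + QUARTER);   /* cos(a) = sin(a + 90 deg) */
}
```

With these assumptions the table is about 4 kB.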
Also, the example has tables of sin, cos, tan and 1/sin, 1/cos, 1/tan.
It looks to me like these "inverse" tables aren't really needed, as qdiv looks to be the same speed as qmul.
Also, think I've decided that Q16.16 is OK for tan table, but sin and cos should be more like Q1.31...
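A small illustration of mixing the two formats, in C rather than PASM. Multiplying a Q16.16 value by a Q1.31 sine gives a 64-bit product with 47 fraction bits, so a shift by 31 brings it back to Q16.16. A 1/sin-style value can be produced by a widened division instead of a table lookup. On the P2 that division would presumably be QDIV. The helper names are hypothetical, and the shift assumes arithmetic right shift of negative values, as common compilers do.

```c
#include <stdint.h>

typedef int32_t q16_16;   /* 16 integer bits, 16 fraction bits */
typedef int32_t q1_31;    /* sign + 31 fraction bits, range [-1, 1) */

/* Q16.16 * Q1.31 -> Q16.16 (47 fraction bits in the product, drop 31) */
q16_16 mul_q16_q31(q16_16 a, q1_31 b)
{
    return (q16_16)(((int64_t)a * b) >> 31);
}

/* a / b, both Q16.16, result Q16.16 (b must be nonzero). Widening the
   dividend by 2^16 keeps 16 fraction bits in the quotient. */
q16_16 div_q16(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * 65536) / b);
}
```

Keeping sin/cos at 31 fraction bits matters most when they scale long distances. Tan can reach large values, so it needs integer bits, which is presumably why Q16.16 suits it.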
I've been thinking about what might be possible here, with the constraint that everything has to fit in 512kB...
The old DOS games required 8 MB, so this is a real limitation...
Anyway, I think ROTT (Rise of the Triad) is about the level of sophistication that is fairly easy to do. ROTT uses the Wolf3d engine with some tricks that make it feel like real 3D.
The next level is the DOOM engine. I took a look at it, and it looks way too complex to ever fit in 512kB...
Anyway, I just forced myself to play ROTT to refresh my memory and I see it's pretty cool, like I remember, but not so different from the simple example I'm doing here...
I'm thinking the rendering code itself cannot be that large.
But we do need some kind of external RAM solution that we can all consistently use for texture or geometry data.
Yes, RAM use is going to be key for P2 with 3d graphics and/or gaming. Write contention and latency are probably going to be the major issues to deal with once external RAM is added. Perhaps even using multiple independent RAM blocks for different things may be of benefit if the pin count remains manageable, allowing multiple processes to simultaneously access RAM or for double buffers etc.
How much can be pulled directly from the SD card?
Could Procedurally Generated levels/textures help here?
I'm not sure what you have in memory for the levels right now. Level descriptions and textures I'm assuming. Could off screen textures be cached in the SD?
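One way procedural textures could help: a 64x64 byte texture is 4 kB of hub RAM, but it can be regenerated from a few bytes of parameters whenever it is needed. The brick pattern below is purely hypothetical. It has 4 courses per tile, alternate courses offset by half a brick, 1-texel mortar lines, and a cheap hash for a bit of per-texel variation.

```c
#include <stdint.h>

#define TEX 64   /* assumed 64x64 texture, one byte per texel */

/* Hypothetical procedural brick texture. Assumes palette entry brick + 1
   is a neighbouring shade of brick (and brick < 255). */
void gen_brick_texture(uint8_t tex[TEX * TEX], uint8_t brick, uint8_t mortar,
                       uint32_t seed)
{
    const int course_h = TEX / 4, brick_w = TEX / 2;
    for (int y = 0; y < TEX; y++) {
        int offset = ((y / course_h) & 1) ? brick_w / 2 : 0;
        for (int x = 0; x < TEX; x++) {
            int bx = (x + offset) % brick_w;
            int by = y % course_h;
            if (bx == 0 || by == 0) {
                tex[y * TEX + x] = mortar;            /* mortar line */
            } else {
                uint32_t h = (seed ^ (uint32_t)(y * TEX + x)) * 2654435761u;
                h ^= h >> 15;
                tex[y * TEX + x] = (uint8_t)(brick + (h & 1u));
            }
        }
    }
}
```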
An external-RAM eggbeater. Have multiple cogs take turns with external RAMs in a predetermined way.
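A sketch of the fixed-slot idea, assuming every cog gets a slot whether it needs the bus or not. There are no requests or locks. Each cog just knows which slots are its own, so worst-case wait is NUM_COGS - 1 slots even when the other cogs are idle. The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_COGS 8   /* assumed: all 8 cogs take part in the rotation */

/* True if `cog` owns the external RAM during time slot `slot`. */
static inline bool owns_slot(uint32_t slot, unsigned cog)
{
    return (slot % NUM_COGS) == cog;
}

/* Slots a cog must wait, starting at `slot`, before its turn comes up. */
static inline unsigned slots_until_turn(uint32_t slot, unsigned cog)
{
    return (unsigned)((cog + NUM_COGS - slot % NUM_COGS) % NUM_COGS);
}
```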
Well I guess that was what I was thinking, not sure if I'd call it exactly that though. I was thinking one COG could have exclusive use of one block of RAM, and another COG another type of RAM (on different pins), both simultaneously, instead of competing for some shared access of a common external RAM. For example, you could have one RAM being read by a video renderer process and another being written into for the next frame, or being accessed for textures etc. Also having more than one block of RAM obviously burns more pins. Maybe HyperRAM is good for that in some cases, if the overall latency is still acceptable.
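A sketch of the two-block scheme as a ping-pong swap. It assumes two independent RAM banks on separate pins. During each frame the renderer owns one bank and the video cog reads the other, and they trade at the end of the frame, so neither waits on the other. The struct and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t frame;   /* frame counter; its low bit selects the banks */
} bank_swap_t;

static inline unsigned render_bank(const bank_swap_t *s)  { return s->frame & 1u; }
static inline unsigned display_bank(const bank_swap_t *s) { return (s->frame & 1u) ^ 1u; }

/* Call once per frame, e.g. at vertical blank after rendering finishes. */
static inline void end_frame(bank_swap_t *s) { s->frame++; }
```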
Result is just OK. If the tiles went from 64x64 to 128x128, it would be better...
https://quakewiki.org/wiki/Quake_palette
They used 16 colors, each with 16 levels of brightness (dark to bright):
0: Black/White x 16 shades
1: Brown x 16 shades
2: Light Blue x 16 shades
...
Etc.
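With that layout, shading needs no table search at all. Moving within a ramp is just arithmetic on the low 4 bits. This sketch assumes index = ramp * 16 + level with level 0 darkest, as listed above. The helper names are hypothetical.

```c
#include <stdint.h>

/* Palette index for a ramp (0..15) and brightness level (0..15). */
static inline uint8_t ramp_color(unsigned ramp, unsigned level)
{
    return (uint8_t)((ramp & 15u) * 16u + (level & 15u));
}

/* Darken (negative delta) or brighten within the same ramp, clamped so the
   color never spills into a neighbouring ramp. */
uint8_t shift_shade(uint8_t index, int delta)
{
    int ramp = index >> 4;
    int level = (index & 15) + delta;
    if (level < 0) level = 0;
    if (level > 15) level = 15;
    return (uint8_t)(ramp * 16 + level);
}
```

The trade-off is that the palette is locked into 16 fixed hues. The 128-color optimized palette approach keeps more hues but needs a lookup table for darkening.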
https://doomwiki.org/wiki/COLORMAP
That is how id worked with 8-bit LUT graphics. The palette changes dynamically.
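At draw time, a COLORMAP-style scheme adds just one lookup per pixel. DOOM's lump holds 32 light-level tables of 256 bytes plus a couple of special-purpose maps. The sketch assumes a wall column is drawn with a Q16.16 texture coordinate into an 8-bit frame buffer. The light level is chosen once per column, for example from distance. The 64-texel column and the function name are assumptions.

```c
#include <stdint.h>

#define LIGHT_MAPS 32   /* assumed number of light levels, 0 = brightest */

/* colormap: LIGHT_MAPS consecutive 256-byte tables; table L maps a palette
   index to the index that best represents that color at light level L.
   tex_col is one 64-texel texture column; v and v_step are Q16.16. */
void draw_shaded_column(uint8_t *fb, int pitch, int x, int y0, int y1,
                        const uint8_t *tex_col, int32_t v, int32_t v_step,
                        const uint8_t *colormap, int light)
{
    const uint8_t *map = colormap + light * 256;  /* one table per column */
    for (int y = y0; y < y1; y++) {
        uint8_t texel = tex_col[(v >> 16) & 63];
        fb[y * pitch + x] = map[texel];          /* shading = one more lookup */
        v += v_step;
    }
}
```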
Also, this thread covers another way that is more involved to setup, but gives better results: http://quakeone.com/forum/quake-help/servers-and-coding/8154-nifty-tool-to-convert-images-to-quake-s-palette
Now, when does my eval board arrive? Lol.
This is on the FPGA at 80MHz, right?
Two weeks until chipmas