Propeller 2: next great Indie video game platform?

pedward · 2013-04-28 22:33

With all of the uber awesome capabilities everyone has demonstrated with visual effects, I can't help but think the P2 represents a new dawn in indie video game development. A truly accessible platform that rewards clever video game developers with neat effects.

I was just checking the specs of the Propeller 2 against the Playstation 1:

MIPS: P2, 160 per COG and 1280 per chip; PSX, 33 for the main CPU and 66 for the geometry engine.
Video: P2, 1-, 4-, 8-, 15-, 16-, or 24-bit color at whatever resolution you want; PSX, 256-640 x 240 non-interlaced or 256-640 x 480 interlaced, 24-bit.

There are a bunch of other things, and I don't really know how to compare the geometry engines, but to match the PSX's rendering rate you'd have 444 clock cycles to render each polygon on a P2 COG, or 888 if they are texture-mapped polygons.
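The 444 and 888 figures can be reproduced as a simple cycle budget. This minimal sketch assumes the commonly quoted PSX peak rates of 360,000 flat-shaded and 180,000 texture-mapped polygons per second, and a P2 COG executing one instruction per clock at 160 MHz; neither rate is stated in the post, they are simply the values that yield its numbers.

```python
# Cycle budget per polygon needed to match the PSX's quoted polygon rates.
# Assumptions (not from the post): 160 MHz P2 clock, one instruction per
# clock per COG, PSX peaks of 360,000 flat / 180,000 textured polygons/s.
P2_CLOCK_HZ = 160_000_000
P2_COGS = 8

PSX_FLAT_POLYS_PER_S = 360_000
PSX_TEXTURED_POLYS_PER_S = 180_000


def cycles_per_polygon(clock_hz: int, polys_per_s: int, cogs: int = 1) -> int:
    """Whole clock cycles each polygon may take, with `cogs` COGs rendering in parallel."""
    return (clock_hz * cogs) // polys_per_s


if __name__ == "__main__":
    print("flat-shaded, 1 COG:   ", cycles_per_polygon(P2_CLOCK_HZ, PSX_FLAT_POLYS_PER_S))
    print("textured, 1 COG:      ", cycles_per_polygon(P2_CLOCK_HZ, PSX_TEXTURED_POLYS_PER_S))
    # Ideal parallel split, ignoring hub and SDRAM contention between COGs.
    print("textured, all 8 COGs: ",
          cycles_per_polygon(P2_CLOCK_HZ, PSX_TEXTURED_POLYS_PER_S, P2_COGS))
```

Spreading polygons across COGs only multiplies the budget if the COGs don't stall each other on shared memory, which is exactly the question the rest of the thread turns to.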

The PSX has some hard limits, like 256x256 sprites, whereas on the P2 those limits are whatever the programmer wants them to be. I think that with all of the computational engines on the P2, it would make a really fine indie console engine.

potatohead · 2013-04-28 23:01

Yeah, I'm thinking it too. There is plenty of compute for interesting games. Until some of us write poly engines, we won't really know what a P2 can do. I am studying those right now. In the past I've written 3D graphics, including an STL file viewer in C, but that was written against OpenGL + GLUT, so the hard stuff was just a call away. I'm really curious to see how people tackle that stuff on this chip.

IMHO, Chip's other vision is good too: a real-time, or just fast, display of real-world things. A P2 packaged nicely, with storage (2 SD cards maybe, one for system, one for data), some nicely designed I/O, etc... could make for a great bench machine to do all sorts of stuff. That one is appealing to me personally, as I've got P1s set up as a cheap logic analyzer (which was plenty fast for my Apple 2) and as a video test generator / audio tone generator.

Such a machine with some interactivity, maybe a quickie DOS, or menu / GUI, whatever, could present a wide range of tools to somebody looking to explore on the cheap. When Chip gets done with the P2 native dev environment... that same setup could be used on the fly to do lots of additional things.

One other thought I had was "demo scene" type things. Put a limit out there, say HUB memory only, or one load of X size, then see what people can do. Today, while exploring Jim's SDRAM work, I'm just kind of floored at how roomy things are, and at the overall performance a few COGs down, clocked at 1/3 to 1/4 speed. (That depends on how overclocking plays out; maybe it will be like the P1, with some room to spare, say 200 MHz.)

pedward · 2013-04-28 23:44

I was thinking about how streaming from SD could be useful. Even on the P1, you can stream audio from the SD and play it. I was thinking of making an app that does this, but broadcasts the audio over FM -- just as an exercise.

I asked Chip whether we could directly generate high-quality sine waves at broadcast frequencies. Nope ;( He said 10 MHz was about the limit.

Coley · 2013-04-29 12:02

My thoughts exactly and plans are already underway ;-)

David Betz · 2013-04-29 12:22

The one problem I see with this is that even early game platforms had more directly addressable memory than the P2. For instance, the PS1 has 2MB. I know we have fast access to SDRAM but won't we have to make a VM that can directly address that external memory to compete with even these early platforms?

rod1963 · 2013-04-29 12:56

If you can get Doom ported and running, you've got a chance. IMO, that would validate the P2 as a retro game station.

David Betz · 2013-04-29 12:57

rod1963 wrote: »

If you can get Doom ported and running, you've got a chance. IMO, that would validate the P2 as a retro game station.

I guess to do that we'll need PropGCC running code out of SDRAM.

pedward · 2013-04-29 13:05

FWIW, right now we have 60 MHz at 16 bits, ~120 MB/s, with 89% efficiency.

Scale this up to the actual clock speed of the real chip (160 MHz), and the speed goes to 320 MB/s. That is about 2.5 times the PSX's bus bandwidth (roughly 33 MHz x 32 bits, ~132 MB/s).

However, the PSX used 60 ns RAM, which is only about 16 MHz effective speed at 32 bits wide, for a whopping 64 MB/s of bandwidth.
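All of these figures come from one formula: clock rate times bytes per transfer, optionally scaled by efficiency. A quick sketch, assuming a 160 MHz P2 clock and a ~33 MHz, 32-bit PSX bus (neither is measured here); note that 60 ns works out to about 16.7 MHz and ~67 MB/s, which the post rounds down to 16 MHz and 64 MB/s.

```python
def bandwidth_mb_s(clock_mhz: float, bus_bits: int, efficiency: float = 1.0) -> float:
    """Transfer rate in MB/s: transfers per microsecond x bytes per transfer x efficiency."""
    return clock_mhz * (bus_bits / 8) * efficiency


def clock_from_access_mhz(access_ns: float) -> float:
    """Effective clock of RAM that completes one access every `access_ns` nanoseconds."""
    return 1000.0 / access_ns


sdram_now_raw = bandwidth_mb_s(60, 16)                   # current 60 MHz, 16-bit setup
sdram_now_eff = bandwidth_mb_s(60, 16, 0.89)             # after the quoted 89% efficiency
sdram_full = bandwidth_mb_s(160, 16)                     # same bus at an assumed 160 MHz
psx_bus = bandwidth_mb_s(33, 32)                         # assumed ~33 MHz, 32-bit PSX bus
psx_ram = bandwidth_mb_s(clock_from_access_mhz(60), 32)  # 60 ns RAM, 32 bits wide

for name, value in [
    ("SDRAM @ 60 MHz, raw", sdram_now_raw),
    ("SDRAM @ 60 MHz, effective", sdram_now_eff),
    ("SDRAM @ 160 MHz, raw", sdram_full),
    ("PSX bus", psx_bus),
    ("PSX 60 ns RAM", psx_ram),
]:
    print(f"{name:28s} {value:6.1f} MB/s")
print(f"P2 full speed vs PSX bus: {sdram_full / psx_bus:.1f}x")
```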

Even with the caveat of marshaled SDRAM access, I see the P2 beating the PSX in many areas.

The LMM (or whatever it will be called) kernel will have to play nicely with the SDRAM, but still should be able to move fast.

Another option I was thinking about is using a second prop2 as a memory controller and using it to simplify the DMA access to the primary prop.

The XFR can dump data into COG RAM at the same rate as the COG can execute, so you can stream instructions from external memory and execute them in real time. If you set up a 32-bit XFR between the 2 props, you could jam data from one to the other at full clock speed and allow a 4-instruction loop to repeat very quickly from an overlay.

You could also use the CLUT as an advanced cache if you precompile the code to know what it needs to cache prior to execution.

I believe the Prop2 cache would be considered a direct-mapped, 4-long data cache.

Here's another thing you can do:

Stuff a bunch of data into the CLUT in one COG, then use the XFR to transfer it via port D from COG a to COG b in real time. If COG a were the SDRAM COG, it could stuff data into its CLUT and also service HUB memory. The SDRAM data would drop directly into the CLUT in 256 clocks; then COG a would initiate a transfer out to port D, and COG b would initiate a matching transfer in. You could copy data from one CLUT to another using DMA and no COG clocks.

By using the CLUT to do DMA transfer of reads from SDRAM to another COG, you free up the SDRAM server COG to do a memory transfer from HUB into buffer space in that COG too. Once the XFR is done, initiate another from buffered memory.
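This overlap is ordinary double buffering: while one buffer is being sent over port D, the next one is being filled from SDRAM. A rough timing model, taking the post's 256 clocks to fill a CLUT-sized block and assuming (not from the post) 256 clocks to send it at one long per clock:

```python
def transfer_clocks(blocks: int, fill: int, send: int, buffers: int) -> int:
    """Clock at which the last block finishes sending.

    Each block is filled from SDRAM into a free buffer, then sent to the other COG.
    Filling block i waits for the previous fill and, once every buffer is in use,
    for the send that frees buffer (i - buffers). Sends happen one at a time.
    """
    fill_done: list = []
    send_done: list = []
    for i in range(blocks):
        start = fill_done[-1] if fill_done else 0
        if i >= buffers:
            start = max(start, send_done[i - buffers])  # wait for a free buffer
        fill_done.append(start + fill)
        send_start = max(fill_done[i], send_done[-1] if send_done else 0)
        send_done.append(send_start + send)
    return send_done[-1] if send_done else 0


FILL_CLOCKS = 256  # SDRAM -> CLUT, figure from the post
SEND_CLOCKS = 256  # CLUT -> port D at one long per clock (assumed)

single = transfer_clocks(16, FILL_CLOCKS, SEND_CLOCKS, buffers=1)
double = transfer_clocks(16, FILL_CLOCKS, SEND_CLOCKS, buffers=2)
print(f"16 blocks, one buffer:  {single} clocks")
print(f"16 blocks, two buffers: {double} clocks")
```

With equal fill and send times the second buffer nearly halves the total; if one stage is much slower than the other, that slower stage sets the pace no matter how many buffers you add.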

You can handle starts and stops of transfers by using SETTASK to cause the stop to execute 16 clocks after the start, then use other threads to yank data from memory and adjust the queue pointers. It's clear that this COG code will become very complicated to make the most efficient use of resources (write combining, read combining, scatter-gather, etc).

The other neat thing is that COG b could just stream this data into 4 remapped registers and execute 256 instructions in sequence. This would allow unrolled loops to execute fairly fast. Obviously there are some timing issues that need to be worked out, but there is a lot of potential there.

pedward · 2013-04-29 13:09

The key to getting good performance will be to implement smart caching algorithms to handle the penalty of SDRAM read/write bottlenecking.

The P2 would already benefit from a 2- or 4-way associative read cache. That way you could have 2 or 4 readXXXXc instructions execute within a tight loop without hitting a hub penalty.
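The difference shows up as soon as a loop alternates between two tables. This sketch models a small read cache with `ways` lines of 4 longs each and LRU replacement; `ways=1` stands in for the single cached line. Addresses are counted in longs for simplicity, and the two table addresses are hypothetical.

```python
from collections import OrderedDict

LONGS_PER_LINE = 4  # assumed: one cache line holds 4 longs


class ReadCache:
    """One set of `ways` lines with LRU replacement; ways=1 is a single cached line."""

    def __init__(self, ways: int) -> None:
        self.ways = ways
        self.lines = OrderedDict()  # line tag -> None, ordered oldest to newest use
        self.hits = 0
        self.misses = 0

    def read(self, long_addr: int) -> None:
        tag = long_addr // LONGS_PER_LINE
        if tag in self.lines:
            self.lines.move_to_end(tag)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1  # would cost a hub access
            self.lines[tag] = None
            if len(self.lines) > self.ways:
                self.lines.popitem(last=False)  # evict the least recently used line


def run_loop(ways: int, iterations: int = 100) -> ReadCache:
    """A tight loop reading alternately from two tables in different hub lines."""
    cache = ReadCache(ways)
    table_a, table_b = 0x100, 0x800  # hypothetical long addresses
    for i in range(iterations):
        cache.read(table_a + i % 4)
        cache.read(table_b + i % 4)
    return cache


for ways in (1, 2, 4):
    c = run_loop(ways)
    print(f"{ways}-way: {c.hits} hits, {c.misses} misses")
```

With one line, the two tables keep evicting each other and every read misses; with two or more, only the first touch of each table misses.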

I realize the existing cache is limited by the architecture and dovetails precisely into the design.

David Betz · 2013-04-29 13:11

All of this discussion about streaming instructions into the LMM COG and executing them in real time worries me because it only really speeds up linear code. I think in practice most code branches a lot and much of the advantage of streaming code may be lost when flushing caches on branches.

pedward · 2013-04-29 13:19

David Betz wrote: »

All of this discussion about streaming instructions into the LMM COG and executing them in real time worries me because it only really speeds up linear code. I think in practice most code branches a lot and much of the advantage of streaming code may be lost when flushing caches on branches.

We are talking about video games, right?

For general-purpose LMM code, performance will be limited by the way memory is accessed. In this case, it might make sense to read large blocks and cache them in HUB RAM to try to cover branches and such. Perhaps distinguish short jumps from long jumps: short jumps stay within the cached block, while long jumps invalidate the cache and force a load from SDRAM.
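The short/long jump idea amounts to caching one aligned block of code in HUB RAM. In this sketch (the block size and the Python list standing in for SDRAM are hypothetical), any fetch inside the current block is served from HUB, and only a jump that leaves it triggers a block load from SDRAM:

```python
BLOCK_LONGS = 256  # hypothetical size of the HUB-resident code block, in longs


class BlockCachedFetcher:
    """Instruction fetch for an LMM-style kernel with one cached block of code."""

    def __init__(self, sdram: list) -> None:
        self.sdram = sdram    # stand-in for external memory, one entry per long
        self.base = None      # SDRAM address of the cached block, if any
        self.block: list = []
        self.block_loads = 0  # how many times we paid the SDRAM penalty

    def fetch(self, pc: int) -> int:
        base = pc - pc % BLOCK_LONGS
        if base != self.base:
            # Long jump (or running off the end): invalidate and reload the block.
            self.block = self.sdram[base:base + BLOCK_LONGS]
            self.base = base
            self.block_loads += 1
        # Sequential code and short jumps are served from the HUB copy.
        return self.block[pc - base]


program = list(range(1024))    # fake instructions: each long holds its own address
cpu = BlockCachedFetcher(program)
trace = list(range(10)) * 50   # a tight loop inside block 0
trace += [600, 601, 602]       # long jump into another block
trace += [5]                   # and back again
fetched = [cpu.fetch(pc) for pc in trace]
print("fetches:", len(trace), "block loads:", cpu.block_loads)
```

Five hundred loop iterations cost a single block load; each long jump costs a full reload, which is why branch-heavy code is the worry.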

Only purpose-written code could run the unrolled streaming instruction transfer. This could be particularly useful for tightly written codecs. You would likely want a codec to live in COG cache and run from there, streaming its data, but if you had some really long sequence of instructions to run, this would allow that to happen.

David Betz · 2013-04-29 13:22

I guess I was thinking of trying to run the Doom source code which probably isn't written with this streaming in mind. You're certainly right that code written specifically to take advantage of this instruction streaming would be very fast.

rod1963 · 2013-04-29 14:00

If I read Pedward's analysis right, it means that open source games are unfit for the P2, as their source code would have to be completely rewritten in order to make it Prop-friendly.

So no Duke Nukem or Doom for the P2.

tonyp12 · 2013-04-29 14:04

Since you don't need many pins for video, audio out, and Bluetooth input for controllers,
maybe the SDRAM access could be made a 64-bit affair?

pedward · 2013-04-29 14:45

tonyp12 wrote: »

Since you don't need many pins for video, audio out, and Bluetooth input for controllers,
maybe the SDRAM access could be made a 64-bit affair?

I suggested to Chip that we could have 2 SDRAM chips on different COGs; perhaps the compromise is to do 2 SDRAM chips on a 32-bit-wide bus, slaved together.

The difference: in case 1 you'd have interleaved memory access (faster random access); in case 2, more bus width to double the bandwidth, while giving up fewer pins.

640 MB/s raw for 2 chips is a pretty good deal. That would enable fetching a whole long in one clock instead of two.
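The two layouts differ in how an address maps onto the chips. A sketch, assuming 16-bit chips, a hypothetical 32-byte burst per access, and a 160 MHz clock: option 1 alternates bursts between independent chips, option 2 has both chips answer the same address so each supplies half of every long.

```python
LINE_BYTES = 32  # hypothetical burst size per SDRAM access


def interleaved_chip(byte_addr: int) -> tuple:
    """Option 1: two independent 16-bit chips; consecutive bursts alternate chips,
    so an access to one chip can overlap a burst on the other.
    Returns (chip number, address within that chip)."""
    line = byte_addr // LINE_BYTES
    chip = line % 2
    local = (line // 2) * LINE_BYTES + byte_addr % LINE_BYTES
    return chip, local


def ganged_long(low_half: int, high_half: int) -> int:
    """Option 2: both chips share address and control lines; each supplies 16 bits
    of every 32-bit long, so one clock yields a whole long."""
    return (high_half & 0xFFFF) << 16 | (low_half & 0xFFFF)


def raw_mb_s(clock_mhz: float, chips: int, bits_per_chip: int = 16) -> float:
    """Raw bandwidth with all chips transferring every clock."""
    return clock_mhz * chips * bits_per_chip / 8


print(interleaved_chip(0), interleaved_chip(32), interleaved_chip(70))
print(hex(ganged_long(0x5678, 0x1234)))
print(raw_mb_s(160, 1), "MB/s with one chip,", raw_mb_s(160, 2), "MB/s with two")
```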

pedward · 2013-04-29 14:51

rod1963 wrote: »

If I read Pedward's analysis right, it means that open source games are unfit for the P2, as their source code would have to be completely rewritten in order to make it Prop-friendly.

So no Duke Nukem or Doom for the P2.

I think that's a rather bold and potent statement, not one I'd get behind.

Doom could run on a 386 at 40 MHz with 4 MB of RAM and 70 ns memory.

At 70 ns, that's about 14.3 MHz at 32 bits, or ~57 MB/s. The 386-40 had a bus speed of 160 MB/s, so each memory access took nearly 3 bus clocks' worth of time, i.e. it ran with several wait states.
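The 386 numbers follow from the same arithmetic. This sketch treats every access as a full random access (no page-mode bursts), which is a simplifying assumption:

```python
ACCESS_NS = 70  # DRAM access time
CPU_MHZ = 40    # 386-40 bus clock
BUS_BITS = 32

mem_mhz = 1000 / ACCESS_NS           # ~14.3 MHz effective memory rate
mem_mb_s = mem_mhz * BUS_BITS / 8    # ~57 MB/s
bus_mb_s = CPU_MHZ * BUS_BITS / 8    # 160 MB/s
clocks_per_access = bus_mb_s / mem_mb_s  # same as 70 ns / 25 ns bus clock

print(f"memory: {mem_mhz:.1f} MHz, {mem_mb_s:.0f} MB/s")
print(f"bus:    {bus_mb_s:.0f} MB/s, ~{clocks_per_access:.1f} bus clocks per access")
```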

A lot of what I was referring to is compiler magic, LMM kernel magic, and SDRAM driver magic; the C code that id wrote in 1993 wouldn't be altered significantly.

Rayman · 2013-04-29 18:59

Personally, I hope it's the next great retro game platform.

I'm hoping that somebody is able to port MAME to P2...
P2 should be perfect for emulating old games.
I have an arcade cabinet with an old PC inside running MAME.
It works pretty well, but is big and slow and takes time to boot.
I think a P2 JAMMA interface would be great...

David Betz · 2013-04-29 19:59

In what way would running MAME on a P2 be better than running it on, say, a Raspberry Pi or a BeagleBone? Would it be cheaper? Would it perform better? I'm not trying to be difficult. I'm just trying to understand in what way the P2 would make a superior retro gaming platform.

cgracey · 2013-04-29 20:47

I don't think the Prop2 will be a contender in the emulation game, but it will do some neat things that cross over between computers, micros, FPGAs, and analog. The world may not ever care a whole lot about it, but people who like to tinker and have a distrust towards things that are overly complicated, unknowable, and possibly compromised will hopefully be drawn to it. It only does what YOU want, not what the marketing of a big corporation wants, not what any government wants, and not what some thief wants. That's better than every other system I currently program on.

pedward · 2013-04-29 23:09

One thing Chip said: knowable

I like the Propeller architecture because its scope is so small: it was 1 chip, and soon it will be 2.

You can become very proficient in understanding the architecture, then have a set of tools you can apply to the problems you encounter. By no means is it a case of a hammer treating everything as a nail. Too many other manufacturers deprecate their chips and constantly change them. Heck, the datasheet for the H8/3664 chip is over 300 pages of nauseating register descriptions and terrible flow charts, which still leave you as bewildered as when you started.

What I think is important is that you can master a good portion of the Propeller design and use, then apply higher level software development skills to create software defined peripherals. You aren't at the mercy of poor documentation and you can make the peripheral do whatever you want.

You aren't constrained by the designer's view of what constitutes a proper micro. It's clear that companies like Motorola give far more consideration to their designs than companies like Hitachi (Freescale and Renesas respectively; and WTF, why did they drop names that everyone knew for stupid-sounding made-up ones?)

While you do encounter some limitations from time to time with the Propeller architecture, due to the generalized nature of the design, you often can find compromises to fit the requirement.

The way I see it, the Propeller 2 is so powerful and generalized that it could be really good at a lot of things. It can be a good arcade/indie game console, it can be a rippin' PLC, it can be a custom crypto controller, it can be a pedestrian protocol converter, it can be an old tech to new tech glue logic, it can be a sign driver, it can be a dead drop or geocache controller. There are many applications we can't yet imagine, which it will be good at.

I think an SDR transceiver would be neat, working with any protocol within the range it could tune. You could also have a ham radio add-on device that handles all the standard digital modes like PSK and has a waterfall display, with a narrowband spectrum analyzer (200 kHz is very usable, 1 MHz even better). If you tap into the pre-filter IF stages, you could apply a lot of DSP directly to the raw data and make ~old~ HF radios work very well.

potatohead · 2013-04-30 00:34

Seconded on the knowable. That's worth a lot.

Re: MAME

Well, who knows? Probably it can do some MAME games, maybe through the mid-90's. Frankly, I've very little interest in that. There are lots of great ways to run MAME; a general-purpose PC can do that task so well now, it's silly. There is one specific thing, though, and that is driving older displays. You can do it on a PC if you get graphics hardware capable of the signals and timing. Older Matrox cards are a good choice; they are weak at 3D, but then again, the games that ran on those displays really aren't too demanding in the 3D department anyway.

That said, a P2 will be ideal for driving any old display, and it will do so old-school, accurate. There is a niche appeal there that the right people will do a lot of work and pay some to realize. Everybody else will just retrofit a more modern display and call it good, and for those people, a PC running something will get the job done very nicely. I could be wrong on that too.

One potential area of superiority is low-latency input and display. Some retro games really depend on things happening frame-exact, or better! Android isn't known for low-latency input; lag has been an issue the whole time. Other OSes can do better, but it's still not the same as the old-school controls mapped right to in-memory registers that could be read directly. There is the controller hardware, shifters, USB, drivers, OS, etc... in the way, which does impact things. I notice this with Atari 2600 emulation. A few titles there just do not play the way they do on the real console when using an analog display.

Rayman · 2013-04-30 03:35

You're right about the arcade monitors... They need special timings that most video cards won't do. It took me a lot of research to find a graphics card that would look right on my monitor. Even so, the picture is sideways and out of sync when booting and only looks right once MAME takes over.
I'm not even sure if I could find a replacement these days...

BTW: It's really not the same without a real arcade monitor...

Maybe I'm missing something, but I think P2 is perfect for arcade emulation...

Pharseid380 · 2013-04-30 16:57

Doom wouldn't use the Prop ]['s special texture mapping ops, so it isn't the optimal game for it. A more interesting possibility is to write code to display the .MAP level format from scratch, which would utilize the P2's 3D potential. It's not as simple as just recompiling C code, but .MAP was a standard used by many companies for a while (I think a lot of other formats are just tweaked .MAP), so most of the work of implementing a couple of generations of 3D games could be done in one stroke.

Rayman · 2013-04-30 18:48

I like that idea... I remember using a CAD-like design tool that created .map files that you could then walk around in...

FredBlais · 2013-05-02 06:16

Rayman wrote: »

Maybe I'm missing something, but I think P2 is perfect for arcade emulation...

I agree. I still use my Hydra a lot (mainly for prototyping). There are some interesting games on it!

I wonder if we might see a Hydra 2 after the launch.

JT Cook · 2013-05-02 16:28

MAME is not a good candidate for P2. The versions of MAME that run well on smaller ARM systems have CPU cores that are written in ARM assembly. Also, comparing Doom on a 386 to a Propeller 2 is apples and oranges, since Doom used a lot of x86 assembly. The C-only version of Doom is not the same as the one that will run on a 386. Doom may be doable on a P2, but it just won't be a straight compile like on a Linux-based system. And even if someone somehow compiles a MAME binary that runs on the P2, the performance, I would wager, would be horrible.

With that said, the P2 can still probably emulate some old school arcade games, but they will have to be custom written emulators from the ground up.

Pharseid380 · 2013-05-02 16:45

In a post quite some time back, somebody mentioned using an FPGA to bootstrap the P2. If a Hydra 2 is made, they should give the FPGA its own SDRAM and expand its role to implement part of the 3D rendering pipeline. After transforming, you can do backface culling to eliminate half the polygons that need to be considered. After projecting, you can eliminate even more that don't fall within the viewport. Aside from the processing cycles saved for the P2, you would probably eliminate at least 75% of the polygons that needed to be considered for rendering, freeing up a whole lot of memory bandwidth. The complexity and resolution of the 3D scenes that could be rendered could be greatly improved that way.
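Both culling steps are cheap per-triangle tests. This sketch works on already-projected screen-space triangles and uses the winding (signed-area) test, one common way to do backface culling; it assumes counter-clockwise front faces with y pointing up. The viewport test is conservative: it rejects only triangles whose bounding box lies entirely off screen, so partially visible ones still need clipping.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def signed_area(tri: Triangle) -> float:
    """Half the 2D cross product of two edges; positive for counter-clockwise winding."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    return ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / 2


def is_back_facing(tri: Triangle) -> bool:
    # Assumed convention: front faces wind counter-clockwise; degenerate ones are dropped too.
    return signed_area(tri) <= 0


def off_screen(tri: Triangle, width: int, height: int) -> bool:
    """True only if the triangle's bounding box misses the viewport entirely."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    return max(xs) < 0 or min(xs) >= width or max(ys) < 0 or min(ys) >= height


def cull(tris: List[Triangle], width: int, height: int) -> List[Triangle]:
    """Keep triangles that face the viewer and may touch the viewport."""
    return [t for t in tris if not is_back_facing(t) and not off_screen(t, width, height)]


scene = [
    ((10, 10), (50, 10), (10, 50)),     # front-facing, on screen: kept
    ((10, 10), (10, 50), (50, 10)),     # same triangle, reversed winding: culled
    ((400, 10), (450, 10), (400, 50)),  # front-facing but right of a 320-wide screen: culled
]
visible = cull(scene, 320, 240)
print(f"{len(visible)} of {len(scene)} triangles survive")
```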

Rayman · 2013-05-02 18:43

There is already a Z80 emulator (I think from Heater) that runs in 1 P1 cog. I think a few of these, a video driver, and a sound driver are all you need.
So, maybe the ASM part of MAME would have to be recoded for P2, but I'm pretty sure it would work.

Heater. · 2013-05-02 18:59

We have my ZiCog Z80 emulator. It is missing a couple of lesser-used Z80 instructions. It runs CP/M just fine, though, as most CP/M programs don't use those operations. I'm looking forward to getting ZiCog up on the PII, where we won't need the hassle of adding external RAM. I'm not really interested in pushing ZiCog into other machine emulations that require special sound and graphics. Anyone is welcome to take the CPU emulation and run with it, though.

Then there is qZ80 from PullMoll. It has 100% accurate instruction emulation and has already been used for game machine simulation.

As it happens PullMoll also developed the Z80 emulation used in MAME.

http://forums.parallax.com/showthread.php/110804-ZiCog-a-Zilog-Z80-emulator-in-1-Cog

http://forums.parallax.com/showthread.php/121579-qZ80-the-third-shot

jac_goudsmit · 2013-05-05 11:09

I'm thinking that with the extra hub memory of the P2 it will definitely be possible to do a version of my project (a software-defined 6502 computer) without external RAM, and with the extra speed it may be possible to emulate a 6502 at full speed too. The hard part of emulating old hardware is often the video hardware, but from what I've seen of the P2 so far, I have a feeling that graphics at the level of, say, a C64 or a BBC computer should be in the realm of possibility...

===Jac
