Propeller 2: next great Indie video game platform?
pedward
Posts: 1,642
With all of the uber awesome capabilities everyone has demonstrated with visual effects, I can't help but think the P2 represents a new dawn in indie video game development. A truly accessible platform that rewards clever video game developers with neat effects.
I was just checking the specs of the Propeller 2 against the Playstation 1:
MIPS: P2 160 per COG 1280 per chip -- PSX 33 main CPU, 66 geometry engine
Video: P2 1bit, 4bit, 8bit, 15bit, 16bit, 24bit, res is whatever you want -- PSX 256-640 x 240 NI or 256-640 x 480 Interlaced, 24bit
There are a bunch of other things, and I don't know how to really compare the geometry engines, but at the rendering rate of the PSX you'd have 444 clock cycles per polygon on a P2 COG to achieve the same rate, or 888 if they are texture mapped polygons.
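For anyone who wants to check the 444 / 888 figures, here is a rough sketch of the arithmetic. The 160 MIPS per COG comes from the spec line above; the PSX polygon rates (360,000 flat-shaded and 180,000 texture mapped per second) are the commonly quoted numbers and are an assumption here, since the post doesn't state them. The function name is made up for illustration.

```python
# Per-polygon instruction budget on one P2 COG when matching the PSX polygon rate.
P2_COG_MIPS = 160  # from the post: 160 MIPS per COG

# ASSUMPTION: commonly quoted PSX rates, not stated in the post.
PSX_FLAT_POLYS_PER_SEC = 360_000
PSX_TEXTURED_POLYS_PER_SEC = 180_000


def instructions_per_polygon(mips, polys_per_sec):
    """Whole instructions available per polygon at the given polygon rate."""
    return (mips * 1_000_000) // polys_per_sec


flat_budget = instructions_per_polygon(P2_COG_MIPS, PSX_FLAT_POLYS_PER_SEC)
textured_budget = instructions_per_polygon(P2_COG_MIPS, PSX_TEXTURED_POLYS_PER_SEC)
print(flat_budget, textured_budget)
```

This is the budget for a single COG; spreading the work over several COGs would multiply it accordingly.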
The PSX has some hard limits like 256x256 sprites, but obviously these are only limited by what a programmer wants on the P2. I think that with all of the computational engines on the P2, it would make for a really fine indie console engine.
Comments
IMHO, Chip's other vision is good too: a real-time, or just fast, display of real world things. A P2 packaged nicely, with storage (2 SD cards maybe, one for system, one for data), some nicely designed I/O, etc... could make for a great bench machine to do all sorts of stuff. That one is appealing to me personally as I've got P1s set up as a cheap logic analyzer, which was plenty fast for my Apple 2, and as a video test generator / audio tone generator.
Such a machine with some interactivity, maybe a quickie DOS, or menu / GUI, whatever, could present a wide range of tools to somebody looking to explore on the cheap. When Chip gets done with the P2 native dev environment... that same setup could be used on the fly to do lots of additional things.
One other thought I had was "demo scene" type things. Put a limit out there, say HUB memory only, or one load that is X size, then see what people can do. Today, while exploring Jim's SDRAM work, I'm just kind of floored at how roomy things are and at the overall performance, even a few COGS down and clocked at 1/3 - 1/4 speed. (That depends on how we see overclocking play out; maybe it will be like P1, with some room to spare = 200MHz.)
I asked Chip whether we could directly generate high quality sine waves at broadcast frequencies. Nope ;( He said 10MHz was about the limit.
Multiply this up to the actual clock speed of the real chip, and the speed goes to 320MB/s. This is about 2.5 times the bus speed of the PSX.
However, the PSX used 60ns RAM, which is only about 16MHz effective speed at 32 bits wide, for a whopping 64MB/s bandwidth.
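The RAM figure is easy to reproduce. A minimal sketch, assuming one bus-wide word is transferred per access time (no burst or page-mode tricks); the function name is made up for illustration:

```python
def bandwidth_mb_per_s(access_time_ns, bus_width_bits):
    """Peak MB/s if one bus-wide word moves per access time (simplifying assumption)."""
    accesses_per_sec = 1e9 / access_time_ns
    bytes_per_access = bus_width_bits / 8
    return accesses_per_sec * bytes_per_access / 1e6


# 60ns RAM on a 32-bit bus, as described for the PSX above.
psx_ram_mb_per_s = bandwidth_mb_per_s(60, 32)
print(f"{psx_ram_mb_per_s:.1f} MB/s")
```

Without rounding, 60ns works out to about 16.7MHz and about 66.7MB/s; the 64MB/s above comes from rounding down to 16MHz first.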
Even with the caveat of marshaled SDRAM access, I see the P2 beating the PSX in many areas.
The LMM (or whatever it will be called) kernel will have to play nicely with the SDRAM, but still should be able to move fast.
Another option I was thinking about is using a second prop2 as a memory controller and using it to simplify the DMA access to the primary prop.
The XFR can dump data into COG ram at the same rate as it can execute, so you can stream instructions from external memory and execute them in realtime. If you set up a 32bit XFR between the 2 props, you could jam data from one to the other at full clock speed and allow a 4 instruction loop to repeat very quickly from an overlay.
You could also use the CLUT as an advanced cache if you precompile the code to know what it needs to cache prior to execution.
I believe the Prop2 cache would be considered a direct mapped 4 long data cache.
Here's another thing you can do:
Stuff a bunch of data into the CLUT in one COG, then use the XFR to transfer data via port D from COG a to COG b in realtime. If COG a was the SDRAM COG, it could stuff data into CLUT and service HUB memory. The SDRAM would drop directly into the CLUT, 256 clocks, then the COG would initiate a transfer to port D and the other one would do that as well. You could copy data from one CLUT to another using DMA and no COG clocks.
By using the CLUT to do DMA transfer of reads from SDRAM to another COG, you free up the SDRAM server COG to do a memory transfer from HUB into buffer space in that COG too. Once the XFR is done, initiate another from buffered memory.
You can handle starts and stops of transfers by using SETTASK to cause the stop to execute 16 clocks after the start, then use other threads to yank data from memory and adjust the queue pointers. It's clear that this COG code will become very complicated to make the most efficient use of resources (write combining, read combining, scatter-gather, etc).
The other neat thing is that COG b could just stream this data into 4 remapped registers and execute 256 instructions in sequence. This would allow unrolled loops to execute fairly fast. Obviously there are some timing issues that need to be worked out, but there is a lot of potential there.
The P2 would already benefit from a 2 or 4 way associative read cache. This way you could have 2 or 4 readXXXXc instructions execute within a tight loop and not hit a hub penalty.
I realize the existing cache is limited by the architecture and dovetails precisely into the design.
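To illustrate why more ways would help a tight loop, here is a toy simulation. It is only a model, not how the P2 cache is built: it assumes lines of 4 longs (as mentioned above), LRU replacement, and addresses counted in longs. The function name and the example loop are invented for illustration.

```python
from collections import OrderedDict

LINE_LONGS = 4  # longs per cache line, per the "4 long" cache mentioned above


def count_hub_fetches(addresses, ways):
    """Count line fills for a toy LRU cache holding `ways` lines of 4 longs."""
    lines = OrderedDict()  # line tag -> None, ordered least to most recently used
    fetches = 0
    for addr in addresses:
        tag = addr // LINE_LONGS
        if tag in lines:
            lines.move_to_end(tag)  # hit: mark as most recently used
        else:
            fetches += 1  # miss: would pay the hub penalty
            lines[tag] = None
            if len(lines) > ways:
                lines.popitem(last=False)  # evict the least recently used line
    return fetches


# A loop that alternates cached reads from two distant tables.
loop_reads = [base + i for i in range(8) for base in (0, 1000)]

single_line_fetches = count_hub_fetches(loop_reads, ways=1)
two_way_fetches = count_hub_fetches(loop_reads, ways=2)
print(single_line_fetches, two_way_fetches)
```

With one line, every read in this loop evicts the line the other table needs; with two lines, only the first touch of each 4-long line misses.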
We are talking about video games, right?
For general purpose LMM code, performance will be limited because of the nature of how memory is accessed. In this case, it might make sense to read large blocks and cache them into HUB ram to try and cover branches and such. Perhaps short jump branches and long jump branches, the short jumps are within the cached block, long jumps invalidate the cache and force a load from SDRAM.
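The short jump / long jump idea can be sketched roughly like this. Everything here is hypothetical (class name, block size, using a Python list to stand in for SDRAM); it only shows the bookkeeping, where a fetch inside the cached block is cheap and a fetch outside it forces a reload.

```python
BLOCK_SIZE = 64  # hypothetical cached block size, in instructions


class BlockCachedFetcher:
    """Toy model: one block of 'SDRAM' cached in 'HUB'; far jumps reload it."""

    def __init__(self, sdram, block_size=BLOCK_SIZE):
        self.sdram = sdram            # list standing in for external memory
        self.block_size = block_size
        self.block_start = None       # address of the first cached instruction
        self.block = []
        self.sdram_loads = 0          # how many times we had to go to SDRAM

    def fetch(self, pc):
        cached = (self.block_start is not None
                  and self.block_start <= pc < self.block_start + len(self.block))
        if not cached:
            # Long jump: invalidate and load the aligned block containing pc.
            self.block_start = pc - (pc % self.block_size)
            end = self.block_start + self.block_size
            self.block = self.sdram[self.block_start:end]
            self.sdram_loads += 1
        # Short jump (or sequential fetch): served from the cached block.
        return self.block[pc - self.block_start]


sdram = list(range(256))  # each "instruction" is just its own address here
fetcher = BlockCachedFetcher(sdram)
trace = [fetcher.fetch(pc) for pc in (0, 1, 2, 10, 3, 200, 201, 5)]
print(fetcher.sdram_loads)
```

In this trace the jumps to 10 and back to 3 stay inside the first block, while the jump to 200 and the return to 5 each force a reload.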
Only purpose written code could run the unrolled streaming instruction transfer. This could be particularly useful to codecs that are tightly written. Although, you would likely want a CODEC to be in COG cache and run from there, streaming data, but in the case you had some really long sequence of instructions to run, this would allow that to happen.
So no DukeNukem or Doom for the P2.
Maybe we can make the SDRAM access a 64bit affair?
I suggested to Chip that we could have 2 SDRAM chips on different COGS, perhaps the compromise is to do 2 SDRAM chips with a 32bit wide bus, slaved together.
The difference is that in the first case you'd have interleaved memory access (faster random access), and in the second, more bus width to double the bandwidth, with less loss of pins.
640MB/s raw for 2 chips is a pretty good deal. That would enable long-direct fetches instead of 2 clock fetches.
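The raw numbers fall out of clock times bus width. A quick sketch, assuming a 160MHz SDRAM clock and a 16 bit data bus per chip; both are assumptions that happen to reproduce the 320MB/s and 640MB/s figures in this thread and fit the remark about 2 clock fetches of a long. The function name is made up.

```python
def peak_mb_per_s(clock_mhz, bus_width_bits):
    """Raw peak bandwidth: one bus-wide transfer per clock."""
    return clock_mhz * bus_width_bits / 8


SDRAM_CLOCK_MHZ = 160  # ASSUMPTION: SDRAM clocked at the 160MHz system clock
CHIP_WIDTH_BITS = 16   # ASSUMPTION: 16 bit data bus per SDRAM chip

one_chip = peak_mb_per_s(SDRAM_CLOCK_MHZ, CHIP_WIDTH_BITS)
two_chips_slaved = peak_mb_per_s(SDRAM_CLOCK_MHZ, 2 * CHIP_WIDTH_BITS)
print(one_chip, two_chips_slaved)
```

These are raw peaks; refresh, row activation and command overhead would take a bite out of them in practice.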
I think that's a rather bold and potent statement, not one I'd get behind.
Doom could run on a 386-40MHz with 4MB of RAM and 70ns memory.
At 70ns that's 14.2MHz at 32 bits, ~57MB/s. The 386-40 had a bus speed of 160MB/s, so it had roughly 3 wait states for memory access.
A lot of what I was referring to is compiler magic, LMM kernel magic, and SDRAM driver magic; the C code that id wrote in 1993 wouldn't be altered significantly.
I'm hoping that somebody is able to port MAME to P2...
P2 should be perfect for emulating old games.
I have an arcade cabinet with an old PC inside running MAME.
It works pretty well, but is big and slow and takes time to boot.
I think a P2 Jamma interface would be great...
I like the Propeller architecture because the scope is so small: it was 1 chip, soon 2.
You can become very proficient in understanding the architecture, then have a set of tools you can apply to problems you encounter. By no means is it a hammer looking at everything as a nail. Too many other manufacturers deprecate their chips and constantly change them. Heck, the datasheet for the H8/3664 chip is over 300 pages of nauseating register descriptions and terrible flow charts, which still leave you as bewildered as when you started.
What I think is important is that you can master a good portion of the Propeller design and use, then apply higher level software development skills to create software defined peripherals. You aren't at the mercy of poor documentation and you can make the peripheral do whatever you want.
You aren't constrained by the designer's view of what constitutes a proper micro. It's clear that companies like Motorola give far more consideration to their designs than companies like Hitachi (Freescale and Renesas respectively, and WTF did they eschew names that everyone knew and make up stupid sounding names?)
While you do encounter some limitations from time to time with the Propeller architecture, due to the generalized nature of the design, you often can find compromises to fit the requirement.
The way I see it, the Propeller 2 is so powerful and generalized that it could be really good at a lot of things. It can be a good arcade/indie game console, it can be a rippin' PLC, it can be a custom crypto controller, it can be a pedestrian protocol converter, it can be an old tech to new tech glue logic, it can be a sign driver, it can be a dead drop or geocache controller. There are many applications we can't yet imagine, which it will be good at.
I think an SDR transceiver would be neat, working with any protocol within the range it could tune. You could also have a ham radio add-on device that does all the standard digital modes like PSK, etc and has a waterfall display, with a narrow band spectrum analyzer (200kHz is very usable, 1MHz even better). If you tap into the pre-filter IF stages, you could apply a lot of DSP directly to the raw data and make ~old~ HF radios work very well.
Re: MAME
Well, who knows? Probably it can do some MAME games, maybe through the mid-90's. Frankly, I've very little interest in that. Lots of great ways to run MAME. A general purpose PC can do that task so well now, it's silly. There is one specific thing though, and that is driving older displays. You can do it on a PC, if you get some graphics hardware capable of the signals and timing. Older Matrox cards are a good choice. They are weak at 3D, but then again, the games that ran on those displays really aren't too demanding in the 3D department anyway.
That said, a P2 will be ideal for driving any old display, and it will do so old-school, accurate. There is a niche appeal there that the right people will do a lot of work and pay some to realize. Everybody else will just retrofit a more modern display and call it good, and for those people, a PC running something will get the job done very nicely. I could be wrong on that too.
One potential area of superiority is low latency input and display. Some retro games really depend on things happening frame exact, or less! Android isn't known for low latency input. Lag has been an issue the whole time. Other OSes can do better, but still it's not the same as the old school controls mapped right to in-memory registers that could be read directly. There is the controller hardware, shifters, USB, drivers, OS, etc... in the way, which does impact things. I notice this on Atari 2600 emulation. A few titles there just do not play the way they do on the real console when using an analog display.
I'm not even sure if I could find a replacement these days...
BTW: It's really not the same without a real arcade monitor...
Maybe I'm missing something, but I think P2 is perfect for arcade emulation...
I agree. I still use my Hydra a lot (mainly for prototyping). There are some interesting games on it!
I wonder if we might see a Hydra 2 after the launch.
With that said, the P2 can still probably emulate some old school arcade games, but they will have to be custom written emulators from the ground up.
So, maybe the ASM part of MAME would have to be recoded for P2, but I'm pretty sure it would work.
Then there is the qZ80 from PullMoll. That has 100% accurate instruction emulation and has been used for game machine simulation already.
As it happens PullMoll also developed the Z80 emulation used in MAME.
http://forums.parallax.com/showthread.php/110804-ZiCog-a-Zilog-Z80-emulator-in-1-Cog
http://forums.parallax.com/showthread.php/121579-qZ80-the-third-shot
===Jac