Part of me would like to try a somewhat Gigatron-compatible machine without a native Gigatron underneath. I mean, why emulate the native Gigatron and have it emulate the vCPU set? It seems it would be more efficient to directly emulate vCPU.
But, since the native core does things other than run vCPU, those tasks would need to be handled another way. The Gigatron hackers don't really think that is possible. I think it can be, but it won't be easy. I think a supervisor (or "management engine") would be needed. It can handle things like initializing the cores, displaying a menu, playing the startup sound (a G chord), running the loader program, initializing memory, managing hotkeys, looking for a terminated program, and providing entropy. Really, the I/O core could do that and ensure the random number location is updated at least once per frame, but doing that when the supervisor is in standby mode can allow it to be updated more frequently. Of course, there is already the syscall for updating it on demand. TBH, I'd rather have that as an opcode if I were to spin my own off of this.
So the supervisor core would be what starts the machine, initializes things, and calls the loader. Then it passes control to the vCPU (or other cores if I were to add the 6502 core too, for instance). Then, as a change, have the supervisor core look for things such as terminated or hung programs, and if it finds one, take control back, maybe provide a "press any key" prompt, and then reload itself. That would be an innovation over the original design, where your only options are the hotkey (or the start button on the game controller) or unplugging the machine from the wall.
As for the video core, I say that for compatibility, maybe use it to gate the vCPU core, though really, it would be nice to have a new vCPU format and memory map, with options for letting coders determine this gating. Thus you could operate without any gating between the syncs and the vCPU except where gates are manually placed. So have extra vCPU opcodes to set the video gating mode: run only during the syncs (mode 1), run during the syncs and 1/4 of the native scanlines (mode 2), run during the syncs and 1/2 of the scanlines (mode 3), run during the syncs and 3/4 of the scanlines (mode 4), or run completely asynchronously of the video core, during syncs and all scanlines (may tearing and frame races be damned). With a way to control this, whether an opcode or a memory location (private or exposed), the programmer could alternate between mode 4 and the new ungated mode (equivalent to bus snooping). I wouldn't bother with actual line-skipping for some "retro look," as that would not be necessary since the video core would operate out of concurrent "DMA" (thank you, hub). Programs can already set modes 1-4, so gating would be good for compatibility, but for a new program format, make the ability to run ungated part of the specification. Now, on the P2, is it possible to use hardware interrupts to busy wait when there are no syncs?
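To make the gating idea concrete, here is a minimal sketch in C of how such a mode selector might behave. The mode names and the exact interleave pattern (every Nth visible scanline) are my assumptions, not anything from the Gigatron spec:

```c
#include <stdbool.h>

/* Hypothetical gating modes from the proposal above: the vCPU always runs
   during the sync intervals; modes 2-4 also grant it a fraction of the
   visible scanlines, and the ungated mode ignores the video core entirely. */
enum gate_mode {
    GATE_SYNC_ONLY = 1,   /* mode 1: run only during syncs            */
    GATE_QUARTER,         /* mode 2: syncs plus 1/4 of the scanlines  */
    GATE_HALF,            /* mode 3: syncs plus 1/2 of the scanlines  */
    GATE_THREE_QUARTER,   /* mode 4: syncs plus 3/4 of the scanlines  */
    GATE_UNGATED          /* fully asynchronous of the video core     */
};

/* Decide whether the vCPU may run during this slot.
   in_sync is true during h/v sync; line is the visible scanline index. */
bool vcpu_may_run(enum gate_mode mode, bool in_sync, unsigned line)
{
    if (mode == GATE_UNGATED) return true;   /* run regardless of video */
    if (in_sync) return true;                /* every mode runs in syncs */
    switch (mode) {
    case GATE_QUARTER:       return line % 4 == 0; /* 1 line in 4 */
    case GATE_HALF:          return line % 2 == 0; /* 1 line in 2 */
    case GATE_THREE_QUARTER: return line % 4 != 3; /* 3 lines in 4 */
    default:                 return false;         /* mode 1: syncs only */
    }
}
```

The actual control point (opcode vs. memory location) is orthogonal; this only shows the scheduling decision itself.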
And for other things like BASIC, that could have a cog and only be functionally compatible. Thus, just rewrite the best BASIC or a combination of the 3 (untokenized "Tiny BASIC," tokenized BASIC, and Apple-Microsoft BASIC). Thus it should fly.
Yes, the P2 event system (which includes interrupts) can be used to busy wait, and also to stall wait (e.g. WAITSEx). Stall waiting should give the best response time to the event.
I wonder if the P2 would be good for doing an entire arcade machine. I imagine, for older titles, it would be no sweat, and it could do multi-game as well.
Even solid-state pinball machines could be more reliable with a P2. I mean, it could greatly reduce the number of boards needed. You'd likely need the power board and the solenoid driver board (which is also a power board), and the MPU board could do everything else. That would mean fewer connectors. Connectors are a major failure point.
I don't know if it could run Doom. I don't remember how big the WAD file is. Doom would probably benefit if the game engine could be split across different cogs. Maybe have a cog just for computing enemies. Doom is a game that has been ported to many environments. So yeah, that would be closer to arcade style since that is how older arcade machines were. They were single-purpose computers and the boards were originally tailored to each title. While they had one or more CPUs, the boards were custom. Then after a point, they began to make things more common, so for a multigame, it was just a matter of switching ROMs, assuming the user interface was compatible (i.e., joysticks, flight stick, steering wheel, bike handles, or whatever).
I wonder if the P2 would be good for doing an entire arcade machine.
It can emulate the NeoGeo, a machine from the 1990s.
I don't know if it could run Doom. I don't remember how big the WAD file is.
The original Doom was an MS-DOS game. Maybe these files are too big for a standard P2, as this is a '90s game, from when at least 4 MB of extended memory was a standard feature, but a P2-EC32 should be more than enough to fit it.
Well, why not use external RAM? But I think it would need a new game engine in native P2 code, and maybe broken down to use different cogs for different parts of the game. For instance, handling the monsters could get its own cog. The P2 has the floating point power to make the calculations with most FP ops taking 2 P2 cycles (when Doom ran on 33 MHz machines). Sure, if you need the CORDIC solver, that's 58 cycles or whatever, though that could probably still be faster than on a 486 for anything that needs that.
You don't need to access the entire *.WAD files at once. They contain all the media for the game, including the maps, sprites, tiles, sounds, and music. There is also the Reject table. The Reject table was a refinement to reduce floating-point calculations and to allow for special effects like dormant monsters. If there is no clear line of sight from the monster to the player, the monster isn't updated; there is no need to calculate the line-of-sight angles and everything for a monster if it is not possible for the monster to hit you. That saved some processing power compared to blindly updating every nearby monster. Still, more power could be saved if things were multi-threaded: so have a thread for the player and a thread for the monsters.
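For reference, the REJECT lookup in the original engine is tiny. A sketch of that bit test, following the layout the Doom source uses (one bit per ordered sector pair, row-major):

```c
#include <stdbool.h>
#include <stdint.h>

/* Doom-style REJECT lookup: the REJECT lump is a flat bit array with one
   bit per (sector, sector) pair. A set bit means "no line of sight is
   possible between these sectors", so the expensive sight calculation
   for that monster can be skipped entirely. */
bool reject_blocks_sight(const uint8_t *reject, int numsectors, int s1, int s2)
{
    int pnum = s1 * numsectors + s2;               /* flatten the pair   */
    return (reject[pnum >> 3] >> (pnum & 7)) & 1;  /* test bit pnum      */
}
```

On a cog-per-subsystem split, the monster cog could run this test first and only fall through to the full sight check on a clear bit.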
So one would need to make use of overlays to be sure. Keep everything on flash memory cards, and maybe have a cog for a loading and overlay manager.
Well, if you don't mind having about Atari 2600 performance, I could see that. If you want to go beyond that with full colors, complex soundtracks, etc., then you'd get by with fewer concurrent games.
While the Transputer and the Occam language had been discussed on this forum in 2008 and 2010, they might not yet have been mentioned in this thread (it is so long now). https://www.transputer.net/iset/isbn-013929134-2/tthb.pdf . Could the P2 emulate an 8-core Transputer network?
Well, if you don't mind having about Atari 2600 performance, I could see that. If you want to go beyond that with full colors, complex soundtracks, etc., then you'd get by with fewer concurrent games.
Well, you were suggesting older titles and multi-player. I was just showing what had already been done.
As far as high performance goes, the work Ada has done is incredible with her Neo-Geo emulator.
No, I said multi-game (multiple titles, playing one at a time), not multi-player.
@PurpleGirl said:
I wonder if the P2 would be good for doing an entire arcade machine. I imagine, for older titles, it would be no sweat, and it could do *multi-play* as well.
Even solid-state pinball machines could be more reliable with a P2. I mean, it could greatly reduce the number of boards needed. You'd likely need the power board and the solenoid driver board (which is also a power board), and the MPU board could do everything else. That would mean fewer connectors. Connectors are a major failure point.
I don't know if it could run Doom. I don't remember how big the WAD file is. Doom would probably benefit if the game engine could be split across different cogs. Maybe have a cog just for computing enemies. Doom is a game that has been ported to many environments. So yeah, that would be closer to arcade style since that is how older arcade machines were. They were single-purpose computers and the boards were originally tailored to each title. While they had one or more CPUs, the boards were custom. Then after a point, they began to make things more common, so for a multigame, it was just a matter of switching ROMs, assuming the user interface was compatible (i.e., joysticks, flight stick, steering wheel, bike handles, or whatever).
Emphasis added to show where I think the confusion crept in.
As far as emulation of common hardware with multiple games is concerned, Ada's work with both MegaYume and NeoYume shows some of what can be done with the P2, some external RAM, and an SD card as the bulk storage device.
Multiplay or multigame is the industry term for an arcade cabinet with multiple titles.
Something I learned today.
For many of us, I would suggest, the term multiplay looks much the same as multiplayer, and a web search for 'multiplay definition' gives many hits from seemingly credible sources that would support that confusion.
Being outside of the industry, and therefore not across this jargon, the distinction wasn't obvious to me.
The only reason I say any of this is to try to clear up the confusion.
I think I meant multigame. Multiplay may actually have referred to pinball machines, meaning being able to play multiple rounds on the same quarter, often due to winning them (by skill and occasionally just by luck). Still, that is not the same as multiplayer.
So I think I might have used the wrong term by mistake. And I mentioned older titles since, being small, there would be enough room for them. Using the wrong term is easy to do when you spend a lot of time watching Ron Lyons work on pinball and arcade games. If you look it up on YouTube, the term to search for is Joe's Classic Arcade. Joe owns the shop, but his brother Ron is the one who does the videos. I've seen Joe do a couple of videos, but Ron does most, and in some of Ron's videos, Joe does playfield teardown, cleaning/waxing, and reassembly. But Ron does the paint touchup. On one arcade machine with a hole kicked in it, Ron did the electronic work and the painting, and Joe repaired the hole. Joe enlarged the hole into a clean circle, cut a circle of similar wood, set it in place with Bondo, and sanded it flush with the rest. Ron then matched the paint and freehanded it (with his left hand, BTW), and sprayed clear coat where he put the flat acrylic. Luckily, the artwork was so busy that you couldn't tell where he repainted it. As for how/why the hole got there, one can assume it was a thief trying to get at the nearby quarter tray. Pinball machines occasionally come into their shop with jagged holes in the bottom for the same reason. Joe also has some electronics skills, but he doesn't really go into the more in-depth repairs. I mean, he can recap one and change the flyback transformer, or swap a CRT, which gets the majority of arcade monitors back up. But if there are shorted transistors, holes burned in the high-voltage or sweep boards, or main board problems, Ron handles those. Donnie, their other brother, is more into fixing vehicles and rehabbing houses.
As a recap, I think we drifted in regard to my original mission. It was to make a new platform altogether, and I narrowed things later to 16-bit, preferably with external memory.
So I wanted to know what opcodes to put in a custom ISA, what memory map to use, what video and sound capabilities, etc. And not be limited to a specific solution or goal-oriented design which stereotypically fits guys better. And yes, the aggravation of writing new code for a fresh CPU design is part of the motivation. And I wouldn't mind having other chips on the board.
And I explored the possibility of using multiple P2 chips arranged into a "baseball diamond." So have a main P2 with its 8 cogs, a GP coprocessing P2, a GPU P2, and a massive sound solution on a P2. The main one and the GP coprocessor each connect to 3 P2s, the other 2 only connect to 2. Though of course, 3 might work about as well.
@PurpleGirl said:
As a recap, I think we drifted in regard to my original mission. It was to make a new platform altogether, and I narrowed things later to 16-bit, preferably with external memory.
So I wanted to know what opcodes to put in a custom ISA, what memory map to use, what video and sound capabilities, etc. And not be limited to a specific solution or goal-oriented design which stereotypically fits guys better. And yes, the aggravation of writing new code for a fresh CPU design is part of the motivation. And I wouldn't mind having other chips on the board.
And I explored the possibility of using multiple P2 chips arranged into a "baseball diamond." So have a main P2 with its 8 cogs, a GP coprocessing P2, a GPU P2, and a massive sound solution on a P2. The main one and the GP coprocessor each connect to 3 P2s, the other 2 only connect to 2. Though of course, 3 might work about as well.
I had a look at some minicomputers in the last weeks. It is interesting: those evolved quickly into 16-bit systems in the early 1970s or so. Then microprocessors and solid-state RAM became available, and the main focus was to make things cheaper, not better, for about 10 years. So the PDP-11 became the LSI-11/23. I think the PDP-11 has the most interesting instruction set (for a non-Forth machine :-) ).
However, if you want to have external memory, I think one or two hardware stacks might be a good thing, because they don't need many address lines. Read, write, and reset might be sufficient?
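A toy model of such an address-less stack, with only the three strobes mentioned. The depth, width, and names here are illustrative:

```c
#include <stdint.h>

/* Model of an external hardware stack: the CPU never presents an address,
   only data plus three strobes (write = push, read = pop, reset). The
   stack pointer lives inside the device and is invisible to the bus. */
#define STACK_DEPTH 256

typedef struct {
    uint16_t mem[STACK_DEPTH];
    int sp;                      /* internal pointer, not on the bus */
} hw_stack;

void stack_reset(hw_stack *s) { s->sp = 0; }

void stack_write(hw_stack *s, uint16_t v)   /* push on the write strobe */
{
    if (s->sp < STACK_DEPTH) s->mem[s->sp++] = v;
}

uint16_t stack_read(hw_stack *s)            /* pop on the read strobe */
{
    return s->sp > 0 ? s->mem[--s->sp] : 0; /* empty stack reads as 0 */
}
```

The point of the sketch: the whole device needs only a data bus and three control pins, which is why it is attractive when address lines are scarce.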
In the HP-85 calculator computer, they used MMU chips with their own address counters. The counters would watch the CPU operations and keep counting depending on the operation: "MMU coprocessors" for a special CPU.
I've been thinking of ways to rework retro machines. For instance, for some tasks like the ROM, why not rewrite that as P2 code and provide some means to intercept it? Sure, leave the original ROM for other things like character sets, programs that jump into those routines, etc., but have generic syscalls and interrupts to use native P2 code. That way, one could likely have better FP libraries, for instance. But if one does that, they'd need to keep things like character sets in mind, and if those are shadowed or can be modified by coders, ensure things appear to software the same way and can be modified the same way on the software end.
And I got to thinking some about the Atari 800. It uses the ANTIC display list processor and GTIA as the video controller. That setup never used a framebuffer. If it had one, programs could be faster since ANTIC used bus-mastering DMA (as did the DRAM refresh), and you could write a value to an ANTIC register to disable it. However, that also turned off the display. Sure, GTIA would keep the syncs and everything, but it would have nothing to display. So with a framebuffer, you could disable ANTIC and maintain a display. And taking that further, why not add something like David Murray's Vera idea? For compatibility, only make Vera registers available when ANTIC is disabled. And of course, only do that if one wants to write code that uses it. So add a framebuffer and a way to use a clothesline memory scheme with auto-increment to directly address it and otherwise use the framebuffer to hold ANTIC's output. So a coder could program the video either way.
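A rough sketch of what that auto-increment "clothesline" port could look like from the software side, in the spirit of VERA's address/data registers. The register names, buffer size, step behavior, and wrap-around are all hypothetical:

```c
#include <stdint.h>

/* Hypothetical auto-increment framebuffer port: set the address register
   once, then every access to the single data port touches the framebuffer
   and bumps the address by a programmable increment. */
#define FB_SIZE (320 * 240)

typedef struct {
    uint8_t  fb[FB_SIZE];
    uint32_t addr;       /* current framebuffer address        */
    uint32_t step;       /* auto-increment per data-port access */
} vid_port;

void vid_set_addr(vid_port *v, uint32_t addr, uint32_t step)
{
    v->addr = addr;
    v->step = step;
}

void vid_data_write(vid_port *v, uint8_t pixel)
{
    v->fb[v->addr % FB_SIZE] = pixel;  /* store, wrapping inside the buffer */
    v->addr += v->step;                /* then auto-increment               */
}
```

With a step of 1 the coder streams a scanline without ever re-addressing; with a step equal to the line pitch, a vertical strip can be written the same way.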
I probably wouldn't do either of the above, but rather spin my own. In some ways, that can be easier in that you don't have to come up with clever workarounds to add new features and still be compatible. And it can be harder in others, since you have no canon of software, let alone the core utilities such as assemblers and so on.
The Atari 800 has a framebuffer. This is interpreted in real time by Antic, then the data is sent to GTIA and to the video output. That's why there is no picture when Antic is off. The framebuffer is still there, but nothing gets data from it.
It seems your idea is to add a secondary framebuffer after GTIA, so when Antic is off, the picture will still be displayed, by what? An additional framebuffer controller? Yet this is the way emulators do their job. What is generated by the emulated GTIA goes to the buffer, and then it becomes a texture that is processed via OpenGL in the graphics card of the host computer. This is the only way it can be done on a PC, but it introduces at least one frame of delay. We don't need to do this on a P2, as the emulated Antic/GTIA can work in real time as in the Atari. Then our implementation of Antic doesn't need to stop the emulated 6502 to get access to the Atari framebuffer. It should, however, to make the cycle-exact emulation needed for demos and several games, but it doesn't have to.
We can think about an "ultra-Antic" and an "ultra-GTIA", compatible with the originals, but with new modes and resolutions added. Combine this with the "65032" I mentioned somewhere and we have a super-Atari8.
Maybe an even better solution is to leave GTIA as is and emulate VBXE instead.
@pik33 said:
The Atari 800 has a framebuffer. This is interpreted in real time by Antic, then the data is sent to GTIA and to the video output. That's why there is no picture when Antic is off. The framebuffer is still there, but nothing gets data from it.
It seems your idea is to add a secondary framebuffer after GTIA, so when Antic is off, the picture will still be displayed, by what? An additional framebuffer controller? Yet this is the way emulators do their job. What is generated by the emulated GTIA goes to the buffer, and then it becomes a texture that is processed via OpenGL in the graphics card of the host computer. This is the only way it can be done on a PC, but it introduces at least one frame of delay. We don't need to do this on a P2, as the emulated Antic/GTIA can work in real time as in the Atari. Then our implementation of Antic doesn't need to stop the emulated 6502 to get access to the Atari framebuffer. It should, however, to make the cycle-exact emulation needed for demos and several games, but it doesn't have to.
We can think about an "ultra-Antic" and an "ultra-GTIA", compatible with the originals, but with new modes and resolutions added. Combine this with the "65032" I mentioned somewhere and we have a super-Atari8.
Maybe an even better solution is to leave GTIA as is and emulate VBXE instead.
Not really. It has a display list buffer. If it is interpreted in real time, then by definition a framebuffer is unnecessary; that is a display list buffer.
I mean a discrete framebuffer between ANTIC and GTIA, with ANTIC interpreting the display list into that fixed framebuffer; that is what I'd love to see. Thus, you could turn ANTIC off when you need more CPU time, without artifacts (a missing display). I sometimes turned it off in code during heavy calculations but would either put that in inconspicuous spots where a user wouldn't be alarmed or give a warning.
Yeah, I was thinking more of real HW. Still, with a properly emulated one, you'd need to gate the CPU by the ANTIC to prevent the framebuffer from being overwritten while reading it. So buffering the output as part of the GTIA would allow the coder to turn off ANTIC without dealing with races. And having a buffer like that means you could take it in different directions, like the Vera approach. So a final frame buffer opens up the ability to use multiple graphics coprocessors, though one at a time.
And I was just putting this out there, not actually intending to do it.
Another weird platform idea, a remix of older stuff, would be to build a machine around the 808x core and also extend the 6502 into a full '816 core and have both. While the x86 mode would be slow, it could also run 6502/'816 code. And even the JIT idea sounds like it could help: look for option ROMs, run them through a recompiler on the fly, and run them as P2 code.
There is a guy trying to make an XT class '816 machine. It would not be x86 compatible, but it would use ISA peripherals.
It is difficult for me to follow here.
Why do you want to have only 2 different CPU families? And why exactly these?
Was it decided that the processor(s) shall be completely new and the software, including the tools, written from scratch?
@"Christof Eb." said:
It is difficult for me to follow here.
Why do you want to have only 2 different CPU families? And why exactly these?
Was it decided that the processor(s) shall be completely new and the software, including the tools, written from scratch?
Not me. I'm just throwing out ideas in general, for anyone to use. The goal of the thread is not to lock any one person into a single design, but to have general ideas useful for whoever. I never claimed this thread as "mine."
I'd like to have my own. And that has nothing to do with this thread as it isn't a personal thread. Nothing is about me or what I may or may not do, which is personal for now. Just being helpful and panning different ideas that whoever may use for whatever, however.
Just wondering. I know nobody here would likely want to do this, but what if one were to use a P2 with a 65C02 and run the "public" bus at 14 MHz? I'd imagine most P2 coders would be more likely to also emulate the CPU and save bus headaches. I might want a more "traditional" homebrew machine where you have an actual CPU (whether 1 chip or many), a publicly exposed bus, and discrete system memory that's not inside a controller (not an FPGA or microcontroller with integrated memory pretending to be external memory).
If you need an even clock number, one could run the P2 at 22 times the speed (308 MHz) or 23 for 322 MHz (not sure if the math would work). Or take it to 24 times for 336 MHz. Would that be enough time for the P2 to be able to snoop the bus and flush it to the hub? Or should wait states be applied when detected ranges are found? I mean, should SRAM writes in those ranges be delayed until the P2 has time to flush it? Like read the bus and poll for the detected range, assert the RDY line if it matches, write to the hub, and toggle the line back.
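The snoop-and-stall rule just described can be sketched as a simple address-range test. The watched ranges below are placeholders, not a real 6502 memory map:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the snooping rule: if a 65C02 write lands in a watched range
   (say, a video or I/O shadow region), the P2 would assert RDY to insert
   wait states until the value has been flushed to the hub, then release it. */
typedef struct { uint16_t lo, hi; } watch_range;

bool needs_wait_state(const watch_range *w, int n, uint16_t addr, bool is_write)
{
    if (!is_write) return false;                 /* only writes get flushed */
    for (int i = 0; i < n; i++)
        if (addr >= w[i].lo && addr <= w[i].hi)  /* inside a watched range? */
            return true;                         /* assert RDY, flush, release */
    return false;
}
```

On real hardware this test would be a cog polling the address bus each 65C02 cycle; the budget question above is whether ~24 P2 clocks per 14 MHz bus cycle is enough to run it and still write the hub.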
What is the fastest a P2 could reliably access an external SRAM? I ask in case I'd want/need to incorporate some sort of bus-mastering UDMA. In that case, it would be good to outrun the CPU side of the bus if possible, such as 3.5x between the P2 and external RAM.
Then there's the other issue. Is a cog capable of doing 320x240? I mean, is there enough time to stream from the hub at 12.5 MHz?
There would be the 65C02, the SRAM, and a P2, plus whatever power and leveling would be needed. At the minimum, there'd be BIU and video controller cogs. Of course, that wouldn't be the only hardware needed for a system; it would also need mass I/O.
(Other ideas come to mind such as augmenting the external CPU's instructions. For instance, if you want to add an RNG instruction, you could choose a 2-byte NOP from the 65C02, intercept it, and replace it with an immediate to-accumulator instruction. For wrap-around coprocessors, that is the type of thing one would do to pass parameters to the real CPU. Pushing "new registers" would be replaced with immediate pushes. Reading new registers would be a matter of intercepting instructions and replacing them with immediate ops on the CPU side. Of course, doing a complete emulation would be more efficient than trying to make a wrap-around coprocessor. It takes more work to augment (and impersonate) an existing CPU than to have those instructions there already. I mean, if your new instructions add new memory management modes, you pretty much have to almost clone the CPU to have the necessary context for such instructions.)
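A sketch of that interception trick, assuming $02 (one of the reserved 2-byte NOPs on the WDC 65C02) is repurposed as a hypothetical RNG opcode and replaced on the fly with LDA #imm ($A9). The function and its state handling are illustrative, not any existing API:

```c
#include <stdint.h>
#include <stdlib.h>

/* Wrap-around interception: when the bus interface sees the 65C02 fetch
   the chosen 2-byte NOP ($02 here, used as a pretend "RNG" opcode), it
   drives LDA #imm ($A9) onto the data bus instead, then feeds a random
   byte as the operand on the next fetch. Everything else passes through. */
enum { OP_RNG_NOP = 0x02, OP_LDA_IMM = 0xA9 };

/* Called on each opcode/operand fetch; returns the byte actually driven
   onto the data bus. *have_pending / *pending carry the substituted
   operand between the two fetches of the 2-byte instruction. */
uint8_t intercept_fetch(uint8_t fetched, int *have_pending, uint8_t *pending)
{
    if (*have_pending) {                      /* operand of the swapped op */
        *have_pending = 0;
        return *pending;
    }
    if (fetched == OP_RNG_NOP) {
        *have_pending = 1;
        *pending = (uint8_t)(rand() & 0xFF);  /* entropy from the P2 side */
        return OP_LDA_IMM;                    /* CPU executes LDA #random */
    }
    return fetched;                           /* pass through untouched   */
}
```

Both instructions are 2 bytes, so the program counter stays consistent; that is why a reserved 2-byte NOP is the natural opcode to hijack.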
... as there is a 6502 emulator ready and working... at a 336 MHz P2, a 14 MHz 6502 seems to be available.
What is the fastest a P2 could reliably access an external SRAM?
Now, what is this SRAM's speed/access time? I played with 2 MB of parallel 16-bit 10 ns SRAM using an FPGA. One of the problems was the propagation time from and to the FPGA, about 3 ns each way. This limited the maximum bus frequency to 40 MHz.
A P2 has these times much worse. I don't remember them now, but they are several clock cycles, and this means 15 (?) ns... or maybe more. A 14 MHz bus leaves about 25 P2 clocks for one transfer... it may be near the limits.
There is another problem: a 6502 system is 5 V based, so we need level shifters to interface a P2 to it.
Is a cog capable of doing 320x240?
A cog is capable of doing (more than) 1920x1080 and streaming at (more than) 125 MHz. The problem is that the HUB RAM is too small for this resolution, but using an external PSRAM (P2-EC32) it can do 1920x1080 @ 8 bpp. Of course, we need another cog for a PSRAM driver to do this. I have a driver that displays 1920x1080 from a PSRAM framebuffer with up to 16 sprites (these are defined in HUB RAM).
That all sounds promising. I know about the 14 MHz internal 6502 core, but I was asking about a real one connected to a P2.
I was asking what reliable throughput the P2 could do with external RAM, speaking in general. Knowing that would inform the selection.
When asking what res a cog could do, I meant with hub RAM and the speed needed to access it. So what's the highest res it can do in light of the hub speed just using the hub? I mean, can it do 320x240 or 640x480 at that?
It depends on bpp. The limiter is the HUB RAM, which has to fit a program and a frame buffer.
1920x1080 @ 1 bpp is 259 kB
640x480 @ 8bpp is 307 kB
320x200 @ 8 bpp is 64 kB
That's why I wrote a display-list-based video driver. Every line has its own display address and graphics mode.
Another solution is to use a ZX Spectrum type attribute map, so the picture can have colors while still being 1 bpp.
Yet another solution is a tile driver, where we have several tiles and indexes to them in the frame buffer. This includes text-mode drivers. A 1024x576 resolution fits 4608 8x16 characters. If one character uses one long, to keep its ASCII code and colors, then 4608 longs are enough for 1024x576 and 16k longs are enough for 1920x1080.
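Those figures check out. A quick sanity check of the buffer sizes, reading the post's "kB" as 1000 bytes:

```c
/* Framebuffer size in bytes for a packed-pixel mode. */
unsigned fb_bytes(unsigned w, unsigned h, unsigned bpp)
{
    return w * h * bpp / 8;
}

/* Longs needed for a text/tile map with one 32-bit long per 8x16 cell. */
unsigned text_longs(unsigned w, unsigned h)
{
    return (w / 8) * (h / 16);
}
```

The comparison is the whole argument for display lists and tile drivers: a full 640x480 @ 8 bpp bitmap already eats most of the 512 KB hub, while a 1920x1080 text screen fits in under 16k longs (64 KB).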
I meant the throughput, not the memory size: the maximum resolution there is enough time to do. So I was asking what is the highest it has time for using just the hub. I know about the sizes and needing 300 KiB for VGA at 8 bits. So what is the largest res that can be done in light of the bottleneck of the hub? That's all. This is still in the design phase, so getting an idea of what is reasonable to expect would inform the choices for res and so on. I'd mostly be after closer-to-vintage resolutions, so QVGA (75 KiB) may be what I'd shoot for, or some other simulated mode.
Speed-related resolution limits are 1024x600/50 Hz on HDMI, because of pixel clock = sysclk/10, and 1920x1200 on VGA, because of pixel clock <= sysclk/2 and the 200 MHz limit in the VGA specification.
That means you will have to use pixel doubling and line repeating to allow 320x200 on HDMI at a decent CPU clock. The VGA output doesn't have this constraint and you can output 320x200 directly as is.
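The doubling step is trivial per line. A sketch, assuming 8 bpp pixels in a byte buffer (emitting each doubled line twice then turns 320x200 into 640x400):

```c
#include <stdint.h>
#include <stddef.h>

/* Expand one n-pixel source line into a 2n-pixel output line by writing
   each source pixel twice. Line repetition (vertical doubling) is just
   scanning the same doubled line out twice. */
void double_line(const uint8_t *src, uint8_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[2 * i]     = src[i];  /* left copy  */
        dst[2 * i + 1] = src[i];  /* right copy */
    }
}
```

On the P2 this would more likely be done by the streamer clocking pixels out at half rate rather than by a copy loop, but the mapping is the same.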
@pik33 said:
Speed-related resolution limits are 1024x600/50 Hz on HDMI, because of pixel clock = sysclk/10, and 1920x1200 on VGA, because of pixel clock <= sysclk/2 and the 200 MHz limit in the VGA specification.
I only asked what is the highest known resolution it can do in light of the access speed of the hub. I am asking what the P2 has time to do. Nothing else.
Comments
Part of me would like to try a somewhat Gigatron-compatible machine without a native Gigatron underneath. I mean, why emulate the native Gigatron and have it emulate the vCPU set? It seems it would be more efficient to directly emulate vCPU.
But, since the native core does things other than run vCPU, those tasks would need to be handled another way. The Gigatron hackers don't really think that is possible. I think it can be, but it won't be easy. I think a supervisor (or "management engine") would be needed. It can handle things like initializing the cores, displaying a menu, playing the startup sound (a G chord), running the loader program, initializing memory, managing hotkeys, looking for a terminated program, and providing entropy. Really, the I/O core could do that and ensure the random number location is updated at least once per frame, but doing that when the supervisor is in standby mode can allow it to be updated more frequently. Of course, there is already the syscall for updating it on demand. TBH, I'd rather have that as an opcode if I were to spin my own off of this.
So the supervisor core would be what starts the machine, initializes things, and calls the loader. Then it passes control to the vCPU (or other cores if I were to add the 6502 core too, for instance). Then, as a change, have the supervisor core look for things such as terminated or hung programs, and if that is the case, then take control back, maybe provide a press any key prompt, and then reload itself. That would be an innovation over the original design which means you either use the hotkey (or start button on the game controller) or unplug it from the wall.
As for the video core, I say that for compatibility, maybe use it to gate the vCPU core, though really, it would be nice to have a new vCPU format and memory map, with options for letting coders determine this gating. Thus you can operate it without any gating between the syncs and the vCPU except when they are manually placed. So have extra vCPU opcodes to set the video gating mode. So either run only during the syncs (mode 1), run during the syncs and 1/4 the native scanlines (mode 2), run during the syncs and 1/2 the scanlines (mode 3), or run during the syncs and 3/4 the scanlines (mode 4), or run completely asynchronously of the video core during syncs and all scanlines (may tearing and frame races be damned). With a way to control this, whether an opcode or a memory location (private or exposed), the programmer could alternate between mode 4 and the new mode (equivalent to bus snooping). I wouldn't bother with actual line-skipping and some "retro look" as that would not be necessary since the video core would operate out of concurrent "DMA" (thank you hub). Programs can set which mode of 1-4 already, so gating would be good for compatibility, but for a new program format, make the ability to run ungated a part of the specification. Now, on the P2, is it possible to use hardware interrupts to busy wait when there are no syncs?
And for other things like BASIC, that could have a cog of its own and only be functionally compatible. Just rewrite the best BASIC, or a combination of the three (untokenized "Tiny BASIC," tokenized BASIC, and Apple-Microsoft BASIC). It should fly.
Yes, the P2 event system (which includes interrupts) can be used to busy wait, and also to stall wait (e.g. WAITSEx). Stall waiting should give the best response time to the event.
HP-85 with Capricorn processor? series80.org/
I wonder if the P2 would be good for doing an entire arcade machine. I imagine, for older titles, it would be no sweat, and it could do multi-game as well.
Even solid-state pinball machines could be more reliable with a P2. I mean, it could greatly reduce the number of boards needed. You'd likely need the power board and the solenoid driver board (which is also a power board), and the MPU board could do everything else. That would mean fewer connectors. Connectors are a major failure point.
I don't know if it could run Doom. I don't remember how big the WAD file is. Doom would probably benefit if the game engine could be split across different cogs, maybe with a cog just for computing enemies. Doom is a game that has been ported to many environments. So yeah, that would be closer to arcade style, since that is how older arcade machines were: single-purpose computers with boards originally tailored to each title. While they had one or more CPUs, the boards were custom. After a point, they began to standardize things, so a multigame was just a matter of switching ROMs, assuming the user interface was compatible (i.e., joysticks, flight stick, steering wheel, bike handles, or whatever).
How about 8 arcade machines at once?
https://www.parallax.com/p2arc8de-one-p2-ec-module-provides-audio-video-and-buttons-for-eight-8-concurrent-games/
It can emulate the Neo-Geo, a machine from the 1990s.
The original Doom was an MS-DOS game. Maybe these files are too big for a standard P2; this is a '90s game, from when at least 4 MB of extended memory was a standard feature, but a P2-EC32 should be more than enough to fit it.
Well, why not use external RAM? But I think it would need a new game engine in native P2 code, maybe broken down to use different cogs for different parts of the game. For instance, handling the monsters could get its own cog. The P2 has the math power for the calculations, with most ops taking 2 P2 cycles (and remember, Doom ran on 33 MHz machines). Sure, if you need the CORDIC solver, that's 58 cycles or so, though that could probably still be faster than a 486 for anything that needs it.
You don't need to access the entire *.WAD file at once. It contains all the media for the game, including the maps, sprites, tiles, sounds, and music. There is also the Reject table. The Reject table was a refinement to reduce line-of-sight calculations and to allow special effects like dormant monsters. If there is no clear line of sight from a monster to the player, the monster isn't updated; there is no need to calculate the sight angles and everything for a monster that cannot possibly hit you. That saved processing power compared to blindly updating every nearby monster. Still, more power could be saved if things were multi-threaded, say a thread for the player and a thread for the monsters.
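For reference, the REJECT lump in a WAD is just a bit matrix with one bit per ordered pair of sectors: a set bit means "no line of sight is possible," so the engine skips the expensive sight check entirely. A minimal sketch of the lookup (the 2-sector example table at the bottom is mine, for illustration):

```python
# Sketch of a Doom-style REJECT lookup: one bit per ordered sector pair,
# packed row-major, least-significant bit first within each byte.

def reject_blocked(reject: bytes, num_sectors: int, src: int, dst: int) -> bool:
    """True if the REJECT table says sector src can never see sector dst."""
    pair = src * num_sectors + dst       # row-major bit index
    return bool(reject[pair >> 3] & (1 << (pair & 7)))

# Tiny hypothetical 2-sector map: only the (0 -> 1) pair is marked blocked.
# Bit indices: (0,0)=0, (0,1)=1, (1,0)=2, (1,1)=3 -> one byte, 0b00000010.
table = bytes([0b00000010])
```

In a multi-cog split, the "monster cog" could consult a table like this before waking each monster, just as the original engine does before its sight calculation.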
So one would need to make use of overlays to be sure. Keep everything on flash memory cards, and maybe have a cog for a loading and overlay manager.
Well, if you don't mind having about Atari 2600 performance, I could see that. If you want to go beyond that with full colors, complex soundtracks, etc., then you'd get by with fewer concurrent games.
While the Transputer and the Occam language were discussed on this forum in 2008 and 2010, they might not yet have been mentioned in this thread. (It is so long now.) https://www.transputer.net/iset/isbn-013929134-2/tthb.pdf . Could a P2 emulate an 8-core Transputer network?
Well, you were suggesting older titles and multi-player. I was just showing what had already been done.
As far as high performance goes, the work Ada has done is incredible with her Neo-Geo emulator.
Pretty sure Doom could run, but it may need to be adapted the way SockMaster did with Donkey Kong on the Color Computer 3.
No, I said multi-game (multiple titles, playing one at a time), not multi-player.
Emphasis added to show where I think the confusion crept in.
As far as emulation of common hardware with multiple games is concerned, Ada's work with both MegaYume and NeoYume shows some of what can be done with the P2, some external RAM, and an SD card as the bulk storage device.
Multiplay or multigame is the industry term for an arcade cabinet with multiple titles.
Something I learned today.
For many of us I would suggest, the term multiplay looks much the same as multiplayer, and a web search for 'multiplay definition' gives many hits from seemingly credible sources that would support that confusion.
Being outside of the industry, and therefore not across this jargon, the distinction wasn't obvious to me.
The only reason I say any of this is to try to clear up the confusion.
I think I meant multigame. Multiplay may have actually referred to pinball machines and referred more to being able to play multiple rounds on the same quarter, often due to winning them (by skill and occasionally just by luck). Still, that is not the same as multiplayer.
So I think I might have used the wrong term by mistake. And I mentioned older titles since there would be enough room for them, since they are small. Using the wrong term is easy to do when you spend a lot of time watching Ron Lyons work on pinball and arcade games. If you look it up on YouTube, the term to search for is Joe's Classic Arcade. Joe owns the shop, but his brother Ron is the one who does the videos. I've seen Joe do a couple of videos, but Ron does most, and in some of Ron's videos, Joe does playfield teardown, cleaning/waxing, and reassembly, while Ron does the paint touchup. On an arcade machine with a hole kicked in it, Ron did the electronic work and the painting, and Joe repaired the hole. Joe enlarged the hole into a clean circle, cut a matching circle of similar wood, fixed it in place with Bondo, and sanded it flush with the rest. Ron then matched the paint and freehanded it (with his left hand, BTW), and sprayed clear coat where he put the flat acrylic. Luckily, the artwork was so busy that you couldn't tell where he repainted it. As for how/why the hole got there, one can assume a thief was trying to get at the nearby quarter tray inside; pinball machines occasionally come into their shop with jagged holes in the bottom for the same reason. Joe also has some electronics skills, but he doesn't really go into the more in-depth repairs. He can recap a monitor, change the flyback transformer, or swap a CRT, which gets the majority of arcade monitors back up, but if there are shorted transistors, holes burned in the high-voltage or sweep boards, or main-board problems, Ron handles those. Donnie, their other brother, is more into fixing vehicles and rehabbing houses.
As a recap, I think we drifted in regard to my original mission. It was to make a new platform altogether, and I narrowed things later to 16-bit, preferably with external memory.
So I wanted to know what opcodes to put in a custom ISA, what memory map to use, what video and sound capabilities, etc. And not be limited to a specific solution or goal-oriented design which stereotypically fits guys better. And yes, the aggravation of writing new code for a fresh CPU design is part of the motivation. And I wouldn't mind having other chips on the board.
And I explored the possibility of using multiple P2 chips arranged into a "baseball diamond": a main P2 with its 8 cogs, a GP coprocessing P2, a GPU P2, and a massive sound solution on a P2. The main one and the GP coprocessor each connect to three P2s; the other two only connect to two. Though of course, three chips might work about as well.
I had a look at some minicomputers in the last weeks. It is interesting: they evolved quickly into 16-bit systems by the early 1970s or so. Then microprocessors and solid-state RAM became available, and for about 10 years the main focus was making them cheaper, not better. So the PDP-11 became the LSI-11/23. I think the PDP-11 has the most interesting instruction set (for a non-Forth machine :-) ).
However, if you want to have external memory, I think one or two hardware stacks might be a good thing, because they don't need many address lines. Read, write, and reset might be sufficient?
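A minimal model of that idea, just to show why no address lines are needed: the stack device keeps its own pointer internally, so the CPU side sees only the data bus plus read, write, and reset strobes. The class and its depth are hypothetical, for illustration only.

```python
# Toy model of an external hardware stack: the pointer lives inside the
# device, so the CPU interface is just data plus read/write/reset signals.

class HardwareStack:
    def __init__(self, depth: int = 256):
        self.mem = [0] * depth
        self.sp = 0                 # internal pointer, invisible to the CPU

    def reset(self) -> None:        # RESET strobe
        self.sp = 0

    def write(self, value: int) -> None:   # WRITE strobe = push
        self.mem[self.sp] = value
        self.sp += 1

    def read(self) -> int:          # READ strobe = pop
        self.sp -= 1
        return self.mem[self.sp]
```

In hardware, this would be a counter plus a RAM; the point is that the entire address path stays inside the device.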
In the HP-85 calculator computer, they used MMU chips with their own address counters. The counters would watch the CPU operations and keep counting depending on the operations: "MMU coprocessors". A special CPU.
I've been thinking of ways to rework retro machines. For instance, for some tasks like the ROM, why not rewrite that as P2 code and provide some means to intercept it? Sure, leave the original ROM for other things like character sets and programs that jump into those routines, but have generic syscalls and interrupts that use native P2 code. One could likely have better FP libraries, for instance. But if one does that, they'd need to keep things like character sets in mind: if those are shadowed or can be modified by coders, they must still appear to software the same way and be modifiable the same way on the software end.
And I got to thinking some about the Atari 800. It uses the ANTIC display list processor and GTIA as the video controller. That setup never used a framebuffer. If it had one, programs could be faster since ANTIC used bus-mastering DMA (as did the DRAM refresh), and you could write a value to an ANTIC register to disable it. However, that also turned off the display. Sure, GTIA would keep the syncs and everything, but it would have nothing to display. So with a framebuffer, you could disable ANTIC and maintain a display. And taking that further, why not add something like David Murray's Vera idea? For compatibility, only make Vera registers available when ANTIC is disabled. And of course, only do that if one wants to write code that uses it. So add a framebuffer and a way to use a clothesline memory scheme with auto-increment to directly address it and otherwise use the framebuffer to hold ANTIC's output. So a coder could program the video either way.
I probably wouldn't do either of the above, but rather spin my own. In some ways that can be easier, in that you don't have to come up with clever workarounds to add new features and still stay compatible. And in other ways it can be harder, since you have no canon of software, let alone the core utilities such as assemblers and so on.
The Atari 800 has a framebuffer. It is interpreted in real time by Antic, then the data is sent to GTIA and on to the video output. That's why there is no picture when Antic is off: the framebuffer is still there, but nothing reads data from it.
It seems your idea is to add a secondary framebuffer after GTIA, so that when Antic is off, the picture will still be displayed. By what? An additional framebuffer controller? That is the way emulators do their job: what the emulated GTIA generates goes to a buffer and then becomes a texture that is processed via OpenGL on the graphics card of the host computer. This is the only way it can be done on a PC, but it introduces at least one frame of delay. We don't need to do this on a P2, as the emulated Antic/GTIA can work in real time, as in the Atari. Then our implementation of Antic doesn't need to stop the emulated 6502 to get access to the Atari framebuffer. It should, however, to make the cycle-exact emulation needed for demos and several games, but it doesn't have to.
We can think about an "ultra-Antic" and an "ultra-GTIA", compatible with the originals but with new modes and resolutions added. Combine this with the "65032" I mentioned somewhere and we have a super-Atari8.
Maybe even better solution is to leave GTIA as is and emulate VBXE instead.
Not really; it has a display list buffer. If it is interpreted in real time, then by definition a framebuffer is unnecessary. That makes it a display list buffer.
I mean a discrete framebuffer between ANTIC and GTIA. So it's the interpreting from the display list buffer into a fixed framebuffer that I'd love to see. Thus, you could turn ANTIC off when you need more CPU time, and without artifacts (a missing display). I sometimes turned it off in code during heavy calculations, but would either put that in inconspicuous spots where a user wouldn't be alarmed or give a warning.
Yeah, I was thinking more of real HW. Still, with a properly emulated one, you'd need to gate the CPU by the ANTIC to prevent the framebuffer from being overwritten while reading it. So buffering the output as part of the GTIA would allow the coder to turn off ANTIC without dealing with races. And having a buffer like that means you could take it in different directions, like the Vera approach. So a final frame buffer opens up the ability to use multiple graphics coprocessors, though one at a time.
And I was just putting this out there, not actually intending to do it.
Another weird platform idea, a remix of older stuff, would be to build a machine around the 808x core, also extend the 6502 into a full '816 core, and have both. While the x86 mode would be slow, it could also run 6502/'816 code. And even the JIT idea sounds like it could help: look for option ROMs, run them through a recompiler on the fly, and run the result as P2 code.
There is a guy trying to make an XT class '816 machine. It would not be x86 compatible, but it would use ISA peripherals.
It is difficult for me to follow here.
Why do you want to have only 2 different CPU families? And why exactly these?
Was it decided that the processor(s) shall be completely new and the software, including the tools, written from scratch?
Not me. Just throwing out ideas in general. So for anyone really to have ideas. The goal of the thread is not for locking any one person into a single design, but to have general ideas useful for whoever. I never claimed this thread as "mine."
I'd like to have my own. And that has nothing to do with this thread, as it isn't a personal thread. Nothing is about me or what I may or may not do, which is personal for now. Just being helpful and floating different ideas that whoever may use for whatever, however.
Just wondering. I know nobody here would likely want to do this, but what if one were to use a P2 with a 65C02 and run the "public" bus at 14 MHz? I'd imagine most P2 coders would be more likely to also emulate the CPU and save bus headaches. I might want a more "traditional" homebrew machine where you have an actual CPU (whether 1 chip or many), a publicly exposed bus, and discrete system memory that's not inside a controller (not an FPGA or microcontroller with integrated memory pretending to be external memory).
If you need an even clock multiple, one could run the P2 at 22 times the bus speed (308 MHz), 23 times (322 MHz), or 24 times (336 MHz). Would that be enough time for the P2 to snoop the bus and flush it to the hub? Or should wait states be applied when watched ranges are hit? I mean, should SRAM writes in those ranges be delayed until the P2 has time to flush them? Like read the bus, poll for the watched range, assert the RDY line if it matches, write to the hub, and toggle the line back.
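The arithmetic on those ratios does check out; here is the trivial sanity check, assuming the 14 MHz 65C02 bus from above:

```python
# Bus-to-sysclk ratio check for a hypothetical 14 MHz 65C02 bus.
BUS_MHZ = 14

def sysclk_mhz(ratio: int) -> int:
    """P2 system clock for an exact integer multiple of the bus clock.
    The ratio is also the number of P2 clocks available per bus cycle."""
    return BUS_MHZ * ratio

# 22x -> 308 MHz, 23x -> 322 MHz, 24x -> 336 MHz: all exact multiples.
```

Whether 22 to 24 P2 clocks per bus cycle is enough headroom for snooping and flushing to the hub is the real question, and that depends on the hub-access timing, not this arithmetic.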
What is the fastest a P2 could reliably access an external SRAM? I ask in case I'd want/need to incorporate some sort of bus-mastering UDMA. In that case, it would be good to outrun the CPU side of the bus if possible, such as 3.5x between the P2 and external RAM.
Then there's the other issue. Is a cog capable of doing 320x240? I mean, is there enough time to stream from the hub at 12.5 MHz?
There would be the 65C02, the SRAM, and a P2, plus whatever power and leveling would be needed. At the minimum, there'd be BIU and video controller cogs. Of course, that wouldn't be the only hardware needed for a system. It would also need mass I/O.
(Other ideas come to mind such as augmenting the external CPU's instructions. For instance, if you want to add an RNG instruction, you could choose a 2-byte NOP from the 65C02, intercept it, and replace it with an immediate to-accumulator instruction. For wrap-around coprocessors, that is the type of thing one would do to pass parameters to the real CPU. Pushing "new registers" would be replaced with immediate pushes. Reading new registers would be a matter of intercepting instructions and replacing them with immediate ops on the CPU side. Of course, doing a complete emulation would be more efficient than trying to make a wrap-around coprocessor. It takes more work to augment (and impersonate) an existing CPU than to have those instructions there already. I mean, if your new instructions add new memory management modes, you pretty much have to almost clone the CPU to have the necessary context for such instructions.)
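A toy sketch of that interception trick, to make it concrete: watch the opcode the 65C02 is about to fetch, and when a chosen 2-byte NOP appears, feed the CPU a real LDA #imm ($A9) carrying a fresh random byte, effectively turning the NOP into an RNG instruction. The choice of $42 as the intercepted NOP is my assumption (it is one of the 65C02's 2-byte NOP opcodes); everything else here is illustrative, not a real bus interface.

```python
import random

def intercept(fetch_byte: int) -> list[int]:
    """Given the opcode the CPU is fetching, return the byte stream the
    bus interface should actually supply to it."""
    NOP2 = 0x42          # a 65C02 2-byte NOP, chosen here as the hook (assumption)
    LDA_IMM = 0xA9       # real 65C02 'LDA #imm' opcode
    if fetch_byte == NOP2:
        # Replace the NOP with an immediate load of a random value,
        # so software sees a "new" RNG-to-accumulator instruction.
        return [LDA_IMM, random.randrange(256)]
    return [fetch_byte]  # everything else passes through untouched
```

As the post says, this only works cleanly for instructions that need no hidden CPU state; anything touching memory management would force you toward a near-full clone of the CPU.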
... as there is a 6502 emulator ready and working... at 336 MHz P2, 14 MHz 6502 seems to be available
Now, what is this SRAM's speed/access time? I played with a 2 MB parallel 16-bit 10 ns SRAM using an FPGA. One of the problems was the propagation time to and from the FPGA, about 3 ns each way. This limited the maximum bus frequency to 40 MHz.
A P2 has much worse times here. I don't remember exactly, but it is several clock cycles, which means 15 (?) ns... or maybe more. A 14 MHz bus allows about 25 P2 clocks for one transfer... it may be near the limits.
There is another problem: 6502 system is 5V based, we need level shifters to interface a P2 to it.
A cog is capable of doing (more than) 1920x1080, streaming at (more than) 125 MHz. The problem is that HUB RAM is too small for this resolution, but using an external PSRAM (P2-EC32) it can do 1920x1080 @ 8 bpp. Of course, we need another cog for the PSRAM driver to do this. I have a driver that displays 1920x1080 from a PSRAM framebuffer with up to 16 sprites (these are defined in HUB RAM).
That all sounds promising. I know about the 14 MHz internal 6502 core, but I was asking about a real one connected to a P2.
I was asking what reliable throughput the P2 could do with external RAM, speaking in general. Knowing that would inform the selection.
When asking what res a cog could do, I meant with hub RAM and the speed needed to access it. So what's the highest res it can do in light of the hub speed just using the hub? I mean, can it do 320x240 or 640x480 at that?
It depends on bpp. The limiter is HUB RAM that has to fit a program and a frame buffer.
1920x1080 @ 1 bpp is 259 kB
640x480 @ 8bpp is 307 kB
320x200 @ 8 bpp is 64 kB
That's why I wrote a display list based video driver. Every line has its own display address and graphic mode.
Another solution is to use ZX Spectrum type attribute map, so the picture can have colors, still being 1 bpp
Yet another solution is a tile driver where we have several tiles and indexes to them in the frame buffer. This includes text mode drivers. 1024x576 resolution fits 4608 8x16 characters. If one character uses one long, to keep its ASCII code and colors, then 4608 longs is enough for 1024x576 and 16k longs is enough for 1920x1080.
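The sizes quoted above are easy to verify; a quick check (decimal kB, matching the figures in the posts):

```python
# Framebuffer and tile-map size arithmetic for the modes discussed above.

def fb_bytes(w: int, h: int, bpp: int) -> int:
    """Framebuffer size in bytes for a given resolution and depth."""
    return w * h * bpp // 8

assert fb_bytes(1920, 1080, 1) == 259_200   # ~259 kB, as quoted
assert fb_bytes(640, 480, 8)   == 307_200   # ~307 kB
assert fb_bytes(320, 200, 8)   == 64_000    # 64 kB

def text_longs(w: int, h: int) -> int:
    """Longs needed for a text map of 8x16 cells, one long per character."""
    return (w // 8) * (h // 16)

assert text_longs(1024, 576) == 4608        # matches the 1024x576 figure
# 1920x1080 gives 240 columns x ~67 rows, so "16k longs" is ample.
```

This is exactly why the display-list, attribute-map, and tile approaches matter: they trade HUB RAM footprint for driver complexity.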
I meant the throughput, not the memory size: the maximum resolution there is enough time to do. So I was asking what is the highest it has time for using just the hub. I know about the sizes and needing ~300 KiB for VGA at 8 bits. So what is the largest res that can be done in light of the hub bottleneck? That's all. This is still in the design phase, so getting an idea of what is reasonable to expect would inform the choices for resolution and so on. I'd mostly be after closer-to-vintage resolutions, so QVGA (75 KiB) may be what I'd shoot for, or some other simulated mode.
Speed related resolution limits are 1024x600/50 Hz on HDMI, because of pixel clock = sysclk/10, and 1920x1200 on VGA, because of pixel clock<=sysclk/2 and 200 MHz limit on the VGA specifications.
That means you will have to use pixel doubling and line repeating to allow 320x200 on HDMI at a decent CPU clock. The VGA output doesn't have this constraint and you can output 320x200 directly as is.
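Those two constraints are simple enough to express directly; here is a small sketch using the figures from the posts above (treat them as the posters' numbers, not an official spec):

```python
# Pixel-clock constraints as described above: HDMI pixel clock is fixed at
# sysclk/10; VGA pixel clock must be <= sysclk/2 and <= 200 MHz.

def hdmi_pixclk(sysclk_mhz: float) -> float:
    """HDMI pixel clock is locked to one tenth of the system clock."""
    return sysclk_mhz / 10

def vga_pixclk_ok(pixclk_mhz: float, sysclk_mhz: float) -> bool:
    """VGA has a ceiling of sysclk/2 and a 200 MHz spec limit."""
    return pixclk_mhz <= sysclk_mhz / 2 and pixclk_mhz <= 200

# E.g. at a 320 MHz sysclk, HDMI yields a 32 MHz pixel clock -- far above
# the ~6.3 MHz a native 320x200@60 needs, hence pixel doubling and line
# repeating on HDMI, while VGA can output 320x200 directly.
```

So for the QVGA-class targets discussed earlier, VGA output is the unconstrained path, and HDMI just needs integer doubling to fill its fixed pixel rate.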
I only asked what is the highest known resolution it can do in light of the access speed of the hub. I am asking what the P2 has time to be able to do. Nothing else.