Proposing a possible Gigatron remake using a P2

PurpleGirl · 2022-09-03 16:24

I may or may not build this, so please keep that in mind. I'm proposing this here to see if it is worthwhile for me to get a P2 board for this, and to document my ideas. I could probably do just as much on my A7 board.

Now, I'd want to go beyond the original Gigatron in that multiple cogs would allow for better peripheral options. Others have cloned the Gigatron itself on a P2 and have 99% compatibility. I am more after improved performance over the original Gigatron, and for a long time I have proposed converting the vCPU software core, which on the Gigatron is written in Gigatron native code, into its own core, whether on an FPGA, using discrete parts, or on the P2.

The challenge here is that the vCPU is not a complete system in and of itself. Other things would need to be added. For the Gigatron, the native ROM sets up the memory map, creates the video display, manages I/O, and more. So I'd probably want a supervisor or setup cog to do the setup and use other cogs to handle standard I/O and storage I/O. Since everything is in separate cogs, there would need to be ways to sync the different processes. Sure, everything can communicate through hub RAM, so there would be no hardware races, but for a design that is split into different sections, there should be mechanisms such as pausing the vCPU core when desired for compatibility and performance-gating reasons (such as making the original games run).

As for things like file I/O, which is weak on the original, such a design can improve upon this. There would be no reason to modulate the V-sync or use the address lines to speak to outside devices. There would be no need for a shift register in the critical path of the I/O. Most of the add-on cards could be replaced with P2 versions and implemented more appropriately for a P2.

So I think this could be a way to make a more usable system for running GT1 files and to create a framework for other P2 experiments. The vCPU core could be replaced with, or work alongside, a 6502 core or other emulated cores, if not actual P2 native code.

JonnyMac · 2022-09-03 19:12

...if it is worthwhile for me to get a P2 board...

Yes. IMO, it is far more fun and productive to work with actual hardware than to speculate about it.

aaaaaaaargh · 2022-09-04 10:24

Deleted

PurpleGirl · 2022-09-04 20:52

There are things I'd need to know before proceeding, and that is from both ends of this. For instance, I'd need to know if the P2 has an easy way of doing bytecode interpretation, jump lists, and so on. Apparently, it does, and I'd need to look at some code snippets to see. For the other side, I'd need to know what the vCPU entry point is supposed to be. I'd have to study the memory map, the Gigatron ROM, or at least ask about that in their forum.
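To make the "bytecode interpretation with jump lists" idea concrete, here is a minimal sketch in Python rather than P2 assembly. The opcode values, the two-byte instruction layout, and every name in it are made up for illustration; they are not the real vCPU encodings and not how the P2 hardware does it.

```python
# Hypothetical opcodes for illustration only; NOT the real vCPU encodings.
OP_LDI, OP_ADDI, OP_HALT = 0x01, 0x02, 0xFF


def op_ldi(state, operand):
    """Load an 8-bit immediate into the accumulator."""
    state["acc"] = operand


def op_addi(state, operand):
    """Add an 8-bit immediate, wrapping at 16 bits."""
    state["acc"] = (state["acc"] + operand) & 0xFFFF


# The "jump list": one handler per opcode, looked up by opcode value.
JUMP_TABLE = {OP_LDI: op_ldi, OP_ADDI: op_addi}


def run(program):
    """Fetch, dispatch through the table, and advance until OP_HALT."""
    state = {"acc": 0, "pc": 0}
    while True:
        opcode = program[state["pc"]]
        if opcode == OP_HALT:
            return state["acc"]
        operand = program[state["pc"] + 1]
        JUMP_TABLE[opcode](state, operand)
        state["pc"] += 2  # assumed fixed size: opcode byte + operand byte


result = run(bytes([OP_LDI, 5, OP_ADDI, 7, OP_HALT]))
```

An interpreter cog would have the same shape: a fetch, a table lookup, and a jump to the handler.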

There are features one would likely need to add, obviously. For instance, on the Gigatron, the video and vCPU run at different times. If you write code for that, you don't have to deal with software races or frame buffer overruns. However, since autonomous controllers would be used here, there would need to be a way to keep the code in sync. So, for strict GT1 compatibility, adding Ready or Halt functionality to the vCPU core would be nice, since that would let the H-syncs gate the code and provide an easy mechanism to adjust the performance.

That could also lead to something that performs better, since I could ask the original team about creating opcodes to insert spinlocks on the user side. It wouldn't be to get them to do the work but to get permission to add to the instruction set, to help prevent conflicting instructions later. Remember all the 6502 varieties? I really wouldn't want to add private opcodes only to have someone else use them differently, so cooperating on standards would be nice.

Moving from a hard-gated option to one that coders control would help performance. Since no interrupts are used, there need to be ways to sync the vCPU with the video cog when needed. Monitoring the video syncs would be helpful and a convenient option for compatibility, but opcodes would be a nicer option for a new vCPU file format, since coders could decide when waiting for the next video frame is worthwhile. They could skip such gating when there are no video updates and so improve performance over a hardware-only way of preventing overruns.

AJL · 2022-09-04 22:16

While the P2 documentation is still a work in progress there’s a lot of information there that can help you determine the answers to your questions. The P2 does have bytecode execution support, spinlocks, and cog synchronisation in hardware and there is example code for the use of these features. If you have more questions after looking at these things the forum members are normally quite helpful in providing extra information.
I think your goal of harmonising vcpu opcode use with others is worth pursuing, and hope that others that look to extend the vcpu opcode usage have the same thoughts.

PurpleGirl · 2022-09-04 23:35

@AJL said:
While the P2 documentation is still a work in progress there’s a lot of information there that can help you determine the answers to your questions. The P2 does have bytecode execution support, spinlocks, and cog synchronisation in hardware and there is example code for the use of these features. If you have more questions after looking at these things the forum members are normally quite helpful in providing extra information.
I think your goal of harmonising vcpu opcode use with others is worth pursuing, and hope that others that look to extend the vcpu opcode usage have the same thoughts.

Thanks. As for spinlocks, I meant more within the emulated code, but it is good to know, and helpful, that the P2 has hardware support. So it would be nice to include both hardware and software synchronization in what is proposed. The hardware model would keep the performance within the range of the original, but what if one wants to move beyond that? Sure, a hotkey could select different levels of gating: whether to halt during all scanlines, 1-3 of them, none, or some selective model (like halting only on writes until the syncs change, or only during I/O writes).
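A rough sketch of the gating levels, in Python for readability. It assumes scanlines are handled in groups of four and that the gating level is simply how many lines in each group halt the vCPU; that grouping and the function names are assumptions for illustration, not something taken from the Gigatron ROM or the P2.

```python
def vcpu_may_run(scanline: int, halted_per_group: int) -> bool:
    """Return True if the vCPU may execute during this scanline.

    halted_per_group (0..4) is the assumed gating level: how many
    scanlines in each group of four keep the vCPU halted.
    """
    if not 0 <= halted_per_group <= 4:
        raise ValueError("halted_per_group must be between 0 and 4")
    return (scanline % 4) >= halted_per_group


def run_fraction(total_scanlines: int, halted_per_group: int) -> float:
    """Fraction of scanlines on which the vCPU is allowed to run."""
    running = sum(
        vcpu_may_run(line, halted_per_group) for line in range(total_scanlines)
    )
    return running / total_scanlines


# Level 4 halts on every line, level 0 never halts.
fractions = [run_fraction(480, level) for level in range(5)]
```

A hotkey handler would only need to change `halted_per_group`; the selective models (halt on writes only) would need a different predicate.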

Yes, I'd need to get a board before I can attempt to go further, and yes, most here are indeed helpful. I would need to use the code examples for sure since I can't make heads or tails out of a lot in the manual, and I don't find the assembly to be "easy" as it says in the literature. I find x86 assembly to be easier for me to grasp, and there are far more of those opcodes.

I think trying to harmonize similar projects is just a matter of common courtesy. And they might suggest that I use the opcodes that they won't be able to reach. They use jump lists to execute the vCPU, so each entry needs 2-3 bytes of native code for overhead, and that means not every slot is available, only 1/3 to 1/2.

(Wow, I just thought up a way they could have made that support easier on the native side on real hardware. I never heard of a "straddle register" before, but what about an 8-bit register that covers a 12-bit domain? So addresses using it get multiplied by 16, meaning that you can jump to the beginning of paragraphs based on what is in the accumulator or whatever. That might be good to have to add to future virtual cores, so it can be retro-like with more power due to opcodes nobody used before. As for using that one in an absolute mode, the bottom nibble would be 0 and the upper nibble would be those bits from Y.)
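The address arithmetic for that idea, as a small Python sketch. The function name is hypothetical, and this only covers the "multiply by 16" part, not the absolute mode with Y.

```python
def straddle_target(reg8: int) -> int:
    """Map an 8-bit register value onto a 12-bit address.

    Shifting left by 4 multiplies by 16, so every target lands on the
    start of a 16-byte paragraph and the low nibble is always 0.
    """
    if not 0 <= reg8 <= 0xFF:
        raise ValueError("register value must fit in 8 bits")
    return reg8 << 4


lowest = straddle_target(0x01)   # first paragraph after address 0
highest = straddle_target(0xFF)  # last reachable paragraph
```

With 16 bytes per paragraph, each of the 256 possible values gets its own handler slot without needing padding between entries.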

PurpleGirl · 2022-10-03 01:19

The vCPU thing is not as straightforward as I had hoped. The GT1 files are structured more like EXE files to deal with a fragmented memory map. It isn't just a blob of code like COM files and most vintage program image formats. I don't know how the early Apples did it, since they had a highly fragmented memory map as well.

AJL · 2022-10-03 05:39

By default, the Apple II family loads the first sector (or block, depending on the disk operating system) at $800. Extra code and data can be loaded contiguously, potentially overwriting the high-resolution graphics memory.
The loaded code is then jumped into and that code takes control of moving things around in memory and loading anything else from disk, either via a DOS or directly using loaded code. In short, each program can be different, taking different approaches to get everything necessary into RAM.

PurpleGirl · 2022-10-03 09:29

Yeah, the Gigatron GT1 code chooses where it loads. I think, according to a cursory read of the documentation, it gives segment, offset, length, and then the data, repeating as often as necessary.
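Here is how a loader for that layout might look, sketched in Python. It assumes each record is a page byte, an offset byte, and a length byte (with 0 meaning 256) followed by the data, that a segment stays within its page, and that a zero page byte after the first record is followed by a two-byte start address. Those details are assumptions that should be checked against the GT1 documentation, and `load_gt1` is a made-up name.

```python
def load_gt1(image: bytes, ram: bytearray) -> int:
    """Copy each segment into ram and return the assumed start address."""
    pos = 0
    first = True
    while True:
        page = image[pos]
        if page == 0 and not first:
            # Assumed terminator: 0, then high and low byte of the entry point.
            return (image[pos + 1] << 8) | image[pos + 2]
        offset = image[pos + 1]
        length = image[pos + 2] or 256  # assumed: a length byte of 0 means 256
        pos += 3
        if offset + length > 256:
            raise ValueError("segment crosses a page boundary")
        data = image[pos:pos + length]
        if len(data) != length:
            raise ValueError("truncated segment")
        addr = (page << 8) | offset
        ram[addr:addr + length] = data
        pos += length
        first = False


# One 3-byte segment at $0200, then the terminator with start address $0200.
image = bytes([0x02, 0x00, 0x03, 0xAA, 0xBB, 0xCC, 0x00, 0x02, 0x00])
ram = bytearray(0x10000)
start = load_gt1(image, ram)
```

The returned address is what a supervisor would hand to the vCPU core as its entry point.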

So if vCPU is in its own cog, something would need to start it at the proper location. So it would be the loader mechanism I guess. I already mentioned a supervisor cog, so the loader can be there and then launch the vCPU core.

I had someone tell me that running .GT1 files outside of a Gigatron may be impossible. There is only one way to find out for sure.
