How likely is it that a P2 could emulate an Atari 800?
To build an Atari 800 clone, there are only 4 ASICs that are needed. You'd need a "Sally"-variant 6502 based on the 2nd stepping of the original 6502 (not a WDC 65C02), a POKEY, an ANTIC, and a GTIA. You could emulate a CTIA instead, but why bother? And you could emulate an FGTIA if you want to do the SECAM standard. I'd say go with a modified GTIA that can output to a VGA monitor, and possibly do a modified ANTIC that supports more graphics modes if possible, based on some of the ANTIC replacements.
The above is mostly what you'd need. You'd likely need to emulate a PIA in some form as well. Regarding the FREDDY chip and 2 memory management chips, you wouldn't need those if you work with hub RAM or an SRAM. Obviously, it would be hard to do a 1088K-modded Atari unless you were willing to use serial SRAM and take advantage of the 6502's ready line to do wait states.
There are obviously other mods one can do to improve the performance. For instance, would the /HALT line to the modified 6502 even be necessary? Obviously, the P2 has semaphores to signal such things. Or would it be wiser to include code to conditionally halt when the display list buffer is being simultaneously accessed, to help prevent races while not slowing the CPU as much? I think ANTIC throws an interrupt (an NMI) during vertical blank, so there is a mechanism in place to help prevent screen overruns.
Another mod one could do, if they wish to step outside the compatibility a little, is to have the modified 6502 CPU emulator use some of the illegal opcode space for new opcodes. For example, add some of the 65C02 features and '816 features, or others. Sure, you wouldn't be able to use those in the classic programs, but you could use them in a modded ROM and still get system-wide improvements. (Do other known ROM mods too, like the SIO port speed-up.) That would be most helpful if you add some floating point opcodes and have your custom ROM use those in its floating point routines. Rewriting the BASIC cartridge might also be good. The ROM and BASIC are the 2 most-used sets of code, so ROM calls could take advantage of more efficient opcodes and software would gain collateral benefits. The stock BASIC could run faster with more efficient ROM routines, but a modded BASIC could do even better since it could directly call the extra opcodes, and there are other tricks that would work even on a stock 6502 to make it more efficient. The Turbo BASIC cartridge does such tricks to be about twice the speed of the original, using more efficient floating point emulation and drawing routines. There is a Veronica BASIC that is up to 5 times faster than that, but it cheats: it has its own '816 CPU that is clocked faster. I'm not sure if it includes its own RAM or not, but having another CPU emulate the ROM and create proxied instructions for the main CPU is sometimes done.
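To make the idea concrete, here is a rough sketch in C (not the actual P2 core) of how one repurposed opcode slot could dispatch to a host-side floating point routine; the struct, the helper, and the $FB slot are all made up for illustration:

```c
#include <stdint.h>

/* Hypothetical sketch (not the existing P2 core): one repurposed illegal
 * opcode slot (I picked $FB arbitrarily) dispatches to a host-side BCD
 * floating point add instead of acting as a NOP/KIL.  The cpu6502 struct,
 * fp_add_bcd(), and the FR0/FR1 zero-page locations are all assumptions. */
typedef struct {
    uint8_t a, x, y, sp, p;
    uint16_t pc;
    uint8_t ram[65536];
} cpu6502;

static void fp_add_bcd(cpu6502 *c) {
    /* Placeholder: add the two 6-byte BCD floats the OS keeps in FR0/FR1
     * (around $D4 and $E0) using host math, and write the result to FR0. */
    (void)c;
}

void step(cpu6502 *c) {
    uint8_t op = c->ram[c->pc++];
    switch (op) {
    case 0xEA:                /* NOP, documented */
        break;
    case 0xFB:                /* repurposed illegal slot: "FADD" (made up) */
        fp_add_bcd(c);
        break;
    /* ... the documented opcodes and any stable illegals go here ... */
    default:
        break;                /* unknown: treat as NOP, or emulate the illegal */
    }
}
```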
The Kim Klone (not to be confused with a KIM-1 clone) used TTL chips and programmable logic to execute repurposed illegal 65C02 opcodes. It would intercept them before the CPU, and for most of the repurposed NOPs, it would do its own operation and pass the NOP on. But for a few instructions, it would do its part and then have the CPU do something else to finish the job. For instance, to copy from an extra external register into the CPU, it might read the outside register and pass the value on as an immediate operand.
I see two ways to go with such a design. One is to emulate the entire machine, modifying the POKEY to take a PS/2 keyboard and doing whatever interpretation is needed.
Another way to go would be to make things more like the original, so it could be mounted on a board that goes in the system board socket of an original. In that case, you'd probably omit the POKEY emulation and any PIA emulation, as those are on the mainboard, and you'd leave all the other boards out. The system board tends to have just the 6502, GTIA, and ANTIC chips with their glue logic; everything else is on the mainboard and the other cards. There may be several memory cards and a ROM card, and if you do RAM and ROM closer to the CPU, you wouldn't need those. If this could work, you wouldn't need the potentiometer on the system board either if you convert things to use VGA; that pot is there to make up for voltage changes affecting the colors. And if dropping this into a stock machine, you might need a CAS-inhibit line to disable the FREDDY chip, since there is no reason for it to be refreshing the RAM.
Another note about mixing with the original parts is voltages. Anything that goes to the external bus may need level shifters. If by chance the 3.3V chips can drive the 5V chips on the mainboard (and distance could be a factor), you'd likely still need a resistor and a diode (so that no more than 3.3V passes either way). But proper level shifters would likely be best.
Now, if you emulate the SIO port, then you'd need level shifters there too. However, you could decide to dispense with that, provide another means, and update the ROM to use it. The SIO port is interesting since we still have a descendant of it today (USB). It is a packet-based serial protocol that relies on smart devices. The 810 floppy drive used a 6507 CPU, just like the Atari 2600 console, and I think it could handle I/O a little faster than what a stock Atari 800 could feed it. A few ROM and DOS mods could speed that up some, and some folks did just that. On some of the modded Atari 800s (and the other 8-bit computers of that type, including the 400, the XE, and the XL), they added some version of the IDE/ATA protocol and used CF cards. I imagine SD cards could be made to work. With interrupts, a /HALT line, and a /RDY line, that should be possible.
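For reference, here's roughly what an SIO command frame looks like as I understand it, sketched in C; treat the field meanings and the checksum rule as assumptions to check against the SIO spec:

```c
#include <stdint.h>
#include <stddef.h>

/* Rough sketch of an SIO command frame as I understand it: five bytes,
 * device ID, command, two aux bytes, then a checksum computed as an 8-bit
 * sum with the carry wrapped back in. */
typedef struct {
    uint8_t device;    /* e.g. 0x31 would be the first disk drive */
    uint8_t command;   /* e.g. 'R' for read sector */
    uint8_t aux1;      /* sector number, low byte */
    uint8_t aux2;      /* sector number, high byte */
    uint8_t checksum;
} sio_frame;

static uint8_t sio_checksum(const uint8_t *buf, size_t len) {
    uint16_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += buf[i];
        sum = (sum & 0xFF) + (sum >> 8);   /* fold the carry back in */
    }
    return (uint8_t)sum;
}
```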
So could a P2 emulate an Atari 800 or similar, and would it require more than one?
Comments
The P2 can emulate the A8. Maybe in more than 1 cog; the GTIA is complex stuff to emulate. This was my 1st home machine when I was young, so I always wanted to do this, but there is always something else to do.
We already have a 6502; it only needs to be fitted to the rest of the design.
I have a miniature 1050 FDD on my desk. It is a reduced-size, 3D-printed 1050 case that fits a 3.5" FDD; the files are ready to print on Thingiverse. Adafruit did an FDD controller with the Raspberry Pi Pico; maybe it can also be modified to fit the project.
Obviously, I assumed multiple cogs: whatever number of cogs the 6502 CPU emulator would use, plus whatever you'd need for ANTIC, GTIA, and POKEY, plus any needed for glue logic. When I said more than one P2, I meant multiple P2s if you run out of cogs.
As for a drive, in this day and age, you'd do better to find a way to hack in an SD card. It could use the SIO protocol, and you could go as far as to have some other cheap microcontroller or even another cog to do interpretation to make it FAT32 compatible (and make things easier for transferring disk images with a PC). And yes, there are even 3D prints available for SD socket covers, and one could go as far as to put a drive busy light in there.
As for the 6502 core, it should be the original 6502, not a 65C02, with the most-used illegal instructions (such as the shift instructions and the ones that write to two registers), and with the code modded to include the equivalent of a hardware halt signal. The "Sally" 6502s added a multiplexer inside to take the 6502 off the bus; it isn't enough to just de-assert the Ready line. The bus must be released so ANTIC (and DRAM refresh on the original machine) can work. On the first Atari 800 machines, this took 4 extra chips. Then Atari had MOS make a part with this circuitry inside and repurpose one of the NC pins as the halt line. It may have also added another R/W line, but I don't know the significance of that.
I still like my other ideas of taking some of the unused 6502 instruction slots and repurposing them. A few FP instructions, a 3rd index register, the most-used illegal instructions (both shifts and bug-fixed), etc. I wouldn't include the most stupid of the illegal opcodes (like the one that's guaranteed to cause a crash).
Very likely.
Slight note: you shouldn't necessarily think of the individual physical chips when it's highly integrated stuff. Think of functional blocks.
For example, on the Megadrive, all of these things live inside the VDP chip yet end up being rather independent in my emulator:
- Video rendering (duh) -> split into 3 cog-sized components: layer rendering, compositing, scanout
- VRAM access port and DMA -> implemented in 68000 memory handlers (caveat: VRAM fill/copy halts the CPU)
- SN76489 clone -> bolted onto YM2612 cog (weird port interface handled in memory handler)
- data bus transceiver between Z80/68000 -> implemented in memory handlers (rather obvious)
(on NeoGeo this is even worse, particularly original MV4/MV6 boards, the video pipeline is split into like 4 ASICs and some of them have random other functionality like the watchdog or the controller inputs shoved into them)
Same thing with POKEY doubling as synthesizer/keyboard interface/serial IO. Technically the oscillators generate the serial clock, but that's an implementation detail that doesn't really matter (I think). In fact, the serial port having a clock, or being serial at all, may be abstracted away if the peripherals are virtual to begin with.
It is nearly there already. Macca did it for his esoteric 6502 machine emulator as a "clean" 6502 with NOPs instead of illegals. Then I used it for my SID player. As C64 SID tune authors (ab)use 6502 illegal instructions, several of these were added by macca (before I attempted to do it myself). Adding the rest of them should not be a problem. It is one cog.
I have already tried to do a GTIA. As it is heavily connected with ANTIC, I think both chips can fit together in one cog.
The third cog should do a Pokey.
PSRAM may also be useful, as there are several memory extensions available for the A8, up to 2 MB, and there is VBXE, a graphics card with half a meg of memory.
There is no problem with SD access; we already have it in Flexprop. But I dream of reviving the floppy. Maybe not a 5.25" (although I have several of them) but a 3.5": I have already printed the case, and using the Adafruit contraption it should be possible to format these 3.5" disks in the Atari style.
I like to think of it in terms of the original hardware, as it helps me think of it the other way. So actually, borrowing from what I said and what you said, one could upgrade it and go with USB; SIO was in many ways a forerunner of USB and had one of the same lead developers. You are thinking more like a coder while I am thinking more in terms of a board designer, since I mentioned the possibility of making the system card for the 400/800. In that case, you need the functionality of the CPU and the 2 display chips. That simplifies things, and ROM/RAM could also be included there, and one probably should use a CAS-inhibit line (or mod the mainboard with a dummy FREDDY). That could simplify an overall design: the mainboard deals with POKEY, PIA, SIO, the keyboard, game sockets, etc. And if one wants the VGA output as I'd like to see, then one could unplug the line from the RF modulator, maybe widen the hole in the back, and put a VGA connector there. If going for VGA and leaving out the memory boards, then one could also leave off the color adjustment pot, since that only adjusts the signal strength of the color burst to calibrate the colors against the voltage drop; something as simple as adding a memory card could cause the colors to need recalibrating. So if you replace all the memory, the decoders, the other glue logic, the CPU, ANTIC, and GTIA, you'd likely use less power, and VGA breaks out the individual colors, so the pot wouldn't be necessary. The neat thing is that if a P2 could do most of the system board, it would actually fit inside the cast aluminum shield. To add extended ANTIC and GTIA functionality, some folks added piggyback header boards to the other A8 varieties, which won't fit in the 800 without removing the RFI shielding.
Another detail one might add for nostalgia is the internal console speaker sound (driven from the GTIA's CONSOL register on the original, not the POKEY). It doesn't actually have to be generated the same way, just so long as it is mapped to the same address. That is what generated the key press clicks and provided the error siren; it went to a speaker in the cabinet, while the POKEY sound went out the modulator.
And it wouldn't hurt to map a simplified POKEY to the commonly used secondary location; thus stereo or 8-channel mono could be possible. A number of folks have modded theirs to include a 2nd POKEY, omitting all the interrupt and device-handling stuff, just to extend the sound. Similar usage occurred in pinball and arcade games: they didn't need the Atari computer functionality, just the sound, and there were up to 16-channel POKEYs available. With 16 channels, that could provide up to 8 channels at 16 bits. I've never seen a Quad POKEY actually present on any of the machines that Ron Lyons repaired on his YouTube videos. Those are rarer than plain POKEYs, but thankfully there are FPGA replacement boards available for them. It's too bad that most of the POKEYs ever manufactured are in landfills. When there were blowouts on cartridges, and since it was before 3D printing, folks would buy up the cartridges, strip the PCBs, and use the PCB and case for their own cartridges. They never thought to sell the chips, particularly the POKEYs.
Speaking of the PIA (and all the 6518-23/8518-23 varieties), folks used to like to "abuse" them in all sorts of ways, so they were used as sound chips, stack counters, and other things beyond their intended purpose. I was wondering how one design used it as a stack counter for a wrap-around coprocessor, since the PIA/VIA only counts forward. They added an inverting octal transceiver chip: if the stack pointer was 255, the PIA would see 0, and when the PIA incremented to 1, the SP would read as 254. They did it this way since they already used the PIA for something else and didn't want to add multiple counter registers.
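A tiny C model of that inverter trick, just to show the arithmetic:

```c
#include <stdint.h>
#include <stdio.h>

/* The PIA port only counts up, but the rest of the system reads it through
 * an inverter, so each PIA increment looks like a stack-pointer decrement. */
int main(void) {
    uint8_t pia = 0;                        /* what the PIA actually counts */
    for (int i = 0; i < 4; i++, pia++) {
        uint8_t sp = (uint8_t)~pia;         /* what the bus sees */
        printf("PIA=%3u  SP seen=%3u\n", pia, sp);
    }
    return 0;                               /* PIA 0,1,2,3 -> SP 255,254,253,252 */
}
```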
At least that would be a good starting base for the 6502. Implementing some variation of the SALLY mod might be good, not for hardware races, but to prevent software races. (Or, if one is up to it, apply selective halting to let normal code continue but halt if an instruction tries to write the display list.) Of course, now that I've thought about it, ANTIC could just assert the /RDY line. Duh! Since concurrent DMA is inherent to the P2, you don't actually need to unlatch anything; just tell the CPU the bus is busy. That mechanism would be needed anyway for external serial RAM (whether pseudo-static or truly static) to buy time to access it. But yeah, whatever mechanism is used for that could stand in for a /HALT line.
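Something like this is all I mean by "tell the CPU the bus is busy" (a sketch in C of an assumed design, not the existing core):

```c
#include <stdbool.h>
#include <stdint.h>

/* The emulated CPU's memory helpers just spin while a shared "bus busy"
 * flag is set by the ANTIC cog, giving the software-visible effect of /RDY
 * without emulating Sally's /HALT and bus release.  (The real NMOS part
 * ignores RDY on write cycles; stalling both is simpler for an emulator.) */
static volatile bool bus_busy;          /* set by the ANTIC side while it "owns" the bus */
static uint8_t memory[65536];

static uint8_t cpu_read(uint16_t addr) {
    while (bus_busy) { }                /* stall, like /RDY held low */
    return memory[addr];
}

static void cpu_write(uint16_t addr, uint8_t val) {
    while (bus_busy) { }
    memory[addr] = val;
}
```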
And some demos used the more useful illegal opcodes. One or two were undocumented because they were broken. As I've said, I'd love to see the remaining opcodes from what was used on the original be used for enhanced extra instructions. Such as 3rd index register instructions (and whatever new modes that inherently adds for aware software), making the shifts legal instructions, and a few FPU instructions. That would make it possible to rewrite the ROM (and BASIC) to speed up the FPU emulator calls. With a new ROM and BASIC, more memory could be made available under certain circumstances.
A thought came to mind on POKEY emulation. It would be nice, in 8-bit mode, to have a way to get it more in tune. Programmers used a note chart when writing music; the chart was in the beginner's manual and gave you the machine values. The problem is that the suggested values in the chart are not linear in pitch, and the notes produced fall a little outside the equal temperament scale in places. I get why that was the case: the default POKEY music mode was 8 bits. So what if one could have a lookup table, so that when the values most used for music are encountered, more accurate notes are played?
For a POKEY, one would want to include all the operation modes it has. I mean, we'd need a 16-bit POKEY mode that halves the number of channels. Thus, 9-octave sounds are possible when software enables this mode. There might be a couple of modes based on the main clock frequency, maybe one that is tied to 60 Hz, and the ability to directly bit-bang it if one wants. Bit-banging was occasionally done, but usually in assembly. And it wouldn't hurt to have 2 POKEYs, but the 2nd one doesn't have to be I/O complete as it is just used as a PSG. We'd need to know where to put the 2nd one on the memory map. Modders did add an additional POKEY, and there are demos that use it if it is there. Thus you can have stereo output or 8-channel sound.
And with the earlier discussion, there is no reason why in a P2 implementation the PSG and the I/O portions of a POKEY even have to be in the same cog.
Personally, I'd like to see a hardware setup that stays faithful to the Atari 800, and that means using SIO, 4 Atari-style joystick ports, cartridge slots, etc. If one insists on that, they might need a 2nd P2, as they may run out of pins. Now, yes, if one wants to use Famicom or USB controllers and forgo SIO, the wiring could be simpler. Speaking in general, it would be nice to work out some sort of "southbridge" protocol for folks who want to use multiple P2s together. One popular bus that Atari modders have added to A8s is an ATA/IDE bus (PATA, I presume). Of course, they tend to use CF cards with that, not hard drives.
We have 1.79 MHz, 64 kHz, and 15 kHz as base clocks for Pokey frequencies. The base is divided by the value+3 for 8-bit and value+7 for 16-bit. This means we cannot have "exact" note frequencies in 8-bit mode, and 16-bit mode needs to use the 1.79 MHz main clock. We also cannot have any "better" lookup table.
One of the workarounds used by demosceners is to continuously switch between values to keep the average frequency in tune.
The second POKEY in standard A8 extension is $20 "up" from the first.
There is nothing that says a Pokey replacement has to be clocked internally at the board clock. Maybe you can't do that on a P2, but I am sure I could do it on an FPGA. So what you could have is a faster core and lookup tables that remap the 256 values so the ones most used for music land on the nearest real note. What I said before is possible: it has 16-bit registers, so it wouldn't be hard to remap 8-bit values onto the 16-bit internal divisor tables. I do believe it is possible to produce an in-tune Pokey. If I have to make one to prove others wrong, I will. It is very possible to replace the internal math with a list of more arbitrary values. Again, maybe not on a P2, but I'm sure I could make one on an FPGA. I think translation rather than division is possible.
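Here's a sketch in C of the kind of translation table I mean, using the divisor offsets quoted earlier in the thread (treat those constants and base clocks as the thread states them, not as verified):

```c
#include <math.h>
#include <stdint.h>

/* For each 8-bit AUDF value, work out the pitch the stock chart value would
 * give, snap it to the nearest equal-tempered note, and store a finer
 * 16-bit divisor that lands on that note. */
#define BASE_8BIT_HZ   63921.0       /* ~64 kHz clock, assumed for the 8-bit chart */
#define BASE_16BIT_HZ  1789772.5     /* 1.79 MHz clock for the retuned 16-bit divisor */

static uint16_t retune[256];

static double nearest_note(double f) {
    double n = round(12.0 * log2(f / 440.0));   /* semitones away from A4 */
    return 440.0 * pow(2.0, n / 12.0);
}

void build_retune_table(void) {
    for (int v = 0; v < 256; v++) {
        double f_stock = BASE_8BIT_HZ / (v + 3.0);      /* what the chart value produces */
        double f_tuned = nearest_note(f_stock);         /* closest real note */
        double d = BASE_16BIT_HZ / f_tuned - 7.0;       /* back to a 16-bit divisor */
        if (d < 0.0)     d = 0.0;
        if (d > 65535.0) d = 65535.0;
        retune[v] = (uint16_t)(d + 0.5);
    }
}
```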
I don't like to be around those who like to throw cold water on people's ideas. To me, that is incredibly rude.
And for what I want, I don't need NTSC compatibility. So the board frequency would do better to match what a VGA GTIA needs, and a custom POKEY would then be remapped to work with those frequencies.
The weird intonation in Atari music is just part of how it is. It's what the composer heard when making it, so why change it? Also it'd make vibrato/slide weird and wouldn't really work if a different scale or reference frequency (NTSC vs PAL) is used.
Nothing stops the audio generator from running at whatever fractional frequency ratio it wants to. Also nothing stops you from generating slightly-offspec VGA/DVI at whatever exact timing the original hardware uses.
Well, I think fixing the tuning issue would have negligible effects on special effects, and I have nothing to display NTSC or PAL. If I were to do one, give me RGB. That said, for existing demoscene stuff and not BASIC programs, surely a register, unused control bit, non-standard opcode, or something could be used to change the function. I'd likely say to add that only for the 8-bit mode.
But the notorious tuning issue is why Commodore commissioned the SID.
And yes, I know about off-spec VGA; even the Gigatron does that. I mean, I don't think a 6.275-6.3 MHz crystal could be found. The standard 640x480 pixel clock is 25.175 MHz, so 6.25 MHz is slightly slow for 160 pixels across, and Marcel tweaked the porch times to correctly center the image. It is nice that he designed a few specialized instructions to make bit-banging a bitmap more feasible: an AND/OR can read from memory, send it to the port, and increment one of the index registers all at once. Still, that only works as well as the clock rate allows, and it is a tad slow for QQVGA, but luckily you can reduce the blanking intervals slightly and get by with it, even if your vertical frequency ends up around 59.8 Hz.
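A quick sanity check of those numbers in C (the "trimmed" totals are illustrative, not the Gigatron's actual line counts):

```c
#include <stdio.h>

/* Standard 640x480@60 VGA is a 25.175 MHz pixel clock over 800x525 totals;
 * a quarter-rate 6.25 MHz clock with the same proportions comes out a bit
 * slow, and trimming the blanking pulls the refresh back up. */
int main(void) {
    double vga     = 25175000.0 / (800.0 * 525.0);   /* ~59.94 Hz */
    double quarter = 6250000.0  / (200.0 * 525.0);   /* ~59.52 Hz, a tad slow */
    double trimmed = 6250000.0  / (199.0 * 525.0);   /* ~59.82 Hz with shorter blanking */
    printf("%.2f  %.2f  %.2f\n", vga, quarter, trimmed);
    return 0;
}
```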
312 total lines for PAL, 262 total lines for NTSC, up to 240 lines of active picture
114 CPU clocks = 228 color(*) clocks = 456 "hi-res" pixels total per line.
CPU clock = PAL color subcarrier oscillator (17.734475 MHz) / 10 = 1.7734475 MHz for PAL systems
= NTSC color subcarrier oscillator (14.318182 MHz) / 8 = 1.78977275 MHz for NTSC systems
Pokey is clocked by the CPU clock. It can be divided internally by 28 or 114 (don't ask me why they selected these numbers)
This is divided by the 8- or 16-bit register value
There are 4 8-bit registers. They can be linked to get 2 16-bit registers, which means one of the "linked" sound channels is lost.
The most standard configuration is clock/28, which gives this table of frequencies:
Because the audio frequency is the result of dividing the base by the register value, this is a 1/x-shaped, nonlinear function. Also, the granularity is much worse for higher frequencies.
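A quick way to regenerate the shape of that table (C sketch; the divisor offset is the one quoted earlier in the thread, and the exact constant doesn't change the point):

```c
#include <stdio.h>

/* Frequency for a sampling of register values in the clock/28 (~64 kHz)
 * configuration.  Note the 1/x curve and the coarse steps at the
 * high-frequency end. */
int main(void) {
    const double base = 1789772.5 / 28.0;    /* NTSC CPU clock / 28 */
    const int offset = 3;                    /* divisor offset as quoted above */
    for (int v = 0; v < 256; v += 16) {
        double f    = base / (v + offset);
        double next = base / (v + 1 + offset);
        printf("AUDF=%3d  f=%8.1f Hz  step to next value=%7.1f Hz\n", v, f, f - next);
    }
    return 0;
}
```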
Of course demosceners have a lot of tricks they can pull with Pokey. Example results, including tunes that use 2 Pokeys, were linked here.
Now that to me is funny, but I get it on the Atari 8bit, especially for those that have original disks.
My own experience with buying an A500+ is that the floppy was the thing I really hated about it. Unlike the Atari 8-bits and the C64/C128, the Amiga is completely dependent on the floppy at the OS level, so it was SOOOO annoying waiting for the floppy to boot or load games, and the size limitations meant that I had to strip stuff out of a Workbench disk in order to fit the programs I wanted to use. I didn't miss floppies at all after that experience, and I still have a lot of physical disks to go through that might just end up in the trash instead.
hinv,
The C64 had the slowest floppy drive of all with the 1541; the joke was that you could make yourself a sandwich while it was loading.
...
The problem with doing all of these upgrades is that it makes the machine unique, which is great for the code that you want to write, but for running other code it kind of defeats the purpose of emulation in the first place. If you want to build audio and video to your own specs, resulting in a unique system, you can certainly do this inside the P2 and avoid a lot of the limitations of the day, like the 40-48 pin DIP maximum package sizes and the clock speed constraints that drove a lot of the 8-bit designs.
Yeah, I remember. I had a 1571, but I think it was still limited by the 300-baud-or-so bus interface. I spent literal months of my life waiting for floppy loads. I had friends with cassette drives, though, so I was happy to have a floppy.
The C64/C128, like the A8s, were ready to program 3 seconds after flipping the switch, though. The Amiga, on the other hand, wasn't. I didn't have as much experience with the Atari floppy drives because the 400s and 800s at school simply didn't have them, IIRC. We typed in a lot of BASIC programs (which was especially annoying on the 400), but since at that point I didn't have my own computer, I was still excited to do it.
That is why I suggested a way to turn my proposed feature off and go for default behavior. Those who make the FPGA Ataris usually make their own ROMs and OS too, and go as far as to include a "CMOS" configuration page. That way, they can select the CPU type, the number of Pokeys, how much RAM is available, the available video modes, and the display type, and get nearly any program written for them to work.
I suggested various performance enhancements that won't affect compatibility, such as adding opcodes in slots that no existing programs use. Then one need only rewrite the BIOS and maybe the BASIC cartridge to gain performance. The only downside is if you make the ROM routines too good and some games become unplayable because of it.
The 2nd Pokey upgrade is a common and known one, and very unlikely to cause side effects.
The de facto standard now is 2 Pokeys and at least 256 kB, if not 1 MB, of RAM.
For adding opcodes to the 6502 while keeping it compatible with the original, several slots with the KIL instruction are available. KIL stops the CPU entirely until reset, so it can be replaced with something else (but not HCF).
I had to look up HCF as an opcode. LMAO! Yeah. I've seen a few machines like that. "What's wrong with it?" "It's giving off holographic error codes in the form of smoke signals."
An instruction to make it lock up isn't useful. At least x86 machines have a conditional halt instruction; that is useful since it is used for syncing with external devices. That is really what FWAIT is: it halts the CPU until the FPU is finished. Without it, there could be a race condition, since the main CPU might use the result from the FPU before it was available. And that is a case where assembly can help with performance. I mean, a programmer would know how long things should take and what would be a reasonable safety margin, but a compiler doesn't, and since compilers are made for users of a wide range of skill, they are coded for safety. If you dispatch FPU ops a little way before you use them, you can allow for true "co-processing." But if you can't structure your code that way and you need to do something with the result right after the FPU returns it, then you must wait on the FPU to finish.
I do love the fact that the P2 6502 core performs on par with the fastest official 65C02 offered in a DIP package; it should perform like a 14 MHz part using a 320 MHz clock. The story behind ARM is funny: they went to Bill Mensch of WDC (Western Design Center, not the hard drive maker) and asked for help making a 32-bit CPU that is in no way related to the 6502 or the '816, and he told them to take a hike. But I wish he and Chuck had surveyed the user base and the use cases of the undocumented opcodes of the 6502 before making the 65C02. That would have saved a lot of headaches.
A lot of FPGA 6502/65C02 designs use a single-phase clock. Using a multi-phase clock may add to board complexity, but I get why they would do it that way. If you want to avoid pipelines, slot delays, etc., or cheat the clock rate upward, you can use multiple overlapping clocks to give each part of the process enough time. Even the Gigatron does this: memory writes (off Clock 2) borrow some of the time that Clock 1 is low, and their rest period ends only after the next cycle of Clock 1 has given enough setup time for things early in the cycle. So what takes close to 190 ns can complete in 160 ns. Of course, for an Atari 800, everything needs to finish in under 558 ns.
I am not familiar with Atari computers but according to Wikipedia, the 800 used a 6502C which is based on the original design.
The original 6502 is known to have a number of stable illegal opcodes that were sometimes used for games but exploited in demos.
Commodore continued to use the original 6502, which they owned, on their computers, bugs and all.
The 65C02 is a CMOS version of the 6502, made by a different company, that fixed a number of bugs but also made all illegal opcodes act as NOP, or No Operation, which did nothing but eat up processor timing cycles.
Actually, they used a modified 6502B called Sally. While they put a C on it, "6502C" and "6502 C" are not the same thing, and that naming has led to confusion. The difference from the earlier parts is a Halt line. For a P2 clone of the Atari, we don't need the /HALT line; we can add the /RDY line to the emulator's code and use that in its place. The A8s used bus-mastering DMA to let the ANTIC chip work: the CPU had to be halted and taken off the bus, ANTIC took over the RAM and the rest of the bus, and then it passed the bus back and released it. In a P2 setup, we have inherent, 8-channel, concurrent DMA (which is why the hub is so slow, though much faster than on the P1), so we don't need to release the bus; that effectively happens automatically as we access the hub. So hardware races can't occur, but software races are another issue: you don't want the 6502 to rewrite the display list midway through. One can likely have more freedom about when to pause the CPU, but for best compatibility, the behavior should be similar enough. So this CPU distinction is not important for emulating it on the P2. Emulating this across different microcontrollers would be different, but we should be able to do it all in the P2.
And yes, Commodore Semiconductor Group bought out MOS. On the C64, they actually used the 6510. That had 6 port lines, and I think they used those for memory banking (and the cassette), not general I/O; the port registers overlay addresses 0 and 1 in RAM.
Actually, Western Design Center (not to be confused with Western Digital Corporation) wasn't so much a "different company": some of the 6502's designers left CSG and started it. I guess they couldn't stand Jack Tramiel. Tramiel eventually moved to Atari, and not too many were thrilled. He was the one who started the price wars. His decision to acquire MOS was good for the industry at the time, since it kept TI in check regarding component prices (TI reluctantly sold to their competitors and had started jacking up the prices), but what hurt all the smaller computer manufacturers was Tramiel viciously undercutting TI. So an $800 computer went to under $100 almost overnight. Not only did it hurt TI, it hurt Commodore, Atari, and various others. It didn't harm Apple and IBM, as they had a different customer base and pricing model; schools, businesses, and high-end enthusiasts bought those. So if anything, it helped the giants: they let their competition "kill" each other, let the customers pick through the carnage, and waited it out. Then everyone came to them because nobody else was left.
Then Tandy came around and cloned the IBM PC, almost, with their own multimedia features (though they copied TI on the additional sound capabilities and used their sound chip). IBM tried to sue them, but the courts said no infringement took place. They used their own PCB design and they guessed what was in the ROM without actually ever seeing it. That paved the way for other companies to make clones. And in the midst of that, an obscure company called Acorn RISC Machines (check the initials) popped up with their Acorn computer in the late '80s. It didn't seem to go far from what I saw, but they never went away. My parents tried to get one, but Western Auto never got them in stock as it said in their sales paper. I guess they had their own CPU, and the goal was to produce cheap computers for the underprivileged and those in developing nations. Now, they restructured and changed what their initials stood for, and now they are ARM Holdings.
That IS interesting. How many 6502 illegal codes are now just NOP on the 65C02?
Yeah, that would have been nice since some of the illegal opcodes were used to increase performance. If I was going to roll my own processor, I would probably start with a 6502, depending on how many useless undocumented instructions there are. I would probably strip out the BCD instructions as well to make room for more useful instructions.
There’s more to it than just rendering some as NOP; there are also some new instructions and some new addressing modes. The 6502 has 151 valid opcodes (105 illegal), while the 65C02 has different numbers of valid opcodes depending on which revision you have; the latest has 212 valid (44 illegal).
Despite the manufacturer’s claim, some of those ‘NOP’s do more than eat cycles, but perform extra reads and can block interrupts for themselves and the following instruction (undocumented behaviour.)
It’s easy to forget that back in 1981, when the development of the 65C02 started, with no public Internet there was no easy way to even identify who the ‘user base’ were, let alone engage with them about the use of opcodes that weren’t even listed in the original datasheet. They would have needed to pay for adverts in every trade and hobby magazine at the time, and then waited several months for any replies, while the market (and competitors) moved ahead. It just would not have been a feasible approach in 1981.
Conversely, you might wish that no one had used the illegal opcodes on the 6502 (that seems to be Bill Mensch’s view), and that would have saved a lot of headaches.
Future versions of the processor would only need to comply with the intended operation of the original while extending the capabilities.
There are no ‘BCD instructions’ just a decimal mode flag and instructions to set and clear it. While that flag bit could be left unused, you don’t get much (if anything) back from omitting BCD.
That is just semantics.
Knowing a bit about the inside, omitting that mode may actually help you decrease latency and increase the clock rate in a real design. In the original, at least, the BCD mode cuts into the critical path, and it led to design decisions that slowed the regular binary mode. If you need BCD, you need a nibble-sized adder and extra microcode to add and subtract 6 on top of the main adds. (TBH, there is no true subtraction in a CPU, just adding the complement along with the carry.) Anyway, if you break out the circuitry for a nibble adder so you can use and intercept the carry signal independently before it becomes bit 4 (so you can add 6 to that digit), you insert a ripple carry into an otherwise tree-style adder, which increases the latency for plain binary byte operations. And yes, that isn't inherent to being able to run all the opcodes; there are other ways, in more modern implementations, to account for the carry dilemma. One could use a carry-skip arrangement, use separate ALUs for decimal and binary modes, or just do the decimal mode with more microcode if you are willing to sacrifice cycle accuracy for decimal mode.
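A worked model of that correction in C, behaviour only, not the actual gate-level design:

```c
#include <stdint.h>

/* A BCD add is a binary add plus "add 6 to any digit that passed 9", which
 * is why the low nibble's carry has to be tapped in the middle of the adder. */
static uint8_t bcd_add(uint8_t a, uint8_t b, int *carry) {
    unsigned lo = (a & 0x0F) + (b & 0x0F) + (unsigned)*carry;
    unsigned hi = (a >> 4) + (b >> 4);
    if (lo > 9) lo += 6;            /* decimal-correct the low digit */
    hi += lo >> 4;                  /* propagate the (possibly forced) carry */
    lo &= 0x0F;
    if (hi > 9) hi += 6;            /* decimal-correct the high digit */
    *carry = (hi > 0x0F);
    return (uint8_t)(((hi & 0x0F) << 4) | lo);
}
/* e.g. 0x38 + 0x27 -> 0x65, carry 0;  0x99 + 0x01 -> 0x00, carry 1 */
```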
In saying the above, I am not criticizing; it is easy to see what could have been done differently after the fact. Die space was at a premium, so if you could have one ALU do everything, that was a plus, since you wouldn't be forced to cut corners elsewhere just to make it all fit. That is one reason why "clones" of CPUs often ran faster: second-source manufacturers play catch-up, so enough time had elapsed that a smaller node was available. And with the latest runs of the 65C02, there is no telling how many of the refinements I alluded to above were actually used in the 14 MHz variety.
This reminds me of the history of the V20 chip. That is an 8088 replacement that added the 186 instructions, a hardware multiplier, a second ALU, and an 8080 mode. That likely couldn't have been done a couple of years earlier, since the die space didn't exist, though arguably they could have omitted minimum mode and the 8080 compatibility mode. The 8088/8086 had 2 main hardware configurations: maximum mode, which is what the majority of the compatible computers of the day used, and minimum mode, which provided simple versions of the board's bus-control functions inside the CPU so you could use it more like a microcontroller. That was intended for embedded applications, not home use, so NEC could have left it out. However, they had the die space, and had they left that mode out, it likely wouldn't have been used so widely by the telephone industry. The performance decisions made sense to me. Adding an ALU to the BIU (Bus Interface Unit) meant that the BIU no longer needed to use the slow internal path to the main ALU and hold up execution to calculate addresses; it could do all that on its own. The hardware multiplier made multiplication faster since the CPU no longer had to use microcoded addition loops to multiply. The 286 used those refinements too and demuxed the address and data lines, which removed another bottleneck. And of course, the 286 had more in common with the V30 CPU, but with a few more instructions and completely separate address and data lines.
As for my comments about WDC's opcode map, what you said makes sense. It could be argued that programmers shouldn't have used those opcodes, and yes, trying to track down the coders and ask them may have been difficult to do. That reminds me of how MOS first introduced the 6502. They attended a trade convention that didn't allow direct sales from vendors at the event and invited any interested parties to a hotel suite. They put all the chips in a jar, including the duds, and sold the engineering samples for maybe $50 each. While those getting bad chips likely weren't the most thrilled, I doubt they complained much, knowing that they were holding a piece of history.
Another member said 44, and it was interesting what the Kim Klone (not to be confused with a KIM-1 clone) board design did. The board's designer intercepted at least 42 of those and made a TTL wrap-around coprocessor that used them to add more instructions, particularly ones using a 3rd index register. So he gave it '816 features without using an '816 or any sort of memory management unit, could run up to 16 MB of memory, and used his other instructions to nearly double the performance in a Forth interpreter. For a few of his added opcodes, he worked with the CPU by intercepting one of these NOPs, doing his part, and then telling the 65C02 to do something else to complete the instruction. For example, if he wanted to do a PUSH Z:Y:X, his coprocessor would handle the upper byte using his added "Z" register (or whatever he called the new register) and then tell the CPU to PUSH Y:X. I could be wrong on the specifics, but you get the point: the CPU didn't know about the uppermost register and didn't have to if his circuitry handled it.
Well, there were no explicit instructions for those, just a decimal mode, so you wouldn't save any opcode space. And besides, unless you had a division-with-remainder instruction, you would make things like binary <--> ASCII string conversion harder. For BCD to ASCII, just take each nibble and add (or OR) 48, and build the string of digits from that. With binary, you'd do something similar, but you'd need division and a remainder to split out the digits. That is harder to do in itself, and if you have no divide instruction it is worse still, likely a mess of nested loops. So the decimal mode can actually be easier for converting to displayable numbers. And of course, if you are designing your own game (such as for a 2600 console), who says you even need to follow ASCII conventions? If you don't need the control codes, just map binary 0-9 to the digit shapes your game displays. So with BCD, you might not need to do any conversion at all, depending on your program.
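In code, the difference looks like this (plain C for illustration):

```c
#include <stdint.h>

/* A BCD byte becomes ASCII with a shift and an OR/add, while a binary byte
 * needs divide and remainder (or nested subtraction loops on a CPU with no
 * divide instruction). */
static void bcd_to_ascii(uint8_t bcd, char out[3]) {
    out[0] = (char)('0' + (bcd >> 4));      /* high digit: just add 48 */
    out[1] = (char)('0' + (bcd & 0x0F));    /* low digit */
    out[2] = '\0';
}

static void bin_to_ascii(uint8_t bin, char out[4]) {
    out[0] = (char)('0' + bin / 100);       /* needs division... */
    out[1] = (char)('0' + (bin / 10) % 10);
    out[2] = (char)('0' + bin % 10);        /* ...and remainders */
    out[3] = '\0';
}
```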
Actually, it isn’t.
Knowing a bit more about the internals of the NMOS 6502, removing decimal mode saves approximately 2 inverters and 10 XOR gates in the 8 bit ALU.
You should also consider that the 6502 was taped out by hand, so every transistor added was more work (and therefore cost)
At the time it was something they could get cheaply. Getting two for $100 was still cheaper than any of the other options available. The perspective of ‘A piece of history’ generally comes significantly later after it has been successful. How many of the purchasers of Apple I computers would have taken up the trade-in option for the Apple ][ if they thought they were holding ‘a piece of history’?
I really have nothing to add to the previous reply. I still think it is mostly semantics, in that decimal mode turns the other opcodes into BCD ops. And you are right; I made the distinction in my other reply. One wouldn't be saving opcode space, only maybe 6+ ns or whatever in the ALU, and unless you redesign the rest of it, you won't realize any gains. For instance, prefetching the microcode for the next instruction would gain you significantly more performance, making the ALU the next bottleneck. In Drass's 20 MHz TTL design, he prefetched the microcode for the next instruction. And yes, that created a few problems, such as handling bootstrapping, but thankfully the 6502 is orthogonal enough that you can guess the first microcode word for most instructions, and the exceptions are not hard to manage. Doing just that got him to about 17 MHz or so, and then he got the decimal mode out of the critical path and used a carry-skip arrangement with the adders (adding another nibble adder and a multiplexer).
With Drass's pending 100 MHz design, he has had to rethink things even more. The decimal mode instructions are no longer cycle-accurate; he figured adding a cycle to those was better than reducing the clock rate. He also had to make his own adders and incrementers using transparent latches. Being able to add an 8-bit number in 6.4 ns is pretty impressive for off-the-shelf components. (He could have made that even faster by being less elegant, using the "dead bug" approach to mount faster chips with worse fanout and doubling up, but mounting that would have been harder.) Of course, since that is synchronous, you'd need to set it up a cycle before you need it. Going that fast, you pretty much need to split the AU and the LU instead of using a combined ALU. And of course, you must keep signal integrity in mind.
So I guess we should rein in the topic some. We have a 6502 core now, and while it might need some refinements, such as adding a few of the illegal opcodes and working out how to add the equivalent of the Ready line, it is mostly ready to go. In our usage, we won't need the /HALT line since the P2 gives us concurrent DMA already. However, an equivalent of the Ready line should be done so ANTIC can throttle the CPU when needed, and in case anyone wants to add external memory. The reason for a Ready line on a 6502 is to give external memory (including ROM) enough time to respond; it can be used to pause the CPU for external devices when releasing the bus is not required. So we can simulate the effects of bus mastering without actually needing it.
Then we'd need ANTIC, GTIA, and POKEY. Someone mentioned putting ANTIC and GTIA in the same cog, and that might be doable. Atari had considered putting them in the same chip, but their CEO put the brakes on that project; I guess he saw the writing on the wall and knew the platform would not be tenable much longer. Now, someone mentioned using a framebuffer. While you can do it that way, I believe part of the purpose of the original design was to avoid doing that. (Remember how costly memory was back then.) ANTIC would work through the display list on the fly, one scanline at a time, and GTIA would apply the correct mode and colors, cycle-exact, for the display to work. So you don't need a framebuffer if you can render the display list in perfect timing with the display. I'd want a VGA version of the GTIA in any design I might build, so even if we need 12.5 MHz in the final video portion to get this to work, it should work, even if we must keep the CPU at around 1.79 MHz, with maybe some mild variance. For a P2 implementation, we might need to render lines ahead of time and cache them when possible, to make up for hub variability and the varying complexity of the rendering. So if it encounters 3 very easy lines and then a harder one that takes more than a line's worth of time, there would already be lines ready to use, borrowing from time already saved. Our ANTIC can work during porches and skipped lines and can stop rendering when its cache is saturated. So the framebuffer idea can be dropped if everything can render in real time. To be honest, even if our ANTIC works much faster than the original, it wouldn't really matter; in the implementation I propose, it would be gated by how full its line buffers are and by the GTIA core. ANTIC would still send the vertical-blank interrupt back to the CPU as expected.
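A sketch in C of the line-cache idea, with names and sizes made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* The ANTIC side renders display-list lines into a small ring of scanline
 * buffers whenever it is ahead of the beam, and the GTIA/scanout side
 * consumes them at exact video timing.  Easy lines bank time that hard
 * lines can spend. */
#define LINE_PIXELS 384            /* assumed overscan width */
#define CACHE_LINES 8              /* a few lines of slack */

typedef struct {
    uint8_t pix[CACHE_LINES][LINE_PIXELS];
    volatile int rendered;         /* lines produced by the ANTIC side */
    volatile int displayed;        /* lines consumed by the scanout side */
} line_cache;

static bool antic_can_render(const line_cache *c) {
    return (c->rendered - c->displayed) < CACHE_LINES;   /* stop when saturated */
}

static uint8_t *antic_line_buffer(line_cache *c) {
    return c->pix[c->rendered % CACHE_LINES];
}

static void antic_commit_line(line_cache *c) {
    c->rendered++;
}

static const uint8_t *scanout_take_line(line_cache *c) {
    /* Called once per displayed line at fixed timing; if the cache were ever
     * empty, the scanout side would have to repeat or blank the line. */
    const uint8_t *p = c->pix[c->displayed % CACHE_LINES];
    c->displayed++;
    return p;
}
```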
Regarding POKEY, it might be possible that we'd want to split the sound abilities from its port and keyboard handling. I guess it depends on how one wants or needs to do things, what peripherals are available to us, and what our goals are. For instance, we could have a 2nd POKEY core mapped about 32 bytes above the main one in memory; that is the standard for the upgrade boards and the demos that use them. And we might want to use a PS/2 or USB keyboard and more readily obtainable game controllers.
There is also the PIA chip, and I am sure we could code something compatible based on what we want it to drive. Others here see no reason to include an SIO port; I have things I'd plug into one. And of course, some newer Ataris actually included a parallel port in the back. Even the 800 was originally designed with that as an option, but it was never included; you can tell by how the PCB sticks out at the back end, with perhaps no actual traces there, where a card-edge connector was going to go. There was also a limited developer edition of the Atari 800, with a ribbon connector on the system card and an additional PCB on the left side, with the case redesigned to bring out the extra ports and connectors. So we can iron out how far we take things as we go; it's a rather forgiving architecture.
While my idea of adding opcodes to the 6502 is not one that would likely be used, if one were to, there's something to keep in mind: FP values on the Atari are stored as BCD, so adding any math coprocessor opcodes would mean they'd need to produce BCD results. That is the advantage of the Veronica BASIC cartridge: it got its speed not only from having a faster '816 on the cartridge (clocked at maybe 14 MHz) but also from using better graphics and floating point routines. So it is possible that it used binary mode for everything, but I don't know how Veronica worked internally. There would need to be host and client code, and the Veronica cartridge would need to act like a ROM on the Atari side; the Atari would handle mostly the I/O, while Veronica would run the interpreter and the programs, passing I/O tasks back to the Atari.
I've seen benchmarks of a stock Atari running a program that draws a sphere in 2 minutes. The Turbo BASIC (interpreter) cartridge could run it in 55 seconds, the Veronica cartridge did it in about 28 seconds, and the Rapidus mod plus Turbo BASIC did it in 3 seconds. I don't know what Rapidus and Veronica together would do, as I've never seen anyone benchmark that, but I'd imagine it would be faster than 28 seconds, since the bottleneck through the cartridge slot would likely be reduced, meaning any spinlocks used to help with the domain crossing would be reduced. I don't know about the Veronica internals. I mean, it would obviously have a ROM and a CPU, but I don't know whether it had RAM, latches, or what. I'd imagine latches would help with any spinlocks used in the code so you can have bus persistence from a faster CPU.
Yeah, that was due to a bug in the VIA chip used, so they may have ended up bit-banging it to get around that. And some modded things to use parallel access to speed up the I/O.
That reminds me of the original IBM computer. You could supposedly go pee, light a cigarette, and maybe pour yourself some coffee while it counted up to 640K of memory. For a while, the clone makers managed to do better than that, but nowadays, it isn't much different since the PC now does much more on boot. For instance, you have another computer inside that also has to boot (the Intel Management Engine) and the CPU won't boot without it. And while memory is so much faster, and assumed to be more reliable and not tested, there are all these other behind-the-scenes jobs to be done. Then, on top of that, the operating system takes quite a long time to come up and be ready for use. Microsoft made some services start as delayed-start, so you get to a screen faster, but you can't do anything. MS-DOS was simpler. The memory would count up, the floppy drive would strobe, and you'd have the C: prompt.
However, a well-known computer genius (the author of the Turbulence demo on the P1) did something with this:
https://www.linusakesson.net/programming/gcr-decoding/index.php
https://www.linusakesson.net/software/spindle/v1.php
https://www.linusakesson.net/software/spindle/v2.php