How likely could a P2 emulate an Atari 800? — Parallax Forums

To build an Atari 800 clone, there are only 4 ASICs that are needed. You'd need a "Sally"-variant 6502 based on the 2nd stepping of the original 6502 (not WD65C02), a POKEY, ANTIC, and a GTIA. You could emulate a CTIA instead, but why bother? And you could emulate an FGTIA if you want to do the SECAM standard. I'd say go with a modified GTIA that can output to a VGA monitor, and possibly do a modified ANTIC that supports more graphics modes if possible, based on some of the ANTIC replacements.

The above is mostly what you'd need. You'd likely need to emulate a PIA in some form as well. Regarding the FREDDY chip and 2 memory management chips, you wouldn't need those if you work with hub RAM or an SRAM. Obviously, it would be hard to do a 1088K-modded Atari unless you were willing to use serial SRAM and take advantage of the 6502's ready line to do wait states.

There are obviously other mods one can do to improve the performance. For instance, would the /halt line to the modified 6502 be as necessary? Obviously, the P2 has semaphores to signal such things. Or would it be wiser to include code to conditionally halt when the display list buffer is being simultaneously accessed to help prevent races while not slowing the CPU as much? I think the GTIA throws interrupts during V-syncs, so there is a mechanism in place to help prevent screen overruns.

Another mod one could do, if they wish to step outside compatibility a little, is to change the modified 6502 CPU emulator to use some of the illegal opcode space for new opcodes. Like, add some of the 65C02 features and '816 features, or others. Sure, you wouldn't be able to use those in the classic programs, but you could use those in a modded ROM and still get system-wide improvements. (Do other known ROM mods too, like the SIO port speed-up.) That would be most helpful if you add some floating point opcodes and have your custom ROM use those in its FPU emulator. Rewriting the BASIC cartridge might also be good. Those are the 2 most-used sets of code, ROM and BASIC. Thus ROM calls can take advantage of more efficient opcodes and software would gain collateral benefits. The stock BASIC could run faster with more efficient ROM routines. But a modded BASIC could do even better since it could directly call the extra opcodes, and there are other ways that would work on a stock 6502 to make it more efficient. The Turbo BASIC cartridge does such tricks to be about twice the speed of the original, using more efficient floating point emulation and drawing routines. There is a Veronica BASIC that is up to 5 times faster than that, but it cheats. It has its own '816 CPU that is clocked faster. I'm not sure if it includes its own RAM or not. But having another CPU to emulate ROM and create proxied instructions for the main CPU is sometimes done.

The Kim Klone (not to be confused with a Kim-1 clone) used TTL chips and programmable logic to execute repurposed illegal 65C02 opcodes. It would get them before the CPU, and for most NOPs, it would simply do whatever and pass the NOP on. But for a few instructions, it would do whatever instruction and then have the CPU do something else, like if you want to copy from an extra external register to the CPU. In that case, you might load something from the outside register and pass it on as an immediate operation.
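To make the interception idea concrete, here is a minimal Python sketch of a bus interposer that swaps a repurposed opcode for an immediate load. The opcode choice (`0x42`, one of the two-byte NOPs on a 65C02) and the function name `interpose` are purely hypothetical and are not the Kim Klone's actual encoding; only `0xA9` (LDA immediate) is a real 6502 opcode.

```python
# Hypothetical encoding, chosen only for illustration.
REPURPOSED_OPCODE = 0x42   # a two-byte NOP on the 65C02
LDA_IMMEDIATE = 0xA9       # real 6502 opcode for "LDA #imm"


def interpose(fetched, external_register):
    """Return the (opcode, operand) pair the CPU actually sees.

    `fetched` is the (opcode, operand) pair read from memory.
    `external_register` models an extra register that lives outside the CPU.
    """
    opcode, operand = fetched
    if opcode == REPURPOSED_OPCODE:
        # Feed the CPU "LDA #value" so the external register lands in A.
        return (LDA_IMMEDIATE, external_register & 0xFF)
    # Everything else passes through untouched.
    return (opcode, operand)


print(interpose((0x42, 0x00), 0x7F))  # repurposed opcode becomes LDA #$7F
print(interpose((0xEA, 0x00), 0x7F))  # ordinary NOP is passed on as-is
```

In a P2 emulator none of this trickery is needed, since the opcode decoder itself can be extended, but the same table of "spare" opcodes would be the starting point.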

I see two ways to go with such a design. One is to modify the POKEY to use a PS/2 keyboard and do whatever interpretation is needed and emulate the entire machine.

Another way to go would be to make things more like the original so it could be mounted to a board that can go in the system board socket of an original. In that case, you'd probably omit the POKEY emulation and any PIA emulation as that would be on the mainboard. In that case, you'd leave all the other boards out. The system board tends to have just the 6502, GTIA, and ANTIC chips with their glue logic, and everything else is on the mainboard and the other cards. There may be several memory cards and a ROM card, and if you do RAM and ROM closer to the CPU, you wouldn't need those. And if this could work, you wouldn't need the potentiometer on the system board either if you convert things to use VGA. That pot is to make up for voltage changes affecting the colors. And if dropping this into a stock machine, you might need a CAS-inhibit line to disable the FREDDY chip, since there is no reason it should be refreshing the RAM.

Another note about mixing with the original parts would be voltages. In that case, anything that goes to the external bus may need to use voltage levelers. If by chance 3.3V chips can drive the 5V chips on the mainboard (and distance could be a factor), you'd likely need a resistor and a diode (thus no more than 3.3V passes either way). But signal levelers would likely be best.

Now, if you emulate the SIO port, then you'd need voltage levelers there. However, you could decide to dispense with that and provide another means and update the ROM to use it. The SIO port is interesting since we still have a modified version of that today (USB). It is a packet-based serial protocol that relies on smart devices. The 810 floppy drive used a 6507 CPU, just like the Atari 2600 console. I think it could handle I/O a little faster than what a stock Atari 800 could provide. A few ROM and DOS mods could speed that up some, and some folks did. On some of the modded Atari 800s (and the other 8-bit computers of that type, including the 400, the XE, and the XL), they added some version of the IDE/ATA protocol and used CF cards. I imagine SD cards could be made to work. With interrupts, a /halt line, and a /rdy line, that should be possible.

So could a P2 emulate an Atari 800 or similar, and would it require more than one?

Comments

  • pik33pik33 Posts: 2,350

    P2 can emulate A8. Maybe in more than 1 cog; GTIA is complex stuff to emulate. This was my 1st home machine when I was young so I always wanted to do this, but there is always something else to do.

    We already have a 6502; it only needs to be fitted to the rest of the design.

    I have a miniature of a 1050 FDD on my desk. It is a reduced-size, 3D-printed 1050 case made to fit a 3.5" FDD. The ready-to-print files are on Thingiverse. Adafruit did an FDD controller with a Raspberry Pi Pico: maybe it can also be modified to fit the project.

  • PurpleGirlPurpleGirl Posts: 151
    edited 2022-08-30 12:21

    @pik33 said:
    P2 can emulate A8. Maybe in more than 1 cog; GTIA is complex stuff to emulate. This was my 1st home machine when I was young so I always wanted to do this, but there is always something else to do.

    We already have a 6502; it only needs to be fitted to the rest of the design.

    I have a miniature of a 1050 FDD on my desk. It is a reduced-size, 3D-printed 1050 case made to fit a 3.5" FDD. The ready-to-print files are on Thingiverse. Adafruit did an FDD controller with a Raspberry Pi Pico: maybe it can also be modified to fit the project.

    Obviously, I assumed multiple cogs: however many the 6502 CPU emulator would use, plus whatever you'd need for ANTIC, GTIA, and POKEY, plus any needed for glue logic. When I asked about more than one P2, I meant using multiple P2s if you run out of cogs.

    As for a drive, in this day and age, you'd do better to find a way to hack in an SD card. It could use the SIO protocol, and you could go as far as to have some other cheap microcontroller or even another cog to do interpretation to make it FAT32 compatible (and make things easier for transferring disk images with a PC). And yes, there are even 3D prints available for SD socket covers, and one could go as far as to put a drive busy light in there.

    As for the 6502 core, it should be the original 6502, not a 65C02, with the most-used illegal instructions (such as shift instructions and instructions that wrote to two registers), with the code modded to include the equivalent of a hardware halt signal. The "Sally" 6502s added a multiplexer inside to remove the 6502 from the bus. It isn't enough to de-assert the Ready line. The bus must be cleared so the ANTIC (and DRAM refresh on the original machine) can work. On the first of the Atari 800 machines, there were 4 more chips. Then Atari had MOS make one with this circuitry inside and repurpose one of the NC lines for the halt line. It may have also added another R/W line, but I don't know the significance of that.

    I still like my other ideas of taking some of the unused 6502 instruction slots and repurposing them. A few FP instructions, a 3rd index register, the most-used illegal instructions (both shifts and bug-fixed), etc. I wouldn't include the most stupid of the illegal opcodes (like the one that's guaranteed to cause a crash).

  • Wuerfel_21Wuerfel_21 Posts: 4,501
    edited 2022-08-20 18:46

    How likely could a P2 emulate an Atari 800?

    Much likely.

    Slight note: You shouldn't necessarily think of the individual physical chips when it's highly integrated stuff. Think of functional blocks.

    For example, on Megadrive, all these things are inside the VDP chip that end up being rather independent in my emulator:
    - Video rendering (duh) -> split into 3 cog-sized components: layer rendering, compositing, scanout
    - VRAM access port and DMA -> implemented in 68000 memory handlers (caveat: VRAM fill/copy halts the CPU)
    - SN76489 clone -> bolted onto YM2612 cog (weird port interface handled in memory handler)
    - data bus transceiver between Z80/68000 -> implemented in memory handlers (rather obvious)

    (on NeoGeo this is even worse, particularly original MV4/MV6 boards, the video pipeline is split into like 4 ASICs and some of them have random other functionality like the watchdog or the controller inputs shoved into them)

    Same thing with POKEY doubling as synthesizer/keyboard IF/serial IO. Technically the oscillators generate the serial clock, but that's an implementation detail that doesn't really matter (I think). In fact, the serial port having a clock or being serial at all may be abstracted away if the peripherals are virtual to begin with.
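As a rough illustration of "functional blocks behind memory handlers" applied to the Atari, here is a small Python sketch (not P2 code). The page addresses for GTIA ($D0xx), POKEY ($D2xx), PIA ($D3xx) and ANTIC ($D4xx) are the standard Atari 8-bit ones; the names `make_machine`, `chip_write` and the logging behaviour are made up for the example and are not taken from any existing emulator.

```python
def make_machine():
    """Build a toy 64K address space whose chip pages dispatch to handlers."""
    ram = bytearray(0x10000)
    log = []  # records (chip, address, value) for every chip register write

    def chip_write(name):
        # A real emulator would update chip state here, possibly in another cog.
        def handler(addr, value):
            log.append((name, addr, value))
        return handler

    # One handler per 256-byte page, keyed by the high address byte.
    handlers = {
        0xD0: chip_write("GTIA"),
        0xD2: chip_write("POKEY"),
        0xD3: chip_write("PIA"),
        0xD4: chip_write("ANTIC"),
    }

    def write(addr, value):
        handler = handlers.get(addr >> 8)
        if handler is not None:
            handler(addr, value & 0xFF)
        else:
            ram[addr] = value & 0xFF

    return ram, log, write


ram, log, write = make_machine()
write(0xD200, 0x55)   # lands in the POKEY handler
write(0x0600, 0x01)   # plain RAM
print(log)
```

The point is only that the CPU core never needs to know which "chip" a register belongs to; whether POKEY's sound and serial halves share a handler or a cog is then a free choice.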

  • pik33pik33 Posts: 2,350
    edited 2022-08-20 16:37

    As for the 6502 core, it should be the original 6502,

    It is near. Macca did it for his esoteric 6502 machine emulator as a "clear" 6502 with NOPs instead of illegals. Then I used it for my SID player. As C64 SID tune authors (ab)use 6502 illegal instructions, several of these were added by macca (before I attempted to do it myself). Adding the rest of them should not be a problem. It is one cog.

    I have already tried to do GTIA. As it is heavily connected with ANTIC, I think both chips can fit together in one cog.

    The third cog should do a POKEY.

    PSRAM may also be useful, as there are several memory extensions available for A8, up to 2 MB, and there is VBXE, a graphics card with half a meg of memory.

    There is no problem with SD access, we have it already in Flexprop, but I dream of reviving the floppy. Maybe not a 5.25", although I have several of them, but 3.5" - I have already printed the case, and using the Adafruit contraption it should be possible to format these 3.5" disks in the Atari style.

  • @Wuerfel_21 said:

    How likely could a P2 emulate an Atari 800?

    Much likely.

    Slight note: You shouldn't necessarily think of the individual physical chips when it's highly integrated stuff. Think of functional blocks.

    For example, on Megadrive, all these things are inside the VDP chip that end up being rather independent in my emulator:
    - Video rendering (duh) -> split into 3 cog-sized components: layer rendering, compositing, scanout
    - VRAM access port and DMA -> implemented in 68000 memory handlers (caveat: VRAM fill/copy halts the CPU)
    - SN76489 clone -> bolted onto YM2612 cog (weird port interface handled in memory handler)
    - data bus transceiver between Z80/68000 -> implemented in memory handlers (rather obvious)

    (on NeoGeo this is even worse, particularly original MV4/MV6 boards, the video pipeline is split into like 4 ASICs and some of them have random other functionality like the watchdog or the controller inputs shoved into them)

    Same thing with POKEY doubling as synthesizer/keyboard IF/serial IO. Technically the oscillators generate the serial clock, but that's an implementation detail that doesn't really matter (I think). In fact, the serial port having a clock or being serial at all may be abstracted away if the peripherals are virtual to begin with.

    I like to think of it in terms of the original hardware as it helps me think of it the other way. So actually, borrowing from what I said and what you said, one could upgrade it and go with USB. SIO was arguably the earliest forerunner of USB and shared a lead developer with it. You are thinking more like a coder while I am thinking more in terms of a board designer, since I mentioned the possibility of making the system card for the 400/800. In that case, you need the functionality of the CPU and the 2 display chips. That simplifies things, and ROM/RAM could also be included there, and one probably should use a CAS-inhibit line (or mod the mainboard with a dummy Freddy). That could simplify an overall design. Thus the mainboard deals with POKEY, PIA, SIO, the keyboard, game sockets, etc. And if one wants the VGA output as I'd like to see, then one could unplug the line cable from the RFM and maybe widen the hole in the back and put a VGA cable there. And if going for VGA and leaving out the memory boards, then one could leave off the color adjustment pot since that only adjusts the signal strength for color burst to calibrate the colors based on the voltage drop. Something as simple as adding a memory card could cause the colors to need to be recalibrated. So if you replace all the memory, the decoders, other glue logic, the CPU, ANTIC, and GTIA, you'd likely use less power, and using VGA breaks out the individual colors, so the pot here wouldn't be necessary. The neat thing is that if a P2 could do most of the system board, it would actually fit inside the cast aluminum shield. To add extended ANTIC and GTIA functionality, some folks added piggyback header boards to the other A8 varieties, and those won't fit in the 800 without removing the RFI shielding.

    Another detail one might add for nostalgia is the sound that comes from the PIA. It doesn't actually have to come from a PIA, just so long it is mapped to the same addresses. That is what generated the key press clicks and provided an error siren. That went to a speaker in the cabinet, while the POKEY sound went out the modulator.

    And it wouldn't hurt to map a simplified POKEY to the commonly used secondary location. Thus stereo or 8-channel mono could be possible. A number of folks have modded theirs to include a 2nd POKEY, and omitted using all the interrupt and device-handling stuff, just so they could extend the sound. Similar usage was done in pinball and arcade games. They didn't need the Atari computer functionality, just the sound, and there were up to 16-channel POKEYs available. With 16-channels, that could provide up to 8 channels at 16-bits. I've never seen a quad POKEY actually existing on any of the machines that Ron Lyons repaired on his YouTube videos. Those are rarer than plain POKEYs, but thankfully, there are FPGA replacement boards available for that. It was too bad that most manufactured POKEYs are in landfills. When there were blowouts on cartridges, and since it was before 3D printing, folks would buy up the cartridges, strip the PCBs, and use the PCB and case for their own cartridges. They never thought to sell the chips and particularly the POKEYs.

    Speaking of the PIA (and all 6518-23/8518-23 varieties), folks used to like to "abuse" them in all sorts of ways. So they were used as sound chips, stack counters, and other things beyond their intended purpose. I was wondering how the one used it as a stack counter for a wrap-around coprocessor since the PIA/VIA only counts forward. They added an inverting octal transceiver chip. So if the stack was 255 the PIA would see 0, and when it incremented to 1, the SP would be at 254. They did it this way since they already used the PIA for something else and didn't want to add multiple counter registers.
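The inverting-transceiver trick is just a bitwise complement on 8 bits, which a couple of lines of Python can show. The function name is made up for the example.

```python
def stack_pointer_from_counter(counter):
    """Model the inverting octal transceiver: the stack pointer is the
    8-bit complement of the forward-only counter."""
    return ~counter & 0xFF


# As the counter counts up, the value seen on the other side counts down.
for counter in (0, 1, 2, 255):
    print(counter, "->", stack_pointer_from_counter(counter))
```

So a counter that can only increment gives a stack pointer that decrements, and the complement is its own inverse, so the same transceiver works in both directions.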

  • @pik33 said:

    As for the 6502 core, it should be the original 6502,

    It is near. Macca did it for his esoteric 6502 machine emulator as a "clear" 6502 with NOPs instead of illegals. Then I used it for my SID player. As C64 SID tune authors (ab)use 6502 illegal instructions, several of these were added by macca (before I attempted to do it myself). Adding the rest of them should not be a problem. It is one cog.

    I have already tried to do GTIA. As it is heavily connected with ANTIC, I think both chips can fit together in one cog.

    The third cog should do a POKEY.

    PSRAM may also be useful, as there are several memory extensions available for A8, up to 2 MB, and there is VBXE, a graphics card with half a meg of memory.

    There is no problem with SD access, we have it already in Flexprop, but I dream of reviving the floppy. Maybe not a 5.25", although I have several of them, but 3.5" - I have already printed the case, and using the Adafruit contraption it should be possible to format these 3.5" disks in the Atari style.

    At least that would be a good starting base for the 6502. Implementing some variation of the SALLY mod might be good, not for hardware races, but to prevent software races. (Or if one is up to it, apply selective halting to let normal code continue but halt if an instruction tries to write a display list.) Of course, now that I've thought about it, ANTIC could just assert the /RDY line. Duh! Since concurrent DMA is inherent to the P2, you don't actually need to unlatch anything, just tell the CPU that the bus is busy. That mechanism would be needed for external serial RAM (whether PS or truly static) to buy time to access it. But yeah, whatever mechanism that's used for that could be used in place of a /HALT line.
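Here is a toy Python model of the "just tell the CPU the bus is busy" idea: the display side marks some cycles of a scanline as taken, and the CPU emulator simply does not advance on those cycles. The 114 cycles per line comes from the timing figures elsewhere in this thread; which cycles the display block actually claims is an assumption made up for the example.

```python
def cpu_cycles_on_line(dma_cycles, cycles_per_line=114):
    """Count the cycles the CPU emulator gets on one scanline.

    `dma_cycles` is a set of cycle indices on which the ANTIC-like block
    signals "bus busy" (the role /RDY or /HALT would play in hardware).
    """
    cpu_cycles = 0
    for cycle in range(cycles_per_line):
        bus_busy = cycle in dma_cycles
        if not bus_busy:
            cpu_cycles += 1  # CPU emulator advances one cycle
    return cpu_cycles


# Hypothetical line where the display block claims the first 40 cycles.
print(cpu_cycles_on_line(set(range(40))))
# A blank line with no display fetches leaves everything to the CPU.
print(cpu_cycles_on_line(set()))
```

Selective halting would replace the simple `cycle in dma_cycles` check with a test on what the CPU is about to access, for example stalling only on writes into the display list.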

    And some demos used the more useful illegal opcodes. One or two were undocumented because they were broken. As I've said, I'd love to see the remaining opcodes from what was used on the original be used for enhanced extra instructions. Such as 3rd index register instructions (and whatever new modes that inherently adds for aware software), making the shifts legal instructions, and a few FPU instructions. That would make it possible to rewrite the ROM (and BASIC) to speed up the FPU emulator calls. With a new ROM and BASIC, more memory could be made available under certain circumstances.

  • A thought came to mind on POKEY emulation. It would be nice in 8-bit mode to have a way to get it more in tune, sound-wise. Programmers used a note chart for programming music. The chart was in the beginner's manual and gave you the machine values. The problem is that the suggested values in the chart are not 100% linear and the notes produced may fall a little outside the equal temperament scale in places. I get why that was the case. The default POKEY music mode was 8-bits. So what if one could have a lookup table so that when the numbers most used for music are encountered, more accurate notes were played?

    For a POKEY, one would want to include all the operation modes it has. I mean, we'd need a 16-bit POKEY mode that halves the number of channels. Thus, 9-octave sounds are possible when software enables this mode. There might be a couple of modes based on the main clock frequency, maybe one that is tied to 60 Hz, and the ability to directly bit-bang it if one wants. Bit-banging was occasionally done, but usually in assembly. And it wouldn't hurt to have 2 POKEYs, but the 2nd one doesn't have to be I/O complete as it is just used as a PSG. We'd need to know where to put the 2nd one on the memory map. Modders did add an additional POKEY, and there are demos that use it if it is there. Thus you can have stereo output or 8-channel sound.

    And with the earlier discussion, there is no reason why in a P2 implementation the PSG and the I/O portions of a POKEY even have to be in the same cog.

    Personally, I'd like to see a hardware setup that is in fidelity with the Atari 800, and that means using SIO, 4 Atari-style joystick ports, cartridge slots, etc. If one insists on that, they might need a 2nd P2 as they may run out of pins. Now, yes, if one wants to use Famicom or USB controllers and forego SIO, the wiring could be simpler. Speaking in general, it would be nice to work out some sort of "southbridge" protocol for folks who want to use multiple P2s together. One popular bus that Atari modders have added to A8s is an ATA/IDE bus, and I presume PATA. Of course, they tend to use CF cards with that, not hard drives.

  • pik33pik33 Posts: 2,350

    We have 1.79 MHz, 64 kHz and 15 kHz as a base for Pokey frequencies. At 1.79 MHz this is divided by value+4 for 8-bit and value+7 for 16-bit. This means we cannot have "exact" note frequencies at 8-bit, and 16-bit needs to use the 1.79 MHz main clock. We also cannot have any "better" lookup table.

    One of the workarounds used by demosceners is to continuously switch between values to keep the average frequency in tune.
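A rough Python sketch of that workaround: find the two neighbouring 8-bit values that bracket the wanted pitch and work out what share of the time to spend on each. It uses the divide-by-28 formula F = CLOCK/(2*(N+1)) and the PAL clock given later in this thread, and it assumes the perceived pitch is simply the time-weighted mean of the two frequencies, which is a simplification. The function names are made up.

```python
PAL_CLOCK_HZ = 1_773_447.5  # 17.734475 MHz / 10


def freq(audf, divider=28):
    """Output frequency for an 8-bit AUDF value on a divided base clock."""
    return PAL_CLOCK_HZ / divider / (2 * (audf + 1))


def dither_plan(target_hz, divider=28):
    """Return (audf_high, audf_low, share_high).

    audf_high gives the frequency just above the target, audf_low the one
    just below, and share_high is the fraction of time to spend on
    audf_high so that the time-weighted mean equals the target.
    """
    for audf in range(255):
        hi, lo = freq(audf, divider), freq(audf + 1, divider)
        if lo <= target_hz <= hi:
            share_high = (target_hz - lo) / (hi - lo)
            return audf, audf + 1, share_high
    raise ValueError("target is outside the 8-bit range for this base clock")


a, b, share = dither_plan(440.0)
print(a, b, round(share, 4))
```

For A4 at 440 Hz this picks the values 70 and 71 and spends only a small share of the time on 70, since 71 is already close.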

    The second POKEY in standard A8 extension is $20 "up" from the first.

  • PurpleGirlPurpleGirl Posts: 151
    edited 2022-08-25 02:25

    There is nothing that says a pokey replacement has to be clocked internally at the board clock. Maybe you can't do that on a P2, but I am sure I could do that on an FPGA. So what you could have is a faster core and lookup tables to remap the 256 values to where the ones most used are aligned with the nearest music note. What I said before is possible. It has 16-bit registers. So it wouldn't be hard to remap 8-bit values to work in the 16-bit internal divisor tables. I do believe it is possible to produce an in-tune Pokey. If I have to make one to prove others wrong, I will. It is very possible to replace the internal math with a list of more arbitrary values. Again, maybe not on a P2, but I'm sure I could make one on an FPGA. I think translation rather than division is possible.
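For what it's worth, the translation idea can be sketched in a few lines of Python. This builds a 256-entry table that maps each 8-bit value (as it would sound on the divide-by-28 base) to the 16-bit divisor whose 1.79 MHz output is closest to the nearest equal-tempered note. The formulas are the ones posted elsewhere in this thread (N+1 on the divided clock, N+7 for 16-bit on the main clock); the PAL clock, A4 = 440 Hz, and all names are assumptions for illustration, and this snaps every value rather than only the ones most used for music.

```python
import math

CLOCK_HZ = 1_773_447.5  # PAL CPU clock, assumed for the example


def freq8(audf):
    """Stock 8-bit pitch on the clock/28 base."""
    return CLOCK_HZ / 28 / (2 * (audf + 1))


def nearest_note_hz(freq_hz, a4=440.0):
    """Snap a frequency to the nearest 12-tone equal-tempered note."""
    semitones = round(12 * math.log2(freq_hz / a4))
    return a4 * 2 ** (semitones / 12)


def divisor16_for(freq_hz):
    """Invert F = CLOCK / (2 * (N + 7)) and clamp N to 16 bits."""
    n = round(CLOCK_HZ / (2 * freq_hz) - 7)
    return max(0, min(0xFFFF, n))


REMAP = [divisor16_for(nearest_note_hz(freq8(audf))) for audf in range(256)]

print(REMAP[71])  # 8-bit value 71 (about 439.8 Hz) snapped towards A4
```

A real replacement would probably keep the stock behaviour by default and enable the table through some spare control bit, so that slides and effects written for the original divider still sound the way they were composed.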

    I don't like to be around those who like to throw cold water on people's ideas. To me, that is incredibly rude.

    And in what I want, I don't want NTSC compatibility. So the board frequency would do better to match what a VGA GTIA would need. So a custom POKEY would do better to be remapped to work with those frequencies.

  • The weird intonation in Atari music is just part of how it is. It's what the composer heard when making it, so why change it? Also it'd make vibrato/slide weird and wouldn't really work if a different scale or reference frequency (NTSC vs PAL) is used.

    And in what I want, I don't want NTSC compatibility. So the board frequency would do better to match what a VGA GTIA would need. So a custom POKEY would do better to be remapped to work with those frequencies.

    Nothing stops the audio generator from running at whatever fractional frequency ratio it wants to. Also nothing stops you from generating slightly-offspec VGA/DVI at whatever exact timing the original hardware uses.

  • Well, I think fixing the tuning issue would have negligible effects on special effects, and I have nothing to display NTSC or PAL. If I were to do one, give me RGB. That said, for existing demoscene stuff and not BASIC programs, surely a register, unused control bit, non-standard opcode, or something could be used to change the function. I'd likely say to add that only for the 8-bit mode.

    But the notorious tuning issue is why Commodore commissioned the SID.

    And yes, I know about off-spec VGA. Even the Gigatron does that. I mean, I don't think a 6.275-6.3 MHz crystal could be found. The standard 640x480 pixel clock is a tiny amount over 25.1 MHz. So 6.25 MHz is slightly slow for 160 across, so Marcel tweaked the porch times to correctly center the image. It is nice that he designed a few specialized instructions to make bit-banging a bitmap more feasible. So an AND/OR can read from memory, send it to the port, and increment one of the index registers all at once. Still, that only works as well as the clock rate allows, and it is a tad slow for QQVGA, but luckily you can reduce the blanking intervals slightly and get by with it, even if your vertical frequency is 59.8 Hz.
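The clock arithmetic behind that can be checked quickly in Python. The 25.175 MHz pixel clock and 800 clocks per line are the usual figures for standard 640x480 at 60 Hz rather than numbers from this thread, and the "unchanged blanking" line time is only a what-if to show why the porches need trimming.

```python
VGA_PIXEL_CLOCK_MHZ = 25.175   # standard 640x480@60 dot clock
VGA_CLOCKS_PER_LINE = 800      # 640 visible pixels plus blanking
SLOW_CLOCK_MHZ = 6.25          # the slightly slow clock discussed above

# Clock needed for exactly 160 pixels across the same visible width.
ideal_quarter_clock = VGA_PIXEL_CLOCK_MHZ / 4
shortfall_percent = (1 - SLOW_CLOCK_MHZ / ideal_quarter_clock) * 100

# Line time in microseconds: standard VGA vs. 6.25 MHz with blanking unchanged.
line_time_vga_us = VGA_CLOCKS_PER_LINE / VGA_PIXEL_CLOCK_MHZ
line_time_slow_us = (VGA_CLOCKS_PER_LINE / 4) / SLOW_CLOCK_MHZ

print(f"ideal quarter clock: {ideal_quarter_clock:.5f} MHz")
print(f"6.25 MHz is {shortfall_percent:.2f}% slow")
print(f"line time: {line_time_vga_us:.3f} us vs {line_time_slow_us:.3f} us")
```

The ideal quarter clock falls inside the 6.275-6.3 MHz range mentioned above, and the longer line time at 6.25 MHz is what shaving the blanking intervals compensates for.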

  • pik33pik33 Posts: 2,350
    edited 2022-08-25 18:11

    whatever exact timing the original hardware uses.

    312 total lines for PAL, 262 total lines for NTSC, up to 240 lines of active picture.
    114 CPU clocks = 228 color(*) clocks = 456 "hi-res" pixels total per line.

    CPU clock = PAL color subcarrier oscillator (17.734475 MHz) / 10 = 1.7734475 MHz for PAL systems
    CPU clock = NTSC color subcarrier oscillator (14.318182 MHz) / 8 = 1.78977275 MHz for NTSC systems

    (*) For PAL systems it is no longer a color clock, as the PAL color clock is 4.43 MHz, but the name is still used for PAL systems.
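From those numbers the exact frame rate follows directly, which is what an off-spec VGA/DVI output would have to match. A small Python check, using only the line counts, the 114 CPU clocks per line, and the oscillator figures above (the names are made up):

```python
CYCLES_PER_LINE = 114

# system: (CPU clock in Hz, total lines per frame)
SYSTEMS = {
    "PAL": (17_734_475 / 10, 312),
    "NTSC": (14_318_182 / 8, 262),
}


def frame_rate_hz(system):
    """Frames per second = CPU clock / (cycles per line * lines per frame)."""
    cpu_hz, lines = SYSTEMS[system]
    return cpu_hz / (CYCLES_PER_LINE * lines)


for name in SYSTEMS:
    print(name, round(frame_rate_hz(name), 4), "Hz")
```

Both come out slightly under the nominal 50 and 60 Hz, so a monitor-friendly output either has to accept that rate or the emulator has to drop or repeat a frame now and then.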

    Pokey is clocked by the CPU clock. It can be divided internally by 28 or 114 (don't ask me why they selected these numbers)

    This is divided by the 8- or 16-bit register value:

    For the main frequency clock (1.773447 MHz): F = CLOCK/(2*(N+M)), where N is the AUDF value and M is a modifier: M = 4 for 8-bit, M = 7 for 16-bit.
    For 1/28 or 1/114 of the main clock: F = CLOCK/(2*(N+1)), where CLOCK is the divided clock.
    

    There are 4 8-bit registers. They can be linked to get 2 16-bit registers, which means each linked pair loses one of its sound channels.
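The two formulas fit in one small Python function. This is only a sketch: the PAL clock is assumed as the default, and the names `pokey_hz`, `base` and `bits` are invented for the example.

```python
PAL_CPU_CLOCK_HZ = 17_734_475 / 10  # 1.7734475 MHz


def pokey_hz(audf, base="div28", bits=8, clock_hz=PAL_CPU_CLOCK_HZ):
    """Output frequency for a given AUDF value.

    base: "main" for the undivided CPU clock, "div28" or "div114" for the
          internally divided clocks.
    bits: 8 or 16 (only matters on the main clock, where M is 4 or 7).
    """
    if base == "main":
        m = 4 if bits == 8 else 7
        return clock_hz / (2 * (audf + m))
    divider = {"div28": 28, "div114": 114}[base]
    return clock_hz / divider / (2 * (audf + 1))


print(round(pokey_hz(0), 3))            # highest pitch on the clock/28 base
print(round(pokey_hz(255), 3))          # lowest 8-bit pitch on the clock/28 base
print(round(pokey_hz(2008, "main", 16), 3))  # a 16-bit value close to 440 Hz
```

With the clock/28 base this reproduces the 8-bit range of roughly 124 Hz to 31.7 kHz, while the 16-bit mode on the main clock has far finer steps around musical pitches.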

    The most standard configuration is clock/28, which gives this table of frequencies (AUDF value, then output frequency in Hz, for the PAL clock):

    0 31668.70508
    1 15834.35254
    2 10556.23535
    3 7917.17627
    4 6333.741211
    5 5278.117676
    6 4524.100586
    7 3958.588135
    8 3518.745117
    9 3166.870605
    10 2878.973145
    11 2639.058838
    12 2436.054199
    13 2262.050293
    14 2111.24707
    15 1979.294067
    16 1862.86499
    17 1759.372559
    18 1666.773926
    19 1583.435303
    20 1508.033569
    21 1439.486572
    22 1376.900269
    23 1319.529419
    24 1266.748169
    25 1218.0271
    26 1172.915039
    27 1131.025146
    28 1092.024292
    29 1055.623535
    30 1021.571106
    31 989.6470337
    32 959.6577148
    33 931.4324951
    34 904.8201294
    35 879.6862793
    36 855.9109497
    37 833.3869629
    38 812.0180664
    39 791.7176514
    40 772.4074707
    41 754.0167847
    42 736.4815063
    43 719.7432861
    44 703.7490234
    45 688.4501343
    46 673.8022461
    47 659.7647095
    48 646.3001099
    49 633.3740845
    50 620.9550171
    51 609.0135498
    52 597.5227661
    53 586.4575195
    54 575.7946167
    55 565.5125732
    56 555.5913086
    57 546.012146
    58 536.7576904
    59 527.8117676
    60 519.1591187
    61 510.785553
    62 502.6778564
    63 494.8235168
    64 487.2108459
    65 479.8288574
    66 472.6672363
    67 465.7162476
    68 458.9667358
    69 452.4100647
    70 446.0380859
    71 439.8431396
    72 433.8178711
    73 427.9554749
    74 422.2493896
    75 416.6934814
    76 411.2818909
    77 406.0090332
    78 400.8696899
    79 395.8588257
    80 390.9716797
    81 386.2037354
    82 381.5506592
    83 377.0083923
    84 372.572998
    85 368.2407532
    86 364.0081177
    87 359.8716431
    88 355.8281555
    89 351.8745117
    90 348.0077515
    91 344.2250671
    92 340.5237122
    93 336.901123
    94 333.3547974
    95 329.8823547
    96 326.4815063
    97 323.1500549
    98 319.8858948
    99 316.6870422
    100 313.5515442
    101 310.4775085
    102 307.4631653
    103 304.5067749
    104 301.60672
    105 298.7613831
    106 295.9692078
    107 293.2287598
    108 290.5385742
    109 287.8973083
    110 285.3036499
    111 282.7562866
    112 280.2540283
    113 277.7956543
    114 275.3800354
    115 273.006073
    116 270.672699
    117 268.3788452
    118 266.1235657
    119 263.9058838
    120 261.724823
    121 259.5795593
    122 257.4691467
    123 255.3927765
    124 253.3496399
    125 251.3389282
    126 249.3598785
    127 247.4117584
    128 245.4938354
    129 243.605423
    130 241.7458344
    131 239.9144287
    132 238.1105652
    133 236.3336182
    134 234.5830078
    135 232.8581238
    136 231.158432
    137 229.4833679
    138 227.8324127
    139 226.2050323
    140 224.6007385
    141 223.019043
    142 221.4594727
    143 219.9215698
    144 218.4048615
    145 216.9089355
    146 215.4333649
    147 213.9777374
    148 212.5416412
    149 211.1246948
    150 209.7265167
    151 208.3467407
    152 206.9850006
    153 205.6409454
    154 204.3142242
    155 203.0045166
    156 201.7115021
    157 200.434845
    158 199.1742401
    159 197.9294128
    160 196.7000275
    161 195.4858398
    162 194.2865295
    163 193.1018677
    164 191.9315491
    165 190.7753296
    166 189.6329651
    167 188.5041962
    168 187.3887939
    169 186.286499
    170 185.197113
    171 184.1203766
    172 183.0560913
    173 182.0040588
    174 180.964035
    175 179.9358215
    176 178.9192352
    177 177.9140778
    178 176.9201355
    179 175.9372559
    180 174.9652252
    181 174.0038757
    182 173.0530396
    183 172.1125336
    184 171.1821899
    185 170.2618561
    186 169.3513641
    187 168.4505615
    188 167.5592804
    189 166.6773987
    190 165.8047333
    191 164.9411774
    192 164.0865479
    193 163.2407532
    194 162.4036102
    195 161.5750275
    196 160.7548523
    197 159.9429474
    198 159.1392212
    199 158.3435211
    200 157.5557404
    201 156.7757721
    202 156.003479
    203 155.2387543
    204 154.4814911
    205 153.7315826
    206 152.9889069
    207 152.2533875
    208 151.5249023
    209 150.80336
    210 150.0886536
    211 149.3806915
    212 148.6793671
    213 147.9846039
    214 147.2962952
    215 146.6143799
    216 145.938736
    217 145.2692871
    218 144.605957
    219 143.9486542
    220 143.2973022
    221 142.651825
    222 142.0121307
    223 141.3781433
    224 140.7498016
    225 140.1270142
    226 139.5097198
    227 138.8978271
    228 138.2912903
    229 137.6900177
    230 137.0939636
    231 136.5030365
    232 135.9171906
    233 135.3363495
    234 134.7604523
    235 134.1894226
    236 133.62323
    237 133.0617828
    238 132.5050354
    239 131.9529419
    240 131.4054108
    241 130.8624115
    242 130.3238831
    243 129.7897797
    244 129.260025
    245 128.7345734
    246 128.2133789
    247 127.6963882
    248 127.1835556
    249 126.6748199
    250 126.1701431
    251 125.6694641
    252 125.1727448
    253 124.6799393
    254 124.1910019
    255 123.7058792
    

    Because the audio frequency is the result of dividing the base by the register value, this is a 1/x-shaped, nonlinear function. Also, the granularity is much worse for higher frequencies.
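To put a number on the granularity: with F = CLOCK/(2*(N+1)), the ratio between neighbouring values N and N+1 is (N+2)/(N+1) regardless of the clock, so the step size in cents (100 cents = one semitone) can be computed directly. A small Python sketch, with a made-up function name:

```python
import math


def step_cents(audf):
    """Pitch gap in cents between AUDF values `audf` and `audf + 1`
    on the divided (clock/28 or clock/114) base."""
    return 1200 * math.log2((audf + 2) / (audf + 1))


for audf in (0, 15, 16, 71, 254):
    print(audf, round(step_cents(audf), 2))
```

The step from 0 to 1 is a full octave, steps stay wider than a semitone up to value 15, and only near the bottom of the range (around value 254) do they shrink to a few cents, which is why in-tune notes are only really available at low pitches in 8-bit mode.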

    Of course demosceners have a lot of tricks to do with Pokey. The example results:




    2 Pokeys:

  • hinvhinv Posts: 1,252

    @pik33 said:
    There is no problem with SD access, we have it already in Flexprop, but I dream of reviving the floppy. Maybe not a 5.25", although I have several of them, but 3.5" - I have already printed the case, and using the Adafruit contraption it should be possible to format these 3.5" disks in the Atari style.

    Now that to me is funny, but I get it on the Atari 8bit, especially for those that have original disks.
    My own experience with buying an A500+ is that the floppy was the thing that I really hated about it. Unlike the Atari 8bits and the C64/C128, the Amiga is completely dependent on the floppy at the OS level, so it was SOOOO annoying waiting for the floppy to boot or load games, and the size limitations meant that I had to strip stuff out of a workbench disk in order to get on the programs I wanted to use. I didn't miss floppies at all after that experience, and I still have a lot of physical disks to go through that might just end up in the trash instead.

  • hinv,

    The C64 had the slowest floppy drive of all with the 1541; the joke was that you could make yourself a sandwich while it was loading.

  • hinv Posts: 1,252

    @PurpleGirl said:

    ...

    And it wouldn't hurt to map a simplified POKEY to the commonly used secondary location. Thus stereo or 8-channel mono could be possible. A number of folks have modded theirs to include a 2nd POKEY, and omitted using all the interrupt and device-handling stuff, just so they could extend the sound. Similar usage was done in pinball and arcade games. They didn't need the Atari computer functionality, just the sound, and there were up to 16-channel POKEYs available. With 16-channels, that could provide up to 8 channels at 16-bits.

    The problem with doing all of these upgrades is that it makes the machine unique, which is great for the code that you want to write, but for running other code, it kind of defeats the purpose of emulation in the first place. If you want to build audio and video to your specs resulting in a unique system, you can certainly do this inside the P2 and avoid a lot of the limitations of the day, like the 40-48 pin DIP maximum package size and the clock speed constraints that drove a lot of the 8-bit designs.

  • hinv Posts: 1,252
    edited 2022-08-26 13:02

    @Genetix said:
    hinv,

    The C64 had the slowest floppy drive of all with the 1541; the joke was that you could make yourself a sandwich while it was loading.

    Yeah, I remember. I had a 1571, but I think it was still limited by the 300 baud or so bus interface. I spent literal months of my life waiting for floppy loads. I had friends with cassette drives though, so I was happy to have a floppy.
    The C64/C128, like the A8s, were ready to program 3 seconds after flipping the switch, though. The Amiga, on the other hand, wasn't. I didn't have as much experience with the Atari floppy drives because the 400s and 800s at school simply didn't have them IIRC. We typed in a lot of BASIC programs (which was especially annoying on the 400), but since at that point I didn't have my own computer, I was still excited to do it.

  • @hinv said:

    @PurpleGirl said:

    ...

    And it wouldn't hurt to map a simplified POKEY to the commonly used secondary location. Thus stereo or 8-channel mono could be possible. A number of folks have modded theirs to include a 2nd POKEY, and omitted using all the interrupt and device-handling stuff, just so they could extend the sound. Similar usage was done in pinball and arcade games. They didn't need the Atari computer functionality, just the sound, and there were up to 16-channel POKEYs available. With 16-channels, that could provide up to 8 channels at 16-bits.

    The problem with doing all of these upgrades is that it makes the machine unique, which is great for the code that you want to write, but for running other code, it kind of defeats the purpose of emulation in the first place. If you want to build audio and video to your specs resulting in a unique system, you can certainly do this inside the P2 and avoid a lot of the limitations of the day, like the 40-48 pin DIP maximum package size and the clock speed constraints that drove a lot of the 8-bit designs.

    That is why I suggested a way to turn my proposed feature off and go for default behavior. Those who make the FPGA Ataris usually make their own ROMs and OS too, and go as far as to include a "CMOS" configuration page. That way, they can select the CPU type, the number of Pokeys, how much RAM is available, the available video modes, and the display type, and get nearly any program written for them to work.

    I suggested various performance enhancements that won't affect compatibility, such as adding opcodes in places that no existing programs use. Thus, one need only rewrite the BIOS and maybe the BASIC cartridge to gain performance. The only downside to this one is if you make the ROM routines too good and some games become unplayable due to the performance.

    The 2nd Pokey upgrade is a common and known one, and very unlikely to cause side effects.

  • pik33 Posts: 2,350

    The de facto standard now is 2 Pokeys and at least 256 kB, if not 1 MB, of RAM.

    For adding opcodes to the 6502 while keeping it compatible with the original, several slots with the KIL instruction are available. KIL stops the CPU completely until reset, so it can be replaced with something else (but not HCF :) )
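
    A minimal sketch of how an emulator's decoder could reuse those slots, assuming the commonly documented list of twelve NMOS 6502 KIL (JAM) opcodes. The `Cpu` class, the `EXT_HANDLERS` table and the `mul8` instruction are hypothetical illustrations, not part of any existing P2 core.

```python
# The twelve opcodes that lock up a stock NMOS 6502 (KIL, also called JAM).
KIL_OPCODES = frozenset(
    [0x02, 0x12, 0x22, 0x32, 0x42, 0x52, 0x62, 0x72, 0x92, 0xB2, 0xD2, 0xF2]
)


class Cpu:
    """Tiny stand-in for an emulator's CPU state (hypothetical)."""

    def __init__(self, extensions_enabled: bool = False):
        self.a = 0
        self.x = 0
        self.halted = False
        self.extensions_enabled = extensions_enabled


def mul8(cpu: Cpu) -> None:
    """Hypothetical added opcode: A * X, low byte to A, high byte to X."""
    product = cpu.a * cpu.x
    cpu.a, cpu.x = product & 0xFF, product >> 8


# Hypothetical mapping of one KIL slot to the new instruction.
EXT_HANDLERS = {0x02: mul8}


def execute_kil_slot(cpu: Cpu, opcode: int) -> None:
    """Run an extension if one is enabled, else behave like a real KIL."""
    if opcode not in KIL_OPCODES:
        raise ValueError("not a KIL slot")
    handler = EXT_HANDLERS.get(opcode) if cpu.extensions_enabled else None
    if handler is None:
        cpu.halted = True  # stock behaviour: stop until reset
    else:
        handler(cpu)
```

    With extensions switched off, every slot still halts the CPU, so the default behaviour matches the original chip.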

  • I had to look up HCF as an opcode. LMAO! Yeah. I've seen a few machines like that. "What's wrong with it?" "It's giving off holographic error codes in the form of smoke signals."

    An instruction that makes it lock up isn't useful. At least x86 machines have a conditional halt instruction. That is useful since it is used for syncing with external devices. That is really what FWAIT is: it halts the CPU until the FPU is finished. Leaving it out can cause a race condition, since the main CPU may use the result from the FPU before it is available. And that is a case where assembly can help with performance. I mean, a programmer would know how long things should take and what would be a reasonable safety margin, but a compiler doesn't, and since compilers are made for people with a wide range of skill, they are coded for safety. I mean, if you dispatch FPU ops a little way before you use them, you can allow for true "co-processing." But if you can't structure your code that way and you need to do something with the result right after the FPU returns it, then you must wait on the FPU to finish.

    I do love the fact that the P2 6502 core performs on par with the fastest official 65C02 offered in DIP packaging. It should perform at about 14 MHz using a 320 MHz clock. There is a funny story behind ARM, where they went to Bill Mensch of WDC (Western Design Center, not the hard drive maker) and asked for help making a 32-bit CPU that is in no way related to the 6502 or the '816. He told them to take a hike. But, I wish he and Chuck would have surveyed the user base and use cases of the undocumented opcodes of the 6502 before making the 65C02. That would have saved a lot of headaches.

    A lot of FPGA 6502/65C02 designs use a single-phase clock. Using a multi-phase clock may add to board complexity, but I get why they would do it that way. If you want to avoid pipelines, slot delays, etc., or cheat with the clock rate to get it up, you can use multiple overlapping clocks to give each part of the process enough time. Even the Gigatron does this. Memory writes (off of Clk 2) borrow some of the time of Clock 1 being low. Then its rest period ends after the next cycle of Clock 1 has given enough setup time for things early in the cycle. So what takes close to 190 ns can complete in 160 ns. Of course, for an Atari 800, everything needs to finish in under 558 ns.
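
    The timing figures in this post can be checked with a little arithmetic. This is only a sketch: the NTSC CPU clock of 1,789,773 Hz is an assumption, while the 320 MHz and 14 MHz figures are the ones quoted above.

```python
ATARI_NTSC_CPU_HZ = 1_789_773  # assumed NTSC Atari 800 CPU clock
P2_SYSCLK_HZ = 320_000_000     # P2 clock figure quoted in the thread


def cycle_ns(freq_hz: float) -> float:
    """Length of one clock cycle in nanoseconds."""
    return 1e9 / freq_hz


def p2_clocks_per_cycle(cpu_hz: float, sysclk_hz: float = P2_SYSCLK_HZ) -> float:
    """How many P2 system clocks fit into one emulated CPU cycle."""
    return sysclk_hz / cpu_hz


if __name__ == "__main__":
    print(round(cycle_ns(ATARI_NTSC_CPU_HZ), 1), "ns per Atari CPU cycle")
    print(round(p2_clocks_per_cycle(14_000_000), 1), "P2 clocks per cycle at 14 MHz")
    print(round(p2_clocks_per_cycle(ATARI_NTSC_CPU_HZ), 1), "P2 clocks per Atari cycle")
```

    Under these assumptions, one Atari CPU cycle lasts a little under 559 ns, and a 320 MHz P2 has roughly 179 system clocks to spend on each of them.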

  • I am not familiar with Atari computers but according to Wikipedia, the 800 used a 6502C which is based on the original design.
    The original 6502 is known to have a number of stable illegal opcodes that were sometimes used for games but exploited in demos.
    Commodore continued to use the original 6502, which they owned, on their computers, bugs and all.

    The 65C02 is a CMOS version of the 6502 made by a different company that fixed a number of bugs but also made all illegal opcodes act as NOP, or No Operation, which did nothing but eat up processor timing cycles.

  • @Genetix said:
    I am not familiar with Atari computers but according to Wikipedia, the 800 used a 6502C which is based on the original design.
    The original 6502 is known to have a number of stable illegal opcodes that were sometimes used for games but exploited in demos.
    Commodore continued to use the original 6502, which they owned, on their computers, bugs and all.

    The 65C02 is a CMOS version of the 6502 made by a different company that fixed a number of bugs but also made all illegal opcodes act as NOP, or No Operation, which did nothing but eat up processor timing cycles.

    Actually, they used a modified 6502B called Sally. While they put a C on it, "6502C" and "6502 C" are not the same. That naming issue has led to confusion. The difference from the earlier ones is a Halt line. For a P2 clone of the Atari, we don't need the /Halt line. We can add the /Rdy line to the emulator's code and use that in place of the /Halt line. The A8s used bus-mastering DMA to let the Antic chip work. So the CPU had to be halted and unlatched. Antic took over the RAM and the rest of the bus, and then passed it back and released the bus. In a P2 setup, we have inherent, 8-channel, concurrent DMA (which is why the hub is so slow, though much faster than on the P1). So we don't need to release the bus, as that automatically occurs as we access the hub. So hardware races can't occur, but software races are another issue. You don't want the 6502 to rewrite the display list midway through. One can likely have more freedom of when to pause the CPU, but for best compatibility, it should have similar enough behavior. So this CPU distinction is not important for emulating it on the P2. Emulating this across different microcontrollers would be different, but we should be able to do it all in the P2.

    And yes, Commodore Semiconductor Group bought out MOS. On the C64, actually, they used the 6510. That had 6 port lines, and I think they used those for memory banking or something, not general I/O which came from RAM.

    Actually, Western Design Center (not to be confused with Western Digital Corporation) wasn't so much a "different company." The 6502 inventors left CSG and started it. I guess they couldn't stand Jack Tramiel. Tramiel eventually moved to Atari, and not too many were thrilled. He was the one who started the price wars. His decision to acquire MOS was good for the industry at the time since it kept TI in check regarding component prices (they reluctantly sold to their competitors and started jacking the prices), but what hurt all the smaller computer manufacturers was Tramiel viciously undercutting TI. So an $800 computer went to under $100 overnight. Not only did it hurt TI, it hurt Commodore, Atari, and various others. It didn't harm Apple and IBM as they used a different customer base and pricing model, and schools, businesses, and high-end enthusiasts bought those. So if anything, it helped the giants. They let their competition "kill" each other, let the customers pick through the carnage, and waited it out. Then everyone came to them because nobody was left.

    Then Tandy came around and cloned the IBM PC, almost, with their own multimedia features (though they copied TI on the additional sound capabilities and used their sound chip). IBM tried to sue them, but the courts said no infringement took place. They used their own PCB design and they guessed what was in the ROM without actually ever seeing it. That paved the way for other companies to make clones. And in the midst of that, an obscure company called Acorn RISC Machines (check the initials) popped up with their Acorn computer in the late '80s. It didn't seem to go far from what I saw, but they never went away. My parents tried to get one, but Western Auto never got them in stock as it said in their sales paper. I guess they had their own CPU, and the goal was to produce cheap computers for the underprivileged and those in developing nations. Now, they restructured and changed what their initials stood for, and now they are ARM Holdings.

  • hinv Posts: 1,252

    @Genetix said:
    I am not familiar with Atari computers but according to Wikipedia, the 800 used a 6502C which is based on the original design.
    The original 6502 is known to have a number of stable illegal opcodes that were sometimes used for games but exploited in demos.
    Commodore continued to use the original 6502, which they owned, on their computers, bugs and all.

    The 65C02 is a CMOS version of the 6502 made by a different company that fixed a number of bugs but also made all illegal opcodes act as NOP, or No Operation, which did nothing but eat up processor timing cycles.

    That IS interesting. How many 6502 illegal codes are now just NOP on the 65C02?

    @PurpleGirl said:
    I had to look up HCF as an opcode. LMAO! Yeah. I've seen a few machines like that. "What's wrong with it?" "It's giving off holographic error codes in the form of smoke signals."

    But, I wish he and Chuck would have surveyed the user base and use cases of the undocumented opcodes of the 6502 before making the 65C02. That would have saved a lot of headaches.

    Yeah, that would have been nice since some of the illegal opcodes were used to increase performance. If I was going to roll my own processor, I would probably start with a 6502, depending on how many useless undocumented instructions there are. I would probably strip out the BCD instructions as well to make room for more useful instructions.

  • @hinv said:

    @Genetix said:
    I am not familiar with Atari computers but according to Wikipedia, the 800 used a 6502C which is based on the original design.
    The original 6502 is known to have a number of stable illegal opcodes that were sometimes used for games but exploited in demos.
    Commodore continued to use the original 6502, which they owned, on their computers, bugs and all.

    The 65C02 is a CMOS version of the 6502 made by a different company that fixed a number of bugs but also made all illegal opcodes act as NOP, or No Operation, which did nothing but eat up processor timing cycles.

    That IS interesting. How many 6502 illegal codes are now just NOP on the 65C02?

    There’s more to it than just rendering some as NOP, there’s also some new instructions and some new addressing modes. The 6502 has 151 valid opcodes (105 illegal) while the 65C02 has different numbers of valid opcodes depending on which revision you have, but the latest has 212 valid (44 illegal).
    Despite the manufacturer’s claim, some of those ‘NOP’s do more than eat cycles, but perform extra reads and can block interrupts for themselves and the following instruction (undocumented behaviour.)

    @PurpleGirl said:
    I had to look up HCF as an opcode. LMAO! Yeah. I've seen a few machines like that. "What's wrong with it?" "It's giving off holographic error codes in the form of smoke signals."

    But, I wish he and Chuck would have surveyed the user base and use cases of the undocumented opcodes of the 6502 before making the 65C02. That would have saved a lot of headaches.

    It’s easy to forget that back in 1981, when the development of the 65C02 started, with no public Internet there was no easy way to even identify who the ‘user base’ were, let alone engage with them about the use of opcodes that weren’t even listed in the original datasheet. They would have needed to pay for adverts in every trade and hobby magazine at the time, and then waited several months for any replies, while the market (and competitors) moved ahead. It just would not have been a feasible approach in 1981.
    Conversely, you might wish that no one had used the illegal opcodes on the 6502 (that seems to be Bill Mensch’s view), and that would have saved a lot of headaches.
    Future versions of the processor would only need to comply with the intended operation of the original while extending the capabilities.

    Yeah, that would have been nice since some of the illegal opcodes were used to increase performance. If I was going to roll my own processor, I would probably start with a 6502, depending on how many useless undocumented instructions there are. I would probably strip out the BCD instructions as well to make room for more useful instructions.

    There are no ‘BCD instructions’ just a decimal mode flag and instructions to set and clear it. While that flag bit could be left unused, you don’t get much (if anything) back from omitting BCD.

  • PurpleGirl Posts: 151
    edited 2022-08-31 12:11

    @AJL said:
    There are no ‘BCD instructions’ just a decimal mode flag and instructions to set and clear it. While that flag bit could be left unused, you don’t get much (if anything) back from omitting BCD.

    That is just semantics.

    Knowing a bit about the inside, omitting that mode may actually help you decrease latency and increase the clock rate in a real design. In the original, at least, the BCD mode cuts into the critical path. It also led to design decisions that slowed the regular binary mode. If you need BCD stuff, you'd need a nibble-sized adder and extra microcode to add and subtract 6 in addition to the main adds. (TBH, there is no subtraction in a CPU, just adding the complement and provoking an unused carry.) Anyway, if you break the circuitry out for a nibble adder to be able to use and intercept the carry signal independently before it becomes bit 4 so you can add 6 to that, you insert a ripple carry into an otherwise tree adder, which increases the latency for binary byte operations. And yes, that isn't inherent to being able to run all the opcodes. I mean, there are other ways, in more modern implementations, to account for the carry dilemma. One could use a carry-skip arrangement, use separate ALUs for decimal and binary modes, or just do the decimal mode with more microcode if you are willing to sacrifice cycle accuracy for the decimal mode.
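
    To illustrate the "add 6" correction mentioned above, here is a behavioural sketch of a decimal-mode add on one packed-BCD byte. It assumes valid BCD inputs and only models the carry, not the other flags or the exact NMOS quirks; `adc_decimal` is an illustrative name, not taken from any emulator.

```python
def adc_decimal(a: int, b: int, carry_in: int = 0):
    """Add two packed-BCD bytes; returns (result_byte, carry_out).

    Only meaningful for valid BCD inputs (each nibble 0..9).
    """
    # Low digit: a plain 4-bit add, then skip the six unused codes A..F.
    lo = (a & 0x0F) + (b & 0x0F) + carry_in
    half_carry = 0
    if lo > 9:
        lo += 6
        half_carry = 1

    # High digit: same again, fed by the carry out of the low digit.
    hi = (a >> 4) + (b >> 4) + half_carry
    carry_out = 0
    if hi > 9:
        hi += 6
        carry_out = 1

    return ((hi & 0x0F) << 4) | (lo & 0x0F), carry_out


if __name__ == "__main__":
    result, carry = adc_decimal(0x19, 0x28)
    print(hex(result), carry)
```

    The high digit cannot be finished until the low digit's corrected carry is known, which is the dependency between the two nibbles that the post describes as getting in the way of a fast binary adder.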

    In saying the above, I am not criticizing. It is easy to see what could have been done differently after the fact. Die space was at a premium, so if you could have one ALU do everything, that was a plus, since you wouldn't be forced to cut corners elsewhere just to make it all fit. That is one reason why "clones" of CPUs often worked faster. Second manufacturers play catchup, so enough time would have elapsed to where a smaller node would have been available. And with the latest runs of the 65C02, there is no telling how many of the refinements I alluded to above were actually used in the 14 MHz variety.

    This reminds me of the history of the V20 chip. That is an 8088 replacement that added the instructions that were added to the 186 instruction set, added a hardware multiplier, added a second ALU, and gave it an 8080 mode. That likely couldn't have been done a couple of years earlier since the die space didn't exist, though arguably, they could have omitted "mini mode" and 8080 compatibility mode. The 8088/8086 had 2 main hardware configurations. They had a maxi mode which is what the majority of the compatible computers of the day used. But you could also use the mini mode which provided simple versions of board components in the CPU to where you could use it more like a microcontroller. That was intended for embedded applications, not home use. So NEC could have left that out. However, they had the die space. Had they left that mode out, it likely wouldn't have been used so widely by the telephone industry. The performance decisions made sense to me. Adding an ALU to the BIU (Bus Interface Unit) meant that the BIU wouldn't need to use the poor communication stream to the ALU and hold up execution to calculate addresses. It could just do all that on its own. The hardware multiplier made multiplication faster since the CPU no longer had to use microcode addition loops to multiply. The 286 used those refinements too and demuxed the address and data lines, which was another bottleneck. And of course, the 286 had more in common with the V30 CPU, but with a few more instructions and completely separate address and data lines.

    As for my comments about WDC's opcode map, what you said makes sense. It could be argued that programmers shouldn't have used those opcodes, and yes, trying to track down the coders and asking them may have been difficult to do. That reminds me of how MOS first introduced the 6502. They attended a trade convention event that didn't allow direct sales from vendors at the event and invited any interested parties to a hotel suite. They put all the chips in a jar, including the duds, and sold the engineering samples for maybe $50 each. While those getting bad chips likely weren't the most thrilled, I doubt they complained much, knowing that they were holding a piece of history.

  • @hinv said:

    That IS interesting. How many 6502 illegal codes are now just NOP on the 65C02?

    Another member said 44, and it was interesting what the Kim Klone (not to be confused with a Kim-1 clone) board design did. The board's designer intercepted at least 42 of those and made a TTL wrap-around coprocessor to use them to add more instructions, particularly those using a 3rd index register. So he gave it '816 features without using an '816 or any sort of memory management unit. So he could run up to 16MB of memory and use his other instructions to nearly double the performance in a Forth interpreter. For a few of his added opcodes, he worked with the CPU by intercepting one of these NOPs, doing whatever, and then telling the 65C02 to do something else to complete his instructions. Like if he wanted to do a PUSH Z:Y:X, his coprocessor would set the upper byte using his added "Z" register (or whatever name he called that new register) and then tell the CPU to PUSH Y:X. I could be wrong on the specifics, but you get the point. The CPU didn't know about the uppermost register and didn't have to if his circuitry handled that.

    Yeah, that would have been nice since some of the illegal opcodes were used to increase performance. If I was going to roll my own processor, I would probably start with a 6502, depending on how many useless undocumented instructions there are. I would probably strip out the BCD instructions as well to make room for more useful instructions.

    Well, there were no explicit instructions for those, just a decimal mode. So you wouldn't save any opcode space. And besides, unless you had a division with remainder instruction, you would make handling things like binary <--> ASCII string conversion harder. For BCD to ASCII just take the nibble and add (or OR) 48, and build a string of numbers with that. With binary, you'd do similar, though you'd need division and a remainder to split out the digits. That in itself is harder to do, but if you have no division, it is even harder since it will likely be hell with all the nested loops. So actually, the decimal mode can be easier for converting to displayable numbers. And of course, if you are designing your own game (such as for a 2600 console), who says you even need to follow ASCII conventions? If you don't need the control codes, just make binary 0-9 map to the digits your game displays. So with BCD, you might not need to do any conversion at all, depending on your program.
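
    A small sketch of the two conversions compared above. The function names are illustrative only; the BCD path just ORs 48 (0x30) into each nibble, while the binary path needs division with remainder.

```python
def bcd_byte_to_ascii(value: int) -> str:
    """Two packed-BCD digits to text: OR each nibble with 0x30 (48)."""
    return chr((value >> 4) | 0x30) + chr((value & 0x0F) | 0x30)


def binary_byte_to_ascii(value: int) -> str:
    """Binary byte to decimal text using repeated division by 10."""
    digits = []
    while True:
        value, remainder = divmod(value, 10)
        digits.append(chr(remainder + 0x30))
        if value == 0:
            break
    # Digits come out least significant first, so reverse them.
    return "".join(reversed(digits))


if __name__ == "__main__":
    print(bcd_byte_to_ascii(0x42), binary_byte_to_ascii(42))
```

    On a 6502 there is no `divmod`, so the binary version turns into subtract-and-count loops, which is the extra work the post is pointing at.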

  • @PurpleGirl said:

    @AJL said:
    There are no ‘BCD instructions’ just a decimal mode flag and instructions to set and clear it. While that flag bit could be left unused, you don’t get much (if anything) back from omitting BCD.

    That is just semantics.

    Actually, it isn’t.

    Knowing a bit about the inside, omitting that mode may actually help you decrease latency and increase the clock rate in a real design. In the original, at least, the BCD mode cuts into the critical path. It also led to design decisions that slowed the regular binary mode. If you need BCD stuff, you'd need a nibble-sized adder and extra microcode to add and subtract 6 in addition to the main adds. (TBH, there is no subtraction in a CPU, just adding the complement and provoking an unused carry.) Anyway, if you break the circuitry out for a nibble adder to be able to use and intercept the carry signal independently before it becomes bit 4 so you can add 6 to that, you insert a ripple carry into an otherwise tree adder, which increases the latency for binary byte operations. And yes, that isn't inherent to being able to run all the opcodes. I mean, there are other ways, in more modern implementations, to account for the carry dilemma. One could use a carry-skip arrangement, use separate ALUs for decimal and binary modes, or just do the decimal mode with more microcode if you are willing to sacrifice cycle accuracy for the decimal mode.

    Knowing a bit more about the internals of the NMOS 6502, removing decimal mode saves approximately 2 inverters and 10 XOR gates in the 8 bit ALU.

    In saying the above, I am not criticizing. It is easy to see what could have been done differently after the fact. Die space was at a premium, so if you could have one ALU do everything, that was a plus, since you wouldn't be forced to cut corners elsewhere just to make it all fit.

    You should also consider that the 6502 was taped out by hand, so every transistor added was more work (and therefore cost).

    As for my comments about WDC's opcode map, what you said makes sense. It could be argued that programmers shouldn't have used those opcodes, and yes, trying to track down the coders and asking them may have been difficult to do. That reminds me of how MOS first introduced the 6502. They attended a trade convention event that didn't allow direct sales from vendors at the event and invited any interested parties to a hotel suite. They put all the chips in a jar, including the duds, and sold the engineering samples for maybe $50 each. While those getting bad chips likely weren't the most thrilled, I doubt they complained much, knowing that they were holding a piece of history.

    At the time it was something they could get cheaply. Getting two for $100 was still cheaper than any of the other options available. The perspective of ‘A piece of history’ generally comes significantly later after it has been successful. How many of the purchasers of Apple I computers would have taken up the trade-in option for the Apple ][ if they thought they were holding ‘a piece of history’?

  • PurpleGirl Posts: 151
    edited 2022-08-31 12:02

    I really have nothing to add to the previous reply. I still think things are mostly semantics in that decimal mode turns the other opcodes into BCD ops. And you are right. I made the distinction in my other reply. One wouldn't be saving opcode space, only maybe 6+ ns or whatever in the ALU. And unless you redesign the rest of it, you won't realize any gains. For instance, prefetching the microcode for the next instruction would gain you significantly more performance, making the ALU the next bottleneck. In Drass's 20 MHz TTL design, he prefetched the microcode for the next instruction. And yes, that created a few problems such as handling bootstrapping, but thankfully, the 6502 is orthogonal enough to where you can guess the first microcode instruction for most instructions, and the exceptions are not hard to manage. Doing just that got him to about 17 MHz or whatever, and then he got the decimal mode out of the critical path and used a carry-skip arrangement with the adders (adding another nibble adder and a multiplexer).

    With Drass's pending 100 MHz design, he has had to rethink things even more. The decimal mode instructions are no longer cycle-accurate. He figured adding a cycle to those was better than reducing the clock rate. He also had to make his own adders and incrementers using transparent latches. So being able to add an 8-bit number in 6.4 ns is pretty impressive for off-the-shelf components. (He could have made that even faster by being less elegant and using the "dead bug" approach to mount faster chips with a worse fanout and doubling up; mounting that would have been harder.) Of course, since that is synchronous, you'd need to set it up a cycle before you need it. Going that fast, you pretty much need to divide the AU and the LU instead of using an ALU. And of course, you must keep signal integrity in mind.

    So I guess we should rein in the topic some. We have a 6502 core now, and while it might need some refinements such as adding a few of the illegal opcodes and working out how to add the equivalent of the Ready line, it is mostly ready to go. In our usage, we won't need the /Halt line since the P2 gives us concurrent DMA already. However, an equivalent of the Ready line should be done so Antic can throttle the CPU when needed, and in case anyone wants to add external memory. The reason for a Ready line on a 6502 is to give external memory (including ROM) enough time to work. It can be used to pause the CPU for external devices when unlatching the bus is not required. So we can simulate the effects of bus-mastering without actually needing it.

    Then we'd need Antic, GTIA, and Pokey. Someone mentioned putting Antic and GTIA in the same cog. That might be doable. Atari had considered putting them in the same chip, but their CEO put the brakes on that project. I guess he saw the writing on the wall and knew the platform would not be tenable for too much longer. Now, someone mentioned using a framebuffer. While you can do it that way, I believe part of the purpose of the original design was to avoid doing that. (Remember how costly memory was back then.) Antic would render the display list and, I assume, work it on the fly, 1 scanline at a time. Then GTIA would set the correct mode and colors and be cycle exact for the display to work. So you don't need a framebuffer if you can render a display list in perfect timing with the display. I'd want a VGA version of the GTIA in any design I might build, so even if we need 12.5 MHz in the final video portion to get this to work, it should work, even if we must force the CPU to work around 1.79 MHz, with maybe some mild variance to that. For a P2 implementation, we might need to render lines ahead of time and cache them when possible to make up for the hub variability and the variations in complexity of the rendering. So if it encounters 3 very easy lines to render and then finds a harder one that takes more than a line's worth of time, there would be lines already ready to be used, thus borrowing from time already saved. So our Antic can be made to work during porches and skipped lines and can stop rendering when its cache is completely saturated. So the frame buffer idea can be omitted if everything can render in real-time. So to be honest, even if our Antic works much faster than the original, it wouldn't really matter. In the implementation that I propose, it would be gated by how full its line buffers would be and by the GTIA core. The GTIA would send interrupts back to the CPU during V-sync as expected.
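
    The render-ahead idea can be sketched as a bounded line cache between a renderer and a display side. This is a toy timing model under stated assumptions, not a P2 design: time is counted in made-up ticks, `TICKS_PER_LINE`, `count_underruns` and the cost figures are all hypothetical.

```python
from collections import deque

TICKS_PER_LINE = 100  # hypothetical time units in one scanline period


def count_underruns(line_costs, buffer_lines, lead_periods):
    """Count display slots where no rendered line was waiting.

    line_costs: render cost of each line, in ticks.
    buffer_lines: capacity of the line cache.
    lead_periods: scanline periods the renderer gets before display starts
                  (standing in for porch or blank time).
    """
    queue = deque()
    next_line = 0
    remaining = line_costs[0] if line_costs else 0
    shown = 0
    underruns = 0
    period = 0
    while shown < len(line_costs):
        budget = TICKS_PER_LINE
        # Render ahead while there is time left and room in the cache.
        while budget > 0 and next_line < len(line_costs) and len(queue) < buffer_lines:
            used = min(budget, remaining)
            budget -= used
            remaining -= used
            if remaining == 0:
                queue.append(next_line)
                next_line += 1
                if next_line < len(line_costs):
                    remaining = line_costs[next_line]
        period += 1
        # The display side takes one line per period once it has started.
        if period > lead_periods:
            if queue:
                queue.popleft()
            else:
                underruns += 1
            shown += 1
    return underruns


if __name__ == "__main__":
    costs = [50, 50, 50, 250, 50, 50]  # three easy lines, one hard, two easy
    print(count_underruns(costs, buffer_lines=4, lead_periods=2))
    print(count_underruns(costs, buffer_lines=1, lead_periods=0))
```

    In this model the renderer stalls whenever the cache is full, so running it faster than the original Antic changes nothing on screen; the cache depth and head start only decide whether a slow line can be absorbed.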

    Regarding Pokey, it might be possible that we'd want to split the sound abilities from its port and keyboard handling. I guess it depends on how one wants/needs to do things, and what is available in terms of peripherals we can use, and whatever our goals are. For instance, we could have a 2nd Pokey core mapped about 32 bytes in memory above the main one. That is the standard for the upgrade boards and the demos that use them. And we might want to use a PS/2 or USB keyboard and more readily obtainable game devices.

    There is also the PIA chip, and I am sure we could code something compatible based on what we want it to drive. Others here see no use to include an SIO port. I have things I'd plug into one. And of course, some newer Ataris actually included a parallel port in the back. Even the 800 was originally designed with that as an option, but it was never included. You can tell by how the PCB sticks out at the back end, with perhaps no actual traces there. So that was going to be a card-edge connector. There was also a limited developer edition of the Atari 800, with a ribbon connector on the system card and an additional PCB on the left side, with the case redesigned to bring out the extra ports and connectors. So we can iron out how far we take things as we go. It's a rather forgiving architecture.

  • PurpleGirl Posts: 151
    edited 2022-12-04 06:53

    While my idea of adding opcodes to the 6502 is not one that would likely be used, if one were to, there's something to keep in mind. FP values on the Atari are stored as BCD. So any added math coprocessor opcodes would need to output BCD. That is the advantage of the Veronica BASIC cartridge. It got its speed from not only having a faster '816 on the cartridge (clocked at maybe 14 MHz), but it also used better graphics and floating point routines. So it is possible that it used binary mode for everything, but I don't know. I don't know how Veronica worked. There would need to be host and client code, and the Veronica cartridge would need to act like a ROM on the Atari side. The Atari would handle mostly the I/O, while the Veronica would run the interpreter and the programs, passing I/O tasks onto the Atari.

    I've seen benchmarks of a stock Atari running a program to draw a sphere in 2 minutes. The Turbo BASIC (interpreter) cartridge ran it in 55 seconds. The Veronica cartridge did it in about 28 seconds or so. And the Rapidus mod with Turbo BASIC did it in 3 seconds. I don't know what Rapidus and Veronica together would do, as I've never seen anyone benchmark that. But I'd imagine it would be faster than 28 seconds, since the bottleneck through the cartridge slot would likely be reduced, meaning less time spent in any spinlocks used for domain crossing. I don't know about the Veronica internals. It would obviously have a ROM and a CPU, but I don't know whether it had RAM, latches, or what. I'd imagine latches would help with any spinlocks used in the code, so that data from the faster CPU persists on the bus.

  • @Genetix said:
    hinv,

    The C64 had the slowest floppy drive of all with the 1541; the joke was that you could make yourself a sandwich while it was loading.

    Yeah, that was due to a bug in the VIA chip used, so they may have ended up bit-banging the serial transfer to get around it. And some people modded things to use parallel access to speed up the I/O.

    That reminds me of the original IBM computer. You could supposedly go pee, light a cigarette, and maybe pour yourself some coffee while it counted up to 640K of memory. For a while, the clone makers managed to do better than that, but nowadays it isn't much different, since the PC now does much more on boot. For instance, you have another computer inside that also has to boot (the Intel Management Engine), and the CPU won't boot without it. And while memory is so much faster and is assumed reliable enough not to be tested, there are all these other behind-the-scenes jobs to be done. Then, on top of that, the operating system takes quite a long time to come up and be ready for use. Microsoft made some services delayed-start, so you get to a screen faster, but you can't do anything yet. MS-DOS was simpler. The memory would count up, the floppy drive would strobe, and you'd have the C: prompt.

  • pik33 Posts: 2,350

    The C64 had the slowest floppy drive of all with the 1541

    However, a well-known computer genius (the author of the Turbulence demo on the P1) did something about this:

    https://www.linusakesson.net/programming/gcr-decoding/index.php
    https://www.linusakesson.net/software/spindle/v1.php
    https://www.linusakesson.net/software/spindle/v2.php
