Commodore 64 Emulator

hinv · 2007-01-11 21:05

Wow,

I got a "What's a Microcontroller?" kit for CHRISTmas, and the BASIC reminded me so much of my first computer, which was a Commodore 128 (which lived mostly in C64 mode). Then I was thinking that I needed more power, so I started looking into the Propeller chip. Then I got into the Prop2 chip discussion, and I thought: with 128 or 256 KB of memory, you could emulate a whole C64 in there. I was then very pleased to see that you are writing this!

Kudos!

Doug

P.S. I just got my Propeller Demo board today, so I may be able to contribute as well.

Cyberbound · 2007-02-19 15:38

Has there been any progress on this project? Also, why emulate the SID? Why not just put in a socket for one and use the real thing (or have that as an option)?

mahjongg · 2007-02-19 16:36

If this project is still alive, then the author is perhaps interested in my memory expansion system for the Propeller.

see

http://forums.parallax.com/showthread.php?p=629266

Or perhaps the new memory expander board for the XGameStation could also be used.

see

http://forums.parallax.com/showthread.php?p=632823

These SRAM based memory expanders should be just what is needed to create a standalone C64 emulator.

My system uses a real SRAM (up to 256K) with random access and should be -much- faster than any I2C serial memory solution.

I was not aware of this project, but I designed my memory expansion board for just such a retro-computing solution.

It's nice to hear it's possible to emulate a 6510 with the Propeller.

Mahjongg

JT Cook · 2007-02-20 05:19

I think having SID emulation on a Propeller would be pretty sweet. With time, it is getting harder to find SID chips, and the Prop would be powerful enough to reproduce it.

hinv · 2007-10-30 01:20

Well, has anyone heard from Goog on this one? I would really like to see a C64 emulator for the Prop. I would even buy an HX512 board for the memory if it was needed (I really want to find a good excuse to get one).

Thanks,
Doug

Oldbitcollector (Jeff) · 2007-10-30 02:22

Shame too... This thread is the one that introduced me to the Propeller. I think the 64 emulator is VERY doable, but I suspect it would take more than one Prop: SIDprop, VICprop, 6502prop. Heck, the 8x8 character stuff has already been written, and the HSS code is more than capable of producing SID-like sounds.

Oldbitcollector

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Buttons . . . check. Dials . . . check. Switches . . . check. Little colored lights . . . check.

— Calvin, of 'Calvin and Hobbes.

JT Cook · 2007-10-30 06:24

I spoke with Goog a while back (before he disappeared) by email, and the last I heard from him was that he had run out of Prop memory for code. But I think he would really have to take a different approach, because the system was really slow. Judging from the video, it takes 30 seconds to do an operation that takes a second on the real thing. And it didn't emulate video (really it was just text out), audio, or anything else.

Granted, C64 emulation *may* be possible, but it would definitely require something like the 512K card and some guru ninja programming, and even then the best you could hope for is probably 50% functionality.

deSilva · 2007-10-30 06:50

I would not be so pessimistic... From my investigations some months ago, it would take 20 to 40 Propeller machine instructions to emulate one foreign 8-bit machine instruction, i.e. 1 to 2 µs, which will result in around the same speed as those kinds of processors. The emulation of the more complex hardware add-ons (video, sound, ...) would take more, but is most likely not time-critical.

It is space-critical, however, as all that code must reside in the COG... 20 × 250 instructions = 10 COGs worst case...
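
deSilva's figures can be checked with a little arithmetic. A sketch, assuming an 80 MHz clock, 4 clocks per ordinary cog instruction (the slower hub reads/writes are ignored) and 496 usable longs per cog (512 minus the 16 special-purpose registers); the 20 instructions per opcode and 250 opcodes are the numbers from the post.

```python
# Back-of-envelope time and space budget for a PASM 6502 interpreter.
# Assumptions: 80 MHz system clock, 4 clocks per ordinary cog instruction
# (hub reads/writes take longer and are ignored), 496 usable longs per cog.
CLOCK_MHZ = 80
CLOCKS_PER_COG_INSTR = 4
COG_LONGS_USABLE = 496  # 512 longs minus 16 special-purpose registers


def emulated_instr_time_us(cog_instrs: int) -> float:
    """Microseconds needed to run `cog_instrs` ordinary cog instructions."""
    return cog_instrs * CLOCKS_PER_COG_INSTR / CLOCK_MHZ


def cogs_of_code(instrs_per_opcode: int, opcodes: int) -> float:
    """How many cogs' worth of code space the opcode handlers would fill."""
    return instrs_per_opcode * opcodes / COG_LONGS_USABLE


print(emulated_instr_time_us(20), emulated_instr_time_us(40))  # 1.0 2.0
print(round(cogs_of_code(20, 250), 1))                         # 10.1
```

So 20 to 40 instructions per emulated opcode lands at 1 to 2 µs, and 250 handlers of 20 instructions each fill about ten cogs of code space.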

Doing it in SPIN, with its factor of 80, will indeed result in one SPIN minute per Commodore second.

This will be a very handsome test for the C-Compiler!!

hinv · 2007-10-30 08:46

Oldbit, I think that having 2 Props could solve the memory problem with a bit of code stuffing here and there. The HX512 cart would be a whole lot easier, and more available, at least to Hydra owners. I can't imagine using up all of those 32-bit 80 MHz cogs to emulate the chips in a C64 unless they had to be programmed in Spin. I believe that 8 would do just fine if coded in assembly.

deSilva, it shouldn't be that tough to run circles around a 6502 in 1 cog, with paging to and from the hub, and another cog paging between external memory and the hub. One problem I could see, however, is that because the C64 didn't have a real-time clock, a lot of the game timing was done with NOP loops. So when the processor is emulated in assembly instead of Spin, you would have the opposite problem: you would have to go back and add in those clock cycles to get all of the 6502 instructions to run at the same speed as the original. Goog is also using serial FRAMs for memory, which will slow things down a bit, especially if he is using Spin (I hope not) to page to and from them.
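
The cycle bookkeeping described above could look roughly like this: every emulated opcode adds its documented NMOS 6502 cycle count to a running total, and the emulator must spend the matching amount of real time. This sketch costs out a typical delay loop (`LDX #n` / `DEX` / `BNE`), assuming the loop does not cross a page boundary; the 80 host clocks per 6502 cycle assumes an 80 MHz Propeller and a 6502 at roughly 1 MHz.

```python
# Base cycle counts of the NMOS 6502 for the opcodes used below.
BASE_CYCLES = {
    0xA2: 2,  # LDX #imm
    0xCA: 2,  # DEX
    0xD0: 2,  # BNE rel: +1 if the branch is taken, +1 more if it crosses a page
}
HOST_CLOCKS_PER_6502_CYCLE = 80  # assumption: 80 MHz Propeller, ~1 MHz 6502


def branch_cycles(taken: bool, page_crossed: bool = False) -> int:
    extra = (1 + (1 if page_crossed else 0)) if taken else 0
    return BASE_CYCLES[0xD0] + extra


def delay_loop_cycles(n: int) -> int:
    """Cycles for: LDX #n / loop: DEX / BNE loop, with 1 <= n <= 255."""
    if not 1 <= n <= 255:
        raise ValueError("n must be 1..255")
    total = BASE_CYCLES[0xA2]
    for i in range(n):
        total += BASE_CYCLES[0xCA]
        total += branch_cycles(taken=(i < n - 1))  # the last BNE falls through
    return total


def host_clocks(cycles: int) -> int:
    """Host clocks to spend so the code takes as long as on the real machine."""
    return cycles * HOST_CLOCKS_PER_6502_CYCLE


print(delay_loop_cycles(10), host_clocks(delay_loop_cycles(10)))  # 51 4080
```

An emulator that finishes the loop in fewer host clocks has to pad the difference, otherwise a NOP-timed game runs too fast.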

I think the big problem is that, with the intention of not having too many versions out there, we have 0, because Goog seems to be gone and has not given it OUT even though it appears that many from this forum have given it IN.

Just my 10 bits worth,

Doug

Ale · 2007-10-30 09:44

The C64 may not have had a real-time clock, but it had a 10 ms (or so) interrupt! That is something.

Maybe we should decide on some ideas and, *sadly*, start over. A Spin version does not help much, especially because to get some speed you have to use assembler anyway.
It must be noted that the C64 had 64 KB of RAM (of which the VIC-II video chip sees a 16 KB bank at a time), plus all the ROMs: character ROM, KERNAL, BASIC, etc. The memory management inside the C64 is quite complex.
At full speed, I think a 100 MHz 486 was able to emulate a C64 with sound and video, but it needed loads of RAM. (I think it was Miha Peternel's emulator, if I remember his name correctly.)
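
To give an idea of that memory management: what the CPU sees at $A000-$BFFF, $D000-$DFFF and $E000-$FFFF is selected by the three low bits of the 6510's on-chip processor port at $0001 (LORAM, HIRAM, CHAREN). A sketch covering reads only, with no cartridge plugged in (the GAME/EXROM lines are ignored) and without modelling the port registers at $0000/$0001 themselves; writes into a ROM area land in the RAM underneath, which is not shown.

```python
# What the CPU reads in the banked areas, selected by bits 0-2 of the
# 6510 processor port at $0001. No cartridge (GAME/EXROM) is modelled.
LORAM, HIRAM, CHAREN = 0x01, 0x02, 0x04


def visible_at(addr: int, port: int) -> str:
    """Return which device answers a CPU read of `addr` for port value `port`."""
    lo, hi, ch = port & LORAM, port & HIRAM, port & CHAREN
    if 0xA000 <= addr <= 0xBFFF:
        return "BASIC ROM" if (lo and hi) else "RAM"
    if 0xD000 <= addr <= 0xDFFF:
        if not (lo or hi):
            return "RAM"  # both ROM select bits off: RAM here too
        return "I/O" if ch else "CHAR ROM"
    if 0xE000 <= addr <= 0xFFFF:
        return "KERNAL ROM" if hi else "RAM"
    return "RAM"


print(visible_at(0xA000, 0x37), visible_at(0xD020, 0x37))  # BASIC ROM I/O
```

With $37 (all three bits set) BASIC, I/O and KERNAL are all visible; clearing bits swaps RAM or the character ROM in underneath.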

We have much more raw-power than a 486, but we do not have the memory flexibility :-(

Maybe a good start would be to map the video memory into HUB memory, and all extra memory to external devices. Then a couple of COGs can take care of video generation and sprites, one or two of audio, and one of instruction execution.

The 6502 has nothing fancy, just some somewhat complicated addressing modes and some BCD and binary arithmetic... and that's about it.
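
As an example of that BCD and binary arithmetic, here is a sketch of the 6502 ADC instruction: result, carry and overflow in binary mode, and result plus carry in decimal mode for valid BCD inputs. How the NMOS 6502 sets N, V and Z in decimal mode is a known quirk that is deliberately left out (overflow is returned as None there).

```python
def adc(a: int, operand: int, carry_in: int, decimal: bool = False):
    """Model the 6502 ADC instruction on 8-bit values.

    Returns (result, carry_out, overflow). In decimal mode only valid BCD
    inputs are handled and overflow is returned as None (not modelled).
    """
    if not decimal:
        total = a + operand + carry_in
        result = total & 0xFF
        # V is set when both inputs share a sign that differs from the result's.
        overflow = ((a ^ result) & (operand ^ result) & 0x80) != 0
        return result, total > 0xFF, overflow

    # Decimal mode: add each BCD digit, correcting by 6 when it exceeds 9.
    lo = (a & 0x0F) + (operand & 0x0F) + carry_in
    if lo > 9:
        lo += 6
    hi = (a >> 4) + (operand >> 4) + (1 if lo > 0x0F else 0)
    if hi > 9:
        hi += 6
    result = ((hi << 4) | (lo & 0x0F)) & 0xFF
    return result, hi > 0x0F, None


print(adc(0x50, 0x50, 0))                # (160, False, True): signed overflow
print(adc(0x99, 0x01, 0, decimal=True))  # (0, True, None): BCD 99 + 01 = 00, carry set
```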

Just a small thought. Ahhh... before I forget: no closed source!
Another nice project and I have no time :-(

CodeGuy · 2007-10-30 13:57

I took a quick look at emulating a Commodore/MOS KIM-1 (a VERY early ancestor of the C-64) with the Propeller and decided that I needed to wait for the C compiler. There is not enough space in a single cog to do a cycle-accurate 6502 emulator, even if memory access is handled by another cog. I thought about perhaps writing a small runtime state machine in assembly and emulating the processor with an optimized "microcode", but I think the scope of this is large enough that I would not be able to get it done in a reasonable amount of time. (Read: I can get it done faster by waiting for the C compiler, then writing the code in C.)

My thought was then to implement an MMU in a second cog, so that one could implement several different types of memory access (standard RAM tied to I/O ports, the HYDRA expansion card, a small piece of hub RAM, etc.).
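
A sketch of that MMU idea: the CPU core only calls `read()` and `write()`, and a table of address ranges decides which backend answers. The `MMU` class and the backends in the usage lines are hypothetical stand-ins for hub RAM, an expansion card or I/O; on the Propeller this dispatch would run in its own cog rather than in a Python object.

```python
from typing import Callable, List, Tuple

ReadFn = Callable[[int], int]
WriteFn = Callable[[int, int], None]


class MMU:
    """Hypothetical address-range dispatcher standing in for a memory cog."""

    def __init__(self) -> None:
        self._regions: List[Tuple[int, int, ReadFn, WriteFn]] = []

    def map(self, start: int, end: int, read: ReadFn, write: WriteFn) -> None:
        """Attach a backend to start..end (inclusive); it receives offsets from start."""
        self._regions.append((start, end, read, write))

    def _find(self, addr: int) -> Tuple[int, ReadFn, WriteFn]:
        for start, end, read, write in self._regions:
            if start <= addr <= end:
                return addr - start, read, write
        raise KeyError(f"unmapped address ${addr:04X}")

    def read(self, addr: int) -> int:
        offset, read, _ = self._find(addr & 0xFFFF)
        return read(offset) & 0xFF

    def write(self, addr: int, value: int) -> None:
        offset, _, write = self._find(addr & 0xFFFF)
        write(offset, value & 0xFF)


ram = bytearray(0x8000)  # e.g. a block of hub RAM
io_writes = []           # stand-in for a memory-mapped I/O device
mmu = MMU()
mmu.map(0x0000, 0x7FFF, ram.__getitem__, ram.__setitem__)
mmu.map(0xD000, 0xDFFF, lambda off: 0, lambda off, v: io_writes.append((off, v)))
mmu.write(0x1234, 0x42)
mmu.write(0xD020, 0x06)
print(mmu.read(0x1234), io_writes)  # 66 [(32, 6)]
```

Swapping the hub-RAM backend for an external-memory one would not touch the CPU core at all, which is the point of putting the MMU in its own cog.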

Once the KIM emulator was done, it could then be a platform to implement other 6502-based emulators (or perhaps a board that could function as an in-circuit 6502 debug tool for repairing/debugging 6502 hardware; OK, so I may be dreaming here).

So many possibilities, so little time. Too bad I have to work for a living.

Dan

JT Cook · 2007-10-30 14:49

If you are going to start anywhere, you need to start at the 6502 core. You don't really even need extra memory to test the 6502 core, but it would help. If you don't have access to extra memory, you can always use a serial connection and make a host app to simulate the 64K + ROMs of memory. Also, I don't think the C compiler is going to be faster than Spin. As for Goog choosing not to make his program open source, that is his choice, and I at least respect that. I think he didn't want someone to steal his code and claim it as their own. His problem was that he just ran out of Prop memory to continue coding, and in the end that is going to be your biggest enemy: the 32K limit.
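
A sketch of the host-app side of that serial idea. The request format is invented for this example (it is not an existing protocol): the Propeller sends `R hi lo` to read a byte or `W hi lo value` to write one, and the host answers from a 64K image into which the ROMs would be loaded. In a real host app this function would sit in a loop around a serial port (for example via the third-party pyserial package); that wiring is left out.

```python
# Hypothetical request format (invented for this sketch, not an existing protocol):
#   b"R" hi lo        -> reply: the single byte stored at address hi:lo
#   b"W" hi lo value  -> reply: b"K" once the write has been handled
ROM_RANGES = [(0xA000, 0xBFFF), (0xE000, 0xFFFF)]  # assumed BASIC and KERNAL areas


def handle_request(request: bytes, memory: bytearray) -> bytes:
    """Serve one memory request from the Propeller against a 64K host-side image."""
    if len(request) < 3:
        raise ValueError("request too short")
    cmd, addr = request[:1], (request[1] << 8) | request[2]
    if cmd == b"R" and len(request) == 3:
        return bytes([memory[addr]])
    if cmd == b"W" and len(request) == 4:
        # Simplification: writes to ROM areas are dropped; a real C64 would
        # store them in the RAM underneath the ROM.
        if not any(lo <= addr <= hi for lo, hi in ROM_RANGES):
            memory[addr] = request[3]
        return b"K"
    raise ValueError(f"malformed request {request!r}")


image = bytearray(0x10000)  # ROM images would be loaded into this
handle_request(b"W\x12\x34\x99", image)
print(handle_request(b"R\x12\x34", image))  # b'\x99'
```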

Another problem is timing. C64 games are so timing-sensitive that it would be difficult on the Prop to match the timing of the different chips, interrupts, and clock cycles of the 6502. You can use different COGs to emulate different chips, but getting everything timed exactly the way it was on the C64 will be too hard.

And to get to the point that Goog did, you just need to emulate the 6502, have enough memory, have access to the ROMs, and have a way to display the character output. I started working on a C64 emulator (for the PC) at about the same time that he did, because I didn't think it could be done on the Prop and I just wanted to see what was involved (and had some extra free time then). After writing a C64 emulator for the PC, I believe that it may be possible, but the only way that is going to happen is if someone writes a 6502 core in a single COG, or maybe 2 COGs, and even then you would be lucky if it ran at full speed and reached 10% compatibility.

hippy · 2007-10-30 16:48

JT Cook said...
I don't think the C compiler is going to be faster than SPIN.

I think it will be. The C compiles to Cog Assembler code, which I recall gets paged in as required. It won't always be as fast as Assembler code that loads and stays in the Cog until power-off, but it shouldn't be that much slower if there's not too much paging going on. The other option, the Large Memory Model, loads an instruction or so at a time and then executes it.

Whether either solves a problem is a different matter, but both should be faster than Spin.

CodeGuy · 2007-10-30 17:37

If anyone is interested, here is a 6502 core that I threw together to test the feasibility of a 6502 emulator (SERIOUSLY, THREW TOGETHER; this is a real hack-o-rama). There are much better ways to do many of the things in this code, but if it will help anyone, feel free...

I cannot even promise that it will run in its current state.

deSilva · 2007-10-30 19:03

hinv said...
deSilva, it shouldn't be that tough to run circles around a 6502 in 1 cog

Do you have any evidence for your opinion, e.g. an example of how to emulate one 6502 instruction, including instruction fetching?

Ale · 2007-10-31 08:15

I just had the following thought:

Maybe we are approaching the problem from the wrong angle...

* We want to emulate the 6502 using a traditional model. But the Propeller is exactly the opposite!

Several cogs should act together to emulate the CPU:

Example:

COG 0: Memory fetch unit
COG 1: Instruction decoder
COG 2: ALU

As we have very little memory for all these tasks, combining different COGs we *may* solve the problem.

At Parallax they developed Spin, a sort of machine language... and how much throughput did they get? 80k instructions/s! And they had full control over the architecture and everything, which means it could be quite difficult to get past them with just 1 COG.

No idea if such an approach can reach the needed speed, but could be a way.

Regards

deSilva · 2007-10-31 09:14

A very good idea, but the wrong time scale.
When parallelizing things you need a certain "chunk size" to compensate for the communication overhead.
Microprocessor instructions are not of that kind... Floating point and SIMD would be different...
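
The chunk-size argument can be put into a tiny model. Assume, for illustration only, that one emulated instruction needs `work` time units of real computation, that splitting it across cogs running one after another does not shrink that work, and that each handoff through hub RAM costs `handoff`. Because each 6502 instruction depends on the registers and PC left by the previous one, the model ignores any overlap between consecutive instructions. The numbers below are placeholders, not measurements.

```python
def split_latency(work: float, stages: int, handoff: float) -> float:
    """Time for one emulated instruction split over `stages` sequential cogs."""
    return work + (stages - 1) * handoff


def overhead_share(work: float, stages: int, handoff: float) -> float:
    """Fraction of the total time spent on handoffs rather than real work."""
    total = split_latency(work, stages, handoff)
    return (total - work) / total


print(overhead_share(1.0, 3, 0.3))    # about 0.375: a short task is dominated by handoffs
print(overhead_share(100.0, 3, 0.3))  # about 0.006: a big chunk amortizes them
```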

hinv · 2007-10-31 10:33

deSilva,

No, I don't, but sometimes it is those who don't know about the limitations who completely blow them away. I will stick with my opinion because it cannot be disproved.
I have done some 6502 assembly, and even hand-coded machine code before. It seems to me that, with an average of around 4-5 clocks per 32-bit instruction on the cog (at 80 MHz) and 4-5 clocks per 8-bit instruction on the 6502 (at 1 MHz), I should be able to do the same thing with 80 cog instructions that I can do with one 8-bit instruction on the 6502.

There are also only 6 registers on the 6502, and 512 32-bit "registers" in a cog:
PC .... program counter (16 bit)
AC .... accumulator (8 bit)
X .... X register (8 bit)
Y .... Y register (8 bit)
SR .... status register [NV-BDIZC] (8 bit)
SP .... stack pointer (8 bit)
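
That register set is small enough to sit in a handful of cog longs. A sketch (in Python, not PASM) keeping the status register packed in the NV-BDIZC layout, with helpers for the N/Z update that almost every instruction needs. Treating B as an ordinary bit is a simplification: on a real 6502 it only shows up in the copy of SR pushed by BRK/PHP.

```python
from dataclasses import dataclass

# Bit positions in the status register, laid out as NV-BDIZC (bit 7 .. bit 0).
FLAG_BITS = {"N": 7, "V": 6, "B": 4, "D": 3, "I": 2, "Z": 1, "C": 0}


@dataclass
class Regs6502:
    pc: int = 0x0000  # 16-bit program counter
    ac: int = 0x00    # 8-bit accumulator
    x: int = 0x00     # 8-bit index register X
    y: int = 0x00     # 8-bit index register Y
    sr: int = 0x20    # status register; bit 5 is unused and kept at 1
    sp: int = 0xFF    # 8-bit stack pointer into page $01 (initial value arbitrary)

    def flag(self, name: str) -> bool:
        return bool((self.sr >> FLAG_BITS[name]) & 1)

    def set_flag(self, name: str, on: bool) -> None:
        mask = 1 << FLAG_BITS[name]
        self.sr = (self.sr | mask) if on else (self.sr & ~mask & 0xFF)

    def set_nz(self, value: int) -> None:
        """Update N and Z from an 8-bit result, as loads and ALU ops do."""
        self.set_flag("N", (value & 0x80) != 0)
        self.set_flag("Z", (value & 0xFF) == 0)


regs = Regs6502()
regs.set_nz(0x80)
print(regs.flag("N"), regs.flag("Z"))  # True False
```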

With a possible 256 opcodes (including undocumented ones), that would make a jump table of 64 longs.
Does anybody know exactly how many opcodes the 6510 had? Does anybody know exactly what happened when an illegal instruction was 'executed'?
Yes, I believe it is doable in 1 cog to emulate the 6502 and be faster than the original. I would, however, reserve some space in the hub for more than just main memory.
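
One way to read the 64-long figure is four 8-bit handler entries packed into each 32-bit long (256 opcodes / 4 = 64 longs). A sketch of the packing and the lookup; note that an 8-bit entry can only name 256 of a cog's 512 longs, so the handlers would have to start in one half of cog RAM or be stored as offsets from a base address (an assumption, not something stated in the thread).

```python
def pack_jump_table(entries: list[int]) -> list[int]:
    """Pack 256 8-bit handler entries into 64 32-bit longs, four per long."""
    if len(entries) != 256 or any(not 0 <= e <= 0xFF for e in entries):
        raise ValueError("need 256 entries in 0..255")
    longs = []
    for i in range(0, 256, 4):
        longs.append(entries[i]
                     | (entries[i + 1] << 8)
                     | (entries[i + 2] << 16)
                     | (entries[i + 3] << 24))
    return longs


def lookup(longs: list[int], opcode: int) -> int:
    """Handler entry for an opcode: long index = opcode / 4, byte = opcode % 4."""
    return (longs[opcode >> 2] >> ((opcode & 3) * 8)) & 0xFF
```

In PASM the lookup becomes a shift, an indexed read of the long, a second shift and a mask, which is cheap compared with the opcode handler itself.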

It is not the execution of the 6502 code that I see as difficult; it is making the timing correct. After all, if you don't have to make the timing correct, the NOPs just fall out.
Unfortunately, it is the timing, including the quirks and interactions with the other chips, that makes the C64. That is the monster I really don't have any interest in fighting.

One more thing: those blasted interrupts would have to be emulated. That is one part of the 6502 I didn't understand in high school... but it can be learned.

Doug

Ale · 2007-10-31 10:56

You are right, sort of. In this case, parallel was not what I aimed at; it was pipelined, as all operations cannot be performed in just one COG due to the lack of program memory.

hinv · 2007-10-31 10:57

Actually, come to think of it, the timing might not be that hard with a waitvid and a counter running at 1 MHz... not that I know how to use those things yet.
Of course, that is predicated on being able to finish one 8-bit instruction with 80 32-bit ones.

Just my 10 bits worth,

Doug

Ale · 2007-10-31 11:28

hinv, counters are easy to use, and waitcnt instead of waitvid can be used for the core-emulating cogs. No problem.
Who is going to make a test run? ...more to do tomorrow, I think.

I don't think timing is an issue if this pipelined mode can be used...
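
The waitcnt approach stays drift-free if each target is computed by adding to the previous target rather than re-reading CNT. A sketch that only computes the targets, including the 32-bit wrap-around of the Propeller's CNT register; the 80 clocks per 6502 cycle again assumes an 80 MHz clock and a roughly 1 MHz 6502. If the core ever falls behind a target, WAITCNT waits for the counter to come round again (about 53.7 seconds at 80 MHz), so a real core must never miss one.

```python
MASK32 = 0xFFFF_FFFF        # CNT is a free-running 32-bit counter
CLOCKS_PER_6502_CYCLE = 80  # assumption: 80 MHz clock, ~1 MHz 6502


def waitcnt_targets(start_cnt: int, cycle_counts: list) -> list:
    """Absolute CNT values to wait for after each emulated instruction.

    Each target is the previous target plus that instruction's 6502 cycles,
    so time spent in the emulator itself does not accumulate as drift.
    """
    targets = []
    target = start_cnt & MASK32
    for cycles in cycle_counts:
        target = (target + cycles * CLOCKS_PER_6502_CYCLE) & MASK32
        targets.append(target)
    return targets


print(waitcnt_targets(0, [2, 3]))         # [160, 400]
print(waitcnt_targets(0xFFFF_FF00, [4]))  # [64]: wrapped past 2**32
```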

hippy · 2007-10-31 13:03

I'm not really familiar with the 6502, but from axis.llx.com/~nparker/a2/opcodes.html the way the opcodes map to bit patterns is similar to the 6800 I am familiar with, and that makes decoding much simpler than it looks from the high/low nibble table.
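
That bit-pattern regularity is usually described as an aaabbbcc layout: the low two bits (cc) pick an instruction group, the middle three (bbb) the addressing mode and the top three (aaa) the operation. A sketch for the cc = 01 group, which holds ORA, AND, EOR, ADC, STA, LDA, CMP and SBC; the other groups are less regular and are not covered. The one hole is $89, which would be STA #imm but is not a documented opcode.

```python
# aaabbbcc decoding for the cc = 01 group of the 6502 instruction set.
GROUP1_OPS = ["ORA", "AND", "EOR", "ADC", "STA", "LDA", "CMP", "SBC"]
GROUP1_MODES = ["(zp,X)", "zp", "#imm", "abs", "(zp),Y", "zp,X", "abs,Y", "abs,X"]


def decode_group1(opcode: int):
    """Return (mnemonic, addressing mode) or None if opcode is not a cc=01 instruction."""
    aaa = (opcode >> 5) & 0b111
    bbb = (opcode >> 2) & 0b111
    cc = opcode & 0b11
    if cc != 0b01:
        return None
    if aaa == 4 and bbb == 2:  # $89 would be STA #imm, which does not exist
        return None
    return GROUP1_OPS[aaa], GROUP1_MODES[bbb]


print(decode_group1(0xA9), decode_group1(0x8D))  # ('LDA', '#imm') ('STA', 'abs')
```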

I've only ever written disassemblers and non-cycle accurate emulators so have no experience of getting timing accurate. I expect that could be a challenge.

Rather than have multiple cogs working as a pipeline, an alternative would be to have two or more cogs acting as cores, each executing only a subset of the opcodes (some opcodes may need to be handled by all cogs). They would need to be kept in sync, and 6502 register and flag information may need to be passed between them. No reason that could not be pipelined as well.

Another idea may be to treat the 6502 as microcoded: have one cog decode the opcodes into a sequence of microcode instructions, which another cog takes to do the low-level work that would happen in the real chip: putting data on buses, performing ALU operations and so on.
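
A sketch of the microcode idea, with a decoder stage that turns an opcode into a list of micro-ops and an executor stage that performs them. The micro-op names and the three-entry table are hypothetical, ADC is binary-mode only with V omitted, and on the Propeller the two stages would be separate cogs passing the sequence through hub RAM rather than Python function calls.

```python
# Hypothetical micro-op table: opcode -> micro-ops for the executor stage.
MICROCODE = {
    0xA9: ["FETCH_OPERAND", "MOVE_TO_A", "SET_NZ"],             # LDA #imm
    0xA5: ["FETCH_OPERAND", "READ_ZP", "MOVE_TO_A", "SET_NZ"],  # LDA zp
    0x65: ["FETCH_OPERAND", "READ_ZP", "ADD_TO_A", "SET_NZ"],   # ADC zp (binary, V omitted)
}


def decode(opcode: int) -> list:
    """Decoder stage: turn an opcode into its micro-op sequence."""
    return MICROCODE[opcode]


def execute(uops: list, cpu: dict, mem: bytearray) -> None:
    """Executor stage: perform the bus and ALU steps one micro-op at a time."""
    for uop in uops:
        if uop == "FETCH_OPERAND":
            cpu["bus"] = mem[cpu["pc"]]
            cpu["pc"] = (cpu["pc"] + 1) & 0xFFFF
        elif uop == "READ_ZP":
            cpu["bus"] = mem[cpu["bus"]]  # zero-page address is on the bus
        elif uop == "MOVE_TO_A":
            cpu["a"] = cpu["bus"]
        elif uop == "ADD_TO_A":
            total = cpu["a"] + cpu["bus"] + cpu["c"]
            cpu["a"], cpu["c"] = total & 0xFF, int(total > 0xFF)
        elif uop == "SET_NZ":
            cpu["n"], cpu["z"] = int(cpu["a"] >= 0x80), int(cpu["a"] == 0)
        else:
            raise ValueError(f"unknown micro-op {uop}")


def step(cpu: dict, mem: bytearray) -> None:
    opcode = mem[cpu["pc"]]
    cpu["pc"] = (cpu["pc"] + 1) & 0xFFFF
    execute(decode(opcode), cpu, mem)


mem = bytearray(0x10000)
mem[0:4] = bytes([0xA9, 0x10, 0x65, 0x80])  # LDA #$10 / ADC $80
mem[0x80] = 0xF0
cpu = {"pc": 0, "a": 0, "c": 0, "n": 0, "z": 0, "bus": 0}
step(cpu, mem)
step(cpu, mem)
print(cpu["a"], cpu["c"], cpu["z"])  # 0 1 1
```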

Ale · 2007-11-08 20:59

Well,

after deep studies, thoughts, etc., I think I have a possible solution to the implementation:

have a look at the instruction map here: http://ericclever.com/6500/

We need 5 COGs:
COG 0: Memory controller, with a 256K×8 DRAM connected to P0..P12

COG 1: Interprets instructions from 00..3F (LSB-MSB)
COG 2: Interprets instructions from 40..7F
COG 3: Interprets instructions from 80..BF
COG 4: Interprets instructions from C0..FF

*** All 5 cogs are synchronized

Cog 0 fetches the 1st byte of the instruction.

All 4 cogs 1..4 read the port at the same time; the right cog starts execution, and the rest go to sleep until the next instruction fetch.

For this approach to work these conditions should be met:

- Cog 0 asserts a signal (pin) that indicates that an instruction fetch begins, this signal is what wakes all 4 cogs 1..4 to execute.
- Registers/flags are in HUB RAM, and are fetched when the 1st memory read occurs.
- COG 0 generates subsequent sync reads/writes as needed.
- Each cog executes only a subset of the instructions, sorted by LSB, and as the last 16 instructions are not coded, an extra 64 longs (48 in fact) are available.
- Each COG 1..4 reads/writes extra bytes as needed.
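
The split in the list above falls on the top two bits of the opcode, so each worker cog can tell whether a fetched byte is its own with a single shift. A sketch of that ownership rule and of the "everyone sees the byte, only the owner runs" step, with plain function calls standing in for the sync pin and cog synchronization (the `workers` dictionary is hypothetical).

```python
def owner_cog(opcode: int) -> int:
    """COG 1 takes $00-$3F, COG 2 $40-$7F, COG 3 $80-$BF, COG 4 $C0-$FF."""
    return 1 + ((opcode & 0xFF) >> 6)


def dispatch(opcode: int, workers: dict) -> list:
    """All workers 'see' the fetched byte; only the owner runs, the rest stay idle.

    `workers` maps cog number -> handler(opcode). Returns the cogs that ran.
    """
    ran = []
    for cog, handler in sorted(workers.items()):
        if cog == owner_cog(opcode):
            handler(opcode)
            ran.append(cog)
    return ran


log = []
workers = {cog: (lambda op, cog=cog: log.append((cog, op))) for cog in (1, 2, 3, 4)}
dispatch(0xA9, workers)  # LDA #imm falls in $80-$BF, so COG 3 handles it
print(log)               # [(3, 169)]
```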

Well, this may work; I do not know if it is fast enough or not, but a division of tasks like this is more viable than what I proposed last week.

If anybody wants to implement this, be my guest. I'll give it a try as soon as I can, but that can take some time due to a paper I have to finish for my Ph.D.

Have fun,

BTW, deSilva, what do you think?

Fred Hawkins · 2007-11-09 00:45

FPGA anyone? http://www.fpga4fun.com/

deSilva · 2007-11-09 06:21

I had two concerns:
- The amount of interpretation code will not fit into one COG, but maybe four suffice; so this issue is well addressed.
- The interpretation time will be 1 to 2 µs, which could be slower than the genuine machine. This will change for the worse, as the registers must now be held in the HUB, and the COG #0-COG #n sync will take additional time... I would guess 3 µs per instruction now...

Also, I do not understand the working of COG #0 clearly...

Oldbitcollector (Jeff) · 2007-11-09 06:26

Fred Hawkins said...
FPGA anyone? http://www.fpga4fun.com/

Commodore FPGA has already been done...

c64upgra.de/c-one/

Shamefully, the project sort of died after its two creators had a falling out.

Oldbitcollector

Ale · 2007-11-09 08:36

We are discussing the Propeller implementation. With FPGAs you can do loads of things, but that is a topic for another forum.

Cog 0 is the memory controller. As we need more than 64 KB and do not want to use all the pins to access memory, a DRAM could be used without too much impact (easily exchanged for SRAM, I think).
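
On the pin-saving argument: a DRAM takes its address in two halves over the same pins, a row address latched with RAS and then a column address latched with CAS. For a 256K×8 part that means 18 address bits sent as two 9-bit halves. The split is sketched below; the 9/9 split is an assumption for a typical 256K DRAM, and the memory cog would also have to schedule periodic refresh, which is not shown.

```python
def split_dram_address(addr: int, row_bits: int = 9, col_bits: int = 9) -> tuple:
    """Split a linear address into (row, column) for a multiplexed DRAM bus.

    The row half is presented first and latched with RAS, then the column
    half on the same pins, latched with CAS.
    """
    if not 0 <= addr < (1 << (row_bits + col_bits)):
        raise ValueError("address out of range")
    return addr >> col_bits, addr & ((1 << col_bits) - 1)


print(split_dram_address(0x3FFFF))  # (511, 511): highest address of a 256K part
```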

Maybe we cannot reach 1 µs per cycle, but we can come close, much closer than with Spin.

From an architectural standpoint, such a system as "described" is really interesting due to the strict sync needed, like the high-res drivers with multiple cogs. I think it is doable; let's see.

deSilva · 2007-11-09 09:34

(a) 1 µs is certainly not doable.

You can see that within 10 minutes, when you write down the code needed for a ReadByte instruction, or an 8-bit ShiftThroughCarry.
(b) A machine code implementation will be much faster than the general factor 80 speed-up against SPIN.
(c) A DRAM controller is a nice idea; however, at 64 KB it has no price advantage over fast static RAM. Note it would be nice to access that memory within 50 ns to avoid further delay...

Ale, your ideas are valid, but you get the wrong time scale, the wrong granularity and the wrong complexity.
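
The 8-bit ShiftThroughCarry example can be made concrete. Even this simple version of ROL/ROR (sketched in Python, not PASM) needs the carry extracted, the shift, the carry inserted, a mask back to 8 bits, and N/Z recomputed. On a 32-bit cog each of those steps is likely to cost at least one instruction, before any operand fetch or write-back is counted.

```python
def rol8(value: int, carry_in: int) -> tuple:
    """6502 ROL: rotate an 8-bit value left through carry.

    Returns (result, carry_out, n, z). On a 32-bit machine the final mask is
    the extra step: without it, bit 8 would stay in the register.
    """
    carry_out = (value >> 7) & 1
    result = ((value << 1) | carry_in) & 0xFF
    return result, carry_out, result >> 7, int(result == 0)


def ror8(value: int, carry_in: int) -> tuple:
    """6502 ROR: rotate an 8-bit value right through carry."""
    carry_out = value & 1
    result = (value >> 1) | ((carry_in & 1) << 7)
    return result, carry_out, result >> 7, int(result == 0)


print(rol8(0x80, 0))  # (0, 1, 0, 1): bit 7 moves into carry, result is zero
```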

Ale · 2007-11-09 10:02

Timing: as said.
DRAM: it is not a matter of price; as I said, it is to save pins, and I pointed out that it could be exchanged for SRAM, so that is not the issue.

As you have to read the registers from HUB mem, accessing DRAM _could_ be hidden. I do not see a problem there.
Please explain: wrong granularity (of what?) and complexity of what? Maybe I missed something, because I did not read every post since the start, just the interesting ones...

deSilva · 2007-11-09 19:23

When parallelizing things you generally generate a certain overhead for, let's call it, synchronization:
"1 brick layer can lay 100 bricks in 1 hour. How long will it take 1000 brick layers to lay 1000 bricks?"
Dividing an operation as small as the interpretation of a simple machine instruction can afford only very little overhead.
On the other hand, emulating a CISC instruction is not a simple task using the Propeller instructions... It is just not rewarding! You can do so many things with 32 bits, but you are hindered by many 8-bit bits and pieces.
