I agree that emulating any instruction set is difficult, but the C-64 used (if I remember right) a 6502-series CPU which had hard-wired logic for opcodes. It was not microcoded, so it looked more like today's RISC chips. I think all instructions took no more than two CPU cycles each. The CPU was greatly assisted by the graphics chip, which shared memory with the CPU and did sprites, transparency, layers, and collision detection in hardware. Also, the SID sound chip was way ahead of its time, with hi-res hardware frequency generators, A/D conversion for proportional joysticks, and a tuneable switched-cap audio filter.
It would seem a shame to use the Propeller's 80 MIPS to emulate an antique 1 MIPS machine.
However, emulating the best features of the C-64 in a modern design sounds more interesting. I am especially interested in following the Propeller audio threads in this forum. Some of the Prop-generated sound samples provided in this forum are quite amazing!
Well, as I see it, you threw in the towel. It is a hobby; it does not have to make money. Most of what is done here is done for self-satisfaction, for learning purposes and to have a bit of fun.
Controlling 32 servos with a Propeller is also a waste of resources. That can be done with an AVR. A 6502 emulator can also be done with an AVR, and has been done several times. It works. Those two architectures are closer to each other than either is to the Propeller. But therein lies the challenge.
I think it can be done, and I'd like to attempt it. It may have little value, but there is loads to learn from it. What do you want to do? Number crunching? Try to implement the DOOM engine; that is also challenging, and maybe much more difficult. The memory architecture hinders almost all conventional applications, and that's where new ways of solving problems come in. Besides, without external helpers... it could be very difficult to achieve better than a few colors at 128x96.
Have fun
Which is why I brought up FPGAs. While it is true that an FPGA can probably solve this problem without a Propeller, it's my guess that a Propeller with an FPGA assist would work as well. There are some bit moves and data sizes that on the Propeller are a genuine pain in the butt -- "this dog can't hunt" -- and would benefit from a "don't bother with that, my trusty sidekick here does that stuff."
(Found it!* took 74 minutes)
Here's a guy who uses an FPGA to help out a PIC to dump a camera card's contents into a hard drive.
Please! I hate to discuss matters when the other side puts no substance behind its position. When I say you will need 3 µs I have my reasons. When you say you do not believe it, that is your right.
What I do not understand is why you do not take the ten minutes of coding - as I have suggested above - to implement a simple 6502 instruction???
So I did it: LDA nnnn,X
Quite unoptimized, I count about 28 COG instructions at 4 ticks each + 6 HUB instructions at - say - 16 ticks each = 112 + 96 = 200+ ticks ≈ 2.5 µs
Note that it does not help that the graphics co-processor is a co-processor; you have to emulate its instructions as well.
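DAT
' $BD = LDA nnnn,X
' assume we wired DATA to 0..7 and ADDR to 8..23
loop
        RDWORD  instrcounter, #PCinHUB   ' get instruction counter
        MOV     OUTA, instrcounter
        SHL     OUTA, #8                 ' shift PC onto the ADDR pins; instrcounter stays unshifted
        MOV     opcode, INA              ' get opcode byte
        AND     opcode, #$FF
        SUB     opcode, #lowrange WC
  IF_C  JMP     #jmpaway                 ' opcode below the table
        MOVS    jmpdest, #jmptable
        ADD     jmpdest, opcode
        CMP     opcode, #(highrange-lowrange) WC
  IF_NC JMP     #jmpaway                 ' opcode past the end of the table
jmpdest
        JMP     0-0                      ' source field patched above: jump via table entry
jmptable
        ....
        long    instBD                   ' LDA nnnn,X
        .....
instBD
        ADD     OUTA, #$100              ' next address: operand low byte
        MOV     bytelow, INA
        AND     bytelow, #$FF
        ADD     OUTA, #$100              ' next address: operand high byte
        MOV     newaddr, INA
        AND     newaddr, #$FF
        SHL     newaddr, #8
        OR      newaddr, bytelow         ' newaddr = nnnn
        RDBYTE  bytelow, #XinHUB         ' reuse bytelow for the X register
        ADD     newaddr, bytelow         ' newaddr = nnnn + X
        SHL     newaddr, #8              ' align with the ADDR pins (8..23)
        MOV     OUTA, newaddr
        MOV     thebyte, INA
        WRBYTE  thebyte, #AinHUB         ' A := loaded byte (only the low byte is written)
        AND     thebyte, #$FF WZ
  IF_Z  WRBYTE  aOne, #ZimHUB            ' Z flag = 1 when the byte is zero
  IF_NZ WRBYTE  aZero, #ZimHUB
        AND     thebyte, #$80 WZ
  IF_NZ WRBYTE  aOne, #NimHUB            ' N flag = 1 when bit 7 is set
  IF_Z  WRBYTE  aZero, #NimHUB
        ADD     instrcounter, #3         ' LDA nnnn,X is 3 bytes long
        WRWORD  instrcounter, #PCinHUB
        JMP     #loop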
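For anyone who wants to check the intended behaviour of that routine off-chip, here is a minimal Python model of the same instruction. It is only a sketch: the `cpu` dictionary, its key names and the test addresses are made up for illustration and are not part of the code above.

```python
def lda_abs_x(mem, cpu):
    """Model of 6502 opcode $BD (LDA nnnn,X) on a 64 KB memory image."""
    pc = cpu["pc"]
    lo = mem[(pc + 1) & 0xFFFF]                 # operand low byte
    hi = mem[(pc + 2) & 0xFFFF]                 # operand high byte
    addr = (((hi << 8) | lo) + cpu["x"]) & 0xFFFF
    value = mem[addr]
    cpu["a"] = value
    cpu["z"] = 1 if value == 0 else 0           # Z is set when the byte is zero
    cpu["n"] = 1 if value & 0x80 else 0         # N mirrors bit 7 of the byte
    cpu["pc"] = (pc + 3) & 0xFFFF               # the instruction is 3 bytes long
    return cpu


# Hypothetical test program: LDA $2000,X with X = 5
mem = bytearray(0x10000)
mem[0x0800:0x0803] = bytes([0xBD, 0x00, 0x20])
mem[0x2005] = 0x80
cpu = {"pc": 0x0800, "a": 0, "x": 5, "z": 0, "n": 0}
lda_abs_x(mem, cpu)
```

The model ignores cycle counts and page-crossing penalties; it only pins down what A, the Z and N flags and the program counter should hold afterwards.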
That implementation is not what I had in mind. I was expressing an idea, to be refined, not something set in stone. I'll try to do it and come back. Your code shows (sadly) that a traditional approach, if we had >> 2K COG RAM, will not work :-(
I wouldn't be at all put off by speed constraints. It's amazing what optimisations can come up after you've produced some initial code to solve a problem and it turns out not to be good enough. With a chip as unusual as the Prop, the first thought solution may very well not be the best.
And if you can't get the speed up to that of the original, it's still an achievement, and you can still run some programs. And once the next generation Prop comes out you're good to go for full speed.
When I look at the instruction set of the 6510 CPU, an LDA abs,X takes 4 cycles to execute - that is 4 µs at a 1 MHz clock. So even deSilva's unoptimized code is faster than the original!
The 6510 CPU needs more clock cycles for the more complex addressing modes, and that helps a lot when emulating those addressing modes.
Or am I wrong?
Andy
P.S. I also think that the graphics coprocessor with the sprites is the real challenge.
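The comparison can be written out as a few lines of Python. This is a rough sketch under the thread's own numbers: an 80 MHz Propeller clock, 4 ticks per COG instruction, "say 16" ticks per HUB instruction, and a nominal 1 MHz 6510. The function name is made up.

```python
PROP_CLOCK_HZ = 80_000_000      # assumed Propeller clock
C64_CLOCK_HZ = 1_000_000        # nominal 6510 clock used in the post
TICKS_PER_COG_INSTR = 4
TICKS_PER_HUB_INSTR = 16        # the "say 16" estimate, not a measured value


def emulation_cost(cog_instrs, hub_instrs):
    """Return (ticks, microseconds) for one emulated instruction."""
    ticks = cog_instrs * TICKS_PER_COG_INSTR + hub_instrs * TICKS_PER_HUB_INSTR
    return ticks, ticks * 1e6 / PROP_CLOCK_HZ


ticks, prop_us = emulation_cost(28, 6)      # deSilva's count for LDA nnnn,X
c64_us = 4 * 1e6 / C64_CLOCK_HZ             # LDA abs,X takes 4 cycles on the 6510
headroom_ticks = int(c64_us * PROP_CLOCK_HZ / 1e6) - ticks
```

With these assumptions the emulated instruction fits inside the original's time slot, and `headroom_ticks` is what would be left over for everything else the emulator has to do in that slot.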
When I said "unrewarding" above I wanted to express the fact that you cannot gain anything substantial from the parallel working in the COGs. However, due to the mass of code needed for all 6502 instructions, you will need two or even four of them to hold it all, with additional overhead for inter-COG synchronization, requiring the registers to be kept in HUB memory...
So this seems to be a very inappropriate example to show off the power of the Propeller.
Post Edited (deSilva) : 11/10/2007 10:58:55 AM GMT
Before we start counting COGs for the 6502, let's throw something out here.
We will probably need 5 COGs just for video rendering (1 for the TV driver and 4 for rendering), 1 for audio, and of course 1 for the SPIN interpreter, which leaves us 1 COG (2 if we remove sound). Then of course we have the CIA chips and VIC operations (the two could probably be combined into 1 COG).
What JT Cook is asking for is no joke.
It digs up some of the issues of the COG concept. Life is easy as long as you have no more tasks than you have COGs and your application code fits into 512 words. When you have a VERY LOW SPEED application (below 10 kHz), you can change to SPIN, thus working nicely around the memory restriction.
Needing more than 8 tasks, however, can become a nightmare of painstakingly arranging polling code and counting clock ticks, as the well understood method of choice - interrupts - is not available...
Some clever tricks in some drivers ("co-routines") work as long as the code fits into 512 words.
I would not complain if I had 64 COGs (well not this year at least...)
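To make the "polling instead of interrupts" point concrete, here is a small Python sketch of several tasks sharing one execution unit by taking turns, which is roughly what the co-routine tricks do inside a single COG. The task names and periods are invented for the example.

```python
def periodic_task(name, period, log):
    """A cooperative task: does its work every `period` ticks, then yields."""
    tick = 0
    while True:
        if tick % period == 0:
            log.append((tick, name))    # stand-in for the real work
        tick += 1
        yield                           # hand control back to the scheduler


def run(tasks, ticks):
    """Round-robin scheduler: every task gets one turn per tick."""
    for _ in range(ticks):
        for t in tasks:
            next(t)


log = []
run([periodic_task("cia", 2, log), periodic_task("keyboard", 3, log)], 6)
```

Every task has to give control back voluntarily and within its tick budget, which is exactly the clock-tick counting described above; nothing can preempt a task that overruns.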
Good idea Ale, and if we run out of cogs, there's always Prop2, by which time we'll have most of the code ready, as that'll have ample RAM and ample cogs and ample I/Os and more than ample speed ;)
Large memory model, if anyone gets that working in a practical way.
Rather than use SPIN, write your own byte code interpreter which is tailored for the problem space. Think of it as a metaphor for the microcode used to implement instructions on a CISC CPU.
Use a table of data in hub memory with each row representing a machine code instruction and the columns representing different elements of that instruction (register, addressing mode, type of operation, flags affected etc.). Then, rather than implementing each instruction in the cog program, you are writing generic table-driven code.
Only emulate a subset of instructions needed to run a particular program.
Worth investigating.
How about starting simpler? A Vic-20 is less complicated, with less memory and therefore less complicated programs. Or maybe a PET. It would get the 6502 emulator working, ready to do the C64 when the PROP II comes out.
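The table idea could look something like this in Python. It is a minimal sketch, not anyone's actual emulator: the column layout (target register, addressing mode, length) and the names `OPCODES`, `operand` and `step` are assumptions, and only four load opcodes are included.

```python
# One row per opcode: (target register, addressing mode, length in bytes)
OPCODES = {
    0xA9: ("a", "imm", 2),     # LDA #nn
    0xA2: ("x", "imm", 2),     # LDX #nn
    0xAD: ("a", "abs", 3),     # LDA nnnn
    0xBD: ("a", "absx", 3),    # LDA nnnn,X
}


def operand(mem, cpu, mode):
    """Fetch the operand byte for the given addressing mode."""
    pc = cpu["pc"]
    if mode == "imm":
        return mem[pc + 1]
    addr = mem[pc + 1] | (mem[pc + 2] << 8)
    if mode == "absx":
        addr += cpu["x"]
    return mem[addr & 0xFFFF]


def step(mem, cpu):
    """Generic load handler driven entirely by the table row."""
    reg, mode, length = OPCODES[mem[cpu["pc"]]]
    value = operand(mem, cpu, mode)
    cpu[reg] = value
    cpu["z"] = int(value == 0)
    cpu["n"] = value >> 7
    cpu["pc"] = (cpu["pc"] + length) & 0xFFFF


mem = bytearray(0x10000)
mem[0:5] = bytes([0xA2, 0x01, 0xBD, 0x00, 0x20])   # LDX #1 ; LDA $2000,X
mem[0x2001] = 0x42
cpu = {"pc": 0, "a": 0, "x": 0, "z": 0, "n": 0}
step(mem, cpu)
step(mem, cpu)
```

Adding an opcode of the same family then means adding a table row rather than more handler code, which is the space saving the suggestion is after. The price is an extra table lookup per instruction.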
CardboardGuru said...
There's other approaches too.
Large memory model, if anyone gets that working in a practical way.
Rather than use SPIN, write your own byte code interpreter which is tailored for the problem space. Think of it as a metaphor for the microcode used to implement instructions on a CISC CPU.
Use a table of data in hub memory with each row representing a machine code instruction and the columns representing different elements of that instruction (register, addressing mode, type of operation, flags affected etc.). Then, rather than implementing each instruction in the cog program, you are writing generic table-driven code.
Only emulate a subset of instructions needed to run a particular program.
Worth investigating.
How about starting simpler? A Vic-20 is less complicated, with less memory and therefore less complicated programs. Or maybe a PET. It would get the 6502 emulator working, ready to do the C64 when the PROP II comes out.
That's why I've been thinking of doing a Z80 emulator using the Large memory model, to start simple, with say a ZX81 emulator, and later do a Spectrum emu, either on Prop2 or HX512.
But that's for later on, as I've got other stuff to finish first, and I don't have much spare time with work being busy at the mo.
- The idea of the subset is quite sound. Anyway for any other model... we need the processor emulator, maybe with looser timings.
I was diagramming yesterday the parallel tasks with more or less accurate timings and it seems to fit, as I explained before. I'll see if I can put a diagram together to show this, with some code.
Can't help but laugh at all the old C-64 references....
Load $,8,1. That's classic.
SYS 49152...I'm trying to recall what this did? (I think it was like "ctrl+alt+del", a reset?)
I remember you could crash the thing by entering 99E88 into a number field. (overflow error)
And the graphics were "sprites".
....I would download games from the local "BBS". It would take like all night and as it was downloading it would add a dash like every 5 seconds and go across your screen:
I had a Vic-20 first!
I thought I'd post an answer to Dr A. in this thread instead of the "Thank you, Parallax" thread.
This is still a very worthy project, and we have many of the pieces in place.
* We have an SDRAM solution for memory
* We have SID emulation
* We have most of the requirements for a VIC emulation
* We have a primitive 6502 emulator. Perhaps the 6510 could be achieved from it.
I suspect that we should probably do this in steps, starting with a VIC-20 emulation, or perhaps even a PET emulator and work our way up. Personally, I think it would be a hoot to see Omega Race or Pirate Cove running off a prop.
Do you have a board with external memory? Attached is Juergen's CP/M emulation. The 'main' program is CPM.Spin.
There are a few interesting things in there. A graphics driver. Sound driver. But I think the main thing is the LMM emulation code. Juergen said that this was the breakthrough that meant it was possible to keep coding opcode emulations without always having to worry about running out of cog space.
For a C64, the disk drive format will be different but hopefully not too different.
The opcodes are different but you maybe can drop in code already done.
I think the sound chip emulation will be different - maybe you can hack the code that is already done.
What was the standard graphics resolution for a C64?
The 6502 and the 6510 are almost identical. The only difference is the 8-bit I/O port, which is controlled by two memory-mapped registers at addresses $0 and $1 (direction, data).
These registers were used to map the different rom-chips in and out of the address space and also to handle communication with the datasette drive.
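As a rough illustration of what the emulator has to do on every memory access, here is a Python sketch of the ROM banking controlled by the low three bits of the port at $1. It assumes no cartridge is plugged in and that the three bits are configured as outputs; the function name is made up.

```python
def visible_at(addr, port):
    """Return what the CPU sees at `addr` for a given value of the $1 port."""
    loram = port & 1             # bit 0: BASIC ROM enable
    hiram = (port >> 1) & 1      # bit 1: KERNAL ROM enable
    charen = (port >> 2) & 1     # bit 2: 1 = I/O, 0 = character ROM
    if 0xA000 <= addr <= 0xBFFF and loram and hiram:
        return "BASIC ROM"
    if 0xE000 <= addr <= 0xFFFF and hiram:
        return "KERNAL ROM"
    if 0xD000 <= addr <= 0xDFFF and (loram or hiram):
        return "I/O" if charen else "CHAR ROM"
    return "RAM"
```

With the power-on value $37 this gives BASIC, I/O and KERNAL at their usual places, while $34 exposes RAM everywhere. Writes always land in the underlying RAM except in the I/O area, which this sketch does not model.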
The hardest part to emulate, BY FAAAAAR, is the VIC-II; everything else is either already done or quite easy to do.
The 6502/6510 could have been very easy to emulate if it wasn't for the illegal opcodes.
These are not fully documented even today. They are just byproducts of the way the 6502 selects opcodes by combining addressing-mode bits, instruction bits and instruction-type bits.
I have been coding CPU emulators for the 6502, 6510 and HuC6280 (a pimped 6502 used in the PC Engine).
I'm sure it shouldn't be any problem to make a (quite) accurate emulation running at full speed in a single cog. I don't know how accurate Ericball's emulation is; it may be a good start for the 6510.
A Vic20 is very simple compared to the C64. It uses the same cpu, but that's all it shares with the C64. So basing a C64 emulator on a Vic20 emulator makes no sense at all.
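The bit-combining can be seen by splitting an opcode into the commonly described aaabbbcc fields (operation, addressing mode, instruction group). A small Python sketch, with a made-up helper name:

```python
def fields(opcode):
    """Split a 6502 opcode into (aaa, bbb, cc) bit fields."""
    return (opcode >> 5) & 7, (opcode >> 2) & 7, opcode & 3


lda_abs = fields(0xAD)   # documented: LDA nnnn
ldx_abs = fields(0xAE)   # documented: LDX nnnn
lax_abs = fields(0xAF)   # undocumented, usually called LAX nnnn
```

All three share the same aaa and bbb bits and differ only in cc. The undocumented $AF has both low bits set, so it behaves like the LDA and LDX of the same row firing together and loads A and X at once. Not every undocumented opcode is this well behaved, which is where the emulation effort goes.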
@Ahle2:
Since the software for the C64 has already been written, the undocumented opcodes used by any given piece of software are known. Do you happen to know how many of those opcodes are used, and/or how often? Because shooting for a perfect emulation may not be the best approach... methinks...
Let's say you want to run "Skate or Die" (I don't make it that easy anyways...) ;-)
@Dr_Acula
The standard graphics modes used on the C64 were:
- Highres, 1bpp per character/tile at 320x200 resolution (used in the BASIC interpreter)
- Multicolor lowres, 2bpp per character/tile at 160x200 resolution (used in most games)
Colors can be selected from the fixed palette of 16 entries.
This might not seem to be too hard to emulate on a prop in a single cog.
But the VIC-II in the C64 was nothing like its counterparts in other 8-bit machines at the time.
It featured hardware scrolling, extremely good sprite capabilities (for the time), hardware collision detection, hardware sprite scaling (it's true), etc.
I would be surprised if it could be done in fewer than 4 cogs.
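For the pixel side alone, the two modes boil down to unpacking each bitmap byte either as 8 one-bit pixels or as 4 two-bit pixels. A minimal Python sketch with made-up function names; it ignores color RAM, the per-cell color lookup, sprites and scrolling, which is where the real work is:

```python
def hires_pixels(byte):
    """320x200 mode: one byte holds 8 pixels, 1 bit each, leftmost first."""
    return [(byte >> (7 - i)) & 1 for i in range(8)]


def multicolor_pixels(byte):
    """160x200 mode: one byte holds 4 double-width pixels, 2 bits each."""
    return [(byte >> (6 - 2 * i)) & 3 for i in range(4)]


HIRES_BYTES = 320 * 200 * 1 // 8        # bitmap size at 1 bpp
MULTICOLOR_BYTES = 160 * 200 * 2 // 8   # bitmap size at 2 bpp
```

Both modes use the same amount of bitmap memory; multicolor trades horizontal resolution for more colors per cell.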
Comments
ALe
http://home.nikocity.de/andymon/hfg/Alya/alya.html
If I read him right, he's just using it for wiring, but the general idea -- FPGA as a helper -- seems sound.
*caveat, if you have a webpage that brags about your hacks, make sure its title helps find it again. "The alya project" doesn't help much.
Post Edited (Fred Hawkins) : 11/9/2007 9:43:44 PM GMT
Post Edited (deSilva) : 11/9/2007 10:09:03 PM GMT
Don't let anyone tell you it can't be done.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Help to build the Propeller wiki - propeller.wikispaces.com
Prop Room Robotics - my web store for Roomba spare parts in the UK
Post Edited (deSilva) : 11/11/2007 9:32:49 AM GMT
More than prop2 we may need 2 props
fun stuff!
Gosh....those were the days!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
The more I know, the more I know I don't know. Is this what they call Wisdom?
Thoughts? Is a VIC-20 emulation thread in order?
OBC
I think that would be a great idea! The VICE emulator project would be a good resource (http://www.viceteam.org/)
Can some of the other Propeller "emulator" experts provide some direction on some starting points based on your experience with the Propeller?
OBC
/Ahle2
C is faster, but LMM is going to be a lot faster again.