Intel 8086 CPU Emulator
macca
Hi,
It seems that today is the anniversary of the Intel 8086 launch, so it looks like a good day to present my own Intel 8086 processor emulator for the P2.
Here are some screenshots of it running the 8086 Monitor by Seattle Computer Products.
All instructions are (hopefully) implemented, including the single-step and breakpoint interrupts.
The attached example emulates a very simple system with 256KB RAM, 64KB ROM and an ACIA-ish serial port hooked to the programming serial pins running at 115_200 bps. The monitor source is slightly modified from the original to accommodate the different system configuration and can be compiled with nasm.
The monitor's manual is here.
There are bugs, so use at your own risk.
Enjoy!
Comments
Holy cow! Neat!!!
Supremely intriguing.
Sweet. Guess that means it can run old DOS programs?
How many MHz does it do?
Nice one macca. Do you have further plans to incorporate any PC type peripherals into the memory map over time, or will this be intended as a pure standalone 8086 (non-PC) emulator?
From the processor perspective yes, however it needs a lot of other things to be able to run DOS programs.
Haven't measured... will try to assemble something, but it doesn't count the cycles so it will not be significant.
Yes, I would like to implement something like dosemu.
One problem will be the RAM; however, if I'm not wrong, the first IBM PC had a maximum of 256KB, so I think I'll start with that.
Cool project.
It compiles and downloads fine.
But unfortunately all I get is gibberish when I try to run it, like the serial isn't sync'ing.
I started with 115200, no parity, one stop bit.
I've tried the latest versions of PropTool, PNut and flexprop.
I've tried various baud rates from 9600 to 460800.
I've tried various parities and stop bit combinations.
I've tried both the Parallax Serial Terminal and TeraTerm.
The gibberish changes with changes to baud, parities and stop bits, but I never get good old readable ASCII.
I've tried changing the P2 clock frequency from 150 to 200 MHz. I don't get any response from the program below 150 or greater than 170 MHz.
Any thoughts? I'm using a P2 Edge Rev. C on a mini breakout using the latest prop-plug(rev. D), which has been working fine with everything else I've thrown at it...
x86! So we now have them all: Z80, 6502, 68000 and now this. We can make a dosbox.
Ah, I always forget that these tools can't handle 64-bit constants, try with the updated source.
If you need to change the baud rate or clock, you need to recalculate the values for the smart pin configurations at lines 1729 and 1734, or replace them with cordic calls.
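For anyone recalculating those values by hand, here is a minimal sketch in Python (not the emulator's own code). It assumes the asynchronous serial smart pin X layout as I understand it from the P2 documentation: clocks per bit as a 16.16 fixed-point value in the upper bits, with only the top 6 fractional bits kept, and the number of data bits minus one in the low bits. The function name `async_baud_x` is hypothetical, and the result should be checked against the constants actually used in the source.

```python
def async_baud_x(clkfreq_hz: int, baud: int, data_bits: int = 8) -> int:
    """Sketch of the X value for a P2 async serial smart pin.

    Assumed layout (check against the P2 smart pin documentation):
      X[31:16] = whole clocks per bit
      X[15:10] = fractional clocks per bit (6 bits)
      X[4:0]   = number of data bits - 1
    """
    # Clocks per bit as 16.16 fixed point. The intermediate product needs
    # more than 32 bits, which Python integers handle natively.
    period = (clkfreq_hz << 16) // baud
    # Keep the integer part and the top 6 fractional bits only.
    period &= 0xFFFFFC00
    return period | (data_bits - 1)


if __name__ == "__main__":
    # 160 MHz system clock and 115_200 bps, as discussed in the thread.
    print(hex(async_baud_x(160_000_000, 115_200)))
```

The shift-and-divide is the step that needs a 64-bit intermediate on the P2 side, which is why a cordic multiply/divide is the natural replacement for a precomputed constant.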
Given the relatively slow speed of the original x86 you could likely run code directly from PSRAM, if it doesn't have to be especially cycle accurate. A small read buffer from PSRAM will help out a lot there and could potentially somewhat mimic the BIU prefetch instruction queue and EU of the 8086. You may need two COGs if you wanted some overlapped pipelining happening there (one COG doing PSRAM and another COG for your x86 emulator) although that also can exacerbate latency.
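To illustrate the read buffer idea, here is a rough Python sketch (the real thing would be PASM talking to a PSRAM driver). The class `ReadBuffer`, its `line_size` parameter and the `fills` counter are all hypothetical names for illustration; the backing `bytes` object stands in for external memory, and each refill stands in for one slow PSRAM burst.

```python
class ReadBuffer:
    """Tiny single-line read buffer in front of a slow backing store."""

    def __init__(self, backing: bytes, line_size: int = 16):
        self.backing = backing      # stand-in for PSRAM contents
        self.line_size = line_size  # bytes fetched per burst (assumed value)
        self.base = None            # address of the first buffered byte
        self.line = b""             # currently buffered bytes
        self.fills = 0              # number of slow refills performed

    def read_byte(self, addr: int) -> int:
        # Align the address down to the start of its line.
        base = addr - (addr % self.line_size)
        if base != self.base:
            # Miss: fetch a whole line in one burst.
            self.line = self.backing[base:base + self.line_size]
            self.base = base
            self.fills += 1
        return self.line[addr - base]


if __name__ == "__main__":
    mem = bytes(range(256))
    buf = ReadBuffer(mem, line_size=16)
    code = [buf.read_byte(a) for a in range(32)]  # sequential "fetch"
    print(code[:4], "refills:", buf.fills)
```

Sequential instruction fetch hits the buffer most of the time, so the burst latency is paid once per line rather than once per byte. A jump outside the buffered line forces a refill, much like a flush of the 8086 prefetch queue.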
Some time ago I forked this to my github:
https://github.com/pik33/8086tiny
There is a BIOS here to start playing with PC stuff, and a test disk image with the Alley Cat game.
A CGA/VGA/Hercules emulator has to be written now, with all these addressing modes and other strange stuff... I have a manual at home...
Oh that's a good starting point. And the BIOS can be compiled with NASM, wonderful! Thank you!
In the works...
1111111x xx111xxx is an alias of PUSH r/m.
The 8086 has no real NOPs and every opcode does something.
If there is room for it, maybe consider adding the additional 186 opcodes (improperly called "286").
In the late 80s and early 90s there were a lot of AT computers in use, with 286 or even 386 processors, but nobody was using the 286 "protected mode" yet. Everything was DOS or Windows 3.11, with the CPU in "real mode", except when the BIOS switched modes to emulate EMS. But there were several convenient instructions we liked to use in asm. The most popular of them were pusha and popa. The rest were shift, rol, push and mul imm, which I used, and insb/outsb/enter/leave/bound, which I don't remember using.
I remember the first 486 we bought for the faculty. Then we installed Matlab on it, and it ran at light speed.
Not a lot of software uses the 286's protected mode because Intel forgot to add a way to switch back to real mode. The 386 allows this, but of course if you're gonna write for the 386, you might as well use 32-bit mode.
I never wrote anything pure asm on a 386. I used 386 type asm of course but only as an inlined asm in the high level language (Turbo Pascal, then Delphi)
I wrote a lot of things in "286" (in reality, 8086 + 186 extensions) pure asm on my 386/486 AT (home-made compatible) computer. It is a long story to tell, including love and cooling the CPU with a jar of pickle water (the pickles were already eaten).
I just downloaded the updated version and it works fine now. Thank you.
Space is a bit tight right now. There is room for improvement, as some opcodes can be merged with a skip pattern; I don't much like a "hybrid" emulation, but I will consider it once everything is set and tested.
One thing I would like to add is the 8087 math co-processor using P2 cordic, this may reduce the space a bit but most of it will be implemented as hub-exec code.
I have used the MAME source as a reference, and it doesn't seem to implement it (nor D0/D1 xx_110_xxx).
You have a CMP for 11111111 xx111xxx, when it should be the same as 11111111 xx110xxx.
These two links should be helpful:
https://www.os2museum.com/wp/undocumented-8086-opcodes-part-i/
https://www.reenigne.org/blog/8086-microcode-disassembled/
Unfortunately the first ends at opcode F1 and Part II was never written. FE xx does byte-sized versions of all the seven different instructions that FF xx does, but apart from INC & DEC they are a bit strange and can involve data or EA from a previous instruction.
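The FF group decoding described in this thread can be sketched as a small lookup on the reg field (bits 5..3) of the ModR/M byte. This is a Python illustration only, not the emulator's code; the table name `GROUP_FF` and the function `decode_ff` are hypothetical. Entries /0 to /6 are the documented 8086 operations, while /7 as an alias of PUSH follows what is stated in the posts above.

```python
# Operation selected by the reg field (bits 5..3) of the ModR/M byte
# that follows opcode FF on the 8086.
GROUP_FF = {
    0: "INC r/m16",
    1: "DEC r/m16",
    2: "CALL near r/m16",
    3: "CALL far m16:16",
    4: "JMP near r/m16",
    5: "JMP far m16:16",
    6: "PUSH r/m16",
    7: "PUSH r/m16",  # undocumented alias of /6, per this thread
}


def decode_ff(modrm: int) -> str:
    """Return the operation name for the ModR/M byte following FF."""
    reg = (modrm >> 3) & 0b111
    return GROUP_FF[reg]


if __name__ == "__main__":
    for modrm in (0b11_110_000, 0b11_111_000):
        print(f"FF {modrm:02X}: {decode_ff(modrm)}")
```

With the table written this way, xx110xxx and xx111xxx both land on PUSH, so the alias needs no separate handler.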
Oh, I think I was looking at the wrong place in the sources... Actually, I don't remember where that CMP comes from (maybe a copy/paste from somewhere else) because, as I said, the MAME source doesn't implement it. Anyway, I'll change it to PUSH r/m.
I don't like to implement undocumented behaviours unless strictly needed (like those few 6502 instructions that are actually used) and I'm more comfortable to use a well established source like MAME as a reference.
Some progress...
Now there is fd.img in the repository so you have something to boot from. It contains the Alley Cat game.
Need a YM3812 core? ;3
Yeah, I know, the cross-section of software that needs < 512K memory and has Adlib support is rather slim...
Still several things to do before that, one of which is a PASM SD/FAT driver...
Some more progress... after fighting an issue that was driving me crazy...
Now it can boot a real XT BIOS (the 601 error is the floppy drive, there is no floppy drive...).
Booting the BIOS means there are a few things that are emulated correctly enough, like the processor itself, the programmable interrupt controller, the timer and, to some extent, the DMA (this is actually faked just enough to let the BIOS believe it is refreshing the RAM...).
And start the BASIC ROM.
Had to rewrite the processor emulation almost completely to fix the bugs. Thankfully I found a suite of tests that allowed me to fix everything... at the price of a rewrite because of some wrong assumptions I made about how things should work... the code now is completely unrolled and runs from hub memory, no skip patterns grouping, no optimizations, nothing... now that the tests are passing I can progressively optimize the code.
The speed is not too bad. The fastest 8086 instructions always take more P2 clock cycles than accurate timing allows; however, several common 8086 instructions are way faster on the P2, so on average and with some optimizations I think I can get an (almost) cycle-accurate emulation. The target for now is the 4.77 MHz of the original PC XT.
Now it is time for the long-delayed PASM SD card driver...
Good stuff. How much total overhead is taken up so far in HUB/COG space with your emulator?
By the way, my video driver in its text mode probably does a reasonable representation of rendering the CGA screen buffer. It uses the same colour format as the PC and supports hi/low BG colour or blinking text attributes. I know there's far more to emulating a proper CGA if you want to go down to register level access and do everything correctly, but it might be useful temporarily if you needed to see colour output from a 80x25 HUB screen buffer with the usual default 16 colours and a CGA font etc.
Most of the commonly used code, like flags calculation, registers lookup and address calculations, is in COG ram to minimize the overhead, however the hub code uses a lot of call/jumps so the hub-exec fifo needs to be refilled frequently (I still need to understand if this is always true...).
I'm writing a bulk test to see how much each instruction takes (the 8086 cycles are not yet filled correctly), the following
0040:0000 : 00 C8 ADD AL, CL (0, 412)
Takes 3 clock cycles on the 8086 and 412 on the P2, however
0040:0000 : 02 06 A0 08 ADD AL, BYTE[DS:08A0h] (16, 567)
Takes 3+16 clock cycles on the 8086 but only 567 on the P2, so it is way faster.
I think I'll move the ALU instructions to HUB/LUT to make them faster to at least prevent the fifo overhead, but still they will be slow, unless I find a good way to optimize the registers lookup.
Thanks, however the CGA is already fully emulated (well, almost, I just ignore the MC6845 timing registers; the screens above don't show much, but it is the CGA output at 640x400). I also have the MDA ready.
Yeah, that seems like quite a lot of overhead. I suspect the flags manipulation stuff is a large chunk of that.
Actually not, the flags take only 23 instructions.
That instruction has the modr/m byte setup overhead to get the operands, the immediate variant takes a bit less
0040:0000 : 04 00 ADD AL, 00h (0, 327)
I don't remember the 8086 clock cycle count (I think it is 4), but it is still slow.
If my math is correct, at 160 MHz we have about 33 P2 cycles for one 8086 cycle at 4.77 MHz, so the above is about 2.5 times slower.
This will be compensated, for now, by other instructions that are much slower on the 8086.
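The arithmetic behind those figures can be checked with a few lines of Python. This is only a sketch using the numbers quoted in the posts above (160 MHz P2 clock, 4.77 MHz target, measured P2 cycle counts, and the 8086 cycle counts as stated there, which the author says are not yet filled in correctly). The function names are hypothetical.

```python
P2_CLOCK_HZ = 160_000_000    # P2 system clock quoted in the thread
I8086_CLOCK_HZ = 4_770_000   # original PC XT clock, the stated target


def p2_cycles_per_8086_cycle(p2_hz: int = P2_CLOCK_HZ,
                             i8086_hz: int = I8086_CLOCK_HZ) -> float:
    """P2 clock cycles available for each emulated 8086 clock cycle."""
    return p2_hz / i8086_hz


def slowdown(p2_cycles: int, i8086_cycles: int) -> float:
    """Ratio of P2 cycles used to the real-time budget.

    Values above 1.0 mean slower than a real 4.77 MHz 8086,
    values below 1.0 mean faster.
    """
    budget = i8086_cycles * p2_cycles_per_8086_cycle()
    return p2_cycles / budget


if __name__ == "__main__":
    print(f"budget per 8086 cycle: {p2_cycles_per_8086_cycle():.2f} P2 cycles")
    # (instruction, measured P2 cycles, 8086 cycles as quoted in the posts)
    for name, p2, i86 in [
        ("ADD AL, CL", 412, 3),
        ("ADD AL, BYTE[DS:08A0h]", 567, 3 + 16),
        ("ADD AL, 00h", 327, 4),
    ]:
        print(f"{name:24s} {slowdown(p2, i86):.2f}x of real time")
```

With these inputs the immediate ADD comes out a little under 2.5 times slower than real time, while the memory-operand ADD fits inside its budget.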