I have now tested all the 8080 op codes and got the thing to run the T8080 CPU diagnostic successfully. This is the first version that actually works!
Changes:
1) Fixed a load of bugs in opcode handling.
2) Minor changes to the dispatch loop to get speed up a bit.
3) Added T8080.COM as an example of how to get any 8080 code running.
4) The source and executable files of T8080 are included.
Speed is now up to 407 thousand instructions per second (KIPS). Hopes of getting any significant improvement on that are fading fast as all the ideas I had or suggestions received either slowed it down (!) or are not applicable when using external RAM.
There are still a few tweaks that can be done to gain a few percent and I may maintain a HUB RAM only version for max speed in applications that don't need much RAM, say, ZX81 emulation.
All in all I'm a bit depressed about the speed, all those COGs and HUB RAM used to gain 30% over the single COG emulator!
BUT this 4 COG version is very amenable to extension to the full Z80 instruction set. Is there any kind soul out there who knows his Z80 willing to contribute code to emulate the missing op codes?
I have kept this independent of any particular machine emulation, no CP/M and such is included, so it's free and easy for anyone to adapt to their desires.
Cheers all.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Suggestion for your table in cog:
Start your table in cog starting at $000, 1 long for each for the respective 6 bit page = 64 longs.
(You will have to have a jmp to start initially and put the correct table entry back on initialisation) Your table will just be long versions of your word table in hub now.
Here is code I suggest to access the table (vector) if stored in cog $000-$0BF (untested):
xor op,#$00 '<- $00, $40, $80, $C0 for each of the 4 pages (op codes b7,b6) - $00 could be skipped
test op,#$C0 wz 'correct page ? (the above xor makes the correct page bits => 00)
if_z jmp op 'jump via the vector
jmp #skip 'not this page (cog) <--- fixed (from heater)
Re using a pin for synchronisation, I suggest you just invert (xor) each time you complete an instruction. The other cogs can read the pin at skip and set waitpeq or waitpne. I am sure this will be faster.
skip test ina,pinmask wz 'is pin a "0"
if_z movi :wait,#$1E0 'y so set waitpeq
if_nz movi :wait,#$1E8 'n so set waitpne
nop
:wait waitpne pinmask,pinmask 'if pin="0" waitpeq; if pin="1" waitpne
rdword pc, pc_reg 'get next global (HUB) Program Counter
jmp #fetch
If the read_rom routine is only called from read_memory_byte then you could code this within the read_memory_byte routine to save the call and ret instructions.
It seems all your do_add_x, do_sub_x, etc (in i8080_mu_2) always do
In the hours since I posted some progress has been made. Now up to 414KIPS !
I have saved a lot of COG space by turning those arithmetic and logical functions into things that you jump to after having collected the parameters, from there they jump out to #done. No CALLs and RETs. As per Cluso's suggestion I think.
Did a similar thing with many ops in the jmp/call/ret COG.
Now reading the A reg and Flags together as a word for the arithmetic ops. Does not save much but something. Strangely, writing them out as a word had no noticeable effect.
Cluso, I like your sneaky XOR to check for the right page in the dispatch table. Luckily with the changes I just made there is now room in all the COGs for their share of the table.
However I was wondering if it may be better to save the COG space for Z80 opcodes. Bah, what the hell, Z80 can wait a bit.
As for the pin I will try that again at some point, I'm sure I must have implemented it badly.
One surprise was that I had previously only written out the PC when a COG was done with its turn. Reasoning that it was a waste of time when a COG is steaming through a bunch of instructions that are all "his". So the PC only got written when the COG came to skip.
Turns out it is quicker to write the PC after every op. I guess that gives the other COGs a head start on checking the op.
This is fortunate because it makes the "single step" feature easier.
You can place 3 addresses and some flags into the 32 bit vector, which can call different subroutines - see my ClusoInterpreter for how I have done this. Quite quick if you can break code into common sections.
For the Z80, maybe you will need a few more cogs, just put the little used instructions into an overlay cog. Believe me the overlay is quicker than LMM style. This is where I started in the ClusoInterpreter, before I put the decode table into hub.
If you get to using external SRAM, then I have ideas to ensure only 1 cog accesses at a time without having to do contention.
I saw there were some routines that could be sped up using some simple inline coding, but you probably should leave that for extra instructions for now.
Some of your code does the same things before jumping to done, so you could have these common routines above the done code and jump there to execute the common code.
You certainly have done well - congratulations
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔ Links to other interesting threads:
This is all very impressive. The speed will be fine. Most cp/m programs don't really need speed (99% of the time cp/m is just waiting for a keypress). And if you really want speed (eg compiling), I've found it quicker to do it in the altair simh (which simulates a 180mhz z80). Looking at the z80 vs 8080 opcodes, and looking at examples of z80 source code, there may not be much urgency to get all the z80 opcodes in as most code doesn't use them. LDIR is the only instruction I use that is z80 specific (and even that comes with examples of 8080 instructions to replace it).
I wonder if a higher priority might be to get 64k ram working rather than z80 opcodes? CP/M is totally 8080 as are a large number of cp/m programs. But there are lots of programs that need more than 24k ram. And if the source code is available (which it often is), it may well be easier to recode a cp/m source code with no z80 opcodes than to code the equivalent z80 opcode into this emulator. Re ram, that is a bit tricky as it is going to use up a huge number of pins all in one go. What size ram is probably the first decision. Do you go 64k or bigger?
@potatohead: I'll post the revs here, if you like, as I reach "islands of stability". As Linus Torvalds said "Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it"
@hinv: Someone who has a real 8080 in an Altair out there on the internet said he was getting 350KIPs from a 1MHz CPU if I remember correctly. So we have a correct emulation of the speed here :)
@Cluso: I'm having a real hard time to get your dispatch table in COG suggestion to work. I presume when you said "Your table will just be long versions of your word table in hub now." you really meant it would be a list of JMP instructions to the handlers. I can't see how it could work otherwise. Anyway I tried the idea out in a small experimental program, attached, and got it to work just fine. But when I incorporate it into one of my emulator COGs everything goes crazy. Seems to be dispatching to the right places, at least sometimes, but then funny things start to happen to the HUB program counter and such. When I use the single step mode I can see it has trouble swapping from one COG to another.
Not sure what I would do with three addresses in one vector just now.
I've been all around the MOV and control transfer COGs inlining code and replacing calls with jumps and such. Some inlining I will leave until we are "stable".
Z80 extensions:
This is interesting. Basically the new instructions are created by either using an undocumented 8080 opcode for the instruction or by using an undocumented op as a "prefix" to a whole bunch of other new instructions.
In the former case the Z80 has 5 new relative jump instructions that can probably just go in the current control transfer COG. And it has two instructions for swapping things between the alternate register sets that can probably be squeezed in just about anywhere.
In the latter case we need another COG or two. One could use dispatch tables to decode things after the "prefix byte" but just now this looks like a lot of table. Or possibly one could decode the ops in program logic. For example there are hundreds of bit set/reset/test instructions where I think it may be sensible just to extract the bit, the register and the operation in code and set some parameterized code to perform it.
Not sure I want to worry about the speed of Z80 ops. Apart from block moves maybe. As Dr_Acula says, most of the CP/M world does not use many of the ops very much. So LMM or overlays would be fine by me. I have a sneaking feeling those who want to emulate a Sinclair ZX81 or Spectrum may have more of a problem as Uncle Clive always used devices in devious ways.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
And yes, your skip code should follow, saving the jump anyway.
Yes, the way I coded the "if_z jmp op" you would need the vector table to be made of jumps which takes another instruction. This was not my intention, but instead to use the op code to jump indirectly via the vector.
xor op,#$00 '<- $00, $40, $80, $C0 for each of the 4 pages (op codes b7,b6) - $00 could be skipped
movs :jp,op
test op,#$C0 wz 'correct page ? (the above xor makes the correct page bits => 00)
:jp if_z jmp 0-0 'jump via the vector (source field patched by the movs above)
'fall through to #skip
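For anyone following along without a Prop to hand, the page test can be modelled in a few lines of Python. This is only a sketch of the idea, not PASM and not the emulator's code: the names PAGE_BASES, owns_opcode, dispatch and make_table are made up for illustration, and each "cog" is assumed to hold a 64-entry table covering its own quarter of the opcode space.

```python
# Python model of the per-cog page test. All names are illustrative only.
PAGE_BASES = (0x00, 0x40, 0x80, 0xC0)  # opcode bits b7,b6 pick the owning cog


def owns_opcode(op, page_base):
    """Mirror of: xor op,#page_base  then  test op,#$C0 wz."""
    return ((op ^ page_base) & 0xC0) == 0


def dispatch(op, page_base, table):
    """Return the handler from this cog's 64-entry table, or None to skip."""
    x = op ^ page_base
    if x & 0xC0:
        return None      # not our page: fall through to skip
    return table[x]      # the remaining low 6 bits index the local table


def make_table(page_base):
    """Hypothetical handlers that just report the opcode they stand for."""
    return [lambda op=page_base + i: op for i in range(64)]
```

Because the XOR clears the two page bits only in the owning cog, exactly one of the four tables ever gets indexed for a given opcode.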
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
After inlining a lot of code, dispensing with calls/returns etc as discussed above, and moving the dispatch table into the COGs using Cluso's super neat table look up and dispatch technique, we have now hit 500KIPS.
As released here it runs the same CPU diagnostic as previously and reports 446KIPS.
Running through a bunch of MOV reg,reg it hits 621KIPs.
@Cluso: Many thanks for your table look up idea. I have used the "JMPS in table" version here. At the end of the day the same number of instructions are executed. I won't change it until you've explained to me how to use all the space in those LONGS.
Also, any idea what the code will look like to do random byte read/writes on the HexaPropTurboBladeHyperComputeSurface thingy ? Ideally we can throw out the existing distinction between RAM and ROM (Only used for CP/M so far) and replace it with routines for linear access to external RAM.
As usual any further suggestions on turbo charging this a tad more are surely welcome.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Vector table:
You can place 3 addresses in the long vectors plus 5 bits that can be shifted into the carry bit for testing.
xor op,#$00 '<- $00, $40, $80, $C0 for each of the 4 pages (op codes b7,b6) - $00 could be skipped
mov vector,op
test op,#$C0 wz 'correct page ? (the above xor makes the correct page bits => 00)
:jp if_z jmp vector 'jump via the vector
'fall through to #skip
'call 2nd and 3rd vector by
' jmpret v_ret,v
v shr vector,#9 'shift to 2nd/3rd vector
jmp vector 'go execute
v_ret jmp 0-0 'may or may not be used depending on code by "jmp v_ret" (not jmp #v_ret)
vector long 0-0 'holds vectors
'for testing the bits, you will need to know if the vector has been shifted (presume it hasn't here)
'to test bit 30...
rcl vector,#2 wc,nr 'get bit 30 into carry, don't writeback
if_c jmp ... 'NOTE- no penalty if jmp not taken (i.e. 4 cycles always taken or not)
'for testing 2 bits b30 & b31
rcr vector,#30 wc,wz,nr 'puts c=bit 30, z=-bit31(zero result)
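The arithmetic behind "3 addresses plus 5 bits" is 3 x 9 + 5 = 32, since a cog address is 9 bits. A rough Python sketch of one possible packing follows. The exact field order is an assumption (the post only says the 2nd/3rd vectors are reached by shifting right 9 bits and that the spare bits sit at the top), and pack_vector, unpack_vector and flag_bit are illustrative names.

```python
ADDR_BITS = 9
ADDR_MASK = (1 << ADDR_BITS) - 1   # cog addresses run $000-$1FF


def pack_vector(a1, a2, a3, flags=0):
    """Pack three 9-bit cog addresses and five flag bits into one 32-bit long.

    Assumed layout: a1 in bits 0-8, a2 in bits 9-17, a3 in bits 18-26,
    flags in bits 27-31.
    """
    for a in (a1, a2, a3):
        if not 0 <= a <= ADDR_MASK:
            raise ValueError("address does not fit in 9 bits")
    if not 0 <= flags < 32:
        raise ValueError("flags do not fit in 5 bits")
    return a1 | (a2 << 9) | (a3 << 18) | (flags << 27)


def unpack_vector(v):
    """Return (a1, a2, a3, flags), like repeated shr vector,#9."""
    return (v & ADDR_MASK,
            (v >> 9) & ADDR_MASK,
            (v >> 18) & ADDR_MASK,
            (v >> 27) & 0x1F)


def flag_bit(v, bit):
    """Test one of the top bits (27..31) of an unshifted vector."""
    return (v >> bit) & 1
```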
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
i8080_emu has started mutating into a full Z80 emulation.
In this release I have added all the z80 registers, including the alternate set.
All the single byte Z80 ops are implemented. Well, that's not many: only relative jumps, DJNZ, EX and EXX.
I have put a "hook" where all the multi byte Z80 instructions end up. Currently un-handled.
This release is configured to single step through 8080 code every time you hit a key. Displaying the registers after each step.
If you hold down a key long enough it will step to the end of the T8080 CPU diagnostic and display that a 01 was output on port 00 indicating a pass result.
Speed on the CPU diagnostic test is still 446KIPs, I lost a few fractions of a percent shrinking the code here and there.
I have included a file listing all the Z80 ops yet to do. There are only about 510 of them !!
Anyone got any advice on how to deal with them? There is space enough in at least two of the COGs to do about 200 LONGs worth of work so I can probably squeeze all the 250 odd bit operations into there somewhere. But what about the other 250 ops? I could throw another COG or two at them but that seems like such a waste for instructions that are rarely used. Cluso has hinted at some sneaky overlay mechanism which may be just what I need here.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Seeing that bit, set and res are quite a few of those 510, the count should be easily down to at least 499! Just joking! You are really squeezing the thing. There is a Spectrum emulator in 4096 bytes of x86 assembler, complete with screen and z80 emulation. I have yet to understand it (I have just looked through it). It is called bacteria. Maybe it is of some inspiration.
Great work !
Oh thanks a lot for that titbit Ale. Now I'm really depressed. I bet bacteria just screams along on a modern PC. Think I'll just delete my emulator and throw myself off the balcony.
Wait a minute. My original PropAltair 8080 emulator was only 1 COG and a bit of code say 2.5Kbytes. I'm not doing so bad. That balcony looks a bit unsafe I think I'll stay inside.
Thing is, every 8080 instruction (not sure about Z80) has an almost direct equivalent in x86. There are some minor differences in the way flags are set sometimes.
When Intel introduced the 8086 they gave away a tool, Conv86, that translated instruction for instruction from 8080 assembler source to 8086 source.
If those flag differences caused something not to behave correctly there was a switch you could set that would add some extra flag twiddling instructions to make it correct.
I used Conv86 once when we had to get about 10 man years of 8085 code running on 8088. Worked a treat.
So I imagine writing an 8080/z80 emulator in x86 asm is much easier than doing it on any other architecture. Still bacteria sounds impressive.
No. I will not take a look at bacteria. Doctor won't give me any more Prozac.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Don't get depressed - you are actually making a huge amount of progress!
Most of those 500 instructions are set, reset and bit. Given these are hardly used then a slower emulation that takes less code space is the better choice. So say you have an instruction that sets a bit on register D. Do all the bit changes on register A, so you can replace that code with existing code eg push af, ld a,d, ... set the bit etc on a ... ld d,a, pop af (or the equivalent in 8080 mnemonics). There might be a problem there with your existing mov/ld code if you have optimised that by using jumps rather than calls - then you can't call it from another location. Then again, the above might take more code space than actually setting and resetting bits directly. In whatever case, the code is going to end up with rows of instructions that look very similar to each other. Copy and paste will probably get a workout.
If it takes another cog does that matter? I'm just thinking that Vince has just put a photo on the n8vem website of his working propeller terminal board, all populated and working. So if there is a terminal board doing all the vga/keyboard and all that stuff, then your prop could be free to use all of the cogs if you like. Down the track, maybe merge two props into one, but for the moment would it be ok to not worry too much about needing more cog space?
I'm only just getting into how the z80 opcodes are laid out. Starting with BIT/SET/RES. Of which there are 240!
However this can be very straightforward as most are of the form "SET bit_number, register" and have a two byte instruction sequence:
11001011
OOBBBRRR
Where OO selects the operation (01 = BIT, 10 = RES, 11 = SET), BBB specifies the bit and RRR the register.
So we only have to extract the RRR bits and use them to index to the correct register in the register file. Extract the BBB bits and use them to shift a bit mask to the right place then we are done with an "OR target_reg, bit_mask" in PASM (also ANDN and TEST). Only the zero flag to worry about.
The rest of them operate on memory via HL, IX, IY and have a similar regular structure we can handle in a few LONGS of PASM.
What I'm saying is that rather than fully decode the instructions by use of a dispatch table and having hundreds of similar looking routines we use program logic to analyse the instruction fields and arrive at the correct operation and target register or memory location.
The dispatch table is fast but big, program logic can be slower but very small. Fine for all those mostly unused ops.
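To show how little "program logic" this needs, here is a Python model of the field decoding for the register forms. It is a sketch, not the PASM that would go in the COG: exec_cb and REG_NAMES are made-up names, the register file is a plain dict, only the Z result of BIT is modelled, and the (HL) memory form and the rotate/shift group are deliberately left out.

```python
# RRR field encoding on the Z80; index 6 is the (HL) memory operand.
REG_NAMES = ("B", "C", "D", "E", "H", "L", "(HL)", "A")


def exec_cb(op2, regs):
    """Execute BIT/RES/SET from the byte that follows the CB prefix.

    regs maps register names to byte values. Returns the Z flag for BIT
    (True when the tested bit is 0), otherwise None.
    """
    group = op2 >> 6            # 01 = BIT, 10 = RES, 11 = SET
    bit = (op2 >> 3) & 7        # BBB field
    reg = REG_NAMES[op2 & 7]    # RRR field
    if group == 0:
        raise NotImplementedError("rotate/shift group not modelled")
    if reg == "(HL)":
        raise NotImplementedError("memory operand not modelled")
    mask = 1 << bit
    if group == 1:              # BIT: like PASM TEST
        return (regs[reg] & mask) == 0
    if group == 2:              # RES: like PASM ANDN
        regs[reg] &= ~mask & 0xFF
    else:                       # SET: like PASM OR
        regs[reg] |= mask
    return None
```

For example CB DA is SET 3,D, CB 80 is RES 0,B and CB 7F is BIT 7,A, and all three go through the same dozen lines.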
Haven't looked much at the other 250 ops! But looks like a similar trick can be done with the rotate/shifts.
Of the 200 remaining ops most start with a prefix byte DD or FD which selects which index register to use, IX or IY. We can just select the correct index reg with a few PASM instructions and then continue decoding with common code. Straight away that reduces things down to only 100 or so operations.
Many of those 100 ops use an instruction field to select a register B,C,D,.. so again we use a few PASM instructions to select the target register and continue with common code. That divides the 100 by 7 or so and we are down to about 15 distinct operations.
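A small Python model of that two-step reduction, using the LD r,(HL) / LD r,(IX+d) / LD r,(IY+d) family as the example. The function names and the tuple it returns are invented for illustration. The point is only that the prefix picks the pointer register and the rrr field picks the destination, after which one piece of common code does the work.

```python
# rrr field encoding; 110 is not a register (01 110 110 is HALT).
DEST_REGS = ("B", "C", "D", "E", "H", "L", None, "A")


def select_pointer(code, pc):
    """DD selects IX, FD selects IY, no prefix means plain HL."""
    if code[pc] == 0xDD:
        return "IX", pc + 1
    if code[pc] == 0xFD:
        return "IY", pc + 1
    return "HL", pc


def decode_ld_r_indirect(code, pc=0):
    """Decode LD r,(HL) or LD r,(IX+d) or LD r,(IY+d).

    Returns (dest_register, pointer_register, displacement, length).
    """
    pointer, p = select_pointer(code, pc)
    op = code[p]
    dest = DEST_REGS[(op >> 3) & 7]
    if (op & 0xC7) != 0x46 or dest is None:
        raise ValueError("not an LD r,(pointer) opcode")
    p += 1
    disp = 0
    if pointer != "HL":
        disp = code[p]          # the indexed forms carry a signed offset
        if disp > 127:
            disp -= 256
        p += 1
    return dest, pointer, disp, p - pc
```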
At this point I start to think we could fit the whole thing into the space remaining in the 4 COGs. Probably not, so having decoded as much as possible it's time to do the op from some overlay code.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I have been thinking about adding external sram and its impact on your code. If when a cog completes its 8080 instruction, it reads the next opcode and places this into a fixed hub location, and then updates the pc in hub, no cogs will have sram contention because they will just fetch the opcode from a fixed hub location (which has been pre-fetched from sram).
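A toy Python model of that hand-over, just to pin the idea down. SharedHub, finish_instruction and peek_opcode are hypothetical names, the sram is a 64K bytearray, and the read counter exists only so the single external read per instruction can be checked.

```python
class SharedHub:
    """Stand-in for the few hub longs shared by all the emulator cogs."""

    def __init__(self, sram):
        self.sram = sram            # 64K external memory image
        self.pc = 0
        self.opcode = sram[0]       # first opcode is pre-loaded before start
        self.sram_reads = 1


def finish_instruction(hub, next_pc):
    """Called only by the cog that has just executed an instruction."""
    next_pc &= 0xFFFF
    hub.opcode = hub.sram[next_pc]  # the one and only external read
    hub.sram_reads += 1
    hub.pc = next_pc                # publish the PC after the opcode


def peek_opcode(hub):
    """What a waiting cog does: a hub read, never an sram access."""
    return hub.opcode
```

The PC is written after the opcode so that a cog which sees the new PC also sees the matching opcode.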
Here are a few ideas...
You can preload your variables/constants (ram_base, etc) in spin directly into the hub ram before loading. This will save code space in the cogs.
Suggest you place the "patch jmp ....." as an extra entry to the dispatch table (at the end) and do the replace as the first instruction at "Initialise" - it just makes the code more readable (keeping the patch entry together with the table).
I would like to understand basically what the cpm implements in regards to peripherals to see what impact the sram will have on this and the easiest/fastest method to interface to the sram. Is the terminal part of the code or can it be external (i.e. keyboard and vga/tv in another prop)?
Place the code "alu_arith_flags" immediately after one of the instructions that jumps to it - saves an instruction - try and place after the most likely used opcode which is probably "cmp" (not the alu_add_with_carry/borrow ones). Do likewise with "alu_logic_flags" - probably most used is "and".
For the opcodes "do_mov_b_b" etc, just code the dispatch table to jmp #done. They are not likely used so this is unlikely to improve speed, but saves code space which you may later need, or allow better inline coding for highly used opcodes. Such as inlining "read_rom" inside "read_memory_byte" to save 2 instructions executing.
For the instructions like this "jmp #read_memory_byte_ret" remove the "#" to save an extra jmp being executed.
I think that the read and write words have to be word aligned in 8080, so you probably could simplify to a hub word read. Not sure if the bits need swapping (endian). I think the prop and 8080 are the same (Motorola is the opposite).
Hope this helps. Please, please, don't take this the wrong way - you have done a fantastic job.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Re "I would like to understand basically what the cpm implements in regards to peripherals to see what impact the sram will have on this and the easiest/fastest method to interface to the sram. Is the terminal part of the code or can it be external (i.e. keyboard and vga/tv in another prop)?"
CP/M is very complex but if the 8080 codes are all emulated then it will just run and you don't have to worry about what it is doing internally. The custom bits, as you say, are the terminal/keyboard etc. At the very simplest level, these come down to implementing the OUT and IN instructions. These read or write a byte from/to the data bus. For an OUT, on the chip the IORQ line goes low, the WR line goes low, the lower 8 address lines carry the port number (0-255), and the data lines output the byte. In hardware, you decode the port number using chips like the HC138 1-of-8 decoder, so one of the output lines goes low whenever it sees its own address. At the simplest level, you could then latch the byte onto a HC373. Or to be a bit more complex, latch bytes into a UART. UARTs and PIO chips usually occupy a range of addresses (eg 5) so you can output to one of 5 addresses.
You don't usually need very many address lines decoded - maybe 10 or so on a typical board.
So then it is a matter of how you implement OUT and IN. If you are driving sram, then you already have a data bus (8 lines) and an address bus (16 lines). You can use the lower 8 of the address lines (or even less, if you are only decoding out ports 0-7 you only need 3 lines). You will already have RD and WR going to the sram. In such a setup you would not have MREQ working because all RD and WR signals are only going to sram, but if you want to sometimes address ram and sometimes io, then you just need two extra pins on the prop - MREQ which goes low to indicate a sram address, and IORQ to indicate a port address.
This assumes you are writing to a real external chip, which might be a 373 latch, or it might be an 8255 PIO or it might be a 16C550 uart. But the prop can do things like the uart! So instead of an OUT instruction sending out a byte on some real pins, you could trap some addresses and instead write that byte to a location in hub ram. You might write another byte to another hub ram location with a value 1 to indicate there is a byte there. Then another cog could be checking that hub ram location and if there is a byte there, it could go off and send it out with the standard uart/serial code.
Thinking of a real prop board, the most useful thing would be some serial ports. So IN and OUT instructions could jump to a cog that handles several serial lines. Maybe you want some parallel outputs/inputs as well, so you could add some HC373s for outputs and some HC244s for inputs. But they may not be needed. A very useful prop emulation would have vga, ps2 keyboard, 2 serial ports and sd card for mass storage and 64k or 128k sram. In that configuration, you don't need a physical pin devoted to IORQ (or to MREQ) because OUT and IN instructions never do anything directly to the prop pins. They just jump to either routines to run prop uart code, or a routine to send a byte out to the vga screen (via a terminal emulation in code) or check if a byte has come in from the keyboard cog routine. Or you could have the keyboard and vga on a separate prop (see Vince's one or even the demo board) but that still ends up being just one standard serial connection at (say) 9600 baud.
Pre-loading parameters prior to loading the COG I was thinking about this morning, saves about 25 LONGs in each COG!
Agreed about the "patch", should generally try to keep things tidy. But I thought I'd adopt your dispatch table of LONGs idea in which case we don't need "patch" LONG.
As far as CP/M peripherals are concerned everything I have so far goes through IN/OUT instructions. Basically I/O mapped peripherals. That I/O space is totally independent of the RAM space. What I have done is to write port number and data to some shared HUB variables along with a command indicating IN or OUT. From there another COG looks at the command and port and responds as a UART or floppy disk controller. The emulator COG does a loop waiting for the I/O cog to sort itself out and set the command back to zero. Currently the I/O cog is actually the main start up COG as well, and I/O is handled slowly in Spin. It could just as well be PASM in another COG.
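The mailbox scheme described above can be modelled in Python with two threads standing in for the two COGs. This is only a sketch under assumptions: the command values, the IoMailbox fields, the LoopbackPort device and the "return $FF for an unknown port" behaviour are all invented for the example and are not taken from the emulator source.

```python
import threading
import time

CMD_IDLE, CMD_IN, CMD_OUT = 0, 1, 2      # assumed command codes


class IoMailbox:
    """Stand-in for the shared HUB variables."""

    def __init__(self):
        self.command = CMD_IDLE
        self.port = 0
        self.data = 0


class LoopbackPort:
    """Hypothetical device: reads return the last byte written."""

    def __init__(self):
        self.last = 0

    def write(self, value):
        self.last = value

    def read(self):
        return self.last


def emulator_out(mb, port, value):
    mb.port = port
    mb.data = value & 0xFF
    mb.command = CMD_OUT                 # command is written last
    while mb.command != CMD_IDLE:        # wait for the I/O cog to finish
        time.sleep(0)


def emulator_in(mb, port):
    mb.port = port
    mb.command = CMD_IN
    while mb.command != CMD_IDLE:
        time.sleep(0)
    return mb.data


def io_service(mb, devices, stop):
    """The I/O cog: poll the command and act as the addressed device."""
    while not stop.is_set():
        if mb.command == CMD_OUT:
            dev = devices.get(mb.port)
            if dev is not None:
                dev.write(mb.data)
            mb.command = CMD_IDLE        # hand control back to the emulator
        elif mb.command == CMD_IN:
            dev = devices.get(mb.port)
            mb.data = dev.read() & 0xFF if dev is not None else 0xFF
            mb.command = CMD_IDLE
        else:
            time.sleep(0)
```

Run io_service in its own thread with a dict of port number to device, then call emulator_out and emulator_in from the "emulator" thread. Setting the command back to zero is the only handshake.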
When we come to emulating hard disks things might get a bit more tricky as the hard disk driver in my CP/M BIOS uses DMA. The BIOS I am using is exactly that used in the SIMH Altairz80 emulator and I'd rather not hack it around and create yet another BIOS. However given that CP/M is not multi-tasking it waits for the DMA sector transfer to complete before moving on. So we can give the hard disk simulation code access to the RAM whilst CP/M is waiting I think.
Agreed about "alu_arith_flags" and "do_mov_b_b".
I thought perhaps with a 64K external RAM space available we would get rid of read_rom. Just write the 256 bytes of boot ROM to the RAM at FF00 before we start. read_rom is only there at the moment because I don't have a contiguous address space and have to map RAM and ROM areas into the HUB.
remove the "#" - Cool
Word accesses do NOT have to be word aligned in 8080. Which is a pain for us. It is little endian like the Prop.
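In other words a 16 bit access has to be treated as two byte accesses, low byte first. A minimal Python sketch, assuming a flat 64K bytearray for memory and 16 bit address wrap-around at the top of memory (read_word and write_word are illustrative names):

```python
def read_word(mem, addr):
    """Little-endian 16-bit read that works at any address, odd or even."""
    lo = mem[addr & 0xFFFF]
    hi = mem[(addr + 1) & 0xFFFF]   # wraps from $FFFF to $0000
    return lo | (hi << 8)


def write_word(mem, addr, value):
    """Little-endian 16-bit write, low byte at the lower address."""
    mem[addr & 0xFFFF] = value & 0xFF
    mem[(addr + 1) & 0xFFFF] = (value >> 8) & 0xFF
```

On the Prop a single rdword would only be a valid shortcut when the address happens to be even, which is why the byte-wise form is the safe default.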
I'm still trying to understand the consequences of your paragraph about pre-fetching opcodes. But get this:
Messing around this morning I discovered I could put 5 or 6 NOPs after the PC has been written out and before reading it back in again in the "skip" loop without any noticeable effect on the KIPS. Well perhaps 0.1%. Those NOPs are just wasting "dead" time after a COG has given up.
So given that we already have the opcode, which we have decided to skip, we can write it to HUB in free time! and save the next COG having to read it again. Which I think is what you were saying. Probably makes no sense to do when using HUB RAM but at least I can try it out.
Cluso, I don't take anything the wrong way. I've always said I'm open to suggestions. May never have got this far without help and encouragement.
So actually a big thank you to yourself and everyone else who has contributed. We are not "done" yet.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Heater, that is great news. The nops probably mean that the code timing is not really dependent on this bit. The more space you get in a cog, the more you can unravel the high usage opcodes.
Yes, I agree that 64K contiguous RAM will solve the rom although I think you should check writes to ensure that they don't clobber ROM.
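The write check could be as small as one compare in the byte write routine. A Python sketch, assuming the 256 byte boot ROM image sits at FF00 as mentioned earlier in the thread. load_rom and write_byte are made-up names, and silently ignoring the write is just one possible policy.

```python
ROM_BASE = 0xFF00     # boot ROM copied into the top page of RAM
ROM_SIZE = 0x100


def load_rom(mem, image):
    """Copy the boot ROM image into place before the emulator starts."""
    if len(image) != ROM_SIZE:
        raise ValueError("boot ROM image must be 256 bytes")
    mem[ROM_BASE:ROM_BASE + ROM_SIZE] = image


def write_byte(mem, addr, value):
    """Store a byte unless the address falls inside the ROM page.

    Returns True if the write happened, False if it was ignored.
    """
    addr &= 0xFFFF
    if ROM_BASE <= addr < ROM_BASE + ROM_SIZE:
        return False              # write protected: leave the ROM alone
    mem[addr] = value & 0xFF
    return True
```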
Shame 8080 words are not aligned. May still be quicker to see if they are aligned?
And yes, the prefetching before passing back control is for RAM use only, but should not impact the speed using hub.
The big part is your I/O. This is great news because you can trap and replace the code easily with other code. I know you are actually doing this now anyway. I guess I was just trying to understand it a little better. The main reason is that I believe the terminal (vga/tv and keyboard or a pc terminal) may be on another prop, with I/O pins as well. The microSD card should be on the emulator prop together with the SRAM (512KBytes), so one hdisk could be sram of approx 448KBytes (or 360KB for floppy size) - this would fly !!!
Dr_Acula: Yes, I understand the hardware level, but as you have said it is not important in an emulation. This is what I was trying to get to understand - how the emulation was being done.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
This sounds better and better every day. It's going to turn out much better than I ever hoped when I started over a year ago.
As for timing, that's why I did the experiment with the NOPs at that point in the code. For sure the "exiting" COG has nothing to do at that point and the others are busy trying to figure out if it is their turn. I just wondered how much we could squeeze in there. Like testing for Z80 op prefixes maybe.
For sure we should "write protect" the ROM.
Alignment checking had occurred to me also.
Actually shouldn't opcode pre-fetching help when using HUB RAM as well ? As it is all COGs have to go through read_memory_byte to get the op.
For sure the terminal can be on another Prop. OBC's PropComm VT100 would be just fine, modified to use what ever communication scheme you have rather than FullDuplexSerial or whatever it is now.
Given fast enough Prop to Prop communication the SD card could be elsewhere as well. At the moment disks are very slow as the I/O is all handled in Spin going through the Spin interface to the sd spi driver. This needs turbo charging with PASM.
RAM disks would be a treat and easy to do.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
The RAM disk would be fantastic!!! You can load up the RAM disk from a microSD or SPI Flash (1-64Mbit) chip.
The SD Card can and should be on the same prop as it will speed things up. I have designed it to be so, just that it will have to be wired because I ran out of time to place one on the pcb. But Peter J. squeezed a socket on for us and I am planning on some extras on the top edge of my pcb. I want it to be extremely versatile/flexible.
The terminal will fit nicely on another prop - OBC's VT100 will be just fine. For this application, we will not require the SRAM for the VT100. But hey, we can have color !!!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
This whole idea about passing already fetched opcodes from a "skipping" COG to a "starting" COG does not work out for me. In this release I have a version of such a dispatch loop in each COG. (It is commented out, but it does work if you swap it in). Basically I lose 10% or more of speed instead of gaining! Perhaps it starts to win if we have a really slow external memory interface.
Perhaps Cluso or someone could dream up a quicker version.
Anyway the main point of this was to prevent contention for external RAM between COGs. This release achieves that. By the simple expedient of pre-loading the first opcode to be executed into each COG before starting we can avoid them racing to get it. After that no one accesses 8080 memory space while in the "skip" state.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Very nice page to download all CPM versions and support files for Altair Z80 CPM
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔ Nothing is impossible, there are only different degrees of difficulty. For every stupid question there is at least one intelligent answer. Don't guess - ask instead. If you don't ask you won't know. If your gonna construct something, make it·as simple as·possible yet as versatile as posible.
I'll second that. SIMH is a fantastic program. It emulates a 170 MHz Z80 on my PC, and that is on a 6 year old machine. It is so fast that I've incorporated SIMH into a vb.net IDE that shells out to SIMH to compile programs and then downloads them to a real Z80, because that is much faster than compiling on a real board. One-touch compile, download and run takes about 10 seconds, which is much faster than it ever was on a real Z80. SIMH is actually essential to the process of developing code quickly. You can type instructions into it manually, telnet to it from HyperTerminal etc, or even send it batch instructions.
I think heater has got up to emulating about a 1 MHz machine, maybe a bit more, but one thing I've found is that emulation speed is not all that important for running programs - indeed I'm underclocking some chips to save power. Where you want a fast machine is for compiling, not for running, so SIMH shines for compiling, and the result can then run on a slower emulation.
We haven't heard from heater for a while and I presume he is deep inside the code, possibly with less hair than he started with. This little project has the potential to put a 20 chip circuit into a single 40 pin Prop chip (ok, maybe with an SRAM as well). I hope heater hasn't given up whilst in the middle of tediously coding those single bit set and reset instructions.
Sapieha, someone is not paying attention here ;)
The PropAltair CP/M on a Propeller has been using CP/M and other resources from the SIMH Altairz80 project since the beginning. In fact one of my primary aims with PropAltair is to be able to use the CP/M disk images from SIMH unchanged. That is, using the same CP/M BIOS and Altair bootloader etc as SIMH. I did not want to have to maintain yet another customized CP/M for the Propeller.
i8080_emu is an entirely rewritten emulator using 4 COGs as opposed to PropAltair only using 1. The idea being to boost speed and eventually fit in all the Z80 opcodes that PropAltair is missing (8080 only). So far i8080_emu is released without any CP/M or BASIC or whatever, just a CPU emulator and some test code. The idea being that it may be useful to others for different machine simulators.
Dr_A, I'm still here tinkering with all things Prop, CPM and Z80. However I'm starting to despair again about the performance of this 4 COG emulator. I fitted out the old single COG emulator with the frequency counter object and set it up to run the same tests as i8080_emu. Well the bottom line is that when running the T8080 CPU diagnostics i8080_emu is all of 3.5% faster !!!.
So as it stands we are just wasting 3 COGs for 3.5% gain. I'm not sure where to go next with this. Write it off as a failure. Hope that some genius figures a way to turbo it. Just add the Z80 ops as LMM to the old 1 COG emulator......
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Comments
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Wiki: Share the coolness!
Chat in real time with other Propellerheads on IRC #propeller @ freenode.net
Safety Tip: Life is as good as YOU think it is!
A few suggestions to speed it up...
Re using a pin for synchronisation, I suggest you just invert (xor) it each time you complete an instruction. The other cogs can read the pin at skip and then use waitpeq or waitpne. I am sure this will be faster.
If the read_rom routine is only called from read_memory_byte then you could code this within the read_memory_byte routine to save the call and ret instructions.
It seems all your do_add_x, do_sub_x, etc (in i8080_mu_2) always do
Perhaps this could be coded
Just some quick ideas - I haven't fully checked your code to ensure these routines are not called elsewhere and as such could make it fail.
You have done well
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:
· Home of the MultiBladeProps (SixBladeProp)
· Prop Tools under Development or Completed (Index)
· Emulators (Micros eg Altair, and Terminals eg VT100) - index
· Search the Propeller forums (via Google)
My cruising website is: ·www.bluemagic.biz
Post Edited (Cluso99) : 2/8/2009 1:36:17 PM GMT
Been really struggling with how to lay out some things in the 6502 project. There is some help here big time. Thanks a bunch for posting it up.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I have saved a lot of COG space by turning those arithmetic and logical functions into things that you jump to after having collected the parameters; from there they jump out to #done. No CALLs and RETs. As per Cluso's suggestion, I think.
Did a similar thing with many ops in the jmp/call/ret COG.
Now reading the A reg and Flags together as a word for the arithmetic ops. Does not save much but something. Strangely, writing them out as a word had no noticeable effect.
Cluso, I like your sneaky XOR to check for the right page in the dispatch table. Luckily with the changes I just made there is now room in all the COGs for their share of the table.
However I was wondering if it may be better to save the COG space for Z80 opcodes. Bah, what the hell, Z80 can wait a bit.
As for the pin I will try that again at some point, I'm sure I must have implemented it badly.
One surprise was that I had previously only written out the PC when a COG was done with its turn, reasoning that it was a waste of time when a COG is steaming through a bunch of instructions that are all "his". So the PC only got written when the COG came to skip.
Turns out it is quicker to write the PC after every op. I guess that gives the other COGs a head start on checking the op.
This is fortunate because it makes the "single step" feature easier.
I was just checking over this nice 8085/Z80 instruction set table nemesis.lonestar.org/computers/tandy/software/apps/m4/qd/opcodes.html, wow, looks like all those extra Z80 ops could eat another 4 COGs!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Seeing that happen is instructive!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
What was the original KIPS of the 8080?
You can place 3 addresses and some flags into the 32 bit vector, which can call different subroutines - see my ClusoInterpreter for how I have done this. Quite quick if you can break code into common sections.
For the Z80, maybe you will need a few more cogs; just put the little used instructions into an overlay cog. Believe me, the overlay is quicker than LMM style. This is where I started in the ClusoInterpreter, before I put the decode table into hub.
If you get to using external SRAM, then I have ideas to ensure only 1 cog accesses at a time without having to do contention.
I saw there were some routines that could be sped up using some simple inline coding, but you probably should leave that for extra instructions for now.
Some of your code does the same things before jumping to done, so you could have these common routines above the done code and jump there to execute the common code.
You certainly have done well - congratulations
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
@hinv: Someone who has a real 8080 in an Altair out there on the internet said he was getting 350KIPs from a 1MHz CPU, if I remember correctly. So we have a correct emulation of the speed here :)
@Cluso: I'm having a real hard time getting your dispatch table in COG suggestion to work. I presume when you said "Your table will just be long versions of your word table in hub now." you really meant it would be a list of JMP instructions to the handlers. I can't see how it could work otherwise. Anyway I tried the idea out in a small experimental program, attached, and got it to work just fine. But when I incorporate it into one of my emulator COGs everything goes crazy. It seems to be dispatching to the right places, at least sometimes, but then funny things start to happen to the HUB program counter and such. When I use the single step mode I can see it has trouble swapping from one COG to another.
Not sure what I would do with three addresses in one vector just now.
I've been all around the MOV and control transfer COGs inlining code and replacing calls with jumps and such. Some inlining I will leave until we are "stable".
Z80 extensions:
This is interesting. Basically the new instructions are created by either using an undocumented 8080 opcode for the instruction or by using an undocumented op as a "prefix" to a whole bunch of other new instructions.
In the former case the Z80 has 5 new relative jump instructions that can probably just go in the current control transfer COG. And it has two instructions for swapping things between the alternate register sets that can probably be squeezed in just about anywhere.
In the latter case we need another COG or two. One could use dispatch tables to decode things after the "prefix byte" but just now this looks like a lot of table. Or possibly one could decode the ops in program logic. For example there are hundreds of bit set/reset/test instructions where I think it may be sensible just to extract the bit, the register and the operation in code and set some parameterized code to perform it.
Not sure I want to worry about the speed of Z80 ops. Apart from block moves maybe. As Dr_Acula says, most of the CP/M world does not use many of the ops very much. So LMM or overlays would be fine by me. I have a sneaking feeling those who want to emulate a Sinclair ZX81 or Spectrum may have more of a problem, as Uncle Clive always used devices in devious ways.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
You left the "#" off the JMP #SKIP !!!!!
Turns out with my arrangement of code we don't need that jump anyway.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
And yes, your skip code should follow, saving the jump anyway.
Yes, the way I coded the "if_z jmp op" you would need the vector table to be made of jumps which takes another instruction. This was not my intention, but instead to use the op code to jump indirectly via the vector.
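For anyone following along, here is a rough Python model of the page check discussed above (xor with the page, test the top two bits, jump via the low six bits). It is only a sketch: the names make_cog, PAGE_BASES and the string results are made up for illustration, and the real thing is a handful of PASM instructions in each COG.

```python
# Hypothetical model: four "COGs", each owning one 64-entry page of the
# 256-entry opcode space, selected by opcode bits 7 and 6.
PAGE_BASES = (0x00, 0x40, 0x80, 0xC0)

def make_cog(page_base, handlers):
    """Return a dispatch function for the COG that owns page_base.

    handlers is a 64-entry list standing in for the vector table kept
    at the bottom of COG RAM.
    """
    def dispatch(opcode):
        op = opcode ^ page_base       # xor op,#page : own page bits become 00
        if op & 0xC0 == 0:            # test op,#$C0 wz
            return handlers[op]()     # if_z jmp op  : op is now the table index
        return "skip"                 # not this page, another COG owns it
    return dispatch

def make_handlers(page_base):
    # Dummy handlers that just report which table entry was reached.
    return [lambda b=page_base, i=i: f"page {b:02X} entry {i}" for i in range(64)]

cogs = [make_cog(base, make_handlers(base)) for base in PAGE_BASES]
results = [cog(0x41) for cog in cogs]   # only the $40 page COG should take it
```

With opcode $41 only the second COG dispatches (to its entry 1); the other three fall through to skip.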
After inlining a lot of code and dispensing with calls/returns etc as discussed above, and moving the dispatch table into the COGs using Cluso's super neat table look up and dispatch technique, we have now hit 500KIPS.
As released here it runs the same CPU diagnostic as previously and reports 446KIPS.
Running through a bunch of MOV reg,reg it hits 621KIPs.
@Cluso: Many thanks for your table look up idea. I have used the "JMPS in table" version here. At the end of the day the same number of instructions are executed. I won't change it until you've explained to me how to use all the space in those LONGS.
Also, any idea what the code will look like to do random byte read/writes on the HexaPropTurboBladeHyperComputeSurface thingy ? Ideally we can throw out the existing distinction between RAM and ROM (Only used for CP/M so far) and replace it with routines for linear access to external RAM.
As usual any further suggestions on turbo charging this a tad more are surely welcome.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Vector table:
You can place 3 addresses in the long vectors plus 5 bits that can be shifted into the carry bit for testing.
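A quick sketch of how such a vector could be packed, in Python rather than PASM. COG addresses are 9 bits, so three of them take 27 bits and leave 5 spare in a 32 bit long. The field order used here (addresses in the low 27 bits, flags on top) is purely an assumption for illustration, as are the function names; the actual layout in the ClusoInterpreter may differ.

```python
ADDR_BITS = 9
ADDR_MASK = (1 << ADDR_BITS) - 1    # $1FF, a COG address is 0..511
FLAG_MASK = 0x1F                    # 5 spare bits

def pack_vector(addr0, addr1, addr2, flags=0):
    """Pack three COG addresses and 5 flag bits into one 32-bit long."""
    for addr in (addr0, addr1, addr2):
        if not 0 <= addr <= ADDR_MASK:
            raise ValueError("COG address out of range")
    if not 0 <= flags <= FLAG_MASK:
        raise ValueError("only 5 flag bits are available")
    return (addr0
            | (addr1 << ADDR_BITS)
            | (addr2 << (2 * ADDR_BITS))
            | (flags << (3 * ADDR_BITS)))

def unpack_vector(vector):
    """Split a packed long back into (addr0, addr1, addr2, flags)."""
    return (vector & ADDR_MASK,
            (vector >> ADDR_BITS) & ADDR_MASK,
            (vector >> (2 * ADDR_BITS)) & ADDR_MASK,
            (vector >> (3 * ADDR_BITS)) & FLAG_MASK)

vector = pack_vector(0x040, 0x123, 0x1FF, 0b10101)
fields = unpack_vector(vector)
```

In PASM the equivalent unpacking would be shifts and masks on the fetched long, with the flag bits shifted out through carry for testing.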
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
i8080_emu has started mutating into a full Z80 emulation.
In this release I have added all the z80 registers, including the alternate set.
All the single byte Z80 ops are implemented. Well, that's not many: only the relative jumps, DJNZ, EX and EXX.
I have put a "hook" where all the multi byte Z80 instructions end up. Currently un-handled.
This release is configured to single step through 8080 code every time you hit a key. Displaying the registers after each step.
If you hold down a key long enough it will step to the end of the T8080 CPU diagnostic and display that a 01 was output on port 00 indicating a pass result.
Speed on the CPU diagnostic test is still 446KIPs, I lost a few fractions of a percent shrinking the code here and there.
I have included a file listing all the Z80 ops yet to do. There are only about 510 of them !!
Anyone got any advice on how to deal with them? There is space enough in at least two of the COGs to do about 200 LONGs worth of work so I can probably squeeze all the 250 odd bit operations into there somewhere. But what about the other 250 ops? I could throw another COG or two at them but that seems like such a waste for instructions that are rarely used. Cluso has hinted at some sneaky overlay mechanism which may be just what I need here.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (heater) : 2/11/2009 4:40:10 PM GMT
Great work !
Wait a minute. My original PropAltair 8080 emulator was only 1 COG and a bit of code say 2.5Kbytes. I'm not doing so bad. That balcony looks a bit unsafe I think I'll stay inside.
Thing is, every 8080 instruction (not sure about Z80) has an almost direct equivalent in x86. There are some minor differences in the way flags are set some times.
When Intel introduced the 8086 they gave away a tool, Conv86, that translated instruction for instruction from 8080 assembler source to 8086 source.
If those flag differences caused something not to behave correctly there was a switch you could set that would add some extra flag twiddling instructions to make it correct.
I used Conv86 once when we had to get about 10 man years of 8085 code running on 8088. Worked a treat.
So I imagine writing an 8080/Z80 emulator in x86 asm is much easier than doing it on any other architecture. Still, bacteria sounds impressive.
No. I will not take a look at bacteria. Doctor won't give me any more Prozac.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Yes, you are probably correct about an x86 doing the emulation easily, with few instructions, and faster (GHz). But that is not what you are doing.
Take a look at my overlay routines here (listed in the Prop Tools thread in my signature)
Assembly Overlay Loader for Cog FAST (renamed & released)
http://forums.parallax.com/forums/default.aspx?f=25&m=272823
If you need any help just shout
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Most of those 500 instructions are set, reset and bit. Given these are hardly used, a slower emulation that takes less code space is the better choice. So say you have an instruction that sets a bit on register D. Do all the bit changes on register A, so you can replace that code with existing code, e.g. push af, ld a,d, ... set the bit etc on a ... ld d,a, pop af (or the equivalent in 8080 mnemonics). There might be a problem there with your existing mov/ld code if you have optimised that by using jumps rather than calls - then you can't call it from another location. Then again, the above might take more code space than actually setting and resetting bits directly. In whatever case, the code is going to end up with rows of instructions that look very similar to each other. Copy and paste will probably get a workout.
If it takes another cog does that matter? I'm just thinking that Vince has just put a photo on the n8vem website of his working propeller terminal board, all populated and working. So if there is a terminal board doing all the vga/keyboard and all that stuff, then your prop could be free to use all of the cogs if you like. Down the track, maybe merge two props into one, but for the moment would it be ok to not worry too much about needing more cog space?
I'm only just getting into how the z80 opcodes are laid out. Starting with BIT/SET/RES. Of which there are 240!
However this can be very straightforward, as most are of the form "SET bit_number, register" and have a two byte instruction sequence:
11001011
xxBBBRRR
Where xx selects the operation (01 = BIT, 10 = RES, 11 = SET), BBB specifies the bit and RRR the register.
So we only have to extract the RRR bits and use them to index to the correct register in the register file, extract the BBB bits and use them to shift a bit mask to the right place, and then we are done with an "OR target_reg, bit_mask" in PASM (also ANDN and TEST). Only the zero flag to worry about.
The rest of them operate on memory via HL, IX, IY and have a similar regular structure we can handle in a few LONGS of PASM.
What I'm saying is that rather than fully decode the instructions by use of a dispatch table and having hundreds of similar looking routines we use program logic to analyse the instruction fields and arrive at the correct operation and target register or memory location.
The dispatch table is fast but big, program logic can be slower but very small. Fine for all those mostly unused ops.
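To make the "program logic instead of table" idea concrete, here is a small Python sketch of decoding the byte that follows the CB prefix. In the Z80 encoding the top two bits of that byte select BIT (01), RES (10) or SET (11), the next three the bit number and the low three the register. The names exec_cb, regs and flags are invented for this example, "(HL)" is just a dictionary slot standing in for the memory operand, and only the zero flag is modelled.

```python
REG_NAMES = ("B", "C", "D", "E", "H", "L", "(HL)", "A")   # RRR = 0..7

def exec_cb(second_byte, regs, flags):
    """Execute one CB-prefixed BIT/RES/SET op by decoding its fields."""
    group = second_byte >> 6              # 01 = BIT, 10 = RES, 11 = SET
    bit = (second_byte >> 3) & 7          # BBB
    reg = REG_NAMES[second_byte & 7]      # RRR
    mask = 1 << bit
    if group == 1:                        # BIT: Z is set when the bit is 0
        flags["Z"] = 1 if (regs[reg] & mask) == 0 else 0
        return f"BIT {bit},{reg}"
    if group == 2:                        # RES: clear the bit (ANDN in PASM)
        regs[reg] &= ~mask & 0xFF
        return f"RES {bit},{reg}"
    if group == 3:                        # SET: set the bit (OR in PASM)
        regs[reg] |= mask
        return f"SET {bit},{reg}"
    raise NotImplementedError("group 00 holds the rotate/shift ops")

regs = {name: 0 for name in REG_NAMES}
flags = {"Z": 0}
trace = [exec_cb(b, regs, flags) for b in (0xDA, 0x5A, 0x9A, 0x5A)]
```

One routine of this shape covers all 192 register forms; the IX/IY indexed forms would need the operand fetched from memory first.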
Haven't looked much at the other 250 ops! But looks like a similar trick can be done with the rotate/shifts.
Of the 200 remaining ops most start with a prefix byte DD or FD which selects which index register to use IX or IY we can just select the correct index reg with a few PASM instructions and then continue decoding with common code. Straight away that reduces things down to only 100 or so operations.
Many of those 100 ops use an instruction field to select a register B,C,D,.. so again we use a few PASM instructions to select the target register and continue with common code. That divides the 100 by 7 or so and we are down to about 15 distinct operations.
At this point I start to think we could fit the whole thing into the space remaining in the 4 COGs. Probably not, so having decoded as much as possible it's time to do the op from some overlay code.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (heater) : 2/12/2009 8:17:50 AM GMT
I have been thinking about adding external sram and its impact on your code. If, when a cog completes its 8080 instruction, it reads the next opcode, places this into a fixed hub location, and then updates the pc in hub, no cogs will have sram contention because they will just fetch the opcode from a fixed hub location (which has been pre-fetched from sram).
Here are a few ideas...
You can preload your variables/constants (ram_base, etc) in spin directly into the hub ram before loading. This will save code space in the cogs.
Suggest you place the "patch jmp ....." as an extra entry to the despatch table (at the end) and do the replace as the first instruction at "Initialise" - it just makes the code more readable (keeping the patch entry together with the table).
I would like to understand basically what the cpm implements in regards to peripherals to see what impact the sram will have on this and the easiest/fastest method to interface to the sram. Is the terminal part of the code or can it be external (i.e. keyboard and vga/tv in another prop)?
Place the code "alu_arith_flags" immediately after one of the instructions that jumps to it - saves an instruction - try and place after the most likely used opcode which is probably "cmp" (not the alu_add_with_carry/borrow ones). Do likewise with "alu_logic_flags" - probably most used is "and".
For the opcodes "do_mov_b_b" etc, just code the dispatch table to jmp #done. They are not likely used so this is unlikely to improve speed, but saves code space which you may later need, or allow better inline coding for highly used opcodes. Such as inlining "read_rom" inside "read_memory_byte" to save 2 instructions executing.
For the instructions like this "jmp #read_memory_byte_ret" remove the "#" to save an extra jmp being executed.
I think that the read and write words have to be word aligned in 8080, so you probably could simplify to a hub word read. Not sure if the bits need swapping (endian). I think the prop and 8080 are the same (Motorola and 6502 are opposite).
Hope this helps
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
CP/M is very complex, but if the 8080 codes are all emulated then it will just run and you don't have to worry about what it is doing internally. The custom bits, as you say, are the terminal/keyboard etc. At the very simplest level, these come down to implementing the OUT and IN instructions. These read or write a byte from/to the data bus. For an OUT, on the chip the IORQ line goes low, the WR line goes low, the lower 8 address lines carry the port number (0-255), and the data lines output the byte. In hardware, you decode 0 to 255 using chips like the HC138 1 of 8 decoders, so one of the output lines goes low whenever it gets its correct address. At the simplest level, you could then latch the byte onto a HC373. Or to be a bit more complex, latch bytes into a UART. UARTs and PIO chips usually have a range of addresses (eg 5) so you can output one of 5 numbers.
You don't usually need very many address lines decoded - maybe 10 or so on a typical board.
So then it is a matter of how you implement OUT and IN. If you are driving sram, then you already have a data bus (8 lines) and an address bus (16 lines). You can use the lower 8 of the address lines (or even less, if you are only decoding out ports 0-7 you only need 3 lines). You will already have RD and WR going to the sram. In such a setup you would not have MREQ working because all RD and WR signals are only going to sram, but if you want to sometimes address ram and sometimes io, then you just need two extra pins on the prop - MREQ which goes low to indicate a sram address, and IORQ to indicate a port address.
This assumes you are writing to a real external chip, which might be a 373 latch, or it might be an 8255 PIO or it might be a 16C550 uart. But the prop can do things like the uart! So instead of an OUT instruction sending out a byte on some real pins, you could trap some addresses and instead write that byte to a location in hub ram. You might write another byte to another hub ram location with a value 1 to indicate there is a byte there. Then another cog could be checking that hub ram location and if there is a byte there, it could go off and send it out with the standard uart/serial code.
Thinking of a real prop board, the most useful thing would be some serial ports. So IN and OUT instructions could jump to a cog that handles several serial lines. Maybe you want some parallel outputs/inputs as well, so you could add some HC373s for outputs and some HC244s for inputs. But they may not be needed. A very useful prop emulation would have vga, ps2 keyboard, 2 serial ports and sd card for mass storage and 64k or 128k sram. In that configuration, you don't need a physical pin devoted to IORQ (or to MREQ) because OUT and IN instructions never do anything directly to the prop pins. They just jump to either routines to run prop uart code, or a routine to send a byte out to the vga screen (via a terminal emulation in code) or check if a byte has come in from the keyboard cog routine. Or you could have the keyboard and vga on a separate prop (see Vince's one or even the demo board) but that still ends up being just one standard serial connection at (say) 9600 baud.
Pre-loading parameters prior to loading the COG I was thinking about this morning, saves about 25 LONGs in each COG!
Agreed about the "patch", should generally try to keep things tidy. But I thought I'd adopt your dispatch table of LONGs idea in which case we don't need "patch" LONG.
As far as CP/M peripherals are concerned, everything I have so far goes through IN/OUT instructions. Basically I/O mapped peripherals. That I/O space is totally independent of the RAM space. What I have done is to write the port number and data to some shared HUB variables along with a command indicating IN or OUT. From there another COG looks at the command and port and responds as a UART or floppy disk controller. The emulator COG loops, waiting for the I/O COG to sort itself out and set the command back to zero. Currently the I/O COG is actually the main start up COG as well, and I/O is handled slowly in Spin; it could just as well be PASM in another COG.
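Roughly, that handshake looks like the following Python sketch. It is a single-threaded stand-in only: the Mailbox class, the command values and the poll callback (which plays the part of the other COG running in parallel) are all made up for illustration and are not the names used in the emulator source.

```python
CMD_IDLE, CMD_IN, CMD_OUT = 0, 1, 2     # hypothetical command codes

class Mailbox:
    """Stand-in for the shared HUB variables."""
    def __init__(self):
        self.command = CMD_IDLE
        self.port = 0
        self.data = 0

def io_cog_poll(mbox, out_log, in_queue):
    """One pass of the I/O COG loop: service a pending request, if any."""
    if mbox.command == CMD_OUT:
        out_log.append((mbox.port, mbox.data))
    elif mbox.command == CMD_IN:
        mbox.data = in_queue.pop(0) if in_queue else 0
    else:
        return
    mbox.command = CMD_IDLE             # tell the emulator COG we are done

def emu_out(mbox, port, value, poll):
    """Emulator side of OUT: post the request, then wait for completion."""
    mbox.port = port & 0xFF
    mbox.data = value & 0xFF
    mbox.command = CMD_OUT              # set last, once port and data are valid
    while mbox.command != CMD_IDLE:
        poll()

def emu_in(mbox, port, poll):
    """Emulator side of IN: post the request, wait, return the byte."""
    mbox.port = port & 0xFF
    mbox.command = CMD_IN
    while mbox.command != CMD_IDLE:
        poll()
    return mbox.data

mbox, out_log, in_queue = Mailbox(), [], [0x41]
poll = lambda: io_cog_poll(mbox, out_log, in_queue)
emu_out(mbox, 0x00, 0x01, poll)         # like the diagnostic's 01 on port 00
key = emu_in(mbox, 0x01, poll)
```

The one ordering detail that matters is writing the command last, so the I/O side never sees a command before its port and data are in place.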
When we come to emulating hard disks things might get a bit more tricky as the hard disk driver in my CP/M BIOS uses DMA. The BIOS I am using is exactly that used in the SIMH Altairz80 emulator and I'd rather not hack it around and create yet another BIOS. However given that CP/M is not multi-tasking it waits for the DMA sector transfer to complete before moving on. So we can give the hard disk simulation code access to the RAM whilst CP/M is waiting I think.
Agreed about "alu_arith_flags" and "do_mov_b_b".
I thought perhaps with a 64K external RAM space available we would get rid of read_rom. Just write the 256 bytes of boot ROM to the RAM at FF00 before we start. read_rom is only there at the moment because I don't have a contiguous address space and have to map RAM and ROM areas into the HUB.
remove the "#" - Cool
Word accesses do NOT have to be word aligned in 8080. Which is a pain for us. It is little endian like the Prop.
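In other words a 16 bit access has to be built from two byte accesses, low byte first, and the second byte may even wrap around the top of the 64K space. A minimal Python sketch of that, with made-up function names and a bytearray standing in for 8080 memory:

```python
def read_word(mem, addr):
    """Read a little-endian 16-bit word from any (odd or even) address."""
    lo = mem[addr & 0xFFFF]
    hi = mem[(addr + 1) & 0xFFFF]       # wraps from $FFFF back to $0000
    return lo | (hi << 8)

def write_word(mem, addr, value):
    """Write a little-endian 16-bit word to any (odd or even) address."""
    mem[addr & 0xFFFF] = value & 0xFF
    mem[(addr + 1) & 0xFFFF] = (value >> 8) & 0xFF

mem = bytearray(0x10000)
write_word(mem, 0x1001, 0x1234)         # odd address, perfectly legal on 8080
write_word(mem, 0xFFFF, 0xABCD)         # straddles the end of memory
odd_word = read_word(mem, 0x1001)
wrapped_word = read_word(mem, 0xFFFF)
```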
I'm still trying to understand the consequences of your paragraph about pre-fetching opcodes. But get this:
Messing around this morning I discovered I could put 5 or 6 NOPs after the PC has been written out and before reading it back in again in the "skip" loop without any noticeable effect on the KIPS. Well perhaps 0.1%. Those NOPs are just wasting "dead" time after a COG has given up.
So given that we already have the opcode, which we have decided to skip, we can write it to HUB in free time! and save the next COG having to read it again. Which I think is what you were saying. Probably makes no sense to do when using HUB RAM but at least I can try it out.
Cluso, I don't take anything the wrong way. I've always said I'm open to suggestions. May never have got this far without help and encouragement.
So actually a big thank you to yourself and everyone else who has contributed. We are not "done" yet.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Yes, I agree that 64K contiguous RAM will solve the rom although I think you should check writes to ensure that they don't clobber ROM.
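The check itself is tiny. A Python sketch, assuming the 256 byte boot ROM image sits at FF00 as mentioned earlier in the thread; the names ROM_BASE and write_byte are only for illustration, and in PASM this would be a compare and a conditional skip in the write routine:

```python
ROM_BASE = 0xFF00                       # assumed start of the 256-byte boot ROM

def write_byte(mem, addr, value):
    """Write to emulated memory, silently ignoring writes into the ROM area.

    Returns True if the byte was stored, False if it was blocked.
    """
    addr &= 0xFFFF
    if addr >= ROM_BASE:                # ROM occupies $FF00-$FFFF
        return False
    mem[addr] = value & 0xFF
    return True

mem = bytearray(0x10000)
mem[0xFF00] = 0xC3                      # pretend first byte of the boot ROM
stored_ram = write_byte(mem, 0xFEFF, 0x55)
stored_rom = write_byte(mem, 0xFF00, 0x00)
```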
Shame 8080 words are not aligned. May still be quicker to see if they are aligned?
And yes, the prefetching before passing back control is for RAM use only, but should not impact the speed using hub.
The big part is your I/O. This is great news because you can trap and replace the code easily with other code. I know you are actually doing this now anyway. I guess I was just trying to understand it a little better. The main reason is that I believe the terminal (vga/tv and keyboard or a pc terminal) maybe on another prop, with I/O pins as well. The microSD card should be on the emulator prop together with the SRAM (512KBytes), so one hdisk could be sram of approx 448KBytes (or 360KB for floppy size) - this would fly !!!
Dr_Acula: Yes, I understand the hardware level, but as you have said it is not important in an emulation. This is what I was trying to get to understand - how the emulation was being done.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
As for timing , that's why I did the experiment with the NOPs at that point in the code. For sure the "exiting" COG has nothing to do at that point and the others are busy trying to figure out if it is their turn. I just wondered how much we could squeeze in there. Like testing for Z80 op prefixes maybe.
For sure we should "write protect" the ROM.
Alignment checking had occurred to me also.
Actually shouldn't opcode pre-fetching help when using HUB RAM as well ? As it is all COGs have to go through read_memory_byte to get the op.
For sure the terminal can be on another Prop. OBC's PropComm VT100 would be just fine, modified to use whatever communication scheme you have rather than FullDuplexSerial or whatever it is now.
Given fast enough Prop to Prop communication the SD card could be elsewhere as well. At the moment disks are very slow as the I/O is all handled in Spin going through the Spin interface to the sd spi driver. This needs turbo charging with PASM.
RAM disks would be a treat and easy to do.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
The RAM disk would be fantastic!!! You can load up the RAM disk from a microSD or SPI Flash (1-64Mbit) chip.
The SD Card can and should be on the same prop as it will speed things up. I have designed it to be so, just that it will have to be wired because I ran out of time to place one on the pcb - But, Peter J. squeezed a socket on for us
The terminal will fit nicely on another prop - OBC's VT100 will be just fine
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
This is really for Clusso.
This whole idea about passing already fetched opcodes from a "skipping" COG to a "starting" COG does not work out for me. In this release I have a version of such a dispatch loop in each COG. (It is commented out, but it does work if you swap it in). Basically I lose 10% or more of speed instead of gaining ! Perhaps it starts to win if we have a really slow external memory interface.
Perhaps Cluso or someone could dream up a quicker version.
Anyway the main point of this was to prevent contention for external RAM between COGs. This release achieves that. By the simple expedient of pre-loading the first opcode to be executed into each COG before starting we can avoid them racing to get it. After that no one accesses 8080 memory space while in the "skip" state.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (heater) : 2/15/2009 5:29:41 PM GMT
Sorry that this is a little off topic.
The SIMH Altair 8800 Z80 simulator: "http://www.schorn.ch/cpm/intro.php".
A very nice page for downloading all the CP/M versions and support files for Altair Z80 CP/M.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Nothing is impossible, there are only different degrees of difficulty.
For every stupid question there is at least one intelligent answer.
Don't guess - ask instead.
If you don't ask you won't know.
If you're going to construct something, make it as simple as possible yet as versatile as possible.
Sapieha
I think heater has got up to emulating about a 1 MHz machine, maybe a bit more, but one thing I've found is that emulation speed is not all that important for running programs; indeed, I'm underclocking some chips to save power. Where you want a fast machine is for compiling, not for running, so SIMH shines when it comes to compiling, and you can then run the result under emulation at a slower speed.
We haven't heard from heater for a while and I presume he is deep inside the code, possibly with less hair than he started with. This little project has the potential to put a 20 chip circuit into a single 40 pin prop chip (ok, maybe with a sram as well). I hope heater hasn't given up whilst in the middle of tediously coding those single bit set and reset instructions.
The PropAltair CP/M on a Propeller has been using CP/M and other resources from the SIMH Altairz80 project since the beginning. In fact one of my primary aims with PropAltair is to be able to use the CP/M disk images from SIMH unchanged. That is, using the same CP/M BIOS and Altair bootloader etc as SIMH. I did not want to have to maintain yet another customized CP/M for the Propeller.
i8080_emu is an entirely rewritten emulator using 4 COGs as opposed to PropAltair only using 1. The idea being to boost speed and eventually fit in all the Z80 opcodes that PropAltair is missing (8080 only). So far i8080_emu is released without any CP/M or BASIC or whatever, just a CPU emulator and some test code. The idea being that it may be useful to others for different machine simulators.
Dr_A, I'm still here tinkering with all things Prop, CP/M and Z80. However, I'm starting to despair again about the performance of this 4 COG emulator. I fitted out the old single COG emulator with the frequency counter object and set it up to run the same tests as i8080_emu. Well, the bottom line is that when running the T8080 CPU diagnostic, i8080_emu is all of 3.5% faster!!!
So as it stands we are just wasting 3 COGs for a 3.5% gain. I'm not sure where to go next with this. Write it off as a failure? Hope that some genius figures out a way to turbo it? Just add the Z80 ops as LMM to the old 1 COG emulator......
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔