Take the Dracblade. Remove all the latches. Remove the Sram. Put a 64k serial ram chip on the eeprom bus. Implement a Sphinx OS that frees up 14k of hub ram. Maybe toss out the LCD code for the moment, and the wireless layer, and the upper 512k code, and toss out the ramblade code too. Maybe optimise the VT100 code a bit. I think that should get us to 16k of free hub ram, maybe more.
Put a ram driver in the cog that is currently running the sram driver code. This new ram driver handles a list of 256 ram blocks of 256 bytes each.
The list handling is going to be a priority list. Each time a block is accessed you add 1 to a counter for that block. Rank them in order. If a new block is needed, take the lowest ranking one, put it into serial ram, and then get the new block. Can this all fit into a cog? I think it should. Is the serial ram driver code the same as the eeprom driver code, and if so, is this already somewhere anyway (?? in the sd card object).
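To make the priority list idea concrete, here is a rough host-side model in Python (not PASM, and not anything that exists in the Dracblade code). The class name, the `resident_blocks` parameter and the choice to forget a block's counter when it is evicted are all assumptions made for the sketch; a bytearray stands in for the serial ram chip.

```python
class BlockCache:
    """Toy model: hub RAM keeps a few 256-byte blocks, serial RAM keeps them all."""
    BLOCK_SIZE = 256

    def __init__(self, resident_blocks, total_blocks=256):
        # Stand-in for the 64k serial RAM chip (assumption: plain byte array).
        self.backing = bytearray(total_blocks * self.BLOCK_SIZE)
        self.capacity = resident_blocks   # how many blocks fit in free hub RAM
        self.resident = {}                # block number -> bytearray held in hub
        self.hits = {}                    # block number -> access counter
        self.swaps = 0                    # number of evictions so far

    def _load(self, block):
        if block in self.resident:
            return
        if len(self.resident) >= self.capacity:
            # Take the lowest ranking block and put it back into serial RAM.
            victim = min(self.resident, key=lambda b: self.hits[b])
            start = victim * self.BLOCK_SIZE
            self.backing[start:start + self.BLOCK_SIZE] = self.resident.pop(victim)
            del self.hits[victim]         # assumption: counter is dropped on eviction
            self.swaps += 1
        start = block * self.BLOCK_SIZE
        self.resident[block] = bytearray(self.backing[start:start + self.BLOCK_SIZE])
        self.hits[block] = 0

    def read(self, addr):
        block, offset = divmod(addr, self.BLOCK_SIZE)
        self._load(block)
        self.hits[block] += 1             # add 1 to the counter on every access
        return self.resident[block][offset]

    def write(self, addr, value):
        block, offset = divmod(addr, self.BLOCK_SIZE)
        self._load(block)
        self.hits[block] += 1
        self.resident[block][offset] = value & 0xFF
```

With only two resident blocks, touching a third address range forces the least used block out to the backing store, and reading it again later brings it back with its contents intact. A small program whose working set fits in the resident blocks never swaps at all, which is the case argued for here.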
Just looking at serial ram now: SPI or I2C. Code exists for both, I think.
This could halve the size of the dracblade board for starters, and decrease the chip count from 9 to 4. Plus free up a number of propeller pins for audio or more serial ports.
Agree a block write then read from serial ram will be slow, but that ought to happen only very infrequently. Possibly never for a small sbasic/c/assembly program.
We can't do this now because there are 7 blocks of 2k code sitting in ram in random locations.
A thought? Maybe we can use it without even needing sphinx! Just tell the serial ram driver cog the locations of the 7 blocks of 2k code, and any more free code area. It can then have a simple list of where it keeps each block of 256 bytes.
Well Bill is THE inventor of the LMM technique for the Prop. So if he thinks he's on to something we should all sit up and pay attention.
This idea of a COG handling external memory with caches etc may not be as fast as the direct xxxBlade approach we have now, but for those who want to save pins, and for the up-and-coming ZPU emulator, it should be a very good compromise.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Any chance that you could do a quick port of ZiCog to the VMCOG interface? It would really exercise VMCOG, and allow running benchmarks with various paging strategies (and working set sizes).
I think I am only a few days away from having a preliminary VMCOG running, and need something real to test against (other than the VMCOG_Debugger).
I believe I will have VMCOG (roughly) running sometime next week, and would love to try Cogz/ZiCog (either/both) in about a week
heater said...
Bill, I'd love to but I'm not sure it's possible to find the time for some days. This Cogz spurt has taken my quota of free time for a while.
It is the chip used in some mp3 players. Z80 opcodes, keyboard, onboard usb2, dac and adc, onboard ram (more than enough for cp/m), I2C, uarts, SD card, direct output to headphones, mpeg decoder, and onboard dc-dc converter so it can run off 1 AA battery.
But- what it can't do; VGA and TV display plus the distinct lack of really detailed data and code.
I'm thinking hybrids. Absolute minimalist three chip hybrids, propeller and this chip and eeprom. Maybe you don't even need the eeprom - emulate it in the ATJ2085? Off to do some more research...
I was idly looking at STM8Sxxxx chips and saw that they appear to have very Z80 ish op-codes, then I saw that one of the evaluation boards is only £4.25 from Farnell, so one has been ordered out of pure curiosity.
Sorry if this is a bit "Leonish".
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Style and grace : Nil point
I was playing with the memory routines at the end of the DracBlade version, and surprise surprise there aren't any free longs. I could go all 8080 ish, but that would be veering towards the Abacus joke again.
Was there any written stuff about the overlaying techniques, so that a bit of free space could be had, as well as the adding of a few more Z80 ops (eventually)?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Style and grace : Nil point
Toby, that's a very big question. I know you want a full/fullish Z80 for the sake of NASCOM.
There is nothing much written about the overlay technique. I think in that last ZiCog package I put out there was still a small example of using overlays in a spin file. That came from Cluso and is the only way I worked out how to use the overlays.
Anyway it might not help much to know. Last time I was wrapping my head around the ZiCog space issue I had pretty much run out of things to put into overlays. You see it's not just a case of "let's rip this opcode handler out of resident space and put it in an overlay". The problem is that most of the code that is left as resident code now is not entire Z80 opcode handlers but small parts of handlers that are used by many ops. Little micro-ops like "get this", "put that", "push this", "pop that" etc. They are combined into complete Z80 ops by the way the instruction dispatch table works.
If I remember correctly the only complete Z80 ops left in resident were the tiny little STC (set carry) and CLC (clear carry) or some such. So there are a couple of longs to be had by making those into overlays.
I did spot a redundant LONG in there once. I think it was a JMP to a label that is on the very next instruction! Can't remember if that was removed yet.
One possibility is coming up on the horizon that may save you. Bill Henning is working on a Virtual Memory system for the Prop, VMCog. If that was used by ZiCog then actual access to physical RAM would no longer be in ZiCog but in VMCog. This would also mean the read_memory_byte and write_memory_byte routines would get smaller. It would also be nice because ZiCog itself would not have to change when using different memory hardware. Might be a bit slower though.
I don't know if Bill is working on a ZiCog with VMCog but it has to be done at some point I think. Just now I'm trying to get my head around using VMCog by adapting Zog to it.
Not very helpful am I.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
I am sure that the real way around my problems is to go back to nailing a true Z80, and use the Prop for all the peripherals. BladeX and DracBlade just got so tantalizingly close.
I shall go and vent my frustrations, on slugs, cats and elves (now which one was I allowed to nail down .... )
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Style and grace : Nil point
Toby: I found some longs for Drac a while back. IIRC I found 2 more and it should be mentioned on the older post around the time Drac was getting his latched dracblade working. Not sure you are going to find any more space. This is why I said a long time ago that I didn't believe a latched design was workable and I would rather use a second prop for the I/O where 30 pins are available for whatever the user wants.
BTW: I intend to use the extra longs I found to improve the performance of ZiCog for reading & writing words and separating the fetch and read bytes to improve the fetching. I can remove the call and return and the address move and increment from the fetch which is at least a 3 instruction improvement.
I tried SphinxOS with ZiCog. However, it fails because ZiCog uses 4 cogs (spin, zicog z80, SD driver and sram driver). SphinxOS uses 3 cogs for SD plus 1 or 2 for the I/O plus 1 for spin which can be killed. Not enough cogs to pass control to ZiCog. I have to work a way to stop the SD or reduce the number of cogs it uses.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔ Links to other interesting threads:
Perhaps you should check out the creations of the N8VEM project. They have been building Z80 cards of various types for a while and now they have a card using a Prop for I/O, SD/Video. Start here http://forums.parallax.com/showthread.php?p=878280
I have one of Dr_A's mini N8VEM Z80 cards, it's great. One day I'll find time for the PropIO as well.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
If I wanted CP/M then I am spoilt for choice with "the Blades" With the Nascom monitors I would have to change the I/O callings and it would run as if it were in serial terminal mode rather than the memory mapped screen, so "the Blades" will run bog standard BASICS just as well, with far better choices.
I think that the Nascom is just a compulsive disorder.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Style and grace : Nil point
I think that is what you are looking for - not quite there yet but I don't see it being too far away - just need to finish diskIO on the first pass that we are doing.
You're right, I should stop bouncing off the commitment to have a simple Z80 and ram, force fed from a Prop at boot and then serviced by it afterwards. I still have two cmos Z80s and one PIO here.
The iron was feeling neglected anyway.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Style and grace : Nil point
I already have the "Blade2 with PropCMD" board to try that, so no ironing required, I guess.
I think the Birdsnest will be used to try switching out the EEPROM after boot, and then using those pins for the KBD ( or KBD and VID in your case ) and then sticking the VGA up onto P24-P27. That will leave me with 20 or 24 (with no SD) free pins, in a row.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Style and grace : Nil point
first I noticed a wrong declaration in zicog.spin:
im_reg := reg_base + 8
should be
im_reg := reg_base + 16
Further I think it is possible to save a couple of longs by doing the following:
' reorder register_file as follows
bc_reg word 0
de_reg word 0
hl_reg word 0
af_reg word 0
ix_reg word 0
iy_reg word 0
sp_reg word 0
pc_reg word 0
im_reg word 0
bc_reg_alt word 0
de_reg_alt word 0
hl_reg_alt word 0
af_reg_alt word 0
' entries for put 8bit register functions
put_c jmpret put_r8, put_r8
put_b jmpret put_r8, put_r8
put_e jmpret put_r8, put_r8
put_d jmpret put_r8, put_r8
put_l jmpret put_r8, put_r8
put_h jmpret put_r8, put_r8
put_f jmpret put_r8, put_r8
put_a jmpret put_r8, put_r8
put_lx jmpret put_r8, put_r8
put_hx jmpret put_r8, put_r8
put_ly jmpret put_r8, put_r8
put_hy jmpret put_r8, put_r8
put_r8 mov ri, 0-0 ' ri is one of put_b, put_e, put_d etc.
sub ri, #put_b ' ri is 0, 4, 8 ...
shr ri, #2 ' ri is 0, 1, 2 ...
add ri, reg_base ' reg base + index
wrbyte data_8, ri
jmp #fetch
put_bc jmpret put_r16, put_r16
put_de jmpret put_r16, put_r16
put_hl jmpret put_r16, put_r16
put_af jmpret put_r16, put_r16
put_bc2 jmpret put_r16, put_r16
put_de2 jmpret put_r16, put_r16
put_hl2 jmpret put_r16, put_r16
put_af2 jmpret put_r16, put_r16
put_im jmpret put_r16, put_r16
put_ix jmpret put_r16, put_r16
put_iy jmpret put_r16, put_r16
put_sp jmpret put_r16, put_r16
put_pc jmpret put_r16, put_r16
put_r16 mov ri, 0-0 ' ri is one of put_de, put_hl etc.
sub ri, #put_de ' ri is 0, 4, 8, ...
shr ri, #1 ' ri is 0, 2, 4, ...
add ri, reg_base ' reg base + index
wrword data_16, ri
jmp #fetch
The tradeoff is between just 1 long per entry on the one hand and the calculation of the retaddr into an index on the other. This would make the c_reg, b_reg etc. obsolete, and also require a change in the way exx and ex af,af' are coded. I haven't counted how many longs this saves, specifically if you would otherwise add accessors for lx, hx, ly and hy.
The same principle can of course be applied to the get_ functions.
Then here's an (untested) rewrite of the DAA function, based on MAME's Z80 core code. See the attachment for the code.
Juergen
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
He died at the console of hunger and thirst.
Next day he was buried. Face down, nine edge first.
Your comments are most interesting. I see some comments over on the catalina thread as well - I might answer them here as this is the zicog thread (though there is lots of overlap on various threads).
For the dracblade board I added a few extra instructions. I think there are four things you have to change to add an instruction but it is fairly easy and a matter of following examples.
The problem of it not fitting in a cog has been solved by using 'overlays', though I must confess I have absolutely no idea how overlays actually work. I have a vague understanding that somehow they shift codespace from the cog into hub ram. That works on all the current boards (though on the dracblade even hub ram is almost all used up).
In any case, zicog is portable code that can be run on several different platforms, just by changing the small piece of code that handles ram access.
IX and IY and a whole lot of other instructions are not done yet.
It would be great to work together to add more instructions. What sort of hardware do you have?
Dr_Acula said...
It would be great to work together to add more instructions. What sort of hardware do you have?
None yet. I'm expecting to receive this board by the end of this week. I have a USB to serial adapter here and hope it will work. Otherwise I would have to solder up a serial cable. Fortunately my PC still has a real serial port. I also have a FBAS monitor from the late 1980s somewhere in the shed, so I'll be going with PAL FBAS. And for the keyboard and mouse: I'm probably going to add two connectors on the raster area that connect to the same pins as the Parallax demo board. I'm not sure yet if and how to add additional (S)RAM and/or a SD card slot.
For the overlays and space constraints: I pretty much understood the problem, and the solution with overlays is probably the best thing you can do. It's like swapping in opcodes on demand, which isn't all that fast, but the opcodes are the less often used ones or the ones which repeat (the overlay) many times. As far as I can tell ZiCog is driving near the edge of the ways that are possible to walk on a single cog. I don't see paths to simplify what it does. Even my suggestion from above adds a lot of latency to the get/put accessors, just to save some (perhaps a dozen?) more longs for code.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
He died at the console of hunger and thirst.
Next day he was buried. Face down, nine edge first.
I thought about the jmpret approach a little more. If you wasted some hub ram and aligned registers on long boundaries, then there could be one group of accessors to the LSBs, one for the MSBs, removing the need to shift the register index.
bc_reg long 0
de_reg long 0
...
put_c jmpret put_r8, put_r8
put_e jmpret put_r8, put_r8
put_l jmpret put_r8, put_r8
put_f jmpret put_r8, put_r8
put_lx jmpret put_r8, put_r8
put_ly jmpret put_r8, put_r8
put_r8 mov ri, 0-0 ' ri is one of put_b, put_e, put_d etc.
sub ri, #put_e ' ri is 0, 4, 8 ...
add ri, reg_base ' reg base + index
wrbyte data_8, ri
jmp #fetch
put_b jmpret put_r8, put_r8
put_d jmpret put_r8, put_r8
put_h jmpret put_r8, put_r8
put_a jmpret put_r8, put_r8
put_hx jmpret put_r8, put_r8
put_hy jmpret put_r8, put_r8
put_r8 mov ri, 0-0 ' ri is one of put_b, put_e, put_d etc.
sub ri, #put_d - 1 ' ri is 1, 5, 9 ...
add ri, reg_base ' reg base + index
wrbyte data_8, ri
jmp #fetch
put_bc jmpret put_r16, put_r16
put_de jmpret put_r16, put_r16
put_hl jmpret put_r16, put_r16
put_af jmpret put_r16, put_r16
put_ix jmpret put_r16, put_r16
put_iy jmpret put_r16, put_r16
put_sp jmpret put_r16, put_r16
put_pc jmpret put_r16, put_r16
put_im jmpret put_r16, put_r16
put_bc2 jmpret put_r16, put_r16
put_de2 jmpret put_r16, put_r16
put_hl2 jmpret put_r16, put_r16
put_af2 jmpret put_r16, put_r16
put_r16 mov ri, 0-0 ' ri is one of put_de, put_hl etc.
sub ri, #put_de ' ri is 0, 4, 8, ...
add ri, reg_base ' reg base + index
wrword data_16, ri
jmp #fetch
It may be useful to put the bc2, de2, hl2, af2 in the upper word of the bc, de, hl, af registers in this case, which would make exx a little simpler: rol bc_reg, #16; rol de_reg, #16; rol hl_reg, #16. Well, in hub ram, thus with rdlong/wrlong or something like that... awkward.
bc_reg long 0
de_reg long 0
hl_reg long 0
af_reg long 0
ix_reg long 0
iy_reg long 0
sp_reg long 0
pc_reg long 0
im_reg long 0
...
put_c jmpret put_r8, put_r8
put_e jmpret put_r8, put_r8
put_l jmpret put_r8, put_r8
put_f jmpret put_r8, put_r8
put_lx jmpret put_r8, put_r8
put_ly jmpret put_r8, put_r8
put_r8 mov ri, 0-0 ' ri is one of put_e, put_l, put_f etc.
sub ri, #put_e ' ri is 0, 4, 8 ...
add ri, reg_base ' reg base + index
wrbyte data_8, ri
jmp #fetch
put_b jmpret put_r8, put_r8
put_d jmpret put_r8, put_r8
put_h jmpret put_r8, put_r8
put_a jmpret put_r8, put_r8
put_hx jmpret put_r8, put_r8
put_hy jmpret put_r8, put_r8
put_r8 mov ri, 0-0 ' ri is one of put_d, put_h, put_a etc.
sub ri, #put_d - 1 ' ri is 1, 5, 9 ...
add ri, reg_base ' reg base + index
wrbyte data_8, ri
jmp #fetch
put_bc jmpret put_r16, put_r16
put_de jmpret put_r16, put_r16
put_hl jmpret put_r16, put_r16
put_af jmpret put_r16, put_r16
put_ix jmpret put_r16, put_r16
put_iy jmpret put_r16, put_r16
put_sp jmpret put_r16, put_r16
put_pc jmpret put_r16, put_r16
put_im jmpret put_r16, put_r16
put_r16 mov ri, 0-0 ' ri is one of put_de, put_hl etc.
sub ri, #put_de ' ri is 0, 4, 8, ...
add ri, reg_base ' reg base + index
wrword data_16, ri
jmp #fetch
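A quick check of the "alternate set in the upper word" idea, as a Python model rather than PASM. The dictionary of registers and the helper names are made up for the sketch; the only thing taken from the post is that each pair is a long with the main value in the low word, the alternate in the high word, and exx being a rotate by 16.

```python
MASK32 = 0xFFFFFFFF

def rol32(value, bits):
    """Rotate a 32-bit value left, like PASM rol on a long."""
    bits %= 32
    return ((value << bits) | (value >> (32 - bits))) & MASK32

def exx(regs):
    """Swap BC, DE, HL with their alternates by rotating each long by 16."""
    for name in ("bc", "de", "hl"):
        regs[name] = rol32(regs[name], 16)

def main_word(long_value):
    """The active 16-bit register pair is the low word of the long."""
    return long_value & 0xFFFF

# Hypothetical register file: alternate value in the upper word of each long.
regs = {"bc": 0x11112222, "de": 0x33334444, "hl": 0x55556666}
exx(regs)
```

After `exx(regs)` the low word of each long holds what used to be the alternate, and a second `exx` puts everything back. In the real thing the longs live in hub ram, so each rotate would be wrapped in a read and a write, which is the awkward part mentioned above.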
And now, the longer I look at this approach, it doesn't seem to be all too useful. I think the additional cycles hurt too much to try it this way.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
He died at the console of hunger and thirst.
Next day he was buried. Face down, nine edge first.
@pullmoll: Welcome to the forum. Your ideas will be a great addition to the prop community
Hope your prop arrives soon.
IMHO we are now up to the point where we could use a second cog to do some of the instructions. I know heater tried using multiple cogs in the first place. We have now learnt a lot about the prop and its cogs, and I am sure we could now use that with benefit.
However, one of the issues here is that the code is diverging as I am following the path that any complex I/O will be done by another prop as speed is my main motivation. Drac (and some others), are following a single prop solution where the complex I/O is on the same prop and speed is unimportant. Drac's solution requires the availability of cogs to perform the various I/O drivers.
And now heater is following a new processor, the ZPU.
There is another solution over on the N8VEM forum where they are using a prop (or 2) to control a real Z80 board. As far as I am concerned, it defeats the purpose of the prop ZiCog emulation.
I have been spending my energy on expanding SphinxOS with a view to get it running ZiCog as a Sphinx aware program. However, there are currently not enough cogs to do this, since Sphinx currently uses 3 cogs to handle the SD card drivers. This will require further work.
It will be interesting to see how all this plays out.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔ Links to other interesting threads:
Pullmoll: You have a lot of great ideas already, thank you.
I think ultimately you will need to get a Prop board with external RAM to appreciate what ZiCog can do with CP/M and such on the Prop. Mind you I did develop most of the emulator with only a Prop Demo board and a 2 x 20 LCD panel.
I like your idea for the getters and setters, however I'm loath to add any instructions in the most used code paths. We have to think about performance as well as code size. Did you notice that Reg A and the flags are not normally read from HUB? They are kept locally in COG for speed, as it avoids a lot of rdbyte/wrbyte.
Currently all Intel 8080 instructions are performed directly in COG resident code, except DAA, IN and OUT I think. This keeps CP/M performance up to being useful as most CP/M code does not use Z80 ops. The extra Z80 ops are done by pulling in overlays. Z80 performance is not such a worry for us CP/M heads. This does mean though that those wanting to create games systems, or Sinclair Spectrums etc may be disappointed with ZiCog due to its Z80 performance.
Dr_A: "and a whole lot of other instructions are not done yet. " Actually I don't think it's so bad as a "whole lot". It's not so many.
Cluso: A ZiCog that uses more than one COG for Z80 emulation is not "ZiCog" anymore. Having tried it once already I'm not inclined to try again even if I think I have a way to make it work better. Besides as you are having problems with ZiCog on Sphinx and so was Dr_A on DracBlade due to running out of COGs we have to find a better one COG solution.
Edit: Yes I've wandered away from ZiCog a bit with my new Zog obsession but it is still always in the back of my mind. I think it's just waiting for the next "great idea" that will get it up to 100% Z80 in a nice way.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Is it possible that the answer to all our COG space problems has been staring us in the face all the time?
Currently we have a lot of "resident" code that sits in COG all the time. It has that privileged position because it is code that is used by many Z80 instructions: the fetch/execute loop, the memory access functions, the putters and getters, PUSHers and POPers, all little "micro-op" things. This "privileged" code is pretty much all you need for the base 8080 instruction set.
The non-privileged, mostly Z80 extension, instructions are pushed out to slow overlays because as CP/M heads we don't care so much about how fast they are.
Now an example of our problem is that Pullmoll has proposed a nice solution to get DAA working accurately. That solution requires a 33 LONG overlay and does not fit.
Now here is the idea: During execution of that DAA a lot of our "privileged" resident code need not be resident at all. DAA does not need all those getters and putters, PUSHers and POPers etc etc. Worse still when DAA is not executing it is sucking up a big overlay space for nothing.
So:
1) Why not create a bigger overlay area which initially holds a selected collection of resident functions.
2) This initial overlay may never be swapped out when running 8080 code.
3) We arrange that whatever code is in that initial overlay is never required by any of the Z80 overlays.
4) On running a Z80 op the initial overlay is swapped for a Z80 op overlay.
5) On completion of the Z80 op the initial overlay is reloaded back into position in the COG.
Cluso's overlay mechanism already includes such a "default" initial overlay that is in place at start up. Currently it holds DAA code.
We just need to arrange that that initial overlay holds a big bunch of code that is not required during use of other overlays. The big difference here is that the default overlay must be reloaded after an overlaid Z80 op whereas currently we just leave whatever overlay was just used in COG. This reloading is a performance hit but there we are.
A slight complication is that the default overlay will now have to hold more than one function. Like a little library. I hope this can all be loaded and links up nicely.
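To get a feel for the reload cost in steps 4) and 5), here is a tiny Python model of the overlay area. It only counts overlay loads; the class, the overlay names and the `reload_default` switch are invented for the sketch and say nothing about how Cluso's loader is actually written.

```python
class OverlayArea:
    """Counts how often the overlay area in the COG has to be (re)loaded."""

    def __init__(self, default_overlay, reload_default=True):
        self.default_overlay = default_overlay  # e.g. the semi-resident helper library
        self.reload_default = reload_default    # True = proposed scheme, False = current one
        self.loaded = default_overlay           # the default overlay is in place at start up
        self.loads = 0

    def _load(self, name):
        # Loading is skipped when the wanted overlay is already in the COG.
        if self.loaded != name:
            self.loaded = name
            self.loads += 1

    def helpers_available(self):
        """8080 code may only run while the default overlay is in place."""
        return self.loaded == self.default_overlay

    def run_z80_op(self, overlay_name):
        self._load(overlay_name)                # step 4: swap in the Z80 op
        # ... the Z80 op itself would execute here ...
        if self.reload_default:
            self._load(self.default_overlay)    # step 5: put the helpers back
```

Running the same overlaid op twice in a row costs one load with `reload_default=False` (what we do today, the overlay just stays put) and four loads with `reload_default=True`. Pure 8080 code costs nothing either way, since it never leaves the default overlay.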
P.S. My other take on this is to give up the overlay mechanism entirely.
Why not use LMM? With a simple LMM kernel in place we can just keep coding new ops without a care in the world until they are all done. Performance may not be up to overlay speed but it's pretty close. And with the above solution of reloading a semi-resident overlay after each other overlay is used then LMM may be quicker than overlay. It would make for a more elegant looking solution.
I tried LMM in the original PropAltair and gave up on it due to speed issues. But in that case it was being used for 8080 code as well, not good. ZiCog would not have that problem with LMM.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
2) Add the actual code - this is about 2/3 of the way down the zicog code.
'---------------------------------------------------------------------------------------------------------
'---------------------------------------------------------------------------------------------------------
'ex af,af' overlay. rdbyte and wrbyte are value,address for both
org OVERLAY_START
ex_af_af_ovl wrbyte flags, f_reg ' local flags to hub register
rdbyte data_8,f_reg ' get it back into data_8
mov data_16,f_reg ' temp store
add data_16,#8 ' alt flags location in data_16 (temp variable)
rdbyte temp1,data_16 ' get the alt flag data into temp1
wrbyte data_8,data_16 ' F to F'
wrbyte temp1,f_reg ' F' to F
mov flags,temp1 ' put the F' to local flags
wrbyte flags,f_reg ' and put into hub too for break display
' now do the A register
add data_16,#1 ' data_16 to A' register
rdbyte data_8,a_reg ' A to data_8
rdbyte temp1,data_16 ' alt A to temp1
wrbyte data_8,data_16 ' A to A'
wrbyte temp1,a_reg ' A' to A
jmp #fetch
long $0[($ - OVERLAY_START) // 2] 'fill to even number of longs (REQUIRED)
ex_af_af_ovl_end
fit $1F0
3) in a CON list add the instruction (you need to count in hex to do this!)
4) In a DAT list underneath this CON, add the overlay table
{1A}ex_af_af_ovl_ long 0-0
{1B}exx_ovl_ long 0-0
{1C}ireg_ovl_ long 0-0
5) Down the bottom of the code there is a list of Z80 opcodes. Put in the pointer at the appropriate opcode
{D9} long vector_overlay + (exx_ovl_no << 9)
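The `vector_overlay + (exx_ovl_no << 9)` expression packs two 9-bit numbers into one table long: the handler address in the low 9 bits and the overlay number in the next 9, which lines up with the source and destination fields of a PASM instruction. A small Python sketch of the packing follows; the address $050 and the overlay number $1B are made-up example values, not taken from the real zicog.spin.

```python
FIELD_MASK = 0x1FF  # PASM source and destination fields are 9 bits each

def pack_entry(handler_addr, overlay_no):
    """Build a dispatch table long: handler in bits 0..8, overlay number in bits 9..17."""
    return (handler_addr & FIELD_MASK) + ((overlay_no & FIELD_MASK) << 9)

def unpack_entry(entry):
    """Split a dispatch table long back into (handler_addr, overlay_no)."""
    return entry & FIELD_MASK, (entry >> 9) & FIELD_MASK

# Hypothetical values for illustration only.
VECTOR_OVERLAY = 0x050   # assumed COG address of the vector_overlay routine
EXX_OVL_NO = 0x1B        # assumed index of exx in the overlay table

entry_d9 = pack_entry(VECTOR_OVERLAY, EXX_OVL_NO)
```

`unpack_entry(entry_d9)` gives back the pair that went in, so the dispatcher can jump to the common overlay routine and still know which overlay it was asked for.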
I can post the code for ex af,af', exx and ireg if required.
I'm slowly coming to understand what the overlay does. I understand that it all fits in the DAA space. They all start with " org OVERLAY_START"
And they all end with
jmp #fetch
long $0[($ - OVERLAY_START) // 2] 'fill to even number of longs (REQUIRED)
exx_ovl_end
fit $1F0
What I've not quite understood is how you fit it all in a cog. I still don't understand it completely, as I was thinking that maybe the DAA was the last instruction in the cog code and the compiler converts it to bytes and then the overlay loader gets the bytes from that location and loads them into the DAA code space. All would make sense, but how is the DAA code in the middle of the cog code? Why doesn't this displace the cog code after this point so it can't then be loaded?
But anyway, the next relevant bit of code is the overlay code:
'---------------------------------------------------------------------------------------------------------
' OVERLAY LOADER follows...
' An even number of longs will be loaded (for efficiency)
' Flags c & z will be maintained
'
Overlay_par long 0-0 'overlay parameters for the OVERLAY_LOAD
break_ovl_ long 0-0
_0x400 long $0000_0400 'inc/decrement destination by 2
_djnz0 djnz overlay_par, #overlay_copy2 'prototype instruction (moved to overlay instruction)
t1 long 0 'used to determine if overlay already loaded (can be used by other code)
OVERLAY_LOAD mov OVERLAY_START, _djnz0 'Copy djnz instruction to head of overlay area.
movd overlay_copy2, overlay_par 'move cog END address into rdlong instruction
sub overlay_par, #1 'decrement cog End address by 1
movd overlay_copy1, overlay_par 'move cog END-1 address into rdlong instruction
shr overlay_par, #16 'extract the overlay## hub END address (remove cog address)
overlay_copy2 rdlong 0-0, overlay_par 'copy long from hub to cog (hptr ignores last 2 bits!)
sub overlay_par, #7 'decrement hub ptr by 1 long (prev by 1, now by 7)
sub overlay_copy2, _0x400 'decrement cog (destination) address by 2
overlay_copy1 rdlong 0-0, overlay_par 'copy long from hub to cog
sub overlay_copy1, _0x400 'decrement cog (destination) address by 2
'---------------------------------------------------------------------------------------------------------
Now, to the suggestions. We don't have any free bytes so the first thing might be to free some up somewhere. Because, as you say, you then need some new code that does heater's step 5) and reloads the original code.
I think that could work though. Indeed, the thing about all the Z80 instructions is they are only called infrequently. So as an extreme example, running a Z80 overlay could rewrite almost the entire cog, then do something, then reload the cog.
Where from?
Well I've got 448k of ram completely unused on the dracblade, and I think that is true for cluso's boards as well.
Ok, maybe you don't rewrite an entire cog but I like the concept of picking a block of code that would never be used by the overlay. Heck, it could even be the block at the end with all the pointers for the opcodes. Hence I need to understand better how the overlays can sit in the middle of zicog and don't have to be at the end of the code.
So - LMM. Certainly we already have code blocks to move bytes in and out of external ram. Hey, guess what, that happens to be the custom ram driver code already in the zicog!
I do like the idea of being able to add more instructions more easily. Maybe with less steps than the above 5 step example. Ok, maybe this is a bit of an ask, but I'm running out of hub ram, so maybe the code for Z80 opcodes could somehow be loaded off an sd card into the external ram chip at bootup?
The challenge I'm confused about is how to write 'portable' code that somehow can be compiled externally yet still knows about things like the registers. eg
'exx overlay.
org OVERLAY_START
exx_ovl mov data_16,c_reg ' happens to be 0 ie regbase
add data_16,#8 ' data_16 now points to the alt BC reg
rdword data_8,c_reg ' get BC into data_8 (messy as storing 2 bytes in data_8)
rdword temp1,data_16 ' BC' to temp1
wrword data_8,data_16 ' BC to BC'
wrword temp1,c_reg ' BC' to BC
add data_16,#2 ' point to DE'
rdword data_8,e_reg ' get DE into data_8
rdword temp1,data_16 ' DE' to temp1
wrword data_8,data_16 ' DE to DE'
wrword temp1,e_reg ' DE' to DE
add data_16,#2 ' point to HL'
rdword data_8,l_reg ' get HL into data_8
rdword temp1,data_16 ' HL' to temp1
wrword data_8,data_16 ' HL to HL'
wrword temp1,l_reg ' HL' to HL
jmp #fetch
fine once you have the registers like c_reg, but you need to point to them somehow?
Though, as a general principle, would it be true to say that if you gave an isolated bit of pasm code access to all the Z80 registers, and told it which instruction to process, that is all the information it would ever need?
So maybe you just need to pass the location of the first register? The register list order never changes, so if there is a comment at the beginning of the overlay opcode noting the order
reg_base := LONG[cpu_params] ' memory locations in hub that contain these variables
c_reg := reg_base + 0
b_reg := reg_base + 1
e_reg := reg_base + 2
d_reg := reg_base + 3
l_reg := reg_base + 4
h_reg := reg_base + 5
f_reg := reg_base + 6
a_reg := reg_base + 7
' bc_reg_alt := reg_base + 8 ' no room for these but values here for reference
' de_reg_alt := reg_base + 10
' hl_reg_alt := reg_base + 12
' af_reg_alt := reg_base + 14
' im_reg := reg_base + 16 ' this is not used at all,? delete
ix_reg := reg_base + 18
iy_reg := reg_base + 20
sp_reg := reg_base + 22
pc_reg := reg_base + 24
Then really, all you need to do is pass it 'reg_base'
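As a sanity check that `reg_base` plus a fixed offset table really is enough, here is the same layout as a Python model. The hub is just a bytearray and the helper names are invented; the offsets are copied from the list above, and word reads are little-endian as they are with rdword on the Prop.

```python
# Offsets from reg_base, copied from the register list in the post.
REG_OFFSETS = {
    "c": 0, "b": 1, "e": 2, "d": 3, "l": 4, "h": 5, "f": 6, "a": 7,
    "bc_alt": 8, "de_alt": 10, "hl_alt": 12, "af_alt": 14,
    "im": 16, "ix": 18, "iy": 20, "sp": 22, "pc": 24,
}

def reg_addr(reg_base, name):
    """Hub address of a register, given only reg_base."""
    return reg_base + REG_OFFSETS[name]

def read_byte(hub, reg_base, name):
    return hub[reg_addr(reg_base, name)]

def read_word(hub, reg_base, name):
    """Little-endian 16-bit read, e.g. name='c' returns the BC pair."""
    addr = reg_addr(reg_base, name)
    return hub[addr] | (hub[addr + 1] << 8)

def write_word(hub, reg_base, name, value):
    addr = reg_addr(reg_base, name)
    hub[addr] = value & 0xFF
    hub[addr + 1] = (value >> 8) & 0xFF
```

An externally built overlay would only need this table baked in and `reg_base` handed over at start up. Reading a word at `c` gives BC, at `e` gives DE and at `l` gives HL, which is exactly what the exx overlay above relies on.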
heater: Using LMM for the extra instructions may well be a nice solution. I would like to get a v1.0 & v2.0 out first, but like you, I have been on other things. I am still in a mind that 2 cogs could work nicely, but as you know, depending on the hardware, we may not have enough cogs available.
Am I correct (Bill can answer this) in saying that each LMM instruction takes 32 clocks as it is not possible to catch each hub cycle? If so, then I suspect overlaying is still likely to be quicker. Obviously this depends on the exact code being executed. For example, a loop will be much faster in overlays, but inline would likely be marginal if an overlay had to be reloaded at the end.
My issue with SphinxOS is that it requires 3 cogs for SD drivers. For Sphinx to be of any use, this has to be trimmed, even if it means more hub space is used. As it is now, normally 5 cogs are used for drivers. 2 for the I/O (only 1 if it is serial) and 3 for SD. Of course, Sphinx itself uses 1 but that is stopped when control passes to another program. So, in reality, a user Sphinx aware program can only use 3 cogs, plus have SD and I/O drivers resident.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Cluso:
Using 2 Cogs is out for ZiCog, at least for me. If anyone wants to do it that's a different project. More than one COG would upset Dr_A and your Sphinx efforts. To me it just seems wrong to need two 32-bit processors to emulate one little 8-bitter :)
I think I follow you re: the performance of overlays vs LMM but I feel it might be close enough not to worry about in the end.
My reasoning is this:
1) An LMM solution can execute "straight line" code as fast as the overlay. Reason being that whichever way you do it instructions have to be loaded and executed. The only difference between the two techniques is that in overlay there is a tight loop around the load part but in LMM there is a loop that includes the load and the execute. I'm sure you will find that the reverse LMM execution I used in PropAltair is of equivalent speed to overlay on straight line code.
2) Overlays obviously win where the code includes a lot of jumps and especially loops around a few times. The code only needs loading once rather than every time around a loop.
3) BUT most of our overlaid stuff is "straight line" apart from the string ops. I'm inclined not to worry about them :)
4) If we go down the route of putting currently "resident" code into an overlay, as I suggested above, then we may get the space, BUT it requires that the "semi-resident" stuff is always reloaded after a Z80 overlay is completed. This reloading overhead is going to put overlays back into the performance range of LMM.
5) If the aim is to get as close to 100% Z80 correctness as possible then the slight speed hit of LMM may just be worth it.
On the plus side a simple LMM kernel need not be much bigger than the current overlay loader and it does not require dedicating space in COG for the overlay area.
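For anyone who wants to play with the numbers, here is a rough clock-count model of the argument above, written in Python. It assumes the Propeller's 16-clock hub window and 4-clock cog instructions, and it assumes (not measured) that the overlay loader moves one long per hub window. The LMM cost per instruction is left as a parameter: 32 clocks is the figure Cluso asks about, 16 would be a loop that catches every hub window. The function names are made up for this sketch.

```python
HUB_WINDOW = 16  # clocks between hub access slots for one cog
COG_INSTR = 4    # clocks for an ordinary instruction running from cog RAM


def overlay_clocks(n_longs, passes=1, reload_longs=0):
    """Load an overlay once, run it `passes` times, optionally reload
    a 'semi-resident' default overlay of `reload_longs` afterwards."""
    load = n_longs * HUB_WINDOW            # assumed: one long per hub window
    run = n_longs * COG_INSTR * passes     # every instruction executed each pass
    reload = reload_longs * HUB_WINDOW
    return load + run + reload


def lmm_clocks(n_longs, passes=1, clocks_per_instr=32):
    """Fetch and execute every instruction from hub on every pass."""
    return n_longs * passes * clocks_per_instr


if __name__ == "__main__":
    for passes in (1, 10):
        print(passes,
              overlay_clocks(20, passes),
              lmm_clocks(20, passes, 16),
              lmm_clocks(20, passes, 32))
```

Under these assumptions straight-line code lands in the same range either way, loops favour overlays heavily, and adding a reload of the default overlay eats most of the overlay's advantage on single-pass code. Real figures depend on the actual loader and LMM kernel.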
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Comments
Take the Dracblade. Remove all the latches. Remove the Sram. Put a 64k serial ram chip on the eeprom bus. Implement a Sphinx OS that frees up 14k of hub ram. Maybe toss out the LCD code for the moment, and the wireless layer, and the upper 512k code, and toss out the ramblade code too. Maybe optimise the VT100 code a bit. I think that should get us to 16k of free hub ram, maybe more.
Put a ram driver in the cog that is currently running the sram driver code. This new ram driver handles a list of 256 ram blocks of 256 bytes each.
The list handling is going to be a priority list. Each time a block is accessed you add 1 to a counter for that block. Rank them in order. If a new block is needed, take the lowest ranking one, put it into serial ram, and then get the new block. Can this all fit into a cog? I think it should. Is the serial ram driver code the same as the eeprom driver code, and if so, is this already somewhere anyway (?? in the sd card object).
Just looking at ram now SPI or I2C. Code exists for both I think.
This could halve the size of the dracblade board for starters, and decrease the chip count from 9 to 4. Plus free up a number of propeller pins for audio or more serial ports.
Agree a block write then read from serial ram will be slow, but that ought to happen only very infrequently. Possibly never for a small sbasic/c/assembly program.
We can't do this now because there are 7 blocks of 2k code sitting in ram in random locations.
A thought? Maybe we can use it without even needing sphinx! Just tell the serial ram driver cog the locations of the 7 blocks of 2k code, and any more free code area. It can then have a simple list of where it keeps each block of 256 bytes.
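The counter-based block list described above could be modelled roughly like this. It is a Python sketch, not cog code: `serial_ram` is just a bytearray standing in for the 64k serial RAM chip, the number of resident slots is a parameter, and resetting a block's counter when it is evicted is my own assumption since the post does not say what happens to it.

```python
BLOCK_SIZE = 256
NUM_BLOCKS = 256


class BlockCache:
    """256 blocks of 256 bytes; the least-used resident block is swapped out."""

    def __init__(self, resident_slots):
        self.serial_ram = bytearray(NUM_BLOCKS * BLOCK_SIZE)  # stand-in for the chip
        self.capacity = resident_slots
        self.slots = {}   # block number -> bytearray held in "hub RAM"
        self.hits = {}    # block number -> access counter
        self.swaps = 0    # number of blocks written back to serial RAM

    def _load(self, block):
        if block in self.slots:
            return
        if len(self.slots) >= self.capacity:
            # Take the lowest-ranking block and put it back into serial RAM.
            victim = min(self.slots, key=lambda b: self.hits[b])
            start = victim * BLOCK_SIZE
            self.serial_ram[start:start + BLOCK_SIZE] = self.slots.pop(victim)
            del self.hits[victim]  # assumption: counter restarts on reload
            self.swaps += 1
        start = block * BLOCK_SIZE
        self.slots[block] = bytearray(self.serial_ram[start:start + BLOCK_SIZE])
        self.hits[block] = 0

    def read(self, addr):
        block, offset = divmod(addr & 0xFFFF, BLOCK_SIZE)
        self._load(block)
        self.hits[block] += 1
        return self.slots[block][offset]

    def write(self, addr, value):
        block, offset = divmod(addr & 0xFFFF, BLOCK_SIZE)
        self._load(block)
        self.hits[block] += 1
        self.slots[block][offset] = value & 0xFF
```

With 16k of free hub RAM there would be room for 64 resident blocks, so `BlockCache(64)` is the case being discussed. A small program that stays inside those 64 blocks never triggers a swap, which matches the "possibly never" comment above.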
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.smarthome.viviti.com/propeller
Post Edited (Dr_Acula) : 2/3/2010 10:22:59 AM GMT
This idea of a COG handling external memory with caches etc. may not be as fast as the direct xxxBlade approach we have now, but for those who want to save pins, and for the upcoming ZPU emulator, it should be a very good compromise.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I will find time somehow.
Today I will edit my various postings together. I will post replies to the messages above later today as I edit the spec.
Heater: I always intended to have a way of requesting bytes, words, longs, and N bytes.
The new thread for this topic is: http://forums.parallax.com/forums/?f=25&m=424051&g=424051#m424051
I've added a lot to it, and I am about to start responding to all of your comments there now... I summarized the comments from this thread there.
There is even the start of vmcog.spin available for download!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com 5.0" VGA LCD in stock!
Morpheus dual Prop SBC w/ 512KB kit $119.95, Mem+2MB memory/IO kit $89.95, both kits $189.95 SerPlug $9.95
Propteus and Proteus for Propeller prototyping 6.250MHz custom Crystals run Propellers at 100MHz
Las - Large model assembler Largos - upcoming nano operating system
Post Edited (Bill Henning) : 2/3/2010 10:48:05 PM GMT
Any chance that you could do a quick port of ZiCog to the VMCOG interface? It would really exercise VMCOG, and allow running benchmarks with various paging strategies (and working set sizes).
I think I am only a few days away from having a preliminary VMCOG running, and need something real to test against (other than the VMCOG_Debugger).
Thanks,
Bill
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I believe I will have VMCOG (roughly) running sometime next week, and would love to try Cogz/ZiCog (either/both) in about a week
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Ok, this chip came up on the comp.os.cpm discussion forum www.s1mp3.org/files/datasheets/ATJ2085/ATJ2085_datasheet_v1.5.pdf
It is the chip used in some mp3 players. Z80 opcodes, keyboard, onboard usb2, dac and adc, onboard ram (more than enough for cp/m), I2C, uarts, SD card, direct output to headphones, mpeg decoder, and onboard dc-dc converter so it can run off 1 AA battery.
But what it can't do: VGA and TV display. Plus there is a distinct lack of really detailed data and code.
I'm thinking hybrids. Absolute minimalist three chip hybrids, propeller and this chip and eeprom. Maybe you don't even need the eeprom - emulate it in the ATJ2085? Off to do some more research...
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Sorry if this is a bit "Leonish".
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Style and grace : Nil point
Problems: does not present clean 64KB ram memory map
Not available from: Digikey, Newark, FutureElectronics, Mouser
But definitely an interesting chip!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I was playing with the memory routines at the end of the DracBlade version, and surprise, surprise, there aren't any free longs. I could go all 8080-ish, but that would be veering towards the Abacus joke again.
Was there any written stuff about the overlaying techniques, so that a bit of free space could be had, as well as the adding of a few more Z80 ops (eventually)?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
There is nothing much written about the overlay technique. I think in that last ZiCog package I put out there was still a small example of using overlays in a spin file. That came from Cluso and is the only way I worked out how to use the overlays.
Anyway it might not help much to know. Last time I was wrapping my head around the ZiCog space issue I had pretty much run out of things to put into overlays. You see it's not just a case of "let's rip this opcode handler out of resident space and put it in an overlay". The problem is that most of the code that is left as resident code now is not entire Z80 opcode handlers but small parts of handlers that are used by many ops. Little micro-ops like "get this", "put that", "push this", "pop that" etc. They are combined into complete Z80 ops by the way the instruction dispatch table works.
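To illustrate what "combined by the dispatch table" means, here is a toy model in Python. It is not ZiCog's actual table format or its real micro-op names; the `Cpu` class, the `data` scratch value and the two example entries are assumptions made up to show the principle that one shared micro-op can serve many opcodes.

```python
class Cpu:
    def __init__(self):
        self.regs = {"b": 0, "c": 0}
        self.sp = 0x0100
        self.mem = bytearray(0x10000)
        self.data = 0  # scratch value handed from one micro-op to the next


# Shared micro-ops: each is tiny and is reused by many opcodes.
def get_c(cpu):
    cpu.data = cpu.regs["c"]


def put_b(cpu):
    cpu.regs["b"] = cpu.data & 0xFF


def get_bc(cpu):
    cpu.data = (cpu.regs["b"] << 8) | cpu.regs["c"]


def push(cpu):
    cpu.sp = (cpu.sp - 2) & 0xFFFF
    cpu.mem[cpu.sp] = cpu.data & 0xFF                    # low byte
    cpu.mem[(cpu.sp + 1) & 0xFFFF] = (cpu.data >> 8) & 0xFF  # high byte


# Each table entry strings micro-ops together into one complete instruction.
DISPATCH = {
    0x41: (get_c, put_b),   # LD B,C
    0xC5: (get_bc, push),   # PUSH BC
}


def execute(cpu, opcode):
    for micro_op in DISPATCH[opcode]:
        micro_op(cpu)
```

Moving `push` into an overlay in a scheme like this would affect every opcode whose table entry mentions it, which is the reason the remaining resident code is hard to evict.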
If I remember correctly the only complete Z80 ops left in resident were the tiny little STC (set carry) and CLC (Clear Carry) or some such. So there are a couple of longs to be had by making those into overlays.
I did spot a redundant LONG in there once. I think it was a JMP to a label that is on the very next instruction! Can't remember if that was removed yet.
One possibility is coming up on the horizon that may save you. Bill Henning is working on a Virtual Memory system for the Prop, VMCog. If that was used by ZiCog then actual access to physical RAM would no longer be in ZiCog but in VMCog. This would also mean the read_memory_byte and write_memory_byte routines would get smaller. It would also be nice because ZiCog itself would not have to change when using different memory hardware. Might be a bit slower though.
I don't know if Bill is working on a ZiCog with VMCog but it has to be done at some point I think. Just now I'm trying to get my head around using VMCog by adapting Zog to it.
Not very helpful am I.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I shall go and vent my frustrations, on slugs, cats and elves (now which one was I allowed to nail down .... )
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
BTW: I intend to use the extra longs I found to improve the performance of ZiCog for reading & writing words, and to separate the fetch and read bytes to improve the fetching. I can remove the call and return and the address move and increment from the fetch, which is at least a 3 instruction improvement.
I tried SphinxOS with ZiCog. However, it fails because ZiCog uses 4 cogs (spin, zicog z80, SD driver and sram driver). SphinxOS uses 3 cogs for SD plus 1 or 2 for the I/O plus 1 for spin which can be killed. Not enough cogs to pass control to ZiCog. I have to work out a way to stop the SD or reduce the number of cogs it uses.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:
· Home of the MultiBladeProps: TriBlade, RamBlade, SixBlade, website
· Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator)
· Prop Tools under Development or Completed (Index)
· Emulators: CPUs Z80 etc; Micros Altair etc; Terminals VT100 etc; (Index) ZiCog (Z80), MoCog (6809)
· Prop OS: SphinxOS, PropDos, PropCmd
· Search the Propeller forums (uses advanced Google search)
My cruising website is: www.bluemagic.biz MultiBlade Props: www.cluso.bluemagic.biz
Perhaps you should check out the creations of the N8VEM project. They have been building Z80 cards of various types for a while and now they have a card using a Prop for I/O, SD/Video. Start here http://forums.parallax.com/showthread.php?p=878280
I have one of Dr_A's mini N8VEM Z80 cards, it's great. One day I'll find time for the PropIO as well.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I think that the Nascom is just a compulsive disorder.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I know what you mean.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Take a look at my last post on N8VEM group: http://groups.google.com/group/n8vem/browse_thread/thread/23d699592d963c8b#
I think that is what you are looking for - not quite there yet but I don't see it being too far away - just need to finish diskIO on the first pass that we are doing.
You're right, I should stop bouncing off the commitment to have a simple Z80 and ram, force fed from a Prop at boot and then serviced by it afterwards. I still have two cmos Z80s and one PIO here.
The iron was feeling neglected anyway.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
No hoops to jump through this way!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I already have the "Blade2 with PropCMD" board to try that, so no ironing required, I guess.
I think the Birdsnest will be used to try switching out the EEPROM after boot, and then using those pins for the KBD (or KBD and VID in your case) and then sticking the VGA up onto P24-P27. That will leave me with 20 or 24 (with no SD) free pins, in a row.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
First, I noticed a wrong declaration in zicog.spin:
im_reg := reg_base + 8
should be
im_reg := reg_base + 16
Further I think it is possible to save a couple of longs by doing the following:
The tradeoff is between just 1 long per entry on the one hand and the calculation of the retaddr into an index on the other hand. This would make the c_reg, b_reg etc. obsolete, and also require a change in the way exx and ex af,af' are coded. I haven't counted how many longs this saves, specifically if you would otherwise add accessors for lx, hx, ly and hy.
The same principle can of course be applied to the get_ functions.
Then here's an (untested) rewrite of the DAA function, based on MAME's Z80 core code. See the attachment for the code.
Juergen
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
He died at the console of hunger and thirst.
Next day he was buried. Face down, nine edge first.
Post Edited (pullmoll) : 3/3/2010 6:58:04 PM GMT
Your comments are most interesting. I see some comments over on the catalina thread as well - I might answer them here as this is the zicog thread (though there is lots of overlap on various threads).
For the dracblade board I added a few extra instructions. I think there are four things you have to change to add an instruction but it is fairly easy and a matter of following examples.
The problem of it not fitting in a cog has been solved by using 'overlays', though I must confess I have absolutely no idea how overlays actually work. I have a vague understanding that somehow they shift codespace from the cog into hub ram. That works on all the current boards (though on the dracblade even hub ram is almost all used up).
In any case, zicog is portable code that can be run of several different platforms, just by changing the small code that handles ram access.
IX and IY and a whole lot of other instructions are not done yet.
It would be great to work together to add more instructions. What sort of hardware do you have?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (Dr_Acula) : 3/3/2010 11:46:11 PM GMT
None yet. I'm expecting to receive this board by the end of this week. I have a USB to serial adapter here and hope it will work. Otherwise I would have to solder some serial cable. Fortunately my PC still has a real serial port. I also have a FBAS monitor from the late 1980s somewhere in the shed, so I'll be going with PAL FBAS. And for the keyboard and mouse: I'm probably going to add two connectors on the raster area that connect to the same pins as the Parallax demo board. I'm not sure yet if and how to add additional (S)RAM and/or a SD card slot.
For the overlays and space constraints: I pretty much understood the problem, and the solution with overlays is probably the best thing you can do. It's like swapping in opcodes on demand, which isn't all that fast, but the opcodes are the less often used ones or the ones which repeat (the overlay) many times. As far as I can tell ZiCog is driving near the edge of the ways that are possible to walk on a single cog. I don't see paths to simplify what it does. Even my suggestion from above adds a lot of latency to the get/put accessors, just to save some (perhaps a dozen?) more longs for code.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (pullmoll) : 3/4/2010 12:20:27 AM GMT
It may be useful to put the bc2, de2, hl2, af2 in the upper word of the bc, de, hl, af registers in this case, which would make exx a little simpler: rol bc_reg, #16; rol de_reg, #16; rol hl_reg, #16. Well, in hub ram, thus with rdlong/wrlong or something like that... awkward.
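Here is the rotate trick as a quick Python model, just to show why a 16-bit rotate of a 32-bit long swaps the main and alternate halves. The `pack`, `main_of` and `exx` helpers are made-up names, and on the real chip each register would still need a rdlong/wrlong round trip to hub, which is the awkward part mentioned above.

```python
MASK32 = 0xFFFFFFFF


def rol32(value, bits):
    """Rotate a 32-bit value left, like PASM 'rol'."""
    bits %= 32
    return ((value << bits) | (value >> (32 - bits))) & MASK32


def pack(main, alt):
    """Alternate register pair in the upper word, main pair in the lower."""
    return ((alt & 0xFFFF) << 16) | (main & 0xFFFF)


def main_of(reg):
    return reg & 0xFFFF


def exx(regs):
    """Swap BC/DE/HL with their alternates: one rol #16 per long."""
    return {name: rol32(value, 16) for name, value in regs.items()}
```

For example, starting from `pack(0x1234, 0xABCD)` one `exx` leaves 0xABCD visible in the lower word, and a second `exx` restores the original long.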
And now, the longer I look at this approach, it doesn't seem to be all too useful. I think the additional cycles hurt too much to try it this way.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (pullmoll) : 3/4/2010 12:45:11 AM GMT
Hope your prop arrives soon.
IMHO we are now up to the point where we could use a second cog to do some of the instructions. I know heater tried using multiple cogs in the first place. We have now learnt a lot about the prop and its cogs, and I am sure we could now use that with benefit.
However, one of the issues here is that the code is diverging as I am following the path that any complex I/O will be done by another prop as speed is my main motivation. Drac (and some others), are following a single prop solution where the complex I/O is on the same prop and speed is unimportant. Drac's solution requires the availability of cogs to perform the various I/O drivers.
And now heater is following a new processor, the ZPU.
There is another solution over on the N8VEM forum where they are using a prop (or 2) to control a real Z80 board. As far as I am concerned, it defeats the purpose of the prop ZiCog emulation.
I have been spending my energy on expanding SphinxOS with a view to get it running ZiCog as a Sphinx aware program. However, there are currently not enough cogs to do this, since Sphinx currently uses 3 cogs to handle the SD card drivers. This will require further work.
It will be interesting to see how all this plays out.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I think ultimately you will need to get a Prop board with external RAM to appreciate what ZiCog can do with CP/M and such on the Prop. Mind you I did develop most of the emulator with only a Prop Demo board and a 2 x 20 LCD panel.
I like your idea for the getters and setters; however, I'm loath to add any instructions in the most used code paths. We have to think about performance as well as code size. Did you notice that Reg A and the flags are not normally read from HUB? They are kept locally in COG for speed, as it avoids a lot of rdbyte/wrbyte.
Currently all Intel 8080 instructions are performed directly in COG resident code, except DAA, IN and OUT I think. This keeps CP/M performance up to being useful as most CP/M code does not use Z80 ops. The extra Z80 ops are done by pulling in overlays. Z80 performance is not such a worry for us CP/M heads. This does mean though that those wanting to create games systems, or Sinclair Spectrums etc. may be disappointed with ZiCog due to its Z80 performance.
Dr_A: "and a whole lot of other instructions are not done yet." Actually I don't think it's so bad as a "whole lot". It's not so many.
Cluso: A ZiCog that uses more than one COG for Z80 emulation is not "ZiCog" anymore. Having tried it once already I'm not inclined to try again, even if I think I have a way to make it work better. Besides, as you are having problems with ZiCog on Sphinx, and so was Dr_A on DracBlade, due to running out of COGs, we have to find a better one-COG solution.
Edit: Yes, I've wandered away from ZiCog a bit with my new Zog obsession, but it is still always in the back of my mind. I think it's just waiting for the next "great idea" that will get it up to 100% Z80 in a nice way.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (heater) : 3/4/2010 5:51:35 AM GMT
Is it possible that the answer to all our COG space problems has been staring us in the face all the time?
Currently we have a lot of "resident" code that sits in COG all the time. It has that privileged position because it is code that is used by many Z80 instructions: the fetch/execute loop, the memory access functions, the putters and getters, PUSHers and POPers, all little "micro-op" things. This "privileged" code is pretty much all you need for the base 8080 instruction set.
The non-privileged, mostly Z80 extension, instructions are pushed out to slow overlays because as CP/M heads we don't care so much about how fast they are.
Now an example of our problem is that Pullmoll has proposed a nice solution to get DAA working accurately. That solution requires a 33 LONG overlay and does not fit.
Now here is the idea: During execution of that DAA a lot of our "privileged" resident code need not be resident at all. DAA does not need all those getters and putters, PUSHers and POPers etc etc. Worse still when DAA is not executing it is sucking up a big overlay space for nothing.
So:
1) Why not create a bigger overlay area which initially holds a selected collection of resident functions.
2) This initial overlay may never be swapped out when running 8080 code.
3) We arrange that whatever code is in that initial overlay is never required by any of the Z80 overlays.
4) On running a Z80 op the initial overlay is swapped for a Z80 op overlay.
5) On completion of the Z80 op the initial overlay is reloaded back into position in the COG.
Cluso's overlay mechanism already includes such a "default" initial overlay that is in place at start up. Currently it holds DAA code.
We just need to arrange that that initial overlay holds a big bunch of code that is not required during use of other overlays. The big difference here is that the default overlay must be reloaded after an overlaid Z80 op whereas currently we just leave whatever overlay was just used in COG. This reloading is a performance hit but there we are.
A slight complication is that the default overlay will now have to hold more than one function. Like a little library. I hope this can all be loaded and links up nicely.
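The five steps can be sketched as a little state machine. This is a Python model only: the 33 long DAA size is the figure quoted above, while the 40 long size of the default "library" overlay and all the names are invented for the example.

```python
class OverlayArea:
    """One shared overlay area that holds a default overlay between Z80 ops."""

    def __init__(self, overlays, default):
        self.overlays = overlays   # overlay name -> size in longs
        self.default = default     # the "semi-resident" library overlay
        self.loaded = default      # steps 1/2: default is in place at start up
        self.longs_loaded = 0      # running total of longs copied from hub

    def _load(self, name):
        if self.loaded != name:
            self.longs_loaded += self.overlays[name]
            self.loaded = name

    def run_8080_op(self):
        # 8080 ops rely on the default overlay being resident; nothing to load.
        if self.loaded != self.default:
            raise RuntimeError("default overlay missing")

    def run_z80_op(self, name):
        self._load(name)           # step 4: swap the default out for the Z80 op
        # ... the overlaid op would execute here (step 3: it must not call
        # anything that lives in the default overlay) ...
        self._load(self.default)   # step 5: put the default back


area = OverlayArea({"resident_lib": 40, "daa": 33}, default="resident_lib")
area.run_8080_op()
area.run_z80_op("daa")
```

In this model every overlaid Z80 op costs the size of its own overlay plus the size of the default overlay in hub reads, which is the reload performance hit described above. 8080 code pays nothing.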
P.S. My other take on this is to give up the overlay mechanism entirely.
Why not use LMM? With a simple LMM kernel in place we can just keep coding new ops without a care in the world until they are all done. Performance may not be up to overlay speed but it's pretty close. And with the above solution of reloading a semi-resident overlay after each other overlay is used then LMM may be quicker than overlay. It would make for a more elegant looking solution.
I tried LMM in the original PropAltair and gave up on it due to speed issues. But in that case it was being used for 8080 code as well, not good. ZiCog would not have that problem with LMM.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (heater) : 3/4/2010 6:28:01 AM GMT
Ok, just to recap - to add an instruction:
1) in the PUB Start of the zicog, lines here
2) Add the actual code - this is about 2/3 of the way down the zicog code.
3) in a CON list add the instruction (you need to count in hex to do this!)
4) In a DAT list underneath this CON, add the overlay table
5) Down the bottom of the code there is a list of Z80 opcodes. Put in the pointer at the appropriate opcode
I can post the code for ex af,af', exx and ireg if required.
I'm slowly coming to understand what the overlay does. I understand that it all fits in the DAA space. They all start with " org OVERLAY_START"
And they all end with
What I've not quite understood is how you fit it all in a cog. I still don't understand it completely, as I was thinking that maybe the DAA was the last instruction in the cog code and the compiler converts it to bytes and then the overlay loader gets the bytes from that location and loads them into the DAA code space. All would make sense, but how is the DAA code in the middle of the cog code? Why doesn't this displace the cog code after this point so it can't then be loaded?
But anyway, the next relevant bit of code is the overlay code:
Now, to the suggestions. We don't have any free bytes, so the first thing might be to free some up somewhere. Because, as you say, you then need some new code that does heater's step 5) and reloads the original code.
I think that could work though. Indeed, the thing about all the Z80 instructions is they are only called infrequently. So as an extreme example, running a Z80 overlay could rewrite almost the entire cog, then do something, then reload the cog.
Where from?
Well I've got 448k of ram completely unused on the dracblade, and I think that is true for cluso's boards as well.
Ok, maybe you don't rewrite an entire cog but I like the concept of picking a block of code that would never be used by the overlay. Heck, it could even be the block at the end with all the pointers for the opcodes. Hence I need to understand better how the overlays can sit in the middle of zicog and don't have to be at the end of the code.
So - LMM. Certainly we already have code blocks to move bytes in and out of external ram. Hey, guess what, that happens to be the custom ram driver code already in the zicog!
I do like the idea of being able to add more instructions more easily. Maybe with less steps than the above 5 step example. Ok, maybe this is a bit of an ask, but I'm running out of hub ram, so maybe the code for Z80 opcodes could somehow be loaded off an sd card into the external ram chip at bootup?
The challenge I'm confused about is how to write 'portable' code that somehow can be compiled externally yet still knows about things like the registers. eg
fine once you have the registers like c_reg, but you need to point to them somehow?
Though, as a general principle, would it be true to say that if you gave an isolated bit of pasm code access to all the Z80 registers, and told it which instruction to process, that is all the information it would ever need?
So maybe you just need to pass the location of the first register? The register list order never changes, so if there is a comment at the beginning of the overlay opcode noting the order
Then really, all you need to do is pass it 'reg_base'
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔