Fixed another flaw with the RST opcodes. They were also reading a 16-bit immediate (imm16) at PC, which of course was wrong. I swapped how the current and next PC are stored in the code snippets and could then merge the patchers for CALLs and RSTs, saving 7 or 8 cog longs.
Hmm.. first result from a patched EXZ80DOC.COM
Z80 instruction exerciser
Undefined status bits NOT taken into account
<inc,dec> h...................( 3,072) cycles ERROR **** crc expected:1ced847d found:7a5cf2b6
:-(
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
He died at the console of hunger and thirst.
Next day he was buried. Face down, nine edge first.
Post Edited (pullmoll) : 3/14/2010 11:02:05 PM GMT
Now I start to understand that it is not a native emulator.
That it exactly follows the PC code.
And why you need the PC at compile time, not at the time the real PC counter points to it.
As the PC steps do not follow execution time.
Thanks
Regards
Christoffer J
Just download the source from the first page and enable the #define DEBUG in pm80.spin prior to running pm80_demo.spin. You will then see on the terminal an alternating dump of Z80 code and registers, and a core dump of the generated PASM opcodes that are about to be executed. The length of a compiled section of Z80 code depends on two limits: 1) the output buffer in cog RAM may become full; 2) the instruction may be a terminator, that is, it alters the program flow so that it would make no sense to compile more code sequentially (unconditional RET, unconditional JR, unconditional JP, CALL, RST). Conditional jumps that point backwards into the currently compiled block are not necessarily terminators and are compiled into conditional jumps in PASM as well. This works only for backwards jumps, because otherwise the generated code would have to be patched after it was completed, which is way too complicated to fit in the cog RAM. At least I believe it is too complicated. It would generate pretty efficient code if it were possible... Most loops use backwards jumps, though, and if they are short enough, they will execute at full speed in the cog.
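To make the two limits concrete, here is a rough sketch in Python rather than PASM. The instruction tuples, the 64-long buffer, the one-long-per-opcode cost and the name `compile_block` are all made up for illustration and are not taken from pm80.spin. It only shows where a block would end and which conditional jumps could stay inside the block as loops.

```python
# Hypothetical model of one compiled block; not the pm80.spin code.
ALWAYS_TERMINATE = ("CALL", "RST")
TERMINATE_IF_UNCONDITIONAL = ("RET", "JR", "JP")

def compile_block(program, start, buffer_longs=64, longs_per_op=1):
    """Return [(index, kind)] for the instructions placed in one block.

    program is a list of (mnemonic, condition, target_index) tuples;
    condition is None for unconditional instructions.
    kind is "loop" for a conditional jump back into this block,
    "exit" for a conditional jump that leaves it, else "plain".
    """
    block = []
    used = 0
    for pc in range(start, len(program)):
        mnemonic, condition, target = program[pc]
        if used + longs_per_op > buffer_longs:
            break  # limit 1: the output buffer in cog RAM is full
        kind = "plain"
        if mnemonic in ("JR", "JP") and condition is not None:
            kind = "loop" if start <= target <= pc else "exit"
        block.append((pc, kind))
        used += longs_per_op
        if mnemonic in ALWAYS_TERMINATE:
            break  # limit 2: terminator
        if mnemonic in TERMINATE_IF_UNCONDITIONAL and condition is None:
            break  # limit 2: unconditional flow change
    return block
```

With a program such as `LD / DEC / JR NZ,<index 0> / JR Z,<index 10> / RET`, the `JR NZ` would be tagged as a loop and the block would end at the `RET`.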
In the output the lines starting with the Z80 pc have the format:
<PC>:<opcode1> <BC> <DE> <HL> <AF> <IX> <IY> <SP> <PC> <PC0> <PC1> <R> <R2_I> <offset into compiler buffer>
Post Edited (pullmoll) : 3/14/2010 11:43:49 PM GMT
pullmoll, how many instructions are you reading ahead?
I'm thinking that now there is some space in the zicog there might be some space for some extra read code. With latched ram, the top 8 bits are loaded then the lower 8 bits, but if you are reading sequential bytes, almost all the time the top 8 bits are not going to change.
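A quick way to see how rarely the top 8 bits change is to count latch reloads over a run of addresses. This is only a Python model of the idea; `latch_writes` is a made-up name and nothing here reflects the actual ZiCog RAM access code.

```python
def latch_writes(addresses):
    """Count how often the high-byte latch would need to be reloaded."""
    writes = 0
    latched_high = None  # nothing latched yet
    for addr in addresses:
        high = (addr >> 8) & 0xFF
        if high != latched_high:
            latched_high = high  # reload the latch with the new top 8 bits
            writes += 1
    return writes
```

For 256 sequential reads starting on a page boundary the latch is loaded once; a run that crosses one page boundary needs two loads.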
I'll wait until you have more instructions coded and see how much space is free.
I just had another crazy idea. Now that the idea of recycling hub ram (that was used to load a cog) has been shown to work, I wonder about caching 256 bytes into hub? I shall think about this more. Actually, there is more than enough space for a cache in hub that doesn't recycle lost code space.
What you are doing is very clever. I need to go back to the beginning of this thread and read it all again!
Dr_Acula said...
pullmoll, how many instructions are you reading ahead?
I described it in the previous post. The numbers are between just 1 (e.g. a called CALL) and around 80 opcodes (long sections of simple opcodes, e.g. LD B, C and the like).
Dr_Acula said...
I'm thinking that now there is some space in the zicog there might be some space for some extra read code. With latched ram, the top 8 bits are loaded then the lower 8 bits, but if you are reading sequential bytes, almost all the time the top 8 bits are not going to change.
Yes, latched external RAM would be possible, but the code accessing it would be longer and thus eat up precious cog longs.
Dr_Acula said...
I'll wait until you have more instructions coded and see how much space is free.
My pm80.spin is complete, it has all Z80 instructions, even the undocumented ones. Or are you talking about zicog?
Dr_Acula said...
I just had another crazy idea. Now that the idea of recycling hub ram (that was used to load a cog) has been shown to work, I wonder about caching 256 bytes into hub? I shall think about this more. Actually, there is more than enough space for a cache in hub that doesn't recycle lost code space.
What you are doing is very clever. I need to go back to the beginning of this thread and read it all again!
I think your idea is the core of the VMM idea. The virtual memory management would keep pages of external RAM in hub RAM by managing a lookup table in cog RAM that maps virtual to physical addresses.
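As a rough illustration of that lookup table, here is a small Python model. The 256-byte page size, the oldest-first eviction and the class name `PageCache` are assumptions for this sketch and not part of any existing VMM code.

```python
PAGE_SIZE = 256  # bytes per page; an assumed value

class PageCache:
    """Keeps a few pages of external RAM in (simulated) hub RAM."""

    def __init__(self, external_ram, slots):
        self.external = external_ram  # bytearray standing in for external RAM
        self.slots = [bytearray(PAGE_SIZE) for _ in range(slots)]
        self.table = {}   # virtual page number -> slot index (the lookup table)
        self.order = []   # resident pages, oldest first
        self.misses = 0

    def _slot_for(self, page):
        if page in self.table:
            return self.table[page]
        self.misses += 1
        if len(self.table) < len(self.slots):
            slot = len(self.table)  # a slot is still unused
        else:
            victim = self.order.pop(0)  # evict the oldest page
            slot = self.table.pop(victim)
            base = victim * PAGE_SIZE
            self.external[base:base + PAGE_SIZE] = self.slots[slot]  # write back
        base = page * PAGE_SIZE
        self.slots[slot][:] = self.external[base:base + PAGE_SIZE]
        self.table[page] = slot
        self.order.append(page)
        return slot

    def read(self, addr):
        return self.slots[self._slot_for(addr // PAGE_SIZE)][addr % PAGE_SIZE]

    def write(self, addr, value):
        self.slots[self._slot_for(addr // PAGE_SIZE)][addr % PAGE_SIZE] = value & 0xFF
```

Every access goes through the table; only a miss touches the external RAM.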
Post Edited (pullmoll) : 3/14/2010 11:44:40 PM GMT
The last version of the SD card drivers I read was very old so I'm not entirely sure what the current version is doing but some reasons that various cards may not work:
1. MMC cards need to be started with CMD0 then CMD1, whereas the spec for SD is CMD0, ACMD41. If the code is using ACMD41 then small MMC cards will never finish initialisation properly.
2. Some (most modern) 2GB SD cards are release 2 cards and should have CMD8 issued between CMD0 and ACMD41 at start up.
3. 4GB+ cards are HCSD and require CMD8 followed by ACMD41 with the "High Capacity Supported" bit set to let the card know the host can handle them. Once started they are actually simpler to use, but the addresses are block aligned rather than byte aligned, and the card size is specified as a multiple of 512K blocks rather than lots of separate multiplier fields.
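The three start-up cases can be laid out side by side as follows. This is only a sketch: the card labels and function names are invented here, a real driver has to probe the card rather than know its type up front, and it assumes (as I read the SD specification) that ACMD41 is sent as CMD55 followed by command 41, that the CMD8 argument is 0x1AA (voltage range plus check pattern), and that the "High Capacity Supported" flag is bit 30 of the ACMD41 argument.

```python
HCS = 1 << 30  # "High Capacity Supported" bit in the ACMD41 argument

def init_commands(card):
    """Return the (command, argument) start-up sequence for a card kind.

    card is one of "MMC", "SD1", "SD2", "SDHC"; these labels are
    specific to this sketch.
    """
    if card == "MMC":
        return [("CMD0", 0), ("CMD1", 0)]
    if card == "SD1":
        return [("CMD0", 0), ("CMD55", 0), ("ACMD41", 0)]
    if card in ("SD2", "SDHC"):
        return [("CMD0", 0), ("CMD8", 0x1AA), ("CMD55", 0), ("ACMD41", HCS)]
    raise ValueError("unknown card kind: %r" % (card,))

def block_argument(card, byte_offset):
    """Address argument for a read or write at byte_offset.

    High capacity cards are addressed in 512-byte blocks, the others in bytes.
    """
    return byte_offset // 512 if card == "SDHC" else byte_offset
```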
Also worth noting is that most HCSD cards come pre-formatted in FAT32 which won't work with the basic FAT16 driver obviously.
If I get a chance I might take a look at the current SD drivers and see if I can add a bit of code for supporting more SD cards.
A possible compromise for FAT/not FAT users is to come up with some script to generate a card image that puts the FAT partition offset from the start of the disk by a few MB and put the raw data between the end of the Master Boot Record (first 512 byte block) and the start of the FAT partition.
hairymnstr said...
A possible compromise for FAT/not FAT users is to come up with some script to generate a card image that puts the FAT partition offset from the start of the disk by a few MB and put the raw data between the end of the Master Boot Record (first 512 byte block) and the start of the FAT partition.
First, thanks for the explanation! Getting bigger SD cards to work would certainly help, because the 2GB ones are already the low end and we all know that things only get bigger over time. 4GB cards are sometimes as cheap as or even cheaper than 2GB cards, depending on the brand. The biggest ones you can get seem to be 32GB cards.
Is there an MBR-like structure on the SD cards? Then it would of course be easy to create an image partition of a type that other media ignores and put the FAT partition behind that. Code that is interested in the raw image can then just calculate the start offset and size from the MBR entry and do its own thing inside there. This is a very good idea! Thanks!
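Calculating the start offset and size from an MBR entry could look roughly like this. It assumes the classic MBR layout (four 16-byte entries starting at byte 446, type byte at offset 4, little-endian start LBA and sector count at offsets 8 and 12, signature 55 AA at the end) and 512-byte sectors. The function name and the partition type used for the raw image are hypothetical.

```python
import struct

SECTOR = 512

def find_partition(mbr, wanted_type):
    """Return (byte_offset, byte_size) of the first partition of wanted_type.

    mbr is the first 512-byte block of the card. Returns None if no
    matching entry exists.
    """
    if len(mbr) != SECTOR or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    for i in range(4):
        entry = mbr[446 + 16 * i: 446 + 16 * (i + 1)]
        ptype = entry[4]
        start_lba, sectors = struct.unpack_from("<II", entry, 8)
        if ptype == wanted_type and sectors:
            return start_lba * SECTOR, sectors * SECTOR
    return None
```

The emulator would call this once at start-up with whatever type byte was chosen for the raw image partition, while a PC would simply see an unknown partition followed by the FAT one.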
Post Edited (pullmoll) : 3/14/2010 11:53:09 PM GMT
I'm thinking something like the original MicroSoft 4K BASIC for the Altair. Does not need any operating system just console ports. We had it running on the predecessor of ZiCog, PropAltair. Just an 8080 emulator.
4K BASIC does need RST working and requires some fixed number to be input from a "switch" port. I have the details somewhere.
That would now be an interesting candidate to try, before I get the DracBlade and can throw the CP/M boot loader at it...
Yes I hope that board arrives soon because then you can do things like write a tiny assembly program to test as many or as few instructions as you like instead of waiting for the exerciser to run.
I think I understand the read-ahead model now. It sounds very clever and should speed things up a lot.
Dr_Acula said...
Yes I hope that board arrives soon because then you can do things like write a tiny assembly program to test as many or as few instructions as you like instead of waiting for the exerciser to run.
I think I understand the read-ahead model now. It sounds very clever and should speed things up a lot.
I already did that. In the archive is the source of my own Z80 assembler, az80. It is used to generate the page0.bin, which contains a BDOS replacement for conin, conout and printst. Then there's a demo.z80, which also has a section to exercise some opcodes. It is compiled and run at 100h if you comment out the EXZ80DOC define. EXZ80DOC.COM is also running now and giving me slaps. I'll let it run for some time and see how bad I implemented the opcodes.
Re "I'll let it run for some time and see how bad I implemented the opcodes."
Way back in the dim dark mists of time (ie last year) I ran the exerciser on a real Z80 board and there were a couple of groups of instructions that did not pass on a real Z80. Heater might remember where the discussion was ? on the zicog thread but it might be on the triblade thread. So if something doesn't pass please let us know which one, as I still have those boards and I can do the tests again if needed.
Dr_Acula said...
Re "I'll let it run for some time and see how bad I implemented the opcodes."
Way back in the dim dark mists of time (ie last year) I ran the exerciser on a real Z80 board and there were a couple of groups of instructions that did not pass on a real Z80. Heater might remember where the discussion was ? on the zicog thread but it might be on the triblade thread. So if something doesn't pass please let us know which one, as I still have those boards and I can do the tests again if needed.
I had seen that post in the ZiCog thread (almost sure). It looks like they are all failing here, so it's probably a bug in the opcodes used to calculate the CRC. Anyway, the biggest hurdle was that nasty bug and that's squashed now. It looks like my emulation isn't fast(er) at running this exerciser. Time will tell how it behaves with the average application program.
I've started looking at some of the code that is being used to access the SD cards, turns out it's going to be a huge task in co-ordination to get another developer working on those bits of code. For example on the Tri-Blade as far as I can see the code for accessing the SD card is in a file shared with some i2c routines for loading extra info from the EEPROM (to bootload other blades??).
The "everything and the kitchen sink" file-system driver that Kye has written does startup correctly to register all MMC, SD1, SD2, SDHC cards. There's a huge amount of spin on top of the low level block driver to support FAT16 and 32, but that could be stripped out.
I'm not really sure where to look now without stepping on too many toes. I'm also now stuck between a DracBlade clone or a Tri-Blade compatible. I can see advantages to both.
Well, if Kye's driver already does detect all card types, then this would be the way to go, no? Perhaps it can be modularized, i.e. separate the PASM code that does the access from the file system code so that it could be used standalone? I have not looked into anything else but the sdspiFemto.spin that was in Dr_Acula's package. Sorry if I am no help here.
I agree, if we can get some cut down version that still implements enough functionality to support a wider range of cards that would be a useful improvement. The big problem is how hardware dependent the SD card code is, all kinds of tricks have been done to save a pin or two on these CPM machines, so there are usually some contention issues with other hardware.
The other problem is that the SD SPI code alone is not big enough to fill a COG really and with the desperate need for more memory and processors in some setups (DracBlade?) there is usually something else in with the SD code. This means that it's not so easy to just write a nice .spin file with an object in it and drop it in.
As I see it Kye has already done the generic cross-platform driver work, what's left is hardware specific integration of the pasm.
hairymnstr said...
I agree, if we can get some cut down version that still implements enough functionality to support a wider range of cards that would be a useful improvement. The big problem is how hardware dependent the SD card code is, all kinds of tricks have been done to save a pin or two on these CPM machines, so there are usually some contention issues with other hardware.
The other problem is that the SD SPI code alone is not big enough to fill a COG really and with the desperate need for more memory and processors in some setups (DracBlade?) there is usually something else in with the SD code. This means that it's not so easy to just write a nice .spin file with an object in it and drop it in.
As I see it Kye has already done the generic cross-platform driver work, what's left is hardware specific integration of the pasm.
Well, I don't know enough about the details. I'm about to receive a DracBlade and then might help, if I can. Is there more than different pins for the 4 signals to do with hardware specific integration? I saw in some code for the TriBladeProp that it disables access to XMM before going to possibly send or receive commands to/from the SD card and file system stuff.
As I said, I don't know enough about it to be of any help at the moment.
Z80 instruction exerciser
Undefined status bits NOT taken into account
ld hl,(nnnn)..................( 16) cycles OK
ld sp,(nnnn)..................( 16) cycles OK
ld (nnnn),hl..................( 16) cycles OK
ld (nnnn),sp..................( 16) cycles OK
ld (<ix,iy>+1),nn.............( 32) cycles OK
ld (<ix,iy>-1),nn.............( 32) cycles OK
ld <bc,de>,(nnnn).............( 32) cycles OK
ld <ix,iy>,(nnnn).............( 32) cycles OK
ld <ix,iy>,nnnn............cles OK
ld a,<(bc),(de)>..............( 44) cycles OK
ld a,(nnnn) / ld (nnnn),a.....( 44) cycles OK
ldd<r> (1)....................( 44) cycles OK
ldd<r> (2)....................( 44) cycles OK
ldi<r> (1)....................( 44) cycles OK
ldi<r> (2)....................( 44) cycles OK
ld <b,c,d,e,h,l,(hl),a>,nn....( 64) cycles OK
ld (nnnn),<bc,de>.............( 64) cycles OK
ld (nnnn),<ix,iy>.............( 64) cycles OK
ld <bc,de,hl,sp>,nnnn.........( 64) cycles OK
ld (<ix,iy>+1),a..............( 64) cycles OK
ld (<ix,iy>-1),a..............( 64) cycles OK
ld (<bc,de>),a................( 96) cycles OK
ld a,(<ix,iy>+1)..............( 128) cycles OK
ld a,(<ix,iy>-1)..............( 128) cycles OK
ld (<ix,iy>+1),<h,l>..........( 256) cycles OK
ld (<ix,iy>-1),<h,l>..........( 256) cycles OK
ld <h,l>,(<ix,iy>+1)..........( 256) cycles OK
ld <h,l>,(<ix,iy>-1)..........( 256) cycles OK
<set,res> n,(<ix,iy>+1).......( 448) cycles OK
<set,res> n,(<ix,iy>-1).......( 448) cycles OK
ld <b,c,d,e>,(<ix,iy>+1)......( 512) cycles OK
ld <b,c,d,e>,(<ix,iy>-1)......( 512) cycles OK
ld (<ix,iy>+1),<b,c,d,e>......( 1,024) cycles OK
ld (<ix,iy>-1),<b,c,d,e>......( 1,024) cycles OK
<inc,dec> bc..................( 1,536) cycles OK
<inc,dec> de..................( 1,536) cycles OK
<inc,dec> hl..................( 1,536) cycles OK
<inc,dec> ix..................( 1,536) cycles OK
<inc,dec> iy..................( 1,536) cycles OK
<inc,dec> sp..................( 1,536) cycles OK
bit n,(<ix,iy>+1).............( 2,048) cycles OK
bit n,(<ix,iy>-1).............( 2,048) cycles OK
<inc,dec> a...................( 3,072) cycles OK
<inc,dec> b...................( 3,072) cycles OK
<inc,dec> c...................( 3,072) cycles OK
<inc,dec> d...................( 3,072) cycles OK
<inc,dec> e...................( 3,072) cycles OK
<inc,dec> h...................( 3,072) cycles OK
<inc,dec> l...................( 3,072) cycles
And in the ld <ix,iy>,nnnn line the mysterious UART dropout or hiccup happened. It looks like an entire block of 16 bytes is missing.
Lots of IX and IY opcodes there. Does the exerciser do exx and ex af,af'?
re sd cards, the code in the dracblade is old, but it works fine. I'm sure it can be updated and made shorter. (Also the VT100 code might be able to be improved. )
re uarts, I had to go from 38400 back to 19200 baud as xmodem wasn't storing the packets fast enough. I know the N8VEM at 4mhz will work at 38400 but I think the zicog is more like 3 to 3.5mhz so it is not quite fast enough. Also I'm using a different serial object, so there are several reasons why I might have never seen the uart problems. I am hopeful that faster zicog might mean the baud rate can go back up to 38400.
Actually, this is good brain exercise to look at a real program and see how it is processed. See attached - the code for getting a byte in xmodem is in RECV:
Ignore the .IF commands. I think it will read all the commands in until
JP Z,MCHAR ; GOT CHAR
as that is the first jump? Then if that condition is true, jump to MCHAR which is a few bytes further down, skip a whole lot of .IF code, then do the CRC routine on the byte in A and return having collected one byte. A question - next time it gets the byte in RECV, does it need to load all that RECV code back in again? Or is that now cached?
I seem to remember there were two or three. All but one were testing some kind of IN. I was not sure how that was supposed to pass anyway on a MiniN8VEM card where the inputs were undefined in hardware.
The one other failing test case I have of course forgotten.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Dr_Acula said...
A question - next time it gets the byte in RECV, does it need to load all that RECV code back in again? Or is that now cached?
@pullmoll, how many more instructions left?
This code is probably a) too long to fit in the remaining cog RAM and b) contains forward jumps that, when executed, will cause recompilation. The caching and full speed execution works only for conditional backwards jumps, such as loops in division/multiplication functions or the like.
Do you mean the number of free hub RAM longs? That is around 3792 with pm80.spin alone, so it uses more than half the RAM. Then there is io.spin, which is required as the counterpart to the in and out opcodes. It is currently 469 longs and doing the UART and part of the HDSK handling, without actually accessing an SD card. FullDuplexSerial is 174 longs. It's definitely going to be tight.
I'm thinking about how to get rid of the almost identical code and table for IX/IY. They could be joined if I had a pointer to the registers in hub RAM rather than in cog RAM. This would save a lot of duplicated code.
pullmoll: My designs do not use latches (the latch on the TriBlade Blade #2 is there for chip selects and resetting the other props), so the SRAM takes 8 data + 19 address lines. The SD card shares 3 of these lines, so it is required to tristate the bus in order to access the SD card. Similar constraints apply to the RamBlade, but no latch is present at all. The valid assumption is that SD is only accessed occasionally whereas SRAM is accessed constantly. This is a design for speed.
Cluso99 said...
pullmoll: My designs do not use latches (the latch on the TriBlade Blade #2 is there for chip selects and resetting the other props), so the SRAM takes 8 data + 19 address lines. The SD card shares 3 of these lines, so it is required to tristate the bus in order to access the SD card. Similar constraints apply to the RamBlade, but no latch is present at all. The valid assumption is that SD is only accessed occasionally whereas SRAM is accessed constantly. This is a design for speed.
I have Heater's ifdef TriBladeProp and dira assignments in the pm80 core, too. It should work, if you define TriBladeProp at the top. Of course untested yet.
heater said...
I seem to remember there were two or three. All but one were testing some kind of IN. I was not sure how that was supposed to pass anyway on a MiniN8VEM card where the inputs were undefined in hardware.
The one other failing test case I have of course forgotten.
The following tests were almost all failing, though.
I know the reason for cpd/cpi now, but wonder why ld <bcdexya>,<bcdexya> could fail, as there are no flags involved. Also NEG is just a subtract from 0 and should be okay. Well, I've still got some things to do.
FYI: I managed to join the IX and IY opcodes, so one 256-word table is gone, along with a number of nearly identical code snippets for IY. Now there are XY code fragments that use a preset register containing a pointer to either IX or IY. The hub RAM used by pm80.spin alone is now down to 3855 longs, so more than 50% of the Propeller's hub RAM is free to fill with other code. I think this is about as far as it can be squeezed. The Z80 cpi<r> and cpd<r> opcodes are now passing the exerciser tests, but there are still a number of opcodes left to fix.
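The idea of one shared XY fragment can be modelled like this in Python. It is not the pm80.spin code: the register dictionary, the handler table and the function names are invented for illustration. Only the encodings are standard Z80 (DD 7E d is LD A,(IX+d) and FD 7E d is LD A,(IY+d)).

```python
regs = {"IX": 0x4000, "IY": 0x5000, "A": 0}
memory = bytearray(0x10000)

def signed8(value):
    """Interpret a byte as a two's complement displacement."""
    return value - 256 if value & 0x80 else value

def ld_a_xy_d(xy, offset_byte):
    """LD A,(XY+d): one fragment serving both index registers."""
    addr = (regs[xy] + signed8(offset_byte)) & 0xFFFF
    regs["A"] = memory[addr]

# One table for both prefixes instead of separate IX and IY tables.
HANDLERS = {0x7E: ld_a_xy_d}
PREFIX_TO_XY = {0xDD: "IX", 0xFD: "IY"}

def execute_indexed(prefix, opcode, offset_byte):
    xy = PREFIX_TO_XY[prefix]  # the "preset register" selecting IX or IY
    HANDLERS[opcode](xy, offset_byte)
```

The prefix byte only selects which register the shared fragment points at, which is where the duplicated table goes away.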
Great, ZiCog has the same idea for IX and IY ops, only one table and pointers to index registers. I'm sure we could combine those with the base dispatch table with a bit of effort. I don't want to do that for ZiCog as I'm sure selecting the correct index HL, IX or IY all the time would slow things down. Also I'd like to make an ICog (damn Apple) that is a basic Intel 8080/5 at some point, so it's best to leave it as is.
heater said...
Great, ZiCog has the same idea for IX and IY ops, only one table and pointers to index registers. I'm sure we could combine those with the base dispatch table with a bit of effort. I don't want to do that for ZiCog as I'm sure selecting the correct index HL, IX or IY all the time would slow things down. Also I'd like to make an ICog (damn Apple) that is a basic Intel 8080/5 at some point, so it's best to leave it as is.
No, combining IX/IY with the base opcodes is not feasible, because there are too many exceptions. Think of the LD L,(IX+offs), which would be LD LX,(IX+offs) if you just swapped IX with HL for the instruction.
For me, joining the tables cost one conditional jump in the compiler loop and a few longs of precious, precious cog RAM, but it saved so much hub RAM that it's worth it.
I had the 8085 separated out to not clutter the source with even more #ifdefs. It doesn't work right now, after fixing the big bug I had in both cores, but it's in the backyard for use when I understand video generation. I intend to write a Tandy M100 emulator some day. I have written one in C and know the awkward LC display addressing. The M100 has 32KB of ROM and 32KB of RAM, so this is only going to work with XMM.
Why? Simple: Today madness got me and I wrote a nearly complete Z80 core using the LMM technique only, for every opcode. The cog contains the subroutines to do arithmetic and logic operations and flag calculations, while all opcodes are in the hub RAM and executed as LMM. It isn't ready yet, but it will be a little smaller than the DRC core. It is much easier to understand and follow the source compared with the DRC core. It is twice as fast compared to the DRC core, at least! It may even be three times as fast. To me this suggests that the overhead of compiling the Z80 code into PASM does not outweigh the performance gain by direct execution of the compiled code. The idea of dynamic recompilation fails for the Propeller, mostly because there is not enough RAM to have a real cache of translated code fragments, as you would have them on a PC with lots of RAM.
The new core won't take all too long, since it's merely a re-use of the existing and tested code with slight changes here and there. The first dozen of the EX tests already succeed. I'll open a new thread once that core is halfway stable.
Juergen
Post Edited (pullmoll) : 3/18/2010 11:53:47 PM GMT
Any LMM code that loops - for example the string move and/or compare operations - benefits greatly, becoming MUCH faster, as it gets rid of the LMM overhead for all but the first execution of the loop.
My standard implementation uses a 128 long buffer in the cog at $080 as the FCACHE buffer, but it can be smaller - it only needs to be large enough to fit the largest FCACHE'd segment.
The format is:
CALL #FCACHE
long number_of_longs_following
org $80
regular pasm code
...
next LMM op follows the number_of_longs longs
Within the cached code, regular jumps & branches & djnz work. I suspect 40-80 longs would be large enough for any complex loop that implementing a Z80 opcode would require. Note: for best results, only FCACHE the loop, not the setup and epilog code.
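To show why this pays off, here is a toy Python model of the format above. The "instructions" are just tuples, `run_lmm` and `run_cached` are hypothetical names, and the fetch counter stands in for the LMM overhead; it is not real PASM or the actual FCACHE routine.

```python
def execute(op, regs):
    """Execute one toy instruction; only ADD exists in this model."""
    if op[0] == "ADD":
        regs[op[1]] += op[2]

def run_cached(cache, regs):
    """Run code from the cog buffer, where DJNZ can jump directly."""
    i = 0
    while i < len(cache):
        op = cache[i]
        if op[0] == "DJNZ":
            regs[op[1]] -= 1
            if regs[op[1]] != 0:
                i = op[2]  # jump target is an index into the cache
                continue
        else:
            execute(op, regs)
        i += 1

def run_lmm(hub, regs):
    """Run a toy LMM stream and return the number of hub fetches."""
    fetches = 0
    pc = 0
    while pc < len(hub):
        op = hub[pc]
        fetches += 1
        if op == "FCACHE":
            count = hub[pc + 1]  # long number_of_longs_following
            cache = hub[pc + 2: pc + 2 + count]  # copied into the cog once
            fetches += 1 + count
            pc += 2 + count  # next LMM op follows the cached longs
            run_cached(cache, regs)
        else:
            execute(op, regs)
            pc += 1
    return fetches
```

A two-instruction loop that runs five times costs the same number of hub fetches as one that runs once, because every iteration after the copy executes from the cog.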
Hope this helps,
Bill
Thank you for the hint! It depends on how much will be left in the cog after all ALU operations and flag handling are in there. This has priority, because repeating it in the opcodes is a useless waste of RAM. There are 8 Z80 instructions with loops, of which only 2 are really widely used: ldir/lddr. Their core is 8 or 9 PASM instructions. If space permits, I may just move these core loops into the cog and call them from the LMM code. Currently I have 156 longs left and there isn't much more to do to finish the remaining opcodes. I guess I could go either way: fixed cog functions or dynamic FCACHE functions.
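For reference, the core loop of ldir is small. The following is a Python model of the instruction's data movement only (flags and timing are left out), with a register dictionary and function name chosen for this sketch.

```python
def ldir(memory, regs):
    """Z80 LDIR: copy BC bytes from (HL) to (DE), incrementing both.

    As on the real CPU, BC = 0 on entry copies 65536 bytes.
    Flag updates are deliberately omitted.
    """
    while True:
        memory[regs["DE"]] = memory[regs["HL"]]
        regs["HL"] = (regs["HL"] + 1) & 0xFFFF
        regs["DE"] = (regs["DE"] + 1) & 0xFFFF
        regs["BC"] = (regs["BC"] - 1) & 0xFFFF
        if regs["BC"] == 0:
            break
```

lddr is the same loop with HL and DE decremented instead, so the two could share most of a cog-resident routine.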
Juergen
Comments
Hmm.. first result from a patched EXZ80DOC.COM
:-(
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
He died at the console of hunger and thirst.
Next day he was buried. Face down, nine edge first.
Post Edited (pullmoll) : 3/14/2010 11:02:05 PM GMT
Just download the source from the first page and enable the #define DEBUG in pm80.spin prior to running pm80_demo.spin. Then you will see on the terminal a dump of alternatingly Z80 code and registers and a coredump of the generated PASM opcodes that are to be executed. The length of the compiled section of Z80 code depends on two limits: 1) the output buffer in cog RAM may become full. 2) The instruction may be a terminator, that is it alters the program flow and it would make no sense to sequentially compile more code (RET unconditional, JR unconditional, JP unconditional, CALL, RST). Conditional jumps that are in range (backwards) of the currently compiled block are not necessarily terminators and are compiled to also conditional jumps in PASM. This works only for backwards jumps, because otherwise the generated code would have to be patched after it was completed, which is way too complicated to fit in the cog RAM. At least I believe it is too complicated. It would generate pretty efficient code if it was possible... Most loops use backwards jumps, though, and if they are short enough, they will execute at full speed in the cog.
In the output the lines starting with the Z80 pc have the format:
<PC>:<opcode1> <BC> <DE> <HL> <AF> <IX> <IY> <SP> <PC> <PC0> <PC1> <R> <R2_I> <offset into compiler buffer>
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
He died at the console of hunger and thirst.
Next day he was buried. Face down, nine edge first.
Post Edited (pullmoll) : 3/14/2010 11:43:49 PM GMT
I'm thinking that now there is some space in the zicog there might be some space for some extra read code. With latched ram, the top 8 bits are loaded then the lower 8 bits, but if you are reading sequential bytes, almost all the time the top 8 bits are not going to change.
I'll wait until you have more instructions coded and see how much space is free.
I just had another crazy idea. Now that the idea of recycling hub ram (that was used to load a cog) has been shown to work, I wonder about caching 256 bytes into hub? I shall think about this more. Actually, there is more than enough space for a cache in hub that doesn't recycle lost code space.
What you are doing is very clever. I need to go back to the beginning of this thread and read it all again!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.smarthome.viviti.com/propeller
I described it in the previous post. The numbers are between just 1 (e.g. a called CALL) and around 80 opcodes (long sections of simple opcodes, e.g. LD B, C and the like).
Yes, latched external RAM would be possible, but the code accessing it would be longer and thus eat up pretty cog longs.
My pm80.spin is complete, it has all Z80 instructions, even the undocumented ones. Or are you talking about zicog?
I think your idea what is the core of the VMM idea. The virtual memory management would keep pages of external RAM in hub RAM space by managing a cog RAM lookup table for virtual -> physical address.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
He died at the console of hunger and thirst.
Next day he was buried. Face down, nine edge first.
Post Edited (pullmoll) : 3/14/2010 11:44:40 PM GMT
The last version of the SD card drivers I read was very old so I'm not entirely sure what the current version is doing but some reasons that various cards may not work:
1. MMC cards need to be started with CMD0 the CMD1, where as the spec for SD is CMD0, ACMD41. If the code is using ACMD41 then small MMC cards will never finish initialisation properly.
2. Some (most modern) 2GB SD cards are release 2 cards and should have CMD8 issued between CMD0 and CMD1 at start up.
3. 4GB+ cards are SDHC and require CMD8 followed by an ACMD41 with the "Host Capacity Support" (HCS) bit set to let the card know the host can handle them. Once started they are actually simpler to use, but the addresses are block addresses rather than byte addresses, and the card size is specified as a multiple of 512K blocks rather than lots of separate multiplier fields.
Also worth noting is that most SDHC cards come pre-formatted in FAT32, which obviously won't work with the basic FAT16 driver.
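The start-up order from the three points above can be sketched as a probe sequence. This is Python against a fake card, not a driver: `FakeCard`, `init_card` and `block_argument` are hypothetical names, and the ordering CMD0, CMD8, ACMD41 with a CMD1 fallback is my reading of the usual flow, not taken from any of the drivers discussed here.

```python
class FakeCard:
    """Stand-in for a real card: only records which commands it accepts."""

    def __init__(self, kind):
        self.kind = kind            # "MMC", "SD1", "SD2" or "SDHC"
        self.log = []

    def send(self, cmd):
        self.log.append(cmd)
        if cmd == "CMD8":
            return self.kind in ("SD2", "SDHC")   # only release 2 cards answer
        if cmd == "ACMD41":
            return self.kind != "MMC"             # MMC cards never accept it
        return True                               # CMD0 and CMD1 succeed here


def init_card(card):
    card.send("CMD0")                   # reset to idle state
    is_v2 = card.send("CMD8")           # release 2 / high capacity probe
    if card.send("ACMD41"):             # SD cards finish start-up with ACMD41
        return "SD v2" if is_v2 else "SD v1"
    card.send("CMD1")                   # fall back to the MMC start-up
    return "MMC"


def block_argument(block, high_capacity):
    """Command argument for a 512-byte block: block number or byte offset."""
    return block if high_capacity else block * 512


mmc = FakeCard("MMC")
print(init_card(mmc), mmc.log)
```

The `block_argument` helper covers the addressing difference from point 3: high capacity cards take the block number directly, the older ones take a byte offset.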
If I get a chance I might take a look at the current SD drivers and see if I can add a bit of code for supporting more SD cards.
A possible compromise for FAT/not FAT users is to come up with some script to generate a card image that puts the FAT partition offset from the start of the disk by a few MB and put the raw data between the end of the Master Boot Record (first 512 byte block) and the start of the FAT partition.
First, thanks for the explanation! Getting bigger SD cards to work would certainly help, because the 2GB ones are already the low end and we all know that things only tend to get bigger all the time. 4GB cards are sometimes as cheap as or even cheaper than 2GB cards, depending on the brand. The biggest ones you get seem to be 32GB cards.
Is there an MBR-like structure on the SD cards? Then it would of course be easy to create an image partition of a type that other media ignore and put the FAT partition behind that. Code that is interested in the raw image can then just calculate the start offset and size from the MBR entry and do its own thing inside there. This is a very good idea! Thanks!
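For what it's worth, reading the start offset and size from a classic MBR is only a few lines. The sketch below is Python and assumes the standard layout: four 16-byte entries at offset 446, type byte at entry offset 4, little-endian start LBA and sector count at entry offsets 8 and 12, signature 55 AA at offset 510. The type value 0x7F for the raw image partition is an arbitrary choice for illustration.

```python
import struct

SECTOR = 512


def partitions(mbr):
    """Return (type, start_byte, size_bytes) for each used MBR entry."""
    if len(mbr) != SECTOR or bytes(mbr[510:512]) != b"\x55\xaa":
        raise ValueError("not an MBR sector")
    result = []
    for i in range(4):
        entry = bytes(mbr[446 + 16 * i:446 + 16 * (i + 1)])
        ptype = entry[4]
        lba_start, num_sectors = struct.unpack_from("<II", entry, 8)
        if ptype != 0:                      # type 0 marks an unused entry
            result.append((ptype, lba_start * SECTOR, num_sectors * SECTOR))
    return result


# Sample MBR: a raw image partition first, a FAT16 partition behind it.
mbr = bytearray(SECTOR)
mbr[510:512] = b"\x55\xaa"
mbr[446:462] = struct.pack("<B3sB3sII", 0, b"\0\0\0", 0x7F, b"\0\0\0", 1, 8192)
mbr[462:478] = struct.pack("<B3sB3sII", 0, b"\0\0\0", 0x06, b"\0\0\0", 8193, 100000)

for ptype, start, size in partitions(mbr):
    print(hex(ptype), start, size)
```

Code that wants the raw image would pick the entry with its own type byte and use the start and size as the bounds for block reads.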
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (pullmoll) : 3/14/2010 11:53:09 PM GMT
That would now be an interesting candidate to try, before I get the DracBlade and can throw the CP/M boot loader at it...
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I think I understand the read-ahead model now. It sounds very clever and should speed things up a lot.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I already did that. In the archive is the source of my own Z80 assembler, az80. It is used to generate the page0.bin, which contains a BDOS replacement for conin, conout and printst. Then there's a demo.z80, which also has a section to exercise some opcodes. It is compiled and run at 100h if you comment out the EXZ80DOC define. EXZ80DOC.COM is also running now and giving me slaps. I'll let it run for some time and see how badly I implemented the opcodes.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Way back in the dim dark mists of time (i.e. last year) I ran the exerciser on a real Z80 board and there were a couple of groups of instructions that did not pass on a real Z80. Heater might remember where the discussion was. I think it was on the zicog thread, but it might be on the triblade thread. So if something doesn't pass please let us know which one, as I still have those boards and I can do the tests again if needed.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I had seen that post in the ZiCog thread (almost sure). It looks like they are all failing here, so it's probably a bug in the opcodes used to calculate the CRC. Anyway, the biggest hurdle was that nasty bug and that's squashed now. It looks like my emulation isn't fast(er) at running this exerciser. Time will tell how it behaves with the average application program.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I've started looking at some of the code that is being used to access the SD cards. It turns out it's going to be a huge coordination task to get another developer working on those bits of code. For example, on the Tri-Blade, as far as I can see, the code for accessing the SD card is in a file shared with some i2c routines for loading extra info from the EEPROM (to bootload other blades??).
The "everything and the kitchen sink" file-system driver that Kye has written does start up correctly and recognises all MMC, SD1, SD2 and SDHC cards. There's a huge amount of Spin on top of the low-level block driver to support FAT16 and FAT32, but that could be stripped out.
I'm not really sure where to look now without stepping on too many toes. I'm also now stuck between a DracBlade clone or a Tri-Blade compatible. I can see advantages to both.
Well, if Kye's driver already does detect all card types, then this would be the way to go, no? Perhaps it can be modularized, i.e. separate the PASM code that does the access from the file system code so that it could be used standalone? I have not looked into anything else but the sdspiFemto.spin that was in Dr_Acula's package. Sorry if I am no help here.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
The other problem is that the SD SPI code alone is not big enough to fill a COG really and with the desperate need for more memory and processors in some setups (DracBlade?) there is usually something else in with the SD code. This means that it's not so easy to just write a nice .spin file with an object in it and drop it in.
As I see it Kye has already done the generic cross-platform driver work, what's left is hardware specific integration of the pasm.
Well, I don't know enough about the details. I'm about to receive a DracBlade and then might help, if I can. Is there more than different pins for the 4 signals to do with hardware specific integration? I saw in some code for the TriBladeProp that it disables access to XMM before going to possibly send or receive commands to/from the SD card and file system stuff.
As I said, I don't know enough about it to be of any help at the moment.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
And in the ld <ix,iy>,nnnn test the mysterious UART dropout or hiccup happened. It looks like an entire block of 16 bytes is missing.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Lots of IX and IY opcodes there. Does the exerciser do exx and ex af,af'?
re sd cards, the code in the dracblade is old, but it works fine. I'm sure it can be updated and made shorter. (Also the VT100 code might be able to be improved. )
re uarts, I had to go from 38400 back to 19200 baud as xmodem wasn't storing the packets fast enough. I know the N8VEM at 4 MHz will work at 38400, but I think the zicog is more like 3 to 3.5 MHz, so it is not quite fast enough. Also I'm using a different serial object, so there are several reasons why I might never have seen the uart problems. I am hopeful that a faster zicog might mean the baud rate can go back up to 38400.
Actually, this is good brain exercise to look at a real program and see how it is processed. See attached - the code for getting a byte in xmodem is in RECV:
Ignore the .IF commands. I think it will read all the commands in until
JP Z,MCHAR ; GOT CHAR
as that is the first jump? Then if that condition is true, jump to MCHAR which is a few bytes further down, skip a whole lot of .IF code, then do the CRC routine on the byte in A and return having collected one byte. A question - next time it gets the byte in RECV, does it need to load all that RECV code back in again? Or is that now cached?
@pullmoll, how many more instructions left?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (Dr_Acula) : 3/16/2010 12:08:52 AM GMT
Re: EX tests that failed on a real Z80.
I seem to remember there were two or three. All but one were testing some kind of IN. I was not sure how that was supposed to pass anyway on a MiniN8VEM card where the inputs were undefined in hardware.
The one other failing test case I have of course forgotten.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
This code is probably a) too long to fit in the remaining cog RAM and b) contains forward jumps that, when executed, will cause recompilation. The caching and full speed execution works only for conditional backwards jumps, such as loops in division/multiplication functions or the like.
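To make the rule a bit more concrete, here is a rough Python sketch of how a block ends and which jumps stay inside it. It is not the pm80 compiler loop: the instruction tuples, `classify`, `compile_block` and the limit of 80 opcodes are simplifications for illustration.

```python
def classify(mnemonic, conditional, target, block_start, pc):
    """Decide how one Z80 instruction affects the block being compiled."""
    if mnemonic in ("CALL", "RST"):
        return "terminator"
    if mnemonic in ("RET", "JR", "JP") and not conditional:
        return "terminator"
    if mnemonic in ("JR", "JP") and block_start <= target <= pc:
        return "in-block jump"          # backwards, runs at full speed in cog
    if mnemonic in ("RET", "JR", "JP"):
        return "leaves block if taken"  # forces a recompile at the target
    return "plain"


def compile_block(program, block_start, max_ops):
    """program maps address -> (mnemonic, conditional, target, length)."""
    pc, block = block_start, []
    while len(block) < max_ops and pc in program:
        mnemonic, conditional, target, length = program[pc]
        kind = classify(mnemonic, conditional, target, block_start, pc)
        block.append((pc, mnemonic, kind))
        if kind == "terminator":
            break
        pc += length
    return block


program = {
    0x100: ("LD", False, None, 2),     # LD B,8
    0x102: ("ADD", False, None, 1),    # ADD A,A
    0x103: ("DEC", False, None, 1),    # DEC B
    0x104: ("JR", True, 0x102, 2),     # JR NZ,102h (backwards, in block)
    0x106: ("RET", False, None, 1),    # RET ends the block
    0x107: ("NOP", False, None, 1),    # never reached by this block
}
block = compile_block(program, 0x100, 80)
for pc, mnemonic, kind in block:
    print(hex(pc), mnemonic, kind)
```

The small loop at 102h to 104h is the good case: it is compiled once and then loops inside the cog. A forward JP Z in the middle of a long routine is the bad case, because taking it throws the block away.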
Do you mean the number of free hub RAM longs? That is around 3792 with pm80.spin alone, so it uses more than half the RAM. Then there is io.spin, which is required as the counterpart to the in and out opcodes. It is currently 469 longs and doing the UART and part of the HDSK handling, without actually accessing an SD card. FullDuplexSerial is 174 longs. It's definitely going to be tight.
I'm thinking about how to get rid of the almost identical code and table for IX/IY. They could be joined if I had a pointer to the registers in hub RAM rather than in cog RAM. This would save a lot of duplicated code.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (pullmoll) : 3/16/2010 6:00:44 AM GMT
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:
· Home of the MultiBladeProps: TriBlade, RamBlade, SixBlade, website
· Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator)
· Prop Tools under Development or Completed (Index)
· Emulators: CPUs Z80 etc; Micros Altair etc; Terminals VT100 etc; (Index) ZiCog (Z80), MoCog (6809)
· Prop OS: SphinxOS, PropDos, PropCmd · Search the Propeller forums (uses advanced Google search)
My cruising website is: www.bluemagic.biz · MultiBlade Props: www.cluso.bluemagic.biz
I have Heater's ifdef TriBladeProp and dira assignments in the pm80 core, too. It should work, if you define TriBladeProp at the top. Of course untested yet.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
The following tests were almost all failing, though.
I know the reason for cpd/cpi now, but wonder why ld <bcdexya>,<bcdexya> could fail, as there are no flags involved. Also NEG is just a subtract from 0 and should be okay. Well, I've still got some things to do.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (pullmoll) : 3/16/2010 6:10:13 AM GMT
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (pullmoll) : 3/16/2010 5:30:16 PM GMT
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
T o n y
No, combining IX/IY with the base opcodes is not feasible, because there are too many exceptions. Think of the LD L,(IX+offs), which would be LD LX,(IX+offs) if you just swapped IX with HL for the instruction.
For me, joining the tables cost one conditional jump in the compiler loop and a few longs of precious, precious cog RAM, but it saved so much hub that it's worth it.
I had the 8085 separated out to not clutter the source with even more #ifdefs. It doesn't work right now, after fixing the big bug I had in both cores, but it's in the backyard for use when I understand video generation. I intend to write a Tandy M100 emulator some day. I have written one in C and know the awkward LC display addressing. The M100 has 32KB of ROM and 32KB of RAM, so this is only going to work with XMM.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (pullmoll) : 3/16/2010 6:08:15 PM GMT
Why? Simple: Today madness got me and I wrote a nearly complete Z80 core using the LMM technique only, for every opcode. The cog contains the subroutines to do arithmetic and logic operations and flag calculations, while all opcodes are in the hub RAM and executed as LMM. It isn't ready yet, but it will be a little smaller than the DRC core. The source is much easier to understand and follow than that of the DRC core. It is at least twice as fast as the DRC core! It may even be three times as fast. To me this suggests that the performance gain from directly executing the compiled code does not outweigh the overhead of compiling the Z80 code into PASM. The idea of dynamic recompilation fails for the Propeller, mostly because there is not enough RAM to have a real cache of translated code fragments, as you would have them on a PC with lots of RAM.
The new core won't take all too long, since it's merely a re-use of the existing and tested code with slight changes here and there. The first dozen of the EX tests already succeed. I'll open a new thread once that core is halfway stable.
Juergen
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (pullmoll) : 3/18/2010 11:53:47 PM GMT
Or have you sussed out how to do Z80 in the self extracting bit (as in the 0 or 1)?
Off to bed, one hurting "brain".
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Style and grace : Nil point
Look up my FCACHE primitive for LMM, it is in the original thread where I introduced LMM.
http://forums.parallax.com/showthread.php?p=615022
Any LMM code that loops - for example the string move and/or compare operations - benefits greatly, becoming MUCH faster, as it gets rid of the LMM overhead for all but the first execution of the loop.
My standard implementation uses a 128 long buffer in the cog at $080 as the FCACHE buffer, but it can be smaller - it only needs to be large enough to fit the largest FCACHE'd segment.
The format is:
CALL #FCACHE
long number_of_longs_following
org $80
regular pasm code
...
next LMM op follows the number_of_longs longs
Within the cached code, regular jumps & branches & djnz work. I suspect 40-80 longs would be large enough for any complex loop that implementing a Z80 opcode would require. Note: only FCACHE the loop, not the setup and epilog code, for best results.
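As a rough model of the format above (in Python, not the actual PASM primitive): the long after the CALL gives the count, that many longs are copied into the cog buffer at $080, and LMM execution resumes at the long that follows them. `fcache_load` and the list-based `hub` and `cog` are hypothetical stand-ins.

```python
FCACHE_BASE = 0x80      # start of the FCACHE buffer in cog RAM
COG_LONGS = 512         # size of the modelled cog RAM in longs


def fcache_load(hub, pc, cog):
    """Copy the segment that follows the FCACHE call into the cog buffer.

    hub[pc] holds number_of_longs_following; returns the hub index of the
    next LMM op after the cached segment.
    """
    count = hub[pc]
    if FCACHE_BASE + count > len(cog):
        raise ValueError("segment does not fit into the FCACHE buffer")
    cog[FCACHE_BASE:FCACHE_BASE + count] = hub[pc + 1:pc + 1 + count]
    return pc + 1 + count


# Hub image: the call, the count, three cached longs, then the next LMM op.
hub = ["CALL #FCACHE", 3, "op_a", "op_b", "op_c", "next LMM op"]
cog = [0] * COG_LONGS

next_pc = fcache_load(hub, 1, cog)    # pc points just past the CALL
print(cog[FCACHE_BASE:FCACHE_BASE + 3], hub[next_pc])
```

The real primitive would then jump to $080 and run the loop at native speed. The model only covers the copy and the resume address.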
Hope this helps,
Bill
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com 5.0" VGA LCD in stock!
Morpheus dual Prop SBC w/ 512KB kit $119.95, Mem+2MB memory/IO kit $89.95, both kits $189.95 SerPlug $9.95
Propteus and Proteus for Propeller prototyping 6.250MHz custom Crystals run Propellers at 100MHz
Las - Large model assembler Largos - upcoming nano operating system
Thank you for the hint! It depends on how much will be left in the cog after all ALU operations and flag handling are there. This has priority, because repeating it in the opcodes is a useless waste of RAM. There are 8 Z80 instructions with loops, of which only 2 are really widely used: ldir/lddr. Their core is 8 or 9 PASM instructions. If space permits, I may just move these core loops into the cog and call them from the LMM code. Currently I have 156 longs left and there isn't much more to do to finish the remaining opcodes. I guess I could go either way: fixed cog functions or dynamic FCACHE functions.
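For reference, this is what the ldir core loop does, written as a Python sketch with flag handling left out. The function name and the way the registers are passed are just for illustration.

```python
def ldir(mem, hl, de, bc):
    """Block copy as Z80 LDIR does it: (DE) <- (HL), HL++, DE++, BC-- until BC = 0.

    Flags are ignored. With BC = 0 on entry the loop runs 65536 times,
    as on the real CPU.
    """
    while True:
        mem[de] = mem[hl]
        hl = (hl + 1) & 0xFFFF
        de = (de + 1) & 0xFFFF
        bc = (bc - 1) & 0xFFFF
        if bc == 0:
            break
    return hl, de, bc


mem = bytearray(0x10000)
mem[0x1000:0x1004] = b"Z80!"
regs = ldir(mem, 0x1000, 0x2000, 4)
print(regs, bytes(mem[0x2000:0x2004]))
```

lddr is the same loop with HL and DE decremented instead, so one cog routine with a direction parameter might cover both.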
Juergen
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (pullmoll) : 3/19/2010 5:10:07 AM GMT