VMCOG: Virtual Memory for ZiCog, Zog & more (VMCOG 0.976: PropCade,TriBlade_2,HYDRA HX512,XEDODRAM)

Heater. · 2010-09-01 23:58

Strangely enough with vmcog_v0_975 and 10 pages Zog can get all the way through fibo(26) with all results correct except fib(23).

ffibo(15) = 000610 (00245ms)
fibo(16) = 000987 (00396ms)
fibo(17) = 001597 (00642ms)
fibo(18) = 002584 (01039ms)
fibo(19) = 004181 (01681ms)
fibo(20) = 006765 (02720ms)
fibo(21) = 010946 (04402ms)
fibo(22) = 017711 (07123ms)
fibo(23) = 010968 (04402ms)  <-- Should be 28657
fibo(24) = 046368 (18648ms)  <-- Correct
fibo(25) = 075025 (30174ms)  <-- Correct
fibo(26) = 121393 (07862ms)  <-- Correct

Bill Henning · 2010-09-02 09:33

Weird. There definitely was a bug... and it did get fixed.

The zog issue almost certainly has to be one of:

- another problem with shr_hits
- a problem within BUSERR

If it is a problem in BUSERR, it is almost certainly the overwriting of a variable that would be used to do the actual read/write by the appropriate (byte/word/long) routine after handling the BUSERR. BUSERR is effectively an interrupt handler for a "page not present" interrupt from the soft MMU.

I need to get myself set up to be able to replicate your problem again.

I'd appreciate a bit of help with that - if you could set up a fibo.spin that does not need SD access to run for me it would save me the half day or so required for changing fsrw and testing it to co-exist with the SPI ram's on PropCade.

I'll merge in your TRIBLADE_2 code today.

I'm also going to duplicate David's two chip setup to see why that is not working for him.

Heater. wrote: »

ZOG no like.

This is worse. Depending on the number of pages I have (I tried 8, 10, 20) it either hangs up around fibo(21) or continues a few more fibos with wrong results and then hangs up.

The heater test in my old vmdebug works OK though.

I cannot compile the new vmdebug, BST is complaining about not finding hex method in FullDuplexSerialPlus. No idea why.

Bill, can you take the TRIBLADE_2 sections from the attached VMCog. It is your last 0.981 version + TRIBLADE_2.

Bill Henning · 2010-09-02 09:37

Very weird.

Heater. wrote: »

Strangely enough with vmcog_v0_975 and 10 pages Zog can get all the way through fibo(26) with all results correct except fib(23).

ffibo(15) = 000610 (00245ms)
fibo(16) = 000987 (00396ms)
fibo(17) = 001597 (00642ms)
fibo(18) = 002584 (01039ms)
fibo(19) = 004181 (01681ms)
fibo(20) = 006765 (02720ms)
fibo(21) = 010946 (04402ms)
fibo(22) = 017711 (07123ms)
fibo(23) = 010968 (04402ms)  <-- Should be 28657
fibo(24) = 046368 (18648ms)  <-- Correct
fibo(25) = 075025 (30174ms)  <-- Correct
fibo(26) = 121393 (07862ms)  <-- Correct

jazzed · 2010-09-02 11:03

Bill Henning wrote: »

Very weird.

At this point I'm not entirely surprised by bad output when there is a small memory problem. The committed SdramCache.spin file makes zog speak French when trying to use the dirty bit to flag writes. Since I don't know French, the code is removed entirely now as part of my speed up

David Betz · 2010-09-02 18:14

I got my Hydra SPI SRAM card working with VMCOG!!!

I think the fix to the shr_hits subroutine helped. What prevented it from working after I moved to Bill's new version was forgetting to update the spidir variable with the correct bits to set the direction of all of the pins I use to talk to my SPI SRAM card.

Now, I need to try the microSD card adapter that I also put on this Hydra card. That should let me try running ZOG!!!!

Thanks to everyone who offered help!

Bill Henning · 2010-09-03 10:06

Glad to hear it works!!!

David Betz wrote: »

I got my Hydra SPI SRAM card working with VMCOG!!!

I think the fix to the shr_hits subroutine helped. What prevented it from working after I moved to Bill's new version was forgetting to update the spidir variable with the correct bits to set the direction of all of the pins I use to talk to my SPI SRAM card.

Now, I need to try the microSD card adapter that I also put on this Hydra card. That should let me try running ZOG!!!!

Thanks to everyone who offered help!

Heater. · 2010-09-04 15:51

Bill, here is a zog using VMCog with no SD card and running fibo.

I had to make fibo much smaller to get it to fit in HUB with some VM pages. It uses print and outbyte rather than iprintf which gets it down to about 3.5K

As it's now a different program it fails in different ways but fails anyway.

Bill Henning · 2010-09-05 12:18

It would not compile for me - too big, with my version of vmcog.

I hacked and slashed, made it barely fit, and it won't print the logon message.

I'll try to get the SD working, which requires hacking fsrw, so I won't have these problems.

Heater. wrote: »

Bill, here is a zog using VMCog with no SD card and running fibo.

I had to make fibo much smaller to get it to fit in HUB with some VM pages. It uses print and outbyte rather than iprintf which gets it down to about 3.5K

As it's now a different program it fails in different ways but fails anyway.

Heater. · 2010-09-05 12:55

TOO BIG?

F8 tells me it compiles to 2727 longs with 5280 free. Plenty of room for the 8 or so pages I was using.

What on earth have you put into VMCog for it to be too big?

I've been playing with various variations of the heater test this evening basically try to introduce a lot of fast thrashing by hitting only a few bytes in each page. And trying to induce a reproducible failure.

I have a hard time making it fail. After one very long run there was a failure but I could not reproduce it. I don't like a random failure as it points to some power supply or other hardware glitch.

Bill Henning · 2010-09-05 13:04

LOL, I think it is time I took a breath, re-factored VMCOG, and tried to fix it slowly and methodically, instead of trying to hurry to fix it.

I am running into MANY strange problems, and I need to find out why.

1) the loop that initialized six spi ram's now does not initialize ram 0 - unless I use a different, more verbose version

2) the 17th command added to the table simply does not work!

3) FlexMem ReadStatus and ReadLong work, WriteStatus and WriteLong don't

I am beginning to wonder if I fried TWO props...

Power and ground are clean, I've already put a scope on it. Everything is being transmitted correctly, I've run it under viewport.

I am also thinking along the same lines you are... and writing more memory tests, to find out if there are any faults left in the sacrificial page selection and paging in/out.

I am about ready to kill for #include, and for #ifdef supporting "A | B"...

Heater. wrote: »

TOO BIG?

F8 tells me it compiles to 2727 longs with 5280 free. Plenty of room for the 8 or so pages I was using.

What on earth have you put into VMCog for it to be too big?

I've been playing with various variations of the heater test this evening basically try to introduce a lot of fast thrashing by hitting only a few bytes in each page. And trying to induce a reproducible failure.

I have a hard time making it fail. After one very long run there was a failure but I could not reproduce it. I don't like a random failure as it points to some power supply or other hardware glitch.

Heater. · 2010-09-05 13:42

What clues have we got here?

1) I have a hard time getting a failure in VMCog on TriBlade with any variant of heater test (now modified to run in a loop for ever or until failure). I have had one, unreproducible, failure over many hours now. That one I might put down to some funky hardware glitch. This kind of tells me that vmcog_v0_981_TRIBLADE_2 basically works.

2) Zog itself seems to be solid. It runs from HUB OK. It runs fro Jazzed's SDRAM OK.

3) David has the same symptoms (as far as I can tell) with Zog/VMCog using a different hardware. This suggests the problem is not with the hardware interface code of TriBlade.

What does that leave us?

Well one major difference between using VMCog from Zog or a heater test is the fact that Zog has a PASM interface to the mailbox whereas the heater test uses VMCogs Spin interface.

So there is a possibility that Zog's PASM interface is not quite correct? Is there perhaps some weird timing issue as it is operating at PASM, not Spin, speeds? Has anyone reviewed that Zog/VMCog interface, from both ends? I've had a look occasionally but spotted nothing untoward.

Heater. · 2010-09-05 14:33

Important clue found!

I can reliably change the way Zog/VMCog fails by changing the timing in Zog's PASM mailbox handling code.

What I do is to add HUB operations to the end of the read functions, for example read_long ends like this:

                        mov     addr, address
                        shl     addr, #9
                        movs    addr, #READVML
                        wrlong  addr, mboxcmd
:waitres                rdlong  data, mboxcmd wz
              if_nz     jmp     #:waitres
                        rdlong  data, mboxdat
                        rdlong  data, mboxdat       <---EXTRA HUB OP ADDED

Now as far as I know the data at mboxdat should not be changing unless I command it to do so but here is the result, first is Zog with extra delays second original Zog:

ZOG v1.6 (VM, No SD)
fibo(0) = 0 (1ms)
fibo(1) = 1 (0ms)
fibo(2) = 1 (0ms)
fibo(3) = 2 (0ms)
fibo(4) = 3 (1ms)
fibo(5) = 5 (2ms)
fibo(6) = 8 (3ms)
fibo(7) = 13 (5ms)
fibo(8) = 21 (9ms)
fibo(9) = 34 (15ms)
fibo(10) = 55 (25ms)
fibo(11) = 89 (40ms)
fibo(12) = 144 (66ms)
fibo(13) = 233 (107ms)
fibo(14) = 377 (173ms)
fibo(15) = 610 (280ms)
fibo(16) = 987 (454ms)
fibo(17) = 1597 (734ms)
fibo(18) = 2584 (1189ms)
fibo(19) = 4181 (1924ms)
fibo(20) = 6765 (3113ms)
fibo(21) = 10946 (5037ms)
fibo(22) = 17711 (8150ms)
fibo(23) = 29229 (13188ms)
fibo(24) =
#pc,opcode,sp,top_of_stack,next_on_stack
#

0X000061D 0X00 0X00000618 0X000000B1
BREAKPOINT
ZOG v1.6 (VM, No SD)
fibo(0) = 0 (1ms)
fibo(1) = 1 (0ms)
fibo(2) = 1 (0ms)
fibo(3) = 2 (0ms)
fibo(4) = 3 (1ms)
fibo(5) = 5 (1ms)
fibo(6) = 8 (3ms)
fibo(7) = 13 (5ms)
fibo(8) = 21 (8ms)
fibo(9) = 34 (13ms)
fibo(10) = 55 (22ms)
fibo(11) = 89 (35ms)
fibo(12) = 144 (57ms)
fibo(13) = 233 (93ms)
fibo(14) = 377 (151ms)
fibo(15) = 610 (245ms)
fibo(16) = 987 (396ms)
fibo(17) = 1597 (642ms)
fibo(18) = 2584 (1039ms)
fibo(19) = 4181 (1681ms)
fibo(20) = 6765 (2720ms)
fibo(21) = 10946 (4402ms)
fibo(22) =
#pc,opcode,sp,top_of_stack,next_on_stack
#

0X000061D 0X00 0X00000618 0X000000B0
BREAKPOINT

Heater. · 2010-09-05 20:34

Another similar clue:

Moving to the other side of the Zog/VMCog interface I can change the point at which the fibo test fails, from fibo(22) to fibo(25) by adding hubops to VMCogs waitcmd loop. Like so:

waitcmd wrlong zero,pvmcmd

wl    rdlong    vminst, pvmcmd wz ' top 23 bits = address, bottom 9 = command
    mov    vmaddr, vminst

    rdlong    nothing, 0          <--- Add this 'harmless' HUBOP

 if_z    jmp    #wl

    rdlong    nothing, 0        <--- or this, or both

    shr    vmaddr, #9
    mov     vmpage, vmaddr
    shr    vmpage, #9
    movs    present, vmpage

As this change should affect nothing but timing I find this a bit worry some.

Edit: Slugging the TriBlade BREAD, BWRITE loops with such delays, extending RAM access time, has no noticeable effect.

Bill Henning · 2010-09-06 06:53

Hi heater.

Sound reasoning. Other than ZOG, I have been unable to get a failure - mind you, just like your tests, that was with a Spin client.

I will take another look at ZOG's pasm interface. I actually have a few hours to work on this today!

Heater. wrote: »

What clues have we got here?

1) I have a hard time getting a failure in VMCog on TriBlade with any variant of heater test (now modified to run in a loop for ever or until failure). I have had one, unreproducible, failure over many hours now. That one I might put down to some funky hardware glitch. This kind of tells me that vmcog_v0_981_TRIBLADE_2 basically works.

2) Zog itself seems to be solid. It runs from HUB OK. It runs fro Jazzed's SDRAM OK.

3) David has the same symptoms (as far as I can tell) with Zog/VMCog using a different hardware. This suggests the problem is not with the hardware interface code of TriBlade.

What does that leave us?

Well one major difference between using VMCog from Zog or a heater test is the fact that Zog has a PASM interface to the mailbox whereas the heater test uses VMCogs Spin interface.

So there is a possibility that Zog's PASM interface is not quite correct? Is there perhaps some weird timing issue as it is operating at PASM, not Spin, speeds? Has anyone reviewed that Zog/VMCog interface, from both ends? I've had a look occasionally but spotted nothing untoward.

Bill Henning · 2010-09-06 06:54

Ugh.

I don't like this... the extra readlong should not be needed if the prop was behaving as documented...

Heater. wrote: »
Important clue found!

I can reliably change the way Zog/VMCog fails by changing the timing in Zog's PASM mailbox handling code.

What I do is to add HUB operations to the end of the read functions, for example read_long ends like this:
                        mov     addr, address
                        shl     addr, #9
                        movs    addr, #READVML
                        wrlong  addr, mboxcmd
:waitres                rdlong  data, mboxcmd wz
              if_nz     jmp     #:waitres
                        rdlong  data, mboxdat
                        rdlong  data, mboxdat       <---EXTRA HUB OP ADDED
Now as far as I know the data at mboxdat should not be changing unless I command it to do so but here is the result, first is Zog with extra delays second original Zog:

ZOG v1.6 (VM, No SD)
fibo(0) = 0 (1ms)
fibo(1) = 1 (0ms)
fibo(2) = 1 (0ms)
fibo(3) = 2 (0ms)
fibo(4) = 3 (1ms)
fibo(5) = 5 (2ms)
fibo(6) = 8 (3ms)
fibo(7) = 13 (5ms)
fibo(8) = 21 (9ms)
fibo(9) = 34 (15ms)
fibo(10) = 55 (25ms)
fibo(11) = 89 (40ms)
fibo(12) = 144 (66ms)
fibo(13) = 233 (107ms)
fibo(14) = 377 (173ms)
fibo(15) = 610 (280ms)
fibo(16) = 987 (454ms)
fibo(17) = 1597 (734ms)
fibo(18) = 2584 (1189ms)
fibo(19) = 4181 (1924ms)
fibo(20) = 6765 (3113ms)
fibo(21) = 10946 (5037ms)
fibo(22) = 17711 (8150ms)
fibo(23) = 29229 (13188ms)
fibo(24) =
#pc,opcode,sp,top_of_stack,next_on_stack
#

0X000061D 0X00 0X00000618 0X000000B1
BREAKPOINT
ZOG v1.6 (VM, No SD)
fibo(0) = 0 (1ms)
fibo(1) = 1 (0ms)
fibo(2) = 1 (0ms)
fibo(3) = 2 (0ms)
fibo(4) = 3 (1ms)
fibo(5) = 5 (1ms)
fibo(6) = 8 (3ms)
fibo(7) = 13 (5ms)
fibo(8) = 21 (8ms)
fibo(9) = 34 (13ms)
fibo(10) = 55 (22ms)
fibo(11) = 89 (35ms)
fibo(12) = 144 (57ms)
fibo(13) = 233 (93ms)
fibo(14) = 377 (151ms)
fibo(15) = 610 (245ms)
fibo(16) = 987 (396ms)
fibo(17) = 1597 (642ms)
fibo(18) = 2584 (1039ms)
fibo(19) = 4181 (1681ms)
fibo(20) = 6765 (2720ms)
fibo(21) = 10946 (4402ms)
fibo(22) =
#pc,opcode,sp,top_of_stack,next_on_stack
#

0X000061D 0X00 0X00000618 0X000000B0
BREAKPOINT

Bill Henning · 2010-09-06 06:56

This is beginning to look more and more like some sort of hub weirdness with very high speed spinners!

It almost looks like the prop is returning the wrong result sometimes when two cogs interact with very high speed spinners.

Heater. wrote: »
Another similar clue:

Moving to the other side of the Zog/VMCog interface I can change the point at which the fibo test fails, from fibo(22) to fibo(25) by adding hubops to VMCogs waitcmd loop. Like so:
waitcmd wrlong zero,pvmcmd

wl    rdlong    vminst, pvmcmd wz ' top 23 bits = address, bottom 9 = command
    mov    vmaddr, vminst

    rdlong    nothing, 0          <--- Add this 'harmless' HUBOP

 if_z    jmp    #wl

    rdlong    nothing, 0        <--- or this, or both

    shr    vmaddr, #9
    mov     vmpage, vmaddr
    shr    vmpage, #9
    movs    present, vmpage
As this change should affect nothing but timing I find this a bit worry some.

Edit: Slugging the TriBlade BREAD, BWRITE loops with such delays, extending RAM access time, has no noticeable effect.

Bill Henning · 2010-09-06 07:20

heater,

Sapieha had a good question... do you still see an improvement if you use NOP's instead of extra hub ops?

David Betz · 2010-09-06 07:43

I have a question about how BUSERR works. I was looking at the following code.

clrold  mov   0-0,#0            ' clear old version
        movd  setent,vmpage
' method #1 - leave access count alone
        andn  tlbi,VM_FLAGS    ' does not work - cannot keep count
' method #2 - semi-average
'        call  #hitavg

This is the code that clears the old TLB entry and stores the new one. I looked at it and decided it was wrong since it didn't set the access count to one for the newly loaded page. It seems like that would make sense because the page has just been loaded and has only been accessed once. Actually, zero would probably be best since it gets incremented after the call to BUSERR by the read/write code. However, setting it to one or zero makes VMCOG perform very slowly. If you think about it, the idea of basing the decision of which page to kick out on the access count doesn't make any sense. The most recently loaded page will have the lowest access count and hence will be the first to be kicked out. That will cause significant trashing if the program being run crosses at least one page boundry. It would be better to kick out the least recently used page instead. I don't think this is the cause of the FIBO problems because it only really affects the page replacement algorithm but it might be causing performance problems. Also, the current code just leaves the count alone when it reuses a page which seems wrong since it means that the counts will be ever increasing until they overflow and are divided by two. It also means that the access counts don't really reflect the number of access to the current virtual page that is loaded but instead reflect the history of access to that hub page.

Bill Henning · 2010-09-06 08:11

Excellent, and correct, observations David.

The 'andn tlbi,VM_FLAGS' as you correctly point out leaves the access count alone - which seems to be counter intuitive - however it is MUCH better than zeroing the count.

Zeroing the count would pretty much guarantee that the new page would be sacrificed the next time a page needed to be swapped in.

Leaving the count alone means it will be increased by the number of accesses until another sacrificial page was needed - so there is at least a chance that the new page will stay in-core.

Initially I was going to assign newly loaded pages the "average" count of all pages, to ensure they stick around for a while; but I decided not to spend precious cog memory on it. It would also hurt performance if the new page was not going to be accessed frequently.

I've gotten sidetracked by my new boards, debugging, and adding MORPHEUS1 and FLEXMEM drivers; however one simple answer is to add a small bias to make it more likely that a newly loaded page will stay around for a bit - say (16...256)<<11. This would however be counter productive if only a single read was needed from that new page... Note that keeping the previous count already provides the newly loaded page with a significant bias to stay around for a while.

There is a similar issue in shr_hits. The page that overflows the count would have its count set to zero by the shift, so I actually force it to just over half the possible range. That is another "tuning" constant in the code.

Many papers have been written on page replacement policies - my intention was to implement a simple "least used" policy to start with, and then tune it later after it was working well.

The currently implemented replacement policy will tend to favor keeping frequently used pages in-core, and will tend to penalize newly swapped in pages, as you pointed out. The penalty is not quite as bad is it would seem, as whenever a count for a page overflows, the usage count of all in-core pages is divided in two - so little used pages will tend to have their access counts reduced drastically, and thus offer themselves for sacrifice when a new page needs to be loaded.

Adding a small bias to the count of a replaced page will tend to keep the newly swapped in page around long enough for its access count to become influential as another little-used page will tend to be sacrificed before the newly loaded page due to the bias.

Frankly, once VMCOG works perfectly, playing with page replacement policies promises to be fun

David Betz wrote: »
I have a question about how BUSERR works. I was looking at the following code.
clrold  mov   0-0,#0            ' clear old version
        movd  setent,vmpage
' method #1 - leave access count alone
        andn  tlbi,VM_FLAGS    ' does not work - cannot keep count
' method #2 - semi-average
'        call  #hitavg
This is the code that clears the old TLB entry and stores the new one. I looked at it and decided it was wrong since it didn't set the access count to one for the newly loaded page. It seems like that would make sense because the page has just been loaded and has only been accessed once. Actually, zero would probably be best since it gets incremented after the call to BUSERR by the read/write code. However, setting it to one or zero makes VMCOG perform very slowly. If you think about it, the idea of basing the decision of which page to kick out on the access count doesn't make any sense. The most recently loaded page will have the lowest access count and hence will be the first to be kicked out. That will cause significant trashing if the program being run crosses at least one page boundry. It would be better to kick out the least recently used page instead. I don't think this is the cause of the FIBO problems because it only really affects the page replacement algorithm but it might be causing performance problems. Also, the current code just leaves the count alone when it reuses a page which seems wrong since it means that the counts will be ever increasing until they overflow and are divided by two. It also means that the access counts don't really reflect the number of access to the current virtual page that is loaded but instead reflect the history of access to that hub page.

Bill Henning · 2010-09-06 08:26

Due to David's question, I re-read the "find sacrifical page" algorithm, and found another bug!

When I changed the TLB format, I forgot to change the "bigacc" value that is used when selecting a sacrificial page.

This would have lead to a bug where it might not have been possible to locate a page to sacrifice!

Please change:

bigacc LONG $00400000

to

bigacc long $FFFFF800

Above was in error as per David's message below, DO NOT CHANGE

David Betz · 2010-09-06 09:02

Are you sure this is correct? You shift the access count right by 11 before comparing it with minacc. That means that the largest value it could possibly have is $1fffff (21 bits of ones) which is smaller than $400000.

Bill Henning wrote: »

Due to David's question, I re-read the "find sacrifical page" algorithm, and found another bug!

When I changed the TLB format, I forgot to change the "bigacc" value that is used when selecting a sacrificial page.

This would have lead to a bug where it might not have been possible to locate a page to sacrifice!

Please change:

bigacc LONG $00400000

to

bigacc long $FFFFF800

Bill Henning · 2010-09-06 09:06

Thanks, you are correct.

My excuse: I have not had coffee yet...

(the shift-right is to get rid of the hub page address, and the LOCK and DIRTY bits)

David Betz wrote: »

Are you sure this is correct? You shift the access count right by 11 before comparing it with minacc. That means that the largest value it could possibly have is $1fffff (21 bits of ones) which is smaller than $400000.

David Betz · 2010-09-06 09:11

Bill Henning wrote: »

Thanks, you are correct.

My excuse: I have not had coffee yet...

(the shift-right is to get rid of the hub page address, and the LOCK and DIRTY bits)

However, after thinking about it a little, I think you could probably get rid of the shift and use your new bigacc value. It won't hurt to include the page number and lock/dirty bits in the comparison since they're low order bits. The only time they will make a difference is when two entries have the same upper 21 bits and then it probably doesn't matter which page you choose to sacrifice anyway. So, use your new bigacc value and get rid of the shift instruction and you speed up VMCOG (slightly).

Heater. · 2010-09-06 09:14

Bill,

I'm not ready to entertain the idea that the Prop is mis-behaving with high speed spinners. Surely something like that would have surfaced already, there have been plenty of projects with high speed spinners and PASM COG synchronization. Even my old 4 COG Z80 emulation had high speed spinners and showed no problems there.

Adding otherwise harmless instructions has two effects:
1) Change the timing a little.
2) Move the rest of the code around a bit.

Looking at the code I can't see 1) being the problem.

But 2) means that if something is being corrupted by a stray pointer or such then adding the instruction can cause something different to be corrupted and the failure mode to change.

Anyway I just tried adding the HUBOP or NOP into waicmd.

Without the new instruction we get up to fibo(23)
With either RDLONG or NOP we only get to fibo(21)
With 5 NOPS we get to fibo(25)
With 9 NOPS we get only to fibo(22).

What does it all mean?

waitcmd wrlong zero,pvmcmd

wl    rdlong    vminst, pvmcmd wz ' top 23 bits = address, bottom 9 = command
    mov    vmaddr, vminst

'        rdlong  nothing, 0         <--- Harmless HUBOP  ?
        nop                              <--- Harmless NOPs  ?
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop

 if_z    jmp    #wl

Bill Henning · 2010-09-06 09:20

Good point!

David Betz wrote: »

However, after thinking about it a little, I think you could probably get rid of the shift and use your new bigacc value. It won't hurt to include the page number and lock/dirty bits in the comparison since they're low order bits. The only time they will make a difference is when two entries have the same upper 21 bits and then it probably doesn't matter which page you choose to sacrifice anyway. So, use your new bigacc value and get rid of the shift instruction and you speed up VMCOG (slightly).

Bill Henning · 2010-09-06 09:22

Very strong evidence that something is getting corrupted!

Now to locate the culprit...

Heater. wrote: »
Bill,

I'm not ready to entertain the idea that the Prop is mis-behaving with high speed spinners. Surely something like that would have surfaced already, there have been plenty of projects with high speed spinners and PASM COG synchronization. Even my old 4 COG Z80 emulation had high speed spinners and showed no problems there.

Adding otherwise harmless instructions has two effects:
1) Change the timing a little.
2) Move the rest of the code around a bit.

Looking at the code I can't see 1) being the problem.

But 2) means that if something is being corrupted by a stray pointer or such then adding the instruction can cause something different to be corrupted and the failure mode to change.

Anyway I just tried adding the HUBOP or NOP into waicmd.

Without the new instruction we get up to fibo(23)
With either RDLONG or NOP we only get to fibo(21)
With 5 NOPS we get to fibo(25)
With 9 NOPS we get only to fibo(22).

What does it all mean?
waitcmd wrlong zero,pvmcmd

wl    rdlong    vminst, pvmcmd wz ' top 23 bits = address, bottom 9 = command
    mov    vmaddr, vminst

'        rdlong  nothing, 0         <--- Harmless HUBOP  ?
        nop                              <--- Harmless NOPs  ?
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop

 if_z    jmp    #wl

Sapieha · 2010-09-06 09:26

Hi Heater.

I said that I will NOT post anything before FORUM function correct ---- BUT.

It is to intriguing problem.
I have ONE more test to You - LOOK on Yours attached code.

Heater. wrote: »
Bill,

I'm not ready to entertain the idea that the Prop is mis-behaving with high speed spinners. Surely something like that would have surfaced already, there have been plenty of projects with high speed spinners and PASM COG synchronization. Even my old 4 COG Z80 emulation had high speed spinners and showed no problems there.

Adding otherwise harmless instructions has two effects:
1) Change the timing a little.
2) Move the rest of the code around a bit.

Looking at the code I can't see 1) being the problem.

But 2) means that if something is being corrupted by a stray pointer or such then adding the instruction can cause something different to be corrupted and the failure mode to change.

Anyway I just tried adding the HUBOP or NOP into waicmd.

Without the new instruction we get up to fibo(23)
With either RDLONG or NOP we only get to fibo(21)
With 5 NOPS we get to fibo(25)
With 9 NOPS we get only to fibo(22).

What does it all mean?
waitcmd wrlong zero,pvmcmd

wl    rdlong    vminst, pvmcmd wz ' top 23 bits = address, bottom 9 = command
       [COLOR=Red] NOP        ' <---- ADD one else more nop's at this position[/COLOR]
    mov    vmaddr, vminst

'        rdlong  nothing, 0         <--- Harmless HUBOP  ?
        nop                              <--- Harmless NOPs  ?
'        nop
'        nop
[COLOR=Blue]'        nop           '<---- Test in same time with different count of NOP's in this place[/COLOR]
'        nop
'        nop
'        nop
'        nop
'        nop

 if_z    jmp    #wl

Bill Henning · 2010-09-06 09:33

The culprit would pretty much have to be a 'movd' used to modify the destination of an instruction; or the code before it that calculated the value to 'moved'.

Most of the movd's place 'vmpage' in the destination; I just added an 'and vmpage,#127' where vmpage is calculated (that is the maximum valid vmpage with this version of VMCOG) but that was not enough to stop the old zog from crashing.

Searching...

Heater. wrote: »
Bill,

I'm not ready to entertain the idea that the Prop is mis-behaving with high speed spinners. Surely something like that would have surfaced already, there have been plenty of projects with high speed spinners and PASM COG synchronization. Even my old 4 COG Z80 emulation had high speed spinners and showed no problems there.

Adding otherwise harmless instructions has two effects:
1) Change the timing a little.
2) Move the rest of the code around a bit.

Looking at the code I can't see 1) being the problem.

But 2) means that if something is being corrupted by a stray pointer or such then adding the instruction can cause something different to be corrupted and the failure mode to change.

Anyway I just tried adding the HUBOP or NOP into waicmd.

Without the new instruction we get up to fibo(23)
With either RDLONG or NOP we only get to fibo(21)
With 5 NOPS we get to fibo(25)
With 9 NOPS we get only to fibo(22).

What does it all mean?
waitcmd wrlong zero,pvmcmd

wl    rdlong    vminst, pvmcmd wz ' top 23 bits = address, bottom 9 = command
    mov    vmaddr, vminst

'        rdlong  nothing, 0         <--- Harmless HUBOP  ?
        nop                              <--- Harmless NOPs  ?
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop

 if_z    jmp    #wl

Bill Henning · 2010-09-06 09:36

I think the most likely place is in BUSERR, as it would do the most (and most relevant) movd's that are not movd <someloc>,vmpage

update

I just went over BUSERR again - very slowly, line by line - and could not find anything there.

Bill Henning · 2010-09-06 10:33

Please find attached a new VMDEBUG, with a new 'x' test.

This test:

- fills 64KB with a pattern that encodes the page/word, so we can predict the value at each word, then

repeat forever

- uses Chip's realrandom object to pick a random page, and a random word within the page
- checks that it reads the expected value, shows error if unexpected value read
- writes $aa55 to force the DIRTY bit on
- makes sure that it can read it back, shows error if write did not go as expected
- puts back the 'expected value'

The above ensures a great deal of paging (BUSERR's) with a flush & read at each BUSERR due to the write setting the dirty bit

every 1000 cycles the current count is printed

- I have yet to see an error message (except when I tested those routines with deliberate bad values)

It even works with a working set of just one page!

I may write a PASM version later.

UPDATE:

- added error count to VMDEBUG 'x' command, shown every 1000 probes

VMCOG: Virtual Memory for ZiCog, Zog & more (VMCOG 0.976: PropCade,TriBlade_2,HYDRA HX512,XEDODRAM)

Comments