Weird. There definitely was a bug... and it did get fixed.
The zog issue almost certainly has to be one of:
- another problem with shr_hits
- a problem within BUSERR
If it is a problem in BUSERR, it is almost certainly the overwriting of a variable that would be used to do the actual read/write by the appropriate (byte/word/long) routine after handling the BUSERR. BUSERR is effectively an interrupt handler for a "page not present" interrupt from the soft MMU.
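BUSERR is easiest to picture as the fault path of a soft MMU: the read/write routine looks the virtual page up in the TLB, and on a miss it calls the handler to page the data in before retrying the access. Here is a minimal host-side sketch of that shape in Python (SoftMMU, rdbyte, the 512-byte page size and the dict TLB are all illustrative assumptions, not VMCOG's actual layout); the comment marks exactly the hazard described above - the handler must not clobber the variables the retried access still needs:

```python
PAGE_SIZE = 512  # assumption; VMCOG's real page size may differ

class SoftMMU:
    def __init__(self, backing):
        self.backing = backing        # external RAM image (bytearray)
        self.tlb = {}                 # vpage -> in-core page frame

    def buserr(self, vpage):
        """'Page not present' handler: load the page from backing store."""
        start = vpage * PAGE_SIZE
        self.tlb[vpage] = bytearray(self.backing[start:start + PAGE_SIZE])

    def rdbyte(self, vmaddr):
        vpage, offset = divmod(vmaddr, PAGE_SIZE)
        if vpage not in self.tlb:     # the "page not present" interrupt
            self.buserr(vpage)        # must NOT clobber vmaddr/offset!
        return self.tlb[vpage][offset]

# Demo backing store: 4KB whose byte at address a is a & 0xFF
backing = bytearray((a & 0xFF) for a in range(4096))
mmu = SoftMMU(backing)
```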
I need to get myself set up to be able to replicate your problem again.
I'd appreciate a bit of help with that - if you could set up a fibo.spin for me that does not need SD access to run, it would save me the half day or so required to change fsrw and test that it co-exists with the SPI RAMs on PropCade.
I'll merge in your TRIBLADE_2 code today.
I'm also going to duplicate David's two chip setup to see why that is not working for him.
This is worse. Depending on the number of pages I have (I tried 8, 10, 20) it either hangs up around fibo(21) or continues a few more fibos with wrong results and then hangs up.
The heater test in my old vmdebug works OK though.
I cannot compile the new vmdebug; BST complains that it cannot find the hex method in FullDuplexSerialPlus. No idea why.
Bill, can you take the TRIBLADE_2 sections from the attached VMCog. It is your last 0.981 version + TRIBLADE_2.
At this point I'm not entirely surprised by bad output when there is a small memory problem. The committed SdramCache.spin file makes zog speak French when trying to use the dirty bit to flag writes. Since I don't know French, that code is removed entirely now as part of my speed-up.
I got my Hydra SPI SRAM card working with VMCOG!!!
I think the fix to the shr_hits subroutine helped. What prevented it from working after I moved to Bill's new version was forgetting to update the spidir variable with the correct bits to set the direction of all of the pins I use to talk to my SPI SRAM card.
Now, I need to try the microSD card adapter that I also put on this Hydra card. That should let me try running ZOG!!!!
Bill, here is a zog using VMCog with no SD card and running fibo.
I had to make fibo much smaller to get it to fit in HUB with some VM pages. It uses print and outbyte rather than iprintf, which gets it down to about 3.5K.
As it's now a different program it fails in different ways but fails anyway.
F8 tells me it compiles to 2727 longs with 5280 free. Plenty of room for the 8 or so pages I was using.
What on earth have you put into VMCog for it to be too big?
I've been playing with various variations of the heater test this evening, basically trying to introduce a lot of fast thrashing by hitting only a few bytes in each page, and trying to induce a reproducible failure.
I have a hard time making it fail. After one very long run there was a failure but I could not reproduce it. I don't like a random failure as it points to some power supply or other hardware glitch.
LOL, I think it is time I took a breath, re-factored VMCOG, and tried to fix it slowly and methodically, instead of trying to hurry to fix it.
I am running into MANY strange problems, and I need to find out why.
1) the loop that initializes six SPI RAMs now does not initialize RAM 0 - unless I use a different, more verbose version
2) the 17th command added to the table simply does not work!
3) FlexMem ReadStatus and ReadLong work; WriteStatus and WriteLong don't
I am beginning to wonder if I fried TWO props...
Power and ground are clean, I've already put a scope on it. Everything is being transmitted correctly, I've run it under viewport.
I am also thinking along the same lines you are... and writing more memory tests, to find out if there are any faults left in the sacrificial page selection and paging in/out.
I am about ready to kill for #include, and for #ifdef supporting "A | B"...
1) I have a hard time getting a failure in VMCog on TriBlade with any variant of the heater test (now modified to run in a loop forever or until failure). I have had one, unreproducible, failure over many hours now. That one I might put down to some funky hardware glitch. This kind of tells me that vmcog_v0_981_TRIBLADE_2 basically works.
2) Zog itself seems to be solid. It runs from HUB OK. It runs from Jazzed's SDRAM OK.
3) David has the same symptoms (as far as I can tell) with Zog/VMCog using a different hardware. This suggests the problem is not with the hardware interface code of TriBlade.
What does that leave us?
Well, one major difference between driving VMCog from Zog versus from a heater test is that Zog has a PASM interface to the mailbox, whereas the heater test uses VMCog's Spin interface.
So there is a possibility that Zog's PASM interface is not quite correct? Is there perhaps some weird timing issue as it is operating at PASM, not Spin, speeds? Has anyone reviewed that Zog/VMCog interface, from both ends? I've had a look occasionally but spotted nothing untoward.
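For reference, the mailbox command itself is a single long; the comment in VMCog's waitcmd loop describes the packing as "top 23 bits = address, bottom 9 = command". A small sketch of that encoding (pack_cmd and unpack_cmd are hypothetical helper names, not part of either code base):

```python
def pack_cmd(vmaddr, cmd):
    """Pack a virtual address and command code into one mailbox long."""
    assert 0 <= cmd < (1 << 9)
    return ((vmaddr & 0x7FFFFF) << 9) | cmd

def unpack_cmd(vminst):
    """Split a mailbox long back into (address, command)."""
    return vminst >> 9, vminst & 0x1FF

# Per the posted waitcmd code, VMCOG clears the mailbox (wrlong zero),
# then spins on rdlong until the client writes a non-zero packed long.
```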
I have a question about how BUSERR works. I was looking at the following code.
clrold mov 0-0,#0 ' clear old version
movd setent,vmpage
' method #1 - leave access count alone
andn tlbi,VM_FLAGS ' does not work - cannot keep count
' method #2 - semi-average
' call #hitavg
This is the code that clears the old TLB entry and stores the new one. I looked at it and decided it was wrong since it didn't set the access count to one for the newly loaded page. That seemed to make sense because the page has just been loaded and has only been accessed once. Actually, zero would probably be best since it gets incremented after the call to BUSERR by the read/write code. However, setting it to one or zero makes VMCOG perform very slowly.

If you think about it, the idea of basing the decision of which page to kick out on the access count doesn't make any sense. The most recently loaded page will have the lowest access count and hence will be the first to be kicked out. That will cause significant thrashing if the program being run crosses at least one page boundary. It would be better to kick out the least recently used page instead. I don't think this is the cause of the FIBO problems, because it only really affects the page replacement algorithm, but it might be causing performance problems.

Also, the current code just leaves the count alone when it reuses a page, which seems wrong since it means that the counts will be ever increasing until they overflow and are divided by two. It also means that the access counts don't really reflect the number of accesses to the currently loaded virtual page, but instead reflect the history of accesses to that hub page.
The 'andn tlbi,VM_FLAGS' as you correctly point out leaves the access count alone - which seems to be counter intuitive - however it is MUCH better than zeroing the count.
Zeroing the count would pretty much guarantee that the new page would be sacrificed the next time a page needed to be swapped in.
Leaving the count alone means it will be increased by the number of accesses until another sacrificial page was needed - so there is at least a chance that the new page will stay in-core.
Initially I was going to assign newly loaded pages the "average" count of all pages, to ensure they stick around for a while; but I decided not to spend precious cog memory on it. It would also hurt performance if the new page was not going to be accessed frequently.
I've gotten sidetracked by my new boards, debugging, and adding MORPHEUS1 and FLEXMEM drivers; however one simple answer is to add a small bias to make it more likely that a newly loaded page will stay around for a bit - say (16...256)<<11. This would however be counter productive if only a single read was needed from that new page... Note that keeping the previous count already provides the newly loaded page with a significant bias to stay around for a while.
There is a similar issue in shr_hits. The page that overflows the count would have its count set to zero by the shift, so I actually force it to just over half the possible range. That is another "tuning" constant in the code.
Many papers have been written on page replacement policies - my intention was to implement a simple "least used" policy to start with, and then tune it later after it was working well.
The currently implemented replacement policy will tend to favor keeping frequently used pages in-core, and will tend to penalize newly swapped-in pages, as you pointed out. The penalty is not quite as bad as it would seem: whenever a count for a page overflows, the usage count of every in-core page is divided in two, so little-used pages will tend to have their access counts reduced drastically, and thus offer themselves for sacrifice when a new page needs to be loaded.
Adding a small bias to the count of a replaced page will tend to keep the newly swapped in page around long enough for its access count to become influential as another little-used page will tend to be sacrificed before the newly loaded page due to the bias.
Frankly, once VMCOG works perfectly, playing with page replacement policies promises to be fun.
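To make the policy discussion above concrete, here is a toy Python model of the scheme as described: counts climb on each hit, every count is halved when one saturates (the shr_hits aging), and on a fault the page with the smallest count is sacrificed while the new page inherits the old count (the 'andn tlbi,VM_FLAGS' behavior). The class name, the saturation threshold, and the dict representation are all assumptions for illustration:

```python
OVERFLOW = 1 << 20          # hypothetical saturation point

class PageTable:
    def __init__(self, nframes):
        self.nframes = nframes
        self.frames = {}    # vpage -> access count

    def touch(self, vpage):
        """Record one access to vpage, faulting it in if needed."""
        if vpage not in self.frames:
            self._page_in(vpage)
        self.frames[vpage] += 1
        if self.frames[vpage] >= OVERFLOW:
            # shr_hits-style aging: halve every in-core page's count
            for p in self.frames:
                self.frames[p] >>= 1

    def _page_in(self, vpage):
        if len(self.frames) < self.nframes:
            self.frames[vpage] = 0
            return
        # sacrifice the least-used page...
        victim = min(self.frames, key=self.frames.get)
        inherited = self.frames.pop(victim)
        # ...and let the new page inherit its count ('andn tlbi,VM_FLAGS')
        self.frames[vpage] = inherited
```

Zeroing `inherited` instead reproduces the pathology David describes: the newest page always has the lowest count and is the first to be kicked out again.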
Are you sure this is correct? You shift the access count right by 11 before comparing it with minacc. That means that the largest value it could possibly have is $1fffff (21 bits of ones) which is smaller than $400000.
(the shift-right is to get rid of the hub page address, and the LOCK and DIRTY bits)
However, after thinking about it a little, I think you could probably get rid of the shift and use your new bigacc value. It won't hurt to include the page number and lock/dirty bits in the comparison since they're low order bits. The only time they will make a difference is when two entries have the same upper 21 bits and then it probably doesn't matter which page you choose to sacrifice anyway. So, use your new bigacc value and get rid of the shift instruction and you speed up VMCOG (slightly).
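David's point can be checked with a quick sketch. Assuming the entry layout discussed here (access count in the top 21 bits, hub page address plus LOCK/DIRTY in the low 11 bits), comparing whole entries picks the same victim as comparing the shifted counts, because the low bits only ever break ties between equal counts (the helper names are made up for the example):

```python
def make_entry(count, low_bits):
    """Assumed TLB entry layout: count in bits 31..11, flags/page below."""
    return ((count & 0x1FFFFF) << 11) | (low_bits & 0x7FF)

def victim_shifted(entries):
    """Pick the sacrifice by comparing counts after the shr #11."""
    return min(range(len(entries)), key=lambda i: entries[i] >> 11)

def victim_whole(entries):
    """Pick the sacrifice by comparing whole entries (no shift needed)."""
    return min(range(len(entries)), key=lambda i: entries[i])
```

The sketch also shows the $400000 point: a 32-bit entry shifted right by 11 can never exceed $1FFFFF, so the old bigacc could never be beaten.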
I'm not ready to entertain the idea that the Prop is mis-behaving with high speed spinners. Surely something like that would have surfaced already, there have been plenty of projects with high speed spinners and PASM COG synchronization. Even my old 4 COG Z80 emulation had high speed spinners and showed no problems there.
Adding otherwise harmless instructions has two effects:
1) Change the timing a little.
2) Move the rest of the code around a bit.
Looking at the code I can't see 1) being the problem.
But 2) means that if something is being corrupted by a stray pointer or such then adding the instruction can cause something different to be corrupted and the failure mode to change.
Anyway, I just tried adding the HUBOP or NOP into waitcmd.
Without the new instruction we get up to fibo(23)
With either RDLONG or NOP we only get to fibo(21)
With 5 NOPS we get to fibo(25)
With 9 NOPS we get only to fibo(22).
What does it all mean?
waitcmd wrlong  zero,pvmcmd
wl      rdlong  vminst, pvmcmd wz   ' top 23 bits = address, bottom 9 = command
        NOP                         ' <---- ADD one or more NOPs at this position
        mov     vmaddr, vminst
'       rdlong  nothing, 0          ' <--- harmless HUBOP?
        nop                         ' <--- harmless NOPs?
'       nop
'       nop
'       nop                         ' <---- also test with a different number of NOPs in this place
'       nop
'       nop
'       nop
'       nop
'       nop
if_z    jmp     #wl
The culprit would pretty much have to be a 'movd' used to modify the destination of an instruction, or the code before it that calculated the value to 'movd'.
Most of the movd's place 'vmpage' in the destination; I just added an 'and vmpage,#127' where vmpage is calculated (that is the maximum valid vmpage with this version of VMCOG) but that was not enough to stop the old zog from crashing.
Please find attached a new VMDEBUG, with a new 'x' test.
This test:
- fills 64KB with a pattern that encodes the page/word, so we can predict the value at each word
- then repeats forever:
  - uses Chip's realrandom object to pick a random page, and a random word within the page
  - checks that it reads the expected value; shows an error if an unexpected value is read
  - writes $aa55 to force the DIRTY bit on
  - makes sure that it can read it back; shows an error if the write did not go as expected
  - puts back the 'expected value'
- prints the current count every 1000 cycles
The above ensures a great deal of paging (BUSERR's), with a flush & read at each BUSERR due to the write setting the dirty bit.
I have yet to see an error message (except when I tested those routines with deliberate bad values).
It even works with a working set of just one page!
I may write a PASM version later.
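For anyone wanting to port the 'x' test elsewhere, the probe loop above can be sketched host-side like this, run against a plain bytearray standing in for VMCOG (the names and the 512-byte page size are assumptions):

```python
import random

PAGE_SIZE = 512
VM_SIZE = 64 * 1024

def expected(page, word):
    # pattern that encodes the page/word so each word is predictable
    return ((page << 8) | word) & 0xFFFF

def read_word(vm, addr):
    return vm[addr] | (vm[addr + 1] << 8)

def write_word(vm, addr, value):
    vm[addr] = value & 0xFF
    vm[addr + 1] = (value >> 8) & 0xFF

def probe(vm, rng):
    """One cycle of the 'x' test: check, dirty, verify, restore."""
    page = rng.randrange(VM_SIZE // PAGE_SIZE)
    word = rng.randrange(PAGE_SIZE // 2)
    addr = page * PAGE_SIZE + word * 2
    errors = 0
    if read_word(vm, addr) != expected(page, word):
        errors += 1                      # unexpected value read
    write_word(vm, addr, 0xAA55)         # forces the DIRTY bit on
    if read_word(vm, addr) != 0xAA55:
        errors += 1                      # write did not stick
    write_word(vm, addr, expected(page, word))  # put back expected value
    return errors

vm = bytearray(VM_SIZE)
for p in range(VM_SIZE // PAGE_SIZE):
    for w in range(PAGE_SIZE // 2):
        write_word(vm, p * PAGE_SIZE + w * 2, expected(p, w))
```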
UPDATE:
- added error count to VMDEBUG 'x' command, shown every 1000 probes
Thanks to everyone who offered help!
I hacked and slashed, made it barely fit, and it won't print the logon message.
I'll try to get the SD working, which requires hacking fsrw, so I won't have these problems.
I can reliably change the way Zog/VMCog fails by changing the timing in Zog's PASM mailbox handling code.
What I do is add HUB operations to the end of the read functions; for example, read_long ends like this:
Now, as far as I know, the data at mboxdat should not be changing unless I command it to do so, but here is the result - first is Zog with extra delays, second is the original Zog:
ZOG v1.6 (VM, No SD)
fibo(0) = 0 (1ms)
fibo(1) = 1 (0ms)
fibo(2) = 1 (0ms)
fibo(3) = 2 (0ms)
fibo(4) = 3 (1ms)
fibo(5) = 5 (2ms)
fibo(6) = 8 (3ms)
fibo(7) = 13 (5ms)
fibo(8) = 21 (9ms)
fibo(9) = 34 (15ms)
fibo(10) = 55 (25ms)
fibo(11) = 89 (40ms)
fibo(12) = 144 (66ms)
fibo(13) = 233 (107ms)
fibo(14) = 377 (173ms)
fibo(15) = 610 (280ms)
fibo(16) = 987 (454ms)
fibo(17) = 1597 (734ms)
fibo(18) = 2584 (1189ms)
fibo(19) = 4181 (1924ms)
fibo(20) = 6765 (3113ms)
fibo(21) = 10946 (5037ms)
fibo(22) = 17711 (8150ms)
fibo(23) = 29229 (13188ms)
fibo(24) =
#pc,opcode,sp,top_of_stack,next_on_stack
#
0X000061D 0X00 0X00000618 0X000000B1
BREAKPOINT
ZOG v1.6 (VM, No SD)
fibo(0) = 0 (1ms)
fibo(1) = 1 (0ms)
fibo(2) = 1 (0ms)
fibo(3) = 2 (0ms)
fibo(4) = 3 (1ms)
fibo(5) = 5 (1ms)
fibo(6) = 8 (3ms)
fibo(7) = 13 (5ms)
fibo(8) = 21 (8ms)
fibo(9) = 34 (13ms)
fibo(10) = 55 (22ms)
fibo(11) = 89 (35ms)
fibo(12) = 144 (57ms)
fibo(13) = 233 (93ms)
fibo(14) = 377 (151ms)
fibo(15) = 610 (245ms)
fibo(16) = 987 (396ms)
fibo(17) = 1597 (642ms)
fibo(18) = 2584 (1039ms)
fibo(19) = 4181 (1681ms)
fibo(20) = 6765 (2720ms)
fibo(21) = 10946 (4402ms)
fibo(22) =
#pc,opcode,sp,top_of_stack,next_on_stack
#
0X000061D 0X00 0X00000618 0X000000B0
BREAKPOINT
Moving to the other side of the Zog/VMCog interface, I can change the point at which the fibo test fails, from fibo(22) to fibo(25), by adding hubops to VMCog's waitcmd loop, like so:
As this change should affect nothing but timing, I find this a bit worrisome.
Edit: Slugging the TriBlade BREAD, BWRITE loops with such delays, extending RAM access time, has no noticeable effect.
Sound reasoning. Other than ZOG, I have been unable to get a failure - mind you, just like your tests, that was with a Spin client.
I will take another look at ZOG's pasm interface. I actually have a few hours to work on this today!
I don't like this... the extra readlong should not be needed if the prop was behaving as documented...
It almost looks like the prop is returning the wrong result sometimes when two cogs interact with very high speed spinners.
Sapieha had a good question... do you still see an improvement if you use NOP's instead of extra hub ops?
When I changed the TLB format, I forgot to change the "bigacc" value that is used when selecting a sacrificial page.
This would have led to a bug where it might not have been possible to locate a page to sacrifice!
Please change:
bigacc LONG $00400000
to
bigacc long $FFFFF800
Above was in error as per David's message below, DO NOT CHANGE
My excuse: I have not had coffee yet...
Now to locate the culprit...
I said that I would NOT post anything before the FORUM functions correctly ---- BUT.
This problem is too intriguing.
I have ONE more test for you - LOOK at your attached code.
Searching...
UPDATE:
I just went over BUSERR again - very slowly, line by line - and could not find anything there.