We know it will work as there is plenty of history for this kind of thing.
I don't know if it will help, but I'm attaching another graph I worked up a few days ago showing collisions with no replacement policy.
I wonder how long it would take to calculate an average of hits in PASM.
Jazzed: This is the same version of VMCog that causes dhrystone to run all night long. I was just setting myself up with a timer and a means of automating the testing.
Actually I have never used the big printf, as it bloats the code to over 40K, which I cannot run from HUB. This is using a small iprintf, which results in a binary of 17056 bytes. Previously I used a tiny print function that gives a 4K executable.
Bill: I have a hard time getting my head around this access pattern business.
Firstly the code is loaded to RAM. So every page that will ever be used, barring stack pages, is accessed 512 times.
Then we start to run the code. If it was a straight line through initialization every page through to the fibo routine would be accessed 512 times again. It's far from a straight line so perhaps many pages on the way get much bigger hit counts.
So it looks like by the time one gets to the fibo routine it might have the least hit counts of all the code section.
But then there is the stack. That only starts getting hit when the code runs. So it starts off with a hit count at least 512 less than all the rest.
So, looking at it like that it seems the fibo code and the stack end up with the least hit counts by the time we start to run fibo. Given that fibo is so small it never accumulates many hits before being swapped again for the stack. Same for the hits on the stack as fibo runs.
Do we then constantly thrash between the fibo page and a stack page as fibo runs because they are both the least used pages all the time?
Would be nice to be able to glue the stack area into HUB prior to starting.
Hmm... perhaps I can "prime" the system by writing to the stack prior to launching fibo.
Edit: That line I struck out is gibberish. It matters not that fibo is small. There is constant thrashing from code to stack anyway as every instruction hits code and stack.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
I wish I had an extra week :)
As soon as I find the bug in my fixes, the thrashing issue should go away. If we are lucky, I may have some time tonight or tomorrow night for VMCOG.
First, I have to finish 10 boards for a customer, and test them - I should be done with that task sometime today. I will keep an eye on the forum when taking a break from solder fumes. If UPS shows up with new PCB's today, I'll have to populate and test those before getting back to VMCOG.
This thrashing is driving me nuts. VMCOG is SOOOOOOOOO close to being able to run ZOG & CP/M at a decent pace! I was hoping to demonstrate that at UPEW, but now I am starting to doubt it.
I put in the closest I've come to a fix. This works with the "h" test, but fails the "f" test. I am attaching it to this post (sorry, no v0.10 TriBlade merge yet).
I'd be curious to see if (a) it ran fibo, and (b) the differences in timing results...
In Spin I read every location in the top 1K (the stack area) twice (1K of hits for the two pages).
Then I read every byte of the fibo routine's code 1000 times.
Then I start Zog.
Damn thing runs in 10 pages as fast as it did in 34 pages before!
Problem is when I reduce the pages to 4 it crashes:
Take out the "priming" and it runs through OK again, slowly.
Is there any merit in just replacing pages at random?
Nice! That verifies that it WAS thrashing causing the problem!
Was that with the "stock" version from page 1, or the hack from page 11 (last post on the page)?
The "runs in 10 pages as fast as it did in 34 pages before" is the kind of result I'd expect with a good page replacement algorithm - and it bodes well for ZOG and ZiCog on VMCOG.
It should work (but slowly) in 4 pages... and it did without "priming"... sounds like I have more multi-legged creatures to locate in the code...
(if it was not with the p.11 version, please try that...)
Sadly that patch does not pass the new heater test, which now includes a random fill I added because a previous bug slipped through the old test. It's in the v0.10 TriBlade version.
Testing $00010000 bytes
Up count fill...
Up count check...
Down count fill...
Down count check...
Random fill...
Random check...
Error at 1000 expected 26 got 40
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
That "priming" experiment was with the TriBlade V0.11 version, which adds sharing RAM and SD card on the same pins on TriBlade. It's based off your v0.961.
It's attached in case of confusion. Note all sections have changed: BINIT, BSTART, BREAD, BWRITE, BDATA.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
I am 99.99% sure I squished the bug - it was obvious in hindsight.
The version in this message (still without the v0.11 TriBlade merge, sorry) now passes 'f' and 'h', while NOT always re-using the same page. It keeps the previous hit count going, to "even the load".
Even better, I also fixed a bug in shr_hits, so when counts overflow it should divide all access counts in half.
heater - any chance you can try it on fibo, and if it works, give another nice set of numbers for varying numbers of pages?
The (new) heater test passes fine:
Testing $00010000 bytes
Up count fill...
Up count check...
Down count fill...
Down count check...
Random fill...
Random check...
Zero fill...
Checking 00 writing FF...
Checking FF writing AA...
Checking AA writing 55...
Checking 55 writing 00...
OK
And FIBO with 10 pages is damn fast. But sadly, as you see, it crashes on fibo(21); same with 35, 20 and 4 pages.
By the way, prior to crashing, the fibo speed with 4 pages is the same as with 10. Just the printing code is obviously staggering instead. Just what we'd expect.
Sounds like something nasty happens when a hit counter gets big after being hammered on by fibo.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Actually, I think you are right. After seeing your results, I've found two bugs in shr_hits, but I think there must still be one more, as it is crashing for me. Will fix :)
I had to reduce the size of the working set to 10 pages, looks like the Spin stack was corrupting some of the working set pages with a 32 page VM :)
I will now try to find how big I can grow the working set while not letting Spin clobber it.
@Bill,
Stack clobber could account for what I was seeing trying to integrate with Propeller JVM.
Sometimes the serial port output would just go nuts, something like that last line above.
I'll try integrating again very soon. BTW: 4 was the biggest working set I could use with JVM.
This runs through all my fibos up to fibo(26) like (expletive) off a shovel with only 10 pages, staggers a bit at 4 pages. However it now fails with 24 or more pages at different fibos depending on the number of pages used.
This is weird as previously I could happily use 35 pages.
The random fill/read looks like it will run around forever.
My latest Zog is attached. Here rather than on the Zog thread, as it's a bit dodgy still.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
The random fill/read does not need ZOG to be functional, and does not need the Spin system traps... my best guess is that the stack is overwriting the working set when using Zog, using traps etc.
The previous version of ZOG I used left less than 16K free (32 pages) when I hit F8 - and that is without stack space.
If it does not fail with 24 or fewer pages, then I think we have proof that after the stack only 12KB is available.
I gather you like the speed obtained by a fairer page sacrifice algorithm :)
When I have a bit of time, I will add code that uses the "lowest" working set page as a "guard" page: filled with a specific pattern and checksummed.
Then at every page fault I will checksum the guard page - this will tell us if the stack is overwriting it! - in which case I'll blink an LED or something.
heater said...
Almost...
This runs through all my fibos up to fibo(26) like (expletive) off a shovel with only 10 pages, staggers a bit at 4 pages. However it now fails with 24 or more pages at different fibos depending on the number of pages used.
This is weird as previously I could happily use 35 pages.
The random fill/read looks like it will run around forever.
My latest Zog is attached. Here rather than on the Zog thread, as it's a bit dodgy still.
When Zog traps out to print something through the I/O trap it uses Spin stack at the top of HUB.
If the working set (last page at $7C00) were in the way of the Spin stack it would fail no matter how many pages were in use. The other pages are lower down. With only 4 pages it runs through fibo(26).
So I think we can rule out the possibility that the Spin stack grows down into the working area.
What about the other end?
When I hit F8 I see Zog has 4713 longs free, that's a tad over 18K bytes. Let's call it 17K for safety's sake. That's 34 pages. Remember it used to run with 35 pages. 24 pages should fit OK.
So I don't see the working set crashing into the Spin code at the lower end.
But still it fails at 24 pages and over.
Oh yes, I like the speed. Do describe the algorithm in use now.
My guess is there is no "one best" algorithm that suits all programs, and that there will be awkward programs for which nothing much helps. Still, as in the PC world now, people are learning to write their code to cooperate with the huge caches on modern processors - like organizing data in memory so that you iterate along cache lines rather than across them.
The guard page idea might be a good debugging aid but should be optional; 512 bytes lost to it is a lot. Besides, on the TriBlade for example there are no spare pins for indicators.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
I've not dug deep in Spin's guts, but I seem to recall that the stack grows UP from the end of variables (I could easily be wrong on this one).
After I finish testing the pile of boards I just built, I will write a stack overflow checking utility.
The new algorithm is quite simple.
- When asking for a non-resident page, find the least-used resident page that is not locked.
- If it is dirty, write it out to the backing store first, and clear the dirty bit.
- Move its access count and working set page to the TLB entry for the requested page.
- Clear the old entry.
- Load the requested page.
By keeping the previous count, new accesses add to it, and the same page won't always be sacrificed.
I am considering adding a "BIAS" value, that may make it even fairer by a bit.
You are correct, there is no "one" perfect algorithm, and many PhDs were awarded for papers on different page replacement algorithms better suited to different workloads.
Note, caches don't keep access counts, and don't have interesting algorithms. They are either simple direct mapped / hashed, or several-way associative hashed. Much simpler than VM page replacement strategies.
Bill Henning said...
I've not dug deep in Spin's guts, but I seem to recall that the stack grows UP from the end of variables (I could easily be wrong on this one).
No, you are correct. The Spin stack starts at DBASE. Check the top of your listing for DBASE and it will tell you where the stack starts in your particular program.
There are 2 longs placed on the start of the stack by the loader and you are off and running. The stack grows from DBASE towards ROM (and as I said before, things get really funky when the stack meets ROM).
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
"I mean, if I went around sayin' I was an emperor just because some moistened bint had lobbed a scimitar at me they'd put me away!"
In other news...
- I finished testing the 10 boards I had to finish and test today
- I've written a new Spin object called "GuardDog", which I hope to finish testing tonight. It's for testing for stack over-growth :)
BradC said...
Bill Henning said...
I've not dug deep in Spin's guts, but I seem to recall that the stack grows UP from the end of variables (I could easily be wrong on this one).
No, you are correct. The Spin stack starts at DBASE. Check the top of your listing for DBASE and it will tell you where the stack starts in your particular program.
There are 2 longs placed on the start of the stack by the loader and you are off and running. The stack grows from DBASE towards ROM (and as I said before, things get really funky when the stack meets ROM).
OK, if the stack grows UP (is there anything normal about the Prop?) from the end of the Spin data into the Stack/Free Space area (blue bar in the F8 output window), then Zog should have about 6K of stack with a 24 page VM. Which sounds ample - it used to be enough for 25 pages of working set.
Now I move my VM last page up to $7E00. The last page of HUB memory.
With 23 pages fibo runs through to completion.
With 24 pages it fails on fibo(26).
But I just made room for that extra page by moving everything up one page.
I still don't buy the stack overflow explanation.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Stack overflows can be quite insidious, particularly if something writes to the same memory location the stack has overflowed into. When the next method returns, it ends up popping random garbage off the stack and going off into never never land. Generally stack overflows are pretty repeatable though, so they're not *that* hard to pin down.
I thought a growing stack pointer was pretty normal. Every time you push something on the stack you simply increment the pointer. I do recall a processor I once used that worked the other direction, but I can't recall which one it was now.
I like Bill's idea of just poisoning a guard area between the stack and the remainder of RAM. Just check it regularly, and if it has changed your stack has grown bigger than you thought it would.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
"I mean, if I went around sayin' I was an emperor just because some moistened bint had lobbed a scimitar at me they'd put me away!"
The last patch worked wonders.
Remember the RC4 zog demo?
All tests on PropCade, with SPI ram @ 4mbps.
old vmcog, 12 pages 18:07:35-18:08:29 ... 54s
new, 12 pages ... <1.5s
new, 10 pages ... <1.5s
new, 8 pages ... <1.5s
new, 6 pages ... 2s
new, 5 pages ... 4s
new, 4 pages ... 24s
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com
My products: Morpheus / Mem+ / PropCade / FlexMem / VMCOG / Propteus / Proteus / SerPlug
and 6.250MHz Crystals to run Propellers at 100MHz & 5.0" OEM TFT VGA LCD modules
Las - Large model assembler Largos - upcoming nano operating system
I suspect it would also be MUCH faster with the EEPROM code now.
20 pages 42 seconds
10 pages 65 seconds
5 pages 270 seconds
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.