So you've laid out a road map. That's a good start.
Attached is an sd based boot-loader I wrote last year that you could consider as a small starting point. Look for BootLoader.doc in the .zip. As far as I remember, it is functional but uses the old FSRW.
There is a lot of *nix flavor here which is good considering many software engineers understand that, and it is a known workable model.
It seems to me that SD access should be available to any application at a very low cost to the application. It's just another O/S service and it can be managed in a sane way with ADT methodology just like any other system service. I assume you've considered /proc and /dev file-systems for resource management although it's unclear if /proc is truly necessary.
Yes, anything that people are willing to write or release should be able to run on such a system. There are some things I probably won't test though. BigSpin has been a goal of mine and I even have a project on the back burner with that name :) Still, I'm more interested in more mainstream things like GNU in industry and Java in education or both in any form; other languages and tools come to mind. Still, small is a very good thing especially when dealing with a micro-controller :)
It's Ok to use .sh extension, but it usually means "this is a Bourne shell file" ... just like .csh usually means this is a "C shell file". Mixing things up is not really desirable. I've seen too many bugs take flight on incorrect assumptions.
Cheers.
--Steve
Bill Henning said...
I will make a Minos feature set document soon... I am still working on it.
Basically, SD with fsrw during boot, MODULES and INIT.D, and I suspect mostly .ZOG executables. .COG drivers/vm's.
I desperately need an SD cog, with a mailbox interface, and an fsrw26 modified to use that. Once we have those, work on the boot loader can be started.
Once the bootloader can process MODULES and INIT.D, it is time to write mash.zog
The idea is that a small C version of FAT support can be compiled with GCC to ZOG code, so it can run from the virtual memory.
Then it is just more mailbox drivers, and utilities. I also want to support spin bytecode executables, but ideally with a "BigSpin" interpreter - OR - they take over the 24KB left after the Minos system area until BigSpinVM exists.
The ideas are gelling in my head... i.e. what subset of Largos features is needed / feasible.
For "compatibility", I think mash scripts should still have a plain ".sh" extension
And I *REALLY* want to see Sphinx running under Minos. Combined with BigSpinVM it would be a killer combination.
jazzed said...
So you've laid out a road map. That's a good start.
Thanks - I think between adherence to the KISS principle, and an incremental development cycle with baby steps, we can get something basic running pretty quick - and expand from there!
jazzed said...
Attached is an sd based boot-loader I wrote last year that you could consider as a small starting point. Look for BootLoader.doc in the .zip. As far as I remember, it is functional but uses the old FSRW.
Nice work! I quickly read the .doc, there is a lot of good stuff there...
jazzed said...
There is a lot of *nix flavor here which is good considering many software engineers understand that, and it is a known workable model.
That is quite deliberate, within the context of no subdirectories and 8.3 file names (for now at least)
Largos is even more *nixy, VERY deliberately. I'll bring the prototype on a Morpheus to UPEW, running off a 1MB Winbond with my FS.
jazzed said...
It seems to me that SD access should be available to any application at a very low cost to the application. It's just another O/S service and it can be managed in a sane way with ADT methodology just like any other system service. I assume you've considered /proc and /dev file-systems for resource management although it's unclear if /proc is truly necessary.
Largos does implement /dev, but not /proc - and I may even drop /dev. I was mainly planning to use it for device numbers, but /etc/modules is sufficient I think.
I have to keep things nice and very small... at least until Prop2
jazzed said...
Yes, anything that people are willing to write or release should be able to run on such a system. There are some things I probably won't test though. BigSpin has been a goal of mine and I even have a project on the back burner with that name :) Still, I'm more interested in more mainstream things like GNU in industry and Java in education or both in any form; other languages and tools come to mind. Still, small is a very good thing especially when dealing with a micro-controller :)
LOL... I just used BigSpin as a handy moniker, I think I heard both you and Cluso use that description in the past.
I am interested in all three tool chains - gnu C/C++, Java, and spin - for the same reasons you are interested in them.
jazzed said...
It's Ok to use .sh extension, but it usually means "this is a Bourne shell file" ... just like .csh usually means this is a "C shell file". Mixing things up is not really desirable. I've seen too many bugs take flight on incorrect assumptions.
ah I guess ".msh" is ok.
I liked ".sh" as it implies a limited functionality :)
Hmm. I was just running that bootloader and reviewing implementation.
There is a lot of unnecessary cruft there especially with that config block.
I think a file line-editor would be more valuable than the config business,
but there are some things that are minimum requirements like sd card
pins and a startup program filename (startup can be hard coded).
There are a few salvageable items:
- It will boot a default but limited size spin binary up to 26KB. Mike Green's
booter by comparison allows booting a full 32KB application using cog code.
I know how to do that too, just never thought it too important for startup.
- There are some limited diagnostic abilities if boot fails.
The Spin access methods are pretty much just simple manipulation of that block.
How would you like it to look for minos or can it stay the same?
I am using a slightly modified version by Cluso that has a function to tristate its pins. This is required as RAM and SD share pins on the TriBlade/RamBlades.
Edit: fsrwfemto itself does not have a running COG; it's only a bunch of Spin functions. So it makes no sense to have a mailbox interface to it from above.
It could be easily modified to use a mailbox interface to the SPI driver but what is the point?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
The boot-loader I posted uses the Spin only version of the old SD fsrw driver mainly to save hub space for the application. The total fsrw code/var size for that is 850 longs. However, one could probably create an fsrw app that uses less space by recycling the cog code.
Also, the directory/filename issues for any FAT* driver will end up needing to use trans.tbl or one of its successors, so it may as well be done up front.
I took a closer look, and it is a bit heavy in its current form. It is a nice piece of work though.
jazzed said...
Hmm. I was just running that bootloader and reviewing implementation.
There is a lot of unnecessary cruft there especially with that config block.
I think a file line-editor would be more valuable than the config business,
but there are some things that are minimum requirements like sd card
pins and a startup program filename (startup can be hard coded).
There are a few salvageable items:
- It will boot a default but limited size spin binary up to 26KB. Mike Green's
booter by comparison allows booting a full 32KB application using cog code.
I know how to do that too, just never thought it too important for startup.
- There are some limited diagnostic abilities if boot fails.
Frankly, a multi-processor chip with shared RAM will pretty much devolve to some mailbox-like mechanism, and Mike's layout is a logical one.
I prefer a fixed 4 long format. VMCOG diverged a bit from the one I use in Largos, due to merging the command and vmaddress longs into one for efficiency.
In cases where it is not absolutely necessary, I do not like packing the longs with several fields, as it takes too many cog instructions to pack/unpack them, so I don't plan to adopt Mike's format.
I think something like the following would work for an SD cog:
mbox[0] = command / set to 0 by SD.COG when complete
mbox[1] = sector address (2^32 sectors of 512 bytes each = maximum 2^41 bytes on media = approx. 2TB)
mbox[2] = hub address
mbox[3] = number of sectors to transfer / return code
Obviously the setup commands can assign different meanings to 1/2/3 for their input, but all should return status/error code in 3.
There is no reason a cog running the spin fsrwfemto could not be equipped with such a mailbox .. the point being invoking it from within ZOG directly :)
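For illustration only, here is a minimal host-side C sketch of the four-long layout above. The command codes and the names `sd_mbox_t`, `sd_post`, `sd_done` and `sd_fake_service` (a stand-in for the SD cog) are assumptions made for this example; nothing in the thread defines them.

```c
#include <stdint.h>

/* Hypothetical command codes; only "0 = complete" comes from the post. */
enum { SD_CMD_DONE = 0, SD_CMD_READ = 1, SD_CMD_WRITE = 2 };

/* Four longs, one field per long, no packing. */
typedef struct {
    volatile uint32_t cmd;     /* mbox[0]: command, set to 0 when complete */
    volatile uint32_t sector;  /* mbox[1]: sector address                  */
    volatile uint32_t hubaddr; /* mbox[2]: hub address of the buffer       */
    volatile uint32_t count;   /* mbox[3]: sectors in, return code out     */
} sd_mbox_t;

/* Fill in the arguments first and write the command long last. */
static void sd_post(sd_mbox_t *m, uint32_t cmd, uint32_t sector,
                    uint32_t hubaddr, uint32_t nsectors)
{
    m->sector  = sector;
    m->hubaddr = hubaddr;
    m->count   = nsectors;
    m->cmd     = cmd;
}

/* The caller polls this until the driver clears the command long. */
static int sd_done(const sd_mbox_t *m)
{
    return m->cmd == SD_CMD_DONE;
}

/* Stand-in for the driver cog: report success (0) and clear the command. */
static void sd_fake_service(sd_mbox_t *m)
{
    if (m->cmd != SD_CMD_DONE) {
        m->count = 0;
        m->cmd   = SD_CMD_DONE;
    }
}
```

Writing the command long last matters in a scheme like this: the driver polls mbox[0], so the other three longs should already be valid when it turns non-zero.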
heater said...
Bill, Jazzed, a lot of good minos strategy expressed in that page of postings. I should print the thing out.
Did anyone notice that spiFemto pretty much uses a minos mailbox already? Its cog is started with par pointing at this structure:
The Spin access methods are pretty much just simple manipulation of that block.
How would you like it to look for minos or can it stay the same?
I am using a slightly modified version by Cluso that has a function to tristate its pins. This is required as RAM and SD share pins on the TriBlade/RamBlades.
Edit: fsrwfemto itself does not have a running COG; it's only a bunch of Spin functions. So it makes no sense to have a mailbox interface to it from above.
It could be easily modified to use a mailbox interface to the SPI driver but what is the point?
jazzed said...
The boot-loader I posted uses the Spin only version of the old SD fsrw driver mainly to save hub space for the application. The total fsrw code/var size for that is 850 longs. However, one could probably create an fsrw app that uses less space by recycling the cog code.
Also, the directory/filename issues for any FAT* driver will end up needing to use trans.tbl or one of its successors, so it may as well be done up front.
but - " ...the point being invoking it from within ZOG directly" does not sound right to me.
Zog as a CPU engine should not know about such things. It should have connections to its memory, its I/O and possibly signals like interrupt and halt.
As the ZPU has no I/O address space or bus, one could say that all I/O should go through its memory interface, i.e. VMCOG. However, it seems more reasonable to partition the memory space and have access to some I/O area be directed through an IO mailbox to whatever I/O handler there is.
I meant invoking it from ZOG byte code, which resides in the virtual memory.
As far as how to communicate with mailboxes, I see two possible scenarios:
map a high virtual address onto a specific hub page
or
add "raw" {rd|wr}{byte|word|long} system calls to ZOG, which bypass the VM. (I prefer this escape hatch)
heater said...
Point taken about fsrwfemto's mailbox.
but - " ...the point being invoking it from within ZOG directly" does not sound right to me.
Zog as a CPU engine should not know about such things. It should have connections to its memory, its I/O and possibly signals like interrupt and halt.
As the ZPU has no I/O address space or bus, one could say that all I/O should go through its memory interface, i.e. VMCOG. However, it seems more reasonable to partition the memory space and have access to some I/O area be directed through an IO mailbox to whatever I/O handler there is.
I start to think we should map all of HUB memory (including ROM) to some high address in the ZPU memory space which would bypass VMCOG. Which gives us the following:
1) Access to any mailboxes.
2) Access buffers spaces in HUB.
3) Enables putting the ZPU stack in HUB.
4) Enables putting chunks of ZPU code and data in HUB.
1,2, and 3 can run normal ZPU programs as built "out of the box" now. ZPU does not care where you put the stack and we only need some pointers into HUB space.
4 requires a little tweaking of linker scripts to locate code from selected/modules into a .hub section.
There we have a possibility to turbo some parts of large programs. Or have ZPU programs in COG operating on data in ext RAM.
I'd like to adapt the syscall mechanism to allow direct access to pins, counters, locks, cognew, the hub memory etc. All normal Prop features. Such that anything you can do in Spin you can do in C.
I think that sounds like an excellent idea! The only downside is the test for hub/vm address space, however I think it would be worth it.
Three IMM's give 21 bit constants. Four give 28 bits.
I would suggest bit 21 for a "medium" ZOG (small is hub-only), and bit 27 for "large" zog. Why? Simply to keep the number of IM's under control.
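A rough C sketch of the two pieces of arithmetic here: the hub/VM address test and the number of IM instructions a constant costs. It assumes the "medium" layout with bit 21 as the flag, and it assumes the first IM sign-extends its 7 bits while each following IM shifts in 7 more (my reading of the ZPU encoding, worth verifying). The names `HUB_FLAG_BIT`, `is_hub_addr`, `hub_offset` and `im_count` are made up for this example.

```c
#include <stdint.h>

/* Assumed flag bit for a "medium" ZOG, taken from the suggestion above. */
#define HUB_FLAG_BIT 21u

/* Nonzero when the address should bypass the VM and go straight to HUB. */
static int is_hub_addr(uint32_t addr)
{
    return (int)((addr >> HUB_FLAG_BIT) & 1u);
}

/* Drop the flag bit and everything above it, leaving the HUB offset. */
static uint32_t hub_offset(uint32_t addr)
{
    return addr & ((1u << HUB_FLAG_BIT) - 1u);
}

/* Number of IM instructions needed to load a signed constant, assuming
 * n IMs carry 7*n bits and the topmost of those bits is the sign bit. */
static int im_count(int32_t value)
{
    int n = 1;
    while (n < 5) {
        int64_t half = (int64_t)1 << (7 * n - 1);
        if (value >= -half && value <= half - 1)
            break;
        n++;
    }
    return n;
}
```

Under that sign-extension assumption, a positive address with bit 21 set (for example $200000) comes out at four IMs rather than three, so the IM cost is worth checking against the real encoding before the flag bit is fixed.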
Instead of 4, I suspect a "launch temporary cog" would be better, treating the cog like a subroutine with 100us overhead... and running pasm in the cog.
Instead of having specific pin/dir features, maybe two sys functions getreg() and setreg(), which read/write cog registers $000-$1FF, thus accessing dira+friends
Good point about the address lengths and number of IM's required.
If we were to omit 4), executing code from HUB, there would be no extra test for address space required in the opcode fetch path.
If we were to fix where the stack resides, HUB or ext, then there is no need for the address space test in push_tos or pop either. Perhaps this could be a build option with #define.
Hmm... "launch temporary cog", remote procedure calls. Temporary or not this should be done through syscall.
"getreg() and setreg(), which read/write cog registers $000-$1FF" - That gives ZPU programs the possibility to alter the virtual machine they are running on. Ouch! Could get interesting :)
The good news is: with a direct-mapped cache using 16 bytes per cache line,
64 tags and no aging replacement policy, the Propeller JVM can run the FIBO24
test from EEPROM faster than Javelin in the worst case test
Moreover, it is only 2x slower in the worst case than running from HUB RAM.
The bad news is: I can't get vmcog to work yet.
I'm sure it's just a matter of time and more effort though.
Validation of principle? :) I think so.
I do find though that trying to use a cache line larger than 16 significantly degrades performance.
While FIBO(0) time goes to 3ms (+/- 1ms), the FIBO(22) test takes over 1 minute to finish.
Have you by chance considered a vmcog with a cache line smaller than 512 bytes ?
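To make the cache geometry concrete, here is a small host-side C model of a direct-mapped cache with 16-byte lines and 64 tags and no aging. It is only a sketch of the general technique, not the Propeller JVM's code; the `backing` array stands in for the EEPROM, and all names (`dm_cache_t`, `cache_init`, `cache_read`) are invented for the example.

```c
#include <stdint.h>
#include <string.h>

#define LINE_BYTES    16u    /* bytes per cache line      */
#define NUM_LINES     64u    /* number of tags / lines    */
#define BACKING_BYTES 4096u  /* size of the fake "EEPROM" */

typedef struct {
    uint32_t tag[NUM_LINES];
    uint8_t  valid[NUM_LINES];
    uint8_t  data[NUM_LINES][LINE_BYTES];
    unsigned hits, misses;
} dm_cache_t;

/* Stand-in for the slow backing store. */
static uint8_t backing[BACKING_BYTES];

static void cache_init(dm_cache_t *c)
{
    memset(c, 0, sizeof *c);
}

/* Each address maps to exactly one line, so there is no replacement
 * policy: on a tag mismatch that line is simply refilled. */
static uint8_t cache_read(dm_cache_t *c, uint32_t addr)
{
    addr %= BACKING_BYTES;
    uint32_t line = addr / LINE_BYTES;
    uint32_t idx  = line % NUM_LINES;
    uint32_t tag  = line / NUM_LINES;

    if (!c->valid[idx] || c->tag[idx] != tag) {
        memcpy(c->data[idx], &backing[line * LINE_BYTES], LINE_BYTES);
        c->tag[idx]   = tag;
        c->valid[idx] = 1;
        c->misses++;
    } else {
        c->hits++;
    }
    return c->data[idx][addr % LINE_BYTES];
}
```

With this geometry the cache covers 1KB at a time, and two addresses exactly 1KB apart always evict each other.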
Nice! That certainly makes an EEPROM based direct-mapped Javelin replacement very workable!
I am sure you can get VMCOG running.
Yes, we can try smaller page sizes once I move to a hub-based page present table.
I don't think the paging speed impact (for RAM) will be as bad as for EEPROM, because I2C EEPROM normally runs at 400 kbps, whereas even my slow SPI routines run at 4 Mbps; using counter code, 10 Mbps is definitely possible, and 20 Mbps may be possible (have to check reliability).
Ignoring the headers, 10 Mbps SPI flash will read/write 25x faster than 400 kbps EEPROM.
16 bytes * 25 = 400 bytes
Moving to 256 byte pages then should be faster than 16 byte eeprom pages.
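The break-even estimate can be checked with a few lines of C. This sketch only counts raw payload bits, ignoring command/address headers as the post does; `transfer_us` is a name made up for the example.

```c
#include <stdint.h>

/* Time in microseconds to move `bytes` of payload at a given bit rate,
 * ignoring any command or address overhead. Truncates to whole us. */
static uint32_t transfer_us(uint32_t bytes, uint32_t bits_per_second)
{
    return (uint32_t)((uint64_t)bytes * 8u * 1000000u / bits_per_second);
}
```

On these numbers a 16-byte line at 400 kbps takes 320 us, the same as 400 bytes at 10 Mbps, while a 256-byte page at 10 Mbps takes about 204 us.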
The problem with really small pages is the number of "page present" entries required in the hub.
The other limit is that realistically, there can only be 128 TLB entries in a cog even after the page present table is moved to the hub.
jazzed said...
@Bill.
Well I have good news and bad news.
The good news is: with a direct-mapped cache using 16 bytes per cache line,
64 tags and no aging replacement policy, the Propeller JVM can run the FIBO24
test from EEPROM faster than Javelin in the worst case test
Moreover, it is only 2x slower in the worst case than running from HUB RAM.
The bad news is: I can't get vmcog to work yet.
I'm sure it's just a matter of time and more effort though.
Validation of principle? :) I think so.
I do find though that trying to use a cache line larger than 16 significantly degrades performance.
While FIBO(0) time goes to 3ms (+/- 1ms), the FIBO(22) test takes over 1 minute to finish.
Have you by chance considered a vmcog with a cache line smaller than 512 bytes ?
Bill Henning said...
The other limit is that realistically, there can only be 128 TLB entries in a cog even after the page present table is moved to the hub.
Ya, Propeller memory size always makes me feel like I'm wearing a straitjacket.
I'll make VMCOG work on Propeller JVM at some point.
I haven't given up on VMCOG, it's just that in the EEPROM case, it is just not practical.
As is, I don't think it would serve VMCOG very well by adding EEPROM code.
Good thing the SPI RAM is not limited to 400 kHz :)
No worries, take your time... I can wait a while to see Java on VMCOG - but I am looking forward to it!
heater:
Great progress!
I think Dhrystone would be a lot quicker with a bigger working set. As a matter of fact, plotting a graph of Dhrystone execution time vs. working set size is one of the things I'd like to see.
I'll post the cut-down PropCmd later today, that should let you have a 46 page or so working set.
Attached is vmcog with v0.10 of my TriBlade access routines.
The TriBlade 2 shares pins between ext RAM and SD card (and other things) so I had to ensure the "bus" is tristated at the end of each BREAD and BWRITE.
That meant moving all ext memory handling code out of BINIT and BSTART and placing it at the front of BREAD and BWRITE. Then tristating the bus after BREAD and BWRITE. This way the RAM chip select is reasserted on every BREAD/WRITE and "lost" again when they are done.
This now enables Zog to load its ext RAM from SD at start up.
It's nice to see some progress, but it seems this is not so encouraging.
I would have expected something other than a step function.
So 34*512 > code size? What happens if you use a big printf again?
jazzed: The current page replacement policy basically sucks for these tests.
I am preparing for UPEW, so my time is incredibly tight right now, but I have started trying to fix it.
Basically what happens is this:
1- program runs along
2- oops, need to swap a page in
3- select sacrificial page (based on least accesses)
4- runs a bit
5- goto 2
Now if the access pattern of the program is such that the newly swapped in page is not hit a lot before there is a need to swap another page in... guess which page gets chosen for the sacrifice? You guessed right. The same page as last time. So the VM ends up thrashing, flushing and re-loading the same page --> performance then sucks.
I am working on two possible fixes, but they are not (yet) working correctly!
1) simple fix: keep the access count of the page that was swapped out. This will make the count increase with hits to the newly loaded page, and hopefully it won't be always chosen as the sacrificial page.
2) better fix: average the hit counts of all pages, and assign the average as the hit count of the new page. This should lead to a "fairer" sacrifice.
I've tried both, and both fail the "f" fill word test. I have a bug somewhere, probably another badly re-used variable.
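The thrashing and the two candidate fixes can be modelled on a PC in a few lines of C. This is a toy model of the policy as described above, not VMCOG code: the four-slot working set, the names `wset_t`, `policy_t`, `access_page`, and the choice to count the faulting access as one hit are all assumptions made for the sketch.

```c
#define SLOTS 4

typedef enum {
    POLICY_RESET,     /* current behaviour: new page starts from zero */
    POLICY_KEEP_OLD,  /* fix 1: inherit the evicted page's count      */
    POLICY_AVERAGE    /* fix 2: start from the average of all counts  */
} policy_t;

typedef struct {
    int      page[SLOTS];   /* which virtual page each slot holds */
    unsigned count[SLOTS];  /* access count per slot              */
} wset_t;

/* Sacrificial slot = lowest access count (first one wins on ties). */
static int pick_victim(const wset_t *w)
{
    int v = 0;
    for (int i = 1; i < SLOTS; i++)
        if (w->count[i] < w->count[v])
            v = i;
    return v;
}

static unsigned average_count(const wset_t *w)
{
    unsigned sum = 0;
    for (int i = 0; i < SLOTS; i++)
        sum += w->count[i];
    return sum / SLOTS;
}

/* Returns -1 on a hit, or the slot that was sacrificed on a miss. */
static int access_page(wset_t *w, int page, policy_t policy)
{
    for (int i = 0; i < SLOTS; i++) {
        if (w->page[i] == page) {
            w->count[i]++;
            return -1;
        }
    }
    int v = pick_victim(w);
    unsigned start = 0;
    if (policy == POLICY_KEEP_OLD)
        start = w->count[v];
    else if (policy == POLICY_AVERAGE)
        start = average_count(w);
    w->page[v]  = page;
    w->count[v] = start + 1;  /* the faulting access counts as one hit */
    return v;
}
```

Starting from four resident pages with equal counts and then faulting in three new pages back to back, the reset policy sacrifices slot 0 every time, while both fixes move on to slots 1 and 2 in this model.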
VMCOG is now proven to work, at this point it has to be tuned for better performance.
I will also make a "bigvm" version as soon as I can.
Comments
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Pages: Propeller JVM
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com
My products: Morpheus / Mem+ / PropCade / FlexMem / VMCOG / Propteus / Proteus / SerPlug
and 6.250MHz Crystals to run Propellers at 100MHz & 5.0" OEM TFT VGA LCD modules
Las - Large model assembler Largos - upcoming nano operating system
Did anyone notice that spiFemto pretty much uses a minos mailbox already? Its cog is started with par pointing at this structure:
cmd/status is a byte.
Post Edited (heater) : 6/12/2010 3:27:47 AM GMT
The problem with really small pages is the number of "page present" entries required in the hub.
page_present_table_size_in_bytes = VM_size_in_bytes / byte_per_page
The other limit is that realistically, there can only be 128 TLB entries in a cog even after the page present table is moved to the hub.
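The table-size formula is easy to try out in C. The sketch below just restates it; the function name and the 64KB VM size used in the note are assumptions for illustration.

```c
#include <stdint.h>

/* One table byte per page, following the formula above. */
static uint32_t page_present_table_bytes(uint32_t vm_bytes,
                                         uint32_t bytes_per_page)
{
    return vm_bytes / bytes_per_page;
}
```

For a 64KB virtual memory, 512-byte pages need 128 bytes of table, 256-byte pages need 256 bytes, and 16-byte pages need 4096 bytes, which is an eighth of the 32KB hub.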
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com
My products: Morpheus / Mem+ / PropCade / FlexMem / VMCOG / Propteus / Proteus / SerPlug
and 6.250MHz Crystals to run Propellers at 100MHz & 5.0" OEM TFT VGA LCD modules
Las - Large model assembler Largos - upcoming nano operating system
I'll make VMCOG work on Propeller JVM at some point.
I haven't given up on VMCOG; it's just that in the EEPROM case it is not practical.
As is, I don't think adding EEPROM code would serve VMCOG very well.
Good thing the SPI RAM is not limited to 400 kHz :)
Cheers.
--Steve
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I can now load the ZPU image to ext RAM from an SD file.
Sadly dhrystone took over 20 minutes to run!
I have 16 pages for VMCog, while dhrystone is 17K of code and the stack lives at 64K.
I'll post TriBlade updates for VMCog when I have cleaned it up a bit.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
No worries, take your time... I can wait a while to see Java on VMCOG - but I am looking forward to it!
heater:
Great progress!
I think Dhrystone would be a lot quicker with a bigger working set. As a matter of fact, plotting a graph of Dhrystone execution time vs. working set size is one of the things I'd like to see.
I'll post the cut-down PropCmd later today, that should let you have a 46 page or so working set.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
The TriBlade 2 shares pins between ext RAM and SD card (and other things) so I had to ensure the "bus" is tristated at the end of each BREAD and BWRITE.
That meant moving all ext memory handling code out of BINIT and BSTART and placing it at the front of BREAD and BWRITE, then tristating the bus after BREAD and BWRITE. This way the RAM chip select is reasserted on every BREAD/BWRITE and "lost" again when they are done.
This now enables Zog to load its ext RAM from SD at start up.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I just uploaded version 0.970 into the first post.
This version adds a "hitavg" subroutine used by BUSERR to set the access count of the newly loaded page to the average of all pages' access counts.
It may help with certain thrashing situations.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I mentioned the lack in the ZOG thread.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I will put back the last release, and remove this version until I have time to fix it (tonight maybe). Sorry, I am soldering like crazy.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Stay tuned...
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Under vmcog it calculates and prints ALL the FIBOs from 0 to 26 in 140 seconds.
From HUB RAM that's 42 seconds.
Much more like it.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
This one is 17056 bytes as it uses a small version of printf to display results like so:
fibo(00) = 000000 (00000ms)
fibo(01) = 000001 (00000ms)
fibo(02) = 000001 (00000ms)
fibo(03) = 000002 (00000ms)
fibo(04) = 000003 (00001ms)
fibo(05) = 000005 (00001ms)
fibo(06) = 000008 (00003ms)
fibo(07) = 000013 (00005ms)
fibo(08) = 000021 (00008ms)
fibo(09) = 000034 (00013ms)
fibo(10) = 000055 (00022ms)
fibo(11) = 000089 (00035ms)
fibo(12) = 000144 (00057ms)
fibo(13) = 000233 (00093ms)
fibo(14) = 000377 (00151ms)
fibo(15) = 000610 (00245ms)
Results only go up to FIBO(15) as my crude CNT based timing rolls over for the slow tests after that.
I have run this with a selection of page counts from 35 (the most I can fit in) down to 1 (yes, 1).
A summary of the execution times for FIBO(15) looks like this:
Pages, milliseconds:
01 26587
05 26587
10 26709
20 26927
30 27180
31 27203
32 03134
33 03134
34 00245
35 00245
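One possible reading of the drop at 34 pages, assuming the 512-byte VMCOG page size mentioned elsewhere in the thread: 34 is the first page count that can hold the whole 17056-byte image. The sketch below only checks that arithmetic; pages_needed is a made-up helper name, and it does not explain the earlier step at 32 pages.

```python
PAGE_BYTES = 512      # ASSUMPTION: VMCOG page size as mentioned in the thread
IMAGE_BYTES = 17056   # size of this FIBO binary, from the post


def pages_needed(image_bytes, page_bytes):
    """Smallest number of whole pages that covers image_bytes (ceiling division)."""
    return -(-image_bytes // page_bytes)


needed = pages_needed(IMAGE_BYTES, PAGE_BYTES)
print(f"pages needed to hold the whole image: {needed}")
print(f"{needed - 1} pages cover {(needed - 1) * PAGE_BYTES} bytes, "
      f"{needed} pages cover {needed * PAGE_BYTES} bytes")
```

With 33 pages the image does not quite fit (16896 bytes), and with 34 it does (17408 bytes), so from that point on nothing should need to be swapped out.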
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (heater) : 6/17/2010 4:09:12 PM GMT
It's nice to see some progress, but it seems this is not so encouraging.
I would have expected something other than a step function.
So 34*512 > code size? What happens if you use a big printf again?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
jazzed: The current page replacement policy basically sucks for these tests.
I am preparing for UPEW, so my time is incredibly tight right now, but I have started trying to fix it.
Basically what happens is this:
1- program runs along
2- oops, need to swap a page in
3- select sacrificial page (based on least accesses)
4- runs a bit
5- goto 2
Now if the access pattern of the program is such that the newly swapped in page is not hit a lot before there is a need to swap another page in... guess which page gets chosen for the sacrifice? You guessed right. The same page as last time. So the VM ends up thrashing, flushing and re-loading the same page --> performance then sucks.
I am working on two possible fixes, but they are not (yet) working correctly!
1) simple fix: keep the access count of the page that was swapped out. This will make the count increase with hits to the newly loaded page, and hopefully it won't always be chosen as the sacrificial page.
2) better fix: average the hit counts of all pages, and assign the average as the hit count of the new page. This should lead to a "fairer" sacrifice.
I've tried both, and both fail the "f" fill word test. I have a bug somewhere, probably another badly re-used variable.
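To illustrate the thrashing and the two candidate fixes, here is a toy Python simulation. It is not VMCOG code: it assumes a newly loaded page currently starts with a count of zero (labelled "zero" below), it picks the victim by lowest access count as described above, and the trace, slot count and policy names are all invented for the example.

```python
def simulate(trace, num_slots, policy):
    """Count evictions under a least-accessed replacement policy.

    policy selects the starting access count of a newly loaded page:
      "zero"    - start from 0 (ASSUMED current behaviour)
      "inherit" - keep the count of the page that was swapped out (fix 1)
      "average" - average of all resident pages' counts (fix 2)
    """
    resident = {}      # page number -> access count
    evictions = 0
    for page in trace:
        if page not in resident:
            start = 0
            if len(resident) >= num_slots:
                # Sacrificial page: the one with the fewest accesses.
                victim = min(resident, key=resident.get)
                if policy == "inherit":
                    start = resident[victim]
                elif policy == "average":
                    start = sum(resident.values()) // len(resident)
                del resident[victim]
                evictions += 1
            resident[page] = start
        resident[page] += 1
    return evictions


# Hypothetical trace: pages 0-2 are hit 10 times each and then go quiet,
# after which the program alternates between pages 3 and 4.
TRACE = [0, 1, 2] * 10 + [3, 4] * 10

for name in ("zero", "inherit", "average"):
    print(name, simulate(TRACE, num_slots=4, policy=name))
```

On this particular trace with 4 slots, "zero" evicts on every one of the last 19 accesses because pages 3 and 4 keep sacrificing each other, "inherit" settles after 10 evictions, and "average" settles after 3. Other access patterns will behave differently, so treat this as an illustration of the idea rather than a prediction for VMCOG.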
VMCOG is now proven to work, at this point it has to be tuned for better performance.
I will also make a "bigvm" version as soon as I can.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔