VMCOG: Virtual Memory for ZiCog, Zog & more (VMCOG 0.976: PropCade,TriBlade_2,HYDRA HX512,XEDODRAM) - Page 9 — Parallax Forums

Comments

  • jazzed Posts: 11,803
    edited 2010-06-11 23:46
    So you've laid out a road map. That's a good start.

    Attached is an sd based boot-loader I wrote last year that you could consider as a small starting point. Look for BootLoader.doc in the .zip. As far as I remember, it is functional but uses the old FSRW.

    There is a lot of *nix flavor here which is good considering many software engineers understand that, and it is a known workable model.

    It seems to me that SD access should be available to any application at a very low cost to the application. It's just another O/S service and it can be managed in a sane way with ADT methodology just like any other system service. I assume you've considered /proc and /dev file-systems for resource management although it's unclear if /proc is truly necessary.

    Yes, anything that people are willing to write or release should be able to run on such a system. There are some things I probably won't test though. BigSpin has been a goal of mine and I even have a project on the back burner with that name :) Still, I'm more interested in more mainstream things like GNU in industry and Java in education or both in any form; other languages and tools come to mind. Still, small is a very good thing especially when dealing with a micro-controller :)

    It's Ok to use .sh extension, but it usually means "this is a Bourne shell file" ... just like .csh usually means this is a "C shell file". Mixing things up is not really desirable. I've seen too many bugs take flight on incorrect assumptions.

    Cheers.
    --Steve
    Bill Henning said...
    I will make a Minos feature set document soon... I am still working on it.

    Basically, SD with fsrw during boot, MODULES and INIT.D, and I suspect mostly .ZOG executables. .COG drivers/vm's.

    I desperately need an SD cog, with a mailbox interface, and an fsrw26 modified to use that. Once we have those, work on the boot loader can be started.

    Once the bootloader can process MODULES and INIT.D, it is time to write mash.zog

    The idea is that small C version of FAT support can be compiled with Gcc to zog code, so it can run from the virtual memory.

    Then it is just more mailbox drivers, and utilities. I also want to support spin bytecode executables, but ideally with a "BigSpin" interpreter - OR - they take over the 24KB left after the Minos system area until BigSpinVM exists.

    The ideas are gelling in my head... ie what subset of Largos features are needed / feasible.

    For "compatibility", I think mash scripts should still have a plain ".sh" extension

    And I *REALLY* want to see Sphinx running under Minos. Combined with BigSpinVM it would be a killer combination.
    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Propeller Pages: Propeller JVM
  • Bill Henning Posts: 6,445
    edited 2010-06-12 01:05
    jazzed said...
    So you've laid out a road map. That's a good start.

    Thanks - I think between adherence to the KISS principle, and an incremental development cycle with baby steps, we can get something basic running pretty quick - and expand from there!
    jazzed said...
    Attached is an sd based boot-loader I wrote last year that you could consider as a small starting point. Look for BootLoader.doc in the .zip. As far as I remember, it is functional but uses the old FSRW.

    Nice work! I quickly read the .doc, there is a lot of good stuff there...
    jazzed said...
    There is a lot of *nix flavor here which is good considering many software engineers understand that, and it is a known workable model.

    That is quite deliberate, within the context of no subdirectories and 8.3 file names (for now at least)

    Largos is even more *nixy, VERY deliberately. I'll bring the prototype on a Morpheus to UPEW, running off a 1MB Winbond with my FS.
    jazzed said...
    It seems to me that SD access should be available to any application at a very low cost to the application. It's just another O/S service and it can be managed in a sane way with ADT methodology just like any other system service. I assume you've considered /proc and /dev file-systems for resource management although it's unclear if /proc is truly necessary.

    Largos does implement /dev, but not /proc - and I may even drop /dev. I was mainly planning to use it for device numbers, but /etc/modules is sufficient I think.

    I have to keep things nice and very small... at least until Prop2
    jazzed said...
    Yes, anything that people are willing to write or release should be able to run on such a system. There are some things I probably won't test though. BigSpin has been a goal of mine and I even have a project on the back burner with that name :) Still, I'm more interested in more mainstream things like GNU in industry and Java in education or both in any form; other languages and tools come to mind. Still, small is a very good thing especially when dealing with a micro-controller :)

    LOL... I just used BigSpin as a handy moniker, I think I heard both you and Cluso use that description in the past.

    I am interested in all three tool chains - gnu C/C++, Java, and spin - for the same reasons you are interested in them.
    jazzed said...
    It's Ok to use .sh extension, but it usually means "this is a Bourne shell file" ... just like .csh usually means this is a "C shell file". Mixing things up is not really desirable. I've seen too many bugs take flight on incorrect assumptions.

    ah I guess ".msh" is ok.

    I liked ".sh" as it implies a limited functionality :)

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com
    My products: Morpheus / Mem+ / PropCade / FlexMem / VMCOG / Propteus / Proteus / SerPlug
    and 6.250MHz Crystals to run Propellers at 100MHz & 5.0" OEM TFT VGA LCD modules
    Las - Large model assembler Largos - upcoming nano operating system
  • jazzed Posts: 11,803
    edited 2010-06-12 01:42
    Hmm. I was just running that bootloader and reviewing implementation.
    There is a lot of unnecessary cruft there especially with that config block.

    I think a file line-editor would be more valuable than the config business,
    but there are some things that are minimum requirements like sd card
    pins and a startup program filename (startup can be hard coded).

    There are a few salvageable items:

    - It will boot a default but limited size spin binary up to 26KB. Mike Green's
    booter by comparison allows booting a full 32KB application using cog code.
    I know how to do that too, just never thought it too important for startup.

    - There are some limited diagnostic abilities if boot fails.

  • heater Posts: 3,370
    edited 2010-06-12 03:16
    Bill, Jazzed, a lot of good minos strategy expressed in that page of postings. I should print the thing out.

    Did anyone notice that spiFemto pretty much uses a minos mailbox already? Its cog is started with par pointing at this structure:

    '' -------------------------------------------------------------------
    '' |   cmd/status   |          I/O pin / device / address            |
    '' -------------------------------------------------------------------
    '' |           byte count           |          HUB address           |
    '' -------------------------------------------------------------------
    
    



    cmd/status is a byte.

    The Spin access methods are pretty much just simple manipulation of that block.

    How would you like it to look for minos or can it stay the same?

    I am using a slightly modified version by Cluso that has a function to tristate its pins. This is required as RAM and SD share pins on the TriBlade/RamBlades.


    Edit: fsrwfemto itself does not have a running COG; it's only a bunch of Spin functions. So it makes no sense to have a mailbox interface to it from above.

    It could be easily modified to use a mailbox interface to the SPI driver but what is the point?

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    For me, the past is not over yet.

    Post Edited (heater) : 6/12/2010 3:27:47 AM GMT
  • jazzed Posts: 11,803
    edited 2010-06-12 15:14
    The boot-loader I posted uses the Spin only version of the old SD fsrw driver mainly to save hub space for the application. The total fsrw code/var size for that is 850 longs. However, one could probably create an fsrw app that uses less space by recycling the cog code.

    Also, the directory/filename issues for any FAT* driver will end up needing to use trans.tbl or one of its successors, so it may as well be done up front.

  • Bill Henning Posts: 6,445
    edited 2010-06-12 17:42
    I took a closer look, and it is a bit heavy in its current form. It is a nice piece of work though.
    jazzed said...
    Hmm. I was just running that bootloader and reviewing implementation.
    There is a lot of unnecessary cruft there especially with that config block.

    I think a file line-editor would be more valuable than the config business,
    but there are some things that are minimum requirements like sd card
    pins and a startup program filename (startup can be hard coded).

    There are a few salvageable items:

    - It will boot a default but limited size spin binary up to 26KB. Mike Green's
    booter by comparison allows booting a full 32KB application using cog code.
    I know how to do that too, just never thought it too important for startup.

    - There are some limited diagnostic abilities if boot fails.
  • Bill Henning Posts: 6,445
    edited 2010-06-12 17:53
    Well, convergent designs abound :)

    Frankly, a multi-processor chip with shared RAM will pretty much devolve to some mailbox-like mechanism, and Mike's layout is a logical one.

    I prefer a fixed 4 long format. VMCOG diverged a bit from the one I use in Largos, due to merging the command and vmaddress longs into one for efficiency.

    In cases where it is not absolutely necessary, I do not like packing the longs with several fields, as it takes too many cog instructions to pack/unpack them, so I don't plan to adopt Mike's format.

    I think something like the following would work for an SD cog:

    mbox[0] = command / set to 0 by SD.COG when complete
    mbox[1] = sector address (2^32 sectors of 512 bytes each = maximum 2^41 bytes on media = approx. 2 TB)
    mbox[2] = hub address
    mbox[3] = number of sectors to transfer / return code

    Obviously the setup commands can assign different meanings to 1/2/3 for their input, but all should return status/error code in 3.
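
    The four-long layout above could be sketched in C roughly as follows. This is only an illustration under stated assumptions: the command codes, the names sd_mailbox, sd_post, sd_busy and sd_service_once, and the convention that a return code of 0 means success are all hypothetical, and the "driver" is a stub that moves no data.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical command codes; the thread does not define any. */
enum { SD_CMD_IDLE = 0, SD_CMD_READ = 1, SD_CMD_WRITE = 2 };

/* Four-long mailbox following the layout described in the post. */
typedef struct {
    volatile uint32_t command;   /* mbox[0]: command, cleared to 0 when done   */
    volatile uint32_t sector;    /* mbox[1]: sector address (512-byte sectors) */
    volatile uint32_t hub_addr;  /* mbox[2]: hub address of the buffer         */
    volatile uint32_t count;     /* mbox[3]: sectors to transfer / return code */
} sd_mailbox;

/* Client side: fill in the arguments first and write the command last,
   so the driver never sees a command with stale arguments. */
static void sd_post(sd_mailbox *mb, uint32_t cmd, uint32_t sector,
                    uint32_t hub_addr, uint32_t nsectors)
{
    assert(cmd != SD_CMD_IDLE);
    mb->sector   = sector;
    mb->hub_addr = hub_addr;
    mb->count    = nsectors;
    mb->command  = cmd;
}

/* Client side: non-zero while a request is still pending. */
static int sd_busy(const sd_mailbox *mb)
{
    return mb->command != SD_CMD_IDLE;
}

/* Stand-in for the driver cog: services one pending request, if any.
   A real driver would transfer data; this stub only reports success. */
static void sd_service_once(sd_mailbox *mb)
{
    if (mb->command == SD_CMD_IDLE)
        return;
    mb->count   = 0;             /* return code; 0 is assumed to mean "no error" */
    mb->command = SD_CMD_IDLE;   /* signal completion */
}
```

    A caller would post a request with sd_post, poll sd_busy until it returns 0, and then read the return code from the last long. Keeping one field per long avoids the pack/unpack instructions mentioned above.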

    There is no reason a cog running the spin fsrwfemto could not be equipped with such a mailbox ... the point being invoking it from within ZOG directly :)

    heater said...
    Bill, Jazzed, a lot of good minos strategy expressed in that page of postings. I should print the thing out.

    Did anyone notice that spiFemto pretty much uses a minos mailbox already? Its cog is started with par pointing at this structure:

    '' -------------------------------------------------------------------
    '' |   cmd/status   |          I/O pin / device / address            |
    '' -------------------------------------------------------------------
    '' |           byte count           |          HUB address           |
    '' -------------------------------------------------------------------
    
    



    cmd/status is a byte.

    The Spin access methods are pretty much just simple manipulation of that block.

    How would you like it to look for minos or can it stay the same?

    I am using a slightly modified version by Cluso that has a function to tristate its pins. This is required as RAM and SD share pins on the TriBlade/RamBlades.


    Edit: fsrwfemto itself does not have a running COG; it's only a bunch of Spin functions. So it makes no sense to have a mailbox interface to it from above.

    It could be easily modified to use a mailbox interface to the SPI driver but what is the point?

  • Bill Henning Posts: 6,445
    edited 2010-06-12 17:54
    I did like the small size.
    jazzed said...
    The boot-loader I posted uses the Spin only version of the old SD fsrw driver mainly to save hub space for the application. The total fsrw code/var size for that is 850 longs. However, one could probably create an fsrw app that uses less space by recycling the cog code.

    Also, the directory/filename issues for any FAT* driver will end up needing to use trans.tbl or one of its successors, so it may as well be done up front.
  • heater Posts: 3,370
    edited 2010-06-13 07:16
    Point taken about fsrwfemto's mailbox.

    but - " ...the point being invoking it from within ZOG directly" does not sound right to me.

    Zog as a CPU engine should not know about such things. It should have connections to its memory, its I/O and possibly signals like interrupt and halt.

    As the ZPU has no I/O address space or bus, one could say that all I/O should go through its memory interface, i.e. VMCOG. However, it seems more reasonable to partition the memory space and have access to some I/O area be directed through an I/O mailbox to whatever I/O handler there is.

  • Bill Henning Posts: 6,445
    edited 2010-06-13 12:25
    I meant invoking it from ZOG byte code, which resides in the virtual memory.

    As far as how to communicate with mailboxes, I see two possible scenarios:

    map a high virtual address onto a specific hub page

    or

    add "raw" {rd|wr}{byte|word|long} system calls to ZOG, which bypass the VM. (I prefer this escape hatch)
    heater said...
    Point taken about fsrwfemto's mailbox.

    but - " ...the point being invoking it from within ZOG directly" does not sound right to me.

    Zog as a CPU engine should not know about such things. It should have connections to its memory, its I/O and possibly signals like interrupt and halt.

    As the ZPU has no I/O address space or bus, one could say that all I/O should go through its memory interface, i.e. VMCOG. However, it seems more reasonable to partition the memory space and have access to some I/O area be directed through an I/O mailbox to whatever I/O handler there is.
  • heater Posts: 3,370
    edited 2010-06-13 14:18
    I start to think we should map all of HUB memory (including ROM) to some high address in the ZPU memory space which would bypass VMCOG. Which gives us the following:

    1) Access to any mailboxes.
    2) Access to buffer spaces in HUB.
    3) Enables putting the ZPU stack in HUB.
    4) Enables putting chunks of ZPU code and data in HUB.

    1,2, and 3 can run normal ZPU programs as built "out of the box" now. ZPU does not care where you put the stack and we only need some pointers into HUB space.

    4 requires a little tweaking of linker scripts to locate code from selected modules into a .hub section.

    There we have a possibility to turbo some parts of large programs. Or have ZPU programs in COG operating on data in ext RAM.



    I'd like to adapt the syscall mechanism to allow direct access to pins, counters, locks, cognew, the hub memory etc. All normal Prop features. Such that anything you can do in Spin you can do in C.

  • Bill Henning Posts: 6,445
    edited 2010-06-13 15:53
    heater,

    I think that sounds like an excellent idea! The only downside is the test for hub/vm address space, however I think it would be worth it.

    Three IM's give 21-bit constants. Four give 28 bits.

    I would suggest bit 21 for a "medium" ZOG (small is hub-only), and bit 27 for "large" zog. Why? Simply to keep the number of IM's under control.
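
    The hub/VM address test being discussed could look something like the sketch below. It assumes that a single address bit selects the hub window and that the remaining low bits are the hub offset. The macro and function names are hypothetical and nothing here is taken from the actual ZOG source.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions suggested in the post; the names are hypothetical. */
#define HUB_WINDOW_BIT_MEDIUM 21u
#define HUB_WINDOW_BIT_LARGE  27u

/* Non-zero when the address falls in the hub window rather than in
   virtual memory handled by VMCOG. */
static int is_hub_address(uint32_t addr, unsigned window_bit)
{
    assert(window_bit < 32u);
    return (int)((addr >> window_bit) & 1u);
}

/* Strip the window bit (and anything above it) to get the hub offset. */
static uint32_t hub_offset(uint32_t addr, unsigned window_bit)
{
    assert(window_bit < 32u);
    return addr & ((1u << window_bit) - 1u);
}
```

    With the "medium" setting, address 0x207FFC would be routed to hub offset 0x7FFC, while 0x1FFFFF would still go through the VM. The cost is one extra test on every access path that can see both spaces.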

    Instead of 4, I suspect a "launch temporary cog" would be better, treating the cog like a subroutine with 100us overhead... and running pasm in the cog.

    Instead of having specific pin/dir features, maybe two sys functions getreg() and setreg(), which read/write cog registers $000-$1FF, thus accessing dira+friends

    Bill

  • heater Posts: 3,370
    edited 2010-06-13 20:26
    OK I'll see what I can do.

    Good point about the address lengths and number of IM's required.

    If we were to omit 4), executing code from HUB, there would be no extra test for address space required in the opcode fetch path.

    If we were to fix where the stack resides, HUB or ext, then there is no need for the address space test in push_tos or pop either. Perhaps this could be a build option with #define.

    Hmm... "launch temporary cog", remote procedure calls. Temporary or not this should be done through syscall.

    "getreg() and setreg(), which read/write cog registers $000-$1FF" - That gives ZPU programs the possibility to alter the virtual machine they are running on. Ouch! Could get interesting :)

  • jazzed Posts: 11,803
    edited 2010-06-14 05:15
    @Bill.

    Well I have good news and bad news.

    The good news is: with a direct-mapped cache using 16 bytes per cache line,
    64 tags and no aging replacement policy, the Propeller JVM can run the FIBO24
    test from EEPROM faster than Javelin in the worst case test :)

    Moreover, it is only 2x slower in the worst case than running from HUB RAM.

    The bad news is: I can't get vmcog to work yet.
    I'm sure it's just a matter of time and more effort though.

    FIBO(00) =     0 (   11 ticks) (    6 ms)
    FIBO(01) =     1 (   13 ticks) (    7 ms)
    FIBO(02) =     1 (   13 ticks) (    7 ms)
    FIBO(03) =     2 (   13 ticks) (    7 ms)
    FIBO(04) =     3 (   13 ticks) (    7 ms)
    FIBO(05) =     5 (   14 ticks) (    8 ms)
    FIBO(06) =     8 (   16 ticks) (    9 ms)
    FIBO(07) =    13 (   19 ticks) (   11 ms)
    FIBO(08) =    21 (   23 ticks) (   13 ms)
    FIBO(09) =    34 (   29 ticks) (   16 ms)
    FIBO(10) =    55 (   40 ticks) (   22 ms)
    FIBO(11) =    89 (   57 ticks) (   32 ms)
    FIBO(12) =   144 (   84 ticks) (   47 ms)
    FIBO(13) =   233 (  128 ticks) (   71 ms)
    FIBO(14) =   377 (  201 ticks) (  112 ms)
    FIBO(15) =   610 (  317 ticks) (  176 ms)
    FIBO(16) =   987 (  505 ticks) (  281 ms)
    FIBO(17) =  1597 (  810 ticks) (  450 ms)
    FIBO(18) =  2584 ( 1304 ticks) (  724 ms)
    FIBO(19) =  4181 ( 2106 ticks) ( 1170 ms)
    FIBO(20) =  6765 ( 3394 ticks) ( 1885 ms)
    FIBO(21) = 10946 ( 5485 ticks) ( 3047 ms)
    FIBO(22) = 17711 ( 8867 ticks) ( 4926 ms)
    FIBO(23) = 28657 (14340 ticks) ( 7966 ms)
    FIBO(24) = 46368 (23196 ticks) (12886 ms)
    
    



    Validation of principle? :) I think so.

    I do find though that trying to use a cache line larger than 16 significantly degrades performance.
    While FIBO(0) time goes to 3ms (+/- 1ms), the FIBO(22) test takes over 1 minute to finish.
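
    For readers unfamiliar with the scheme, a direct-mapped cache with 16-byte lines and 64 tags can be sketched as below. This is a generic illustration, not the Propeller JVM code: the dm_cache type, the function names and the in-memory backing array that stands in for the EEPROM are all assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 16u   /* bytes per cache line, as in the post */
#define NUM_LINES  64u   /* number of tags, as in the post       */

typedef struct {
    const uint8_t *backing;               /* stands in for the EEPROM image  */
    uint32_t tag[NUM_LINES];              /* line number held in each slot   */
    uint8_t  valid[NUM_LINES];            /* 0 until the slot is first filled */
    uint8_t  data[NUM_LINES][LINE_BYTES];
    unsigned misses;                      /* number of line fills so far     */
} dm_cache;

static void cache_init(dm_cache *c, const uint8_t *backing)
{
    assert(backing != NULL);
    memset(c, 0, sizeof *c);
    c->backing = backing;
}

static uint8_t cache_read_byte(dm_cache *c, uint32_t addr)
{
    uint32_t line = addr / LINE_BYTES;   /* which 16-byte line is wanted  */
    uint32_t slot = line % NUM_LINES;    /* the only slot that line may use */

    if (!c->valid[slot] || c->tag[slot] != line) {
        /* Miss: overwrite the slot. There is no aging and no victim choice. */
        memcpy(c->data[slot], c->backing + line * LINE_BYTES, LINE_BYTES);
        c->tag[slot]   = line;
        c->valid[slot] = 1;
        c->misses++;
    }
    return c->data[slot][addr % LINE_BYTES];
}
```

    Because each line has exactly one possible slot, two addresses 1024 bytes apart evict each other on every alternation. A longer line makes each such miss more expensive, which is one plausible reason larger lines hurt in the test above.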

    Have you by chance considered a vmcog with a cache line smaller than 512 bytes ?

    Cheers,
    --Steve

  • Bill Henning Posts: 6,445
    edited 2010-06-14 14:22
    Nice! That certainly makes an EEPROM based direct-mapped Javelin replacement very workable!

    I am sure you can get VMCOG running.

    Yes, we can try smaller page sizes once I move to a hub-based page present table.

    I don't think the paging speed impact (for RAM) will be as bad as for EEPROM, because I2C EEPROM normally runs at 400 kbps, whereas even my slow SPI routines run at 4 Mbps; with counter code 10 Mbps is definitely possible, and 20 Mbps may be possible (have to check reliability).

    Ignoring the headers, 10 Mbps SPI flash will read/write 25x faster than 400 kbps EEPROM.

    16 bytes * 25 = 400 bytes

    So moving to 256-byte pages over SPI should still be faster than 16-byte lines from EEPROM.

    The problem with really small pages is the number of "page present" entries required in the hub.

    page_present_table_size_in_bytes = VM_size_in_bytes / bytes_per_page

    The other limit is that realistically, there can only be 128 TLB entries in a cog even after the page present table is moved to the hub.
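
    The arithmetic in this post can be checked with a few lines of C. This is a sketch only: the function names are made up for illustration, the table is assumed to cost one byte per page as the formula implies, and transfer headers are ignored as in the post.

```c
#include <assert.h>
#include <stdint.h>

/* Size of the "page present" table, assuming one byte per page. */
static uint32_t page_present_table_bytes(uint32_t vm_bytes,
                                         uint32_t bytes_per_page)
{
    assert(bytes_per_page != 0u);
    return vm_bytes / bytes_per_page;
}

/* Bytes a faster link moves in the time a slower link moves slow_bytes. */
static uint32_t equivalent_bytes(uint32_t slow_bytes, uint32_t slow_bps,
                                 uint32_t fast_bps)
{
    assert(slow_bps != 0u);
    return (uint32_t)((uint64_t)slow_bytes * fast_bps / slow_bps);
}
```

    equivalent_bytes(16, 400000, 10000000) reproduces the 400-byte figure. For a 64 KB virtual space (an example size, not a VMCOG limit), page_present_table_bytes gives 128 entries with 512-byte pages and 4096 entries with 16-byte pages, which shows why very small pages get expensive.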
    jazzed said...
    @Bill.

    Well I have good news and bad news.

    The good news is: with a direct-mapped cache using 16 bytes per cache line,
    64 tags and no aging replacement policy, the Propeller JVM can run the FIBO24
    test from EEPROM faster than Javelin in the worst case test :)

    Moreover, it is only 2x slower in the worst case than running from HUB RAM.

    The bad news is: I can't get vmcog to work yet.
    I'm sure it's just a matter of time and more effort though.

    Validation of principle? :) I think so.

    I do find though that trying to use a cache line larger than 16 significantly degrades performance.
    While FIBO(0) time goes to 3ms (+/- 1ms), the FIBO(22) test takes over 1 minute to finish.

    Have you by chance considered a vmcog with a cache line smaller than 512 bytes ?

    Cheers,
    --Steve
  • jazzed Posts: 11,803
    edited 2010-06-14 15:35
    Bill Henning said...
    The other limit is that realistically, there can only be 128 TLB entries in a cog even after the page present table is moved to the hub.
    Ya, propeller memory size always makes me feel like I'm wearing a straight-jacket.

    I'll make VMCOG work on Propeller JVM at some point.
    I haven't given up on VMCOG, it's just that in the EEPROM case, it is just not practical.

    As is, I don't think it would serve VMCOG very well by adding EEPROM code.
    Good thing the SPI RAM is not limited to 400 kHz :)

    Cheers.
    --Steve

  • heater Posts: 3,370
    edited 2010-06-15 06:50
    Just managed to get VMCog and SD card to play together on TriBlade. It's fun because SD shares pins with the RAM.

    I can now load the ZPU image to ext RAM from an SD file.

    Sadly dhrystone took over 20 minutes to run!

    I have 16 pages for VMCog and dhrystone is 17K of code and the stack lives at 64K

    I'll post TriBlade updates for VMCog when I have cleaned it up a bit.

  • Bill Henning Posts: 6,445
    edited 2010-06-15 12:57
    jazzed:

    No worries, take your time... I can wait a while to see Java on VMCOG - but I am looking forward to it!

    heater:

    Great progress!

    I think Dhrystone would be a lot quicker with a bigger working set. As a matter of fact, plotting a graph of Dhrystone execution time vs. working set size is one of the things I'd like to see.

    I'll post the cut-down PropCmd later today, that should let you have a 46 page or so working set.

  • heater Posts: 3,370
    edited 2010-06-15 16:21
    Attached is vmcog with v0.10 of my TriBlade access routines.

    The TriBlade 2 shares pins between ext RAM and SD card (and other things) so I had to ensure the "bus" is tristated at the end of each BREAD and BWRITE.

    That meant moving all ext memory handling code out of BINIT and BSTART and placing it at the front of BREAD and BWRITE, then tristating the bus after BREAD and BWRITE. This way the RAM chip select is reasserted on every BREAD/BWRITE and "lost" again when they are done.

    This now enables Zog to load its ext RAM from SD at start up.

  • Bill Henning Posts: 6,445
    edited 2010-06-15 16:51
    Nice work!

  • Bill Henning Posts: 6,445
    edited 2010-06-15 20:37
    UPDATE:

    I just uploaded version 0.970 into the first post.

    This version adds a "hitavg" subroutine used by BUSERR to set the access count of the newly loaded page to the average of all pages' access counts.

    It may help with certain thrashing situations.

  • heater Posts: 3,370
    edited 2010-06-15 20:42
    You don't have my TriBlade v0.10 mods in there.

  • Bill Henning Posts: 6,445
    edited 2010-06-15 20:49
    I know, I will merge them later. Sorry, I am in the middle of soldering, but snuck in a quick hitavg routine :)

    I mentioned the lack in the ZOG thread.
    heater said...
    You don't have my TriBlade v0.10 mods in there.
  • heater Posts: 3,370
    edited 2010-06-15 20:59
    Sorry, so you did. I just merged again. Now Zog crashes...and the new vmdebug fails the heater test.

  • Bill Henning Posts: 6,445
    edited 2010-06-15 21:01
    Sorry, bug in the patch - that's what happens when I try a major change quickly.

    I will put back the last release, and remove this version until I have time to fix it (tonight maybe). Sorry, I am soldering like crazy.
    heater said...
    Sorry, so you did. I just merged again. Now Zog crashes...and the new vmdebug fails the heater test.
  • Bill Henning Posts: 6,445
    edited 2010-06-16 15:52
    I am trying to chase down why I have problems when I initialize the hit count for a newly swapped in page to a non-zero value.

    Stay tuned...

  • heater Posts: 3,370
    edited 2010-06-16 17:54
    Meanwhile I created a small standalone FIBO test (attached) that compiles down to only 3376 bytes.

    Under vmcog it calculates and prints ALL the FIBOs from 0 to 26 in 140 seconds.

    From HUB RAM that's 42 seconds.

    Much more like it.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <stdlib.h>

    /* Note: print() is not declared in this file; it is presumably
       supplied by the ZPU runtime library. */

    /* Print num as eight lower-case hexadecimal digits. */
    void putnum(unsigned int num)
    {
      char  buf[9];
      int   cnt;
      char  *ptr;
      int   digit;

      ptr = buf;
      /* Walk the nibbles from most significant to least significant. */
      for (cnt = 7 ; cnt >= 0 ; cnt--)
      {
        digit = (num >> (cnt * 4)) & 0xf;

        if (digit <= 9)
          *ptr++ = (char) ('0' + digit);
        else
          *ptr++ = (char) ('a' - 10 + digit);
      }

      *ptr = (char) 0;
      print (buf);
    }

    /* Naive doubly recursive Fibonacci, used purely as a workload. */
    unsigned int fibo (unsigned int n)
    {
      if (n <= 1)
      {
        return (n);
      }
      else
      {
        return fibo(n - 1) + fibo(n - 2);
      }
    }

    int main (int argc,  char* argv[])
    {
      int n;
      int result;

      /* Both the index and the result are printed in hexadecimal. */
      for (n = 0; n <= 26; n++)
      {
        result = fibo(n);
        print ("fibo(");
        putnum(n);
        print (") = ");
        putnum(result);
        print("\n");
      }
      return(0);
    }
    
    

  • heater Posts: 3,370
    edited 2010-06-17 15:59
    Another new FIBO test program.

    This one is 17056 bytes as it uses a small version of printf to display results like so:

    fibo(00) = 000000 (00000ms)
    fibo(01) = 000001 (00000ms)
    fibo(02) = 000001 (00000ms)
    fibo(03) = 000002 (00000ms)
    fibo(04) = 000003 (00001ms)
    fibo(05) = 000005 (00001ms)
    fibo(06) = 000008 (00003ms)
    fibo(07) = 000013 (00005ms)
    fibo(08) = 000021 (00008ms)
    fibo(09) = 000034 (00013ms)
    fibo(10) = 000055 (00022ms)
    fibo(11) = 000089 (00035ms)
    fibo(12) = 000144 (00057ms)
    fibo(13) = 000233 (00093ms)
    fibo(14) = 000377 (00151ms)
    fibo(15) = 000610 (00245ms)

    Results only go up to FIBO(15) as my crude CNT based timing rolls over for the slow tests after that.
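
    The rollover follows from timing with a free-running 32-bit tick counter such as the Propeller's CNT. A minimal sketch of the usual arithmetic is below; the function names are hypothetical and the clock frequency is a parameter, since the clock used for these runs is not stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed ticks between two reads of a free-running 32-bit counter.
   Unsigned subtraction stays correct across a single wrap. */
static uint32_t elapsed_ticks(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a tick count to milliseconds for a given clock frequency. */
static uint32_t ticks_to_ms(uint32_t ticks, uint32_t clock_hz)
{
    assert(clock_hz != 0u);
    return (uint32_t)((uint64_t)ticks * 1000u / clock_hz);
}

/* Longest interval, in ms, that one 32-bit tick delta can represent. */
static uint32_t max_interval_ms(uint32_t clock_hz)
{
    assert(clock_hz != 0u);
    return (uint32_t)((UINT64_C(1) << 32) * 1000u / clock_hz);
}
```

    At an assumed 80 MHz clock, max_interval_ms returns 53687, so anything slower than roughly 53.7 seconds wraps. Counting wraps in software, or timing in coarser units, would be needed to measure the slower cases.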

    I have run this with a selection of page counts from 35 (the most I can fit in) down to 1 (yes 1).

    A summary of the execution times for FIBO(15) looks like this:

    Pages, milliseconds:
    01 26587
    05 26587
    10 26709
    20 26927
    30 27180
    31 27203
    32 03134
    33 03134
    34 00245
    35 00245


    Post Edited (heater) : 6/17/2010 4:09:12 PM GMT
  • jazzed Posts: 11,803
    edited 2010-06-17 17:42
    heater said...
    A summary of the execution times for FIBO(15) looks like this:

    Pages, milliseconds:
    01 26587
    05 26587
    10 26709
    20 26927
    30 27180
    31 27203
    32 03134
    33 03134
    34 00245
    35 00245

    It's nice to see some progress, but it seems this is not so encouraging.
    I would have expected something other than a step function.
    So 34*512 > code size? What happens if you use a big printf again?
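
    The "34*512 > code size" question can be checked with a one-line helper. This is just the rounding arithmetic, with a hypothetical function name, using the 17056-byte image size and the 512-byte page size quoted in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Number of fixed-size pages needed to hold image_bytes, rounding up. */
static uint32_t pages_needed(uint32_t image_bytes, uint32_t page_bytes)
{
    assert(page_bytes != 0u);
    return (image_bytes + page_bytes - 1u) / page_bytes;
}
```

    pages_needed(17056, 512) is 34, and 34 * 512 = 17408 bytes, which is consistent with the time dropping to 245 ms once 34 pages are available. The helper only counts the program image and says nothing about stack traffic.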

  • Bill Henning Posts: 6,445
    edited 2010-06-17 18:13
    heater: Thanks!

    jazzed: The current page replacement policy basically sucks for these tests.

    I am preparing for UPEW, so my time is incredibly tight right now, but I have started trying to fix it.

    Basically what happens is this:

    1- program runs along
    2- oops, need to swap a page in
    3- select sacrificial page (based on least accesses)
    4- runs a bit
    5- goto 2

    Now if the access pattern of the program is such that the newly swapped in page is not hit a lot before there is a need to swap another page in... guess which page gets chosen for the sacrifice? You guessed right. The same page as last time. So the VM ends up thrashing, flushing and re-loading the same page --> performance then sucks.

    I am working on two possible fixes, but they are not (yet) working correctly!

    1) simple fix: keep the access count of the page that was swapped out. This will make the count increase with hits to the newly loaded page, and hopefully it won't be always chosen as the sacrificial page.

    2) better fix: average the hit counts of all pages, and assign the average as the hit count of the new page. This should lead to a "fairer" sacrifice.
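
    The thrashing pattern and the second fix can be modelled in a few lines of C. This is a toy model under stated assumptions, not VMCOG code: the page_table type, the four-slot working set and the function names are invented for illustration, and the current policy is assumed to start a newly loaded page at zero hits.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PAGES 4u   /* tiny working set, for illustration only */

typedef struct {
    uint32_t vpage[NUM_PAGES];   /* virtual page held in each slot */
    uint32_t hits[NUM_PAGES];    /* access count for each slot     */
} page_table;

/* Policy as described: sacrifice the least-accessed slot. */
static unsigned pick_victim(const page_table *pt)
{
    unsigned victim = 0;
    unsigned i;
    assert(pt != 0);
    for (i = 1; i < NUM_PAGES; i++)
        if (pt->hits[i] < pt->hits[victim])
            victim = i;
    return victim;
}

/* Mean hit count over all slots (integer division). */
static uint32_t average_hits(const page_table *pt)
{
    uint64_t sum = 0;
    unsigned i;
    for (i = 0; i < NUM_PAGES; i++)
        sum += pt->hits[i];
    return (uint32_t)(sum / NUM_PAGES);
}

/* Load vpage into the victim slot and return that slot. With
   use_average == 0 the new page starts at zero hits; otherwise it
   starts at the average of all counts (fix 2). */
static unsigned swap_in(page_table *pt, uint32_t vpage, int use_average)
{
    unsigned slot  = pick_victim(pt);
    uint32_t start = use_average ? average_hits(pt) : 0u;
    pt->vpage[slot] = vpage;
    pt->hits[slot]  = start;
    return slot;
}
```

    Starting from hit counts {10, 20, 30, 2}, two back-to-back misses with a zero start evict slot 3 both times, which is the thrashing described above. With the averaged start the new page gets a count of 15, so the second miss evicts slot 0 instead.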

    I've tried both, and both fail the "f" fill word test. I have a bug somewhere, probably another badly re-used variable.

    VMCOG is now proven to work, at this point it has to be tuned for better performance.

    I will also make a "bigvm" version as soon as I can.
