Sorry! I missed your post before; you posted while I was posting responses :)
Thanks, and no, it is not Christmas... and unfortunately I missed my Christmas PropellerBasic demo date. So now, I will not put a date on that until it is very close... it turned out that I significantly underestimated how long it would take to write a highly optimizing LMM code generator for the Propeller.
64KB EEPROMs cost about 30%-40% more than 32KB ones, and they would be a good place to hold drivers. Personally, I was thinking of using standard 32KB EEPROMs, holding a "boot loader" that can load device drivers etc. from SD, and let the user pick the image they want to run - or perhaps have an "AUTOEXEC.BAT" file :)
Actually, Largos uses /etc/modules to specify what drivers to load, and /etc/rc.local as an "AUTOEXEC.BAT" equivalent.
Initially I was going to use 4 lines for one SPI RAM chip (and 5 for two), but Andy's code uses 4 for two RAMs (/CS0, /CS1, CLK, DATA) so I am using his pinout.
Ahh... no matter the temptation, I am keeping my mouth zipped about the new board until I have working boards in my hands...
All I will reveal for now is that it will have a lot of features, and be very affordable.
... and thanks
Dr_Acula said...
This just keeps getting better and better. But - I just checked the date, and it isn't Christmas?!
Ok, custom objects sounds great. And like you say, the idea ought to catch on, especially when you see how much memory it will save in a typical project.
I seem to recall that eeproms come in bigger sizes for really not much more money, so presumably the spare space in an eeprom is a good place to put cog code?
I see 4 lines to drive two SPI rams - is that right, and you are joining data in and data out, which would make sense and that saves pins.
Great to hear a new board is in the pipeline. I left one thing off that list before, and that was TV, but on one of the dracblade designs I used the 8 pins 16 to 24 for vga and also pins 16 to 19 went to TV resistors so you could use one or the other by installing a vga plug or an RCA plug plus the appropriate resistors. (I'm not sure, could you ever drive TV and VGA at the same time?)
Ok, well, I'm out of ideas because everything I ever could have dreamed of is going to be in this new design. Life is good!
- some progress on unaligned reads and writes, still not quite functional and it is taking more memory than I want it to
Which would you prefer:
A) a small performance hit on all accesses, read and write, to make it fit with current 256 entry page table with 256 byte pages
-OR-
no performance hit, but an immediate switch to 128 entry page table with 512 byte pages
At some point I will be moving to 512 byte pages anyway, but for now I have a choice of using a couple of subroutines, with the subroutine call/return overhead added in order to save code space in the cog, or I can free up 128 longs by moving to a smaller TLB.
Later, I will be moving to either an inverted page table, or a hub based page table in order to allow for a much larger virtual memory space - but I want something running before that with a small easily understood VM layout.
Ahhg, I didn't get to vote. 512 byte pages sounds quite reasonable to me. I'm sure you have some interesting uses for those 128 LONGS.
Fortunately Zog does not do unaligned accesses of WORDs and LONGs.
Well actually Zog does not check for unaligned, or even clear the low bit or two to force alignment, but I think ZPU is supposed to do one or the other.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
You can then use the current version in the top thread!
Aligned accesses have been working for about a week. I'll fork the code, leaving the 256 byte version for you to work with while I perfect the 512 byte version (that will support unaligned access).
Since you don't need unaligned access... would it help you if I got SPI RAM running with the 256 byte version before I tackled 512 byte pages and unaligned access? If I don't have to handle unaligned access, I think I can have it working with two SPI RAMs in a day or two; I'd just need to write small SPI_READ and SPI_WRITE routines that fit in the 104 longs currently available.
I need the longs to handle unaligned accesses without slowing down single byte accesses, and to allow me to have faster SPI routines.
heater said...
Ahhg, I didn't get to vote. 512 byte pages sounds quite reasonable to me. I'm sure you have some interesting uses for those 128 LONGS.
Fortunately Zog does not do unaligned accesses of WORDs and LONGs.
Well actually Zog does not check for unaligned, or even clear the low bit or two to force alignment, but I think ZPU is supposed to do one or the other.
Also... if you are not running under SphinxOS, and want to run Zog code, you can use as much of the 32KB of hub ram as you want for the working set!
heater said...
Ahhg, I didn't get to vote. 512 byte pages sounds quite reasonable to me. I'm sure you have some interesting uses for those 128 LONGS.
Fortunately Zog does not do unaligned accesses of WORDs and LONGs.
Well actually Zog does not check for unaligned, or even clear the low bit or two to force alignment, but I think ZPU is supposed to do one or the other.
heater: FWIW, in case you have forgotten: if you do a read/write word/long with the hub address misaligned (e.g. access a long at $4001), you will actually access the long at $4000. The lower bit/bits are ignored by the instruction. This is used to advantage in the overlay loader code. However, it can also be a "gotcha".
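A quick way to see the effect is to model the address masking off-target. This is only a sketch in Python (not PASM), and the two function names are made up for illustration; the only behaviour taken from the post above is that the low address bit(s) are ignored.

```python
def effective_word_addr(addr: int) -> int:
    """Hub address actually used by a word access: bit 0 is ignored."""
    return addr & ~0x1


def effective_long_addr(addr: int) -> int:
    """Hub address actually used by a long access: bits 1..0 are ignored."""
    return addr & ~0x3


# Show where a long and a word access really land for a few addresses.
for a in (0x4000, 0x4001, 0x4002, 0x4003):
    print(hex(a), "-> long at", hex(effective_long_addr(a)),
          "word at", hex(effective_word_addr(a)))
```

So an emulator that relies on this gets "forced alignment" for free, but silently reads the wrong data if the guest code really meant an unaligned access.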
The ZPU emulator in Java traps unaligned accesses as an error, so I guess they really should not happen. Just now I'm not checking, probably never will, but relying on rd/wrword and rd/wrlong to align things right or wrong. Rather like the last ARM board I worked on with Linux.
No SphinxOS or other OS here, just bare metal. Using 16K bytes for Zog memory. There is enough free space for another 12KB for Zog, minus whatever VMCOG needs.
Bill, don't schedule any developments around me, my available time is very unpredictable for a while. Except I can predict there is not much available :( Besides, I have no SPI RAMs or such.
- switched to 512 byte pages
- TLB now has 128 entries, freeing up 128 longs for code!
- TLB format changed to allow for more optimization and new 'L' bit
- started adding support for Lock() and Unlock()
- reads/writes now slightly faster due to more optimal TLB usage
- hit counters in TLB increased to 23 bits (can count to 8M accesses before decimating)
- eliminated "Guard" bit
- using Carry flag to detect overflow of hit count
Note, the new code is untested. You can find it attached to the first post!
I've left the last published version with 256 byte pages in the first post as well.
I look forward to porting this to my language project. Being able to buffer instructions from an SdCard will be very important.
Best of luck finishing it.
--Steve
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Short answers? Not available at this time since I think you deserve more information than you requested.
Thanks for the new code. I'm studying it at the moment.
Re the first post: "I made a suggestion a few days ago on how it might be possible to make a virtual memory manager that could be used by ZiCog to get "acceptable" performance with slow external memory designs."
I see two cogs being used for this code - one for the serial ram and one is vmcog for virtual memory.
Just to refresh the typical cogs a board needs - eg the dracblade:
Main routine =1 cog
4 port serial =1 cog
Keyboard =1 cog
Display =2 cogs
Sd card =1 cog
Zicog =1 cog
latched ram driver =1 cog
Total = 8 cogs
The serial ram driver replaces the latched ram driver but this leaves a total of 9 cogs :(
So - which things can be combined in cunning ways? Serial is 4 ports in one cog and is full. VGA seems pretty optimised and full. Keyboard maybe combined with something but a bit tricky. Zicog is currently full, though it is very much in flux at the moment and that may change with LMM.
But SPI code for the sd card and SPI code for the serial ram. Could these be combined into one cog, and hence get the count down to 8? If so, this could lead to a board design about half the size and with a lot less chips.
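The cog arithmetic above can be checked with a few lines of Python. This is just a tally of the numbers quoted in the post; the dictionary keys are labels invented for the sketch, and the "merged" case assumes the SD and serial RAM SPI code really can share one cog.

```python
PROPELLER_COGS = 8  # the Propeller has eight cogs

# Current dracblade allocation, as listed above.
dracblade = {
    "main": 1, "serial_4port": 1, "keyboard": 1, "display": 2,
    "sd_card": 1, "zicog": 1, "latched_ram": 1,
}

# Swap the latched RAM driver for a serial RAM driver and add VMCOG.
with_vmcog = dict(dracblade)
del with_vmcog["latched_ram"]
with_vmcog["serial_ram"] = 1
with_vmcog["vmcog"] = 1

# Hypothetical merge: SD card and serial RAM SPI code in a single cog.
merged = dict(with_vmcog)
del merged["sd_card"]
del merged["serial_ram"]
merged["sd_plus_serial_ram"] = 1

for name, alloc in (("dracblade", dracblade),
                    ("with vmcog", with_vmcog),
                    ("merged SPI", merged)):
    used = sum(alloc.values())
    print(f"{name}: {used} cogs, fits = {used <= PROPELLER_COGS}")
```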
The SPI code for the SD card and I2C code for EEPROMs already have been combined and just fit in one cog. If you leave out the I2C code, you could easily combine the SD card code with SPI code for the serial ram.
It will take several more weeks as I am currently spending most of my time building and testing boards :) but those should be done within two weeks.
Dr_Acula:
The reason I spent so much time optimizing for space is to make room for SPI drivers for SPI ram, which could potentially also be used for SD card access :) so theoretically VMCOG could replace the latched ram driver AND the SD card cog, so you would actually have one cog free :)
At the very least, VMCOG will support the SPI ram, with a separate, faster, SD cog.
Mike:
Absolutely true. Unfortunately I don't think I'll have enough room for the counter based 32 way unrolled 10Mbps SPI code, however I can do a variation, using the counters for 8 bits at a time, and get over 8Mbps potentially... supporting both SD and RAM internally to VMCOG. I can definitely do 4Mbps with very little code.
So just to double check, you are combining sd card, spi ram and the virtual memory driver all into one cog? That would be very nice!
Brainstorming here - ok zicog says 'I want a byte at location $nnnn' and at the moment the ram drive code sits within the zicog. It is a bit messy at present as there also is ram driver code in another cog for pre-filling the ram with data, so every rewrite of the code involves writing memory driver code in two places.
I see your design simplifying this process as well. So - zicog asks for a byte - is this done by checking a flag in a hub location and if the byte is in hub (in the 'fast cache') then getting the byte? Or would it set a flag to say that it is requesting a byte, then when the vmcog finds that flag it processes the request and returns the byte? Either of these will work and still may be faster even than latched code.
Bill Henning said...
It will take several more weeks as I am currently spending most of my time building and testing boards :) but those should be done within two weeks.
I can wait as I still have lots of work to do :)
BTW: I don't mind having physical device drivers in a second COG.
What I would prefer is TLB memory mapped sections similar to what is used in MIPS.
In that design the TLB maps the memory segment to the physical device and driver.
This way, you could use multiple drivers such as one intra-COG and one or more inter-COG.
Of course, the VMCOG TLB options do not have to be as complicated as MIPS :)
Initially I am combining SPI ram and VM into one cog, and hopefully will later merge the low level SD code too.
See vmcog.spin for the Spin code used to read/write bytes; it shows how to use the VMCOG mailbox :)
Obviously ZiCog could implement the equivalent of the Spin code in PASM.
Basically, a request consists of:
- checking until the "command" long is 0 (ie VMCOG is not busy servicing the previous request)
- encoding the VMCOG command and virtual address into that long, which places the service request into the mailbox
- waiting until the command is done
- picking up the result
PUB rdvbyte(addr)
  repeat while long[cmdptr]
  long[cmdptr] := (addr<<9)|READVMB
  repeat while long[cmdptr]
  return long[dataptr]
However, I am considering changing this somewhat to optimize VMCOG even more. Stay tuned for important messages :)
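To make the handshake easier to follow, here is a rough single-threaded model of the mailbox in Python. It is a sketch, not the VMCOG code: the value of `READVMB`, the `Mailbox` class, and the `post_read` / `service` helpers are all invented for illustration, and treating the low 9 bits of the command long as the opcode is an assumption drawn only from the `addr<<9` shift in rdvbyte.

```python
READVMB = 1  # placeholder opcode; the real constant lives in vmcog.spin


class Mailbox:
    """Two hub longs shared by the client and VMCOG."""
    def __init__(self):
        self.cmd = 0    # 0 means "idle / request finished"
        self.data = 0   # result of the last request


def post_read(mbox: Mailbox, addr: int) -> None:
    """Client side: wait for idle (modelled as a check), then post the request."""
    assert mbox.cmd == 0, "VMCOG is still busy with the previous request"
    mbox.cmd = ((addr << 9) | READVMB) & 0xFFFFFFFF


def service(mbox: Mailbox, memory: bytearray) -> None:
    """VMCOG side: decode one pending request, store the result, clear cmd."""
    if mbox.cmd == 0:
        return
    op = mbox.cmd & 0x1FF   # assumed: opcode in the low 9 bits
    addr = mbox.cmd >> 9    # virtual address in the remaining bits
    if op == READVMB:
        mbox.data = memory[addr]
    mbox.cmd = 0            # tells the client the request is done


memory = bytearray(64 * 1024)  # stand-in for the 64KB virtual memory
memory[0x1234] = 0xAB

mbox = Mailbox()
post_read(mbox, 0x1234)
service(mbox, memory)          # in real life this runs in another cog
result = mbox.data
print(hex(result))
```

On the Propeller the two `repeat while` loops do the waiting that the assert and the explicit `service` call stand in for here.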
Dr_Acula said...
So just to double check, you are combining sd card, spi ram and the virtual memory driver all into one cog? That would be very nice!
Brainstorming here - ok zicog says 'I want a byte at location $nnnn' and at the moment the ram drive code sits within the zicog. It is a bit messy at present as there also is ram driver code in another cog for pre-filling the ram with data, so every rewrite of the code involves writing memory driver code in two places.
I see your design simplifying this process as well. So - zicog asks for a byte - is this done by checking a flag in a hub location and if the byte is in hub (in the 'fast cache') then getting the byte? Or would it set a flag to say that it is requesting a byte, then when the vmcog finds that flag it processes the request and returns the byte? Either of these will work and still may be faster even than latched code.
For the first cut, I am going for a very simple direct mapped TLB held in the VMCOG itself. With 512 byte pages and 128 TLB entries, this only allows for a 64KB VM.
Later, I can use a reverse-mapped TLB, or a larger, hub-based direct mapped TLB - or larger pages - to allow a larger virtual address space.
I like direct mapped because the time to look up if a page is in the working set is deterministic, and short.
With reverse mapped caches, you actually have to search a list to see if it is in memory, greatly slowing every access. Sure, there are tricks like keeping track of the last accessed and most accessed pages, but it is still a huge loss.
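A small Python model of the direct-mapped lookup, for anyone who wants to play with the numbers. The constant names match the ones VMCOG exports, but everything else is an assumption for the sketch: the real TLB entry format (hit counters, flag bits) is not modelled, and an entry here is simply a hub base address or `None`.

```python
PAGE_SHIFT = 9
PAGESIZE = 1 << PAGE_SHIFT         # 512 bytes per page
TLBENTRIES = 128
MEMSIZE = PAGESIZE * TLBENTRIES    # 64KB of virtual memory


def split(vaddr: int) -> tuple[int, int]:
    """Split a virtual address into (page number, offset within page)."""
    return vaddr >> PAGE_SHIFT, vaddr & (PAGESIZE - 1)


# One entry per virtual page: hub address of the page, or None if not resident.
tlb = [None] * TLBENTRIES


def lookup(vaddr: int):
    """Return the hub address for vaddr, or None on a page miss.

    The page number indexes the table directly, so the cost is the same
    for every access; nothing is searched.
    """
    page, offset = split(vaddr)
    base = tlb[page]
    if base is None:
        return None
    return base + offset


# Pretend page 9 has been loaded at hub address $4000 (made-up address).
tlb[split(0x1234)[0]] = 0x4000
print(split(0x1234), hex(lookup(0x1234)), lookup(0x0000))
```

A reverse-mapped table would instead hold one entry per resident page and need a search (or a hash) to find the matching virtual page, which is where the extra time per access comes from.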
jazzed said...
Bill Henning said...
It will take several more weeks as I am currently spending most of my time building and testing boards :) but those should be done within two weeks.
I can wait as I still have lots of work to do :)
BTW: I don't mind having physical device drivers in a second COG.
What I would prefer is TLB memory mapped sections similar to what is used in MIPS.
In that design the TLB maps the memory segment to the physical device and driver.
This way, you could use multiple drivers such as one intra-COG and one or more inter-COG.
Of course, the VMCOG TLB options do not have to be as complicated as MIPS :)
Someone generously gave my wife a bad cold... which I subsequently caught - I've been too sick to work for about a week (I tried, and made way too many mistakes, so I stopped)
The good news:
I am now well enough to work... and I have just taken receipt of prototype PCBs for my two latest Propeller products, both designed specifically for VMCOG :)
I have *FINALLY* gotten the SPI RAMs working properly - I wrote new bit-banging code for PropCade, and it works just fine (if a bit slower than I'd like).
Once that worked, I wrote READ_PAGE and WRITE_PAGE routines. I just finished testing them - and they work.
I expect to release VMCOG v0.90 sometime tomorrow!
VMCOG v0.90 Features:
- 64KB VM provided by two SPI 23K256 32Kx8 memory devices
- 512 byte pages
- 128 direct mapped TLB entries
- optional 64 bit total number of memory reads counter
- optional 64 bit total number of memory writes counter
- page miss counter
I am already planning a version of VMCOG with 1KB pages. This will allow 128KB of VM on PropCade and FlexMem for really large ZOG programs :)
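The sizes quoted above are easy to sanity-check in Python. The arithmetic comes straight from the feature list; the `chip_and_offset` mapping is purely an assumption for illustration (a linear split, first 32KB on chip 0, the rest on chip 1) and may not be how VMCOG actually lays pages out across the two 23K256 devices.

```python
CHIP_BYTES = 32 * 1024  # one 23K256 is 32K x 8


def vm_size(tlb_entries: int, page_bytes: int) -> int:
    """Virtual memory covered by a direct-mapped TLB with one entry per page."""
    return tlb_entries * page_bytes


def chip_and_offset(vaddr: int, chips: int = 2) -> tuple[int, int]:
    """Hypothetical linear mapping of a virtual address onto the SPI RAMs."""
    if not 0 <= vaddr < chips * CHIP_BYTES:
        raise ValueError("address outside the backing store")
    return vaddr // CHIP_BYTES, vaddr % CHIP_BYTES


print(vm_size(128, 512))        # v0.90: 512 byte pages
print(vm_size(128, 1024))       # planned: 1KB pages
print(chip_and_offset(0x8001))  # lands on the second chip
```

With 128 entries, 512 byte pages cover exactly the two 32KB chips, and doubling the page size doubles the reachable VM without growing the TLB.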
Bill Henning said...
I have *FINALLY* gotten SPI ram's working properly - I wrote new bit-banging code for PropCade, and it works just fine (if a bit slower than I'd like).
Once that worked, I wrote READ_PAGE and WRITE_PAGE routines, I just finished testing them - and they work
I expect to release VMCOG v0.90 sometime tomorrow! ...
I do look forward to trying VMCOG.
Unfortunately, I'm stuck with a bug from the 9th level of heck in the Java project and won't get to it right away.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
May the road rise to meet you; may the sun shine on your back.
May you create something useful, even if it's just a hack.
I've been quietly following the Java thread, it's a fascinating project.
jazzed said...
Bill Henning said...
I have *FINALLY* gotten SPI ram's working properly - I wrote new bit-banging code for PropCade, and it works just fine (if a bit slower than I'd like).
Once that worked, I wrote READ_PAGE and WRITE_PAGE routines, I just finished testing them - and they work
I expect to release VMCOG v0.90 sometime tomorrow! ...
I do look forward to trying VMCOG.
Unfortunately, I'm stuck with a bug from the 9th level of heck in the Java project and won't get to it right away.
I added three constants to VMCOG that can be used by external programs:
PAGESIZE
The number of bytes per page
TLBENTRIES
The number of TLB entries
MEMSIZE
The size of the virtual memory
USAGE:
OBJ
  vm : "vmcog"
  ser : "FullDuplexSerial"
' lots of code omitted
  ser.dec(VM#PAGESIZE) ' output number of bytes per page
  ser.dec(VM#TLBENTRIES) ' output number of TLB entries
  ser.dec(VM#MEMSIZE) ' output size of virtual memory
Comments
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com 5.0" VGA LCD in stock!
Morpheus dual Prop SBC w/ 512KB kit $119.95, Mem+2MB memory/IO kit $89.95, both kits $189.95 SerPlug $9.95
Propteus and Proteus for Propeller prototyping 6.250MHz custom Crystals run Propellers at 100MHz
Las - Large model assembler Largos - upcoming nano operating system
Post Edited (Bill Henning) : 2/13/2010 8:01:09 PM GMT
If you try my drivers with this connection, it cannot work. My driver expects these pins:
- DI/DO from RAM1
- DI/DO from RAM2
- CLK for both RAMs
- /CS for both RAMs
Andy
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.smarthome.viviti.com/propeller
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:
· Home of the MultiBladeProps: TriBlade, RamBlade, SixBlade, website
· Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator)
· Prop Tools under Development or Completed (Index)
· Emulators: CPUs Z80 etc; Micros Altair etc; Terminals VT100 etc; (Index) ZiCog (Z80), MoCog (6809)
· Prop OS: SphinxOS, PropDos, PropCmd; Search the Propeller forums (uses advanced Google search)
My cruising website is: www.bluemagic.biz; MultiBlade Props: www.cluso.bluemagic.biz
Thank you Dr_Acula & Cluso99.
I am changing VMCOG to use 512 byte pages, which will free up 128 longs for more code.
I've finally had time to do an update to VMCOG!
Changes:
- switched to 512 byte pages
- TLB now has 128 entries, freeing up 128 longs for code!
- TLB format changed to allow for more optimization and new 'L' bit
- started adding support for Lock() and Unlock()
- reads/writes now slightly faster due to more optimal TLB usage
- hit counters in TLB increased to 23 bits (can count to 8M accesses before decimating)
- eliminated "Guard" bit
- using Carry flag to detect overflow of hit count
Note, the new code is untested. You can find it attached to the first post!
I've left the last published version with 256 byte pages in the first post as well.
Post Edited (Bill Henning) : 3/10/2010 9:25:07 PM GMT
Thanks!
Post Edited (Dr_Acula) : 3/11/2010 12:18:07 AM GMT
Now to soldering...
I'm glad you're feeling better. Take it easy.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
JMH
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com
My products: Morpheus / Mem+ / PropCade / FlexMem / VMCOG / Propteus / Proteus SerPlug
and 6.250MHz Crystals to run Propellers at 100MHz & 5.0" OEM TFT VGA LCD modules
Las - Large model assembler Largos - upcoming nano operating system
The 23K256 driver has been merged with VMCOG, and the merged driver compiles without error.
Functional testing begins ... NOW!
I uploaded a snapshot of where I am right now to the first post - v0.85 is not fully tested, but it is on the road to v0.90