The more supported interfaces, the better :)
That would work well as long as there was a page miss every 15ms or less.
It has also been my experience that the data sheet minimum refresh intervals are extremely conservative; I've heard reports of dram keeping contents for a second or two!
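The 15ms figure above can be sanity-checked with a quick back-of-envelope sketch. The part parameters below (1024 rows, a 16ms refresh window, ~200ns per CBR cycle) are assumptions typical of older DRAMs, not figures from this thread:

```python
# Back-of-envelope check of the "page miss every 15ms" figure above.
# Assumed part specs (typical of older DRAMs, not from this thread):
# 1024 rows, each needing a refresh once per 16 ms window.
ROWS = 1024
WINDOW_MS = 16.0

per_row_us = WINDOW_MS * 1000 / ROWS   # distributed refresh: one row due every ~15.6 us
burst_us = ROWS * 0.2                  # full CBR burst at an assumed ~200 ns per cycle

# If every page miss triggers a full burst of CBR cycles, one miss per
# 16 ms window keeps the whole array refreshed - hence "every 15 ms or less".
assert per_row_us == 15.625
assert burst_us == 204.8               # a full burst costs only ~0.2 ms
```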
I got as far as promising memory tests and benchmarks, but no further (thanks to VMCog, maybe that code could be recovered now...)
For the refresh, I simply added a CBR at each command wait cycle, thus ensuring that at least one refresh cycle is performed even under continuous page read requests.
It only adds a slight latency to the page transfer, and in this case we're only talking of cache misses, not regular memory accesses, so I'd go with the simple and easy one.
:). I have both Samsung and Toshiba 30-pin SIMMs. My Propeller pin-out is probably different. Too bad the SIMMs are obsolete.
Bill Henning said...
How long is the longest JNI call, in ms?
I would be perfectly willing to add a "VMREFRESH" command, that Spin could send to the mailbox to force a refresh...
also, at the cost of a 200ns performance hit, it would be possible to change the spinner to incorporate an "is it time to refresh yet? / call refresh" two-instruction sequence.
I am sure a viable solution exists
Funny, I have five of those 32Mx8 TSOP-II parts in my "in box"... strange coincidence, don't you think? They arrived from Digikey a few months ago...
Bill, the longest JNI for delay is up to the user ... could be 10's of seconds.
I've looked at those SDRAM over and over; my conclusion is a 2 Propeller solution minimum. We can talk more about that later.
You have the tightest possible spinner now. A 400ns spinner is possible with a djnz ... considering how long it takes to start a DRAM access, it probably doesn't matter.
Honestly though I care less about performance and more about getting something that works reliably at any speed right now so I can test the JVM port.
I almost posted something just like that this morning in an #ifdef XEDODRAM wrapper :)
However, since the entire driver is packaged in another cog at the moment I think I'm set for a while.
Just got to finish debugging some things :)
Cheers.
--Steve
Bill Henning said...
Ah! if you don't care about performance...
waitcmd wrlong zero,pvmcmd
goon mov refcount,#100 ' tune count to taste, each iteration takes 200ns normally when spinning
wl rdlong vminst, pvmcmd wz ' top 23 bits = address, bottom 9 = command
mov vmaddr, vminst
if_z djnz refcount,#wl ' take care of JNI problem... during normal run, costs only 200ns extra per read
if_z jmp #refresh ' refresh jumps to goon when complete
And JNI problem is SOLVED!
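The polling logic above can be modeled in a few lines of Python (a sketch with a hypothetical mailbox, not the real cog): count idle polls, and when the counter expires with no command pending, run a refresh pass.

```python
def spinner(polls, refcount_reload=100):
    """Model of the wl/djnz loop: polls is a list of mailbox values,
    one per ~200ns iteration (0 means no command pending)."""
    events = []
    refcount = refcount_reload             # mov refcount,#100
    for cmd in polls:
        if cmd == 0:                       # Z set: mailbox empty
            refcount -= 1                  # if_z djnz refcount,#wl
            if refcount == 0:              # counter expired while idle...
                events.append("refresh")   # if_z jmp #refresh
                refcount = refcount_reload # refresh jumps back to goon
        else:
            events.append(("exec", cmd))
            refcount = refcount_reload     # back through goon after the command
    return events

# 250 idle polls at reload=100: two refresh passes fire
assert spinner([0] * 250) == ["refresh", "refresh"]
```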
I can't wait to see what you are up to :)
JVM with 64KB of code space perhaps?
As soon as this version is "proved out" with ZOG/JVM/ZiCog (at least one or two of them) I will make a version that removes the 64KB VM limit.
It will be slightly slower (about 200ns per command) but able to support a large virtual space - a 512KB VM address space would only need a 1KB table in the hub (and 64 longs for access count and DIRTY/LOCKED/READONLY flags in the VMCOG).
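The table sizing quoted above works out if each 512-byte page gets a one-byte hub entry (my inference from the 1KB / 512KB figure; the entry size is not stated explicitly):

```python
PAGE = 512          # VMCOG page size in bytes
ENTRY = 1           # assumed: one byte per page entry in the hub table

def hub_table_bytes(vm_bytes):
    """Hub RAM needed for the page-present table of a given VM size."""
    return (vm_bytes // PAGE) * ENTRY

assert hub_table_bytes(512 * 1024) == 1024              # 512KB VM -> 1KB table
assert hub_table_bytes(1024 * 1024) == 2 * 1024         # 1MB VM -> 2KB of hub ram
assert hub_table_bytes(32 * 1024 * 1024) == 64 * 1024   # 32MB code -> 64KB table
```

The same arithmetic reproduces the 2KB (1MB VM) and 64KB (32MB code) figures quoted elsewhere in the thread.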
I would prefer 1MB for JVM, but I'll take what I can get. It's likely the linker tool would only support 64K until I fix it.
Some day I'll have a file system that supports long file names and will make a JAVA2 compliant JVM that uses Swing, etc....
Of course I need a good touch screen LCD with graphics memory to do Swing.
With the read-only option for embedded flash, I could have a 32MB code segment and 1MB data space using two VMCOGs.
Not trying to set expectations that I can actually do it all quickly, but those are examples of what I'm shooting for.
I'm attaching an archive for review. Here are the diffs without XEDODRAM_1M.spin
I removed the non-ASCII chars from the MIT license so diff/patch can be used.
Unfortunately I have a problem with heater's march-c test ... I'll fix it later.
Cheers,
--Steve
diff -c vmcog_0607.spin vmcog.spin
*** vmcog_0607.spin 2010-06-07 18:18:29.054000000 -0700
--- vmcog.spin 2010-06-07 19:02:38.322000000 -0700
***************
*** 1,6 ****
'----------------------------------------------------------------------------------------------------
'
! ' VMCOG v0.941 - virtual memory server for the Propeller
'
' Copyright February 3, 2010 by William Henning
'
--- 1,6 ----
'----------------------------------------------------------------------------------------------------
'
! ' VMCOG v0.961 - virtual memory server for the Propeller
'
' Copyright February 3, 2010 by William Henning
'
***************
*** 86,92 ****
#define EXTRAM
#ifdef EXTRAM
! #define PROPCADE
'#define FLEXMEM
'#define MORPHEUS_SPI
'#define MORPHEUS_XMM
--- 86,92 ----
#define EXTRAM
#ifdef EXTRAM
! '#define PROPCADE
'#define FLEXMEM
'#define MORPHEUS_SPI
'#define MORPHEUS_XMM
***************
*** 96,102 ****
'#define TRIBLADE
'#define TRIBLADE_2
'#define RAMBLADE
! '#define XEDODRAM
#else
#warn External Ram access disabled - only use memory up to WS_SIZE
#endif
--- 96,102 ----
'#define TRIBLADE
'#define TRIBLADE_2
'#define RAMBLADE
! #define XEDODRAM
#else
#warn External Ram access disabled - only use memory up to WS_SIZE
#endif
***************
*** 175,186 ****
--- 175,208 ----
long[dataptr] := nump ' single byte read/writes are the default
word[cmdptr] := 0
+ #ifdef XEDODRAM
+ longfill(@xmailbox,0,2) ' ensure command starts as 0
+ long[mailbox+12]:= @xmailbox ' 3rd vmcog mailbox word used for xm interface
+ xm.start(@xmailbox) ' start up inter-cog xmem
+ #endif
+
cognew(@vmcog,mailbox)
fcmdptr := @fakebox
fdataptr := fcmdptr+4
repeat while long[cmdptr] ' should fix startup bug heater found - it was the delay to load/init the cog
+ #ifdef XEDODRAM
+ VAR
+ ' xmailbox is a command and data word
+ ' command is %CCCC_LLLL_LLLL_AAAA_AAAA_AAAA_AAAA_AAAA
+ ' C is Command bits up to 16 commands
+ ' L is Length bits up to 256 bytes
+ ' A is Address bits up to 1MB address range
+ ' data is interpreter based on command word context
+ ' data is a pointer in case of buffer read/write
+ ' data is a long/word/byte in other command cases
+ '
+ long xmailbox
+ OBJ
+ xm : "XEDODRAM_1MB"
+ #endif
+
PUB rdvbyte(adr)
repeat while long[cmdptr]
long[cmdptr] := (adr<<9)|READVMB
***************
*** 364,373 ****
'End of TriBlade code
#endif
#ifdef XEDODRAM
! mov XDRAM_cmd,par
! add XDRAM_cmd,#16 ' par+16 for cmd ... vmcog mailbox is 4 longs
! mov XDRAM_dat,XDRAM_cmd
! add XDRAM_dat,#4 ' par+20 for dat
#endif
'----------------------------------------------------------------------------------------------------
' END OF BINIT SECTION
--- 386,396 ----
'End of TriBlade code
#endif
#ifdef XEDODRAM
! mov XDRAM_cmd,par
! add XDRAM_cmd,#12 ' par+n+4 for dat
! rdlong XDRAM_cmd,XDRAM_cmd
! mov XDRAM_dat,XDRAM_cmd
! add XDRAM_dat,#4 ' par+n+4 for dat
#endif
'----------------------------------------------------------------------------------------------------
' END OF BINIT SECTION
***************
*** 998,1007 ****
wrlong ptr,XDRAM_dat ' send new hub pointer
wrlong xcmd,XDRAM_cmd ' send command
rdlong temp,XDRAM_cmd wz
! if_nz jmp #$-1 ' wait for command complete
add ptr,#256 ' incr pointers
add xcmd,#256
! djnz count,#:next ' do next 128
#endif
' endif XEDORAM
#endif
--- 1021,1030 ----
wrlong ptr,XDRAM_dat ' send new hub pointer
wrlong xcmd,XDRAM_cmd ' send command
rdlong temp,XDRAM_cmd wz
! if_nz jmp #$-1 ' wait for command complete
add ptr,#256 ' incr pointers
add xcmd,#256
! djnz count,#:next ' do next 256
#endif
' endif XEDORAM
#endif
***************
*** 1084,1093 ****
wrlong ptr,XDRAM_dat ' send new hub pointer
wrlong xcmd,XDRAM_cmd ' send command
rdlong temp,XDRAM_cmd wz
! if_nz jmp #$-1 ' wait for command complete
add ptr,#256 ' incr pointers
add xcmd,#256
! djnz count,#:next ' do next 128
#endif
' endif XEDODRAM
#endif
--- 1107,1116 ----
wrlong ptr,XDRAM_dat ' send new hub pointer
wrlong xcmd,XDRAM_cmd ' send command
rdlong temp,XDRAM_cmd wz
! if_nz jmp #$-1 ' wait for command complete
add ptr,#256 ' incr pointers
add xcmd,#256
! djnz count,#:next ' do next 256
#endif
' endif XEDODRAM
#endif
***************
*** 1201,1209 ****
xcmd long 0
XDRAM_cmd long 0
XDRAM_dat long 0
! XDRAM_rbuf long $7000_0000 ' always read 256 at a time ... loops 4 times
! XDRAM_wbuf long $8000_0000 ' always write 256 at a time ... loops 4 times
! XDRAM_amsk long $0003_ffff ' address mask ... just allow 256K for now ....
#endif
' endif XEDODRAM
#endif
--- 1224,1232 ----
xcmd long 0
XDRAM_cmd long 0
XDRAM_dat long 0
! XDRAM_rbuf long $7ff0_0000 ' always read 256 at a time ... loops 2 times
! XDRAM_wbuf long $8ff0_0000 ' always write 256 at a time ... loops 2 times
! XDRAM_amsk long $000f_ffff ' address mask up to 1MB
#endif
' endif XEDODRAM
#endif
1MB VM address space is not a problem, but it will cost 2KB of hub ram. I plan that the VM size will be a CONstant, making the hub lookup table size self-adjusting.
Re: long file name fs... I am planning (when I have time (LOL)) to port my flashfs to SDHC cards. It is far superior to FAT: it supports long file names, arbitrarily deep nested directories, unlimited files per directory, unix-style attributes, and huge files.
Nix on the 32MB code - that would need 64KB for the hub TLB with 512 byte pages. It would take 16KB with 2KB code pages.
Besides, I shudder to think of 32MB of Java byte codes!
FYI, on a Morpheus with a Mem+ you could have 1MB code, 1MB data, 512KB frame buffer - all in fast SRAM, all on CPU#2
CPU#1 can take a FlexMem (or three) as well...
I'll merge the driver tomorrow. I am dealing with a realtor tonight.
Bill. Big apologies from me. Those "optimized" versions were actually "brokenized" somewhere along the line.
Attached is your latest version again but with all my TriBlade optimizations backed out.
This now works on TriBlade (Honest)
I have extended the heater3 test with an incrementing counter fill test, a decrementing counter fill test, and a pseudo random number fill test.
READY>
Testing $00010000 bytes
Up count fill...
Up count check...
Down count fill...
Down count check...
Random fill...
Random check...
Zero fill...
Checking 00 writing FF...
Checking FF writing AA...
Checking AA writing 55...
Checking 55 writing 00...
OK
I did learn many years ago that memory testing is not as simple as it may seem at first. And then forgot!
Looks like I was putting too much faith in that little heater test as I blundered on optimizing and testing, optimizing and testing...
I will forgo any optimization attempts for a while and get Zog working with VM whilst this is solid.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
I did the following test inspired by a robot or game character reading a map. For hub ram the map was 31x31. For SPI ram the map was 90x90 or 32,400 bytes. I only had one SPI ram chip installed on the robot.
repeat 1000
  dx := -1, 0, or 1  dy := -1, 0, or 1
  x += dx  y += dy (stay inside map)
  read the 25 longs centered around x and y
Timing results per read and assignment (in microseconds):
HUB - 30
SPI with sram_contr20 - 402
VMcog with 2kB hub ram - 262
VMcog with 4kB hub ram - 186
VMcog with 8kB hub ram - 120
edit: clock is 80,000,000
John Abshier
Post Edited (John Abshier) : 6/8/2010 3:34:44 PM GMT
Sorry, I missed this message earlier (before coffee)
I can't wait to try 64KB (and later larger) Zog!
I think I will hold off on improvements to VMCOG itself (other than integrating more hardware drivers) until we have some software running under ZOG - at the very least, Fibo and Dhrystone, which can use 64KB or more. Why?
I can then use that software to test the "big" VM version!
heater said...
Yep, the first byte in page bug looks to be fixed in my testing.
The heater test now fails correctly when an error is forced.
My last Triblade optimizations were missed from Bill's last release so here is his last release plus my latest TriBlade mods.
Could you send me your sram_contr20 again? I could not get the copy I have running in the past, but I want to try it again soon as it would speed up page reads/writes.
Very interesting results!
So VMCOG was 1/4 of the speed of direct hub reads with 8K working set. This is consistent with what I expected based on the number of hub accesses VMCOG and client have to do.
Thank you for the smaller working set test results - they show that performance scales with working set size, as expected: more page hits and fewer misses.
John Abshier said...
I did the following test inspired by a robot or game character reading a map. For hub ram the map was 31x31. For SPI ram the map was 90x90 or 32,400 bytes. I only had one SPI ram chip installed on the robot.
repeat 1000
dx := -1, 0, or 1 dy := -1, 0, 1
x += dx y += dy (stay inside map)
read the 25 longs centered around x and y
Timing results per read and assignment (in microseconds):
HUB - 30
SPI with sram_contr20 - 402
VMcog with 2kB hub ram - 262
VMcog with 4kB hub ram - 186
VMcog with 8kB hub ram - 120
With a working set of 8KB (1/4 of the virtual memory) the performance was 25% that of the hub version
With a working set of 4KB (1/8 of the virtual memory) the performance was 16% that of the hub version
With a working set of 2KB (1/16 of the virtual memory) the performance was 11% that of the hub version
And let's not forget - the hub version would not be able to execute the test on a 90x90 map at all!
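The percentages above can be rechecked directly against John's measurements (microseconds per read):

```python
hub_us = 30                            # direct hub read, from John's table
measured = {8: 120, 4: 186, 2: 262}    # working set in KB -> us per access
percent = {ws: round(100 * hub_us / t) for ws, t in measured.items()}
assert percent == {8: 25, 4: 16, 2: 11}   # matches the figures quoted above
```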
The really interesting point is that, by counting cycles, the best case result for VMCOG is 1.2us per access (if the hub sweetspot is hit by both client and server)
A bare hub op (best case) takes 0.2us - so best case to best case, there is a 1us penalty per access to using VMCOG... that is, VMCOG is 1/6th the speed at best of a hypothetical large enough hub.
John's example shows that real clients will not be back-to-back hub operations, so we won't see a 6x slowdown. The more operations the client does between VMCOG accesses, the less slow down there will be compared to pure hub reads.
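That dilution argument can be put into a one-line model (a sketch covering the hit case only; page misses are ignored): with t microseconds of client work between accesses, the slowdown is (1.2 + t) / (0.2 + t).

```python
HUB_US = 0.2   # best-case bare hub op (figure from the post above)
VM_US = 1.2    # best-case VMCOG access (figure from the post above)

def slowdown(work_us):
    """Ratio of VM to hub runtime when the client does work_us of
    non-memory work between consecutive accesses (hits only)."""
    return (VM_US + work_us) / (HUB_US + work_us)

assert abs(slowdown(0.0) - 6.0) < 1e-9   # back-to-back accesses: the 6x worst case
assert abs(slowdown(1.8) - 1.5) < 1e-9   # 1.8us of work dilutes it to 1.5x
```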
Just posted a working Zog plus VMCog to the Zog thread.
This has almost all the functionality of the HUB memory build. The syscall I/O stuff is not there yet.
Here is the evidence, Zog HUB and then Zog VM running the RC4 crypto test. That's the test that gave me all the grief with endianness issues a while back.
The first run of this in VM took 30 seconds as opposed to the HUB version at 1 or 2 seconds!
Then I thought that using only 4 pages was a bit mean. Using 8 was just as slow. Moving to 16 pages gets us back to 2 or 3 seconds.
Edit: 12 pages gives about 16 seconds.
I'll try and get Dhrystone running but first I need to arrange to pull ZPU executables from SD cards rather than waste space using "file" statements.
Your timing results perfectly reflect CS theory - as soon as you find the "sweet spot" for a given application, VM is not that much slower than using physical ram. Once there is more hub ram free, I'll be curious to see the effect of adding more pages (20, 24,32 total pages) - I bet the curve will flatten and at some point the speed won't improve.
I have some (paying) work that I need to do this week, but I intend to remove the 64KB VM limit by the weekend, latest. Until then, VMCOG should get a good workout (testing). This will also allow testing the large VM version against a known working VM.
As I mentioned in the ZOG thread, what we now need is a simple shell, that can launch .ZOG binaries from an SD card.
vga and sd pins might need changing, but this bootloader does not need any external ram. It starts off with a list of .bin files and you run them with SPIN MYFILE.BIN
It seems to me a vmcog based boot-loader would be best to write an image into memory
since it already understands the hardware. It could also optionally start the program.
Thanks, I will take a look, it sounds good from your description.
Jazzed:
I added an API call to VMCOG to make loading faster & easier:
PUB GetVirtLoadAddr(vaddr)|va
va:= vaddr&$7FFE00 ' 23 bit VM address - force start of page
wrvbyte(vaddr,rdvbyte(va)) ' force page into working set, set dirty bit
return GetPhysVirt(va) ' note returned pointer only valid until next vm call
So to load, you would do something like the following pseudo-code:
addr:=0;
while (filesize>0) {
fread(f,vm.GetVirtLoadAddr(addr),512)
addr+=512;
filesize-=512;
}
This would "read" directly into the VM, 512 bytes at a time.
NOTE:
The pointer returned by GetVirtLoadAddr() is ONLY valid until a different VM op is executed, as a different vm op may force that page to be paged out!
Obviously, this could also be used to save the whole VM image to an SD card, for "swapping" whole VM's, or implementing "suspend" and "hibernation" modes
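The load loop is easy to mock up (hypothetical MockVM class, not the real Spin object) to show the shape of the transfer and why each chunk must be consumed before the next VM call can recycle the page:

```python
class MockVM:
    """Toy stand-in for VMCOG: one flat buffer, identity page mapping."""
    def __init__(self, size):
        self.mem = bytearray(size)

    def get_virt_load_addr(self, vaddr):
        return vaddr & 0x7FFE00   # 23-bit VM address, forced to start of page

def load_image(vm, data, chunk=512):
    addr = 0
    while addr < len(data):
        dst = vm.get_virt_load_addr(addr)          # valid only until next VM op
        n = min(chunk, len(data) - addr)
        vm.mem[dst:dst + n] = data[addr:addr + n]  # stand-in for fread(f, ptr, 512)
        addr += n

vm = MockVM(4096)
load_image(vm, bytes([i & 0xFF for i in range(2048)]))
assert vm.mem[600] == 600 % 256   # chunks landed at their page offsets
```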
@Bill, that looks good especially for SRAM. It is a small problem for DRAM as you
probably know and I'll go into that later. ...
... Meanwhile, I think I'll look at adding read-only devices like the serial EEPROM and/or
the NAND flash which would not flush out on dirty page swaps and use very little precious
Propeller memory.
Having "non-flush" EEPROM would allow one of my goals for running JVM from EEPROM
especially since I already have a JVM that can work like that, and it will provide another
comparison for proving the value and effectiveness of the vmcog cache design
--
One of the problems with using DRAM is that if the COG running refresh is rebooted,
the data goes away :-( This is not an issue with SRAM obviously.
Of course if Propeller is not completely rebooted, it's not a problem for DRAM.
But how do you partially reboot Propeller? There are a few answers to that question,
but the easy one is not desirable if you're a "cognostic" (cognew -vs- coginit) user.
Still, in this case, I think there is merit in using coginit(7) for the device with a disclaimer.
Maybe in the future the best answer for the DRAM problem while remaining cognostic
is to have read-only vmcog devices such as SD-CARD, EEPROM, or that NAND flash that
can be loaded on demand per cog for the task and then be recycled. I'm not saying that
vmdebug or vmcog has to have the on-demand loader feature :), but there can be
some benefit to having the read-only devices.
--
BTW: I also have another low pin count SRAM design that I may want to add later.
Thanks - I thought that would handle loading nicely...
As for DRAM - just have a loop with 128 vm.wrvlong(addr,long)
jazzed said...
... Meanwhile, I think I'll look at adding read-only devices like the serial EEPROM and/or
the NAND flash which would not flush out on dirty page swaps and use very little precious
Propeller memory.
I may have time to implement READONLY later today; if not, my plan is to change the LUT entries as follows:
' LUT Entry values take the following form:
'
' if 0, page is not present in memory
'
' if not 0, the 32 bits are interpreted as:
'
' CCCCCCCC CCCCCCCC CCCCWLDP PPPPPPPP
' where
'
' PPPPPPPPP = hub address, upper 9 bits of up to 18 bit address, 000xxxxxx on Prop1
'
' D = Dirty bit - This bit is set whenever a write is performed to any byte(s) in the page
'
' L = Locked bit - set whenever the physical memory page is locked into the hub and may not be swapped out
'
' W = Write protected, probably from FLASH / EEPROM, but can be used to write-protect a RAM page as well
'
This does reduce the hit access count to 20 bits, but the (untested) shr_hits routine compensates for this.
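For reference, here is one plausible packing of that entry (the bit positions are my assumption, consistent with the description above: 20 hit-count bits on top, W/L/D flags, 9 bits of hub page number at the bottom):

```python
def pack(hits, w, l, d, page):
    """Pack a LUT entry: 20-bit hit count, W/L/D flags, 9-bit hub page."""
    assert hits < (1 << 20) and page < (1 << 9)
    return (hits << 12) | (w << 11) | (l << 10) | (d << 9) | page

def unpack(entry):
    """Return (hits, w, l, d, page) from a packed entry."""
    return (entry >> 12, (entry >> 11) & 1, (entry >> 10) & 1,
            (entry >> 9) & 1, entry & 0x1FF)

e = pack(12345, w=1, l=0, d=1, page=0x0A3)
assert unpack(e) == (12345, 1, 0, 1, 0x0A3)
assert pack(0, 0, 0, 0, 0) == 0   # all-zero doubles as "page not present"
```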
jazzed said...
Having "non-flush" EEPROM would allow one of my goals for running JVM from EEPROM
especially since I already have a JVM that can work like that, and it will provide another
comparison for proving the value and effectiveness of the vmcog cache design
If someone wants to submit some nice, small, fast (but speed tunable for fast/slow eeproms) I2C code for BINIT/BREAD/BWRITE/BDATA there is no reason that there can't be an EEPROM VMCOG - however for such a VMCOG, a write-through strategy makes a lot more sense, so BWRITE would have to take a size: 1/2/4 bytes, and the DIRTY flag would not be maintained.
For a JVM, using hub memory for the stack and heap, but VM for code would be an interesting mix.
jazzed said...
One of the problems with using DRAM is that if the COG running refresh is rebooted,
the data goes away :-( This is not an issue with SRAM obviously.
Of course if Propeller is not completely rebooted, it's not a problem for DRAM.
But how do you partially reboot Propeller? There are a few answers to that question,
but the easy one is not desirable if you're a "cognostic" (cognew -vs- coginit) user.
Still, in this case, I think there is merit in using coginit(7) for the device with a disclaimer.
I think there is a place for both. Personally, both Minos and Largos will have a "system" cog, for managing resources/cogs and providing some other system services.
As for reboot - I think you would find that if you did a refresh just before the reboot, the DRAM would survive. I've seen DRAM content survive for seconds without refresh...
jazzed said...
Maybe in the future the best answer for the DRAM problem while remaining cognostic
is to have read-only vmcog devices such as SD-CARD, EEPROM, or that NAND flash that
can be loaded on demand per cog for the task and then be recycled. I'm not saying that
vmdebug or vmcog has to have the on-demand loader feature :), but there can be
some benefit to having the read-only devices.
As soon as I move to an external "page present" table, there will be enough space left in VMCOG for basic I2C capability - which opens interesting possibilities such as your suggested demand loading of drivers (with reusing cogs after the operation is done).
Actually, right now I am thinking that the low-level SD card stuff (initialize, send command, read status, read sector, write sector) needs to be totally split out from the file system and made mailbox based. This would allow compiling file systems to run out of virtual memory! With Catalina and ZOG, it would also allow running arbitrary file systems written in C.
jazzed said...
BTW: I also have another low pin count SRAM design that I may want to add later.
Cheers.
--Steve
Nice!
The more, the merrier!
I have a 12 pin CPLD based design that is just waiting for me to have time after UPEW to test and have boards made.
Bill: "...the low-level SD card stuff (initialize, send command, read status, read sector, write sector) needs to be totally split out from the file system and made mailbox based."
That would be excellent. Useful for the Z80, 6502 emulators as well.
By the way, shouldn't the Zogs be converted to use mailboxes as well?
It's the first thing you suggested, only a single row per cycle.
Even interleaved with page reads this works, at least with a 256 byte page.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Pages: Propeller JVM
Now I can start to add this to Zog.
On going back to vmdebug I ended up changing the heater3 test from a zero fill at the start to filling with the low byte of address, like so:
heater3 then fails like so:
It should fail at 0001. This reproduces the problem I had in the zog code.
Reading back the memory I see a lot of zeros at the beginning.
Edit: The fill2 test ("f") does not work either.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (heater) : 6/8/2010 10:05:33 AM GMT
I cannot reproduce your faulty result - using the zip from your previous message, with your change below, I get:
Which I believe is the correct result.
fill2 also works with #define PROPCADE
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (Bill Henning) : 6/8/2010 1:31:32 PM GMT
Attached is your latest version again but with all my TriBlade optimizations backed out.
This now works on TriBlade (Honest)
I have extended the heater3 test with an incrementing-counter fill test, a decrementing-counter fill test, and a pseudo-random-number fill test.
I did learn many years ago that memory testing is not as simple as it may seem at first. And then I forgot!
Looks like I was putting too much faith in that little heater test as I blundered on optimizing and testing, optimizing and testing...
I will forgo any optimization attempts for a while and get Zog working with VMCOG now that it is solid.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
repeat 1000
  dx := -1, 0, or 1; dy := -1, 0, or 1
  x += dx; y += dy (stay inside map)
  read the 25 longs centered around x and y
Timing results per read and assignment (in microseconds):
HUB - 30
SPI with sram_contr20 - 402
VMcog with 2kB hub ram - 262
VMcog with 4kB hub ram - 186
VMcog with 8kB hub ram - 120
Edit: clock is 80,000,000 Hz
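As a back-of-the-envelope check on the numbers above (just ratios of the measured timings, not new measurements):

```python
# Slowdown relative to plain hub access, from the timings above (us per read+assign).
HUB_US = 30
timings = {
    "SPI sram_contr20": 402,
    "VMCOG 2KB cache":  262,
    "VMCOG 4KB cache":  186,
    "VMCOG 8KB cache":  120,
}
slowdown = {name: t / HUB_US for name, t in timings.items()}

assert slowdown["VMCOG 8KB cache"] == 4.0   # 8KB working set: 4x slower than hub
assert slowdown["VMCOG 2KB cache"] < 9.0    # even a 2KB working set stays under 9x
```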
John Abshier
Post Edited (John Abshier) : 6/8/2010 3:34:44 PM GMT
Sorry, I missed this message earlier (before coffee)
I can't wait to try 64KB (and later larger) Zog!
I think I will hold off on improvements to VMCOG itself (other than integrating more hardware drivers) until we have some software running under ZOG - at the very least, Fibo and Dhrystone, which can use 64KB or more. Why?
I can then use that software to test the "big" VM version!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Could you send me your sram_contr20 again? I could not get the copy I have running, but I want to try it again soon, as it would speed up page reads/writes.
Very interesting results!
So VMCOG was 1/4 of the speed of direct hub reads with 8K working set. This is consistent with what I expected based on the number of hub accesses VMCOG and client have to do.
Thank you for the smaller working test results - it shows that the performance scales with a larger working set, as expected with more page hits and fewer misses.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
90x90x4 = 32,400 - just barely under 32KB
31x31x4 = 3,844 - just over 3.5KB
So with VMCOG, an 8.4x larger map was used.
With a working set of 8KB (1/4 of the virtual memory) the performance was 25% that of the hub version
With a working set of 4KB (1/8 of the virtual memory) the performance was 16% that of the hub version
With a working set of 2KB (1/16 of the virtual memory) the performance was 11% that of the hub version
And let's not forget - the hub version would not be able to execute the test on a 90x90 map at all!
The really interesting point is that, by counting cycles, the best-case result for VMCOG is 1.2us per access (if the hub sweet spot is hit by both client and server).
A bare hub op (best case) takes 0.2us - so best case to best case, there is a 1us penalty per access when using VMCOG... that is, VMCOG is at best 1/6th the speed of a hypothetical large-enough hub.
John's example shows that real clients will not do back-to-back hub operations, so we won't see a 6x slowdown. The more work the client does between VMCOG accesses, the smaller the slowdown compared to pure hub reads.
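That argument can be sketched numerically (a toy model built only from the best-case figures in this post, not new VMCOG measurements):

```python
# Toy model: the per-access penalty is fixed, so the more non-memory work
# the client does between VMCOG accesses, the smaller the relative slowdown.
HUB_NS = 200      # best-case bare hub op (0.2us, from the post)
VMCOG_NS = 1200   # best-case VMCOG page-hit access (1.2us, from the post)

def slowdown(other_work_ns):
    # Ratio of total time per access (access + other work) for VMCOG vs. hub.
    return (VMCOG_NS + other_work_ns) / (HUB_NS + other_work_ns)

assert slowdown(0) == 6.0       # back-to-back accesses: the full 6x penalty
assert slowdown(1800) == 1.5    # 1.8us of other work per access: only 1.5x
```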
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (Bill Henning) : 6/8/2010 4:51:23 PM GMT
John Abshier
That is NOP, NOP, NOP, NOP, IM 2
I was thrown for a while, as the example PASM code in vmaccess.spin is all a mess:
mboxdat is at offset +8 in the mail box
"if_z jmp #waitdone" should be "if_nz...".
but otherwise Big Zog is in good shape.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Turns out that one of my old SIMMs is bad. Three out of four work fine.
Now I can focus on the higher goals.
@Bill, can you cut in the diffs I posted? Thanks.
Cheers.
--Steve
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Thanks!
heater:
Nice! Good progress... sorry about vmaccess, that was thrown together quickly for pullmol weeks ago, and I have not had a chance to test them.
jazzed:
I will tonight, I am dealing with a realtor most of today.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
This has almost all the functionality of the HUB memory build. The syscall I/O stuff is not there yet.
Here is the evidence, Zog HUB and then Zog VM running the RC4 crypto test. That's the test that gave me all the grief with endianness issues a while back.
The first run of this in VM took 30 seconds as opposed to the HUB version at 1 or 2 seconds!
Then I thought that using only 4 pages was a bit mean. Using 8 was just as slow. Moving to 16 pages gets us back to 2 or 3 seconds.
Edit: 12 pages gives about 16 seconds.
I'll try to get Dhrystone running, but first I need to arrange to pull ZPU executables from SD cards rather than waste space using "file" statements.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (heater) : 6/9/2010 9:17:09 AM GMT
Now we can really exercise VMCOG.
Your timing results perfectly reflect CS theory - as soon as you find the "sweet spot" for a given application, VM is not that much slower than using physical ram. Once there is more hub ram free, I'll be curious to see the effect of adding more pages (20, 24,32 total pages) - I bet the curve will flatten and at some point the speed won't improve.
I have some (paying) work that I need to do this week, but I intend to remove the 64KB VM limit by the weekend, latest. Until then, VMCOG should get a good workout (testing). This will also allow testing the large VM version against a known working VM.
As I mentioned in the ZOG thread, what we now need is a simple shell, that can launch .ZOG binaries from an SD card.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
vga and sd pins might need changing, but this bootloader does not need any external ram. It starts off with a list of .bin files and you run them with SPIN MYFILE.BIN
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.smarthome.viviti.com/propeller
since it already understands the hardware. It could also optionally start the program.
Cheers.
--Steve
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (jazzed) : 6/9/2010 11:22:02 PM GMT
Thanks, I will take a look, it sounds good from your description.
Jazzed:
I added an API call to VMCOG to make loading faster & easier:
So to load, you would do something like the following pseudo-code:
This would "read" directly into the VM, 512 bytes at a time.
NOTE:
The pointer returned by GetVirtLoadAddr() is ONLY valid until a different VM op is executed, as a different vm op may force that page to be paged out!
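A rough sketch of such a loading loop (a Python mock-up of the idea; `get_virt_load_addr` and `read_sector` are illustrative names, not VMCOG's actual API — the real code would be Spin):

```python
# Hypothetical stand-in for loading a binary through GetVirtLoadAddr(),
# 512 bytes (one SD sector) at a time, directly into the VM's backing store.
SECTOR = 512

class FakeVM:
    """Stands in for VMCOG: hands out a writable window into the current page."""
    def __init__(self, size):
        self.mem = bytearray(size)
    def get_virt_load_addr(self, vaddr):
        # The window is only valid until the next VM op - see the NOTE above.
        return memoryview(self.mem)[vaddr:vaddr + SECTOR]

class FakeSD:
    """Stands in for the SD driver: reads sequential 512-byte sectors."""
    def __init__(self, image):
        self.image, self.pos = image, 0
    def read_sector(self, buf):
        buf[:] = self.image[self.pos:self.pos + SECTOR]
        self.pos += SECTOR

def load_image(vm, sd, size):
    for vaddr in range(0, size, SECTOR):
        # "Read" one sector straight into the VM at this virtual address.
        sd.read_sector(vm.get_virt_load_addr(vaddr))

image = bytes(range(256)) * 8           # 2KB test image
vm = FakeVM(len(image))
load_image(vm, FakeSD(image), len(image))
assert bytes(vm.mem) == image
```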
Obviously, this could also be used to save the whole VM image to an SD card, for "swapping" whole VM's, or implementing "suspend" and "hibernation" modes
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
@Bill, that looks good, especially for SRAM. It is a small problem for DRAM, as you probably know, and I'll go into that later. ...
... Meanwhile, I think I'll look at adding read-only devices like the serial EEPROM and/or the NAND flash, which would not be flushed out on dirty page swaps and would use very little precious Propeller memory.
Having "non-flush" EEPROM would let me meet one of my goals of running the JVM from EEPROM, especially since I already have a JVM that can work like that, and it will provide another comparison for proving the value and effectiveness of the vmcog cache design.
--
One of the problems with using DRAM is that if the COG running refresh is rebooted,
the data goes away :-( This is not an issue with SRAM obviously.
Of course if Propeller is not completely rebooted, it's not a problem for DRAM.
But how do you partially reboot Propeller? There are a few answers to that question,
but the easy one is not desirable if you're a "cognostic" (cognew -vs- coginit) user.
Still, in this case, I think there is merit in using coginit(7) for the device with a disclaimer.
Maybe in the future the best answer for the DRAM problem while remaining cognostic
is to have read-only vmcog devices such as SD-CARD, EEPROM, or that NAND flash that
can be loaded on demand per cog for the task and then be recycled. I'm not saying that
vmdebug or vmcog has to have the on-demand loader feature [noparse]:)[/noparse], but there can be
some benefit to having the read-only devices.
--
BTW: I also have another low pin count SRAM design that I may want to add later.
Cheers.
--Steve
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Thanks - I thought that would handle loading nicely...
As for DRAM - just have a loop with 128 vm.wrvlong(addr,long)
I may have time to implement READONLY later today; if not, my plan is to change the LUT entries as follows:
' LUT Entry values take the following form:
'
' if 0, page is not present in memory
'
' if not 0, the 32 bits are interpreted as:
'
' CCCCCCCC CCCCCCCC CCCCLWDP PPPPPPPP
'
' where
'
' PPPPPPPPP = hub address: upper 9 bits of an up-to-18-bit address, 000xxxxxx on Prop1
'
' D = Dirty bit - set whenever a write is performed to any byte(s) in the page
'
' L = Locked bit - set whenever the physical memory page is locked into the hub and may not be swapped out
'
' W = Write protected - probably for FLASH / EEPROM, but can be used to write-protect a RAM page as well
'
' CCCC... = access (hit) count, reduced to 20 bits
'
This does reduce the hit access count to 20 bits, but the (untested) shr_hits routine compensates for this.
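A small sketch of packing and unpacking such an entry (the exact bit positions are my assumption, based on the description above: 20-bit hit count on top, then L/W/D flags, then a 9-bit hub page address at the bottom):

```python
# LUT entry bit-packing sketch (assumed layout, not VMCOG's verified code):
#   bits 31..12 = hit count (20 bits), bit 11 = L, bit 10 = W, bit 9 = D,
#   bits 8..0   = hub page address (9 bits).
def pack(count, locked, wprot, dirty, page):
    assert count < 1 << 20 and page < 1 << 9
    return (count << 12) | (locked << 11) | (wprot << 10) | (dirty << 9) | page

def unpack(entry):
    return (entry >> 12,            # hit count
            (entry >> 11) & 1,      # L - locked
            (entry >> 10) & 1,      # W - write protected
            (entry >> 9) & 1,       # D - dirty
            entry & 0x1FF)          # P - hub page address

e = pack(12345, 1, 0, 1, 0x0AB)
assert unpack(e) == (12345, 1, 0, 1, 0x0AB)
assert e < 1 << 32                  # fits in one long
```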
If someone wants to submit some nice, small, fast I2C code (speed-tunable for fast/slow EEPROMs) for BINIT/BREAD/BWRITE/BDATA, there is no reason there can't be an EEPROM VMCOG - however, for such a VMCOG a write-through strategy makes a lot more sense, so BWRITE would have to take a size (1/2/4 bytes), and the DIRTY flag would not be maintained.
For a JVM, using hub memory for the stack and heap, but VM for code would be an interesting mix.
I think there is a place for both. Personally, both Minos and Largos will have a "system" cog, for managing resources/cogs and providing some other system services.
As for reboot - I think you would find that if you did a refresh just before the reboot, the DRAM would survive. I've seen DRAM content survive for seconds without refresh...
As soon as I move to an external "page present" table, there will be enough space left in VMCOG for basic I2C capability - which opens interesting possibilities such as your suggested demand loading of drivers (with reusing cogs after the operation is done).
Actually, right now I am thinking that the low-level SD card stuff (initialize, send command, read status, read sector, write sector) needs to be totally split out from the file system and made mailbox based. This would allow compiling file systems to run out of virtual memory! With Catalina and ZOG, it would also allow running arbitrary file systems written in C.
Nice!
The more, the merrier!
I have a 12 pin CPLD based design that is just waiting for me to have time after UPEW to test and have boards made.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (Bill Henning) : 6/10/2010 4:55:24 PM GMT
That would be excellent. Useful for the Z80, 6502 emulators as well.
By the way, shouldn't the Zogs be converted to use mailboxes as well?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔