I cut the SDRAM code directly into VMCOG, and it passes the heater test. I had to remove lots of optimizations, such as 32-byte bursts, to make it fit, so this version is much slower than using a separate COG: 4 s vs. 3.5 s on the benchmark.
I'll try integrated VMCOG SDRAM with ZOG after lunch. If I can make that work, I'll try to patch the SDRAM Cache code directly to ZOG later.
The latest PropCade version of VMCOG adds two new mailbox messages for reading and writing registers in the MCP23S17, which shares the SPI bus with the SPI RAMs.
The reason for this is that PropCade multiplexes 8 SPI devices onto one SPI bus: six SPI memory chips, the uSD card, and an MCP23S17 used for two Sega joysticks or two eight-bit I/O ports.
With just a bit of Spin code, this lets me read the two Sega joysticks as if they were NES joysticks :-)
The current code supports:
- UP/DOWN/LEFT/RIGHT/START/A/B/C buttons on the Sega
- the "C" button is mapped to the NES "Select" button.
The Sega X/Y/Z buttons (on six-button joysticks) are currently not supported, as getting them working requires PASM code to generate the necessary timing on an output bit.
Did you try it with a few different settings for number of working set pages?
VMDebug fails to start with more than 46 pages. The heater test passes with 1, 2, 10, 19, 32, 40 and 46. Using 1 is funny, but it should work regardless of how much we snicker.
Bill, I got fibo running on zog with vmcog/sdram up to fibo(20). I reversed the "shr_hits" 075->076 changes and the test gets to fibo(23). The LRU algorithm may still have some issues.
Heater, I think I can cut in the SDRAM code. I'll let you know.
Jazzed. Ahh, that answers the question I just put on the Zog thread.
But if using vmcog v075 and 20 pages surely you should get the same OK result as me. Assuming SDRAM access is always working correctly?
Jazzed "I think I can cut in the SDRAM code."
Is it possible we should take a little speed hit and un-inline the memory accesses. e.g. opcode fetch would call read_byte instead of going to VMCOG directly.
This would isolate RAM access to a few functions, save some space and make adding direct hardware access much easier.
I was never inclined to add direct hardware access to Zog, but if you start down that road it will continue for TriBlade, RamBlade, DracBlade etc. That leads either to #ifdef soup or to multiple versions. Or can we set up a way to have include files for the different hardware code?
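To make the un-inlining idea concrete, here is a minimal Python model, not Zog's actual PASM. The class names HubRamBackend and Interpreter are hypothetical and exist only for illustration; only read_byte comes from the post above. The point is that the interpreter touches memory through one small set of functions, so swapping VMCOG for direct hardware access means swapping the backend rather than editing every access site.

```python
class HubRamBackend:
    """Hypothetical stand-in backend; a real one would talk to VMCOG or RAM hardware."""

    def __init__(self, size):
        self.mem = bytearray(size)

    def read_byte(self, addr):
        return self.mem[addr]

    def write_byte(self, addr, value):
        self.mem[addr] = value & 0xFF  # keep only the low 8 bits


class Interpreter:
    """Hypothetical interpreter core that never touches memory directly."""

    def __init__(self, backend):
        self.backend = backend
        self.pc = 0

    def fetch_opcode(self):
        # The only place an opcode fetch reaches memory is through read_byte.
        op = self.backend.read_byte(self.pc)
        self.pc += 1
        return op


backend = HubRamBackend(16)
backend.write_byte(0, 0x1FF)   # stored as $FF
zog = Interpreter(backend)
print(hex(zog.fetch_opcode()))
```

The cost is one extra call per access, which is the "little speed hit" mentioned above.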
I'm trying to port the VMCOG MORPHEUS1 mode to my custom Hydra SPI SRAM card, which has two 23K256 chips on it with separate chip select pins but common SI, SO, and CLK pins. I've tested this board using a simple SPI driver written in Spin and it seems to work, but I have problems when I run it with VMCOG. The only changes I've made to the code from the vmdebug-bst-archive-100826-163205.zip file are to change the PLL and clock to match the Hydra:
_clkmode = xtal1 + pll8x
_xinfreq = 10_000_000
And to change the pin assignments for the SPI SRAM chips:
cs long 1<<19
clk long 1<<17
mosi long 1<<16
miso long 1<<18
cs_clk long (1<<19)|(1<<17)
clk_mosi long (1<<17)|(1<<16)
cs2 long 1<<20
clk2 long 1<<17
mosi2 long 1<<16
miso2 long 1<<18
cs2_clk2 long (1<<20)|(1<<17)
clk2_mosi2 long (1<<17)|(1<<16)
My board uses the following pin assignments:
SI = P16
SCK = P17
SO = P18
CS = P19
CS2 = P20
Shouldn't that be all I have to do to get this to work? If I try using the 'f' command in vmdebug and then dump page 0 all I get is lots of $1818 words. Any idea what might be going wrong?
I will try to wire up chips with your pinout tomorrow. Unfortunately my uncle is in emergency, so I was tied up all day yesterday, and will still be busy today.
I found one problem. I hadn't updated the variable spidir to match my pins. In order to make it easier to change pin assignments I made the following changes to vmcog.spin. Unfortunately, setting the spidir variable didn't fix my problem. I still get all $1818 values when I try to fill memory using the 'f' command.
#ifdef MORPHEUS1
dv long 0 ' device address, between 0 and 7, however 6&7 are not valid
bits long 0
read long $03000000 ' read command
write long $02000000 ' write command
ramseq long $01400000 ' %00000001_01000000 << 16 ' set sequential mode
readstat long $05000000 ' read status
pagesiz long 128 ' in longs
spidir long (1<<CS_PIN)|(1<<CLK_PIN)|(1<<MOSI_PIN)|(1<<CS2_PIN)|(1<<CLK2_PIN)|(1<<MOSI2_PIN)
pdata long 0
offs_mask long $7FFF
bit16 long $8000
chip1 mov tcs,cs
mov tclk,clk
mov tmosi,mosi
mov tmiso,miso
mov tcs_clk,cs_clk
mov tclk_mosi,clk_mosi
chip1_ret ret
chip2 mov tcs,cs2
mov tclk,clk2
mov tmosi,mosi2
mov tmiso,miso2
mov tcs_clk,cs2_clk2
mov tclk_mosi,clk2_mosi2
chip2_ret ret
cs long 1<<CS_PIN
clk long 1<<CLK_PIN
mosi long 1<<MOSI_PIN
miso long 1<<MISO_PIN
cs_clk long (1<<CS_PIN)|(1<<CLK_PIN)
clk_mosi long (1<<CLK_PIN)|(1<<MOSI_PIN)
cs2 long 1<<CS2_PIN
clk2 long 1<<CLK2_PIN
mosi2 long 1<<MOSI2_PIN
miso2 long 1<<MISO2_PIN
cs2_clk2 long (1<<CS2_PIN)|(1<<CLK2_PIN)
clk2_mosi2 long (1<<CLK2_PIN)|(1<<MOSI2_PIN)
tcs long 0
tclk long 0
tmosi long 0
tmiso long 0
tcs_clk long 0
tclk_mosi long 0
#endif
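As a quick cross-check of the masks above, here is a small Python sketch that rebuilds spidir from the pin numbers given earlier in this post (SI = P16, SCK = P17, SO = P18, CS = P19, CS2 = P20). The helper name mask is hypothetical. It assumes, as in the spidir line above, that only the chip selects, clock and MOSI are driven as outputs and that MISO stays an input.

```python
# Pin numbers from the post; MOSI is the SRAM's SI pin, MISO is its SO pin.
PINS = {"MOSI": 16, "CLK": 17, "MISO": 18, "CS": 19, "CS2": 20}


def mask(*names):
    """OR together the single-bit masks for the named pins."""
    m = 0
    for name in names:
        m |= 1 << PINS[name]
    return m


# Both chips share CLK and MOSI, so each shared pin appears only once in the mask.
spidir = mask("CS", "CLK", "MOSI", "CS2")

# MISO must not be in the direction mask, or the Propeller would fight the SRAM's output.
assert spidir & mask("MISO") == 0

print(hex(spidir))
```

If the value written to DIRA by the driver differs from this, the pin constants are not reaching the code that sets the directions.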
Okay, now I'm completely confused. I pulled the SPI SRAM code out of VMCOG and wrote a simple test program to see if my Hydra SPI SRAM card would work with it. To simplify my testing I changed the page size to 64 but otherwise I'm running the code from VMCOG unchanged and it seems to work just fine with my Hydra SPI SRAM card. I have no idea why it doesn't work with the vmdebug test program. I'll have to try to understand more of the VMCOG code to see if I can figure it out. In the meantime, I've attached my SPI SRAM test program.
Btw, I'd love it if you went through the VMCOG code - there may be a bug lurking, that you may find while understanding it, as Fibo under Zog crashes with some working set sizes. I can't seem to find it, even after looking hundreds of times.
Want to hear something else confusing? I've merged my preliminary (slow) FlexMem drivers into VMCOG, and:
- writes to status register don't work
- reads of status register work
- reads of memory (one long at a time) work
- writes to memory (one long at a time) don't work
- can't read/write pages at a time until writes to status register work as it needs setting sequential mode
(the 23K256 does not have /WP pin, so that can't be it)
And it is basically the same code that works for PropCade and Morpheus1!!!!
Even worse, a scope shows clean signals on all pins, and ViewPort shows correct waveforms in LSA mode!
The good news is that I've sent off 4 of the PCBs I've shown at UPEW to production, so I can concentrate on VMCOG now for a few days.
Is there a description of your new boards posted somewhere? I had thought about buying Morpheus but somehow I thought there was a new version coming out so I decided to wait. Are these new boards you're talking about new versions of Morpheus and Mem+?
Hmmm... working fine outside of VMCOG implies that memory within VMCOG is getting corrupted.
One of the many self-modifying indirect stores may be going wild... best bet would be within the BUSERR handling, when it updates the TLB - if it somehow computed a bad cog address, that could easily clobber code within the cog, thus explaining the behavior you report, and the problem with ZOG!
I will look it over tonight but I'll warn you that I'm far from an expert Spin/PASM programmer as you can probably tell from the code I wrote in my SPI SRAM test. Any good code in there was probably stolen from either you or Andre' LaMothe. :-)
The other board that went to production is 485Plug, described on p.13 of the Morpheus thread.
I have 12 other boards going into production over the next month or two, including the high-end Morpheus+ / Mem* combination, and the mysterious "PLC-G"... along with a ton of industrial I/O modules for my boards. If you read the Morpheus thread starting at p.12, I briefly described all the new boards there except for PLC-G.
Every extra pair of eyeballs is MUCH appreciated - I figure I am too close to the code, and know too well how it "should" work, thus I might be missing something basic!
I've been reading through the VMCOG code trying to understand it and I have a general question about the behavior of the Propeller hub access instructions. Do the RDBYTE/RDWORD/RDLONG and WRBYTE/WRWORD/WRLONG instructions ignore all but the low order 16 bits of their source operands? In other words is RDLONG foo,$1000 interpreted the same as RDLONG foo,$ffff1000?
Okay, I'm going to try my hand at offering a suggestion. I think the following code:
shr_hits ' walk through TLB, divide all non-zero hit counts by two
movs jx,#0 ' finding candidate page to sacrifice
forj
jx mov tlbi,0-0 wz
if_z jmp #nextj
movd updtc,jx
mov temp,tlbi
andn temp,elevenbits
shr tlbi,#1
andn tlbi,elevenbits
or tlbi,temp
updtc mov 0-0,tlbi
' next ix
nextj add jx, #1
and jx, #128 nr, wz
if_z jmp #forj
shr_hits_ret ret
elevenbits long $07FF
Could be changed to this:
shr_hits ' walk through TLB, divide all non-zero hit counts by two
movs jx,#0 ' finding candidate page to sacrifice
forj
jx mov tlbi,0-0 wz
if_z jmp #nextj
movd updtc,jx
mov temp,tlbi
' don't need to mask out the high bits here because we do it below
shr tlbi,#1
andn tlbi,elevenbits
' need to mask out the current count before combining with the updated count
and tlbi,elevenbits
or tlbi,temp
updtc mov 0-0,tlbi
' next ix
nextj add jx, #1
and jx, #128 nr, wz
if_z jmp #forj
shr_hits_ret ret
elevenbits long $07FF
There is no need to mask the count twice, once before and once after the right shift. On the other hand, we do need to mask out the current count before ORing with the new count. Otherwise we get the combination of both sets of bits.
Also, this subroutine is entered when a count overflows. That means that the entry pointed to by vmpage has a zero count. Nothing is done in this code to adjust that. I'm not sure if a zero count will cause any problems but it will make the most frequently accessed page appear to be the least frequently accessed page. Another possible problem is that the entire entry could be zero if it happens to be pointing to the first page in hub RAM and the DIRTY and LOCK bits are also clear. This would make it look like the page wasn't in the cache.
Actually, you found a bug - there is a need to mask twice, but the first one should have been "and" not andn!
The "and" was to preserve the hub page allocated to that VM page, and it was being lost. This is a serious bug that I would have noticed had I not had my nose buried in the code too long... The extra 'n' (i.e. andn instead of and) was the culprit; not fixing the hit count would just have caused a performance hit.
I think there is an excellent chance that this is the cause of the fibo() problems, as it would clear the pointer to the page in the working set when the hit count overflowed - thus clobbering the lowest 512 bytes in memory!
As you noticed, I forgot to put in a fixed count for the count that wrapped; I added code to do that as well.
The routine below should work now, and I would not be at all surprised if it fixes the fibo() problem.
Frankly, I would not be surprised if this measurably improves performance.
'----------------------------------------------------------------------------------------------------
'
' SHR_HITS - divide all valid hit counts by two, called when a hit count would wrap around
'
' NOTE: shr_hits is not debugged yet!
'
'----------------------------------------------------------------------------------------------------
shr_hits ' walk through TLB, divide all non-zero hit counts by two
movs jx,#0 ' finding candidate page to sacrifice
movd fixup,vmpage
forj
jx mov tlbi,0-0 wz
if_z jmp #nextj
movd updtc,jx
mov temp,tlbi
and temp,elevenbits
shr tlbi,#1
andn tlbi,elevenbits
or tlbi,temp
updtc mov 0-0,tlbi
' next ix
nextj add jx, #1
and jx, #128 nr, wz
if_z jmp #forj
' fix overflow count, give it half-count
fixup or 0-0,halfcount
shr_hits_ret ret
elevenbits long $07FF
halfcount long $80000000
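To see what the andn/and mix-up does to a TLB entry, here is a small Python model of the per-entry update. It is a sketch only: it assumes the layout implied by the elevenbits mask (low 11 bits hold the hub page pointer and flag bits, the upper bits hold the hit count), and the function names are made up for illustration.

```python
ELEVEN_BITS = 0x7FF        # elevenbits: page pointer + flags (assumed layout)
MASK32 = 0xFFFFFFFF        # cog registers are 32 bits wide
HALF_COUNT = 0x80000000    # halfcount: restores a count that wrapped to zero


def shift_entry_buggy(entry):
    """Original code: 'andn temp,elevenbits' keeps the OLD count, drops the page bits."""
    temp = entry & ~ELEVEN_BITS & MASK32
    entry = (entry >> 1) & ~ELEVEN_BITS & MASK32
    return entry | temp


def shift_entry_fixed(entry):
    """Fixed code: 'and temp,elevenbits' keeps the page bits, count is halved."""
    temp = entry & ELEVEN_BITS
    entry = (entry >> 1) & ~ELEVEN_BITS & MASK32
    return entry | temp


def shr_hits(tlb, overflowed_index):
    """Halve every non-zero entry's count, then give the wrapped entry a half count."""
    out = [shift_entry_fixed(e) if e else 0 for e in tlb]
    out[overflowed_index] |= HALF_COUNT
    return out


entry = (100 << 11) | 0x2A5                    # count 100, page/flag bits $2A5
print(hex(shift_entry_buggy(entry)))           # page bits are gone
print(hex(shift_entry_fixed(entry)))           # page bits kept, count halved
```

In the buggy version the low 11 bits come back as zero, which matches the description of the page pointer being cleared and the lowest 512 bytes being clobbered.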
Sorry, I guess there was a bug in my bug fix! I failed to notice that you were shifting the original value not the one you had just ANDNed. I'm glad you caught my error before releasing your fix.
This is worse. Depending on the number of pages I have (I tried 8, 10, 20) it either hangs up around fibo(21) or continues a few more fibos with wrong results and then hangs up.
The heater test in my old vmdebug works OK though.
I cannot compile the new vmdebug, BST is complaining about not finding hex method in FullDuplexSerialPlus. No idea why.
Bill, can you take the TRIBLADE_2 sections from the attached VMCog? It is your last 0.981 version + TRIBLADE_2.
Comments
I'll try integrated VMCOG SDRAM with ZOG after lunch. If I can make that work, I'll try to patch the SDRAM Cache code directly to ZOG later.
--Steve
Did you try it with a few different settings for number of working set pages?
The latest PropCade version of VMCOG adds two new mailbox messages for reading and writing registers in the MCP23S17, which shares the SPI bus with the SPI RAMs.
The reason for this is that PropCade multiplexes 8 SPI devices onto one SPI bus: six SPI memory chips, the uSD card, and an MCP23S17 used for two Sega joysticks or two eight-bit I/O ports.
With just a bit of Spin code, this lets me read the two Sega joysticks as if they were NES joysticks :-)
The current code supports:
- UP/DOWN/LEFT/RIGHT/START/A/B/C buttons on the Sega
- the "C" button is mapped to the NES "Select" button.
The Sega X/Y/Z buttons (on six button joysticks) are currently not supported, as it requires PASM code to generate the necessary timing on an output bit to get that working.
I plan to release a generic MCP23S17 object RSN.
Now if you could only support 32MB
46 pages = 23k, so it makes sense that VMDebug would fail - it would be getting clobbered
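The arithmetic behind that, as a short Python sketch. It assumes 512-byte pages (pagesiz is 128 longs) and the Propeller's 32 KB of hub RAM; the function names are illustrative only.

```python
PAGE_BYTES = 128 * 4          # pagesiz = 128 longs
HUB_RAM_BYTES = 32 * 1024     # Propeller 1 hub RAM


def working_set_bytes(pages):
    """Hub RAM consumed by the VMCOG working set."""
    return pages * PAGE_BYTES


def hub_bytes_left(pages):
    """What remains for VMDebug, its stack, and every other object."""
    return HUB_RAM_BYTES - working_set_bytes(pages)


for pages in (1, 20, 46, 47):
    print(pages, working_set_bytes(pages), hub_bytes_left(pages))
```

At 46 pages the working set takes 23552 bytes and leaves 9216 for everything else, so one more page is plausibly enough to run into the program itself.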
I can support 32MB, but it will cause a performance hit any way I do it.
The "easiest" way is to do a simple direct mapped cache approach, however this will lead to some thrashing.
Second easiest is a two-way associative scheme, I think there is room in VMCOG for that.
Third is a four way associative scheme, however that will require a minimum 2KB table in the hub.
I DO NOT want to get into multi-level page tables, I am certain the performance would be very poor.
An interesting option would be to support say a 2MB VM, but make 30MB available as a very high speed disk...
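For readers unfamiliar with the trade-off, here is a toy Python model of the direct mapped option. It is not VMCOG code: the class name and the slot count are made up, and 512-byte pages are assumed. Each VM page can live in exactly one slot, so lookup is a single index calculation, but two hot pages that share a slot evict each other on every access.

```python
PAGE_BYTES = 512


class DirectMappedCache:
    """Toy direct mapped page cache: VM page N can only live in slot N mod slots."""

    def __init__(self, slots):
        self.slots = slots
        self.tags = [None] * slots   # which VM page each slot currently holds
        self.misses = 0

    def access(self, vaddr):
        vpage = vaddr // PAGE_BYTES
        slot = vpage % self.slots
        if self.tags[slot] != vpage:
            self.misses += 1          # page fault: evict and reload this slot
            self.tags[slot] = vpage
        # Address inside the working set where the byte now lives.
        return slot * PAGE_BYTES + vaddr % PAGE_BYTES


cache = DirectMappedCache(slots=16)
# VM pages 0 and 16 both map to slot 0, so alternating between them always misses.
for vaddr in (0, 16 * PAGE_BYTES, 0, 16 * PAGE_BYTES):
    cache.access(vaddr)
print(cache.misses)
```

A two-way scheme would let both of those pages stay resident, at the cost of checking two tags per access.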
Wow. How much space do you need? Operating with VMCog there are only 10 LONGs left in Zog.
I'm sure 20 or more LONGs can be recovered by recycling some init code for variables.
If I had had adding direct RAM access to Zog in mind, I would not have inlined it so much.
Heater, I think I can cut in the SDRAM code. I'll let you know.
But if using vmcog v075 and 20 pages surely you should get the same OK result as me. Assuming SDRAM access is always working correctly?
Jazzed "I think I can cut in the SDRAM code."
Is it possible we should take a little speed hit and un-inline the memory accesses. e.g. opcode fetch would call read_byte instead of going to VMCOG directly.
This would isolate RAM access to a few functions, save some space and make adding direct hardware access much easier.
I was never inclined to add direct hardware access to Zog, but if you start down that road it will continue for TriBlade, RamBlade, DracBlade etc. That leads either to #ifdef soup or to multiple versions. Or can we set up a way to have include files for the different hardware code?
I almost have VMCOG running with two chips (on separate SPI 4-wire ports) on Morpheus CPU1.
One bug left to squish then I will upload a new version.
After that:
FlexMem driver for VMCOG!
(After I test the IR in/out on the rev2 pcb, I will add FlexMem support to VMCOG)
And to change the pin assignments for the SPI SRAM chips:
My board uses the following pin assignments:
SI = P16
SCK = P17
SO = P18
CS = P19
CS2 = P20
Shouldn't that be all I have to do to get this to work? If I try using the 'f' command in vmdebug and then dump page 0 all I get is lots of $1818 words. Any idea what might be going wrong?
Thanks!
David
That should work...
I will try to wire up chips with your pinout tomorrow. Unfortunately my uncle is in emergency, so I was tied up all day yesterday, and will still be busy today.
Regards,
Bill
Changed in the CON section:
Changed in the DAT section:
Btw, I'd love it if you went through the VMCOG code - there may be a bug lurking, that you may find while understanding it, as Fibo under Zog crashes with some working set sizes. I can't seem to find it, even after looking hundreds of times.
Want to hear something else confusing? I've merged my preliminary (slow) FlexMem drivers into VMCOG, and:
- writes to status register don't work
- reads of status register work
- reads of memory (one long at a time) work
- writes to memory (one long at a time) don't work
- can't read/write pages at a time until writes to status register work as it needs setting sequential mode
(the 23K256 does not have /WP pin, so that can't be it)
And it is basically the same code that works for PropCade and Morpheus1!!!!
Even worse, a scope shows clean signals on all pins, and ViewPort shows correct waveforms in LSA mode!
The good news is that I've sent off 4 of the PCBs I've shown at UPEW to production, so I can concentrate on VMCOG now for a few days.
One of the many self-modifying indirect stores may be going wild... best bet would be within the BUSERR handling, when it updates the TLB - if it somehow computed a bad cog address, that could easily clobber code within the cog, thus explaining the behavior you report, and the problem with ZOG!
Morpheus (pcb rev 2) and Mem+ (pcb rev 2) are described towards the end of p.12 in the Morpheus thread:
http://forums.parallax.com/showthread.php?t=113929
I think you'd like the Morpheus Developer's Guide on my downloads page, as it explains the architecture. There is also a page on it on the site.
The Developer's Guide applies to rev.2 pcb's as well, but I will have to add a couple of pages for the new IR features.
PropCade is described in its own thread at:
http://forums.parallax.com/showthread.php?t=121315
The other board that went to production is 485Plug, described on p.13 of the Morpheus thread.
I have 12 other boards going into production over the next month or two, including the high-end Morpheus+ / Mem* combination, and the mysterious "PLC-G"... along with a ton of industrial I/O modules for my boards. If you read the Morpheus thread starting at p.12, I briefly described all the new boards there except for PLC-G.
RDLONG also ignores the two lowest bits
RDWORD ignores the lowest bit
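Put together, the addressing rule can be modelled in a few lines of Python. This is a sketch of the behaviour described in the two lines above plus the assumption (which those replies imply but do not state outright) that only the low 16 bits of the address are decoded; the function name is hypothetical.

```python
HUB_ADDR_MASK = 0xFFFF   # assumption: upper 16 bits of the address operand are ignored


def effective_hub_address(addr, size):
    """Model the address actually used by RDBYTE (1), RDWORD (2) or RDLONG (4)."""
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4 bytes")
    # Drop the high bits, then force alignment to the access size.
    return addr & HUB_ADDR_MASK & ~(size - 1)


print(hex(effective_hub_address(0xFFFF1000, 4)))  # same long as $1000
print(hex(effective_hub_address(0x1003, 4)))      # two lowest bits ignored
print(hex(effective_hub_address(0x1003, 2)))      # lowest bit ignored
```

Under this model RDLONG foo,$1000 and RDLONG foo,$ffff1000 read the same long, and the same masking applies to the WR instructions.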
Could be changed to this:
There is no need to mask the count twice, once before and once after the right shift. On the other hand, we do need to mask out the current count before ORing with the new count. Otherwise we get the combination of both sets of bits.
Also, this subroutine is entered when a count overflows. That means that the entry pointed to by vmpage has a zero count. Nothing is done in this code to adjust that. I'm not sure if a zero count will cause any problems but it will make the most frequently accessed page appear to be the least frequently accessed page. Another possible problem is that the entire entry could be zero if it happens to be pointing to the first page in hub RAM and the DIRTY and LOCK bits are also clear. This would make it look like the page wasn't in the cache.
Actually, you found a bug - there is a need to mask twice, but the first one should have been "and" not andn!
The "and" was to preserve the hub page allocated to that VM page, and it was being lost. This is a serious bug that I would have noticed had I not had my nose buried in the code too long... The extra 'n' (i.e. andn instead of and) was the culprit; not fixing the hit count would just have caused a performance hit.
I think there is an excellent chance that this is the cause of the fibo() problems, as it would clear the pointer to the page in the working set when the hit count overflowed - thus clobbering the lowest 512 bytes in memory!
As you noticed, I forgot to put in a fixed count for the count that wrapped; I added code to do that as well.
The routine below should work now, and I would not be at all surprised if it fixes the fibo() problem.
Frankly, I would not be surprised if this measurably improves performance.
THANK YOU!
I think this may very well fix the fibo under ZOG under VMCOG issue, and run a bit faster to boot
This is worse. Depending on the number of pages I have (I tried 8, 10, 20) it either hangs up around fibo(21) or continues a few more fibos with wrong results and then hangs up.
The heater test in my old vmdebug works OK though.
I cannot compile the new vmdebug, BST is complaining about not finding hex method in FullDuplexSerialPlus. No idea why.
Bill, can you take the TRIBLADE_2 sections from the attached VMCog? It is your last 0.981 version + TRIBLADE_2.