VMCOG: Virtual Memory for ZiCog, Zog & more (VMCOG 0.976: PropCade,TriBlade_2,HYDRA HX512,XEDODRAM) - Page 12 — Parallax Forums



Comments

  • jazzedjazzed Posts: 11,803
    edited 2010-08-18 11:22
    I cut the SDRAM code directly into VMCOG, and it passes the heater test. I had to remove lots of optimizations, such as 32-byte bursts, to make it fit, so this version is much slower than using a separate COG: 4s vs. 3.5s on the benchmark.

    I'll try integrated VMCOG SDRAM with ZOG after lunch. If I can make that work, I'll try to patch the SDRAM Cache code directly to ZOG later.

    --Steve
  • Bill HenningBill Henning Posts: 6,445
    edited 2010-08-18 11:29
    Sounds good!

    Did you try it with a few different settings for number of working set pages?
    jazzed wrote: »
    I cut the SDRAM code directly into VMCOG, and it passes the heater test. I had to remove lots of optimizations, such as 32-byte bursts, to make it fit, so this version is much slower than using a separate COG: 4s vs. 3.5s on the benchmark.

    I'll try integrated VMCOG SDRAM with ZOG after lunch. If I can make that work, I'll try to patch the SDRAM Cache code directly to ZOG later.

    --Steve
  • Bill HenningBill Henning Posts: 6,445
    edited 2010-08-18 11:37
    Just a quick update...

    The latest PropCade version of VMCOG adds two new mailbox messages for reading and writing registers in the MCP23S17 that shares the SPI bus with the SPI RAMs.

    The reason for this is that PropCade multiplexes 8 SPI devices onto one SPI bus: six SPI memory chips, the uSD card, and an MCP23S17 used for two Sega joysticks or two eight-bit I/O ports.

    With just a bit of Spin code, this lets me read the two Sega joysticks as if they were NES joysticks :-)

    The current code supports:

    - UP/DOWN/LEFT/RIGHT/START/A/B/C buttons on the Sega
    - the "C" button is mapped to the NES "Select" button.

    The Sega X/Y/Z buttons (on six-button joysticks) are currently not supported, as they require PASM code to generate the necessary timing on an output bit.

    I plan to release a generic MCP23S17 object RSN.
  • jazzedjazzed Posts: 11,803
    edited 2010-08-18 11:44
    Bill Henning wrote: »
    Sounds good!

    Did you try it with a few different settings for number of working set pages?
    VMDebug fails to start with more than 46 pages. The heater test passes with 1, 2, 10, 19, 32, 40 and 46. Using 1 is funny, but it should work regardless of how much we snicker :)

    Now if you could only support 32MB :)
  • Bill HenningBill Henning Posts: 6,445
    edited 2010-08-18 12:18
    Thanks for the results!

    46 pages = 23k, so it makes sense that VMDebug would fail - it would be getting clobbered :)
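The arithmetic behind that figure can be checked with a small host-side sketch. It assumes the 512-byte page size implied elsewhere in the thread (`pagesiz` of 128 longs) and the Propeller 1's 32 KB of hub RAM; the function name is made up for illustration and is not part of VMCOG.

```python
# Hypothetical helper, not VMCOG code: how much hub RAM a working set occupies.
PAGE_BYTES = 128 * 4          # assumed: 128 longs per page, 4 bytes per long
HUB_RAM_BYTES = 32 * 1024     # Propeller 1 hub RAM


def working_set_bytes(pages: int) -> int:
    """Return the hub RAM consumed by `pages` working-set pages."""
    return pages * PAGE_BYTES


if __name__ == "__main__":
    for pages in (1, 20, 46):
        used = working_set_bytes(pages)
        left = HUB_RAM_BYTES - used
        print(f"{pages:2d} pages: {used:5d} bytes used, {left:5d} bytes left")
```

With 46 pages only 9 KB of hub RAM remains for everything else, which is consistent with VMDebug being overwritten beyond that point.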

    I can support 32MB, but it will cause a performance hit any way I do it.

    The "easiest" way is a simple direct-mapped cache approach; however, this will lead to some thrashing.

    Second easiest is a two-way associative scheme, I think there is room in VMCOG for that.

    Third is a four way associative scheme, however that will require a minimum 2KB table in the hub.

    I DO NOT want to get into multi-level page tables, I am certain the performance would be very poor.

    An interesting option would be to support say a 2MB VM, but make 30MB available as a very high speed disk...
    jazzed wrote: »
    VMDebug fails to start with more than 46 pages. The heater test passes with 1, 2, 10, 19, 32, 40 and 46. Using 1 is funny, but it should work regardless of how much we snicker :)

    Now if you could only support 32MB :)
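To illustrate the thrashing concern with the direct-mapped option, here is a minimal host-side model. It is only a sketch under assumptions: 512-byte pages, a slot chosen as page number modulo the slot count, and a class name invented for this example. It does not reflect how VMCOG would actually implement the scheme.

```python
# Illustrative model of a direct-mapped page cache (hypothetical, not VMCOG code).
PAGE_BYTES = 512  # assumed page size


class DirectMappedCache:
    def __init__(self, slots: int) -> None:
        self.slots = slots
        self.tags = [None] * slots   # which virtual page each slot holds
        self.misses = 0

    def access(self, addr: int) -> bool:
        """Touch a virtual address; return True on a hit, False on a miss."""
        page = addr // PAGE_BYTES
        slot = page % self.slots     # each page can live in exactly one slot
        if self.tags[slot] == page:
            return True
        self.tags[slot] = page       # evict whatever was there
        self.misses += 1
        return False


def count_misses(slots: int, addrs) -> int:
    cache = DirectMappedCache(slots)
    for addr in addrs:
        cache.access(addr)
    return cache.misses


if __name__ == "__main__":
    slots = 16
    friendly = [0, PAGE_BYTES] * 10             # two pages, two different slots
    conflicting = [0, slots * PAGE_BYTES] * 10  # two pages, same slot
    print("friendly misses:   ", count_misses(slots, friendly))
    print("conflicting misses:", count_misses(slots, conflicting))
```

Two pages that map to the same slot evict each other on every access even though the other slots sit idle. A two-way associative scheme would let both pages stay resident at the cost of a second comparison per lookup.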
  • Heater.Heater. Posts: 21,230
    edited 2010-08-18 13:29
    Jazzed:
    I'll try to patch the SDRAM Cache code directly to ZOG later.

    Wow. How much space do you need? Operating with VMCog, there are only 10 LONGs left in Zog.

    I'm sure 20 or more LONGs can be recovered by recycling some init code for variables.

    If I had had adding direct access to RAM into Zog in mind, I would not have inlined it so much.
  • jazzedjazzed Posts: 11,803
    edited 2010-08-18 14:35
    Bill, I got fibo running on zog with vmcog/sdram up to fibo(20). I reversed the "shr_hits" 075->076 changes and the test gets to fibo(23). The LRU algorithm may still have some issues.

    Heater, I think I can cut in the SDRAM code. I'll let you know.
  • Heater.Heater. Posts: 21,230
    edited 2010-08-18 22:03
    Jazzed. Ahh, that answers the question I just put on the Zog thread.

    But if using vmcog v075 and 20 pages surely you should get the same OK result as me. Assuming SDRAM access is always working correctly?

    Jazzed "I think I can cut in the SDRAM code."

    Is it possible we should take a little speed hit and un-inline the memory accesses. e.g. opcode fetch would call read_byte instead of going to VMCOG directly.

    This would isolate RAM access to a few functions, save some space and make adding direct hardware access much easier.

    I was never inclined to add direct hardware access to Zog, but if you start down that road it will continue for TriBlade, RamBlade, DracBlade etc. That either leads to #ifdef soup or multiple versions. Or can we set up a way to have include files for different hardware code?
  • Bill HenningBill Henning Posts: 6,445
    edited 2010-08-19 08:34
    Thanks, good clue
    jazzed wrote: »
    Bill, I got fibo running on zog with vmcog/sdram up to fibo(20). I reversed the "shr_hits" 075->076 changes and the test gets to fibo(23). The LRU algorithm may still have some issues.

    Heater, I think I can cut in the SDRAM code. I'll let you know.
  • Bill HenningBill Henning Posts: 6,445
    edited 2010-08-26 13:46
    UPDATE

    I almost have VMCOG running with two chips (on separate SPI 4-wire ports) running on Morpheus CPU1

    One bug left to squish then I will upload a new version.

    After that:

    FlexMem driver for VMCOG!
  • Bill HenningBill Henning Posts: 6,445
    edited 2010-08-26 16:33
    VMCOG now runs on Morpheus CPU#1 :)

    (After I test the IR in/out on the rev2 pcb, I will add FlexMem support to VMCOG)
  • Bill HenningBill Henning Posts: 6,445
    edited 2010-08-26 16:39
    New archive with Morpheus CPU#1 support uploaded into the first post!
  • David BetzDavid Betz Posts: 14,516
    edited 2010-08-28 20:05
    I'm trying to port the VMCOG MORPHEUS1 mode to my custom Hydra SRAM card that has two 23K256 chips on it with separate chip select pins but common SI, SO, and CLK pins. I've tested this board using a simple SPI driver written in Spin and it seems to work, but I have problems when I run it with VMCOG. The only changes I've made to the code from the vmdebug-bst-archive-100826-163205.zip file are to change the PLL and clock to match the Hydra:
      _clkmode          = xtal1 + pll8x
      _xinfreq          = 10_000_000
    

    And to change the pin assignments for the SRAM chips:
    cs      long  1<<19
    clk     long  1<<17
    mosi    long  1<<16
    miso    long  1<<18
    cs_clk  long  (1<<19)|(1<<17)
    clk_mosi long (1<<17)|(1<<16)
    
    cs2     long  1<<20
    clk2    long  1<<17
    mosi2   long  1<<16
    miso2   long  1<<18
    cs2_clk2  long  (1<<20)|(1<<17)
    clk2_mosi2 long (1<<17)|(1<<16)
    

    My board uses the following pin assignments:

    SI = P16
    SCK = P17
    SO = P18
    CS = P19
    CS2 = P20

    Shouldn't that be all I have to do to get this to work? If I try using the 'f' command in vmdebug and then dump page 0 all I get is lots of $1818 words. Any idea what might be going wrong?

    Thanks!
    David
  • Bill HenningBill Henning Posts: 6,445
    edited 2010-08-29 11:24
    Hi David,

    That should work...

    I will try to wire up chips with your pinout tomorrow. Unfortunately my uncle is in emergency, so I was tied up all day yesterday, and will still be busy today.

    Regards,

    Bill
  • David BetzDavid Betz Posts: 14,516
    edited 2010-08-29 18:33
    I found one problem. I hadn't updated the variable spidir to match my pins. In order to make it easier to change pin assignments I made the following changes to vmcog.spin. Unfortunately, setting the spidir variable didn't fix my problem. I still get all $1818 values when I try to fill memory using the 'f' command.

    Changed in the CON section:
    #ifdef MORPHEUS1
    
      CS_PIN		= 19
      CLK_PIN		= 17
      MOSI_PIN		= 16
      MISO_PIN		= 18
      
      CS2_PIN		= 20
      CLK2_PIN		= 17
      MOSI2_PIN		= 16
      MISO2_PIN		= 18
      
      READSTATUS    = 140 ' Read SPI RAM status register
    
      PIOREAD       = 141
      PIOWRITE      = 142
    
      '--------------------------------------------------------------------------------------------------
      ' PIO commands
      '--------------------------------------------------------------------------------------------------
    
      PIOREADK      = %0100_000_1_00000000
      PIOWRITEK     = %0100_000_0_00000000
    
    #endif
    

    Changed in the DAT section:
    #ifdef MORPHEUS1
    dv        long  0               ' device address, between 0 and 7, however 6&7 are not valid
    bits      long  0
    
    read      long  $03000000       ' read command
    write     long  $02000000       ' write command
    ramseq    long  $01400000       ' %00000001_01000000 << 16 ' set sequential mode
    readstat  long  $05000000       ' read status
    
    pagesiz   long 128              ' in longs
    
    spidir    long  (1<<CS_PIN)|(1<<CLK_PIN)|(1<<MOSI_PIN)|(1<<CS2_PIN)|(1<<CLK2_PIN)|(1<<MOSI2_PIN)
    
    pdata     long  0
    
    offs_mask long $7FFF
    
    bit16     long $8000
    
    chip1   mov   tcs,cs
            mov   tclk,clk
            mov   tmosi,mosi
            mov   tmiso,miso
            mov   tcs_clk,cs_clk
            mov   tclk_mosi,clk_mosi
    chip1_ret ret
    
    chip2   mov   tcs,cs2
            mov   tclk,clk2
            mov   tmosi,mosi2
            mov   tmiso,miso2
            mov   tcs_clk,cs2_clk2
            mov   tclk_mosi,clk2_mosi2
    chip2_ret ret
    
    cs      long  1<<CS_PIN
    clk     long  1<<CLK_PIN
    mosi    long  1<<MOSI_PIN
    miso    long  1<<MISO_PIN
    cs_clk  long  (1<<CS_PIN)|(1<<CLK_PIN)
    clk_mosi long (1<<CLK_PIN)|(1<<MOSI_PIN)
    
    cs2     long  1<<CS2_PIN
    clk2    long  1<<CLK2_PIN
    mosi2   long  1<<MOSI2_PIN
    miso2   long  1<<MISO2_PIN
    cs2_clk2  long  (1<<CS2_PIN)|(1<<CLK2_PIN)
    clk2_mosi2 long (1<<CLK2_PIN)|(1<<MOSI2_PIN)
    
    tcs     long 0
    tclk    long 0
    tmosi   long 0
    tmiso   long 0
    tcs_clk long 0
    tclk_mosi long 0
    
    #endif
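As a cross-check on the masks those constants produce, the following host-side sketch recomputes them from the pin numbers posted above. The dictionary and helper are invented for this example; only the pin numbers and the `spidir` expression come from the post.

```python
# Hypothetical host-side check of the pin masks (not part of vmcog.spin).
PINS = {
    "CS": 19, "CLK": 17, "MOSI": 16, "MISO": 18,
    "CS2": 20, "CLK2": 17, "MOSI2": 16, "MISO2": 18,
}


def mask(*names: str) -> int:
    """OR together the single-bit masks of the named pins."""
    result = 0
    for name in names:
        result |= 1 << PINS[name]
    return result


# Same expression as the spidir line in the DAT section: outputs only.
spidir = mask("CS", "CLK", "MOSI", "CS2", "CLK2", "MOSI2")

if __name__ == "__main__":
    print(f"spidir     = ${spidir:08X}")
    print(f"cs_clk     = ${mask('CS', 'CLK'):08X}")
    print(f"cs2_clk2   = ${mask('CS2', 'CLK2'):08X}")
    print(f"MISO in spidir: {bool(spidir & mask('MISO'))}")
```

Because both chips share CLK and MOSI, the duplicate terms collapse and `spidir` covers P16, P17, P19 and P20, leaving the shared MISO pin (P18) as an input.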
    
  • David BetzDavid Betz Posts: 14,516
    edited 2010-08-31 05:54
    Okay, now I'm completely confused. I pulled the SPI SRAM code out of VMCOG and wrote a simple test program to see if my Hydra SPI SRAM card would work with it. To simplify my testing I changed the page size to 64 but otherwise I'm running the code from VMCOG unchanged and it seems to work just fine with my Hydra SPI SRAM card. I have no idea why it doesn't work with the vmdebug test program. I'll have to try to understand more of the VMCOG code to see if I can figure it out. In the meantime, I've attached my SPI SRAM test program.
  • Bill HenningBill Henning Posts: 6,445
    edited 2010-08-31 10:08
    Ok, I am officially confused too!

    Btw, I'd love it if you went through the VMCOG code - there may be a bug lurking, that you may find while understanding it, as Fibo under Zog crashes with some working set sizes. I can't seem to find it, even after looking hundreds of times.

    Want to hear something else confusing? I've merged my preliminary (slow) FlexMem drivers into VMCOG, and:

    - writes to status register don't work
    - reads of status register work
    - reads of memory (one long at a time) work
    - writes to memory (one long at a time) don't work
    - can't read/write pages at a time until writes to status register work as it needs setting sequential mode

    (the 23K256 does not have /WP pin, so that can't be it)

    And it is basically the same code that works for PropCade and Morpheus1!!!!

    Even worse, a scope shows clean signals on all pins, and ViewPort shows correct waveforms in LSA mode!

    The good news is that I've sent off 4 of the PCBs I showed at UPEW to production, so I can concentrate on VMCOG now for a few days.
    David Betz wrote: »
    Okay, now I'm completely confused. I pulled the SPI SRAM code out of VMCOG and wrote a simple test program to see if my Hydra SPI SRAM card would work with it. To simplify my testing I changed the page size to 64 but otherwise I'm running the code from VMCOG unchanged and it seems to work just fine with my Hydra SPI SRAM card. I have no idea why it doesn't work with the vmdebug test program. I'll have to try to understand more of the VMCOG code to see if I can figure it out. In the meantime, I've attached my SPI SRAM test program.
  • David BetzDavid Betz Posts: 14,516
    edited 2010-08-31 10:21
    Is there a description of your new boards posted somewhere? I had thought about buying Morpheus but somehow I thought there was a new version coming out so I decided to wait. Are these new boards you're talking about new versions of Morpheus and Mem+?
  • Bill HenningBill Henning Posts: 6,445
    edited 2010-08-31 10:21
    Hmmm... working fine outside of VMCOG implies that memory within VMCOG is getting corrupted.

    One of the many self-modifying indirect stores may be going wild... best bet would be within the BUSERR handling, when it updates the TLB - if it somehow computed a bad cog address, that could easily clobber code within the cog, thus explaining the behavior you report, and the problem with ZOG!
    David Betz wrote: »
    Okay, now I'm completely confused. I pulled the SPI SRAM code out of VMCOG and wrote a simple test program to see if my Hydra SPI SRAM card would work with it. To simplify my testing I changed the page size to 64 but otherwise I'm running the code from VMCOG unchanged and it seems to work just fine with my Hydra SPI SRAM card. I have no idea why it doesn't work with the vmdebug test program. I'll have to try to understand more of the VMCOG code to see if I can figure it out. In the meantime, I've attached my SPI SRAM test program.
  • David BetzDavid Betz Posts: 14,516
    edited 2010-08-31 10:27
    Btw, I'd love it if you went through the VMCOG code - there may be a bug lurking, that you may find while understanding it, as Fibo under Zog crashes with some working set sizes. I can't seem to find it, even after looking hundreds of times.
    I will look it over tonight but I'll warn you that I'm far from an expert Spin/PASM programmer as you can probably tell from the code I wrote in my SPI SRAM test. Any good code in there was probably stolen from either you or Andre' LaMothe. :-)
  • Bill HenningBill Henning Posts: 6,445
    edited 2010-08-31 10:50
    Yep, these are the new versions, and there are descriptions!

    Morpheus (pcb rev 2) and Mem+ (pcb rev 2) are described towards the end of p.12 in the Morpheus thread:

    http://forums.parallax.com/showthread.php?t=113929

    I think you'd like the Morpheus Developer's Guide on my downloads page, as it explains the architecture. There is also a page on it on the site.

    The Developer's Guide applies to rev.2 PCBs as well, but I will have to add a couple of pages for the new IR features.

    PropCade is described in its own thread at:

    http://forums.parallax.com/showthread.php?t=121315

    The other board that went to production is 485Plug, described on p.13 of the Morpheus thread.

    I have 12 other boards going into production over the next month or two, including the high-end Morpheus+ / Mem* combination, and the mysterious "PLC-G"... along with a ton of industrial I/O modules for my boards. If you read the Morpheus thread starting p.12, I briefly described all the new boards except for PLC-G there :)


    David Betz wrote: »
    Is there a description of your new boards posted somewhere? I had thought about buying Morpheus but somehow I thought there was a new version coming out so I decided to wait. Are these new boards you're talking about new versions of Morpheus and Mem+?
  • Bill HenningBill Henning Posts: 6,445
    edited 2010-08-31 10:51
    Every extra pair of eyeballs is MUCH appreciated - I figure I am too close to the code, and know too well how it "should" work, thus I might be missing something basic!
    David Betz wrote: »
    I will look it over tonight but I'll warn you that I'm far from an expert Spin/PASM programmer as you can probably tell from the code I wrote in my SPI SRAM test. Any good code in there was probably stolen from either you or Andre' LaMothe. :-)
  • David BetzDavid Betz Posts: 14,516
    edited 2010-09-01 09:01
    I've been reading through the VMCOG code trying to understand it and I have a general question about the behavior of the Propeller hub access instructions. Do the RDBYTE/RDWORD/RDLONG and WRBYTE/WRWORD/WRLONG instructions ignore all but the low order 16 bits of their source operands? In other words is RDLONG foo,$1000 interpreted the same as RDLONG foo,$ffff1000?
  • Bill HenningBill Henning Posts: 6,445
    edited 2010-09-01 10:00
    You got it - the upper 16 bits are totally ignored

    RDLONG also ignores the two lowest bits

    RDWORD ignores the lowest bit
    David Betz wrote: »
    I've been reading through the VMCOG code trying to understand it and I have a general question about the behavior of the Propeller hub access instructions. Do the RDBYTE/RDWORD/RDLONG and WRBYTE/WRWORD/WRLONG instructions ignore all but the low order 16 bits of their source operands? In other words is RDLONG foo,$1000 interpreted the same as RDLONG foo,$ffff1000?
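The address handling described in this answer can be modelled in a few lines. This is a host-side sketch of the stated behaviour (upper 16 bits ignored, low bits dropped according to access size), with a function name chosen for this example.

```python
# Model of how the hub access instructions interpret their address operand,
# per the answer above. Hypothetical helper, not Propeller tool code.
HUB_ADDR_MASK = 0xFFFF


def effective_hub_address(addr: int, size: int) -> int:
    """Return the hub address actually used for a byte/word/long access.

    size is 1 (RDBYTE/WRBYTE), 2 (RDWORD/WRWORD) or 4 (RDLONG/WRLONG).
    """
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4")
    # Keep the low 16 bits, then clear the alignment bits for the access size.
    return addr & HUB_ADDR_MASK & ~(size - 1)


if __name__ == "__main__":
    for addr in (0x1000, 0xFFFF1000, 0x1003):
        print(f"${addr:08X}: byte ${effective_hub_address(addr, 1):04X}, "
              f"word ${effective_hub_address(addr, 2):04X}, "
              f"long ${effective_hub_address(addr, 4):04X}")
```

Under this model `RDLONG foo,$1000` and `RDLONG foo,$ffff1000` resolve to the same hub address, so stray upper bits in a pointer are harmless for hub accesses.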
  • David BetzDavid Betz Posts: 14,516
    edited 2010-09-01 16:07
    Okay, I'm going to try my hand at offering a suggestion. I think the following code:
    shr_hits      ' walk through TLB, divide all non-zero hit counts by two
            movs  jx,#0             ' finding candidate page to sacrifice
    
    forj
    
    jx      mov   tlbi,0-0 wz
     if_z   jmp   #nextj
            movd  updtc,jx
            mov   temp,tlbi
            andn  temp,elevenbits
            shr   tlbi,#1
            andn  tlbi,elevenbits
            or    tlbi,temp
    
    updtc   mov   0-0,tlbi
    
            ' next ix
    nextj   add   jx, #1
            and   jx, #128 nr, wz
     if_z   jmp   #forj
    shr_hits_ret  ret
    
    elevenbits long $07FF
    

    Could be changed to this:
    shr_hits      ' walk through TLB, divide all non-zero hit counts by two
            movs  jx,#0             ' finding candidate page to sacrifice
    
    forj
    
    jx      mov   tlbi,0-0 wz
     if_z   jmp   #nextj
            movd  updtc,jx
            mov   temp,tlbi
    ' don't need to mask out the high bits here because we do it below
            shr   tlbi,#1
            andn  tlbi,elevenbits
    ' need to mask out the current count before combining with the updated count
            and   tlbi,elevenbits
            or    tlbi,temp
    
    updtc   mov   0-0,tlbi
    
            ' next ix
    nextj   add   jx, #1
            and   jx, #128 nr, wz
     if_z   jmp   #forj
    shr_hits_ret  ret
    
    elevenbits long $07FF
    

    There is no need to mask the count twice, once before and once after the right shift. On the other hand, we do need to mask out the current count before ORing with the new count. Otherwise we get the combination of both sets of bits.

    Also, this subroutine is entered when a count overflows. That means that the entry pointed to by vmpage has a zero count. Nothing is done in this code to adjust that. I'm not sure if a zero count will cause any problems but it will make the most frequently accessed page appear to be the least frequently accessed page. Another possible problem is that the entire entry could be zero if it happens to be pointing to the first page in hub RAM and the DIRTY and LOCK bits are also clear. This would make it look like the page wasn't in the cache.
  • Bill HenningBill Henning Posts: 6,445
    edited 2010-09-01 16:29
    Thanks David!

    Actually, you found a bug - there is a need to mask twice, but the first one should have been "and" not andn!

    The "and" was to preserve the hub page allocated to that VM page, and it was being lost. This is a serious bug that I would have noticed had I not had my nose buried in the code too long... The extra 'n' (i.e. andn instead of and) was the culprit; not fixing the hit count would just have caused a performance hit.

    I think there is an excellent chance that this is the cause of the fibo() problems, as it would clear the pointer to the page in the working set when the hit count overflowed - thus clobbering the lowest 512 bytes in memory!

    As you noticed, I forgot to put in a fixed count for the count that wrapped, I added code to do that as well.

    The routine below should work now, and I would not be at all surprised if it fixes the fibo() problem.

    Frankly, I would not be surprised if this measurably improves performance.

    THANK YOU!
    David Betz wrote: »
    Okay, I'm going to try my hand at offering a suggestion. <snip>

    There is no need to mask the count twice, once before and once after the right shift. On the other hand, we do need to mask out the current count before ORing with the new count. Otherwise we get the combination of both sets of bits.

    Also, this subroutine is entered when a count overflows. That means that the entry pointed to by vmpage has a zero count. Nothing is done in this code to adjust that. I'm not sure if a zero count will cause any problems but it will make the most frequently accessed page appear to be the least frequently accessed page. Another possible problem is that the entire entry could be zero if it happens to be pointing to the first page in hub RAM and the DIRTY and LOCK bits are also clear. This would make it look like the page wasn't in the cache.
    '----------------------------------------------------------------------------------------------------
    '
    ' SHR_HITS - divide all valid hit counts by two, called when a hit count would wrap around
    '
    ' NOTE: shr_hits is not debugged yet!
    '
    '----------------------------------------------------------------------------------------------------
    
    
    shr_hits      ' walk through TLB, divide all non-zero hit counts by two
            movs  jx,#0             ' finding candidate page to sacrifice
            movd  fixup,vmpage
    forj
    
    jx      mov   tlbi,0-0 wz
     if_z   jmp   #nextj
            movd  updtc,jx
            mov   temp,tlbi
            and   temp,elevenbits
            shr   tlbi,#1
            andn  tlbi,elevenbits
            or    tlbi,temp
    
    updtc   mov   0-0,tlbi
    
            ' next ix
    nextj   add   jx, #1
            and   jx, #128 nr, wz
     if_z   jmp   #forj
    
    ' fix overflow count, give it half-count
    
    fixup   or    0-0,halfcount
    
    shr_hits_ret  ret
    
    elevenbits long $07FF
    halfcount  long $80000000
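The difference between the original `andn` and the corrected `and` can be seen in a per-entry model of the loop body. This sketch assumes a TLB entry layout inferred from the code above: the hit count in the bits above the low eleven, and the hub page plus flag bits in the low eleven. The function names and the sample entry are hypothetical.

```python
# Host-side model of one shr_hits iteration (illustrative, not VMCOG code).
ELEVENBITS = 0x07FF
MASK32 = 0xFFFFFFFF


def make_entry(count: int, low: int) -> int:
    """Pack an assumed entry: count above bit 10, page/flags in bits 10..0."""
    return ((count << 11) | (low & ELEVENBITS)) & MASK32


def shr_entry_buggy(tlbi: int) -> int:
    temp = tlbi & ~ELEVENBITS & MASK32          # andn temp,elevenbits (keeps old count)
    tlbi = (tlbi >> 1) & ~ELEVENBITS & MASK32   # shr + andn: halved count only
    return tlbi | temp                          # old count ORed in, low bits lost


def shr_entry_fixed(tlbi: int) -> int:
    temp = tlbi & ELEVENBITS                    # and temp,elevenbits (keeps page/flags)
    tlbi = (tlbi >> 1) & ~ELEVENBITS & MASK32   # shr + andn: halved count only
    return tlbi | temp                          # halved count plus original low bits


if __name__ == "__main__":
    entry = make_entry(count=8, low=0x123)
    print(f"before: ${entry:08X}")
    print(f"buggy:  ${shr_entry_buggy(entry):08X}")
    print(f"fixed:  ${shr_entry_fixed(entry):08X}")
```

In the buggy version the low eleven bits come back as zero, which matches the description of the page pointer being cleared, and the count is the OR of the old and halved values rather than half the old value.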
    
  • Bill HenningBill Henning Posts: 6,445
    edited 2010-09-01 16:36
    Here is an experimental 0.981 release of VMCOG, with fixes for the bug David just found!

    I think this may very well fix the fibo under ZOG under VMCOG issue, and run a bit faster to boot :)
  • David BetzDavid Betz Posts: 14,516
    edited 2010-09-01 18:51
    Actually, you found a bug - there is a need to mask twice, but the first one should have been "and" not andn!
    Sorry, I guess there was a bug in my bug fix! I failed to notice that you were shifting the original value not the one you had just ANDNed. I'm glad you caught my error before releasing your fix.
  • Bill HenningBill Henning Posts: 6,445
    edited 2010-09-01 19:41
    Thank you for trying to optimize it - going over your suggested change is what made me notice the bug!
    David Betz wrote: »
    Sorry, I guess there was a bug in my bug fix! I failed to notice that you were shifting the original value not the one you had just ANDNed. I'm glad you caught my error before releasing your fix.
  • Heater.Heater. Posts: 21,230
    edited 2010-09-01 22:23
    ZOG no like.

    This is worse. Depending on the number of pages I have (I tried 8, 10, 20) it either hangs up around fibo(21) or continues a few more fibos with wrong results and then hangs up.

    The heater test in my old vmdebug works OK though.

    I cannot compile the new vmdebug, BST is complaining about not finding hex method in FullDuplexSerialPlus. No idea why.

    Bill, can you take the TRIBLADE_2 sections from the attached VMCog. It is your last 0.981 version + TRIBLADE_2.