New XMM hardware - Page 5 — Parallax Forums

New XMM hardware


Comments

  • average joeaverage joe Posts: 795
    edited 2012-05-25 19:43
    OK guys, I think I got it! I was trashing "hubaddr" in the loops. After reading the dracblade driver, I figured it out!
    {
    Skeleton JCACHE external RAM driver
      Copyright (c) 2011 by David Betz
    
      Based on code by Steve Denson (jazzed)
      Copyright (c) 2010 by John Steven Denson
    
      Inspired by VMCOG - virtual memory server for the Propeller
      Copyright (c) February 3, 2010 by William Henning
    
      For the EuroTouch 161 By James Moxham and Joe Heinz
      
      TERMS OF USE: MIT License
    
      Permission is hereby granted, free of charge, to any person obtaining a copy
      of this software and associated documentation files (the "Software"), to deal
      in the Software without restriction, including without limitation the rights
      to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
      copies of the Software, and to permit persons to whom the Software is
      furnished to do so, subject to the following conditions:
    
      The above copyright notice and this permission notice shall be included in
      all copies or substantial portions of the Software.
    
      THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
      IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
      FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
      AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
      LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,ARISING FROM,
      OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
      THE SOFTWARE.
    }
    
    CON
    
      ' default cache dimensions
      DEFAULT_INDEX_WIDTH   = 6
      DEFAULT_OFFSET_WIDTH  = 7
    
      ' cache line tag flags
      EMPTY_BIT             = 30
      DIRTY_BIT             = 31
    
    PUB image
      return @init_vm
    
    DAT
            org   $0
    
    ' initialization structure offsets
    ' $0: pointer to a two word mailbox
    ' $4: pointer to where to store the cache lines in hub ram
    ' $8: number of bits in the cache line index if non-zero (default is DEFAULT_INDEX_WIDTH)
    ' $c: number of bits in the cache line offset if non-zero (default is DEFAULT_OFFSET_WIDTH)
    ' note that the buffer at $4 must be at least 2^(index_width+offset_width) bytes in size
    ' the cache line mask is returned in $0
    
    init_vm mov     t1, par             ' get the address of the initialization structure
            rdlong  pvmcmd, t1          ' pvmcmd is a pointer to the virtual address and read/write bit
            mov     pvmaddr, pvmcmd     ' pvmaddr is a pointer into the cache line on return
            add     pvmaddr, #4
            add     t1, #4
            rdlong  cacheptr, t1        ' cacheptr is the base address in hub ram of the cache
            add     t1, #4
            rdlong  t2, t1 wz
      if_nz mov     index_width, t2     ' override the index_width default value
            add     t1, #4
            rdlong  t2, t1 wz
      if_nz mov     offset_width, t2    ' override the offset_width default value
    
            mov     index_count, #1
            shl     index_count, index_width
            mov     index_mask, index_count
            sub     index_mask, #1
    
            mov     line_size, #1
            shl     line_size, offset_width
            mov     t1, line_size
            sub     t1, #1
            wrlong  t1, par
    
            ' put external memory initialization here
    
            jmp     #vmflush
    
    fillme  long    0[128-fillme]           ' first 128 cog locations are used for a direct mapped cache table
    
            fit   128
    
            ' initialize the cache lines
    vmflush movd    :flush, #0
            mov     t1, index_count
    :flush  mov     0-0, empty_mask
            add     :flush, dstinc
            djnz    t1, #:flush
    
            ' start the command loop
    waitcmd wrlong  zero, pvmcmd
    :wait   rdlong  vmline, pvmcmd wz
      if_z  jmp     #:wait
    
            shr     vmline, offset_width wc ' carry is now one for read and zero for write
            mov     set_dirty_bit, #0       ' make mask to set dirty bit on writes
            muxnc   set_dirty_bit, dirty_mask
            mov     line, vmline            ' get the cache line index
            and     line, index_mask
            mov     hubaddr, line
            shl     hubaddr, offset_width
            add     hubaddr, cacheptr       ' get the address of the cache line
            wrlong  hubaddr, pvmaddr        ' return the address of the cache line
            movs    :ld, line
            movd    :st, line
    :ld     mov     vmcurrent, 0-0          ' get the cache line tag
            and     vmcurrent, tag_mask
            cmp     vmcurrent, vmline wz    ' z set means there was a cache hit
      if_nz call    #miss                   ' handle a cache miss
    :st     or      0-0, set_dirty_bit      ' set the dirty bit on writes
            jmp     #waitcmd                ' wait for a new command
    
    ' line is the cache line index
    ' vmcurrent is current cache line
    ' vmline is new cache line
    ' hubaddr is the address of the cache line
    miss    movd    :test, line
            movd    :st, line
    :test   test    0-0, dirty_mask wz
      if_z  jmp     #:rd                    ' current cache line is clean, just read new one
            mov     vmaddr, vmcurrent
            shl     vmaddr, offset_width
            call    #wr_cache_line          ' write current cache line
    :rd     mov     vmaddr, vmline
            shl     vmaddr, offset_width
            call    #rd_cache_line          ' read new cache line
    :st     mov     0-0, vmline
    miss_ret ret
    
    ' pointers to mailbox entries
    pvmcmd          long    0       ' on call this is the virtual address and read/write bit
    pvmaddr         long    0       ' on return this is the address of the cache line containing the virtual address
    
    cacheptr        long    0       ' address in hub ram where cache lines are stored
    vmline          long    0       ' cache line containing the virtual address
    vmcurrent       long    0       ' current selected cache line (same as vmline on a cache hit)
    line            long    0       ' current cache line index
    set_dirty_bit   long    0       ' DIRTY_BIT set on writes, clear on reads
    
    zero            long    0       ' zero constant
    dstinc          long    1<<9    ' increment for the destination field of an instruction
    t1              long    0       ' temporary variable
    t2              long    0       ' temporary variable
    
    tag_mask        long    !(1<<DIRTY_BIT) ' includes EMPTY_BIT
    index_width     long    DEFAULT_INDEX_WIDTH
    index_mask      long    0
    index_count     long    0
    offset_width    long    DEFAULT_OFFSET_WIDTH
    line_size       long    0                       ' line size in bytes
    empty_mask      long    (1<<EMPTY_BIT)
    dirty_mask      long    (1<<DIRTY_BIT)
    
    ' input parameters to rd_cache_line and wr_cache_line
    vmaddr          long    0       ' external address
    hubaddr         long    0       ' hub memory address
    ' temporaries used by BREAD and BWRITE
    
    ptr           long    0
    count         long    0  ' copy of line size, divided by two to count words instead of bytes
    
    
    'get_values              rdlong  hubaddr, hubptr         ' get hub address
    '                        rdlong  ramaddr, ramptr         ' get ram address
    '                        rdlong  len, lenptr             ' get length
    '                        mov     err, #5                 ' err=5
    'get_values_ret          ret
    
    'init                    'mov     err, #0                  ' reset err=false=good
                            'mov     dira,zero                ' tristate the pins with the cog dira
    '                        and     dira,maskP0P20P22       ' tristates all the common pins
    
    'done                    'wrlong  err, errptr             ' status  =0=false=good, else error x
                            'wrlong  zero, comptr            ' command =0 (done)
                            
               ' Pass pasm_n = 0- 7 come to this with P0-P20 and P22 tristated and returns them as this too
    set137                  or      dira,maskP22            ' pin 22 is an output
                            andn    outa,maskP22            ' set P22 low so Y0-Y7 are all high
                            or      dira,maskP0P20          ' pins P0-P20 are outputs
                            and     outa,maskP0P2low        ' set these 3 pins low
                            or      outa,pasm_n             ' set the 137 pins
                            or      outa,maskP22            ' pin 22 high
                            
    set137_ret              ret                             ' return                        
    
    
    load161pasm                                             ' uses vmaddr
                            mov     count, line_size       ' make a working copy of line_size and
                            shr     count, #1              ' divide the length by two to count words
                                            
                            mov     ptr, hubaddr            ' hubaddr = hub page address
                            
                            or      outa,maskP0P20          ' set P0-P20 high     
                            or      dira,maskP0P20          ' output pins 0-20
                            mov     pasm_n,#0               ' group 0
                            call    #set137                 ' set the 137 output
                            and     outa,maskP0P18low       ' pins 0-18 set low
                            or      outa,vmaddr             ' output address to 161 chips
                            or      outa,maskP19            ' clock high
                            or      outa,maskP20            ' load high
                            andn    outa,maskP19            ' clock low
                            andn    outa,maskP20            ' load low
                            or      outa,maskP19            ' clock high
                            or      outa,maskP20            ' load high
    
                            'andn    outa,maskP20            ' load low
                            'andn    outa,maskP19            ' clock low
    
       
                            'or      outa,maskP19            ' clock high
                            'or      outa,maskP20            ' load high
                             
    load161pasm_ret         ret
    
    stop                   jmp     #stop                  ' for debugging
    
    memorytransfer          or      dira,maskP16P20         ' so /wr and other pins definitely high
                            or      outa,maskP16P20
                            mov     pasm_n,#1               ' back to group 1 for memory transfer
                            call    #set137                 ' as next routine will always be group 1
                            or      dira,maskP16P20         ' output pins 16-20
                            or      outa,maskP16P20         ' set P16-P20 high (P0-P15 set as inputs or outputs in the calling routine)
    memorytransfer_ret      ret
    
    busoutput               or      dira,maskP0P15          ' set prop pins 0-15 as outputs
    busoutput_ret           ret
    
    businput                and     dira,maskP16P31         ' set P0-P15 as inputs
    businput_ret            ret  
    
    delaynop                nop
                            nop
                            nop
                            nop
    delaynop_ret            ret
    
    '----------------------------------------------------------------------------------------------------
    '
    ' rd_cache_line - read a cache line from external memory
    '
    ' vmaddr is the external memory address to read
    ' hubaddr is the hub memory address to write
    ' line_size is the number of bytes to read
    '
    '----------------------------------------------------------------------------------------------------
                
    rd_cache_line
    
    ' command T
    pasmramtohub            
                            call    #load161pasm            ' load the 161 counters with ramaddr
                            call    #memorytransfer         ' set to group 1, enable P16-P20 as outputs and set P16-P20 high                             
                            call    #businput               ' set prop pins P0-P15 as inputs
                            andn    outa,maskP16            ' memory /rd low
    ramtohub_loop           mov     data_16,ina             ' get the data
                            wrword  data_16,ptr         ' move data to hub
                            andn    outa,maskP19            ' clock 161 low
                            or      outa,maskP19            ' clock 161 high
                            add     ptr,#2              ' increment the hub address
                             
                            djnz    count,#ramtohub_loop
                            or      outa,maskP16            ' memory /rd high  
                            or      dira,maskP0P15          ' %00000000_00000000_11111111_11111111 restore P0-P15 as outputs
                            
                            and     dira,maskP0P20P22       ' tristates all the common pins
    rd_cache_line_ret
            ret
    
    '----------------------------------------------------------------------------------------------------
    '
    ' wr_cache_line - write a cache line to external memory
    '
    ' vmaddr is the external memory address to write
    ' hubaddr is the hub memory address to read
    ' line_size is the number of bytes to write
    '
    '----------------------------------------------------------------------------------------------------
    
    wr_cache_line
     ' command S
    pasmhubtoram            
                            call    #load161pasm            ' load the 161 counters with ramaddr
                            call    #memorytransfer         ' set to group 1, enable P16-P20 as outputs and set P16-P20 high
                            call    #busoutput              ' set prop pins P0-P15 as outputs
    hubtoram_loop           and     outa,maskP16P31         '%11111111_11111111_00000000_00000000       ' clear for output                   
                            rdword  data_16,ptr         ' get the word from hub
                            and     data_16,maskP0P15       ' mask to a word only
                            or      outa,data_16            ' send out the word to P0-P15
                            andn    outa,maskP17            ' set mem write low
                            add     ptr,#2              ' increment by 2 bytes = 1 word. Put this here for small delay while writes
                            or      outa,maskP17            ' mem write high
                            andn    outa,maskP19            ' clock 161 low
                            or      outa,maskP19            ' clock 161 high
                            
                            djnz    count,#hubtoram_loop      ' loop this many times
                            
                            and     dira,maskP0P20P22       ' tristates all the common pins
    wr_cache_line_ret
            ret
    
    
    
    
            
    pasm_n                  long    0                                    ' general purpose value
    data_16                 long    0                                    ' general purpose value
                                    
    
    
    maskP0P2low             long    %11111111_11111111_11111111_11111000 ' P0-P2 low
    maskP0P20               long    %00000000_00011111_11111111_11111111 ' P0-P18 enabled for output plus P19,P20    
    maskP0P18low            long    %11111111_11111000_00000000_00000000 ' P0-P18 low
    maskP16                 long    %00000000_00000001_00000000_00000000 ' pin 16
    maskP17                 long    %00000000_00000010_00000000_00000000 ' pin 17
    maskP18                 long    %00000000_00000100_00000000_00000000 ' pin 18
    maskP19                 long    %00000000_00001000_00000000_00000000 ' pin 19
    maskP20                 long    %00000000_00010000_00000000_00000000 ' pin 20
    maskP22                 long    %00000000_01000000_00000000_00000000 ' pin 22
    maskP16P31              long    %11111111_11111111_00000000_00000000 ' pin 16 to pin 31
    maskP0P15               long    %00000000_00000000_11111111_11111111 ' for masking words
    maskP16P20              long    %00000000_00011111_00000000_00000000
    maskP0P20P22            long    %11111111_10100000_00000000_00000000 ' for returning all group pins HiZ
    
    
            fit     496
    
    The change is this:
                            mov     ptr, hubaddr            ' hubaddr = hub page address
    
    Then change all loop references of hubaddr to ptr.
    Now all tests pass!


    Now I need to figure out how to test the actual read and write speeds! Are there any test scripts for this?
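
    For reference, the address arithmetic in the driver's init and command loop can be modelled on a PC. This is only a sketch in Python, not part of the driver: the function names and the example cacheptr value are made up for illustration, and the read/write flag carried in bit 0 of the mailbox command is ignored. The widths are the CON defaults from the listing above.

```python
# Host-side model of the cache geometry and address split in the driver above.
# Function names and the cacheptr value used below are illustrative assumptions.
INDEX_WIDTH = 6    # DEFAULT_INDEX_WIDTH
OFFSET_WIDTH = 7   # DEFAULT_OFFSET_WIDTH


def cache_geometry(index_width=INDEX_WIDTH, offset_width=OFFSET_WIDTH):
    """Return (line_size, index_count, total_bytes) as init_vm derives them."""
    line_size = 1 << offset_width      # mov line_size,#1 / shl line_size,offset_width
    index_count = 1 << index_width     # mov index_count,#1 / shl index_count,index_width
    return line_size, index_count, line_size * index_count


def split_address(vaddr, cacheptr, index_width=INDEX_WIDTH, offset_width=OFFSET_WIDTH):
    """Return (vmline, line, hubaddr) for a virtual address."""
    vmline = vaddr >> offset_width                 # tag: address without the offset bits
    line = vmline & ((1 << index_width) - 1)       # direct-mapped slot index
    hubaddr = (line << offset_width) + cacheptr    # hub address of that slot
    return vmline, line, hubaddr


print(cache_geometry())
print([hex(v) for v in split_address(0x20000080, cacheptr=0x4000)])
```

    With the defaults this gives 64 lines of 128 bytes, 8192 bytes in total, which is the minimum size of the buffer passed at offset $4. Two addresses 8192 bytes apart share a slot and differ only in the tag, which is what triggers the miss path.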
  • David BetzDavid Betz Posts: 14,516
    edited 2012-05-25 19:56
    Congratulations!! It sounds like I should annotate the skeleton driver a little better so it says you have to leave line_size and hubaddr unchanged in your rd/wr code.
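
    The rule can be illustrated with a small Python model of a read transfer. It is a sketch, not the driver: the dictionary of "registers", the list standing in for external RAM, and the start parameter are all invented for the example. The point is that the loop advances working copies (ptr, count) and never the inputs.

```python
# Model of rd_cache_line that leaves its inputs untouched.
# regs, hub, ext_words and start are illustrative stand-ins, not driver names
# (apart from hubaddr, line_size, ptr and count, which mirror the listing).
def rd_cache_line(regs, hub, ext_words, start=0):
    ptr = regs["hubaddr"]              # mov ptr, hubaddr
    count = regs["line_size"] >> 1     # mov count, line_size / shr count, #1
    i = start
    while count:                       # djnz count, #ramtohub_loop
        hub[ptr:ptr + 2] = ext_words[i].to_bytes(2, "little")  # wrword data_16, ptr
        ptr += 2                       # add ptr, #2
        i += 1                         # the 161 counters step to the next word
        count -= 1


regs = {"hubaddr": 0x100, "line_size": 128}
hub = bytearray(0x200)
ext = list(range(0x1000, 0x1040))      # 64 words of made-up external data
rd_cache_line(regs, hub, ext)
print(regs)
```

    After the call regs still holds the original hubaddr and line_size, so the command loop can use them for the next request.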
  • average joeaverage joe Posts: 795
    edited 2012-05-25 20:06
    That would be very helpful! I'm sorry I sent you guys on a wild goose chase. Now I know for the future. Also, are there any scripts to test the performance of the cache driver? I know this could still use some tuning and optimization. Would it be helpful to unroll the loops, e.g. move the code in set137 and load161 into the loops?
    Thanks again for all your help!
    Joe
  • jazzedjazzed Posts: 11,803
    edited 2012-05-25 23:00
    Hi Joe,

    Glad you got something running. Now what else needs to be done?

    To try running programs there are a few things to do:

    1. Create a .dat file with BSTC
    2. Copy the .dat file to the c:\propgcc\propeller-load directory
    3. Add the cache driver to your touch161.cfg file.
    4. Start SimpleIDE and try a program

    Details:

    1. Create a .dat file.
    Assuming your cache source is called touch161_cache.spin, use this bstc command in a CMD window:
    bstc -Ograux -c touch161_cache.spin
    This creates a touch161_cache.dat file.
    2. Copy the .dat file
    copy touch161_cache.dat c:\propgcc\propeller-load
    3. Add the cache-driver
    Edit the touch161.cfg file so that it looks like this:
    # touch161
    # IDE:SDLOAD
    # IDE:SDXMMC
        clkfreq: 80000000
        clkmode: XTAL1+PLL16X
        baudrate: 115200
        rxpin: 31
        txpin: 30
        cache-driver: touch161_cache.dat
        cache-size: 8K
        cache-param1: 0
        cache-param2: 0
        sd-driver: sd_driver.dat
        sdspi-do: 24
        sdspi-clk: 25
        sdspi-di: 26
        sdspi-cs: 27
    

    4. Start SimpleIDE ...
    Since your test_cache.spin tests all pass, you should be able to run some xmm programs.
    Open the hello demo.

    a. Establish a "basis" with LMM mode. That is, choose memory model LMM, and Run Console F8
    Verify that LMM hello works.

    b. Change memory model to XMMC, select Board Type TOUCH161, and Run Console F8.
    Verify that XMMC hello works.

    c. Change memory model to XMM-SINGLE, keep Board Type TOUCH161, and Run Console F8.
    Verify that XMM-SINGLE hello works.


    I'll post some SimpleIDE package code for you to do some performance comparisons later.
  • average joeaverage joe Posts: 795
    edited 2012-05-25 23:29
    Hmm, it appears as if XMMC will not run unless I select boardtype *board*-SDXMMC?
    XMM-Single works, not sure why XMMC wouldn't. I also tested SD card test in XMM-single. I tried to compile dry.c but:
    dry.c:415:23: fatal error: sys/times.h: No such file or directory
    dry.c:417:76: fatal error: sys/param.h: No such file or directory
    
    that's as far as I get
    Not sure how to set this?
    #include <sys/param.h> /* If your system doesn't have this, use -DHZ=xxx */
    Sounds like I'm getting ahead of myself since no XMMC though.
  • jazzedjazzed Posts: 11,803
    edited 2012-05-26 00:04
    Joe,

    SimpleIDE will not build the dhrystone test because of the way it must be built.
    I've attached a dry_xmmc.elf and dry_xmm_single.elf in a .zip for you.

    Use the loader in a command window for these examples.

    propeller-load -r -t -b touch161 dry_xmmc.elf
    propeller-load -r -t -b touch161 dry_xmm_single.elf

    Did you try the hello examples?

    dry.zip 30.3K
  • average joeaverage joe Posts: 795
    edited 2012-05-26 18:34
    For some reason XMMC still doesn't work. No hello world, nothing. XMM seems to work fine though?
    C:\Users\Joe\Downloads\dry>propeller-load -r -t -b dracTouchEX dry_xmm_single.elf
    Propeller Version 1 on COM21
    Loading the serial helper to hub memory
    9528 bytes sent
    Verifying RAM ... OK
    Loading cache driver 'ET_cache.dat'
    1088 bytes sent
    Loading program image to RAM
    17408 bytes sent
    Loading .xmmkernel
    1724 bytes sent
    [ Entering terminal mode. Type ESC or Control-C to exit. ]
    
    Dhrystone Benchmark, Version C, Version 2.2
    Program compiled without 'register' attribute
    Using STDC clock(), HZ=80000000
    
    Trying 5000 runs through Dhrystone:
    Final values of the variables used in the benchmark:
    
    Int_Glob:            5
            should be:   5
    Bool_Glob:           1
            should be:   1
    Ch_1_Glob:           A
            should be:   A
    Ch_2_Glob:           B
            should be:   B
    Arr_1_Glob[8]:       7
            should be:   7
    Arr_2_Glob[8][7]:    5010
            should be:   Number_Of_Runs + 10
    Ptr_Glob->
      Ptr_Comp:          536899232
            should be:   (implementation-dependent)
      Discr:             0
            should be:   0
      Enum_Comp:         2
            should be:   2
      Int_Comp:          17
            should be:   17
      Str_Comp:          DHRYSTONE PROGRAM, SOME STRING
            should be:   DHRYSTONE PROGRAM, SOME STRING
    Next_Ptr_Glob->
      Ptr_Comp:          536899232
            should be:   (implementation-dependent), same as above
      Discr:             0
            should be:   0
      Enum_Comp:         1
            should be:   1
      Int_Comp:          18
            should be:   18
      Str_Comp:          DHRYSTONE PROGRAM, SOME STRING
            should be:   DHRYSTONE PROGRAM, SOME STRING
    Int_1_Loc:           5
            should be:   5
    Int_2_Loc:           13
            should be:   13
    Int_3_Loc:           7
            should be:   7
    Enum_Loc:            1
            should be:   1
    Str_1_Loc:           DHRYSTONE PROGRAM, 1'ST STRING
            should be:   DHRYSTONE PROGRAM, 1'ST STRING
    Str_2_Loc:           DHRYSTONE PROGRAM, 2'ND STRING
            should be:   DHRYSTONE PROGRAM, 2'ND STRING
    
    Microseconds for one run through Dhrystone: 1400
    Dhrystones per Second:                      714
    
  • David BetzDavid Betz Posts: 14,516
    edited 2012-05-26 18:38
    Is it possible that 24k isn't enough RAM to run your program? In xmmc mode the code is in external memory but the data must fit in hub memory along with the cache. The cache size is probably set to 8k so that means the program you're running can't use more than 24k of RAM.
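
    The 24k figure is just the hub size minus the cache. A quick Python sketch of the arithmetic, assuming the Propeller 1's 32 KB of hub RAM and the 8K cache-size from the config file (the function name is made up for illustration):

```python
# Rough hub RAM budget for an xmmc program: data has to share the hub with the cache.
HUB_RAM_BYTES = 32 * 1024   # Propeller 1 hub RAM


def xmmc_data_budget(cache_size_bytes, hub_bytes=HUB_RAM_BYTES):
    """Upper bound on hub RAM left for program data once the cache is allocated."""
    return hub_bytes - cache_size_bytes


print(xmmc_data_budget(8 * 1024) // 1024, "KB")
```

    This is an upper bound only; anything else resident in hub RAM comes out of the same budget.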
  • average joeaverage joe Posts: 795
    edited 2012-05-26 18:43
    I think the hello.c should fit in 24k of ram? Not sure about the DrAFile or Dhrystone. Theoretically, if hello.c won't run, then nothing will. SDXMMC will run, but that's not what we want. I also tried adding SDLoad to the cfg file and this will not run either.
  • David BetzDavid Betz Posts: 14,516
    edited 2012-05-26 18:55
    The SD loader should work as long as whatever program it's trying to run works. It can load either xmm or xmmc programs. If you try running the xmm program that's been working directly with propeller-load then it should work.

    Here is one reason why xmmc might not work:

    A program compiled using -mxmmc will have code that lives at 0x30000000. A program that is compiled in -mxmm-single mode will have code and data that live at 0x20000000. In order for an xmmc program to work, your cache driver must make the external memory visible at the 0x30000000 address. Most cache drivers don't decode the high order bits anyway so there are images of the memory repeated starting at 0x20000000 and repeating throughout the rest of memory. If you're decoding those high order bits then that might be why your xmmc program isn't working. Also, you might want to mask off those bits anyway since I believe your memory has many fewer address bits. Could those high bits be causing you trouble?
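
    As a sketch of what "not decoding the high order bits" means, here is a small Python model. The 19 address bits are an assumption chosen for the example, and physical_address is a made-up name, not a driver routine.

```python
# Masking off the undecoded high bits makes the external memory repeat through
# the whole address space, so the xmm and xmmc bases land on the same cell.
XMM_BASE = 0x20000000    # code and data base for -mxmm-single
XMMC_BASE = 0x30000000   # code base for -mxmmc


def physical_address(vaddr, address_bits):
    """Keep only the address bits the memory actually has (assumed count)."""
    return vaddr & ((1 << address_bits) - 1)


offset = 0x123
print(hex(physical_address(XMM_BASE + offset, 19)))
print(hex(physical_address(XMMC_BASE + offset, 19)))
```

    Both lines print the same physical address. A driver that instead compares the high bits against 0x20000000 would reject the xmmc address.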
  • average joeaverage joe Posts: 795
    edited 2012-05-26 19:15
    That very well could be it!
    Now let me say I do understand what you're saying, for the most part. To make sure I DO understand:

    Mask off the high bits of the EXTERNAL address, e.g.:
                            mov     address, vmaddr
                            and     address,maskhighadd
    ..
    ..
    maskhighadd             long    %00000000_00000111_11111111_11111111       'access to full memory space
    
    So copy vmaddr and then mask off the address bits my memory doesn't have? Now am I able to mask off even more bits to "reserve" memory for the rest of the system? Say :
    maskhighadd             long    %00000000_00000000_01111111_11111111          'access to 32k partition of memory?
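
    To check what the two masks cover, here is a quick Python sketch. The names are illustrative, and the sizes are in address units (bytes if vmaddr is a byte address).

```python
# Size of the window selected by a contiguous low-bit mask.
FULL_MASK = 0b00000000_00000111_11111111_11111111   # first maskhighadd value
PART_MASK = 0b00000000_00000000_01111111_11111111   # second maskhighadd value


def mask_span(mask):
    """Number of distinct addresses a contiguous low-bit mask can select."""
    assert mask & (mask + 1) == 0, "mask must be contiguous low bits"
    return mask + 1


print(mask_span(FULL_MASK) // 1024, "K")
print(mask_span(PART_MASK) // 1024, "K")
print(hex(0x8000 & PART_MASK))   # the first address past the window wraps to 0
```

    The first mask spans 512K and the second 32K. A mask alone always selects the lowest window and wraps anything above it back to the start, so it limits the range but does not move it.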
    
  • jazzedjazzed Posts: 11,803
    edited 2012-05-27 11:45
    Got your board and promptly blew it up ! ;)

    Kidding of course. Did have to find my own HC00 chip though.


    Word-wide memory needs a small adjustment: A1 should map to A0 on the chips. Add line 205.
                            and     outa,maskP0P18low       ' pins 0-18 set low
                            shr     vmaddr, #1              ' schematic connects SRAM A0 to A0, not A1 - jsd. line 205
                            or      outa,vmaddr             ' output address to 161 chips
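
    A small Python sketch of the effect of that shift, assuming 128-byte cache lines (64 words each). The function names are made up for illustration.

```python
# Byte address to word address for 16-bit wide memory.
LINE_BYTES = 128
LINE_WORDS = LINE_BYTES // 2


def word_address(byte_addr):
    """What 'shr vmaddr, #1' produces: the word index loaded into the counters."""
    return byte_addr >> 1


def words_touched(first_word):
    """Word indices covered by one cache line starting at first_word."""
    return range(first_word, first_word + LINE_WORDS)


with_shift = [words_touched(word_address(a)) for a in (0x00, 0x80)]
without_shift = [words_touched(a) for a in (0x00, 0x80)]
print(with_shift)      # lines sit back to back in the SRAM
print(without_shift)   # every other 64-word block is skipped
```

    With the shift, consecutive cache lines are contiguous in the SRAM. Without it they still do not overlap, but half of the memory goes unused.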
    



    Preliminary results before optimizations - SSF listed for comparison:

    Memory Model   Board Type        Dhrystones/Second
    LMM            SSF (HUB)         6983
    XMMC           SSF               1256
    LMM            TOUCH161 (HUB)    6983
    XMMC           TOUCH161          1278
    XMM-SINGLE     TOUCH161          713

  • David BetzDavid Betz Posts: 14,516
    edited 2012-05-27 14:41
    Cool! So the address shift issue is what was causing Joe's problem? His cache driver is now working?
  • jazzedjazzed Posts: 11,803
    edited 2012-05-27 14:45
    Below are results with fast read and other optimizations. Faster writes will take more work because of the write strobe requirement. Performance for this test and for applications could differ with other cache line sizes. Currently the cache line size is 128 bytes and the whole read burst happens in 16 us, which is an 8 MB/s line read.

    Results with fast read and other optimizations:

    Memory Model   Board Type        Dhrystones/Second
    LMM            TOUCH161 (HUB)    6983
    XMMC           TOUCH161          1364
    XMM-SINGLE     TOUCH161          754
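
    The line-read figure can be checked with a few lines of Python. This is a sketch using the numbers quoted in this post (128 bytes in 16 us) and the 80 MHz clkfreq from the config file; the function names are made up, and the clocks-per-word value is an average over the burst.

```python
# Burst read rate for one cache line, from the figures quoted in the post.
CLKFREQ_HZ = 80_000_000   # clkfreq from the touch161 config


def line_read_rate_mb_s(line_bytes, burst_us):
    """Bytes per microsecond, which equals MB/s with 1 MB = 10**6 bytes."""
    return line_bytes / burst_us


def clocks_per_word(line_bytes, burst_us, clkfreq_hz=CLKFREQ_HZ):
    """Average system clocks spent per 16-bit word during the burst."""
    words = line_bytes // 2
    clocks = burst_us * clkfreq_hz / 1_000_000
    return clocks / words


print(line_read_rate_mb_s(128, 16))
print(clocks_per_word(128, 16))
```

    That works out to 8 MB/s, or 20 clocks per word on average over the 64-word burst.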

  • average joeaverage joe Posts: 795
    edited 2012-05-27 15:59
    This is very exciting! So, I downloaded touch_cache and built the .dat with "bstc -Ograux -c touch_cache.spin". GOOD.

    Then, I ran dry xmm-single and the numbers check out. 754 DhryPerSec. GOOD.
    C:\Users\Joe\Downloads\dry>propeller-load -r -t -b touch161 dry_xmm_single.elf
    Propeller Version 1 on COM21
    Loading the serial helper to hub memory
    9528 bytes sent
    Verifying RAM ... OK
    Loading cache driver 'new_cache.dat'
    1084 bytes sent
    Loading program image to RAM
    17408 bytes sent
    Loading .xmmkernel
    1724 bytes sent
    [ Entering terminal mode. Type ESC or Control-C to exit. ]
    
    Dhrystone Benchmark, Version C, Version 2.2
    Program compiled without 'register' attribute
    Using STDC clock(), HZ=80000000
    
    Trying 5000 runs through Dhrystone:
    Final values of the variables used in the benchmark:
    
    Int_Glob:            5
            should be:   5
    Bool_Glob:           1
            should be:   1
    Ch_1_Glob:           A
            should be:   A
    Ch_2_Glob:           B
            should be:   B
    Arr_1_Glob[8]:       7
            should be:   7
    Arr_2_Glob[8][7]:    5010
            should be:   Number_Of_Runs + 10
    Ptr_Glob->
      Ptr_Comp:          536899232
            should be:   (implementation-dependent)
      Discr:             0
            should be:   0
      Enum_Comp:         2
            should be:   2
      Int_Comp:          17
            should be:   17
      Str_Comp:          DHRYSTONE PROGRAM, SOME STRING
            should be:   DHRYSTONE PROGRAM, SOME STRING
    Next_Ptr_Glob->
      Ptr_Comp:          536899232
            should be:   (implementation-dependent), same as above
      Discr:             0
            should be:   0
      Enum_Comp:         1
            should be:   1
      Int_Comp:          18
            should be:   18
      Str_Comp:          DHRYSTONE PROGRAM, SOME STRING
            should be:   DHRYSTONE PROGRAM, SOME STRING
    Int_1_Loc:           5
            should be:   5
    Int_2_Loc:           13
            should be:   13
    Int_3_Loc:           7
            should be:   7
    Enum_Loc:            1
            should be:   1
    Str_1_Loc:           DHRYSTONE PROGRAM, 1'ST STRING
            should be:   DHRYSTONE PROGRAM, 1'ST STRING
    Str_2_Loc:           DHRYSTONE PROGRAM, 2'ND STRING
            should be:   DHRYSTONE PROGRAM, 2'ND STRING
    
    Microseconds for one run through Dhrystone: 1324
    Dhrystones per Second:                      754
    
    
    Then I try the XMMC build and I still get nothing. Fail.
    C:\Users\Joe\Downloads\dry>propeller-load -r -t -b touch161 dry_xmmc.elf
    Propeller Version 1 on COM21
    Loading the serial helper to hub memory
    9528 bytes sent
    Verifying RAM ... OK
    Loading cache driver 'new_cache.dat'
    1084 bytes sent
    Loading program image to flash
    15412 bytes sent
    Loading .xmmkernel
    1724 bytes sent
    [ Entering terminal mode. Type ESC or Control-C to exit. ]
    
    
    I still don't know what my xmmc issue is *NOOB* Still the numbers are pretty good.

    Now I'm worried though, because we might have a new board on the way with a different Group Select chip *MCP23008* I will be using the current board for quite a while, since I have a software fix for the display issue.

    *edit*
    Sorry about making you look for the HC00. I only have 1.
  • jazzedjazzed Posts: 11,803
    edited 2012-05-27 16:55
    Hi Joe,

    Well at least something jibes. Maybe you should play some blues guitar and try again later.
    I still don't know what my xmmc issue is *NOOB* Still the numbers are pretty good.

    Now I'm worried though, because we might have a new board on the way with a different Group Select chip *MCP23008* I will be using the current board for quite a while, since I have a software fix for the display issue.

    *edit*
    Sorry about making you look for the HC00. I only have 1.

    I don't get the dry_xmmc.elf problem. Works fine for me.
    How big is the file? Is it a text file now by any chance?
    >dir /A dry_xmmc.elf
     Volume in drive C is OS
     Volume Serial Number is 6C8D-7E65
    
    
     Directory of c:\gccdev\propside\MyProjects\dry
    
    
    05/27/2012  04:40 PM            37,526 dry_xmmc.elf
    ....
    

    Downloaded it again and tested using the config file below:
    propeller-load -r -t -b touch161 dry_xmmc.elf

    # touch161
    # IDE:SDLOAD
    # IDE:SDXMMC
        clkfreq: 80000000
        clkmode: XTAL1+PLL16X
        baudrate: 115200
        rxpin: 31
        txpin: 30
        tvpin: 12   # only used if TV_DEBUG is defined
        cache-driver: touch_cache.dat
        cache-size: 8K
        cache-param1: 0
        cache-param2: 0
        sd-driver: sd_driver.dat
        sdspi-do: 24
        sdspi-clk: 25
        sdspi-di: 26
        sdspi-cs: 27
        load-target: ram
    

    Can't just write this off. We need a root cause; otherwise we could be hiding some problem by accident.
    Not sure how to get there though. The only thing I can think of is that the dry_xmmc.elf file is corrupted.

    Thanks,
    --Steve
  • average joeaverage joe Posts: 795
    edited 2012-05-27 17:07
    I knew it was a noob mistake: my board config was missing the "load-target: ram" line. I will run everything through xmmc to make sure this was the problem, but it seems like it. I get the same numbers as you do in both modes now! How exciting!!!

    I can't thank everyone enough! Steve, David, James and everyone else who has contributed to the community. You guys rock!
  • Dr_AculaDr_Acula Posts: 5,484
    edited 2012-05-27 18:10
    Wow, lots happened here while I was asleep! Great work.

    One driving force for getting a GUI into C is that there will eventually not be enough room in Spin. The demo program I am using is taking about 3/4 of hub memory and every extra demo function takes it inexorably towards that moment when there is no code space left.

    Also the demo program is getting unwieldy now as we are needing different code for the two displays. So that may well lend itself to a C Class or similar, where you call a common function DrawPixel(x,y,color) and it works for both displays.

    So one could think about two header files, ILI9325.h and SSD1289.h

    Each would have the same functions - load a font, draw a radiobox, draw a textbox, draw a line etc but the code would be different.
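
    One way to get a common DrawPixel-style call in plain C (which has no classes) is a small table of function pointers per display. The sketch below is only an illustration under assumptions: display_driver_t, draw_hline and the tiny in-memory frame buffers are made-up names standing in for real ILI9325 and SSD1289 code, so that the idea can be tried on a PC first.

```c
#include <stdint.h>

/* Hypothetical common interface: each display driver fills in one of these. */
typedef struct {
    const char *name;
    uint16_t width, height;
    void (*draw_pixel)(uint16_t x, uint16_t y, uint16_t color);
} display_driver_t;

/* Stand-in frame buffers so the sketch runs on a PC.
 * Real drivers would talk to the LCD controller instead. */
#define FB_W 8
#define FB_H 8
static uint16_t ili9325_fb[FB_H][FB_W];
static uint16_t ssd1289_fb[FB_H][FB_W];

static void ili9325_draw_pixel(uint16_t x, uint16_t y, uint16_t color)
{
    if (x < FB_W && y < FB_H)
        ili9325_fb[y][x] = color;
}

static void ssd1289_draw_pixel(uint16_t x, uint16_t y, uint16_t color)
{
    if (x < FB_W && y < FB_H)
        ssd1289_fb[y][x] = color;
}

/* One table per display; these would live in ILI9325.c and SSD1289.c. */
const display_driver_t ILI9325 = { "ILI9325", FB_W, FB_H, ili9325_draw_pixel };
const display_driver_t SSD1289 = { "SSD1289", FB_W, FB_H, ssd1289_draw_pixel };

/* Shared GUI code is written once, against the interface only. */
void draw_hline(const display_driver_t *d,
                uint16_t x0, uint16_t x1, uint16_t y, uint16_t color)
{
    for (unsigned x = x0; x <= x1; x++)
        d->draw_pixel((uint16_t)x, y, color);
}
```

    With this layout the GUI code (fonts, radioboxes, textboxes, lines) only ever sees a display_driver_t pointer, so picking the display is a one-line change such as draw_hline(&SSD1289, ...).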

    I've been reading through this webpage on headers http://www.gamedev.net/page/resources/_/technical/general-programming/organizing-code-files-in-c-and-c-r1798 which has some great advice.

    I also need to get my head around writing pasm in C. I pushed Catalina out to the limit with this but it could have been easier than it was. One nice thing about the proptool and spin is you can have the pasm and spin code in the same file. You can even copy and paste them so they are near each other so it is a couple of taps on a page up or page down button to swap between the two. That makes debugging a lot easier. At the other extreme, there is the idea of binary blobs of pasm that you have to copy to an SD card as separate files that are precompiled. That involves lots of removing the SD card which is a pain, slower and I ended up solving that by automating downloads of the pasm part as part of the compile process. That got complicated behind the scenes as there was a precompiler that split the pasm part out of the C program, then the pasm part was compiled separately and downloaded separately, and the C part was then compiled and run. I ended up writing an IDE and it had two panes, one for C and one for pasm.

    How is this being done in GCC?

    Can you mix and match pasm like in the Spintool? Is it better to actually write the pasm in C (I think this is possible, right?)

    Or is it better to keep things totally separate - GCC does C and when you start a cog you pass a function the name of a binary file - "mycog.bin" - and that is loaded off the SD card and into a cog?

    In the latter scenario, one would have SimpleIDE and the Proptool open at the same time and use Windows to flip between the two. I did that for a while in Catalina and it is a quite plausible way of doing development albeit slow until we got file transfer to SD card working using xmodem. Not quite as "integrated" as the Spin proptool but at least it worked with existing tools.

    What is the best way to proceed?

    I guess as a practical example, the latest board uses the MCP23008 chip. That will involve getting an I2C driver from the Obex. I haven't looked in detail, but hopefully there is one that conforms to what I call the "Gracey" standard, based on Chip's original mouse/display drivers, where all variables are passed as a contiguous array at cog startup. If so, then the pasm part could be compiled separately and the Spin translated to C.
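
    As a rough sketch of that "contiguous array at cog startup" convention seen from the C side: the mailbox below is one block of 32-bit longs, and the driver only needs the address of the first one. Everything here is hypothetical (the field names, the command codes, and driver_poll, which is a plain C stand-in for the PASM loop so the sketch runs on a PC). On the Propeller the array address would presumably be handed over as the second argument of cognew.

```c
#include <stdint.h>

/* Hypothetical mailbox layout: one long per field, contiguous in memory. */
enum { MB_COMMAND = 0, MB_ADDRESS = 1, MB_DATA = 2, MB_LONGS = 3 };
enum { CMD_IDLE = 0, CMD_WRITE = 1 };

volatile uint32_t mailbox[MB_LONGS];

/* What the stand-in driver last "sent"; a real driver would drive the pins. */
static uint32_t last_address, last_data;

/* Stand-in for one pass of the PASM driver loop. It walks the block as a
 * plain array of longs, the way PASM would step through hub memory from PAR. */
void driver_poll(volatile uint32_t *par)
{
    if (par[MB_COMMAND] == CMD_WRITE) {
        last_address = par[MB_ADDRESS];
        last_data    = par[MB_DATA];
        par[MB_COMMAND] = CMD_IDLE;   /* tell the caller the request is done */
    }
}

/* C-side request: fill in the arguments first, then post the command last
 * so the driver never sees a half-written request. */
void mailbox_write(volatile uint32_t *mb, uint32_t address, uint32_t data)
{
    mb[MB_ADDRESS] = address;
    mb[MB_DATA]    = data;
    mb[MB_COMMAND] = CMD_WRITE;
}
```

    Keeping the layout in one enum means the C caller and a translated driver agree on the offsets (field index times 4 bytes) without any other shared state.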

    And the temptation at that point would be to combine the GUI driver pasm with the I2C driver pasm so it only uses one cog instead of two.

    Thoughts and sage advice would be most appreciated.
  • jazzedjazzed Posts: 11,803
    edited 2012-05-27 20:19
    Dr_Acula wrote: »
    Thoughts and sage advice would be most appreciated.


    I could use some sage advice too since I've been cutting sage, milkweed, and some kind of spiny miserable cactus all afternoon.

    The best generic approach to writing GCC COG drivers right now is to use PASM which we're all familiar with or COG C. COG C programs are a special kind of C file. GCC also supports GAS assembler if you want to try that. GCC inline ASM is based on GAS syntax. Honestly I would avoid any inline ASM in C if at all possible. It just causes trouble.

    We have some COG driver programs written in COG C. VGA, I2C, and others. I want to write a cache cog in COG C.

    Btw, you can include Spin/PASM files in SimpleIDE projects. Some day we'll have an integrated Spin compiler too.

    Thanks,
    --Steve
  • Dr_AculaDr_Acula Posts: 5,484
    edited 2012-05-27 21:18
    Thanks jazzed - good to hear this is all possible to do. Hope you are ok with no cactus prickles.
  • ersmithersmith Posts: 6,097
    edited 2012-05-29 14:28
    jazzed wrote: »
    The best generic approach to writing GCC COG drivers right now is to use PASM which we're all familiar with or COG C. COG C programs are a special kind of C file. GCC also supports GAS assembler if you want to try that. GCC inline ASM is based on GAS syntax. Honestly I would avoid any inline ASM in C if at all possible. It just causes trouble.

    Personally I think using GAS drivers is easiest, but that's probably a matter of taste. For small drivers inline assembly is fine, and actually pretty straightforward. For example here's the real time clock driver, from the GCC library. The actual COG code is in the __asm__ portion, as a string (using the ANSI C convention that strings are automatically concatenated if there are no tokens between them).
    /*
     * very simple COG program to keep the 64 bit _default_ticks variable
     * up to date
     */
    #include <string.h>      /* memcpy, used in the XMM/XMMC startup path */
    #include <propeller.h>
    #include <sys/rtc.h>
    
    __asm__(
    "    .section .cogrtcupdate,\"ax\"\n"
    "L_main\n"
    
    "    rdlong oldlo, default_ticks_ptr\n"
    "    mov    newlo, CNT\n"
    "    cmp    newlo,oldlo wc\n"
    "    add    default_ticks_ptr,#4\n"
    "    rdlong newhi, default_ticks_ptr\n" 
    "    addx   newhi,#0\n"  /* adds in the carry set above */
    "    sub    default_ticks_ptr,#4\n"
    
    /* the sequence here makes sure to write newlo,newhi in that
     * order and in the fewest possible hub windows; if all readers
     * of default_ticks also read lo,hi in the fewest possible
     * hub cycles, then all users will
     * see consistent values
     */
    "    wrlong newlo, default_ticks_ptr\n"
    "    add    default_ticks_ptr,#4\n"
    "    wrlong newhi, default_ticks_ptr\n"
    "    sub    default_ticks_ptr,#4\n"
    "    jmp    #L_main\n"
    "newlo long 0\n"
    "newhi long 0\n"
    "oldlo long 0\n"
    "default_ticks_ptr long __default_ticks\n"
        );
    
    void
    _rtc_start_timekeeping_cog(void)
    {
      extern unsigned int _load_start_cogrtcupdate[];
    
      if (_default_ticks_updated)
        return;  /* someone is already updating the time */
    
      _default_ticks_updated = 1;
    
    #if defined(__PROPELLER_XMMC__) || defined(__PROPELLER_XMM__)
        unsigned int *buffer;
    
        // allocate a buffer in hub memory for the cog to start from
        buffer = __builtin_alloca(2048);
        memcpy(buffer, _load_start_cogrtcupdate, 2048);
        cognew(buffer, 0);
    #else
        cognew(_load_start_cogrtcupdate, 0);
    #endif
    }
    

    This uses the linker magic that automatically turns any section starting or ending with ".cog" into a COG overlay.
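
    For anyone reading the cmp/addx pair in the COG code above, here is roughly the same arithmetic in portable C. It is a sketch for understanding only, not the library's code: ticks64_t, ticks_update and ticks_value are made-up names. The carry from "cmp newlo,oldlo wc" is set when the new low word is (unsigned) smaller than the old one, i.e. when the 32-bit counter wrapped, and "addx newhi,#0" adds that carry into the high word.

```c
#include <stdint.h>

/* Hypothetical mirror of the 64-bit tick variable: low word first, then high. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} ticks64_t;

/* One pass of the update loop; newlo is the current 32-bit counter value. */
void ticks_update(ticks64_t *t, uint32_t newlo)
{
    uint32_t carry = (newlo < t->lo) ? 1u : 0u;  /* cmp newlo,oldlo wc */
    t->hi += carry;                              /* addx newhi,#0      */
    t->lo = newlo;
}

/* Combine the two halves into one 64-bit tick count. */
uint64_t ticks_value(const ticks64_t *t)
{
    return ((uint64_t)t->hi << 32) | t->lo;
}
```

    This only stays correct if the loop runs at least once per counter wrap, which at an 80 MHz clock is roughly every 54 seconds; a dedicated cog spinning in the loop easily meets that.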
  • mindrobotsmindrobots Posts: 6,506
    edited 2012-05-29 14:33
    Awesome, Eric!

    This is a good one for the PropGCC Cookbook!!

    (you are writing the Propeller GCC Cookbook, aren't you???? :smile: )
  • Dr_AculaDr_Acula Posts: 5,484
    edited 2012-05-29 16:34
    Hey, great work Eric. That looks like a fantastic solution.

    I'm presuming no problem with adding a comment eg
    "    wrlong newlo, default_ticks_ptr ' this is a comment\n"
    

    or do comments need to be /* ... */
  • ersmithersmith Posts: 6,097
    edited 2012-05-30 03:27
    Dr_Acula wrote: »
    Hey, great work Eric. That looks like a fantastic solution.

    I'm presuming no problem with adding a comment eg
    "    wrlong newlo, default_ticks_ptr ' this is a comment\n"
    

    or do comments need to be /* ... */

    You can put the comments into the GAS string using single quotes (as in your example), or you can put them outside the string using C/C++ style comments -- either should be fine.
  • average joeaverage joe Posts: 795
    edited 2012-06-03 03:21
    Hey guys, I've been thinking a lot about the C driver and I see one glaring issue. The "BUS" needs to be locked to prevent contention from multiple cogs accessing the bus for different functions. There will be at least 2 cogs accessing the bus : The cog running the cache driver and the "bus master" cog. SO, this begs the question of how to "lock the bus." This is probably only tricky since I have never ACTUALLY used the locks. Any recommendations? It should be as simple as a repeat loop calling lock until it returns the cog's id: before the first bus command? Then releasing the lock after the last bus command? Pass the lock id to use in one of the optional parameters? Or is there a default?

    Started building dual-screen board, should be done in the next few days. Displays are still 2 weeks out. I'm a bit bummed about writing a new driver for the mpc board. It is a nice design and has some promising features! A side note, I'm still a 74hc08 away from firing the board up. That and I'm running out of sockets! As soon as I "secure" my new location I'm ordering a BUNCH of sockets, as well as a few parts I'm in dire need of. I find it cheaper to buy 8 pin, 16 pin, and 40 pin sockets in quantity and cut down to fit. It takes a bit longer but saves a few cents. The 40-pins don't cut down to 32 as well as the 16's to 14's. I'm down to my last 2 loose crystals and need to order a few 6.25s. On the list now is: wireless pair, scribbler2 badges for my wife, and various components TBD. Any thoughts?
  • David BetzDavid Betz Posts: 14,516
    edited 2012-06-03 04:46
    Hey guys, I've been thinking a lot about the C driver and I see one glaring issue. The "BUS" needs to be locked to prevent contention from multiple cogs accessing the bus for different functions. There will be at least 2 cogs accessing the bus : The cog running the cache driver and the "bus master" cog. SO, this begs the question of how to "lock the bus." This is probably only tricky since I have never ACTUALLY used the locks. Any recommendations? It should be as simple as a repeat loop calling lock until it returns the cog's id: before the first bus command? Then releasing the lock after the last bus command? Pass the lock id to use in one of the optional parameters? Or is there a default?

    Try looking at the code at the start of the spi_flash_cache.spin driver. It has code that denominator added to allow sharing of the SPI pins with another driver using a lock. There is no reason why the same code couldn't be used to share your larger bus with another driver. I think you can probably get away with just pasting that into your skeleton-based driver in place of the cache line handling code. If you need help with that let me know.
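
    For the C side of the bus, the locking pattern might look roughly like the sketch below. Note that a hardware lock does not hand back the cog's id: the test-and-set reports whether the lock was already taken, so the loop spins until it reads "was free". The Propeller branch assumes propeller.h provides locknew/lockset/lockclr with lockset returning the previous state and locknew returning -1 when no lock is free; that is from memory, so check the header. The other branch is a C11 stand-in so the pattern can be tried on a PC. bus_transfer and the counter are made up for illustration.

```c
#include <stdbool.h>

#ifdef __PROPELLER__
#include <propeller.h>

static int bus_lock_id = -1;

/* Check out one of the hardware locks (assumed: -1 means none available). */
bool bus_lock_init(void)
{
    bus_lock_id = locknew();
    return bus_lock_id >= 0;
}

/* lockset is assumed to return the previous state: nonzero = already taken. */
bool bus_try_acquire(void) { return !lockset(bus_lock_id); }
void bus_release(void)     { lockclr(bus_lock_id); }

#else
#include <stdatomic.h>

/* PC stand-in: an atomic flag behaves like a single test-and-set lock. */
static atomic_flag bus_flag = ATOMIC_FLAG_INIT;

bool bus_lock_init(void)   { return true; }
bool bus_try_acquire(void) { return !atomic_flag_test_and_set(&bus_flag); }
void bus_release(void)     { atomic_flag_clear(&bus_flag); }
#endif

static unsigned bus_transfers;   /* stand-in for real bus activity */

/* Take the lock before the first bus command, release after the last one. */
void bus_transfer(void)
{
    while (!bus_try_acquire()) {
        /* spin until the other bus user lets go */
    }
    bus_transfers++;             /* real code would drive the bus here */
    bus_release();
}
```

    Both bus users have to agree on the same lock id, so the id would still need to reach the cache driver somehow, for example through one of the cache-param values as Joe suggests.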
  • average joeaverage joe Posts: 795
    edited 2012-06-03 05:23
    I will look at spi_flash_cache and see if I can figure it out. Shouldn't be too hard. Right now I'm building the dual-screen board. There's a trick though. I'm using male headers and ribbon cables. Which means mounting the headers for the display on the BOTTOM of the board. The DB9 port also needs to go on the bottom, so I'm thinking about putting the pin-headers on the bottom and use upside down. Should be interesting!

    *edit*
    2n2222 for reset transistor??
  • Dr_AculaDr_Acula Posts: 5,484
    edited 2012-06-03 06:43
    Which means mounting the headers for the display on the BOTTOM of the board.

    Yikes - watch the polarities of the plugs. Is this so the display can be some distance away from the board?

    Also re 2n2222 yes that will work. BC547 is the sort of "generic" signal transistor here in Australia and I think in the US the 2n2222 is the generic one.
  • average joeaverage joe Posts: 795
    edited 2012-06-03 07:13
    There's a two-fold saving for me using ribbon cables. I have a BUNCH of male pin headers and cables. Not that many female pin headers. I will also be mounting the board in my rack-mount. To use the 3.2" displays I need to fudge the space, so this seems the best way. The problem is: using regular 40-pin PATA cables requires the header be placed on the BOTTOM. A bit tricky but I'm blaming THAT lesson for breaking my display. Until my new displays arrive I'll be using my old one and only one screen. 2 weeks will go fast and there's still much work. The board is 95%. Missing 3.3Vreg and large caps. Still trying to figure out the substitutions since buying caps is not an option and my stock is quickly disappearing.

    I've asked the transistor question a few times I'm sure. I don't plan on using a whole roll of de-soldering braid on THIS board :D Also wondering if the MAX components would interfere with prop-plug? I guess I'll find out soon enough!
  • Dr_AculaDr_Acula Posts: 5,484
    edited 2012-06-03 16:27
    The problem is: using regular 40-pin PATA cables requires the header be placed on the BOTTOM. A bit tricky but I'm blaming THAT lesson for breaking my display.

    Hmm, could be. I guess once it is soldered and before you do the smoke test, do a conductivity test on pins 1,2 and 39 and 40 at the least and check there are no crossovers.

    Re large caps, I get a lot of mine from electronic junk. Computer motherboards are a good source. I once went to a computer store and asked for an old motherboard. The guy's eyes lit up and he showed me a room full of several hundred old PC boxes and told me to take as many as I liked for free as he wanted his room back. He seemed a bit disappointed when I only took one!

    Re the max chip, yes it would interfere with the propplug. Just pull the max3232 out of its socket if you are using the propplug.

    Re
    Not that many female pin headers
    I'll be sending you some freebies but it might be another 2-3 weeks.