New XMM hardware — Parallax Forums

Dr_Acula Posts: 5,484
edited 2012-06-21 11:55 in Propeller 1
I'd like to test out GCC with the latest touchscreen schematic. I've read this page http://code.google.com/p/propgcc/wiki/PropGccExternalMemory

The cog pasm routines are already written and so hopefully are easy to integrate. I see some clever person has added the Dracblade. This was built on software originally written by Cluso99 that had a very simple interface format between spin and the cog. Send 4 longs:

1) is a single ascii character instruction eg "A" is move a block from hub to ram, "B" is move a block from ram to hub etc ("I" is reserved for "initialize")
2) is the hub address
3) is the external ram address
4) is the number of bytes.

and then wait for a long to change to say the operation is completed.
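
A minimal C sketch of that four-long handshake, for illustration only. The struct, the function names and the host-side `fake_driver_ack` stand-in are all hypothetical. It also assumes the long that changes is the command long being cleared to zero, which is what the Dracblade driver quoted later in this thread does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mailbox layout: the four longs described above. */
typedef struct {
    volatile uint32_t command;   /* single ASCII character, 0 when idle or done */
    volatile uint32_t hub_addr;  /* address in hub RAM */
    volatile uint32_t ram_addr;  /* address in external RAM */
    volatile uint32_t count;     /* number of bytes to move */
} xmm_mailbox_t;

/* Post a command. The parameters are written before the command long
   so the driver never sees a new command with stale arguments. */
void xmm_post(xmm_mailbox_t *mb, char cmd,
              uint32_t hub, uint32_t ram, uint32_t count)
{
    mb->hub_addr = hub;
    mb->ram_addr = ram;
    mb->count    = count;
    mb->command  = (uint32_t)(unsigned char)cmd;
}

/* Non-zero while the driver has not yet acknowledged the command. */
int xmm_busy(const xmm_mailbox_t *mb)
{
    return mb->command != 0;
}

/* Host-side stand-in for the cog (assumption): acknowledge by clearing
   the command long. On real hardware the PASM driver does this. */
void fake_driver_ack(xmm_mailbox_t *mb)
{
    mb->command = 0;
}
```

On the Propeller the caller would spin on `xmm_busy()` after `xmm_post()`; the stand-in only exists so the handshake can be tried on a PC.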

The touchscreen external ram driver uses the same interface, so whatever is working on the Dracblade ought to be possible to port over to the touchscreen without too many changes.

Can some kind soul please point me in the direction of the code used for the Dracblade external ram driver - I'm hoping I can just make a few changes and post the new pasm code.

Many thanks in advance.

Comments

  • jazzed Posts: 11,803
    edited 2012-04-24 21:49
    Dr_Acula wrote: »
    Can some kind soul please point me in the direction of the code used for the Dracblade external ram driver - I'm hoping I can just make a few changes and post the new pasm code.

    Many thanks in advance.

    http://propgcc.googlecode.com/hg/loader/spin/dracblade_cache.spin
    Hope you can read it. David will probably be around tomorrow if you have questions.

    You can use BSTC -C to compile your new cache driver and put it in the c:\propgcc\propeller-loader path. This can be compiled in a SimpleIDE project, but you have to copy the new driver to the aforementioned path. The resulting file will be something like newdriver_cache.dat assuming your source is newdriver_cache.spin

    You might consider a temporary config file name like dractouch.cfg or something, I'll use that here for simplicity.

    All you have to do to use newdriver_cache.dat is copy the file in c:\propgcc\propeller-loader\dracblade.cfg to some new file such as c:\propgcc\propeller-loader\dractouch.cfg and then change the "cache-driver: dracblade_cache.dat" line in dractouch.cfg to "cache-driver: newdriver_cache.dat".

    Once you have a new dractouch.cfg all setup, just click the jigsaw puzzle piece in SimpleIDE to reload the board config types.

    Then you can choose dractouch.cfg before clicking the blue right arrow Run Console button F8 or the green right arrow Run button F10.

    There is a tool available for testing a newdriver_cache.spin in the repository.
    It's here: http://propgcc.googlecode.com/hg/loader/spin/test_cache.spin

    You will also need this: http://propgcc.googlecode.com/hg/loader/spin/cache_interface.spin

    In test_cache.spin, you can change the "mdev" object from mdev : "eeprom_cache" to mdev : "newdriver_cache".

    As for the program to compile and load, well, I can't help with that tonight or tomorrow. Busy day ahead for me tomorrow.
    Hope this helps some.

    Thanks,
    --Steve

    Forum software is horrible these days :(
  • Dr_Acula Posts: 5,484
    edited 2012-04-24 22:14
    Wow, thanks for the rapid and detailed response. I'll check it all out. Cheers!
  • Dr_Acula Posts: 5,484
    edited 2012-04-24 22:23
    Code looks very logical.

    Quick question;
    BREAD
            call    #BSTART
    rdloop  call    #read_memory_byte       ' read byte from address into data_8
            wrbyte  data_8,ptr              ' write data_8 to hubaddr ie copy byte to hub
            add     ptr,#1                  ' add 1 to hub address
            add     address,#1              ' add 1 to ram address
            djnz    count,#rdloop           ' loop until done
    BREAD_RET
            ret
    

    The touchscreen is reading and writing in words rather than bytes. Simple answer is to shift count >>1. Can I assume that "count" is always an even number?
  • David Betz Posts: 14,516
    edited 2012-04-25 05:03
    Dr_Acula wrote: »
    Code looks very logical.

    Quick question;
    BREAD
            call    #BSTART
    rdloop  call    #read_memory_byte       ' read byte from address into data_8
            wrbyte  data_8,ptr              ' write data_8 to hubaddr ie copy byte to hub
            add     ptr,#1                  ' add 1 to hub address
            add     address,#1              ' add 1 to ram address
            djnz    count,#rdloop           ' loop until done
    BREAD_RET
            ret
    

    The touchscreen is reading and writing in words rather than bytes. Simple answer is to shift count >>1. Can I assume that "count" is always an even number?
    Yes, you can assume that. In fact, we always read an entire cache line at a time. I think the cache line size for the DracBlade driver is 64 bytes.
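
A small C sketch of the byte-to-word conversion being discussed, assuming an even byte count as confirmed above. The function names are hypothetical, and the loop only mirrors the shape of the BREAD routine; it is not the touchscreen driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert a byte count to a word count. The count is assumed to be even. */
size_t words_from_bytes(size_t byte_count)
{
    assert((byte_count & 1u) == 0);  /* an odd count would silently drop a byte */
    return byte_count >> 1;
}

/* Copy byte_count bytes as 16-bit words, one word per loop pass. */
void copy_block_words(uint16_t *hub, const uint16_t *ram, size_t byte_count)
{
    size_t n = words_from_bytes(byte_count);
    while (n--) {
        *hub++ = *ram++;   /* both addresses advance by 2 bytes per pass */
    }
}
```

With a 64-byte cache line this runs 32 passes instead of 64.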
  • Dr_Acula Posts: 5,484
    edited 2012-04-25 05:16
    No problem. I've been using 512 bytes for graphics - is there a "standard" cache size?

    Also, I'm working through the step-by-step instructions in the Quickstart pdf file and there does not seem to be a "demos" folder. In fact, searching through all the files in the propgcc folder, I have lots of .h files but windows could not find a single .c file nor anything with toggle.* Seems a bit odd.
  • David Betz Posts: 14,516
    edited 2012-04-25 05:58
    Dr_Acula wrote: »
    No problem. I've been using 512 bytes for graphics - is there a "standard" cache size?

    Also, I'm working through the step-by-step instructions in the Quickstart pdf file and there does not seem to be a "demos" folder. In fact, searching through all the files in the propgcc folder, I have lots of .h files but windows could not find a single .c file nor anything with toggle.* Seems a bit odd.
    There is no standard cache line size. Different drivers can use different sizes based on the requirements of the backing store device. For instance, the SD cache driver uses 512 byte cache lines because that's the sector size on the SD card.

    I'm not sure about why you can't find the demos directory. You can find it in Google Code here: http://code.google.com/p/propgcc/source/browse/#hg%2Fdemos

    You'll find a 'toggle' directory under 'demos' with lots of versions of the toggle program.
  • David Betz Posts: 14,516
    edited 2012-04-25 06:36
    Dr_Acula wrote: »
    No problem. I've been using 512 bytes for graphics - is there a "standard" cache size?

    Also, I'm working through the step-by-step instructions in the Quickstart pdf file and there does not seem to be a "demos" folder. In fact, searching through all the files in the propgcc folder, I have lots of .h files but windows could not find a single .c file nor anything with toggle.* Seems a bit odd.
    I forgot to mention this in my previous post but even though 512 byte cache lines are supported by some cache drivers, that doesn't mean they are optimal. It turns out that the SD cache driver has worse performance than some of the other cache drivers partly because of the large cache lines. You may want to experiment with different cache geometries to determine which gives the best performance for your application.
  • Dave Hein Posts: 6,347
    edited 2012-04-25 08:51
    I did some benchmarking of various XMMC cache implementations several months ago, and it seemed like the number of cache lines in the cache memory was very important. In an 8K cache with a 512-byte cache line, there are only 16 cache lines in hub RAM. Unfortunately, SD access is always done in chunks of 512 bytes, so there's not a lot of flexibility there. With flash memory I tried a 32x256 configuration and 64x128. In general, the 64 128-byte cache lines performed the best.
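
The geometry arithmetic above can be checked with a short C sketch. The helper names are hypothetical, and `cache_line_base` assumes a power-of-two line size; it says nothing about how the real cache drivers pick or replace lines.

```c
#include <assert.h>
#include <stdint.h>

/* Number of cache lines that fit in a cache of the given size. */
uint32_t cache_line_count(uint32_t cache_bytes, uint32_t line_bytes)
{
    assert(line_bytes != 0 && cache_bytes % line_bytes == 0);
    return cache_bytes / line_bytes;
}

/* External address of the start of the line that contains addr.
   Assumes line_bytes is a power of two. */
uint32_t cache_line_base(uint32_t addr, uint32_t line_bytes)
{
    assert(line_bytes != 0 && (line_bytes & (line_bytes - 1)) == 0);
    return addr & ~(line_bytes - 1);
}
```

For an 8K cache, `cache_line_count` gives 16 lines at 512 bytes, 32 at 256 and 64 at 128, matching the configurations mentioned.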
  • Dr_Acula Posts: 5,484
    edited 2012-04-25 16:03
    128 bytes will be fine. For this particular driver, moving data is fast but there is some setup code to load all the 161 counters, so a hypothetical 16 byte cache would not be very optimal. Back to coding... :)
  • David Betz Posts: 14,516
    edited 2012-04-26 20:50
    How are you doing on this? Do you need help?
  • Dr_Acula Posts: 5,484
    edited 2012-04-26 23:26
    Yea, you got me! I'm stuck.

    Ok, what I have is a board in front of me doing all sorts of cool icons and touchscreen things. But the design is about to be superseded by a new design (based on a brilliant idea from jazzed) and that board won't be here till next week. Coding the pasm driver can be done on the board I have but the code will end up being changed very soon. So I think I will wait till the new board arrives. It will need some code to drive the 74HC237 (even though that will only be a few lines of code, I put some debugging leds on the board as I've never used this chip before). On the plus side, I've ordered 10 boards so if they work I've got spares I can give away to anyone wanting to help out.

    Which brings up the next step - debugging. On the touchscreen I wrote a slow routine that takes the propeller font from inside the propeller and puts it on the touchscreen. This works without an SD card so you can display a message to say there is no SD card.

    But for debugging C right at the very start, I wonder if the serial port might be easier as it can be used to get things working before the display is working.

    So...

    would anyone have a very simple C program that sends "Hello World" back up the P30/P31 serial lines to a terminal program? And if possible, maybe add a 5 second delay so there is time to do a download and then fire up a terminal program to check the data coming back.

    I'm thinking of the way I got the current version working, which was to code every pin logic level change in Spin first, and then port each spin routine over to pasm. So I guess that might be one way to do it in C as well. On the other hand, you might look at the pasm and say - hey that is easy to port over. Maybe it is - it does work rather like a cache. The display startup is still in spin though and so would go to C. And so I guess I'll be asking dumb things like how to do dira and outa in C.
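
On the dira and outa question, a hedged C sketch: as far as I recall, PropGCC's `propeller.h` exposes the registers as `DIRA` and `OUTA`, and the compiler predefines `__PROPELLER__`; treat both as assumptions to check against your install. The `pin_*` helper names are hypothetical, and the `#else` branch only provides stand-in variables so the logic can be tried on a PC.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __PROPELLER__
#include <propeller.h>            /* assumed to provide DIRA and OUTA on the real chip */
#else
static volatile uint32_t DIRA;    /* host-side stand-ins, not real registers */
static volatile uint32_t OUTA;
#endif

/* Roughly Spin's dira[pin]~~ : make the pin an output. */
void pin_output(int pin)
{
    assert(pin >= 0 && pin < 32);
    DIRA |= (1u << pin);
}

/* Roughly Spin's outa[pin] := level */
void pin_write(int pin, int level)
{
    assert(pin >= 0 && pin < 32);
    if (level)
        OUTA |= (1u << pin);
    else
        OUTA &= ~(1u << pin);
}

/* Roughly Spin's !outa[pin] : invert the pin's output state. */
void pin_toggle(int pin)
{
    assert(pin >= 0 && pin < 32);
    OUTA ^= (1u << pin);
}
```

Porting a Spin routine that was written as a sequence of pin level changes then becomes a sequence of `pin_write` calls, which keeps the one-step-at-a-time approach described above.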
  • jazzed Posts: 11,803
    edited 2012-04-26 23:50
    Dr_Acula wrote: »
    would anyone have a very simple C program that sends "Hello World" back up the P30/P31 serial lines to a terminal program? And if possible, maybe add a 5 second delay so there is time to do a download and then fire up a terminal program to check the data coming back.

    SimpleIDE packages come with a hello demo for P30/31. Other demo programs are also included.

    Download here: http://propside.googlecode.com/files/Simple-IDE_0-6-7_setup.zip
    User guide is here: https://sites.google.com/site/propellergcc/simpleide/user-s-guide
  • Dr_Acula Posts: 5,484
    edited 2012-04-27 00:58
    Q: That easy?
    A: YES!

    See attached screenshot. Download the program, F10 and open the terminal.

    Well, hats off to the GCC team and congratulations. You guys have made C super easy to use on the propeller.
    Attachment: gcc.jpg (1018 x 740, 136.5K)
  • Dr_Acula Posts: 5,484
    edited 2012-04-27 23:14
    This is more a memo to myself for later. Notes re building a new xmm driver
    1) Toggle a led in C
    2) Toggle a led by calling a pasm routine from C, and the pasm routine toggles the led
    3) Write xmm driver primitives in C to send a byte to xmm, and return a byte
    4) Port each routine over to cog pasm code
    5) Test running this as actual xmm

    I'm up to step 2. Use link from post #7 David Betz for the demos
    Scroll to the "toggle" program.
    Grab both the C code and the Spin code.

    From the way these are separate I surmise that GCC is working with "binary blobs", precompiled in a spin compiler.

    This is the spin bit
    {{
    toggle.spin
    Propgcc - PASM toggle demo
    
    Simple PASM routine to demonstrate interaction between PASM subroutines and a PROPGCC main program.
    It lets code running in the C address space exchange data values with PASM code running
    in a COG.
    
    The C program has visibility/access to the mailbox variables as normal C variables.
    The PASM program has visibility/access to the mailbox variables through the PAR register
    initialized by COGNEW and the RDLONG/RDWORD/RDBYTE and WRLONG/WRWORD/WRBYTE instructions.
    
    C program starts the PASM routine via cognew() function, passing start address and PAR register value.
    PAR register should be the address of a STATIC data area of LONGs in the C program.
    
    For this example:
    
    PAR -> static unsigned int delay;
    static unsigned int pins;
    static unsigned int loop_cnt;
    static unsigned int pasm_done;
    
    The first three variables are used as input to the PASM routine, the last is used to act as a semaphore
    back to the C routine.
    
    This could have been written as more efficient PASM code but for the examples, I was going for maximum clarity at this point.
    
    Copyright (c) 2011, Steve Denson, Rick Post
    MIT Licensed. Terms of use below.
    
    }}
    
    pub start(pinptr)
    cognew(@pasm, pinptr)
    
    dat org 0
    
    pasm
    mov mailbox_ptr,par ' save the pointer to the STATIC parameter (HUB) memory area
    ' the PAR register is initialized by the cognew() function and is a pointer to
    ' the first STATIC int declared in the C code as the mailbox
    ' mailbox_ptr will be changed as the code executes. You can reload
    ' the initial pointer from PAR if you ever need it to point to
    ' the start of the mailbox again
    rdlong waitdelay, mailbox_ptr ' read the wait delay from HUB - it is initialized by the C program
    ' in C program: delay = CLKFREQ>>1;
    add mailbox_ptr,#4 ' point to the next LONG in HUB
    rdlong pins,mailbox_ptr ' the caller's PIN mask as initialized in the C program
    ' in C program: pins = 0x3fffffff;
    add mailbox_ptr, #4 ' point to the next LONG (4 bytes each)
    rdlong loopcounter,mailbox_ptr ' set the loop count as provided by the C program
    ' in C program: loop_cnt = 20;
    add mailbox_ptr, #4 ' point to the next LONG which is the semaphore we are setting when done
    
    mov dira, pins ' set pins provided by C program to OUTPUT
    mov nextcnt, waitdelay
    add nextcnt, cnt ' best to add cnt last
    :loop
    xor outa, pins ' toggle pins
    waitcnt nextcnt, waitdelay ' wait for user specified delay
    djnz loopcounter,#:loop ' loop until the C provided counter hits zero
    
    mov done_flag,#1 ' set the semaphore to one
    wrlong done_flag, mailbox_ptr ' and save it back into hub memory via the ptr provided by the C program
    ' in C program: while(!pasm_done) to test for update from PASM
    jmp #$ ' to infinity and BEYOND!!
    
    ' these do not need to be in any particular order or have particular names. There is no relationship between these
    ' local copies of the C variable except when you create via the PAR register and HUB instructions
    ' there is no address resolution or linkage done by propgcc or the loader
    '
    mailbox_ptr long 0 ' working ptr into the HUB area - reload from PAR if needed
    pins long 0 ' local copy of the user's PIN mask
    waitdelay long 0 ' local copy of the user's delay
    loopcounter long 0 ' local copy of the user's loop counter
    done_flag long 0 ' local copy of the semaphore to return to the C program
    nextcnt long 0 ' local variable to save target value from waitcnt
    
    {{
    
    MIT Licensed.
    
    +--------------------------------------------------------------------
    TERMS OF USE: MIT License
    +--------------------------------------------------------------------
    Permission is hereby granted, free of charge, to any person obtaining
    a copy of this software and associated documentation files
    (the "Software"), to deal in the Software without restriction,
    including without limitation the rights to use, copy, modify, merge,
    publish, distribute, sublicense, and/or sell copies of the Software,
    and to permit persons to whom the Software is furnished to do so,
    subject to the following conditions:
    
    The above copyright notice and this permission notice shall be
    included in all copies or substantial portions of the Software.
    
    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
    MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
    IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
    CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
    TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
    SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
    +--------------------------------------------------------------------
    }}
    

    and this is the C bit
    /**
     * @file toggle.c
     * This program demonstrates starting a PASM COG
     * and being able to pass parameters from a C program to a PASM program
     * and from PASM back to C.
     *
     * C to PASM Mailbox example
     *
     * WARNING: This code makes all IO pins except 30/31 toggle HIGH/LOW. Check if this is OK
     * for the board you are using.
     *
     *
     * to use:
     * from directory containing source
     * make clean
     * make
     * propeller-load -pn -t -r toggle.elf  (where n is port #)
     *
     * Copyright (c) 2011, Steve Denson, Rick Post
     * MIT Licensed - terms of use below.
     */
    
    #include <stdio.h>
    #include <propeller.h>                    // propeller specific definitions
    
    // the STATIC HUB mailbox for communication to PASM routine
    //
    static unsigned int delay; // a pointer to this gets passed to the PASM code as the PAR register
    static unsigned int pins;
    static int loop_cnt;
    static volatile int pasm_done;   // written by the PASM cog, so the compiler must re-read it on every poll
    
    // C stub function to start the PASM routine
    // need to be able to provide the entry point to the PASM
    // and a pointer to the STATIC HUB mailbox
    // the cognew function in the propeller.c library returns the COG #
    //
    int start(unsigned int *pinptr)
    {
        // The label binary_toggle_dat_start is automatically placed
        // on the cog code from toggle.dat by objcopy (see the Makefile).
        extern unsigned int binary_toggle_dat_start[];
        return cognew(&binary_toggle_dat_start, pinptr);
    }
    
    void usleep(int t)
    {
        if(t < 10)  // very small t values will cause a hang
            return; // don't bother function delay is likely enough
        waitcnt((CLKFREQ/1000000)*t+CNT);
    }
    // C main function
    // LMM model
    void main (int argc,  char* argv[])
    {
        printf("hello, world!\n");            // let the lead LMM COG say hello
        delay = CLKFREQ>>1;                    // set the delay rate in the STATIC mailbox
                                            // this is actually the duty cycle of the blink 0.5 sec on, 0.5 sec off
        pins = 0x3fFFffff;                     // set the PIN mask into the STATIC mailbox
                                            // light up all pins except 30 & 31 since we don't know board config
        loop_cnt = 20;                        // number of time through the loop (20 toggles, 10 on/off cycles)
        pasm_done = 0;                        // make sure it's zero since we'll sit and wait on it to change in a few lines
        printf ("New COG# %d started.\n",start(&delay)); // start a new COG passing a pointer to the STATIC mailbox structure
        printf ("waiting for semaphore to be set by PASM code.\n");
        while (!pasm_done)
        {
          usleep(10);                        // wait for the PASM code to clear the loop counter
        }    
        printf("goodbyte, world!\n");
        while(1);                            //let the original COG sit and spin
    }
    
    /*
        +--------------------------------------------------------------------
          TERMS OF USE: MIT License
        +--------------------------------------------------------------------
        Permission is hereby granted, free of charge, to any person obtaining
        a copy of this software and associated documentation files
        (the "Software"), to deal in the Software without restriction,
        including without limitation the rights to use, copy, modify, merge,
        publish, distribute, sublicense, and/or sell copies of the Software,
        and to permit persons to whom the Software is furnished to do so,
        subject to the following conditions:
    
        The above copyright notice and this permission notice shall be
        included in all copies or substantial portions of the Software.
    
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
        EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
        MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
        IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
        CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
        TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
        SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
        +--------------------------------------------------------------------
    */
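
One design note on the mailbox, as a hedged sketch rather than a change to the demo: C does not promise that four separately declared statics land next to each other in declaration order, even though the demo evidently works that way in practice. Grouping them in a struct fixes the order. The type name and the host-side `fake_pasm` stand-in are hypothetical; the stand-in only imitates how the cog steps through the mailbox 4 bytes at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The same four longs as the toggle demo, grouped so their order is fixed. */
typedef struct {
    volatile uint32_t delay;      /* PAR + 0  */
    volatile uint32_t pins;       /* PAR + 4  */
    volatile uint32_t loop_cnt;   /* PAR + 8  */
    volatile uint32_t pasm_done;  /* PAR + 12, written by the cog */
} toggle_mailbox_t;

/* Host-side stand-in for the PASM code (assumption, for testing on a PC).
   It reads three longs in order, toggles a pretend OUTA loop_cnt times,
   then sets the semaphore in the fourth long. Returns the final OUTA value. */
uint32_t fake_pasm(void *par)
{
    volatile uint32_t *p = (volatile uint32_t *)par;
    uint32_t waitdelay = *p++;    /* rdlong waitdelay, then add #4 */
    uint32_t pins      = *p++;    /* rdlong pins, then add #4 */
    uint32_t loops     = *p++;    /* rdlong loopcounter, then add #4 */
    uint32_t outa = 0;

    (void)waitdelay;              /* no real timing on the host */
    while (loops--)
        outa ^= pins;             /* xor outa, pins */
    *p = 1;                       /* wrlong done_flag, mailbox_ptr */
    return outa;
}
```

On the Propeller the address of such a struct would be passed to the cog in place of `&delay`.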
    

    Now in Spin, that would be one program, not two. This sort of thing can be glued into one program: I wrote an IDE for Catalina that put some tags around the pasm/spin bit, and then wrote a simple program that split it all up, sending the pasm bits off to BST and the C bit off to the C compiler.

    So here is a question. If there is a "COG" mode in SimpleIDE and also a "LMM" mode, can you combine both in the same program? Have some C code that goes into a cog, and some C code that is the "main" program. Maybe with some sort of tag around the bit of code that goes into the cogs. Then it becomes one program, not two like "toggle" above?

    If that is possible, then the whole program can be pure C and you don't have to learn pasm.

    I'd have not said such a thing until recently when heater pointed out his FFT was running almost the same speed compiled from C as it was in native pasm.

    Meanwhile - thinking about that some more, I think I'll do things in a different order. I have xmm code already working in a spin program. So I think I'll just grab the pasm bit plus the minimal spin so it compiles and turn that into a binary blob. Then translate the Spin driver into C.
  • Dr_Acula Posts: 5,484
    edited 2012-04-28 01:10
    More testing.

    Well, the SimpleIDE might be simple to use but it is very clever behind the scenes. A little drop-down menu in the top right corner with all the hardware boards. The Dracblade works for a XMMC program right out of the box. All those hardware configs had to be coded, and that must have been a lot of work.

    Also the text highlighting in the code. I've written code to do that and it is very complicated. Again, it is all there and working and easy to work out without even reading a help file.

    Ok, a new driver.

    This is the Dracblade driver using latches to talk to an external ram chip
    DAT
    '' +--------------------------------------------------------------------------+
    '' | Dracblade Ram Driver (with grateful acknowledgements to Cluso)           |
    '' +--------------------------------------------------------------------------+
                            org     0
    tbp2_start    ' setup the pointers to the hub command interface (saves execution time later)
                                          '  +-- These instructions are overwritten as variables after start
    comptr                  mov     comptr, par     ' -|  hub pointer to command                
    hubptr                  mov     hubptr, par     '  |  hub pointer to hub address            
    ramptr                  add     hubptr, #4      '  |  hub pointer to ram address            
    lenptr                  mov     ramptr, par     '  |  hub pointer to length                 
    errptr                  add     ramptr, #8      '  |  hub pointer to error status           
    cmd                     mov     lenptr, par     '  |  command  I/R/W/G/P/Q                  
    hubaddr                 add     lenptr, #12     '  |  hub address                           
    ramaddr                 mov     errptr, par     '  |  ram address                           
    len                     add     errptr, #16     '  |  length                                
    err                     nop                     ' -+  error status returned (=0=false=good) 
    
    
    ' Initialise hardware (unlike the triblade, just tristates everything and read/write set the pins)
    init                    mov     err, #0                  ' reset err=false=good
                            mov     dira,zero                ' tristate the pins
    
    done                    wrlong  err, errptr             ' status  =0=false=good, else error x
                            wrlong  zero, comptr            ' command =0 (done)
    ' wait for a command (pause short time to reduce power)
    pause                   mov     ctr, delay      wz      ' if =0 no pause
                  if_nz     add     ctr, cnt
                  if_nz     waitcnt ctr, #0                 ' wait for a short time (reduces power)
                            rdlong  cmd, comptr     wz      ' command ?
                  if_z      jmp     #pause                  ' not yet
    ' decode command
                            cmp     cmd, #"R"       wz      ' R = read block
                  if_z      jmp     #rdblock
                            cmp     cmd, #"W"       wz      ' W = write block
                  if_z      jmp     #wrblock
                            cmp     cmd, #"N"       wz      ' N= led on
                  if_z      jmp     #led_turn_on
                            cmp     cmd, #"F"       wz      ' F = led off
                  if_z      jmp     #led_turn_off
                            cmp     cmd, #"H"       wz      ' H sets the high latch
                  if_z      jmp     #sethighlatch
                            mov     err, cmd                ' error = cmd (unknown command)
                            jmp     #done
    
    
    tristate                mov     dira,zero                ' all inputs to zero
                            jmp     #done
    
    ' turn led on
    led_turn_on             or      HighLatch,ledpin        ' set the led pin high
                            jmp     #OutputHighLatch         ' send this out
    
    led_turn_off            andn    HighLatch,ledpin        ' set the led pin low
                            jmp     #OutputHighLatch         ' send this out
    
    ' set high address bytes with command H, pass value in third variable of the DoCmd
    ' 4 bytes - masks off all but bits 16 to 23
    
    sethighlatch            call #ram_open                  ' gets address value in 'address'
                            shr  address,#16                ' shift right by 16 places
                            and  address,#$FF               ' ensure rest of bits zero
                            mov  HighLatch,address          ' put value into HighLatch
                            jmp  #OutputHighLatch           ' and output it
    
    '---------------------------------------------------------------------------------------------------------
    'Memory Access Functions
    
    rdblock                 call    #ram_open               ' get variables from hub variables
    rdloop                  call    #read_memory_byte       ' read byte from address into data_8
                            wrbyte  data_8,hubaddr          ' write data_8 to hubaddr ie copy byte to hub
                            add     hubaddr,#1              ' add 1 to hub address
                            add     address,#1              ' add 1 to ram address
                            djnz    len,#rdloop             ' loop until done
                            jmp     #init                   ' reinitialise
    
    wrblock                 call    #ram_open                        
    wrloop                  rdbyte  data_8, hubaddr         ' copy byte from hub
                            call    #write_memory_byte      ' write byte from data_8 to address
                            add     hubaddr,#1              ' add 1 to hub address
                            add     address,#1              ' add 1 to ram address
                            djnz    len,#wrloop             ' loop until done
                            jmp     #init                   ' reinitialise
    
    ram_open                rdlong  hubaddr, hubptr         ' get hub address
                            rdlong  ramaddr, ramptr         ' get ram address
                            rdlong  len, lenptr             ' get length
                            mov     err, #5                 ' err=5
                            mov     address,ramaddr         ' cluso's variable 'ramaddr' to dracblade variable 'address'
    ram_open_ret            ret
      
    read_memory_byte        call #RamAddress                ' sets up the latches with the correct ram address
                            mov dira,LatchDirection2        ' for reads so P0-P7 tristate till do read
                            mov outa,GateHigh               ' actually ReadEnable but they are the same
                            andn outa,GateHigh              ' set gate low
                            nop                             ' short delay to stabilise
                            nop
                            mov data_8, ina                 ' read SRAM
                            and data_8, #$FF                ' extract 8 bits
                            or  outa,GateHigh               ' set the gate high again
    read_memory_byte_ret    ret
    
    write_memory_byte       call #RamAddress                ' sets up the latches with the correct ram address
                            mov outx,data_8                 ' get the byte to output
                            and outx, #$FF                  ' ensure upper bytes=0
                            or outx,WriteEnable             ' or with correct 138 address
                            mov outa,outx                   ' send it out
                            andn outa,GateHigh              ' set gate low
                            nop                             ' no nop doesn't work, one does, so put in two to be sure
                            nop                             ' another NOP
                            or outa,GateHigh                ' set it high again
    write_memory_byte_ret   ret
    
    RamAddress ' sets up the ram latches. Assumes high latch A16-A18 low so only accesses 64k of ram
                            mov dira,LatchDirection         ' set up the pins for programming latch chips
                            mov outx,address                ' get the address into a temp variable
                            and outx,#$FF                   ' mask the low byte
                            or  outx,LowAddress             ' or with 138 low address
                            mov outa,outx                   ' send it out
                            andn outa,GateHigh              ' set gate low
                                                            ' ?? a NOP
                            or outa,GateHigh                ' set it high again  
                                                            ' now repeat for the middle byte     
                            mov outx,address                ' get the address into a temp variable
                            shr outx,#8                     ' shift right by 8 places
                            and outx,#$FF                   ' mask the low byte
                            or  outx,MiddleAddress          ' or with 138 middle address
                            mov outa,outx                   ' send it out
                            andn outa,GateHigh              ' set gate low
                            or outa,GateHigh                ' set it high again 
    RamAddress_ret          ret
    
    OutputHighLatch ' sends out HighLatch to the 374 that does A16-19, led and the 4 spare outputs
                            mov     dira,latchdirection     ' setup active pins 138 and bus
                            mov     outa,HighLatch          ' send out HighLatch
                            or      outa,HighAddress        ' or with the high address
                            andn    outa,GateHigh           ' set gate low
                            or      outa,GateHigh           ' set the gate high again
    OutputHighLatch_ret     jmp     #tristate               ' set pins tristate
    
    
    
    
    
    delay                   long    80                                    ' waitcnt delay to reduce power (#80 = 1uS approx)
    ctr                     long    0                                     ' used to pause execution (lower power use) & byte counter
    GateHigh                long    %00000000_00000000_00000001_00000000  ' HC138 gate high, all others must be low
    Outx                    long    0                                     ' for temp use, same as n in the spin code
    LatchDirection          long    %00000000_00000000_00001111_11111111 ' 138 active, gate active and 8 data lines active
    LatchDirection2         long    %00000000_00000000_00001111_00000000 ' for reads so data lines are tristate till the read
    LowAddress              long    %00000000_00000000_00000101_00000000 ' low address latch = xxxx010x and gate high xxxxxxx1
    MiddleAddress           long    %00000000_00000000_00000111_00000000 ' middle address latch = xxxx011x and gate high xxxxxxx1
    HighAddress             long    %00000000_00000000_00001001_00000000 ' high address latch = xxxx100x and gate high xxxxxxx1
    'ReadEnable long    %00000000_00000000_00000001_00000000 ' /RD = xxxx000x and gate high xxxxxxx1
                                                            ' commented out as the same as GateHigh
    WriteEnable             long    %00000000_00000000_00000011_00000000 ' /WE = xxxx001x and gate high xxxxxxx1
    Zero                    long    %00000000_00000000_00000000_00000000 ' for tristating all pins
    data_8                  long    %00000000_00000000_00000000_00000000 ' for code compatibility with the zicog driver
    address                 long    %00000000_00000000_00000000_00000000 ' address for ram chip
    ledpin                  long    %00000000_00000000_00000000_00001000 ' to turn on led
    HighLatch               long    %00000000_00000000_00000000_00000000 ' static value for the 374 latch that does the led, A16-A19 and the other 4 outputs
    
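The latch writes done by RamAddress and OutputHighLatch can be modelled on a PC to check the bit patterns. What follows is only a minimal C sketch, not part of either driver. The constants are copied from the long declarations above; the names latch_words, split_address and strobe_low are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Pin words copied from the driver's long declarations.
   Going by the driver comments: P0-P7 = data bus, P8 = '138 gate,
   P9-P11 = '138 select lines. */
#define GATE_HIGH      0x00000100u  /* GateHigh      */
#define LOW_ADDRESS    0x00000500u  /* LowAddress    */
#define MIDDLE_ADDRESS 0x00000700u  /* MiddleAddress */
#define HIGH_ADDRESS   0x00000900u  /* HighAddress   */

/* Hypothetical helper type: the three words sent to OUTA, one per latch. */
typedef struct {
    uint32_t low;     /* A0-A7   plus low latch select    */
    uint32_t middle;  /* A8-A15  plus middle latch select */
    uint32_t high;    /* A16 up  plus high latch select   */
} latch_words;

/* Split a RAM address into the three OUTA words (gate still high). */
latch_words split_address(uint32_t address)
{
    latch_words w;
    w.low    = (address         & 0xFFu) | LOW_ADDRESS;
    w.middle = ((address >> 8)  & 0xFFu) | MIDDLE_ADDRESS;
    w.high   = ((address >> 16) & 0xFFu) | HIGH_ADDRESS;
    return w;
}

/* Same word with the gate pulled low, as "andn outa,GateHigh" does. */
uint32_t strobe_low(uint32_t word)
{
    return word & ~GATE_HIGH;
}
```

For example, split_address(0x012345) gives 0x545, 0x723 and 0x901. In the driver each word is written to OUTA, the gate bit is dropped and then raised again to clock the latch.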

    This is the very clever code David Betz wrote: http://propgcc.googlecode.com/hg/loader/spin/dracblade_cache.spin
    DAT
            org   $0
    
    ' initialization structure offsets
    ' $0: pointer to a two long mailbox (command long, then cache line address long)
    ' $4: pointer to where to store the cache lines in hub ram
    ' $8: number of bits in the cache line index if non-zero (default is DEFAULT_INDEX_WIDTH)
    ' $c: number of bits in the cache line offset if non-zero (default is DEFAULT_OFFSET_WIDTH)
    ' note that $4 must be at least 2^($8+$c) bytes in size
    ' the cache line mask is returned in $0
    
    init_vm mov     t1, par             ' get the address of the initialization structure
            rdlong  pvmcmd, t1          ' pvmcmd is a pointer to the virtual address and read/write bit
            mov     pvmaddr, pvmcmd     ' pvmaddr is a pointer into the cache line on return
            add     pvmaddr, #4
            add     t1, #4
            rdlong  cacheptr, t1        ' cacheptr is the base address in hub ram of the cache
            add     t1, #4
            rdlong  t2, t1 wz
      if_nz mov     index_width, t2     ' override the index_width default value
            add     t1, #4
            rdlong  t2, t1 wz
      if_nz mov     offset_width, t2    ' override the offset_width default value
    
            mov     index_count, #1
            shl     index_count, index_width
            mov     index_mask, index_count
            sub     index_mask, #1
    
            mov     line_size, #1
            shl     line_size, offset_width
            mov     t1, line_size
            sub     t1, #1
            wrlong  t1, par
    
            jmp     #vmflush
    
    fillme  long    0[128-fillme]           ' first 128 cog locations are used for a direct mapped page table
    
            fit   128
    
            ' initialize the cache lines
    vmflush movd    :flush, #0
            mov     t1, index_count
    :flush  mov     0-0, empty_mask
            add     :flush, dstinc
            djnz    t1, #:flush
    
            ' start the command loop
    waitcmd mov     dira, #0                ' release the pins for other SPI clients
            wrlong  zero, pvmcmd
    :wait   rdlong  vmpage, pvmcmd wz
      if_z  jmp     #:wait
    
            shr     vmpage, offset_width wc ' carry is now one for read and zero for write
            mov     set_dirty_bit, #0       ' make mask to set dirty bit on writes
            muxnc   set_dirty_bit, dirty_mask
            mov     line, vmpage            ' get the cache line index
            and     line, index_mask
            mov     hubaddr, line
            shl     hubaddr, offset_width
            add     hubaddr, cacheptr       ' get the address of the cache line
            wrlong  hubaddr, pvmaddr        ' return the address of the cache line
            movs    :ld, line
            movd    :st, line
    :ld     mov     vmcurrent, 0-0          ' get the cache line tag
            and     vmcurrent, tag_mask
            cmp     vmcurrent, vmpage wz    ' z set means there was a cache hit
      if_nz call    #miss                   ' handle a cache miss
    :st     or      0-0, set_dirty_bit      ' set the dirty bit on writes
            jmp     #waitcmd                ' wait for a new command
    
    ' line is the cache line index
    ' vmcurrent is current page
    ' vmpage is new page
    ' hubaddr is the address of the cache line
    miss    movd    :test, line
            movd    :st, line
    :test   test    0-0, dirty_mask wz
      if_z  jmp     #:rd                    ' current page is clean, just read new page
            mov     vmaddr, vmcurrent
            shl     vmaddr, offset_width
            call    #BWRITE                 ' write current page
    :rd     mov     vmaddr, vmpage
            shl     vmaddr, offset_width
            call    #BREAD                  ' read new page
    :st     mov     0-0, vmpage
    miss_ret ret
    
    ' pointers to mailbox entries
    pvmcmd          long    0       ' on call this is the virtual address and read/write bit
    pvmaddr         long    0       ' on return this is the address of the cache line containing the virtual address
    
    cacheptr        long    0       ' address in hub ram where cache lines are stored
    vmpage          long    0       ' page containing the virtual address
    vmcurrent       long    0       ' current page in selected cache line (same as vmpage on a cache hit)
    line            long    0       ' current cache line index
    set_dirty_bit   long    0       ' DIRTY_BIT set on writes, clear on reads
    
    zero            long    0       ' zero constant
    dstinc          long    1<<9    ' increment for the destination field of an instruction
    t1              long    0       ' temporary variable
    t2              long    0       ' temporary variable
    
    tag_mask        long    !(1<<DIRTY_BIT) ' includes EMPTY_BIT
    index_width     long    DEFAULT_INDEX_WIDTH
    index_mask      long    0
    index_count     long    0
    offset_width    long    DEFAULT_OFFSET_WIDTH
    line_size       long    0                       ' line size in bytes
    empty_mask      long    (1<<EMPTY_BIT)
    dirty_mask      long    (1<<DIRTY_BIT)
    
    '----------------------------------------------------------------------------------------------------
    '
    ' BSTART
    '
    ' setup the high order address byte
    '
    '----------------------------------------------------------------------------------------------------
    
    BSTART
            mov     address,vmaddr          ' get the high address byte
            shr     address,#16             ' shift right by 16 places
            and     address,#$FF            ' ensure rest of bits zero
            mov     HighLatch,address       ' put value into HighLatch
            mov     dira,LatchDirection     ' setup active pins 138 and bus
            mov     outa,HighLatch          ' send out HighLatch
            or      outa,HighAddress        ' or with the high address
            andn    outa,GateHigh           ' set gate low
            or      outa,GateHigh           ' set the gate high again
            mov     ptr, hubaddr            ' hubaddr = hub page address
            mov     address, vmaddr
            mov     count, line_size
    BSTART_RET
            ret
    
    '----------------------------------------------------------------------------------------------------
    '
    ' BREAD
    '
    ' vmaddr is the virtual memory address to read
    ' hubaddr is the hub memory address to write
    ' count is the number of bytes to read
    '
    ' trashes count, ptr
    '
    '----------------------------------------------------------------------------------------------------
    
    BREAD
            call    #BSTART
    rdloop  call    #read_memory_byte       ' read byte from address into data_8
            wrbyte  data_8,ptr              ' write data_8 to hubaddr ie copy byte to hub
            add     ptr,#1                  ' add 1 to hub address
            add     address,#1              ' add 1 to ram address
            djnz    count,#rdloop           ' loop until done
    BREAD_RET
            ret
    
    '----------------------------------------------------------------------------------------------------
    '
    ' BWRITE
    '
    ' vmaddr is the virtual memory address to write
    ' hubaddr is the hub memory address to read
    ' count is the number of bytes to write
    '
    ' trashes count, ptr
    '
    '----------------------------------------------------------------------------------------------------
    
    BWRITE
            call    #BSTART
    wrloop  rdbyte  data_8, ptr             ' copy byte from hub
            call    #write_memory_byte      ' write byte from data_8 to address
            add     ptr,#1                  ' add 1 to hub address
            add     address,#1              ' add 1 to ram address
            djnz    count,#wrloop           ' loop until done
    BWRITE_RET
            ret
    
    ' input parameters to BREAD and BWRITE
    vmaddr      long    0       ' virtual address
    hubaddr     long    0       ' hub memory address to read from or write to
    
    ' temporaries used by BREAD and BWRITE
    ptr         long    0
    count       long    0
    
    ''From Dracblade driver for talking to a ram chip via three latches
    '' Modified code from Cluso's triblade
    
    '---------------------------------------------------------------------------------------------------------
    'Memory Access Functions
    
    read_memory_byte        call #RamAddress                ' sets up the latches with the correct ram address
                            mov dira,LatchDirection2        ' for reads so P0-P7 tristate till do read
                            mov outa,GateHigh               ' actually ReadEnable but they are the same
                            andn outa,GateHigh              ' set gate low
                            nop                             ' short delay to stabilise
                            nop
                            mov data_8, ina                 ' read SRAM
                            and data_8, #$FF                ' extract 8 bits
                            or  outa,GateHigh               ' set the gate high again
    read_memory_byte_ret    ret
    
    write_memory_byte       call #RamAddress                ' sets up the latches with the correct ram address
                            mov outx,data_8                 ' get the byte to output
                            and outx, #$FF                  ' ensure upper bytes=0
                            or outx,WriteEnable             ' or with correct 138 address
                            mov outa,outx                   ' send it out
                            andn outa,GateHigh              ' set gate low
                            nop                             ' zero NOPs doesn't work, one does, so use two to be sure
                            nop                             ' another NOP
                            or outa,GateHigh                ' set it high again
    write_memory_byte_ret   ret
    
    RamAddress ' sets up the low and middle ram latches (A0-A15). The high latch is set separately in BSTART
                            mov dira,LatchDirection         ' set up the pins for programming latch chips
                            mov outx,address                ' get the address into a temp variable
                            and outx,#$FF                   ' mask the low byte
                            or  outx,LowAddress             ' or with 138 low address
                            mov outa,outx                   ' send it out
                            andn outa,GateHigh              ' set gate low
                            or outa,GateHigh                ' set it high again
                                                            ' now repeat for the middle byte
                            mov outx,address                ' get the address into a temp variable
                            shr outx,#8                     ' shift right by 8 places
                            and outx,#$FF                   ' mask the low byte
                            or  outx,MiddleAddress          ' or with 138 middle address
                            mov outa,outx                   ' send it out
                            andn outa,GateHigh              ' set gate low
                            or outa,GateHigh                ' set it high again
    RamAddress_ret          ret
    
    GateHigh                long    %00000000_00000000_00000001_00000000  ' HC138 gate high, all others must be low - also used as ReadEnable
    outx                    long    0                                     ' for temp use, same as n in the spin code
    LatchDirection          long    %00000000_00000000_00001111_11111111 ' 138 active, gate active and 8 data lines active
    LatchDirection2         long    %00000000_00000000_00001111_00000000 ' for reads so data lines are tristate till the read
    LowAddress              long    %00000000_00000000_00000101_00000000 ' low address latch = xxxx010x and gate high xxxxxxx1
    MiddleAddress           long    %00000000_00000000_00000111_00000000 ' middle address latch = xxxx011x and gate high xxxxxxx1
    HighAddress             long    %00000000_00000000_00001001_00000000 ' high address latch = xxxx100x and gate high xxxxxxx1
    WriteEnable             long    %00000000_00000000_00000011_00000000 ' /WE = xxxx001x and gate high xxxxxxx1
    data_8                  long    %00000000_00000000_00000000_00000000 ' for code compatibility with the zicog driver
    address                 long    %00000000_00000000_00000000_00000000 ' address for ram chip
    HighLatch               long    %00000000_00000000_00000000_00000000 ' static value for the 374 latch that does the led, A16-A19 and the other 4 outputs
    
                            fit     496
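
The lookup in the command loop is a direct mapped cache: the page number picks one table slot, the slot's tag is compared with the page, and a miss writes back a dirty line before reading the new one. The C model below is a rough sketch of that logic only, with no hardware access. DIRTY_BIT, EMPTY_BIT and the two width defaults are defined outside the listing shown here, so the values used are assumptions, and cache_model, cache_flush and cache_access are hypothetical names.

```c
#include <stdint.h>

/* Assumed values: these constants are not visible in the listing above. */
#define DIRTY_BIT    31
#define EMPTY_BIT    30
#define INDEX_WIDTH  7   /* assumed default: 128 lines, one per table slot */
#define OFFSET_WIDTH 6   /* assumed default: 64 byte lines */

#define INDEX_COUNT (1u << INDEX_WIDTH)
#define DIRTY_MASK  (1u << DIRTY_BIT)
#define EMPTY_MASK  (1u << EMPTY_BIT)

typedef struct {
    uint32_t tags[INDEX_COUNT]; /* stands in for the 128 cog longs */
    uint32_t fills;             /* number of BREAD calls           */
    uint32_t writebacks;        /* number of BWRITE calls          */
} cache_model;

/* vmflush: mark every line empty. */
void cache_flush(cache_model *c)
{
    for (uint32_t i = 0; i < INDEX_COUNT; i++)
        c->tags[i] = EMPTY_MASK;
    c->fills = 0;
    c->writebacks = 0;
}

/* One pass of the command loop. Returns the byte offset of the cache
   line from cacheptr (the driver returns cacheptr plus this offset). */
uint32_t cache_access(cache_model *c, uint32_t vaddr, int is_write)
{
    uint32_t page = vaddr >> OFFSET_WIDTH;        /* vmpage              */
    uint32_t line = page & (INDEX_COUNT - 1u);    /* line                */
    uint32_t tag  = c->tags[line] & ~DIRTY_MASK;  /* vmcurrent, tag_mask */

    if (tag != page) {                            /* cache miss          */
        if (c->tags[line] & DIRTY_MASK)
            c->writebacks++;                      /* BWRITE old page     */
        c->fills++;                               /* BREAD new page      */
        c->tags[line] = page;                     /* clears dirty bit    */
    }
    if (is_write)
        c->tags[line] |= DIRTY_MASK;              /* set_dirty_bit       */
    return line << OFFSET_WIDTH;
}
```

With these assumed widths, addresses 0x1000 and 0x3000 map to the same line, so reading one after writing the other forces a write-back followed by a fill.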
    

    And this is the complete driver code for the touchscreen:
    DAT
    '' +-----------------------------------------------------------------------------------------------+
    '' | Touchblade 161 Ram Driver (with grateful acknowledgements to Cluso and Average Joe)           |
    '' +-----------------------------------------------------------------------------------------------+
                            org     0
    tbp2_start    ' setup the pointers to the hub command interface (saves execution time later)
                                          '  +-- These instructions are overwritten as variables after start
    comptr                  mov     comptr, par     ' -|  hub pointer to command                
    hubptr                  mov     hubptr, par     '  |  hub pointer to hub address            
    ramptr                  add     hubptr, #4      '  |  hub pointer to ram address            
    lenptr                  mov     ramptr, par     '  |  hub pointer to length                 
    errptr                  add     ramptr, #8      '  |  hub pointer to error status           
    cmd                     mov     lenptr, par     '  |  command  I/R/W/G/P/Q                  
    hubaddr                 add     lenptr, #12     '  |  hub address                           
    ramaddr                 mov     errptr, par     '  |  ram address                           
    len                     add     errptr, #16     '  |  length                                
    err                     nop                     ' -+  error status returned (=0=false=good) 
    
    
    ' Initialise hardware: reset the error status and tristate all the pins
    init                    mov     err, #0                  ' reset err=false=good
                            mov     dira,zero                ' tristate the pins with the cog dira
    
    done                    wrlong  err, errptr             ' status  =0=false=good, else error x
                            wrlong  zero, comptr            ' command =0 (done)
    ' wait for a command (pause short time to reduce power)
    pause
    '                        mov     ctr, delay      wz      ' if =0 no pause
    '              if_nz     add     ctr, cnt
    '              if_nz     waitcnt ctr, #0                 ' wait for a short time (reduces power)
                            rdlong  cmd, comptr     wz      ' command ?
                  if_z      jmp     #pause                  ' not yet
    ' decode command
                            cmp     cmd, #"S"       wz      ' hub to ram
                  if_z      jmp     #pasmhubtoram           
                            cmp     cmd, #"T"       wz      ' ram to hub
                  if_z      jmp     #pasmramtohub
                            cmp     cmd, #"U"       wz      ' ram to display
                  if_z      jmp     #pasmramtodisplay
                            cmp     cmd, #"V"       wz      ' hub to display
                  if_z      jmp     #pasmhubtodisplay           
                            cmp     cmd, #"E"       wz      ' convert 3 byte .raw format to 2 byte .ili format - hub to hub
                  if_z      jmp     #rawtoiliformat
                            cmp     cmd, #"F"       wz      ' convert 3 byte .bmp format BGR to 2 byte ili format (same as E but order reversed)
                  if_z      jmp     #bmptoiliformat              
     '                       cmp     cmd, #"W"       wz      ' lcdwritecom in pasm, not working
     '             if_z      jmp     #pasmlcdwritecom
                            cmp     cmd, #"X"       wz      ' merge icon and background based on a mask
                  if_z      jmp     #mergeicons
                   
                            cmp     cmd, #"I"       wz      ' init
                  if_z      jmp     #init     
                            mov     err, cmd                ' error = cmd (unknown command)
                            jmp     #done
                            
    ' ----------------- common routines -------------------------------------
    
    get_values              rdlong  hubaddr, hubptr         ' get hub address
                            rdlong  ramaddr, ramptr         ' get ram address
                            rdlong  len, lenptr             ' get length
                            mov     err, #5                 ' err=5
    get_values_ret          ret
    
                        ' ??come to this with possibly all pins tristated so need to make P16-P20 high before changing the 138 value 
    set138                  shl     pasm_n,#25              ' pass n =0 to 7
                            or      dira,maskP0P20P25P27    ' make P25-P27 outputs as well as P0 to P20
                            andn    outa,mask138            ' make these three pins low
                            or      outa,pasm_n             ' set the 138 pins
    set138_ret              ret
    
    
    load161pasm                                             ' uses ramaddr
                            mov     pasm_n,#7
                            call    #set138                 ' deselect previous 138 value
                            or      dira,maskP0P20          ' %00000000_00011111_11111111_11111111         ' P0-P18 enabled for output plus P19,P20 
                            and     outa,maskP18low         ' %00001111_11111000_00000000_00000000         ' preserve previous values but set A0-18 low   
                            or      outa,ramaddr            ' output address to 161 chips
                            andn    outa,maskP19            ' set pin 19 low =  161 clock
                            mov     pasm_n,#1               ' 161 load low
                            call    #set138                 ' set it low
                            or      outa,maskP19            ' set pin 19 high = 161 clock
                            or      outa,maskP16P20         ' %00000000_00011111_00000000_00000000         ' set P16-P20 high prior to changing 138 
                            mov     pasm_n,#2               ' 161 load high and back to mem transfer
                            call    #set138                 ' send out
    load161pasm_ret         ret
    
    stop                   jmp     #stop                  ' for debugging
    
    
    
    ' ------------------ single letter commands  -------------------------------------
    ' command S
    pasmhubtoram            call    #get_values             ' get hubaddr,ramaddr,len
                            call    #load161pasm                ' load the 161 counters with ramaddr
    hubtoram_loop           and     outa,maskP16P31         '%11111111_11111111_00000000_00000000       ' clear for output                   
                            rdword  data_16,hubaddr         ' get the word from hub
                            and     data_16,maskP0P15       ' mask to a word only
                            or      outa,data_16            ' send out the word to P0-P15
                            andn    outa,maskP20            ' set write low
                            add     hubaddr,#2              ' increment by 2 bytes = 1 word. Put this here for small delay while writes
                            or      outa,maskP20            ' write high
                            andn    outa,maskP19            ' clock 161 low
                            or      outa,maskP19            ' clock 161 high
                            djnz    len,#hubtoram_loop      ' loop this many times
                            jmp     #init                   ' tristate pins and listen for commands
    
    ' command T
    pasmramtohub            call    #get_values             ' get hubaddr,ramaddr,len
                            call    #load161pasm            ' load the 161 counters with ramaddr
                            and     dira,maskP16P27         ' %00001111_11111111_00000000_00000000 set P0-P15 as inputs   
                            andn    outa,maskP16            ' memory /rd low
    ramtohub_loop           mov     data_16,ina             ' get the data
                            wrword  data_16,hubaddr         ' move data to hub
                            andn    outa,maskP19            ' clock 161 low
                            or      outa,maskP19            ' clock 161 high
                            add     hubaddr,#2              ' increment the hub address 
                            djnz    len,#ramtohub_loop
                            or      outa,maskP16            ' memory /rd high  
                            or      dira,maskP0P15          ' %00000000_00000000_11111111_11111111 restore P0-P15 as outputs
                            jmp     #init                   ' tristate pins and listen for commands
    
    ' command U
    pasmramtodisplay        call    #get_values             ' get hubaddr,ramaddr,len
                            call    #load161pasm            ' load the 161 counters with ramaddr
                            or      outa,maskP18            ' ILI_RS high
                            andn    outa,maskP16            ' memory /rd low 
                            and     dira,maskP16P27         ' disable prop pins %00001111_11111111_00000000_00000000 set P0-P15 as inputs    
    ramtodisplay_loop       andn    outa,maskP17            ' ILI write low
                            or      outa,maskP17            ' ILI write high
                            andn    outa,maskP19            ' clock 161 low
                            or      outa,maskP19            ' clock 161 high
                            djnz    len,#ramtodisplay_loop
                            or      outa,maskP16            ' memory /rd high  
                            or      dira,maskP0P15          ' %00000000_00000000_11111111_11111111 restore P0-P15 as outputs
                            jmp     #init
    
    ' command V
    pasmhubtodisplay        call    #get_values             ' get hubaddr,ramaddr,len
                            or      outa,maskP16P20         ' %00000000_00011111_00000000_00000000         ' set P16-P20 high prior to changing 138
                            mov     pasm_n,#7
                            call    #set138                 ' deselect previous 138 value
                            mov     pasm_n,#2               ' mem transfer
                            call    #set138                 ' send out
    hubtodisplay_loop       and     outa,maskP16P31         '%11111111_11111111_00000000_00000000       ' clear for output                   
                            rdword  data_16,hubaddr         ' get the word from hub
                            and     data_16,maskP0P15       ' mask to a word only
                            or      outa,data_16            ' send out the word to P0-P15
                            andn    outa,maskP17            ' ILI write low
                            or      outa,maskP17            ' ILI write high
                            add     hubaddr,#2              ' one word
                            djnz    len,#hubtodisplay_loop
                            jmp     #init
    
    'command E
    RawtoILIformat          ' takes a .raw 3 byte RRRRRRRR GGGGGGGG BBBBBBBB and converts to 2 byte RRRRRGGG GGGBBBBB
                            ' pass hubaddr, ramaddr and len
                            ' hubaddr is source location, len is number of pixels
                            ' ramaddr is destination in hub (messy naming) and length is 2/3 of blocklength
                            call    #get_values ' gets hubaddress, ramaddress and len (ramaddress is the hub destination here)
    rawloop
                            rdbyte red,hubaddr
                            add hubaddr,#1
                            rdbyte green,hubaddr
                            add hubaddr,#1
                            rdbyte blue,hubaddr
                            add hubaddr,#1
                            call #rgbtoili
                            wrbyte ililow,ramaddr
                            add ramaddr,#1
                            wrbyte ilihigh,ramaddr
                            add ramaddr,#1
                            djnz    len,#rawloop            ' loop until done 
                            jmp     #init                   ' set pins to tristate
    
    RGBtoILI                ' pass red,green, blue, returns ililow and ilihigh
                            shr     red,#3                  ' 000RRRRR 
                            shl     red,#3                  ' RRRRR000 
                            shr     green,#2                ' 00GGGGGG
                            mov     ilihigh,green           ' ilihigh = 00GGGGGG
                            shr     ilihigh,#3              ' ilihigh = 00000GGG
                            or      ilihigh,red             ' ilihigh = RRRRRGGG
                            and     green,#%00000111        ' 00000GGG
                            shl     green,#5                ' GGG00000
                            mov     ililow,green            ' ililow = GGG00000
                            shr     blue,#3                 ' blue = 000BBBBB
                            or      ililow,blue             ' ililow = GGGBBBBB
    RGBtoILI_ret            ret
    
    BMPtoILIformat          ' takes a .bmp 3 byte BBBBBBBB GGGGGGGG RRRRRRRR and converts to 2 byte RRRRRGGG GGGBBBBB
                            ' same as E above but BGR instead of RGB
                            ' pass hubaddr, ramaddr and len
                            ' hubaddr is source location, len is number of pixels
                            ' ramaddr is destination in hub (messy naming) and length is 2/3 of blocklength
                            call    #get_values ' gets hubaddress, ramaddress and len (ramaddress is the hub destination here)
    bmploop
                            rdbyte blue,hubaddr
                            add hubaddr,#1
                            rdbyte green,hubaddr
                            add hubaddr,#1
                            rdbyte red,hubaddr
                            add hubaddr,#1
                            call #rgbtoili
                            wrbyte ililow,ramaddr
                            add ramaddr,#1
                            wrbyte ilihigh,ramaddr
                            add ramaddr,#1
                            djnz    len,#bmploop            ' loop until done 
                            jmp     #init                   ' set pins to tristate
    ' **** command X *********************
    
    MergeIcons              call    #get_values ' gets hubaddress, ramaddress,len which are used here as background,icon,mask
                            mov     pasm_n,#59               ' do a single row
    mergeiconsloop          rdbyte  ililow,len                 ' reuse ililow, so this is rdbyte mask,maskcounter (low byte of the mask pixel)
                            and     ililow,#%11111             ' mask off low 5 bits and use just the blue as this is a grayscale bitmap
                            rdword  red,hubaddr              ' reuse red, so actually this is rdword background,backgroundcounter                        
                            cmp     ililow,#%10000   wc       ' carry set if the 5 bit value is below %10000 (mid level gray, 128 on the 8 bit scale)
                  if_c      jmp     #mergeskip
                            rdword  green,ramaddr            ' reuse green, so this is rdword iconpixel, iconpixelcounter 
                            wrword  green,hubaddr            ' if replace, then move icon pixel to the background     
    mergeskip               add     hubaddr,#2
                            add     ramaddr,#2
                            add     len,#2
                            djnz    pasm_n,#mergeiconsloop            ' loop until done 
                            jmp     #init                   'set pins to tristate 
    
                            
    
    'pasmlcdwritecom         call    #get_values             ' use hubaddr as the data
    '                        or      dira,maskP0P20          ' set these pins high (pass all pins tristated)
    '                        or      outa,maskP0P20          '  set pins high
    '                        mov     pasm_n,#2               '  mem transfer
    '                        call    #set138                 ' set the 138
    '                        andn    outa,maskP18            ' P18 ILIRS low
    '                        and     outa,maskP16P31         ' set P0-P15 low
    '                        or      outa,hubaddr            ' send out the data
    '                        andn    outa,maskP17            ' ILI write low
    '                        or      outa,maskP17            ' ILI write high
    '                        jmp     #init                   ' set pins to tristate  
    
    ' variables
    pasm_n                  long    0                                    ' general purpose value
    data_16                 long    0                                    ' general purpose value
    ililow                  long    0                                    ' low data byte 
    ilihigh                 long    0                                    ' high data byte 
    red                     long    0                                    ' red, green blue variables
    green                   long    0
    blue                    long    0           
    
    ' constants
    Zero                    long    %00000000_00000000_00000000_00000000 ' used in several places
    mask138                 long    %00001110_00000000_00000000_00000000 ' mask for the three 138 pins   
    maskP0P20               long    %00000000_00011111_11111111_11111111 ' P0-P18 enabled for output plus P19,P20    
    maskP18low              long    %00001111_11111000_00000000_00000000 ' P0-P18 low
    maskP16                 long    %00000000_00000001_00000000_00000000 ' pin 16
    maskP17                 long    %00000000_00000010_00000000_00000000 ' pin 17
    maskP18                 long    %00000000_00000100_00000000_00000000 ' pin 18
    maskP19                 long    %00000000_00001000_00000000_00000000 ' pin 19
    maskP20                 long    %00000000_00010000_00000000_00000000 ' pin 20
    maskP16P31              long    %11111111_11111111_00000000_00000000 ' pin 16 to pin 31
    maskP0P20P25P27         long    %00001110_00011111_11111111_11111111  ' enable all pins as outputs except SD pins
    maskP0P15               long    %00000000_00000000_11111111_11111111 ' for masking words
    maskP16P20              long    %00000000_00011111_00000000_00000000
    maskP16P27              long    %00001111_11111111_00000000_00000000
                            fit     496      
    

    I guess the first thing to point out is there are a lot of extra functions in that code that can be ignored. At the core are just two routines - move a block of data from hub to ram, and move a block of data from ram to hub.

    In every routine there is one common call, which collects the variables from the calling program
    call    #ram_open               ' get variables from hub variables 
    

    and just to cause myself and others confusion, in the new code this routine has changed names to
    call    #get_values             ' get hubaddr,ramaddr,len   
    

    It does the same thing though which is this
    get_values              rdlong  hubaddr, hubptr         ' get hub address
                            rdlong  ramaddr, ramptr         ' get ram address
                            rdlong  len, lenptr             ' get length
                            mov     err, #5                 ' err=5
    get_values_ret          ret
    

    Next thing is a startup routine. David Betz calls this BSTART
    call    #BSTART
    

    For the dracblade this sets the high latch with address A16-A18.

    On the touchscreen, the equivalent code is
                            call    #load161pasm                ' load the 161 counters with ramaddr 
    

    which loads up the 161 counters with the starting address.

    I'll just note here that in the middle of David's BSTART routine there are some cache lines of code eg
            mov     address,vmaddr          ' get the high address byte
    

    and a couple more at the end. So those will need to be replicated in the load161pasm driver code.

    So once that is done, I think the aim is to replace this read loop
    BREAD
            call    #BSTART
    rdloop  call    #read_memory_byte       ' read byte from address into data_8
            wrbyte  data_8,ptr              ' write data_8 to hubaddr ie copy byte to hub
            add     ptr,#1                  ' add 1 to hub address
            add     address,#1              ' add 1 to ram address
            djnz    count,#rdloop           ' loop until done
    BREAD_RET
            ret
    

    with this code
    pasmhubtoram            call    #get_values             ' get hubaddr,ramaddr,len
                            call    #load161pasm                ' load the 161 counters with ramaddr
    hubtoram_loop           and     outa,maskP16P31         '%11111111_11111111_00000000_00000000       ' clear for output                   
                            rdword  data_16,hubaddr         ' get the word from hub
                            and     data_16,maskP0P15       ' mask to a word only
                            or      outa,data_16            ' send out the byte to P0-P15
                            andn    outa,maskP20            ' set write low
                            add     hubaddr,#2              ' increment by 2 bytes = 1 word. Put this here for small delay while writes
                            or      outa,maskP20            ' write high
                            andn    outa,maskP19            ' clock 161 low
                            or      outa,maskP19            ' clock 161 high
                            djnz    len,#hubtoram_loop      ' loop this many times
                            jmp     #init                   ' tristate pins and listen for commands
    

    A couple of things to note there. First, all my routines finish with a jmp #init. All David's routines finish with a ret. Whatever happens, ultimately the cog routine must end by tristating all the propeller pins, ie
    init                    mov     err, #0                  ' reset err=false=good
                            mov     dira,zero                ' tristate the pins
    

    And the other thing to note is that there are two ram chips so it is reading in a word at a time, not a byte at a time. So hubaddr etc are incremented by 2 rather than by 1 each loop. So somewhere along the line, len will need to be divided by 2 for cache access (but not for access from within C code).
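As a rough illustration of the byte-versus-word point, here is a host-side C model of the hub-to-RAM loop. It is only a sketch: `bytes_to_words` and `hub_to_ram_model` are hypothetical names, and rounding an odd byte count up to a whole word is an assumption, not something the driver above does.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: convert a byte count into the number of 16-bit
   words a word-wide transfer loop would iterate over. */
static uint32_t bytes_to_words(uint32_t len_bytes)
{
    return (len_bytes + 1u) / 2u;   /* assumption: round odd lengths up */
}

/* Host-side model of the hub-to-RAM loop: one 16-bit word per pass,
   with the hub address advancing by 2 bytes each time. */
static void hub_to_ram_model(const uint8_t *hub, uint16_t *ram, uint32_t len_bytes)
{
    uint32_t words = bytes_to_words(len_bytes);
    for (uint32_t i = 0; i < words; i++) {
        uint16_t lo = hub[2u * i];
        /* pad the final high byte with zero if the length is odd */
        uint16_t hi = (2u * i + 1u < len_bytes) ? hub[2u * i + 1u] : 0;
        ram[i] = (uint16_t)(lo | (hi << 8));   /* little-endian, like rdword */
    }
}
```

So a cache request for N bytes would pass something like `bytes_to_words(N)` as the loop count, while code that already thinks in words would pass its count unchanged.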

    I suppose we have to work out what this board should be called. The easy answer might be a Touchblade, but I don't want to monopolise the "touch" part as I am sure many other touchscreens will end up supported by GCC.

    So I think this encapsulates all the code and bits that need to change.

    A general question first. The ILI driver code does multiple things in one cog, because each routine is small so lots can be fitted into a cog.

    Is it possible to "share" the cache code and the display driver code in one cog?

    If so, how do we send commands like Cluso99's single letter commands, and also send a cache command?

    OR (and probably easier), split it up and have the cache code run in one cog, and do all the video driver code in another cog.

    If we go for option #2, then the code becomes much simpler. Just ram to hub and hub to ram. I might start with that first and then think about combining cogs (to save a cog) later.
  • David BetzDavid Betz Posts: 14,516
    edited 2012-04-28 02:49
    Wow! It looks like you've done a lot of work on this already. You asked if there was a way to add a command parser to handle the single character commands that Cluso uses in his driver. While it doesn't use characters, all of the PropGCC cache drivers already have a command dispatch mechanism based on integer command IDs. This table could be extended to support more commands. You could certainly add commands to drive the display if all of the code will fit in the COG.
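To make the idea of an integer command table concrete, here is a small C sketch of a dispatcher that could be extended with display commands. The real PropGCC cache drivers are written in PASM, so this is only a model of the concept; every name and ID below (`ID_CACHE_READ`, `ID_LCD_FILL`, `dispatch`) is hypothetical and not taken from the driver source.

```c
#include <stdint.h>
#include <stddef.h>

typedef uint32_t (*cmd_handler)(uint32_t arg);

/* Placeholder handlers; a real driver would move data or drive the display. */
static uint32_t cmd_cache_read(uint32_t arg) { return arg; }
static uint32_t cmd_lcd_fill(uint32_t arg)   { return arg ^ 0xFFFFu; }

/* Hypothetical command IDs. ID 0 is left unused so that a zero mailbox
   value can mean "no command pending". */
enum { ID_CACHE_READ = 1, ID_LCD_FILL = 2, ID_COUNT = 3 };

static const cmd_handler handlers[ID_COUNT] = {
    NULL,             /* 0: no command */
    cmd_cache_read,   /* 1 */
    cmd_lcd_fill      /* 2 */
};

/* Look up the handler for an ID and run it.
   Returns 0 on success, -1 for an unknown or empty ID. */
static int dispatch(uint32_t id, uint32_t arg, uint32_t *result)
{
    if (id >= ID_COUNT || handlers[id] == NULL)
        return -1;
    *result = handlers[id](arg);
    return 0;
}
```

Adding a display command then amounts to adding one handler and one table entry, provided the extra code still fits in the cog.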
  • denominatordenominator Posts: 242
    edited 2012-04-28 05:41
    Dr_Acula wrote: »
    Is it possible to "share" the cache code and the display driver code in one cog?

    Yes. The SD cache driver does just that. It supports the XMM kernels by handling code-memory read cache misses, and it also handles generic block read/write commands for the C library's SD file system interface.

    The trick is found in lib/drivers/load_sd_driver.c/file_io.c - in that code, you'll notice code similar to this:
    #ifndef __PROPELLER_LMM__
    extern uint16_t _xmm_mbox_p;
    
    ...
    
            sd_mbox = (uint32_t *)(uint32_t)_xmm_mbox_p;
    #endif
    
    
    static uint32_t __attribute__((section(".hubtext"))) do_cmd(uint32_t cmd)
    {
        sd_mbox[0] = cmd;
        while (sd_mbox[0]);
        return sd_mbox[1];
    }
    
    

    The C library uses this code to send SD sector commands to the underlying cache driver. These are the extended commands that David mentioned. Check out the full source at http://code.google.com/p/propgcc/source/browse/lib/drivers/file_io.c. You might also want to check out the SD cache driver at http://code.google.com/p/propgcc/source/browse/loader/spin/sd_cache.spin.

    Note that this code also supports using a separate driver that does just the sector read/writes needed by the library (your "option #2"). This works when you need SD card access in LMM mode and also when you need SD card access and you're using a different caching mechanism (most likely because all the other caching mechanisms are faster than the SD card cache). In these cases, the split driver is necessary because there is no SD cache driver!

    To see this, check out http://code.google.com/p/propgcc/source/browse/lib/drivers/sd_driver.s and notice how similar it is to the aforementioned SD cache driver - just the cache code is missing. This file is loaded manually by the library in http://code.google.com/p/propgcc/source/browse/lib/drivers/load_sd_driver.c. (BTW, the sd_driver.spin driver in loader/spin looks almost 100% identical to this driver - but this driver is not used by the C library, it is solely used by the loader.)

    Also note that it works 100% fine when both the SD cache driver and the SD library are loaded at the same time (again, your "option #2"). This wastes a cog, but it does work.
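The handshake in do_cmd can be tried out on a PC with a small model like the one below. It is a sketch under stated assumptions: the command ID and the `fake_driver_poll` function are made up for illustration, and on real hardware the driver runs in its own cog in parallel rather than being called from the wait loop.

```c
#include <stdint.h>

/* Hypothetical command IDs; the real values live in the driver source. */
enum { CMD_NONE = 0, CMD_ECHO = 1 };

/* Two-long mailbox: [0] = command (0 means idle), [1] = result. */
static volatile uint32_t mbox[2];

/* Stand-in for the driver cog: pick up a pending command, write a
   result, then clear the command long to signal completion. */
static void fake_driver_poll(void)
{
    uint32_t cmd = mbox[0];
    if (cmd == CMD_NONE)
        return;
    mbox[1] = cmd + 100u;   /* made-up result so the caller can check it */
    mbox[0] = CMD_NONE;     /* done: releases the caller's wait loop */
}

/* Same shape as do_cmd above: post the command, wait for the command
   long to return to zero, then read the result. */
static uint32_t do_cmd_model(uint32_t cmd)
{
    mbox[0] = cmd;
    while (mbox[0])
        fake_driver_poll(); /* on hardware the other cog does this */
    return mbox[1];
}
```

The key point is that the command long doubles as the busy flag, so no separate status word is needed.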
  • Dr_AculaDr_Acula Posts: 5,484
    edited 2012-04-28 06:58
    Interesting. Sharing SD XMM and SD for file I/O is similar to the problem of sharing external ram for XMM and for display updates.

    Speaking of SD cards, I see there is an option in the dropdown menu "Dracblade-SDXMMC". What does that do?

    And are there any xmm models out there where the program is stored on an SD card? Sounds a bit crazy, but with caching, it ought to be no slower than storing a program in serial sram or serial flash.
  • jazzedjazzed Posts: 11,803
    edited 2012-04-28 07:58
    Dr_Acula wrote: »
    Interesting. Sharing SD XMM and SD for file I/O is similar to the problem of sharing external ram for XMM and for display updates.

    Speaking of SD cards, I see there is an option in the dropdown menu "Dracblade-SDXMMC". What does that do?

    And are there any xmm models out there where the program is stored on an SD card? Sounds a bit crazy, but with caching, it ought to be no slower than storing a program in serial sram or serial flash.

    In SimpleIDE version 0-6-7 the -SDXMMC and -SDLOAD board types don't work. I have a fix and will post it later today.

    The user's guide explains these, but for convenience ....


    Program->Run (F10) and Program->Run Console (F8) buttons:
    Dracblade-SDXMMC should start your XMMC program run from SDCard.
    Dracblade-SDLOAD should put your program on SDCard and load it to SRAM and run.


    Program->Burn (F11) button:
    The boot loader is programmed to EEPROM for booting either SDXMMC or SDLOAD method.
    The AUTORUN.PEX program can be replaced by copying it to SDcard - then you reset the board to run it.


    Program->Build (F9) button:
    Using the build hammer with SDXMMC or SDLOAD selected will create a.pex.
    The a.pex file can be copied to SDcard as AUTORUN.PEX for booting after burning the EEPROM.


    Tools->Send File to Target SDCard:
    The download button next to the build hammer lets you send any file to SDcard via a serial protocol.
    By default in XMM modes it will create a new AUTORUN.PEX program.
    You can choose to send that or any file serially to the target SDCard.
    I often use it to just copy AUTORUN.PEX to the SDCard using the filesystem.
  • David BetzDavid Betz Posts: 14,516
    edited 2012-04-28 07:59
    Dr_Acula wrote: »
    Interesting. Sharing SD XMM and SD for file I/O is similar to the problem of sharing external ram for XMM and for display updates.

    Speaking of SD cards, I see there is an option in the dropdown menu "Dracblade-SDXMMC". What does that do?

    And are there any xmm models out there where the program is stored on an SD card? Sounds a bit crazy, but with caching, it ought to be no slower than storing a program in serial sram or serial flash.
    You guessed it. That is exactly what SD XMMC mode does. It runs a program directly from the SD cards for systems that don't have any other external memory. The downside is that the SD sector size is 512 bytes which is much larger than the optimal cache line size.
  • Dr_AculaDr_Acula Posts: 5,484
    edited 2012-04-28 16:28
    You guessed it. That is exactly what SD XMMC mode does.

    Prof Braino suggested something similar recently too. Thinking of the way a touchscreen OS might work, you have a "main" program that waits for the user to touch a key. Then it loads up an operation, eg a calculator, or a picture viewer, and that would need some cache changes. But while a calculator is running, no cache updates would be needed as it would all fit in a few kilobytes. So maybe SD cache is a "generic" option for many boards?

    If so, could one think about an SD cache driver that worked on the dracblade by defining pins 12,13,14,15 and worked on the various demoboards by defining pins 0,1,2,3 and worked on the Touch161 board by defining pins 24,25,26,27?

    Maybe you already have this for the demoboards?

    Re the 512 byte sector size, all my programs seem to end up with an array "byte sdbuffer[512]" so if you had that and you read in 512 bytes and the next cache read request happened to be already in the buffer, could you detect that and not have to read the sd card again?
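The check being asked about could look roughly like this in C. This is a sketch only: `read_sector_stub` is a hypothetical stand-in for a real SD sector read (it just fills the buffer with a pattern and counts calls), and none of these names come from the PropGCC SD driver.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u
#define NO_SECTOR   0xFFFFFFFFu      /* marker: nothing buffered yet */

static uint8_t  sdbuffer[SECTOR_SIZE];
static uint32_t buffered_sector = NO_SECTOR;
static uint32_t card_reads = 0;      /* counts how often the "card" is hit */

/* Hypothetical stand-in for a real sector read: fills the buffer with
   the low byte of the sector number so results are easy to check. */
static void read_sector_stub(uint32_t sector, uint8_t *dst)
{
    memset(dst, (int)(sector & 0xFFu), SECTOR_SIZE);
    card_reads++;
}

/* Return one byte, touching the card only when the requested address
   falls outside the sector that is already in sdbuffer. */
static uint8_t read_byte_cached(uint32_t byte_addr)
{
    uint32_t sector = byte_addr / SECTOR_SIZE;
    if (sector != buffered_sector) {
        read_sector_stub(sector, sdbuffer);
        buffered_sector = sector;
    }
    return sdbuffer[byte_addr % SECTOR_SIZE];
}
```

With this, reading addresses 0 and 511 costs one card access, and only address 512 triggers a second one.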
  • David BetzDavid Betz Posts: 14,516
    edited 2012-04-28 16:32
    Dr_Acula wrote: »
    Prof Braino suggested something similar recently too. Thinking of the way a touchscreen OS might work, you have a "main" program that waits for the user to touch a key. Then it loads up an operation, eg a calculator, or a picture viewer, and that would need some cache changes. But while a calculator is running, no cache updates would be needed as it would all fit in a few kilobytes. So maybe SD cache is a "generic" option for many boards?

    If so, could one think about an SD cache driver that worked on the dracblade by defining pins 12,13,14,15 and worked on the various demoboards by defining pins 0,1,2,3 and worked on the Touch161 board by defining pins 24,25,26,27?

    Maybe you already have this for the demoboards?

    Re the 512 byte sector size, all my programs seem to end up with an array "byte sdbuffer[512]" so if you had that and you read in 512 bytes and the next cache read request happened to be already in the buffer, could you detect that and not have to read the sd card again?
    I don't have a demo board so I can't say whether our SD cache driver will work with it but I don't see why it wouldn't. The pin numbers are programmable and we can even handle different CS mechanisms like a simple single pin CS, the C3-style counter CS, and a mux like Bill Henning's boards use. The mux hasn't been tested, again because I don't have a board that uses one.
  • Dr_AculaDr_Acula Posts: 5,484
    edited 2012-04-28 17:31
    @jazzed
    In SimpleIDE version 0-6-7 the -SDXMMC and -SDLOAD board types don't work. I have a fix and will post it later today.

    Thanks++

    If you have a generic SD XMM version then this might end up a super simple solution for both myself and for others with different boards. Just take the board I have and change the pin numbers. (see post #22. The order of pins on the touch161 is the same order as the dracblade. I'm not sure about Gadget Gangster boards though).

    I've designed boards with multiplexed pins for SD cards but in the end it is simpler to use existing code and just devote 4 propeller pins to SD cards. Which means that a solution for the dracblade will work for the Gadget Gangster board and the touch161 board. Just change the pin numbers.

    This could open up large C programs for a whole lot of people who have boards with SD cards.
  • David BetzDavid Betz Posts: 14,516
    edited 2012-04-28 18:14
    Dr_Acula wrote: »
    @jazzed

    Thanks++

    If you have a generic SD XMM version then this might end up a super simple solution for both myself and for others with different boards. Just take the board I have and change the pin numbers. (see post #22. The order of pins on the touch161 is the same order as the dracblade. I'm not sure about Gadget Gangster boards though).

    I've designed boards with multiplexed pins for SD cards but in the end it is simpler to use existing code and just devote 4 propeller pins to SD cards. Which means that a solution for the dracblade will work for the Gadget Gangster board and the touch161 board. Just change the pin numbers.

    This could open up large C programs for a whole lot of people who have boards with SD cards.
    Several of our cache drivers can easily be programmed for different pins and chip selects by setting values in the board configuration file. The drivers themselves do not need to be modified or recompiled. This is true for the SD cache driver and there are also some new drivers for SPI flash and Quad SPI flash chips that can be used with chips connected to any pins and with a variety of chip selects.
  • Dr_AculaDr_Acula Posts: 5,484
    edited 2012-04-28 18:16
    You wouldn't happen to have a link to such a generic SD cache driver by any chance? Or the board config file? (which c:\propgcc folder?) Maybe this whole thing might come down to changing 4 numbers in a text file?!
  • David BetzDavid Betz Posts: 14,516
    edited 2012-04-28 18:31
    Dr_Acula wrote: »
    You wouldn't happen to have a link to such a generic SD cache driver by any chance? Or the board config file? (which c:\propgcc folder?) Maybe this whole thing might come down to changing 4 numbers in a text file?!
    Here is the configuration file for the PropBOE. The lines that begin with "sdspi-" are the definitions of the SD card pins that are used by the SD cache driver. This is using a simple single pin CS but other options are available as I mentioned in an earlier message.
    # [propboe]
    # IDE:SDXMMC
        clkfreq: 80000000
        clkmode: XTAL1+PLL16X
        baudrate: 115200
        rxpin: 31
        txpin: 30
        cache-driver: eeprom_cache.dat
        cache-size: 8K
        cache-param1: 0
        cache-param2: 0
        eeprom-first: TRUE
        sd-driver: sd_driver.dat
        sdspi-do: 22
        sdspi-clk: 23
        sdspi-di: 24
        sdspi-cs: 25
    
  • Dr_AculaDr_Acula Posts: 5,484
    edited 2012-04-28 19:01
    Fantastic. I'll plug that in when I get home from work and give it a go.

    I'm really excited about the whole "cache and run from an SD card" concept - it has so many possibilities.
  • denominatordenominator Posts: 242
    edited 2012-04-28 20:24
    David Betz wrote: »
    I don't have a demo board so I can't say whether our SD cache driver will work with it but I don't see why it wouldn't.

    Dr. Acula:

    I have added both a full-size and a micro-SD socket to the demo board and it works fine. Just like David said, all you have to do is provide the appropriate loader config file in the propeller-load directory.

    To convert an existing config to allow it to be used with SD caching:

    1) Add the following parameters - these use the pins I used for my demo board:

    sdspi-do: 4
    sdspi-clk: 5
    sdspi-di: 6
    sdspi-cs: 7

    Note that there are 4 additional sdspi- parameters that allow you to use address multiplexing on the SPI bus - check the code or ask for an explanation if you want to use them.

    2) If you're going to use the IDE, add a line near the top like this:

    # IDE:SDXMMC

    Note that you do not have to include the "sd-driver: sd_driver.dat" line unless your board provides some caching mechanism and you want to run your program using the alternate SD card execution method (the sd-loader method that reads your entire program and tosses it to the cache before starting).

    - Ted
  • Dr_AculaDr_Acula Posts: 5,484
    edited 2012-04-29 03:21
    Thanks denominator.

    I tried doing that but have run into some problems.

    1) Change the file "propboe.cfg" so the pins are correct for my sd card. (same order, just add 2 to each number)
    corrected file below
    # [propboe]
    # IDE:SDXMMC
        clkfreq: 80000000
        clkmode: XTAL1+PLL16X
        baudrate: 115200
        rxpin: 31
        txpin: 30
        cache-driver: eeprom_cache.dat
        cache-size: 8K
        cache-param1: 0
        cache-param2: 0
        eeprom-first: TRUE
        sd-driver: sd_driver.dat
        sdspi-do: 24
        sdspi-clk: 25
        sdspi-di: 26
        sdspi-cs: 27
    

    Now compile a program using simpleIDE. PROPBOE in the dropdown menu at the top. I presume that is right. Memory model is XMMC.

    I can't copy and paste the build dialog but it says that it verified sending the data to ram, and then verified sending it to flash.

    I did a program to eeprom. However, no file has appeared on the SD card. The program does run though so it appears as if it is in eeprom, not on the SD card.

    I tried sending it to PROPBOE-SDXMMC but it says it can't find the board configuration.

    Any advice here would be most appreciated!
  • jazzedjazzed Posts: 11,803
    edited 2012-04-29 04:54
    Dr_Acula wrote: »
    I tried sending it to PROPBOE-SDXMMC but it says it can't find the board configuration.

    Any advice here would be most appreciated!

    PROPBOE-SDXMMC must be selected for the IDE to use the SDXMMC mode.
    As I mentioned though, version 0-6-7 has a bug regarding this board type.

    Please read this message: http://forums.parallax.com/showthread.php?137928-PropGCC-SimpleIDE&p=1094377&viewfull=1#post1094377