New XMM hardware

Dr_Acula · 2012-04-24 21:11

I'd like to test out GCC with the latest touchscreen schematic. I've read this page http://code.google.com/p/propgcc/wiki/PropGccExternalMemory

The cog pasm routines are already written and so hopefully are easy to integrate. I see some clever person has added the Dracblade. This was built on software originally written by Cluso99 that had a very simple interface format between spin and the cog. Send 4 longs:

1) is a single ascii character instruction eg "A" is move a block from hub to ram, "B" is move a block from ram to hub etc ("I" is reserved for "initialize")
2) is the hub address
3) is the external ram address
4) is the number of bytes.

and then wait for a long to change to say the operation is completed.
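
A minimal C sketch of the four-long mailbox described above. The struct, field and function names are hypothetical, and it assumes the cog signals completion by clearing the command long to 0; the description only says "a long" changes, so check that against the actual driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the four longs listed above, in order.
   Field names are illustrative, not taken from the driver source. */
typedef struct {
    volatile uint32_t cmd;       /* 1) single ASCII command character */
    volatile uint32_t hub_addr;  /* 2) hub address */
    volatile uint32_t ram_addr;  /* 3) external ram address */
    volatile uint32_t len;       /* 4) number of bytes */
} xmm_mailbox_t;

/* Write the parameters first and the command last, so the cog never
   picks up a command with stale parameters. */
void xmm_post(xmm_mailbox_t *mb, char cmd,
              uint32_t hub_addr, uint32_t ram_addr, uint32_t len)
{
    assert(cmd != 0);            /* 0 is reserved here to mean "idle" */
    mb->hub_addr = hub_addr;
    mb->ram_addr = ram_addr;
    mb->len = len;
    mb->cmd = (uint32_t)(unsigned char)cmd;
}

/* Assumption: the cog clears cmd to 0 when the operation is complete. */
int xmm_done(const xmm_mailbox_t *mb)
{
    return mb->cmd == 0;
}
```

The caller would post a command with xmm_post() and then poll xmm_done() until it returns non-zero.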

The touchscreen external ram driver uses the same interface, so whatever is working on the Dracblade ought to be possible to port over to the touchscreen without too many changes.

Can some kind soul please point me in the direction of the code used for the Dracblade external ram driver - I'm hoping I can just make a few changes and post the new pasm code.

Many thanks in advance.

jazzed · 2012-04-24 21:49

Dr_Acula wrote: »

Can some kind soul please point me in the direction of the code used for the Dracblade external ram driver - I'm hoping I can just make a few changes and post the new pasm code.

Many thanks in advance.

http://propgcc.googlecode.com/hg/loader/spin/dracblade_cache.spin
Hope you can read it. David will probably be around tomorrow if you have questions.

You can use BSTC -C to compile your new cache driver and put it in the c:\propgcc\propeller-loader path. This can be compiled in a SimpleIDE project, but you have to copy the new driver to the aforementioned path. The resulting file will be something like newdriver_cache.dat assuming your source is newdriver_cache.spin

You might consider a temporary config file name like dractouch.cfg or something, I'll use that here for simplicity.

All you have to do to use newdriver_cache.dat is copy the file in c:\propgcc\propeller-loader\dracblade.cfg to some new file such as c:\propgcc\propeller-loader\dractouch.cfg and then change the "cache-driver: dracblade_cache.dat" line in dractouch.cfg to "cache-driver: newdriver_cache.dat".

Once you have a new dractouch.cfg all set up, just click the jigsaw puzzle piece in SimpleIDE to reload the board config types.

Then you can choose dractouch.cfg before clicking the blue right arrow Run Console button F8 or the green right arrow Run button F10.

There is a tool available for testing a newdriver_cache.spin in the repository.
It's here: http://propgcc.googlecode.com/hg/loader/spin/test_cache.spin

You will also need this: http://propgcc.googlecode.com/hg/loader/spin/cache_interface.spin

In test_cache.spin, you can change the "mdev" object from mdev : "eeprom_cache" to mdev : "newdriver_cache".

As for the program to compile and load, well I can't help with that tonight or tomorrow. Busy day ahead for me tomorrow.
Hope this helps some.

Thanks,
--Steve

Forum software is horrible these days

Dr_Acula · 2012-04-24 22:14

Wow, thanks for the rapid and detailed response. I'll check it all out. Cheers!

Dr_Acula · 2012-04-24 22:23

Code looks very logical.

Quick question;

BREAD
        call    #BSTART
rdloop  call    #read_memory_byte       ' read byte from address into data_8
        wrbyte  data_8,ptr              ' write data_8 to hubaddr ie copy byte to hub
        add     ptr,#1                  ' add 1 to hub address
        add     address,#1              ' add 1 to ram address
        djnz    count,#rdloop           ' loop until done
BREAD_RET
        ret

The touchscreen is reading and writing in words rather than bytes. Simple answer is to shift count >>1. Can I assume that "count" is always an even number?

David Betz · 2012-04-25 05:03

Dr_Acula wrote: »
Code looks very logical.

Quick question;
BREAD
        call    #BSTART
rdloop  call    #read_memory_byte       ' read byte from address into data_8
        wrbyte  data_8,ptr              ' write data_8 to hubaddr ie copy byte to hub
        add     ptr,#1                  ' add 1 to hub address
        add     address,#1              ' add 1 to ram address
        djnz    count,#rdloop           ' loop until done
BREAD_RET
        ret
The touchscreen is reading and writing in words rather than bytes. Simple answer is to shift count >>1. Can I assume that "count" is always an even number?

Yes, you can assume that. In fact, we always read an entire cache line at a time. I think the cache line size for the DracBlade driver is 64 bytes.
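
As a small sketch of the byte-to-word conversion under that assumption (the function name is made up for illustration): halve the byte count, and trap an odd count rather than silently dropping the last byte.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a byte count into a count of 16-bit transfers.
   Assumes the count is even, which holds when whole cache
   lines are moved. */
uint32_t words_for_bytes(uint32_t byte_count)
{
    assert((byte_count & 1u) == 0);  /* odd counts would lose a byte */
    return byte_count >> 1;
}
```

In the PASM loop the equivalent would be a single shr on count before entering the loop.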

Dr_Acula · 2012-04-25 05:16

No problem. I've been using 512 bytes for graphics - is there a "standard" cache size?

Also, I'm working through the step-by-step instructions in the Quickstart pdf file and there does not seem to be a "demos" folder. In fact, searching through all the files in the propgcc folder, I have lots of .h files but windows could not find a single .c file nor anything with toggle.* Seems a bit odd.

David Betz · 2012-04-25 05:58

Dr_Acula wrote: »

No problem. I've been using 512 bytes for graphics - is there a "standard" cache size?

Also, I'm working through the step-by-step instructions in the Quickstart pdf file and there does not seem to be a "demos" folder. In fact, searching through all the files in the propgcc folder, I have lots of .h files but windows could not find a single .c file nor anything with toggle.* Seems a bit odd.

There is no standard cache line size. Different drivers can use different sizes based on the requirements of the backing store device. For instance, the SD cache driver uses 512 byte cache lines because that's the sector size on the SD card.

I'm not sure about why you can't find the demos directory. You can find it in Google Code here: http://code.google.com/p/propgcc/source/browse/#hg%2Fdemos

You'll find a 'toggle' directory under 'demos' with lots of versions of the toggle program.

David Betz · 2012-04-25 06:36

Dr_Acula wrote: »

No problem. I've been using 512 bytes for graphics - is there a "standard" cache size?

Also, I'm working through the step-by-step instructions in the Quickstart pdf file and there does not seem to be a "demos" folder. In fact, searching through all the files in the propgcc folder, I have lots of .h files but windows could not find a single .c file nor anything with toggle.* Seems a bit odd.

I forgot to mention this in my previous post but even though 512 byte cache lines are supported by some cache drivers, that doesn't mean they are optimal. It turns out that the SD cache driver has worse performance than some of the other cache drivers partly because of the large cache lines. You may want to experiment with different cache geometries to determine which gives the best performance for your application.

Dave Hein · 2012-04-25 08:51

I did some benchmarking of various XMMC cache implementations several months ago, and it seemed like the number of cache lines in the cache memory was very important. In an 8K cache with a 512-byte cache line, there are only 16 cache lines in hub RAM. Unfortunately, SD access is always done in chunks of 512 bytes, so there's not a lot of flexibility there. With flash memory I tried a 32x256 configuration and 64x128. In general, the 64 128-byte cache lines performed the best.
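
To make the arithmetic concrete, here is a rough C sketch of the geometry calculation. It assumes both the line count and the line size are powers of two, expressed as bit widths; the type and function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Cache geometry expressed as two bit widths (assumed powers of two). */
typedef struct {
    uint32_t index_width;   /* log2(number of cache lines) */
    uint32_t offset_width;  /* log2(bytes per cache line)  */
} cache_geometry_t;

/* Number of cache lines held in hub RAM. */
uint32_t cache_lines(cache_geometry_t g)
{
    return 1u << g.index_width;
}

/* Bytes per cache line. */
uint32_t line_bytes(cache_geometry_t g)
{
    return 1u << g.offset_width;
}

/* Total hub RAM consumed by the cache. */
uint32_t cache_bytes(cache_geometry_t g)
{
    return 1u << (g.index_width + g.offset_width);
}
```

With widths {4, 9}, {5, 8} and {6, 7} this gives 16x512, 32x256 and 64x128, all of which occupy the same 8K of hub RAM.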

Dr_Acula · 2012-04-25 16:03

128 bytes will be fine. For this particular driver, moving data is fast but there is some setup code to load all the 161 counters, so a hypothetical 16 byte cache would not be very optimal. Back to coding...

David Betz · 2012-04-26 20:50

How are you doing on this? Do you need help?

Dr_Acula · 2012-04-26 23:26

Yea, you got me! I'm stuck.

Ok, what I have is a board in front of me doing all sorts of cool icons and touchscreen things. But the design is about to be superseded by a new design (based on a brilliant idea from jazzed) and that board won't be here till next week. Coding the pasm driver can be done on the board I have but the code will end up being changed very soon. So I think I will wait till the new board arrives. It will need some code to drive the 74HC237 (even though that will only be a few lines of code, I put some debugging leds on the board as I've never used this chip before). On the plus side, I've ordered 10 boards so if they work I've got spares I can give away to anyone wanting to help out.

Which brings up the next step - debugging. On the touchscreen I wrote a slow routine that takes the propeller font from inside the propeller and puts it on the touchscreen. This works without an SD card so you can display a message to say there is no SD card.

But for debugging C right at the very start, I wonder if the serial port might be easier as it can be used to get things working before the display is working.

So...

would anyone have a very simple C program that sends "Hello World" back up the P30/P31 serial lines to a terminal program? And if possible, maybe add a 5 second delay so there is time to do a download and then fire up a terminal program to check the data coming back.

I'm thinking of the way I got the current version working, which was to code every pin logic level change in Spin first, and then port each spin routine over to pasm. So I guess that might be one way to do it in C as well. On the other hand, you might look at the pasm and say - hey that is easy to port over. Maybe it is - it does work rather like a cache. The display startup is still in spin though and so would go to C. And so I guess I'll be asking dumb things like how to do dira and outa in C.
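
On the dira/outa question, a minimal sketch, assuming PropGCC's propeller.h exposes the registers as DIRA and OUTA and that the compiler defines __PROPELLER__ (both worth checking against your install). The helper names are made up, and the host stand-ins are only there so the sketch compiles off-target.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __PROPELLER__
#include <propeller.h>   /* assumed to provide DIRA and OUTA under PropGCC */
#else
/* Host stand-ins so the sketch can be compiled and tested off-target. */
static uint32_t DIRA, OUTA;
#endif

/* Make one pin an output (Spin: dira[pin] := 1). */
void pin_output(int pin)
{
    assert(pin >= 0 && pin < 32);
    DIRA |= 1u << pin;
}

/* Drive one pin high (Spin: outa[pin] := 1). */
void pin_high(int pin)
{
    assert(pin >= 0 && pin < 32);
    OUTA |= 1u << pin;
}

/* Drive one pin low (Spin: outa[pin] := 0). */
void pin_low(int pin)
{
    assert(pin >= 0 && pin < 32);
    OUTA &= ~(1u << pin);
}
```

Whole-register writes, like the pasm "mov dira,mask", would be plain assignments such as DIRA = mask.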

jazzed · 2012-04-26 23:50

Dr_Acula wrote: »

would anyone have a very simple C program that sends "Hello World" back up the P30/P31 serial lines to a terminal program? And if possible, maybe add a 5 second delay so there is time to do a download and then fire up a terminal program to check the data coming back.

SimpleIDE packages come with a hello demo for P30/31. Other demo programs are also included.

Download here: http://propside.googlecode.com/files/Simple-IDE_0-6-7_setup.zip
User guide is here: https://sites.google.com/site/propellergcc/simpleide/user-s-guide

Dr_Acula · 2012-04-27 00:58

Q: That easy?
A: YES!

See attached screenshot. Download the program, F10 and open the terminal.

Well, hats off to the GCC team and congratulations. You guys have made C super easy to use on the propeller.

Dr_Acula · 2012-04-27 23:14

This is more a memo to myself for later. Notes re building a new xmm driver
1) Toggle a led in C
2) Toggle a led by calling a pasm routine from C, and the pasm routine toggles the led
3) Write xmm driver primitives in C to send a byte to xmm, and return a byte
4) Port each routine over to cog pasm code
5) Test running this as actual xmm

I'm up to step 2. Use link from post #7 David Betz for the demos
Scroll to the "toggle" program.
Grab both the C code and the Spin code.

From the way these are separate I surmise that GCC is working with "binary blobs", precompiled in a spin compiler.

This is the spin bit

{{
toggle.spin
Propgcc - PASM toggle demo

Simple PASM routine to demonstrate interaction between PASM subroutines and a PROPGCC main program.
Code running in the C address space exchanges data values with PASM code running
in a COG.

The C program has visibility/access to the mailbox variables as normal C variables.
The PASM program has visibility/access to the mailbox variables through the PAR register
initialized by the COGNEW and the RDLONG/RDWORD/RDBYTE and WRLONG/WRWORD/WRBYTE instructions.

C program starts the PASM routine via cognew() function, passing start address and PAR register value.
PAR register should be the address of a STATIC data area of LONGs in the C program.

For this example:

PAR -> static unsigned int delay;
static unsigned int pins;
static unsigned int loop_cnt;
static unsigned int pasm_done;

The first three variables are used as input to the PASM routine, the last is used to act as a semaphore
back to the C routine.

This could have been written as more efficient PASM code but for the examples, I was going for maximum clarity at this point.

Copyright (c) 2011, Steve Denson, Rick Post
MIT Licensed. Terms of use below.

}}

pub start(pinptr)
  cognew(@pasm, pinptr)

dat     org     0

pasm
        mov     mailbox_ptr, par                ' save the pointer to the STATIC parameter (HUB) memory area
        ' the PAR register is initialized by the cognew() function and is a pointer to
        ' the first STATIC int declared in the C code as the mailbox
        ' mailbox_ptr will be changed as the code executes. You can reload
        ' the initial pointer from PAR if you ever need it to point to
        ' the start of the mailbox again
        rdlong  waitdelay, mailbox_ptr          ' read the wait delay from HUB - it is initialized by the C program
        ' in C program: delay = CLKFREQ>>1;
        add     mailbox_ptr, #4                 ' point to the next LONG in HUB
        rdlong  pins, mailbox_ptr               ' the caller's PIN mask as initialized in the C program
        ' in C program: pins = 0x3fffffff;
        add     mailbox_ptr, #4                 ' point to the next LONG (4 bytes each)
        rdlong  loopcounter, mailbox_ptr        ' set the loop count as provided by the C program
        ' in C program: loop_cnt = 20;
        add     mailbox_ptr, #4                 ' point to the next LONG which is the semaphore we are setting when done

        mov     dira, pins                      ' set pins provided by C program to OUTPUT
        mov     nextcnt, waitdelay
        add     nextcnt, cnt                    ' best to add cnt last
:loop
        xor     outa, pins                      ' toggle pins
        waitcnt nextcnt, waitdelay              ' wait for user specified delay
        djnz    loopcounter, #:loop             ' loop until the C provided counter hits zero

        mov     done_flag, #1                   ' set the semaphore to one
        wrlong  done_flag, mailbox_ptr          ' and save it back into hub memory via the ptr provided by the C program
        ' in C program: while(!pasm_done) to test for update from PASM
        jmp     #$                              ' to infinity and BEYOND!!

' these do not need to be in any particular order or have particular names. There is no relationship between these
' local copies of the C variable except when you create via the PAR register and HUB instructions
' there is no address resolution or linkage done by propgcc or the loader
'
mailbox_ptr     long    0       ' working ptr into the HUB area - reload from PAR if needed
pins            long    0       ' local copy of the user's PIN mask
waitdelay       long    0       ' local copy of the user's delay
loopcounter     long    0       ' local copy of the user's loop counter
done_flag       long    0       ' local copy of the semaphore to return to the C program
nextcnt         long    0       ' local variable to save target value from waitcnt

{{

MIT Licensed.

+--------------------------------------------------------------------
TERMS OF USE: MIT License
+--------------------------------------------------------------------
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files
(the "Software"), to deal in the Software without restriction,
including without limitation the rights to use, copy, modify, merge,
publish, distribute, sublicense, and/or sell copies of the Software,
and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+--------------------------------------------------------------------
}}

and this is the C bit

/**
 * @file toggle.c
 * This program demonstrates starting a PASM COG
 * and being able to pass parameters from a C program to a PASM program
 * and from PASM back to C.
 *
 * C to PASM Mailbox example
 *
 * WARNING: This code makes all IO pins except 30/31 toggle HIGH/LOW. Check if this is OK
 * for the board you are using.
 *
 *
 * to use:
 * from directory containing source
 * make clean
 * make
 * propeller-load -pn -t -r toggle.elf  (where n is port #)
 *
 * Copyright (c) 2011, Steve Denson, Rick Post
 * MIT Licensed - terms of use below.
 */

#include <stdio.h>
#include <propeller.h>                    // propeller specific definitions

// the STATIC HUB mailbox for communication to PASM routine
//
static unsigned int delay; // a pointer to this gets passed to the PASM code as the PAR register
static unsigned int pins;
static int loop_cnt;
static int pasm_done;

// C stub function to start the PASM routine
// need to be able to provide the entry point to the PASM
// and a pointer to the STATIC HUB mailbox
// the cognew function in the propeller.c library returns the COG #
//
int start(unsigned int *pinptr)
{
    // The label binary_toggle_dat_start is automatically placed
    // on the cog code from toggle.dat by objcopy (see the Makefile).
    extern unsigned int binary_toggle_dat_start[];
    return cognew(&binary_toggle_dat_start, pinptr);
}

void usleep(int t)
{
    if(t < 10)  // very small t values will cause a hang
        return; // don't bother function delay is likely enough
    waitcnt((CLKFREQ/1000000)*t+CNT);
}
// C main function
// LMM model
int main (int argc,  char* argv[])
{
    printf("hello, world!\n");            // let the lead LMM COG say hello
    delay = CLKFREQ>>1;                    // set the delay rate in the STATIC mailbox
                                        // this is actually the duty cycle of the blink 0.5 sec on, 0.5 sec off
    pins = 0x3fFFffff;                     // set the PIN mask into the STATIC mailbox
                                        // light up all pins except 30 & 31 since we don't know board config
    loop_cnt = 20;                        // number of time through the loop (20 toggles, 10 on/off cycles)
    pasm_done = 0;                        // make sure it's zero since we'll sit and wait on it to change in a few lines
    printf ("New COG# %d started.\n",start(&delay)); // start a new COG passing a pointer to the STATIC mailbox structure
    printf ("waiting for semaphore to be set by PASM code.\n");
    while (!pasm_done)
    {
      usleep(10);                        // wait for the PASM code to clear the loop counter
    }    
    printf("goodbyte, world!\n");
    while(1);                            //let the original COG sit and spin
}

/*
    +--------------------------------------------------------------------
      TERMS OF USE: MIT License
    +--------------------------------------------------------------------
    Permission is hereby granted, free of charge, to any person obtaining
    a copy of this software and associated documentation files
    (the "Software"), to deal in the Software without restriction,
    including without limitation the rights to use, copy, modify, merge,
    publish, distribute, sublicense, and/or sell copies of the Software,
    and to permit persons to whom the Software is furnished to do so,
    subject to the following conditions:

    The above copyright notice and this permission notice shall be
    included in all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
    MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
    IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
    CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
    TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
    SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
    +--------------------------------------------------------------------
*/
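
One observation on the listing above: the PASM side steps through the mailbox in 4-byte increments from PAR, so it appears to rely on the four statics sitting next to each other in hub RAM in declaration order, which C does not promise for separate variables. A hedged sketch of one alternative is to group them in a struct, whose member order is fixed by the language. The type and function names here are hypothetical and not part of the demo.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The same four longs as the statics in toggle.c, grouped so that
   their order in memory is fixed. Names are illustrative. */
typedef struct {
    volatile uint32_t delay;      /* PAR + 0  : wait delay in clock ticks  */
    volatile uint32_t pins;       /* PAR + 4  : pin mask                   */
    volatile uint32_t loop_cnt;   /* PAR + 8  : number of toggles          */
    volatile uint32_t pasm_done;  /* PAR + 12 : semaphore written by PASM  */
} toggle_mailbox_t;

/* Returns 1 when the layout matches the offsets the PASM code walks. */
int toggle_mailbox_layout_ok(void)
{
    return offsetof(toggle_mailbox_t, delay) == 0
        && offsetof(toggle_mailbox_t, pins) == 4
        && offsetof(toggle_mailbox_t, loop_cnt) == 8
        && offsetof(toggle_mailbox_t, pasm_done) == 12
        && sizeof(toggle_mailbox_t) == 16;
}
```

With this layout the address of a single static toggle_mailbox_t would be passed as the PAR value in place of &delay, and the PASM code would not need to change.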

Now in Spin, that would be one program, not two. This sort of thing can be glued into one program - I wrote an IDE for Catalina that put some tags around the pasm/spin bit and then wrote a simple program that split it all up, sending the pasm bits off to BST and the C bit off to the C compiler.

So here is a question. If there is a "COG" mode in SimpleIDE and also a "LMM" mode, can you combine both in the same program? Have some C code that goes into a cog, and some C code that is the "main" program. Maybe with some sort of tag around the bit of code that goes into the cogs. Then it becomes one program, not two like "toggle" above?

If that is possible, then the whole program can be pure C and you don't have to learn pasm.

I'd have not said such a thing until recently when heater pointed out his FFT was running almost the same speed compiled from C as it was in native pasm.

Meanwhile - thinking about that some more, I think I'll do things in a different order. I have xmm code already working in a spin program. So I think I'll just grab the pasm bit plus the minimal spin so it compiles and turn that into a binary blob. Then translate the Spin driver into C.

Dr_Acula · 2012-04-28 01:10

More testing.

Well, the SimpleIDE might be simple to use but it is very clever behind the scenes. A little drop-down menu in the top right corner with all the hardware boards. The Dracblade works for a XMMC program right out of the box. All those hardware configs had to be coded and that must have been a lot of work.

Also the text highlighting in the code. I've written code to do that and it is very complicated. Again, it is all there and working and easy to work out without even reading a help file.

Ok, a new driver.

This is the Dracblade driver using latches to talk to an external ram chip

DAT
'' +--------------------------------------------------------------------------+
'' | Dracblade Ram Driver (with grateful acknowledgements to Cluso)           |
'' +--------------------------------------------------------------------------+
                        org     0
tbp2_start    ' setup the pointers to the hub command interface (saves execution time later)
                                      '  +-- These instructions are overwritten as variables after start
comptr                  mov     comptr, par     ' -|  hub pointer to command                
hubptr                  mov     hubptr, par     '  |  hub pointer to hub address            
ramptr                  add     hubptr, #4      '  |  hub pointer to ram address            
lenptr                  mov     ramptr, par     '  |  hub pointer to length                 
errptr                  add     ramptr, #8      '  |  hub pointer to error status           
cmd                     mov     lenptr, par     '  |  command  I/R/W/G/P/Q                  
hubaddr                 add     lenptr, #12     '  |  hub address                           
ramaddr                 mov     errptr, par     '  |  ram address                           
len                     add     errptr, #16     '  |  length                                
err                     nop                     ' -+  error status returned (=0=false=good) 


' Initialise hardware (unlike the triblade, just tristates everything and read/write set the pins)
init                    mov     err, #0                  ' reset err=false=good
                        mov     dira,zero                ' tristate the pins

done                    wrlong  err, errptr             ' status  =0=false=good, else error x
                        wrlong  zero, comptr            ' command =0 (done)
' wait for a command (pause short time to reduce power)
pause                   mov     ctr, delay      wz      ' if =0 no pause
              if_nz     add     ctr, cnt
              if_nz     waitcnt ctr, #0                 ' wait for a short time (reduces power)
                        rdlong  cmd, comptr     wz      ' command ?
              if_z      jmp     #pause                  ' not yet
' decode command
                        cmp     cmd, #"R"       wz      ' R = read block
              if_z      jmp     #rdblock
                        cmp     cmd, #"W"       wz      ' W = write block
              if_z      jmp     #wrblock
                        cmp     cmd, #"N"       wz      ' N= led on
              if_z      jmp     #led_turn_on
                        cmp     cmd, #"F"       wz      ' F = led off
              if_z      jmp     #led_turn_off
                        cmp     cmd, #"H"       wz      ' H sets the high latch
              if_z      jmp     #sethighlatch
                        mov     err, cmd                ' error = cmd (unknown command)
                        jmp     #done


tristate                mov     dira,zero                ' all inputs to zero
                        jmp     #done

' turn led on
led_turn_on             or      HighLatch,ledpin        ' set the led pin high
                        jmp     #OutputHighLatch         ' send this out

led_turn_off            andn    HighLatch,ledpin        ' set the led pin low
                        jmp     #OutputHighLatch         ' send this out

' set high address bytes with command H, pass value in third variable of the DoCmd
' 4 bytes - masks off all but bits 16 to 23

sethighlatch            call #ram_open                  ' gets address value in 'address'
                        shr  address,#16                ' shift right by 16 places
                        and  address,#$FF               ' ensure rest of bits zero
                        mov  HighLatch,address          ' put value into HighLatch
                        jmp  #OutputHighLatch           ' and output it

'---------------------------------------------------------------------------------------------------------
'Memory Access Functions

rdblock                 call    #ram_open               ' get variables from hub variables
rdloop                  call    #read_memory_byte       ' read byte from address into data_8
                        wrbyte  data_8,hubaddr          ' write data_8 to hubaddr ie copy byte to hub
                        add     hubaddr,#1              ' add 1 to hub address
                        add     address,#1              ' add 1 to ram address
                        djnz    len,#rdloop             ' loop until done
                        jmp     #init                   ' reinitialise

wrblock                 call    #ram_open                        
wrloop                  rdbyte  data_8, hubaddr         ' copy byte from hub
                        call    #write_memory_byte      ' write byte from data_8 to address
                        add     hubaddr,#1              ' add 1 to hub address
                        add     address,#1              ' add 1 to ram address
                        djnz    len,#wrloop             ' loop until done
                        jmp     #init                   ' reinitialise

ram_open                rdlong  hubaddr, hubptr         ' get hub address
                        rdlong  ramaddr, ramptr         ' get ram address
                        rdlong  len, lenptr             ' get length
                        mov     err, #5                 ' err=5
                        mov     address,ramaddr         ' cluso's variable 'ramaddr' to dracblade variable 'address'
ram_open_ret            ret
  
read_memory_byte        call #RamAddress                ' sets up the latches with the correct ram address
                        mov dira,LatchDirection2        ' for reads so P0-P7 tristate till do read
                        mov outa,GateHigh               ' actually ReadEnable but they are the same
                        andn outa,GateHigh              ' set gate low
                        nop                             ' short delay to stabilise
                        nop
                        mov data_8, ina                 ' read SRAM
                        and data_8, #$FF                ' extract 8 bits
                        or  outa,GateHigh               ' set the gate high again
read_memory_byte_ret    ret

write_memory_byte       call #RamAddress                ' sets up the latches with the correct ram address
                        mov outx,data_8                 ' get the byte to output
                        and outx, #$FF                  ' ensure upper bytes=0
                        or outx,WriteEnable             ' or with correct 138 address
                        mov outa,outx                   ' send it out
                        andn outa,GateHigh              ' set gate low
                        nop                             ' no nop doesn't work, one does, so put in two to be sure
                        nop                             ' another NOP
                        or outa,GateHigh                ' set it high again
write_memory_byte_ret   ret

RamAddress ' sets up the ram latches. Assumes high latch A16-A18 low so only accesses 64k of ram
                        mov dira,LatchDirection         ' set up the pins for programming latch chips
                        mov outx,address                ' get the address into a temp variable
                        and outx,#$FF                   ' mask the low byte
                        or  outx,LowAddress             ' or with 138 low address
                        mov outa,outx                   ' send it out
                        andn outa,GateHigh              ' set gate low
                                                        ' ?? a NOP
                        or outa,GateHigh                ' set it high again  
                                                        ' now repeat for the middle byte     
                        mov outx,address                ' get the address into a temp variable
                        shr outx,#8                     ' shift right by 8 places
                        and outx,#$FF                   ' mask the low byte
                        or  outx,MiddleAddress          ' or with 138 middle address
                        mov outa,outx                   ' send it out
                        andn outa,GateHigh              ' set gate low
                        or outa,GateHigh                ' set it high again 
RamAddress_ret          ret

OutputHighLatch ' sends out HighLatch to the 374 that does A16-19, led and the 4 spare outputs
                        mov     dira,latchdirection     ' setup active pins 138 and bus
                        mov     outa,HighLatch          ' send out HighLatch
                        or      outa,HighAddress        ' or with the high address
                        andn    outa,GateHigh           ' set gate low
                        or      outa,GateHigh           ' set the gate high again
OutputHighLatch_ret     jmp     #tristate               ' set pins tristate





delay                   long    80                                    ' waitcnt delay to reduce power (#80 = 1uS approx)
ctr                     long    0                                     ' used to pause execution (lower power use) & byte counter
GateHigh                long    %00000000_00000000_00000001_00000000  ' HC138 gate high, all others must be low
Outx                    long    0                                     ' for temp use, same as n in the spin code
LatchDirection          long    %00000000_00000000_00001111_11111111 ' 138 active, gate active and 8 data lines active
LatchDirection2         long    %00000000_00000000_00001111_00000000 ' for reads so data lines are tristate till the read
LowAddress              long    %00000000_00000000_00000101_00000000 ' low address latch = xxxx010x and gate high xxxxxxx1
MiddleAddress           long    %00000000_00000000_00000111_00000000 ' middle address latch = xxxx011x and gate high xxxxxxx1
HighAddress             long    %00000000_00000000_00001001_00000000 ' high address latch = xxxx100x and gate high xxxxxxx1
'ReadEnable long    %00000000_00000000_00000001_00000000 ' /RD = xxxx000x and gate high xxxxxxx1
                                                        ' commented out as the same as GateHigh
WriteEnable             long    %00000000_00000000_00000011_00000000 ' /WE = xxxx001x and gate high xxxxxxx1
Zero                    long    %00000000_00000000_00000000_00000000 ' for tristating all pins
data_8                  long    %00000000_00000000_00000000_00000000 ' so code compatability with zicog driver
address                 long    %00000000_00000000_00000000_00000000 ' address for ram chip
ledpin                  long    %00000000_00000000_00000000_00001000 ' to turn on led
HighLatch               long    %00000000_00000000_00000000_00000000 ' static value for the 374 latch that does the led, hA16-A19 and the other 4 outputs

This is the very clever code David Betz wrote http://propgcc.googlecode.com/hg/loader/spin/dracblade_cache.spin

DAT
        org   $0

' initialization structure offsets
' $0: pointer to a two word mailbox
' $4: pointer to where to store the cache lines in hub ram
' $8: number of bits in the cache line index if non-zero (default is DEFAULT_INDEX_WIDTH)
' $c: number of bits in the cache line offset if non-zero (default is DEFAULT_OFFSET_WIDTH)
' note that $4 must be at least 2^($8+$c) bytes in size
' the cache line mask is returned in $0

init_vm mov     t1, par             ' get the address of the initialization structure
        rdlong  pvmcmd, t1          ' pvmcmd is a pointer to the virtual address and read/write bit
        mov     pvmaddr, pvmcmd     ' pvmaddr is a pointer into the cache line on return
        add     pvmaddr, #4
        add     t1, #4
        rdlong  cacheptr, t1        ' cacheptr is the base address in hub ram of the cache
        add     t1, #4
        rdlong  t2, t1 wz
  if_nz mov     index_width, t2     ' override the index_width default value
        add     t1, #4
        rdlong  t2, t1 wz
  if_nz mov     offset_width, t2    ' override the offset_width default value

        mov     index_count, #1
        shl     index_count, index_width
        mov     index_mask, index_count
        sub     index_mask, #1

        mov     line_size, #1
        shl     line_size, offset_width
        mov     t1, line_size
        sub     t1, #1
        wrlong  t1, par

        jmp     #vmflush

fillme  long    0[128-fillme]           ' first 128 cog locations are used for a direct mapped page table

        fit   128

        ' initialize the cache lines
vmflush movd    :flush, #0
        mov     t1, index_count
:flush  mov     0-0, empty_mask
        add     :flush, dstinc
        djnz    t1, #:flush

        ' start the command loop
waitcmd mov     dira, #0                ' release the pins for other SPI clients
        wrlong  zero, pvmcmd
:wait   rdlong  vmpage, pvmcmd wz
  if_z  jmp     #:wait

        shr     vmpage, offset_width wc ' carry is now one for read and zero for write
        mov     set_dirty_bit, #0       ' make mask to set dirty bit on writes
        muxnc   set_dirty_bit, dirty_mask
        mov     line, vmpage            ' get the cache line index
        and     line, index_mask
        mov     hubaddr, line
        shl     hubaddr, offset_width
        add     hubaddr, cacheptr       ' get the address of the cache line
        wrlong  hubaddr, pvmaddr        ' return the address of the cache line
        movs    :ld, line
        movd    :st, line
:ld     mov     vmcurrent, 0-0          ' get the cache line tag
        and     vmcurrent, tag_mask
        cmp     vmcurrent, vmpage wz    ' z set means there was a cache hit
  if_nz call    #miss                   ' handle a cache miss
:st     or      0-0, set_dirty_bit      ' set the dirty bit on writes
        jmp     #waitcmd                ' wait for a new command

' line is the cache line index
' vmcurrent is current page
' vmpage is new page
' hubaddr is the address of the cache line
miss    movd    :test, line
        movd    :st, line
:test   test    0-0, dirty_mask wz
  if_z  jmp     #:rd                    ' current page is clean, just read new page
        mov     vmaddr, vmcurrent
        shl     vmaddr, offset_width
        call    #BWRITE                 ' write current page
:rd     mov     vmaddr, vmpage
        shl     vmaddr, offset_width
        call    #BREAD                  ' read new page
:st     mov     0-0, vmpage
miss_ret ret

' pointers to mailbox entries
pvmcmd          long    0       ' on call this is the virtual address and read/write bit
pvmaddr         long    0       ' on return this is the address of the cache line containing the virtual address

cacheptr        long    0       ' address in hub ram where cache lines are stored
vmpage          long    0       ' page containing the virtual address
vmcurrent       long    0       ' current page in selected cache line (same as vmpage on a cache hit)
line            long    0       ' current cache line index
set_dirty_bit   long    0       ' DIRTY_BIT set on writes, clear on reads

zero            long    0       ' zero constant
dstinc          long    1<<9    ' increment for the destination field of an instruction
t1              long    0       ' temporary variable
t2              long    0       ' temporary variable

tag_mask        long    !(1<<DIRTY_BIT) ' includes EMPTY_BIT
index_width     long    DEFAULT_INDEX_WIDTH
index_mask      long    0
index_count     long    0
offset_width    long    DEFAULT_OFFSET_WIDTH
line_size       long    0                       ' line size in bytes (1 << offset_width)
empty_mask      long    (1<<EMPTY_BIT)
dirty_mask      long    (1<<DIRTY_BIT)

'----------------------------------------------------------------------------------------------------
'
' BSTART
'
' setup the high order address byte
'
'----------------------------------------------------------------------------------------------------

BSTART
        mov     address,vmaddr          ' get the high address byte
        shr     address,#16             ' shift right by 16 places
        and     address,#$FF            ' ensure rest of bits zero
        mov     HighLatch,address       ' put value into HighLatch
        mov     dira,LatchDirection     ' setup active pins 138 and bus
        mov     outa,HighLatch          ' send out HighLatch
        or      outa,HighAddress        ' or with the high address
        andn    outa,GateHigh           ' set gate low
        or      outa,GateHigh           ' set the gate high again
        mov     ptr, hubaddr            ' hubaddr = hub page address
        mov     address, vmaddr
        mov     count, line_size
BSTART_RET
        ret

'----------------------------------------------------------------------------------------------------
'
' BREAD
'
' vmaddr is the virtual memory address to read
' hubaddr is the hub memory address to write
' count is the number of bytes to read
'
' trashes count, ptr
'
'----------------------------------------------------------------------------------------------------

BREAD
        call    #BSTART
rdloop  call    #read_memory_byte       ' read byte from address into data_8
        wrbyte  data_8,ptr              ' write data_8 to hubaddr ie copy byte to hub
        add     ptr,#1                  ' add 1 to hub address
        add     address,#1              ' add 1 to ram address
        djnz    count,#rdloop           ' loop until done
BREAD_RET
        ret

'----------------------------------------------------------------------------------------------------
'
' BWRITE
'
' vmaddr is the virtual memory address to write
' hubaddr is the hub memory address to read
' count is the number of bytes to write
'
' trashes count, ptr, address
'
'----------------------------------------------------------------------------------------------------

BWRITE
        call    #BSTART
wrloop  rdbyte  data_8, ptr             ' copy byte from hub
        call    #write_memory_byte      ' write byte from data_8 to address
        add     ptr,#1                  ' add 1 to hub address
        add     address,#1              ' add 1 to ram address
        djnz    count,#wrloop           ' loop until done
BWRITE_RET
        ret

' input parameters to BREAD and BWRITE
vmaddr      long    0       ' virtual address
hubaddr     long    0       ' hub memory address to read from or write to

' temporaries used by BREAD and BWRITE
ptr         long    0
count       long    0

''From Dracblade driver for talking to a ram chip via three latches
'' Modified code from Cluso's triblade

'---------------------------------------------------------------------------------------------------------
'Memory Access Functions

read_memory_byte        call #RamAddress                ' sets up the latches with the correct ram address
                        mov dira,LatchDirection2        ' for reads so P0-P7 tristate till do read
                        mov outa,GateHigh               ' actually ReadEnable but they are the same
                        andn outa,GateHigh              ' set gate low
                        nop                             ' short delay to stabilise
                        nop
                        mov data_8, ina                 ' read SRAM
                        and data_8, #$FF                ' extract 8 bits
                        or  outa,GateHigh               ' set the gate high again
read_memory_byte_ret    ret

write_memory_byte       call #RamAddress                ' sets up the latches with the correct ram address
                        mov outx,data_8                 ' get the byte to output
                        and outx, #$FF                  ' ensure upper bytes=0
                        or outx,WriteEnable             ' or with correct 138 address
                        mov outa,outx                   ' send it out
                        andn outa,GateHigh              ' set gate low
                        nop                             ' no nop doesn't work, one does, so put in two to be sure
                        nop                             ' another NOP
                        or outa,GateHigh                ' set it high again
write_memory_byte_ret   ret

RamAddress ' sets up the ram latches. Assumes high latch A16-A18 low so only accesses 64k of ram
                        mov dira,LatchDirection         ' set up the pins for programming latch chips
                        mov outx,address                ' get the address into a temp variable
                        and outx,#$FF                   ' mask the low byte
                        or  outx,LowAddress             ' or with 138 low address
                        mov outa,outx                   ' send it out
                        andn outa,GateHigh              ' set gate low
                        or outa,GateHigh                ' set it high again
                                                        ' now repeat for the middle byte
                        mov outx,address                ' get the address into a temp variable
                        shr outx,#8                     ' shift right by 8 places
                        and outx,#$FF                   ' mask the low byte
                        or  outx,MiddleAddress          ' or with 138 middle address
                        mov outa,outx                   ' send it out
                        andn outa,GateHigh              ' set gate low
                        or outa,GateHigh                ' set it high again
RamAddress_ret          ret

GateHigh                long    %00000000_00000000_00000001_00000000  ' HC138 gate high, all others must be low - also used as ReadEnable
outx                    long    0                                     ' for temp use, same as n in the spin code
LatchDirection          long    %00000000_00000000_00001111_11111111 ' 138 active, gate active and 8 data lines active
LatchDirection2         long    %00000000_00000000_00001111_00000000 ' for reads so data lines are tristate till the read
LowAddress              long    %00000000_00000000_00000101_00000000 ' low address latch = xxxx010x and gate high xxxxxxx1
MiddleAddress           long    %00000000_00000000_00000111_00000000 ' middle address latch = xxxx011x and gate high xxxxxxx1
HighAddress             long    %00000000_00000000_00001001_00000000 ' high address latch = xxxx100x and gate high xxxxxxx1
WriteEnable             long    %00000000_00000000_00000011_00000000 ' /WE = xxxx001x and gate high xxxxxxx1
data_8                  long    %00000000_00000000_00000000_00000000 ' so code compatability with zicog driver
address                 long    %00000000_00000000_00000000_00000000 ' address for ram chip
HighLatch               long    %00000000_00000000_00000000_00000000 ' static value for the 374 latch that does the led, hA16-A19 and the other 4 outputs

                        fit     496
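
For anyone who finds the tag handling in waitcmd and miss hard to follow in PASM, here is a minimal host-side sketch of the same direct-mapped lookup in C. It is only a model: the cache geometry, the array names and the function names are made up for illustration, the EMPTY_BIT and DIRTY_BIT positions (30 and 31) are assumptions because the CON section is not shown above, and the read/write flag is passed as a parameter instead of being decoded from the mailbox long.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical geometry for the model, not the driver's defaults. */
enum {
    OFFSET_WIDTH = 4,                  /* 16-byte cache lines */
    INDEX_WIDTH  = 2,                  /* 4 cache lines */
    LINE_SIZE    = 1 << OFFSET_WIDTH,
    INDEX_COUNT  = 1 << INDEX_WIDTH,
    EXT_RAM_SIZE = 1024
};
#define EMPTY_MASK (1u << 30)          /* assumed EMPTY_BIT position */
#define DIRTY_MASK (1u << 31)          /* assumed DIRTY_BIT position */
#define TAG_MASK   (~DIRTY_MASK)       /* tag includes the empty bit */

static uint8_t  ext_ram[EXT_RAM_SIZE];           /* stands in for the SRAM */
static uint8_t  cache[INDEX_COUNT * LINE_SIZE];  /* stands in for the hub cache area */
static uint32_t tags[INDEX_COUNT];               /* stands in for the cog page table */

/* Like vmflush: mark every line empty so the first access always misses. */
void vm_flush(void) {
    for (int i = 0; i < INDEX_COUNT; i++) tags[i] = EMPTY_MASK;
}

/* Like waitcmd + miss: return the cache line that holds vaddr. */
uint8_t *vm_access(uint32_t vaddr, int is_write) {
    uint32_t page = vaddr >> OFFSET_WIDTH;
    uint32_t line = page & (INDEX_COUNT - 1);
    uint8_t *hub  = &cache[line << OFFSET_WIDTH];

    if ((tags[line] & TAG_MASK) != page) {       /* cache miss */
        if (tags[line] & DIRTY_MASK) {           /* BWRITE the old page first */
            uint32_t old = (tags[line] & TAG_MASK) << OFFSET_WIDTH;
            memcpy(&ext_ram[old], hub, LINE_SIZE);
        }
        memcpy(hub, &ext_ram[page << OFFSET_WIDTH], LINE_SIZE);  /* BREAD */
        tags[line] = page;                       /* new tag, clean */
    }
    if (is_write) tags[line] |= DIRTY_MASK;      /* set the dirty bit on writes */
    return hub;
}

void vm_write_byte(uint32_t vaddr, uint8_t value) {
    vm_access(vaddr, 1)[vaddr & (LINE_SIZE - 1)] = value;
}

uint8_t vm_read_byte(uint32_t vaddr) {
    return vm_access(vaddr, 0)[vaddr & (LINE_SIZE - 1)];
}
```

The point to take away is that the only hardware-specific parts are the two memcpy calls, which correspond to BREAD and BWRITE. Everything else should carry over unchanged to a different memory board.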

and this is the complete driver code for the touchscreen

DAT
'' +-----------------------------------------------------------------------------------------------+
'' | Touchblade 161 Ram Driver (with grateful acknowledgements to Cluso and Average Joe)           |
'' +-----------------------------------------------------------------------------------------------+
                        org     0
tbp2_start    ' setup the pointers to the hub command interface (saves execution time later)
                                      '  +-- These instructions are overwritten as variables after start
comptr                  mov     comptr, par     ' -|  hub pointer to command                
hubptr                  mov     hubptr, par     '  |  hub pointer to hub address            
ramptr                  add     hubptr, #4      '  |  hub pointer to ram address            
lenptr                  mov     ramptr, par     '  |  hub pointer to length                 
errptr                  add     ramptr, #8      '  |  hub pointer to error status           
cmd                     mov     lenptr, par     '  |  command  I/R/W/G/P/Q                  
hubaddr                 add     lenptr, #12     '  |  hub address                           
ramaddr                 mov     errptr, par     '  |  ram address                           
len                     add     errptr, #16     '  |  length                                
err                     nop                     ' -+  error status returned (=0=false=good) 


' Initialise hardware tristates everything and read/write set the pins
init                    mov     err, #0                  ' reset err=false=good
                        mov     dira,zero                ' tristate the pins with the cog dira

done                    wrlong  err, errptr             ' status  =0=false=good, else error x
                        wrlong  zero, comptr            ' command =0 (done)
' wait for a command (pause short time to reduce power)
pause
'                        mov     ctr, delay      wz      ' if =0 no pause
'              if_nz     add     ctr, cnt
'              if_nz     waitcnt ctr, #0                 ' wait for a short time (reduces power)
                        rdlong  cmd, comptr     wz      ' command ?
              if_z      jmp     #pause                  ' not yet
' decode command
                        cmp     cmd, #"S"       wz      ' hub to ram
              if_z      jmp     #pasmhubtoram           
                        cmp     cmd, #"T"       wz      ' ram to hub
              if_z      jmp     #pasmramtohub
                        cmp     cmd, #"U"       wz      ' ram to display
              if_z      jmp     #pasmramtodisplay
                        cmp     cmd, #"V"       wz      ' hub to display
              if_z      jmp     #pasmhubtodisplay           
                        cmp     cmd, #"E"       wz      ' convert 3 byte .raw format to 2 byte .ili format - hub to hub
              if_z      jmp     #rawtoiliformat
                        cmp     cmd, #"F"       wz      ' convert 3 byte .bmp format BGR to 2 byte ili format (same as E but order reversed)
              if_z      jmp     #bmptoiliformat              
 '                       cmp     cmd, #"W"       wz      ' lcdwritecom in pasm, not working
 '             if_z      jmp     #pasmlcdwritecom
                        cmp     cmd, #"X"       wz      ' merge icon and background based on a mask
              if_z      jmp     #mergeicons
               
                        cmp     cmd, #"I"       wz      ' init
              if_z      jmp     #init     
                        mov     err, cmd                ' error = cmd (unknown command)
                        jmp     #done
                        
' ----------------- common routines -------------------------------------

get_values              rdlong  hubaddr, hubptr         ' get hub address
                        rdlong  ramaddr, ramptr         ' get ram address
                        rdlong  len, lenptr             ' get length
                        mov     err, #5                 ' err=5
get_values_ret          ret

                    ' ??come to this with possibly all pins tristated so need to make P16-P20 high before changing the 138 value 
set138                  shl     pasm_n,#25              ' pass n =0 to 7
                        or      dira,maskP0P20P25P27    ' make P25-P27 outputs as well as P0 to P20
                        andn    outa,mask138            ' make these three pins low
                        or      outa,pasm_n             ' set the 138 pins
set138_ret              ret


load161pasm                                             ' uses ramaddr
                        mov     pasm_n,#7
                        call    #set138                 ' deselect previous 138 value
                        or      dira,maskP0P20          ' %00000000_00011111_11111111_11111111         ' P0-P18 enabled for output plus P19,P20 
                        and     outa,maskP18low         ' %00001111_11111000_00000000_00000000         ' preserve previous values but set A0-18 low   
                        or      outa,ramaddr            ' output address to 161 chips
                        andn    outa,maskP19            ' set pin 19 low =  161 clock
                        mov     pasm_n,#1               ' 161 load low
                        call    #set138                 ' set it low
                        or      outa,maskP19            ' set pin 19 high = 161 clock
                        or      outa,maskP16P20         ' %00000000_00011111_00000000_00000000         ' set P16-P20 high prior to changing 138 
                        mov     pasm_n,#2               ' 161 load high and back to mem transfer
                        call    #set138                 ' send out
load161pasm_ret         ret

stop                   jmp     #stop                  ' for debugging



' ------------------ single letter commands  -------------------------------------
' command S
pasmhubtoram            call    #get_values             ' get hubaddr,ramaddr,len
                        call    #load161pasm                ' load the 161 counters with ramaddr
hubtoram_loop           and     outa,maskP16P31         '%11111111_11111111_00000000_00000000       ' clear for output                   
                        rdword  data_16,hubaddr         ' get the word from hub
                        and     data_16,maskP0P15       ' mask to a word only
                        or      outa,data_16            ' send out the word to P0-P15
                        andn    outa,maskP20            ' set write low
                        add     hubaddr,#2              ' increment by 2 bytes = 1 word. Put this here for small delay while writes
                        or      outa,maskP20            ' write high
                        andn    outa,maskP19            ' clock 161 low
                        or      outa,maskP19            ' clock 161 high
                        djnz    len,#hubtoram_loop      ' loop this many times
                        jmp     #init                   ' tristate pins and listen for commands

' command T
pasmramtohub            call    #get_values             ' get hubaddr,ramaddr,len
                        call    #load161pasm            ' load the 161 counters with ramaddr
                        and     dira,maskP16P27         ' %00001111_11111111_00000000_00000000 set P0-P15 as inputs   
                        andn    outa,maskP16            ' memory /rd low
ramtohub_loop           mov     data_16,ina             ' get the data
                        wrword  data_16,hubaddr         ' move data to hub
                        andn    outa,maskP19            ' clock 161 low
                        or      outa,maskP19            ' clock 161 high
                        add     hubaddr,#2              ' increment the hub address 
                        djnz    len,#ramtohub_loop
                        or      outa,maskP16            ' memory /rd high  
                        or      dira,maskP0P15          ' %00000000_00000000_11111111_11111111 restore P0-P15 as outputs
                        jmp     #init                   ' tristate pins and listen for commands

' command U
pasmramtodisplay        call    #get_values             ' get hubaddr,ramaddr,len
                        call    #load161pasm            ' load the 161 counters with ramaddr
                        or      outa,maskP18            ' ILI_RS high
                        andn    outa,maskP16            ' memory /rd low 
                        and     dira,maskP16P27         ' disable prop pins %00001111_11111111_00000000_00000000 set P0-P15 as inputs    
ramtodisplay_loop       andn    outa,maskP17            ' ILI write low
                        or      outa,maskP17            ' ILI write high
                        andn    outa,maskP19            ' clock 161 low
                        or      outa,maskP19            ' clock 161 high
                        djnz    len,#ramtodisplay_loop
                        or      outa,maskP16            ' memory /rd high  
                        or      dira,maskP0P15          ' %00000000_00000000_11111111_11111111 restore P0-P15 as outputs
                        jmp     #init

' command V
pasmhubtodisplay        call    #get_values             ' get hubaddr,ramaddr,len
                        or      outa,maskP16P20         ' %00000000_00011111_00000000_00000000         ' set P16-P20 high prior to changing 138
                        mov     pasm_n,#7
                        call    #set138                 ' deselect previous 138 value
                        mov     pasm_n,#2               ' mem transfer
                        call    #set138                 ' send out
hubtodisplay_loop       and     outa,maskP16P31         '%11111111_11111111_00000000_00000000       ' clear for output                   
                        rdword  data_16,hubaddr         ' get the word from hub
                        and     data_16,maskP0P15       ' mask to a word only
                        or      outa,data_16            ' send out the word to P0-P15
                        andn    outa,maskP17            ' ILI write low
                        or      outa,maskP17            ' ILI write high
                        add     hubaddr,#2              ' one word
                        djnz    len,#hubtodisplay_loop
                        jmp     #init

'command E
RawtoILIformat          ' takes a .raw 3 byte RRRRRRRR GGGGGGGG BBBBBBBB and converts to 2 byte RRRRRGGG GGGBBBBB
                        ' pass hubaddr, ramaddr and len
                        ' hubaddr is source location, len is number of pixels
                        ' ramaddr is destination in hub (messy naming) and length is 2/3 of blocklength
                        call    #get_values ' gets hubaddress, ramaddress and len (ignores ramaddress)
rawloop
                        rdbyte red,hubaddr
                        add hubaddr,#1
                        rdbyte green,hubaddr
                        add hubaddr,#1
                        rdbyte blue,hubaddr
                        add hubaddr,#1
                        call #rgbtoili
                        wrbyte ililow,ramaddr
                        add ramaddr,#1
                        wrbyte ilihigh,ramaddr
                        add ramaddr,#1
                        djnz    len,#rawloop            ' loop until done 
                        jmp     #init                   ' set pins to tristate

RGBtoILI                ' pass red,green, blue, returns ililow and ilihigh
                        shr     red,#3                  ' 000RRRRR 
                        shl     red,#3                  ' RRRRR000 
                        shr     green,#2                ' 00GGGGGG
                        mov     ilihigh,green           ' ilihigh = 00GGGGGG
                        shr     ilihigh,#3              ' ilihigh = 00000GGG
                        or      ilihigh,red             ' ilihigh = RRRRRGGG
                        and     green,#%00000111        ' 00000GGG
                        shl     green,#5                ' GGG00000
                        mov     ililow,green            ' ililow = GGG00000
                        shr     blue,#3                 ' blue = 000BBBBB
                        or      ililow,blue             ' ililow = GGGBBBBB
RGBtoILI_ret            ret

BMPtoILIformat          ' takes a .bmp 3 byte BBBBBBBB GGGGGGGG RRRRRRRR and converts to 2 byte RRRRRGGG GGGBBBBB
                        ' same as E above but BGR instead of RGB
                        ' pass hubaddr, ramaddr and len
                        ' hubaddr is source location, len is number of pixels
                        ' ramaddr is destination in hub (messy naming) and length is 2/3 of blocklength
                        call    #get_values ' gets hubaddress, ramaddress and len (ignores ramaddress)
bmploop
                        rdbyte blue,hubaddr
                        add hubaddr,#1
                        rdbyte green,hubaddr
                        add hubaddr,#1
                        rdbyte red,hubaddr
                        add hubaddr,#1
                        call #rgbtoili
                        wrbyte ililow,ramaddr
                        add ramaddr,#1
                        wrbyte ilihigh,ramaddr
                        add ramaddr,#1
                        djnz    len,#bmploop            ' loop until done 
                        jmp     #init                   ' set pins to tristate
' **** command X *********************

MergeIcons              call    #get_values ' gets hubaddress, ramaddress,len which are used here as background,icon,mask
                        mov     pasm_n,#59               ' do a single row
mergeiconsloop          rdbyte  ililow,len                 ' reuse ililow, so this is rdbyte mask,maskcounter
                        and     ililow,#%11111             ' keep the low 5 bits and use just the blue as this is a grayscale bitmap
                        rdword  red,hubaddr              ' reuse red, so actually this is rdword background,backgroundcounter                        
                        cmp     ililow,#%10000   wc       ' compare if >128 (ie mid level gray)
              if_c      jmp     #mergeskip
                        rdword  green,ramaddr            ' reuse green, so this is rdword iconpixel, iconpixelcounter 
                        wrword  green,hubaddr            ' if replace, then move icon pixel to the background     
mergeskip               add     hubaddr,#2
                        add     ramaddr,#2
                        add     len,#2
                        djnz    pasm_n,#mergeiconsloop            ' loop until done 
                        jmp     #init                   'set pins to tristate 

                        

'pasmlcdwritecom         call    #get_values             ' use hubaddr as the data
'                        or      dira,maskP0P20          ' set these pins high (pass all pins tristated)
'                        or      outa,maskP0P20          '  set pins high
'                        mov     pasm_n,#2               '  mem transfer
'                        call    #set138                 ' set the 138
'                        andn    outa,maskP18            ' P18 ILIRS low
'                        and     outa,maskP16P31         ' set P0-P15 low
'                        or      outa,hubaddr            ' send out the data
'                        andn    outa,maskP17            ' ILI write low
'                        or      outa,maskP17            ' ILI write high
'                        jmp     #init                   ' set pins to tristate  

' variables
pasm_n                  long    0                                    ' general purpose value
data_16                 long    0                                    ' general purpose value
ililow                  long    0                                    ' low data byte 
ilihigh                 long    0                                    ' high data byte 
red                     long    0                                    ' red, green blue variables
green                   long    0
blue                    long    0           

' constants
Zero                    long    %00000000_00000000_00000000_00000000 ' used in several places
mask138                 long    %00001110_00000000_00000000_00000000 ' mask for the three 138 pins   
maskP0P20               long    %00000000_00011111_11111111_11111111 ' P0-P18 enabled for output plus P19,P20    
maskP18low              long    %00001111_11111000_00000000_00000000 ' P0-P18 low
maskP16                 long    %00000000_00000001_00000000_00000000 ' pin 16
maskP17                 long    %00000000_00000010_00000000_00000000 ' pin 17
maskP18                 long    %00000000_00000100_00000000_00000000 ' pin 18
maskP19                 long    %00000000_00001000_00000000_00000000 ' pin 19
maskP20                 long    %00000000_00010000_00000000_00000000 ' pin 20
maskP16P31              long    %11111111_11111111_00000000_00000000 ' pin 16 to pin 31
maskP0P20P25P27         long    %00001110_00011111_11111111_11111111  ' enable all pins as outputs except SD pins
maskP0P15               long    %00000000_00000000_11111111_11111111 ' for masking words
maskP16P20              long    %00000000_00011111_00000000_00000000
maskP16P27              long    %00001111_11111111_00000000_00000000
                        fit     496
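
As an aside on commands E and F in the touchscreen driver above: the bit shuffling in RGBtoILI can be checked on a PC with a few lines of C. This is just a sketch that mirrors the PASM step by step. The type and function names are invented for illustration, and the output byte order (low byte first) simply follows the two wrbyte instructions in rawloop and bmploop.

```c
#include <stdint.h>

/* Hypothetical holder for the two output bytes of RGBtoILI. */
typedef struct {
    uint8_t low;    /* GGGBBBBB */
    uint8_t high;   /* RRRRRGGG */
} ili_pixel;

/* Mirrors RGBtoILI: 8-8-8 in, 5-6-5 out. */
ili_pixel rgb_to_ili(uint8_t red, uint8_t green, uint8_t blue) {
    ili_pixel p;
    uint8_t g6 = (uint8_t)(green >> 2);                   /* 00GGGGGG */
    p.high = (uint8_t)((red & 0xF8) | (g6 >> 3));         /* RRRRRGGG */
    p.low  = (uint8_t)(((g6 & 0x07) << 5) | (blue >> 3)); /* GGGBBBBB */
    return p;
}

/* Command E: source bytes are in R, G, B order. */
void raw_to_ili(const uint8_t *src, uint8_t *dst, unsigned pixels) {
    while (pixels--) {
        ili_pixel p = rgb_to_ili(src[0], src[1], src[2]);
        dst[0] = p.low;     /* low byte is written first, as in rawloop */
        dst[1] = p.high;
        src += 3;
        dst += 2;
    }
}

/* Command F: same conversion, but source bytes are in B, G, R order. */
void bmp_to_ili(const uint8_t *src, uint8_t *dst, unsigned pixels) {
    while (pixels--) {
        ili_pixel p = rgb_to_ili(src[2], src[1], src[0]);
        dst[0] = p.low;
        dst[1] = p.high;
        src += 3;
        dst += 2;
    }
}
```

Pure red, green and blue should come out as $F800, $07E0 and $001F when the high and low bytes are put back together, which is an easy sanity check before trusting a converted bitmap.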

I guess the first thing to point out is that there are a lot of extra functions in the touchscreen driver that can be ignored. At the core are just two routines: move a block of data from hub to ram, and move a block of data from ram to hub.

In every routine there is one common call, which collects the variables from the calling program

call    #ram_open               ' get variables from hub variables

and just to cause myself and others confusion, in the new code this routine has changed names to

call    #get_values             ' get hubaddr,ramaddr,len

It does the same thing though which is this

get_values              rdlong  hubaddr, hubptr         ' get hub address
                        rdlong  ramaddr, ramptr         ' get ram address
                        rdlong  len, lenptr             ' get length
                        mov     err, #5                 ' err=5
get_values_ret          ret
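
Seen from the C side, the block of hub longs that get_values reads could be described roughly like this. It is a sketch only: the struct and function names are hypothetical, the offsets are taken from the pointer setup at tbp2_start (par, par+4, par+8, par+12, par+16), and writing the command long last is my assumption about a sensible ordering, since the cog starts work as soon as it sees a non-zero command.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the five longs starting at PAR. */
typedef struct {
    volatile uint32_t cmd;      /* PAR+0:  command letter, 0 = idle/done */
    volatile uint32_t hubaddr;  /* PAR+4:  hub address */
    volatile uint32_t ramaddr;  /* PAR+8:  external ram address */
    volatile uint32_t len;      /* PAR+12: length (loop count) */
    volatile uint32_t err;      /* PAR+16: error status, 0 = good */
} xmm_mailbox;

/* Fill in the parameters first, then the command (assumed ordering). */
void xmm_post(xmm_mailbox *mb, char cmd,
              uint32_t hubaddr, uint32_t ramaddr, uint32_t len) {
    mb->hubaddr = hubaddr;
    mb->ramaddr = ramaddr;
    mb->len     = len;
    mb->cmd     = (uint32_t)cmd;   /* the cog polls this long in its pause loop */
}

/* The cog writes 0 to the command long at "done". */
int xmm_done(const xmm_mailbox *mb) {
    return mb->cmd == 0;
}
```

A caller would post a command such as 'T', spin on xmm_done, and then look at err, where a non-zero value is the unknown command letter echoed back by the driver.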

Next thing is a startup routine. David Betz calls this BSTART

call    #BSTART

For the dracblade this sets the high latch with address A16-A18.
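
To make the latch scheme concrete, here is a small C sketch of how a ram address is split into the three values that BSTART and RamAddress put on outa before pulsing the gate. The select constants are copied from LowAddress, MiddleAddress and HighAddress in the listing; the struct and function names are made up for illustration.

```c
#include <stdint.h>

/* 138 select values copied from the driver constants (gate bit included). */
enum {
    LOW_ADDRESS    = 0x0500,   /* %0101_00000000 */
    MIDDLE_ADDRESS = 0x0700,   /* %0111_00000000 */
    HIGH_ADDRESS   = 0x0900    /* %1001_00000000 */
};

/* Hypothetical holder for the three outa values. */
typedef struct {
    uint32_t low;
    uint32_t middle;
    uint32_t high;
} latch_words;

latch_words split_address(uint32_t vmaddr) {
    latch_words w;
    w.low    = (vmaddr & 0xFF)         | LOW_ADDRESS;     /* A0-A7 */
    w.middle = ((vmaddr >> 8) & 0xFF)  | MIDDLE_ADDRESS;  /* A8-A15 */
    w.high   = ((vmaddr >> 16) & 0xFF) | HIGH_ADDRESS;    /* A16 and up */
    return w;
}
```

In the cache driver the high value is sent once per cache line in BSTART, while the low and middle values are resent for every byte by RamAddress.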

On the touchscreen, the equivalent code is

                        call    #load161pasm                ' load the 161 counters with ramaddr

which loads up the 161 counters with the starting address.

I'll just note here that David's BSTART routine also contains some cache-specific lines of code, eg

        mov     address,vmaddr          ' get the high address byte

and a couple more at the end. So those will need to be replicated in the load161pasm driver code.

So once that is done, I think the aim is to replace this read loop

BREAD
        call    #BSTART
rdloop  call    #read_memory_byte       ' read byte from address into data_8
        wrbyte  data_8,ptr              ' write data_8 to hubaddr ie copy byte to hub
        add     ptr,#1                  ' add 1 to hub address
        add     address,#1              ' add 1 to ram address
        djnz    count,#rdloop           ' loop until done
BREAD_RET
        ret

with this code

pasmhubtoram            call    #get_values             ' get hubaddr,ramaddr,len
                        call    #load161pasm                ' load the 161 counters with ramaddr
hubtoram_loop           and     outa,maskP16P31         '%11111111_11111111_00000000_00000000       ' clear for output                   
                        rdword  data_16,hubaddr         ' get the word from hub
                        and     data_16,maskP0P15       ' mask to a word only
                        or      outa,data_16            ' send out the word to P0-P15
                        andn    outa,maskP20            ' set write low
                        add     hubaddr,#2              ' increment by 2 bytes = 1 word. Put this here for small delay while writes
                        or      outa,maskP20            ' write high
                        andn    outa,maskP19            ' clock 161 low
                        or      outa,maskP19            ' clock 161 high
                        djnz    len,#hubtoram_loop      ' loop this many times
                        jmp     #init                   ' tristate pins and listen for commands

A couple of things to note there. First, all my routines finish with a jmp #init. All of David's routines finish with a ret. Whatever happens, ultimately the cog routine must end by tristating all the propeller pins, ie

init                    mov     err, #0                  ' reset err=false=good
                        mov     dira,zero                ' tristate the pins

And the other thing to note is that there are two ram chips so it is reading in a word at a time, not a byte at a time. So hubaddr etc are incremented by 2 rather than by 1 each loop. So somewhere along the line, len will need to be divided by 2 for cache access (but not for access from within C code).
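
To make the byte/word bookkeeping concrete, here is a host-side C model of the hubtoram loop. It is a sketch under assumptions: the function names are hypothetical, ram[] stands in for the two RAM chips viewed as one 16-bit wide memory, and ramaddr is treated as a word address because the 161 counters are clocked once per word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of 16-bit transfers needed for len_bytes bytes.
   Rounds up, so an odd byte count still moves its last byte. */
static uint32_t words_for_bytes(uint32_t len_bytes)
{
    return (len_bytes >> 1) + (len_bytes & 1u);
}

/* Model of the loop: one word per iteration, hub pointer advances by 2.
   For an odd len_bytes the last transfer also reads the byte after the
   block, just as rdword would, so the caller must make that byte readable. */
static void hub_to_ram_words(const uint8_t *hub, uint16_t *ram,
                             uint32_t ram_word_addr, uint32_t len_bytes)
{
    uint32_t count = words_for_bytes(len_bytes);

    assert(hub != NULL && ram != NULL);

    while (count-- > 0) {
        /* rdword is little-endian: low byte at the lower hub address. */
        uint16_t data_16 = (uint16_t)(hub[0] | ((uint16_t)hub[1] << 8));
        ram[ram_word_addr++] = data_16;   /* counters step to the next word */
        hub += 2;                         /* 2 bytes = 1 word */
    }
}
```

So a cache line of 64 bytes would become a loop count of 32 in the PASM version.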

I suppose we have to work out what this board should be called. The easy answer might be a Touchblade, but I don't want to monopolise the "touch" part as I am sure many other touchscreens will end up supported by GCC.

So I think this encapsulates all the code and bits that need to change.

A general question first. The ILI driver code does multiple things in one cog, because each routine is small so lots can be fitted into a cog.

Is it possible to "share" the cache code and the display driver code in one cog?

If so, how do we send commands like Cluso99's single letter commands, and also send a cache command?

OR (and probably easier), split it up and have the cache code run in one cog, and do all the video driver code in another cog.

If we go for option #2, then the code becomes much simpler. Just ram to hub and hub to ram. I might start with that first and then think about combining cogs (to save a cog) later.

David Betz · 2012-04-28 02:49

Wow! It looks like you've done a lot of work on this already. You asked if there was a way to add a command parser to handle the single character commands that Cluso uses in his driver. While it doesn't use characters, all of the PropGCC cache drivers already have a command dispatch mechanism based on integer command IDs. This table could be extended to support more commands. You could certainly add commands to drive the display if all of the code will fit in the COG.
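
To illustrate the idea of dispatching on integer command IDs, here is a minimal C sketch. It is not the actual PropGCC driver code (that is PASM): the command IDs, handler names, and the use of mailbox slot 1 for both the argument and the result are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command IDs; the real drivers define their own. */
enum {
    CMD_IDLE = 0,        /* mailbox empty, nothing to do */
    CMD_RAM_TO_HUB,
    CMD_HUB_TO_RAM,
    CMD_DRAW_PIXEL,      /* example of an added display command */
    CMD_COUNT
};

#define ERR_BAD_COMMAND 0xFFFFFFFFu

typedef uint32_t (*cmd_handler_t)(uint32_t arg);

/* Placeholder handlers: each tags its argument so a caller can tell
   which one ran. Real handlers would do the transfer or drawing. */
static uint32_t on_ram_to_hub(uint32_t arg) { return 0x100u + arg; }
static uint32_t on_hub_to_ram(uint32_t arg) { return 0x200u + arg; }
static uint32_t on_draw_pixel(uint32_t arg) { return 0x300u + arg; }

/* The dispatch table: index = command ID. */
static const cmd_handler_t handlers[CMD_COUNT] = {
    NULL, on_ram_to_hub, on_hub_to_ram, on_draw_pixel
};

/* One pass of the driver's command loop: look at mbox[0], run the
   matching handler on mbox[1], store the result in mbox[1], and
   clear mbox[0] to tell the caller the command has completed. */
static void service_mailbox(volatile uint32_t *mbox)
{
    uint32_t cmd;

    assert(mbox != NULL);

    cmd = mbox[0];
    if (cmd == CMD_IDLE)
        return;

    if (cmd < CMD_COUNT && handlers[cmd] != NULL)
        mbox[1] = handlers[cmd](mbox[1]);
    else
        mbox[1] = ERR_BAD_COMMAND;

    mbox[0] = CMD_IDLE;
}
```

Adding a display command then means adding one enum value, one handler, and one table entry, as long as the resulting code still fits in the cog.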

denominator · 2012-04-28 05:41

Dr_Acula wrote: »

Is it possible to "share" the cache code and the display driver code in one cog?

Yes. The SD cache driver does just that. It supports the XMM kernels by handling code-memory read cache misses, and it also handles generic block read/write commands for the C library's SD file system interface.

The trick is found in lib/drivers/load_sd_driver.c and lib/drivers/file_io.c. In that code, you'll notice something similar to this:

#ifndef __PROPELLER_LMM__
extern uint16_t _xmm_mbox_p;

...

        sd_mbox = (uint32_t *)(uint32_t)_xmm_mbox_p;
#endif


static uint32_t __attribute__((section(".hubtext"))) do_cmd(uint32_t cmd)
{
    sd_mbox[0] = cmd;
    while (sd_mbox[0]);
    return sd_mbox[1];
}

The C library uses this code to send SD sector commands to the underlying cache driver. These are the extended commands that David mentioned. Check out the full source at http://code.google.com/p/propgcc/source/browse/lib/drivers/file_io.c. You might also want to check out the SD cache driver at http://code.google.com/p/propgcc/source/browse/loader/spin/sd_cache.spin.

Note that this code also supports using a separate driver that does just the sector read/writes needed by the library (your "option #2"). This works when you need SD card access in LMM mode and also when you need SD card access and you're using a different caching mechanism (most likely because all the other caching mechanisms are faster than the SD card cache). In these cases, the split driver is necessary because there is no SD cache driver!

To see this, check out http://code.google.com/p/propgcc/source/browse/lib/drivers/sd_driver.s and notice how similar it is to the aforementioned SD cache driver - just the cache code is missing. This file is loaded manually by the library in http://code.google.com/p/propgcc/source/browse/lib/drivers/load_sd_driver.c. (BTW, the sd_driver.spin driver in loader/spin looks almost 100% identical to this driver - but this driver is not used by the C library, it is solely used by the loader.)

Also note that it works 100% fine when both the SD cache driver and the SD library are loaded at the same time (again, your "option #2"). This wastes a cog, but it does work.

Dr_Acula · 2012-04-28 06:58

Interesting. Sharing SD XMM and SD for file I/O is similar to the problem of sharing external ram for XMM and for display updates.

Speaking of SD cards, I see there is an option in the dropdown menu "Dracblade-SDXMMC". What does that do?

And are there any xmm models out there where the program is stored on an SD card? Sounds a bit crazy, but with caching, it ought to be no slower than storing a program in serial sram or serial flash.

jazzed · 2012-04-28 07:58

Dr_Acula wrote: »

Interesting. Sharing SD XMM and SD for file I/O is similar to the problem of sharing external ram for XMM and for display updates.

Speaking of SD cards, I see there is an option in the dropdown menu "Dracblade-SDXMMC". What does that do?

And are there any xmm models out there where the program is stored on an SD card? Sounds a bit crazy, but with caching, it ought to be no slower than storing a program in serial sram or serial flash.

In SimpleIDE version 0-6-7 the -SDXMMC and -SDLOAD board types don't work. I have a fix and will post it later today.

The user's guide explains these, but for convenience ....

Program->Run (F10) and Program->Run Console (F8) buttons:

Dracblade-SDXMMC should start your XMMC program and run it from SDCard.
Dracblade-SDLOAD should put your program on SDCard and load it to SRAM and run.

Program->Burn (F11) button:

The boot loader is programmed to EEPROM for booting either SDXMMC or SDLOAD method.
The AUTORUN.PEX program can be replaced by copying it to SDcard - then you reset the board to run it.

Program->Build (F9) button:

Using the build hammer with SDXMMC or SDLOAD selected will create a.pex.
The a.pex file can be copied to SDcard as AUTORUN.PEX for booting after burning the EEPROM.

Tools->Send File to Target SDCard:

The download button next to the build hammer lets you send any file to SDcard via a serial protocol.
By default in XMM modes it will create a new AUTORUN.PEX program.
You can choose to send that or any file serially to the target SDCard.
I often use it to just copy AUTORUN.PEX to the SDCard using the filesystem.

David Betz · 2012-04-28 07:59

Dr_Acula wrote: »

Interesting. Sharing SD XMM and SD for file I/O is similar to the problem of sharing external ram for XMM and for display updates.

Speaking of SD cards, I see there is an option in the dropdown menu "Dracblade-SDXMMC". What does that do?

And are there any xmm models out there where the program is stored on an SD card? Sounds a bit crazy, but with caching, it ought to be no slower than storing a program in serial sram or serial flash.

You guessed it. That is exactly what SD XMMC mode does. It runs a program directly from the SD card for systems that don't have any other external memory. The downside is that the SD sector size is 512 bytes, which is much larger than the optimal cache line size.

Dr_Acula · 2012-04-28 16:28

You guessed it. That is exactly what SD XMMC mode does.

Prof Braino suggested something similar recently too. Thinking of the way a touchscreen OS might work, you have a "main" program that waits for the user to touch a key. Then it loads up an operation, eg a calculator, or a picture viewer, and that would need some cache changes. But while a calculator is running, no cache updates would be needed as it would all fit in a few kilobytes. So maybe SD cache is a "generic" option for many boards?

If so, could one think about a SD cache driver that worked on the dracblade by defining pins 12,13,14,15 and worked on the various demoboards by defining pins 0,1,2,3 and worked on the Touch161 board by defining pins 24,25,26,27?

Maybe you already have this for the demoboards?

Re the 512 byte sector size, all my programs seem to end up with an array "byte sdbuffer[512]" so if you had that and you read in 512 bytes and the next cache read request happened to be already in the buffer, could you detect that and not have to read the sd card again?
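
The "is it already in the buffer?" check could look something like the C sketch below. This is only an illustration of the idea, not how the PropGCC SD cache driver works: the names are hypothetical, fake_read_sector stands in for a real card read, and requests that cross a sector boundary are simply rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u

typedef struct {
    uint8_t  data[SECTOR_SIZE]; /* plays the role of sdbuffer[512] */
    uint32_t sector;            /* which sector data[] currently holds */
    int      valid;             /* 0 until the first card read */
    uint32_t card_reads;        /* counts real card accesses, for illustration */
} sector_buffer_t;

/* Stand-in for a real SD sector read: fills the buffer with a pattern
   that depends on the sector number. */
static void fake_read_sector(uint32_t sector, uint8_t *buf)
{
    uint32_t i;
    for (i = 0; i < SECTOR_SIZE; i++)
        buf[i] = (uint8_t)(sector + i);
}

/* Copy n bytes starting at byte_addr into dst. Returns 0 on success,
   or -1 if the request crosses a sector boundary (not handled here). */
static int buffered_read(sector_buffer_t *sb, uint32_t byte_addr,
                         uint8_t *dst, uint32_t n)
{
    uint32_t sector = byte_addr / SECTOR_SIZE;
    uint32_t offset = byte_addr % SECTOR_SIZE;

    assert(sb != NULL && dst != NULL);

    if (n > SECTOR_SIZE - offset)
        return -1;

    /* Only touch the card when the wanted sector is not the buffered one. */
    if (!sb->valid || sb->sector != sector) {
        fake_read_sector(sector, sb->data);
        sb->sector = sector;
        sb->valid = 1;
        sb->card_reads++;
    }

    memcpy(dst, sb->data + offset, n);
    return 0;
}
```

With a 64-byte cache line, eight consecutive line fills from the same sector would cost one card read in this sketch instead of eight.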

David Betz · 2012-04-28 16:32

Dr_Acula wrote: »

Prof Braino suggested something similar recently too. Thinking of the way a touchscreen OS might work, you have a "main" program that waits for the user to touch a key. Then it loads up an operation, eg a calculator, or a picture viewer, and that would need some cache changes. But while a calculator is running, no cache updates would be needed as it would all fit in a few kilobytes. So maybe SD cache is a "generic" option for many boards?

If so, could one think about a SD cache driver that worked on the dracblade by defining pins 12,13,14,15 and worked on the various demoboards by defining pins 0,1,2,3 and worked on the Touch161 board by defining pins 24,25,26,27?

Maybe you already have this for the demoboards?

Re the 512 byte sector size, all my programs seem to end up with an array "byte sdbuffer[512]" so if you had that and you read in 512 bytes and the next cache read request happened to be already in the buffer, could you detect that and not have to read the sd card again?

I don't have a demo board so I can't say whether our SD cache driver will work with it but I don't see why it wouldn't. The pin numbers are programmable and we can even handle different CS mechanisms like a simple single pin CS, the C3-style counter CS, and a mux like Bill Henning's boards use. The mux hasn't been tested, again because I don't have a board that uses one.

Dr_Acula · 2012-04-28 17:31

@jazzed

In SimpleIDE version 0-6-7 the -SDXMMC and -SDLOAD board types don't work. I have a fix and will post it later today.

Thanks++

If you have a generic SD XMM version then this might end up a super simple solution for both myself and for others with different boards. Just take the board I have and change the pin numbers. (See post #22. The order of pins on the touch161 is the same order as the dracblade. I'm not sure about gadget gangster boards though.)

I've designed boards with multiplexed pins for SD cards but in the end it is simpler to use existing code and just devote 4 propeller pins to SD cards. Which means that a solution for the dracblade will work for the gadget gangster board and the touch161 board. Just change the pin numbers.

This could open up large C programs for a whole lot of people who have boards with SD cards.

David Betz · 2012-04-28 18:14

Dr_Acula wrote: »

@jazzed

Thanks++

If you have a generic SD XMM version then this might end up a super simple solution for both myself and for others with different boards. Just take the board I have and change the pin numbers. (See post #22. The order of pins on the touch161 is the same order as the dracblade. I'm not sure about gadget gangster boards though.)

I've designed boards with multiplexed pins for SD cards but in the end it is simpler to use existing code and just devote 4 propeller pins to SD cards. Which means that a solution for the dracblade will work for the gadget gangster board and the touch161 board. Just change the pin numbers.

This could open up large C programs for a whole lot of people who have boards with SD cards.

Several of our cache drivers can easily be programmed for different pins and chip selects by setting values in the board configuration file. The drivers themselves do not need to be modified or recompiled. This is true for the SD cache driver and there are also some new drivers for SPI flash and Quad SPI flash chips that can be used with chips connected to any pins and with a variety of chip selects.

Dr_Acula · 2012-04-28 18:16

You wouldn't happen to have a link to such a generic SD cache driver by any chance? Or the board config file? (which c:\propgcc folder?) Maybe this whole thing might come down to changing 4 numbers in a text file?!

David Betz · 2012-04-28 18:31

Dr_Acula wrote: »

You wouldn't happen to have a link to such a generic SD cache driver by any chance? Or the board config file? (which c:\propgcc folder?) Maybe this whole thing might come down to changing 4 numbers in a text file?!

Here is the configuration file for the PropBOE. The lines that begin with "sdspi-" are the definitions of the SD card pins that are used by the SD cache driver. This is using a simple single pin CS but other options are available as I mentioned in an earlier message.

# [propboe]
# IDE:SDXMMC
    clkfreq: 80000000
    clkmode: XTAL1+PLL16X
    baudrate: 115200
    rxpin: 31
    txpin: 30
    cache-driver: eeprom_cache.dat
    cache-size: 8K
    cache-param1: 0
    cache-param2: 0
    eeprom-first: TRUE
    sd-driver: sd_driver.dat
    sdspi-do: 22
    sdspi-clk: 23
    sdspi-di: 24
    sdspi-cs: 25
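
For illustration only, here is a small C sketch that picks the four sdspi- pin settings out of lines in this "key: value" format. I am not claiming this is how propeller-load parses the file; the struct and function names are hypothetical, and the four additional sdspi- parameters mentioned elsewhere in the thread are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the four SD pin settings shown above. */
typedef struct {
    int pin_do;
    int pin_clk;
    int pin_di;
    int pin_cs;
} sd_pins_t;

/* Returns 1 if the line was one of the four sdspi- pin settings and its
   value was stored in pins, otherwise 0 (comments, other keys, blanks). */
static int parse_sd_pin_line(const char *line, sd_pins_t *pins)
{
    char key[32];
    int value;

    assert(line != NULL && pins != NULL);

    /* Skip leading blanks, read the key up to ':', then an integer.
       Lines such as "# IDE:SDXMMC" or "clkmode: XTAL1+PLL16X" fail to
       match both fields and are ignored. */
    if (sscanf(line, " %31[^: \t\r\n]: %d", key, &value) != 2)
        return 0;

    if (strcmp(key, "sdspi-do") == 0)
        pins->pin_do = value;
    else if (strcmp(key, "sdspi-clk") == 0)
        pins->pin_clk = value;
    else if (strcmp(key, "sdspi-di") == 0)
        pins->pin_di = value;
    else if (strcmp(key, "sdspi-cs") == 0)
        pins->pin_cs = value;
    else
        return 0;

    return 1;
}
```

Feeding it the PropBOE lines above would leave the struct holding 22, 23, 24 and 25.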

Dr_Acula · 2012-04-28 19:01

Fantastic. I'll plug that in when I get home from work and give it a go.

I'm really excited about the whole "cache and run from an SD card" concept - it has so many possibilities.

denominator · 2012-04-28 20:24

David Betz wrote: »

I don't have a demo board so I can't say whether our SD cache driver will work with it but I don't see why it wouldn't.

Dr. Acula:

I have added both a full-size and a micro-SD to the demo board and it works fine. Just like David said, all you have to do is provide the appropriate loader config file in the propeller-load directory.

To convert an existing config to allow it to be used with SD caching:

1) Add the following parameters - these use the pins I used for my demo board:

sdspi-do: 4
sdspi-clk: 5
sdspi-di: 6
sdspi-cs: 7

Note that there are 4 additional sdspi- parameters that allow you to use address multiplexing on the SPI bus - check the code or ask for an explanation if you want to use them.

2) If you're going to use the IDE, add a line near the top like this:

# IDE:SDXMMC

Note that you do not have to include the "sd-driver: sd_driver.dat" line unless your board provides some caching mechanism and you want to run your program using the alternate SD card execution method (the sd-loader method that reads your entire program and tosses it to the cache before starting).

- Ted

Dr_Acula · 2012-04-29 03:21

Thanks denominator.

I tried doing that but have run into some problems.

1) Change the file "propboe.cfg" so the pins are correct for my sd card. (same order, just add 2 to each number)
corrected file below

# [propboe]
# IDE:SDXMMC
    clkfreq: 80000000
    clkmode: XTAL1+PLL16X
    baudrate: 115200
    rxpin: 31
    txpin: 30
    cache-driver: eeprom_cache.dat
    cache-size: 8K
    cache-param1: 0
    cache-param2: 0
    eeprom-first: TRUE
    sd-driver: sd_driver.dat
    sdspi-do: 24
    sdspi-clk: 25
    sdspi-di: 26
    sdspi-cs: 27

Now compile a program using simpleIDE. PROPBOE in the dropdown menu at the top. I presume that is right. Memory model is XMMC.

I can't copy and paste the build dialog but it says that it verified sending the data to ram, and then verified sending it to flash.

I also programmed it to eeprom. However, no file has appeared on the SD card. The program does run though, so it appears to be in eeprom, not on the SD card.

I tried sending it to PROPBOE-SDXMMC but it says it can't find the board configuration.

Any advice here would be most appreciated!

jazzed · 2012-04-29 04:54

Dr_Acula wrote: »

I tried sending it to PROPBOE-SDXMMC but it says it can't find the board configuration.

Any advice here would be most appreciated!

PROPBOE-SDXMMC must be selected for the IDE to use the SDXMMC mode.
As I mentioned though, version 0-6-7 has a bug regarding this board type.

Please read this message: http://forums.parallax.com/showthread.php?137928-PropGCC-SimpleIDE&p=1094377&viewfull=1#post1094377
