Ok guys, I think I got it! I was trashing "hubaddr" in the loops. After reading the dracblade driver, I figured it out!
{
Skeleton JCACHE external RAM driver
Copyright (c) 2011 by David Betz
Based on code by Steve Denson (jazzed)
Copyright (c) 2010 by John Steven Denson
Inspired by VMCOG - virtual memory server for the Propeller
Copyright (c) February 3, 2010 by William Henning
For the EuroTouch 161 By James Moxham and Joe Heinz
TERMS OF USE: MIT License
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
}
CON
' default cache dimensions
DEFAULT_INDEX_WIDTH = 6
DEFAULT_OFFSET_WIDTH = 7
' cache line tag flags
EMPTY_BIT = 30
DIRTY_BIT = 31
PUB image
return @init_vm
DAT
org $0
' initialization structure offsets
' $0: pointer to a two word mailbox
' $4: pointer to where to store the cache lines in hub ram
' $8: number of bits in the cache line index if non-zero (default is DEFAULT_INDEX_WIDTH)
' $c: number of bits in the cache line offset if non-zero (default is DEFAULT_OFFSET_WIDTH)
' note that $4 must be at least 2^(index_width+offset_width) bytes in size
' the cache line mask is returned in $0
init_vm mov t1, par ' get the address of the initialization structure
rdlong pvmcmd, t1 ' pvmcmd is a pointer to the virtual address and read/write bit
mov pvmaddr, pvmcmd ' pvmaddr is a pointer into the cache line on return
add pvmaddr, #4
add t1, #4
rdlong cacheptr, t1 ' cacheptr is the base address in hub ram of the cache
add t1, #4
rdlong t2, t1 wz
if_nz mov index_width, t2 ' override the index_width default value
add t1, #4
rdlong t2, t1 wz
if_nz mov offset_width, t2 ' override the offset_width default value
mov index_count, #1
shl index_count, index_width
mov index_mask, index_count
sub index_mask, #1
mov line_size, #1
shl line_size, offset_width
mov t1, line_size
sub t1, #1
wrlong t1, par
' put external memory initialization here
jmp #vmflush
fillme long 0[128-fillme] ' first 128 cog locations are used for a direct mapped cache table
fit 128
' initialize the cache lines
vmflush movd :flush, #0
mov t1, index_count
:flush mov 0-0, empty_mask
add :flush, dstinc
djnz t1, #:flush
' start the command loop
waitcmd wrlong zero, pvmcmd
:wait rdlong vmline, pvmcmd wz
if_z jmp #:wait
shr vmline, offset_width wc ' carry is now one for read and zero for write
mov set_dirty_bit, #0 ' make mask to set dirty bit on writes
muxnc set_dirty_bit, dirty_mask
mov line, vmline ' get the cache line index
and line, index_mask
mov hubaddr, line
shl hubaddr, offset_width
add hubaddr, cacheptr ' get the address of the cache line
wrlong hubaddr, pvmaddr ' return the address of the cache line
movs :ld, line
movd :st, line
:ld mov vmcurrent, 0-0 ' get the cache line tag
and vmcurrent, tag_mask
cmp vmcurrent, vmline wz ' z set means there was a cache hit
if_nz call #miss ' handle a cache miss
:st or 0-0, set_dirty_bit ' set the dirty bit on writes
jmp #waitcmd ' wait for a new command
' line is the cache line index
' vmcurrent is current cache line
' vmline is new cache line
' hubaddr is the address of the cache line
miss movd :test, line
movd :st, line
:test test 0-0, dirty_mask wz
if_z jmp #:rd ' current cache line is clean, just read new one
mov vmaddr, vmcurrent
shl vmaddr, offset_width
call #wr_cache_line ' write current cache line
:rd mov vmaddr, vmline
shl vmaddr, offset_width
call #rd_cache_line ' read new cache line
:st mov 0-0, vmline
miss_ret ret
' pointers to mailbox entries
pvmcmd long 0 ' on call this is the virtual address and read/write bit
pvmaddr long 0 ' on return this is the address of the cache line containing the virtual address
cacheptr long 0 ' address in hub ram where cache lines are stored
vmline long 0 ' cache line containing the virtual address
vmcurrent long 0 ' current selected cache line (same as vmline on a cache hit)
line long 0 ' current cache line index
set_dirty_bit long 0 ' DIRTY_BIT set on writes, clear on reads
zero long 0 ' zero constant
dstinc long 1<<9 ' increment for the destination field of an instruction
t1 long 0 ' temporary variable
t2 long 0 ' temporary variable
tag_mask long !(1<<DIRTY_BIT) ' includes EMPTY_BIT
index_width long DEFAULT_INDEX_WIDTH
index_mask long 0
index_count long 0
offset_width long DEFAULT_OFFSET_WIDTH
line_size long 0 ' line size in bytes
empty_mask long (1<<EMPTY_BIT)
dirty_mask long (1<<DIRTY_BIT)
' input parameters to rd_cache_line and wr_cache_line
vmaddr long 0 ' external address
hubaddr long 0 ' hub memory address
' temporaries used by BREAD and BWRITE
ptr long 0
count long 0 ' copy of line_size divided by two (number of 16-bit words per line)
'get_values rdlong hubaddr, hubptr ' get hub address
' rdlong ramaddr, ramptr ' get ram address
' rdlong len, lenptr ' get length
' mov err, #5 ' err=5
'get_values_ret ret
'init 'mov err, #0 ' reset err=false=good
'mov dira,zero ' tristate the pins with the cog dira
' and dira,maskP0P20P22 ' tristates all the common pins
'done 'wrlong err, errptr ' status =0=false=good, else error x
'wrlong zero, comptr ' command =0 (done)
' Pass pasm_n = 0- 7 come to this with P0-P20 and P22 tristated and returns them as this too
set137 or dira,maskP22 ' pin 22 is an output
andn outa,maskP22 ' set P22low so Y0-Y7 are all high
or dira,maskP0P20 ' pins P0-P20 are outputs
and outa,maskP0P2low ' set these 3 pins low
or outa,pasm_n ' set the 137 pins
or outa,maskP22 ' pin 22 high
set137_ret ret ' return
load161pasm ' uses vmaddr
mov count, line_size ' make a copy of line_size and
shr count, #1 ' divide the length by two to get a word count
mov ptr, hubaddr ' hubaddr = hub page address
or outa,maskP0P20 ' set P0-P20 high
or dira,maskP0P20 ' output pins 0-20
mov pasm_n,#0 ' group 0
call #set137 ' set the 137 output
and outa,maskP0P18low ' pins 0-18 set low
or outa,vmaddr ' output address to 161 chips
or outa,maskP19 ' clock high
or outa,maskP20 ' load high
andn outa,maskP19 ' clock low
andn outa,maskP20 ' load low
or outa,maskP19 ' clock high
or outa,maskP20 ' load high
'andn outa,maskP20 ' load low
'andn outa,maskP19 ' clock low
'or outa,maskP19 ' clock high
'or outa,maskP20 ' load high
load161pasm_ret ret
stop jmp #stop ' for debugging
memorytransfer or dira,maskP16P20 ' so /wr and other pins definitely high
or outa,maskP16P20
mov pasm_n,#1 ' back to group 1 for memory transfer
call #set137 ' as next routine will always be group 1
or dira,maskP16P20 ' output pins 16-20
or outa,maskP16P20 ' set P16-P20 high (P0-P15 set as inputs or outputs in the calling routine)
memorytransfer_ret ret
busoutput or dira,maskP0P15 ' set prop pins 0-15 as outputs
busoutput_ret ret
businput and dira,maskP16P31 ' set P0-P15 as inputs
businput_ret ret
delaynop nop
nop
nop
nop
delaynop_ret ret
'----------------------------------------------------------------------------------------------------
'
' rd_cache_line - read a cache line from external memory
'
' vmaddr is the external memory address to read
' hubaddr is the hub memory address to write
' line_size is the number of bytes to read
'
'----------------------------------------------------------------------------------------------------
rd_cache_line
' command T
pasmramtohub
call #load161pasm ' load the 161 counters with vmaddr
call #memorytransfer ' set to group 1, enable P16-P20 as outputs and set P16-P20 high
call #businput ' set prop pins P0-P15 as inputs
andn outa,maskP16 ' memory /rd low
ramtohub_loop mov data_16,ina ' get the data
wrword data_16,ptr ' move data to hub
andn outa,maskP19 ' clock 161 low
or outa,maskP19 ' clock 161 high
add ptr,#2 ' increment the hub address
djnz count,#ramtohub_loop
or outa,maskP16 ' memory /rd high
or dira,maskP0P15 ' %00000000_00000000_11111111_11111111 restore P0-P15 as outputs
and dira,maskP0P20P22 ' tristates all the common pins
rd_cache_line_ret
ret
'----------------------------------------------------------------------------------------------------
'
' wr_cache_line - write a cache line to external memory
'
' vmaddr is the external memory address to write
' hubaddr is the hub memory address to read
' line_size is the number of bytes to write
'
'----------------------------------------------------------------------------------------------------
wr_cache_line
' command S
pasmhubtoram
call #load161pasm ' load the 161 counters with vmaddr
call #memorytransfer ' set to group 1, enable P16-P20 as outputs and set P16-P20 high
call #busoutput ' set prop pins P0-P15 as outputs
hubtoram_loop and outa,maskP16P31 '%11111111_11111111_00000000_00000000 ' clear for output
rdword data_16,ptr ' get the word from hub
and data_16,maskP0P15 ' mask to a word only
or outa,data_16 ' send out the word to P0-P15
andn outa,maskP17 ' set mem write low
add ptr,#2 ' increment by 2 bytes = 1 word. Put this here for small delay while writes
or outa,maskP17 ' mem write high
andn outa,maskP19 ' clock 161 low
or outa,maskP19 ' clock 161 high
djnz count,#hubtoram_loop ' loop this many times
and dira,maskP0P20P22 ' tristates all the common pins
wr_cache_line_ret
ret
pasm_n long 0 ' general purpose value
data_16 long 0 ' general purpose value
maskP0P2low long %11111111_11111111_11111111_11111000 ' P0-P2 low
maskP0P20 long %00000000_00011111_11111111_11111111 ' P0-P18 enabled for output plus P19,P20
maskP0P18low long %11111111_11111000_00000000_00000000 ' P0-P18 low
maskP16 long %00000000_00000001_00000000_00000000 ' pin 16
maskP17 long %00000000_00000010_00000000_00000000 ' pin 17
maskP18 long %00000000_00000100_00000000_00000000 ' pin 18
maskP19 long %00000000_00001000_00000000_00000000 ' pin 19
maskP20 long %00000000_00010000_00000000_00000000 ' pin 20
maskP22 long %00000000_01000000_00000000_00000000 ' pin 22
maskP16P31 long %11111111_11111111_00000000_00000000 ' pin 16 to pin 31
maskP0P15 long %00000000_00000000_11111111_11111111 ' for masking words
maskP16P20 long %00000000_00011111_00000000_00000000
maskP0P20P22 long %11111111_10100000_00000000_00000000 ' for returning all group pins HiZ
fit 496
The change is this:
mov ptr, hubaddr ' hubaddr = hub page address
Then change all loop references of hubaddr to ptr.
Now all tests pass!
Now I need to figure out how to test the actual read and write speeds! Any test scripts for this?
Congratulations!! It sounds like I should annotate the skeleton driver a little better so it says you have to leave line_size and hubaddr unchanged in your rd/wr code.
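The rule described above can be sketched in C. This is only a host-side model, not driver code: `cog_regs`, `rd_cache_line_model`, `ext_ram`, and `hub` are hypothetical names, and the external memory is a plain array. The point it illustrates is that the transfer loop works on local copies (`ptr`, `count`), the same way the fixed PASM copies hubaddr into ptr, so the values the cache logic depends on survive the call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the cog registers shared between the cache
 * logic and the rd/wr routines. */
typedef struct {
    uint32_t vmaddr;    /* external byte address of the line   */
    uint32_t hubaddr;   /* hub byte address of the line buffer */
    uint32_t line_size; /* line size in bytes                  */
} cog_regs;

/* Model of the read loop: copy one cache line of 16-bit words from
 * "external RAM" into the "hub" buffer. The registers are const here,
 * so the loop is forced to use working copies. */
static void rd_cache_line_model(const cog_regs *r,
                                const uint16_t *ext_ram, uint8_t *hub)
{
    assert((r->line_size & 1u) == 0);   /* whole 16-bit words only */

    uint32_t ptr   = r->hubaddr;        /* like "mov ptr, hubaddr"      */
    uint32_t count = r->line_size >> 1; /* like "shr count, #1" (words) */
    uint32_t word  = r->vmaddr >> 1;    /* word index into the model RAM */

    while (count-- > 0) {
        uint16_t data = ext_ram[word++];
        hub[ptr]     = (uint8_t)(data & 0xFFu); /* low byte first, as wrword stores it */
        hub[ptr + 1] = (uint8_t)(data >> 8);
        ptr += 2;                       /* only the copy advances */
    }
}
```

After the call, `r->hubaddr` and `r->line_size` still hold what the caller put there, which is what the command loop needs on the next request.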
That would be very helpful! I'm sorry I sent you guys on a wild goose chase. Now I know for the future. Also, are there any scripts to test the performance of the cache driver? I know this could still use some tuning and optimization. Would it help to unroll the loops, e.g. move the code in set137 and load161 into the loops?
Thanks again for all your help!
Joe
Glad you got something running. Now what else needs to be done?
To try running programs there are a few things to do:
1. Create a .dat file with BSTC
2. Copy the .dat file to the c:\propgcc\propeller-load directory
3. Add the cache driver to your touch161.cfg file.
4. Start SimpleIDE and try a program
Details:
1. Create a .dat file.
Assuming your cache source is called touch161_cache.spin, use this bstc command in a CMD window:
bstc -Ograux -c touch161_cache.spin
This creates a touch161_cache.dat file.
2. Copy the .dat file
copy touch161_cache.dat c:\propgcc\propeller-load
3. Add the cache-driver
Edit the touch161.cfg file so that it looks like this:
Hmm, it appears as if XMMC will not run unless I select boardtype *board*-SDXMMC?
XMM-Single works, not sure why XMMC wouldn't. I also tested SD card test in XMM-single. I tried to compile dry.c but:
dry.c:415:23: fatal error: sys/times.h: No such file or directory
dry.c:417:76: fatal error: sys/param.h: No such file or directory
that's as far as I get
Not sure how to set this?
#include <sys/param.h> /* If your system doesn't have this, use -DHZ=xxx */
Sounds like I'm getting ahead of myself since no XMMC though.
SimpleIDE will not build the dhrystone test because of the way it must be built.
I've attached a dry_xmmc.elf and dry_xmm_single.elf in a .zip for you.
Use the loader in a command window for these examples.
For some reason XMMC still doesn't work. No hello world, nothing. XMM seems to work fine though?
C:\Users\Joe\Downloads\dry>propeller-load -r -t -b dracTouchEX dry_xmm_single.elf
Propeller Version 1 on COM21
Loading the serial helper to hub memory
9528 bytes sent
Verifying RAM ... OK
Loading cache driver 'ET_cache.dat'
1088 bytes sent
Loading program image to RAM
17408 bytes sent
Loading .xmmkernel
1724 bytes sent
[ Entering terminal mode. Type ESC or Control-C to exit. ]
Dhrystone Benchmark, Version C, Version 2.2
Program compiled without 'register' attribute
Using STDC clock(), HZ=80000000
Trying 5000 runs through Dhrystone:
Final values of the variables used in the benchmark:
Int_Glob: 5
should be: 5
Bool_Glob: 1
should be: 1
Ch_1_Glob: A
should be: A
Ch_2_Glob: B
should be: B
Arr_1_Glob[8]: 7
should be: 7
Arr_2_Glob[8][7]: 5010
should be: Number_Of_Runs + 10
Ptr_Glob->
Ptr_Comp: 536899232
should be: (implementation-dependent)
Discr: 0
should be: 0
Enum_Comp: 2
should be: 2
Int_Comp: 17
should be: 17
Str_Comp: DHRYSTONE PROGRAM, SOME STRING
should be: DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob->
Ptr_Comp: 536899232
should be: (implementation-dependent), same as above
Discr: 0
should be: 0
Enum_Comp: 1
should be: 1
Int_Comp: 18
should be: 18
Str_Comp: DHRYSTONE PROGRAM, SOME STRING
should be: DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc: 5
should be: 5
Int_2_Loc: 13
should be: 13
Int_3_Loc: 7
should be: 7
Enum_Loc: 1
should be: 1
Str_1_Loc: DHRYSTONE PROGRAM, 1'ST STRING
should be: DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc: DHRYSTONE PROGRAM, 2'ND STRING
should be: DHRYSTONE PROGRAM, 2'ND STRING
Microseconds for one run through Dhrystone: 1400
Dhrystones per Second: 714
Is it possible that 24k isn't enough RAM to run your program? In xmmc mode the code is in external memory but the data must fit in hub memory along with the cache. The cache size is probably set to 8k so that means the program you're running can't use more than 24k of RAM.
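The 24k figure follows from simple arithmetic, sketched here in C. The function names are made up for illustration; the inputs come from the thread: the driver comment says the cache needs 2^(index_width+offset_width) bytes, the defaults are 6 and 7, and the Propeller 1 has 32K of hub RAM. Treat the result as an upper bound, since anything else living in hub RAM comes out of the same budget.

```c
#include <assert.h>
#include <stdint.h>

#define HUB_RAM_BYTES (32u * 1024u) /* Propeller 1 hub RAM */

/* Cache footprint in hub RAM: 2^(index_width + offset_width) bytes. */
static uint32_t cache_bytes(unsigned index_width, unsigned offset_width)
{
    assert(index_width + offset_width < 32);
    return (uint32_t)1 << (index_width + offset_width);
}

/* Hub RAM left over for data, stack and everything else. */
static uint32_t hub_bytes_left(uint32_t cache_size)
{
    assert(cache_size <= HUB_RAM_BYTES);
    return HUB_RAM_BYTES - cache_size;
}
```

With the default widths, `cache_bytes(6, 7)` is 8192 bytes (64 lines of 128 bytes), and `hub_bytes_left(8192)` is 24576, the 24k mentioned above.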
I think the hello.c should fit in 24k of ram? Not sure about the DrAFile or Dhrystone. Theoretically, if hello.c won't run, then nothing will. SDXMMC will run, but that's not what we want. I also tried adding SDLoad to the cfg file and this will not run either.
The SD loader should work as long as whatever program it's trying to run works. It can load either xmm or xmmc programs. If you try running the xmm program that's been working directly with propeller-load then it should work.
Here is one reason why xmmc might not work:
A program compiled using -mxmmc will have code that lives at 0x30000000. A program that is compiled in -mxmm-single mode will have code and data that live at 0x20000000. In order for an xmmc program to work, your cache driver must make the external memory visible at the 0x30000000 address. Most cache drivers don't decode the high order bits anyway so there are images of the memory repeated starting at 0x20000000 and repeating throughout the rest of memory. If you're decoding those high order bits then that might be why your xmmc program isn't working. Also, you might want to mask off those bits anyway since I believe your memory has many fewer address bits. Could those high bits be causing you trouble?
That very well could be it!
Now let me say I do understand what you're saying, for the most part. To make sure I DO understand:
Mask off the high bits of the EXTERNAL address, e.g.:
mov address, vmaddr
and address,maskhighadd
..
..
maskhighadd long %00000000_00000111_11111111_11111111 'access to full memory space
So copy vmaddr and then mask off the address bits my memory doesn't have? Now am I able to mask off even more bits to "reserve" memory for the rest of the system? Say :
maskhighadd long %00000000_00000000_01111111_11111111 'access to 32k partition of memory?
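The effect of the two masks can be checked on a PC with a small C sketch. `to_physical` and the macro names are hypothetical; the mask values are the two binary constants from the PASM above, and 0x20000000 / 0x30000000 are the xmm-single and xmmc base addresses mentioned earlier.

```c
#include <assert.h>
#include <stdint.h>

#define FULL_MASK  0x0007FFFFu /* %00000000_00000111_11111111_11111111 */
#define SMALL_MASK 0x00007FFFu /* %00000000_00000000_01111111_11111111 */

/* Drop the address bits the external memory does not decode. */
static uint32_t to_physical(uint32_t vmaddr, uint32_t mask)
{
    assert((mask & (mask + 1u)) == 0); /* mask must be contiguous low bits */
    return vmaddr & mask;
}
```

With `FULL_MASK`, 0x20000010 and 0x30000010 both land on physical address 0x10, so the same memory is visible in both modes. Note that a smaller mask does not reserve anything by itself: with `SMALL_MASK`, an address such as 0x30012345 folds back to 0x2345, so a program larger than 32k would alias onto itself.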
Kidding of course. Did have to find my own HC00 chip though.
Word wide memory needs a small adjustment: A1 should map to A0 on the chips. Add line 205.
and outa,maskP0P18low ' pins 0-18 set low
shr vmaddr, #1 ' schematic connects SRAM A0 to A0, not A1 - jsd. line 205
or outa,vmaddr ' output address to 161 chips
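The adjustment is just a byte-address to word-address conversion. A minimal C sketch, with hypothetical function names, assuming cache lines are word aligned:

```c
#include <assert.h>
#include <stdint.h>

/* The SRAM is 16 bits wide, so the counters need a word address:
 * byte address bit 1 becomes chip address bit 0 (the "shr vmaddr, #1"). */
static uint32_t word_address(uint32_t byte_addr)
{
    assert((byte_addr & 1u) == 0); /* cache lines start on a word boundary */
    return byte_addr >> 1;
}

/* The transfer loop counts words, not bytes (the "shr count, #1"). */
static uint32_t words_per_line(uint32_t line_size)
{
    return line_size >> 1;
}
```

For a 128-byte line at byte address 0x100, the counters are loaded with 0x80 and the loop runs 64 times.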
Preliminary results before optimizations - SSF listed for comparison:
Below are results with fast read and other optimizations. Faster writes will take more work because of the write strobe requirement. Performance for this test and applications could be different with various cache line sizes. Currently the cache line size is 128 bytes and the whole read burst happens in 16us - 8MB/s line read.
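The 8MB/s figure can be reproduced from the numbers in that paragraph. The C helpers below are hypothetical and only do the arithmetic; the 80 MHz clock used in the usage note is the clkfreq reported by the loader output in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Sustained rate for one burst: bytes moved per second. */
static uint32_t bytes_per_second(uint32_t bytes, uint32_t microseconds)
{
    assert(microseconds > 0);
    return (uint32_t)(((uint64_t)bytes * 1000000u) / microseconds);
}

/* Average system clocks spent per 16-bit word in the burst. */
static uint32_t clocks_per_word(uint32_t microseconds, uint32_t clkfreq_hz,
                                uint32_t words)
{
    assert(words > 0);
    uint64_t clocks = (uint64_t)microseconds * (clkfreq_hz / 1000000u);
    return (uint32_t)(clocks / words);
}
```

`bytes_per_second(128, 16)` gives 8000000, and at 80 MHz a 16us burst of 64 words works out to `clocks_per_word(16, 80000000, 64)` = 20 clocks per word on average.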
This is very exciting! So, I downloaded touch_cache and built the .dat with "bstc -Ograux -c touch_cache.spin". GOOD.
Then, I ran dry xmm-single and the numbers check out. 754 DhryPerSec. GOOD.
C:\Users\Joe\Downloads\dry>propeller-load -r -t -b touch161 dry_xmm_single.elf
Propeller Version 1 on COM21
Loading the serial helper to hub memory
9528 bytes sent
Verifying RAM ... OK
Loading cache driver 'new_cache.dat'
1084 bytes sent
Loading program image to RAM
17408 bytes sent
Loading .xmmkernel
1724 bytes sent
[ Entering terminal mode. Type ESC or Control-C to exit. ]
Dhrystone Benchmark, Version C, Version 2.2
Program compiled without 'register' attribute
Using STDC clock(), HZ=80000000
Trying 5000 runs through Dhrystone:
Final values of the variables used in the benchmark:
Int_Glob: 5
should be: 5
Bool_Glob: 1
should be: 1
Ch_1_Glob: A
should be: A
Ch_2_Glob: B
should be: B
Arr_1_Glob[8]: 7
should be: 7
Arr_2_Glob[8][7]: 5010
should be: Number_Of_Runs + 10
Ptr_Glob->
Ptr_Comp: 536899232
should be: (implementation-dependent)
Discr: 0
should be: 0
Enum_Comp: 2
should be: 2
Int_Comp: 17
should be: 17
Str_Comp: DHRYSTONE PROGRAM, SOME STRING
should be: DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob->
Ptr_Comp: 536899232
should be: (implementation-dependent), same as above
Discr: 0
should be: 0
Enum_Comp: 1
should be: 1
Int_Comp: 18
should be: 18
Str_Comp: DHRYSTONE PROGRAM, SOME STRING
should be: DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc: 5
should be: 5
Int_2_Loc: 13
should be: 13
Int_3_Loc: 7
should be: 7
Enum_Loc: 1
should be: 1
Str_1_Loc: DHRYSTONE PROGRAM, 1'ST STRING
should be: DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc: DHRYSTONE PROGRAM, 2'ND STRING
should be: DHRYSTONE PROGRAM, 2'ND STRING
Microseconds for one run through Dhrystone: 1324
Dhrystones per Second: 754
Then I try the xmmC and I still get nothing? Fail
C:\Users\Joe\Downloads\dry>propeller-load -r -t -b touch161 dry_xmmc.elf
Propeller Version 1 on COM21
Loading the serial helper to hub memory
9528 bytes sent
Verifying RAM ... OK
Loading cache driver 'new_cache.dat'
1084 bytes sent
Loading program image to flash
15412 bytes sent
Loading .xmmkernel
1724 bytes sent
[ Entering terminal mode. Type ESC or Control-C to exit. ]
I still don't know what my xmmc issue is *NOOB* Still the numbers are pretty good.
Now I'm worried though, because we might have a new board on the way with a different Group Select chip *MCP23008* I will be using the current board for quite a while, since I have a software fix for the display issue.
*edit*
Sorry about making you look for the HC00. I only have 1.
I don't get the dry_xmmc.elf problem. Works fine for me.
How big is the file? Is it a text file now by any chance?
>dir /A dry_xmmc.elf
Volume in drive C is OS
Volume Serial Number is 6C8D-7E65
Directory of c:\gccdev\propside\MyProjects\dry
05/27/2012 04:40 PM 37,526 dry_xmmc.elf
....
Downloaded it again and tested using the config file below:
propeller-load -r -t -b touch161 dry_xmmc.elf
# touch161
# IDE:SDLOAD
# IDE:SDXMMC
clkfreq: 80000000
clkmode: XTAL1+PLL16X
baudrate: 115200
rxpin: 31
txpin: 30
tvpin: 12 # only used if TV_DEBUG is defined
cache-driver: touch_cache.dat
cache-size: 8K
cache-param1: 0
cache-param2: 0
sd-driver: sd_driver.dat
sdspi-do: 24
sdspi-clk: 25
sdspi-di: 26
sdspi-cs: 27
load-target: ram
Can't just write this off. We need a root cause; otherwise we could be hiding some problem by accident.
Not sure how to get there though. The only thing I can think of is that the dry_xmmc.elf file is corrupted.
I knew it was a noob mistake. Missing load target = ram. I will run everything through xmmc to make sure this was the problem but it seems like it. I get the same number as you do in both modes now! How exciting!!!
I can't thank everyone enough! Steve, David, James and everyone else who has contributed to the community. You guys rock!
Wow, lots happened here while I was asleep! Great work.
One driving force for getting a GUI into C is that there will eventually not be enough room in Spin. The demo program I am using is taking about 3/4 of hub memory and every extra demo function takes it inexorably towards that moment when there is no code space left.
Also the demo program is getting unwieldy now as we are needing different code for the two displays. So that may well lend itself to a C Class or similar, where you call a common function DrawPixel(x,y,color) and it works for both displays.
So one could think about two header files, ILI9325.h and SSD1289.h
Each would have the same functions - load a font, draw a radiobox, draw a textbox, draw a line etc but the code would be different.
I also need to get my head around writing pasm in C. I pushed Catalina out to the limit with this but it could have been easier than it was. One nice thing about the proptool and spin is you can have the pasm and spin code in the same file. You can even copy and paste them so they are near each other so it is a couple of taps on a page up or page down button to swap between the two. That makes debugging a lot easier. At the other extreme, there is the idea of binary blobs of pasm that you have to copy to an SD card as separate files that are precompiled. That involves lots of removing the SD card which is a pain, slower and I ended up solving that by automating downloads of the pasm part as part of the compile process. That got complicated behind the scenes as there was a precompiler that split the pasm part out of the C program, then the pasm part was compiled separately and downloaded separately, and the C part was then compiled and run. I ended up writing an IDE and it had two panes, one for C and one for pasm.
How is this being done in GCC?
Can you mix and match pasm like in the Spintool? Is it better to actually write the pasm in C (I think this is possible, right?)
Or is it better to keep things totally separate - GCC does C and when you start a cog you pass a function the name of a binary file - "mycog.bin" - and that is loaded off the SD card and into a cog?
In the latter scenario, one would have SimpleIDE and the Proptool open at the same time and use Windows to flip between the two. I did that for a while in Catalina and it is a quite plausible way of doing development, albeit slow until we got file transfer to SD card working using xmodem. Not quite as "integrated" as the Spin proptool but at least it worked with existing tools.
What is the best way to proceed?
I guess as a practical example, the latest board uses the MCP23008 chip. That will involve getting an I2C driver from the Obex. I haven't looked in detail, but hopefully there is one that conforms to what I call the "Gracey" standard based on Chip's original mouse/display drivers where all variables are passed as a contiguous array at cog startup. If so, then the pasm part could be compiled separately and the Spin translated to C.
And the temptation at that point would be to combine the GUI driver pasm with the I2C driver pasm so it only uses one cog instead of two.
Thoughts and sage advice would be most appreciated.
I could use some sage advice too since I've been cutting sage, milkweed, and some kind of spiny miserable cactus all afternoon.
The best generic approach to writing GCC COG drivers right now is to use PASM, which we're all familiar with, or COG C. COG C programs are a special kind of C file. GCC also supports GAS assembler if you want to try that. GCC inline ASM is based on GAS syntax. Honestly I would avoid any inline ASM in C if at all possible. It just causes trouble.
We have some COG driver programs written in COG C. VGA, I2C, and others. I want to write a cache cog in COG C.
Btw, you can include Spin/PASM files in SimpleIDE projects. Some day we'll have an integrated Spin compiler too.
Personally I think using GAS drivers is easiest, but that's probably a matter of taste. For small drivers inline assembly is fine, and actually pretty straightforward. For example here's the real time clock driver, from the GCC library. The actual COG code is in the __asm__ portion, as a string (using the ANSI C convention that strings are automatically concatenated if there are no tokens between them).
/*
* very simple COG program to keep the 64 bit _default_ticks variable
* up to date
*/
#include <propeller.h>
#include <sys/rtc.h>
__asm__(
" .section .cogrtcupdate,\"ax\"\n"
"L_main\n"
" rdlong oldlo, default_ticks_ptr\n"
" mov newlo, CNT\n"
" cmp newlo,oldlo wc\n"
" add default_ticks_ptr,#4\n"
" rdlong newhi, default_ticks_ptr\n"
" addx newhi,#0\n" /* adds in the carry set above */
" sub default_ticks_ptr,#4\n"
/* the sequence here makes sure to write newlo,newhi in that
* order and in the fewest possible hub windows; if all readers
* of default_ticks also read lo,hi in the fewest possible
* hub cycles, then all users will
* see consistent values
*/
" wrlong newlo, default_ticks_ptr\n"
" add default_ticks_ptr,#4\n"
" wrlong newhi, default_ticks_ptr\n"
" sub default_ticks_ptr,#4\n"
" jmp #L_main\n"
"newlo long 0\n"
"newhi long 0\n"
"oldlo long 0\n"
"default_ticks_ptr long __default_ticks\n"
);
void
_rtc_start_timekeeping_cog(void)
{
extern unsigned int _load_start_cogrtcupdate[];
if (_default_ticks_updated)
return; /* someone is already updating the time */
_default_ticks_updated = 1;
#if defined(__PROPELLER_XMMC__) || defined(__PROPELLER_XMM__)
unsigned int *buffer;
// allocate a buffer in hub memory for the cog to start from
buffer = __builtin_alloca(2048);
memcpy(buffer, _load_start_cogrtcupdate, 2048);
cognew(buffer, 0);
#else
cognew(_load_start_cogrtcupdate, 0);
#endif
}
This uses the linker magic that automatically turns any section starting or ending with ".cog" into a COG overlay.
Hey, great work Eric. That looks like a fantastic solution.
I'm presuming no problem with adding a comment eg
" wrlong newlo, default_ticks_ptr ' this is a comment\n"
or do comments need to be /* ... */
You can put the comments into the GAS string using single quotes (as in your example), or you can put them outside the string using C/C++ style comments -- either should be fine.
Hey guys, I've been thinking a lot about the C driver and I see one glaring issue. The "BUS" needs to be locked to prevent contention from multiple cogs accessing the bus for different functions. There will be at least 2 cogs accessing the bus : The cog running the cache driver and the "bus master" cog. SO, this begs the question of how to "lock the bus." This is probably only tricky since I have never ACTUALLY used the locks. Any recommendations? It should be as simple as a repeat loop calling lock until it returns the cog's id: before the first bus command? Then releasing the lock after the last bus command? Pass the lock id to use in one of the optional parameters? Or is there a default?
Started building dual-screen board, should be done in the next few days. Displays are still 2 weeks out. I'm a bit bummed about writing a new driver for the mpc board. It is a nice design and has some promising features! A side note, I'm still a 74hc08 away from firing the board up. That and I'm running out of sockets! As soon as I "secure" my new location I'm ordering a BUNCH of sockets, as well as a few parts I'm in dire need of. I find it cheaper to buy 8 pin, 16 pin, and 40 pin sockets in quantity and cut down to fit. It takes a bit longer but saves a few cents. The 40-pins don't cut down to 32 as well as the 16's to 14's. I'm down to my last 2 loose crystals and need to order a few 6.25s. On the list now is: wireless pair, scribbler2 badges for my wife, and various components TBD. Any thoughts?
Try looking at the code at the start of the spi_flash_cache.spin driver. It has code that denominator added to allow sharing of the SPI pins with another driver using a lock. There is no reason why the same code couldn't be used to share your larger bus with another driver. I think you can probably get away with just pasting that into your skeleton-based driver in place of the cache line handling code. If you need help with that let me know.
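For anyone who has not used locks before, the acquire/release pattern can be tried on a PC first. The sketch below is a model using C11 atomics, not Propeller code: `bus_lock`, `bus_try_acquire`, `bus_acquire`, and `bus_release` are hypothetical names, and `atomic_flag` stands in for one hardware lock. As far as I know the Propeller's lockset behaves the same way as the test-and-set here: it reports the previous state of the lock, not a cog id, so the loop spins until it reports that the lock was free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for one hardware lock guarding the shared bus (model only). */
static atomic_flag bus_lock = ATOMIC_FLAG_INIT;

/* Try once: returns true if this caller now owns the bus. */
static bool bus_try_acquire(void)
{
    /* test_and_set returns the PREVIOUS state; false means it was free */
    return !atomic_flag_test_and_set(&bus_lock);
}

/* Spin until the bus is ours; call before the first bus command. */
static void bus_acquire(void)
{
    while (!bus_try_acquire()) {
        /* another owner holds the bus, keep trying */
    }
}

/* Give the bus back; call after the last bus command. */
static void bus_release(void)
{
    atomic_flag_clear(&bus_lock);
}
```

In a driver, the cache cog would wrap each rd_cache_line / wr_cache_line burst in acquire and release, and the other bus user would do the same around its own transfers. Which lock number to use would still have to be agreed between the two cogs, for example through one of the optional cache parameters.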
I will look at spi_flash_cache and see if I can figure it out. Shouldn't be too hard. Right now I'm building the dual-screen board. There's a trick though. I'm using male headers and ribbon cables. Which means mounting the headers for the display on the BOTTOM of the board. The DB9 port also needs to go on the bottom, so I'm thinking about putting the pin-headers on the bottom and use upside down. Should be interesting!
Which means mounting the headers for the display on the BOTTOM of the board.
Yikes - watch the polarities of the plugs. Is this so the display can be some distance away from the board?
Also re 2n2222 yes that will work. BC547 is the sort of "generic" signal transistor here in Australia and I think in the US the 2n2222 is the generic one.
There's a two-fold saving for me using ribbon cables. I have a BUNCH of male pin headers and cables. Not that many female pin headers. I will also be mounting the board in my rack-mount. To use the 3.2" displays I need to fudge the space, so this seems the best way. The problem is: using regular 40-pin PATA cables requires the header be placed on the BOTTOM. A bit tricky but I'm blaming THAT lesson for breaking my display. Until my new displays arrive I'll be using my old one and only one screen. 2 weeks will go fast and there's still much work. The board is 95%. Missing 3.3Vreg and large caps. Still trying to figure out the substitutions since buying caps is not an option and my stock is quickly disappearing.
I've asked the transistor question a few times, I'm sure. I don't plan on using a whole roll of de-soldering braid on THIS board. Also wondering if the MAX components would interfere with the prop-plug? I guess I'll find out soon enough!
The problem is: using regular 40-pin PATA cables requires the header be placed on the BOTTOM. A bit tricky but I'm blaming THAT lesson for breaking my display.
Hmm, could be. I guess once it is soldered and before you do the smoke test, do a continuity test on pins 1, 2, 39 and 40 at the least and check there are no crossovers.
Re large caps, I get a lot of mine from electronic junk. Computer motherboards are a good source. I once went to a computer store and asked for an old motherboard. The guy's eyes lit up and he showed me a room full of several hundred old PC boxes and told me to take as many as I liked for free as he wanted his room back. He seemed a bit disappointed when I only took one!
Re the max chip, yes it would interfere with the propplug. Just pull the max3232 out of its socket if you are using the propplug.
Re "Not that many female pin headers": I'll be sending you some freebies but it might be another 2-3 weeks.
Now all tests pass!
Now I need to figure out how to test the actual read and write speeds! Any test scripts for this?
Thanks again for all your help!
Joe
Glad you got something running. Now what else needs to be done?
To try running programs there are a few things to do:
1. Create a .dat file with BSTC
2. Copy the .dat file to the c:\propgcc\propeller-load directory
3. Add the cache driver to your touch161.cfg file.
4. Start SimpleIDE and try a program
Details:
1. Create a .dat file.
bstc -Ograux -c touch161_cache.spin
This creates a touch161_cache.dat file.
Open the hello demo.
a. Establish a "basis" with LMM mode. That is, choose memory model LMM, and Run Console F8
Verify that LMM hello works.
b. Change memory model to XMMC, select Board Type TOUCH161, and Run Console F8.
Verify that XMMC hello works.
c. Change memory model to XMM-SINGLE, keep Board Type TOUCH161, and Run Console F8.
Verify that XMM-SINGLE hello works.
I'll post some SimpleIDE package code for you to do some performance comparisons later.
XMM-Single works, not sure why XMMC wouldn't. I also tested SD card test in XMM-single. I tried to compile dry.c but: that's as far as I get
Not sure how to set this?
#include <sys/param.h> /* If your system doesn't have this, use -DHZ=xxx */
Sounds like I'm getting ahead of myself since no XMMC though.
SimpleIDE will not build the dhrystone test because of the way it must be built.
I've attached a dry_xmmc.elf and dry_xmm_single.elf in a .zip for you.
Use the loader in a command window for these examples.
propeller-load -r -t -b touch161 dry_xmmc.elf
propeller-load -r -t -b touch161 dry_xmm_single.elf
Did you try the hello examples?
Here is one reason why xmmc might not work:
A program compiled using -mxmmc will have code that lives at 0x30000000. A program that is compiled in -mxmm-single mode will have code and data that live at 0x20000000. In order for an xmmc program to work, your cache driver must make the external memory visible at the 0x30000000 address. Most cache drivers don't decode the high order bits anyway so there are images of the memory repeated starting at 0x20000000 and repeating throughout the rest of memory. If you're decoding those high order bits then that might be why your xmmc program isn't working. Also, you might want to mask off those bits anyway since I believe your memory has many fewer address bits. Could those high bits be causing you trouble?
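A minimal sketch of the masking idea, in plain C rather than the PASM the driver would actually use. The 512 KB size is only an assumption for illustration, and `ext_offset` is a made-up name. With the high bits masked off, an address at 0x20000000 and the same offset at 0x30000000 land on the same external memory location.

```c
#include <stdint.h>

/* ASSUMPTION: 512 KB of external RAM. Substitute the real size, which
   must be a power of two for a simple AND mask to work. */
#define EXT_RAM_BYTES  (512u * 1024u)
#define EXT_ADDR_MASK  (EXT_RAM_BYTES - 1u)

/* Drop the high-order bits so 0x2xxxxxxx (xmm-single) and 0x3xxxxxxx
   (xmmc) addresses both map onto the same physical memory. */
uint32_t ext_offset(uint32_t vmaddr)
{
    return vmaddr & EXT_ADDR_MASK;
}
```

For example, `ext_offset(0x20000010)` and `ext_offset(0x30000010)` both give 0x10.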
Now let me say I do understand what you're saying, for the most part. To make sure I DO understand:
Mask off the high bits of the EXTERNAL address, e.g. copy vmaddr and then mask off the address bits my memory doesn't have? And am I able to mask off even more bits to "reserve" memory for the rest of the system? Say :
Kidding of course. Did have to find my own HC00 chip though.
Word wide memory needs a small adjustment: A1 should map to A0 on the chips. Add line 205.
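The A1-to-A0 adjustment amounts to a shift right by one, since each chip location holds two bytes. A small sketch in C (the driver itself does this in PASM; the function names here are made up, and using the low address bit as a byte select is an assumption about the wiring):

```c
#include <stdint.h>

/* Byte address A1 drives chip pin A0, A2 drives A1, and so on. */
uint32_t chip_address(uint32_t byte_offset)
{
    return byte_offset >> 1;
}

/* ASSUMPTION: the dropped bit A0 picks the low (0) or high (1) byte
   of the 16-bit word. */
uint32_t byte_lane(uint32_t byte_offset)
{
    return byte_offset & 1u;
}
```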
Preliminary results before optimizations - SSF listed for comparison:

Memory Model    Board Type        Dhrystones/Second
LMM             SSF (HUB)         6983
XMMC            SSF               1256
LMM             TOUCH161 (HUB)    6983
XMMC            TOUCH161          1278
XMM-SINGLE      TOUCH161          713
Preliminary results before optimizations - SSF listed for comparison:

Memory Model    Board Type        Dhrystones/Second
LMM             TOUCH161 (HUB)    6983
XMMC            TOUCH161          1364
XMM-SINGLE      TOUCH161          754
Then I ran dry xmm-single and the numbers check out. 754 DhryPerSec. GOOD. Then I tried the xmmc and I still get nothing. Fail. I still don't know what my xmmc issue is. *NOOB* Still, the numbers are pretty good.
Now I'm worried though, because we might have a new board on the way with a different Group Select chip (MCP23008). I will be using the current board for quite a while, since I have a software fix for the display issue.
*edit*
Sorry about making you look for the HC00. I only have 1.
Well at least something jives. Maybe you should play some blues guitar and try again later.
I don't get the dry_xmmc.elf problem. Works fine for me.
How big is the file? Is it a text file now by any chance?
Downloaded it again and tested using the config file below:
propeller-load -r -t -b touch161 dry_xmmc.elf
Can't just write this off. We need a root-cause otherwise we could be hiding some problem by accident.
Not sure how to get there though. The only thing I can think of is that the dry_xmmc.elf file is corrupted.
Thanks,
--Steve
I can't thank everyone enough! Steve, David, James and everyone else who has contributed to the community. You guys rock!
One driving force for getting a GUI into C is that there will eventually not be enough room in Spin. The demo program I am using is taking about 3/4 of hub memory and every extra demo function takes it inexorably towards that moment when there is no code space left.
Also the demo program is getting unwieldy now as we are needing different code for the two displays. So that may well lend itself to a C Class or similar, where you call a common function DrawPixel(x,y,color) and it works for both displays.
So one could think about two header files, ILI9325.h and SSD1289.h
Each would have the same functions - load a font, draw a radiobox, draw a textbox, draw a line etc but the code would be different.
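One way this could look in C, as a sketch only: a small table of function pointers per display, and a common `DrawPixel` that calls through whichever one is selected. All the names here (`display_driver_t`, `SelectDisplay`, the stub functions) are hypothetical, and the stubs just count calls instead of talking to real hardware.

```c
#include <stdint.h>

/* Common interface every display driver fills in. */
typedef struct {
    const char *name;
    void (*draw_pixel)(int x, int y, uint16_t color);
} display_driver_t;

/* Stub drivers: the real ones would send controller-specific commands.
   Here they only count calls so the dispatch can be checked. */
static int ili9325_pixels = 0;
static int ssd1289_pixels = 0;

static void ili9325_draw_pixel(int x, int y, uint16_t color)
{
    (void)x; (void)y; (void)color;
    ili9325_pixels++;
}

static void ssd1289_draw_pixel(int x, int y, uint16_t color)
{
    (void)x; (void)y; (void)color;
    ssd1289_pixels++;
}

const display_driver_t ILI9325 = { "ILI9325", ili9325_draw_pixel };
const display_driver_t SSD1289 = { "SSD1289", ssd1289_draw_pixel };

/* The display the GUI code is currently drawing to. */
static const display_driver_t *active_display = &ILI9325;

void SelectDisplay(const display_driver_t *d)
{
    active_display = d;
}

/* Same call for both displays; the driver table decides what happens. */
void DrawPixel(int x, int y, uint16_t color)
{
    active_display->draw_pixel(x, y, color);
}
```

The table could grow entries for draw line, draw textbox and so on, and on a dual-screen board `SelectDisplay` would switch between the two. The cost is one indirect call per pixel, which may matter on the Propeller.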
I've been reading through this webpage on headers http://www.gamedev.net/page/resources/_/technical/general-programming/organizing-code-files-in-c-and-c-r1798 which has some great advice.
I also need to get my head around writing pasm in C. I pushed Catalina out to the limit with this but it could have been easier than it was. One nice thing about the proptool and spin is you can have the pasm and spin code in the same file. You can even copy and paste them so they are near each other so it is a couple of taps on a page up or page down button to swap between the two. That makes debugging a lot easier. At the other extreme, there is the idea of binary blobs of pasm that you have to copy to an SD card as separate files that are precompiled. That involves lots of removing the SD card which is a pain, slower and I ended up solving that by automating downloads of the pasm part as part of the compile process. That got complicated behind the scenes as there was a precompiler that split the pasm part out of the C program, then the pasm part was compiled separately and downloaded separately, and the C part was then compiled and run. I ended up writing an IDE and it had two panes, one for C and one for pasm.
How is this being done in GCC?
Can you mix and match pasm like in the Spintool? Is it better to actually write the pasm in C (I think this is possible, right?)
Or is it better to keep things totally separate - GCC does C and when you start a cog you pass a function the name of a binary file - "mycog.bin" - and that is loaded off the SD card and into a cog?
In the latter scenario, one would have SimpleIDE and the Proptool open at the same time and use Windows to flip between the two. I did that for a while in Catalina and it is a quite plausible way of doing development, albeit slow until we got file transfer to SD card working using xmodem. Not quite as "integrated" as the Spin proptool but at least it worked with existing tools.
What is the best way to proceed?
I guess as a practical example, the latest board uses the MCP23008 chip. That will involve getting an I2C driver from the Obex. I haven't looked in detail, but hopefully there is one that conforms to what I call the "Gracey" standard, based on Chip's original mouse/display drivers, where all variables are passed as a contiguous array at cog startup. If so, then the pasm part could be compiled separately and the Spin translated to C.
And the temptation at that point would be to combine the GUI driver pasm with the I2C driver pasm so it only uses one cog instead of two.
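To make the "contiguous array at cog startup" idea concrete, here is a rough C sketch of what the C side of such a parameter block might look like. The field names and layout are invented for illustration and do not come from any Obex driver. The point is only that the struct is a run of 32-bit longs, so the pasm side can read them one after another from the address it is handed at startup.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL parameter block: four consecutive longs, no padding.
   volatile because the driver cog would change them behind C's back. */
typedef struct {
    volatile uint32_t command;  /* 0 = idle, nonzero = request to driver */
    volatile uint32_t sda_pin;  /* I2C data pin number  */
    volatile uint32_t scl_pin;  /* I2C clock pin number */
    volatile uint32_t data;     /* byte to send / byte received */
} i2c_params_t;

/* Fill in the block before the driver cog is started. */
void i2c_params_init(i2c_params_t *p, uint32_t sda, uint32_t scl)
{
    p->command = 0;
    p->sda_pin = sda;
    p->scl_pin = scl;
    p->data    = 0;
}
```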
Thoughts and sage advice would be most appreciated.
I could use some sage advice too since I've been cutting sage, milkweed, and some kind of spiny miserable cactus all afternoon.
The best generic approach to writing GCC COG drivers right now is to use PASM, which we're all familiar with, or COG C. COG C programs are a special kind of C file. GCC also supports GAS assembler if you want to try that. GCC inline ASM is based on GAS syntax. Honestly I would avoid any inline ASM in C if at all possible. It just causes trouble.
We have some COG driver programs written in COG C. VGA, I2C, and others. I want to write a cache cog in COG C.
Btw, you can include Spin/PASM files in SimpleIDE projects. Some day we'll have an integrated Spin compiler too.
Thanks,
--Steve
Personally I think using GAS drivers is easiest, but that's probably a matter of taste. For small drivers inline assembly is fine, and actually pretty straightforward. For example here's the real time clock driver, from the GCC library. The actual COG code is in the __asm__ portion, as a string (using the ANSI C convention that strings are automatically concatenated if there are no tokens between them).
This uses the linker magic that automatically turns any section starting or ending with ".cog" into a COG overlay.
This is a good one for the PropGCC Cookbook!!
(you are writing the Propeller GCC Cookbook, aren't you?)
I'm presuming no problem with adding a comment eg
or do comments need to be /* ... */
You can put the comments into the GAS string using single quotes (as in your example), or you can put them outside the string using C/C++ style comments -- either should be fine.
*edit*
2n2222 for reset transistor??