Ok guys, I think I got it! I was trashing "hubaddr" in the loops. After reading the dracblade driver, I figured it out!
{
Skeleton JCACHE external RAM driver
Copyright (c) 2011 by David Betz
Based on code by Steve Denson (jazzed)
Copyright (c) 2010 by John Steven Denson
Inspired by VMCOG - virtual memory server for the Propeller
Copyright (c) February 3, 2010 by William Henning
For the EuroTouch 161 By James Moxham and Joe Heinz
TERMS OF USE: MIT License
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
}

CON

  ' default cache dimensions
  DEFAULT_INDEX_WIDTH   = 6
  DEFAULT_OFFSET_WIDTH  = 7

  ' cache line tag flags
  EMPTY_BIT             = 30
  DIRTY_BIT             = 31

PUB image
  return @init_vm

DAT
                org     $0

' initialization structure offsets
' $0: pointer to a two word mailbox
' $4: pointer to where to store the cache lines in hub ram
' $8: number of bits in the cache line index if non-zero (default is DEFAULT_INDEX_WIDTH)
' $a: number of bits in the cache line offset if non-zero (default is DEFAULT_OFFSET_WIDTH)
' note that $4 must be at least 2^(index_width+offset_width) bytes in size
' the cache line mask is returned in $0

init_vm         mov     t1, par                 ' get the address of the initialization structure
                rdlong  pvmcmd, t1              ' pvmcmd is a pointer to the virtual address and read/write bit
                mov     pvmaddr, pvmcmd         ' pvmaddr is a pointer into the cache line on return
                add     pvmaddr, #4
                add     t1, #4
                rdlong  cacheptr, t1            ' cacheptr is the base address in hub ram of the cache
                add     t1, #4
                rdlong  t2, t1 wz
        if_nz   mov     index_width, t2         ' override the index_width default value
                add     t1, #4
                rdlong  t2, t1 wz
        if_nz   mov     offset_width, t2        ' override the offset_width default value

                mov     index_count, #1
                shl     index_count, index_width
                mov     index_mask, index_count
                sub     index_mask, #1

                mov     line_size, #1
                shl     line_size, offset_width
                mov     t1, line_size
                sub     t1, #1
                wrlong  t1, par

                ' put external memory initialization here

                jmp     #vmflush

fillme          long    0[128-fillme]           ' first 128 cog locations are used for a direct mapped cache table

                fit     128

                ' initialize the cache lines
vmflush         movd    :flush, #0
                mov     t1, index_count
:flush          mov     0-0, empty_mask
                add     :flush, dstinc
                djnz    t1, #:flush

                ' start the command loop
waitcmd         wrlong  zero, pvmcmd
:wait           rdlong  vmline, pvmcmd wz
        if_z    jmp     #:wait

                shr     vmline, offset_width wc ' carry is now one for read and zero for write
                mov     set_dirty_bit, #0       ' make mask to set dirty bit on writes
                muxnc   set_dirty_bit, dirty_mask
                mov     line, vmline            ' get the cache line index
                and     line, index_mask
                mov     hubaddr, line
                shl     hubaddr, offset_width
                add     hubaddr, cacheptr       ' get the address of the cache line
                wrlong  hubaddr, pvmaddr        ' return the address of the cache line
                movs    :ld, line
                movd    :st, line
:ld             mov     vmcurrent, 0-0          ' get the cache line tag
                and     vmcurrent, tag_mask
                cmp     vmcurrent, vmline wz    ' z set means there was a cache hit
        if_nz   call    #miss                   ' handle a cache miss
:st             or      0-0, set_dirty_bit      ' set the dirty bit on writes
                jmp     #waitcmd                ' wait for a new command

' line is the cache line index
' vmcurrent is current cache line
' vmline is new cache line
' hubaddr is the address of the cache line
miss            movd    :test, line
                movd    :st, line
:test           test    0-0, dirty_mask wz
        if_z    jmp     #:rd                    ' current cache line is clean, just read new one
                mov     vmaddr, vmcurrent
                shl     vmaddr, offset_width
                call    #wr_cache_line          ' write current cache line
:rd             mov     vmaddr, vmline
                shl     vmaddr, offset_width
                call    #rd_cache_line          ' read new cache line
:st             mov     0-0, vmline
miss_ret        ret

' pointers to mailbox entries
pvmcmd          long    0       ' on call this is the virtual address and read/write bit
pvmaddr         long    0       ' on return this is the address of the cache line containing the virtual address
cacheptr        long    0       ' address in hub ram where cache lines are stored
vmline          long    0       ' cache line containing the virtual address
vmcurrent       long    0       ' current selected cache line (same as vmline on a cache hit)
line            long    0       ' current cache line index
set_dirty_bit   long    0       ' DIRTY_BIT set on writes, clear on reads
zero            long    0       ' zero constant
dstinc          long    1<<9    ' increment for the destination field of an instruction
t1              long    0       ' temporary variable
t2              long    0       ' temporary variable
tag_mask        long    !(1<<DIRTY_BIT) ' includes EMPTY_BIT
index_width     long    DEFAULT_INDEX_WIDTH
index_mask      long    0
index_count     long    0
offset_width    long    DEFAULT_OFFSET_WIDTH
line_size       long    0       ' line size in bytes
empty_mask      long    (1<<EMPTY_BIT)
dirty_mask      long    (1<<DIRTY_BIT)

' input parameters to rd_cache_line and wr_cache_line
vmaddr          long    0       ' external address
hubaddr         long    0       ' hub memory address

' temporaries used by BREAD and BWRITE
ptr             long    0
count           long    0       ' copy line size and divide by two for byte-word offset

'get_values     rdlong hubaddr, hubptr  ' get hub address
'               rdlong ramaddr, ramptr  ' get ram address
'               rdlong len, lenptr      ' get length
'               mov err, #5             ' err=5
'get_values_ret ret

'init           'mov err, #0            ' reset err=false=good
'mov dira,zero                          ' tristate the pins with the cog dira
'               and dira,maskP0P20P22   ' tristates all the common pins
'done           'wrlong err, errptr     ' status =0=false=good, else error x
'wrlong zero, comptr                    ' command =0 (done)

' Pass pasm_n = 0-7; come to this with P0-P20 and P22 tristated and returns them as this too
set137          or      dira,maskP22            ' pin 22 is an output
                andn    outa,maskP22            ' set P22 low so Y0-Y7 are all high
                or      dira,maskP0P20          ' pins P0-P20 are outputs
                and     outa,maskP0P2low        ' set these 3 pins low
                or      outa,pasm_n             ' set the 137 pins
                or      outa,maskP22            ' pin 22 high
set137_ret      ret                             ' return

load161pasm                                     ' uses vmaddr
                mov     count, line_size        ' make a copy of line_size AND.
                shr     count, #1               ' divide length by two for word-byte
                mov     ptr, hubaddr            ' hubaddr = hub page address
                or      outa,maskP0P20          ' set P0-P20 high
                or      dira,maskP0P20          ' output pins 0-20
                mov     pasm_n,#0               ' group 0
                call    #set137                 ' set the 137 output
                and     outa,maskP0P18low       ' pins 0-18 set low
                or      outa,vmaddr             ' output address to 161 chips
                or      outa,maskP19            ' clock high
                or      outa,maskP20            ' load high
                andn    outa,maskP19            ' clock low
                andn    outa,maskP20            ' load low
                or      outa,maskP19            ' clock high
                or      outa,maskP20            ' load high
                'andn   outa,maskP20            ' load low
                'andn   outa,maskP19            ' clock low
                'or     outa,maskP19            ' clock high
                'or     outa,maskP20            ' load high
load161pasm_ret ret

stop            jmp     #stop                   ' for debugging

memorytransfer  or      dira,maskP16P20         ' so /wr and other pins definitely high
                or      outa,maskP16P20
                mov     pasm_n,#1               ' back to group 1 for memory transfer
                call    #set137                 ' as next routine will always be group 1
                or      dira,maskP16P20         ' output pins 16-20
                or      outa,maskP16P20         ' set P16-P20 high (P0-P15 set as inputs or outputs in the calling routine)
memorytransfer_ret ret

busoutput       or      dira,maskP0P15          ' set prop pins 0-15 as outputs
busoutput_ret   ret

businput        and     dira,maskP16P31         ' set P0-P15 as inputs
businput_ret    ret

delaynop        nop
                nop
                nop
                nop
delaynop_ret    ret

'----------------------------------------------------------------------------------------------------
'
' rd_cache_line - read a cache line from external memory
'
' vmaddr is the external memory address to read
' hubaddr is the hub memory address to write
' line_size is the number of bytes to read
'
'----------------------------------------------------------------------------------------------------
rd_cache_line
' command T
pasmramtohub
                call    #load161pasm            ' load the 161 counters with ramaddr
                call    #memorytransfer         ' set to group 1, enable P16-P20 as outputs and set P16-P20 high
                call    #businput               ' set prop pins P0-P15 as inputs
                andn    outa,maskP16            ' memory /rd low
ramtohub_loop   mov     data_16,ina             ' get the data
                wrword  data_16,ptr             ' move data to hub
                andn    outa,maskP19            ' clock 161 low
                or      outa,maskP19            ' clock 161 high
                add     ptr,#2                  ' increment the hub address
                djnz    count,#ramtohub_loop
                or      outa,maskP16            ' memory /rd high
                or      dira,maskP0P15          ' %00000000_00000000_11111111_11111111 restore P0-P15 as outputs
                and     dira,maskP0P20P22       ' tristates all the common pins
rd_cache_line_ret
                ret

'----------------------------------------------------------------------------------------------------
'
' wr_cache_line - write a cache line to external memory
'
' vmaddr is the external memory address to write
' hubaddr is the hub memory address to read
' line_size is the number of bytes to write
'
'----------------------------------------------------------------------------------------------------
wr_cache_line
' command S
pasmhubtoram
                call    #load161pasm            ' load the 161 counters with ramaddr
                call    #memorytransfer         ' set to group 1, enable P16-P20 as outputs and set P16-P20 high
                call    #busoutput              ' set prop pins P0-P15 as outputs
hubtoram_loop   and     outa,maskP16P31         ' %11111111_11111111_00000000_00000000 clear for output
                rdword  data_16,ptr             ' get the word from hub
                and     data_16,maskP0P15       ' mask to a word only
                or      outa,data_16            ' send out the byte to P0-P15
                andn    outa,maskP17            ' set mem write low
                add     ptr,#2                  ' increment by 2 bytes = 1 word. Put this here for small delay while writes
                or      outa,maskP17            ' mem write high
                andn    outa,maskP19            ' clock 161 low
                or      outa,maskP19            ' clock 161 high
                djnz    count,#hubtoram_loop    ' loop this many times
                and     dira,maskP0P20P22       ' tristates all the common pins
wr_cache_line_ret
                ret
pasm_n          long    0                                       ' general purpose value
data_16         long    0                                       ' general purpose value
maskP0P2low     long    %11111111_11111111_11111111_11111000    ' P0-P2 low
maskP0P20       long    %00000000_00011111_11111111_11111111    ' P0-P18 enabled for output plus P19,P20
maskP0P18low    long    %11111111_11111000_00000000_00000000    ' P0-P18 low
maskP16         long    %00000000_00000001_00000000_00000000    ' pin 16
maskP17         long    %00000000_00000010_00000000_00000000    ' pin 17
maskP18         long    %00000000_00000100_00000000_00000000    ' pin 18
maskP19         long    %00000000_00001000_00000000_00000000    ' pin 19
maskP20         long    %00000000_00010000_00000000_00000000    ' pin 20
maskP22         long    %00000000_01000000_00000000_00000000    ' pin 22
maskP16P31      long    %11111111_11111111_00000000_00000000    ' pin 16 to pin 31
maskP0P15       long    %00000000_00000000_11111111_11111111    ' for masking words
maskP16P20      long    %00000000_00011111_00000000_00000000
maskP0P20P22    long    %11111111_10100000_00000000_00000000    ' for returning all group pins HiZ

                fit     496
The change is this :
mov ptr, hubaddr ' hubaddr = hub page address
Then change all loop references of hubaddr to ptr.
Now all tests pass!
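For anyone hitting the same thing, here is a rough C model of why the change matters. It is only a sketch that runs on a PC, not driver code: the `cache_state` struct and both function names are made up for illustration, and the external RAM is just an array. The point is that the cache logic still needs `hubaddr` after the transfer, so the loop has to walk a private copy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the driver state: hubaddr belongs to the cache
 * logic and must still point at the start of the line after a transfer. */
typedef struct {
    uint16_t *hubaddr;   /* start of the cache line in (simulated) hub RAM */
    size_t    line_size; /* line size in bytes */
} cache_state;

/* Buggy variant: advances hubaddr itself, so the caller's value is trashed. */
static void read_line_buggy(cache_state *s, const uint16_t *ext)
{
    for (size_t n = s->line_size / 2; n > 0; n--)
        *s->hubaddr++ = *ext++;
}

/* Fixed variant: walks a private copy, like "mov ptr, hubaddr" in the PASM. */
static void read_line_fixed(cache_state *s, const uint16_t *ext)
{
    uint16_t *ptr = s->hubaddr;                 /* copy, hubaddr untouched */
    for (size_t count = s->line_size / 2; count > 0; count--)
        *ptr++ = *ext++;                        /* one word per iteration */
}
```

After `read_line_fixed` the `hubaddr` field is unchanged; after `read_line_buggy` it points one full line past where it started, which is the kind of corruption that made the tests fail.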
Now I need to figure out how to test the actual read and write speeds! Any test scripts for this?
Congratulations!! It sounds like I should annotate the skeleton driver a little better so it says you have to leave line_size and hubaddr unchanged in your rd/wr code.
That would be very helpful! I'm sorry I sent you guys on a wild goose chase. Now I know for the future. Also, are there any scripts to test the performance of the cache driver? I know this could still use some tuning and optimization. Would it be helpful to unroll the loops, e.g. move the code in set137 and load161 into the loops?
Thanks again for all your help!
Joe
Glad you got something running. Now what else needs to be done?
To try running programs there are a few things to do:
1. Create a .dat file with BSTC
2. Copy the .dat file to the c:\propgcc\propeller-load directory
3. Add the cache driver to your touch161.cfg file.
4. Start SimpleIDE and try a program
Details:
1. Create a .dat file.
Assuming your cache source is called touch161_cache.spin, use this bstc command in a CMD window:
bstc -Ograux -c touch161_cache.spin
This creates a touch161_cache.dat file.
2. Copy the .dat file
copy touch161_cache.dat c:\propgcc\propeller-load
3. Add the cache-driver
Edit the touch161.cfg file so that it looks like this:
Hmm, it appears as if XMMC will not run unless I select boardtype *board*-SDXMMC?
XMM-Single works, not sure why XMMC wouldn't. I also tested SD card test in XMM-single. I tried to compile dry.c but:
that's as far as I get
Not sure how to set this?
#include <sys/param.h> /* If your system doesn't have this, use -DHZ=xxx */
Sounds like I'm getting ahead of myself since no XMMC though.
SimpleIDE will not build the dhrystone test because of the way it must be built.
I've attached a dry_xmmc.elf and dry_xmm_single.elf in a .zip for you.
Use the loader in a command window for these examples.
For some reason XMMC still doesn't work. No hello world, nothing. XMM seems to work fine though?
C:\Users\Joe\Downloads\dry>propeller-load -r -t -b dracTouchEX dry_xmm_single.elf
Propeller Version 1 on COM21
Loading the serial helper to hub memory
9528 bytes sent
Verifying RAM ... OK
Loading cache driver 'ET_cache.dat'
1088 bytes sent
Loading program image to RAM
17408 bytes sent
Loading .xmmkernel
1724 bytes sent
[ Entering terminal mode. Type ESC or Control-C to exit. ]
Dhrystone Benchmark, Version C, Version 2.2
Program compiled without 'register' attribute
Using STDC clock(), HZ=80000000
Trying 5000 runs through Dhrystone:
Final values of the variables used in the benchmark:
Int_Glob: 5
should be: 5
Bool_Glob: 1
should be: 1
Ch_1_Glob: A
should be: A
Ch_2_Glob: B
should be: B
Arr_1_Glob[8]: 7
should be: 7
Arr_2_Glob[8][7]: 5010
should be: Number_Of_Runs + 10
Ptr_Glob->
Ptr_Comp: 536899232
should be: (implementation-dependent)
Discr: 0
should be: 0
Enum_Comp: 2
should be: 2
Int_Comp: 17
should be: 17
Str_Comp: DHRYSTONE PROGRAM, SOME STRING
should be: DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob->
Ptr_Comp: 536899232
should be: (implementation-dependent), same as above
Discr: 0
should be: 0
Enum_Comp: 1
should be: 1
Int_Comp: 18
should be: 18
Str_Comp: DHRYSTONE PROGRAM, SOME STRING
should be: DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc: 5
should be: 5
Int_2_Loc: 13
should be: 13
Int_3_Loc: 7
should be: 7
Enum_Loc: 1
should be: 1
Str_1_Loc: DHRYSTONE PROGRAM, 1'ST STRING
should be: DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc: DHRYSTONE PROGRAM, 2'ND STRING
should be: DHRYSTONE PROGRAM, 2'ND STRING
Microseconds for one run through Dhrystone: 1400
Dhrystones per Second: 714
Is it possible that 24k isn't enough RAM to run your program? In xmmc mode the code is in external memory but the data must fit in hub memory along with the cache. The cache size is probably set to 8k so that means the program you're running can't use more than 24k of RAM.
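The arithmetic behind the 24k figure, as a tiny C sketch. It assumes the Propeller 1's 32 KB of hub RAM and ignores any other overhead that may also live in hub memory; the function name is made up for illustration.

```c
#include <stdint.h>

enum { HUB_RAM_BYTES = 32 * 1024 };   /* Propeller 1 hub RAM */

/* Bytes left for program data in xmmc mode once the cache is reserved.
 * Other hub users are not accounted for here. */
static uint32_t xmmc_data_budget(uint32_t cache_bytes)
{
    if (cache_bytes >= HUB_RAM_BYTES)
        return 0;
    return HUB_RAM_BYTES - cache_bytes;
}
```

With an 8K cache this gives 24K, so a program whose data does not fit in that will not run in xmmc mode even though its code sits in external memory.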
I think the hello.c should fit in 24k of ram? Not sure about the DrAFile or Dhrystone. Theoretically, if hello.c won't run, then nothing will. SDXMMC will run, but that's not what we want. I also tried adding SDLoad to the cfg file and this will not run either.
The SD loader should work as long as whatever program it's trying to run works. It can load either xmm or xmmc programs. If you try running the xmm program that's been working directly with propeller-load then it should work.
Here is one reason why xmmc might not work:
A program compiled using -mxmmc will have code that lives at 0x30000000. A program that is compiled in -mxmm-single mode will have code and data that live at 0x20000000. In order for an xmmc program to work, your cache driver must make the external memory visible at the 0x30000000 address. Most cache drivers don't decode the high order bits anyway so there are images of the memory repeated starting at 0x20000000 and repeating throughout the rest of memory. If you're decoding those high order bits then that might be why your xmmc program isn't working. Also, you might want to mask off those bits anyway since I believe your memory has many fewer address bits. Could those high bits be causing you trouble?
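To make the aliasing idea concrete, here is a small C sketch. The 19-bit mask is an assumption based on the driver using P0-P18 as address pins, and `physical_address` is a made-up name. Once the high bits are dropped, the 0x20000000 and 0x30000000 images land on the same external location.

```c
#include <stdint.h>

/* Assumed: 19 address lines (P0-P18), so only the low 19 bits reach the RAM. */
#define EXT_ADDR_MASK 0x0007FFFFu

/* Drop the high-order bits so every image of the memory map aliases
 * onto the same physical address. */
static uint32_t physical_address(uint32_t vmaddr)
{
    return vmaddr & EXT_ADDR_MASK;
}
```

So an xmm-single address such as 0x20000100 and an xmmc address such as 0x30000100 both resolve to physical 0x100.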
That very well could be it!
Now let me say I do understand what you're saying, for the most part. To make sure I DO understand:
Mask off the high bits of the EXTERNAL address, e.g.:
mov address, vmaddr
and address,maskhighadd
..
..
maskhighadd long %00000000_00000111_11111111_11111111 ' access to full memory space
So copy vmaddr and then mask off the address bits my memory doesn't have? Now am I able to mask off even more bits to "reserve" memory for the rest of the system? Say :
maskhighadd long %00000000_00000000_01111111_11111111 'access to 32k partition of memory?
Kidding of course. Did have to find my own HC00 chip though.
Word wide memory needs a small adjustment: A1 should map to A0 on the chips. Add line 205.
and outa,maskP0P18low ' pins 0-18 set low
shr vmaddr, #1 ' schematic connects SRAM A0 to A0, not A1 - jsd. line 205
or outa,vmaddr ' output address to 161 chips
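The same adjustment in C terms, as a sketch only (`word_address` is a made-up name). The cache hands the driver byte addresses, but each location on a 16-bit wide memory holds two bytes, so the address driven onto the chips is the byte address shifted right by one.

```c
#include <stdint.h>

/* Convert a byte address into the word address a 16-bit wide SRAM expects. */
static uint32_t word_address(uint32_t byte_addr)
{
    return byte_addr >> 1;   /* byte address bit 1 becomes chip A0 */
}
```

A 128-byte cache line starting at byte address 0x100 therefore starts at word address 0x80 and covers 64 consecutive words.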
Preliminary results before optimizations - SSF listed for comparison:
Below are results with fast read and other optimizations. Faster writes will take more work because of the write strobe requirement. Performance for this test and applications could be different with various cache line sizes. Currently the cache line size is 128 bytes and the whole read burst happens in 16us - 8MB/s line read.
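The 8MB/s figure is just the line size divided by the burst time. A tiny C sketch of that calculation, taking 1 MB as 10^6 bytes (the function name is made up for illustration):

```c
/* Bytes per microsecond is numerically the same as MB/s (10^6 bytes per second). */
static double burst_mb_per_s(double bytes, double microseconds)
{
    return bytes / microseconds;
}
```

For the numbers quoted here, 128 bytes in 16 us works out to 8 MB/s. The same function can be reused to compare other cache line sizes once their burst times are measured.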
This is very exciting! So, I DL touch_cache and built the dat with "bstc -Ograux -c touch_cache.spin" GOOD.
Then, I ran dry xmm-single and the numbers check out. 754 DhryPerSec. GOOD.
C:\Users\Joe\Downloads\dry>propeller-load -r -t -b touch161 dry_xmm_single.elf
Propeller Version 1 on COM21
Loading the serial helper to hub memory
9528 bytes sent
Verifying RAM ... OK
Loading cache driver 'new_cache.dat'
1084 bytes sent
Loading program image to RAM
17408 bytes sent
Loading .xmmkernel
1724 bytes sent
[ Entering terminal mode. Type ESC or Control-C to exit. ]
Dhrystone Benchmark, Version C, Version 2.2
Program compiled without 'register' attribute
Using STDC clock(), HZ=80000000
Trying 5000 runs through Dhrystone:
Final values of the variables used in the benchmark:
Int_Glob: 5
should be: 5
Bool_Glob: 1
should be: 1
Ch_1_Glob: A
should be: A
Ch_2_Glob: B
should be: B
Arr_1_Glob[8]: 7
should be: 7
Arr_2_Glob[8][7]: 5010
should be: Number_Of_Runs + 10
Ptr_Glob->
Ptr_Comp: 536899232
should be: (implementation-dependent)
Discr: 0
should be: 0
Enum_Comp: 2
should be: 2
Int_Comp: 17
should be: 17
Str_Comp: DHRYSTONE PROGRAM, SOME STRING
should be: DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob->
Ptr_Comp: 536899232
should be: (implementation-dependent), same as above
Discr: 0
should be: 0
Enum_Comp: 1
should be: 1
Int_Comp: 18
should be: 18
Str_Comp: DHRYSTONE PROGRAM, SOME STRING
should be: DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc: 5
should be: 5
Int_2_Loc: 13
should be: 13
Int_3_Loc: 7
should be: 7
Enum_Loc: 1
should be: 1
Str_1_Loc: DHRYSTONE PROGRAM, 1'ST STRING
should be: DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc: DHRYSTONE PROGRAM, 2'ND STRING
should be: DHRYSTONE PROGRAM, 2'ND STRING
Microseconds for one run through Dhrystone: 1324
Dhrystones per Second: 754
Then I try the xmmC and I still get nothing? Fail
C:\Users\Joe\Downloads\dry>propeller-load -r -t -b touch161 dry_xmmc.elf
Propeller Version 1 on COM21
Loading the serial helper to hub memory
9528 bytes sent
Verifying RAM ... OK
Loading cache driver 'new_cache.dat'
1084 bytes sent
Loading program image to flash
15412 bytes sent
Loading .xmmkernel
1724 bytes sent
[ Entering terminal mode. Type ESC or Control-C to exit. ]
I still don't know what my xmmc issue is *NOOB* Still the numbers are pretty good.
Now I'm worried though, because we might have a new board on the way with a different Group Select chip *MCP23008* I will be using the current board for quite a while, since I have a software fix for the display issue.
*edit*
Sorry about making you look for the HC00. I only have 1.
I don't get the dry_xmmc.elf problem. Works fine for me.
How big is the file? Is it a text file now by any chance?
>dir /A dry_xmmc.elf
Volume in drive C is OS
Volume Serial Number is 6C8D-7E65
Directory of c:\gccdev\propside\MyProjects\dry
05/27/2012 04:40 PM 37,526 dry_xmmc.elf
....
Downloaded it again and tested using the config file below:
propeller-load -r -t -b touch161 dry_xmmc.elf
# touch161
# IDE:SDLOAD
# IDE:SDXMMC
clkfreq: 80000000
clkmode: XTAL1+PLL16X
baudrate: 115200
rxpin: 31
txpin: 30
tvpin: 12 # only used if TV_DEBUG is defined
cache-driver: touch_cache.dat
cache-size: 8K
cache-param1: 0
cache-param2: 0
sd-driver: sd_driver.dat
sdspi-do: 24
sdspi-clk: 25
sdspi-di: 26
sdspi-cs: 27
load-target: ram
Can't just write this off. We need a root-cause otherwise we could be hiding some problem by accident.
Not sure how to get there though. The only thing i can think of is that the dry_xmmc.elf file is corrupted.
I knew it was a noob mistake. Missing load target = ram. I will run everything through xmmc to make sure this was the problem but it seems like it. I get the same number as you do in both modes now! How exciting!!!
I can't thank everyone enough! Steve, David, James and everyone else who has contributed to the community. You guys rock!
Wow, lots happened here while I was asleep! Great work.
One driving force for getting a GUI into C is that there will eventually not be enough room in Spin. The demo program I am using is taking about 3/4 of hub memory and every extra demo function takes it inexorably towards that moment when there is no code space left.
Also the demo program is getting unwieldy now as we are needing different code for the two displays. So that may well lend itself to a C Class or similar, where you call a common function DrawPixel(x,y,color) and it works for both displays.
So one could think about two header files, ILI9325.h and SSD1289.h
Each would have the same functions - load a font, draw a radiobox, draw a textbox, draw a line etc but the code would be different.
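One way the "same functions, different code" idea could look in C is a small table of function pointers per display. This is only a sketch that runs on a PC: the `display_driver` struct, the tiny in-memory framebuffers, and `draw_hline` are all made up for illustration, and real ILI9325/SSD1289 code would talk to the hardware instead.

```c
#include <stdint.h>

/* Hypothetical common interface shared by both display drivers. */
typedef struct {
    const char *name;
    void (*draw_pixel)(int x, int y, uint16_t color);
} display_driver;

enum { WIDTH = 4, HEIGHT = 4 };            /* stand-in framebuffers, not real panels */
static uint16_t fb_ili[HEIGHT][WIDTH];
static uint16_t fb_ssd[HEIGHT][WIDTH];

static void ili9325_draw_pixel(int x, int y, uint16_t c) { fb_ili[y][x] = c; }
static void ssd1289_draw_pixel(int x, int y, uint16_t c) { fb_ssd[y][x] = c; }

static const display_driver ILI9325 = { "ILI9325", ili9325_draw_pixel };
static const display_driver SSD1289 = { "SSD1289", ssd1289_draw_pixel };

/* GUI code is written once against the interface and works on either display. */
static void draw_hline(const display_driver *d, int x0, int x1, int y, uint16_t color)
{
    for (int x = x0; x <= x1; x++)
        d->draw_pixel(x, y, color);
}
```

Higher-level items such as textboxes and radioboxes would then be built on `draw_pixel` the same way `draw_hline` is, so only the lowest layer differs between the two header files.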
I also need to get my head around writing pasm in C. I pushed Catalina out to the limit with this but it could have been easier than it was. One nice thing about the proptool and spin is you can have the pasm and spin code in the same file. You can even copy and paste them so they are near each other so it is a couple of taps on a page up or page down button to swap between the two. That makes debugging a lot easier. At the other extreme, there is the idea of binary blobs of pasm that you have to copy to an SD card as separate files that are precompiled. That involves lots of removing the SD card which is a pain, slower and I ended up solving that by automating downloads of the pasm part as part of the compile process. That got complicated behind the scenes as there was a precompiler that split the pasm part out of the C program, then the pasm part was compiled separately and downloaded separately, and the C part was then compiled and run. I ended up writing an IDE and it had two panes, one for C and one for pasm.
How is this being done in GCC?
Can you mix and match pasm like in the Spintool? Is it better to actually write the pasm in C (I think this is possible, right?)
Or is it better to keep things totally separate - GCC does C and when you start a cog you pass a function the name of a binary file - "mycog.bin" - and that is loaded off the SD card and into a cog?
In the latter scenario, one would have SimpleIDE and the Proptool open at the same time and use Windows to flip between the two. I did that for a while in Catalina and it is a quite plausible way of doing development albeit slow until we got file transfer to SD card working using xmodem. Not quite as "integrated" as the Spin proptool but at least it worked with existing tools.
What is the best way to proceed?
I guess as a practical example, the latest board uses the MCP23008 chip. That will involve getting an I2C driver from the Obex. I haven't looked in detail, but hopefully there is one that conforms to what I call the "Gracey" standard based on Chip's original mouse/display drivers where all variables are passed as a contiguous array at cog startup. If so, then the pasm part could be compiled separately and the Spin translated to C.
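A sketch of what that contiguous-array convention looks like from the C side, modelled on a PC. The `mailbox` layout and `fake_cog_step` are made up for illustration; on real hardware the pointer would be handed to the cog at startup and the PASM would use rdlong/wrlong at offsets from it.

```c
#include <stdint.h>

/* Hypothetical parameter block: all fields are longs and sit back to back,
 * so the cog only needs the base address. */
typedef struct {
    volatile uint32_t command;   /* base + 0: non-zero means "work to do" */
    volatile uint32_t arg;       /* base + 4 */
    volatile uint32_t result;    /* base + 8 */
} mailbox;

/* Stand-in for the PASM side: it sees only the base pointer and reaches
 * each field by long offset. */
static void fake_cog_step(volatile uint32_t *par)
{
    if (par[0] != 0) {
        par[2] = par[1] * 2;     /* placeholder for the real driver work */
        par[0] = 0;              /* clearing the command signals completion */
    }
}
```

The C wrapper that replaces the Spin object would fill in `arg`, set `command`, and then poll until `command` reads zero again.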
And the temptation at that point would be to combine the GUI driver pasm with the I2C driver pasm so it only uses one cog instead of two.
Thoughts and sage advice would be most appreciated.
I could use some sage advice too since I've been cutting sage, milkweed, and some kind of spiny miserable cactus all afternoon.
The best generic approach to writing GCC COG drivers right now is to use PASM which we're all familiar with or COG C. COG C programs are a special kind of C file. GCC also supports GAS assembler if you want to try that. GCC inline ASM is based on GAS syntax. Honestly I would avoid any inline ASM in C if at all possible. It just causes trouble.
We have some COG driver programs written in COG C. VGA, I2C, and others. I want to write a cache cog in COG C.
Btw, you can include Spin/PASM files in SimpleIDE projects. Some day we'll have an integrated Spin compiler too.
Personally I think using GAS drivers is easiest, but that's probably a matter of taste. For small drivers inline assembly is fine, and actually pretty straightforward. For example here's the real time clock driver, from the GCC library. The actual COG code is in the __asm__ portion, as a string (using the ANSI C convention that strings are automatically concatenated if there are no tokens between them).
/*
* very simple COG program to keep the 64 bit _default_ticks variable
* up to date
*/
#include <propeller.h>
#include <sys/rtc.h>
__asm__(
" .section .cogrtcupdate,\"ax\"\n"
"L_main\n"
" rdlong oldlo, default_ticks_ptr\n"
" mov newlo, CNT\n"
" cmp newlo,oldlo wc\n"
" add default_ticks_ptr,#4\n"
" rdlong newhi, default_ticks_ptr\n"
" addx newhi,#0\n" /* adds in the carry set above */
" sub default_ticks_ptr,#4\n"
/* the sequence here makes sure to write newlo,newhi in that
 * order and in the fewest possible hub windows; if all readers
 * of default_ticks also read lo,hi in the fewest possible
 * hub cycles, then all users will
 * see consistent values
 */
" wrlong newlo, default_ticks_ptr\n"
" add default_ticks_ptr,#4\n"
" wrlong newhi, default_ticks_ptr\n"
" sub default_ticks_ptr,#4\n"
" jmp #L_main\n"
"newlo long 0\n"
"newhi long 0\n"
"oldlo long 0\n"
"default_ticks_ptr long __default_ticks\n"
);
void
_rtc_start_timekeeping_cog(void)
{
extern unsigned int _load_start_cogrtcupdate[];
if (_default_ticks_updated)
return; /* someone is already updating the time */
_default_ticks_updated =1;
#if defined(__PROPELLER_XMMC__) || defined(__PROPELLER_XMM__)
unsigned int *buffer;
// allocate a buffer in hub memory for the cog to start from
buffer = __builtin_alloca(2048);
memcpy(buffer, _load_start_cogrtcupdate, 2048);
cognew(buffer, 0);
#else
cognew(_load_start_cogrtcupdate, 0);
#endif
}
This uses the linker magic that automatically turns any section starting or ending with ".cog" into a COG overlay.
Hey, great work Eric. That looks like a fantastic solution.
I'm presuming no problem with adding a comment eg
" wrlong newlo, default_ticks_ptr ' this is a comment\n"
or do comments need to be /* ... */
You can put the comments into the GAS string using single quotes (as in your example), or you can put them outside the string using C/C++ style comments -- either should be fine.
Hey guys, I've been thinking a lot about the C driver and I see one glaring issue. The "BUS" needs to be locked to prevent contention from multiple cogs accessing the bus for different functions. There will be at least 2 cogs accessing the bus : The cog running the cache driver and the "bus master" cog. SO, this begs the question of how to "lock the bus." This is probably only tricky since I have never ACTUALLY used the locks. Any recommendations? It should be as simple as a repeat loop calling lock until it returns the cog's id: before the first bus command? Then releasing the lock after the last bus command? Pass the lock id to use in one of the optional parameters? Or is there a default?
Started building dual-screen board, should be done in the next few days. Displays are still 2 weeks out. I'm a bit bummed about writing a new driver for the mpc board. It is a nice design and has some promising features! A side note, I'm still a 74HC08 away from firing the board up. That and I'm running out of sockets! As soon as I "secure" my new location I'm ordering a BUNCH of sockets, as well as a few parts I'm in dire need of. I find it cheaper to buy 8 pin, 16 pin, and 40 pin sockets in quantity and cut down to fit. It takes a bit longer but saves a few cents. The 40-pins don't cut down to 32 as well as the 16's to 14's. I'm down to my last 2 loose crystals and need to order a few 6.25s. On the list now is: wireless pair, scribbler2 badges for my wife, and various components TBD. Any thoughts?
Try looking at the code at the start of the spi_flash_cache.spin driver. It has code that denominator added to allow sharing of the SPI pins with another driver using a lock. There is no reason why the same code couldn't be used to share your larger bus with another driver. I think you can probably get away with just pasting that into your skeleton-based driver in place of the cache line handling code. If you need help with that let me know.
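On the "repeat until it returns the cog's id" question: as far as I know the hardware lock does not return a cog ID. LOCKSET sets the lock and reports its previous state, so you spin until it reports the lock was previously clear. Here is a PC-side C11 sketch of that pattern. It uses `atomic_flag` as a stand-in for a hub lock, and every name in it is made up for illustration; it is not the code from spi_flash_cache.spin.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for one hub lock shared by the cache cog and the bus master cog. */
static atomic_flag bus_lock = ATOMIC_FLAG_INIT;

/* Model of LOCKSET: set the lock and return its previous state. */
static bool model_lockset(atomic_flag *lock)
{
    return atomic_flag_test_and_set(lock);
}

/* Model of LOCKCLR: release the lock. */
static void model_lockclr(atomic_flag *lock)
{
    atomic_flag_clear(lock);
}

/* Call before the first bus command: spin while someone else holds the bus. */
static void bus_acquire(void)
{
    while (model_lockset(&bus_lock)) {
        /* previous state was "set", so another cog owns the bus; try again */
    }
}

/* Call after the last bus command. */
static void bus_release(void)
{
    model_lockclr(&bus_lock);
}
```

In the cache driver the acquire would go at the top of rd_cache_line and wr_cache_line and the release just before they return, with the lock number passed in through one of the spare init parameters so both cogs agree on it.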
I will look at spi_flash_cache and see if I can figure it out. Shouldn't be too hard. Right now I'm building the dual-screen board. There's a trick though. I'm using male headers and ribbon cables. Which means mounting the headers for the display on the BOTTOM of the board. The DB9 port also needs to go on the bottom, so I'm thinking about putting the pin-headers on the bottom and use upside down. Should be interesting!
Which means mounting the headers for the display on the BOTTOM of the board.
Yikes - watch the polarities of the plugs. Is this so the display can be some distance away from the board?
Also re 2n2222 yes that will work. BC547 is the sort of "generic" signal transistor here in Australia and I think in the US the 2n2222 is the generic one.
There's a two-fold saving for me using ribbon cables. I have a BUNCH of male pin headers and cables. Not that many female pin headers. I will also be mounting the board in my rack-mount. To use the 3.2" displays I need to fudge the space, so this seems the best way. The problem is: using regular 40-pin PATA cables requires the header be placed on the BOTTOM. A bit tricky but I'm blaming THAT lesson for breaking my display. Until my new displays arrive I'll be using my old one and only one screen. 2 weeks will go fast and there's still much work. The board is 95%. Missing 3.3Vreg and large caps. Still trying to figure out the substitutions since buying caps is not an option and my stock is quickly disappearing.
I've asked the transistor question a few times I'm sure. I don't plan on using a whole roll of de-soldering braid on THIS board. Also wondering if the MAX components would interfere with prop-plug? I guess I'll find out soon enough!
The problem is: using regular 40-pin PATA cables requires the header be placed on the BOTTOM. A bit tricky but I'm blaming THAT lesson for breaking my display.
Hmm, could be. I guess once it is soldered and before you do the smoke test, do a conductivity test on pins 1,2 and 39 and 40 at the least and check there are no crossovers.
Re large caps, I get a lot of mine from electronic junk. Computer motherboards are a good source. I once went to a computer store and asked for an old motherboard. The guy's eyes lit up and he showed me a room full of several hundred old PC boxes and told me to take as many as I liked for free as he wanted his room back. He seemed a bit disappointed when I only took one!
Re the max chip, yes it would interfere with the propplug. Just pull the max3232 out of its socket if you are using the propplug.
Re "Not that many female pin headers": I'll be sending you some freebies but it might be another 2-3 weeks.
{
  Skeleton JCACHE external RAM driver
  Copyright (c) 2011 by David Betz

  Based on code by Steve Denson (jazzed)
  Copyright (c) 2010 by John Steven Denson

  Inspired by VMCOG - virtual memory server for the Propeller
  Copyright (c) February 3, 2010 by William Henning

  For the EuroTouch 161 By James Moxham and Joe Heinz

  TERMS OF USE: MIT License

  Permission is hereby granted, free of charge, to any person obtaining a copy
  of this software and associated documentation files (the "Software"), to deal
  in the Software without restriction, including without limitation the rights
  to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
  copies of the Software, and to permit persons to whom the Software is
  furnished to do so, subject to the following conditions:

  The above copyright notice and this permission notice shall be included in
  all copies or substantial portions of the Software.

  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
  IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
  AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
  LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,ARISING FROM,
  OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
  THE SOFTWARE.
}

CON

  ' default cache dimensions
  DEFAULT_INDEX_WIDTH   = 6
  DEFAULT_OFFSET_WIDTH  = 7

  ' cache line tag flags
  EMPTY_BIT             = 30
  DIRTY_BIT             = 31

PUB image
  return @init_vm

DAT
                        org     $0

' initialization structure offsets
' $0: pointer to a two word mailbox
' $4: pointer to where to store the cache lines in hub ram
' $8: number of bits in the cache line index if non-zero (default is DEFAULT_INDEX_WIDTH)
' $a: number of bits in the cache line offset if non-zero (default is DEFAULT_OFFSET_WIDTH)
' note that $4 must be at least 2^(index_width+offset_width) bytes in size
' the cache line mask is returned in $0

init_vm                 mov     t1, par                 ' get the address of the initialization structure
                        rdlong  pvmcmd, t1              ' pvmcmd is a pointer to the virtual address and read/write bit
                        mov     pvmaddr, pvmcmd         ' pvmaddr is a pointer into the cache line on return
                        add     pvmaddr, #4

                        add     t1, #4
                        rdlong  cacheptr, t1            ' cacheptr is the base address in hub ram of the cache

                        add     t1, #4
                        rdlong  t2, t1 wz
        if_nz           mov     index_width, t2         ' override the index_width default value

                        add     t1, #4
                        rdlong  t2, t1 wz
        if_nz           mov     offset_width, t2        ' override the offset_width default value

                        mov     index_count, #1
                        shl     index_count, index_width
                        mov     index_mask, index_count
                        sub     index_mask, #1

                        mov     line_size, #1
                        shl     line_size, offset_width
                        mov     t1, line_size
                        sub     t1, #1
                        wrlong  t1, par

                        ' put external memory initialization here

                        jmp     #vmflush

fillme                  long    0[128-fillme]           ' first 128 cog locations are used for a direct mapped cache table

                        fit     128

                        ' initialize the cache lines
vmflush                 movd    :flush, #0
                        mov     t1, index_count
:flush                  mov     0-0, empty_mask
                        add     :flush, dstinc
                        djnz    t1, #:flush

                        ' start the command loop
waitcmd                 wrlong  zero, pvmcmd
:wait                   rdlong  vmline, pvmcmd wz
        if_z            jmp     #:wait

                        shr     vmline, offset_width wc ' carry is now one for read and zero for write
                        mov     set_dirty_bit, #0       ' make mask to set dirty bit on writes
                        muxnc   set_dirty_bit, dirty_mask
                        mov     line, vmline            ' get the cache line index
                        and     line, index_mask
                        mov     hubaddr, line
                        shl     hubaddr, offset_width
                        add     hubaddr, cacheptr       ' get the address of the cache line
                        wrlong  hubaddr, pvmaddr        ' return the address of the cache line
                        movs    :ld, line
                        movd    :st, line
:ld                     mov     vmcurrent, 0-0          ' get the cache line tag
                        and     vmcurrent, tag_mask
                        cmp     vmcurrent, vmline wz    ' z set means there was a cache hit
        if_nz           call    #miss                   ' handle a cache miss
:st                     or      0-0, set_dirty_bit      ' set the dirty bit on writes
                        jmp     #waitcmd                ' wait for a new command

' line is the cache line index
' vmcurrent is current cache line
' vmline is new cache line
' hubaddr is the address of the cache line
miss                    movd    :test, line
                        movd    :st, line
:test                   test    0-0, dirty_mask wz
        if_z            jmp     #:rd                    ' current cache line is clean, just read new one
                        mov     vmaddr, vmcurrent
                        shl     vmaddr, offset_width
                        call    #wr_cache_line          ' write current cache line
:rd                     mov     vmaddr, vmline
                        shl     vmaddr, offset_width
                        call    #rd_cache_line          ' read new cache line
:st                     mov     0-0, vmline
miss_ret                ret

' pointers to mailbox entries
pvmcmd                  long    0       ' on call this is the virtual address and read/write bit
pvmaddr                 long    0       ' on return this is the address of the cache line containing the virtual address

cacheptr                long    0       ' address in hub ram where cache lines are stored
vmline                  long    0       ' cache line containing the virtual address
vmcurrent               long    0       ' current selected cache line (same as vmline on a cache hit)
line                    long    0       ' current cache line index
set_dirty_bit           long    0       ' DIRTY_BIT set on writes, clear on reads

zero                    long    0       ' zero constant
dstinc                  long    1<<9    ' increment for the destination field of an instruction
t1                      long    0       ' temporary variable
t2                      long    0       ' temporary variable

tag_mask                long    !(1<<DIRTY_BIT) ' includes EMPTY_BIT
index_width             long    DEFAULT_INDEX_WIDTH
index_mask              long    0
index_count             long    0
offset_width            long    DEFAULT_OFFSET_WIDTH
line_size               long    0       ' line size in bytes
empty_mask              long    (1<<EMPTY_BIT)
dirty_mask              long    (1<<DIRTY_BIT)

' input parameters to rd_cache_line and wr_cache_line
vmaddr                  long    0       ' external address
hubaddr                 long    0       ' hub memory address

' temporaries used by BREAD and BWRITE
ptr                     long    0
count                   long    0       ' copy line size and devide by two for byte-word offset

'get_values              rdlong  hubaddr, hubptr         ' get hub address
'                        rdlong  ramaddr, ramptr         ' get ram address
'                        rdlong  len, lenptr             ' get length
'                        mov     err, #5                 ' err=5
'get_values_ret          ret

'init
                        'mov     err, #0                 ' reset err=false=good
                        'mov     dira,zero               ' tristate the pins with the cog dira
'                        and     dira,maskP0P20P22       ' tristates all the common pins

'done
                        'wrlong  err, errptr             ' status =0=false=good, else error x
                        'wrlong  zero, comptr            ' command =0 (done)

' Pass pasm_n = 0- 7 come to this with P0-P20 and P22 tristated and returns them as this too
set137                  or      dira,maskP22            ' pin 22 is an output
                        andn    outa,maskP22            ' set P22low so Y0-Y7 are all high
                        or      dira,maskP0P20          ' pins P0-P20 are outputs
                        and     outa,maskP0P2low        ' set these 3 pins low
                        or      outa,pasm_n             ' set the 137 pins
                        or      outa,maskP22            ' pin 22 high
set137_ret              ret                             ' return

load161pasm                                             ' uses vmaddr
                        mov     count, line_size        ' make a copy of line_size AND.
                        shr     count, #1               ' devide lenght by two for word-byte
                        mov     ptr, hubaddr            ' hubaddr = hub page address
                        or      outa,maskP0P20          ' set P0-P20 high
                        or      dira,maskP0P20          ' output pins 0-20
                        mov     pasm_n,#0               ' group 0
                        call    #set137                 ' set the 137 output
                        and     outa,maskP0P18low       ' pins 0-18 set low
                        or      outa,vmaddr             ' output addres to 161 chips
                        or      outa,maskP19            ' clock high
                        or      outa,maskP20            ' load high
                        andn    outa,maskP19            ' clock low
                        andn    outa,maskP20            ' load low
                        or      outa,maskP19            ' clock high
                        or      outa,maskP20            ' load high
                        'andn    outa,maskP20           ' load low
                        'andn    outa,maskP19           ' clock low
                        'or      outa,maskP19           ' clock high
                        'or      outa,maskP20           ' load high
load161pasm_ret         ret

stop                    jmp     #stop                   ' for debugging

memorytransfer          or      dira,maskP16P20         ' so /wr and other pins definitely high
                        or      outa,maskP16P20
                        mov     pasm_n,#1               ' back to group 1 for memory transfer
                        call    #set137                 ' as next routine will always be group 1
                        or      dira,maskP16P20         ' output pins 16-20
                        or      outa,maskP16P20         ' set P16-P20 high (P0-P15 set as inputs or outputs in the calling routine)
memorytransfer_ret      ret

busoutput               or      dira,maskP0P15          ' set prop pins 0-15 as outputs
busoutput_ret           ret

businput                and     dira,maskP16P31         ' set P0-P15 as inputs
businput_ret            ret

delaynop                nop
                        nop
                        nop
                        nop
delaynop_ret            ret

'----------------------------------------------------------------------------------------------------
'
' rd_cache_line - read a cache line from external memory
'
' vmaddr is the external memory address to read
' hubaddr is the hub memory address to write
' line_size is the number of bytes to read
'
'----------------------------------------------------------------------------------------------------

rd_cache_line
' command T
pasmramtohub            call    #load161pasm            ' load the 161 counters with ramaddr
                        call    #memorytransfer         ' set to group 1, enable P16-P20 as outputs and set P16-P20 high
                        call    #businput               ' set prop pins P0-P15 as inputs
                        andn    outa,maskP16            ' memory /rd low
ramtohub_loop           mov     data_16,ina             ' get the data
                        wrword  data_16,ptr             ' move data to hub
                        andn    outa,maskP19            ' clock 161 low
                        or      outa,maskP19            ' clock 161 high
                        add     ptr,#2                  ' increment the hub address
                        djnz    count,#ramtohub_loop
                        or      outa,maskP16            ' memory /rd high
                        or      dira,maskP0P15          ' %00000000_00000000_11111111_11111111 restore P0-P15as outputs
                        and     dira,maskP0P20P22       ' tristates all the common pins
rd_cache_line_ret       ret

'----------------------------------------------------------------------------------------------------
'
' wr_cache_line - write a cache line to external memory
'
' vmaddr is the external memory address to write
' hubaddr is the hub memory address to read
' line_size is the number of bytes to write
'
'----------------------------------------------------------------------------------------------------

wr_cache_line
' command S
pasmhubtoram            call    #load161pasm            ' load the 161 counters with ramaddr
                        call    #memorytransfer         ' set to group 1, enable P16-P20 as outputs and set P16-P20 high
                        call    #busoutput              ' set prop pins P0-P15 as outputs
hubtoram_loop           and     outa,maskP16P31         '%11111111_11111111_00000000_00000000 ' clear for output
                        rdword  data_16,ptr             ' get the word from hub
                        and     data_16,maskP0P15       ' mask to a word only
                        or      outa,data_16            ' send out the byte to P0-P15
                        andn    outa,maskP17            ' set mem write low
                        add     ptr,#2                  ' increment by 2 bytes = 1 word. Put this here for small delay while writes
                        or      outa,maskP17            ' mem write high
                        andn    outa,maskP19            ' clock 161 low
                        or      outa,maskP19            ' clock 161 high
                        djnz    count,#hubtoram_loop    ' loop this many times
                        and     dira,maskP0P20P22       ' tristates all the common pins
wr_cache_line_ret       ret

pasm_n                  long    0                                       ' general purpose value
data_16                 long    0                                       ' general purpose value
maskP0P2low             long    %11111111_11111111_11111111_11111000    ' P0-P2 low
maskP0P20               long    %00000000_00011111_11111111_11111111    ' P0-P18 enabled for output plus P19,P20
maskP0P18low            long    %11111111_11111000_00000000_00000000    ' P0-P18 low
maskP16                 long    %00000000_00000001_00000000_00000000    ' pin 16
maskP17                 long    %00000000_00000010_00000000_00000000    ' pin 17
maskP18                 long    %00000000_00000100_00000000_00000000    ' pin 18
maskP19                 long    %00000000_00001000_00000000_00000000    ' pin 19
maskP20                 long    %00000000_00010000_00000000_00000000    ' pin 20
maskP22                 long    %00000000_01000000_00000000_00000000    ' pin 22
maskP16P31              long    %11111111_11111111_00000000_00000000    ' pin 16 to pin 31
maskP0P15               long    %00000000_00000000_11111111_11111111    ' for masking words
maskP0P20P22            long    %11111111_10100000_00000000_00000000    ' for returning all group pins HiZ

                        fit     496
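For anyone following the command loop in the driver above, here is a small C model of how an address is split into a tag, a cache line index and a hub offset, using the default widths from the CON block. This is only a sketch of the address arithmetic: it ignores the read/write flag carried in the low bits of the mailbox command, and the names cache_slot, locate and cache_bytes are mine, not from the driver.

```c
#include <assert.h>
#include <stdint.h>

enum { INDEX_WIDTH = 6, OFFSET_WIDTH = 7 };   /* DEFAULT_INDEX_WIDTH / DEFAULT_OFFSET_WIDTH */

typedef struct {
    uint32_t tag;       /* like vmline: the address with the offset bits shifted out */
    uint32_t index;     /* like line: which slot of the direct mapped table */
    uint32_t hub_off;   /* byte offset of that slot from cacheptr */
} cache_slot;

/* Split an external address the way the command loop does. */
static cache_slot locate(uint32_t vmaddr)
{
    cache_slot s;
    s.tag     = vmaddr >> OFFSET_WIDTH;
    s.index   = s.tag & ((1u << INDEX_WIDTH) - 1u);
    s.hub_off = s.index << OFFSET_WIDTH;
    assert(s.index < (1u << INDEX_WIDTH));
    return s;
}

/* Hub RAM needed for the cache lines: 2^(index_width+offset_width) bytes. */
static uint32_t cache_bytes(void)
{
    return 1u << (INDEX_WIDTH + OFFSET_WIDTH);
}
```

With the defaults this comes to 64 lines of 128 bytes, so 8192 bytes of hub RAM. Two addresses that are 8192 bytes apart land in the same slot with different tags, which is exactly the case the miss routine handles.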
The change is this:
                        mov     ptr, hubaddr            ' hubaddr = hub page address
Then change all loop references of hubaddr to ptr. Now all tests pass!
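To illustrate why the copy matters, here is a toy C model of a dirty miss, where the old line is written out and the new line is read into the same hub slot. Cog registers behave like globals, so they are modelled as globals. If the write loop advanced hubaddr itself, the read that follows would fill the wrong part of hub RAM. Array sizes, the byte order and the "word address = byte address / 2" mapping are assumptions of this sketch, not taken from the hardware.

```c
#include <assert.h>
#include <stdint.h>

enum { LINE_SIZE = 128, HUB_BYTES = 512, RAM_WORDS = 1024 };

static uint8_t  hub[HUB_BYTES];      /* stand-in for hub RAM */
static uint16_t ext_ram[RAM_WORDS];  /* stand-in for the word-wide SRAM */

static uint32_t hubaddr;             /* start of the cache line in hub RAM */
static uint32_t ptr, count;          /* scratch cursor and word counter */

/* Copy one cache line from hub RAM to external RAM. */
static void wr_cache_line(uint32_t vmaddr)
{
    uint32_t word = vmaddr >> 1;
    assert(hubaddr + LINE_SIZE <= HUB_BYTES && word + LINE_SIZE / 2 <= RAM_WORDS);
    ptr = hubaddr;                   /* the fix: advance a copy, not hubaddr itself */
    for (count = LINE_SIZE / 2; count > 0; count--) {
        ext_ram[word++] = (uint16_t)(hub[ptr] | (hub[ptr + 1] << 8));
        ptr += 2;
    }
}

/* Copy one cache line from external RAM to hub RAM. */
static void rd_cache_line(uint32_t vmaddr)
{
    uint32_t word = vmaddr >> 1;
    assert(hubaddr + LINE_SIZE <= HUB_BYTES && word + LINE_SIZE / 2 <= RAM_WORDS);
    ptr = hubaddr;
    for (count = LINE_SIZE / 2; count > 0; count--) {
        hub[ptr]     = (uint8_t)(ext_ram[word] & 0xFFu);
        hub[ptr + 1] = (uint8_t)(ext_ram[word] >> 8);
        word++;
        ptr += 2;
    }
}

/* Dirty miss: flush the old line, then refill the same hub slot. */
static void miss(uint32_t old_vmaddr, uint32_t new_vmaddr)
{
    wr_cache_line(old_vmaddr);
    rd_cache_line(new_vmaddr);
}
```

After miss() returns, hubaddr still points at the start of the slot, so the second transfer lands where the first one started.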
Now I need to figure out how to test the actual read and write speeds! Any test scripts for this?
Thanks again for all your help!
Joe
Glad you got something running. Now what else needs to be done?
To try running programs there are a few things to do:
1. Create a .dat file with BSTC
2. Copy the .dat file to the c:\propgcc\propeller-load directory
3. Add the cache driver to your touch161.cfg file.
4. Start SimpleIDE and try a program
Details:
1. Create a .dat file.
bstc -Ograux -c touch161_cache.spin
This creates a touch161_cache.dat file.
# touch161
# IDE:SDLOAD
# IDE:SDXMMC
clkfreq: 80000000
clkmode: XTAL1+PLL16X
baudrate: 115200
rxpin: 31
txpin: 30
cache-driver: touch161_cache.dat
cache-size: 8K
cache-param1: 0
cache-param2: 0
sd-driver: sd_driver.dat
sdspi-do: 24
sdspi-clk: 25
sdspi-di: 26
sdspi-cs: 27
Open the hello demo.
a. Establish a "basis" with LMM mode. That is, choose memory model LMM, and Run Console F8
Verify that LMM hello works.
b. Change memory model to XMMC, select Board Type TOUCH161, and Run Console F8.
Verify that XMMC hello works.
c. Change memory model to XMM-SINGLE, keep Board Type TOUCH161, and Run Console F8.
Verify that XMM-SINGLE hello works.
I'll post some SimpleIDE package code for you to do some performance comparisons later.
XMM-Single works, not sure why XMMC wouldn't. I also tested SD card test in XMM-single. I tried to compile dry.c but:
dry.c:415:23: fatal error: sys/times.h: No such file or directory
dry.c:417:76: fatal error: sys/param.h: No such file or directory
That's as far as I get. Not sure how to set this:
#include <sys/param.h> /* If your system doesn't have this, use -DHZ=xxx */
Sounds like I'm getting ahead of myself since no XMMC though.
SimpleIDE will not build the dhrystone test because of the way it must be built.
I've attached a dry_xmmc.elf and dry_xmm_single.elf in a .zip for you.
Use the loader in a command window for these examples.
propeller-load -r -t -b touch161 dry_xmmc.elf
propeller-load -r -t -b touch161 dry_xmm_single.elf
Did you try the hello examples?
C:\Users\Joe\Downloads\dry>propeller-load -r -t -b dracTouchEX dry_xmm_single.elf
Propeller Version 1 on COM21
Loading the serial helper to hub memory
9528 bytes sent
Verifying RAM ... OK
Loading cache driver 'ET_cache.dat'
1088 bytes sent
Loading program image to RAM
17408 bytes sent
Loading .xmmkernel
1724 bytes sent
[ Entering terminal mode. Type ESC or Control-C to exit. ]
Dhrystone Benchmark, Version C, Version 2.2
Program compiled without 'register' attribute
Using STDC clock(), HZ=80000000
Trying 5000 runs through Dhrystone:
Final values of the variables used in the benchmark:
Int_Glob: 5   should be: 5
Bool_Glob: 1   should be: 1
Ch_1_Glob: A   should be: A
Ch_2_Glob: B   should be: B
Arr_1_Glob[8]: 7   should be: 7
Arr_2_Glob[8][7]: 5010   should be: Number_Of_Runs + 10
Ptr_Glob->
  Ptr_Comp: 536899232   should be: (implementation-dependent)
  Discr: 0   should be: 0
  Enum_Comp: 2   should be: 2
  Int_Comp: 17   should be: 17
  Str_Comp: DHRYSTONE PROGRAM, SOME STRING   should be: DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob->
  Ptr_Comp: 536899232   should be: (implementation-dependent), same as above
  Discr: 0   should be: 0
  Enum_Comp: 1   should be: 1
  Int_Comp: 18   should be: 18
  Str_Comp: DHRYSTONE PROGRAM, SOME STRING   should be: DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc: 5   should be: 5
Int_2_Loc: 13   should be: 13
Int_3_Loc: 7   should be: 7
Enum_Loc: 1   should be: 1
Str_1_Loc: DHRYSTONE PROGRAM, 1'ST STRING   should be: DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc: DHRYSTONE PROGRAM, 2'ND STRING   should be: DHRYSTONE PROGRAM, 2'ND STRING
Microseconds for one run through Dhrystone: 1400
Dhrystones per Second: 714
Here is one reason why xmmc might not work:
A program compiled using -mxmmc will have code that lives at 0x30000000. A program that is compiled in -mxmm-single mode will have code and data that live at 0x20000000. In order for an xmmc program to work, your cache driver must make the external memory visible at the 0x30000000 address. Most cache drivers don't decode the high order bits anyway so there are images of the memory repeated starting at 0x20000000 and repeating throughout the rest of memory. If you're decoding those high order bits then that might be why your xmmc program isn't working. Also, you might want to mask off those bits anyway since I believe your memory has many fewer address bits. Could those high bits be causing you trouble?
Now let me say I do understand what you're saying, for the most part. To make sure I DO understand:
Mask off the high bits of the EXTERNAL address, e.g.:
                        mov     address, vmaddr
                        and     address, maskhighadd
                        ..
                        ..
maskhighadd             long    %00000000_00000111_11111111_11111111    ' access to full memory space
So I copy vmaddr and then mask off the address bits my memory doesn't have? Could I also mask off even more bits to "reserve" memory for the rest of the system? Say:
maskhighadd             long    %00000000_00000000_01111111_11111111    ' access to 32k partition of memory?
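A quick way to sanity-check the masking idea on a PC is a few lines of C. The two masks are the ones from the post above written in hex, and ext_address is just an illustrative name. The point of the sketch is that after masking, an xmmc address (0x30000000 region) and an xmm-single address (0x20000000 region) hit the same external location.

```c
#include <assert.h>
#include <stdint.h>

#define FULL_MASK      0x0007FFFFu   /* %00000000_00000111_11111111_11111111 */
#define PARTITION_MASK 0x00007FFFu   /* %00000000_00000000_01111111_11111111 */

/* Drop the address bits the external memory does not decode. */
static uint32_t ext_address(uint32_t vmaddr, uint32_t mask)
{
    assert((mask & (mask + 1u)) == 0u);   /* mask must be a run of low one bits */
    return vmaddr & mask;
}
```

One thing this model makes visible: with the smaller mask, an address just past 32K (for example 0x20008000) wraps back to 0 rather than being rejected. So a tighter mask confines the driver to the low 32K, but it only "reserves" the rest if the program itself fits in that partition.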
Kidding of course. Did have to find my own HC00 chip though.
Word wide memory needs a small adjustment: A1 should map to A0 on the chips. Add line 205.
                        and     outa,maskP0P18low       ' pins 0-18 set low
                        shr     vmaddr, #1              ' schematic connects SRAM A0 to A0, not A1 - jsd. line 205
                        or      outa,vmaddr             ' output addres to 161 chips
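In C terms the adjustment is just a byte-address to word-address conversion, matching the existing "shr count, #1" that turns the line size in bytes into a word count. The function names below are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Byte address from the cache logic -> word address on the 16-bit wide SRAM. */
static uint32_t word_address(uint32_t byte_addr)
{
    assert((byte_addr & 1u) == 0u);   /* cache lines start on even addresses */
    return byte_addr >> 1;
}

/* Number of 16-bit transfers needed for one cache line. */
static uint32_t words_per_line(uint32_t line_size_bytes)
{
    return line_size_bytes >> 1;
}
```

With both shifts in place a 128 byte line is 64 words, and the line at byte address 128 starts at word 64, so consecutive lines pack back to back in the SRAM.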
Preliminary results before optimizations - SSF listed for comparison:

Memory Model    Board Type        Dhrystones/Second
LMM             SSF (HUB)         6983
XMMC            SSF               1256
LMM             TOUCH161 (HUB)    6983
XMMC            TOUCH161          1278
XMM-SINGLE      TOUCH161          713
Preliminary results before optimizations - SSF listed for comparison:

Memory Model    Board Type        Dhrystones/Second
LMM             TOUCH161 (HUB)    6983
XMMC            TOUCH161          1364
XMM-SINGLE      TOUCH161          754
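To put the two sets of TOUCH161 numbers in proportion, here is a tiny C helper for the arithmetic. It only uses the figures quoted in the tables above, and the function names are mine.

```c
#include <assert.h>

/* Ratio of a score to a baseline, e.g. XMMC versus LMM. */
static double ratio(double score, double baseline)
{
    assert(baseline > 0.0);
    return score / baseline;
}

/* Percentage change from an earlier score to a later one. */
static double percent_gain(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

ratio(1364, 6983) is a bit under 0.2 and ratio(754, 6983) is about 0.11, so XMMC runs at roughly a fifth of LMM speed and XMM-SINGLE at roughly a tenth. Going from 1278 to 1364 and from 713 to 754 is a gain of about 6 to 7 percent in each mode.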
Then, I ran dry xmm-single and the numbers check out. 754 DhryPerSec. GOOD.
C:\Users\Joe\Downloads\dry>propeller-load -r -t -b touch161 dry_xmm_single.elf
Propeller Version 1 on COM21
Loading the serial helper to hub memory
9528 bytes sent
Verifying RAM ... OK
Loading cache driver 'new_cache.dat'
1084 bytes sent
Loading program image to RAM
17408 bytes sent
Loading .xmmkernel
1724 bytes sent
[ Entering terminal mode. Type ESC or Control-C to exit. ]
Dhrystone Benchmark, Version C, Version 2.2
Program compiled without 'register' attribute
Using STDC clock(), HZ=80000000
Trying 5000 runs through Dhrystone:
Final values of the variables used in the benchmark:
Int_Glob: 5   should be: 5
Bool_Glob: 1   should be: 1
Ch_1_Glob: A   should be: A
Ch_2_Glob: B   should be: B
Arr_1_Glob[8]: 7   should be: 7
Arr_2_Glob[8][7]: 5010   should be: Number_Of_Runs + 10
Ptr_Glob->
  Ptr_Comp: 536899232   should be: (implementation-dependent)
  Discr: 0   should be: 0
  Enum_Comp: 2   should be: 2
  Int_Comp: 17   should be: 17
  Str_Comp: DHRYSTONE PROGRAM, SOME STRING   should be: DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob->
  Ptr_Comp: 536899232   should be: (implementation-dependent), same as above
  Discr: 0   should be: 0
  Enum_Comp: 1   should be: 1
  Int_Comp: 18   should be: 18
  Str_Comp: DHRYSTONE PROGRAM, SOME STRING   should be: DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc: 5   should be: 5
Int_2_Loc: 13   should be: 13
Int_3_Loc: 7   should be: 7
Enum_Loc: 1   should be: 1
Str_1_Loc: DHRYSTONE PROGRAM, 1'ST STRING   should be: DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc: DHRYSTONE PROGRAM, 2'ND STRING   should be: DHRYSTONE PROGRAM, 2'ND STRING
Microseconds for one run through Dhrystone: 1324
Dhrystones per Second: 754
Then I try the xmmC and I still get nothing? Fail
C:\Users\Joe\Downloads\dry>propeller-load -r -t -b touch161 dry_xmmc.elf
Propeller Version 1 on COM21
Loading the serial helper to hub memory
9528 bytes sent
Verifying RAM ... OK
Loading cache driver 'new_cache.dat'
1084 bytes sent
Loading program image to flash
15412 bytes sent
Loading .xmmkernel
1724 bytes sent
[ Entering terminal mode. Type ESC or Control-C to exit. ]
I still don't know what my xmmc issue is *NOOB* Still, the numbers are pretty good. Now I'm worried though, because we might have a new board on the way with a different Group Select chip *MCP23008*. I will be using the current board for quite a while, since I have a software fix for the display issue.
*edit*
Sorry about making you look for the HC00. I only have 1.
Well at least something jives. Maybe you should play some blues guitar and try again later.
I don't get the dry_xmmc.elf problem. Works fine for me.
How big is the file? Is it a text file now by any chance?
>dir /A dry_xmmc.elf
 Volume in drive C is OS
 Volume Serial Number is 6C8D-7E65

 Directory of c:\gccdev\propside\MyProjects\dry

05/27/2012  04:40 PM            37,526 dry_xmmc.elf
....
Downloaded it again and tested using the config file below:
propeller-load -r -t -b touch161 dry_xmmc.elf
# touch161
# IDE:SDLOAD
# IDE:SDXMMC
clkfreq: 80000000
clkmode: XTAL1+PLL16X
baudrate: 115200
rxpin: 31
txpin: 30
tvpin: 12 # only used if TV_DEBUG is defined
cache-driver: touch_cache.dat
cache-size: 8K
cache-param1: 0
cache-param2: 0
sd-driver: sd_driver.dat
sdspi-do: 24
sdspi-clk: 25
sdspi-di: 26
sdspi-cs: 27
load-target: ram
Can't just write this off. We need a root-cause otherwise we could be hiding some problem by accident.
Not sure how to get there though. The only thing I can think of is that the dry_xmmc.elf file is corrupted.
Thanks,
--Steve
I can't thank everyone enough! Steve, David, James and everyone else who has contributed to the community. You guys rock!
One driving force for getting a GUI into C is that there will eventually not be enough room in Spin. The demo program I am using is taking about 3/4 of hub memory and every extra demo function takes it inexorably towards that moment when there is no code space left.
Also the demo program is getting unwieldy now as we are needing different code for the two displays. So that may well lend itself to a C Class or similar, where you call a common function DrawPixel(x,y,color) and it works for both displays.
So one could think about two header files, ILI9325.h and SSD1289.h
Each would have the same functions - load a font, draw a radiobox, draw a textbox, draw a line etc but the code would be different.
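One way the "same functions, different code" idea could look in plain C is a small table of function pointers per display, with a common DrawPixel in front. This is only a sketch: the struct layout, the stub drivers, the call counters and the 240x320 sizes are placeholders I made up so the example compiles on a PC, not real ILI9325 or SSD1289 code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical common interface that each display header would fill in. */
typedef struct {
    const char *name;
    int width, height;                                 /* placeholder dimensions */
    void (*draw_pixel)(int x, int y, uint16_t color);
} display_driver;

/* Stub drivers: the real ones would talk to the controller. */
static unsigned ili9325_calls, ssd1289_calls;

static void ili9325_draw_pixel(int x, int y, uint16_t color)
{
    (void)x; (void)y; (void)color;
    ili9325_calls++;
}

static void ssd1289_draw_pixel(int x, int y, uint16_t color)
{
    (void)x; (void)y; (void)color;
    ssd1289_calls++;
}

static const display_driver ILI9325 = { "ILI9325", 240, 320, ili9325_draw_pixel };
static const display_driver SSD1289 = { "SSD1289", 240, 320, ssd1289_draw_pixel };

static const display_driver *active = &ILI9325;        /* selected at startup */

/* The one function application code calls, whichever display is fitted. */
static void DrawPixel(int x, int y, uint16_t color)
{
    assert(active != NULL);
    if (x < 0 || y < 0 || x >= active->width || y >= active->height)
        return;                                        /* clip off-screen pixels */
    active->draw_pixel(x, y, color);
}
```

Higher level routines (lines, textboxes, radioboxes) can then be written once on top of DrawPixel, and only the low level part differs per display.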
I've been reading through this webpage on headers http://www.gamedev.net/page/resources/_/technical/general-programming/organizing-code-files-in-c-and-c-r1798 which has some great advice.
I also need to get my head around writing pasm in C. I pushed Catalina out to the limit with this but it could have been easier than it was. One nice thing about the proptool and spin is you can have the pasm and spin code in the same file. You can even copy and paste them so they are near each other so it is a couple of taps on a page up or page down button to swap between the two. That makes debugging a lot easier. At the other extreme, there is the idea of binary blobs of pasm that you have to copy to an SD card as separate files that are precompiled. That involves lots of removing the SD card which is a pain, slower and I ended up solving that by automating downloads of the pasm part as part of the compile process. That got complicated behind the scenes as there was a precompiler that split the pasm part out of the C program, then the pasm part was compiled separately and downloaded separately, and the C part was then compiled and run. I ended up writing an IDE and it had two panes, one for C and one for pasm.
How is this being done in GCC?
Can you mix and match pasm like in the Spintool? Is it better to actually write the pasm in C (I think this is possible, right?)
Or is it better to keep things totally separate - GCC does C and when you start a cog you pass a function the name of a binary file - "mycog.bin" - and that is loaded off the SD card and into a cog?
In the latter scenario, one would have SimpleIDE and the Proptool open at the same time and use Windows to flip between the two. I did that for a while in Catalina and it is a quite plausible way of doing development, albeit slow until we got file transfer to SD card working using xmodem. Not quite as "integrated" as the Spin proptool but at least it worked with existing tools.
What is the best way to proceed?
I guess as a practical example, the latest board uses the MCP23008 chip. That will involve getting an I2C driver from the Obex. I haven't looked in detail, but hopefully there is one that conforms to what I call the "Gracey" standard based on Chip's original mouse/display drivers where all variables are passed as a contiguous array at cog startup. If so, then the pasm part could be compiled separately and the Spin translated to C.
And the temptation at that point would be to combine the GUI driver pasm with the I2C driver pasm so it only uses one cog instead of two.
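On the C side, the "contiguous array at cog startup" convention mentioned above maps naturally onto a struct of longs whose address is handed to the cog, with the PASM reading fields at fixed offsets from PAR. Below is a host-side sketch of that layout. The struct name, its fields and read_long are hypothetical and are not taken from any Obex driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameter block for an I2C cog; field names are illustrative only.
 * On real hardware the fields would normally be declared volatile. */
typedef struct {
    uint32_t command;   /* written by C, cleared by the cog when done */
    uint32_t address;
    uint32_t data;
    uint32_t status;
} i2c_mailbox;

/* What the PASM side does with PAR: read the n-th long of the block. */
static uint32_t read_long(const void *par, unsigned n)
{
    uint32_t value;
    assert(par != NULL);
    memcpy(&value, (const unsigned char *)par + 4u * n, sizeof value);
    return value;
}
```

Because every field is a 32-bit long, the offsets are simply 0, 4, 8 and 12, which is what the "add t1, #4 / rdlong" pattern in PASM startup code walks through.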
Thoughts and sage advice would be most appreciated.
I could use some sage advice too since I've been cutting sage, milkweed, and some kind of spiny miserable cactus all afternoon.
The best generic approach to writing GCC COG drivers right now is to use PASM, which we're all familiar with, or COG C. COG C programs are a special kind of C file. GCC also supports GAS assembler if you want to try that. GCC inline ASM is based on GAS syntax. Honestly I would avoid any inline ASM in C if at all possible. It just causes trouble.
We have some COG driver programs written in COG C. VGA, I2C, and others. I want to write a cache cog in COG C.
Btw, you can include Spin/PASM files in SimpleIDE projects. Some day we'll have an integrated Spin compiler too.
Thanks,
--Steve
Personally I think using GAS drivers is easiest, but that's probably a matter of taste. For small drivers inline assembly is fine, and actually pretty straightforward. For example here's the real time clock driver, from the GCC library. The actual COG code is in the __asm__ portion, as a string (using the ANSI C convention that strings are automatically concatenated if there are no tokens between them).
/*
 * very simple COG program to keep the 64 bit _default_ticks variable
 * up to date
 */
#include <propeller.h>
#include <sys/rtc.h>

__asm__(
" .section .cogrtcupdate,\"ax\"\n"
"L_main\n"
" rdlong oldlo, default_ticks_ptr\n"
" mov newlo, CNT\n"
" cmp newlo,oldlo wc\n"
" add default_ticks_ptr,#4\n"
" rdlong newhi, default_ticks_ptr\n"
" addx newhi,#0\n" /* adds in the carry set above */
" sub default_ticks_ptr,#4\n"
/* the sequence here makes sure to write newlo,newhi in that
 * order and in the fewest possible hub windows; if all readers
 * of default_ticks also read lo,hi in the fewest possible
 * hub cycles, then all users will
 * see consistent values
 */
" wrlong newlo, default_ticks_ptr\n"
" add default_ticks_ptr,#4\n"
" wrlong newhi, default_ticks_ptr\n"
" sub default_ticks_ptr,#4\n"
" jmp #L_main\n"
"newlo long 0\n"
"newhi long 0\n"
"oldlo long 0\n"
"default_ticks_ptr long __default_ticks\n"
);

void _rtc_start_timekeeping_cog(void)
{
    extern unsigned int _load_start_cogrtcupdate[];

    if (_default_ticks_updated)
        return; /* someone is already updating the time */
    _default_ticks_updated = 1;

#if defined(__PROPELLER_XMMC__) || defined(__PROPELLER_XMM__)
    unsigned int *buffer;

    // allocate a buffer in hub memory for the cog to start from
    buffer = __builtin_alloca(2048);
    memcpy(buffer, _load_start_cogrtcupdate, 2048);
    cognew(buffer, 0);
#else
    cognew(_load_start_cogrtcupdate, 0);
#endif
}
This uses the linker magic that automatically turns any section starting or ending with ".cog" into a COG overlay.
This is a good one for the PropGCC Cookbook!!
(You are writing the Propeller GCC Cookbook, aren't you????)
I'm presuming no problem with adding a comment eg
" wrlong newlo, default_ticks_ptr ' this is a comment\n"
or do comments need to be /* ... */
You can put the comments into the GAS string using single quotes (as in your example), or you can put them outside the string using C/C++ style comments -- either should be fine.
Started building dual-screen board, should be done in the next few days. Displays are still 2 weeks out. I'm a bit bummed about writing a new driver for the mpc board. It is a nice design and has some promising features! A side note, I'm still a 74hc08 away from firing the board up. That and I'm running out of sockets! As soon as I "secure" my new location I'm ordering a BUNCH of sockets, as well as a few parts I'm in dire need of. I find it cheaper to buy 8 pin, 16 pin, and 40 pin sockets in quantity and cut down to fit. It takes a bit longer but saves a few cents. The 40-pins don't cut down to 32 as well as the 16's to 14's. I'm down to my last 2 loose crystals and need to order a few 6.25s. On the list now: wireless pair, scribbler2 badges for my wife, and various components TBD. Any thoughts?
*edit*
2n2222 for reset transistor??