New XMM hardware

David Betz · 2012-05-01 06:59

Dr_Acula wrote: »

I have my suspicions it is fsrw not liking my SD cards, but I don't want it to be that one because it leads down the path of porting Kye's SD driver into propGCC.

This won't be difficult. I may try it tonight. The main problem is modifying Kye's driver to support various CS mechanisms. I guess there is already a version that supports the C3 so maybe this won't be a big deal. However, this means that Kye's code will be used for loading stuff from the SD card. It won't be used for filesystem access once the C program is running. That is handled by the PropGCC library and dosfs.c.

Dr_Acula · 2012-05-01 07:19

All working!!

I think I had so many modified files I ended up not even using the files I thought I was using. So - all my mistakes, no one else's.

So - complete reinstall of GCC.

Brand new config file.

Go back to jazzed's original test post #48.

And it works absolutely perfectly - my little BCX basic program that concatenates strings:

Loading cache driver
Initializing SD card
Mounting SD filesystem
Opening AUTORUN.PEX
Loading kernel
Loading cluster map
Initializing cache
Starting program
Enter a string: test
Enter another string: test2
testtest2

Hey thanks guys. I would not have got this working without all your help.

Out of this comes a lesson for me - keep track of all the version changes I make along the way!

And I have just one tiny bug to report. In the terminal console, once the output scrolls off the bottom of the page, the very last line is not displayed and you have to keep using the scrollbar to see it.

Thanks again everyone!

And now... onwards to building a touchscreen GUI in GCC!

David Betz · 2012-05-01 07:29

Dr_Acula wrote: »
All working!!

I think I had so many modified files I ended up not even using the files I thought I was using. So - all my mistakes, no one else's.

So - complete reinstall of GCC.

Brand new config file.

Go back to jazzed's original test post #48.

And it works absolutely perfectly - my little BCX basic program that concatenates strings:
Loading cache driver
Initializing SD card
Mounting SD filesystem
Opening AUTORUN.PEX
Loading kernel
Loading cluster map
Initializing cache
Starting program
Enter a string: test
Enter another string: test2
testtest2
Hey thanks guys. I would not have got this working without all your help.

Out of this comes a lesson for me - keep track of all the version changes I make along the way!

And I have just one tiny bug to report. In the terminal console, once the output scrolls off the bottom of the page, the very last line is not displayed and you have to keep using the scrollbar to see it.

Thanks again everyone!

And now... onwards to building a touchscreen GUI in GCC!

Congratulations! I'm glad you got it working! Have you decided to use the SD cache driver instead of creating a new cache driver for your board?

Dr_Acula · 2012-05-01 07:36

Oh yes, I'll be using the SD cache driver. I really can't see the need at the moment for putting a program in the external ram. Piles of code to write, no real benefit, and it uses up ram space better used by storing icons and fonts.

And... in a general sense, I think that the SD cache driver is a solution that works for so many other boards. Indeed, I might go so far as saying that it may render pretty much all the external memory XMM program solutions obsolete.

All you have to do is tell the program which pins you are using for your SD card.

I'm not entirely sure the benefits of that have been appreciated yet by the wider prop community. I think one can start to say things like the extra memory in the prop 2 is not going to be needed for storing programs. Which means it is fully free to be used as a screen buffer, while at the same time allowing programs gigabytes in size.

Your thoughts?

David Betz · 2012-05-01 07:43

Dr_Acula wrote: »

Oh yes, I'll be using the SD cache driver. I really can't see the need at the moment for putting a program in the external ram. Piles of code to write, no real benefit, and it uses up ram space better used by storing icons and fonts.

And... in a general sense, I think that the SD cache driver is a solution that works for so many other boards. Indeed, I might go so far as saying that it may render pretty much all the external memory XMM program solutions obsolete.

All you have to do is tell the program which pins you are using for your SD card.

I'm not entirely sure the benefits of that have been appreciated yet by the wider prop community. I think one can start to say things like the extra memory in the prop 2 is not going to be needed for storing programs. Which means it is fully free to be used as a screen buffer, while at the same time allowing programs gigabytes in size.

Your thoughts?

Yes, it is very handy to be able to run large programs with just an SD card interface. I'll be interested to see if you find the performance good enough though. As I think I mentioned in an earlier post, the 512 byte cache lines that are used in the SD cache driver are really too big for an 8k cache, since that leaves only 16 lines. I guess cache performance depends on the specific application though and it may be that this works well for you. Let us know how this works out for you.

jazzed · 2012-05-01 07:51

Dr_Acula wrote: »

All working!!

Great! Thanks for the extra push.

Dr_Acula wrote: »

And now I am confused. Because I did that with the PropBOE file I modified and renamed Touch161. And I did it with the dracblade file.

Now in the c:\propgcc\propeller_load directory there is only one file that starts with "T" and that is TOUCH161.CFG However in the dropdown menu at the top of the SIDE there are now three entries - TOUCH161, TOUCH161 : PROPBOE, and TOUCH161-SDXMMC

So somewhere along the line the IDE seems to be creating files. Which one to use, and what exactly are these files? Can you view them or edit them?

The IDE looks at your config file and interprets it as follows:

If your config file contains a [name] that does not match the config file, the IDE assumes an inheritance relationship which is a future feature. That's why you get a TOUCH161:PROPBOE board type. You should either remove the line with [name] or change it to match the file name.

If your config file contains # IDE:SDXMMC, the IDE will create a TOUCH161-SDXMMC board type. Specifying this board type tells the IDE that you want to build a ".PEX" file and load the program to the SD card, where it will be run in XMMC mode. You can either burn the loader program to EEPROM (F11) or just send it to HUB RAM (F10).

If your config file contains # IDE:SDLOAD, the IDE will create a TOUCH161-SDLOAD board type. Specifying this board type tells the IDE that you want to build a ".PEX" file and load the program to the SD card. At boot-up from HUB RAM (F10) or EEPROM (F11), the loader program will read the AUTORUN.PEX file from the SD card, load it into SRAM (according to the cache driver you've specified), and run the program there. For SRAM-only programs, use the XMM-SINGLE model; for FLASH+SRAM programs like on the C3, use XMM-SPLIT.

Hope this helps.

Thanks,
--Steve

jazzed · 2012-05-01 09:05

Dr_Acula wrote: »

Oh yes, I'll be using the SD cache driver. I really can't see the need at the moment for putting a program in the external ram. Piles of code to write, no real benefit, and it uses up ram space better used by storing icons and fonts.

The main drawbacks to SDXMMC are: 1) any global program data is generally limited to HUB RAM, and 2) SDXMMC performance.

SDXMMC does not obsolete other XMM designs. SDXMMC performance is somewhat impaired relative to other solutions since blocks must be read up to 512 bytes at a time to read a cache line - as David mentioned. Some optimization can be had with the SDXMMC driver so that it uses smaller and more cache lines, but it may not make enough difference for the solution to be faster than a comparable single bit SPI Flash solution. Allowing use of smaller/more cache lines would mean a different driver version which would use a counter clock to "seek" the flash block at highest speed to the right position for collecting the smaller cache line. As it stands SDXMMC performance is probably ok for most "business logic" programs with the help of some HUBTEXT functions, but there are better ways to do it if folks have the pins and patience. Yes, SDXMMC is very useful, but it is not an end all solution for every need.

Does that help?

Thanks,
--Steve

David Betz · 2012-05-01 09:12

jazzed wrote: »

The main drawbacks to SDXMMC are: 1) any global program data is generally limited to HUB RAM

This is correct but it also applies to any XMMC solution including a single bit SPI flash. The only way around this is to use an XMM driver that puts data in external memory like the one for the C3 but unfortunately performance is further degraded using external data. Also, even with an XMM driver, your stack must still fit in hub memory. We don't currently have a memory model that places the stack in external memory.

David Betz · 2012-05-01 09:58

Dr_Acula wrote: »

And... in a general sense, I think that the SD cache driver is a solution that works for so many other boards. Indeed, I might go so far as saying that it may render pretty much all the external memory XMM program solutions obsolete.

All you have to do is tell the program which pins you are using for your SD card.

I have another comment about this. You mention that one of the appealing things about the SD cache driver is the fact that you can use it by just setting a few pin numbers. We also have a SPI flash cache driver that has a similar ability. You just set a couple of cache driver parameters to indicate the MISO, MOSI, and CLK pins and then set a few more parameters to indicate how CS is handled for your SPI flash chip. It should work with any SPI flash chip that is compatible with the Atmel AT26DF081A chip. I think that includes chips from Winbond and SST.

Dr_Acula · 2012-05-01 16:10

Thanks for the explanation re the config files. That all makes sense now.

The main drawbacks to SDXMMC are: 1) any global program data is generally limited to HUB RAM, and 2) SDXMMC performance.

Re XMM on the SD card, I'm starting to write code that loads and reloads cogs many times a second and I think that opens up the possibility of moving more speed critical code into assembly.

The other thing that is available on the touchblade is one megabyte of parallel access ram. It is reading one word at a time and so has to be at least 16x faster than any single pin serial solution. It is currently being used for pictures and fonts but there is no reason it can't be used for global program memory data too. In spin for instance, I have reserved 64k of that ram for a text buffer and I think that can become the core of a text editor program.

GCC opens up a whole lot of possibilities here.

Re David's post #70 comment, I think that can tie in nicely with jazzed's hardware concept of having groups of propeller pins with a formal handover between groups where all pins go HiZ (with weak pullups) at the transition. With a 74HC137 chip this gives you hundreds of propeller pins. All you have to then do is make sure the software formally relinquishes pins when it is finished. So if you have a flash cache, fine, just grab 4 spare pins (there are always spare pins when you have hundreds available), take the group pin output from the 137, run it through a logic OR gate hc32 with the relevant propeller pin and use that to control the /CS pin on your spi flash cache. From the code perspective, you have to remember to "close" the SPI device when you have finished with it. But that is no different to remembering to close files, and is just one line of code to add eg spi_flash_close();

So you could have many places you can store data. SD card. SPI flash. Serial and parallel ram.

The drivers for these sorts of things don't need to be "internal" to GCC. They can be released as C code you can paste into your program, or turned into .h files once they are stable. The parallel ram driver for instance for the touch161 is not going to need any GCC modifications. In fact, all the touch161 config file is going to be is the propboe file but with different pins for the SD card.

The other thing I just thought of is that with the availability of many pins, you can create many SPI ports. I'm thinking ADC, DAC, SPI to Parallel, all the real world interface things. If each one of those came with a little C driver (some of which already have been done) it makes coding in C much easier.

Many many exciting possibilities here!

pedward · 2012-05-01 16:37

Is there a good benchmarking demo that would clearly demonstrate the speed of the different XMM options?

pedward · 2012-05-01 16:44

I'm thinking of designing a "flip" board for the PropKey. One side of the board will have the pattern for 1 or 2 qSPI flash chips, for ~16MB of flash, the other side will have SPI RAM chips to offer something like 256K of 8bit SPI memory. You decide which side to populate; which side is "up" determines the pinout and which headers are soldered.

It is conceivable that you could populate both sides and "flip" the board to do different things -- or you can put headers on both sides and have Flash on one set of pins and RAM on another for XMM-SPLIT.

jazzed · 2012-05-01 22:31

pedward wrote: »

Is there a good benchmarking demo that would clearly demonstrate the speed of the different XMM options?

Here are some FIBO results run with SimpleIDE. With FCACHE turned off, you can get some estimate of relative performance for small loops. Of course this is somewhat deceiving. The Dhrystone test shows a different picture. Real world programs can behave very differently from both.

Memory Model  Board Type    FCACHE    Fibo(24) Time ms  Fibo(0..24) Accumulated Clock Ticks
COG           C3F (HUB)     N/A       185               14863316
LMM           C3F (HUB)     Enabled   354               28351824
LMM           C3F (HUB)     Disabled  660               52817632
XMMC          C3F           Disabled  2595              207669984
XMMC          C3            Disabled  2595              207669808
XMM-SINGLE    C3            Disabled  2925              234078432
XMM-SPLIT     C3            Disabled  2925              23407843

Memory Model  Board Type    FCACHE    Fibo(24) Time ms  Fibo(0..24) Accumulated Clock Ticks
COG           SDRAM (HUB)   N/A       185               14863316
LMM           SDRAM (HUB)   Enabled   354               28351824
LMM           SDRAM (HUB)   Disabled  660               52817632
XMMC          SDRAM-SDXMMC  Disabled  2265              181260848
XMMC          SDRAM         Disabled  2889              231179456
XMM-SINGLE    SDRAM         Disabled  2888              231090320
XMM-SPLIT     SDRAM         N/A       N/A               N/A

Dhrystone is a good benchmark. Unfortunately, the SimpleIDE is not designed to produce 2 objects from a single C source file, so I won't post an example of that.

We can run this dry.c benchmark using Propeller-GCC and make.

All Dhrystone results gathered from 80MHz C3 and C3F board types.
Compiled with: propeller-elf-gcc -O2 -mfcache -Dprintf=__simple_printf -DMSC_CLOCK -DINTEGER_ONLY -DFIXED_NUMBER_OF_PASSES=2000 ... pass 1 and pass 2 .o files linked to dry.elf.

Here are some summary Dhrystones per second results for comparison:

Memory Model  Board Type    Dhrystones/Second
LMM           C3 (HUB)      6983
XMMC          C3F           1422
XMMC          C3            264
XMMC          C3-SDXMMC     176
XMMC-SINGLE   C3            138
XMMC-SPLIT    C3            104

Memory Model  Board Type    Dhrystones/Second
LMM           SSF (HUB)     6983
XMMC          SSF           1256

Memory Model  Board Type    Dhrystones/Second
LMM           SDRAM (HUB)   6983
XMMC          SDRAM         1207
XMMC-SINGLE   SDRAM         640
XMMC          SDRAM-SDXMMC  180
XMMC-SPLIT    SDRAM         N/A

Here is another performance table that shows some very different results and makes broader comparisons. It compares performance of different solutions using a fairly small transpose program that doesn't reload cache much and a larger program that uses sprintf which results in more cache line reloads. The smaller the results the better.

Item        Spin    LMM  FLASH XMMC  EEPROM XMMC  SSF XMMC  SSF2 XMMC  SDRAM XMMC  SDCARD XMMC
Transpose   64,400  887  902         904          902       902        902         903
sprintf     1,351   531  27,386      10,046       3,515     3,415      3,886       34,262
CacheLines  N/A     N/A  32          128          64        128        64          16
LineSize    N/A     N/A  128         64           128       64         32          512
CacheSize   N/A     N/A  4KB         8KB          8KB       8KB        2KB         8KB

Note that SDCARD XMMC has the worst performance for sprintf, which causes many cache misses, and the SSF variants have the best among the XMMC options.
This table is consistent with what can be seen in applications like David Betz' EBASIC program.

jazzed · 2012-05-01 22:36

Dr_Acula wrote: »

...

The other thing I just thought of is that with the availability of many pins, you can create many SPI ports. I'm thinking ADC, DAC, SPI to Parallel, all the real world interface things. If each one of those came with a little C driver (some of which already have been done) it makes coding in C much easier.

Many many exciting possibilities here!

Interesting reading. I'll help however I can. Seems like you've solved the pins problem ... hope that's working out A-OK.

David Betz · 2012-05-02 03:28

Dr_Acula wrote: »

Thanks for the explanation re the config files. That all makes sense now.

Re XMM on the SD card, I'm starting to write code that loads and reloads cogs many times a second and I think that opens up the possibility of moving more speed critical code into assembly.

The other thing that is available on the touchblade is one megabyte of parallel access ram. It is reading one word at a time and so has to be at least 16x faster than any single pin serial solution. It is currently being used for pictures and fonts but there is no reason it can't be used for global program memory data too. In spin for instance, I have reserved 64k of that ram for a text buffer and I think that can become the core of a text editor program.

GCC opens up a whole lot of possibilities here.

Re David's post #70 comment, I think that can tie in nicely with jazzed's hardware concept of having groups of propeller pins with a formal handover between groups where all pins go HiZ (with weak pullups) at the transition. With a 74HC137 chip this gives you hundreds of propeller pins. All you have to then do is make sure the software formally relinquishes pins when it is finished. So if you have a flash cache, fine, just grab 4 spare pins (there are always spare pins when you have hundreds available), take the group pin output from the 137, run it through a logic OR gate hc32 with the relevant propeller pin and use that to control the /CS pin on your spi flash cache. From the code perspective, you have to remember to "close" the SPI device when you have finished with it. But that is no different to remembering to close files, and is just one line of code to add eg spi_flash_close();

So you could have many places you can store data. SD card. SPI flash. Serial and parallel ram.

The drivers for these sorts of things don't need to be "internal" to GCC. They can be released as C code you can paste into your program, or turned into .h files once they are stable. The parallel ram driver for instance for the touch161 is not going to need any GCC modifications. In fact, all the touch161 config file is going to be is the propboe file but with different pins for the SD card.

The other thing I just thought of is that with the availability of many pins, you can create many SPI ports. I'm thinking ADC, DAC, SPI to Parallel, all the real world interface things. If each one of those came with a little C driver (some of which already have been done) it makes coding in C much easier.

Many many exciting possibilities here!

How do you get hundreds of pins using a 3-8 demultiplexer?

Dr_Acula · 2012-05-02 04:56

Oh ok, not that many. 8 groups of 21 pins = 168, as some pins are always connected, eg 4 for SD, 2 for eeprom, 2 for download, 2 for audio, and 1 to control the 137.

Of course, you can't control all the pins at the same time. But you can select a group and latch out to a latch, or talk to an SPI port, and then use those same pins in another group for different things. I'm talking to external ram, and then in a different group, using the same pins to talk to the touch part of the touch screen via an SPI port.

The rule is that each group of 21 pins must be isolated from other groups. So if pins are outputs, use an OR gate with that group number. And if an input, isolate with a 244 or a 4016 etc. Sometimes you don't need any logic chips though - eg if an SPI device has the /CS pin connected to that group control (ie the output of the 137) then when the group is deselected (high) all the pins on that SPI device go HiZ.

David Betz · 2012-05-02 06:40

Dr_Acula wrote: »

Oh ok, not that many. 8 groups of 21 pins = 168, as some pins are always connected, eg 4 for SD, 2 for eeprom, 2 for download, 2 for audio, and 1 to control the 137.

Of course, you can't control all the pins at the same time. But you can select a group and latch out to a latch, or talk to an SPI port, and then use those same pins in another group for different things. I'm talking to external ram, and then in a different group, using the same pins to talk to the touch part of the touch screen via an SPI port.

The rule is that each group of 21 pins must be isolated from other groups. So if pins are outputs, use an OR gate with that group number. And if an input, isolate with a 244 or a 4016 etc. Sometimes you don't need any logic chips though - eg if an SPI device has the /CS pin connected to that group control (ie the output of the 137) then when the group is deselected (high) all the pins on that SPI device go HiZ.

I'm a software guy so bear with me. How are the 137's wired to achieve a 21 pin group?

Dr_Acula · 2012-05-02 07:30

The 137 is a 3 to 8 decoder with a latch. One of the 8 outputs is low at any one time. Y0-Y7

If Y0 is low then that designates group0. Anything that talks to the 21 propeller pins in group0 is only allowed to do so if Y0 is low. So you need some extra logic in some circumstances. Outputs can only go out if Y0 is low and the relevant prop pin is low. Any inputs coming in are only allowed to get through if Y0 is low.

Initially this seems like you are going to need a lot of extra chips. But many times, whole groups of pins can be done without any extra logic. An example is the ram chip. These are group1, ie Y1 output from the 137, and that goes to the /CS line on the ram chips. So the ram chips will ignore any inputs to any other pins when Y1 is high, and also all memory pins are in HiZ. So there is no contention with other groups. Ditto the ILI9325 display - the whole display can be disabled with just one /CS pin.

Dr_Acula · 2012-05-11 00:38

I think averagejoe and I have finalised on a board design where the SD card is not going to change pins any more. It is pins 24-27 in the order below and there are no group selects or anything complicated on those pins. Just 4 prop pins direct to the SD card.

I'm using a file called TOUCH161.CFG which is this

# [propboe]
# IDE:SDXMMC
    clkfreq: 80000000
    clkmode: XTAL1+PLL16X
    baudrate: 115200
    rxpin: 31
    txpin: 30
    cache-driver: eeprom_cache.dat
    cache-size: 8K
    cache-param1: 0
    cache-param2: 0
    eeprom-first: TRUE
    sd-driver: sd_driver.dat
    sdspi-do: 24
    sdspi-clk: 25
    sdspi-di: 26
    sdspi-cs: 27

And so begins porting over the touchscreen code.

The first thing that is *very* nice compared to C89: binary literals with the 0b prefix work (a GCC extension), so there is no need to translate everything into hex

  int binarynumber;
  binarynumber = 0b11111110; // binary value
  printf("%X\n",binarynumber); // print hex value in uppercase (%x would print fe)

which prints FE as expected.

average joe · 2012-05-13 23:44

Wow, you guys have been working on this for a while! Why am I always so late to the party???

So now that I have a couple of boards somewhat working, I've been REALLY wanting to get C running. Looks like everything I need to get started is here, just need to run through things. I am wondering though, as a NOOB to C, if DOC could post some stuff that is already working on the new board? Also, any hints about getting up to speed as quickly as possible? I've programmed in several different languages so C shouldn't be a huge problem. The board design is quite flexible and I can't wait to start playing around.

A very noob question, how hard would it be to port C code from a different micro-controller *pic18f* and different hardware? I don't expect it to be a carbon copy, just looking for an easy way to get open-source code chunks to handle complex functions, *such as an Arpeggiator or Real-time Pattern Sequencer, etc?*

*edited*
ER, well I did some research and I'm STILL looking for the FULL source. But in theory it should be possible! It seems the toolchain for the project I want to port turns out to be GCC! There are hints about compiling for PC, so it should work!

jazzed · 2012-05-14 09:36

average joe wrote: »

A very noob question, how hard would it be to port C code from a different micro-controller *pic18f* and different hardware? I don't expect it to be a carbon copy, just looking for an easy way to get open-source code chunks to handle complex functions, *such as an Arpeggiator or Real-time Pattern Sequencer, etc?*

Propeller GCC supports strict ANSI C (C89) and the GNU variant of C99.

Whatever you find for PIC should work assuming:

1. type widths are ported - type "int" will have different width
2. any hardware specific device interfaces are ported

For XMM a cache_interface.spin compatible PASM driver will be needed.

average joe · 2012-05-21 11:21

Now that the SD card filesystem is working in XMMC and LMM, it appears we can continue work on porting the hardware over. The most obvious issue to me is the necessary change in data bus width. Since we are using 2 SRAM chips in parallel, we can get *and put* 2 bytes at a time to SRAM. Also, we will need to change the addressing methods to use the counters. Not sure about where the code to control the display should go since it's included in the PASM RamDriver. I believe I have condensed down the "core" object in SPIN and PASM to:

For ILI:
Program:   3,897 Longs
Variable:    290 Longs
For SSD:                   'with glitch fix extension 
Program:   3,909 Longs
Variable:    290 Longs

These are old portions, I need to look at the UPDATED SRAM filesystem.
I read the cache files here : http://forums.parallax.com/showthread.php?140010-Files&p=1099520&viewfull=1#post1099520 I'm lost as to what to do next. I want to help out, but the smart guys need to point me in the right direction, so I can idiot savant my way through. LOL!

David Betz · 2012-05-21 12:57

average joe wrote: »
Now that the SD card filesystem is working in XMMC and LMM, it appears we can continue work on porting the hardware over. The most obvious issue to me is the dataBus width change necessary. Since we are using 2 SRAM chips in parallel, we can get *and put* 2 bytes at a time to SRAM. Also, we will need to change the addressing methods to use the counters. Not sure about where the code to control the display should go since it's included in the PASM RamDriver. I believe I have condensed down the "core" object in SPIN and PASM to
For ILI:
Program:   3,897 Longs
Variable:    290 Longs
For SSD:                   'with glitch fix extension 
Program:   3,909 Longs
Variable:    290 Longs
These are old portions, I need to look at the UPDATED SRAM filesystem.
I read the cache files here : http://forums.parallax.com/showthread.php?140010-Files&p=1099520&viewfull=1#post1099520 I'm lost as to what to do next. I want to help out, but the smart guys need to point me in the right direction, so I can idiot savant my way through. LOL!

Can you describe your hardware so we know what kind of interface you're using? To make your own cache driver, you just need to fill in the attached cache driver skeleton. Mostly, you need to write functions to read and write one cache line.

average joe · 2012-05-21 13:06

The schematic is posted here: http://forums.parallax.com/showthread.php?137266-Propeller-GUI-touchscreen-and-full-color-display&p=1095915&viewfull=1#post1095915

It's basically 2 SRAM chips in parallel, with a 16 bit data bus and a 19 bit address bus. It is similar to the dracblade except that it has 2 chips and is addressed by 161's. Doc posted quite a bit of the code previously, but here's what we've been using for the ramdriver:

CON
'' Modified code from Cluso's triblade
'' commands to move blocks of data to the ILI9325 touchscreen display
' DoCmd(command_, hub_address, ram_address, block_length)
' I - initialise     
' S - Move data from hub to ram
' T - Move data from ram to hub
' U - Move data from ram to display
' V - Hub to display
' W - not working - writecom in pasm
' E - convert from .raw RGB to two byte ILI format RRRRRGGG_GGG_BBBBB
' F - convert from .bmp BGR format to two byte ILI format
' X - merge icon and background based on a mask
' Y - Change 137 output Returns P0-P20 and P22 in HiZ. Pass hubaddrs
' Z - Set 161 pins. Returns in group 1

VAR

' communication params(5) between cog driver code - only "command" and "errx" are modified by the driver
   long  command, hubaddrs, ramaddrs, blocklen, errx, cog ' rendezvous between spin and assembly (can be used cog to cog)
'        command  = A to Z etc =0 when operation completed by cog
'        hubaddrs = hub address for data buffer
'        ramaddrs = ram address for data
'        blocklen = ram buffer length for data transfer
'        errx     = returns =0 (false=good), else <>0 (true & error code)
'        cog      = cog no of driver (set by spin start routine)
   

DAT
'' +-----------------------------------------------------------------------------------------------+
'' | Touchblade 161 Ram Driver (with grateful acknowledgements to Cluso and Average Joe)           |
'' +-----------------------------------------------------------------------------------------------+
                        org     0
tbp2_start    ' setup the pointers to the hub command interface (saves execution time later)
                                      '  +-- These instructions are overwritten as variables after start
comptr                  mov     comptr, par     ' -|  hub pointer to command                
hubptr                  mov     hubptr, par     '  |  hub pointer to hub address            
ramptr                  add     hubptr, #4      '  |  hub pointer to ram address            
lenptr                  mov     ramptr, par     '  |  hub pointer to length                 
errptr                  add     ramptr, #8      '  |  hub pointer to error status           
cmd                     mov     lenptr, par     '  |  command  I/R/W/G/P/Q                  
hubaddr                 add     lenptr, #12     '  |  hub address                           
ramaddr                 mov     errptr, par     '  |  ram address                           
len                     add     errptr, #16     '  |  length                                
err                     nop                     ' -+  error status returned (=0=false=good) 


' Initialise hardware tristates everything and read/write set the pins
init                    mov     err, #0                  ' reset err=false=good
                        'mov     dira,zero                ' tristate the pins with the cog dira
                        and     dira,maskP0P20P22       ' tristates all the common pins

done                    wrlong  err, errptr             ' status  =0=false=good, else error x
                        wrlong  zero, comptr            ' command =0 (done)
' wait for a command (pause short time to reduce power)
pause
'                        mov     ctr, delay      wz      ' if =0 no pause
'              if_nz     add     ctr, cnt
'              if_nz     waitcnt ctr, #0                 ' wait for a short time (reduces power)
                        rdlong  cmd, comptr     wz      ' command ?
              if_z      jmp     #pause                  ' not yet
' decode command
                        cmp     cmd, #"S"       wz      ' hub to ram
              if_z      jmp     #pasmhubtoram           
                        cmp     cmd, #"T"       wz      ' ram to hub
              if_z      jmp     #pasmramtohub
                        cmp     cmd, #"U"       wz      ' ram to display
              if_z      jmp     #pasmramtodisplay
                        cmp     cmd, #"V"       wz      ' hub to display
              if_z      jmp     #pasmhubtodisplay           
                        cmp     cmd, #"E"       wz      ' convert 3 byte .raw format to 2 byte .ili format - hub to hub
              if_z      jmp     #rawtoiliformat
                        cmp     cmd, #"F"       wz      ' convert 3 byte .bmp format BGR to 2 byte ili format (same as E but order reversed)
              if_z      jmp     #bmptoiliformat              
 '                       cmp     cmd, #"W"       wz      ' lcdwritecom in pasm, not working
 '             if_z      jmp     #pasmlcdwritecom
                        cmp     cmd, #"X"       wz      ' merge icon and background based on a mask
              if_z      jmp     #mergeicons
                        cmp     cmd, #"Y"       wz      ' change the 137 output
              if_z      jmp     #changegroup
                        cmp     cmd, #"Z"       wz      ' set the 161 counters
              if_z      jmp     #set161          
                        cmp     cmd, #"I"       wz      ' init
              if_z      jmp     #init     
                        mov     err, cmd                ' error = cmd (unknown command)
                        jmp     #done
                        
' ----------------- common routines -------------------------------------

get_values              rdlong  hubaddr, hubptr         ' get hub address
                        rdlong  ramaddr, ramptr         ' get ram address
                        rdlong  len, lenptr             ' get length
                        mov     err, #5                 ' err=5
get_values_ret          ret

           ' Pass pasm_n = 0-7. Comes to this with P0-P20 and P22 tristated and returns them the same way
set137                  or      dira,maskP22            ' pin 22 is an output
                        andn    outa,maskP22            ' set P22low so Y0-Y7 are all high
                        or      dira,maskP0P20          ' pins P0-P20 are outputs
                        and     outa,maskP0P2low        ' set these 3 pins low
                        or      outa,pasm_n             ' set the 137 pins
                        or      outa,maskP22            ' pin 22 high
set137_ret              ret                             ' return                        


load161pasm                                             ' uses ramaddr
                        or      outa,maskP0P20          ' set P0-P20 high     
                        or      dira,maskP0P20          ' output pins 0-20
                        mov     pasm_n,#0               ' group 0
                        call    #set137                 ' set the 137 output
                        and     outa,maskP0P18low       ' pins 0-18 set low
                        or      outa,ramaddr            ' output address to 161 chips
                        or      outa,maskP19            ' clock high
                        or      outa,maskP20            ' load high
                        andn    outa,maskP19            ' clock low
                        andn    outa,maskP20            ' load low
                        or      outa,maskP19            ' clock high
                        or      outa,maskP20            ' load high 
load161pasm_ret         ret

stop                   jmp     #stop                  ' for debugging

memorytransfer          or      dira,maskP16P20         ' so /wr and other pins definitely high
                        or      outa,maskP16P20
                        mov     pasm_n,#1               ' back to group 1 for memory transfer
                        call    #set137                 ' as next routine will always be group 1
                        or      dira,maskP16P20         ' output pins 16-20
                        or      outa,maskP16P20         ' set P16-P20 high (P0-P15 set as inputs or outputs in the calling routine)
memorytransfer_ret      ret

busoutput               or      dira,maskP0P15          ' set prop pins 0-15 as outputs
busoutput_ret           ret

businput                and     dira,maskP16P31         ' set P0-P15 as inputs
businput_ret            ret  

delaynop                nop
                        nop
                        nop
                        nop
delaynop_ret            ret


' ------------------ single letter commands  -------------------------------------
 
' command S
pasmhubtoram            call    #get_values             ' get hubaddr,ramaddr,len
                        call    #load161pasm            ' load the 161 counters with ramaddr
                        call    #memorytransfer         ' set to group 1, enable P16-P20 as outputs and set P16-P20 high
                        call    #busoutput              ' set prop pins P0-P15 as outputs
hubtoram_loop           and     outa,maskP16P31         '%11111111_11111111_00000000_00000000       ' clear for output                   
                        rdword  data_16,hubaddr         ' get the word from hub
                        and     data_16,maskP0P15       ' mask to a word only
                        or      outa,data_16            ' send out the word to P0-P15
                        andn    outa,maskP17            ' set mem write low
                        add     hubaddr,#2              ' increment by 2 bytes = 1 word; placed here to give a small delay while the write completes
                        or      outa,maskP17            ' mem write high
                        andn    outa,maskP19            ' clock 161 low
                        or      outa,maskP19            ' clock 161 high
                        djnz    len,#hubtoram_loop      ' loop this many times
                        jmp     #init                   ' tristate pins and listen for commands

' command T
pasmramtohub            call    #get_values             ' get hubaddr,ramaddr,len
                        call    #load161pasm            ' load the 161 counters with ramaddr
                        call    #memorytransfer         ' set to group 1, enable P16-P20 as outputs and set P16-P20 high                             
                        call    #businput               ' set prop pins P0-P15 as inputs
                        andn    outa,maskP16            ' memory /rd low
ramtohub_loop           mov     data_16,ina             ' get the data
                        wrword  data_16,hubaddr         ' move data to hub
                        andn    outa,maskP19            ' clock 161 low
                        or      outa,maskP19            ' clock 161 high
                        add     hubaddr,#2              ' increment the hub address 
                        djnz    len,#ramtohub_loop
                        or      outa,maskP16            ' memory /rd high  
                        or      dira,maskP0P15          ' %00000000_00000000_11111111_11111111 restore P0-P15 as outputs
                        jmp     #init                   ' tristate pins and listen for commands

' command U
pasmramtodisplay        call    #get_values             ' get hubaddr,ramaddr,len
                        call    #load161pasm            ' load the 161 counters with ramaddr
                        call    #memorytransfer         ' set to group 1, enable P16-P20 as outputs and set P16-P20 high                             
                        call    #businput               ' set prop pins 0-15 as inputs so doesn't interfere with the transfer
                        or      outa,maskP18            ' ILI_RS high
                        andn    outa,maskP16            ' memory /rd low  
ramtodisplay_loop       andn    outa,maskP20            ' ILI write low
                        or      outa,maskP20            ' ILI write high
                        andn    outa,maskP19            ' clock 161 low
                        or      outa,maskP19            ' clock 161 high
                        djnz    len,#ramtodisplay_loop
                        or      outa,maskP16            ' memory /rd high  
                        or      dira,maskP0P15          ' %00000000_00000000_11111111_11111111 restore P0-P15 as outputs
                        jmp     #init

' command V
pasmhubtodisplay        call    #get_values             ' get hubaddr,ramaddr,len
                        call    #memorytransfer         ' set to group 1, enable P16-P20 as outputs and set P16-P20 high                             
                        call    #busoutput              ' set P0-P15 as outputs
hubtodisplay_loop       and     outa,maskP16P31         '%11111111_11111111_00000000_00000000       ' clear for output                   
                        rdword  data_16,hubaddr         ' get the word from hub
                        and     data_16,maskP0P15       ' mask to a word only
                        or      outa,data_16            ' send out the word to P0-P15
                        andn    outa,maskP20            ' ILI write low
                        or      outa,maskP20            ' ILI write high
                        add     hubaddr,#2              ' one word
                        djnz    len,#hubtodisplay_loop
                        jmp     #init

'command E
RawtoILIformat          ' takes a .raw 3 byte RRRRRRRR GGGGGGGG BBBBBBBB and converts to 2 byte RRRRRGGG GGGBBBBB
                        ' pass hubaddr, ramaddr and len
                        ' hubaddr is source location, len is number of pixels
                        ' ramaddr is destination in hub (messy naming) and length is 2/3 of blocklength
                        call    #get_values ' gets hubaddr, ramaddr and len (ramaddr is the hub destination here)
rawloop
                        rdbyte red,hubaddr
                        add hubaddr,#1
                        rdbyte green,hubaddr
                        add hubaddr,#1
                        rdbyte blue,hubaddr
                        add hubaddr,#1
                        call #rgbtoili
                        wrbyte ililow,ramaddr
                        add ramaddr,#1
                        wrbyte ilihigh,ramaddr
                        add ramaddr,#1
                        djnz    len,#rawloop            ' loop until done 
                        jmp     #init                   ' set pins to tristate

RGBtoILI                ' pass red,green, blue, returns ililow and ilihigh
                        shr     red,#3                  ' 000RRRRR 
                        shl     red,#3                  ' RRRRR000 
                        shr     green,#2                ' 00GGGGGG
                        mov     ilihigh,green           ' ilihigh = 00GGGGGG
                        shr     ilihigh,#3              ' ilihigh = 00000GGG
                        or      ilihigh,red             ' ilihigh = RRRRRGGG
                        and     green,#%00000111        ' 00000GGG
                        shl     green,#5                ' GGG00000
                        mov     ililow,green            ' ililow = GGG00000
                        shr     blue,#3                 ' blue = 000BBBBB
                        or      ililow,blue             ' ililow = GGGBBBBB
RGBtoILI_ret            ret

BMPtoILIformat          ' takes a .bmp 3 byte BBBBBBBB GGGGGGGG RRRRRRRR and converts to 2 byte RRRRRGGG GGGBBBBB
                        ' same as E above but BGR instead of RGB
                        ' pass hubaddr, ramaddr and len
                        ' hubaddr is source location, len is number of pixels
                        ' ramaddr is destination in hub (messy naming) and length is 2/3 of blocklength
                        call    #get_values ' gets hubaddr, ramaddr and len (ramaddr is the hub destination here)
bmploop
                        rdbyte blue,hubaddr
                        add hubaddr,#1
                        rdbyte green,hubaddr
                        add hubaddr,#1
                        rdbyte red,hubaddr
                        add hubaddr,#1
                        call #rgbtoili
                        wrbyte ililow,ramaddr
                        add ramaddr,#1
                        wrbyte ilihigh,ramaddr
                        add ramaddr,#1
                        djnz    len,#bmploop            ' loop until done 
                        jmp     #init                   ' set pins to tristate
' **** command X *********************

MergeIcons              call    #get_values ' gets hubaddress, ramaddress,len which are used here as background,icon,mask
                        mov     pasm_n,#59               ' do a single row
mergeiconsloop          rdbyte  ililow,len                 ' reuse ililow, so this is rdbyte mask,maskcounter
                        and     ililow,#%11111             ' keep the low 5 bits and use just the blue as this is a grayscale bitmap
                        rdword  red,hubaddr              ' reuse red, so actually this is rdword background,backgroundcounter
                        cmp     ililow,#%10000   wc       ' C set if mask < 16 of 31 (darker than mid-level gray), so skip
              if_c      jmp     #mergeskip
                        rdword  green,ramaddr            ' reuse green, so this is rdword iconpixel, iconpixelcounter 
                        wrword  green,hubaddr            ' if replace, then move icon pixel to the background     
mergeskip               add     hubaddr,#2
                        add     ramaddr,#2
                        add     len,#2
                        djnz    pasm_n,#mergeiconsloop            ' loop until done 
                        jmp     #init                   'set pins to tristate 

' *** command Y **********
changegroup             call    #get_values             ' gets hubaddr, ramaddr, len; hubaddr carries the group number (0-7)
                        mov     pasm_n,hubaddr          ' pass hubaddr
                        call    #set137                 ' change the group
                        jmp     #init

' *** command Z **********
set161                  call    #get_values             ' gets ramaddr = the 161 value to set
                        call    #load161pasm            ' ramaddr to the 161 chips
                        call    #memorytransfer         ' change to group 1
                        jmp     #init                        

                        

'pasmlcdwritecom         call    #get_values             ' use hubaddr as the data
'                        or      dira,maskP0P20          ' set these pins high (pass all pins tristated)
'                        or      outa,maskP0P20          '  set pins high
'                        mov     pasm_n,#2               '  mem transfer
'                        call    #set138                 ' set the 138
'                        andn    outa,maskP18            ' P18 ILIRS low
'                        and     outa,maskP16P31         ' set P0-P15 low
'                        or      outa,hubaddr            ' send out the data
'                        andn    outa,maskP17            ' ILI write low
'                        or      outa,maskP17            ' ILI write high
'                        jmp     #init                   ' set pins to tristate  

' variables
pasm_n                  long    0                                    ' general purpose value
data_16                 long    0                                    ' general purpose value
ililow                  long    0                                    ' low data byte 
ilihigh                 long    0                                    ' high data byte 
red                     long    0                                    ' red, green blue variables
green                   long    0
blue                    long    0           

' constants
Zero                    long    %00000000_00000000_00000000_00000000 ' used in several places
maskP0P2low             long    %11111111_11111111_11111111_11111000 ' P0-P2 low
maskP0P20               long    %00000000_00011111_11111111_11111111 ' P0-P18 enabled for output plus P19,P20    
maskP0P18low            long    %11111111_11111000_00000000_00000000 ' P0-P18 low
maskP16                 long    %00000000_00000001_00000000_00000000 ' pin 16
maskP17                 long    %00000000_00000010_00000000_00000000 ' pin 17
maskP18                 long    %00000000_00000100_00000000_00000000 ' pin 18
maskP19                 long    %00000000_00001000_00000000_00000000 ' pin 19
maskP20                 long    %00000000_00010000_00000000_00000000 ' pin 20
maskP22                 long    %00000000_01000000_00000000_00000000 ' pin 22
maskP16P31              long    %11111111_11111111_00000000_00000000 ' pin 16 to pin 31
maskP0P15               long    %00000000_00000000_11111111_11111111 ' for masking words
maskP16P20              long    %00000000_00011111_00000000_00000000
maskP0P20P22            long    %11111111_10100000_00000000_00000000 ' for returning all group pins HiZ
                        fit     496
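
For anyone following the RGBtoILI routine in the listing above, here is a small host-side C model of the same bit packing. It is only a sketch for checking values on a PC, not part of the driver. The names `ili_pixel`, `rgb_to_ili` and `raw_to_ili` are made up for illustration, and the low-byte-first output order simply mirrors the `wrbyte ililow` / `wrbyte ilihigh` sequence in `rawloop`.

```c
#include <assert.h>
#include <stdint.h>

/* Two-byte ILI-format pixel, fields named after the PASM variables. */
typedef struct {
    uint8_t ilihigh;   /* RRRRRGGG */
    uint8_t ililow;    /* GGGBBBBB */
} ili_pixel;

/* Keep the top 5 bits of red, top 6 of green and top 5 of blue. */
ili_pixel rgb_to_ili(uint8_t red, uint8_t green, uint8_t blue)
{
    ili_pixel p;
    uint8_t g6 = (uint8_t)(green >> 2);                       /* 00GGGGGG */
    p.ilihigh = (uint8_t)((red & 0xF8) | (g6 >> 3));          /* RRRRRGGG */
    p.ililow  = (uint8_t)(((g6 & 0x07) << 5) | (blue >> 3));  /* GGGBBBBB */
    return p;
}

/* Model of rawloop: 3 source bytes per pixel in, 2 bytes out, low byte first. */
void raw_to_ili(const uint8_t *src, uint8_t *dst, unsigned pixels)
{
    assert(src && dst);
    for (unsigned i = 0; i < pixels; i++) {
        ili_pixel p = rgb_to_ili(src[3 * i], src[3 * i + 1], src[3 * i + 2]);
        dst[2 * i]     = p.ililow;    /* wrbyte ililow comes first in the listing */
        dst[2 * i + 1] = p.ilihigh;
    }
}
```

For example, pure green (0, 255, 0) packs to ilihigh = $07 and ililow = $E0. The BMP variant would be the same call with the first and third source bytes swapped.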

I'll start looking at the skeleton file and see if I can decipher it! I'm sorry I'm asking you guys to "hold my hand" as I work through this!

David Betz · 2012-05-21 13:11

average joe wrote: »

The schematic is posted here: http://forums.parallax.com/showthread.php?137266-Propeller-GUI-touchscreen-and-full-color-display&p=1095915&viewfull=1#post1095915

It's basically 2 Sram chips in parallel. 16 bit data bus, 19 bit address bus. Similar to the dracblade except 2 chips and addressed by 161's. Doc posted quite a bit of the code previously, but here's what we've been using for the ramdriver:

I'll start looking at the skeleton file and see if I can decipher it! I'm sorry I'm asking you guys to "hold my hand" as I work through this!

You should be able to use your hubtoram and ramtohub functions to implement the cache line read/write and that should be about enough to get you going. Is this hardware available yet?
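
As a rough illustration of that idea, here is a host-side C sketch (not the PASM driver and not PropGCC code) of cache line read/write built on word-oriented transfers like commands S and T. The array `ext_ram` and the helper names `hub_to_ram`, `ram_to_hub`, `cache_line_write` and `cache_line_read` are hypothetical. It also assumes the cache hands over byte addresses and byte lengths while the transfer routines count 16-bit words, which would need checking against the real driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RAM_WORDS (1u << 19)          /* 19-bit address bus, 16-bit data bus */

static uint16_t ext_ram[RAM_WORDS];   /* stand-in for the two SRAM chips */

/* Model of command S: copy len 16-bit words from hub memory to external RAM. */
void hub_to_ram(const void *hub, uint32_t ramaddr, uint32_t len)
{
    /* len must be non-zero: the PASM loop ends in djnz, which wraps on 0 */
    assert(len > 0 && ramaddr < RAM_WORDS && len <= RAM_WORDS - ramaddr);
    memcpy(&ext_ram[ramaddr], hub, (size_t)len * sizeof(uint16_t));
}

/* Model of command T: copy len 16-bit words from external RAM to hub memory. */
void ram_to_hub(void *hub, uint32_t ramaddr, uint32_t len)
{
    assert(len > 0 && ramaddr < RAM_WORDS && len <= RAM_WORDS - ramaddr);
    memcpy(hub, &ext_ram[ramaddr], (size_t)len * sizeof(uint16_t));
}

/* Hypothetical cache hooks: convert byte units to word units, then transfer. */
void cache_line_write(const void *line, uint32_t byte_addr, uint32_t line_bytes)
{
    assert(byte_addr % 2 == 0 && line_bytes % 2 == 0);
    hub_to_ram(line, byte_addr / 2, line_bytes / 2);
}

void cache_line_read(void *line, uint32_t byte_addr, uint32_t line_bytes)
{
    assert(byte_addr % 2 == 0 && line_bytes % 2 == 0);
    ram_to_hub(line, byte_addr / 2, line_bytes / 2);
}
```

The only real work in the two hooks is the unit conversion. Everything else is already done by the existing transfer routines.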

jazzed · 2012-05-21 13:17

average joe wrote: »

Now that the SD card filesystem is working in XMMC and LMM, it appears we can continue work on porting the hardware over. ...

Basically you need to copy dracblade_cache.spin to dractouch_cache.spin and replace lines 162 through 254 with code that interfaces with the hardware.

The fundamental idea is to have BSTART set an address, and have BREAD/BWRITE do read/write based on the cache line length. This looks like a very simple change.
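
A minimal C model of that split, assuming BSTART corresponds to loading the 161 counters and BREAD/BWRITE clock the counters once per word. This is a host-side sketch only. The lowercase function names, the `sram` array and `counter161` are stand-ins invented here, and the real routines in dracblade_cache.spin may take different arguments.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_WORDS (1u << 19)         /* 19 address lines, one 16-bit word each */

static uint16_t sram[SRAM_WORDS];     /* stand-in for the external SRAM pair */
static uint32_t counter161;           /* stand-in for the 161 address counters */

/* BSTART: latch the starting word address into the counters. */
void bstart(uint32_t word_addr)
{
    assert(word_addr < SRAM_WORDS);
    counter161 = word_addr;
}

/* BREAD: copy `words` words to hub memory, clocking the counters after each. */
void bread(uint16_t *hub, uint32_t words)
{
    while (words--) {
        *hub++ = sram[counter161];
        counter161 = (counter161 + 1) & (SRAM_WORDS - 1);
    }
}

/* BWRITE: copy `words` words from hub memory, clocking the counters after each. */
void bwrite(const uint16_t *hub, uint32_t words)
{
    while (words--) {
        sram[counter161] = *hub++;
        counter161 = (counter161 + 1) & (SRAM_WORDS - 1);
    }
}
```

With this shape, one cache line transfer is a `bstart` followed by a single `bread` or `bwrite` of (line length in bytes) / 2 words.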

David Betz · 2012-05-21 13:18

jazzed wrote: »

Basically you need to copy dracblade_cache.spin to dractouch_cache.spin and replace lines 162 through 254 with code that interfaces with the hardware.

The fundamental idea is to have BSTART set an address, and have BREAD/BWRITE do read/write based on the cache line length. This looks like a very simple change.

Yeah but that's more or less like filling in the skeleton but with the added disadvantage that you may make mistakes excising the old code.

average joe · 2012-05-21 13:28

I think doc has the design pretty much nailed down. There is a 2-screen variant I'm eagerly waiting to get my hands on too! It should function similarly at the low level.
I'm not sure what you mean by "available." Dr. Acula and I have boards built and running. If you're asking about "distribution" spec boards, you'd have to check with Doc. I know we are "close" to release and I'm sure we could get boards to those interested in the future. I'm not sure how soon exactly.
These use the 40-pin touchscreen boards and I believe we will be supporting the SSD1289 controller. These are the displays I'm using. Doc is using an ili9325 right now but it sounds like he will be converting to the SSD soon. I'm not sure if you guys have any of these displays or similar floating around.
I've been looking at the skeleton cache file and I think I understand it, *sort of*. I'm sure I will have questions later.
Thanks again for all the help!

David Betz · 2012-05-21 13:30

average joe wrote: »

I think doc has the design pretty much nailed down. There is a 2-screen variant I'm eagerly waiting to get my hands on too! It should function similarly at the low level.
I'm not sure what you mean by "available." Dr. Acula and I have boards built and running. If you're asking about "distribution" spec boards, you'd have to check with Doc. I know we are "close" to release and I'm sure we could get boards to those interested in the future. I'm not sure how soon exactly.
These use the 40-pin touchscreen boards and I believe we will be supporting the SSD1289 controller. These are the displays I'm using. Doc is using an ili9325 right now but it sounds like he will be converting to the SSD soon. I'm not sure if you guys have any of these displays or similar floating around.
I've been looking at the skeleton cache file and I think I understand it, *sort of*. I'm sure I will have questions later.
Thanks again for all the help!

You don't need to understand the part at the start. It just takes a request from the XMM kernel and figures out whether the cache line is already in hub memory. If not, it will call your code to read it from external memory. If it needs to reuse a cache line that contains modified data it will call your code to write that cache line to external memory before reusing it. That's about all there is to it.
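
To make that flow concrete, here is a small host-side C sketch of a write-back cache lookup. It is an illustration under stated assumptions, not the PropGCC cache driver: the direct-mapped layout, the sizes, and every name in it (`cache_access`, `line_read`, `line_write`, `xmem`) are hypothetical, and the real driver's replacement policy may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 64u               /* hypothetical cache line size */
#define NUM_LINES  4u                /* hypothetical number of lines in hub memory */
#define XMEM_BYTES 4096u             /* small stand-in for external memory */

static uint8_t  xmem[XMEM_BYTES];
static unsigned reads_from_xmem, writes_to_xmem;   /* counters for the demo */

typedef struct {
    int      valid;
    int      dirty;
    uint32_t tag;                    /* external line number held in this slot */
    uint8_t  data[LINE_BYTES];
} cache_line;

static cache_line cache[NUM_LINES];

/* The two hooks a board-specific driver would supply. */
static void line_read(uint8_t *hub, uint32_t addr)
{
    memcpy(hub, &xmem[addr], LINE_BYTES);
    reads_from_xmem++;
}

static void line_write(const uint8_t *hub, uint32_t addr)
{
    memcpy(&xmem[addr], hub, LINE_BYTES);
    writes_to_xmem++;
}

/* Return a hub pointer for external address addr; mark the line dirty on writes. */
uint8_t *cache_access(uint32_t addr, int is_write)
{
    assert(addr < XMEM_BYTES);
    uint32_t line_no = addr / LINE_BYTES;
    cache_line *l = &cache[line_no % NUM_LINES];

    if (!l->valid || l->tag != line_no) {              /* miss */
        if (l->valid && l->dirty)
            line_write(l->data, l->tag * LINE_BYTES);  /* flush modified data first */
        line_read(l->data, line_no * LINE_BYTES);      /* then fetch the new line */
        l->valid = 1;
        l->dirty = 0;
        l->tag = line_no;
    }
    if (is_write)
        l->dirty = 1;
    return &l->data[addr % LINE_BYTES];
}
```

Only `line_read` and `line_write` touch external memory, which is why the board-specific part of a driver can stay small.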

jazzed · 2012-05-21 13:38

David Betz wrote: »

Yeah but that's more or less like filling in the skeleton but with the added disadvantage that you may make mistakes excising the old code.

I totally missed your earlier post about a skeleton driver. Ya, that's the right approach.
