Catalina 3.3

Dr_Acula · 2011-10-14 00:00

Great! I look forward to doing some speed tests. Thanks Ross.

RossH · 2011-10-14 00:50

Dr_Acula wrote: »

Great! I look forward to doing some speed tests. Thanks Ross.

Be aware that the cache is a great leveller. By the very nature of caching, the majority of accesses are done at the cache access speed (which is the same for all XMM implementations) not the speed the cache can access the backing store. So even if one XMM implementation is many times faster than another, the differences in the cache performance will be marginal.

You can artificially "accentuate" the difference by running a program such as the Catalina RAM Test program, which deliberately interleaves accesses across the whole address range (it does this to force as many page accesses as it can, so that any page access errors or marginal timing conditions will be found). That means this program will show a much larger difference than most real-world programs, which tend to access RAM much more sequentially.

Ross.

Cluso99 · 2011-10-14 02:15

Ross: I just downloaded Catalina 3.3 via the "one touch" link.

I fixed a couple of typos in the CodeBlocks Quickstart guide and attached it here. If I may make a suggestion, an example using the serial and PST (i.e. using the while(1) statement for those who may not understand how) could be in order.

Do you have a config for a simple prop with serial only (nothing else)?? I just thought I would start off with a simple nondescript prop board, as that is usually what I start with and use the serial/PST for debugging purposes. Your thoughts?

RossH · 2011-10-14 04:15

Cluso99 wrote: »

Ross: I just downloaded Catalina 3.3 via the "one touch" link.

I fixed a couple of typos in the CodeBlocks Quickstart guide and attached it here.

Where?

Cluso99 wrote: »

If I may make a suggestion, an example using the serial and PST (i.e. using the while(1) statement for those who may not understand how) could be in order.
Do you have a config for a simple prop with serial only (nothing else)?? I just thought I would start off with a simple nondescript prop board, as that is usually what I start with and use the serial/PST for debugging purposes. Your thoughts?

Perhaps I should expand the "basic" target to include the PC option (since that is suitable for all platforms) and include a better introductory example.

Ross.

graffix · 2011-10-14 04:56

Can you point me in the right direction? I am looking for more info and demos as to what catalina is all about.

"Catalina is a C compiler plus a set of C libraries and device drivers for use with the Parallax Propeller microcontroller. Catalina is a cross-compiler based on the retargetable C compiler "lcc". Catalina runs on Windows or Linux"
thanks

Cluso99 · 2011-10-14 05:11

Sorry Ross, will attach tomorrow. On XOOM atm. Yes a "basic" target would be ideal IMHO. It is certainly what I start with every time.

Rayman · 2011-10-14 05:39

Ross, Ok this looks good:

#ifdef DEBUG
  t_printf("here's where I am\n");
#endif 
//Then compile your program like so:
catalina hello_world.c -lc -W-DDEBUG

But this outputs on the regular display, right?
Is there a simple way to have it go instead out a serial connection?

I'm just trying to look for ways of debugging flash based programs, since (if I understand you correctly) BlackBox debugging doesn't yet work for code running from flash.

BTW: I'm starting writing things up now... Can I say that you plan to extend Catalina to Prop2? Is that on the table for you?

Rayman · 2011-10-14 05:44

graffix, I'm fairly new to Catalina too, but I'll give you some tips.
Assuming you have Windows, just download from this link:

A Windows "one touch" installer.

Then, run "Code::Blocks" and open up a "hello world" example workspace...

Your Windows "Start Menu" will also now have "Catalina" and you find documentation there...

Dr_Acula · 2011-10-14 06:02

Can you point me in the right direction? I am looking for more info and demos as to what catalina is all about.

Simple answer is that Catalina enables you to write programs in C on the propeller chip. I'm busy writing huge programs (200k) but maybe this is an opportune time to step back to something a lot simpler.

Cluso said

If I may make a suggestion, an example using the serial and PST (i.e. using the while(1) statement for those who may not understand how) could be in order.

Ross probably already has all this working, and written up somewhere, but if I wanted to build something on a breadboard I might start really simple and work through the following:

1) C program to flash an LED once a second. LED is on pin 1 of the prop. Program is downloaded to internal ram in the prop (no eeprom attached)
2) Attach an eeprom and download the same program to the eeprom. Turn it off, turn it on and the program is still there.
3) Download a C program to either ram or eeprom, and send back "Hello World" via the programming serial port (and display on a terminal program).
4) Add a few more lines of C code to receive data from a terminal program. When you press a character and send it to the board, the board replies with "you pressed character A"
5) Add a keyboard. Needs 4 resistors and a 5V regulator and mini DIN6 socket. Write a program in C to receive a character on the keyboard and send it up the serial port.
6) Add a VGA display. Add 8 resistors and a D15 socket. Write a program in C to display characters from the keyboard on the display.
7) Or, a TV display. Add 3 resistors and a RCA socket. Write a program in C to display characters from the keyboard on the display.
8) Add a mouse. 4 resistors and mini DIN6 socket. Read the mouse and display the co-ordinates on the screen.
9) Add an SD card. 4 resistors and an SD socket (or module from Futurlec etc). Write "Hello World" into MYPROG.TXT on the SD card.
10) Add external memory. Take the simplest external memory (?SPI) and add the cache and write a program that includes a huge array. Write then read a value from a location in that array that is at address 40,000.
11) Describe other external memory options that Catalina supports.

Can Catalina do all of these things?

If so, I am thinking this might be ready for an Instructable.

Rayman · 2011-10-14 08:15

Ross, there's 2 more things I don't quite understand. I did actually look through the documentation this time, but didn't see anything...

First: How can I tell if a program will or won't run in "small" mode with Flashpoint SuperQuad (2MB flash, no SRAM).
The "sst" example compiles and loads without errors, but the screen is garbage...
The build log gives these numbers:

code = 242048 bytes
cnst = 30976 bytes
init = 1116 bytes
data = 5540 bytes
file = 439176 bytes

I think I asked a similar question before, but I want to be sure... Does the sum, cnst+init+data+(a little extra for other stuff) need to be less than 32kB ?
Ok, you did already answer this, but:
Can you make it so that the compiler gives an error if cnst+init+data>32kB in "small" mode?
Also, is there some maximum extra required space, maxextra, you could use to give a warning if cnst+init+data+maxextra>32kB?
Any way to put the "cnst" part onto flash?

Second: I don't understand the purpose or function of the "set memory size" build option. The "sst" example seems to work fine on RamPage without this option set... The executable is slightly smaller though. Could something go horribly wrong without this being set? I remember that you told me the original size of 384kB was too small and I had to increase it to 512kB. Is this some kind of upper limit? Can't it get that info from the "XMM_RO_SIZE" variable?

Rayman · 2011-10-14 13:14

Ross, I kinda think having two windows to enter "Build Options" is going to cause people who don't read the instructions (like me) a lot of trouble...
It appears to me that the "Release" target build options are always shown when you open the "Build Options" window.
Wouldn't it be better to only have things selected there?
Actually, if you also select "Use Target Options Only" you'd be double sure not to make a mistake.
If this makes sense, maybe the demo projects and workspaces could be set this way on future releases of Catalina?

RossH · 2011-10-14 14:47

Rayman wrote: »
Ross, Ok this looks good:
#ifdef DEBUG
  t_printf("here's where I am\n");
#endif 
//Then compile your program like so:
catalina hello_world.c -lc -W-DDEBUG
But this outputs on the regular display, right?
Is there a simple way to have it go instead out a serial connection?

Ah! I see now - you want to use the serial output in addition to the normal output (TV/VGA etc). The easiest way to do this is to load an OBEX serial driver as a separate Spin object, and then interact with it. Examples of doing this are given in the demos\spinc folder (the examples show the use of the OBEX keyboard and TV objects, but could be easily modified to use the FullDuplexSerial object). However, your biggest problem is that doing this sucks up additional hub space, and will quite likely make your program too large to run anyway.

Rayman wrote: »

I'm just trying to look for ways of debugging flash based programs, since (if I understand you correctly) BlackBox debugging doesn't yet work for code running from flash.

Correct. I have a hack that makes it work, but I'm not happy with it and need to do some more work on it.

Rayman wrote: »

BTW: I'm starting writing things up now... Can I say that you plan to extend Catalina to Prop2? Is that on the table for you?

Hmmm. 6 months ago I would have said "yes". Now, I am a bit unwilling to promise anything since the Prop2 itself has been so delayed. Ask me again when there is a documented instruction set for the Prop2.

Ross.

RossH · 2011-10-14 14:52

Dr_Acula wrote: »

Simple answer is that Catalina enables you to write programs in C on the propeller chip. I'm busy writing huge programs (200k) but maybe this is an opportune time to step back to something a lot simpler.

...

Can Catalina do all of these things?

Of course. These are all either already done, quite easy to do, or described in the existing documentation. Are you suggesting I should write more documentation? I hesitate to do this since very few people even seem to read the documentation that already exists.

But if you want to have a go at it, be my guest!

Ross.

RossH · 2011-10-14 14:54

graffix wrote: »

Can you point me in the right direction? I am looking for more info and demos as to what catalina is all about.

"Catalina is a C compiler plus a set of C libraries and device drivers for use with the Parallax Propeller microcontroller. Catalina is a cross-compiler based on the retargetable C compiler "lcc". Catalina runs on Windows or Linux"
thanks

Hi graffix. Catalina is a stand-alone development system for the Propeller to allow you to develop programs in C rather than Spin. Are you familiar with C?

Ross.

RossH · 2011-10-14 14:56

Rayman wrote: »

Ross, I kinda think having two windows to enter "Build Options" is going to cause people who don't read the instructions (like me) a lot of trouble...
It appears to me that the "Release" target build options are always shown when you open the "Build Options" window.
Wouldn't it be better to only have things selected there?
Actually, if you also select "Use Target Options Only" you'd be double sure not to make a mistake.
If this makes sense, maybe the demo projects and workspaces could be set this way on future releases of Catalina?

Yes, this is the commonest mistake that people seem to make when using Code::Blocks. I'll see if I can simplify it further.

Ross.

Dr_Acula · 2011-10-14 15:23

Of course. These are all either already done, quite easy to do, or described in the existing documentation.

Great news! Ok, what I need to think about is the hardware side. I want to start off with something fairly non-threatening like a breadboard and build it up, starting with a propeller chip, xtal, a few caps and the download circuit.

I need to think about how to connect things to a breadboard. SD card is a bit tricky but futurlec make a nice adapter board like this http://www.futurlec.com.au/MiniBoards.jsp

Or these http://www.futurlec.com/Computer_Adapters.shtml

Need something similar for the mini 6 pin DIN sockets as they don't fit into a breadboard. I am thinking of a few tiny PCBs that take plugs/sockets and convert to SIL socket headers. Then use the flexible wire that comes with these kits http://www.ebay.com/itm/2p-2x400-pts-solderless-breadboard-w-75-pcs-jumper-wire-/260871538014?pt=LH_DefaultDomain_0&hash=item3cbd27c55e (search "jumper wires" if that auction has ended)

I'm thinking that maybe going straight to running C on a board with all the bells and whistles might be a little daunting, but if you start simple, then build up a breadboard as time and budget permit, a staged approach might work out better.

Your thoughts?

Cluso99 · 2011-10-14 15:30

Ross: The file is too big for the forum so I have emailed it to you.

Dr_Acula · 2011-10-14 16:11

WRT caching, I suppose it would depend on the program, but would ram be better than flash ram? i.e. http://ww1.microchip.com/downloads/en/DeviceDoc/22127a.pdf which are on ebay for about $5ea http://www.ebay.com.au/sch/i.html?_from=R40&_trksid=m570&_nkw=23k256&_sacat=See-All-Categories

And then thinking aloud, what advantages do you get from flash ram, compared with using an sd card?

And thinking further, $5 for 32k does seem a bit expensive compared with $3.25 for a 512k sram.

But the sram uses lots of pins. But speed of access is not so important when you have caching.

I wonder what you could do with some 74HC595 chips and a 512k sram. Ultra slow I know (clock in 32 bits for address plus data plus /rd and /wr). But does it matter so much with caching now available. And 595s plus 512ksram may be the cheapest ram solution that does not wear out? And you would free up a lot of pins.

RossH · 2011-10-14 18:53

Dr_Acula wrote: »

Great news! Ok, what I need to think about is the hardware side. I want to start off with something fairly non-threatening like a breadboard and build it up, starting with a propeller chip, xtal, a few caps and the download circuit.
...
I'm thinking that maybe going straight to running C on a board with all the bells and whistles might be a little daunting, but if you start simple, then build up a breadboard as time and budget permit, a staged approach might work out better.

Your thoughts?

Yes, good idea. But how many people start from a bare chip? Most people start with a board, so why not do the same - i.e. start with something like the Parallax QuickStart board? This would save a lot of bother, and Catalina already runs on that board (serial I/O only, but you could add simple external circuitry for keyboard, mouse, and either a TV or VGA output).

At one point, Parallax was giving these boards away if you had a good idea for using them. I don't know if that offer is still available, but why not ask them if they'll send you one?

Ross.

RossH · 2011-10-14 19:04

Dr_Acula wrote: »

WRT caching, I suppose it would depend on the program, but would ram be better than flash ram? i.e. http://ww1.microchip.com/downloads/en/DeviceDoc/22127a.pdf which are on ebay for about $5ea http://www.ebay.com.au/sch/i.html?_from=R40&_trksid=m570&_nkw=23k256&_sacat=See-All-Categories

Catalina supports serial RAM as well as serial FLASH. Many boards have both.

Dr_Acula wrote: »

And then thinking aloud, what advantages do you get from flash ram, compared with using an sd card?

No idea. May try it one day to find out.

Dr_Acula wrote: »

And thinking further, $5 for 32k does seem a bit expensive compared with $3.25 for a 512k sram.

But the sram uses lots of pins. But speed of access is not so important when you have caching.

Yes, it is still important. Parallel SRAM can be accessed much faster than cached serial RAM - even with the cache in use.

Dr_Acula wrote: »

I wonder what you could do with some 74HC595 chips and a 512k sram. Ultra slow I know (clock in 32 bits for address plus data plus /rd and /wr). But does it matter so much with caching now available. And 595s plus 512ksram may be the cheapest ram solution that does not wear out? And you would free up a lot of pins.

I don't think this would work out any cheaper than just buying SPI RAM in the first place.

Ross.

RossH · 2011-10-14 19:46

Dr_Acula wrote: »

I'm using TV and VGA about equally at the moment and Kyedos comes in both versions. Any chance of producing two versions of Catalyst for the dracblade - a TV version as well?

(Bog standard 40x13 TV driver, colors maybe white on blue, pins 16,17,18)

If you could, then this could be a really useful program for the dracblade. Thanks++

Hi Dr_A,

Replace the target\DracBlade_HMI.inc file with the one attached, and then recompile Catalyst from the Catalina Command Line using the commands:

cd catalyst
build_all DRACBLADE TV

Let me know how it goes.

Ross.

jazzed · 2011-10-14 19:49

RossH wrote: »

Parallel SRAM can be accessed much faster than cached serial RAM - even with the cache in use.

Can you show your Dhrystone 1.2 results that prove this ? I don't have any of the "fast" SRAM solutions.

RossH · 2011-10-14 21:02

jazzed wrote: »

Can you show your Dhrystone 1.2 results that prove this ? I don't have any of the "fast" SRAM solutions.

I couldn't find 1.2, but I found a copy of 2.1, so I ran this on my RamBlade, which has a nice fast parallel SRAM implementation (but can also use the cache if you choose to enable it). This is just for a quick comparison - I didn't bother turning on any optimization:

First, the uncached result:

Dhrystone Benchmark, Version 2.1 (Language: C)

Program compiled without 'register' attribute

Please give the number of runs through the benchmark: 5000

Execution starts, 5000 runs through Dhrystone
Execution ends

Final values of the variables used in the benchmark:

Int_Glob:            5
        should be:   5
Bool_Glob:           1
        should be:   1
Ch_1_Glob:           A
        should be:   A
Ch_2_Glob:           B
        should be:   B
Arr_1_Glob[8]:       7
        should be:   7
Arr_2_Glob[8][7]:    5010
        should be:   Number_Of_Runs + 10
Ptr_Glob->
  Ptr_Comp:          88072
        should be:   (implementation-dependent)
  Discr:             0
        should be:   0
  Enum_Comp:         2
        should be:   2
  Int_Comp:          17
        should be:   17
  Str_Comp:          DHRYSTONE PROGRAM, SOME STRING
        should be:   DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob->
  Ptr_Comp:          88072
        should be:   (implementation-dependent), same as above
  Discr:             0
        should be:   0
  Enum_Comp:         1
        should be:   1
  Int_Comp:          18
        should be:   18
  Str_Comp:          DHRYSTONE PROGRAM, SOME STRING
        should be:   DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc:           5
        should be:   5
Int_2_Loc:           13
        should be:   13
Int_3_Loc:           7
        should be:   7
Enum_Loc:            1
        should be:   1
Str_1_Loc:           DHRYSTONE PROGRAM, 1'ST STRING
        should be:   DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc:           DHRYSTONE PROGRAM, 2'ND STRING
        should be:   DHRYSTONE PROGRAM, 2'ND STRING

Microseconds for one run through Dhrystone: 4600.0
Dhrystones per Second:                       217.4

Now the result when using an 8kb cache:

Microseconds for one run through Dhrystone: 6000.0
Dhrystones per Second:                       166.7

Now the result when using a 1kb cache:

Microseconds for one run through Dhrystone: 7200.0
Dhrystones per Second:                       138.9

Trying to interpret these results is non-trivial - exactly how much faster uncached access is depends very heavily on how many cache "misses" the program generates. Dhrystone is quite a small program, and it probably doesn't need to page all that much. In real-world cases I typically see programs execute around twice as fast when using direct access as when using cached access. A really "bad" example is my Catalina Ram Test program - this program is deliberately designed to absolutely thrash any paging algorithms, and on the RamBlade it executes over 4 times faster when using direct access than when using cached access.

However, caching is great for platforms with slow RAM (e.g. SPI RAM or FLASH). These boards would otherwise be quite unusable for executing XMM programs.

Ross.

Cluso99 · 2011-10-14 22:16

Perhaps I am not typical, but apart from the first intro to the prop, I always build my own boards, and I always start at the beginning with a board with no peripherals. Then, one by one I add them. Some of my boards do not have common displays like TV or VGA, and some don't have PS2 Keyboard or PS2 Mouse.

So my approach to using Catalina was the same... Start with the basics and build it up.

RossH · 2011-10-14 23:10

Cluso99 wrote: »

So my approach to using Catalina was the same... Start with the basics and build it up.

I'll keep this in mind the next time I update the Catalina documentation. Although if I put it off long enough, Dr_A, Rayman and yourself may end up doing the job for me! :thumb:

Ross.

Dr_Acula · 2011-10-14 23:16

Looking good for XMM programs. I can compile for TV and for VGA. Sweet!

Now to recompile Catalyst...

Addit: minor problem there:

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Documents and Settings\Administrator>cd c:\program files\catalina

C:\Program Files\Catalina>cd catalyst

C:\Program Files\Catalina\catalyst>build_all DRACBLADE TV

C:\Program Files\Catalina\catalyst>if NOT "" == "" goto define_error

C:\Program Files\Catalina\catalyst>if NOT "DRACBLADE" == "" goto have_parameters

C:\Program Files\Catalina\catalyst>set TMP_LCCDIR=C:\Program Files\Catalina

C:\Program Files\Catalina\catalyst>if "C:\Program Files\Catalina" == "" set TMP_LCCDIR=C:\Program Files\Catalina

C:\Program Files\Catalina\catalyst>if EXIST "C:\Program Files\Catalina\bin\homespun.exe" goto found_catalina

C:\Program Files\Catalina\catalyst>make -v  1>NUL:
'make' is not recognized as an internal or external command,
operable program or batch file.

C:\Program Files\Catalina\catalyst>if NOT ERRORLEVEL 1 goto start_build

ERROR: "make" not found !

   MinGW and MSYS must be installed, and in the in the
   current path in order to build Catalyst successfully.


C:\Program Files\Catalina\catalyst>goto done

 ====
 Done
 ====


C:\Program Files\Catalina\catalyst>

RossH · 2011-10-15 00:08

Dr_Acula wrote: »

Addit: minor problem there:

ERROR: "make" not found !

   MinGW and MSYS must be installed, and in the in the
   current path in order to build Catalyst successfully.

As the error message says, you must download and install MSYS and MinGW - go to http://www.mingw.org/

Alternatively, just update your version of Catalina to 3.3 - in the newer versions I made this a warning rather than an error, since you don't really need MinGW or MSYS any longer.

Ross.

jazzed · 2011-10-15 01:57

Sorry, I meant Dhrystone2.1. Interesting results.

So the "fast" solution is 217.4 Dhrystones/sec. That is: 217.4 / 1757 = 0.124 DMIPS

On what XMM model is this based ? text/data/stack in external memory (XMEM) ?
Do you have an XMM model with text/data in XMEM and stack in HUB or text only in XMEM?
By text I mean code + constants.

What would you say is the theoretical maximum for Catalina XMM (.text only in XMEM) ? LMM / 4?

What is the Catalina LMM Dhrystone performance? Surely it's better than 0.5 DMIPS ?

If theoretical maximum for LMM (unrolled x4) at 80MHz is 4.0 DMIPS, then the best XMM rate would be 1.0 DMIPS.

A theoretical maximum LMM unrolled x4 without in-COG cache at 80MHz would be 4 MIPS. That is: 3*200ns + 1*400ns = 1000ns for 4 instructions, i.e. an instruction every 250ns, or 4 MIPS. LMM unrolled x8 would give a slightly better theoretical maximum at 4.44 MIPS (7*200ns + 1*400ns = 1800ns for 8 instructions).

In the most pure fetch/execute sense with a long line of no-macro Propeller instructions one might get asymptotically closer to a theoretical maximum. Pragmatically speaking though, the expectation might be roughly 80% to 90%. Results might be different using Bill's LMM definition.

RossH · 2011-10-15 02:50

jazzed wrote: »

Sorry, I meant Dhrystone2.1. Interesting results.

So the "fast" solution is 217.4 Dhrystones/sec. That is: 217.4 / 1757 = 0.124 DMIPS

On what XMM model is this based ? text/data/stack in external memory (XMEM) ?
Do you have an XMM model with text/data in XMEM and stack in HUB or text only in XMEM?
By text I mean code + constants.

What would you say is the theoretical maximum for Catalina XMM (.text only in XMEM) ? LMM / 4?

What is the Catalina LMM Dhrystone performance? Surely it's better than 0.5 DMIPS ?

If theoretical maximum for LMM (unrolled x4) at 80MHz is 4.0 DMIPS, then the best XMM rate would be 1.0 DMIPS.

A theoretical maximum LMM unrolled x4 without in-COG cache at 80MHz would be 4 MIPS. That is: 3*200ns + 1*400ns = 1000ns for 4 instructions, i.e. an instruction every 250ns, or 4 MIPS. LMM unrolled x8 would give a slightly better theoretical maximum at 4.44 MIPS (7*200ns + 1*400ns = 1800ns for 8 instructions).

In the most pure fetch/execute sense with a long line of no-macro Propeller instructions one might get asymptotically closer to a theoretical maximum. Pragmatically speaking though, the expectation might be roughly 80% to 90%. Results might be different using Bill's LMM definition.

You asked for figures that showed the relative difference of cached vs uncached performance. That's all these figures were intended to do. They demonstrate that cached performance is slower than uncached performance, even on trivial benchmarks. I can post times for larger "real world" programs that show it is typically two times, and may be up to four times slower.

Catalina LMM Dhrystone performance has been posted elsewhere, and can't be easily compared to Catalina XMM Dhrystone performance since the XMM memory design is the limiting factor, not the performance of the XMM kernel or the Catalina code generator. Also, the platform and optimization levels are not the same. However, we can compare Catalina XMM Dhrystone performance with GCC XMM Dhrystone performance when GCC can do XMM on one or more of the same platforms as Catalina.

Ross.

graffix · 2011-10-15 02:56

@Ross No, Sir. I don't currently use C. In the past I've seen some projects that used Catalina and only started to do research on the topic. I quickly decided the overall complexity of the project was more than I could handle at that time. Thanks for all your comments.
