A small tutorial on calling LMM PASM functions from C
I've had some more email enquiries on how to call LMM PASM from C, so I thought I would post a small fully worked example here:
Step 1. Write an LMM PASM function. Here is a simple one that fetches the current value of the CNT register, "ands" it with the parameter you pass to the function and returns the result:
' Catalina Code
DAT
long ' <-- this line ensures we are aligned on a long boundary
' Catalina Export my_asm_function
'--------------------------------------------------------------------------
C_my_asm_function
    mov r0, CNT
    and r0, r2
    jmp #RETN
'--------------------------------------------------------------------------
' end
Save this code in a file called my_pasm_function.s
NOTES: The lines outside the marker lines are required by Catalina. They specify what segment to put the assembled output, and also the C name of the function (in this case it is my_asm_function - by convention the actual PASM function name itself must have a C_ prefix, which means in this case it must be named C_my_asm_function). Those lines should appear exactly as shown. The lines inside the marker lines are the actual LMM PASM function - you are free to add any PASM code you like, with a few restrictions (described in the Catalina Reference Manual, page 102).
The reason this LMM PASM function uses registers r0 and r2 is described in Step 3 (below). If you need only a few registers, you can use r0 .. r5 without any problems (and also BC and RI if you are not planning to call any LMM primitives). If you need to use more registers (i.e. r6 ... r23) then you can do so, but you will need to save them on entry and restore them before returning (you can also use stack space or Hub RAM space, but all of these topics are beyond the scope of this tutorial).
Step 2: The simplest way to manage your assembly language functions (especially if you have more than one) is in a library. We can create a simple library from our pasm function using the following commands:
mkdir libpasm
cd libpasm
move ..\my_pasm_function.s .
catbind -i -e *.s
cd ..
NOTES: In Catalina, libraries are simply directories, and library files are simply LMM PASM source files. The name of the directory should begin with the prefix lib - in this case we have chosen libpasm as the name of our library. The catbind command simply catalogues the contents of the library and creates an index file for it.
Step 3: Write a C program that calls the assembly language function. Here is one that just prints the values returned by the function in an infinite loop:
#include <stdio.h>

// declare our function - it takes an int parameter and returns an int result:
extern int my_asm_function(int mask);

int main (void) {
    while (1) {
        printf("Result = %8X\r\n", my_asm_function(0xFF));
    }
    return 0;
}
Save this code in a file called pasm_example.c
NOTES: We declare and use the LMM PASM function the same way we would a normal C function. Assuming we pass no more than four integer-compatible parameters to the function, they will appear within the function in registers r2, r3, r4, r5 - but starting from the last parameter. So if we pass one parameter it will appear in r2. If we passed two parameters, the first would appear in r3, and the second in r2. For three parameters, the first would appear in r4, the second in r3 and the third in r2 - and so on. If the function wants to return an integer-compatible result, it can do so in r0. If the function needs more than four parameters, or needs to pass or return non-integer types then this is also possible but slightly more complex (and beyond the scope of this simple tutorial!).
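The register rule in the notes above can be written down as a one-line formula. Here is a minimal sketch in plain C that models it. The names param_register and MAX_REG_PARAMS are made up for this illustration and are not part of Catalina; the code only restates the rule (last parameter in r2, counting upwards towards the first parameter) for up to four integer-compatible parameters:

```c
#include <assert.h>

/* Illustration only: these names are hypothetical, not Catalina's.
 * Up to four integer-compatible parameters arrive in r2..r5. */
#define MAX_REG_PARAMS 4

/* Return the register number (2..5) holding parameter `position`
 * (1 = first parameter in the C declaration) of a function that
 * takes `count` parameters. The LAST parameter is always in r2. */
int param_register(int position, int count)
{
    assert(count >= 1 && count <= MAX_REG_PARAMS);
    assert(position >= 1 && position <= count);
    return 2 + (count - position);
}
```

So for a three-parameter function, param_register(1, 3) gives 4 (r4), param_register(2, 3) gives 3 (r3) and param_register(3, 3) gives 2 (r2), which matches the description above.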
Step 4: Compile the program. The following command will do the job - it will compile the program for a C3, and the output of the program will appear on the PC output (it can be viewed using a terminal emulator):
catalina -lc -lpasm pasm_example.c -D C3 -D PC
NOTES: We include the command-line option -lpasm - this tells Catalina to look in the library libpasm when compiling the C code. In that library it will find the function my_asm_function (in the file my_pasm_function.s), so the program should compile correctly.
And you're done!
Step 5 (Optional): If you have the Catalina Optimizer, you can add the -O3 flag. The pasm function body will then be 'in-lined' with the C code. To see this, also add the -y flag to look at the compiled output:
catalina -lc -lpasm pasm_example.c -D C3 -D PC -O3 -y
Here is the output (from the file pasm_example.lst). The in-lined PASM function code is the two lines at addresses 0260 and 0264 (the mov r0, CNT and the and r0, r2):
025c(0066): ' C_main_2
025c(0066): ff 6a fc a0 ' mov r2, #255 ' reg ARG coni
0260(0067): f1 67 bc a0 ' mov r0, CNT
0264(0068): 35 66 bc 60 ' and r0, r2
0268(0069): 33 7c bc a0 ' mov r11, r0 ' CVI, CVU or LOAD
026c(006a): 3e 6a bc a0 ' mov r2, r11 ' CVI, CVU or LOAD
0270(006b): 04 00 7c 5c ' jmp #LODA
0274(006c): 30 32 00 00 ' long @C_main_5_L000006
0278(006d): 2e 6c bc a0 ' mov r3, RI ' reg ARG ADDRG
027c(006e): 08 5e fc a0 ' mov BC, #8
0280(006f): 04 58 fc 84 ' sub SP, #4
0284(0070): 0b 00 7c 5c ' jmp #CALA
0288(0071): 94 2a 00 00 ' long @C_printf
028c(0072): 04 58 fc 80 ' add SP, #4 ' CALL addrg
NOTES: If you look carefully, you will see that the overhead of performing the function call has been completely eliminated. This technique makes the result as effective as using in-line assembler - but much neater, more portable, and easier to maintain!
I think he means that if you write the Init, ReadByte, and WriteByte functions for your hardware and send them to him, he'll send you a set of target files that will let Catalina work with your board. In the case of your SDRAM board, I think those functions could be written on top of your JCACHE interface to the COG that manages the SDRAM access and refresh. Presumably, they are functions that run in his LMM kernel COG. The exact names don't matter because he'll adjust his code to use whatever you supply. At least that's my understanding of what he's saying.
Almost, David - except that caching functionality suitable for use with Jazzed's board is already built into Catalina. While you could cache it again, that would just be a waste of Hub RAM. I just need the access routines that Jazzed himself would use to fill such a cache.
Jazzed, if you can't provide me the 9 standard XMM API functions documented in the Catalina Reference Manual on page 105 (and there are around a dozen examples of how to do this for both parallel and serial boards in the Catalina target directory) then I am willing to do this part for you. I can do that if you will provide me just the three most fundamental routines I can possibly think of for accessing any XMM RAM board:
- initialize the board (e.g. set the Propeller pins and any latches required)
- read a byte from the board
- write a byte to the board
You don't have to worry about assembling these bytes into words or longs - I will do all that. You don't even have to worry about making these routines particularly efficient, since when the cache is in use, the efficiency of the underlying access routines makes almost no difference to the final results.
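To make the division of labour concrete, here is a minimal sketch in plain C of how longs can be built on top of the three basic routines. This is only a model for trying the idea on a PC: the names board_init, board_read_byte, board_write_byte, read_long and write_long are assumptions for this example, a plain array stands in for the external RAM, and the real routines would be PASM in the Catalina target files. The byte order is little-endian, as on the Propeller:

```c
#include <stdint.h>
#include <string.h>

/* Simulated board: a plain array stands in for the external RAM. */
#define SIM_RAM_SIZE 1024u
static uint8_t sim_ram[SIM_RAM_SIZE];

/* The three board-specific routines (hypothetical names). */
void board_init(void)
{
    memset(sim_ram, 0, sizeof sim_ram);
}

uint8_t board_read_byte(uint32_t addr)
{
    return sim_ram[addr % SIM_RAM_SIZE];
}

void board_write_byte(uint32_t addr, uint8_t value)
{
    sim_ram[addr % SIM_RAM_SIZE] = value;
}

/* Board-independent: build a 32-bit long from four byte reads,
 * least significant byte at the lowest address (little-endian). */
uint32_t read_long(uint32_t addr)
{
    uint32_t value = 0;
    for (int i = 3; i >= 0; i--) {
        value = (value << 8) | board_read_byte(addr + (uint32_t)i);
    }
    return value;
}

/* Board-independent: split a 32-bit long into four byte writes. */
void write_long(uint32_t addr, uint32_t value)
{
    for (int i = 0; i < 4; i++) {
        board_write_byte(addr + (uint32_t)i, (uint8_t)(value >> (8 * i)));
    }
}
```

Only the three board_ routines know anything about the hardware; everything above them stays the same from board to board.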
Ross, I appreciate your post above. Given my earlier post about inline assembly, I think you've struck a very nicely designed balance. The idea then, is to encapsulate the inline as a discrete unit, quantifiable and replaceable on other CPU's, meaning one could just focus on the function itself for a port, nicely compartmentalized by the process, increasing portability overall.
You've given the higher order matters a lot of detail thought that I think goes missed, but for posts like that one. Thanks, and appreciated. I'll be mulling that over for a while, reconsidering some things in my own mind, likely for the better, and that is, of course, one of the best things about this forum.
Almost, David - except that caching functionality suitable for use with Jazzed's board is already built into Catalina. While you could cache it again, that would just be a waste of Hub RAM. I just need the access routines that Jazzed himself would use to fill such a cache.
Okay, I think I understand now. You already have code that runs in a separate COG that manages the cache. You need init/readbyte/writebyte functions that will run in that COG and return bytes from the external memory. Is that correct? I would actually think you'd want read/write cache line functions rather than a byte at a time since that would probably be faster.
I have been playing around with some memory board designs and this is a design that works for the 256x224 video driver and which I also think will work for catalina. So the board design can do double duty.
I am thinking of one Gadget Gangster tower doing Catalina, and the other running the Video.
P0-P23
P23 is chip select for the SD card
P22 is /WR on the ram chip
P21 is /RD on the ram chip
P20 selects the latch
The prop has direct access to blocks of 4096 bytes and there are 128 of these
The high bit on the latch is /OE on the ram chip but it probably is not needed as (I think) setting /RD and /WR high will set the ram chip to tristate.
If the ram is deselected, then the prop can talk to the sd card via P0-P2
There might be a mistake or two as this is a prototype but it is pretty similar to the design I have working on a breadboard.
The driver ought to be fairly easy to write and ought to run faster than the three latch dracblade.
I think I recall some discussion earlier where you said that the ram and the sd card were separate entities. I hope that is correct otherwise this circuit won't work for the sd card!
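For what it is worth, the addressing arithmetic for a board like this could be sketched in C as below. This is a guess at the scheme from the description, not tested driver code: it assumes the low 7 bits of the latch select one of the 128 blocks, that bit 7 of the latch is /OE, and that the 12-bit offset within a block goes straight out on the address pins. All the names are made up for the example:

```c
#include <stdint.h>

#define BLOCK_SIZE  4096u   /* bytes the prop can address directly */
#define BLOCK_COUNT 128u    /* blocks selected through the latch   */
#define RAM_SIZE    (BLOCK_SIZE * BLOCK_COUNT)   /* 512 KB in total */

/* A linear RAM address split into latch block and direct offset. */
typedef struct {
    uint8_t  block;   /* 0..127, written to the latch (assumed) */
    uint16_t offset;  /* 0..4095, driven on the address pins    */
} ram_addr;

ram_addr split_address(uint32_t addr)
{
    ram_addr a;
    addr %= RAM_SIZE;                       /* wrap past the end   */
    a.block  = (uint8_t)(addr / BLOCK_SIZE);
    a.offset = (uint16_t)(addr % BLOCK_SIZE);
    return a;
}

/* Byte to write to the latch: block number in bits 0..6 and the
 * /OE line in bit 7 (assumed layout; /OE is active low). */
uint8_t latch_value(uint8_t block, int oe_high)
{
    return (uint8_t)((block & 0x7Fu) | (oe_high ? 0x80u : 0x00u));
}
```

A driver would then only need to rewrite the latch when the block number changes, which is where the speed advantage over a three latch design would come from.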
Hi Ross,
This design should work with Catalina. In Catalina the SD card driver code is completely separate from the XMM RAM access code. However (as in your design) it is often the case that these devices share some pins, which means both the Kernel and the SD card driver have to know not to access XMM RAM while an SD card request is in progress (and vice-versa), and also that it is necessary to activate and deactivate the respective device.
This is the purpose of the SHARED_SD and SHARED_XMM symbols in various Catalina components (e.g. Catalina_SD_Plugin.spin and Catalina_XMM.spin) - these symbols enable the use of the SD_Activate/SD_Tristate and XMM_Activate/XMM_Tristate functions (respectively) at the appropriate times.
Okay, I think I understand now. You already have code that runs in a separate COG that manages the cache. You need init/readbyte/writebyte functions that will run in that COG and return bytes from the external memory. Is that correct? I would actually think you'd want read/write cache line functions rather than a byte at a time since that would probably be faster.
You are correct. My caching cog is actually VMCOG developed by Bill Henning. I just abstracted it a bit to use a common platform-independent XMM API. But the only API functions I actually call from this cog are XMM_Activate, XMM_Tristate, XMM_ReadPage and XMM_WritePage. The code is also complicated by the need to also support SPI Flash based XMM (which needs a few more functions) - but we can ignore that in this case.
Since I can easily write versions of XMM_ReadPage and XMM_WritePage that call simpler byte-oriented access functions, I thought it would be simpler for Jazzed to understand if I reduced it to the simplest possible set of necessary functions (e.g. I don't think I even need XMM_Tristate in his case - his board is SDRAM based, so I don't think he can share his XMM bus with other devices). However, it seems I only managed to confuse him even further.
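As a rough illustration of that layering, here is a sketch in plain C of page routines that do nothing but loop over byte-oriented access functions. It is a model only: the real XMM_ReadPage and XMM_WritePage are PASM routines with their own calling conventions, and every name below (read_page, write_page, fake_read, fake_write and the function pointer types) is an assumption made for this example:

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-oriented access functions supplied by the board author. */
typedef uint8_t (*read_byte_fn)(uint32_t addr);
typedef void    (*write_byte_fn)(uint32_t addr, uint8_t value);

/* Copy `len` bytes from external memory into a Hub RAM buffer. */
void read_page(read_byte_fn rd, uint32_t xmm_addr,
               uint8_t *hub_buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        hub_buf[i] = rd(xmm_addr + (uint32_t)i);
    }
}

/* Copy `len` bytes from a Hub RAM buffer out to external memory. */
void write_page(write_byte_fn wr, uint32_t xmm_addr,
                const uint8_t *hub_buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        wr(xmm_addr + (uint32_t)i, hub_buf[i]);
    }
}

/* A fake 256-byte board so the sketch can be tried on a PC. */
static uint8_t fake_board[256];

uint8_t fake_read(uint32_t addr)
{
    return fake_board[addr % 256u];
}

void fake_write(uint32_t addr, uint8_t value)
{
    fake_board[addr % 256u] = value;
}
```

Because the cache only calls the page routines when a page is missed, the per-byte overhead in these loops is paid relatively rarely, which is why the efficiency of the byte routines matters so little.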
The idea then, is to encapsulate the inline as a discrete unit, quantifiable and replaceable on other CPU's, meaning one could just focus on the function itself for a port, nicely compartmentalized by the process, increasing portability overall.
Yes! With C, encapsulation is the right way to make your code both maintainable and portable. Sprinkling the code with chunks of inline assembler, #ifdef statements and various compiler-specific pragmas and attributes is the wrong way!
The file I posted was not for SDRAM. It was for a 10 pin high performance flash cache interface.
Just as well I didn't spend any time trying to decipher it! I thought you were still trying to get your SDRAM board working - what ever happened to that one?
For Flash based XMM RAM, I had to add a new set of API functions because some platforms (such as the C3) have both SRAM and Flash, and want to use them both as XMM RAM. I would have preferred to use the existing API functions to cater for both types of RAM, but code size issues in the kernel meant this was impossible - so instead I have added a new set of API functions specifically for Flash access - it doesn't matter that these functions need more code space since they are only required in the cache cog (where there is plenty of space), not in the kernel itself (where there is none!):
- XMM_FlashActivate - set up the Flash XMM device for read/write
- XMM_FlashTristate - allow other devices to use the Flash bus
- XMM_FlashOutByte - write a byte to Flash
- XMM_FlashInByte - read a byte from Flash
- XMM_FlashOutBits - write "n" bits to the Flash (used when erasing sectors etc)
From the perspective of anything outside the cache cog, these functions are invisible - the kernel and loaders still only know about the standard XMM API functions documented in the Catalina Reference Manual.
What chipset is your new board? I currently support single bit serial flash chips on the C3 and the Morpheus, but I will soon be adding support for quad bit serial flash chips for Rayman's SuperQuad and RamPage boards.
If you want to start work on your own version, check out the existing implementations in the target directory - i.e. Morpheus_XMM.inc and C3_XMM.inc. If you implement the functions described above, your board will work with the caching cog. All you need to do is define the symbols FLASH and CACHED when you compile your C programs.
Just as well I didn't spend any time trying to decipher it! I thought you were still trying to get your SDRAM board working - what ever happened to that one?
It's still around. However Flash is faster, cheaper, smaller, and uses less pins. Compelling?
From a performance perspective Flash is a perfect solution for faster boot time and cache page swaps (read-only). Once it's programmed it boots "in a flash" and does not require loading from SDcard (but it can be programmed from an SDcard file).
I posted a flash solution just before Rayman posted his. We both started working on it about the same time independently.
Rayman chose to use the ST parts, I'm using the Winbond parts mainly because of availability. The cache driver I posted was for 2 of the QuadSPI Winbond parts in parallel to form an 8 bit bus. My SpinSocket Flash modules have devices on P0..7 for best performance. There are some differences between the devices for addressing. Having 2 QuadSPI parts is much faster than 1, but the code is a little trickier. Too bad you can't just drop it in.
I see. Then the best option is for me to get Rayman's boards working. When I post that code you can modify it to suit your own board.
How does one specify COG code encapsulated? I know I can make a binary, assemble it, etc... Possible to do in the Prop Tool too. But, if one wanted to write COG code in Catalina, say for a math library, or video output, and use the tool in the way shown above, how does that happen, or is that some abuse of it? I personally don't see a lot of merit to drivers and such written in C, when PASM is so brilliant. And this is one of those reconsiderations.
I personally don't see a lot of merit to drivers and such written in C, when PASM is so brilliant.
The merit is of numbers and sales volume. That is, it allows the great world army of C programmers use the Propeller without needing special skills to do simple to moderately complex things. As long as provisions exist for building, including, and running PASM, that great world C programming army can use the PASM others produce and even learn PASM if they like.
Let me be clear, there are my own efforts and ends, and there are other efforts and ends. I posed the question and made the comment I did for my own ends only. I am currently working on new skills, and have a genuine interest in the ideas expressed by various people here as to the "right" way to do things.
I often teach engineering software modeling classes, for example. That is software like Solidworks (a popular competitor to me), Siemens NX (home turf), etc...
What I find fascinating, and have for quite some time now, is the basic problem spaces of editing work done, re-use, scaling, performance, etc... found in the spaces of software development and mechanical geometry development are remarkably similar, sharing many of the same forward create philosophy options, and core ideological differences in approach. Quite simply put, parametric geometry modeling is really no different from writing code, in that both tasks are filled with necessary abstractions that vary and that pose similar problems.
There is the just do it option, where basically anything is on the table, where people can make as big of a mess as they are inclined to do, and on many "axis" diverging from that, what I would call "disciplines" where specific means and methods are favored, ideally delivering optimal, or favorable results, each at a cost and barrier to use value and trade-offs of various sorts.
Over the years of grappling with these things, I have been able to observe these dynamics play out in various ways, which I find personally enlightening and valuable. This is no different, but for the fact that I am learning more than not right now. So, toward that end, I want to strongly differentiate any related commentary as such, not necessarily aligned with the greater meta-discussions in play here. When I want to make that distinction, I'll explicitly do so, leaving no ambiguity. At all other times, I am just an interested participant, looking to learn some stuff.
(and yes, I could very easily pursue a career in either legal or politics and see a high degree of success, but for both being distasteful to me for various reasons, though I have found those skill sets to be quite useful in managing everyday matters --just so you know the politics in the above are not lost on me )
It's a good question. The answer will be useful. Just plugging something in without needing to understand all the underlying details (encapsulating) would be perfect.
Potatohead,
What?
Maybe I'm tired, I did not see what any of that long post had to do with the questions about Catalina or C or PASM.
Well, actually I did not understand any of it so perhaps it was relevant in some way I missed:)
Well, it does have merits, thus the opinion! No worries. Really.
Again, to use the analogy from modeling, I have good mastery of ALL techniques possible, and have even created a couple of my own, long before vendors got there, some of whom were decidedly annoyed at my white papers and feature requests. Parametric design is a lot of fun, though not always optimal, as this is exactly the same dynamic, which I seek to understand better. (and some of the vendors get that now, 5 years later... bunch of us are happy for that too)
Jazzed, There are always merits. What isn't always true is a mapping of those to a particular individual, or problem space. Agreed on the utility, which is why I asked, though I think the understanding is also of equal utility, which is exactly what got me to thinking on this some.
@Ross, my apologies for making a mess.
@Heater, it was relevant to the higher order and longer running conversation we've had on need for gcc, C vs C++, success of propeller, etc... I simply wanted to compartmentalize my query, avoiding that, to pursue my own enlightenment, got wordy, and well? Here we are. Best case, IMHO, is to ignore it, and carry on, unless you would like to expand some on my encapsulation vs in-line / modifiers comment and question, which I would enjoy and consider valuable.
LOL!! (Dare I ask what basis there was for interesting? --and no, I'm not actually asking, just having a bit of fun Ross)
Let's say I want to author PASM COG code in Catalina. I don't want to use some outside tool, or assembler to build a BLOB, and link it in, or stuff some array with hex data. (and my use of BLOB is in the strict technical sense of "Binary Large OBject")
If we take the "in-line" approach, that can be as simple as a framework stub, where one can then stuff the code right there in the editor, happy fun! Prior to your post above, that's what I would want to do, or make the BLOB elsewhere, and drag it in however is most expedient.
So, how does one author encapsulated COG PASM code , where you can see the PASM nice and clean, adhering to the principles we discussed here? (encapsulation, portability, etc...)
And I'm asking because I really like SPIN + PASM. The only real downside is no in-line, but then again, there are many upsides, and the PASM environment is just sweet. Doing LMM, etc... isn't really as sexy either. I can however, launch Prop Tool and just do it, whatever it is, in Prop Tool. That's compelling for a lot of reasons.
Being able to author COG PASM code, in the context of Catalina, or any C environment would frankly be intriguing, and have many of the benefits, right along with LMM, etc... and fewer hassles. Right now, it seems to me that one must use two tools, or use a mix of C and other stuff that may or may not make sense, depending on a lot of things to do the same. Not that I think those things are bad, it's more like I'm just thinking what you wrote all the way through, wanting to just do the entire task, drivers on up, in Catalina, nothing else, like I would Prop Tool.
Let's say I want to author PASM COG code in Catalina. I don't want to use some outside tool, or assembler to build a BLOB, and link it in, or stuff some array with hex data. (and my use of BLOB is in the strict technical sense of "Binary Large OBject")
Easy! But first you have to define your terms a bit further. First, what is an "outside tool" with Catalina?
The program you think of as "Catalina" is really only a thin wrapper program around a whole suite of related tools - e.g. cpp, rcc, catbind, catoptimize, catdbgfilegen, homespun and srecord. You can invoke any or all of these programs independently (and I for one quite often do) - or you can just let Catalina invoke them all for you in turn (which is of course what most people do). So homespun is definitely not an "outside tool".
What about spinc? Is it an outside tool? I would argue "no" on the basis that it is also part of the Catalina "suite" of tools - it is just not invoked by Catalina itself - you currently always have to do so manually. However, I should point out that in the last release of Catalina I very nearly embedded the calling of spinc within Catalina so that it would be automatically invoked on any spin program source files you included on the Catalina command line (just as lcc is invoked on any C files). In the end I decided not to do so - but only because it didn't seem necessary (since invoking it separately is so easy - even from within Code::Blocks).
So - on the basis that both homespun and spinc are part of Catalina, and not "outside tools" then adding cog-based PASM to your C program is in fact only marginally more complex than adding LMM PASM.
Of course, the proviso is that the cog-based PASM executes on a different cog to the C code. But with the Propeller, this is the default model we are all used to, is it not? This is in fact the very same model that Spin uses. Actually, since you only have 496 instructions to work with in a cog, it would also be the default model for a purely cog-based PASM program as well (assuming someone wanted to write a PASM program larger than 496 instructions).
Next, the "BLOB" issue - well, the kernel itself is such a BLOB - it just happens to be one that is managed invisibly to you as a Catalina user. But when you load a cog-based PASM program in Catalina (via spinc) you don't see the actual BLOB either - so I don't see that BLOBs are an issue per se.
If we take the "in-line" approach, that can be as simple as a framework stub, where one can then stuff the code right there in the editor, happy fun! Prior to your post above, that's what I would want to do, or make the BLOB elsewhere, and drag it in however is most expedient.
You cannot take the "in-line" approach for cog-based PASM with an LMM compiler. There is typically no space available in the cog executing your C code to execute any cog-based PASM.
I suppose you could "reserve" some cog space in your LMM kernel specifically for executing cog-based PASM and then invoke it like a function - this would be easy enough to accomplish in a small way (about the same complexity as FCACHE, supporting blobs of PASM code of perhaps 100 longs or so). However, any benefit you could possibly get from this technique over and above executing the same code in another cog is very questionable - it may in fact be slower to do it this way since you would have to load the code into your own cog one long at a time, whereas you can load it into another cog using a single "coginit" instruction. Also remember that the code cannot be made permanently resident in the cog at compile time - it has to be able to be loaded as it is needed (e.g. what happens if you have a main function containing inlined code which then calls a pre-compiled library function that itself contains inlined code? Which code do you inline, and what do you do with the other code?).
So why not instead simply load your cog-based PASM in another cog, and also thereby remove all the limitations you would have to place on such "inlined" code?
So, how does one author encapsulated COG PASM code , where you can see the PASM nice and clean, adhering to the principles we discussed here? (encapsulation, portability, etc...)
And I'm asking because I really like SPIN + PASM. The only real downside is no in-line, but then again, there are many upsides, and the PASM environment is just sweet. Doing LMM, etc... isn't really as sexy either. I can however, launch Prop Tool and just do it, whatever it is, in Prop Tool. That's compelling for a lot of reasons.
Being able to author COG PASM code, in the context of Catalina, or any C environment would frankly be intriguing, and have many of the benefits, right along with LMM, etc... and fewer hassles. Right now, it seems to me that one must use two tools, or use a mix of C and other stuff that may or may not make sense, depending on a lot of things to do the same. Not that I think those things are bad, it's more like I'm just thinking what you wrote all the way through, wanting to just do the entire task, drivers on up, in Catalina, nothing else, like I would Prop Tool.
Here, you are talking about a "convenience" issue - not a language issue. For instance, if I did add the ability to invoke the spinc utility to Catalina, and "tweaked" the internal build engine in Code::Blocks to also understand that Catalina should also be invoked on any files with the .spin extension, and also added knowledge of the Spin syntax to the Code::Blocks editor ... then you could combine all your Spin programming, your C programming and combined Spin/PASM/C programming into a single tool.
Could I do so? Of course!
Will I do so? Probably not - there would not be enough demand to justify it, and my time is limited.
However, I will post a small tutorial about how to use cog-based PASM (similar to the one I posted about LMM PASM) when I have some time.
Interesting discussion. I've got some boards being made and batchpcb will take a few weeks so I have a little time to think about software.
I've got this idea of trying to write pasm using a simple subset of C. Assembly by its nature contains lots of jumps, and I have been pondering the absolute minimum number of structures you could have to eliminate all jumps, but still produce code that is the most efficient pasm code that can be produced anyway.
Consider an IF statement. In pasm it is two lines of code - a test for a condition, and a jump to skip some code if the condition is not true.
That translates very easily into a C syntax.
if (a ==b)
{
some code
}
And you can do that for a>b, a<b, a=b and a notequal b using the wz or wc flags.
The end result is the jump disappears.
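To show the sort of mapping being described, here is a small sketch: ordinary C, with the PASM that a hypothetical "cog C" translator might emit shown in the comments. The function name add_if_equal and the label skip are made up for the example, and the PASM in the comments is hand-written rather than the output of any real tool:

```c
/* Plain C that runs anywhere. The comments show the PASM a
 * hypothetical translator could produce for each line. */
int add_if_equal(int a, int b, int total)
{
    /* PASM:          cmp   a, b  wz    ' Z is set when a == b     */
    /* PASM:  if_nz   jmp   #skip       ' jump over the body       */
    if (a == b) {
        /* PASM:      add   total, a                               */
        total += a;
    }
    /* PASM: skip                       ' label after the body     */
    return total;
}
```

For a one-instruction body like this, PASM does not even need the jump, because the add itself can be made conditional with if_z. A translator that noticed this could produce code as lean as hand-written PASM for small bodies.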
There are various loop structures that can hide all the jumps.
I think you also need to add the 'switch' command for multiple IF statements in a row where you want the code to fall through to the end after one of the conditions has been met. If you have 'break' then it falls to the end, and if there is no 'break' it works through them one at a time.
Add and Subtract convert easily from C to Pasm.
Rotate also converts easily.
Then there are commands that are unique to the propeller eg dira. But you can write them in a C like syntax with fake function calls
mov dira,myvariable becomes dira(myvariable)
ditto wrlong, rdlong etc.
These can be fake function calls, or they can even be real, but the function just contains a comment. Indeed, having a comment about what a wrlong is might be quite useful. The precompiler just ignores the function wrlong.
At the end of the day maybe you can't replicate every sort of cunning pasm trick (jmpret and multithreading) but you could replicate all the sort of pasm code I write.
Why do this?
Well, I kind of like the idea of C for cogs, so long as the code produced is truly as lean as pasm code.
The big question - is it portable? Well not as such. Much of it would be, and would run on any C compiler. The bits that would not run would be the bits that are invoking propeller unique commands, like wrlong. But if you wrote this in a C type of syntax wrlong(destination,source) then for a simulator you could write a function to write that value to a 32k array you might call hub[32768]
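A minimal sketch of that simulator idea in standard C follows. It assumes a 32k byte array called hub standing in for Hub RAM, and fake wrlong and rdlong functions whose operand order follows PASM (for wrlong the first operand is the value and the second is the hub address). Wrapping addresses with a modulo is a simplification made for this sketch, and the whole thing is an illustration rather than part of any existing tool:

```c
#include <stdint.h>

#define HUB_SIZE 32768u

/* Simulated Hub RAM for running "cog C" on a PC. */
static uint8_t hub[HUB_SIZE];

/* Long accesses are long-aligned, so drop the two low address bits.
 * Wrapping with % HUB_SIZE is a simplification for this sketch. */
static uint32_t align_long(uint32_t hub_addr)
{
    return (hub_addr & ~3u) % HUB_SIZE;
}

/* Fake wrlong: store `value` at `hub_addr`, little-endian. */
void wrlong(uint32_t value, uint32_t hub_addr)
{
    uint32_t a = align_long(hub_addr);
    for (int i = 0; i < 4; i++) {
        hub[a + (uint32_t)i] = (uint8_t)(value >> (8 * i));
    }
}

/* Fake rdlong: fetch the long stored at `hub_addr`. */
uint32_t rdlong(uint32_t hub_addr)
{
    uint32_t a = align_long(hub_addr);
    uint32_t value = 0;
    for (int i = 3; i >= 0; i--) {
        value = (value << 8) | hub[a + (uint32_t)i];
    }
    return value;
}
```

On the Propeller the precompiler would replace each call with the single PASM instruction, while on a PC the same source compiles against these functions unchanged.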
Catalina is already made up of multiple pre-compilers so this might just add another one. Input would be a giant text file including cog C, and output is the Cog C bits converted to inline arrays ready for cognew statements.
In a sense this bends the rules by adding in pasm code inline. But it would be clearer to read, because all the pasm code would look like C, so the entire program would look like a C program.
I might do some more experiments and see where this might go...
It is written in C# which adds a nice touch, because you have a C program being processed and compiled by another C program. If you have pasm code that you can write in C knowing that it is as optimised as it can be, then it can be possible to think about a total C solution for the propeller.
Ross, did you look any more at Nethack?
Do you think it could run from Flash?
Or, does it need SRAM?
Hi Rayman,
I'll have another look at NetHack when I get time, but I may have to go back to quite an early version - the later versions require more memory than any of the Propellers I currently have available - from memory (it's been a while) I think the last time I looked at it, I estimated it would require several megabytes of SRAM just for data space, as well as several megabytes of code space (which could be FLASH).
Comments
I've had some more email enquiries on how to call LMM PASM from C, so I thought I would post a small fully worked example here:
Step 1. Write an LMM PASM function. Here is a simple one that fetches the current value of the CNT register, "ands" it with the parameter you pass to the function and returns the result:
Save this code in a file called my_pasm_function.s
NOTES: The lines outside the marker lines are required by Catalina. They specify what segment to put the assembled output, and also the C name of the function (in this case it is my_asm_function - by convention the actual PASM function name itself must have a C_ prefix, which means in this case it must be named C_my_asm_function). Those lines should appear exactly as shown. The lines inside the marker lines are the actual LMM PASM function - you are free to add any PASM code you like, with a few restrictions (described in the Catalina Reference Manual, page 102).
The reason this LMM PASM function uses registers r0 and r2 is described in Step 3 (below). If you need only a few registers, you can use r0 .. r5 without any problems (and also BC and RI if you are not planning to call any LMM primitives). If you need to use more registers (i.e. r6 ... r23) then you can do so, but you will need to save them on entry and restore them before returning (you can also use stack space or Hub RAM space, but all of these topics are beyond the scope of this tutorial).
Step 2: The simplest way to manage your assembly language functions (especially if you have more than one) is in a library. We can create a simple library from our pasm function using the following commands:
NOTES: In Catalina, libraries are simply directories, and library files are simply LMM PASM source files. The name of the directory should begin with the prefix lib - in this case we have chosen libpasm as the name of our library. The catbind utility simply catalogues the contents of the library and creates an index file for it.
Step 3: Write a C program that calls the assembly language function. Here is one that just prints the values returned by the function in an infinite loop:
Save this code in a file called pasm_example.c
NOTES: We declare and use the LMM PASM function the same way we would a normal C function. Assuming we pass no more than four integer-compatible parameters to the function, they will appear within the function in registers r2, r3, r4, r5 - but starting from the last parameter. So if we pass one parameter it will appear in r2. If we passed two parameters, the first would appear in r3, and the second in r2. For three parameters, the first would appear in r4, the second in r3 and the third in r2 - and so on. If the function wants to return an integer-compatible result, it can do so in r0. If the function needs more than four parameters, or needs to pass or return non-integer types, then this is also possible but slightly more complex (and beyond the scope of this simple tutorial!).
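The calling pattern described in these notes can be sketched as follows. This is only an illustration, not the original pasm_example.c: the macro ON_PROPELLER, the host-side stand-in body, the fake counter and the helper print_samples are all assumptions added so the sketch can be tried on a PC, and the loop is bounded rather than infinite.

```c
#include <assert.h>
#include <stdio.h>

/* Declared like any ordinary C function. On Catalina the body would come
 * from the LMM PASM routine C_my_asm_function in libpasm (Step 1). */
int my_asm_function(int mask);

#ifndef ON_PROPELLER /* hypothetical macro: define it when linking with -lpasm */
/* Host-side stand-in so the sketch can be tried without a Propeller.
 * A fake counter plays the role of the CNT register. */
static unsigned int fake_cnt = 0u;

int my_asm_function(int mask)
{
    fake_cnt += 12345u;                          /* pretend the clock advanced */
    return (int)(fake_cnt & (unsigned int)mask); /* same AND as the PASM code  */
}
#endif

/* Print `count` masked counter samples. With a single parameter, `mask`
 * arrives in r2 and the result comes back in r0 (per the notes above). */
void print_samples(int mask, int count)
{
    int i;
    assert(count >= 0);
    for (i = 0; i < count; i++) {
        printf("%d\n", my_asm_function(mask));
    }
}
```

Calling print_samples(0xFF, 5) prints five values, each of which fits in the low eight bits because of the mask.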
Step 4: Compile the program. The following command will do the job - it will compile the program for a C3, and the output of the program will appear on the PC output (it can be viewed using a terminal emulator):
NOTES: We include the command-line option -lpasm - this tells Catalina to look in the library libpasm when compiling the C code. In that library it will find the function my_asm_function, so the program should compile correctly.
And you're done!
Step 5 (Optional): If you have the Catalina Optimizer, you can add the -O3 flag. The pasm function body will then be 'in-lined' with the C code. To see this, also add the -y flag to look at the compiled output:
Here is the output (from the file pasm_example.lst) with the in-lined PASM function code coloured green:
NOTES: If you look carefully, you will see that the overhead of performing the function call has been completely eliminated. This technique makes the result as effective as using in-line assembler - but much neater, more portable, and easier to maintain!
Ross.
Almost, David - except that caching functionality suitable for use with Jazzed's board is already built into Catalina. While you could cache it again, that would just be a waste of Hub RAM. I just need the access routines that Jazzed himself would use to fill such a cache.
Jazzed, if you can't provide me the 9 standard XMM API functions documented in the Catalina Reference Manual on page 105 (and there are around a dozen examples of how to do this for both parallel and serial boards in the Catalina target directory) then I am willing to do this part for you. I can do that if you will provide me just the three most fundamental routines I can possibly think of for accessing any XMM RAM board:
- initialize the board (e.g. set the Propeller pins and any latches required)
- read a byte from the board
- write a byte to the board
You don't have to worry about assembling these bytes into words or longs - I will do all that. You don't even have to worry about making these routines particularly efficient, since when the cache is in use, the efficiency of the underlying access routines makes almost no difference to the final results.
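As a rough illustration of how longs and pages could be layered on top of three such routines, here is a minimal sketch in C. It is not Catalina's XMM code (which is PASM): the names board_init, board_read_byte, board_write_byte, read_long, write_long, read_page and write_page are hypothetical, the little-endian byte order is an assumption, and a small array stands in for the external RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_RAM_SIZE 4096u
static uint8_t sim_ram[SIM_RAM_SIZE];   /* stands in for the external RAM */

/* The three board-specific routines (hypothetical names). */
void board_init(void)
{
    memset(sim_ram, 0, sizeof sim_ram);
}

uint8_t board_read_byte(uint32_t addr)
{
    assert(addr < SIM_RAM_SIZE);
    return sim_ram[addr];
}

void board_write_byte(uint32_t addr, uint8_t value)
{
    assert(addr < SIM_RAM_SIZE);
    sim_ram[addr] = value;
}

/* Everything below is board-independent and uses only the byte routines. */
uint32_t read_long(uint32_t addr)
{
    uint32_t value = 0;
    int i;
    for (i = 3; i >= 0; i--) {           /* assemble little-endian */
        value = (value << 8) | board_read_byte(addr + (uint32_t)i);
    }
    return value;
}

void write_long(uint32_t addr, uint32_t value)
{
    int i;
    for (i = 0; i < 4; i++) {
        board_write_byte(addr + (uint32_t)i, (uint8_t)(value >> (8 * i)));
    }
}

void read_page(uint32_t addr, uint8_t *dst, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        dst[i] = board_read_byte(addr + (uint32_t)i);
    }
}

void write_page(uint32_t addr, const uint8_t *src, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        board_write_byte(addr + (uint32_t)i, src[i]);
    }
}
```

Because the page routines are only called on cache misses, the per-byte overhead here matters much less than it would in an uncached design.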
Ross.
Then I think we may as well both forget about it for now. When you have more time, let me know. Or send me a board and I'll add it to the backlog.
Ross.
You've given the higher-order matters a lot of detailed thought that I think would go unnoticed but for posts like that one. Thanks, and appreciated. I'll be mulling that over for a while, reconsidering some things in my own mind, likely for the better, and that is, of course, one of the best things about this forum.
Okay, I think I understand now. You already have code that runs in a separate COG that manages the cache. You need init/readbyte/writebyte functions that will run in that COG and return bytes from the external memory. Is that correct? I would actually think you'd want read/write cache line functions rather than a byte at a time since that would probably be faster.
I have been playing around with some memory board designs and this is a design that works for the 256x224 video driver and which I also think will work for catalina. So the board design can do double duty.
I am thinking of one Gadget Gangster tower doing Catalina, and the other running the Video.
P0-P23
P23 is chip select for the SD card
P22 is /WR on the ram chip
P21 is /RD on the ram chip
P20 selects the latch
The prop has direct access to blocks of 4096 bytes and there are 128 of these
The high bit on the latch is /OE on the ram chip but it probably is not needed as (I think) setting /RD and /WR high will set the ram chip to tristate.
If the ram is deselected, then the prop can talk to the sd card via P0-P2
There might be a mistake or two as this is a prototype but it is pretty similar to the design I have working on a breadboard.
The driver ought to be fairly easy to write and ought to run faster than the three latch dracblade.
I think I recall some discussion earlier where you said that the ram and the sd card were separate entities. I hope that is correct otherwise this circuit won't work for the sd card!
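The block arithmetic in the design above (128 blocks of 4096 bytes, with the block chosen through the latch) can be worked through in a few lines of C. This is only a sketch of the address split: the type ram_addr_t and the function split_address are made-up names, and how the block number and offset are actually driven onto the pins is not shown because the post does not specify it.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE   4096u                       /* bytes directly addressable  */
#define BLOCK_COUNT  128u                        /* blocks selected by the latch */
#define TOTAL_BYTES  (BLOCK_SIZE * BLOCK_COUNT)  /* 524288 bytes = 512 KB        */

typedef struct {
    uint8_t  block;    /* 0..127: fits in 7 bits, leaving the latch's high bit free */
    uint16_t offset;   /* 0..4095: position within the selected block               */
} ram_addr_t;

/* Split a flat address into the latch block number and in-block offset. */
ram_addr_t split_address(uint32_t addr)
{
    ram_addr_t out;
    assert(addr < TOTAL_BYTES);
    out.block  = (uint8_t)(addr / BLOCK_SIZE);   /* same as addr >> 12   */
    out.offset = (uint16_t)(addr % BLOCK_SIZE);  /* same as addr & 0xFFF */
    return out;
}
```

For example, address 0x12345 falls in block 0x12 at offset 0x345, and the last byte of the RAM is block 127, offset 4095.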
This design should work with Catalina. In Catalina the SD card driver code is completely separate from the XMM RAM access code. However (as in your design) it is often the case that these devices share some pins, which means both the Kernel and the SD card driver have to know not to access XMM RAM while an SD card request is in progress (and vice-versa), and also that it is necessary to activate and deactivate the respective device.
This is the purpose of the SHARED_SD and SHARED_XMM symbols in various Catalina components (e.g. Catalina_SD_Plugin.spin and Catalina_XMM.spin) - these symbols enable the use of the SD_Activate/SD_Tristate and XMM_Activate/XMM_Tristate functions (respectively) at the appropriate times.
Ross.
You are correct. My caching cog is actually VMCOG developed by Bill Henning. I just abstracted it a bit to use a common platform-independent XMM API. But the only API functions I actually call from this cog are XMM_Activate, XMM_Tristate, XMM_ReadPage and XMM_WritePage. The code is also complicated by the need to also support SPI Flash based XMM (which needs a few more functions) - but we can ignore that in this case.
Since I can easily write versions of XMM_ReadPage and XMM_WritePage that call simpler byte-oriented access functions, I thought it would be simpler for Jazzed to understand if I reduced it to the simplest possible set of necessary functions (e.g. I don't think I even need XMM_Tristate in his case - his board is SDRAM based, so I don't think he can share his XMM bus with other devices). However, it seems I only managed to confuse him even further.
Ross.
Yes! With C, encapsulation is the right way to make your code both maintainable and portable. Sprinkling the code with chunks of inline assembler, #ifdef statements and various compiler-specific pragmas and attributes is the wrong way!
Thank goodness someone gets it!
Ross.
For Flash based XMM RAM, I had to add a new set of API functions because some platforms (such as the C3) have both SRAM and Flash, and want to use them both as XMM RAM. I would have preferred to use the existing API functions to cater for both types of RAM, but code size issues in the kernel meant this was impossible. So instead I have added a new set of API functions specifically for Flash access. It doesn't matter that these functions need more code space, since they are only required in the cache cog (where there is plenty of space), not in the kernel itself (where there is none!):
XMM_FlashTristate - allow other devices to use the Flash bus
XMM_FlashOutByte - write a byte to Flash
XMM_FlashInByte - read a byte from Flash
XMM_FlashOutBits - write "n" bits to the Flash (used when erasing sectors etc)
What chipset is your new board? I currently support single bit serial flash chips on the C3 and the Morpheus, but I will soon be adding support for quad bit serial flash chips for Rayman's SuperQuad and RamPage boards.
If you want to start work on your own version, check out the existing implementations in the target directory - i.e. Morpheus_XMM.inc and C3_XMM.inc. If you implement the functions described above, your board will work with the caching cog. All you need to do is define the symbols FLASH and CACHED when you compile your C programs.
Ross.
From a performance perspective Flash is a perfect solution for faster boot time and cache page swaps (read-only). Once it's programmed it boots "in a flash" and does not require loading from SDcard (but it can be programmed from an SDcard file).
I posted a flash solution just before Rayman posted his. We both started working on it about the same time independently.
Rayman chose to use the ST parts; I'm using the Winbond parts, mainly because of availability. The cache driver I posted was for 2 of the QuadSPI Winbond parts in parallel to form an 8 bit bus. My SpinSocket Flash modules have devices on P0..7 for best performance. There are some differences between the devices for addressing. Having 2 QuadSPI parts is much faster than 1, but the code is a little trickier. Too bad you can't just drop it in.
I see. Then the best option is for me to get Rayman's boards working. When I post that code you can modify it to suit your own board.
Ross.
How does one specify COG code encapsulated? I know I can make a binary, assemble it, etc... Possible to do in the Prop Tool too. But, if one wanted to write COG code in Catalina, say for a math library, or video output, and use the tool in the way shown above, how does that happen, or is that some abuse of it? I personally don't see a lot of merit to drivers and such written in C, when PASM is so brilliant. And this is one of those reconsiderations.
I often teach engineering software modeling classes, for example. That is software like Solidworks (a popular competitor to me), Siemens NX (home turf), etc...
What I find fascinating, and have for quite some time now, is the basic problem spaces of editing work done, re-use, scaling, performance, etc... found in the spaces of software development and mechanical geometry development are remarkably similar, sharing many of the same forward create philosophy options, and core ideological differences in approach. Quite simply put, parametric geometry modeling is really no different from writing code, in that both tasks are filled with necessary abstractions that vary and that pose similar problems.
There is the "just do it" option, where basically anything is on the table, where people can make as big of a mess as they are inclined to do, and, along many "axes" diverging from that, what I would call "disciplines" where specific means and methods are favored, ideally delivering optimal, or favorable results, each at a cost and barrier to use value and trade-offs of various sorts.
Over the years of grappling with these things, I have been able to observe these dynamics play out in various ways that I find personally enlightening and valuable. This is no different, but for the fact that I am learning more than not right now. So, toward that end, I want to strongly differentiate any related commentary as such, not necessarily aligned with the greater meta-discussions in play here. When I want to make that distinction, I'll explicitly do so, leaving no ambiguity. At all other times, I am just an interested participant, looking to learn some stuff.
(and yes, I could very easily pursue a career in either legal or politics and see a high degree of success, but for both being distasteful to me for various reasons, though I have found those skill sets to be quite useful in managing everyday matters --just so you know the politics in the above are not lost on me )
...so, back to the question then.
It's a good question. The answer will be useful. Just plugging something in without needing to understand all the underlying details (encapsulating) would be perfect.
What?
Maybe I'm tired, I did not see what any of that long post had to do with the questions about Catalina or C or PASM.
Well, actually I did not understand any of it so perhaps it was relevant in some way I missed :)
Again, to use the analogy from modeling, I have good mastery of ALL techniques possible, and have even created a couple of my own, long before vendors got there, some of whom were decidedly annoyed at my white papers and feature requests. Parametric design is a lot of fun, though not always optimal, as this is exactly the same dynamic, which I seek to understand better. (and some of the vendors get that now, 5 years later... bunch of us are happy for that too)
Jazzed, There are always merits. What isn't always true is a mapping of those to a particular individual, or problem space. Agreed on the utility, which is why I asked, though I think the understanding is also of equal utility, which is exactly what got me to thinking on this some.
@Ross, my apologies for making a mess.
@Heater, it was relevant to the higher order and longer running conversation we've had on need for gcc, C vs C++, success of propeller, etc... I simply wanted to compartmentalize my query, avoiding that, to pursue my own enlightenment, got wordy, and well? Here we are. Best case, IMHO, is to ignore it, and carry on, unless you would like to expand some on my encapsulation vs in-line / modifiers comment and question, which I would enjoy and consider valuable.
Ross.
Let's say I want to author PASM COG code in Catalina. I don't want to use some outside tool, or assembler to build a BLOB, and link it in, or stuff some array with hex data. (and my use of BLOB is in the strict technical sense of "Binary Large OBject")
If we take the "in-line" approach, that can be as simple as a framework stub, where one can then stuff the code right there in the editor, happy fun! Prior to your post above, that's what I would want to do, or make the BLOB elsewhere, and drag it in however is most expedient.
So, how does one author encapsulated COG PASM code, where you can see the PASM nice and clean, adhering to the principles we discussed here? (encapsulation, portability, etc...)
And I'm asking because I really like SPIN + PASM. The only real downside is no in-line, but then again, there are many upsides, and the PASM environment is just sweet. Doing LMM, etc... isn't really as sexy either. I can however, launch Prop Tool and just do it, whatever it is, in Prop Tool. That's compelling for a lot of reasons.
Being able to author COG PASM code, in the context of Catalina, or any C environment would frankly be intriguing, and have many of the benefits, right along with LMM, etc... and fewer hassles. Right now, it seems to me that one must use two tools, or use a mix of C and other stuff that may or may not make sense, depending on a lot of things to do the same. Not that I think those things are bad, it's more like I'm just thinking what you wrote all the way through, wanting to just do the entire task, drivers on up, in Catalina, nothing else, like I would Prop Tool.
The program you think of as "Catalina" is really only a thin wrapper program around a whole suite of related tools - e.g. cpp, rcc, catbind, catoptimize, catdbgfilegen, homespun and srecord. You can invoke any or all of these programs independently (and I for one quite often do) - or you can just let Catalina invoke them all for you in turn (which is of course what most people do). So homespun is definitely not an "outside tool".
What about spinc? Is it an outside tool? I would argue "no" on the basis that it is also part of the Catalina "suite" of tools - it is just not invoked by Catalina itself - you currently always have to do so manually. However, I should point out that in the last release of Catalina I very nearly embedded the calling of spinc within Catalina so that it would be automatically invoked on any spin program source files you included on the Catalina command line (just as lcc is invoked on any C files). In the end I decided not to do so - but only because it didn't seem necessary (since invoking it separately is so easy - even from within Code::Blocks).
So - on the basis that both homespun and spinc are part of Catalina, and not "outside tools" then adding cog-based PASM to your C program is in fact only marginally more complex than adding LMM PASM.
Of course, the proviso is that the cog-based PASM executes on a different cog to the C code. But with the Propeller, this is the default model we are all used to, is it not? This is in fact the very same model that Spin uses. Actually, since you only have 496 instructions to work with in a cog, it would also be the default model for a purely cog-based PASM program as well (assuming someone wanted to write a PASM program larger than 496 instructions).
Next, the "BLOB" issue - well, the kernel itself is such a BLOB - it just happens to be one that is managed invisibly to you as a Catalina user. But when you load a cog-based PASM program in Catalina (via spinc) you don't see the actual BLOB either - so I don't see that BLOBs are an issue per se.
You cannot take the "in-line" approach for cog-based PASM with an LMM compiler. There is typically no space available in the cog executing your C code to execute any cog-based PASM.
I suppose you could "reserve" some cog space in your LMM kernel specifically for executing cog-based PASM and then invoke it like a function - this would be easy enough to accomplish in a small way (about the same complexity as FCACHE, supporting blobs of PASM code of perhaps 100 longs or so). However, any benefit you could possibly get from this technique over and above executing the same code in another cog is very questionable - it may in fact be slower to do it this way since you would have to load the code into your own cog one long at a time, whereas you can load it into another cog using a single "coginit" instruction. Also remember that the code cannot be made permanently resident in the cog at compile time - it has to be able to be loaded as it is needed (e.g. what happens if you have a main function containing inlined code which then calls a pre-compiled library function that itself contains inlined code? Which code do you inline, and what do you do with the other code?).
So why not instead simply load your cog-based PASM in another cog, and also thereby remove all the limitations you would have to place on such "inlined" code?
Here, you are talking about a "convenience" issue - not a language issue. For instance, if I did add the ability to invoke the spinc utility to Catalina, and "tweaked" the internal build engine in Code::Blocks to also understand that Catalina should also be invoked on any files with the .spin extension, and also added knowledge of the Spin syntax to the Code::Blocks editor ... then you could combine all your Spin programming, your C programming and combined Spin/PASM/C programming into a single tool.
Could I do so? Of course!
Will I do so? Probably not - there would not be enough demand to justify it, and my time is limited.
However, I will post a small tutorial about how to use cog-based PASM (similar to the one I posted about LMM PASM) when I have some time.
Ross.
I've got this idea of trying to write pasm using a simple subset of C. Assembly by its nature contains lots of jumps, and I have been pondering the absolute minimum number of structures you could have to eliminate all jumps, but still produce code that is the most efficient pasm code that can be produced anyway.
Consider an IF statement. In pasm it is two lines of code - a test for a condition, and a jump to skip some code if the condition is not true.
That translates very easily into a C syntax.
And you can do that for a>b, a<b, a=b and a notequal b using the wz or wc flags.
The end result is the jump disappears.
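To make the IF idea concrete, here is a small sketch of what such a "cog C" fragment might look like, with the two-instruction PASM pattern it stands for shown in the comment. No such precompiler exists in this thread; the function name max_u32 and the label skip are made up for illustration, and the comparison is assumed to be unsigned.

```c
#include <assert.h>
#include <stdint.h>

/* In PASM, "do the body only if a > b" (unsigned) is a compare plus a
 * conditional jump over the body:
 *
 *               cmp     a, b        wc, wz   ' C set if a < b, Z set if a == b
 *       if_be   jmp     #skip                ' below-or-equal: skip the body
 *               mov     r, a                 ' body
 *   skip
 *
 * Written in a C-like syntax, the jump and the label disappear. */
uint32_t max_u32(uint32_t a, uint32_t b)
{
    uint32_t r = b;
    if (a > b) {       /* the cmp / if_be jmp pair */
        r = a;         /* the body                 */
    }
    return r;
}
```

The same shape covers a<b, a==b and a!=b by choosing a different condition on the jump. For a one-instruction body like this one, PASM could also drop the jump entirely and make the mov itself conditional (if_a mov r, a).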
There are various loop structures that can hide all the jumps.
I think you also need to add the 'switch' command for multiple IF statements in a row where you want the code to fall through to the end after one of the conditions has been met. If you have 'break' then it falls to the end, and if there is no 'break' it works through them one at a time.
Add and Subtract convert easily from C to Pasm.
Rotate also converts easily.
Then there are commands that are unique to the propeller, e.g. dira. But you can write them in a C like syntax with fake function calls:
mov dira,myvariable becomes dira(myvariable)
ditto wrlong, rdlong etc.
These can be fake function calls, or they can even be real, but the function just contains a comment. Indeed, having a comment about what a wrlong is might be quite useful. The precompiler just ignores the function wrlong.
At the end of the day maybe you can't replicate every sort of cunning pasm trick (jmpret and multithreading) but you could replicate all the sort of pasm code I write.
Why do this?
Well, I kind of like the idea of C for cogs, so long as the code produced is truly as lean as pasm code.
The big question - is it portable? Well not as such. Much of it would be, and would run on any C compiler. The bits that would not run would be the bits that are invoking propeller unique commands, like wrlong. But if you wrote this in a C type of syntax, wrlong(destination,source), then for a simulator you could write a function to write that value to a 32k array you might call hub[32768].
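A minimal sketch of that simulator idea follows. Only the array name hub[32768] and the function name wrlong come from the post; the parameter names, the matching rdlong, the little-endian byte order and the handling of the low address bits are assumptions about how one might model the Propeller's hub RAM on a PC. The parameter order here follows the PASM operand order (value first, then hub address).

```c
#include <assert.h>
#include <stdint.h>

#define HUB_SIZE 32768u
static uint8_t hub[HUB_SIZE];        /* simulated 32k hub RAM */

/* Simulated wrlong: store a 32-bit value at a hub address.
 * The two low address bits are ignored so that longs stay aligned
 * (a modelling assumption about the real instruction). */
void wrlong(uint32_t value, uint32_t hub_addr)
{
    uint32_t a = hub_addr & ~3u;
    int i;
    assert(a + 3u < HUB_SIZE);
    for (i = 0; i < 4; i++) {
        hub[a + (uint32_t)i] = (uint8_t)(value >> (8 * i));  /* little-endian */
    }
}

/* Simulated rdlong: fetch a 32-bit value from a hub address. */
uint32_t rdlong(uint32_t hub_addr)
{
    uint32_t a = hub_addr & ~3u;
    uint32_t value = 0;
    int i;
    assert(a + 3u < HUB_SIZE);
    for (i = 3; i >= 0; i--) {
        value = (value << 8) | hub[a + (uint32_t)i];
    }
    return value;
}
```

On a real Propeller build the precompiler would replace these calls with the actual instructions, so the same source could run in both places.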
Catalina is already made up of multiple pre-compilers so this might just add another one. Input would be a giant text file including cog C, and output is the Cog C bits converted to inline arrays ready for cognew statements.
In a sense this bends the rules by adding in pasm code inline. But it would be clearer to read, because all the pasm code would look like C, so the entire program would look like a C program.
I might do some more experiments and see where this might go...
Bob Anderson wrote an Augmented Assembly Code pre-processor that may be a better solution to what you're trying to do.
It takes PASM and adds stuff, which in general is a better approach than taking C and removing stuff.
Download the object and check out the manual.
I haven't heard from Bob for a while (he didn't answer my last email) which is a shame, because he did some very clever stuff.
Ross.
It is written in C# which adds a nice touch, because you have a C program being processed and compiled by another C program. If you have pasm code that you can write in C knowing that it is as optimised as it can be, then it can be possible to think about a total C solution for the propeller.
Off to study that code... thanks!
Thanks for your great response. The tutorial would be appreciated when you've got a chunk of time suitable for it.
You very correctly called me out on "outside tool". Suffice it to say, a collection of tools, where there is one focus point most of the time, is "a tool" to me.
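This:
So why not instead simply load your cog-based PASM in another cog, and also thereby remove all the limitations you would have to place on such "inlined" code?
is simply a matter I had not fully considered, and your consideration of it, as well as presentation of it makes perfect sense. Agreed.
And:
Here, you are talking about a "convenience" issue - not a language issue.
is absolutely true, my apologies if I appeared to frame it as a language matter. I simply was trying to think through some things I do now, mapping them to C on Propeller. I am considering some larger code projects, where SPIN + PASM (SPASM!) isn't going to make any real sense, thus my query.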
Thanks again. I really do appreciate your clear explanations, and your very smart questions back to clarify what is murky. It doesn't take a lot of dialog to learn something from you Ross. (A quality I want to be sure and note, because it's of high value.)
Ah! Thanks for reminding me - I had clean forgotten I promised that! I've been busy in the guts of Catalina working on release 3.3 - I will try and include a tutorial about this either with the new release, or shortly thereafter.
For those that are interested, the main feature of the next release of Catalina will be support for Rayman's fantastic FlashPoint Propeller expansion modules - the SuperQuad and the RamPage.
Now everyone with a few Prop pins to spare will be able to run Catalina C programs up to 2Mb in size from XMM FLASH and/or SRAM - for just a few dollars (when I think of the amount of money I spent on the original Hydra eXTreme 512k RAM expansion board ...!)
Since I was in there messing about anyway, I have also decided to refactor, formalize and document the API I already had in place for SPI FLASH and SPI RAM (e.g. on the C3). This will make the job of adding new XMM implementations to Catalina completely trivial - whether it be parallel, single-bit serial, or quad-bit serial, and SRAM, DRAM or FLASH.
Now there are three different aspects to the XMM API:
- The existing XMM API, for small and fast RAM solutions (such as parallel SRAM). This will usually give best performance, but the entire API has to be small enough to fit into the space available in the XMM kernel. The functions that must be provided are:
- XMM_Activate
- XMM_Tristate
- XMM_ReadLong
- XMM_WriteLong
- XMM_ReadMult
- XMM_WriteMult
- XMM_ReadPage
- XMM_WritePage
- A simplified XMM API, for slower or more complex RAM solutions (such as serial SRAM, or DRAM). This API is designed for use with the Caching XMM driver. Since the code no longer resides in the kernel, it is good for cases where the memory is very complex to access (as it tends to be for serial RAM), or where the speed of the memory is slow (ditto). The functions in this API are a subset of the ones described above, which means that any existing XMM APIs can also use the Caching XMM driver (since the cache can also speed up slower parallel XMM implementations). The functions that must be provided for Caching access are:
- XMM_Activate
- XMM_Tristate
- XMM_ReadPage
- XMM_WritePage
- A new XMM API specifically for FLASH-based solutions. The complexity of writing to FLASH means the Caching XMM driver must always be used, but for solutions that have both FLASH and SRAM available, you can provide both these functions and the previous four functions, then use both types of RAM simultaneously. The functions that must be provided for FLASH access are:
- XMM_FlashActivate
- XMM_FlashTristate
- XMM_FlashReadPage
- XMM_FlashWritePage
- XMM_FlashEraseChip
- XMM_FlashEraseBlock
- XMM_FlashUnprotect
- XMM_FlashWriteEnable
There are also some pretty good performance enhancements in the pipeline. Some of these may be included in the next release as well.
Ross.