Language Interoperability (was Compiler and interpreter calling conventions)
ersmith
Posts: 6,053
One thing we haven't really discussed yet is how to call functions and pass parameters, and what registers / cog memory locations have special uses. This kind of thing is called an Application Binary Interface (ABI) and is usually standardized for a processor so that different languages can interoperate.
The first thing we should probably nail down is what register/location is used for the stack pointer, what direction the stack grows in, and what alignment is required for the stack. For example, an obvious proposal is: PTRB is the stack pointer, it grows up (push increments the stack pointer, pop decrements it), and should always be long aligned. The choices are somewhat arbitrary, but if we don't have a standard languages won't be able to call each other. C compilers typically specify that the stack grows down (from high addresses towards low) and the malloc heap grows up (from low towards high) but it's not required, and we can change this to match whatever Spin does.
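As a rough illustration of the proposal above, here is a minimal C model of a stack that grows up and stays long aligned. The names Machine, HUB_SIZE, push_long and pop_long are hypothetical and exist only for this sketch; the only things taken from the post are the direction of growth and the 4-byte alignment rule.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: hub RAM is a byte array, ptrb is a byte address. */
enum { HUB_SIZE = 1024 };

typedef struct {
    uint8_t  hub[HUB_SIZE];  /* simulated hub memory */
    uint32_t ptrb;           /* stack pointer: address of the next free long */
} Machine;

/* Push: store at ptrb, then advance by 4 (stack grows up). */
static void push_long(Machine *m, uint32_t value)
{
    assert(m->ptrb % 4 == 0);         /* stack must stay long aligned */
    assert(m->ptrb + 4 <= HUB_SIZE);  /* no overflow in this model */
    memcpy(&m->hub[m->ptrb], &value, 4);
    m->ptrb += 4;
}

/* Pop: step back by 4, then load the long that was on top. */
static uint32_t pop_long(Machine *m)
{
    uint32_t value;
    assert(m->ptrb % 4 == 0 && m->ptrb >= 4);
    m->ptrb -= 4;
    memcpy(&value, &m->hub[m->ptrb], 4);
    return value;
}
```

Because every push and pop moves the pointer by exactly 4, the stack stays aligned as long as it starts aligned, which is why the convention only needs to state the initial requirement.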
I suspect that for efficiency we will need to have two standard forms of functions -- bytecode (stack based) and compiled (cog memory based). Fastspin already does this, calling them "stackcall" and "fastcall". Pushing parameters onto the stack makes things easier for interpreters, but it's not very efficient. Consider a function to add two variables:
    PUB sum(a, b)
      return a+b

If we compile this to PASM with a stack based calling convention we get something like:

    rdlong x0, --ptrb   ' pop b
    rdlong x1, --ptrb   ' pop a
    add    x0, x1
    wrlong x0, ptrb++   ' push sum

With a register based convention we get:

    mov result1, arg1
    add result1, arg2

where "result1", "arg1", and "arg2" are some predefined COG memory locations.
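The difference between the two conventions can also be modelled in C. This is only a sketch: the CogModel struct, DEMO_STACK_LONGS, and the two function names are assumptions made for illustration, and the stack here is indexed in longs rather than bytes.

```c
#include <assert.h>
#include <stdint.h>

enum { DEMO_STACK_LONGS = 64 };

/* Hypothetical machine state: a stack that grows up, plus a few
   fixed slots standing in for predefined COG memory locations. */
typedef struct {
    uint32_t stack[DEMO_STACK_LONGS];
    uint32_t sp;                      /* index of the next free long */
    uint32_t arg1, arg2, result1;
} CogModel;

/* Stack based convention: pop b, pop a, push a+b.
   Three stack accesses for one addition. */
static void sum_stackcall(CogModel *c)
{
    assert(c->sp >= 2);               /* both operands must be present */
    uint32_t b = c->stack[--c->sp];
    uint32_t a = c->stack[--c->sp];
    c->stack[c->sp++] = a + b;
}

/* Register based convention: operands and result live in fixed slots,
   so the stack is not touched at all. */
static void sum_fastcall(CogModel *c)
{
    c->result1 = c->arg1;
    c->result1 += c->arg2;
}
```

In the stack version the caller also has to push both operands first, so the real cost gap is larger than the function bodies alone suggest.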
Fastspin uses fastcall (the register based convention) by default, but if it sees that a function does something tricky like taking the address of a parameter it switches over to stackcall so that the stack layout will match Spin's. Of course this is for Spin1 compatibility, so we can change it for Spin2.
Eric
Comments
So the top most element of the stack is directly added to the second element and then the top one dropped.
This avoids a lot of stack juggling and HUB access.
Sure, we can do that too (I hope Spin2 will). In that case we'll need to define exactly which COG memory locations will hold the top elements and how many there are.
My point is that *now* is the time to define these kinds of things so that languages for P2 can interoperate. I think it's very much something that Chip should take into account as he designs Spin2.
Thanks,
Eric
PTRB is used as a stack, and it grows up (from low memory to high)
The stack must always be 32 bit aligned
Subroutine/function call return address is passed on the stack (so use CALLB)
Parameters are passed in COG memory at locations 0x1EC..0x1EF. If there are more than 4 parameters then extra ones are passed on the stack.
Return values are sent back in the same COG memory locations as parameters.
(Having too many options is sometimes a problem... one reason to establish some standards early!)
Eric
I suspect it will get some refining when we actually start using it.
Good point. OK, how about PB for return addresses?
So the updated stake in the ground is:
Stack pointer is PTRB, stack grows up, stack is always 32 bit aligned.
Subroutine calls use CALLD and put return address in PB
Up to 8 arguments go in $1e8-$1ef; any additional arguments go on the stack
Return values in $1e8, $1e9
$1e8-$1ef and PB are scratch registers (not preserved by the subroutine). All other COG memory locations must be saved/restored. (In practice the subroutine is almost certainly going to save PB in order to jump back to it, but the caller shouldn't count on this.)
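A small C model may help show how a caller would place arguments under this updated proposal. The ArgCog struct and marshal_args are hypothetical names, and the order in which extra arguments are pushed (left to right) is an assumption, since the proposal does not specify it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    COG_LONGS = 512,        /* size of simulated COG memory */
    ARG_BASE = 0x1e8,       /* first argument register in the proposal */
    ARG_REGS = 8,           /* $1e8..$1ef */
    EXTRA_STACK_LONGS = 64  /* size of the simulated stack (assumption) */
};

typedef struct {
    uint32_t cog[COG_LONGS];            /* simulated COG memory */
    uint32_t stack[EXTRA_STACK_LONGS];  /* simulated stack, grows up */
    uint32_t sp;                        /* index of the next free long */
} ArgCog;

/* Place the first 8 arguments in $1e8..$1ef and push the rest.
   Returns the number of longs pushed on the stack. */
static size_t marshal_args(ArgCog *c, const uint32_t *args, size_t nargs)
{
    size_t pushed = 0;
    for (size_t i = 0; i < nargs; i++) {
        if (i < ARG_REGS) {
            c->cog[ARG_BASE + i] = args[i];
        } else {
            assert(c->sp < EXTRA_STACK_LONGS);
            c->stack[c->sp++] = args[i];  /* assumed order: left to right */
            pushed++;
        }
    }
    return pushed;
}
```

The callee would then leave its results in $1e8 and $1e9, overwriting the first two arguments, which is consistent with those locations being scratch registers.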
Chip, does this sound reasonable for calling Spin functions? Or would they prefer a different calling convention?
Eric
That sounds fine.
Man, that's hard to think about. I don't know right now. I suppose interfacing to Spin from C will mean that some kind of context will need to be set up.
Yep. It's not just C, it's any PASM code. It didn't really come up in P1 because there was no way to run PASM in the same COG as the Spin interpreter, but with HUBEXEC it should be feasible for PASM to call Spin (and vice-versa).
Traditionally C++ passes a pointer to the object as the first parameter, so a C++ method gets implemented the same as a C function that takes the object pointer as an explicit first argument.

I forget the terminology Spin uses, but IIRC Spin1 has a pointer for the method table and a pointer to the instance data. The instance data is what C++ passes in the first parameter. The method table is normally implicit -- if a method is part of object "foo" then it knows what method table it needs (and in fact the compiler normally elides the method table and just inserts direct jumps to the methods). If the methods aren't known at compile time for some reason (e.g. virtual methods) then a pointer to the method table is placed in the first long of the object.
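The object-pointer convention described above might look like this in C. Counter, CounterMethods and Counter_add are made-up names for illustration only; the sketch shows a method taking the instance pointer as its first argument, and a method table pointer stored in the first field of the object for the case where the target is not known in advance.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object. A C++ equivalent would be roughly:
     class Counter { int count; public: virtual int add(int n); };  */
typedef struct Counter Counter;

/* Method table, only needed when calls cannot be resolved statically. */
typedef struct {
    int (*add)(Counter *self, int n);
} CounterMethods;

struct Counter {
    const CounterMethods *methods;  /* first field: method table pointer */
    int count;                      /* instance data */
};

/* The method itself: the object pointer is an explicit first parameter. */
static int Counter_add(Counter *self, int n)
{
    assert(self != NULL);
    self->count += n;
    return self->count;
}

static const CounterMethods counter_methods = { Counter_add };
```

A direct call is `Counter_add(&c, 5)`; a call through the table is `c.methods->add(&c, 5)`. Both pass the same instance pointer, so a calling convention only has to say which argument slot carries it.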
Eric
We have cog(s) running other things like UART(s), Video, Keyboard, Mouse, I2C, SPI, ESC drivers, Sound, motors, sensors, etc, etc.
Spin and PASM work well together, but there is no one standard interface. When it comes to C, both Catalina and GCC struggled. Ross proposed a mailbox standard, but no-one wanted a standard, so he did his own thing in Catalina.
I also did my own thing with my Propeller OS.
Now there is also Blockly, PropBasic, Forth, Tachyon, plus others to add into this mix. We need some form of mailbox standard.
Seemed that only Ross, Bill and I (maybe I missed a couple) wanted an interface standard, but the vast vocal majority didn't want one. Bill even suggested that it be baked into P2 but that ship has sailed.
Perhaps we can try again ???
Communicating through FIFOs is pretty easy, though it may not be the best approach for all cases. I don't know enough to even know how to think about this, at this point.
The solution was always to pass the values with a mailbox mechanism when the cog starts up, instead of patching the cog image before starting it.
How this mailbox is set up is not really important, every language can just generate the same order of variables and sizes in memory as the original driver did. So no standard needed.
Just the rule: use mailboxes, not patches.
Ross' proposal was heavily tied to his Catalina system and required some registering of the mailboxes. It was much too complicated.
Andy
This was not the real problem. The issue was passing data between programs, not initialising the code where spin patches the cog code before loading.
Patching the code before loading was, for a large part, because of cog code space. This can now be done by a call to hubexec if space is tight.
The real preference was for a simple standard use of a mailbox for each cog, that could preferably be located in hub at a standard location.
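One possible shape for such a per-cog mailbox, sketched in C. Everything here is an assumption made for illustration: the three-long layout, the rule that command 0 means idle, and the names Mailbox, mbox_post, mbox_service and double_handler. The static array stands in for a block at an agreed hub address, with one entry for each of the P2's eight cogs.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_COGS = 8 };
enum { MBOX_IDLE = 0 };  /* command 0 means "no request pending" */

typedef struct {
    volatile uint32_t command;  /* nonzero = request; cleared when done */
    volatile uint32_t param;
    volatile uint32_t result;
} Mailbox;

/* Stand-in for a table at a fixed, agreed hub location. */
static Mailbox mailboxes[NUM_COGS];

/* Client side: post a request to the mailbox owned by a cog. */
static void mbox_post(int cog, uint32_t command, uint32_t param)
{
    assert(cog >= 0 && cog < NUM_COGS && command != MBOX_IDLE);
    Mailbox *mb = &mailboxes[cog];
    mb->param = param;
    mb->command = command;  /* written last so the request appears complete */
}

/* Driver side: service one pending request. Returns 1 if one was handled. */
static int mbox_service(int cog, uint32_t (*handler)(uint32_t, uint32_t))
{
    Mailbox *mb = &mailboxes[cog];
    uint32_t cmd = mb->command;
    if (cmd == MBOX_IDLE)
        return 0;
    mb->result = handler(cmd, mb->param);
    mb->command = MBOX_IDLE;  /* signals completion to the client */
    return 1;
}

/* Example handler for the sketch: ignores the command, doubles the param. */
static uint32_t double_handler(uint32_t command, uint32_t param)
{
    (void)command;
    return param * 2;
}
```

This is a single-threaded model; it says nothing about ordering or locking between real cogs. The point is only that a layout this small is something every language could read and write without needing to know how the driver on the other side was compiled.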
I agree that Ross went a little overboard, but that in no way was why it failed to get any traction.
The noisy ones killed any discussion dead in its tracks a number of times. We just gave up.
Now is the time to try again, but I just don't feel like wasting my time again if no-one is interested!!!
So what you want to define is a standard for Operating Systems on the P2: how background services like Keyboard and Displays can be accessed in a proper way by applications compiled in different languages.
I think the chance for such a standard is much higher for the P2, because we do not have such limited resources as on the P1.
But I think it's too early now to think about Operating Systems. Or does the definition of addresses for mailboxes, or of the parameters they must contain, have any impact on the design of a language?
I must admit that I don't fully understand what Eric wants to achieve with a common calling convention. How does that help to call Spin functions from C for example?
Spin needs the bytecode interpreter in Cog and LUT; C does not, or maybe needs a different one for CMM. Then C needs to switch to XBYTE mode when calling the Spin method and back to HubExec after return. So C has to handle calls to Spin code differently from calls to C code anyway.
Does C know about the Spin source code or just libraries? Does it compile this Spin code, or is the Spin code already in the P2 memory, compiled and written by a Spin tool? How does it know the addresses of the methods then?
Questions about questions....
Andy
This whole thing may not be too complicated, after all, but it's hard to think clearly about. Usually these problems settle when you realize where, exactly, the pie must be cut.
Perhaps a more cogent example isn't C calling Spin, but Spin calling C (or PASM in general). In P1 Spin calling PASM required that the Spin code launch a new COG, but there's no reason that has to be the case in P2. Suppose the Spin code is almost fast enough, and you just need one or two key routines recoded in PASM. That PASM might be written by hand, or it might be generated by the C compiler (as PASM output, or as a binary blob), or it might even be generated automatically by a next generation Spin compiler that can output PASM for some functions if the programmer puts a keyword on them. The details of how the Spin and C get linked together can be worked out later. Right now I'm just trying to solve the low level interface. If there's a standard way for Spin to call out to PASM (and ideally vice-versa) then all the languages can use this and it would be a big start to getting them to work well together.
Pretty much every other microprocessor has a standard calling convention documented along with the instruction set. I'm proposing that we do the same for the P2.
I should also clarify that I don't suggest that PASM code must always use the standard calling convention. Within a self-contained PASM block programmers or compilers will use whatever conventions they want. It's only at interfaces to other code that the calling convention comes into play. We can use some keyword like "stdcall" to indicate functions that need this convention.
Eric
Yes this makes a lot of sense.
I think we also need to define which registers can be safely used by the PASM code and which ones need to be left unmodified so as not to break the function that calls the PASM code.
Maybe also whether a part of the LUT is always available for such "stdcall" PASM routines (the higher 256 longs for example).
Andy
I imagine a Spin2 keyword that just takes an address and a data param, and it just does a CALLPA/PB to the address with the data in PA/PB.
Going the other way is trickier, because you do need the Spin interpreter in a cog, but since David worked that out for P1, we could just do the same on P2, but get it to be more built in or defined or whatever with PropGCC?
Stack grows down.
In the compilers I've worked on, I grew the stack down, and heap grows up.
By growing down, you can push return address, arguments, then have locals, and by using PTRA as the frame pointer, use a small positive offset to access the current function's information on the stack.
This of course is less of an issue with both positive and negative offsets, other than the potentially larger reach of using just a positive offset.
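The frame layout described above can be modelled in C as follows. This is a sketch under assumptions: DownStack, enter_frame and frame_slot are hypothetical names, the stack is indexed in longs, and the frame pointer is set to the lowest address of the frame so that every slot is reached with a non-negative offset.

```c
#include <assert.h>
#include <stdint.h>

enum { FRAME_STACK_LONGS = 64 };

typedef struct {
    uint32_t mem[FRAME_STACK_LONGS];
    uint32_t sp;  /* index of the lowest occupied long; starts at the top */
    uint32_t fp;  /* frame pointer (the role PTRA plays in the post) */
} DownStack;

/* Push on a stack that grows down: decrement, then store. */
static void push_down(DownStack *s, uint32_t v)
{
    assert(s->sp > 0);
    s->mem[--s->sp] = v;
}

/* Build a frame: return address, then arguments, then zeroed locals.
   Afterwards fp points at the lowest long of the frame. */
static void enter_frame(DownStack *s, uint32_t ret_addr,
                        const uint32_t *args, uint32_t nargs,
                        uint32_t nlocals)
{
    push_down(s, ret_addr);
    for (uint32_t i = 0; i < nargs; i++)
        push_down(s, args[i]);
    for (uint32_t i = 0; i < nlocals; i++)
        push_down(s, 0);
    s->fp = s->sp;
}

/* Access a frame slot at a small positive offset from the frame pointer. */
static uint32_t *frame_slot(DownStack *s, uint32_t offset)
{
    assert(s->fp + offset < FRAME_STACK_LONGS);
    return &s->mem[s->fp + offset];
}
```

With two arguments and two locals, offsets 0 and 1 are the locals, 2 and 3 are the arguments (last argument first), and 4 is the return address, all reached without a negative offset.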
It has nothing to do with operating systems. It's all about interoperability between various cog objects. Some of those objects will be in Spin, Spin + PASM, PASM, C, Tachyon, etc.
We need a standard cog to cog interface / mailbox.
There is no stack that can be used here.
The prop is a multi-processor using a mix of languages, and the different languages/processors need to communicate.
Most micros are still single core, and interoperability between them is totally different, yet that is what this thread's focus seems to be about. It's neglecting the main use of the prop.
Even calling between compiled languages like C to Pascal or vice versa.
Heck, even calling from C to C++ and C++ to C is a pain.
Then mixing a compiled language with an interpreted one. Sure you can start a Lua interpreter from C then interact with it, but that is a pain. Or perhaps you can add C functionality to Python or Javascript. Also a pain.
Then the killer, you want something like Lua to make calls to Python or vice versa. Good luck with that.
Microsoft's .Net framework was supposed to make it easy to use a bunch of different languages in a single program. I have no idea how well it works but I'm sure it's not coming to the Propeller soon.
As such, I think the only hope of language interoperability in a single Propeller project is cog to cog communication. If that were standardized, perhaps even baked into the hardware, it would stand a chance.
Heater is right that most inter-language mechanisms are a pain to work with; exposing actual functions and parameter passing can be so different between languages that you can't make one solution to cover them all.
But if you just have a built in mailbox/message mechanism, then each language can expose that how it sees fit. It's not the same as having Spin call a C function directly, or vice versa, but it can accomplish the same end result.
Given that language A calling language B calling language C ... X, Y, Z is a pipe dream (never mind vice versa or any other random combination), that leaves us with inter-process communication.
That brings us to a debate that is a decade old on this forum: how to reuse PASM drivers, written for Spin objects and running in a COG, from C, first with Catalina and then with GCC.
At the time nobody could agree on a standard way to do this.
I don't see any hope of improvement.
There is still the question: how do you actually build a single project, made from languages A, B, C ... X, Y, Z, into a single binary for download?