Language Interoperability (was Compiler and interpreter calling conventions)

ersmith · 2017-04-09 01:25

One thing we haven't really discussed yet is how to call functions and pass parameters, and what registers / cog memory locations have special uses. This kind of thing is called an Application Binary Interface (ABI) and is usually standardized for a processor so that different languages can interoperate.

The first thing we should probably nail down is what register/location is used for the stack pointer, what direction the stack grows in, and what alignment is required for the stack. For example, an obvious proposal is: PTRB is the stack pointer, it grows up (push increments the stack pointer, pop decrements it), and should always be long aligned. The choices are somewhat arbitrary, but if we don't have a standard, languages won't be able to call each other. C compilers typically specify that the stack grows down (from high addresses towards low) and the malloc heap grows up (from low towards high), but it's not required, and we can change this to match whatever Spin does.

I suspect that for efficiency we will need to have two standard forms of functions -- bytecode (stack based) and compiled (cog memory based). Fastspin already does this, calling them "stackcall" and "fastcall". Pushing parameters onto the stack makes things easier for interpreters, but it's not very efficient. Consider a function to add two variables:

PUB sum(a, b)
  return a+b

If we compile this to PASM with a stack based calling convention we get something like:

   rdlong x0, --ptrb   ' pop b
   rdlong x1, --ptrb   ' pop a
   add    x0, x1       ' x0 = b + a
   wrlong x0, ptrb++   ' push sum

With a register based convention we get:

   mov result1, arg1
   add result1, arg2

where "result1", "arg1", and "arg2" are some predefined COG memory locations.
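To make the difference concrete, here is a rough C model of the two conventions. It is only a sketch: the array `hub_stack`, the index `sp` (standing in for PTRB, counted in longs) and the wrapper names are made up for illustration and are not part of fastspin or Spin.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a small array stands in for the hub RAM stack. */
enum { STACK_LONGS = 64 };
uint32_t hub_stack[STACK_LONGS];
uint32_t sp = 0;                 /* stands in for PTRB, counted in longs */

/* Push increments the stack pointer, so the stack grows up. */
void push(uint32_t v) { assert(sp < STACK_LONGS); hub_stack[sp++] = v; }

/* Pop pre-decrements, mirroring "rdlong x, --ptrb". */
uint32_t pop(void) { assert(sp > 0); return hub_stack[--sp]; }

/* stackcall: arguments arrive on the stack, the result is left on the stack. */
void sum_stackcall(void)
{
    uint32_t b = pop();
    uint32_t a = pop();
    push(a + b);
}

/* fastcall: arguments and result live in fixed "COG" locations. */
uint32_t arg1, arg2, result1;

void sum_fastcall(void)
{
    result1 = arg1;              /* mov result1, arg1 */
    result1 += arg2;             /* add result1, arg2 */
}

/* Caller side for each convention. */
uint32_t call_sum_stack(uint32_t a, uint32_t b)
{
    push(a);
    push(b);
    sum_stackcall();
    return pop();
}

uint32_t call_sum_fast(uint32_t a, uint32_t b)
{
    arg1 = a;
    arg2 = b;
    sum_fastcall();
    return result1;
}
```

Both wrappers compute the same sum; the stack version costs four memory accesses on the modelled hub stack per call, while the register version touches none.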

Fastspin uses fastcall (the register based convention) by default, but if it sees that a function does something tricky like taking the address of a parameter it switches over to stackcall so that the stack layout will match Spin's. Of course this is for Spin1 compatibility, so we can change it for Spin2.

Eric

MJB · 2017-04-09 11:21

To speed things up, the TACHYON byte/word code engine keeps the top few elements of the stack in fixed registers in COG. Then the ADD code looks like this:

' + ( n1 n2 -- n3 ) Add top two stack items together and replace with result
PLUS                    add     tos+1,tos wc
                        jmp     #DROP

So the topmost element of the stack is added directly to the second element, and then the top one is dropped.
This avoids a lot of stack juggling and HUB access.
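The idea can be modelled in C roughly as follows. This is a sketch under assumptions, not Tachyon's actual layout: the number of cached items (two), the `Stack` type and the `hub_accesses` counter are all invented here to show where hub traffic is avoided.

```c
#include <assert.h>
#include <stdint.h>

enum { CACHED = 2, SPILL_LONGS = 32 };

typedef struct {
    int32_t tos[CACHED];         /* tos[0] = top, tos[1] = second ("COG registers") */
    int32_t spill[SPILL_LONGS];  /* deeper items; stands in for hub RAM */
    int     depth;               /* total number of items on the stack */
    int     hub_accesses;        /* counts reads/writes of the spill area */
} Stack;

void stack_push(Stack *s, int32_t v)
{
    if (s->depth >= CACHED) {    /* cache full: spill the second item */
        assert(s->depth - CACHED < SPILL_LONGS);
        s->spill[s->depth - CACHED] = s->tos[1];
        s->hub_accesses++;
    }
    s->tos[1] = s->tos[0];
    s->tos[0] = v;
    s->depth++;
}

void stack_drop(Stack *s)
{
    assert(s->depth > 0);
    s->tos[0] = s->tos[1];
    if (s->depth > CACHED) {     /* refill the second slot from memory */
        s->tos[1] = s->spill[s->depth - CACHED - 1];
        s->hub_accesses++;
    }
    s->depth--;
}

/* ( n1 n2 -- n3 ): add the top into the second item, then drop the top. */
void stack_plus(Stack *s)
{
    assert(s->depth >= 2);
    s->tos[1] += s->tos[0];
    stack_drop(s);
}
```

With only two items on the stack, `stack_plus` never touches the spill area at all; memory is read only when a deeper item has to be pulled up into the cache.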

ersmith · 2017-04-09 15:10

MJB wrote: »

To speed things up, the TACHYON byte/word code engine keeps the top few elements of the stack in fixed registers in COG.

Sure, we can do that too (I hope Spin2 will). In that case we'll need to define exactly which COG memory locations will hold the top elements and how many there are.

My point is that *now* is the time to define these kinds of things so that languages for P2 can interoperate. I think it's very much something that Chip should take into account as he designs Spin2.

Thanks,
Eric

Dave Hein · 2017-04-09 17:40

Chip has already changed the Spin2 interpreter to keep the top stack item in a register. Optimized C code will keep stuff in registers and avoid the stack as much as possible. Forth code uses the stack for every instruction, so it is beneficial to keep the top couple of elements in registers.

ersmith · 2017-04-11 22:14

I changed the thread title to make it clear that the focus of my question wasn't "what is the best calling convention?" (although that is relevant) but rather "can we pick one or a few calling conventions to be standard on the P2?" The Propeller didn't have any standard calling convention, and as a result there was no easy way for languages to be linked together and interoperate. It would be kind of nice if people could use Spin2 objects from C, and vice-versa, and if other languages like Forth and Lisp could call out to C or PASM then so much the better.

Eric

ersmith · 2017-04-11 22:19

Just to throw a stake in the ground, here's a proposal for PASM/compiled code calling conventions:

PTRB is used as a stack, and it grows up (from low memory to high)
The stack must always be 32 bit aligned
Subroutine/function call return address is passed on the stack (so use CALLB)
Parameters are passed in COG memory at locations 0x1EC..0x1EF. If there are more than 4 parameters then extra ones are passed on the stack.
Return values are sent back in the same COG memory locations as parameters.

David Betz · 2017-04-11 22:27

ersmith wrote: »

Just to throw a stake in the ground, here's a proposal for PASM/compiled code calling conventions:

PTRB is used as a stack, and it grows up (from low memory to high)
The stack must always be 32 bit aligned
Subroutine/function call return address is passed on the stack (so use CALLB)
Parameters are passed in COG memory at locations 0x1EC..0x1EF. If there are more than 4 parameters then extra ones are passed on the stack.
Return values are sent back in the same COG memory locations as parameters.

It's interesting that you suggest using CALLB for calling subroutines. We asked Chip for a CALL instruction that stores its return address in a register rather than on the stack. Was that a mistake?

ersmith · 2017-04-11 22:35

David Betz wrote: »

ersmith wrote: »

Just to throw a stake in the ground, here's a proposal for PASM/compiled code calling conventions:

PTRB is used as a stack, and it grows up (from low memory to high)
The stack must always be 32 bit aligned
Subroutine/function call return address is passed on the stack (so use CALLB)
Parameters are passed in COG memory at locations 0x1EC..0x1EF. If there are more than 4 parameters then extra ones are passed on the stack.
Return values are sent back in the same COG memory locations as parameters.

It's interesting that you suggest using CALLB for calling subroutines. We asked Chip for a CALL instruction that stores its return address in a register rather than on the stack. Was that a mistake?

No, it wasn't a mistake at all, and maybe having the return address stored in COG memory would be even better. I was thinking that simple compilers might want to just use CALLB, but you're right, it would be more efficient to store the return address in COG memory. Would you rather see the return address placed in 0x1EF, with 0x1E8-0x1EE used for parameters and working memory? I'd be OK with that too.

(Having too many options is sometimes a problem... one reason to establish some standards early!)

Eric

cgracey · 2017-04-11 22:39

Eric, I'm pretty open on this. Whatever you think is good will probably be fine. I'll accommodate it in Spin2.

I suspect it will get some refining when we actually start using it.

Dave Hein · 2017-04-11 23:03

The long form of CALLD requires using one of the registers at $1f6 to $1f9 for the return address. Since $1f8 and $1f9 are PTRA and PTRB it's probably best to use either $1f6 or $1f7 (PA or PB) for the return address.

David Betz · 2017-04-11 23:04

ersmith wrote: »

Would you rather see the return address placed in 0x1EF, with 0x1E8-0x1EE used for parameters and working memory? I'd be OK with that too.

That sounds fine. In fact, you know more about what would make generating code from GCC easier than I do. Whatever you suggest is okay with me.

ersmith · 2017-04-11 23:19

Dave Hein wrote: »

The long form of CALLD requires using one of the registers at $1f6 to $1f9 for the return address. Since $1f8 and $1f9 are PTRA and PTRB it's probably best to use either $1f6 or $1f7 (PA or PB) for the return address.

Good point. OK, how about PB for return addresses?

So the updated stake in the ground is:

Stack pointer is PTRB, stack grows up, stack is always 32 bit aligned.
Subroutine calls use CALLD and put return address in PB
Up to 8 arguments go in $1e8-$1ef; any additional arguments go on the stack
Return values in $1e8, $1e9
$1e8-$1ef and PB are scratch registers (not preserved by the subroutine). All other COG memory locations must be saved/restored. (In practice the subroutine is almost certainly going to save PB in order to jump back to it, but the caller shouldn't count on this.)
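As a rough illustration of the argument-passing part of this proposal, here is a small C model. Several details are assumptions that the proposal above does not settle: the order in which extra arguments are pushed, the caller restoring PTRB afterwards, and the argument count `n` being handed to the callee as a C parameter purely for the simulation. The arrays `cog` and `hub` are stand-ins for COG and hub memory.

```c
#include <assert.h>
#include <stdint.h>

enum { ARG_BASE = 0x1e8, NUM_ARG_REGS = 8, HUB_LONGS = 64 };

uint32_t cog[512];               /* model of COG register memory, $000-$1ff */
uint32_t hub[HUB_LONGS];         /* model of the hub RAM that holds the stack */
uint32_t ptrb = 0;               /* stack pointer as a byte offset; grows up */

void push_long(uint32_t v)
{
    assert(ptrb % 4 == 0 && ptrb / 4 < HUB_LONGS);  /* always long aligned */
    hub[ptrb / 4] = v;
    ptrb += 4;
}

/* Caller: first 8 arguments go in $1e8-$1ef, the rest on the stack
   (pushed in argument order; that order is an assumption). */
void place_args(const uint32_t *args, int n)
{
    for (int i = 0; i < n; i++) {
        if (i < NUM_ARG_REGS)
            cog[ARG_BASE + i] = args[i];
        else
            push_long(args[i]);
    }
}

/* Callee: sums all of its arguments and returns the total in $1e8. */
void callee_sum(int n)
{
    int extras = n > NUM_ARG_REGS ? n - NUM_ARG_REGS : 0;
    uint32_t total = 0;
    for (int i = 0; i < n - extras; i++)
        total += cog[ARG_BASE + i];
    for (int i = 0; i < extras; i++)
        total += hub[ptrb / 4 - extras + i];   /* extras sit just below PTRB */
    cog[ARG_BASE] = total;       /* the argument registers are scratch */
}

uint32_t call_sum(const uint32_t *args, int n)
{
    uint32_t saved = ptrb;       /* assumption: caller cleans up the stack */
    place_args(args, n);
    callee_sum(n);
    ptrb = saved;
    return cog[ARG_BASE];
}
```

Calls with eight or fewer arguments never touch the modelled hub stack, which is the main attraction of passing arguments in COG memory.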

Chip, does this sound reasonable for calling Spin functions? Or would they prefer a different calling convention?

Eric

cgracey · 2017-04-11 23:30

ersmith wrote: »

Dave Hein wrote: »

The long form of CALLD requires using one of the registers at $1f6 to $1f9 for the return address. Since $1f8 and $1f9 are PTRA and PTRB it's probably best to use either $1f6 or $1f7 (PA or PB) for the return address.

Good point. OK, how about PB for return addresses?

So the updated stake in the ground is:

Stack pointer is PTRB, stack grows up, stack is always 32 bit aligned.
Subroutine calls use CALLD and put return address in PB
Up to 8 arguments go in $1e8-$1ef; any additional arguments go on the stack
Return values in $1e8, $1e9
$1e8-$1ef and PB are scratch registers (not preserved by the subroutine). All other COG memory locations must be saved/restored. (In practice the subroutine is almost certainly going to save PB in order to jump back to it, but the caller shouldn't count on this.)

Chip, does this sound reasonable for calling Spin functions? Or would they prefer a different calling convention?

Eric

That sounds fine.

ersmith · 2017-04-11 23:35

I guess Spin methods are going to need some additional info like where vbase & pbase are. Any thoughts on that? Were you thinking of using PTRA for one of those?

cgracey · 2017-04-11 23:39

ersmith wrote: »

I guess Spin methods are going to need some additional info like where vbase & pbase are. Any thoughts on that? Were you thinking of using PTRA for one of those?

Man, that's hard to think about. I don't know right now. I suppose interfacing to Spin from C will mean that some kind of context will need to be set up.

ersmith · 2017-04-12 00:03

cgracey wrote: »

ersmith wrote: »

I guess Spin methods are going to need some additional info like where vbase & pbase are. Any thoughts on that? Were you thinking of using PTRA for one of those?

Man, that's hard to think about. I don't know right now. I suppose interfacing to Spin from C will mean that some kind of context will need to be set up.

Yep. It's not just C, it's any PASM code. It didn't really come up in P1 because there was no way to run PASM in the same COG as the Spin interpreter, but with HUBEXEC it should be feasible for PASM to call Spin (and vice-versa).

Traditionally C++ passes a pointer to the object as the first parameter, so a C++ method like:

class foo {
   int x;
   void setx(int y) { x = y; }
};

gets implemented the same as the C code:

struct foo {
   int x;
};
void setx_for_foo(struct foo *self, int y) { self->x = y; }

I forget the terminology Spin uses, but IIRC Spin1 has a pointer for the method table and a pointer to the instance data. The instance data is what C++ passes in the first parameter. The method table is normally implicit -- if a method is part of object "foo" then it knows what method table it needs (and in fact the compiler normally elides the method table and just inserts direct jumps to the methods). If methods aren't known at compile time for some reason (e.g. virtual methods) then a pointer to the method table is placed in the first long of the object.
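For the virtual-method case, a minimal C sketch of "method table pointer in the first long of the object" might look like this. The `shape` type and its methods are made-up examples, not anything from Spin or PropGCC.

```c
#include <assert.h>

struct shape;

/* The method table: one function pointer per virtual method. */
struct shape_methods {
    int (*area)(const struct shape *self);
};

/* The object: the pointer to the method table comes first,
   followed by the instance data. */
struct shape {
    const struct shape_methods *methods;
    int w, h;
};

int rect_area(const struct shape *self) { return self->w * self->h; }
int tri_area(const struct shape *self)  { return self->w * self->h / 2; }

const struct shape_methods rect_methods = { rect_area };
const struct shape_methods tri_methods  = { tri_area };

/* A call site that does not know the concrete type: it loads the
   table from the object and passes the object as the first argument. */
int shape_area(const struct shape *s)
{
    return s->methods->area(s);
}
```

When the method is known statically, the compiler can skip the table lookup and call `rect_area` directly, which is the "elided method table" case.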

Eric

Cluso99 · 2017-04-12 02:08

IMHO this thread seems to be missing the biggest interoperability issue, that of interfacing other cog(s) programs.

We have cog(s) running other things like UART(s), Video, Keyboard, Mouse, I2C, SPI, ESC drivers, Sound, motors, sensors, etc, etc.

Spin and PASM work well together, but there is no one standard interface. When it comes to C, both Catalina and GCC struggled. Ross proposed a mailbox standard, but no-one wanted a standard, so he did his own thing in Catalina.

I also did my own thing with my Propeller OS.

Now there is also Blockly, PropBasic, Forth, Tachyon, plus others to add into this mix. We need some form of mailbox standard.

Seemed that only Ross, Bill and I (maybe I missed a couple) wanted an interface standard, but the vast vocal majority didn't want one. Bill even suggested that it be baked into P2 but that ship has sailed.

Perhaps we can try again ???

cgracey · 2017-04-12 02:18

Well, a mailbox is a simpler matter, it seems, because you aren't going to be running different code types on the same cog. I haven't thought enough about this, yet, to have much of an idea of HOW all this would work.

Communicating through FIFO's is pretty easy, though, but maybe not the best approach for all cases. I don't know enough to even know how to think about this, at this point.

Ariba · 2017-04-12 03:58

The problem on Prop1 was that many Spin drivers just patched the cog image in hub before they started the new cog with the PASM driver. That was possible because Spin knows the addresses of the register variables (they have a label that Spin knows). But other languages with no integrated assembler could not know the location of the variables to patch, especially if they just load the cog image as a binary blob.

The solution was always to pass the values with a mailbox mechanism when the cog starts up, instead of patching the cog image before starting it.
How this mailbox is set up is not really important, every language can just generate the same order of variables and sizes in memory as the original driver did. So no standard needed.
Just the rule: use mailboxes, not patches.
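A minimal sketch of that rule in C, assuming a hypothetical serial driver whose startup parameters are three longs in a fixed order (the field names, the order and the `driver_init` function are invented for this example):

```c
#include <assert.h>
#include <stdint.h>

/* Layout of the startup mailbox: one long per parameter, fixed order.
   Any language can build this without knowing the driver's labels. */
enum { MB_RX_PIN, MB_TX_PIN, MB_BAUD, MB_LONGS };

/* Private state the driver keeps in its own cog after startup. */
typedef struct {
    uint32_t rx_pin;
    uint32_t tx_pin;
    uint32_t bit_ticks;          /* clock ticks per bit */
} DriverState;

/* What the driver does at startup: copy its parameters out of the
   mailbox instead of relying on a pre-patched cog image. */
void driver_init(DriverState *st, const volatile uint32_t *mailbox,
                 uint32_t clock_hz)
{
    assert(mailbox[MB_BAUD] != 0);
    st->rx_pin    = mailbox[MB_RX_PIN];
    st->tx_pin    = mailbox[MB_TX_PIN];
    st->bit_ticks = clock_hz / mailbox[MB_BAUD];
}
```

The caller only has to fill in `MB_LONGS` longs in the documented order and pass their address to the driver when starting the cog.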

Ross' proposal was heavily tied to his Catalina system, with some registration of the mailboxes needed. It was much too complicated.

Andy

Cluso99 · 2017-04-12 05:46

Ariba wrote: »

The problem on Prop1 was that many Spin drivers just patched the cog image in hub before they started the new cog with the PASM driver. That was possible because Spin knows the addresses of the register variables (they have a label that Spin knows). But other languages with no integrated assembler could not know the location of the variables to patch, especially if they just load the cog image as a binary blob.

The solution was always to pass the values with a mailbox mechanism when the cog starts up, instead of patching the cog image before starting it.
How this mailbox is set up is not really important, every language can just generate the same order of variables and sizes in memory as the original driver did. So no standard needed.
Just the rule: use mailboxes, not patches.

Ross' proposal was heavily tied to his Catalina system, with some registration of the mailboxes needed. It was much too complicated.

Andy

Andy,
This was not the real problem. The issue was passing data between programs, not initialising the code where spin patches the cog code before loading.

Patching the code before loading was, for a large part, because of cog code space. This can now be done by a call to hubexec if space is tight.

The real preference was for a simple standard use of a mailbox for each cog, that could preferably be located in hub at a standard location.

I agree that Ross went a little overboard, but that in no way was why it failed to get any traction.

The noisy ones killed any discussion dead in its tracks a number of times. We just gave up.

Now is the time to try again, but I just don't feel like wasting my time again if no-one is interested!!!

Ariba · 2017-04-12 06:40

Cluso

So what you want to define is a standard for Operating Systems on the P2: how background services like keyboard and display drivers can be accessed in a proper way by applications compiled in different languages.

I think the chance for such a standard is much higher for the P2, because we don't have such limited resources as on the P1.
But I think it's too early now to think about Operating Systems. Or does the definition of addresses for mailboxes, or of the parameters they must contain, have any impact on the design of a language?

I must admit that I don't fully understand what Eric wants to achieve with a common calling convention. How does that help to call Spin functions from C, for example?
Spin needs the bytecode interpreter in Cog and LUT; C does not, or maybe needs a different one for CMM. Then C needs to switch to XBYTE mode when calling the Spin method and back to HubExec after return. So C has to handle calls to Spin code differently than calls to C code anyway.

Does C know about the Spin source code or just libraries? Does it compile this Spin code, or is the Spin code already in the P2 memory, compiled and written by a Spin tool? How does it know the address of the methods then?
Questions upon questions....

Andy

cgracey · 2017-04-12 06:47

Yeah, like Andy says, this is hard to think about, at this point. Please don't register a lack of enthusiasm from me, Eric. I really just don't know how to think about this, yet. Andy pushed my awareness a little further, so I realize I even know less than I thought I did.

This whole thing may not be too complicated, after all, but it's hard to think clearly about. Usually these problems settle when you realize where, exactly, the pie must be cut.

ersmith · 2017-04-12 09:10

Ariba wrote: »

I must admit that I don't fully understand what Eric wants to achieve with a common calling convention. How does that help to call Spin functions from C, for example?
Spin needs the bytecode interpreter in Cog and LUT; C does not, or maybe needs a different one for CMM. Then C needs to switch to XBYTE mode when calling the Spin method and back to HubExec after return. So C has to handle calls to Spin code differently than calls to C code anyway.

Calling Spin methods from C is something lots of people wanted to do on P1, but of course it was hard (David Betz did come up with a way to do this, but it didn't gain a lot of traction, in part because it wasn't baked into PropGCC and in part because it needed the Spin interpreter running in another COG.)

Perhaps a more cogent example isn't C calling Spin, but Spin calling C (or PASM in general). In P1, Spin calling PASM required that the Spin code launch a new COG, but there's no reason that has to be the case in P2. Suppose the Spin code is almost fast enough, and you just need one or two key routines recoded in PASM. That PASM might be written by hand, or it might be generated by the C compiler (as PASM output, or as a binary blob), or it might even be generated automatically by a next generation Spin compiler that can output PASM for some functions if the programmer puts a keyword on them. The details of how the Spin and C get linked together can be worked out later. Right now I'm just trying to solve the low level interface. If there's a standard way for Spin to call out to PASM (and ideally vice-versa) then all the languages can use this and it would be a big start to getting them to work well together.

Pretty much every other microprocessor has a standard calling convention documented along with the instruction set. I'm proposing that we do the same for the P2.

I should also clarify that I don't suggest that PASM code must always use the standard calling convention. Within a self-contained PASM block programmers or compilers will use whatever conventions they want. It's only at interfaces to other code that the calling convention comes into play. We can use some keyword like "stdcall" to indicate functions that need this convention.

Eric

Ariba · 2017-04-12 11:07

Thanks, Eric

Yes this makes a lot of sense.

I think we also need to define which registers can be safely used by the PASM code and which one needs to be left unmodified to not break the function that calls the PASM code.
Maybe also whether a part of the LUT is always available for such "stdcall" PASM routines (the higher 256 longs, for example).

Andy

Roy Eltham · 2017-04-12 16:23

It would be fairly easy for Spin2 to have an "invoke" mechanism that just did a call to a piece of code in hubexec mode. This would be similar to what exists in managed languages (.NET), where you can use an invoke mechanism to call native code inside of libraries.

I imagine a Spin2 keyword that just takes an address and a data param, and it just does a CALLPA/PB to the address with the data in PA/PB.
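Something along these lines, modelled in C as a toy bytecode interpreter with an invoke opcode. The opcodes, the table of native routines and `native_double` are all hypothetical; this is not the Spin2 bytecode set, and a real version would take a hub address rather than a table index.

```c
#include <assert.h>
#include <stdint.h>

/* A native routine takes one data parameter and returns one value,
   loosely like a routine called with its argument in PA/PB. */
typedef uint32_t (*native_fn)(uint32_t data);

uint32_t native_double(uint32_t x) { return 2 * x; }  /* stands in for PASM */

enum { OP_PUSH, OP_INVOKE, OP_HALT };   /* hypothetical opcodes */

/* Runs the bytecode and returns the top of stack at OP_HALT. */
uint32_t run(const uint32_t *code, const native_fn *natives)
{
    uint32_t stack[16];
    int sp = 0;

    for (;;) {
        switch (*code++) {
        case OP_PUSH:            /* operand: value to push */
            assert(sp < 16);
            stack[sp++] = *code++;
            break;
        case OP_INVOKE: {        /* operand: index of the native routine */
            uint32_t idx = *code++;
            assert(sp > 0);
            stack[sp - 1] = natives[idx](stack[sp - 1]);
            break;
        }
        case OP_HALT:
            assert(sp > 0);
            return stack[sp - 1];
        default:
            assert(0 && "bad opcode");
            return 0;
        }
    }
}
```

The interpreter stays in charge of its own stack; the native routine only ever sees the single data value and hands one value back.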

Going the other way is trickier, because you do need the Spin interpreter in a cog, but since David worked that out for P1, we could just do the same on P2, but get it to be more built in or defined or whatever with PropGCC?

Bill Henning · 2017-04-12 17:26

Sounds pretty good, I'd only suggest one change:

Stack grows down.

In the compilers I've worked on, I grew the stack down, and heap grows up.

By growing down, you can push return address, arguments, then have locals, and by using PTRA as the frame pointer, use a small positive offset to access the current function's information on the stack.

This of course is less of an issue with both positive and negative offsets, other than the potentially larger reach of using just a positive offset.
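A small C model of that layout, as a sketch only: the frame offsets, the single local and the `call_sum_down` wrapper are invented for illustration, and `fp` stands in for PTRA.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_LONGS = 32 };
uint32_t stack_mem[STACK_LONGS];
uint32_t sp = STACK_LONGS;       /* in longs; an empty stack points past the end */

/* Push pre-decrements, so the stack grows down. */
void push(uint32_t v) { assert(sp > 0); stack_mem[--sp] = v; }

/* Frame layout seen from the frame pointer; every offset is positive. */
enum {
    OFF_LOCAL0  = 0,             /* local: holds the sum */
    OFF_ARG_B   = 1,
    OFF_ARG_A   = 2,
    OFF_RETADDR = 3,
    NUM_LOCALS  = 1
};

uint32_t call_sum_down(uint32_t a, uint32_t b, uint32_t retaddr)
{
    uint32_t saved_sp = sp;

    push(retaddr);               /* return address first */
    push(a);                     /* then the arguments */
    push(b);
    assert(sp >= NUM_LOCALS);
    sp -= NUM_LOCALS;            /* then room for the locals */

    uint32_t fp = sp;            /* frame pointer = lowest address of the frame */
    stack_mem[fp + OFF_LOCAL0] =
        stack_mem[fp + OFF_ARG_A] + stack_mem[fp + OFF_ARG_B];
    uint32_t result = stack_mem[fp + OFF_LOCAL0];

    assert(stack_mem[fp + OFF_RETADDR] == retaddr);
    sp = saved_sp;               /* tear down the whole frame */
    return result;
}
```

With a stack that grows up, the same frame would sit below the frame pointer and need negative offsets instead.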

ersmith wrote: »

Dave Hein wrote: »

The long form of CALLD requires using one of the registers at $1f6 to $1f9 for the return address. Since $1f8 and $1f9 are PTRA and PTRB it's probably best to use either $1f6 or $1f7 (PA or PB) for the return address.

Good point. OK, how about PB for return addresses?

So the updated stake in the ground is:

Stack pointer is PTRB, stack grows up, stack is always 32 bit aligned.
Subroutine calls use CALLD and put return address in PB
Up to 8 arguments go in $1e8-$1ef; any additional arguments go on the stack
Return values in $1e8, $1e9
$1e8-$1ef and PB are scratch registers (not preserved by the subroutine). All other COG memory locations must be saved/restored. (In practice the subroutine is almost certainly going to save PB in order to jump back to it, but the caller shouldn't count on this.)

Chip, does this sound reasonable for calling Spin functions? Or would they prefer a different calling convention?

Eric

Cluso99 · 2017-04-12 21:46

Ariba wrote: »

Cluso

So what you want to define is a standard for Operating Systems on the P2: how background services like keyboard and display drivers can be accessed in a proper way by applications compiled in different languages.

I think the chance for such a standard is much higher for the P2, because we don't have such limited resources as on the P1.
But I think it's too early now to think about Operating Systems. Or does the definition of addresses for mailboxes, or of the parameters they must contain, have any impact on the design of a language?

I must admit that I don't fully understand what Eric wants to achieve with a common calling convention. How does that help to call Spin functions from C, for example?
Spin needs the bytecode interpreter in Cog and LUT; C does not, or maybe needs a different one for CMM. Then C needs to switch to XBYTE mode when calling the Spin method and back to HubExec after return. So C has to handle calls to Spin code differently than calls to C code anyway.

Does C know about the Spin source code or just libraries? Does it compile this Spin code, or is the Spin code already in the P2 memory, compiled and written by a Spin tool? How does it know the address of the methods then?
Questions upon questions....

Andy

Andy,
It has nothing to do with operating systems. It's all about interoperability between various cog objects. Some of those objects will be in Spin, Spin + PASM, PASM, C, Tachyon, etc.

We need a standard cog to cog interface / mailbox.
There is no stack that can be used here.

The prop is a multi-processor using a mix of languages, and the different languages/processors need to communicate.

Most micros are still single core, and interoperability on them is totally different, yet that is what this thread's focus seems to be. It's neglecting the main use of the prop.

Tor · 2017-04-13 22:21

No, these are two different issues that both need to be addressed: Language interoperability, and cog to cog communication. This thread is about the former. The latter should be addressed in a separate thread.

Heater. · 2017-04-13 22:46

My experience of mixing languages in a single project/program has always been a royal pain in the butt.

Even calling between compiled languages like C to Pascal or vice versa.

Heck, even calling from C to C++ and C++ to C is a pain.

Then mixing a compiled language with an interpreted one. Sure you can start a Lua interpreter from C then interact with it, but that is a pain. Or perhaps you can add C functionality to Python or Javascript. Also a pain.

Then the killer, you want something like Lua to make calls to Python or vice versa. Good luck with that.

Microsoft's .Net framework was supposed to make it easy to use a bunch of different languages in a single program. I have no idea how well it works but I'm sure it's not coming to the Propeller soon.

As such, I think the only hope of language interoperability in a single Propeller project is cog to cog communication. If that were standardized, perhaps even baked into the hardware, it would stand a chance.

Roy Eltham · 2017-04-13 23:08

I think a good mailbox mechanism for cog to cog communication could be a reasonable solution for inter-language stuff. Just need mailbox send/receive code in both languages.

Heater is right that most inter-language mechanisms are a pain to work with. Exposing actual functions and passing parameters can differ so much between languages that you can't make one solution to cover them all.

But if you just have a built in mailbox/message mechanism, then each language can expose that how it sees fit. It's not the same as having Spin call a C function directly, or vice versa, but it can accomplish the same end result.
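As a rough sketch of what such a mechanism could look like, here is a single-threaded C model of a request/response mailbox. The command codes, the `Mailbox` layout and the function names are all invented for this example; a real cog-to-cog version would have the service loop running in another cog and would depend on hub access ordering, which this model does not exercise.

```c
#include <assert.h>
#include <stdint.h>

enum { CMD_IDLE = 0, CMD_SQUARE = 1 };   /* hypothetical command codes */

/* One mailbox in shared (hub) memory. */
typedef struct {
    volatile uint32_t cmd;       /* CMD_IDLE when free or when a request is done */
    volatile uint32_t arg;
    volatile uint32_t result;
} Mailbox;

/* Client side: write the argument first, then the command word,
   so the service never sees a command without its argument. */
void mailbox_post(Mailbox *mb, uint32_t cmd, uint32_t arg)
{
    assert(mb->cmd == CMD_IDLE);
    mb->arg = arg;
    mb->cmd = cmd;
}

int mailbox_done(const Mailbox *mb) { return mb->cmd == CMD_IDLE; }

/* Service side: one pass of the polling loop that would run in the
   driver cog. Writes the result, then clears cmd to signal completion. */
void service_step(Mailbox *mb)
{
    if (mb->cmd == CMD_SQUARE) {
        mb->result = mb->arg * mb->arg;
        mb->cmd = CMD_IDLE;
    }
}
```

Each language then only needs its own version of `mailbox_post` and `mailbox_done`; the service can be written in whatever language suits it.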

Heater. · 2017-04-13 23:37

There we have it.

Given that: Language A calling language B calling language C ... X, Y, Z is a pipe dream. Never mind vice versa or in any other random combination, that leaves us with inter process communication.

That brings us to a debate that is a decade old on this forum. The topic then was how to reuse the PASM drivers from Spin objects, running in a COG, from C, first with Catalina and then with GCC.

At the time nobody could agree on a standard way to do this.

I don't see any hope of improvement.

There is still the question of how you actually build a single project that is made from languages A, B, C..X,Y,Z into a single binary for download.
