Eric, we were thinking the same thing. I chose parentheses instead of brackets because brackets may already be in use to select a member of an object array.
I liked brackets because they mirror the LONG[p] usage that is already part of Spin. That syntax already indicates pointer dereference and can just be extended by adding these new object types to the built-in types of LONG, WORD, and BYTE that are already supported in Spin1.
I like brackets better, too, but how to handle the idea that objects can be in arrays?
I think you can resolve that based on the type of the symbol that precedes the open bracket, just like you do with LONG, WORD, and BYTE now. If the identifier before the bracket is a type name (including the new object types) then the brackets indicate pointer dereferencing. Otherwise, it is array indexing.
Let's look at a concrete example: a Terminal object that can extend any object that has a "tx" method with calls like str(), dec(), hex(), etc. There are two ways to approach this: method pointers, or inheritance. Let's look at method pointers first.
For a method pointer we need 3 things: (1) a pointer to the table of functions for the object (vtable), (2) an offset into that table for the particular method we will call, and (3) a pointer to the VAR data of the particular object that will be handling the call. (1) and (2) can actually be combined to give just a single pointer. That is, given a table of functions FUNCTAB and an offset i into the table, we can calculate @FUNCTAB[i], or even do the lookup and just get FUNCTAB[i] to obtain a pointer to the final code to call. So we're down to having to pass two pointers for a method: a pointer to the method's code, and a pointer to the object data. These could be passed as two variables. The syntax could look like:
' send a char using the func method of object ser
PUB sendchar( ser.func, val )
  ser.func( val )
The sendchar function really takes three parameters; the "ser.func" notation in the parameter list is just syntactic sugar (although we could also use it for type checking in the compiler if we wanted to). In the body of the PUB the "ser.func" notation would currently trigger an error, since "ser" and "func" are just regular variables. In the new compiler, instead of triggering an error, this would construct a call using "ser" as the object var pointer and "func" as a pointer to the method. If we were compiling to PASM this would look something like:
  push objptr     ' save current object pointer
  mov objptr, ser ' set new current object
  push val        ' set parameter
  call *func      ' indirect call through register func
  pop objptr      ' restore object pointer
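In rough C terms (all names here are made up for illustration, not proposed Spin syntax), that two-pointer call boils down to something like this:

#include <stdint.h>
#include <stdio.h>

/* 'obj' plays the role of the object's VAR data (objptr); 'func' is the method's code */
typedef void (*tx_func)(void *obj, int32_t val);

static void sendchar(void *ser, tx_func func, int32_t val)
{
    func(ser, val);   /* "ser.func(val)": indirect call, with ser passed as the data pointer */
}

/* a stand-in tx method and its VAR data, just to show a call */
static void fake_tx(void *obj, int32_t val) { (void)obj; putchar((int)val); }
static int32_t fake_vars;

int main(void) { sendchar(&fake_vars, fake_tx, 'A'); return 0; }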
Here's how a generic "hex" object would be created with this scheme:
VAR
  long baseObj  ' data of object used for printing chars
  long txFunc   ' method used to print one char

'' Print a hexadecimal number
PUB hex(value, digits)
  value <<= (8 - digits) << 2
  repeat digits
    baseObj.txFunc(lookupz((value <-= 4) & $F : "0".."9", "A".."F"))

PUB init(theobj.method)
  baseObj := theobj
  txFunc := method
and to use we'd do something like:
OBJ
  fds : "FullDuplexSerial"
  t   : "hex.spin"
...
  t.init(@fds.tx) ' pushes pointers to fds data and tx method code
  t.hex($12345, 8)
That works well for this simple example but it becomes complex if there are lots of methods you need to install in the Terminal object instance. I think Roy's interface scheme works better for the general case.
Yes, it would be nice to be able to treat abstract objects (types) and concrete objects the same way. I went with the brackets for the reason David mentioned, because it's similar to the LONG[] notation. But maybe we should use a totally new syntax, like "obj::t" instead of "obj[t]" or "obj(t)".
Handling multiple interfaces in an object could get hairy though -- if a FullDuplexSerial object "fds" implements both the "Input" and "Output" interfaces, then in a call like:
t.hex(fds, val)
what do you pass in the first argument -- a pointer to the FullDuplexSerial object as a whole (the whole thing), a pointer to the Input interface, or a pointer to the Output interface? In C++ the compiler figures that out based on the parameter types, but so far Spin doesn't have types on parameters.
I guess we could restrict objects and require that they implement only one interface, which always goes at the beginning, but I think that's too much of a restriction -- for example, we'd generally like objects to be able to provide both input and output interfaces.
The beauty of method pointers is that you can use them to implement interfaces -- you could package up all the methods you might want into an object, and initialize that object with method pointers from objects that implement the interface.
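In rough C terms, such a packaged-up interface object might look like the following sketch (struct layout and names invented for illustration):

#include <stdint.h>
#include <stdio.h>

/* an "Output" interface: method pointers plus the VAR-data pointer of whatever object implements it */
typedef struct {
    void *obj;
    void (*tx)(void *obj, int32_t c);
    void (*str)(void *obj, const char *s);
} Output;

/* a serial-like implementation wired in by hand */
static void ser_tx(void *obj, int32_t c)      { (void)obj; putchar((int)c); }
static void ser_str(void *obj, const char *s) { while (*s) ser_tx(obj, *s++); }
static int32_t ser_vars;

int main(void)
{
    Output out = { &ser_vars, ser_tx, ser_str };  /* initialized with method pointers    */
    out.str(out.obj, "hello\n");                  /* caller only ever sees the interface */
    return 0;
}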
Right. Implementing multiple interfaces would be difficult. I guess I was assuming only a single interface for each object. If you really want to implement multiple interfaces I guess you could do it with proxies. I guess there is also the question of whether Spin should support inheritance. As someone has already mentioned, we're well on the way to reinventing C++.
The key is limits. Pick the right ones and a whole lot of what people want becomes possible, perhaps not always optimally. We need the very best bits framed so as to keep people out of trouble and give them a shot at reusing what others have done.
Knowing how methods are tracked in Spin and how they are called, I don't think multiple interfaces would be all that nasty.
It's just index remapping, so the objects would have a table of interface offsets, each entry is an interface ID and an offset into the object's main method table.
It requires a lookup by interface ID, and an extra indirection on the call (P1 Spin calls to child objects are already a double indirection).
This might even be able to be made simpler given a bit more thought and compiler work.
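A rough C model of that remapping (names invented; the actual table format would be up to the compiler):

#include <stddef.h>
#include <stdint.h>

typedef void (*method_fn)(void *vars);

typedef struct { uint16_t iface_id; uint16_t base; } IfaceEntry;   /* interface ID -> offset into methods[] */

typedef struct {
    method_fn        *methods;   /* the object's main method table */
    const IfaceEntry *ifaces;    /* interface offset table         */
    size_t            niface;
    void             *vars;      /* VAR data for this instance     */
} Obj;

/* lookup by interface ID, then one extra indirection for the call */
int call_iface(Obj *o, uint16_t iface_id, uint16_t method_idx)
{
    for (size_t i = 0; i < o->niface; i++)
        if (o->ifaces[i].iface_id == iface_id) {
            o->methods[o->ifaces[i].base + method_idx](o->vars);
            return 0;
        }
    return -1;   /* object does not implement that interface */
}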
From a high-level view, doing this work will expand SPIN plus PASM in both directions.
The nature of the hardware gets us real time and does so in a pretty clean, reusable way. Low level is right there. Pretty easy.
Going this direction means more complex and bigger programs, dynamically loaded and/or generated if desired. But it can also mean more directly reusing PASM too. Plug n play.
It also gets us Chip's vision of exploring objects conversationally too.
Could be really good.
I think as long as the compiler knows the (real) type of the object, finding the interface should be easy. Otherwise, though, there could be a problem. For handling objects as function parameters, I think this means:
(a) provide explicit types for the function parameter, so the compiler knows which interface of a multi-interface object to pass to the function; or
(b) make the compiler smart enough to be able to infer the function parameter types; this probably requires putting some restrictions on what you can do with parameters; or
(c) put some kind of tagging into the objects in memory, so that at run time given a pointer to an object we can find the appropriate interface (so for example in a vga object the "Output" interface is the first pointer in the vtable, since vga only implements one interface, whereas in a serial object it may be the second pointer since serial implements both "Input" and "Output"); or
(d) don't allow objects as function parameters. Inheritance would still allow us to do many things at compile time (the "hex" object becomes a "hex" interface that output objects would have to implement), but this would be pretty restrictive.
Am I missing something? A worked example would probably be helpful.
Question: is there any realistic need for thousands of instances of a particular object? Might even 256 be excessive?
2D points in a paint program, 3D vectors in an IMU, ... 640 should be enough for everyone. Roy's suggestion of setup on startup is pretty common. The "constructor" of an object (code called when you create one) fills in the necessary pointers after the memory is cleared and calls any nested constructors in objects it might own.
I haven't fully caught up (life, work, ...) but can someone explain why current Spin objects require two pointers? I get needing a pointer to the VAR data, but isn't everything else completely fixed in address space at runtime? If that assumption is correct, then why the need for the additional pointer?
Even inheritance doesn't require dynamic vtables unless you want virtual objects (i.e., Roy's description of interfaces), where calling code doesn't actually know the fully derived type of the object you're calling functions on.
Just trying to figure out why there's an actual need for two pointers in the typical, "fully qualified type" case we currently have with Spin.
OBJ was going to be 64 bits (it needs another name so as not to trigger an OBJ block).
If it turns out that a double long ends up being needed after all for some kind of object construct, perhaps it could be called BLOB for "Big Long OBject" or something like that. On the other hand, I guess it doesn't scale well to 128 bits.
If we're going to start overloading existing terms or inventing new ones, my vote would be just to go directly to:
i8 / u8
i16 / u16
i32 / u32
i64 / u64
and so on, assuming you actually care about the size. For things where you really don't care, char, short, int are sufficient, and can just be aliases to the "real" types.
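In C terms the alias idea is just a handful of typedefs, e.g. (spellings purely illustrative):

#include <stdint.h>

typedef int8_t   i8;    typedef uint8_t  u8;
typedef int16_t  i16;   typedef uint16_t u16;
typedef int32_t  i32;   typedef uint32_t u32;
typedef int64_t  i64;   typedef uint64_t u64;

typedef u8  byte_t;     /* "byte" as an alias for the real type */
typedef i16 short_t;    /* "short"                              */
typedef i32 int_t;      /* "int"                                */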
Yup, WORDs alone are simply too vague today (pardon the pun). What is needed is numbers that clearly show what is meant.
Coders will always care about size, and aliases are fine.
That brings things into line with the wider industry convention, which is to include numbers in type names to avoid confusion.
JasonDorie,
The current way that calls work in Spin is that the current object has a table at the front with all the method pointers in it, plus child object pointers (the pbase and vbase for the given instance; these are actually offsets from the current object's start, not absolute pointers). So a call to a child object has two indices: one is the index into its own table to get the child object's pointers (offsets that it adds to its base address to get actual pointers), and the other is the index into the child object's table for the method. Since all instances of an object's code get distilled down to one copy, the object itself doesn't know its vbase (since it might have many). The vbase pointers (again, offsets converted to pointers at runtime) are all in the tables of the objects that include them.
Also, at compile time no object knows its final address in memory; objects just have offsets from their base address. The whole program has a pbase and vbase address at the beginning, so those are used to get to the actual data via all the offsets in the objects.
This could all change, and I think maybe it should.
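A very rough C model of that double indirection (the real P1 layout differs in detail, and these struct names are invented):

#include <stddef.h>
#include <stdint.h>

typedef struct { uint16_t pbase_off; uint16_t vbase_off; } ChildEntry;  /* child object entry in the parent's table */
typedef struct { uint16_t code_off; } MethodEntry;                      /* method entry in an object's own table    */

/* resolve "child #ci, method #mi" starting from this object's bases:
   hop 1: the parent's table gives the child's pbase/vbase offsets,
   hop 2: the child's own table gives the method's code offset.     */
const uint8_t *resolve_call(const uint8_t *pbase, const uint8_t *vbase,
                            const ChildEntry *children, size_t ci, size_t mi,
                            const uint8_t **child_vbase)
{
    const uint8_t *child_pbase = pbase + children[ci].pbase_off;
    *child_vbase = vbase + children[ci].vbase_off;

    const MethodEntry *mtab = (const MethodEntry *)(const void *)child_pbase;
    return child_pbase + mtab[mi].code_off;
}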
Roy, I probably misunderstood your last comment, but it sounds like you said that bytes, words, and longs are all signed. Of course, only longs are signed; bytes and words are unsigned. However, all bytes and words are extended to longs after they are fetched from memory and used in computations.
How does that change anything? Spin extends them to longs for calculations, so they are effectively signed. Right?
-Phil
No, a BYTE $FF is treated as an unsigned value of 255. A WORD $FFFF is an unsigned value of 65,535. So the types used by Spin are u8, u16 and s32. The ~ and ~~ operators can be used to do byte and word sign extension.
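A quick C analogue of that zero-extension vs. sign-extension point (hypothetical example):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t hub_byte = 0xFF;                /* a BYTE $FF sitting in hub memory       */

    int32_t zero_ext = hub_byte;            /* 255: what a plain BYTE read gives you  */
    int32_t sign_ext = (int8_t)hub_byte;    /* -1: the explicit "~" style sign extend */

    printf("%d %d\n", zero_ext, sign_ext);  /* prints: 255 -1 */
    return 0;
}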
re i8/u8, etc. Spin is all signed for its VARs.
That sounds like a lot of extra work at runtime - a single finalizing pass in the compiler could, in theory, fix all code addresses to static memory locations. I understand that's not how it's currently done, I just think if it was changed it would simplify the runtime and make code execution faster. You know more about the under-the-hood in Spin, so please correct me if my assumptions are bogus.
And yeah, I knew Spin was all signed (mostly) but the logic stands - i8, i16, i32, and i64 would make more sense than byte, char, word, short, dword, long, and qword.
Well, since the concept of something smaller than a long is only in hub reads... those don't extend the value so much as copy the byte/word into a long register. If you use the ~ or ~~ operator on them then they will be sign extended.
So yes, you can call bytes and words in hub memory unsigned. You need to take care when doing math/shift/roll operations on them if you want the results to be as expected.
I just got up, forgive me for not being clear. Sorry.
JasonDorie,
Yeah, changing the compiler to fix up all the calls and whatnot to use absolute addresses would make things faster. It would also make these new proposed features easier.
This is not possible for the existing in-ROM P1 bytecode runtime, since it doesn't have the proper call type, but it could be done for the new P2 one.
The code size would be a little bigger though, since the existing call bytecode is 2 or 3 bytes total, and then uses the tables.
Well, rel-call vs abs-call could improve that. A rel-call within a 32KB address space maxes out at +/-32KB, so if the interpreter supported those it would cover many cases (though granted, not all). Lots of modern hardware has rel-jump with an i8 offset, then i16, then i32, as a form of compression. If you can assume that addresses are aligned in some way (2-byte or 4-byte, for example) you can double or quadruple that range.
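A sketch of that variable-width encoding in C (the 0xE0/0xE1/0xE2 opcodes are invented purely for illustration):

#include <stddef.h>
#include <stdint.h>

/* pick the smallest signed offset that reaches the target; align_shift
   pre-scales the delta to double or quadruple the reach on aligned code */
size_t emit_rel_call(uint8_t *out, int32_t target, int32_t pc, int align_shift)
{
    int32_t delta = (target - pc) >> align_shift;

    if (delta >= -128 && delta <= 127) {               /* 2 bytes: opcode + i8 offset  */
        out[0] = 0xE0; out[1] = (uint8_t)delta;
        return 2;
    }
    if (delta >= -32768 && delta <= 32767) {           /* 3 bytes: opcode + i16 offset */
        out[0] = 0xE1; out[1] = (uint8_t)delta; out[2] = (uint8_t)(delta >> 8);
        return 3;
    }
    out[0] = 0xE2;                                      /* 5 bytes: opcode + i32 offset */
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(delta >> (8 * i));
    return 5;
}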
“When I use a WORD,” Humpty Dumpty said, in rather a scornful tone, “it means just what I choose it to mean—neither more nor less.”
“The question is,” said Alice, “whether you can make WORDs mean so many different things.”
“The question is,” said Humpty Dumpty, “which is to be master—that’s all.”
Lewis Carroll (1832–98)
Lewis Carroll is essential reading for anyone who wants to learn computer programming.
JasonDorie,
For Spin calls, you would need the pointer to the code and the pointer to the var space for that given instance (aka the this pointer). Even if both of those pointers could be packed, it would still be larger (2-3x?) than the current 2 or 3 byte call bytecode.
I'm not sure that it really matters, though; in many cases methods are only called from one place, so the 2-3 byte call bytecode plus the table entry is larger than a single absolute-address call would be. It's only when you call a method from multiple places that the bytecode + table scheme is a savings.
We could keep the bytecode compact call type, and just change how the tables are made/arranged to reduce the work a bit.
In any case, these are implementation details that we can work out when the compiler code is being written/updated. We should define what we want, feature-wise, for Spin2 and then worry about the implementation.
Something like that would be very clean.
Ersmith, thanks for your example. I read through it several times and it pretty much made sense. I'm still trying to get my head around all this. I'm not able to have a good conversation, yet.
I just read ersmith's example - It'll work, but it requires manually "hooking up" pointers to all the methods of an object you want to call, which seems really cumbersome, given that the compiler should have all the knowledge it needs about what methods are available on an object of a given type.
I would alter his example to look more like this:
Generic "hex" object creation:
TYPE
  FDSTYPE : "fullDuplexSerial"  ' this line imports the methods of FullDuplexSerial so the compiler knows what's available, ref's it as FDSTYPE

VAR
  FDSTYPE fdsInstance           ' this line creates a pointer to an object of a defined type

'' Print a hexadecimal number
PUB hex(value, digits)
  value <<= (8 - digits) << 2
  repeat digits
    fdsInstance.tx(lookupz((value <-= 4) & $F : "0".."9", "A".."F")) ' this line uses the pointer to access a method through the pointer

PUB init( FDSTYPE pointerToOriginal )
  fdsInstance := pointerToOriginal ' this line just copies the pointer passed from the caller to a local var
and to use we'd do something like:
OBJ
  fds : "FullDuplexSerial"
  t   : "hex.spin"

  t.init(@fds)  ' push a single pointer to the FDS object
  t.hex($12345, 8)
Ignore for a moment the current implementation limits of Spin - it feels like you keep offloading things the compiler can do onto the user because you're stuck in how the Spin compiler currently works.
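For comparison, here is a C analogue of the "compiler knows the full type" case (names made up): the method address is fixed at build time, so only the instance's data pointer needs to be stored and passed around.

#include <stdint.h>
#include <stdio.h>

typedef struct { int32_t dummy; } FDS;        /* stand-in for FullDuplexSerial VAR data */

static void fds_tx(FDS *self, int32_t c) { (void)self; putchar((int)c); }

static FDS *fdsInstance;                      /* the one stored pointer ("FDSTYPE fdsInstance") */

void hex_init(FDS *pointerToOriginal) { fdsInstance = pointerToOriginal; }

void hex_print(uint32_t value, int32_t digits)
{
    value <<= (8 - digits) << 2;                               /* same pre-shift as the Spin hex() */
    while (digits-- > 0) {
        value = (value << 4) | (value >> 28);                  /* Spin's "value <-= 4" rotate      */
        fds_tx(fdsInstance, "0123456789ABCDEF"[value & 0xF]);  /* direct, statically bound call    */
    }
}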
Chip tends to simplify things down to the nub of what people really need. This conversation is high value, IMHO.
So far, the accessible core of SPIN is not impacted. Good.