Data Structures in Spin2
cgracey
Posts: 14,256
I am working out a structured data scheme for Spin2 and I am wondering how important you all think nesting is.
Beyond a single-level data structure, how valuable is it to be able to nest structures inside each other?
Comments
Hi Chip, great that you consider implementing it. Me (and probably others) have suggested this long ago...
I think, nesting is important. Or at least, it should be possible to extend existing data structures by appending new content.
For example it would be possible to implement an abstract data type "linked list" by defining a header with two pointers.
Sorry, I've used C syntax because it doesn't exist in Spin, yet. Then you can add "payload" data by declaring the header as first element of your new data type.
Object oriented languages usually have a special type/struct inheritance mechanism for that purpose. But to keep it simple, nesting is sufficient. I think the only difference between nesting and appending is that the compiler has to add the offsets of individual fields either recursively or iteratively. One shouldn't be much more dificult than the other.
To start, I am just thinking about static structures which could be indexed and possibly nested.
Linked lists seem to go towards needing garbage collection which is problematic because of the timing uncertainties and memory leaks that are attendant to them.
Right now, I am just thinking about ways to organize simple data.
Dynamic memory allocation and garbage collection are a competely different beast. I agree that they are difficult to implement and to use in real-time and multi-cog systems. But you don't have to provide that as part of the language. Everybody can add his own alloc() andf free() function as a library if he/she want's to.
But graphics are also a perfect example for nested structures. A point has 2 or 3 coordinates. A triangle consists of 3 points and so on...
So, do you think we need:
point[i].x
point[i].y
point[i].z
...or deeper nesting...
poly[p].point[i].x
poly[p].point[i].y
poly[p].point[i].z
I don't know if we really need deeper nesting, now. But if the compiler is written in a high level language (I hope so, or is it still assembler?) then it shouldn't be much more difficult to allow any number of nesting. I think it doesn't have any run-time, data size or code size impact on the binary or interpreter side because it eventually all compiles down to constant address offsets and a simple ADD instruction no matter how deep the nesting is.
Will you enable typing by element, e.g.
That looks a bit silly. There needs to be a distinction between value and pointer. That looks like
boxStructure
in VAR declares a box value, butboxStructure
in an argument (or local variable?) declares a box pointer. For small structs like that, you really, really want to be able to pass them as values. And you 100% definitely need to be able to create one as a local variable.Also, since structures can get a bit large, I think there's merit in defining them in their own block (I think I proposed REC (= "record") at some point?).
So what I'd do is:
For my GUI stuff, everything is in a CON section.
So, would be nice for me if structure data could be defined in a CON section...
Doing it like this now:
An easy way out is to use the existing enum and be able to use the "." with it...
Like:
And then be able to use
Form1Button6.oTop
I'm kind of doing this now, but have to do
Long[@Form1Button6][oTop]
I quite like the CON approach, as it avoids a new keyword and is a fairly consistent place for things being defined that are not (of themselves) using memory.
I'm also quite used to splitting up CON blocks to create logical little groupings, rather than a big whopper.
etc...
That said- I wouldn't mind a new block if that didn't cause pain for others; especially thinking of the compiler/editor gurus!
I'm not sure I've seen that syntax ... is that something FlexProp supports?
You're sending the address as a pointer:
computeVolume(@box[i])
So I wonder if explicitly receiving the address as a pointer is necessary ? (or maybe just a preference/readability thing?)
Or is Double @@ intended ?
The way I read that PRI... considering b is type-declared as boxStructure, then in the PRI you probably could work with b either long-handed or short-handed...
boxStructure[b][i].length
or
b[i].length
length would be an implicit long seeing as it wasn't declared otherwise in boxStructure
Though it could be made a byte I guess? And two colors, in-case grey is not enough
Interesting- will take a while to sink in and distil. This might be one of those features that watching it develop will broaden my horizon from so many different ideas.
The point is that you often do not want a pointer at all. You just want the contents of the structure dumped into your arguments. The other thing is, local variables, return values and arguments are basically analogous and use the same syntax. You definitely need to distinguish between a local that's a pointer to a structure and a local that is an actual instance of the structure being stored on the stack
local variables, return values and arguments are all longs though? No space for structure, just the ptr address.
Sounds like a higher level wrapper will handle the value addressing anyway, either coded by app, or in the compiler.
The structure just takes the space of some N longs. You can already have arrays, so that's no issue. As mentioned earlier, having structures on the stack is essential and without that they're borderline useless.
+1 for nesting support. Keeping record structures in only a single level would reduce the ability to port code from other languages and requires flattening everything down. It works for some cases but not so good for more complex applications.
Also passing entire record structures on the stack only can get inefficient so it would be nice to be able to pass pointers to structures and dereference them in the called method. This may require some type of syntax such as this...
But how will the compiler know what type of record ptr is referencing in order to lookup the correct field offset to apply? Unless the record's type name is provided somewhere it means the namespace of structure fields would have to be globally shared and therefore unique which is not ideal. You may need some type of explicit typecasting support for this or provide type names in the method prototype which is a syntax change and not good either.
Specify the type of a struct pointer the same way you specify the type of any other pointer:
You need both. For small structures passing the values is generally better. For large structures you obviously want to pass a pointer. Though the difference may be much smaller for interpreted spin vs native code (where any RDxxxx just really hurts)
Ok that works too. It sort of looks like array indexing but the presence of the dot and field name can at least differentiate it and it is somewhat consistent with LONG[p][x] and WORD[p][x] etc. How about when a real structure is passed not just a pointer? I guess that still needs changes to the method's prototype...? We could possibly do this... and the compiler would then know whether b is a pointer or a structure when it accesses b, though it is confusing if b is not accessed at all as to what the method expects to be passed. EDIT: err, maybe scratch that idea, it probably messes up the ability to have pointers to pointers.
Yes, agree.
Yes, that kind of thing!
Even this:
struct1(a, b, c)
struct2(byte x[3], word y[3], z[3], struct1 sub[10])
struct3(byte flags, struct2 m[3])
@cgracey : Why invent a new kind of data structure? Why not use OBJ for structured data? That is pretty much what an object is (some collection of data along with some functions). You could extend OBJ a little bit to allow data-only objects, for example. And any way to declare local objects or to pass object pointers would be useful in general.
Having a new kind of structured data alongside OBJ means there would be two ways to do everything, which seems like it could get confusing.
@ersmith That's an interesting idea... Have a hard time seeing how to make that work though... For example, I have my GUI data in an object, but I have to call functions to get at the DAT section data. And, I have to use @@ inside that OBJ to get the actual address of the data... Maybe at runtime, you could use the VAR section as a structure though.
For GUI objects I don't think you'd usually want the data in DAT, it would normally be in the VAR variables, e.g. for a point you would have a really simple object like:
and use it like
This might require extending Spin2 to allow direct access to VAR from outside (I can't remember if PNut supports that, flexspin does). You'd do addresses the usual way, like
@pt1.x
: the compiler would do whatever juggling is needed behind the scenes to get the correct address.The advantage of using OBJ instead of some new feature like REC is that users already know about OBJ, you're just extending an existing feature rather than adding a completely new one. Making OBJ a little more flexible will just generally make things easier, too, without having to worry about complications like how OBJ and REC interact, or when to use one instead of the other.
I've done something similar to what @ersmith is talking about (i think). I'd create a child object with nothing but VARs and reference them directly (since you can for reads anyway, with FlexSpin). You can't write directly to child vars and afaik you can't just order vars of different sizes willy-nilly and expect them to actually be in memory in that order, but the basic syntax is there. That doesn't address the pointer to structure cases though.
Agreeing with ersmith that we already have this feature, OBJs, they just don't offer constructs to access static (DAT) or object-instance data (VAR). It was partially implemented (accessing CONs) but everything else in the namespace should be accessible (with accessibility modifiers?). Nesting definitely should be allowed.
Being able to push/pop objects to the stack or construct them on the heap would be interesting and both IMHO should be considered (in my eyes, not too much difference constructing VBASEs just in different memory pools).
With the struct concept mentioned, what is the reusability of them? How are they passed across objects? How can two objects share the same structure type?
I also agree with @ersmith . All that's needed for by-reference data structures is a way to access objects by pointer (e.g.
typename[ptr].fieldname
ortypename[ptr].method(args)
), a way to declare an object by type without actually instantiating any instances, and the ability to directly access VAR variables.+1
I'm not seeing why OBJ should be used for data structures. Are you guys thinking OBJ is appropriate because it already uses "." as a separator?
Using a simple syntax in CON is very efficient and doesn't grow the final binary by anything other than the adds/muls needed to get into a nested structure.
Right now, I have this going:
You can do things like this:
It's super simple and you can build nested/arrayed (sub)structures with just a few CON statements. Then, you can instantiate them in VAR, DAT, and local variables and address them in several ways, which I am now working out.
Unless I'm missing something, what you have here seems exactly the same as OBJ objects except that they are defined in the same file, have no methods of their own, and have public VAR member variables. Just like OBJ objects, they can be nested and can only be statically declared/instantiated. If this is all you want, why not just make VARs directly accessible by the parent OBJ, as they are in flexspin, and call it a day?
Please add a way to declare OBJ objects without actually instantiating any instances, and add a way to access them by pointer (just the VBASE, I think).
EDIT: How much extra RAM besides VAR variables would an OBJ object with zero methods consume, since the method table would be empty?
I smile again watching Chip re-inventing the wheel because he REALLY wants to know why it is round.
As usual with Chip's thinking one needs to ignore common wisdom and ask the question 'why not like this?'
Chip just reinvented TYPES, yeah not all things are longs, just addresses pointing to a value to be interpreted as 'somehow defined in code'. Basically a good Idea but a big step for the spin language, now needing to track the TYPE of a Label and act accordingly.
So, I do find the CON version kind of attractive.
But using OBJ for it makes a lot of sense too. The main Question is as @Electrodude pointed out:
How expensive is a OBJ, memory-wise.
Because with current Language features we almost have all of it. The only need would be to not just access CON from a Sub-object with obj#constant but also allow access to VAR and DAT by obj.Label like FlexSpin does.
No change in the Language at all and a OBJ has its own VARs with Labels plus some common DAT things if needed. Perfect STRUCT to access records.
Nesting gets ugly and one would need a OBJ-file per Type not just a CON entry
So not as nice as Chips version.
I am undecided.
A parent object could access a child's structures by doing:
Does that solve the problem? The structures could be available with the rest of the CON declarations.
Oh, wait, I understand that there is interest in getting into a child-object's VAR and DAT, too. Must be some clean way to do that.
I am thinking that mandating objects for structures is overkill. That's usually not needed. We just need a simple way to define structures, with or without any objects involved.