Data Structures in Spin2

Rayman · 2024-02-29 11:55

Maybe:
CON point_struct(x, y, z)

DAT
mypoint point_struct 1,2,3

wummi · 2024-02-29 12:16

@cgracey said:
I'm not seeing why OBJ should be used for data structures. Are you guys thinking OBJ is appropriate because it already uses "." as a separator?

Using a simple syntax in CON is very efficient and doesn't grow the final binary by anything other than the adds/muls needed to get into a nested structure.

Right now, I have this going:
CON structname({byte/word/long/structname} membername{[count]}, ...)
You can do things like this:
CON  point_struct(x, y, z)
     triangle_struct(point_struct a, point_struct b, point_struct c)
     triangle_count = 100

VAR  triangle_struct triangle[triangle_count]

PRI work() | i

    repeat  triangle_count with i
      setpoint(triangle[i].a.x)
      setpoint(triangle[i].b.y)
      setpoint(triangle[i].c.z)
It's super simple and you can build nested/arrayed (sub)structures with just a few CON statements. Then, you can instantiate them in VAR, DAT, and local variables and address them in several ways, which I am now working out.

And how would pointers to structures work?
Just as byte[memadr] accesses a byte in memory,
triangle_struct[memadr].a.x could access a structure member in memory.
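
For readers more used to C, the pointer access asked about here can be sketched as a struct overlay on a raw address. This is only an analogy, not Spin2 code: it assumes untyped members default to longs (32 bits), and the helper names `triangle_at` and `read_b_y` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors CON point_struct(x, y, z), assuming each member is a 32-bit long. */
typedef struct { int32_t x, y, z; } point_struct;

/* Mirrors triangle_struct(point_struct a, point_struct b, point_struct c). */
typedef struct { point_struct a, b, c; } triangle_struct;

/* Hypothetical helper: view an arbitrary (suitably aligned) address as a
   triangle_struct, the C analogue of the proposed triangle_struct[memadr]. */
static triangle_struct *triangle_at(void *memadr)
{
    assert(memadr != 0);
    return (triangle_struct *)memadr;
}

/* Read member b.y through the overlay: no copy, just base + constant offset. */
static int32_t read_b_y(void *memadr)
{
    return triangle_at(memadr)->b.y;
}
```

With nine consecutive longs in memory, `read_b_y` returns the fifth one, since `a` occupies the first three longs and `b.y` is the second long of `b`.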

ersmith · 2024-02-29 12:48

@cgracey said:
I'm not seeing why OBJ should be used for data structures. Are you guys thinking OBJ is appropriate because it already uses "." as a separator?

No, it's not a syntax thing. It's the semantics: an OBJ is some data plus some functions to operate on that data. That's a data structure! It's just like a class in other programming languages.

The new things you want to add are just collections of data, i.e. OBJs but without the member functions.

Don't think about the implementation (I know from experience that's hard when you're in the guts of the compiler!) Think about the meaning. The new things you're trying to add are, from the user's perspective, just like simple OBJs. They have some VARs (like, for a point, long x,y,z). They might usefully have some methods attached, but not necessarily.

The other important thing is that in practice your new structures are going to be most useful if they can be referenced in different source files: a point structure, for example, might be needed both in the low level 2D/3D renderer and also in the high level GUI. That means you'll need a way to put the definition into a separate source file. But OBJ already supports this.

In my compiler C struct, BASIC TYPE, and Spin OBJ are all implemented basically the same. The Spin objects have some limitations compared to the other languages (you can't just define an OBJ inline, for example) but that's just an implementation detail. They're all what are called "classes" in most programming languages, namely data + functions to operate on that data.

ersmith · 2024-02-29 13:05

The other thing is that most of the things you're talking about doing with your structure types would be useful for existing OBJs too. For example, I remember that in the Flash file system test code there was some manual tweaking required to get the same OBJ in the test runner and in the tests, and a lot of the data had to be put in DAT to make this work when it more naturally would have been VAR. If we could have passed a pointer to an OBJ that would have solved a lot of problems. Extending OBJ would be a win for all kinds of existing objects.

From the user's perspective if there were both OBJs and the new records, they would always have to think about which one to use when. They would both be limited in some ways: if you want methods for the data, or data that's shared (like in a DAT) you need an OBJ. If you want to pass them to functions, they have to be REC.

I know which one to use when would be clear to you, @cgracey. But you're not a typical user of Spin2. New users would have to learn more, and would have more traps to fall into, if there were two kinds of things that did almost, but not quite, the same things.

Wuerfel_21 · 2024-02-29 19:18

I feel like both approaches should be considered. That is, being able to use OBJs as more general types but also inline-defined data-only types.

Rayman · 2024-02-29 20:07

Well, my problem with using OBJ is that it sounds like it would mean creating a bunch of .spin2 files with trivial amounts of stuff in them. That kind of thing annoys me...

Maybe can create a new kind of OBJ entry that is an inline structure definition, so can do either way?

OBJ
     point_struct : long x, long y, long z
     triangle_struct : point_struct a, point_struct b, point_struct c

Electrodude · 2024-02-29 21:47

@Rayman said:
Well, my problem with using OBJ is that it sounds like it would mean creating a bunch of .spin2 files with trivial amounts of stuff in them. That kind of thing annoys me...

Maybe can create a new kind of OBJ entry that is an inline structure definition, so can do either way?
OBJ
     point_struct : long x, long y, long z
     triangle_struct : point_struct a, point_struct b, point_struct c

I like the syntax, but it's inconsistent with existing use of OBJ blocks because typename: long x, y, z only defines the type whereas typename: "filename" actually instantiates an object instance.

Wuerfel_21 · 2024-02-29 23:09

@Electrodude said:
I like the syntax, but it's inconsistent with existing use of OBJ blocks because typename: long x, y, z only defines the type whereas typename: "filename" actually instantiates an object instance.

Flexspin already supports typename = "filename" to define a class without instantiating it (and then typename[ptr] can be used to access an instance through a pointer), so that'd be the way.

rogloh · 2024-03-01 00:36

@cgracey said:
A parent object could access a child's structures by doing:
VAR  childobject.struct_triangle triangle[100]   'use struct_triangle and all its sub-structs
Does that solve the problem? The structures could be available with the rest of the CON declarations.

Oh, wait, I understand that there is interest in getting into a child-object's VAR and DAT, too. Must be some clean way to do that.

I am thinking that mandating objects for structures is overkill. That's usually not needed. We just need a simple way to define structures, with or without any objects involved.

This could be particularly useful. There are times when you want to pass some record data to a child and/or provide memory for it to store the result. If you don't know the actual size of the record, it gets trickier to allocate memory ahead of time, and you have to resort to fixed "sizeof"-type constants buried in the child to work out the size needed by the parent/caller object. It's messy. Having a named type defined in the child and accessible to the caller would improve this a great deal: each object could define its own types independently, and the parent could still use them. We need something like this.
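
The allocation problem described above can be sketched in C, where a type exported by the callee lets the caller reserve exactly the right amount of memory. This is an analogy only: `child_result`, `child_measure` and `parent_pixel_bytes` are hypothetical names, and the field values are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* What a hypothetical child object would export: a named record type. */
typedef struct {
    uint16_t width;
    uint16_t height;
    uint8_t  depth;    /* bits per pixel */
} child_result;

/* Child side: fill in a record that the caller provides. */
static void child_measure(child_result *out)
{
    assert(out != 0);
    out->width  = 640;
    out->height = 480;
    out->depth  = 8;
}

/* Parent side: the storage size comes from the type itself, so no
   hand-maintained "sizeof" constant has to be kept in sync with the child. */
static uint32_t parent_pixel_bytes(void)
{
    child_result r;
    child_measure(&r);
    return (uint32_t)r.width * r.height * r.depth / 8;
}
```

If the child later adds a field to `child_result`, the parent's storage grows with it and nothing else has to change.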

cgracey · 2024-03-01 08:42

What do you guys think about clamping index values in nested arrays, in order to prevent some possibly difficult-to-find bugs? There is little runtime and memory penalty to implement this, but is it worth doing?

cgracey · 2024-03-01 10:26

Would limiting structure-member size and indexing values to $FFFF be a serious limitation? I ask because it affords convenient use of the MUL instruction, which is fastest and simplest.

Wuerfel_21 · 2024-03-01 11:41

@cgracey said:
What do you guys think about clamping index values in nested arrays, in order to prevent some possibly difficult-to-find bugs? There is little runtime and memory penalty to implement this, but is it worth doing?

That doesn't happen anywhere else, so rather no for consistency.

@cgracey said:
Would limiting structure-member size and indexing values to $FFFF be a serious limitation? I ask because it affords convenient use of the MUL instruction, which is fastest and simplest.

Not sure. You wouldn't really make a structure that's really large, but having a huge array of small structures seems likely.

cgracey · 2024-03-01 11:59

@Wuerfel_21 said:

@cgracey said:
What do you guys think about clamping index values in nested arrays, in order to prevent some possibly difficult-to-find bugs? There is little runtime and memory penalty to implement this, but is it worth doing?

That doesn't happen anywhere else, so rather no for consistency.

@cgracey said:
Would limiting structure-member size and indexing values to $FFFF be a serious limitation? I ask because it affords convenient use of the MUL instruction, which is fastest and simplest.

Not sure. You wouldn't really make a structure that's really large, but having a huge array of small structures seems likely.

Yes, structures could be huge, but no indexed sub-structure could exceed $FFFF bytes and no index could exceed $FFFF. The compiler would catch this, of course.

So, maybe no on guardrails at runtime. I agree it would be inconsistent.
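
The $FFFF limit follows from what a 16x16-bit multiply can represent. A rough C model, assuming the multiply uses only the low 16 bits of each operand and yields a 32-bit product; the function names `fits_mul`, `mul16` and `element_offset` are illustrative, not compiler internals:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MUL_LIMIT 0xFFFFu

/* Compile-time style check: can index * size be done with one 16x16 multiply? */
static bool fits_mul(uint32_t max_index, uint32_t elem_size)
{
    return max_index <= MUL_LIMIT && elem_size <= MUL_LIMIT;
}

/* Model of a 16x16 -> 32-bit unsigned multiply: upper operand bits are ignored. */
static uint32_t mul16(uint32_t a, uint32_t b)
{
    return (a & 0xFFFFu) * (b & 0xFFFFu);
}

/* Byte offset of element `index` in an array of `elem_size`-byte structures. */
static uint32_t element_offset(uint32_t index, uint32_t elem_size)
{
    assert(fits_mul(index, elem_size));  /* the compiler would reject this case */
    return mul16(index, elem_size);
}
```

Note that the product itself is a full 32 bits, so an array of 36-byte structures can still extend far past 64KB; only the individual index and element size are capped.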

wummi · 2024-03-01 14:31

$FFFF is large enough.
And please allow pointers to structures, like struct_name[memadr].a.x to access a structure member anywhere in memory.

rogloh · 2024-03-01 23:10

@cgracey said:
Would limiting structure-member size and indexing values to $FFFF be a serious limitation? I ask because it affords convenient use of the MUL instruction, which is fastest and simplest.

I'd say yes it could be limiting in some more extreme cases. You might want to create a linear array of bytes or other data that is larger than 64k for example to simplify its access in SPIN2 with some algorithms.

The clamping thing is a bit weird, and it will just create a new bug out of a different one. Proper index bounds checking could be nice if it's fast and would catch real bugs but what is a microcontroller even going to do if it detects it at runtime? Generate some fixed assert/debug output message when it happens to an attached debugger and then halt? That's about all it could do, other than try to ignore the problem and continue.

Could the compiler generate two different index code check blocks based on the known array size? Eg. if less than 64k it does bounds checking and uses MUL. But if it exceeds 64k it does not do that work and uses different (slower?) code for index calculations.

Wuerfel_21 · 2024-03-01 23:15

@rogloh said:
The clamping thing is a bit weird, and it will just create a new bug out of a different one. Proper index bounds checking could be nice if it's fast and would catch real bugs but what is a microcontroller even going to do if it detects it at runtime? Generate some fixed assert/debug output message when it happens to an attached debugger and then halt? That's about all it could do, other than try to ignore the problem and continue.

Yes, simply clamping out-of-range values is just as bad as writing out of bounds. As is ignoring the access entirely.
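
A small C sketch of why clamping only swaps one bug for another: the out-of-range write no longer escapes the array, but it silently overwrites a valid element instead. The names `clamp_index` and `store_clamped` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define COUNT 4

/* Force any out-of-range index onto the last valid element. */
static uint32_t clamp_index(uint32_t index, uint32_t count)
{
    assert(count > 0);
    return index < count ? index : count - 1;
}

/* Store with clamping: a bad index lands on array[count - 1] without any error. */
static void store_clamped(int32_t *array, uint32_t count, uint32_t index, int32_t value)
{
    array[clamp_index(index, count)] = value;
}
```

Storing to index 7 of a four-element array replaces element 3, so the program keeps running with corrupted data and no indication of where the bad index came from.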

RossH · 2024-03-02 06:45

@rogloh said:

@cgracey said:
Would limiting structure-member size and indexing values to $FFFF be a serious limitation? I ask because it affords convenient use of the MUL instruction, which is fastest and simplest.

I'd say yes it could be limiting in some more extreme cases. You might want to create a linear array of bytes or other data that is larger than 64k for example to simplify its access in SPIN2 with some algorithms.

Agreed. C programs can have arrays with indexes larger than 64k. It would be silly if you can do it in C but cannot do it in Spin.

cgracey · 2024-03-02 09:11

You can have arrays of structures way bigger than 64KB. It's just the structures that must be within 64KB.

RossH · 2024-03-02 09:30

@cgracey said:
You can have arrays of structures way bigger than 64KB. It's just the structures that must be within 64KB.

Same applies. C can do this. Why not Spin?

cgracey · 2024-03-02 10:09

@RossH said:

@cgracey said:
You can have arrays of structures way bigger than 64KB. It's just the structures that must be within 64KB.

Same applies. C can do this. Why not Spin?

Because we would need to use the CORDIC to compute larger than 16x16-bit multiplies. It would just be slower, but could be done. A MUL instruction is just 2 clocks.

ManAtWork · 2024-03-02 10:27

There are a few cases where arrays or structures may get bigger than 64k but they are rare. Screen or file buffers are an example but they are usually just arrays of bytes or simple types.

@cgracey said:
You can have arrays of structures way bigger than 64KB. It's just the structures that must be within 64KB.

That's totally fine for me.

@cgracey said:
Because we would need to use the CORDIC to compute larger than 16x16-bit multiplies. It would just be slower, but could be done. A MUL instruction is just 2 clocks.

The best (most convenient for the user) thing would be to automatically generate the appropriate code. If it's <64k use MUL, if not use the CORDIC. If that's too much trouble I could live with the 64k limitation, as long as the compiler correctly reports an error if it doesn't fit. If somebody really needs it, support for >64k could be added later.

@Wuerfel_21 said:
Yes, simply clamping out-of-range values is just as bad as writing out of bounds. As is ignoring the access entirely.

I find array index range checking very important for real high-level languages. 20 years ago I was used to programming in Oberon which had index checking and NULL pointer checking. Oberon programs never ever crashed. It just adds a really good layer of security. I can't tell how many times I banged my head against the wall because of stupid errors in C programs that took days, weeks or months to debug.

On the other hand, Spin is a language I typically use for driver code and other hardware related stuff or small "hacks" that can be completed in a day or two. It's simple and efficient. So IMHO it's totally acceptable to not have index range checking.
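
The automatic selection suggested above could look roughly like this, sketched in C. It is a guess at the decision logic only: `pick_strategy` and `scaled_index` are made-up names, and the 64-bit multiply merely stands in for whatever wider multiply the compiler would emit.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { USE_MUL, USE_CORDIC } mul_strategy;

/* Decided once per array at compile time, from the declared bounds. */
static mul_strategy pick_strategy(uint32_t max_index, uint32_t elem_size)
{
    return (max_index <= 0xFFFFu && elem_size <= 0xFFFFu) ? USE_MUL : USE_CORDIC;
}

/* Runtime index scaling for the chosen strategy. */
static uint32_t scaled_index(uint32_t index, uint32_t elem_size, mul_strategy s)
{
    if (s == USE_MUL)
        return (index & 0xFFFFu) * (elem_size & 0xFFFFu);  /* fast 16x16 path */
    return (uint32_t)((uint64_t)index * elem_size);         /* stand-in for the slower wide multiply */
}
```

Because the strategy depends only on declared sizes, code for small arrays would pay nothing for the existence of the slow path.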

VonSzarvas · 2024-03-02 11:09

Agree with @ManAtWork
Also, for those "rare larger than 64K" structures, running on the CORDIC might be too slow for the purpose anyway, and in that case the code might benefit from using multiple 64K structures instead of one slow whopper.

Given the scarcity of time and resources, I'd vote to move on with the MUL solution, and forget about Large_Structure for now.
If still required later a Large_Structure option could be added, or some helper functions written.

Avoid the rabbit hole of rare details that seek to derail the rest of the project! (Portable compiler, SD driver, USB driver... to name a few!)

cgracey · 2024-03-02 12:41

How many staged indexes do you guys think are practically needed?

Here is an example of three:

array[idx1].field1.subfield1[idx2].subfield2.subfield3.subfield4[idx3]

Are 3 sufficient, or must I jump to the next possibility and handle 7?

Wuerfel_21 · 2024-03-02 12:48

Why would there be a limit in the first place? There really shouldn't be with how the bytecodes stack up.

cgracey · 2024-03-02 13:46

@Wuerfel_21 said:
Why would there be a limit in the first place? There really shouldn't be with how the bytecodes stack up.

It's not a bytecode thing. It's an economy issue with RFVAR, which brings in the offset address plus some extra LSBs to handle index count (0..3) and byte/word/long size. There are two bits for index count and two for size.

Wuerfel_21 · 2024-03-02 14:46

Oh, I'd think each indexing would be its own individual operation

cgracey · 2024-03-02 23:14

@Wuerfel_21 said:
Oh, I'd think each indexing would be its own individual operation

Each is unique, but I must have a static count of how many dynamic index values need to be popped from the stack. For each value popped, I do an RFVAR to get the size. Then index*size gets added into the address. Once we have the final address, we chain to the byte/word/long setup bytecode, so that a r/w can happen.

There is an initial RFVAR that gets starting offset, plus index count and BYTE/WORD/LONG type.
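
The sequence described in this post can be modeled in C as a header decode followed by one multiply-add per dynamic index. The bit layout is an assumption: the thread only says that two LSBs hold the index count and two hold the size, not their order or encoding. Plain array reads stand in for RFVAR and for stack pops, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (not confirmed by the thread):
   bits 1..0 = size code, bits 3..2 = index count, bits 31..4 = starting offset. */
typedef struct {
    uint32_t offset;       /* static offset, folded by the compiler */
    unsigned index_count;  /* number of dynamic indexes to pop (0..3) */
    unsigned size_code;    /* assumed: 0 = byte, 1 = word, 2 = long */
} access_header;

static uint32_t pack_header(uint32_t offset, unsigned index_count, unsigned size_code)
{
    assert(index_count <= 3 && size_code <= 2);
    return (offset << 4) | (index_count << 2) | size_code;
}

static access_header unpack_header(uint32_t v)
{
    access_header h = { v >> 4, (v >> 2) & 3u, v & 3u };
    return h;
}

/* stream[0] is the header; stream[1..n] are element sizes, one per index.
   indexes[] stands in for the values popped from the stack. */
static uint32_t resolve_address(uint32_t base, const uint32_t *stream, const uint32_t *indexes)
{
    access_header h = unpack_header(stream[0]);
    uint32_t addr = base + h.offset;
    for (unsigned i = 0; i < h.index_count; i++)
        addr += indexes[i] * stream[1 + i];   /* index * size added into the address */
    return addr;
}
```

For triangle[5].b.y with 36-byte triangles, the static offset of b.y is 16, so the resolved address is base + 16 + 5 * 36. With two bits for the count, at most three dynamic indexes fit, which is where the question of 3 versus 7 comes from.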

Wuerfel_21 · 2024-03-02 23:18

I thought there'd be an immediate address on the stack, so the index values would just be computed in between.

But in such a case... I really don't know, probably be safe and go for 7.

Rayman · 2024-03-02 23:23

The revised 3 version above already looks ridiculous. But, I'm sure someone will try to push the envelope...

cgracey · 2024-03-02 23:48

@Wuerfel_21 said:
I thought there'd be an immediate address on the stack, so the index values would just be computed inbetween.

But in such a case... I really don't know, probably be safe and go for 7.

The compiler can compute everything but dynamic indexing, which requires an index pop and a structure/byte/word/long size. Bases are added at runtime per the bytecode, which kicks off the process.
