New Spin

David Betz · 2017-02-18 22:01

Phil Pilgrim (PhiPi) wrote: »

David Betz wrote:

I think Spin itself is a counter example to your claim. It compiles to a stack machine but the Propeller has no stack. It certainly abstracts the hardware except where PASM is used.

That's a pretty insignificant exception, considering the proliferation of non-standard operators like <# and ><, plus cognew, waitcnt, waitpe, and other language features tailored to the Prop's unique architecture. To implement those in C, for example, has required some pretty awkward contortions, IMO.

-Phil

You're probably right about <# and #> although I admit that I don't remember what they do. I consider things like cognew, waitcnt, and waitpe to be essentially part of a "runtime library". Yes, they might be different for P2.

Heater. · 2017-02-18 22:07

Cluso99,

Currently, where do you pass the address of a local variable?

Maybe I'm missing a point but if one calls a method of another object with local variables, like so:

anOtherObject.someMethod(a, b, c, c)

Then all those parameters are passed on the stack and hence on the LUT on P2 as proposed. So far so good. someMethod is run on the same COG so it sees those parameters on the stack in the same LUT.

BUT, then what happens when someMethod wants to pass those to some PASM that anOtherObject has started running in another COG?

Of course they can be copied.

Forgive me if I am wrong but I seem to recall that objects like F32 optimize things by not copying, they pass an address to the COG running the PASM.

Boom, fail.

I have no idea how common that is.

msrobots · 2017-02-19 00:11

I think it is quite common to pass the address of a block of parameters and/or local variables to a PASM cog.

For example, look at Kye's FAT_engine.

It might be a kludge, but nice to use.

Usually the cog reads the parameters into the cog before the surrounding SPIN method finishes, so the local variables on the stack exist long enough to allow this.

And you have no need for a global address space to be wasted just to pass parameters.

Sure the P2 has more RAM and the 'waste' is not so significant as on a P1 but I cringe when somebody says 'just compile SPIN to PASM'.

Like CMM the main goal of SPIN is to have a bigger amount of code running in a small amount of RAM.

Therefore a bytecode interpreter is the right thing to do.

As spin2cpp shows, a conversion from Spin to C to run PropGCC is still doable and if your Spin Code is small enough, fine, do it and compile it native.

But if not?

512k of RAM sounds big when you are used to 32k on P1. But just think a bit about it.

A COG now has 2k of code and 2k of LUT to be loaded if needed.

That's 4k per COG times 16, so 64k is already gone.
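The arithmetic above can be written out as a small budget check. This is only a sketch using the figures quoted in the post (512k hub, 2k cog RAM, 2k LUT, 16 cogs); it assumes the worst case where every cog needs its own distinct image stored in hub, which the post implies but does not state.

```python
# Back-of-the-envelope hub budget, using only the figures quoted in the post.
# These are the thread's 2017 numbers, not a final chip specification.
KB = 1024

HUB_RAM = 512 * KB    # hub RAM quoted for P2
COG_RAM = 2 * KB      # cog code space per cog
LUT_RAM = 2 * KB      # LUT space per cog
COG_COUNT = 16        # cog count assumed in the post


def hub_needed_for_cog_images(cogs=COG_COUNT):
    """Hub bytes needed to hold one full cog + LUT image for every cog."""
    return (COG_RAM + LUT_RAM) * cogs


images = hub_needed_for_cog_images()
remaining = HUB_RAM - images
print(f"cog/LUT images: {images // KB}k, left over: {remaining // KB}k")
```

Under those assumptions, an eighth of the hub is spent before any bitmap, font or application code is counted.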

VGA on P1 with bitmap takes almost all hub memory and on a P2 everybody assumes we can do higher resolutions and more colors.

There is no ROM anymore so fonts need to get loaded, and so on.

512k sounds big, but is it really?

Code density is very important without an MMU to handle virtual RAM.

I wish there would be space for a 'memory out of bounds' interrupt catching all reads and writes out of the regular address space to allow seamless virtual/extended memory access and a return to the instruction causing the 'memory out of bounds'.

Enjoy!

Mike

David Betz · 2017-02-19 00:16

msrobots wrote: »

I wish there would be space for a 'memory out of bounds' interrupt catching all reads and writes out of the regular address space to allow seamless virtual/extended memory access and a return to the instruction causing the 'memory out of bounds'.

I asked for a TLB but that was deemed too expensive. In fact, I'm sure it is. Alternatively, there could be something like the ARM Thumb mode where code is denser. All of this could be considered for P3 I guess.

potatohead · 2017-02-19 00:22

@Heater

I am hopefully expecting that Spin on the P2 works the same as Spin on the P1. Except...

1) Anything to do with PASM. Or P1 hardware specific builtin functions.

***2) A pile of new builtin functions to deal with new/different hardware features.

3) Perhaps some new high level language features. That don't break old high level Spin syntax/semantics.

That is to say, the high level language features of Spin stay the same. The low level machine dependent parts change.

I'm sure there are many Prop users out there who have never used PASM. What they need to allow them to use the P2 is the same Spin language. They also need objects with the same API's to replace all the ones they have used in their P1 projects and become used to.

That is the way it should be.

4) we get real in-line PASM.

This has allowed for the hardware functions to be done as procedures, eliminating most of #2, shrinking the language core dramatically. Roy, Chip, others, me, had that exact discussion some years ago. HUBEXEC really helps here, as well as with speed / size trade-offs. A combination of inline and PASM procedures being available makes for a lot more granular and efficient options for doing this compared to P1.

We are in rough agreement.

I mainly wrote what I did to very strongly object to the idea that a language and chip designed together do not make sense. If they don't, why even bother with this?

I'm not here because I want to pretend a Prop is like a ton of other things. I'm here precisely because it's not like a ton of other things, and that "chip way" of doing things is not only effective and compelling, but efficient and productive too. Ask around, a lot of people will tell you this, and it's important that we complete that for P2, with the same ideas, same overall goals.

Add some higher level features, whatever. SPIN can use a few things. Wonderful.

But, at the end of the day, there needs to be one unified environment that exploits the chip in a native way, P2+SPIN+PASM needs to exist as it does on P1, and work in the basic ways it does on P1.

All the other stuff can happen as it will, needs to, and everyone will be happy, but this idea of snuffing out what makes the thing great, because standards, is very highly objectionable to me, and numerous others.

Yes, that means a sort of island. Wonderful! No dependencies that way. See how that works? And once you do, it works the same exact way forever too. This is worth it.

Frankly, I can still do exactly what I did the very first time I used a Prop, and it works as well today as it ever did. P2 needs to demonstrate that same quality.

Not everyone will use it, and that's fine. The more code the merrier. But, for those who do, this is all extremely high value.

msrobots · 2017-02-19 00:22

Another idea would be a uniform read and write instruction using

000-3FF for cog access,
400-7FF for LUT access and
800 and above for hub access,
hiding all the details from the user/program.

Hub addresses 000-7FF would not be accessible to these instructions.

Could it raise a 'memory out of bounds' interrupt?
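A rough model of that address decoding, purely as an illustration: the function name `decode`, the `OutOfBounds` exception (standing in for the proposed interrupt) and the 512k upper bound are assumptions, and nothing here describes an actual P2 instruction. The post also does not say whether the ranges are long or byte addresses, so the sketch leaves the units open.

```python
# Hypothetical model of the unified read/write address map proposed above.
class OutOfBounds(Exception):
    """Stands in for the proposed 'memory out of bounds' interrupt."""


COG_END = 0x3FF   # 000-3FF -> cog RAM
LUT_END = 0x7FF   # 400-7FF -> LUT RAM


def decode(addr, hub_size=512 * 1024):
    """Map a unified address to (region, offset) using the ranges from the post."""
    if addr < 0 or addr >= hub_size:
        raise OutOfBounds(hex(addr))
    if addr <= COG_END:
        return ("cog", addr)
    if addr <= LUT_END:
        return ("lut", addr - (COG_END + 1))
    # Everything from 800 upward goes to hub at the same address,
    # which is why hub 000-7FF cannot be reached this way.
    return ("hub", addr)


print(decode(0x010), decode(0x400), decode(0x800))
```

An access beyond the end of hub is the case where the trap would fire and a handler could supply virtual or extended memory.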

Enjoy!

Mike

evanh · 2017-02-19 00:24

David Betz wrote: »

msrobots wrote: »

I wish there would be space for a 'memory out of bounds' interrupt catching all reads and writes out of the regular address space to allow seamless virtual/extended memory access and a return to the instruction causing the 'memory out of bounds'.

I asked for a TLB but that was deemed too expensive. In fact, I'm sure it is. Alternatively, there could be something like the ARM Thumb mode where code is denser. All of this could be considered for P3 I guess.

None of that talk in here!

David Betz · 2017-02-19 00:25

potatohead wrote: »

@Heater

I am hopefully expecting that Spin on the P2 works the same as Spin on the P1. Except...

1) Anything to do with PASM. Or P1 hardware specific builtin functions.

***2) A pile of new builtin functions to deal with new/different hardware features.

3) Perhaps some new high level language features. That don't break old high level Spin syntax/semantics.

That is to say, the high level language features of Spin stay the same. The low level machine dependent parts change.

I'm sure there are many Prop users out there who have never used PASM. What they need to allow them to use the P2 is the same Spin language. They also need objects with the same API's to replace all the ones they have used in their P1 projects and become used to.

That is the way it should be.

4) we get real in-line PASM.

This has allowed for the hardware functions to be done as procedures, eliminating most of #2, shrinking the language core dramatically. Roy, Chip, others, me, had that exact discussion some years ago. HUBEXEC really helps here, as well as with speed / size trade-offs. A combination of inline and PASM procedures being available makes for a lot more granular and efficient options for doing this compared to P1.

We are in rough agreement.

I mainly wrote what I did to very strongly object to the idea that a language and chip designed together do not make sense. If they don't, why even bother with this?

I'm not here because I want to pretend a Prop is like a ton of other things. I'm here precisely because it's not like a ton of other things, and that "chip way" of doing things is not only effective and compelling, but efficient and productive too. Ask around, a lot of people will tell you this, and it's important that we complete that for P2, with the same ideas, same overall goals.

Add some higher level features, whatever. SPIN can use a few things. Wonderful.

But, at the end of the day, there needs to be one unified environment that exploits the chip in a native way, P2+SPIN+PASM needs to exist as it does on P1, and work in the basic ways it does on P1.

All the other stuff can happen as it will, needs to, and everyone will be happy, but this idea of snuffing out what makes the thing great, because standards, is very highly objectionable to me, and numerous others.

Yes, that means a sort of island. Wonderful! No dependencies that way. See how that works? And once you do, it works the same exact way forever too. This is worth it.

Frankly, I can still do exactly what I did the very first time I used a Prop, and it works as well today as it ever did. P2 needs to demonstrate that same quality.

Not everyone will use it, and that's fine. The more code the merrier. But, for those who do, this is all extremely high value.

Ugh. A new language for every new chip. Seems like a recipe for failure.

potatohead · 2017-02-19 00:30

How exactly is that a failure?

Remember the interrupt discussion, and how so damn many people said, "let's add 'em so we have all the options and people can work the way they want to?"

Guess what?

Those people were right. And this chip has all those options. It's going to be great. BTW, my apologies to all of you, who didn't get that recognition before. You won! Glad you did. This chip would be considerably less capable without those options.

Now, this chip is designed specifically for people to program it low level. To make that work optimally, it's important that the environment make complete sense in that context, and when it does, the hardware capability is exposed completely and in an unobtrusive, easy to fully exploit way.

To be frank, if you find that objectionable, don't do it.

But you can bet a bunch of us will. And that includes just forking a broken thing to make it happen no matter what too, if it has to go that way, which from all indications I see, it does not.

Meanwhile, people who do want a unified language environment will get it. Doing that comes with a different set of trade offs.

All are valid here. I do my stuff with Propellers, and I have a pile of P1 code I use to do that with. Some I can share and did. A lot I can't, and don't. You really aren't impacted.

The same will happen with P2, and that pile of code will be different because the hardware is different. And you still won't be impacted. But, if it goes your way, I will profoundly. Not cool.

You might as well ask, "How come all assembly language isn't x86?" That's about as much sense as it all makes, from my point of view, of course. YMMV

potatohead · 2017-02-19 00:43

And maybe I can explain all of that another way. I am not meaning to be hostile or anything.

This just matters.

The primary value of the P1 environment and how it was designed to work together like it does, happens to be consistency and efficiency. It's really lean, but also really effective. A person does not have to know much to get things done. It also means a person doesn't necessarily have to remember a ton to get stuff done too. More is always better, of course.

Now, I happen to take large amounts of time away from this stuff. The P1, and growing set of P2 skills I'm mapping out right now will endure for a very long time. This means I can jump in, grab my code, my stuff, build it, whatever, and then just go. And always just go. It's never different, unless I want or need it to be, and it depends on nothing, but for the OS and since I often use PropTool, that's Windows, and I just don't care.

This is also why I really want to see and contribute to an on-chip solution. It's about locking in some good capability and then just applying it, exploiting it, refining it, until it just works well.

Better may come along. Portable may come along. Lots of stuff may come along, and that is all just great. It's there, at the ready, and potentially useful. I'm not opposed. But I will often judge it based on need.

I want to get the thing done, not deal with the tools needed to get the thing that helps me get another thing, that gets the real thing done. I also very firmly believe those same qualities help new people who seek to do this stuff jump in and be successful, and do so on their terms and time.

This is about all of that. It's important.

David Betz · 2017-02-19 00:46

Sorry. I'll keep quiet.

msrobots · 2017-02-19 00:49

Someone here in the forum wrote about some interesting concept for a spin interpreter.

Since the interpreter has to be loaded with the program anyway (it is not in ROM), it only needs to contain the implementations of the bytecodes that are actually used.

So if your program does not need a bytecode, its implementation is not compiled into the final binary.
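The idea can be sketched as a build step that scans the compiled program and keeps only the handlers it needs. Everything below is hypothetical: the opcode numbers and handler names are invented, and the sketch assumes one opcode per byte with no operands, which a real Spin bytecode stream would not satisfy.

```python
# Hypothetical bytecode set; opcode numbers and handler names are made up.
HANDLERS = {
    0x01: "push_const",
    0x02: "add",
    0x03: "sub",
    0x04: "call",
    0x05: "ret",
    0x06: "waitcnt",
}


def used_opcodes(program):
    """Return the set of opcodes in a program (one opcode per byte, no operands)."""
    return set(program)


def build_interpreter(program):
    """Keep only the handlers that the program actually uses."""
    return {op: HANDLERS[op] for op in sorted(used_opcodes(program))}


program = bytes([0x01, 0x01, 0x02, 0x05])
interp = build_interpreter(program)
print(interp)   # handlers for sub, call and waitcnt are left out
```

The trade-off is that the interpreter image is no longer fixed, so it has to be rebuilt for each program.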

Enjoy!

Mike

potatohead · 2017-02-19 00:51

David Betz wrote: »

Sorry. I'll keep quiet.

Not my intent and unfortunate.

I'll add one more thing. Remember the many C discussions? I've been a consistent and firm advocate for making sure all of that is possible and potent. Did that, because you are a great advocate and teacher of those ways of working. You need to see it maximized same as me.

Can't others get that same consideration?

That is my intent.

Tons of people, from various backgrounds will benefit. Isn't that what we are doing this for?

evanh · 2017-02-19 01:01

I'm not convinced that including interrupts is a good idea. The debug IRQ gives a false sense of oversight.

JRetSapDoog · 2017-02-19 01:07

cgracey wrote: »

I've started working on the new Spin.

That's fantastic, Chip! Seems like a good strategy, and perhaps starting with an interpreted version will get you to the finish line faster (and at least provide a baseline). This also suggests that the design of the P2 hardware is pretty stable now.

With the chip being roughly about 10X faster and with all the optimizations possible with the new instructions and with using the LUT, it sounds like we're looking at a new SPIN that's often 30X faster (perhaps 50X or more?), while still generating compact code (I'm ignoring in-line PASM, of course). Obviously, any speed gains are code dependent. But that's going to make many things really fly. For example, slinging around tile maps is already pretty fast on the P1, as they can paint the screen in nearly a blink of an eye, but with the new SPIN, that will seem instantaneous.

potatohead · 2017-02-19 01:13

I'm not convinced that including interrupts is a good idea. The debug IRQ gives a false sense of oversight.

Understood. I thought Chip put it best, "We aren't going to talk about this."

Here is the beauty of it:

We are gonna get to find out. And as we do, we can resolve it, share, and do all that we know how to do.

That's why those people were right.

evanh · 2017-02-19 01:42

JRetSapDoog wrote: »

cgracey wrote: »

I've started working on the new Spin.

...
With the chip being roughly about 10X faster and with all the optimizations possible with the new instructions and with using the LUT, it sounds like we're looking at a new SPIN that's often 30X faster (perhaps 50X or more?), ...

Aw, might be a tad optimistic. Individual Cogs are only 5x faster for starters.

Actually, regarding the Spin interpreter, swapping things around by putting the bulk of the engine into LUTRAM and having the bytecode buffer and Spin stack reside in CogRAM would be faster execution I suspect.

David Betz · 2017-02-19 01:47

evanh wrote: »

JRetSapDoog wrote: »

cgracey wrote: »

I've started working on the new Spin.

...
With the chip being roughly about 10X faster and with all the optimizations possible with the new instructions and with using the LUT, it sounds like we're looking at a new SPIN that's often 30X faster (perhaps 50X or more?), ...

Aw, might be a tad optimistic. Individual Cogs are only 5x faster for starters.

Actually, regarding the Spin interpreter, swapping things around by putting the bulk of the engine into LUTRAM and having the bytecode buffer and Spin stack reside in CogRAM would be faster execution I suspect.

Yes but there aren't any byte access instructions for COG RAM are there? Or do you intend to store one byte per long?

evanh · 2017-02-19 01:49

Damn! Foiled am I.

cgracey · 2017-02-19 02:28

David Betz wrote: »

evanh wrote: »

It's common practice to make everything global too.

Globals have the problem that if more than one COG tries to use them at the same time you run into conflicts. I guess we could have a malloc/free concept and use heap-allocated objects for those cases. That's starting to seem too much like C though.

Garbage collection is a tricky thing. It's at big odds with determinism.

David Betz · 2017-02-19 02:32

cgracey wrote: »

David Betz wrote: »

evanh wrote: »

It's common practice to make everything global too.

Globals have the problem that if more than one COG tries to use them at the same time you run into conflicts. I guess we could have a malloc/free concept and use heap-allocated objects for those cases. That's starting to seem too much like C though.

Garbage collection is a tricky thing. It's at big odds with determinism.

Ignore my observation about locals. It should be possible to use VAR variables to build an argument block to pass to another COG. Since every instance of an object has its own copy of the VAR variables there should be no conflict if different object instances in different COGs run in parallel. This does mean you can't use the trick of using local variables or parameters like you can on P1 but that might not be such a big problem.
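A toy model of that arrangement, as a sketch only: hub RAM is a shared list, each object instance reserves its own block (standing in for its VAR section), and the address of that block is what gets handed to the other cog. The names `Hub`, `Worker`, `start` and `other_cog` are hypothetical and do not correspond to any Spin or P2 API.

```python
# Toy model: hub RAM is one shared list; each object instance owns its own VAR block.
class Hub:
    def __init__(self, size):
        self.mem = [0] * size
        self.next_free = 0

    def alloc(self, count):
        """Reserve `count` longs and return the address of the first one."""
        if self.next_free + count > len(self.mem):
            raise MemoryError("hub exhausted")
        addr = self.next_free
        self.next_free += count
        return addr


class Worker:
    """Stands in for an object whose VAR section holds a 3-long argument block."""

    def __init__(self, hub):
        self.hub = hub
        self.args = hub.alloc(3)          # per-instance block in hub

    def start(self, a, b):
        # Fill the argument block, then return its hub address, which is
        # what would be passed to the newly launched cog.
        self.hub.mem[self.args] = a
        self.hub.mem[self.args + 1] = b
        return self.args


def other_cog(hub, addr):
    """The launched cog reads its arguments via the hub address and writes a result."""
    hub.mem[addr + 2] = hub.mem[addr] + hub.mem[addr + 1]


hub = Hub(16)
w1, w2 = Worker(hub), Worker(hub)
other_cog(hub, w1.start(1, 2))
other_cog(hub, w2.start(10, 20))
print(hub.mem[w1.args + 2], hub.mem[w2.args + 2])
```

Because the two instances own different blocks, they do not disturb each other, whereas a block living on a LUT-based stack would have no address the other cog could use.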

cgracey · 2017-02-19 02:33

Heater. wrote: »

If I understand the issue correctly:

1) Spin/PASM running in a COG cannot read the LUT of other COGS.
2) The suggestion is to put Spin's stack in the LUT of the COG running it. All local variables will be in LUT.
3) Therefore local variables cannot be passed, via PAR, some mailbox, or whatever, from Spin in one COG to Spin/PASM running in another COG.

This sure sounds like it would break things like the float 32 objects.

Or am I missing a point here?

Local variables in Spin are only the variables declared within PUB/PRI methods, which serve as workspace during that method's execution. Any variable that is going to be shared among methods must be global and in the hub.

ozpropdev · 2017-02-19 02:39

David Betz wrote: »

Yes but there aren't any byte access instructions for COG RAM are there? Or do you intend to store one byte per long?

A "byte" variant of ALTB would be neat!

David Betz · 2017-02-19 02:45

cgracey wrote: »

Heater. wrote: »

If I understand the issue correctly:

1) Spin/PASM running in a COG cannot read the LUT of other COGS.
2) The suggestion is to put Spin's stack in the LUT of the COG running it. All local variables will be in LUT.
3) Therefore local variables cannot be passed, via PAR, some mailbox, or whatever, from Spin in one COG to Spin/PASM running in another COG.

This sure sounds like it would break things like the float 32 objects.

Or am I missing a point here?

Local variables in Spin are only the variables declared within PUB/PRI methods, which serve as workspace during that method's execution. Any variable that is going to be shared among methods must be global and in the hub.

I think the point is that local variables or even parameters are often used to pass parameters to cognew for use by a newly launched COG. That won't be possible on P2.

cgracey · 2017-02-19 02:58

ozpropdev wrote: »

David Betz wrote: »

Yes but there aren't any byte access instructions for COG RAM are there? Or do you intend to store one byte per long?

A "byte" variant of ALTB would be neat!

Yes!!!!

That can be done, and rather than just modify S, it could modify the N bits for 'GETBYTE D,S,#N'. This will mean that you can do a SETQ+RDLONG and then index through the bytes! That's important for a bytecode interpreter. Another instruction could write bytes sequentially, as well.
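What "index through the bytes" of a block of longs amounts to can be shown in a few lines. This is a sketch of the idea only, not of the proposed instruction: `get_byte` and `bytes_from_longs` are hypothetical helpers, and the byte order (byte 0 is the least significant byte of each long) is an assumption.

```python
def get_byte(long_value, n):
    """Return byte n (0 = least significant) of a 32-bit long."""
    return (long_value >> (8 * (n & 3))) & 0xFF


def bytes_from_longs(longs):
    """Walk a block of longs byte by byte, lowest byte of each long first."""
    for index in range(len(longs) * 4):
        # The upper bits of the index pick the long, the low two bits pick the byte.
        yield get_byte(longs[index >> 2], index & 3)


block = [0x44332211, 0x88776655]          # e.g. two longs fetched in one burst
print([hex(b) for b in bytes_from_longs(block)])
```

A single running index is enough to fetch successive bytecodes, which is the property a bytecode interpreter needs.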

Phil Pilgrim (PhiPi) · 2017-02-19 03:18

cgracey wrote:

Garbage collection is a tricky thing. It's at big odds with determinism.

I've often wondered about that in the context of the Propeller. How much would determinism suffer if one cog were devoted full-time to continuous garbage collection -- assuming, of course, that it could keep up with garbage creation by the other cogs?

-Phil

potatohead · 2017-02-19 03:22

Apple says resolving that is not possible. Google says it is. Apple explicitly does not do garbage collection on its mobile / tablet devices. Watches too, I suppose. Google does.

iOS has less latency in its user interaction and processing than Android does.

IMHO, the answer here lies in "what is being determined?"

ozpropdev · 2017-02-19 04:32

Some thoughts on SPIN2.

While I see the fundamental look and feel of SPIN2 being the same as P1's SPIN, it also needs a personality upgrade to take advantage of the new features of the P2.
We now have interrupts, transcendental capability (CORDIC), analog IO, smart pins, streamers and so on.
Weaving these features into SPIN's fabric makes for powerful and probably more to the point "FUN" programming.

I imagine cool features like:

	sensor := AIN[27]  'read analog input on pin #27
	AOUT[22] := speed_value	'set analog out value on pin #22
	myval := SQRT[myval1]	'get square root of 64 bit value
	ROTATE[x,y,angle,new_x,new_y]	'rotate signed (x,y) by angle, result into new_x,new_y

With 32x32 multiply, 64/32 divide and 64 bit square root maybe 64 bit support is needed too.

VAR
	dlong	myval1
'instead of
	long	myval1[2]

The smart pins might be configured by PNut using pre-defined "macro" style commands.

	smartpin(7,_nco_freq,1_200_000)	'output 1.2MHz on pin #7

or for smart pin modes that require interaction maybe something like

	smartpin("mytx",16,_asynctx,8,115200)	'setup async tx on pin #16 @ 115200, 8 bits
'and then used in the form
	mytx := byte[index++]			'send byte to serial tx

cgracey · 2017-02-19 05:26

Phil Pilgrim (PhiPi) wrote: »

cgracey wrote:

Garbage collection is a tricky thing. It's at big odds with determinism.

I've often wondered about that in the context of the Propeller. How much would determinism suffer if one cog were devoted full-time to continuous garbage collection -- assuming, of course, that it could keep up with garbage creation by the other cogs?

-Phil

I think the most insidious delays related to garbage collection come from when blocks need to be moved in order to defragment memory. During that operation, nobody is allowed to touch the blocks, so the other programs must be stalled temporarily, even if through agreed semaphores.

Phil Pilgrim (PhiPi) · 2017-02-19 05:27

ozpropdev wrote:

ROTATE[x,y,angle,new_x,new_y]

If Spin2 is going to be call-by-value, like Spin, I could see this being expressed as:

(new_x, new_y) := rotate(x, y, angle)

Yes, that's a vector assignment, just like it looks.
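For comparison, this is what the same call-by-value, multiple-result form looks like in a language that already has it. It is a sketch in Python rather than Spin2, and it assumes floating-point values and an angle in radians, whereas the hardware CORDIC under discussion would presumably work on integers.

```python
import math


def rotate(x, y, angle):
    """Rotate the point (x, y) counter-clockwise by `angle` radians; return both results."""
    c, s = math.cos(angle), math.sin(angle)
    return (x * c - y * s, x * s + y * c)


# Both results are assigned at once; no output parameters are passed in.
new_x, new_y = rotate(1.0, 0.0, math.pi / 2)
print(round(new_x, 6), round(new_y, 6))
```

The inputs stay untouched and the outputs appear only on the left-hand side, which is the property the vector assignment syntax is after.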

-Phil
