Propeller II update - BLOG

Heater. · 2014-01-10 04:21

All this backwards compatibility talk is making me queasy. Please stop.

jmg,

Spin2 is not a ROM product, so it is no longer a single-point design.

This is a horrible nightmare.

To be clear: Spin has never been a ROM product. Perhaps its bytecode interpreter is, but the language and compilers are not. We already have multiple implementations of the Spin compiler, and they tend to add small extra features over Parallax Spin, such as #define or the @@@ operator.

What jmg is saying is that we can end up in a horrible mess of incompatibility:
1) Standard Spin for P2 may not run Spin 1 programs out of the box.
2) P2 might get modified Spin dialects that do run old P1 code.
3) Anyone can hack the language however they like and make many incompatible dialects.
4) Just for fun we can have multiple variations of the interpreter now!

Disaster.

It's about time the Spin language was specified formally.

Next, it also makes sense to allow the hardware to run Spin1 constants (especially waits) unchanged, and for that, a very simple prescaler option is needed on the chip.

Oh my God no!

I rebel at adding hardware junk to the P2 in an attempt to obtain backwards compatibility. The P2 does not need any further complexity. I'm very sure that, even with such hardware warts added, 100% compatibility will not be achievable.

There is a reason we have the ability to define constants in our code: so that it's easy to adapt a program to new environments. Clock frequency setting is a case in point.
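A minimal sketch of the point above, written in Python for illustration: derive wait lengths from a clock-frequency constant rather than hard-coding cycle counts, in the spirit of the familiar Spin idiom `waitcnt(clkfreq / 1000 * ms + cnt)`. The 160 MHz P2 figure and the `delay_cycles` helper are hypothetical, chosen only to show the scaling.

```python
P1_CLKFREQ = 80_000_000    # common P1 setup: 5 MHz crystal with PLL16x
P2_CLKFREQ = 160_000_000   # hypothetical P2 clock, for illustration only


def delay_cycles(ms, clkfreq):
    """Cycles to wait for `ms` milliseconds at `clkfreq` Hz.

    Deriving the count from the clock constant means only that one
    constant has to change when the code moves to a faster chip.
    """
    return clkfreq // 1000 * ms


HARD_CODED_10MS = 800_000  # a raw P1 cycle count: 10 ms only at 80 MHz
```

On the hypothetical 160 MHz part, the hard-coded 800_000 would last only 5 ms, while `delay_cycles(10, P2_CLKFREQ)` still gives 10 ms.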

My feeling is that most P1 code that we want to carry forward to P2 (all of obex for example) fits into a couple of categories:

a) It's some kind of driver that relies on a lump of PASM.

I don't believe those objects will ever work "out of the box". That PASM will need tweaking for the P2.

b) It's some kind of general functionality in Spin.

For example a FAT file system layer. That kind of code has no PASM and no hardware dependencies. It will no doubt run on P2 easily.

Ergo, adding hardware to the P2 in a vain attempt at backwards compatibility is pointless.

Still, the question stands I guess. How do we get all of OBEX working on the P2?

Is it just down to the object authors to adapt their code to P2? That might take forever, or not happen at all, depending on the object's author.

Or does the P2 just start out OBEXless like the P1 did?

ctwardell · 2014-01-10 07:10

cgracey wrote: »

BIG NEWS!!!

I got the hub execution working last night (5:30am). It only uses one cache line, but it should be easily expanded to four lines, using a least-recently-used algorithm.

Since hub execution occurs whenever a task's 16-bit program counter is beyond $01FF, it turns out that there is no need to store and recall a hub-mode bit. The only rule is: if an instruction is being fetched above $01FF, it needs to be read from the icache, which might entail an 8-long hub fetch. This got rid of all kinds of state hardware (like hub mode being stored in bit 18 of stack data). When things turn out right, they are always simple. I feel really good about how this is shaping up. There's no bandwidth pinch anywhere, either, which I was concerned about.

When hub code is cached up, it runs exactly as fast as it would from the cog. I made a pin-toggling program that runs partly in the cog and partly in the hub, and on the scope you can see every 50ns cycle and when the cache fetching occurs. I feel relieved. This was a big feature add, but it's something very valuable. This will work wonders for the Spin interpreter's efficiency. Being able to bust beyond the cog's RAM is a fantastic feeling.
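A rough Python model of the fetch rule described above, as I read it: program counters at or below $01FF come from cog RAM, anything above goes through an instruction cache whose lines hold 8 longs, and a miss costs one 8-long hub read. The `ICache` class, treating the PC as a long address, and the LRU bookkeeping for the multi-line case are assumptions for illustration, not the actual Verilog.

```python
from collections import OrderedDict

COG_LIMIT = 0x1FF   # PCs at or below $01FF execute from cog RAM
LINE_LONGS = 8      # one cache line holds 8 longs


class ICache:
    """Hub-execution instruction cache with `lines` lines and LRU replacement."""

    def __init__(self, lines=1):
        self.lines = lines
        self.blocks = OrderedDict()   # cached block numbers, least recently used first
        self.hub_fetches = 0          # number of 8-long hub reads performed

    def fetch(self, pc):
        """Return where the instruction at `pc` comes from: 'cog' or 'hub'."""
        if pc <= COG_LIMIT:
            return "cog"
        block = pc // LINE_LONGS
        if block in self.blocks:
            self.blocks.move_to_end(block)        # hit: mark most recently used
        else:
            self.hub_fetches += 1                 # miss: one 8-long hub read
            self.blocks[block] = True
            if len(self.blocks) > self.lines:
                self.blocks.popitem(last=False)   # evict least recently used
        return "hub"
```

A tight loop that straddles two 8-long blocks refills a single line on every pass, while a four-line LRU cache fetches each block only once, which is one reason the expansion to four lines is attractive.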

This will work really well for bytecode interpreters. The jump table can use COG space code for operations that are short and called often, and HUB space for operations that are larger and called less often.

C.W.
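A toy sketch of the split jump table idea, in Python: each opcode maps to a handler plus a tag recording whether that handler would live in cog RAM (short, hot) or hub RAM (larger, rarely used). The opcodes, handlers and the `run` loop are hypothetical; the point is only the table shape, and that dispatch is the same lookup either way.

```python
class VM:
    """Minimal stack-machine state for the sketch."""

    def __init__(self):
        self.stack = []
        self.dispatches = {"cog": 0, "hub": 0}


def op_push(vm, arg):      # short and frequent: cog candidate
    vm.stack.append(arg)


def op_add(vm, arg):       # short and frequent: cog candidate
    b, a = vm.stack.pop(), vm.stack.pop()
    vm.stack.append(a + b)


def op_pow(vm, arg):       # stand-in for a larger, rarer op: hub candidate
    b, a = vm.stack.pop(), vm.stack.pop()
    vm.stack.append(a ** b)


# opcode -> (where the handler would be placed, handler)
JUMP_TABLE = {
    0x01: ("cog", op_push),
    0x02: ("cog", op_add),
    0x03: ("hub", op_pow),
}


def run(program):
    """Execute (opcode, arg) pairs; both regions use the same table lookup."""
    vm = VM()
    for opcode, arg in program:
        region, handler = JUMP_TABLE[opcode]
        vm.dispatches[region] += 1
        handler(vm, arg)
    return vm
```

For example, `run([(0x01, 2), (0x01, 3), (0x02, None), (0x01, 2), (0x03, None)])` computes (2 + 3) ** 2 with four cog-resident dispatches and one hub-resident one.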

David Betz · 2014-01-10 07:36

ctwardell wrote: »

This will work really well for bytecode interpreters. The jump table can use COG space code for operations that are short and called often, and HUB space for operations that are larger and called less often.

C.W.

Definitely! This hub execution mode opens up lots of exciting possibilities!

Dave Hein · 2014-01-10 08:46

ctwardell wrote: »

This will work really well for bytecode interpreters. The jump table can use COG space code for operations that are short and called often, and HUB space for operations that are larger and called less often.

Yes, a Spin2 VM could be made to run faster by using this technique versus a VM that executes entirely out of COG memory. Opcodes that are used most often can be optimized to execute more efficiently than they could be if the entire VM is in COG memory. This is how the SpinLMM object works for P1, but it uses LMM instead of directly executing from HUB RAM.

It would be good if Spin2 would allow for inline PASM, which would eliminate the need to start up another PASM cog in a lot of applications. By using inline PASM, a Spin2 program that does floating point calculations could achieve high speed while using a single cog.

David Betz · 2014-01-10 09:09

Dave Hein wrote: »

Yes, a Spin2 VM could be made to run faster by using this technique versus a VM that executes entirely out of COG memory. Opcodes that are used most often can be optimized to execute more efficiently than they could be if the entire VM is in COG memory. This is how the SpinLMM object works for P1, but it uses LMM instead of directly executing from HUB RAM.

It would be good if Spin2 would allow for inline PASM, which would eliminate the need to start up another PASM cog in a lot of applications. By using inline PASM, a Spin2 program that does floating point calculations could achieve high speed while using a single cog.

Even hub execution by itself may eliminate the need for loading other COGs. For example, we've had floating point processor COGs that really only existed because you couldn't have inline assembly language in Spin and there wasn't enough space in a COG for the floating point code (in the case of the PropGCC LMM kernel). I guess what I'm saying is that sometimes additional COGs were used in a program solely to get more program space. This is true of the PropGCC cache drivers as well. There is really no reason that code couldn't be in the XMM kernel if the COG had more program memory. Hub execution is going to change a lot of things for the better. Thanks Chip!!!

Ariba · 2014-01-10 09:27

Spin is a language that goes down to the bare silicon of the Propeller chip.
So Spin2 will be as different from Spin1 as the Prop2 is different from Prop1.

Perhaps somebody will make a Prop1 emulator on Prop2 that can run Spin1; that's the only way to get a compatible Spin.

We just will have a new OBEX with new objects. Prop2 can do so much more and so much faster that it makes no sense to run the old Prop1 objects on it. There will be new objects that replace 4 or more objects from the Prop1, and TV and VGA drivers with resolutions and colors the Prop1 can only dream of - so who will use the old Prop1 objects then?

If you need to run the same code on Prop1 and Prop2, you'd better use C or C++.

Andy

David Betz · 2014-01-10 09:33

Ariba wrote: »

If you need to run the same code on Prop1 and Prop2, you'd better use C or C++.

Andy

Nice to see someone promoting C on the Propeller! :-)

Since we're starting all of this over again for P2, I wonder if there might be some way to get Spin and C to coexist better than they can on P1? I'm not necessarily saying you should be able to call a Spin function from C code but it would be nice if they could share hub memory and maybe common variables.

Ariba · 2014-01-10 09:33

Dave Hein wrote: »

...

It would be good if Spin2 would allow for inline PASM, which would eliminate the need to start up another PASM cog in a lot of applications. By using inline PASM, a Spin2 program that does floating point calculations could achieve high speed while using a single cog.

Not sure if you'd call this inline PASM, but Spin2 (as it was before the newest change) can call PASM routines in cog memory (as blocking routines) or start up to 3 PASM threads that run in parallel with the Spin task. You can even run a high-resolution VGA driver in the same cog. And every cog already has two hardware serial ports for free; they may only need a FIFO handler in PASM or Spin.

Andy
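For the FIFO handler mentioned above, a minimal ring-buffer sketch in Python. It assumes one producer (say, the code draining a serial port) and one consumer (the Spin task), so each index is written by only one side, and one slot is left empty to tell "full" from "empty". The class and its names are illustrative, not an existing object.

```python
class Fifo:
    """Single-producer/single-consumer ring buffer holding up to size - 1 bytes."""

    def __init__(self, size):
        self.buf = [0] * size
        self.head = 0   # next slot to write; only the producer changes it
        self.tail = 0   # next slot to read; only the consumer changes it

    def put(self, byte):
        """Append a byte; return False if the buffer is full."""
        nxt = (self.head + 1) % len(self.buf)
        if nxt == self.tail:
            return False          # full: caller decides whether to drop or retry
        self.buf[self.head] = byte
        self.head = nxt
        return True

    def get(self):
        """Remove and return the oldest byte, or None if the buffer is empty."""
        if self.tail == self.head:
            return None
        byte = self.buf[self.tail]
        self.tail = (self.tail + 1) % len(self.buf)
        return byte
```

Keeping a shared count instead of the spare slot would be simpler to read, but then both sides would write the same variable, which is exactly what a two-party FIFO wants to avoid.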

FredBlais · 2014-01-10 10:15

David Betz wrote: »

Since we're starting all of this over again for P2, I wonder if there might be some way to get Spin and C to coexist better than they can on P1? I'm not necessarily saying you should be able to call a Spin function from C code but it would be nice if they could share hub memory and maybe common variables.

It would be great if someone could write a C program, include a Spin object and call its functions like C functions. What is preventing us from doing that?
I always programmed in C for embedded but this changed with the Propeller and Spin, I just don't want to lose access to all these great OBEX objects.

MJB · 2014-01-10 10:30

evanh wrote: »

With a small bit of circuitry added the two counters in a Cog could be stitched together to form a second order conversion. Based on the AD7400's example I proposed one solution not long ago - http://forums.parallax.com/showthread.php/125543-Propeller-II-update-BLOG/page134 . Drawing - http://forums.parallax.com/attachment.php?attachmentid=104153&d=1380541194

not long ago - but 90 pages back ;-)
but good ideas come again -
didn't remember it

on rereading your post and later, I didn't see it further discussed.
maybe it got lost in the many ideas talked about.

higher order just gives so much better SNR
and having the counters support decoding of external bitstreams of higher order than 1 would be great.
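To make the "higher order" point concrete, here is a minimal Python model of a second-order sigma-delta modulator with a 1-bit quantizer, the kind of bitstream a counter-based decoder would have to digest. Its noise transfer function is (1 - z^-1)^2, versus (1 - z^-1) for a first-order loop, so more quantization noise is pushed to high frequencies where a decimation filter can remove it. The function names are hypothetical, and the model ignores analog non-idealities.

```python
def sigma_delta2(x, n):
    """Second-order sigma-delta modulator with a 1-bit quantizer.

    x is a DC input in (-1, 1); returns n bits (1 means +1, 0 means -1).
    Noise transfer function: (1 - z^-1)^2.
    """
    i1 = i2 = 0.0
    y = -1.0                      # previous quantizer output (feedback)
    bits = []
    for _ in range(n):
        i1 += x - y               # first integrator
        i2 += i1 - y              # second integrator
        y = 1.0 if i2 >= 0 else -1.0
        bits.append(1 if y > 0 else 0)
    return bits


def average(bits):
    """Plain average of the +/-1 stream: the crudest possible decoder."""
    return sum(2 * b - 1 for b in bits) / len(bits)
```

Even a plain average recovers the DC input; it takes a proper sinc-style decimation filter to turn the extra noise shaping into better SNR.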

rod1963 · 2014-01-10 11:22

Ariba is right, if you want portability use C/C++. SPIN is tied too closely to the P1's hardware to have any real amount of portability to the P2.

Dave Hein · 2014-01-10 11:27

rod1963 wrote: »

Ariba is right, if you want portability use C/C++. SPIN is tied too closely to the P1's hardware to have any real amount of portability to the P2.

For the most part Spin is a high level language that could be implemented on any processor. The only hardware-specific stuff is the interaction with registers.

David Betz · 2014-01-10 11:29

rod1963 wrote: »

Ariba is right, if you want portability use C/C++. SPIN is tied too closely to the P1's hardware to have any real amount of portability to the P2.

I'm not sure I buy this argument. Sure, PASM is heavily tied to the P1 hardware but the Spin language itself isn't other than the fact that it has a bunch of built-in variables and functions to interface with the P1 hardware. In fact, PropGCC also has those functions and variables. The core Spin language could remain the same when moving to P2 or at least be a superset of what is in P1.

Cluso99 · 2014-01-10 11:50

David Betz wrote: »

Even hub execution by itself may eliminate the need for loading other COGs. For example, we've had floating point processor COGs that really only existed because you couldn't have inline assembly language in Spin and there wasn't enough space in a COG for the floating point code (in the case of the PropGCC LMM kernel). I guess what I'm saying is that sometimes additional COGs were used in a program solely to get more program space. This is true of the PropGCC cache drivers as well. There is really no reason that code couldn't be in the XMM kernel if the COG had more program memory. Hub execution is going to change a lot of things for the better. Thanks Chip!!!

YES! I had not thought about the fact that a cog can simply execute code from hub. Of course, you cannot use self-modifying instructions in the hub-resident part, so there will be some restrictions.

There have been some excellent points made since you posted this.

Spin2 will be able to do all sorts of things, so it will grow with time. Ultimately, we are going to need the compiler to include or exclude functions as required by the "complete" P2 code: one cog may require some floating point routines and another may require others, so the final compilation will need to know what can be removed.
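One way a compiler could decide what to remove, sketched in Python: treat each cog's entry routine as a root, walk the call graph, and keep only routines reachable from some root. The routine names and the dict-based call graph are hypothetical.

```python
def reachable(call_graph, roots):
    """Names of all routines reachable from `roots` (iterative depth-first walk)."""
    seen = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(call_graph.get(name, ()))
    return seen


def prune(call_graph, roots):
    """Drop every routine that no root can reach."""
    keep = reachable(call_graph, roots)
    return {name: calls for name, calls in call_graph.items() if name in keep}


# Hypothetical program: two cogs each use part of a floating-point library.
CALLS = {
    "cog1_main": ["fadd"],
    "cog2_main": ["fsqrt"],
    "fadd": [],
    "fmul": [],
    "fsqrt": ["fmul"],
    "fsin": ["fmul", "fadd"],
}
```

Here `prune(CALLS, ["cog1_main", "cog2_main"])` keeps `fmul` because `fsqrt` needs it, and drops `fsin`, which no cog calls.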

IIRC Chip has already said he will be allowing inline PASM.

We have not even begun to scratch the surface of the possibilities with hubexec mode. The COG RAM restriction has certainly had its shackles removed!! We are really going to make a lot of use of this extra 128KB of hub RAM now!!

rod1963 · 2014-01-10 12:16

Dave,

Well then, there are no problems at all, so we can run P1 Spin on the P2 with no modification. Cool.

The bigger issue seems to be how touchy/defensive people here are at the mere mention of C/C++. Makes me wonder if it's going to be the red-headed stepchild that Parallax pays lip service to.

David Betz · 2014-01-10 12:22

rod1963 wrote: »

Dave,

Well then, there are no problems at all, so we can run P1 Spin on the P2 with no modification. Cool.

The bigger issue seems to be how touchy/defensive people here are at the mere mention of C/C++. Makes me wonder if it's going to be the red-headed stepchild that Parallax pays lip service to.

I don't know if that is still true. It seems like many people are recognizing the value of having C/C++ available for the Propeller even though it might not be their language of choice. While I may not ever do a serious project in Forth or Basic I still think they have value. I'm not sure I can say the same about BrainF*ck though! :-)

Anyway, it doesn't have to be either Spin or C/C++. It can be both, and both can add to the value of P1 and P2.

Seairth · 2014-01-10 12:24

David Betz wrote: »

It doesn't have to be either Spin or C/C++. It can be both, and both can add to the value of P1 and P2.

Now, about Python...

David Betz · 2014-01-10 12:25

Seairth wrote: »

Now, about Python...

Python may be viable on P2. Not sure about P1 though.

Seairth · 2014-01-10 12:28

David Betz wrote: »

Python may be viable on P2. Not sure about P1 though.

Actually, I was kidding. I love the language, but I don't think it would be appropriate for the P2. Maybe it'll be appropriate for the P3, when it has hardware-level access to gobs of extra memory...

potatohead · 2014-01-10 12:42

I am one of those people. GCC needs to be awesome on P2.

C isn't my language of choice, though I have used it at various times in the past, with good experiences more the norm than not. I find C too "thick" on P1, where the scale of things just pulls me right to SPIN and PASM. Ross has people getting stuff done on Catalina, and the GCC team has users now getting stuff done too. This is just great!

P2 is at a scale where C will be more appealing and less "thick" relative to the overall chip capability. I think that difference will very significantly impact the overall desirability and utility of C in a positive way.

We will have more robust, inclusive and capable standard libraries on P2, and that will reduce how much C has to be coaxed into making effective use of the chip.

That means being able to grab and use more general C code, easier, better, faster.

potatohead · 2014-01-10 12:47

I am not going to face a forum edit on my phone...

So, the last part of my post was to the effect of finding language wars undesirable and finding maximized language options highly desirable.

Who knows what each of us may bump into, or which language will suit it best? I for one think having the C option running great is nothing but added value, and I look forward to seeing it realized as potently as SPIN and PASM likely will be.

evanh · 2014-01-10 14:54

MJB wrote: »

on rereading your post and later, I didn't see it further discussed.
maybe it got lost in the many ideas talked about.

higher order just gives so much better SNR
and having the counters support decoding of external bitstreams of higher order than 1 would be great.

Too true.

It's not looking good for this idea if I'm reading things right. It looks like Chip has decided to remove the second counter altogether.

Chip, if you didn't see this first time around, here's the drawing as a PNG again - http://forums.parallax.com/attachment.php?attachmentid=104154&d=1380541194

jmg · 2014-01-10 15:08

Heater. wrote: »

Still, the question stands I guess. How do we get all of OBEX working on the P2?

Is it just down to the object authors to adapt their code to P2? That might take forever, or not happen at all, depending on the object's author.

Or does the P2 just start out OBEXless like the P1 did?

Oops, you have just made a case for thinking about backward-compatible use, after first trying to say it was not on the table?

jmg · 2014-01-10 15:25

potatohead wrote: »

I would have to wonder about "a little bit of circuitry" given all the timing related things we have interacting now. For single core devices maybe this makes sense, but having variable timings on the COGS, etc... doesn't seem simple.

A prescaler is quite a simple thing. Certainly very simple, relative to what else is going into the box.
It is not to be feared, it is simply giving users choice, and control, and giving them freedom to select dynamic range.
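A behavioral sketch of the kind of divide-by-N prescaler being proposed, in Python. The class, the 32-bit wrap and the 160 MHz example clock are assumptions for illustration; the idea is only that a system counter advancing at 1/N of the core clock lets P1-era cycle constants keep their original meaning.

```python
class Prescaler:
    """Hypothetical divide-by-N prescaler in front of a 32-bit system counter."""

    def __init__(self, divide):
        if divide < 1:
            raise ValueError("divide must be >= 1")
        self.divide = divide
        self.phase = 0   # core clocks since the last counter tick
        self.cnt = 0     # prescaled counter that wait code would compare against

    def clock(self, ticks=1):
        """Advance by `ticks` core clocks."""
        for _ in range(ticks):
            self.phase += 1
            if self.phase == self.divide:
                self.phase = 0
                self.cnt = (self.cnt + 1) & 0xFFFFFFFF   # wraps like a 32-bit CNT
```

With `divide=2` on a hypothetical 160 MHz part, `cnt` advances at 80 MHz, so a P1 constant such as 80_000_000 would still mean one second.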

potatohead wrote: »

If we do nothing at all to the silicon, there are reasonable software answers to this.

Only up to a point, after that, the user is left wondering why some simple HW support, found in almost all other parts, was omitted.

potatohead wrote: »

On the other hand, having a software assist in migrating code to P2 proper may well make a lot of sense and any of us sufficiently motivated to do it, simply could. Doing that may offer some opportunity for somebody here and it would be on no path at all. Optional.

That was my point - Software pathways are always optional.

Some may run screaming from the idea of even thinking about supporting existing code, but history tends to favour those who consider their customer investments, and see existing code as a resource, not a liability.

jmg · 2014-01-10 15:29

evanh wrote: »

It's not looking good for this idea if I'm reading things right. It looks like Chip has decided to remove the second counter altogether.

I think the comments about counter removal are limited to fitting a build into the smaller FPGA board.
My reading is that the silicon and the larger FPGAs will still have the second counter.

It may be that the smaller FPGA can still be supported with a SIZE rather than SPEED selection.

It also appears that Parallax are bringing up an FPGA themselves, in part, I think, to pair the newest FPGA with SDRAM so that SDRAM can be tested properly on the latest FPGA silicon. Commercial boards have moved on from SDRAM.

Cluso99 · 2014-01-10 16:47

Yes, CTRB removal is only for the DE0 emulation to fit it into the fpga, not the final chip.

cgracey · 2014-01-10 17:10

evanh wrote: »

Too true.

It's not looking good for this idea if I'm reading things right. It looks like Chip has decided to remove the second counter altogether.

Chip, if you didn't see this first time around, here's the drawing as a PNG again - http://forums.parallax.com/attachment.php?attachmentid=104154&d=1380541194

I'm not removing CTRB from the Prop2, only from the DE0-Nano configuration, in an effort to make one cog fit.

I looked at the drawing (thanks for making that), but I'm not understanding it yet. I'll look at it some more.

evanh · 2014-01-10 17:33

cgracey wrote: »

I looked at the drawing (thanks for making that), but I'm not understanding it yet. I'll look at it some more.

I was a bit cheap on the editing; I just duplicated the counter schematic without relabelling anything. The two counters in my drawing are CTRA and CTRB respectively. So, what I'm proposing is a link between them so that the summing of CTRB is fed from CTRA.

Both counters can then be configured as accumulators, with PHSA just accumulating 1's from FRQA depending on pin state, and at the same time PHSB accumulating from PHSA (instead of FRQB).

The software then just has to read PHSB metronomically at the chosen decimation rate to be able to calculate the DC level of the source voltage. The calculation is a simple running delta (differentiation) per decimation sample, or double differentiation in the case of third-order filters.

The maths of why this works eludes me but it's certainly not hard to implement.
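The chained-accumulator scheme is essentially a cascaded integrator-comb (CIC, or sinc) decimator: integrators run at the input bit rate and difference stages run at the decimated rate. Below is a minimal Python sketch, assuming 32-bit wrap-around registers like PHSA/PHSB; the wrap is harmless because the differences are taken modulo the same width, provided the register is wide enough for the output range. In the textbook CIC structure each integrator is paired with one difference stage, so two chained accumulators are normally followed by two differences per decimated sample, and three by three, which matches the 3 adds and 3 subtracts jmg mentions. Names are hypothetical.

```python
MASK = 0xFFFFFFFF  # model 32-bit registers (like PHSA/PHSB) that wrap around


def cic_decimate(bits, order, m):
    """Sinc^order (CIC) decimator for a 0/1 bitstream.

    `order` integrators run on every input bit (the chained accumulators);
    at every m-th bit the last integrator is read and passed through
    `order` difference (comb) stages. Steady-state gain is m**order.
    """
    integ = [0] * order        # integrator states; the first one is fed by the pin
    prev = [0] * order         # previous input seen by each comb stage
    out = []
    for k, bit in enumerate(bits):
        carry = bit
        for i in range(order):
            integ[i] = (integ[i] + carry) & MASK
            carry = integ[i]
        if (k + 1) % m == 0:   # decimation point: software reads the last integrator
            v = integ[-1]
            for i in range(order):
                v, prev[i] = (v - prev[i]) & MASK, v
            out.append(v)
    return out
```

With a 0/1 pattern of duty 3/4 and m = 16, the settled outputs are 16**order x 3/4 (12, 192 and 3072 for orders 1 to 3); the first order - 1 outputs are start-up transients.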

Cluso99 · 2014-01-10 17:41

evanh: I have included your chained counter proposal here. I think this is a simple and worthwhile addition to the P2 if Chip has the time (now or later).
I will try and remember it when we go thru the list later.

I thought it would be nice...

Source the Clock from an external pin.
Source FRQx from an external pin (i.e., mux the FRQx output with an external pin)

With my mods (in red), and using the internal pins, it would be possible to chain CTRA APin to CTRB BPin to achieve your result above.

Post-edit: I see Chip has commented while I was posting.

jmg · 2014-01-10 18:02

evanh wrote: »

Both counters can then be configured as accumulators, with PHSA just accumulating 1's from FRQA depending on pin state, and at the same time PHSB accumulating from PHSA (instead of FRQB).
...
The maths of why this works eludes me but it's certainly not hard to implement.

That drawing is not quite the same as the Verilog code someone linked to before?
IIRC the Verilog seemed to do 3 adds and 3 subtracts and a shift, on every CLK in.
