Is it time to re-examine the P2 requirements ???

jmg · 2015-02-02 14:18

evanh wrote: »

Not everything went to plan. A significant redesign occurred that would normally not be in public viewing. That's all happened now.

There are actually two significant redesigns:
One in the Verilog respin, and the other in the (significant) full custom stuff being moved to a sign-off outside contractor.
In many ways, that second one is a larger shift than recoding some Verilog.
It will elevate the chance of first-silicon working.

Heater. · 2015-02-02 14:41

Whoever said it,

Hardware Threads...

I feel a bit guilty about this.

In the middle of that old humongous "throw in your ideas everybody" P2 design thread, I happened to mention that as Chip had everything for program-controlled task switching in place it might be a simple matter to automate that into a multi-tasking system. Basically swapping from task to task from instruction to instruction as is done on the XMOS devices. (Yes, turns out I was not the first to suggest this, I just happened to pipe up at the right moment.)
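To make the "task to task from instruction to instruction" idea concrete, here is a minimal Python sketch of strict round-robin interleaving. It is a toy model under stated assumptions: the function name `interleave`, the task names, and the instruction strings are all made up for illustration, and it does not claim to reproduce how the P2 design or the XMOS scheduler actually allocated slots.

```python
from itertools import cycle


def interleave(tasks):
    """Issue one instruction per task in strict rotation until all tasks finish.

    `tasks` maps a (hypothetical) task name to its list of instruction strings.
    Returns the issue order as (task, instruction) pairs.
    """
    pcs = {name: 0 for name in tasks}          # one program counter per task
    remaining = sum(len(prog) for prog in tasks.values())
    trace = []
    for name in cycle(tasks):                  # fixed rotation: A, B, A, B, ...
        if remaining == 0:
            break
        pc = pcs[name]
        if pc < len(tasks[name]):              # a finished task just skips its slot
            trace.append((name, tasks[name][pc]))
            pcs[name] = pc + 1
            remaining -= 1
    return trace


if __name__ == "__main__":
    demo = {"A": ["a0", "a1", "a2"], "B": ["b0"]}
    for task, instruction in interleave(demo):
        print(task, instruction)
```

Each task keeps its own program counter, so switching costs nothing more than selecting which counter to use for the next instruction. That is the appeal of doing it in hardware rather than in software.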

Amazingly a week or so later Chip had implemented it.

My suggestion turned out to have been another of the straws that broke the camel's back of that design.

Sorry everybody.

David Betz · 2015-02-02 14:45

Heater. wrote: »

Whoever said it,

I feel a bit guilty about this.

In the middle of that old humongous "throw in your ideas everybody" P2 design thread, I happened to mention that as Chip had everything for program-controlled task switching in place it might be a simple matter to automate that into a multi-tasking system. Basically swapping from task to task from instruction to instruction as is done on the XMOS devices. (Yes, turns out I was not the first to suggest this, I just happened to pipe up at the right moment.)

Amazingly a week or so later Chip had implemented it.

My suggestion turned out to have been another of the straws that broke the camel's back of that design.

Sorry everybody.

Well, my request for the ability to execute code from hub memory certainly didn't help either. I guess others asked for it too so I can't take full blame. :-)

Heater. · 2015-02-02 14:55

David,

You should feel extra guilty.

Unlike threading, the last I heard was that the ability to execute code directly from HUB is still in the plan and has no doubt been holding things up.

Speaking of which, does a future Spin language still need byte codes? With HUB exec in place why not just compile Spin into machine code?

OK the code will be bigger, but we have the space.

David Betz · 2015-02-02 15:18

Heater. wrote: »

Speaking of which, does a future Spin language still need byte codes? With HUB exec in place why not just compile Spin into machine code?

OK the code will be bigger, but we have the space.

Maybe the P2 Spin compiler could just be a GCC front-end which might allow for linking C code with Spin code if an appropriate ABI is selected.

Tubular · 2015-02-02 15:19

Well you won't find me apologizing nor feeling guilty about my request for a ';' delimiter so we can comment snippets dumped into the P2 monitor. Surely that would run cool

That task switching was brilliant, got out of the way nicely when not needed. It's a real shame to lose that, but I do understand. Hub execution - haven't used it just yet but can see why it needs to be there.

Heater. · 2015-02-02 15:40

David,

Maybe the P2 Spin compiler could just be a GCC front-end...

Sounds like a great idea.

However my prediction is:

1) Chip will write the new Spin compiler in x86 assembler. He can do that faster than learning C++ or how Open Spin works.

2) Roy, or somebody, will reverse engineer that x86 code into a new Open Spin 2 compiler.

3) Years from now people will be wondering where the #includes and other features we had in BST ages ago are.

Electrodude · 2015-02-02 20:58

Heater. wrote: »

Speaking of which, does a future Spin language still need byte codes? With HUB exec in place why not just compile Spin into machine code?

OK the code will be bigger, but we have the space.

I can just tell now what a ball Peter Jakacki will have with his bytecodes and indirect jumps on the P2.

Heater. wrote: »

David,

Sounds like a great idea.

However my prediction is:

1) Chip will write the new Spin compiler in x86 assembler. He can do that faster than learning C++ or how Open Spin works.

2) Roy, or somebody, will reverse engineer that x86 code into a new Open Spin 2 compiler.

3) Years from now people will be wondering where the #includes and other features we had in BST ages ago are.

My Spin compiler, which will eventually serve as the final-stage assembler for the PropeLLVM compiler I want to write (but will still be 100% Parallax compatible and usable for normal Spin programs), will include #includes and #ifdefs and such. I'd like to have it working for the P2 as soon as the P2 is released, but I'm not sure I'll be able to get an FPGA, so it might have to wait a month or so after. In the meantime, I'm going to write it for the P1 now. It will make C-Spin linking as easy as including the Spin file that PropeLLVM emits in an OBJ block and exporting the symbols into the parent object's namespace (new feature).

Cluso99 · 2015-02-02 23:04

IMHO none of us should feel guilty (except heater)

The task switcher you proposed was nice and simple. It was the following explosion IIRC that killed the cat. Like many things, they need to get complex before they are simplified. Unfortunately the simplification wasn't so easy and caused it to be dumped.

A simpler hubexec model could be implemented easily, and/or put more single-port ram in a couple of cogs.

But I would rather see a simpler P2 that works first time than a full-blown P2 that fails. The simpler version would prove the process (and Parallax could sell it) - remember there is no proven process yet with OnSemi.

evanh · 2015-02-02 23:52

Cluso99 wrote: »

The task switcher you proposed was nice and simple. It was the following explosion IIRC that killed the cat. Like many things, they need to get complex before they are simplified. Unfortunately the simplification wasn't so easy and caused it to be dumped.

No, the feature creep per se wasn't the cause, it was the simulation of thermal performance that killed it. There was no idea prior to the simulation as to what the situation was.

I think there were some specific instructions pointed out as the major contributors and I remember Bill saying it wasn't as bad as the simulation suggested but I guess Chip decided it wasn't desirable anyway.

I'm personally glad, as I've said, I didn't like the direction of bulky Cogs. I wasn't very vocal but around that time I made a few posts about the design choice of having lots of little cores doing a job each vs a few cores all packing in lots of features and having to find ways to share cores between many jobs.

Heater. · 2015-02-02 23:54

Electrodude,

What is this PropeLLVM going to be?

The name suggests that it's going to use LLVM http://llvm.org/

Are you planning to create a Spin front end for LLVM and also the back end Propeller code generator?

A Propeller code generator for LLVM would give us a very good C compiler for the Prop straight away, clang http://clang.llvm.org/

Sounds like you have your work cut out.

jmg · 2015-02-03 00:37

Cluso99 wrote: »

But I would rather see a simpler P2 that works first time than a full-blown P2 that fails. The simpler version would prove the process (and Parallax could sell it) - remember there is no proven process yet with OnSemi.

?? This claim makes no sense to me.
Parallax are not OnSemi's first 180nm customer, and OnSemi's process certainly is proven, and I am sure they can take Verilog, and give back timing files for the client to sign off on, and will have done so many times.
Also, the P2 verilog will be FPGA proven.
A higher risk element than the Verilog, is the Full Custom portion, and Ken has changed the way that is being managed.

Cluso99 · 2015-02-03 01:03

jmg wrote: »

?? This claim makes no sense to me.
Parallax are not OnSemi's first 180nm customer, and OnSemi's process certainly is proven, and I am sure they can take Verilog, and give back timing files for the client to sign off on, and will have done so many times.
Also, the P2 verilog will be FPGA proven.
A higher risk element than the Verilog, is the Full Custom portion, and Ken has changed the way that is being managed.

Precisely! The custom section is unproven!
i.e. the I/O blocks that Beau had taped out plus the fuses.

Add to that the new hub method (lazy susan) that I still have significant misgivings about.

Then add in the smart pins (presumed to be Verilog) - what interactive bugs might surface?

All this together makes a significant risk, at least for first silicon. A simpler P2 could not only be used to verify OnSemi, but also result in a saleable Chip and give some ROI.
Anyway, it's only MHO and it's not my $.

Might be time to look at a Pi 2 and see what can be done at a low level with 4x 900MHz cores, GPU and 1GB DDR2 (no *nix/*droid/w10, just ARMv7 Assembler).
Even if P2 was available today, with the specs of the failed "hot" chip, how is one to produce a P2 board (with just some of those Pi2 specs) for $35 ???

potatohead · 2015-02-03 01:24

The GPU has been documented well enough for a macro-assembler to be made available: http://www.raspberrypi.org/new-qpu-macro-assembler/

http://wiki.osdev.org/Raspberry_Pi_Bare_Bones

All the stuff needed to start bare bones is there now.

evanh · 2015-02-03 02:24

Cluso99 wrote: »

Even if P2 was available today, with the specs of the failed "hot" chip, how is one to produce a P2 board (with just some of those Pi2 specs) for $35 ???

You know perfectly well they don't compare. They don't compare on market volume and therefore price. They don't compare on fab process technology and therefore gate count. And most importantly they don't compare on architecture! It would be easier to compare a PIC to an ARM than a Prop to an ARM. You knew all that years ago when the Pi first launched.

If the Prop2 sells well enough then we'll see even better to come. Maybe even a process shrink. I've said it a number of times in the past but I'm also hopeful to see MRAM as main memory eventually. HubRAM is a perfect candidate for leading in this area.

evanh · 2015-02-03 03:19

One area they will be comparable is on leakage. Power saving on the fine processes usually ends up meaning turn the power off.

mark · 2015-02-03 06:34

evanh wrote: »

I'm personally glad, as I've said, I didn't like the direction of bulky Cogs. I wasn't very vocal but around that time I made a few posts about the design choice of having lots of little cores doing a job each vs a few cores all packing in lots of features and having to find ways to share cores between many jobs.

Hear hear!

The "hot" Prop2 turned out to be an intimidating beast whose feature set I feared would only be able to be largely exploited by the forum's geniuses. Would this have even been a good idea considering that Parallax sees their target market being makers and the educational sector? The current version is shaping up to be significantly streamlined and simplified with a few core features that will likely offer significant performance boosts which should blow the doors off the P1 and hopefully make most users happy while not requiring too much hair pulling. There's no doubt in my mind that Chip can make amazing things happen, but just because you could doesn't mean you should. I think what happened was necessary in order to get Chip to take a step back and be able to reconsider what the successor to the P1 should actually be, and I think this new design is just that more than the old version ever was.

Heater. · 2015-02-03 07:00

evanh,

I'm not sure what you are trying to say there:

They don't compare on market volume and therefore price

As a potential customer I probably don't care what your market volume is. I do care how much I have to pay though. Price comparison is very valid.

They don't compare on fab process technology and therefore gate count.

As a potential customer I neither know nor care what your process technology is or how many gates it took to make the thing. But hey, I get more transistors for the buck with a Pi.

And most importantly they don't compare on architecture!

OK. Now we are talking. Well, actually it's not even architecture that matters so much as: Can it do what I want? Can I make it do what I want easily? Does it meet many other requirements, power consumption, voltage tolerance bla bla.

It would be easier to compare a PIC to an ARM than a Prop to an ARM.

Do what? An 8 bit PIC with no speed and no memory to a system capable of running a full up Linux OS. Seems like chalk and cheese rather than comparable, like, things.

Cluso's question was "how is one to produce a P2 board (with just some of those Pi2 specs) for $35 ???"

I think it's a good question. For many applications Pi and P2 may well be valid solutions. In that area it's going to be very hard for a P2 board to compete on price.

Of course there are real-time, real-world interfacing applications where the Pi is not suitable. How big is that market space? I don't know.

Electrodude · 2015-02-03 09:13

Heater. wrote: »

Electrodude,

What is this PropeLLVM going to be?

The name suggests that it's going to use LLVM http://llvm.org/

Are you planning to create a Spin front end for LLVM and also the back end Propeller code generator?

A Propeller code generator for LLVM would give us a very good C compiler for the Prop straight away, clang http://clang.llvm.org/

Sounds like you have your work cut out.

Yes, it will use LLVM and clang. Clang will compile all of your C files into one big LLVM IR file, PropeLLVM will compile that to PASM in a Spin file, and my Spin compiler will compile that into a .binary file (and possibly link it with other Spin files). My Spin compiler will have many more features than normal Spin compilers, such as namespaces, multiple objects in one spin file, and being able to (sometimes) put pointers in CON blocks and such and then export them to the parent object with child#con. I doubt the LLVM backend will be anywhere near done by the time the P2 comes out, but my Spin compiler should be.

David Betz · 2015-02-03 09:33

Electrodude wrote: »

Yes, it will use LLVM and clang. Clang will compile all of your C files into one big LLVM IR file, PropeLLVM will compile that to PASM in a Spin file, and my Spin compiler will compile that into a .binary file (and possibly link it with other Spin files). My Spin compiler will have many more features than normal Spin compilers, such as namespaces, multiple objects in one spin file, and being able to (sometimes) put pointers in CON blocks and such and then export them to the parent object with child#con. I doubt the LLVM backend will be anywhere near done by the time the P2 comes out, but my Spin compiler should be.

Ummm... You're going to compile C into Spin and then compile Spin to PASM? Why not go directly from C to PASM? What's the point of using Spin as an intermediate code? My guess is that it doesn't match C very well.

Electrodude · 2015-02-03 10:07

David Betz wrote: »

Ummm... You're going to compile C into Spin and then compile Spin to PASM? Why not go directly from C to PASM? What's the point of using Spin as an intermediate code? My guess is that it doesn't match C very well.

No, C to LLVM to PASM. The PASM will be embedded in a DAT block in a Spin file to make C-Spin linking easier.

David Betz · 2015-02-03 10:18

Electrodude wrote: »

No, C to LLVM to PASM. The PASM will be embedded in a DAT block in a Spin file to make C-Spin linking easier.

Why not just compile Spin also to LLVM and then use a standard linker to link the resulting object code? Anyway, an LLVM C compiler would be very nice.

Electrodude · 2015-02-03 10:43

David Betz wrote: »

Why not just compile Spin also to LLVM and then use a standard linker to link the resulting object code? Anyway, an LLVM C compiler would be very nice.

I want to use my own linker so I can use tricks like putting PASM images in uninitialized VAR buffers and reusing initialization code for other stuff later. Also, I want my spin compiler to be something a human would prefer over other Spin compilers even when not used with PropeLLVM. Also, it should give identical output to Propeller Tool when only using Propeller Tool features, and I doubt a pre-existing linker would know how to do that. I'll probably eventually write a PropGCC .o file to DAT block converter so people can link in .o files.

I'm not convinced yet that LLVM is as ideal for stack-based languages like Spin as it is for register-based languages like PASM. Also, Spin usually doesn't need to be terribly fast but often does have to be compact. I'd ultimately like to have some framework that would allow one to compile Spin, PNUT bytecode, Tachyon Forth, etc. into a common internal representation and then turn that into PNUT or Tachyon bytecode or PASM. It would probably only make sense to use LLVM when the target language is PASM.

Maybe I should just use LLVM for everything. Then one could put Spin, PNUT, Tachyon, C, and PASM in a Spin file, compile and optimize the whole thing with LLVM, and then convert it to PNUT, Tachyon, PASM, etc. But this still has many problems, such as how to embed hand-written PASM in LLVM IR and how to mark what backends get used for different functions. Simultaneously supporting multiple backends is really only important for C and Spin together, but I definitely want to be able to use the Tachyon kernel for Spin code.

I barely have any code at this point. I have most of it planned out in my head, mostly just extensions to Spin and my optionally blockloading LMM kernel, but I'm still struggling to find a good library to write a Spin parser in. At this point, I think I'm going to use the Flex or Ragel lexer but write my own parser.

David Betz · 2015-02-03 10:52

Well, Spin is not any more stack oriented than C or C++ is. It's just the Spin VM that is stack oriented. I would think the Spin VM could go away for P2 since we now have much more hub memory for code space and hub execution (if that's still in). Not sure what to say about embedding PASM in Spin code and passing that through LLVM but my guess is that there won't be as much need for embedded PASM code once Spin has a native code generator. Really, a lot of the PASM in P1 Spin objects is there because interpreted Spin isn't fast enough. Of course, there will always be a need for PASM for really time critical code but I think it should become less common with P2.

evanh · 2015-02-03 10:59

Heater. wrote: »

evanh,

I'm not sure what you are trying to say there:

As a potential customer I probably don't care what your market volume is. I do care how much I have to pay though. Price comparison is very valid.

As a potential customer I neither know nor care what your process technology is or how many gates it took to make the thing. But hey, I get more transistors for the buck with a Pi

Yep, and obviously that happens. I totally snobbed the BASICStamp upon launch partly because I was comparing with unprogrammed commercial grade parts. Although, I also didn't think it would be fast enough nor suited to multitasking - Not that I ever tried.

I still stand by what I said, the different designs can't be compared on those production points because they're simply not in the same league.

You never know, one day that may change.

Cluso's question was "how is one to produce a P2 board (with just some of those Pi2 specs) for $35 ???"

And I fully answered it. The only thing I left out was directly telling him he was being unrealistic.

Heater. · 2015-02-03 12:01

Electrodude,

You have big plans there. I am in no way qualified to comment. Although I will when I have mulled over your posts a bit.

But this stood out:

I'm not convinced yet that LLVM is as ideal for stack-based languages like Spin...

Spin, the language, is no more stack based than C or C++ or many other languages typical of the genre.

They are all stack based.

They call functions - return address on the stack.
They have local variables - on the stack.
They evaluate expressions - intermediate values on the stack.

Admittedly processors tend to have lots of registers nowadays so the last of those may all happen in registers and not on the stack. It need not be so. A complex expression in C may well spill out of registers into the stack.

LLVM has proven to be great for the stack based C/C++ (clang). No reason it cannot be so for Spin.

Whilst we are here: An LLVM front end for Spin would be amazing by itself. And a milestone in your project.

Such a front end would mean that the LLVM bit code produced from Spin could then be used to generate code for x86 and for whatever other architectures there exist LLVM code generators.

That means Spin could be run on the PC.

That means Spin could be run in the browser. (Emscripten converts LLVM bit code into JavaScript)

That would all be totally amazing!

With that in place you could start to think about the LLVM bit code to Prop machine code problem.

Dave Hein · 2015-02-03 12:30

Heater. wrote: »

Spin, the language, is no more stack based than C or C++ or many other languages typical of the genre.

However, the Spin VM in P1 is a stack-based VM. Since there are no registers everything must go through the stack. Even a simple operation like "x := 1 + 2" requires putting a value of 1 on the stack, and then putting the value of 2 on the stack, and then adding the two values on the stack and putting the result on the stack, and then finally popping the result off of the stack and storing it in the variable, x. Of course Spin can be compiled directly to LMM PASM, and the stack-based VM isn't needed, which is what happens when using spin2cpp and PropGCC.
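The four steps described for "x := 1 + 2" can be sketched as a toy stack machine in Python. This is only an illustration: the opcode names `PUSH`, `ADD`, and `STORE` are invented here and are not the real Spin bytecodes, and the interpreter is nothing like the actual P1 Spin VM beyond sharing the push/push/add/pop pattern.

```python
def run(program):
    """Execute a list of (opcode, operand) pairs on a toy stack machine.

    Returns the final variable table and whatever is left on the stack.
    """
    stack = []
    variables = {}
    for op, arg in program:
        if op == "PUSH":                 # put a constant on the stack
            stack.append(arg)
        elif op == "ADD":                # pop two values, push their sum
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs)
        elif op == "STORE":              # pop the result into a named variable
            variables[arg] = stack.pop()
        else:
            raise ValueError(f"unknown opcode: {op}")
    return variables, stack


# x := 1 + 2, written out as the four stack operations
program = [("PUSH", 1), ("PUSH", 2), ("ADD", None), ("STORE", "x")]
variables, stack = run(program)
print(variables, stack)
```

Every intermediate value passes through the stack, which is why even a trivial assignment costs four interpreted operations here.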

Heater. · 2015-02-03 13:13

Dave,

Are we mixing up concepts here?

Spin is a language, much the same as C, C++, Pascal and many others of its kind all the way back to ALGOL and FORTRAN.

The P1 VM is a machine that Spin gets compiled to.

Now, I know pretty much nothing of compilers or LLVM in particular but an overall picture is clear:

1) Source code => front end, produces LLVM bit code.

2) LLVM bit code => back end code generator, produces machine code.

Ergo, a Spin front end for LLVM could produce LLVM bit code that ends up being compiled to x86 or ARM or JavaScript or whatever you have for a back end.

The back ends could produce byte codes for the existing, stack based, Spin interpreter, or LMM to run from HUB.
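The front end / back end split above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the tuple-based intermediate form is not LLVM bit code, none of this uses the LLVM API, and the output mnemonics are invented. It only shows one front end feeding two different back ends, one stack-style and one register-style.

```python
def front_end(source):
    """Parse 'name := a + b + ...' (integers and names only) into a tiny IR."""
    target, expr = (part.strip() for part in source.split(":=", 1))
    terms = [t.strip() for t in expr.split("+")]
    operands = [int(t) if t.isdigit() else t for t in terms]
    tree = operands[0]
    for operand in operands[1:]:         # left-associative chain of adds
        tree = ("add", tree, operand)
    return ("assign", target, tree)


def stack_back_end(ir):
    """Lower the IR to stack operations, as a bytecode interpreter would want."""
    _, target, tree = ir
    code = []

    def emit(node):
        if isinstance(node, tuple):
            _, lhs, rhs = node
            emit(lhs)
            emit(rhs)
            code.append("ADD")
        elif isinstance(node, int):
            code.append(f"PUSH #{node}")
        else:
            code.append(f"PUSH {node}")

    emit(tree)
    code.append(f"POP {target}")
    return code


def register_back_end(ir):
    """Lower the same IR to three-address register code, native-code style."""
    _, target, tree = ir
    code = []
    counter = 0

    def emit(node):
        nonlocal counter
        if isinstance(node, tuple):
            _, lhs, rhs = node
            a = emit(lhs)
            b = emit(rhs)
            reg = f"r{counter}"          # fresh temporary for each add
            counter += 1
            code.append(f"{reg} = {a} + {b}")
            return reg
        return f"#{node}" if isinstance(node, int) else node

    code.append(f"{target} = {emit(tree)}")
    return code


if __name__ == "__main__":
    ir = front_end("x := 1 + 2")
    print(stack_back_end(ir))
    print(register_back_end(ir))
```

The source language never changes between the two outputs. Only the back end decides whether the result looks stack-based or register-based, which is the distinction being argued in this part of the thread.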

If anyone has the skills to make a Spin front end for LLVM and/or the back end code generators for the Spin interpreter or LMM, my hat's off to them!

Having the front end means we can run Spin on many platforms.

Having the back end means we can use clang to program a Prop in C/C++.

How cool would that all be?!

David Betz · 2015-02-03 13:29

Well, as usual, there is more to supporting a language than just the compiler. If you have an LLVM backend for the Propeller then you can generate Propeller code from any language that has an LLVM frontend. This is similar to the situation we now have for PropGCC. In theory, we could support any language that has a GCC frontend. However, you also need to provide a runtime system and libraries for the target architecture. Probably a lot of the library code we wrote for PropGCC could be moved over to an LLVM compiler but there is work involved beyond just writing an LLVM backend. That said, I would love to see someone do an LLVM backend for the Propeller and an LLVM frontend for Spin. We didn't do it for PropGCC because we had GCC experience on the team but not LLVM experience. It wasn't necessarily because we thought GCC was a better choice.

Heater. · 2015-02-03 13:44

Dave,

I'm sure that is true.

For example. One can compile a C/C++ program into LLVM bit code. One can then compile that bit code into JavaScript with Emscripten.

Fine, where does the standard C library API come from?

I'm not sure, but it's all there in JS when your program is linked. Emscripten has provided it.

Life must be much simpler doing that for Spin. There are no libraries for Spin. It's just source in, LLVM bit code out. Then LLVM bit code in, LMM out.

I am in no position to say that clang/LLVM is better or worse than GCC. My observation is that the clang guys, the new kids on the block, shook up the GCC guys. And now GCC is adopting many of the features that were asked for in the first place.
