Propeller II

Sapieha · 2012-08-07 05:07

Hi Heatre.

It is not only that C and its friends don't use the entire Z80 instruction set; they don't even use the entire 8080 instruction set.

That is why Intel omitted all the CALL-on-flag instructions from the 8086 family CPUs, and now it is a pain to program in assembly.
Instead of a single instruction, you need to make jump tables and then use them through a jump-to-table instruction.

Heater. wrote: »

Chip,

Some interesting new instructions you have there. Have to think about them for a bit.

Any chance of changing the mnemonics? I find INDA/INDB a bit clumsy and grating. Perhaps call them IX and IY and have SETIX, SETIY, SETIS and so on.

More seriously, these appear to offer code compaction and a speed-up to raw PASM running in COG but are of little use to LMM code. Are there any new instructions to help with C compiled to LMM for example? Has there been any consultation with compiler gurus to see what would help the compiler there?

I get the impression from looking at other architectures that they can have many weird and wonderful instructions and architectural features that seemed like a good idea to speed things up but end up not being used by compilers because it's just too hard. As an example the Z80 triples the instruction set size over the 8080 with a lot of bit twiddling ops etc, nice for assembler programmers but the compilers stuck to the old 8080 set.

Kye · 2012-08-07 07:35

LMM on the P2 will likely hit half the clock speed by default. There are new instructions to make this happen.

An Octal SPI flash setup (2 quad SPI devices at once) should be able to feed the LMM interpreter at 1/4 to 1/2 the LMM instruction consumption rate depending on your setup. With good caching the consumption and production instruction rate mismatch should not be too much of a problem.
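A rough way to see why caching hides the rate mismatch is an average-cost estimate. This is only a sketch under assumed numbers: the function name `effective_rate`, the 90% hit rate, and the cost model (a hit costs one instruction time, a miss costs the inverse of the fetch ratio) are illustrative and not taken from any P2 documentation.

```c
#include <assert.h>

/* Toy model (an assumption, not a P2 measurement): a cache hit costs one
 * instruction time; a miss costs 1/ratio instruction times, where ratio is
 * the external fetch rate divided by the interpreter's consumption rate. */
static double effective_rate(double hit_rate, double ratio)
{
    assert(ratio > 0.0);   /* a zero fetch rate would never deliver code */
    double avg_cost = hit_rate * 1.0 + (1.0 - hit_rate) * (1.0 / ratio);
    return 1.0 / avg_cost; /* fraction of full LMM speed actually achieved */
}
```

With a 90% hit rate this model gives roughly 0.77 of full speed at a fetch ratio of 1/4 and about 0.91 at 1/2, which is the sense in which the mismatch "should not be too much of a problem".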

Using 16-bit SRAM it will be possible to equalize the production and consumption instruction rates... at the cost of a huge number of I/O pins (because the address bus will eat up another 16+ pins).

---

GCC will export new unused opcodes in the GCC code generator as builtin functions. They can be used by the programmer at will.
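The thread does not name the builtins planned for the new P2 opcodes, so the sketch below uses `__builtin_popcount`, a builtin that GCC already provides, purely to show the mechanism. The call looks like a function, but on a target with a matching instruction the compiler is free to emit that instruction directly. The helper names `bits_portable` and `bits_builtin` are made up for this example.

```c
/* Portable fallback: count set bits one at a time. */
static unsigned bits_portable(unsigned x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Same result through a GCC builtin; on a target that has a population-count
 * instruction the compiler may lower this call to that one instruction. */
static unsigned bits_builtin(unsigned x)
{
    return (unsigned)__builtin_popcount(x);
}
```

The trade-off Heater raises in the next post still applies: code that calls a target-specific builtin is no longer portable C.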

Heater. · 2012-08-07 07:51

That's all good news for fetching instructions in LMM or XMM. But what about the instructions themselves? Having index registers, as described above, is going to be useful in COG code but the equivalent for accessing HUB might be required in the LMM case. Similarly stack pointers and stack frame pointers and such are useful to compilers.

Having quirky/arch specific instructions available as built in functions is good but no help when compiling regular portable C.

Still I'm no compiler writer so I don't know what instructions/addressing modes etc are most useful to a compiler.

David Betz · 2012-08-07 08:00

Heater. wrote: »

That's all good news for fetching instructions in LMM or XMM. But what about the instructions themselves? Having index registers, as described above, is going to be useful in COG code but the equivalent for accessing HUB might be required in the LMM case. Similarly stack pointers and stack frame pointers and such are useful to compilers.

Having quirky/arch specific instructions available as built in functions is good but no help when compiling regular portable C.

Still I'm no compiler writer so I don't know what instructions/addressing modes etc are most useful to a compiler.

Isn't this what RISC is all about? Get rid of all of the quirky complex instructions that are great for inner loops coded in assembly but useless for compiler code generators. Maybe P3 can be designed in consultation with a compiler code generator expert. I think the best would be a co-design where the tools, particularly the compiler, are designed in conjunction with instruction set changes. That way instructions can be tailored for compiler use when it's still easy to change them. This could easily be done by making a software simulator for proposed new instructions built on spinsim that could be used for compiler code generator debugging before the silicon even exists. Later, once the RTL is done, the compiler code generator can be validated on the actual hardware in simulation. Then you have a working toolset when the silicon is released.

Heater. · 2012-08-07 08:42

I guess it's a bit late to be thinking about all this for the Prop I.
Although I think I did read that Prop II has had some consideration put into speeding up the LMM loop at least.

Now all we need is Bill Henning to look at the new PII instructions and find cunning ways to use them that no one has thought of yet, like he did when inventing the LMM technique.

Phil Pilgrim (PhiPi) · 2012-08-07 08:45

cgracey wrote:

REPS #32,#1 'repeat 1 instruction 32 times

Immediate operand flag for the destination field? Just kidding: I know there isn't room for one. So is this instruction #dst,src and #dst,#src only? Or are there two separate REP instructions?

-Phil

Sapieha · 2012-08-07 09:02

Hi David.

Say instead: badly managed by compiler writers. That is why I don't like C and its friends.

They are written by LAZY programmers who don't like thinking, or a little extra work, to use the CPU's full capabilities!!!

David Betz wrote: »

Isn't this what RISC is all about? Get rid of all of the quirky complex instructions that are great for inner loops coded in assembly but useless for compiler code generators. Maybe P3 can be designed in consultation with a compiler code generator expert. I think the best would be a co-design where the tools, particularly the compiler, are designed in conjunction with instruction set changes. That way instructions can be tailored for compiler use when it's still easy to change them. This could easily be done by making a software simulator for proposed new instructions built on spinsim that could be used for compiler code generator debugging before the silicon even exists. Later, once the RTL is done, the compiler code generator can be validated on the actual hardware in simulation. Then you have a working toolset when the silicon is released.

Dave Hein · 2012-08-07 09:14

The P2 spec at http://www.parallaxsemiconductor.com/sites/default/files/parallax/Propeller2DetailedPreliminaryFeatureList-v2.0.pdf lists a REPD instruction, which I'm guessing is the same as REPS, but with a new name. In the ZCRI field, the I-bit is always set, which means that the source value is always an immediate value. The R-bit is used to control whether the destination value refers to an immediate value or a cog memory location.
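For readers unfamiliar with the ZCRI field, here is a minimal decoding sketch. It assumes the P2 preliminary encoding keeps the Propeller 1 word layout (6-bit opcode, then Z, C, R, I, a 4-bit condition, and two 9-bit operand fields); that carry-over is an assumption, and the `Fields` struct and `decode` function are hypothetical names for this example.

```c
#include <stdint.h>

/* Field layout borrowed from the Propeller 1 instruction word. Assuming the
 * P2 keeps this layout is a guess, not something stated in the thread. */
typedef struct {
    unsigned opcode; /* bits 31..26 */
    unsigned z;      /* bit 25 */
    unsigned c;      /* bit 24 */
    unsigned r;      /* bit 23: per the post, selects the destination mode for REPD */
    unsigned i;      /* bit 22: source field is an immediate value */
    unsigned cond;   /* bits 21..18 */
    unsigned dst;    /* bits 17..9 */
    unsigned src;    /* bits 8..0 */
} Fields;

/* Split a 32-bit instruction word into its fields. */
static Fields decode(uint32_t insn)
{
    Fields f;
    f.opcode = (insn >> 26) & 0x3Fu;
    f.z      = (insn >> 25) & 1u;
    f.c      = (insn >> 24) & 1u;
    f.r      = (insn >> 23) & 1u;
    f.i      = (insn >> 22) & 1u;
    f.cond   = (insn >> 18) & 0xFu;
    f.dst    = (insn >> 9)  & 0x1FFu;
    f.src    = insn & 0x1FFu;
    return f;
}
```

Under this layout, an instruction whose I-bit is always set would always decode with `i == 1`, leaving `r` as the only bit that changes how the destination field is read.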

Chip, is there an updated version of the P2 spec that you can post?

Heater. · 2012-08-07 09:15

Sapieha,

I think that's a bit hard on compiler writers. I am sure some very smart, motivated and hard working guys have thought about this for a long time. It just seems to be a hard problem.

A classic example, if rumour is to be believed, is the Intel Itanium. A processor architecture designed with the idea in mind that a lot of complex instruction ordering and scheduling would be done by the compiler. Which turned out to be a tough nut to crack.

David Betz · 2012-08-07 09:16

Sapieha wrote: »

Hi David.

Say instead: badly managed by compiler writers. That is why I don't like C and its friends.

They are written by LAZY programmers who don't like thinking, or a little extra work, to use the CPU's full capabilities!!!

Ummm... I'm not quite sure how to respond to this. There are many very smart people writing compilers (I'm not one of them) who make every attempt to fully exploit the features of the target processor but some instructions are so specialized that it is hard to fit them into a general code generation framework. If you have ideas about how this can be done we'd be very happy to hear them.

jazzed · 2012-08-07 09:18

Heater. wrote: »

Although I think I did read that Prop II has had some consideration put into speeding up the LMM loop at least.

Ya. And the ideas theoretically should help, but practicality is still to be seen. Some things are more efficient such as read caching and indexing on HUB memory. That's probably as far as Chip was willing to go. There have been other ideas for more mainstream support, but had they been implemented, P2 would be more ARMY than Propellery. I'm quite sure anyone who wants that is in the minority here - Chip is certainly not interested. After all even P2 is supposed to be a micro-controller and not a micro-processor.

Circuitsoft · 2012-08-07 09:19

The LLVM bytecode has some very high-level instructions in it to allow code generators to make use of very interesting instructions. I think it even has an automatic code-generator framework where you describe the instruction set of the target processor to the code generator framework, and it sets up special cases for all the specialized instructions of the target processor.

David Betz · 2012-08-07 09:31

jazzed wrote: »

After all even P2 is supposed to be a micro-controller and not a micro-processor.

This statement doesn't make sense to me. Are you suggesting that a microcontroller doesn't need good support for high-level languages and that it is always programmed in assembly language? That certainly isn't true of any other microcontroller and really isn't even true of the Propeller even ignoring C/C++. Good high-level language support in the instruction set would benefit a native Spin compiler too.

Dave Hein · 2012-08-07 09:40

The GCC compiler tries to take advantage of unique features of a processor. Take a look at this page that explains how the GCC machine description works -- https://idlebox.net/2011/apidocs/gcc-4.6.0.zip/gccint-4.6.0/gccint_16.html#SEC252 . The compiler writers definitely aren't lazy people. They work very hard so that "lazy" programmers like us can generate efficient code for even the quirkiest CISC machine.

Phil Pilgrim (PhiPi) · 2012-08-07 09:46

David Betz wrote:

Get rid of all of the quirky complex instructions that are great for inner loops coded in assembly but useless for compiler code generators.

A lot of those so-called "quirky" instructions come right out of the DSP playbook. As a PASM programmer interested in signal processing, I welcome them with open arms. Optimizing an instruction set for mainstream languages like C may be a design goal for some, but I don't see that being the case here; nor should it be, when the main objective is raw assembly-level performance.

Compiled HLLs like C on the Prop I and Prop II are really nothing more than an accommodation for users who either don't want or don't have the time to learn PASM. So I don't think the chip should be designed for the convenience of compiler writers, if it entails compromise on the fronts that really matter.

-Phil

jazzed · 2012-08-07 09:57

David Betz wrote: »

Are you suggesting that a microcontroller doesn't need good support for high-level languages and that it is always programmed in assembly language?

That depends on the chip designer. A micro-processor has a generic bus to interface with memory as quickly and efficiently as possible. P2 will be more capable than P1 in this regard, but it is not designed for that - even the new SDRAM interface is designed for data and not code.

David Betz wrote: »

That certainly isn't true of any other micro-controller and really isn't even true of the Propeller even ignoring C/C++.

Actually it is true of Propeller. SPIN is merely an interpreted "object dot" scripting language that borrows heavily from PASCAL. C/C++ on P1 and P2 are being treated the same way. AVR and any other MCUs let you go straight to the metal from C to ASM - There is no LMM interpreter.

Dave Hein · 2012-08-07 10:01

Companies that build DSP chips usually have a team of compiler writers that make compilers that are almost as efficient as hand-assembly. This wouldn't be cost-effective for Parallax. It would be nice to have a better PASM compiler with macros and such, but even P2 assembly should be fairly straight-forward.

David Betz · 2012-08-07 10:25

Phil Pilgrim (PhiPi) wrote: »

A lot of those so-called "quirky" instructions come right out of the DSP playbook. As a PASM programmer interested in signal processing, I welcome them with open arms. Optimizing an instruction set for mainstream languages like C may be a design goal for some, but I don't see that being the case here; nor should it be, when the main objective is raw assembly-level performance.

Compiled HLLs like C on the Prop I and Prop II are really nothing more than an accommodation for users who either don't want or don't have the time to learn PASM. So I don't think the chip should be designed for the convenience of compiler writers, if it entails compromise on the fronts that really matter.

-Phil

I'm not suggesting that we get rid of the DSP-like instructions. I'm just suggesting that there be some consideration given to also having instructions that fit well with high-level language code generators.

Sapieha · 2012-08-07 10:26

Hi Dave.

It is NOT directed personally at the Propeller GCC people, BUT at the people who made the first standards for C and friends.
Now all of that gives programmers a headache over how to work around those problems!!!

Dave Hein wrote: »

The GCC compiler tries to take advantage of unique features of a processor. Take a look at this page that explains how the GCC machine description works -- https://idlebox.net/2011/apidocs/gcc-4.6.0.zip/gccint-4.6.0/gccint_16.html#SEC252 . The compiler writers definitely aren't lazy people. They work very hard so that "lazy" programmers like us can generate efficient code for even the quirkiest CISC machine.

David Betz · 2012-08-07 10:27

Sapieha wrote: »

Hi Dave.

It is NOT directed personally at the Propeller GCC people, BUT at the people who made the first standards for C and friends.
Now all of that gives programmers a headache over how to work around those problems!!!

I didn't assume you were directly attacking Propeller GCC people. I was basically defending compiler writers in general. Anyway, it is certainly true that there are features of C that make it difficult to generate good code. Maybe we should all switch to Fortran! :-)

Dave Hein · 2012-08-07 10:41

Sapieha wrote: »

It is NOT directed personally at the Propeller GCC people, BUT at the people who made the first standards for C and friends.
Now all of that gives programmers a headache over how to work around those problems!!!

So you are suggesting that Dennis Ritchie was lazy?

I suppose all he ever did was invent C and help develop UNIX and the software tools, and write a few books about it, and collect a few awards for his efforts.

Sapieha · 2012-08-07 10:51

Hi Dave.

NOT only lazy, BUT he had no forward thinking about it.
He wrote what was good for him, NOT what was GOOD for CPUs and their development.

And now we need to live with it!!!

Dave Hein wrote: »

So you are suggesting that Dennis Ritchie was lazy? I suppose all he ever did was invent C and help develop UNIX and the software tools, and write a few books about it, and collect a few awards for his efforts.

Circuitsoft · 2012-08-07 10:52

I wonder if TASKING/Altium would be interested in writing a compiler for the propII. They seem to have the best dsp56k compiler on the market (latest gcc for it is based on gcc 1.37), and tout their optimizations quite highly.

David Betz · 2012-08-07 10:57

Circuitsoft wrote: »

I wonder if TASKING/Altium would be interested in writing a compiler for the propII. They seem to have the best dsp56k compiler on the market (latest gcc for it is based on gcc 1.37), and tout their optimizations quite highly.

I'm sure they would be interested if offered enough money. That's the problem with custom proprietary compilers. They are very expensive.

Dave Hein · 2012-08-07 11:36

Sapieha wrote: »

NOT only lazy, BUT he had no forward thinking about it.
He wrote what was good for him, NOT what was GOOD for CPUs and their development.

And now we need to live with it!!!

I suppose everyone has their own opinion -- even when they are wrong.

jmg · 2012-08-07 11:52

Sapieha wrote: »

.... BUT at the people who made the first standards for C and friends.
Now all of that gives programmers a headache over how to work around those problems!!!

I can agree about omitting useful opcodes, but you seem to believe 'Compiler writers' designed the new silicon.
They clearly did not.

Yes, C has some clear flaws for embedded work, the original lack of a Rotate and native Boolean types to name just two, but C was not the only language on the block.

Other languages DO have those, and ASM certainly does.

Chip designers simply trade off useful silicon - it is not a conspiracy.

Dave Hein · 2012-08-07 12:06

C doesn't have a rotate operator? Just define a macro and use an optimizing compiler to generate one.

This C code

#define ROTATE(x, y) (((x) << (y)) | (((x)) >> (32 - (y))))

int test(unsigned int a, unsigned int b)
{
    return ROTATE(a, b);
}

produces this assembly.

	.text
	.balign	4
	.global	_test
_test
	rol	r0, r1
	mov	pc,lr

Put the ROTATE macro in a header file, and the programmer never has to see how ugly it looks.
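One caveat worth sketching: the macro as posted shifts right by 32 when the rotate count is 0, and shifting a 32-bit value by its full width is undefined behavior in C. A variant that masks both shift counts avoids that. The function name `rotl32` is made up for this example, and whether a particular compiler still reduces it to a single `rol` is something to check on the target rather than something claimed here.

```c
#include <stdint.h>

/* Rotate left with the count reduced modulo 32. Masking both shift amounts
 * keeps them in the range 0..31, so a count of 0 (or 32) never produces a
 * shift by the full word width. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}
```

Being a function rather than a macro, it also evaluates each argument exactly once, so an argument with side effects behaves as expected.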

pedward · 2012-08-07 12:12

Some quick thoughts.

The external SDRAM interface will allow you to store XMM code very easily. You move the code into the 256 long video/stack/general memory, then you move blocks into COG memory. The ability to access external memory is done independent of the COG ALU, so you schedule a load and it happens in a separate state machine. This leaves main memory for data in LMM style; there is no contention for main memory between code and data.

Talked to Chip about the REP instructions the other day. REPS takes static operands for repeat and instruction count and happens at the beginning of the pipeline. You put 1 spacer instruction after, then it repeats the next N instructions Y times. REPD is Repeat Delayed and gets inserted into the pipeline later, which allows you to supply a register as the repeat value, but the instruction count is still static.

This is analogous to

i = 64; while (i--) {}

and

do {} while (--n);

Also, if you use #$1FF as the dynamic register to REPD, it will repeat infinitely.

Based on these points, I don't see what else Chip could have done to improve LMM/XMM performance -- you've got dual path code & data memory, stack instructions, and repeat instructions.
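To make the repeat semantics concrete, here is a toy C model of "repeat the next N instructions Y times" as described above. It is only an illustration: the `Insn` callback type, `rep_block`, and `add_one` are hypothetical names, and nothing here models the pipeline, the spacer instruction, or the infinite-repeat case.

```c
/* Toy model of the repeat instructions as described in the post. The names
 * and the callback representation are illustrative, not P2 syntax. */
typedef void (*Insn)(void *state);

/* Run a block of `block_len` instructions `count` times. For REPS both
 * values would be constants in the instruction; for REPD `count` would come
 * from a register at run time while `block_len` stays fixed. */
static void rep_block(const Insn *block, unsigned block_len,
                      unsigned count, void *state)
{
    for (unsigned pass = 0; pass < count; pass++)
        for (unsigned k = 0; k < block_len; k++)
            block[k](state);
}

/* Example "instruction": add 1 to an unsigned accumulator. */
static void add_one(void *state)
{
    (*(unsigned *)state)++;
}
```

In this model, Chip's `REPS #32,#1` corresponds to calling `rep_block` with a one-entry block and a count of 32.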

cgracey · 2012-08-07 12:13

I share Sapieha's sentiments that C has been an often-misguiding force in the evolution of computing.

I suppose it made fine sense for large-memory, 'academically'-ideal systems of the time, but it has been shoehorned into everything else since, and has been dissuasive to the advent of computing architectures to which it would not be amenable. I suppose an ARM chip is what it is because of C. I understand that there are mountains of C code and many reasons to support C, but C's hegemony does not inspire me, personally, and I really hope that someday we will have better systems that get created because, finally, the inspiration came to throw off the old shackles that C has subtly placed on computing for the past 30 years. When this happens, I think we will all know it, because computing will suddenly be fresh again and something to get excited about.

David Betz · 2012-08-07 12:16

pedward wrote: »

Some quick thoughts.

The external SDRAM interface will allow you to store XMM code very easily. You move the code into the 256 long video/stack/general memory, then you move blocks into COG memory. The ability to access external memory is done independent of the COG ALU, so you schedule a load and it happens in a separate state machine. This leaves main memory for data in LMM style; there is no contention for main memory between code and data.

Talked to Chip about the REP instructions the other day. REPS takes static operands for repeat and instruction count and happens at the beginning of the pipeline. You put 1 spacer instruction after, then it repeats the next N instructions Y times. REPD is Repeat Delayed and gets inserted into the pipeline later, which allows you to supply a register as the repeat value, but the instruction count is still static.

This is analogous to

i = 64; while (i--) {}

and

do {} while (--n);

Also, if you use #$1FF as the dynamic register to REPD, it will repeat infinitely.

Based on these points, I don't see what else Chip could have done to improve LMM/XMM performance -- you've got dual path code & data memory, stack instructions, and repeat instructions.

Is there a document describing this SDRAM interface? So I guess you're saying we can run LMM-style code from SDRAM without first loading it into hub memory, right?
