The New 16-Cog, 512KB, 64 analog I/O Propeller Chip - Part 2

potatohead · 2015-09-23 20:33

I second "##". Doing that is just super easy.

Cluso99 · 2015-09-23 20:47

I am quite concerned about the Special Registers being located at COG $000+.

Cog RAM $000+ is often used for tables. If tables can no longer be "0" based, an extra offset value has to be added to reach the table. While this is often not a problem, it is one when the table is in continual use, because the extra add slows down the code.

Some examples...

1. Font table: I use a font table located in cog $000+ within the video generator cog. It is extremely timing dependent!

2. Vector table: Currently there is no other way, but in my faster spin interpreter I have a vector table located in hub. It will be much faster for this to be located in COG or LUT. If it's in COG $000 it will be much faster to decode each spin opcode. IIRC the average spin opcode uses about 50 instructions. Cutting just one instruction in EVERY op code will yield another 2% gain. With LUT-exec and stacks, we are going to see a dramatic improvement in spin execution time. Every bit of speed will help. I am also sure we will see other interpreters making an appearance on P2, as well as GCC.
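The cost being described can be sketched in a few lines of Python. This is only an illustrative model, not PASM: `zero_based_lookup`, `offset_lookup`, and `gain_from_saving` are hypothetical names, and the figure of 50 instructions per opcode is the "IIRC" estimate from the post rather than a measurement.

```python
# Illustrative model only. The 50-instruction average is the estimate
# quoted in the post, not a measured value.
AVG_INSTRUCTIONS_PER_OPCODE = 50


def zero_based_lookup(cog_ram, index):
    """Table starts at cog address $000, so the index is already the address."""
    return cog_ram[index]


def offset_lookup(cog_ram, table_base, index):
    """Table starts elsewhere, so every lookup pays for one extra add."""
    address = table_base + index  # the extra instruction being objected to
    return cog_ram[address]


def gain_from_saving(instructions_saved, average=AVG_INSTRUCTIONS_PER_OPCODE):
    """Fraction of an opcode's work removed by saving some instructions."""
    return instructions_saved / average


# A made-up 512-long cog RAM whose contents are just address + 100.
cog_ram = [address + 100 for address in range(512)]
```

With these assumptions, `gain_from_saving(1)` gives 0.02, which is the 2% figure mentioned above.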

If Bill is around I would love to hear his opinion ???

Meanwhile, Chip, may I suggest you just leave it as you now have it (Special Registers at $000+). This way we can check it out.
We all need an FPGA code release.

David Betz · 2015-09-23 20:53

Cluso99 wrote: »

I am quite concerned about the Special Registers being located at COG $000+.

Cog RAM $000+ is often used for tables. If tables can no longer be "0" based, an extra offset value has to be added to reach the table. While this is often not a problem, it is one when the table is in continual use, because the extra add slows down the code.

Some examples...

1. Font table: I use a font table located in cog $000+ within the video generator cog. It is extremely timing dependent!

2. Vector table: Currently there is no other way, but in my faster spin interpreter I have a vector table located in hub. It will be much faster for this to be located in COG or LUT. If it's in COG $000 it will be much faster to decode each spin opcode. IIRC the average spin opcode uses about 50 instructions. Cutting just one instruction in EVERY op code will yield another 2% gain. With LUT-exec and stacks, we are going to see a dramatic improvement in spin execution time. Every bit of speed will help. I am also sure we will see other interpreters making an appearance on P2, as well as GCC.

If Bill is around I would love to hear his opinion ???

Meanwhile, Chip, may I suggest you just leave it as you now have it (Special Registers at $000+). This way we can check it out.
We all need an FPGA code release.

Will Spin2 really be an interpreted language? I kind of figured, with hub execution, that Spin2 would be compiled to native code and fast as heck. :-)

potatohead · 2015-09-23 21:10

I actually hope it's both...

Byte code SPIN still has a place for code size reasons.

Seairth · 2015-09-23 21:24

Please don't add more address operators. This is just obscuring complication with syntax sugar. This makes PASM more difficult to learn for new people. I'm sure some of you will disagree, but you need to remember that you have an entirely different perspective of the P2 than a new person will.

And, personally, I think it makes the Propeller less fun to program for. I'd much rather get rid of the complication and keep the fun!

David Betz · 2015-09-23 21:31

potatohead wrote: »

I actually hope it's both...

Byte code SPIN still has a place for code size reasons.

Of course, code size will be less of a concern with 512K of hub memory. On the other hand, there are those who will probably eat most of that up with video buffers! :-)

potatohead · 2015-09-23 21:32

Who me?

Seairth · 2015-09-23 21:34

potatohead wrote: »

Who me?

That depends on if you disagree.

David Betz · 2015-09-23 21:40

Actually, I like bytecode languages. I've created a number of them myself. I remember being really excited when Addison Wesley published the book "Smalltalk 80 - The Language and Its Implementation". It was the last part that excited me most. It described in detail the Smalltalk virtual machine including defining all of the bytecodes. It will be interesting to see what Chip does with Spin2. I hope he finds a way to make it as simple and easy to use as Spin.

potatohead · 2015-09-23 21:53

Me too.

Ideally, SPIN gets inline PASM now that we have HUBEXEC. That would be kind of an old school thing, but it's also efficient, easy, and a great platform for prototyping or learning about PASM in general.

Byte code plus inline PASM would deliver most of the speed benefits that SPIN compiled to native code does, while still delivering on size benefits.

Which does make room for video buffers.

jmg · 2015-09-23 22:07

Seairth wrote: »

Please don't add more address operators. This is just obscuring complication with syntax sugar. This makes PASM more difficult to learn for new people. I'm sure some of you will disagree, but you need to remember that you have an entirely different perspective of the P2 than a new person will.

I'd agree, and now is also a great time to make Prop ASM more like the standard everyone is used to.
I think GAS is more 'normal'; Prop ASM should move toward that.

In many cases, it can tolerate older P1 style, but that should be discouraged.

jmg · 2015-09-23 22:23

Cluso99 wrote: »

I am quite concerned about the Special Registers being located at COG $000+.

Cog RAM $000+ is often used for tables. If tables can no longer be "0" based, an extra offset value has to be added to reach the table. While this is often not a problem, it is one when the table is in continual use, because the extra add slows down the code.

Special regs are only part of the problem; there are also interrupt vectors?

Options would seem to be
a) move to the top above LUT - but can opcodes still reach there ?
b) Have an index-with-offset opcode, that allows multiple "Zero-based" arrays
c) Move the regs out of REG space, and use a special opcode to write.
That does not solve the INT vector placement.

A downside of a) is you lose another form of fast wrap, which is Address OR.
In another MCU, I built fast FIFOs using an ORL on the pointer, with no need to compare & replace, but that simple fast ORL needs top-based arrays.
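One reading of the "Address OR" wrap, sketched in Python rather than in any particular MCU's assembly. The 8-bit pointer width, the 16-entry buffer at $F0..$FF, and the function names are all assumptions made for illustration. The trick relies on the pointer register overflowing to zero at the top of the address space, which is why the buffer has to be top-based.

```python
POINTER_MASK = 0xFF  # assumed: an 8-bit pointer register
BUFFER_BASE = 0xF0   # assumed: 16-entry buffer at $F0..$FF, the top of the space


def advance_with_or(pointer):
    """Increment, then force the pointer back into the buffer with a single OR."""
    pointer = (pointer + 1) & POINTER_MASK  # the register overflows to $00 after $FF
    return pointer | BUFFER_BASE            # the ORL: $00 becomes $F0, others are unchanged


def advance_with_compare(pointer, base, size):
    """The slower alternative for a buffer placed anywhere: compare and replace."""
    pointer += 1
    if pointer == base + size:
        pointer = base
    return pointer
```

Stepping `advance_with_or` sixteen times from $F0 visits every slot and returns to $F0 without a compare, whereas `advance_with_compare` needs the test on every step.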

b) looks the best ?

potatohead · 2015-09-23 22:24

Screw that.

SPIN and PASM are one thing, designed together, and that means having things like the operators. A simple "##" is no different than the equally simple "%%" often used for two bit values represented by "0,1,2,3"

SPIN and PASM are full of these, and they make sense, and they are easy, etc...

Anyone wanting gas, should just go and use gas when that's all done and ready as part of the gcc tools for P2.

What Chip did, and tends to do so far with SPIN and PASM, makes a ton of sense! Yes it's not standard, but that is precisely what makes it as awesome as it is!

As that "new user" so many years ago, I was able to jump on a P1 and get things done in a DAY. Nothing about it was hard, and lots about it was a lot of fun.

I really want that same overall feel for P2 with SPIN and PASM. Ideally, the on chip system will complete the picture with the whole thing one design, made to operate together, etc...

Other tools and variants will make plenty of sense. Anyone wanting to use the mess that is gas (and it is a complete mess compared to the elegance and ease found in SPIN+PASM), can and should be using it when that all gets done as part of the gcc work for P2.

Now I've said mess and gas and gcc in one sentence. I don't mean they are bad, because they aren't. I do absolutely mean they aren't fun.

Nothing touches SPIN+PASM for quick and fairly robust coding. I really don't want that flexible syntax, operators, etc... broken or mangled into something like gas, because we will get gas anyway.

I don't want a standard assembler for PASM. I want the cool one Chip is making, and I want it with all the little shortcuts, operators, ability to put lots of kinds of things in one file, etc... that kicks a lot of Smile.

This has come up a pile of times before. And I'll say it again, if SPIN and PASM didn't make the great sense they did, I would have passed on this chip in a second, never thinking twice. It's really important that we leave SPIN and PASM to its creator, who is Chip, and let him do what he does with languages and tools.

Yes, that is different, and that is precisely why a lot of us like using those tools and languages.

Bear in mind, one of the design specs is "fun to use"

There is no "great" time to screw SPIN + PASM users. Let's not start now.

potatohead · 2015-09-23 22:28

In many cases, it can tolerate older P1 style, but that should be discouraged.

Bull.

There is no reason at all to discourage P1 style SPIN+PASM. It's a great combination, awesome syntax, flexible data representations, etc...

gas is an ugly thing by comparison. And that's not bad, it just is. And it is, because it's meant to be portable, and all that jazz needed to support gcc.

Wonderful! Let's make sure all of that works.

But let's also not hose up SPIN+PASM, because it's a different thing. It's not meant to be portable, nor should it be at all.

What SPIN and PASM are is one language, designed with the chip, and that has some distinct advantages over other tools and approaches, and we need to preserve those, not pretend they didn't happen somehow.

potatohead · 2015-09-23 22:37

This makes PASM more difficult to learn for new people. I'm sure some of you will disagree, but you need to remember that you have an entirely different perspective of the P2 than a new person will.

I've taught assembly language to new people a few times now. And I'm self taught on a variety of assembly languages too.

PASM+SPIN is the easiest assembly language I've ever used. Teaching others is a doddle. No joke. And it's fun.

What we've got so far is a bit harder, but still totally cake. You guys worry way too much about compliance with "pro" or "standard" type tools, which are fine, but they aren't the fun tools Chip made us.

That "##" operator will make just as much sense, just as quickly as "%%" did, and it's no big deal and a simple fix compared to the alternatives.

And here's the simple reality.

SPIN+PASM can compete with whatever else people end up doing. Gcc, Python, BASIC, FORTH, whatever, right?

Bet you it dominates on Propeller just like it did for P1.

There are clear reasons for that, and they have absolutely nothing to do with "being more like standard tools" and everything to do with the unified design possible when we've got Chip designing the CPU, assembler and SPIN.

Roy Eltham · 2015-09-23 23:00

Seairth, jmg, etc.
I STRONGLY disagree with you. In fact, I argue that doing it as I suggested makes it MUCH easier for the new person. Having to put addr/4 or addr<<2 all over your code, and know which to use when, is extra complication. It exists only because the actual stuff is byte addressed while most of the opcodes contain only 9 bits for cog addressing, so you need to /4 the values; however, some of them expect the larger 20-bit address.

mov x, ##value (or mov x, &value) is a lot cleaner than mov x, #value/4
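The conversion that such an operator would hide can be sketched in Python. This is not how any real assembler implements it: `long_address`, `byte_address`, and `encode_cog_operand` are hypothetical helpers, and the sketch assumes labels hold byte addresses and that a cog operand field is 9 bits wide, as described above.

```python
COG_ADDRESS_BITS = 9  # width of a cog source/destination field, per the post


def long_address(byte_addr):
    """Convert a long-aligned byte address to a long address (the addr/4 case)."""
    if byte_addr % 4 != 0:
        raise ValueError("byte address is not long-aligned")
    return byte_addr >> 2


def byte_address(long_addr):
    """Convert a long address back to a byte address (the addr<<2 case)."""
    return long_addr << 2


def encode_cog_operand(label_byte_addr):
    """What a hypothetical assembler operator could do for a cog operand."""
    addr = long_address(label_byte_addr)
    if addr >= (1 << COG_ADDRESS_BITS):
        raise ValueError("address does not fit in a 9-bit cog field")
    return addr
```

For example, a label at byte address $40 would be encoded as cog address $10, and the programmer would never write the division by hand.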

As for Spin2, if I have any say in the matter (and I am pretty sure I do), it will be both byte code and pasm2. Byte code for smaller size, and pasm2 for speed when needed. At the very least the compiler could be told to compile for size or speed and do the appropriate thing. In the ideal case, we'd have a way to allow you to specify that a function or object be compiled one way or the other. I'm not sure if we'll initially have a fancy optimizing compiler for Spin2 to pasm2 that works like gcc and the like, but at least we'll have something that produces pasm2 code from Spin2 and is faster than the bytecode interpreter version.

jmg · 2015-09-23 23:02

Roy Eltham wrote: »

Seairth, jmg, etc.
I STRONGLY disagree with you. In fact, I argue that doing it as I suggested makes it MUCH easier for the new person. Having to put addr/4 or addr<<2 all over your code, and know which to use when, is extra complication just because the actual

Err, you may need to read my posts, carefully.
I did not suggest what you say above.

David Betz · 2015-09-23 23:10

I'm interested to see what Chip does with the AUGx and ALTx instructions. When I first suggested a BIG instruction, its use was simply to extend the SRC field to allow full 32 bit immediate values. That operation could be handled by the assembler by noticing the size of the immediate value and adding the BIG prefix if more than 9 bits were needed. Of course, if the immediate value is not known at compile time, that is a bit more difficult. Anyway, AUGx and ALTx have quite a bit more power than my BIG instruction and at least some uses of them probably can't be inferred by the compiler or assembler. I hope there are some "macro instructions" that make it easy to do things like handling 32 bit immediates without having to hand code the AUGx instruction prefix. It would also be nice (I think!) if there was a macro facility in PASM2 so users could create macros for useful combinations of AUGx, ALTx, and normal instructions.
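The assembler-side decision described here (notice the size of the immediate and add a prefix only when it does not fit) can be sketched in Python. This is an assumption-laden illustration, not the actual PASM2 encoding: it assumes the prefix carries the upper 23 bits and the instruction's SRC field keeps the lower 9, and `split_immediate` and `join_immediate` are hypothetical names.

```python
SRC_BITS = 9                      # width of the immediate SRC field
SRC_MASK = (1 << SRC_BITS) - 1    # $1FF


def split_immediate(value):
    """Return (prefix_bits, src_bits); prefix_bits is None when no prefix is needed."""
    value &= 0xFFFFFFFF           # treat the immediate as an unsigned 32-bit value
    low = value & SRC_MASK        # goes in the instruction itself
    high = value >> SRC_BITS      # would go in the BIG/AUGx-style prefix
    if high == 0:
        return None, low          # fits in 9 bits, so emit the plain instruction
    return high, low


def join_immediate(prefix_bits, src_bits):
    """What the hardware would reassemble from the prefix and the SRC field."""
    return ((prefix_bits or 0) << SRC_BITS) | src_bits
```

A value such as 100 needs no prefix, while $12345678 splits into a 23-bit prefix and a 9-bit field that `join_immediate` recombines into the original value.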

jmg · 2015-09-23 23:13

potatohead wrote: »

Nothing touches SPIN+PASM for quick and fairly robust coding. I really don't want that flexible syntax, operators, etc... broken or mangled into something like gas, because we will get gas anyway.

To keep away from generic arm-waving, let's look at a specific example of how PASM is simply strangely different, for no gain in 'fun' or 'teaching':

	jmp	#begin	
begin	

vs the more common and thus normal 
	jmp	begin	
begin:

The second form is smaller, and less prone to error, as you use labels as a destination more frequently than you place the label.
I know which I would rather teach.

Roy Eltham · 2015-09-23 23:21

jmg, you replied to Seairth's post, that mine is a reply to, saying you agreed with him.

Also, the reason it's jmp #label vs jmp value is that #label means the immediate value represented by the label (a constant) versus a register value to use as the target. jmp value means load the contents of the register at value and jump to that value as an address. There needs to be both types.
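The two cases can be modeled in a few lines of Python. This is a toy model of the semantics described above, not an emulator: `jump_target`, the `labels` table, and the addresses in it are all made up for illustration.

```python
def jump_target(cog_ram, source, immediate):
    """Return the address a jmp would go to.

    immediate=True  models `jmp #label`: the operand itself is the target.
    immediate=False models `jmp value`: the operand names a register whose
                    contents are the target.
    """
    if immediate:
        return source
    return cog_ram[source]


# Hypothetical label addresses and register contents.
labels = {"begin": 0x010, "vector": 0x020}
cog_ram = [0] * 512
cog_ram[labels["vector"]] = 0x1A0  # register "vector" holds a computed destination
```

Here `jump_target(cog_ram, labels["begin"], True)` goes to $010, while `jump_target(cog_ram, labels["vector"], False)` goes to $1A0, the value stored in the register. Because every cog register is user-named, both forms need a spelling.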

potatohead · 2015-09-23 23:24

I have, and again PASM is both the easiest assembly language I have personally learned, and a pleasure to teach. That one example takes a few moments tops. Add in all the super flexible and easy to read ways to specify data, and it's excellent.

Leave SPIN plus PASM as it is.

All that "better" and "standard" stuff exists in gcc and tools. No worries right? They will get done, and you can use gas all you want.

jmg · 2015-09-23 23:24

Roy Eltham wrote: »

Also, the reason it's jmp #label vs jmp value is that #label means the immediate value represented by the label (a constant) versus a register value to use as the target. jmp value means load the contents of the register at value and jump to that value as an address. There needs to be both types.

Most other ASMs use JMP @register for that second case.
The most common use is not indirect, and that is the one that should be cleanest and simplest.

potatohead · 2015-09-23 23:27

Says who?

Let's see, there is JMP (value), JMP value, JMP register, JMP #value, etc...

In order to do anything meaningful, one has to understand address modes. That discussion in PASM is short, and the JMP then makes great sense.

Roy Eltham · 2015-09-23 23:29

David Betz wrote: »

I'm interested to see what Chip does with the AUGx and ALTx instructions. When I first suggested a BIG instruction, its use was simply to extend the SRC field to allow full 32 bit immediate values. That operation could be handled by the assembler by noticing the size of the immediate value and adding the BIG prefix if more than 9 bits were needed. Of course, if the immediate value is not known at compile time, that is a bit more difficult. Anyway, AUGx and ALTx have quite a bit more power than my BIG instruction and at least some uses of them probably can't be inferred by the compiler or assembler. I hope there are some "macro instructions" that make it easy to do things like handling 32 bit immediates without having to hand code the AUGx instruction prefix. It would also be nice (I think!) if there was a macro facility in PASM2 so users could create macros for useful combinations of AUGx, ALTx, and normal instructions.

At the very least we could add a bunch of "virtual" opcodes that assemble into the augx/altx + opcode pairs. Jeff Martin and I have talked a bunch about having a "snippet" set/lib that would auto expand keywords into snippets of pasm or bytecode to make things easier for doing stuff like I2C or simple shift out/in operations. Things that exist now as external objects of small spin functions. We planned to have a bunch of "built ins" for this, and it would be pretty easy to extend that functionality to allow user defined ones as part of it. Perhaps a new block you can optionally include that defines these snippet/macros that you can then use elsewhere in your code.

jmg · 2015-09-23 23:30

Roy Eltham wrote: »

jmg, you replied to Seairth's post, that mine is a reply to, saying you agreed with him.

Nor does that post mention addr/4 or addr<<2 ?

I agree with you that the tools should handle this stuff. Users should not have to deal with, in your words, "Having to put addr/4 or addr<<2 all over your code and know which to use when".

However, avoiding that needs to be done in an industry-following manner; ad-hoc glyphs do not help new users.

Roy Eltham · 2015-09-23 23:31

jmg,
however, the # thing means immediate value across all pasm instructions, not just for the jmp #label case. So you are proposing making it inconsistent.

Also, most other asms don't have user-named registers. They have a fixed set of specifically named ones that are used specifically.

potatohead · 2015-09-23 23:33

Again, says who jmg?

Gcc and tools are among the industry tools, and those will be available, and likely early like we had last time.

SPIN and PASM are not industry standard, nor should they be. It's actually just one language, and that has a lot of merits. That is the part we like, those of us who like it that is.

There just won't be one PASM assembler, and the stuff you are wanting to change breaks other things that do not need breaking.

Once gas gets done, it's then possible to work in a standard way and with the set of things gcc supports, or that can work with its tools.

When we get on chip dev done, the advantages of SPIN and PASM will play out very nicely there. And having it all be developed together, one language, one file, if somebody wants to, is all part of what makes a Propeller pretty great to program on.

There is no need to standardize at this stage at all.

Roy Eltham · 2015-09-23 23:39

Who here actually programs the P1 in Spin and PASM a lot? Pretty much all of my P1 coding is in PASM with a smidgen of Spin to glue it all together. I love PASM, it's by far the best ASM language I have ever used, and I have used a half dozen or more. It's super simple and consistent. I want the P2 version to retain that as much as possible while it adds all the new abilities.

potatohead · 2015-09-23 23:42

Me too.

And my use of SPIN varies. Mostly centered on speed.

With SPIN getting both compiled and inline PASM, I expect to write some more SPIN and much larger PASM programs.

cgracey · 2015-09-23 23:45

Roy Eltham wrote: »

I think labels should be the byte address, not the long address, otherwise we'll have oddity between labels in cog code vs hub code, and labels should be able to mark data that can be unaligned, so doing #label should require a div 4 in cog space. Right?

Seairth wrote: »

Wouldn't all/most of that addressing bit shift stuff go away if you made hub instructions long-aligned and starting at byte $1000? In this case, all instruction addressing is in terms of longs, not bytes:

Cog: $000-$1FF
LUT: $200-$3FF
Hub: $0400-$1FFFF

This would have some other advantages as well:

* If relative addresses were byte-oriented, this gives relative addresses a greater range. If relative addresses were long-oriented, then it is now more consistent.

* The 20-bit address would now cover 4x as much instruction space. Of course, that won't do much for the P2, but a few people have expressed extending memory on an FPGA. And then there is the P3.

Also, I will make a plug for one variation on the above scheme:

Instruction Addressing that is local to a cog is in the form %0xxx_xxxxxxxx_xxxxxxxx. Instruction Addressing that is global to all cogs is in the form %1xxx_xxxxxxxx_xxxxxxxx. This would make the current P2 implementation look like:

Cog: $000-$1FF
LUT: $200-$3FF
Hub: $80000-$9FFFF

This makes the entire hub memory executable. Because addressing is long-aligned, the hub can still be extended to $FFFFF (an additional 384K instructions), so you certainly aren't limiting your options. Further, this provides additional cog-local addressing space, should that ever be desired (e.g. LUT2).

And this does not affect data addressing, since each memory type has its own instruction set:

Cog: $000-$1FF (long-addressing)
LUT: $000-$1FF (long-addressing)
Hub: $00000-$7FFFF (byte-addressing)
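The local/global variation quoted above can be sketched as a small decoder in Python. It only restates that proposal as code, under the assumption that bit 19 of a 20-bit long-aligned instruction address selects hub memory; `classify` and the region names are hypothetical.

```python
GLOBAL_BIT = 1 << 19  # %1xxx_xxxxxxxx_xxxxxxxx selects hub in the proposal


def classify(instruction_address):
    """Map a 20-bit long-aligned instruction address to (region, long index)."""
    if not 0 <= instruction_address < (1 << 20):
        raise ValueError("instruction address must fit in 20 bits")
    if instruction_address & GLOBAL_BIT:
        return "hub", instruction_address & (GLOBAL_BIT - 1)
    if instruction_address <= 0x1FF:
        return "cog", instruction_address
    if instruction_address <= 0x3FF:
        return "lut", instruction_address - 0x200
    return "unused-local", instruction_address  # spare cog-local space (e.g. a LUT2)


def hub_byte_address(long_index):
    """Data instructions would still see hub memory as byte-addressed."""
    return long_index * 4
```

Under this reading, instruction address $9FFFF is hub long $1FFFF, which is byte address $7FFFC, the last long of a 512KB hub.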

The whole conundrum is in supporting less-than-long data (words and bytes). They need extra bits to resolve addresses among longs.

It would be great to make a machine that is just long-based - what a relief that would be! Supporting words and bytes, though, requires those extra sub-bits. Then there's the issue of how to handle the addressing scheme which must involve all three sizes.

On the Prop2-Hot, there was a hybrid solution, with special instructions to translate long-aligned PC addresses into PC-plus-byte/word-offset addresses which had the extra bits. This Prop2 has a very unified solution, where longs and words can start at any byte offset. I kind of miss aspects of Prop2-Hot addressing, though, like those PC-plus-offset instructions.
