Propeller assembly quite similar to MIPS32

iammegatron · 2013-06-14 08:11

While learning MIPS32 for PIC32 assembly programming, I noticed its similarity to Propeller assembly, in terms of:

32-bit registers (496 vs 32)
load/store architecture
pipeline
...

Certainly the underlying hardware and detailed instruction sets are far different. But the way of thinking is surprisingly similar.

Heater. · 2013-06-14 10:24

A cursory glance would tell me that the Propeller and MIPS architectures are very different.

Firstly, 32 bits or not makes no odds. Either one of them could be extended to 64 bits or whatever.

A Propeller COG, once loaded, can run totally independently of HUB RAM. It has 512 longs that are at the same time registers, data space and instruction space. It executes instructions from that space. Its program counter only addresses that space. Its addressing modes only access that space. It does not have a load/store architecture. The read/write HUB RAM instructions are more akin to I/O instructions on other machines.

Meanwhile a MIPS has a lot of registers, only used for data. It executes code from RAM. Its program counter addresses that RAM space. It cannot operate without that instruction sequence in RAM.

If I remember correctly MIPS has a three operand instruction set. Something like:

ADD R3, R1, R2

As in add R1 and R2 and place the result in R3.

Clearly this is not like the COG's instruction set.

Chalk and cheese really.

Mike Green · 2013-06-14 10:53

On the other hand, you're correct in that the Prop has a 32-bit architecture and a pipeline. It's not a load/store architecture, but a two-operand (destination/source1 and source2) memory-to-memory instruction set. There are no registers to load/store even though that word is used (incorrectly) from time to time. In any event, you've certainly noticed that many of the concepts are transferable from one device to another.
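To make the operand difference concrete, here is a rough Python sketch, for illustration only. It models a MIPS-style three-operand add on a separate register file next to a COG-style two-operand add where the destination is also the first source and both operands live in the same 512-long cog memory. The function names and the list-based memory model are assumptions made up for this example, not anything from either toolchain.

```python
MASK32 = 0xFFFFFFFF

def mips_add(regs, rd, rs, rt):
    """Three-operand form: regs[rd] = regs[rs] + regs[rt] (register file only)."""
    regs[rd] = (regs[rs] + regs[rt]) & MASK32

def cog_add(cog, dst, src):
    """Two-operand form: cog[dst] = cog[dst] + cog[src] (both are cog RAM locations)."""
    cog[dst] = (cog[dst] + cog[src]) & MASK32

# MIPS-style: 32 registers, result goes to a third register.
regs = [0] * 32
regs[1], regs[2] = 5, 7
mips_add(regs, 3, 1, 2)

# COG-style: 512 longs, the destination is overwritten with the sum.
cog = [0] * 512
cog[10], cog[11] = 5, 7
cog_add(cog, 10, 11)

print(regs[3], cog[10])
```

In the COG model there is no separate load step before the add: any of the 512 locations can be an operand directly.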

Martin_H · 2013-06-14 11:04

Heater. wrote: »

A Propeller COG, once loaded, can run totally independently of HUB RAM. It has 512 longs that are at the same time registers, data space and instruction space. It executes instructions from that space. Its program counter only addresses that space. Its addressing modes only access that space. It does not have a load/store architecture. The read/write HUB RAM instructions are more akin to I/O instructions on other machines.

I haven't encountered another processor quite like the Propeller. The hub RAM is in some ways similar to a classic Harvard architecture. This allows a cog to load and store data from it, but instructions don't execute from it directly. However, cogs don't fit into any neat category. In particular their RAM can store a program or data, which is more like a von Neumann architecture. Most of the load/store machines I've worked with only allow ALU operations between registers or immediate mode. In that sense a cog has no registers at all.

Heater. · 2013-06-14 11:08

Mike,

One could also look at the Prop the other way round and say its COGs have 512 registers.

But wait a minute, the instructions are in those registers as well. They are totally interchangeable. Any COG location can be seen as memory, register, or instruction.

This is totally unlike any other mainstream CPU I have ever seen.

Heater. · 2013-06-14 11:13

Martin_H,

The hub RAM is in some ways similar to a classic Harvard architecture.

Cannot be. Harvard implies that instructions are fetched from one memory space and data is in another memory space. A COG has its instructions, data, and registers (whatever "register" is supposed to mean here) in the same space.

HUB RAM is more like some external I/O device like a disk that you can read/write data to.

Mike Green · 2013-06-14 11:15

You just have to go back far enough in time to find similar CPUs ... even the use of the word "registers" to refer to memory. Still, it's a 2-operand memory to memory architecture with no explicit registers.

Heater. · 2013-06-14 11:36

Ah yes. My "back in time" only goes back as far as the Texas Instruments TMS9900, which kept its 16 registers in main RAM, but that's not really the same either.

What we need is a 64 bit COG whose src and dst fields would allow for 16 million (?) COG registers. HUB or whatever would only be for inter process communication.

Martin_H · 2013-06-14 12:52

Heater. wrote: »

HUB RAM is more like some external I/O device like a disk that you can read/write data to.

I was thinking in the sense of being two kinds of RAM, but I suppose an I/O device is a better fit.

potatohead · 2013-06-14 13:05

Well, I/O doesn't fit for me, though I do agree on it being memory to memory von Neumann style within the COG. There are just addresses that refer to storage, and how the programmer chooses to use them impacts what we call them more than anything else, and is one of my favorite aspects of the Prop design.

I/O is actually done via memory mapped registers! I would argue Motorola / MOS Technology style as opposed to Zilog / Intel style with explicit instructions.

The HUB is a shared memory storage area, not I/O in the sense we see I/O instructions on other chips. Shared, external storage to the various COGs is really how I would best characterize it. And it being shared memory more or less resonates with LMM type code too. It is random access, just not directly executable.

If one were to interconnect several von Neumann type CPUs, the concept of a HUB seems a fairly sane concept to come out of that requirement; however, the way Chip did it with the round robin type access and the COGSTART from that memory is where the magic is.

The dynamic is a simple enough one that people can write code with few worries, and the simplicity also makes for all the determinism / re-use / software peripherals type discussions we've had.

Dave Hein · 2013-06-14 14:35

I've always considered each COG to be a RISC processor. Most instructions only operate on registers (COG RAM), and data in HUB RAM must be explicitly accessed through reads and writes. So in that sense, the Prop is similar to other RISC processors.

Phil Pilgrim (PhiPi) · 2013-06-14 15:10

I like to think of the Propeller as eight microcoded channel processors with no CPU. People who complain that the Propeller lacks peripherals should understand that the Propeller is all peripherals!

-Phil

kwinn · 2013-06-14 18:40

Dave Hein wrote: »

I've always considered each COG to be a RISC processor. Most instructions only operate on registers (COG RAM), and data in HUB RAM must be explicitly accessed through reads and writes. So in that sense, the Prop is similar to other RISC processors.

That's more or less the way I see it as well, eight 32 bit RISC processors with a memory to memory architecture, a very small memory space, and an external ram storage space that is shared with the other processors.

What we need is a 64 bit COG whose src and dst fields would allow for 16 million (?) COG registers. HUB or whatever would only be for inter process communication.

Heater, I have often wished for a little more memory, perhaps 1 or 2K longs, but 16 meg, wow, that's a mind boggling thought. Same 16 meg of hub ram or would you like to increase that proportionally to 128 meg?

Heater. · 2013-06-15 00:57

kwinn,

The point of my 64 bit COG fantasy is that the current Prop has 32 bit instructions which contain operand src and dst fields that are 9 bits wide. 9 bits is enough to address 512 unique locations and that is why the COG only has 512 locations. If you want to increase COG space you are kind of out of luck without adding memory paging, segments (like x86) or some other horrible kludge.
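As an illustration of that field layout, here is a minimal Python sketch that splits a 32-bit Propeller 1 instruction word into its fields. The bit positions used (6-bit opcode, 4 effect/immediate flag bits, 4 condition bits, 9-bit dst, 9-bit src) are my reading of the Propeller 1 instruction format; the function name and the example word are made up for the demonstration and do not correspond to any particular instruction.

```python
def decode_pasm(word):
    """Split a 32-bit instruction word into its fields (assumed P1 layout)."""
    return {
        "opcode": (word >> 26) & 0x3F,   # bits 31..26
        "zcri":   (word >> 22) & 0xF,    # bits 25..22
        "cond":   (word >> 18) & 0xF,    # bits 21..18
        "dst":    (word >> 9) & 0x1FF,   # bits 17..9
        "src":    word & 0x1FF,          # bits 8..0
    }

# Arbitrary example word assembled from chosen field values.
word = (0x20 << 26) | (0x2 << 22) | (0xF << 18) | (0x1F0 << 9) | 0x005
fields = decode_pasm(word)
print(fields)

# A 9-bit field can name this many distinct cog locations.
print(1 << 9)
```

Since 6 + 4 + 4 + 9 + 9 = 32, there are no spare bits left in the word to widen either address field.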

What if we made the COG CPU 64 bits wide? Using the same basic instruction encoding we then have an extra 16 bits each for src and dst fields. That is 9 + 16 = 25 bits, giving us enough bits to address 33,554,432 longs or about 128MBytes. (Mmm...seems I got it wrong in my last post).

Of course not all of it need physically exist in the COG. We could have whatever the technology of the day allows.

At this point who needs HUB RAM? Well there might need to be a few megs to use for COG to COG communication.

potatohead,

I/O is actually done via memory mapped registers! I would argue Motorola / Mos Technologies style as opposed to Zilog / Intel style with explicit instructions.

Depends what you mean by I/O.

Clearly the counters, video and pins are done with memory mapped I/O like Motorola and co.

However, given that a COG, once loaded, can operate without HUB RAM I would argue that the HUB RAM is as much an external device (to the COG) as the pins, counters or video hardware are. It is I/O accessed with special instructions just like the I/O instructions of Zilog and co.

The Prop defies classification in so many ways.

localroger · 2013-06-15 06:23

While we are fantasizing about larger COG spaces it's worth noting that there is no law requiring computer words to be powers of 2 wide. Making the COG word 40 bits would, all else staying the same, allow for 8192 longs in each COG -- what we have now in the HUB. Given a similarly scaled up HUB this seems a bit more practical than 64-bit.
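The word-width arithmetic in this thread can be checked with a small Python sketch. It assumes, as the posts do, that everything except the src and dst fields stays the same, so any extra instruction bits are split evenly between the two 9-bit fields. The function name and parameters are hypothetical.

```python
def cog_longs(word_bits, base_word_bits=32, base_field_bits=9):
    """Addressable cog longs if the extra bits are split evenly between src and dst."""
    extra_bits = word_bits - base_word_bits
    field_bits = base_field_bits + extra_bits // 2
    return 1 << field_bits

for bits in (32, 40, 48, 64):
    longs = cog_longs(bits)
    print(f"{bits}-bit word: {longs} longs")
```

With these assumptions a 40-bit word gives 13-bit fields (8192 locations) and a 64-bit word gives 25-bit fields (33,554,432 locations).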

In some ways that would make Hub-Cog mapping a bit awkward, but not all that much. If we continue to consider the Hub an 8/16/32 bit environment, then Hub data would easily be handled by Cogs, and other than loading PASM images there isn't a lot of need for the Hub to handle important dynamic Cog data.

Dave Hein · 2013-06-15 08:47

Add a 32-bit index register and you can access 4Gig of cog memory without increasing the instruction width.

Dave Hein · 2013-06-15 09:00

After thinking about this for a few minutes I realized that the existing instruction set could support more than 496 instructions if indirect jumps allowed for more than 9 bits. You could do something like "jmpret 0, 1", where location 0 contains the return address and location 1 contains the target address. This would require storing addresses in the first 496 cog locations. Variables would also need to be located in the first 496 locations. So the existing instruction set could support a much larger address space just by increasing the number of bits used in indirect jumps.

Mike Green · 2013-06-15 09:53

Not quite that simple ... the JMPRET instruction stores its return address in bits 0-8 of the destination operand. That won't work for a cog address space > 512. At the very least, you'd need a new instruction that would store the full return address in its destination and you'd use an indirect jump to load it back into the PC.
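A small Python sketch of why that breaks, for illustration only: it models the return-address write as replacing just the low 9 bits of the destination long, which is the behaviour described above. The function name and the example values are made up.

```python
SRC_MASK = 0x1FF  # low 9 bits of a long (bits 0-8)

def store_return_address(dest_long, return_addr):
    """Model of JMPRET's write: only bits 0-8 of the destination are replaced."""
    return (dest_long & 0xFFFFFE00) | (return_addr & SRC_MASK)

dest = 0x12345600  # arbitrary long with its low 9 bits clear

# A return address inside the first 512 locations survives intact.
ok = store_return_address(dest, 100)
print(ok & SRC_MASK)

# A return address at location 600 is silently truncated to 9 bits.
bad = store_return_address(dest, 600)
print(bad & SRC_MASK)
```

The second case comes back as 600 - 512 = 88, so a return from code above location 511 would land in the wrong place unless the full address is stored some other way.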

You'd also need to have index registers or more general use of indirect addressing or a lot of subroutine calls into the 1st 512 locations to handle array access or other computed addressing since you couldn't handle instruction modification beyond location 512.

Heater. · 2013-06-15 10:31

localroger,

Damn yes, I was going to mention that we don't actually need 64 bits. Given the fact that everything else about the Prop is so weird it might as well go for 40 or 48 or whatever. Preferably a multiple of 8. Just choose a word width that soaks up the available number of transistors at the time:)

Dave,

Index registers would do it, but would not using indexed addressing bugger up the instruction cycle timing? I mean you have this extra level of indirection to churn through.

Dave Hein · 2013-06-15 10:41

Mike, in my proposal the return address would not be limited to 9 bits. You would have to return by doing an indirect jump to the return address register. So in my example where the call was done with "jmpret 0, 1", the return would be done with "jmp 0". I've written some code in the past that performs returns like this to save an extra instruction cycle. So instead of doing "jmp #routine_ret" I just do "jmp routine_ret".

Programs that run within the first 496 locations could still use "jmp #routine_ret" if the Prop only overwrote the 18 LSBs in the return instruction. However, this would only allow 262,144 cog locations instead of 4 Gig.

My thinking is that all variables and jump addresses would be located in the first 496 locations. This should work for most programs that contain less than a few dozen variables and jumps. Arrays could also be stored in the first 496 locations so that the current self-modifying code techniques could be used. Of course, the best way would be to add a few index registers so that we wouldn't have to resort to self-modifying code. The current instruction set could remain the same and a few new instructions could be added to support index registers. Or probably a better approach is just to use the P2 approach, but with more COG memory.

EDIT: Heater, the indexed addressing can be done in a single cycle. That's how P2 is going to do it.

kwinn · 2013-06-15 11:26

@ Heater

I understood what you meant in your post regarding the additional 16 bits for addressing, and I was also wrong in my address calculation. To be somewhat consistent with the propeller architecture that 25 bits would address 32MBytes, 16MWords, 8MLongs, or 4MDoubleLongs (referred to as MB, MW, ML, and MD respectively from now on). It might be a good idea to use a couple of those additional bits for the instruction set so that would leave us with 24 bits, or 16MB of address space for cogs.

With 16MB of ram an entire (32 bit color) frame of HD video could be buffered in the cog along with lots of memory left over for the program.
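A quick check of that claim in Python, assuming "HD" means 1920 x 1080 pixels and 32-bit color means 4 bytes per pixel (both are assumptions, the post does not give a resolution):

```python
WIDTH, HEIGHT = 1920, 1080      # assumed HD resolution
BYTES_PER_PIXEL = 4             # 32-bit color

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
cog_bytes = 1 << 24             # 24 address bits, byte-addressed: 16MB

print(frame_bytes)              # bytes needed for one frame
print(cog_bytes - frame_bytes)  # bytes left over for program and data
```

Under those assumptions one frame takes a little under 8MB, leaving roughly half of the 16MB space free.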

I still think hub ram has its place, but it could be an external ram. Whether internal or external it could also be as much as 16MB, or if hub to cog transfers were limited to DoubleLongs as much as 128MB.

As you say though, not all of it need physically exist in the COG. We could have whatever the technology of the day allows.

@localroger

Yes, I thought about a 40 bit processor. It is a more practical approach although it also has its downsides. I was hoping the propeller 2 would have indirect addressing for jumps and moves but that would be difficult (won't say impossible on this forum) to do without adding instruction bits. I think the best way to get more memory would be to stay with a 32 bit propeller and go with the indirect addressing idea as per my response to Dave.

A 64 bit prop is a much nicer fantasy. It might be impossible to get the amount of memory the additional address width permits at present, but it would certainly leave room for future expansion without changing the architecture or instruction set.

@Dave Hein

Indirect addressing could allow up to 64K longs of cog memory if an unused instruction code (assuming there is an unused one) could be used as the “indirect” instruction. Some of the 18 bits of the src/dest address could be used to specify the specific instruction to execute, and the rest of the bits would be the address of the long that holds the 16 bit src/dest addresses.

kwinn · 2013-06-15 11:31

Darn, a whole bunch of new posts before I finish composing and posting a reply.

Phil Pilgrim (PhiPi) · 2013-06-15 11:39

kwinn wrote:

I was hoping the propeller 2 would have indirect addressing for jumps and moves ...

The Prop I and Prop II both have indirect addressing for JMPs and JMPRETs, since the source field can point to a register containing the target address. It would not be a huge leap to be able to use more than the 9 LSBs of that address or to deposit more than 9 bits of return address at the destination location of a JMPRET. Moreover, a JMP (i.e. JMPRET nr) could be configured to use all 18 bits of the src and dst combined, since the dst isn't really used for anything.

-Phil

KC_Rob · 2013-06-18 13:57

Heater. wrote: »

kwinn,

The point of my 64 bit COG fantasy is that the current Prop has 32 bit instructions which contain operand src and dst fields that are 9 bits wide. 9 bits is enough to address 512 unique locations and that is why the COG only has 512 locations. If you want to increase COG space you are kind of out of luck without adding memory paging, segments (like x86) or some other horrible kludge.

What if we made the COG CPU 64 bits wide? Using the same basic instruction encoding we then have an extra 16 bits each for src and dst fields. That is 9 + 16 = 25 bits, giving us enough bits to address 33,554,432 longs or about 128MBytes. (Mmm...seems I got it wrong in my last post).

Of course not all of it need physically exist in the COG. We could have whatever the technology of the day allows.

At this point who needs HUB RAM? Well there might need to be a few megs to use for COG to COG communication.

Now I like this idea. When can we expect to see one?

Heater. · 2013-06-18 14:27

About seven years after the release of the Prop II....

KC_Rob · 2013-06-18 14:37

Heater. wrote: »

About seven years after the release of the Prop II....
