Propeller assembly quite similar to MIPS32
iammegatron
Posts: 40
While learning MIPS32 for PIC32 assembly programming, I noticed its similarity to Propeller assembly, in terms of:
32-bit registers (496 on the Propeller vs 32 on MIPS)
load/store architecture
pipeline
...
Certainly the underlying hardware and detailed instruction sets are far different. But the way of thinking is surprisingly similar.
Comments
Firstly, 32 bits or not makes no odds. Either one of them could be extended to 64 bits or whatever.
A Propeller COG, once loaded, can run totally independently of HUB RAM. It has 512 longs that are at the same time registers, data space and instruction space. It executes instructions from that space. Its program counter only addresses that space. Its addressing modes only access that space. It does not have a load/store architecture. The read/write HUB RAM instructions are more akin to I/O instructions on other machines.
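To make the "512 longs" point concrete, here is a minimal Python sketch of how a 32-bit instruction long splits into fields. It assumes the Propeller 1 layout of a 6-bit opcode, 4 flag bits, 4 condition bits, then a 9-bit dst and a 9-bit src. The opcode value used is a placeholder for illustration, not a real PASM encoding.

```python
# Assumed Propeller 1 field layout (bit positions):
# opcode[31:26] zcri[25:22] cond[21:18] dst[17:9] src[8:0]
FIELD_BITS = 9
COG_LONGS = 1 << FIELD_BITS  # 9 bits can name 512 locations


def encode(opcode, zcri, cond, dst, src):
    """Pack the fields into one 32-bit instruction long."""
    if not (0 <= dst < COG_LONGS and 0 <= src < COG_LONGS):
        raise ValueError("dst/src must fit in 9 bits")
    return (opcode << 26) | (zcri << 22) | (cond << 18) | (dst << 9) | src


def decode(instr):
    """Split a 32-bit instruction long back into its fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,
        "zcri": (instr >> 22) & 0xF,
        "cond": (instr >> 18) & 0xF,
        "dst": (instr >> 9) & 0x1FF,
        "src": instr & 0x1FF,
    }


# Placeholder opcode; dst is the highest location a 9-bit field can reach.
word = encode(opcode=0b101010, zcri=0b0010, cond=0b1111, dst=511, src=0)
fields = decode(word)
```

Because dst and src are plain 9-bit numbers into the same 512 longs, nothing in the encoding distinguishes a "register" from a data word or another instruction.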
Meanwhile a MIPS has a lot of registers, used only for data. It executes code from RAM. Its program counter addresses that RAM space. It cannot operate without that instruction sequence in RAM.
If I remember correctly, MIPS has a three-operand instruction set, as in: add R1 and R2 and place the result in R3.
Clearly this is not like the COG's instruction set.
Chalk and cheese really.
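The three-operand versus two-operand contrast above can be sketched in a few lines of Python. This is only a model: the function names are made up for illustration, and the MIPS side wraps on overflow (like addu) rather than trapping.

```python
MASK32 = 0xFFFFFFFF


def mips_add(regs, rd, rs, rt):
    """Three-operand form: regs[rd] = regs[rs] + regs[rt].

    Operands live in a separate register file of 32 entries.
    """
    regs[rd] = (regs[rs] + regs[rt]) & MASK32


def cog_add(cog, dst, src):
    """Two-operand form: cog[dst] = cog[dst] + cog[src].

    dst and src index the same 512 longs that also hold the program.
    """
    cog[dst] = (cog[dst] + cog[src]) & MASK32


regs = [0] * 32
regs[1], regs[2] = 7, 5
mips_add(regs, 3, 1, 2)   # R3 = R1 + R2, sources unchanged

cog = [0] * 512
cog[100], cog[101] = 7, 5
cog_add(cog, 100, 101)    # location 100 is overwritten with the sum
```

The MIPS form leaves both sources intact, while the COG form destroys one of them, so a COG program needs an extra mov whenever both inputs must survive.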
I haven't encountered another processor quite like the Propeller. The hub RAM is in some ways similar to a classic Harvard architecture. This allows a cog to load and store data from it, but instructions don't execute from it directly. However, cogs don't fit into any neat category. In particular their RAM can store a program or data, which is more like a von Neumann architecture. Most of the load/store machines I've worked with only allow ALU operations between registers or immediate mode. In that sense a cog has no registers at all.
One could also look at the Prop the other way round and say its COGs have 512 registers.
But wait a minute, the instructions are in those registers as well. They are totally interchangeable. Any COG location can be seen as memory, register, or instruction.
This is totally unlike any other mainstream CPU I have ever seen.
Cannot be. Harvard implies that instructions are fetched from one memory space and data is in another memory space. A COG has its instructions, data, and registers (whatever "register" is supposed to mean here) in the same space.
HUB RAM is more like some external I/O device like a disk that you can read/write data to.
What we need is a 64 bit COG whose src and dst fields would allow for 16 million (?) COG registers. HUB or whatever would only be for inter process communication.
I was thinking in the sense of being two kinds of RAM, but I suppose an I/O device is a better fit.
I/O is actually done via memory-mapped registers! I would argue Motorola / MOS Technology style as opposed to Zilog / Intel style with explicit instructions.
The HUB is a shared memory storage area, not I/O in the sense we see I/O instructions on other chips. Shared, external storage to the various COGS is really how I would best characterize it. And it being shared memory more or less resonates with LMM type code too. It is random access, just not directly executable.
If one were to interconnect several von Neumann type CPUs, a HUB seems a fairly sane concept to come out of that requirement; however, the way Chip did it, with the round-robin access and the COGSTART from that memory, is where the magic is.
The dynamic is a simple enough one that people can write code with few worries, and the simplicity also makes for all the determinism / re-use / software peripherals type discussions we've had.
-Phil
That's more or less the way I see it as well: eight 32-bit RISC processors with a memory-to-memory architecture, a very small memory space, and an external RAM storage space that is shared with the other processors.
Heater, I have often wished for a little more memory, perhaps 1 or 2K longs, but 16 meg, wow, that's a mind boggling thought. Same 16 meg of hub ram or would you like to increase that proportionally to 128 meg?
The point of my 64 bit COG fantasy is that the current Prop has 32 bit instructions which contain operand src and dst fields that are 9 bits wide. 9 bits is enough to address 512 unique locations and that is why the COG only has 512 locations. If you want to increase COG space you are kind of out of luck without adding memory paging, segments (like x86) or some other horrible kludge.
What if we made the COG CPU 64 bits wide? Using the same basic instruction encoding we then have an extra 16 bits each for src and dst fields. That is 9 + 16 = 25 bits, giving us enough bits to address 33,554,432 longs or about 128MBytes. (Mmm...seems I got it wrong in my last post).
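The arithmetic above is easy to check with a short Python sketch. It assumes, as the post does, that the src and dst fields count 32-bit longs rather than bytes; the function name is made up for illustration.

```python
LONG_BYTES = 4  # a Propeller long is 32 bits


def cog_longs(field_bits):
    """Number of longs reachable by a src/dst field of this width."""
    return 1 << field_bits


current_longs = cog_longs(9)        # today's 9-bit fields
wide_longs = cog_longs(9 + 16)      # 64-bit instruction, 25-bit fields
wide_bytes = wide_longs * LONG_BYTES
wide_mib = wide_bytes // (1024 * 1024)
```

Under that assumption the 25-bit fields reach 33,554,432 longs, which is 134,217,728 bytes, or 128 MBytes in binary units.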
Of course not all of it need physically exist in the COG. We could have whatever the technology of the day allows.
At this point who needs HUB RAM? Well there might need to be a few megs to use for COG to COG communication.
potatoehead,
Depends what you mean by I/O.
Clearly the counters, video and pins are done with memory mapped I/O like Motorola and co.
However, given that a COG, once loaded, can operate without HUB RAM, I would argue that the HUB RAM is as much an external device (to the COG) as the pins, counters or video hardware are. It is I/O accessed with special instructions just like the I/O instructions of Zilog and co.
The Prop defies classification in so many ways.
In some ways that would make Hub-Cog mapping a bit awkward, but not all that much. If we continue to consider the Hub an 8/16/32-bit environment, then Hub data would easily be handled by Cogs, and other than loading PASM images there isn't a lot of need for the Hub to handle important dynamic Cog data.
You'd also need to have index registers or more general use of indirect addressing or a lot of subroutine calls into the 1st 512 locations to handle array access or other computed addressing since you couldn't handle instruction modification beyond location 512.
Damn yes, I was going to mention that we don't actually need 64 bits. Given the fact that everything else about the Prop is so weird it might as well go for 40 or 48 or whatever. Preferably a multiple of 8. Just choose a word width that soaks up the available number of transistors at the time:)
Dave,
Index registers would do it, but would not using indexed addressing bugger up the instruction cycle timing? I mean you have this extra level of indirection to churn through.
Programs that run within the first 496 locations could still use "jmp #routine_ret" if the Prop only overwrote the 18 LSBs in the return instruction. However, this would only allow 262,144 cog locations instead of 4 Gig.
My thinking is that all variables and jump addresses would be located in the first 496 locations. This should work for most programs that contain less than a few dozen variables and jumps. Arrays could also be stored in the first 496 locations so that the current self-modifying code techniques could be used. Of course, the best way would be to add a few index registers so that we wouldn't have to resort to self-modifying code. The current instruction set could remain the same and a few new instructions could be added to support index registers. Or probably a better approach is just to use the P2 approach, but with more COG memory.
EDIT: Heater, the indexed addressing can be done in a single cycle. That's how P2 is going to do it.
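For anyone unfamiliar with the self-modifying technique mentioned above, here is a toy Python model of it. The opcode numbers and the ins/run helpers are invented for this sketch and are not real PASM encodings, and the model ignores instruction timing and pipelining entirely. Only the 9-bit dst/src field positions follow the Propeller 1 layout.

```python
MASK32 = 0xFFFFFFFF
# Hypothetical opcodes for this sketch only.
OP_HALT, OP_ADD, OP_ADDI, OP_DJNZ = 0, 1, 2, 3


def ins(op, dst=0, src=0):
    """Build an instruction long: op in the top 6 bits, 9-bit dst and src."""
    return (op << 26) | (dst << 9) | src


def run(cog, pc=0, max_steps=1000):
    """Execute from cog memory; code and data share the same list."""
    for _ in range(max_steps):
        word = cog[pc]
        op, dst, src = word >> 26, (word >> 9) & 0x1FF, word & 0x1FF
        pc += 1
        if op == OP_HALT:
            return cog
        if op == OP_ADD:        # cog[dst] += cog[src]
            cog[dst] = (cog[dst] + cog[src]) & MASK32
        elif op == OP_ADDI:     # cog[dst] += immediate src
            cog[dst] = (cog[dst] + src) & MASK32
        elif op == OP_DJNZ:     # decrement, jump to src if not zero
            cog[dst] = (cog[dst] - 1) & MASK32
            if cog[dst] != 0:
                pc = src
    raise RuntimeError("step limit reached")


TOTAL, COUNT, ARRAY = 10, 11, 100
cog = [0] * 512
cog[ARRAY:ARRAY + 4] = [5, 6, 7, 8]
cog[COUNT] = 4
cog[0] = ins(OP_ADD, TOTAL, ARRAY)  # src field is patched on each pass
cog[1] = ins(OP_ADDI, 0, 1)         # add 1 to the instruction at location 0
cog[2] = ins(OP_DJNZ, COUNT, 0)
cog[3] = ins(OP_HALT)
run(cog)
```

The loop walks the array by incrementing the src field of the instruction at location 0. That only works while the array address fits in the 9-bit field, which is why arrays beyond the first 512 locations would need index registers or some form of indirection.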
I understood what you meant in your post regarding the additional 16 bits for addressing, and I was also wrong in my address calculation. To be somewhat consistent with the propeller architecture that 25 bits would address 32MBytes, 16MWords, 8MLongs, or 4MDoubleLongs (referred to as MB, MW, ML, and MD respectively from now on). It might be a good idea to use a couple of those additional bits for the instruction set so that would leave us with 24 bits, or 16MB of address space for cogs.
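Note that this post counts the 25 bits as a byte address, whereas the 128MByte figure earlier in the thread counts them as a long address. A small Python sketch of the byte-addressed reading, using the MB/MW/ML/MD unit names from the post (the helper name is made up):

```python
# Size in bytes of each unit named in the post.
UNIT_BYTES = {"MB": 1, "MW": 2, "ML": 4, "MD": 8}
MEBI = 1024 * 1024


def capacity(address_bits):
    """Capacity in binary mega-units if the address counts bytes."""
    total_bytes = 1 << address_bits
    return {unit: total_bytes // size // MEBI
            for unit, size in UNIT_BYTES.items()}


full = capacity(25)      # all 25 bits used for addressing
reduced = capacity(24)   # one bit given back to the instruction set
```

With 25 bits this gives 32 MB, 16 MW, 8 ML and 4 MD, and dropping to 24 bits gives the 16 MB figure.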
With 16MB of ram an entire (32 bit color) frame of HD video could be buffered in the cog along with lots of memory left over for the program.
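A quick check of that claim in Python, assuming "HD" means 1920 x 1080 (the post does not say which resolution) and 4 bytes per pixel for 32-bit color:

```python
WIDTH, HEIGHT = 1920, 1080   # assumed HD resolution
BYTES_PER_PIXEL = 4          # 32-bit color
COG_BYTES = 16 * 1024 * 1024 # 16MB of cog RAM

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
left_over = COG_BYTES - frame_bytes
fits = frame_bytes <= COG_BYTES
```

Under those assumptions one frame takes 8,294,400 bytes, leaving a little over 8 MB for program and other data.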
I still think hub ram has its place, but it could be an external ram. Whether internal or external it could also be as much as 16MB, or if hub to cog transfers were limited to DoubleLongs as much as 128MB.
As you say though, not all of it need physically exist in the COG. We could have whatever the technology of the day allows.
@localroger
Yes, I thought about a 40 bit processor. It is a more practical approach although it also has its downsides. I was hoping the propeller 2 would have indirect addressing for jumps and moves but that would be difficult (won't say impossible on this forum) to do without adding instruction bits. I think the best way to get more memory would be to stay with a 32 bit propeller and go with the indirect addressing idea as per my response to Dave.
A 64 bit prop is a much nicer fantasy. It might be impossible to get the amount of memory the additional address width permits at present, but it would certainly leave room for future expansion without changing the architecture or instruction set.
@Dave Hein
Indirect addressing could allow up to 64K longs of cog memory if an unused instruction code (assuming there is an unused one) could be used as the “indirect” instruction. Some of the 18 bits of the src/dest address could be used to specify the specific instruction to execute, and the rest of the bits would be the address of the long that holds the 16 bit src/dest addresses.
-Phil