More COG memory suggestion for Prop 2

tellurian · 2009-02-06 14:34

I have been using the Propeller for a while now and am definitely a fan. I have been catching up on my reading in this forum and it dawned on me that there have been no suggestions about more COG memory. The 2k COG RAM seems a bit spartan. How about 4k or 8k per cog? I realize this may be late in the game, but if possible I expect it would definitely ratchet up the throughput of SPIN and C programs as well as allow for more powerful and sophisticated ASM drivers and apps.

Lame? Late? ... or not?

BradC · 2009-02-06 14:35

tellurian said...
I have been using the Propeller for a while now and am definitely a fan. I have been catching up on my reading in this forum and it dawned on me that there have been no suggestions about more COG memory. The 2k COG RAM seems a bit spartan. How about 4k or 8k per cog? I realize this may be late in the game, but if possible I expect it would definitely ratchet up the throughput of SPIN and C programs as well as allow for more powerful and sophisticated ASM drivers and apps.

Lame? Late? ... or not?

Have you looked at the architecture of a cog instruction?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Cardinal Fang! Fetch the comfy chair.

Kye · 2009-02-06 14:43

Impossible without redesign of the architecture.

9 Bits for source, 9 Bits for destination. That means you have 512 addresses to choose from.
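As a rough illustration of that arithmetic (a sketch only; the function names are made up for this example), the number of longs a cog can address follows directly from the width of the operand field:

```python
# Illustrative arithmetic only: how much cog RAM an operand field can address.
BYTES_PER_LONG = 4  # a Propeller long is 32 bits


def addressable_longs(field_bits: int) -> int:
    """Number of distinct register addresses an n-bit operand field can name."""
    return 1 << field_bits


def cog_ram_bytes(field_bits: int) -> int:
    """Cog RAM size in bytes if every addressable location is one long."""
    return addressable_longs(field_bits) * BYTES_PER_LONG


if __name__ == "__main__":
    for bits in (9, 10, 11):
        print(bits, "bits:", addressable_longs(bits), "longs,",
              cog_ram_bytes(bits), "bytes")
```

With 9 bits this gives the 512 longs (2k) of today's cog; reaching 4k or 8k would need 10 or 11 bits in both the source and the destination field.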

Unless you would like two memory maps in each cog where you would need different instructions to switch between memory maps, and then while we're at it we could throw in interrupts and maybe some...

It's RISC for a reason...

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Nyamekye,

virtuPIC · 2009-02-06 14:52

Yeah, you have 9 bits for source, 9 bits for destination. Then there are 14 bits left for instruction, condition and this wz/wc/r/i stuff. Maybe it's possible to encode the very RISCy instructions and conditions a little more compactly? If we get two bits - one for source, one for dest - then we could double the COG RAM.
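To make the bit budget concrete, here is a small sketch that packs and unpacks a 32-bit word using the field widths described above (6 instruction bits, 4 wz/wc/r/i flag bits, 4 condition bits, 9 destination, 9 source). The layout follows my reading of the Propeller 1 instruction format; the field names and the `encode`/`decode` helpers are invented for this example.

```python
# Field layout of a 32-bit Propeller 1 instruction, most significant field first.
# Names are illustrative; the widths are the ones discussed in the thread.
FIELDS = (("instr", 6), ("zcri", 4), ("cond", 4), ("dest", 9), ("src", 9))


def encode(**values):
    """Pack named field values into one 32-bit word (missing fields are 0)."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode(word):
    """Split a 32-bit word back into its named fields."""
    out = {}
    shift = 32
    for name, width in FIELDS:
        shift -= width
        out[name] = (word >> shift) & ((1 << width) - 1)
    return out


if __name__ == "__main__":
    w = encode(instr=0b101000, zcri=0b0010, cond=0b1111, dest=0x1FF, src=0x001)
    print(hex(w), decode(w))
```

Trying `encode(dest=512)` raises an error, which is the whole problem in one line: address 512 simply has no encoding unless a bit is taken from one of the other fields.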

Or alternatively let's assume that access to the hub RAM will become faster. Then we could use some LMM or other virtual memory mechanism. Or maybe even with some small hardware support for that? Oh well, I'm dreaming...

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Airspace V - international hangar flying!
www.airspace-v.com/ggadgets for tools & toys

mctrivia · 2009-02-06 15:07

If we added 2 ops, read and write, a second layer of cog RAM could be made for data storage.

dMajo · 2009-02-06 15:10

Kye said...
Impossible without redesign of the architecture.

9 Bits for source, 9 Bits for destination. That means you have 512 addresses to choose from.

Unless you would like two memory maps in each cog where you would need different instructions to switch between memory maps, and then while we're at it we could throw in interrupts and maybe some...

It's RISC for a reason...

Yes, but having two memory maps (pages) and maybe two more instructions (a jump that selects the page and sets the PC to a given location (label)), with the other instructions working as usual and having their source and destination targeted at the selected page, does not make it a CISC. It only gives the opportunity for more complex time-critical PASM code.
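A toy model of that paging idea, purely hypothetical: nothing here is a real or planned Propeller feature, and `PagedCog` and `jmp_page` are names made up for the sketch. Ordinary accesses keep their 9-bit addresses, and only the page-selecting jump changes which 512-long page they land in.

```python
PAGE_SIZE = 512  # longs reachable through a 9-bit source/destination field


class PagedCog:
    """Toy model of a cog with several 512-long pages (hypothetical)."""

    def __init__(self, pages=2):
        self.pages = pages
        self.ram = [0] * (pages * PAGE_SIZE)
        self.page = 0  # currently selected page
        self.pc = 0    # program counter within the selected page

    def _physical(self, addr9):
        # Ordinary instructions still carry only a 9-bit address.
        if not 0 <= addr9 < PAGE_SIZE:
            raise ValueError("address does not fit in 9 bits")
        return self.page * PAGE_SIZE + addr9

    def read(self, addr9):
        return self.ram[self._physical(addr9)]

    def write(self, addr9, value):
        self.ram[self._physical(addr9)] = value & 0xFFFFFFFF

    def jmp_page(self, page, addr9):
        """Hypothetical 'jump and select page' instruction."""
        if not 0 <= page < self.pages:
            raise ValueError("no such page")
        if not 0 <= addr9 < PAGE_SIZE:
            raise ValueError("address does not fit in 9 bits")
        self.page = page
        self.pc = addr9


if __name__ == "__main__":
    cog = PagedCog()
    cog.write(5, 111)      # lands in page 0
    cog.jmp_page(1, 0x10)  # switch to page 1
    cog.write(5, 222)      # same 9-bit address, different page
    print(cog.read(5), cog.pc)
```

The cost shows up immediately in the model: code in one page cannot see data in the other without another page switch.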

tellurian · 2009-02-06 15:34

Kye said...
Impossible without redesign of the architecture.

9 Bits for source, 9 Bits for destination. That means you have 512 addresses to choose from.

Unless you would like two memory maps in each cog where you would need different instructions to switch between memory maps, and then while we're at it we could throw in interrupts and maybe some...

It's RISC for a reason...

Ha! OK. Well I'm a C hack and have avoided asm (on anything) for the past 20 years so I did not look at the instruction set ... I guess I should have. Too bad eh, a bit more COG memory would go a long way.

Andrew E Mileski · 2009-02-06 19:06

tellurian said...
I have been using the Propeller for a while now and am definitely a fan. I have been catching up on my reading in this forum and it dawned on me that there have been no suggestions about more COG memory. The 2k COG RAM seems a bit spartan. How about 4k or 8k per cog? I realize this may be late in the game, but if possible I expect it would definitely ratchet up the throughput of SPIN and C programs as well as allow for more powerful and sophisticated ASM drivers and apps.

Lame? Late? ... or not?

Have you looked at LMM yet? It's a simple way to execute out of hub memory, while allowing small bursts (like loops) to be done in cog memory. I think LMM's unique defining feature is single-instruction execution, though; as others have pointed out, paging and caching have been done before.
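For anyone who has not seen LMM, the core idea can be modelled in a few lines. This is a toy simulation, not PASM: the instruction tuples, the `acc` register and `run_lmm` are all invented for the sketch. A tiny kernel loop fetches one long at a time from hub memory, advances a hub program counter by 4 bytes, and executes the fetched instruction.

```python
def run_lmm(hub, pc=0, max_steps=1000):
    """Toy LMM kernel: fetch one instruction per pass from hub memory.

    `hub` is a list of longs; each long holds one made-up instruction tuple.
    `pc` is a byte address into hub memory, so it advances by 4 per fetch.
    """
    regs = {"pc": pc, "acc": 0}
    steps = 0
    while steps < max_steps:
        op = hub[regs["pc"] // 4]  # fetch a single long from hub
        regs["pc"] += 4            # point at the next hub instruction
        steps += 1
        kind = op[0]
        if kind == "halt":
            break
        elif kind == "add":
            regs["acc"] += op[1]
        elif kind == "jmp":
            regs["pc"] = op[1]     # a jump just rewrites the hub pc
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return regs, steps


if __name__ == "__main__":
    program = [
        ("add", 1),    # hub byte address 0
        ("add", 2),    # 4
        ("jmp", 16),   # 8: skip the next instruction
        ("add", 100),  # 12: never executed
        ("halt",),     # 16
    ]
    print(run_lmm(program))
```

The speed penalty mentioned elsewhere in the thread comes from the fetch step: every instruction costs a hub access before it can run.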

mctrivia · 2009-02-06 19:44

We could get a lot more memory in if the current 512 registers were squashed to a more conventional 16. That means only 4 bits are required for referencing registers, leaving 14 for data. With 16 registers there is still very little data swapping required, but it does make it a little less intuitive to people used to just working with variables instead of separate registers and code areas. I learned assembly on the 68HC11, which has only two 8-bit registers plus two 16-bit pointer registers.

Since I think more intuitive will win out on that and they will keep all data as usable registers, I think having a secondary cog memory, which cannot be used for executing code but can be used for storing local data with only 2 instructions able to access it, would be the best trade-off. It would be just like accessing hub RAM now, but always taking 1 cycle instead of the 1 to 8 that hub instructions will take. This method also has the benefit of allowing the same 512-long copy as the Prop 1 and just zeroing out the higher cog memory on coginit. So very few changes to the architecture are required. The big question is: is there enough space to fit another 512+ longs per cog?
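A minimal sketch of how that could look from the programmer's side, assuming two made-up instructions called `rdaux` and `wraux` that take a pointer register the way the hub instructions do. This is a thought experiment, not a description of any real Prop 2 design.

```python
COG_LONGS = 512  # ordinary cog registers, addressable by the 9-bit fields


class CogWithAuxRam:
    """Toy cog with a data-only auxiliary RAM (hypothetical)."""

    def __init__(self, aux_longs=512):
        self.regs = [0] * COG_LONGS
        self.aux = [0] * aux_longs  # cannot hold executable code

    def load_program(self, image):
        """Model of coginit: copy up to 512 longs, zero the auxiliary RAM."""
        if len(image) > COG_LONGS:
            raise ValueError("image larger than 512 longs")
        self.regs = list(image) + [0] * (COG_LONGS - len(image))
        self.aux = [0] * len(self.aux)

    def _aux_index(self, ptr_reg):
        # The aux address comes from a register value, like a hub pointer.
        index = self.regs[ptr_reg]
        if not 0 <= index < len(self.aux):
            raise IndexError("aux address out of range")
        return index

    def rdaux(self, dest, ptr_reg):
        """Hypothetical: regs[dest] = aux[regs[ptr_reg]]."""
        self.regs[dest] = self.aux[self._aux_index(ptr_reg)]

    def wraux(self, src, ptr_reg):
        """Hypothetical: aux[regs[ptr_reg]] = regs[src]."""
        self.aux[self._aux_index(ptr_reg)] = self.regs[src]


if __name__ == "__main__":
    cog = CogWithAuxRam()
    cog.load_program([7, 300])  # reg 0 = data, reg 1 = aux pointer
    cog.wraux(0, 1)             # aux[300] = 7
    cog.rdaux(2, 1)             # reg 2 = aux[300]
    print(cog.regs[2])
```

Because the aux address lives in a register rather than in the instruction, the 9-bit fields stay untouched, which is why the change to the architecture would be small.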

virtuPIC · 2009-02-06 20:39

mctrivia said...
We could get a lot more memory in if the current 512 registers were squashed to a more conventional 16. That means only 4 bits are required for referencing registers, leaving 14 for data. With 16 registers there is still very little data swapping required, but it does make it a little less intuitive to people used to just working with variables instead of separate registers and code areas. I learned assembly on the 68HC11, which has only two 8-bit registers plus two 16-bit pointer registers.

Well, what about a stack machine? Yeah, sounds frightening for assembler programming, but as soon as you've got the idea it's very convenient. A stack processor only needs a single address for read and write instructions. All processing instructions take no address. Stack machines are simple, fast, and compilers can easily generate code for them. There was a processor family some time ago: the Transputer. Cute device, but overtaken by regular RISC technology. Maybe because chips have become cheaper, maybe because they were meant for special purposes, maybe because they also had a strange new high-level language...

Well, I don't know their current offerings, but the old HP pocket calculators were also stack machines, using what is called reverse Polish notation (RPN) in this context. They were a technical and commercial success.
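A tiny illustration of the point about addresses, as a sketch with a made-up instruction set (not the Transputer's or HP's): only `load`, `store` and `push` carry an operand, while the arithmetic instructions work purely on the stack.

```python
def run_stack(program, memory):
    """Run a toy stack-machine program and return the final stack.

    Only load/store carry a memory address (push carries a literal);
    add, sub and mul take their operands from the stack.
    """
    stack = []
    for op, *arg in program:
        if op == "push":
            stack.append(arg[0])
        elif op == "load":
            stack.append(memory[arg[0]])
        elif op == "store":
            memory[arg[0]] = stack.pop()
        elif op in ("add", "sub", "mul"):
            b = stack.pop()
            a = stack.pop()
            if op == "add":
                stack.append(a + b)
            elif op == "sub":
                stack.append(a - b)
            else:
                stack.append(a * b)
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return stack


if __name__ == "__main__":
    # RPN "3 4 + 5 *" with 3 and 4 held in memory; result goes to memory[2].
    mem = [3, 4, 0]
    prog = [("load", 0), ("load", 1), ("add",), ("push", 5), ("mul",),
            ("store", 2)]
    run_stack(prog, mem)
    print(mem[2])
```

On an HP calculator the same computation is keyed in the same order, which is why the two get mentioned together.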

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Airspace V - international hangar flying!
www.airspace-v.com/ggadgets for tools & toys

hippy · 2009-02-07 01:53

Discussions on a major revamp of the Cog architecture are probably moot ( and that's putting it mildly ), and most have already been discussed before. At best I think we can expect additional instructions ( and perhaps some hardware support ) to enhance what there is rather than a change to the architecture which exists.

I too believed the Cog RAM needed extending but having used LMM that is IMO a suitable alternative for most cases although it does mean a reduction of execution speed and can shift the issue to being a shortage of Hub RAM.

The future Prop II will reduce speed and Hub RAM issues. If it comes with some thumb-style 16-bit 'PASM' instruction support for speeding up LMM that will be the icing on the cake.

tellurian · 2009-02-07 02:29

hippy said...
Discussions on a major revamp of the Cog architecture are probably moot ( and that's putting it mildly ), and most have already been discussed before. At best I think we can expect additional instructions ( and perhaps some hardware support ) to enhance what there is rather than a change to the architecture which exists.

I too believed the Cog RAM needed extending but having used LMM that is IMO a suitable alternative for most cases although it does mean a reduction of execution speed and can shift the issue to being a shortage of Hub RAM.

The future Prop II will reduce speed and Hub RAM issues. If it comes with some thumb-style 16-bit 'PASM' instruction support for speeding up LMM that will be the icing on the cake.

You are right, significant architectural changes are not going to happen, and with good reason. Doubling the size of COG ram would have been awesome, but I now see why that is not so simple given the machine addressing. LMM it will have to be, suitable or not that is all there is. I suppose I really should learn PASM then, but it is just so tedious and persnickety compared to C. I must say though that SPIN is really great, surprisingly flexible and comprehensive. I just bought ICCV7 and can see that I will be using all three languages. Some real hardware peripherals (UART, I2C, SPI, ADC ) would propagate the Propeller into design discussions very fast. I know a number of engineers whose main beefs were 1) no C and 2) no peripherals. So many engineers snub their noses at software UART or I2C implementations. Me too (at first). C has been taken care of now, and with some basic peripherals on board I think the Prop would become a much easier sell at design time, goodbye RTOS for many many applications.

Cluso99 · 2009-02-07 04:26

You really should read the old thread by Chip. One of his later ideas was to put an extra 128 longs into each cog which could be used as registers/stack/FIFO but not as instructions. I therefore presume there will be extra instructions to access this extra memory. I think this is a great trade-off. As Hippy and others have stated, the cog architecture is limited to 512 longs and that will not change.

PASM is fairly simple to learn. You should give it a try with a simple program like flashing LEDs, then progress to using the FullDuplexSerial object, then the VGA object. The instructions are so regular and you only need to learn a few initially.

Now for the peripherals. That is what the cogs are for. They each have very powerful counters/registers which are used for video and PWM. However, I think UART, I2C and SPI silicon peripherals may be nice to save cogs in the future. I think the cogs are going to be the next scarce resource on the Prop II, as will be I/O pins. An external memory interface (SDRAM or SRAM) may be a nice addition also.

Anyway, we shall just have to wait and see what Chip has in store for us with the Prop II.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:

· Home of the MultiBladeProps (SixBladeProp)
· Prop Tools under Development or Completed (Index)
· Emulators (Micros eg Altair, and Terminals eg VT100) - index
· Search the Propeller forums (via Google)

My cruising website is: www.bluemagic.biz

mctrivia · 2009-02-07 04:40

Also, unfortunately, no set date for the Prop 2 release :(

kwinn · 2009-02-07 04:41

I agree with tellurian that extra cog RAM would be nice, and it could fit in without architectural changes by using a cog version of rdbyte/rdword/rdlong. The first place I could see that being used is in the video drivers, and it sure wouldn't hurt for bit pattern generation.

Mike Green · 2009-02-07 04:45

To give you an example of what can be done with just one cog, look at the 4-port UART driver. This is a serial I/O driver that can handle up to 4 ports, each with optional hardware flow control. The maximum speed of the ports depends on how many there are. Rates range from about 750KBaud with one port to about 100KBaud with four ports. I've seen a combined PS/2 keyboard and mouse driver. The low level driver for I2C and SD card SPI that's in FemtoBasic and its derivatives fits all in one cog and the I2C portion should work with any sort of I2C device. The SPI portion is designed specifically for SD cards and won't really work with other kinds of SPI devices, but there's so much variation with SPI, that it's unreasonable to expect any universality. A TV driver will fit in one cog and a decent resolution VGA driver needs two cogs. That's only 4-5 cogs for a fairly wide range of devices.

If you add the high speed floating point, that would take another 1-2 cogs depending on the features you need. With 1 cog for the main program, you're using at most 8 cogs, 7 if you're using composite video output rather than VGA.
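The tally above can be checked with a few lines. The per-driver cog counts are the ones quoted in this post; treating the combined PS/2 keyboard and mouse driver as a single cog is my assumption, and the table and function names are made up for the sketch.

```python
TOTAL_COGS = 8  # the Propeller 1 has eight cogs

# (minimum cogs, maximum cogs) per item, as quoted in the post above.
DEVICES = {
    "4-port UART": (1, 1),
    "PS/2 keyboard + mouse": (1, 1),  # assumed to fit one cog (combined driver)
    "I2C + SD card SPI": (1, 1),
    "video (TV = 1, VGA = 2)": (1, 2),
}
EXTRAS = {
    "floating point": (1, 2),
    "main program": (1, 1),
}


def cog_budget(items):
    """Return (fewest, most) cogs needed for the given items."""
    fewest = sum(lo for lo, _ in items.values())
    most = sum(hi for _, hi in items.values())
    return fewest, most


if __name__ == "__main__":
    print("devices only:", cog_budget(DEVICES))
    print("everything:", cog_budget({**DEVICES, **EXTRAS}))
    print("cogs available:", TOTAL_COGS)
```

The worst case (VGA plus the full floating point package) lands exactly on eight cogs, which is also why cogs themselves can become the scarce resource.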

Post Edited (Mike Green) : 2/7/2009 6:46:53 AM GMT

mctrivia · 2009-02-07 05:57

Yes, it is amazing what you can fit into a cog. Where you run into problems is when you need a large, fast memory area for computing data in. Code space is often not the problem (at least for me); it is memory space that has gotten me jumping through hoops a few times.
