What would you want more of, cogs or RAM?

MarkS · 2007-08-31 20:13

Ken Peterson said...
or...better yet...

20 pins address + 8 pins data + 4 pins handshake. That allows addressing a megabyte of RAM. Imagine being able to use RDBYTE and WRBYTE instructions to access a MB of RAM! I'd trade 32 out of 64 I/O pins for that!

EXACTLY! I'd make that trade as well! Maybe Prop 3?
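
A quick sanity check of the pin arithmetic in the quoted proposal, as a small Python sketch. The function name is made up for illustration; only the pin counts come from the post.

```python
# Pin budget from the quoted proposal: 20 address + 8 data + 4 handshake.
ADDRESS_PINS = 20
DATA_PINS = 8
HANDSHAKE_PINS = 4


def addressable_bytes(address_pins: int) -> int:
    """Bytes reachable on a byte-wide bus with this many address lines."""
    return 2 ** address_pins


total_pins = ADDRESS_PINS + DATA_PINS + HANDSHAKE_PINS
print(total_pins)                        # pins given up for the bus
print(addressable_bytes(ADDRESS_PINS))   # bytes of external RAM reachable
```

So the trade really is 32 pins for 1,048,576 bytes (1 MB) of byte-addressable RAM.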

Ken Peterson · 2007-08-31 20:31

The question remains: whether there are enough applications to which the Propeller is best suited that need fast access to more than 256K of RAM. The only ones I can think of involve audio processing and graphics. I know 32K fills up pretty quick with what I've been working on. There exist many DSPs that are better for audio processing than the Prop would be, if that's what you need.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

The more I know, the more I know I don't know. Is this what they call Wisdom?

deSilva · 2007-08-31 20:32

.. and still OT: www.axis.com/products/dev_etrax_100lx_mcm/index.htm This "chip" (around $30) sports 16 MB (yes MEGA bytes) INTERNAL RAM. Of course it is a hybrid, and the RAM is SDRAM, but nevertheless...

Post Edited (deSilva) : 9/1/2007 8:44:42 AM GMT

Fred Hawkins · 2007-08-31 22:13

deSilva,
I was thinking "bet that can cross the street and chew gum too" until this:

Timers and Watchdog
'• Two eight-bit timers with programmable clock from 381 Hz to 12.5 MHz. Fixed timer clocks from 300 Hz to 6.25 MHz. Additional watchdog timer'

So, it starts to look like a half-Cog chip with great on-chip peripherals.

It may be worth it to view it as an omnibus peripheral for a prop --
flash, usb, master/slave stuff, even decent ram. Wonder what its development platform looks like?

Fred

deSilva · 2007-08-31 22:26

This is the drawback with "specific silicon": you have to compromise all the way. A COG is more or less "unlimited", as far as 490 instructions go...
The (peak!) performance of the Axis CPU is advertised as 100 MIPS, which is not 1/2 COG, but 5.

What I find interesting is the "all in one package" concept, as implemented with the Stamps for years. The cost for internal links is considerably lower than for external pins.

Fred Hawkins · 2007-08-31 22:50

just kidding. I knocked them down on account of their timers' top advertised speeds. Where did you get a $30 price (insert deja vu here)? And got a figure on their devboard? US distributor (ctek) ain't saying. (Visions of a prop proto board floated on top, whipping this slave into shape.)

deSilva · 2007-08-31 23:01

A price for the "Fox Board" is here www.acmesystems.it/?id=4 saying 129 Euro = around 160 $

The $30 per chip figure was passed on by a friend; the references are in fact unclear. I think it is most likely somewhat higher... (around $40 to $50?)

Ken Peterson · 2007-09-01 03:21

volume pricing?

Paul Baker · 2007-09-01 09:28

Hey guys, this thread is already monstrously long; can we not start talking about other processors too?

-Thanks

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Paul Baker
Propeller Applications Engineer

Parallax, Inc.

Post Edited (Paul Baker (Parallax)) : 9/1/2007 9:34:05 AM GMT

Franz Achatz · 2007-09-01 10:14

Paul,

maybe about deSilva's nice FoxBoard

Franz

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Please visit my online Propeller at:

http://achatz.xs4all.nl

Ken Peterson · 2007-09-01 13:22

I agree with Paul. This thread was intended to discuss the virtues of more RAM vs. more cogs. I think the horse has been beaten beyond all recognition.

javer · 2007-09-01 19:46

First, I must apologize: my English is bad.
I see there are many suggestions about a new Prop. I have some ideas too :).

1. More HUBs: I think it is possible to organize COGs into small groups,
each with 1 small HUB and 1 global RAM block. For example, a Prop with 16 COGs can
be organized as 4 smaller Props, each with 4 COGs, 1 HUB and 1 RAM block. All 4 groups
are then connected through 1 'Master' HUB. Compared to 16 COGs with 1 HUB,
access from a COG to global RAM in the same group will be 4 times faster, but
access to global RAM in other groups will be slower.

2. I think more local RAM (banked or non-banked) would be excellent.

3. The HUB can be organized as a FIFO (I think something similar has already been suggested).
In this case, the HUB will service only the COGs that request access, not all of them. If all COGs
need access to the HUB, it will work like the current round-robin method.

4. I think we really need more different peripherals. Some of them are slow, and
some need to be fast (slow and fast COG access). All slow devices can be mapped
into the global RAM space. Fast devices can be linked directly to COG(s), with a global
register holding an 'assign' parameter (which COG is assigned to this device). Another
option for accessing (fast) devices is similar to the port connections (all lines are
OR-ed).

5. What about a single interrupt with a simple fixed-priority interrupt controller? It may be
hard to implement for various reasons, but it could be very useful.

6. Why do we need a video controller in every COG? Maybe one COG, one peripheral?
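
To put rough numbers on points 1 and 3, here is a toy Python model of how often a cog gets its hub turn. It assumes a strict round-robin hub with one slot per cog; the function names and the one-clock slot length are illustrative assumptions, not anything from a Parallax design.

```python
def hub_turn_interval(cogs_on_hub: int, clocks_per_slot: int = 1) -> int:
    """Clocks between successive hub turns for one cog (strict round-robin)."""
    return cogs_on_hub * clocks_per_slot


def on_demand_interval(requesting_cogs: int, clocks_per_slot: int = 1) -> int:
    """Same, for a hub that only hands slots to cogs that asked (point 3)."""
    return max(1, requesting_cogs) * clocks_per_slot


flat = hub_turn_interval(16)       # one hub shared by all 16 cogs
grouped = hub_turn_interval(4)     # one of four group hubs, 4 cogs each
print(flat // grouped)             # speed-up for in-group access

print(on_demand_interval(2))       # only 2 of 16 cogs want the hub
print(on_demand_interval(16))      # all 16 want it: same as round-robin
```

The on-demand variant gives up the fixed timing of the round-robin scheme, which is the determinism trade-off discussed elsewhere in this thread.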

Javer

potatohead · 2007-09-01 21:06

IMHO, the RAM issue is only significant with the current Prop. 32 KB is just tight, requiring lots of juggling. Prop II will have 256 KB. This is roomy enough, combined with the speed boost, to incorporate robust paging schemes, while still having plenty of local RAM available for buffering. This really is just a scale issue, with the current scale being barely enough for a lot of tasks that are well within the limits of the Prop otherwise.

At the 256K, there is gonna be room for paging, graphics, sounds, i/o, etc... More code size / speed trade offs will be easier to manage. Having 16 COGs also will contribute to this. It's gonna be no big deal to have a few of them handling specific tasks, leaving plenty of them for general computing. That's all going to be far more granular than it is right now.

Re: Video controller in every COG. (I thought this too, when I first started working with the thing)

Symmetry is key to the Propeller. The only COG that is distinctive, with the current design, is COG 0, and only because it's the startup COG. From there, a COG is a COG.

The video controller does not have to output video, for example. It can be used for a bunch of things, so it makes sense to have more than one of them. Also, should one need multiple displays, it's on chip, meaning the programmer gets to make software choices, not hardware ones. The video system, TV specific functions aside, can serialize lots of things, quick and easy. It's a lot easier and cheaper to ignore something, than it is to kludge around not having it when needed, or add extra goodies on for the same reason.

IMHO, it's much better to scale the symmetry and determinism than it is to break it, for functional gains that would be realized either way.

eg:

let's say we've got one video cog, one sound cog, one i/o cog, one memory paging cog, and the rest being general purpose compute cogs, perhaps with more COG memory, instead of hardware functions. The trade off being some additional RAM, for sake of discussion.

This would dedicate some aspects of the chip, and would likely increase performance, or maybe just make things easier right? IMHO, for a fair number of defined applications, I think this would be somewhat true. Had this been done on the current Prop, video would have been limited well below what we've seen done today. And that's not over yet. Still lots of ways to apply the hardware remain. Had that been done, we might see some tasks easier on Prop I, but what happens when it scales?

Now, it's got less overall potential. The kludges scale too, adding complexity at the expense of utility. If the scaling is done, preserving the symmetry and determinism, those tasks become possible, along with a lot of new ones, while complexity remains lower --and the need for add-ons is actually reduced, in a lot of cases.

What happens when one's project scope exceeds the pre-planned functionality? Say two sound cogs are needed, or maybe two displays? Perhaps more RAM is necessary, or a project needs lots of I/O, frequency generation, etc...

With only some COGs able to do some things, one then must kludge to make one of the lesser COGs perform the task anyway, likely meaning additional hardware, or code complexity necessary to enable the task, that would have otherwise been applied to the task itself, instead of the meta-task.

The trade off here (and I find it a very interesting one) is some additional thought being required to achieve the task in the first place, as opposed to having more dedicated, higher level functionality on the chip. The return is a very broad scope of possible tasks, with the only real expense being more thought required to realize them. Given code can be shared far more easily than hardware can, one gets to build on the work of others, without having to acquire / build lots of stuff to do so.

Essentially, if one has a Prop to work with, not much else is needed to do a lot of things, given some software effort. This is not true for a very large segment of other devices. Think about re-purposing things too. Any Prop runs Prop code. Since many tasks do not require sophisticated hardware, having a Prop available means being able to use it, for the task at hand, is highly likely. That's a very serious longer term value add, IMHO. This means skill learned on the Prop can be applied a very high percentage of the time, for a growing number of tasks, not really limited very much by the Prop itself. Dedicated devices do not share this property, so one's time is then spent on meta-skills, largely needed to combine discrete solutions instead of just building the solution.

It's all very lean. (And I'm referring to manufacturing lean. Good stuff, and the current focus for those building lots of things.)

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Wiki: Share the coolness!

Skogsgurra · 2007-09-01 21:50

I think Potatohead's post should be the last post in this thread.

He summarises what I think most of us feel about the Propeller architecture and the applications. I, for one, am very happy with the lack of any dedicated peripheral and the availability of seven general-purpose ones. That also can do service as processors - if you prefer to look at it that way.

The learning path is quick. Once you master the passing of data to and fro hub and cogs, the rest is just a matter of practising and applying the immense and growing number of ready-to-use-at-no-cost-very-nice-objects. That makes it a very real proposition to use one base design, I have one specialised for special instrumentation, for a very broad class of applications.

There are certainly many that feel that they need special peripherals. But, from what I have learned so far, very many of them can be written in assembly and put in a cog. I can not see how one could make a one-size-fits-all selection of peripherals to incorporate in a future Propeller. To those who are not satisfied with the Propeller philosophy, I can only say: there are other chips. But only one Chip - let him decide!

paulmac · 2007-10-09 01:26

Despite my best efforts I have not seen all of this thread, so I may have missed this being discussed.

I'm very green when it comes to uControllers but here's my thoughts on one of the side issues. I've been sitting on this idea for a while because of my inexperience but here goes.

Would it be possible to have the hub execute custom code instead of the default? This would allow changing the timing of hub RAM access and possibly provide better support for inter-cog transfers with custom routines and who knows what else.
It would require no changes to the cog instruction set as the instructions dealing with the default hub routines would execute custom code on the hub instead.

If no custom code is specified at runtime, the default is run, with timings etc. as they are now. That keeps compatibility with current cog code.

Dunno how you would get the custom code to the hub in the first place tho.

Skogsgurra, I do agree with you though. Parallax are the author, and I like what they've written so far.

Post Edited (paulmac) : 10/9/2007 1:38:44 AM GMT

Mike Green · 2007-10-09 01:49

The HUB doesn't execute instructions at all. The COGs are the only processors on the chip. The HUB logic is strictly a synchronization / access coordination and system clock controller. What looks like HUB instructions is the result of a COG program (the Spin interpreter) that's in ROM so it's always available.

LewisD · 2007-10-09 05:00

Chip Gracey (Parallax) said...

Mike Green said...

It's always refreshing to hear that there really are people who plan ahead several years, let alone five years or more for a product line as important as the Propeller is to Parallax.

One small technical question about what you've said ... What's the 8-contiguous-long accesses per hub read/write?

With 16 cogs and a full-speed hub (not half-speed, like we have now), each cog will get hub access once every 16 clocks (16 instructions). To increase the main memory bandwidth, we will likely put 8 special long registers into each cog at maybe $1E8-$1EF which are read/write data conduits. This way, in your hub turn, you can read or write all 8 of these longs if you want. This will help large-model code and video apps, too. You can 'breathe' 8 times the main memory than a RDLONG/WRLONG instruction would allow.
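
A back-of-the-envelope version of the "8 times" claim, as a Python sketch. The clock rate is a made-up placeholder (no Prop II clock figure is given in this thread); only the 16-clock hub turn and the 8-long window come from the quoted post.

```python
# Assumption for illustration only: this clock rate is a placeholder,
# not a published figure for the next chip.
CLOCK_HZ = 160_000_000
CLOCKS_PER_HUB_TURN = 16    # from the quoted post: one turn every 16 clocks
BYTES_PER_LONG = 4


def hub_bytes_per_second(longs_per_turn: int) -> int:
    """Per-cog hub bandwidth if every turn moves this many longs."""
    turns_per_second = CLOCK_HZ // CLOCKS_PER_HUB_TURN
    return turns_per_second * longs_per_turn * BYTES_PER_LONG


single = hub_bytes_per_second(1)   # one RDLONG/WRLONG per turn
burst = hub_bytes_per_second(8)    # all 8 conduit longs per turn
print(single, burst, burst // single)
```

Whatever the real clock turns out to be, the ratio between the two cases stays at 8.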

The TurboProp has been filling my thoughts as I drift off to sleep.

Are there any updates on its progress or any more details of its design?

As I lay imagining all of the neat things I want to do with TurboProp and writing programs for it in my mind (the integrated IDE is awesome!!!), I was mulling over how to take full advantage of the new way to read and write to the hub with up to 8 longs at a time.
It will be a huge advantage in moving data in and out of the TurboProp, but I was thinking it could be used to swap in code from the Hub and execute it in place. I imagine this would be very useful in 'the large memory model' and in other languages like Forth.

$1F0 = the location of the PAR register

I have run into one problem with this model: after executing the 8 new longs I will be at location $1F0. Is there a way to load a JUMP instruction into $1F0 and have it execute properly? I know that this is mostly hypothetical, but loading the JUMP instruction at $1F0 and executing it might also work on the current Prop.

What would certainly fix my imaginary program is having the 8 longs load in at $1E0-$1E7 instead of $1E8-$1EF. This would give me 8 additional longs of program space

OR

What would be even cooler is a way to change the address where the 8 longs are written to or read from.

I realize that much of this is Hypothetical but I thought I would share my musings and I hope encourage others to share theirs....

LewisD

Post Edited (LewisD) : 10/9/2007 5:08:32 AM GMT

cgracey · 2007-10-09 07:14

Cog RAM is read directly for instructions, not special 'source' registers such as PAR, CNT, etc. So, you could write a JMP instruction into the PAR register ($1F0) and have it execute, not having to consider what PAR would return as a 'source'. In other words, the large-model would work fine by reading eight contiguous hub longs into $1E8-$1EF and then JMPing to $1E8, and then having a JMP back execute from RAM (not PAR) at $1F0.
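
The distinction drawn above can be sketched as a toy Python model: writes and instruction fetches go to cog RAM, while a read as a 'source' operand goes through a mux to the live value. The class, its method names and the JMP_BACK placeholder are all made up for illustration; this is not a real instruction encoding or a description of the actual silicon.

```python
class CogModel:
    """Toy cog: 512 longs of RAM plus live values seen only by source reads."""

    PAR = 0x1F0

    def __init__(self, par_value: int):
        self.ram = [0] * 512
        # What the mux returns when the register is read as a source.
        self.live = {self.PAR: par_value}

    def write(self, addr: int, value: int) -> None:
        self.ram[addr] = value              # writes always land in RAM

    def read_as_source(self, addr: int) -> int:
        return self.live.get(addr, self.ram[addr])

    def fetch_instruction(self, addr: int) -> int:
        return self.ram[addr]               # fetch bypasses the mux


JMP_BACK = 0xDEADBEEF                       # placeholder, not a real opcode
cog = CogModel(par_value=0x1234)
cog.write(CogModel.PAR, JMP_BACK)

print(hex(cog.fetch_instruction(CogModel.PAR)))   # the stored "JMP"
print(hex(cog.read_as_source(CogModel.PAR)))      # still the live PAR value
```

In this model the long stored at $1F0 executes as written, while a source read of PAR is unaffected.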

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Chip Gracey
Parallax, Inc.

LewisD · 2007-10-09 07:44

Awesome!!!

You ROCK Chip!!!

My programming dreams continue with your help.

Thanks

LewisD

PS. Chip, how many of the registers are like that? (All?)

Would the code look like this to stuff $1f0?

mov $1f0,stuff
...
...
stuff jmp next-page

Post Edited (LewisD) : 10/9/2007 7:58:36 AM GMT

cgracey · 2007-10-09 08:35

LewisD said...

Chip how many of the registers are like that?(all?)

Yes, all. Some registers like PAR, CNT, INA, etc. have special muxes for reading live states when accessed as source registers. All instructions and destination registers are read from RAM, though, not muxes.

Would the code look like this to stuff $1f0?

mov $1f0,stuff
...
...
stuff jmp next-page

That's it.

paulmac · 2007-10-09 08:47

Mike, I am aware that the current Prop can do nothing like I propose.
Seeing as very few of us know what the hub actually looks like in the flesh, I was speculating that it may be more like a cog dedicated to the things that the hub does than a hardwired whatever else it might be.
My thoughts were also in the context of the thread, the Prop II.

Cheers.

LewisD · 2007-10-09 09:28

Chip,

When a COG is loaded using COGNEW or COGINIT does it load all 512 longs?

So if I had a DAT of 512 longs, it would populate $1f0-$1ff and those locations would execute code?

LewisD

Post Edited (LewisD) : 10/9/2007 9:33:11 AM GMT

evanh · 2007-10-09 10:14

paulmac said...
I was speculating that it may be more like a cog dedicated to the things that the hub does than a hardwired whatever else it might be.

The Hub doesn't execute anything. It will be a state machine that sequences through all the Cogs allowing a read or write access to Hub memory one Cog at a time.
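
As a rough picture of "sequencing through all the Cogs", here is a minimal Python sketch of a fixed round-robin schedule. It assumes 8 cogs and one slot per cog in order; the function name is invented for illustration and nothing here describes the real hub logic.

```python
from itertools import cycle, islice


def hub_schedule(num_cogs: int = 8):
    """Endless sequence of cog numbers, one hub slot each, in fixed order."""
    return cycle(range(num_cogs))


first_slots = list(islice(hub_schedule(8), 10))
print(first_slots)
```

Nothing is fetched or decoded here: the "program" is just a counter that wraps, which is the difference between a state machine and a processor.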

Take boot time for example where Cog 0 must be loaded with some code to execute. If I've picked up the correct idea here, it's something like a hardwired COGINIT (0,$FE10,0) occurs in Cog 0 that then loads the top 496 words from Hub ROM into Cog 0 RAM and then begins executing it.

evanh · 2007-10-09 10:18

As another example: The Hub doesn't access any I/O. Like when the serial EEPROMs are loaded into Hub RAM, the Hub itself doesn't do that. It will be a Cog program that bit twiddles the I/O pins grabbing each bit one at a time then feeding the complete words into Hub RAM.

Ken Peterson · 2007-10-09 12:07

If the HUB doesn't execute anything, then what executes the loading of the cog RAM? Does the cog do that? Doesn't seem like it could, because it would be overwriting its own code.

Graham Stabler · 2007-10-09 12:15

Executing code and performing a hardwired task are not the same thing. The hub may contain circuitry to stick the boot program into cog ram but that doesn't mean it is running a program like a cog.

Graham

Ken Peterson · 2007-10-09 12:18

OK...makes sense. That goes beyond what Mike said the Hub is able to do. It must also have hard-wired logic for copying memory.

Graham Stabler · 2007-10-09 12:32

It may be neither hub nor cog, it depends how those are defined. Or I might be wrong :)

Mike Green · 2007-10-09 14:28

We're into a grey area of "who does what to whom?". The COGNEW/COGINIT instruction is executed by a COG's processor, but it's a HUB instruction, so the HUB logic gets into the act. The HUB may pick which COG to use (like on a COGNEW). The HUB forces that COG into a special mode, kind of like a "boot" instruction. The COG's logic clears its registers and performs the memory copy, much like a multiple RDLONG, incrementing the source and destination each cycle. When the destination address rolls over, the COG switches back to normal mode and fetches the first instruction from location zero.

During a power-on reset, there's special logic to force the reset bit of the CLK register to one. That causes the HUB logic to force a COGSTOP for all COGs, then an internal COGNEW with the address of the boot loader in ROM. That forces COG zero to load and execute the boot loader from HUB ROM.

deSilva · 2007-10-09 17:09

LewisD said...
When a COG is loaded using COGNEW or COGINIT does it load all 512 longs?

Yes and no: It reads 512 longs, but substitutes the last 16 with zeroes.

Some of those last cells (PAR, CNT, INA, INB, PHSA, PHSB) are wired to specific data when read via the "src" operand. They behave differently when read via "dest" or as an instruction: You will read what you have written to this "shadow cell" with a COG instruction.

There are many threads discussing this concept - best search for "shadow"
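
The load behaviour described in this post (512 longs read, the last 16 replaced with zeroes) can be sketched in Python like this. It is a toy model under that description only; the function name and the list-of-ints hub image are illustrative assumptions.

```python
COG_LONGS = 512
SPECIAL_LONGS = 16          # the cells at $1F0-$1FF


def load_cog(hub_longs: list, start: int) -> list:
    """Copy 512 longs from the hub image, then zero the last 16 cells."""
    image = list(hub_longs[start:start + COG_LONGS])
    image[COG_LONGS - SPECIAL_LONGS:] = [0] * SPECIAL_LONGS
    return image


hub = list(range(1, 1025))  # dummy hub contents: 1, 2, 3, ...
cog_ram = load_cog(hub, 0)

print(len(cog_ram))         # size of the cog image
print(cog_ram[0x1EF])       # last long that survives the load
print(cog_ram[0x1F0:])      # the cells from $1F0 up
```

So a 512-long DAT block would not arrive with code in $1F0-$1FF; anything meant to execute there has to be written by the cog itself after it starts.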

Post Edited (deSilva) : 10/9/2007 6:37:04 PM GMT
