What would you want more of, cogs or RAM?

Harley · 2007-02-05 16:38

And, on the other hand, I have an application where 64 I/O is nearly twice as
many as needed, but 8 cogs would be only about half the number required. It would be silly
to have to use TWO Gen 2 Props just for a few more cogs!

To implement this design at present, I need 3 Gen 1 Props to get all the I/O
and cogs required for the full task. So I feel that 64 I/O with only 8 cogs is the
wrong direction to go. I don't presently expect the speed penalty of 16
cogs to be a problem for this application.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Harley Shanko
h.a.s. designn

Larry · 2007-02-06 01:51

Here's a Hare-brained scheme.

Change the chip so you can stack it (literally, with the DIP version) so that the hub knows when it's piggybacked, and have the hubs cooperate to provide extensibility with additional chips.


Dennis Ferron · 2007-02-06 03:47

Hmmm... I bet with some clever software you could do this with the current Prop....

Alex Stahl · 2007-02-06 07:11

I'd opt for the 16-cog/128 KB version. I also like the 1/4, 1/8, 1/16 hub bandwidth concept.

thank you,
Alex

Gadgetman · 2007-02-06 10:29

Larry said...
Here's a Hare-brained scheme.

Change the chip so you can stack it (literally, with the DIP version) so that the hub knows when it's piggybacked, and have the hubs cooperate to provide extensibility with additional chips.

You're right, it's a hare-brained scheme. :-)

For that to work, it would have to expose internal connections. A whole lot of connections, in fact.
32-bit-wide data bus, HUB timing controls, I/O pins...
(The I/O-pins can be set/reset by any COG at any moment without waiting for a HUB-slot. There's also the input/output flag register to keep track of... That can't properly be replicated by using external connections.)

Not that there's anything wrong with piggybacking ICs sometimes.
In one computer I own the builders stacked the RAM chips four high, and soldered all but the CS-pins together.
(The CS-pins were connected to the correct output of a 3-to-8 decoder with insulated wires)
Saved both space and a whole lot of drilling...

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Don't visit my new website...

Ym2413a · 2007-02-06 17:10

Here is what I want!

I want the next Prop to have a bigger, better video generator!
Programmable 2-bit saturation! (Grey Scale, Muted Shade, Normal Shade, Vivid Shade)
And maybe a 6-bit luma. *smiles*

I can dream right?

Funnily enough... I know the current Prop can already do this, just not from one COG.
(I've been working on code to generate more NTSC colors and it works! It takes more than one COG, though.)

But then again, if we had more COGs, I guess wasting two on video signal generation wouldn't be a problem. ^^

asterick · 2007-02-06 17:34

Ym2413a said...
Here is what I want!

I want the next Prop to have a bigger, better video generator!
Programmable 2-bit saturation! (Grey Scale, Muted Shade, Normal Shade, Vivid Shade)
And maybe a 6-bit luma. *smiles*

I can dream right?

Funnily enough... I know the current Prop can already do this, just not from one COG.
(I've been working on code to generate more NTSC colors and it works! It takes more than one COG, though.)

But then again, if we had more COGs, I guess wasting two on video signal generation wouldn't be a problem. ^^

Nah, just get creative! Use the 2-color mode, VGA output, and just manually phase-shift.

Using that method, you could actually get 32 hues. You would have to pump out the pixels fast though. :)
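One way to read "manually phase-shift": draw the color as a two-color square wave at the NTSC subcarrier frequency and move its edges one pixel at a time. If one subcarrier cycle spans N pixels, each one-pixel shift moves the phase (and so the hue) by 360/N degrees, giving N hues, which needs a pixel clock of N times the subcarrier. This is an interpretation for illustration, not asterick's actual engine; the Python sketch below only does that arithmetic, and the function name is made up.

```python
NTSC_SUBCARRIER_HZ = 315e6 / 88  # NTSC color subcarrier, about 3.579545 MHz


def pixel_clock_for_hues(hues: int) -> float:
    """Pixel clock needed if one subcarrier cycle spans `hues` pixels,
    so a one-pixel shift of the square wave moves the phase by 360/hues degrees."""
    return hues * NTSC_SUBCARRIER_HZ


for hues in (16, 32):
    print(f"{hues} hues: {pixel_clock_for_hues(hues) / 1e6:.2f} MHz pixel clock, "
          f"{360 / hues:.2f} degrees per step")
# 16 hues: 57.27 MHz pixel clock, 22.50 degrees per step
# 32 hues: 114.55 MHz pixel clock, 11.25 degrees per step
```

That is the "pump out the pixels fast" part: doubling the hue count doubles the required pixel rate.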

hinv · 2007-02-06 18:13

Chip once brought up the option of having 1.8-volt I/Os. He said they could have faster transitions.
Does anyone know how much faster? I am thinking of Prop-to-Prop communication.

Ym2413a · 2007-02-06 18:16

Yeah!! 32 shades would be cool!

I feel 16 shades is a lot compared to the other limits in the current design. The human eye doesn't resolve much detail in chroma; we see most of the detail in luma.
Saturation output on the Prop is only 1 bit (on/off) right now. By making it 2 or 3 bits, the number of colors goes way up!
Vivid and muted tones together would be great and pleasing on the eyes!
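To put rough numbers on "the number of colors goes way up", here is a back-of-the-envelope count in Python. It assumes 16 hues, treats the lowest saturation level as grey (so hue is irrelevant there), and counts every combination as displayable; in practice some combinations are likely to exceed the DAC's range or look alike, so these are upper bounds, not usable-color counts. The function name is just for illustration.

```python
def palette_size(hues: int, sat_levels: int, luma_levels: int) -> int:
    """Upper bound on distinct colors from a hue/saturation/luma encoding.

    Assumes the lowest saturation level is grey (hue has no effect there)
    and that every combination is displayable.
    """
    greys = luma_levels                              # one grey per luma level
    colors = hues * (sat_levels - 1) * luma_levels   # each hue at each non-grey saturation
    return greys + colors


# Roughly today's scheme: 16 hues, saturation on/off (1 bit), 3-bit luma.
print(palette_size(hues=16, sat_levels=2, luma_levels=8))    # 136
# The wish list: 2-bit saturation and 6-bit luma.
print(palette_size(hues=16, sat_levels=4, luma_levels=64))   # 3136
```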

I've already written my own two-cog video output engine with syncing. It displays an extra 32 colors on the current 3-bit DAC in NTSC! (134 colors total: 86 + 16 + 32)
I wanna try a bigger DAC next and get crazy with this idea. ^^

asterick · 2007-02-06 19:17

Hehehe. I'm working on a video engine right now. Two-cog design also. ;) I only use the standard video palette though (mostly due to timing and space limitations). I might try something crazy and see if I can get a wider palette.

Larry · 2007-02-06 20:55

Gadgetman said...

You're right, it's a hare-brained scheme.

For that to work, it would have to expose internal connections. A whole lot of connections, in fact.
32-bit-wide data bus, HUB timing controls, I/O pins...
(The I/O-pins can be set/reset by any COG at any moment without waiting for a HUB-slot. There's also the input/output flag register to keep track of... That can't properly be replicated by using external connections.)

See?... I TOLD you.

I guess I wanted to introduce the idea of modularity/extensibility: that you could take a couple of Props and stick 'em together to produce some of the many configurations folks seem to want, without having to design several different editions in silicon.

Hey- people laughed at the concept of the integrated circuit when I was in college.


hinv · 2007-02-06 21:00

Hi Larry,

I think your idea is a good one, just not practical.....yet! I have thought of that myself.
There will come a day when die size is measured in cubic units rather than square units. Ever seen the Terminator series of movies? I think in T2 they showed 3D fabrication technology. Of course, that is just science fiction.

Beau Schwabe · 2007-02-06 22:05

hinv and Larry,

"There will come a day when die size is measured in cubic units rather than square units."

There is a process that stacks multiple silicon die to form what is called a flip-chip, but it's usually only two layers. When I was at National Semiconductor, there were a few chips that we did this way, but the cost of the extra processes involved was tremendous, and because of the extra fabrication steps, yield was very poor.

But you are right, it is only a matter of time... By then silicon will not be the medium of choice; it will probably be something right in front of us that no one has even considered yet.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Beau Schwabe

IC Layout Engineer
Parallax, Inc.

hinv · 2007-02-06 22:27

Hi Beau,

I thought the 'flip chip' was just the technology that flipped the chip over, putting the silicon-to-package connections on the bottom so that the whole "bottom" of the die could be mated to a heat sink. I didn't know that it involved 2 layers.
One thing that I just thought of is an etching technology for cooling-fluid paths right there in the silicon. I wonder if this has been done already. I know someone was working on a die-sized Stirling engine for cooling.

That brings up another question. Does the propeller use aluminum or copper interconnects?
Any idea why silver interconnects aren't used? It is a superior conductor of heat as well as electricity to both copper and aluminum.

Thanks,
Doug

Beau Schwabe · 2007-02-06 23:17

hinv,

"I thought the 'flip chip' was just the technology that flipped the chip over having the silicon to package connections on the bottom so that the whole "bottom" of the die could be mated to a heat sink. I didn't know that it involved 2 layers."
Right, the process that I am familiar with placed two die back-to-back. One of which was flipped.

"That brings up another question. Does the propeller use aluminum or copper interconnects?"
The interconnects are aluminum, separated by silicon dioxide dielectrics and connected with tungsten plugs.

"Any idea why silver interconnects aren't used?"
I'm not sure, other than the processes required to deposit silver, not to mention the cost of silver itself.
Also, it might have unfavorable metal-migration characteristics.

viskr · 2007-02-07 03:43

FYI-

Flip Chip has been used in the past for a variety of technologies to eliminate bonding, but so far bonding technology has stayed one step ahead of the alternatives.

It is very common to stack multiple chips, especially in cell phones. When I was doing this a couple of years back, 2-chip stacks were routine and some 3-chip stacks were being done.

But of course you are talking about huge volumes, a pilot production run was often 10K units.

don.seglio · 2007-02-07 19:33

I have read quite a few of the suggestions but not all so excuse me if this has been mentioned before.

I commented before on the 8/16 cog issue, in favor of 8 cogs with more memory, but there is a memory addressing issue. Some have suggested a 64-bit version, but in my opinion that is really going over the top, requiring a massive increase in logic in the chip. My proposal is a little more modest: what if you add one more byte to the width of the word? 40 bits would cause a modest increase in complexity, but it increases your addressing space by a factor of 256 (2^8).

Instead of being able to address 512 words you will be able to address 128K words with a modest increase in hardware. Eight additional I/O pins would come in handy.
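The arithmetic behind those numbers, as a quick Python check: each cog instruction names cog RAM through 9-bit source and destination fields, so 2^9 = 512 longs. Eight extra bits give 2^17 = 131,072 (128K) only if they all go to one field, a factor of 2^8 = 256; split evenly between source and destination, each field reaches 2^13 = 8,192. The even split is an assumption for comparison, not part of the proposal, and the names are illustrative.

```python
FIELD_BITS = 9  # width of the S and D fields in today's 32-bit cog instruction


def addressable_longs(field_bits: int) -> int:
    """Number of cog-RAM locations a field of this width can name."""
    return 2 ** field_bits


today = addressable_longs(FIELD_BITS)                # 512
one_field = addressable_longs(FIELD_BITS + 8)        # all 8 new bits in one field
per_field_split = addressable_longs(FIELD_BITS + 4)  # 4 new bits for each of S and D

print(today, one_field, one_field // today, per_field_split)
# 512 131072 256 8192
```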

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Don.Seglio

"Sacred Cows make the best Hamburger!" Don Seglio Batuna

SK · 2007-02-11 10:44

I vote for
* 16 COG/128KB HUB RAM
* 3.3V I/O
* 160 MHz

I would like to have more COGs because every COG can simulate a peripheral. So having more COGs will allow more complex designs.

I suggest a different hub bandwidth management scheme:

Standard Mode:
10_000_000 timeslots/sec for every COG.
COGs get HUB access in this order:
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15
After 16 steps it starts from the beginning.

New:
Unbalanced Hi-/Lo-Bandwidth Mode:
20_000_000 timeslots/sec for COG 0-6.
2_222_222 timeslots/sec for COG 7-15.
COGs get HUB access in this order:
0 1 2 3 4 5 6 7
0 1 2 3 4 5 6 8
0 1 2 3 4 5 6 9
0 1 2 3 4 5 6 10
0 1 2 3 4 5 6 11
0 1 2 3 4 5 6 12
0 1 2 3 4 5 6 13
0 1 2 3 4 5 6 14
0 1 2 3 4 5 6 15
After 72 steps it starts from the beginning.
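SK's slot figures can be checked by building both rotation tables and counting slots. This Python sketch infers from the quoted numbers (160 MHz / 16 = 10,000,000) that one hub slot is granted per system clock; that inference and the helper names are assumptions for illustration.

```python
from collections import Counter

CLOCK_HZ = 160_000_000  # proposed system clock; one hub slot per clock assumed


def standard_schedule():
    # Plain round robin: cogs 0..15, then repeat.
    return list(range(16))


def unbalanced_schedule():
    # Nine rows of eight slots: cogs 0-6 in every row, plus one of cogs 7-15.
    order = []
    for low_cog in range(7, 16):
        order.extend(range(7))
        order.append(low_cog)
    return order


def slots_per_second(order):
    # Each cog's share of the rotation, scaled to slots per second.
    counts = Counter(order)
    return {cog: CLOCK_HZ * n / len(order) for cog, n in sorted(counts.items())}


for name, order in (("standard", standard_schedule()),
                    ("unbalanced", unbalanced_schedule())):
    rates = slots_per_second(order)
    print(name, len(order), "slots per rotation")
    print("  cog 0: %.0f/s, cog 15: %.0f/s" % (rates[0], rates[15]))
```

Only one slot is granted per clock, so the total stays at 160 million slots/sec; the unbalanced table just redistributes them: 7 × 20,000,000 + 9 × 2,222,222 ≈ 160,000,000.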

The unbalanced mode will better support complex designs.

Robots:
********************************
Jobs for Hi-Bandwidth COGs:
*video input (perhaps from a cheap mobile phone VGA camera with serial interface)
*analyze the picture and track objects.
*intelligent path finding

Lo-Bandwidth COGs are good for:
*PWM DACs
*Serial Interfacing like RS232, I2C
*Sensor input etc.

Multimedia / Computing:
********************************
Jobs for Hi-Bandwidth COGs:
*Hires and multicolour video generation with cursor, sprites
*Multichannel sound generation
*Homecomputer Emulators (C64 SID, VIC)

Lo-Bandwidth COGs are good for:
*User input like keyboard and mouse
*Serial Interfacing like RS232, I2C
*RTC and low bandwith peripherals

I like the unbalanced mode more than the 1:4 or 1:8 mode, because all 16 COGs can be used together.

hinv · 2007-02-11 13:33

Now that is not a bad idea. It still doesn't give a cog 1/4 access for REAL high bandwidth tasks, but it doesn't waste the cogs that would otherwise be turned off.
Great Idea!

inservi · 2007-02-11 14:32

Hello,

Option 2: 8 cogs with 256KB of hub RAM. Hub access once every 8 clocks. + interrupt support

Good interrupt support can balance the trade-off between cogs and RAM.

That way, some light tasks can be done with a simple interrupt, and more demanding, memory-hungry tasks can use a cog.

dro

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
in medio virtus (virtue stands in the middle)

JoannaK · 2007-02-11 23:46

SK: your idea might work. Dividing cogs into (optionally) two different priority classes would be doable, even though it might cause some backward-compatibility problems.

OTOH, there are quite a few changes due to the cog/memory/I/O increase, so there may not be 100% compatibility with current code:
- The COGINIT opcode doesn't have bits left for more addresses/cogs, so it'll need a major rewrite.
- The pin-wait opcodes become sensitive to the carry flag (documented, but I'd expect some existing routines to fail on this).
- Length of instruction words (more cog memory would be nice from time to time).
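The COGINIT point can be checked with a quick bit count. This Python sketch assumes the destination-register layout described in the Propeller manual (a 14-bit PAR long address, a 14-bit code long address, a NEW flag bit and a 3-bit cog ID packed into one 32-bit long) and asks how wide each field would have to be for 16 cogs and 128 KB of hub RAM (RAM only, ignoring any hub ROM). The helper name is hypothetical.

```python
def coginit_field_bits(cogs: int, hub_bytes: int) -> dict:
    """Bits each COGINIT field would need, assuming today's layout:
    PAR long address, code long address, a NEW flag and a cog ID."""
    long_addr_bits = (hub_bytes // 4 - 1).bit_length()  # addresses are long-aligned
    cog_id_bits = (cogs - 1).bit_length()
    return {"par": long_addr_bits, "code": long_addr_bits,
            "new": 1, "cog_id": cog_id_bits}


current = coginit_field_bits(cogs=8, hub_bytes=64 * 1024)     # 64 KB hub map
proposed = coginit_field_bits(cogs=16, hub_bytes=128 * 1024)  # 16 cogs, 128 KB hub RAM

print("8 cogs / 64 KB:", sum(current.values()), "bits")     # 32 bits: exactly one long
print("16 cogs / 128 KB:", sum(proposed.values()), "bits")  # 35 bits: no longer fits
```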

All in all, IMHO this ain't about who can shout for the best system... I consider the current one an exciting product, and the next one is likely to be even better. Please, Parallax, don't make it over-bloated; one of the good ideas in the Propeller is that it is *different*... There's no point in making it just like the others (interrupts, for example).

JTC · 2007-02-13 03:54

Chip,
I believe more RAM would be better; the more the better, in fact.
Thanks
jim

boeboy · 2007-02-14 14:18

I personally would like some more RAM; I am pushing it on my projects when I use video and graphics.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
let's see what this does... KA BOOM (note to self: do not cross red and black)

RinksCustoms · 2007-02-15 07:58

I'm new, but I might give a different POV on this subject. In terms of architecture, I agree with creating two other versions of the Prop; bandwidth and speed do not appear to be synonymous.

Keeping it inside:

*Would 512MB RAM/512MB ROM, or a gig of each, be asking for too much? I would think a few memory options would keep the embedded system simple, fast, and of course compact, without the need for external memory.
*Perhaps on the same chip have say 12 cogs, 4 dedicated for video in high res.
*What says the silicon god on a double hub idea (something like a double pole rotary switch)?

Thinking outside the chip (no pun):

*Do I detect the need for a DDR200/333/400 RAM slot on the demo board?!
*and a PCI Express slot next to it?!
*Maybe store big programs/data on the ever popular memory cards we all have piles of lying around by now?

I know these ideas are outlandish, probably naive, and not even compatible technologies; just some imaginative brainstorming that may play a role in the next-gen PROpeller. Mock me if you will, but there is a freedom of speech amendment, u know.

I say take the original and try to double it in every respect; that should be enough power for ANY embedded solution. If you need anything more than that, maybe rethink the project!

4x/8x the memory
2x the cogs
2x the PLL
2x the I/O Bus (bus b?)
maybe include an easy way to "link" I/O lines between cogs (shortcutting the global RAM and hub), making for even faster parallel processing!

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Definitely an E³ (Electronics Engineer Extraordinaire!)
"I laugh in the face of impossible,... not because I know it all, ... but because I don't know well enough!"

QuattroRS4 · 2007-02-15 10:50

How about 256K hub RAM, a standard mode running 8 cogs, but an enable pin (or instruction) to bring in the 8 other cogs, with hub timing adjusted accordingly? Thus satisfying both?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
'Necessity is the mother of invention'

James Armstrong · 2007-02-15 13:10

I haven't voted yet, so my 2 cents would be as much RAM as possible with 16 cogs. I seem to be running up against the cog limit in an application I am doing, and anyone wanting to do robotics and the like might have the same problem: lots of sensors, motor control, LCD output, serial output and more, each of which requires a cog.

- James

Roger Levy · 2007-02-18 17:45

Perhaps XGameStation should think about looking at the SeaForth 24A or 24B chip, with 24 decentralized (no hub), concurrent, customizable cores. They're cheap, incredibly fast (they can actually replace most hardware such as network, USB, memory controllers, DSP) and have only 32 instructions :D They can also do anything a conventional processor can do and they are, in my eye, refreshingly different. Lightning-fast memory transfer between cores. With these, external memory could be as big as you wanted.

Here's a link to the chip-maker's website: http://www.intellasys.net/ Detailed documentation (albeit very simple) is available once you register.

Of course, most people would probably not want to learn to program a whole new chip after they have already gotten used to the Propeller - which is a fine chip!! I just thought I'd drop the idea in, see what people thought. Personally I would DIE to be able to program this chip...

Gadgetman · 2007-02-18 18:25

Frankly, the SEAforth-24 doesn't impress me.

1. They insist you register before you can download the datasheets. Why?

2. The chip has 144 pins, of which only 18 are general-purpose I/O.

3. 18-bit 'words'/instructions. Each core has barely HALF the RAM of a COG.

4. Transputer-like interconnects. OK if the cores only need to communicate with one or two other cores(Do you need to write the SW for communication? ) but with added delays and complexity if a Core needs to 'speak' to a non-adjacent core.

5. It costs $20(or so) if ordered in quantities of 1000 or more...

6. No DIP version, only 'Ball-grid', making it 'rather unsuitable' for hobbyist applications...

7. The XGameStation, or even the Hydra has never been about 'pure power'. The XGS is a platform designed to TEACH Console architecture, while the Hydra is designed to teach SMP Games design. And the Propeller is much closer to the CELL processor, or any other multi-core processor used in commercially available consoles than the SEAForth...

8. What about Shared RAM for keeping variables? This is important in RealTime games.
(It's important in many types of applications. )

9. Can the SEAForth generate VGA/Composite/PAL/NTSC output, or will it need dedicated HW for this?

Edit: typo

Roger Levy · 2007-02-18 18:48

Eh. Well, OK, I guess I need to defend it now.

1. They insist you register before you can download the datasheets. Why?

I don't know. I'd guess that they want to conserve bandwidth. Is it really that big of a deal?

3. 18-bit 'words'/instructions. Each core has barely HALF the RAM of a COG.
4. Transputer-like interconnects. OK if the cores only need to communicate with one or two other cores(Do you need to write the SW for communication? ) but with added delays and complexity if a Core needs to 'speak' to a non-adjacent core.
8. What about Shared RAM for keeping variables? This is important in RealTime games.
(It's important in many types of applications. )

Communication is facilitated by the hardware and is incredibly fast, because you can pack an entire read/write loop into a single 18-bit word. The cores wait until the neighboring cores send data, transparently. Routines in most cores' ROM facilitate transferring memory from distant cores. Yes, there is no shared RAM on the chip. A core could be set up as a memory controller allowing unlimited outside shared RAM. This means that you could add your own SRAM, DRAM, or whatever. Admittedly, not every core could or would in practice need access to shared RAM, but you can still have more than 8 cores reserved for the user, which could have access to it.

5. It costs $20(or so) if ordered in quantities of 1000 or more...

But, if you consider how it could replace other hardware, it could actually save cost.

6. No DIP version, only 'Ball-grid', making it 'rather unsuitable' for hobbyist applications...

Perhaps. I look at it from the perspective of a game developer, not a hardware hobbyist, so maybe it is uninteresting to people who want to tinker with their own boards.

7. The XGameStation, or even the Hydra has never been about 'pure power'. The XGS is a platform designed to TEACH Console architecture, while the Hydra is designed to teach SMP Games design. And the Propeller is much closer to the CELL processor, or any other multi-core processor used in commercially available consoles than the SEAForth...

Well, where is it written that education hardware must match industry hardware? You are still going to need to learn the CELL anyway. Perhaps the most valuable lessons are not about hardware itself but about learning to effectively organize whatever resources one has. That's what I would encourage, anyway.

9. Can the SEAForth generate VGA/Composite/PAL/NTSC output, or will it need dedicated HW for this?

In theory it can generate video signals, but I don't know its precise limitations. This would need to be evaluated by someone more qualified/knowledgeable about the timing structures of the various formats.

hinv · 2007-02-19 19:56

I looked at the Transputer a while back, and I was thoroughly impressed with version 2, which never actually came out...
These SEAForth chips look impressive, especially with what appears to be a 1-cycle multiply, but what bugs me about it is WHY IS IT ONLY 18 BITS?
When high-end computing is going to 64 bits, and MCUs are going to 32 bits, why only 18 bits?

Doug
