What would you want more of, cogs or RAM?

SailerMan · 2007-01-15 13:58

I'm not sure if I'm right, but the firmware is in ROM, correct?

I think the firmware should be updatable through software. That way, enhancements to the language could be made without redesigning the whole chip.

hinv · 2007-01-15 17:29

Mike,

Chip said...
Pins would transition much faster for LVDS possibilities.

This would make for higher-bandwidth Prop-to-Prop connections, Prop-to-memory connections, and faster long-distance connections via LVDS without using up so much power.

Bill said...
Therefore, in the 16 cog version each cog would get TWICE the memory bandwidth the current cogs get!

That is twice the memory bandwidth, but 8x the cog speed, right? Wouldn't that make memory-bandwidth-dependent apps even more lopsided?
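Doug's "lopsided" point can be checked with rough arithmetic. This is a sketch assuming the current Propeller runs at 80 MHz with ~20 MIPS per cog and a hub window every 16 clocks, and that the 16-cog part would run a 160 MHz clock at ~160 MIPS per cog, also with a window every 16 clocks; the function names are illustrative only.

```python
# Rough comparison of compute vs. hub bandwidth per cog (assumed figures).

def hub_windows_per_second(clock_hz, clocks_per_window):
    """Hub accesses one cog gets per second under a fixed round robin."""
    return clock_hz / clocks_per_window


def instructions_per_window(mips, clock_hz, clocks_per_window):
    """Cog instructions that execute between two of that cog's hub windows."""
    return mips * 1_000_000 / hub_windows_per_second(clock_hz, clocks_per_window)


# Current Propeller: 80 MHz clock, hub window every 16 clocks, ~20 MIPS per cog.
p1_windows = hub_windows_per_second(80_000_000, 16)
p1_ratio = instructions_per_window(20, 80_000_000, 16)

# Proposed 16-cog part: assumed 160 MHz clock, window every 16 clocks, ~160 MIPS.
p2_windows = hub_windows_per_second(160_000_000, 16)
p2_ratio = instructions_per_window(160, 160_000_000, 16)

print("hub bandwidth gain:", p2_windows / p1_windows)            # 2.0
print("compute gain:", 160 / 20)                                 # 8.0
print("instructions per hub window:", p1_ratio, "->", p2_ratio)  # 4.0 -> 16.0
```

Under these assumptions a cog would execute about 16 instructions per hub window instead of about 4, so code that is already hub-bound would gain much less than code that is compute-bound.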

Bill said...
- with the 64 bits of I/O, it is easy enough to hook up some external memory - as much as we like.

Would there be 1.8V memory that would be faster and as easy to connect?
Is faster really an issue for Prop-connected external memory, given the bandwidth limitation on main memory in Prop2?

Thanks,
Doug

hinv · 2007-01-15 17:43

Nefastor said...
I'd say we can use both solutions :
- More cogs means increased access time to main RAM, so if your application ends up not using more than 8 cogs, you're reducing performance for no gain at all. Unless there could be a mechanism in the hub that would connect ONLY the active cogs to main memory, but that would kill timing determinism and I'd rather keep it.

The ability to change the clock speed could also kill determinism, but it is handled nicely in Spin, so couldn't the hub be handled the same way? Why couldn't the Prop have a register to configure the shared resource manager? You could set it up to have just 1, 2, 4, 8, or 16 cogs in the round robin. Then it would suit both worlds.
The applications that need more memory bandwidth would get it, and the applications that need more cogs could get them. cognew would simply fail once you hit the limit set by the hub configuration. The only applications that would suffer in this scenario would be the ones that NEED 256KB of main memory?

Any thoughts?

Gadgetman · 2007-01-15 19:25

The nice thing about the Round-robin scheme that is used now is that it IS fixed.
Particularly in assembly-language routines this means not having to do all kinds of tricks to adapt to a new interval. We already KNOW how often each COG will get access to the System RAM.

If we didn't know, we'd either have to write self-modifying code, or have the loading code patch the assembly routines before they get loaded into the COG (probably by having several similar routines stored and just picking another one).

The thought of writing several similar routines, then finding a fault and having to patch all of them...

Updatable firmware would be nice, though.
(Probably not feasible, as FLASH takes a bit more silicon real estate than a masked ROM. Or so I've heard. Not to mention having to add new functions to the boot code.)

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Don't visit my new website...

hinv · 2007-01-16 00:18

Gadgetman said...
We already KNOW how often each COG will get access to the System RAM.

You know when in clocks, but not in time, because the clock can be changed, which most would believe is a good thing.
If you set the round robin for 1, 2, 4, 8 or 16 cogs at the same time you set the clock, why couldn't you just reference a HUB register at the same time you check the clock frequency?
If, for instance, you set the hub for a 4-cog round robin, cognew could succeed only 3 times (filling all 4 configured cogs), but each cog would get 1/4 of the access time to the shared resources.
I don't see why you would have to do any self-modifying code. You don't have to for different clock settings. Am I missing something?

Thanks,
Doug

Harley · 2007-01-16 00:37

Don't recall if this 'problem' has been brought up before.

Will the existing PropPlug and similar programming devices work with the 1.8v 2nd gen Propeller? Bummer, if not.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Harley Shanko
h.a.s. designn

Paul Baker · 2007-01-16 01:50

The new chip will run its core at 1.8V but the I/O will be 3.3V, so there will be no problems.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Paul Baker
Propeller Applications Engineer

Parallax, Inc.

Gadgetman · 2007-01-16 08:02

hinv said...

Gadgetman said...
We already KNOW how often each COG will get access to the System RAM.

You know when in clocks, but not in time, because the clock can be changed, which most would believe is a good thing.
If you set the round robin for 1,2,4,8 or 16 cogs at the same time you set the clock, why couldn't you just reference a HUB register at the same time you check the clock frequency?

No.
The problem is when writing tight code, like I2C or SPI to access external storage. Then you want to maximize the use of HUB-accesses, right?
At the moment we KNOW that a HUB-access comes around every 16 clock pulses, and that the operation takes 8 clocks, meaning we can interleave 2 regular 4-clock instructions, right?
(First HUB-access excepted, as we have no way of determining beforehand when the first access will be available)
That means we code for that situation.
If we can't fit what is needed, we skip one HUB-access and get a potential 6 assembly instructions in the gap.

This is what we mean with deterministic timing.
Clock frequency doesn't matter; this is just as valid at 20 kHz as at 80 MHz.
If we don't know how often a HUB-access is going to come along, how can we code to use them optimally?
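Gadgetman's cycle budget can be written out as a few lines of arithmetic. This is a sketch using only the figures quoted in the post (a hub window every 16 clocks, an 8-clock hub operation, 4-clock ordinary instructions); the constant and function names are illustrative, and real hub instruction timings on the chip may differ.

```python
# Cycle budget between hub windows, using the figures quoted in the post.
HUB_PERIOD = 16  # clocks between successive hub windows for one cog
HUB_OP = 8       # clocks taken by a hub instruction that hits its window
INSTR = 4        # clocks taken by an ordinary cog instruction


def instructions_between(windows_skipped=0):
    """Ordinary instructions that fit after a hub op before the next usable window."""
    gap = HUB_PERIOD * (windows_skipped + 1) - HUB_OP
    return gap // INSTR


print(instructions_between(0))  # 2: catch every window
print(instructions_between(1))  # 6: skip one window
```

Everything is counted in clocks, which is why the result holds at any clock frequency.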

hinv · 2007-01-16 14:24

Gadgetman:
Now I understand your objection a little more, but if you look at the implementation I suggest, you could still code for the worst case as you plan on doing, with a HUB-access every 16 clock pulses.
Then, say the HUB was configured for an 8-cog round robin; your program would just skip over every other HUB access. It won't hurt, would it?
On the other hand, applications that are more memory-access intensive could use the more frequent hub access, with the tradeoff being fewer cogs available. It would make the Prop2 more versatile.
It sounds like from other posts to this forum that there are some applications out there that could use the shorter HUB cycle.

Thanks,
Doug

P.S. Post your new website so I don't actually visit it. ;^)

helloseth · 2007-01-16 14:26

Gadgetman said...

hinv said...

If we don't know how often a HUB-access is going to come along, how can we code to use them optimally?

There is a difference between a 'programmable' access rate, and a variable access rate.

I thought people have been requesting 'programmable' hub access rates: 1:16, 1:4, etc. Once the rate is set, it is 'fixed' and the deterministic timing still applies, although you may need a constant to adjust common code. Or a generalized routine may require a certain hub access rate.

Seth

Paul Baker · 2007-01-16 21:04

Even a programmable access rate can/will cause massive problems. Since the hub is a limited resource (there are only so many time slots available), all objects in a program have to agree to share the resource in a friendly manner. This may not be too much of a task when you're writing the entire program yourself, but what if you include objects written by others? That adds a further wrinkle beyond "How do I interface and use the object?": you also need to keep track of each object's requested hub resources, and you may end up in a situation where you include a couple of objects, have plenty of cogs and RAM to spare, but no additional hub slots left to do anything.
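Paul's scenario can be made concrete with a small bookkeeping model. This is a hypothetical sketch, not anything Parallax has proposed: it assumes a 16-slot hub rotation and 16 cogs, and the class, method and object names are invented purely for illustration.

```python
class HubBudget:
    """Hypothetical bookkeeping for hub slots handed out on request."""

    def __init__(self, slots=16, cogs=16):
        self.free_slots = slots
        self.free_cogs = cogs
        self.grants = {}

    def start_object(self, name, cogs, slots):
        """Reserve cogs and hub slots for an object; refuse if either runs out."""
        if cogs > self.free_cogs or slots > self.free_slots:
            return False
        self.free_cogs -= cogs
        self.free_slots -= slots
        self.grants[name] = (cogs, slots)
        return True


hub = HubBudget()
print(hub.start_object("main", 1, 2))     # True
print(hub.start_object("video", 2, 10))   # True
print(hub.start_object("sd_card", 1, 4))  # True
print(hub.start_object("serial", 1, 1))   # False: 12 cogs idle, no slots left
print(hub.free_cogs, hub.free_slots)      # 12 0
```

The last object is refused even though most cogs are idle, which is exactly the situation described above.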

hinv · 2007-01-17 14:49

Can you change the clock after startup?

Paul Baker · 2007-01-17 18:22

Yes

hinv · 2007-01-17 20:55

So, if you can change the clock after startup, and I haven't heard any objections to that even though it definitely changes determinism, I don't understand the objection to having the option to change the hub.
I do understand that with code stuffing it is a little bit different than adjusting for a clock change, but in fact, if your application can make do with only 1 access to shared memory every 16 cycles, then it would work just fine if it just skipped over every other access when it had access to shared memory every 8 cycles.
Your code wouldn't have to change. The important part of preserving this timing is that the hub round robin would have to be a power of two, i.e.:

Hub points to cog:
1 cog configured -> main memory access every cycle:
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 cogs configured -> each cog has main memory access every other cycle:
    0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
4 cogs configured -> each cog has main memory access every 4 cycles:
    0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3
8 cogs configured -> each cog has main memory access every 8 cycles:
    0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
16 cogs configured -> each cog has main memory access every 16 cycles:
    0 1 2 3 4 5 6 7 8 9 A B C D E F 0 1 2 3 4 5 6 7 8 9 A B C D E F

So you can see that if your code was written for a 16-cog hub, you would still get your access every 16 cycles in any of these configurations.
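The claim that code written for a 16-slot rotation still finds its window in any power-of-two configuration can be checked by brute force. This is a minimal sketch that models hinv's table in hub-slot units; the function names are illustrative, and it ignores how a cog first syncs to its window.

```python
def hub_schedule(active_cogs, length):
    """Cog that owns each successive hub slot under a strict round robin."""
    return [slot % active_cogs for slot in range(length)]


def period_ok(cog, active_cogs, period, length=64):
    """True if a cog that uses the hub every `period` slots always finds its window."""
    windows = {t for t, owner in enumerate(hub_schedule(active_cogs, length)) if owner == cog}
    first = min(windows)
    return all(t in windows for t in range(first, length, period))


for n in (1, 2, 4, 8, 16):
    print(n, "cogs:", period_ok(0, n, 16))  # True for every power-of-two setting

print(period_ok(0, 16, 4))   # False: code written for a 4-cog hub run in 16-cog mode
print(period_ok(0, 12, 16))  # False: a 12-cog rotation does not divide 16
```

The last two lines show the limits of the idea: it only works in the "slower code on a faster hub" direction, and only when the rotation length divides 16.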

The advantage is that if you had an application that needed faster access to main memory, but not more processing power than is in one cog, you wouldn't have to spread it over multiple cogs just to get faster memory access, even if it was possible with the given application. You wouldn't have to waste the processing power and memory accesses communicating between cogs for what would fit easily in 1 cog if it could talk to main memory faster.
So in conclusion, everybody wins (well, not those who would like 256KB): the ones who want faster memory access win, and so do the 16-coggers!
Even if you had just 1 bit in the configuration register, you could choose between 8 and 16 cogs; with 2 bits, you could choose between 2, 4, 8 or 16 cogs. If people want to use just 1 cog, I would guess that they would be better served by other MCUs out there.

I know I am late to this discussion, and even late to the Propeller in general, but I think my argument is a valid one. I would love to provoke some response from Chip, BillH, Phil, Andre, or MikeG. What do you think?

helloseth · 2007-01-18 13:42

hinv,

It seems that their objection is that if the hub access can be changed, it can/will break generalized code 'objects' which are/may be programmed by other people. So I make a widget and write my code with the hub access configured for 16 cogs, but want to use an SPI object from someone else which requires/assumes 8-cog hub timing. Then the SPI object won't work properly unless you modify the code, if it can work at all. (SPI is probably a bad example, but I can't think of another at the moment.)

Seth

Oliver H. Bailey · 2007-01-18 15:10

Seth,
Objects are not COG sensitive, as the next available COG is used when it is needed by an object's startup. Each COG has control over its piece of the clock world via counters, frequency registers, and PLL registers, which are tunable for each COG application. Since the new designs being discussed would be faster (160 MIPS vs. the current 20 MIPS), the worst that could happen would be a compatibility mode switch that throttles a COG to the current clock speed and frequency. The part has so much flexibility already built in that once you know how to use the clock, counters, PLL, and frequency registers, there is little you can't do.

Oliver

helloseth · 2007-01-18 15:35

Oliver,

Correct. This part of the discussion was with respect to 'programmable' hub access rates. Some people want programmable hub access 'windows' so some cogs can access the hub more frequently than others. So a cog doing PWM could have less frequent access to the hub, freeing hub access bandwidth for other cogs. Others insist on maintaining the strict round-robin hub access scheme, which is fixed at the factory, so to speak. This would 'waste' some hub access bandwidth, but provide a dependable access rate/sequence.

Seth

hinv · 2007-01-18 17:22

Seth:

I think you missed what I am proposing.
I am NOT proposing programmable hub access rates where some cogs will have more frequent hub access than others.
I AM proposing a strict round-robin access where you can turn on access for 2, 4, 8 or 16 cogs, and have the round-robin access scheme cycle between the cogs that are on. The remaining cogs would be OFF.

Objects should be written for the lowest access rate the task allows (16-cog config), but some tasks just need more bandwidth than a 1/16 hub cycle. So, if you NEED more hub access, you will limit the number of cogs available for your program, say to 4 or 8 cogs. Objects written for a 16-cog config would work just fine on an 8-cog or 4-cog configuration, unless they counted on NOT getting access to main memory for 16 cycles.

My goal here is to eliminate the bottleneck for those who would otherwise run into it, by being able to balance the system for high-bandwidth tasks... making it usable to a larger group of developers, maybe even competing with ARM7 and ARM9 cores.
It is essentially what Chip proposed, but simpler, requiring less logic. The difference is that cognew would fail after it hit the configured limit of 2, 4, 8, or 16 cogs.

potatoehead said...
Any compute problem, properly coded, becomes an I/O problem

Doug
hao

helloseth · 2007-01-18 17:57

hinv said...
Seth:
I am NOT proposing programmable hub access rates where some cogs will have more frequent hub access than others.
I AM proposing a strict round-robin access where you can turn on access for 2, 4, 8 or 16 cogs, and have the round-robin access scheme cycle between the cogs that are on. The remaining cogs would be OFF.

hao

You're preaching to the choir. But your proposal IS programmable access rates. What OTHERS were objecting to was a hub access scheme that is NOT fixed. The others were concerned that code which required more than 1/16th or 1/8th or whatever would break if run on a Prop configured for less frequent access. I understand the problem, but would rather have that flexibility and deal with the code changes to standard routines when the problem arises.

Seth

ciw1973 · 2007-01-18 18:29

If we're going to get faster cogs, and more of them, then the fixed deterministic timing that is important to many of the experienced developers on these forums will be different in version 2 of the Propeller than in the current version, and many existing modules will need to be rewritten/modified anyway.

I suspect now that people are aware that the speed of the cogs and frequency of hub access slots for each are going to change, they will either start writing their objects in a manner so as to be flexible/configurable when it comes to timing, or they'll start to stipulate that they must be run on a Propeller running at specific speeds etc. The first option is obviously going to be more work, but a good thing in the long-run, the second is likely to cause quite a few problems.

I think in practice, it's likely that many of the existing objects will be released in two versions, one for the current Propeller, and one for version 2. At least, that's what should happen, as there will certainly be optimisations to be made when it comes to running on the new chip.

hinv · 2007-01-18 19:25

ciw1973 said...
If we're going to get faster cogs, and more of them, then the fixed deterministic timing that is important to many of the experienced developers on these forums will be different in version 2 of the Propeller than in the current version, and many existing modules will need to be rewritten/modified anyway.

Agreed. Is there somebody that plans on running timing sensitive code unchanged from Prop to Prop2?

ciw1973 said...
The first option is obviously going to be more work

I would disagree with that statement. The only difference between code that expects a fixed 1/16 hub cycle and code that works on 1/16, 1/8, 1/4 or 1/2 is that you would have to make sure the timing wouldn't be messed up if you got access to main memory earlier than you figured. If you are really stuffing 13 instructions between hub accesses, this won't happen. On the other hand, it would be a lot easier to fit apps that need more bandwidth into fewer cogs. Some applications just don't fit in 40 MB/second. Rather than having to spread a task out with clever code to share the bandwidth between 2 or more cogs, you could do it in 1, at the expense of having fewer cogs to work with.
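The 40 MB/second figure and what a configurable hub would buy can be worked out directly. This sketch assumes a 160 MHz system clock, one hub slot per clock (as in Chip's options, where 16 cogs get a window every 16 clocks and 8 cogs every 8), and one 32-bit long transferred per access; all three are assumptions for illustration.

```python
CLOCK_HZ = 160_000_000  # assumed Prop2 system clock
BYTES_PER_ACCESS = 4    # assumed: one 32-bit long per hub window


def per_cog_bandwidth_mb_s(active_cogs):
    """Peak hub bandwidth for one cog when `active_cogs` share the rotation."""
    windows_per_s = CLOCK_HZ / active_cogs  # one hub slot per clock
    return windows_per_s * BYTES_PER_ACCESS / 1_000_000


for n in (16, 8, 4, 2, 1):
    print(n, "cogs:", per_cog_bandwidth_mb_s(n), "MB/s")
```

With 16 cogs this reproduces the 40 MB/s per cog quoted in the thread; each halving of the rotation doubles it.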

Another suggestion that I thought of would be a real-time clock that can be tied into the system clock somehow.

Doug

Bergamot · 2007-01-18 20:02

hinv said...
I would disagree with that statement. The only difference between code that expects a fixed 1/16 hub cycle and code that works on 1/16, 1/8, 1/4 or 1/2 is that you would have to make sure the timing wouldn't be messed up if you got access to main memory earlier than you figured. If you are really stuffing 13 instructions between hub accesses, this won't happen. On the other hand, it would be a lot easier to fit apps that need more bandwidth into fewer cogs. Some applications just don't fit in 40 MB/second. Rather than having to spread a task out with clever code to share the bandwidth between 2 or more cogs, you could do it in 1, at the expense of having fewer cogs to work with.

But if an object is coded for 1/4 mode, and you run it in 1/16 mode, it is *not* going to run as expected. The nice thing about the fixed ratio is that you don't have to worry about whether the code will run properly.

hinv · 2007-01-18 23:28

Bergamot,

I don't think somebody should *expect* any high-bandwidth object to run in low-bandwidth mode. The Prop2 is a different beastie, and the expectations should be set at the start. Come to think of it, it would also spur a little competition. If someone writes and publishes an object for 1/4 mode, and somebody else can achieve the same results with a 1/8-mode object, I would guess that more people would use the 1/8-mode object. BTW, I would suggest that objects state their requirements right at the top of the file.
The same goes, however, for objects that can't fit in 1 cog because of the bandwidth limitation. Say a video object takes up 4 cogs because it needs that much bandwidth. If someone already had 13 cogs allocated in a 1/16 config, it is *not* going to run as expected either.
It wouldn't degrade the chip one bit to have a configurable hub! If you wanted to run it in 16-cog mode all of the time, you could.

Come to think of it, this would complicate things a bit, but it may be desirable to run the unused cogs in a headless mode: no access to the hub after bootup, but still able to do tasks, communicating through pins only.

JoannaK · 2007-01-31 22:02

Chip Gracey (Parallax) said...
What would you rather have in a future Propeller chip:

Option 1: 16 cogs with 128KB of hub RAM. Hub access once every 16 clocks.
Option 2: 8 cogs with 256KB of hub RAM. Hub access once every 8 clocks.

Note that each cog would run at about 160 MIPS, as opposed to the current 20 MIPS.

Hi. I know this comes a bit late. I've been reading through old posts, joined the forums and made my first order of a couple of Propellers.

Of those 2, I would choose 16 cogs/128KB RAM. After all, this new chip is going to have double the number of I/O lines, so IMHO it would be good to have more counters and CPU cores. Besides that, 128KB is four times as much RAM as there is now, so IMHO it should be OK for a while.

About the in-chip leakage current: it's a tough call. 160 MIPS is a nice speed, and IMHO there is no dire need to increase it to 200+ at the moment. I'm aiming for USB 2.0 full speed with cog I/O, and a 9 MHz clock input with the 16x PLL would make a nice 144 MHz clock, which gives 12 instructions/bit... Should be quite OK? Nah... Looks like it needs a bit more CPU power. Would it be possible to use a 12 MHz crystal (192 MHz clock speed)? For low-leakage, slow-CPU projects this current version of the Propeller is good.

Faster hub access might be nice, but IMHO 40 MB/s for each cog is quite enough for most uses (except bitmap VGA output, but for those there should be multi-megabyte RAMs anyhow).

Edit 2: Ah... I think 160 MHz ain't quite enough. How about 192 MHz (12 MHz oscillator * 16x PLL)?
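JoannaK's instructions-per-bit figures follow from the USB full-speed bit rate of 12 Mbit/s. This sketch assumes a 16x PLL and one instruction per clock on the proposed part (which her "144 MHz ... 12 instructions/bit" implies); the function name is illustrative.

```python
USB_FS_BIT_RATE = 12_000_000  # USB 2.0 full speed: 12 Mbit/s


def instructions_per_bit(xtal_hz, pll=16, clocks_per_instruction=1):
    """Cog instructions available per USB full-speed bit time."""
    sysclk = xtal_hz * pll
    return sysclk / clocks_per_instruction / USB_FS_BIT_RATE


print(instructions_per_bit(9_000_000))   # 144 MHz -> 12.0
print(instructions_per_bit(10_000_000))  # 160 MHz -> about 13.3
print(instructions_per_bit(12_000_000))  # 192 MHz -> 16.0
```

A 12 MHz crystal also has the advantage that the bit time becomes a whole number of clocks.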

Post Edited (JoannaK) : 2/1/2007 5:32:58 PM GMT

JTC · 2007-02-02 03:28

Both look great. I am fairly new to the Prop and I bounce back and forth between the two.
I like the greater speed of the 8-cog arrangement but also like the idea of more cogs.
Jim

R Baggett · 2007-02-02 14:57

I've used the prop on a few industrial instrumentation and control projects, and vote for more memory, hands down..

As for adaptive cog access.. Yes, it would break some existing objects.. when used.

But would be soooo handy to have when needed. I would imagine that for the majority of applications, it could be left at default and would be no problem..

You wouldn't HAFTA set it up for something other than default unless you had a compelling reason.. (Right?)

asterick · 2007-02-02 15:40

I still stand by the idea of having fixed deterministic round-robin access for a select number of cogs (1/16, 1/8, etc.) with the option for specific cogs to drive on off-cycle unused hub accesses to maximize performance (100% hub usage FTW!)

It's compatible AND it gives us the option of plowing the Smile out of the HUB. (Just think what that would do for SPIN)
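asterick's idea can be sketched as a slot arbiter. This is a hypothetical model of the suggestion, not a described Propeller feature: each running cog keeps its fixed window, and any slot whose scheduled owner is stopped is handed to one designated cog. The function and parameter names are invented for illustration.

```python
def slot_owner(slot, configured, running, spare_taker):
    """Hypothetical: scheduled cog keeps its slot; idle slots go to one designated cog."""
    scheduled = slot % configured
    if scheduled in running:
        return scheduled
    if spare_taker in running:
        return spare_taker
    return None  # slot goes unused


running = {0, 1, 2}
owners = [slot_owner(s, 8, running, spare_taker=2) for s in range(16)]
print(owners)
# Cogs 0 and 1 still see a window every 8 slots; cog 2 also picks up the idle ones.
```

The fixed windows of every running cog are untouched, which is why the scheme stays compatible with timing-sensitive objects while the designated cog soaks up the unused bandwidth.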

johnny_b · 2007-02-05 13:02

Hi,

8 Cogs, big RAM, and LOTS MORE I/O pins!!!!!!!

please..... ;-))))

PHX · 2007-02-05 14:24

Hi,
I'm not sure if this has been mentioned already, but is there any approximate date for this baby?
I'm eager to have the multiplication! :-D
Richard

Luis Digital · 2007-02-05 15:10

PHX said...
Hi,
I'm not sure if this has been mentioned already, but is there any approximate date for this baby?
I'm eager to have the multiplication! :-D
Richard

1 Year.
