First, a little off topic.
My languages are 1. Polish, 2. Swedish ... I can speak very little English. My first attempt to write English was on your forum.
I am interested, and I would like to help with ideas if I can. The final decision is only yours.
I like the Prop and its architecture.
Only you can stop me if you do not like my ideas.
I hope my ideas are of interest.
Ps. I must already start collecting money for the Prop II. My pocket is very small (I have money only from my health pension).
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔ Nothing is impossible, there are only different degrees of difficulty. For every stupid question there is at least one intelligent answer. Don't guess - ask instead. If you don't ask you won't know. If you're gonna construct something, make it as simple as possible yet as versatile as possible.
Would it be possible for a sleeping thread's (one waiting on waitcnt, waitpne, waitpeq) execution slice to be freed up for the other threads while it's sleeping?
Yes, this is possible, but it grows the hardware a bit, as at least 32 flipflops have to be dedicated to store the what-we-are-waiting-for data for each task. For WAITCNT, it's just 32 bits, but for WAITPEQ/WAITPNE it's 64 bits (mask and data). Taking the 64-bit case, that times 8 tasks = 512 flipflops. Multiply that by 8 cogs and you get 4096 flipflops. I think our flipflops are about 5um x 20um, so that would total about 0.4mm2, or about $0.04 per chip. Hey, that sounds cheap, doesn't it? We would definitely have to gate the clock on those things when they weren't being used.
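For anyone who wants to check the arithmetic in that estimate, here is a quick back-of-envelope sketch in Python. The constant names are made up for illustration; the numbers are the ones quoted in the post, and the 64-bit WAITPEQ/WAITPNE case is assumed for every task.

```python
# Back-of-envelope check of the flip-flop estimate quoted above.
BITS_PER_TASK = 64          # WAITPEQ/WAITPNE worst case: 32-bit mask + 32-bit data
TASKS_PER_COG = 8
COGS = 8
FLOP_AREA_UM2 = 5 * 20      # "about 5um x 20um" per flip-flop

flops_per_cog = BITS_PER_TASK * TASKS_PER_COG       # flip-flops needed in one cog
flops_per_chip = flops_per_cog * COGS               # flip-flops needed on the chip
area_mm2 = flops_per_chip * FLOP_AREA_UM2 / 1e6     # 1 mm^2 = 1e6 um^2

print(flops_per_cog, flops_per_chip, round(area_mm2, 2))
```

The area comes out just over 0.4 mm2, which matches the rounded figure in the post.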
Think about all the "what about interrupts" questions it would stop :)
All kidding aside, it would be great for multi-threading all sorts of comms etc, I am drooling at the possibilities :) :)
Please implement :)
And please allow outputting the dot clock for VGA modes; it would allow easy driving of parallel LCDs.
Best,
Bill
Do you mean like the pixel clock to drive graphic LCDs? If so, I second that!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I am 1011, so be surprised!
Advertisement sponsored by dfletch:
Come and join us on the Propeller IRC channel for fast and easy help!
Channel: #propeller
Server: irc.freenode.net or freenode.net
If you don't want to bother installing an IRC client, use Mibbit. www.mibbit.com
Bill Henning said...
Would it be possible for a sleeping thread's (one waiting on waitcnt, waitpne, waitpeq) execution slice to be freed up for the other threads while it's sleeping?
Hmmm. Be careful what you wish for! My preference would be for sleeping tasks to use their timeslots for NOPs. The reason is determinism. I think it's important for the programmer to know that, if he has started four tasks, say, each task gets one out of every four clocks. If this gets shortened sporadically due to sleeping tasks, it's impossible to predict how long a segment of code might take.
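To make the determinism point concrete, here is a small toy simulation in Python. It is only a sketch under my own assumptions: one instruction per clock, strict round-robin order, and a made-up `wake_at` list saying when each task stops sleeping. It is not a model of any real Propeller hardware.

```python
def simulate(wake_at, clocks, free_sleeping_slots):
    """Return instructions executed per task over `clocks` clocks.

    wake_at[i] is the clock at which task i stops sleeping (0 = always awake).
    """
    n = len(wake_at)
    executed = [0] * n
    nxt = 0                           # next task in round-robin order
    for clk in range(clocks):
        if free_sleeping_slots:
            # Skip sleeping tasks and hand the slot to the next awake one.
            for _ in range(n):
                if clk >= wake_at[nxt]:
                    break
                nxt = (nxt + 1) % n
            else:
                continue              # every task is asleep: slot is wasted
        if clk >= wake_at[nxt]:
            executed[nxt] += 1        # awake task uses its slot
        # otherwise the sleeping task burns its slot as a NOP
        nxt = (nxt + 1) % n
    return executed


wake_at = [0, 0, 40, 40]              # tasks 2 and 3 sleep for the first 40 clocks
nop_policy = simulate(wake_at, 80, free_sleeping_slots=False)
free_policy = simulate(wake_at, 80, free_sleeping_slots=True)
print(nop_policy, free_policy)
```

With NOP slots, task 0 always runs on exactly one clock in four. With freed slots, it runs on every second clock until the sleepers wake, then drops back to one in four, so the time a code segment takes depends on what the other tasks happen to be doing.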
-Phil
Addendum: Bill, BTW, congrats on your new life!!! And welcome back!
If your market is techos, then read no further. If it is education and hobbyists, read on...
I have spent nearly 40 years designing and programming. I started designing with and programming micros in 1976. In the mid '80s I designed and wrote code for an MC68705P3S with 1.7KB that implemented a soft UART and an AT command set and drove a modem chipset, all at 1MHz with 4-cycle instructions. In the early '90s I wrote a 30MHz 486 emulation of a minicomputer with 20 partitions (~cogs), each with its own memory (~cog), plus common shared memory (~hub) and 80-bit RISC instructions. It ran 3 times faster than the mini. I think that ranks me as a techo.
Do we need 16 cogs? We must, if we are working out ways to get multitasking going when we only have 8 cogs!
Multitasking
This sounds great and maybe you might get an object to run the keyboard, mouse, TV/VGA and Serial all in one cog. Hey presto, your code is no longer readable by your audience and you just lost the sale before you started.
What's a beginner going to see when he/she looks at the forum? Lots of questions about multitasking - once again they're gone. The same goes for the school teacher.
The Prop I is KISS. Simplicity is its elegance. In my opinion, it's the cogs that make it so. No interrupts, no multitasking (as such).
Hub access is now going to be critical because multitasking will require lots more hub loads. Remember, cog RAM is still 496 longs.
My opinion for what it's worth... Quickly bring out a 16 cog Prop 1.5 with simple cog RAM (no 3 read and 1 write per cycle) and therefore still 4 cycle execution (so only 2x faster than Prop I). This will solve the silicon issue, right?
Add whatever other features you can, but keep it simple. I would like improved counters, 64 I/O (some can then be used for inter cog communication). But make it sooner rather than later. As all have agreed, skip compatibility.
The Prop II could then be 16 cogs and maybe even wait for the next feature size???
Suggestions for any version...
1. Be able to zero CNT (so that multiple props could be synchronised)
2. Hub access: Could a bit in a register, or a special rdxxxx/wrxxxx, be used to do a hub transfer which will use the next available unused access spot to improve performance by not having to wait for this cog's slot? As it would upset determinism, this can be turned on or off. Alternately, a special rd/wr that transfers say 8 or 16 longs using the same cycle stealing. Even a rd/wr that autoincrements both hub and cog and has a variable length. Just trying to think of a way to improve hub access without upsetting the current mechanism.
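A toy model of suggestion 2, in Python, for anyone who wants to see what "use the next available unused access spot" could buy. Everything here is hypothetical: the function name, the `busy` set of cogs that use their own slot in this rotation, and the assumption that the hub simply walks the 8 cogs in order.

```python
def hub_wait(cog, slot_now, busy, steal, n_cogs=8):
    """Number of hub slots a cog waits before its access is served.

    slot_now: which cog owns the slot at the moment of the request.
    busy:     set of cogs that use their own slot in this rotation.
    steal:    if True, the waiting cog may take a slot its owner is not using.
    """
    for wait in range(n_cogs):
        owner = (slot_now + wait) % n_cogs
        if owner == cog or (steal and owner not in busy):
            return wait
    raise AssertionError("unreachable: the cog's own slot always comes round")


# Cog 5 asks just after its own slot has gone by (slot 6 is up next).
normal = hub_wait(5, 6, busy={0}, steal=False)          # waits for its own slot
stolen = hub_wait(5, 6, busy={0}, steal=True)           # slot 6 is unused
partly_busy = hub_wait(5, 6, busy={6, 7}, steal=True)   # first free slot is cog 0's
print(normal, stolen, partly_busy)
```

The wait with stealing depends on what the other cogs are doing, which is exactly the loss of determinism the post says should be switchable.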
Multitasking would be optional. People wouldn't have to use it if they didn't want to or know how. And its presence would be entirely transparent if they don't use it. Heck, there are aspects of the Prop I that are opaque to beginners. WAITVID is a perfect example. But, by using canned objects, even beginners can take advantage of video output. (I still haven't entirely grokked the source code for TV.spin, but it doesn't keep me from using it.) The nice thing about the four-port, multitasking, 8-cog model is that it keeps as many options open as possible. If you need 160MIPS in a cog, you can have it. If you don't, but need more cog-equivalents, multitasking will fill the bill.
You miss my point - it is not that it's optional, but that it's required to solve problems. It introduces complexity that will scare the target audience.
Cluso99 said...
Quickly bring out a 16 cog Prop 1.5 with simple cog ram (no 3 read and 1 write per cycle) and therefore still 4 cycle execution (so only 2x faster than Prop I). This will solve the silicon issue, right?
But then we would end up with a $30.00+ Propeller chip & about a $50.00 protoboard (my estimate).
Hmmm... I see your point, but I'd bet that Chip could add an extra flip-flop to decide between NOPs and freeing the pipeline slices, thus giving us the best of both worlds; also, hub access will still impact determinism.
Frankly, what I like about the tasking is how much easier it would make things like shared serial, ps/2, mouse etc drivers that don't need anywhere near a cog's bandwidth; not to mention it more than compensating for 8 cogs instead of the previously assumed 16.
Best,
Bill
ps
Thanks! I love my new life - and my new wife :) :) :) but it's nice to start to have time to Propeller again; and I now have space to set up my scope, logic analyzer etc that I did not have physical space to set up before (small condo). It will help me finish the couple of Propeller projects that I had suspended.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔ www.mikronauts.com - a new blog about microcontrollers
Bill Henning said...
Would it be possible for a sleeping thread's (one waiting on waitcnt, waitpne, waitpeq) execution slice to be freed up for the other threads while it's sleeping?
Hmmm. Be careful what you wish for! My preference would be for sleeping tasks to use their timeslots for NOPs. The reason is determinism...
I actually like this determinism thing, too. It would mean that WAITCNT and WAITPEQ/WAITPNE would need to be done in software (a few instructions each) and loop until the condition is met, in lieu of a lot of silicon. With tasks that could sleep and pass their cycle to another task, if a cog was dynamically starting and stopping tasks all the time, each task would have radically altering time frames, like 1x to 64x, depending on how many others were running and what their RDxxxx/WRxxxx load was. It could play havoc with code timing and make verification nearly impossible. With determinism, verification would be a lot easier.
On "Hang section": we have similar experience; I started with Intel 4- and 8-bit CPUs.
At the beginning of the '80s I designed and programmed a system to run over 400 electric motors as one system, with reprogramming and control over a modem.
On "Multitasking": it is one of the possibilities. It is not bad. Hub access is not so critical; it only affects the one cog that has multitasking enabled. All other cogs keep their time slots untouched.
On "My opinion": normally it is not always "bigger is better", 8 versus 16 cogs.
You will have improved counters. I have many ideas for them too (the ability to count down like a watchdog, with a flag to test for completion).
Counting on a pin like a frequency counter, stopping at a given CNT.
Do not use pins to communicate between cogs on the same chip (it is a waste of pin resources); rather use a shadow register to communicate directly.
It is the control features and their programmability that are the "Power of Propeller".
On "Suggestions": it is Chip who has the concept of how it must be implemented.
Ps. In the next post you said "It introduces complexity that will scare the target audience." ..... In Prop I mode (after reset) its function is the same as the Prop I ... No problem!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔ Nothing is impossible, there are only different degrees of difficulty.
You miss my point - it is not that it's optional, but that it's required to solve problems. It introduces complexity that will scare the target audience.
Although it's not necessarily required for the hobbyist or student. Most can get along quite well with the cogs available and with capacity to spare.
It's only when one wants to use more than what's available is there a problem. Like you don't need to understand CogNew() if you're writing a self-contained Spin program, you don't need to understand PASM if writing only in Spin, you don't need to understand task switching within a cog if you don't need it or use it.
The complexity is there for those who want to take it. It doesn't mean it will be taken but some people will do no end of clever things and most people will just use those objects as I imagine most people do with the TV drivers.
If people want to start understanding how very clever code works then they'll have to buckle down and get their heads straight and work at understanding it, but it's not required that they understand it or have to use one particular model of programming.
My VCR performed some pretty clever tricks I never understood; it never scared me off, I just never used them. It did everything I wanted but could - apparently - do a whole lot more. I don't really see a souped-up Propeller Mk II being any different.
Chip,
Would the single instruction WAITCNT / WAITPNE / WAITPEQ functionality remain in the case of a cog running a single thread?
By the way, what happened to the notion of a WAITPNE / WAITPEQ with a timeout? You could implement it with two threads in this scheme.
Regarding complexity ... as long as it's all optional, then it only affects those who need to use it. You get into problems when there are some features where you sometimes need to use them or some kind of canned routine needs to use them and it requires you to master a whole bunch of new, sometimes contradictory stuff that you could ignore before, yet your existing straightforward code now has to be modified because these new features were introduced in a way you really didn't understand or necessarily want.
Post Edited (Mike Green) : 9/2/2008 4:00:02 AM GMT
Mike Green said...
Chip,
Would the single instruction WAITCNT / WAITPNE / WAITPEQ functionality remain in the case of a cog running a single thread?
Absolutely. It's only within tasks where something different has to be done in order to avoid stalling any other tasks. Either we replicate this functionality within tasks via simple instruction sequences, or we make special hardware which can turn unused task slots over to other active tasks.
By the way, what happened to the notion of a WAITPNE / WAITPEQ with a timeout? You could implement it with two threads in this scheme.
This is a pain to implement in hardware and, personally, I think it's not quite on target, because if you time out and then the event occurs, you might as well have been waiting in a software loop the whole time. It would only provide a potential power savings, and nothing else.
On...
"By the way, what happened to the notion of a WAITPNE / WAITPEQ with a timeout? You could implement it with two threads in this scheme."
You said....
"This is a pain to implement in hardware and, personally, I think it's not quite on target, because if you timeout and then the event occurs, you might as well have been waiting in a software loop the whole time. It would only provide a potential power savings, and nothing else."
With timeout capability, only the signals that come within this window are interesting. All others are simply ERRORs and must be flagged as such. In other words, the program must handle that ERROR without waiting.
Ps. Maybe flag C or maybe Z can be used as the ERROR flag.
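A rough Python sketch of a WAITPEQ with a timeout done as a software polling loop, with the timeout reported through a flag. The names `wait_peq_timeout` and `read_pins`, and counting the timeout in polls rather than clocks, are all my own assumptions for illustration.

```python
def wait_peq_timeout(read_pins, mask, value, timeout):
    """Poll until (pins & mask) == value, giving up after `timeout` polls.

    Returns (pins, timed_out). timed_out plays the role of the ERROR flag
    (C or Z in the suggestion above).
    """
    for _ in range(timeout):
        pins = read_pins()
        if (pins & mask) == value:
            return pins, False        # event arrived inside the window
    return None, True                 # window closed: flag the error


# Fake pin source for the demo: the wanted pattern shows up on the third poll.
samples = iter([0b0000, 0b0100, 0b0110])
hit = wait_peq_timeout(lambda: next(samples), 0b0110, 0b0110, timeout=5)
miss = wait_peq_timeout(lambda: 0, 0b0001, 0b0001, timeout=3)
print(hit, miss)
```

The caller tests the flag straight away and handles the error without waiting any longer, which is the behaviour asked for above.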
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔ Nothing is impossible, there are only different degrees of difficulty.
Chip Gracey said...
I actually like this determinism thing, too. It would mean that WAITCNT and WAITPEQ/WAITPNE would need to be done in software...
Chip Gracey said...
Either we replicate this functionality within tasks via simple instruction sequences, or we make special hardware which can turn unused task slots over to other active tasks.
Chip, I'm not quite sure why this dichotomy. Determinism can still be preserved with the WAITCNT/TASKSLP hardware mechanism discussed earlier. It only requires that a task that's asleep use its timeslot, but do nothing. Only task slots that are completely inactive/unoccupied would use zero time. The WAITPxx are another issue, though, and there might be better ways to accommodate them by software in a multi-tasking environment. But please reconsider jettisoning the WAITCNT mechanism!
If you think about it, threaded-mode hardware multitasking is no more efficient than just doing JMPRETD multitasking (JMPRETD is like JMPRET, but executes two trailing instructions, hence it is single-cycle, just like TASKNXT would be). The only thing extra the hardware multithreading would provide is automatic flag saving/restoring and remapping incrementing. Big woop! TOTALLY NOT WORTH THE ADDED HARDWARE COMPLEXITY. Single-cycle hardware multitasking absolutely requires more hardware, but is it really necessary if we have good multithreading?
I think that a register remapping scheme is really the only vital ingredient required to do multithreading. Even flag swapping is not that important in threaded multitasking, because you can segment the threads at points where the flags don't matter. Think about the new indirect registers... they are a singular resource that only one thread can use at a time, so count them out for general intra-thread use. But, they can be exploited as a task switcher:
        JMPRETD INDA,INDB
This instruction would store the thread's return address to a circular buffer via INDA and fetch the new jump address from a circular buffer via INDB. INDB would always equal INDA plus 1, with both wrapping around. There is a single-cycle round-robin task switcher! Furthermore, INDA's pointer could select the register remapping index, automatically incrementing with each JMPRETD INDA,INDB.
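Here is how I read that scheme, as a small Python model. It is only a sketch: the class and method names are invented, and the exact pointer-increment behaviour is my assumption from the description (INDB always one ahead of INDA, both wrapping, both advancing on each switch).

```python
class RoundRobin:
    """Toy model of the proposed JMPRETD INDA,INDB task switcher."""

    def __init__(self, entry_points):
        self.table = list(entry_points)   # circular buffer of resume addresses
        self.inda = 0                     # slot of the task running right now

    def switch(self, return_addr):
        """Save our resume address via INDA, fetch the next task's via INDB."""
        n = len(self.table)
        indb = (self.inda + 1) % n        # INDB is always INDA + 1, wrapping
        self.table[self.inda] = return_addr
        self.inda = indb                  # both pointers advance together
        return self.table[indb]           # address the cog jumps to


rr = RoundRobin([100, 200, 300])          # three tasks with made-up entry addresses
jumps = [rr.switch(105), rr.switch(207), rr.switch(301)]
print(jumps, rr.table, rr.inda)
```

After three switches the first task resumes at the address it saved, and `inda` is back at 0, so it could double as the register remapping index mentioned in the post.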
The cool thing about this whole approach is that it's almost zero additional hardware and can handle as many tasks as you'll ever want without any arbitrary constraint. It would also make it easy to play with remapping without having to commit to multitasking: just set the remap factor (0..7) and do a SETPTRA.
I feel best about this approach. Does this take the wind out of anyone's sails? The biggest drawback to some of you would be that any WAITing slots would still be taking time, but that can be mitigated by some clever programming outside the thread. In fact, 'threads' are 'threads' only insofar as you call them such. They could be hybridized to do all kinds of different things, since you're never bound within some special hardware mode.
I think this fits much more into the KISS principle and doesn't overload the Propeller architecture. It's a simple and organic extension to what we've already got and allows you to ease into the whole multithreading concept rather than having to 'turn it on and stand back', like the approach we've been discussing would dictate. Besides, the elements that would make multithreading viable, like the register remapping and use of INDA to select the remap index, might be put to use for yet-unimagined purposes.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Chip Gracey
Parallax, Inc.
Post Edited (Chip Gracey (Parallax)) : 9/2/2008 5:53:35 AM GMT
Oh, lawsy, lawsy! The room's aspinnin'. I'm feelin' faint! Quick! Someone fetch my smellin' salts!
(Just kidding!) Chip, if the single cycle multitasking is getting too messy, I've got nothing against cooperative multitasking. Not having a hardware mechanism to save and restore the cc bits leaves me a bit cold, though. The Propeller is unique in allowing the programmer to determine when Z and C get modified. Consequently, it's possible to use these flags to record conditions that may persist for thousands of instructions. I've done this in my own programming to save machine cycles. But if this is a feature that must be sacrificed on the altar of simplicity, I may whimper, but I'd be loath to put up a fight. (I have a feeling this matter will be revisited more than once before all is said and done, anyway! And I love the give and take of this discussion!)
But wait! There's more!
        JMPRETD INDA,INDB   WC, WZ   'save old Z/C to D[10..9], load new Z/C from S[10..9]
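A small Python sketch of what storing the flags next to the return address might look like. The bit order is my guess: I am reading "Z/C to D[10..9]" as Z in bit 10 and C in bit 9, sitting just above a 9-bit cog address. The function names are invented.

```python
ADDR_MASK = 0x1FF           # 9-bit cog address in bits 8..0
C_BIT, Z_BIT = 9, 10        # assumed order: Z in bit 10, C in bit 9


def pack(addr, z, c):
    """Combine a return address and the two flags into one saved long."""
    return (addr & ADDR_MASK) | (int(c) << C_BIT) | (int(z) << Z_BIT)


def unpack(word):
    """Recover (address, Z, C) from a saved long."""
    return word & ADDR_MASK, bool((word >> Z_BIT) & 1), bool((word >> C_BIT) & 1)


saved = pack(0x123, z=True, c=False)
print(hex(saved), unpack(saved))
```

Since the address only needs bits 8..0, the flags ride along in the same long and each task gets its own Z and C back when it resumes.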
Chip: Here is the problem. I've already lost the "thread" trying to follow all this, and perhaps others have too.
Can we get the terminology sorted out, "threaded-mode hardware multitasking", "multitasking", "hardware multithreading", "Single-cycle hardware multitasking", "multithreading". And that's just the first paragraph. Later we get "threaded multitasking" and so it goes.
To my simple mind we have:
1) Cooperative threading where a thread runs at full speed until it executes some instruction that causes it to suspend and let some other thread run.
2) Preemptive threading where threads are swapped around by some hardware mechanism, perhaps an interrupt or perhaps by interleaving instructions of each.
Could we fix the names of these ?
I'll now go back to trying to understand the question
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Heater, I was referring to a proposal I made on page 15, post 13 (you have to count from the top), in which a hardware mechanism was explained to perform multitasking and multithreading. There were two modes: single-instruction and threaded. In single-instruction mode, the switch occurred after every instruction, whereas in threaded mode, a TASKNXT instruction would signal the switch. These are what you talk about above in #2 and #1, respectively.
The big issue here is "HOW MUCH hardware do we throw at the goal of multitasking?" I'm for a minimalist approach, myself, as it is simpler and consequently very open-ended. A hardware approach, on the other hand, has some potential benefits, such as WAITing tasks which don't take time slots, but requires a lot of stuff to make work, in terms of gates and flipflops, and it tends to be quite modal.
1. Preemptive multitasking: High-priority tasks can interrupt lower-priority tasks. This wasn't under consideration here.
2. Single-cycle hardware multitasking, interleaved multitasking: Task switching occurs after each clock, passing control, in turn, to the next task in the active task list. Tasks can't monopolize the cog, so long as the WAITxxx instructions are handled by special, task-aware hardware.
3. Cooperative multitasking: Each task decides when to relinquish control to the next task. One task can monopolize the cog if it wants to.
4. Round-robin multitasking: Any tasking system using a circular list that gives each task a turn in order. #2 and #3, above, are examples of this; #1 is not.
I've been using "thread" and "task" interchangeably. Some may cringe at the informality.
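As an illustration of #3 combined with #4 (cooperative tasks served from a circular list), here is a minimal Python sketch using generators. It is an analogy only: each `yield` stands in for the point where a task would hand over the cog (something like TASKNXT), and the names are invented.

```python
from collections import deque


def worker(name, steps, log):
    """A cooperative task: does one step of work, then yields the cog."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield                         # voluntarily give up control


def run_round_robin(tasks):
    """Give each task a turn in order until all of them have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)                # run the task up to its next yield
            ready.append(task)        # back of the circular list
        except StopIteration:
            pass                      # task finished; drop it from the list


log = []
run_round_robin([worker("A", 2, log), worker("B", 3, log)])
print(log)
```

A task that never yields would monopolize the loop, which is the property called out under #3.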
For me, what matters first of all is that it functions, not which system is used. Nothing else.
In my opinion it would be more useful to have an instruction to LOAD the COG from a HUB address to a COG address, X longs.
That is, to transfer any block of CODE/DATA to or from the COG.
With that instruction I can program my own multitask protocol.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔ Nothing is impossible, there are only different degrees of difficulty.
Phil: I feel better now, you are as confused as I am
No, seriously, I like your exposition up until para. 4.
You see, as far as I can tell the concept of "round robin" is orthogonal to that of "preemptive vs cooperative multitasking".
There is no reason why cooperative threads can't "suspend" into a scheduler that selects the next cooperative thread in any way it chooses: round robin, prioritized, whatever.
Similarly there is no reason a thread can't be "preempted" (by interrupt say) into a scheduler that selects the next thread in any way it chooses.
I have implemented various combinations of these in the distant past.
However I guess "single-cycle" threading is necessarily round robin unless you want to build a scheduler algorithm into the CPU !!
By the way "single-cycle" threading should I think be called "single-instruction" threading.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
I didn't follow all the arguments, and maybe this is not new, but: first we see the cogs, second we start tasks in them. What about thinking the other way round: start tasks and place them in cogs. In this case you would not have to distinguish at first whether you have multitasking or multiprocessing.
Tasknew (program, priority, required resources)
Tasknew starts the program. If the priority is high, it will use a new cog. Tasks using no resources (like counters) can be freely distributed among cogs.
I miss a mechanism for inter-task communication. Right now, there is a lock. But couldn't there be a means of messaging?
...In my opinion it would be more useful to have an instruction to LOAD the COG from a HUB address to a COG address, X longs.
That is, to transfer any block of CODE/DATA to or from the COG.
With that instruction I can program my own multitask protocol.
Right on! But there is no speed benefit to making hardware perform this task, since you've got time for six instructions between RDLONGs, which is more than enough to get the job done.
Sapieha
Post Edited (Sapieha) : 9/25/2008 11:56:57 AM GMT
THAT WOULD BE WELL WORTH IT!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
'Still some PropSTICK Kit bare PCBs left!
Post Edited (Phil Pilgrim (PhiPi)) : 9/2/2008 2:34:51 AM GMT
Speed:
16 cogs @ 160Mips / 4cycles = 640 Mips
8 cogs @ 160Mips = 1280 Mips
So yes my version would only be half as fast, but I bet the overhead of the 8 cogs will grind the hub access below this.
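The two figures above can be checked with a few lines of Python. This is only the peak arithmetic from the post: it assumes a 160MHz clock for both chips and ignores hub stalls and any multitasking overhead.

```python
def aggregate_mips(cogs, clock_mhz, cycles_per_instruction):
    """Peak instructions per microsecond summed over all cogs."""
    return cogs * clock_mhz / cycles_per_instruction


prop_1_5 = aggregate_mips(cogs=16, clock_mhz=160, cycles_per_instruction=4)
prop_2 = aggregate_mips(cogs=8, clock_mhz=160, cycles_per_instruction=1)

print(prop_1_5, prop_2, prop_1_5 / prop_2)
```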
Just my 2c.
Frankly, what I like about the tasking is how much easier it would make things like shared serial, ps/2, mouse etc drivers that don't need anywhere near a cog's bandwidth; not to mention it more than compensating for 8 cogs instead of the previously assumed 16.
Best,
Bill
ps
Thanks! I love my new life - and my new wife :) :) :) but it's nice to start to have time to Propeller again; and I now have space to set up my scope, logic analyzer etc that I did not have physical space to set up before (small condo). It will help me finish the couple of Propeller projects that I had suspended.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Chip Gracey
Parallax, Inc.
On "Hang section": we have similar experience; I started with Intel 4- and 8-bit CPUs.
In the beginning of the '80s I designed and programmed a system that ran over 400 electric motors from one system, with reprogramming and control over a modem.
On "Multitasking": it is one of the possibilities, and it is not bad. Hub access is not so critical; multitasking only changes the one COG that has it enabled. All other COGs keep their time slots untouched.
On "My opinion": in the normal case "Bigger Is Better" does not always hold for 8 versus 16 COGs.
You will have improved counters. I have many ideas for them too (the capability to count down like a watchdog, with a flag to test for completion).
Counting on a PIN like a frequency counter, stopping at a given CNT.
No use of PINs to communicate between COGs on the same chip (that is a waste of PIN resources); rather a shadow register to communicate directly.
It is the control features and their programmability that are the "Power of Propeller".
On "Suggestions": it is Chip who has the concept of how it must be implemented.
Ps. In your next post you said "It introduces complexity that will scare the target audience." ... In Prop I mode (after reset) it functions the same as Prop I ... No problem!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Sapieha
Post Edited (Sapieha) : 9/2/2008 3:59:06 AM GMT
Although it's not necessarily required for the hobbyist or student. Most can get along quite well with the cogs available and with capacity to spare.
It's only when one wants to use more than what's available that there is a problem. Just as you don't need to understand CogNew() if you're writing a self-contained Spin program, and you don't need to understand PASM if writing only in Spin, you don't need to understand task switching within a cog if you don't need it or use it.
The complexity is there for those who want to take it. It doesn't mean it will be taken but some people will do no end of clever things and most people will just use those objects as I imagine most people do with the TV drivers.
If people want to start understanding how very clever code works then they'll have to buckle down and get their heads straight and work at understanding it, but it's not required that they understand it or have to use one particular model of programming.
My VCR performed some pretty clever tricks I never understood; it never scared me off, I just never used them. It did everything I wanted but could - apparently - do a whole lot more. I don't really see a souped-up Propeller Mk II being any different.
Would the single instruction WAITCNT / WAITPNE / WAITPEQ functionality remain in the case of a cog running a single thread?
By the way, what happened to the notion of a WAITPNE / WAITPEQ with a timeout? You could implement it with two threads in this scheme.
Regarding complexity ... as long as it's all optional, then it only affects those who need to use it. You get into problems when there are some
features where you sometimes need to use them or some kind of canned routine needs to use them and it requires you to master a whole bunch
of new, sometimes contradictory stuff that you could ignore before, yet your existing straightforward code now has to be modified because these
new features were introduced in a way you really didn't understand or necessarily want.
Post Edited (Mike Green) : 9/2/2008 4:00:02 AM GMT
Chip Gracey
Parallax, Inc.
On...
"By the way, what happened to the notion of a WAITPNE / WAITPEQ with a timeout? You could implement it with two threads in this scheme."
You said...
"This is a pain to implement in hardware and, personally, I think it's not quite on target, because if you timeout and then the event occurs, you might as well have been waiting in a software loop the whole time. It would only provide a potential power savings, and nothing else."
With timeout capability, only signals that arrive within this window are of interest. All others are simply an ERROR and must be flagged as such. In other words, the program must handle that ERROR without waiting.
Ps. Maybe flag C or flag Z can be used as the ERROR flag.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Sapieha
Post Edited (Sapieha) : 9/2/2008 4:43:34 AM GMT
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Wiki: Share the coolness!
Chat in real time with other Propellerheads on IRC #propeller @ freenode.net
-Phil
If you think about it, threaded-mode hardware multitasking is no more efficient than just doing JMPRETD multitasking (JMPRETD is like JMPRET, but executes two trailing instructions, hence it is single-cycle, just like TASKNXT would be). The only thing extra the hardware multithreading would provide is automatic flag saving/restoring and remapping incrementing. Big woop! TOTALLY NOT WORTH THE ADDED HARDWARE COMPLEXITY. Single-cycle hardware multitasking absolutely requires more hardware, but is it really necessary if we have good multithreading?
I think that a register remapping scheme is really the only vital ingredient required to do multithreading. Even flag swapping is not that important in threaded multitasking, because you can segment the threads at points where the flags don't matter. Think about the new indirect registers... they are a singular resource that only one thread can use at a time, so count them out for general intra-thread use. But, they can be exploited as a task switcher:
        JMPRETD INDA,INDB
This instruction would store the thread's return address to a circular buffer via INDA and fetch the new jump address from a circular buffer via INDB. INDB would always equal INDA plus 1, with both wrapping around. There is a single-cycle round-robin task switcher! Furthermore, INDA's pointer could select the register remapping index, automatically incrementing with each JMPRETD INDA,INDB.
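To make the pointer bookkeeping concrete, here is a small Python model of the switch described above. It is only a sketch of the idea as worded in this post: the class and function names are made up, addresses are plain integers, and the two trailing instructions that JMPRETD would execute are not modelled.

```python
class TaskSwitcher:
    """Toy model of a JMPRETD INDA,INDB round-robin switch."""

    def __init__(self, entry_points):
        self.slots = list(entry_points)  # circular buffer of resume addresses
        self.inda = 0                    # slot of the running task (also its remap index)

    @property
    def indb(self):
        # INDB always equals INDA plus 1, wrapping around the buffer.
        return (self.inda + 1) % len(self.slots)

    def jmpretd(self, return_address):
        self.slots[self.inda] = return_address  # save where this task resumes
        target = self.slots[self.indb]          # fetch the next task's address
        self.inda = self.indb                   # both pointers advance together
        return target


def run(entry_points, slice_len, switches):
    """Run each task for slice_len 'instructions' between switches."""
    switcher = TaskSwitcher(entry_points)
    pc = entry_points[0]
    trace = []
    for _ in range(switches):
        pc += slice_len                       # the task does some work
        trace.append((switcher.inda, pc))     # (task/remap index, address reached)
        pc = switcher.jmpretd(pc)             # hand over to the next task
    return trace


print(run([0, 100, 200], slice_len=3, switches=6))
```

Each task resumes exactly where it left off, and the number of tasks is set only by the length of the buffer.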
The cool thing about this whole approach is that it's almost zero additional hardware and can handle as many tasks as you'll ever want without any arbitrary constraint. It would also make it easy to play with remapping without having to commit to multitasking: just set the remap factor (0..7) and do a SETPTRA.
I feel best about this approach. Does this take the wind out of anyone's sails? The biggest drawback to some of you would be that any WAITing slots would still be taking time, but that can be mitigated by some clever programming outside the thread. In fact, 'threads' are 'threads' only insofar as you call them such. They could be hybridized to do all kinds of different things, since you're never bound within some special hardware mode.
I think this fits much more into the KISS principle and doesn't overload the Propeller architecture. It's a simple and organic extension to what we've already got and allows you to ease into the whole multithreading concept rather than having to 'turn it on and stand back', like the approach we've been discussing would dictate. Besides, the elements that would make multithreading viable, like the register remapping and use of INDA to select the remap index, might be put to use for yet-unimagined purposes.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Chip Gracey
Parallax, Inc.
Post Edited (Chip Gracey (Parallax)) : 9/2/2008 5:53:35 AM GMT
(Just kidding!) Chip, if the single cycle multitasking is getting too messy, I've got nothing against cooperative multitasking. Not having a hardware mechanism to save and restore the cc bits leaves me a bit cold, though. The Propeller is unique in allowing the programmer to determine when Z and C get modified. Consequently, it's possible to use these flags to record conditions that may persist for thousands of instructions. I've done this in my own programming to save machine cycles. But if this is a feature that must be sacrificed on the altar of simplicity, I may whimper, but I'd be loath to put up a fight. (I have a feeling this matter will be revisited more than once before all is said and done, anyway! And I love the give and take of this discussion!)
-Phil
JMPRETD INDA,INDB   WC, WZ   'save old Z/C to D[10..9], load new Z/C from S[10..9]
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Chip Gracey
Parallax, Inc.
Can we get the terminology sorted out, "threaded-mode hardware multitasking", "multitasking", "hardware multithreading", "Single-cycle hardware multitasking", "multithreading". And that's just the first paragraph. Later we get "threaded multitasking" and so it goes.
To my simple mind we have:
1) Cooperative threading where a thread runs at full speed until it executes some instruction that causes it to suspend and let some other thread run.
2) Preemptive threading where threads are swapped around by some hardware mechanism, perhaps an interrupt or perhaps by interleaving instructions of each.
Could we fix the names of these ?
I'll now go back to trying to understand the question
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
-Phil
The big issue here is "HOW MUCH hardware do we throw at the goal of multitasking?" I'm for a minimalist approach, myself, as it is simpler and consequently very open-ended. A hardware approach, on the other hand, has some potential benefits, such as WAITing tasks which don't take time slots, but requires a lot of stuff to make work, in terms of gates and flipflops, and it tends to be quite modal.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Chip Gracey
Parallax, Inc.
Here are some definitions as I understand them:
1. Preemptive multitasking: High priority tasks can interrupt lower priority tasks. This wasn't under consideration here.
2. Single-cycle hardware multitasking, interleaved multitasking: Task switching occurs after each clock, passing control, in turn, to the next task in the active task list. Tasks can't monopolize the cog, so long as the WAITxxx instructions are handled by special, task-aware hardware.
3. Cooperative multitasking: Each task decides when to relinquish control to the next task. One task can monopolize the cog if it wants to.
4. Round-robin multitasking: Any tasking system using a circular list that gives each task a turn in order. #2 and #3, above, are examples of this; #1 is not.
I've been using "thread" and "task" interchangeably. Some may cringe at the informality.
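The difference between definitions 2 and 3 shows up in the order instructions get executed. The Python sketch below is only an illustration: tasks are lists of made-up instruction labels, and `YIELD` is a hypothetical marker for the point where a cooperative task gives up the cog. Both schedulers use a circular list, so both are round-robin in the sense of definition 4.

```python
from collections import deque

YIELD = "yield"  # hypothetical marker: the task voluntarily gives up the cog


def interleaved(tasks):
    """Definition 2: switch to the next task after every instruction."""
    queue = deque((name, deque(op for op in ops if op != YIELD))
                  for name, ops in tasks.items())
    order = []
    while queue:
        name, ops = queue.popleft()
        if not ops:
            continue                      # finished task leaves the active list
        order.append(name + ":" + ops.popleft())
        queue.append((name, ops))
    return order


def cooperative(tasks):
    """Definition 3: a task runs until it yields or finishes."""
    queue = deque((name, deque(ops)) for name, ops in tasks.items())
    order = []
    while queue:
        name, ops = queue.popleft()
        while ops:
            op = ops.popleft()
            if op == YIELD:
                break                     # task chose to relinquish control
            order.append(name + ":" + op)
        if ops:
            queue.append((name, ops))     # more work left: back of the list
    return order


tasks = {"A": ["a1", "a2", YIELD, "a3"], "B": ["b1", YIELD, "b2"]}
print(interleaved(tasks))
print(cooperative(tasks))
```

A cooperative task that never reaches a `YIELD` simply keeps the cog until it finishes, which is the monopolizing case mentioned in definition 3.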
-Phil
For me, what matters first of all is the function, not which system is used, if any.
In my opinion it is more useful to have an instruction to LOAD a COG: from a HUB address to a COG address, X longs.
That is, to transfer any block of CODE/DATA to or from a COG.
With that instruction I can program my own multitasking protocol.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Sapieha
Post Edited (Sapieha) : 9/2/2008 6:43:55 AM GMT
No, seriously, I like your exposition up until para. 4.
You see, as far as I can tell the concept of "round robin" is orthogonal to that of "preemptive vs cooperative multitasking".
There is no reason why cooperative threads can't "suspend" into a scheduler that selects the next cooperative thread in any way it chooses: round robin, prioritized, whatever.
Similarly there is no reason a thread can't be "preempted" (by interrupt say) into a scheduler that selects the next thread in any way it chooses.
I have implemented various combinations of these in the distant past.
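A small Python sketch of that point, using generators as cooperative threads: each `yield` is a suspend into the scheduler, and the selection policy is just a function passed in. All names here are illustrative and nothing about it is Propeller-specific.

```python
def worker(name, steps):
    """A cooperative thread: each yield suspends into the scheduler."""
    for i in range(steps):
        yield f"{name}{i}"


def run(threads, pick):
    """threads: list of (priority, generator). pick: policy returning the
    index in the ready list of the thread to resume next."""
    ready = list(threads)
    log = []
    while ready:
        entry = ready.pop(pick(ready))
        try:
            log.append(next(entry[1]))   # resume until its next suspend
            ready.append(entry)          # still alive: back on the ready list
        except StopIteration:
            pass                         # thread finished, drop it
    return log


def round_robin(ready):
    return 0  # always take the thread that has waited longest


def highest_priority(ready):
    return max(range(len(ready)), key=lambda i: ready[i][0])


def make_threads():
    return [(1, worker("A", 2)), (2, worker("B", 2))]


print(run(make_threads(), round_robin))
print(run(make_threads(), highest_priority))
```

The threads are identical in both runs; only the `pick` function changes. A preemptive system could call the same kind of policy function from an interrupt instead of from a voluntary suspend.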
However I guess "single-cycle" threading is necessarily round robin unless you want to build a scheduler algorithm into the CPU !!
By the way "single-cycle" threading should I think be called "single-instruction" threading.
Tasknew (program, priority, required resources)
Tasknew starts a program. If the priority is high, it will use a new cog. Tasks using no resources (like counters) can be freely distributed among cogs.
I miss a mechanism for inter-task communication. Right now, there is a lock. But couldn't there be a means of messaging?
Right on! But there is no speed benefit to making hardware perform this task, since you've got time for six instructions between RDLONG's, which is more than enough to get the job done.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Chip Gracey
Parallax, Inc.
If I start one COG as a multitasking system, I am not interested in speed. It is first of all about getting more functionality into a COG, with tasks that are not time-critical.
That way I have 7 COGs for critical tasks.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Sapieha