Chip !
What extra instructions are being added besides mul and div?
I haven't been following this thread (or any other Prop2 threads), so I'm sorry for being ignorant.
Chip Gracey (Parallax) said...
I'm starting to think maybe just simple I/O pins that aren't brought out are the way to go, if anything. It's a simple circuit and requires one sentence of explanation. If that's how it winds up being, people had better use good sense about how to allocate them.
So far, by the way, I've only heard people say that they use single pins at a time to release other cogs from WAITPxx instructions. If that's all that's needed, 64 internal I/Os is total overkill....
In this case, perhaps 32 pins could be data pins, and the other 32 could be like a global DIRA register, common to all cogs, but where each cog has to "check out" the bits it wants to use via a mask (similar to checking out a lock), and all cogs can read this global DIRA to see which internal pins are already allocated.
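To make the "check out via a mask" idea concrete, here is a minimal sketch in Python. It is only a toy model of the bookkeeping, not a description of any real Propeller hardware; the class name `SharedDir` and its methods are hypothetical.

```python
class SharedDir:
    """Toy model of a global 32-bit direction register shared by all cogs."""

    def __init__(self):
        # cog id -> bits currently checked out by that cog (hypothetical bookkeeping)
        self.owner_mask = {}

    def allocated(self):
        """Bits already claimed by any cog (what every cog could read back)."""
        bits = 0
        for mask in self.owner_mask.values():
            bits |= mask
        return bits

    def check_out(self, cog, mask):
        """Claim `mask` for `cog`; fail if any requested bit is held by another cog."""
        mask &= 0xFFFFFFFF
        held_by_others = self.allocated() & ~self.owner_mask.get(cog, 0)
        if mask & held_by_others:
            return False
        self.owner_mask[cog] = self.owner_mask.get(cog, 0) | mask
        return True

    def release(self, cog, mask):
        """Give back some or all of the bits this cog had checked out."""
        self.owner_mask[cog] = self.owner_mask.get(cog, 0) & ~mask


shared = SharedDir()
first = shared.check_out(0, 0x0000_00FF)    # cog 0 takes bits 0..7
second = shared.check_out(1, 0x0000_0180)   # bits 7..8: bit 7 collides with cog 0
third = shared.check_out(1, 0x0000_0300)    # bits 8..9: free, so this succeeds
```

In this model a failed check-out leaves the register untouched, much like a lock that is already taken.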
Jonathan
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
lonesock
Piranha are people too.
You will not be able to shake my belief that internal I/O pins are a kludge and a bad fix for a problem...
Okay, so if the main use of all the internal I/O pins will just be for waiting in an idle state for some event to happen... why not make a new "HUB OP" called wait for signal?
You could have 8 signal lines in the hub which would kick any cog out of a low power state.
....
So we should have two instructions.
"CogState" - Returns the state of the cog specified, i.e. it returns -1 if the cog is stopped, 0 if the cog is running, 1 if the cog is idling.
"CogSignal" - Forces the specified cog to exit the low power idle state and begin executing the next instruction.
...
I could see these two hub instructions being more useful than internal I/Os, as you could then dynamically look to see which cogs are stopped and even restart cogs that may have missed waitCNT windows or whatever.
It would be far better if we had two instructions like this vs a bunch of extra internal I/O.
...
In fact this would make a lot of the code written for the Propeller chip a lot nicer. For example, in my SD card block driver and in a lot of other drivers where a host cog waits on a device driver cog to complete a request, we could utilize this instruction setup to have the host and the slave processor enter and exit low power states until both finish.
First the host would wake up the device driver cog. And then the host would go to sleep waiting. The device driver cog would then execute the command and wake the host back up so that the host could go back along on its merry way.
...
If we do make these two instructions, the signal commands to the cogs should be treated as flags which are only sampled when a cog is asleep.
For example, in the example above, if the device driver cog finished its task before the SPIN host fell asleep, the host cog would never wake up again unless the instruction sets flags which are sampled only when the cog is asleep, because the wakeup event would have already occurred.
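The lost-wakeup case described here can be shown with a small simulation. This is a sketch only: "CogSignal" is a proposed instruction, and the `Cog` class, its `latched` switch and the helper function are hypothetical names used to compare the two behaviours.

```python
class Cog:
    """Toy cog with a wake flag. `latched` selects which behaviour is modelled."""

    def __init__(self, latched):
        self.latched = latched
        self.pending = False   # remembered signal (latched variant only)
        self.asleep = False

    def signal(self):
        """Model of the proposed CogSignal arriving at this cog."""
        if self.asleep:
            self.asleep = False      # sleeping cog wakes immediately
        elif self.latched:
            self.pending = True      # running cog remembers the signal

    def sleep(self):
        """Enter idle; a pending latched signal is consumed and the cog keeps running."""
        if self.pending:
            self.pending = False
        else:
            self.asleep = True


def driver_finishes_first(latched):
    """Driver signals the host before the host has gone to sleep."""
    host = Cog(latched)
    host.signal()        # driver is done already
    host.sleep()         # host only now tries to sleep
    return host.asleep   # True means the host is stuck asleep


stuck_without_flag = driver_finishes_first(latched=False)
stuck_with_flag = driver_finishes_first(latched=True)
```

Without the flag the early signal is simply lost and the host sleeps forever; with the flag the host falls straight through its sleep.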
Extra ports are useful both for signalling cog state and as mailboxes between cogs, without needing to wait for a hub window.
One possibility is to have math coroutines in one cog and Spin or some other interpreter in another cog, communicating directly without needing to wait for a hub window.
I have many other uses for those mailboxes, but some are customer-related and I can't talk about them on the forum.
SORRY.
One thing I can say about the possibility of communicating directly between cogs: it is much more useful than some people realize.
Regards
Christoffer J
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔ Nothing is impossible, there are only different degrees of difficulty. For every stupid question there is at least one intelligent answer. Don't guess - ask instead. If you don't ask you won't know. If you're gonna construct something, make it as simple as possible yet as versatile as possible.
Please don't listen to other people too much... except me (of course). Build it the way you want it. If it doesn't tickle you and inspire you, then it isn't worth doing. If it is too complex for the average guy (such as me)... that will simply create a market for all kinds of inventiveness by the engineering gurus around here.... "interpreting the Propeller, my way." Leave room for all kinds of 3rd party offerings... it will help to keep people invested in your offerings.
I was very interested to learn that you actually built your own FPGAs... and a little depressed that someone wasn't following after you to market/explain the little beasty to the rest of us. A Prop/FPGA combo would be irresistible and a perfect little product for educators and students. And look what it would do for all kinds of bottlenecks that now require dedicated boards to alleviate. A Propeller with a Swiss Army Knife... what could be sweeter?
I begged for this a couple of years ago... I'm still begging :)
Don't be misled by the fact that people have so far only used single pins for synchronization - that's only because sacrificing external pins to do internal tasks is not generally a worthwhile tradeoff (i.e. the external pins are scarce enough already, and so much more useful for doing other things!).
But if 32 internal pins were available which did not consume external pins, then we would make good use of 32. If 64 were available, we would make good use of 64. How? Damned if I know exactly. But let's say we had 32 bits - that provides a simple way of being able to wake up any of up to 7 slave cogs and pass it a chunk of data to chew on - all in a single instruction. Results could be returned the same way a few instructions later - possibly 10 to 20 times faster than doing the same thing if you have to do it via the hub.
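One way to picture "wake a slave and hand it data in a single write" is a 32-bit word split into a select field and a payload. The layout below (24 data bits plus one wake bit per slave) is purely an assumption for illustration; nothing in the thread fixes a layout, and the function names are hypothetical.

```python
DATA_BITS = 24                       # assumed payload width
DATA_MASK = (1 << DATA_BITS) - 1     # bits 0..23 carry the data

def pack(slave, data):
    """Build a 32-bit word: a one-hot wake bit for slave 0..6 plus a 24-bit payload."""
    if not 0 <= slave < 7:
        raise ValueError("slave must be 0..6")
    return (1 << (DATA_BITS + slave)) | (data & DATA_MASK)

def wakes(slave, word):
    """WAITPEQ-style test: the slave resumes when its own select bit is set."""
    mask = 1 << (DATA_BITS + slave)
    return (word & mask) == mask

def payload(word):
    """Extract the data field that the woken slave would read."""
    return word & DATA_MASK

word = pack(3, 0xABCDEF)   # one write: select slave 3 and deliver its data
```

Each slave would wait on its own bit, so only the addressed cog leaves its wait state while the others stay idle.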
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina
Turbulence uses a lot more than just one pin to sync. It was reported that the LEDs on a demoboard go nuts while it's running, and that it won't run if the I/O pins that normally go to the breadboard are shorted.
I see the usefulness and simplicity of a pinless pin set. It's there if you want it, and if you don't you can ignore it, and it works just like the pins that do have external pins so you don't have to learn anything new to use it.
localroger said...
Turbulence uses a lot more than just one pin to sync. It was reported that the LEDs on a demoboard go nuts while it's running, and that it won't run if the I/O pins that normally go to the breadboard are shorted.
Show me where (source is available)!
IIRC P0-P8 are used for inter-cog data transfers only; the LEDs come up because it's a VGA demo. Synchronisation - where required - is done through waitcnt. SCNR
RossH said...
Results could be returned the same way a few instructions later - possibly 10 to 20 times faster than doing the same thing if you have to do it via the hub.
Can you back that up? I always hear wild claims about hub being slow but I'm not buying 10 to 20 times (until I see a concrete example).
Why are there no examples of pin use between cogs, except simple 1-pin examples?
I cannot believe this question was even asked... Where on earth are you living? We simply cannot spare the pins. Look at all the kludges we are using to get more pins... TriBlade & RamBlade really use 2 props and no latches, DracBlade uses 2/3 latches, Bill uses latches and is now trying to use SPI sram, Ale is using a FPGA, Hydra uses a CPLD?, coley & baggers are using 3 props, (sorry for the ones I have forgotten).
There are a number here that have pointed out uses if we had spare (unused) pins. Fact is, we don't, so we cannot waste them. There are all sorts of comms between cogs we could do if only for the spare pins.
Chip, please, please, go with the 64 internal I/O pins. They are simple from a user's point of view and we WILL use them.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔ Links to other interesting threads:
Okay, so everyone who posted above seems to have just ignored my idea...
...But let's say we use my idea. Then what would internal pins be good for? They won't be external pins. Please give examples other than speed; it's a rather moot point to have more speed if it is not needed.
Hub communication isn't terrible. We've got by on it so far.
Actually, hub communication IS pretty bad compared to direct cog-to-cog communication.
Let's transfer a byte from cog a to cog b via the hub, then returning a byte from cog b to cog a ...
This will take at least 2 hub syncs for each cog. Assume the best case (i.e. cog a is synced with the hub when it writes the byte) - 7 clocks. Let's assume b=a+1, and that cog b is sitting in a tight loop and happens to read at just the right time - 7 clocks. Then cog b writes back - but it isn't synced any more, so that will take 21 clocks. Meanwhile cog a has been sitting in a tight loop waiting for the reply - let's hope we get lucky and it just happens to be synced with the hub when it reads - 7 clocks.
So a minimum of 42 clocks. And that's the best case - if a and b are not next to each other, and/or the loops don't align with the hub nicely, it could be nearly twice that.
Now let's do the same using an internal register ...
Assume both cog a and b have already set their direction and output bits. Cog a writes a byte to the internal register (e.g. in bits 0..7) and includes the pattern that b is looking for (e.g. in the 8..15 bits) - 4 clocks. Cog b is sitting at a waitpeq looking for "its" pattern in bits 8..15. It sees it and wakes up - 1 clock. Cog b reads the value - 4 clocks. Cog b writes a byte value back to the register (bits 17..24) and also a value that cog a knows to look for (bits 25 .. 31) - 4 clocks. Meanwhile cog a has executed a waitpeq and now wakes up - 1 clock. It reads the value - 4 clocks.
So a total of 18 clocks.
So for two-way comms it's somewhere between 2 to 5 times faster, depending on the geometry involved. And it's constant!
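The arithmetic in this comparison can be checked by summing the per-step clock counts quoted in the post. The numbers are the post's own estimates, not measurements; the dictionary labels are just descriptive names.

```python
# Clock counts exactly as estimated in the post (best case for the hub path).
hub_steps = {
    "cog a writes (already in sync)": 7,
    "cog b reads (adjacent cog, lucky timing)": 7,
    "cog b writes back (out of sync)": 21,
    "cog a reads the reply (lucky timing)": 7,
}
internal_steps = {
    "cog a writes data + pattern": 4,
    "cog b wakes from waitpeq": 1,
    "cog b reads": 4,
    "cog b writes reply + pattern": 4,
    "cog a wakes from waitpeq": 1,
    "cog a reads": 4,
}

hub_total = sum(hub_steps.values())             # 42
internal_total = sum(internal_steps.values())   # 18

best_ratio = hub_total / internal_total         # about 2.3
# "Nearly twice that" for an unlucky hub alignment gives a rough upper bound.
upper_ratio = 2 * hub_total / internal_total    # about 4.7
```

On these figures the speed-up lands between roughly 2.3x and 4.7x, which is where the "2 to 5 times" range comes from.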
It gets even better if you only need to sync (i.e. not transfer any data) - 5 clocks (constant) vs a minimum of 18 to a maximum of about 46 clocks - i.e. up to 10 times faster (okay, 20 times may have been a bit of exaggeration!)
And the power consumption is reduced by 7/8 for those cogs in waitpeq state.
What's not to like?
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina
I think, after reading this discussion, I will gladly stand by my second comment, on this topic, trusting the alphas here to set good norms, leaving us a nice resource, and the easy option of simply doing things the pure, most easily adapted way.
There will be some conflicts and such, they will largely get sorted out, and we have a nice resource addition to the propeller. Perhaps some sample case examples will serve to get things rolling in the right directions sooner rather than later. After all, we do now have a lot of understanding and experience on what's going to be an issue, and what just isn't.
That said, I am in total agreement with you on keeping the propeller as pure as possible, because that is really a big part of what differentiates it.
RossH said...
Let's transfer a byte from cog a to cog b via the hub, then returning a byte from cog b to cog a ...
This will take at least 2 hub syncs for each cog. Assume the best case (i.e. cog a is synced with the hub when it writes the byte) - 7 clocks. Let's assume b=a+1, and that cog b is sitting in a tight loop and happens to read at just the right time - 7 clocks. Then cog b writes back - but it isn't synced any more, so that will take 21 clocks. Meanwhile cog a has been sitting in a tight loop waiting for the reply - let's hope we get lucky and it just happens to be synced with the hub when it reads - 7 clocks.
So a minimum of 42 clocks. And that's the best case - if a and b are not next to each other, and/or the loops don't align with the hub nicely, it could be nearly twice that.
Now let's do the same using an internal register ...
Assume both cog a and b have already set their direction and output bits. Cog a writes a byte to the internal register (e.g. in bits 0..7) and includes the pattern that b is looking for (e.g. in the 8..15 bits) - 4 clocks. Cog b is sitting at a waitpeq looking for "its" pattern in bits 8..15. It sees it and wakes up - 1 clock. Cog b reads the value - 4 clocks. Cog b writes a byte value back to the register (bits 17..24) and also a value that cog a knows to look for (bits 25 .. 31) - 4 clocks. Meanwhile cog a has executed a waitpeq and now wakes up - 1 clock. It reads the value - 4 clocks.
So a total of 18 clocks.
So for two-way comms it's somewhere between 2 to 5 times faster, depending on the geometry involved. And it's constant!
Your timing is slightly off. The internal register case uses wait-read-write so the same should apply for the hub version (i.e. I don't see why b writing back shouldn't be in sync). Anyway, best/worst case hub for any cog pair is actually 40/55 cycles against 22 for the internal register.
RossH said...
It gets even better if you only need to sync (i.e. not transfer any data) - 5 clocks (constant) vs a minimum of 18 to a maximum of about 46 clocks - i.e. up to 10 times faster (okay, 20 times may have been a bit of exaggeration!)
I'm amazed, where do those numbers come from? For starters, the 6 cycles for waitpxx only apply for the ideal case but that's acceptable for this example. Then the ideal case for hub sync is 10 cycles (minimum 8 cycles + 2 cog offset). Worst case here - cog pair is far apart (+12) and the initial hub window is missed (+15) - is 37. Throwing in a branch to get out of the loop makes that 14/41 but that's still not too bad. If you use hub interaction you damn well make sure that you are in sync and hand pick the cog pairs :)
Having said all that, I'd happily use internal pin banks for tasks where it makes sense. My point was that there are too many claims flying around here without backup.
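For reference, the hub-sync figures above add up as follows. The constants are taken from the post; the 4-cycle branch cost is inferred from the step from 10/37 to 14/41 rather than stated outright, so treat it as an assumption.

```python
MIN_HUB_OP = 8         # minimum cycles for a hub operation (from the post)
ADJACENT_OFFSET = 2    # cog offset for a well-chosen pair (from the post)
FAR_APART = 12         # extra cycles when the cog pair is far apart
MISSED_WINDOW = 15     # extra cycles when the first hub window is missed
BRANCH = 4             # assumed cost of the loop-exit branch (14 - 10)

ideal = MIN_HUB_OP + ADJACENT_OFFSET          # 10
worst = ideal + FAR_APART + MISSED_WINDOW     # 37

ideal_with_branch = ideal + BRANCH            # 14
worst_with_branch = worst + BRANCH            # 41
```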
Kye:
* Instructions have to be documented.
* We have to have the available instruction opcode space. I don't know how Chip is finding this now.
* It does not achieve the performance we can achieve with > 32 bits.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔ Links to other interesting threads:
kuroneko said...
Your timing is slightly off. The internal register case uses wait-read-write so the same should apply for the hub version (i.e. I don't see why b writing back shouldn't be in sync). Anyway, best/worst case hub for any cog pair is actually 40/55 cycles against 22 for the internal register.
Not only is the internal register method much faster, it is constant in the time it takes no matter where the hub is, while the "best case" hub method depends on catching the hub at exactly the right time - which is unlikely to be true in a real-world case (where you would probably need to do some processing before returning a value). In any case, I'd be happy with a 3-fold performance improvement.
kuroneko said...
I'm amazed, where do those numbers come from? For starters, the 6 cycles for waitpxx only apply for the ideal case but that's acceptable for this example. Then the ideal case for hub sync is 10 cycles (minimum 8 cycles + 2 cog offset). Worst case here - cog pair is far apart (+12) and the initial hub window is missed (+15) - is 37. Throwing in a branch to get out of the loop makes that 14/41 but that's still not too bad. If you use hub interaction you damn well make sure that you are in sync and hand pick the cog pairs :)
The minimum time for waitpeq is given as 5 clock cycles (pp 369) - but you are not re-executing the whole waitpeq each time. You execute it once and wait. The documentation (pp 222) says the I/O pins are checked every clock cycle. My assumption was that some of the waitpeq time is setting up to detect the pin match and that the cog will resume on the next clock cycle after the pin match is detected - not 5 clock cycles later. But perhaps I am wrong here - some Propeller guru can confirm how long it takes from detection of the matching pin state till a cog that is already in a waitpeq executes the next instruction. And in the waitpeq case the next instruction will be to immediately read the data. So that's somewhere between 5 and 9 clocks for the internal register case (depending on what the gurus say), versus 40-odd for the hub case. So a 4-fold to 5-fold performance improvement.
But to be honest, arguing about the exact numbers is kind of missing the point. I don't really care whether it's a 3-fold, 4-fold or 10-fold improvement - I can't understand why anyone would NOT want to do it if they could. It's a very simple hardware change (according to Chip) and one which is absolutely in keeping with the existing Prop design (it can in fact be done on the current Prop if you are willing to sacrifice nearly all your I/O pins). For this simple change you get somewhere between a 3-fold to 5-fold (by your figuring) and a 10-fold (by my admittedly optimistic figuring) increase in performance, and you also get a significant reduction in power consumption.
A no-brainer, really.
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina
kuroneko, you misunderstood my point. Someone said Turbulence just uses one odd pin, for sync. My point was that it uses all eight of the odd pins, a lot more than one. I understood that it's doing high-speed intercog transfers instead of sync, but I phrased it kind of confusingly.
"The LEDs going nuts on Turbulence" might have been me. I was nailing a Z80 to the ports and had put LEDs everywhere, for debugging. This showed P0-P7 being hailed, within hours of the program's release.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Style and grace : Nil point
I've finally found time to read this whole thread and it has been an interesting read. I think cog syncing is one of the weakest uses for internal ports, but I am sure there are others that will be more useful, especially when working at the limits of the chip's capability. The idea of a general purpose, straightforward means of sharing is pretty neat. I particularly like the idea of routing counter outputs between cogs with no loss of IO; this would let you chain counter functions if you wanted, and given the new counter awesomeness this might be jolly useful (Chip may be able to confirm/deny).
I can see why Cluso99 and others are so keen on having this functionality and I can also see why Kye is worried. Let us assume that implementing this does not lose us a load of other super cool features, and does not f**K up the assembly. This leaves the issue of object compatibility.
As expert users Cluso99 et al will totally grok the internal ports and will modify any objects that are not implemented properly to suit. Whatever happens they will be fine as they build their "funny computers" (my tongue is in my cheek). Chip on the other hand is running a business, I suspect the average buyer/user of the propeller is more likely to be interested in interfacing with cool stuff, robotics and industrial style control. They may also be far removed from the computer scientists that continually impress on these forums.
Take an example: you download an object and in the documentation it says "communication between cogs is done on internal pins 1-3". If you can program well you can probably change this; let's hope it is not in assembly. At least they told you, though. You download another object: "communication between cogs is done on internal pins 2-6". Oh bugger! Now you have to modify at least one object for sure. On another occasion you may download an object that uses the internal port without that even being mentioned in the docs; crashing and confusion ensue: "I didn't even know there was an internal port". Another might use all 32 bits; why not, it was fine for their application.
Of course all of these are examples of bad practice. In hardware implemented ports the writer of objects would almost expect that the user would have stuff already soldered on to the pins, for the internal port they might be less thoughtful.
I think internal ports should be implemented because they seem super useful, but I also think some thought needs to be put into how to encourage or force good practice in creating objects. What people should do is not necessarily what people will do. If OBEX2 is a mess, it will be Parallax that pays the price, not the super users with their funky boundary-pushing projects.
That could start with a best practice section in the documentation in the internal ports section. There could also be a simple questionnaire on the OBEX "does this object use internal ports?", "if so can these be reconfigured?", "do they need to be contiguous?". After ticking the various radio boxes warnings and suggestions could be given and the information could also be provided with the object. This sort of best practice checking might even be useful now even as a stand alone app.
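As a rough idea of what such a stand-alone checker could do, here is a minimal sketch that compares the internal-pin ranges declared by a set of objects and reports overlaps. The object names and the way ranges are declared are hypothetical; the pin ranges 1-3 and 2-6 are the ones from the example earlier in this post.

```python
from itertools import combinations

def pin_mask(first, last):
    """Bit mask covering internal pins first..last inclusive."""
    return ((1 << (last - first + 1)) - 1) << first

def conflicts(objects):
    """Return (name_a, name_b, shared_pins) for every pair of overlapping objects."""
    found = []
    for (name_a, mask_a), (name_b, mask_b) in combinations(objects.items(), 2):
        overlap = mask_a & mask_b
        if overlap:
            pins = [pin for pin in range(32) if (overlap >> pin) & 1]
            found.append((name_a, name_b, pins))
    return found

# Hypothetical objects and the internal pins their documentation claims.
objects = {
    "serial": pin_mask(1, 3),
    "video": pin_mask(2, 6),
    "sd": pin_mask(8, 11),
}
report = conflicts(objects)
```

Here `report` flags the serial and video objects as clashing on pins 2 and 3, which is exactly the situation where at least one object has to be modified.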
Graham
Post Edited (Graham Stabler) : 2/19/2010 3:58:46 PM GMT
We already have those problems in some objects when configuring EXTERNAL ports: bad programming structure and bad documentation.
But just because some programmers do not document their work well, we should not be restricted in hardware to accommodate their badly written code.
Regards
Christoffer J
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔ Nothing is impossible, there are only different degrees of difficulty. For every stupid question there is at least one intelligent answer. Don't guess - ask instead. If you don't ask you won't know. If you're gonna construct something, make it as simple as possible yet as versatile as possible.
Chip, are you sure those extra internal I/Os aren't useful in some fancy way in this sine-wave generator business you're talking about?
I guess here is where I would like my syncs real fast and precise. Anyway, it's obvious my remarks are not based on knowledge; it's just this feeling that these new features will blow our minds... once we have our I's and Q's wrapped around it. After all, it is Complex.
By the way, although I owned a Sinclair ZX81, however great the design, it was not an Apple ][, and when I hear The Great Woz talk about his design, I am sure I have missed out on something. You and your Propellers surely give me another chance at the fun.
For a sort of real-world use for the internal ports: if someone wanted to implement a single instruction, multiple data (SIMD) system on the Prop, you could use internal pins to pass instructions from 1 cog to manipulate 7 arrays of around 500 longs (assuming 6 special function registers) in the other cogs.
Assuming that the Prop 2 will have 8 cycles between hubops, this could do the array manipulations twice as fast, and one could likely make some code that would work faster; I just wanted to show something that would most likely work.
I am not advocating this as useful, just as a scenario that would benefit from the added pins.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
My head is pasming!
Post Edited (Alsowolfman) : 2/19/2010 3:48:52 PM GMT
Ale said...
Chuckz: you just use one of the methods to bring 5V signals to 3.3 V levels or the other way around: CPLDs, '8T245s, resistors or some other level shifters by Maxim, National, Analog, NXP....
I'd use resistors and a 8T245 in your case...
Someone from 6502.org said I could just put an LED between each pin and that would drop the voltage down. I would just be wary of the LED burning out.
Comments
Do you plan to have a TurboPropeller release party? It would be an extraordinary event.
Cheers.
--Steve
Okay, so if the main use of all the internal I/O pins will just be for waiting in an idle state for some event to happen... why not make a new "HUB OP" called wait for signal?
You could have 8 signal lines in the hub which would kick any cog out of a low power state.
....
So we should have two instructions.
"CogState" - Returns the state of the cog specified, i.e. it returns -1 if the cog is stopped, 0 if the cog is running, 1 if the cog is idling.
"CogSignal" - Forces the specified cog to exit the low power idle state and begin executing the next instruction.
...
I could see these two hub instructions being more useful than internal I/Os, as you could then dynamically look to see which cogs are stopped and even restart cogs that may have missed waitCNT windows or whatever.
It would be far better if we had two instructions like this vs a bunch of extra internal I/O.
...
In fact this would make a lot of the code written for the Propeller chip a lot nicer. For example, in my SD card block driver and in a lot of other drivers where a host cog waits on a device driver cog to complete a request, we could utilize this instruction setup to have the host and the slave processor enter and exit low power states until both finish.
First the host would wake up the device driver cog. And then the host would go to sleep waiting. The device driver cog would then execute the command and wake the host back up so that the host could go back along on its merry way.
...
If we do make these two instructions, the signal commands to the cogs should be treated as flags which are only sampled when a cog is asleep.
For example, in the example above, if the device driver cog finished its task before the SPIN host fell asleep, the host cog would never wake up again unless the instruction sets flags which are sampled only when the cog is asleep, because the wakeup event would have already occurred.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Nyamekye,
Post Edited (Kye) : 2/12/2010 4:36:58 PM GMT
Extra ports are useful both for signalling cog state and as mailboxes between cogs, without needing to wait for a hub window.
One possibility is to have math coroutines in one cog and Spin or some other interpreter in another cog, communicating directly without needing to wait for a hub window.
I have many other uses for those mailboxes, but some are customer-related and I can't talk about them on the forum.
SORRY.
One thing I can say about the possibility of communicating directly between cogs: it is much more useful than some people realize.
Regards
Christoffer J
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Nothing is impossible, there are only different degrees of difficulty.
For every stupid question there is at least one intelligent answer.
Don't guess - ask instead.
If you don't ask you won't know.
If you're gonna construct something, make it as simple as possible yet as versatile as possible.
Sapieha
Please don't listen to other people too much... except me (of course). Build it the way you want it. If it doesn't tickle you and inspire you, then it isn't worth doing. If it is too complex for the average guy (such as me)... that will simply create a market for all kinds of inventiveness by the engineering gurus around here.... "interpreting the Propeller, my way." Leave room for all kinds of 3rd party offerings... it will help to keep people invested in your offerings.
I was very interested to learn that you actually built your own FPGAs... and a little depressed that someone wasn't following after you to market/explain the little beasty to the rest of us. A Prop/FPGA combo would be irresistible and a perfect little product for educators and students. And look what it would do for all kinds of bottlenecks that now require dedicated boards to alleviate. A Propeller with a Swiss Army Knife... what could be sweeter?
I begged for this a couple of years ago... I'm still begging :)
Rich
Try this: http://forums.parallax.com/showthread.php?p=877975
I see the usefulness and simplicity of a pinless pin set. It's there if you want it, and if you don't you can ignore it, and it works just like the pins that do have external pins so you don't have to learn anything new to use it.
IIRC P0-P8 are used for inter-cog data transfers only; the LEDs come up because it's a VGA demo. Synchronisation - where required - is done through waitcnt. SCNR
Can you back that up? I always hear wild claims about hub being slow but I'm not buying 10 to 20 times (until I see a concrete example).
Post Edited (kuroneko) : 2/13/2010 1:01:03 AM GMT
I cannot believe this question was even asked... Where on earth are you living? We simply cannot spare the pins. Look at all the kludges we are using to get more pins... TriBlade & RamBlade really use 2 props and no latches, DracBlade uses 2/3 latches, Bill uses latches and is now trying to use SPI sram, Ale is using a FPGA, Hydra uses a CPLD?, coley & baggers are using 3 props, (sorry for the ones I have forgotten).
There are a number here that have pointed out uses if we had spare (unused) pins. Fact is, we don't, so we cannot waste them. There are all sorts of comms between cogs we could do if only for the spare pins.
Chip, please, please, go with the 64 internal I/O pins. They are simple from a user's point of view and we WILL use them.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:
Home of the MultiBladeProps: TriBlade, RamBlade, SixBlade, website
Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator)
Prop Tools under Development or Completed (Index)
Emulators: CPUs Z80 etc; Micros Altair etc; Terminals VT100 etc; (Index) ZiCog (Z80), MoCog (6809)
Prop OS: SphinxOS, PropDos, PropCmd
Search the Propeller forums (uses advanced Google search)
My cruising website is: www.bluemagic.biz
MultiBlade Props: www.cluso.bluemagic.biz
...But let's say we use my idea. Then what would internal pins be good for? They won't be external pins. Please give examples other than speed; it's a rather moot point to have more speed if it is not needed.
Hub communication isn't terrible. We've got by on it so far.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Nyamekye,
Actually, hub communication IS pretty bad compared to direct cog-to-cog communication.
Let's transfer a byte from cog a to cog b via the hub, then returning a byte from cog b to cog a ...
This will take at least 2 hub syncs for each cog. Assume the best case (i.e. cog a is synced with the hub when it writes the byte) - 7 clocks. Let's assume b=a+1, and that cog b is sitting in a tight loop and happens to read at just the right time - 7 clocks. Then cog b writes back - but it isn't synced any more, so that will take 21 clocks. Meanwhile cog a has been sitting in a tight loop waiting for the reply - let's hope we get lucky and it just happens to be synced with the hub when it reads - 7 clocks.
So a minimum of 42 clocks. And that's the best case - if a and b are not next to each other, and/or the loops don't align with the hub nicely, it could be nearly twice that.
Now let's do the same using an internal register ...
Assume both cog a and b have already set their direction and output bits. Cog a writes a byte to the internal register (e.g. in bits 0..7) and includes the pattern that b is looking for (e.g. in the 8..15 bits) - 4 clocks. Cog b is sitting at a waitpeq looking for "its" pattern in bits 8..15. It sees it and wakes up - 1 clock. Cog b reads the value - 4 clocks. Cog b writes a byte value back to the register (bits 17..24) and also a value that cog a knows to look for (bits 25 .. 31) - 4 clocks. Meanwhile cog a has executed a waitpeq and now wakes up - 1 clock. It reads the value - 4 clocks.
So a total of 18 clocks.
So for two-way comms it's somewhere between 2 and 5 times faster, depending on the geometry involved. And it's constant!
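The same sums can be written out as a small Python sketch. The per-step clock counts are the ones quoted in the post; treating "nearly twice" the 42-clock hub figure as an upper bound of 84 is an assumption made here to reproduce the "2 to 5 times" range, and all names are illustrative.

```python
# Per-step clock counts as quoted in the post; none of these are measured.
register_round_trip = [
    ("cog a writes data + wake pattern", 4),
    ("cog b wakes from waitpeq", 1),
    ("cog b reads value", 4),
    ("cog b writes reply + wake pattern", 4),
    ("cog a wakes from waitpeq", 1),
    ("cog a reads reply", 4),
]
register_total = sum(clocks for _, clocks in register_round_trip)

HUB_BEST = 42              # best-case hub figure from the post
HUB_WORST = 2 * HUB_BEST   # assumption: "nearly twice that" taken as an upper bound

speedup_low = HUB_BEST / register_total
speedup_high = HUB_WORST / register_total
print(register_total, round(speedup_low, 2), round(speedup_high, 2))  # 18 2.33 4.67
```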
It gets even better if you only need to sync (i.e. not transfer any data) - 5 clocks (constant) vs a minimum of 18 to a maximum of about 46 clocks - i.e. up to 10 times faster (okay, 20 times may have been a bit of exaggeration!)
And the power consumption is reduced by 7/8 for those cogs in waitpeq state.
What's not to like?
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina
After reading this discussion, I will gladly stand by my second comment on this topic: trust the alphas here to set good norms, which leaves us a nice resource and the easy option of simply doing things the pure, most easily adapted way.
There will be some conflicts and such, but they will largely get sorted out, and we will have a nice resource added to the Propeller. Perhaps some sample cases will serve to get things rolling in the right direction sooner rather than later. After all, we now have a lot of understanding and experience of what's going to be an issue and what just isn't.
That said, I am in total agreement with you on keeping the propeller as pure as possible, because that is really a big part of what differentiates it.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Wiki: Share the coolness!
8x8 color 80 Column NTSC Text Object
Safety Tip: Life is as good as YOU think it is!
I'm amazed, where do those numbers come from? For starters, the 6 cycles for waitpxx only apply in the ideal case, but that's acceptable for this example. Then the ideal case for hub sync is 10 cycles (minimum 8 cycles + 2 cog offset). The worst case here, where the cog pair is far apart (+12) and the initial hub window is missed (+15), is 37. Throwing in a branch to get out of the loop makes that 14/41, but that's still not too bad. If you use hub interaction you damn well make sure that you are in sync and hand-pick the cog pairs :)
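For anyone following along, those figures add up as in this minimal Python sketch. The first four constants are taken straight from the paragraph above; the 4-clock branch cost is not stated there and is inferred from the step from 10 to 14 and from 37 to 41. All names are illustrative.

```python
HUB_MIN = 8          # minimum hub access, per the post
COG_OFFSET = 2       # offset between the two cogs, per the post
FAR_APART = 12       # extra when the cog pair is far apart, per the post
MISSED_WINDOW = 15   # extra when the first hub window is missed, per the post
BRANCH = 4           # assumption: inferred from 10 -> 14 and 37 -> 41

ideal = HUB_MIN + COG_OFFSET
worst = ideal + FAR_APART + MISSED_WINDOW
ideal_with_branch = ideal + BRANCH
worst_with_branch = worst + BRANCH
print(ideal, worst, ideal_with_branch, worst_with_branch)  # 10 37 14 41
```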
Having said all that, I'd happily use internal pin banks for tasks where it makes sense. My point was that there are too many claims flying around here without backup.
I'm happy as long as I don't lose register space.
On another note, I think that idea I had above is pretty neat.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Nyamekye,
* Instructions have to be documented.
* We have to have the available instruction opcode space. I don't know how Chip is finding this now.
* It does not achieve the performance we can achieve with > 32 bits.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Not only is the internal register method much faster, it is constant in the time it takes no matter where the hub is, while the "best case" hub method depends on catching the hub at exactly the right time - which is unlikely to be true in a real-world case (where you would probably need to do some processing before returning a value). In any case, I'd be happy with a 3-fold performance improvement.
The minimum time for waitpeq is given as 5 clock cycles (p. 369) - but you are not re-executing the whole waitpeq each time. You execute it once and wait. The documentation (p. 222) says the I/O pins are checked every clock cycle. My assumption was that some of the waitpeq time is spent setting up to detect the pin match and that the cog will resume on the next clock cycle after the pin match is detected - not 5 clock cycles later. But perhaps I am wrong here - some Propeller guru can confirm how long it takes from detection of the matching pin state until a cog that is already in a waitpeq executes the next instruction. And in the waitpeq case the next instruction will be to immediately read the data. So that's somewhere between 5 and 9 clocks for the internal register case (depending on what the gurus say), versus 40-odd for the hub case. So a 4-fold to 5-fold performance improvement.
But to be honest, arguing about the exact numbers is kind of missing the point. I don't really care whether it's a 3-fold, 4-fold or 10-fold improvement - I can't understand why anyone would NOT want to do it if they could. It's a very simple hardware change (according to Chip) and one which is absolutely in keeping with the existing Prop design (it can in fact be done on the current Prop if you are willing to sacrifice nearly all your I/O pins). For this simple change you get somewhere between a 3-fold to 5-fold (by your figuring) and a 10-fold (by my admittedly optimistic figuring) increase in performance, and you also get a significant reduction in power consumption.
A no-brainer, really.
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Style and grace : Nil point
I can see why Cluso99 and others are so keen on having this functionality and I can also see why Kye is worried. Let us assume that implementing this does not lose us a load of other super cool features, and does not f**K up the assembly. This leaves the issue of object compatibility.
As expert users, Cluso99 et al. will totally grok the internal ports and will modify any objects that are not implemented properly to suit. Whatever happens, they will be fine as they build their "funny computers" (my tongue is in my cheek). Chip, on the other hand, is running a business. I suspect the average buyer/user of the Propeller is more likely to be interested in interfacing with cool stuff, robotics and industrial-style control. They may also be far removed from the computer scientists who continually impress on these forums.
Take an example: you download an object and the documentation says "communication between cogs is done on internal pins 1-3". If you can program well you can probably change this; let's hope it is not in assembly. At least they told you, though. You download another object: "communication between cogs is done on internal pins 2-6". Oh bugger! Now you have to modify at least one object for sure. On another occasion you may download an object that uses the internal port without that even being mentioned in the docs; crashing and confusion ensue: "I didn't even know there was an internal port". Another might use all 32 bits. Why not? It was fine for their application.
Of course all of these are examples of bad practice. With hardware-implemented ports the writer of an object would almost expect that the user already has stuff soldered onto the pins; for the internal port they might be less thoughtful.
I think internal ports should be implemented because they seem super useful, but I also think some thought needs to be put into how to encourage or force good practice in creating objects. What people should do is not necessarily what people will do. If OBEX2 is a mess, it will be Parallax that pays the price, not the super users with their funky boundary-pushing projects.
That could start with a best practice section in the documentation in the internal ports section. There could also be a simple questionnaire on the OBEX "does this object use internal ports?", "if so can these be reconfigured?", "do they need to be contiguous?". After ticking the various radio boxes warnings and suggestions could be given and the information could also be provided with the object. This sort of best practice checking might even be useful now even as a stand alone app.
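A rough idea of what such a standalone check could look like, as a minimal Python sketch. It assumes each object declares the internal pins it uses as an inclusive range; `pin_mask`, `find_conflicts` and the object names are hypothetical and not part of any existing Parallax tool.

```python
def pin_mask(first, last):
    """Bit mask with internal pins first..last (inclusive) set."""
    return ((1 << (last - first + 1)) - 1) << first


def find_conflicts(objects):
    """Return (name_a, name_b, shared_mask) for each pair of objects sharing pins."""
    conflicts = []
    names = list(objects)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = objects[a] & objects[b]
            if shared:
                conflicts.append((a, b, shared))
    return conflicts


# Hypothetical objects using the pin ranges from the example above.
objects = {
    "object_one": pin_mask(1, 3),    # "internal pins 1-3"
    "object_two": pin_mask(2, 6),    # "internal pins 2-6"
    "object_three": pin_mask(8, 9),  # made-up third object, no overlap
}

conflicts = find_conflicts(objects)
for a, b, shared in conflicts:
    print(f"{a} and {b} both claim pins {shared:#034b}")
```

Here the first two objects collide on pins 2 and 3, which is exactly the "oh bugger" case; an object whose pins cannot be reconfigured would make such a collision unfixable without editing its code.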
Graham
Post Edited (Graham Stabler) : 2/19/2010 3:58:46 PM GMT
We already have that problem in some objects when configuring EXTERNAL ports -- BAD programming structure and documentation.
BUT just because some programmers do not document their work well, we need not be restricted in hardware to accommodate their badly written code.
Regards
Christoffer J
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Nothing is impossible, there are only different degrees of difficulty.
For every stupid question there is at least one intelligent answer.
Don't guess - ask instead.
If you don't ask you won't know.
If you're gonna construct something, make it as simple as possible yet as versatile as possible.
Sapieha
Graham
Chip, are you sure those extra internal I/Os aren't useful in some fancy way in this sine-wave generator business you're talking about?
I guess this is where I would like my syncs real fast and precise. Anyway, it's obvious my remarks are not based on knowledge; it's
just this feeling that these new features will blow our minds... once we have our I's and Q's wrapped around them. After all, it is Complex.
By the way, although I owned a Sinclair ZX81, however great the design, it was not an Apple ][, and when I hear The Great Woz talk
about his design, I am sure I have missed out on something. You and your Propellers surely give me another chance at the fun.
Thanks, and keep the good spirit turning.
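For the cogs containing data:
:start  mov     :loop, inb      ' copy the long on INB (taken here to be the internal port) into the slot below
        nop                     ' spacer: the modified long must not directly follow the write
:loop   long    0-0             ' placeholder, replaced at run time by the long read from INB
        jmp     #:start
' data
For the cog containing instructions:
:start  mov     outb, counter   ' drive the current value of counter onto OUTB
        add     counter, #1
        nop                     ' pad: keeps this loop at four instructions, like the data cogs
        jmp     #:start
' instructions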
Assuming that the Prop 2 will have 8 cycles between hub ops, this could do the array manipulations twice as fast, and one could likely write code that works faster still; I just wanted to show something that would most likely work.
I am not advocating this as useful, just as a scenario that would benefit from the added pins.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
My head is pasming!
Post Edited (Alsowolfman) : 2/19/2010 3:48:52 PM GMT