Roy:
I wasn't pointing at Intel for being first or even for having repeated the mistake. I was pointing at Intel because of the PC's total domination of the home computer market and because they could still reverse their mistake. There is opportunity with each new databus design released.
High:
Amusing. I'm not sure if it's really an improvement. Certainly doesn't solve the source code mismatch nor the mismatch with our perception of numbers. There can be confusion about whether 'a' is $16 or $61 if one is not familiar with the encoding already.
Dude, Intel can't change to BE now. You do realize that every CPU is still binary data compatible with 8086 machine code. This is probably one of the main reasons that Intel dominates the PC market.
If they switch to BE they break everything, and that's been true all along. So my point was that several CPU architectures in the 70s used LE, and they did this likely because it was more efficient in the hardware. Also, I remember the 68000 CPUs being MUCH MUCH more expensive back in the late 70s early 80s timeframe (perhaps some of that is because of the extra hardware to do BE, which was not trivial at that time). Intel stayed LE to remain compatible. Sure they could have gone bi-endian or something, but seriously, why? What good reason is there for them to do that? To make a handful of BE fans happy?
Again, I couldn't care less whether it was BE or LE, it makes no difference to me at all. My job entails working with Intel and PPC architectures as well as with networking, so I work with LE and BE all the time. It's trivial to convert back and forth in software as needed. It's trivial to look at hex dumps and mentally reverse as needed (but that's rare since the tools I use can just display it as a word or dword (and hex, decimal, or float)). I just don't get why you seem so worked up over this issue.
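As a rough illustration of how little software it takes to convert between the two orders, here is a minimal Python sketch. The helper name `swap32` and the sample value are made up for the example; they are not from any tool mentioned in this thread.

```python
import struct

def swap32(value: int) -> int:
    """Reverse the byte order of a 32-bit unsigned value."""
    return int.from_bytes(value.to_bytes(4, "little"), "big")

value = 0x12345678                      # arbitrary sample value
le_bytes = struct.pack("<I", value)     # little-endian: least significant byte first
be_bytes = struct.pack(">I", value)     # big-endian: most significant byte first

print(le_bytes.hex(" "))                # bytes as they would sit in LE memory
print(be_bytes.hex(" "))                # bytes as they would sit in BE memory
print(hex(swap32(value)))               # the same four bytes read in the opposite order
```

Swapping twice returns the original value, which is why the conversion can be applied in either direction with the same routine.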
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Check out the Propeller Wiki and contribute if you can.
Because I've had to correct you lot so much maybe? But really, the reason I go to this trouble is to encourage changing the PropII to big endian.
As for the PC, binary compatibility is no big deal. It wouldn't take a lot to support both BE and LE. And any minor issues would be brushed aside pretty quickly like it always has been.
You'll always have as many take-offs as landings, the trick is to be sure you can take-off again
BTW: I type as I'm thinking, so please don't take any offence at my writing style
It's not a databus issue; it's much more fundamental in the architecture.
Having dealt with Intel architects, I wouldn't hold my breath waiting for them to change; it ain't gonna happen.
What I do find amusing is that while their data diagrams show little-endian very explicitly, all of Intel's instruction format diagrams are big-endian. I.e., the opcode is at the left, extending to the right. National's NS16000 family's manuals were the only ones I remember that had EVERYTHING in little-endian format, with opcodes displayed on the right margin, extending to the left, showing the endian-ness very clearly.
Ron
evanh said...
Roy:
I wasn't pointing at Intel for being first or even for having repeated the mistake. I was pointing at Intel because of the PC's total domination of the home computer market and because they could still reverse their mistake. There is opportunity with each new databus design released.
High:
Amusing. I'm not sure if it's really an improvement. Certainly doesn't solve the source code mismatch nor the mismatch with our perception of numbers. There can be confusion about whether 'a' is $16 or $61 if one is not familiar with the encoding already.
This is really about the Prop. Of course I don't expect Intel to read this whinge. They still could make the change though.
The databus is fairly important. The CPU architecture spills out onto the databus. This defines what other IC makers and board makers will do. The problem is fundamentally a hardware one, but it's not fundamentally difficult to flip around in the design.
Let's put it this way: forget how you read the number, how do you perform math on it? If you have the numbers 578 and 423 that you have to add together, how do you manually add these? Do you add the 5 & 4 together, add 7 & 2 together, add 8 & 3 together, then propagate the carry upwards? No, you perform mathematics right to left, starting with the smallest unit first, and deal with any carries along the way. This reasoning is why Little Endian is the most used endianness. It has nothing to do with readability, but with what is most convenient for the processor to deal with computationally. It too works on the small end first, so the bytes are ordered in the manner they will be operated on.
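The carry argument can be sketched in a few lines of Python. This is only an illustration of byte-at-a-time addition on little-endian data, assuming an 8-bit adder working on 16-bit values; the function name `add_little_endian` is hypothetical.

```python
def add_little_endian(a: bytes, b: bytes) -> bytes:
    """Add two equal-length little-endian byte strings one byte at a time."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    result = bytearray()
    carry = 0
    for x, y in zip(a, b):              # index 0 is the least significant byte
        total = x + y + carry
        result.append(total & 0xFF)     # keep the low 8 bits
        carry = total >> 8              # pass the overflow on to the next byte
    return bytes(result)                # a final carry is dropped, as in fixed-width hardware

a = (578).to_bytes(2, "little")
b = (423).to_bytes(2, "little")
total = add_little_endian(a, b)
print(int.from_bytes(total, "little"))  # the sum, reassembled from the result bytes
```

The loop walks memory in ascending address order and never has to look ahead, which is the convenience being claimed for LE on narrow data paths.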
Around we go ... you're arguing that A) LE is better because of the flow in a single stage (even though we've already covered this detail)!? And B) because Intel adopted LE and the clone market exploded, and therefore many followed, that is proof that LE must be better than BE!?
evanh,
From a hardware perspective, if you don't have the 32 bit wide data paths (for 32 bit values), LE is easier because you can access the data one byte at a time or one 16 bit word at a time from least significant to most significant in computational order. If you have 32 bit data paths and variable length fetches, then either form will work.
Mike:
Already discussed. It doesn't happen for the most part because the databus size is 32 bit, the same as the ALU. And the offset also occurs for stacking LE. It's a non-issue.
Frankly, I could go either way, as long as the bit order, nybble order, byte order, word order, etc., are all the same. For BE notation, this would mean numbering bits from the left, as well, so the MSB of any entity would be bit 0. That way bit 0 of memory is bit 0 of every entity that gets stored in the first slot, be it a word, byte, or what have you. This is the same consistency that LE notation enjoys now, despite its apparent unnaturalness to Western readers.
I daresay, juggling multiple or unnatural representations does keep the mind agile. For this we should be thankful. With 401Ks tanking and retirements being pushed later and later, it's important to stave off senility as long as possible!
A change of direction. The per-pin delta-sigma modulators planned for the Prop II sound really cool. (one external resistor to set range and point a counter at the pin for a "good'nuff" ADC) Any chance these will migrate back to the current Prop? I've played around a bit with delta-sigma on the current prop and it's hard to get more than eight stable bits with external components and some decent speed. (like a 1kHz sample rate) Haven't tried a dedicated PCB layout or surface mount parts yet.
Marty
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Lunch cures all problems! have you had lunch?
Paul: Certainly not dead for the PropII. I look forward to the switch. :)
Phil: I don't get your idea. The bit position is currently also its significance for LE. What you appear to be advocating is to forgo that. I can't see how that can be considered consistent.
Actually, Phil. I think you might have misunderstood the correct linguistic layout of little endian. LE bit zero is the first (left most) bit for western writing (and your average source code listing). The fact that it's mostly displayed last is a misrepresentation, just one more of the confusions that go with the territory.
I guess I'll have to draw a picture. (Paul, I promise: this is my final word on the matter! )
The LE layout is shown in its natural right-to-left order (i.e. like HighJump's memory dump, which I really like). Note that the orders of bits, bytes, words, and longs are all the same, and that bits 0 of byte 0, word 0 and long 0 are all the same bit. (This is true regardless of the memory dump order, BTW.) Bit order is monotonic all the way through memory.
Now look what happens in normal BE notation, shown in its natural left-to-right order. Bit order is the reverse of byte, word, and long order. Moreover, bits 0 of byte 0, word 0, and long 0 refer to different bits entirely (again independent of memory dump order). Bit order is no longer monotonic and suffers discontinuities at the breaks between entities, depending on what size we assign those entities.
But it's easy to restore order to BE format by reordering the way bits are numbered, as the third chart illustrates.
I'm perfectly happy choosing door #1 or door #3. Door #2? Not so much. But let's face it, #3 would raise a huge stink, since bit order is so ingrained. I'm afraid that, from a purely mathematical standpoint (Western linguistic sensitivities notwithstanding), that leaves #1. #2 just trades one set of incongruities (linguistic) for another (mathematical). Given the choice, I'd rather deal with a little linguistic discomfort. But, seriously, given just two choices (#1 and #2) it boils down to a matter of taste, which really can't be discussed rationally but, as Paul stated, merely beaten to death. And we've done such a thorough job of that, there's hardly anything left to bury!
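Since the three charts did not survive in this copy of the thread, here is a small Python sketch of the point as I read it. It assumes door #1 is little-endian with bit 0 as the LSB, door #2 is conventional big-endian with bit 0 as the LSB, and door #3 is big-endian with bit 0 renumbered as the MSB. The function `locate_bit` and the scheme labels are invented for the example.

```python
def locate_bit(scheme: str, size: int, n: int) -> tuple:
    """Return (byte offset, bit number within that byte) for bit n of a
    size-byte entity stored at offset 0, using the scheme's own numbering."""
    if scheme == "LE":          # door #1: little-endian, bit 0 = LSB
        return n // 8, n % 8
    if scheme == "BE":          # door #2: big-endian, bit 0 = LSB
        return size - 1 - n // 8, n % 8
    if scheme == "BE-msb0":     # door #3: big-endian, bit 0 = MSB
        return n // 8, n % 8
    raise ValueError(scheme)

for scheme in ("LE", "BE", "BE-msb0"):
    # Where does bit 0 of byte 0, word 0 and long 0 live in memory?
    homes = {size: locate_bit(scheme, size, 0) for size in (1, 2, 4)}
    print(scheme, homes)
```

Under doors #1 and #3, bit 0 lands in byte offset 0 for every entity size. Under door #2 it moves to offset 0, 1 or 3 depending on whether the entity is a byte, word or long, which is the discontinuity described above.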
Lol. That's broken. What you are trying to do there is change the linguistics to match the incorrectly arranged LE. Correctly formatted LE is least significant first. And for us that's leftmost.
I vote for a clean slate. In the long run the problems and limitations imposed by code compatibility are not worth it for an embedded processor. Going to a more powerful processor requires more coding anyway, so why not go to a clean slate? What you do not want to do is make current users learn a whole new architecture from scratch.
I have written assembly language by hand assembling opcodes in octal and hexadecimal in both big and little endian systems. I am not talking about programs like DOS DEBUG which translate opcodes for you, I am talking about memorizing the fact that $CD is CALL.
Little Endian is the correct representation. It doesn't make much sense for humans because humans aren't computers, but for the computer if bit 0 is the LSB, then byte 0, word 0, and long 0 should be the least significant members of their larger structures too. If you don't like reading it on the hex dump write a hex dumper that swaps them.
That's what computers are for, and that's what I did. When you are mixing data sizes, which you always do in a limited memory environment because every bit is precious, Big Endian is a great big mess.
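To make the mixed-size point concrete, here is a short Python sketch, assuming a small value stored as a 32-bit long and then read back as a byte and as a 16-bit word. The variable names are illustrative only.

```python
import struct

value = 0x00000042                       # a small number stored as a 32-bit long

le = struct.pack("<I", value)            # little-endian image of the long
be = struct.pack(">I", value)            # big-endian image of the long

# Little-endian: the low byte and the low word start at the same offset as the long.
le_byte = le[0]
le_word = struct.unpack_from("<H", le, 0)[0]

# Big-endian: the low byte and low word sit at the far end of the long,
# so the offset depends on both the stored size and the size being read.
be_byte = be[3]
be_word = struct.unpack_from(">H", be, 2)[0]
be_byte_at_0 = be[0]                     # offset 0 holds the most significant byte instead

print(le_byte, le_word, be_byte, be_word, be_byte_at_0)
```

Whether that offset bookkeeping counts as "a great big mess" or as a non-issue is, of course, exactly what the thread is arguing about.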
I also vote for breaking things in the near term for flat memory for some time to come.
You mention "analog pins". Are these pins that have ADCs attached to them? What's the maximum sampling rate?
Each pin on the next Propeller will have an integrated delta-sigma ADC that will sample at the clock frequency. At 160MHz, you could get the following resolutions and sample rates:
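The table that followed is not present in this copy of the thread, and the figures below are not Chip's. As a rough guide only, this Python sketch assumes the simplest counter-based delta-sigma scheme, in which an N-bit result needs 2^N clock periods of accumulation. The function name `sample_rate_hz` is made up for the example.

```python
CLOCK_HZ = 160_000_000                   # clock frequency quoted in the post

def sample_rate_hz(bits: int, clock_hz: int = CLOCK_HZ) -> float:
    """Samples per second if each N-bit sample accumulates 2**N clock periods.

    Assumption: plain first-order counting, no filtering or oversampling tricks.
    """
    return clock_hz / (1 << bits)

for bits in (8, 10, 12, 16):
    print(f"{bits:2d} bits -> {sample_rate_hz(bits):,.1f} samples/s")
```

Each extra bit of resolution halves the sample rate under this assumption, so the trade-off is steep at the high-resolution end.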
I know this is an old post (sorry!) but when Chip says "integrated ADC", does that mean it will not require any external resistors/caps? This is amazing either way.... I would be able to measure thermocouples and load cells directly.... wow.
Andrey, some registers cannot be used in the destination field because there is a shadow RAM position that gets read or written instead. For high speed you have to write to COG memory:
[code]
        mov     dest_0, INA     ' copy the input pin states into cog RAM
        mov     dest_1, INA     ' next sample, one instruction later
        mov     dest_2, INA
        ...
dest_0  long    0               ' one reserved long per sample
dest_1  long    0
dest_2  long    0
        ...
[/code]
and then transfer to HUB. The propeller logic analyzer uses this method if I'm not mistaken.
But if you need more than 20 (25) MHz, either use more COGS and interleave them or use another uC.
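The 20 (25) MHz figure follows from the instruction timing. Here is a back-of-envelope Python sketch, assuming each `mov` takes 4 system clocks and that interleaved cogs can be started exactly one clock apart, which is an idealisation. The names `CLOCKS_PER_MOV` and `ina_sample_rate_hz` are invented for the example.

```python
CLOCKS_PER_MOV = 4   # assumption: a cog executes one mov every 4 system clocks

def ina_sample_rate_hz(clock_hz: float, cogs: int = 1) -> float:
    """Peak INA sampling rate for back-to-back mov instructions.

    With several cogs interleaved one clock apart, the rate scales with the
    number of cogs until it reaches one sample per system clock.
    """
    interleaved = min(cogs, CLOCKS_PER_MOV)
    return clock_hz * interleaved / CLOCKS_PER_MOV

for clock in (80e6, 100e6):
    single = ina_sample_rate_hz(clock)
    four = ina_sample_rate_hz(clock, cogs=4)
    print(f"{clock / 1e6:.0f} MHz clock: {single / 1e6:.0f} MS/s with one cog, "
          f"{four / 1e6:.0f} MS/s with four interleaved")
```

The capture depth is still limited by how many longs of cog RAM are left for the unrolled `mov` sequence and its destination registers.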
I know that, and I do have techniques for high speed acquisition up to 1/2 clock frequency to COG memory, using ring buffers. I just feel that is wrong - and should be corrected.
Wow.. I finally got through this whole thread, and while I may be late (very?) to the party, I'll celebrate anyways!
I have been thinking a bit about the possibility of having the hub (and hub ram) clock being = system clock * number of cogs. This would allow hub ram access every cycle in every cog, in some ways eliminating the benefits of LMM. I'm assuming the desired manufacturing process could not allow for such speeds in the necessary circuits (otherwise, why not make the entire circuit as fast, I guess)? As this is probably not possible, what about the following:
Not every program needs the exact determinism of hub access to shuffle data around. Some cog programs might want data more frequently than the allotted 1/16 system clock window, while others might need less. I know Chip and others have suggested various methods for slicing pieces of the pie, but the suggestions I've seen seem to create many potential conflicts. So, I thought of a simple two-mode hub memory access scheme: mode 0 could be the typical 1/16th sysClock hub access, while mode 1 could be a FIFO-buffered request. For example, let's say all 8 cogs are running in mode 1. A cog can request a read/write to the hub at any given time, and the number of pending requests in the buffer determines when the read/write request can be performed. The benefit of this mode is that, depending on the other cogs' hub access requirements, hub access could potentially be granted more often than 1/16th sysClock (but never less). The downside is that determinism would prove to be nearly impossible. For those cog programs that need maximum MIPS utilization (and therefore maximum low-level predictability), mode 0 could be specified so that hub access occurs in its typical 1/16th sysClock time frame. Each cog could specify one mode or the other. If one cog is configured for mode 0 while the rest are in mode 1, those in mode 1 would be able to send requests to the buffer for a piece of the 14/16ths pie.
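To put rough numbers on the two modes, here is a toy Python model. It is not a description of any real or planned hub logic. It assumes 8 cogs, a 16-clock rotation and therefore 2 clocks per hub access, and it ignores the case where modes are mixed. All names are hypothetical.

```python
HUB_PERIOD = 16                 # system clocks per full rotation (the 1/16 window)
COGS = 8
SLOT = HUB_PERIOD // COGS       # assumption: each hub access occupies 2 clocks

def mode0_wait(cog: int, t: int) -> int:
    """Clocks until this cog's fixed round-robin window opens, for a request at clock t."""
    return (cog * SLOT - t) % HUB_PERIOD

def mode1_wait(pending_ahead: int) -> int:
    """Clocks until a FIFO-queued request is served, given the requests already queued."""
    return pending_ahead * SLOT

worst_mode0 = max(mode0_wait(3, t) for t in range(HUB_PERIOD))
worst_mode1 = mode1_wait(COGS - 1)      # every other cog already has a request queued
best_mode1 = mode1_wait(0)              # empty queue: served immediately

print(worst_mode0, worst_mode1, best_mode1)
```

In this model the FIFO wait depends entirely on what the other cogs are doing, which is the loss of determinism noted above, while the round-robin wait depends only on the clock phase at which the request is made.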
markaeric:
"Not every program needs the exact determinism of hub access to shuffle data around"
Whilst that may well be true, many times things become very awkward as soon as you have some special fast access, turbo, low latency, high priority mode (SFATLLHPM) available for some objects.
What happens when a user picks two drivers out of OBEX that happen to both need SFATLLHPM?
As soon as it is allowed that one or more COGs can be "more equal" than the others, we run into a very hard to manage situation with regard to mixing and matching any and all combinations of objects.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Comments
-Phil
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
'Just a few PropSTICK Kit bare PCBs left!
Changing the hardware is the only solution.
</OT>
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Cheers,
Simon
www.norfolkhelicopterclub.com
You'll always have as many take-offs as landings, the trick is to be sure you can take-off again
BTW: I type as I'm thinking, so please don't take any offence at my writing style
Every time someone tries that over here they die in a head-on collision. There is nothing gradual about it ;)
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Pull my finger!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Paul Baker
Propeller Applications Engineer
Parallax, Inc.
Post Edited (Paul Baker (Parallax)) : 10/30/2008 10:44:38 PM GMT
But, from a human convenience perspective BE wins hands down.
Live long and prosper!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Adequatio Rei Et Intellectus
do not work. That may also be true for other registers, but INA is a real show-stopper for high speed data acquisition.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Visit some of my articles at Propeller Wiki:
MATH on the propeller propeller.wikispaces.com/MATH
pPropQL: propeller.wikispaces.com/pPropQL
pPropQL020: propeller.wikispaces.com/pPropQL020
OMU for the pPropQL/020 propeller.wikispaces.com/OMU
I hope that makes some kind of sense.