Why does WRLONG arrange data like this?
DragonRaider5
in Propeller 1
Hey,
I just discovered something which felt very strange to me: when I use WRLONG to write a long (who would have thought xD), it arranges it with the least significant byte first (not the most significant). Why does it do that? It messed up a lot of my code...
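For instance, here's a minimal sketch of the effect (the value and the hub address are just placeholders I picked for illustration):

    DAT
            org     0
    writeit wrlong  value, hubptr   ' write the long in "value" to the hub address held in "hubptr"
            jmp     #$              ' park the cog here (placeholder)
    value   long    $11223344
    hubptr  long    $7000           ' hypothetical hub address, purely for illustration

A byte-wise dump of hub RAM at $7000 afterwards reads 44 33 22 11, least significant byte first.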
Comments
-Phil
Just one of those things. Once you know about it, no big deal. But, it's gonna snag a lot of people once!
Personally, I got started little endian on 6502. It's annoying, but it's also very entrenched... never gonna be rid of it!
I share your pain, evanh!
There's a lesson here: test code in small chunks to make sure things are working.
And then, when you hex dump memory as bytes, the 16- and 32-bit values get printed in the correct order for a human reader.
All in all, I think I'm for little endian. But I don't really much care either way.
It's certainly an issue everyone has to be aware of when moving code from machine to machine or transmitting data between them.
I've noticed on occasion, just very rarely, in this throwaway society ... things get labelled "obsolete"! It's surprising how quickly equipment can get replaced.
So we had Motorola big endian and Intel little endian.
IMHO little endian is simpler for humans. But it is what it is, and it's not going to change any time soon - both camps are entrenched.
I don't think it's a big deal, nor do I think it's going anywhere. There is no right order, just understanding the order, IMHO.
MIPS would let you pick!
I read somewhere that there is no meaningful advantage either way today. It was a nice choice on small, early chips.
As far as code goes, endianness needs to be on the short list. Data sizes, two's complement, endianness, alignment, addressing.
Gotta think about those no matter what, right?
@OP. Well, now you know. Stuff happens. Should be easier now.
Let's suppose I have some data in my program, a few longs, as sketched below. When I compile that I get a binary. I might be debugging or otherwise inspecting that binary one day, and to do so I dump it out in hexadecimal. Well, that's really annoying: we can see our numbers in there, but they are displayed backwards!
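For instance, a minimal sketch of the kind of thing I mean (the values are just placeholders):

    DAT
    mydata  long    $DEADBEEF
            long    $12345678

A byte-wise hex dump of that part of the binary then shows:

    EF BE AD DE 78 56 34 12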
This is also annoying when reverse engineering EPROMs (who would do such a thing) or inspecting disk formats and so on.
This is little endian and it's annoying. What we want for readability is big endian.
On the other hand, little endian is better suited to doing byte-wide arithmetic on multi-byte values. Not really a concern since the 8-bit days, but hey, we have emulations of 8-bitters on the Prop.
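For example, a little Spin sketch of the idea (the method name and calling convention are just for illustration): because the least significant byte sits at the lowest address, the loop can walk the addresses upward while the carry ripples along with it.

    PRI add_bytes(pa, pb, pr, n) | i, s, carry
      ' Add the n-byte little-endian number at address pa to the one at pb,
      ' storing the n-byte result at pr. Starting at the lowest address means
      ' the carry naturally propagates upward as the addresses increase.
      carry := 0
      repeat i from 0 to n - 1
        s := byte[pa][i] + byte[pb][i] + carry
        byte[pr][i] := s & $FF
        carry := s >> 8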
Note also that communications protocols are not consistent, you will find big and little endian formats on the wire. So it's wise to be aware of these issues.
Now, to add to the curiosity, Spin is both little and big endian at the same time! Consider a program that uses the value $DEADBEEF (sketched below) and have a look at its binary: there you can read "de ad be ef" clear as day.
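A sketch of the sort of program I mean (my reconstruction; as I understand it, a long in a DAT section lands in the binary least significant byte first, while the Spin bytecode encodes a literal constant most significant byte first):

    PUB main | x
      x := $DEADBEEF              ' bytecode constant: as I understand it, encoded MSB first, so de ad be ef

    DAT
    stored  long    $DEADBEEF     ' DAT long: stored LSB first, so ef be ad de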
Note: Astute readers will notice I got my argument backwards in my previous post.
I was of course talking about big endian being the easy human view.
I read the same. Little endian indeed suited the 8-bit architectures as per Heater's comment on small adders and tiny real-estate. It would be fair to say that was the catalyst for today's situation.
The bigger the data structures get, the greater the need to fix it becomes. It'll never be too late to change.
-Phil
1. Assume mylong is a long variable. We can write mylong := 10. What would you really want byte[@mylong] to equal? 10, or 0? I say 10 seems the more logical (see the sketch after these two points).
2. Part of Western-language speakers' discomfort with little endian notation is due to our perverse habit of reading numbers from left to right. Remember that our number system was crafted by right-to-left language speakers. But, instead of changing the digit order to conform to our lexical order, we left it as-is. Manual math operations are still performed right-to-left. English is read left-to-right. So we have a hodge-podge of left-to-right text and right-to-left numbers. Numbers would make more sense if we actually read them right-to-left, since we wouldn't have to scan ahead to tell what units the "first" digit represented. So if 123,456,789 were actually written 987,654,321, we could scan it left-to-right without having to look ahead to see its order of magnitude. Little endian order fixes the absurdity of our adopted-without-reordering numbering system. It's the latter that's messed up, not the little endian system.
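For point 1, a quick sketch of what I mean on the Prop (the method and variable names are just for illustration):

    PUB demo | mylong
      mylong := 10
      result := byte[@mylong]     ' little endian puts the 10 in the first byte, so this returns 10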
-Phil
Phil,
1: That's not much of an argument on its own. Casting is no biggie in either format, and I think what you are referring to is not really a casting case at all. Technically, either endianness works fine.
2: I once raised this very topic with someone who knew a little about Middle Eastern languages. It turns out they read their numbers most significant digit first, the same as the West.
Little endian makes a lot of sense, but not if you use a simple DUMP. If I dump longs instead, I type DUMPL and I can see the $DEADBEEF very readily.
Little and big endian are just ways of storing data in byte-sized chunks, much like the ways we have for representing binary in hex, for instance. The really funny part is how we humans get ourselves "confused" by it.
In terms of resources and returns, the list of higher value things is very long, and until that changes, it's a very hard sell.
One good thing we appear to be doing more of is endian-neutral data representation. Maybe some critical mass will bubble up out of that. I've had a long career in CAD, for example, and endianness, along with floating-point issues, has been very painful. Good progress on both of those has happened, but it took a really long time.
The endian question looks kind of like imperial vs metric units. The inertia is very significant, despite the fact that little endian is dominant and imperial units are not. Same with paperless manufacturing. Early in my career, I was, and to a degree remain, a strong advocate for both of these to be resolved.
Progress is slow. Heck, units are still an issue here in the US, and that remains true despite most equipment being dual unit capable. Paperless is getting there finally. I once thought we would have that licked by 2000. Lol, was I ever wrong! On the plus side, I do see really paperless shops now. They at least exist!
So I won't put this endian deal off the table. Could get sorted, but it's just way down the list of stuff worth doing, that's all. I just can't muster any real hope, due to the dynamics I see playing out.
Totally hear you though! It's a PITA that doesn't have to be.
http://stackoverflow.com/questions/7415071/big-endian-vs-little-endian-machines
But we may be stuck with it for a very long time yet. That big body of machine-dependent code and data isn't shrinking very fast...
I'll trust you to keep that candle lit. Should I turn out to be wrong, great! I'm all over it.
In fact they are now baked into the insanity that is the Unicode standard. See Byte Order Mark https://en.wikipedia.org/wiki/Byte_order_mark
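For example, UTF-16 text commonly begins with the bytes FF FE (little endian) or FE FF (big endian), and UTF-8 files sometimes carry EF BB BF; all three are encodings of the same U+FEFF mark.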
Phil, that's a trick question, right?
That is a compiler or runtime bug that should result in a type error or a memory access violation exception.
There is no confusion with either endianness. The confusion comes because we have two ways to do it. Like metric vs imperial, driving on the left or right, American date format vs sensible ones, and so on.
It's perfectly valid SPIN. As valid as x := x + "hello" is.
We've had the type discussion before. Strong alignment and checks, etc... don't really get us anywhere that testing doesn't, right?
So then, people can do it either way. Warnings and options could be added, for those in the type purity school, and SPIN will execute that just fine for those in the testing school.
One of the nice things about SPIN is that it's really simple in this way. It's possible to know just what those two will do, and once that is known, it's consistent.
From there, either test it, or check it.
And, the tests are needed anyway, for a lot of reasons, so.... exercise for the reader.
Of course that kind of thing can result in a real mess, or something really clever too. I like it that way personally.
Being really simple means needing to know and track a lot less. That being good or bad lies with the developer and their goals.
And it's not portable, etc... that's fine too. We don't always need portable, and we can get it, if we want it. Same choice as the type or test one is.
-Phil
I kind of like the idea of getting rid of bytes. UTF-32 is the easiest way to deal with Unicode characters, so obviously all strings should be strings of 32-bit code points.