Why does WRLONG arrange data like this?

DragonRaider5 · 2016-02-20 19:54

Hey,
I just discovered something which felt very strange to me: When I use WRLONG to write a long (who thought so xD), it arranges it with the least significant byte first (not the most significant). Why does it do so? It messed up a lot of my code...

Phil Pilgrim (PhiPi) · 2016-02-20 20:00

That's just the way bytes are arranged in longs by the Propeller hardware. It's called "little endian order," and it's not at all uncommon among microcontrollers and microprocessors. Moreover, if you define a long in the DAT section or assign a value to a long VAR variable, their bytes, too, will be in little endian order.

-Phil
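For anyone who wants to see the layout on a PC, here is a minimal sketch in Python rather than Spin. It assumes that `struct`'s `<I` format is a fair stand-in for a 32-bit long as WRLONG lays it out in hub memory; the value $12345678 is an arbitrary illustration, not something from the thread.

```python
import struct

value = 0x12345678

# "<I" = unsigned 32-bit, little endian: least significant byte at the lowest address.
little = struct.pack("<I", value)
# ">I" = the same value stored big endian, for comparison.
big = struct.pack(">I", value)

print(little.hex(" "))  # 78 56 34 12
print(big.hex(" "))     # 12 34 56 78
```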

evanh · 2016-02-20 21:50

I like to make a point of complaining about little endian, the world over ... so count this as another entry.

potatohead · 2016-02-20 21:58

Lol

Just one of those things. Once you know about it, no big deal. But, it's gonna snag a lot of people once!

evanh · 2016-02-20 22:03

Once?! Check out this detail and tell me again people don't get caught more than once -

The GUIDs in this table are written assuming a little-endian byte order. For example, the GUID for an EFI System partition is written as C12A7328-F81F-11D2-BA4B-00A0C93EC93B here, which corresponds to the 16 byte sequence 28h 73h 2Ah C1h 1Fh F8h D2h 11h BAh 4Bh 00h A0h C9h 3Eh C9h 3Bh – only the first three blocks are byte-swapped.
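The mixed layout in that quote can be reproduced with Python's standard `uuid` module, whose `bytes_le` attribute swaps only the first three fields. This is just a sketch to check the byte sequence quoted above, not a statement about how any particular partitioning tool does it.

```python
import uuid

guid = uuid.UUID("C12A7328-F81F-11D2-BA4B-00A0C93EC93B")

# .bytes is the straight big-endian rendering of the text form.
straight = guid.bytes
# .bytes_le byte-swaps the first three fields only (4, 2 and 2 bytes);
# the last eight bytes stay in text order.
mixed = guid.bytes_le

print(straight.hex(" "))  # c1 2a 73 28 f8 1f 11 d2 ba 4b 00 a0 c9 3e c9 3b
print(mixed.hex(" "))     # 28 73 2a c1 1f f8 d2 11 ba 4b 00 a0 c9 3e c9 3b
```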

potatohead · 2016-02-20 22:23

That's a GUID design matter, if you ask me.

Personally, I got started little endian on 6502. It's annoying, but it's also very entrenched... never gonna be rid of it!

I share your pain evanh!

JonnyMac · 2016-02-20 22:25

Geez... it's not as if Parallax hides the fact that multi-byte values are stored little endian. As Phil points out, little endian is quite common.

It messed up a lot of my code...

There's a lesson here: test code in small chunks to make sure things are working.

Heater. · 2016-02-20 22:52

Isn't it so that if you have an 8 bit machine adding/subtracting 16 or 32 bit numbers is easier to program and quicker to run if they are stored little endian? Just point to the first bytes, add them, increment the pointers, add the bytes there (with carry) and so on. No gain on a 32 bit machine but there we go.
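The byte-at-a-time addition described here can be sketched as follows. This is Python standing in for what would be a few instructions of 8-bit assembly; the function name `add_little_endian` is made up for illustration.

```python
def add_little_endian(a: bytes, b: bytes) -> bytes:
    """Add two equal-length little-endian numbers one byte at a time."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    result = bytearray()
    carry = 0
    # Index 0 is the least significant byte, so the "pointer" simply walks
    # upward through memory while the carry ripples along with it.
    for byte_a, byte_b in zip(a, b):
        total = byte_a + byte_b + carry
        result.append(total & 0xFF)  # keep the low 8 bits
        carry = total >> 8           # 0 or 1, carried into the next byte
    # The final carry is dropped, as it would be on fixed-width hardware.
    return bytes(result)


x = (0x0001FFFF).to_bytes(4, "little")
y = (0x00000001).to_bytes(4, "little")
total = add_little_endian(x, y)
print(hex(int.from_bytes(total, "little")))  # 0x20000
```

With big endian storage the same loop would have to start at the highest address and walk downward, which is the extra bookkeeping the post alludes to.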

And then, when you hex dump memory as bytes, the 16 and 32 bit values get printed in the correct order for a human reader.

All in all, I think I'm for little endian. But I don't really much care either way.

It's certainly an issue everyone has to be aware of when moving code from machine to machine or transmitting data between them.

evanh · 2016-02-20 23:13

potatohead wrote: »

That's a GUID design matter, if you ask me.

Kind of true in that the only resolution is to have a rule that all data be stored big endian.

but it's also very entrenched... never gonna be rid of it!

I've noticed on occasion, just very rarely, in this throwaway society ... things get labelled "obsolete"! It's surprising how quickly equipment can get replaced.

Cluso99 · 2016-02-20 23:33

I believe 6800 is big endian and 6502 was designed by some of the ex Motorola engineers and also used big endian.

So we had Motorola big endian and Intel little endian.

IMHO little endian is simpler for humans. But it is what it is, and it's not going to change any time soon - both camps are entrenched.

evanh · 2016-02-20 23:56

Cluso99 wrote: »

IMHO little endian is simpler for humans.

What?! I can see why Heater would mention historical factors but I can't see why anyone would say that. There's nothing simpler for humans. All human languages use big endian format.

potatohead · 2016-02-21 00:40

6502 is little endian. It makes for a very simple machine, which that chip was. Moto used big endian, 6809, 68k, 6800 I believe.

I don't think it's a big deal, nor do I think it's going anywhere. There is no right order, just understanding the order, IMHO.

MIPS would let you pick!

I read somewhere that there is no meaningful advantage either way today. It was a nice choice on small, early chips.

As far as code goes, endian type needs to be on the short list. Data sizes, two's complement, endian, alignment, addressing.

Gotta think about those no matter what, right?

@OP. Well, now you know. Stuff happens. Should be easier now.

Heater. · 2016-02-21 01:13

evanh,

There's nothing simpler for humans

Ah but there is.

Let's suppose I have some data in my program like so:

pub start

dat
long  1
long  2
long  3
long  $DEADBEEF

When I compile that I get a binary. I might be debugging or otherwise inspecting that binary one day and to do so I dump it out in hexadecimal. I get this:

$ hexdump -C test.binary 
00000000  00 1b b7 00 00 cc 10 00  2c 00 34 00 28 00 38 00  |........,.4.(.8.|
00000010  1c 00 02 00 18 00 00 00  01 00 00 00 02 00 00 00  |................|
00000020  03 00 00 00 ef be ad de  32 00 00 00              |........2...|
0000002c

Well, that's really annoying, we can see our numbers in there but they are displayed backwards!
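If you'd rather let a script do the un-reversing, something like this works. It is a sketch that assumes you have already picked out the 16 data bytes at offset 0x18 of the dump above; the offset is read off the dump by eye, not computed.

```python
import struct

# The 16 bytes at offset 0x18 of the dump, in file order.
raw = bytes.fromhex("01000000 02000000 03000000 efbeadde")

# "<4I" reads four unsigned 32-bit values, least significant byte first.
longs = struct.unpack("<4I", raw)

print([f"${v:08X}" for v in longs])
# ['$00000001', '$00000002', '$00000003', '$DEADBEEF']
```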

This is also annoying when reverse engineering EPROMS (who would do such a thing) or inspecting disk formats and so on.

This is little endian and it's annoying. What we want for readability is big endian.

On the other hand, little endian is more optimal for doing byte wide arithmetic on multi-byte values. Not really a concern since the 8 bit days but hey, we have emulations of 8 bitters on the Prop.

Note also that communications protocols are not consistent, you will find big and little endian formats on the wire. So it's wise to be aware of these issues.

Now, to add to the curiosity, Spin is both little and big endian at the same time! Consider the following program:

var
    long x 

pub start
   x := $deadbeef
dat

And have a look at its binary:

$ hexdump -C test.binary 
00000000  00 1b b7 00 00 9e 10 00  20 00 2c 00 18 00 30 00  |........ .,...0.|
00000010  10 00 02 00 08 00 00 00  3b de ad be ef 41 32 00  |........;....A2.|
00000020

There you can read "de ad be ef" clear as day.

Note: Astute readers will notice I got my argument backwards in my previous post.

evanh · 2016-02-21 02:08

Heater. wrote: »

evanh,

There's nothing simpler for humans

Ah but there is.

Let's suppose I have some data in my program like so:... There you can read "de ad be ef" clear as day.

I was of course talking about big endian being the easy human view.

evanh · 2016-02-21 02:22

potatohead wrote: »

I don't think it's a big deal, nor do I think it's going anywhere. There is no right order, just understanding the order, IMHO.

Human readable is preferable. And, as a strong example that change is easy to do, major changes to the PC architecture have been pushed through without much ado at all before. ISA to VLB to PCI then PCI Express, for one example. Memory models are a biggie. Apple even bridged the whole lot at once when they adopted the PC architecture for the Mac.

I read somewhere that there is no meaningful advantage either way today. It was a nice choice on small, early chips.

I read the same. Little endian indeed suited the 8-bit architectures as per Heater's comment on small adders and tiny real-estate. It would be fair to say that was the catalyst for today's situation.

The bigger the data structures the more the need to fix it is. It'll never be too late to change.

Phil Pilgrim (PhiPi) · 2016-02-21 02:23

Low address, least significant; high address, most significant. Lowest = least; highest = most. 'Seems totally natural to me!

-Phil

evanh · 2016-02-21 02:34

Funny!

potatohead · 2016-02-21 02:54

It's just not ever going to change. Sorry. Way too much out there, returns too few.

Phil Pilgrim (PhiPi) · 2016-02-21 03:21

Here are a couple logical arguments in favor of little endian order:

1. Assume mylong is a long variable. We can write mylong := 10. What would you really want byte[@mylong] to equal? 10? or 0? I say 10 seems the more logical.

2. Part of Western-language speakers' discomfort with little endian notation is due to our perverse habit of reading numbers from left to right. Remember that our number system was crafted by right-to-left language speakers. But, instead of changing the digit order to conform to our lexical order, we left it as-is. Manual math operations are still performed right-to-left. English is read left-to-right. So we have a hodge-podge of left-to-right text and right-to-left numbers. Numbers would make more sense if we actually read them right-to-left, since we wouldn't have to scan ahead to tell what units the "first" digit represented. So if 123,456,789 was actually written 987,654,321, we could actually scan it left-to-right without having to look ahead to see its order of magnitude. Little endian order fixes the absurdity of our adopted-without-reordering numbering system. It's the latter that messed up, not the little endian system.

-Phil
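Point 1 above can be tried out off-chip. The sketch below uses Python's `struct` to stand in for Spin's byte[@mylong], word[@mylong] and long[@mylong]; the helper name `read_at_base` is invented for the example.

```python
import struct

mylong = 10

little = struct.pack("<I", mylong)  # 0a 00 00 00
big = struct.pack(">I", mylong)     # 00 00 00 0a


def read_at_base(buf: bytes, order: str) -> tuple:
    """Read a byte, a word and a long, all starting at the variable's own address."""
    return tuple(struct.unpack_from(order + code, buf, 0)[0] for code in "BHI")


print("little:", read_at_base(little, "<"))  # little: (10, 10, 10)
print("big:   ", read_at_base(big, ">"))     # big:    (0, 0, 10)
```

Under little endian a small value reads the same at every width from the same address; under big endian the narrower reads land on the high-order zero bytes.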

evanh · 2016-02-21 03:43

Spud, we clearly disagree on the effectiveness of entrenchment.

Phil,
1: That's not any particular argument. Casting is no biggie in either format. I think what you are referring to is not a casting case at all. Technically, either endian works fine.

2: I once queried this very topic with someone that knew a little about middle-eastern languages. Turns out they read their numbers most significant digit first, the same as the west.

Peter Jakacki · 2016-02-21 04:17

I understand why Spin stores longs in big endian for the same reason that I came to in Tachyon, it was simply because it made the byte/word/long fetch code more compact in that it could read a byte, <<8, read a byte and add, etc. If I wasn't so pressed with 496 cog longs I would have separate routines that read little-endian words/trytes/longs etc.
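The "read a byte, <<8, read a byte and add" loop might look roughly like this. It is only a sketch of the idea in Python, not the actual Spin interpreter or Tachyon code, and `fetch_big_endian` is a made-up name. The sample bytes are the tail of Heater's second hexdump earlier in the thread.

```python
def fetch_big_endian(memory: bytes, address: int, size: int) -> int:
    """Fetch a size-byte value stored most significant byte first."""
    value = 0
    for offset in range(size):
        # Shift what we have so far up by one byte, then add the next byte.
        value = (value << 8) + memory[address + offset]
    return value


# Bytes from offset 0x18 of the second dump: 3b de ad be ef 41 32 00
code = bytes.fromhex("3b deadbeef 41 32 00")

print(hex(fetch_big_endian(code, 1, 4)))  # 0xdeadbeef
print(hex(fetch_big_endian(code, 1, 2)))  # 0xdead
```

One loop serves every width because each extra byte just extends the value at the low end, which is where the compactness comes from.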

Little endian makes a lot of sense but not if you use a simple DUMP whereas if I dump longs I type DUMPL and I can see the $DEADBEEF very readily.

Little and big endian are just ways of storing data in byte sized chunks much like the ways we have for representing binary in hex for instance but the really funny part is how us humans get ourselves "confused" by it

evanh · 2016-02-21 04:27

A detail, the confusion comes with little endian only.

potatohead · 2016-02-21 04:32

Totally understandable evanh. I'm very highly pragmatic. I tend to operate on a value basis.

In terms of resources and returns, the list of higher value things is very long, and until that changes, it's a very hard sell.

One good thing we appear to be doing more of is endian neutral data representation. Maybe some critical mass may bubble up, out of that. I have a long career in CAD, for example, and endian along with floating point issues have been very painful. Good progress on both of those has happened, but it took a really long time.

The endian question looks kind of like imperial vs metric units. The inertia is very significant, despite the fact that little endian is dominant and imperial units are not. Same with paperless manufacturing. Early in my career, I was, and to a degree remain, a strong advocate for both of these to be resolved.

Progress is slow. Heck, units are still an issue here in the US, and that remains true despite most equipment being dual unit capable. Paperless is getting there finally. I once thought we would have that licked by 2000. Lol, was I ever wrong! On the plus side, I do see really paperless shops now. They at least exist!

So I won't put this endian deal off the table. Could get sorted, but it's just way down the list of stuff worth doing, that's all. I just can't muster any real hope, due to the dynamics I see playing out.

Totally hear you though! It's a PITA that doesn't have to be.

http://stackoverflow.com/questions/7415071/big-endian-vs-little-endian-machines

But we may be stuck with it for a very long time yet. That big body of machine-dependent code and data isn't shrinking very fast...

I'll trust you to keep that candle lit

should I be wrong, great! I'm all over it.

evanh · 2016-02-21 04:44

Well written Doug. The effort is appreciated. Intel probably have it weighed like that; ie: Endianness is only a developer issue, the general public never sees it, and it doesn't impact performance like say the memory model did.

Heater. · 2016-02-21 05:24

Endian issues are not going away any time soon.

In fact they are now baked into the insanity that is the Unicode standard. See Byte Order Mark https://en.wikipedia.org/wiki/Byte_order_mark
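To make the byte order mark concrete, here is a small sketch using Python's standard `codecs` constants. The single letter "A" is an arbitrary sample string.

```python
import codecs

text = "A"

# Prefix each encoding with its byte order mark (U+FEFF in the matching byte order).
le = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
be = codecs.BOM_UTF16_BE + text.encode("utf-16-be")

print(le.hex(" "))  # ff fe 41 00
print(be.hex(" "))  # fe ff 00 41

# The generic "utf-16" codec reads the mark and picks the byte order itself.
print(le.decode("utf-16") == be.decode("utf-16") == text)  # True
```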

Phil,

We can write mylong := 10. What would you really want byte[@mylong] to equal? 10? or 0?

That's a trick question right?

That is a compiler or runtime bug that should result in a type error or memory access violation exception.

There is no confusion with either endianness. The confusion comes because we have two ways to do it. Like metric vs imperial, driving on the left or right, American date format vs sensible ones, and so on.

potatohead · 2016-02-21 06:30

Why an exception?

It's perfectly valid SPIN. As valid as x := x + "hello" is.

We've had the type discussion before. Strong alignment and checks, etc... don't really get us anywhere that testing doesn't right?

So then, people can do it either way. Warnings and options could be added, for those in the type purity school, and SPIN will execute that just fine for those in the testing school.

One of the nice things about SPIN is that it's really simple in this way. It's possible to know just what those two will do, and once that is known, it's consistent.

From there, either test it, or check it.

And, the tests are needed anyway, for a lot of reasons, so.... exercise for the reader.

Of course that kind of thing can result in a real mess, or something really clever too. I like it that way personally.

Being really simple means needing to know and track a lot less. That being good or bad lies with the developer and their goals.

And it's not portable, etc... that's fine too. We don't always need portable, and we can get it, if we want it. Same choice as the type or test one is.

evanh · 2016-02-21 07:04

Heater. wrote: »

There is no confusion with either endianness. The confusion comes because we have two ways to do it.

Bear in mind that human language is universally big endian. It's only computing that has belatedly created a split at all. Something that could be fixed.

Phil Pilgrim (PhiPi) · 2016-02-21 07:14

evanh wrote:

Bear in mind that human language is universally big endian.

What does that even mean? Are you referring to word order? If so, there's nothing universal about that even among Western languages. And then there's Yoda.

-Phil

evanh · 2016-02-21 07:40

Phil Pilgrim (PhiPi) wrote: »

So if 123,456,789 was actually written 987,654,321 ...

Peter Jakacki · 2016-02-21 10:35

An end to this endless big endian debate I say. If we get rid of bytes then it won't matter. Wonder what they did with packing nibbles? For that matter how are bits packed into a byte, let's see.......oh oh.....no end to it now.

Heater. · 2016-02-21 11:46

Ahhh....Peter, is a HIGH bit a 1 or a 0?

I kind of like the idea of getting rid of bytes. UTF-32 is the easiest way to deal with unicode characters so obviously all strings should be strings of 32 bit code points.
