Good coding habits for HLL (was 'The Merits Of Assembly Lang.')

davidsaunders · 2011-05-25 18:49

This started as a thread on the merits of assembly language. Since the point was missed, the direction changed.

I changed the topic from a discussion of assembly language in order to better target the point I originally intended to make. Discussing this from the stance of assembly was counterproductive, as the topic repeatedly wandered into a discussion of ASM vs HLL.

If people would follow some simple rules, development time would be greatly reduced for any project and there would be fewer bugs to worry about. These rules, in part, are:

1) Give every procedure a header comment covering the parameters it takes and any resources it uses (including stack). List all functions/procedures it calls, including those called in turn by the functions you call, and remember to give the total stack usage in detail.

2) Include a thorough description of the functionality of each procedure in its header comment.

3) Comment every line of code, thoroughly.

4) Code in good resource tracking.

5) Comment all system calls and the resources known to be used by them (include these in the function/procedure header).

6) Provide detailed comments at the head of each module.

7) Write the comments first in a way that describes the procedure in order, then write the code between the comments.

8) Keep a separate document containing all of the information on each procedure and its usage, etc.
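
As a rough illustration of the header-comment rules and the comments-first rule, here is a minimal C sketch. The routine name `sum_bytes`, its parameters, and the header layout are hypothetical, picked only for this example. The stack entry is deliberately left as something to fill in from your own toolchain, not a measured figure.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * sum_bytes  (hypothetical example routine)
 *
 * Purpose:    Add up the bytes in a buffer, e.g. as a simple checksum.
 * Parameters: data - pointer to the first byte; may be NULL only if len is 0
 *             len  - number of bytes to read from data
 * Returns:    The sum of all bytes, modulo 2^32.
 * Resources:  No heap allocation, no system calls, no globals.
 * Calls:      None (leaf routine).
 * Stack:      Locals only; record the real figure reported by your
 *             toolchain here rather than guessing.
 */
uint32_t sum_bytes(const uint8_t *data, size_t len)
{
    /* Start the running total at zero. */
    uint32_t total = 0;

    /* Visit every byte exactly once, never reading past len. */
    for (size_t i = 0; i < len; i++) {
        /* Widen the byte and fold it into the total. */
        total += data[i];
    }

    /* Hand the finished total back to the caller. */
    return total;
}
```

The comments inside the body were written as the outline first, and the statements were then filled in beneath them.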

RossH · 2011-05-25 19:49

Hi David,

Much as I personally like assembly language, there is a good reason why assembly language translators have not developed very far - this is because in general it is only possible to automatically translate between different assembly languages where the basic machine architecture is very similar. If it is not, then translating assembly can become more complex than compiling a high level language - because the original intent of the assembly language program is often not at all clear from its structure.

Some examples to illustrate what I mean:

Translating from a CISC instruction set to a RISC instruction set. This can be done by substituting a block of RISC code for each CISC instruction, or by calling RISC functions to implement each CISC operation. But both of these tend to result in very inefficient translations. Other techniques will give better results, but require about the same level of complexity as an optimizing compiler.
Translating any assembly language to PASM. PASM has a built-in restriction of 512 instructions (on the Prop 1) - yet anyone can see that anything is possible on the Prop 1 if you code it to take advantage of multiple cogs. But how would you do this automatically? Your best hope is to adopt an LMM PASM technique, or use overlays - and both of these will be slow when compared to the original program (unless the algorithm is trivially "parallelizable").

Another point - it is generally impossible to identify the areas of an assembly language program where timing is critical - so for real-time programs any "translated" program will almost inevitably fail.

The C language is the closest thing we currently have to a "universal" assembler. It can be implemented (with varying degrees of efficiency) on any machine architecture. When written with a little care - i.e. avoiding all the non-standard language extensions that some compiler vendors just love to throw in to "improve" things - it is also extremely portable.

Ross.

Bean · 2011-05-25 20:32

Just one comment about comments.
Make sure you don't make the comment just explain what the instructions do. More than once I've seen this:

MOVLW 100 ; Move 100 into W
MOVWF temp ; Move W into temp

Well NO DUH...

This type of comment does NOTHING to explain what is going on.
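
The same contrast can be sketched in C. Everything here is invented for illustration: the names `reload_restating`, `reload_intent` and `DEBOUNCE_TICKS`, and the reading of 100 as "100 ticks of 1 ms" is an assumption, not something the PIC snippet above states.

```c
#include <stdint.h>

/* Assumed for illustration: the timer ticks once per millisecond. */
#define DEBOUNCE_TICKS 100u

/* Restating style: the comment repeats what the code already says. */
void reload_restating(uint8_t *temp)
{
    *temp = 100; /* put 100 into temp */
}

/* Intent style: the comment says why the value is what it is. */
void reload_intent(uint8_t *debounce_timer)
{
    /* Ignore switch bounce for the next 100 ms before sampling again. */
    *debounce_timer = DEBOUNCE_TICKS;
}
```

Both functions store the same value; only the second tells a later reader what the number is for.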

Bean

RossH · 2011-05-25 20:37

Bean wrote: »

Just one comment about comments.
Make sure you don't make the comment just explain what the instructions do. More than once I've seen this:

MOVLW 100 ; Move 100 into W
MOVWF temp ; Move W into temp

Well NO DUH...

This type of comment does NOTHING to explain what is going on.

Bean

Hi Bean,

I'd recommend you don't read any of my assembly code!

But I do have one justification - when I'm coding in assembly, I tend to comment every line - it's kind of a "stream of consciousness" thing. Sometimes this makes the comments seem a little redundant. But when I'm re-reading the code later, it's the comments I read, not the code - to see what the heck I thought I was doing in the first place!

Ross.

bill190 · 2011-05-25 21:00

I like assembly because you are directly "in sync" with the data sheet, especially if you use the same number system for registers as the data sheet does, such as binary.

Sort of like the movie "Tron". You are right in there with all the bits and bytes!

C libraries might rename things and might "do extra steps for you". So it can be difficult to read a data sheet and then translate that into C, or to figure out why something is being set when you did not explicitly ask for it to be set.

Phil Pilgrim (PhiPi) · 2011-05-25 21:36

Given Microchip's obscure mnemonics, I think "Move 100 into W" is entirely justified as a comment! Thank Chip for Parallax's more logical PIC and SX mnemonics!

-Phil

potatohead · 2011-05-25 22:41

    mov    A, current_char   'get current display character value
    shl    A, #3             'multiply by 8 bytes / character
    add    A, fontsum        'setup pointer to pixel data
    rdbyte    pixels, A      'get pixels

I find that when a routine can be expressed as a sum of simple, plain English actions, writing the assembly language code for it becomes a whole lot easier. Generally, I will work it out, then write it out, then match up instructions, and then code. When I use this comment style, I literally do read the comments; then, when I've located whatever it is that matters, I parse the instructions at that point for the details.
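
For comparison, the same plain-English steps can be sketched in C. The function name `glyph_row`, the `glyph_count` bounds check and the `row` parameter are additions for illustration; the assembly snippet above only fetches the first byte of the character.

```c
#include <stddef.h>
#include <stdint.h>

#define BYTES_PER_GLYPH 8u /* 8 bytes per character, as in the comments above */

/* Return one row of pixels for a character.
 * font must point to glyph_count * BYTES_PER_GLYPH bytes. */
uint8_t glyph_row(const uint8_t *font, size_t glyph_count,
                  uint8_t ch, unsigned row)
{
    /* Refuse lookups that would read outside the font table. */
    if (font == NULL || ch >= glyph_count || row >= BYTES_PER_GLYPH) {
        return 0; /* blank row */
    }

    /* Multiply by 8 bytes per character (the shl step). */
    size_t offset = (size_t)ch << 3;

    /* Point at the pixel data and fetch it (the add and rdbyte steps). */
    return font[offset + row];
}
```

Calling it with `row` set to 0 corresponds to the single `rdbyte` in the assembly version.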

In higher level languages, I find it considerably easier to follow the code, with much more brief and sporadic comments.

Seconded on C being the "universal assembly".

RossH · 2011-05-25 22:52

potatohead wrote: »

In higher level languages, I find it considerably easier to follow the code, with much more brief and sporadic comments.

Absolutely! In well-written assembly language, one can never have too many comments - but in well-written high level language, one can never have too few.

The key phrase here, of course, is "well-written".

Ross.

davidsaunders · 2011-05-26 06:15

Interesting.

RossH:
I do agree that writing translators to go from a CISC assembly to a RISC assembly can often present problems, and translating from anything else to Propeller assembly is not very efficient. Translating from RISC to CISC, though, is generally fairly simple. Of course, if you wish to translate assembly between modern pipelined processors, you had better make sure to go either from RISC to CISC or from CISC to CISC, and follow the optimization rules in the original.

Now, for the most part, as long as you do not code in a way that is address dependent, you can translate between different HW architectures with reasonable ease (assuming all targets use an OS, please).

And why do people say 'PASM'? PASM is the name of an old HLL.

Bean:

Yes. Comment every line of code, though do not just describe the instruction.

schill · 2011-05-26 06:19

Don't most of your rules apply to higher level languages as well? For the most part, you don't seem to be specifying why assembly is better but rather why good programming practices are important.

But what do I know. I'm just an engineer who hasn't been tainted by a formal programming education.

davidsaunders · 2011-05-26 06:29

I almost forgot:
You do not want to translate assembly from a traditional architecture to a Harvard architecture.

Schill:
Yes, most of the rules apply to any HLL also. In asm, though, you are responsible for how everything is done, and this tends to make you think about what you are doing as you design the algorithm.

And most people tend to become complacent in their programming practices with HLLs. If everyone kept track of things correctly, we would not have products that are considered to be of usable quality yet have significant memory leaks and/or can have buffer overruns.

davidsaunders · 2011-05-26 06:36

Most people tend to become complacent in their programming practices with HLLs. If everyone kept track of things correctly, we would not have products that are considered to be of usable quality yet have significant memory leaks and/or can have buffer overruns.

This is a good example of saving development time. I am aware of two products that, regardless of what language they are implemented in, could have been completed by now if their developers had followed the rule KEEP TRACK OF ARRAYS AND MEMORY ALLOCATIONS. These two are Mozilla and WebKit. To elaborate:

Both Mozilla and WebKit have many known memory leaks and can have buffer overruns. These are bugs that are unacceptable for even Alpha quality software. If all the programmers had followed the rules, including the one directly at issue, these bugs would not have developed to begin with.
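
One possible shape for "keeping track of arrays and memory allocations" in C is sketched below. It is only an illustration under stated assumptions: the `tracked_buf` type, its functions and the `live_allocations` counter are hypothetical names, not taken from any of the projects mentioned.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *data;     /* owned heap block, or NULL when released */
    size_t capacity; /* bytes allocated */
    size_t length;   /* bytes in use, always <= capacity */
} tracked_buf;

/* Number of buffers currently allocated; should be 0 again at exit. */
long live_allocations = 0;

/* Allocate the buffer; returns 0 on success, -1 on failure. */
int tracked_buf_init(tracked_buf *b, size_t capacity)
{
    b->data = NULL;
    b->capacity = 0;
    b->length = 0;
    if (capacity == 0) {
        return -1; /* nothing useful to allocate */
    }
    b->data = malloc(capacity);
    if (b->data == NULL) {
        return -1; /* out of memory */
    }
    b->capacity = capacity;
    live_allocations++;
    return 0;
}

/* Append n bytes; refuses (returns -1) rather than overrun the buffer. */
int tracked_buf_append(tracked_buf *b, const void *src, size_t n)
{
    if (b->data == NULL || n > b->capacity - b->length) {
        return -1;
    }
    memcpy(b->data + b->length, src, n);
    b->length += n;
    return 0;
}

/* Release the buffer; safe to call more than once. */
void tracked_buf_free(tracked_buf *b)
{
    if (b->data != NULL) {
        free(b->data);
        live_allocations--;
    }
    b->data = NULL;
    b->capacity = 0;
    b->length = 0;
}
```

Because the length and capacity travel with the pointer, every write can be checked, and a nonzero `live_allocations` at shutdown points at a leak.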

Mike G · 2011-05-26 06:44

And most people tend to become complacent in their programming practices with HLLs. If everyone kept track of things correctly, we would not have products that are considered to be of usable quality yet have significant memory leaks and/or can have buffer overruns.

Don't you have the same problem in assembly? Your argument seems to be from the perspective of an accomplished assembly coder vs an HLL coder with little system experience or care.

davidsaunders · 2011-05-26 06:52

Mike G:
It seems that generally speaking assembly language programmers pay better attention to what they do.
All my arguments could equally apply in favor of proper coding practices in HLLs, though this would be a waste of breath, as we have seen with released products that should not even be labeled as pre-alpha (Windoze, Mozilla, GNOME, WebKit, etc.) due to memory leaks and buffer overruns.

davidsaunders · 2011-05-26 07:25

Since I am mentioning known bugs in popular products as example points, I must say the following:

I respect the programmers that create the big products, and I like many of these products. I only wish to point out that certain kinds of bugs can be prevented by good coding practices and by defining, and following, a resource tracking scheme from the time the project is started. Unfortunately, this was not done for some projects. The programmers involved in these projects are quite good at what they do, and usually cannot be blamed for the way things come out, as they did not define the structure of the project, and too often accurate documentation about ALL routines is not available or is incomplete.

BREATH: Another two extreme run-on sentences (my grammar is terrible).

prof_braino · 2011-05-26 14:58

David's "simple rules" and observations apply to ANY programming language, especially if the program is to have multiple contributors or is to be revisited by the same author at a later date. The degree of observance of these rules is proportional to the degree of the application's successful function and maintainability.

If by "universal assembler" you mean that, except for processor specific hardware instructions (i.e. optionally present parallel ports, serial ports, AD, etc.), the remainder of the code is more or less immediately portable to other platforms, there IS a tool that acts as a "universal assembler". It has been ported to nearly every microcontroller and microprocessor. The issues of translating RISC to CISC, and a traditional architecture to a Harvard architecture, are minimal in that all these issues are addressed in the initial port, and the "internals" of the processor are transparent to the developer. The hardware specific differences only come into play at the application level when a particular set of functions needs optimization.

The listed rules 1, 3, 5, 6 are best practices. 7 and 8 are observations if the rules are followed. But the key issues are in your rules 2 and 4.

2) You are responsible for all the bounds checking, and all memory tracking, and you know it. This means that it is more likely that the effort will be exerted to avoid array bounds errors, as well as to avoid memory leaks.

4) You know exactly how the stack is being used.

As RossH points out, C is commonly considered universal. However, C does many things for you automatically that you have no control over, or that you are likely to assume are correct and not pursue. This is a common source of errors for all but the most experienced programmers. And a programmer's experience tends to be limited to a particular set of tools, so moving to a new tool tends to trick folks because the new tool works from a different set of assumptions.

Of course, the tool I'm discussing is Forth. The developer will always know exactly what is happening on the stack. The developer knows to check implementation differences up front. Code that is not directly portable will be obvious in that it will not interpret correctly or will crash. (There may be exceptions to this but I have never seen any).

High level Forth code is for the most part directly portable, depending on the implementation (some do or don't allow mixed case or restrict name size, etc).

The low level Forth assembly is ALSO directly portable up to the point where processor specific opcodes come into play. But dealing with processor specific opcodes is usually needed only in the small percentage of code that requires optimization.

Neither Forth nor C is a perfect solution. C does things for us, and hides the stack; Forth requires us to arrange the low level components and manage the stack ourselves. These differences are both the advantage and the drawback of each.

In the case of Forth, users tend to view it as a "high-level interactive assembler", with low-level assembler available when we need it.

Mike Green · 2011-05-26 15:18

It's interesting that this discussion has occurred in the past over and over again. There was a tremendous amount of debate when Fortran was first introduced. The amount of optimization included in the first Fortran compiler (really the first compiler) was huge and included partly because of the high-level vs. assembly efficiency issue.

PL360 was an interesting experiment on structured assembly in that it combined direct construction of individual instructions, use of conventional arithmetic and logical operators directly generating specific instructions, the use of high-level control structures (IF/THEN, FOR, WHILE, DO/UNTIL) that produced well defined, minimal instruction sequences, and function / procedure calls that had minimal and well defined overhead, usually just a jump and save return instruction.

PL360 was used to write its own compiler as well as a complete multi-threaded stand-alone operating system and utilities. Although PL360 did have GOTOs, the operating system was written without them as an exercise. As a sidenote, the availability of high level control structures did help reduce errors.

HollyMinkowski · 2011-05-26 17:20

RossH wrote: »

when I'm coding in assembly, I tend to comment every line - it's kind of a "stream of consciousness" thing. Sometimes this makes the comments seem a little redundant. But when I'm re-reading the code later, it's the comments I read, not the code - to see what the heck I thought I was doing in the first place!

This may sound strange but I usually write the comments for a program first.
Then I go back and write the code that the comments describe. I know, it seems
backwards but believe me it actually makes coding easier.

Actually, I do a flowchart first, then the comments, then the code. I completely
test each function before moving on...NEVER write the whole program and then test.
You will be sorry if you do.
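
A tiny C sketch of that workflow, comments first, then code, then a test for this one function before moving on. The names `clamp_percent` and `test_clamp_percent` are made up for the example.

```c
/* Step 1 (written first): the plan, as comments.
 *   - values below 0 become 0
 *   - values above 100 become 100
 *   - everything else passes through unchanged
 */
int clamp_percent(int value)
{
    /* values below 0 become 0 */
    if (value < 0) {
        return 0;
    }
    /* values above 100 become 100 */
    if (value > 100) {
        return 100;
    }
    /* everything else passes through unchanged */
    return value;
}

/* Step 2: test this one function before writing the next one.
 * Returns the number of failed checks, so 0 means "move on". */
int test_clamp_percent(void)
{
    int failures = 0;
    failures += (clamp_percent(-5)  != 0);
    failures += (clamp_percent(0)   != 0);
    failures += (clamp_percent(42)  != 42);
    failures += (clamp_percent(100) != 100);
    failures += (clamp_percent(101) != 100);
    return failures;
}
```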

I love asm, I consider it the easiest language to use..but it is slow, at least until
you get really good at it and build up a huge library of optimized code for re-use.
I even started a beginners asm blog...but everyone kept telling me I was leading
poor trusting newbies astray when I should be teaching them C...I got tired of hearing
it and when a good excuse came up I stopped the asm blogging :-(

I actually like C a lot. I know it has its blemishes (I think localroger hates C) but it
really is a sort of portable assembly language. Like with asm you have to be very
careful or you will shoot yourself in the foot with stuff like bounds checking. You
simply must know exactly what you are doing when using asm or C since both
languages give you complete power over the cpu and complete power is a dangerous thing
in the hands of a fool.

I like using a C compiler like gcc because I can look at the compiler's source if I am
curious or suspect a problem.

Phil Pilgrim (PhiPi) · 2011-05-26 17:52

Assembly code is the only code that I comment for my own benefit. And I typically comment every line. I've learned the hard way that the time invested gets returned in spades if I need to modify the code later.

But for HLLs, I seldom do any commenting. I probably should comment Perl regexes, though. More than once, I've had to decipher one from scratch to modify it, and it's not easy.

-Phil

Martin_H · 2011-05-26 19:52

Perl regexes need comments and a warning label!

I comment HLL code because I write the comments first to document and sketch out the program. I then write the program in between the comments. The most valuable comments in any program are the header blocks, which tell you what the program is used for and what its major parts consist of. You can always reverse engineer function, but intent is much harder to ascertain.

davidsaunders · 2011-05-26 20:31

HOLLY wrote:

This may sound strange but I usually write the comments for a program first.
Then I go back and write the code that the comments describe. I know, it seems
backwards but believe me it actually makes coding easier.

This is always how I code, though I begin with pseudocode, not a flow chart. I usually end up with 3 lines of comment to every line of code. I have had to write my own code line counters, because the normal lines-of-code counters include comments and labels that sit on a line without code, and I do not count comments or labels on their own line as code.

davidsaunders · 2011-05-26 20:37

The trouble is that I use the same diligence in HLLs, thus producing about ten lines of comment for every line of code. As a result, my HLL coding is a lot quicker than it would be if not for this, though my Asm coding is still quicker than my HLL coding.

davidsaunders · 2011-05-26 20:45

Martin_H wrote:

I comment HLL code because I write the comments first to document and sketch out the program. I then write the program in between the comments. The most valuable comments in any program are the header blocks, which tell you what the program is used for and what its major parts consist of. You can always reverse engineer function, but intent is much harder to ascertain.

Ah, thus I say. Now extend this to include an equal or better level of information in the header comment block for every single procedure. Do not forget to include accurate information on each individual parameter, the return values, the stack usage, and the register usage. Then include a detailed explanation of the procedure: how it is used, what it is intended for, whether it allocates memory, what procedures it invokes, and what system calls it makes, along with those made by every procedure it calls, all the way down. Now you are talking my language.

Phil Pilgrim (PhiPi) · 2011-05-26 20:48

Martin_H wrote:

I comment HLL code because I write the comments first to document and sketch out the program.

I have neither the patience nor the discipline! More power to you, though!

-Phil

RossH · 2011-05-26 21:05

davidsaunders wrote: »

Ah, thus I say. Now extend this to include an equal or better level of information in the header comment block for every single procedure. Do not forget to include accurate information on each individual parameter, the return values, the stack usage, and the register usage. Then include a detailed explanation of the procedure: how it is used, what it is intended for, whether it allocates memory, what procedures it invokes, and what system calls it makes, along with those made by every procedure it calls, all the way down. Now you are talking my language.

Jeez, talk about squeezing all the joy out of programming!

I'm in this game for the excitement! The danger! The adrenalin! No compiling allowed for any program under 5,000 lines! Run the whole thing on the production server as soon as it compiles! - unit testing is for nervous nellies! Documentation? Why? - my programs aren't complicated, you're just dumb! Comment lines? - just a waste of disk space! Never give a machine the benefit of the doubt! - If it won't run the program right first time, then it's got to be a hardware glitch - power cycle it and try again! Backups? - why bother when I can remember every line? Source code control? - don't make me laugh! - why would I ever want to undo any change I made?

Ross.

Phil Pilgrim (PhiPi) · 2011-05-26 21:25

Ross,

'Glad to hear you're still young! I was like that thirty-some years ago. I could remember every line of code I ever wrote -- and why! Fortunately, experience eventually trumps enthusiasm and native brilliance.

-Phil

Heater. · 2011-05-26 22:27

RossH,

Sounds like the young guys who created one of our company's core applications. Tens of thousands of lines of code, in Pascal. I did find a comment in it once. It's full of some obscure mathematical models. The only documentation for it is the stack of master's theses by the 5 or so guys who wrote it over 10 years. They are written in Finnish. I don't stand a chance.

HollyMinkowski · 2011-05-27 01:04

When I'm done with a project I write up a complete brief
that tells all I know about it. This is a supplement to the
actual code and comments. In my program's header I place
info that lets you know how to contact me at any point in
the future so I can send you a copy of the brief in case it has
been lost. IMO the job is not done until the documentation
is complete. I have the documentation in several repositories
so it will not be lost. Some of it I have to keep in encrypted
form, this presents the problem of what happens to it if I'm
no longer around to decrypt it. I decided to not worry about
this since I will never be called to account on it :-)

I would include the brief in the actual code as a huge comment
but these briefs often include photographs and other materials
not easily contained in comments.

To avoid the possible problem of losing access to my email
accounts in the future I have a backup plan for anyone needing
to reach me. I will place a unique string on a website that contains
current ways to contact me. This means anyone at any time
could do a search for that unique string using a search engine
and find me...it's the best plan I could come up with.

This all has caused me some grief from other coders because
my old bosses liked it so much they made them start doing it.
Now they are all simulating the habits of an OCD programmer..LoL

From the comments in this thread it seems I am not alone in
writing the comments first and the code second. My comments
at first are a sort of detailed pseudo-code. Actually if you think
about it this is the way people will write programs in the near
future when some level of pseudo-AI becomes the norm. The
compilers will be so good that all you will need to do is write
the comments/pseudo-code and the compiler will create the
software from that description.

potatohead · 2011-05-27 08:40

Holly, that is brilliant!

Unknown · 2011-05-27 13:13

64-bit OS written entirely in assembly

The goal of the BareMetal project, which includes a stripped-down bootloader and a cluster computing platform, is to get away from the inefficient obfuscated machine code generated by higher level languages like C/C++ and Java. By writing the OS in assembly, runtime speeds are increased, and there's very little overhead for when every clock cycle counts.

RossH · 2011-05-27 16:18

Chuckz wrote: »

64-bit OS written entirely in assembly

Wow! We think we have retro-computing nailed in these forums - but these people are taking it all the way back to the 1950s!

I predict that their next project will be an OS based on milled-steel components which will implement multi-tasking for Babbage's Difference Engine.

Ross.
