I don't know where this geek fest leads, Phil, but...
The attached file contains all the results of fibo(n) for 95001 <= n <= 95752.
Those results vary from 19854 digits to 20011 so you have about 15 million fibo digits to analyse!
This being the anti-JS thread, of course I had to do this in JS; see below.
This far outstrips what we can do in COBOL.
It outstrips what I can do in Python which only got up to fibo(27558). Though I suspect Python could do better if I could figure out how to give it more stack or memory or whatever it needs.
It takes about 16 seconds to produce those 751 results. I checked the final result against the fibo calculator linked to above. That blows up if you ask for an exact result, but if asked for the first few digits and the number of digits in the result, then we are in agreement. I think we can say by induction that the rest of the lower digits must also be correct. I can't get any bigger results without finding more memory.
What can the Java boys do with this?
//
// Recursive fibo with memoization.
//
"use strict";
var bigInt = require("big-integer");
var memo = [bigInt("0"), bigInt("1")];
function fibo(n) {
    if (memo[n] === undefined) {
        memo[n] = fibo(n - 1).add(fibo(n - 2));
    }
    return memo[n];
}

var n = 95000;
var resultString;

while (true) {
    resultString = fibo(n).toString();
    console.log("The " + resultString.length + " digits of fibo(" + n + ") are:");
    console.log(resultString);
    console.log("Done.");
    n += 1;
}
Like Konrad Zuse, I am a Kraut, as those British soldiers losing at pool against me all the time told me.
I looked at the Zuse Z3 (I think) sitting in Munich, and there is no way it used an HLL. Zuse might have thought about stuff like that, but he never did it.
Flow-Matic and Math-Matic by Grace Hopper were done before FORTRAN.
But like other things in history, the time was ripe for it. Like with automobiles and planes: different people in different places, mostly not knowing of each other, had the same ideas at the same time.
My reading of history is that:
FLOW-MATIC - Concept Grace Hopper in 1955. Released 1958.
FORTRAN - Concept John W. Backus in 1953. Released 1956.
Anyway I'm sure you are right. The idea and the means to do it were "in the air".
Also, the problem with pinning dates to these things is in the definition of the thing. What exactly is a compiler? Does it count as a compiler if it cannot compile itself (even conceptually, if not in practice)? Does it count if it's basically an assembler wrapped up with some easier function-calling syntax? Does it count if it uses numbers for variable and function references rather than names? And so on.
We can have similar arguments about who built the first electronic computer and when.
Hmmm... does anyone have a link to source code compiled by these so-called early compilers?
Because, for the sake of argument, I don't buy it. From one source:
In producing A-0, she took all the subroutines she had been collecting over the years and put them on tape. Each routine was given a call number, so that the machine could find it on the tape. "All I had to do was to write down a set of call numbers, let the computer find them on the tape, bring them over and do the additions. This was the first compiler," as described by Grace.
Sounds more like an overlaying linker/loader rather than a compiler for a high-level language. Where are the human-readable symbolic names? Where are the constructs for sequence, selection and iteration, the essentials of algorithm expression?
On the other hand FORTRAN had all this worked out and specified in BNF.
I in no way want to belittle the wonderful Grace Hopper. I'm sure she was on the right track as were others at the time.
http://www.atarimagazines.com/v4n7/kyanpascal.html
If someone asked me to produce the source code to prove it, I couldn't. The hard drive it was on is long gone because, after the market for it was gone, it wasn't worth saving. The fact that it might be of historical interest 30 years in the future didn't occur to any of us at the time. So all that's left of it are magazine reviews saying we did it. Probably the same is true for all those early compilers from the 50's.
I was not really thinking of the source for the compiler itself, but rather examples of the source code that it compiled. That is to say, sources that showed the language design. I might have thought the language had been discussed in books, journals or papers of the time that would have survived somewhere.
Whilst we are here. Did you say you were using BigDecimal in your project? Is it the popular one for node that originates from Java via GWT?
Because if so, you may want to look at these bugs that have been uncovered. I was looking at a different big-number library, https://github.com/MikeMcl/bignumber.js/, and found this statement:
The perf directory contains two applications and a lib directory containing the BigDecimal libraries used by both.
bignumber-vs-bigdecimal.html tests the performance of bignumber.js against the JavaScript translations of two versions of BigDecimal, its use should be more or less self-explanatory. (The GWT version doesn't work in IE 6.)
GWT: java.math.BigDecimal https://github.com/iriscouch/bigdecimal.js
ICU4J: com.ibm.icu.math.BigDecimal https://github.com/dtrebbien/BigDecimal.js
The BigDecimal in Node's npm registry is the GWT version. Despite its seeming popularity I have found it to have some serious bugs, see the Node script perf/lib/bigdecimal_GWT/bugs.js for examples of flaws in its remainder, divide and compareTo methods.
Sure enough if you look in that bugs.js file there are some scary things.
Thanks for the heads up. We use BigDecimal pretty extensively in Java for currency manipulation and formatting. We'll eventually have a need for something similar in Node.js.
Yep, self-modifying code. The FLOW-MATIC manual with that code sample is at http://archive.computerhistory.org/resources/text/Remington_Rand/Univac.Flowmatic.1957.102646140.pdf, and I found a nice analysis of its operation here: http://www.linuxvoice.com/history-of-computing-part-2/
There he says: "If we have run out of B data, we rewrite (9) so that all the rest of the products go directly to the unpriced output file."
For your practical extraction and reporting delight I offer all the digits of fibo(4784969). This is the first number in the Fibonacci sequence to have 1 million digits!
It looks, in summary, like this:
The 1000000 digits of fibo(4784969) are:
10727395641800477229364..........706378405156269
Done.
The first ten digits agree with the online calculator so I think it's a correct result.
This took about 12 hours in JS, using the normal iterative algorithm like so:
//
// Iterative fibo.
//
"use strict";
var bigNumber = require("big-integer");
function fibo_iterative(n) {
    var first = bigNumber(0),
        second = bigNumber(1),
        next = bigNumber(),
        c;
    if (n <= 1) {
        return bigNumber(n);
    }
    for (c = 0; c < n - 1; c += 1) {
        next = first.plus(second);
        first = second;
        second = next;
    }
    return next;
}
var n = 4784969; // The first fibo number with one million digits!
var resultString = fibo_iterative(n).toString();
console.log("The " + resultString.length + " digits of fibo(" + n + ") are:");
console.log(resultString);
console.log("Done.");
This is what I think about when I talk about lean in the context of programming. Anyone see metrics for other languages out there? Wondering how JS compares.
Looking at what they have thrown away and what they have kept in their Java example, I guess much the same result would arise for JS.
I can't imagine what they are considering "wheat" in a piece of code. If you look at the example they have there you see the wheat they produce contains the symbols "(", ")", ";", "{", "}".
That just seems totally wrong as all of those brackets and semicolons are redundant if one uses a white space delimited language like Spin or Python. In fact the round brackets and semi-colons are redundant even if you don't.
Their wheat contains "int". This makes no sense as the essential meaning of the input bubble sort algorithm in no way depends on the data type of the input (as long as the elements are orderable)
So they have kept "chaff" that we might like throw away as being noise. They have kept the noise as "wheat". All seems backward to me.
Just more mumbo jumbo from the ivory towers of higher edjumacation! Congratulate them on their doctoral work and move on.
I was playing with the bubble sort code they provided from their sample "corpus", and without changing the 2nd lexeme results you can write code that does a number of things to an array besides bubble sorting it. To me, this means that all those code snippets would compose a corpus of code with the same signature but possibly radically different functions. How does that help anyone?
Plus, what Heater said about (){}[];
This is an old observation. In the past, computer systems were more RAM-constrained than they are today. Virtual memory systems would page much more than we would find acceptable today. Programs settled down to their working set, the set of code pages in the 20% of frequently used code. But the other 80% wasn't written to waste space: it handled various rarely used features or edge cases that nonetheless needed handling. Often, when you used one of those rarely used features, the system would pause as it page-faulted in that code.
First, chopping off small functions. Why? Does the code in small functions not count at all? Is it not needed to program something because it has fewer than 50 bytecode tokens?
Next, randomly selecting 10,000 functions out of the code. Why? What can this selected set tell you about the whole code?
Nothing.
For a meaningful analysis you have to take the full set of 100 million lines, remove the syntactic clutter like (){}[], and then you might be able to count keywords.
Then you will see the most used parts of the language and can decide what a MinSet would be. But where do you put that line? Top 10%? Top 20%? Top 50%?
I think that whole study is as bogus as a study can be. Those guys should be fired for publishing that. They should even be fired for writing the paper.
Sorry about the rant, but I am really fed up with pseudo-science. Sentences like:
"70% of all children have above-average grades." - Really? How do you define average?
Fox News: "20 out of 115 people answered yes. That is 20%." - No, it isn't (it's about 17.4%).
How did the world get to this point? Something went terribly wrong, somewhere.
You missed the point of the "research" here. What you are describing is the well known idea that most programs have a small core of functionality that is used most of the time and a whole bunch of other features that are almost never used. Consider all the hundreds of command line options that most Linux utilities have. Most people don't know or care what they are. Then there is the world of GUI programs with a lot of GUI fluff that does not contribute to functionality.
BUT this article is about something else. It's about the "chaff" in the actual source code of the code you write. Even if it is core functionality. They give an example of a bubble sort method in Java.
I'm sure we all have a gut feeling about this "chaff" idea. It's all the stuff you have to write to implement some algorithm that is not really anything to do with what you actually want to say in your code. We can clearly see lots of chaff in the hugely verbose COBOL language. Many would say Java is an overly verbose language; its insistence on having to wrap everything in a class, and all that required type specification, has nothing to do with the algorithm you are trying to write.
I hinted at the chaff that I see in such code. All those brackets, braces and semicolons do not carry any meaning. The type specification, int in their example, is not relevant to the meaning of bubble sort. And so on.
BUT what they have reduced that Java source down to seems like pointless gibberish to most who have seen the article. As somebody pointed out, you can write many different algorithms that result in the same "wheat" collection as their bubble sort example, so clearly their idea of wheat carries no meaning. So what's the point?
I'm with most other readers of that article. It's junk research.
They do, however, hint at an idea that may be useful if we could do it. They point out that with a Google search a few well-selected keywords will get you useful results, and that in natural language, like English, you can drop "chaff" words from sentences and people will still get the meaning. So how cool would it be if we could search huge source code bases using just a few keywords?
For example, if you have a large team writing millions of lines of code, perhaps the same things have been implemented in there many times, each author unaware of the other. How cool would it be to be able to search for that and refactor the dupes away.
Or what if you want to search all of github for all the implementations of something.
Sadly what those researchers are doing is far away from that.
Had they been able to use the "wheat" strings that they pulled out of a Java method to search all the code base they had for similar Java methods then I might be impressed.
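Just to make that idea concrete, a toy version of such a search is easy to sketch: chop each snippet into word-like tokens and rank candidates by how many of the query's keywords they contain. This is only an illustration of the idea, nothing like what the paper actually does, and the corpus, tokenize and score names here are all made up for the example:

"use strict";

// Split source text into a set of word-like tokens (identifiers and keywords).
function tokenize(source) {
    return new Set(source.match(/[A-Za-z_][A-Za-z0-9_]*/g) || []);
}

// Fraction of the query tokens that appear in the snippet.
function score(queryTokens, snippetTokens) {
    var hits = 0;
    queryTokens.forEach(function (token) {
        if (snippetTokens.has(token)) {
            hits += 1;
        }
    });
    return hits / queryTokens.size;
}

// A pretend corpus of code snippets to search through.
var corpus = [
    { name: "bubbleSort", source: "for (i = 0; i < length; i++) { if (array[j - 1] > array[j]) { temp = array[j - 1]; } }" },
    { name: "sumArray", source: "for (i = 0; i < length; i++) { total = total + array[i]; }" }
];

var query = tokenize("array temp length swap");

corpus
    .map(function (s) { return { name: s.name, score: score(query, tokenize(s.source)) }; })
    .sort(function (a, b) { return b.score - a.score; })
    .forEach(function (s) { console.log(s.name + ": " + s.score.toFixed(2)); });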
Google TeamCity from JetBrains. It has things like duplicate runners for your source. Really nice product. And free of cost as long as you don't run too many projects per installation.
It is quite a nice web-based build/test/deploy management system, working with a lot of source control systems. Once somebody checks something in, tests can be run and, depending on the output, deployment can be executed.
Since it is a server-based installation (Tomcat web server?), it runs completely independently of your workstation.
Those duplicate runners are good. Like lint, they kick your ***. It really hurts to find out how often you used copy and paste for similar parts of your code. The duplicate runner shows you what you did. Embarrassing, somehow.
Heater, wow, it's about the source, not the executable? That's completely stupid. Next they'll complain that comments are nonfunctional and should be eliminated as chaff.
I honestly don't understand why people complain so much about source code verbosity. It's like people think programming is a typing exercise where words per minute counts. Most of my time is spent figuring out what needs to be done, thinking about how to do it, and finally doing it. The verbosity of the programming language has little impact on my overall productivity because typing is only a part of what I do.
My bigger complaint about all C family languages is their over reliance on brackets, braces, and parentheses. It makes them nearly impossible for me to touch type because I'd be using my right little finger all the time. So I shift my hand over and use my pointer and index fingers to avoid hand strain.
...it's about the source not the executable? That's completely stupid.
Whilst I think their whole thesis is stupid, the idea of "chaff" in source code is not so dumb. A bit of verbosity is OK in my book, but it can go overboard. The text required to get the thing compiled, or to satisfy the language rather than the programmer, can make grokking the underlying algorithm harder.
I honestly don't understand why people complain so much about source code verbosity. It's like people think programming is a typing exercise where words per minute counts. Most of my time is spent figuring out what needs to be done, thinking about how to do it, and finally doing it. The verbosity of the programming language has little impact on my overall productivity because typing is only a part of what I do.
I do agree. Programming is not a typing competition. If the majority of your time is not spent on the thinking part, something is wrong. Or you will be spending a lot of time thinking later, when your code needs debugging.
My bigger complaint about all C family languages is their over reliance on brackets, braces, and parentheses. It makes them nearly impossible for me to touch type because I'd be using my right little finger all the time. So I shift my hand over and use my pointer and index fingers to avoid hand strain.
Now you are contradicting yourself. You don't need to touch type; this is not a typing competition. Anyway, those brackets and braces are there to slow you down so you have more time for thinking.
Now back to our research paper.
Let's look at their example of a Java Bubble Sort:
private static void bubbleSort(int array[]) {
    int length = array.length;
    for (int i = 0; i < length; i++) {
        for (int j = 1; j < length - i; j++) {
            if (array[j - 1] > array[j]) {
                int temp = array[j - 1];
                array[j - 1] = array[j];
                array[j] = temp;
            }
        }
    }
}
They take that source and pronounce that 90% of it is "chaff" and all you really need is the important bits, in the same way you can drop words from English and still be understood. Then they remove the chaff and say that the following string represents the meaning of the source.
int length = array . for ( i 0 < ; + ) { if [ j 1 - ] > temp }
So what have they done? They have reduced 300-odd characters down to 23.
Basically they have made a hash function.
That resulting hash has no more connection to the original meaning of the code than if I had taken the MD5SUM. And like all hashes it will suffer from collisions. No doubt there are many functions that perform totally different tasks that will hash down to the same string.
Not only that, I'm sure I could find ways to write bubble sort that don't look anything like the original and hash down to something completely different. Simply using "long" instead of "int", or "while" instead of "for", would be a good way to confound their concept.
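To put that in this thread's favourite language: here are two JavaScript bubble sorts that do exactly the same job yet share very few of the surface tokens their scheme would keep. This is just my illustration of the point, not anything from the paper:

"use strict";

// Bubble sort written with for loops, in the same shape as their Java example.
function bubbleSortFor(array) {
    for (var i = 0; i < array.length; i += 1) {
        for (var j = 1; j < array.length - i; j += 1) {
            if (array[j - 1] > array[j]) {
                var temp = array[j - 1];
                array[j - 1] = array[j];
                array[j] = temp;
            }
        }
    }
    return array;
}

// The same algorithm written with while loops and a swapped flag.
function bubbleSortWhile(items) {
    var swapped = true;
    while (swapped) {
        swapped = false;
        var k = 1;
        while (k < items.length) {
            if (items[k - 1] > items[k]) {
                var held = items[k - 1];
                items[k - 1] = items[k];
                items[k] = held;
                swapped = true;
            }
            k += 1;
        }
    }
    return items;
}

console.log(bubbleSortFor([5, 3, 1, 4, 2]));   // [ 1, 2, 3, 4, 5 ]
console.log(bubbleSortWhile([5, 3, 1, 4, 2])); // [ 1, 2, 3, 4, 5 ]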
Brilliantly non-useful.
Yes. Not useful as presented, but I knew it would generate some discussion.
I was just thinking about lean, and various ways to get about that concept related to programming. A part of lean is complexity state at any given time. The core task to accomplish may itself be complex. That is what it is. Lean is about the meta-task complexity being low, or as low as can be managed. The higher the meta-complexity, the lower the efficiency and the higher the error rates are.
Another part of lean is being able to work with tight feedback loops. See trouble, deal with it right then, continue with observations until the trouble is gone. The greater the latency, the higher the cost of errors, and efficiency can be very significantly reduced, depending on the nature of the errors.
Chip also counts what you type as part of lean. One time I heard him say that every keystroke should be advancing your goal, or something like that.
There is readability too.
The COBOL posted here a while back was surprisingly readable. It wasn't hard to understand what it did. Documentation is generally a part of lean, in that clarity of task understanding reduces error and enables consistent change and/or decisions where they are required. The cost is some efficiency; the benefit is reduced error and/or flexibility (among others).
Now you are contradicting yourself. You don't need to touch type; this is not a typing competition. Anyway, those brackets and braces are there to slow you down so you have more time for thinking.
Actually, you do need to touch type, if you're a touch typist, as I have been since I was 14. Switching to single-finger pecking is like stopping and getting off the bicycle all the time, or writing something with the left hand, or any other abrupt break in the flow. It's not something you wish to do if you can possibly avoid it (and it's too late to think at that point). But it is possible to touch type the C brackets and braces, of course, at least on a US keyboard (on my Nordic keyboard the {} chars are inconveniently accessed via AltGr).
Yes most of the programming is done in the head. The writing is just the end part. But that end part better be flowing nicely. For touch typists it's painful to have to sit and watch non-typists enter code.. it takes *forever*! It's not long compared to the design phase of programming, but imagine when you are watching someone walking the last one metre to the front door.. and taking half a minute to do so. It doesn't matter overall, but it's not good to watch.
Over on the 6502 forum a guy came up with the idea of a 'touch-typist's assembler'. Now that sounds strange, but what he did was to define an assembler syntax which didn't have any of those characters so common in assembly but slightly inconveniently placed on the keyboard (# and $ and & and so on). It sounds a bit silly, but I warmed to the idea.
My bigger complaint about all C family languages is their over reliance on brackets, braces, and parentheses. It makes them nearly impossible for me to touch type because I'd be using my right little finger all the time. So I shift my hand over and use my pointer and index fingers to avoid hand strain.
You could do what I do: programmer Dvorak on a Kinesis Advantage keyboard:
Before I switched from QWERTY my hands were constantly hurting, and I had to take breaks to rest them. Ever since I switched 6 years ago I've never had any pain.
Keyboards, I hate keyboards, all of them. Big, ugly, mostly redundant things. I'm not sure why, after all these years, I have not built a custom keyboard for myself.
Firstly, I want to chop off the numpad area. All redundant and a waste of space. Space where my mouse could be running, so I don't have to reach so far to get it.
The Caps Lock, that has to go. Nothing but trouble. Does anybody ever use that?
Along the bottom I have perhaps 4 keys that are never used. Two Windows keys that don't seem to do anything. An Alt key that does nothing. And a menu key that I only just this minute discovered does actually act like a right mouse click. They can all go.
Then we have things like § and ½ and ¤. They can go. Does anyone ever use those? Oddly, none of those can be used in symbol names in JavaScript, despite the fact that Hebrew, Arabic and a million other Unicode characters can be.
At the top we have 12 function keys. 12 for goodness sake. They can all go. I never remember what function key is what function and it's always different from app to app so I don't bother remembering them.
There are three odd keys: a crescent moon, a light bulb, and a power switch symbol. They can go. They never seem to do anything... Holy s... I just hit one of those keys for the first time in years and my machine immediately switched itself off!!! WTF?
Edit: Turns out the moon is a sleep button. Please no, this machine is a server as well and should never sleep. The light bulb seemed to throw away my Chrome tab and open a new tab, thus losing my edit. Great! I dare not hit the power-looking key.
Now, that area where the cursor keys live. Pause/Break can go. Scroll Lock can go. Page Up and Page Down can go; we can use Shift and cursor up/down for that. Home and End can go; we can use Shift and cursor left/right for that.
There, that's much better. Cleared a lot of keyboard "chaff" and made some space. Now perhaps we can pull those {[]} keys out to keys of their own. In fact we can do that for all the AltGr keys and get rid of AltGr!
And finally, I will reinstate the HERE-IS key.
End of keyboard rant.
I use caps lock at times when writing, not programming.