Heater uses a debugger. Shock, horror !
Heater.
Posts: 21,230
A couple of weeks ago I semi-jokingly, semi-seriously said this: https://forums.parallax.com/discussion/comment/1432597/#Comment_1432597
To which Roy Eltham replied: Nah, the only people who need debuggers are C++ programmers. The language they are trying to use has become so complex that they have no idea what their programs will do ahead of time and have to single step their way through it to find out.
Heater, I know you probably are trying to push buttons, but you are so incredibly very wrong with that statement, joke or not.
I am now eating humble Pi. For the first time in a decade or so I broke down and fired up a debugger. The story...
1) A certain server side process, in C++, that had been running for months suddenly decided to crash out a couple of times every day.
2) We have the source but much of it written by a partner company. I don't know my way around it.
3) It's the Easter holiday; there is no one around to fix it but me.
4) OK, I break out gdb.
a) First set things up to get core dumps on segfaults: $ ulimit -c unlimited
b) Rebuild the process with debugging symbols, -g option.
c) Restart the process. Go down the pub for a few hours and wait for it to segfault.
d) When I get back there it is, dead, but with a core file to look at.
e) Easy: $ gdb theExecutable corefile. BOOM, there displayed is the offending source line.
Turns out our partner code has a buffer overrun in a parser function. It's using a fixed size buffer but the data length is pulled straight from incoming binary messages. Ergo, an error in the input overruns the buffer, segfault.
A quick fix: put a length check in there to prevent the overrun. But where are those message errors coming from anyway...
So, Roy was right, the debugger saved our bacon.
On the other hand I still stick to my point a bit. C++ is a "dangerous" language and has become so complex that no single human knows how it all works. If we had proper testing in place (unit tests, integration tests, etc.) this bug would have been found when the code was written, not months after it was deployed. The debugger, as a last resort, would not be needed.
Sorry for the way-OT ramblings, but hey, it's Easter, we were facing disaster, we overcame it. Gotta tell someone...
Comments
And only that, nowadays.
We run into this kind of thing (having to debug code written by someone other than those currently working with it) often enough. A debugger is required.
Our projects contain thousands of source files, hundreds of executables/DLLs, and run on multiple platforms. We have many, many gigabytes of data that the code works with, coming from flat files, databases, and third-party services. Our client code runs on hundreds of thousands of end users' machines (almost all of them different from each other), sending data over the internet to our multi-server back ends spread across the world. It's practically impossible to test for every possible situation ahead of time, let alone doing so in a time frame that would be acceptable for success in our market. A good debugger is required.
Make friends with valgrind; it has tests for all these kinds of errors.
We do unit tests at work, and poorly written unit tests (we have to get 100% code coverage!) give a false sense of having done a good job. But they catch many potential problems and are very useful to check, at least partially, that newer changes didn't break anything. I use them at home too.
Are you sure you couldn't have solved the problem with an oscilloscope? That's what I use for a debugger when deductive logic fails me.
-Phil
It's a good point though, there are many good debugging aids besides debuggers. In the past I have used:
Logic analyser - to see what is wrong on the I/O bus.
Oscilloscope - general I/O watching, especially if analog of course.
Multimeter - Check supply voltages and low speed I/O.
LEDs - A simple LED and resistor makes a logic probe.
Speaker/piezo sounder - Listen to your logic!
Tongue - Is that battery dead?
Finger - Is the 500v HT rail up!
Nose - Did the magic smoke come out?
(A bit OT, but the above reminds me of my military service, where I was a telex operator (paper tape). During night shifts, mostly spent on the sofa, I could hear from the noise of the telex machine when names of Russian ships ticked in.)
Friend of mine happened to notice one day that when some piece of equipment started failing the station of the FM radio he had on the bench went silent. Using the radio as a debugger he narrowed it down to a power supply regulator that would start oscillating and blasting out on the FM band. Which was interesting because it's not easy to get old 7805 regulators to oscillate.
If you want to get serious, there are always field probes: https://www.rohde-schwarz.com/fi/product/emc_near-field_probes-productstartpage_63493-73798.html
Oh, and thermometers and IR cameras can be useful.
That code was running on an embedded device with limited memory where valgrind would not even fit. Luckily I had insisted that the entire project should be runnable on PC Linux, Windows and Mac. So I could poke at it at leisure on my PC.
Actually... my experience has been that if you get the team to build code that will run on varied platforms, besides the target, that in itself unearths a lot of problems.
As far as I can tell the new analysers in GCC and Clang do a much better job. Also true.
Getting 100% coverage is only the beginning. One has to test for all kinds of input values and their permutations. And edge cases. Not to mention all kinds of internal state.
For example, suppose the code under test contains a branch like "if (A == B)". A couple of tests, one with A equal to B and one with A not equal to B, get you 100% coverage. That is easy.
But what if there is a bug? What if the programmer had incorrectly written "(A <= B)"? Or something way off base like "((A == B) || (A == 42))"? Then your two tests might miss that.
You end up having to write a ton of test cases for even the simplest code.
Which is what I have spent many happy hours doing on serious projects in military, space, aerospace, etc. There is no way to use a debugger when the stuff is deployed!
The forums software has corrupted my post by replacing "B )", without the space in the middle, with "<span class="Emoticon Emoticonnerd">"
How nuts is that?!
Once I saw a compiler bug report from some fellow programmer. He was complaining because the compiler supposedly generated wrong code... he did something like this:
1) The size of "int" is implementation-defined. It might not be the size you think it is.
2) Bitwise OR on floats is not on. I can't even get GCC to compile that, no matter what I use for SOME_CONSTANT.
3) Testing floats for equality, or inequality, is prone to failure.
Even if he had used a logical OR, as in (my_fl32 || SOME_CONSTANT), that is equivalent to comparing each operand against zero, so that would be two bugs in one line, as per 3) above.
What have I missed?
That's what I'll call it when a perfectly functioning PBASIC program decides not to load on a specific COM port.
At least your original statement held true. You only had to use a debugger because it was C++, and written by someone else.
Yes, unit testing can be a pain in the Java/C++, though it is always worth it in the end if it is done correctly. It takes a lot of thought to write good unit tests for what is being tested, and you have to write each one at least 5 times using different algorithms, just in case there is a bug in the unit test code itself.
I much prefer taking the time to do correct and thorough unit testing, though; it eliminates a lot of potential bugs (yes, there is always the stray bug that falls through unit testing).
What is "unit testing"?
In products that I've worked on in the past, debug logs were valuable in tracking down problems that occurred at customer sites. In some cases, the use of core dumps along with a debugger were the only way to track down the cause of a crash.
What is a "unit of software", you might ask? A very good question.
Generally, programs are composed of a bunch of parts: functions, procedures, classes, modules. A "unit" might be any of those, depending on the language one is using.
The idea of a unit test is to verify that those functions, procedures, classes, modules, etc work as expected, prior to using them in the complete program. A unit test becomes a little program that uses those "units" and checks that they do what they should. And importantly checks that they don't do what they should not.
To bring this into the Propeller world, one might want to test a Spin object before using it in a bigger program. You might have noticed that most Spin objects in OBEX and found elsewhere come with demo programs that show how they work. In a way those demo programs are unit tests for Spin. They at least show the object in question works to some degree.
I was thinking a unit was some kind of individual physical part. I know what verification testing is, but more from a physical (real-world part) perspective than from code.
Now the bigger question is how do you verify and validate those software tools or test programs, or do you even bother or care to?
-Phil
So, assuming you have a good requirements document:
1) Make a software design. This gets reviewed and signed off by a couple of guys other than the design authors.
2) Write software according to the design. This gets reviewed and signed off by a couple of guys other than the software authors.
3) Create unit tests. The shape of these of course depends on the way the software was written, but the expected test results are traceable back to the requirements document. This gets reviewed and signed off by a couple of guys other than the test writers.
4) Perform unit tests. The results of which are reviewed and signed off by a couple of guys other than those who performed the tests.
Of course all that palaver gets repeated when all those units are assembled into the complete program and integration testing is done, in simulators and on the real target hardware.
Now, as you say, all of that depends on using various software tools, compilers and so on. What about those? Well, they are just software, built with same attention to detail.
It has been known that projects get built three times, by three different teams, using three different tool sets. If the results don't match then you know there is a problem somewhere.
Except, in one case I recall we were using a compiler from a not so trustworthy and traceable source, Intel. The customer insisted on reviewing all the assembler output of that compiler in case there were compiler bugs!
If you ever find yourself flying in a Boeing 777, just relax and enjoy the flight, safe in the knowledge that Heater spent a year or so testing its fly-by-wire software!
777 is my favorite airplane after the 747.
I've heard of design reviews but I don't know exactly what they do there.
I have also done some firmware verification testing and seen some of the software specification documents.
When I was at J&J even equipment software needed validation.
Debugging was HARD, e.g. why didn't that apple get tripped off the conveyor where it was supposed to? My main debugging tool was a video camera, playing footage back frame-by-frame, from the time an apple passed under the camera until it reached its targeted drop point. Stainless-steel conveyor chain stretches over time, and that can cause timing errors vis-a-vis the encoder signals going back to the controller. But to a shop foreman, it all looks like a developer screw-up.
I dreaded the 8 a.m. phone call, because I knew I would not be sleeping in my bed that night. I probably aged more in the year I turned 35 than in any year since. But the work paid for my house, along with significant royalties after I exited the day-to-day grind. So I would be loath to complain.
BTW, my favorite plane was a DC10. (Sorry, Boeing!) But I haven't flown in a 777 -- or any other plane in quite some time.
-Phil
I've always been fascinated by those optical scanning/grading machines.
Frank,
I am a big believer in Torture testing because I've seen production do horrible things to tooling and end users sometimes do even worse.
I think the ultimate torture test is if kids can't destroy it by hand.
Yes, they put the keyboard down keys-first and press on it, or press as many keys at once as possible, as many times as possible, and see if it still works...
They provide traceability, to see when something got broken (assuming you store the results of the tests), and who did it too.
Once upon a time, I worked for a shop that sold "security software". They had an "intrusion alert" package, called something like Intruder Alert, that you could install in Novell Netware. At our clients it would "abend" from time to time (at all of them!). I never got a memory dump to analyze the problem, and the software developers were baffled because they could never reproduce it. I thought it was due to slow links between the servers; they had something like 64k links. It was most probably a buffer overrun. They probably thought "we will send the information to the other server before the buffer is full and a new event happens, networks are quite fast"... but not all of them are.