
Heater uses a debugger. Shock, horror !

Heater.

A couple of weeks ago I semi-jokingly, semi-seriously said this: https://forums.parallax.com/discussion/comment/1432597/#Comment_1432597
Nah, the only people who need debuggers are C++ programmers. The language they are trying to use has become so complex that they have no idea what their programs will do ahead of time and have to single-step their way through it to find out.
To which Roy Eltham replied:
Heater, I know you probably are trying to push buttons, but you are so incredibly very wrong with that statement, joke or not.

I am now eating humble Pi. For the first time in a decade or so I broke down and fired up a debugger. The story...

1) A certain server side process, in C++, that had been running for months suddenly decided to crash out a couple of times every day.

2) We have the source, but much of it was written by a partner company. I don't know my way around it.

3) It's the Easter holiday, there is no one around to fix it but me :(

4) OK, I break out gdb.

a) First set things up to get core dumps on segfaults: $ ulimit -c unlimited
b) Rebuild the process with debugging symbols, -g option.
c) Restart the process. Go down the pub for a few hours and wait for it to segfault.
d) When I get back there it is, dead, but with a core file to look at.
e) Easy: $ gdb theExecutable corefile. BOOM, there's the offending source line, right on screen.

Turns out our partner's code has a buffer overrun in a parser function. It's using a fixed-size buffer, but the data length is pulled straight from the incoming binary messages. Ergo, a bad length in the input overruns the buffer: segfault.

A quick fix: put a check on the length in there to prevent the overrun. But where are those message errors coming from anyway...
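
The pattern, stripped of everything specific to our partner's code, was essentially this. A minimal sketch, all names made up:

#include <cstdint>
#include <cstring>

// Hypothetical message as it arrives off the wire.
struct Message {
    uint16_t payloadLength;   // length field taken straight from the incoming binary data
    const uint8_t* payload;
};

enum { BUFFER_SIZE = 256 };

bool parseMessage(const Message& msg)
{
    uint8_t buffer[BUFFER_SIZE];

    // The bug: trusting payloadLength blindly. A corrupt message with a length
    // bigger than BUFFER_SIZE overruns the buffer and we segfault.
    // The quick fix: check the length and reject anything that does not fit.
    if (msg.payloadLength > BUFFER_SIZE) {
        return false;   // drop (and log) the bad message instead of crashing
    }
    std::memcpy(buffer, msg.payload, msg.payloadLength);

    // ... the actual parsing of buffer would go here ...
    return true;
}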

So, Roy was right, the debugger saved our bacon.

On the other hand I still stick to my point a bit. C++ is a "dangerous" language and has become so complex that no single human knows how it all works. If we had proper testing in place (unit tests, integration tests, etc.) this bug would have been found when the code was written, not months after it was deployed. The debugger, as last resort, would not be needed.

Sorry for the way-OT ramblings, but hey, it's Easter, we were facing disaster, we overcame it. Gotta tell someone...

Comments

  • evanh Posts: 15,187
    Nice. Clearly made the bug finding job a lot faster.
  • Tor Posts: 2,010
    That's what I use gdb for - looking at coredumps.
    And only that, nowadays.
  • I work as part of a team of many coders, on a code base that spans more than a decade in its creation/usage. Lots of the people that contributed to this code base no longer work with us. We also use a decent number of 3rd party libraries (mostly with full source).
    We run into this kind of thing (having to debug code written by someone other than those currently working with it) often enough. A debugger is required.

    Our projects contain thousands of source files, hundreds of executables/dlls, and run on multiple platforms. We have many, many gigabytes of data that the code works with, coming from flat files, databases, and third party services. Our client code runs on hundreds of thousands of end users' machines (almost all of them different from each other), sending data over the internet to our multi-server back ends spread across the world. It's practically impossible to test for every possible situation ahead of time, let alone to do so in a time frame that would be acceptable for our success in our market. A good debugger is required.

  • Also, thanks for sharing this. :) Glad to see you come around a little.
  • Ale Posts: 2,363
    heater: I suspect that a bug never comes alone :).

    Make friends with valgrind; it has tests for all these kinds of errors.

    We do unit tests at work, and poorly written unit tests (we have to get 100 % code coverage!) give a false sense of having done a good job. But they catch many potential problems and are very useful for checking, at least partially, that newer changes didn't break anything. I use them at home too :)
  • Heater,

    Are you sure you couldn't have solved the problem with an oscilloscope? That's what I use for a debugger when deductive logic fails me. :)

    -Phil
  • Heater. Posts: 21,230
    Phil,
    Are you sure you couldn't have solved the problem with an oscilloscope? That's what I use for a debugger when deductive logic fails me.
    Pretty sure. I could not find scope probes long enough to reach our server in a Google data center on the west side of the USA, where the bug manifested, from Finland!

    It's a good point though, there are many good debugging aids besides debuggers. In the past I have used:

    Logic analyser - to see what is wrong on the I/O bus.

    Oscilloscope - general I/O watching, especially if analog of course.

    Multimeter - Check supply voltages and low speed I/O.

    LEDs - A simple LED and resistor makes a logic probe.

    Speaker/piezo sounder - Listen to your logic!

    Tongue - Is that battery dead?

    Finger - Is the 500V HT rail up?!

    Nose - Did the magic smoke come out?




  • Tor Posts: 2,010
    You forgot AM radio.. some bugs have been found by using the radio to determine there's a loop or something.

    (a bit OT - but the above reminds me of my military service time, where I was a telex operator (paper tape). During night shifts, mostly spent on the sofa, I could hear from the noise of the telex machine when names of Russian ships ticked in)
  • Heater. Posts: 21,230
    Yes of course, radio, forgot that one. Even FM radios.

    Friend of mine happened to notice one day that when some piece of equipment started failing the station of the FM radio he had on the bench went silent. Using the radio as a debugger he narrowed it down to a power supply regulator that would start oscillating and blasting out on the FM band. Which was interesting because it's not easy to get old 7805 regulators to oscillate.

    If you want to get serious there are always field probes https://www.rohde-schwarz.com/fi/product/emc_near-field_probes-productstartpage_63493-73798.html

    Oh, and thermometers and IR cameras can be useful.


  • Heater. Posts: 21,230
    edited 2018-04-10 10:04
    Ale,
    ... a bug never comes alone
    How profoundly true that is.
    Make friends with valgrind; it has tests for all these kinds of errors.
    I know valgrind. Last time I tried to use it, it made the code run many times slower and did not show up the memory leak I knew was in a big pile of C++ one of our contractors had written. No help at all. I ended up creating my own versions of "new" and "delete" so that I could log exactly what was going on and find the dead objects that were accumulating.
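
    Something along these lines is all it takes. A minimal sketch, everything simplified:

    #include <cstdio>
    #include <cstdlib>
    #include <new>

    // Count live allocations so a leak shows up as a number that keeps climbing.
    static std::size_t liveAllocations = 0;

    void* operator new(std::size_t size)
    {
        void* p = std::malloc(size);   // malloc does not recurse back into operator new
        if (!p) throw std::bad_alloc();
        ++liveAllocations;
        std::fprintf(stderr, "new    %zu bytes at %p, live: %zu\n", size, p, liveAllocations);
        return p;
    }

    void operator delete(void* p) noexcept
    {
        if (!p) return;
        --liveAllocations;
        std::fprintf(stderr, "delete %p, live: %zu\n", p, liveAllocations);
        std::free(p);
    }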

    That code was running on an embedded device with limited memory where valgrind would not even fit. Luckily I had insisted that the entire project should be runnable on PC Linux, Windows and Mac. So I could poke at it at leisure on my PC.

    Actually... my experience has been that if you get the team to build code that will run on varied platforms, besides the target, that in itself unearths a lot of problems.

    As far as I can tell the new analysers in GCC and Clang do a much better job.
    We do unit tests at work, and poorly written unit tests (we have to get 100 % code coverage!) give a false sense of having done a good job.
    Also true.

    Getting 100% coverage is only the beginning. One has to test for all kinds of input values and permutations. And edge cases. Not to mention all kinds of internal state.

    For example, say we want to test code like this:
    if (A == B)  {
        doX();
    } else {
        doY();
    }
    
    A couple of tests with A equal to B and A not equal to B gets you 100% coverage. That is easy.

    But what if there is a bug? What if the programmer had incorrectly written "(A <= B)"? Or something way off base like "((A == B) || (A == 42))"? Then your two tests might miss that.
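
    To actually catch the "<=" or the "== 42" mistake you need values on either side of the boundary, not just one "equal" and one "not equal" case. Something along these lines, as a minimal sketch with made-up names:

    #include <cassert>

    // Hypothetical unit under test: true when the doX() branch would be taken.
    bool takesXBranch(int A, int B) { return A == B; }

    int main()
    {
        assert(takesXBranch(5, 5)  == true);    // A == B
        assert(takesXBranch(4, 5)  == false);   // A < B  -- catches a stray "<="
        assert(takesXBranch(6, 5)  == false);   // A > B  -- catches a stray ">="
        assert(takesXBranch(42, 5) == false);   // catches the "|| (A == 42)" version
        return 0;
    }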

    You end up having to write a ton of test cases for even the simplest code.

    Which is what I have spent many happy hours doing on serious projects in military, space, aerospace, etc. There is no way to use a debugger when the stuff is deployed!

  • Heater. Posts: 21,230
    Ha! That is funny. In my post above, talking about software bugs, it shows up a bug in the forum software!

    The forum software has corrupted my post by replacing "B )", without the space in the middle, with "<span class="Emoticon Emoticonnerd">"

    How nuts is that?!

  • Ale Posts: 2,363
    I thought those bespectacled emoticons looked kind of funny...

    Once I saw a compiler bug report from some fellow programmer. He was complaining because the compiler supposedly generated wrong code... he did something like this:
    float my_fl32 = -12.34;
    int my_i32 = 500;
    
      if ((my_fl32 | SOME_CONSTANT) != my_i32)
      {
        // do something
      }
    

  • Heater. Posts: 21,230
    edited 2018-04-10 11:44
    Wow, how many bugs can one fit into 6 lines of code?

    1) Using "int" is undefined behavior. Might not be the size you think it is.
    2) Bitwise OR on floats is not on. I can't get GCC to even compile that, no matter what I use for SOME_CONSTANT.
    3) Testing floats for equality, or inequality, is prone to failure.

    Even if he had used a logical OR, as in (my_fl32 || SOME_CONSTANT), that is equivalent to an equality comparison with zero on each side, so that would be two bugs in one line, as per 3) above.
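
    For 3), the usual medicine is to compare floats with a tolerance rather than with ==. A minimal sketch:

    #include <cmath>

    // Absolute tolerance comparison; a relative tolerance is often better,
    // but this shows the idea.
    bool nearlyEqual(float a, float b, float epsilon = 1e-5f)
    {
        return std::fabs(a - b) <= epsilon;
    }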

    What have I missed?
  • Heater. Posts: 21,230
    Mind you, back in the day, C compilers were far less fussy and would compile any old junk you fed them without warning or error. At least the ones I had on the PC in the early 1980's.

  • Ale Posts: 2,363
    edited 2018-04-10 18:25
    Yeah, it wasn't GCC; I think it was Tasking for the PowerPC. I couldn't get it to compile either, because the given example, the one my colleague submitted, was so full of errors and misconceptions (like the bitwise OR on floats you noted) that I thought: go grab a book before posting garbage and blaming the compiler... I'm sure he coerced the compiler somehow... (rolls eyes)
  • Heater. wrote: »
    I knew was in a big pile of C++

    That's what I'll call it when a perfectly functioning PBASIC Program decides not to load on a specific COM Port. :innocent:
  • Well Heater:
    At least your original statement held true. You only had to use a debugger because it was C++, and written by someone else.

    Yes, unit testing can be a pain in Java/C++, though it is always worth it in the end if it is done correctly. It takes a lot of thought to write good unit tests for what is being tested, and you end up writing each test at least 5 times using different algorithms (just in case there is a bug in the unit test code itself).

    I much prefer taking the time to do correct and thorough unit testing, though; it eliminates a lot of potential bugs (yeah, there is always the stray bug that falls through unit testing).
  • davidsaunders,

    What is "unit testing"?
  • I believe unit testing refers to thorough quality control tests before a software version or product is released. It might also refer to testing each unit of a product before it is shipped. According to Wikipedia, unit testing is the process of testing individual modules before they are integrated. Whichever definition is used, it is difficult, if not impossible, to test a product for every type of configuration that a customer will use it in.

    In products that I've worked on in the past, debug logs were valuable in tracking down problems that occurred at customer sites. In some cases, the use of core dumps along with a debugger was the only way to track down the cause of a crash.


  • Heater. Posts: 21,230
    Genetix,
    What is "unit testing"?
    In the software engineering world a "unit test" is a test of a unit of software.

    What is a "unit of software", you might ask? A very good question.

    Generally programs are composed of a bunch of parts: functions, procedures, classes, modules. A "unit" might be any of those, depending on the language one is using.

    The idea of a unit test is to verify that those functions, procedures, classes, modules, etc work as expected, prior to using them in the complete program. A unit test becomes a little program that uses those "units" and checks that they do what they should. And importantly checks that they don't do what they should not.
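
    In C++ land, stripped of any test framework, a unit test need be no more than this. A minimal sketch with made-up names:

    #include <cassert>

    // The "unit": a little function we want to trust before using it in the big program.
    int clampToByte(int value)
    {
        if (value < 0)   return 0;
        if (value > 255) return 255;
        return value;
    }

    // The unit test: a little program that checks the unit does what it should...
    int main()
    {
        assert(clampToByte(100) == 100);   // normal case
        assert(clampToByte(-5)  == 0);     // below range
        assert(clampToByte(300) == 255);   // above range
        // ...and does not do what it should not:
        assert(clampToByte(0)   == 0);     // boundaries stay put
        assert(clampToByte(255) == 255);
        return 0;
    }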

    To bring this into the Propeller world, one might want to test a Spin object before using it in a bigger program. You might have noticed that most Spin objects in OBEX and found elsewhere come with demo programs that show how they work. In a way those demo programs are unit tests for Spin. They at least show the object in question works to some degree.

  • Thanks Heater,

    I was thinking a unit was some kind of individual part. I know what verification testing is, but more from a physical (real-world part) perspective than code.


    Now the bigger question is how do you verify and validate those software tools or test programs, or do you even bother or care to?
  • Oh, dg. In my mind, "unit testing" falls into the same jargon bucket as "use case." It's overused, misused and, as a consequence, nearly meaningless. What's worse, the previous sentence forced me to omit an Oxford comma, due to too many commas. How will I ever get to sleep tonight? :)

    -Phil
  • Heater. Posts: 21,230
    edited 2018-04-13 03:22
    Back in the day when I was involved in software that had to work, think avionics and secure communications, everything was checked and double checked.

    So, assuming you have a good requirements document:

    1) Make a software design. This gets reviewed and signed off by a couple of guys other than the design authors.
    2) Write software according to the design. This gets reviewed and signed off by a couple of guys other than the software authors.
    3) Create unit tests. The shape of these of course depends on the way the software was written, but the expected test results are traceable back to the requirements document. This gets reviewed and signed off by a couple of guys other than the test writers.
    4) Perform unit tests. The results of which are reviewed and signed off by a couple of guys other than those who performed the tests.

    Of course all that palava gets repeated when all those units are assembled into the complete program and integration testing is done. In simulators and on the real target hardware.

    Now, as you say, all of that depends on using various software tools, compilers and so on. What about those? Well, they are just software, built with same attention to detail.

    It has been known that projects get built three times, by three different teams, using three different tool sets. If the results don't match then you know there is a problem somewhere.

    Except, in one case I recall we were using a compiler from a not so trustworthy and traceable source, Intel. The customer insisted on reviewing all the assembler output of that compiler in case there were compiler bugs!

    If you ever find yourself flying in a Boeing 777, just relax and enjoy the flight, safe in the knowledge that Heater spent a year or so testing its fly-by-wire software!
  • Heater,

    777 is my favorite airplane after the 747.
    I've heard of design reviews but I don't know exactly what they do there.
    I have also done some firmware verification testing and seen some of the software specification documents.

    When I was at J&J even equipment software needed validation.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2018-04-13 04:21
    My initiation into control software was with linescan-camera fruit sizers. Fortunately, I had total control over all the firmware, so I didn't have to interact with other developers' programming. Unfortunately, the performance criteria of the system were, to a certain extent, subjective. For example, a red delicious apple lying on its side will look bigger from above than one standing up. But try to tell that to a packing house foreman whose apple case weights are all over the place.

    Debugging was HARD, e.g. why didn't that apple get tripped off the conveyor where it was supposed to? My main debugging tool was a video camera, playing footage back frame-by-frame, from the time an apple passed under the camera until it reached its targeted drop point. Stainless-steel conveyor chain stretches over time, and that can cause timing errors vis-a-vis the encoder signals going back to the controller. But to a shop foreman, it all looks like a developer screw-up.

    I dreaded the 8 a.m. phone call, because I knew I would not be sleeping in my bed that night. I probably aged more in the year I turned 35 than in any year since. But the work paid for my house, along with significant royalties after I exited the day-to-day grind. So I would be loath to complain.

    BTW, my favorite plane was a DC10. (Sorry, Boeing!) But I haven't flown in a 777 -- or any other plane in quite some time.

    -Phil
  • One place I worked did telecom switches. When a new system build was done it would be put through a Sanity Test. This was a quick sort of go / no go testing. No point in going full testing if it blew chunks on something major. So, there's another term to add to the list of test types.
  • Genetix Posts: 1,742
    edited 2018-04-13 16:55
    Phil,

    I've always been fascinated by those optical scanning/grading machines.


    Frank,

    I am a big believer in Torture testing because I've seen production do horrible things to tooling and end users sometimes do even worse.
    I think the ultimate torture test is if kids can't destroy it by hand.
  • Ale Posts: 2,363
    I am a big believer in Torture testing because I've seen production do horrible things to tooling and end users sometimes do even worse.
    I think the ultimate torture test is if kids can't destroy it by hand.

    Yes, like putting the keyboard face down and pressing on it, or pressing as many keys at once, as many times as possible, and seeing if it still works...
    The idea of a unit test is to verify that those functions, procedures, classes, modules, etc work as expected, prior to using them in the complete program. A unit test becomes a little program that uses those "units" and checks that they do what they should. And importantly checks that they don't do what they should not.

    They provide traceability, to see when something got broken (assuming you store the results of the tests), and who did it too.

    Once upon a time, I worked for a shop that sold "security software". They had an "intrusion alert" package, called something like Intruder Alert, that you could install on Novell NetWare. At our clients' sites it would "abend" from time to time (at all of them!). I never got a memory dump to analyze the problem. The software developers were baffled because they never managed to reproduce it... I thought it was due to slow links between the servers; they had something like 64k links. It was most probably a buffer overrun. They probably thought "we will send the information to the other server before the buffer is full and a new event happens, networks are quite fast"... not all of them are.

  • Heater. Posts: 21,230
    It was probably some intruder crashing the intrusion alert process so that they could have their evil way with your machines without being detected :)
