Reuse: The rule of threes.
potatohead
Posts: 10,261
Everyone, we've had, what, a million and one discussions on reuse, and to some degree standards, which actually depend on reuse to a high degree.
So give this a read:
http://www.codinghorror.com/blog/2013/07/rule-of-three.html
Every programmer ever born thinks whatever idea just popped out of their head into their editor is the most generalized, most flexible, most one-size-fits all solution that has ever been conceived. We think we've built software that is a general purpose solution to some set of problems, but we are almost always wrong. We have the delusion of reuse. Don't feel bad. It's an endemic disease among software developers. An occupational hazard, really.
If I have learned anything in my programming career, it is this: building reusable software, truly reusable software, is an incredibly hard problem right up there with naming things and cache invalidation. My ideas on this crystallized in 2004 when I read Facts and Fallacies of Software Engineering for the first time. It's kind of a hit-or-miss book overall, but there are a few gems in it, like fact #18:
There are two "rules of three" in [software] reuse:
- It is three times as difficult to build reusable components as single use components, and
- a reusable component should be tried out in three different applications before it will be sufficiently general to accept into a reuse library.
Yes, this is merely a craftsman's rule of thumb, but the Rule of Three is an incredibly powerful and effective rule of thumb that I have come to believe deeply in. It's similar to the admonition to have at least one other person review your code, another rule of thumb that is proven to work. To build something truly reusable, you must convince three different audiences to use it thoroughly first.
We have this problem. And it's chronic too. I don't see it as a bad thing. We are a pretty smart crowd with a lot of varied interests, and we've got a pretty strong "go your own way" culture inspired by the creator of the Propeller, Chip Gracey, who regularly challenges basic assumptions to pretty great effect, I think. And I think most of us do the same.
I'm hoping for some fun discussion, hopefully no serious biting, and perhaps some realizations we can all take home and consider as we build, do, create, hack, explore, that's all.
My thoughts on it later, either in the context of the discussion, or after I've thought it through some.
Comments
Once you have programmed for a while, you get used to common routines that most programs require. Things like the printf function in C did not happen by accident; it was added to satisfy a general need for formatted output. Other languages have specific output formatting functions as well. We even use them in Spin (though they are not built in) - the xxx.hex, xxx.dec, xxx.char/out/tx methods are all output formatting functions.
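For instance, with the standard FullDuplexSerial object from the Propeller library, those calls look like this (a minimal sketch; the pin numbers and baud rate are assumptions for illustration):

OBJ
  ser : "FullDuplexSerial"

PUB Main
  ser.start(31, 30, 0, 115_200)   ' rx pin, tx pin, mode, baud - demo-board pins assumed
  ser.str(string("count = "))     ' raw string output
  ser.dec(42)                     ' decimal formatting
  ser.tx(13)                      ' carriage return
  ser.hex($1B7F, 4)               ' hex formatting, 4 digits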
Now, how you go about reusing code varies. On the ICL mini, we had up to 20 cores (yep, on a computer released in 1969) with cog-like memory for each core (called partition memory) and hub-like memory shared by all cores (called common memory). Each core could execute code directly from cog or hub and jump between them. I'd best tell you that the cores were not true cores in today's sense, but were time-sliced and controlled by hardware.
Most of the actual OS was loaded into hub memory and remained resident. Calls to the OS were made by each core as required. The hub code was known as re-entrant code because all cores could execute it concurrently (although time sliced). So the OS code was actually "reused" code but designed by the supplier.
What I did was to work out what other functions were commonly required by my programs, and I loaded these up as routines into hub at the beginning of the day. This way, all my "reusable" routines were actually loaded, and my "user" programs could call these routines as required. No need to recode these routines, nor include them, in each of my programs.
I had routines to control the video terminal so that all I did was to call a routine in hub and pass it the address (row/col) on the screen, and pass it the address of the string to display. I did the same for printing - pass it addresses for headings, reset the page counter, and then just print lines by passing the address of the data line. All printing of headings, page numbers and page formatting was controlled by the hub routines - so they were reused by any program that needed to print. I later extended the hub routines to compress and spool to disc without requiring the original programs to be modified at all. This only touches the surface of the reusable routines that I built.
To illustrate one particular benefit of using these reusable "common" routines in hub, I had a group of programs that often searched the database for specific stock movements over the past few years. These programs took quite some time to run. Because they all utilised my "common" routines, I was able to bypass the OS for these particular searches, which yielded a 30-fold increase in performance. And I only had one instance to change. This was pretty important, as in those days compiling a program took anything from 30 minutes to many hours!!!
On one system I wrote for a client, I had over 500 user programs in the system. Most of these would be run many times daily. And of course, these routines were used as the basis of other systems I wrote.
So I think it fair to say that reusable code is extremely beneficial. And I totally disagree with that quote!
I have endeavoured to do something similar in the PropOS that I have been working on. I am also a firm believer in standards, if you can get any agreement. To this end, I believe we have failed miserably for the Prop. The only useful thing is the OBEX. Of course, it's just my opinion.
We don't have good standards on the Prop because we have a lot of people with their own purposes, not because we shouldn't have standards or ever intended not to have them - thus my "center of gravity" comments here to that effect from time to time.
On a P1, general applicability has proven to be just expensive enough to be prioritized beneath "it works for me and this project" more often than not, which is why obex is useful, where other things aren't so useful in a reuse sense.
On a P2, that may well not be the case more of the time, allowing for a more robust "center of gravity" effect.
Chip will be one gravity well. He's the source of a lot of the core stuff people may choose to use. Others may well prove significant gravity wells too, depending. From those, standards will appear and see traction. From what I have seen, merely defining them, regardless of how well they are defined, isn't enough. Multiple entities need to adopt them and see more value in that than in not adopting them, or they will do their own thing, defaulting to "code I reuse that works for me."
I'm putting gravity wells in that context. If a few of the major centers of gravity surrounding Propeller code adopt standards and see value, many others will center around those things and we have standards that promote reuse.
On a microcontroller, we deal with lots of the same simple stuff, over and over: turn on the LED, listen to the serial port. Microcontroller applications quickly diverge.
On a workstation we can deal with more complex stuff. But there's simple stuff we always deal with. Listen to an input stream, write to standard I/O. Workstation applications still diverge, just not so quickly.
The problem comes when we try to reuse complex stuff that doesn't lend itself to reuse. KISS.
As for gravity wells... they are obvious and well behaved. It is the gravity sinks we should worry about: https://www.spa-mall.com/productimages/Install-With-Gravity-Sink.jpg
The major questions being this: At the center of a gravitational mass(in the gravitational sink), where all force is pulling outward, does the nucleus expand or contract? Is this the same for more energetic but less massive parts of the atom?... and what does this all imply about cleaning up that mess in the Ukraine?
If you combine Spin, PASM, and the OBEX, the problem is solved...until you add C, Basic, and Forth... then everything gets murky again.
spin2, Spin2, SPIN2
There, I said it three times. That is how magic works.
I'm waiting... and filing my Veroboards to a fine pitch.
Rich
I'm not sure what you are getting at by bringing up arrest records and employment, but Jeff Atwood created http://stackoverflow.com/ and http://stackexchange.com/
One or the other, or both, are in the top 100 most visited internet sites in the world. As such, I imagine they generate enough revenue and create enough work that Jeff does not need to look for other employment.
Actually, I was thinking about these sites recently. When I get stuck with some odd programming problem, or something is failing to compile or run and spitting out an error message I don't understand, I have learned like many others to hit Google straight away. It's quicker to find an answer that way than by wading through the documentation, and for sure someone else has already hit and solved that issue. Recently I notice that more and more the first hit on Google is stackoverflow, and the first answer there is often exactly what I need.
Those sites are like this forum but for more general programming problems. It's starting to be that I cannot function without them.
Beneficial? Yes, of course. But I think Jeff's message is very true: there is a lot more to making reusable code than making code for yourself.
Any non-trivial chunk of code will have multiple functions to be called to make it work, and perhaps data structures to be exchanged with the user code: the Application Programming Interface (API).
Your API needs to be easily understandable and easy to integrate with other people's applications - not something that "just grew" as you needed things for your application.
The API, indeed your library/object, needs to be focused. It should contain just the functionality it is expected to provide and nothing more. Many OBEX examples are bad for this: why is any kind of HEX conversion or other format conversion part of a serial device or display driver?
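As a sketch of the alternative, a stand-alone formatter object (the name "DecStr" here is hypothetical; the Numbers object in the OBEX takes a broadly similar approach) can return a string pointer that the caller hands to whatever driver it likes, so a driver only ever needs byte/string output:

VAR
  byte buf[12]                    ' room for "-2147483648" plus terminator

PUB Dec(value) : strptr | i, n
'' Convert a signed 32-bit value to a zero-terminated decimal string.
'' (Sketch only - ignores the $80000000 edge case.)
  i := 11
  buf[i] := 0
  n := ||value
  repeat
    buf[--i] := n // 10 + "0"     ' peel off digits, least significant first
    n /= 10
  until n == 0
  if value < 0
    buf[--i] := "-"
  strptr := @buf[i]

A caller then writes something like ser.str(num.Dec(1234)), and the formatting code is compiled in once, and only when something actually uses it.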
Your API should have no surprises. A thing that looks like it does X should do X and not have any weird side effects that happen to work or be required in your app.
Your API needs to be well documented - There is a huge pile of work right off.
Then there are lots of other things to consider:
0) Is your library/object written in a language that a lot of people can actually reuse?
1) Does your object rely on a particular OS or other environment? Is it cross platform?
2) Does your API demand that the user program is architected in a particular way?
3) Does your library include a lot of baggage that most other users don't expect or need, that just bloats things up, slows things down or is just confusing?
4) Scalability. Your code may work fine for your data set, but does it scale to mine, which is orders of magnitude larger? Perhaps you have been using slow and simple algorithms instead of taking time to implement more efficient techniques.
5) Edge cases. Your code may work fine for you, but will it work for all the far-out corners of usage that everyone else will find and you never thought of?
6) Which brings us to bugs. One of the best ways to find bugs, apart from giving a demo to your boss where it will immediately fail, is to give your code to others to use. They will soon come back and tell you what a pile of Smile it is.
7) Testing and maintainability. Linked to the above, but have you tested your code? Are there automated tests in place? Do they have good coverage? There is another huge pile of work there.
8) Communal. Hopefully your code will have a lot of users and be in use for a long time. That probably means it's going to be ongoing work for you to add/tweak features, fix bugs and so on. Your users should be part of that process. As such, your source code should be easy to read, understand and modify. You will need a source code management system. You will need that automated testing to be sure you don't break things as you go along.
9) Standards. Are you using commonly accepted standards? Perhaps yours works with data input in XML when the rest of the world has moved on to JSON.
I could go on...
Jeff's work is indeed very impressive.
I suspect that there must be some way to turn his work to Parallax's advantage.
I followed the link and then poked around a little, interesting fellow.
I understand the push for additional languages and support utilities, but I'm a bit of a luddite.
Keeping the user as close as possible to the Prop2 is a formula for success. Letting users get as far away as they like is also a formula for success, but it leads to competing abstractions, which lead to arguments.
Rich
I tend to favor Chuck Moore's approach to this, which is that you should know your craft and be ready to write your own routines streamlined for the application you're building. This applies quadruply to an embedded environment like the Propeller. It's great to have easy-to-use stuff in the obex for the n00bs, but when you're pressing the limits of what you can do at clock speed or with the fixed RAM resources, you've got to be prepared to prune, streamline, and specialize.
I recently developed a library of UDP routines for both VB6 and Spin/PASM for some work projects. On the Propeller end I noticed that I could get a massive reduction in code size by accepting a limitation that the Propeller would always function as a server and only ever transmit single packet responses to single packets received. This means there is always a packet in the buffer that has been received which can be rearranged to build the response, instead of having to build a packet from scratch. This halved the size of the code without compromising the functionality I needed for that application.
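In Spin terms, that trick looks roughly like this (a sketch only; the offsets assume a plain Ethernet+IPv4+UDP frame with no VLAN tag or IP options, and the IP/UDP checksums still need recomputing before transmit):

PUB MakeReply(pkt) | i
'' Swap the address fields of a received packet, in place, so the
'' same hub buffer can be sent straight back as the reply.
  repeat i from 0 to 5                  ' destination MAC <-> source MAC
    SwapBytes(pkt + i, pkt + 6 + i)
  repeat i from 0 to 3                  ' source IP <-> destination IP
    SwapBytes(pkt + 26 + i, pkt + 30 + i)
  SwapBytes(pkt + 34, pkt + 36)         ' source port <-> destination port
  SwapBytes(pkt + 35, pkt + 37)

PRI SwapBytes(a, b) | t
  t := byte[a]
  byte[a] := byte[b]
  byte[b] := t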
Perhaps one conclusion is that we should improve ways to streamline, prune and specialize, as general reuse is an ongoing problem due to the limited resources available on P1.
The Propeller is the MCU with the easiest code reuse, because:
1) The multiple cores and no interrupts mean that any object grabbed from OBEX or elsewhere is very easy to use. Just drop it into your project and call it. It works. No messing with hooking up interrupts or juggling thread priorities, etc. Doing this kind of thing on any other micro is much harder.
2) The Spin language's object system makes this very easy.
However, there are still flaws in the plan.
For example, an object will often carry around functionality that is not really its business. The canonical example is FullDuplexSerial, which has HEX and DEC and other output formatting routines included. This bloats up the code. Meanwhile, the fundamental API it should have - input and output of a stream of bytes - is perhaps not as standard as we would like. I believe localroger has been tackling this recently.
Then there is the case of reusing all the nice PASM code in those objects in languages other than Spin, or indeed with no high-level language at all. It has already been a subject of long debate here that such buried PASM is not reusable from C or other languages. That is a big shame. That leads to all the debate about standardizing mailbox interfaces to PASM code, which never got any traction.
Nobody ever thought that someone might like to use their PASM drivers from C (or PASM). Had we had a C compiler for the Prop at the time, and had the "rule of threes" been in force, perhaps that would have been tackled a long time ago.
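As a hedged sketch of what such a mailbox standard could look like (this two-long layout is an assumption for illustration, not an agreed convention), the host side just reads and writes hub longs at an address the cog receives through PAR, so nothing about the protocol is Spin-specific:

VAR
  long mailbox[2]                 ' [0] = command (0 = idle), [1] = argument/result

PUB Start
  cognew(@entry, @mailbox)        ' PAR carries the mailbox address

PUB DoCommand(cmd, arg)
  mailbox[1] := arg
  mailbox[0] := cmd               ' non-zero command wakes the cog
  repeat while mailbox[0]         ' cog zeroes the flag when finished
  return mailbox[1]

DAT
        org     0
entry   mov     mbox, par               ' hub address of mailbox[0]
poll    rdlong  command, mbox wz
  if_z  jmp     #poll                   ' spin until a command arrives
        ' ... dispatch on command, exchange data via mailbox[1] ...
        wrlong  zero, mbox              ' clear the flag: done
        jmp     #poll
zero    long    0
mbox    res     1
command res     1

A C program, or another PASM cog, can drive the same driver by writing those two hub longs itself, which is exactly the kind of language-neutral interface those debates were after.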
Secondly, I was thinking about standards in general. We have, from time to time, had those discussions with a lot of good ideas, but no real traction. It struck me that this rule of threes, combined with the center of gravity idea, may well explain why. Thought it worth thinking about.
1. Reusing the Hub RAM needed for PASM images. This is a huge waste, which is even worse with the 32K limit. Considering how much P1 got right, it seems a bit niggly to whale on Chip for 20/20 hindsight, but really, the standard EEPROM should have been 64K and COGINIT should load directly from EEPROM via I2C, particularly with PAR in the mix for passing parameters. As it is, with the images needing to be in RAM for launch anyway, it makes much more sense to poke parameter values into the image before launch than to waste cog longs retrieving PAR parameters.
I have had some thoughts, especially in the last week or so since getting my pseudo global object technique together, of creating a PASM server object which could be used to load PASM images from EE and then releasing the buffer memory for other uses. Right now we have techniques for reclaiming PASM images, such as using the FD serial PASM image for buffers, but these do not allow for sharing between objects and often the object's PASM image isn't the right size for its post-launch Hub RAM needs. This brings us to...
2. Spin's inability to explicitly share variable pointers, either globally or across multiple parent/child branches, makes this kind of resource management awkward at best. I can be on board with not having GOTO, but Spin really badly needs a better way to share resource access in a more general way. There is no technical reason this could not be done: just as BST implements the missing @@@ absolute-address operator, an alternate Spin compiler could support the equivalent of PUB var / DAT pointers, in the way CON constants and PUB methods are supported now. If we could do that, we could create an object which declares a bunch of DAT vars used by different objects, and as long as they all total more than 2K, the object could use that RAM to load up PASM images before other objects do their initializations and claim it.
Or we might declare a special class of VARs which are automatically grouped together in global memory to contribute to a COG launch buffer, which are then automatically treated independently by their owner objects after cog launch. There are lots of ways to approach this which do not require any hardware changes to P1 at all. It's just that Hub RAM is precious and this is a serious problem which needs a solution.
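Until a compiler supports something like that, the closest thing available today is explicit pointer passing at init time, along these lines (the object names and start() signatures here are hypothetical):

OBJ
  drvA : "DriverA"                ' hypothetical objects that each launch a cog
  drvB : "DriverB"

VAR
  long pool[512]                  ' one shared 2KB area, big enough to stage a cog image

PUB Main
  drvA.start(@pool)               ' A builds its PASM image in the pool and launches it...
  drvB.start(@pool)               ' ...then B stages its image in the very same hub RAM
  bytefill(@pool, 0, 2048)        ' after both launches, the pool is free for runtime buffers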