Oct 20

Sometimes it’s convenient to direct the output of a command to both the screen and a file. Here’s how to do it:

command 2>&1 | tee output.log

The tee command copies its standard input to standard output and also writes a copy of it to the given file. The 2>&1 part redirects stderr into stdout, so error output goes through tee as well. You can use tee -a to append to the log instead of overwriting it.
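
To make concrete what tee does, here is a minimal C++ sketch of the same idea: everything read from an input stream is written both to a “screen” stream and to a log stream. The function name TeeStream is hypothetical, and this is not how the real tee is implemented; it only illustrates the copying behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical helper: copies every character from 'in' to both 'screen'
// and 'logOut', similar to what tee does with its standard input.
// Returns the number of characters copied.
std::size_t TeeStream(std::istream& in, std::ostream& screen, std::ostream& logOut)
{
    std::size_t count = 0;
    char ch;
    while (in.get(ch))
    {
        screen.put(ch);
        logOut.put(ch);
        ++count;
    }
    return count;
}

// Usage with in-memory streams instead of stdin, stdout and a file:
//   std::istringstream in("hello\n");
//   std::ostringstream screen, logOut;
//   assert(TeeStream(in, screen, logOut) == 6);  // both now hold "hello\n"
```

In a small program you would call it as TeeStream(std::cin, std::cout, logFile), where logFile is a std::ofstream; opening it with std::ios::app mimics tee -a.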

Feb 23

The fail-fast idiom is all about catching software errors at the earliest possible stage. Yet the idiom is neglected surprisingly often, as if developers loved to spend days tracking down obscure bugs that could just as well have been caught with a tiny bit of extra work in the first place.

Failing fast doesn’t mean that the number of failures increases. Rather, it means that failures are not neglected and ignored, but found as soon as possible. For example, suppose that we call a function that returns a value which, in our subsequent code, must not exceed a certain maximum value:

int value = p->FunctionCall();
if ( value > max )
{
    value = max;
}

DoStuff( value );

Here, if the returned value is greater than max, it is set to max, because DoStuff() requires the value to be less than or equal to max. Now, this clamping may or may not be the intended behaviour. If setting the value to max is merely a fail-safe that prevents DoStuff() from crashing when p->FunctionCall() returns an erroneous value, it is better to verify right at this point, by adding an assertion, that the value is indeed within the specified bounds.
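
A minimal sketch of the assertion-based version follows. Producer, lastValue and Process() are hypothetical stand-ins so that the snippet compiles on its own. The clamp is replaced by an assert(), so an out-of-range value stops the program instead of being quietly corrected.

```cpp
#include <cassert>

// Hypothetical stand-in for the object behind p->FunctionCall().
struct Producer
{
    int result;
    int FunctionCall() const { return result; }
};

int lastValue = -1;  // records what DoStuff() received, for illustration only

void DoStuff(int value)
{
    lastValue = value;  // assumes value <= max
}

void Process(const Producer* p, int max)
{
    int value = p->FunctionCall();

    // Fail fast: instead of silently clamping, treat a value above max as a
    // bug in FunctionCall() and stop here (in builds without NDEBUG).
    assert(value <= max && "FunctionCall() returned a value above max");

    DoStuff(value);
}
```

The string literal joined with && is a common idiom: it is always true, so it does not change the condition, but it appears in the diagnostic printed when the assertion fails. If clamping really is the intended behaviour, the original if statement is the right tool and no assertion is needed.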

Assertions are one of the foresighted developer’s best friends. They come in many forms: Symbian code, for example, has the __ASSERT_DEBUG() and __ASSERT_ALWAYS() macros, to which you can supply a function call (typically a panic) to be made if the assertion fails. Then there’s the good old assert() macro from the standard C library, which is defined to write to the standard error output (prior to calling abort()) at least the asserted expression and the file name and line number where the assertion failed.
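
The idea behind Symbian’s __ASSERT_ALWAYS() (a check that stays in release builds and runs caller-supplied code on failure) can be sketched portably with a small macro. The macro ASSERT_ALWAYS and the handlers Panic() and CountFailure() below are hypothetical names, not the Symbian definitions.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical always-on check in the spirit of __ASSERT_ALWAYS():
// if 'cond' is false, 'action' is executed. Unlike assert(), it is not
// removed when NDEBUG is defined.
#define ASSERT_ALWAYS(cond, action) \
    do                              \
    {                               \
        if (!(cond))                \
        {                           \
            action;                 \
        }                           \
    } while (0)

// Hypothetical failure handler: report and terminate immediately.
[[noreturn]] void Panic(const char* category, int reason)
{
    std::fprintf(stderr, "Panic: %s %d\n", category, reason);
    std::abort();
}

// Hypothetical handler used only for demonstration: counts failures.
int failureCount = 0;
void CountFailure() { ++failureCount; }

void CheckNonNegative(int x)
{
    assert(x >= 0);                                       // gone when NDEBUG is defined
    ASSERT_ALWAYS(x >= 0, Panic("CheckNonNegative", 1));  // always evaluated
}
```

The do { ... } while (0) wrapper makes the macro behave like a single statement, so it is safe to use inside an unbraced if/else.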

Where to put assertions, then? Usually there are quite a few places in code where a certain invariant must hold true. For example, there may be a member variable which must have a certain value upon function invocation, or else the results of the function are undefined. As a pseudocode example, imagine that we’re supposed to sell toys, and all their bells and whistles must be painted before they’re ready for shipment.

void Toy::Paint()
{
    for_each( bells_.begin(), bells_.end(), PaintFunc );
    for_each( whistles_.begin(), whistles_.end(), PaintFunc );
}

Let’s say that we add a new Toy to a vector container to be shipped to a retailer but forget to call its Paint() function. We now have a bunch of toys but, unbeknownst to us due to our negligence, one of them is missing paint from its bells and whistles. Uh oh, the delivery truck is here, and those guys don’t have all day. Better call the Offload() function to make some space in the warehouse:

void WareHouse::Offload( const std::vector<Toy>& toys )
{
    for_each( toys.begin(), toys.end(), ShipFunc );
}

Now we’re done with the toys, but one has slipped through without all the invariants being checked. A good solution here would be to check in the Offload() function that we’re shipping out kosher items. Supposing that bells and whistles derive from a common class, Part, which has an IsPainted() function to check whether the part in question has been painted, and that Toy::IsPainted() returns true only when all of the toy’s parts are painted, we can add an assert to the Offload() function as follows:

void WareHouse::Offload( const std::vector<Toy>& toys )
{
    // Fail fast: every toy must be fully painted before it is shipped
    for ( std::vector<Toy>::const_iterator iter = toys.begin();
          iter != toys.end(); ++iter )
    {
        assert( iter->IsPainted() );
    }

    for_each( toys.begin(), toys.end(), ShipFunc );
}

Here the assert will stop program execution to prevent unpainted toys from being shipped. What we’re trying to achieve is to catch, already during the development phase, the code paths that result in unpainted toys being shipped. It’s better to make problems manifest themselves as early as possible (to fail fast) than to hunt for obscure bugs during the final, often hectic stages of an imminent release.
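
For a self-contained version of the toy example, here is a sketch with hypothetical minimal Part, Toy and WareHouse classes. For brevity, bells and whistles are plain Part objects rather than derived classes, Toy::IsPainted() is assumed to check every part with std::all_of, and a lambda that counts shipped toys stands in for ShipFunc.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal part: bells and whistles would derive from this.
class Part
{
public:
    bool IsPainted() const { return painted_; }
    void Paint() { painted_ = true; }

private:
    bool painted_ = false;
};

class Toy
{
public:
    Toy(std::size_t bells, std::size_t whistles)
        : bells_(bells), whistles_(whistles) {}

    void Paint()
    {
        for (Part& part : bells_) part.Paint();
        for (Part& part : whistles_) part.Paint();
    }

    // A toy counts as painted only if every bell and whistle is painted.
    bool IsPainted() const
    {
        auto painted = [](const Part& part) { return part.IsPainted(); };
        return std::all_of(bells_.begin(), bells_.end(), painted) &&
               std::all_of(whistles_.begin(), whistles_.end(), painted);
    }

private:
    std::vector<Part> bells_;
    std::vector<Part> whistles_;
};

class WareHouse
{
public:
    void Offload(const std::vector<Toy>& toys)
    {
        // Fail fast: refuse (in debug builds) to ship any unpainted toy.
        for (const Toy& toy : toys)
        {
            assert(toy.IsPainted());
        }

        // Stand-in for ShipFunc: just count the shipped toys.
        std::for_each(toys.begin(), toys.end(),
                      [this](const Toy&) { ++shipped_; });
    }

    std::size_t Shipped() const { return shipped_; }

private:
    std::size_t shipped_ = 0;
};
```

Calling Offload() with a vector containing a Toy whose Paint() was never called would trigger the assertion in a debug build, which is exactly the early failure the post argues for.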

Aug 22

This post concerns something that has bothered me for years: the tendency of people to write their code using as few keystrokes as possible. There’s often little to gain and a lot to lose, two major casualties being readability and debuggability. While reading this post may be akin to drinking poison for some of you (and you know who you are!), your fellow developers will thank you when they are wading through your code.

Depending on the environment you’re using, run-time debug information may be available to you in several ways. Since most of my experience is with the Eclipse/Carbide IDE, the debuggability part of this post applies primarily to that environment, but it will likely apply just as well to most other IDEs. The readability part, of course, applies no matter what environment you’re using.

Consider the following fragment of code:

int i = 5;

When you’re running this in a debugger, you can see that the value of a variable named ‘i’ is 5. (It often happens that the compiler optimizes out unused variables, even in debug builds, so you may need to add some otherwise superfluous statement, such as logging, that uses the variable in order to see it in the debugger, but I digress.) This is all well and good, but consider the following example:

Base* b = ptrGadget->DoStuff(item).Factor()[index];

Here, some IDEs are able to tell you what kind of an object the DoStuff() and Factor() functions return. You’d still have to hover your mouse over the function names (or step into the code), which is basically extra work for you every single time you forget the return types. Considering the amount of information you have to juggle in your mind while debugging, this soon becomes a real frustration. Additionally, from a readability standpoint, let’s consider how a developer new to the project (or you after a 5-week vacation) sees this:

“We’re apparently trying to get a pointer to a class named Base (might be a virtual base class, mind you) by calling a method named DoStuff() on the ptrGadget pointer. The function takes as a parameter a variable called item and returns some object (by value or by reference?), on which we then call a function named Factor(). Finally, we take an element from an array(?) returned by the Factor() function and store it in b, a pointer to Base.”

The code works, but the problem for the person debugging the code is that a lot of details are hidden beneath the surface. For example, we cannot see just by looking at the code what kind of object the DoStuff() function returns. Neither do we see what kind of an array (or any type for which the subscript operator is defined) the Factor() function returns. Sure, you could step into the code using the debugger and have a look, but doing this every time after you forget the types in question gets old fast. Fortunately, the solution is simple and just as efficient.

For the solution, we just have to look up the types in the called code once and write them out as local (automatic) variables in our own code. For the sake of this example, let’s say you find out that the DoStuff() function returns a const reference to an object of class Charger (just picking words out of my immediate surroundings here), and that the Charger class has a Factor() function that returns a std::vector of Base class pointers.

const Charger& charger = ptrGadget->DoStuff(item);
std::vector<Base*> vecBase = charger.Factor();
Base* b = vecBase[index];

Now we have split our “neat” one-line piece of code into three lines. And to what gain? Well, we can now clearly see which types we are working with (Charger and std::vector<Base*>), which were previously hidden from view. And although the code spans three lines and contains more text, it is much more readable, because each line now performs a single operation instead of three operations being chained in one statement. The debugger is also able to display the automatic variables we have created, namely charger and vecBase.
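
A compilable sketch of the refactoring follows. Base, Charger and Gadget are hypothetical minimal classes shaped to match the description (DoStuff() returns a const reference to a Charger, and Factor() returns a std::vector<Base*> by value); the int type of the item parameter is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical types matching the description in the text.
struct Base
{
    virtual ~Base() = default;
};

class Charger
{
public:
    explicit Charger(std::vector<Base*> items) : items_(std::move(items)) {}

    // Assumed to return the vector by value, as in the text.
    std::vector<Base*> Factor() const { return items_; }

private:
    std::vector<Base*> items_;
};

class Gadget
{
public:
    explicit Gadget(Charger charger) : charger_(std::move(charger)) {}

    // Assumed to return a const reference, as in the text.
    const Charger& DoStuff(int /*item*/) const { return charger_; }

private:
    Charger charger_;
};

// The original one-liner: the intermediate types are invisible.
Base* PickBaseOneLiner(const Gadget* ptrGadget, int item, std::size_t index)
{
    return ptrGadget->DoStuff(item).Factor()[index];
}

// One operation per line: every intermediate result is a named, typed
// local variable that a debugger can display.
Base* PickBase(const Gadget* ptrGadget, int item, std::size_t index)
{
    const Charger& charger = ptrGadget->DoStuff(item);
    std::vector<Base*> vecBase = charger.Factor();
    Base* b = vecBase[index];
    return b;
}
```

Because Factor() is assumed to return by value, vecBase holds that returned vector either way. If Factor() returned a const reference instead, declaring vecBase as const std::vector<Base*>& would avoid copying the vector.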

Naturally, you could replace the types used in this example with anything you like, and the point would still stand. The types I used here were off the top of my head, and at some point I’ll probably replace them with others that better suit the example.

Jul 02

It’s easy to neglect writing code comments. Let’s face it: as coders we are not going for the Pulitzer Prize, so no great writing talent is required. Skipping comments is not about an inability to express oneself; it’s just plain laziness.

To be honest, leaving code without descriptive comments where they’re needed is much worse than that. Firstly, it indirectly shows the programmer’s lack of respect towards his peers. Everyone knows how frustrating it is to wade through dozens of classes and even more functions with cryptic code that does not describe itself, especially when the original coder is on vacation, has left the company or is otherwise unavailable. A programmer is an artist, with fellow programmers as the audience. And like all artists, we should learn to respect our audience.

Secondly, leaving out comments shows the programmer’s overconfidence in himself. Sure enough, when writing code, we usually have a good idea of what we’re doing. But programming is often about juggling several things in mind at once, and the next time you look at your code, it may be quite difficult to get back into the mindset you were in when you originally wrote it. In these quite common situations, it pays to have descriptive comments in place.

Naturally, I’m not suggesting that you comment everything. In fact, you should avoid redundant comments when the code is clear by itself, i.e. when it is self-descriptive. For example, consider the following:

// Declare category id for products
const int prodCategoryId = 1024;

// Create an iterator over products
vector<Product>::iterator iter = products_.begin();

// Iterate through all products
for ( ; iter != products_.end(); ++iter )
{
    // Assign category id to each product
    iter->AssignCategoryId( prodCategoryId );
}

 
This is an obvious example of excessive commenting. So what should we leave out? Consider commenting general ideas rather than individual lines of code. We can refactor the above comments so that we reduce comment clutter, yet make our general reasoning obvious:

// Assign category id to each product
const int prodCategoryId = 1024;
vector<Product>::iterator iter = products_.begin();

for ( ; iter != products_.end(); ++iter )
{
    iter->AssignCategoryId( prodCategoryId );
}

The intent of the code is clear from the single comment alone. What’s more, variable and function names are descriptive enough not to leave too much for the imagination.
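
The snippet refers to a products_ member, so here it is placed inside a hypothetical Catalog class, together with a minimal, assumed Product class, to make it compile. The single comment still states the intent of the whole block, and the names carry the details.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal Product class, assumed for this sketch.
class Product
{
public:
    void AssignCategoryId(int id) { categoryId_ = id; }
    int CategoryId() const { return categoryId_; }

private:
    int categoryId_ = 0;
};

// Hypothetical owner of the products_ member used in the snippet.
class Catalog
{
public:
    explicit Catalog(std::size_t count) : products_(count) {}

    void CategorizeProducts()
    {
        // Assign category id to each product
        const int prodCategoryId = 1024;
        std::vector<Product>::iterator iter = products_.begin();

        for ( ; iter != products_.end(); ++iter )
        {
            iter->AssignCategoryId( prodCategoryId );
        }
    }

    const std::vector<Product>& Products() const { return products_; }

private:
    std::vector<Product> products_;
};
```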

Also remember that the more comments you have, the more of a maintenance burden it will be to keep them all up to date. So strive to write concise and descriptive comments only when they are needed.

Jul 01

In general, the name of a class should reflect the responsibility of the class in question. Note that I say responsibility in the singular. This is because every class should provide its features as a small and concise package, covering only a single, reasonably sized area of responsibility. A class should not take on multiple responsibilities, because doing so would muddle its intent, make it less modular, hinder reusability and, in the spirit of this blog entry, make naming the class more difficult.

As developers, we often see class names listed in alphabetical order, for example in a file explorer or an IDE. If we are unsure exactly which class we are looking for, the class name usually gives us a good idea of what the class’s responsibility is.

As a real-world example, consider Symbian’s RWriteStream class, which is the base class for writing to different destinations (for example, a descriptor or a file, a descriptor being basically a string) in stream form. That is, you open a write stream to a descriptor or a file, write to it using one or more WriteL() calls, and close it when you’re done.

There are classes derived from RWriteStream for writing to a descriptor or a file, among other types of destination. Consider what happens when you don’t know exactly which class you are looking for, but you know, or at least suspect, that there’s an RWriteStream-derived class that is able to write to a descriptor. The Symbian documentation doesn’t tell you which classes are derived from RWriteStream, so no luck there. Next you have a look at the table of contents of the documentation, which shows you multiple classes, ordered alphabetically as follows (note that RWriteStream itself ends up last):

  • RBufReadStream
  • RBufWriteStream
  • RDesReadStream
  • RDesWriteStream
  • RDictionaryReadStream
  • RDictionaryWriteStream
  • RFileReadStream
  • RFileWriteStream
  • RMemReadStream
  • RMemWriteStream
  • RReadStream
  • RShareBufReadStream
  • RShareBufWriteStream
  • RStoreReadStream
  • RStoreWriteStream
  • RWriteStream

In this simple example, after looking for a moment you can spot the class you probably need, i.e. RDesWriteStream. But consider how much easier it would be to find the class if the parts of each class name were ordered with the most generic part first, followed by the more specific parts one after another. For example, RDesWriteStream would become RStreamWriterDes (notice the added ‘r’ for better English), RFileWriteStream would become RStreamWriterFile, and so on. These names convey that the classes are streams, that they write to the stream (rather than read from it), and what their destination is (e.g. a descriptor or a file).

Following the from-generic-to-specific naming convention described above (I don’t know of any existing name for it!), the alphabetical list would look like this:

  • RStreamReader
  • RStreamReaderBuf
  • RStreamReaderDes
  • RStreamReaderDictionary
  • RStreamReaderFile
  • RStreamReaderMem
  • RStreamReaderShareBuf
  • RStreamReaderStore
  • RStreamWriter
  • RStreamWriterBuf
  • RStreamWriterDes
  • RStreamWriterDictionary
  • RStreamWriterFile
  • RStreamWriterMem
  • RStreamWriterShareBuf
  • RStreamWriterStore

In the example above, the stream sister classes can be grouped into reader and writer classes, and further by their source or destination. I bet you can spot the RStreamWriterDes class much more easily in this more orderly list.
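
To show the convention in code, here is a sketch of a small, hypothetical class hierarchy named from generic to specific. These are plain C++ classes, not Symbian’s stream API: StreamWriter is an abstract base, and the destination (a std::string or a byte buffer) is appended as the last, most specific part of each derived class name. In an alphabetical listing, StreamWriter, StreamWriterBuf and StreamWriterString end up next to each other, and a StreamReader family would likewise form its own group.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Generic part first: every writer is a StreamWriter.
class StreamWriter
{
public:
    virtual ~StreamWriter() = default;
    virtual void Write(const std::string& data) = 0;
};

// Specific part last: this writer's destination is a std::string
// (playing roughly the role of a Symbian descriptor).
class StreamWriterString : public StreamWriter
{
public:
    explicit StreamWriterString(std::string& target) : target_(target) {}
    void Write(const std::string& data) override { target_ += data; }

private:
    std::string& target_;
};

// This writer's destination is a growable byte buffer.
class StreamWriterBuf : public StreamWriter
{
public:
    explicit StreamWriterBuf(std::vector<char>& buffer) : buffer_(buffer) {}
    void Write(const std::string& data) override
    {
        buffer_.insert(buffer_.end(), data.begin(), data.end());
    }

private:
    std::vector<char>& buffer_;
};

// Client code depends only on the generic base.
void WriteGreeting(StreamWriter& writer)
{
    writer.Write("hello");
}

// Usage:
//   std::string text;
//   StreamWriterString writer(text);
//   WriteGreeting(writer);
//   assert(text == "hello");
```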

This is but a single example. The fact of the matter is that our job is hard and frustrating enough as it is. There’s no need to name classes that are similar in purpose in a form that just confuses their users. Strive to name your classes so that the correct ones are easily found by your customers.

Finally, there is of course the question of when to adopt the from-generic-to-specific class naming idiom. Essentially, the more classes you have that provide similar functionality and derive from a common base, the more reason you have to adopt it. The obvious minimum (in my experience) would be three derived classes, preferably four. Also, keep in mind classes that may be written later on: if you are sure there are going to be, e.g., three or more “sister” classes to the one you are writing, by all means go for the naming idiom. You may well earn the right to pat yourself on the back later on.
