May 23

Searching for elements in an STL container is a common task. There are several ways to accomplish this. Let’s have a look at some of them.

Suppose we have a collection of elements in an STL container. The element type is defined as a simplified phonebook entry, as follows:

struct Entry
{
    Entry( const string& n, const string& p ) : name( n ), phoneNum( p ) {}

    std::string name;
    std::string phoneNum;
};

We then push some entries into a container, e.g. a list, like follows:

std::list<Entry> phonebook;

phonebook.push_back( Entry( "James Bond", "123" ) );
phonebook.push_back( Entry( "Felix Leiter", "456" ) );
phonebook.push_back( Entry( "Vesper Lynd", "789" ) );

We’d then like to look up the phone number of a contact. We know the contact’s name, so we can search it as follows using a for loop and an iterator:

std::string searchFor = "James Bond";
std::list<Entry>::const_iterator iter = phonebook.begin();

for ( ; iter != phonebook.end(); ++iter )
{
    if ( iter->name == searchFor )
    {
        break;  // No need to look any further
    }
}

if ( iter != phonebook.end() )
{
    std::cout << "Call " << searchFor << " at " << iter->phoneNum << endl;
}

Quite simple. We could also make this into a function, and call it whenever we need to seach for a number:

string GetNumber( const std::list<Entry>& phonebook, const string& searchFor )
{
    std::list<Entry>::const_iterator iter = phonebook.begin();

    for ( ; iter != phonebook.end(); ++iter )
    {
        if ( iter->name == searchFor )
        {
            break;  // No need to look any further
        }
    }

    return iter->phoneNum;
}

There is also a more elegant way; we can write a comparison operator that returns true if the given Entry matches a specified std::string, and use it with the STL's find() algorithm. For the purposes of this example, it doesn't matter if the operator is a member operator or a global one. Here's the global version:

bool operator==( const Entry& e, const string& n )
{
    if ( e.name == n )
    {
        return true;
    }

    return false;
}

Make sure you have the parameters so that the container element type is on the left side, and the comparison argument (i.e. the third parameter to find()) is on the right side; this is what the algorithm expects.

Armed with this function, we are able to use the find() algorithm (remember to #include <algorithm>) to do our bidding:

std::string searchFor = "Vesper Lynd";

std::list<Entry>::const_iterator iter =
    std::find( phonebook.begin(), phonebook.end(), searchFor );

if ( iter != phonebook.end() )
{
    std::cout << "Call " << searchFor << " at " << iter->phoneNum << endl;
}

The find() algorithm compares each element of the specified range (between begin() and end(), as specified above) against the third argument, which happens to be the search string. Given that we now have a comparison operator taking an Entry struct and an std::string defined, the compiler calls the operator for each element in the range. The find() terminates either when the comparison returns true, or when the iterator position specified as the second argument to the algorithm is reached, i.e. phonebook.end().

You can also use find_if() with a function object as a search criterion. Function objects have also much more powerful features than mere functions, e.g. the ability to maintain state between invocations, but for this example we'll use one just as a simple comparison critetion. Let's have a look:

class CompPred : public binary_function<Entry, std::string, bool>
{
public:
    bool operator()( const Entry& e, const std::string& s ) const
    {
        if ( e.name == s )
        {
            return true;
        }

        return false;
    }
};

Here we derive a class from binary_function, and add a function call operator (operator()). The template parameters to the base class are two of the argument types (here Entry and std::string), and a return type (bool). What we're doing here is basically a template specialization of the binary_function class template. We then use the Entry and std::string types as parameters for the function call operator, akin to what we did with operator== above.

We can then do the search using a slightly modified version of the previously presented code:

std::string searchFor = "Vesper Lynd";

std::list::const_iterator iter =
    std::find_if( phonebook.begin(),
                  phonebook.end(),
                  bind2nd( CompPred(), searchFor ) );

if ( iter != phonebook.end() )
{
    std::cout << "Call " << searchFor << " at " << iter->phoneNum << endl;
}

Note the use of the find_if() algorithm. It takes as the third argument a pointer to a function or function object derived from unary_function. The bind2nd() adapter takes as parameters a function object and the search argument. We need the adapter in order to convert a binary function (taking two arguments) into a unary function (taking a single argument).

This post has droned on for quite some time, so I'll cut this short. There are loads of stuff I deliberately left unexplained here, e.g. why the function object's operator() must be declared const, function objects and function adapters, and template specializations, but I'll save those for another time, my dear imaginary readers.

Feb 23

The fail fast idiom is all about catching software errors at the earliest possible stage. The idiom is surprisingly often just neglected, as if developers loved to spend days tracking down obscure bugs which they know would just as well have been caught with a tiny bit of extra work in the first place.

Failing fast doesn’t mean that the number of failures increases. Rather, it means that the failures are not neglected and ignored, but found as soon as possible. For example, suppose that we call a function that returns a value, which in our subsequent code must then be below a certain maximum value:

int value = p->FunctionCall();
if ( value > max )
{
    value = max;
}

DoStuff( value );

Here, if the returned value is greater than max, it is set to max. DoStuff() requires that the value is less or equal to max. Now, this clamping may or may not be the intended behaviour. If setting the value to max is just a fail-safe in order to prevent DoStuff() from crashing if p->FunctionCall() returns an erroneous value, it is better to make sure already at this point that the value is indeed within specified bounds by adding an assertion.

Assertions are one of the foresighted developer’s best friends. There are a multitude of ways in which assertions are used, for example in Symbian code there are e.g. __ASSERT_DEBUG() and __ASSERT_ALWAYS() macros, for which you can provide a function to be called if the assertion fails. Then there’s the good old assert() macro from the standard C library, which is defined to write to the standard error output (prior to calling abort()) at least the asserted expression and the file name and line number where the assertion failed.

Where to put assertions, then? Usually there are quite a few places in code where a certain invariant must hold true. For example, there may be a member variable which must have a certain value upon function invocation, or else the results of the function are undefined. As a pseudocode example, imagine that we’re supposed to sell toys, and all their bells and whistles must be painted before they’re ready for shipment.

void Toy::Paint()
{
    for_each( bells_.begin(), bells_.end(), PaintFunc );
    for_each( whistles_.begin(), whistles_.end(), PaintFunc );
}

Let’s say that we add a new Toy to a vector container to be shipped to a retailer but forget to call its Paint() function. We now have a bunch of toys but, unbeknownst to us due to our negligence, one of them is missing paint from its bells and whistles. Uh oh, the delivery truck is here, and those guys don’t have all day. Better call the Offload() function to make some space in the warehouse:

void WareHouse::Offload( const std::vector<Toy>& toys )
{
    for_each( toys.begin(); toys.end(); ShipFunc );
}

Now we’re done with the toys, but one has slipped through without all the invariants checked. A good solution here would be to check in the Offload() function that we’re shipping out kosher items. Supposing that bells and whistles derive from a common class, Part, and that the base class contains a IsPainted() function to check if the part in question has been painted, we can add an assert to the Offload() function as follows:

void WareHouse::Offload( const std::vector<Toy>& toys )
{
    for ( std::vector<Toy>::const_iterator<Toy> iter = toys.begin();
           iter != toys.end(); ++iter )
    {
        assert( iter->IsPainted() );
    }

    for_each( toys.begin(); toys.end(); ShipFunc );
}

Here the assert will stop program execution to prevent shipping of unpainted toys. What we’re trying to achieve here is that already during development phase we are able to deal with code paths that result in unpainted toys being shipped. It’s better to make the problems manifest themselves as early as possible (to fail fast) than trying to find obscure bugs during the final, often hectic stages of an imminent release.

Feb 23

Consider the following class:

class Gadget
{
public:
    void MakeNewWidget()
    {
        delete widget_;
        widget_ = NULL;
        widget_ = new Widget;
    }

    const Widget& Get() { return *widget_; }

private:
    Widget* widget_;
};

In the MakeNewWidget() function the widget_ class member is first deleted and set to NULL before creating a new Widget object and assigning it to widget_. Plain and simple, and you see a lot of this kind of code around.

There’s a particular reasoning for this kind of code, especially in memory-contrived environments. The aim here is to prevent having a temporary Widget object in memory in addition to the widget_ member, thus saving on memory consumption.

But there’s something horribly wrong here. The “aim” of the code may be accurate, but it’s aiming straight at the coder’s foot. What would happen if the widget_ = new Widget; line threw an exception? The answer is that the Gadget object would be left in an inconsistent state, because the original widget_ member has already been deleted. If there’s a catch somewhere nearby down the call stack, the code there may well assume that the old state of the Gadget object is still valid; i.e. there’s been no change to its internals even though there was an exception.

We are able to remedy the situation by creating new Widget object into a temporary variable, deleting the original widget_ and assigning the temporary variable to widget_. Like follows:

void MakeNewWidget()
{
    Widget* temp = new Widget;
    delete widget_;
    widget = NULL;
}

Temporary variables such as these may not always be aesthetically pleasing, and we are using twice the amount of memory by briefly having two Widget objects present, but this code is significantly safer for the callers. No longer do the callers have to worry about the state of the Gadget object in case MakeNewWidget throws an exception, which is as it should be.

Aug 26

A C++ compiler implicitly creates a copy constructor and an assignment operator for any class that does not explicitly define them.

class Widget
{
public:
    Widget( Gadget* gadget )
    : gadget_( gadget ) {}

    ~Widget() { delete gadget_; }

private:
    Gagdet* gadget_;
};

 
Here, we have not explicitly defined the copy constructor or the assignment operator, so the compiler will add them, resulting a class like this:

class Widget
{
public:
    Widget( Gadget* gadget )
    : gadget_( gadget ) {}

    ~Widget() { delete gadget_; }

    Widget( const Widget& widget )
    : gadget_( widget.gadget_ ) {}

    Widget& operator=( const Widget& widget )
    {
        gadget_ = widget.gadget_;
        return *this;
    }

private:
    Gagdet* gadget_;
};

 
Consider the following code fragment that uses the Widget class:

int main()
{
    Gadget* gadget = new Gadget;
    Widget* w1 = new Widget( gadget );

    Widget w2 = *w1;    // Copy using the assignment operator
    Widget w3( *w1 );   // Copy using the copy constructor

    return 0;
}

 
The default copy constructor and assignment operator copy all member variables from an object to another as-is. The problem arises when an object has a pointer to some other object; gadget_ in this case. When copying or assigning an object using the automatically generated functions, only the pointer value is copied. Both the objects’ member variables now point to the same object. Now, when the original object is destroyed, it deletes the widget_ member object whose pointer it held. The result is that the copied object’s pointer now points to deleted memory, and all bets are off.

When you have classes that contain pointers to other objects, consider if it is ok to just copy the pointers using the automatically generated copy constructor and assignment operator. If the pointed-to object ownership is not within the objects being copied, memberwise copying using the default copy constructor and copy assignment is ok. But it the objects own the data behind the pointers, you’ll want to either disable the copy constructor and assignment operator (by making them protected/private), or make sure the object is actually cloned (or deep-copied) so that the new object is self-contained. This means that the pointers cannot point to the same objects but new ones instead, which are created upon cloning the object in the copy constructor or assignment operator.

Aug 22

This post concerns something that has bothered me for years. Namely, the tendency of people to write their code using as few keystrokes as possible. While there’s often nothing to gain, there’s a lot to lose, two major issues being readability and debuggability. While reading this post may be akin to drinking poison to some of you (and you know who you are!), your fellow developers will thank you when they are wading through your code.

Depending on the environment you’re using, run-time debug information may be available to you in several ways. Since I have most experience with the Eclipse/Carbide IDE, the debuggability part of post will be primarily applicable to that environment, but it’ll likely apply just as well to most other IDEs. The readability part of course applies no matter what environment you’re using.

Consider the following fragment of code:

int i = 5;

When you’re running this on a debugger, you’re able to see that the value of a variable named ‘i’ is 5. (It often happens that the compiler optimizes out unused variables even in debug builds, so you may need to write some superfluous statement (such as logging) where you use such an otherwise unused variable in order to be able to see it in the debugger, but I digress.) This is all well and good, but consider the following example:

Base* b = ptrGadget->DoStuff(item).Factor()[index];

Here, some IDEs are able to tell you what kind of an object the DoStuff() and Factor() functions return. You’d still have to hover your mouse over the function names (or step into the code), which is basically extra work for you every single time you forget the return types. Considering the amount of information you have to juggle in your mind while debugging, this soon becomes a real frustration. Additionally, from a readability standpoint, let’s consider how a developer new to the project (or you after a 5-week vacation) sees this:

“We’re apparently trying to get a pointer to a class named Base (might be a virtual base class, mind you) by calling a method named DoStuff() on the ptrGadget pointer. The function takes as parameter a variable called item and returns some object (by value or reference?), on which we then call a function named Factor(). Finally, we take an element from an array(?) returned by the Factor() function and store it to the pointer b of the class named Base.”

The code works, but the problem for the person debugging the code is that a lot of details are hidden beneath the surface. For example, we cannot see just by looking at the code what kind of object the DoStuff() function returns. Neither do we see what kind of an array (or any type for which the subscript operator is defined) the Factor() function returns. Sure, you could step into the code using the debugger and have a look, but doing this every time after you forget the types in question gets old fast. Fortunately, the solution is simple and just as efficient.

For the solution, we just have to look up the types in the code being called once and write them as local (automatic) variables to our own code. For the sake of this example, let’s say you find out that the DoStuff() function returns a const reference to an object of class Charger (just picking words out of my immediate surroundings here). The Charger class has the Factor() function which returns an std::vector that contains Base class pointers.

const Charger& charger = ptrGadget->DoStuff(item);
std::vector<Base*> vecBase = charger.Factor();
Base* b = vecBase[index];

Now, we have divided our “neat” piece of code on one line to three. And to what gain? Well, we are now able to clearly see what types we are working on (Charger and std::vector<Base*>). These were hidden from the view previously. And although the code is on three lines and there’s more text, it is much more readable because now we have just a single statement per line instead of three. The debugger is also able to display information about the automatic variables we have created, namely charger and vecBase.

Naturally, you could replace the types used in this example with anything you like, and the point would still stand. The types I used where were from the top of my head and at some point I’ll probably replace them with others that better suit the example.

Jul 02

It’s easy to disregard writing code comments. Let’s face it, we as coders are not going for the Pulitzer prize, so it’s not about the lack of personal ability of being able to express oneself, it’s just plain laziness.

To be honest, leaving code without descriptive comments where they’re needed is much worse than that. Firstly, it indirectly shows the programmer’s lack of respect towards his peers. Everyone knows how frustrating it is to wade through dozens of classes and even more functions with cryptic, non-self-descripting code, especially when the original coder is on vacation, has left the company or otherwise not unavailable for any other reason. A programmer is an artist, with fellow programmers as the audience. And as all artists, we should learn to respect our audience.

Secondly, leaving out comments shows the programmer’s overconfident attitude towards himself. Sure enough, when writing code, we usually have a good idea of what we’re doing. But programming is often about juggling several things in mind at once, and when you next time look at your code, it may be quite difficult to get back to the mindset you were in when writing the code originally. In these quite common situations, it pays to have descriptive comments in place.

Naturally, I’m not suggesting to comment everything. In fact, one should avoid redundant comments when the code is clear by itself, i.e. when the code is self descriptive. For example, consider the following:

// Declare category id for products
const int prodCategoryId = 1024;

// Create an iterator over products
vector<Product>::iterator iter = products_.begin();

// Iterate through all products
for ( ; iter != products_.end(); ++iter )
{
    // Assign categody id to each product
    iter->AssignCategoryId( prodCategoryId );
}

 
This is an obvious example of excessive commenting. So what to leave out? Consider commenting general ideas, not individual code lines. We can refactor the above comments so that we both reduce comment clutter, yet make our general reasoning obvious:

// Assign categody id to each product
const int prodCategoryId = 1024;
vector<Product>::iterator iter = products_.begin();

for ( ; iter != products_.end(); ++iter )
{
    iter->AssignCategoryId( prodCategoryId );
}

The intent of the code is clear from the single comment alone. What’s more, variable and function names are descriptive enough not to leave too much for the imagination.

Also remember that the more comments you have, the more of a maintenance burden it will be to keep them all up to date. So strive to write concise and descriptive comments only when they are needed.

preload preload preload