Techniques - October 1998

C++ RTTI - casting around

RTTI was originally omitted from C++ and Kevlin Henney shows there is more to the runtime type information mechanism than many developers realise.

Good software engineering techniques, and object-orientation is no exception, are based on principles of abstraction. At heart, abstraction is about hiding and simplification. For OO, the abstraction principles are encapsulation, inheritance, and polymorphism. Inheritance and polymorphism support simplification by classification, expressing variation of type-based behaviour but hiding the detail of a runtime type behind a more general interface.

Most type selection behaviour in C++ can be expressed using virtual functions, but there are some cases where more knowledge of the specific type is required. Consider a persistence or I/O framework where all storable objects are derived from a storable class. The framework is general and works only in terms of storable, but a user program will need to recover a more specific type to be able to use it. This concept is one of type recovery: an object is passed into a framework through a general type and later handed back, at which point its more specific type must be recovered.

Being able to downcast from a general to a specific class is often considered the hallmark of a bad design, but turns out to be a requirement of many OO systems. However, it is a capability that is needed in so few places that the presence of downcasting can still highlight poor design. The capability for typesafe downcast must be supported by some kind of runtime type information (RTTI) and it is this capability that was originally omitted from C++.

Bjarne Stroustrup held back from introducing it into C++ because he had seen how it had been (ab)used for ‘type switching’ in Simula 67. In an interesting and rare parallel, Bertrand Meyer did not originally include it in Eiffel for the same reasons. However, RTTI and a typesafe downcasting mechanism can be likened to a parachute: not something you need often, but the difference between having one versus not having one, in the few cases it is needed, can be significant!

As part of the standardisation process, C++ acquired RTTI, the most visible aspect of which is dynamic_cast. There is perhaps more detail, and motivation, to the RTTI mechanism than many developers realise, so this article will take a closer look.

The code examples presented are written to the style and content of the new ISO C++ standard. For brevity, rather than as a recommended style, member functions are defined in the body of class definition rather than out of line. All the standard library features are in namespace std, but again for brevity a using namespace std; directive is assumed in each code example.

Rolling your own

Let us take the historical route: how did C++ programmers traditionally solve this problem? Typically, it was by creating a simple runtime type-checking framework, something like the one shown in Listing 1.1. Listing 1.2 shows how a derived class would use it, and Listing 1.3 shows how client code would take advantage of it. Such frameworks have been reinvented almost as often as string classes.

There are some obvious problems with this. It is tedious and error prone, both for the class user and the class provider. Listing 2.1 enumerates the common problems. Some things can be done to alleviate the tediousness and mundane errors. For instance, macros or templates can be used to simplify the responsibilities of the class provider and eliminate cut and paste coding; the job of the class user remains pretty much as before.

To understand why such a scheme requires a language rather than a library solution we must look beyond issues of convenience to more fundamental problems. A roll-your-own scheme does not work at all well for template classes, as Listing 2.2 illustrates. It is possible to get round some of the template problems and resolve some efficiency issues by comparing the address of the static type information rather than its string value, but the class name will still be the same for every instantiation. The other real show-stoppers are associated with multiple inheritance and virtual base classes: assuming you have the type information, what happens with the downcast? A cast through virtual base classes results in a compilation error and, for other MI cases, a plain cast may end up as a simple address cast without offset readjustment (chaos follows...).
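The offset problem can be made concrete with a small sketch. The classes first_base, second_base and combined, and the two helper functions, are hypothetical names introduced here for illustration. The object layout is implementation-specific, so the address difference is computed but deliberately not asserted.

```cpp
#include <cassert>

// Hypothetical classes, for illustration only.
struct first_base  { virtual ~first_base() {}  int first; };
struct second_base { virtual ~second_base() {} int second; };
struct combined : first_base, second_base {};

// A plain address cast: what a DIY scheme effectively does when it treats
// the pointer as a raw address. The result is never dereferenced here.
combined *address_cast(second_base *base)
{
    return static_cast<combined *>(static_cast<void *>(base));
}

// The language-supported cast, which applies any offset adjustment.
combined *checked_cast(second_base *base)
{
    return dynamic_cast<combined *>(base);
}

void adjustment_demo()
{
    combined object;
    second_base *base = &object;              // implicit upcast, adjusted
    assert(checked_cast(base) == &object);    // adjusted back correctly
    // On typical implementations the second base sits at a non-zero
    // offset, so the unadjusted address is not the object's address.
    bool addresses_differ = address_cast(base) != &object;
    (void) addresses_differ;                  // layout-dependent, not asserted
}
```

Calling adjustment_demo() exercises both casts. Had second_base been a virtual base of combined, a static_cast from second_base * to combined * would not have compiled at all.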

The typeid operator

It is clear that a language solution is more appropriate than a library one. Part of C++’s RTTI mechanism is the typeid operator, which is syntactically reminiscent of the sizeof operator. It may take either a type name or a value as its operand, and it returns a const reference to a std::type_info object – a standard library type – that describes the given type or the type of the given value. If typeid had a prototype, it would look much like the following pseudo-C++:

const type_info &typeid(value of any type);

const type_info &typeid(type name);

To check whether an object is of a specific type, a programmer could write:

if(typeid(*ptr) == typeid(example)) ...

To use the typeid operator the programmer must include the standard <typeinfo> header file.
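As a minimal sketch of such a check, assume a hypothetical polymorphic base class widget with example derived from it, and special_example derived in turn from example. None of these relationships come from the listings. The comparison succeeds only for an exact match of the object's type.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical classes, assumed for illustration only.
class widget { public: virtual ~widget() {} };   // polymorphic base
class example : public widget {};
class special_example : public example {};

// True only when *ptr is exactly an example, not a class derived from it.
bool is_exactly_example(const widget *ptr)
{
    assert(ptr != 0);   // typeid(*ptr) would throw bad_typeid for a null pointer
    return typeid(*ptr) == typeid(example);
}
```

Given objects of all three classes, is_exactly_example reports true only for the example object; a special_example is rejected even though it is also an example in the inheritance sense.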

The type_info class

The type_info class itself is very simple (see Listing 3). In addition to the name member, which returns the type name, you may find a raw_name member function on some platforms. This is a non-standard extension that can be used to query the name-mangled version of the type name.

Only the system can create type_info objects. Once a reference to a type_info has been returned from typeid, that reference will be valid for the rest of the program. This is one of the few cases in C++ programming where it is reasonable for a programmer to take the address associated with a reference and hold onto it for later use:

const type_info *info = &typeid(*ptr);

A good compiler/linker system will ensure that type_info for a given type is not duplicated in the system, ie in all translation units of a program, an expression &typeid(type) would yield the same address for the same type. However, although expected of a good quality implementation, it is not a requirement and not something that can always be achieved, eg when dealing with dynamically linked libraries.
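One cautious way to hold on to type information is sketched below. The type_memo class and the document and report types are hypothetical, introduced only for this example. Because address uniqueness is not guaranteed, the sketch stores the address but compares the type_info objects themselves with operator==.

```cpp
#include <cassert>
#include <typeinfo>
using namespace std;

// Hypothetical types, for illustration only.
class document { public: virtual ~document() {} };
class report : public document {};

// Remembers the dynamic type of the object it was constructed from.
class type_memo
{
public:
    explicit type_memo(const document &d) : info(&typeid(d)) {}
    // Compare the type_info objects, not their addresses: addresses are
    // only unique where the implementation merges duplicates.
    bool same_type_as(const document &d) const { return *info == typeid(d); }
private:
    const type_info *info;   // remains valid for the rest of the program
};

void type_memo_demo()
{
    report first, second;
    document plain;
    type_memo memo(first);
    assert(memo.same_type_as(second));   // same dynamic type
    assert(!memo.same_type_as(plain));   // different dynamic type
}
```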

Polymorphic vs non-polymorphic types

How is the type information associated with an object? It would certainly be an unacceptable overhead if each object carried the type_info or a pointer to it.

For polymorphic types – ie a type with an inherited or declared virtual function – the solution is to use the vtable, which already represents a form of RTTI with respect to virtual function lookup. Figure 1.1 shows one possible implementation.

For non-polymorphic types, such a structure would be an unacceptable imposition. It would interfere with the layout compatibility of plain structs with other languages, such as C and Pascal. The C++ principle of not paying for what is not used would also be violated.

The solution is that for expressions whose types are non-polymorphic, including built-in types, the type_info is determined at compile time. Thus it is quite independent from the object and its memory layout, as Figure 1.2 shows. In this case, typeid’s operand is not evaluated as there is no need – this is like sizeof. A function non_poly that returns a pointer to a non-polymorphic base class will not get called in the following statement:

cout << typeid(*non_poly()).name() << endl;

However, as the essence of RTTI suggests, an expression resulting in a reference to a polymorphic type will get evaluated, and the type_info associated with the most derived class – ie the actual object’s type – will be returned. In the following statement a function poly, which returns a pointer to a polymorphic class, will be called:

cout << typeid(*poly()).name() << endl;

If the pointer being dereferenced is null, a bad_typeid exception is thrown (see Listing 3).
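The difference in evaluation can be observed with a call counter. In this sketch the functions non_poly and poly follow the text, while the types, the counter calls and the function null_poly are assumptions added for illustration.

```cpp
#include <cassert>
#include <typeinfo>
using namespace std;

struct plain { int value; };                     // non-polymorphic
struct poly_base { virtual ~poly_base() {} };    // polymorphic
struct poly_derived : poly_base {};

int calls = 0;                                   // counts evaluations

plain *non_poly() { ++calls; static plain p; return &p; }
poly_base *poly() { ++calls; static poly_derived d; return &d; }
poly_base *null_poly() { return 0; }

void evaluation_demo()
{
    calls = 0;
    const type_info &static_info = typeid(*non_poly());
    assert(calls == 0);                          // operand not evaluated
    assert(static_info == typeid(plain));

    const type_info &dynamic_info = typeid(*poly());
    assert(calls == 1);                          // operand evaluated
    assert(dynamic_info == typeid(poly_derived));

    bool caught = false;
    try { typeid(*null_poly()).name(); }
    catch(const bad_typeid &) { caught = true; }
    assert(caught);                              // null dereference reported
}
```

Some compilers warn that the operand of the first typeid has side effects that will not happen, which is exactly the point being made.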

A common error is to attempt to take the typeid of a pointer to a polymorphic object. This results in a type_info that describes the pointer rather than what is being pointed to. On reflection, this is perhaps not that surprising; it is the reference to the object that must be used.
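The mistake is easy to reproduce. In the following sketch the account and savings_account classes are hypothetical, chosen only to show the contrast between the two forms.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical types, for illustration only.
struct account { virtual ~account() {} };
struct savings_account : account {};

void pointer_versus_object_demo()
{
    savings_account savings;
    account *ptr = &savings;

    // The pointer has the static type 'account *', whatever it points to.
    assert(typeid(ptr) == typeid(account *));
    assert(typeid(ptr) != typeid(savings_account *));

    // Dereferencing yields an lvalue of polymorphic type, so the type of
    // the actual object is reported.
    assert(typeid(*ptr) == typeid(savings_account));
}
```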

The dynamic_cast operator

However, not everything is solved yet. Casting through multiple inheritance and virtual base classes is still a problem, and having the cast separate from the check is still error prone. In addition, only exact type matches can be supported using typeid, ie there is no is-kind-of check.

This is where the dynamic_cast operator fits in. It combines the type check with the cast, and uses the syntax of the new keyword casts (see Keyword casts). Other than for upcasts, which can be resolved at compile time, it is only legal for pointers and references to polymorphic types; it will fail to compile for non-polymorphic and incomplete types. For a pointer, a successful cast will result in a pointer correctly cast and adjusted to the applicable part of the object – including multiple inheritance and virtual base class cases – whereas an unsuccessful cast will result in a testable null pointer:

base *b = ...;
derived *d = dynamic_cast<derived *>(b);
if(d)
    ... // successful cast
else
    ... // unsuccessful cast

As dynamic_cast checks that the cast is possible, rather than checking for an exact type match, it is our substitute for an is-kind-of operation. Listing 4.1 shows how it can be used to define a type predicate function object, and Listing 4.2 shows it in action as a type filter. If you still hanker for a more explicit is-kind-of operator, see the end of the Keyword casts box out for how to ‘define your own operators’.
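The contrast between an exact match and a conformance check can be sketched side by side. The zoo classes below reuse the names from the listings, with reptile added as an assumption; the two helper functions are hypothetical.

```cpp
#include <cassert>
#include <typeinfo>

class animal  { public: virtual ~animal() {} };
class mammal  : public animal {};
class primate : public mammal {};
class reptile : public animal {};   // assumed, not in the listings

// Exact match: true only for objects whose most derived type is mammal.
bool is_exactly_mammal(const animal *a)
{
    return a != 0 && typeid(*a) == typeid(mammal);
}

// Conformance: true for mammal and anything derived from it.
bool is_kind_of_mammal(const animal *a)
{
    return dynamic_cast<const mammal *>(a) != 0;
}

void kind_of_demo()
{
    primate p;
    reptile r;
    mammal m;
    assert(is_kind_of_mammal(&p) && !is_exactly_mammal(&p));
    assert(is_kind_of_mammal(&m) && is_exactly_mammal(&m));
    assert(!is_kind_of_mammal(&r) && !is_exactly_mammal(&r));
    assert(!is_kind_of_mammal(0));   // a null pointer casts to null
}
```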

Declarations in conditions

It is inevitable that almost all dynamic_cast operations will be associated with a null check. This inspired a change to the language. A pointer declaration may be combined with a condition, where a non-null initialisation represents true and a null initialisation represents false. The variable has scope over the whole of the control statement, but not outside it. This simplifies the use of dynamic_cast, and reinforces the C++ principle of declaration as close to the point of first use as possible.

base *b = ...;
if(derived *d = dynamic_cast<derived *>(b))
    ... // successful cast, d in scope
else
    ... // unsuccessful cast, d in scope
... // d not in scope

Although originally intended to accommodate dynamic_cast, declarations in conditions were generalised so that variables of any type may be declared and initialised in the condition of an if, while, for, or switch statement. The constraint on this is that the type must be convertible to the expected type: bool for if, while, and for; and an integer or enum type for switch. This addition to the language also triggered a long overdue change to the scope of a variable declared in the initialiser of a for loop.
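The generalised form might be used as in the following sketch, which declares a variable in the condition of a while and of a switch. The node type and all three functions are hypothetical, introduced only to show the scoping.

```cpp
#include <cassert>

struct node { int value; node *next; };   // hypothetical singly linked list

// Returns the first node at or after 'from' holding a negative value.
const node *first_negative(const node *from)
{
    for(; from != 0; from = from->next)
        if(from->value < 0)
            return from;
    return 0;
}

int count_negatives(const node *head)
{
    int total = 0;
    const node *start = head;
    // 'found' is declared in the condition and scoped to the loop.
    while(const node *found = first_negative(start))
    {
        ++total;
        start = found->next;
    }
    return total;
}

int sign_class(int a, int b)
{
    switch(int difference = a - b)     // scoped to the switch
    {
    case 0:  return 0;
    default: return difference < 0 ? -1 : 1;
    }
}
```

In each case the declared variable is unavailable after the statement, so it cannot be used accidentally once it has served its purpose.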

The bad_cast exception

The dynamic_cast operator may be used with references to objects of polymorphic type. The question naturally arises as to what happens in the event of failure: there is no such thing as a null reference. A bad_cast exception (see Listing 3) is thrown in such cases:

base &b = ...;
try
{
    derived &d = dynamic_cast<derived &>(b);
    ... // successful cast
}
catch(bad_cast &)
{
    ... // unsuccessful cast
}

Stylistically, one would use a pointer cast for type queries, and a checked reference cast for assertions, ie the premise is that the cast to the target type will succeed and any other outcome is a failure.

A bad_cast exception will also be thrown by a reference cast in the event of ambiguous classes – where a class has been repeatedly, but not virtually, inherited through multiple inheritance. The corresponding pointer cast yields a null pointer. The potential loophole of casting through to a private base class has been stoppered in the same way, with a bad_cast exception for references and a null pointer for pointers.
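Both failure cases can be provoked with a small lattice. The classes in this sketch (top, other, left_side, right_side, bottom and secretive) are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <typeinfo>
using namespace std;

struct top   { virtual ~top() {} };
struct other { virtual ~other() {} };
struct left_side  : top {};
struct right_side : top {};
// top is inherited twice, non-virtually, so bottom has two top subobjects.
struct bottom : left_side, right_side, other {};
// top is a private base here, so it is not reachable from outside.
struct secretive : private top, public other {};

void failure_demo()
{
    bottom b;
    other *o = &b;
    assert(dynamic_cast<left_side *>(o) != 0);   // unambiguous crosscast
    assert(dynamic_cast<top *>(o) == 0);         // ambiguous: which top?

    bool caught = false;
    try { top &t = dynamic_cast<top &>(*o); (void) &t; }
    catch(const bad_cast &) { caught = true; }
    assert(caught);                              // reference form throws

    secretive s;
    other *hidden = &s;
    assert(dynamic_cast<top *>(hidden) == 0);    // private base refused
}
```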

Extending available type information

Standard C++ lacks the richness and detail of Smalltalk and Java’s reflective capabilities. There are, however, two mechanisms by which type information may be enhanced: the first is open to the system vendor, and the second to any developer.

The type_info class is itself a polymorphic base class. A compiler vendor may choose to derive more specific classes from it that categorise and describe types in greater detail, eg a derived class holding extra information on classes, another holding details of function pointer types, etc. How would the user gain access to such extra information, given that typeid is required to return a const reference to type_info? By using RTTI of course! The dynamic_cast can be used to probe the specific type for selection and downcasting purposes.

A different technique is to use the type_info as a lookup index. This is significantly more portable, extensible, and accessible to the ordinary programmer, but requires additional support types and objects. A programmer may wish to associate a handler object or function, or a key value, with a specific type. This amounts to extending the capabilities of a type non-intrusively outside that type, eg for the purposes of I/O, persistence, etc. It is here that the type_info::before member function comes into its own. It defines an ordering between type_info objects, but not one that is in any way necessarily related to class hierarchy, alphabetical ordering, or position in memory. Listing 5 shows a generic map type for looking up a value based on type_info.
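The lookup idea can also be tried directly with std::map, as a rough sketch. The shape classes, the type_info_less comparator, the format_table typedef and the tag strings below are all assumptions made for illustration, not part of the listings.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <typeinfo>
using namespace std;

// Hypothetical types, for illustration only.
struct shape  { virtual ~shape() {} };
struct circle : shape {};
struct square : shape {};

// Orders type_info pointers using type_info::before.
struct type_info_less
{
    bool operator()(const type_info *lhs, const type_info *rhs) const
    {
        return lhs->before(*rhs);
    }
};

typedef map<const type_info *, string, type_info_less> format_table;

// Looks up the external tag registered for the dynamic type of s.
string tag_for(const format_table &table, const shape &s)
{
    format_table::const_iterator found = table.find(&typeid(s));
    return found != table.end() ? found->second : string("unknown");
}

void lookup_demo()
{
    format_table table;
    table[&typeid(circle)] = "CIRC";
    table[&typeid(square)] = "SQUA";

    circle c;
    square q;
    shape s;
    assert(tag_for(table, c) == "CIRC");
    assert(tag_for(table, q) == "SQUA");
    assert(tag_for(table, s) == "unknown");   // nothing registered for shape
}
```

The tags are attached to the classes from outside, without touching their definitions, which is the non-intrusive extension the text describes.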

Navigation

Multiple inheritance ranks up there in the great computing holy wars alongside bracket alignment, editor of choice, real operating systems, and so on. Some regard it as an essential and fundamental OO feature, while others demonise it as a goto. It is important to remember that a program is not concerned whether or not you use MI badly. It is the programmer that cares; language features do not have a conscience.

One of the most common good uses of MI has been the style of mix-in programming, where additional ‘property’ classes are mixed in with a principal class hierarchy to add capabilities, such as persistence, notification, etc. Mix-ins are abstract classes and, in the simplest form, they are pure abstract, ie they have no implementation at all. This use of pure abstract classes has been around for some time, but has more recently come into vogue as ‘interface style programming’ (see Listing 6.1).

A dynamic_cast is normally presented as a safe downcast mechanism, but this is a very narrow view of its capabilities: it may also be used for safe crosscasting to sibling classes. In this context the role of dynamic_cast is to query interfaces (a question for COM programmers: does this sound familiar?) as illustrated in Listing 6.2.

A special case for dynamic_cast is casting to a void *. This returns a pointer to the beginning of the complete object, ie the most derived object, and will not fail. This can be used to support particular memory management schemes or to establish the actual identity of an object. In a multiple inheritance lattice this latter concept can be valuable when establishing whether or not two pointers to different types actually point to different parts of the same object (see Listing 6.3).
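A self-contained variation on the interface classes might look as follows. The class names follow Listing 6.1, but the virtual destructors, the same_object helper and the demonstration function are assumptions added to make the sketch complete.

```cpp
#include <cassert>

// interface classes, made polymorphic by a virtual destructor
class displayable { public: virtual ~displayable() {} };
class notifiable  { public: virtual ~notifiable() {} };
class storable    { public: virtual ~storable() {} };

// entity classes
class document { public: virtual ~document() {} };
class wp_document :
    public document,
    public virtual displayable,
    public virtual storable
{};

// True if both pointers refer to parts of the same complete object.
template<typename lhs_type, typename rhs_type>
bool same_object(const lhs_type *lhs, const rhs_type *rhs)
{
    return dynamic_cast<const void *>(lhs) == dynamic_cast<const void *>(rhs);
}

void navigation_demo()
{
    wp_document report, letter;
    document *d = &report;

    storable *s = dynamic_cast<storable *>(d);     // crosscast succeeds
    assert(s != 0);
    assert(dynamic_cast<notifiable *>(d) == 0);    // interface not supported

    displayable *shown = &report;
    assert(same_object(s, shown));                 // two views, one object
    storable *elsewhere = &letter;
    assert(!same_object(elsewhere, shown));        // different objects
}
```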

In reserve

Although RTTI can be – and has been – easily abused, effective and sparing use can lead to more loosely coupled systems; with poor use, the opposite is very definitely the case. In addition to the high powered uses of RTTI that have been mentioned, having access to basic type meta-data such as the type name can assist in more mundane tasks such as debugging and exception reporting.

In spite of the continued use of home grown RTTI systems, native C++ RTTI presents a more robust, comprehensive, and preferred mechanism for type discovery and navigation within a class hierarchy.

Kevlin Henney works for QA Training as a senior technologist, specialising in programming languages and architectures. He is also a member of the BSI C++ panel. He can be contacted by email at khenney@qatraining.com.

Keyword casts

The adoption of the RTTI proposal in 1993, as part of the ISO/ANSI standardisation process, saw the introduction of four new casts, including dynamic_cast. Casts in the traditional form, inherited from C, are not attractive at the best of times. That they stand out is a good thing: explicit conversions are often an indication that the programmer is cheating the type system, and making such assumptions clear in the code is important. Given that, the problem with the old style casts is that they are neither clear enough in their intent nor easy enough to locate across a large program – how many matching parentheses do you think there are in the average C++ program?

The four keyword casts address these issues. Each one is a specialist, good at only one type of conversion. Based, as they are, on keywords and explicit template function qualification syntax, they are very easy to locate in your source.

static_cast This cast covers all of the safe casts: widening numeric casts; pointer and reference casts from derived to base classes; construction of objects from single argument constructors; use of user-defined conversion operators; and pointer casts to void *.

It also covers many of the ‘plausible’ casts, ie ones that might feasibly be safe if the programmer is to be trusted, but are otherwise not guaranteed: narrowing numeric casts and casts from integers to enumerations; pointer and reference casts from base to derived classes (excluding virtual base classes); and pointer casts from void *.

Note that the types must be complete, otherwise you will receive a compilation error.

dynamic_cast This cast handles typesafe casting between polymorphic classes of an object via pointers or references. It can handle casting through virtual base classes, downcasting, and crosscasting. It can also handle upcasting, but this is redundant and can be handled at compile time.

const_cast This cast handles changes in cv qualification, ie const and volatile qualification, for pointers and references. The most common use of this form is to remove const-ness, hence its name, and in particular from poorly written code that is not const correct.

None of the other casts permit changes in qualification, and will flag any attempts as compile time errors.

reinterpret_cast This cast plays the role of the low-level cast beloved of systems programmers: unportable and dubious casts such as converting between pointers and integers, pointers to data and pointers to functions, etc. Here be dragons with attitude.

These keyword casts cover almost all of the capabilities, and more, of the older cast forms. Perhaps the one exception to this is casting to non-public base classes. However, this is no great loss. If you are genuinely hell bent on breaking encapsulation, you might as well do it in style: #define private public ...

The use of explicit template function qualification syntax means that you can, if you wish, define your own ‘casting operators’ to perform some kind of conversion and have the same appearance as the keyword casts. For instance, a numeric conversion implemented in terms of numeric_limits and checked for loss of range might be used as follows:

double d = ...;
int i = numeric_cast<int>(d);

If you wish to retain the versatility of the old cast forms, but the visibility of the modern cast form, you can achieve this as follows:

template<typename result_type, typename arg_type>
result_type explicit_cast(const arg_type &arg)
{
    return result_type(arg);
}
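The numeric_cast mentioned above is not a standard facility. One possible sketch of it is given here, under the assumption that the limits of the result type are exactly representable in the argument type, as they are for double to int on common platforms. Other combinations, such as mixed signed and unsigned integers, would need more care. The fits_in_int helper is likewise hypothetical.

```cpp
#include <cassert>
#include <limits>
#include <stdexcept>
using namespace std;

// Hypothetical range-checked conversion to an integer type. The name comes
// from the text; the implementation is an assumption.
template<typename result_type, typename arg_type>
result_type numeric_cast(arg_type arg)
{
    const arg_type lowest =
        static_cast<arg_type>(numeric_limits<result_type>::min());
    const arg_type highest =
        static_cast<arg_type>(numeric_limits<result_type>::max());
    if(!(arg >= lowest && arg <= highest))    // also rejects NaN
        throw range_error("numeric_cast: value out of range");
    return static_cast<result_type>(arg);
}

// Reports whether a double can be converted to int without loss of range.
bool fits_in_int(double d)
{
    try { numeric_cast<int>(d); return true; }
    catch(const range_error &) { return false; }
}
```

As with the keyword casts, only the result type is named explicitly; the argument type is deduced from the call.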

 

Further reading

The Design and Evolution of C++, Bjarne Stroustrup, Addison-Wesley, 1994 (ISBN 0-201-54330-3)

This excellent book describes the motivation and design decisions that have affected the development of C++ over the years, including the genesis and introduction of RTTI, keyword casts, and declarations in conditions.

Inside the C++ Object Model, Stanley B Lippman, Addison-Wesley, 1996 (ISBN 0-201-83454-5)

This book is written from the compiler writer’s perspective of C++, and shines a torch under the hood of C++. It describes the common vtable mechanism and how RTTI may be implemented.

The Annotated C++ Reference Manual, Margaret A Ellis and Bjarne Stroustrup, Addison-Wesley (ISBN 0-201-51459-1)

Although this book is getting a bit long in the tooth, superseded as it is by the new ISO C++ standard, it still thoroughly describes the details and implementation issues associated with the virtual function mechanism, multiple inheritance, virtual base classes, and casting. If you have a copy to hand, check it out, otherwise borrow one.

 

Listing 1.1 – A library solution to RTTI.
class runtime_checkable
{
public:
    virtual const string &type() const = 0;
    virtual bool is_kind_of(const string &) const { return false; }
    virtual ~runtime_checkable() {}
};

 

Listing 1.2 – Derived class view of library RTTI.
class example : public runtime_checkable
{
public:
    virtual const string &type() const
        { return type_name; }
    virtual bool is_kind_of(const string &other_type) const
        { return type_name == other_type; }
    static const string type_name;
    ...
};
const string example::type_name = "example";

 

Listing 1.3 – Usage example of library RTTI.
runtime_checkable *ptr = ...;
cout << "runtime type is " << ptr->type() << endl;
if(ptr->is_kind_of(example::type_name))
{
    example *cast_ptr = (example *) ptr;
    ... // use cast_ptr
}

 

Listing 2.1 – Some common problems with DIY RTTI.
namespace zoo
{
    class animal : public runtime_checkable
    {
    public:
        virtual const string &type() const
            { return type_name; }
        // forgot to override is_kind_of
        static const string type_name;
        // must be "zoo::animal", not "animal"
        ...
    };
    class mammal : public animal
    {
    public:
        virtual const string &type() // forgot const
            { return name; }
        virtual bool is_kind_of(const string &other_type) const
            { return name == other_type; }
        // forgot to chain to animal::is_kind_of
        static const string name; // should be type_name
        ...
    };
    class primate : public mammal
    {
    public:
        virtual const string &type() const
            { return type_name; }
        virtual bool is_kind_of(const string &other_type) const
            { return type_name == other_type || animal::is_kind_of(other_type); }
        // chained to wrong class
        static const string type_name;
        ...
    };
}

 

Listing 2.2 – Template problems with DIY RTTI.
template<typename value_type>
class container : public runtime_checkable
{
public:
    ...
    static const string type_name;
    ...
};
template<typename value_type>
const string container<value_type>::type_name = "container";
    // different template instantiations will
    // have same string value for type_name

 

Listing 3 – Contents of the standard <typeinfo> header.
namespace std
{
    class type_info
    {
    public:
        virtual ~type_info();
        bool operator==(const type_info &) const;
        bool operator!=(const type_info &) const;
        bool before(const type_info &) const;
        const char *name() const;
    private:
        // prevent copying
        type_info(const type_info &);
        type_info &operator=(const type_info &);
        ... // implementation not specified
    };
    // RTTI exception types derived from std::exception
    class bad_typeid : public exception { ... };
    class bad_cast : public exception { ... };
}

 

Listing 4.1 – Type conformance predicate.
template<class base, class derived>
struct kind_of : unary_function<const base *, bool>
{
    bool operator()(const base *operand) const
    {
        return dynamic_cast<const derived *>(operand) != 0;
    }
};

 

Listing 4.2 – Counting the mammals at the zoo with ISO C++.
set<animal *> zoo;
... // populate zoo
cout << count_if(zoo.begin(), zoo.end(), kind_of<animal, mammal>()) << endl;

 

Listing 5 – An associative container type that looks up a value of templated type from a typeid.
template<typename value_type>
class type_map
{
public:
    value_type &operator[](const type_info &type_id)
    {
        return contents[&type_id];
    }
    const value_type &operator[](const type_info &type_id) const
    {
        // typename is required: const_iterator is a dependent name
        typename map_type::const_iterator found = contents.find(&type_id);
        if(found == contents.end())
            throw logic_error("bad type lookup");
        return found->second;
    }
    ...
private:
    struct compare :
        binary_function<const type_info *, const type_info *, bool>
    {
        bool operator()(const type_info *lhs, const type_info *rhs) const
        {
            return lhs->before(*rhs);
        }
    };
    typedef map<const type_info *, value_type, compare> map_type;
    map_type contents;
};

 

Listing 6.1 – Polymorphic interface and entity classes.
// interface classes
class displayable { ... };
class notifiable { ... };
class storable { ... };

// entity classes
class document { ... };
class wp_document :
    public document,
    public virtual displayable,
    public virtual storable
{ ... };

 

Listing 6.2 – Using dynamic_cast to query interface support.
document *d = new wp_document;
...
if(storable *s = dynamic_cast<storable *>(d))
    ... // use storable operations for d
if(notifiable *n = dynamic_cast<notifiable *>(d))
    ... // will not be executed for d

 

Listing 6.3 – Comparing object identity with dynamic_cast<void *>.
notifiable *n = ...;
storable *s = ...;
if(dynamic_cast<void *>(n) == dynamic_cast<void *>(s))
    ... // n and s refer to the same object
else
    ... // n and s refer to different objects

 

(P)1998, Centaur Communications Ltd. EXE Magazine is a publication of Centaur Communications Ltd. No part of this work may be published, in whole or in part, by any means including electronic, without the express permission of Centaur Communications and the copyright holder where this is a different party.

EXE Magazine, St Giles House, 50 Poland Street, London W1V 4AX, email editorial@dotexe.demon.co.uk

 
