Techniques - October 1998

C++ RTTI - casting around

RTTI was originally omitted from C++, and Kevlin Henney shows there is more to the runtime type information mechanism than many developers realise.

Good software engineering techniques, and object-orientation is no exception, are based on principles of abstraction. At heart, abstraction is about hiding and simplification. For OO, the abstraction principles are encapsulation, inheritance, and polymorphism. Inheritance and polymorphism support simplification by classification, expressing variation of type-based behaviour but hiding the detail of a runtime type behind a more general interface. Most type selection behaviour in C++ can be expressed using virtual functions, but there are some cases where more knowledge of the specific type is required. Consider a persistence or I/O framework where all storable objects are derived from a storable class. The framework is general and works only in terms of storable, but a user program will need to recover a more specific type to be able to use it. This concept is one of type recovery: an object of a specific type is passed into a framework and later comes back out, at which point it must be accessed via its more specific type again.

Being able to downcast from a general to a specific class is often considered the hallmark of a bad design, but turns out to be a requirement of many OO systems. However, it is a capability that is needed in so few places that the presence of downcasting can still highlight poor design. The capability for typesafe downcast must be supported by some kind of runtime type information (RTTI) and it is this capability that was originally omitted from C++. Bjarne Stroustrup held back from introducing it into C++ because he had seen how it had been (ab)used for ‘type switching’ in Simula 67. In an interesting and rare parallel, Bertrand Meyer did not originally include it in Eiffel for the same reasons.
However, RTTI and a typesafe downcasting mechanism can be likened to a parachute: not something you need often, but the difference between having one versus not having one, in the few cases it is needed, can be significant! As part of the standardisation process, C++ acquired RTTI, the most visible aspect of which is dynamic_cast. There is perhaps more detail, and motivation, to the RTTI mechanism than many developers realise, so this article will take a closer look.

The code examples presented are written to the style and content of the new ISO C++ standard. For brevity, rather than as a recommended style, member functions are defined in the body of the class definition rather than out of line. All the standard library features are in namespace std, but again for brevity a using namespace std; directive is assumed in each code example.

Rolling your own

Let us take the historical route: how did C++ programmers traditionally solve this problem? Typically, it was by creating a simple runtime type-checking framework, something like the one shown in Listing 1.1. Listing 1.2 shows how a derived class would use it, and Listing 1.3 shows how client code would take advantage of it. Such frameworks have been reinvented almost as often as string classes.

There are some obvious problems with this approach: it is tedious and error prone, both for the class user and the class provider. Listing 2.1 enumerates the common problems. Some things can be done to alleviate the tediousness and mundane errors. For instance, macros or templates can be used to simplify the responsibilities of the class provider and eliminate cut and paste coding; the job of the class user remains pretty much as before.

To understand why such a scheme requires a language rather than a library solution we must look beyond issues of convenience to more fundamental problems. A roll-your-own scheme does not work at all well for template classes, as Listing 2.2 illustrates.
It is possible to get round some of the template problems and resolve some efficiency issues by using the address of the static type information, but the class name will still be the same. The other real show-stoppers are associated with multiple inheritance and virtual base classes: assuming you have the type information, what happens with the downcast? A plain cast down from a virtual base class results in a compilation error and, for other MI cases, a plain cast may end up as a simple address cast without offset readjustment (chaos follows...).

The typeid operator

It is clear that a language solution is more appropriate than a library one. Part of C++’s RTTI mechanism is the typeid operator, which is syntactically reminiscent of the sizeof operator. It may take either a type name or a value as its operand, and it returns a const reference to a std::type_info object – a standard library type – that describes the given type or the type of the given value. If typeid had a prototype, it would look much like the following pseudo-C++:

const type_info &typeid(value of any type);
const type_info &typeid(type name);

To check whether an object is of a specific type, a programmer could write:

if(typeid(*ptr) == typeid(example)) ...

To use the typeid operator the programmer must include the standard <typeinfo> header file.

The type_info class

The type_info class itself is very simple (see Listing 3). In addition to the name member, which returns the type name, you may find a raw_name member function on some platforms. This is a non-standard extension that can be used to query the name-mangled version of the type name.

Only the system can create type_info objects. Once a reference to a type_info has been returned from typeid, that reference will be valid for the rest of the program.
This is one of the few cases in C++ programming where it is reasonable for a programmer to take the address associated with a reference and hold onto it for later use:

const type_info *info = &typeid(*ptr);

A good compiler/linker system will ensure that type_info for a given type is not duplicated in the system, ie in all translation units of a program, an expression &typeid(type) would yield the same address for the same type. However, although expected of a good quality implementation, it is not a requirement and not something that can always be achieved, eg when dealing with dynamically linked libraries.

Polymorphic vs non-polymorphic types

How is the type information associated with an object? It would certainly be an unacceptable overhead if each object carried the type_info or a pointer to it.

For polymorphic types – ie a type with an inherited or declared virtual function – the solution is to use the vtable, which already represents a form of RTTI with respect to virtual function lookup. Figure 1.1 shows one possible implementation. For non-polymorphic types, such a structure would be an unacceptable imposition. It would interfere with the layout compatibility of plain structs with other languages, such as C and Pascal. The C++ principle of not paying for what is not used would also be violated.

The solution is that for expressions whose types are non-polymorphic, including built-in types, the type_info is determined at compile time. Thus it is quite independent from the object and its memory layout, as Figure 1.2 shows. In this case, typeid’s operand is not evaluated as there is no need – this is like sizeof.
A function non_poly that returns a pointer to a non-polymorphic base class will not get called in the following statement:

cout << typeid(*non_poly()).name() << endl;

However, as the essence of RTTI suggests, an expression resulting in a reference to a polymorphic type will get evaluated, and the type_info associated with the most derived class – ie the actual object’s type – will be returned. In the following statement a function poly, which returns a pointer to a polymorphic class, will be called:

cout << typeid(*poly()).name() << endl;

If the pointer being dereferenced is null, a bad_typeid exception is thrown (see Listing 3).

A common error is to attempt to take the typeid of a pointer to a polymorphic object. This results in a type_info that describes the pointer rather than what is being pointed to. On reflection, this is perhaps not that surprising; it is the dereferenced pointer, ie the object itself, that must be used as the operand.

The dynamic_cast operator

However, not everything is solved yet. Casting through multiple inheritance and virtual base classes is still a problem; having the cast separate from the check is still error prone; and only exact type matches can be supported using typeid, ie there is no is-kind-of check.

This is where the dynamic_cast operator fits in. It combines the type check with the cast, and uses the syntax of the new keyword casts (see Keyword casts). For downcasts and crosscasts it is only legal for pointers and references to polymorphic types; it will fail to compile for non-polymorphic and incomplete types. For a pointer, a successful cast will result in a pointer correctly cast and adjusted to the applicable part of the object – including multiple inheritance and virtual base class cases – whereas an unsuccessful cast will result in a testable null pointer:
base *b = ...;
derived *d = dynamic_cast<derived *>(b);
if(d)
    ... // successful cast
else
    ... // unsuccessful cast
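To make the pointer form concrete, here is a minimal, self-contained sketch. The shape, circle, and square classes and the radius_or_negative function are hypothetical names invented for illustration; they do not come from the article's listings.

```cpp
#include <cassert>

// Hypothetical hierarchy, for illustration only.
class shape
{
public:
    virtual ~shape() {}   // one virtual function makes the hierarchy polymorphic
};

class circle : public shape
{
public:
    explicit circle(double r) : radius_(r) { assert(r >= 0.0); }
    double radius() const { return radius_; }
private:
    double radius_;
};

class square : public shape {};

// Returns the radius if s points to a circle (or a class derived from circle),
// otherwise -1. A null s also takes the unsuccessful branch, because
// dynamic_cast applied to a null pointer yields a null pointer.
double radius_or_negative(shape *s)
{
    circle *c = dynamic_cast<circle *>(s);
    if(c)
        return c->radius();   // successful cast
    else
        return -1.0;          // unsuccessful cast
}
```

Passing the address of a circle yields its radius, while a square or a null pointer yields -1; the caller never performs an unchecked cast.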
As dynamic_cast checks that the cast is possible, rather than checking for an exact type match, it is our substitute for an is-kind-of operation. Listing 4.1 shows how it can be used to define a type predicate function object, and Listing 4.2 shows it in action as a type filter. If you still hanker for a more explicit is-kind-of operator, see the end of the Keyword casts box-out for how to ‘define your own operators’.

Declarations in conditions

It is inevitable that almost all dynamic_cast operations will be associated with a null check. This inspired a change to the language. A pointer declaration may be combined with a condition, where a non-null initialisation represents true, and a null initialisation represents false. The variable has scope over the whole of the control statement, but not outside it. This simplifies the use of dynamic_cast, and reinforces the C++ principle of declaration as close to the point of first use as possible.
base *b = ...;
if(derived *d = dynamic_cast<derived *>(b))
    ... // successful cast, d in scope
else
    ... // unsuccessful cast, d in scope
... // d not in scope
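As a rough, runnable illustration of the same idea, the sketch below uses a declaration in the condition of both an if and a while. The event, key_event, and wrapped_event classes, and the key_of and unwrap functions, are hypothetical and exist only for this example.

```cpp
#include <cassert>

// Hypothetical event hierarchy, for illustration only.
class event
{
public:
    virtual ~event() {}
};

class key_event : public event
{
public:
    explicit key_event(char key) : key_(key) {}
    char key() const { return key_; }
private:
    char key_;
};

// A hypothetical decorator that wraps another event.
class wrapped_event : public event
{
public:
    explicit wrapped_event(event *inner) : inner_(inner) { assert(inner != 0); }
    event *inner() const { return inner_; }
private:
    event *inner_;
};

// Returns the key carried by e if it is a key_event, otherwise '\0'.
char key_of(event *e)
{
    if(key_event *k = dynamic_cast<key_event *>(e))
        return k->key();   // k is non-null and scoped to the if statement
    return '\0';           // k is not in scope here
}

// Strips any number of wrapper layers and returns the innermost event.
event *unwrap(event *e)
{
    while(wrapped_event *w = dynamic_cast<wrapped_event *>(e))
        e = w->inner();    // w is declared and tested afresh on each iteration
    return e;
}
```

In both functions the cast result cannot leak out of the statement that tested it, so there is no opportunity to use a pointer whose null check was forgotten.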
Although originally intended to accommodate dynamic_cast, declarations in conditions were generalised so that variables of any type may be declared and initialised in the condition of an if, while, for, or switch statement. The constraint on this is that the type must be convertible to the expected type: bool for if, while, and for; and an integer or enum type for switch. This addition to the language also triggered a long overdue change to the scope of a variable declared in the initialiser of a for loop.

The bad_cast exception

The dynamic_cast operator may be used with references to objects of polymorphic type. The question naturally arises as to what happens in the event of failure: there is no such thing as a null reference. A bad_cast exception (see Listing 3) is thrown in such cases:
base &b = ...;
try
{
    derived &d = dynamic_cast<derived &>(b);
    ... // successful cast
}
catch(bad_cast &)
{
    ... // unsuccessful cast
}
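A small compilable sketch of the reference form follows. The account classes and the rate_of and has_rate functions are hypothetical, introduced only to show where bad_cast arises; has_rate exists purely to make the failure path visible.

```cpp
#include <cassert>
#include <typeinfo>   // declares bad_cast
using namespace std;

// Hypothetical account hierarchy, for illustration only.
class account
{
public:
    virtual ~account() {}
};

class savings_account : public account
{
public:
    explicit savings_account(double rate) : rate_(rate) { assert(rate >= 0.0); }
    double rate() const { return rate_; }
private:
    double rate_;
};

class current_account : public account {};

// Assertion-style use: the caller claims that a is a savings_account.
// If the claim is wrong, dynamic_cast throws bad_cast, which propagates.
double rate_of(account &a)
{
    return dynamic_cast<savings_account &>(a).rate();
}

// Reports whether rate_of would succeed, by catching the failure.
bool has_rate(account &a)
{
    try
    {
        rate_of(a);
        return true;    // successful cast
    }
    catch(bad_cast &)
    {
        return false;   // unsuccessful cast
    }
}
```

In real code a pure query such as has_rate would more naturally use the pointer form; the reference form suits rate_of, where a failed cast is a broken assumption rather than an expected outcome.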
Stylistically, one would use a pointer cast for type queries, and a checked reference cast for assertions, ie the premise is that the cast to the target type will succeed and any other outcome is a failure. A bad_cast exception will also be thrown in the event of ambiguous classes – where a class has been repeatedly, but not virtually, inherited through multiple inheritance. The potential loophole of casting through to a private base class has been stoppered with a bad_cast exception.

Extending available type information

Standard C++ lacks the richness and detail of Smalltalk and Java’s reflective capabilities. There are, however, two mechanisms by which type information may be enhanced: the first is open to the system vendor, and the second to any developer.

The type_info class is itself a polymorphic base class. A compiler vendor may choose to derive more specific classes from it that categorise and describe types in greater detail, eg a derived class holding extra information on classes, another holding details of function pointer types, etc. How would the user gain access to such extra information, given that typeid is required to return a const reference to type_info? By using RTTI of course! A dynamic_cast can be used to probe the specific type for selection and downcasting purposes.

A different technique is to use the type_info as a lookup index. This is significantly more portable, extensible, and accessible to the ordinary programmer, but requires additional support types and objects. A programmer may wish to associate a handler object or function, or a key value, with a specific type. This amounts to extending the capabilities of a type non-intrusively outside that type, eg for the purposes of I/O, persistence, etc. It is here that the type_info::before member function comes into its own. It defines an ordering between type_info objects, but not one that is in any way necessarily related to class hierarchy, alphabetical ordering, or position in memory.
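The lookup-index idea can be sketched roughly as follows, using std::map with a comparison based on type_info::before. This is an independent sketch rather than a reproduction of the article's own listing: type_info_less, type_label_map, add_label, label_for, and the shape classes are all hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <typeinfo>
using namespace std;

// Orders type_info pointers using type_info::before rather than by address,
// so duplicated type_info objects for one type still compare as equivalent.
struct type_info_less
{
    bool operator()(const type_info *lhs, const type_info *rhs) const
    {
        return lhs->before(*rhs);
    }
};

// Associates a text label with a type, non-intrusively.
typedef map<const type_info *, string, type_info_less> type_label_map;

// Hypothetical hierarchy, for illustration only.
class shape
{
public:
    virtual ~shape() {}
};
class circle : public shape {};
class square : public shape {};

// Registers a label for a type; registering the same type twice is treated
// as a programming error.
void add_label(type_label_map &labels, const type_info &type, const string &label)
{
    assert(labels.find(&type) == labels.end());
    labels[&type] = label;
}

// Looks up the label for the dynamic type of s.
string label_for(const type_label_map &labels, const shape &s)
{
    type_label_map::const_iterator found = labels.find(&typeid(s));
    return found != labels.end() ? found->second : string("unregistered");
}
```

Because the lookup uses typeid on a reference to a polymorphic type, it matches the exact dynamic type only: an object of a class derived from circle would be reported as unregistered unless that class had its own entry.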
Listing 5 shows a generic map type for looking up a value based on type_info.

Navigation

Multiple inheritance ranks up there in the great computing holy wars alongside bracket alignment, editor of choice, real operating systems, and so on. Some regard it as an essential and fundamental OO feature, while others demonise it as a goto. It is important to remember that a program is not concerned whether or not you use MI badly. It is the programmer that cares; language features do not have a conscience.

One of the most common good uses of MI has been the style of mix-in programming, where additional ‘property’ classes are mixed in with a principal class hierarchy to add capabilities, such as persistence, notification, etc. Mix-ins are abstract classes and, in the simplest form, they are pure abstract, ie they have no implementation at all. This use of pure abstract classes has been around for some time, but has more recently come into vogue as ‘interface style programming’ (see Listing 6.1). A dynamic_cast is normally presented as a safe downcast mechanism, but this is a very narrow view of its capabilities: it may also be used for safe crosscasting to sibling classes. In this context the role of dynamic_cast is to query interfaces (a question for COM programmers: does this sound familiar?) as illustrated in Listing 6.2.

A special case for dynamic_cast is casting to a void *. This returns a pointer to the beginning of the object, and will not fail. This can be used to support particular memory management schemes or to establish the actual identity of an object. In a multiple inheritance lattice this latter concept can be valuable when establishing whether or not two pointers to different types actually point to different parts of the same object (see Listing 6.3).

In reserve

Although RTTI can be – and has been – easily abused, effective and sparing use can lead to more loosely coupled systems; with poor use, the opposite is very definitely the case.
In addition to the high-powered uses of RTTI that have been mentioned, having access to basic type meta-data such as the type name can assist in more mundane tasks such as debugging and exception reporting. In spite of the continued use of home-grown RTTI systems, native C++ RTTI presents a more robust, comprehensive, and preferred mechanism for type discovery and navigation within a class hierarchy.

Kevlin Henney works for QA Training as a senior technologist, specialising in programming languages and architectures. He is also a member of the BSI C++ panel. He can be contacted by email at khenney@qatraining.com.
(P)1998, Centaur Communications Ltd. EXE Magazine is a publication of Centaur Communications Ltd. No part of this work may be published, in whole or in part, by any means including electronic, without the express permission of Centaur Communications and the copyright holder where this is a different party. EXE Magazine, St Giles House, 50 Poland Street, London W1V 4AX, email editorial@dotexe.demon.co.uk