Changing Polymorphic Behavior at Runtime

I've recently discovered an interesting C++ technique that I've never read about before, so I thought that I'd share it here. It isn't a language feature or anything, but it is still interesting and (in my case at least) useful. The technique allows you to change the polymorphic behavior of an object at runtime.

First, a little back story. I've got a Property class that provides generic access to an object's property value. To provide this, the Property class must know the data type of the property that it encapsulates. So, I've also got a DataType class that encapsulates a data type and provides generic access to values of that type. This DataType class uses standard polymorphic class design: the abstract base DataType class is implemented for each data type that we need to support (e.g., DataType_int or DataType_MyClass). So, my Property class has a reference (pointer) to a DataType object, which provides it with generic access to that type's value. This is also an example of the Strategy pattern, which allows the Property class to change its behavior (its DataType) at runtime, and an example of design by composition (Property HAS a DataType) rather than inheritance (Property is subclassed for each DataType it must support). So far, I think that I'm on the right path.
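For reference, the conventional design described above might look roughly like the following sketch. The Property interface shown here (setType, getType, and the m_type member) is hypothetical, since the real class isn't shown in this post, and the DataType objects live on the stack only to keep the example short.

```cpp
#include <cassert>
#include <cstring>

// Abstract description of a data type (class names follow the post).
class DataType {
public:
    virtual ~DataType() {}
    virtual int getSizeOfType() const = 0;
    virtual const char *getTypeName() const = 0;
};

class DataType_int : public DataType {
public:
    virtual int getSizeOfType() const { return static_cast<int>(sizeof(int)); }
    virtual const char *getTypeName() const { return "int"; }
};

class DataType_float : public DataType {
public:
    virtual int getSizeOfType() const { return static_cast<int>(sizeof(float)); }
    virtual const char *getTypeName() const { return "float"; }
};

// Hypothetical Property: it HAS a DataType (composition) and can be pointed
// at a different one at runtime (Strategy). It does not own the DataType.
class Property {
public:
    explicit Property(const DataType *type) : m_type(type) {}
    void setType(const DataType *type) { m_type = type; }
    const DataType &getType() const { return *m_type; }
private:
    const DataType *m_type;
};

// Usage: the DataType objects must outlive the Property that refers to them.
inline void propertyExample() {
    DataType_int intType;
    DataType_float floatType;
    Property property(&intType);
    assert(std::strcmp(property.getType().getTypeName(), "int") == 0);
    property.setType(&floatType);
    assert(std::strcmp(property.getType().getTypeName(), "float") == 0);
}
```

The lifetime comment in propertyExample is the crux: the pointed-to DataType has to live somewhere for as long as any Property refers to it.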

The problem arises when I make a couple of DataType subclasses and begin trying to assign them to Property. Since Property has a reference to a DataType object, that object must exist somewhere. So, I have a couple of options. I can have Singleton instances of each DataType subclass and let Property objects reference those Singletons. Or I can dynamically allocate an instance of a DataType class and let the Property class manage that object's memory. The latter would result in many small allocations, which would be slow and could fragment the heap, so it isn't desirable. And I prefer not to keep globals around if at all possible, so the Singleton solution, while not terrible, was not ideal. I started thinking of using a structure of function pointers to hold the many behaviors required to describe a given type. However, I quickly realized that this would result in huge objects when I really only wanted a single reference to the group of functions that defines a type's behavior. At this point, I realized (as I'm sure you also have) that what I needed was a class. A class with virtual methods gives each of its instances a group of functions accessed via a single reference: the pointer to its v-table.
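To make the size argument concrete, here is a small sketch comparing an object that embeds a structure of function pointers with one that holds a single pointer to a shared table. All of the names here (DataTypeFunctions, PropertyWithFunctions, PropertyWithTablePointer, and the helper functions) are made up for illustration.

```cpp
#include <cassert>

// Hypothetical group of behaviors for one data type.
struct DataTypeFunctions {
    int (*getSizeOfType)();
    const char *(*getTypeName)();
};

static int intSize() { return static_cast<int>(sizeof(int)); }
static const char *intName() { return "int"; }

// One shared, constant table for 'int'.
static const DataTypeFunctions kIntFunctions = { intSize, intName };

// Option A: every object carries its own copy of all the function pointers,
// so the object grows with each behavior that is added.
struct PropertyWithFunctions {
    DataTypeFunctions functions;
};

// Option B: every object carries one pointer to a shared table, which is
// essentially what a v-table pointer is.
struct PropertyWithTablePointer {
    const DataTypeFunctions *functions;
};

inline void sizeExample() {
    PropertyWithFunctions a;
    a.functions = kIntFunctions;
    PropertyWithTablePointer b;
    b.functions = &kIntFunctions;

    // Both options reach the same behavior...
    assert(a.functions.getSizeOfType() == b.functions->getSizeOfType());
    // ...but option A holds at least both function pointers.
    assert(sizeof(a) >= sizeof(int (*)()) + sizeof(const char *(*)()));
}
```

With only two behaviors the difference is small, but option A grows by one pointer per behavior while option B stays one pointer wide.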

Following this train of thought, I began to think of an object as a reference to a group of functions (methods). If I just copied this reference, then I could change the functionality of my object (exactly the way that my Property class can change its functionality by changing its DataType reference). This is the standard strategy design pattern.

Code

The solution that I arrived at looks like this (I'll explain below):
#include <cstring> // for memcpy

// Base DataType class
class DataType {
public:

 // Construction
 DataType() {}
 DataType(const DataType &newType) { setType(newType); }

 // Set the polymorphic behavior of this DataType object
 void setType(const DataType &newType) {
  std::memcpy(this, &newType, sizeof(DataType));
 }

 // Polymorphic behavior example
 protected: virtual int _getSizeOfType() const { return -1; }
 public: inline int getSizeOfType() const { return _getSizeOfType(); }

 // Polymorphic behavior example
 protected: virtual const char *_getTypeName() const { return NULL; }
 public: inline const char *getTypeName() const { return _getTypeName(); }
};
 
// Implementation of DataType for 'int'
class DataType_int : public DataType {
public:

 // Construction
 DataType_int() {}
 DataType_int(const DataType &newType) : DataType(newType) {}

 // Polymorphic behavior example
 protected:  virtual int _getSizeOfType() const { return sizeof(int); }

 // Polymorphic behavior example
 protected: virtual const char *_getTypeName() const { return "int"; }
};
 
// Implementation of DataType for 'float'
class DataType_float : public DataType {
public:

 // Construction
 DataType_float() {}
 DataType_float(const DataType &newType) : DataType(newType) {}

 // Polymorphic behavior example
 protected:  virtual int _getSizeOfType() const { return sizeof(float); }

 // Polymorphic behavior example
 protected: virtual const char *_getTypeName() const { return "float"; }
};
 
// Example
DataType myType = DataType_int();
const char *typeName = myType.getTypeName(); // returns "int"
int typeSize = myType.getSizeOfType(); // returns sizeof(int)
 
myType.setType(DataType_float());
typeName = myType.getTypeName(); // returns "float"

As you can see, when we set the type, we are simply using memcpy to make the object's v-table pointer point to the v-table of the object that gets passed in. This changes myType's polymorphic behavior to that of the new type!

And we no longer need pointers or singletons or dynamic memory allocations! We have an object that is the size of a v-table pointer and that is all! If you prefer a bit of a speedup here, you could just use *((void**)this) = *((void**)&newType); to copy the v-table pointer directly, assuming that your DataType class has no members and that your compiler stores the v-table pointer at the start of the object (thanks to Dezhi Zhao for pointing that out in the comments).

Please keep in mind that this technique is not standards compliant: the standard doesn't say anything about v-tables or v-ptrs, and using memcpy to overwrite an object that has virtual methods is undefined behavior (thank you to all of the commenters who pointed this out). If a compiler implements virtual methods in a way that doesn't store lookup information within an object's memory space, this technique will fail completely. However, I have never heard of a C++ compiler that doesn't work this way.
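For comparison, a similar one-pointer-sized value object can be sketched portably by writing the table of functions by hand instead of borrowing the compiler's v-table. This is not the technique from this post, just an alternative under stated assumptions: the names PortableDataType, DataTypeTable, and the kTable_ constants are invented for the example, and the tables are static data, much as the compiler's own v-tables are.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical hand-written table of behaviors, standing in for the v-table.
struct DataTypeTable {
    int (*getSizeOfType)();
    const char *(*getTypeName)();
};

static int sizeOfInt() { return static_cast<int>(sizeof(int)); }
static const char *nameOfInt() { return "int"; }
static int sizeOfFloat() { return static_cast<int>(sizeof(float)); }
static const char *nameOfFloat() { return "float"; }

// One shared, constant table per supported type.
static const DataTypeTable kTable_int = { sizeOfInt, nameOfInt };
static const DataTypeTable kTable_float = { sizeOfFloat, nameOfFloat };

// A value type that holds a single pointer; copying it copies the behavior.
class PortableDataType {
public:
    explicit PortableDataType(const DataTypeTable &table) : m_table(&table) {}

    // Take on the behavior of another object by copying its table pointer.
    void setType(const PortableDataType &newType) { m_table = newType.m_table; }

    int getSizeOfType() const { return m_table->getSizeOfType(); }
    const char *getTypeName() const { return m_table->getTypeName(); }

private:
    const DataTypeTable *m_table;
};

inline void portableExample() {
    PortableDataType myType(kTable_int);
    assert(std::strcmp(myType.getTypeName(), "int") == 0);

    myType.setType(PortableDataType(kTable_float));
    assert(std::strcmp(myType.getTypeName(), "float") == 0);
    assert(myType.getSizeOfType() == static_cast<int>(sizeof(float)));
}
```

The trade-off is that each new behavior has to be added to the table and to every table instance by hand, which is exactly the bookkeeping that virtual methods do for you.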

Also, you can see that we can easily change the type of myType at any point during runtime. This gives you the flexibility of having an uninitialized array of DataType objects and initializing them whenever you like later. For the performance minded out there, Dezhi Zhao also pointed out that this will most likely cause the processor's branch prediction to fail for the getTypeName() call immediately after changing the type. In the example above, this will only happen for the DataType_float version, as the prediction can only fail if the processor has already made a prediction.

One curiosity that you may have noticed was the use of public proxy methods (getSizeOfType) that call protected virtual methods (_getSizeOfType). We need to do this because the compiler may skip the v-table lookup when it knows the actual type of an object (as opposed to pointers or references where it doesn't). This is perfectly reasonable, but breaks our setup.

Inside the proxies, though, the call goes through the this pointer, so in practice the v-table lookup happens. And because the proxies are inline, all they really do is make the compiler look up the correct method in the v-table and call that one. Remember, however, that we are NOT removing the virtual method lookup. This setup will not speed up virtual method calls in any way. In fact, we depend on the compiler looking up our virtual method for this to work.

Members

One important thing to note about this setup is the absence of any member variables in DataType. Since we are doing a memcpy expecting that both objects have the same size (sizeof(DataType)), none of DataType's subclasses may add any member variables. You could add member variables to DataType with no problem, but you are NOT able to add any member variables to subclasses. Since I didn't need any member variables for DataType, this didn't present a problem for me. However, it is not impossible to add member variables to subclasses. You just need to use memory that was provided in the base class as the memory where your members live. For example:
#include <cstring> // for memcpy

// Base DataType class
class DataType {
public:

 // Construction
 DataType() {}
 DataType(const DataType &newType) { setType(newType); }

 // Set the polymorphic behavior of this DataType object
 void setType(const DataType &newType) {
  std::memcpy(this, &newType, sizeof(DataType));
 }

protected:

 // Member data
 enum { kMemberDataBufferSize = 256, kMemberDataSize = 0 };
 char memberDataBuffer[kMemberDataBufferSize];
};
 
// My Data Type class
class DataType_MyType : public DataType {
public:

 // Construction
 DataType_MyType() {}
 DataType_MyType(const DataType &newType) : DataType(newType) {}

 // Access myData
 inline int getExampleMember() const {
  return _getMemberData().exampleMember;
 }
 inline void setExampleMember(int newExampleMember) {
  _getMemberData().exampleMember = newExampleMember;
 }

protected:

 // Member Data
 struct SMemberData {
  int exampleMember;
 };

 // Amount of member data buffer that we use (this class' member data +
 // all base class' member data)
 enum { kMemberDataSize = sizeof(SMemberData) + DataType::kMemberDataSize };

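 // Make sure that we don't run out of data buffer. A negative array size
 // forces a compile error when the condition is false, and the two-step
 // join lets __LINE__ expand before it is pasted into the typedef name.
 #define compileTimeAssertJoin2(a, b) a##b
 #define compileTimeAssertJoin(a, b) compileTimeAssertJoin2(a, b)
 #define compileTimeAssert(x) typedef char compileTimeAssertJoin(_assert_, __LINE__)[ (x) ? 1 : -1 ]
 compileTimeAssert(kMemberDataSize <= kMemberDataBufferSize);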

 // Access member data
 inline SMemberData &_getMemberData() {
  return *((SMemberData*) memberDataBuffer);
 }
 inline const SMemberData &_getMemberData() const {
  return *((const SMemberData*) memberDataBuffer);
 }
};

As you can see, the DataType base class simply provides a buffer of data, which the subclasses may use to store whatever member data they like. While this setup is a bit messy, it works without too many hoops to jump through. The cost is that every DataType object now carries the full buffer, whether or not its subclass uses it.
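If the pointer casts into the buffer make you nervous (they rely on the buffer being suitably aligned for SMemberData and sidestep the language's aliasing rules), one option is to copy the member data in and out of the buffer instead. The sketch below assumes a C++11 compiler for static_assert. BufferHolder, loadMemberData, and storeMemberData are hypothetical names, and the buffer is shown on its own, without the v-table trick, to keep the example portable.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the buffer that the base class provides.
struct BufferHolder {
    enum { kMemberDataBufferSize = 256 };
    unsigned char memberDataBuffer[kMemberDataBufferSize];
};

// Member data that a subclass wants to keep in the buffer.
struct SMemberData {
    int exampleMember;
};

// C++11: fail the build if the member data does not fit in the buffer.
static_assert(sizeof(SMemberData) <= sizeof(BufferHolder::memberDataBuffer),
              "member data does not fit in the base class buffer");

// Copy the member data out of the raw buffer instead of casting the pointer.
inline SMemberData loadMemberData(const BufferHolder &holder) {
    SMemberData data;
    std::memcpy(&data, holder.memberDataBuffer, sizeof(data));
    return data;
}

// Copy the member data into the raw buffer.
inline void storeMemberData(BufferHolder &holder, const SMemberData &data) {
    std::memcpy(holder.memberDataBuffer, &data, sizeof(data));
}

inline void bufferExample() {
    BufferHolder holder;
    SMemberData data;
    data.exampleMember = 42;
    storeMemberData(holder, data);
    assert(loadMemberData(holder).exampleMember == 42);
}
```

This only suits member data that can safely be copied byte by byte, such as the plain int used here.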
