Notes on plugins and shared libraries

One of the systems I work on has a feature that lets you add new functionality by loading shared libraries (i.e., plugins). This is a pretty conventional technique for programs like Apache, IE, etc., but we add a new wrinkle: there are multiple programs that all need to load the same shared object. For instance, if you want to add a new protocol decoder, you're actually adding it in three places:

  • Into the packet processing stack (the obvious place).
  • New commands into the management shell.
  • A statistics pretty printer into the statistics command line utility.

So, at some level, you actually have three plugins that work together, one for each application. You could build these as three separate .so files, but that makes for a nasty management problem, especially as the number of applications goes up, and of course it makes code re-use difficult. So, ideally, you'd like to have just one .so.

Getting something like this to work properly requires a fair amount of attention to dependencies. (I'm assuming you know how to make a .so using ld -shared.) The simplest problem is that the plugins need to invoke functions that are in the main program. For instance, in the management shell, the plugin needs to call functions in the main program in order to register its new commands. Normally, functions that are linked into the main program aren't exported to dynamically loaded objects, so you need to pass the --export-dynamic argument to ld when you link the main program (gcc's -rdynamic does this for you).
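To make the registration case concrete, here is a minimal sketch of what the two sides might look like. Every name in it (shell_register_command, shell_run_command, plugin_init_shell, the "show-decoder" command, the fixed-size table) is invented for illustration and is not the real system's API. In a real build the first half would be compiled into the management shell and the second half into the .so; they are shown in one file only so the sketch stands on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host API; none of these names come from the real system. */
typedef int (*shell_cmd_fn)(int argc, char **argv);

#define SHELL_MAX_COMMANDS 64

struct shell_cmd {
    const char  *name;
    shell_cmd_fn fn;
};

static struct shell_cmd shell_cmds[SHELL_MAX_COMMANDS];
static size_t shell_cmd_count;

/* Host side: plugins call this to add a command.
 * Returns 0 on success, -1 on bad arguments or a full table.
 * The name is stored by pointer, so it must outlive the registration. */
int shell_register_command(const char *name, shell_cmd_fn fn)
{
    if (name == NULL || fn == NULL || shell_cmd_count >= SHELL_MAX_COMMANDS)
        return -1;
    shell_cmds[shell_cmd_count].name = name;
    shell_cmds[shell_cmd_count].fn = fn;
    shell_cmd_count++;
    return 0;
}

/* Host side: run a registered command; returns -1 if the name is unknown. */
int shell_run_command(const char *name, int argc, char **argv)
{
    assert(name != NULL);
    for (size_t i = 0; i < shell_cmd_count; i++) {
        if (strcmp(shell_cmds[i].name, name) == 0)
            return shell_cmds[i].fn(argc, argv);
    }
    return -1;
}

/* Plugin side (would live in the .so): a made-up command handler that
 * simply reports how many arguments it was given. */
static int cmd_show_decoder(int argc, char **argv)
{
    (void)argv;
    return argc;
}

/* Plugin side: the entry point the shell would look up after loading.
 * This call into the host only resolves at run time if the shell was
 * linked with --export-dynamic (or gcc -rdynamic). */
int plugin_init_shell(void)
{
    return shell_register_command("show-decoder", cmd_show_decoder);
}
```

The point of the sketch is the direction of the call: the plugin calls into the host, which is the opposite of the usual library arrangement and is why the host's symbols have to be exported.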

A secondary problem with this kind of reference is that plugins may depend on functions that aren't called anywhere in the program itself. Plugin authors need to be able to count on the whole API being there. To make this work, you need to force all the API functions to be linked into the application program (so they can be re-exported to the plugins). Basically, you do this by creating a new source file that references each API function by storing a pointer to it in some static variable. On UNIX-type systems, object files are pulled out of a library one at a time, only when something references them, and then the transitive closure of their references gets linked in too. So, what you do is create an API forcing file that points to each API file (at least transitively), and then have that file export some variable that gets referenced in a source file that is actually used in the main program.
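A forcing file along these lines might look roughly like the sketch below. The API functions (api_decoder_register, api_stats_add) and the table names are placeholders made up for the example; in a real tree the API functions would sit in their own object files inside a library rather than in the same file as the table.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for API functions that only plugins call (hypothetical names).
 * They are defined here only to keep the sketch self-contained. */
int api_decoder_register(const char *name)
{
    return name != NULL ? 0 : -1;
}

long api_stats_add(long delta)
{
    static long total;
    total += delta;
    return total;
}

/* ---- the forcing file ---- */

/* Generic function pointer type used only to hold addresses. */
typedef void (*api_fn)(void);

/* Taking each function's address creates a reference the linker has to
 * satisfy, which drags the defining object file into the executable. */
api_fn const plugin_api_force[] = {
    (api_fn)api_decoder_register,
    (api_fn)api_stats_add,
};

const size_t plugin_api_force_count =
    sizeof plugin_api_force / sizeof plugin_api_force[0];

/* ---- somewhere in code the main program really runs ---- */

/* Referencing the exported table from live code is what pulls the
 * forcing file itself into the link. Returns the number of entries. */
size_t plugin_api_check(void)
{
    for (size_t i = 0; i < plugin_api_force_count; i++)
        assert(plugin_api_force[i] != NULL);
    return plugin_api_force_count;
}
```

The table is never used to call anything; it exists only so that the linker sees a reference to each API function.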

The above two techniques take care of making the functions we need available. Now we have to deal with the opposite problem, which is having things work when not everything is available. For instance, if we're writing an SSL processing module, it needs to reference OpenSSL in order to do its thing. But the pretty printer for its stats doesn't need OpenSSL, and we don't want to force the stats system to link with OpenSSL just to print some values.

This is taken care of with two techniques. First, when we dynamically load modules, we call dlopen with the argument RTLD_LAZY. This tells it to resolve functions only when they're first called rather than when the library is loaded. This gets us part but not all of the way there. The second thing is that we need to compile our plugins as position-independent code: code that can run correctly no matter where the library is loaded into memory. With gcc, this is done with the -fPIC flag.
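Here is a rough sketch of the loading side, assuming each application looks up its own entry point by name in the shared .so. The function plugin_load, the plugin_init_fn type, and the idea of a per-application init symbol such as "plugin_init_shell" are assumptions made for the example; only dlopen, dlsym, dlerror, dlclose, and RTLD_LAZY are the real interface.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical convention: every plugin entry point has this signature
 * and returns 0 on success. */
typedef int (*plugin_init_fn)(void);

/* Load the plugin at `path` with lazy binding, then find and call the
 * entry point named `init_symbol`.
 * Returns the dlopen handle on success, or NULL on any failure.
 *
 * Build notes: the plugin would be compiled with something like
 *   gcc -fPIC -shared -o decoder.so decoder.c
 * and the host may need -ldl on older C libraries. */
void *plugin_load(const char *path, const char *init_symbol)
{
    assert(path != NULL && init_symbol != NULL);

    /* RTLD_LAZY: function references are bound on first call, so entry
     * points meant for other applications can stay unresolved. */
    void *handle = dlopen(path, RTLD_LAZY);
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlopen(%s): %s\n", path, err ? err : "unknown error");
        return NULL;
    }

    void *sym = dlsym(handle, init_symbol);
    if (sym == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym(%s): %s\n", init_symbol,
                err ? err : "symbol resolved to NULL");
        dlclose(handle);
        return NULL;
    }

    /* POSIX allows converting the void * from dlsym to a function pointer. */
    plugin_init_fn init = (plugin_init_fn)sym;
    if (init() != 0) {
        dlclose(handle);
        return NULL;
    }
    return handle;
}
```

The management shell might call plugin_load("decoder.so", "plugin_init_shell") while the stats utility asks the same file for a different entry point; the file name here is also just a placeholder.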

This is a simple fact and something I knew about, but when I started writing shared libraries for this system, I hadn't written one in a long time and I forgot -fPIC. All my shared libraries seemed to load and work properly, so I never noticed the problem. Then I started to have problems where, even though I was loading with RTLD_LAZY, dlopen was still trying to resolve all the symbols referenced in the .so. This isn't just a code size issue: remember that the module needs to call functions in all the application programs, but the management shell doesn't provide the functions that are in the packet processing stack, so you get linkage errors. A little debugging turned up what appears to be the problem: if you don't provide -fPIC, you get relocatable code and the dynamic linker automatically relocates it at load time, but in doing so it tries to resolve all (or at least some) of the symbols even if they're never called, which, as I mentioned before, is bad. Presumably, without -fPIC the calls don't go through the lazily bound procedure linkage table, so each call site carries a relocation that has to be processed when the library is loaded.

So, to recap:

  • Forcibly link all the APIs that plugins can depend on.
  • Use --export-dynamic to make those APIs available to the plugins.
  • Compile your plugin code with -fPIC.
  • Use RTLD_LAZY when you load the modules.

Fun, eh?

1 Comment

Does the gcc 4.x __attribute__((dllexport)) work the way you might want it to for exporting symbols from the app? That might remove the need for the hacky force-the-linkage thing. You might also need to use -fkeep-inline-functions to make sure that anything which gets inlined in the main app is still there as a callable exported function too. Using dllexport instead of --export-dynamic will allow you better control over which symbols are exported (assuming that it works right for apps and not just regular DLLs). Otherwise some fun-loving plugin author might start calling functions in your app which they shouldn't be calling, since *everything* is exported.
