plugins

This namespace aggregates modules related to Tuna’s plugin system.

Why have a plugin system?

In a sense, Tuna is both a toolkit and a framework. When used as a library, the tools, models and structural pieces of Tuna can be used by external programs, just like any other Python module. As a framework, however, these different parts are connected in “pipelines”, so that specific data processing tasks can be constructed from these independent parts.

It is straightforward to add a new tool to Tuna: it can be encapsulated in a Python module .py file, and placed in the tools directory. If the module has the same name, input and output variables, then it will substitute the original module, and pipelines that used the original version should work with the edited version.

However, when it comes to creating new algorithms, the existing pipelines will not know of this new tool, so a system is needed to take care of that.

An analogous situation motivated the development of the plugin system of [CosmoSIS], whose authors weighed the benefits and costs of a modular design for their field, cosmological parameter estimation. Their cost-benefit analysis is summarized below.

Benefits:

  • replacement, or the ability to change parts of the code easily;
  • verifiability, since it is easier to test, read and understand smaller pieces of code;
  • debugging, since the modular architecture imposes a clear structure on its modules;
  • consistency, since derived physics parameters are shared through the modular framework, instead of being re-coded in each module;
  • languages, in the sense that different modules can be written in different programming languages;
  • legacy, by providing a structure where existing but independent code can be reused in a larger workflow;
  • publishing, in the sense that innovative code regarding one module can be readily combined with innovative code from other developers, as long as all respect the framework modularization rules;
  • samplers, which correspond to Tuna’s pipelines, can be substituted more easily, so the feasibility of different approaches to solve the problem at hand can be more readily investigated.

Costs:

  • overheads, corresponding to the “boilerplate code” that must run to effectively launch the module as a part of the framework;
  • interpolation, since the data served to a module may not be sampled at the points it would require, because of requirements of another module;
  • speed, related to the overhead code being run;
  • consistency, since misuse of the code can happen, especially when compared to a monolithic code base written specifically to implement a scientific solution;
  • temptation, since it is simple to add steps to a pipeline, this might happen often, and each “excessive” module introduces additional uncertainty and errors;
  • legacy, since existing code is not yet adapted to the specific modular framework, the development of adapted versions will require some effort.

It is our opinion that the benefits far outweigh the costs associated with a modular approach. We also expect that the costs of developing the framework, and of adapting existing code to it, will become smaller in the future, as new code is produced modularly from the start, and as the expertise of adapting code spreads to a wider community of developers.

[CosmoSIS] Zuntz et al., CosmoSIS: Modular cosmological parameter estimation. Available at http://arxiv.org/abs/1409.3409. Retrieved on 2015-09-17.

Tuna’s plugin system

Plugins fulfill two important goals in Tuna:

  1. They facilitate extension of Tuna, by providing a mechanism for writing compatible functionality.
  2. They allow modification of the framework’s default behaviour, by allowing default plugins to be substituted with customized ones.

Technologically, such goals could be fulfilled in many ways. However, since plugins are meant to be the easiest way for some functionality to be included in Tuna, the specific system for plugins was designed with simplicity in mind.

It was decided to design the system around Python function calls, because users are expected to have, at the very least, mastery of structured programming. A plugin system that consists essentially of some restrictions on how to write functions should therefore not pose a syntactical problem for these users.

This simple design has its limitations: for example, previous work on parallelism in Tuna was negated by the refactoring of its tools into plugins. The alternative designs that were considered were much more complex and relied on higher abstractions (classes, APIs, RPCs). In the end, the “function signature” plugin system was chosen because it seems the most likely to be properly understood by the users.

How it works

Consider the following two functions:

def function_1 ( argument_1, argument_2 ):
    ...  # body omitted

def function_2 ( argument_3, argument_4 ):
    ...  # body omitted

Since Python does not have type enforcement for its function signatures, there is no way to know if these two functions are equivalent in their signatures: we do not know if argument_1 is an int and argument_3 is a str or something else. We do not know the return types for either function!

However, if we are to create plugins out of functions, and a plugin must be replaceable by an equivalent plugin, we need a way to specify this equivalency. This is done by adopting Python type hinting, which was standardized in Python 3.5 (PEP 484) and builds on the function annotation syntax available since Python 3.0 (PEP 3107). Essentially, it is a syntax to declare the types of the arguments and of the return value of a function.

These are two functions, including type hinting:

import numpy  # needed so that the annotations below can be resolved

def function_1 ( argument_1 : int, argument_2 : float ) -> numpy.ndarray:
    ...  # body omitted

def function_2 ( argument_3 : float, argument_4 : int ) -> numpy.ndarray:
    ...  # body omitted
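Annotations like these are stored on the function object, so they can be read back at run time. The following is a minimal sketch using the standard library inspect module; it is not taken from Tuna's code, and it uses str as the return type (instead of numpy.ndarray) only so that the example has no external dependency.

```python
import inspect

def function_1(argument_1: int, argument_2: float) -> str:
    """Toy function; only its signature is of interest here."""
    return "{} {}".format(argument_1, argument_2)

signature = inspect.signature(function_1)

# Pair each parameter name with its annotated type, in declaration order.
parameter_types = [(name, parameter.annotation)
                   for name, parameter in signature.parameters.items()]

# The annotated return type is stored separately from the parameters.
return_type = signature.return_annotation
```

Here parameter_types holds the (name, type) pairs for argument_1 and argument_2, and return_type is str. A parameter without a hint would carry inspect.Parameter.empty as its annotation.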

Now, it is possible to decide on the equivalency of two functions - from a purely syntactical point of view. However, since Tuna is a framework focused on a specific domain - reduction of data from Fabry-Pérot spectrometers - the semantics of the function should also be taken into consideration.

This is accomplished in two ways. First, argument names and ordering must be equal, so in our example the two functions would not be considered equivalent. Second, functions must be registered as plugins, under a certain key, and pipelines will use the plugin currently associated with a key.
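The key-based lookup can be pictured with a plain dictionary. This is only a sketch of the idea, not Tuna's implementation: the names PLUGINS, register_plugin and run_pipeline, and the key "Scale", are hypothetical and chosen for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical registry (not Tuna's own): key -> currently registered function.
PLUGINS: Dict[str, Callable] = {}

def register_plugin(key: str, function: Callable) -> None:
    """Associate a function with a key, replacing any previous association."""
    PLUGINS[key] = function

def run_pipeline(data: List[float]) -> List[float]:
    """Toy pipeline: it knows only the key, never a concrete function."""
    scale = PLUGINS["Scale"]  # looked up at call time
    return scale(data)

def double(data: List[float]) -> List[float]:
    return [2.0 * value for value in data]

def halve(data: List[float]) -> List[float]:
    return [value / 2.0 for value in data]

register_plugin("Scale", double)
first_result = run_pipeline([1.0, 2.0])

register_plugin("Scale", halve)  # same key, different plugin
second_result = run_pipeline([1.0, 2.0])
```

Because the pipeline resolves the key each time it runs, swapping the registered function changes the pipeline's behaviour without touching the pipeline itself.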

When a new key / function plugin is registered, the user is free to use whatever function they want, since there is no equivalency to be met. However, to substitute an existing key / function plugin, the new function must be equivalent to the one currently registered under that key.

The equivalency rules are:

  1. Functions must be fully annotated (all parameters and the function return type must be specified).
  2. Argument names and their order of appearance in the function signature must be equal, and obey alphanumerical order.
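As an illustration, the two rules could be checked with the standard library inspect module. This is a sketch under stated assumptions, not Tuna's actual check: the function names are hypothetical, "alphanumerical order" is read here as the plain sorted order of the argument names, and the comparison of annotated types is an extra step that the rules above do not state explicitly.

```python
import inspect
from typing import Callable

def is_fully_annotated(function: Callable) -> bool:
    """Rule 1: every parameter and the return value carry a type hint."""
    signature = inspect.signature(function)
    if signature.return_annotation is inspect.Signature.empty:
        return False
    return all(parameter.annotation is not inspect.Parameter.empty
               for parameter in signature.parameters.values())

def are_equivalent(current: Callable, candidate: Callable) -> bool:
    """Hypothetical check of whether candidate may replace current."""
    if not (is_fully_annotated(current) and is_fully_annotated(candidate)):
        return False
    current_signature = inspect.signature(current)
    candidate_signature = inspect.signature(candidate)
    names = list(candidate_signature.parameters)
    # Rule 2: same names in the same order ...
    if names != list(current_signature.parameters):
        return False
    # ... and (assumed reading) the names appear in sorted order.
    if names != sorted(names):
        return False
    # Assumption: the annotated types must match as well.
    current_types = [p.annotation for p in current_signature.parameters.values()]
    candidate_types = [p.annotation for p in candidate_signature.parameters.values()]
    return (current_types == candidate_types
            and current_signature.return_annotation
            == candidate_signature.return_annotation)

def function_1(argument_1: int, argument_2: float) -> str:
    return "original"

def function_2(argument_3: float, argument_4: int) -> str:
    return "different names"

def replacement(argument_1: int, argument_2: float) -> str:
    return "replacement"

same = are_equivalent(function_1, replacement)
different = are_equivalent(function_1, function_2)
```

With these toy functions, replacement is accepted as a substitute for function_1, while function_2 is rejected because its argument names differ.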

During development, a user might consult the current function signature associated with a given key by the following command:

tuna.plugins.registry ( 'key' )

And an output similar to this should be displayed:

def function ( argument_1 : int ) -> float

With this information, it should be simple for a user to write new or replacement plugins.
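A signature display of this kind can be approximated with the standard library alone. The helper below is hypothetical and is not the implementation of tuna.plugins.registry; it only shows how such a line could be derived from a function object.

```python
import inspect
from typing import Callable

def describe(function: Callable) -> str:
    """Hypothetical helper: render a function's name and annotated signature."""
    return "def {}{}".format(function.__name__, inspect.signature(function))

def function(argument_1: int) -> float:
    return float(argument_1)

description = describe(function)
print(description)
```

The spacing of the result follows the formatting used by inspect, so it differs slightly from the spaced style of the output shown above.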