API Reference

Node Types

mdf.varnode([name] [, default] [, category])

Creates a simple MDFNode that can have a value assigned to it in a context.

It may also take a default value that will be used if no specific value is set for the node in a context.

A varnode may be explicitly named using the name argument, or if left as None the variable name the node is being assigned to will be used.

my_varnode = varnode(default=100)
mdf.evalnode(func [, filter] [, category])

Decorator for creating an MDFNode whose value is determined by calling the function func.

func should be a function or generator that takes no arguments and returns or yields the current value of the node.

If func is a generator instead of a function it will be advanced as the now() node is advanced. This can be used to calculate accumulated values and maintain internal state over evaluations:

@evalnode
def some_function():
    # Set initial value of 'accum' to 0
    accum = 0

    # yield the initial value of 'accum' (0 in this case)
    yield accum

    while True:
        accum += 1
        # yield the updated value of 'accum' on each evaluation
        yield accum

Yield essentially bookmarks the current execution point, and returns the supplied value (accum in the above example). When the node is evaluated again, execution will resume at the bookmark and continue until the next yield statement is encountered. By using an infinite while loop the node can be evaluated any number of times.
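This bookmarking is ordinary Python generator behaviour and can be demonstrated without mdf at all; each call to next() resumes at the last yield:

```python
def some_function():
    # the same generator as above, driven by hand rather than by mdf
    accum = 0
    yield accum
    while True:
        accum += 1
        yield accum

gen = some_function()
values = [next(gen) for _ in range(4)]  # values == [0, 1, 2, 3]
```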

The above example can actually be shortened to:

@evalnode
def some_function():
    # Set initial value of 'accum' to 0
    accum = 0

    while True:
        yield accum
        accum += 1

In this case, yield will first return the initial value of ‘accum’. On subsequent evaluations the increment step will also be executed before yielding, so the node produces the updated value.

filter may be used when func is a generator to prevent the node from being advanced on every timestep. If supplied, it should be a function or node that returns True if the node should be advanced for the current timestep and False otherwise.
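A plain-Python sketch of the filter semantics (the helper names here are illustrative, not part of the mdf API): the generator only advances on timesteps where the filter is True, and the node keeps its previous value otherwise:

```python
def advance_with_filter(gen, timesteps, filter_func):
    # the node keeps its previous value on filtered-out timesteps
    value = next(gen)  # initial value
    results = []
    for t in timesteps:
        if filter_func(t):
            value = next(gen)
        results.append(value)
    return results

def counter():
    # stand-in for a generator node
    n = 0
    while True:
        yield n
        n += 1

# advance only on even timesteps
advance_with_filter(counter(), [1, 2, 3, 4], lambda t: t % 2 == 0)  # [0, 1, 1, 2]
```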

mdf.queuenode(func [, size] [, filter] [, category])

Decorator for creating an MDFNode that accumulates values in a collections.deque each time the context’s date is advanced.

The values that are accumulated are the results of the function func. func is a node function and takes no arguments.

If size is specified the queue will grow to a maximum of that size and then values will be dropped off the queue (FIFO).

size may either be a value or a callable (i.e. a function or a node):

@queuenode(size=10)
def node():
    return x

or:

# could be an evalnode also
queue_size = varnode("queue_size", 10)

@queuenode(size=queue_size)
def node():
    return x

or using the nodetype method syntax (see nodetype_method_syntax):

@evalnode
def some_value():
    return ...

@evalnode
def node():
    return some_value.queue(size=5)
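The bounded queue behaviour described above is the same as a collections.deque with maxlen set; once the queue is full, the oldest values are dropped:

```python
from collections import deque

q = deque(maxlen=3)      # equivalent to size=3
for value in range(5):   # one append per timestep
    q.append(value)

list(q)  # [2, 3, 4] - the two oldest values have been dropped
```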
mdf.delaynode(func [, periods] [, initial_value] [, lazy] [, filter] [, category])

Decorator for creating an MDFNode that delays values for a number of periods corresponding to each time the context’s date is advanced.

The values that are delayed are the results of the function func. func is a node function and takes no arguments.

periods is the number of timesteps to delay the value by.

initial_value is the value of the node to be used before the specified number of periods have elapsed.

periods, initial_value and filter can either be values or callable objects (e.g. a node or a function):

@delaynode(periods=5)
def node():
    return x

or:

# could be an evalnode also
periods = varnode("periods", 5)

@delaynode(periods=periods)
def node():
    return x
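The delay semantics can be sketched in plain Python with a fixed-length buffer (illustrative only, not the mdf implementation):

```python
from collections import deque

def delayed(values, periods, initial_value):
    # the node's value is the input from 'periods' timesteps ago;
    # initial_value is returned until that many steps have elapsed
    buf = deque([initial_value] * periods)
    out = []
    for v in values:
        buf.append(v)
        out.append(buf.popleft())
    return out

delayed([1, 2, 3, 4], periods=2, initial_value=0)  # [0, 0, 1, 2]
```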

If lazy is True the node value is calculated after any calling nodes have returned. This allows nodes to call delayed versions of themselves without ending up in infinite recursion.

The default for lazy is False as in most cases it’s not necessary and can cause problems because the dependencies aren’t all discovered when the node is first evaluated.

e.g.:

@delaynode(periods=10)
def node():
    return some_value

or using the nodetype method syntax (see nodetype_method_syntax):

@evalnode
def some_value():
    return ...

@evalnode
def node():
    return some_value.delay(periods=5)
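The kind of self-referential recurrence that lazy=True makes possible can be sketched in plain Python: a node computes its value from a one-period-delayed copy of itself, which terminates because the delayed value is only captured after the calling node has returned. Here a cumulative sum built that way is unrolled into a loop (illustrative only):

```python
def cumsum_via_lazy_delay(xs):
    # node()       -> delayed_node() + x
    # delayed_node -> node delayed by 1 period, initial_value=0, lazy=True
    prev = 0.0  # the delayed node's initial_value
    out = []
    for x in xs:
        current = prev + x   # node() reads the previous value of itself
        out.append(current)
        prev = current       # the lazy delay captures this after node() returns
    return out

cumsum_via_lazy_delay([1.0, 2.0, 3.0])  # [1.0, 3.0, 6.0]
```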
mdf.nansumnode(func [, filter] [, category])

Decorator that creates an MDFNode that maintains the nansum of the result of func.

Each time the context’s date is advanced the value of this node is calculated as the nansum of the previous value and the new value returned by func.

e.g.:

@nansumnode
def node():
    return some_value

or using the nodetype method syntax (see nodetype_method_syntax):

@evalnode
def some_value():
    return ...

@evalnode
def node():
    return some_value.nansum()
mdf.cumprodnode(func [, filter] [, category])

Decorator that creates an MDFNode that maintains the cumulative product of the result of func.

Each time the context’s date is advanced the value of this node is calculated as the previous value multiplied by the new value returned by func.

e.g.:

@cumprodnode
def node():
    return some_value

or using the nodetype method syntax (see nodetype_method_syntax):

@evalnode
def some_value():
    return ...

@evalnode
def node():
    return some_value.cumprod()
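In plain Python the accumulation amounts to:

```python
def cumprod(values):
    # each timestep multiplies the running product by the new value
    acc, out = 1.0, []
    for v in values:
        acc *= v
        out.append(acc)
    return out

cumprod([2.0, 3.0, 4.0])  # [2.0, 6.0, 24.0]
```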

TODO: That node needs a test for the argument skipna, since it is not entirely clear what it should do if the first value is na. It would be nice to be able to specify an initial value.

mdf.ffillnode(func[, initial_value])

Decorator that creates an MDFNode that returns the current result of the decorated function, forward filled from the previous value where the current value is NaN.

The decorated function may return a float, pandas Series or numpy array.

e.g.:

@ffillnode
def node():
    return some_value

or using the nodetype method syntax (see nodetype_method_syntax):

@evalnode
def some_value():
    return ...

@evalnode
def node():
    return some_value.ffill()
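For a scalar series the forward-fill logic amounts to the following plain-Python sketch (mdf applies the same idea elementwise to Series and arrays):

```python
import math

def ffill(values, initial_value=float("nan")):
    # NaN values are replaced by the last non-NaN value seen so far
    out, last = [], initial_value
    for v in values:
        if math.isnan(v):
            v = last  # forward fill from the previous value
        out.append(v)
        last = v
    return out

ffill([1.0, float("nan"), 3.0])  # [1.0, 1.0, 3.0]
```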
mdf.rowiternode(func [, index_node=now] [, missing_value=np.nan] [, delay] [, ffill] [, filter] [, category])

Decorator that creates an MDFNode that returns the current row or item of a pandas DataFrame, WidePanel or Series returned by the decorated function.

What row is considered current depends on the index_node parameter, which by default is now.

missing_value may be specified as the value to use when the index_node isn’t included in the data’s index. The default is NaN.

delay can be a number of timesteps to delay the index_node by, effectively shifting the data.

ffill causes the value to get forward filled if True, default is False.

e.g.:

@rowiternode
def datarow_node():
    # construct a dataframe indexed by date
    return a_dataframe

@evalnode
def another_node():
    # the rowiternode returns the row from the dataframe
    # for the current date 'now'
    current_row = datarow_node()

or using the nodetype method syntax (see nodetype_method_syntax):

@evalnode
def dataframe_node():
    # construct a dataframe indexed by date
    return a_dataframe

@evalnode
def another_node():
    # get the row from dataframe_node for the current_date 'now'
    current_row = dataframe_node.rowiter()
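Conceptually the node is an index lookup with a fallback. A plain-Python sketch, with a dict standing in for a date-indexed Series (the helper name is illustrative):

```python
def current_row(data, index_value, missing_value=float("nan")):
    # return the row of 'data' at 'index_value', or missing_value
    # when the index value isn't present in the data's index
    return data.get(index_value, missing_value)

data = {1: 10.0, 3: 30.0}   # stand-in for a Series indexed by date
current_row(data, 1)        # 10.0
current_row(data, 2, 0.0)   # 0.0 - the missing_value
```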
mdf.returnsnode(func [, filter] [, category])

Decorator that creates an MDFNode that returns the returns of a price series.

NaN prices are filled forward. If there is a NaN price at the beginning of the series, we set the return to zero. The decorated function may return a float, pandas Series or numpy array.

e.g.:

@returnsnode
def node():
    return some_price

or using the nodetype method syntax (see nodetype_method_syntax):

@evalnode
def some_price():
    return ...

@evalnode
def node():
    return some_price.returns()

The value at any timestep is the return for that timestep, so the method would ideally be called ‘return’, but that’s a Python keyword and so ‘returns’ is used instead.
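The fill-forward-then-compute-returns behaviour can be sketched in plain Python for a scalar price series (illustrative only):

```python
import math

def step_returns(prices):
    # NaN prices are filled forward; a NaN at the start gives a zero return
    out, prev = [], float("nan")
    for p in prices:
        if math.isnan(p):
            p = prev  # forward fill the price
        out.append(0.0 if math.isnan(prev) or math.isnan(p) else p / prev - 1.0)
        prev = p
    return out

step_returns([100.0, 110.0, float("nan"), 121.0])  # approx [0.0, 0.1, 0.0, 0.1]
```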

mdf.applynode(func [, args=()] [, kwargs={}] [, category])

Return a new mdf node that applies func to the value of the node that is passed in. Extra args and kwargs can be passed in as values or nodes.

Unlike most other node types this shouldn’t be used as a decorator, but instead should only be used via the method syntax for node types, (see nodetype_method_syntax) e.g.:

A_plus_B_node = A.applynode(operator.add, args=(B,))
mdf.lookaheadnode(func, periods [, offset=pa.datetools.BDay()] [, filter] [, category])

Node type that creates an MDFNode that returns a pandas Series of values of the underlying node for a sequence of dates in the future.

Unlike most other node types this shouldn’t be used as a decorator, but instead should only be used via the method syntax for node types, (see nodetype_method_syntax) e.g.:

future_values = some_node.lookahead(periods=10)

This would get the next 10 values of some_node after the current date. Once evaluated it won’t be re-evaluated as time moves forwards; it’s always the first set of future observations. It is intended to be used sparingly for seeding moving average calculations or other calculations that need some initial value based on the first few samples of another node.

The dates start at the current context date (i.e. now()) and are incremented by the optional argument offset, which defaults to business days (see pandas.datetools.BDay).

Parameters:
  • periods (int) – the total number of observations to collect, excluding any that are ignored due to any filter being used.
  • offset – date offset object (e.g. datetime timedelta or pandas date offset) to use to increment the date for each sample point.
  • filter – optional node that if specified should evaluate to True if an observation is to be included, or False otherwise.

Node Factory Functions

mdf.datanode([name=None,] data [, index_node] [, missing_value] [, delay] [, ffill] [, filter] [, category])

Return a new mdf node for iterating over a dataframe, panel or series.

data is indexed by another node, index_node (default is now()), which can be any node that evaluates to a value that can be used to index into data.

If the index_node evaluates to a value that is not present in the index of the data then missing_value is returned.

missing_value can be a scalar, in which case it will be converted to the same row format used by the data object with the same value for all items.

delay can be a number of timesteps to delay the index_node by, effectively shifting the data.

ffill causes the value to get forward filled if True, default is False.

data may either be a data object itself (DataFrame, WidePanel or Series) or a node that evaluates to one of those types.

e.g.:

df = pa.DataFrame({"A" : range(100)}, index=date_range)
df_node = datanode(data=df)

ctx[df_node] # returns the row from df where df == ctx[now]

A datanode may be explicitly named using the name argument, or if left as None the variable name the node is being assigned to will be used.

mdf.filternode([name=None,] data [, index_node] [, delay] [, filter] [, category])

Return a new mdf node for using as a filter for other nodes based on the index of the data object passed in (DataFrame, Series or WidePanel).

The node value is True when the index_node (default=now) is in the index of the data, and False otherwise.

This can be used to easily filter other nodes so that they operate at the same frequency as the underlying data.

delay can be a number of timesteps to delay the index_node by, effectively shifting the data.

A filternode may be explicitly named using the name argument, or if left as None the variable name the node is being assigned to will be used.
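The node's value is effectively a membership test against the data's index, as in this plain-Python sketch:

```python
import datetime as dt

# dates present in the data's index
data_index = {dt.date(2020, 1, 1), dt.date(2020, 1, 3)}

def filter_value(now):
    # True when the current date is in the data's index, False otherwise
    return now in data_index

filter_value(dt.date(2020, 1, 1))  # True
filter_value(dt.date(2020, 1, 2))  # False
```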

Custom Node Types

mdf.nodetype(func)

decorator for creating a custom node type:

#
# create a new node type 'new_node_type'
#
@nodetype
def new_node_type(value, fast, slow):
    return (value + fast) * slow

#
# use the new type to create a node
#
@new_node_type(fast=1, slow=10)
def my_node():
    return some_value

# ctx[my_node] returns new_node_type(value=my_node(), fast=1, slow=10)

The node type function takes the value of the decorated node and any other keyword arguments that may be supplied when the node is created.

The node type function may be a plain function, in which case it is simply called for every evaluation of the node, or it may be a co-routine in which case it is sent the new value for each iteration:

@nodetype
def nansumnode(value):
    accum = 0.
    while True:
        accum = np.nansum([value, accum])
        value = yield accum

@nansumnode
def my_nansum_node():
    return some_value
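The co-routine protocol can be exercised with a plain Python generator: mdf calls the function with the first value and then sends each subsequent value in. A sketch using math.isnan in place of np.nansum:

```python
import math

def nansum_coro(value):
    # same shape as the nansumnode co-routine above; mdf supplies the
    # first value as the argument and .send()s the rest
    accum = 0.0
    while True:
        if not math.isnan(value):
            accum += value
        value = yield accum

c = nansum_coro(1.0)
next(c)               # 1.0  - the first value
c.send(float("nan"))  # 1.0  - nan is ignored
c.send(2.0)           # 3.0
```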

The kwargs passed to the node decorator may be values (as shown above) or nodes which will be evaluated before the node type function is called.

Nodes defined using the @nodetype decorator may be applied to classmethods as well as functions and also support the standard node kwargs ‘filter’ and ‘category’.

Node types may also be used to add methods to the MDFNode class (See nodetype_method_syntax):

@nodetype(method="my_nodetype_method")
def my_nodetype(value, scale=1):
    return value * scale

@evalnode
def x():
    return ...

@my_nodetype(scale=10)
def y():
    return x()

# can be re-written as:
y = x.my_nodetype_method(scale=10)

Pre-defined Nodes

mdf.now()

Pre-defined node present in every context that always evaluates to the date set on the context.

See MDFContext.get_date() and MDFContext.set_date().

Functions

mdf.shift(node, target [, values] [, shift_sets])

This function is for use inside node functions.

Applies shifts to the current context for each shift specified and returns the value of ‘node’ with each of the shifts applied.

If target and values are specified ‘target’ is a node to apply a series of shifts to, specified by ‘values’.

If shift_sets is specified, it should be a list of dictionaries mapping nodes to values, each one specifying a shift.

If the same shift set dictionaries are used several times ShiftSet objects may be used instead which will be slightly faster. See make_shift_set().

Returns a list of the results of evaluating node for each of the shifted contexts in the same order as values or shift_sets.

See MDFContext.shift() for more details about shifted contexts.

mdf.run(date_range [, callbacks=[]] [, values={}] [, shifts=None] [, filter=None] [, ctx=None] [, num_processes=0])

creates a context and iterates through the dates in the date range updating the context and calling the callbacks for each date.

If the context needs some initial values set they can be passed in the values dict or as kwargs.

For running the same calculation but with different inputs shifts can be set to a list of dictionaries of (node -> value) shifts.

If shifts is not None and num_processes is greater than 0 then that many child processes will be spawned and the shifts will be processed in parallel.

Any time-dependent nodes are reset before starting by setting the context’s date to datetime.min (after applying time zone information if available).

mdf.plot(date_range, nodes [, labels=None] [, values={}] [, filter=None] [, ctx=None])

evaluates a list of nodes for each date in date_range and plots the results using matplotlib.

mdf.build_dataframe(date_range, nodes [, labels=None] [, values={}] [, filter=None] [, ctx=None])

evaluates a list of nodes for each date in date_range and returns a dataframe of results

mdf.get_final_values(date_range, nodes [, labels=None] [, values={}] [, filter=None] [, ctx=None])

evaluates a list of nodes for each date in date_range and returns a list of final values in the same order as nodes.

mdf.scenario(date_range, result_node, x_node, x_shifts, y_node, y_shifts [, values={}] [, filter=None] [, ctx=None] [, dtype=float])

evaluates a single result_node for each date in date_range and gets its final value for each shift in x_shifts and y_shifts.

x_shifts and y_shifts are values for x_node and y_node respectively.

result_node should evaluate to a single float; the result is a 2d numpy array.

mdf.plot_surface(date_range, result_node, x_node, x_shifts, y_node, y_shifts [, values={}] [, filter=None] [, ctx=None] [, dtype=float])

evaluates a single result_node for each date in date_range and gets its final value for each shift in x_shifts and y_shifts.

x_shifts and y_shifts are values for x_node and y_node respectively.

result_node should evaluate to a single float.

The results are plotted as a 3d graph and returned as a 2d numpy array.

mdf.make_shift_set(shift_set_dict)

Return a ‘ShiftSet’ object that encapsulates the information required to get a shifted context.

This can be used to pass to the shift() function instead of a dictionary for better performance when regularly shifting by the same thing.

Classes

MDFContext

class mdf.MDFContext

Nodes on their own don’t have values, they are just the things that can calculate values.

Nodes only have values in a context.

Contexts can be thought of as containers for values of nodes.

__init__(now)

Initializes a new context with now() set to now (datetime).

save(filename, bat_filename=None, start_date=None, end_date=None)

Write the context and its state, including all shifted contexts and node states, to a binary file.

The resulting file can be re-loaded using MDFContext.load().

If filename ends with .zip, .bz2 or .gz the data will be compressed. The MDFContext.load() method is able to load these compressed files.

Parameters:
  • filename – filename of the output file, or an open file handle.
  • bat_filename – optional filename of a .bat file to write that starts the mdf viewer with the saved file.
  • start_date – datetime used as an optional argument to start the mdf viewer in the .bat file.
  • end_date – datetime used as an optional argument to start the mdf viewer in the .bat file.
static load(filename)

Load a context from a file and return a new MDFContext with the same state as the context that was saved (i.e. all the same shifted contexts and node values).

Parameters: filename – filename of the file to load, or an open file handle.
get_date()

returns the current date set on this context.

This is equivalent to getting the value of the now() node in this context.

set_date(date)

sets the current date on this context.

This updates the value of the now() node in this context and also calls the update functions for any previously evaluated time-dependent nodes in this context.

get_value(node)

returns the value of the node in this context

set_value(node, value)

Sets a value of a node in the context.

set_override(node, value)

Sets an override for a node in this context.

__getitem__(node)

Gets a node value in the context.

See get_value().

__setitem__(node, value)

Sets a node value in the context. If value is an MDFNode it is applied as an override.

See set_value() and set_override().

shift(shift_set, cache_context=True)

create a new context linked to this context, but with nodes set to specific values.

shift_set is a dictionary of nodes to values. The returned shifted context will have each node in the dictionary set to its corresponding value.

If the same shift set is used several times a ShiftSet object may be used instead of a dictionary which will be slightly faster. See make_shift_set().

If a value is an MDFNode then it will be applied as an override to the target node.

If a context has already been created with this shift that existing context is returned instead.

Shifted contexts are read-only.

If cache_context is True the shifted context will be cached, and if shift() is called again with the same shift set the cached context will be returned. The exception to this is when now() is shifted, in which case a new shifted context is returned each time.

to_dot(filename=None, nodes=None, colors={}, all_contexts=True, max_depth=None, rankdir="LR")

constructs a .dot graph from the nodes that have a value in this context and writes it to filename if not None.

colors can be used to override any of the colors used to color the graph. the defaults are:

defaults = {
    "node"      :   "white",
    "nownode"   :   "darkorchid1",
    "queuenode" :   "darksalmon",
    "varnode"   :   "deepskyblue",
    "shiftnode" :   "gold",
    "headnode"  :   "olivedrab3",
    "edge"      :   "black",
    "nowedge"   :   "darkorchid4",
    "varedge"   :   "deepskyblue4",
    "shiftedge" :   "gold4",
    "context"   :   "grey90",
    "module0"   :   "grey81",
    "module1"   :   "grey72"
}

If all_contexts is True it will look for head nodes in all contexts, otherwise only this context will be used.

If max_depth is not None the graph will be truncated so that all nodes are at most max_depth levels deep from the root node(s).

rankdir sets how the graph is ordered when rendered. Possible values are:

  • “TB” : top to bottom
  • “LR” : left to right
  • “BT” : bottom to top
  • “RL” : right to left

returns a pydot.Graph object

MDFNode

class mdf.MDFNode

Nodes should be viewed as opaque objects and not instantiated through anything other than the decorators provided.

They are callable objects and should be called from inside other node functions.

When called they are evaluated in the current context and their value is returned. If called multiple times a cached value is returned, unless the node has been marked as requiring re-evaluation by one of its dependencies changing.

MDFEvalNode

class mdf.MDFEvalNode

Sub-class of MDFNode for nodes that are evaluated rather than plain value storing nodes.

This is an opaque type and shouldn’t be used to construct nodes. Instead use the node type decorators.

CSVWriter

class mdf.CSVWriter(fh, nodes, columns=None)

callable object that appends values to a csv file. For use with mdf.run().

__init__(fh, nodes[, columns=None])

Writes node values to a csv file for each date.

‘fh’ may be a file handle, or a filename, or a node.

If fh is a node it will be evaluated for each context used and is expected to evaluate to the filename or file handle to write the results to.

DataFrameBuilder

class mdf.DataFrameBuilder(nodes, contexts=None, dtype=object, sparse_fill_value=None, filter=False, start_date=None)

__init__(nodes[, labels=None])

Constructs a new DataFrameBuilder.

dtype and sparse_fill_value can be supplied as hints to the data type that will be constructed and whether or not to try and create a sparse data frame.

If filter is True and the nodes are filtered then only values where all the filters are True will be returned.

NB. the labels parameter is currently not supported

clear()

get_dataframe([ctx=None])

dataframes

all dataframes created by this builder (one per context)

dataframe

plot([show=True])

plots all collected dataframes and shows them if show is True.

FinalValueCollector

class mdf.FinalValueCollector(nodes)

callable object that collects the final values for a set of nodes. For use with mdf.run().

__init__(nodes)

clear()

clears all previously collected values

get_values([ctx=None])

returns the collected values for a context

get_dict([ctx=None])

returns the collected values as a dict keyed by the nodes

values

returns the values for the last context