featurehub package

Submodules

featurehub.util module

featurehub.util.compute_dataset_hash(dataset)[source]

Return hash value of dataset contents.

Uses the xxhash.xxh64 hash algorithm for performance. This algorithm is not cryptographically secure.

dataset : dict mapping str to pd.DataFrame

featurehub.util.get_function(source)[source]

Return a function from given source code.

This function is usually called on source code that was in turn produced by get_source. Note that the source code produced by get_source includes the source for the top-level function as well as any other local functions it calls. Here, we return the top-level function directly.

source : str or bytes
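
As a rough illustration of the idea (not the package's actual implementation), source that contains a top-level function and its helpers can be executed into a fresh namespace, after which the top-level function is looked up by name. The helper name `function_from_source` and its explicit `name` argument are assumptions made for this sketch; get_function itself takes only `source`.

```python
def function_from_source(source, name):
    """Execute `source` in a fresh namespace and return the function `name`.

    Hypothetical helper for illustration; accepts str or bytes.
    """
    if isinstance(source, bytes):
        source = source.decode("utf-8")
    namespace = {}
    exec(source, namespace)  # defines the helpers and the top-level function
    return namespace[name]


source = b"""
def helper(x):
    return x + 1

def top(x):
    return 2 * helper(x)
"""

top = function_from_source(source, "top")
print(top(3))
```

Because the helpers live in the same namespace as the top-level function, calls from `top` to `helper` resolve without any further setup.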

featurehub.util.get_function2(source)[source]

Return a function from given source code.

This function is usually called on source code that was in turn produced by get_source. It differs from get_function in that it writes the source code to a file and then imports that file as a new module.

Note that the source code produced by get_source includes the source for the top-level function as well as any other local functions it calls. Here, we return the top-level function directly.

Caveat: As currently implemented, this does not make it possible to re-extract the source from the returned function.

source : str or bytes
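
The file-based approach described above might look roughly like the following sketch, which uses only the standard library. The helper name `function_from_module_file`, the module name `extracted_feature`, and the explicit `name` argument are all hypothetical; the actual get_function2 may differ in each of these details.

```python
import importlib.util
import os
import tempfile


def function_from_module_file(source, name):
    """Write `source` to a file, import it as a module, return function `name`.

    Hypothetical sketch. The temporary directory is left on disk so that the
    module's source file continues to exist after this call returns.
    """
    if isinstance(source, bytes):
        source = source.decode("utf-8")
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "extracted_feature.py")
    with open(path, "w", encoding="utf-8") as f:
        f.write(source)
    # Build a module object from the file and execute it.
    spec = importlib.util.spec_from_file_location("extracted_feature", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, name)


square = function_from_module_file("def square(x):\n    return x * x\n", "square")
print(square(4))
```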

featurehub.util.get_source(function)[source]

Extract the source code from a given function.

Recursively extracts the source code for all local functions called by given function. The resulting source code is encoded in utf-8.

Limitations: get_source cannot be used on a function defined interactively in the standard Python terminal. Functions defined interactively in IPython are still okay.

function : function
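
A minimal sketch of recursive extraction, assuming that "local functions" means plain Python functions defined in the same module as the given function. The name `extract_source` and the ordering (helpers first) are assumptions of this sketch, not a description of how get_source is implemented. The demo defines its functions in a temporary file because `inspect.getsource` needs a source file to read from.

```python
import importlib.util
import inspect
import os
import tempfile
import types


def extract_source(function, _seen=None):
    """Return UTF-8 encoded source of `function` and of the same-module
    functions it references, helpers first. Hypothetical sketch."""
    if _seen is None:
        _seen = set()
    _seen.add(function.__name__)
    pieces = []
    # co_names holds the global and attribute names used in the function body.
    for name in function.__code__.co_names:
        candidate = function.__globals__.get(name)
        if (
            isinstance(candidate, types.FunctionType)
            and candidate.__module__ == function.__module__
            and candidate.__name__ not in _seen
        ):
            pieces.append(extract_source(candidate, _seen).decode("utf-8"))
    pieces.append(inspect.getsource(function))
    return "\n".join(pieces).encode("utf-8")


# Demo: write two functions to a module file, import it, extract the source.
module_text = (
    "def helper(x):\n"
    "    return x + 1\n"
    "\n"
    "def top(x):\n"
    "    return helper(x) * 2\n"
)
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo_module.py")
    with open(path, "w", encoding="utf-8") as f:
        f.write(module_text)
    spec = importlib.util.spec_from_file_location("demo_module", path)
    demo = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(demo)
    extracted = extract_source(demo.top)

print(extracted.decode("utf-8"))
```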

featurehub.util.get_top_level_function_name(namespace, remove_names=['__builtins__'])[source]

Figure out which is the top-level function in a namespace.

The top-level function is defined as the function whose name does not appear among the names referenced by any other function in the namespace. These names are read from co_names, a tuple of the names (other than arguments and local variables) that a function's code references. This could be made more efficient by using constant-time lookups of names, stopping when only one name is left, and confirming that this name is not called by any other function. However, it is hard to anticipate a situation where a user defines a function chain long enough for this efficiency to be required.
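
The definition above can be sketched as follows. This is an illustration under assumptions, not the package's code: the name `top_level_function_name` is hypothetical, and the sketch assumes `remove_names` lists namespace entries to ignore and that exactly one function remains unreferenced.

```python
import types


def top_level_function_name(namespace, remove_names=("__builtins__",)):
    """Return the name of the one function no other function references."""
    functions = {
        name: obj
        for name, obj in namespace.items()
        if name not in remove_names and isinstance(obj, types.FunctionType)
    }
    # Collect every name referenced by any function's code object.
    referenced = set()
    for func in functions.values():
        referenced.update(func.__code__.co_names)
    candidates = [name for name in functions if name not in referenced]
    if len(candidates) != 1:
        raise ValueError(
            "expected exactly one top-level function, found %r" % candidates
        )
    return candidates[0]


namespace = {}
exec(
    "def helper(x):\n"
    "    return x + 1\n"
    "def top(x):\n"
    "    return helper(helper(x))\n",
    namespace,
)
print(top_level_function_name(namespace))
```

One consequence of this definition is that a recursive top-level function references its own name, so this sketch would find no candidate for it and raise `ValueError`.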

featurehub.util.is_positive_env(value)[source]

featurehub.util.myhash(obj)[source]

Compute md5 checksum of string-like object.
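
For reference, an md5 checksum of a string-like object can be computed with the standard library as sketched below. The helper name `md5_hexdigest`, the UTF-8 encoding of str inputs, and the choice to return a hex string are assumptions; the documentation above does not say which form myhash returns.

```python
import hashlib


def md5_hexdigest(obj):
    """Return the md5 hex digest of a str or bytes object (hypothetical helper)."""
    if isinstance(obj, str):
        obj = obj.encode("utf-8")  # hashlib operates on bytes
    return hashlib.md5(obj).hexdigest()


print(md5_hexdigest("hello"))
```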

featurehub.util.possibly_talking_action(action, verbose=True)[source]

Wrap statements with description of their action.

Simply prints action before executing statement, without a trailing newline, and prints ‘done’ afterwards.

action : str
    description of action
verbose : bool, optional (default=True)
    whether to print anything at all

>>> with possibly_talking_action("Calling foo...", True):
...     foo()
Calling foo...done

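
A context manager with this behavior could be written along the following lines. The name `talking_action` is hypothetical, and the sketch assumes that 'done' is printed only when the wrapped statements finish without raising; the documentation does not specify what happens on an exception.

```python
import contextlib
import io


@contextlib.contextmanager
def talking_action(action, verbose=True):
    """Print `action` without a newline, run the body, then print 'done'."""
    if verbose:
        print(action, end="", flush=True)
    yield
    if verbose:
        print("done")


# Capture the output to show what the wrapper writes.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    with talking_action("Calling foo..."):
        pass  # the wrapped statements would go here

print(repr(buffer.getvalue()))
```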
featurehub.util.run_isolated(f, *args)[source]

Execute f(*args) in an isolated environment.

First, the function is serialized using dill. Unfortunately, pickle is unable to serialize some functions, so the function must be serialized and deserialized explicitly rather than left to pickle.
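
The pickle limitation mentioned above can be reproduced with the standard library alone. pickle stores a function as a reference to its module and qualified name, so a lambda, which has no importable name, cannot be pickled. The helper `can_pickle` is hypothetical and only illustrates the limitation; it does not show how run_isolated works.

```python
import pickle


def can_pickle(obj):
    """Return True if pickle can serialize `obj`, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


double = lambda x: 2 * x  # no importable name, so pickle cannot reference it

print(can_pickle(len))     # built-in function, pickled by reference
print(can_pickle(double))
```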

Module contents