featurehub.modeling package¶
Submodules¶
featurehub.modeling.automl module¶
featurehub.modeling.metrics module¶
class featurehub.modeling.metrics.Metric(name, scoring, value)[source]¶
Bases: object
convert(kind='user')[source]¶
Convert to a format suitable for returning to the user or inserting into the db.
Conversion to user format returns a dictionary with one element mapping metric name to metric value. Conversion to db format returns a dictionary with keys “name”, “scoring”, and “value” mapping to their respective values. Both formats convert np.floating values to Python floats.
kind : str
    One of “user” or “db”
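A minimal usage sketch based on the signature and formats described above; the printed results in comments are illustrative:

    import numpy as np
    from featurehub.modeling.metrics import Metric

    metric = Metric(name="Accuracy", scoring="accuracy", value=np.float64(0.87))

    metric.convert(kind="user")
    # {'Accuracy': 0.87} -- one element mapping metric name to metric value

    metric.convert(kind="db")
    # {'name': 'Accuracy', 'scoring': 'accuracy', 'value': 0.87}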
class featurehub.modeling.metrics.MetricList(data=None)[source]¶
Bases: collections.abc.MutableSequence
convert(kind='user')[source]¶
Convert underlying metric objects.
Conversion to user format returns a dictionary with each element mapping metric name to metric value. Conversion to db format returns a list of dictionaries, each with keys “name”, “scoring”, and “value” mapping to their respective values. Both formats convert np.floating values to Python floats.
kind : str
    One of “user” or “db”
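A short sketch of list-style use, assuming Metric objects as constructed above; outputs shown in comments are illustrative:

    from featurehub.modeling.metrics import Metric, MetricList

    metrics = MetricList()
    metrics.append(Metric("Accuracy", "accuracy", 0.87))   # MutableSequence API
    metrics.append(Metric("ROC AUC", "roc_auc", 0.91))

    metrics.convert(kind="user")
    # {'Accuracy': 0.87, 'ROC AUC': 0.91}

    metrics.convert(kind="db")
    # [{'name': 'Accuracy', 'scoring': 'accuracy', 'value': 0.87},
    #  {'name': 'ROC AUC', 'scoring': 'roc_auc', 'value': 0.91}]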
featurehub.modeling.model module¶
class featurehub.modeling.model.Model(problem_type)[source]¶
Bases: object
Versatile modeling object.
Handles classification and regression problems and computes a variety of performance metrics.
problem_type : str
    One of “classification” or “regression”
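A minimal construction sketch, using the class constants documented below:

    from featurehub.modeling.model import Model

    # Model.CLASSIFICATION == 'classification', Model.REGRESSION == 'regression'
    clf_model = Model(problem_type=Model.CLASSIFICATION)
    reg_model = Model(problem_type=Model.REGRESSION)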
BINARY_METRIC_AGGREGATION = 'micro'¶
CLASSIFICATION = 'classification'¶
CLASSIFICATION_SCORING = [{'name': 'Accuracy', 'scoring': 'accuracy'}, {'name': 'Precision', 'scoring': 'precision'}, {'name': 'Recall', 'scoring': 'recall'}, {'name': 'ROC AUC', 'scoring': 'roc_auc'}]¶
MULTICLASS_METRIC_AGGREGATION = 'micro'¶
REGRESSION = 'regression'¶
REGRESSION_SCORING = [{'name': 'Root Mean Squared Error', 'scoring': 'root_mean_squared_error'}, {'name': 'R-squared', 'scoring': 'r2'}]¶
compute_metrics_cv(X, Y)[source]¶
Compute cross-validated metrics.
Trains this model on data X with labels Y.
Returns a MetricList with the name, scoring type, and value for each Metric. Note that these values may be numpy floating-point values and should be converted prior to insertion into a database.
X : numpy array-like or pd.DataFrame
    data
Y : numpy array-like or pd.DataFrame or pd.Series
    labels
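A sketch of the cross-validated workflow described above; the random data is purely illustrative:

    import numpy as np
    from featurehub.modeling.model import Model

    X = np.random.rand(100, 5)              # data
    Y = np.random.randint(0, 2, size=100)   # binary labels

    model = Model(problem_type=Model.CLASSIFICATION)
    metrics = model.compute_metrics_cv(X, Y)

    # Values may be np.floating; convert before inserting into a database.
    metrics.convert(kind="db")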
cv_score_mean(X, Y, scorings)[source]¶
Compute mean score across cross-validation folds.
Split the data and labels into cross-validation folds and fit the model for each fold. Then, for each scoring type in scorings, compute the score. Finally, average the scores across folds. Returns a dictionary mapping scoring to score.
X : numpy array-like
    data
Y : numpy array-like
    labels
scorings : list of str
    scoring types
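A sketch of cv_score_mean, assuming scorings accepts the same scoring identifiers listed in CLASSIFICATION_SCORING above:

    import numpy as np
    from featurehub.modeling.model import Model

    X = np.random.rand(100, 5)
    Y = np.random.randint(0, 2, size=100)

    model = Model(problem_type=Model.CLASSIFICATION)
    model.cv_score_mean(X, Y, scorings=["accuracy", "roc_auc"])
    # returns a dict mapping each scoring to its mean score across folds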
featurehub.modeling.scorers module¶
featurehub.modeling.scorers.ndcg_score(y_true, y_pred, k=5)[source]¶
Normalized discounted cumulative gain (NDCG) at rank k.
This specific score function operates under the assumption that the relevance for the correct label is 1 and the relevance for all other labels is 0.
y_true : array-like, shape = [n_samples,]
    Ground truth (true relevance labels). These must be encoded as integer values, for example by using LabelEncoder.
y_pred : array-like, shape = [n_samples, n_classes]
    Probability predictions for each class.
k : int
    Rank.
NDCG @k : float
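A minimal sketch of calling ndcg_score, with labels integer-encoded via LabelEncoder as suggested above:

    import numpy as np
    from sklearn.preprocessing import LabelEncoder
    from featurehub.modeling.scorers import ndcg_score

    y_true = LabelEncoder().fit_transform(["bird", "cat", "dog"])  # [0, 1, 2]

    y_pred = np.array([          # shape (n_samples, n_classes)
        [0.7, 0.2, 0.1],
        [0.1, 0.8, 0.1],
        [0.3, 0.3, 0.4],
    ])

    ndcg_score(y_true, y_pred, k=2)   # float: NDCG at rank 2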