featurehub.modeling package¶
Submodules¶
featurehub.modeling.automl module¶
featurehub.modeling.metrics module¶
class featurehub.modeling.metrics.Metric(name, scoring, value)[source]¶
Bases: object
convert(kind='user')[source]¶
Convert to a format suitable for returning to the user or inserting into the db.
Conversion to user format returns a dictionary with one element mapping metric name to metric value. Conversion to db format returns a dictionary with keys “name”, “scoring”, and “value” mapping to their respective values. Both formats convert np.floating values to Python floats.
kind : str
    One of “user” or “db”
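A minimal usage sketch based on the signature and formats described above; the printed results in comments are illustrative:

    import numpy as np
    from featurehub.modeling.metrics import Metric

    metric = Metric(name="Accuracy", scoring="accuracy", value=np.float64(0.87))

    metric.convert(kind="user")
    # {'Accuracy': 0.87} -- one element mapping metric name to metric value

    metric.convert(kind="db")
    # {'name': 'Accuracy', 'scoring': 'accuracy', 'value': 0.87}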
class featurehub.modeling.metrics.MetricList(data=None)[source]¶
Bases: collections.abc.MutableSequence
convert(kind='user')[source]¶
Convert underlying metric objects.
Conversion to user format returns a dictionary with each element mapping metric name to metric value. Conversion to db format returns a list of dictionaries, each with keys “name”, “scoring”, and “value” mapping to their respective values. Both formats convert np.floating values to Python floats.
kind : str
    One of “user” or “db”
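A short sketch of list-style use, assuming Metric objects as constructed above; outputs shown in comments are illustrative:

    from featurehub.modeling.metrics import Metric, MetricList

    metrics = MetricList()
    metrics.append(Metric("Accuracy", "accuracy", 0.87))   # MutableSequence API
    metrics.append(Metric("ROC AUC", "roc_auc", 0.91))

    metrics.convert(kind="user")
    # {'Accuracy': 0.87, 'ROC AUC': 0.91}

    metrics.convert(kind="db")
    # [{'name': 'Accuracy', 'scoring': 'accuracy', 'value': 0.87},
    #  {'name': 'ROC AUC', 'scoring': 'roc_auc', 'value': 0.91}]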
featurehub.modeling.model module¶
class featurehub.modeling.model.Model(problem_type)[source]¶
Bases: object
Versatile modeling object.
Handles classification and regression problems and computes a variety of performance metrics.
problem_type : str
    One of “classification” or “regression”
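A minimal construction sketch, using the class constants documented below:

    from featurehub.modeling.model import Model

    # Model.CLASSIFICATION == 'classification', Model.REGRESSION == 'regression'
    clf_model = Model(problem_type=Model.CLASSIFICATION)
    reg_model = Model(problem_type=Model.REGRESSION)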
BINARY_METRIC_AGGREGATION = 'micro'¶
CLASSIFICATION = 'classification'¶
CLASSIFICATION_SCORING = [{'name': 'Accuracy', 'scoring': 'accuracy'}, {'name': 'Precision', 'scoring': 'precision'}, {'name': 'Recall', 'scoring': 'recall'}, {'name': 'ROC AUC', 'scoring': 'roc_auc'}]¶
MULTICLASS_METRIC_AGGREGATION = 'micro'¶
REGRESSION = 'regression'¶
REGRESSION_SCORING = [{'name': 'Root Mean Squared Error', 'scoring': 'root_mean_squared_error'}, {'name': 'R-squared', 'scoring': 'r2'}]¶
compute_metrics_cv(X, Y)[source]¶
Compute cross-validated metrics.
Trains this model on data X with labels Y.
Returns a MetricList with the name, scoring type, and value for each Metric. Note that these values may be numpy floating-point values and should be converted prior to insertion into a database.
X : numpy array-like or pd.DataFrame
    data
Y : numpy array-like or pd.DataFrame or pd.Series
    labels
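A sketch of the cross-validated workflow described above; the random data is purely illustrative:

    import numpy as np
    from featurehub.modeling.model import Model

    X = np.random.rand(100, 5)              # data
    Y = np.random.randint(0, 2, size=100)   # binary labels

    model = Model(problem_type=Model.CLASSIFICATION)
    metrics = model.compute_metrics_cv(X, Y)

    # Values may be np.floating; convert before inserting into a database.
    metrics.convert(kind="db")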
cv_score_mean(X, Y, scorings)[source]¶
Compute mean score across cross-validation folds.
Split the data and labels into cross-validation folds and fit the model for each fold. Then, for each scoring type in scorings, compute the score. Finally, average the scores across folds. Returns a dictionary mapping scoring to score.
X : numpy array-like
    data
Y : numpy array-like
    labels
scorings : list of str
    scoring types
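A sketch of cv_score_mean, assuming scorings accepts the same scoring identifiers listed in CLASSIFICATION_SCORING above:

    import numpy as np
    from featurehub.modeling.model import Model

    X = np.random.rand(100, 5)
    Y = np.random.randint(0, 2, size=100)

    model = Model(problem_type=Model.CLASSIFICATION)
    model.cv_score_mean(X, Y, scorings=["accuracy", "roc_auc"])
    # returns a dict mapping each scoring to its mean score across folds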
featurehub.modeling.scorers module¶
featurehub.modeling.scorers.ndcg_score(y_true, y_pred, k=5)[source]¶
Normalized discounted cumulative gain (NDCG) at rank k.
This specific score function operates under the assumption that the relevance for the correct label is 1 and the relevance for all other labels is 0.
y_true : array-like, shape = [n_samples,]
    Ground truth (true relevance labels). These must be encoded as integer values, for example by using LabelEncoder.
y_pred : array-like, shape = [n_samples, n_classes]
    Probability predictions for each class.
k : int
    Rank.
NDCG @k : float
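A minimal sketch of calling ndcg_score, with labels integer-encoded via LabelEncoder as suggested above:

    import numpy as np
    from sklearn.preprocessing import LabelEncoder
    from featurehub.modeling.scorers import ndcg_score

    y_true = LabelEncoder().fit_transform(["bird", "cat", "dog"])  # [0, 1, 2]

    y_pred = np.array([          # shape (n_samples, n_classes)
        [0.7, 0.2, 0.1],
        [0.1, 0.8, 0.1],
        [0.3, 0.3, 0.4],
    ])

    ndcg_score(y_true, y_pred, k=2)   # float: NDCG at rank 2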