scikit-learn integration API reference

API reference for the Neptune-scikit-learn integration.

You can use the Neptune integration with scikit-learn to track your classifiers, regressors, and k-means clustering results.

create_regressor_summary()

Returns a scikit-learn regressor summary that includes:

All regressor parameters
Pickled estimator (model)
Model performance visualizations

The regressor should be fitted before calling this function.

Parameters

Name	Type	Default	Description
regressor	regressor	-	Fitted scikit-learn regressor object.
X_train	ndarray	-	Training data matrix.
X_test	ndarray	-	Testing data matrix.
y_train	ndarray	-	The regression target for training.
y_test	ndarray	-	The regression target for testing.
nrows	int, optional	1000	Log first nrows rows of test predictions.
log_charts	bool, optional	True	If True, calculate and log chart visualizations. This is equivalent to calling the create_learning_curve_chart(), create_feature_importance_chart(), create_residuals_chart(), create_prediction_error_chart(), and create_cooks_distance_chart() functions from this module. Note: Calculating visualizations is potentially expensive depending on input data and regressor, and may take some time to finish.

Returns

dict with all metadata, which can be assigned to a run namespace:

run["summary"] = create_regressor_summary(...)
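As a rough illustration of what assigning a dict to a namespace means, the sketch below flattens a nested dictionary into slash-separated field paths. The keys and values are hypothetical placeholders rather than the actual structure returned by create_regressor_summary(), and flatten_to_paths() is an illustrative helper that is not part of Neptune.

```python
def flatten_to_paths(metadata, prefix=""):
    """Flatten a nested dict into {"a/b/c": value} pairs."""
    flat = {}
    for key, value in metadata.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            # Nested dicts become deeper levels of the path.
            flat.update(flatten_to_paths(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical summary contents, for illustration only.
summary = {
    "all_params": {"n_estimators": 100, "max_depth": 5},
    "test": {"scores": {"r2": 0.87}},
}

paths = flatten_to_paths(summary, prefix="random_forest/summary")
for path, value in sorted(paths.items()):
    print(path, "=", value)
```

Each leaf of the dictionary ends up under its own path below the namespace you assign it to.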

Example

# Create a run
import neptune
run = neptune.init_run()

# Log random forest regressor summary
from sklearn.ensemble import RandomForestRegressor

rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils
run["random_forest/summary"] = npt_utils.create_regressor_summary(
 rfr, X_train, X_test, y_train, y_test
)

create_classifier_summary()

Returns a scikit-learn classifier summary that includes:

All classifier parameters
Pickled estimator (model)
Test prediction probabilities
Model performance visualizations

The classifier should be fitted before calling this function.

Parameters

Name	Type	Default	Description
classifier	classifier	-	Fitted scikit-learn classifier object.
X_train	ndarray	-	Training data matrix.
X_test	ndarray	-	Testing data matrix.
y_train	ndarray	-	The classification target for training.
y_test	ndarray	-	The classification target for testing.
nrows	int, optional	1000	Log first nrows rows of test predictions and prediction probabilities.
log_charts	bool, optional	True	If True, calculate and log chart visualizations. This is equivalent to calling the create_classification_report_chart(), create_confusion_matrix_chart(), create_roc_auc_chart(), create_prediction_error_chart(), create_precision_recall_chart(), and create_class_prediction_error_chart() functions from this module. Note: Calculating visualizations is potentially expensive depending on input data and classifier, and may take some time to finish.

Returns

dict with all metadata, which can be assigned to a run namespace:

run["summary"] = create_classifier_summary(...)

Example

# Create a run
import neptune

run = neptune.init_run()

# Log random forest classifier summary
from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils

run["random_forest/summary"] = npt_utils.create_classifier_summary(
 rfc, X_train, X_test, y_train, y_test
)

create_kmeans_summary()

Returns a scikit-learn k-means summary.

This method fits the k-means model to data and logs:

All KMeans parameters
Clustering visualizations: k-means elbow chart and silhouette coefficients chart

Parameters

Name	Type	Default	Description
model	KMeans	-	KMeans object
X	ndarray	-	Training instances to cluster
nrows	int, optional	1000	Number of rows to log in the cluster labels
kwargs	-	-	KMeans parameters

Returns

dict with all metadata, which can be assigned to a run namespace:

run["summary"] = create_kmeans_summary(...)

Example

# Create a run
import neptune
run = neptune.init_run()

# Log k-means summary
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

km = KMeans(n_init=11, max_iter=270)
X, y = make_blobs(n_samples=579, n_features=17, centers=7, random_state=28743)

import neptune.integrations.sklearn as npt_utils
run["kmeans/summary"] = npt_utils.create_kmeans_summary(km, X)

get_estimator_params()

Get estimator parameters.

Parameters

Name	Type	Description
estimator	estimator	Scikit-learn estimator to log parameters for.

Returns

dict with all parameters mapped to their values.

Example

# Create a run
import neptune
run = neptune.init_run()

# Log estimator parameters
from sklearn.ensemble import RandomForestRegressor

rfr = RandomForestRegressor()

import neptune.integrations.sklearn as npt_utils
from neptune.utils import stringify_unsupported

run["estimator/params"] = stringify_unsupported(npt_utils.get_estimator_params(rfr))
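The example above wraps the parameters in stringify_unsupported() because estimator parameters can hold values such as None or lists. The sketch below shows the general idea with a simplified stand-in. stringify_sketch() and the SIMPLE_TYPES tuple are assumptions made for illustration and do not reproduce Neptune's actual implementation or its exact list of supported types.

```python
SIMPLE_TYPES = (bool, int, float, str)  # assumption: values stored as-is


def stringify_sketch(params):
    """Return a copy of params where non-simple values become strings."""
    result = {}
    for name, value in params.items():
        if isinstance(value, dict):
            # Recurse so that nested dictionaries keep their structure.
            result[name] = stringify_sketch(value)
        elif isinstance(value, SIMPLE_TYPES):
            result[name] = value
        else:
            result[name] = str(value)
    return result


# Hypothetical estimator parameters, for illustration only.
params = {
    "n_estimators": 100,
    "max_depth": None,
    "class_weight": {"a": 1, "b": [1, 2]},
}

converted = stringify_sketch(params)
print(converted)
```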

get_pickled_model()

Get pickled estimator.

Parameters

Name	Type	Description
estimator	estimator	Scikit-learn estimator to pickle.

Returns

File value object with a pickled model that you can log to the run.

Example

# Create a run
import neptune
run = neptune.init_run()

# Log pickled model
from sklearn.ensemble import RandomForestRegressor

rfr = RandomForestRegressor()

import neptune.integrations.sklearn as npt_utils
run["estimator/pickled_model"] = npt_utils.get_pickled_model(rfr)

get_test_preds()

Get test predictions as a table.

If you pass y_pred, predictions are not computed from X_test data.

The estimator should be fitted before calling this function.

Parameters

Name	Type	Default	Description
estimator	estimator	-	scikit-learn estimator to compute predictions.
X_test	ndarray	-	Testing data matrix.
y_test	ndarray	-	The regression target for testing.
y_pred	ndarray, optional	None	Estimator predictions on test data.
nrows	int, optional	1000	Number of rows to log.

Returns

File value object with test predictions as a table that you can log to the run.
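The interplay between y_pred and nrows can be sketched in plain Python. This is only an illustration of the documented behavior, not the integration's code: build_test_preds_rows() and toy_predict() are hypothetical names, and the real function returns a File value rather than a list.

```python
def build_test_preds_rows(predict, X_test, y_test, y_pred=None, nrows=1000):
    """Return up to nrows (y_true, y_pred) pairs.

    predict is any callable mapping a list of samples to predictions.
    It is only called when y_pred is not supplied.
    """
    if y_pred is None:
        y_pred = predict(X_test)
    return list(zip(y_test, y_pred))[:nrows]


calls = []


def toy_predict(X):
    # Stand-in for estimator.predict(): doubles the single feature.
    calls.append(len(X))
    return [2 * row[0] for row in X]


X_test = [[1.0], [2.0], [3.0]]
y_test = [2.1, 3.9, 6.2]

# Predictions are computed from X_test and truncated to two rows.
computed = build_test_preds_rows(toy_predict, X_test, y_test, nrows=2)
# Predictions are supplied, so toy_predict() is not called again.
supplied = build_test_preds_rows(toy_predict, X_test, y_test, y_pred=[2.0, 4.0, 6.0])

print(computed)
print(supplied)
print("predict() calls:", len(calls))
```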

Example

# Create a run
import neptune
run = neptune.init_run()

# Log test predictions as a table
from sklearn.ensemble import RandomForestRegressor

rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils
run["estimator/test_preds"] = npt_utils.get_test_preds(rfr, X_test, y_test)

get_test_preds_proba()

Get test prediction probabilities.

If you pass X_test, prediction probabilities are computed from the data.
If you pass y_pred_proba, prediction probabilities are not computed from X_test data.

The estimator should be fitted before calling this function.

Parameters

Name	Type	Default	Description
classifier	classifier	-	scikit-learn classifier to compute prediction probabilities.
X_test	ndarray	-	Testing data matrix.
y_pred_proba	ndarray, optional	None	Classifier prediction probabilities on test data.
nrows	int, optional	1000	Number of rows to log.

Returns

File value object with test prediction probabilities as a table that you can log to the run.

Example

# Create a run
import neptune
run = neptune.init_run()

# Log classifier test prediction probabilities
from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils
run["estimator/test_preds_proba"] = npt_utils.get_test_preds_proba(rfc, X_test)

get_scores()

Get estimator scores on X.

If you pass y_pred, predictions are not computed from X and y data.

The estimator should be fitted before calling this function.

Estimator	Logged scores
Single-output regressors	Explained variance, max error, mean absolute error, R²
Multi-output regressors	R²
Classifiers	Precision, recall, F-beta score, support
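For reference, the single-output regression scores named in the table can be computed by hand as in the sketch below. It uses the standard definitions of these metrics. The function name and the dictionary keys are illustrative assumptions and may not match the keys that get_scores() returns.

```python
def regression_scores(y_true, y_pred):
    """Compute four common regression scores from their definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    mean_err = sum(errors) / n
    var_true = sum((t - mean_true) ** 2 for t in y_true) / n
    var_err = sum((e - mean_err) ** 2 for e in errors) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "explained_variance": 1 - var_err / var_true,
        "max_error": max(abs(e) for e in errors),
        "mean_absolute_error": sum(abs(e) for e in errors) / n,
        "r2": 1 - ss_res / ss_tot,
    }


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

scores = regression_scores(y_true, y_pred)
print(scores)
```

Explained variance and R² coincide in this toy example because the mean prediction error is zero. They differ when the predictions are biased.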

Parameters

Name	Type	Default	Description
estimator	estimator	-	scikit-learn estimator to compute scores.
X	ndarray	-	Data matrix.
y	ndarray	-	Target for testing.
y_pred	ndarray, optional	None	Estimator predictions on data.

Returns

dict with scores.

Example

# Create a run
import neptune
run = neptune.init_run()

# Log estimator scores
from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils
run["estimator/scores"] = npt_utils.get_scores(rfc, X_test, y_test)

create_learning_curve_chart()

Returns a learning curve chart.

Parameters

Name	Type	Default	Description
regressor	regressor	-	Fitted scikit-learn regressor object
X_train	ndarray	-	Training data matrix
y_train	ndarray	-	The regression target for training

Returns

File value object with a learning curve chart that you can log to the run.

Example

# Create a run
import neptune
run = neptune.init_run()

# Log a learning curve chart
from sklearn.ensemble import RandomForestRegressor

rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils
run["visuals/learning_curve"] = npt_utils.create_learning_curve_chart(
 rfr, X_train, y_train
)

create_feature_importance_chart()

Returns a feature importance chart.

Parameters

Name	Type	Default	Description
regressor	regressor	-	Fitted scikit-learn regressor object
X_train	ndarray	-	Training data matrix
y_train	ndarray	-	The regression target for training

Returns

File value object with a feature importance chart that you can log to the run.

Example

# Create a run
import neptune
run = neptune.init_run()

# Log a feature importance chart
from sklearn.ensemble import RandomForestRegressor

rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils
run["visuals/feature_importance"] = npt_utils.create_feature_importance_chart(
 rfr, X_train, y_train
)

create_residuals_chart()

Returns a residuals chart.

Parameters

Name	Type	Default	Description
regressor	regressor	-	Fitted scikit-learn regressor object
X_train	ndarray	-	Training data matrix
X_test	ndarray	-	Testing data matrix
y_train	ndarray	-	The regression target for training
y_test	ndarray	-	The regression target for testing

Returns

File value object with a residuals chart that you can log to the run.

Example

# Create a run
import neptune
run = neptune.init_run()

# Log a residuals chart
from sklearn.ensemble import RandomForestRegressor

rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils
run["visuals/residuals"] = npt_utils.create_residuals_chart(
 rfr, X_train, X_test, y_train, y_test
)

create_prediction_error_chart()

Returns a prediction error chart.

Parameters

Name	Type	Default	Description
regressor	regressor	-	Fitted scikit-learn regressor object
X_train	ndarray	-	Training data matrix
X_test	ndarray	-	Testing data matrix
y_train	ndarray	-	The regression target for training
y_test	ndarray	-	The regression target for testing

Returns

File value object with a prediction error chart that you can log to the run.

Example

# Create a run
import neptune
run = neptune.init_run()

# Log a prediction error chart
from sklearn.ensemble import RandomForestRegressor

rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils
run["visuals/prediction_error"] = npt_utils.create_prediction_error_chart(
 rfr, X_train, X_test, y_train, y_test
)

create_cooks_distance_chart()

Returns a Cook's distance chart.
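Cook's distance measures how much a single observation influences a least-squares fit. As background, the sketch below computes it for a one-feature linear regression using the textbook formula D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2. It is a plain-Python illustration with made-up data, not the code the integration uses to draw the chart.

```python
def cooks_distances(x, y):
    """Cook's distance for each point of a one-feature least-squares fit."""
    n = len(x)
    p = 2  # fitted parameters: intercept and slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # s^2: residual variance estimate with n - p degrees of freedom
    mse = sum(e ** 2 for e in residuals) / (n - p)
    # h_i: leverage of each point in simple linear regression
    leverages = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    return [
        (e ** 2 / (p * mse)) * (h / (1 - h) ** 2)
        for e, h in zip(residuals, leverages)
    ]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]  # the last point is an outlier

distances = cooks_distances(x, y)
for xi, d in zip(x, distances):
    print(f"x={xi}: D={d:.3f}")
```

The outlier at the end of the range combines a large residual with high leverage, so it receives the largest distance.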

Parameters

Name	Type	Default	Description
regressor	regressor	-	Fitted scikit-learn regressor object
X_train	ndarray	-	Training data matrix
y_train	ndarray	-	The regression target for training

Returns

File value object with a Cook's distance chart that you can log to the run.

Example

# Create a run
import neptune
run = neptune.init_run()

# Log a Cook's distance chart
from sklearn.ensemble import RandomForestRegressor

rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils
run["visuals/cooks_distance"] = npt_utils.create_cooks_distance_chart(
 rfr, X_train, y_train
)

create_classification_report_chart()

Returns a classification report chart.

Parameters

Name	Type	Default	Description
classifier	classifier	-	Fitted scikit-learn classifier object
X_train	ndarray	-	Training data matrix
X_test	ndarray	-	Testing data matrix
y_train	ndarray	-	The classification target for training
y_test	ndarray	-	The classification target for testing

Returns

File value object with a classification report chart that you can log to the run.

Example

# Create a run
import neptune
run = neptune.init_run()

# Log a classification report chart
from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils
run["visuals/cls_report"] = npt_utils.create_classification_report_chart(
 rfc, X_train, X_test, y_train, y_test
)

create_confusion_matrix_chart()

Returns a confusion matrix chart.
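For readers unfamiliar with the underlying data, a confusion matrix counts how often each true class is predicted as each class. The following sketch builds one in plain Python, with rows for true labels and columns for predicted labels. The helper name and the sample labels are illustrative assumptions, and the chart function itself works on a fitted classifier rather than on label lists.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count (true, predicted) pairs: rows are true, columns are predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


labels = ["bird", "cat", "dog"]
y_true = ["cat", "dog", "dog", "cat", "bird", "dog"]
y_pred = ["cat", "dog", "cat", "cat", "bird", "dog"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

Off-diagonal entries are misclassifications. Here one dog was predicted as a cat.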

Parameters

Name	Type	Default	Description
classifier	classifier	-	Fitted scikit-learn classifier object.
X_train	ndarray	-	Training data matrix.
X_test	ndarray	-	Testing data matrix.
y_train	ndarray	-	The classification target for training.
y_test	ndarray	-	The classification target for testing.

Returns

File value object that you can log to the run.

Example

Create a run:

import neptune

run = neptune.init_run()

Log the chart:

from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils

run["visuals/confusion_matrix"] = npt_utils.create_confusion_matrix_chart(
 rfc, X_train, X_test, y_train, y_test
)

create_roc_auc_chart()

Returns a ROC-AUC chart.
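As background on the quantity the chart summarizes, the area under the ROC curve for a binary problem equals the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one. The sketch below computes it that way for a small made-up example. It covers the binary case only and is not the integration's implementation.

```python
def roc_auc(y_true, scores):
    """Binary ROC AUC via pairwise comparison of positives and negatives."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, scores)
print(auc)
```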

Parameters

Name	Type	Default	Description
classifier	classifier	-	Fitted scikit-learn classifier object.
X_train	ndarray	-	Training data matrix.
X_test	ndarray	-	Testing data matrix.
y_train	ndarray	-	The classification target for training.
y_test	ndarray	-	The classification target for testing.

Returns

File value object that you can log to the run.

Example

Create a run:

import neptune

run = neptune.init_run()

Log the chart:

from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils

run["visuals/roc_auc"] = npt_utils.create_roc_auc_chart(
 rfc, X_train, X_test, y_train, y_test
)

create_precision_recall_chart()

Returns a precision-recall chart.
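A precision-recall curve is traced by varying the decision threshold applied to the predicted probabilities. The sketch below computes precision and recall at a few thresholds for a binary toy example. The helper name, data, and thresholds are assumptions for illustration, and treating precision as 1.0 when nothing is predicted positive is a convention chosen here.

```python
def precision_recall_points(y_true, proba, thresholds):
    """Return (threshold, precision, recall) for each threshold."""
    points = []
    for threshold in thresholds:
        predicted = [p >= threshold for p in proba]
        tp = sum(1 for t, pred in zip(y_true, predicted) if pred and t == 1)
        fp = sum(1 for t, pred in zip(y_true, predicted) if pred and t == 0)
        fn = sum(1 for t, pred in zip(y_true, predicted) if not pred and t == 1)
        precision = tp / (tp + fp) if tp + fp else 1.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        points.append((threshold, precision, recall))
    return points


y_true = [0, 0, 1, 1]
proba = [0.1, 0.4, 0.35, 0.8]  # predicted probability of the positive class

points = precision_recall_points(y_true, proba, thresholds=[0.3, 0.5])
for threshold, precision, recall in points:
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

Raising the threshold in this example trades recall for precision.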

Parameters

Name	Type	Default	Description
classifier	classifier	-	Fitted scikit-learn classifier object.
X_test	ndarray	-	Testing data matrix.
y_test	ndarray	-	The classification target for testing.
y_pred_proba	ndarray, optional	None	Classifier prediction probabilities on test data.

Returns

File value object that you can log to the run.

Example

Create a run:

import neptune

run = neptune.init_run()

Log the chart:

from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils

run["visuals/precision_recall"] = npt_utils.create_precision_recall_chart(
 rfc, X_test, y_test
)

create_class_prediction_error_chart()

Returns a class prediction error chart.

Parameters

Name	Type	Default	Description
classifier	classifier	-	Fitted scikit-learn classifier object.
X_train	ndarray	-	Training data matrix.
X_test	ndarray	-	Testing data matrix.
y_train	ndarray	-	The classification target for training.
y_test	ndarray	-	The classification target for testing.

Returns

File value object that you can log to the run.

Example

Create a run:

import neptune

run = neptune.init_run()

Log the chart:

from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils

run["visuals/class_pred_error"] = npt_utils.create_class_prediction_error_chart(
 rfc, X_train, X_test, y_train, y_test
)

get_cluster_labels()

Returns the index of the cluster that each sample belongs to.
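Conceptually, a cluster label is the index of the centroid closest to a sample. The sketch below assigns labels to the first nrows samples given fixed centroids. It is a simplified illustration with hypothetical names and data. The actual function fits the KMeans model to X itself rather than taking centroids as input.

```python
def assign_labels(points, centroids, nrows=1000):
    """Label each of the first nrows points with its nearest centroid index."""
    labels = []
    for point in points[:nrows]:
        # Squared Euclidean distance to every centroid.
        distances = [
            sum((a - b) ** 2 for a, b in zip(point, centroid))
            for centroid in centroids
        ]
        labels.append(distances.index(min(distances)))
    return labels


points = [(0.0, 0.0), (0.5, 0.2), (5.0, 5.0), (5.2, 4.8), (0.1, 0.4)]
centroids = [(0.0, 0.0), (5.0, 5.0)]

all_labels = assign_labels(points, centroids)
first_three = assign_labels(points, centroids, nrows=3)
print(all_labels)
print(first_three)
```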

Parameters

Name	Type	Default	Description
model	KMeans	-	KMeans object.
X	ndarray	-	Training instances to cluster.
nrows	int, optional	1000	Number of rows to log.
kwargs	-	-	KMeans parameters.

Returns

File value object that you can log to the run.

Example

Create a run:

import neptune

run = neptune.init_run()

Log the labels:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

km = KMeans(n_init=11, max_iter=270)
X, y = make_blobs(n_samples=579, n_features=17, centers=7, random_state=28743)

import neptune.integrations.sklearn as npt_utils

run["kmeans/cluster_labels"] = npt_utils.get_cluster_labels(km, X)

create_kelbow_chart()

Returns the K-elbow chart for the KMeans clusterer.

Parameters

Name	Type	Default	Description
model	KMeans	-	KMeans object.
X	ndarray	-	Training instances to cluster.
kwargs	-	-	KMeans parameters.

Returns

File value object that you can log to the run.

Example

Create a run:

import neptune

run = neptune.init_run()

Log the chart:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

km = KMeans(n_init=11, max_iter=270)
X, y = make_blobs(n_samples=579, n_features=17, centers=7, random_state=28743)

import neptune.integrations.sklearn as npt_utils

run["kmeans/kelbow"] = npt_utils.create_kelbow_chart(km, X)

create_silhouette_chart()

Returns the silhouette coefficient charts for the KMeans clusterer.

Charts are computed for j = 2, 3, ..., n_clusters.
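To make the range of j concrete and to show what a silhouette coefficient measures, the sketch below lists the cluster counts implied by a given n_clusters and computes the mean silhouette coefficient for two labelings of a tiny one-dimensional dataset. The helper and data are illustrative assumptions. The integration's charts are produced from a KMeans model, not from this code.

```python
def mean_silhouette(points, labels):
    """Mean silhouette coefficient for 1-D points with given cluster labels."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a coefficient of 0
        # a: mean distance to the other members of the same cluster
        a = sum(abs(point - other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(abs(point - other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        total += (b - a) / max(a, b)
    return total / len(points)


n_clusters = 12
j_values = list(range(2, n_clusters + 1))  # j = 2, 3, ..., n_clusters
print(j_values)

points = [1.0, 2.0, 10.0, 11.0]
good = mean_silhouette(points, [0, 0, 1, 1])  # compact, well-separated clusters
bad = mean_silhouette(points, [0, 1, 0, 1])   # clusters that mix distant points
print(round(good, 3), round(bad, 3))
```

Values close to 1 indicate compact, well-separated clusters, while negative values suggest that samples sit closer to another cluster than to their own.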

Parameters

Name	Type	Default	Description
model	KMeans	-	KMeans object.
X	ndarray	-	Training instances to cluster.
kwargs	-	-	KMeans parameters.

Returns

File value object that you can log to the run.

Example

Create a run:

import neptune

run = neptune.init_run()

Log the charts:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

km = KMeans(n_init=11, max_iter=270)
X, y = make_blobs(n_samples=579, n_features=17, centers=7, random_state=28743)

import neptune.integrations.sklearn as npt_utils

run["kmeans/silhouette"] = npt_utils.create_silhouette_chart(km, X, n_clusters=12)