Overview

The goal

scikit-learn is a machine learning tool kit for data analysis.

pip install scikit-learn

Simple and efficient tools for predictive data analysis

Accessible to everybody, and reusable in various contexts

Built on NumPy, SciPy, and matplotlib

I will keep it short and I will mark the most relevant tools in bold

sklearn.base: Base classes and utility functions

see here

sklearn.calibration: Probability Calibration


calibration.CalibratedClassifierCV([…])	Probability calibration with isotonic regression or logistic regression.
calibration.calibration_curve(y_true, y_prob, *)	Compute true and predicted probabilities for a calibration curve.

sklearn.cluster: Clustering

Classes


cluster.AffinityPropagation(*[, damping, …])	Perform Affinity Propagation Clustering of data.
cluster.AgglomerativeClustering([…])	Agglomerative Clustering.
cluster.Birch(*[, threshold, …])	Implements the BIRCH clustering algorithm.
cluster.DBSCAN([eps, min_samples, metric, …])	Perform DBSCAN clustering from vector array or distance matrix.
cluster.HDBSCAN([min_cluster_size, …])	Cluster data using hierarchical density-based clustering.
cluster.FeatureAgglomeration([n_clusters, …])	Agglomerate features.
cluster.KMeans([n_clusters, init, n_init, …])	K-Means clustering.
cluster.BisectingKMeans([n_clusters, init, …])	Bisecting K-Means clustering.
cluster.MiniBatchKMeans([n_clusters, init, …])	Mini-Batch K-Means clustering.
cluster.MeanShift(*[, bandwidth, seeds, …])	Mean shift clustering using a flat kernel.
cluster.OPTICS(*[, min_samples, max_eps, …])	Estimate clustering structure from vector array.
cluster.SpectralClustering([n_clusters, …])	Apply clustering to a projection of the normalized Laplacian.
cluster.SpectralBiclustering([n_clusters, …])	Spectral biclustering (Kluger, 2003).
cluster.SpectralCoclustering([n_clusters, …])	Spectral Co-Clustering algorithm (Dhillon, 2001).

Functions


cluster.affinity_propagation(S, *[, …])	Perform Affinity Propagation Clustering of data.
cluster.cluster_optics_dbscan(*, …)	Perform DBSCAN extraction for an arbitrary epsilon.
cluster.cluster_optics_xi(*, reachability, …)	Automatically extract clusters according to the Xi-steep method.
cluster.compute_optics_graph(X, *, …)	Compute the OPTICS reachability graph.
cluster.dbscan(X[, eps, min_samples, …])	Perform DBSCAN clustering from vector array or distance matrix.
cluster.estimate_bandwidth(X, *[, quantile, …])	Estimate the bandwidth to use with the mean-shift algorithm.
cluster.k_means(X, n_clusters, *[, …])	Perform K-means clustering algorithm.
cluster.kmeans_plusplus(X, n_clusters, *[, …])	Init n_clusters seeds according to k-means++.
cluster.mean_shift(X, *[, bandwidth, seeds, …])	Perform mean shift clustering of data using a flat kernel.
cluster.spectral_clustering(affinity, *[, …])	Apply clustering to a projection of the normalized Laplacian.
cluster.ward_tree(X, *[, connectivity, …])	Ward clustering based on a Feature matrix.

sklearn.compose: Composite Estimators


compose.ColumnTransformer(transformers, *[, …])	Applies transformers to columns of an array or pandas DataFrame.
compose.TransformedTargetRegressor([…])	Meta-estimator to regress on a transformed target.
compose.make_column_transformer(*transformers)	Construct a ColumnTransformer from the given transformers.
compose.make_column_selector([pattern, …])	Create a callable to select columns to be used with ColumnTransformer.

sklearn.covariance: Covariance Estimators


covariance.EmpiricalCovariance(*[, …])	Maximum likelihood covariance estimator.
covariance.EllipticEnvelope(*[, …])	An object for detecting outliers in a Gaussian distributed dataset.
covariance.GraphicalLasso([alpha, mode, …])	Sparse inverse covariance estimation with an l1-penalized estimator.
covariance.GraphicalLassoCV(*[, alphas, …])	Sparse inverse covariance w/ cross-validated choice of the l1 penalty.
covariance.LedoitWolf(*[, store_precision, …])	LedoitWolf Estimator.
covariance.MinCovDet(*[, store_precision, …])	Minimum Covariance Determinant (MCD): robust estimator of covariance.
covariance.OAS(*[, store_precision, …])	Oracle Approximating Shrinkage Estimator as proposed in [R69773891e6a6-1].
covariance.ShrunkCovariance(*[, …])	Covariance estimator with shrinkage.
covariance.empirical_covariance(X, *[, …])	Compute the Maximum likelihood covariance estimator.
covariance.graphical_lasso(emp_cov, alpha, *)	L1-penalized covariance estimator.
covariance.ledoit_wolf(X, *[, …])	Estimate the shrunk Ledoit-Wolf covariance matrix.
covariance.ledoit_wolf_shrinkage(X[, …])	Estimate the shrunk Ledoit-Wolf covariance matrix.
covariance.oas(X, *[, assume_centered])	Estimate covariance with the Oracle Approximating Shrinkage as proposed in [Rca3a42e5ec35-1].
covariance.shrunk_covariance(emp_cov[, …])	Calculate a covariance matrix shrunk on the diagonal.

sklearn.cross_decomposition: Cross decomposition


cross_decomposition.CCA([n_components, …])	Canonical Correlation Analysis, also known as “Mode B” PLS.
cross_decomposition.PLSCanonical([…])	Partial Least Squares transformer and regressor.
cross_decomposition.PLSRegression([…])	PLS regression.
cross_decomposition.PLSSVD([n_components, …])	Partial Least Square SVD.

sklearn.datasets: Datasets

see here

sklearn.decomposition: Matrix Decomposition


decomposition.DictionaryLearning([…])	Dictionary learning.
decomposition.FactorAnalysis([n_components, …])	Factor Analysis (FA).
decomposition.FastICA([n_components, …])	FastICA: a fast algorithm for Independent Component Analysis.
decomposition.IncrementalPCA([n_components, …])	Incremental principal components analysis (IPCA).
decomposition.KernelPCA([n_components, …])	Kernel Principal component analysis (KPCA) [R396fc7d924b8-1].
decomposition.LatentDirichletAllocation([…])	Latent Dirichlet Allocation with online variational Bayes algorithm.
decomposition.MiniBatchDictionaryLearning([…])	Mini-batch dictionary learning.
decomposition.MiniBatchSparsePCA([…])	Mini-batch Sparse Principal Components Analysis.
decomposition.NMF([n_components, init, …])	Non-Negative Matrix Factorization (NMF).
decomposition.MiniBatchNMF([n_components, …])	Mini-Batch Non-Negative Matrix Factorization (NMF).
decomposition.PCA([n_components, copy, …])	Principal component analysis (PCA).
decomposition.SparsePCA([n_components, …])	Sparse Principal Components Analysis (SparsePCA).
decomposition.SparseCoder(dictionary, *[, …])	Sparse coding.
decomposition.TruncatedSVD([n_components, …])	Dimensionality reduction using truncated SVD (aka LSA).
decomposition.dict_learning(X, n_components, …)	Solve a dictionary learning matrix factorization problem.
decomposition.dict_learning_online(X[, …])	Solve a dictionary learning matrix factorization problem online.
decomposition.fastica(X[, n_components, …])	Perform Fast Independent Component Analysis.
decomposition.non_negative_factorization(X)	Compute Non-negative Matrix Factorization (NMF).
decomposition.sparse_encode(X, dictionary, *)	Sparse coding.

sklearn.discriminant_analysis: Discriminant Analysis


discriminant_analysis.LinearDiscriminantAnalysis([…])	Linear Discriminant Analysis.
discriminant_analysis.QuadraticDiscriminantAnalysis(*)	Quadratic Discriminant Analysis.

sklearn.dummy: Dummy estimators


dummy.DummyClassifier(*[, strategy, …])	DummyClassifier makes predictions that ignore the input features.
dummy.DummyRegressor(*[, strategy, …])	Regressor that makes predictions using simple rules.

sklearn.ensemble: Ensemble Methods


ensemble.AdaBoostClassifier([estimator, …])	An AdaBoost classifier.
ensemble.AdaBoostRegressor([estimator, …])	An AdaBoost regressor.
ensemble.BaggingClassifier([estimator, …])	A Bagging classifier.
ensemble.BaggingRegressor([estimator, …])	A Bagging regressor.
ensemble.ExtraTreesClassifier([…])	An extra-trees classifier.
ensemble.ExtraTreesRegressor([n_estimators, …])	An extra-trees regressor.
ensemble.GradientBoostingClassifier(*[, …])	Gradient Boosting for classification.
ensemble.GradientBoostingRegressor(*[, …])	Gradient Boosting for regression.
ensemble.IsolationForest(*[, n_estimators, …])	Isolation Forest Algorithm.
ensemble.RandomForestClassifier([…])	A random forest classifier.
ensemble.RandomForestRegressor([…])	A random forest regressor.
ensemble.RandomTreesEmbedding([…])	An ensemble of totally random trees.
ensemble.StackingClassifier(estimators[, …])	Stack of estimators with a final classifier.
ensemble.StackingRegressor(estimators[, …])	Stack of estimators with a final regressor.
ensemble.VotingClassifier(estimators, *[, …])	Soft Voting/Majority Rule classifier for unfitted estimators.
ensemble.VotingRegressor(estimators, *[, …])	Prediction voting regressor for unfitted estimators.
ensemble.HistGradientBoostingRegressor([…])	Histogram-based Gradient Boosting Regression Tree.
ensemble.HistGradientBoostingClassifier([…])	Histogram-based Gradient Boosting Classification Tree.

sklearn.exceptions: Exceptions and warnings

see here

sklearn.experimental: Experimental

see here

sklearn.feature_extraction: Feature Extraction


feature_extraction.DictVectorizer(*[, …])	Transforms lists of feature-value mappings to vectors.
feature_extraction.FeatureHasher([…])	Implements feature hashing, aka the hashing trick.

From images


feature_extraction.image.extract_patches_2d(…)	Reshape a 2D image into a collection of patches.
feature_extraction.image.grid_to_graph(n_x, n_y)	Graph of the pixel-to-pixel connections.
feature_extraction.image.img_to_graph(img, *)	Graph of the pixel-to-pixel gradient connections.
feature_extraction.image.reconstruct_from_patches_2d(…)	Reconstruct the image from all of its patches.
feature_extraction.image.PatchExtractor(*[, …])	Extracts patches from a collection of images.

From text


feature_extraction.text.CountVectorizer(*[, …])	Convert a collection of text documents to a matrix of token counts.
feature_extraction.text.HashingVectorizer(*)	Convert a collection of text documents to a matrix of token occurrences.
feature_extraction.text.TfidfTransformer(*)	Transform a count matrix to a normalized tf or tf-idf representation.
feature_extraction.text.TfidfVectorizer(*[, …])	Convert a collection of raw documents to a matrix of TF-IDF features.

sklearn.feature_selection: Feature Selection


feature_selection.GenericUnivariateSelect([…])	Univariate feature selector with configurable strategy.
feature_selection.SelectPercentile([…])	Select features according to a percentile of the highest scores.
feature_selection.SelectKBest([score_func, k])	Select features according to the k highest scores.
feature_selection.SelectFpr([score_func, alpha])	Filter: Select the pvalues below alpha based on a FPR test.
feature_selection.SelectFdr([score_func, alpha])	Filter: Select the p-values for an estimated false discovery rate.
feature_selection.SelectFromModel(estimator, *)	Meta-transformer for selecting features based on importance weights.
feature_selection.SelectFwe([score_func, alpha])	Filter: Select the p-values corresponding to Family-wise error rate.
feature_selection.SequentialFeatureSelector(…)	Transformer that performs Sequential Feature Selection.
feature_selection.RFE(estimator, *[, …])	Feature ranking with recursive feature elimination.
feature_selection.RFECV(estimator, *[, …])	Recursive feature elimination with cross-validation to select features.
feature_selection.VarianceThreshold([threshold])	Feature selector that removes all low-variance features.
feature_selection.chi2(X, y)	Compute chi-squared stats between each non-negative feature and class.
feature_selection.f_classif(X, y)	Compute the ANOVA F-value for the provided sample.
feature_selection.f_regression(X, y, *[, …])	Univariate linear regression tests returning F-statistic and p-values.
feature_selection.r_regression(X, y, *[, …])	Compute Pearson’s r for each features and the target.
feature_selection.mutual_info_classif(X, y, *)	Estimate mutual information for a discrete target variable.
feature_selection.mutual_info_regression(X, y, *)	Estimate mutual information for a continuous target variable.

sklearn.gaussian_process: Gaussian Processes


gaussian_process.GaussianProcessClassifier([…])	Gaussian process classification (GPC) based on Laplace approximation.
gaussian_process.GaussianProcessRegressor([…])	Gaussian process regression (GPR).

Kernels


gaussian_process.kernels.CompoundKernel(kernels)	Kernel which is composed of a set of other kernels.
gaussian_process.kernels.ConstantKernel([…])	Constant kernel.
gaussian_process.kernels.DotProduct([…])	Dot-Product kernel.
gaussian_process.kernels.ExpSineSquared([…])	Exp-Sine-Squared kernel (aka periodic kernel).
gaussian_process.kernels.Exponentiation(…)	The Exponentiation kernel takes one base kernel and a scalar parameter and combines them via
gaussian_process.kernels.Hyperparameter(…)	A kernel hyperparameter’s specification in form of a namedtuple.
gaussian_process.kernels.Kernel()	Base class for all kernels.
gaussian_process.kernels.Matern([…])	Matern kernel.
gaussian_process.kernels.PairwiseKernel([…])	Wrapper for kernels in sklearn.metrics.pairwise.
gaussian_process.kernels.Product(k1, k2)	The Product kernel takes two kernels k1 and k2 and combines them via
gaussian_process.kernels.RBF([length_scale, …])	Radial basis function kernel (aka squared-exponential kernel).
gaussian_process.kernels.RationalQuadratic([…])	Rational Quadratic kernel.
gaussian_process.kernels.Sum(k1, k2)	The Sum kernel takes two kernels k1 and k2 and combines them via
gaussian_process.kernels.WhiteKernel([…])	White kernel.

sklearn.impute: Impute


impute.SimpleImputer(*[, missing_values, …])	Univariate imputer for completing missing values with simple strategies.
impute.IterativeImputer([estimator, …])	Multivariate imputer that estimates each feature from all the others.
impute.MissingIndicator(*[, missing_values, …])	Binary indicators for missing values.
impute.KNNImputer(*[, missing_values, …])	Imputation for completing missing values using k-Nearest Neighbors.

sklearn.inspection: Inspection


inspection.partial_dependence(estimator, X, …)	Partial dependence of features.
inspection.permutation_importance(estimator, …)	Permutation importance for feature evaluation [Rd9e56ef97513-BRE].

Plotting


inspection.DecisionBoundaryDisplay(*, xx0, …)	Decisions boundary visualization.
inspection.PartialDependenceDisplay(…[, …])	Partial Dependence Plot (PDP).

sklearn.isotonic: Isotonic regression


isotonic.IsotonicRegression(*[, y_min, …])	Isotonic regression model.
isotonic.check_increasing(x, y)	Determine whether y is monotonically correlated with x.
isotonic.isotonic_regression(y, *[, …])	Solve the isotonic regression model.

sklearn.kernel_approximation: Kernel Approximation


kernel_approximation.AdditiveChi2Sampler(*)	Approximate feature map for additive chi2 kernel.
kernel_approximation.Nystroem([kernel, …])	Approximate a kernel map using a subset of the training data.
kernel_approximation.PolynomialCountSketch(*)	Polynomial kernel approximation via Tensor Sketch.
kernel_approximation.RBFSampler(*[, gamma, …])	Approximate a RBF kernel feature map using random Fourier features.
kernel_approximation.SkewedChi2Sampler(*[, …])	Approximate feature map for “skewed chi-squared” kernel.

sklearn.kernel_ridge: Kernel Ridge Regression


kernel_ridge.KernelRidge([alpha, kernel, …])	Kernel ridge regression.

sklearn.linear_model: Linear Models

Linear classifiers


linear_model.LogisticRegression([penalty, …])	Logistic Regression (aka logit, MaxEnt) classifier.
linear_model.LogisticRegressionCV(*[, Cs, …])	Logistic Regression CV (aka logit, MaxEnt) classifier.
linear_model.PassiveAggressiveClassifier(*)	Passive Aggressive Classifier.
linear_model.Perceptron(*[, penalty, alpha, …])	Linear perceptron classifier.
linear_model.RidgeClassifier([alpha, …])	Classifier using Ridge regression.
linear_model.RidgeClassifierCV([alphas, …])	Ridge classifier with built-in cross-validation.
linear_model.SGDClassifier([loss, penalty, …])	Linear classifiers (SVM, logistic regression, etc.) with SGD training.
linear_model.SGDOneClassSVM([nu, …])	Solves linear One-Class SVM using Stochastic Gradient Descent.

Classical linear regressors


linear_model.LinearRegression(*[, …])	Ordinary least squares Linear Regression.
linear_model.Ridge([alpha, fit_intercept, …])	Linear least squares with l2 regularization.
linear_model.RidgeCV([alphas, …])	Ridge regression with built-in cross-validation.
linear_model.SGDRegressor([loss, penalty, …])	Linear model fitted by minimizing a regularized empirical loss with SGD.

Regressors with variable selection


linear_model.ElasticNet([alpha, l1_ratio, …])	Linear regression with combined L1 and L2 priors as regularizer.
linear_model.ElasticNetCV(*[, l1_ratio, …])	Elastic Net model with iterative fitting along a regularization path.
linear_model.Lars(*[, fit_intercept, …])	Least Angle Regression model a.k.a.
linear_model.LarsCV(*[, fit_intercept, …])	Cross-validated Least Angle Regression model.
linear_model.Lasso([alpha, fit_intercept, …])	Linear Model trained with L1 prior as regularizer (aka the Lasso).
linear_model.LassoCV(*[, eps, n_alphas, …])	Lasso linear model with iterative fitting along a regularization path.
linear_model.LassoLars([alpha, …])	Lasso model fit with Least Angle Regression a.k.a.
linear_model.LassoLarsCV(*[, fit_intercept, …])	Cross-validated Lasso, using the LARS algorithm.
linear_model.LassoLarsIC([criterion, …])	Lasso model fit with Lars using BIC or AIC for model selection.
linear_model.OrthogonalMatchingPursuit(*[, …])	Orthogonal Matching Pursuit model (OMP).
linear_model.OrthogonalMatchingPursuitCV(*)	Cross-validated Orthogonal Matching Pursuit model (OMP).

Bayesian regressors


linear_model.ARDRegression(*[, max_iter, …])	Bayesian ARD regression.
linear_model.BayesianRidge(*[, max_iter, …])	Bayesian ridge regression.

Multi-task linear regressors with variable selection


linear_model.MultiTaskElasticNet([alpha, …])	Multi-task ElasticNet model trained with L1/L2 mixed-norm as regularizer.
linear_model.MultiTaskElasticNetCV(*[, …])	Multi-task L1/L2 ElasticNet with built-in cross-validation.
linear_model.MultiTaskLasso([alpha, …])	Multi-task Lasso model trained with L1/L2 mixed-norm as regularizer.
linear_model.MultiTaskLassoCV(*[, eps, …])	Multi-task Lasso model trained with L1/L2 mixed-norm as regularizer.

Outlier-robust regressors


linear_model.HuberRegressor(*[, epsilon, …])	L2-regularized linear regression model that is robust to outliers.
linear_model.QuantileRegressor(*[, …])	Linear regression model that predicts conditional quantiles.
linear_model.RANSACRegressor([estimator, …])	RANSAC (RANdom SAmple Consensus) algorithm.
linear_model.TheilSenRegressor(*[, …])	Theil-Sen Estimator: robust multivariate regression model.

Generalized linear models (GLM) for regression


linear_model.PoissonRegressor(*[, alpha, …])	Generalized Linear Model with a Poisson distribution.
linear_model.TweedieRegressor(*[, power, …])	Generalized Linear Model with a Tweedie distribution.
linear_model.GammaRegressor(*[, alpha, …])	Generalized Linear Model with a Gamma distribution.

Miscellaneous


linear_model.PassiveAggressiveRegressor(*[, …])	Passive Aggressive Regressor.
linear_model.enet_path(X, y, *[, l1_ratio, …])	Compute elastic net path with coordinate descent.
linear_model.lars_path(X, y[, Xy, Gram, …])	Compute Least Angle Regression or Lasso path using the LARS algorithm [1].
linear_model.lars_path_gram(Xy, Gram, *, …)	The lars_path in the sufficient stats mode [1].
linear_model.lasso_path(X, y, *[, eps, …])	Compute Lasso path with coordinate descent.
linear_model.orthogonal_mp(X, y, *[, …])	Orthogonal Matching Pursuit (OMP).
linear_model.orthogonal_mp_gram(Gram, Xy, *)	Gram Orthogonal Matching Pursuit (OMP).
linear_model.ridge_regression(X, y, alpha, *)	Solve the ridge equation by the method of normal equations.

sklearn.manifold: Manifold Learning


manifold.Isomap(*[, n_neighbors, radius, …])	Isomap Embedding.
manifold.LocallyLinearEmbedding(*[, …])	Locally Linear Embedding.
manifold.MDS([n_components, metric, n_init, …])	Multidimensional scaling.
manifold.SpectralEmbedding([n_components, …])	Spectral embedding for non-linear dimensionality reduction.
manifold.TSNE([n_components, perplexity, …])	T-distributed Stochastic Neighbor Embedding.
manifold.locally_linear_embedding(X, *, …)	Perform a Locally Linear Embedding analysis on the data.
manifold.smacof(dissimilarities, *[, …])	Compute multidimensional scaling using the SMACOF algorithm.
manifold.spectral_embedding(adjacency, *[, …])	Project the sample on the first eigenvectors of the graph Laplacian.
manifold.trustworthiness(X, X_embedded, *[, …])	Indicate to what extent the local structure is retained.

sklearn.metrics: Metrics

Model Selection Interface


metrics.check_scoring(estimator[, scoring, …])	Determine scorer from user options.
metrics.get_scorer(scoring)	Get a scorer from string.
metrics.get_scorer_names()	Get the names of all available scorers.
metrics.make_scorer(score_func, *[, …])	Make a scorer from a performance metric or loss function.

Classification metrics


metrics.accuracy_score(y_true, y_pred, *[, …])	Accuracy classification score.
metrics.auc(x, y)	Compute Area Under the Curve (AUC) using the trapezoidal rule.
metrics.average_precision_score(y_true, …)	Compute average precision (AP) from prediction scores.
metrics.balanced_accuracy_score(y_true, …)	Compute the balanced accuracy.
metrics.brier_score_loss(y_true, y_prob, *)	Compute the Brier score loss.
metrics.class_likelihood_ratios(y_true, …)	Compute binary classification positive and negative likelihood ratios.
metrics.classification_report(y_true, y_pred, *)	Build a text report showing the main classification metrics.
metrics.cohen_kappa_score(y1, y2, *[, …])	Compute Cohen’s kappa: a statistic that measures inter-annotator agreement.
metrics.confusion_matrix(y_true, y_pred, *)	Compute confusion matrix to evaluate the accuracy of a classification.
metrics.dcg_score(y_true, y_score, *[, k, …])	Compute Discounted Cumulative Gain.
metrics.det_curve(y_true, y_score[, …])	Compute error rates for different probability thresholds.
metrics.f1_score(y_true, y_pred, *[, …])	Compute the F1 score, also known as balanced F-score or F-measure.
metrics.fbeta_score(y_true, y_pred, *, beta)	Compute the F-beta score.
metrics.hamming_loss(y_true, y_pred, *[, …])	Compute the average Hamming loss.
metrics.hinge_loss(y_true, pred_decision, *)	Average hinge loss (non-regularized).
metrics.jaccard_score(y_true, y_pred, *[, …])	Jaccard similarity coefficient score.
metrics.log_loss(y_true, y_pred, *[, eps, …])	Log loss, aka logistic loss or cross-entropy loss.
metrics.matthews_corrcoef(y_true, y_pred, *)	Compute the Matthews correlation coefficient (MCC).
metrics.multilabel_confusion_matrix(y_true, …)	Compute a confusion matrix for each class or sample.
metrics.ndcg_score(y_true, y_score, *[, k, …])	Compute Normalized Discounted Cumulative Gain.
metrics.precision_recall_curve(y_true, …)	Compute precision-recall pairs for different probability thresholds.
metrics.precision_recall_fscore_support(…)	Compute precision, recall, F-measure and support for each class.
metrics.precision_score(y_true, y_pred, *[, …])	Compute the precision.
metrics.recall_score(y_true, y_pred, *[, …])	Compute the recall.
metrics.roc_auc_score(y_true, y_score, *[, …])	Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.
metrics.roc_curve(y_true, y_score, *[, …])	Compute Receiver operating characteristic (ROC).
metrics.top_k_accuracy_score(y_true, y_score, *)	Top-k Accuracy classification score.
metrics.zero_one_loss(y_true, y_pred, *[, …])	Zero-one classification loss.

Regression metrics


metrics.explained_variance_score(y_true, …)	Explained variance regression score function.
metrics.max_error(y_true, y_pred)	The max_error metric calculates the maximum residual error.
metrics.mean_absolute_error(y_true, y_pred, *)	Mean absolute error regression loss.
metrics.mean_squared_error(y_true, y_pred, *)	Mean squared error regression loss.
metrics.mean_squared_log_error(y_true, y_pred, *)	Mean squared logarithmic error regression loss.
metrics.median_absolute_error(y_true, y_pred, *)	Median absolute error regression loss.
metrics.mean_absolute_percentage_error(…)	Mean absolute percentage error (MAPE) regression loss.
metrics.r2_score(y_true, y_pred, *[, …])	R^2 (coefficient of determination) regression score function.
metrics.mean_poisson_deviance(y_true, y_pred, *)	Mean Poisson deviance regression loss.
metrics.mean_gamma_deviance(y_true, y_pred, *)	Mean Gamma deviance regression loss.
metrics.mean_tweedie_deviance(y_true, y_pred, *)	Mean Tweedie deviance regression loss.
metrics.d2_tweedie_score(y_true, y_pred, *)	D^2 regression score function, fraction of Tweedie deviance explained.
metrics.mean_pinball_loss(y_true, y_pred, *)	Pinball loss for quantile regression.
metrics.d2_pinball_score(y_true, y_pred, *)	D^2 regression score function, fraction of pinball loss explained.
metrics.d2_absolute_error_score(y_true, …)	D^2 regression score function, fraction of absolute error explained.

Multilabel ranking metrics


metrics.coverage_error(y_true, y_score, *[, …])	Coverage error measure.
metrics.label_ranking_average_precision_score(…)	Compute ranking-based average precision.
metrics.label_ranking_loss(y_true, y_score, *)	Compute Ranking loss measure.

Clustering metrics


metrics.adjusted_mutual_info_score(…[, …])	Adjusted Mutual Information between two clusterings.
metrics.adjusted_rand_score(labels_true, …)	Rand index adjusted for chance.
metrics.calinski_harabasz_score(X, labels)	Compute the Calinski and Harabasz score.
metrics.davies_bouldin_score(X, labels)	Compute the Davies-Bouldin score.
metrics.completeness_score(labels_true, …)	Compute completeness metric of a cluster labeling given a ground truth.
metrics.cluster.contingency_matrix(…[, …])	Build a contingency matrix describing the relationship between labels.
metrics.cluster.pair_confusion_matrix(…)	Pair confusion matrix arising from two clusterings [R9ca8fd06d29a-1].
metrics.fowlkes_mallows_score(labels_true, …)	Measure the similarity of two clusterings of a set of points.
metrics.homogeneity_completeness_v_measure(…)	Compute the homogeneity and completeness and V-Measure scores at once.
metrics.homogeneity_score(labels_true, …)	Homogeneity metric of a cluster labeling given a ground truth.
metrics.mutual_info_score(labels_true, …)	Mutual Information between two clusterings.
metrics.normalized_mutual_info_score(…[, …])	Normalized Mutual Information between two clusterings.
metrics.rand_score(labels_true, labels_pred)	Rand index.
metrics.silhouette_score(X, labels, *[, …])	Compute the mean Silhouette Coefficient of all samples.
metrics.silhouette_samples(X, labels, *[, …])	Compute the Silhouette Coefficient for each sample.
metrics.v_measure_score(labels_true, …[, beta])	V-measure cluster labeling given a ground truth.

Biclustering metrics


metrics.consensus_score(a, b, *[, similarity])	The similarity of two sets of biclusters.

Distance metrics


metrics.DistanceMetric	Uniform interface for fast distance metric functions.

Pairwise metrics


metrics.pairwise.additive_chi2_kernel(X[, Y])	Compute the additive chi-squared kernel between observations in X and Y.
metrics.pairwise.chi2_kernel(X[, Y, gamma])	Compute the exponential chi-squared kernel between X and Y.
metrics.pairwise.cosine_similarity(X[, Y, …])	Compute cosine similarity between samples in X and Y.
metrics.pairwise.cosine_distances(X[, Y])	Compute cosine distance between samples in X and Y.
metrics.pairwise.distance_metrics()	Valid metrics for pairwise_distances.
metrics.pairwise.euclidean_distances(X[, Y, …])	Compute the distance matrix between each pair from a vector array X and Y.
metrics.pairwise.haversine_distances(X[, Y])	Compute the Haversine distance between samples in X and Y.
metrics.pairwise.kernel_metrics()	Valid metrics for pairwise_kernels.
metrics.pairwise.laplacian_kernel(X[, Y, gamma])Compute the laplacian kernel between X and Y.
metrics.pairwise.linear_kernel(X[, Y, …])	Compute the linear kernel between X and Y.
metrics.pairwise.manhattan_distances(X[, Y, …])	Compute the L1 distances between the vectors in X and Y.
metrics.pairwise.nan_euclidean_distances(X)	Calculate the euclidean distances in the presence of missing values.
metrics.pairwise.pairwise_kernels(X[, Y, …])	Compute the kernel between arrays X and optional array Y.
metrics.pairwise.polynomial_kernel(X[, Y, …])	Compute the polynomial kernel between X and Y.
metrics.pairwise.rbf_kernel(X[, Y, gamma])	Compute the rbf (gaussian) kernel between X and Y.
metrics.pairwise.sigmoid_kernel(X[, Y, …])	Compute the sigmoid kernel between X and Y.
metrics.pairwise.paired_euclidean_distances(X, Y)	Compute the paired euclidean distances between X and Y.
metrics.pairwise.paired_manhattan_distances(X, Y)	Compute the paired L1 distances between X and Y.
metrics.pairwise.paired_cosine_distances(X, Y)	Compute the paired cosine distances between X and Y.
metrics.pairwise.paired_distances(X, Y, *[, …])	Compute the paired distances between X and Y.
metrics.pairwise_distances(X[, Y, metric, …])	Compute the distance matrix from a vector array X and optional Y.
metrics.pairwise_distances_argmin(X, Y, *[, …])	Compute minimum distances between one point and a set of points.
metrics.pairwise_distances_argmin_min(X, Y, *)	Compute minimum distances between one point and a set of points.
metrics.pairwise_distances_chunked(X[, Y, …])	Generate a distance matrix chunk by chunk with optional reduction.

Plotting


metrics.ConfusionMatrixDisplay(…[, …])	Confusion Matrix visualization.
metrics.DetCurveDisplay(*, fpr, fnr[, …])	DET curve visualization.
metrics.PrecisionRecallDisplay(precision, …)	Precision Recall visualization.
metrics.PredictionErrorDisplay(*, y_true, y_pred)	Visualization of the prediction error of a regression model.
metrics.RocCurveDisplay(*, fpr, tpr[, …])	ROC Curve visualization.
calibration.CalibrationDisplay(prob_true, …)	Calibration curve (also known as reliability diagram) visualization.

sklearn.mixture: Gaussian Mixture Models


mixture.BayesianGaussianMixture(*[, …])	Variational Bayesian estimation of a Gaussian mixture.
mixture.GaussianMixture([n_components, …])	Gaussian Mixture.

sklearn.model_selection: Model Selection

Splitter Classes


model_selection.GroupKFold([n_splits])	K-fold iterator variant with non-overlapping groups.
model_selection.GroupShuffleSplit([…])	Shuffle-Group(s)-Out cross-validation iterator
model_selection.KFold([n_splits, shuffle, …])	K-Folds cross-validator
model_selection.LeaveOneGroupOut()	Leave One Group Out cross-validator
model_selection.LeavePGroupsOut(n_groups)	Leave P Group(s) Out cross-validator
model_selection.LeaveOneOut()	Leave-One-Out cross-validator
model_selection.LeavePOut(p)	Leave-P-Out cross-validator
model_selection.PredefinedSplit(test_fold)	Predefined split cross-validator
model_selection.RepeatedKFold(*[, n_splits, …])	Repeated K-Fold cross validator.
model_selection.RepeatedStratifiedKFold(*[, …])	Repeated Stratified K-Fold cross validator.
model_selection.ShuffleSplit([n_splits, …])	Random permutation cross-validator
model_selection.StratifiedKFold([n_splits, …])	Stratified K-Folds cross-validator.
model_selection.StratifiedShuffleSplit([…])	Stratified ShuffleSplit cross-validator
model_selection.StratifiedGroupKFold([…])	Stratified K-Folds iterator variant with non-overlapping groups.
model_selection.TimeSeriesSplit([n_splits, …])	Time Series cross-validator

Splitter Functions


model_selection.check_cv([cv, y, classifier])	Input checker utility for building a cross-validator.
model_selection.train_test_split(*arrays[, …])	Split arrays or matrices into random train and test subsets.

Hyper-parameter optimizers


model_selection.GridSearchCV(estimator, …)	Exhaustive search over specified parameter values for an estimator.
model_selection.HalvingGridSearchCV(…[, …])	Search over specified parameter values with successive halving.
model_selection.ParameterGrid(param_grid)	Grid of parameters with a discrete number of values for each.
model_selection.ParameterSampler(…[, …])	Generator on parameters sampled from given distributions.
model_selection.RandomizedSearchCV(…[, …])	Randomized search on hyper parameters.
model_selection.HalvingRandomSearchCV(…[, …])	Randomized search on hyper parameters.

Model validation


model_selection.cross_validate(estimator, X)	Evaluate metric(s) by cross-validation and also record fit/score times.
model_selection.cross_val_predict(estimator, X)	Generate cross-validated estimates for each input data point.
model_selection.cross_val_score(estimator, X)	Evaluate a score by cross-validation.
model_selection.learning_curve(estimator, X, …)	Learning curve.
model_selection.permutation_test_score(…)	Evaluate the significance of a cross-validated score with permutations.
model_selection.validation_curve(estimator, …)	Validation curve.

Visualization


model_selection.LearningCurveDisplay(*, …)	Learning Curve visualization.
model_selection.ValidationCurveDisplay(*, …)	Validation Curve visualization.

sklearn.multiclass: Multiclass classification


multiclass.OneVsRestClassifier(estimator, *)	One-vs-the-rest (OvR) multiclass strategy.
multiclass.OneVsOneClassifier(estimator, *)	One-vs-one multiclass strategy.
multiclass.OutputCodeClassifier(estimator, *)	(Error-Correcting) Output-Code multiclass strategy.

sklearn.multioutput: Multioutput regression and classification


multioutput.ClassifierChain(base_estimator, *)	A multi-label model that arranges binary classifiers into a chain.
multioutput.MultiOutputRegressor(estimator, *)	Multi target regression.
multioutput.MultiOutputClassifier(estimator, *)	Multi target classification.
multioutput.RegressorChain(base_estimator, *)	A multi-label model that arranges regressions into a chain.

sklearn.naive_bayes: Naive Bayes


naive_bayes.BernoulliNB(*[, alpha, …])	Naive Bayes classifier for multivariate Bernoulli models.
naive_bayes.CategoricalNB(*[, alpha, …])	Naive Bayes classifier for categorical features.
naive_bayes.ComplementNB(*[, alpha, …])	The Complement Naive Bayes classifier described in Rennie et al. (2003).
naive_bayes.GaussianNB(*[, priors, …])	Gaussian Naive Bayes (GaussianNB).
naive_bayes.MultinomialNB(*[, alpha, …])	Naive Bayes classifier for multinomial models.

sklearn.neighbors: Nearest Neighbors


neighbors.BallTree(X[, leaf_size, metric])	BallTree for fast generalized N-point problems
neighbors.KDTree(X[, leaf_size, metric])	KDTree for fast generalized N-point problems
neighbors.KernelDensity(*[, bandwidth, …])	Kernel Density Estimation.
neighbors.KNeighborsClassifier([…])	Classifier implementing the k-nearest neighbors vote.
neighbors.KNeighborsRegressor([n_neighbors, …])	Regression based on k-nearest neighbors.
neighbors.KNeighborsTransformer(*[, mode, …])	Transform X into a (weighted) graph of k nearest neighbors.
neighbors.LocalOutlierFactor([n_neighbors, …])	Unsupervised Outlier Detection using the Local Outlier Factor (LOF).
neighbors.RadiusNeighborsClassifier([…])	Classifier implementing a vote among neighbors within a given radius.
neighbors.RadiusNeighborsRegressor([radius, …])	Regression based on neighbors within a fixed radius.
neighbors.RadiusNeighborsTransformer(*[, …])	Transform X into a (weighted) graph of neighbors nearer than a radius.
neighbors.NearestCentroid([metric, …])	Nearest centroid classifier.
neighbors.NearestNeighbors(*[, n_neighbors, …])	Unsupervised learner for implementing neighbor searches.
neighbors.NeighborhoodComponentsAnalysis([…])	Neighborhood Components Analysis.
neighbors.kneighbors_graph(X, n_neighbors, *)	Compute the (weighted) graph of k-Neighbors for points in X.
neighbors.radius_neighbors_graph(X, radius, *)	Compute the (weighted) graph of Neighbors for points in X.
neighbors.sort_graph_by_row_values(graph[, …])	Sort a sparse graph such that each row is stored with increasing values.

sklearn.neural_network: Neural network models


pipeline.FeatureUnion(transformer_list, *[, …])	Concatenates results of multiple transformer objects.
pipeline.Pipeline(steps, *[, memory, verbose])	Pipeline of transforms with a final estimator.
pipeline.make_pipeline(*steps[, memory, verbose])	Construct a Pipeline from the given estimators.
pipeline.make_union(*transformers[, n_jobs, …])	Construct a FeatureUnion from the given transformers.

sklearn.pipeline: Pipeline

see here

sklearn.preprocessing: Preprocessing and Normalization


preprocessing.Binarizer(*[, threshold, copy])	Binarize data (set feature values to 0 or 1) according to a threshold.
preprocessing.FunctionTransformer([func, …])	Constructs a transformer from an arbitrary callable.
preprocessing.KBinsDiscretizer([n_bins, …])	Bin continuous data into intervals.
preprocessing.KernelCenterer()	Center an arbitrary kernel matrix
preprocessing.LabelBinarizer(*[, neg_label, …])	Binarize labels in a one-vs-all fashion.
preprocessing.LabelEncoder()	Encode target labels with value between 0 and n_classes-1.v
preprocessing.MultiLabelBinarizer(*[, …])	Transform between iterable of iterables and a multilabel format.
preprocessing.MaxAbsScaler(*[, copy])	Scale each feature by its maximum absolute value.
preprocessing.MinMaxScaler([feature_range, …])	Transform features by scaling each feature to a given range.
preprocessing.Normalizer([norm, copy])	Normalize samples individually to unit norm.
preprocessing.OneHotEncoder(*[, categories, …])	Encode categorical features as a one-hot numeric array.
preprocessing.OrdinalEncoder(*[, …])	Encode categorical features as an integer array.
preprocessing.PolynomialFeatures([degree, …])	Generate polynomial and interaction features.
preprocessing.PowerTransformer([method, …])	Apply a power transform featurewise to make data more Gaussian-like.
preprocessing.QuantileTransformer(*[, …])	Transform features using quantiles information.
preprocessing.RobustScaler(*[, …])	Scale features using statistics that are robust to outliers.
preprocessing.SplineTransformer([n_knots, …])	Generate univariate B-spline bases for features.
preprocessing.StandardScaler(*[, copy, …])	Standardize features by removing the mean and scaling to unit variance.
preprocessing.TargetEncoder([categories, …])	Target Encoder for regression and classification targets.
preprocessing.add_dummy_feature(X[, value])	Augment dataset with an additional dummy feature.
preprocessing.binarize(X, *[, threshold, copy])	Boolean thresholding of array-like or scipy.sparse matrix.
preprocessing.label_binarize(y, *, classes)	Binarize labels in a one-vs-all fashion.
preprocessing.maxabs_scale(X, *[, axis, copy])	Scale each feature to the [-1, 1] range without breaking the sparsity.
preprocessing.minmax_scale(X[, …])	Transform features by scaling each feature to a given range.
preprocessing.normalize(X[, norm, axis, …])	Scale input vectors individually to unit norm (vector length).
preprocessing.quantile_transform(X, *[, …])	Transform features using quantiles information.
preprocessing.robust_scale(X, *[, axis, …])	Standardize a dataset along any axis.
preprocessing.scale(X, *[, axis, with_mean, …])	Standardize a dataset along any axis.
preprocessing.power_transform(X[, method, …])	Parametric, monotonic transformation to make data more Gaussian-like.

sklearn.random_projection: Random projection


random_projection.GaussianRandomProjection([…])	Reduce dimensionality through Gaussian random projection.
random_projection.SparseRandomProjection([…])	Reduce dimensionality through sparse random projection.
random_projection.johnson_lindenstrauss_min_dim(…)	Find a ‘safe’ number of components to randomly project to.

sklearn.semi_supervised: Semi-Supervised Learning


semi_supervised.LabelPropagation([kernel, …])	Label Propagation classifier.
semi_supervised.LabelSpreading([kernel, …])	LabelSpreading model for semi-supervised learning.
semi_supervised.SelfTrainingClassifier(…)	Self-training classifier.

sklearn.svm: Support Vector Machines


svm.LinearSVC([penalty, loss, dual, tol, C, …])	Linear Support Vector Classification.
*svm.LinearSVR([, epsilon, tol, C, loss, …])**	Linear Support Vector Regression.
svm.NuSVC(*[, nu, kernel, degree, gamma, …])	Nu-Support Vector Classification.
svm.NuSVR(*[, nu, C, kernel, degree, gamma, …])	Nu Support Vector Regression.
*svm.OneClassSVM([, kernel, degree, gamma, …])**	Unsupervised Outlier Detection.
*svm.SVC([, C, kernel, degree, gamma, …])**	C-Support Vector Classification.
*svm.SVR([, kernel, degree, gamma, coef0, …])**	Epsilon-Support Vector Regression.
svm.l1_min_c(X, y, *[, loss, fit_intercept, …])	Return the lowest bound for C.

sklearn.tree: Decision Trees


tree.DecisionTreeClassifier(*[, criterion, …])	A decision tree classifier.
tree.DecisionTreeRegressor(*[, criterion, …])	A decision tree regressor.
tree.ExtraTreeClassifier(*[, criterion, …])	An extremely randomized tree classifier.
tree.ExtraTreeRegressor(*[, criterion, …])	An extremely randomized tree regressor.
tree.export_graphviz(decision_tree[, …])	Export a decision tree in DOT format.
tree.export_text(decision_tree, *[, …])	Build a text report showing the rules of a decision tree.
tree.plot_tree(decision_tree, *[, …])	Plot a decision tree.

sklearn.utils: Utilities

see here

The source code is Open Source and can be found on GitHub.