Most data science and machine learning problems involve several steps of data preprocessing and transformation. ScikitLearn.jl provides two types to facilitate this task.
Pipeline can be used to chain multiple estimators into one. This is useful as there is often a fixed sequence of steps in processing the data, for example feature selection, normalization and classification.
using ScikitLearn using ScikitLearn.Pipelines: Pipeline, make_pipeline @sk_import decomposition: PCA estimators = [("reduce_dim", PCA()), ("logistic_regression", LogisticRegression())] clf = Pipeline(estimators) fit!(clf, X, y)
?make_pipeline and the user guide for details.
- Pipelining: chaining a PCA and a logistic regression
- Restricted Boltzmann Machine features for digit classification
FeatureUnion combines several transformer objects into a new transformer that combines their output. A FeatureUnion takes a list of transformer objects. During fitting, each of these is fit to the data independently. For transforming data, the transformers are applied in parallel, and the sample vectors they output are concatenated end-to-end into larger vectors.
using ScikitLearn.Pipelines: FeatureUnion @sk_import decomposition: (PCA, KernelPCA) estimators = [("linear_pca", PCA()), ("kernel_pca", KernelPCA())] combined = FeatureUnion(estimators)
?FeatureUnion and the user guide for more.