Python models
scikit-learn has on the order of 100 to 200 models (more generally called "estimators"), split into three categories:
- Supervised Learning (linear regression, support vector machines, random forest, neural nets, ...)
- Unsupervised Learning (clustering, PCA, mixture models, manifold learning, ...)
- Dataset Transformation (preprocessing, text feature extraction, one-hot encoding, ...)
All of these estimators work with ScikitLearn.jl. They are imported with `@sk_import`. For example, here's how to import and fit `sklearn.linear_model.LogisticRegression`:
```julia
using ScikitLearn
@sk_import linear_model: LogisticRegression

log_reg = fit!(LogisticRegression(penalty="l1"), X_train, y_train)
predict(log_reg, X_test)
```
Reminder: `?LogisticRegression` contains a lot of information about the model parameters.
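Class probabilities are available in the same way. A minimal sketch, reusing `log_reg` and `X_test` from the example above:

```julia
# Posterior probability of each class, one column per class
probs = predict_proba(log_reg, X_test)
```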
Installation
Importing the Python models requires Python 2.7 with numpy, and the scikit-learn library. This is easiest to get through Conda.jl, which is already installed on your system. Calling `@sk_import linear_model: LinearRegression` should automatically install everything. You can also install scikit-learn manually with `Conda.add("scikit-learn")`. If you have other issues, please refer to PyCall.jl, or post an issue.
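For reference, a minimal sketch of the manual installation mentioned above:

```julia
using Conda
Conda.add("scikit-learn")   # installs scikit-learn into Conda.jl's Python environment
```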
Julia models
Julia models are hosted in other packages, and need to be installed separately with `Pkg.add` or `Pkg.checkout` (to get the latest version, which is sometimes necessary). They all implement the common API, and provide hyperparameter information in their `?docstrings`.
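For example, to install one of the packages listed below (DecisionTree is used here purely as an illustration):

```julia
Pkg.add("DecisionTree")        # install the registered release
Pkg.checkout("DecisionTree")   # optionally switch to the latest development version
```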
Unfortunately, some packages export a `fit!` function that conflicts with ScikitLearn's `fit!`. This can be fixed by adding this line:

```julia
using ScikitLearn: fit!, predict
```
ScikitLearn models
`ScikitLearn.Models.LinearRegression()` implements linear regression using Julia's `\` operator, optimized for speed. See `?LinearRegression` for fitting options.
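A minimal sketch with synthetic data (the default constructor is used here; consult `?LinearRegression` for the available options):

```julia
using ScikitLearn

# Synthetic data: y ≈ 2*x1 - x2 plus a little noise
X = randn(100, 2)
y = 2 .* X[:, 1] .- X[:, 2] .+ 0.1 .* randn(100)

lr = fit!(ScikitLearn.Models.LinearRegression(), X, y)
y_pred = predict(lr, X)
```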
GaussianMixtures.jl
Pkg.checkout("GaussianMixtures.jl") # install the package
using GaussianMixtures: GMM
using ScikitLearn
gmm = fit!(GMM(n_components=3, # number of Gaussians to fit
kind=:diag), # diagonal covariance matrix (other option: :full)
X)
predict_proba(gmm, X)
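The returned matrix has one row per sample and (by the usual convention) one column per mixture component, so hard cluster assignments can be recovered from it. A minimal sketch:

```julia
probs = predict_proba(gmm, X)    # posterior probability of each component for each sample
assignments = [findmax(probs[i, :])[2] for i in 1:size(probs, 1)]   # most likely component
```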
Documentation at GaussianMixtures.jl. Example: density estimation.
GaussianProcesses.jl
Pkg.checkout("GaussianProcesses.jl") # install the package
using GaussianProcesses: GP
using ScikitLearn
gp = fit!(GP(; m=MeanZero(), k=SE(0.0, 0.0), logNoise=-1e8),
X, y)
predict(gp, X)
Documentation at GaussianProcesses.jl and in the `?GP` docstring. Example: Gaussian Processes.

Gaussian Processes have a lot of hyperparameters; see `get_params(GP)` for a list. They can all be tuned.
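A minimal sketch of tuning one of them with ScikitLearn.jl's grid search, assuming the `X` and `y` from the example above (the `logNoise` candidates are purely illustrative):

```julia
using ScikitLearn
using ScikitLearn.GridSearch: GridSearchCV
using GaussianProcesses: GP, MeanZero, SE

# Try a few noise levels and keep the best cross-validated one
gridsearch = GridSearchCV(GP(; m=MeanZero(), k=SE(0.0, 0.0), logNoise=-1.0),
                          Dict(:logNoise => [-3.0, -1.0, 1.0]))
fit!(gridsearch, X, y)
gridsearch.best_params_   # best hyperparameter setting found
```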
DecisionTree.jl
- DecisionTreeClassifier
- DecisionTreeRegressor
- RandomForestClassifier
- RandomForestRegressor
- AdaBoostStumpClassifier
Documentation at DecisionTree.jl. Examples: Classifier Comparison, Decision Tree Regression notebooks.
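A minimal sketch of fitting one of these through the common API, assuming `X_train`, `y_train`, and `X_test` splits (the `max_depth` value is illustrative; see `?DecisionTreeClassifier` for the full list of hyperparameters):

```julia
using ScikitLearn
using ScikitLearn: fit!, predict        # avoids any fit!/predict name clash (see above)
using DecisionTree: DecisionTreeClassifier

clf = fit!(DecisionTreeClassifier(max_depth=4), X_train, y_train)
y_pred = predict(clf, X_test)
```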
LowRankModels.jl
- SkGLRM: Generalized Low Rank Model
- PCA: Principal Component Analysis
- QPCA: Quadratically Regularized PCA
- RPCA: Robust PCA
- NNMF: Non-negative matrix factorization
- KMeans: The k-means algorithm
Please note that these algorithms are all special cases of the Generalized Low Rank Model algorithm, whose main goal is to provide flexible loss and regularization for heterogeneous data. Specialized algorithms will achieve faster convergence in general.
Documentation at LowRankModels.jl. Example: KMeans Digit Classifier.
Contributing
To make your Julia model compatible with ScikitLearn.jl, you need to implement the scikit-learn interface. See ScikitLearnBase.jl.
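As a rough illustration of what that involves, here is a minimal sketch of a toy regressor implementing the interface (the type, its fields, and its logic are invented for this example):

```julia
import ScikitLearnBase

# A toy estimator: predicts the training-set mean of y, shifted by a bias hyperparameter.
mutable struct MeanRegressor
    bias::Float64      # hyperparameter (set by the user)
    mean_y::Float64    # learned parameter (set by fit!)
    MeanRegressor(; bias=0.0) = new(bias)
end

# Defines clone, get_params and set_params! from the list of hyperparameters
ScikitLearnBase.@declare_hyperparameters(MeanRegressor, [:bias])

function ScikitLearnBase.fit!(model::MeanRegressor, X, y)
    model.mean_y = sum(y) / length(y)
    return model        # fit! must return the estimator
end

ScikitLearnBase.predict(model::MeanRegressor, X) =
    fill(model.mean_y + model.bias, size(X, 1))
```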