## Python models

scikit-learn has on the order of 100 to 200 models (more generally called "estimators"), split into three categories:

- Supervised Learning (linear regression, support vector machines, random forest, neural nets, ...)
- Unsupervised Learning (clustering, PCA, mixture models, manifold learning, ...)
- Dataset Transformation (preprocessing, text feature extraction, one-hot encoding, ...)

All of those estimators will work with ScikitLearn.jl. They are imported with
`@sk_import`

. For example, here's how to import and fit
`sklearn.linear_regression.LogisticRegression`

```
using ScikitLearn
@sk_import linear_model: LogisticRegression
log_reg = fit!(LogisticRegression(penalty="l1"), X_train, y_train)
predict(X_test)
```

Reminder: `?LogisticRegression`

contains a lot of information about the model
parameters.

#### Installation

Importing the Python models requires Python 2.7 with numpy, and the
scikit-learn library. This is easiest to get through Conda.jl, which is already
installed on your system. Calling `@sk_import linear_model: LinearRegression`

should automatically install everything. You can also install `scikit-learn`

manually with `Conda.add("scikit-learn")`

. If you have other issues, please
refer to PyCall.jl, or
post an issue

## Julia models

Julia models are hosted in other packages, and need to be installed separately
with `Pkg.add`

or `Pkg.checkout`

(to get the latest version - sometimes
necessary). They all implement the common api, and provide
hyperparameter information in their `?docstrings`

.

Unfortunately, some packages export a `fit!`

function that conflicts with
ScikitLearn's `fit!`

. This can be fixed by adding this line:

```
using ScikitLearn: fit!, predict
```

### ScikitLearn models

`ScikitLearn.Models.LinearRegression()`

implements linear regression using`\`

, optimized for speed. See`?LinearRegression`

for fitting options.

### GaussianMixtures.jl

```
Pkg.checkout("GaussianMixtures.jl") # install the package
using GaussianMixtures: GMM
using ScikitLearn
gmm = fit!(GMM(n_components=3, # number of Gaussians to fit
kind=:diag), # diagonal covariance matrix (other option: :full)
X)
predict_proba(gmm, X)
```

Documentation at GaussianMixtures.jl. Example: density estimation

### GaussianProcesses.jl

```
Pkg.checkout("GaussianProcesses.jl") # install the package
using GaussianProcesses: GP
using ScikitLearn
gp = fit!(GP(; m=MeanZero(), k=SE(0.0, 0.0), logNoise=-1e8),
X, y)
predict(gp, X)
```

Documentation at GaussianProcesses.jl and in the `?GP`

docstring. Example: Gaussian Processes

Gaussian Processes have a lot of hyperparameters, see `get_params(GP)`

for a list. They can all be tuned

### DecisionTree.jl

`DecisionTreeClassifier`

`DecisionTreeRegressor`

`RandomForestClassifier`

`RandomForestRegressor`

`AdaBoostStumpClassifier`

Documentation at DecisionTree.jl. Examples: Classifier Comparison, Decision Tree Regression notebooks.

### LowRankModels.jl

`SkGLRM`

: Generalized Low Rank Model`PCA`

: Principal Component Analysis`QPCA`

: Quadratically Regularized PCA`RPCA`

: Robust PCA`NNMF`

: Non-negative matrix factorization`KMeans`

: The k-means algorithm

Please note that these algorithms are all special cases of the Generalized Low Rank Model algorithm, whose main goal is to provide flexible loss and regularization for heterogeneous data. Specialized algorithms will achieve faster convergence in general.

Documentation at LowRankModels.jl. Example: KMeans Digit Classifier.

### Contributing

To make your Julia model compatible with ScikitLearn.jl, you need to implement the scikit-learn interface. See ScikitLearnBase.jl