## Quick start guide

Let's build a classifier for the classic iris dataset. If you don't have RDatasets installed, run `Pkg.add("RDatasets")`.

```julia
using RDatasets: dataset

iris = dataset("datasets", "iris")

# ScikitLearn.jl expects arrays, but DataFrames can also be used - see
# the corresponding section of the manual
X = convert(Array, iris[[:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]])
y = convert(Array, iris[:Species])
```

Next, we import the `LogisticRegression` model from scikit-learn.

```julia
using ScikitLearn

# This model requires the Python scikit-learn library. See
# http://scikitlearnjl.readthedocs.io/en/latest/models/#installation
@sk_import linear_model: LogisticRegression
```

Every model's constructor accepts hyperparameters (such as regularization strength, whether to fit the intercept, the penalty type, etc.) as keyword arguments. Check out `?LogisticRegression` for details.

```julia
model = LogisticRegression(fit_intercept=true)
```
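Hyperparameters can also be inspected and changed after construction. A minimal sketch using ScikitLearn.jl's `get_params` / `set_params!` (which mirror the scikit-learn convention):

```julia
# Inspect the current hyperparameters as a Dict
params = get_params(model)

# Change the regularization strength in place
set_params!(model, C=0.5)
```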

Then we train the model and evaluate its accuracy on the training set:

```julia
fit!(model, X, y)

accuracy = sum(predict(model, X) .== y) / length(y)
println("accuracy: $accuracy")

> accuracy: 0.96
```
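Equivalently, ScikitLearn.jl models support `score`, which for classifiers returns the mean accuracy on the given data:

```julia
# score(model, X, y) computes mean accuracy for classifiers
accuracy = score(model, X, y)
```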

### Cross-validation

This trains five models on five train/test splits of `X` and `y`, and returns the test-set accuracy of each:

```julia
using ScikitLearn.CrossValidation: cross_val_score

cross_val_score(LogisticRegression(), X, y; cv=5)  # 5-fold

> 5-element Array{Float64,1}:
>  1.0
>  0.966667
>  0.933333
>  0.9
>  1.0
```
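A common way to summarize these per-fold scores is to report their mean and standard deviation (on Julia ≥ 0.7, `mean` and `std` live in the `Statistics` standard library):

```julia
scores = cross_val_score(LogisticRegression(), X, y; cv=5)
println("accuracy: $(mean(scores)) ± $(std(scores))")
```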

See this tutorial for more information.

### Hyperparameter tuning

`LogisticRegression` has a regularization-strength parameter `C` (smaller is stronger). We can use grid search algorithms to find the optimal `C`.

`GridSearchCV` will try every value of `C` in `0.1:0.1:2.0` and select the one with the highest cross-validation score.

```julia
using ScikitLearn.GridSearch: GridSearchCV

gridsearch = GridSearchCV(LogisticRegression(), Dict(:C => 0.1:0.1:2.0))
fit!(gridsearch, X, y)
println("Best parameters: $(gridsearch.best_params_)")

> Best parameters: Dict{Symbol,Any}(:C=>1.1)
```
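Beyond `best_params_`, the fitted search object exposes further attributes mirroring scikit-learn's `GridSearchCV`; a sketch assuming the `best_score_` and `best_estimator_` fields:

```julia
# Best cross-validation score found during the search
println("Best score: $(gridsearch.best_score_)")

# The best model, which can be used for prediction directly
best_model = gridsearch.best_estimator_
```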

Finally, we plot cross-validation accuracy vs. `C`:

```julia
using PyPlot

plot([cv_res.parameters[:C] for cv_res in gridsearch.grid_scores_],
     [mean(cv_res.cv_validation_scores) for cv_res in gridsearch.grid_scores_])
```

### Saving the model to disk

Both Python and Julia models can be saved to disk with JLD (`PyCallJLD` enables serializing the underlying Python objects):

```julia
import JLD, PyCallJLD

JLD.save("my_model.jld", "model", model)
model = JLD.load("my_model.jld", "model")    # Load it back
```