Auto ML

Ignacio Ruiz
3 min read, Aug 28, 2021

Using Auto ML with auto-sklearn

Hello again! Today I want to show you how to build a classifier with auto-sklearn, an AutoML library built on top of scikit-learn. For this example, I will be using a cleaned version of the Titanic dataset. You can use the original dataset as well; all I did was remove the NaN values, convert categorical values to booleans, and drop any remaining string columns.
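The cleaning step itself isn't shown here, so below is a minimal sketch of one way to do it with pandas, assuming the column names of the Kaggle Titanic training file (`Survived`, `Sex`, `Embarked`, `Cabin`, `Name`, ...). The sample rows, the hypothetical `clean_titanic` helper, and the choice to drop `Cabin` and `Name` are my own assumptions, not necessarily what produced `cleaned_titanic.csv`.

```python
import numpy as np
import pandas as pd

# A few made-up rows shaped like the raw Kaggle Titanic data (hypothetical values)
raw = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 3],
    "Name": ["A", "B", "C", "D"],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, np.nan, 35.0],
    "Fare": [7.25, 71.28, 7.93, 8.05],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Embarked": ["S", "C", "S", np.nan],
})

def clean_titanic(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop NaN rows, encode categoricals as booleans, drop strings."""
    # Mostly-empty or free-text columns are dropped outright (an assumed choice)
    df = df.drop(columns=["Cabin", "Name"])
    # Remove any rows that still contain missing values
    df = df.dropna()
    # One-hot encode the categorical string columns into boolean indicator columns
    df = pd.get_dummies(df, columns=["Sex", "Embarked"], dtype=bool)
    # Drop any string (object) columns that might remain
    return df.select_dtypes(exclude="object")

clean = clean_titanic(raw)
print(clean)
```

Keep in mind that `dropna()` discards whole rows, so on the full dataset you may prefer to fill in missing ages rather than lose those passengers.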

Before we start, you can check out my previous blog post on how to install the package. As a reminder, this package may conflict with libraries you already have installed, so check for conflicts before proceeding.

We will start by loading the dataset into our notebook and taking a look at it.
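If you want to try the loading step without the file, here is a small sketch that reads a tiny in-memory CSV standing in for `cleaned_titanic.csv` (the rows and columns are made up). It also runs the quick checks worth doing before handing data to auto-sklearn: the first rows, the column types, and a count of missing values.

```python
import io
import pandas as pd

# Stand-in for 'cleaned_titanic.csv' so the example runs on its own (hypothetical rows)
csv_text = """Survived,Pclass,Age,SibSp,Parch,Fare,Sex_male
0,3,22.0,1,0,7.25,True
1,1,38.0,1,0,71.2833,False
1,3,26.0,0,0,7.925,False
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first few rows
print(df.dtypes)   # every column should be numeric or boolean

# Columns that are neither numeric nor boolean (e.g. leftover strings)
non_numeric = df.select_dtypes(exclude=["number", "bool"]).columns.tolist()
missing = int(df.isna().sum().sum())
print("non-numeric columns:", non_numeric)
print("missing values:", missing)
```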

Awesome! This looks great. Now we'll load our libraries and split the data with train_test_split().
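Here is a small, self-contained sketch of what `train_test_split()` does, using a made-up frame with the same `Survived` target. With `test_size` left unset, scikit-learn holds out 25% of the rows for testing, and `random_state=1` makes the shuffle repeatable. The `stratify=y` variant is an optional extra, not part of the original walkthrough.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 40 passengers, 12 of whom survived (30%)
df = pd.DataFrame({
    "Survived": [1] * 12 + [0] * 28,
    "Pclass": [3, 1, 2, 3] * 10,
    "Fare": [float(i) for i in range(40)],
})

y = df["Survived"]               # target
X = df.drop(columns="Survived")  # features

# Default split: 25% of rows go to the test set; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
print(X_train.shape, X_test.shape)

# Optional: stratify=y keeps the survival rate the same in both splits
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X, y, random_state=1, stratify=y
)
print(ys_train.mean(), ys_test.mean())
```

Stratifying is worth considering on Titanic, since survivors are the minority class and a plain random split can shift the ratio between train and test.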

After that, we will instantiate our model and train it. I will be running the model with its default parameters, but we will explore them in future blogs.

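When it comes to training, auto-sklearn searches for a predetermined amount of time: the time_left_for_this_task parameter sets the total budget in seconds and defaults to 3600, which is why my model ran for one hour. After training finished, I generated predictions on the test set, and finally, it's time for scoring.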
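Scoring comes down to comparing `y_test` with `y_hat`. This sketch uses hypothetical labels to show how the confusion matrix from `sklearn.metrics.confusion_matrix` relates to the accuracy score: accuracy is the sum of the matrix's diagonal divided by the total number of predictions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true labels and predictions (0 = did not survive, 1 = survived)
y_test = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_hat = np.array([0, 1, 1, 1, 0, 0, 0, 1])

cm = confusion_matrix(y_test, y_hat)
# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = cm.ravel()

# Accuracy is the share of predictions that land on the diagonal
manual_accuracy = (tn + tp) / cm.sum()
print(cm)
print("accuracy:", accuracy_score(y_test, y_hat), manual_accuracy)
```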

Looking at our results, auto-sklearn reached a high accuracy score with minimal effort! This is a great tool to have under your belt, so make sure to check it out. In the next blog post, we will explore it in more depth.
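One way to judge whether an accuracy score is impressive is to compare it with a trivial baseline. The sketch below uses scikit-learn's `DummyClassifier` with `strategy="most_frequent"`, which always predicts the majority class. The labels are hypothetical, chosen so that roughly 62% of passengers did not survive, close to the death rate in the Kaggle Titanic training data.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical labels: most passengers did not survive (0)
y_train = np.array([0] * 62 + [1] * 38)
y_test = np.array([0] * 13 + [1] * 7)
# This baseline ignores the features, so placeholder zeros are enough
X_train = np.zeros((len(y_train), 1))
X_test = np.zeros((len(y_test), 1))

# Always predicts the most common class seen during training
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
print("majority-class baseline accuracy:", baseline_acc)
```

A model worth keeping should clearly beat this number when both are scored on the same test split.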

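import pandas as pd
import sklearn.model_selection
import sklearn.metrics
import autosklearn.classification
import seaborn as sns
import matplotlib.pyplot as plt

# Load the cleaned Titanic data and look at the first rows
df = pd.read_csv('cleaned_titanic.csv')
print(df.head())

# Separate the target (Survived) from the features, then split
y = df.Survived
X = df.drop('Survived', axis=1)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, random_state=1)

# Default settings: the search runs for up to one hour (time_left_for_this_task=3600)
automl = autosklearn.classification.AutoSklearnClassifier()
automl.fit(X_train, y_train)

# Predict on the held-out test set and score
y_hat = automl.predict(X_test)
print("Accuracy score", sklearn.metrics.accuracy_score(y_test, y_hat))

# Plot the confusion matrix; fmt="d" shows raw counts instead of scientific notation
cf_matrix = sklearn.metrics.confusion_matrix(y_test, y_hat)
sns.heatmap(cf_matrix, annot=True, fmt="d")
plt.show()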
