Using AutoML with Auto-sklearn
Hello again! Today I want to show you how to build a classifier with auto-sklearn, an AutoML toolkit built on top of scikit-learn. For this example, I will be using a cleaned version of the Titanic dataset. You can use the original dataset as well. All I have done is remove the NaN values, convert some columns to boolean values, and remove any string columns.
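The exact cleaning steps aren't shown in the post, so here is a minimal sketch of a comparable pass in pandas. The miniature DataFrame and the `clean_titanic` helper are hypothetical, and encoding `Sex` as an `is_male` flag is an assumption about which column became boolean.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the raw Titanic data (the real file has more rows and columns)
raw = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 1],
    "Name": ["A", "B", "C", "D"],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, np.nan, 35.0],
    "Fare": [7.25, 71.28, 7.92, 53.10],
})

def clean_titanic(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop NaN rows, encode Sex as boolean, keep only numeric/bool columns."""
    out = df.dropna().copy()
    # Encode the two-valued 'Sex' column as a boolean flag
    out["is_male"] = out["Sex"] == "male"
    # Keep numeric and boolean columns only, which drops 'Name' and the original 'Sex'
    return out.select_dtypes(include=["number", "bool"])

clean = clean_titanic(raw)
print(clean)
```

Selecting numeric and boolean columns, rather than excluding a specific string dtype, keeps the sketch working whichever dtype pandas assigns to text columns.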
Before we start, you can check out my previous blog on how to install the package. As a reminder, auto-sklearn depends on specific versions of packages such as scikit-learn, so it may conflict with libraries you already have installed. Check for conflicts before proceeding.
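One quick way to look for conflicts is to list the installed versions of the packages auto-sklearn relies on and compare them against its requirements. This is a minimal sketch using the standard library's `importlib.metadata`. The `installed_versions` helper is hypothetical, and the package list is an assumption, not auto-sklearn's full dependency set.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages):
    """Hypothetical helper: map each distribution name to its installed version, or None."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed in the current environment
            found[name] = None
    return found

# Assumed list of packages worth checking; compare the output with auto-sklearn's requirements
print(installed_versions(["auto-sklearn", "scikit-learn", "numpy", "pandas"]))
```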
We will start by loading the dataset into our notebook and taking a look at the first few rows.
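Besides eyeballing `df.head()`, it can help to confirm that the cleaned data really has no missing values and no text columns before handing it to auto-sklearn. This sketch reads a small inline CSV through `io.StringIO` as a stand-in for `cleaned_titanic.csv`. The column names and values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for cleaned_titanic.csv so the sketch runs without the file
csv_text = """Survived,Pclass,Age,Fare,is_male
0,3,22.0,7.25,True
1,1,38.0,71.28,False
0,1,35.0,53.1,True
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())  # first rows of the table
print(df.shape)   # (rows, columns)

# Sanity checks: no missing values and every column numeric or boolean
no_missing = not df.isna().any().any()
all_numeric_or_bool = df.shape[1] == df.select_dtypes(include=["number", "bool"]).shape[1]
print(no_missing, all_numeric_or_bool)
```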
Awesome! This looks great. Now we'll load our libraries and create our training and test sets.
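The split in this post calls `train_test_split` with only `random_state=1`. When `test_size` is not given, scikit-learn holds out 25% of the rows for testing. This small sketch shows that on a hypothetical 100-row frame standing in for the Titanic data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 100-row frame standing in for the cleaned Titanic data
df = pd.DataFrame({
    "Survived": [i % 2 for i in range(100)],
    "Fare": [float(i) for i in range(100)],
})

# Target column vs. feature columns, as in the post
y = df.Survived
X = df.drop("Survived", axis=1)

# No test_size given, so 25% of rows go to the test set; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
print(len(X_train), len(X_test))  # 75 25
```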
After that, we will instantiate our model and train it. I will be running the model with its default parameters, but we will explore them in future blogs.
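`AutoSklearnClassifier` follows scikit-learn's estimator interface: instantiate, `fit`, then `predict`. Because of that, the same steps can be dry-run with an ordinary scikit-learn model before committing to an hour-long search, and the result doubles as a baseline. This sketch swaps in `LogisticRegression` on hypothetical toy data. It is not auto-sklearn itself.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 200 rows, 3 features, label depends on the sign of the first feature
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Same instantiate -> fit -> predict sequence used with AutoSklearnClassifier
baseline = LogisticRegression()
baseline.fit(X_train, y_train)
y_hat = baseline.predict(X_test)
baseline_acc = accuracy_score(y_test, y_hat)
print("Baseline accuracy", baseline_acc)
```

If auto-sklearn's score barely beats a quick baseline like this, the extra search time may not be paying off on that dataset.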
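Training takes a predetermined amount of time: auto-sklearn keeps searching for models until its time budget runs out, and by default that budget is one hour (time_left_for_this_task=3600 seconds). In this case, my model ran for 1 hour. After training, I generated predictions on the test set, and finally, it's time for scoring.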
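The time budget is set through the `time_left_for_this_task` and `per_run_time_limit` arguments of `AutoSklearnClassifier`, both in seconds. This is a minimal sketch of a hypothetical helper that turns a budget in minutes into those keyword arguments. The 10% per-model cap is an assumption chosen for illustration, not a documented default.

```python
def budget_kwargs(total_minutes, per_model_fraction=0.1):
    """Hypothetical helper: build AutoSklearnClassifier time-budget kwargs (values in seconds).

    The per-model fraction is an illustrative assumption, not a library rule.
    """
    total_seconds = int(total_minutes * 60)
    return {
        "time_left_for_this_task": total_seconds,
        # Cap each candidate model's fit time to a fraction of the total budget
        "per_run_time_limit": max(1, int(total_seconds * per_model_fraction)),
    }

# One hour, the default total budget
print(budget_kwargs(60))  # {'time_left_for_this_task': 3600, 'per_run_time_limit': 360}

# Usage with auto-sklearn installed (not run here):
# automl = autosklearn.classification.AutoSklearnClassifier(**budget_kwargs(10))
```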
Looking at our results, auto-sklearn achieved a high accuracy score with minimal effort! This is an amazing tool to have under your belt, so make sure to check it out. In the next blog, we will explore it in more depth.
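To judge whether an accuracy score is actually high, it helps to compare it with the accuracy of always predicting the most common class, since most Titanic passengers did not survive. This sketch uses scikit-learn's `DummyClassifier` on hypothetical labels. It is a reference point to compare against, not a result from the post.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical labels: 0 = did not survive, 1 = survived
y_train = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_test = np.array([0, 0, 0, 1, 1])
# The "most_frequent" strategy ignores the features, so placeholder zeros suffice
X_train = np.zeros((len(y_train), 1))
X_test = np.zeros((len(y_test), 1))

# Always predict the class that was most common in the training labels (here 0)
dummy = DummyClassifier(strategy="most_frequent")
dummy.fit(X_train, y_train)
majority_acc = accuracy_score(y_test, dummy.predict(X_test))
print("Majority-class accuracy", majority_acc)  # 3 of 5 test labels are 0, so 0.6
```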
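import pandas as pd
df = pd.read_csv('cleaned_titanic.csv')
df.head()  # preview the first rows

import sklearn.metrics
import sklearn.model_selection
import autosklearn.classification
# 'Survived' is the target; all other columns are features
y = df.Survived
X = df.drop('Survived', axis=1)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, random_state=1)
# Default settings: the search runs for time_left_for_this_task=3600 seconds (1 hour)
automl = autosklearn.classification.AutoSklearnClassifier()
automl.fit(X_train, y_train)
y_hat = automl.predict(X_test)
print("Accuracy score", sklearn.metrics.accuracy_score(y_test, y_hat))

import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
cf_matrix = confusion_matrix(y_test, y_hat)
sns.heatmap(cf_matrix, annot=True, fmt='d')  # annotated confusion-matrix heatmap
plt.show()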
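To read the confusion matrix: rows are the true classes and columns are the predicted classes. The diagonal holds the correct predictions, so accuracy is the diagonal sum divided by the total. Passing `normalize="true"` divides each row by its total, which gives per-class recall. This sketch uses hypothetical labels rather than the post's actual predictions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 0 = did not survive, 1 = survived
y_test = np.array([0, 0, 0, 1, 1])
y_hat = np.array([0, 0, 1, 1, 0])

cf_matrix = confusion_matrix(y_test, y_hat)  # rows = true class, columns = predicted class
print(cf_matrix)  # [[2 1]
                  #  [1 1]]

# Accuracy = correct predictions (diagonal) / all predictions
acc = np.trace(cf_matrix) / cf_matrix.sum()

# Row-normalised matrix: each diagonal entry is that class's recall
recall_per_class = confusion_matrix(y_test, y_hat, normalize="true").diagonal()
print(acc, recall_per_class)
```

Per-class recall shows whether a good overall accuracy hides weak performance on the smaller "survived" class.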