Lime for uninterpretable models.

Helping to bring a little clarity to black box models.

Hey again! This time I want to share a useful tool that helps bring some clarity to uninterpretable models, aka black-box models.

The library we’re talking about is called Lime (you can find Lime’s GitHub here). It is able to explain any black-box classifier with two or more classes. All it requires is that the classifier implements a function that takes in raw text or a NumPy array and outputs a probability for each class. Support for scikit-learn classifiers is built in.

Lime is great for different kinds of classifications. Some of them are:

  • Text Classifiers (NLP)
  • Tabular Data (Numerical Data)
  • Images
  • Images (Faces)
  • Simple Regression

The output of this library is called an Explanation. It’s a visual summary of an approximation of the model’s behavior around a single prediction. The authors explain it on their GitHub page as:

While the model may be very complex globally, it is easier to approximate it around the vicinity of a particular instance. While treating the model as a black box, we perturb the instance we want to explain and learn a sparse linear model around it, as an explanation.
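To make that idea concrete, here is a minimal sketch of the perturb-and-fit loop using only NumPy. It is not Lime’s actual implementation: the black-box function, the Gaussian noise scale, the kernel width, and the helper name explain_locally are all assumptions made for illustration, and the sparsity step (keeping only a few features) is left out.

```python
import numpy as np

def black_box_proba(X):
    """Hypothetical black-box model: probability of class 1 for each row."""
    # A nonlinear function of two features, standing in for any classifier.
    z = 3.0 * X[:, 0] - 2.0 * X[:, 1] ** 2
    return 1.0 / (1.0 + np.exp(-z))

def explain_locally(instance, predict, n_samples=5000, scale=0.5,
                    kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear model around one instance."""
    rng = np.random.default_rng(seed)
    # 1. Perturb the instance with Gaussian noise (assumed noise model).
    perturbed = instance + rng.normal(0.0, scale, size=(n_samples, instance.size))
    # 2. Query the black box on the perturbed points.
    targets = predict(perturbed)
    # 3. Weight each sample by its proximity to the original instance.
    distances = np.linalg.norm(perturbed - instance, axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    # 4. Weighted least squares: intercept plus one coefficient per feature.
    design = np.hstack([np.ones((n_samples, 1)), perturbed - instance])
    sw = np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(design * sw, targets * sw[:, 0], rcond=None)
    return coef[0], coef[1:]

instance = np.array([0.2, 0.5])
intercept, local_weights = explain_locally(instance, black_box_proba)
print("local prediction:", intercept)
print("feature weights:", local_weights)
```

The sign of each weight says which way that feature pushes the prediction near this instance, which is what the bars in a Lime explanation show.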

Let’s check out how it works! First, we want to install the library. From the terminal, we can use pip to install the library.

pip install lime

After we do that, we’re ready to work through an example. Here I will be using a cleaned version of the Titanic dataset, but you can also retrieve the original from Kaggle here. I will also be using Logistic Regression as the model.

Remember that you can use any black-box model for this library. I just chose Logistic Regression since it’s a model that I am very familiar with.

Let’s start! I’ll load up all my libraries and check out my DataFrame.

Following that, I’ll skip the EDA and Feature engineering and jump into splitting my X and y and creating my train test split.

Next, I’ll instantiate my Logistic Regression object with balanced class weights. Since we skipped feature engineering, this helps compensate for the imbalance between survivors and non-survivors. Then we’ll fit and predict.
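As a side note on what "balanced" means: scikit-learn documents this mode as n_samples / (n_classes * np.bincount(y)), so the rarer class gets the larger weight. Here is a small sketch of that formula in plain Python. The helper name balanced_class_weights and the example labels are made up for illustration and are not from the Titanic data.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight per class: n_samples / (n_classes * count of that class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical labels: 6 non-survivors (0) and 4 survivors (1).
y_example = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
weights = balanced_class_weights(y_example)
print(weights)
```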

Let’s check out our scores.

Well, not that impressive but it’s not terrible. For the sake of the example, we’ll move on with the process.
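If you want to see what sits behind the three scores we print (accuracy, recall, and F1), here is a rough sketch that computes them by hand. The labels and predictions below are invented for illustration, and classification_scores is a hypothetical helper; in the actual code I simply use sklearn.metrics.

```python
def classification_scores(y_true, y_pred):
    """Return (accuracy, recall, f1) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, recall, f1

# Hypothetical ground truth and predictions (1 = survived).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
accuracy, recall, f1 = classification_scores(y_true, y_pred)
print(accuracy, recall, f1)
```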

Now let’s load Lime and check out how the model made its predictions. We will start by importing Lime, then we’ll import lime_tabular, which is the module we use to explain models trained on tabular (numerical) data.

Then we’ll instantiate our Lime tabular explainer. The basic parameters that we will use are the following:

  • Training Data: The explainer needs the training set because it computes statistics on each feature. If the feature is numerical, it computes the mean and standard deviation and discretizes it into quartiles. If the feature is categorical, it computes the frequency of each value.
  • Feature Names: The names of the columns used for the training set.
  • Class Names: The names of our target classes (in string form, if it applies).
  • Discretize Continuous: When set to True, continuous features are binned (into quartiles by default), which makes for more intuitive explanations.
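To give a feel for the statistics described in the list above, here is a rough NumPy sketch of what gets computed for one numerical feature and one categorical feature. The values are invented, and this only mirrors the idea; it is not Lime’s internal code.

```python
import numpy as np

# Hypothetical numeric feature (e.g. passenger ages); not the article's data.
ages = np.array([22.0, 38.0, 26.0, 35.0, 54.0, 2.0, 27.0, 14.0])

# Mean and standard deviation of the feature.
mean, std = ages.mean(), ages.std()

# Quartile boundaries: the 25th, 50th and 75th percentiles.
q1, q2, q3 = np.percentile(ages, [25, 50, 75])

# Assign each value to a quartile bin 0..3.
# right=True means a value equal to a boundary falls in the lower bin.
bins = np.digitize(ages, [q1, q2, q3], right=True)

# Hypothetical categorical feature (e.g. port of embarkation).
ports = np.array(["S", "C", "S", "Q", "S", "C", "S", "S"])
values, counts = np.unique(ports, return_counts=True)
frequencies = dict(zip(values.tolist(), (counts / counts.sum()).tolist()))

print(mean, std)
print(q1, q2, q3, bins)
print(frequencies)
```

This binning is why the explanations further down read like "age less than or equal to 22" instead of showing a raw number.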

Alright, now let’s create our explanation. First, we’ll draw a random index to pull a random row from our test set. The parameters we’ll use are the following:

  • Testing Data: The instance we want to explain. We just need one row of our test set.
  • predict_proba: The model’s prediction function. Lime needs the predicted probability of each class to be able to interpret the prediction.
  • num_features: The maximum number of features to include in the explanation.
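The important detail about the prediction function is its shape contract: Lime calls it on many perturbed rows at once, so it should accept a 2D NumPy array and return one probability column per class, in the same order as the class names. Here is a small sketch of a function that follows that contract. The scoring rule and the name predict_proba_like are made up for illustration; with scikit-learn you would just pass log_reg.predict_proba.

```python
import numpy as np

def predict_proba_like(rows):
    """Hypothetical stand-in for a fitted classifier's predict_proba."""
    rows = np.atleast_2d(np.asarray(rows, dtype=float))
    # Made-up rule: column 0 is a 0/1 "is female" flag, column 1 is passenger class.
    score = 1.5 * rows[:, 0] - 0.8 * (rows[:, 1] - 1)
    p_yes = 1.0 / (1.0 + np.exp(-score))
    # One column per class: [P(No), P(Yes)].
    return np.column_stack([1.0 - p_yes, p_yes])

probs = predict_proba_like([[1, 3], [0, 1]])
print(probs.shape)
print(probs.sum(axis=1))
```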

Nice! Here we can see that the model predicted that this passenger would survive. The table on the right shows the value of each feature; here it’s telling us that the passenger is female and that she was in third class. In the center, we see how much each feature contributes: being female pushes the prediction toward survival by 0.47, while being in third class pushes it toward not surviving by 0.36. Strictly speaking, these numbers are weights from Lime’s local linear model rather than probabilities.

Now, let’s add more context and more features.

For the sake of explanation, I also print the raw feature values, the model’s prediction, and the actual survival value.

We can see that not being male, being young, having an age less than or equal to 22, and paying more than 30.70 for the ticket improved her odds of survival. But being in a class greater than 1 and less than or equal to 3, having more than 1 sibling, and having embarked in Southampton lowered her chance of survival.

In the end, even though the model predicted that the passenger would survive, unfortunately she did not. As we know, most of the passengers in third class did not survive.

Let’s look at another one.

In this case, the model predicted accurately, and we can see the contribution of each feature. Being male, in third class, and an adult were the main reasons why the model predicted he would not survive.

And there you have it! A brief example to look at your black-box models and their performance. Please check out Lime’s GitHub page and see how you can implement lime for your next model!

You’ll find the code that was used below.

#Loading Libraries
import pandas as pd
import numpy as np
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
#Reading DF
df=pd.read_csv('cleaned_titanic.csv')
df.head(5)
#Creating my target variable and features
y=df["Survived"]
X=df.drop(['Survived'],axis=1)
#Setting my train test split 80/20
X_train,X_test,y_train,y_test=train_test_split(X,y, test_size=.20, random_state=21)
#Instantiating Logistic regression with weighted class.
log_reg=LogisticRegression(max_iter=1000, random_state=21, class_weight="balanced")
#Fitting and predicting
log_reg.fit(X_train,y_train)
log_pred=log_reg.predict(X_test)
#Looking at our metrics for our model
print("Logistic Regression Accuracy: {}".format(metrics.accuracy_score(y_test, log_pred)))
print("Logistic Regression Recall: {}".format(metrics.recall_score(y_test, log_pred)))
print("Logistic Regression F1: {}".format(metrics.f1_score(y_test,log_pred)))
#Importing Lime and lime Tabular.
import lime
import lime.lime_tabular
#Instantiating our tabular explainer
explainer = lime.lime_tabular.LimeTabularExplainer(X_train.values,feature_names = list(X.columns),class_names = ['No',"Yes"], discretize_continuous=True)
#Selecting a random test value
i = np.random.randint(0, y_test.shape[0])
#Using said value to create our explainer.
exp = explainer.explain_instance(X_test.values[i],log_reg.predict_proba,num_features=2)
exp.show_in_notebook(show_table=True, show_all=False)
#Using the same value to create a second explainer with more features.
exp1 = explainer.explain_instance(X_test.values[i],log_reg.predict_proba,num_features=7)
print("Value Selected:"+"\n",pd.DataFrame(zip(X.columns,X_test.values[i])).set_index(0).T,)
print('\n'+"Survived Actual: ","Survived" if y_test.values[i]==1 else "Did not Survive",
'\n'+"Survived Prediction: ","Survived" if log_pred[i]==1 else "Did not Survive")
#Show explainer
exp1.show_in_notebook(show_table=True, show_all=False)

A Data Scientist in the making!
