Helping with the curse of dimensionality.


So, we looked at what the curse of dimensionality is; now let’s see some techniques to mitigate it. There are several ways to address it, and in this blog we’ll take a look at a few.

In this part we will look at feature selection, which helps by keeping only the most relevant variables from the original dataset. The techniques we will use here are:

  • High Correlation Filter
  • Low Variance Filter
  • Missing Value Ratio

The dataset that I will be using is the Titanic dataset. OK, let’s get started.
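To make the three filters concrete, here is a minimal sketch of how each one could be written with pandas. The function names, the thresholds, and the tiny toy table are all assumptions made for illustration; the columns only imitate the kind of fields found in the Titanic data and are not the real dataset.

```python
import numpy as np
import pandas as pd


def missing_value_ratio_filter(df, threshold=0.5):
    """Drop columns whose fraction of missing values exceeds the threshold."""
    ratio = df.isna().mean()
    return df.loc[:, ratio <= threshold]


def low_variance_filter(df, threshold=0.01):
    """Drop numeric columns whose variance falls below the threshold."""
    variances = df.select_dtypes(include="number").var()
    low = variances[variances < threshold].index
    return df.drop(columns=low)


def high_correlation_filter(df, threshold=0.9):
    """Drop one column from each numeric pair with |correlation| above the threshold."""
    corr = df.select_dtypes(include="number").corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


# Hypothetical toy data, not the real Titanic dataset.
toy = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0, 54.0],
    "fare": [7.25, 71.28, 7.92, 53.10, 51.86],
    "fare_cents": [725.0, 7128.0, 792.0, 5310.0, 5186.0],  # duplicates the information in fare
    "constant": [1.0, 1.0, 1.0, 1.0, 1.0],                  # zero variance
    "cabin": [None, "C85", None, None, None],               # 80% missing
})

reduced = high_correlation_filter(low_variance_filter(missing_value_ratio_filter(toy)))
print(list(reduced.columns))
```

The thresholds are judgment calls that depend on the data, so treat the defaults here as placeholders rather than recommended values.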


What’s the big deal?


So, you’re an emerging Data Scientist or you’re dabbling in data analytics and you hear the “Curse of Dimensionality” mentioned a lot. Well, maybe I can help to clear it up!

The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience. The expression was coined by Richard E. Bellman when considering problems in dynamic programming.

The thing about the curse of dimensionality is that when the features or dimensions increase, the volume…

A way to create generators, or to use yield for infinite sequences.

A useful trick when writing functions, or whenever you want to avoid building a computationally heavy object up front, is the yield keyword: it turns a function into a generator that returns values on demand.

A good way to illustrate how yield works is the Sieve of Eratosthenes. The sieve works by taking a number and removing all the numbers ahead of it that are divisible by that number. …
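As a rough sketch of the idea, the generator below yields primes one at a time and never needs an upper bound. It is an incremental variant of the sieve that records upcoming multiples in a dictionary instead of crossing numbers off a fixed list, so it is one possible implementation rather than necessarily the one used in the full post. The names primes and composites are chosen here for illustration.

```python
from itertools import count, islice


def primes():
    """Yield prime numbers indefinitely, one at a time."""
    composites = {}  # upcoming composite number -> primes known to divide it
    for n in count(2):
        if n not in composites:
            # No smaller prime has marked n, so n is prime.
            yield n
            composites[n * n] = [n]
        else:
            # n is composite: move each of its primes on to its next multiple.
            for p in composites.pop(n):
                composites.setdefault(n + p, []).append(p)


first_ten = list(islice(primes(), 10))
print(first_ten)
```

Because the generator is lazy, islice decides how many values are actually computed; nothing beyond the tenth prime is worked out here.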

A quick guide to plot your latitude and longitudes.

Today I want to bring you a quick tutorial for when you have some geodata and don’t know what to do with it or what it looks like! You can check out Folium’s GitHub here.

Let’s start by installing folium.

pip install folium

In this case, I will use a dataset that contains the Latitude and Longitude of houses in the Seattle, WA area.

Helping to bring a little clarity to black box models.

Hey again! This time I want to bring you a useful tool to help bring some clarity to uninterpretable models, a.k.a. black-box models.

The library we’re talking about is called Lime (you can find Lime’s GitHub here). It is able to explain any black-box classifier with two or more classes. All we require is that the classifier implements a function that takes in raw text or a NumPy array and outputs a probability for each class. Support for scikit-learn classifiers is built-in.
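The one requirement mentioned above, a function that takes a NumPy array and returns a probability for each class, is what scikit-learn classifiers expose as predict_proba. The sketch below only demonstrates that contract; the iris dataset and the logistic regression model are assumptions picked for illustration, and no Lime call is shown.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Small example dataset with three classes (an assumption for this sketch).
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# predict_proba takes a NumPy array of rows and returns one probability per class.
probabilities = model.predict_proba(X[:5])
print(probabilities.shape)                          # (rows, classes)
print(np.allclose(probabilities.sum(axis=1), 1.0))  # each row sums to 1
```

A function with this shape of input and output is the kind of thing an explainer can call repeatedly on perturbed samples.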

Lime is great for different kinds of classifications…

The nested comprehensions introduction.

Welcome back! Now that we know how to create a regular list comprehension, we can start working on more complex forms of them. We will start by using nested For loops. If you would like to see the previous blogs that go over the structure of list comprehensions, click here.

To start, nested For loops are very useful if you want to iterate through matrices. You can also use them if you have to iterate through lists within lists. In this example, we’ll iterate through a matrix to flatten it.

nums = [[1,2,3],[4,5,6],[7,8,9],[10]]
nums = [i…
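Since the snippet above is cut off, here is one way the flattening could be written with a nested comprehension. This is a sketch, not necessarily the exact code from the full post, and the names matrix, row, value, and flat are chosen here for readability.

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]

# Read left to right like nested loops: for each row, then for each value in that row.
flat = [value for row in matrix for value in row]
print(flat)
```

The order of the for clauses matches the order they would have as regular nested loops, which is the part that usually trips people up.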

A brief introduction.

This is a brief introduction to writing list comprehensions. List comprehensions can be very helpful and sometimes more flexible than a regular For loop.

To begin, let’s start with a simple For loop. We’ll take a list of numbers from 1 to 10, run it through a For loop that multiplies each number by 2, and append each result to a new list.

nums = [1,2,3,4,5,6,7,8,9,10]
nums2 = []
for i in nums:
    nums2.append(i * 2)

As we saw above, this took quite a bit of space and may take more processing time…
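For comparison, the same doubling can be written as a list comprehension. This is a minimal sketch that reuses the nums list from the loop; the name doubled is only for illustration.

```python
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Build the doubled list in a single expression instead of an explicit loop.
doubled = [i * 2 for i in nums]
print(doubled)
```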

An easy and short guide for Jupyter Notebook or other apps.

Here’s the situation: you use Jupyter Notebook, but you’re tired of going to the terminal, changing the directory, and launching Jupyter Notebook or Jupyter Lab. I have a solution for you!

First, let’s launch Terminal. We want to use the terminal to change the directory to where I want the shell

A friendly guide to doing your first Twitter web scrape on Jupyter Notebook (Mac).


Whether this is your first time using twint or you are running into issues while using it in your Jupyter Notebook, this might be the solution you are looking for.

First, we want to start from the very beginning. To install twint, go to your terminal and run the install command, then upgrade twint to the current version. You can take these commands from twint’s GitHub:

!pip3 install twint
!pip3 install --user --upgrade git+

We’re off to a great start! Now, if…

Critical Assessment of protein Structure Prediction

CASP is a worldwide community experiment for protein structure prediction. It was created in 1994 and provides a means of objectively testing prediction methods through blind prediction. It gives research groups an opportunity to test their protein structure prediction methods and delivers an independent assessment of the state of the art in protein structure modeling to the research community and software users.

The CASP experiments focus on establishing the current state of the art in structure prediction, identifying what progress has been made, and exposing where future predictions or methods should be focused to be more…

Ignacio Ruiz

A Data Scientist in the making!
