Welcome back! Now that we have looked at what AutoML is in the previous blog, let’s dive a little deeper. Today we’ll look at three AutoML tools and libraries that you might find useful: auto-sklearn, TPOT, and Auto-PyTorch.
Auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.
Auto-sklearn frees a machine learning user from algorithm selection and hyperparameter tuning. It leverages recent advances in Bayesian optimization, meta-learning, and ensemble construction.
The image above summarizes auto-sklearn’s overall AutoML workflow, including the two improvements its authors add on top of Bayesian optimization: meta-learning and ensemble construction. The authors expect these improvements to be more effective for flexible ML frameworks that offer many degrees of freedom (e.g., many algorithms, hyperparameters, and preprocessing methods).
Learn more about the technology behind auto-sklearn by reading auto-sklearn’s paper published at NIPS 2015.
The auto-sklearn software is open-source and available on GitHub.
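To make the "drop-in replacement" idea concrete, here is a minimal sketch. The helper names (`MajorityClassifier`, `make_auto_sklearn`, `holdout_accuracy`) and the time budgets are assumptions made for illustration, not part of auto-sklearn. The point is only that code written against the scikit-learn `fit`/`predict` interface does not need to change when a different estimator is plugged in.

```python
from collections import Counter


class MajorityClassifier:
    """Tiny stand-in estimator that exposes the scikit-learn fit/predict interface."""

    def fit(self, X, y):
        # Remember the most frequent training label.
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def make_auto_sklearn(time_budget_s=120):
    """Build an auto-sklearn classifier (hypothetical helper, budgets are assumptions)."""
    # Imported lazily so the sketch still loads where auto-sklearn is not installed.
    from autosklearn.classification import AutoSklearnClassifier

    return AutoSklearnClassifier(
        time_left_for_this_task=time_budget_s,  # total search budget in seconds
        per_run_time_limit=30,                  # budget for a single pipeline fit
    )


def holdout_accuracy(estimator, X_train, y_train, X_test, y_test):
    """Works with any estimator that offers fit() and predict()."""
    estimator.fit(X_train, y_train)
    predictions = estimator.predict(X_test)
    return sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)


# Toy data, only large enough for the stand-in estimator.
X_train, y_train = [[0.0], [1.0], [2.0], [3.0]], [0, 1, 1, 1]
X_test, y_test = [[0.5], [2.5]], [0, 1]
baseline_accuracy = holdout_accuracy(MajorityClassifier(), X_train, y_train, X_test, y_test)
```

With auto-sklearn installed and a realistically sized dataset (for example NumPy arrays from a train/test split), passing `make_auto_sklearn()` instead of `MajorityClassifier()` to `holdout_accuracy` should be the only change needed.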
The Tree-Based Pipeline Optimization Tool (TPOT) was one of the very first AutoML methods and open-source software packages developed for the data science community. TPOT was developed by Dr. Randal Olson while a postdoctoral student with Dr. Jason H. Moore at the Computational Genetics Laboratory of the University of Pennsylvania and is still being extended and supported by this team.
The goal of TPOT is to automate the building of ML pipelines by combining a flexible expression tree representation of pipelines with stochastic search algorithms such as genetic programming. TPOT makes use of the Python-based scikit-learn library as its ML menu.
The TPOT software is open-source and available on GitHub.
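The idea of searching over pipelines represented as expression trees can be sketched in a few lines of plain Python. This is not TPOT's code: the operators, thresholds, dataset, and function names below are all made up for illustration, and the loop uses mutation and selection only, which is a simplification of genetic programming (there is no crossover).

```python
import random

# Toy dataset: the label is 1 when |x| > 1, so a raw threshold on x is not enough.
X = [-3.0, -2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0, 3.0]
y = [1 if abs(x) > 1 else 0 for x in X]

# Hypothetical operator "menu" (TPOT draws from scikit-learn components instead).
TRANSFORMS = {"abs": abs, "square": lambda v: v * v, "negate": lambda v: -v}
THRESHOLDS = [-1.0, 0.0, 1.0, 2.0]
MAX_DEPTH = 3


def evaluate(tree, x):
    """Recursively evaluate an expression tree on a single input value."""
    op = tree[0]
    if op == "input":
        return x
    if op == "threshold":  # root node plays the role of the classifier
        _, cutoff, child = tree
        return 1 if evaluate(child, x) > cutoff else 0
    return TRANSFORMS[op](evaluate(tree[1], x))  # preprocessing node


def fitness(tree):
    """Accuracy of the pipeline on the toy dataset."""
    return sum(evaluate(tree, x) == label for x, label in zip(X, y)) / len(X)


def depth(body):
    return 0 if body[0] == "input" else 1 + depth(body[1])


def random_tree(rng):
    body = ("input",)
    for _ in range(rng.randint(0, 2)):
        body = (rng.choice(sorted(TRANSFORMS)), body)
    return ("threshold", rng.choice(THRESHOLDS), body)


def mutate(tree, rng):
    """Change the threshold, wrap the input in a new transform, or drop one."""
    _, cutoff, body = tree
    roll = rng.random()
    if roll < 1 / 3:
        return ("threshold", rng.choice(THRESHOLDS), body)
    if roll < 2 / 3 and depth(body) < MAX_DEPTH:
        return ("threshold", cutoff, (rng.choice(sorted(TRANSFORMS)), body))
    return ("threshold", cutoff, body[1] if body[0] != "input" else body)


def evolve(generations=10, population_size=8, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(population_size)]
    history = []
    for _ in range(generations):
        offspring = [mutate(rng.choice(population), rng) for _ in range(population_size)]
        # Elitist selection: keep the best of parents plus offspring.
        population = sorted(population + offspring, key=fitness, reverse=True)[:population_size]
        history.append(fitness(population[0]))
    return population[0], history


best_tree, history = evolve()
```

Because parents compete with their offspring for survival, the best score in `history` can never go down from one generation to the next. The same kind of loop applies when the nodes are real preprocessing steps and models and the fitness is a cross-validated score.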
While early AutoML frameworks focused on optimizing traditional ML pipelines and their hyperparameters, another trend in AutoML is to focus on neural architecture search. To bring the best of these two worlds together, the Auto-PyTorch authors developed a framework that jointly and robustly optimizes the network architecture and the training hyperparameters to enable fully automated deep learning (AutoDL).
Auto-PyTorch is mainly developed to support tabular data (classification, regression), but can also be applied to image data (classification). The newest features in Auto-PyTorch for tabular data are described in the paper “Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL”.
Auto-PyTorch achieved state-of-the-art performance on several tabular benchmarks by combining multi-fidelity optimization with portfolio construction for warmstarting and ensembling of deep neural networks (DNNs) and common baselines for tabular data.
The Auto-PyTorch software is open-source and available on GitHub.
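To give a feel for what multi-fidelity optimization means, here is a minimal sketch of successive halving, one common multi-fidelity scheme. It is not Auto-PyTorch's implementation: the loss function is a made-up stand-in for training a network, and the fixed `PORTFOLIO` of learning rates is a hypothetical example of starting the search from a set of known configurations rather than from random ones.

```python
import math


def validation_loss(config, epochs):
    """Hypothetical stand-in for training a network for `epochs` epochs.

    The loss is lowest for a learning rate near 1e-2 and shrinks as the
    training budget grows.
    """
    lr_penalty = 0.1 * abs(math.log10(config["lr"]) + 2.0)
    return lr_penalty + 1.0 / (1 + epochs)


def successive_halving(configs, min_epochs=1, eta=2):
    """Evaluate all configs cheaply, keep the best 1/eta, then raise the budget."""
    survivors = list(configs)
    epochs = min_epochs
    trace = []  # (epochs used in the round, number of configs kept)
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: validation_loss(c, epochs))
        survivors = ranked[: max(1, len(ranked) // eta)]
        trace.append((epochs, len(survivors)))
        epochs *= eta
    return survivors[0], trace


# Hypothetical starting portfolio: learning rates 1e-1, 1e-2, ..., 1e-8.
PORTFOLIO = [{"lr": 10.0 ** -k} for k in range(1, 9)]
best_config, trace = successive_halving(PORTFOLIO)
```

Most configurations are only ever trained at the cheapest fidelity, and the expensive, long training runs are reserved for the few candidates that survive each round.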
These are some of my favorites, and in the next blog we will explore some more!