In order to include categorical features in your Machine Learning model, you have to encode them numerically using "dummy" or "one-hot" encoding. But how do you do this correctly using scikit-learn?
In this 28-minute video, you'll learn:

- How to use `OneHotEncoder` and `ColumnTransformer` to encode your categorical features and prepare your feature matrix in a single step
- How to include this step within a `Pipeline` so that you can cross-validate your model and preprocessing steps simultaneously
- Why you should use scikit-learn (rather than pandas) for preprocessing your dataset
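The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the notebook from the video: the tiny DataFrame and its column names are made-up assumptions, and the model choice (logistic regression) is just an example.

```python
# Minimal sketch: encode categorical columns with OneHotEncoder inside a
# ColumnTransformer, chain it with a model in a Pipeline, and cross-validate.
# The toy data and column names below are illustrative, not the video's dataset.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.DataFrame({
    'Embarked': ['S', 'C', 'S', 'Q', 'C', 'S'],
    'Sex': ['male', 'female', 'female', 'male', 'male', 'female'],
    'Fare': [7.25, 71.28, 7.92, 8.05, 8.46, 26.55],
    'Survived': [0, 1, 1, 0, 0, 1],
})
X = df[['Embarked', 'Sex', 'Fare']]
y = df['Survived']

# One-hot encode the categorical columns; pass the numeric column through.
# handle_unknown='ignore' keeps predictions from failing on unseen categories.
ct = ColumnTransformer(
    [('onehot', OneHotEncoder(handle_unknown='ignore'), ['Embarked', 'Sex'])],
    remainder='passthrough')

# Chain preprocessing and model into a single estimator
pipe = Pipeline([('preprocess', ct), ('model', LogisticRegression())])

# Cross-validation refits the encoder on each training fold,
# so the preprocessing is validated along with the model (no leakage)
scores = cross_val_score(pipe, X, y, cv=3)
print(scores.mean())
```

Because the encoder lives inside the Pipeline, each cross-validation fold fits it only on that fold's training data, which is exactly why this beats encoding the whole dataset up front with pandas.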
If you want to follow along with the code, you can download the Jupyter notebook from GitHub.
Click on a timestamp below to jump to a particular section:
0:22 Why should you use a Pipeline?
2:30 Preview of the lesson
3:35 Loading and preparing a dataset
6:11 Cross-validating a simple model
10:00 Encoding categorical features with OneHotEncoder
15:01 Selecting columns for preprocessing with ColumnTransformer
19:00 Creating a two-step Pipeline
19:54 Cross-validating a Pipeline
21:44 Making predictions on new data
23:43 Recap of the lesson
24:50 Why should you use scikit-learn (rather than pandas) for preprocessing?
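The last steps in the outline (fitting the Pipeline and making predictions on new data) can be sketched like this. Again, the toy data and column names are illustrative assumptions, not the video's dataset.

```python
# Sketch of "making predictions on new data": fit the Pipeline on all
# training data, then pass raw (unencoded) new rows straight to predict().
# Toy data and column names are illustrative assumptions.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression

X = pd.DataFrame({'Sex': ['male', 'female', 'female', 'male'],
                  'Fare': [7.25, 71.28, 7.92, 8.05]})
y = pd.Series([0, 1, 1, 0])

ct = ColumnTransformer(
    [('onehot', OneHotEncoder(handle_unknown='ignore'), ['Sex'])],
    remainder='passthrough')
pipe = Pipeline([('preprocess', ct), ('model', LogisticRegression())])

# fit() trains both the encoder and the model; predict() reuses the fitted
# encoder, so new data never has to be preprocessed by hand
pipe.fit(X, y)
X_new = pd.DataFrame({'Sex': ['female', 'male'], 'Fare': [30.0, 8.0]})
print(pipe.predict(X_new))
```

This is the practical payoff of the Pipeline: new data arrives in the same raw form as the training data, and the fitted preprocessing is applied automatically and consistently.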
Related Resources
- scikit-learn documentation for OneHotEncoder, ColumnTransformer, and Pipeline
- My video series: Introduction to Machine Learning in Python
- My videos on cross-validation and grid search
- My lesson notebook on StandardScaler
P.S. Want to master Machine Learning in Python? Enroll in my online course, Machine Learning with Text in Python!