Tuesday, November 12, 2019

Data School: How to encode categorical features with scikit-learn (video)

To include categorical features in your Machine Learning model, you have to encode them numerically, typically using "dummy" or "one-hot" encoding. But how do you do this correctly using scikit-learn?

In this 28-minute video, you'll learn:

  • How to use OneHotEncoder and ColumnTransformer to encode your categorical features and prepare your feature matrix in a single step
  • How to include this step within a Pipeline so that you can cross-validate your model and preprocessing steps simultaneously
  • Why you should use scikit-learn (rather than pandas) for preprocessing your dataset
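
The first two points can be sketched roughly as follows. This is an illustrative example under stated assumptions, not the notebook from the video: the toy DataFrame and its column names ("sex", "embarked", "fare", "survived") are hypothetical. ColumnTransformer one-hot encodes the categorical columns and passes the numeric column through, and wrapping it in a Pipeline with the model means the encoder is re-fit inside each cross-validation fold:

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy dataset: two categorical features, one numeric feature, binary target
df = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "female", "male",
            "male", "female", "female", "male", "female", "male"],
    "embarked": ["S", "C", "S", "Q", "C", "S", "C", "Q", "S", "S", "C", "Q"],
    "fare": [7.3, 71.3, 7.9, 8.5, 53.1, 8.1, 30.0, 15.2, 26.0, 9.5, 80.0, 7.8],
    "survived": [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0],
})
X = df[["sex", "embarked", "fare"]]
y = df["survived"]

# One-hot encode the categorical columns; pass the numeric column through unchanged.
# handle_unknown="ignore" avoids errors if a category is missing from a training fold.
column_trans = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["sex", "embarked"]),
    remainder="passthrough",
)

# Preprocessing and model in one Pipeline, so cross-validation covers both steps
pipe = make_pipeline(column_trans, LogisticRegression(max_iter=1000))

scores = cross_val_score(pipe, X, y, cv=3, scoring="accuracy")
print(scores.mean())
```

The accuracy itself is meaningless on 12 made-up rows. The point is that `cross_val_score` receives the whole Pipeline, so no information from a test fold leaks into the encoding step.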

If you want to follow along with the code, you can download the Jupyter notebook from GitHub.

The video is divided into the following sections:

0:22 Why should you use a Pipeline?
2:30 Preview of the lesson
3:35 Loading and preparing a dataset
6:11 Cross-validating a simple model
10:00 Encoding categorical features with OneHotEncoder
15:01 Selecting columns for preprocessing with ColumnTransformer
19:00 Creating a two-step Pipeline
19:54 Cross-validating a Pipeline
21:44 Making predictions on new data
23:43 Recap of the lesson
24:50 Why should you use scikit-learn (rather than pandas) for preprocessing?

P.S. Want to master Machine Learning in Python? Enroll in my online course, Machine Learning with Text in Python!



from Planet Python