Saturday, December 19, 2020

ListenData: How to use variable in a query in pandas

Suppose you want to reference a variable in a query in pandas package in Python. This seems to be a straightforward task but it becomes daunting sometimes. Let's discuss it with examples in the article below.

Let's create a sample dataframe having 3 columns and 4 rows. This dataframe is used for demonstration purpose.


import pandas as pd
df = pd.DataFrame({"col1" : range(1,5),
"col2" : ['A A','B B','A A','B B'],
"col3" : ['A A','A A','B B','B B']
})
Sample Dataframe
Filter a value A A in column col2
In order to do reference of a variable in query, you need to use @.
Mention Value Explicitly

newdf = df.query("col2 == 'A A'")
Reference Method

myval1 = 'A A'
newdf = df.query("col2 == @myval1")
How to pass column name as a variable in query
Instead of filter value we are referring the column which we want to use for subetting or filtering.

myvar1 = 'col2'
newdf2 = df.query("{0} == 'A A'".format(myvar1))
{0} takes a value of variable myvar1.
"{0} == 'A A'".format(myvar1) returns "col2 == 'A A'"
Incase you want to pass multiple columns as variables in query. Here we are using columns col2 and col3.

myvar1 = 'col2'
myvar2 = 'col3'
newdf2 = df.query("{0} == 'A A' & {1} == 'B B'".format(myvar1, myvar2))
"{0} == 'A A' & {1} == 'B B'".format(myvar1, myvar2) is equivalent to "col2 == 'A A' & col3 == 'B B'"
How to handle space in column name
Let's rename column col2 by including a space in between for illustration purpose.

df.rename(columns={'col2':'col 2'}, inplace = True)
By using backticks `` you can pass a column which contains space.

myvar1 = '`col 2`'
newdf = df.query("{0} == 'A A'".format(myvar1))


from Planet Python
via read more

No comments:

Post a Comment

TestDriven.io: Working with Static and Media Files in Django

This article looks at how to work with static and media files in a Django project, locally and in production. from Planet Python via read...