Monday, July 26, 2021

Python Morsels: Map and Filter in Python

Transcript

Let's talk about the map and filter functions in Python, and why I don't usually recommend using them (related: I also don't recommend lambda expressions).

The map function transforms each item

The map function accepts a function and an iterable. Here we're passing a square function and a numbers list to map:

>>> def square(n):
...     return n**2
...
>>> numbers = [2, 1, 3, 4, 7, 11, 18]
>>> squared_numbers = map(square, numbers)

The map function returns a lazy iterable:

>>> squared_numbers
<map object at 0x7f241e1f47f0>

As we loop over this map object (squared_numbers), the map object will loop over the given iterable (numbers) and call the given function (square) on each item in the iterable, giving us back the return value of that function call:

>>> list(squared_numbers)
[4, 1, 9, 16, 49, 121, 324]

In this case, our map object is squaring each of the numbers in the given numbers iterable.

You can think of map as doing a transformation operation. The map function:

  1. Takes an iterable
  2. Takes an operation to perform on each item in the iterable
  3. Performs the given operation on each item as we loop over it
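To make the "as we loop over it" part concrete, here is a minimal sketch of that laziness. The calls list is a hypothetical helper added only for illustration; it records when square actually runs.

```python
calls = []  # hypothetical helper: records each argument square is called with

def square(n):
    calls.append(n)
    return n**2

numbers = [2, 1, 3, 4, 7, 11, 18]
squared_numbers = map(square, numbers)

# Creating the map object does not call square at all.
calls_after_creation = list(calls)

# Asking for one item calls square exactly once.
first = next(squared_numbers)
calls_after_first = list(calls)

# Looping over the rest consumes the map object...
rest = list(squared_numbers)

# ...so a second loop finds nothing left.
leftover = list(squared_numbers)
```

Note that a map object can only be looped over once: after it has been consumed, looping over it again gives back nothing.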

The filter function filters items down

The filter function also accepts a function and an iterable. Here we're passing an is_odd function and a numbers list to filter:

>>> def is_odd(n):
...     return n%2 == 1
...
>>> numbers = [2, 1, 3, 4, 7, 11, 18]
>>> odd_numbers = filter(is_odd, numbers)

Like map, the filter function gives us back a lazy iterable:

>>> odd_numbers
<filter object at 0x7fbf13c1d7c0>

As we loop over this filter object (odd_numbers), the filter object will loop over the given iterable (numbers), and call the given function (is_odd) on each item in it. However, it doesn't give us back the return value of that function call; instead it uses that function call to determine whether that item should be included in the resulting lazy iterable:

>>> list(odd_numbers)
[1, 3, 7, 11]

In this case, we're only getting odd numbers, because the filter function will only include items where True (or a truthy value) is returned when that item is passed to the given function (is_odd in our case).
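As a small sketch of the "truthy value" point, the function passed to filter doesn't have to return True or False. The remainder_of_two function below is a hypothetical stand-in for is_odd that returns 1 or 0 instead.

```python
def remainder_of_two(n):
    # Returns 1 for odd numbers (truthy) and 0 for even numbers (falsey).
    return n % 2

numbers = [2, 1, 3, 4, 7, 11, 18]

# filter keeps the original items, not the function's return values.
odd_numbers = list(filter(remainder_of_two, numbers))
```

The result contains the original odd numbers (1, 3, 7, 11), not the 1 values that remainder_of_two returned.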

map and filter are equivalent to writing a generator expression

  1. The map function takes each item in a given iterable and includes all of them in a new lazy iterable, transforming each item along the way
  2. The filter function doesn't transform the items, but it selectively picks out which items it should include in the new lazy iterable

The reason I don't usually recommend using map and filter is that they can each be summed up in just one line of Python code.

The map function is nearly equivalent to this function, which returns a generator expression:

def map(function, iterable):
    return (function(x) for x in iterable)

There's a little bit more to the map function than this, but for most use cases map is essentially the same as a generator expression that loops over an iterable and calls a function on every item in that iterable (to transform each item).
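One example of that "little bit more": the built-in map accepts more than one iterable and passes one item from each to the function. The sketch below uses made-up bases and exponents lists to show that behavior next to a generator expression that uses zip to do the same thing.

```python
bases = [2, 3, 4]
exponents = [5, 2, 3]

# map pulls one item from each iterable and calls pow(base, exponent).
with_map = list(map(pow, bases, exponents))

# The same operation written with a generator expression and zip.
with_genexp = list(pow(b, e) for b, e in zip(bases, exponents))
```

Both versions produce the same three results here, and both stop as soon as the shortest iterable runs out.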

The filter function is essentially the same as this function, which also returns a generator expression:

def filter(function, iterable):
    return (x for x in iterable if function(x))

This generator expression loops over an iterable and calls a function on each item in the conditional part of the generator expression to determine whether the items should be included in the new lazy iterable.
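To check these two one-liners against the built-ins, here is a quick sketch. The names my_map and my_filter are hypothetical (chosen so the real map and filter aren't overwritten), and this comparison only covers the simple one-function, one-iterable case.

```python
def my_map(function, iterable):
    # Hypothetical stand-in for the built-in map (single-iterable case only).
    return (function(x) for x in iterable)

def my_filter(function, iterable):
    # Hypothetical stand-in for the built-in filter.
    return (x for x in iterable if function(x))

def square(n):
    return n**2

def is_odd(n):
    return n % 2 == 1

numbers = [2, 1, 3, 4, 7, 11, 18]

# Compare each stand-in against the corresponding built-in.
same_map = list(my_map(square, numbers)) == list(map(square, numbers))
same_filter = list(my_filter(is_odd, numbers)) == list(filter(is_odd, numbers))
```

For this input both comparisons come out True, and like the built-ins, both stand-ins give back lazy iterables that do no work until we loop over them.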

Nested map and filter calls vs generator expressions

We have square and is_odd functions here:

>>> def square(n):
...     return n**2
...
>>> def is_odd(n):
...     return n%2 == 1

And we have a list, numbers:

>>> numbers = [2, 1, 3, 4, 7, 11, 18]

We could use the map and filter functions to take numbers and square all of the odd numbers (that is, only including odd numbers and squaring each included number).

We could pass is_odd and numbers to the filter function and then take the filter object we get back (which is a lazy iterable) and pass it to map along with the square function:

>>> numbers = [2, 1, 3, 4, 7, 11, 18]
>>> map(square, filter(is_odd, numbers))
<map object at 0x7ff0b70ef1c0>

This makes a lazy iterable which will include squares of all of the odd numbers in our list.

As we loop over the lazy map object we get back, we'll see that it includes the square of all the odd numbers from our original list:

>>> list(map(square, filter(is_odd, numbers)))
[1, 9, 49, 121]

We could accomplish this same task using a generator expression, like this:

>>> (square(n) for n in numbers if is_odd(n))
<generator object <genexpr> at 0x7ff0b710aba0>

Though if we wanted to get a list instead of a lazy iterable, we could write it as a list comprehension instead:

>>> [square(n) for n in numbers if is_odd(n)]
[1, 9, 49, 121]

I find this list comprehension (or generator expression) version a lot more readable than the equivalent map and filter version of the same code:

>>> list(map(square, filter(is_odd, numbers)))
[1, 9, 49, 121]

The map and filter version is a little bit inside-out looking: we pass a function (square) to map along with a filter object which has a function (is_odd) and an iterable (numbers) passed to it.

Whereas the list comprehension version looks more like the English sentence I might say in order to describe the operation we're performing:

>>> [square(n) for n in numbers if is_odd(n)]
[1, 9, 49, 121]

In fact, with the generator expression or list comprehension, you don't even need extra functions to call (unlike with map and filter). You can write out the operations (n**2 and if n % 2 == 1) right inside the first part and last part of a list comprehension (or generator expression):

>>> [n**2 for n in numbers if n % 2 == 1]
[1, 9, 49, 121]

I think of the first part of a generator expression as the mapping part, and the last part of a generator expression as the filtering part, because they serve the same purpose as the built-in map and filter functions.
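A short sketch of that idea, using the same numbers list: each part of a comprehension can be used on its own, or the two can be combined. The variable names below are just for illustration.

```python
numbers = [2, 1, 3, 4, 7, 11, 18]

# Mapping part only: transforms every item (like map).
mapped_only = [n**2 for n in numbers]

# Filtering part only: keeps some items unchanged (like filter).
filtered_only = [n for n in numbers if n % 2 == 1]

# Both parts together: filter first, then transform what's left.
mapped_and_filtered = [n**2 for n in numbers if n % 2 == 1]
```

The same three forms work as generator expressions by swapping the square brackets for parentheses.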

Summary

The map function performs a transformation on each item in an iterable and returns a lazy iterable. The filter function filters down the items in an iterable and also returns a lazy iterable.

Instead of map and filter, I tend to prefer generator expressions. The first part of a generator expression is the mapping part, and the last optional part of a generator expression (the condition) is the filtering part.



from Planet Python