Monday, August 24, 2020

Daniel Bader: Functional Programming Primitives in Python

Functional Programming Primitives in Python

In this tutorial we’ll take a look at the basic building blocks of functional programming in Python, like the “filter”, “map”, and “reduce” functions.

Python Functional Programming

What Is Functional Programming?

Functional programming (FP) is a programming technique that avoids side effects by performing computation primarily through the evaluation of mathematical functions and the use of immutable data structures.

In some cases, using a functional programming style can reduce the likelihood of bugs in your programs and make them more maintainable. Admittedly, FP is a little difficult to pin down because many modern FP languages (like Haskell) support a vast range of additional features and concepts—like advanced typing systems—that are often assumed to be a part of FP.

But, when you read the commonly accepted definitions closely you’ll find that these advanced concepts are more “add-ons” than absolute necessities. For example, here’s how Wikipedia defines functional programming:

“In computer science, functional programming is a programming paradigm — a style of building the structure and elements of computer programs — that treats computation as the evaluation of mathematical functions and avoids changing-state and mutable data.” (Source)

In this tutorial you’ll learn how the use the basic building blocks of functional programming in Python.

First, we’ll go over the distinction between immutable and mutable data structures in Python and how you can use them to write more maintainable and efficient code.

After that you’ll see how to work with Python’s built-in FP primitives, such as the filter(), map(), and reduce() functions.

Let’s get started!

Immutable Data Structures in Python

In my experience, picking the right data structures to represent the domain objects you’ll be working with his half the battle in any kind of programming work you do. To illustrate the coding examples in this tutorial, I’ll use the following list of scientists as my “toy” data set:

Name Field Born Nobel Prize?
Ada Lovelace math 1815 no
Emmy Noether math 1882 no
Marie Curie physics 1867 yes
Tu Youyou chemistry 1930 yes
Ada Yonath chemistry 1939 yes
Vera Rubin astronomy 1928 no
Sally Ride physics 1951 no

As you can see, all of these records follow a uniform structure. Each entry in this list represents a well-known scientist and some associated data, like their name, the field they worked in, the year they were born, and whether or not they ever won the Nobel prize.

This data set will be our playground for exploring the most common Python functional programming techniques.

The next thing we’ll need to figure out is how can we represent this data set in our Python program? There are a number of options we can use to represent this data in Python. My first hunch when I started writing this tutorial was to use plain dictionary objects:

scientists = [
    {'name': 'Ada Lovelace', 'field': 'math', 'born': 1815, 'nobel': False},
    {'name': 'Emmy Noether', 'field': 'math', 'born': 1882, 'nobel': False},
    {'name': 'Marie Curie', 'field': 'physics', 'born': 1867, 'nobel': True},
    {'name': 'Tu Youyou', 'field': 'chemistry', 'born': 1930, 'nobel': True},
    {'name': 'Ada Yonath', 'field': 'chemistry', 'born': 1939, 'nobel': True},
    {'name': 'Vera Rubin', 'field': 'astronomy', 'born': 1928, 'nobel': False},
    {'name': 'Sally Ride', 'field': 'physics', 'born': 1951, 'nobel': False},
]

This looks quite nice—but there are two downsides with this approach:

  • Plain dictionaries make no guarantees that I’m using consistent field names. I could easily introduce a typo and write 'borne' instead of 'born' in one of the items and that would lead to errors and bugs further down the road.
  • Python’s dicts are mutable data structures which means these items can be modified after they were created. I could accidentally change Ada Lovelace’s name to “Ed Lovelace” somewhere in my program and there would be nothing stopping me from doing that.
  • Lists are also mutable in Python, meaning that new items could be added or removed freely.

That’s why I decided to use a more stringent representation for this list of scientists. I’m using collections.namedtuple from the Python standard library to define an immutable data type (Scientist) that holds all of the associated fields:

import collections

Scientist = collections.namedtuple('Scientist', [
    'name',
    'field',
    'born',
    'nobel',  # Won nobel prize?
])

Using the Scientist namedtuple I can now define my data set as an, once again immutable, tuple of Scientist objects. This disallows changing individual fields on each item, ensures that all field names are set with the correct spelling, and that the resulting list of items can’t be modified after the fact:

scientists = (
    Scientist(name='Ada Lovelace', field='math', born=1815, nobel=False),
    Scientist(name='Emmy Noether', field='math', born=1882, nobel=False),
    Scientist(name='Marie Curie', field='physics', born=1867, nobel=True),
    Scientist(name='Tu Youyou', field='chemistry', born=1930, nobel=True),
    Scientist(name='Ada Yonath', field='chemistry', born=1939, nobel=True),
    Scientist(name='Vera Rubin', field='astronomy', born=1928, nobel=False),
    Scientist(name='Sally Ride', field='physics', born=1951, nobel=False),
)

Now, what’s so great about immutability? Like I said before, the state of an immutable object can’t be modified after it was created. To give you an example, this is great for writing applications that use parallel computing to complete their work faster.

Immutable data structures are inherently more thread-safe than mutable objects because they allow one thread of execution to act on the data without worrying about what other threads are doing.

And while the use of functional programming doesn’t require the use of parallel computing or a multithreaded execution model, it is one of the advantages of FP that writing parallelized code becomes easier and more enjoyable—as long as we represent our data with some foresight.

Of course, you could very well implement the examples in this tutorial using a list of dictionaries or tuples. Actually, this would make a great exercise for you to try out later 🙂

Besides guaranteeing immutability, using collections.namedtuple here has another benefit: I find that being able to type x.name is more readable than having to deal with x['name'] dictionary keys or tuple indexes everywhere. But that’s just my personal preference.

If you’d like to dig deeper into the subject of immutable data structures in Python and how they relate to a functional programming style, I’ve recorded an 18 minute video tutorial. You can watch it in the video player embedded below or directly on my YouTube channel:

» Subscribe to the dbader.org YouTube Channel for more Python tutorials.

Before you move on, let’s talk about one more thing:

What’s pprint.pprint()? Throughout this article I’ll be using the pprint() function in the code examples to show you the results of our calculations and transformations.

If you’re wondering where to find this function, it’s hosted in the pprint module included with Python’s standard library. So in order to make the examples work you’ll need to import the function from the pprint module:

from pprint import pprint

Alright, now you’re all set to dive into some hands-on functional programming examples. Next up you’ll learn about the filter(), map(), and reduce() functions.

Python Functional Primitive #1: filter()

Python’s filter function is a basic building block of functional programming. It allows you to filter a collection of elements so that only an arbitrary subset of elements is included in the output. Okay, that sounds a little confusing. Let’s think of a concrete example:

What if we wanted to get a list of all scientists in our list who won the Nobel prize at some point in their life?

The classic way to do this filtering would be with a for loop in Python:

nobel_scientists = []

for entry in scientists:
    if entry.nobel is True:
        nobel_scientists.append(entry)

In the above code we’re spelling out step by step what needs to happen in order to create the “filtered” nobel_scientists list. We start with an empty list and then keep adding scientists to it if we find that their nobel flag is set to True. This algorithm works, but it’s quite verbose and it needs to repeatedly update a mutable data structure (nobel_scientists) to generate its output. This isn’t necessarily a bad thing—but if you approach the same problem with a functional programming approach you’ll notice some differences.

Let’s use Python’s filter() function to achieve the same result—a filtered list of scientists who won the Nobel prize. This is the full implementation:

nobel_scientists = tuple(filter(lambda x: x.nobel is True, scientists))

As you can see, filter takes a collection of elements and a filter predicate as its input. The filter predicate is just a function that takes an element as input and returns a bool indicating whether or not the element should be included in the resulting filtered list.

I’m using a lambda here for brevity, but you could just as well expand this example into something more verbose by defining the filter predicate as a separate named function:

def predicate(elem):
    return elem.nobel is True

nobel_scientists = tuple(filter(predicate, scientists))

Also, why am I calling tuple() on the result of the filter call? Normally the filter function returns an iterable filter object on Python 3:

>>> nobel_scientists = filter(lambda x: x.nobel is True, scientists)
>>> pprint(filter(nobel_scientists))
<filter object at 0x1027daa58>
>>> pprint(list(nobel_scientists))
[Scientist(name='Marie Curie', field='physics', born=1867, nobel=True),
 Scientist(name='Tu Youyou', field='chemistry', born=1930, nobel=True),
 Scientist(name='Ada Yonath', field='chemistry', born=1939, nobel=True)]

>>> pprint(list(nobel_scientists))
[]
nobel_scientists = tuple(x for x in scientists if x.nobel is True)

» Subscribe to the dbader.org YouTube Channel for more Python tutorials.

The filter() is one of the functional programming primitives or building blocks available in Python and it’s useful in a number of contexts.

Let’s filter this data set

Python Functional Primitive #2: map()

The map function takes a function and a sequence of items. It will then essentially loop over all items in the sequence and will return a new sequence of identical length containing the results of each function call.

Let’s make this a little more concrete though. If you have a list of values and a function f(), mapping f to the list results in the following transformation:

[x, y, z] --- map(f) ---> [f(x), f(y), f(z)]

For example, we could transform our input list of scientists into a list of dictionaries with some derived properties computed from the original ones.

To do so we first define a transformation function that takes a Scientist object as its input and returns a new value. In our case it’ll be a dictionary with some derived properties.

Once we’ve defined the transformation function we can call map() and apply the transformation to the original input sequence:

def transform(x):
    return {'name': x.name, 'age': 2017 - x.born}

names_and_ages = map(transform, scientists)

Note how this won’t modify the input sequence. Instead, calling map() creates a new sequence that’s based on the original input.

If you try to print the resulting names_and_ages variable you’ll get a slightly surprising result (on Python 3):

>>> pprint(names_and_ages)
<map object at 0x1025789e8>

It turns out that the map() function is lazy. Instead of returning a fully instantiated sequence that contains all transformed items, it returns an iterator (called a map object) that performs the transformation item by item when it is consumed.

This is a little bit difficult to visualize. So what I’ll do instead is convert this iterator into a fully instantiated list or tuple object so we can inspect it more easily in the Python REPL.

In this case I’m going with a tuple to keep the output sequence immutable:

>>> pprint(tuple(names_and_ages))
({'age': 202, 'name': 'Ada Lovelace'},
 {'age': 135, 'name': 'Emmy Noether'},
 {'age': 150, 'name': 'Marie Curie'},
 {'age': 87, 'name': 'Tu Youyou'},
 {'age': 78, 'name': 'Ada Yonath'},
 {'age': 89, 'name': 'Vera Rubin'},
 {'age': 66, 'name': 'Sally Ride'})

[If you’d like to learn more about iterators check out my tutorial here and my Python Tricks book.]

Using a single-expression lambda function and converting the map object iterator into a proper tuple, we can shorten the above code to this:

names_and_ages = tuple(map(
    lambda x: {'name': x.name, 'age': 2017 - x.born},
    scientists
))

We can make this code more Pythonic by removing the map call and using a list comprehension instead:

names_and_ages = [
    {'name': x.name, 'age': 2017 - x.born}
    for x in scientists
]

Notice how I previously used immutable tuple objects. We can achieve the same result by using the tuple() constructor in combination with a generator expression:

names_and_ages = tuple(
    {'name': x.name, 'age': 2017 - x.born}
    for x in scientists
)

» Subscribe to the dbader.org YouTube Channel for more Python tutorials.

Python Functional Primitive #3: reduce()

>>> pprint(list(names_and_ages))
[{'age': 202, 'name': 'Ada Lovelace'},
 {'age': 135, 'name': 'Emmy Noether'},
 {'age': 150, 'name': 'Marie Curie'},
 {'age': 87, 'name': 'Tu Youyou'},
 {'age': 78, 'name': 'Ada Yonath'},
 {'age': 89, 'name': 'Vera Rubin'},
 {'age': 66, 'name': 'Sally Ride'}]
total_age = functools.reduce(
    lambda acc, val: acc + val['age'],
    names_and_ages,
    0
)
>>> print(total_age)
807
total_age = sum(x['age'] for x in names_and_ages)

More interesting uses of reduce(): grouping scientists by field

scientists_by_field = functools.reduce(
    lambda acc, val: {**acc, **{val.field: acc[val.field] + [val.name]}},
    scientists,
    {'math': [], 'physics': [], 'chemistry': [], 'astronomy': []}
)

pprint(scientists_by_field)

# Can't reference values already in the dict while we're creating it
# scientists_by_field2 = {
#     val['born']: scientists_by_field2[val['field']] + [val['name']]
#     for val in scientists
# }

# pprint(scientists_by_field2)


def reducer(acc, val):
    acc[val.field].append(val.name)
    return acc


scientists_by_field3 = functools.reduce(
    reducer,
    scientists,
    {'math': [], 'physics': [], 'chemistry': [], 'astronomy': []}
)

pprint(scientists_by_field3)

import collections

scientists_by_field4 = functools.reduce(
    reducer,
    scientists,
    collections.defaultdict(list)
)

pprint(scientists_by_field4)


import itertools

scientists_by_field5 = {
    item[0]: list(item[1])
    for item in itertools.groupby(scientists, lambda x: x.field)
}

# groupby returns (group, iterator) tuples that we need to convert into
# concrete lists

pprint(scientists_by_field5)

» Subscribe to the dbader.org YouTube Channel for more Python tutorials.

Functional Programming in Python – Conclusion

“Functional Programming in Python” YouTube playlist


[ Improve Your Python with Dan's 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]



from Planet Python
via read more

No comments:

Post a Comment

TestDriven.io: Working with Static and Media Files in Django

This article looks at how to work with static and media files in a Django project, locally and in production. from Planet Python via read...