You’ve somehow ended up with lists nested inside of lists, possibly like this one:
1
|
|
But you want just a single list (without the nesting) like this:
1
|
|
You need to flatten your list-of-lists.
We’re looking for a “shallow” flatten
We can think of this as a shallow flatten operation, meaning we’re flattening this list by one level. A deep flatten operation would handle lists-of-lists-of-lists-of-lists (and so on) and that’s a bit more than we need for our use case.
The flattening strategy we come up with should work on lists-of-lists as well as any other type of iterable-of-iterables. For example lists of tuples should be flattenable:
1
|
|
And even an odd type like a dict_items
object (which we get from asking a dictionary for its items) should be flattenable:
1 2 3 4 |
|
Flattening iterables-of-iterables with a for
loop
One way to flatten an iterable-of-iterables is with a for
loop. We can loop one level deep to get each of the inner iterables.
1 2 |
|
And then we loop a second level deep to get each item from each inner iterable.
1 2 3 |
|
And then append each item to a new list:
1 2 3 4 |
|
There’s also a list method that makes this a bit shorter, the extend
method:
1 2 3 |
|
The list extend
method accepts an iterable and appends every item in the iterable you give to it.
Or we could use the +=
operator to concatenate each list to our new list:
1 2 3 |
|
You can think of +=
on lists as calling the extend
method. With lists these two operations (+=
and extend
) are equivalent.
Flattening iterables-of-iterables with a comprehension
This nested for
loop with an append
call might look familiar:
1 2 3 4 |
|
The structure of this code looks like something we could copy-paste into a list comprehension.
Inside our square brackets we’d copy the thing we’re appending first, and then the logic for our first loop, and then the logic for our second loop:
1 2 3 4 5 |
|
This comprehension loops two levels deep, just like our nested for
loops did. Note that the order of the for
clauses in the comprehension must remain the same as the order of the for
loops.
The (sometimes confusing) order of those for
clauses is partly why I recommend copy-pasting into a comprehension. When turning a for
loop into a comprehension, the for
and if
clauses remain in the same relative place, but the thing you’re appending moves from the end to the beginning.
Could we flatten with *
in a comprehension?
But what about Python’s *
operator? I’ve written about the many uses for the prefixed asterisk symbol in Python.
We can use *
in Python’s list literal syntax ([
…]
) to unpack an iterable into a new list:
1 2 3 4 |
|
Could we use that *
operator to unpack an iterable within a comprehension?
1 2 3 4 |
|
We can’t. If we try to do this Python will specifically tell us that the *
operator can’t be used like this in a comprehension:
1 2 3 4 5 6 7 8 |
|
This feature was specifically excluded from PEP 448, the Python Enhancement Proposal that added this *
-in-list-literal syntax to Python due to readability concerns.
Can’t we use sum
?
Here’s another list flattening trick I’ve seen a few times:
1
|
|
This does work:
1 2 |
|
But I find this technique pretty unintuitive.
We use the +
operator in Python for both adding numbers and concatenating sequences and the sum
function happens to work with anything that supports the +
operator (thanks to duck typing). But in my mind, the word “sum” implies arithmetic: summing adds numbers together.
I find it confusing to “sum” lists, so I don’t recommend this approach.
What about itertools.chain
?
There is one more tool that’s often used for flattening: the chain
utility in the itertools
module.
chain
accepts any number arguments and it returns an iterator:
1 2 3 |
|
We can loop over that iterator or turn it into another iterable, like a list:
1 2 |
|
There’s actually a method on chain
that’s specifically for flattening a single iterable:
1 2 |
|
Using chain.from_iterable
is more performant than using chain
with *
because *
unpacks the whole iterable immediately when chain
is called.
Recap: comparing list flattening techniques
If you want to flatten an iterable-of-iterables lazily, I would use itertools.chain.from_iterable
:
1 2 |
|
This will return an iterator, meaning no work will be done until the returned iterable is looped over:
1 2 |
|
And it will be consumed as we loop, so looping twice will result in an empty iterable:
1 2 |
|
If you find itertools.chain
a bit too cryptic, you might prefer a for
loop that calls the extend
method on a new list to repeatedly extend the values in each iterable:
1 2 3 |
|
Or a for
loop that uses the +=
operator on our new list:
1 2 3 |
|
Unlike chain.from_iterable
, both of these for
loops build up new list rather than a lazy iterator object.
If you find list comprehensions readable (I love them for signaling “look we’re building up a list”) then you might prefer a comprehension instead:
1 2 3 4 5 |
|
And if you do want laziness (an iterator) but you don’t like itertools.chain
you could make a generator expression that does tha same thing as itertools.chain.from_iterable
:
1 2 3 4 5 |
|
Happy list flattening!
from Planet Python
via read more
No comments:
Post a Comment