Sunday, January 20, 2019

Toshio Kuratomi: Optimizating Conway

Conway’s Game of Life seems to be a common programming exercise. I had to program it in Pascal when in High School and in C in an intro college programming course. I remember in college, since I had already programmed it before, that I wanted to optimize the algorithm. However, a combination of writing in C and having only a week to work on it didn’t leave me with enough time to implement anything fancy.

A couple years later, I hiked the Appalachian Trail. Seven months away from computers, just hiking day in and day out. One of the things I found myself contemplating when walking up and down hills all day was that pesky Game of Life algorithm and ways that I could improve it.

Fast forward through twenty intervening years of life and experience with a few other programming languages to last weekend. I needed a fun programming exercise to raise my spirits so I looked up the rules to Conway’s Game of Life, sat down with vim and python, and implemented a few versions to test out some of the ideas I’d had kicking around in my head for a quarter century.

This blog post will only contain a few snippets of code to illustrate the differences between each approach.  Full code can be checked out from this github repository.

The Naive Approach: Tracking the whole board

The naive branch is an approximation of how I would have first coded Conway’s Game of Life way back in that high school programming course. The grid of cells is what I would have called a two dimensional array in my Pascal and C days. In Python, I’ve more often heard it called a list of lists. Each entry in the outer list is a row on the grid which are each represented by another list. Each entry in the inner lists are cells in that row of the grid. If a cell is populated, then the list entry contains True. If not, then the list entry contains False.

One populated cell surrounded by empty cells would look like this:

board = [
         [False, False, False],
         [False, True, False],
         [False, False, False],
        ]

Looking up an individual cell’s status is a matter of looking up an index in two lists: First the y-index in the outer list and then the x-index in an inner list:


# Is there a populated cell at x-axis 0, y-axis 1?
if board[0][1] is True:
    pass

Checking for changes is done by looping through every cell on the Board, and checking whether each cell’s neighbors made the cell fit a rule to populate or depopulate the cell on the next iteration.

for y_idx, row in enumerate(board):
    for x_idx, cell in enumerate(row):
        if cell:
            if not check_will_live((x_idx, y_idx), board, max_x, max_y):
                next_board[y_idx][x_idx] = False
        else:
            if check_new_life((x_idx, y_idx), board, max_x, max_y):
                next_board[y_idx][x_idx] = True

This is a simple mapping of the two-dimensional grid that Conway’s takes place on into a computer data structure and then a literal translation of Conway’s ruleset onto those cells. However, it seems dreadfully inefficient. Even in college I could see that there should be easy ways to speed this up; I just needed the time to implement them.

Intermediate: Better checking

The intermediate branch rectifies inefficiencies with checking of the next generation cells. The naive branch checks every single cell that is present in the grid. However, thinking about most Conway setups, most of the cells are blank. If we find a way to ignore most of the blank cells, then it would save us a lot of work. We can’t ignore all blank cells, though; if a blank cell has exactly three populated neighbors then the blank cell will become populated in the next generation.

The key to satisfying both of those is to realize that all the cells we’re going to need to change will either be populated (in which case, they could die and become empty in the next generation) or they will be a neighbor of a populated cell (in which case, they may become populated next generation). So we can loop through our board and ignore all of the unpopulated cells at first. If we find a populated cell, then we must both check that cell to see if it will die and also check its empty neighbors to see if they will be filled in the next generation.

The major change to implement that is here:

       checked_cells = set()
       # We still loop through every cell in the board but now
       # the toplevel code to do something if the cell is empty
       # has been removed.
       for y_idx, row in enumerate(board):
            for x_idx, cell in enumerate(row):
                if cell:
                    if not check_will_live((x_idx, y_idx), board, max_x, max_y):
                        next_board[y_idx][x_idx] = False
                    # Instead, inside of the conditional block to
                    # process when a cell is populated, there's
                    # a new loop to check all of the neighbors of
                    # a populated cell.
                    for neighbor in (n for n in find_neighbors((x_idx, y_idx), max_x, max_y)
                                     if n not in checked_cells):
                        # If the cell is empty, then we check whether
                        # it should be populated in the next generation
                        if not board[neighbor[1]][neighbor[0]]:
                            checked_cells.add(neighbor)
                            if check_new_life(neighbor, board, max_x, max_y):
                                next_board[neighbor[1]][neighbor[0]] = True

Observant readers might also notice that I’ve added a checked_cells set. This tracks which empty cells we’ve already examined to see if they will be populated next generation. Making use of that means that we will only check a specific empty cell once per generation no matter how many populated cells it’s a neighbor of.

These optimizations to the checking code made things about 6x as fast as the naive approach.

Gridless: Only tracking populated cells

The principle behind the intermediate branch of only operating on populated cells and their neighbors seemed like it should be applicable to the data structure I was storing the grid in as well as the checks. Instead of using fixed length arrays to store both the populated and empty portions of the grid, I figured it should be possible to simply store the populated portions of the grid and then use those for all of our operations.

However, C is a very anemic language when it comes to built in data structures.
if I was going to do that in my college class, I would have had to implement a linked list or a hash map data structure before I even got to the point where I could implement the rules of Conway’s Game of Life. Today, working in Python with it’s built in data types, it was very quick to implement a data structure of only the populated cells.

For the gridless branch, I replaced the 2d array with a set. The set contained tuples of (x-coordinate, y-coordinate) which defined the populated cells. One populated cell surrounded by empty cells would look like this:

board = set((
             (1,1),
           ))

Using a set had all sorts of advantages:

  • Setup became a matter of adding tuples representing just the populated cells to the set instead of creating the list of lists and putting True or False into every one of the list’s cells:
        - board = []
        - for y in range(0, max_y):
        -     for x in range(0, max_x):
        -         board[x][y] = (x, y) in initial_dataset
        + board = set()
        + for x, y in initial_dataset:
        +     board.add((x, y))
    
  • For most cases, the set will be smaller than the list of lists as it only has to store entries for the populated cells, not for all of the cells in the grid. Some patterns may be larger than the list of lists (because the overhead of the set is more than the overhead for a list of lists) but those patterns are largely unstable; after a few generations, the number of depopulated cells will increase to a level where the set will be smaller.
  • Displaying the grid only has to loop through the populated cells instead of looping through every entry in the list of lists to find the populated cells:
    - for y in range(0, max_y):
    -     for x in range(0, max_x):
    -         if board[x][y]:
    -             screen.addstr(y, x, ' ', curses.A_REVERSE)
    + for (x, y) in board:
    +     screen.addstr(y, x, ' ', curses.A_REVERSE)
  • Instead of copying the old board and then changing the status of cells each generation, it is now feasible to simply populate a new set with populated cells every generation. This is because the set() is empty except for the populated cells whereas the list of lists would need to have both the populated and the empty cells set to either True or False.
         - next_board = copy.deepcopy(board)
         + next_board = set()
  • Similar to the display loop, the main loop which checks what happens to the cells can now just loop through the populated cells instead of having to search the entire grid for them:
        # Perform checks and update the board
        for cell in board:
            if check_will_live(cell, board, max_x, max_y):
                next_board.add(cell)
            babies = check_new_life(cell, board, max_x, max_y)
            next_board.update(babies)
        board = next_board
  • Checking whether a cell is populated now becomes a simple containment check on a set which is faster than looking up the list indexes in the 2d array:
- if board[cell[0]][cell[1]]:
+ if cell in board:

Gridless made the program about 3x faster than intermediate, or about 20x faster than naive.

Master: Only checking cell changes once per iteration

Despite being 3x faster than intermediate, gridless was doing some extra work. The code in the master branch attempts to correct those.

The most important change was that empty cells in gridless were being checked for each populated cell that was its neighbor. Adding a checked_cells set like the intermediate branch had to keep track of that ensures that we only check whether an empty cell should be populated in the next generation one time:

        checked_cells = set()
        for cell in board:
            if check_will_live(cell, board, max_x, max_y):
                next_board.add(cell)
            checked_cells.add(cell)
            # Pass checked_cells into check_new_life so that
            # checking skips empty neighbors which have already
            # been checked this generation
            babies, barren = check_new_life(cell, board, checked_cells, max_x, max_y)
            checked_cells.update(babies)
            checked_cells.update(barren)

The other, but relatively small, optimization was to use Python’s builtin least-recently-used cache decorator on the find_neighbors function. This allowed us to skip computing the set of neighboring cells when those cells were requested soon after each other. In the set-based code, finding_neighbors is called for the same cell back to back quite frequently so this did have a noticable impact.

+ @functools.lru_cache()
  def find_neighbors(cell, max_x, max_y):

These changes sped up the master branch an additional 30% over what gridless had achieved or nearly 30x as fast as the naive implementation that we started with.

What have I learned?

  • I really made these optimizations to Conway’s Game of Life purely to see whether any of the ideas I had had about it in the past 25 years would make a difference. It is pleasing to see that they did.
  • I think that choice of programming language made a huge difference in how I approached this problem. Even allowing that I was a less experienced programmer 25 years ago, I think that the cost in programmer time to implement sets and other advanced data structures in C would likely have taken long enough that I wouldn’t have finished the fastest of these designs as a one week class assignment using C. Implementing the same ideas in Python, where the data structures exist as part of the language, took less than a day.
  • I had originally envisioned using a linked list or set instead of a 2d array as purely a memory optimization. After I started coding it, it quickly became clear that only saving the populated cells also reduced the number of cells that would have to be searched through. I’d like to say I saw that from the beginning but it was just serendipity.


from Planet Python
via read more

No comments:

Post a Comment

TestDriven.io: Working with Static and Media Files in Django

This article looks at how to work with static and media files in a Django project, locally and in production. from Planet Python via read...