Monday, December 3, 2018

Julien Danjou: A multi-value syntax tree filtering in Python

A multi-value syntax tree filtering in Python

A while ago, we've seen how to write a simple filtering syntax tree with Python. The idea was to provide a small abstract syntax tree with an easy to write data structure that would be able to filter a value. Filtering meaning that once evaluated, our AST would return either True or False based on the passed value.

With that, we were able to write small rules like Filter({"eq": 3})(4) that would return False since, well, 4 is not equal to 3.

In this new post, I propose we enhance our filtering ability to support multiple values. The idea is to be able to write something like this:

>>> f = Filter(
  {"and": [
    {"eq": ("foo", 3)},
    {"gt": ("bar", 4)},
   ]
  },
)
>>> f(foo=3, bar=5)
True
>>> f(foo=4, bar=5)
False

The biggest change here is that the binary operators (eq, gt, le, etc.) now support getting two values, and not only one, and that we can pass multiple values to our filter by using keyword arguments.

How should we implement that? Well, we can keep the same data structure we built previously. However, this time we're gonna do the following change:

  • The left value of the binary operator will be a string that will be used as the key to access the keyword arguments passed to our Filter.__call__ values.
  • The right value of the binary operator will be kept as it is (like before).

We therefore need to change our Filter.build_evaluator to accommodate this as follow:

def build_evaluator(self, tree):
    try:
        operator, nodes = list(tree.items())[0]
    except Exception:
        raise InvalidQuery("Unable to parse tree %s" % tree)
    try:
        op = self.multiple_operators[operator]
    except KeyError:
        try:
            op = self.binary_operators[operator]
        except KeyError:
            raise InvalidQuery("Unknown operator %s" % operator)
        assert len(nodes) == 2 # binary operators take 2 values
        def _op(values):
            return op(values[nodes[0]], nodes[1])
        return _op
    # Iterate over every item in the list of the value linked
    # to the logical operator, and compile it down to its own
    # evaluator.
    elements = [self.build_evaluator(node) for node in nodes]
    return lambda values: op((e(values) for e in elements))

The algorithm is pretty much the same, the tree being browsed recursively.

First, the operator and its arguments (nodes) are extracted.

Then, if the operator takes multiple arguments (such as and and or operators), each node is recursively evaluated and a function is returned evaluating those nodes.
If the operator is a binary operator (such as eq, lt, etc.), it checks that the passed argument list length is 2. Then, it returns a function that will apply the operator (e.g., operator.eq) to values[nodes[0]] and nodes[1]: the former access the arguments (values) passed to the filter's __call__ function while the latter is directly the passed argument.

The full class looks like this:

import operator


class InvalidQuery(Exception):
    pass


class Filter(object):
    binary_operators = {
        u"=": operator.eq,
        u"==": operator.eq,
        u"eq": operator.eq,

        u"<": operator.lt,
        u"lt": operator.lt,

        u">": operator.gt,
        u"gt": operator.gt,

        u"<=": operator.le,
        u"≤": operator.le,
        u"le": operator.le,

        u">=": operator.ge,
        u"≥": operator.ge,
        u"ge": operator.ge,

        u"!=": operator.ne,
        u"≠": operator.ne,
        u"ne": operator.ne,
    }

    multiple_operators = {
        u"or": any,
        u"∨": any,
        u"and": all,
        u"∧": all,
    }

    def __init__(self, tree):
        self._eval = self.build_evaluator(tree)

    def __call__(self, **kwargs):
        return self._eval(kwargs)

    def build_evaluator(self, tree):
        try:
            operator, nodes = list(tree.items())[0]
        except Exception:
            raise InvalidQuery("Unable to parse tree %s" % tree)
        try:
            op = self.multiple_operators[operator]
        except KeyError:
            try:
                op = self.binary_operators[operator]
            except KeyError:
                raise InvalidQuery("Unknown operator %s" % operator)
            assert len(nodes) == 2 # binary operators take 2 values
            def _op(values):
                return op(values[nodes[0]], nodes[1])
            return _op
        # Iterate over every item in the list of the value linked
        # to the logical operator, and compile it down to its own
        # evaluator.
        elements = [self.build_evaluator(node) for node in nodes]
        return lambda values: op((e(values) for e in elements))

We can check that it works by building some filters:

x = Filter({"eq": ("foo", 1)})
assert not x(foo=1, bar=1)

x = Filter({"eq": ("foo", "bar")})
assert not x(foo=1, bar=1)

x = Filter({"or": (
    {"eq": ("foo", "bar")},
    {"eq": ("bar", 1)},
)})
assert x(foo=1, bar=1)

Supporting multiple values is handy as it allows to pass complete dictionaries to the filter, rather than just one value. That enables users to filter more complex objects.

Sub-dictionary support

It's also possible to support deeper data structure, like a dictionary of dictionary. By replacing values[nodes[0]] by self._resolve_name(values, node[0]) with a _resolve_name method like this one, the filter is able to traverse dictionaries:

ATTR_SEPARATOR = "."

def _resolve_name(self, values, name):
    try:
        for subname in name.split(self.ATTR_SEPARATOR):
            values = values[subname]
        return values
    except KeyError:
        raise InvalidQuery("Unknown attribute %s" % name)

It then works like that:

x = Filter({"eq": ("baz.sub", 23)})
assert x(foo=1, bar=1, baz={"sub": 23})

x = Filter({"eq": ("baz.sub", 23)})
assert not x(foo=1, bar=1, baz={"sub": 3})

By using the syntax key.subkey.subsubkey the filter is able to access item inside dictionaries on more complex data structure.

That basic filter engine can evolve quite easily in something powerful, as you can add new operators or new way to access/manipulate the passed data structure.

If you have other ideas on nifty features that could be added, feel free to add a comment below!



from Planet Python
via read more

No comments:

Post a Comment

TestDriven.io: Working with Static and Media Files in Django

This article looks at how to work with static and media files in a Django project, locally and in production. from Planet Python via read...