Wednesday, January 2, 2019

Modeling Polymorphism in Django With Python

Modeling polymorphism in relational databases is a challenging task. In this article, we present several modeling techniques to represent polymorphic objects in a relational database using the Django object-relational mapping (ORM).

This intermediate-level tutorial is designed for readers who are already familiar with the fundamental design of Django.

What Is Polymorphism?

Polymorphism is the ability of an object to take on many forms. Common examples of polymorphic objects include event streams, different types of users, and products in an e-commerce website. A polymorphic model is used when a single entity requires different functionality or information.

In the examples above, all events are logged for future use, but they can contain different data. All users need to be able to log in, but they might have different profile structures. In every e-commerce website, a user wants to put different products in their shopping cart.
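To make the idea concrete before bringing in a database, here is a minimal plain-Python sketch of the e-commerce example. The class names, the delivery_fee() method, and the shipping rate are illustrative assumptions, not the models built later in this article:

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    price: int  # in cents

    def delivery_fee(self) -> int:
        raise NotImplementedError


@dataclass
class Book(Product):
    weight: int  # in grams

    def delivery_fee(self) -> int:
        # Hypothetical rate: 1 cent per gram shipped.
        return self.weight


@dataclass
class EBook(Product):
    download_link: str

    def delivery_fee(self) -> int:
        # Nothing to ship, so nothing to charge.
        return 0


cart = [
    Book(name='Python Tricks', price=1000, weight=200),
    EBook(
        name='The Old Man and the Sea',
        price=1500,
        download_link='https://books.com/12345',
    ),
]

# The cart treats every item as a Product, whatever its concrete type.
total = sum(item.price + item.delivery_fee() for item in cart)
print(total)
```

In memory this is easy, because a list can hold objects of any class. The rest of the article deals with getting the same behavior out of database tables.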

Why Is Modeling Polymorphism Challenging?

There are many ways to model polymorphism. Some approaches use standard features of the Django ORM, and some use special features of the Django ORM. The main challenges you’ll encounter when modeling polymorphic objects are the following:

  • How to represent a single polymorphic object: Polymorphic objects have different attributes. The Django ORM maps attributes to columns in the database. In that case, how should the Django ORM map attributes to the columns in the table? Should different objects reside in the same table? Should you have multiple tables?

  • How to reference instances of a polymorphic model: To utilize database and Django ORM features, you need to reference objects using foreign keys. How you decide to represent a single polymorphic object is crucial to your ability to reference it.

To truly understand the challenges of modeling polymorphism, you are going to take a small bookstore from its first online website to a big online shop selling all sorts of products. Along the way, you’ll experience and analyze different approaches for modeling polymorphism using the Django ORM.

Naive Implementation

You have a bookstore in a nice part of town right next to a coffee shop, and you want to start selling books online.

You sell only one type of product: books. In your online store, you want to show details about the books, like name and price. You want your users to browse around the website and collect many books, so you also need a cart. You eventually need to ship the books to the user, so you need to know the weight of each book to calculate the delivery fee.

Let’s create a simple model for your new bookstore:

from django.contrib.auth import get_user_model
from django.db import models


class Book(models.Model):
    name = models.CharField(
        max_length=100,
    )
    price = models.PositiveIntegerField(
        help_text='in cents',
    )
    weight = models.PositiveIntegerField(
        help_text='in grams',
    )

    def __str__(self) -> str:
        return self.name


class Cart(models.Model):
    user = models.OneToOneField(
        get_user_model(),
        primary_key=True,
        on_delete=models.CASCADE,
    )
    books = models.ManyToManyField(Book)

To create a new book, you provide a name, price, and weight:

>>> from naive.models import Book
>>> book = Book.objects.create(name='Python Tricks', price=1000, weight=200)
>>> book
<Book: Python Tricks>

To create a cart, you first need to associate it with a user:

>>> from django.contrib.auth import get_user_model
>>> haki = get_user_model().objects.create_user('haki')

>>> from naive.models import Cart
>>> cart = Cart.objects.create(user=haki)

Then the user can start adding items to it:

>>> cart.books.add(book)
>>> cart.books.all()
<QuerySet [<Book: Python Tricks>]>

Pro

  • Easy to understand and maintain: It’s sufficient for a single type of product.

Con

  • Restricted to homogeneous products: It only supports products with the same set of attributes. Polymorphism is not captured or permitted at all.

Sparse Model

With the success of your online bookstore, users started to ask if you also sell e-books. E-books are a great product for your online store, and you want to start selling them right away.

A physical book is different from an e-book:

  • An e-book has no weight. It’s a virtual product.

  • An e-book does not require shipment. Users download it from the website.

To make your existing model support the additional information for selling e-books, you add some fields to the existing Book model:

from django.contrib.auth import get_user_model
from django.db import models


class Book(models.Model):
    TYPE_PHYSICAL = 'physical'
    TYPE_VIRTUAL = 'virtual'
    TYPE_CHOICES = (
        (TYPE_PHYSICAL, 'Physical'),
        (TYPE_VIRTUAL, 'Virtual'),
    )
    type = models.CharField(
        max_length=20,
        choices=TYPE_CHOICES,
    )

    # Common attributes
    name = models.CharField(
        max_length=100,
    )
    price = models.PositiveIntegerField(
        help_text='in cents',
    )

    # Specific attributes
    weight = models.PositiveIntegerField(
        help_text='in grams',
    )
    download_link = models.URLField(
        null=True, blank=True,
    )

    def __str__(self) -> str:
        return f'[{self.get_type_display()}] {self.name}'


class Cart(models.Model):
    user = models.OneToOneField(
        get_user_model(),
        primary_key=True,
        on_delete=models.CASCADE,
    )
    books = models.ManyToManyField(
        Book,
    )

First, you added a type field to indicate what type of book it is. Then, you added a URL field to store the download link of the e-book.

To add a physical book to your bookstore, do the following:

>>> from sparse.models import Book
>>> physical_book = Book.objects.create(
...     type=Book.TYPE_PHYSICAL,
...     name='Python Tricks',
...     price=1000,
...     weight=200,
...     download_link=None,
... )
>>> physical_book
<Book: [Physical] Python Tricks>

To add a new e-book, you do the following:

>>> virtual_book = Book.objects.create(
...     type=Book.TYPE_VIRTUAL,
...     name='The Old Man and the Sea',
...     price=1500,
...     weight=0,
...     download_link='https://books.com/12345',
... )
>>> virtual_book
<Book: [Virtual] The Old Man and the Sea>

Your users can now add both books and e-books to the cart:

>>> from sparse.models import Cart
>>> cart = Cart.objects.create(user=user)
>>> cart.books.add(physical_book, virtual_book)
>>> cart.books.all()
<QuerySet [<Book: [Physical] Python Tricks>, <Book: [Virtual] The Old Man and the Sea>]>

The virtual books are a big hit, and you decide to hire employees. The new employees are apparently not so tech savvy, and you start seeing weird things in the database:

>>> Book.objects.create(
...     type=Book.TYPE_PHYSICAL,
...     name='Python Tricks',
...     price=1000,
...     weight=0,
...     download_link='http://books.com/54321',
... )

That book apparently weighs 0 grams and has a download link.

This e-book apparently weighs 100g and has no download link:

>>> Book.objects.create(
...     type=Book.TYPE_VIRTUAL,
...     name='Python Tricks',
...     price=1000,
...     weight=100,
...     download_link=None,
... )

This doesn’t make any sense. You have a data integrity problem.

To overcome integrity problems, you add validations to the model:

from django.core.exceptions import ValidationError


class Book(models.Model):

    # ...

    def clean(self) -> None:
        if self.type == Book.TYPE_VIRTUAL:
            if self.weight != 0:
                raise ValidationError(
                    'A virtual product weight cannot exceed zero.'
                )

            if self.download_link is None:
                raise ValidationError(
                    'A virtual product must have a download link.'
                )

        elif self.type == Book.TYPE_PHYSICAL:
            if self.weight == 0:
                raise ValidationError(
                    'A physical product weight must exceed zero.'
                )

            if self.download_link is not None:
                raise ValidationError(
                    'A physical product cannot have a download link.'
                )

        else:
            assert False, f'Unknown product type "{self.type}"'

You used Django’s built-in validation mechanism to enforce data integrity rules. clean() is only called automatically by Django forms. For objects that are not created by a Django form, you need to make sure to explicitly validate the object.

To keep the integrity of the Book model intact, you need to make a little change to the way you create books:

>>> book = Book(
...    type=Book.TYPE_PHYSICAL,
...    name='Python Tricks',
...    price=1000,
...    weight=0,
...    download_link='http://books.com/54321',
... )
>>> book.full_clean()
ValidationError: {'__all__': ['A physical product weight must exceed zero.']}

>>> book = Book(
...    type=Book.TYPE_VIRTUAL,
...    name='Python Tricks',
...    price=1000,
...    weight=100,
...    download_link=None,
... )
>>> book.full_clean()
ValidationError: {'__all__': ['A virtual product weight cannot exceed zero.']}

When creating objects using the default manager (Book.objects.create(...)), Django will create an object and immediately persist it to the database.

In your case, you want to validate the object before saving it to the database. You first create the object (Book(...)), validate it (book.full_clean()), and only then save it (book.save()).
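clean() only protects rows that actually go through full_clean(). A complementary safeguard, which this article does not otherwise cover, is to express the same rules as a CHECK constraint so the database itself rejects inconsistent rows. The sketch below uses Python’s built-in sqlite3 module and a hand-written table (an assumption for illustration; it is not the schema Django generates for the Book model):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        type TEXT NOT NULL CHECK (type IN ('physical', 'virtual')),
        name TEXT NOT NULL,
        price INTEGER NOT NULL,
        weight INTEGER NOT NULL,
        download_link TEXT,
        -- Same rules as clean(), enforced by the database itself.
        CHECK (
            (type = 'virtual' AND weight = 0 AND download_link IS NOT NULL)
            OR
            (type = 'physical' AND weight > 0 AND download_link IS NULL)
        )
    )
""")


def insert_book(type_, name, price, weight, download_link):
    conn.execute(
        'INSERT INTO book (type, name, price, weight, download_link) '
        'VALUES (?, ?, ?, ?, ?)',
        (type_, name, price, weight, download_link),
    )


# A consistent physical book is accepted.
insert_book('physical', 'Python Tricks', 1000, 200, None)

# A "physical" book with no weight and a download link is rejected.
try:
    insert_book('physical', 'Python Tricks', 1000, 0, 'http://books.com/54321')
except sqlite3.IntegrityError as exc:
    rejected = True
    print('rejected:', exc)
else:
    rejected = False
```

Django 2.2 and later can declare this kind of rule on the model with models.CheckConstraint. Either way, the constraint grows with every new product type, so it reduces the risk of bad data but not the complexity of the sparse model.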

Pro

  • Easy to understand and maintain: The sparse model is usually the first step we take when certain types of objects need more information. It’s very intuitive and easy to understand.

Cons

  • Unable to utilize NOT NULL database constraints: Null values are used for attributes that are not defined for all types of objects.

  • Complex validation logic: Complex validation logic is required to enforce data integrity rules. The complex logic also requires more tests.

  • Many null fields create clutter: Representing multiple types of products in a single model makes it harder to understand and maintain.

  • New types require schema changes: New types of products require additional fields and validations.

Use Case

The sparse model is ideal when you’re representing heterogeneous objects that share most attributes, and when new items are not added very often.

Semi-Structured Model

Your bookstore is now a huge success, and you are selling more and more books. You have books from different genres and publishers, e-books with different formats, books with odd shapes and sizes, and so on.

In the sparse model approach, you added fields for every new type of product. The model now has a lot of nullable fields, and new developers and employees are having trouble keeping up.

To address the clutter, you decide to keep only the common fields (name and price) on the model. You store the rest of the fields in a single JSONField:

from django.contrib.auth import get_user_model
from django.contrib.postgres.fields import JSONField
from django.db import models

class Book(models.Model):
    TYPE_PHYSICAL = 'physical'
    TYPE_VIRTUAL = 'virtual'
    TYPE_CHOICES = (
        (TYPE_PHYSICAL, 'Physical'),
        (TYPE_VIRTUAL, 'Virtual'),
    )
    type = models.CharField(
        max_length=20,
        choices=TYPE_CHOICES,
    )

    # Common attributes
    name = models.CharField(
        max_length=100,
    )
    price = models.PositiveIntegerField(
        help_text='in cents',
    )

    extra = JSONField()

    def __str__(self) -> str:
        return f'[{self.get_type_display()}] {self.name}'


class Cart(models.Model):
    user = models.OneToOneField(
        get_user_model(),
        primary_key=True,
        on_delete=models.CASCADE,
    )
    books = models.ManyToManyField(
        Book,
        related_name='+',
    )

Your Book model is now clutter-free. Common attributes are modeled as fields. Attributes that are not common to all types of products are stored in the extra JSON field:

>>> from semi_structured.models import Book
>>> physical_book = Book(
...     type=Book.TYPE_PHYSICAL,
...     name='Python Tricks',
...     price=1000,
...     extra={'weight': 200},
... )
>>> physical_book.full_clean()
>>> physical_book.save()
>>> physical_book
<Book: [Physical] Python Tricks>

>>> virtual_book = Book(
...     type=Book.TYPE_VIRTUAL,
...     name='The Old Man and the Sea',
...     price=1500,
...     extra={'download_link': 'http://books.com/12345'},
... )
>>> virtual_book.full_clean()
>>> virtual_book.save()
>>> virtual_book
<Book: [Virtual] The Old Man and the Sea>

>>> from semi_structured.models import Cart
>>> cart = Cart.objects.create(user=user)
>>> cart.books.add(physical_book, virtual_book)
>>> cart.books.all()
<QuerySet [<Book: [Physical] Python Tricks>, <Book: [Virtual] The Old Man and the Sea>]>

Clearing up the clutter is important, but it comes with a cost. The validation logic is a lot more complicated:

from django.core.exceptions import ValidationError
from django.core.validators import URLValidator

class Book(models.Model):

    # ...

    def clean(self) -> None:

        if self.type == Book.TYPE_VIRTUAL:

            try:
                weight = int(self.extra['weight'])
            except ValueError:
                raise ValidationError(
                    'Weight must be a number'
                )
            except KeyError:
                pass
            else:
                if weight != 0:
                    raise ValidationError(
                        'A virtual product weight cannot exceed zero.'
                    )

            try:
                download_link = self.extra['download_link']
            except KeyError:
                pass
            else:
                # Will raise a validation error
                URLValidator()(download_link)

        elif self.type == Book.TYPE_PHYSICAL:

            try:
                weight = int(self.extra['weight'])
            except ValueError:
                raise ValidationError(
                    'Weight must be a number'
                )
            except KeyError:
                pass
            else:
                if weight == 0:
                    raise ValidationError(
                        'A physical product weight must exceed zero.'
                    )

            try:
                download_link = self.extra['download_link']
            except KeyError:
                pass
            else:
                if download_link is not None:
                    raise ValidationError(
                        'A physical product cannot have a download link.'
                    )

        else:
            raise ValidationError(f'Unknown product type "{self.type}"')

The benefit of using a proper field is that it validates the type. Both the database and the Django ORM can perform checks to make sure the right type is used for the field. When using a JSONField, you need to validate both the type and the value yourself:

>>> book = Book(
...     type=Book.TYPE_VIRTUAL,
...     name='Python Tricks',
...     price=1000,
...     extra={'weight': 100},
... )
>>> book.full_clean()
ValidationError: {'__all__': ['A virtual product weight cannot exceed zero.']}

Another issue with using JSON is that not all databases have proper support for querying and indexing values in JSON fields.

In PostgreSQL, for example, you can query all the books that weigh more than 100 grams:

>>> Book.objects.filter(extra__weight__gt=100)
<QuerySet [<Book: [Physical] Python Tricks>]>

However, not all database vendors support that.

Another restriction imposed when using JSON is that you are unable to use database constraints such as not null, unique, and foreign keys. You will have to implement these constraints in the application.

This semi-structured approach resembles NoSQL architecture and has many of its advantages and disadvantages. The JSON field is a way to get around the strict schema of a relational database. This hybrid approach provides us with the flexibility to squash many object types into a single table while still maintaining some of the benefits of a relational, strictly and strongly typed database. For many common NoSQL use cases, this approach might actually be more suitable.

Pros

  • Reduce clutter: Common fields are stored on the model. Other fields are stored in a single JSON field.

  • Easier to add new types: New types of products don’t require schema changes.

Cons

  • Complicated and ad hoc validation logic: Validating a JSON field requires validating types as well as values. This challenge can be addressed by using other solutions to validate JSON data such as JSON schema.

  • Unable to utilize database constraints: Database constraints such as not null, unique, and foreign key constraints, which enforce type and data integrity at the database level, cannot be used.

  • Restricted by database support for JSON: Not all database vendors support querying and indexing JSON fields.

  • Schema is not enforced by the database system: Schema changes might require backward compatibility or ad hoc migrations. Data can “rot.”

  • No deep integration with the database metadata system: Metadata about the fields is not stored in the database. Schema is only enforced at the application level.
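One way to tame the ad hoc validation is to describe the expected keys of each type as data and check them with a single function. The sketch below is plain Python with hand-rolled rules. The RULES table, the check functions, and validate_extra() are hypothetical names, and unlike the clean() method shown earlier, this version treats missing keys as errors. A library implementing JSON Schema could play the same role:

```python
from typing import Any, Dict, List


def is_positive_int(value: Any) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def is_http_url(value: Any) -> bool:
    # Deliberately simplistic; a real project would use a proper URL validator.
    return isinstance(value, str) and value.startswith(('http://', 'https://'))


# Hypothetical per-type rules: key -> (check function, error message).
RULES = {
    'physical': {
        'weight': (is_positive_int, 'weight must be a positive integer'),
    },
    'virtual': {
        'download_link': (is_http_url, 'download_link must be an http(s) URL'),
    },
}


def validate_extra(book_type: str, extra: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means extra is valid."""
    if book_type not in RULES:
        return [f'unknown product type "{book_type}"']

    rules = RULES[book_type]
    errors = []
    for key, (check, message) in rules.items():
        if key not in extra:
            errors.append(f'{key} is required')
        elif not check(extra[key]):
            errors.append(message)

    # Keys that belong to another type are rejected as well.
    for key in extra:
        if key not in rules:
            errors.append(f'{key} is not allowed for {book_type} products')
    return errors


print(validate_extra('physical', {'weight': 200}))
print(validate_extra('virtual', {'weight': 100}))
```

Adding a new product type then means adding an entry to RULES rather than another branch of nested try blocks. The schema is still enforced only by the application.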

Use Case

A semi-structured model is ideal when you need to represent heterogeneous objects that don’t share many common attributes, and when new items are added often.

A classic use case for the semi-structured approach is storing events (like logs, analytics, and event stores). Most events have a timestamp, type and metadata like device, user agent, user, and so on. The data for each type is stored in a JSON field. For analytics and log events, it’s important to be able to add new types of events with minimal effort, so this approach is ideal.

Abstract Base Model

So far, you’ve worked around the problem of actually treating your products as heterogeneous. You worked under the assumption that the differences between the products are minimal, so it made sense to maintain them in the same model. This assumption can take you only so far.

Your little store is growing fast, and you want to start selling entirely different types of products, such as e-readers, pens, and notebooks.

A book and an e-book are both products. A product is defined using common attributes such as name and price. In an object-oriented environment, you could look at a Product as a base class or an interface. Every new type of product you add must implement the Product class and extend it with its own attributes.

Django offers the ability to create abstract base classes. Let’s define a Product abstract base class and add two models for Book and EBook:

from django.contrib.auth import get_user_model
from django.db import models


class Product(models.Model):
    class Meta:
        abstract = True

    name = models.CharField(
        max_length=100,
    )
    price = models.PositiveIntegerField(
        help_text='in cents',
    )

    def __str__(self) -> str:
        return self.name


class Book(Product):
    weight = models.PositiveIntegerField(
        help_text='in grams',
    )


class EBook(Product):
    download_link = models.URLField()

Notice that both Book and EBook inherit from Product. The fields defined in the base class Product are inherited, so the derived models Book and EBook don’t need to repeat them.

To add new products, you use the derived classes:

>>> from abstract_base_model.models import Book, EBook
>>> book = Book.objects.create(name='Python Tricks', price=1000, weight=200)
>>> book
<Book: Python Tricks>

>>> ebook = EBook.objects.create(
...     name='The Old Man and the Sea',
...     price=1500,
...     download_link='http://books.com/12345',
... )
>>> ebook
<EBook: The Old Man and the Sea>

You might have noticed that the Cart model is missing. You can try to create a Cart model with a ManyToMany field to Product:

class Cart(models.Model):
    user = models.OneToOneField(
       get_user_model(),
       primary_key=True,
       on_delete=models.CASCADE,
    )
    items = models.ManyToManyField(Product)

If you try to reference a ManyToMany field to an abstract model, you will get the following error:

abstract_base_model.Cart.items: (fields.E300) Field defines a relation with model 'Product', which is either not installed, or is abstract.

A foreign key constraint can only point to a concrete table. The abstract base model Product only exists in the code, so there is no products table in the database. The Django ORM will only create tables for the derived models Book and EBook.

Given that you can’t reference the abstract base class Product, you need to reference books and e-books directly:

class Cart(models.Model):
    user = models.OneToOneField(
        get_user_model(),
        primary_key=True,
        on_delete=models.CASCADE,
    )
    books = models.ManyToManyField(Book)
    ebooks = models.ManyToManyField(EBook)

You can now add both books and e-books to the cart:

>>> user = get_user_model().objects.first()
>>> cart = Cart.objects.create(user=user)
>>> cart.books.add(book)
>>> cart.ebooks.add(ebook)

This model is a bit more complicated now. Let’s query the total price of the items in the cart:

>>> from django.db.models import Sum
>>> books_total = cart.books.aggregate(total=Sum('price'))['total'] or 0
>>> ebooks_total = cart.ebooks.aggregate(total=Sum('price'))['total'] or 0
>>> books_total + ebooks_total
2500

Because books and e-books live in separate tables, you have to aggregate each relation separately and add the results. Joining both relations in a single query and summing Coalesce('books__price', 'ebooks__price') looks tempting, but every joined row then carries both a book price and an e-book price, and Coalesce keeps only the first one. For the cart above, that query would report 1000 instead of 2500.
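It can help to look at the rows a query works on when it joins both relations at once. The sketch below rebuilds the cart with Python’s built-in sqlite3 module, using simplified hand-written tables (assumptions for illustration, not the exact schema Django generates). When both relations are joined in one query, each row carries a book price and an e-book price, and COALESCE keeps only the first. Summing each relation separately counts every item once:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE cart (id INTEGER PRIMARY KEY);
    CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT, price INTEGER);
    CREATE TABLE ebook (id INTEGER PRIMARY KEY, name TEXT, price INTEGER);
    CREATE TABLE cart_books (
        cart_id INTEGER REFERENCES cart(id),
        book_id INTEGER REFERENCES book(id)
    );
    CREATE TABLE cart_ebooks (
        cart_id INTEGER REFERENCES cart(id),
        ebook_id INTEGER REFERENCES ebook(id)
    );

    INSERT INTO cart VALUES (1);
    INSERT INTO book VALUES (1, 'Python Tricks', 1000);
    INSERT INTO ebook VALUES (1, 'The Old Man and the Sea', 1500);
    INSERT INTO cart_books VALUES (1, 1);
    INSERT INTO cart_ebooks VALUES (1, 1);
""")

# Joining both relations at once: the single result row pairs the book
# with the e-book, so COALESCE only ever sees the book price.
joined_rows = conn.execute("""
    SELECT book.price, ebook.price, COALESCE(book.price, ebook.price)
    FROM cart
    LEFT JOIN cart_books ON cart_books.cart_id = cart.id
    LEFT JOIN book ON book.id = cart_books.book_id
    LEFT JOIN cart_ebooks ON cart_ebooks.cart_id = cart.id
    LEFT JOIN ebook ON ebook.id = cart_ebooks.ebook_id
    WHERE cart.id = 1
""").fetchall()
print(joined_rows)

# Summing each relation separately counts every item exactly once.
(total,) = conn.execute("""
    SELECT
        (SELECT COALESCE(SUM(book.price), 0)
         FROM cart_books JOIN book ON book.id = cart_books.book_id
         WHERE cart_books.cart_id = 1)
        +
        (SELECT COALESCE(SUM(ebook.price), 0)
         FROM cart_ebooks JOIN ebook ON ebook.id = cart_ebooks.ebook_id
         WHERE cart_ebooks.cart_id = 1)
""").fetchone()
print(total)
```

Every additional product type adds another relation to sum, which is the maintenance cost listed in the cons that follow.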

Pro

  • Easier to implement specific logic: A separate model for each product makes it easier to implement, test, and maintain specific logic.

Cons

  • Require multiple foreign keys: To reference all types of products, each type needs a foreign key.

  • Harder to implement and maintain: Operations on all types of products require checking all foreign keys. This adds complexity to the code and makes maintenance and testing harder.

  • Very hard to scale: New types of products require additional models. Managing many models can be tedious and very hard to scale.

Use Case

An abstract base model is a good choice when there are very few types of objects that require very distinct logic.

An intuitive example is modeling a payment process for your online shop. You want to accept payments with credit cards, PayPal, and store credit. Each payment method goes through a very different process that requires very distinct logic. Adding a new type of payment is not very common, and you don’t plan on adding new payment methods in the near future.

You create a payment process base class with derived classes for credit card payment process, PayPal payment process, and store credit payment process. For each of the derived classes, you implement the payment process in a very different way that cannot be easily shared. In this case, it might make sense to handle each payment process specifically.

Concrete Base Model

Django offers another way to implement inheritance in models. Instead of using an abstract base class that only exists in the code, you can make the base class concrete. “Concrete” means that the base class exists in the database as a table, unlike in the abstract base class solution, where the base class only exists in the code.

Using the abstract base model, you were unable to reference multiple types of products. You were forced to create a many-to-many relation for each type of product. This made it harder to perform tasks on the common fields such as getting the total price of all the items in the cart.

Using a concrete base class, Django will create a table in the database for the Product model. The Product model will have all the common fields you defined in the base model. Derived models such as Book and EBook will reference the Product table using a one-to-one field. To reference a product, you create a foreign key to the base model:

from django.contrib.auth import get_user_model
from django.db import models


class Product(models.Model):
    name = models.CharField(
        max_length=100,
    )
    price = models.PositiveIntegerField(
        help_text='in cents',
    )

    def __str__(self) -> str:
        return self.name


class Book(Product):
    weight = models.PositiveIntegerField()


class EBook(Product):
    download_link = models.URLField()

The only difference between this example and the previous one is that the Product model is not defined with abstract=True.

To create new products, you use derived Book and EBook models directly:

>>> from concrete_base_model.models import Book, EBook
>>> book = Book.objects.create(
...     name='Python Tricks',
...     price=1000,
...     weight=200,
... )
>>> book
<Book: Python Tricks>

>>> ebook = EBook.objects.create(
...     name='The Old Man and the Sea',
...     price=1500,
...     download_link='http://books.com/12345',
... )
>>> ebook
<EBook: The Old Man and the Sea>

In the case of a concrete base class, it’s interesting to see what’s happening in the underlying database. Let’s look at the tables created by Django in the database:

> \d concrete_base_model_product

 Column |          Type          |                         Default
--------+------------------------+---------------------------------------------------------
 id     | integer                | nextval('concrete_base_model_product_id_seq'::regclass)
 name   | character varying(100) |
 price  | integer                |

Indexes:
   "concrete_base_model_product_pkey" PRIMARY KEY, btree (id)

Referenced by:
   TABLE "concrete_base_model_cart_items" CONSTRAINT "..." FOREIGN KEY (product_id) 
   REFERENCES concrete_base_model_product(id) DEFERRABLE INITIALLY DEFERRED

   TABLE "concrete_base_model_book" CONSTRAINT "..." FOREIGN KEY (product_ptr_id) 
   REFERENCES concrete_base_model_product(id) DEFERRABLE INITIALLY DEFERRED

   TABLE "concrete_base_model_ebook" CONSTRAINT "..." FOREIGN KEY (product_ptr_id) 
   REFERENCES concrete_base_model_product(id) DEFERRABLE INITIALLY DEFERRED

The product table has two familiar fields: name and price. These are the common fields you defined in the Product model. Django also created an ID primary key for you.

In the constraints section, you see multiple tables that are referencing the product table. Two tables that stand out are concrete_base_model_book and concrete_base_model_ebook:

> \d concrete_base_model_book

     Column     |  Type
----------------+---------
 product_ptr_id | integer
 weight         | integer

Indexes:
   "concrete_base_model_book_pkey" PRIMARY KEY, btree (product_ptr_id)

Foreign-key constraints:
   "..." FOREIGN KEY (product_ptr_id) REFERENCES concrete_base_model_product(id) 
   DEFERRABLE INITIALLY DEFERRED

The Book model has only two fields:

  • weight is the field you added in the derived Book model.
  • product_ptr_id is both the primary key of the table and a foreign key to the base product model.

Behind the scenes, Django created a base table for product. Then, for each derived model, Django created another table that includes the additional fields, and a field that acts both as a primary key and a foreign key to the product table.

Let’s take a look at a query generated by Django to fetch a single book. Here are the results of print(Book.objects.filter(pk=1).query):

SELECT
    "concrete_base_model_product"."id",
    "concrete_base_model_product"."name",
    "concrete_base_model_product"."price",
    "concrete_base_model_book"."product_ptr_id",
    "concrete_base_model_book"."weight"
FROM
    "concrete_base_model_book"
    INNER JOIN "concrete_base_model_product" ON
        "concrete_base_model_book"."product_ptr_id" = "concrete_base_model_product"."id"
WHERE
    "concrete_base_model_book"."product_ptr_id" = 1

To fetch a single book, Django joined concrete_base_model_product and concrete_base_model_book on the product_ptr_id field. The name and price are in the product table and the weight is in the book table.
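The same layout and join can be reproduced outside Django. The following sketch uses Python’s built-in sqlite3 module with shortened, hand-written table names (an assumption for readability; Django generates the concrete_base_model_* names shown above) to show that creating a book takes two inserts and reading it back takes a join:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('PRAGMA foreign_keys = ON')
conn.executescript("""
    CREATE TABLE product (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        price INTEGER NOT NULL
    );
    -- The child table's primary key is also a foreign key to product.
    CREATE TABLE book (
        product_ptr_id INTEGER PRIMARY KEY REFERENCES product(id),
        weight INTEGER NOT NULL
    );
""")


def create_book(name, price, weight):
    """Insert the common columns first, then the book-specific ones."""
    cursor = conn.execute(
        'INSERT INTO product (name, price) VALUES (?, ?)',
        (name, price),
    )
    product_id = cursor.lastrowid
    conn.execute(
        'INSERT INTO book (product_ptr_id, weight) VALUES (?, ?)',
        (product_id, weight),
    )
    return product_id


book_id = create_book('Python Tricks', 1000, 200)

# Reading a full book back requires joining the two tables.
row = conn.execute("""
    SELECT product.id, product.name, product.price, book.weight
    FROM book
    INNER JOIN product ON book.product_ptr_id = product.id
    WHERE book.product_ptr_id = ?
""", (book_id,)).fetchone()
print(row)
```

Queries that need only name and price can skip the join and read the product table directly.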

Since all the products are managed in the Product table, you can now reference it in a foreign key from the Cart model:

class Cart(models.Model):
    user = models.OneToOneField(
        get_user_model(),
        primary_key=True,
        on_delete=models.CASCADE,
    )
    items = models.ManyToManyField(Product)

Adding items to the cart is the same as before:

>>> from concrete_base_model.models import Cart
>>> cart = Cart.objects.create(user=user)
>>> cart.items.add(book, ebook)
>>> cart.items.all()
<QuerySet [<Product: Python Tricks>, <Product: The Old Man and the Sea>]>

Working with common fields is also simple:

>>> from django.db.models import Sum
>>> cart.items.aggregate(total_price=Sum('price'))
{'total_price': 2500}

Pros

  • Primary key is consistent across all types: Every product’s primary key is issued by a single sequence in the base table, so an ID identifies a product regardless of its type. If relying on a single sequence is a concern, you can use a UUID instead.

  • Common attributes can be queried from a single table: Common queries such as total price, list of product names, and prices can be fetched directly from the base table.

Cons

  • New product types require schema changes: A new type requires a new model.

  • Can produce inefficient queries: The data for a single item is in two database tables. Fetching a product requires a join with the base table.

  • Cannot access extended data from base class instance: A type field is required to downcast an item. This adds complexity to the code. django-polymorphic is a popular module that might eliminate some of these challenges.
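To see why downcasting is awkward without a type field, consider what it takes to find the concrete type of a base row. The sketch below again uses Python’s built-in sqlite3 module with simplified, hand-written tables (assumed names, not Django’s generated schema) and a hypothetical downcast() helper that probes each child table in turn:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price INTEGER);
    CREATE TABLE book (
        product_ptr_id INTEGER PRIMARY KEY REFERENCES product(id),
        weight INTEGER
    );
    CREATE TABLE ebook (
        product_ptr_id INTEGER PRIMARY KEY REFERENCES product(id),
        download_link TEXT
    );
    INSERT INTO product VALUES (1, 'Python Tricks', 1000);
    INSERT INTO book VALUES (1, 200);
    INSERT INTO product VALUES (2, 'The Old Man and the Sea', 1500);
    INSERT INTO ebook VALUES (2, 'http://books.com/12345');
""")

# Fixed list of child tables; never build SQL from untrusted input.
CHILD_TABLES = ('book', 'ebook')


def downcast(product_id):
    """Return (child table name, child row) for a base product ID."""
    for table in CHILD_TABLES:
        row = conn.execute(
            f'SELECT * FROM {table} WHERE product_ptr_id = ?',
            (product_id,),
        ).fetchone()
        if row is not None:
            return table, row
    return None, None


print(downcast(1))
print(downcast(2))
```

Each lookup may cost one query per product type. Storing a type field on the base table replaces the probing with a single, targeted query.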

Use Case

The concrete base model approach is useful when common fields in the base class are sufficient to satisfy most common queries.

For example, if you often need to query for the cart total price, show a list of items in the cart, or run ad hoc analytic queries on the cart model, you can benefit from having all the common attributes in a single database table.

Generic Foreign Key

Inheritance can sometimes be a nasty business. It forces you to create (possibly premature) abstractions, and it doesn’t always fit nicely into the ORM.

The main problem you have is referencing different products from the cart model. You first tried to squash all the product types into one model (sparse model, semi-structured model), and you got clutter. Then you tried splitting products into separate models and providing a unified interface using a concrete base model. You got a complicated schema and a lot of joins.

Django offers a special way of referencing any model in the project called GenericForeignKey. Generic foreign keys are part of the Content Types framework built into Django. The content type framework is used by Django itself to keep track of models. This is necessary for some core capabilities such as migrations and permissions.

To better understand what content types are and how they facilitate generic foreign keys, let’s look at the content type related to the Book model:

>>> from django.contrib.contenttypes.models import ContentType
>>> ct = ContentType.objects.get_for_model(Book)
>>> vars(ct)
{'_state': <django.db.models.base.ModelState at 0x7f1c9ea64400>,
'id': 22,
'app_label': 'concrete_base_model',
'model': 'book'}

Each model has a unique identifier. If you want to reference a book with PK 54, you can say, “Get object with PK 54 in the model represented by content type 22.”

GenericForeignKey is implemented exactly like that. To create a generic foreign key, you define two fields:

  • A reference to a content type (the model)
  • The primary key of the referenced object (the model instance’s pk attribute)
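The two-step lookup can be sketched in plain Python, with dictionaries standing in for the content type table and the model tables. The names CONTENT_TYPES, TABLES, content_type_for(), and resolve() are hypothetical and only mimic the idea; they are not Django’s implementation:

```python
from dataclasses import dataclass


@dataclass
class Book:
    pk: int
    name: str


@dataclass
class EBook:
    pk: int
    name: str


# Stand-in for the content type table: one integer ID per model.
# The IDs are arbitrary, just like the 22 shown above.
CONTENT_TYPES = {22: Book, 23: EBook}

# Stand-in for the model tables, keyed by primary key.
TABLES = {
    Book: {54: Book(pk=54, name='Python Tricks')},
    EBook: {54: EBook(pk=54, name='The Old Man and the Sea')},
}


def content_type_for(model) -> int:
    """Reverse lookup: model class -> content type ID."""
    for content_type_id, candidate in CONTENT_TYPES.items():
        if candidate is model:
            return content_type_id
    raise LookupError(model)


def resolve(content_type_id: int, object_id: int):
    """Follow a generic reference: pick the model, then the row."""
    model = CONTENT_TYPES[content_type_id]
    return TABLES[model][object_id]


print(resolve(22, 54))
print(resolve(23, 54))
```

The same primary key, 54, refers to two different objects here. The content type ID is what tells them apart, which is why a generic reference always needs both values.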

To implement a many-to-many relation using GenericForeignKey, you need to manually create a model to connect carts with items.

The Cart model remains roughly similar to what you have seen so far:

from django.db import models
from django.contrib.auth import get_user_model


class Cart(models.Model):
    user = models.OneToOneField(
        get_user_model(),
        primary_key=True,
        on_delete=models.CASCADE,
    )

Unlike previous Cart models, this Cart no longer includes a ManyToMany field. You are going to need to create that relation yourself.

To represent a single item in the cart, you need to reference both the cart and any product:

from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType


class CartItem(models.Model):
    cart = models.ForeignKey(
        Cart,
        on_delete=models.CASCADE,
        related_name='items',
    )
    product_object_id = models.IntegerField()
    product_content_type = models.ForeignKey(
        ContentType,
        on_delete=models.PROTECT,
    )
    product = GenericForeignKey(
        'product_content_type',
        'product_object_id',
    )

To add a new item to the cart, you provide the cart, the content type, and the primary key. The examples below assume cart is an existing Cart instance:

>>>
>>> book = Book.objects.first()

>>> CartItem.objects.create(
...     cart=cart,
...     product_content_type=ContentType.objects.get_for_model(book),
...     product_object_id=book.pk,
... )
>>> ebook = EBook.objects.first()

>>> CartItem.objects.create(
...     cart=cart,
...     product_content_type=ContentType.objects.get_for_model(ebook),
...     product_object_id=ebook.pk,
... )

Adding an item to a cart is a common task. You can add a method on the cart to add any product to the cart:

class Cart(models.Model):

    # ...

    def add_item(self, product) -> 'CartItem':
        product_content_type = ContentType.objects.get_for_model(product)

        return CartItem.objects.create(
            cart=self,
            product_content_type=product_content_type,
            product_object_id=product.pk,
        )

Adding a new item to a cart is now much shorter:

>>>
>>> cart.add_item(book)
>>> cart.add_item(ebook)

Getting information about the items in the cart is also possible:

>>>
>>> cart.items.all()
<QuerySet [<CartItem: CartItem object (1)>, <CartItem: CartItem object (2)>]>

>>> item = cart.items.first()
>>> item.product
<Book: Python Tricks>

>>> item.product.price
1000

So far so good. Where’s the catch?

Let’s try to calculate the total price of the products in the cart:

>>>
>>> from django.db.models import Sum
>>> cart.items.aggregate(total=Sum('product__price'))

FieldError: Field 'product' does not generate an automatic reverse 
relation and therefore cannot be used for reverse querying. 
If it is a GenericForeignKey, consider adding a GenericRelation.

Django tells us it isn’t possible to traverse the generic relation from the generic model to the referenced model. The reason is that Django has no idea which table to join to. Remember, the CartItem model can point to any ContentType.

The error message does mention a GenericRelation. Using a GenericRelation, you can define a reverse relation from the referenced model to the CartItem model. For example, you can define a reverse relation from the Book model to the cart items that reference books:

from django.contrib.contenttypes.fields import GenericRelation
from django.db import models


class Book(models.Model):
    # ...
    cart_items = GenericRelation(
        'CartItem',
        object_id_field='product_object_id',
        content_type_field='product_content_type',
        related_query_name='books',
    )

Using the reverse relation, you can answer questions like how many cart items reference a specific book:

>>>
>>> book.cart_items.count()
4

>>> CartItem.objects.filter(books__id=book.id).count()
4

The two statements are equivalent.
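To see why the reverse direction works, it helps to look at what such a query boils down to. Once the target table is known (here, the book table), the database can join on the object ID and filter on the content type. The sketch below uses the standard library's sqlite3 module with a hypothetical, simplified schema. The table names, column names, and content type IDs are assumptions for illustration, and the SQL is not the exact statement Django emits.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cart_item (
        id INTEGER PRIMARY KEY,
        product_content_type_id INTEGER NOT NULL,
        product_object_id INTEGER NOT NULL
    );
    INSERT INTO book (id, name) VALUES (54, 'Python Tricks');
    -- Content type 22 stands for Book; 23 stands for another product model.
    INSERT INTO cart_item (product_content_type_id, product_object_id) VALUES
        (22, 54), (22, 54), (23, 54);
''')

BOOK_CONTENT_TYPE_ID = 22

# Join from the known table (book) and keep only rows whose content type
# says they reference a book.
count, = conn.execute(
    '''
    SELECT COUNT(*)
    FROM cart_item
    JOIN book ON book.id = cart_item.product_object_id
    WHERE cart_item.product_content_type_id = ?
      AND book.id = ?
    ''',
    (BOOK_CONTENT_TYPE_ID, 54),
).fetchone()
print(count)
```

The third cart item has the same object ID but a different content type, so it is not counted. Going the other way, from cart_item to "the product", there is no single table to name in the JOIN clause.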

You still need to know the price of the entire cart. You already saw that the ORM cannot traverse the generic foreign key to aggregate prices across the product tables in a single query. Instead, you have to iterate over the items, fetch each product separately, and aggregate in Python:

>>>
>>> sum(item.product.price for item in cart.items.all())
2500

This is one of the major disadvantages of generic foreign keys. The flexibility comes with a great performance cost. It’s very hard to optimize for performance using just the Django ORM.
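One common way to soften this cost is to group the items by content type and fetch each group in a single lookup, so the number of queries grows with the number of product types rather than the number of items. Django's prefetch_related() supports generic foreign keys and works along these lines. The sketch below is an in-memory illustration only: fetch_many() and the FAKE_TABLES data are hypothetical stand-ins for one query per product table.

```python
from collections import defaultdict

# Hypothetical data: content type id -> {pk: price in cents}.
FAKE_TABLES = {
    22: {1: 1000, 2: 1200},  # books
    23: {1: 1500},           # e-books
}

query_count = 0


def fetch_many(content_type_id, pks):
    """Stand-in for a single `WHERE id IN (...)` query against one table."""
    global query_count
    query_count += 1
    table = FAKE_TABLES[content_type_id]
    return {pk: table[pk] for pk in pks}


# Cart items as (content_type_id, object_id) pairs.
items = [(22, 1), (23, 1), (22, 2)]

# Group primary keys by content type.
pks_by_type = defaultdict(set)
for content_type_id, object_id in items:
    pks_by_type[content_type_id].add(object_id)

# One lookup per content type instead of one per item.
prices = {
    content_type_id: fetch_many(content_type_id, pks)
    for content_type_id, pks in pks_by_type.items()
}

total = sum(prices[ct][pk] for ct, pk in items)
print(total, query_count)
```

Three items of two product types are resolved with two lookups. The aggregation itself still happens in Python, not in the database.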

Structural Subtyping

In the abstract and concrete base class approaches, you used nominal subtyping, which is based on a class hierarchy. Mypy is able to detect this form of relation between two classes and infer types from it.

In the generic relation approach, you used structural subtyping. Structural subtyping exists when a class implements all the methods and attributes of another class. This form of subtyping is very useful when you wish to avoid direct dependency between modules.

Mypy provides a way to utilize structural subtyping using Protocols.

You already identified a product entity with common methods and attributes. You can define a Protocol:

from typing_extensions import Protocol

class Product(Protocol):
    pk: int
    name: str
    price: int

    def __str__(self) -> str:
        ...

You can now use the Product protocol to add type information. For example, in add_item(), you accept an instance of a product and add it to the cart:

def add_item(
    self,
    product: Product,
) -> 'CartItem':
    product_content_type = ContentType.objects.get_for_model(product)

    return CartItem.objects.create(
        cart=self,
        product_content_type=product_content_type,
        product_object_id=product.pk,
    )

Running mypy on this function will not yield any warnings. Let’s say you change product.pk to product.id, which is not defined in the Product protocol:

def add_item(
    self,
    product: Product,
) -> 'CartItem':
    product_content_type = ContentType.objects.get_for_model(product)

    return CartItem.objects.create(
        cart=self,
        product_content_type=product_content_type,
        product_object_id=product.id,
    )

You will get the following warning from Mypy:

$ mypy
models.py:62: error: "Product" has no attribute "id"
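Structural subtyping can also be tried outside Django. The following is a minimal, self-contained sketch: the Book and EBook dataclasses and the total_price() function are hypothetical and only loosely mirror the models in this article. It imports Protocol from the standard typing module, which is available on Python 3.8 and later; on older versions, typing_extensions provides it.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


class Product(Protocol):
    name: str
    price: int


@dataclass
class Book:
    name: str
    price: int
    weight: int


@dataclass
class EBook:
    name: str
    price: int
    download_link: str


def total_price(products: Iterable[Product]) -> int:
    """Sum the prices of anything that has the attributes of a Product."""
    return sum(product.price for product in products)


total = total_price([
    Book(name='Python Tricks', price=1000, weight=200),
    EBook(
        name='The Old Man and the Sea',
        price=1500,
        download_link='https://example.com/ebook',
    ),
])
print(total)
```

Neither Book nor EBook inherits from Product. They are accepted because they have the name and price attributes the protocol requires.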

Pros

  • Migrations are not needed to add product types: The generic foreign key can reference any model, so adding a new type of product does not require a schema change to the cart models.

  • Any model can be used as an item: Using a generic foreign key, any model can be referenced by the CartItem model.

  • Built-in admin support: Django has built-in support for generic foreign keys in the admin. It can inline, for example, information about the referenced models in the detail page.

  • Self-contained module: There is no direct dependency between the products module and the cart module. This makes this approach ideal for existing projects and pluggable modules.

Cons

  • Can produce inefficient queries: The ORM cannot determine in advance what models are referenced by the generic foreign key. This makes it very difficult for it to optimize queries that fetch multiple types of products.

  • Harder to understand and maintain: A generic foreign key rules out some Django ORM features that require access to specific product models. Accessing information from the product models requires writing more code.

  • Typing requires Protocol: Mypy is unable to provide type checking for generic models. A Protocol is required.

Use Case

Generic foreign keys are a great choice for pluggable modules or existing projects. The use of GenericForeignKey and structural subtyping abstracts away any direct dependency between the modules.

In the bookstore example, the book and e-book models can exist in a separate app and new products can be added without changing the cart module. For existing projects, a Cart module can be added with minimal changes to existing code.

The patterns presented in this article play nicely together. Using a mixture of patterns, you can eliminate some of the disadvantages and optimize the schema for your use case.

For example, in the generic foreign key approach, you were unable to get the price of the entire cart quickly. You had to fetch each item separately and aggregate. You can address this specific concern by inlining the price of the product on the CartItem model (the sparse model approach). This will allow you to query only the CartItem model to get the total price very quickly.
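The effect of inlining the price can be sketched with the standard library's sqlite3 module. The schema below is a hypothetical, simplified version of the cart item table with an extra price column, and the table and column names are assumptions. With the price stored on the item, the cart total is a single aggregate over one table:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE cart_item (
        id INTEGER PRIMARY KEY,
        cart_id INTEGER NOT NULL,
        product_content_type_id INTEGER NOT NULL,
        product_object_id INTEGER NOT NULL,
        price INTEGER NOT NULL  -- copied from the product when the item is added
    );
    INSERT INTO cart_item
        (cart_id, product_content_type_id, product_object_id, price)
    VALUES
        (1, 22, 54, 1000),
        (1, 23, 7, 1500),
        (2, 22, 54, 1000);
''')

# No join to any product table is needed to total cart 1.
total, = conn.execute(
    'SELECT SUM(price) FROM cart_item WHERE cart_id = ?',
    (1,),
).fetchone()
print(total)
```

In Django terms, and assuming you add such a price field to CartItem, the equivalent would be an aggregate like cart.items.aggregate(total=Sum('price')). The trade-off is that the copied price can drift from the product's current price unless you keep the two in sync.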

Conclusion

In this article, you started with a small town bookstore and grew it to a big e-commerce website. You tackled different types of problems and adjusted your model to accommodate the changes. You learned that problems such as complex code and difficulty adding new programmers to the team are often symptoms of a larger problem. You learned how to identify these problems and solve them.

You now know how to plan and implement a polymorphic model using the Django ORM. You’re familiar with multiple approaches, and you understand their pros and cons. You’re able to analyze your use case and decide on the best course of action.





from Real Python