Harnessing the power of Django and Python to build a configurable taxonomy
This article aims to present a way to implement a fully flexible taxonomy system inside your Django app. The editing implementation will rely heavily on the use of Wagtail (a CMS built on Django) but will still be relevant if only Django is used.
Business case
The case for a taxonomy can be broad: you may be developing a blog, are bored, and really want a complicated way to tag posts. Alternatively, you may be working with a knowledge management system and need to provide a structured system to manage hierarchical categorization of your team's information.
Either way, it is important to understand your goals before writing a single line of code. Or at least write some code, get frustrated, and then come back to think about what you are trying to do.
Our goals
- Build a flexible system to manage a nested (tree shaped) taxonomy.
- We must be able to go arbitrarily deep.
- We must be able to add the canonical (correct) terms but also have space to provide and search via the non-correct terms (such as abbreviations).
- We need to minimize dependencies and stay as close to Django conventions as possible (for future maintainability).
- Avoid any difficult to understand terms in the user interface (e.g. taxonomy).
What is a business taxonomy?
Glad you asked! Think of a taxonomy as a globally shared vocabulary for the business or organization. This vocabulary is often used throughout all documentation, categorization, and training, but never really written down in one place.
Taxonomies help organize content and knowledge into hierarchical relationships, adding detail to terms and concepts the further you go down the levels.
Wearing the right hat
When I worked on a similar project for a client, one thing I found hard was switching between the right hats.
One hat was the business analyst, a.k.a. the person who needs to translate what the boss has asked for. With this hat on, I found that there were legitimate concerns over how the company's information could be managed, searched, and categorized to help add value to the organization as a whole.
The next hat was that of the developer. Here, I had to work with existing code and frameworks to implement a complex solution quickly and simply, along with consideration for future development wherever possible.
Finally, the hat that matters in the long run — the one of the everyday user. It was this hat I often found the hardest to don after wearing the others for a long time.
The concepts, both abstract and data model side, made sense to me and it felt like everyone else would get on board easily. In reality, I had to remember that I had been thinking and brainstorming this project for a long time and had the chance to really internalize the goals and way to think.
In the end, we landed on a great single sentence that helped our end users grok the concept of our 'taxonomy'. We also ditched the name taxonomy altogether and used a friendlier, internally relevant term instead.
Prerequisites
Installation of Wagtail 2.0. As of publication, this is still a release candidate but is solid enough to use.
We will be using Django 2.0 and all Python 3.5+ syntax (because it is awesome!).
Finally, we will be taking advantage of an incredible Python project called django-treebeard. I first found out about this project in depth after working with Wagtail for a while.
Essentially, this library takes all of the heavy lifting of managing a nested tree model inside a standard relational database. It is projects like this that get me excited about the power of Python and also the way Django can be extended. Shout out to @tabo for this epic project.
Note: If you have Wagtail up and running, you will not need to install django-treebeard. For a raw Django project, you will need to install the package.
Code walkthrough
1 - the 'Node' model
Naming this is hard. For now, we will just call our elements inside the taxonomy a 'node'. Our nodes will extend the django-treebeard project's Materialized Path Tree nodes, described as follows:
- Each node has one single path in the tree (think URL paths).
- There must be one single root node that all other nodes connect to.
- Nodes can be ordered in relation to their siblings. Initially, we will just order them by their name field.
- Nodes have a path, depth, and numchild field whose values should not be changed directly.
- The default set up can have a depth of 63, which I am sure will be sufficient for our use case.
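To make the path idea concrete, here is a minimal pure-Python sketch of how a materialized path can encode a node's position. It is not django-treebeard's code: the two constants reflect what I understand to be the library defaults (a 255-character path built from 4-character steps), and the helper names are made up for illustration.

```python
# Illustrative only: not django-treebeard's implementation.
# Assumed defaults: each tree level adds a 4-character step to a path
# that is stored in a 255-character column.
STEP_LEN = 4
MAX_PATH_LEN = 255


def depth_of(path):
    """Depth is the number of fixed-width steps in the path."""
    return len(path) // STEP_LEN


def parent_path(path):
    """Drop the last step to get the parent's path (None for the root)."""
    return path[:-STEP_LEN] or None


def is_descendant(path, ancestor_path):
    """A descendant's path starts with its ancestor's path and is longer."""
    return path.startswith(ancestor_path) and len(path) > len(ancestor_path)


root = "0001"
subject = "00010002"      # second child of the root
topic = "000100020001"    # first child of that subject

print(depth_of(topic))                 # 3
print(parent_path(topic) == subject)   # True
print(is_descendant(topic, root))      # True
print(MAX_PATH_LEN // STEP_LEN)        # 63, the maximum depth
```

Under these assumptions the depth limit of 63 falls out of simple division, and it becomes clearer why the path, depth, and numchild values should only ever be changed through the library's API: editing one by hand would leave the others out of step.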
We will be adding our own fields to the Node model:
- name - a CharField that represents the canonical name of the Node.
- aliases - a TextField where each line represents another potential name or abbreviation for the Node.
- node_order_index - an IntegerField which can be used in the future if we want to implement custom ordering in the user interface.
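As a rough illustration of the "one alias per line" convention, the sketch below shows how the aliases text could be split and matched against a search term. The function names and sample data are hypothetical and are not part of the model in this article.

```python
def parse_aliases(raw_aliases):
    """Split the aliases text into a clean list, one alias per line."""
    return [line.strip() for line in raw_aliases.splitlines() if line.strip()]


def matches_term(name, raw_aliases, term):
    """Case-insensitive match against the canonical name or any alias."""
    wanted = term.strip().lower()
    candidates = [name] + parse_aliases(raw_aliases)
    return any(wanted == candidate.lower() for candidate in candidates)


# Hypothetical aliases value, including a blank line and stray whitespace.
aliases_text = "HR\nPeople Ops\n\n  Personnel  "

print(parse_aliases(aliases_text))
# ['HR', 'People Ops', 'Personnel']
print(matches_term("Human Resources", aliases_text, "hr"))
# True
```

Keeping aliases in a single text field keeps the model simple; the trade-off is that any per-alias logic, like the matching above, has to split the text first.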
Here is our initial model definition for the Node model:
# File: my_app/models.py
from django import forms
from django.core.validators import MinLengthValidator
from django.db import models
from treebeard.mp_tree import MP_Node
from wagtail.admin.edit_handlers import FieldPanel


class Node(MP_Node):
    """Represents a single nestable Node in the corporate taxonomy."""

    # node editable fields
    name = models.CharField(
        max_length=50,
        unique=True,
        help_text='Keep the name short, ideally one word.',
        validators=[MinLengthValidator(5)]
    )
    aliases = models.TextField(
        'Also known as',
        max_length=255,
        blank=True,
        help_text="What else is this known as or referred to as?"
    )

    # node tree specific fields and attributes
    node_order_index = models.IntegerField(
        blank=True,
        default=0,
        editable=False
    )
    node_child_verbose_name = 'child'

    # important: node_order_by should NOT be changed after first Node created
    node_order_by = ['node_order_index', 'name']

    # wagtail specific - simple way to declare which fields are editable
    panels = [
        FieldPanel('parent'),  # virtual field - see NodeForm later
        FieldPanel('name'),
        FieldPanel('aliases', widget=forms.Textarea(attrs={'rows': '5'})),
    ]
After you have this model declared, you will want to run migrations in your console:
$ python3 ./manage.py makemigrations
$ python3 ./manage.py migrate
2 - The form
For the sake of simplicity, we will assume all of the code will go in the same models.py file. In practice, you would be best served splitting it up into separate files, but it is easier to get up and running with everything in one place.
We will be using the Wagtail system of building forms, but you can apply the main __init__ and save overrides to any Django form or even Django modeladmin.
Key items to note:
- The django-treebeard node API reference will come in handy here; we will be using methods like get_depth and is_root from this API.
- parent is a field that provides a user interface to select the parent of the node being edited (or created). We have extended the ModelChoiceField class to create a custom BasicNodeChoiceField where we can get a nice indication of the Node structure in our select box.
- __init__ on our form has been modified to do a few things.
  - instance will be an instance of Node bound to the values provided when the form submits, when creating or editing a Node.
  - If we are editing the root node (instance.is_root()) or creating the first node (Node.objects.count() == 0) we want to ensure that the parent field is hidden and will not throw an error if not filled out.
  - If we are editing an existing node we want to pre-select the node's parent via get_parent().
- save needs to be changed to work with the django-treebeard API, as we cannot just create or move Nodes directly.
  - First, we get the Node instance that is attempting to be saved, then we get the value of parent submitted with the form (which will be None for the root Node).
  - If we are not committing changes on this save call, we can simply return the instance provided.
  - Otherwise, we want to handle the following cases:
    - Creating the first Node, which will become the root Node, handled by the classmethod add_root.
    - Creating a Node, but not the root Node, which must be placed as a child under an existing parent Node via add_child on the parent node.
    - Making non-parent changes to any Node is handled by the normal save method.
    - Moving an existing node to a new location under a different parent Node, handled by move(parent, pos='sorted-child').
- Finally, we tell Wagtail to use this form class when editing the Node model via Node.base_form_class = NodeForm.
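The branching in the save override can be easier to follow as a plain function. The sketch below is only a decision table written in Python: the function name, its arguments, and the returned labels are all made up for illustration, and it does not call Django or django-treebeard.

```python
def choose_save_action(is_new, node_count, current_parent_id, new_parent_id):
    """Return which tree operation a committed form save should perform."""
    if is_new:
        # No nodes yet: the very first node becomes the root.
        if node_count == 0:
            return "add_root"
        # Otherwise a new node must be attached under its chosen parent.
        return "add_child"
    # Existing node: field changes are always saved first...
    if current_parent_id != new_parent_id:
        # ...and a changed parent additionally requires a move.
        return "save_then_move"
    return "save"


print(choose_save_action(True, 0, None, None))   # add_root
print(choose_save_action(True, 3, None, 1))      # add_child
print(choose_save_action(False, 3, 1, 1))        # save
print(choose_save_action(False, 3, 1, 2))        # save_then_move
```

Each returned label corresponds to one of the cases listed above: add_root, add_child on the parent, a normal save, and a save followed by move.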
# File: my_app/models.py
# ... other imports from previous sections
from django import forms
from wagtail.admin.forms import WagtailAdminModelForm


class BasicNodeChoiceField(forms.ModelChoiceField):
    def label_from_instance(self, obj):
        # one dash per level below the root, to hint at the tree structure
        depth_line = '-' * (obj.get_depth() - 1)
        return "{} {}".format(depth_line, super().label_from_instance(obj))


class NodeForm(WagtailAdminModelForm):

    parent = BasicNodeChoiceField(
        required=True,
        queryset=Node.objects.all(),
        empty_label=None,
    )

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        instance = kwargs['instance']

        # use == (not 'is') when comparing numbers
        if instance.is_root() or Node.objects.count() == 0:
            # hide and disable the parent field
            self.fields['parent'].disabled = True
            self.fields['parent'].required = False
            self.fields['parent'].empty_label = 'N/A - Root Node'
            self.fields['parent'].widget = forms.HiddenInput()

            # update label to indicate this is the root
            self.fields['name'].label += ' (Root)'
        elif instance.id:
            self.fields['parent'].initial = instance.get_parent()

    def save(self, commit=True, *args, **kwargs):
        instance = super().save(commit=False, *args, **kwargs)
        parent = self.cleaned_data['parent']

        if not commit:
            # simply return the instance if not actually saving (committing)
            return instance

        if instance.id is None:  # creating a new node
            if Node.objects.all().count() == 0:  # no nodes, creating root
                instance = Node.add_root(instance=instance)  # add a NEW root node
            else:  # nodes exist, must be adding node under a parent
                instance = parent.add_child(instance=instance)
        else:  # editing an existing node
            instance.save()  # update existing node
            if instance.get_parent() != parent:
                instance.move(parent, pos='sorted-child')
        return instance


Node.base_form_class = NodeForm
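To see what the select box labels built by label_from_instance end up looking like, here is a small standalone sketch of the same string formatting. The label_for helper and the sample node names are hypothetical; the real field gets its depth from get_depth().

```python
def label_for(name, depth):
    """Prefix the name with one dash per level below the root."""
    return "{} {}".format("-" * (depth - 1), name)


# Hypothetical (name, depth) pairs in tree order.
nodes = [("Company", 1), ("Finance", 2), ("Payroll", 3), ("People", 2)]

for name, depth in nodes:
    print(label_for(name, depth))
# Prints (note the root keeps a single leading space):
#  Company
# - Finance
# -- Payroll
# - People
```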
3 - Wagtail modeladmin editing
We will now use the Wagtail modeladmin module. This is a powerful way to add CRUD operations to our models in the admin interface. It is similar (in concept) to Django's modeladmin, but not the same. It also makes extensive use of the awesome Class-based views.
Note: The Class-based views provide a great way to add functionality to Django without reinventing the wheel. They are easy to customize and provide a great API that is easy to extend and gives you a great example of a structure for view classes.
We will be declaring a new class that will extend ModelAdmin:
- model is set to our Node model class.
- list_display has our name and aliases fields, along with get_parent, a method available on the MP_Node class.
- inspect_view_enabled means that the users can click on a simple view page to look at details but not edit anything on the Node.
# File: my_app/models.py
# ... other imports from previous sections
from wagtail.contrib.modeladmin.options import ModelAdmin


class NodeAdmin(ModelAdmin):
    """Class for presenting topics in admin using modeladmin."""

    model = Node

    # admin menu options
    menu_icon = 'fa-cube'  # using wagtail-fontawesome
    menu_order = 800

    # listing view options
    list_per_page = 50
    list_display = ('name', 'get_parent', 'aliases')
    search_fields = ('name', 'aliases')

    # inspect view options
    inspect_view_enabled = True
    inspect_view_fields = ('name', 'get_parent', 'aliases', 'id')
We will then register our custom ModelAdmin in a new file called wagtail_hooks.py. This is a special file name convention that Wagtail will ensure runs before the admin interface is prepared.
# File: my_app/wagtail_hooks.py
from .models import NodeAdmin
from wagtail.contrib.modeladmin.options import modeladmin_register
modeladmin_register(NodeAdmin)
4 - Node model enhancements
For round two of our model definition, we will add some nice helper methods to be used later.
- Node now also extends index.Indexed. This provides the ability for this model to be indexed for searching. See also the search_fields definition on the model for the fields we have added to the index.
- get_as_listing_header is a method that renders a custom template that shows off the 'depth' of our Nodes. We also set the short_description and admin_order_field attributes on this method, used by modeladmin to show a nice column header.
- get_parent is just the same method provided by MP_Node. However, we need to re-declare it on the model to set the short_description used by modeladmin.
- The delete method is overridden to block the deletion of the root Node. This is really important: if it is deleted, the node tree will be corrupted and chaos will enter the ancient forest.
- The __str__ magic method is used to show a nice string representation of our Nodes.
- Finally, we have decided that Node is not a friendly name for our team. We have elected to use Topic instead. modeladmin will also honor this reference and automatically use it in the admin interface.
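Setting short_description directly on a method can look odd at first. The sketch below shows the underlying Python pattern: functions are objects, so metadata can be attached to them and read back later. The Topic class and column_header helper are simplified stand-ins invented for this example; this is not how Wagtail's modeladmin is implemented.

```python
class Topic:
    """Stand-in for a model class; no Django required."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def get_parent(self):
        return self.parent

    # Functions are objects, so metadata can be attached as attributes.
    get_parent.short_description = "Parent"


def column_header(model_class, attr_name):
    """Prefer short_description when present, else tidy up the attribute name."""
    attr = getattr(model_class, attr_name, None)
    fallback = attr_name.replace("_", " ").capitalize()
    return getattr(attr, "short_description", fallback)


print(column_header(Topic, "get_parent"))  # Parent
print(column_header(Topic, "name"))        # Name
```

This is also why get_parent has to be re-declared on our model: the attribute needs a function of our own to live on, rather than being set on the library's method.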
# File: my_app/models.py
from django import forms
from django.core.exceptions import PermissionDenied
from django.core.validators import MinLengthValidator
from django.db import models
from django.template.loader import render_to_string  # added
from treebeard.mp_tree import MP_Node
from wagtail.admin.edit_handlers import FieldPanel
from wagtail.search import index  # added


class Node(index.Indexed, MP_Node):  # Note: Now using index.Indexed in model
    """Represents a single nestable Node in the corporate taxonomy."""

    # ...name, aliases and other attributes defined above go here

    def get_as_listing_header(self):
        """Build HTML representation of node with title & depth indication."""
        depth = self.get_depth()
        rendered = render_to_string(
            'includes/node_list_header.html',
            {
                'depth': depth,
                'depth_minus_1': depth - 1,
                'is_root': self.is_root(),
                'name': self.name,
            }
        )
        return rendered
    get_as_listing_header.short_description = 'Name'
    get_as_listing_header.admin_order_field = 'name'

    def get_parent(self, *args, **kwargs):
        """Duplicate of get_parent from treebeard API."""
        return super().get_parent(*args, **kwargs)
    get_parent.short_description = 'Parent'

    search_fields = [
        index.SearchField('name', partial_match=True),
        index.SearchField('aliases', partial_match=False, boost=0.25),
    ]

    def delete(self):
        """Prevent users from deleting the root node."""
        if self.is_root():
            raise PermissionDenied('Cannot delete root Topic.')
        else:
            super().delete()

    def __str__(self):
        return self.name

    class Meta:
        verbose_name = 'Topic'
        verbose_name_plural = 'Topics'
Here is the template used by our get_as_listing_header method.
{# File: my_app/templates/includes/node_list_header.html #}
{% if is_root %}
<span style="font-size:135%;"><strong>{{ name }}</strong></span>
{% else %}
<span>
<span class="inline-block" style="margin-left:{{ depth_minus_1 }}em; font-size:{% if depth == 1 %}120{% elif depth == 2 %}110{% else %}100{% endif %}%;"></span>
<i class="icon icon-fa-level-up icon-fa-rotate-90" style="display: inline-block;"></i>
{{ name }}
</span>
{% endif %}
Then we need to update the definition of our NodeAdmin to take advantage of our pretty get_as_listing_header method.
class NodeAdmin(ModelAdmin):
    #... other options

    # listing view options ('name' replaced with 'get_as_listing_header')
    list_display = ('get_as_listing_header', 'get_parent', 'aliases')
5 - Finishing up
We can now add a relation to our Nodes on any of our other models, where appropriate.
We can add a many-to-one relationship using ForeignKey.
class KnowledgePage(Page):
    # ... other fields
    node = models.ForeignKey(
        'my_app.Node',
        on_delete=models.CASCADE,
    )
We can add a many-to-many relationship using ManyToManyField.
class KnowledgePage(Page):
    # ... other fields
    nodes = models.ManyToManyField('my_app.Node')
We now have an interface to manage our taxonomy, along with a way to link the nodes to any other model within Django.
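One payoff of linking pages to a tree is that you can ask for everything under a branch. The sketch below shows that idea with plain dictionaries and made-up data, assuming fixed-width path strings like the ones a materialized path tree stores. In a real project the tree library's query methods would do this work in the database rather than in Python.

```python
# Hypothetical data: topic names mapped to tree paths, and pages each
# tagged with a single topic (the many-to-one case).
topic_paths = {
    "Company": "0001",
    "Finance": "00010001",
    "Payroll": "000100010001",
    "People": "00010002",
}
pages = [
    ("Expense policy", "Finance"),
    ("Pay run checklist", "Payroll"),
    ("Onboarding guide", "People"),
]


def pages_under(topic_name):
    """Return page titles tagged with the topic or any of its descendants."""
    prefix = topic_paths[topic_name]
    return [
        title for title, topic in pages
        if topic_paths[topic].startswith(prefix)
    ]


print(pages_under("Finance"))
# ['Expense policy', 'Pay run checklist']
print(pages_under("People"))
# ['Onboarding guide']
```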
Bonus points - Adding icing on the root Node
Hide delete button on root Node
It is nice to not show buttons that users are not meant to use. Thankfully, modeladmin makes it easy to override how the buttons for each row are generated.
# File: my_app/models.py
from wagtail.contrib.modeladmin.helpers import ButtonHelper  # add import


class NodeButtonHelper(ButtonHelper):

    def delete_button(self, pk, *args, **kwargs):
        """Ensure that the delete button is not shown for root node."""
        instance = self.model.objects.get(pk=pk)
        if instance.is_root():
            return
        return super().delete_button(pk, *args, **kwargs)


class NodeAdmin(ModelAdmin):
    #... other options
    button_helper_class = NodeButtonHelper
Add button to quickly add a child node
This is a bit more involved, but worth it to understand how to work with class-based views and modeladmin in depth.
Walkthrough:
- NodeButtonHelper has a few changes to essentially create and insert a new button, add_child_button, which will provide a simple way to pre-fill the parent field on a create Node view.
- AddChildNodeViewClass extends the CreateView class. Here, we do a few things:
  - __init__ gets the pk (primary key) from the request and checks it is valid via the prepared queryset and get_object_or_404.
  - get_page_title gives the user a nicer title on the create page, relevant to the parent they selected.
  - get_initial sets the initial values for our NodeForm. No changes are needed to NodeForm for this to work.
- Inside our NodeAdmin, we override two methods:
  - add_child_view: this gives the modeladmin module a reference to a view to assign to the relevant URL.
  - get_admin_urls_for_registration: this adds our new URL for the above view to the registration process (Wagtail admin requires all admin URL patterns to be registered a specific way).
# File: my_app/models.py
from django.conf.urls import url
from django.contrib.admin.utils import quote, unquote
from django.shortcuts import get_object_or_404
from wagtail.contrib.modeladmin.helpers import ButtonHelper
from wagtail.contrib.modeladmin.views import CreateView


class NodeButtonHelper(ButtonHelper):

    # delete_button... see above

    def prepare_classnames(self, start=None, add=None, exclude=None):
        """Parse classname sets into final css classes list."""
        classnames = start or []
        classnames.extend(add or [])
        return self.finalise_classname(classnames, exclude or [])

    def add_child_button(self, pk, child_verbose_name, **kwargs):
        """Build an add child button, to easily add a child under node."""
        classnames = self.prepare_classnames(
            start=self.edit_button_classnames + ['icon', 'icon-plus'],
            add=kwargs.get('classnames_add'),
            exclude=kwargs.get('classnames_exclude')
        )
        return {
            'classname': classnames,
            'label': 'Add %s %s' % (
                child_verbose_name, self.verbose_name),
            'title': 'Add %s %s under this one' % (
                child_verbose_name, self.verbose_name),
            'url': self.url_helper.get_action_url('add_child', quote(pk)),
        }

    def get_buttons_for_obj(self, obj, exclude=None, *args, **kwargs):
        """Override the getting of buttons, appending create child button."""
        # pass exclude through so the default buttons still honor it
        buttons = super().get_buttons_for_obj(obj, exclude, *args, **kwargs)

        add_child_button = self.add_child_button(
            pk=getattr(obj, self.opts.pk.attname),
            child_verbose_name=getattr(obj, 'node_child_verbose_name'),
            **kwargs
        )
        buttons.append(add_child_button)

        return buttons


class AddChildNodeViewClass(CreateView):
    """View class that can take an additional URL param for parent id."""

    parent_pk = None
    parent_instance = None

    def __init__(self, model_admin, parent_pk):
        self.parent_pk = unquote(parent_pk)
        # look up the parent up front so an invalid pk results in a 404
        object_qs = model_admin.model._default_manager.get_queryset()
        object_qs = object_qs.filter(pk=self.parent_pk)
        self.parent_instance = get_object_or_404(object_qs)
        super().__init__(model_admin)

    def get_page_title(self):
        """Generate a title that explains you are adding a child."""
        title = super().get_page_title()
        return title + ' %s %s for %s' % (
            self.model.node_child_verbose_name,
            self.opts.verbose_name,
            self.parent_instance
        )

    def get_initial(self):
        """Set the selected parent field to the parent_pk."""
        return {'parent': self.parent_pk}


class NodeAdmin(ModelAdmin):

    #... other NodeAdmin attributes...

    def add_child_view(self, request, instance_pk):
        """Generate a class-based view to provide 'add child' functionality."""
        # instance_pk will become the default selected parent_pk
        kwargs = {'model_admin': self, 'parent_pk': instance_pk}
        view_class = AddChildNodeViewClass
        return view_class.as_view(**kwargs)(request)

    def get_admin_urls_for_registration(self):
        """Add the new url for add child page to the registered URLs."""
        urls = super().get_admin_urls_for_registration()
        add_child_url = url(
            self.url_helper.get_action_url_pattern('add_child'),
            self.add_child_view,
            name=self.url_helper.get_action_url_name('add_child')
        )
        return urls + (add_child_url, )
In closing
I really hope this has been helpful from both the technical and 'thinking it through' perspective.
There is a lot of room for improvement in this implementation, but this is a solid starting point. From here, you can build your own amazing taxonomy systems in every application... that needs it.
You can view the full models.py file on a GitHub gist. There are a few minor additions and tweaks based on the project I based this blog on.
Header Photo by Will Turner on Unsplash.
Very useful article and clean implementation. I have come across it in several places and am wondering about one thing: suppose I want a Node to have its own URL and content to represent it (for example, an instance of, or a reference to, a Wagtail Page).
What would you say is the most correct approach to reach that goal?
Great stuff!
I’m really curious to know what was that single sentence to explain “taxonomy” and what was the friendly term you replaced it with :)
Good question - we ended up simply calling them Topics (hence the class Meta: verbose_name = “Topic”). Our one liner was “Topics are what pages are about and are grouped into Subjects”. Seems simple but took a while to get there and also explained our nesting (which was limited to three levels deep: Root > Subject > Topic).
Will you describe the differences of your Django Taxonomy solution to a popular knowledge base solution, Semantic-MediaWiki?
Django is a web framework so it leaves the implementation of essentially everything the user interacts with up to the developer. Whereas Semantic-MediaWiki is an extension to MediaWiki (the wiki platform that Wikipedia uses). Semantic-MediaWiki is a full implementation of semantic data on top of Wikipedia (basically storing triples of object-entity-value).
The solution implemented above could be used as just the ‘value’ part of the semantics, eg. “PageModel hasTopic TopicNode” or it could be used as the entity-value if you enforce some rules about the second level of the node tree being the ‘entity’.
It really depends on how far you want to go down the semantic rabbit hole and how much you want to work with protocols like RDF. The closest Django specific project