Fischer, Part 3: Importing and Exporting Bookmarks :: Brain Dump

In order to further iterate on Fischer, I'm going to need to be able to test with a reasonably large pile of bookmarks. That's going to get tedious quickly if I can't import and export them as I experiment. In the past, the shortest path to doing this has been using django-import-export with either a script or the Django admin integration. I'd prefer a script for this, so I'm also going to use django-extensions to get the runscript management command, which makes scripts easy to run from nanodjango.

Installing `django-import-export` and `django-extensions`

First, as always, I add the packages to the poetry project:

poetry add django-import-export django-extensions

Then I add both apps to the EXTRA_APPS setting I pass to the Django constructor in my script:

from nanodjango import Django

app = Django(
    EXTRA_APPS=["taggit", "django_extensions", "import_export"],
    TAGGIT_CASE_INSENSITIVE=True,
)

Finally, I create a scripts package at my project root so that the runscript command can find my scripts:

mkdir scripts
touch scripts/__init__.py

With that done, any script I save in that package will be runnable using the command:

nanodjango manage fischer.py runscript <script_name>

So, if I put load_bookmarks.py in there, it’d be runnable using nanodjango manage fischer.py runscript load_bookmarks.
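To make the discovery step concrete, here is a rough, simplified sketch of the idea behind runscript: import a module from the `scripts` package by name and call its `run()` function. This is not django-extensions' actual implementation (the real command also sets up Django, searches app-level script directories, and handles arguments); `run_script` and the throwaway `hello` script are hypothetical names used only for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def run_script(script_name, package="scripts"):
    """Import <package>.<script_name> and call its run() function.

    A hypothetical, simplified stand-in for runscript's lookup step.
    """
    module = importlib.import_module(f"{package}.{script_name}")
    if not hasattr(module, "run"):
        raise AttributeError(f"{package}.{script_name} has no run() function")
    return module.run()


# Build a throwaway scripts package in a temp directory so the sketch is self-contained.
project_root = Path(tempfile.mkdtemp())
scripts_dir = project_root / "scripts"
scripts_dir.mkdir()
(scripts_dir / "__init__.py").touch()
(scripts_dir / "hello.py").write_text("def run():\n    return 'hello from run()'\n")
sys.path.insert(0, str(project_root))
importlib.invalidate_caches()

result = run_script("hello")
print(result)  # hello from run()
```

The `__init__.py` file matters here for the same reason it does for the real command: it makes `scripts` an importable package.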

Creating resources for import/export

This is the first time I’ve ever used import-export and taggit together, so I need to do some basic exploration to see how they work together.

First, I add a resource for importing and exporting bookmarks to fischer.py:

from import_export import resources

class BookmarkResource(resources.ModelResource):
    class Meta:
        model = Bookmark

Then I create a script in my scripts directory called dump_bookmarks.py with the following content:

from fischer import BookmarkResource

def run():
    dataset = BookmarkResource().export()
    print(dataset.csv)

and run it to see what I get:

nanodjango manage fischer.py runscript dump_bookmarks

id,user,title,summary,url,is_favorite,notes,created_at,updated_at,tags
1,1,Duck Duck Go,Internet search engine,https://duckduckgo.com/,1,,2024-10-05 13:17:36,2024-10-06 04:18:36,"1,2"

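That’s not bad, but it won’t be directly useful to me yet: the tags come out as primary keys rather than names, and the output includes columns (id, user, timestamps) I don’t want to carry around. I made some straightforward modifications to my BookmarkResource class based on the examples in the django-taggit documentation: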

from import_export import resources
from import_export import fields as import_export_fields
from import_export import widgets as import_export_widgets

class BookmarkResource(resources.ModelResource):
    tags = import_export_fields.Field(
        column_name='tags',
        attribute='tags',
        widget=import_export_widgets.ManyToManyWidget("taggit.Tag", field="name", separator=","),
    )
    class Meta:
        model = Bookmark
        fields = ("user", "title", "summary", "url", "is_favorite", "notes", "tags")
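Conceptually, the widget turns a set of related objects into one delimited cell on export, and splits that cell back into lookup values on import. The plain-Python sketch below mimics that round trip with a dict standing in for the `Tag` table. `render_tags`, `clean_tags`, and `TAG_TABLE` are hypothetical names; the real `ManyToManyWidget` works against a queryset rather than a dict.

```python
# Hypothetical stand-in for the taggit Tag table: primary key -> name.
TAG_TABLE = {1: "privacy", 2: "search", 3: "adtech"}


def render_tags(tag_ids, field="name", separator=","):
    """Export side: join the chosen field of each related tag into one cell.

    field="pk" gives "1,2", like the default export above;
    field="name" gives "privacy,search".
    """
    if field == "pk":
        values = [str(pk) for pk in tag_ids]
    else:
        values = [TAG_TABLE[pk] for pk in tag_ids]
    return separator.join(values)


def clean_tags(cell, separator=","):
    """Import side: split the cell and look up matching tag ids by name.

    Names with no matching row are dropped, much as a name__in filter would drop them.
    """
    names = [part.strip() for part in cell.split(separator) if part.strip()]
    name_to_pk = {name: pk for pk, name in TAG_TABLE.items()}
    return [name_to_pk[name] for name in names if name in name_to_pk]


print(render_tags([1, 2], field="pk"))          # 1,2
print(render_tags([1, 2]))                      # privacy,search
print(clean_tags("privacy,search"))             # [1, 2]
print(clean_tags("adtech, search, brand-new"))  # [3, 2]
```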

Then I modified my dump script to select only the fields I want to export:

from fischer import BookmarkResource

def run():
    dataset = BookmarkResource().export(export_fields=["title", "url", "summary", "notes", "is_favorite", "tags"])
    print(dataset.csv)

and now get something much nicer when I export:

title,summary,url,is_favorite,notes,tags
Duck Duck Go,Internet search engine,https://duckduckgo.com/,1,,"privacy,search"
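The effect of `export_fields` is similar to projecting each row onto a fixed list of columns before writing CSV. As a stdlib-only illustration (not how django-import-export implements it), `csv.DictWriter` with `extrasaction="ignore"` drops the unwanted `id`, `user`, and timestamp columns, and its default minimal quoting wraps the comma-containing tags cell in quotes:

```python
import csv
import io

# One bookmark as a dict, mirroring the columns from the first export above.
bookmark = {
    "id": 1, "user": 1, "title": "Duck Duck Go", "summary": "Internet search engine",
    "url": "https://duckduckgo.com/", "is_favorite": 1, "notes": "",
    "created_at": "2024-10-05 13:17:36", "updated_at": "2024-10-06 04:18:36",
    "tags": "privacy,search",
}

export_columns = ["title", "summary", "url", "is_favorite", "notes", "tags"]

buffer = io.StringIO()
# extrasaction="ignore" silently drops keys that are not in export_columns.
writer = csv.DictWriter(
    buffer, fieldnames=export_columns, extrasaction="ignore", lineterminator="\n"
)
writer.writeheader()
writer.writerow(bookmark)
print(buffer.getvalue(), end="")
```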

The next step is to get that working for import. Since I want to be able to specify the import user, that seems like it might be a little more work.

Importing bookmark data

To see what the gap is, I first create a new script named load_bookmarks.py in my scripts package and try importing exactly what I just exported:

import tablib
from fischer import BookmarkResource

def run():
    csv_data = """title,summary,url,is_favorite,notes,tags
Duck Duck Go,Internet search engine,https://duckduckgo.com/,1,,"privacy,search"
"""
    dataset = tablib.Dataset()
    dataset.csv = csv_data
    bookmark_resource = BookmarkResource()
    result = bookmark_resource.import_data(dataset, dry_run=True)
    print(f"Result has errors: {result.has_errors()}")
    if result.has_errors():
        for row in result.error_rows:
            print(f"{row.errors}")
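To see roughly what each row looks like by the time the resource processes it, the same CSV can be parsed with the standard library's `csv.DictReader`. tablib does its own parsing, so treat this as an approximation. Each row becomes a mapping of header to string value, which is why the error below shows `'is_favorite': '1'` as a string and has no `user` key at all.

```python
import csv
import io

csv_data = """title,summary,url,is_favorite,notes,tags
Duck Duck Go,Internet search engine,https://duckduckgo.com/,1,,"privacy,search"
"""

rows = list(csv.DictReader(io.StringIO(csv_data)))
for row in rows:
    print(row)

# Every value is a string, and nothing supplies the NOT NULL user_id column.
print("user" in rows[0])  # False
```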

Unsurprisingly, that results in the following error:

nanodjango manage fischer.py runscript load_bookmarks
Result has errors: True
[<Error: IntegrityError('NOT NULL constraint failed: fischer_bookmark.user_id') at row OrderedDict({'title': 'Duck Duck Go', 'summary': 'Internet search engine', 'url': 'https://duckduckgo.com/', 'is_favorite': '1', 'notes': '', 'tags': 'privacy,search'}) at number 1>]

because I didn’t export the user id and haven’t given myself a way to specify it.

After a few trips to the import-export documentation and a few more trips to the debugger, I settled on a straightforward approach to make this work the way I want.

First, I edited the BookmarkResource Meta class to use the combination of user and url to identify an existing bookmark in the database:

    class Meta:
        model = Bookmark
        fields = ("user", "title", "summary", "url", "is_favorite", "notes", "tags")
        import_id_fields = ("user", "url")

That suggests a change to the model so that the database enforces user and url as unique together, but I’ll wait until later to make that change.
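The idea behind `import_id_fields` can be sketched without Django: each incoming row is matched to an existing record by its `(user, url)` key, updating it if found and creating it otherwise. The in-memory `upsert_bookmark` function and `store` dict below are hypothetical illustrations of that matching rule, not django-import-export internals.

```python
def upsert_bookmark(store, row, id_fields=("user", "url")):
    """Create or update a bookmark in `store`, keyed on the id fields.

    Returns "new" or "update", loosely mirroring import-export's row result types.
    """
    key = tuple(row[field] for field in id_fields)
    status = "update" if key in store else "new"
    store[key] = dict(row)
    return status


store = {}
print(upsert_bookmark(store, {"user": 1, "url": "https://duckduckgo.com/", "title": "Duck Duck Go"}))  # new
# Re-importing the same user/url pair updates instead of duplicating.
print(upsert_bookmark(store, {"user": 1, "url": "https://duckduckgo.com/", "title": "DuckDuckGo"}))  # update
# A different user may bookmark the same URL.
print(upsert_bookmark(store, {"user": 2, "url": "https://duckduckgo.com/", "title": "DDG"}))  # new
print(len(store))  # 2
```

The matching only happens inside the importer, which is exactly why a database-level constraint is still worth having.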

Then I added an import_user_id parameter to the BookmarkResource constructor along with overriding before_import and before_import_row to populate the user id for the import session:

class BookmarkResource(resources.ModelResource):
    def __init__(self, import_user_id=None, **kwargs):
        super().__init__(**kwargs)
        self.import_user_id = import_user_id

    tags = import_export_fields.Field(
        column_name='tags',
        attribute='tags',
        widget=import_export_widgets.ManyToManyWidget("taggit.Tag", field="name", separator=","),
    )

    def before_import(self, dataset, **kwargs):
        if self.import_user_id is None:
            raise ValueError("import_user_id must be set prior to import")
        if "user" not in dataset.headers:
            dataset.headers.append("user")
        super().before_import(dataset, **kwargs)

    def before_import_row(self, row, **kwargs):
        row["user"] = self.import_user_id
        super().before_import_row(row, **kwargs)

    class Meta:
        model = Bookmark
        fields = ("user", "title", "summary", "url", "is_favorite", "notes", "tags")
        import_id_fields = ("user", "url")
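The order in which those two hooks fire is the crux of this approach: `before_import` runs once for the whole dataset, then `before_import_row` runs for each row before that row is turned into a model instance. The plain-Python class below imitates only that ordering, with a hypothetical `import_rows` driver standing in for `import_data`; it is not the library's code.

```python
class SketchBookmarkResource:
    """Hypothetical imitation of the hook order used by BookmarkResource."""

    def __init__(self, import_user_id=None):
        self.import_user_id = import_user_id

    def before_import(self, headers):
        # Runs once per dataset: fail fast if no user was supplied.
        if self.import_user_id is None:
            raise ValueError("import_user_id must be set prior to import")
        if "user" not in headers:
            headers.append("user")

    def before_import_row(self, row):
        # Runs once per row, before the row becomes a model instance.
        row["user"] = self.import_user_id

    def import_rows(self, headers, rows):
        self.before_import(headers)
        for row in rows:
            self.before_import_row(row)
        return rows


rows = [{"title": "Duck Duck Go", "url": "https://duckduckgo.com/"}]
headers = ["title", "url"]
imported = SketchBookmarkResource(import_user_id=1).import_rows(headers, rows)
print(headers)   # ['title', 'url', 'user']
print(imported)  # [{'title': 'Duck Duck Go', 'url': 'https://duckduckgo.com/', 'user': 1}]

try:
    SketchBookmarkResource().import_rows(["title"], [])
except ValueError as exc:
    print(exc)   # import_user_id must be set prior to import
```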

That earned me an absolutely inscrutable error message:

[<Error: AttributeError("'str' object has no attribute 'objects'") at row OrderedDict({'title': 'Duck Duck Go', 'summary': 'Internet search engine', 'url': 'https://duckduckgo.com/', 'is_favorite': '1', 'notes': '', 'tags': 'privacy,search', 'user': 1}) at number 1>]

After some time in the debugger, I realized that django-import-export wants a model class, not a model name, as the first argument to the ManyToManyWidget constructor, even though a model name string is accepted almost everywhere else in the framework.
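The message makes more sense once you see that it is exactly what Python raises when `.objects` is looked up on a plain string, which is what happens when the widget is handed `"taggit.Tag"`. The sketch below reproduces the message and shows one hypothetical way a widget could accept either form, by resolving a dotted `"app_label.ModelName"` string through a registry. In Django that registry role is played by `django.apps.apps.get_model()`; here a dict named `FAKE_REGISTRY` and a class named `FakeTag` stand in so the example runs without Django.

```python
# Reproduce the error: a model *name* has no .objects manager.
try:
    _ = "taggit.Tag".objects
except AttributeError as exc:
    error_message = str(exc)
print(error_message)  # 'str' object has no attribute 'objects'


class FakeTag:
    """Hypothetical stand-in for taggit.models.Tag."""
    objects = "<manager>"


# Hypothetical registry; in a real Django project apps.get_model("taggit.Tag") does this job.
FAKE_REGISTRY = {"taggit.Tag": FakeTag}


def resolve_model(model):
    """Accept either a model class or an "app_label.ModelName" string."""
    if isinstance(model, str):
        return FAKE_REGISTRY[model]
    return model


print(resolve_model("taggit.Tag").objects)  # <manager>
print(resolve_model(FakeTag) is FakeTag)    # True
```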

So I added

from taggit.models import Tag

and changed that constructor call to

        widget=import_export_widgets.ManyToManyWidget(Tag, field="name", separator=","),

and that did the trick, at least for the dry run I specified in my script. Now it’s time to add a second row with a new tag and see what happens with a real (non-dry-run) import. (I’m also making a note to offer a patch to django-import-export to allow model names.)

import tablib
from fischer import BookmarkResource

def run():
    csv_data = """title,summary,url,is_favorite,notes,tags
Duck Duck Go,Internet search engine,https://duckduckgo.com/,1,,"privacy,search"
Google,Internet search engine,https://google.com/,0,,"adtech,search"
"""
    dataset = tablib.Dataset()
    dataset.csv = csv_data
    bookmark_resource = BookmarkResource(import_user_id=1)
    result = bookmark_resource.import_data(dataset)
    print(f"Result has errors: {result.has_errors()}")
    if result.has_errors():
        for row in result.error_rows:
            print(f"{row.errors}")

While the import is successful, it’s not quite a flawless victory. A quick review of my admin UI shows that the new tag on the Google row (adtech) was not created.

That seems like something that can be handled easily enough in the before_import_row I already wrote:

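    def before_import_row(self, row, **kwargs):
        row["user"] = self.import_user_id
        # ManyToManyWidget only looks up tags that already exist, so create missing ones first.
        # Skip blank names so an empty cell or a trailing comma doesn't create an empty tag.
        # Assumes fischer.py defines a module-level logger = logging.getLogger(__name__).
        tag_names = [name.strip() for name in (row["tags"] or "").split(",") if name.strip()]
        for name in tag_names:
            _, created = Tag.objects.get_or_create(name=name)
            if created:
                logger.info(f"Importing {row['url']} for user id {row['user']} created tag {name}.")
        super().before_import_row(row, **kwargs)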
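Stripped of Django, the job of that loop is just "add any names the table doesn't have yet, and report which ones were new." Here is a stdlib-only sketch with a set standing in for the `Tag` table; `ensure_tags` is a hypothetical helper, and in the real resource `Tag.objects.get_or_create()` plays the role of the membership check plus insert.

```python
import logging

logger = logging.getLogger("fischer.sketch")


def ensure_tags(existing_tags, cell, separator=","):
    """Add missing tag names from a delimited cell; return the newly created ones."""
    created = []
    for name in (part.strip() for part in cell.split(separator)):
        if not name or name in existing_tags:
            continue  # skip blanks from stray separators and tags that already exist
        existing_tags.add(name)
        created.append(name)
        logger.info("created tag %s", name)
    return created


tags = {"privacy", "search"}
print(ensure_tags(tags, "adtech,search"))  # ['adtech']
print(sorted(tags))                        # ['adtech', 'privacy', 'search']
print(ensure_tags(tags, ""))               # []
```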

After turning logging on for my import script:

import logging

import tablib
from fischer import BookmarkResource

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[logging.StreamHandler()]
)

def run():
    csv_data = """title,summary,url,is_favorite,notes,tags
Duck Duck Go,Internet search engine,https://duckduckgo.com/,1,,"privacy,search"
Google,Internet search engine,https://google.com/,0,,"adtech,search"
"""
    dataset = tablib.Dataset()
    dataset.csv = csv_data
    bookmark_resource = BookmarkResource(import_user_id=1)
    result = bookmark_resource.import_data(dataset)
    print(f"Result has errors: {result.has_errors()}")
    if result.has_errors():
        for row in result.error_rows:
            print(f"{row.errors}")

I can see that unknown tags are now created. At this point, I like the import and export well enough that I want to commit my progress, even though I plan to make it better (by wiring it into the admin and writing real scripts for it) later.
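Configuring logging only in the script is enough because of propagation: a module-level logger, such as one created with `logging.getLogger(__name__)` in fischer.py, has no handlers of its own, so its records travel up to the root logger. In a fresh process, `basicConfig` works by attaching a handler to that root logger. The sketch below attaches one directly, writing to an in-memory stream instead of stderr and dropping `%(asctime)s` from the format so the output is predictable; the logger name `fischer` is an assumption about how the module logger is named.

```python
import io
import logging

stream = io.StringIO()
root = logging.getLogger()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))
root.addHandler(handler)
root.setLevel(logging.INFO)

# A module-level logger with no handlers of its own, like one defined in fischer.py.
module_logger = logging.getLogger("fischer")
module_logger.info("created tag adtech")

print(stream.getvalue(), end="")  # fischer - INFO - created tag adtech
```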

Adding a constraint to the database

Before this feels done enough to move on, I need to make the database enforce the importer’s assumption that user and url together identify a bookmark, by adding the following to the Bookmark model’s Meta class:

        constraints = [
            models.UniqueConstraint(
                fields=["user", "url"], name="unique_user_bookmark_url"
            )
        ]
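What the constraint buys is easy to demonstrate with the standard library's `sqlite3` module: once `(user_id, url)` must be unique, the database itself rejects a second bookmark for the same user and URL, while another user can still save it. The table below is a hand-written approximation, not the exact SQL Django's migration will generate for `unique_user_bookmark_url`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE bookmark (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        url TEXT NOT NULL,
        CONSTRAINT unique_user_bookmark_url UNIQUE (user_id, url)
    )
    """
)
conn.execute("INSERT INTO bookmark (user_id, url) VALUES (1, 'https://duckduckgo.com/')")
# A different user may save the same URL.
conn.execute("INSERT INTO bookmark (user_id, url) VALUES (2, 'https://duckduckgo.com/')")

try:
    conn.execute("INSERT INTO bookmark (user_id, url) VALUES (1, 'https://duckduckgo.com/')")
    duplicate_rejected = False
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print(exc)

print(duplicate_rejected)  # True
print(conn.execute("SELECT COUNT(*) FROM bookmark").fetchone()[0])  # 2
```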

If you want to follow along with my progress, you can get the code here.

Read Part 4: Configuring Django Authentication

Return to the introduction
