Fischer, Part 3: Importing and Exporting Bookmarks
In order to further iterate on Fischer, I need to be able to test with a reasonably large pile of bookmarks. That will get tedious quickly if I can't import and export them as I experiment. In the past, the shortest path to doing this has been django-import-export with either a script or Django admin integration. I'd prefer a script for this, so I'm also going to use django-extensions to get the runscript management command and easily run scripts from nanodjango.
Installing django-import-export and django-extensions
First, as always, I add the packages to the poetry project:
poetry add django-import-export django-extensions
Then I change the way I call the Django constructor in my script to include them:
app = Django(
    EXTRA_APPS=["taggit", "django_extensions", "import_export"],
    TAGGIT_CASE_INSENSITIVE=True,
)
Finally, I create a scripts package at my project root so that the runscript command can find my scripts:
mkdir scripts
touch scripts/__init__.py
With that done, any script I save in that package will be runnable using the command
nanodjango manage fischer.py runscript <script_name>
So, if I put load_bookmarks.py in there, it’d be runnable using nanodjango manage fischer.py runscript load_bookmarks.
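As a point of reference, a runscript module needs little more than a module-level run() function. Here's a minimal sketch, assuming a hypothetical scripts/hello.py; the file name and the greeting are purely illustrative and not part of Fischer:

```python
# scripts/hello.py (hypothetical example script, not part of Fischer)


def run(*args):
    """Entry point that the runscript command looks for and calls.

    Values passed on the command line with --script-args arrive here
    as positional strings.
    """
    name = args[0] if args else "world"
    message = f"Hello, {name}!"
    print(message)
    return message
```

Saved in the scripts package, that should be runnable with nanodjango manage fischer.py runscript hello, and adding --script-args Fischer would pass "Fischer" through as the first argument.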
Creating resources for import/export
This is the first time I’ve ever used import-export and taggit together, so I need to do some basic exploration to see how they work together.
First, I add a resource for importing and exporting bookmarks to fischer.py:
from import_export import resources


class BookmarkResource(resources.ModelResource):
    class Meta:
        model = Bookmark
Then I create a script in my scripts directory called dump_bookmarks.py with the following content:
from fischer import BookmarkResource


def run():
    dataset = BookmarkResource().export()
    print(dataset.csv)
and run it to see what I get:
nanodjango manage fischer.py runscript dump_bookmarks
id,user,title,summary,url,is_favorite,notes,created_at,updated_at,tags
1,1,Duck Duck Go,Internet search engine,https://duckduckgo.com/,1,,2024-10-05 13:17:36,2024-10-06 04:18:36,"1,2"
That’s not bad, but it isn’t directly useful to me yet: the tags column holds tag ids rather than names, and I don’t need the id or timestamp columns. I made some straightforward modifications to my BookmarkResource class based on the examples in the django-taggit documentation:
from import_export import resources
from import_export import fields as import_export_fields
from import_export import widgets as import_export_widgets


class BookmarkResource(resources.ModelResource):
    tags = import_export_fields.Field(
        column_name='tags',
        attribute='tags',
        widget=import_export_widgets.ManyToManyWidget("taggit.Tag", field="name", separator=","),
    )

    class Meta:
        model = Bookmark
        fields = ("user", "title", "summary", "url", "is_favorite", "notes", "tags")
Then I modified my dump script to select only the fields I want to export:
from fischer import BookmarkResource


def run():
    dataset = BookmarkResource().export(
        export_fields=["title", "url", "summary", "notes", "is_favorite", "tags"]
    )
    print(dataset.csv)
and now get something much nicer when I export:
title,summary,url,is_favorite,notes,tags
Duck Duck Go,Internet search engine,https://duckduckgo.com/,1,,"privacy,search"
The next step is to get that working for import. Since I want to be able to specify the import user, that seems like it might be a little more work.
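Before moving on, it's worth seeing how that tags column round-trips. Because the tag list itself contains commas, the CSV writer quotes the field. This minimal sketch uses only the standard library's csv module (not import-export itself) to show how a reader recovers the tag names from the exported text:

```python
import csv
import io

# The export from above, pasted in as a string.
exported = '''title,summary,url,is_favorite,notes,tags
Duck Duck Go,Internet search engine,https://duckduckgo.com/,1,,"privacy,search"
'''

rows = list(csv.DictReader(io.StringIO(exported)))

# The quoted field comes back as a single string, so the commas inside it
# are not mistaken for column separators. Splitting it yields the names.
tags = rows[0]["tags"].split(",")
print(tags)
```

The same comma therefore does double duty: it separates columns in the file and tag names inside the quoted cell, which is why the quoting matters.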
Importing bookmark data
To see what the gap is, I first create a new script named load_bookmarks.py in my scripts package and try importing exactly what I just exported:
import tablib

from fischer import BookmarkResource


def run():
    csv_data = """title,summary,url,is_favorite,notes,tags
Duck Duck Go,Internet search engine,https://duckduckgo.com/,1,,"privacy,search"
"""
    dataset = tablib.Dataset()
    dataset.csv = csv_data
    # Instantiate the resource directly; it needs no arguments at this stage.
    bookmark_resource = BookmarkResource()
    # dry_run=True exercises the import without committing any changes.
    result = bookmark_resource.import_data(dataset, dry_run=True)
    print(f"Result has errors: {result.has_errors()}")
    if result.has_errors():
        for row in result.error_rows:
            print(f"{row.errors}")
Unsurprisingly, that results in the following error:
nanodjango manage fischer.py runscript load_bookmarks
Result has errors: True
[<Error: IntegrityError('NOT NULL constraint failed: fischer_bookmark.user_id') at row OrderedDict({'title': 'Duck Duck Go', 'summary': 'Internet search engine', 'url': 'https://duckduckgo.com/', 'is_favorite': '1', 'notes': '', 'tags': 'privacy,search'}) at number 1>]
because I didn’t export the user id and haven’t given myself a way to specify it.
After a few trips to the import-export documentation and a few more trips to the debugger, I settled on a straightforward approach to make this work the way I want.
First, I edited the BookmarkResource Meta class to use the combination of user and url to identify an existing bookmark in the database:
class Meta:
    model = Bookmark
    fields = ("user", "title", "summary", "url", "is_favorite", "notes", "tags")
    import_id_fields = ("user", "url")
That suggests a change to the model so that the database enforces those two fields as unique together, but I’ll wait until later to make that change.
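Conceptually, import_id_fields names the columns that form the lookup key when the importer decides whether a row updates an existing record or creates a new one. The following plain-Python sketch illustrates that idea only; it is not import-export's actual implementation, and the upsert helper and the dict standing in for a table are hypothetical:

```python
def upsert(table, row, id_fields=("user", "url")):
    """Insert or update `row` in `table`, keyed by the values of `id_fields`.

    Returns "created" or "updated" so the caller can report what happened.
    """
    key = tuple(row[field] for field in id_fields)
    outcome = "updated" if key in table else "created"
    # Merge the incoming values over whatever was stored for this key.
    table[key] = {**table.get(key, {}), **row}
    return outcome


table = {}
first = upsert(table, {"user": 1, "url": "https://duckduckgo.com/", "title": "DDG"})
# Same user and URL: treated as the same bookmark, so the title is replaced.
second = upsert(table, {"user": 1, "url": "https://duckduckgo.com/", "title": "Duck Duck Go"})
# Different user, same URL: a separate bookmark.
third = upsert(table, {"user": 2, "url": "https://duckduckgo.com/", "title": "Duck Duck Go"})
print(first, second, third, len(table))
```

Keying on user and url together is what lets two users bookmark the same page while re-importing my own file updates my rows rather than duplicating them.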
Then I added an import_user_id parameter to the BookmarkResource constructor, along with overriding before_import and before_import_row to populate the user id for the import session:
class BookmarkResource(resources.ModelResource):
    def __init__(self, import_user_id=None, **kwargs):
        super().__init__(**kwargs)
        self.import_user_id = import_user_id

    tags = import_export_fields.Field(
        column_name='tags',
        attribute='tags',
        widget=import_export_widgets.ManyToManyWidget("taggit.Tag", field="name", separator=","),
    )

    def before_import(self, dataset, **kwargs):
        if self.import_user_id is None:
            raise ValueError("import_user_id must be set prior to import")
        if "user" not in dataset.headers:
            dataset.headers.append("user")
        super().before_import(dataset, **kwargs)

    def before_import_row(self, row, **kwargs):
        row["user"] = self.import_user_id
        super().before_import_row(row, **kwargs)

    class Meta:
        model = Bookmark
        fields = ("user", "title", "summary", "url", "is_favorite", "notes", "tags")
        import_id_fields = ("user", "url")
That earned me an absolutely inscrutable error message:
[<Error: AttributeError("'str' object has no attribute 'objects'") at row OrderedDict({'title': 'Duck Duck Go', 'summary': 'Internet search engine', 'url': 'https://duckduckgo.com/', 'is_favorite': '1', 'notes': '', 'tags': 'privacy,search', 'user': 1}) at number 1>]
After some time in the debugger, I realized that django-import-export wants a Model class, not the name of one, as the first parameter to the ManyToManyWidget constructor, even though a model name is accepted almost everywhere else in the framework.
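The message makes more sense once you picture what the widget presumably does with that first argument: it reaches for the model's manager. This minimal sketch uses stand-in classes (FakeManager, FakeTag, and lookup are hypothetical, not taggit or import-export code) to reproduce the same failure:

```python
class FakeManager:
    """Stand-in for a Django model manager."""

    def filter(self, **lookups):
        return [lookups]


class FakeTag:
    """Stand-in for a model class, which exposes its manager as `.objects`."""

    objects = FakeManager()


def lookup(model, names):
    # Works for a class that has a manager; a dotted-name string such as
    # "taggit.Tag" is just a str and has no `.objects` attribute.
    return model.objects.filter(name__in=names)


print(lookup(FakeTag, ["privacy", "search"]))

try:
    lookup("taggit.Tag", ["privacy", "search"])
except AttributeError as exc:
    message = str(exc)
    print(message)
```

Nothing in the traceback mentions the widget or the model name, which is what made the original error so hard to read.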
So I added
from taggit.models import Tag
and changed that constructor call to
widget=import_export_widgets.ManyToManyWidget(Tag, field="name", separator=","),
and that did the trick, at least for the dry run I specified in my script. Now it’s time to add a new line with a new tag, and see what happens with a real import. (I’m also making a note to offer a patch to django-import-export to allow model names.)
import tablib

from fischer import BookmarkResource


def run():
    csv_data = """title,summary,url,is_favorite,notes,tags
Duck Duck Go,Internet search engine,https://duckduckgo.com/,1,,"privacy,search"
Google,Internet search engine,https://google.com/,0,,"adtech,search"
"""
    dataset = tablib.Dataset()
    dataset.csv = csv_data
    bookmark_resource = BookmarkResource(import_user_id=1)
    result = bookmark_resource.import_data(dataset)
    print(f"Result has errors: {result.has_errors()}")
    if result.has_errors():
        for row in result.error_rows:
            print(f"{row.errors}")
While the import is successful, it’s not quite a flawless victory. A quick review of my admin UI shows that the new tag on the second row of the CSV, adtech, was not created.
That seems like something that can be handled easily enough in the before_import_row I already wrote:
# Assumes a module-level logger in fischer.py:
#   import logging
#   logger = logging.getLogger(__name__)
def before_import_row(self, row, **kwargs):
    row["user"] = self.import_user_id
    # Create any tags the row mentions that don't exist yet, so the
    # ManyToManyWidget can find them by name.
    tag_names = row["tags"].split(",")
    for name in tag_names:
        _, created = Tag.objects.get_or_create(name=name)
        if created:
            logger.info(f"Importing {row['url']} for user id {row['user']} created tag {name}.")
    super().before_import_row(row, **kwargs)
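One edge I'd want to guard against before feeding real data through this: splitting an empty tags cell yields a single empty string, which would become a tag with a blank name, and a cell like "adtech, search" would produce a name with a leading space. Here's a small sketch of a helper that cleans the cell first; parse_tag_names is a hypothetical name and not part of Fischer:

```python
def parse_tag_names(cell, separator=","):
    """Split a CSV tags cell into clean, de-duplicated tag names.

    Surrounding whitespace is trimmed, blanks are dropped, and the
    original order of first appearance is preserved.
    """
    names = []
    for raw in (cell or "").split(separator):
        name = raw.strip()
        if name and name not in names:
            names.append(name)
    return names


print(parse_tag_names("adtech, search"))
print(parse_tag_names(""))
```

If I adopted this, I'd probably also write the cleaned value back with row["tags"] = ",".join(parse_tag_names(row["tags"])), on the assumption that the widget looks tags up by the exact names left in the row.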
After turning logging on for my import script:
import logging

import tablib

from fischer import BookmarkResource

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[logging.StreamHandler()]
)


def run():
    csv_data = """title,summary,url,is_favorite,notes,tags
Duck Duck Go,Internet search engine,https://duckduckgo.com/,1,,"privacy,search"
Google,Internet search engine,https://google.com/,0,,"adtech,search"
"""
    dataset = tablib.Dataset()
    dataset.csv = csv_data
    bookmark_resource = BookmarkResource(import_user_id=1)
    result = bookmark_resource.import_data(dataset)
    print(f"Result has errors: {result.has_errors()}")
    if result.has_errors():
        for row in result.error_rows:
            print(f"{row.errors}")
I can see that unknown tags are now created. At this point, I like the import and export well enough that I want to commit my progress, even though I plan to make it better (by wiring it into the admin and writing real scripts for it) later.
Adding a constraint to the database
Before this feels done enough to move on, I need to make the database enforce the importer’s id expectations by adding the following to the Bookmark model’s Meta class:
constraints = [
    models.UniqueConstraint(
        fields=["user", "url"], name="unique_user_bookmark_url"
    )
]
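At the database level, this amounts to a composite unique index. The following minimal sketch uses the standard library's sqlite3 module and a simplified, hypothetical table (not the schema Django generates for Fischer) to show the behavior the constraint buys:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE bookmark (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        url TEXT NOT NULL,
        CONSTRAINT unique_user_bookmark_url UNIQUE (user_id, url)
    )
    """
)

insert = "INSERT INTO bookmark (user_id, url) VALUES (?, ?)"
conn.execute(insert, (1, "https://duckduckgo.com/"))
# A different user may save the same URL.
conn.execute(insert, (2, "https://duckduckgo.com/"))

try:
    # The same user saving the same URL again violates the constraint.
    conn.execute(insert, (1, "https://duckduckgo.com/"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM bookmark").fetchone()[0]
print(duplicate_rejected, row_count)
```

With the constraint in place, a bug in the importer's lookup can no longer quietly create a second copy of a bookmark; the database refuses the duplicate instead.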
If you want to follow along with my progress, you can get the code here.