Converting Wallabag articles into Firefox bookmarks

2020-12-03

Here is a script for converting articles exported from Wallabag into bookmarks format suitable for importing into Firefox.

I recently decided to shut down my Wallabag[1] instance, in favour of the simpler system of saving articles to read as bookmarks in a “to-read” folder in Firefox. While this means that the text and images are not cached offline, the articles can still be displayed cleanly using the “reader mode” built into Firefox, and the to-read list can be synced to different devices.

To import bookmarks into Firefox, they need to be in Netscape Bookmarks format, which is documented here. The basic structure looks like this, with each date given as a Unix timestamp in seconds:

<!DOCTYPE NETSCAPE-Bookmark-file-1>
<Title>Bookmarks</Title>
<H1>Bookmarks</H1>
<DL>
  <DT><A HREF="{url}" ADD_DATE="{date}" LAST_VISIT="{date}" LAST_MODIFIED="{date}">{title}</A>
  <DT>...
</DL>
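As a minimal sketch of that structure, the hypothetical helper below formats a single `<DT>` entry. It assumes the dates are already Unix timestamps in seconds (the form Firefox itself writes when exporting bookmarks), and it escapes the URL and title with `html.escape` so that characters such as `&` or `<` in a title cannot break the markup.

```python
import html


def bookmark_entry(url: str, title: str, added: int, modified: int) -> str:
    """Format one Netscape-bookmark <DT> line (hypothetical helper).

    `added` and `modified` are assumed to be Unix timestamps in seconds.
    """
    return (
        f'  <DT><A HREF="{html.escape(url)}" '
        f'ADD_DATE="{added}" LAST_VISIT="{added}" '
        f'LAST_MODIFIED="{modified}">{html.escape(title)}</A>'
    )


print(bookmark_entry("https://example.com/?a=1&b=2", "Cats & dogs", 1605340972, 1605340972))
```

Escaping matters mostly for titles: an unescaped `&` or `<` in an article title would otherwise be read as markup by the importer.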

To get the list of articles out of Wallabag and into the bookmarks HTML format:

  1. Export the desired articles from Wallabag in v2 JSON format.
  2. Convert them to Netscape Bookmarks format with the Python script below.

The structure of the JSON file looks like this:

[
  {
    "id": 107,
    "url": "https://aeon.co/essays/how-to-understand-cells-...",
    "title": "Cognition all the way down",
    "published_by": ["Michael Levin & Daniel C Dennett"],
    "created_at": "2020-11-14T08:02:52+0000",
    "updated_at": "2020-11-14T08:02:52+0000",
    "published_at": "2020-10-13T10:00:00+0000",
    "content": "<p>Biologists like to think of themselves as ...",
    ...
  },
  {...}
]
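Wallabag stores its dates as strings such as `"2020-11-14T08:02:52+0000"`, whereas the bookmark attributes conventionally hold Unix timestamps. A minimal sketch of pulling out just the fields a bookmark needs and converting the dates, assuming every article in the export has `url`, `title`, `created_at` and `updated_at` keys as in the example above; `to_unix_seconds` and `load_articles` are hypothetical names.

```python
import json
from datetime import datetime

# Wallabag's timestamp format as it appears in the export, e.g.
# "2020-11-14T08:02:52+0000" (the +HHMM offset is parsed by %z).
WALLABAG_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def to_unix_seconds(value: str) -> int:
    """Convert a Wallabag timestamp string to Unix epoch seconds."""
    return int(datetime.strptime(value, WALLABAG_TIME_FORMAT).timestamp())


def load_articles(text: str) -> list:
    """Keep only the fields needed for a bookmark from a v2 JSON export."""
    return [
        {
            "url": article["url"],
            "title": article["title"],
            "created": to_unix_seconds(article["created_at"]),
            "updated": to_unix_seconds(article["updated_at"]),
        }
        for article in json.loads(text)
    ]


sample = (
    '[{"url": "https://example.com/", "title": "Example", '
    '"created_at": "2020-11-14T08:02:52+0000", '
    '"updated_at": "2020-11-14T08:02:52+0000"}]'
)
print(load_articles(sample))
```

Because the offset is part of the string, the resulting datetime is timezone-aware, so `.timestamp()` does not depend on the local timezone of the machine running the conversion.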

This script converts it into the correct format for importing into Firefox:

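#!/usr/bin/env python3
# Convert a Wallabag v2 JSON export into Netscape bookmarks HTML for Firefox.
import html
import json
import sys
from datetime import datetime

_, json_file, out_file = sys.argv

header = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<Title>Bookmarks</Title>
<H1>Bookmarks</H1>
<DL>"""
footer = "</DL>"
template = (
    '  <DT>'
    '<A HREF="{url}" '
    'ADD_DATE="{created_at}" '
    'LAST_VISIT="{created_at}" '
    'LAST_MODIFIED="{updated_at}">'
    '{title}</A>'
)


def to_timestamp(value):
    # Wallabag dates look like "2020-11-14T08:02:52+0000"; the bookmark
    # attributes expect Unix timestamps in seconds.
    return int(datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z").timestamp())


with open(json_file, encoding="utf-8") as f:
    data = json.load(f)

with open(out_file, "wt", encoding="utf-8") as f:
    print(header, file=f)

    for article in data:
        # Escape &, <, > and quotes so titles and URLs cannot break the markup.
        print(template.format(
            url=html.escape(article["url"]),
            title=html.escape(article["title"]),
            created_at=to_timestamp(article["created_at"]),
            updated_at=to_timestamp(article["updated_at"]),
        ), file=f)

    print(footer, file=f)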
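Before importing, it can be worth checking that the generated file parses as expected and contains one link per exported article. A minimal sketch using Python's standard `html.parser` module; `BookmarkCounter` and `read_bookmarks` are hypothetical names, and in practice `sample` would be replaced by the contents of the output file.

```python
from html.parser import HTMLParser


class BookmarkCounter(HTMLParser):
    """Collect (href, add_date) pairs from <A> tags in a bookmarks file."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names and unescapes values.
        if tag == "a":
            attributes = dict(attrs)
            self.links.append((attributes.get("href"), attributes.get("add_date")))


def read_bookmarks(text: str) -> list:
    parser = BookmarkCounter()
    parser.feed(text)
    parser.close()
    return parser.links


sample = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<Title>Bookmarks</Title>
<H1>Bookmarks</H1>
<DL>
  <DT><A HREF="https://example.com/?a=1&amp;b=2" ADD_DATE="1605340972">Example</A>
</DL>"""
print(read_bookmarks(sample))
```

If the number of links returned differs from the number of articles in the JSON export, or an `add_date` is not a plain integer, the file is worth inspecting before handing it to Firefox.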

[1] Wallabag is a read-it-later service like Instapaper and Pocket, but self-hosted.