Web Clipping with Org Roam

  1. emacs
  2. note

Disclaimer

This post was written with the help of ChatGPT. What can I say, it writes English waaayy better than I do :). I do think it makes the article look too “formal”, but it was still quite a good experience.

Introduction

Back in the day, there was a popular saying “the internet never forgets.” Unfortunately, this couldn’t be further from the truth. Web pages can disappear or change significantly over time, leaving you with broken links and lost information. That’s why it’s crucial to have a reliable backup plan for the web pages that matter to you.

One such solution is Org Roam, an Emacs package that helps you create and organize notes as a network of linked notes rather than a strict hierarchy. Additionally, org-web-tools is a package that can save web pages as plain, readable Org files. By combining these two packages, you can build a powerful system for organizing and accessing information.

In this post, we’ll explore how to use org-web-tools to clip web pages into your Org Roam notes. With this technique, you can easily save and organize web pages that are important to you, even if they later disappear from the internet.

Web clipping with org capture

Org Capture is a powerful tool for quickly creating notes in Emacs. To use it for web clipping, we’ll need to install the org-web-tools package. This package allows us to easily insert web pages into our note-taking system.

Org Roam has built-in org-protocol support, which provides two protocols: roam-node and roam-ref. For web clipping, we’ll use the roam-ref protocol, which preserves the link as a “ref”. You can find more information about this in the Org Roam manual.
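The protocol handlers are only available once the protocol extension is loaded. A minimal sketch of that setup follows, assuming Org Roam v2 is already installed and that links are delivered through emacsclient. Your operating system also needs to hand org-protocol:// links to emacsclient, and that part is not shown here.

```emacs-lisp
;; Assumption: Org Roam v2 is already installed and configured.
;; Loading this file registers the roam-node and roam-ref handlers.
(require 'org-roam-protocol)

;; org-protocol links reach Emacs through emacsclient,
;; so an Emacs server has to be running.
(require 'server)
(unless (server-running-p)
  (server-start))
```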

Define a capture template for web clips:

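;; Template used by the roam-ref protocol. The key "w" is what the browser
;; passes as template=w. The %(...) form fetches the page at ${ref} and
;; inserts it as readable Org text.
(setq org-roam-capture-ref-templates
      '(("w" "ref" plain "%(org-web-tools--url-as-readable-org \"${ref}\")"
         :target (file+head "clips/${slug}.org" "#+title: ${title}\n")
         :unnarrowed t)))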

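However, the function org-web-tools--url-as-readable-org is not autoloaded, so the package has to be loaded before the first capture. The use-package! form below (Doom Emacs’s wrapper around use-package) registers the function as an autoload through :commands: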

(use-package! org-web-tools
  :commands org-web-tools--url-as-readable-org)

As mentioned in my previous post, I use Tridactyl in Firefox, with the following configuration:

bind ;r js javascript:location.href = 'org-protocol://roam-ref?template=w&ref=' + encodeURIComponent(location.href) + '&title=' + encodeURIComponent(document.title)
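To check the template without going through the browser, the same URL can be built inside Emacs. This is only a sketch: my/roam-ref-url is a hypothetical helper name, and url-hexify-string stands in for encodeURIComponent (the two differ slightly in which punctuation they leave unescaped).

```emacs-lisp
(require 'url-util)

;; Hypothetical helper (not part of Org Roam or org-web-tools): builds the
;; same kind of roam-ref URL that the Tridactyl binding sends.
(defun my/roam-ref-url (url title)
  "Return an org-protocol roam-ref URL for URL and TITLE using template w."
  (format "org-protocol://roam-ref?template=w&ref=%s&title=%s"
          (url-hexify-string url)
          (url-hexify-string title)))

(my/roam-ref-url "https://example.com/a b" "Hi")
;; => "org-protocol://roam-ref?template=w&ref=https%3A%2F%2Fexample.com%2Fa%20b&title=Hi"
```

Passing the resulting string as the argument to emacsclient should open the same capture frame as the key binding.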

Now pressing ;r in Firefox brings up a capture frame. The web page is then converted to a plain, readable Org format by org-web-tools, although I do need to do some minor cleanup after the capture, such as removing base64 images from the result.
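The base64 cleanup could be scripted along these lines. This is a rough sketch that assumes the embedded images show up as Org links whose target starts with data:, such as [[data:image/png;base64,...]]. The exact form depends on the page and the conversion, so check a captured file first. The command name is hypothetical.

```emacs-lisp
;; Hypothetical helper: assumes embedded images appear as Org links whose
;; target begins with "data:", with or without a description part.
(defun my/org-remove-data-uri-links ()
  "Delete Org links with a data: URI target from the current buffer."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (let ((count 0))
      ;; Matches [[data:...]] as well as [[data:...][description]].
      (while (re-search-forward
              "\\[\\[data:[^]]*\\]\\(?:\\[[^]]*\\]\\)?\\]" nil t)
        (replace-match "")
        (setq count (1+ count)))
      (message "Removed %d data: link(s)" count))))
```

Run it with M-x my/org-remove-data-uri-links in the capture buffer before finishing the capture.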

Overall, this is a highly efficient and streamlined way of clipping web pages and integrating them into my note-taking workflow.

Screenshot

2023-02-14_22-36_roam-capture.png