abx ecosystem

ArchiveBox Plugin Gallery

Browse the plugins that ArchiveBox, abx-dl, and other tools in the abx-ecosystem provide to extract content from websites.

56 Plugins
76 Hook scripts
190 Config Options

yt-dlp

ytdlp

Download video and audio media with metadata, subtitles, thumbnails, and description sidecars.

Crawl Snapshot Embed Fullscreen 2 hooks 9 config fields
#02
Needs binaries: yt-dlp, node, ffmpeg
Outputs: audio/, video/, image/, application/x-subrip +3 more

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries
yt-dlp node ffmpeg
Output Mimetypes
audio/ video/ image/ application/x-subrip text/vtt application/json text/plain

Run It

ArchiveBox
YTDLP_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=ytdlp 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key YTDLP_ENABLED

Enable video/audio downloading with yt-dlp

Type: boolean
Aliases: MEDIA_ENABLED, SAVE_MEDIA, USE_MEDIA, USE_YTDLP, FETCH_MEDIA, SAVE_YTDLP
true
Config key YTDLP_BINARY

Path to yt-dlp binary

Type: string
Aliases: YOUTUBEDL_BINARY, YOUTUBE_DL_BINARY
"yt-dlp"
Config key YTDLP_NODE_BINARY

Path to Node.js binary for yt-dlp JS runtime

Type: string
Fallback: NODE_BINARY
"node"
Config key YTDLP_TIMEOUT

Timeout for yt-dlp downloads in seconds

Type: integer
Fallback: TIMEOUT
Aliases: MEDIA_TIMEOUT
Minimum: 30
3600
Config key YTDLP_COOKIES_FILE

Path to cookies file

Type: string
Fallback: COOKIES_FILE
""
Config key YTDLP_MAX_SIZE

Maximum file size for yt-dlp downloads

Type: string
Aliases: MEDIA_MAX_SIZE
Pattern: ^\d+[kmgKMG]?$
"750m"
Config key YTDLP_CHECK_SSL_VALIDITY

Whether to verify SSL certificates

Type: boolean
Fallback: CHECK_SSL_VALIDITY
true
Config key YTDLP_ARGS

Default yt-dlp arguments

Type: array
Aliases: YTDLP_DEFAULT_ARGS
[
  "--restrict-filenames",
  "--trim-filenames=128",
  "--write-description",
  "--write-info-json",
  "--write-thumbnail",
  "--write-sub",
  "--write-auto-subs",
  "--convert-subs=srt",
  "--yes-playlist",
  "--continue",
  "--no-abort-on-error",
  "--ignore-errors",
  "--geo-bypass",
  "--add-metadata",
  "--no-progress",
  "--remote-components=ejs:github",
  "-o",
  "%(title)s.%(ext)s"
]
Config key YTDLP_ARGS_EXTRA

Extra arguments to append to yt-dlp command

Type: array
Aliases: YTDLP_EXTRA_ARGS
[]
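
The YTDLP_MAX_SIZE pattern above accepts a whole number with an optional k, m, or g suffix. The following Python sketch validates such a value and converts it to bytes. The function name parse_max_size and the 1024-based multipliers are assumptions made for illustration; the listing does not say how the plugin interprets the suffix.

```python
import re

# Pattern copied from the YTDLP_MAX_SIZE entry.
MAX_SIZE_PATTERN = re.compile(r"^\d+[kmgKMG]?$", re.ASCII)

# Assumption: suffixes are binary multiples (1k = 1024 bytes).
MULTIPLIERS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_max_size(value: str) -> int:
    """Validate a size string against the pattern and return a byte count."""
    if not MAX_SIZE_PATTERN.fullmatch(value):
        raise ValueError(f"invalid size: {value!r}")
    suffix = value[-1].lower()
    if suffix in MULTIPLIERS:
        return int(value[:-1]) * MULTIPLIERS[suffix]
    return int(value)  # no suffix: the value is already a byte count
```

Under these assumptions, a value such as "1.5m" is rejected because the pattern allows digits only before the suffix.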

gallery-dl

gallerydl

Download image and media galleries along with metadata sidecars from supported sites.

Crawl Snapshot Embed Fullscreen 2 hooks 7 config fields
#03
Needs binaries: gallery-dl
Outputs: image/, video/, application/json, text/plain +1 more

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries
gallery-dl
Output Mimetypes
image/ video/ application/json text/plain application/zip

Run It

ArchiveBox
GALLERYDL_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=gallerydl 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key GALLERYDL_ENABLED

Enable gallery downloading with gallery-dl

Type: boolean
Aliases: SAVE_GALLERYDL, USE_GALLERYDL
true
Config key GALLERYDL_BINARY

Path to gallery-dl binary

Type: string
"gallery-dl"
Config key GALLERYDL_TIMEOUT

Timeout for gallery downloads in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 30
3600
Config key GALLERYDL_COOKIES_FILE

Path to cookies file

Type: string
Fallback: COOKIES_FILE
""
Config key GALLERYDL_CHECK_SSL_VALIDITY

Whether to verify SSL certificates

Type: boolean
Fallback: CHECK_SSL_VALIDITY
true
Config key GALLERYDL_ARGS

Default gallery-dl arguments

Type: array
Aliases: GALLERYDL_DEFAULT_ARGS
[
  "--write-metadata",
  "--write-info-json"
]
Config key GALLERYDL_ARGS_EXTRA

Extra arguments to append to gallery-dl command

Type: array
Aliases: GALLERYDL_EXTRA_ARGS
[]
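
Several options above list aliases, for example GALLERYDL_ENABLED with SAVE_GALLERYDL and USE_GALLERYDL. The sketch below shows one way a boolean option could be resolved from an environment mapping. The helper resolve_bool, the precedence order (canonical key first, then aliases in listed order), and the accepted spellings of "true" are all assumptions; the listing does not document them.

```python
import os
from typing import Mapping, Sequence

# Key and aliases copied from the GALLERYDL_ENABLED entry.
KEY = "GALLERYDL_ENABLED"
ALIASES = ("SAVE_GALLERYDL", "USE_GALLERYDL")


def resolve_bool(
    env: Mapping[str, str],
    key: str,
    aliases: Sequence[str],
    default: bool,
) -> bool:
    """Return the first value found under key or its aliases, else default."""
    for name in (key, *aliases):
        raw = env.get(name)
        if raw is not None:
            # Assumption: these spellings mean true; anything else means false.
            return raw.strip().lower() in {"1", "true", "yes", "on"}
    return default


if __name__ == "__main__":
    print(resolve_bool(os.environ, KEY, ALIASES, default=True))
```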

forum-dl

forumdl

Download forum threads and exports in JSONL, WARC, and mailbox-style archive formats.

Crawl Snapshot Embed Fullscreen 2 hooks 6 config fields
#04
Needs binaries: forum-dl
Outputs: application/x-ndjson, application/warc, message/rfc822

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries
forum-dl
Output Mimetypes
application/x-ndjson application/warc message/rfc822

Run It

ArchiveBox
FORUMDL_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=forumdl 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key FORUMDL_ENABLED

Enable forum downloading with forum-dl

Type: boolean
Aliases: SAVE_FORUMDL, USE_FORUMDL
true
Config key FORUMDL_BINARY

Path to forum-dl binary

Type: string
"forum-dl"
Config key FORUMDL_TIMEOUT

Timeout for forum downloads in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 30
3600
Config key FORUMDL_OUTPUT_FORMAT

Output format for forum downloads

Type: string
Enum: jsonl, warc, mbox, maildir, mh, mmdf, babyl
"jsonl"
Config key FORUMDL_ARGS

Default forum-dl arguments

Type: array
Aliases: FORUMDL_DEFAULT_ARGS
[]
Config key FORUMDL_ARGS_EXTRA

Extra arguments to append to forum-dl command

Type: array
Aliases: FORUMDL_EXTRA_ARGS
[]
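
FORUMDL_OUTPUT_FORMAT is restricted to the enum values listed above. A minimal Python sketch of checking a value before use is shown here; the function name validate_output_format is hypothetical and only mirrors the enum and default from the listing.

```python
# Enum values and default copied from the FORUMDL_OUTPUT_FORMAT entry.
FORUMDL_OUTPUT_FORMATS = ("jsonl", "warc", "mbox", "maildir", "mh", "mmdf", "babyl")
FORUMDL_OUTPUT_FORMAT_DEFAULT = "jsonl"


def validate_output_format(value: str = FORUMDL_OUTPUT_FORMAT_DEFAULT) -> str:
    """Return value unchanged if it is an allowed format, else raise ValueError."""
    if value not in FORUMDL_OUTPUT_FORMATS:
        allowed = ", ".join(FORUMDL_OUTPUT_FORMATS)
        raise ValueError(f"unsupported format {value!r}; expected one of: {allowed}")
    return value
```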

Git

git

Clone git repositories from supported repository URLs into the snapshot output directory.

Crawl Snapshot Embed 2 hooks 6 config fields
#05
Needs binaries: git
Outputs: text/, application/, image/, audio/ +2 more

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries
git
Output Mimetypes
text/ application/ image/ audio/ video/ font/

Run It

ArchiveBox
GIT_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=git 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key GIT_ENABLED

Enable git repository cloning

Type: boolean
Aliases: SAVE_GIT, USE_GIT
true
Config key GIT_BINARY

Path to git binary

Type: string
"git"
Config key GIT_TIMEOUT

Timeout for git operations in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 10
120
Config key GIT_DOMAINS

Comma-separated list of domains to treat as git repositories

Type: string
"github.com,gitlab.com,bitbucket.org,gist.github.com,codeberg.org,gitea.com,git.sr.ht"
Config key GIT_ARGS

Default git arguments

Type: array
Aliases: GIT_DEFAULT_ARGS
[
  "clone",
  "--depth=1",
  "--recursive"
]
Config key GIT_ARGS_EXTRA

Extra arguments to append to git command

Type: array
Aliases: GIT_EXTRA_ARGS
[]
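
GIT_DOMAINS is a comma-separated string, so a consumer has to split it before comparing it with a URL. The sketch below assumes an exact hostname match, which the listing does not confirm; the real plugin may match subdomains or URL patterns differently. The function name is_git_url is hypothetical.

```python
from urllib.parse import urlsplit

# Default copied from the GIT_DOMAINS entry.
GIT_DOMAINS = (
    "github.com,gitlab.com,bitbucket.org,gist.github.com,"
    "codeberg.org,gitea.com,git.sr.ht"
)


def is_git_url(url: str, domains: str = GIT_DOMAINS) -> bool:
    """Return True if the URL's hostname is one of the listed domains."""
    allowed = {d.strip().lower() for d in domains.split(",") if d.strip()}
    # urlsplit().hostname is already lowercased; it is None for URLs without a host.
    host = urlsplit(url).hostname or ""
    return host in allowed  # assumption: exact match only, no subdomain matching
```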

wget

wget

Archive pages and their requisites with wget, optionally writing WARC captures.

Crawl Snapshot Embed 2 hooks 9 config fields
#06
Needs binaries: wget
Outputs: text/html, application/warc, image/, text/css +4 more

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries
wget
Output Mimetypes
text/html application/warc image/ text/css application/javascript font/ audio/ video/

Run It

ArchiveBox
WGET_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=wget 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key WGET_ENABLED

Enable wget archiving

Type: boolean
Aliases: SAVE_WGET, USE_WGET
true
Config key WGET_WARC_ENABLED

Save WARC archive file

Type: boolean
Aliases: SAVE_WARC, WGET_SAVE_WARC
true
Config key WGET_BINARY

Path to wget binary

Type: string
"wget"
Config key WGET_TIMEOUT

Timeout for wget in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 5
60
Config key WGET_USER_AGENT

User agent string for wget

Type: string
Fallback: USER_AGENT
""
Config key WGET_COOKIES_FILE

Path to cookies file

Type: string
Fallback: COOKIES_FILE
""
Config key WGET_CHECK_SSL_VALIDITY

Whether to verify SSL certificates

Type: boolean
Fallback: CHECK_SSL_VALIDITY
true
Config key WGET_ARGS

Default wget arguments

Type: array
Aliases: WGET_DEFAULT_ARGS
[
  "--no-verbose",
  "--adjust-extension",
  "--convert-links",
  "--force-directories",
  "--backup-converted",
  "--span-hosts",
  "--no-parent",
  "--page-requisites",
  "--restrict-file-names=windows",
  "--tries=2",
  "-e",
  "robots=off"
]
Config key WGET_ARGS_EXTRA

Extra arguments to append to wget command

Type: array
Aliases: WGET_EXTRA_ARGS
[]
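
WGET_ARGS holds the default flags and WGET_ARGS_EXTRA is described as being appended to the command. The sketch below shows how the two lists could combine into an argument vector. The ordering (defaults, then extras, then the URL) and the function name build_wget_command are assumptions; the real plugin presumably also adds flags derived from other options such as the timeout or user agent, which are omitted here.

```python
import shlex
from typing import List, Sequence

# Defaults copied from the WGET_ARGS entry.
WGET_ARGS = [
    "--no-verbose",
    "--adjust-extension",
    "--convert-links",
    "--force-directories",
    "--backup-converted",
    "--span-hosts",
    "--no-parent",
    "--page-requisites",
    "--restrict-file-names=windows",
    "--tries=2",
    "-e",
    "robots=off",
]


def build_wget_command(
    url: str,
    binary: str = "wget",
    args: Sequence[str] = WGET_ARGS,
    extra_args: Sequence[str] = (),
) -> List[str]:
    """Assemble an argv list: binary, default args, extra args, then the URL."""
    return [binary, *args, *extra_args, url]


if __name__ == "__main__":
    command = build_wget_command("https://example.com", extra_args=["--wait=1"])
    print(shlex.join(command))  # shell-quoted form, for display only
```

Keeping the command as a list rather than a single string avoids shell-quoting problems when it is later passed to a process launcher.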

Archive.org

archivedotorg

Submit URLs to the Internet Archive Wayback Machine and save the resulting archive link.

Snapshot Embed 1 hook 3 config fields
#08
Outputs: text/plain

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries

No binary dependencies declared.

Output Mimetypes
text/plain

Run It

ArchiveBox
ARCHIVEDOTORG_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=archivedotorg 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key ARCHIVEDOTORG_ENABLED

Submit URLs to archive.org Wayback Machine

Type: boolean
Aliases: SAVE_ARCHIVEDOTORG, USE_ARCHIVEDOTORG, SUBMIT_ARCHIVEDOTORG
true
Config key ARCHIVEDOTORG_TIMEOUT

Timeout for archive.org submission in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 10
60
Config key ARCHIVEDOTORG_USER_AGENT

User agent string

Type: string
Fallback: USER_AGENT
""

Favicon

favicon

Fetch and save the site favicon or touch icon.

Snapshot Embed 1 hook 3 config fields
#11
Outputs: image/

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries

No binary dependencies declared.

Output Mimetypes
image/

Run It

ArchiveBox
FAVICON_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=favicon 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key FAVICON_ENABLED

Enable favicon downloading

Type: boolean
Aliases: SAVE_FAVICON, USE_FAVICON
true
Config key FAVICON_TIMEOUT

Timeout for favicon fetch in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 5
30
Config key FAVICON_USER_AGENT

User agent string

Type: string
Fallback: USER_AGENT
""

Modal Closer

modalcloser

Automatically dismiss dialogs, cookie banners, and framework modals while the page is being archived.

Snapshot 1 hook 3 config fields
#15
Needs plugins: chrome
Needs binaries: chrome

Plugin Info

Required Plugins
chrome
Required Binaries
chrome
Output Mimetypes

No output mimetypes declared.

Run It

ArchiveBox
MODALCLOSER_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=modalcloser 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key MODALCLOSER_ENABLED

Enable automatic modal and dialog closing

Type: boolean
Aliases: CLOSE_MODALS, AUTO_CLOSE_MODALS
true
Config key MODALCLOSER_TIMEOUT

Delay before auto-closing dialogs (ms)

Type: integer
Minimum: 100
1250
Config key MODALCLOSER_POLL_INTERVAL

How often to check for CSS modals (ms)

Type: integer
Minimum: 100
500

Console Log

consolelog

Capture browser console messages emitted while the page loads.

Snapshot 1 hook 2 config fields
#21
Needs plugins: chrome
Needs binaries: chrome
Outputs: application/x-ndjson

Plugin Info

Required Plugins
chrome
Required Binaries
chrome
Output Mimetypes
application/x-ndjson

Run It

ArchiveBox
CONSOLELOG_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=consolelog 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key CONSOLELOG_ENABLED

Enable console log capture

Type: boolean
Aliases: SAVE_CONSOLELOG, USE_CONSOLELOG
true
Config key CONSOLELOG_TIMEOUT

Timeout for console log capture in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 5
30

DNS

dns

Record DNS activity observed while loading the page in Chrome.

Snapshot 1 hook 2 config fields
#22
Needs plugins: chrome
Needs binaries: chrome
Outputs: application/x-ndjson

Plugin Info

Required Plugins
chrome
Required Binaries
chrome
Output Mimetypes
application/x-ndjson

Run It

ArchiveBox
DNS_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=dns 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key DNS_ENABLED

Enable DNS traffic recording during page load

Type: boolean
Aliases: SAVE_DNS, USE_DNS
true
Config key DNS_TIMEOUT

Timeout for DNS recording in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 5
30

SSL

ssl

Capture TLS certificate and connection metadata for the loaded page.

Snapshot 1 hook 2 config fields
#23
Needs plugins: chrome
Needs binaries: chrome
Outputs: application/x-ndjson

Plugin Info

Required Plugins
chrome
Required Binaries
chrome
Output Mimetypes
application/x-ndjson

Run It

ArchiveBox
SSL_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=ssl 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key SSL_ENABLED

Enable SSL certificate capture

Type: boolean
Aliases: SAVE_SSL, USE_SSL
true
Config key SSL_TIMEOUT

Timeout for SSL capture in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 5
30

Responses

responses

Capture HTTP response metadata for requests made during page load.

Snapshot 1 hook 2 config fields
#24
Needs plugins: chrome
Needs binaries: chrome
Outputs: application/x-ndjson, text/, image/, audio/ +3 more

Plugin Info

Required Plugins
chrome
Required Binaries
chrome
Output Mimetypes
application/x-ndjson text/ image/ audio/ video/ application/ font/

Run It

ArchiveBox
RESPONSES_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=responses 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key RESPONSES_ENABLED

Enable HTTP response capture

Type: boolean
Aliases: SAVE_RESPONSES, USE_RESPONSES
true
Config key RESPONSES_TIMEOUT

Timeout for response capture in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 5
30

Redirects

redirects

Capture the redirect chain encountered while loading the page.

Snapshot 1 hook 2 config fields
#25
Needs plugins: chrome
Needs binaries: chrome
Outputs: application/x-ndjson

Plugin Info

Required Plugins
chrome
Required Binaries
chrome
Output Mimetypes
application/x-ndjson

Run It

ArchiveBox
REDIRECTS_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=redirects 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key REDIRECTS_ENABLED

Enable redirect chain capture

Type: boolean
Aliases: SAVE_REDIRECTS, USE_REDIRECTS
true
Config key REDIRECTS_TIMEOUT

Timeout for redirect capture in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 5
30

Static File

staticfile

Detect and download static-file responses directly when a URL resolves to a non-HTML asset.

Snapshot Embed 1 hook 2 config fields
#26
Needs plugins: chrome
Needs binaries: chrome
Outputs: application/pdf, application/epub+zip, image/, audio/ +8 more

Plugin Info

Required Plugins
chrome
Required Binaries
chrome
Output Mimetypes
application/pdf application/epub+zip image/ audio/ video/ application/json application/xml text/csv text/xml application/zip application/octet-stream application/x-

Run It

ArchiveBox
STATICFILE_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=staticfile 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key STATICFILE_ENABLED

Enable static file detection

Type: boolean
Aliases: SAVE_STATICFILE, USE_STATICFILE
true
Config key STATICFILE_TIMEOUT

Timeout for static file detection in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 5
30

Headers

headers

Capture HTTP headers for the main document response.

Snapshot 1 hook 2 config fields
#27
Needs plugins: chrome
Needs binaries: chrome
Outputs: application/json

Plugin Info

Required Plugins
chrome
Required Binaries
chrome
Output Mimetypes
application/json

Run It

ArchiveBox
HEADERS_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=headers 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key HEADERS_ENABLED

Enable HTTP headers capture

Type: boolean
Aliases: SAVE_HEADERS, USE_HEADERS
true
Config key HEADERS_TIMEOUT

Timeout for headers capture in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 5
30

Chrome

chrome

Launch and manage a shared Chromium session for browser-driven plugins.

Crawl Snapshot 6 hooks 15 config fields
#30
Needs binaries: chrome
Outputs: text/plain, application/json

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries
chrome
Output Mimetypes
text/plain application/json

Run It

ArchiveBox
CHROME_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=chrome 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key CHROME_ENABLED

Enable Chromium browser integration for archiving

Type: boolean
Aliases: USE_CHROME
true
Config key CHROME_BINARY

Path to Chromium binary

Type: string
Aliases: CHROMIUM_BINARY, GOOGLE_CHROME_BINARY
"chromium"
Config key CHROME_NODE_BINARY

Path to Node.js binary (for Puppeteer)

Type: string
Fallback: NODE_BINARY
"node"
Config key CHROME_TIMEOUT

Timeout for Chrome operations in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 5
60
Config key CHROME_HEADLESS

Run Chrome in headless mode

Type: boolean
true
Config key CHROME_SANDBOX

Enable Chrome sandbox (disable in Docker with --no-sandbox)

Type: boolean
true
Config key CHROME_RESOLUTION

Browser viewport resolution (width,height)

Type: string
Fallback: RESOLUTION
Pattern: ^\d+,\d+$
"1440,2000"
Config key CHROME_USER_DATA_DIR

Path to Chrome user data directory for persistent sessions (derived from ACTIVE_PERSONA if not set)

Type: string
""
Config key CHROME_USER_AGENT

User agent string for Chrome

Type: string
Fallback: USER_AGENT
""
Config key CHROME_ARGS

Default Chrome command-line arguments (static flags only, dynamic args like --user-data-dir are added at runtime)

Type: array
Aliases: CHROME_DEFAULT_ARGS
[
  "--no-first-run",
  "--no-default-browser-check",
  "--disable-default-apps",
  "--disable-sync",
  "--disable-infobars",
  "--disable-blink-features=AutomationControlled",
  "--disable-component-update",
  "--disable-domain-reliability",
  "--disable-breakpad",
  "--disable-client-side-phishing-detection",
  "--disable-hang-monitor",
  "--disable-speech-synthesis-api",
  "--disable-speech-api",
  "--disable-print-preview",
  "--disable-notifications",
  "--disable-desktop-notifications",
  "--disable-popup-blocking",
  "--disable-prompt-on-repost",
  "--disable-external-intent-requests",
  "--disable-session-crashed-bubble",
  "--disable-search-engine-choice-screen",
  "--disable-datasaver-prompt",
  "--ash-no-nudges",
  "--hide-crash-restore-bubble",
  "--suppress-message-center-popups",
  "--noerrdialogs",
  "--no-pings",
  "--silent-debugger-extension-api",
  "--deny-permission-prompts",
  "--safebrowsing-disable-auto-update",
  "--metrics-recording-only",
  "--password-store=basic",
  "--use-mock-keychain",
  "--disable-cookie-encryption",
  "--font-render-hinting=none",
  "--force-color-profile=srgb",
  "--disable-partial-raster",
  "--disable-skia-runtime-opts",
  "--disable-2d-canvas-clip-aa",
  "--enable-webgl",
  "--hide-scrollbars",
  "--export-tagged-pdf",
  "--generate-pdf-document-outline",
  "--disable-lazy-loading",
  "--disable-renderer-backgrounding",
  "--disable-background-networking",
  "--disable-background-timer-throttling",
  "--disable-backgrounding-occluded-windows",
  "--disable-ipc-flooding-protection",
  "--disable-extensions-http-throttling",
  "--disable-field-trial-config",
  "--disable-back-forward-cache",
  "--autoplay-policy=no-user-gesture-required",
  "--disable-gesture-requirement-for-media-playback",
  "--lang=en-US,en;q=0.9",
  "--log-level=2",
  "--enable-logging=stderr"
]
Config key CHROME_ARGS_EXTRA

Extra arguments to append to Chrome command (for user customization)

Type: array
Aliases: CHROME_EXTRA_ARGS
[]
Config key CHROME_PAGELOAD_TIMEOUT

Timeout for page navigation/load in seconds

Type: integer
Fallback: CHROME_TIMEOUT
Minimum: 5
60
Config key CHROME_WAIT_FOR

Page load completion condition (domcontentloaded, load, networkidle0, networkidle2)

Type: string
Enum: domcontentloaded, load, networkidle0, networkidle2
"networkidle2"
Config key CHROME_DELAY_AFTER_LOAD

Extra delay in seconds after page load completes before archiving (useful for JS-heavy SPAs)

Type: number
Minimum: 0
0
Config key CHROME_CHECK_SSL_VALIDITY

Whether to verify SSL certificates (disable for self-signed certs)

Type: boolean
Fallback: CHECK_SSL_VALIDITY
true

SEO

seo

Capture SEO-related metadata such as meta tags and Open Graph fields.

Snapshot 1 hook 2 config fields
#38
Needs plugins: chrome
Needs binaries: chrome
Outputs: application/json

Plugin Info

Required Plugins
chrome
Required Binaries
chrome
Output Mimetypes
application/json

Run It

ArchiveBox
SEO_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=seo 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key SEO_ENABLED

Enable SEO metadata capture

Type: boolean
Aliases: SAVE_SEO, USE_SEO
true
Config key SEO_TIMEOUT

Timeout for SEO capture in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 5
30

Accessibility

accessibility

Capture the browser accessibility tree for the archived page.

Snapshot 1 hook 2 config fields
#39
Needs plugins: chrome
Needs binaries: chrome
Outputs: application/json

Plugin Info

Required Plugins
chrome
Required Binaries
chrome
Output Mimetypes
application/json

Run It

ArchiveBox
ACCESSIBILITY_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=accessibility 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key ACCESSIBILITY_ENABLED

Enable accessibility tree capture

Type: boolean
Aliases: SAVE_ACCESSIBILITY, USE_ACCESSIBILITY
true
Config key ACCESSIBILITY_TIMEOUT

Timeout for accessibility capture in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 5
30

Infinite Scroll

infiniscroll

Expand infinite-scroll pages and load additional content before downstream capture plugins run.

Snapshot 1 hook 7 config fields
#45
Needs plugins: chrome
Needs binaries: chrome

Plugin Info

Required Plugins
chrome
Required Binaries
chrome
Output Mimetypes

No output mimetypes declared.

Run It

ArchiveBox
INFINISCROLL_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=infiniscroll 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key INFINISCROLL_ENABLED

Enable infinite scroll page expansion

Type: boolean
Aliases: SAVE_INFINISCROLL, USE_INFINISCROLL
true
Config key INFINISCROLL_TIMEOUT

Maximum timeout for scrolling in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 10
120
Config key INFINISCROLL_SCROLL_DELAY

Delay between scrolls in milliseconds

Type: integer
Minimum: 500
2000
Config key INFINISCROLL_SCROLL_DISTANCE

Distance to scroll per step in pixels

Type: integer
Minimum: 100
1600
Config key INFINISCROLL_SCROLL_LIMIT

Maximum number of scroll steps

Type: integer
Minimum: 1
10
Config key INFINISCROLL_MIN_HEIGHT

Minimum page height to scroll to in pixels

Type: integer
Minimum: 1000
16000
Config key INFINISCROLL_EXPAND_DETAILS

Expand <details> elements and click 'load more' buttons for comments

Type: boolean
true

Claude Chrome

claudechrome

Use Claude computer-use to interact with pages in Chrome via CDP screenshots and the Anthropic API.

Crawl Snapshot Embed Fullscreen 3 hooks 6 config fields
#47
Needs plugins: chrome
Needs binaries: node
Outputs: application/json, image/png

Plugin Info

Required Plugins
chrome
Required Binaries
node
Output Mimetypes
application/json image/png

Run It

ArchiveBox
CLAUDECHROME_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=claudechrome 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key CLAUDECHROME_ENABLED

Enable Claude for Chrome browser extension for AI-driven page interaction

Type: boolean
Aliases: USE_CLAUDECHROME
false
Config key CLAUDECHROME_PROMPT

Prompt for Claude to execute on the page. Claude can click buttons, fill forms, download files, and interact with any page element.

Type: string
"Look at the current page. If there are any \"expand\", \"show more\", \"load more\", or similar buttons/links, click them all to reveal hidden content. Report what you did."
Config key CLAUDECHROME_TIMEOUT

Timeout for Claude for Chrome operations in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 10
120
Config key CLAUDECHROME_MODEL

Claude model to use (e.g. sonnet, opus, haiku). Availability depends on your plan.

Type: string
"sonnet"
Config key CLAUDECHROME_MAX_ACTIONS

Maximum number of agentic loop iterations (screenshots + actions) per page

Type: integer
Minimum: 1
15
Config key ANTHROPIC_API_KEY

Anthropic API key for Claude for Chrome authentication

Type: string
""

SingleFile

singlefile

Save a complete page as a single self-contained HTML file using the SingleFile extension or CLI.

Crawl Snapshot Embed 3 hooks 11 config fields
#50
Needs plugins: chrome
Needs binaries: chrome, single-file-cli
Outputs: text/html

Plugin Info

Required Plugins
chrome
Required Binaries
chrome single-file-cli
Output Mimetypes
text/html

Run It

ArchiveBox
SINGLEFILE_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=singlefile 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key SINGLEFILE_ENABLED

Enable SingleFile archiving

Type: boolean
Aliases: SAVE_SINGLEFILE, USE_SINGLEFILE
true
Config key SINGLEFILE_BINARY

Path to single-file binary

Type: string
Aliases: SINGLE_FILE_BINARY
"single-file"
Config key SINGLEFILE_NODE_BINARY

Path to Node.js binary

Type: string
Fallback: NODE_BINARY
"node"
Config key SINGLEFILE_CHROME_BINARY

Path to Chromium binary

Type: string
Fallback: CHROME_BINARY
""
Config key SINGLEFILE_TIMEOUT

Timeout for SingleFile in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 10
60
Config key SINGLEFILE_USER_AGENT

User agent string

Type: string
Fallback: USER_AGENT
""
Config key SINGLEFILE_COOKIES_FILE

Path to cookies file

Type: string
Fallback: COOKIES_FILE
""
Config key SINGLEFILE_CHECK_SSL_VALIDITY

Whether to verify SSL certificates

Type: boolean
Fallback: CHECK_SSL_VALIDITY
true
Config key SINGLEFILE_CHROME_ARGS

Chrome command-line arguments for SingleFile

Type: array
Fallback: CHROME_ARGS
[]
Config key SINGLEFILE_ARGS

Default single-file arguments

Type: array
Aliases: SINGLEFILE_DEFAULT_ARGS
[
  "--browser-headless"
]
Config key SINGLEFILE_ARGS_EXTRA

Extra arguments to append to single-file command

Type: array
Aliases: SINGLEFILE_EXTRA_ARGS
[]

Screenshot

screenshot

Capture a PNG screenshot of the rendered page.

Snapshot Embed Fullscreen 1 hook 3 config fields
#51
Needs plugins: chrome
Needs binaries: chrome
Outputs: image/png

Plugin Info

Required Plugins
chrome
Required Binaries
chrome
Output Mimetypes
image/png

Run It

ArchiveBox
SCREENSHOT_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=screenshot 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key SCREENSHOT_ENABLED

Enable screenshot capture

Type: boolean
Aliases: SAVE_SCREENSHOT, USE_SCREENSHOT
true
Config key SCREENSHOT_TIMEOUT

Timeout for screenshot capture in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 5
60
Config key SCREENSHOT_RESOLUTION

Screenshot resolution (width,height)

Type: string
Fallback: RESOLUTION
Pattern: ^\d+,\d+$
"1440,2000"
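
SCREENSHOT_RESOLUTION must match the pattern shown above, a width and a height separated by a comma. A minimal Python sketch of validating and splitting such a value follows. The function name parse_resolution is hypothetical, and the sketch does not model the fallback to RESOLUTION.

```python
import re
from typing import Tuple

# Pattern copied from the SCREENSHOT_RESOLUTION entry.
RESOLUTION_PATTERN = re.compile(r"^\d+,\d+$", re.ASCII)


def parse_resolution(value: str = "1440,2000") -> Tuple[int, int]:
    """Validate a 'width,height' string and return it as a pair of integers."""
    if not RESOLUTION_PATTERN.fullmatch(value):
        raise ValueError(f"invalid resolution: {value!r}")
    width, height = value.split(",")
    return int(width), int(height)
```

Note that the pattern allows no spaces and no "x" separator, so values such as "1440x2000" or "1440, 2000" are rejected.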

PDF

pdf

Render the current page to PDF using the shared Chrome session.

Snapshot Embed Fullscreen 1 hook 3 config fields
#52
Needs plugins: chrome
Needs binaries: chrome
Outputs: application/pdf

Plugin Info

Required Plugins
chrome
Required Binaries
chrome
Output Mimetypes
application/pdf

Run It

ArchiveBox
PDF_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=pdf 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key PDF_ENABLED

Enable PDF generation

Type: boolean
Aliases: SAVE_PDF, USE_PDF
true
Config key PDF_TIMEOUT

Timeout for PDF generation in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 5
60
Config key PDF_RESOLUTION

PDF page resolution (width,height)

Type: string
Fallback: RESOLUTION
Pattern: ^\d+,\d+$
"1440,2000"

DOM

dom

Save the fully rendered DOM HTML from the live page.

Snapshot Embed 1 hook 2 config fields
#53
Needs plugins: chrome
Needs binaries: chrome
Outputs: text/html

Plugin Info

Required Plugins
chrome
Required Binaries
chrome
Output Mimetypes
text/html

Run It

ArchiveBox
DOM_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=dom 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key DOM_ENABLED

Enable DOM capture

Type: boolean
Aliases: SAVE_DOM, USE_DOM
true
Config key DOM_TIMEOUT

Timeout for DOM capture in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 5
60

Title

title

Capture the final document title from the rendered page.

Snapshot 1 hook 2 config fields
#54
Needs plugins: chrome
Needs binaries: chrome
Outputs: text/plain

Plugin Info

Required Plugins
chrome
Required Binaries
chrome
Output Mimetypes
text/plain

Run It

ArchiveBox
TITLE_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=title 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key TITLE_ENABLED

Enable title extraction

Type: boolean
Aliases: SAVE_TITLE, USE_TITLE
true
Config key TITLE_TIMEOUT

Timeout for title extraction in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 5
30

Readability

readability

Extract article HTML, text, and metadata using Mozilla Readability.

Crawl Snapshot Embed Fullscreen 2 hooks 5 config fields
#56
Needs binaries: readability-extractor
Outputs: text/html, text/plain, application/json

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries
readability-extractor
Output Mimetypes
text/html text/plain application/json

Run It

ArchiveBox
READABILITY_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=readability 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key READABILITY_ENABLED

Enable Readability text extraction

Type: boolean
Aliases: SAVE_READABILITY, USE_READABILITY
true
Config key READABILITY_BINARY

Path to readability-extractor binary

Type: string
"readability-extractor"
Config key READABILITY_TIMEOUT

Timeout for Readability in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 5
30
Config key READABILITY_ARGS

Default Readability arguments

Type: array
Aliases: READABILITY_DEFAULT_ARGS
[]
Config key READABILITY_ARGS_EXTRA

Extra arguments to append to Readability command

Type: array
Aliases: READABILITY_EXTRA_ARGS
[]

Defuddle

defuddle

Extract cleaned article HTML, text, and metadata from archived HTML using Defuddle.

Crawl Snapshot 2 hooks 5 config fields
#57
Needs binaries: defuddle
Outputs: text/html, text/plain, application/json

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries
defuddle
Output Mimetypes
text/html text/plain application/json

Run It

ArchiveBox
DEFUDDLE_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=defuddle 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key DEFUDDLE_ENABLED

Enable Defuddle text extraction

Type: boolean
Aliases: SAVE_DEFUDDLE, USE_DEFUDDLE
true
Config key DEFUDDLE_BINARY

Path to defuddle binary

Type: string
"defuddle"
Config key DEFUDDLE_TIMEOUT

Timeout for Defuddle in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 5
30
Config key DEFUDDLE_ARGS

Default Defuddle arguments

Type: array
Aliases: DEFUDDLE_DEFAULT_ARGS
[]
Config key DEFUDDLE_ARGS_EXTRA

Extra arguments to append to Defuddle command

Type: array
Aliases: DEFUDDLE_EXTRA_ARGS
[]

Mercury

mercury

Extract article HTML, text, and metadata using the Postlight Mercury parser.

Crawl Snapshot Embed 2 hooks 5 config fields
#57
Needs binaries: postlight-parser
Outputs: text/html, text/plain, application/json

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries
postlight-parser
Output Mimetypes
text/html text/plain application/json

Run It

ArchiveBox
MERCURY_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=mercury 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key MERCURY_ENABLED

Enable Mercury text extraction

Type: boolean
Aliases: SAVE_MERCURY, USE_MERCURY
true
Config key MERCURY_BINARY

Path to Mercury/Postlight parser binary

Type: string
Aliases: POSTLIGHT_PARSER_BINARY
"postlight-parser"
Config key MERCURY_TIMEOUT

Timeout for Mercury in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 5
30
Config key MERCURY_ARGS

Default Mercury parser arguments

Type: array
Aliases: MERCURY_DEFAULT_ARGS
[]
Config key MERCURY_ARGS_EXTRA

Extra arguments to append to Mercury parser command

Type: array
Aliases: MERCURY_EXTRA_ARGS
[]

Claude Code Extract

claudecodeextract

Use Claude Code to generate clean Markdown from snapshot extractor outputs.

Snapshot Embed Fullscreen 1 hook 5 config fields
#58
Needs plugins: claudecode
Needs binaries: claude
Outputs: text/markdown

Plugin Info

Required Plugins
claudecode
Required Binaries
claude
Output Mimetypes
text/markdown

Run It

ArchiveBox
CLAUDECODEEXTRACT_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=claudecodeextract 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key CLAUDECODEEXTRACT_ENABLED

Enable Claude Code AI extraction

Type: boolean
Aliases: USE_CLAUDECODEEXTRACT
false
Config key CLAUDECODEEXTRACT_TIMEOUT

Timeout for Claude Code extraction in seconds

Type: integer
Fallback: CLAUDECODE_TIMEOUT
Minimum: 10
120
Config key CLAUDECODEEXTRACT_PROMPT

Custom prompt for Claude Code extraction. Use this to define what Claude should extract or generate from the snapshot.

Type: string
"Read all the previously extracted outputs in this snapshot directory (readability/, mercury/, defuddle/, htmltotext/, dom/, singlefile/, etc.). Using the best available source, generate a clean, well-formatted Markdown representation of the page content. Save the output as content.md in your output directory."
Config key CLAUDECODEEXTRACT_MODEL

Claude model to use for extraction (e.g. sonnet, opus, haiku)

Type: string
Fallback: CLAUDECODE_MODEL
"sonnet"
Config key CLAUDECODEEXTRACT_MAX_TURNS

Maximum number of agentic turns for extraction

Type: integer
Fallback: CLAUDECODE_MAX_TURNS
Minimum: 1
10

HTML to Text

htmltotext

Convert archived HTML from other extractors into plain text for indexing and analysis.

Snapshot 1 hook 2 config fields
#58
Outputs: text/plain

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries

No binary dependencies declared.

Output Mimetypes
text/plain

Run It

ArchiveBox
HTMLTOTEXT_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=htmltotext 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key HTMLTOTEXT_ENABLED

Enable HTML to text conversion

Type: boolean
Aliases: SAVE_HTMLTOTEXT, USE_HTMLTOTEXT
true
Config key HTMLTOTEXT_TIMEOUT

Timeout for HTML to text conversion in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 5
30

Trafilatura

trafilatura

Extract article content from archived HTML into text, markdown, HTML, CSV, JSON, and XML formats.

Crawl Snapshot 2 hooks 10 config fields
#59
Needs binaries: trafilatura
Outputs: text/plain, text/markdown, text/html, text/csv +3 more

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries
trafilatura
Output Mimetypes
text/plain text/markdown text/html text/csv application/json application/xml application/tei+xml

Run It

ArchiveBox
TRAFILATURA_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=trafilatura 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key TRAFILATURA_ENABLED

Enable Trafilatura extraction

Type: boolean
Aliases: SAVE_TRAFILATURA, USE_TRAFILATURA
true
Config key TRAFILATURA_BINARY

Path to trafilatura binary

Type: string
"trafilatura"
Config key TRAFILATURA_TIMEOUT

Timeout for Trafilatura in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 5
30
Config key TRAFILATURA_OUTPUT_TXT

Write plain text output (content.txt)

Type: boolean
true
Config key TRAFILATURA_OUTPUT_MARKDOWN

Write markdown output (content.md)

Type: boolean
true
Config key TRAFILATURA_OUTPUT_HTML

Write HTML output (content.html)

Type: boolean
true
Config key TRAFILATURA_OUTPUT_CSV

Write CSV output (content.csv)

Type: boolean
false
Config key TRAFILATURA_OUTPUT_JSON

Write JSON output (content.json)

Type: boolean
false
Config key TRAFILATURA_OUTPUT_XML

Write XML output (content.xml)

Type: boolean
false
Config key TRAFILATURA_OUTPUT_XMLTEI

Write XML TEI output (content.xmltei)

Type: boolean
false
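
The seven TRAFILATURA_OUTPUT_* toggles decide which content.* files a run writes. The sketch below lists the files that would be expected for a given set of toggles, using the defaults shown above. It assumes booleans are spelled exactly "true" (as in the Run It examples); trafilatura_expected_outputs is a hypothetical helper, not part of the plugin:

```shell
# Hypothetical helper: print the filenames enabled by the TRAFILATURA_OUTPUT_*
# toggles. Unset toggles take the defaults documented above.
trafilatura_expected_outputs() {
  outputs=""
  for pair in \
    "${TRAFILATURA_OUTPUT_TXT:-true}:content.txt" \
    "${TRAFILATURA_OUTPUT_MARKDOWN:-true}:content.md" \
    "${TRAFILATURA_OUTPUT_HTML:-true}:content.html" \
    "${TRAFILATURA_OUTPUT_CSV:-false}:content.csv" \
    "${TRAFILATURA_OUTPUT_JSON:-false}:content.json" \
    "${TRAFILATURA_OUTPUT_XML:-false}:content.xml" \
    "${TRAFILATURA_OUTPUT_XMLTEI:-false}:content.xmltei"
  do
    enabled="${pair%%:*}"    # text before the first colon
    filename="${pair#*:}"    # text after the first colon
    if [ "$enabled" = "true" ]; then
      outputs="${outputs:+$outputs }$filename"
    fi
  done
  echo "$outputs"
}

trafilatura_expected_outputs

# Swap the HTML output for JSON and list again.
TRAFILATURA_OUTPUT_HTML=false
TRAFILATURA_OUTPUT_JSON=true
trafilatura_expected_outputs
```

With no toggles set, the first call lists content.txt, content.md, and content.html, matching the three defaults marked true.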

papers-dl

papersdl

Fetch downloadable academic papers from paper URLs and DOI targets.

Crawl Snapshot Embed Fullscreen 2 hooks 5 config fields
#66
Needs binaries: papers-dl
Outputs: application/pdf

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries
papers-dl
Output Mimetypes
application/pdf

Run It

ArchiveBox
PAPERSDL_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=papersdl 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key PAPERSDL_ENABLED

Enable paper downloading with papers-dl

Type: boolean
Aliases: SAVE_PAPERSDL, USE_PAPERSDL
true
Config key PAPERSDL_BINARY

Path to papers-dl binary

Type: string
"papers-dl"
Config key PAPERSDL_TIMEOUT

Timeout for paper downloads in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 30
300
Config key PAPERSDL_ARGS

Default papers-dl arguments

Type: array
Aliases: PAPERSDL_DEFAULT_ARGS
[
  "fetch"
]
Config key PAPERSDL_ARGS_EXTRA

Extra arguments to append to papers-dl command

Type: array
Aliases: PAPERSDL_EXTRA_ARGS
[]

Parse HTML URLs

parse_html_urls

Parse HTML documents and emit discovered links as JSONL snapshot records.

Snapshot 1 hook 1 config field
#70
Outputs: application/x-ndjson

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries

No binary dependencies declared.

Output Mimetypes
application/x-ndjson

Run It

ArchiveBox
PARSE_HTML_URLS_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=parse_html_urls 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key PARSE_HTML_URLS_ENABLED

Enable HTML URL parsing

Type: boolean
Aliases: USE_PARSE_HTML_URLS
true

Parse Text URLs

parse_txt_urls

Parse plain text documents and emit discovered URLs as JSONL snapshot records.

Snapshot 1 hook 1 config field
#71
Outputs: application/x-ndjson

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries

No binary dependencies declared.

Output Mimetypes
application/x-ndjson

Run It

ArchiveBox
PARSE_TXT_URLS_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=parse_txt_urls 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key PARSE_TXT_URLS_ENABLED

Enable plain text URL parsing

Type: boolean
Aliases: USE_PARSE_TXT_URLS
true
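
To make "emit discovered URLs as JSONL snapshot records" concrete, the sketch below pulls http(s) URLs out of plain text and prints one JSON object per line. The record shape {"type": "Snapshot", "url": ...} is an assumption made for illustration, not the plugin's confirmed schema, and extract_urls_as_jsonl is a hypothetical helper:

```shell
# Illustrative only: read text on stdin, print one JSON line per URL found.
# The character class excludes whitespace, quotes, angle brackets and
# backslashes, so matched URLs need no JSON escaping here.
extract_urls_as_jsonl() {
  grep -oE 'https?://[^[:space:]"<>\\]+' | while IFS= read -r url; do
    printf '{"type": "Snapshot", "url": "%s"}\n' "$url"
  done
}

printf 'See https://example.com/a and\nhttps://example.org/b?x=1 for details.\n' \
  | extract_urls_as_jsonl
```

The real parser likely handles cases this sketch ignores, such as trailing punctuation or URLs wrapped in brackets.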

Parse RSS URLs

parse_rss_urls

Parse RSS and Atom feeds and emit discovered entry URLs as JSONL snapshot records.

Snapshot 1 hook 1 config field
#72
Outputs: application/x-ndjson

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries

No binary dependencies declared.

Output Mimetypes
application/x-ndjson

Run It

ArchiveBox
PARSE_RSS_URLS_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=parse_rss_urls 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key PARSE_RSS_URLS_ENABLED

Enable RSS/Atom feed URL parsing

Type: boolean
Aliases: USE_PARSE_RSS_URLS
true

Parse Netscape URLs

parse_netscape_urls

Parse Netscape bookmark HTML exports and emit discovered URLs as JSONL snapshot records.

Snapshot 1 hook 1 config field
#73
Outputs: application/x-ndjson

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries

No binary dependencies declared.

Output Mimetypes
application/x-ndjson

Run It

ArchiveBox
PARSE_NETSCAPE_URLS_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=parse_netscape_urls 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key PARSE_NETSCAPE_URLS_ENABLED

Enable Netscape bookmarks HTML URL parsing

Type: boolean
Aliases: USE_PARSE_NETSCAPE_URLS
true

Parse JSONL URLs

parse_jsonl_urls

Parse JSONL bookmark exports and emit discovered URLs as JSONL snapshot records.

Snapshot 1 hook 1 config field
#74
Outputs: application/x-ndjson

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries

No binary dependencies declared.

Output Mimetypes
application/x-ndjson

Run It

ArchiveBox
PARSE_JSONL_URLS_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=parse_jsonl_urls 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key PARSE_JSONL_URLS_ENABLED

Enable JSON Lines URL parsing

Type: boolean
Aliases: USE_PARSE_JSONL_URLS
true

Parse DOM Outlinks

parse_dom_outlinks

Extract crawlable links from the rendered DOM and emit them as JSONL records.

Snapshot 1 hook 2 config fields
#75
Needs plugins: chrome
Needs binaries: chrome
Outputs: application/x-ndjson

Plugin Info

Required Plugins
chrome
Required Binaries
chrome
Output Mimetypes
application/x-ndjson

Run It

ArchiveBox
PARSE_DOM_OUTLINKS_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=parse_dom_outlinks 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key PARSE_DOM_OUTLINKS_ENABLED

Enable DOM outlinks parsing from archived pages

Type: boolean
Aliases: SAVE_DOM_OUTLINKS, USE_PARSE_DOM_OUTLINKS
true
Config key PARSE_DOM_OUTLINKS_TIMEOUT

Timeout for DOM outlinks parsing in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 5
30

SQLite Search

search_backend_sqlite

Index archived snapshot content into a SQLite FTS database for local search.

Snapshot 1 hook 3 config fields
#90
Outputs: application/vnd.sqlite3

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries

No binary dependencies declared.

Output Mimetypes
application/vnd.sqlite3

Run It

ArchiveBox
archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=search_backend_sqlite 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key SEARCH_BACKEND_SQLITE_DB

SQLite FTS database filename

Type: string
Aliases: SQLITEFTS_DB
"search.sqlite3"
Config key SEARCH_BACKEND_SQLITE_SEPARATE_DATABASE

Use separate database file for FTS index

Type: boolean
Aliases: FTS_SEPARATE_DATABASE, SQLITEFTS_SEPARATE_DATABASE
true
Config key SEARCH_BACKEND_SQLITE_TOKENIZERS

FTS5 tokenizer configuration

Type: string
Aliases: FTS_TOKENIZERS, SQLITEFTS_TOKENIZERS
"porter unicode61 remove_diacritics 2"

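The tokenizer string in SEARCH_BACKEND_SQLITE_TOKENIZERS is the kind of value SQLite FTS5 accepts in its tokenize option. The sketch below only builds a CREATE VIRTUAL TABLE statement to show where the value would go. The table name snapshot_fts and its columns are hypothetical; the plugin's actual schema is not shown on this page:

```shell
# Hypothetical schema: print an FTS5 CREATE statement using the configured
# tokenizer string (or the documented default when the key is unset).
fts5_create_sql() {
  tokenizers="${SEARCH_BACKEND_SQLITE_TOKENIZERS:-porter unicode61 remove_diacritics 2}"
  # Double any single quotes so the value is safe inside an SQL string literal.
  escaped="$(printf '%s' "$tokenizers" | sed "s/'/''/g")"
  printf "CREATE VIRTUAL TABLE IF NOT EXISTS snapshot_fts USING fts5(snapshot_id, content, tokenize = '%s');\n" "$escaped"
}

fts5_create_sql
# To try it with the sqlite3 CLI, if installed and built with FTS5:
#   fts5_create_sql | sqlite3 search.sqlite3
```

In the default value, porter wraps the unicode61 tokenizer to add stemming, and remove_diacritics 2 is an option passed to unicode61.
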
Sonic Search

search_backend_sonic

Index archived snapshot content into a Sonic search backend.

Snapshot 1 hook 5 config fields
#91

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries

No binary dependencies declared.

Output Mimetypes

No output mimetypes declared.

Run It

ArchiveBox
archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=search_backend_sonic 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key SEARCH_BACKEND_SONIC_HOST_NAME

Sonic server hostname

Type: string
Aliases: SEARCH_BACKEND_HOST_NAME, SONIC_HOST
"127.0.0.1"
Config key SEARCH_BACKEND_SONIC_PORT

Sonic server port

Type: integer
Aliases: SEARCH_BACKEND_PORT, SONIC_PORT
Minimum: 1
1491
Config key SEARCH_BACKEND_SONIC_PASSWORD

Sonic server password

Type: string
Aliases: SEARCH_BACKEND_PASSWORD, SONIC_PASSWORD
"SecretPassword"
Config key SEARCH_BACKEND_SONIC_COLLECTION

Sonic collection name

Type: string
Aliases: SONIC_COLLECTION
"archivebox"
Config key SEARCH_BACKEND_SONIC_BUCKET

Sonic bucket name

Type: string
Aliases: SONIC_BUCKET
"snapshots"

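The password, collection, and bucket settings correspond to fields in Sonic's text-based channel protocol. The sketch below prints the lines an ingest session would send, without opening a connection. Whether the plugin talks to Sonic in exactly this way is an assumption; sonic_ingest_commands and the object id are hypothetical:

```shell
# Hypothetical helper: print Sonic channel commands for indexing one object.
# $1 = object id, $2 = single-line text to index.
sonic_ingest_commands() {
  password="${SEARCH_BACKEND_SONIC_PASSWORD:-SecretPassword}"
  collection="${SEARCH_BACKEND_SONIC_COLLECTION:-archivebox}"
  bucket="${SEARCH_BACKEND_SONIC_BUCKET:-snapshots}"
  # Escape backslashes and double quotes, since PUSH text is double-quoted.
  text="$(printf '%s' "$2" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"
  printf 'START ingest %s\n' "$password"
  printf 'PUSH %s %s %s "%s"\n' "$collection" "$bucket" "$1" "$text"
  printf 'QUIT\n'
}

sonic_ingest_commands "snapshot-0001" "Example Domain"
```

A real session would send these lines over TCP to SEARCH_BACKEND_SONIC_HOST_NAME on SEARCH_BACKEND_SONIC_PORT and check each server reply before continuing.
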
Claude Code Cleanup

claudecodecleanup

Use Claude Code to deduplicate and clean up redundant snapshot extractor outputs.

Snapshot Embed Fullscreen 1 hook 5 config fields
#92
Needs plugins: claudecode
Needs binaries: claude
Outputs: text/plain

Plugin Info

Required Plugins
claudecode
Required Binaries
claude
Output Mimetypes
text/plain

Run It

ArchiveBox
CLAUDECODECLEANUP_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=claudecodecleanup 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key CLAUDECODECLEANUP_ENABLED

Enable Claude Code AI cleanup of snapshot files

Type: boolean
Aliases: USE_CLAUDECODECLEANUP
false
Config key CLAUDECODECLEANUP_TIMEOUT

Timeout for Claude Code cleanup in seconds

Type: integer
Fallback: CLAUDECODE_TIMEOUT
Minimum: 10
120
Config key CLAUDECODECLEANUP_PROMPT

Custom prompt for Claude Code cleanup. Defines what Claude should clean up and how to determine which duplicates to keep.

Type: string
"Analyze all the extractor output directories in this snapshot. Look for duplicate or redundant outputs across plugins (e.g. multiple HTML extractions, multiple text extractions, multiple URL extraction outputs, etc.). For each group of similar outputs, inspect the content and determine which version is the best quality. Delete the inferior/redundant versions, keeping only the best one. Also remove any unnecessary temporary files, empty directories, or incomplete outputs. Write a summary of what you cleaned up to cleanup_report.txt in your output directory."
Config key CLAUDECODECLEANUP_MODEL

Claude model to use for cleanup (e.g. sonnet, opus, haiku)

Type: string
Fallback: CLAUDECODE_MODEL
"sonnet"
Config key CLAUDECODECLEANUP_MAX_TURNS

Maximum number of agentic turns for cleanup

Type: integer
Fallback: CLAUDECODE_MAX_TURNS
Minimum: 1
15

Hashes

hashes

Generate a hash manifest for files produced in the snapshot directory.

Snapshot 1 hook 2 config fields
#93
Outputs: application/json

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries

No binary dependencies declared.

Output Mimetypes
application/json

Run It

ArchiveBox
HASHES_ENABLED=true archivebox add 'https://example.com'
abx-dl
abx-dl dl --plugins=hashes 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

Config Options

Config key HASHES_ENABLED

Enable Merkle tree hash generation

Type: boolean
Aliases: SAVE_HASHES, USE_HASHES
true
Config key HASHES_TIMEOUT

Timeout for Merkle tree generation in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 5
30

npm

npm

Install binaries from npm packages and expose Node module paths.

Crawl Binary 2 hooks 0 config fields
#00
Needs binaries: node, npm

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries
node npm
Output Mimetypes

No output mimetypes declared.

Run It

ArchiveBox
archivebox init --setup
abx-dl
abx-dl plugins --install npm

Setup plugins install dependencies or prepare shared runtime state.

Hook Scripts

Config Options

This plugin does not define a config.json schema.

Claude Code

claudecode

Run the Claude Code AI agent on snapshots to extract, analyze, or transform archived content.

Crawl Embed Fullscreen 1 hook 6 config fields
#35
Needs binaries: node, claude
Outputs: application/json

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries
node claude
Output Mimetypes
application/json

Run It

ArchiveBox
CLAUDECODE_ENABLED=true archivebox init --setup
abx-dl
abx-dl plugins --install claudecode

Setup plugins install dependencies or prepare shared runtime state.

Hook Scripts

Config Options

Config key CLAUDECODE_ENABLED

Enable Claude Code AI agent integration. Controls the crawl-time Claude binary install hook; child plugins still need the claudecode plugin installed and a working Claude binary.

Type: boolean
Aliases: USE_CLAUDECODE
false
Config key CLAUDECODE_BINARY

Path to Claude Code CLI binary

Type: string
"claude"
Config key CLAUDECODE_TIMEOUT

Timeout for Claude Code operations in seconds

Type: integer
Fallback: TIMEOUT
Minimum: 10
120
Config key ANTHROPIC_API_KEY

Anthropic API key for Claude Code authentication

Type: string
""
Config key CLAUDECODE_MODEL

Claude model to use (e.g. sonnet, opus, haiku)

Type: string
"sonnet"
Config key CLAUDECODE_MAX_TURNS

Maximum number of agentic turns per invocation

Type: integer
Minimum: 1
10

ripgrep Search

search_backend_ripgrep

Search archived snapshot files directly with ripgrep instead of maintaining an index.

Crawl 1 hook 4 config fields
#50
Needs binaries: ripgrep

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries
ripgrep
Output Mimetypes

No output mimetypes declared.

Run It

ArchiveBox
archivebox init --setup
abx-dl
abx-dl plugins --install search_backend_ripgrep

Setup plugins install dependencies or prepare shared runtime state.

Hook Scripts

Config Options

Config key RIPGREP_BINARY

Path to ripgrep binary

Type: string
"rg"
Config key RIPGREP_TIMEOUT

Search timeout in seconds

Type: integer
Fallback: TIMEOUT
Aliases: SEARCH_BACKEND_TIMEOUT
Minimum: 5
90
Config key RIPGREP_ARGS

Default ripgrep arguments

Type: array
Aliases: RIPGREP_DEFAULT_ARGS
[
  "--files-with-matches",
  "--no-messages",
  "--ignore-case"
]
Config key RIPGREP_ARGS_EXTRA

Extra arguments to append to ripgrep command

Type: array
Aliases: RIPGREP_EXTRA_ARGS
[]
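
RIPGREP_ARGS and RIPGREP_ARGS_EXTRA together shape the final ripgrep command line. The sketch below prints the command that would result, assuming the order is binary, default args, extra args, pattern, search path. That ordering, the ripgrep_command helper, and the ./archive path are assumptions for illustration; --hidden is used as the example extra flag:

```shell
# Hypothetical helper: print (not run) the composed ripgrep command.
# $1 = pattern, $2 = search path, remaining arguments = extra args.
ripgrep_command() {
  pattern="$1"
  path="$2"
  shift 2
  # Rebuild the positional parameters as: binary, defaults, extras, pattern, path.
  set -- "${RIPGREP_BINARY:-rg}" \
    --files-with-matches --no-messages --ignore-case \
    "$@" "$pattern" "$path"
  echo "$*"
}

ripgrep_command "example" ./archive --hidden
```

With the defaults, --files-with-matches makes ripgrep print only the paths of matching files, which fits a search backend that needs to know which snapshots matched rather than the matching lines themselves.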

Puppeteer

puppeteer

Install and manage Chromium through the Puppeteer toolchain.

Crawl Binary 2 hooks 0 config fields
#60
Needs binaries: puppeteer, chrome

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries
puppeteer chrome
Output Mimetypes

No output mimetypes declared.

Run It

ArchiveBox
archivebox init --setup
abx-dl
abx-dl plugins --install puppeteer

Setup plugins install dependencies or prepare shared runtime state.

Hook Scripts

Config Options

This plugin does not define a config.json schema.

uBlock Origin

ublock

Install the uBlock Origin extension to block ads, trackers, and other page clutter during archiving.

Crawl 1 hook 1 config field
#80
Needs plugins: chrome
Needs binaries: chrome

Plugin Info

Required Plugins
chrome
Required Binaries
chrome
Output Mimetypes

No output mimetypes declared.

Run It

ArchiveBox
UBLOCK_ENABLED=true archivebox init --setup
abx-dl
abx-dl plugins --install ublock

Setup plugins install dependencies or prepare shared runtime state.

Hook Scripts

Config Options

Config key UBLOCK_ENABLED

Enable uBlock Origin browser extension for ad blocking

Type: boolean
Aliases: USE_UBLOCK
true

I Still Don't Care About Cookies

istilldontcareaboutcookies

Install the I Still Don't Care About Cookies extension to dismiss cookie banners during archiving.

Crawl 1 hook 1 config field
#81
Needs plugins: chrome
Needs binaries: chrome

Plugin Info

Required Plugins
chrome
Required Binaries
chrome
Output Mimetypes

No output mimetypes declared.

Run It

ArchiveBox
ISTILLDONTCAREABOUTCOOKIES_ENABLED=true archivebox init --setup
abx-dl
abx-dl plugins --install istilldontcareaboutcookies

Setup plugins install dependencies or prepare shared runtime state.

Hook Scripts

Config Options

Config key ISTILLDONTCAREABOUTCOOKIES_ENABLED

Enable I Still Don't Care About Cookies browser extension

Type: boolean
Aliases: USE_ISTILLDONTCAREABOUTCOOKIES
true

2Captcha

twocaptcha

Install and configure the 2Captcha extension to solve CAPTCHAs during browser-based archiving.

Crawl 2 hooks 6 config fields
#95
Needs plugins: chrome
Needs binaries: chrome

Plugin Info

Required Plugins
chrome
Required Binaries
chrome
Output Mimetypes

No output mimetypes declared.

Run It

ArchiveBox
TWOCAPTCHA_ENABLED=true archivebox init --setup
abx-dl
abx-dl plugins --install twocaptcha

Setup plugins install dependencies or prepare shared runtime state.

Hook Scripts

Config Options

Config key TWOCAPTCHA_ENABLED

Enable the 2Captcha browser extension for automatic CAPTCHA solving

Type: boolean
Aliases: CAPTCHA2_ENABLED, USE_CAPTCHA2, USE_TWOCAPTCHA
true
Config key TWOCAPTCHA_API_KEY

2Captcha API key for the CAPTCHA solving service (available from https://2captcha.com)

Type: string
Aliases: API_KEY_2CAPTCHA, CAPTCHA2_API_KEY
""
Config key TWOCAPTCHA_RETRY_COUNT

Number of times to retry CAPTCHA solving on error

Type: integer
Aliases: CAPTCHA2_RETRY_COUNT
Minimum: 0
3
Config key TWOCAPTCHA_RETRY_DELAY

Delay in seconds between CAPTCHA solving retries

Type: integer
Aliases: CAPTCHA2_RETRY_DELAY
Minimum: 0
5
Config key TWOCAPTCHA_TIMEOUT

Timeout for CAPTCHA solving in seconds

Type: integer
Fallback: TIMEOUT
Aliases: CAPTCHA2_TIMEOUT
Minimum: 5
60
Config key TWOCAPTCHA_AUTO_SUBMIT

Automatically submit forms after CAPTCHA is solved

Type: boolean
false
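
TWOCAPTCHA_RETRY_COUNT and TWOCAPTCHA_RETRY_DELAY describe a retry-with-delay policy. The sketch below shows that policy as a generic shell loop, assuming "retry count" means additional attempts after the first failure. In the plugin these values presumably configure the extension itself; retry_with_delay and flaky are hypothetical names:

```shell
# Hypothetical helper: run a command, retrying on failure.
# usage: retry_with_delay COMMAND [ARGS...]
retry_with_delay() {
  retries="${TWOCAPTCHA_RETRY_COUNT:-3}"
  delay="${TWOCAPTCHA_RETRY_DELAY:-5}"
  attempt=0
  while :; do
    if "$@"; then
      return 0                      # command succeeded
    fi
    if [ "$attempt" -ge "$retries" ]; then
      return 1                      # retries exhausted
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Demo: a command that fails twice, then succeeds on its third call.
calls=0
flaky() {
  calls=$((calls + 1))
  [ "$calls" -ge 3 ]
}

TWOCAPTCHA_RETRY_DELAY=0   # no waiting in the demo
if retry_with_delay flaky; then
  echo "succeeded after $calls calls"
fi
```

With the default count of 3, a command gets up to four attempts in total under this interpretation.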

pip

pip

Install Python-based binaries into a managed virtual environment.

Binary 1 hook 0 config fields
#11
Needs binaries: python, pip

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries
python pip
Output Mimetypes

No output mimetypes declared.

Run It

ArchiveBox
archivebox init --setup
abx-dl
abx-dl plugins --install pip

Setup plugins install dependencies or prepare shared runtime state.

Hook Scripts

Config Options

This plugin does not define a config.json schema.

Homebrew

brew

Install binaries through the Homebrew package manager.

Binary 1 hook 0 config fields
#12
Needs binaries: brew

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries
brew
Output Mimetypes

No output mimetypes declared.

Run It

ArchiveBox
archivebox init --setup
abx-dl
abx-dl plugins --install brew

Setup plugins install dependencies or prepare shared runtime state.

Hook Scripts

Config Options

This plugin does not define a config.json schema.

APT

apt

Install binaries through the Debian and Ubuntu APT package manager.

Binary 1 hook 0 config fields
#13
Needs binaries: apt

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries
apt
Output Mimetypes

No output mimetypes declared.

Run It

ArchiveBox
archivebox init --setup
abx-dl
abx-dl plugins --install apt

Setup plugins install dependencies or prepare shared runtime state.

Hook Scripts

Config Options

This plugin does not define a config.json schema.

Custom

custom

Install binaries using an arbitrary custom shell command.

Binary 1 hook 0 config fields
#14

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries

No binary dependencies declared.

Output Mimetypes

No output mimetypes declared.

Run It

ArchiveBox
archivebox init --setup
abx-dl
abx-dl plugins --install custom

Setup plugins install dependencies or prepare shared runtime state.

Hook Scripts

Config Options

This plugin does not define a config.json schema.

Environment

env

Discover binaries that are already available on the system PATH.

Binary 1 hook 0 config fields
#15

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries

No binary dependencies declared.

Output Mimetypes

No output mimetypes declared.

Run It

ArchiveBox
archivebox init --setup
abx-dl
abx-dl plugins --install env

Setup plugins install dependencies or prepare shared runtime state.

Hook Scripts

Config Options

This plugin does not define a config.json schema.

Base

base

Provide shared utilities, helpers, and test support used by other plugins.

0 hooks 0 config fields

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries

No binary dependencies declared.

Output Mimetypes

No output mimetypes declared.

Run It

ArchiveBox
archivebox add 'https://example.com'
abx-dl
abx-dl plugins base

Utility plugins are typically consumed indirectly, so the example shows the closest inspection workflow.

Hook Scripts

No hook scripts are defined in this plugin directory.

Config Options

This plugin does not define a config.json schema.

Media

media

Provide a shared namespace for media-related plugin outputs and helpers.

0 hooks 0 config fields

Plugin Info

Required Plugins

No plugin dependencies declared.

Required Binaries

No binary dependencies declared.

Output Mimetypes

No output mimetypes declared.

Run It

ArchiveBox
archivebox add 'https://example.com'
abx-dl
abx-dl plugins media

Utility plugins are typically consumed indirectly, so the example shows the closest inspection workflow.

Hook Scripts

No hook scripts are defined in this plugin directory.

Config Options

This plugin does not define a config.json schema.