AI agents are only as useful as the context they can reach. Peter Steinberger, known online as steipete, just dropped a collection of open-source tools that solve exactly this problem: a “crawl army” of CLIs that mirror your app history into local SQLite databases, making it searchable, queryable, and - most importantly - readable by agents.
AI agents forget everything between sessions. The crawl army fixes this by mirroring your data from walled gardens (Discord, Slack, WhatsApp, Notion, Twitter, Google) into local-first SQLite databases that agents can actually query.
Where we’re headed
Every AI agent you’ve ever used has the same problem: it starts from zero each time. No memory of your Slack conversations, your Discord decisions, your Notion docs, your email threads. The crawl army is a set of seven open-source CLIs built by core OpenClaw contributors that copy your data out of siloed platforms and into local SQLite databases with full-text search. The result: agents that can actually remember what matters.
The story
The post on r/myclaw shares seven tools built by core OpenClaw contributors. Their shared philosophy is simple: your data lives inside walled gardens, and AI agents can’t reach it. Each tool breaks down one of those walls, copying data into a local-first, offline-capable SQLite archive with full-text search built in.
The name “crawl army” is fitting: each tool is a specialist soldier, and together they form a coordinated unit. They share a common architecture (Go CLI + SQLite + FTS5), a common interface pattern (sync, search, status, doctor commands), and a common goal: making your digital history agent-readable without uploading anything to the cloud.
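To make the shared SQLite + FTS5 pattern concrete, here is a minimal sketch in Python using only the standard library. The table name, columns, and sample rows are assumptions made for illustration; the actual tools define their own schemas, and this is not their code.

```python
import sqlite3

def build_archive(conn, messages):
    """Create a full-text index and load (channel, author, body) rows into it."""
    # Hypothetical schema: each real crawler defines its own tables.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts "
        "USING fts5(channel, author, body)"
    )
    conn.executemany(
        "INSERT INTO messages_fts (channel, author, body) VALUES (?, ?, ?)",
        messages,
    )
    conn.commit()

def search(conn, query, limit=10):
    """Return matching rows, best match first."""
    # bm25() gives numerically smaller values to better matches,
    # so the default ascending order ranks the best hits first.
    rows = conn.execute(
        "SELECT channel, author, body FROM messages_fts "
        "WHERE messages_fts MATCH ? ORDER BY bm25(messages_fts) LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()

conn = sqlite3.connect(":memory:")
build_archive(conn, [
    ("#api", "dana", "We decided to version the API redesign under /v2"),
    ("#random", "lee", "Lunch at noon?"),
    ("#api", "sam", "The redesign ships after the release notes are done"),
])
hits = search(conn, "redesign")
for channel, author, body in hits:
    print(channel, author, body)
```

Because the index is an ordinary SQLite file, an agent needs nothing beyond a SQLite client to read it, which is presumably the point of the design.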
This is part of a broader trend in the AI agent ecosystem. Tools like ContextMesh and MemoAIr are building persistent memory layers. Projects like Claw Recall are creating searchable conversation archives. But the crawl army takes a different, more pragmatic approach: instead of building new memory systems, it makes the data you already have accessible. Your Discord history, Slack messages, WhatsApp chats, Notion docs, Twitter bookmarks, and Google data all already exist. The crawl army just makes them agent-readable.
The key insight is local-first. All data stays on your machine in SQLite: no cloud uploads, no third-party services, and no credentials leaving your machine. You own your archive. The Git-backed sharing mechanism lets teams distribute read-only snapshots without exposing credentials. This is a fundamentally different trust model from “upload everything to an AI service.”
Who built this?
Peter Steinberger (steipete) is the driving force behind the crawl army. He’s the founder of PSPDFKit (now Nutrient, backed by Insight Partners since 2021), a major iOS/PDF SDK used by companies like Disney, IBM, and SAP. After PSPDFKit, he became the “Clawdfather” at OpenClaw, and in February 2026 he announced he was joining OpenAI to work on bringing agents to everyone. Based between Vienna and London, he’s a prolific open-source contributor with 48K+ GitHub followers.
Vincent Koc is an AI Research Engineer at Comet ML, an OpenClaw maintainer, and a lecturer at MIT. His career spans Qantas, Airbyte, and Microsoft. He brings deep experience in data pipelines and integrations, which shows in the clean, API-aware design of slacrawl and notcrawl.
Felix Krause (KrauseFx) is best known as the creator of fastlane, the iOS/Android deployment automation tool used by hundreds of thousands of developers (now maintained by Google). He’s the founder of ContextSDK, based in Vienna. His beeper-cli contribution brings the same developer-first philosophy: zero-config, read-only, and immediately useful.
Real-world scenarios
- The onboarding agent: A new developer joins your team. Instead of spending a week reading Slack history and Notion docs, your AI agent has already indexed everything via slacrawl and notcrawl. Ask “what was the decision about the API redesign?” and get an instant, sourced answer.
- The compliance search: Your company needs every mention of a specific client across years of Discord, Slack, and WhatsApp conversations. Run discrawl search, slacrawl search, and wacrawl search, then cross-reference.
- The personal CRM: With gog, your agent searches your entire Gmail history, Calendar, and Contacts. “When did I last email Sarah from Acme Corp?” becomes a simple SQL query.
- The Twitter triage: You wake up to 200 mentions. birdclaw’s AI-ranked inbox filters the noise, showing high-signal interactions first. Your agent can even draft replies based on your past communication style.
- The cross-platform research: You vaguely remember discussing a feature idea across Discord, Slack, and a Notion page, but can’t recall where. The crawl army’s consistent SQLite + FTS5 pattern lets you search all three at once.
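The cross-platform scenario can be sketched as one query fanned out over several archive files. This assumes each crawler keeps its own database file and that all of them expose a comparable full-text table. The `docs` table, the file names, and the sample rows below are hypothetical stand-ins, not the tools' real schemas.

```python
import os
import sqlite3
import tempfile

def make_archive(path, rows):
    """Create a toy archive; the table name and columns are hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
    conn.executemany("INSERT INTO docs (title, body) VALUES (?, ?)", rows)
    conn.commit()
    conn.close()

def search_all(archives, query):
    """Run one FTS5 query against every archive and tag each hit with its source."""
    results = []
    for source, path in archives.items():
        conn = sqlite3.connect(path)
        try:
            for title, body in conn.execute(
                "SELECT title, body FROM docs WHERE docs MATCH ?", (query,)
            ):
                results.append((source, title, body))
        finally:
            conn.close()
    return results

workdir = tempfile.mkdtemp()
archives = {
    "discord": os.path.join(workdir, "discord.db"),
    "slack": os.path.join(workdir, "slack.db"),
    "notion": os.path.join(workdir, "notion.db"),
}
make_archive(archives["discord"], [("#design", "offline mode should cache the last sync")])
make_archive(archives["slack"], [("#general", "standup moved to ten")])
make_archive(archives["notion"], [("Roadmap", "Q3: ship offline mode")])

found = search_all(archives, "offline")
for source, title, body in found:
    print(source, title, body)
```

If the real schemas differ per tool, the same idea still works with one query string per source instead of a single shared one.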
The tools
From here on, we get into the technical details of each crawler. If you just care about the concept and not the implementation, you can skip to the takeaway.
discrawl - Discord History Crawler
Built by steipete, discrawl mirrors Discord guild data into local SQLite. It syncs channels, threads, members, and full message history via the Discord bot API. It also has a “wiretap” mode that reads the local Discord Desktop cache, letting you recover DMs and local-only conversations without needing a user token. It supports Git-backed archive publishing so teams can share org memory without distributing bot credentials.
slacrawl - Slack History Crawler
Vincent Koc’s contribution, slacrawl mirrors Slack workspaces into SQLite with FTS5 search. It supports three modes:
- API sync for full history
- Socket Mode for live tailing
- Desktop cache ingestion for local “wiretap” inspection
Like discrawl, it supports Git-backed snapshots for org-wide read access. Thread reply backfill and DM sync are available when a user token is provided.
wacrawl - WhatsApp Archaeology
Also by steipete, wacrawl takes a different approach from the API-based crawlers. It reads the local WhatsApp Desktop SQLite databases on macOS, copies them into a temporary snapshot, and imports chat data into its own archive. It’s read-only: it doesn’t send messages, decrypt backups, or touch the network. You get wacrawl search "release notes", wacrawl chats, and --json output for agent integration.
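The snapshot-then-read approach described above can be illustrated with Python's standard library. This is a sketch of the general technique, not wacrawl's implementation: the `chats` table and file names are invented for the demo, and the real WhatsApp Desktop schema is not shown here.

```python
import os
import sqlite3
import tempfile
from pathlib import Path

def snapshot(source_path, snapshot_path):
    """Copy a SQLite database into a separate file without writing to the source."""
    # mode=ro opens the source read-only, so this connection cannot modify it.
    uri = Path(source_path).resolve().as_uri() + "?mode=ro"
    src = sqlite3.connect(uri, uri=True)
    dst = sqlite3.connect(snapshot_path)
    try:
        src.backup(dst)  # SQLite's online backup API copies a consistent view
    finally:
        dst.close()
        src.close()
    return snapshot_path

# Stand-in for an application's own database (hypothetical schema).
workdir = tempfile.mkdtemp()
app_db = os.path.join(workdir, "app.db")
seed = sqlite3.connect(app_db)
seed.execute("CREATE TABLE chats (name TEXT)")
seed.execute("INSERT INTO chats VALUES ('Release notes draft')")
seed.commit()
seed.close()

# All further reads go to the snapshot, never to the live file.
snap = snapshot(app_db, os.path.join(workdir, "snapshot.db"))
reader = sqlite3.connect(snap)
rows = reader.execute("SELECT name FROM chats").fetchall()
reader.close()
print(rows)
```

Working from a copy means the importer never holds a lock on the database the desktop app is actively using.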
notcrawl - Notion Workspace Crawler
Vincent Koc’s second entry, notcrawl mirrors Notion workspaces into SQLite and normalized Markdown. It has two ingestion paths: local desktop cache (read-only) and the official Notion API. The dual output is clever: SQLite for machines, Markdown for humans and agents. It supports database metadata, CSV/TSV export, and Git-friendly compressed JSONL snapshots for sharing.
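As an illustration of what a compressed JSONL snapshot might look like, the sketch below dumps a table to gzip-compressed JSON Lines and reads it back. The `pages` table, its columns, and the file name are assumptions; notcrawl's actual snapshot layout is not described in the post.

```python
import gzip
import json
import os
import sqlite3
import tempfile

def export_jsonl(conn, table, out_path):
    """Write every row of `table` as one JSON object per line, gzip-compressed."""
    # `table` is interpolated into SQL, so it must come from trusted code, not user input.
    cur = conn.execute(f'SELECT * FROM "{table}" ORDER BY rowid')
    columns = [c[0] for c in cur.description]
    count = 0
    with gzip.open(out_path, "wt", encoding="utf-8") as fh:
        for row in cur:
            # Stable row order and sorted keys keep repeated exports of the same data identical.
            fh.write(json.dumps(dict(zip(columns, row)), sort_keys=True) + "\n")
            count += 1
    return count

def import_jsonl(path):
    """Read a gzip-compressed JSONL file back into a list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id TEXT, title TEXT)")  # hypothetical schema
conn.executemany("INSERT INTO pages VALUES (?, ?)", [("p1", "Roadmap"), ("p2", "Onboarding")])

out_path = os.path.join(tempfile.mkdtemp(), "pages.jsonl.gz")
count = export_jsonl(conn, "pages", out_path)
restored = import_jsonl(out_path)
print(count, restored)
```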
beeper-cli - Beeper Chat History
Felix Krause’s entry reads the local Beeper SQLite database (which already ships with FTS5). It provides thread listing, message browsing, and full-text search with proximity operators (e.g., party NEAR/5 christmas). JSON output makes it agent-friendly, and there’s built-in DM name resolution via platform bridge databases.
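A note on the proximity syntax: the infix form `party NEAR/5 christmas` is the FTS3/FTS4 spelling, while FTS5's native query language writes the same idea as `NEAR("party" "christmas", 5)`. How beeper-cli maps one to the other is not stated in the post, so the sketch below only shows the FTS5 form against a made-up `msgs` table.

```python
import sqlite3

def near_query(terms, distance):
    """Build an FTS5 NEAR group from a list of terms."""
    # Each term is double-quoted; embedded quotes are escaped by doubling them.
    quoted = " ".join('"' + t.replace('"', '""') + '"' for t in terms)
    return f"NEAR({quoted}, {int(distance)})"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE msgs USING fts5(body)")  # hypothetical table
conn.executemany("INSERT INTO msgs (body) VALUES (?)", [
    ("party planning for the christmas week",),
    ("the party was loud and long and nobody mentioned anything about christmas",),
    ("christmas shopping list",),
])

# Matches only rows where at most 5 tokens separate the two terms.
near_hits = conn.execute(
    "SELECT body FROM msgs WHERE msgs MATCH ?",
    (near_query(["party", "christmas"], 5),),
).fetchall()
print(near_hits)
```

The second row contains both words but nine tokens apart, so it is excluded; the third row lacks "party" entirely.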
birdclaw - Twitter/X Workspace
Steipete’s most ambitious crawler, birdclaw is a full local Twitter workspace: not just archive import, but cached live reads, a web UI for triage, reply flows, and FTS5 search across tweets and DMs. It features:
- AI-ranked inbox for mentions, with an OpenAI scoring hook for low-signal filtering
- Block/mute management
- Git-friendly text backups
- Posting and DM replies - making it both a reader and a writer
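The idea of a pluggable scoring hook can be sketched as a function that assigns each mention a score, with low scorers filtered out before ranking. Everything here is hypothetical: the `Mention` type, the threshold, and the toy `keyword_scorer` stand in for whatever model call birdclaw actually makes, which the post does not detail.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass
class Mention:
    author: str
    text: str

def keyword_scorer(mention: Mention) -> float:
    """Toy stand-in for a model-backed scorer: favors questions and longer messages."""
    score = 0.2
    if "?" in mention.text:
        score += 0.5
    if len(mention.text.split()) >= 8:
        score += 0.3
    return score

def rank_inbox(
    mentions: Iterable[Mention],
    scorer: Callable[[Mention], float],
    min_score: float = 0.5,
) -> List[Mention]:
    """Drop mentions scoring below `min_score`, then sort the rest highest first."""
    scored = [(scorer(m), m) for m in mentions]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in kept]

inbox = rank_inbox(
    [
        Mention("bot", "gm"),
        Mention("raj", "Is there a Linux build?"),
        Mention("ana", "How does wiretap mode handle DMs when the cache is stale?"),
    ],
    keyword_scorer,
)
for m in inbox:
    print(m.author, m.text)
```

Keeping the scorer as a plain callable means a model-backed implementation could be swapped in without touching the ranking logic.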
gog - Google Suite CLI
The most expansive tool, gog is a CLI for the entire Google ecosystem: Gmail, Calendar, Drive, Docs, Sheets, Slides, Contacts, Tasks, Forms, Chat, Classroom, and even Keep. It handles search, send, upload, download, convert (including Markdown to Google Doc), label management, delegation, and encrypted backups via age. If your digital life runs on Google, gog makes it all queryable from the terminal.
The takeaway
Key points:
- AI agents are amnesiacs by default - the crawl army gives them access to the context that makes them useful
- Seven tools, one architecture: Go CLI + SQLite + FTS5, local-first, offline-capable, no cloud uploads
- The builders are core OpenClaw contributors, including its founder (steipete) - agent memory is a first-class concern in that ecosystem
The crawl army isn’t building new memory systems. It’s making the data you already have accessible to the agents that need it. One app at a time, one SQLite database at a time.