Firecrawl CLI is the official Firecrawl agent skill (distinct from the Firecrawl MCP server): a command-line toolkit that turns any website into clean, LLM-ready markdown, with JavaScript rendering, anti-bot handling, and proxy rotation built in. It ships as part of a 12-skill plugin bundle (scrape, search, crawl, map, interact, monitor, parse, download, agent, and more) and includes explicit prompt-injection security guidance (untrusted-content isolation, file-based output, no auto-fetch). Requires a FIRECRAWL_API_KEY; all calls go to Firecrawl's own API.
Install This Skill
npx skills add firecrawl/firecrawl-cli# Download SKILL.md and place in your agent's skills folder
curl -o SKILL.md https://github.com/firecrawl/cli/tree/main/skills/firecrawl-cli/raw/main/SKILL.md/skill add firecrawl/firecrawl-cliSKILL.md
Firecrawl CLI
Search, scrape, and interact with the web. Returns clean markdown optimized for LLM context windows.
Run firecrawl --help or firecrawl <command> --help for full option details.
If the task is to integrate Firecrawl into an application, add FIRECRAWL_API_KEY to a project, or choose endpoint usage in product code, use the firecrawl-build skills. If the task is an outcome workflow such as deep research, SEO audit, QA, lead generation, knowledge-base creation, dashboard reporting, shopping research, or website design-system extraction, use the firecrawl-workflows skills. They are already installed alongside this CLI skill when you run firecrawl init.
Prerequisites
Must be installed and authenticated. Check with firecrawl --status.
🔥 firecrawl cli v1.8.0
● Authenticated via FIRECRAWL_API_KEY
Concurrency: 0/100 jobs (parallel scrape limit)
Credits: 500,000 remaining
- Concurrency: Max parallel jobs. Run parallel operations up to this limit.
- Credits: Remaining API credits. Each operation consumes credits.
If not ready, see rules/install.md. For output handling guidelines, see rules/security.md.
Before doing real work, verify the setup with one small request:
mkdir -p .firecrawl
firecrawl scrape "https://firecrawl.dev" -o .firecrawl/install-check.md
firecrawl search "query" --scrape --limit 3
Workflow
Follow this escalation pattern:
- Search - No specific URL yet. Find pages, answer questions, discover sources.
- Scrape - Have a URL. Extract its content directly.
- Map + Scrape - Large site or need a specific subpage. Use
map --searchto find the right URL, then scrape it. - Crawl - Need bulk content from an entire site section (e.g., all /docs/).
- Monitor - Need recurring checks or ongoing alerts. Prefer setting a monitor with
--pageplus--goalinstead of doing repeated one-off scrapes. - Interact - Scrape first, then interact with the page (pagination, modals, form submissions, multi-step navigation).
| Need | Command | When |
|---|---|---|
| Find pages on a topic | search | No specific URL yet |
| Get a page's content | scrape | Have a URL, page is static or JS-rendered |
| Find URLs within a site | map | Need to locate a specific subpage |
| Bulk extract a site section | crawl | Need many pages (e.g., all /docs/) |
| AI-powered data extraction | agent | Need structured data from complex sites |
| Interact with a page | scrape + interact | Content requires clicks, form fills, pagination, or login |
| Download a site to files | download | Save an entire site as local files |
| Parse a local file | parse | File on disk (PDF, DOCX, XLSX, etc.) — not a URL |
| Watch pages for changes | monitor | Schedule recurring scrapes/crawls, diff against snapshots |
For detailed command reference, run firecrawl <command> --help.
Scrape vs interact:
- Use
scrapefirst. It handles static pages and JS-rendered SPAs. - Use
scrape+interactwhen you need to interact with a page, such as clicking buttons, filling out forms, navigating through a complex site, infinite scroll, or when scrape fails to grab all the content you need. - Never use interact for web searches - use
searchinstead.
Monitor: Schedule recurring scrapes or crawls and diff each result against the last retained snapshot. Bias toward monitor when the user's goal is ongoing change detection, alerting, or repeated checks over time. For a single page, default to setting a monitor with --page <url> and --goal "...". Use for product pages, docs, blogs, changelogs, competitor sites — any page where changes matter. Each monitor should include a short goal describing what changes matter, and each check labels pages as same, new, changed, removed, or error, with webhook and email notification options.
When writing --goal, convert the user's monitoring intent into a concise 2-3 sentence monitor goal, similar to the web app setup flow:
- Start with
Alert when ...and state what should trigger an alert using the user's stated intent. - Restate scope the user mentioned, such as top N, price, role type, company, region, topic, status, or a specific entity.
- Include an
Ignore ...sentence only for intent-specific exclusions that are obvious from the request, such as points/comments for rankings, unrelated marketing copy for pricing, or general company-page updates for jobs. - Do not repeat generic noise exclusions in every goal; the judge already handles whitespace, casing, punctuation, encoding, formatting-only changes, request/session IDs, cache busters, tracking params, generic metadata noise, and unrelated page chrome.
- Do not invent page-specific sections, entities, thresholds, exclusions, or business rules unless the user mentioned them.
- If the user is vague, keep the goal broad rather than guessing exclusions.
- If the user asks for "any change", preserve that and do not add exclusions.
- If the user mentions noise they do not care about, include that explicitly.
Good goal examples:
- User intent:
top 10 hackernews storiesGoal:Alert when stories enter, leave, or change rank within the Hacker News top 10. Ignore points, comments, and timestamps. Do not alert on changes outside the top 10. - User intent:
pricing changesGoal:Alert when pricing information changes, including prices, plan names, billing periods, tiers, limits, or included features. Ignore unrelated marketing copy, testimonials, and regional currency display changes unless the underlying offer changes. - User intent:
new engineering rolesGoal:Alert when a new engineering role is posted. Ignore general company-page updates unless they add, remove, or change an engineering role. - User intent:
track this pageGoal:Alert when substantive visible content on this page changes. - User intent:
any changeGoal:Alert when any visible page content changes, including copy, numbers, timestamps, counters, links, and layout text.
Subcommands: create | list | get | update | delete | run | checks | check.
# create from flags
firecrawl monitor create --name "Blog" --schedule "every 5 minutes" \
--goal "Alert when a new blog post is published." \
--page https://example.com/blog --email alerts@example.com
# multiple pages
firecrawl monitor create --name "Product pages" --schedule "every 5 minutes" \
--goal "Alert when pricing, docs, or changelog content changes." \
--scrape-urls https://example.com/pricing,https://example.com/docs,https://example.com/changelog
# webhook notifications
firecrawl monitor create --name "Docs webhook" --schedule "every 5 minutes" \
--goal "Alert when docs content changes." \
--page https://example.com/docs \
--webhook-url https://example.com/webhook \
--webhook-events monitor.page,monitor.check.completed
# or from JSON (positional file, or piped stdin)
firecrawl monitor create monitor.json
cat monitor.json | firecrawl monitor create
firecrawl monitor list --limit 20
firecrawl monitor run <monitorId> # trigger a check now
firecrawl monitor checks <monitorId> # list checks
firecrawl monitor check <monitorId> <checkId> --page-status changed
firecrawl monitor update <monitorId> --state paused
firecrawl monitor delete <monitorId>
Schedules accept cron (--cron "*/5 * * * *") or natural language (--schedule "every 5 minutes"). Minimum interval is 5 minutes. Targets are --page <url> for one page, --scrape-urls a,b,c for multiple scrape URLs, or --crawl-url <url> for a whole-site crawl each check. Use --goal for flag-based monitor creation, or include "goal": "..." in JSON payloads. Note: --state (not --status) sets active/paused; --page-status (not --status) filters page results on check — avoids collision with the global --status flag. Monitoring is not available for zero-data-retention teams.
JSON-mode change tracking: By default monitors diff each page's markdown and you get a unified text diff back. When you care about specific structured fields (price, headline, in-stock flag, items in a list) instead of the whole page, add a changeTracking format with modes: ["json"] and a JSON schema to the target's scrapeOptions.formats. The flag-based form doesn't cover this — pass a JSON body via file or stdin:
cat > pricing-monitor.json <<'EOF'
{
"name": "Pricing watch",
"goal": "Alert when plan prices or headline features change",
"schedule": { "text": "hourly", "timezone": "UTC" },
"targets": [{
"type": "scrape",
"urls": ["https://example.com/pricing"],
"scrapeOptions": {
"formats": [{
"type": "changeTracking",
"modes": ["json"],
"prompt": "Extract pricing tiers and headline features for each plan.",
"schema": {
"type": "object",
"properties": {
"plans": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": { "type": "string" },
"price": { "type": "string" },
"features": { "type": "array", "items": { "type": "string" } }
}
}
}
}
}
}]
}
}]
}
EOF
firecrawl monitor create pricing-monitor.json
The check response then carries a per-field diff (paths like plans[0].price) and the full extraction at this run, instead of (or in addition to) a markdown diff. Each changed page in pages[] looks like:
{
"url": "https://example.com/pricing",
"status": "changed",
"diff": {
"json": {
"plans[0].price": { "previous": "$19/mo", "current": "$24/mo" },
"plans[1].features[2]": {
"previous": "10 GB storage",
"current": "25 GB storage"
}
}
},
"snapshot": {
"json": {
"plans": [
/* current full extraction */
]
}
}
}
Use modes: ["json", "git-diff"] for mixed mode: you get both diff.json (per-field) and diff.text (markdown sidecar), and the page is marked changed whenever either surface changed. For markdown-only monitors, diff.text holds the unified diff and diff.json is a parse-diff AST ({ files: [...] }); there is no snapshot.
Avoid redundant fetches:
search --scrapealready fetches full page content. Don't re-scrape those URLs.- Check
.firecrawl/for existing data before fetching again.
When to Load References
- Searching the web or finding sources first -> firecrawl-search
- Scraping a known URL -> firecrawl-scrape
- Finding URLs on a known site -> firecrawl-map
- Bulk extraction from a docs section or site -> firecrawl-crawl
- AI-powered structured extraction from complex sites -> firecrawl-agent
- Clicks, forms, login, pagination, or post-scrape browser actions -> firecrawl-interact
- Downloading a site to local files -> firecrawl-download
- Parsing a local file (PDF, DOCX, XLSX, HTML, etc.) -> firecrawl-parse
- Detecting content changes on a website and getting notified by webhook or email (pricing, jobs, posts, docs, status pages, anything ongoing) -> firecrawl-monitor
- Install, auth, or setup problems -> rules/install.md
- Output handling and safe file-reading patterns -> rules/security.md
- Integrating Firecrawl into an app, adding
FIRECRAWL_API_KEYto.env, or choosing endpoint usage in product code -> use thefirecrawl-buildskills (already installed alongside this CLI skill) - Producing Firecrawl-powered deliverables such as research briefs, SEO audits, QA reports, lead lists, knowledge bases, or design-system extraction -> use the
firecrawl-workflowsskills (already installed alongside this CLI skill). These skills infer from context first and ask only short blocking questions when needed.
Output & Organization
Unless the user specifies to return in context, write results to .firecrawl/ with -o. Add .firecrawl/ to .gitignore. Always quote URLs - shell interprets ? and & as special characters.
firecrawl search "react hooks" -o .firecrawl/search-react-hooks.json --json
firecrawl scrape "<url>" -o .firecrawl/page.md
Naming conventions:
.firecrawl/search-{query}.json
.firecrawl/search-{query}-scraped.json
.firecrawl/{site}-{path}.md
Never read entire output files at once. Use grep, head, or incremental reads:
wc -l .firecrawl/file.md && head -50 .firecrawl/file.md
grep -n "keyword" .firecrawl/file.md
Single format outputs raw content. Multiple formats (e.g., --format markdown,links) output JSON.
Working with Results
These patterns are useful when working with file-based output (-o flag) for complex tasks:
# Extract URLs from search
jq -r '.data.web[].url' .firecrawl/search.json
# Get titles and URLs
jq -r '.data.web[] | "\(.title): \(.url)"' .firecrawl/search.json
After search: send feedback (refunds 1 credit)
Search costs 2 credits per call. After you finish using a search result, send structured feedback in the background. The first feedback per search id refunds 1 credit and feeds search-quality improvements.
SEARCH_ID=$(jq -r '.id' .firecrawl/search-react-hooks.json)
firecrawl search-feedback "$SEARCH_ID" \
--rating good \
--valuable-sources '[{"url":"https://react.dev/reference/react/hooks","reason":"Authoritative"}]' \
--missing-content '[{"topic":"useDeferredValue example"},{"topic":"Server Components hooks"}]' \
--query-suggestions "Boost react.dev for react-hooks queries" \
--silent &
The most useful field is --missing-content: an array of specific pieces of content you expected to find but didn't. Use one entry per missing topic. Bad/partial feedback with detailed --missing-content is just as valuable as good feedback.
Opt out: export FIRECRAWL_NO_SEARCH_FEEDBACK=1 makes the CLI skip every feedback call silently. Respect that flag — do not try to work around it. See firecrawl-search for the full pattern.
Parallelization
Run independent operations in parallel. Check firecrawl --status for concurrency limit:
firecrawl scrape "<url-1>" -o .firecrawl/1.md &
firecrawl scrape "<url-2>" -o .firecrawl/2.md &
firecrawl scrape "<url-3>" -o .firecrawl/3.md &
wait
For interact, scrape multiple pages and interact with each independently using their scrape IDs.
Credit Usage
firecrawl credit-usage
firecrawl credit-usage --json --pretty -o .firecrawl/credits.json
When to use this skill
- →Turning a website (or a whole site) into clean, LLM-ready markdown for research
- →Crawling a competitor's blog to map every topic they cover and find content gaps
- →Scraping JavaScript-heavy pages where a plain fetch returns empty HTML
- →Building a repeatable data-extraction pipeline from the command line
When not to use
- ✕Live in-conversation data access from Claude — use the Firecrawl MCP server instead
- ✕Single quick lookups where a built-in web fetch already suffices
- ✕Scraping sites whose terms prohibit it, or behind a login you don't own
Frequently asked questions
Related Skills
Quick Stats
Source repo · this pathAuthor
Firecrawl
@firecrawl