What it does
Getting clean, LLM-ready data from the live web is a real infrastructure bottleneck for AI agents and RAG pipelines. General-purpose scrapers leave JavaScript rendering, complex markup, robots.txt handling, and multi-step interactions to you, and their output rarely lands in a shape an LLM can consume directly.
Firecrawl bundles that infrastructure into a single API. In the words of firecrawl.dev, it is “the infrastructure layer that helps AI find, read, and act on the live web.” Results come back as LLM-ready markdown or structured data from the start.
Key features — three integrated capabilities
- Search (web search): run a query and get search results, optionally extracting the content of each hit in the same call.
- Scrape (page → clean data): extract a single URL into JSON, markdown, or branding formats. JavaScript rendering and complex markup are handled automatically.
- Interact (page automation): automate clicks, typing, and navigation to reach content that static scraping cannot.
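To make the three capabilities concrete, here is a minimal sketch of the JSON request bodies they might take. The helper names are hypothetical. The field names ("query", "limit", "scrapeOptions", "formats", "actions") and the action shapes reflect our reading of the public v2 API docs and are assumptions to verify against the current reference. Mapping Interact onto the scrape endpoint's "actions" list is also an assumption; the capability may have its own endpoint.

```python
import json

# Hypothetical helpers. Field names are assumptions based on the public v2
# docs; check them against the current API reference before relying on them.


def search_payload(query: str, limit: int = 5, with_content: bool = True) -> dict:
    """Body for POST /v2/search, optionally scraping each hit in the same call."""
    body = {"query": query, "limit": limit}
    if with_content:
        # Ask for each result's page content as markdown alongside the hit.
        body["scrapeOptions"] = {"formats": ["markdown"]}
    return body


def scrape_payload(url: str, formats=("markdown",)) -> dict:
    """Body for POST /v2/scrape: one URL in, cleaned content out."""
    return {"url": url, "formats": list(formats)}


def interact_payload(url: str, selector: str, text: str) -> dict:
    """Scrape after clicking a field, typing into it, and pressing Enter.

    Assumption: page automation is expressed as an "actions" list on scrape.
    """
    body = scrape_payload(url)
    body["actions"] = [
        {"type": "click", "selector": selector},
        {"type": "write", "text": text},
        {"type": "press", "key": "ENTER"},
        {"type": "wait", "milliseconds": 2000},  # let results render
    ]
    return body


if __name__ == "__main__":
    print(json.dumps(interact_payload("https://example.com", "#search", "firecrawl"), indent=2))
```

Keeping the bodies as plain dicts makes them easy to inspect or log before sending them with curl, the SDK, or any HTTP client.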
Additional endpoints include Agent (autonomous multi-source research), Crawl (multi-page extraction with depth and page limits), Map (discover indexed URLs on a site), and Batch Scrape (parallel processing of many URLs).
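The multi-page endpoints can be sketched the same way, as (method, URL, body) triples. This is illustrative only. The paths /v2/crawl, /v2/crawl/{id}, /v2/map, and /v2/batch/scrape follow our understanding of the v2 API. The depth field name "maxDiscoveryDepth" is an assumption (v1 used "maxDepth"), as is the optional "search" filter on Map. The point the sketch makes is structural: crawl runs asynchronously, so you start a job and then poll a status URL.

```python
BASE_URL = "https://api.firecrawl.dev/v2"


def crawl_request(start_url: str, page_limit: int = 50, max_depth: int = 2):
    """Start an asynchronous crawl; the response is expected to carry a job id."""
    body = {
        "url": start_url,
        "limit": page_limit,             # stop after this many pages
        "maxDiscoveryDepth": max_depth,  # assumed v2 name for the depth cap
    }
    return "POST", f"{BASE_URL}/crawl", body


def crawl_status_url(job_id: str) -> str:
    """Where to poll (with GET) for crawl progress and results."""
    return f"{BASE_URL}/crawl/{job_id}"


def map_request(site_url: str, search=None):
    """Discover URLs on a site, optionally filtered by a search term (assumed field)."""
    body = {"url": site_url}
    if search:
        body["search"] = search
    return "POST", f"{BASE_URL}/map", body


def batch_scrape_request(urls, formats=("markdown",)):
    """Scrape many URLs in one job; duplicates are dropped, order is preserved."""
    unique = list(dict.fromkeys(urls))
    return "POST", f"{BASE_URL}/batch/scrape", {"urls": unique, "formats": list(formats)}
```

De-duplicating on the client side is a small design choice. It avoids spending credits twice on the same page when the URL list comes from a Map call merged with other sources.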
Cloud vs Open Source
| Aspect | Open Source (this repo) | Cloud (firecrawl.dev) |
|---|---|---|
| Operator | You | Firecrawl team |
| License | AGPL-3.0 (SDKs / some UI = MIT) | SaaS terms |
| Feature set | Core engine | Core engine plus cloud-only features (see the README comparison) |
| Cost | Your infrastructure costs | 1,000 credits/month free, plus paid plans |
| Data control | Full control on your own infrastructure | Routed through Firecrawl infrastructure |
| Best for | Strict data residency, cost control, or customization | Fast start without infrastructure overhead |
SDKs
| Language | Install |
|---|---|
| Python | pip install firecrawl-py |
| Node.js | npm install @mendable/firecrawl-js |
| Java | JitPack via Gradle / Maven (com.github.firecrawl:firecrawl-java-sdk:2.0) |
| Elixir | Add {:firecrawl, "~> 1.0"} to deps in mix.exs |
| Rust | Add firecrawl = "2" under [dependencies] in Cargo.toml |
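For the Python SDK, a thin wrapper that takes the client as a parameter keeps application code testable without network access. This is a sketch under stated assumptions. It assumes firecrawl-py exposes a `Firecrawl` class with a `scrape(url, formats=[...])` method whose result carries a `markdown` attribute, which matches recent SDK versions as far as we know; older versions used a different class. `fetch_markdown` and `make_sdk_client` are hypothetical names.

```python
from typing import Any, Protocol


class ScrapeClient(Protocol):
    """Anything with a scrape(url, formats=...) method, such as an SDK client."""

    def scrape(self, url: str, formats: list[str]) -> Any: ...


def fetch_markdown(client: ScrapeClient, url: str) -> str:
    """Scrape one page and return its markdown, or "" if none came back."""
    doc = client.scrape(url, formats=["markdown"])
    # Assumption: the result exposes the page text as a `markdown` attribute.
    return getattr(doc, "markdown", None) or ""


def make_sdk_client(api_key: str) -> Any:
    """Build a real client (assumed firecrawl-py `Firecrawl` class)."""
    from firecrawl import Firecrawl  # lazy import: the rest works without the SDK
    return Firecrawl(api_key=api_key)
```

With the SDK installed, usage would look like `fetch_markdown(make_sdk_client("fc-YOUR_API_KEY"), "https://firecrawl.dev")`. In tests, any object with a matching `scrape` method can stand in for the client.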
A community Go SDK is linked separately in the README.
Usage
Cloud (fastest start): generate an API key at firecrawl.dev and call the API directly.
```bash
curl -X POST 'https://api.firecrawl.dev/v2/search' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"query": "firecrawl", "limit": 5}'
```
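The same search call can be made from Python with only the standard library. This is a sketch that mirrors the curl request above: same endpoint, bearer token, and JSON body. Reading the key from a `FIRECRAWL_API_KEY` environment variable is our own convention here, not something the curl example requires. The function names are hypothetical.

```python
import json
import os
import urllib.request

SEARCH_URL = "https://api.firecrawl.dev/v2/search"


def build_search_request(query: str, limit: int, api_key: str) -> urllib.request.Request:
    """Mirror of the curl call: POST a JSON body with a bearer token."""
    body = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        SEARCH_URL,
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def search(query: str, limit: int = 5) -> dict:
    """Send the request and decode the JSON response (needs network and a key)."""
    req = build_search_request(query, limit, os.environ["FIRECRAWL_API_KEY"])
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

Splitting request construction from sending means the request can be checked offline. Only `search()` touches the network.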
Self-host: use the Docker Compose stack at the repo root.
```bash
git clone https://github.com/firecrawl/firecrawl
cd firecrawl
docker compose up
```
See SELF_HOST.md in the repo for environment setup and dependencies.
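Client code can stay identical between cloud and self-hosted deployments if the base URL is resolved from configuration. The sketch below assumes that the `FIRECRAWL_API_URL` variable (the name used for the MCP setup) is a reasonable switch for your own code too. It also assumes the compose stack serves the API on `http://localhost:3002` with the same /v2 paths as the cloud. Confirm the port and routes in SELF_HOST.md and your .env.

```python
import os

CLOUD_URL = "https://api.firecrawl.dev"
# Assumption: default port of the self-hosted API; check your .env if changed.
LOCAL_URL = "http://localhost:3002"


def resolve_api_url(env=None) -> str:
    """Prefer an explicit FIRECRAWL_API_URL, else fall back to the cloud endpoint."""
    env = os.environ if env is None else env
    url = env.get("FIRECRAWL_API_URL", "").strip()
    return (url or CLOUD_URL).rstrip("/")


def endpoint(path: str, env=None) -> str:
    """Join the resolved base URL with a v2 path such as "scrape" or "/search"."""
    return f"{resolve_api_url(env)}/v2/{path.lstrip('/')}"
```

Passing `env` explicitly is optional. It exists so the resolution logic can be exercised without mutating the real process environment.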
From Claude Code: use the Firecrawl MCP server. Pointing it at a self-hosted instance through the FIRECRAWL_API_URL environment variable keeps the cloud out of the loop entirely.
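One way to wire this up is a project-scoped MCP config. The sketch below generates one and assumes three things: the server is the `firecrawl-mcp` npm package launched via `npx`, Claude Code reads an `mcpServers` map from a project `.mcp.json`, and `FIRECRAWL_API_KEY` is needed only for the cloud. Check all three against the Firecrawl MCP and Claude Code documentation. `mcp_config` is a hypothetical helper.

```python
import json


def mcp_config(api_url=None, api_key=None) -> dict:
    """Build an mcpServers entry for the Firecrawl MCP server (assumed package)."""
    env = {}
    if api_url:
        env["FIRECRAWL_API_URL"] = api_url  # self-hosted instance
    if api_key:
        env["FIRECRAWL_API_KEY"] = api_key  # assumed to matter only for the cloud
    return {
        "mcpServers": {
            "firecrawl": {"command": "npx", "args": ["-y", "firecrawl-mcp"], "env": env}
        }
    }


if __name__ == "__main__":
    # Print a config that targets a local instance; save it as .mcp.json if it fits.
    print(json.dumps(mcp_config(api_url="http://localhost:3002"), indent=2))
```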
Notes
- AGPL-3.0 carries real obligations: review the copyleft terms before integrating the engine source into a commercial product. Calling the API as a client (via MCP or an SDK) is generally unaffected.
- SDKs and some UI components are MIT-licensed, as the README states explicitly, so a client-side integration pulls in only the MIT-licensed parts.
- robots.txt is respected by default. The README states that “Firecrawl respects robots.txt by default,” and that “It is the sole responsibility of end users to respect websites’ policies when scraping.”
- Adoption: firecrawl.dev cites over one million signups, with customers including Apple, Canva, and Lovable.
- Active maintenance: near-daily commits since the first commit in April 2024.
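Because the README places responsibility for site policies on end users, you may want a client-side pre-check before submitting a batch of URLs. The sketch below uses Python's standard `urllib.robotparser` to filter a URL list against an already-fetched robots.txt body. It is our own illustration of a pre-check and not how Firecrawl implements its built-in robots.txt handling. `allowed_urls` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt: str, urls, user_agent: str = "*") -> list:
    """Keep only the URLs that the given robots.txt permits for user_agent."""
    parser = RobotFileParser()
    # parse() accepts the file's lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]
```

For example, with the rules `User-agent: *` and `Disallow: /private/`, a list containing `https://example.com/blog/post` and `https://example.com/private/x` is reduced to the blog URL.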