Firecrawl

🔥 The API to search, scrape, and interact with the web for AI. It exposes three integrated capabilities (Search, Scrape, and Interact) through one API. The engine is open source under AGPL-3.0 and self-hostable via Docker Compose, and it also powers the firecrawl.dev cloud SaaS run by the same team.

firecrawl/firecrawl · updated 2026-05-12

$ git clone https://github.com/firecrawl/firecrawl && cd firecrawl && docker compose up

What it does

Getting “clean, LLM-ready data” from the live web is a real bottleneck for AI agents and RAG pipelines. General-purpose scrapers leave JavaScript rendering, complex markup, robots.txt handling, and multi-step interactions to you, and their output rarely lands in a shape an LLM can consume directly.

Firecrawl bundles that infrastructure into one API. Quoting firecrawl.dev: “the infrastructure layer that helps AI find, read, and act on the live web.” Output comes back directly as LLM-ready markdown or structured data.

Key features — three integrated capabilities

Search — web search

Run a query and get search results, with optional content extraction for each hit in the same call.
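A minimal sketch of a search request body that also extracts each hit, assuming Firecrawl's v2 search accepts a `scrapeOptions` object containing a `formats` list (verify the field names against the API reference). `build_search_body` is a hypothetical helper; it only assembles JSON and sends nothing.

```python
import json
from typing import Optional


def build_search_body(query: str, limit: int = 5,
                      formats: Optional[list[str]] = None) -> dict:
    """Assemble a search body, optionally asking for each hit's content.

    Assumption: per-hit extraction is requested via `scrapeOptions.formats`.
    """
    body: dict = {"query": query, "limit": limit}
    if formats:
        # Ask Firecrawl to also fetch every result page in these formats.
        body["scrapeOptions"] = {"formats": formats}
    return body


print(json.dumps(build_search_body("firecrawl")))
print(json.dumps(build_search_body("firecrawl", limit=3, formats=["markdown"])))
```

Leaving `formats` out keeps the call a plain search, which is cheaper when you only need titles and URLs.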
Scrape — page → clean data

Extract a single URL's content in JSON, markdown, or branding format. JavaScript rendering and complex markup are handled automatically.
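A standard-library sketch of a single-page scrape against the cloud API, assuming the endpoint is `POST /v2/scrape` with a JSON body of `url` plus a `formats` list (check the API reference before relying on these names). `build_scrape_request` is hypothetical and only builds the request; nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import json
import urllib.request

API_BASE = "https://api.firecrawl.dev"  # cloud base URL used in the Usage section


def build_scrape_request(api_key: str, url: str,
                         formats: list[str]) -> urllib.request.Request:
    """Build (but do not send) a scrape request for one URL.

    Assumption: the v2 scrape body takes `url` and `formats`.
    """
    body = {"url": url, "formats": formats}
    return urllib.request.Request(
        f"{API_BASE}/v2/scrape",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_scrape_request("fc-YOUR_API_KEY", "https://example.com", ["markdown"])
# To actually send it (network call):
# with urllib.request.urlopen(req, timeout=60) as resp:
#     result = json.load(resp)
```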
Interact — page automation

Automate clicks, typing, and navigation to reach content that static scraping cannot.
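Firecrawl has exposed page interaction as an `actions` list attached to a scrape request, with steps such as `wait`, `click`, `write`, and `press`. The sketch below assumes that shape; the dedicated Interact capability may use a different surface, so treat every field name as an assumption. The `#search` selector is hypothetical.

```python
import json


def build_interactive_scrape_body(url: str, query: str) -> dict:
    """Scrape body that types a query into a search box before scraping.

    Assumptions: action objects use `type` plus `selector` / `text` /
    `key` / `milliseconds`; "#search" is a hypothetical selector.
    """
    actions = [
        {"type": "wait", "milliseconds": 1000},    # let the page settle
        {"type": "click", "selector": "#search"},  # focus the search box
        {"type": "write", "text": query},          # type the query
        {"type": "press", "key": "ENTER"},         # submit the form
        {"type": "wait", "milliseconds": 2000},    # give results time to render
    ]
    return {"url": url, "formats": ["markdown"], "actions": actions}


body = build_interactive_scrape_body("https://example.com", "firecrawl")
print(json.dumps(body, indent=2))
```

Explicit `wait` steps are a blunt but simple way to cope with pages that render results asynchronously.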

Additional endpoints include Agent (autonomous multi-source research), Crawl (multi-page extraction with depth and page limits), Map (discover indexed URLs on a site), and Batch Scrape (parallel processing of many URLs).
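A rough sketch of how the request bodies for Crawl, Map, and Batch Scrape differ, assuming the v2 paths `/v2/crawl`, `/v2/map`, and `/v2/batch/scrape` and the body fields shown (all unverified here). The crawl depth option is left out because its exact name should be checked in the reference. The helper names are hypothetical.

```python
import json
from typing import Optional

# Assumed v2 paths; verify against the API reference.
ENDPOINTS = {
    "crawl": "/v2/crawl",
    "map": "/v2/map",
    "batch_scrape": "/v2/batch/scrape",
}


def crawl_body(url: str, limit: int) -> dict:
    # Crawl: start at `url`, follow links, stop after `limit` pages.
    return {"url": url, "limit": limit}


def map_body(url: str) -> dict:
    # Map: list the URLs Firecrawl can discover on the site.
    return {"url": url}


def batch_scrape_body(urls: list[str], formats: Optional[list[str]] = None) -> dict:
    # Batch Scrape: many URLs processed in parallel with shared options.
    return {"urls": urls, "formats": formats or ["markdown"]}


for name, body in [
    ("crawl", crawl_body("https://example.com", limit=50)),
    ("map", map_body("https://example.com")),
    ("batch_scrape", batch_scrape_body(["https://example.com/a", "https://example.com/b"])),
]:
    print(ENDPOINTS[name], json.dumps(body))
```

A common pattern is to call Map first, filter the returned URLs yourself, and then pass the subset to Batch Scrape instead of crawling the whole site.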

Cloud vs Open Source

Aspect	Open Source (this repo)	Cloud (firecrawl.dev)
Operator	You	Firecrawl team
License	AGPL-3.0 (SDKs / some UI = MIT)	SaaS terms
Extra features	Core engine	Additional cloud-only features (see README comparison)
Cost	Your infra cost	1,000 credits/month free + paid plans
Data control	Stays on your own infrastructure	Routed through Firecrawl infrastructure
Best for	Strict data residency, cost or customization control	Fast start without infrastructure overhead

SDKs

Language	Install
Python	`pip install firecrawl-py`
Node.js	`npm install @mendable/firecrawl-js`
Java	JitPack via Gradle / Maven (`com.github.firecrawl:firecrawl-java-sdk:2.0`)
Elixir	`{:firecrawl, "~> 1.0"}`
Rust	`firecrawl = "2"`

A community Go SDK is linked separately in the README.

Usage

Cloud (fastest start) — generate an API key at firecrawl.dev and call directly.

curl -X POST 'https://api.firecrawl.dev/v2/search' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"query": "firecrawl", "limit": 5}'
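The same search call from Python with only the standard library, mirroring the curl command field for field. Reading the key from `FIRECRAWL_API_KEY` is this sketch's choice (it is the variable name the MCP server uses for the cloud); `build_search_request` and `search` are hypothetical helpers.

```python
import json
import os
import urllib.request
from typing import Optional

API_BASE = "https://api.firecrawl.dev"


def build_search_request(query: str, limit: int = 5,
                         api_key: Optional[str] = None) -> urllib.request.Request:
    """Build the same POST /v2/search request as the curl example."""
    # Fall back to the environment variable; raises KeyError if it is unset.
    key = api_key if api_key is not None else os.environ["FIRECRAWL_API_KEY"]
    payload = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v2/search",
        data=payload,
        headers={"Authorization": f"Bearer {key}", "Content-Type": "application/json"},
        method="POST",
    )


def search(query: str, limit: int = 5) -> dict:
    """Send the request and decode the JSON response (network call)."""
    with urllib.request.urlopen(build_search_request(query, limit), timeout=60) as resp:
        return json.load(resp)


if __name__ == "__main__" and "FIRECRAWL_API_KEY" in os.environ:
    print(search("firecrawl"))
```

Separating request construction from sending keeps the helper testable without network access.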

Self-host — use the docker-compose stack at the repo root.

git clone https://github.com/firecrawl/firecrawl
cd firecrawl
docker compose up

See SELF_HOST.md in the repo for environment setup and dependencies.

From Claude Code — use the Firecrawl MCP. Point it at a self-hosted instance via FIRECRAWL_API_URL to keep the cloud out of the loop entirely.
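Your own client code can use the same cloud-or-self-hosted switch. A minimal sketch, assuming you mirror the MCP server's convention: use `FIRECRAWL_API_URL` when it is set, otherwise fall back to the cloud API. `resolve_base_url` is hypothetical, and `http://localhost:3002` is an assumed address for a local compose stack; use whatever your deployment actually exposes.

```python
import os
from collections.abc import Mapping
from typing import Optional

CLOUD_BASE_URL = "https://api.firecrawl.dev"


def resolve_base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Prefer a self-hosted FIRECRAWL_API_URL; otherwise use the cloud API."""
    source = os.environ if env is None else env
    url = source.get("FIRECRAWL_API_URL", "").strip()
    # Drop trailing slashes so paths such as "/v2/scrape" join cleanly.
    return url.rstrip("/") if url else CLOUD_BASE_URL


print(resolve_base_url({"FIRECRAWL_API_URL": "http://localhost:3002/"}))  # http://localhost:3002
print(resolve_base_url({}))  # https://api.firecrawl.dev
```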

Notes

AGPL-3.0 has real obligations — review the copyleft terms before integrating the engine source into a commercial product. Simply calling the API as a client (via MCP or SDK) is generally unaffected.
SDKs and some UI components are MIT: the README says so explicitly, and client-side integrations use only these MIT-licensed parts.
robots.txt respected by default — README quote: “Firecrawl respects robots.txt by default,” and: “It is the sole responsibility of end users to respect websites’ policies when scraping.”
Adoption — firecrawl.dev cites over one million signups and customers including Apple, Canva, and Lovable.
Actively maintained — near-daily commits since the first commit in April 2024.

Frequently Asked Questions

What is Firecrawl?

Quoting the README: "The API to search, scrape, and interact with the web for AI." It is a full-stack backend service, written in TypeScript, Python, Rust, and Java, that powers the [firecrawl.dev](https://firecrawl.dev) cloud SaaS and is also open source under AGPL-3.0 for anyone to self-host.

Is it open source? What's the license?

Yes — published on GitHub under AGPL-3.0. From the README: "This project is primarily licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). The SDKs and some UI components are licensed under the MIT License." The core engine is AGPL; SDKs and some UI components are MIT.

How does it relate to firecrawl.dev?

firecrawl.dev is the cloud SaaS run by the same Firecrawl team: a hosted version of this engine with additional cloud-only features (see the README's "Open Source vs Cloud" comparison). The free plan includes 1,000 credits per month.

How do I self-host it?

Use the `docker-compose.yaml` in the repo root and follow the `SELF_HOST.md` guide. It runs as a multi-container stack with dependencies such as Redis. That is more than a single `docker run`, but lighter than deploying the infrastructure on bare metal.

Which SDKs are available?

Officially supported: Python (`firecrawl-py`), Node.js (`@mendable/firecrawl-js`), Java (Gradle/Maven via JitPack), Elixir (`firecrawl`), and Rust (`firecrawl`). A community Go SDK is also linked in the README.

How do I use it from Claude Code?

Through the [Firecrawl MCP](/en/tools/firecrawl-mcp/). The MCP server can target either the cloud (`FIRECRAWL_API_KEY`) or a self-hosted instance (`FIRECRAWL_API_URL`), so you can use your own deployment from inside Claude Code as well.