Advanced web scraper powered by Crawlee and Puppeteer — extracts website content, converts it to Markdown, and structures it for LLM training datasets.
FireScrape is a powerful web scraper built with Crawlee and Puppeteer. It crawls websites, extracts content, converts it into Markdown format, and structures the data — perfect for generating datasets for LLMs.
```json
{
  "title": "FireScrape Input Schema",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "startUrls": {
      "title": "Start URLs",
      "type": "array",
      "description": "List of URLs to start crawling from.",
      "editor": "requestListSources",
      "prefill": [{ "url": "https://apify.com" }]
    },
    "maxPages": {
      "title": "Maximum Pages",
      "type": "integer",
      "description": "The maximum number of pages to crawl.",
      "default": 50,
      "minimum": 1
    },
    "proxyConfig": {
      "title": "Proxy Configuration",
      "type": "object",
      "description": "Select proxy settings.",
      "editor": "proxy",
      "default": { "useApifyProxy": true }
    },
    "screenshot": {
      "title": "Take Screenshots",
      "type": "boolean",
      "description": "Enable this to capture a screenshot of each page.",
      "default": true
    },
    "enqueue": {
      "title": "Enqueue Links",
      "type": "boolean",
      "description": "Whether to follow and enqueue new links on the page.",
      "default": true
    },
    "getText": {
      "title": "Extract Text Content",
      "type": "boolean",
      "description": "Extract only the visible text content from the page.",
      "default": false
    },
    "getHtml": {
      "title": "Extract HTML Content",
      "type": "boolean",
      "description": "Extract the full HTML content of the page.",
      "default": false
    }
  },
  "required": ["startUrls"]
}
```
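To make the schema's defaults concrete, here is a minimal TypeScript sketch of how a caller might fill in omitted fields before starting a run. The `FireScrapeInput` type and the `resolveInput` helper are hypothetical names used for illustration; they are not taken from FireScrape's source, and the actor itself may resolve its input differently.

```typescript
// Hypothetical types mirroring the input schema; not part of FireScrape itself.
interface StartUrl {
  url: string;
}

interface FireScrapeInput {
  startUrls: StartUrl[];
  maxPages: number;
  proxyConfig: { useApifyProxy: boolean };
  screenshot: boolean;
  enqueue: boolean;
  getText: boolean;
  getHtml: boolean;
}

// Only startUrls is required by the schema; every other field is optional.
type RawInput = Partial<FireScrapeInput> & { startUrls: StartUrl[] };

// Values copied from the "default" entries in the schema.
const INPUT_DEFAULTS: Omit<FireScrapeInput, "startUrls"> = {
  maxPages: 50,
  proxyConfig: { useApifyProxy: true },
  screenshot: true,
  enqueue: true,
  getText: false,
  getHtml: false,
};

// Merge caller-supplied fields over the defaults and enforce the schema's
// two hard constraints: startUrls must be present, maxPages must be >= 1.
function resolveInput(raw: RawInput): FireScrapeInput {
  if (!Array.isArray(raw.startUrls)) {
    throw new Error("startUrls is required and must be an array");
  }
  const input: FireScrapeInput = { ...INPUT_DEFAULTS, ...raw };
  if (!Number.isInteger(input.maxPages) || input.maxPages < 1) {
    throw new Error("maxPages must be an integer >= 1");
  }
  return input;
}

// Usage: override two fields and let the rest fall back to the defaults.
const resolved = resolveInput({
  startUrls: [{ url: "https://apify.com" }],
  maxPages: 10,
  getText: true,
});
console.log(JSON.stringify(resolved, null, 2));
```

Note that `screenshot` defaults to `true`, so a run that only needs Markdown may want to set it to `false` explicitly to keep records small.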
Each successfully scraped page will output a structured JSON object:
```json
{
  "url": "https://example.com",
  "title": "Example Page",
  "metadata": { "description": "An example page", "keywords": ["example", "page"] },
  "markdown": "# Example Page\n\nThis is an example page content...",
  "textContent": "This is an example page content...",
  "htmlContent": "<html><body><h1>Example Page</h1>...</body></html>",
  "screenshot": "data:image/png;base64,iVBORw..."
}
```
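As one possible way to turn these records into an LLM dataset, the sketch below converts scraped pages into JSON Lines, keeping only the URL, title, and Markdown. The `ScrapedPage` type and the `toJsonl` helper are hypothetical, and the sketch assumes that `textContent`, `htmlContent`, and `screenshot` may be absent when the corresponding input options are disabled; the source does not state this.

```typescript
// Hypothetical type mirroring the output object shown above.
// Assumption: the last three fields are optional, depending on input flags.
interface ScrapedPage {
  url: string;
  title: string;
  metadata: { description?: string; keywords?: string[] };
  markdown: string;
  textContent?: string;
  htmlContent?: string;
  screenshot?: string;
}

// One training record per page: drop the heavy fields (HTML, screenshot)
// and keep the Markdown as the text body.
function toJsonlLine(page: ScrapedPage): string {
  return JSON.stringify({
    source: page.url,
    title: page.title,
    text: page.markdown,
  });
}

// Skip pages whose Markdown is empty, then join one JSON object per line.
function toJsonl(pages: ScrapedPage[]): string {
  return pages
    .filter((page) => page.markdown.trim().length > 0)
    .map(toJsonlLine)
    .join("\n");
}

// Usage with the example record from above plus one empty page.
const samplePages: ScrapedPage[] = [
  {
    url: "https://example.com",
    title: "Example Page",
    metadata: { description: "An example page", keywords: ["example", "page"] },
    markdown: "# Example Page\n\nThis is an example page content...",
  },
  {
    url: "https://example.com/empty",
    title: "Empty",
    metadata: {},
    markdown: "   ",
  },
];

const jsonl = toJsonl(samplePages);
console.log(jsonl);
```

Because `JSON.stringify` escapes newlines inside the Markdown, each record stays on a single line, which is what most JSONL readers expect.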
Feel free to extend FireScrape with additional features — like handling dynamic content, authentication, or specialized formatting.
Happy scraping! 🚀🔥
Yes, if you're scraping publicly available data for personal or internal use. Always review the website's Terms of Service before large-scale use or redistribution.
No. This is a no-code tool — just enter a job title, location, and run the scraper directly from your dashboard or Apify actor page.
It extracts job titles, companies, salaries (if available), descriptions, locations, and post dates. You can export all of it to Excel or JSON.
Yes, you can scrape multiple pages and refine by job title, location, keyword, or more depending on the input settings you use.
You can use the Try Now button on this page to go to the scraper. You’ll be guided to input a search term and get structured results. No setup needed!