This template is a production-ready boilerplate for developing with PuppeteerCrawler. PuppeteerCrawler provides a simple framework for parallel crawling of web pages using headless Chrome with Puppeteer. Since it uses headless Chrome to download web pages and extract data, it is well suited for crawling websites that require JavaScript execution.
Actor.getInput() reads the input from INPUT.json, where the start URLs are defined.
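Reading the input might be sketched like this (a minimal example; the `startUrls` field name and the empty-array fallback are assumptions based on the standard Apify input schema):

```javascript
import { Actor } from 'apify';

await Actor.init();

// Reads INPUT.json locally, or the input provided on the Apify platform.
// Falls back to an empty list when no start URLs are given.
const { startUrls = [] } = (await Actor.getInput()) ?? {};
```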
Create a configuration for the proxy servers to be used during crawling with Actor.createProxyConfiguration() to work around IP blocking. It can use Apify Proxy, or your own proxy URLs, provided and rotated according to the configuration. You can read more about proxy configuration here.
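A proxy configuration might be created like this sketch (using Apify Proxy by default; the custom proxy URL shown in the comment is a made-up example):

```javascript
// Uses Apify Proxy when running on the platform. To use your own proxies
// instead, pass e.g. { proxyUrls: ['http://user:pass@proxy.example:8000'] }.
const proxyConfiguration = await Actor.createProxyConfiguration();
```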
Create an instance of Crawlee's Puppeteer Crawler with new PuppeteerCrawler(). You can pass options to the crawler constructor as:
proxyConfiguration - provide the proxy configuration to the crawler
requestHandler - handle each request with the custom router defined in the routes.js file.
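Putting the two options together, constructing the crawler might look like this sketch (assuming the proxyConfiguration and router from the surrounding steps exist):

```javascript
import { PuppeteerCrawler } from 'crawlee';
import { router } from './routes.js';

const crawler = new PuppeteerCrawler({
    // Rotate proxies according to the configuration created above
    proxyConfiguration,
    // Delegate each request to the router defined in routes.js
    requestHandler: router,
});
```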
Handle requests with the custom router from the routes.js file. Read more about custom routing for the Puppeteer Crawler here.
Create a new router instance with createPuppeteerRouter() (it is a factory function, so no new keyword is needed)
Define a default handler that will be called for all URLs not matched by other handlers by adding router.addDefaultHandler(async (ctx) => { ... })
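A default handler that enqueues links to detail pages might be sketched as follows (the 'a.detail' selector and the 'detail' label are illustrative assumptions; adjust them to the site you are crawling):

```javascript
import { createPuppeteerRouter } from 'crawlee';

export const router = createPuppeteerRouter();

router.addDefaultHandler(async ({ enqueueLinks, log }) => {
    log.info('Enqueueing new URLs');
    // Example values: enqueue every link matching 'a.detail' and route
    // it to the handler registered under the 'detail' label.
    await enqueueLinks({ selector: 'a.detail', label: 'detail' });
});
```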
Define additional handlers - here you can add your own handling of the page
```js
router.addHandler('detail', async ({ request, page, log }) => {
    const title = await page.title();
    // You can add your own page handling here

    await Dataset.pushData({
        url: request.loadedUrl,
        title,
    });
});
```
crawler.run(startUrls); starts the crawler and waits for it to finish
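The final step might look like this sketch (assuming the crawler and startUrls from the previous steps):

```javascript
// Start the crawler with the URLs from the input and wait until it
// finishes, then gracefully shut the Actor down.
await crawler.run(startUrls);
await Actor.exit();
```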
Resources
If you're looking for examples or want to learn more visit:
You can also deploy the project to Apify from your local machine, without needing a Git repository.
Log in to Apify. You will need to provide your Apify API Token to complete this action.
apify login
Deploy your Actor. This command builds and deploys the Actor on the Apify platform. You can find your newly created Actor under Actors -> My Actors.
apify push
Documentation reference
To learn more about Apify and Actors, take a look at the following resources:
Is it legal to scrape job listings or public data?
Yes, if you're scraping publicly available data for personal or internal use. Always review the website's Terms of Service before large-scale use or redistribution.
Do I need to code to use this scraper?
No. This is a no-code tool — just enter a job title and location, then run the scraper directly from your dashboard or the Apify Actor page.
What data does it extract?
It extracts job titles, companies, salaries (if available), descriptions, locations, and post dates. You can export all of it to Excel or JSON.
Can I scrape multiple pages or filter by location?
Yes, you can scrape multiple pages and refine by job title, location, keyword, or more depending on the input settings you use.
How do I get started?
You can use the Try Now button on this page to go to the scraper. You’ll be guided to input a search term and get structured results. No setup needed!