Sitemap to RequestQueue

Download sitemap XMLs and put them in a RequestQueue

Downloads sitemap.xml files and appends their URLs to a RequestQueue of your choice.
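A minimal sketch of the actor input as JSON. The field names mirror the code example below; the `targetRQ` value is a placeholder and the exact input schema may differ:

```json
{
  "proxyConfig": { "useApifyProxy": true },
  "targetRQ": "my-request-queue-id",
  "startUrls": [
    { "url": "http://example.com/sitemap1.xml", "userData": { "label": "DETAILS" } },
    { "url": "http://example.com/sitemap2.xml" }
  ],
  "transform": "(request) => request.url.includes('detail') ? request : null"
}
```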

Example

const Apify = require('apify');

// this is your actor
Apify.main(async () => {
  const { proxyConfig } = await Apify.getInput();
  const requestQueue = await Apify.openRequestQueue();

  // persist the call state so the actor isn't called again after every migration
  const run = (await Apify.getValue('SITEMAP-CALL')) || { runId: '', actorId: '' };

  if (!run.runId) {
    // this might take a while!
    const runCall = await Apify.call('pocesar/sitemap-to-request-queue', {
      // required proxy configuration, e.g. { useApifyProxy: true, apifyProxyGroups: ['SHADER'] }
      proxyConfig,
      // use this run's RequestQueue; it can also be a named one, or if you
      // leave it empty, the requests are placed in the remote run's RQ
      targetRQ: requestQueue.queueId,
      // required sitemaps
      startUrls: [{
        url: "http://example.com/sitemap1.xml",
        userData: {
          label: "DETAILS" // userData is passed through
        }
      }, {
        url: "http://example.com/sitemap2.xml",
      }],
      // provide your own transform callback to filter or alter each request before it is enqueued
      transform: ((request) => {
        if (!request.url.includes('detail')) {
          return null;
        }

        request.userData.label = request.url.includes('/item/') ? 'DETAILS' : 'CATEGORY';

        return request;
      }).toString()
    }, { waitSecs: 0 });

    run.runId = runCall.id;
    run.actorId = runCall.actId;

    await Apify.setValue('SITEMAP-CALL', run);
  }

  await Apify.utils.waitForRunToFinish(run);

  const crawler = new Apify.PuppeteerCrawler({
    requestQueue, // ready to use!
    //...
  });

  await crawler.run();
});
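Because the transform callback is serialized with `.toString()` and evaluated remotely, it must be self-contained (no closures over outer variables). That also means it can be developed and unit-tested locally as a plain function first. A sketch, assuming requests have the `{ url, userData }` shape shown above:

```javascript
// Standalone version of the transform callback from the example.
// Returns null to drop a request, or the (labeled) request to enqueue it.
function transform(request) {
  // drop anything that is not a detail-ish URL
  if (!request.url.includes('detail')) {
    return null;
  }

  // label the request so the crawler can route it later
  request.userData.label = request.url.includes('/item/') ? 'DETAILS' : 'CATEGORY';

  return request;
}
```

Once it behaves as expected, pass `transform.toString()` in the actor input.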

License

Apache 2.0
