Download sitemap XMLs and put them in a RequestQueue
Downloads sitemap.xml files and appends the URLs they contain to a RequestQueue of your choice.
```js
// this is your actor
Apify.main(async () => {
    const { proxyConfig } = await Apify.getInput();
    const requestQueue = await Apify.openRequestQueue();

    // this is needed so the call doesn't execute every time there's a migration
    const run = (await Apify.getValue('SITEMAP-CALL')) || { runId: '', actorId: '' };

    if (!run.runId) {
        // this might take a while!
        const runCall = await Apify.call('pocesar/sitemap-to-request-queue', {
            // required proxy configuration, like { useApifyProxy: true, apifyProxyGroups: ['SHADER'] }
            proxyConfig,
            // use this run's RequestQueue; it can also be a named one, or if you
            // leave it empty, the URLs will be placed on the remote run's RQ
            targetRQ: requestQueue.queueId,
            // required sitemaps
            startUrls: [{
                url: "http://example.com/sitemap1.xml",
                userData: {
                    label: "DETAILS" // userData will pass through
                }
            }, {
                url: "http://example.com/sitemap2.xml",
            }],
            // provide your own transform callback to filter or alter each request before it's added to the queue
            transform: ((request) => {
                if (!request.url.includes('detail')) {
                    return null;
                }

                request.userData.label = request.url.includes('/item/') ? 'DETAILS' : 'CATEGORY';

                return request;
            }).toString()
        }, { waitSecs: 0 });

        run.runId = runCall.id;
        run.actorId = runCall.actId;

        await Apify.setValue('SITEMAP-CALL', run);
    }

    // wait for the sitemap actor to finish filling the queue
    await Apify.utils.waitForRunToFinish(run);

    const crawler = new Apify.PuppeteerCrawler({
        requestQueue, // ready to use!
        // ...
    });

    await crawler.run();
});
```
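Because the whole input is serialized as JSON when the actor is called, the `transform` callback is sent as a string (hence the `.toString()`), and the sitemap actor applies it to each URL before enqueueing. If you run `pocesar/sitemap-to-request-queue` on its own from the Apify console instead of calling it from another actor, the raw INPUT would look roughly like this minimal sketch (field names mirror the call above; `my-sitemap-queue` is a hypothetical named RequestQueue, not something the actor creates for you automatically):

```json
{
    "proxyConfig": { "useApifyProxy": true },
    "targetRQ": "my-sitemap-queue",
    "startUrls": [
        { "url": "http://example.com/sitemap1.xml", "userData": { "label": "DETAILS" } },
        { "url": "http://example.com/sitemap2.xml" }
    ],
    "transform": "(request) => request.url.includes('detail') ? request : null"
}
```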
License
Apache 2.0