Scrape any API / JSON URL directly to the dataset and return the results in CSV, XML, HTML, or Excel format. Transform and filter the output. Lets you follow pagination recursively from the payload, without ever needing to visit an HTML page.
This scraper differs from cheerio-scraper in that you can handle errors before the handlePageFunction fails. Using the handleError input, you can enqueue extra requests before failing, allowing you to recover or try a different URL:
```js
{
  handleError: async ({ addRequest, request, response, error }) => {
    request.noRetry = error.message.includes('Unexpected') || response.statusCode === 404;

    addRequest({
      url: `${request.url}?retry=true`,
    });
  }
}
```
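In this example, retries are disabled (request.noRetry) when the body fails to parse as JSON (the 'Unexpected token' message thrown by JSON.parse) or when the server answered with a 404, and a fallback request carrying a ?retry=true query string is enqueued either way.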
This function can filter, map, and enqueue requests at the same time. The difference is that the userData from the current request is passed on to the next request.
```js
const startUrls = [{
  url: "https://example.com",
  userData: {
    firstValue: 0,
  }
}];

// assuming the startUrls above are passed as part of the INPUT
await Apify.call('pocesar/json-downloader', {
  startUrls,
  filterMap: async ({ request, addRequest, data }) => {
    if (request.userData.isPost) {
      // userData is inherited from the previous request,
      // so request.userData.firstValue is still 0 here

      // return the data only after the POST request
      return data;
    } else {
      // add the same request, but as a POST
      addRequest({
        url: `${request.url}/?method=post`,
        method: 'POST',
        payload: {
          username: 'username',
          password: 'password',
        },
        headers: {
          'Content-Type': 'application/json',
        },
        userData: {
          isPost: true
        }
      });
      // omitting the return, or returning a falsy value, skips the output
    }
  },
});
```
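The filterMap context also exposes a flattenObjectKeys helper that turns deeply nested objects into flat, dot-notation keys: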
```js
{
  filterMap: async ({ flattenObjectKeys, data }) => {
    return flattenObjectKeys(data);
  }
}

/**
 * an object like
 * {
 *   "deep": {
 *     "nested": ["state", "state1"]
 *   }
 * }
 *
 * becomes
 * {
 *   "deep.nested.0": "state",
 *   "deep.nested.1": "state1"
 * }
 */
```
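Flattening is especially handy when exporting to CSV or Excel, where each dot-notation key becomes its own column.

Start URLs can also issue POST requests with a raw string payload. For example, the following INPUT queries an Algolia search endpoint directly: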
1{ 2 "startUrls": [ 3 { 4 "url": "https://ow0o5i3qo7-dsn.algolia.net/1/indexes/prod_PUBLIC_STORE/query?x-algolia-agent=Algolia%20for%20JavaScript%20(4.13.0)%3B%20Browser%20(lite)&x-algolia-api-key=0ecccd09f50396a4dbbe5dbfb17f4525&x-algolia-application-id=OW0O5I3QO7", 5 "method": "POST", 6 "payload": "{\"query\":\"instagram\",\"page\":0,\"hitsPerPage\":24,\"restrictSearchableAttributes\":[],\"attributesToHighlight\":[],\"attributesToRetrieve\":[\"title\",\"name\",\"username\",\"userFullName\",\"stats\",\"description\",\"pictureUrl\",\"userPictureUrl\",\"notice\",\"currentPricingInfo\"]}", 7 "headers": { 8 "content-type": "application/x-www-form-urlencoded" 9 } 10 } 11 ] 12}
```js
{
  filterMap: async ({ addRequest, request, data }) => {
    // pages are zero-indexed, so the last page is nbPages - 1
    if (data.nbPages > 1 && data.page + 1 < data.nbPages) {
      // get the current payload from the request
      const payload = JSON.parse(request.payload);

      // change the page number (the payload must stay a JSON string)
      request.payload = JSON.stringify({ ...payload, page: data.page + 1 });
      // add the request for parsing the next page
      addRequest(request);
    }

    return data;
  }
}
```
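Each invocation stores the current page's data in the dataset while enqueueing the next page, so the entire result set is collected without touching a single HTML page.

filterMap can also act as a pure filter, returning nothing to skip unwanted responses: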
```js
{
  filterMap: async ({ addRequest, request, data }) => {
    if (data.hits.length < 10) {
      return;
    }

    return data;
  }
}
```
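Because a falsy return value omits the output, nothing is stored once a response comes back with fewer than 10 hits. To reshape the output instead, return exactly what you want stored: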
```js
{
  filterMap: async ({ addRequest, request, data }) => {
    return data.hits; // just return an array from here
  }
}
```
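Here only the hits array is kept instead of the whole response envelope.

Once a run finishes, the resulting dataset can be downloaded in any of the supported formats through Apify's dataset items API. A minimal sketch, assuming a finished run's default dataset ID (DATASET_ID below is a placeholder):

```js
// Sketch: fetch the stored items as CSV via the Apify API.
// Replace DATASET_ID with the run's defaultDatasetId; the other
// supported formats (xml, html, xlsx, json) work the same way
// through the `format` query parameter.
const DATASET_ID = 'your-dataset-id';

const res = await fetch(
  `https://api.apify.com/v2/datasets/${DATASET_ID}/items?format=csv`
);
console.log(await res.text());
```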