Speed-of-light scraping with the Rust programming language! This is an early alpha version meant for experimentation. Use at your own risk!

This scraper is meant to be a faster (but less flexible) alternative to Apify's JavaScript-based Cheerio Scraper.
Rust is one of the fastest programming languages out there; in many cases, it matches the speed of C. While JavaScript offers huge flexibility and development speed, Rust can significantly speed up crawling and/or reduce costs. The Rust scraper is both faster and requires less memory.
You can read about fixes and updates in the detailed changelog file.
Because this scraper is so fast, you can easily take a website down. This matters especially if you scrape more than a few hundred URLs and use the async scraping mode. How to prevent that:

- Set the `max_concurrency` input field. You can still scrape very fast and with a tiny memory footprint if you set it below 10.
- Without limiting `max_concurrency`, only scrape large websites that can handle a load of 1,000 requests/second and more.

If we see you abusing this scraper for attacks on the Apify platform, your account can be banned.
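For example, a conservative input that keeps the crawl polite might limit concurrency like this (a sketch showing only the relevant fields; the full input format is documented below):

```json
{
  "urls": [{ "url": "https://example.com" }],
  "max_concurrency": 8
}
```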
Rust is a statically typed language compiled directly into machine code. Because of this, the compiler can optimize the code into the most efficient structures and algorithms. Of course, it is also the job of the programmer to write efficient code, so we expect further improvements to this scraper.
Use the `extract` (object) input to define what should be scraped. You can increase `max_concurrency` unless the CPU gets overwhelmed. The input is a JSON object with the properties below, explained in detail on the Apify Store page. You can also set it up on the Apify platform with a nice UI.
You need to provide an extraction configuration object. This object defines the selectors to find on the page, what to extract from those selectors, and finally the names of the fields the data should be saved under.
`extract` (array) is an array of objects where each object has:

- `field_name` (string): The field the data will be assigned to in your resulting dataset.
- `selector` (string): CSS selector used to find the data to extract.
- `extract_type` (object): What to extract:
  - `type` (string): Either `Text` or `Attribute`.
  - `content` (string): Name of the attribute to extract; provide only when `type` is `Attribute`.
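The `Text` vs. `Attribute` distinction can be sketched in plain Rust. This is an illustrative model, not the actor's actual source; the `Element` struct and function names are hypothetical:

```rust
use std::collections::HashMap;

// Hypothetical model of the `extract_type` input object.
enum ExtractType {
    Text,
    // `content` holds the attribute name, e.g. "href".
    Attribute { content: String },
}

// Simplified stand-in for a matched HTML element.
struct Element {
    text: String,
    attributes: HashMap<String, String>,
}

// `Text` returns the element's text; `Attribute` looks up the named attribute.
fn extract(el: &Element, ty: &ExtractType) -> Option<String> {
    match ty {
        ExtractType::Text => Some(el.text.clone()),
        ExtractType::Attribute { content } => el.attributes.get(content).cloned(),
    }
}

fn main() {
    let mut attrs = HashMap::new();
    attrs.insert("href".to_string(), "/Propack/b".to_string());
    let el = Element {
        text: "208 customer reviews".to_string(),
        attributes: attrs,
    };

    assert_eq!(
        extract(&el, &ExtractType::Text).as_deref(),
        Some("208 customer reviews")
    );
    assert_eq!(
        extract(&el, &ExtractType::Attribute { content: "href".to_string() }).as_deref(),
        Some("/Propack/b")
    );
    println!("ok");
}
```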
Full INPUT example:
```json
{
  "proxy_settings": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["SHADER"]
  },
  "urls": [
    { "url": "https://www.amazon.com/dp/B01CYYU8YW" },
    { "url": "https://www.amazon.com/dp/B01FXMDA2O" },
    { "url": "https://www.amazon.com/dp/B00UNT0Y2M" }
  ],
  "extract": [
    {
      "field_name": "title",
      "selector": "#productTitle",
      "extract_type": {
        "type": "Text"
      }
    },
    {
      "field_name": "customer_reviews",
      "selector": "#acrCustomerReviewText",
      "extract_type": {
        "type": "Text"
      }
    },
    {
      "field_name": "seller_link",
      "selector": "#bylineInfo",
      "extract_type": {
        "type": "Attribute",
        "content": "href"
      }
    }
  ]
}
```
Output example in JSON (this depends purely on your `extract` config):
```json
[
  {
    "seller_link": "/Propack/b/ref=bl_dp_s_web_3039360011?ie=UTF8&node=3039360011&field-lbr_brands_browse-bin=Propack",
    "customer_reviews": "208 customer reviews",
    "title": "Propack Twist - Tie Gallon Size Storage Bags 100 Bags Pack Of 4"
  },
  {
    "seller_link": "/Ziploc/b/ref=bl_dp_s_web_2581449011?ie=UTF8&node=2581449011&field-lbr_brands_browse-bin=Ziploc",
    "customer_reviews": "561 customer reviews",
    "title": "Ziploc Gallon Slider Storage Bags, 96 Count"
  },
  {
    "seller_link": "/Reynolds/b/ref=bl_dp_s_web_2599601011?ie=UTF8&node=2599601011&field-lbr_brands_browse-bin=Reynolds",
    "customer_reviews": "456 customer reviews",
    "title": "Reynolds Wrap Aluminum Foil (200 Square Foot Roll)"
  }
]
```
You can run this locally if you have Rust installed. You need to build it before running. If you want to use Apify Proxy, don't forget to add your `APIFY_PROXY_PASSWORD` environment variable, otherwise you will get a nasty error.
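The local workflow might look like this (a sketch using standard Cargo commands; it assumes you have already cloned the repository and are in its root directory):

```shell
# Build an optimized binary first; debug builds are much slower
cargo build --release

# Required only when using Apify Proxy
export APIFY_PROXY_PASSWORD=<your-proxy-password>

# Run the scraper; it consumes the JSON input described above
cargo run --release
```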
Yes, if you're scraping publicly available data for personal or internal use. Always review the website's Terms of Service before large-scale use or redistribution.