General Purpose Web Scraping and Metadata Extraction

This project uses the Apify platform to scrape data from web pages, collect metadata, and store results in an Apify dataset. It features functions for managing date ranges, encoding identifiers, and handling large datasets, aiming to efficiently extract and store structured data for analysis.

Airbnb Data Scraper using Apify

This project is an Apify actor designed to scrape data from Airbnb property listings, including availability, pricing, and other details, over a given date range. The actor uses dynamic parameters for flexibility and stores the extracted data in Apify's dataset or a CSV file.


Features

  • Dynamic Date Range: Automatically generates check-in and check-out dates for the specified number of days.
  • Recursive JSON Parsing: Extracts all paths and values from the JSON responses for comprehensive data collection.
  • Data Storage: Pushes the extracted data to the Apify dataset or saves it locally as a CSV.
  • Configurable Inputs: Accepts various input parameters like URLs, stay duration, number of guests, and more.

Input Schema

The script accepts the following inputs via Apify:

  • startUrls: List of Airbnb listing URLs to scrape. Example: [{ "url": "https://www.airbnb.com/rooms/12345" }]
  • checkInDate: Starting date for the scraping. Example: "2024-11-21"
  • Stay_Days: Duration of each stay in days. Example: 1
  • numberOfDays: Total number of days to scrape data for. Example: 60
  • adults: Number of adults for the booking. Example: 2
  • children: Number of children for the booking. Example: 0
  • pets: Indicates if pets are included in the booking. Example: 0

How It Works

  1. Dynamic Date Generator:
    • Generates check-in and check-out dates based on the input checkInDate, Stay_Days, and numberOfDays.
  2. Request Construction:
    • Encodes the Airbnb room ID in Base64 format.
    • Constructs GraphQL API requests with dynamically populated variables.
  3. Data Collection:
    • Sends GET requests to Airbnb's API for each listing and date range.
    • Extracts data paths and values using recursive JSON parsing.
  4. Data Storage:
    • Pushes the extracted data to the Apify dataset for further use.
    • Optionally saves data locally as a CSV file.
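The first two steps can be sketched in Python. This is a minimal illustration, not the actor's actual implementation; in particular, the "StayListing:" prefix is an assumption about Airbnb's GraphQL identifier scheme and may need adjusting.

```python
import base64
from datetime import date, timedelta

def generate_date_ranges(check_in: str, stay_days: int, number_of_days: int):
    """Yield (check-in, check-out) ISO date pairs, one per day in the window."""
    start = date.fromisoformat(check_in)
    for offset in range(number_of_days):
        check_in_date = start + timedelta(days=offset)
        check_out_date = check_in_date + timedelta(days=stay_days)
        yield check_in_date.isoformat(), check_out_date.isoformat()

def encode_room_id(room_id: str) -> str:
    """Base64-encode a room ID for the GraphQL API.

    Assumption: listings are identified as Base64 of "StayListing:<id>".
    """
    return base64.b64encode(f"StayListing:{room_id}".encode()).decode()
```

With checkInDate "2024-11-21", Stay_Days 1, and numberOfDays 60, this yields 60 one-night ranges starting 2024-11-21, matching the example values in the input schema.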

Output

The script outputs a dataset with the following fields:

  • Check-In Date: The generated check-in date.
  • Check-Out Date: The corresponding check-out date.
  • Path: JSON path of the extracted data.
  • Value: Value at the extracted JSON path.
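The Path and Value fields come from the recursive JSON parsing in step 3. A minimal sketch of that traversal (the path notation here, dotted keys with bracketed list indices, is an assumption about the actor's output format):

```python
def flatten_json(obj, path=""):
    """Recursively walk parsed JSON, yielding a (path, value) pair per leaf."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten_json(value, f"{path}.{key}" if path else key)
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from flatten_json(value, f"{path}[{index}]")
    else:
        yield path, obj  # leaf value: string, number, bool, or null
```

For example, {"pricing": {"nightly": 120, "fees": [10, 5]}} would produce the rows ("pricing.nightly", 120), ("pricing.fees[0]", 10), and ("pricing.fees[1]", 5).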

Example Input

{
  "startUrls": [
    { "url": "https://www.airbnb.com/rooms/12345" },
    { "url": "https://www.airbnb.com/rooms/67890" }
  ],
  "checkInDate": "2024-11-21",
  "Stay_Days": 1,
  "numberOfDays": 10,
  "adults": 2,
  "children": 0,
  "pets": 0
}
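The optional local CSV export from the Data Storage step can be sketched with the standard library. The column names follow the Output fields above; the default file name is an assumption.

```python
import csv

def save_rows_to_csv(rows, filename="airbnb_data.csv"):
    """Write extracted rows to a local CSV file, one row per (path, value) pair."""
    fieldnames = ["Check-In Date", "Check-Out Date", "Path", "Value"]
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

When running on the Apify platform, the equivalent step pushes the same rows to the default dataset instead of a local file.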

Logs

The script logs progress and errors to the console, including:

  • Current URL and date range being processed.
  • Any errors encountered during requests or data parsing.

Frequently Asked Questions

Is it legal to scrape public data?

Yes, if you're scraping publicly available data for personal or internal use. Always review the website's Terms of Service before large-scale use or redistribution.

Do I need to code to use this scraper?

No. This is a no-code tool: just provide the listing URLs and input parameters, then run the scraper directly from your dashboard or Apify actor page.

What data does it extract?

It extracts availability, pricing, and other listing details for each check-in/check-out date range. You can export all of it to CSV or JSON.

Can I scrape multiple listings or date ranges?

Yes, you can scrape multiple listings at once and adjust the date range, stay duration, and guest counts through the input settings.

How do I get started?

You can use the Try Now button on this page to go to the scraper. You'll be guided to enter your listing URLs and input parameters and get structured results. No setup needed!