Wordpress Post Scraper - NEW

This actor scrapes WordPress blog posts from one or more websites, cleans the HTML content, and pushes flattened JSON containing every field it can find in each post. It uses Selenium to handle pages that require JavaScript rendering.

WordPress Scraper Actor

The WordPress Scraper Actor lets you scrape content from one or more WordPress websites, including blogs, articles, author details, categories, comments, and media. It uses the WordPress REST API, the Requests library, and, if necessary, Selenium for accurate data extraction. It only works on WordPress sites that accept REST API calls.

Features

  • Extract blog posts, articles, author information, products, categories, comments and images from WordPress websites.
  • Uses REST API and Selenium for complete data extraction.
  • Outputs cleaned HTML content as plain text in JSON format.
  • Supports pagination for comprehensive scraping.
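
The "cleaned HTML as plain text" step could look roughly like the sketch below. It is an illustration only, not the actor's actual cleaner: the names `clean_html` and `_TextExtractor` are hypothetical, and it relies solely on Python's standard `html.parser` module.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: keeps text nodes, drops <script>/<style> bodies."""

    SKIPPED = {"script", "style"}
    # Tags that should act as word boundaries once they are removed.
    BLOCK = {"p", "div", "br", "li", "ul", "ol", "h1", "h2", "h3", "h4",
             "h5", "h6", "blockquote", "tr", "td", "th", "figure"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes &amp; and friends
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self._chunks.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse the whitespace left behind by removed tags.
        return re.sub(r"\s+", " ", "".join(self._chunks)).strip()


def clean_html(html):
    """Return the visible text of an HTML fragment as a single line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

For instance, `clean_html("<p>Hello <strong>world</strong>!</p>")` yields the text without any tags. Inline tags such as `<strong>` are removed without adding spaces, while block-level tags are treated as word boundaries.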

How It Works

The actor takes one or more website URLs as input, queries each site's REST API to gather data, and uses Selenium to handle JavaScript-rendered pages. The scraped data is cleaned and formatted as structured JSON.
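
As a rough sketch of the REST API step, the code below pages through the standard WordPress posts endpoint (`/wp-json/wp/v2/posts`) using its `per_page` and `page` query parameters and the `X-WP-TotalPages` response header. The function names `fetch_page` and `collect_posts` are hypothetical, and the sketch uses the standard-library `urllib` rather than Requests so that it has no dependencies. The actor's real implementation may differ.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(site_url, page, per_page=100):
    """Fetch one page of posts. Returns (posts, total_pages)."""
    query = urllib.parse.urlencode({"per_page": per_page, "page": page})
    url = f"{site_url.rstrip('/')}/wp-json/wp/v2/posts?{query}"
    with urllib.request.urlopen(url, timeout=30) as response:
        # WordPress reports the number of available pages in this header.
        total_pages = int(response.headers.get("X-WP-TotalPages", "1"))
        return json.load(response), total_pages


def collect_posts(site_url, limit=None, fetcher=fetch_page):
    """Walk the pages until they run out or `limit` posts are collected.

    `limit=None` means "no cap". `fetcher` is injectable so the loop can
    be exercised without network access.
    """
    posts, page, total_pages = [], 1, 1
    while page <= total_pages and (limit is None or len(posts) < limit):
        batch, total_pages = fetcher(site_url, page)
        if not batch:
            break
        posts.extend(batch)
        page += 1
    return posts if limit is None else posts[:limit]
```

Calling `collect_posts("https://example.com", limit=10)` would request pages from that site until ten posts are collected or the pages run out. A site that blocks REST API calls would raise an HTTP error here, which is consistent with the restriction that the actor only works on sites that accept them.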

Input Parameters

  • start_urls (required): List of website URLs to scrape (for example, company1.com, company2.com).
  • max_results (optional): Maximum number of posts to retrieve per site. Set to 'all' to retrieve every post.
  • scrape_mode (required, default is 'posts'): The type of data to scrape. One of 'posts', 'media', 'categories', or 'comments'.
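
One way to interpret these parameters is sketched below. It is a guess at the behaviour, not the actor's code: `normalize_input` and `ENDPOINTS` are hypothetical names, the endpoint paths are the standard WordPress REST API routes, and two details are assumptions: that an omitted `max_results` means 'all', and that bare domains get `https://` prepended.

```python
# Standard WordPress REST API routes for each scrape_mode value.
ENDPOINTS = {
    "posts": "/wp-json/wp/v2/posts",
    "media": "/wp-json/wp/v2/media",
    "categories": "/wp-json/wp/v2/categories",
    "comments": "/wp-json/wp/v2/comments",
}


def normalize_input(raw):
    """Validate raw actor input and return a normalized configuration."""
    urls = []
    for entry in raw.get("start_urls", []):
        # Accept both {"url": "..."} objects and plain strings.
        url = (entry["url"] if isinstance(entry, dict) else entry).strip()
        if not url.startswith(("http://", "https://")):
            url = "https://" + url  # assumption: bare domains use HTTPS
        urls.append(url.rstrip("/"))
    if not urls:
        raise ValueError("start_urls is required")

    mode = raw.get("scrape_mode", "posts")
    if mode not in ENDPOINTS:
        raise ValueError(f"scrape_mode must be one of {sorted(ENDPOINTS)}")

    # Assumption: a missing max_results behaves like 'all' (no cap).
    max_results = raw.get("max_results", "all")
    limit = None if max_results == "all" else int(max_results)

    return {"urls": urls, "endpoint": ENDPOINTS[mode], "limit": limit}
```

With this sketch, an input containing only `start_urls` resolves to the posts endpoint with no result cap, and an unknown `scrape_mode` is rejected before any request is made.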

Output

The actor outputs (cleaned) JSON data for each post, including:

  • Title
  • Cleaned Content
  • Metadata (author, publication date, tags, categories)
  • Media Links
  • All post data: The complete raw post data, available in the "All fields" tab.
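
To illustrate what "flattened" JSON can mean for a WordPress post, the sketch below collapses nested objects such as `title.rendered` and `content.rendered` (fields the WordPress REST API returns for posts) into a single-level dictionary. The `flatten` helper and the dotted key naming are assumptions. The actor's actual key names may differ.

```python
def flatten(record, parent_key="", sep="."):
    """Collapse nested dictionaries into one level with joined keys.

    Lists are kept as-is, and empty nested dictionaries produce no keys.
    """
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical, heavily trimmed post object in the REST API's shape.
post = {
    "id": 7,
    "title": {"rendered": "Sample Blog Post"},
    "content": {"rendered": "<p>Hi</p>", "protected": False},
    "categories": [3, 5],
}
flat_post = flatten(post)
```

Here `flat_post` has the keys `id`, `title.rendered`, `content.rendered`, `content.protected`, and `categories`, which maps directly onto the columns of a tabular export.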

Getting Started

  1. Create an Actor Task: On Apify, create a new actor task and provide the list of URLs to scrape.
  2. Input Configuration: Set start_urls and scrape_mode, and optionally max_results.
  3. Run the Actor: Execute the actor to start scraping.
  4. Review Results: Download the results as a JSON file.

Use Cases

  • Content Aggregation: Collect articles or blog posts from multiple WordPress sites.
  • Market Research: Scrape product descriptions and reviews from WordPress-powered e-commerce sites.
  • Data Analysis: Gather articles for analysis or summarization.

Important Notes

  • Respecting Site Policies: Always ensure you have permission to scrape data from a website, and respect the site's robots.txt policies.

Actor Input Example

{
  "start_urls": [
    { "url": "https://example.com" },
    { "url": "https://another-example.com" }
  ],
  "max_results": "all"
}

Actor Output Example (CLEANED)

{
  "title": "Sample Blog Post",
  "cleaned_content": "This is the content of the blog post, without HTML tags.",
  "date_published": "2023-10-01"
}

Frequently Asked Questions

Is it legal to scrape WordPress posts or other public data?

Generally yes, if you're scraping publicly available data for personal or internal use. Always review the website's Terms of Service before large-scale use or redistribution.

Do I need to code to use this scraper?

No. This is a no-code tool: enter the website URLs you want to scrape, choose a scrape mode, and run the scraper directly from your dashboard or the Apify actor page.

What data does it extract?

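It extracts post titles, cleaned content, metadata (author, publication date, tags, categories), and media links, plus the raw post data. Depending on the scrape mode, it can also collect media, categories, or comments. You can export all of it to Excel or JSON.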

Can I scrape multiple pages or multiple sites?

Yes. The actor paginates through each site's results and accepts several website URLs in start_urls. Use max_results to cap the number of posts per site and scrape_mode to choose between posts, media, categories, and comments.

How do I get started?

You can use the Try Now button on this page to go to the scraper. Enter the website URLs you want to scrape and run the actor to get structured results. No setup needed!