Awesome Google News Scraper

This tool scrapes content from Google News, streamlining the collection of the latest information on any topic. Its key feature is the ability to extract full-length articles, not just headlines. Customize results from brief summaries to complete content to suit your news gathering process.


Try Now →

Unlock the power of comprehensive news analysis with this cutting-edge Apify actor! Designed to revolutionize how you gather and process information, this tool doesn't just scrape headlines – it delivers entire articles right to your fingertips. By leveraging Google News as its source, our actor offers an unparalleled ability to extract, filter, and aggregate full-length news content on any topic you choose.

Features

• Full Article Extraction: Unlike standard RSS feeds or basic scrapers, this actor can retrieve the complete text of articles, giving you access to in-depth content without leaving the platform.
• Customizable Content Length: Whether you need a quick summary or the entire story, you're in control. Choose between a specific word count or opt for the full article.
• Smart Filtering: Easily exclude unwanted content with customizable keyword filters.
• Flexible Time Ranges: Stay current or research past events with adjustable time frame options.
• Streamlined Data Structure: Receive well-organized output including titles, URLs, publication dates, sources, and more.
• Optional Image Retrieval: Choose whether to fetch image URLs for articles, balancing between comprehensive data and faster performance.

Transform your news gathering process and gain deeper insights with our actor's unique ability to provide complete article content. Say goodbye to surface-level summaries and hello to comprehensive news analysis at your fingertips!

Input

The actor accepts the following input parameters:

Parameter	Type	Description
keyword	String	The search term for news (e.g., "BRICS", "Politics")
numberOfItems	Number	The number of news items to fetch (default: 10, maximum: 100)
filterBadKeywords	Array	Optional array of keywords to filter out unwanted news items
contentLength	String/Number	Number of words to extract from the article or 'full' for entire content
timeRange	String	Time range for news articles (e.g., "Past hour", "Past 24 hours", "Past week", "Past year")
retrieveImage	Boolean	Whether to retrieve image URLs for articles (default: false)

Example input:

```json
{
  "keyword": "Bitcoin",
  "numberOfItems": 20,
  "filterBadKeywords": ["scam", "fraud"],
  "contentLength": "200",
  "timeRange": "Past week",
  "retrieveImage": false
}
```
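The parameter table above can be checked on the client side before a run is started. The following is a minimal sketch, not the actor's own validation code: the function name `normalize_input`, the lower bound of 1 for numberOfItems, and the default of 'full' for contentLength are assumptions made for illustration. Only the default of 10, the maximum of 100, and the default of false for retrieveImage come from the table.

```python
from typing import Any

MAX_ITEMS = 100     # documented maximum for numberOfItems
DEFAULT_ITEMS = 10  # documented default for numberOfItems


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Validate a raw input dict and fill in defaults (hypothetical helper)."""
    keyword = raw.get("keyword")
    if not isinstance(keyword, str) or not keyword.strip():
        raise ValueError("keyword must be a non-empty string")

    items = raw.get("numberOfItems", DEFAULT_ITEMS)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(items, bool) or not isinstance(items, int):
        raise ValueError("numberOfItems must be an integer")
    if not 1 <= items <= MAX_ITEMS:  # lower bound of 1 is an assumption
        raise ValueError(f"numberOfItems must be between 1 and {MAX_ITEMS}")

    # contentLength is either 'full' or a word count, which may arrive
    # as a number or as a numeric string such as "200".
    length = raw.get("contentLength", "full")  # default is an assumption
    if length != "full":
        try:
            length = int(length)
        except (TypeError, ValueError):
            raise ValueError("contentLength must be a word count or 'full'") from None
        if length <= 0:
            raise ValueError("contentLength must be positive")

    bad_keywords = raw.get("filterBadKeywords", [])
    if not isinstance(bad_keywords, list) or not all(isinstance(k, str) for k in bad_keywords):
        raise ValueError("filterBadKeywords must be an array of strings")

    normalized = {
        "keyword": keyword.strip(),
        "numberOfItems": items,
        "filterBadKeywords": bad_keywords,
        "contentLength": length,
        "retrieveImage": bool(raw.get("retrieveImage", False)),
    }
    if "timeRange" in raw:
        normalized["timeRange"] = raw["timeRange"]  # passed through unchanged
    return normalized
```

Passing the example input through this helper turns the string "200" into the integer 200 and leaves the other fields as they are.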

Output

The actor outputs a dataset with the following structure for each news article:

title: The title of the news article
link: The resolved URL of the article
pubDate: The publication date of the article
source: The source (news outlet) of the article
imageUrl: The URL of the article's main image (if retrieveImage is set to true)
summary: A brief summary of the article
content: The extracted content of the article (based on contentLength parameter)
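To illustrate how the contentLength and filterBadKeywords options could shape these output fields, here is a rough sketch. It is not the actor's implementation: the helper names are hypothetical, and matching bad keywords case-insensitively against the title and summary is an assumption, since the source does not say which fields are checked.

```python
def trim_content(text: str, content_length) -> str:
    """Return the whole text for 'full', otherwise the first N words."""
    if content_length == "full":
        return text
    words = text.split()  # splits on any whitespace
    return " ".join(words[: int(content_length)])


def is_filtered(item: dict, bad_keywords: list[str]) -> bool:
    """True if any bad keyword appears in the title or summary.

    Assumption: matching is a case-insensitive substring check.
    """
    haystack = f"{item.get('title', '')} {item.get('summary', '')}".lower()
    return any(keyword.lower() in haystack for keyword in bad_keywords)


def shape_items(items: list[dict], bad_keywords: list[str], content_length) -> list[dict]:
    """Drop filtered items and trim the content of the remaining ones."""
    shaped = []
    for item in items:
        if is_filtered(item, bad_keywords):
            continue
        shaped.append({**item, "content": trim_content(item.get("content", ""), content_length)})
    return shaped
```

Note that trimming by word count in this way collapses runs of whitespace into single spaces, so paragraph breaks in the original content are lost unless the 'full' setting is used.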

Usage

Configure your desired input parameters
Run the actor
Retrieve the results from the dataset
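For readers who prefer to script these three steps, the sketch below uses the Apify API client for Python (the `apify-client` package). The actor ID placeholder and the `APIFY_TOKEN` environment variable name are assumptions; substitute the real values from the actor page and your Apify account.

```python
def fetch_news(client, actor_id: str, run_input: dict) -> list[dict]:
    """Run the actor and return every item from its default dataset."""
    # Steps 1 and 2: pass the configured input and run the actor.
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return any run details")
    # Step 3: retrieve the results from the run's default dataset.
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())


def main() -> None:
    """Example wiring; needs network access and a valid Apify token."""
    import os

    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])  # variable name is an assumption
    items = fetch_news(
        client,
        "<username>/<actor-name>",  # placeholder, not the real actor ID
        {"keyword": "Bitcoin", "numberOfItems": 20, "timeRange": "Past week"},
    )
    for item in items:
        print(item["title"], item["link"])
```

Keeping the client as a parameter of `fetch_news` makes the function easy to exercise with a stub client, without touching the network.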

Performance

The performance of this actor can vary based on the number of items requested and the complexity of the articles being scraped. Here are some general guidelines:

Processing Time: On average, the actor takes about 5-10 seconds per article for full content extraction.
Scalability: The actor is designed to handle up to 100 items per run efficiently.
Concurrent Requests: To balance performance and politeness to source websites, the actor processes up to 5 articles concurrently.
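A concurrency cap like the one described above is commonly built with a semaphore. The following is an illustrative sketch using Python's asyncio, not the actor's actual code; `fetch_article` is a hypothetical coroutine supplied by the caller.

```python
import asyncio

MAX_CONCURRENCY = 5  # the documented limit of concurrent articles


async def process_all(urls, fetch_article, limit: int = MAX_CONCURRENCY):
    """Fetch every URL while keeping at most `limit` fetches in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def worker(url):
        # Each worker waits here until one of the `limit` slots is free.
        async with semaphore:
            return await fetch_article(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(worker(url) for url in urls))
```

With this pattern all tasks are scheduled up front, but only five of them run their fetch at any one time; the rest wait for a slot.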

For optimal performance, we recommend:

Limiting requests to 50 items or fewer for quicker results.
Using more specific keywords to target relevant articles and reduce processing time.
Setting a reasonable contentLength if you don't need the full article text.
Keeping retrieveImage set to false unless image URLs are necessary, as this can significantly speed up the scraping process.

Note: Performance can be affected by factors such as network latency and the responsiveness of source websites.
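The figures in this section allow a rough bracket on run time. The sketch below is back-of-the-envelope arithmetic only, under two assumed extremes: the lower bound assumes perfect 5-way parallelism at 5 seconds per article, and the upper bound assumes articles are effectively processed one at a time at 10 seconds each. It ignores retries, rate-limit pauses, and network latency.

```python
import math


def estimate_range_seconds(items: int, fast: float = 5.0, slow: float = 10.0,
                           concurrency: int = 5) -> tuple[float, float]:
    """Return (optimistic, pessimistic) run time in seconds."""
    batches = math.ceil(items / concurrency)  # rounds of up to 5 articles
    optimistic = batches * fast               # full parallelism, fast articles
    pessimistic = items * slow                # no parallelism, slow articles
    return optimistic, pessimistic
```

Under these assumptions, a 50-item run falls somewhere between 50 and 500 seconds, and a 100-item run between 100 and 1000 seconds, which is consistent with the recommendation to keep requests at 50 items or fewer for quicker results.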

Error Handling

This actor is designed with robust error handling to ensure smooth operation:

Network Issues: If a connection to Google News fails, the actor will retry up to 3 times before moving on to the next item.
Rate Limiting: The actor implements a delay between requests to avoid triggering Google's rate limits. If rate limiting is detected, the actor will pause for 60 seconds before retrying.
Article Extraction: If the full text of an article cannot be extracted, the actor will fall back to providing the summary from the RSS feed.
Invalid Inputs: The actor validates all inputs and will provide meaningful error messages for any invalid parameters.

In case of any unrecoverable errors, the actor will log the error details and continue processing the remaining items where possible.
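The retry, pause, and fallback rules above can be combined into one small routine. This is a simplified sketch rather than the actor's real logic: the `RateLimited` exception, the `fetch_full_text` callable, and the choice to count three attempts in total are assumptions made for illustration.

```python
import time

MAX_ATTEMPTS = 3        # "retry up to 3 times"; three attempts in total is an assumption
RATE_LIMIT_PAUSE = 60   # documented pause, in seconds, after rate limiting


class RateLimited(Exception):
    """Hypothetical signal that the remote side is rate limiting us."""


def extract_with_fallback(item: dict, fetch_full_text, sleep=time.sleep) -> str:
    """Return the article text, or the RSS summary if extraction keeps failing."""
    for _ in range(MAX_ATTEMPTS):
        try:
            return fetch_full_text(item["link"])
        except RateLimited:
            sleep(RATE_LIMIT_PAUSE)  # wait before the next attempt
        except (ConnectionError, TimeoutError):
            continue                 # transient network issue: try again
        except Exception as error:
            # Unrecoverable extraction error: log it and stop retrying.
            print(f"extraction failed for {item['link']}: {error}")
            break
    return item["summary"]           # fall back to the summary
```

Accepting `sleep` as a parameter keeps the 60-second pause out of tests: a caller can pass a stub that records the requested delay instead of waiting.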

Frequently Asked Questions

Is it legal to scrape news articles or public data?

Yes, if you're scraping publicly available data for personal or internal use. Always review the website's Terms of Service before large-scale use or redistribution.

Do I need to code to use this scraper?

No. This is a no-code tool: just enter a keyword, adjust the optional settings, and run the scraper directly from your dashboard or the Apify actor page.

What data does it extract?

It extracts each article's title, resolved link, publication date, source, summary, and content, plus the main image URL when retrieveImage is set to true. You can export all of it to Excel or JSON.

Can I fetch multiple articles or filter the results?

Yes, you can fetch up to 100 items per run and refine results by keyword, time range, and excluded keywords, depending on the input settings you use.

How do I get started?

You can use the Try Now button on this page to go to the scraper. You’ll be guided to input a search term and get structured results. No setup needed!