In-Depth News Scraper

Extract full-length articles from top news sources to streamline collecting the latest updates on any subject. Its key feature is retrieving complete content, not just headlines. Customise your output from concise summaries to complete articles.


Try Now →

In-Depth News Scraper

The In-Depth News Scraper is an Apify actor designed to revolutionise how you gather and process news data. It stands apart from conventional scrapers by delivering complete article content rather than just headlines, enabling comprehensive analysis across diverse news categories.

Key Advantages

• Thorough content extraction, not just headlines
• Support for major news categories and outlets
• Flexible search and filtering capabilities
• Structured, analysis-ready output

Features

• Category-Based Filtering: Focus your news gathering by targeting specific categories such as World, Business, or Technology.
• Complete Article Extraction: Access full article content directly, surpassing the limitations of basic news aggregators.
• Customisable Content Length: Control output size by specifying word count or retrieving complete articles.
• Intelligent Filtering: Exclude irrelevant content using customisable keyword filters.
• Time-Range Selection: Gather current news or research historical content with flexible time frame options.
• Structured Data Output: Receive consistently formatted data including titles, URLs, dates, and sources.
• Optional Image Support: Choose whether to include article images based on your requirements.
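As a rough illustration of the keyword exclusion described above, the sketch below drops articles whose title, summary, or content mentions any excluded keyword. The function name `exclude_bad_keywords` and the case-insensitive substring match are assumptions made for illustration; the actor's actual matching rules are not documented here.

```python
from typing import Iterable


def exclude_bad_keywords(articles: Iterable[dict], bad_keywords: Iterable[str]) -> list[dict]:
    """Return only the articles that mention none of the excluded keywords.

    Assumption: matching is a case-insensitive substring check over the
    title, summary, and content fields. The actor itself may match differently.
    """
    # Normalise keywords once and ignore empty entries.
    lowered = [kw.strip().lower() for kw in bad_keywords if kw.strip()]
    kept = []
    for article in articles:
        # Missing or null fields are treated as empty text.
        text = " ".join(
            str(article.get(field) or "") for field in ("title", "summary", "content")
        ).lower()
        if not any(kw in text for kw in lowered):
            kept.append(article)
    return kept
```

A filter like this can also be applied to a downloaded dataset when a keyword was not excluded at run time.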

Input Parameters

The actor accepts the following configuration options:

Parameter	Type	Description
newsCategory	String	Required: Category filter (e.g., "World", "Technology")
additionalKeywords	String	Optional: Refine search within selected category
numberOfItems	Number	Number of articles to retrieve (default: 10, max: 100)
filterBadKeywords	Array	Optional: Keywords to exclude from results
contentLength	String	Content extraction mode: "Full" or "Summary" (default: Full)
timeRange	String	Time period for article selection
retrieveImage	Boolean	Include image URLs in output (default: false)

Example configuration:

{
    "newsCategory": "Technology",
    "additionalKeywords": "artificial intelligence",
    "numberOfItems": 20,
    "filterBadKeywords": ["sponsored", "advertisement"],
    "contentLength": "Full",
    "timeRange": "Past week",
    "retrieveImage": false
}
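The example configuration above could also be assembled and checked in code before a run. The following is a minimal sketch: the helper name `build_run_input` is hypothetical, the lower bound of 1 item is an assumption, and treating the documented categories and the two `contentLength` values as exhaustive is also an assumption, since the actor may accept more.

```python
# Values copied from the parameter table and category list in this document.
SUPPORTED_CATEGORIES = {
    "World", "Business", "Technology", "Entertainment",
    "Health", "Science", "Sports", "Politics",
}
CONTENT_LENGTHS = {"Full", "Summary"}
MAX_ITEMS = 100  # documented maximum for numberOfItems


def build_run_input(
    news_category,
    additional_keywords="",
    number_of_items=10,
    filter_bad_keywords=None,
    content_length="Full",
    time_range=None,
    retrieve_image=False,
):
    """Build the actor input dict, raising ValueError on out-of-range values."""
    if news_category not in SUPPORTED_CATEGORIES:
        raise ValueError(f"Unsupported newsCategory: {news_category!r}")
    # Assumption: at least one article must be requested.
    if not 1 <= number_of_items <= MAX_ITEMS:
        raise ValueError(f"numberOfItems must be between 1 and {MAX_ITEMS}")
    if content_length not in CONTENT_LENGTHS:
        raise ValueError(f"contentLength must be one of {sorted(CONTENT_LENGTHS)}")

    run_input = {
        "newsCategory": news_category,
        "numberOfItems": number_of_items,
        "contentLength": content_length,
        "retrieveImage": retrieve_image,
    }
    # Optional fields are only included when a value was supplied.
    if additional_keywords:
        run_input["additionalKeywords"] = additional_keywords
    if filter_bad_keywords:
        run_input["filterBadKeywords"] = list(filter_bad_keywords)
    if time_range:
        run_input["timeRange"] = time_range
    return run_input
```

Calling `build_run_input("Technology", additional_keywords="artificial intelligence", number_of_items=20, filter_bad_keywords=["sponsored", "advertisement"], time_range="Past week")` yields the same keys and values as the example configuration.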

Supported Categories

The actor provides coverage across these primary news categories:

World
Business
Technology
Entertainment
Health
Science
Sports
Politics

Output Structure

Each article in the dataset contains the following fields:

{
    "title": "Article headline",
    "link": "Article URL",
    "pubDate": "2025-02-05T10:00:00.000Z",
    "source": "Publishing outlet name",
    "summary": "Brief article overview",
    "content": "Full article text (length based on contentLength parameter)",
    "imageUrl": "Main image URL (if retrieveImage is true)"
}
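For downstream analysis it can help to load each record into a typed structure. The sketch below assumes `pubDate` always follows the millisecond UTC format shown in the sample and that `imageUrl` may be absent; the `Article` class and `parse_article` function are illustrative names, not part of the actor.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Article:
    title: str
    link: str
    pub_date: datetime
    source: str
    summary: str
    content: str
    image_url: Optional[str] = None


def parse_article(item: dict) -> Article:
    """Convert one dataset item into an Article with a timezone-aware date."""
    # Assumption: pubDate looks like "2025-02-05T10:00:00.000Z" (UTC).
    pub_date = datetime.strptime(
        item["pubDate"], "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    return Article(
        title=item["title"],
        link=item["link"],
        pub_date=pub_date,
        source=item["source"],
        summary=item.get("summary", ""),
        content=item.get("content", ""),
        image_url=item.get("imageUrl"),  # None when images were not retrieved
    )
```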

Implementation Guide

Choose your target news category
Add any specific keywords to refine results
Set additional parameters as needed
Execute the actor
Access your structured dataset
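The steps above can also be scripted. The sketch below assumes the official `apify-client` Python package, whose `ApifyClient` exposes `actor(...).call(run_input=...)` and `dataset(...).iterate_items()`. The helper `fetch_articles` is a hypothetical name, and the actor ID and token in the usage comment are placeholders, since this page does not state the actor's ID.

```python
from typing import Any


def fetch_articles(client: Any, actor_id: str, run_input: dict) -> list[dict]:
    """Run the actor, wait for it to finish, and return its dataset items.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    # Start the run and block until it completes.
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return any run details")
    # Each run writes its results to a default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (requires `pip install apify-client` and network access):
#
#   from apify_client import ApifyClient
#
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")        # placeholder token
#   articles = fetch_articles(
#       client,
#       "<username>/<actor-name>",                    # placeholder actor ID
#       {"newsCategory": "Technology", "numberOfItems": 20},
#   )
```

Passing the client in as an argument keeps the helper easy to exercise with a stub when no network access is available.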

Performance Considerations

Performance varies based on several factors:

Processing Duration: Typically 5-10 seconds per article for full extraction
Volume Handling: Efficiently processes up to 100 articles per run
Request Management: Sequential processing with appropriate intervals

For optimal results:

Limit requests to 50 items for faster completion
Use precise keywords to target relevant content
Use the "Summary" content length or a word limit unless full text is required
Disable image retrieval when not essential

Note: Network conditions and source website responsiveness may affect performance.
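The per-article figure above gives a rough way to budget run time. This sketch multiplies the article count by the stated 5-10 second range, assuming strictly sequential processing; it ignores delays between requests, retries, and network variation, so treat the result as a loose estimate only. The function name is illustrative.

```python
def estimate_run_seconds(
    number_of_items: int,
    min_per_article: float = 5.0,
    max_per_article: float = 10.0,
) -> tuple[float, float]:
    """Return a (low, high) estimate in seconds for full extraction."""
    if number_of_items < 0:
        raise ValueError("number_of_items must not be negative")
    # Sequential processing: total time scales linearly with article count.
    return number_of_items * min_per_article, number_of_items * max_per_article


low, high = estimate_run_seconds(50)
print(f"50 articles: roughly {low / 60:.1f} to {high / 60:.1f} minutes")
```

Under these assumptions, a 50-item run lands between about 4 and 8.5 minutes, and a 100-item run takes roughly twice that.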

Error Handling and Troubleshooting

The actor implements comprehensive error handling:

Connection Issues: Automatic retry (up to 3 attempts) for failed connections
Rate Management: Dynamic delays between requests to prevent rate limiting
Content Fallback: Defaults to article summary if full content extraction fails
Input Validation: Clear error messages for invalid configurations
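To make the retry behaviour concrete, here is a generic sketch of retrying a failing operation up to three times with a growing delay. It is not the actor's own code: the delay schedule, the choice of `ConnectionError`, and the name `with_retries` are assumptions for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on ConnectionError up to `max_attempts` times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Assumed policy: wait longer after each failed attempt.
            sleep(base_delay * attempt)
    raise AssertionError("unreachable")
```

The `sleep` argument exists so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.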

Troubleshooting Common Issues

Timeout Errors: Consider reducing batch size or increasing time between requests
Missing Content: Check if the source website requires authentication
Rate Limiting: The actor will automatically pause and retry; no action needed
Error Logs: Available in the actor's run details for debugging

For detailed error information, consult the actor's run log in the Apify Console.

Technical Support

For implementation assistance or to report issues:

Check the actor's run log for specific error messages
Review the troubleshooting section above
Contact support with the actor run ID for detailed investigation

The actor continuously logs its progress and any errors encountered, facilitating quick problem resolution.

Frequently Asked Questions

Is it legal to scrape news articles or public data?

Yes, if you're scraping publicly available data for personal or internal use. Always review the website's Terms of Service before large-scale use or redistribution.

Do I need to code to use this scraper?

No. This is a no-code tool: choose a news category, add any optional keywords, and run the scraper directly from your dashboard or the Apify actor page.

What data does it extract?

It extracts article titles, URLs, publication dates, sources, summaries, article content, and image URLs (if image retrieval is enabled). You can export all of it to Excel or JSON.

Can I scrape multiple articles or filter the results?

Yes, you can retrieve up to 100 articles per run and refine results by category, keywords, excluded keywords, and time range using the input parameters.

How do I get started?

Use the Try Now button on this page to open the scraper. You'll be guided to choose a news category and enter optional keywords, then get structured results. No setup needed!