Extract full-length articles from top news sources and collect the latest updates on any subject. The key feature is retrieval of complete content, not just headlines. Customise the output from concise summaries to complete articles.
The In-Depth News Scraper is an Apify actor designed to revolutionise how you gather and process news data. It stands apart from conventional scrapers by delivering complete article content rather than just headlines, enabling comprehensive analysis across diverse news categories.
• Thorough content extraction, not just headlines
• Support for major news categories and outlets
• Flexible search and filtering capabilities
• Structured, analysis-ready output
• Category-Based Filtering: Focus your news gathering by targeting specific categories such as World, Business, or Technology.
• Complete Article Extraction: Access full article content directly, surpassing the limitations of basic news aggregators.
• Customisable Content Length: Control output size by specifying word count or retrieving complete articles.
• Intelligent Filtering: Exclude irrelevant content using customisable keyword filters.
• Time-Range Selection: Gather current news or research historical content with flexible time frame options.
• Structured Data Output: Receive consistently formatted data including titles, URLs, dates, and sources.
• Optional Image Support: Choose whether to include article images based on your requirements.
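As a rough illustration of what keyword exclusion and a word-count limit mean in practice, the sketch below applies both to a list of already-fetched articles on the client side. It is not the actor's internal logic: the helper names `exclude_by_keywords` and `truncate_words` are hypothetical, and the `title` and `content` keys are assumed to match the actor's output fields.

```python
from typing import Iterable


def exclude_by_keywords(articles: Iterable[dict], bad_keywords: list[str]) -> list[dict]:
    """Drop articles whose title or content mentions any unwanted keyword (case-insensitive)."""
    lowered = [kw.lower() for kw in bad_keywords]
    kept = []
    for article in articles:
        # Assumed field names: "title" and "content".
        text = f"{article.get('title', '')} {article.get('content', '')}".lower()
        if not any(kw in text for kw in lowered):
            kept.append(article)
    return kept


def truncate_words(text: str, max_words: int) -> str:
    """Keep at most max_words whitespace-separated words."""
    return " ".join(text.split()[:max_words])


# Made-up sample data, for illustration only.
articles = [
    {"title": "Chip makers report record demand", "content": "Demand for AI chips keeps rising."},
    {"title": "Sponsored: best laptops", "content": "A paid placement."},
]
kept = exclude_by_keywords(articles, ["sponsored", "advertisement"])
```

Simple substring matching is used here for brevity; it would also match keywords embedded in longer words, so a word-boundary check may be preferable for short keywords.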
The actor accepts the following configuration options:
Parameter | Type | Description |
---|---|---|
newsCategory | String | Required: Category filter (e.g., "World", "Technology") |
additionalKeywords | String | Optional: Refine search within selected category |
numberOfItems | Number | Number of articles to retrieve (default: 10, max: 100) |
filterBadKeywords | Array | Optional: Keywords to exclude from results |
contentLength | String | Content extraction mode: "Full" or "Summary" (default: Full) |
timeRange | String | Time period for article selection |
retrieveImage | Boolean | Include image URLs in output (default: false) |
Example configuration:
    {
      "newsCategory": "Technology",
      "additionalKeywords": "artificial intelligence",
      "numberOfItems": 20,
      "filterBadKeywords": ["sponsored", "advertisement"],
      "contentLength": "Full",
      "timeRange": "Past week",
      "retrieveImage": false
    }
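When the configuration is generated by a script, it can help to check it against the documented constraints before starting a run. The following is a minimal sketch, assuming only the parameter names, defaults, and limits listed in the table above; the function `build_input` is hypothetical, and the lower bound of 1 for `numberOfItems` is an assumption, since only the maximum is documented.

```python
import json

ALLOWED_CONTENT_LENGTHS = {"Full", "Summary"}
MAX_ITEMS = 100  # documented maximum for numberOfItems


def build_input(news_category, additional_keywords=None, number_of_items=10,
                filter_bad_keywords=None, content_length="Full",
                time_range=None, retrieve_image=False):
    """Assemble an input dict and reject values outside the documented limits."""
    if not news_category:
        raise ValueError("newsCategory is required")
    # Lower bound of 1 is an assumption; only the maximum is documented.
    if not 1 <= number_of_items <= MAX_ITEMS:
        raise ValueError(f"numberOfItems must be between 1 and {MAX_ITEMS}")
    if content_length not in ALLOWED_CONTENT_LENGTHS:
        raise ValueError('contentLength must be "Full" or "Summary"')

    run_input = {
        "newsCategory": news_category,
        "numberOfItems": number_of_items,
        "contentLength": content_length,
        "retrieveImage": retrieve_image,
    }
    # Optional parameters are included only when provided.
    if additional_keywords:
        run_input["additionalKeywords"] = additional_keywords
    if filter_bad_keywords:
        run_input["filterBadKeywords"] = list(filter_bad_keywords)
    if time_range:
        run_input["timeRange"] = time_range
    return run_input


config = build_input(
    "Technology",
    additional_keywords="artificial intelligence",
    number_of_items=20,
    filter_bad_keywords=["sponsored", "advertisement"],
    time_range="Past week",
)
config_json = json.dumps(config, indent=2)
```

The resulting `config_json` string can be pasted into the actor's input editor or passed to whichever client is used to start the run.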
The actor provides coverage across these primary news categories:
Each article in the dataset contains the following fields:
    {
      "title": "Article headline",
      "link": "Article URL",
      "pubDate": "2025-02-05T10:00:00.000Z",
      "source": "Publishing outlet name",
      "summary": "Brief article overview",
      "content": "Full article text (length based on contentLength parameter)",
      "imageUrl": "Main image URL (if retrieveImage is true)"
    }
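For downstream analysis, each dataset item can be loaded into a typed record. The sketch below assumes the field names shown above and that `pubDate` always uses the timestamp format from the example; the `Article` class and `parse_article` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Article:
    title: str
    link: str
    pub_date: datetime
    source: str
    summary: str
    content: str
    image_url: Optional[str] = None  # absent unless retrieveImage is true


def parse_article(item: dict) -> Article:
    """Convert one dataset item into an Article with a timezone-aware date."""
    # %z accepts the trailing "Z" (UTC) on Python 3.7 and later.
    pub_date = datetime.strptime(item["pubDate"], "%Y-%m-%dT%H:%M:%S.%f%z")
    return Article(
        title=item["title"],
        link=item["link"],
        pub_date=pub_date,
        source=item["source"],
        summary=item.get("summary", ""),
        content=item.get("content", ""),
        image_url=item.get("imageUrl"),
    )


# Made-up sample item mirroring the documented fields.
sample = {
    "title": "Article headline",
    "link": "https://example.com/article",
    "pubDate": "2025-02-05T10:00:00.000Z",
    "source": "Example Outlet",
    "summary": "Brief article overview",
    "content": "Full article text",
}
article = parse_article(sample)
```

Parsing the date into a timezone-aware `datetime` makes it straightforward to sort articles or filter them by publication time.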
Performance varies based on several factors:
For optimal results:
Note: Network conditions and source website responsiveness may affect performance.
The actor implements comprehensive error handling:
For detailed error information, consult the actor's run log in the Apify Console.
For implementation assistance or to report issues:
The actor continuously logs its progress and any errors encountered, facilitating quick problem resolution.
Yes, if you're scraping publicly available data for personal or internal use. Always review the website's Terms of Service before large-scale use or redistribution.
No. This is a no-code tool: just enter a news category and any optional keywords, then run the scraper directly from your dashboard or the Apify actor page.
It extracts article titles, URLs, publication dates, sources, summaries, and article text, plus image URLs if enabled. You can export all of it to Excel or JSON.
Yes, you can retrieve up to 100 articles per run and refine by category, keyword, or time range, depending on the input settings you use.
You can use the Try Now button on this page to go to the scraper. You’ll be guided to input a search term and get structured results. No setup needed!