A powerful and modular web scraping tool designed to extract content from any webpage, article, or news site. Get clean, structured data from any website with optimized extraction algorithms, anti-bot detection avoidance, and proxy support.
Ultimate Articles Extractor uses multiple specialized extraction engines to pull meaningful content from any webpage. It's designed for data scientists, researchers, journalists, and developers who need to analyze web content at scale.
Perfect for:
| Extractor | Best For | Key Strengths | Output Fields |
|---|---|---|---|
| Newspaper4k | General news articles | NLP capabilities, metadata extraction | Title, text, authors, publish date, keywords, summary |
| Trafilatura | News & blog content | Optimized for news, metadata support | Title, text, author, date, language, categories, tags |
| Boilerpy3 | Simple article extraction | Fast, efficient text extraction | Title, text, text density metrics |
| News-Please | Comprehensive extraction | Rich metadata, fallback capabilities | Title, text, authors, publish date, language, images |
| Goose3 | Article content & images | Image extraction, metadata support | Title, text, authors, images, keywords |
| Article Parser | HTML & markdown output | Multiple output formats | Title, HTML content, markdown content |
| JusText | Boilerplate removal | Focuses on main content | Text, paragraphs count, language |
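The comparison above can also be used programmatically to shortlist engines. The following is a minimal sketch, not part of the tool itself: the `ENGINE_FIELDS` mapping and the `engines_with` helper are hypothetical names, and the field labels are transcribed from the table rather than taken from the actual output keys.

```python
# Output fields per engine, transcribed (lowercased) from the comparison table.
# These are the table's descriptive labels, not the JSON keys the tool emits.
ENGINE_FIELDS = {
    "newspaper4k": {"title", "text", "authors", "publish date", "keywords", "summary"},
    "trafilatura": {"title", "text", "author", "date", "language", "categories", "tags"},
    "boilerpy3": {"title", "text", "text density metrics"},
    "news-please": {"title", "text", "authors", "publish date", "language", "images"},
    "goose3": {"title", "text", "authors", "images", "keywords"},
    "article-parser": {"title", "html content", "markdown content"},
    "justext": {"text", "paragraphs count", "language"},
}


def engines_with(*fields: str) -> list[str]:
    """Return the engines whose listed output fields include every requested field."""
    wanted = {field.lower() for field in fields}
    # Dicts preserve insertion order, so results follow the table's row order.
    return [name for name, have in ENGINE_FIELDS.items() if wanted <= have]


if __name__ == "__main__":
    print(engines_with("keywords"))
    print(engines_with("images", "language"))
```

A lookup like this only narrows the choice; extraction quality on a given site still has to be checked by running the candidate engines.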
The application accepts the following input parameters:
```json
{
  "startUrls": [
    "https://www.nytimes.com/live/2025/03/21/world/heathrow-airport-power-outage-fire"
  ],
  "extractorEngine": "newspaper4k",
  "saveHtml": false,
  "saveArticleHtml": false,
  "useHeaderGenerator": true,
  "headerGeneratorOptions": {
    "browsers": ["chrome", "firefox", "safari", "edge"],
    "devices": ["desktop"]
  },
  "customHeaders": {},
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  },
  "maxRetries": 15
}
```
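When the input is generated by a script rather than typed into the form, it can help to build and check it in code first. The sketch below assumes the field names and defaults shown in the example input; `build_input` and `SUPPORTED_ENGINES` are hypothetical names, and the validation rules are illustrative, not the tool's own.

```python
import json

# Engine identifiers as listed in this README.
SUPPORTED_ENGINES = {
    "newspaper4k", "trafilatura", "boilerpy3", "news-please",
    "goose3", "article-parser", "justext",
}


def build_input(start_urls, engine="newspaper4k", max_retries=15):
    """Build an input object mirroring the example input, with basic checks."""
    urls = list(start_urls)
    if not urls:
        raise ValueError("startUrls must contain at least one URL")
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unknown extractorEngine: {engine!r}")
    return {
        "startUrls": urls,
        "extractorEngine": engine,
        "saveHtml": False,
        "saveArticleHtml": False,
        "useHeaderGenerator": True,
        "headerGeneratorOptions": {
            "browsers": ["chrome", "firefox", "safari", "edge"],
            "devices": ["desktop"],
        },
        "customHeaders": {},
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
        "maxRetries": max_retries,
    }


if __name__ == "__main__":
    run_input = build_input(["https://example.com/article"], engine="trafilatura")
    print(json.dumps(run_input, indent=2))
```

The resulting dictionary can be pasted into the input editor as JSON or passed as the run input when starting the actor through the Apify API client; the actor ID is not given in this document, so it is left out here.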
- `newspaper4k` - Best all-around extractor with NLP capabilities (default)
- `trafilatura` - Optimized for news content
- `boilerpy3` - Fast and efficient text extraction
- `news-please` - Rich metadata extraction
- `goose3` - Good for extracting images and article content
- `article-parser` - Supports multiple output formats
- `justext` - Focused on boilerplate removal

```json
{
  "title": "Flights Resume at Heathrow After Fire Forced Its Closure",
  "description": "The cause of a blaze that knocked out power to one of the world's busiest airports was under investigation.",
  "text": "The authorities said there was no immediate indication of foul play in the substation fire...",
  "author": ["Michael Levenson", "Andrew Das"],
  "publishedDate": "2025-03-21T04:09:20",
  "url": "https://www.nytimes.com/live/2025/03/21/world/heathrow-airport-power-outage-fire",
  "language": "en",
  "image": "https://static01.nyt.com/images/2025/03/21/multimedia/21vid-heathrow-closure-package-cover-zqhj/21vid-heathrow-closure-package-cover-zqhj-superJumbo.jpg",
  "keywords": ["airport", "heathrow", "power outage", "london"],
  "summary": "Heathrow Airport in London resumed some flight departures and arrivals late Friday...",
  "extractorEngine": "newspaper4k"
}
```
```json
{
  "title": "Flights Resume at Heathrow After Fire Forced Its Closure",
  "text": "Flights Resume at Heathrow After Fire Forced Its Closure\nThe cause of a blaze that knocked out power to one of the world's busiest airports was under investigation...",
  "url": "https://www.nytimes.com/live/2025/03/21/world/heathrow-airport-power-outage-fire",
  "language": "en",
  "categories": ["world", "europe"],
  "tags": ["heathrow", "airport", "power outage", "london"],
  "extractorEngine": "trafilatura"
}
```

```json
{
  "title": "Flights Resume at Heathrow After Fire Forced Its Closure - The New York Times",
  "text": "SKIP ADVERTISEMENT\nFlights Resume at Heathrow After Fire Forced Its Closure\nThe cause of a blaze that knocked out power to one of the world's busiest airports was under investigation...",
  "url": "https://www.nytimes.com/live/2025/03/21/world/heathrow-airport-power-outage-fire",
  "textDensity": 0.85,
  "markupToTextRatio": 0.32,
  "extractorUsed": "ArticleExtractor",
  "extractorEngine": "boilerpy3"
}
```

```json
{
  "title": "Flights Resume at Heathrow After Fire Forced Its Closure",
  "description": "The cause of a blaze that knocked out power to one of the world's busiest airports was under investigation.",
  "text": "Heathrow Airport in London resumed some flight departures and arrivals late Friday as one of the world's busiest air travel hubs began to rumble back to life...",
  "image": "https://static01.nyt.com/images/2025/03/21/multimedia/21vid-heathrow-closure-package-cover-zqhj/21vid-heathrow-closure-package-cover-zqhj-superJumbo.jpg",
  "keywords": ["heathrow", "airport", "power outage", "london"],
  "extractorEngine": "goose3"
}
```

```json
{
  "text": "Flights Resume at Heathrow After Fire Forced Its Closure\nThe cause of a blaze that knocked out power to one of the world's busiest airports was under investigation...",
  "url": "https://www.nytimes.com/live/2025/03/21/world/heathrow-airport-power-outage-fire",
  "paragraphsCount": 15,
  "languageUsed": "English",
  "extractorEngine": "justext"
}
```

```json
{
  "title": "Flights Resume at Heathrow After Fire Forced Its Closure",
  "articleHtml": "<div><p>Heathrow Airport in London resumed some flight departures and arrivals late Friday...</p></div>",
  "text": "# Flights Resume at Heathrow After Fire Forced Its Closure\n\nHeathrow Airport in London resumed some flight departures and arrivals late Friday...",
  "url": "https://www.nytimes.com/live/2025/03/21/world/heathrow-airport-power-outage-fire",
  "extractorEngine": "article-parser"
}
```

```json
{
  "title": "Flights Resume at Heathrow After Fire Forced Its Closure",
  "description": "The cause of a blaze that knocked out power to one of the world's busiest airports was under investigation.",
  "text": "Heathrow Airport in London resumed some flight departures and arrivals late Friday...",
  "author": ["Michael Levenson", "Andrew Das"],
  "publishedDate": "2025-03-21T04:09:20",
  "url": "https://www.nytimes.com/live/2025/03/21/world/heathrow-airport-power-outage-fire",
  "language": "en",
  "image": "https://static01.nyt.com/images/2025/03/21/multimedia/21vid-heathrow-closure-package-cover-zqhj/21vid-heathrow-closure-package-cover-zqhj-superJumbo.jpg",
  "extractorEngine": "news-please"
}
```
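The sample outputs differ by engine: some have no `title`, one reports `languageUsed` instead of `language`, and keyword-like data appears as either `keywords` or `tags`. If results from several engines feed one pipeline, a small normalization step may help. The sketch below is an illustration only: `normalize` is a hypothetical helper, its key names are based solely on the samples shown here, and treating `tags` as a fallback for `keywords` is an assumption.

```python
from typing import Any


def normalize(record: dict[str, Any]) -> dict[str, Any]:
    """Map one engine-specific result onto a common set of keys.

    Fields an engine does not provide come back as None or an empty list.
    """
    author = record.get("author") or []
    if isinstance(author, str):
        # Defensive: accept a single author given as a plain string.
        author = [author]
    return {
        "engine": record.get("extractorEngine"),
        "url": record.get("url"),
        "title": record.get("title"),
        "text": record.get("text", ""),
        "authors": list(author),
        # The justext sample uses "languageUsed" (e.g. "English"), not a language code.
        "language": record.get("language") or record.get("languageUsed"),
        # Assumption: "tags" is an acceptable stand-in when "keywords" is absent.
        "keywords": record.get("keywords") or record.get("tags") or [],
    }


if __name__ == "__main__":
    sample = {
        "text": "Flights Resume at Heathrow After Fire Forced Its Closure...",
        "paragraphsCount": 15,
        "languageUsed": "English",
        "extractorEngine": "justext",
    }
    print(normalize(sample))
```

Language values are passed through unchanged, so a mix of codes such as `en` and names such as `English` would still need reconciling downstream.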
Yes, if you're scraping publicly available data for personal or internal use. Always review the website's Terms of Service before large-scale use or redistribution.
No. This is a no-code tool: enter the start URLs, choose an extractor engine, and run the scraper directly from your dashboard or Apify actor page.
It extracts article titles, body text, authors, publish dates, keywords, and other metadata, depending on the extractor engine you choose. You can export all of it to Excel or JSON.
Yes, you can scrape multiple pages by listing them in `startUrls`, and you can adjust the extractor engine, headers, proxy, and retry behavior through the input settings.
You can use the Try Now button on this page to go to the scraper. You'll be guided to enter one or more start URLs and get structured results. No setup needed!