Fast News Scraper

Extract full article text and metadata from popular news sites like The New York Times, AP News, Reuters, CNBC, NPR, and Wired. Scrape thousands of articles in just a few minutes.

Fast News Scraper extracts full article text from select news and content websites with a focus on speed. It uses private APIs where available and only makes plain HTTP requests. This won't work for every website, but with a little ingenuity, it can work in a surprising number of cases. Thousands of full articles can be pulled in just minutes.

In addition to the full article text, Fast News Scraper also retrieves various pieces of metadata for each article. The full output is detailed below.

What news websites are supported?

Fast News Scraper currently supports scraping articles from the following websites:

  • The New York Times (nytimes.com)
  • The Washington Post (washingtonpost.com)
  • CNN (cnn.com)
  • Reuters (reuters.com)
  • Wired (wired.com)
  • CNBC (cnbc.com)
  • Associated Press (apnews.com)
  • NBC News (nbcnews.com)
  • NPR (npr.com)

Additional websites will be added over time. If there's a website you'd like to see supported, go to the Issues tab and create a new issue.

Why scrape full news articles?

There are a variety of reasons why scraping full news articles is useful:

  1. Media monitoring: Scrape news articles to track mentions of your company, competitors, or industry-related keywords, allowing you to stay on top of your online reputation and market trends.
  2. Research and analysis: Collect and analyze news articles to identify patterns, trends, and insights on various topics, such as politics, economics, or social issues.
  3. Sentiment analysis: Analyze news articles to determine the sentiment around a particular topic, company, or individual, helping you understand public opinion and make informed decisions.
  4. Event detection: Scrape news articles to detect and track events, such as natural disasters, protests, or product launches, allowing you to respond quickly and effectively.
  5. Topic modeling: Use scraped news articles to identify underlying topics and themes, enabling you to understand the broader context and relationships between different news stories.
  6. Entity extraction: Extract specific entities, such as people, organizations, and locations, from news articles to build databases, create profiles, or track relationships.
  7. News recommendation: Scrape news articles to build personalized news recommendation systems, suggesting relevant content to users based on their interests and preferences.
  8. Fake news detection: Analyze news articles to identify potential fake news stories, helping to combat misinformation and promote fact-based journalism.
  9. Historical research: Scrape news articles to create archives of historical events, allowing researchers and scholars to study and analyze past events and trends.
  10. Business intelligence: Collect and analyze news articles to gather competitive intelligence, track market trends, and identify business opportunities.
  11. Content generation: Use scraped news articles as inspiration or input for generating new content, such as summaries, abstracts, or even entire articles.
  12. Academic research: Collect and analyze news articles to support academic research in fields like journalism, communication, sociology, and political science.
  13. Data journalism: Scrape news articles to build interactive visualizations, dashboards, and data-driven stories for journalists and researchers.
  14. AI training: AI models require large quantities of training data. News articles can provide a rich source of such data.

Input configuration

Fast News Scraper works by using the built-in search functionality on the website you want to scrape. You can search by query and return results based on relevance or date where supported.

Here are all the supported input fields. For more details, see the Input tab.

| Field | Type | Description | Default value |
|---|---|---|---|
| site | string | The site to scrape. Must be one of the supported sites. | reuters.com |
| query | string | The query term used to search the selected site. Not all sites support queries, and only some sites allow an empty query. | artificial intelligence |
| sort | string | The order in which articles are returned. Must be either date or relevance. Not all websites support both. | date |
| maxItems | number | The approximate maximum number of items returned by a run. The actual number returned may be slightly higher or lower. | 500 |
| datasetName | string | If this field is present, a named dataset is used. This is useful for appending results from multiple runs. | null |
| requestQueueName | string | If this field is present, a named request queue is used. This lets you avoid scraping the same content across multiple runs. | null |
| proxy | object | The proxy configuration to use. This field is required. | { "useApifyProxy": true } |
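
If you start runs programmatically, it can help to check the input before sending it. The sketch below builds an input object with the field names from the table and applies the per-site rules described later in this README (which sites accept an empty query, and which ignore sort). The helper name build_input and the site sets are hypothetical and transcribed by hand, so they may drift from the actor's actual validation.

```python
from typing import Optional

# Hypothetical sets transcribed from this README, not read from the actor itself.
SUPPORTED_SITES = {
    "nytimes.com", "washingtonpost.com", "cnn.com", "reuters.com", "wired.com",
    "cnbc.com", "apnews.com", "nbcnews.com", "npr.com",
}
# Sites whose search accepts an empty query.
EMPTY_QUERY_OK = {"nytimes.com", "cnn.com", "wired.com"}
# Sites that always sort by relevance and ignore the sort field.
RELEVANCE_ONLY = {"washingtonpost.com", "nbcnews.com"}


def build_input(
    site: str = "reuters.com",
    query: str = "artificial intelligence",
    sort: str = "date",
    max_items: int = 500,
    dataset_name: Optional[str] = None,
    request_queue_name: Optional[str] = None,
    proxy: Optional[dict] = None,
) -> dict:
    """Build a run input dict using the documented field names and defaults."""
    if site not in SUPPORTED_SITES:
        raise ValueError(f"unsupported site: {site}")
    if not query and site not in EMPTY_QUERY_OK:
        raise ValueError(f"{site} does not allow an empty query")
    if sort not in ("date", "relevance"):
        raise ValueError("sort must be 'date' or 'relevance'")
    run_input = {
        "site": site,
        "query": query,
        "maxItems": max_items,
        # proxy is required; fall back to the README's default value.
        "proxy": proxy if proxy is not None else {"useApifyProxy": True},
    }
    # Relevance-only sites ignore sort, so this sketch leaves it out for them.
    if site not in RELEVANCE_ONLY:
        run_input["sort"] = sort
    # Named storages default to null, so only send them when set.
    if dataset_name:
        run_input["datasetName"] = dataset_name
    if request_queue_name:
        run_input["requestQueueName"] = request_queue_name
    return run_input
```

With no arguments it reproduces the default values from the table; build_input(site="cnn.com", query="") is accepted, while the same call for reuters.com raises an error.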

Output example

The scraped articles will be shown as a dataset which you can find in the Output tab. Note that the output will first be organized as a table for viewing convenience.

You can preview all the fields and choose in which format to download the data you’ve extracted: JSON, CSV, Excel, HTML table, or XML. Below is a sample dataset in JSON format:

{
	"query": "Nvidia",
	"label": "apnews.com.article",
	"site": "apnews.com",
	"url": "https://apnews.com/article/nvidia-gtc-jensen-huang-ai-457e9260aa2a34c1bbcc07c98b7a0555",
	"title": "Nvidia CEO Jensen Huang unveils new Rubin AI chips at GTC 2025",
	"tags": [
		"Business"
	],
	"description": "Nvidia founder Jensen Huang kicked off the company’s artificial intelligence developer conference, on Tuesday by telling a crowd of thousands that AI is going through “an inflection point.”",
	"image": "https://dims.apnews.com/dims4/default/1110a1f/2147483647/strip/true/crop/4659x2621+0+243/resize/1440x810!/quality/90/?url=https%3A%2F%2Fassets.apnews.com%2F1c%2Fa7%2Fbb9db252004b299235ec619feb7b%2F227eb1f572f14664b6ea05d276e07359",
	"author": "SARAH PARVINI",
	"published": "2025-03-18T18:35:20",
	"updated": "2025-03-18T18:35:20",
	"content": "Nvidia founder Jensen Huang kicked off the company’s artificial intelligence developer conference on Tuesday by telling a crowd of thousands that AI is going through “an inflection point.”\n\nAt GTC 2025 — dubbed the “Super Bowl of AI” — Huang focused his keynote on the company’s advancements in AI and his predictions for how the industry will move over the next few years. Demand for GPUs from the top four cloud service providers is surging, he said, adding that... (truncated)"
}

Note: Some fields will be blank, empty, or null depending on the website and article.
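
Because some fields can be blank, empty, or null, code that consumes the dataset should not assume every field is filled in. Here is a minimal sketch that maps one exported JSON item to a small Python record, treating missing, null, and whitespace-only values the same way. The Article class and parse_article function are hypothetical, and the sketch assumes url and site are always present, as in the sample above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Article:
    """A hypothetical, trimmed-down view of one dataset item."""
    url: str
    site: str
    title: Optional[str] = None
    author: Optional[str] = None
    published: Optional[str] = None
    content: str = ""
    tags: List[str] = field(default_factory=list)


def parse_article(record: Dict[str, Any]) -> Article:
    """Map one exported JSON item to an Article, tolerating blank or null fields."""

    def text(key: str) -> Optional[str]:
        value = record.get(key)
        # Missing keys, None, and whitespace-only strings all become None.
        if isinstance(value, str) and value.strip():
            return value
        return None

    return Article(
        url=record["url"],    # assumed always present
        site=record["site"],  # assumed always present
        title=text("title"),
        author=text("author"),
        published=text("published"),
        content=text("content") or "",
        tags=record.get("tags") or [],
    )
```

A JSON export loaded with json.load is a list of such items, so [parse_article(item) for item in items] yields one Article per scraped article.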

How long does it take to scrape news articles?

The article extraction rate for each supported website differs. Using the default settings, here's a rough idea of how quickly you can scrape full articles using Fast News Scraper based on some test runs:

Note: All runs listed below used Datacenter proxies unless otherwise noted.

| Site | Articles | Time | Rate | Notes |
|---|---|---|---|---|
| reuters.com | 1,999 | 4m 17s | 467 articles/minute | |
| cnn.com | 1,586 | 2m 03s | 774 articles/minute | |
| wired.com | 1,882 | 4m 25s | 426 articles/minute | |
| nytimes.com | 924 | 4m 47s | 193 articles/minute | Residential proxies |
| washingtonpost.com | 290 | 4m 54s | 59 articles/minute | |
| cnbc.com | 645 | 3m 19s | 195 articles/minute | |
| apnews.com | 621 | 3m 01s | 206 articles/minute | |
| nbcnews.com | 965 | 2m 26s | 397 articles/minute | |
| npr.com | 980 | 1m 56s | 507 articles/minute | |
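
The Rate column appears to be the article count divided by the elapsed time in minutes. The sketch below reproduces that arithmetic (for example, 1,999 Reuters articles in 4m 17s works out to about 467 articles per minute) and turns a measured rate into a rough run-time estimate for a given maxItems. Both function names are hypothetical, and real runs will vary with the query, proxy type, and site load.

```python
def articles_per_minute(articles: int, minutes: int, seconds: int) -> int:
    """Articles divided by elapsed minutes, rounded to a whole number."""
    elapsed_minutes = (minutes * 60 + seconds) / 60
    return round(articles / elapsed_minutes)


def estimated_run_seconds(max_items: int, rate_per_minute: float) -> float:
    """Rough run time, assuming a previously measured rate holds for a new run."""
    return max_items / rate_per_minute * 60
```

At the Reuters rate, a default 500-item run would take roughly a minute: estimated_run_seconds(500, 467) is about 64 seconds.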

How to scrape articles from The New York Times (nytimes.com)

The New York Times is a daily newspaper based in New York City that is widely regarded as one of the most respected and authoritative sources of news and information in the world. Founded in 1851, The Times has a long history of journalistic excellence, having won 127 Pulitzer Prizes, more than any other newspaper. Known for its in-depth reporting and thoughtful analysis, The Times covers a wide range of topics, including national and international news, politics, business, culture, and more.

The New York Times locks articles behind a paywall, only allowing free users to access a limited number of articles per month. Fast News Scraper gets around this, providing access to the full text of New York Times articles.

To extract New York Times articles:

  • Set site to nytimes.com.
  • If query is omitted or left empty, all New York Times content will be returned. A non-empty query will use the website's search functionality.
  • Set sort to either date or relevance. If sort is omitted, articles will be returned by date.

Each New York Times search will only return a maximum of ~1,000 articles.

Note: Using Residential proxies is recommended for scraping New York Times articles to avoid getting blocked and to ensure that articles are not dropped.

Note: Some types of content, including live news content and any articles that live on a subdomain, are skipped.
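
A New York Times run with Residential proxies might use an input like the one built below. It assumes the proxy field accepts Apify's standard proxy configuration object, where apifyProxyGroups selects the RESIDENTIAL group. The nytimes_input helper and its cap on maxItems are choices made for this sketch, based on the ~1,000-article limit per search mentioned above.

```python
# The README says a single New York Times search returns at most ~1,000 articles.
NYT_MAX_PER_SEARCH = 1000


def nytimes_input(query: str = "", sort: str = "date", max_items: int = 500) -> dict:
    """Hypothetical helper: an nytimes.com run input that uses Residential proxies."""
    return {
        "site": "nytimes.com",
        "query": query,  # an empty query returns all New York Times content
        "sort": sort,    # "date" or "relevance"
        # This sketch caps maxItems, since one search will not return more than ~1,000.
        "maxItems": min(max_items, NYT_MAX_PER_SEARCH),
        # Assumes the standard Apify proxy object; RESIDENTIAL selects Residential proxies.
        "proxy": {"useApifyProxy": True, "apifyProxyGroups": ["RESIDENTIAL"]},
    }
```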

How to scrape articles from The Washington Post (washingtonpost.com)

The Washington Post is a major American daily newspaper published in Washington, D.C. Founded in 1877, it is one of the oldest and most respected newspapers in the United States. Known for its in-depth coverage of national politics, The Post has won numerous Pulitzer Prizes for its investigative reporting, including its coverage of the Watergate scandal in the 1970s. Today, The Washington Post is a leading source of news and opinion on politics, business, sports, and culture, with a print and online circulation of millions.

The Washington Post locks articles behind a paywall, only allowing free users to access a limited number of articles per month. Fast News Scraper gets around this, providing access to the full text of Washington Post articles.

To extract Washington Post articles:

  • Set site to washingtonpost.com.
  • Set query to a non-empty string. The Washington Post does not allow empty queries.
  • Articles will always be returned sorted by relevance. The sort field will be ignored.

The default Datacenter proxies work just fine with The Washington Post.

Note: Each query generally only returns a few hundred articles.
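
Since each Washington Post query tends to return only a few hundred articles, one way to collect more is to run several related queries and append them all to one named dataset via datasetName. This sketch generates one input per query. The helper name and the dataset name are hypothetical, sort is omitted because it is ignored for this site, and the proxy value is the README's default.

```python
from typing import List


def washington_post_inputs(queries: List[str], dataset_name: str) -> List[dict]:
    """Hypothetical helper: one run input per query, all appending to one named dataset."""
    inputs = []
    for query in queries:
        query = query.strip()
        if not query:
            # The Washington Post does not allow empty queries.
            raise ValueError("washingtonpost.com requires a non-empty query")
        inputs.append({
            "site": "washingtonpost.com",
            "query": query,
            # No "sort" key: results always come back sorted by relevance.
            "datasetName": dataset_name,
            "proxy": {"useApifyProxy": True},  # the README's default proxy value
        })
    return inputs
```

Each input would be passed to its own run; because they share datasetName, the results accumulate in one dataset.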

How to scrape articles from Reuters (reuters.com)

Reuters is a leading international news agency that provides comprehensive and unbiased coverage of global news, including politics, business, finance, technology, and more. Founded in 1851, Reuters is one of the oldest and most respected news agencies in the world, with a reputation for accuracy, speed, and independence. Reuters.com offers real-time news coverage, in-depth analysis, and commentary on global events, as well as video and photography from around the world.

Reuters requires registration to view unlimited articles, only allowing unregistered users to access a limited number of articles per month. Fast News Scraper gets around this, providing access to the full text of Reuters articles.

To extract Reuters articles:

  • Set site to reuters.com.
  • Set query to a non-empty string. Reuters does not allow empty queries.
  • Set sort to either date or relevance. If sort is omitted, articles will be returned by date.

The default Datacenter proxies work just fine with Reuters.

How to scrape articles from CNN (cnn.com)

CNN (Cable News Network) is a 24-hour cable news channel that provides continuous coverage of global news, politics, business, entertainment, and more. Founded in 1980, CNN is one of the most recognized and respected news brands in the world, known for its breaking news coverage, in-depth reporting, and live coverage of major events. CNN.com offers a wide range of news content, including video, articles, and blogs, as well as live streaming of CNN TV programming.

CNN doesn't require registration to view news articles, so scraping the website is relatively straightforward.

To extract CNN articles:

  • Set site to cnn.com.
  • If query is omitted or left empty, all CNN content will be returned. A non-empty query will use the website's search functionality.
  • Set sort to either date or relevance. If sort is omitted, articles will be returned by date.

The default Datacenter proxies work just fine with CNN.

Note: Some types of content, including video, live news, CNN Underscored, and gallery content, will be skipped. Any articles that are in a special format (interactive, etc.) will likely fail to be extracted.

How to scrape articles from Wired (wired.com)

Wired is a technology-focused news site that provides in-depth coverage of the latest developments in tech, science, and innovation. Founded in 1993, Wired is known for its cutting-edge reporting on emerging trends, gadgets, and ideas that are shaping the future of business, culture, and society. Wired.com features news, analysis, and commentary on topics such as artificial intelligence, cybersecurity, robotics, and more, as well as profiles of innovators and entrepreneurs who are changing the world.

Wired locks articles behind a paywall, only allowing free users to access a limited number of articles per month. Fast News Scraper gets around this, providing access to the full text of Wired articles.

To extract Wired articles:

  • Set site to wired.com.
  • If query is omitted or left empty, all Wired content will be returned. A non-empty query will use the website's search functionality.
  • Set sort to either date or relevance. If sort is omitted, articles will be returned by date.

The default Datacenter proxies work just fine with Wired.

Note: Sponsored content is skipped.

How to scrape articles from CNBC (cnbc.com)

CNBC, or Consumer News and Business Channel, is a 24-hour cable television network that provides business news and financial information to a global audience. Founded in 1989, CNBC is a leading source of business and market news, offering live coverage of stock markets, economic indicators, and corporate news. The network's programming includes popular shows such as "Squawk Box," "Fast Money," and "Mad Money with Jim Cramer," featuring expert analysis and commentary from experienced journalists and financial experts. CNBC also provides online content, including articles, videos, and podcasts, making it a one-stop shop for investors, business leaders, and anyone interested in staying informed about the world of finance.

CNBC locks its PRO articles behind a paywall. Fast News Scraper gets around this, providing access to the full text of CNBC articles, regardless of whether they're standard articles or PRO articles.

To extract CNBC articles:

  • Set site to cnbc.com.
  • Set query to a non-empty string. CNBC does not allow empty queries.
  • Set sort to either date or relevance. If sort is omitted, articles will be returned by date.

The default Datacenter proxies work just fine with CNBC.

Note: Video content is skipped, and some "live update" content will fail to be scraped.

How to scrape articles from Associated Press (apnews.com)

The Associated Press (AP) is a non-profit news cooperative that has been a leading source of factual reporting for over 175 years. Founded in 1846, the AP is one of the largest and most respected news organizations in the world, providing comprehensive coverage of national and international news to thousands of newspapers, television and radio stations, and online media outlets. The AP's website, apnews.com, offers a wealth of news, photos, and videos on a wide range of topics, including politics, business, sports, and entertainment. Visitors to the site can access breaking news, in-depth analysis, and feature stories, as well as watch live video and access a vast archive of AP content. With its commitment to fact-based reporting and impartiality, apnews.com is a trusted source of news and information for people around the world.

To extract AP articles:

  • Set site to apnews.com.
  • Set query to a non-empty string. AP does not allow empty queries.
  • Set sort to either date or relevance. If sort is omitted, articles will be returned by date.

The default Datacenter proxies work just fine with AP, although if some articles are getting blocked, you can try Residential proxies.

How to scrape articles from NBC News (nbcnews.com)

NBCNews.com is the online news website of NBC News, a leading American news organization that provides comprehensive coverage of national and international news. The site offers a wide range of news, analysis, and features on topics such as politics, business, health, technology, and entertainment. Visitors to the site can access breaking news, in-depth reporting, and investigative journalism, as well as watch live video and access a vast array of multimedia content. NBCNews.com is known for its coverage of major news events, including elections, natural disasters, and global conflicts, and features reporting from NBC News correspondents and anchors, including Rachel Maddow, Lester Holt, and Katy Tur. The site also offers specialized sections such as "Investigations," from NBC News' investigative unit, and "Health," which provides the latest news and information on health and wellness topics.

To extract NBC News articles:

  • Set site to nbcnews.com.
  • Set query to a non-empty string. NBC News does not allow empty queries.

Note: NBC News does not support sorting by date. Articles will be returned sorted by relevance and the sort parameter will be ignored.

The default Datacenter proxies work just fine with NBC News, although if some articles are getting blocked, you can try Residential proxies.

How to scrape articles from NPR (npr.com)

NPR (National Public Radio) is a non-profit media organization that produces and distributes news, information, and cultural programming to a wide audience through its network of public radio stations and online platforms. NPR's flagship website, npr.org, offers a wealth of news, analysis, and features on a wide range of topics, including politics, science, arts, and culture. Visitors to the site can access NPR's signature news programs, such as "Morning Edition" and "All Things Considered," as well as original reporting and storytelling from NPR's correspondents and producers. The site also features a diverse array of podcasts, including "How I Built This," "TED Radio Hour," and "Planet Money," which offer in-depth explorations of topics such as business, technology, and global issues. With its commitment to in-depth reporting and nuanced storytelling, npr.org is a trusted source of news and information for millions of Americans.

To extract NPR articles:

  • Set site to npr.com.
  • Set query to a non-empty string. NPR does not allow empty queries.
  • Set sort to either date or relevance. If sort is omitted, articles will be returned by date.

The default Datacenter proxies work just fine with NPR, although if some articles are getting blocked, you can try Residential proxies.

How to pull only new articles

Let's say you want to schedule Fast News Scraper to run once a week and pull any new articles from wired.com that you haven't already extracted. The key is to use a named dataset and request queue, which can be done using the datasetName and requestQueueName input fields. Each time you run Fast News Scraper, only articles that have not yet been scraped will be processed, and the scraper will automatically stop once it's reached a point where the only articles it's finding are articles you've already scraped. This way you avoid wasting time and money repeatedly scraping the same content.

Learn more about scheduling on the Apify platform.

Note: If you use a named dataset, data will be pushed to the named dataset and an unnamed dataset linked to the run. This is a limitation of the Apify platform. You can view the full dataset and request queue by navigating to the Storage page in the Apify console.
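
The mechanism can be pictured with a small local model. The function below is not the actor's implementation; it is a hypothetical sketch of the idea, assuming results arrive newest first: URLs already in a seen set (standing in for the named request queue) are skipped, and collection stops after a run of consecutive already-seen URLs. The threshold stop_after_seen and the storage names in WEEKLY_WIRED_INPUT are illustrative assumptions.

```python
from typing import Iterable, List, Set

# Illustrative weekly input for wired.com; the storage names are hypothetical.
WEEKLY_WIRED_INPUT = {
    "site": "wired.com",
    "query": "",  # empty query: all Wired content
    "sort": "date",
    "datasetName": "wired-articles",
    "requestQueueName": "wired-requests",
    "proxy": {"useApifyProxy": True},
}


def new_articles(urls: Iterable[str], seen: Set[str], stop_after_seen: int = 20) -> List[str]:
    """Collect unseen URLs from a newest-first listing.

    `seen` stands in for the named request queue; `stop_after_seen` is a
    hypothetical threshold of consecutive already-seen URLs.
    """
    fresh: List[str] = []
    streak = 0
    for url in urls:
        if url in seen:
            streak += 1
            if streak >= stop_after_seen:
                break  # only previously scraped articles are turning up now
            continue
        streak = 0
        seen.add(url)  # remember it so the next scheduled run skips it
        fresh.append(url)
    return fresh
```

Calling new_articles twice over the same listing with the same seen set returns an empty list the second time, which is the behavior the named request queue provides across scheduled runs.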

Extracting articles is legal, as you are scraping publicly available content. Please be aware that most articles are protected by copyright laws. Before you publish extracted articles anywhere, check the terms of use of the scraped website. In other words: Don't be a jerk.

News icons created by Freepik - Flaticon

Frequently Asked Questions

Is it legal to scrape news articles or other public data?

Yes, if you're scraping publicly available data for personal or internal use. Always review the target website's Terms of Service before large-scale use or redistribution.

Do I need to code to use this scraper?

No. This is a no-code tool: just choose a site, enter a query, and run the scraper directly from your dashboard or the Apify actor page.

What data does it extract?

It extracts article titles, URLs, authors, publication and update dates, descriptions, tags, images, and the full article text. You can export all of it to JSON, CSV, Excel, HTML table, or XML.

Can I control how many articles are returned or how they are sorted?

Yes. Use maxItems to set the approximate number of articles per run, and sort to order results by date or relevance on sites that support it.

How do I get started?

You can use the Try Now button on this page to go to the scraper. You’ll be guided to input a search term and get structured results. No setup needed!