Extract full article text and metadata from popular news sites like The New York Times, AP News, Reuters, CNBC, NPR, and Wired. Scrape thousands of articles in just a few minutes.
Fast News Scraper extracts full article text from select news and content websites with a focus on speed. It uses private APIs where available and only makes plain HTTP requests. This won't work for every website, but with a little ingenuity, it can work in a surprising number of cases. Thousands of full articles can be pulled in just minutes.
In addition to the full article text, Fast News Scraper also retrieves various pieces of metadata for each article. The full output is detailed below.
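Fast News Scraper currently supports scraping articles from the following websites:
- The New York Times
- The Washington Post
- Reuters
- CNN
- Wired
- CNBC
- AP News
- NBC News
- NPR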
Additional websites will be added over time. If there's a website you'd like to see supported, go to the Issues tab and create a new issue.
There are a variety of reasons why scraping full news articles is useful:
Fast News Scraper works by using the built-in search functionality on the website you want to scrape. You can search by query and return results based on relevance or date where supported.
Here are all the supported input fields. For more details, see the Input tab.
Field | Type | Description | Default value |
---|---|---|---|
site | string | The site to scrape. Must be one of the supported sites. | reuters.com |
query | string | The query term used to search the selected site. Not all sites support queries, and only some sites allow an empty query. | artificial intelligence |
sort | string | The order in which articles are returned. Must be either date or relevance. Not all websites support both. | date |
maxItems | number | The approximate maximum number of items that will be returned by a run. The actual number returned may be slightly higher or lower. | 500 |
datasetName | string | If this field is present, a named dataset will be used. This is useful for appending results from multiple runs. | null |
requestQueueName | string | If this field is present, a named request queue will be used. This allows you to avoid scraping the same content across multiple runs. | null |
proxy | object | The proxy configuration to use. This field is required. | { "useApifyProxy": true } |
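As a rough illustration of how these fields fit together, the sketch below builds a run input from the defaults in the table and applies a few basic checks. The `build_input` helper and its validation rules are hypothetical and written only for this example; the actor itself may validate input differently.

```python
import copy
import json

# Defaults copied from the input table above.
DEFAULT_INPUT = {
    "site": "reuters.com",
    "query": "artificial intelligence",
    "sort": "date",
    "maxItems": 500,
    "datasetName": None,
    "requestQueueName": None,
    "proxy": {"useApifyProxy": True},
}

ALLOWED_SORTS = {"date", "relevance"}


def build_input(**overrides):
    """Return a run input: the table defaults with `overrides` applied.

    Hypothetical helper for illustration; not part of the actor.
    """
    unknown = set(overrides) - set(DEFAULT_INPUT)
    if unknown:
        raise ValueError(f"Unknown input fields: {sorted(unknown)}")
    run_input = {**copy.deepcopy(DEFAULT_INPUT), **overrides}
    if run_input["sort"] not in ALLOWED_SORTS:
        raise ValueError("sort must be 'date' or 'relevance'")
    if not run_input["proxy"]:
        raise ValueError("proxy is required")
    # Drop optional fields that are unset so only meaningful values are sent.
    return {key: value for key, value in run_input.items() if value is not None}


if __name__ == "__main__":
    example = build_input(site="wired.com", query="robotics", maxItems=100)
    print(json.dumps(example, indent=2))
```

The resulting dictionary can be pasted into the Input tab as JSON or passed as the run input through an API client.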
The scraped articles will be shown as a dataset which you can find in the Output tab. Note that the output will first be organized as a table for viewing convenience.
You can preview all the fields and choose in which format to download the data you’ve extracted: JSON, CSV, Excel, HTML table, or XML. Below is a sample dataset in JSON format:
    {
      "query": "Nvidia",
      "label": "apnews.com.article",
      "site": "apnews.com",
      "url": "https://apnews.com/article/nvidia-gtc-jensen-huang-ai-457e9260aa2a34c1bbcc07c98b7a0555",
      "title": "Nvidia CEO Jensen Huang unveils new Rubin AI chips at GTC 2025",
      "tags": [
        "Business"
      ],
      "description": "Nvidia founder Jensen Huang kicked off the company’s artificial intelligence developer conference, on Tuesday by telling a crowd of thousands that AI is going through “an inflection point.”",
      "image": "https://dims.apnews.com/dims4/default/1110a1f/2147483647/strip/true/crop/4659x2621+0+243/resize/1440x810!/quality/90/?url=https%3A%2F%2Fassets.apnews.com%2F1c%2Fa7%2Fbb9db252004b299235ec619feb7b%2F227eb1f572f14664b6ea05d276e07359",
      "author": "SARAH PARVINI",
      "published": "2025-03-18T18:35:20",
      "updated": "2025-03-18T18:35:20",
      "content": "Nvidia founder Jensen Huang kicked off the company’s artificial intelligence developer conference on Tuesday by telling a crowd of thousands that AI is going through “an inflection point.”\n\nAt GTC 2025 — dubbed the “Super Bowl of AI” — Huang focused his keynote on the company’s advancements in AI and his predictions for how the industry will move over the next few years. Demand for GPUs from the top four cloud service providers is surging, he said, adding that... (truncated)"
    }
Note: Some fields will be blank, empty, or null depending on the website and article.
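Because fields can be blank, empty, or null, downstream code usually benefits from normalizing each item before use. The following is a minimal sketch, assuming items shaped like the sample above; `normalize_article` is a hypothetical helper, and the timestamp parsing assumes the ISO 8601 format shown in the sample.

```python
from datetime import datetime


def normalize_article(item):
    """Return a cleaned copy of one dataset item.

    Blank or missing strings become None, missing tags become an empty
    list, and timestamps are parsed when they are valid ISO 8601 strings.
    """

    def text(key):
        value = item.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip()
        return None

    def timestamp(key):
        value = text(key)
        if value is None:
            return None
        try:
            return datetime.fromisoformat(value)
        except ValueError:
            # Unexpected date format: treat it as missing rather than fail.
            return None

    return {
        "site": text("site"),
        "url": text("url"),
        "title": text("title"),
        "author": text("author"),
        "tags": list(item.get("tags") or []),
        "published": timestamp("published"),
        "updated": timestamp("updated"),
        "content": text("content"),
    }
```

Applied to a downloaded JSON export, this gives every record the same keys, so later steps do not need to check for missing fields.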
The article extraction rate for each supported website differs. Using the default settings, here's a rough idea of how quickly you can scrape full articles using Fast News Scraper based on some test runs:
Note: All runs listed below used Datacenter proxies unless otherwise noted.
Site | Articles | Time | Rate | Notes |
---|---|---|---|---|
reuters.com | 1,999 | 4m 17s | 467 articles/minute | |
cnn.com | 1,586 | 2m 03s | 774 articles/minute | |
wired.com | 1,882 | 4m 25s | 426 articles/minute | |
nytimes.com | 924 | 4m 47s | 193 articles/minute | Residential proxies |
washingtonpost.com | 290 | 4m 54s | 59 articles/minute | |
cnbc.com | 645 | 3m 19s | 195 articles/minute | |
apnews.com | 621 | 3m 01s | 206 articles/minute | |
nbcnews.com | 965 | 2m 26s | 397 articles/minute | |
npr.com | 980 | 1m 56s | 507 articles/minute | |
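The Rate column is simply the article count divided by the elapsed time in minutes. The sketch below reproduces that arithmetic and uses a rate to estimate how long a larger job might take. The function names are hypothetical, and the estimate assumes throughput stays constant, which may not hold given per-search result limits and proxy behavior.

```python
import re


def parse_duration(text):
    """Convert a duration such as '4m 17s' into seconds."""
    match = re.fullmatch(r"\s*(?:(\d+)m)?\s*(?:(\d+)s)?\s*", text)
    if not match or not any(match.groups()):
        raise ValueError(f"Unrecognized duration: {text!r}")
    minutes, seconds = (int(group) if group else 0 for group in match.groups())
    return minutes * 60 + seconds


def articles_per_minute(articles, duration):
    """Rate as shown in the table: articles divided by minutes elapsed."""
    return articles / (parse_duration(duration) / 60)


def estimated_minutes(target_articles, rate_per_minute):
    """Rough run time for a target article count at a constant rate."""
    return target_articles / rate_per_minute


if __name__ == "__main__":
    rate = articles_per_minute(1999, "4m 17s")  # the reuters.com row
    print(round(rate))
    print(round(estimated_minutes(10000, rate), 1))
```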
The New York Times is a daily newspaper based in New York City that is widely regarded as one of the most respected and authoritative sources of news and information in the world. Founded in 1851, The Times has a long history of journalistic excellence, having won 127 Pulitzer Prizes, more than any other newspaper. Known for its in-depth reporting and thoughtful analysis, The Times covers a wide range of topics, including national and international news, politics, business, culture, and more.
The New York Times locks articles behind a paywall, only allowing free users to access a limited number of articles per month. Fast News Scraper gets around this, providing access to the full text of New York Times articles.
To extract New York Times articles:
1. Set `site` to nytimes.com.
2. If `query` is omitted or left empty, all New York Times content will be returned. A non-empty `query` will use the website's search functionality.
3. Set `sort` to either date or relevance. If `sort` is omitted, articles will be returned by date.

Each New York Times search will only return a maximum of ~1,000 articles.
Note: Using Residential proxies is recommended for scraping New York Times articles to avoid getting blocked and to ensure that articles are not dropped.
Note: Some types of content, including live news content and any articles that live on a subdomain, are skipped.
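For reference, a New York Times input following the steps and notes above might look like the sketch below. The `apifyProxyGroups` key and the `RESIDENTIAL` group name are assumptions based on the usual Apify proxy configuration format, so confirm them against the Input tab before relying on them.

```python
import json

nytimes_input = {
    "site": "nytimes.com",
    "query": "",        # empty query: all New York Times content
    "sort": "date",
    "maxItems": 1000,   # a single search returns at most ~1,000 articles
    "proxy": {
        "useApifyProxy": True,
        # Assumption: standard Apify group name for Residential proxies.
        "apifyProxyGroups": ["RESIDENTIAL"],
    },
}

print(json.dumps(nytimes_input, indent=2))
```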
The Washington Post is a major American daily newspaper published in Washington, D.C. Founded in 1877, it is one of the oldest and most respected newspapers in the United States. Known for its in-depth coverage of national politics, The Post has won numerous Pulitzer Prizes for its investigative reporting, including its coverage of the Watergate scandal in the 1970s. Today, The Washington Post is a leading source of news and opinion on politics, business, sports, and culture, with a print and online circulation of millions.
The Washington Post locks articles behind a paywall, only allowing free users to access a limited number of articles per month. Fast News Scraper gets around this, providing access to the full text of Washington Post articles.
To extract Washington Post articles:
1. Set `site` to washingtonpost.com.
2. Set `query` to a non-empty string. The Washington Post does not allow empty queries.
3. The `sort` field will be ignored.

The default Datacenter proxies work just fine with The Washington Post.
Note: Each query generally only returns a few hundred articles.
Reuters is a leading international news agency that provides comprehensive and unbiased coverage of global news, including politics, business, finance, technology, and more. Founded in 1851, Reuters is one of the oldest and most respected news agencies in the world, with a reputation for accuracy, speed, and independence. Reuters.com offers real-time news coverage, in-depth analysis, and commentary on global events, as well as video and photography from around the world.
Reuters requires registration to view unlimited articles, only allowing unregistered users to access a limited number of articles per month. Fast News Scraper gets around this, providing access to the full text of Reuters articles.
To extract Reuters articles:
1. Set `site` to reuters.com.
2. Set `query` to a non-empty string. Reuters does not allow empty queries.
3. Set `sort` to either date or relevance. If `sort` is omitted, articles will be returned by date.

The default Datacenter proxies work just fine with Reuters.
CNN (Cable News Network) is a 24-hour cable news channel that provides continuous coverage of global news, politics, business, entertainment, and more. Founded in 1980, CNN is one of the most recognized and respected news brands in the world, known for its breaking news coverage, in-depth reporting, and live coverage of major events. CNN.com offers a wide range of news content, including video, articles, and blogs, as well as live streaming of CNN TV programming.
CNN doesn't require registration to view news articles, so scraping the website is relatively straightforward.
To extract CNN articles:
1. Set `site` to cnn.com.
2. If `query` is omitted or left empty, all CNN content will be returned. A non-empty `query` will use the website's search functionality.
3. Set `sort` to either date or relevance. If `sort` is omitted, articles will be returned by date.

The default Datacenter proxies work just fine with CNN.
Note: Some types of content, including video, live news, CNN Underscored, and gallery content, will be skipped. Any articles that are in a special format (interactive, etc.) will likely fail to be extracted.
Wired is a technology-focused news site that provides in-depth coverage of the latest developments in tech, science, and innovation. Founded in 1993, Wired is known for its cutting-edge reporting on emerging trends, gadgets, and ideas that are shaping the future of business, culture, and society. Wired.com features news, analysis, and commentary on topics such as artificial intelligence, cybersecurity, robotics, and more, as well as profiles of innovators and entrepreneurs who are changing the world.
Wired locks articles behind a paywall, only allowing free users to access a limited number of articles per month. Fast News Scraper gets around this, providing access to the full text of Wired articles.
To extract Wired articles:
1. Set `site` to wired.com.
2. If `query` is omitted or left empty, all Wired content will be returned. A non-empty `query` will use the website's search functionality.
3. Set `sort` to either date or relevance. If `sort` is omitted, articles will be returned by date.

The default Datacenter proxies work just fine with Wired.
Note: Sponsored content is skipped.
CNBC, or Consumer News and Business Channel, is a 24-hour cable television network that provides business news and financial information to a global audience. Founded in 1989, CNBC is a leading source of business and market news, offering live coverage of stock markets, economic indicators, and corporate news. The network's programming includes popular shows such as "Squawk Box," "Fast Money," and "Mad Money with Jim Cramer," featuring expert analysis and commentary from experienced journalists and financial experts. CNBC also provides online content, including articles, videos, and podcasts, making it a one-stop shop for investors, business leaders, and anyone interested in staying informed about the world of finance.
CNBC locks its PRO articles behind a paywall. Fast News Scraper gets around this, providing access to the full text of CNBC articles, regardless of whether they're standard articles or PRO articles.
To extract CNBC articles:
1. Set `site` to cnbc.com.
2. Set `query` to a non-empty string. CNBC does not allow empty queries.
3. Set `sort` to either date or relevance. If `sort` is omitted, articles will be returned by date.

The default Datacenter proxies work just fine with CNBC.
Note: Video content is skipped, and some "live update" content will fail to be scraped.
The Associated Press (AP) is a non-profit news cooperative that has been a leading source of factual reporting for over 175 years. Founded in 1846, the AP is one of the largest and most respected news organizations in the world, providing comprehensive coverage of national and international news to thousands of newspapers, television and radio stations, and online media outlets. The AP's website, apnews.com, offers a wealth of news, photos, and videos on a wide range of topics, including politics, business, sports, and entertainment. Visitors to the site can access breaking news, in-depth analysis, and feature stories, as well as watch live video and access a vast archive of AP content. With its commitment to fact-based reporting and impartiality, apnews.com is a trusted source of news and information for people around the world.
To extract AP articles:
1. Set `site` to apnews.com.
2. Set `query` to a non-empty string. AP does not allow empty queries.
3. Set `sort` to either date or relevance. If `sort` is omitted, articles will be returned by date.

The default Datacenter proxies work just fine with AP, although if some articles are getting blocked, you can try Residential proxies.
NBCNews.com is the online news website of NBC News, a leading American news organization that provides comprehensive coverage of national and international news. The site offers a wide range of news, analysis, and features on topics such as politics, business, health, technology, and entertainment. Visitors to the site can access breaking news, in-depth reporting, and investigative journalism, as well as watch live video and access a vast array of multimedia content. NBCNews.com is known for its coverage of major news events, including elections, natural disasters, and global conflicts, and features reporting from NBC News correspondents and anchors, including Rachel Maddow, Lester Holt, and Katy Tur. The site also offers specialized sections, such as NBC News' investigative unit, "Investigations," and "Health," which provides the latest news and information on health and wellness topics.
To extract NBC News articles:
1. Set `site` to nbcnews.com.
2. Set `query` to a non-empty string. NBC News does not allow empty queries.

Note: NBC News does not support sorting by date. Articles will be returned sorted by relevance, and the `sort` parameter will be ignored.

The default Datacenter proxies work just fine with NBC News, although if some articles are getting blocked, you can try Residential proxies.
NPR (National Public Radio) is a non-profit media organization that produces and distributes news, information, and cultural programming to a wide audience through its network of public radio stations and online platforms. NPR's flagship website, npr.org, offers a wealth of news, analysis, and features on a wide range of topics, including politics, science, arts, and culture. Visitors to the site can access NPR's signature news programs, such as "Morning Edition" and "All Things Considered," as well as original reporting and storytelling from NPR's correspondents and producers. The site also features a diverse array of podcasts, including "How I Built This," "TED Radio Hour," and "Planet Money," which offer in-depth explorations of topics such as business, technology, and global issues. With its commitment to in-depth reporting and nuanced storytelling, npr.org is a trusted source of news and information for millions of Americans.
To extract NPR articles:
1. Set `site` to npr.com.
2. Set `query` to a non-empty string. NPR does not allow empty queries.
3. Set `sort` to either date or relevance. If `sort` is omitted, articles will be returned by date.

The default Datacenter proxies work just fine with NPR, although if some articles are getting blocked, you can try Residential proxies.
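The per-site rules described in the sections above can be summarized in a small lookup table. This is a sketch for checking an input before starting a run; `SITE_RULES` and `check_input` are hypothetical names, the site values are copied as written in this README, and the actor's own validation may differ.

```python
BOTH = ("date", "relevance")

# empty_query: whether the site accepts an omitted or empty query.
# sorts: supported sort values; an empty tuple means sort is ignored.
SITE_RULES = {
    "nytimes.com":        {"empty_query": True,  "sorts": BOTH},
    "washingtonpost.com": {"empty_query": False, "sorts": ()},
    "reuters.com":        {"empty_query": False, "sorts": BOTH},
    "cnn.com":            {"empty_query": True,  "sorts": BOTH},
    "wired.com":          {"empty_query": True,  "sorts": BOTH},
    "cnbc.com":           {"empty_query": False, "sorts": BOTH},
    "apnews.com":         {"empty_query": False, "sorts": BOTH},
    "nbcnews.com":        {"empty_query": False, "sorts": ()},
    "npr.com":            {"empty_query": False, "sorts": BOTH},
}


def check_input(site, query="", sort=None):
    """Return a list of notes about the input; an empty list means it fits."""
    rules = SITE_RULES.get(site)
    if rules is None:
        return [f"{site} is not a supported site"]
    notes = []
    if not query.strip() and not rules["empty_query"]:
        notes.append(f"{site} requires a non-empty query")
    if sort is not None:
        if not rules["sorts"]:
            notes.append(f"{site} ignores the sort field")
        elif sort not in rules["sorts"]:
            notes.append("sort must be 'date' or 'relevance'")
    return notes


if __name__ == "__main__":
    print(check_input("reuters.com"))
    print(check_input("nbcnews.com", query="ai", sort="date"))
```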
Let's say you want to schedule Fast News Scraper to run once a week and pull any new articles from wired.com that you haven't already extracted. The key is to use a named dataset and request queue, which can be done using the `datasetName` and `requestQueueName` input fields. Each time you run Fast News Scraper, only articles that have not yet been scraped will be processed, and the scraper will automatically stop once it's reached a point where the only articles it's finding are articles you've already scraped. This way you avoid wasting time and money repeatedly scraping the same content.
Learn more about scheduling on the Apify platform.
Note: If you use a named dataset, data will be pushed to the named dataset and an unnamed dataset linked to the run. This is a limitation of the Apify platform. You can view the full dataset and request queue by navigating to the Storage page in the Apify console.
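To make the stopping behavior concrete, here is a simplified model of the idea, not the actor's actual implementation. A plain Python set stands in for the named request queue, result pages are assumed to arrive newest first, and `scrape_new_articles` is a hypothetical name.

```python
def scrape_new_articles(pages, seen_urls):
    """Yield URLs that have not been seen before, page by page.

    `pages` is an iterable of lists of article URLs, newest first.
    `seen_urls` is a set standing in for the named request queue; it is
    updated in place so the next run remembers what was processed.
    Iteration stops at the first page that contains nothing new.
    """
    for page in pages:
        found_new = False
        for url in page:
            if url in seen_urls:
                continue
            seen_urls.add(url)
            found_new = True
            yield url
        if not found_new:
            break


if __name__ == "__main__":
    queue = set()
    # First run: everything is new.
    print(list(scrape_new_articles([["a", "b"], ["c"]], queue)))
    # Second run: only "d" is new, and the run stops at the second page.
    print(list(scrape_new_articles([["d", "a"], ["b", "c"], ["e"]], queue)))
```

In a real scheduled run, the same effect comes from passing the same `datasetName` and `requestQueueName` values on every run.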
Extracting articles is legal, as you are scraping publicly available content. Please be aware that most articles are protected by copyright laws. Before you publish extracted articles anywhere, check the terms of use of the scraped website. In other words: Don't be a jerk.
Yes, if you're scraping publicly available data for personal or internal use. Always review the website's Terms of Service before large-scale use or redistribution.
No. This is a no-code tool: just choose a site, enter a search query, and run the scraper directly from your dashboard or the Apify Actor page.
It extracts each article's title, author, description, tags, image, publication and update dates, and full text. You can export all of it to Excel or JSON.
Yes, you can scrape multiple pages of search results and refine them by site, query, and sort order, depending on the input settings you use.
You can use the Try Now button on this page to go to the scraper. You’ll be guided to input a search term and get structured results. No setup needed!