PubMed Search Scraper

Scrape research papers and academic articles from PubMed based on search terms. Extract comprehensive article metadata including titles, authors, citations, abstracts, and more. Perfect for medical research and literature reviews.

PubMed Search Scraper 🔬

📋 Overview

Extract academic articles and research papers from PubMed, the world's leading database of biomedical literature. This actor allows you to scrape detailed information from search results based on your keywords.

✨ Features

  • 🔎 Scrape articles based on custom search queries
  • 📑 Extract comprehensive article metadata:
    • Title and article ID
    • Full & short author lists
    • Complete & abbreviated journal citations
    • PMID (PubMed Identifier)
    • Article tags and types
    • Full & truncated abstracts
    • Social sharing links
  • ⚡ High-performance scrolling pagination
  • 🛡️ Built-in anti-blocking measures
  • 🎯 Configurable maximum items limit

💡 Use Cases

  • Medical research and literature reviews
  • Academic meta-analyses
  • Tracking research trends
  • Building research databases
  • Bibliometric analysis
  • Scientific data mining

📤 Output

The actor outputs detailed article information in JSON format, including:

  • Article title and unique identifier
  • Author information (full and short formats)
  • Journal citation details
  • PMID reference
  • Article type tags
  • Abstract content
  • Social sharing links

💪 Tips for Optimal Usage

  1. Use specific search terms for more targeted results
  2. Consider breaking large searches into smaller queries
  3. Allow sufficient run time for larger result sets
  4. Monitor your usage to stay within PubMed's guidelines
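Tip 2 can be sketched as follows: one broad search is split into one narrower query per publication year. This is a minimal sketch, not part of the actor. The helper name split_by_year is hypothetical, and it assumes PubMed's publication-date field tag [dp] is an acceptable way to narrow a search for your purposes.

```python
def split_by_year(term, first_year, last_year):
    """Return one PubMed query string per publication year (inclusive range).

    Assumption: the [dp] (date of publication) field tag is used to
    restrict each query to a single year.
    """
    if first_year > last_year:
        raise ValueError("first_year must not be greater than last_year")
    return [f"({term}) AND {year}[dp]" for year in range(first_year, last_year + 1)]


queries = split_by_year("rheumatoid arthritis", 2019, 2021)
for query in queries:
    print(query)
```

Each resulting query string can then be URL-encoded into its own search URL, so that no single search has to page through the full result set.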

Input Example

The input is a JSON object with a list of PubMed search URLs (searchUrls) and the maximum number of articles to return (maxItems):

{
    "searchUrls": ["https://pubmed.ncbi.nlm.nih.gov/?term=rheumatoid%20arthritis"],
    "maxItems": 30
}
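If you start from plain search terms rather than ready-made URLs, an input like the one above can be generated with a few lines of Python. This is a minimal sketch: the helper build_input is hypothetical and not part of the actor, and only the searchUrls and maxItems fields shown in the example are assumed to exist.

```python
import json
from urllib.parse import quote

PUBMED_BASE = "https://pubmed.ncbi.nlm.nih.gov/"


def build_input(terms, max_items=30):
    """Build the actor input dict from a list of plain search terms."""
    # Percent-encode each term so spaces and punctuation are URL-safe.
    urls = [f"{PUBMED_BASE}?term={quote(term, safe='')}" for term in terms]
    return {"searchUrls": urls, "maxItems": max_items}


actor_input = build_input(["rheumatoid arthritis"], max_items=30)
print(json.dumps(actor_input, indent=4))
```

The printed JSON can be pasted into the actor's input field as-is.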

Output sample

The results are stored in a dataset, which you can always find in the Storage tab. You can choose in which format to download your data: JSON, JSONL, Excel spreadsheet, HTML table, CSV, or XML. Here's an excerpt, in JSON, from the data you'd get if you apply the input parameters above:

[
	{
		"title": "Rheumatoid arthritis.",
		"articleId": "27156434",
		"articleUrl": "https://pubmed.ncbi.nlm.nih.gov/27156434/",
		"authors": {
			"full": "Smolen JS, Aletaha D, McInnes IB.",
			"short": "Smolen JS, et al."
		},
		"citation": {
			"full": "Lancet. 2016 Oct 22;388(10055):2023-2038. doi: 10.1016/S0140-6736(16)30173-8. Epub 2016 May 3.",
			"short": "Lancet. 2016."
		},
		"pmid": "27156434",
		"tags": [
			"Free article.",
			"Review."
		],
		"abstract": {
			"full": "Rheumatoid arthritis is a chronic inflammatory joint disease, which can cause cartilage and bone damage as well as disability. ...In this Seminar, we describe current insights into genetics and aetiology, pathophysiology, epidemiology, assessment, therapeutic agents …",
			"short": "Rheumatoid arthritis is a chronic inflammatory joint disease, which can cause cartilage and bone damage as well as disability. …"
		},
		"shareLinks": {
			"twitter": "http://twitter.com/intent/tweet?text=Rheumatoid%20arthritis.%20https%3A//pubmed.ncbi.nlm.nih.gov/27156434/",
			"facebook": "http://www.facebook.com/sharer/sharer.php?u=https%3A//pubmed.ncbi.nlm.nih.gov/27156434/",
			"permalink": "https://pubmed.ncbi.nlm.nih.gov/27156434/"
		}
	},
	{
		"title": "Management of Rheumatoid Arthritis: An Overview.",
		"articleId": "34831081",
		"articleUrl": "https://pubmed.ncbi.nlm.nih.gov/34831081/",
		"authors": {
			"full": "Radu AF, Bungau SG.",
			"short": "Radu AF, et al."
		},
		"citation": {
			"full": "Cells. 2021 Oct 23;10(11):2857. doi: 10.3390/cells10112857.",
			"short": "Cells. 2021."
		},
		"pmid": "34831081",
		"tags": [
			"Free PMC article.",
			"Review."
		],
		"abstract": {
			"full": "Rheumatoid arthritis (RA) is a multifactorial autoimmune disease of unknown etiology, primarily affecting the joints, then extra-articular manifestations can occur. ...",
			"short": "Rheumatoid arthritis (RA) is a multifactorial autoimmune disease of unknown etiology, primarily affecting the joints, then ext …"
		},
		"shareLinks": {
			"twitter": "http://twitter.com/intent/tweet?text=Management%20of%20Rheumatoid%20Arthritis%3A%20An%20Overview.%20https%3A//pubmed.ncbi.nlm.nih.gov/34831081/",
			"facebook": "http://www.facebook.com/sharer/sharer.php?u=https%3A//pubmed.ncbi.nlm.nih.gov/34831081/",
			"permalink": "https://pubmed.ncbi.nlm.nih.gov/34831081/"
		}
	},
	...
]
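Once downloaded, records in this shape are easy to post-process. The sketch below reduces each record to a flat summary and pulls the DOI out of the full citation string. It is an illustration only: the helpers extract_doi and summarize are hypothetical, the sample records are trimmed copies of the excerpt above, and the regular expression assumes the DOI follows a "doi:" label as in those two citations.

```python
import re

# Assumption: the DOI appears after "doi:" and ends at the next whitespace.
DOI_PATTERN = re.compile(r"doi:\s*(\S+)")


def extract_doi(citation_full):
    """Return the DOI found in a full citation string, or None."""
    match = DOI_PATTERN.search(citation_full)
    return match.group(1).rstrip(".") if match else None


def summarize(record):
    """Flatten one scraped record into a small summary dict."""
    first_author = record["authors"]["full"].split(",")[0].strip().rstrip(".")
    return {
        "pmid": record["pmid"],
        "title": record["title"],
        "first_author": first_author,
        "doi": extract_doi(record["citation"]["full"]),
        "is_review": "Review." in record["tags"],
    }


# Trimmed copies of the two records from the excerpt above.
records = [
    {
        "title": "Rheumatoid arthritis.",
        "pmid": "27156434",
        "authors": {"full": "Smolen JS, Aletaha D, McInnes IB."},
        "citation": {"full": "Lancet. 2016 Oct 22;388(10055):2023-2038. "
                             "doi: 10.1016/S0140-6736(16)30173-8. Epub 2016 May 3."},
        "tags": ["Free article.", "Review."],
    },
    {
        "title": "Management of Rheumatoid Arthritis: An Overview.",
        "pmid": "34831081",
        "authors": {"full": "Radu AF, Bungau SG."},
        "citation": {"full": "Cells. 2021 Oct 23;10(11):2857. doi: 10.3390/cells10112857."},
        "tags": ["Free PMC article.", "Review."],
    },
]

summaries = [summarize(record) for record in records]
for summary in summaries:
    print(summary)
```

Records whose citation has no DOI simply get None in the doi field, so the summary step does not fail on them.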

Frequently Asked Questions

Is it legal to scrape public data?

Yes, if you're scraping publicly available data for personal or internal use. Always review the website's Terms of Service before large-scale use or redistribution.

Do I need to code to use this scraper?

No. This is a no-code tool: just enter one or more PubMed search URLs and a maximum number of items, then run the scraper directly from your dashboard or the Apify actor page.

What data does it extract?

It extracts article titles and IDs, author lists, journal citations, PMIDs, article tags, abstracts, and social sharing links. You can export all of it to Excel or JSON.

Can I scrape multiple pages or refine my search?

Yes, the scraper paginates through the search results until it reaches your maximum items limit, and you can refine what it collects through the search terms in the search URLs you provide.

How do I get started?

You can use the Try Now button on this page to go to the scraper. You’ll be guided to input a search term and get structured results. No setup needed!