Archive.org advanced search

A fast search API for Archive.org that uses the site's own search API to return accurate results, with all the filters supported by Archive.org's advanced search.



Overview

Welcome to the Archive.org Advanced Search, a powerful Apify Actor built to unlock the full potential of the Internet Archive's vast digital repository! This cutting-edge tool empowers you to perform highly customizable searches across millions of archived web pages, books, audio files, videos, and more, using a flexible and intuitive interface. Whether you're a researcher, historian, data analyst, or simply a curious explorer, this Actor delivers precise results tailored to your needs with support for advanced filters, pagination, and sorting.

Why Choose This Actor?

Unmatched Flexibility: Search by title, creator, description, collection, media type, and up to 5 custom fields with customizable operators (e.g., "contains" or "not").
Precision Control: Fine-tune your queries with exact dates or date ranges, ensuring you find exactly what you're looking for.
Efficient Pagination: Retrieve results in batches (up to 1000 items per page) and navigate multiple pages effortlessly.
Dynamic Sorting: Sort results by fields like publicdate or downloads in ascending or descending order.

This Actor leverages the Archive.org search API to deliver fast, reliable, and structured data, making it an indispensable tool for scraping, analyzing, or archiving digital content. Start exploring the depths of digital history today!

Features

Advanced Search Filters: Target specific fields with operators like "contains" and "not".
Custom Fields Support: Add up to 5 custom search criteria for niche queries.
Date Precision: Search by exact dates (YYYY-MM-DD) or date ranges.
Pagination Support: Control the number of results per page and page number.
Sorting Options: Customize result ordering based on your preferences.
Detailed Output: Receive structured JSON data with metadata like titles, dates, and URLs.
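
To make the operator semantics concrete, here is a minimal sketch of how "contains", "is", and "not" filters could be combined into a Lucene-style query string of the kind Archive.org's advanced search accepts. The helper name `build_query` and the exact clause format are assumptions for illustration only; the Actor's actual query construction is not described in this document.

```python
def build_query(filters):
    """Combine (field, operator, value) triples into one query string.

    Hypothetical helper: "contains" and "is" add a positive clause,
    "not" adds a negated clause. Filters with an empty value are skipped.
    """
    clauses = []
    for field, operator, value in filters:
        if not value:
            continue  # an empty value means the filter is unused
        clause = f"{field}:({value})"
        if operator == "not":
            clause = f"NOT {clause}"
        clauses.append(clause)
    return " AND ".join(clauses)


query = build_query([
    ("title", "contains", "learn spanish"),
    ("creator", "not", ""),
    ("mediatype", "is", "audio"),
    ("collection", "not", "podcasts"),
])
print(query)
# title:(learn spanish) AND mediatype:(audio) AND NOT collection:(podcasts)
```

Skipping empty values mirrors the input defaults, where every `*_value` field starts as an empty string.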

Usage

This Actor is designed to run on the Apify platform. No local installation is required! Simply:

Sign up or log in to Apify.
Search for "Archive.org Advanced Search" in the Apify Store.
Configure the input parameters via the Apify UI or an INPUT.json file.
Run the Actor and download the results from the dataset.

For local development or testing, clone the repository, install dependencies (e.g., apify-client, httpx), and use the apify run command with a valid INPUT.json.
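
The Actor can also be started from Python code with the apify-client package instead of the Apify UI. The sketch below assumes an API token in the `APIFY_TOKEN` environment variable; the function name `run_search` is illustrative, and the Actor ID must be copied from the Actor's page in the Apify Store, since it is not stated in this document.

```python
import os


def run_search(run_input, actor_id):
    """Run the Actor on Apify and return its dataset items as a list.

    actor_id is the "<username>/<actor-name>" identifier shown on the
    Actor's Apify Store page (not given in this README).
    """
    # Imported lazily so this module loads even without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    # call() starts the run and waits for it to finish.
    run = client.actor(actor_id).call(run_input=run_input)
    # Results are stored in the run's default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A call such as `run_search({"title_value": "learn spanish", "hits_per_page": 10}, actor_id)` would then return the same items that the Apify UI shows in the dataset view.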

Output

The Actor stores results in the default Apify dataset, accessible as JSON. See the Output Example for a sample response.

Input Documentation

The Actor accepts a JSON object with the following attributes. All fields are optional unless specified.

Attribute	Type	Description	Default Value	Editor	Constraints/Options
`any_field_value`	`string`	Search term to match across all fields.	`""`	`textfield`	None
`any_field_operator`	`string`	Operator for the 'Any Field' search term.	`"contains"`	`select`	`["contains", "not"]`
`title_value`	`string`	Search term to match in the title field.	`""`	`textfield`	None
`title_operator`	`string`	Operator for the 'Title' search term.	`"contains"`	`select`	`["contains", "not"]`
`creator_value`	`string`	Search term to match in the creator field.	`""`	`textfield`	None
`creator_operator`	`string`	Operator for the 'Creator' search term.	`"contains"`	`select`	`["contains", "not"]`
`description_value`	`string`	Search term to match in the description field.	`""`	`textfield`	None
`description_operator`	`string`	Operator for the 'Description' search term.	`"contains"`	`select`	`["contains", "not"]`
`collection_value`	`string`	Search term to match in the collection field.	`""`	`textfield`	None
`collection_operator`	`string`	Operator for the 'Collection' search term.	`"contains"`	`select`	`["contains", "not"]`
`mediatype_value`	`string`	Search term to match in the mediatype field.	`""`	`textfield`	None
`mediatype_operator`	`string`	Operator for the 'Media Type' search term.	`"is"`	`select`	`["is", "not"]`
`custom_field_1_name`	`string`	Name of the first custom field to search.	`""`	`textfield`	None
`custom_field_1_value`	`string`	Value of the first custom field to search.	`""`	`textfield`	None
`custom_field_1_operator`	`string`	Operator for the first custom field search term.	`"contains"`	`select`	`["contains", "not"]`
`custom_field_2_name`	`string`	Name of the second custom field to search.	`""`	`textfield`	None
`custom_field_2_value`	`string`	Value of the second custom field to search.	`""`	`textfield`	None
`custom_field_2_operator`	`string`	Operator for the second custom field search term.	`"contains"`	`select`	`["contains", "not"]`
`custom_field_3_name`	`string`	Name of the third custom field to search.	`""`	`textfield`	None
`custom_field_3_value`	`string`	Value of the third custom field to search.	`""`	`textfield`	None
`custom_field_3_operator`	`string`	Operator for the third custom field search term.	`"contains"`	`select`	`["contains", "not"]`
`custom_field_4_name`	`string`	Name of the fourth custom field to search.	`""`	`textfield`	None
`custom_field_4_value`	`string`	Value of the fourth custom field to search.	`""`	`textfield`	None
`custom_field_4_operator`	`string`	Operator for the fourth custom field search term.	`"contains"`	`select`	`["contains", "not"]`
`custom_field_5_name`	`string`	Name of the fifth custom field to search.	`""`	`textfield`	None
`custom_field_5_value`	`string`	Value of the fifth custom field to search.	`""`	`textfield`	None
`custom_field_5_operator`	`string`	Operator for the fifth custom field search term.	`"contains"`	`select`	`["contains", "not"]`
`date`	`string`	Exact date to match (format: YYYY-MM-DD).	`""`	`textfield`	Must match `^\d{4}-\d{2}-\d{2}$`
`date_range_start`	`string`	Start date of the range to match (format: YYYY-MM-DD).	`""`	`textfield`	Must match `^\d{4}-\d{2}-\d{2}$`
`date_range_end`	`string`	End date of the range to match (format: YYYY-MM-DD).	`""`	`textfield`	Must match `^\d{4}-\d{2}-\d{2}$`
`hits_per_page`	`integer`	Number of items to return per page.	`50`	`number`	Min: 1, Max: 1000
`page`	`integer`	Page number to fetch (1-based).	`1`	`number`	Min: 1
`sort_name`	`string`	Field to sort by (e.g., `publicdate`, `downloads`).	`""`	`textfield`	None
`sort_value`	`string`	Sort direction (ascending or descending).	`""`	`select`	`["asc", "desc"]`
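
Before submitting an input, the constraints in the table can be checked locally. The following is a minimal sketch; `validate_input` is a hypothetical helper written for this example, not part of the Actor, and it only covers the constraints listed above.

```python
import re

DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")
DATE_FIELDS = ("date", "date_range_start", "date_range_end")


def validate_input(run_input):
    """Return a list of problems found in an Actor input dict (empty if none)."""
    problems = []
    # Date fields may be empty, otherwise they must look like YYYY-MM-DD.
    for name in DATE_FIELDS:
        value = run_input.get(name, "")
        if value and not DATE_PATTERN.match(value):
            problems.append(f"{name} must be formatted as YYYY-MM-DD")
    if not 1 <= run_input.get("hits_per_page", 50) <= 1000:
        problems.append("hits_per_page must be between 1 and 1000")
    if run_input.get("page", 1) < 1:
        problems.append("page must be at least 1")
    # mediatype uses "is"/"not"; every other operator uses "contains"/"not".
    for name, value in run_input.items():
        if name.endswith("_operator"):
            allowed = ("is", "not") if name == "mediatype_operator" else ("contains", "not")
            if value not in allowed:
                problems.append(f"{name} must be one of {allowed}")
    if run_input.get("sort_value", "") not in ("", "asc", "desc"):
        problems.append("sort_value must be 'asc' or 'desc'")
    return problems


print(validate_input({"date": "2019-5-10", "hits_per_page": 5000}))
```

The check only verifies the shape of a date, not whether it is a real calendar date.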

Input Example

Below is an example INPUT.json file demonstrating a search for items whose title contains "learn spanish", with an exact date of 2019-05-10 and a date range from 1994-06-06 to 2024-02-06 both filled in, sorted by downloads in descending order.

{
    "any_field_value": "",
    "any_field_operator": "contains",
    "title_value": "learn spanish",
    "title_operator": "contains",
    "creator_value": "",
    "creator_operator": "not",
    "description_value": "",
    "description_operator": "not",
    "collection_value": "",
    "collection_operator": "contains",
    "mediatype_value": "",
    "mediatype_operator": "is",
    "custom_field_1_name": "",
    "custom_field_1_value": "",
    "custom_field_1_operator": "contains",
    "custom_field_2_name": "",
    "custom_field_2_value": "",
    "custom_field_2_operator": "contains",
    "custom_field_3_name": "",
    "custom_field_3_value": "",
    "custom_field_3_operator": "contains",
    "custom_field_4_name": "",
    "custom_field_4_value": "",
    "custom_field_4_operator": "contains",
    "custom_field_5_name": "",
    "custom_field_5_value": "",
    "custom_field_5_operator": "contains",
    "date": "2019-05-10",
    "date_range_start": "1994-06-06",
    "date_range_end": "2024-02-06",
    "hits_per_page": 50,
    "page": 1,
    "sort_name": "downloads",
    "sort_value": "desc"
}
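
The example input requests a single page. To collect more results, the same input can be resubmitted with an increasing `page` value until a page comes back short. The sketch below shows that loop; `fetch_page` is a hypothetical callable standing in for whatever runs the Actor and returns one page of items, and `max_pages` is an arbitrary safety cap chosen for this example.

```python
def fetch_all_pages(fetch_page, base_input, max_pages=10):
    """Collect results page by page until a short or empty page appears.

    fetch_page is a hypothetical callable: it takes an input dict and
    returns the list of items for that page.
    """
    per_page = base_input.get("hits_per_page", 50)
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch_page({**base_input, "page": page})
        items.extend(batch)
        if len(batch) < per_page:
            break  # fewer items than requested: this was the last page
    return items


# Stand-in for a real Actor call: 120 fake results served 50 per page.
def fake_fetch(run_input):
    size = run_input["hits_per_page"]
    start = (run_input["page"] - 1) * size
    return list(range(120))[start:start + size]


results = fetch_all_pages(fake_fetch, {"title_value": "learn spanish", "hits_per_page": 50})
print(len(results))  # 120
```

Each page is a separate Actor run in this scheme, so raising `hits_per_page` (up to 1000) reduces the number of runs needed.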

Output Example

The Actor stores each result as a JSON object in the Apify dataset. Below is a sample output for the above input, assuming a successful API response.

[
    {
      "index": "prod-o-001",
      "service_backend": "metadata",
      "hit_type": "item",
      "identifier": "podcast_learn-spanish-with-daily-podca_1052684843",
      "filename": "",
      "file_basename": "",
      "page_num": 0,
      "file_creation_mtime": 0,
      "updated_on": "",
      "created_on": "",
      "mediatype": "collection",
      "title": "Learn Spanish with daily podcasts",
      "publicdate": "2019-06-15T11:08:02Z",
      "downloads": 3826,
      "collection": [
        "podcasts",
        "audio"
      ],
      "subject": [
        "podcast",
        "itunes",
        "apple"
      ],
      "addeddate": "2019-06-15T11:08:02Z",
      "description": "L",
      "result_in_subfile": false,
      "__href__": "",
      "highlight": [],
      "_score": null,
      "url": "https://archive.org/details/podcast_learn-spanish-with-daily-podca_1052684843"
    },
    {
      "index": "prod-o-001",
      "service_backend": "metadata",
      "hit_type": "item",
      "identifier": "lp_listen-learn-spanish_no-artist",
      "filename": "",
      "file_basename": "",
      "page_num": 0,
      "file_creation_mtime": 0,
      "updated_on": "",
      "created_on": "",
      "mediatype": "audio",
      "title": "Listen & Learn Spanish",
      "publicdate": "2020-11-09T08:43:19Z",
      "downloads": 2546,
      "collection": [
        "album_recordings",
        "vinyl_bostonpubliclibrary",
        "audio_music",
        "unlockedrecordings"
      ],
      "subject": [
        "Non-Music",
        "Speech",
        "Education"
      ],
      "addeddate": "2020-12-02T02:37:01Z",
      "description": "T",
      "result_in_subfile": false,
      "__href__": "",
      "highlight": [],
      "_score": null,
      "url": "https://archive.org/details/lp_listen-learn-spanish_no-artist"
    }
]
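
Once the dataset has been downloaded, the items are plain dictionaries and can be post-processed with ordinary Python. As a small illustration, the sketch below reduces items to title, download count, and URL, using field names taken from the sample output; the helper name `summarize` is made up for this example.

```python
def summarize(items):
    """Return (title, downloads, url) tuples, most-downloaded first."""
    rows = [(item["title"], item.get("downloads", 0), item["url"]) for item in items]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Two items trimmed down from the sample output above.
items = [
    {"title": "Listen & Learn Spanish", "downloads": 2546,
     "url": "https://archive.org/details/lp_listen-learn-spanish_no-artist"},
    {"title": "Learn Spanish with daily podcasts", "downloads": 3826,
     "url": "https://archive.org/details/podcast_learn-spanish-with-daily-podca_1052684843"},
]

for title, downloads, url in summarize(items):
    print(f"{downloads:>6}  {title}  {url}")
```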

Frequently Asked Questions

Is it legal to scrape public data from Archive.org?

Yes, if you're scraping publicly available data for personal or internal use. Always review Archive.org's Terms of Use before large-scale use or redistribution.

Do I need to code to use this scraper?

No. This is a no-code tool: just enter your search terms and filters, then run the Actor directly from your dashboard or the Apify Actor page.

What data does it extract?

It extracts item metadata such as identifiers, titles, media types, public and added dates, download counts, collections, subjects, descriptions, and item URLs. You can export all of it to Excel or JSON.

Can I fetch multiple pages or filter the results?

Yes. Use page and hits_per_page to step through result pages, and refine the search by title, creator, description, collection, media type, custom fields, or dates, depending on the input settings you use.

How do I get started?

You can use the Try Now button on this page to go to the scraper. You’ll be guided to input a search term and get structured results. No setup needed!