A powerful, fast, and advanced search API for Archive.org that uses the site's own API to return accurate results, with support for all the filters available in Archive.org's advanced search.
Welcome to the Archive.org Advanced Search, a powerful Apify Actor built to unlock the full potential of the Internet Archive's vast digital repository! This cutting-edge tool empowers you to perform highly customizable searches across millions of archived web pages, books, audio files, videos, and more, using a flexible and intuitive interface. Whether you're a researcher, historian, data analyst, or simply a curious explorer, this Actor delivers precise results tailored to your needs with support for advanced filters, pagination, and sorting.
Results can be sorted by fields such as `publicdate` or `downloads`, in ascending or descending order.

This Actor leverages the Archive.org search API to deliver fast, reliable, and structured data, making it a useful tool for scraping, analyzing, or archiving digital content. Start exploring the depths of digital history today!
This Actor is designed to run on the Apify platform. No local installation is required! Simply provide your search parameters as the Actor's input, for example in an `INPUT.json` file.

For local development or testing, clone the repository, install dependencies (e.g., `apify-client`, `httpx`), and use the `apify run` command with a valid `INPUT.json`.
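For local testing, the input file can be generated from a script rather than written by hand. The following is a minimal sketch using only the Python standard library; the helper name `write_input` and the target directory are assumptions, and the location where `apify run` expects `INPUT.json` depends on your project layout.

```python
import json
from pathlib import Path


def write_input(params: dict, directory: str = ".") -> Path:
    """Write the search parameters to INPUT.json in `directory` and return the file path.

    `write_input` is a hypothetical helper, not part of the Actor or the Apify CLI.
    """
    path = Path(directory) / "INPUT.json"
    path.write_text(json.dumps(params, indent=2), encoding="utf-8")
    return path
```

For example, `write_input({"title_value": "learn spanish", "hits_per_page": 50, "page": 1})` writes a three-field input file to the current directory; fields you leave out fall back to the defaults listed in the input table.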
The Actor stores results in the default Apify dataset, accessible as JSON. See the Output Example for a sample response.
The Actor accepts a JSON object with the following attributes. All fields are optional unless specified.
Attribute | Type | Description | Default Value | Editor | Constraints/Options |
---|---|---|---|---|---|
any_field_value | string | Search term to match across all fields. | "" | textfield | None |
any_field_operator | string | Operator for the 'Any Field' search term. | "contains" | select | ["contains", "not"] |
title_value | string | Search term to match in the title field. | "" | textfield | None |
title_operator | string | Operator for the 'Title' search term. | "contains" | select | ["contains", "not"] |
creator_value | string | Search term to match in the creator field. | "" | textfield | None |
creator_operator | string | Operator for the 'Creator' search term. | "contains" | select | ["contains", "not"] |
description_value | string | Search term to match in the description field. | "" | textfield | None |
description_operator | string | Operator for the 'Description' search term. | "contains" | select | ["contains", "not"] |
collection_value | string | Search term to match in the collection field. | "" | textfield | None |
collection_operator | string | Operator for the 'Collection' search term. | "contains" | select | ["contains", "not"] |
mediatype_value | string | Search term to match in the mediatype field. | "" | textfield | None |
mediatype_operator | string | Operator for the 'Media Type' search term. | "is" | select | ["is", "not"] |
custom_field_1_name | string | Name of the first custom field to search. | "" | textfield | None |
custom_field_1_value | string | Value of the first custom field to search. | "" | textfield | None |
custom_field_1_operator | string | Operator for the first custom field search term. | "contains" | select | ["contains", "not"] |
custom_field_2_name | string | Name of the second custom field to search. | "" | textfield | None |
custom_field_2_value | string | Value of the second custom field to search. | "" | textfield | None |
custom_field_2_operator | string | Operator for the second custom field search term. | "contains" | select | ["contains", "not"] |
custom_field_3_name | string | Name of the third custom field to search. | "" | textfield | None |
custom_field_3_value | string | Value of the third custom field to search. | "" | textfield | None |
custom_field_3_operator | string | Operator for the third custom field search term. | "contains" | select | ["contains", "not"] |
custom_field_4_name | string | Name of the fourth custom field to search. | "" | textfield | None |
custom_field_4_value | string | Value of the fourth custom field to search. | "" | textfield | None |
custom_field_4_operator | string | Operator for the fourth custom field search term. | "contains" | select | ["contains", "not"] |
custom_field_5_name | string | Name of the fifth custom field to search. | "" | textfield | None |
custom_field_5_value | string | Value of the fifth custom field to search. | "" | textfield | None |
custom_field_5_operator | string | Operator for the fifth custom field search term. | "contains" | select | ["contains", "not"] |
date | string | Exact date to match (format: YYYY-MM-DD). | "" | textfield | Must match ^\d{4}-\d{2}-\d{2}$ |
date_range_start | string | Start date of the range to match (format: YYYY-MM-DD). | "" | textfield | Must match ^\d{4}-\d{2}-\d{2}$ |
date_range_end | string | End date of the range to match (format: YYYY-MM-DD). | "" | textfield | Must match ^\d{4}-\d{2}-\d{2}$ |
hits_per_page | integer | Number of items to return per page. | 50 | number | Min: 1, Max: 1000 |
page | integer | Page number to fetch (1-based). | 1 | number | Min: 1 |
sort_name | string | Field to sort by (e.g., publicdate, downloads). | "" | textfield | None |
sort_value | string | Sort direction (ascending or descending). | "" | select | ["asc", "desc"] |
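
The constraints in the table can be checked before a run is started. The sketch below is one possible pre-flight check, not the Actor's own validation: the function name `validate_input` is hypothetical, and the rules are taken only from the "Constraints/Options" column above.

```python
import re

DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
DATE_FIELDS = ("date", "date_range_start", "date_range_end")


def validate_input(params: dict) -> list[str]:
    """Return a list of problems found in `params`; an empty list means no problems."""
    errors = []
    # Date fields are optional, but non-empty values must look like YYYY-MM-DD.
    for name in DATE_FIELDS:
        value = params.get(name, "")
        if value and not DATE_RE.match(value):
            errors.append(f"{name} must be YYYY-MM-DD, got {value!r}")
    hits = params.get("hits_per_page", 50)
    if not isinstance(hits, int) or not 1 <= hits <= 1000:
        errors.append("hits_per_page must be an integer between 1 and 1000")
    page = params.get("page", 1)
    if not isinstance(page, int) or page < 1:
        errors.append("page must be an integer >= 1")
    if params.get("sort_value", "") not in ("", "asc", "desc"):
        errors.append("sort_value must be 'asc' or 'desc'")
    # mediatype uses "is"/"not"; every other operator uses "contains"/"not".
    for key, value in params.items():
        if key.endswith("_operator"):
            allowed = ("is", "not") if key == "mediatype_operator" else ("contains", "not")
            if value not in allowed:
                errors.append(f"{key} must be one of {allowed}")
    return errors
```

The check only verifies the date pattern, not whether the date exists on the calendar, which mirrors the regular expression given in the table.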
Below is an example `INPUT.json` file demonstrating a search for items whose title contains "learn spanish", with a date range from 1994-06-06 to 2024-02-06, sorted by downloads in descending order.
```json
{
  "any_field_value": "",
  "any_field_operator": "contains",
  "title_value": "learn spanish",
  "title_operator": "contains",
  "creator_value": "",
  "creator_operator": "not",
  "description_value": "",
  "description_operator": "not",
  "collection_value": "",
  "collection_operator": "contains",
  "mediatype_value": "",
  "mediatype_operator": "is",
  "custom_field_1_name": "",
  "custom_field_1_value": "",
  "custom_field_1_operator": "contains",
  "custom_field_2_name": "",
  "custom_field_2_value": "",
  "custom_field_2_operator": "contains",
  "custom_field_3_name": "",
  "custom_field_3_value": "",
  "custom_field_3_operator": "contains",
  "custom_field_4_name": "",
  "custom_field_4_value": "",
  "custom_field_4_operator": "contains",
  "custom_field_5_name": "",
  "custom_field_5_value": "",
  "custom_field_5_operator": "contains",
  "date": "2019-05-10",
  "date_range_start": "1994-06-06",
  "date_range_end": "2024-02-06",
  "hits_per_page": 50,
  "page": 1,
  "sort_name": "downloads",
  "sort_value": "desc"
}
```
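
To make the role of the value/operator pairs concrete, here is a rough sketch of how such an input could be folded into a single Lucene-style query of the kind Archive.org's advanced search accepts (for example `title:(learn spanish)`). How the Actor actually translates its input is not documented here, so the function name `build_query`, the use of a leading `-` for the "not" operator, and the choice to join clauses with `AND` are all assumptions.

```python
def build_query(params: dict) -> str:
    """Combine the non-empty filters in `params` into one Lucene-style query string."""
    # (query field name, input key prefix); "any field" has no field name.
    fields = [("", "any_field"), ("title", "title"), ("creator", "creator"),
              ("description", "description"), ("collection", "collection"),
              ("mediatype", "mediatype")]
    # Custom fields take their query field name from custom_field_N_name.
    for n in range(1, 6):
        name = params.get(f"custom_field_{n}_name", "")
        if name:
            fields.append((name, f"custom_field_{n}"))

    clauses = []
    for field, prefix in fields:
        value = params.get(f"{prefix}_value", "")
        if not value:
            continue  # empty filters are skipped, whatever their operator says
        clause = f"{field}:({value})" if field else f"({value})"
        if params.get(f"{prefix}_operator") == "not":
            clause = "-" + clause  # assumption: "not" maps to a negated clause
        clauses.append(clause)

    if params.get("date"):
        clauses.append(f"date:{params['date']}")
    start = params.get("date_range_start", "")
    end = params.get("date_range_end", "")
    if start and end:
        clauses.append(f"date:[{start} TO {end}]")
    return " AND ".join(clauses)
```

Under these assumptions, an input with `title_value` set to "learn spanish" and the date range from the example would yield `title:(learn spanish) AND date:[1994-06-06 TO 2024-02-06]`. Filters with an empty value contribute nothing, which is why the "not" operators on the empty creator and description fields in the example have no effect in this sketch.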
The Actor stores its results as a JSON array of items in the Apify dataset. Below is a sample output for the above input, assuming a successful API response.
```json
[
  {
    "index": "prod-o-001",
    "service_backend": "metadata",
    "hit_type": "item",
    "identifier": "podcast_learn-spanish-with-daily-podca_1052684843",
    "filename": "",
    "file_basename": "",
    "page_num": 0,
    "file_creation_mtime": 0,
    "updated_on": "",
    "created_on": "",
    "mediatype": "collection",
    "title": "Learn Spanish with daily podcasts",
    "publicdate": "2019-06-15T11:08:02Z",
    "downloads": 3826,
    "collection": [
      "podcasts",
      "audio"
    ],
    "subject": [
      "podcast",
      "itunes",
      "apple"
    ],
    "addeddate": "2019-06-15T11:08:02Z",
    "description": "L",
    "result_in_subfile": false,
    "__href__": "",
    "highlight": [],
    "_score": null,
    "url": "https://archive.org/details/podcast_learn-spanish-with-daily-podca_1052684843"
  },
  {
    "index": "prod-o-001",
    "service_backend": "metadata",
    "hit_type": "item",
    "identifier": "lp_listen-learn-spanish_no-artist",
    "filename": "",
    "file_basename": "",
    "page_num": 0,
    "file_creation_mtime": 0,
    "updated_on": "",
    "created_on": "",
    "mediatype": "audio",
    "title": "Listen & Learn Spanish",
    "publicdate": "2020-11-09T08:43:19Z",
    "downloads": 2546,
    "collection": [
      "album_recordings",
      "vinyl_bostonpubliclibrary",
      "audio_music",
      "unlockedrecordings"
    ],
    "subject": [
      "Non-Music",
      "Speech",
      "Education"
    ],
    "addeddate": "2020-12-02T02:37:01Z",
    "description": "T",
    "result_in_subfile": false,
    "__href__": "",
    "highlight": [],
    "_score": null,
    "url": "https://archive.org/details/lp_listen-learn-spanish_no-artist"
  }
]
```
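
Once the items are exported from the dataset, they can be processed like any list of JSON objects. The sketch below summarizes a result set using only the standard library; the function name `summarize` is hypothetical, and the sample items are trimmed copies of the two records shown above.

```python
from collections import Counter


def summarize(items: list[dict]) -> dict:
    """Return total downloads, item count per mediatype, and the most downloaded identifier."""
    by_type = Counter(item.get("mediatype", "unknown") for item in items)
    total = sum(item.get("downloads", 0) for item in items)
    top = max(items, key=lambda item: item.get("downloads", 0), default=None)
    return {
        "total_downloads": total,
        "by_mediatype": dict(by_type),
        "top_identifier": top["identifier"] if top else None,
    }


# Trimmed copies of the two sample records from the output example.
SAMPLE_ITEMS = [
    {"identifier": "podcast_learn-spanish-with-daily-podca_1052684843",
     "mediatype": "collection", "downloads": 3826},
    {"identifier": "lp_listen-learn-spanish_no-artist",
     "mediatype": "audio", "downloads": 2546},
]

print(summarize(SAMPLE_ITEMS))
```

For the two sample records, the total is 6372 downloads and the most downloaded identifier is the podcast entry.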
Using this Actor is generally acceptable if you're collecting publicly available data for personal or internal use. Always review Archive.org's Terms of Service before large-scale use or redistribution.
No coding is required. This is a no-code tool: enter your search terms and filters, then run the Actor directly from your dashboard or the Apify Actor page.
For each result, the Actor extracts the identifier, title, media type, public date, download count, collections, subjects, description, and URL. You can export all of it to Excel or JSON.
You can fetch multiple pages and refine results by title, creator, description, collection, media type, date, or custom fields, depending on the input settings you use.
You can use the Try Now button on this page to open the Actor. You'll be guided to enter your search parameters and get structured results. No setup needed!