Algolia Website Indexer

The Indexer recursively crawls a website using the Puppeteer browser (headless Chrome) and indexes the selected pages into an Algolia index. It is designed to run as an Apify actor.

Usage

You can find instructions on how to run it in the Apify cloud on its Apify Store page. If you want to run it in your environment, you can use the Apify CLI.
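If you want to run the actor locally, a minimal workflow with the Apify CLI might look like the following. This is a sketch, not the only way to do it: the repository URL is illustrative, and the local input path follows the Apify CLI storage convention (older CLI versions use `apify_storage/` instead of `storage/`).

```bash
# Install the Apify CLI (requires Node.js).
npm install -g apify-cli

# Get the actor's source code (illustrative URL) and install dependencies.
git clone https://github.com/apify/actor-algolia-website-indexer.git
cd actor-algolia-website-indexer
npm install

# Provide the actor input in the default key-value store, then run locally.
mkdir -p storage/key_value_stores/default
cp my-input.json storage/key_value_stores/default/INPUT.json
apify run
```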

Input

The input of the actor is JSON with the following parameters.

| Field | Type | Description |
| --- | --- | --- |
| `algoliaAppId` | String | Your Algolia Application ID. |
| `algoliaApiKey` | String | Your Algolia API key. |
| `algoliaIndexName` | String | Name of the Algolia index to update. |
| `crawlerName` | String | Crawler name. The actor adds, updates, and removes pages in the index under this name, so a single index can hold several websites. |
| `startUrls` | Array | URLs where the crawler starts crawling. |
| `selectors` | Array | Selectors whose text content you want to index. The key is the attribute name and the value is its CSS selector. |
| `waitForElement` | String | CSS selector of an element to wait for on each page. |
| `additionalPageAttrs` | Object | Additional attributes to attach to each record in the index. |
| `skipIndexUpdate` | Boolean | Option to switch off updating the Algolia index. |
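A complete input might look like the sketch below. All values are illustrative; in particular, the exact shape of `startUrls` and `selectors` (plain strings vs. objects) may differ between actor versions, so treat this as an assumption to check against the actor's input schema.

```json
{
  "algoliaAppId": "YOUR_APP_ID",
  "algoliaApiKey": "YOUR_ADMIN_API_KEY",
  "algoliaIndexName": "website",
  "crawlerName": "docs-example-com",
  "startUrls": [{ "url": "https://docs.example.com/" }],
  "selectors": [
    { "name": "title", "selector": "h1" },
    { "name": "content", "selector": "main article" }
  ],
  "waitForElement": "main article",
  "additionalPageAttrs": { "source": "docs" },
  "skipIndexUpdate": false
}
```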

Advanced

A few parameters are not shown in the UI. They change the crawling behaviour and can be set through the API or in a local environment.

| Field | Type | Description |
| --- | --- | --- |
| `pageFunction` | String | Overrides the default `pageFunction`. |
| `pseudoUrls` | Array | Overrides the default `pseudoUrls`. |
| `clickableElements` | String | Overrides the default `clickableElements`. |
| `keepUrlFragment` | Boolean | Option to keep URL fragments when enqueueing URLs. |
| `omitSearchParamsFromUrl` | Boolean | Option to omit search parameters from enqueued URLs. |
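These fields go into the same input JSON as the regular parameters. The sketch below uses illustrative values; the `pseudoUrls` object shape and the values shown are assumptions, not the actor's documented defaults.

```json
{
  "pseudoUrls": [{ "purl": "https://docs.example.com/[.*]" }],
  "clickableElements": "a[href]",
  "keepUrlFragment": false,
  "omitSearchParamsFromUrl": true,
  "pageFunction": "async ({ page, request }) => ({ url: request.url, title: await page.title() })"
}
```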

Debug indexed pages

You can find all the pages that will be indexed in the default dataset for a specific actor run.
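For example, once you know the run's default dataset ID, you can download the records through the Apify API (the placeholders are illustrative):

```bash
# Download the items of a run's default dataset as JSON.
# Replace DATASET_ID and APIFY_TOKEN with your own values.
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?format=json&token=APIFY_TOKEN"
```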
