/llms.txt Generator

The /llms.txt Generator 🕸️📄 extracts website content to create an llms.txt file for AI use cases 🤖✨ such as LLM fine-tuning and indexing. The output is available 📥 in the Key-Value Store for easy download and integration into your workflows. 🚀

/llms.txt Generator 🚀📄

The /llms.txt Generator is an Apify Actor that extracts essential website content and generates an /llms.txt file, making your content ready for AI-powered applications such as fine-tuning, indexing, and integration with large language models (LLMs) like GPT-4, ChatGPT, or LLaMA. The Actor leverages the Website Content Crawler Actor to perform deep crawls and extract text content from web pages, ensuring comprehensive data collection. The Website Content Crawler is particularly useful because it supports output in multiple formats, including markdown, which is the format /llms.txt is based on.

🌟 What is /llms.txt?

The /llms.txt format is a markdown-based standard for providing AI-friendly content. It contains:

  • Brief background information and guidance.
  • Links to additional resources in markdown format.
  • AI-focused structure to help coders, researchers, and AI models easily access and use website content.

Proposed structure:

# Title

> Optional description

Optional details go here

## Section name

- [Link title](https://link_url): Optional link details

## Optional

- [Link title](https://link_url)
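
For instance, a minimal /llms.txt following this structure could look like the example below (the project name and URLs are purely illustrative, not output of this Actor):

# Example Docs

> Developer documentation for the Example project.

## Guides

- [Getting started](https://docs.example.com/getting-started): Installation and first steps
- [API reference](https://docs.example.com/api): Endpoints, parameters, and response formats

## Optional

- [Changelog](https://docs.example.com/changelog)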

By adding an /llms.txt file to your website, you make it easy for AI systems to understand, index, and use your content effectively.


🎯 Features of /llms.txt Generator

Our Actor is designed to simplify and automate the creation of /llms.txt files. Here are its key features:

  • Deep website crawling: Extracts content from multi-level websites using the powerful Crawlee library and the Website Content Crawler Actor.
  • Content extraction: Retrieves key metadata such as titles, descriptions, and URLs for seamless integration.
  • File generation: Saves the output in the standardized /llms.txt format.
  • Downloadable output: The /llms.txt file can be downloaded from the key-value store in the storage section of the Actor run details (see the sketch after this list for fetching it programmatically).
  • Resource management: Limits the crawler Actor to 4 GB of memory to ensure compatibility with the free tier, which has an 8 GB limit. Note that this may slow down the crawling process.
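
A minimal sketch of that programmatic download, using the apify-client Python package. The run ID placeholder and the record key "llms.txt" are assumptions for illustration; check the storage tab of your run for the exact key:

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Look up a finished run of this Actor (the run ID below is a placeholder).
run = client.run("<RUN_ID>").get()

# Fetch the generated file from the run's default key-value store.
# The record key "llms.txt" is an assumption; verify it in the run's storage section.
record = client.key_value_store(run["defaultKeyValueStoreId"]).get_record("llms.txt")

with open("llms.txt", "w", encoding="utf-8") as f:
    f.write(record["value"])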

🚀 How it works

  1. Input: Provide the start URL of the website to crawl.
  2. Configuration: Set the maximum crawl depth and other options (optional).
  3. Output: The Actor generates a structured /llms.txt file with extracted content, ready for AI applications.

Input example

{
  "startUrl": "https://docs.apify.com",
  "maxCrawlDepth": 1
}
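
The same input can also be used to start the Actor programmatically. The sketch below uses the apify-client Python package; the Actor ID shown is an assumption, so replace it with the ID from this Actor's page:

from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Start the Actor with the input above and wait for the run to finish.
# "apify/llmstxt-generator" is an assumed Actor ID; use the real one from the Actor page.
run = client.actor("apify/llmstxt-generator").call(
    run_input={
        "startUrl": "https://docs.apify.com",
        "maxCrawlDepth": 1,
    }
)

# The generated llms.txt is stored in the run's default key-value store.
record = client.key_value_store(run["defaultKeyValueStoreId"]).get_record("llms.txt")
print(record["value"])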

Output example (/llms.txt)

# docs.apify.com

## Index

- [Home | Platform | Apify Documentation](https://docs.apify.com/platform): Apify is your one-stop shop for web scraping, data extraction, and RPA. Automate anything you can do manually in a browser.
- [Web Scraping Academy | Academy | Apify Documentation](https://docs.apify.com/academy): Learn everything about web scraping and automation with our free courses that will turn you into an expert scraper developer.
- [Apify Documentation](https://docs.apify.com/api)
- [API scraping | Academy | Apify Documentation](https://docs.apify.com/academy/api-scraping): Learn all about how the professionals scrape various types of APIs with various configurations, parameters, and requirements.
- [API client for JavaScript | Apify Documentation](https://docs.apify.com/api/client/js/)
- [Apify API | Apify Documentation](https://docs.apify.com/api/v2)
- [API client for Python | Apify Documentation](https://docs.apify.com/api/client/python/)
...

✨ Why use /llms.txt Generator?

  • Save time: Automates the tedious process of extracting, formatting, and organizing web content.
  • Boost AI performance: Provides clean, structured data for LLMs and AI-powered tools.
  • Future-proof: Follows a standardized format that's gaining adoption in the AI community.
  • User-friendly: Easy integration into customer-facing products, allowing users to generate /llms.txt files effortlessly.

🔧 Technical highlights

  • Built on the Apify SDK, leveraging state-of-the-art web scraping tools.
  • Designed to handle JavaScript-heavy websites using headless browsers.
  • Equipped with anti-scraping features like proxy rotation and browser fingerprinting.
  • Extensible for custom use cases.

📖 Learn more


Start generating /llms.txt files today and empower your AI applications with clean, structured, and AI-friendly data! 🌐🤖
