Universal AI GPT Scraper

Transform any website into structured data with AI-powered extraction. This versatile tool combines advanced web scraping with intelligent content analysis to deliver clean, customized JSON output - perfect for automating data collection from any web source.

AI · Automation · Developer Tools · Apify

Try Now →

Transform any website into structured data effortlessly! This Apify actor combines AI with precise data extraction: specify what data you need, and AI models parse the web content into clean, structured JSON, saving you hours of manual data collection and processing. Perfect for businesses and developers who need reliable, automated data extraction without complex coding or maintenance.

Use Cases

Extract product information from e-commerce sites
Gather pricing data from service providers
Collect structured data from blog posts or articles
Extract specific fields from documentation pages
Convert any web content into structured JSON data

Features

🎯 Custom Field Extraction: Define exactly what fields you want to extract
🤖 AI-Powered: Uses advanced language models to understand and extract content
📊 Structured Output: Get clean JSON or CSV data with your specified fields
🔄 Type Support: Specify the data type for each field (string, number, boolean, etc.)
🎛️ Model Selection: Choose from predefined AI models or use your own
🎯 CSS Selector Support: Target specific page elements using CSS selectors
🔒 Secure: Support for secret API keys and proxy configuration

Input Configuration

Required Fields

URLs (array): List of web pages to scrape
Fields (array): Specification of fields to extract, each containing:
- name: Field name in the output
- description: Description to guide the AI; be as specific and descriptive as possible
- type: Data type (string, number, boolean, array, object)
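To make the field specification concrete, here is a minimal TypeScript sketch of how a list of fields could be turned into a JSON Schema for a structured-output request. The `FieldSpec` type and the `buildSchema` function are hypothetical names used for illustration; this is not the actor's actual source code.

```typescript
// Hypothetical types mirroring the "fields" input described above.
type FieldType = "string" | "number" | "boolean" | "array" | "object";

interface FieldSpec {
  name: string;
  description: string;
  type: FieldType;
}

// A JSON Schema fragment, kept deliberately loose for this sketch.
type JsonSchema = Record<string, unknown>;

// Builds an object schema in which every requested field is required
// and carries its description as a hint for the model.
function buildSchema(fields: FieldSpec[]): JsonSchema {
  const properties: Record<string, JsonSchema> = {};
  for (const field of fields) {
    properties[field.name] = { type: field.type, description: field.description };
  }
  return {
    type: "object",
    properties,
    required: fields.map((f) => f.name),
    additionalProperties: false,
  };
}

// Usage: the "price" field from the example input further below.
const schema = buildSchema([
  { name: "price", description: "The price per 1000 results, only the number", type: "number" },
]);
console.log(JSON.stringify(schema, null, 2));
```

In this sketch each field's description is copied into the schema the model sees, which illustrates why specific, descriptive field descriptions tend to pay off.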

Content Extraction Options

CSS Selector (string, optional): CSS selector to target specific elements on the page. If provided, only text from elements matching this selector is extracted; if not provided, the default content extraction is used. A selector can greatly reduce the AI cost by reducing the number of input tokens, and it can also improve accuracy. This is an advanced option: if you are not familiar with CSS selectors, leave it empty. To find the right selector, inspect the page's HTML.

Example CSS selectors:

main: selects elements with tag "main".
#price: selects elements with id "price".
.product-details-container .price, .product-details-container .description: selects elements with class "price" and "description" that are descendants of elements with class "product-details-container".
article.main-story, .article-body > p: selects elements with tag "article" and class "main-story", as well as direct child "p" elements under elements with class "article-body".
.documentation-content h2, .documentation-content .method-signature: selects "h2" elements and elements with class "method-signature" that are descendants of elements with class "documentation-content".
.post-container[data-type="user-post"] .content: selects elements with class "content" that are descendants of elements with both class "post-container" and data-type attribute "user-post".
#product-listing div.item:not(.ad) .details h3, .price-info span.current-price: selects "h3" elements inside elements with class "details", which are themselves inside "div" elements with class "item" (but not class "ad") within the element with ID "product-listing"; it also selects "span" elements with class "current-price" inside elements with class "price-info".

AI Model Configuration

You can either use one of our predefined models, which we have verified work well, or specify your own model from OpenRouter. If you use a predefined model, you don't need your own API key: we cover the AI cost and charge you for it through your Apify usage. If you bring your own OpenRouter API key, you will not be charged for the AI cost. Your API key is stored securely and encrypted by Apify.

In our testing, Google Gemini Flash 2.0 gave the best balance of quality and price.

Free Apify users can process only 1 URL every 24 hours with the predefined models, to test out this functionality. If you are a free user, you will need to upgrade your Apify account to a paid subscription tier to keep using the predefined models, or bring your own OpenRouter API key.

Option 1: Predefined Models

Predefined Model (string): Choose from supported models:
- Google Gemini Flash 1.5
- Google Gemini Flash 2.0 (recommended)
- OpenAI GPT-4o-mini
- Google Gemini Pro 1.5
- OpenAI GPT-4o

Option 2: Custom Model

Use Custom Model (boolean): Toggle to use your own model
Custom Model Name (string): OpenRouter model identifier e.g. google/gemini-2.0-flash-001
OpenRouter API Key (string): Your API key for custom model access (stored encrypted)

Make sure your model supports structured outputs. Check model compatibility at: https://openrouter.ai/models?supported_parameters=structured_outputs
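For readers curious what "structured outputs" means at the request level, the sketch below builds the body of an OpenAI-compatible chat completion request for OpenRouter's `https://openrouter.ai/api/v1/chat/completions` endpoint, with `response_format` set to `json_schema`. It is an assumption about how such a request can look, not the actor's implementation. `useCustomModel` and `predefinedModel` come from the example input; `customModelName`, `resolveModel`, `buildRequest`, and the prompt wording are hypothetical.

```typescript
// Assumed shape of an OpenRouter chat request using structured outputs.
interface ChatRequest {
  model: string;
  messages: { role: "system" | "user"; content: string }[];
  response_format: {
    type: "json_schema";
    json_schema: { name: string; schema: Record<string, unknown> };
  };
}

// Hypothetical input subset. "customModelName" is an assumed field name.
interface ModelInput {
  useCustomModel: boolean;
  predefinedModel?: string;
  customModelName?: string;
}

// Picks the custom model when the toggle is on, else the predefined one.
function resolveModel(input: ModelInput): string {
  const model = input.useCustomModel ? input.customModelName : input.predefinedModel;
  if (!model) throw new Error("No model configured");
  return model;
}

// Builds (but does not send) a request asking the model to return JSON
// that conforms to the given schema.
function buildRequest(
  input: ModelInput,
  pageText: string,
  schema: Record<string, unknown>,
): ChatRequest {
  return {
    model: resolveModel(input),
    messages: [
      { role: "system", content: "Extract the requested fields from the page text." },
      { role: "user", content: pageText },
    ],
    response_format: {
      type: "json_schema",
      json_schema: { name: "extraction", schema },
    },
  };
}

const body = buildRequest(
  { useCustomModel: true, customModelName: "google/gemini-2.0-flash-001" },
  "Page text goes here",
  { type: "object", properties: { name: { type: "string" } } },
);
console.log(JSON.stringify(body));
```

Sending this body would be a POST with an `Authorization: Bearer <your OpenRouter key>` header. Models without structured-output support cannot honor `response_format`, which is why the compatibility check above matters.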

Proxies

Proxy Configuration (object): Configure proxy settings for web scraping

Example input

{
    "urls": [
        "https://apify.com/clockworks/free-tiktok-scraper"
    ],
    "fields": [
        {
            "name": "name",
            "description": "The name/title of the scraper tool",
            "type": "string"
        },
        {
            "name": "price",
            "description": "The price per 1000 results, only the number",
            "type": "number"
        },
        {
            "name": "author",
            "description": "The author or maintainer of the scraper",
            "type": "string"
        }
    ],
    "cssSelector": "main > article",
    "useCustomModel": false,
    "predefinedModel": "google/gemini-2.0-flash-001",
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": [
            "RESIDENTIAL"
        ]
    }
}

Output

The actor outputs a dataset where each item contains:

url: The source URL
Custom fields as specified in your input configuration

Example output:

{
    "url": "https://apify.com/clockworks/free-tiktok-scraper",
    "author": "Clockworks",
    "name": "TikTok Data Extractor",
    "price": 4
}
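If you post-process the dataset, a quick sanity check that each item has the types you asked for can catch extraction problems early. The following TypeScript sketch (with a hypothetical, trimmed-down `FieldSpec` and a hypothetical `checkItem` helper) compares one output item against the field list. It treats a missing or null value as a problem, which you may want to relax for optional fields.

```typescript
type FieldType = "string" | "number" | "boolean" | "array" | "object";

// Hypothetical, trimmed-down field spec: only what the check needs.
interface FieldSpec {
  name: string;
  type: FieldType;
}

// Returns a list of problems found in one dataset item; empty means OK.
function checkItem(item: Record<string, unknown>, fields: FieldSpec[]): string[] {
  const problems: string[] = [];
  if (typeof item["url"] !== "string") problems.push("missing url");
  for (const f of fields) {
    const value = item[f.name];
    // typeof reports arrays and null as "object", so handle them first.
    const actual = Array.isArray(value) ? "array" : value === null ? "null" : typeof value;
    if (actual !== f.type) problems.push(`${f.name}: expected ${f.type}, got ${actual}`);
  }
  return problems;
}

// Usage with the example output above.
const problems = checkItem(
  {
    url: "https://apify.com/clockworks/free-tiktok-scraper",
    author: "Clockworks",
    name: "TikTok Data Extractor",
    price: 4,
  },
  [
    { name: "name", type: "string" },
    { name: "price", type: "number" },
    { name: "author", type: "string" },
  ],
);
console.log(problems); // prints []
```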

Cost

There are 3 costs to using this actor: a startup cost, a cost per result, and the AI cost. We split pricing up this way to keep it as competitive as possible.

Each time you start an actor run, there is a one-time charge of $0.05 (5 cents). This covers server startup time.
Every result pushed to the dataset (= every input URL) is charged at $0.001 (1/10th of a cent).
If you use a predefined model you will be charged for every 1,000 tokens depending on the AI model used. If you bring your own API key you will not be charged this.
- Google Gemini Flash 1.5: $0.0006 / 1,000 tokens (6/100th of a cent)
- Google Gemini Flash 2.0: $0.0008 / 1,000 tokens (8/100th of a cent, best value)
- OpenAI GPT-4o mini: $0.0012 / 1,000 tokens (12/100th of a cent)
- Google Gemini Pro 1.5: $0.02 / 1,000 tokens (2 cents)
- OpenAI GPT-4o: $0.04 / 1,000 tokens (4 cents)
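Putting the three charges together, here is a minimal TypeScript sketch of estimating what one run costs, using the prices listed above. The `estimateRunCost` function is hypothetical, and `totalTokens` is assumed to be all tokens billed for the run (the pricing above does not break down input versus output tokens). Proxy and other standard Apify platform charges are not included.

```typescript
// Prices from the list above, in USD.
const START_FEE = 0.05; // per actor run
const RESULT_FEE = 0.001; // per result (= per input URL)
const PRICE_PER_1K_TOKENS: Record<string, number> = {
  "Google Gemini Flash 1.5": 0.0006,
  "Google Gemini Flash 2.0": 0.0008,
  "OpenAI GPT-4o mini": 0.0012,
  "Google Gemini Pro 1.5": 0.02,
  "OpenAI GPT-4o": 0.04,
};

// Estimated actor cost of one run. Pass model = null when you bring your
// own OpenRouter key: the actor then charges no AI cost (OpenRouter bills you).
function estimateRunCost(urls: number, totalTokens: number, model: string | null): number {
  let cost = START_FEE + urls * RESULT_FEE;
  if (model !== null) {
    if (!(model in PRICE_PER_1K_TOKENS)) throw new Error(`Unknown model: ${model}`);
    cost += (totalTokens / 1000) * PRICE_PER_1K_TOKENS[model];
  }
  return cost;
}

// 100 URLs at an assumed 2,000 tokens per page.
console.log(estimateRunCost(100, 200000, "Google Gemini Flash 2.0").toFixed(2)); // prints 0.31
console.log(estimateRunCost(100, 200000, null).toFixed(2)); // prints 0.15
```

For example, 100 URLs at an assumed 2,000 tokens each (200,000 tokens) with Gemini Flash 2.0 comes to $0.05 + $0.10 + $0.16 = $0.31. With your own API key, the actor charges $0.15, and OpenRouter bills the AI usage separately.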

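You can check how many tokens a given text contains with the OpenAI Tokenizer: https://platform.openai.com/tokenizer. As a rough rule of thumb for English text, 1 token is about 4 characters, or about 3/4 of a word (so 100 tokens is roughly 75 words). Other models use different tokenizers, so treat these numbers as estimates.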
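If you just need a ballpark figure before a run, the characters-per-token rule of thumb can be coded directly. This is a minimal TypeScript sketch of that heuristic (the `estimateTokens` name is hypothetical); it is not a real tokenizer, and actual counts vary by model and language.

```typescript
// Heuristic: roughly 4 characters per token for English text.
// Real tokenizers differ per model; use only for rough cost estimates.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// An 8,000-character page is roughly 2,000 tokens.
console.log(estimateTokens("a".repeat(8000))); // prints 2000
```

Running this on the text your CSS selector actually extracts, rather than on the whole page, shows how much a selector can cut the AI cost.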

Limitations

The AI models require clear, well-structured content for best results
Some models may have token limits affecting the amount of text they can process
Custom models must support structured output format
Rate limits may apply based on the chosen AI provider

Cost of Usage

When using predefined models, you don't need an API key; the AI cost is billed through your Apify usage
Custom model usage requires your own OpenRouter API key and credits
Standard Apify platform charges apply (proxy usage if enabled)

Tips for Best Results

Be specific in your field descriptions
Choose appropriate data types for each field
Test with a small number of URLs first
Use the model that best fits your needs (faster models for simple extraction, more powerful models for complex tasks)
Consider using proxies when scraping at scale
Use CSS selectors when you know exactly which elements contain the relevant information
Test your CSS selectors first in browser DevTools to ensure they match the desired elements

Technical Details

Built with TypeScript
Uses Crawlee for web scraping
Integrates with OpenRouter for AI processing
Supports structured output with JSON schema validation
Includes automatic error handling and retries
Supports both default content extraction and CSS selector-based extraction

Frequently Asked Questions

Is it legal to scrape public data?

Generally yes, if you're scraping publicly available data for personal or internal use. Always review the website's Terms of Service before large-scale use or redistribution.

Do I need to code to use this scraper?

No. This is a no-code tool: just enter the URLs and describe the fields you want, then run the scraper directly from your Apify dashboard or the actor page.

What data does it extract?

It extracts whatever fields you define in the input, for example product names, prices, or authors. You can export the results as JSON, CSV, or Excel.

Can I scrape multiple pages?

Yes. Add as many URLs as you need to the URLs input; each URL produces one result in the dataset.

How do I get started?

You can use the Try Now button on this page to go to the scraper. You'll be guided to enter your URLs and field definitions, and you'll get structured results back. No setup needed!