Get URLs from link

Extracts URLs from a sitemap or webpage with intuitive path matching. Use comma-separated patterns to include or exclude URL paths: '/tags/' matches an exact path, '/product' matches paths starting with /product, and plain text such as 'product' matches anywhere in the URL.

This actor extracts URLs from a sitemap or any webpage containing links. It provides intuitive URL path matching and flexible filtering options to get exactly the URLs you need.

Features

Extract URLs from XML sitemaps or webpages
Smart URL path matching:
- Use '/tags/' to match exact path
- Use '/product' to match paths starting with /product
- Use 'product' to match URLs containing this text anywhere
Exclude specific file extensions (e.g., images)
Exclude URLs using the same smart path matching
Limit the number of processed URLs
Simple comma-separated syntax for filters
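
The matching rules above can be sketched in Python. This is only an illustration of the documented behaviour, not the actor's actual code: the names matches_pattern and matches_any are hypothetical, the trailing-slash rule ('tags/' matches paths ending with tags/) is taken from the urlPattern description, and treating '/tags/' as "the path contains this exact segment" is an assumption.

```python
from urllib.parse import urlparse


def matches_pattern(url: str, pattern: str) -> bool:
    """Return True if url matches a single pattern (illustrative interpretation)."""
    pattern = pattern.strip()
    if not pattern:
        return False
    if pattern == "*":
        return True  # '*' includes every URL

    path = urlparse(url).path
    starts_with_slash = pattern.startswith("/")
    ends_with_slash = pattern.endswith("/")

    if starts_with_slash and ends_with_slash:
        # '/tags/': assumed to mean the path contains this exact segment
        return pattern in path
    if starts_with_slash:
        # '/product': the path starts with the pattern
        return path.startswith(pattern)
    if ends_with_slash:
        # 'tags/': the path ends with the pattern
        return path.endswith(pattern)
    # 'product': plain text matches anywhere in the URL
    return pattern in url


def matches_any(url: str, patterns: str) -> bool:
    """Check a URL against a comma-separated list of patterns."""
    return any(matches_pattern(url, p) for p in patterns.split(","))
```

A call such as matches_any(url, "/products/,productId,deals/") returns True as soon as any one of the listed patterns matches.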

Input Configuration

Field	Type	Description
`link`	String	URL of the sitemap or webpage to process (required)
`urlPattern`	String	Comma-separated list of URL parts to include. Use '*' to include all URLs. With slashes: '/tags/' matches the exact path, '/tags' matches paths starting with /tags, and 'tags/' matches paths ending with tags/. Without slashes (e.g., 'product'), the pattern matches anywhere in the URL
`maxUrls`	Integer	Maximum number of URLs to process (0 for no limit). Useful for testing
`excludeExtensions`	String	Comma-separated list of file extensions to exclude. Example: jpg,jpeg,png,gif
`customExcludePattern`	String	Comma-separated list of URL parts to exclude. Uses the same pattern matching as urlPattern. Examples: '/tags/,category' or '/blog/,author'
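
Because the filter fields are comma-separated strings, it can be convenient to assemble the input from lists. The sketch below shows one way to do that. The helper build_run_input is hypothetical, and rejecting a negative maxUrls is an assumption rather than documented behaviour; only the field names come from the table above.

```python
import json


def build_run_input(link, include=None, exclude=None, exclude_extensions=None, max_urls=0):
    """Build the actor input from Python lists (hypothetical helper)."""
    if not link:
        raise ValueError("'link' is required")
    if max_urls < 0:
        # Assumption: only 0 (no limit) or a positive integer makes sense here
        raise ValueError("'maxUrls' must be 0 (no limit) or a positive integer")

    run_input = {"link": link, "maxUrls": max_urls}
    # The three filter fields are comma-separated strings, so join each list
    if include:
        run_input["urlPattern"] = ",".join(include)
    if exclude:
        run_input["customExcludePattern"] = ",".join(exclude)
    if exclude_extensions:
        run_input["excludeExtensions"] = ",".join(exclude_extensions)
    return run_input


example = build_run_input(
    "https://example.com/sitemap.xml",
    include=["/products/", "productId"],
    exclude=["/tags/", "author"],
    exclude_extensions=["jpg", "png"],
    max_urls=100,
)
print(json.dumps(example, indent=4))
```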

Output

The actor outputs a dataset containing URLs that match your specified criteria. Each record has the following field:

{
    "url": "https://example.com/page"
}
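
As a rough illustration of consuming this output, the sketch below pulls the url values out of a dataset exported as JSON. It assumes the export is a JSON array of records shaped like the one above; the function name urls_from_export and the sample data are made up for the example.

```python
import json


def urls_from_export(raw_json):
    """Collect the 'url' field from each exported record."""
    records = json.loads(raw_json)
    # Skip any record that lacks a 'url' field instead of failing
    return [record["url"] for record in records if "url" in record]


sample = '[{"url": "https://example.com/page"}, {"url": "https://example.com/about"}]'
print(urls_from_export(sample))
```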

Usage Examples

Basic Usage

Extract all URLs from a sitemap:

{
    "link": "https://example.com/sitemap.xml"
}

Smart Path Matching

Get only product URLs with different matching options:

{
    "link": "https://example.com/sitemap.xml",
    "urlPattern": "/products/,productId,deals/"
}

This will match:

URLs containing the exact '/products/' path
URLs containing 'productId' anywhere
URLs ending with 'deals/'

Exclude File Types and Sections

Get URLs excluding images and specific sections:

{
    "link": "https://example.com/sitemap.xml",
    "excludeExtensions": "jpg,jpeg,png,gif",
    "customExcludePattern": "/tags/,/category/,author"
}
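
The extension filter in this example could work roughly as sketched below. This covers only excludeExtensions, not customExcludePattern. Comparing case-insensitively and looking only at the URL path (ignoring the query string) are assumptions, and has_excluded_extension is a hypothetical name.

```python
from posixpath import splitext
from urllib.parse import urlparse


def has_excluded_extension(url, exclude_extensions):
    """Return True if the URL path ends in one of the listed extensions."""
    excluded = {
        ext.strip().lower().lstrip(".")
        for ext in exclude_extensions.split(",")
        if ext.strip()
    }
    # Use only the path so a query string such as '?v=2' does not hide the extension
    extension = splitext(urlparse(url).path)[1].lower().lstrip(".")
    return extension in excluded


urls = [
    "https://example.com/img/logo.PNG?v=2",
    "https://example.com/blog/post",
]
kept = [u for u in urls if not has_excluded_extension(u, "jpg,jpeg,png,gif")]
print(kept)
```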

Limit Results

Get first 100 URLs for testing:

{
    "link": "https://example.com/sitemap.xml",
    "maxUrls": 100
}
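
The "0 for no limit" convention of maxUrls can be pictured with the short sketch below. Whether the actor applies the limit before or after filtering is not stated, so this only illustrates the cut-off itself; limit_urls is a hypothetical helper.

```python
from itertools import islice


def limit_urls(urls, max_urls=0):
    """Keep at most max_urls items; 0 means no limit."""
    if max_urls <= 0:
        # Assumption: negative values are treated like 0 (no limit)
        return list(urls)
    return list(islice(urls, max_urls))


all_urls = [f"https://example.com/page-{i}" for i in range(250)]
print(len(limit_urls(all_urls, 100)))
print(len(limit_urls(all_urls, 0)))
```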

Frequently Asked Questions

Is it legal to scrape public data?

Yes, if you're scraping publicly available data for personal or internal use. Always review the website's Terms of Service before large-scale use or redistribution.

Do I need to code to use this scraper?

No. This is a no-code tool: just enter a link and any optional filters, then run the actor directly from your dashboard or the Apify actor page.

What data does it extract?

It extracts the URLs found in the sitemap or webpage that match your filters. Each dataset record contains a single url field. You can export all of it to Excel or JSON.

Can I filter the results or limit how many URLs are processed?

Yes, you can include or exclude URLs with urlPattern and customExcludePattern, skip file types with excludeExtensions, and cap the number of processed URLs with maxUrls.

How do I get started?

You can use the Try Now button on this page to go to the actor. Enter the link you want to process, add any filters, and run it to get structured results. No setup needed!