Extracts URLs from a sitemap or webpage with intuitive path matching. Use comma-separated patterns to include or exclude URL paths: '/tags/' matches an exact path, '/product' matches paths starting with /product, and plain text such as 'product' matches anywhere in the URL.
This actor extracts URLs from a sitemap or any webpage containing links. It provides intuitive URL path matching and flexible filtering options to get exactly the URLs you need.
Features
- Extract URLs from XML sitemaps or webpages
- Smart URL path matching:
  - Use '/tags/' to match exact path
  - Use '/product' to match paths starting with /product
  - Use 'product' to match URLs containing this text anywhere
- Exclude specific file extensions (e.g., images)
- Exclude URLs using the same smart path matching
- Limit the number of processed URLs
- Simple comma-separated syntax for filters
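The matching rules above can be sketched in a few lines of Python. This is an illustration of the documented behavior, not the actor's source code: the function names `matches_pattern` and `matches_any` are hypothetical, and the sketch assumes "exact path" means the URL's whole path equals the pattern. It also covers the '*' wildcard and the trailing-slash form ('tags/') described under Input Configuration.

```python
from urllib.parse import urlsplit


def matches_pattern(url: str, pattern: str) -> bool:
    """Return True if `url` satisfies a single pattern (hypothetical helper)."""
    pattern = pattern.strip()
    if not pattern:
        return False
    if pattern == "*":
        return True  # wildcard: every URL matches
    path = urlsplit(url).path
    starts = pattern.startswith("/")
    ends = pattern.endswith("/")
    if starts and ends:
        # '/tags/': exact path (assumption: the whole path equals the pattern)
        return path == pattern
    if starts:
        # '/product': path starts with the pattern
        return path.startswith(pattern)
    if ends:
        # 'tags/': path ends with the pattern
        return path.endswith(pattern)
    # 'product': plain text, matches anywhere in the URL
    return pattern in url


def matches_any(url: str, patterns: str) -> bool:
    """Check a URL against a comma-separated pattern list."""
    return any(matches_pattern(url, p) for p in patterns.split(","))
```

For example, `matches_any("https://example.com/product/123", "/tags/,/product")` is true because the path starts with /product, even though it is not exactly /tags/.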
Input Configuration
| Field | Type | Description |
|---|---|---|
| link | String | URL to process (required) |
| urlPattern | String | List of URL parts to include (comma separated). Use '*' to include all URLs. When using slashes: '/tags/' matches exact path, '/tags' matches path starting with /tags, 'tags/' matches path ending with tags/. Without slashes (e.g., 'product') matches anywhere in URL |
| maxUrls | Integer | Maximum number of URLs to process (0 for no limit). Good for testing purposes |
| excludeExtensions | String | List of file extensions to exclude (comma separated). Example: jpg,jpeg,png,gif |
| customExcludePattern | String | List of URL parts to exclude (comma separated). Uses same pattern matching as urlPattern. Examples: '/tags/,category' or '/blog/,author' |
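As a rough sketch of how these fields could combine, the Python below applies them to a list of already-collected URLs. It is not the actor's implementation. The names `split_list` and `filter_urls` are hypothetical, the `matches` argument stands in for whatever pattern matcher is used (a plain substring check in the demo), and the sketch assumes `maxUrls` caps how many URLs are examined rather than how many are returned.

```python
from typing import Callable, Iterable
from urllib.parse import urlsplit


def split_list(value: str) -> list[str]:
    """Split a comma-separated field into trimmed, non-empty parts."""
    return [part.strip() for part in value.split(",") if part.strip()]


def filter_urls(
    urls: Iterable[str],
    config: dict,
    matches: Callable[[str, str], bool],
) -> list[str]:
    """Apply the include, exclude, extension, and limit fields (hypothetical helper)."""
    include = split_list(config.get("urlPattern", "*")) or ["*"]
    exclude = split_list(config.get("customExcludePattern", ""))
    extensions = {
        "." + ext.lower().lstrip(".")
        for ext in split_list(config.get("excludeExtensions", ""))
    }
    max_urls = config.get("maxUrls", 0)  # 0 means no limit

    kept = []
    for count, url in enumerate(urls):
        if max_urls and count >= max_urls:
            break  # assumption: the limit counts URLs examined
        path = urlsplit(url).path.lower()
        if any(path.endswith(ext) for ext in extensions):
            continue  # excluded file extension
        if not any(p == "*" or matches(url, p) for p in include):
            continue  # no include pattern matched
        if any(matches(url, p) for p in exclude):
            continue  # an exclude pattern matched
        kept.append(url)
    return kept


def substring(url: str, pattern: str) -> bool:
    """Demo matcher only: plain-text patterns match anywhere in the URL."""
    return pattern in url


urls = [
    "https://example.com/product/1",
    "https://example.com/product/1.jpg",
    "https://example.com/product/author/bob",
    "https://example.com/about",
    "https://example.com/product/2",
]
config = {
    "link": "https://example.com/sitemap.xml",
    "urlPattern": "product",
    "maxUrls": 4,
    "excludeExtensions": "jpg,png",
    "customExcludePattern": "author",
}
result = filter_urls(urls, config, substring)
```

With `maxUrls` set to 4, only the first four URLs are examined, so the fifth is never considered even though it would otherwise pass every filter.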
Output
The actor outputs a dataset containing URLs that match your specified criteria. Each record has the following field:
{
  "url": "https://example.com/page"
}
Usage Examples
Basic Usage
Extract all URLs from a sitemap:
{
  "link": "https://example.com/sitemap.xml"
}
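To make the sitemap case concrete, here is a minimal Python sketch of pulling `<loc>` entries out of sitemap XML and shaping them like the output records. It parses an inline sample rather than fetching anything, the function name `extract_sitemap_urls` is hypothetical, and it assumes the sitemap uses the standard sitemaps.org namespace. How the actor itself parses sitemaps is not documented here.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc> https://example.com/product/1 </loc></url>
</urlset>"""


def extract_sitemap_urls(xml_text: str) -> list[dict]:
    """Return one {"url": ...} record per non-empty <loc> element."""
    root = ET.fromstring(xml_text)
    records = []
    for loc in root.iterfind(".//sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if url:
            records.append({"url": url})
    return records


records = extract_sitemap_urls(SAMPLE)
```

Because the search uses `.//sm:loc`, the same function also picks up `<loc>` entries in a sitemap index file, where they point at child sitemaps rather than pages.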
Smart Path Matching
Get only product URLs with different matching options:
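The three pattern forms from the Features list translate into inputs like the ones below. This is an illustrative sketch in Python that only builds and prints the input objects. The sitemap URL is a placeholder and the labels are descriptive, not part of the actor's input schema.

```python
import json

SITEMAP = "https://example.com/sitemap.xml"  # placeholder URL

# One input per matching option; only urlPattern differs.
product_inputs = {
    "exact path /product/": {"link": SITEMAP, "urlPattern": "/product/"},
    "path starts with /product": {"link": SITEMAP, "urlPattern": "/product"},
    "'product' anywhere in the URL": {"link": SITEMAP, "urlPattern": "product"},
}

for label, run_input in product_inputs.items():
    print(f"# {label}")
    print(json.dumps(run_input, indent=2))
```

Each printed object can be pasted into the actor's input as-is. Several patterns can also be combined in one `urlPattern` value, separated by commas.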
Is it legal to scrape job listings or public data?
Yes, if you're scraping publicly available data for personal or internal use. Always review the website's Terms of Service before large-scale use or redistribution.
Do I need to code to use this scraper?
No. This is a no-code tool — just enter a job title, location, and run the scraper directly from your dashboard or Apify actor page.
What data does it extract?
It extracts job titles, companies, salaries (if available), descriptions, locations, and post dates. You can export all of it to Excel or JSON.
Can I scrape multiple pages or filter by location?
Yes, you can scrape multiple pages and refine by job title, location, keyword, or more depending on the input settings you use.
How do I get started?
You can use the Try Now button on this page to go to the scraper. You’ll be guided to input a search term and get structured results. No setup needed!