Extracts URLs from a sitemap or webpage with intuitive path matching. Use comma-separated patterns to include or exclude URL paths: '/tags/' for an exact path, '/product' for paths starting with /product, or plain text for substring matches.
This actor extracts URLs from a sitemap or any webpage containing links. It provides intuitive URL path matching and flexible filtering options to get exactly the URLs you need.
Field | Type | Description |
---|---|---|
link | String | URL to process (required) |
urlPattern | String | Comma-separated list of URL parts to include. Use '*' to include all URLs. With slashes: '/tags/' matches the exact path, '/tags' matches paths starting with /tags, and 'tags/' matches paths ending with tags/. A pattern without slashes (e.g., 'product') matches anywhere in the URL |
maxUrls | Integer | Maximum number of URLs to process (0 for no limit). Good for testing purposes |
excludeExtensions | String | List of file extensions to exclude (comma separated). Example: jpg,jpeg,png,gif |
customExcludePattern | String | List of URL parts to exclude (comma separated). Uses same pattern matching as urlPattern. Examples: '/tags/,category' or '/blog/,author' |
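The matching rules in the table can be sketched in a few lines of Python. This is only an illustration of one reading of those rules, not the actor's actual implementation: the function names `matches_pattern` and `matches_any` are hypothetical, and treating "exact path" as equality with the URL's path component is an assumption.

```python
from urllib.parse import urlparse


def matches_pattern(url: str, pattern: str) -> bool:
    """Check one URL against one pattern (hypothetical helper, assumed semantics)."""
    pattern = pattern.strip()
    if pattern == "*":
        return True  # wildcard: include every URL
    path = urlparse(url).path
    starts = pattern.startswith("/")
    ends = pattern.endswith("/")
    if starts and ends:
        return path == pattern           # '/tags/'  -> exact path (assumed: equality)
    if starts:
        return path.startswith(pattern)  # '/tags'   -> path starts with /tags
    if ends:
        return path.endswith(pattern)    # 'tags/'   -> path ends with tags/
    return pattern in url                # 'product' -> anywhere in the URL


def matches_any(url: str, patterns: str) -> bool:
    """Return True if the URL matches at least one comma-separated pattern."""
    return any(matches_pattern(url, p) for p in patterns.split(",") if p.strip())
```

Under these assumptions, `matches_any("https://example.com/tags/python", "/tags")` is true, while the same URL does not match the exact-path pattern `/tags/`.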
The actor outputs a dataset containing URLs that match your specified criteria. Each record has the following field:
    {
      "url": "https://example.com/page"
    }
Extract all URLs from a sitemap:
    {
      "link": "https://example.com/sitemap.xml"
    }
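For readers curious what extracting URLs from a sitemap involves, here is a minimal sketch using only the Python standard library. It assumes a standard sitemap in the sitemaps.org 0.9 namespace and parses an inline sample string instead of fetching one; the function name `urls_from_sitemap` is hypothetical and this is not the actor's own code.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (sitemaps.org protocol 0.9).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


# Inline sample so the sketch runs without network access.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/page</loc></url>
</urlset>"""

# Shape each URL like the actor's output records.
records = [{"url": u} for u in urls_from_sitemap(sample)]
```

Each entry in `records` has the same single-field shape as the output record shown earlier.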
Get only product URLs with different matching options:
    {
      "link": "https://example.com/sitemap.xml",
      "urlPattern": "/products/,productId,deals/"
    }
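This will match, following the urlPattern rules in the input table:
- '/products/': the exact path /products/
- 'productId': any URL containing productId
- 'deals/': paths ending with deals/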
Get URLs excluding images and specific sections:
    {
      "link": "https://example.com/sitemap.xml",
      "excludeExtensions": "jpg,jpeg,png,gif",
      "customExcludePattern": "/tags/,/category/,author"
    }
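The extension filter in this example can be approximated as follows. This is a sketch under stated assumptions, not the actor's implementation: the helper name `has_excluded_extension` is hypothetical, and comparing case-insensitively against the URL path only (ignoring the query string) is an assumption. The pattern-based exclusion is not shown here.

```python
import posixpath
from urllib.parse import urlparse


def has_excluded_extension(url: str, extensions: str) -> bool:
    """Return True if the URL path ends in one of the comma-separated extensions."""
    excluded = {e.strip().lower().lstrip(".") for e in extensions.split(",")}
    excluded.discard("")  # ignore empty entries such as a trailing comma
    # splitext returns e.g. ".png", or "" when the path has no extension.
    ext = posixpath.splitext(urlparse(url).path)[1]
    return ext.lower().lstrip(".") in excluded


urls = [
    "https://example.com/logo.PNG",
    "https://example.com/about",
    "https://example.com/photo.jpeg?size=large",
    "https://example.com/report.pdf",
]
kept = [u for u in urls if not has_excluded_extension(u, "jpg,jpeg,png,gif")]
```

With these inputs, `kept` contains only the `/about` and `/report.pdf` URLs.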
Get first 100 URLs for testing:
    {
      "link": "https://example.com/sitemap.xml",
      "maxUrls": 100
    }
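The "0 for no limit" convention of maxUrls can be expressed as a small helper. This is a hedged sketch of the idea, with a hypothetical function name `limit_urls`; how the actor applies the limit internally is not documented here.

```python
from itertools import islice
from typing import Iterable, Iterator


def limit_urls(urls: Iterable[str], max_urls: int) -> Iterator[str]:
    """Yield at most max_urls items; a value of 0 means no limit."""
    if max_urls == 0:
        return iter(urls)
    return islice(urls, max_urls)


all_urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
first_two = list(limit_urls(all_urls, 2))
unlimited = list(limit_urls(all_urls, 0))
```

Because the helper works on any iterable, a limit can cut processing short without reading the whole URL list first.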
Yes, if you're scraping publicly available data for personal or internal use. Always review the website's Terms of Service before large-scale use or redistribution.
No. This is a no-code tool: just enter a link and run the actor directly from your dashboard or the Apify actor page.
It extracts the URLs that match your criteria, one per record in the url field. You can export all of it to Excel or JSON.
Yes, you can refine the results with urlPattern, excludeExtensions, customExcludePattern, and maxUrls, depending on the input settings you use.
You can use the Try Now button on this page to go to the actor. You’ll be guided to enter a link and get structured results. No setup needed!