Website Email Scraper

Extract videos, images, audio, APKs & emails from websites. This Apify actor crawls pages to discover media links with configurable depth, proxy support & domain filtering. Boost content research & lead gen.

Try Now →

Website Email Extractor - Most efficient

🔍 Overview

Media Link Extractor is a powerful Apify actor that automatically crawls websites to discover and extract various types of media links including videos, images, audio files, APK files, and email addresses. Perfect for content aggregation, SEO research, lead generation, and digital asset management.

✨ Key Features

Multi-Media Support: Extract various media types (videos, images, audio, APKs, emails)
Configurable Crawling: Set crawl depth, concurrency, and URL limits to suit your needs
Smart Extraction: Uses multiple detection methods including URL patterns, HTML tags, and CSS selectors
Proxy Support: Optional Apify proxy integration for better scraping success rates
Domain Filtering: Stays on the same domain to focus crawling on relevant content
Detailed Output: Organized dataset with source URLs, timestamps, and media metadata
Rate Limiting Protection: Built-in mechanisms to avoid overloading target websites

🎯 Use Cases

Content Creators: Find media resources for projects and presentations
Digital Marketers: Discover image and video assets for competitor analysis
App Developers: Locate APK distribution points for competitive research
Lead Generation: Extract email addresses for business outreach campaigns
SEO Specialists: Analyze media usage patterns across websites
Researchers: Gather media files for analysis and archiving projects

🛠️ Input Parameters

{
  "startUrls": [
    { "url": "https://example.com" }
  ],
  "mediaType": "all",
  "maxCrawlDepth": 1,
  "maxConcurrency": 10,
  "maxRequestRetries": 3,
  "maxUrlsToCrawl": 100,
  "useProxy": {
    "useApifyProxy": false,
    "apifyProxyGroups": [],
    "apifyProxyCountry": ""
  }
}
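For readers who assemble this input from a script rather than the dashboard, here is a minimal Python sketch that builds the same structure and rejects obviously invalid values. The helper name `build_input` and its validation rules are assumptions made for illustration; the actor's own input validation may differ.

```python
# Media types documented for the `mediaType` parameter.
ALLOWED_MEDIA_TYPES = {"video", "audio", "image", "apk", "email", "all"}


def build_input(start_urls, media_type="all", max_crawl_depth=1,
                max_concurrency=10, max_request_retries=3,
                max_urls_to_crawl=100, use_apify_proxy=False):
    """Return an input dict shaped like the JSON example (hypothetical helper)."""
    if not start_urls:
        raise ValueError("at least one start URL is required")
    if media_type not in ALLOWED_MEDIA_TYPES:
        raise ValueError(f"unsupported mediaType: {media_type!r}")
    if max_crawl_depth < 0 or max_request_retries < 0:
        raise ValueError("depth and retries must not be negative")
    if max_concurrency < 1 or max_urls_to_crawl < 1:
        raise ValueError("concurrency and URL limit must be at least 1")
    return {
        "startUrls": [{"url": url} for url in start_urls],
        "mediaType": media_type,
        "maxCrawlDepth": max_crawl_depth,
        "maxConcurrency": max_concurrency,
        "maxRequestRetries": max_request_retries,
        "maxUrlsToCrawl": max_urls_to_crawl,
        "useProxy": {
            "useApifyProxy": use_apify_proxy,
            "apifyProxyGroups": [],
            "apifyProxyCountry": "",
        },
    }
```

Calling `build_input(["https://example.com"])` with no other arguments yields the default values shown in the JSON example.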

Parameter Details

| Parameter | Type | Description |
| --- | --- | --- |
| `startUrls` | Array | List of URLs where the crawler will begin |
| `mediaType` | String | Type of media to extract: `video`, `audio`, `image`, `apk`, `email`, or `all` |
| `maxCrawlDepth` | Number | How many links deep the crawler will go |
| `maxConcurrency` | Number | Maximum parallel requests |
| `maxRequestRetries` | Number | Number of retry attempts for failed requests |
| `maxUrlsToCrawl` | Number | Maximum number of URLs to process |
| `useProxy` | Object | Configuration for Apify proxy usage |
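To illustrate how `maxCrawlDepth` and `maxUrlsToCrawl` might interact, the sketch below runs a breadth-first traversal over an in-memory link graph instead of fetching real pages. It assumes that start URLs count as depth 0 and that the URL limit counts visited pages; the actor's actual counting rules are not documented here, and `crawl_order` is a hypothetical name.

```python
from collections import deque


def crawl_order(links, start_urls, max_crawl_depth, max_urls_to_crawl):
    """Return (url, depth) pairs in breadth-first visiting order.

    `links` maps each URL to the URLs it links to, standing in for real fetching.
    """
    queue = deque()
    seen = set()
    for url in start_urls:
        if url not in seen:
            seen.add(url)
            queue.append((url, 0))  # assumption: start URLs are depth 0

    visited = []
    while queue and len(visited) < max_urls_to_crawl:
        url, depth = queue.popleft()
        visited.append((url, depth))
        if depth >= max_crawl_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return visited


# Toy link graph: a -> b, c; b -> d; c -> d; d -> e
LINKS = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
print(crawl_order(LINKS, ["a"], max_crawl_depth=1, max_urls_to_crawl=100))
```

Under these assumptions, a depth of 1 visits the start page and the pages it links to directly, while a low URL limit stops the crawl early regardless of depth.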

📊 Output Format

The actor stores results in the default dataset with this structure:

{
  "sourceUrl": "https://example.com/page",
  "pageTitle": "Example Page Title",
  "mediaLinks": [
    {
      "url": "https://example.com/video.mp4",
      "sourceUrl": "https://example.com/page",
      "title": "Example Page Title",
      "type": "video",
      "foundAt": "2025-04-10T06:40:01.000Z"
    }
  ],
  "timestamp": "2025-04-10T06:40:01.000Z"
}
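Because each dataset item nests its links under `mediaLinks`, downstream tools such as spreadsheets usually need one row per link. The following is a small sketch of that flattening, assuming items follow the structure shown above; the function names and output column names are illustrative choices, not part of the actor.

```python
import csv
import io

# One item copied from the documented output structure.
SAMPLE_ITEM = {
    "sourceUrl": "https://example.com/page",
    "pageTitle": "Example Page Title",
    "mediaLinks": [
        {
            "url": "https://example.com/video.mp4",
            "sourceUrl": "https://example.com/page",
            "title": "Example Page Title",
            "type": "video",
            "foundAt": "2025-04-10T06:40:01.000Z",
        }
    ],
    "timestamp": "2025-04-10T06:40:01.000Z",
}

FIELDS = ["pageUrl", "pageTitle", "mediaUrl", "type", "foundAt"]


def flatten_items(items):
    """Yield one flat row per media link from page-level dataset items."""
    for item in items:
        for link in item.get("mediaLinks", []):
            yield {
                "pageUrl": item.get("sourceUrl", ""),
                "pageTitle": item.get("pageTitle", ""),
                "mediaUrl": link.get("url", ""),
                "type": link.get("type", ""),
                "foundAt": link.get("foundAt", ""),
            }


def to_csv(rows):
    """Serialize flat rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(flatten_items([SAMPLE_ITEM])))
```

Pages with an empty `mediaLinks` array simply produce no rows.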

⚙️ Technical Implementation

Media Link Extractor uses a combination of techniques to find media resources:

CSS Selectors: Targets specific HTML elements containing media
URL Pattern Matching: Identifies file extensions and URL patterns
Context Analysis: Examines surrounding elements for media indicators
Domain Adherence: Maintains focus on the original domain
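The techniques listed above can be approximated with Python's standard library. This is a rough sketch, not the actor's implementation: the extension lists, the email regular expression, the use of `src`/`href` attributes in place of CSS selectors, and the exact-hostname rule for domain adherence are all assumptions chosen for illustration.

```python
import re
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse

# Assumed extension lists; the actor's real lists are not documented.
EXTENSIONS = {
    "video": {".mp4", ".webm", ".mov"},
    "audio": {".mp3", ".wav", ".ogg"},
    "image": {".jpg", ".jpeg", ".png", ".gif", ".webp"},
    "apk": {".apk"},
}
# Deliberately simple email pattern; it will miss some valid addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def classify_url(url):
    """URL pattern matching: map a URL to a media type by file extension."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    for media_type, extensions in EXTENSIONS.items():
        if suffix in extensions:
            return media_type
    return None


def same_domain(url, start_url):
    """Domain adherence, assumed here to mean an exact hostname match."""
    return urlparse(url).hostname == urlparse(start_url).hostname


class LinkCollector(HTMLParser):
    """Collects absolute URLs from src and href attributes of any tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.urls.append(urljoin(self.base_url, value))


def extract(html, page_url):
    """Return ([(url, media_type), ...], [email, ...]) found in one page."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    media = [(url, classify_url(url)) for url in collector.urls]
    media = [(url, kind) for url, kind in media if kind is not None]
    emails = sorted(set(EMAIL_RE.findall(html)))
    return media, emails
```

A crawler built on this sketch would call `extract` on every fetched page and use `same_domain` to decide which discovered links to enqueue next.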

💡 Best Practices

Start Small: Begin with a low maxUrlsToCrawl value to test results
Respect Websites: Use reasonable maxConcurrency values to avoid overloading sites
Optimize Depth: Most valuable media is often found within 1-2 levels of crawl depth
Target Specific Media: Use the appropriate mediaType parameter instead of "all" for more focused results

📚 Examples

Extract Videos from a Website

{
  "startUrls": [{ "url": "https://example.com/videos" }],
  "mediaType": "video",
  "maxCrawlDepth": 2,
  "maxUrlsToCrawl": 50
}

Find Email Addresses for Lead Generation

{
  "startUrls": [{ "url": "https://company.com/about" }],
  "mediaType": "email",
  "maxCrawlDepth": 3,
  "maxUrlsToCrawl": 200
}

Collect APK Files from Android Sites

{
  "startUrls": [{ "url": "https://apksite.com" }],
  "mediaType": "apk",
  "maxCrawlDepth": 2,
  "maxUrlsToCrawl": 100
}

📈 Performance Considerations

Processing speed depends on website complexity and response times
Typical extraction rates: 5-10 pages per second without proxy, 2-5 pages per second with proxy
Memory usage scales with concurrency and page complexity
Consider using Apify proxy for rate-limited or IP-blocking websites
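As a back-of-the-envelope aid, the quoted page rates can be turned into a rough run-time range for a given `maxUrlsToCrawl`. The sketch below takes those rates at face value and assumes they hold steady for the whole run, which real crawls rarely do; `estimate_seconds` is a hypothetical helper.

```python
def estimate_seconds(num_urls, pages_per_second_low, pages_per_second_high):
    """Return (best_case, worst_case) crawl time in seconds for a rate range."""
    if pages_per_second_low <= 0 or pages_per_second_high < pages_per_second_low:
        raise ValueError("rates must be positive and low must not exceed high")
    best_case = num_urls / pages_per_second_high
    worst_case = num_urls / pages_per_second_low
    return best_case, worst_case


# 100 URLs at the quoted rates, without and with proxy.
print(estimate_seconds(100, 5, 10))
print(estimate_seconds(100, 2, 5))
```

At those rates, 100 URLs would take roughly 10 to 20 seconds without a proxy and 20 to 50 seconds with one.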

🔗 Integration Ideas

Connect with Apify Storage for permanent dataset archiving
Combine with Google Sheets integration for easy team collaboration
Use with Zapier or Make to automate workflows with extracted media
Export data to S3 or other cloud storage for batch processing

Frequently Asked Questions

Is it legal to scrape public data?

Yes, if you're scraping publicly available data for personal or internal use. Always review the website's Terms of Service before large-scale use or redistribution.

Do I need to code to use this scraper?

No. This is a no-code tool: enter one or more start URLs, choose a media type, and run the scraper directly from your dashboard or the Apify actor page.

What data does it extract?

It extracts links to videos, images, audio files, and APK files, plus email addresses, together with the source URL, page title, and timestamp for each result. You can export all of it to Excel or JSON.

Can I scrape multiple pages or filter by media type?

Yes, you can crawl multiple pages (controlled by `maxCrawlDepth` and `maxUrlsToCrawl`) and restrict results to a single media type with `mediaType`.

How do I get started?

You can use the Try Now button on this page to go to the scraper. You’ll be guided to enter your start URLs and get structured results. No setup needed!