W3C Html Reporter

Get HTML validity reports from various web pages using W3C HTML validator.

W3C HTML Validity Reporter

The W3C HTML Validity Reporter is an Apify actor that reports on the validity of webpages' HTML according to the W3C HTML Validator. The actor takes webpage URLs as input and produces reports with detailed information on the validity of each webpage's HTML.

Input

The actor takes the following input:

startUrls (Array, required): The URLs of the webpages to validate, each given as an object with a "url" field.
proxy (Object): Proxy configuration. You can edit this to use Apify proxy, or provide your own proxy servers. Default value is { "useApifyProxy": false }.
debug (Boolean): See detailed logs when activated. Default value is false.
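
As a rough illustration of how these three fields fit together, the following Python sketch assembles an input object from plain URL strings. The helper name build_input and its parameters are hypothetical and not part of the actor; only the field names and default values come from the list above. The URL check is an assumption about what counts as sensible input, not a rule the actor is known to enforce.

```python
import json
from urllib.parse import urlparse


def build_input(urls, use_apify_proxy=False, debug=False):
    """Assemble an actor input dict (hypothetical helper, not part of the actor)."""
    if not urls:
        raise ValueError("startUrls is required: pass at least one URL")
    for url in urls:
        parts = urlparse(url)
        # Assumption: only absolute http(s) URLs are meaningful to validate.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
    return {
        # Each start URL is wrapped in an object with a "url" field.
        "startUrls": [{"url": url} for url in urls],
        # Documented default: Apify proxy disabled.
        "proxy": {"useApifyProxy": use_apify_proxy},
        # Documented default: detailed logs disabled.
        "debug": debug,
    }


if __name__ == "__main__":
    print(json.dumps(build_input(["https://example.com"]), indent=2))
```

Calling build_input with use_apify_proxy=True gives the same object with the proxy switched on.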

Output

The actor generates a JSON report on the validity of each webpage's HTML. The report includes:

A list of messages given by the validator
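
Once a report has been downloaded, a quick tally per severity is often the first thing worth looking at. The Python sketch below assumes each message is a dict with a "severity" field holding values such as "info", "warning", or "error", as in the results examples in this document. The function names are hypothetical.

```python
from collections import Counter


def count_by_severity(items):
    """Count messages per "severity" value (items are plain dicts)."""
    # Assumption: items without a "severity" field are grouped as "unknown".
    return Counter(item.get("severity", "unknown") for item in items)


def has_errors(items):
    """Return True if at least one message has severity "error"."""
    return count_by_severity(items)["error"] > 0


if __name__ == "__main__":
    # Trimmed-down stand-ins for downloaded dataset items.
    sample = [
        {"url": "https://apify.com", "severity": "info"},
        {"url": "https://apify.com", "severity": "warning"},
        {"url": "https://apify.com", "severity": "error"},
    ]
    print(dict(count_by_severity(sample)))
    print(has_errors(sample))
```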

Usage

To use the actor, you'll need an Apify account. If you don't have one, sign up for free on the Apify website.

Once you have an account, you can run the actor by creating a new task with the following configuration:

{
  "startUrls": [
    {
      "url": "https://example.com"
    }
  ],
  "proxy": {
    "useApifyProxy": false
  },
  "debug": false
}

Replace "https://example.com" with the URL of the webpage you want to validate.

Please note that the W3C validator uses Cloudflare to protect its website against bots. You may need to enable Apify proxy for this actor to work.

Results example

The output from the W3C validator is stored in the dataset. Each message is stored as an item inside the dataset. After the run is finished, you can download the data to your computer or export it to any web app in various data formats (JSON, CSV, XML, RSS, HTML Table). Here are a few examples of the output you can get:

{
  "url": "https://apify.com",
  "language": "en",
  "severity": "info",
  "lastLine": 10,
  "firstColumn": 301,
  "lastColumn": 357,
  "message": "Trailing slash on void elements has no effect and interacts badly with unquoted attribute values.",
  "markup": "rowser.\"/><meta name=\"twitter:card\" content=\"summary_large_image\"/><meta ",
  "highlightIndex": 10,
  "highlightLength": 57
}

{
  "url": "https://apify.com",
  "language": "en",
  "severity": "warning",
  "firstLine": 614,
  "lastLine": 614,
  "firstColumn": 5684,
  "lastColumn": 5721,
  "message": "Section lacks heading. Consider using “h2”-“h6” elements to add identifying headings to all sections, or else use a “div” element instead for any cases where no heading is needed.",
  "markup": "-0 wwExY\"><section class=\"sc-1913faef-1 jYOdxN\"><div c",
  "highlightIndex": 10,
  "highlightLength": 38
}

{
  "url": "https://apify.com",
  "language": "en",
  "severity": "error",
  "lastLine": 10,
  "firstColumn": 1210,
  "lastColumn": 1272,
  "message": "A “meta” element with an “http-equiv” attribute whose value is “X-UA-Compatible” must have a “content” attribute with the value “IE=edge”.",
  "markup": "ent=\"24\"/><meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge,chrome=1\"/><meta ",
  "highlightIndex": 10,
  "highlightLength": 63
}
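
The "markup", "highlightIndex", and "highlightLength" fields appear to work together: in the examples above, the highlight length equals lastColumn - firstColumn + 1, and slicing "markup" at the given index yields the offending element. The Python sketch below is written under that assumption. It also assumes that a missing "firstLine" means the message starts on "lastLine". Both function names are hypothetical.

```python
def highlighted_markup(item):
    """Return the part of "markup" that the message points at."""
    start = item["highlightIndex"]
    length = item["highlightLength"]
    # Assumption: the index is a zero-based offset into the "markup" string.
    return item["markup"][start:start + length]


def location(item):
    """Format the message position as firstLine:firstColumn-lastLine:lastColumn."""
    # Assumption: when "firstLine" is absent, the range starts on "lastLine".
    first_line = item.get("firstLine", item["lastLine"])
    return (
        f'{first_line}:{item["firstColumn"]}'
        f'-{item["lastLine"]}:{item["lastColumn"]}'
    )


if __name__ == "__main__":
    # First example item from this section, reduced to the fields used here.
    item = {
        "lastLine": 10,
        "firstColumn": 301,
        "lastColumn": 357,
        "markup": 'rowser."/><meta name="twitter:card" '
                  'content="summary_large_image"/><meta ',
        "highlightIndex": 10,
        "highlightLength": 57,
    }
    print(location(item))
    print(highlighted_markup(item))
```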

Frequently Asked Questions

Is it legal to scrape public data?

Yes, if you're scraping publicly available data for personal or internal use. Always review the website's Terms of Service before large-scale use or redistribution.

Do I need to code to use this scraper?

No. This is a no-code tool: just enter the URLs of the pages you want to validate and run the actor directly from your dashboard or the Apify actor page.

What data does it extract?

It extracts the messages returned by the W3C validator for each page: the page URL, severity, line and column positions, message text, and the surrounding markup. You can export all of it to JSON, CSV, or the other supported formats.

Can I validate multiple pages?

Yes, startUrls takes a list, so you can add an entry for each page you want to validate.

How do I get started?

You can use the Try Now button on this page to go to the actor. You'll be guided to enter the URLs to validate and get structured results. No setup needed!