Solarly
AI report card

Common Crawl

commoncrawl.org
Live infrastructure simulation · commoncrawl.org · Baseline
AI-Readiness Index: 42/100
Core Web Vitals: 40
Schema Coverage: 12%
Agent Crawlability: 32%
<!-- current <head> on commoncrawl.org -->
<title>Common Crawl - Home</title>
<meta name="description" content="Welcome to Common Crawl.">
<!-- no JSON-LD -->
<!-- no /llms.txt -->
<!-- no agent manifest -->
<link rel="icon" href="/favicon.ico">
<script src="/legacy-analytics.js"></script>
AI crawlers parse this head and bounce: it carries no structured signal.
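To make "no structured signal" concrete, here is a minimal sketch, assuming Python's standard-library `html.parser`, of a check that flags two of the gaps in the head above: the missing JSON-LD block and the thin meta description. The class `HeadSignals`, the function `missing_signals`, and the 50-character threshold for a "descriptive" description are hypothetical illustrations, not Solarly's scoring method. `/llms.txt` and an agent manifest are served outside the `<head>`, so this sketch does not check them.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Hypothetical collector for a few structured signals in an HTML <head>."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = None  # None means no <meta name="description"> was seen
        self.json_ld_blocks = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values may be None for valueless attributes
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""
        elif tag == "script" and (attrs.get("type") or "").lower() == "application/ld+json":
            self.json_ld_blocks += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def missing_signals(head_html: str) -> list[str]:
    """Return human-readable names of the signals this sketch considers absent."""
    parser = HeadSignals()
    parser.feed(head_html)
    parser.close()
    missing = []
    if parser.json_ld_blocks == 0:
        missing.append("JSON-LD structured data")
    # Hypothetical threshold: descriptions under 50 characters count as weak metadata.
    if parser.description is None or len(parser.description) < 50:
        missing.append("descriptive meta description")
    return missing


# The <head> from the report card, copied verbatim (minus the label line).
HEAD = """
<title>Common Crawl - Home</title>
<meta name="description" content="Welcome to Common Crawl.">
<!-- no JSON-LD -->
<!-- no /llms.txt -->
<!-- no agent manifest -->
<link rel="icon" href="/favicon.ico">
<script src="/legacy-analytics.js"></script>
"""

print(missing_signals(HEAD))
# ['JSON-LD structured data', 'descriptive meta description']
```

Parsing rather than substring matching matters here: the head contains the comment `<!-- no JSON-LD -->`, which a naive search for "JSON-LD" would count as a hit, whereas `HTMLParser` routes comments to `handle_comment`, which this sketch leaves as a no-op.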

Common Crawl is a non-profit organization, founded in 2007, that provides a free, open repository of web crawl data. The repository contains over 300 billion web pages and grows by 3 to 5 billion new pages each month. The data is widely used in research and has been cited in over 10,000 papers. Key resources on the site include access to the crawl indexes, crawl statistics, and community collaboration tools. Recent updates include the release of the June 2026 crawl archive, which contains 2.10 billion pages. The organization's mission is to make web data accessible for research and analysis.

Breakdown

Schema.org Coverage: 0
Crawler Accessibility: 100
Content Structure: 100
Metadata Quality: 10
AI Directives (llms.txt): 0
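The headline AI-readiness score of 42/100 equals the unweighted mean of these five subscores: (0 + 100 + 100 + 10 + 0) / 5 = 210 / 5 = 42. The report card does not say how the index is computed, so equal weighting is an assumption; the sketch below only checks that arithmetic, and the names `BREAKDOWN` and `unweighted_index` are hypothetical.

```python
# Breakdown subscores as listed in the report card.
BREAKDOWN = {
    "Schema.org Coverage": 0,
    "Crawler Accessibility": 100,
    "Content Structure": 100,
    "Metadata Quality": 10,
    "AI Directives (llms.txt)": 0,
}


def unweighted_index(scores: dict[str, int]) -> float:
    """Plain arithmetic mean of the subscores (assumes every category weighs the same)."""
    return sum(scores.values()) / len(scores)


print(unweighted_index(BREAKDOWN))  # 42.0
```

If the real index weights categories differently, the match with 42 may be a coincidence; under equal weights, raising Schema.org Coverage from 0 to 100 alone would lift the mean by 20 points.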

What's holding Common Crawl back

Solá, Solarly's growth agent

