
How I Use Claude to Run SEO Audits in Minutes Instead of Hours

What used to take a full day now takes 20 minutes. Here's the exact workflow — with Python code, Claude API calls, and the real output.

A client comes in. Their site isn't ranking. Traffic is flat or declining, and they need answers.

The old approach? Open five different tools. Export CSVs from Google Search Console, Screaming Frog, Ahrefs, maybe Semrush too. Stare at spreadsheets for hours trying to piece together a story from fragmented data. By the end of the day, you've got a document — but you've burned through your most productive hours on work that didn't require much strategic thinking.

I don't do that anymore. I've built a pipeline that combines Python scripts, the Claude API, live keyword data from DataForSEO, and structured prompts that handle everything from the initial crawl to a published editorial calendar. The thinking is still mine. The grunt work isn't.

Let me show you exactly how it works.

The Pipeline at a Glance

Full pipeline

CRAWL → AUDIT → GSC → CONTENT GAPS → KW RESEARCH → PLAN → ARTICLES
  ↑        ↑       ↑         ↑               ↑           ↑        ↑
Python  Claude  Claude+   Claude+        DataForSEO+  Claude  Claude+
crawler prompt  Python    DataForSEO     Claude score  skill  DataForSEO

Each step feeds the next. You can run just the first two for a quick audit, or chain the whole thing for a full SEO strategy. Let me walk through each one.

Step 1: Crawl the Site

Everything starts with data. I run a Python crawler that pulls the technical essentials from every page: title tags, meta descriptions, headings, canonicals, schema markup, internal links, images, load times, word count.

seo_crawler.py

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse
import pandas as pd
import time, json

class SEOCrawler:
    def __init__(self, base_url, max_pages=100):
        self.base_url = base_url
        self.domain = urlparse(base_url).netloc
        self.max_pages = max_pages
        self.visited = set()
        self.queue = []
        self.results = []
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'SEOAuditBot/1.0 (internal audit tool)'
        })

    def crawl(self):
        self.queue = [self.base_url]
        while self.queue and len(self.visited) < self.max_pages:
            url = self.queue.pop(0)
            if url in self.visited:
                continue
            self.visited.add(url)
            page_data = self.analyze_page(url)
            if page_data:
                self.results.append(page_data)
            time.sleep(0.5)
        return self.results

    def analyze_page(self, url):
        try:
            response = self.session.get(url, timeout=15)
            soup = BeautifulSoup(response.text, 'html.parser')
            title = soup.find('title')
            meta_desc = soup.find('meta', attrs={'name': 'description'})
            h1_tags = [h.get_text().strip() for h in soup.find_all('h1')]
            canonical = soup.find('link', attrs={'rel': 'canonical'})
            images = soup.find_all('img')
            images_no_alt = [img for img in images
                             if not img.get('alt') or img['alt'].strip() == '']
            internal_links = []
            for link in soup.find_all('a', href=True):
                href = urljoin(url, link['href'])
                if urlparse(href).netloc == self.domain:
                    internal_links.append(href)
                    if href not in self.visited and href not in self.queue:
                        self.queue.append(href)
            for tag in soup(['script','style','nav','footer']):
                tag.decompose()
            word_count = len(soup.get_text(separator=' ', strip=True).split())
            schema_types = []
            for script in soup.find_all('script',
                                        attrs={'type': 'application/ld+json'}):
                try:
                    data = json.loads(script.string)
                    if isinstance(data, dict):
                        schema_types.append(data.get('@type', 'Unknown'))
                except (json.JSONDecodeError, TypeError):
                    pass
            return {
                'url': url,
                'status_code': response.status_code,
                'load_time_seconds': round(response.elapsed.total_seconds(), 2),
                'title': title.get_text().strip() if title else None,
                'title_length': len(title.get_text()) if title else 0,
                'meta_description': meta_desc.get('content') if meta_desc else None,
                'h1_count': len(h1_tags),
                'canonical': canonical.get('href') if canonical else None,
                'word_count': word_count,
                'images_without_alt': len(images_no_alt),
                'internal_links_count': len(internal_links),
                'schema_types': schema_types,
                'has_schema': len(schema_types) > 0,
            }
        except requests.RequestException as e:
            return {'url': url, 'status_code': 'Error', 'error': str(e)}

    def to_csv(self, filename='seo_crawl_data.csv'):
        df = pd.DataFrame(self.results)
        df.to_csv(filename, index=False)
        return df

Nothing fancy. Clean data, one row per page, ready to feed to Claude.

Step 2: The Technical Audit

Here's where Claude comes in. I don't just dump the CSV and say "audit this." I use a structured prompt that includes client context — industry, goals, team size, CMS. Claude uses all of it to calibrate recommendations.

audit.py

import anthropic, pandas as pd

def run_seo_audit(csv_path, client_info):
    client = anthropic.Anthropic()
    df = pd.read_csv(csv_path)
    crawl_data = df.to_json(orient='records', indent=2)

    prompt = f"""You are a senior technical SEO consultant.

## CLIENT CONTEXT
- Website: {client_info['website']}
- Industry: {client_info['industry']}
- Primary goal: {client_info['goal']}
- Team size: {client_info['team_size']}
- CMS: {client_info['cms']}

## CRAWL DATA
{crawl_data}

## DELIVERABLE
Produce a prioritized audit:
1. EXECUTIVE SUMMARY (3-4 sentences)
2. CRITICAL ISSUES — actively hurting rankings.
   For each: What (with URLs) → Why it matters → How to fix → Effort → Impact
3. QUICK WINS — high impact, under a day to fix.
4. CONTENT GAPS — thin pages, missing clusters, keyword opportunities.
5. TECHNICAL DEBT — important but not urgent (1-3 month horizon).
6. ACTION PLAN — Top 15 actions ranked by impact/effort.
   Format: Action | Hours | Impact | Priority (P0-P3)

Be specific. Every recommendation tied to actual data."""

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=8000,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

Here's what comes back (anonymized example from a real B2B SaaS client):

Output — example.com

1. EXECUTIVE SUMMARY
The site has solid topical foundations but suffers from
significant technical issues limiting crawl efficiency.
47% of pages have duplicate or missing meta descriptions,
12 pages return soft 404s, and the blog has 23 pages under
300 words. Fixing P0 issues alone should unlock ranking
improvements within 4-6 weeks.

2. CRITICAL ISSUES

2.1 Duplicate Title Tags (14 pages)
  • /features and /product → both "Project Management Software"
  • /pricing and /pricing-enterprise → both "Pricing Plans"
  • 9 blog tag pages → all just "Blog"
Why: Keyword cannibalization between /features and /product
     splits ranking signals for your top commercial term.
Fix: Unique, differentiated titles for each.
Effort: Low (2-3h) | Impact: High

2.2 Soft 404 Pages (12 pages)
  • /integrations/slack-legacy  (34 words)
  • /integrations/hipchat        (28 words)
  • /blog/author/jsmith          (0 words)
Fix: 301 redirect to active equivalents or return 410.
Effort: Low (1-2h) | Impact: Medium

6. ACTION PLAN
 #  | Action                          | Hours | Impact | Priority
----|--------------------------------|-------|--------|----------
 1  | Fix duplicate title tags        |  3h   | High   | P0
 2  | Resolve soft 404 pages          |  2h   | Medium | P0
 3  | Enable canonicals site-wide     | 1.5h  | Medium | P0
 4  | Write meta desc (top 10 pages)  | 2.5h  | High   | P1
 5  | Expand comparison pages (3)     |  12h  | High   | P1
...
P0 total: ~6.5 hours. Start here.

Specific URLs. Specific numbers. Prioritized by impact. The audit alone would take a full day manually — this takes about 5 minutes to generate.

Step 3: Layer in Search Console Data

The crawl catches technical issues. But for the performance layer — which keywords are winning, which are bleeding, where the real opportunities hide — I need Google Search Console data.

The key principle: compute in Python, interpret with Claude. I use this pattern everywhere.

gsc_analysis.py

def gsc_analysis(queries_csv, pages_csv, domain):
    client = anthropic.Anthropic()
    queries_df = pd.read_csv(queries_csv)
    pages_df = pd.read_csv(pages_csv)  # feeds the cannibalization section of the prompt

    # Python handles the number crunching
    # (benchmark CTR covers the top 10 positions; deeper rows get no gap score)
    expected_ctr = {1:28, 2:15, 3:11, 4:8, 5:7, 6:5, 7:4, 8:3, 9:2.5, 10:2}
    queries_df['expected_ctr'] = queries_df['position'].round().map(expected_ctr)
    queries_df['ctr_gap'] = queries_df['ctr'] - queries_df['expected_ctr']

    quick_wins = queries_df[
        (queries_df['position'] >= 5) &
        (queries_df['position'] <= 15) &
        (queries_df['impressions'] > 500)
    ].sort_values('impressions', ascending=False)

    # Claude handles the interpretation
    prompt = f"""You are a senior SEO analyst. Domain: {domain}
...
Deliver:
1. Performance overview with position distribution
2. CTR anomalies (title/meta optimization opportunities)
3. Quick wins with estimated traffic gain if moved to top 3
4. Cannibalization report: primary page vs pages to consolidate
5. Top 10 prioritized actions with traffic impact estimates"""

    response = client.messages.create(
        model="claude-sonnet-4-6", max_tokens=8000,
        messages=[{"role": "user", "content": prompt}])
    return response.content[0].text
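The prompt asks Claude for a cannibalization report, but nothing above pre-computes it. Following the same compute-in-Python principle, here's a minimal sketch — it assumes a GSC pages export with query, page, and clicks columns (column names are an assumption about your export format):

```python
import pandas as pd

def find_cannibalization(pages_df, min_pages=2):
    # Queries where two or more URLs compete for the same term
    counts = pages_df.groupby('query')['page'].nunique()
    contested = counts[counts >= min_pages].index
    # Return the contested rows, strongest page first within each query
    return (pages_df[pages_df['query'].isin(contested)]
            .sort_values(['query', 'clicks'], ascending=[True, False]))
```

The top row per query is the consolidation candidate; the rest feed Claude's consolidation recommendations.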

Step 4: Competitive Content Gap Analysis

By now the audit has flagged content gaps. The GSC analysis has identified high-impression queries with no dedicated landing page. The next question is always: what are competitors covering that we're not?

This is where I pull in live data from DataForSEO. I parse the client's sitemap to map existing coverage, auto-discover their closest competitors, then run a domain intersection to find every keyword where a competitor ranks and the client doesn't.
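The sitemap-parsing step is a few lines of standard-library XML work. A minimal sketch — the HTTP fetch is omitted, and `parse_sitemap` is an illustrative helper name, not part of the pipeline code below:

```python
import xml.etree.ElementTree as ET

def parse_sitemap(xml_bytes):
    # Extract every <loc> URL from a standard sitemap.xml payload
    root = ET.fromstring(xml_bytes)
    ns = {'sm': 'http://www.sitemaps.org/schemas/sitemap/0.9'}
    return [loc.text for loc in root.findall('.//sm:loc', ns)]
```

Feeding the resulting URL list to Claude alongside the gap data lets it mark which topics are already covered.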

gap_analysis.py

def find_content_gaps(domain, competitors, market="United States"):
    client = anthropic.Anthropic()
    dataforseo_auth = ("login", "password")  # your DataForSEO credentials
    api = "https://api.dataforseo.com/v3"

    gaps = {}
    for comp in competitors:
        resp = requests.post(
            f"{api}/dataforseo_labs/google/domain_intersection/live",
            auth=dataforseo_auth,
            json=[{"target1": comp, "target2": domain,
                   "location_name": market, "language_code": "en",
                   "intersections": False,  # ← only keywords WE DON'T have
                   "limit": 100,
                   "filters": [
                     ["keyword_data.keyword_info.search_volume", ">", 100]
                   ]}])
        gaps[comp] = resp.json()

    # Claude clusters, scores, and prioritizes
    prompt = f"""You are a content strategist. Site: {domain}
...
Deliver:
1. Map current coverage: strengths and thin spots
2. Gap list by competitor, grouped by topic
3. Cluster into PILLAR → Cluster → Supporting keywords
4. Priority score:
   (Volume×0.25) + ((100-KD)×0.25) + (Gap×0.3) + (Relevance×0.2)
5. Top 10 content pieces to create:
   Target keyword | Format | Word count | Competitor to study | Priority"""

    response = client.messages.create(
        model="claude-sonnet-4-6", max_tokens=6000,
        messages=[{"role": "user", "content": prompt}])
    return response.content[0].text

The intersections: False parameter is the key. It tells DataForSEO: "show me only the keywords where competitor A ranks and my client doesn't." Claude then clusters these gaps into topic pillars, scores them, and tells me exactly what to create first.
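Claude applies that priority formula itself, but it's easy to sanity-check in Python. A sketch with the same weights — the 0-100 scales for Volume, Gap, and Relevance are assumptions, since the prompt leaves normalization to Claude:

```python
def priority_score(volume, kd, gap, relevance, max_volume=10_000):
    """Same weights as the prompt; all inputs assumed normalized to 0-100."""
    vol = min(volume / max_volume, 1.0) * 100   # cap, then scale to 0-100
    return vol * 0.25 + (100 - kd) * 0.25 + gap * 0.3 + relevance * 0.2
```

For example, a 10k-volume keyword with KD 40, a full competitor gap, and relevance 80 scores 86 — comfortably above a mid-volume, high-difficulty term.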

Step 5: Keyword Research and Expansion

The gap analysis tells me where the opportunities are. Now I need to go deeper — understanding search volume, keyword difficulty, intent, and related terms.

keyword_research.py

def keyword_research(seed_keywords, market="United States"):
    # Three DataForSEO calls: overview, suggestions, related
    seed_data = requests.post(f"{api}/keyword_overview/live", ...).json()
    suggestions = requests.post(f"{api}/keyword_suggestions/live", ...).json()
    related = requests.post(f"{api}/related_keywords/live", ...).json()

    # Claude clusters and prioritizes
    prompt = """Cluster all keywords by topic and intent.
Score: (Volume×0.4) + ((100-KD)×0.3) + (Intent_Match×0.3)
Categorize:
  🟢 Quick Wins: high volume + KD under 40
  🟡 Strategic: high volume + high KD (long-term play)
  🔵 Long Tail: low volume + low KD (supporting content)
  ⚪ Skip: low volume + high KD
Table: Keyword | Volume | KD | CPC | Intent | Category | Cluster"""
    # (the keyword data pulled above is appended to the prompt — elided here)

    response = anthropic.Anthropic().messages.create(
        model="claude-sonnet-4-6", max_tokens=6000,
        messages=[{"role": "user", "content": prompt}])
    return response.content[0].text

Three API calls to DataForSEO. One call to Claude. Complete keyword map with clusters, priorities, and content recommendations.
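The four buckets in the prompt reduce to a two-axis decision table. A sketch — the volume and KD thresholds here are assumptions, since the prompt leaves "high volume" to Claude's judgment:

```python
def categorize(volume, kd, high_volume=1_000, low_kd=40):
    # Two axes: volume (high/low) x difficulty (low/high)
    if volume >= high_volume:
        return "Quick Win" if kd < low_kd else "Strategic"
    return "Long Tail" if kd < low_kd else "Skip"
```

Having the rule explicit also makes it cheap to re-bucket everything when a client's risk appetite changes: adjust the thresholds, rerun, done.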

Step 6: Build the Content Calendar

Keywords researched. Gaps identified. Priorities scored. Now I need to turn all of it into an editorial calendar the team can actually execute.

content_plan.py

def create_content_plan(keyword_data, gap_data, site_goals,
                        frequency="8/month", period="quarterly"):
    prompt = f"""You are a content strategist.

## Parameters: {frequency} | {period}

Architecture: Pillar (2000-4000 words) → Cluster (800-1500)
              → Supporting (500-800)
Content mix: 50-60% informational | 20-30% commercial | 10-20% transactional
Priority score: (Volume×0.3) + ((100-KD)×0.3) + (Business_Value×0.4)

Monthly calendar:
Week | Title | Target Keyword | Type | Words | Priority

P1 content briefs — for each priority piece:
- Primary + secondary keywords
- H1 and H2 outline
- Competitor URLs to reference
- Unique angle / value-add
- Internal linking targets

Rules: Pillars before clusters. Leave 20% buffer."""
    # (keyword_data and gap_data are appended to the prompt — elided here)

    response = anthropic.Anthropic().messages.create(
        model="claude-sonnet-4-6", max_tokens=8000,
        messages=[{"role": "user", "content": prompt}])
    return response.content[0].text

Step 7: Write the Content

The calendar says what to publish and when. For each piece, I analyze the top-ranking competitors for that keyword — their heading structure, what they cover, what they miss — then generate a complete, optimized article.

write_article.py

def write_seo_article(keyword, market="United States", language="en"):
    # dataforseo_auth and api: same credentials and base URL as in gap_analysis.py
    # SERP analysis: who ranks and how
    serp = requests.post(f"{api}/serp/google/organic/live/advanced",
        auth=dataforseo_auth,
        json=[{"keyword": keyword, "location_name": market,
               "language_code": language, "depth": 10}]).json()

    # Content structure of the top 3 organic competitors
    # (DataForSEO nests results under tasks[0].result[0].items)
    items = serp['tasks'][0]['result'][0]['items']
    top_urls = [item['url'] for item in items
                if item.get('type') == 'organic'][:3]
    structures = []
    for url in top_urls:
        resp = requests.post(f"{api}/on_page/content_parsing/live",
            auth=dataforseo_auth, json=[{"url": url}])
        structures.append({'url': url, 'data': resp.json()})

    prompt = f"""Process:
1. Analyze heading structure of competitors. Common sections? Gaps?
2. Identify intent. What does the searcher want to accomplish?
3. Find the gap. What adds value beyond existing results?
4. Build the outline. H1 → H2s → H3s. Keyword variants in headings.
   Target word count: competitor average + 20%.
5. Write the article. Short paragraphs. Actionable advice.
   Natural keyword placement. Include FAQ section.
   Mark internal linking spots as [INTERNAL LINK: topic].
6. Validate. Keyword density 1-2%. Meta title under 60 chars.
   Meta description under 155 chars.
7. Deliver: meta title + meta description + full article in markdown
   + internal linking recommendations + schema suggestions."""
    # (serp and structures data are appended to the prompt — elided here)

    response = anthropic.Anthropic().messages.create(
        model="claude-sonnet-4-6", max_tokens=8000,
        messages=[{"role": "user", "content": prompt}])
    return response.content[0].text

The article comes back publication-ready. Not perfect — I always review and add my own strategic layer — but it's 90% there. The competitor analysis, the outline, the writing, the SEO validation: all handled.
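The validation rules in step 6 are mechanical enough to double-check in Python before the human review pass. A sketch — the 1-2% window and character limits come from the prompt, while the exact-phrase density approximation is an assumption:

```python
def validate_seo(meta_title, meta_description, article_text, keyword):
    # Approximate density: share of words that belong to exact phrase matches
    words = article_text.lower().split()
    hits = article_text.lower().count(keyword.lower())
    density = 100 * hits * len(keyword.split()) / max(len(words), 1)
    return {
        'meta_title_ok': len(meta_title) <= 60,
        'meta_description_ok': len(meta_description) <= 155,
        'keyword_density_ok': 1.0 <= density <= 2.0,
    }
```

Anything that fails a check goes back to Claude with the failing rule quoted in the follow-up message.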

The Full Chain

full_pipeline.py

client_info = {
    'website': 'https://client-site.com',
    'industry': 'B2B SaaS - Project Management',
    'goal': 'Increase organic demo requests by 40% in Q2',
    'team_size': '1 SEO + 2 writers',
    'cms': 'WordPress with Yoast SEO'
}

# 1. CRAWL
crawler = SEOCrawler(client_info['website'], max_pages=200)
crawler.crawl()
crawler.to_csv('crawl_data.csv')

# 2. AUDIT
audit = run_seo_audit('crawl_data.csv', client_info)

# 3. GSC
gsc = gsc_analysis('gsc_queries.csv', 'gsc_pages.csv', 'client-site.com')

# 4. GAPS (competitor domains are placeholders — swap in real ones)
gaps = find_content_gaps('client-site.com',
                         competitors=['competitor-a.com', 'competitor-b.com'],
                         market="United States")

# 5. KEYWORDS
kw = keyword_research(['project management software'])

# 6. PLAN
plan = create_content_plan(kw, gaps, client_info)

# 7. WRITE
article = write_seo_article('project management for small teams')

for name, content in [('audit', audit), ('gsc', gsc), ('gaps', gaps),
                        ('keywords', kw), ('plan', plan), ('article', article)]:
    with open(f'{name}_report.md', 'w') as f:
        f.write(content)

print("Pipeline complete.")

From raw crawl to published content plan and first article. What used to be a week of work now runs in an afternoon.

Why This Works

The secret isn't any single step. It's how they compound.

The audit finds content gaps. The gap analyzer quantifies them with real competitor data. The keyword research prioritizes them. The content planner schedules them. The article writer creates them. Each step feeds the next, and at every stage Claude adds the strategic reasoning layer that would otherwise require hours of manual work.

Look at the audit output. It doesn't just say "you have duplicate titles." It identifies the cannibalization risk between /features and /product, suggests differentiated titles, and flags that blog tag pages are a separate problem with a different solution. That kind of contextual understanding — connecting a technical issue to its business impact — is what separates a checklist from an audit. And that's what used to take hours.

What This Isn't

I'm not replacing SEO expertise. I'm removing the parts that don't require it.

The strategic thinking still matters. Understanding a client's business, reading between the lines of their analytics, knowing when the data is telling a misleading story — that's still human work, and it should be.

The key principle throughout: compute in Python, interpret with Claude, verify with your brain. Every output is a draft, not a deliverable. I review, adjust, add context, and apply judgment. But I start at 90% instead of 0%.

Start Here

You don't need the full pipeline on day one. Here's how to build up:

Week 1

Crawler + audit prompt

That alone will save you hours per client. The fastest entry point.

Week 2

Add GSC analysis

Pre-compute CTR gaps and cannibalization in Python, let Claude interpret.

Week 3

Plug in DataForSEO

Live data makes Claude's recommendations dramatically more specific.

Week 4

Chain it all together

Content planner + article generation. The full pipeline.

Want to implement this pipeline?

If you're evaluating how to bring AI into your SEO workflow — or want me to build and run this pipeline for your project — let's talk.
