Competitor Scraping Quality and URL Handling

✨ What's Improved

The competitor scraping engine has been upgraded with several data quality improvements that make scraped results cleaner and more accurate out of the box.

🔧 Key Fixes

Auto-Prepend HTTPS

Enter a competitor URL as a bare domain, such as competitor.com, and the system automatically prepends https://, so bare-domain inputs no longer fail validation.
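As a rough illustration of this behavior, the sketch below prepends https:// only when the input has no scheme. The function name normalize_competitor_url is hypothetical, and leaving an existing http:// scheme untouched is an assumption, not a description of the actual implementation.

```python
import re

# Matches a leading URL scheme such as "http://" or "https://".
_SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def normalize_competitor_url(raw: str) -> str:
    """Return the URL with https:// prepended when no scheme is present.

    Hypothetical helper: existing schemes are left as-is (an assumption).
    """
    url = raw.strip()
    if not url:
        raise ValueError("URL is empty")
    if url.startswith("//"):
        # Protocol-relative input: supply the scheme only.
        return "https:" + url
    if _SCHEME_RE.match(url):
        return url
    return "https://" + url


print(normalize_competitor_url("competitor.com"))
```

With this sketch, an input that already starts with http:// or https:// passes through unchanged, while a bare domain with a path such as competitor.com/pricing gains the https:// prefix.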

Brand Name Cleaning

Scraped page titles like "The Brand Toolkit Platform | Brandkit - Home for your brand" are now cleaned to extract just the core brand name (e.g., "Brandkit"). SEO suffixes, taglines, and filler text are stripped automatically.
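One plausible way to do this kind of cleaning is sketched below. It splits the title on common separators, prefers the segment that matches the competitor's domain, and otherwise falls back to the shortest segment. Both heuristics and the names clean_brand_name and domain_hint are assumptions made for illustration; the actual cleaning rules are not documented here.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Separators often placed between a brand name and SEO text in page titles:
# a pipe, or a hyphen / en dash / em dash / colon / middle dot with spaces.
_SEPARATORS = re.compile(r"\s*\|\s*|\s+[-\u2013\u2014:\u00b7]\s+")


def _domain_label(hint: str) -> str:
    """Return the first host label of a URL or bare domain, e.g. 'brandkit'."""
    host = urlparse(hint if "://" in hint else "//" + hint).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return re.sub(r"[^a-z0-9]", "", host.split(".")[0].lower())


def clean_brand_name(title: str, domain_hint: Optional[str] = None) -> str:
    """Pick the most brand-like segment of a page title (heuristic sketch)."""
    segments = [s.strip() for s in _SEPARATORS.split(title) if s.strip()]
    if not segments:
        return title.strip()
    if domain_hint:
        label = _domain_label(domain_hint)
        for segment in segments:
            # Compare with spaces and punctuation removed.
            if re.sub(r"[^a-z0-9]", "", segment.lower()) == label:
                return segment
    # Assumed fallback: taglines tend to be longer than the brand name.
    return min(segments, key=len)


title = "The Brand Toolkit Platform | Brandkit - Home for your brand"
print(clean_brand_name(title, "https://www.brandkit.com"))  # prints "Brandkit"
```

The domain match is worth having because the shortest-segment fallback can pick a short tagline over a longer brand name.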

Color Deduplication

Duplicate hex values are removed from scraped brand color palettes, giving you a clean, unique set of brand colors.
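A minimal sketch of order-preserving deduplication follows. Treating case differences and 3-digit shorthand (for example #FFF and #ffffff) as the same color, and skipping values that are not valid hex, are assumptions of this example; the function name dedupe_hex_colors is hypothetical.

```python
import re
from typing import Iterable, List

# Accepts 3- or 6-digit hex colors, with or without a leading "#".
_HEX_RE = re.compile(r"^#?([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$")


def dedupe_hex_colors(colors: Iterable[str]) -> List[str]:
    """Return unique colors as lowercase #rrggbb, keeping first-seen order."""
    seen = set()
    unique = []
    for raw in colors:
        match = _HEX_RE.match(raw.strip())
        if not match:
            continue  # assumption: silently skip non-hex values
        digits = match.group(1).lower()
        if len(digits) == 3:
            # Expand shorthand: "fa0" becomes "ffaa00".
            digits = "".join(ch * 2 for ch in digits)
        canonical = "#" + digits
        if canonical not in seen:
            seen.add(canonical)
            unique.append(canonical)
    return unique


print(dedupe_hex_colors(["#FFF", "#ffffff", "#1A2B3C", "1a2b3c", "#000"]))
# prints ['#ffffff', '#1a2b3c', '#000000']
```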

Social Link Fallback

When metadata doesn't contain social profiles, the scraper now parses the page markdown content as a fallback to find links to Instagram, LinkedIn, Twitter/X, Facebook, YouTube, TikTok, and Reddit.
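The fallback could look roughly like the sketch below, which scans the markdown for URLs and keeps the first link found per platform. The metadata key social_profiles, the function name extract_social_links, and the first-match rule are hypothetical choices for this example.

```python
import re
from typing import Dict, Optional
from urllib.parse import urlparse

# Platform domains taken from the list above; x.com is filed under "twitter".
SOCIAL_HOSTS = {
    "instagram.com": "instagram",
    "linkedin.com": "linkedin",
    "twitter.com": "twitter",
    "x.com": "twitter",
    "facebook.com": "facebook",
    "youtube.com": "youtube",
    "tiktok.com": "tiktok",
    "reddit.com": "reddit",
}

# URLs end at whitespace or at characters that close a markdown link.
_URL_RE = re.compile(r"https?://[^\s)\]>\"']+")


def _platform_for(host: str) -> Optional[str]:
    for domain, platform in SOCIAL_HOSTS.items():
        # The leading dot keeps "netflix.com" from matching "x.com".
        if host == domain or host.endswith("." + domain):
            return platform
    return None


def extract_social_links(metadata: dict, markdown: str) -> Dict[str, str]:
    """Use metadata profiles when present, else scan the markdown body."""
    profiles = dict(metadata.get("social_profiles") or {})  # hypothetical key
    if profiles:
        return profiles
    for url in _URL_RE.findall(markdown):
        try:
            host = (urlparse(url).hostname or "").lower()
        except ValueError:
            continue  # malformed URL, ignore it
        platform = _platform_for(host)
        if platform and platform not in profiles:
            profiles[platform] = url.rstrip(".,;")
    return profiles


page = "Follow us on [Instagram](https://www.instagram.com/acme) and [X](https://x.com/acme)."
print(extract_social_links({}, page))
```

Matching on the parsed hostname rather than on a substring of the URL avoids false positives from unrelated domains that merely contain a platform name.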

Value Proposition Filtering

Extracted value propositions are now filtered to remove image markdown tags and raw URLs, keeping only meaningful text content.
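A filter along these lines can be sketched with two regular expressions. This example strips image tags and raw URLs from inside each entry and then drops entries that have no letters left; whether the real filter strips or drops is not stated, so both behaviors are assumptions, as is the name filter_value_propositions.

```python
import re
from typing import Iterable, List

# Markdown image tag: ![alt text](target)
_IMAGE_MD_RE = re.compile(r"!\[[^\]]*\]\([^)]*\)")
# Raw URL starting with http://, https://, or www.
_RAW_URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def filter_value_propositions(items: Iterable[str]) -> List[str]:
    """Strip image tags and raw URLs, then drop entries with no text left."""
    cleaned = []
    for item in items:
        text = _IMAGE_MD_RE.sub("", item)  # images first, so their URLs go too
        text = _RAW_URL_RE.sub("", text)
        text = re.sub(r"\s+", " ", text).strip()
        # Keep only entries that still contain at least one letter.
        if any(ch.isalpha() for ch in text):
            cleaned.append(text)
    return cleaned


raw = [
    "![hero](https://cdn.example.com/hero.png)",
    "https://example.com/pricing",
    "Ship brand assets faster ![icon](https://cdn.example.com/i.svg)",
    "Trusted by 500 teams",
]
print(filter_value_propositions(raw))
# prints ['Ship brand assets faster', 'Trusted by 500 teams']
```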

Rich Data Backfill

Existing competitor records have been backfilled with typography, button styles, spacing, brand personality, and design framework data from previously captured raw scrape data.

📍 How to Use

These improvements apply automatically to all new competitor scrapes. Existing competitors have already been backfilled with the enriched data; click any competitor card to see the full details.