Google Places Email Finder

Python / AsyncIO / Web Crawling / 2025

Project Overview

The Google Places Email Finder is an asynchronous web crawler written in Python with asyncio and aiohttp. It takes a list of domains (extracted from Google Places or other directories), crawls the pages of each site, and collects the email addresses it finds. Generic, placeholder, and malformed matches are filtered out so that only valid, professional addresses remain.
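The extraction-and-filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the project's code. It assumes a simple candidate regex and short, hypothetical TLD and bad-substring lists (the project's actual patterns and lists are not shown here). The names `EMAIL_RE`, `VALID_TLDS`, `BAD_SUBSTRINGS` and `extract_emails` are hypothetical.

```python
import re

# Hypothetical candidate pattern: local part, "@", one or more domain labels,
# and an alphabetic final label. Deliberately simple, not an RFC 5322 validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")

# Hypothetical, abbreviated lists; a real deployment would use much larger ones.
VALID_TLDS = {"com", "net", "org", "io", "co", "uk", "de"}
BAD_SUBSTRINGS = (".png", ".jpg", "jquery", "example.com")


def extract_emails(html: str) -> set[str]:
    """Return plausible email addresses found in a page's raw text."""
    found = set()
    for match in EMAIL_RE.findall(html):
        email = match.lower()
        tld = email.rsplit(".", 1)[-1]
        if tld not in VALID_TLDS:
            continue  # e.g. "logo@2x.png" ends in a file extension, not a TLD
        if any(bad in email for bad in BAD_SUBSTRINGS):
            continue  # media files, script names, placeholder domains
        found.add(email)
    return found
```

Checking the TLD first cheaply rejects most asset filenames that happen to contain an "@". The substring list then catches the cases that end in a real TLD, such as placeholder domains.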

Architecture & Data Flow

Input Domains (CSV / Google Places)
  → Async Crawler (aiohttp + BFS)
  → Regex Extraction (TLD and bad-substring filters)
  → Output CSV (validated emails)
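The data flow can be sketched end to end with the standard library's csv module and asyncio.gather. This is only a sketch under stated assumptions. The input CSV is assumed to have a header row with a `domain` column. `find_emails` is a hypothetical stand-in for the per-domain crawler, which in the project presumably uses aiohttp internally. The helper names `read_domains`, `run_pipeline` and `write_rows` are also hypothetical.

```python
import asyncio
import csv
import io
from typing import Awaitable, Callable, Iterable

# Hypothetical crawler signature: domain -> set of emails found on that site.
FindEmails = Callable[[str], Awaitable[set[str]]]


def read_domains(csv_text: str, column: str = "domain") -> list[str]:
    """Read non-empty, normalized domains from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cleaned = ((row.get(column) or "").strip().lower() for row in reader)
    return [domain for domain in cleaned if domain]


async def run_pipeline(domains: Iterable[str],
                       find_emails: FindEmails) -> list[tuple[str, str]]:
    """Crawl all domains concurrently and flatten results to (domain, email) rows."""
    domains = list(domains)
    results = await asyncio.gather(*(find_emails(d) for d in domains),
                                   return_exceptions=True)
    rows = []
    for domain, result in zip(domains, results):
        if isinstance(result, BaseException):
            continue  # one failing site should not abort the whole batch
        rows.extend((domain, email) for email in sorted(result))
    return rows


def write_rows(rows: list[tuple[str, str]]) -> str:
    """Serialize output rows as CSV text with a header."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["domain", "email"])
    writer.writerows(rows)
    return out.getvalue()
```

Passing `return_exceptions=True` to `asyncio.gather` makes a timeout or DNS failure on one domain come back as a value instead of cancelling the rest of the batch.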

Key Technical Features

  • Asynchronous I/O: Built with asyncio to handle thousands of concurrent connections.
  • Advanced Validation: Regex pattern matching finds email candidates, and each match's top-level domain (TLD) is checked against an extensive list of valid TLDs.
  • Intelligent Filtering: Built-in exclusion lists reject false-positive matches that contain media file extensions (png, jpg), library names (jquery), or dummy domains (example.com).
  • Broad Crawling Strategy: Traverses each site with Breadth-First Search (BFS), so pages close to the start page are visited before more deeply nested ones.
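Breadth-first traversal and asyncio concurrency can be combined as shown below. This is a minimal sketch, not the project's implementation. It crawls one site level by level and fetches each level concurrently with asyncio.gather. It stays on the start URL's host and drops URL fragments. A semaphore wrapper caps the number of requests in flight. The fetcher is injected so the sketch runs without network access. The names `bfs_crawl`, `bounded`, `LinkParser`, `max_depth` and `max_pages` are hypothetical.

```python
import asyncio
from html.parser import HTMLParser
from typing import Awaitable, Callable
from urllib.parse import urldefrag, urljoin, urlparse

# Hypothetical fetcher type: URL -> HTML text. In the real crawler this would
# wrap an aiohttp request; here it is injected so no network is needed.
Fetch = Callable[[str], Awaitable[str]]


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)


def bounded(fetch: Fetch, limit: int) -> Fetch:
    """Wrap fetch so at most `limit` calls are in flight at once.

    Call this inside the running event loop and share the result across crawls.
    """
    sem = asyncio.Semaphore(limit)

    async def limited(url: str) -> str:
        async with sem:
            return await fetch(url)

    return limited


async def bfs_crawl(start_url: str, fetch: Fetch,
                    max_depth: int = 2, max_pages: int = 50) -> dict[str, str]:
    """Crawl one site level by level; return {url: html} in BFS order."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    frontier = [start_url]
    pages: dict[str, str] = {}
    for depth in range(max_depth + 1):
        if not frontier:
            break
        # Fetch the whole level concurrently; failures come back as exceptions.
        results = await asyncio.gather(*(fetch(u) for u in frontier),
                                       return_exceptions=True)
        next_frontier = []
        for url, html in zip(frontier, results):
            if isinstance(html, BaseException):
                continue  # skip pages that failed to load
            pages[url] = html
            if depth == max_depth:
                continue  # deepest level: keep the page, don't expand its links
            parser = LinkParser()
            parser.feed(html)
            for href in parser.links:
                link, _ = urldefrag(urljoin(url, href))  # absolute, no #fragment
                if (urlparse(link).netloc == host and link not in seen
                        and len(seen) < max_pages):
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return pages
```

With aiohttp, `fetch` could be a coroutine that calls `session.get(url)` on a shared `aiohttp.ClientSession` and returns `await resp.text()`. Wrapping that coroutine once with `bounded` and passing the result to every `bfs_crawl` call keeps the total number of open connections under one global limit.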