Google Places Email Finder
Python / AsyncIO / Web Crawling / 2025
Project Overview
The Google Places Email Finder is a high-performance web crawler built in Python using asyncio and aiohttp. It is designed to take domains (extracted from Google Places or other directories) and aggressively spider their pages to discover valid, professional email addresses while filtering out generic, placeholder, or invalid patterns.
Architecture & Data Flow
Input Domains (CSV / Google Places) -> Async Crawler (aiohttp + BFS) -> Regex Extraction (TLD / Bad Substring Filter) -> Output CSV (Validated Emails)
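As a rough illustration of the Regex Extraction stage, the sketch below pulls email-like strings out of page text, then drops anything whose TLD is not on an allow-list or that contains a known bad substring. The names `EMAIL_RE`, `ALLOWED_TLDS`, `BAD_SUBSTRINGS`, and `extract_emails` are hypothetical, and the pattern and lists are small stand-ins rather than the project's actual ones.

```python
import re

# Assumption: a deliberately simple email pattern; the project's real regex may differ.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)

# Assumption: illustrative allow-list; a real TLD list would be far longer.
ALLOWED_TLDS = {"com", "org", "net", "io", "co", "uk", "de"}

# Substrings that mark media files, libraries, or dummy domains.
BAD_SUBSTRINGS = ("example.com", "jquery", ".png", ".jpg")


def extract_emails(text):
    """Return unique, lowercased emails from text that pass both filters."""
    found = []
    for match in EMAIL_RE.findall(text):
        email = match.lower()
        tld = email.rsplit(".", 1)[-1]
        if tld not in ALLOWED_TLDS:
            continue  # e.g. "logo@2x.png" ends in a file extension, not a TLD
        if any(bad in email for bad in BAD_SUBSTRINGS):
            continue  # placeholder domains, library names, media files
        if email not in found:
            found.append(email)
    return found
```

Checking the TLD first removes most asset filenames that happen to contain an `@` (such as retina image names), and the substring list catches placeholders that use a valid TLD.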
Key Technical Features
- Asynchronous I/O: Built with asyncio to handle thousands of concurrent connections.
- Advanced Validation: Comprehensive regex pattern matching and extensive TLD checking.
- Intelligent Filtering: Built-in exclusion lists for media files (png, jpg), libraries (jquery), and dummy domains (example.com).
- Broad Crawling Strategy: Implements a Breadth-First Search (BFS) to efficiently traverse nested pages.
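The asynchronous BFS crawl described in the list above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `crawl_bfs`, `make_fetch`, `HREF_RE`, and the `max_pages` and `concurrency` parameters are hypothetical names. The crawler takes any `fetch` coroutine, so it can be exercised without network access; `make_fetch` shows one way to wrap an `aiohttp.ClientSession` that the caller creates.

```python
import asyncio
import re
from urllib.parse import urljoin, urlparse

# Assumption: a simple href pattern; a real crawler might use an HTML parser.
HREF_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)


def make_fetch(session):
    """Wrap a caller-created aiohttp.ClientSession as a fetch coroutine."""
    async def fetch(url):
        async with session.get(url) as resp:
            resp.raise_for_status()
            return await resp.text()
    return fetch


async def crawl_bfs(start_url, fetch, max_pages=20, concurrency=5):
    """Breadth-first crawl of one host; returns {url: html} in visit order."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    frontier = [start_url]          # URLs waiting to be fetched, in BFS order
    pages = {}
    sem = asyncio.Semaphore(concurrency)  # caps in-flight requests

    async def guarded(url):
        async with sem:
            try:
                return url, await fetch(url)
            except Exception:
                return url, None    # skip pages that fail to load

    while frontier and len(pages) < max_pages:
        room = max_pages - len(pages)
        batch, frontier = frontier[:room], frontier[room:]
        # Fetch the whole batch concurrently; gather preserves input order.
        results = await asyncio.gather(*(guarded(u) for u in batch))
        for url, html in results:
            if html is None:
                continue
            pages[url] = html
            for href in HREF_RE.findall(html):
                link = urljoin(url, href).split("#")[0]
                # Stay on the same host and never queue a URL twice.
                if urlparse(link).netloc == host and link not in seen:
                    seen.add(link)
                    frontier.append(link)
    return pages
```

With aiohttp installed, a call might look like `pages = await crawl_bfs(url, make_fetch(session))` inside an `async with aiohttp.ClientSession() as session:` block. Newly discovered links are appended behind the ones already queued, which keeps shallow pages (often contact or about pages) ahead of deeper ones.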