Photon Review
Crawl a target website once and walk away with internal URLs, email addresses, social media links, JavaScript files, and exposed secrets — all organized into separate files ready for downstream investigation.
Quick Verdict
Best for pentesters and OSINT analysts who need structured content extraction from a target website (email addresses, social links, and exposed JavaScript secrets) as an early active investigation step after scope confirmation.
Pros
- + Per-category output files — emails, social links, secrets, JavaScript paths — are immediately usable as inputs for downstream investigation tools without parsing
- + JavaScript secret detection scans client-side code for API keys, tokens, and credentials automatically during the crawl — no manual source review required
- + --wayback flag recovers historically linked URLs that are no longer present in the live site's navigation, surfacing forgotten paths and endpoints
- + Configurable crawl depth and thread count make it practical for both quick surface-level extractions and thorough multi-depth content mapping
Cons
- − Active tool generating direct HTTP requests — traffic is visible in target server logs; not appropriate for passive-only reconnaissance phases before active testing is authorized
- − Does not render JavaScript — single-page applications and JS-heavy sites that load content dynamically yield incomplete results without a browser-based crawler
Photon: Fast OSINT Web Crawler for Extracting URLs, Emails, and Exposed Secrets
Manual browsing through a company's website is a tedious and incomplete way to gather information. You follow the obvious links, check the contact page, and scan the footer. But you often miss crucial details. The careers page might have thirty staff email addresses. The privacy policy might link to four social media platforms. An outdated JavaScript bundle might expose an AWS key.
Photon does this work for you. It crawls the entire public web surface. It extracts all the relevant information and organizes it into structured files. One command, one run. The output is ready to feed into your next investigation step.
What Photon Extracts
Introduction to Photon
Photon crawls a website by visiting internal links recursively to a set depth; it logs external links instead of following them. From the crawl, it extracts six types of data: internal URLs, external link targets, email addresses, social media profile links, JavaScript file paths, and potential secrets in page source and JS files.
How Photon Works
The extraction process happens in one pass. Photon visits each internal page, extracts all six data types from HTML and JavaScript, and then moves on. By the end of the crawl, it has collected everything.
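The single-pass idea can be sketched with the Python standard library. This is not Photon's code: the class name PageExtractor, the helper extract_page, and the email regex are hypothetical, and the sketch covers only links, script paths, and emails for one page.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Deliberately simple email pattern, chosen for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class PageExtractor(HTMLParser):
    """Collects links, script paths, and emails from one page in one pass."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.host = urlparse(base_url).netloc
        self.internal, self.external = set(), set()
        self.scripts, self.emails = set(), set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            href = attrs["href"]
            if href.startswith("mailto:"):
                # Drop any "?subject=..." suffix from the address.
                self.emails.add(href[len("mailto:"):].split("?")[0])
                return
            url = urljoin(self.base_url, href)
            if urlparse(url).scheme not in ("http", "https"):
                return
            # Same host means internal (crawl it); anything else is only logged.
            same_host = urlparse(url).netloc == self.host
            (self.internal if same_host else self.external).add(url)
        elif tag == "script" and attrs.get("src"):
            self.scripts.add(urljoin(self.base_url, attrs["src"]))

    def handle_data(self, data):
        self.emails.update(EMAIL_RE.findall(data))


def extract_page(html, base_url):
    parser = PageExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser
```

A crawler built on this would queue each new entry in internal until the depth limit is reached, and would only record the entries in external.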
Secret Detection
Secret detection is where Photon finds its highest-value results. Poorly maintained sites often expose API keys, authentication tokens, AWS access keys, and private keys that should never appear in client-side code; developers commit them and forget to remove them. Photon flags this material during the crawl by pattern matching, so no manual JavaScript review is needed.
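Pattern-based detection of this kind can be illustrated with two regular expressions. The patterns and the find_secrets helper below are assumptions made for this example; Photon's actual rule set is not reproduced here and may differ.

```python
import re

# Illustrative patterns only; not taken from Photon's source.
SECRET_PATTERNS = {
    # AWS access key IDs start with "AKIA" followed by 16 uppercase letters/digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM header that precedes a private key block.
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def find_secrets(text):
    """Return (label, matched_text) pairs for every pattern hit in text."""
    hits = []
    for label, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group(0)))
    return hits
```

Regex hits are candidates, not confirmed credentials: expect false positives, and verify any finding only within the authorized scope of the engagement.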
Core Command Usage
Basic invocation:
python3 photon.py -u https://target.com -l 3 -t 10
The -l flag controls crawl depth.
Depth 3 works for most sites.
For a quick scrape, go lower, to 1 or 2; for complex sites, go higher.
The -t flag sets concurrent request threads.
Ten threads is a safe bet.
On stable targets where speed is more important than stealth, you can increase it.
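If you drive Photon from a script, a small wrapper keeps the depth and thread settings explicit. This is a sketch that assumes photon.py sits in the working directory; build_photon_command and run_photon are hypothetical helper names, and only the -u, -l, and -t flags shown above are used.

```python
import subprocess


def build_photon_command(url, depth=3, threads=10, script="photon.py"):
    """Build the argument list for the basic invocation shown above."""
    if depth < 1 or threads < 1:
        raise ValueError("depth and threads must be positive")
    return ["python3", script, "-u", url, "-l", str(depth), "-t", str(threads)]


def run_photon(url, **kwargs):
    # Sends live requests: call only against targets cleared for active testing.
    return subprocess.run(build_photon_command(url, **kwargs), check=True)
```

For example, build_photon_command("https://target.com", depth=1) gives the quick surface-level variant without changing the thread count.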
Output goes in a directory named after the target domain. The files are:
- internal.txt — all URLs discovered within the target domain, one per line
- external.txt — all outbound links the crawler encountered, one per line
- emails.txt — email addresses extracted from page source across the crawled surface
- social.txt — social media profile URLs linked from the target site
- js.txt — paths to all JavaScript files discovered during the crawl
- secrets.txt — potential credential material flagged by the pattern-matching detection
The files produced are plain text lists that can be piped directly to the next tool.
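Loading the results for further processing can look like the sketch below. It assumes the file names listed in this review; names can differ between Photon versions, so adjust CATEGORIES to match what your run actually wrote. The function name load_photon_output is hypothetical.

```python
from pathlib import Path

# File stems as listed in this review; an assumption, not a guaranteed layout.
CATEGORIES = ("internal", "external", "emails", "social", "js", "secrets")


def load_photon_output(directory):
    """Read each per-category file into a sorted, de-duplicated list."""
    results = {}
    for name in CATEGORIES:
        path = Path(directory) / f"{name}.txt"
        if path.exists():
            text = path.read_text(encoding="utf-8", errors="replace")
            results[name] = sorted(
                {line.strip() for line in text.splitlines() if line.strip()}
            )
        else:
            # A missing file is treated as "nothing found" for that category.
            results[name] = []
    return results
```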
The --wayback flag provides historical URL data by querying the Wayback Machine for every URL ever indexed under the target domain. These URLs appear in the output, even if they are not linked from the live site, revealing paths that previously existed but no longer do, such as old admin interfaces, retired API endpoints, leftover upload directories, and deleted content that was not properly secured.
The Wayback query is a passive process, whereas the live crawl is active. Combining both approaches yields better URL coverage.
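To show what a passive historical lookup involves, the sketch below builds a query for the Wayback Machine's public CDX API and parses its JSON response. It is independent of Photon: how the --wayback flag performs its lookup internally is not covered here, and the helper names are hypothetical. Fetching the URL is left to the reader.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(domain):
    """Return a CDX URL listing every archived URL under the domain."""
    params = {
        "url": f"{domain}/*",   # everything under the domain
        "output": "json",
        "fl": "original",       # return only the original URL field
        "collapse": "urlkey",   # one row per distinct URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_response(body):
    """Parse a CDX JSON body; the first row is the field-name header."""
    rows = json.loads(body) if body.strip() else []
    return sorted({row[0] for row in rows[1:]})


def archived_only(archived_urls, live_urls):
    """URLs seen in the archive but absent from the live crawl."""
    live = set(live_urls)
    return [url for url in archived_urls if url not in live]
```

The archived_only difference is the interesting set: paths that existed historically but that the live crawl never reached.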
OSINT Investigation Applications
Email addresses are scattered across a company's public site, not just on the contact page. Staff directories, document metadata, forum posts, and newsletter archives all contribute addresses that manual browsing misses. Photon's emails.txt gathers them in one file, which you can check against HaveIBeenPwned, pass to social profile lookup tools, or cross-reference with theHarvester results to build a more complete picture of the organization's identities.
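Before handing addresses to another tool, it helps to normalize them and group them by domain. A minimal sketch, assuming one address per line as described for emails.txt; group_emails_by_domain is a hypothetical helper.

```python
from collections import defaultdict


def group_emails_by_domain(lines):
    """Lower-case, de-duplicate, and group addresses by their domain part."""
    grouped = defaultdict(set)
    for line in lines:
        address = line.strip().lower()
        local, sep, domain = address.rpartition("@")
        # Skip blanks and anything that does not look like local@domain.tld.
        if not sep or not local or "." not in domain:
            continue
        grouped[domain].add(address)
    return {domain: sorted(addrs) for domain, addrs in sorted(grouped.items())}
```

Grouping by domain also exposes addresses on subdomains or third-party mail providers, which are easy to miss in a flat list.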
Social link mapping begins with social.txt, which lists the social platforms linked from an organization's own web properties. Because the list comes from the main domain, blog, careers page, and legal pages rather than from search results, it reflects the presence the organization itself claims and is the authoritative starting point for social media investigation.
Discrepancies between what an organization links to publicly and what deeper investigation reveals can tell a story.
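Sorting the linked profiles by platform makes such discrepancies easier to spot. The sketch below is illustrative: the PLATFORMS table is a small hand-picked assumption, not a list taken from Photon, and classify_social is a hypothetical helper.

```python
from urllib.parse import urlparse

# Hand-picked host-to-platform map for illustration; extend as needed.
PLATFORMS = {
    "twitter.com": "twitter",
    "x.com": "twitter",
    "facebook.com": "facebook",
    "linkedin.com": "linkedin",
    "instagram.com": "instagram",
    "youtube.com": "youtube",
    "github.com": "github",
}


def classify_social(url):
    """Return (platform, profile_path) for known platforms, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    platform = PLATFORMS.get(host)
    if platform is None:
        return None
    return platform, parsed.path.strip("/")
```

Comparing these pairs with profiles found through other sources shows which accounts the organization links to and which it does not.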
JavaScript secret detection via secrets.txt feeds directly into security assessments. API keys exposed in client-side JavaScript are available to anyone who loads the page, and they are often still live because nobody rotated them after they were committed. The surrounding code usually identifies the service each key belongs to. Photon surfaces these findings across the entire JavaScript footprint in one crawl.
Limitations and Workflow Position
Photon sends HTTP requests straight to the target. Your IP and user agent show up in their server logs. This is expected in active engagements where probing is authorized. But if you're in a passive phase and can't leave traces, Photon isn't for you. Use it after subdomain enumeration and when active testing is cleared.
Photon fails on JavaScript-heavy sites. Pages built with React, Vue, or Angular without server-side rendering return near-empty HTML. The JavaScript code is there, and Photon scans it for secrets. But URLs, emails, and social links aren't in the raw HTML. For modern JS sites, use Photon with Katana. Katana uses a headless browser to render JavaScript before extraction.
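A quick heuristic can tell you whether a page is likely to be a client-rendered shell before you rely on Photon's results for it. This is a rough sketch: the thresholds are arbitrary assumptions, and looks_client_rendered is a hypothetical helper, not part of Photon or Katana.

```python
from html.parser import HTMLParser


class _ShellCounter(HTMLParser):
    """Counts links, script tags, and visible text in raw HTML."""

    def __init__(self):
        super().__init__()
        self.links = 0
        self.scripts = 0
        self.text_chars = 0
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += 1
        elif tag == "script":
            self.scripts += 1
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Inline script source is not page content, so it is not counted.
        if not self._in_script:
            self.text_chars += len(data.strip())


def looks_client_rendered(html, max_links=2, max_text=200):
    """True when the raw HTML has scripts but almost no links or text."""
    counter = _ShellCounter()
    counter.feed(html)
    counter.close()
    return (
        counter.scripts > 0
        and counter.links <= max_links
        and counter.text_chars <= max_text
    )
```

When this returns True for a target's landing page, treat Photon's URL, email, and social results as incomplete and plan a rendered crawl as well.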
Photon belongs after passive enumeration has identified subdomains, httpx has confirmed which web services are live, and active probing has been confirmed in scope.
At that point it extracts structured content from the live hosts and maps what is exposed on the surfaces you have already identified.
Photon is best for pentesters and OSINT analysts needing structured extraction of emails, social links, and JavaScript secrets.
The tool is available on GitHub at s0md3v/Photon.
Similar Tools
Shodan
Search engine for internet-connected devices — find exposed servers, industrial systems, and network infrastructure worldwide.
RTL-SDR Blog V4
The standard $40 software-defined radio dongle for ADS-B aircraft tracking, AIS ship tracking, and weather satellite imagery.
SingleFile
Archive any web page — including JavaScript-rendered content — into a single self-contained HTML file that opens identically offline and can be cryptographically verified.
urlscan.io
Free website scanner that captures full-page screenshots, network requests, and DOM snapshots for any URL
This review reflects testing as of 2026-04-07. OSINT tools change frequently; check the vendor's current documentation for pricing and feature updates.