What is Photon used for?

Photon is a fast Python-based OSINT web crawler that extracts internal URLs, external links, email addresses, social media profile links, JavaScript file paths, and potential secrets from a target website in a single recursive crawl. Output is organized into per-category files — internal.txt, external.txt, emails.txt, social.txt, js.txt, secrets.txt — each immediately usable as input for downstream tools. A --wayback flag supplements the live crawl with historical URLs from the Wayback Machine, surfacing content that no longer appears in the live site's link structure.

Who should use Photon?

Pentesters and OSINT analysts who need structured content extraction from a target website — email addresses, social links, and exposed JavaScript secrets — as an early active investigation step after scope confirmation.

Photon: Fast OSINT Web Crawler for Extracting URLs, Emails, and Exposed Secrets

Name: Photon Review
Item: Photon
Rating: 4.1
Author: OSINTBench

Manual browsing through a company's website is a tedious and incomplete way to gather information. You follow the obvious links, check the contact page, and scan the footer. But you often miss crucial details. The careers page might have thirty staff email addresses. The privacy policy might link to four social media platforms. An outdated JavaScript bundle might expose an AWS key.

Photon does this work for you. It crawls the target's linked public pages to a depth you choose, extracts the relevant information, and organizes it into structured files. One command, one run. The output is ready to feed into your next investigation step.

What Photon Extracts

Introduction to Photon

Photon crawls a website by following internal links recursively to a set depth; external links are logged rather than followed. From the crawl, Photon extracts six types of data: internal URLs, external link targets, email addresses, social media profile links, JavaScript file paths, and potential secrets in page source and JS files.
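The internal/external split can be sketched in a few lines of Python. This is a minimal illustration, not Photon's source: it assumes a link is internal only when its resolved hostname exactly matches the page's hostname (Photon's handling of subdomains may differ), and `classify_links` is a hypothetical name.

```python
from urllib.parse import urljoin, urlparse


def classify_links(page_url: str, hrefs: list[str]) -> tuple[set[str], set[str]]:
    """Split raw href values from one page into internal and external URLs."""
    target_host = urlparse(page_url).hostname
    internal: set[str] = set()
    external: set[str] = set()
    for href in hrefs:
        # Resolve relative links such as "/about" or "careers.html" against the page URL.
        parsed = urlparse(urljoin(page_url, href))
        # Skip non-web schemes such as mailto: and javascript:.
        if parsed.scheme not in ("http", "https"):
            continue
        # Drop the fragment so "/faq#top" and "/faq" count as the same page.
        url = parsed._replace(fragment="").geturl()
        if parsed.hostname == target_host:
            internal.add(url)  # would be queued for further crawling
        else:
            external.add(url)  # logged, never followed
    return internal, external


internal, external = classify_links(
    "https://target.com/team",
    ["/about", "careers.html", "https://twitter.com/target", "mailto:hr@target.com", "/faq#top"],
)
```

Here the three relative links become internal URLs, the Twitter link is recorded as external, and the mailto: link is left for the email extractor.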

How Photon Works

Extraction happens in a single pass. Photon visits each internal page, extracts all six data types from its HTML and JavaScript, and moves on. By the end of the crawl, every page it reached has been processed for all six categories.
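A rough sketch of single-pass extraction: one function runs every extractor over the same page source and returns all categories at once. The regular expressions and the `SOCIAL_HOSTS` tuple are simplified assumptions for illustration, not Photon's actual patterns, and `extract_all` and `is_social` are hypothetical names.

```python
import re
from urllib.parse import urlparse

# Simplified, illustrative patterns; not Photon's own extraction rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
HREF_RE = re.compile(r"""href=["']([^"']+)["']""", re.IGNORECASE)
SCRIPT_SRC_RE = re.compile(r"""<script[^>]+src=["']([^"']+)["']""", re.IGNORECASE)
SOCIAL_HOSTS = ("twitter.com", "x.com", "linkedin.com", "facebook.com", "instagram.com", "github.com")


def is_social(url: str) -> bool:
    """True if the URL's host is one of SOCIAL_HOSTS or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == s or host.endswith("." + s) for s in SOCIAL_HOSTS)


def extract_all(html: str) -> dict[str, set[str]]:
    """Run every extractor over one page's source before moving to the next page."""
    hrefs = set(HREF_RE.findall(html))
    return {
        "emails": set(EMAIL_RE.findall(html)),
        "social": {h for h in hrefs if is_social(h)},
        "js": set(SCRIPT_SRC_RE.findall(html)),
        "links": hrefs,  # split into internal/external afterwards
    }


page = (
    '<a href="mailto:jobs@target.com">Jobs</a> '
    '<a href="https://www.linkedin.com/company/target">LinkedIn</a> '
    '<a href="/about">About</a> <script src="/static/app.js"></script>'
)
found = extract_all(page)
```

Matching social hosts on the parsed hostname, rather than by substring, avoids false hits such as a `box.com` link matching `x.com`.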

Secret Detection

Secret detection is where Photon finds high-value material. Poorly maintained sites often expose API keys, authentication tokens, AWS access keys, and private keys in client-side code, where they should never appear. Developers sometimes commit these values and forget to remove them. Photon flags candidate secrets automatically by pattern matching, which removes much of the need for manual JavaScript review, though flagged matches still need verification. The process is simple: Photon crawls, extracts, and logs.
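Pattern-matching secret detection works roughly like the sketch below. The three rules (the documented `AKIA` prefix format of AWS access key IDs, a PEM private-key header, and a generic `apiKey: "..."` assignment) are illustrative assumptions rather than Photon's rule set, and `find_secrets` is a hypothetical helper. The sample key is AWS's published example value.

```python
import re

# Illustrative rules only; real scanners ship far larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "generic_api_key": re.compile(
        r"""api[_-]?key["']?\s*[:=]\s*["']([A-Za-z0-9_\-]{16,})["']""", re.IGNORECASE
    ),
}


def find_secrets(source: str) -> list[tuple[str, str]]:
    """Return (rule_name, matched_value) pairs found in page or JS source."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(source):
            # Report the captured value if the rule has a group, else the whole match.
            value = match.group(1) if pattern.groups else match.group(0)
            findings.append((name, value))
    return findings


sample_js = 'const cfg = {apiKey: "abcd1234efgh5678ijkl"}; var k = "AKIAIOSFODNN7EXAMPLE";'
findings = find_secrets(sample_js)
```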

Core Command Usage

Basic invocation:

python3 photon.py -u https://target.com -l 3 -t 10

The -l flag controls crawl depth. Depth 3 works for most sites. For a quick scrape, go lower, to 1 or 2; for complex sites, go higher.

The -t flag sets concurrent request threads. Ten threads is a safe bet. On stable targets where speed is more important than stealth, you can increase it.
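To make the two flags concrete, here is a sketch of a breadth-first crawl in which `depth` stands in for -l and `threads` for -t. It assumes depth counts rounds of page fetches starting from the seed URL (Photon's exact level semantics may differ) and replaces HTTP with an in-memory `SITE` dictionary so it runs offline; `crawl` and `fake_fetch` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def crawl(start: str, fetch_links: Callable[[str], Iterable[str]], depth: int = 3, threads: int = 10) -> set[str]:
    """Breadth-first crawl limited to `depth` rounds, fetching each round with `threads` workers."""
    seen = {start}
    frontier = [start]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for _ in range(depth):
            if not frontier:
                break
            next_frontier = []
            # Fetch every page on the current level concurrently; map() keeps input order.
            for links in pool.map(fetch_links, frontier):
                for url in links:
                    if url not in seen:
                        seen.add(url)
                        next_frontier.append(url)
            frontier = next_frontier
    return seen


# In-memory stand-in for a site: page path -> internal links found on it.
SITE = {
    "/": ["/about", "/blog"],
    "/about": ["/team"],
    "/blog": ["/blog/post-1"],
    "/team": ["/team/alice"],
}


def fake_fetch(url: str) -> list[str]:
    return SITE.get(url, [])
```

Fetching a whole level through the pool is one simple way to bound concurrency; a real crawler would also need timeouts, retries, and error handling.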

Output goes in a directory named after the target domain. The files are:

internal.txt — all URLs discovered within the target domain, one per line
external.txt — all outbound links the crawler encountered, one per line
emails.txt — email addresses extracted from page source across the crawled surface
social.txt — social media profile URLs linked from the target site
js.txt — paths to all JavaScript files discovered during the crawl
secrets.txt — potential credential material flagged by the pattern-matching detection

The files produced are plain text lists that can be piped directly to the next tool.
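As an example of the glue these files allow, the sketch below filters internal.txt down to URLs that carry query strings, a common input for parameter testing. `with_query_params` is a hypothetical helper, and the `target.com/internal.txt` path used below assumes the default output directory named after the domain.

```python
from typing import Iterable, Iterator
from urllib.parse import urlparse


def with_query_params(lines: Iterable[str]) -> Iterator[str]:
    """Yield URLs that carry a query string, de-duplicated, in input order."""
    seen: set[str] = set()
    for line in lines:
        url = line.strip()
        if url and urlparse(url).query and url not in seen:
            seen.add(url)
            yield url
```

Because each file holds one entry per line, passing `open("target.com/internal.txt")` or `sys.stdin` to this function is all the wiring a pipeline needs.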

The --wayback flag adds historical URL data by querying the Wayback Machine for every URL ever indexed under the target domain. These URLs appear in the output even when the live site no longer links to them, revealing paths that were once part of the site but have since dropped out of its link structure: old admin interfaces, retired API endpoints, leftover upload directories, and removed content that was never properly secured.

The Wayback query is a passive process, whereas the live crawl is active. Combining both approaches yields better URL coverage.
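A sketch of the passive half and the merge, assuming the Internet Archive's public CDX endpoint (`https://web.archive.org/cdx/search/cdx`) with `output=json`, whose first row is a column header. Photon's own Wayback query may use different parameters, and `cdx_query_url`, `parse_cdx_json`, and `merge_coverage` are hypothetical names. The network request itself is left out so the sketch runs offline.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain: str) -> str:
    """Build a CDX query for archived URLs under `domain` (this hits archive.org, not the target)."""
    params = {
        "url": f"{domain}/*",   # every path under the domain
        "output": "json",
        "fl": "original",       # return only the original-URL column
        "collapse": "urlkey",   # one row per unique URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(body: str) -> set[str]:
    """Parse a JSON CDX response, skipping the header row."""
    rows = json.loads(body)
    return {row[0] for row in rows[1:]}


def merge_coverage(live: set[str], archived: set[str]) -> tuple[set[str], set[str]]:
    """Return (all known URLs, URLs seen only in the archive)."""
    return live | archived, archived - live
```

The archive-only set is where old admin paths and retired endpoints tend to show up. In practice archived URLs may differ from live ones in scheme or trailing slashes, so some normalization before the set difference helps.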

OSINT Investigation Applications

Email addresses are scattered across a company's public site, not just on the contact page. Staff directories, document metadata, forum posts, and newsletter archives all contribute addresses that manual browsing misses. Photon's emails.txt gathers them into one file, which you can feed into HaveIBeenPwned, social profile tools, or theHarvester to build a detailed picture of the organization's identities.
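Before passing emails.txt to other tools, it can help to normalize the addresses and separate the organization's own from third-party ones such as agencies or vendors. The sketch below is a minimal illustration; `group_emails` is a hypothetical name, and lowercasing the whole address assumes local parts are case-insensitive, which is common practice but not guaranteed by the email standard.

```python
from collections import defaultdict


def group_emails(lines: list[str], target_domain: str) -> dict[str, list[str]]:
    """Normalize addresses and split them into on-domain and third-party buckets."""
    groups: dict[str, set[str]] = defaultdict(set)
    target = target_domain.lower()
    for line in lines:
        # Assumption: treat the whole address case-insensitively, as most providers do.
        address = line.strip().lower()
        if "@" not in address:
            continue
        domain = address.rsplit("@", 1)[1]
        # Subdomains of the target (e.g. eu.target.com) count as the organization's own.
        own = domain == target or domain.endswith("." + target)
        groups["target" if own else "third_party"].add(address)
    return {bucket: sorted(addrs) for bucket, addrs in groups.items()}


grouped = group_emails(
    ["Jane.Doe@Target.com", "jane.doe@target.com", "press@eu.target.com", "agency@pr-firm.example", "not-an-email"],
    "target.com",
)
```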

Social link mapping begins with social.txt, which lists the social media profiles linked from an organization's own web properties. Because the list comes from those properties rather than from search results, it is the authoritative starting point for social media investigation: the main domain, blog, careers page, and legal pages each declare the organization's claimed social presence.
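Turning social.txt into an inventory of (platform, handle) pairs makes it easier to compare against what later investigation finds. The host-to-platform map and the LinkedIn path rule are assumptions for illustration, and `parse_social` is hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse

# Assumed host-to-platform map; x.com and twitter.com are treated as one platform.
PLATFORMS = {
    "twitter.com": "twitter",
    "x.com": "twitter",
    "linkedin.com": "linkedin",
    "facebook.com": "facebook",
    "instagram.com": "instagram",
    "github.com": "github",
    "youtube.com": "youtube",
}


def parse_social(url: str) -> Optional[Tuple[str, str]]:
    """Map a profile URL to (platform, handle), or None if the host or handle is unknown."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    platform = PLATFORMS.get(host)
    segments = [s for s in parsed.path.split("/") if s]
    if platform is None or not segments:
        return None
    # LinkedIn nests handles under /company/ or /in/, so keep both segments.
    if platform == "linkedin" and len(segments) >= 2:
        return platform, "/".join(segments[:2])
    return platform, segments[0]
```

Collecting `{parse_social(u) for u in lines} - {None}` across every crawled property gives a de-duplicated list of claimed accounts.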

Discrepancies between what an organization links to publicly and what deeper investigation reveals can be findings in their own right.

JavaScript secret detection via secrets.txt feeds directly into security assessments. An API key exposed in client-side JavaScript is available to anyone who loads the page, and such keys are often still live because they have not been rotated since they were committed. The code usually also reveals which service each key belongs to. Photon surfaces these candidate keys and service details across the entire crawled JavaScript footprint in one run.

Limitations and Workflow Position

Photon sends HTTP requests straight to the target, so your IP address and user agent show up in the target's server logs. That is expected in active engagements where probing is authorized, but if you are in a passive phase and cannot leave traces, Photon is the wrong tool. Use it after subdomain enumeration, once active testing is cleared.

Photon falls short on JavaScript-heavy sites. Pages built with React, Vue, or Angular without server-side rendering return near-empty HTML. The JavaScript files are still there, and Photon scans them for secrets, but URLs, emails, and social links are not present in the raw HTML. For modern JS sites, pair Photon with Katana, which can use a headless browser to render JavaScript before extraction.
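One rough way to decide when a host needs a headless crawler such as Katana is to check whether its raw HTML carries scripts but almost no links or visible text. This heuristic and its thresholds (`min_links`, `min_text_chars`) are arbitrary assumptions, not part of Photon or Katana, and `looks_client_rendered` is a hypothetical name.

```python
import re

ANCHOR_RE = re.compile(r"<a\s[^>]*href=", re.IGNORECASE)
SCRIPT_TAG_RE = re.compile(r"<script\b", re.IGNORECASE)
SCRIPT_BODY_RE = re.compile(r"<script\b.*?</script>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def looks_client_rendered(html: str, min_links: int = 3, min_text_chars: int = 200) -> bool:
    """Heuristic: scripts present, but very few links and little visible text in raw HTML."""
    has_scripts = bool(SCRIPT_TAG_RE.search(html))
    links = len(ANCHOR_RE.findall(html))
    # Remove script blocks, then all tags, to approximate the visible text.
    text = TAG_RE.sub(" ", SCRIPT_BODY_RE.sub(" ", html))
    visible_chars = len(" ".join(text.split()))
    return has_scripts and links < min_links and visible_chars < min_text_chars


spa_shell = '<html><head><script src="/main.js"></script></head><body><div id="root"></div></body></html>'
```

A `True` result only suggests that the raw HTML is a thin shell; confirming it still means loading the page in a browser.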

Photon is used after passive enumeration identifies subdomains, httpx checks live web services, and active probing is confirmed in scope.

Photon extracts structured content from live web hosts: it maps what is exposed on surfaces you have already identified.

Photon is best for pentesters and OSINT analysts needing structured extraction of emails, social links, and JavaScript secrets.

The tool is available on GitHub at s0md3v/Photon.

Similar Tools

Shodan

RTL-SDR Blog V4

SingleFile

urlscan.io
