How RepoSec Works

RepoSec is a free tool that scans public Git repositories for intentionally malicious code — supply chain attacks, obfuscated payloads, credential theft, and script hijacking. Think of it as VirusTotal, but for code repositories.

The Problem

Open source supply chain attacks are growing fast. Attackers publish packages and repositories that look legitimate but contain hidden backdoors, data exfiltration, or obfuscated payloads that activate on install. Traditional vulnerability scanners (Snyk, Dependabot) check for known CVEs in dependencies — they don't catch deliberately malicious code that hasn't been reported yet.

RepoSec fills this gap. Paste a URL, get a verdict — no setup, no config, no account needed.

How a Scan Works

When you submit a repository URL, here's what happens:

Validation & cache check — We verify the URL format and check if we've already scanned this repo at the same commit. If so, you get instant results.
Queue & isolate — The scan job is queued and picked up by a worker that spawns a fresh, ephemeral Docker container for this scan alone.
Clone — The repo is shallow-cloned inside the container with all dangerous Git features disabled (no hooks, no symlinks, no local protocol).
10 security checks — Each check module analyzes the cloned files using pattern matching, AST analysis, and heuristics. No repo code is ever executed.
Score & report — Findings are weighted and aggregated into a risk score (clean → low → medium → high → critical). The container is destroyed.
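
To make the clone step concrete, here is a minimal sketch of how such a hardened clone command could be assembled. It is an illustration under assumptions, not RepoSec's actual invocation: `build_clone_command` and the destination path are hypothetical, while `core.hooksPath`, `core.symlinks`, and `protocol.file.allow` are standard Git configuration keys.

```python
def build_clone_command(repo_url: str, dest: str) -> list[str]:
    """Return an argv list for a shallow clone with risky Git features off.

    Hypothetical helper: the exact options RepoSec uses are not documented here.
    """
    if not repo_url.startswith("https://"):
        raise ValueError("only https:// URLs are accepted in this sketch")
    return [
        "git",
        "-c", "core.hooksPath=/dev/null",   # point hooks at an empty location
        "-c", "core.symlinks=false",        # check out symlinks as plain files
        "-c", "protocol.file.allow=never",  # refuse the local file transport
        "clone", "--depth", "1", "--no-tags", "--single-branch",
        "--", repo_url, dest,
    ]
```

Building the command as a list (rather than a shell string) keeps a hostile URL from being interpreted by a shell.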

A typical scan takes 10–30 seconds. Results are cached per commit hash, so rescanning the same commit returns the stored result immediately.
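
Per-commit caching can be pictured as a lookup keyed on the repository URL plus the commit hash. The sketch below is an assumption about how such a key might be derived; `cache_key` and the normalization rules are hypothetical, and lowercasing the URL is only safe on hosts that treat paths case-insensitively.

```python
import hashlib


def cache_key(repo_url: str, commit_sha: str) -> str:
    """Derive a stable cache key from a repo URL and commit hash (illustrative)."""
    # Normalize cosmetic differences: trailing slash, ".git" suffix, letter case.
    normalized = repo_url.strip().rstrip("/").removesuffix(".git").lower()
    material = f"{normalized}@{commit_sha.strip().lower()}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

Because the commit hash is part of the key, a new push to the repository produces a new key and therefore a fresh scan.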

The 10 Security Checks

Committed Secrets

Detects API keys, tokens, passwords, and credentials accidentally (or intentionally) left in source code. Covers AWS, GCP, Stripe, GitHub tokens, private keys, and more.
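
As a rough illustration of pattern-based secret detection, the sketch below scans text line by line with a few regular expressions. The pattern set is a small assumed sample, not RepoSec's actual rules, and `find_secrets` is a hypothetical name.

```python
import re

# Illustrative patterns only; a real rule set would be far larger.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every line that matches a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```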

Dangerous Functions

Flags use of eval(), exec(), new Function(), child_process, and other dynamic code execution patterns across JavaScript, Python, Ruby, PHP, and shell scripts.
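
For Python sources, this kind of check can be done on the syntax tree rather than on raw text, which avoids flagging the word "eval" inside a string. The following is a minimal sketch using the standard `ast` module; the list of names and the helper `find_dangerous_calls` are assumptions for illustration.

```python
import ast

# Assumed sample of built-ins that execute or load code dynamically.
DANGEROUS_CALLS = {"eval", "exec", "compile", "__import__"}


def find_dangerous_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, function name) for direct calls to risky built-ins.

    The source is only parsed, never executed.
    """
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in DANGEROUS_CALLS:
                hits.append((node.lineno, node.func.id))
    return sorted(hits)
```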

Script Hijacking

Analyzes package.json lifecycle scripts (postinstall, preinstall, prepare) and Makefile/setup.py hooks for suspicious commands, one of the most common vectors for npm/PyPI supply chain attacks.
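
A simplified version of the package.json part of this check might look like the sketch below. The hook list and the "suspicious command" regex are assumptions chosen for the example, and `audit_package_json` is a hypothetical helper.

```python
import json
import re

LIFECYCLE_HOOKS = ("preinstall", "install", "postinstall", "prepare")

# Assumed heuristics: downloaders, piping into a shell, inline node, base64 decoding.
SUSPICIOUS = re.compile(
    r"\b(?:curl|wget)\s|\|\s*(?:sh|bash)\b|\bnode\s+-e\b|\bbase64\s+(?:-d|--decode)\b"
)


def audit_package_json(text: str) -> dict[str, str]:
    """Return lifecycle scripts whose command matches a suspicious pattern."""
    scripts = json.loads(text).get("scripts", {})
    return {
        hook: cmd
        for hook, cmd in scripts.items()
        if hook in LIFECYCLE_HOOKS and isinstance(cmd, str) and SUSPICIOUS.search(cmd)
    }
```

Scripts outside the lifecycle hooks (such as `test`) are ignored here because they do not run automatically on install.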

Obfuscation Detection

Identifies hex-encoded strings, base64 payloads, Unicode escapes, and common JavaScript obfuscation patterns (e.g., JSFuck-style encoding, \x escape sequences) that hide malicious intent.
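
Two of these signals, long runs of \x escapes and high-entropy base64-like blobs, are easy to sketch. The thresholds below (8 escapes, 80 characters, 4.0 bits per character) are assumptions picked for the example, not RepoSec's tuned values.

```python
import math
import re
from collections import Counter

HEX_ESCAPE_RUN = re.compile(r"(?:\\x[0-9a-fA-F]{2}){8,}")
BASE64_LIKE = re.compile(r"[A-Za-z0-9+/]{80,}={0,2}")


def shannon_entropy(s: str) -> float:
    """Shannon entropy of a string in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def obfuscation_signals(text: str) -> list[str]:
    """Return the names of the obfuscation heuristics that fire on the text."""
    signals = []
    if HEX_ESCAPE_RUN.search(text):
        signals.append("hex_escape_run")
    # Long base64-alphabet runs only count when they also look random.
    if any(shannon_entropy(m.group()) > 4.0 for m in BASE64_LIKE.finditer(text)):
        signals.append("base64_like_blob")
    return signals
```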

Outbound Requests

Scans for HTTP calls, DNS lookups, WebSocket connections, and other network activity that could exfiltrate data. Flags hardcoded IPs, unusual domains, and Telegram/Discord webhook URLs.
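
The URL-based part of this check can be approximated with a handful of regular expressions. The patterns below are an assumed sample (URLs with a raw IPv4 host, Discord webhooks, Telegram bot API calls), and `find_network_indicators` is a hypothetical helper.

```python
import re

INDICATORS = {
    "hardcoded_ip_url": re.compile(r"https?://(?:\d{1,3}\.){3}\d{1,3}(?::\d+)?"),
    "discord_webhook": re.compile(
        r"https://(?:canary\.|ptb\.)?discord(?:app)?\.com/api/webhooks/\d+/[\w-]+"
    ),
    "telegram_bot_api": re.compile(r"https://api\.telegram\.org/bot[\w:-]+/"),
}


def find_network_indicators(text: str) -> list[tuple[str, str]]:
    """Return (indicator kind, matched text) pairs, grouped by indicator kind."""
    hits = []
    for kind, pattern in INDICATORS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.group()))
    return hits
```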

Data Flow Analysis

Traces how sensitive data (environment variables, file reads, crypto wallets) flows through the code. Catches patterns like reading ~/.ssh/id_rsa and sending it over HTTP.
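
To show the idea of source-to-sink tracing, here is a deliberately tiny taint tracker for Python code. It is a toy under strong assumptions: it only follows simple assignments at the top level of one file, and the source markers, sink names, and `find_tainted_sinks` are all hypothetical. A production analysis would need to handle functions, attributes, and many more sources and sinks.

```python
import ast

SINKS = {"post", "put", "urlopen", "sendall"}           # assumed network sinks
SOURCE_MARKERS = ("os.environ", "os.getenv", ".ssh")    # assumed sensitive sources


def _is_source(node: ast.AST) -> bool:
    text = ast.unparse(node)
    return any(marker in text for marker in SOURCE_MARKERS)


def _names(node: ast.AST) -> set[str]:
    return {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}


def find_tainted_sinks(source: str) -> list[tuple[int, str]]:
    """Return (line, sink name) where sensitive data reaches a network call."""
    tainted: set[str] = set()
    hits = []
    for stmt in ast.parse(source).body:  # top-level statements, in order
        if isinstance(stmt, ast.Assign):
            # A variable is tainted if assigned from a source or a tainted name.
            if _is_source(stmt.value) or _names(stmt.value) & tainted:
                tainted |= {t.id for t in stmt.targets if isinstance(t, ast.Name)}
        for call in (n for n in ast.walk(stmt) if isinstance(n, ast.Call)):
            func = call.func
            name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", "")
            if name not in SINKS:
                continue
            args = list(call.args) + [kw.value for kw in call.keywords]
            if any(_names(a) & tainted or _is_source(a) for a in args):
                hits.append((call.lineno, name))
    return hits
```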

Crypto & Exfiltration

Detects cryptocurrency wallet address patterns, clipboard hijacking code, and crypto-mining scripts that silently run in the background.
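
Wallet address detection is largely a matter of recognizing address formats. The sketch below matches on shape only and does not validate checksums, so it will produce false positives; the pattern set and `find_wallet_addresses` are assumptions for illustration.

```python
import re

# Format-only patterns: no checksum validation is attempted.
WALLET_PATTERNS = {
    "ethereum": re.compile(r"\b0x[a-fA-F0-9]{40}\b"),
    "bitcoin_legacy": re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b"),
    "bitcoin_bech32": re.compile(r"\bbc1[ac-hj-np-z02-9]{11,71}\b"),
}


def find_wallet_addresses(text: str) -> list[tuple[str, str]]:
    """Return (currency kind, address-like string) pairs found in the text."""
    hits = []
    for kind, pattern in WALLET_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.group()))
    return hits
```

A single address in a donation note is usually harmless; several address formats next to clipboard access is the more interesting combination.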

Dependency Audit

Checks for typosquatted package names, dependencies installed from Git URLs instead of registries, and suspiciously pinned versions. Cross-references known malicious packages.
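
Two of these checks, Git URL dependencies and near-miss package names, can be sketched with the standard library. The list of popular packages, the 0.8 similarity cutoff, and `audit_dependencies` are assumptions for the example; a real check would compare against a much larger registry snapshot.

```python
import difflib

# Tiny illustrative sample of well-known package names.
POPULAR = ("requests", "express", "lodash", "react", "numpy")


def audit_dependencies(deps: dict[str, str]) -> list[tuple[str, str]]:
    """Return (package name, finding) pairs for suspicious declared dependencies."""
    findings = []
    for name, spec in deps.items():
        # Dependencies pulled straight from Git bypass registry review.
        if spec.startswith(("git+", "git://", "github:")) or spec.endswith(".git"):
            findings.append((name, "git_url_dependency"))
        # A name that is close to, but not equal to, a popular package.
        if name not in POPULAR:
            close = difflib.get_close_matches(name, POPULAR, n=1, cutoff=0.8)
            if close:
                findings.append((name, f"possible_typosquat_of:{close[0]}"))
    return findings
```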

Smart Contract Execution

Identifies code that fetches and executes smart contract payloads — a pattern used in web3-themed supply chain attacks where the actual malicious code lives on-chain.

Git Metadata Analysis

Examines repository age, commit count, author count, and other signals. Brand-new repos with a single author and minimal history are flagged as higher-risk context.
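
Once the metadata has been collected, turning it into context signals is simple thresholding. The sketch below assumes the numbers are already available; the thresholds (30 days, 5 commits) and the names `RepoMetadata` and `metadata_signals` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RepoMetadata:
    age_days: int
    commit_count: int
    author_count: int


def metadata_signals(meta: RepoMetadata) -> list[str]:
    """Return context signals for young, thin, single-author repositories."""
    signals = []
    if meta.age_days < 30:        # assumed threshold
        signals.append("new_repository")
    if meta.commit_count < 5:     # assumed threshold
        signals.append("minimal_history")
    if meta.author_count == 1:
        signals.append("single_author")
    return signals
```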

Isolation & Security

Every scan runs inside a dedicated Docker container with strict constraints:

512 MB memory limit — prevents memory bombs
1 CPU core — prevents resource exhaustion
150-second hard timeout — container is killed if it exceeds this
100 MB repo size limit — oversized repos are rejected
Non-root user — scanner runs as an unprivileged user inside the container
Ephemeral — container and all cloned data are destroyed after each scan

Crucially, no repository code is ever executed. All checks are static analysis: pattern matching, AST analysis, heuristics, and file inspection only. The container isolation protects against edge cases like malicious filenames, symlink attacks, Git exploits, and zip/git bombs.
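
The constraints listed above map naturally onto `docker run` options. The sketch below shows one way a worker could launch and time-box a scan container; the image name, container name, UID, and both helper functions are assumptions, while `--rm`, `--memory`, `--cpus`, `--user`, and `--name` are standard Docker CLI flags.

```python
import subprocess


def build_docker_command(image: str, repo_url: str, name: str) -> list[str]:
    """Return an argv list for a constrained, throwaway scan container."""
    return [
        "docker", "run",
        "--rm",                 # ephemeral: remove the container on exit
        "--name", name,
        "--memory", "512m",     # memory limit
        "--cpus", "1",          # one CPU core
        "--user", "1000:1000",  # unprivileged user (UID/GID assumed)
        image, repo_url,
    ]


def run_scan(image: str, repo_url: str, name: str, timeout_s: int = 150):
    """Run the scan; kill the container if it exceeds the hard timeout."""
    cmd = build_docker_command(image, repo_url, name)
    try:
        return subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        ).returncode
    except subprocess.TimeoutExpired:
        # The timeout only stops the local client, so kill the container too.
        subprocess.run(["docker", "kill", name], capture_output=True)
        return None
```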

Risk Scoring

Each finding has a severity (critical, high, medium, low, info). The overall risk score is a weighted aggregate of all findings, reported as one of five levels:

Critical — Active malicious patterns (e.g., data exfiltration + obfuscation combined)
High — Strong indicators of malicious intent (e.g., postinstall running obfuscated code)
Medium — Suspicious but potentially legitimate (e.g., eval() usage, hardcoded URLs)
Low — Minor concerns worth reviewing (e.g., committed .env file)
Clean — No findings or only informational notes

Scores are deterministic — the same code at the same commit always produces the same score.
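
A weighted aggregate of this kind can be sketched in a few lines. The weights and thresholds below are invented for illustration and are not RepoSec's real values; `risk_level` is a hypothetical helper.

```python
# Assumed weights per finding severity; informational findings add nothing.
SEVERITY_WEIGHTS = {"critical": 40, "high": 15, "medium": 5, "low": 1, "info": 0}

# Assumed score thresholds, checked from most to least severe.
LEVELS = [(40, "critical"), (15, "high"), (5, "medium"), (1, "low")]


def risk_level(severities: list[str]) -> tuple[int, str]:
    """Sum the finding weights and map the total to an overall level."""
    score = sum(SEVERITY_WEIGHTS[s] for s in severities)
    for threshold, level in LEVELS:
        if score >= threshold:
            return score, level
    return score, "clean"
```

Because the function depends only on the list of findings, the same findings always yield the same score and level.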

Limitations

RepoSec is useful, but it's not magic. Be aware of what it can't do:

Static analysis only — We never run the code. This means we can miss sophisticated attacks that only manifest at runtime, or payloads that are fetched dynamically after install.
Pattern-based detection — Checks rely on known malicious patterns and heuristics. A sufficiently novel attack that doesn't match any known pattern will slip through.
False positives — Legitimate security tools, penetration testing frameworks, and crypto projects will naturally trigger some checks. A high score doesn't automatically mean malicious — context matters.
Public repos only — Private repositories can't be cloned without authentication, which we intentionally don't support.
Shallow clone — We clone only the latest commit (--depth 1). Malicious code buried in old commits or other branches won't be caught.
No dependency resolution — We check declared dependencies for suspicious patterns but don't recursively scan transitive dependencies.
Language coverage — Checks are strongest for JavaScript/TypeScript, Python, and shell scripts. Other languages have basic coverage but fewer specialized checks.

Background

RepoSec was initiated in July 2025 by TechGDPR, a Berlin-based consultancy specializing in GDPR and AI Act compliance for tech companies. It grew out of a practical need: when auditing client codebases for data protection compliance, the team regularly encountered open source dependencies with no easy way to check for intentional malice — only known vulnerabilities.

The tool was designed and built by Silvan Jongerius (TechGDPR founder), with architecture and implementation assisted by AI. The entire codebase — from architecture document to deployment — was built iteratively with Claude as a development partner.

RepoSec is free to use. It's hosted on a single VPS, processes scans sequentially, and doesn't require an account. If it's useful to you, that's the whole point.

Badge API

Add a RepoSec badge to your repository README to show the latest scan result:

![RepoSec](https://reposec.net/api/badge/org/repo)

Badges are cached for 1 hour and update automatically when new scans complete. Colors range from bright green (clean) to red (critical).