File Filtering

Which files xgrep scans by default and how to control it — git-tracked files, ignore patterns, globs, and size limits.

xgrep decides which files to scan from the target you point it at. The defaults favor completeness over the speed gained by skipping files, and flags let you opt into Semgrep-style filtering when you want it.

What gets scanned

By default xgrep scans the git-tracked files under the target (git ls-files), which automatically respects .gitignore. When the target isn't a git repository it falls back to a filesystem walk. Narrow or widen the set with --include / --exclude globs.
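The selection order described above can be sketched in a few lines of Python. This is only an illustration, not xgrep's implementation: `list_targets` is a hypothetical helper, and the use of `fnmatch` (where `*` also matches `/`) is an assumption about glob semantics that may differ from what --include / --exclude actually do.

```python
import fnmatch
import os
import subprocess


def list_targets(root, include=(), exclude=()):
    """Return sorted relative paths of candidate files under root."""
    try:
        # Ask git for tracked files; -z yields NUL-separated, unquoted paths.
        out = subprocess.run(
            ["git", "-C", root, "ls-files", "-z"],
            capture_output=True,
            check=True,
        ).stdout
        paths = [p.decode("utf-8", "replace") for p in out.split(b"\0") if p]
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Not a git repository (or git is not installed): walk the filesystem.
        paths = []
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                paths.append(rel.replace(os.sep, "/"))

    if include:
        # Narrow: keep only paths matching at least one include glob.
        paths = [p for p in paths if any(fnmatch.fnmatch(p, g) for g in include)]
    # Then drop anything matching an exclude glob.
    return sorted(p for p in paths if not any(fnmatch.fnmatch(p, g) for g in exclude))
```

Because the git branch only lists tracked files, anything matched by .gitignore never enters the candidate set in the first place.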

There is no default max file size — large files are scanned, not skipped (set a limit explicitly with --max-target-bytes). Built-in ignore patterns are opt-in via --xgrepignore, not on by default. Both choices are explained in Design decisions below.

Machine-generated files (Code generated … DO NOT EDIT, @generated) and binary files are skipped by default; use --include-generated to scan generated files.
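As a rough illustration of how such checks can work, the sketch below looks for the two markers named above near the top of a file and uses a NUL-byte test for binary content. The function names, the regex form of the markers, the window sizes, and the NUL-byte heuristic are all assumptions for illustration; the text above does not say how xgrep detects either case.

```python
import re

# The two markers named in the docs; expressing them as a regex is an assumption.
GENERATED_MARKERS = re.compile(rb"Code generated .*DO NOT EDIT|@generated")


def is_generated(data: bytes, head_bytes: int = 4096) -> bool:
    """True if a generated-code marker appears near the top of the file."""
    return GENERATED_MARKERS.search(data[:head_bytes]) is not None


def is_binary(data: bytes, head_bytes: int = 8000) -> bool:
    """True if the leading bytes contain a NUL (a common heuristic, assumed here)."""
    return b"\0" in data[:head_bytes]
```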

Minified files (vendored bundles like jquery.min.js) are also skipped by default: a file whose first 64 KB contains a line over 7,000 bytes — or averages over 700 bytes per line — is treated as minified. Findings in minified third-party code are rarely actionable (the code is unfixable in place and line numbers are meaningless), and such files are by far the most expensive to scan. Use --include-minified to scan them anyway:

# default: dist/vendor.min.js is skipped
xgrep scan ./frontend

# audit everything, including minified bundles
xgrep scan --include-minified ./frontend
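The minified-file heuristic above is simple enough to sketch directly. The thresholds come from the description; everything else (the name `looks_minified`, how lines are split, how the average is computed) is an assumption, so treat this as an approximation of the rule rather than xgrep's actual code.

```python
HEAD_BYTES = 64 * 1024      # only the first 64 KB is inspected
MAX_LINE_BYTES = 7_000      # any single line longer than this
MAX_AVG_LINE_BYTES = 700    # or an average line length above this


def looks_minified(data: bytes) -> bool:
    """Apply the long-line / high-average-line-length rule to the file head."""
    lines = data[:HEAD_BYTES].splitlines()
    if not lines:
        return False
    if max(len(line) for line in lines) > MAX_LINE_BYTES:
        return True
    average = sum(len(line) for line in lines) / len(lines)
    return average > MAX_AVG_LINE_BYTES
```

Ordinary source files stay well below both thresholds, while a bundled one-line script trips the first check immediately. Only the head is read, so a long line that first appears after the 64 KB window does not affect the result.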

Design decisions

Ignore patterns are opt-in, not default. xgrep intentionally does not skip test directories, docs, or vendor code unless you ask it to, because test files are valid SAST targets: a vulnerability in a test fixture can point to the same pattern in production code, and test code often contains hardcoded credentials, unsafe patterns, or copy-pasted production code.

Production scope: security findings in test paths are dropped from results. File selection and finding reporting are separate steps. xgrep still scans test, spec, fixture, and example files (above), but by default it drops the security findings located in those paths from the report — a vulnerability in a test fixture is rarely exploitable in production, and reporting it is noise that buries real bugs. Two deliberate exceptions keep this from hiding anything that matters:

  • Secrets are always kept, wherever they live — a credential committed to a test file is just as compromised as one in production code.
  • --include-tests turns the security findings in those paths back on, for when the test code itself is the surface you're reviewing.

This is distinct from --xgrepignore, which removes the files from the scan entirely; production scope scans them but filters their non-secret findings.
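The production-scope rule is a filter over findings, not over files, which a small sketch makes concrete. The `Finding` type, the "security" / "secret" labels, and the way test paths are recognized (matching path tokens such as test, spec, fixture, example) are hypothetical; the docs name the categories of paths but not the exact matching rules.

```python
import re
from dataclasses import dataclass

# Hypothetical tokens standing in for "test, spec, fixture, and example" paths.
TEST_TOKENS = {"test", "tests", "spec", "specs",
               "fixture", "fixtures", "example", "examples"}


@dataclass(frozen=True)
class Finding:
    path: str
    kind: str  # "security" or "secret" (hypothetical labels)


def in_test_path(path: str) -> bool:
    """True if any alphanumeric token of the path is a test-like word."""
    tokens = re.split(r"[^a-z0-9]+", path.lower())
    return any(token in TEST_TOKENS for token in tokens)


def apply_production_scope(findings, include_tests=False):
    """Drop security findings in test paths; secrets are always kept."""
    if include_tests:
        return list(findings)
    return [f for f in findings
            if f.kind == "secret" or not in_test_path(f.path)]
```

In this sketch a credential found under tests/fixtures/ survives the filter, while an injection finding in the same directory is dropped unless `include_tests` is set, mirroring --include-tests.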

Pass --xgrepignore to opt into the built-in ignore set, which skips non-source surfaces: test/, benchmark{,s}/, eval{,s}/, example{,s}/, docs/, vendor/, node_modules/, dist/, build/, .github/, lockfiles, *.{md,test.js,spec.ts,bench.ts}, and similar patterns. To customize the set, drop a .xgrepignore file (one glob per line, # comments allowed) at the scan root; a .semgrepignore file is honored as a fallback. The first such file that exists is authoritative: a present-but-empty (or comment-only) file is an explicit "ignore nothing extra" override that keeps the flag active while disabling the built-in patterns. --semgrepignore remains as an alias for --xgrepignore.

This is the recommended flag for embedders scanning a focused executable surface (e.g. an AI-agent skill: a SKILL.md plus scripts/), where benchmark/eval/example trees are noise rather than the code under review.
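The ignore-file lookup described above can be sketched as follows. `load_ignore_patterns` is a hypothetical helper, and the sketch assumes that a file's patterns replace the built-in set rather than extend it, which is one reading of "authoritative"; it is not taken from xgrep's source.

```python
import os


def load_ignore_patterns(root, builtin):
    """Resolve the active ignore globs when the ignore flag is on."""
    for name in (".xgrepignore", ".semgrepignore"):
        path = os.path.join(root, name)
        if os.path.isfile(path):
            # The first file that exists wins, even if it yields no patterns.
            with open(path, encoding="utf-8") as fh:
                stripped = (line.strip() for line in fh)
                return [s for s in stripped if s and not s.startswith("#")]
    # No ignore file at the scan root: fall back to the built-in set.
    return list(builtin)
```

An empty or comment-only .xgrepignore therefore returns an empty list, so nothing extra is ignored even though the flag is still in effect.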

No default max file size. xgrep does not skip a file because of its size; instead it invests in making the engine performant on large files (per-function scoping, literal pre-checks, context-based timeouts). Silently skipping large files would hide potential vulnerabilities in generated code, vendored dependencies, or monolithic source files. Set a limit explicitly with --max-target-bytes if you need one.

See Semgrep compatibility for the full parity matrix.
