Which files xgrep scans by default and how to control it — git-tracked files, ignore patterns, globs, and size limits.

File Filtering

xgrep decides which files to scan from the target you point it at. The defaults favor completeness over speed gained by skipping files, with flags to opt into Semgrep-style filtering when you want it.

What gets scanned

By default xgrep scans only the git-tracked files under the target (git ls-files --cached), which automatically respects .gitignore. Untracked files (present in the working tree but not yet added to git) are skipped, so local data dumps, build output, and scratch files don't bloat the scan. Add --all-files to also scan untracked-but-not-ignored files (useful for reviewing new code before you commit it):

xgrep scan .              # tracked files only (default)
xgrep scan --all-files .  # tracked + untracked (still respects .gitignore)

When the target isn't a git repository, xgrep falls back to a filesystem walk (scanning everything not ignored). Narrow or widen the set further with --include / --exclude globs.
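
To preview roughly which files each mode would cover, the same git plumbing can be run by hand. This is a sketch under the assumption that --all-files corresponds to git's tracked-plus-untracked listing with the standard ignore rules; only git ls-files --cached is named above, and the helper names tracked_files and tracked_and_untracked are made up for illustration.

```shell
# tracked_files DIR: paths in the git index, the set the default scan
# is described as covering.
tracked_files() {
  git -C "$1" ls-files --cached
}

# tracked_and_untracked DIR: tracked paths plus untracked paths that are
# not ignored. Assumed (not confirmed) to approximate --all-files.
tracked_and_untracked() {
  git -C "$1" ls-files --cached --others --exclude-standard
}
```

Comparing the two listings shows which files a scan with --all-files would likely add on top of the default.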

There is no default max file size: large files are scanned, not skipped (set a limit explicitly with --max-target-bytes). A per-file 60s timeout guards against a single pathological file (change it with --timeout, or pass --timeout 0 to disable it). A conservative set of dependency, build, and lockfile paths is ignored by default (node_modules/, vendor/, dist/, *.min.*, lockfiles, …); --no-ignore scans them, and --xgrepignore adds the opinionated test/docs/example set on top. Both choices are explained in Design decisions below.

Machine-generated files (Code generated … DO NOT EDIT, @generated) and binary files are skipped by default; use --include-generated to scan generated files. Some generators emit large, low-signal output without a DO NOT EDIT marker — these are recognized by their preamble instead: tree-sitter parsers (tree_sitter/parser.h plus the generated STATE_COUNT/LANGUAGE_VERSION table defines), GNU Bison, and flex. Detection is content-based, so a hand-written external scanner that merely includes parser.h is still scanned.
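
The marker-based part of this check can be approximated with a short shell helper. This is only a sketch: the function name has_generated_marker and the 4 KB inspection window are assumptions, and it does not attempt the preamble-based detection of tree-sitter, Bison, or flex output.

```shell
# has_generated_marker FILE: exit 0 if the head of FILE carries a
# "Code generated ... DO NOT EDIT" or "@generated" marker, 1 otherwise.
# Illustrative helper; the 4 KB window is an assumption, not xgrep's value.
has_generated_marker() {
  head -c 4096 "$1" | grep -Eq 'Code generated .* DO NOT EDIT|@generated'
}
```

A file that trips this helper is a likely candidate for the default skip, and --include-generated is the documented way to scan it anyway.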

Minified files (vendored bundles like jquery.min.js) are also skipped by default. A file is treated as minified when its first 64 KB contains a line over 7,000 bytes, averages over 700 bytes per line, or is under 7% whitespace (the last catches densely-packed bundles with short lines, like react.production.min.js). This decision is content-based, never name-based — xgrep looks at the bytes, not the filename, so a renamed or inlined bundle is still skipped and a hand-written file that happens to be named *.min.js is still scanned. Findings in minified third-party code are rarely actionable (the code is unfixable in place and line numbers are meaningless), and such files are by far the most expensive to scan. Use --include-minified to scan them anyway:

# default: dist/vendor.min.js is skipped
xgrep scan ./frontend

# audit everything, including minified bundles
xgrep scan --include-minified ./frontend

The skip is applied by default at every scan entry point — the CLI, the MCP scan tool, and the library scan.Options — so minified and generated files never slip back in unless you opt in. For Semgrep compatibility, --no-exclude-minified-files is an alias for --include-minified, and --exclude-minified-files is accepted as a no-op (it matches xgrep's default of skipping minified files).
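
The three minified thresholds described above can be sketched as a shell helper. It is an approximation, not xgrep's implementation: the name looks_minified is hypothetical, and counting each newline both as a byte of its line and as whitespace is an assumption about details the text does not specify.

```shell
# looks_minified FILE: exit 0 if the first 64 KB of FILE trips any of the
# three documented thresholds, 1 otherwise. Illustrative helper only.
looks_minified() {
  head -c 65536 "$1" | LC_ALL=C awk '
    {
      len = length($0)
      bytes += len + 1                      # +1 for the newline (assumption)
      if (len > 7000) long_line = 1
      line = $0
      ws += gsub(/[ \t\r]/, "", line) + 1   # newline counted as whitespace
    }
    END {
      if (NR == 0) exit 1                   # empty input: not minified
      if (long_line) exit 0                 # a line over 7,000 bytes
      if (bytes / NR > 700) exit 0          # average over 700 bytes per line
      if (ws / bytes < 0.07) exit 0         # under 7% whitespace
      exit 1
    }'
}
```

Because the check reads bytes rather than names, a call such as looks_minified on a renamed bundle should give the same answer as on the original file.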

Design decisions

Two ignore tiers: a default-on noise set, and an opt-in opinionated set. xgrep splits the built-in ignore patterns:

Default-on (disable with --no-ignore): dependency, build, and generated output that is never the executable surface under review — .git/, node_modules/, vendor/, dist/, dev/breeze/, *.min.js, *.min.css, lockfiles (*.lock, go.sum, package-lock.json, yarn.lock). Skipping these can't hide a real finding but avoids walking and regex-scanning large generated trees — the dominant cost on a big repo. (build/ is deliberately not here: some projects keep build-system source under it.)
Opt-in (--xgrepignore): the opinionated set that can legitimately hold findings — test directories, docs, and examples. xgrep does not skip these by default, because test files are valid SAST targets: vulnerabilities in test fixtures can indicate real patterns, and test code often contains hardcoded credentials or copy-pasted production code.

--no-ignore disables both tiers ("scan everything").

Production scope: security findings in test paths are dropped from results. File selection and finding reporting are separate steps. xgrep still scans test, spec, fixture, and example files (above), but by default it drops the security findings located in those paths from the report — a vulnerability in a test fixture is rarely exploitable in production, and reporting it is noise that buries real bugs. Two deliberate exceptions keep this from hiding anything that matters:

Secrets are always kept, wherever they live — a credential committed to a test file is just as compromised as one in production code.
--include-tests turns the security findings in those paths back on, for when the test code itself is the surface you're reviewing.

This is distinct from --xgrepignore, which removes the files from the scan entirely; production scope scans them but filters their non-secret findings.

Pass --xgrepignore to opt into the built-in ignore set, which skips non-source surfaces: test/, benchmark{,s}/, eval{,s}/, example{,s}/, docs/, vendor/, node_modules/, dist/, build/, .github/, lockfiles, and *.{md,test.js,spec.ts,bench.ts} and similar. To customize the set, drop a .xgrepignore file (one glob per line, # comments allowed) at the scan root; a .semgrepignore file is honored as a fallback. The first such file that exists is authoritative — a present-but-empty (or comment-only) file is an explicit "ignore nothing extra" override that keeps the flag active while disabling the built-in patterns. --semgrepignore remains as an alias for --xgrepignore.

This is the recommended flag for embedders scanning a focused executable surface (e.g. an AI-agent skill: a SKILL.md plus scripts/), where benchmark/eval/example trees are noise rather than the code under review.
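
A custom ignore file might look like the following sketch. The patterns and the project layout are hypothetical; the only details taken from the text above are the filename, the one-glob-per-line format, and # comments.

```shell
# Write a hypothetical .xgrepignore at the scan root (the current directory).
cat > .xgrepignore <<'EOF'
# Skip benchmarks and generated fixtures, but keep tests and docs in scope.
benchmarks/
testdata/generated/
*.snap
EOF
```

With this file in place, xgrep scan --xgrepignore . would use these patterns instead of the built-in opt-in set, per the precedence rule described above.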

No default max file size. xgrep scans all files regardless of size and instead invests in making the engine performant on large files (per-function scoping, literal pre-checks, context-based timeouts). Silently skipping large files hides potential vulnerabilities in generated code, vendored dependencies, or monolithic source files. Set a limit explicitly with --max-target-bytes if you need one.

See Semgrep compatibility for the full parity matrix.
