File Filtering
Which files xgrep scans by default and how to control it — git-tracked files, ignore patterns, globs, and size limits.
xgrep decides which files to scan from the target you point it at. The defaults favor completeness over speed-by-omission, with flags to opt into Semgrep-style filtering when you want it.
What gets scanned
By default xgrep scans only the git-tracked files under the target
(git ls-files --cached), which automatically respects .gitignore. Untracked
files (present in the working tree but not yet added to git) are skipped, so
local data dumps, build output, and scratch files don't bloat the scan. Add
--all-files to also scan untracked-but-not-ignored files (useful for reviewing
new code before you commit it):
xgrep scan . # tracked files only (default)
xgrep scan --all-files . # tracked + untracked (still respects .gitignore)
When the target isn't a git repository, xgrep falls back to a filesystem walk
(scanning everything not ignored). Narrow or widen the set further with
--include / --exclude globs.
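
The selection logic above can be approximated with plain git commands. The following is a minimal sketch, not xgrep's actual implementation: the function names are hypothetical, and mapping --all-files to git's --others --exclude-standard flags is an assumption based on the "untracked-but-not-ignored" description.

```python
import os
import subprocess
from pathlib import Path


def git_ls_files_args(all_files: bool = False) -> list:
    """Build the git command for the default or the --all-files selection."""
    # Default: tracked files only, as in `git ls-files --cached`.
    args = ["git", "ls-files", "-z", "--cached"]
    if all_files:
        # Assumption: untracked-but-not-ignored maps to these two git flags.
        args += ["--others", "--exclude-standard"]
    return args


def list_candidates(root: str, all_files: bool = False) -> list:
    """Return candidate file paths under root, relative to root."""
    try:
        result = subprocess.run(
            git_ls_files_args(all_files), cwd=root, capture_output=True
        )
        in_repo = result.returncode == 0
    except FileNotFoundError:
        # git is not installed (or root does not exist).
        in_repo = False
    if not in_repo:
        # Fallback: plain filesystem walk. Ignore handling is left out of
        # this sketch.
        base = Path(root)
        return sorted(
            str(p.relative_to(base)) for p in base.rglob("*") if p.is_file()
        )
    # -z separates paths with NUL bytes, so unusual filenames survive intact.
    return sorted(p for p in os.fsdecode(result.stdout).split("\0") if p)
```

Calling list_candidates(".") would roughly correspond to the default scan set, and list_candidates(".", all_files=True) to the --all-files set, before any --include / --exclude globs are applied.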
There is no default max file size — large files are scanned, not skipped (set
a limit explicitly with --max-target-bytes). Built-in ignore patterns are
opt-in via --xgrepignore, not on by default. Both choices are explained in
Design decisions below.
Machine-generated files (Code generated … DO NOT EDIT, @generated) and binary
files are skipped by default; use --include-generated to scan generated files.
Some generators emit large, low-signal output without a DO NOT EDIT marker —
these are recognized by their preamble instead: tree-sitter parsers
(tree_sitter/parser.h plus the generated STATE_COUNT/LANGUAGE_VERSION
table defines), GNU Bison, and flex. Detection is content-based, so a
hand-written external scanner that merely includes parser.h is still scanned.
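
As a rough illustration of content-based marker detection, the sketch below checks the head of a file for the two markers named above. It covers only the marker case, not the tree-sitter, Bison, or flex preamble checks. The function name and the 4 KB window are assumptions made for the example; the source does not say how much of the file xgrep inspects.

```python
import re

# Assumption: only the first few KB are inspected for a marker.
HEAD_BYTES = 4096

# The two markers named in the text. `.` does not match a newline here,
# so "Code generated" and "DO NOT EDIT" must sit on the same line.
GENERATED_MARKERS = (
    re.compile(rb"Code generated .* DO NOT EDIT"),
    re.compile(rb"@generated"),
)


def looks_generated(content: bytes) -> bool:
    """Return True when the head of the file carries a generated-code marker."""
    head = content[:HEAD_BYTES]
    return any(marker.search(head) for marker in GENERATED_MARKERS)
```

Because the check reads bytes rather than the filename, a generated file keeps being recognized after a rename, which matches the content-based behavior described above.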
Minified files (vendored bundles like jquery.min.js) are also skipped by
default. A file is treated as minified when its first 64 KB contains a line over
7,000 bytes, averages over 700 bytes per line, or is under 7% whitespace (the
last catches densely-packed bundles with short lines, like
react.production.min.js). This decision is content-based, never name-based —
xgrep looks at the bytes, not the filename, so a renamed or inlined bundle is
still skipped and a hand-written file that happens to be named *.min.js is
still scanned. Findings in minified third-party code are rarely actionable (the
code is unfixable in place and line numbers are meaningless), and such files are
by far the most expensive to scan. Use --include-minified to scan them anyway:
# default: dist/vendor.min.js is skipped
xgrep scan ./frontend
# audit everything, including minified bundles
xgrep scan --include-minified ./frontend
The skip is applied by default at every scan entry point (the CLI, the MCP
scan tool, and the library scan.Options), so minified and generated files
never slip back in unless you opt in. For Semgrep compatibility,
--no-exclude-minified-files is an alias for --include-minified, and
--exclude-minified-files is accepted as a no-op (it matches xgrep's default of
skipping minified files).
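
The three minified-file thresholds described above can be sketched as a single predicate. This is an illustrative approximation, not xgrep's code: the function name is hypothetical, and which characters count as whitespace (space, tab, CR, LF here) is an assumption.

```python
SAMPLE_BYTES = 64 * 1024      # only the first 64 KB are inspected
MAX_LINE_BYTES = 7_000        # any single line longer than this
MAX_AVG_LINE_BYTES = 700      # or an average line length above this
MIN_WHITESPACE_RATIO = 0.07   # or less than 7% whitespace


def looks_minified(content: bytes) -> bool:
    """Apply the three content-based checks to the head of a file."""
    sample = content[:SAMPLE_BYTES]
    if not sample:
        return False
    lines = sample.split(b"\n")
    if len(lines) > 1 and lines[-1] == b"":
        lines.pop()  # a trailing newline does not start a new line
    if any(len(line) > MAX_LINE_BYTES for line in lines):
        return True
    if sum(len(line) for line in lines) / len(lines) > MAX_AVG_LINE_BYTES:
        return True
    # Assumption: whitespace means space, tab, CR, and LF.
    whitespace = sum(sample.count(c) for c in (b" ", b"\t", b"\r", b"\n"))
    return whitespace / len(sample) < MIN_WHITESPACE_RATIO
```

Taken literally, the whitespace check would also flag a tiny file such as a one-line constant, so a real implementation presumably applies a size floor or similar guard; the source does not describe one, and this sketch leaves it out.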
Design decisions
Ignore patterns are opt-in, not default. xgrep intentionally does not auto-skip test directories, docs, or vendor code, because test files are valid SAST targets: a vulnerable pattern in a test fixture often mirrors one used in production, and test code frequently contains hardcoded credentials, unsafe patterns, or copy-pasted production code.
Production scope: security findings in test paths are dropped from results. File selection and finding reporting are separate steps. xgrep still scans test, spec, fixture, and example files (above), but by default it drops the security findings located in those paths from the report — a vulnerability in a test fixture is rarely exploitable in production, and reporting it is noise that buries real bugs. Two deliberate exceptions keep this from hiding anything that matters:
- Secrets are always kept, wherever they live — a credential committed to a test file is just as compromised as one in production code.
- --include-tests turns the security findings in those paths back on, for when the test code itself is the surface you're reviewing.
This is distinct from --xgrepignore, which removes the files from the scan
entirely; production scope scans them but filters their non-secret findings.
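
The reporting rule above (scan everything, then drop non-secret findings in test paths) can be sketched as a post-scan filter. The Finding shape, the path markers, and the function names below are all hypothetical; the source does not specify how xgrep recognizes a test path or a secret finding.

```python
from dataclasses import dataclass

# Hypothetical directory names treated as test/spec/fixture/example paths.
TEST_PATH_PARTS = {"test", "tests", "spec", "fixtures", "examples"}


@dataclass
class Finding:
    path: str
    rule_id: str
    is_secret: bool = False


def in_test_path(path: str) -> bool:
    """Return True when any directory component marks a test-like path."""
    parts = path.replace("\\", "/").split("/")
    return any(part in TEST_PATH_PARTS for part in parts[:-1])


def apply_production_scope(findings, include_tests: bool = False):
    """Drop non-secret findings in test paths unless include_tests is set."""
    return [
        f
        for f in findings
        if include_tests or f.is_secret or not in_test_path(f.path)
    ]
```

The filter runs after scanning, so the files themselves are still read; only the report changes. That is the difference from --xgrepignore, which removes files before the scan.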
Pass --xgrepignore to opt into the built-in ignore set, which skips non-source
surfaces: test/, benchmark{,s}/, eval{,s}/, example{,s}/, docs/, vendor/,
node_modules/, dist/, build/, .github/, lockfiles, and *.{md,test.js,spec.ts,bench.ts}
and similar. To customize the set, drop a .xgrepignore file (one glob per line, #
comments allowed) at the scan root; a .semgrepignore file is honored as a
fallback. The first such file that exists is authoritative — a present-but-empty
(or comment-only) file is an explicit "ignore nothing extra" override that keeps
the flag active while disabling the built-in patterns. --semgrepignore remains
as an alias for --xgrepignore.
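
One way to read the precedence rules above is the following sketch. The function names are hypothetical, and treating the first existing file as a full replacement for the built-in set is an interpretation of the "authoritative" wording rather than confirmed behavior.

```python
from pathlib import Path

# Lookup order: .xgrepignore first, .semgrepignore as the fallback.
IGNORE_FILES = (".xgrepignore", ".semgrepignore")


def parse_ignore_text(text: str) -> list:
    """One glob per line; blank lines and # comments are skipped."""
    patterns = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def resolve_ignore_patterns(root: str, builtin: list) -> list:
    """Return the active patterns when the ignore flag is enabled."""
    for name in IGNORE_FILES:
        candidate = Path(root) / name
        if candidate.is_file():
            # The first file that exists wins, even when it yields no
            # patterns: an empty file means "ignore nothing extra".
            return parse_ignore_text(candidate.read_text(encoding="utf-8"))
    # No ignore file at the scan root: use the built-in set.
    return list(builtin)
```

Under this reading, an empty .xgrepignore returns an empty pattern list instead of falling through to .semgrepignore or the built-in set.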
This is the recommended flag for embedders scanning a focused executable surface
(e.g. an AI-agent skill: a SKILL.md plus scripts/), where benchmark/eval/example
trees are noise rather than the code under review.
No default max file size. xgrep never skips a file for its size alone; instead
it invests in making the engine performant on large files (per-function scoping,
literal pre-checks, context-based timeouts). Silently skipping large files hides
potential vulnerabilities in generated code, vendored dependencies, or
monolithic source files. Set a limit explicitly with --max-target-bytes if you
need one.
See Semgrep compatibility for the full parity matrix.