CLI Reference

xgrep flags and subcommands.

xgrep [flags] -f <rules> <targets...>

Flags:
  -f, --rules string                path to rule file or directory
  -c, --config string               path to rule file or directory (alias for --rules)
      --json                        output results as JSON
      --sarif                       output results as SARIF
      --gitlab                      output results as a GitLab SAST report (gl-sast-report.json)
      --project-root string         repository root for report file paths (default: auto-detected git root)
      --sarif-category string       SARIF automationDetails.id / GitHub Code Scanning category (default: xgrep)
      --error                       exit with code 1 if findings are found (default: exit 0 even with findings)
      --max-findings int            exit 1 only when active findings exceed N (gradual rollout); takes precedence over --error; -1 disables (default)
      --disable-nosemgrep           ignore inline nosemgrep/nogrep suppression comments and report all findings
  -j, --jobs int                    number of parallel workers (default: NumCPU)
      --severity string             minimum severity to report (INFO, WARNING, ERROR)
      --category string             only run rules in this category (e.g. security, correctness); default: security,secrets for built-in rules
      --subcategory string          only run rules in this subcategory tier (e.g. vuln for exploitable-only)
      --exclude-subcategory string  skip rules in this subcategory tier (e.g. audit to drop hardening/advisory rules)
      --with-builtin string         with -f/--rules, also run the built-in rules from the given categories (comma-separated)
      --include-opt-in              also run rules marked metadata.opt-in: true (off by default)
      --include-tests               keep security findings in test/spec/fixture/example paths (dropped by default; secrets are always kept)
      --include string              include only files matching glob pattern
      --exclude string              exclude files matching glob pattern
      --max-target-bytes int        skip files larger than N bytes
  -o, --output string               write output to file instead of stdout
      --rule-id string              only run rules with matching IDs (comma-separated)
      --skip-rule string            skip rules with matching IDs (comma-separated)
      --autofix                     apply fixes to source files in place
      --dry-run                     show fixes without applying (use with --autofix)
      --verbose                     enable debug output

Flags shown with a type (string, int) take an argument; the rest (--include-opt-in, --include-tests, --autofix, …) are booleans that take none. For the complete, current flag list (including --baseline-commit, --history, --decode, --timeout, --max-memory, --lang, --metrics, --stdin, and profiling flags), run xgrep --help.
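
The interaction between --error and --max-findings in the flag list can be modelled in a few lines. This is a sketch of the documented behavior only, not xgrep's implementation; the function name exit_code and its parameters are hypothetical, and treating every negative --max-findings value as "disabled" is an assumption (the flag list only documents -1).

```python
def exit_code(active_findings: int, error: bool = False, max_findings: int = -1) -> int:
    """Model the documented exit status (hypothetical helper, not xgrep code)."""
    if max_findings >= 0:
        # --max-findings takes precedence over --error: fail only above N.
        return 1 if active_findings > max_findings else 0
    if error:
        # --error: any active (non-suppressed) finding fails the run.
        return 1 if active_findings > 0 else 0
    # Default: exit 0 even with findings.
    return 0


# Three active findings under different flag combinations.
default_run = exit_code(3)
strict_run = exit_code(3, error=True)
rollout_run = exit_code(3, error=True, max_findings=5)
```

Under this model, a gradual rollout with --max-findings 5 lets a run with three findings pass even when --error is also set.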

Rule selection

The built-in corpus is filtered by category and tier before it runs:

  • --category (default security,secrets) picks the top-level rule sets: security, secrets, correctness, … Secrets scanning is on by default, so a committed credential is reported without any extra flag.
  • --subcategory / --exclude-subcategory narrow by exploitability tier (vuln vs audit, below).
  • --rule-id / --skip-rule (and the Semgrep-compatible --exclude-rule alias) include or exclude individual rule IDs.
  • --include-opt-in additionally runs rules marked metadata.opt-in: true — higher-noise or situational rules that are off by default. --skip-rule still excludes them.

To run your own rules alongside the built-ins, pass -f/--rules together with --with-builtin <categories>; a custom rule overrides a built-in rule with the same ID, and an explicit --category still filters the whole merged set:

xgrep scan -f rules/ --with-builtin security,secrets src/

Exploitability tier (--subcategory)

Every built-in security rule carries a metadata.subcategory tier:

  • vuln — exploitable / attacker-reachable impact: injection, eval/dynamic exec, deserialization, SSRF, path traversal, auth bypass, open redirect, hardcoded credentials/secret exposure, and insecure-TLS / disabled cert validation.
  • audit — hardening / best-practice / advisory with no direct exploit: missing security headers, cookie flags, weak hashing/ciphers, timing attacks on constants, info disclosure (stacktraces, cleartext logging), availability DoS (ReDoS), and regex/validation smells.

Filter on it to get an exploitable-only scan (high signal, fewer advisories):

xgrep scan --category security --subcategory vuln <target>

--exclude-subcategory audit is the inverse and yields the same set when every rule is tiered. Combine with --xgrepignore to also drop non-source trees — the recommended setup for scanning a focused executable surface (e.g. an AI-agent skill):

xgrep scan --category security --subcategory vuln --xgrepignore <target>

Both flags accept a comma-separated list and compose with --category, --severity, and --rule-id/--skip-rule.
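
To make the composition of --category, --subcategory, and --exclude-subcategory concrete, the filter can be sketched over a list of rule records. Everything here is illustrative: the rule IDs, the record layout, and the select function are hypothetical and say nothing about how xgrep stores or filters its rules internally.

```python
# Hypothetical rule records; real rules carry many more fields.
RULES = [
    {"id": "example-sql-injection", "category": "security", "subcategory": "vuln"},
    {"id": "example-missing-header", "category": "security", "subcategory": "audit"},
    {"id": "example-unused-result", "category": "correctness", "subcategory": None},
]


def select(rules, category="security,secrets", subcategory=None, exclude_subcategory=None):
    """Return the IDs that survive the category and tier filters."""
    categories = set(category.split(","))
    keep = set(subcategory.split(",")) if subcategory else None
    drop = set(exclude_subcategory.split(",")) if exclude_subcategory else set()
    selected = []
    for rule in rules:
        tier = rule.get("subcategory")
        if rule["category"] not in categories:
            continue
        if keep is not None and tier not in keep:
            continue
        if tier in drop:
            continue
        selected.append(rule["id"])
    return selected


vuln_only = select(RULES, category="security", subcategory="vuln")
no_audit = select(RULES, category="security", exclude_subcategory="audit")
```

Because every security rule in the sample is tiered, vuln_only and no_audit contain the same IDs, which mirrors the equivalence described above.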

Subcommands

scan              scan targets (default when -f is provided)
ci                CI-optimized diff-aware scan (auto-detects the CI environment)
inspect           code intelligence: search symbols, navigate definitions, assess impact
graph             build and query the code graph
mcp               run as an MCP server over stdio (for AI agents)
skill             list and install the Claude Code skills bundled in the binary
test <path>       run tests on rule files in a directory
validate <path>   validate rule files without scanning
lsp               start an LSP server over stdio
version           print version and exit

Scan targets

A scan target is more than a single file. xgrep accepts any of these as <targets...>, and you can pass several at once:

  • A file: xgrep scan app.py scans just that file.
  • A directory: xgrep scan . scans it recursively. By default xgrep scans the git-tracked files under the directory (respecting .gitignore), falling back to a filesystem walk when the target isn't a git repository. See File filtering for what is and isn't included.
  • Multiple targets: xgrep scan src/ lib/ config.yaml scans the union.
  • A remote git repository: xgrep scan github.com/expressjs/express clones and scans it, no manual checkout needed. See Scanning a remote repository below.
  • stdin: pipe a single source with --stdin, or a multi-file JSON manifest with --stdin-files. See Scanning from stdin below.

With no target and no stdin, xgrep prints a welcome screen instead of scanning.

Scanning from stdin

When the code to scan isn't on disk — an editor buffer, a generated snippet, a file pulled from an API — feed it on stdin instead of a path:

# A single source. --lang is required (there's no filename to detect from).
cat app.py | xgrep scan --stdin --lang python

# A set of files in one call, as a JSON manifest [{path, content}, ...].
# Cross-file taint is preserved across the set.
xgrep scan --stdin-files < files.json

  • --stdin reads one source from stdin and requires --lang to set the language. Findings are reported against a synthetic path.
  • --stdin-files reads a JSON array of {path, content} objects and scans them together, preserving the relative paths (so cross-file dataflow works).
  • stdin input is mutually exclusive with --history and --baseline-commit.
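
A manifest for --stdin-files can be produced from in-memory sources with a few lines of Python. This is a minimal sketch: the build_manifest helper and the sample file contents are hypothetical, and only the [{path, content}, ...] array shape comes from the description above.

```python
import json


def build_manifest(files: dict) -> str:
    """Serialize {relative_path: source_text} as a JSON array of {path, content}."""
    entries = [{"path": path, "content": content} for path, content in files.items()]
    return json.dumps(entries, indent=2)


# Two related files, so relative paths matter for cross-file analysis.
manifest = build_manifest({
    "app/db.py": "def run(query):\n    cursor.execute(query)\n",
    "app/views.py": "from app.db import run\n",
})
```

Writing manifest to files.json (or piping it directly) gives the input expected by xgrep scan --stdin-files.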

Scanning a remote repository

A scan target can be a remote git repository instead of a local path; xgrep clones it (shallowly, default branch) into a temp directory, scans it, and reports repo-relative paths. Cloning uses a built-in git client — no git binary required.

xgrep scan github.com/expressjs/express            # host/owner/repo shorthand
xgrep scan https://github.com/expressjs/express    # explicit https
xgrep scan git@github.com:expressjs/express.git    # SSH
xgrep scan github.com/expressjs/express --ref 4.18.2   # a tag, branch, or commit
xgrep scan github.com/expressjs/express --full-clone   # full history instead of shallow

  • A target is treated as remote only when it isn't an existing local path, so a local directory always scans in place.
  • --ref <branch|tag|commit> checks out a specific ref before scanning (default: the repo's default branch). A commit SHA implies a full clone.
  • --depth <n> sets the shallow clone depth (default 1); --full-clone clones full history. The two are mutually exclusive.
  • Private repositories use your usual git credentials: SSH targets via the ssh-agent; HTTPS targets via a token in the environment — GITHUB_TOKEN, GITLAB_TOKEN, or the generic XGREP_GIT_TOKEN.
  • Diff-aware scanning of a remote works: combine --baseline-commit with a remote target and xgrep clones it with full history automatically (a shallow clone has none to diff a range against), then diffs and reports only changed lines, repo-relative — the same behavior as a local diff-aware scan. Only a single remote target is supported in this mode.
    xgrep scan github.com/acme/app --baseline-commit v1.0.0..v1.1.0
  • Remote scanning needs outbound network access (governed by your environment's network policy in hosted/CI setups).

Suppressing findings (nosemgrep)

Add a nosemgrep (or nogrep) comment on the matched line or the line directly above it to suppress a finding. Scope it to specific rules with nosemgrep: <id>; a bare nosemgrep suppresses every rule on that line.

dangerous(user_input)  # nosemgrep
dangerous(user_input)  # nosemgrep: python-command-injection

Suppressed findings are retained, not deleted, and surface differently per output so CI behaves predictably:

Output                Suppressed finding
Console text          hidden
JSON                  included, extra.is_ignored: true
SARIF                 included with suppressions[].kind: "inSource" → GitHub shows it dismissed
GitLab SAST           omitted
Exit code / --error   not counted (a file with only suppressed findings exits 0)

Pass --disable-nosemgrep to ignore all suppression comments and report every finding as active — useful for auditing what suppressions are hiding.
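
When consuming the JSON output, suppressed findings can be separated from active ones using the extra.is_ignored field. The sketch below works on a plain list of finding objects; where that list sits inside the full report is not specified here, so extracting it is left to the caller, and split_suppressed is a hypothetical helper name.

```python
def split_suppressed(findings):
    """Partition findings into (active, suppressed) using extra.is_ignored."""
    active, suppressed = [], []
    for finding in findings:
        if finding.get("extra", {}).get("is_ignored", False):
            suppressed.append(finding)
        else:
            active.append(finding)
    return active, suppressed


# Hand-written sample findings in the shape described above.
sample = [
    {"check_id": "python-command-injection", "path": "app.py", "extra": {"is_ignored": True}},
    {"check_id": "python-command-injection", "path": "cli.py", "extra": {}},
]
active, suppressed = split_suppressed(sample)
```

Counting only the active list matches the exit-code behavior in the table: a report whose findings are all suppressed has nothing to fail on.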

Diff-aware scanning (--baseline-commit)

For pull-request / CI checks, --baseline-commit scopes the scan to the files changed since a baseline so xgrep only parses and runs rules over what changed, and only reports findings on changed lines. This matches Semgrep/Opengrep, which expose diff-aware scanning under the same flag.

xgrep --baseline-commit HEAD               # changes in the working tree vs HEAD
xgrep --baseline-commit origin/main        # changes since origin/main (use the merge-base in CI)
xgrep --baseline-commit origin/main..HEAD  # changes in a commit range

  • The spec is a single ref (diffed against the working tree) or a <base>..<head> / <base>...<head> range.
  • Only changed files are scanned; findings on unchanged lines of changed files are dropped. --error exits 1 only when a changed line has a finding.
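  • Content scanned is the working tree, and paths are repository-root relative, so run xgrep from the repo root (the standard CI setup). Because the scanned content comes from the working tree, the head of a range should be the checked-out commit (the working tree / HEAD); otherwise the reported lines may not match the diff.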
  • Because unchanged files are not scanned, interfile (cross-file) analysis sees only the changed files; omit --baseline-commit for a full-context scan.
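
The "changed lines only" rule can be pictured as a filter over findings. This is a simplified model, not xgrep's algorithm: it assumes each finding has a path and a start.line, takes the changed lines as a ready-made mapping (for example parsed from a git diff), and checks only the start line rather than the whole matched span.

```python
def on_changed_lines(findings, changed_lines):
    """Keep findings whose start line is among the changed lines of their file."""
    kept = []
    for finding in findings:
        lines = changed_lines.get(finding["path"], set())
        if finding["start"]["line"] in lines:
            kept.append(finding)
    return kept


# Hypothetical diff: lines 10-12 of app.py changed; util.py is untouched.
changed = {"app.py": {10, 11, 12}}
findings = [
    {"check_id": "rule-a", "path": "app.py", "start": {"line": 11}},   # changed line
    {"check_id": "rule-b", "path": "app.py", "start": {"line": 40}},   # unchanged line
    {"check_id": "rule-c", "path": "util.py", "start": {"line": 3}},   # unchanged file
]
reported = on_changed_lines(findings, changed)
```

Only rule-a survives, which is why --error in a diff-aware run fails the build only for findings on lines the change actually touched.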

Scanning git history for secrets (--history)

A secret that was committed and later deleted still lives in the repository's git history, so it is still compromised. A normal scan only sees the working tree, and even a diff-aware scan (--baseline-commit) misses an add-then-remove. --history walks the full commit history and scans the content each commit introduced, so it catches secrets that no longer exist in the current tree.

# Scan a repo's whole history for secrets (pass the repo path)
xgrep --history --category secrets .

# Bound the walk for speed on large repos
xgrep --history --category secrets --since 2024-01-01 .
xgrep --history --category secrets --max-commits 5000 .

Each finding carries commit provenance — who introduced the secret and when — so you can rotate the credential and purge the history. In JSON/SARIF output this appears as a commit object:

{
  "check_id": "aws-access-key-id",
  "path": "config/app.yaml",
  "commit": {
    "sha": "b719f81a0c…",
    "author": "Jane Dev",
    "email": "jane@example.com",
    "date": "2024-03-02T11:07:14Z"
  }
}

Notes:

  • Pair it with --category secrets. History scanning runs whatever rules are selected, and secrets are the use case it exists for. (The built-in default is security,secrets, so without --category secrets the code-vulnerability rules would run over history as well.)
  • Additions only. Each commit is compared to its first parent and only the lines it added are attributed to it, so a secret is reported at the commit that introduced it — with its true line number in that commit's file.
  • Reported once. The same secret touched by many commits is de-duplicated to its earliest introducing commit.
  • --since <date> accepts YYYY-MM-DD or an RFC3339 timestamp; --max-commits <n> caps how many commits are walked (0 = no limit). Both require --history.
  • Hermetic. It reads only the local .git object store — no network. To scan a remote repository's history, clone it with full history first (xgrep scan <url> --full-clone) or check it out locally, then run --history.
  • Mutually exclusive with --baseline-commit and stdin input.
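
The commit object makes it straightforward to turn a history scan into a rotation worklist. The sketch below groups findings by the author email of the introducing commit; by_author is a hypothetical helper, and it assumes findings are available as a list of objects shaped like the JSON sample above.

```python
from collections import defaultdict


def by_author(findings):
    """Group (check_id, path, sha) tuples by the introducing commit's author email."""
    grouped = defaultdict(list)
    for finding in findings:
        commit = finding.get("commit")
        if commit is None:
            continue  # not a history finding
        grouped[commit["email"]].append((finding["check_id"], finding["path"], commit["sha"]))
    return dict(grouped)


# The sample finding from above, with its abbreviated SHA kept as shown.
sample = [{
    "check_id": "aws-access-key-id",
    "path": "config/app.yaml",
    "commit": {
        "sha": "b719f81a0c…",
        "author": "Jane Dev",
        "email": "jane@example.com",
        "date": "2024-03-02T11:07:14Z",
    },
}]
worklist = by_author(sample)
```

Each entry names the rule, the file, and the commit to purge, which are the inputs needed to rotate the credential and rewrite history.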

Decoding encoded payloads (--decode)

Secrets are often committed one encoding layer deep — a base64-wrapped .env, a gzip'd Kubernetes secret, a token inside a percent-encoded URL. To a normal scan that outer blob is opaque, so the credential inside it is missed. --decode decodes encoded payloads and re-runs the secret/generic rules over the decoded content, so the hidden token is found and reported at the encoded span in the original file.

# Find secrets hidden inside base64 / hex / url / gzip payloads
xgrep --decode --category secrets .

# Combine with history to sweep encoded secrets across deleted code too
xgrep --history --decode --category secrets .

Each decoded finding records the decode chain in metadata.decoded-from, and its position points at the encoded blob you can see in the file:

{
  "check_id": "aws-access-key-id",
  "path": "config/app.yaml",
  "start": { "line": 12, "col": 14 },
  "extra": {
    "metadata": { "decoded-from": "base64 > gzip" }
  }
}

Notes:

  • Opt-in. Off by default; a scan without --decode is byte-for-byte unchanged. Decoders supported today: base64 (standard + URL-safe), hex, url (percent-encoding), and gzip/zlib (peeled when nested).
  • Bounded. Nesting depth, per-payload input/output size, and per-file budgets are all capped, and an over-sized inflation is abandoned — so a decompression bomb cannot blow up a scan.
  • Precise. Decoded bytes that are not printable text are dropped before matching, so decoding never manufactures false positives; it only adds findings on content that really decodes to a credential.
  • Hermetic. Decoding is a local, deterministic transform — no network, no temp files. Decoded content is scanned as text, never parsed back into the language/AST engine.
  • Pair it with --category secrets for the intended use case (the built-in default is security,secrets, which also runs the code-vulnerability rules).
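
To show what a "base64 > gzip" chain means in practice, the following sketch wraps a placeholder credential in gzip and then base64, and peels the layers back off with a size cap on inflation. It illustrates the idea only and is not xgrep's decoder: the function names, the 1 MiB cap, and the use of UTF-8 decodability as a stand-in for "printable text" are all assumptions.

```python
import base64
import gzip
import zlib
from typing import Optional, Tuple

MAX_OUTPUT = 1 << 20  # illustrative per-payload cap, not xgrep's real limit


def bounded_gunzip(data: bytes, limit: int = MAX_OUTPUT) -> Optional[bytes]:
    """Inflate gzip data, giving up once the output would exceed `limit`."""
    inflater = zlib.decompressobj(wbits=31)  # wbits=31 selects the gzip container
    out = inflater.decompress(data, limit + 1)
    return None if len(out) > limit else out


def peel(blob: str) -> Tuple[Optional[str], str]:
    """Peel base64, then gzip if present; return (text or None, decode chain)."""
    raw = base64.b64decode(blob, validate=True)
    chain = ["base64"]
    if raw[:2] == b"\x1f\x8b":  # gzip magic bytes
        inflated = bounded_gunzip(raw)
        if inflated is None:
            return None, " > ".join(chain)  # over-sized inflation is abandoned
        raw, chain = inflated, chain + ["gzip"]
    try:
        return raw.decode("utf-8"), " > ".join(chain)
    except UnicodeDecodeError:
        return None, " > ".join(chain)  # non-text payloads are dropped


# AKIAIOSFODNN7EXAMPLE is AWS's documentation placeholder key, not a real credential.
secret = "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE"
blob = base64.b64encode(gzip.compress(secret.encode())).decode()
text, chain = peel(blob)
```

The recovered text is what a secrets rule would match, while the finding itself would still be reported at the position of blob in the original file.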

Validating secrets (--validate)

Detection finds strings that look like credentials; --validate confirms whether one is actually live by probing its provider. It sets each finding's validation_state to one of:

  • confirmed — the provider accepted the secret; it is live (act now).
  • unconfirmed — the provider rejected it; revoked/invalid (lower urgency).
  • error — could not determine (network issue / unexpected response).
  • unvalidated — not probed (the default; no --validate, or the rule has no validation endpoint).

xgrep --validate --category secrets <target>

{ "check_id": "github-personal-access-token", "extra": { "validation_state": "confirmed" } }

This is opt-in and off by default — it is the one mode that makes outbound network calls, so the hermetic default is preserved unless you ask for it:

  • The candidate secret is sent only to that rule's fixed provider endpoint (built into the rule, not anything from the scanned content), and is never logged or written to disk — it is held in memory only for the probe.
  • Validation runs after detection; it never changes which findings are reported, only their validation_state. Probes are bounded by a small concurrency limit and short timeouts to stay gentle on provider APIs.
  • Only rules with a provider introspection endpoint are validated; the rest stay unvalidated. Validators currently ship for GitHub, GitLab, Slack, and Stripe tokens (more to follow).
  • Severity follows the result. A validatable secret is reported at its reduced unvalidated-severity until proven live; a confirmed finding is raised to the rule's full severity, while unconfirmed/error keep the reduced one. So --validate sharpens the signal — live keys rise to the top, dead ones stay low — rather than just adding a field.
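
A consumer of the JSON output can use validation_state to order a triage queue. The ordering below (confirmed, then error, then unvalidated, then unconfirmed) is one reasonable choice and an assumption of this sketch, as are the triage helper and the sample check IDs other than github-personal-access-token.

```python
# Lower number = look at it sooner. The order is an illustrative choice.
PRIORITY = {"confirmed": 0, "error": 1, "unvalidated": 2, "unconfirmed": 3}


def triage(findings):
    """Sort findings so live secrets come first; missing state counts as unvalidated."""
    def rank(finding):
        state = finding.get("extra", {}).get("validation_state", "unvalidated")
        return PRIORITY.get(state, len(PRIORITY))
    return sorted(findings, key=rank)  # sorted() is stable, so ties keep input order


sample = [
    {"check_id": "example-token-a", "extra": {"validation_state": "unconfirmed"}},
    {"check_id": "example-token-b", "extra": {}},
    {"check_id": "github-personal-access-token", "extra": {"validation_state": "confirmed"}},
]
queue = triage(sample)
```

The confirmed GitHub token moves to the front of queue, matching the intent that live keys rise to the top while rejected ones stay low.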
