Taint Analysis

Write source-to-sink dataflow rules with xgrep's taint mode.

The general authoring guide for this page is still to be written.

xgrep supports taint-mode rules (mode: taint) with pattern-sources, pattern-sinks, pattern-sanitizers, and pattern-propagators, including interprocedural and cross-file (interfile: true) tracking. The feature list is in Writing rules.
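As a rough illustration of what the four pattern kinds refer to, the Java sketch below marks a source, a propagator, a sanitizer, and a sink in ordinary code. Every name in it (readParam, sanitize, runCommand) is hypothetical and chosen for the example; which calls a real rule lists under pattern-sources, pattern-sinks, pattern-sanitizers, and pattern-propagators is up to the rule author.

```java
import java.util.Map;

class TaintShapesDemo {
    // Hypothetical source: returns attacker-controlled request data.
    static String readParam(Map<String, String> request, String name) {
        return request.getOrDefault(name, "");
    }

    // Hypothetical sanitizer: keeps only letters, digits, and underscores.
    static String sanitize(String raw) {
        return raw.replaceAll("[^A-Za-z0-9_]", "");
    }

    // Hypothetical sink: stands in for a command executor and only echoes its input.
    static String runCommand(String cmd) {
        return "exec: " + cmd;
    }

    // Source -> propagator -> sink, with no sanitizer on the path.
    static String unsafe(Map<String, String> request) {
        String name = readParam(request, "name");   // source
        StringBuilder sb = new StringBuilder("ls ");
        sb.append(name);                            // propagator: taint moves from name into sb
        return runCommand(sb.toString());           // sink
    }

    // Same flow, but the value passes through the sanitizer before the sink.
    static String safe(Map<String, String> request) {
        String name = sanitize(readParam(request, "name"));
        return runCommand("ls " + name);
    }

    public static void main(String[] args) {
        Map<String, String> request = Map.of("name", "a; rm -rf /");
        System.out.println(unsafe(request));
        System.out.println(safe(request));
    }
}
```

A taint rule written against this code would be expected to report unsafe and stay quiet on safe, assuming sanitize is listed as a sanitizer and StringBuilder.append as a propagator.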

Flow finalization (metadata.flow)

Rules can refine their findings per flow instead of per matched span, where a flow is a source→sink path reconstructed by the ADR-0229 dataflow engine. A rule opts in under metadata.flow. Its keys are strictly validated: an unknown key or guard name is a load-time error, so a typo can never silently disable the analysis. The engine-driven pass lives in pkg/dataflowscan, the bridge package that imports both pkg/core and pkg/dataflow, so pkg/core stays free of a pkg/dataflow import.

Guards and filters

metadata:
  flow:
    guards: [allowlist]
    filter: flow.completeGuards.contains("allowlist") == false

For every finding, xgrep reconstructs the source→sink flow, joins the if conditions that dominate the sink to the variables actually on that flow's path, and asks each configured guard provider to annotate the conditions it recognises with a verdict. The rule's filter, an MQL expression over the flow resource (sourceLine, sinkLine, sinkVar, steps, guardKinds, completeGuards, incompleteGuards), then decides whether the flow is a finding. The filter above reads as: keep the flow as a finding unless a complete allowlist guard covers it.

The allowlist provider (Java) recognises if (X.contains(tainted)) and reports whether X is a complete (constant-only, non-escaping) allowlist. It shares its classification with the allowlist-analyzer incomplete-sink alerts.
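The two guard verdicts can be pictured with a small Java sketch. It is only an illustration of the shapes described above: buildQuery is a hypothetical sink, and how the provider classifies any particular program is decided by the engine, not by this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class AllowlistGuardDemo {
    // Constant-only and never handed to other code: the shape described as a complete allowlist.
    private static final Set<String> ALLOWED_TABLES = Set.of("users", "orders");

    // Hypothetical sink: concatenates its argument into a query string.
    static String buildQuery(String table) {
        return "SELECT * FROM " + table;
    }

    // The if condition dominates the sink and tests the variable on the flow's path.
    static String guarded(String table) {
        if (ALLOWED_TABLES.contains(table)) {
            return buildQuery(table);
        }
        return "rejected";
    }

    // Same if shape, but the list also receives a non-constant entry,
    // so it is not constant-only and the guard would be an incomplete one.
    static String guardedByMutableList(String table, String extra) {
        List<String> allowed = new ArrayList<>(List.of("users", "orders"));
        allowed.add(extra);
        if (allowed.contains(table)) {
            return buildQuery(table);
        }
        return "rejected";
    }

    public static void main(String[] args) {
        System.out.println(guarded("users"));
        System.out.println(guarded("users; DROP TABLE users"));
        System.out.println(guardedByMutableList("evil", "evil"));
    }
}
```

Under a filter like flow.completeGuards.contains("allowlist") == false, the flow in guarded would be expected to drop out, while the flow in guardedByMutableList would stay a finding because its guard does not count as complete.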

Reporting one finding per flow (report-at)

When a check matches both a weak primitive's construction (MD5.new()) and its consumption (hasher.update(data)), report-at folds the connected pair into one finding:

metadata:
  flow:
    report-at: sink # or: source

With sink, the finding stays on the consumption; with source, it stays on the construction or binding. Folding works through intermediate bindings and respects kill boundaries: a use after reassignment is not the same flow. A hit that no flow connects to another hit keeps its own finding, so a construction whose consumption is out of view still reports.
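The cases above can be sketched in Java, using java.security.MessageDigest as a stand-in for the MD5.new() / hasher.update(data) pair. The comments describe how the folding rules would be expected to apply; they are a reading of the text, not output from xgrep.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ReportAtDemo {
    // Returns the lengths of the weak and the strong digest, in that order.
    static int[] digestLengths(byte[] data) {
        try {
            MessageDigest hasher = MessageDigest.getInstance("MD5"); // construction (source end)
            MessageDigest alias = hasher;                            // intermediate binding
            alias.update(data);                                      // consumption (sink end), same flow
            byte[] weak = alias.digest();

            hasher = MessageDigest.getInstance("SHA-256");           // reassignment: kill boundary
            hasher.update(data);                                     // not part of the MD5 flow
            byte[] strong = hasher.digest();

            return new int[] {weak.length, strong.length};
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Construction whose consumption happens elsewhere: no flow connects it
    // within this method, so the construction hit would keep its own finding.
    static MessageDigest newWeakHasher() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] lengths = digestLengths("hello".getBytes());
        System.out.println(lengths[0] + " " + lengths[1]);
        System.out.println(newWeakHasher().getAlgorithm());
    }
}
```

With report-at: sink, the single finding for the first pair would sit on alias.update(data); with report-at: source, on the getInstance("MD5") line.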

Cross-rule finding identity (group)

A structural creation rule and a taint usage rule can share a group:

metadata:
  flow:
    group: go-weak-cipher

Within a group, a hit that a sibling rule's reported flow already covers folds into the sibling's finding at the flow's sink end. Coverage is decided from the sibling's dataflow trace, which records where its flow entered. Where the sibling does not fire, the hit keeps its backstop role and reports on its own.

Safety

  • Per-variable precision. A guard only joins flows whose path contains the variable the condition tests; a second tainted variable sunk in the same guarded block keeps its finding.
  • Fail-open. A flow the engine cannot reconstruct, an unrecognised guard, or a filter error never suppresses a finding — suppression requires positive evidence. The CI gate TestAllFlowFiltersCompileAndExecute compiles and executes every shipped filter so a malformed one fails at author time rather than silently keeping every finding.
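Per-variable precision is easiest to see with two tainted variables in one guarded block. The Java sketch below is illustrative only; query is a hypothetical sink and the comments state the expected outcome under the rule described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class PerVariableGuardDemo {
    private static final Set<String> ALLOWED_TABLES = Set.of("users", "orders");

    // Hypothetical sink: echoes the fragment it was given.
    static String query(String fragment) {
        return "Q:" + fragment;
    }

    // Both parameters are treated as tainted; only table is tested by the guard.
    static List<String> handle(String table, String column) {
        List<String> results = new ArrayList<>();
        if (ALLOWED_TABLES.contains(table)) {
            results.add(query(table));  // guard joins this flow: the condition tests table
            results.add(query(column)); // guard does not join: column is not in the condition
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(handle("users", "name; DROP TABLE users"));
    }
}
```

Here the flow ending in query(table) can be suppressed by a complete allowlist guard, while the flow ending in query(column) keeps its finding even though both sinks sit inside the same if block.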

See docs/adr/0229-flow-centric-findings-architecture.md for the architecture and roadmap.
