Taint Analysis
Write source-to-sink dataflow rules with xgrep's taint mode.
The general authoring guide for this page is still to be written.
xgrep supports taint-mode rules (mode: taint) with pattern-sources,
pattern-sinks, pattern-sanitizers, and pattern-propagators, including
interprocedural and cross-file (interfile: true) tracking. The feature list is
in Writing rules.
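As a rough illustration of how the four pattern lists fit together, a taint-mode rule might look like the sketch below. It assumes xgrep accepts a Semgrep-style rule layout (a top-level rules list with id, languages, severity, and message), that interfile sits under options, and that propagators take from/to keys. The rule id, message, and every pattern are hypothetical placeholders, not shipped xgrep content.

```yaml
rules:
  - id: example-java-sql-injection     # hypothetical rule id
    mode: taint
    languages: [java]
    severity: ERROR                    # assumed severity vocabulary
    message: Request parameter reaches a SQL query without sanitization
    options:
      interfile: true                  # assumed placement of the cross-file switch
    pattern-sources:
      - pattern: $REQ.getParameter(...)      # where taint enters
    pattern-propagators:
      - pattern: $SB.append($X)              # taint on $X spreads to $SB
        from: $X
        to: $SB
    pattern-sanitizers:
      - pattern: Integer.parseInt(...)       # value is no longer attacker-shaped
    pattern-sinks:
      - pattern: $STMT.executeQuery(...)     # where tainted data must not arrive
```

Only mode: taint, the four pattern-* keys, and interfile: true come from this page; consult Writing rules for the authoritative schema.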
Flow finalization (metadata.flow)
Rules can refine their findings per flow — over the source→sink paths the
ADR-0229 dataflow engine reconstructs — instead of per matched span. A rule opts
in under metadata.flow, whose keys are strictly validated: an unknown key
or guard name is a load-time error, so a typo can never silently disable the
analysis. The engine-driven pass lives in pkg/dataflowscan (the bridge that
imports both pkg/core and pkg/dataflow); core stays free of a pkg/dataflow
import.
Guards and filters
metadata:
  flow:
    guards: [allowlist]
    filter: flow.completeGuards.contains("allowlist") == false

For every finding, xgrep reconstructs the source→sink flow, joins the if
conditions dominating the sink to the variables actually on that flow's path,
and asks each configured guard provider to annotate the conditions it
recognises with a verdict. The rule's filter — an MQL expression over the
flow resource (sourceLine, sinkLine, sinkVar, steps, guardKinds,
completeGuards, incompleteGuards) — then decides whether the flow is a
finding. The allowlist provider (Java) recognises if (X.contains(tainted)) and
reports whether X is a complete (constant-only, non-escaping) allowlist,
sharing its classification with the allowlist-analyzer incomplete-sink alerts.
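To show where the metadata.flow block sits in a whole rule, here is a minimal sketch. It reuses the guards and filter values from this section unchanged. The surrounding rule (id, message, severity, patterns) and the Java shape in the comments are hypothetical, and the rule layout is assumed to be Semgrep-style.

```yaml
# Hypothetical Java under analysis:
#   if (ALLOWED_HOSTS.contains(host)) {   // condition dominating the sink
#       client.connect(host);             // sink; host is on the flow's path
#   }
rules:
  - id: example-java-untrusted-host     # hypothetical rule id
    mode: taint
    languages: [java]
    severity: WARNING                   # assumed severity vocabulary
    message: Untrusted host reaches an outbound connection
    metadata:
      flow:
        guards: [allowlist]             # ask the allowlist provider to annotate conditions
        # Keep the flow as a finding unless a complete allowlist guards it.
        filter: flow.completeGuards.contains("allowlist") == false
    pattern-sources:
      - pattern: $REQ.getParameter(...)
    pattern-sinks:
      - pattern: $CLIENT.connect(...)
```

Under this reading, if ALLOWED_HOSTS is judged a complete allowlist the filter evaluates to false and the flow is dropped; if it is judged incomplete, or no guard is recognised, the filter stays true and the finding is kept.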
Reporting one finding per flow (report-at)
A check that matches both a weak primitive's construction (MD5.new()) and
its consumption (hasher.update(data)) folds the connected pair into one
finding:
metadata:
  flow:
    report-at: sink # or: source

sink keeps the consumption, source the construction/binding. This works
through intermediate bindings and respects kill boundaries (a use after
reassignment is not the same flow). A hit that no flow connects keeps its own
finding, so a construction whose consumption is out of view still reports.
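A sketch of a check that could produce such a connected pair is shown below. It is an assumption that a plain pattern-either rule is how such a check would be written; the rule id, message, and patterns are hypothetical, and the second pattern is deliberately loose for brevity.

```yaml
# Hypothetical Python under analysis:
#   h = MD5.new()        # hit 1: construction (source end of the flow)
#   h.update(data)       # hit 2: consumption (sink end of the flow)
rules:
  - id: example-python-weak-hash        # hypothetical rule id
    languages: [python]
    severity: WARNING                   # assumed severity vocabulary
    message: MD5 is a weak hash primitive
    metadata:
      flow:
        report-at: sink                 # report the pair once, at the consumption
    pattern-either:
      - pattern: MD5.new(...)           # construction
      - pattern: $HASHER.update(...)    # consumption; a real rule would constrain $HASHER
```

With report-at: source the same pair would instead be reported at the MD5.new() line. If h.update(data) were missing from the scanned code, the construction hit would still report on its own.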
Cross-rule finding identity (group)
A structural creation rule and a taint usage rule can share a group:
metadata:
  flow:
    group: go-weak-cipher

Within a group, a hit covered by a sibling rule's reported flow (the sibling's dataflow trace records where its flow entered) folds into the sibling's finding at the flow's sink end; where the sibling does not fire, the hit keeps its backstop role.
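The pairing of a structural rule and a taint rule might be sketched as follows. Only the group key and its go-weak-cipher value come from this page; both rule ids, the messages, and the patterns are hypothetical, and the rule layout is assumed to be Semgrep-style.

```yaml
rules:
  # Structural creation rule: fires wherever a DES block is built.
  - id: example-go-weak-cipher-creation   # hypothetical rule id
    languages: [go]
    severity: WARNING                     # assumed severity vocabulary
    message: DES is a weak cipher
    metadata:
      flow:
        group: go-weak-cipher
    pattern: des.NewCipher(...)

  # Taint usage rule: fires when that block reaches an encrypter.
  - id: example-go-weak-cipher-usage      # hypothetical rule id
    mode: taint
    languages: [go]
    severity: WARNING
    message: DES block is used for encryption
    metadata:
      flow:
        group: go-weak-cipher
    pattern-sources:
      - pattern: des.NewCipher(...)
    pattern-sinks:
      - pattern: cipher.NewCBCEncrypter(...)
```

Where the usage rule reports a flow that starts at a des.NewCipher call, the creation rule's hit on that call would fold into the usage finding at the sink. Where the block never reaches an encrypter in view, the creation rule would still report as the backstop.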
Safety
- Per-variable precision. A guard only joins flows whose path contains the variable the condition tests; a second tainted variable sunk in the same guarded block keeps its finding.
- Fail-open. A flow the engine cannot reconstruct, an unrecognised guard, or a filter error never suppresses a finding: suppression requires positive evidence. The CI gate TestAllFlowFiltersCompileAndExecute compiles and executes every shipped filter, so a malformed one fails at author time rather than silently keeping every finding.
See docs/adr/0229-flow-centric-findings-architecture.md
for the architecture and roadmap.