Fuzzing Basics

The term fuzzing, coined in 1989 at the University of Wisconsin in Madison, refers to two related concepts:

  • To fuzz (a file, network stream, or other data) is to deliberately corrupt data that a software program is expected to parse or otherwise process.
  • Fuzz testing, or fuzzing, is automated, repetitive negative testing of software via input generation or mutation.

In both cases, the end goal is to trigger hangs, exceptions, or crashes in the target application. Identifying and repairing the root cause of these issues yields software that is more reliable and more resilient to attack.
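A minimal sketch of the core fuzz loop, assuming an in-process Python target: repeatedly corrupt a known input, run the target on it, and record every input that causes an unexpected exception. The target `parse_record`, its `LEN:<n>;` format, and the `ParseError` class are hypothetical and exist only for illustration. Catching hangs and native crashes would additionally require running the target in a separate process with a timeout, which this sketch does not do.

```python
import random


class ParseError(Exception):
    """Raised by the (hypothetical) target when it cleanly rejects malformed input."""


def parse_record(data: bytes) -> bytes:
    """Hypothetical target: expects b"LEN:<n>;" followed by a payload, returns n payload bytes."""
    header, sep, rest = data.partition(b";")
    if not sep or not header.startswith(b"LEN:"):
        raise ParseError("malformed header")
    n = int(header[4:])  # latent bug: a non-numeric length raises an unhandled ValueError
    return rest[:n]


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Overwrite 1 to 4 random bytes of the seed with random values."""
    buf = bytearray(seed)
    for _ in range(rng.randint(1, 4)):
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(target, seed: bytes, iterations: int = 10_000, rng_seed: int = 0) -> list:
    """Feed mutated inputs to target and collect those that raise unexpected exceptions."""
    rng = random.Random(rng_seed)  # fixed seed so any finding can be reproduced
    failures = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            target(candidate)
        except ParseError:
            pass  # a clean rejection is correct behavior, not a bug
        except Exception as exc:  # anything else is a finding to triage
            failures.append((candidate, repr(exc)))
    return failures


if __name__ == "__main__":
    found = fuzz(parse_record, b"LEN:5;hello")
    print(f"{len(found)} failing inputs")
    if found:
        print("example:", found[0])
```

The split between `ParseError` and every other exception is the important design choice: rejecting bad input is the target doing its job, while an unhandled `ValueError` points at a root cause worth fixing.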

Fuzzing Taxonomy

All fuzzers operate by channeling malformed or corrupted data to an application or service entry point. Beyond that, they come in a wide variety of flavors that can be loosely categorized along three axes: knowledge of the input format, knowledge of the target application's structure, and the scheme used to produce new inputs.

Knowledge of input format

  • Dumb fuzzing -- input data is corrupted randomly without awareness of expected format.

  • Smart fuzzing -- input data is corrupted with awareness of the expected format, such as encodings (for example, Base64) and relations between fields (offsets, lengths, checksums, and so on), so that corrupted inputs can still pass the target's early validation.
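The difference between dumb and smart fuzzing can be sketched with a hypothetical binary format invented for illustration: a 4-byte big-endian payload length, the payload, then a 4-byte CRC32 of the payload. The functions `pack`, `passes_integrity_checks`, `dumb_mutate`, `smart_mutate`, and `compare` are made-up names, and `passes_integrity_checks` merely stands in for the early validation a real parser would perform.

```python
import random
import struct
import zlib


def pack(payload: bytes) -> bytes:
    """Hypothetical format: [u32 length][payload][u32 CRC32 of payload]."""
    return struct.pack(">I", len(payload)) + payload + struct.pack(">I", zlib.crc32(payload))


def passes_integrity_checks(data: bytes) -> bool:
    """Mimic a parser's early checks; inputs that fail never reach deeper code."""
    if len(data) < 8:
        return False
    (length,) = struct.unpack(">I", data[:4])
    if len(data) != 8 + length:
        return False
    (crc,) = struct.unpack(">I", data[4 + length:])
    return crc == zlib.crc32(data[4:4 + length])


def dumb_mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite random bytes anywhere, with no knowledge of the format."""
    buf = bytearray(data)
    for _ in range(rng.randint(1, 4)):
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def smart_mutate(data: bytes, rng: random.Random) -> bytes:
    """Corrupt only the payload, then recompute the length and checksum fields."""
    (length,) = struct.unpack(">I", data[:4])
    payload = bytearray(data[4:4 + length])
    for _ in range(rng.randint(1, 4)):
        payload[rng.randrange(len(payload))] = rng.randrange(256)
    if rng.random() < 0.5:  # sometimes grow the payload as well
        payload += bytes(rng.randrange(256) for _ in range(rng.randint(1, 8)))
    return pack(bytes(payload))


def compare(trials: int = 1000, rng_seed: int = 1) -> tuple[int, int]:
    """Count how many dumb and smart mutants survive the integrity checks."""
    rng = random.Random(rng_seed)
    seed = pack(b"hello, fuzzer")
    dumb_ok = sum(passes_integrity_checks(dumb_mutate(seed, rng)) for _ in range(trials))
    smart_ok = sum(passes_integrity_checks(smart_mutate(seed, rng)) for _ in range(trials))
    return dumb_ok, smart_ok


if __name__ == "__main__":
    dumb_ok, smart_ok = compare()
    print(f"dumb: {dumb_ok}/1000 pass checks, smart: {smart_ok}/1000 pass checks")
```

Nearly every dumb mutant is rejected by the length or checksum check, so the code behind those checks is rarely exercised; every smart mutant passes them by construction, at the cost of having to encode the format's rules in the fuzzer.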

Knowledge of target application structure

  • Black-box fuzzing -- corrupted data is channeled to the target without any visibility into which code branches were or were not traversed.

  • White-box fuzzing -- corrupted data is channeled to the target with visibility into which code branches were traversed; this coverage information can be used to guide the corruption of later inputs.
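A minimal sketch of coverage feedback, assuming a pure-Python target whose executed lines can be recorded with the standard `sys.settrace` hook. Inputs that reach lines not seen before are kept in a corpus and mutated further. This is the core idea behind coverage-guided fuzzers such as AFL and libFuzzer, greatly simplified. The target `target`, its `FUZZ` magic value, and the helpers `run_with_coverage` and `coverage_guided_fuzz` are hypothetical.

```python
import random
import sys


def target(data: bytes) -> None:
    """Hypothetical target: a bug guarded by a four-byte magic value."""
    if len(data) >= 4:
        if data[0] == ord("F"):
            if data[1] == ord("U"):
                if data[2] == ord("Z"):
                    if data[3] == ord("Z"):
                        raise RuntimeError("deep bug reached")


def run_with_coverage(func, data: bytes):
    """Run func(data); return (line numbers executed inside func, exception or None)."""
    covered = set()
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # only trace the target function itself
        if event == "line":
            covered.add(frame.f_lineno)
        return tracer

    error = None
    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(data)
    except Exception as exc:
        error = exc
    finally:
        sys.settrace(previous)  # restore any tracer that was active before
    return covered, error


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Replace one random byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def coverage_guided_fuzz(func, seed: bytes, iterations: int = 200_000, rng_seed: int = 0):
    """Mutate corpus entries, keeping any input that reaches lines not seen before."""
    rng = random.Random(rng_seed)
    corpus = [seed]
    seen, _ = run_with_coverage(func, seed)
    for i in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        covered, error = run_with_coverage(func, candidate)
        if error is not None:
            return candidate, i  # crashing input and the iteration it was found at
        if not covered <= seen:  # new coverage: keep it as a stepping stone
            seen |= covered
            corpus.append(candidate)
    return None, iterations


if __name__ == "__main__":
    crash, found_at = coverage_guided_fuzz(target, b"AAAA")
    print(f"crashing input {crash!r} found after {found_at} iterations")
```

Without the feedback step, every candidate would be a single-byte mutation of the original `b"AAAA"` and could never match all four magic bytes. With feedback, each correct byte is kept in the corpus and built upon.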

Input production scheme

  • Generation -- each iteration's input is produced independently of any previous input, typically from a model of the input format.

  • Mutation -- known-valid input data is modified according to certain patterns (for example, flipping bits, replacing or deleting bytes, or inserting boundary values).
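The two input production schemes can be contrasted with a small sketch. Generation expands a model of the input format, here a hypothetical toy grammar for arithmetic expressions, so each input is built from scratch. Mutation applies a random edit to a known-valid seed. The names `GRAMMAR`, `generate`, and `mutate`, and the seed `"(1+2)*3"`, are illustrative assumptions, not part of any particular tool.

```python
import random

# Hypothetical input model: a tiny grammar for arithmetic expressions.
# The last alternative of each rule moves toward a terminal, so expansion can be cut off.
GRAMMAR = {
    "<expr>": [["<term>", "+", "<expr>"], ["<term>", "-", "<expr>"], ["<term>"]],
    "<term>": [["<factor>", "*", "<term>"], ["<factor>", "/", "<term>"], ["<factor>"]],
    "<factor>": [["(", "<expr>", ")"], ["<number>"]],
    "<number>": [["0"], ["1"], ["7"], ["42"], ["65535"]],
}


def generate(rng: random.Random, symbol: str = "<expr>", depth: int = 0, max_depth: int = 6) -> str:
    """Expand `symbol` by picking random grammar alternatives (generation-based fuzzing)."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: emit as-is
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        alternatives = alternatives[-1:]  # force the non-recursive path to bound size
    chosen = rng.choice(alternatives)
    return "".join(generate(rng, part, depth + 1, max_depth) for part in chosen)


def mutate(seed: str, rng: random.Random) -> str:
    """Apply one random edit to a known-valid seed (mutation-based fuzzing)."""
    chars = list(seed)
    pos = rng.randrange(len(chars))
    op = rng.choice(("replace", "insert", "delete"))
    if op == "replace":
        chars[pos] = chr(rng.randrange(32, 127))  # any printable ASCII character
    elif op == "insert":
        chars.insert(pos, rng.choice("()+-*/0123456789"))
    else:
        del chars[pos]
    return "".join(chars)


if __name__ == "__main__":
    rng = random.Random(7)
    print("generated:", [generate(rng) for _ in range(3)])
    print("mutated:  ", [mutate("(1+2)*3", rng) for _ in range(3)])
```

Generated inputs are always syntactically valid, so they probe the evaluator's semantics (for example, division by zero), whereas mutated inputs usually stay close to the seed and often probe the parser's error handling.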

External Resources

Following are external references for this topic: