Final Year Project: Finding Bugs in Verilog Tools
November 24, 2025
VeriGen: a novel AST-based fuzzer to stress-test Verilog tools on under‑explored features
Based on a Thesis submitted in fulfillment of requirements for the degree of Master of Engineering in Electronic and Information Engineering
This article introduces Verilog fuzzing, explains my approach, gives a brief demo, and presents results.
Motivation: Why Fuzz Verilog?
Verilog is a Hardware Description Language (HDL) used to describe the structure and behaviour of digital circuits for ASICs, FPGAs, CPUs and more. It was standardised as IEEE‑1364 (1995) and later subsumed into SystemVerilog (IEEE‑1800, 2005), which added assertions, interfaces and object‑oriented features suitable for both design and verification.
Designs typically flow through two classes of Verilog‑consuming tools:
- Simulators — test circuit behaviour without physically building the circuit.
- Synthesis tools — translate HDL into a low‑level gate‑level netlist and map to real hardware components.
These tools are widely trusted, yet they’re complex, often closed‑source, and effectively treated as black boxes. That combination means subtle bugs can slip through.
Figure 1: Design flow of a synthesis tool
What is Hardware Fuzzing?
Fuzzing is the automated generation of unexpected or “weird” inputs to stress systems and uncover bugs, crashes, or inconsistencies. In a hardware toolchain context, that means generating random (but valid) Verilog designs to probe simulators and synthesisers beyond standard workflows. Even well‑tested tools can fail under unusual, edge‑case inputs; fuzzing helps reveal those weaknesses early.
So, Why Would One Want to Fuzz Verilog Tools?
Bugs in HDL tools can be very costly:
- Silent functional errors in hardware designs.
- ASIC spins or FPGAs that pass simulation but fail in the real world.
- Divergent behaviour across vendors and from the language standard (recent work has shown inconsistent support across front‑ends).
Fuzzing lets us push these tools into corners where such misbehaviours surface.
Prior Work: Progress and Gaps
Two notable fuzzers illustrate the landscape:
- VeriSmith: generates deterministic Verilog and uses equivalence checking to find synthesis bugs. It avoids undefined behaviour but does not support the generate construct or module hierarchy. It has previously found 11 bugs across Yosys, Vivado and Icarus.
- VlogHammer: uses non‑deterministic designs for differential testing, but lacks support for behavioural Verilog and multi‑module structures.
Historically, fuzzers have avoided two features that are common in real designs yet tricky for tools: the generate construct and hierarchical naming. This project targets exactly those.
| | VeriSmith | VlogHammer | VeriGen |
|---|---|---|---|
| Deterministic design generation | Yes | No | Yes |
| Employs equivalence checking | Yes: formal equivalence checking | No: differential testing | Yes: computes each design's expected value |
| Implements the `generate` construct and hierarchical naming | No | No | Yes |
Table 1: feature comparison between Verilog fuzzers
Target 1: The Generate Construct
Introduced in IEEE‑1364‑2001, generate comes in three forms that mirror software control flow: `if`, `for`, and `case` (including `casez` and `casex`).
Synthesis tools resolve these during elaboration, enabling scalable structural replication (e.g., a ripple‑carry adder built from repeated module instances connected in sequence).
```verilog
module ripple_carry_adder #(parameter SIZE = 4) (
    input  [SIZE-1:0] a,
    input  [SIZE-1:0] b,
    input             ci,
    output [SIZE-1:0] sum,
    output            co
);
    wire [SIZE:0] carry;
    assign carry[0] = ci;

    genvar i;
    generate
        for (i = 0; i < SIZE; i = i + 1) begin: adder_array
            full_adder fa (
                .sum(sum[i]),
                .carry_out(carry[i+1]),
                .a(a[i]),
                .b(b[i]),
                .carry_in(carry[i])
            );
        end
    endgenerate

    assign co = carry[SIZE];
endmodule
```
Figure 2: A ripple-carry adder constructed using the for generate block
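To make the effect of elaboration concrete, here is a small Python sketch (my own model, not tool output) of what the `for` generate block above unrolls into: `SIZE` chained one-bit full adders, whose combined result matches ordinary integer addition.

```python
def ripple_carry_adder(a: int, b: int, ci: int, size: int = 4):
    """Python model of the elaborated Figure 2 design:
    `size` one-bit full adders chained through the carry wire."""
    carry = ci
    total = 0
    for i in range(size):                        # one full_adder instance per iteration
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s = ai ^ bi ^ carry                      # sum bit of this stage
        carry = (ai & bi) | (carry & (ai ^ bi))  # carry out to the next stage
        total |= s << i
    return total, carry                          # (sum, co)

# 11 + 6 + 1 = 18 → low 4 bits are 2, carry out is 1
assert ripple_carry_adder(11, 6, 1) == (2, 1)
```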
Target 2: Hierarchical Naming
Formalised in IEEE‑1364‑1995 (though non‑standard implementations existed earlier), hierarchical names reference signals or modules via scoped paths, e.g. referencing top.top_c1.out from inside top without an explicit wire. This is common in simulation for debugging/monitoring and design reuse. Synthesis support is limited; notably, cross‑module references (XMRs) are broadly disallowed.
This fuzzer tests naming across nested modules, relative references, $root‑prefixed paths, and defparam‑style overrides.
[DIAGRAM PLACEHOLDER: Module tree with absolute/relative hierarchical references and optional $root/defparam]
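To illustrate the mechanism, here is a toy Python sketch of how a simulator might resolve an absolute hierarchical path against an elaborated instance tree. The instance names and the dictionary representation are illustrative, not the fuzzer's actual data model.

```python
# Toy elaborated instance tree: module instances map to child
# instances and, at the leaves, to current signal values.
design = {
    "top": {
        "top_c1": {"out": 1},   # signal `out` inside instance top_c1
        "top_c2": {"out": 0},
    }
}

def resolve(path: str, tree: dict):
    """Walk a dotted hierarchical name (e.g. 'top.top_c1.out')
    down the instance tree, one scope per path segment."""
    node = tree
    for part in path.split("."):
        node = node[part]
    return node

assert resolve("top.top_c1.out", design) == 1
```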
Aims and Requirements
Functional requirements:
- Deterministic (seeded) Verilog code generation, with a fresh randomised test each iteration.
- Tool invocation and robust output handling.
- Discrepancy detection (expected vs observed).
- Logging and reproducibility/determinism (seeded runs).
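As a sketch of the discrepancy-detection requirement, the check might look like the following. The `RESULT = …` log format and the function name are hypothetical, purely to illustrate comparing an observed value against the precomputed expected value.

```python
import re

def check_run(log_text: str, expected: int) -> bool:
    """Scan a tool's output log for the design's printed result
    and compare it against the precomputed expected value."""
    m = re.search(r"RESULT\s*=\s*(\d+)", log_text)
    if m is None:
        return False          # crash or malformed output also counts as a failure
    return int(m.group(1)) == expected

assert check_run("RESULT = 42", 42)
```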
Non‑functional requirements:
- Performance (≲ 1 min/iteration for simulation; synthesis may take longer, e.g. ~7 minutes in Vivado).
- Extensibility and maintainability.
- Portability (Windows and Linux).
Architecture at a Glance
The fuzzer is AST‑based: designs are constructed as Abstract Syntax Trees representing Verilog structures, enabling fine‑grained control over generation and mutation.
Figure 3: Overview of the fuzzer architecture
Key Design Choices
- AST‑based generation for structural control and flexibility.
- Seeded randomness for reproducible, debuggable fuzz cases.
- No external stimulus needed: each generated design computes a known constant output, sidestepping testbench input generation.
- This uses expected‑value propagation so every design encodes a “golden” result for automated checking.
- Tool‑agnostic CLI wrapper to support multiple back‑ends (e.g. Vivado, Quartus, Icarus).
- Expression operators limited to `ADD` and `XOR` to reduce error masking (e.g., a bad sum cancelled by a `SUB`).
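The expected-value propagation mentioned above can be sketched as a constant fold over the expression tree, under the assumption (mine, for illustration) of a fixed bit width and the restricted `ADD`/`XOR` operator set:

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Const:
    value: int

@dataclass
class BinOp:
    op: str          # "ADD" or "XOR" only
    lhs: "Expr"
    rhs: "Expr"

Expr = Union[Const, BinOp]

def expected_value(node: Expr, width: int = 8) -> int:
    """Fold the tree to the constant the generated design must output,
    truncating to `width` bits as Verilog arithmetic would."""
    mask = (1 << width) - 1
    if isinstance(node, Const):
        return node.value & mask
    l = expected_value(node.lhs, width)
    r = expected_value(node.rhs, width)
    return (l + r) & mask if node.op == "ADD" else (l ^ r) & mask

# (200 + 100) mod 256 = 44, then 44 ^ 5 = 41: the "golden" result
golden = expected_value(BinOp("XOR", BinOp("ADD", Const(200), Const(100)), Const(5)))
assert golden == 41
```

Every design thus carries its own oracle: a tool whose simulation or synthesis output disagrees with the folded constant has, by construction, mishandled the design.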
Design Generation
Two complementary generators produce the HDL designs:
- Generate‑focused: builds nested `generate` blocks with tunable loop parameters (start values, bounds) per seed.
- Hierarchy‑focused: creates multi‑level module trees with options such as `$root` prefixing and `defparam` overrides. Leaf modules can themselves include `generate` constructs.
All designs are produced from the AST and serialised for analysis. Seeding ensures deterministic regeneration of any failing case.
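The seeding guarantee can be sketched in a few lines: passing the same seed to an isolated random-number generator reproduces the same design text bit-for-bit (function and parameter names here are illustrative, not the fuzzer's actual interface).

```python
import random

def gen_expr(rng: random.Random, depth: int) -> str:
    """Recursively generate a small ADD/XOR expression as Verilog text,
    drawing all randomness from the supplied seeded generator."""
    if depth == 0:
        return str(rng.randint(0, 255))
    op = rng.choice(["+", "^"])
    return f"({gen_expr(rng, depth - 1)} {op} {gen_expr(rng, depth - 1)})"

# Same seed, two independent runs: identical design text.
a = gen_expr(random.Random(1234), 3)
b = gen_expr(random.Random(1234), 3)
assert a == b
```

Logging only the seed is therefore enough to regenerate any failing case exactly.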
Demo: CLI Examples
```shell
# Hierarchical naming + ModelSim simulation
./fuzz -t 5 --hier -n 10 --depth 3 --root-prefix --defparam

# Generate-construct designs: Quartus (synth) + ModelSim (sim)
./fuzz -t 1 -n 5 --depth 4

# Hierarchy with embedded generate blocks: simulate in ModelSim and Icarus
./fuzz -t 6 --hier -n 5 --depth 3 --include-gen
```
Results
Across 67k generated designs, all back-ends (Quartus, ModelSim, Vivado, Icarus) completed with no synthesis crashes, simulation errors, or equivalence mismatches. Determinism checks produced bit-identical output for fixed seeds.
| ID | Depth | Tools | Purpose | Iterations |
|---|---|---|---|---|
| TC1 | 3 | Quartus and ModelSim | Nested-loop correctness | 6,500 |
| TC2 | 3 | Vivado | Nested-loop correctness | 500 |
| TC3 | 5 | Icarus and ModelSim | Nested-loop correctness with hierarchical naming | 10,000 |
| TC4 | 5 | Icarus | Hierarchical naming test | 50,000 |
Table 2: Iterations per Test Campaign (TC)
Note: Vivado support was enabled late in the project; ModelSim was added near the end. Coverage therefore reflects limited runtime on those back‑ends relative to Icarus.
On coverage, the generator—despite being much smaller than VeriSmith—already reaches comparable regions of Icarus’s codebase. The generate-only and hierarchy-only modes land within 1–2% of each other, hinting that each explores distinct parts of the front-end. Combining both gives the best overall result and comes closest to VeriSmith. Icarus was rebuilt with coverage to obtain these measurements.
Figure 4: Coverage results recorded through Icarus, comparing VeriSmith and VeriGen
Conclusions and Further Work
This work delivers a novel AST‑based Verilog fuzzer focused on under‑tested yet practical features (the generate construct and hierarchical naming), using deterministic, constant‑evaluating designs to avoid external test vectors and enable robust automated checking.
Key takeaways:
- Competitive coverage despite a simpler grammar.
- Determinism and expected‑value propagation make debugging tractable.
- Targeting language features (not just tokens) reveals meaningful behaviours in real tools.
Future extensions:
- Broader expression/operator set and deeper nesting.
- More synthesis/simulation back‑ends.
- Richer discrepancy oracles (e.g., cross‑tool differential checks, waveform‑level comparisons).
Full technical report (PDF)
Source code (GitHub)