Fourier Image Similarity: From Paper to Open-Source Package

March 27, 2026 · 10 min read

Tags: Frequency Domain, Image Similarity, Open Source
Figure: Mobile phone image-based anti-copy pattern detection and classification framework. Anti-copy pattern detection from my Array 2026 paper, the application that motivated extracting the Fourier-domain comparison pipeline into a standalone package.

Why Frequency Domain for Image Comparison?

Most image similarity approaches operate in pixel space: comparing histograms, SSIM, feature embeddings, or learned descriptors. These work well when images share spatial structure at similar scales and positions. But some classes of images are better understood by their periodic structure — the repeating patterns that are invisible in individual pixel values but immediately apparent as sharp peaks in the Fourier magnitude spectrum.

The specific context in my paper was anti-copy patterns: fine security prints designed with deliberate repeating microstructure that degrades under reprography. Two images that look visually similar in pixel space may have very different Fourier peak arrangements depending on whether that microstructure survived capture — and those differences are diagnostically important.

Once I had the Fourier comparison pipeline working for that task, it was clear that it could be reused elsewhere. The fourier-image-similarity package extracts exactly that pipeline, with no domain-specific logic, segmentation, or downstream classifier: just the frequency-domain feature extraction and scoring, ready to drop into any project.

How the Pipeline Works

The package is structured as six composable stages. Understanding what each stage does makes it easier to tune for a new domain.

1. Loading and Preprocessing

Images are loaded as grayscale float arrays and optionally resized to a fixed resolution. Working in grayscale is deliberate: the Fourier magnitude of a colour image mixes luminance and chrominance in ways that complicate peak interpretation. Resizing to a common resolution before transforming keeps peak coordinates comparable across images.
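
A minimal sketch of this stage, assuming Pillow and NumPy (the package's stated dependencies) but not taken from the package source. The function name load_grayscale and the scaling to [0, 1] are assumptions made for illustration.

```python
import numpy as np
from PIL import Image


def load_grayscale(path, size=None):
    """Load an image as a 2D float array in [0, 1], optionally resized.

    `size` follows Pillow's (width, height) convention, so the returned
    array has shape (height, width).
    """
    with Image.open(path) as img:
        gray = img.convert("L")  # collapse colour to 8-bit luminance
        if size is not None:
            gray = gray.resize(size, Image.BILINEAR)
        # Scaling to [0, 1] is an illustrative choice, not a documented one.
        return np.asarray(gray, dtype=np.float64) / 255.0
```

Calling load_grayscale(path, size=(216, 685)) would give every image the same 685-row by 216-column grid, so a peak at a given array position refers to the same spatial frequency in each spectrum.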

2. FFT Magnitude and Phase

The centred 2D FFT is computed for each image, producing two outputs: a magnitude spectrum and a phase spectrum.
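
As a rough illustration, the centred transform can be sketched with NumPy's FFT routines. This is not the package's code, and the function name fft_magnitude_phase is hypothetical. The usage lines build a synthetic stripe pattern to show how a periodic structure turns into an isolated peak pair.

```python
import numpy as np


def fft_magnitude_phase(image):
    """Return (magnitude, phase) of the centred 2D FFT of a 2D array."""
    # fftshift moves the zero-frequency (DC) term to index (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)  # radians, in (-pi, pi]
    return magnitude, phase


# Usage: vertical stripes with 8 cycles across a 64-pixel-wide image.
n = 64
x = np.arange(n)
stripes = np.tile(np.cos(2 * np.pi * 8 * x / n), (n, 1))
magnitude, phase = fft_magnitude_phase(stripes)
peak = np.unravel_index(np.argmax(magnitude), magnitude.shape)
# The strongest bin lies on the centre row, 8 columns away from the centre.
```

Because the input is real-valued, the magnitude spectrum is symmetric about the centre, so each periodic component shows up as a mirrored pair of peaks.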

3. Peak Detection

Local maxima are extracted from the magnitude image using a maximum filter. The DC region at the spectrum centre is masked out first — it is almost always dominant and uninformative for pattern comparison. The top N peaks by strength are retained (default 35), with a minimum spacing constraint to prevent tightly clustered detections counting as multiple peaks.
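
The steps in this stage can be sketched with scipy.ndimage.maximum_filter. This is an approximation under stated assumptions, not the package's implementation: the names detect_peaks, dc_radius and min_distance are hypothetical, and only the default of 35 peaks comes from the description above.

```python
import numpy as np
from scipy.ndimage import maximum_filter


def detect_peaks(magnitude, num_peaks=35, dc_radius=5, min_distance=3):
    """Return (coords, strengths) for the strongest local maxima.

    coords has shape (k, 2) with rows of (row, col), strongest first.
    dc_radius and min_distance are illustrative values, not package defaults.
    """
    mag = np.asarray(magnitude, dtype=float).copy()
    h, w = mag.shape
    cy, cx = h // 2, w // 2

    # Zero out a disk around the centre so the DC term cannot win.
    yy, xx = np.ogrid[:h, :w]
    mag[(yy - cy) ** 2 + (xx - cx) ** 2 <= dc_radius ** 2] = 0.0

    # A pixel is a local maximum if it equals the maximum of its window.
    # The window width enforces the spacing; exact ties inside one window
    # would all be kept.
    window = 2 * min_distance + 1
    is_peak = (mag == maximum_filter(mag, size=window)) & (mag > 0)

    ys, xs = np.nonzero(is_peak)
    order = np.argsort(mag[ys, xs])[::-1][:num_peaks]
    coords = np.stack([ys[order], xs[order]], axis=1)
    return coords, mag[ys[order], xs[order]]
```

The square window is a simplification: it suppresses weaker maxima within min_distance pixels along either axis of a stronger one, which is slightly stricter on the diagonals than a circular exclusion zone would be.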

4. Local Peak Features

For each detected peak, two statistics are computed over a disk-shaped neighbourhood in the magnitude and phase images:
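
The disk-shaped neighbourhood can be sketched as follows. The two statistics are not named in this section, so the mean and standard deviation used below are stand-ins chosen purely for illustration, and disk_stats and radius are hypothetical names.

```python
import numpy as np


def disk_stats(image, center, radius=3):
    """Mean and standard deviation of `image` inside a disk around `center`.

    The choice of statistics is an assumption for illustration only.
    Pixels of the disk that fall outside the image are simply ignored.
    """
    h, w = image.shape
    cy, cx = center
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    values = image[mask]
    return float(values.mean()), float(values.std())
```

The same helper could be applied to the magnitude image and the phase image at each peak coordinate. Phase is an angle, so a plain mean is only a crude summary near the wrap-around at ±π; a circular statistic would be the more careful choice there.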

5. Reference Selection

When comparing a query set against a large pool of candidate reference images, you usually want a representative subset rather than all of them. The selection heuristic computes pairwise nearest-peak distances between all candidates and scores each candidate by how close its peak pattern is to the rest of the pool. Images with the lowest mean nearest-peak distance to others are the most central and are selected as references.

This is useful in practice because collecting a "perfect" clean reference set is rarely possible — you often have a noisy pool and need to filter it programmatically.
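
One way to read the selection heuristic is sketched below. It assumes each candidate is already reduced to an array of peak coordinates, and it uses a SciPy k-d tree for the nearest-peak lookup. The function names and the exact distance definition are assumptions, not the package's API.

```python
import numpy as np
from scipy.spatial import cKDTree


def mean_nearest_peak_distance(peaks_a, peaks_b):
    """Mean distance from each peak in peaks_a to its nearest peak in peaks_b."""
    distances, _ = cKDTree(peaks_b).query(peaks_a)
    return float(np.mean(distances))


def select_references(peak_sets, num_references):
    """Return (selected indices, per-candidate scores).

    Each candidate is scored by its mean nearest-peak distance to every
    other candidate; the lowest scores are the most central. Needs at
    least two candidates.
    """
    n = len(peak_sets)
    scores = np.zeros(n)
    for i in range(n):
        others = [
            mean_nearest_peak_distance(peak_sets[i], peak_sets[j])
            for j in range(n)
            if j != i
        ]
        scores[i] = np.mean(others)
    order = np.argsort(scores)
    return order[:num_references].tolist(), scores
```

The loop is quadratic in the number of candidates, which is fine for pools of a few hundred images but would need batching or sampling for much larger ones.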

6. Similarity Scoring

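Each query image is compared against each reference image using six component metrics (the peak match ratio, the peak mismatch ratio, the average closest-peak distance, and three difference metrics), which are then aggregated into a single score in [0, 1].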

Key design choice: the score is entirely spatial-frequency-based. There is no learned component, no embedding from a neural network, and no dependency on image labels. This makes it interpretable, fast, and applicable to any image domain where periodic structure carries discriminative information.
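
To make the peak-based part of the scoring concrete, here is a hedged sketch. The definitions of the match and mismatch ratios, the tolerance, and the way the components are combined are all assumptions made for illustration. The package's actual formulas and weights are not reproduced here, and the difference metrics are left out entirely.

```python
import numpy as np
from scipy.spatial import cKDTree


def peak_match_metrics(query_peaks, reference_peaks, tolerance=3.0):
    """Return (match_ratio, mismatch_ratio, avg_closest_distance).

    Assumed definitions: a query peak matches if a reference peak lies
    within `tolerance` pixels; a reference peak is a mismatch if no query
    peak lies within `tolerance` pixels of it.
    """
    q_to_r, _ = cKDTree(reference_peaks).query(query_peaks)
    r_to_q, _ = cKDTree(query_peaks).query(reference_peaks)
    match_ratio = float(np.mean(q_to_r <= tolerance))
    mismatch_ratio = float(np.mean(r_to_q > tolerance))
    avg_closest = float(np.mean(q_to_r))
    return match_ratio, mismatch_ratio, avg_closest


def illustrative_score(match_ratio, mismatch_ratio, avg_closest,
                       distance_scale=10.0):
    """Combine the three components into [0, 1] with equal, made-up weights."""
    closeness = np.exp(-avg_closest / distance_scale)  # 1 when distance is 0
    raw = (match_ratio + (1.0 - mismatch_ratio) + closeness) / 3.0
    return float(np.clip(raw, 0.0, 1.0))
```

Even in this simplified form, every term traces back to a count or a distance between peaks, which is what keeps the score interpretable.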

Using the Package

Installation is straightforward — the only dependencies are NumPy, SciPy, and Pillow:

pip install fourier-image-similarity

The CLI covers the most common workflow:

fourier-image-similarity \
  --query-dir /path/to/query_images \
  --reference-dir /path/to/reference_images \
  --num-references 10

If you have a larger pool and want automatic reference selection, swap --reference-dir for --reference-candidate-dir. The tool will select the most representative subset and report which images were chosen.

For use inside a pipeline, the Python API keeps things simple:

from fourier_image_similarity.pipeline import run_similarity_analysis

rows, references = run_similarity_analysis(
    query_dir="/path/to/query_images",
    reference_dir="/path/to/reference_images",
    num_references=10,
)

for row in rows[:5]:
    print(row["file"], row["similarity_score"])

Each row in the output contains the full breakdown: similarity score, peak match and mismatch ratios, average closest-peak distance, and the three difference metrics. This makes it easy to diagnose why a particular image scored high or low rather than just accepting the aggregate number.
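
A small sketch of how the rows might be triaged, using only the two keys shown in the example above ("file" and "similarity_score"). The sample rows, the threshold value and the helper name lowest_scoring are hypothetical.

```python
def lowest_scoring(rows, threshold=0.5):
    """Return rows scoring below `threshold`, worst first."""
    flagged = [row for row in rows if row["similarity_score"] < threshold]
    return sorted(flagged, key=lambda row: row["similarity_score"])


# Hypothetical rows standing in for the output of run_similarity_analysis.
sample_rows = [
    {"file": "a.png", "similarity_score": 0.91},
    {"file": "b.png", "similarity_score": 0.32},
    {"file": "c.png", "similarity_score": 0.47},
]

for row in lowest_scoring(sample_rows):
    print(row["file"], row["similarity_score"])
```

Starting from the lowest scorers and then reading their per-component values is usually the quickest way to see whether a low score comes from missing peaks or from shifted ones.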

Tuning for a New Domain

The defaults were calibrated for the anti-copy pattern use case (tall, narrow crops at 216×685 pixels with up to 35 peaks). If you are applying this to a different image type, these parameters are worth reviewing:

Relation to the Paper

The package implements the Fourier-domain comparison sub-pipeline from our anti-copy pattern detection framework. The full paper additionally covers mobile phone image acquisition, a segmentation stage to isolate the pattern region, and a downstream classifier — none of which are included here. The package is intentionally narrow: it provides only the frequency comparison logic that generalises beyond that specific application.

The paper and package can be cited independently. If you use the package in research, the README includes a BibTeX entry pointing to the Array 2026 paper.
