Dataset - XLPSR Challenge

The IMPROVED Dataset

Dataset Focus

The XLPSR challenge evaluates text reconstruction accuracy rather than pixel-level fidelity. No high-resolution clean frames are provided—models must recover license plate text directly from degraded low-resolution inputs.

The challenge is built upon the proprietary IMPROVED dataset (In-the-wild with Multi-Perspective Realistic Observations for Vehicle Evidence and license-plate recognition), specifically curated to capture the diversity and complexity of real operational environments.

Dataset Characteristics

Short real-world video clips (up to 10 frames) of moving vehicles with French license plates
Multiple acquisition devices (consumer cameras, surveillance cameras, smartphones)
Wide range of distances (10–100 meters)
Variable lighting conditions
Different weather conditions (clear, cloudy)
Large viewpoint variability (frontal, oblique, steep angle)
Strong natural degradations (motion blur, sensor noise, compression artifacts)

Data Format

Each video clip is paired with:

Low-resolution input frames extracted from the video
Ground-truth license plate text (character string), available only for the development set
Frame-level metadata (Video clip ID and license plate bounding boxes)

Note: High-resolution reference images are not provided to emphasize the real-world super-resolution task.

Acquisition Site

Acquisition Circuit

The dataset was acquired at the Saint-Laurent-de-Mûre circuit in France. The highlighted portion of the circuit below was used for capturing the video sequences with vehicles in motion under realistic conditions.

Figure 1: Saint-Laurent-de-Mûre circuit with highlighted acquisition zone.

Camera Configuration

The dataset features 17 different cameras covering a wide spectrum of imaging devices:

Surveillance cameras: Reolink, Instar, Hikvision, Dahua
Smartphones: Huawei P40 Pro, iPhone 15/15 Plus, Xiaomi Redmi Note 13, Huawei Honor 9X
Professional cameras: Blackmagic micro studio 4K (x2), Panasonic Lumix DMC-G70
Specialized: Hikvision fisheye camera (severe distortion), infrared camera

License Plate Examples

French License Plate Format

The dataset exclusively contains old and new French license plates. The old format is 123-AAA-12 or 1234-AA-12 (3 or 4 digits, 2 or 3 letters, 2 digits). The new format is AA-123-AA (2 letters, 3 digits, 2 letters).
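The two formats can be checked with regular expressions. The following is a minimal sketch, not an official validator: it assumes plate strings may be written with or without hyphens (the prediction examples in this document omit them), and the helper name classify_plate is hypothetical.

```python
import re

# Old format: 3 digits + 3 letters + 2 digits, or 4 digits + 2 letters + 2 digits.
OLD_PLATE = re.compile(r"(?:[0-9]{3}[A-Z]{3}|[0-9]{4}[A-Z]{2})[0-9]{2}")
# New format: 2 letters + 3 digits + 2 letters.
NEW_PLATE = re.compile(r"[A-Z]{2}[0-9]{3}[A-Z]{2}")


def classify_plate(text: str) -> str:
    """Return 'new', 'old', or 'invalid' for a plate string (hypothetical helper)."""
    # Assumption: hyphens and spaces are optional separators and case is ignored.
    normalized = text.upper().replace("-", "").replace(" ", "")
    if NEW_PLATE.fullmatch(normalized):
        return "new"
    if OLD_PLATE.fullmatch(normalized):
        return "old"
    return "invalid"
```

A check like this can serve as a sanity filter on model outputs, for example to flag predictions that match neither format before writing them to the submission file.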

Examples of French license plates in the dataset

Figure 2: Examples of French license plates from the acquisition.

File Format & Structure

Detection Annotations Format

Each sequence folder (10 frames) contains a detections.json file with the following structure:

[
  {
    "frame": "000000.png",
    "license_plate_coordinates": [x1, y1, x2, y2]
  },
  {
    "frame": "000001.png",
    "license_plate_coordinates": [x1, y1, x2, y2]
  },
  ...
]

Coordinates are in [top-left x, top-left y, bottom-right x, bottom-right y] format.
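As an illustration, a detections.json file with the structure above could be read as follows. This is a sketch under assumptions: coordinates are taken to be numeric pixel values, and the helper names load_detections and box_size are hypothetical.

```python
import json
from pathlib import Path


def load_detections(path):
    """Map each frame filename to its (x1, y1, x2, y2) plate box."""
    entries = json.loads(Path(path).read_text(encoding="utf-8"))
    boxes = {}
    for entry in entries:
        # Order is [top-left x, top-left y, bottom-right x, bottom-right y].
        x1, y1, x2, y2 = entry["license_plate_coordinates"]
        boxes[entry["frame"]] = (x1, y1, x2, y2)
    return boxes


def box_size(box):
    """Return (width, height) of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return x2 - x1, y2 - y1
```

If Pillow is used for image loading, the same tuple order (left, upper, right, lower) is what Image.crop expects, so a box can be passed to it directly.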

Directory Structure

dataset/
├── development/
│   └── seq_001/
│       ├── 000000.png
│       ├── ...
│       ├── 000009.png
│       └── detections.json
├── public_validation/
│   └── seq_002/
│       ├── 000000.png
│       ├── ...
│       └── detections.json
└── ground_truth.csv       (global ground truth file)

Each sequence contains exactly 10 consecutive frames.
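A directory laid out as above can be traversed with the standard library. The sketch below assumes that every sub-folder of a split is one sequence and that frames are the .png files inside it; list_sequences and find_incomplete are hypothetical helper names.

```python
from pathlib import Path

FRAMES_PER_SEQUENCE = 10  # frame count stated in the dataset description


def list_sequences(split_dir):
    """Return {sequence_id: sorted list of frame paths} for one split folder."""
    sequences = {}
    for seq_dir in sorted(Path(split_dir).iterdir()):
        if seq_dir.is_dir():
            sequences[seq_dir.name] = sorted(seq_dir.glob("*.png"))
    return sequences


def find_incomplete(sequences):
    """Return IDs of sequences whose frame count differs from the expected 10."""
    return [
        seq_id
        for seq_id, frames in sequences.items()
        if len(frames) != FRAMES_PER_SEQUENCE
    ]
```

Running find_incomplete(list_sequences("dataset/development")) after download is a quick way to detect a truncated or partially extracted archive.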

Expected prediction file format

Participants should provide a zip file containing prediction.csv, which should list the sequence IDs and the corresponding predicted license plate text, as below:

sequence_id,license_plate
seq_001,AB123CD
seq_002,EF456GH
seq_003,457DEX16
...
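One possible way to produce such an archive is sketched below. Several details are assumptions rather than stated requirements: the archive name submission.zip, placing prediction.csv at the archive root, and the helper name write_submission. The header row follows the example above.

```python
import csv
import io
import zipfile


def write_submission(predictions, zip_path="submission.zip"):
    """Write {sequence_id: plate_text} as prediction.csv inside a zip archive."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sequence_id", "license_plate"])
    # Sorting keeps the file deterministic across runs.
    for sequence_id in sorted(predictions):
        writer.writerow([sequence_id, predictions[sequence_id]])
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # Assumption: prediction.csv sits at the root of the archive.
        archive.writestr("prediction.csv", buffer.getvalue())
    return zip_path
```

Keeping predictions in a dict keyed by sequence ID ensures the file holds a single row per sequence.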

Dataset Splits

The IMPROVED dataset is divided into three distinct sets to ensure fair evaluation and rigorous benchmarking:

Development Set

For Local Benchmarking & Validation

39 sequences (390 images)

Complete low-resolution video frames
Full ground-truth license plate labels
Detection coordinates in JSON format

Purpose: Allows participants to develop, test, and validate their models locally before submission.
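For local validation, predictions on the development set can be compared with the ground-truth labels. The sketch below is only illustrative: the official scoring metric is not specified in this section, so exact string match per sequence is used as a stand-in, and ground_truth.csv is assumed to have a header row followed by sequence ID and plate text columns. Both helper names are hypothetical.

```python
import csv


def read_plates(csv_path):
    """Read a CSV with a header row and (sequence_id, plate text) columns."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        return {row[0]: row[1] for row in reader if len(row) >= 2}


def exact_match_accuracy(truth, predictions):
    """Fraction of ground-truth sequences whose predicted text matches exactly.

    Assumption: a missing prediction counts as an error.
    """
    if not truth:
        return 0.0
    correct = sum(
        1 for seq_id, plate in truth.items() if predictions.get(seq_id) == plate
    )
    return correct / len(truth)
```

A typical call would be exact_match_accuracy(read_plates("dataset/ground_truth.csv"), read_plates("prediction.csv")).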

Public Validation Set

For Public Leaderboard

347 sequences (3,470 images)

Low-resolution frames only (no ground truth)
Detection coordinates in JSON format

Purpose: Enables participants to benchmark against others and track progress on the public leaderboard.

Blind Test Set

For Final Evaluation

88 unreleased sequences (880 images)

16 completely unseen license plates
Similar diversity to development set
Evaluated on organizers' servers

Purpose: Ensures fair, unbiased final evaluation of all submissions on previously unseen data.

Training Policy & Guidelines

Training Guidelines

External data permitted: Participants may use any external data (synthetic, public, or proprietary) for training their models.
Model constraints: No restrictions on model architecture, size, or training methodology.
Pre-trained models: Use of publicly available pre-trained models is allowed and encouraged.
Data augmentation: Participants may augment the provided data with synthetic degradations.
Submission requirements: Final models must be submitted as Docker containers for reproducibility (required only for top teams, on invitation).

Important Notes

The blind test set will not be released to participants at any time
Public leaderboard rankings are indicative only
Final ranking is based exclusively on blind test set performance and expert evaluation
All submissions must follow the sequence_id,license_plate CSV format
Dataset license agreement must be signed during registration

Dataset Release Schedule

Development Set: ~February 20, 2026 - Immediately upon registration approval
Public Validation Set: March 1, 2026 - Leaderboard opens
Blind Test Set: Never released - remains exclusively with organizers