
Dataset Validation

Before training your OCR model, we strongly recommend validating your dataset with the validate-dataset CLI command. It checks image integrity, label consistency, and format compatibility with your plate config.


What It Checks

The validator performs the following:

  • Image existence: Verifies that all image paths exist.
  • Image readability: Confirms that images are decodable and not corrupted.
  • Minimum resolution: Flags images smaller than a minimum size (2x2 by default).
  • Resizing feasibility: Ensures images won't be resized below 1 pixel.
  • Text length: Verifies that each plate text is no longer than max_plate_slots.
  • Alphabet coverage: Ensures all characters are inside the allowed alphabet.
  • Duplicate entries: Warns about repeated image paths.
  • Unused characters: Identifies characters in your alphabet that are not used at all.
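
The label-related checks in the list above can be sketched in a few lines of Python. This is only an illustration of the idea, not the validator's actual implementation: the function name check_labels, the (image_path, plate_text) row layout, the message wording, and the split between errors and warnings are all assumptions made for the example.

```python
def check_labels(rows, alphabet, max_plate_slots):
    """Return (errors, warnings) for an iterable of (image_path, plate_text) rows.

    Hypothetical helper: it mirrors the text-length, alphabet, duplicate,
    and unused-character checks, but it is not the library's own code.
    """
    allowed = set(alphabet)
    errors, warnings = [], []
    seen_paths, used_chars = set(), set()

    # Start at 2 on the assumption that line 1 of the CSV is a header.
    for line_no, (image_path, plate_text) in enumerate(rows, start=2):
        if len(plate_text) > max_plate_slots:
            errors.append(
                (line_no, f"Text too long ({len(plate_text)} > {max_plate_slots})")
            )
        unknown = sorted(set(plate_text) - allowed)
        if unknown:
            errors.append((line_no, f"Characters outside alphabet: {''.join(unknown)}"))
        if image_path in seen_paths:
            warnings.append((line_no, f"Duplicate image path: {image_path}"))
        seen_paths.add(image_path)
        used_chars.update(plate_text)

    # Characters that the alphabet allows but no label ever uses.
    unused = sorted(allowed - used_chars)
    if unused:
        warnings.append((None, f"Unused alphabet characters: {''.join(unused)}"))
    return errors, warnings


# Made-up rows and alphabet, purely for demonstration.
sample_rows = [("a.jpg", "AB1"), ("a.jpg", "ABCD"), ("b.jpg", "AZ")]
errors, warnings = check_labels(sample_rows, alphabet="ABCD12", max_plate_slots=3)
print(errors)
print(warnings)
```

With the made-up rows, the second row is both too long and a duplicate path, the third row contains a character outside the alphabet, and the character 2 is never used.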

Basic Usage

fast-plate-ocr validate-dataset \
  --annotations-file my-dataset/train.csv \
  --plate-config-file config/latin_plates.yaml

Fix and Export Cleaned File

To automatically export a cleaned version of your dataset:

fast-plate-ocr validate-dataset \
  --annotations-file my-dataset/train.csv \
  --plate-config-file config/latin_plates.yaml \
  --export-fixed train_clean.csv

This creates train_clean.csv with only valid entries, skipping corrupted rows and malformed labels.
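
To see how many rows were dropped, you can compare the row counts of the original and cleaned files. The sketch below is a minimal example that assumes both CSV files start with a header line; the helper names count_data_rows and dropped_rows are hypothetical and not part of the CLI.

```python
import csv


def count_data_rows(csv_path):
    """Count non-empty rows after the header line (header presence is assumed)."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the assumed header row
        return sum(1 for row in reader if row)


def dropped_rows(original_csv, cleaned_csv):
    """Number of rows present in the original file but absent from the cleaned one."""
    return count_data_rows(original_csv) - count_data_rows(cleaned_csv)


# Example call, using the file names from the command above:
# print(dropped_rows("my-dataset/train.csv", "train_clean.csv"))
```

A large difference is a hint to inspect the reported errors before training on the cleaned file.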


Allow Warnings but Exit on Errors

By default, the validator exits with code 1 if it finds any error. Use --warn-only to suppress that failing exit:

fast-plate-ocr validate-dataset \
  --annotations-file my-dataset/train.csv \
  --plate-config-file config/latin_plates.yaml \
  --warn-only
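
Because the default behavior is a non-zero exit code on errors, a training script can use the validator as a gate. The following is a rough sketch using Python's standard subprocess module; run_and_gate and VALIDATE_CMD are hypothetical names, and the paths are the placeholders used throughout this page.

```python
import subprocess
import sys


def run_and_gate(command):
    """Run a command and return True only when it exits with code 0."""
    result = subprocess.run(command, check=False)
    if result.returncode != 0:
        print(f"Validation failed with exit code {result.returncode}", file=sys.stderr)
        return False
    return True


# Flags are copied from the Basic Usage example; the paths are placeholders.
VALIDATE_CMD = [
    "fast-plate-ocr", "validate-dataset",
    "--annotations-file", "my-dataset/train.csv",
    "--plate-config-file", "config/latin_plates.yaml",
]

# Example gate before training (left commented out, since it needs the CLI installed):
# if not run_and_gate(VALIDATE_CMD):
#     sys.exit(1)
```

Appending --warn-only to the command turns this gate into a report-only step, since the validator then no longer fails the run.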

Control Minimum Resolution

Adjust what you consider "too small" for images:

fast-plate-ocr validate-dataset \
  --annotations-file my-dataset/train.csv \
  --plate-config-file config/latin_plates.yaml \
  --min-height 16 \
  --min-width 32
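
The two size-related checks can be illustrated with plain arithmetic. This sketch assumes sizes are given as height by width, that 2x2 is the default threshold, and that resizing preserves aspect ratio and rounds to the nearest pixel; the validator's real resize logic may differ, and all function names here are hypothetical. The 1x1437 size is taken from this page's sample output, while the 64x128 target is invented for the example.

```python
def is_too_small(height, width, min_height=2, min_width=2):
    """True when either dimension falls below its minimum."""
    return height < min_height or width < min_width


def resized_shape(height, width, target_height, target_width):
    """Fit an image inside the target box while keeping its aspect ratio.

    Rounding to the nearest pixel is an assumption of this sketch.
    """
    if height <= 0 or width <= 0:
        return 0, 0
    scale = min(target_height / height, target_width / width)
    return round(height * scale), round(width * scale)


def resize_is_feasible(height, width, target_height, target_width):
    """True when both resized dimensions stay at 1 pixel or more."""
    new_h, new_w = resized_shape(height, width, target_height, target_width)
    return new_h >= 1 and new_w >= 1


# A 1x1437 strip fails both checks against a hypothetical 64x128 target.
print(is_too_small(1, 1437))                 # True
print(resized_shape(1, 1437, 64, 128))       # (0, 128)
print(resize_is_feasible(1, 1437, 64, 128))  # False
```

Under these assumptions, raising --min-height and --min-width simply moves the threshold used by the first check; the resize check depends on the target size instead.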

Output Example

After validation, a summary table is printed to the console using rich formatting:

 Validation Summary
┌──────────┬───────┐
│ Category │ Count │
├──────────┼───────┤
│ Errors   │   1   │
│ Warnings │   1   │
└──────────┴───────┘

                      Errors
┌──────┬──────────────────────────────────────────────┐
│ Line │ Message                                      │
├──────┼──────────────────────────────────────────────┤
│ 4554 │ Resize would give 0x0 (0x128)                │
│      │ from ./img/img_00001.jpg                     │
└──────┴──────────────────────────────────────────────┘

                     Warnings
┌──────┬──────────────────────────────────────────────┐
│ Line │ Message                                      │
├──────┼──────────────────────────────────────────────┤
│ 4554 │ Tiny image (1x1437 < 2x2):                   │
│      │ ./img/img_00001.jpg                          │
└──────┴──────────────────────────────────────────────┘

If no errors are found, you're safe to proceed with training.