Model Configuration¶
The `model_config.yaml` file defines the architecture of the OCR model used in fast-plate-ocr. This configuration lets you customize key components of the model (e.g., convolutional tokenizers, patching strategies, attention settings) without modifying any code.
All model configurations are validated with Pydantic, ensuring that every field, layer, and parameter is checked when the model is built from the config.
Supported Architectures¶
Currently, the supported architectures are:
Compact Convolutional Transformer (CCT)¶
Inspired by the CCT architecture, this model structure:
- Uses a convolutional tokenizer to extract patch representations from the input image.
- Processes the resulting sequence with a Transformer Encoder.
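To make the data flow concrete, here is a minimal NumPy sketch of how a convolutional tokenizer turns an image into the token sequence the Transformer encoder consumes. The shapes, kernel, and stride below are illustrative assumptions, not the library's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

H, W, C = 32, 96, 1        # e.g. a grayscale plate crop (sizes assumed)
D = 64                     # embedding dimension of each token
k, s = 3, 2                # kernel size and stride of the tokenizer conv

image = rng.standard_normal((H, W, C))
kernel = rng.standard_normal((k, k, C, D))

# Naive strided convolution (loops for clarity, illustration only)
Hp, Wp = (H - k) // s + 1, (W - k) // s + 1
feature_map = np.empty((Hp, Wp, D))
for i in range(Hp):
    for j in range(Wp):
        patch = image[i * s:i * s + k, j * s:j * s + k, :]
        feature_map[i, j] = np.tensordot(patch, kernel, axes=3)

# Flatten the spatial grid into a sequence of tokens for the encoder
tokens = feature_map.reshape(Hp * Wp, D)
print(tokens.shape)  # (705, 64)
```

Each row of `tokens` is one "patch" embedding; the Transformer encoder then attends over this sequence.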
Config Example¶
- This scales values into the range `[0, 1]`.
- Supports a wide variety of layer types, such as `Conv2D`, `MaxPooling2D`, `DepthwiseConv2D`, `SqueezeExcite`, etc. See Model Config for all available options.
- Each layer supports the full set of corresponding Keras parameters. For example, `Conv2D` accepts `filters`, `kernel_size`, `strides`, etc.
- See Transformers without Normalization²
Note on plate/model configs
The plate config is used throughout both inference and training scripts. In contrast, the model config (shown above) is only used for training, as it defines the architecture to be built.
Building Custom Tokenizers with Any Keras Layer¶
You can define your own tokenizer stacks by composing any supported layer, like `Conv2D`, `DepthwiseConv2D`, `SqueezeExcite`, and many more, directly in YAML, without writing any code.
Each layer accepts all its typical Keras parameters, and the model schema is validated with Pydantic, so typos or misconfigured fields are caught immediately.
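As an illustration of how such validation catches mistakes, here is a small Pydantic sketch. The `Conv2DBlock` model below is hypothetical; the real fast-plate-ocr schema (see Model Schema) covers many more layers and options:

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, PositiveInt, ValidationError

# Hypothetical block schema - illustrative only, not the library's actual one.
class Conv2DBlock(BaseModel):
    model_config = ConfigDict(extra="forbid")  # unknown fields are rejected
    layer: Literal["Conv2D"]
    filters: PositiveInt
    kernel_size: PositiveInt
    activation: str = "linear"

# A well-formed block parses cleanly...
block = Conv2DBlock.model_validate(
    {"layer": "Conv2D", "filters": 64, "kernel_size": 3, "activation": "relu"}
)

# ...while a typo ("filtres") fails immediately at config-load time.
try:
    Conv2DBlock.model_validate({"layer": "Conv2D", "filtres": 64, "kernel_size": 3})
    caught = False
except ValidationError:
    caught = True
```

Because `extra="forbid"` is set, misspelled or misplaced fields raise a `ValidationError` before any model is built.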
Here's an example with a more diverse set of layers:
```yaml
tokenizer:
  blocks:
    - { layer: Conv2D, filters: 64, kernel_size: 3, activation: relu }
    - { layer: SqueezeExcite, ratio: 0.5 }
    - { layer: DepthwiseConv2D, kernel_size: 3, strides: 1 }
    - { layer: BatchNormalization }
    - { layer: MaxBlurPooling2D, pool_size: 2, filter_size: 3 }
    - { layer: Conv2D, filters: 128, kernel_size: 3 }
    - { layer: CoordConv2D, filters: 96, kernel_size: 3, with_r: true }
```
Tip
Each `layer:` value corresponds to a class in the Model Schema; check it out to see all the supported layers and options!