Utilities

Utilities for model summaries, architecture validation, and FLOPs calculation.

Provides model analysis tools, including:

- Model summaries with parameter and memory analysis
- Architecture validation and dimension tracking
- Memory and FLOPs estimation

calculate_output_shape(model: Module, input_shape: Tuple[int, ...], device: str = 'cpu') → Dict[str, Any]

Calculate output shape by forward pass with dummy input.

Parameters:

model – PyTorch model to analyze.
input_shape – Input tensor shape (including batch dimension).
device – Device to run the model on (‘cpu’ or ‘cuda’).

Returns:

Dictionary containing:

'output_shape': Shape of the output tensor
'output_size': Total number of elements in the output
'success': Whether the calculation succeeded

Return type:

Dict[str, Any]

Raises:

ValueError – If input shape is invalid.

get_layer_summary(model: Module, input_shape: Tuple[int, ...], device: str = 'cpu') → List[Dict[str, Any]]

Get layer-by-layer summary with output shapes.

Hooks into model layers to capture output shapes during forward pass.

Parameters:

model – PyTorch model to analyze.
input_shape – Input tensor shape (including batch dimension).
device – Device to run the model on.

Returns:

List of dictionaries containing layer information, one per layer:

'name': Layer name/path
'type': Layer type (class name)
'output_shape': Output shape of the layer
'parameters': Number of parameters
'trainable': Whether the layer is trainable

Return type:

List[Dict[str, Any]]

count_parameters_by_type(model: Module) → Dict[str, Dict[str, int]]

Count parameters grouped by layer type.

Parameters:

model – PyTorch model to analyze.

Returns:

Dictionary mapping each layer type to a dictionary containing:

'total': Total parameters of this type
'trainable': Trainable parameters
'count': Number of layers of this type

Return type:

Dict[str, Dict[str, int]]

get_memory_usage(model: Module, input_shape: Tuple[int, ...], device: str = 'cpu') → Dict[str, float | str]

Estimate model memory usage.

Parameters:

model – PyTorch model to analyze.
input_shape – Input tensor shape (including batch dimension).
device – Device to run the model on.

Returns:

Dictionary containing:

'parameter_memory_mb': Memory used by parameters
'activation_memory_mb': Estimated activation memory
'total_memory_mb': Total estimated memory
'device': Device used for estimation

Return type:

Dict[str, float | str]

get_model_flops(model: Module, input_shape: Tuple[int, ...]) → Dict[str, int | str]

Estimate model FLOPs (floating point operations).

Note: This is an estimation based on standard layer operations. Actual FLOPs may vary based on implementation details.

Parameters:

model – PyTorch model to analyze.
input_shape – Input tensor shape (including batch dimension).

Returns:

Dictionary containing:

'total_flops': Total estimated FLOPs
'total_flops_in_billions': Total FLOPs in billions
'success': Whether the estimation succeeded

Return type:

Dict[str, int | str]
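
The per-layer formulas behind the estimate are not documented here. As a rough illustration of how such a count is commonly built, the sketch below estimates FLOPs for one 2D convolution and one linear layer, counting each multiply-accumulate as two operations. The helper names and the two-FLOPs-per-MAC convention are assumptions for illustration, not this module's implementation.

```python
def conv2d_flops(in_channels, out_channels, kernel_size, out_height, out_width):
    """Estimate per-sample FLOPs of one Conv2d layer (bias ignored)."""
    # Each output element needs in_channels * k * k multiply-accumulates.
    macs_per_output = in_channels * kernel_size * kernel_size
    num_outputs = out_channels * out_height * out_width
    return 2 * macs_per_output * num_outputs  # assumed: 1 MAC = 2 FLOPs


def linear_flops(in_features, out_features):
    """Estimate per-sample FLOPs of one Linear layer (bias ignored)."""
    return 2 * in_features * out_features


# Hypothetical two-layer example: a 3x3 conv on a 224x224 RGB input
# (spatial size preserved) followed by a 512 -> 1000 classifier.
total_flops = conv2d_flops(3, 64, 3, 224, 224) + linear_flops(512, 1000)
print(total_flops, total_flops / 1e9)
```

Conventions differ between tools (some report MACs rather than FLOPs), which is one reason reported numbers can disagree by a factor of two.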

print_model_summary(model: Module, input_shape: Tuple[int, ...], verbose: bool = True) → Dict[str, Any]

Print comprehensive model summary.

Parameters:

model – PyTorch model to summarize.
input_shape – Input tensor shape (including batch dimension).
verbose – Whether to print detailed information.

Returns:

Dictionary containing all summary information.

validate_spatial_dimensions(input_shape: Tuple[int, int, int], num_conv_blocks: int, pooling_kernel_size: int = 2, pooling_stride: int = 2, min_final_size: int = 4) → Dict[str, Any]

Validate that spatial dimensions don’t become too small.

Parameters:

input_shape – Input shape as (C, H, W).
num_conv_blocks – Number of convolutional blocks.
pooling_kernel_size – Pooling kernel size.
pooling_stride – Pooling stride.
min_final_size – Minimum acceptable final spatial size.

Returns:

Dictionary containing:

'valid': Whether dimensions are valid
'input_spatial': Original spatial dimensions
'final_spatial': Final spatial dimensions after pooling
'num_pooling_ops': Number of pooling operations
'warnings': List of warning messages

Return type:

Dict[str, Any]
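
The check amounts to applying the pooling output-size formula once per block and comparing the result with the minimum size. The sketch below assumes one unpadded pooling step per convolutional block and convolutions that preserve spatial size; `pooled_size` and `check_spatial` are hypothetical names, and the real function may differ in detail.

```python
def pooled_size(size, kernel_size=2, stride=2):
    """Output length of an unpadded pooling op along one dimension."""
    return (size - kernel_size) // stride + 1


def check_spatial(height, width, num_pooling_ops,
                  kernel_size=2, stride=2, min_final_size=4):
    """Track H and W through repeated pooling and collect warnings."""
    warnings = []
    h, w = height, width
    for step in range(1, num_pooling_ops + 1):
        if h < kernel_size or w < kernel_size:
            warnings.append(f"pooling step {step} does not fit a {h}x{w} map")
            break
        h = pooled_size(h, kernel_size, stride)
        w = pooled_size(w, kernel_size, stride)
    if min(h, w) < min_final_size:
        warnings.append(f"final size {h}x{w} is below {min_final_size}")
    return {
        "valid": not warnings,
        "input_spatial": (height, width),
        "final_spatial": (h, w),
        "num_pooling_ops": num_pooling_ops,
        "warnings": warnings,
    }


ok = check_spatial(224, 224, num_pooling_ops=4)    # 224 -> 112 -> 56 -> 28 -> 14
too_small = check_spatial(32, 32, num_pooling_ops=4)  # 32 -> 16 -> 8 -> 4 -> 2
print(ok["final_spatial"], too_small["valid"])
```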

validate_channel_progression(channels: List[int], num_conv_blocks: int, max_channels: int = 2048) → Dict[str, Any]

Validate channel progression configuration.

Parameters:

channels – List of channel sizes per block.
num_conv_blocks – Expected number of blocks.
max_channels – Maximum allowed channels per layer.

Returns:

Dictionary containing:

'valid': Whether configuration is valid
'warnings': List of warning messages
'growth_rate': Average growth per block

Return type:

Dict[str, Any]
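
The reference does not say how 'growth_rate' is defined. One plausible reading, sketched below, is the mean ratio between consecutive channel counts; that definition, the specific checks, and the name `check_channels` are all assumptions for illustration.

```python
def check_channels(channels, num_conv_blocks, max_channels=2048):
    """Check a channel list against a block count and a per-layer cap."""
    warnings = []
    if len(channels) != num_conv_blocks:
        warnings.append(
            f"expected {num_conv_blocks} channel sizes, got {len(channels)}"
        )
    for index, count in enumerate(channels):
        if count <= 0:
            warnings.append(f"block {index}: channel count must be positive")
        elif count > max_channels:
            warnings.append(f"block {index}: {count} exceeds {max_channels}")
    # Assumed definition: average multiplicative growth between blocks.
    ratios = [b / a for a, b in zip(channels, channels[1:]) if a > 0]
    growth_rate = sum(ratios) / len(ratios) if ratios else 1.0
    return {"valid": not warnings, "warnings": warnings, "growth_rate": growth_rate}


doubling = check_channels([64, 128, 256], num_conv_blocks=3)
oversized = check_channels([64, 4096], num_conv_blocks=2)
print(doubling["growth_rate"], oversized["valid"])
```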

validate_architecture(input_shape: Tuple[int, int, int], num_classes: int, num_conv_blocks: int, channels: List[int] | None = None, pooling_kernel_size: int = 2, pooling_stride: int = 2) → Dict[str, Any]

Validate complete architecture configuration.

Parameters:

input_shape – Input shape as (C, H, W).
num_classes – Number of output classes.
num_conv_blocks – Number of convolutional blocks.
channels – Channel progression (optional).
pooling_kernel_size – Pooling kernel size.
pooling_stride – Pooling stride.

Returns:

Dictionary containing validation results for all checks.

estimate_model_size(num_conv_blocks: int, channels: List[int], num_classes: int, include_batch_norm: bool = True, include_dropout: bool = True) → Dict[str, Any]

Estimate model size without actually creating it.

Parameters:

num_conv_blocks – Number of convolutional blocks.
channels – Channel progression.
num_classes – Number of output classes.
include_batch_norm – Whether batch norm is included.
include_dropout – Whether dropout is included.

Returns:

Dictionary containing estimated parameter counts.
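
Parameter counts can be derived from the layer shapes alone, which is what makes an estimate without instantiation possible. The sketch below uses the standard counts for convolution, batch normalization, and linear layers. The input channel count, the 3x3 kernel, and a classifier fed by globally pooled features are assumptions, since the signature above does not expose them; `estimate_parameters` is a hypothetical name.

```python
def estimate_parameters(channels, num_classes, in_channels=3,
                        kernel_size=3, include_batch_norm=True):
    """Count learnable parameters of a plain conv stack plus classifier."""
    conv_params = 0
    batch_norm_params = 0
    previous = in_channels
    for out_channels in channels:
        # Conv2d: one k x k filter per (input, output) channel pair, plus bias.
        conv_params += previous * out_channels * kernel_size ** 2 + out_channels
        if include_batch_norm:
            # BatchNorm2d: one scale and one shift per channel.
            batch_norm_params += 2 * out_channels
        previous = out_channels
    # Assumed: global pooling leaves `previous` features for the linear layer.
    classifier_params = previous * num_classes + num_classes
    return {
        "conv": conv_params,
        "batch_norm": batch_norm_params,
        "classifier": classifier_params,
        "total": conv_params + batch_norm_params + classifier_params,
    }


estimate = estimate_parameters([64, 128], num_classes=10)
print(estimate)
```

Dropout adds no learnable parameters, so it does not appear in the count.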

predict_memory_usage(num_conv_blocks: int, channels: List[int], num_classes: int, batch_size: int = 32, input_height: int = 224, input_width: int = 224) → Dict[str, float]

Predict memory usage without running the model.

Parameters:

num_conv_blocks – Number of convolutional blocks.
channels – Channel progression.
num_classes – Number of output classes.
batch_size – Batch size for training.
input_height – Input height.
input_width – Input width.

Returns:

Dictionary with estimated memory in MB.
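
A prediction of this kind typically sums the sizes of the tensors that must be held in memory. The sketch below estimates activation memory only, assuming float32 values, one stored feature map per block, and spatial size halving after each block. These assumptions and the name `activation_memory_mb` are illustrative and may not match the function's actual model.

```python
BYTES_PER_FLOAT32 = 4
BYTES_PER_MB = 1024 ** 2


def activation_memory_mb(channels, batch_size=32,
                         input_height=224, input_width=224):
    """Estimate memory (MB) of one stored feature map per conv block."""
    height, width = input_height, input_width
    elements = 0
    for out_channels in channels:
        elements += batch_size * out_channels * height * width
        # Assumed: a 2x2 pooling step halves each spatial dimension.
        height, width = height // 2, width // 2
    return elements * BYTES_PER_FLOAT32 / BYTES_PER_MB


single_block = activation_memory_mb([64], batch_size=1)
print(single_block)
```

Training usually needs more than this, because gradients and optimizer state are stored as well.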

class ArchitectureConfig(input_shape: Tuple[int, int, int], num_classes: int, num_conv_blocks: int = 4, channels: str | List[int] = 'auto', kernel_sizes: int | List[int] = 3, strides: int | List[int] = 1, activations: str | List[str] = 'relu', dropout_rates: float | List[float] = 0.0, use_batchnorm: bool = True, pattern: str = 'sequential', use_attention: bool = False, use_residual: bool = False, use_dense: bool = False)

Bases: object

Architecture configuration dataclass.

Attributes:

input_shape (Tuple[int, int, int]) – Tuple of (channels, height, width).
num_classes (int) – Number of output classes.
num_conv_blocks (int) – Number of convolutional blocks.
channels (str | List[int]) – List of channel sizes or ‘auto’.
kernel_sizes (int | List[int]) – List of kernel sizes or single value.
strides (int | List[int]) – List of strides or single value.
activations (str | List[str]) – List of activation names or single name.
dropout_rates (float | List[float]) – List of dropout rates or single value.
use_batchnorm (bool) – Whether to use batch normalization.
pattern (str) – Architecture pattern.
use_attention (bool) – Whether to use attention mechanisms.
use_residual (bool) – Whether to use residual connections.
use_dense (bool) – Whether to use dense connections.

activations: str | List[str] = 'relu'

channels: str | List[int] = 'auto'

dropout_rates: float | List[float] = 0.0

classmethod from_dict(data: Dict[str, Any]) → ArchitectureConfig: Create from dictionary.

kernel_sizes: int | List[int] = 3

num_conv_blocks: int = 4

pattern: str = 'sequential'

strides: int | List[int] = 1

to_dict() → Dict[str, Any]: Convert to dictionary.

use_attention: bool = False

use_batchnorm: bool = True

use_dense: bool = False

use_residual: bool = False

validate() → bool

Validate configuration.

Returns: True if valid
Raises: ValueError – If invalid

input_shape: Tuple[int, int, int]

num_classes: int
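
To illustrate the dataclass pattern with `to_dict`, `from_dict`, and `validate`, here is a reduced stand-in. `MiniConfig` is hypothetical, carries only four of the fields listed above, and its validation rules are assumptions rather than the checks `ArchitectureConfig.validate()` actually performs.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Tuple, Union


@dataclass
class MiniConfig:
    """Hypothetical, reduced stand-in for ArchitectureConfig."""
    input_shape: Tuple[int, int, int]
    num_classes: int
    num_conv_blocks: int = 4
    channels: Union[str, List[int]] = "auto"

    def validate(self) -> bool:
        """Return True if the config is consistent, else raise ValueError."""
        if len(self.input_shape) != 3 or any(d <= 0 for d in self.input_shape):
            raise ValueError("input_shape must be three positive integers")
        if self.num_classes <= 0 or self.num_conv_blocks <= 0:
            raise ValueError("num_classes and num_conv_blocks must be positive")
        if isinstance(self.channels, list) and len(self.channels) != self.num_conv_blocks:
            raise ValueError("channels must have one entry per conv block")
        return True

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "MiniConfig":
        data = dict(data)
        # JSON round trips turn tuples into lists, so normalize here.
        data["input_shape"] = tuple(data["input_shape"])
        return cls(**data)


config = MiniConfig(input_shape=(3, 32, 32), num_classes=10)
restored = MiniConfig.from_dict(config.to_dict())
print(restored == config, restored.validate())
```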

class GridSearch(search_space: Dict[str, List[Any]], base_config: Dict[str, Any] | None = None)

Bases: object

Grid search over architecture hyperparameters.

Systematically explores all combinations of provided parameters.

Examples

>>> search_space = {
...     'num_conv_blocks': [3, 4, 5],
...     'channels': [[64, 128, 256], [32, 64, 128, 256], [64, 128, 256, 512]],
...     'activation': ['relu', 'gelu'],
... }
>>> searcher = GridSearch(search_space)
>>> configs = list(searcher.generate())
>>> len(configs)  # 3 * 3 * 2 = 18

generate()

Generate all configurations.

Yields: ArchitectureConfig objects

generate_with_index()

Generate configurations with their index.

Yields: Tuple of (index, ArchitectureConfig)
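
Exhaustive enumeration of this kind is usually a Cartesian product over the per-parameter option lists. The sketch below shows the idea with plain dictionaries instead of ArchitectureConfig objects; `grid_configs` is a hypothetical helper, not the class's implementation.

```python
import itertools
from typing import Any, Dict, Iterator, List, Optional


def grid_configs(
    search_space: Dict[str, List[Any]],
    base_config: Optional[Dict[str, Any]] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield one dict per combination of the values in search_space."""
    base = dict(base_config or {})
    keys = list(search_space)
    for values in itertools.product(*(search_space[key] for key in keys)):
        # Searched values override entries with the same key in base.
        yield {**base, **dict(zip(keys, values))}


search_space = {
    "num_conv_blocks": [3, 4, 5],
    "use_batchnorm": [True, False],
}
configs = list(grid_configs(search_space, base_config={"num_classes": 10}))
print(len(configs))  # 3 * 2 = 6 combinations
```

The number of configurations is the product of the option counts, so it grows quickly as parameters are added.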

class RandomSearch(search_space: Dict[str, List[Any]], num_samples: int = 10, base_config: Dict[str, Any] | None = None, seed: int | None = None)

Bases: object

Random search over architecture hyperparameters.

Randomly samples from provided parameter distributions.

Examples

>>> search_space = {
...     'num_conv_blocks': [3, 4, 5, 6],
...     'activation': ['relu', 'gelu', 'leaky_relu'],
...     'dropout_rate': [0.0, 0.1, 0.2, 0.3],
... }
>>> searcher = RandomSearch(search_space, num_samples=100)
>>> configs = list(searcher.generate())

generate()

Generate random configurations.

Yields: ArchitectureConfig objects

generate_with_index()

Generate random configurations with their index.

Yields: Tuple of (index, ArchitectureConfig)
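
Random search draws each parameter independently from its option list, and a fixed seed makes the draw reproducible. The sketch below shows this with plain dictionaries; `random_configs` is a hypothetical helper, and sampling uniformly with replacement is an assumption about the class's behavior.

```python
import random
from typing import Any, Dict, Iterator, List, Optional


def random_configs(
    search_space: Dict[str, List[Any]],
    num_samples: int = 10,
    base_config: Optional[Dict[str, Any]] = None,
    seed: Optional[int] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield num_samples dicts with each value drawn uniformly at random."""
    rng = random.Random(seed)  # private generator, global state untouched
    base = dict(base_config or {})
    for _ in range(num_samples):
        sampled = {key: rng.choice(options) for key, options in search_space.items()}
        yield {**base, **sampled}


search_space = {
    "num_conv_blocks": [3, 4, 5, 6],
    "use_batchnorm": [True, False],
}
first_run = list(random_configs(search_space, num_samples=5, seed=0))
second_run = list(random_configs(search_space, num_samples=5, seed=0))
print(first_run == second_run)  # same seed, same samples
```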

class ArchitectureFactory

Bases: object

Factory for generating architectures from patterns and configurations.

Simplifies creating models from predefined patterns.

Examples

>>> factory = ArchitectureFactory()
>>> config = ArchitectureConfig(
...     input_shape=(3, 224, 224),
...     num_classes=1000,
...     pattern='residual'
... )
>>> architecture_dict = factory.create(config)

create(config: ArchitectureConfig) → Dict[str, Any]

Create architecture from configuration.

Parameters: config – Architecture configuration
Returns: Dictionary with architecture information
Raises: ValueError – If pattern is not recognized

register_pattern(name: str, builder: Callable[[ArchitectureConfig], Dict[str, Any]])

Register custom architecture pattern.

Parameters:

name – Pattern name
builder – Function that creates architecture from config
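
The register-then-create flow follows the usual registry pattern: builders are stored by name and looked up when an architecture is requested. The sketch below uses a plain dictionary in place of ArchitectureConfig; `PatternRegistry` and `build_tiny` are hypothetical, and the real factory's internals may differ.

```python
from typing import Any, Callable, Dict

Builder = Callable[[Dict[str, Any]], Dict[str, Any]]


class PatternRegistry:
    """Hypothetical stand-in showing name-to-builder dispatch."""

    def __init__(self) -> None:
        self._builders: Dict[str, Builder] = {}

    def register_pattern(self, name: str, builder: Builder) -> None:
        self._builders[name] = builder

    def create(self, config: Dict[str, Any]) -> Dict[str, Any]:
        pattern = config.get("pattern", "sequential")
        if pattern not in self._builders:
            raise ValueError(f"Unknown pattern: {pattern!r}")
        return self._builders[pattern](config)


def build_tiny(config: Dict[str, Any]) -> Dict[str, Any]:
    """Example builder: describe a stack of identical conv blocks."""
    return {"pattern": "tiny", "blocks": ["conv3x3"] * config["num_conv_blocks"]}


registry = PatternRegistry()
registry.register_pattern("tiny", build_tiny)
architecture = registry.create({"pattern": "tiny", "num_conv_blocks": 2})
print(architecture)
```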

class ArchitectureScorer(max_parameters: int | None = None, max_memory_mb: float | None = None, target_flops: float | None = None)

Bases: object

Score architectures based on various metrics.

Evaluates architectures by parameter count, FLOPs, memory usage, etc.

Examples

>>> scorer = ArchitectureScorer()
>>> config = ArchitectureConfig(
...     input_shape=(3, 224, 224),
...     num_classes=1000,
...     num_conv_blocks=4
... )
>>> score = scorer.score_config(config)

score_config(config: ArchitectureConfig) → float

Score architecture configuration.

Parameters: config – Architecture configuration
Returns: Score (higher is better)

class ArchitectureComparator(configs: List[ArchitectureConfig], scorer: ArchitectureScorer | None = None)

Bases: object

Compare multiple architectures.

Examples

>>> configs = [
...     ArchitectureConfig(...),
...     ArchitectureConfig(...),
... ]
>>> comparator = ArchitectureComparator(configs)
>>> best = comparator.get_best()

get_best() → Tuple[ArchitectureConfig, float]: Get best configuration.

get_statistics() → Dict[str, float]: Get score statistics.

get_top_k(k: int = 3) → List[Tuple[ArchitectureConfig, float]]: Get top k configurations.

get_worst() → Tuple[ArchitectureConfig, float]: Get worst configuration.
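
Since higher scores are better, these accessors reduce to sorting (config, score) pairs by score. The sketch below uses strings in place of configurations and made-up scores; the helper names and the statistics keys are assumptions, as the reference does not list the keys returned by `get_statistics()`.

```python
import statistics
from typing import Any, Dict, List, Tuple

Scored = Tuple[Any, float]


def rank(scored: List[Scored]) -> List[Scored]:
    """Sort (config, score) pairs from best to worst."""
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def top_k(scored: List[Scored], k: int = 3) -> List[Scored]:
    return rank(scored)[:k]


def score_statistics(scored: List[Scored]) -> Dict[str, float]:
    scores = [score for _, score in scored]
    # Assumed keys; the real method may report different statistics.
    return {"mean": statistics.mean(scores), "min": min(scores), "max": max(scores)}


# Placeholder configs and illustrative scores.
scored = [("config_a", 0.4), ("config_b", 0.9), ("config_c", 0.7)]
best = rank(scored)[0]
worst = rank(scored)[-1]
print(best, worst, top_k(scored, k=2))
```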

class ArchitecturePattern(value)

Bases: Enum

Predefined architecture patterns.

SEQUENTIAL = 'sequential'

RESIDUAL = 'residual'

DENSE = 'dense'

INCEPTION = 'inception'

MIXED = 'mixed'

expand_config_list(value: Any | List[Any], length: int) → List[Any]

Expand single value or list to specified length.

Parameters:

value – Single value or list
length – Target length

Returns:

List of specified length

Raises:

ValueError – If list length doesn’t match target
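
The documented behavior can be sketched in a few lines: a single value is repeated, and a list is accepted only if it already has the target length. `expand` is a hypothetical re-implementation for illustration; whether the real function also accepts tuples is not stated.

```python
from typing import Any, List


def expand(value: Any, length: int) -> List[Any]:
    """Broadcast a single value, or check a list, to the given length."""
    if isinstance(value, list):
        if len(value) != length:
            raise ValueError(
                f"expected a list of length {length}, got {len(value)}"
            )
        return list(value)  # copy so callers cannot mutate the input
    return [value] * length


kernel_sizes = expand(3, 4)            # single value repeated per block
strides = expand([1, 2, 2, 1], 4)      # list of matching length kept as is
print(kernel_sizes, strides)
```

This is what lets fields such as kernel_sizes or dropout_rates be given either once for all blocks or per block.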

sample_architecture(num_conv_blocks: int, min_channels: int = 32, max_channels: int = 512, min_kernel: int = 3, max_kernel: int = 7) → ArchitectureConfig

Generate random architecture.

Parameters:

num_conv_blocks – Number of conv blocks
min_channels – Minimum channels
max_channels – Maximum channels
min_kernel – Minimum kernel size
max_kernel – Maximum kernel size

Returns:

Random ArchitectureConfig
