Utilities

Utils module: Utilities (summary, validation, FLOPs calculation).

Provides model analysis tools, including:

  • Model summaries with parameter and memory analysis

  • Architecture validation and dimension tracking

  • Memory and FLOPs estimation

calculate_output_shape(model: Module, input_shape: Tuple[int, ...], device: str = 'cpu') → Dict[str, Any]

Calculate the output shape by running a forward pass with a dummy input.

Parameters:
  • model – PyTorch model to analyze.

  • input_shape – Input tensor shape (including batch dimension).

  • device – Device to run the model on (‘cpu’ or ‘cuda’).

Returns:

Dictionary containing:

  • ‘output_shape’: Shape of the output tensor

  • ‘output_size’: Total number of elements in the output

  • ‘success’: Whether the calculation succeeded

Raises:

ValueError – If input shape is invalid.

get_layer_summary(model: Module, input_shape: Tuple[int, ...], device: str = 'cpu') → List[Dict[str, Any]]

Get layer-by-layer summary with output shapes.

Hooks into model layers to capture output shapes during forward pass.

Parameters:
  • model – PyTorch model to analyze.

  • input_shape – Input tensor shape (including batch dimension).

  • device – Device to run the model on.

Returns:

List of dictionaries, one per layer, each containing:

  • ‘name’: Layer name/path

  • ‘type’: Layer type (class name)

  • ‘output_shape’: Output shape of the layer

  • ‘parameters’: Number of parameters

  • ‘trainable’: Whether the layer is trainable

count_parameters_by_type(model: Module) → Dict[str, Dict[str, int]]

Count parameters grouped by layer type.

Parameters:

model – PyTorch model to analyze.

Returns:

Dictionary mapping each layer type to a dictionary containing:

  • ‘total’: Total parameters of this type

  • ‘trainable’: Trainable parameters

  • ‘count’: Number of layers of this type

get_memory_usage(model: Module, input_shape: Tuple[int, ...], device: str = 'cpu') → Dict[str, float | str]

Estimate model memory usage.

Parameters:
  • model – PyTorch model to analyze.

  • input_shape – Input tensor shape (including batch dimension).

  • device – Device to run the model on.

Returns:

Dictionary containing:

  • ‘parameter_memory_mb’: Memory used by parameters

  • ‘activation_memory_mb’: Estimated activation memory

  • ‘total_memory_mb’: Total estimated memory

  • ‘device’: Device used for estimation

get_model_flops(model: Module, input_shape: Tuple[int, ...]) → Dict[str, int | str]

Estimate model FLOPs (floating point operations).

Note: This is an estimation based on standard layer operations. Actual FLOPs may vary based on implementation details.

Parameters:
  • model – PyTorch model to analyze.

  • input_shape – Input tensor shape (including batch dimension).

Returns:

Dictionary containing:

  • ‘total_flops’: Total estimated FLOPs

  • ‘total_flops_in_billions’: Total FLOPs in billions

  • ‘success’: Whether the estimation succeeded
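The reference does not say which per-layer formulas the estimate uses. As a rough sketch, the snippet below applies a common convention (one multiply-accumulate counted as two FLOPs) to a convolution and a linear layer. The helper names `conv2d_flops` and `linear_flops` are hypothetical and are not part of this module.

```python
def conv2d_flops(c_in, c_out, kernel, h_out, w_out):
    """FLOPs for one Conv2d forward pass on a single sample.

    Assumption: a multiply-accumulate counts as 2 FLOPs; bias is ignored.
    """
    macs = c_in * kernel * kernel * c_out * h_out * w_out
    return 2 * macs


def linear_flops(in_features, out_features):
    """FLOPs for one Linear forward pass on a single sample (bias ignored)."""
    return 2 * in_features * out_features


# A 3x3 convolution producing 64 channels at 224x224, plus a 512 -> 1000 classifier.
total_flops = conv2d_flops(3, 64, 3, 224, 224) + linear_flops(512, 1000)
summary = {
    "total_flops": total_flops,
    "total_flops_in_billions": total_flops / 1e9,
}
print(summary)
```

Other tools count a multiply-accumulate as one operation, so figures from different estimators can differ by a factor of two.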

print_model_summary(model: Module, input_shape: Tuple[int, ...], verbose: bool = True) → Dict[str, Any]

Print comprehensive model summary.

Parameters:
  • model – PyTorch model to summarize.

  • input_shape – Input tensor shape (including batch dimension).

  • verbose – Whether to print detailed information.

Returns:

Dictionary containing all summary information.

validate_spatial_dimensions(input_shape: Tuple[int, int, int], num_conv_blocks: int, pooling_kernel_size: int = 2, pooling_stride: int = 2, min_final_size: int = 4) → Dict[str, any]

Validate that spatial dimensions don’t become too small.

Parameters:
  • input_shape – Input shape as (C, H, W).

  • num_conv_blocks – Number of convolutional blocks.

  • pooling_kernel_size – Pooling kernel size.

  • pooling_stride – Pooling stride.

  • min_final_size – Minimum acceptable final spatial size.

Returns:

Dictionary containing:

  • ‘valid’: Whether dimensions are valid

  • ‘input_spatial’: Original spatial dimensions

  • ‘final_spatial’: Final spatial dimensions after pooling

  • ‘num_pooling_ops’: Number of pooling operations

  • ‘warnings’: List of warning messages
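To illustrate the kind of arithmetic such a check involves, the sketch below tracks the spatial size through repeated pooling. It assumes one unpadded pooling operation per convolutional block with floor rounding, which the reference does not state; `pooled_size` and `track_spatial` are hypothetical names.

```python
def pooled_size(size, kernel=2, stride=2):
    """Output length along one axis after a single unpadded pooling op."""
    return (size - kernel) // stride + 1


def track_spatial(height, width, num_blocks, kernel=2, stride=2):
    """Return the (H, W) pair before pooling and after each pooling op."""
    sizes = [(height, width)]
    for _ in range(num_blocks):
        height = pooled_size(height, kernel, stride)
        width = pooled_size(width, kernel, stride)
        sizes.append((height, width))
    return sizes


sizes = track_spatial(32, 32, num_blocks=4)
min_final_size = 4
final_ok = min(sizes[-1]) >= min_final_size
print(sizes, final_ok)
```

Under these assumptions a 32×32 input shrinks to 2×2 after four blocks, which falls below a minimum final size of 4, while three blocks would leave 4×4.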

validate_channel_progression(channels: List[int], num_conv_blocks: int, max_channels: int = 2048) → Dict[str, any]

Validate channel progression configuration.

Parameters:
  • channels – List of channel sizes per block.

  • num_conv_blocks – Expected number of blocks.

  • max_channels – Maximum allowed channels per layer.

Returns:

Dictionary containing:

  • ‘valid’: Whether the configuration is valid

  • ‘warnings’: List of warning messages

  • ‘growth_rate’: Average growth per block
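A minimal sketch of checks along these lines follows. It assumes that "average growth per block" means the mean ratio between consecutive channel counts and that any warning makes the configuration invalid; neither is confirmed by the reference, and `check_channels` is a hypothetical name.

```python
def check_channels(channels, num_conv_blocks, max_channels=2048):
    """Check a channel list against an expected length and width limit."""
    warnings = []
    if len(channels) != num_conv_blocks:
        warnings.append(
            f"expected {num_conv_blocks} channel entries, got {len(channels)}"
        )
    too_wide = [c for c in channels if c > max_channels]
    if too_wide:
        warnings.append(f"channels above {max_channels}: {too_wide}")

    # Mean ratio between consecutive blocks (assumed meaning of growth rate).
    ratios = [b / a for a, b in zip(channels, channels[1:]) if a > 0]
    growth_rate = sum(ratios) / len(ratios) if ratios else 1.0

    return {"valid": not warnings, "warnings": warnings, "growth_rate": growth_rate}


good = check_channels([64, 128, 256, 512], num_conv_blocks=4)
bad = check_channels([64, 4096], num_conv_blocks=3)
print(good)
print(bad)
```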

validate_architecture(input_shape: Tuple[int, int, int], num_classes: int, num_conv_blocks: int, channels: List[int] | None = None, pooling_kernel_size: int = 2, pooling_stride: int = 2) → Dict[str, any]

Validate complete architecture configuration.

Parameters:
  • input_shape – Input shape as (C, H, W).

  • num_classes – Number of output classes.

  • num_conv_blocks – Number of convolutional blocks.

  • channels – Channel progression (optional).

  • pooling_kernel_size – Pooling kernel size.

  • pooling_stride – Pooling stride.

Returns:

Dictionary containing validation results for all checks.

estimate_model_size(num_conv_blocks: int, channels: List[int], num_classes: int, include_batch_norm: bool = True, include_dropout: bool = True) → Dict[str, any]

Estimate model size without actually creating it.

Parameters:
  • num_conv_blocks – Number of convolutional blocks.

  • channels – Channel progression.

  • num_classes – Number of output classes.

  • include_batch_norm – Whether batch norm is included.

  • include_dropout – Whether dropout is included.

Returns:

Dictionary containing estimated parameter counts.
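Parameter counts can be derived from layer shapes alone, which is what makes an estimate possible without building the model. The sketch below assumes 3×3 convolutions with bias, batch normalization with a learnable scale and shift, and a single linear classifier after global pooling. The input channel count and kernel size are extra arguments introduced for illustration; the documented signature does not take them, and the helper names are hypothetical.

```python
def conv_block_params(c_in, c_out, kernel=3, batch_norm=True):
    """Parameters in one conv block: conv weights + bias, plus optional BN."""
    params = c_in * c_out * kernel * kernel + c_out
    if batch_norm:
        params += 2 * c_out  # learnable scale and shift per channel
    return params


def estimate_params(in_channels, channels, num_classes, kernel=3, batch_norm=True):
    """Sum parameters over all conv blocks and a final linear classifier."""
    total = 0
    prev = in_channels
    for c_out in channels:
        total += conv_block_params(prev, c_out, kernel, batch_norm)
        prev = c_out
    total += prev * num_classes + num_classes  # classifier weights + bias
    return total


estimate = estimate_params(3, [64, 128], num_classes=10)
print(estimate)
```

Dropout layers have no learnable parameters, so including them does not change a count of this kind.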

predict_memory_usage(num_conv_blocks: int, channels: List[int], num_classes: int, batch_size: int = 32, input_height: int = 224, input_width: int = 224) → Dict[str, float]

Predict memory usage without running the model.

Parameters:
  • num_conv_blocks – Number of convolutional blocks.

  • channels – Channel progression.

  • num_classes – Number of output classes.

  • batch_size – Batch size for training.

  • input_height – Input height.

  • input_width – Input width.

Returns:

Dictionary with estimated memory in MB.
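As a back-of-the-envelope illustration of predicting memory from shapes, the sketch below sums the feature-map sizes of each block. It assumes float32 values, 1 MB = 1024² bytes, size-preserving convolutions, and a halving of height and width after every block; the reference confirms none of these, and the function names are hypothetical.

```python
BYTES_PER_FLOAT32 = 4
BYTES_PER_MB = 1024 ** 2


def activation_memory_mb(batch_size, channels, height, width):
    """Memory of the conv outputs for one forward pass, in MB."""
    total_elements = 0
    for c_out in channels:
        total_elements += batch_size * c_out * height * width
        # Assumed 2x2 pooling with stride 2 after each block.
        height, width = height // 2, width // 2
    return total_elements * BYTES_PER_FLOAT32 / BYTES_PER_MB


def parameter_memory_mb(num_parameters):
    """Memory of the weights alone, in MB."""
    return num_parameters * BYTES_PER_FLOAT32 / BYTES_PER_MB


activations_mb = activation_memory_mb(32, [64, 128], 224, 224)
print(activations_mb, parameter_memory_mb(262144))
```

Training typically needs more than this, since gradients and optimizer state are stored alongside the activations.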

class ArchitectureConfig(input_shape: Tuple[int, int, int], num_classes: int, num_conv_blocks: int = 4, channels: str | List[int] = 'auto', kernel_sizes: int | List[int] = 3, strides: int | List[int] = 1, activations: str | List[str] = 'relu', dropout_rates: float | List[float] = 0.0, use_batchnorm: bool = True, pattern: str = 'sequential', use_attention: bool = False, use_residual: bool = False, use_dense: bool = False)

Bases: object

Architecture configuration dataclass.

input_shape

Tuple of (channels, height, width)

Type:

Tuple[int, int, int]

num_classes

Number of output classes

Type:

int

num_conv_blocks

Number of convolutional blocks

Type:

int

channels

List of channel sizes or ‘auto’

Type:

str | List[int]

kernel_sizes

List of kernel sizes or single value

Type:

int | List[int]

strides

List of strides or single value

Type:

int | List[int]

activations

List of activation names or single name

Type:

str | List[str]

dropout_rates

List of dropout rates or single value

Type:

float | List[float]

use_batchnorm

Whether to use batch normalization

Type:

bool

pattern

Architecture pattern

Type:

str

use_attention

Whether to use attention mechanisms

Type:

bool

use_residual

Whether to use residual connections

Type:

bool

use_dense

Whether to use dense connections

Type:

bool

activations: str | List[str] = 'relu'
channels: str | List[int] = 'auto'
dropout_rates: float | List[float] = 0.0

classmethod from_dict(data: Dict[str, Any]) → ArchitectureConfig

Create from dictionary.

kernel_sizes: int | List[int] = 3
num_conv_blocks: int = 4
pattern: str = 'sequential'
strides: int | List[int] = 1

to_dict() → Dict[str, Any]

Convert to dictionary.
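A dictionary round trip for a dataclass can be sketched with the standard library as shown below. `MiniConfig` is a hypothetical stand-in carrying a few of the fields listed above; it is not the library's implementation, and the real methods may treat unknown keys or nested values differently.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, List, Tuple, Union


@dataclass
class MiniConfig:
    """Hypothetical stand-in with a subset of the documented fields."""

    input_shape: Tuple[int, int, int]
    num_classes: int
    num_conv_blocks: int = 4
    channels: Union[str, List[int]] = "auto"

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "MiniConfig":
        # Ignore unknown keys (an assumption) and restore the tuple type,
        # which formats such as JSON turn into a list.
        known = {f.name for f in fields(cls)}
        kwargs = {k: v for k, v in data.items() if k in known}
        kwargs["input_shape"] = tuple(kwargs["input_shape"])
        return cls(**kwargs)


cfg = MiniConfig(input_shape=(3, 32, 32), num_classes=10)
restored = MiniConfig.from_dict(cfg.to_dict())
print(restored == cfg)
```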

use_attention: bool = False
use_batchnorm: bool = True
use_dense: bool = False
use_residual: bool = False

validate() → bool

Validate configuration.

Returns:

True if valid

Raises:

ValueError – If invalid

input_shape: Tuple[int, int, int]
num_classes: int

class GridSearch(search_space: Dict[str, List[Any]], base_config: Dict[str, Any] | None = None)

Bases: object

Grid search over architecture hyperparameters.

Systematically explores all combinations of provided parameters.

Examples

>>> search_space = {
...     'num_conv_blocks': [3, 4, 5],
...     'channels': [[64, 128, 256], [32, 64, 128, 256], [64, 128, 256, 512]],
...     'activation': ['relu', 'gelu'],
... }
>>> searcher = GridSearch(search_space)
>>> configs = list(searcher.generate())
>>> len(configs)  # 3 * 3 * 2 = 18

generate()

Generate all configurations.

Yields:

ArchitectureConfig objects

generate_with_index()

Generate configurations with their index.

Yields:

Tuple of (index, ArchitectureConfig)
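Exploring all combinations of a search space is a Cartesian product over the value lists. The sketch below shows the idea with plain dictionaries and `itertools.product`. The function `grid` is hypothetical; the documented class yields ArchitectureConfig objects rather than dictionaries.

```python
from itertools import product


def grid(search_space, base_config=None):
    """Yield one dict per combination, layered on top of base_config."""
    base = dict(base_config or {})
    keys = list(search_space)
    for values in product(*(search_space[key] for key in keys)):
        yield {**base, **dict(zip(keys, values))}


space = {"num_conv_blocks": [3, 4, 5], "activation": ["relu", "gelu"]}
configs = list(grid(space, base_config={"num_classes": 10}))

# Pairing each configuration with its position mirrors generate_with_index().
for index, config in enumerate(configs):
    print(index, config)
```

The number of combinations is the product of the list lengths, so the grid grows quickly as parameters are added.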

class RandomSearch(search_space: Dict[str, List[Any]], num_samples: int = 10, base_config: Dict[str, Any] | None = None, seed: int | None = None)

Bases: object

Random search over architecture hyperparameters.

Randomly samples from provided parameter distributions.

Examples

>>> search_space = {
...     'num_conv_blocks': [3, 4, 5, 6],
...     'activation': ['relu', 'gelu', 'leaky_relu'],
...     'dropout_rate': [0.0, 0.1, 0.2, 0.3],
... }
>>> searcher = RandomSearch(search_space, num_samples=100)
>>> configs = list(searcher.generate())

generate()

Generate random configurations.

Yields:

ArchitectureConfig objects

generate_with_index()

Generate random configurations with their index.

Yields:

Tuple of (index, ArchitectureConfig)
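The effect of a seed on random sampling can be sketched with a private `random.Random` instance, as below. This assumes each parameter is drawn uniformly and independently from its list, which the reference does not specify; `random_configs` is a hypothetical name and yields plain dictionaries.

```python
import random


def random_configs(search_space, num_samples=10, seed=None):
    """Yield num_samples dicts, each picking one option per parameter."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    for _ in range(num_samples):
        yield {key: rng.choice(options) for key, options in search_space.items()}


space = {
    "num_conv_blocks": [3, 4, 5, 6],
    "activation": ["relu", "gelu", "leaky_relu"],
    "dropout_rate": [0.0, 0.1, 0.2, 0.3],
}
run_a = list(random_configs(space, num_samples=5, seed=0))
run_b = list(random_configs(space, num_samples=5, seed=0))
print(run_a == run_b)
```

Reusing the same seed reproduces the same sequence of configurations, which makes a search repeatable.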

class ArchitectureFactory

Bases: object

Factory for generating architectures from patterns and configurations.

Simplifies creating models from predefined patterns.

Examples

>>> factory = ArchitectureFactory()
>>> config = ArchitectureConfig(
...     input_shape=(3, 224, 224),
...     num_classes=1000,
...     pattern='residual'
... )
>>> architecture_dict = factory.create(config)

create(config: ArchitectureConfig) → Dict[str, Any]

Create architecture from configuration.

Parameters:

config – Architecture configuration

Returns:

Dictionary with architecture information

Raises:

ValueError – If pattern is not recognized

register_pattern(name: str, builder: Callable[[ArchitectureConfig], Dict[str, Any]])

Register custom architecture pattern.

Parameters:
  • name – Pattern name

  • builder – Function that creates architecture from config
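One common way to support custom patterns is a name-to-builder registry. The sketch below shows that idea with plain dictionaries standing in for ArchitectureConfig. `PatternRegistry` and `tiny_builder` are hypothetical, and the real factory's internals may differ.

```python
from typing import Any, Callable, Dict

Builder = Callable[[Dict[str, Any]], Dict[str, Any]]


class PatternRegistry:
    """Hypothetical registry mapping pattern names to builder functions."""

    def __init__(self) -> None:
        self._builders: Dict[str, Builder] = {}

    def register_pattern(self, name: str, builder: Builder) -> None:
        self._builders[name] = builder

    def create(self, config: Dict[str, Any]) -> Dict[str, Any]:
        pattern = config.get("pattern", "sequential")
        if pattern not in self._builders:
            raise ValueError(f"Unknown pattern: {pattern!r}")
        return self._builders[pattern](config)


def tiny_builder(config: Dict[str, Any]) -> Dict[str, Any]:
    """Example builder: describes a single-block architecture."""
    return {"pattern": "tiny", "blocks": 1, "num_classes": config["num_classes"]}


registry = PatternRegistry()
registry.register_pattern("tiny", tiny_builder)
architecture = registry.create({"pattern": "tiny", "num_classes": 10})
print(architecture)
```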

class ArchitectureScorer(max_parameters: int | None = None, max_memory_mb: float | None = None, target_flops: float | None = None)

Bases: object

Score architectures based on various metrics.

Evaluates architectures by parameter count, FLOPs, memory usage, etc.

Examples

>>> scorer = ArchitectureScorer()
>>> config = ArchitectureConfig(
...     input_shape=(3, 224, 224),
...     num_classes=1000,
...     num_conv_blocks=4
... )
>>> score = scorer.score_config(config)

score_config(config: ArchitectureConfig) → float

Score architecture configuration.

Parameters:

config – Architecture configuration

Returns:

Score (higher is better)

class ArchitectureComparator(configs: List[ArchitectureConfig], scorer: ArchitectureScorer | None = None)

Bases: object

Compare multiple architectures.

Examples

>>> configs = [
...     ArchitectureConfig(...),
...     ArchitectureConfig(...),
... ]
>>> comparator = ArchitectureComparator(configs)
>>> best = comparator.get_best()

get_best() → Tuple[ArchitectureConfig, float]

Get best configuration.

get_statistics() → Dict[str, float]

Get score statistics.

get_top_k(k: int = 3) → List[Tuple[ArchitectureConfig, float]]

Get top k configurations.

get_worst() → Tuple[ArchitectureConfig, float]

Get worst configuration.
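Since higher scores are better, the best, worst and top-k queries reduce to sorting (configuration, score) pairs. The sketch below uses plain dictionaries and a made-up scoring function. `rank` and `toy_score` are hypothetical, and the statistics keys are illustrative rather than the library's actual output.

```python
from statistics import mean


def rank(configs, score_fn):
    """Return (config, score) pairs sorted from highest to lowest score."""
    scored = [(cfg, score_fn(cfg)) for cfg in configs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_score(cfg):
    """Made-up score that prefers fewer blocks (illustration only)."""
    return 1.0 / cfg["num_conv_blocks"]


ranked = rank([{"num_conv_blocks": n} for n in (4, 2, 8)], toy_score)
best, worst, top_2 = ranked[0], ranked[-1], ranked[:2]
stats = {
    "mean": mean(score for _, score in ranked),
    "max": ranked[0][1],
    "min": ranked[-1][1],
}
print(best, worst, stats)
```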

class ArchitecturePattern(value)

Bases: Enum

Predefined architecture patterns.

SEQUENTIAL = 'sequential'
RESIDUAL = 'residual'
DENSE = 'dense'
INCEPTION = 'inception'
MIXED = 'mixed'

expand_config_list(value: Any | List[Any], length: int) → List[Any]

Expand single value or list to specified length.

Parameters:
  • value – Single value or list

  • length – Target length

Returns:

List of specified length

Raises:

ValueError – If list length doesn’t match target
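The described behaviour (broadcast a single value, accept a list only if its length matches) can be sketched as follows. `expand` is a hypothetical re-implementation written from the description above; whether the real function also accepts tuples or other sequences is not stated.

```python
def expand(value, length):
    """Return a list of `length` items from a scalar or a matching list."""
    if isinstance(value, list):
        if len(value) != length:
            raise ValueError(
                f"expected a list of length {length}, got {len(value)}"
            )
        return list(value)  # copy so the caller's list is not shared
    return [value] * length


kernel_sizes = expand(3, 4)
strides = expand([1, 2], 2)
print(kernel_sizes, strides)
```

This is the pattern that lets fields such as kernel_sizes or dropout_rates accept either one value for all blocks or one value per block.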

sample_architecture(num_conv_blocks: int, min_channels: int = 32, max_channels: int = 512, min_kernel: int = 3, max_kernel: int = 7) → ArchitectureConfig

Generate random architecture.

Parameters:
  • num_conv_blocks – Number of conv blocks

  • min_channels – Minimum channels

  • max_channels – Maximum channels

  • min_kernel – Minimum kernel size

  • max_kernel – Maximum kernel size

Returns:

Random ArchitectureConfig
