Utilities
Utils module: utilities for model summaries, validation, and FLOPs calculation.
Provides model analysis tools, including:
- Model summaries with parameter and memory analysis
- Architecture validation and dimension tracking
- Memory and FLOPs estimation
- calculate_output_shape(model: Module, input_shape: Tuple[int, ...], device: str = 'cpu') → Dict[str, Any]
Calculate output shape by forward pass with dummy input.
- Parameters:
model – PyTorch model to analyze.
input_shape – Input tensor shape (including batch dimension).
device – Device to run the model on (‘cpu’ or ‘cuda’).
- Returns:
Dictionary containing:
‘output_shape’: Shape of the output tensor
‘output_size’: Total elements in output
‘success’: Whether calculation succeeded
- Raises:
ValueError – If input shape is invalid.
- get_layer_summary(model: Module, input_shape: Tuple[int, ...], device: str = 'cpu') → List[Dict[str, Any]]
Get layer-by-layer summary with output shapes.
Hooks into model layers to capture output shapes during forward pass.
- Parameters:
model – PyTorch model to analyze.
input_shape – Input tensor shape (including batch dimension).
device – Device to run the model on.
- Returns:
List of dictionaries containing layer information:
‘name’: Layer name/path
‘type’: Layer type (class name)
‘output_shape’: Output shape of the layer
‘parameters’: Number of parameters
‘trainable’: Whether layer is trainable
- count_parameters_by_type(model: Module) → Dict[str, Dict[str, int]]
Count parameters grouped by layer type.
- Parameters:
model – PyTorch model to analyze.
- Returns:
Dictionary mapping each layer type to a dictionary containing:
‘total’: Total parameters of this type
‘trainable’: Trainable parameters
‘count’: Number of layers of this type
- get_memory_usage(model: Module, input_shape: Tuple[int, ...], device: str = 'cpu') → Dict[str, float | str]
Estimate model memory usage.
- Parameters:
model – PyTorch model to analyze.
input_shape – Input tensor shape (including batch dimension).
device – Device to run the model on.
- Returns:
Dictionary containing:
‘parameter_memory_mb’: Memory used by parameters
‘activation_memory_mb’: Estimated activation memory
‘total_memory_mb’: Total estimated memory
‘device’: Device used for estimation
- get_model_flops(model: Module, input_shape: Tuple[int, ...]) → Dict[str, int | str]
Estimate model FLOPs (floating point operations).
Note: This is an estimation based on standard layer operations. Actual FLOPs may vary based on implementation details.
- Parameters:
model – PyTorch model to analyze.
input_shape – Input tensor shape (including batch dimension).
- Returns:
Dictionary containing:
‘total_flops’: Total estimated FLOPs
‘total_flops_in_billions’: Total FLOPs in billions
‘success’: Whether estimation succeeded
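To illustrate what an estimate "based on standard layer operations" can look like, the sketch below counts FLOPs for one 2D convolution and one linear layer, treating each multiply-add as two FLOPs. The helpers `conv2d_flops` and `linear_flops` are hypothetical and not part of this module; the counting convention used by get_model_flops may differ.

```python
def conv2d_flops(c_in, c_out, kernel_size, h_out, w_out, bias=True):
    """Approximate FLOPs for one Conv2d forward pass on a single sample."""
    # Each output element needs c_in * k * k multiply-adds (2 FLOPs each).
    macs_per_output = c_in * kernel_size * kernel_size
    outputs = c_out * h_out * w_out
    flops = 2 * macs_per_output * outputs
    if bias:
        flops += outputs  # one addition per output element
    return flops


def linear_flops(in_features, out_features, bias=True):
    """Approximate FLOPs for one Linear forward pass on a single sample."""
    flops = 2 * in_features * out_features
    if bias:
        flops += out_features
    return flops


# Assumed example layers: a 3x3 conv on a 224x224 RGB image and a classifier.
total = conv2d_flops(3, 64, 3, 224, 224) + linear_flops(512, 1000)
print(total, total / 1e9)  # raw count and the same value in billions
```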
- print_model_summary(model: Module, input_shape: Tuple[int, ...], verbose: bool = True) → Dict[str, Any]
Print comprehensive model summary.
- Parameters:
model – PyTorch model to summarize.
input_shape – Input tensor shape (including batch dimension).
verbose – Whether to print detailed information.
- Returns:
Dictionary containing all summary information.
- validate_spatial_dimensions(input_shape: Tuple[int, int, int], num_conv_blocks: int, pooling_kernel_size: int = 2, pooling_stride: int = 2, min_final_size: int = 4) → Dict[str, any]
Validate that spatial dimensions don’t become too small.
- Parameters:
input_shape – Input shape as (C, H, W).
num_conv_blocks – Number of convolutional blocks.
pooling_kernel_size – Pooling kernel size.
pooling_stride – Pooling stride.
min_final_size – Minimum acceptable final spatial size.
- Returns:
Dictionary containing:
‘valid’: Whether dimensions are valid
‘input_spatial’: Original spatial dimensions
‘final_spatial’: Final spatial dimensions after pooling
‘num_pooling_ops’: Number of pooling operations
‘warnings’: List of warning messages
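The arithmetic behind this check can be sketched as follows, assuming one unpadded pooling operation per convolutional block and floor rounding. Both are assumptions about the implementation, and `pooled_size` and `final_spatial` are hypothetical helper names.

```python
def pooled_size(size, kernel_size=2, stride=2):
    """Output length of one unpadded pooling op (floor rounding)."""
    return (size - kernel_size) // stride + 1


def final_spatial(height, width, num_pooling_ops, kernel_size=2, stride=2):
    """Apply the pooling formula repeatedly to both spatial dimensions."""
    for _ in range(num_pooling_ops):
        height = pooled_size(height, kernel_size, stride)
        width = pooled_size(width, kernel_size, stride)
    return height, width


print(final_spatial(224, 224, 4))  # (14, 14)
print(final_spatial(32, 32, 4))    # (2, 2), smaller than a min_final_size of 4
```

With the default 2x2 pooling, each block halves the spatial size, so a 32x32 input falls below the default minimum after four blocks.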
- validate_channel_progression(channels: List[int], num_conv_blocks: int, max_channels: int = 2048) → Dict[str, any]
Validate channel progression configuration.
- Parameters:
channels – List of channel sizes per block.
num_conv_blocks – Expected number of blocks.
max_channels – Maximum allowed channels per layer.
- Returns:
Dictionary containing:
‘valid’: Whether configuration is valid
‘warnings’: List of warning messages
‘growth_rate’: Average growth per block
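A minimal sketch of such a check is shown below. It assumes that "growth" means the ratio between consecutive channel counts and that any warning makes the configuration invalid; the library may define both differently. `check_channels` is a hypothetical name.

```python
def check_channels(channels, num_conv_blocks, max_channels=2048):
    """Illustrative channel check; not the library's implementation."""
    warnings = []
    if len(channels) != num_conv_blocks:
        warnings.append(
            f"expected {num_conv_blocks} entries, got {len(channels)}"
        )
    too_wide = [c for c in channels if c > max_channels]
    if too_wide:
        warnings.append(f"channels above {max_channels}: {too_wide}")
    # Assumption: growth is the mean ratio between consecutive blocks.
    ratios = [b / a for a, b in zip(channels, channels[1:])]
    growth_rate = sum(ratios) / len(ratios) if ratios else 1.0
    return {"valid": not warnings, "warnings": warnings,
            "growth_rate": growth_rate}


print(check_channels([64, 128, 256], num_conv_blocks=3))
print(check_channels([64, 4096], num_conv_blocks=3))
```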
- validate_architecture(input_shape: Tuple[int, int, int], num_classes: int, num_conv_blocks: int, channels: List[int] | None = None, pooling_kernel_size: int = 2, pooling_stride: int = 2) → Dict[str, any]
Validate complete architecture configuration.
- Parameters:
input_shape – Input shape as (C, H, W).
num_classes – Number of output classes.
num_conv_blocks – Number of convolutional blocks.
channels – Channel progression (optional).
pooling_kernel_size – Pooling kernel size.
pooling_stride – Pooling stride.
- Returns:
Dictionary containing validation results for all checks.
- estimate_model_size(num_conv_blocks: int, channels: List[int], num_classes: int, include_batch_norm: bool = True, include_dropout: bool = True) → Dict[str, any]
Estimate model size without actually creating it.
- Parameters:
num_conv_blocks – Number of convolutional blocks.
channels – Channel progression.
num_classes – Number of output classes.
include_batch_norm – Whether batch norm is included.
include_dropout – Whether dropout is included.
- Returns:
Dictionary containing estimated parameter counts.
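Parameter counts can be estimated from the channel progression alone. The sketch below assumes 3x3 kernels, a 3-channel input, convolution biases, and a single linear classifier on globally pooled features; none of these choices are confirmed by the reference, and `estimate_parameters` is a hypothetical name. Dropout is omitted because it has no learnable parameters.

```python
def estimate_parameters(channels, num_classes, in_channels=3, kernel_size=3,
                        include_batch_norm=True):
    """Closed-form parameter count for a plain conv stack (illustrative)."""
    conv = bn = 0
    prev = in_channels
    for out in channels:
        # Conv weights (prev * out * k * k) plus one bias per output channel.
        conv += prev * out * kernel_size * kernel_size + out
        if include_batch_norm:
            bn += 2 * out  # learnable scale and shift per channel
        prev = out
    classifier = prev * num_classes + num_classes
    return {"conv": conv, "batch_norm": bn, "classifier": classifier,
            "total": conv + bn + classifier}


est = estimate_parameters([64, 128], num_classes=10)
print(est)  # total is 77322 under the assumptions above
```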
- predict_memory_usage(num_conv_blocks: int, channels: List[int], num_classes: int, batch_size: int = 32, input_height: int = 224, input_width: int = 224) → Dict[str, float]
Predict memory usage without running the model.
- Parameters:
num_conv_blocks – Number of convolutional blocks.
channels – Channel progression.
num_classes – Number of output classes.
batch_size – Batch size for training.
input_height – Input height.
input_width – Input width.
- Returns:
Dictionary with estimated memory in MB.
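Activation memory can be approximated without running a model by summing the sizes of the per-block outputs. The sketch below assumes float32 tensors (4 bytes per element), one stored output per block, and 2x2 pooling that halves the spatial size after each block. These are illustrative assumptions, and `activation_memory_mb` is a hypothetical name.

```python
BYTES_PER_FLOAT32 = 4
MB = 1024 ** 2


def activation_memory_mb(channels, batch_size=32, height=224, width=224):
    """Rough activation memory in MB for a conv stack (illustrative)."""
    total_elements = 0
    for c in channels:
        total_elements += batch_size * c * height * width  # block output
        height, width = height // 2, width // 2            # after 2x2 pooling
    return total_elements * BYTES_PER_FLOAT32 / MB


print(activation_memory_mb([64, 128], batch_size=1))  # 18.375
```

Training typically needs more than this figure, since gradients and optimizer state are stored as well.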
- class ArchitectureConfig(input_shape: Tuple[int, int, int], num_classes: int, num_conv_blocks: int = 4, channels: str | List[int] = 'auto', kernel_sizes: int | List[int] = 3, strides: int | List[int] = 1, activations: str | List[str] = 'relu', dropout_rates: float | List[float] = 0.0, use_batchnorm: bool = True, pattern: str = 'sequential', use_attention: bool = False, use_residual: bool = False, use_dense: bool = False)
Bases: object
Architecture configuration dataclass.
- validate() → bool
Validate configuration.
- Returns:
True if valid
- Raises:
ValueError – If invalid
- class GridSearch(search_space: Dict[str, List[Any]], base_config: Dict[str, Any] | None = None)
Bases: object
Grid search over architecture hyperparameters.
Systematically explores all combinations of provided parameters.
Examples
>>> search_space = {
...     'num_conv_blocks': [3, 4, 5],
...     'channels': [['auto'], [[64, 128, 256], [32, 64, 128, 256], [64, 128, 256, 512]]],
...     'activation': ['relu', 'gelu'],
... }
>>> searcher = GridSearch(search_space)
>>> configs = list(searcher.generate())
>>> len(configs)  # 3 * 3 * 2 = 18
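Exhaustive enumeration of this kind is commonly built on a Cartesian product. The sketch below shows the idea with `itertools.product`; `grid_configs` is a hypothetical stand-in and not necessarily how GridSearch is implemented.

```python
from itertools import product


def grid_configs(search_space, base_config=None):
    """Yield one config dict per combination of the listed values."""
    keys = list(search_space)
    for values in product(*(search_space[k] for k in keys)):
        config = dict(base_config or {})  # copy so the base is not mutated
        config.update(zip(keys, values))
        yield config


space = {"num_conv_blocks": [3, 4, 5], "activation": ["relu", "gelu"]}
configs = list(grid_configs(space, base_config={"use_batchnorm": True}))
print(len(configs))  # 6: one config per (num_conv_blocks, activation) pair
print(configs[0])
```

The number of generated configurations is the product of the list lengths, so it grows quickly as parameters are added.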
- class RandomSearch(search_space: Dict[str, List[Any]], num_samples: int = 10, base_config: Dict[str, Any] | None = None, seed: int | None = None)
Bases: object
Random search over architecture hyperparameters.
Randomly samples from provided parameter distributions.
Examples
>>> search_space = {
...     'num_conv_blocks': [3, 4, 5, 6],
...     'activation': ['relu', 'gelu', 'leaky_relu'],
...     'dropout_rate': [0.0, 0.1, 0.2, 0.3],
... }
>>> searcher = RandomSearch(search_space, num_samples=100)
>>> configs = list(searcher.generate())
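Seeded sampling can be sketched with the standard library as follows. This assumes each parameter is drawn independently and uniformly from its list, which the reference does not state; `random_configs` is a hypothetical name.

```python
import random


def random_configs(search_space, num_samples=10, base_config=None, seed=None):
    """Yield num_samples configs, each value drawn uniformly from its list."""
    rng = random.Random(seed)  # private RNG; global random state is untouched
    for _ in range(num_samples):
        config = dict(base_config or {})
        for key, choices in search_space.items():
            config[key] = rng.choice(choices)
        yield config


space = {
    "num_conv_blocks": [3, 4, 5, 6],
    "activation": ["relu", "gelu", "leaky_relu"],
}
first = list(random_configs(space, num_samples=5, seed=0))
second = list(random_configs(space, num_samples=5, seed=0))
print(first == second)  # True: the same seed reproduces the same samples
```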
- class ArchitectureFactory
Bases: object
Factory for generating architectures from patterns and configurations.
Simplifies creating models from predefined patterns.
Examples
>>> factory = ArchitectureFactory()
>>> config = ArchitectureConfig(
...     input_shape=(3, 224, 224),
...     num_classes=1000,
...     pattern='residual'
... )
>>> architecture_dict = factory.create(config)
- create(config: ArchitectureConfig) → Dict[str, Any]
Create architecture from configuration.
- Parameters:
config – Architecture configuration
- Returns:
Dictionary with architecture information
- Raises:
ValueError – If pattern is not recognized
- class ArchitectureScorer(max_parameters: int | None = None, max_memory_mb: float | None = None, target_flops: float | None = None)
Bases: object
Score architectures based on various metrics.
Evaluates architectures by parameter count, FLOPs, memory usage, etc.
Examples
>>> scorer = ArchitectureScorer()
>>> config = ArchitectureConfig(
...     input_shape=(3, 224, 224),
...     num_classes=1000,
...     num_conv_blocks=4
... )
>>> score = scorer.score_config(config)
- score_config(config: ArchitectureConfig) → float
Score architecture configuration.
- Parameters:
config – Architecture configuration
- Returns:
Score (higher is better)
- class ArchitectureComparator(configs: List[ArchitectureConfig], scorer: ArchitectureScorer | None = None)
Bases: object
Compare multiple architectures.
Examples
>>> configs = [
...     ArchitectureConfig(...),
...     ArchitectureConfig(...),
... ]
>>> comparator = ArchitectureComparator(configs)
>>> best = comparator.get_best()
- get_best() → Tuple[ArchitectureConfig, float]
Get best configuration.
- get_worst() → Tuple[ArchitectureConfig, float]
Get worst configuration.
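Both methods return a (configuration, score) pair. Since higher scores are better, selecting the best and worst entries amounts to a max and a min over scored configurations. The sketch below uses plain dictionaries and a toy scoring function as hypothetical stand-ins for ArchitectureConfig and ArchitectureScorer.

```python
def best_and_worst(configs, score_fn):
    """Return ((best_config, score), (worst_config, score))."""
    scored = [(cfg, score_fn(cfg)) for cfg in configs]
    best = max(scored, key=lambda pair: pair[1])
    worst = min(scored, key=lambda pair: pair[1])
    return best, worst


# Hypothetical stand-ins: dicts as configs, and a toy score that
# prefers fewer convolutional blocks.
configs = [{"num_conv_blocks": n} for n in (3, 4, 5)]
best, worst = best_and_worst(configs, lambda cfg: -cfg["num_conv_blocks"])
print(best)   # ({'num_conv_blocks': 3}, -3)
print(worst)  # ({'num_conv_blocks': 5}, -5)
```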
- class ArchitecturePattern(value)
Bases: Enum
Predefined architecture patterns.
- SEQUENTIAL = 'sequential'
- RESIDUAL = 'residual'
- DENSE = 'dense'
- INCEPTION = 'inception'
- MIXED = 'mixed'
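Because the members carry string values, a pattern name such as the `pattern` argument of ArchitectureConfig can be mapped to a member by value lookup. The sketch below re-declares the enum locally, mirroring the members listed above, so that it runs without the library installed.

```python
from enum import Enum


class ArchitecturePattern(Enum):
    """Local mirror of the documented members, for illustration only."""
    SEQUENTIAL = "sequential"
    RESIDUAL = "residual"
    DENSE = "dense"
    INCEPTION = "inception"
    MIXED = "mixed"


pattern = ArchitecturePattern("residual")  # lookup by value
print(pattern is ArchitecturePattern.RESIDUAL)  # True

try:
    ArchitecturePattern("unknown")
except ValueError as exc:
    print(f"rejected: {exc}")  # unrecognized values raise ValueError
```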
- expand_config_list(value: Any | List[Any], length: int) → List[Any]
Expand single value or list to specified length.
- Parameters:
value – Single value or list
length – Target length
- Returns:
List of specified length
- Raises:
ValueError – If list length doesn’t match target
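The documented behavior can be sketched in a few lines. `expand` below is an illustrative re-implementation based on the description above, not the library's source, and the error message is an assumption.

```python
def expand(value, length):
    """Broadcast a scalar to a list, or validate an existing list's length."""
    if isinstance(value, list):
        if len(value) != length:
            raise ValueError(f"expected {length} values, got {len(value)}")
        return list(value)  # copy so the caller's list is not shared
    return [value] * length


print(expand(3, 4))          # [3, 3, 3, 3]
print(expand([3, 5, 7], 3))  # [3, 5, 7]
```

This is the pattern that lets arguments such as `kernel_sizes` accept either one value for all blocks or one value per block.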
- sample_architecture(num_conv_blocks: int, min_channels: int = 32, max_channels: int = 512, min_kernel: int = 3, max_kernel: int = 7) → ArchitectureConfig
Generate random architecture.
- Parameters:
num_conv_blocks – Number of conv blocks
min_channels – Minimum channels
max_channels – Maximum channels
min_kernel – Minimum kernel size
max_kernel – Maximum kernel size
- Returns:
Random ArchitectureConfig