Tune Hyperparameters

All training hyperparameters live in a single frozen dataclass in config.py. This guide covers the most impactful knobs and when to turn them.

What are hyperparameters?

Training a PINN is an optimization process: the network's parameters are adjusted iteratively to minimize a loss function. Hyperparameters are the settings that control how this optimization runs: how fast parameters change per step (learning rate), how many steps to take (epochs), and how to balance competing objectives (loss weights). They are not learned automatically; you choose them.


The hyperparameter hierarchy

hp = ODEHyperparameters(
    lr=5e-4,                          # Global learning rate
    max_epochs=2000,                  # Training duration
    gradient_clip_val=0.1,            # Gradient clipping threshold
    training_data=GenerationConfig(   # Data configuration
        batch_size=100,
        collocations=6000,
    ),
    fields_config=MLPConfig(...),     # Network for solution fields
    params_config=...,                # Network/scalar for parameters
    scheduler=ReduceLROnPlateauConfig(...),  # LR scheduling
    pde_weight=1,                     # Loss term weights
    ic_weight=1,
    data_weight=1,
)

Learning rate

The learning rate is the single most impactful hyperparameter. Start with 1e-3 for simple problems and decrease to 5e-4 or 1e-4 for harder ones.
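
If the right order of magnitude is unclear, a coarse sweep over short runs is cheap. A minimal sketch, assuming the remaining ODEHyperparameters fields have usable defaults and that train() (hypothetical) is your training entry point:

for lr in (1e-3, 5e-4, 1e-4):
    hp = ODEHyperparameters(lr=lr, max_epochs=500)  # short runs suffice for comparison
    final_loss = train(hp)                          # train() is a hypothetical entry point
    print(f"lr={lr:.0e}: final loss {final_loss:.3e}")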

Use a scheduler

ReduceLROnPlateauConfig automatically decreases the learning rate when the loss plateaus. This is almost always better than a fixed rate:

scheduler=ReduceLROnPlateauConfig(
    mode="min",
    factor=0.5,       # Halve the LR
    patience=55,      # Wait 55 epochs before reducing
    threshold=5e-3,   # Minimum improvement to count
    min_lr=1e-6,      # Don't go below this
)

Network architecture

Fields (solution approximation)

fields_config=MLPConfig(
    in_dim=1,                           # 1 for ODEs, 2+ for PDEs
    out_dim=1,                          # 1 per field
    hidden_layers=[64, 128, 128, 64],   # Width and depth
    activation="tanh",                  # Smooth activation for derivatives
    output_activation=None,             # None for unconstrained output
)
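
For intuition, the configuration above corresponds to a plain feed-forward network. A minimal PyTorch sketch of the equivalent architecture (illustrative, not the library's internal builder):

import torch.nn as nn

# hidden_layers=[64, 128, 128, 64] with tanh between layers and no output activation
model = nn.Sequential(
    nn.Linear(1, 64), nn.Tanh(),
    nn.Linear(64, 128), nn.Tanh(),
    nn.Linear(128, 128), nn.Tanh(),
    nn.Linear(128, 64), nn.Tanh(),
    nn.Linear(64, 1),
)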

Rules of thumb:

  • Start with [64, 128, 128, 64], which works for most problems
  • Add depth (more layers) for complex dynamics before adding width
  • Use tanh as the default activation. It's smooth and works well with automatic differentiation
  • Use softplus as the output activation when the field must stay positive

Parameters

For scalar parameters (constants to recover):

params_config=ScalarConfig(init_value=0.1)
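
Under the hood, a scalar parameter amounts to a single learnable value optimized alongside the network weights. In plain PyTorch terms (an illustrative sketch, not the library's implementation):

import torch

# A single trainable value, initialized like ScalarConfig(init_value=0.1).
theta = torch.nn.Parameter(torch.tensor(0.1))
# theta receives gradients through the physics residual and is updated by the
# same optimizer as the network weights.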

For function-valued parameters (varying over the domain):

params_config=MLPConfig(
    in_dim=1, out_dim=1,
    hidden_layers=[64, 128, 128, 64],
    activation="tanh",
    output_activation="softplus",     # Often needed to keep values positive
)
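
softplus(x) = log(1 + exp(x)) is smooth and strictly positive, so it constrains the output without the hard cutoff of a clamp. A quick check:

import torch
import torch.nn.functional as F

x = torch.linspace(-5.0, 5.0, steps=5)
print(F.softplus(x))  # every entry is > 0, even for negative inputs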

Loss weights

The three weights pde_weight, ic_weight, and data_weight control the balance between physics enforcement, initial conditions, and data fitting.

See Loss Weighting Strategies for detailed guidance.

Quick start: Leave all weights at 1 initially, then increase data_weight if the model doesn't fit observations, or increase pde_weight if the recovered parameters are physically unreasonable.
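
Schematically, the total objective is the weighted sum of the three terms (the loss names below are placeholders; only the weights come from ODEHyperparameters):

import torch

pde_weight, ic_weight, data_weight = 1.0, 1.0, 1.0
loss_pde = torch.tensor(0.02)   # placeholder values for the three residuals
loss_ic = torch.tensor(0.01)
loss_data = torch.tensor(0.05)

total_loss = (
    pde_weight * loss_pde       # physics residual at collocation points
    + ic_weight * loss_ic       # initial-condition mismatch
    + data_weight * loss_data   # misfit against observations
)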


Collocation density

The number of collocation points controls how densely the physics is enforced:

training_data=GenerationConfig(
    collocations=6000,     # More = better physics, slower training
    batch_size=100,        # Points per training step
)

  • Too few: The network may satisfy the equation at sampled points but violate it elsewhere
  • Too many: Training becomes slow with diminishing returns

Start with 1000–5000 for ODEs and 5000–20000 for PDEs.
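
Concretely, collocation points are just sampled locations where the residual is evaluated. A minimal sketch assuming uniform random sampling over a time interval (the library's actual sampler may differ):

import torch

t0, t1 = 0.0, 10.0                               # example domain bounds
t_colloc = t0 + (t1 - t0) * torch.rand(6000, 1)  # collocations=6000 uniform samples
t_colloc.requires_grad_(True)                    # so autograd can form du/dt in the residual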


Gradient clipping

gradient_clip_val=0.1

Prevents exploding gradients during early training. Lower values (0.01–0.1) add stability at the cost of slower convergence. Set to None to disable.
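
In plain PyTorch terms this is gradient-norm clipping applied between the backward pass and the optimizer step (illustrative sketch):

import torch

model = torch.nn.Linear(1, 1)                  # stand-in network
loss = model(torch.randn(8, 1)).pow(2).mean()
loss.backward()
# Rescale the gradients so their global L2 norm is at most 0.1.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.1)
# ...then step the optimizer as usual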


Diagnostic checklist

Symptom | Likely cause | Fix
Loss doesn't decrease | Learning rate too low or too high | Try 1e-3, 5e-4, 1e-4
Loss oscillates wildly | Learning rate too high | Reduce the LR; lower gradient_clip_val
Good data fit, bad parameter recovery | pde_weight too low | Increase pde_weight
Smooth but wrong solution | Too few collocation points | Increase collocations
Training is very slow | Network too wide or too many collocation points | Reduce hidden_layers or collocations