# Tune Hyperparameters
All training hyperparameters live in a single frozen dataclass in config.py.
This guide covers the most impactful knobs and when to turn them.
## What are hyperparameters?
Training a PINN is an optimization process: the network's parameters are adjusted iteratively to minimize a loss function. Hyperparameters are the settings that control how this optimization runs: how fast parameters change per step (learning rate), how many steps to take (epochs), and how to balance competing objectives (loss weights). They are not learned automatically; you choose them.
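The learning rate's role is easiest to see in a single bare gradient-descent step (a generic sketch of the idea, not this library's training loop):

```python
# One plain gradient-descent step on f(p) = p**2 (gradient: 2*p).
# The learning rate scales how far the parameter moves each step.
def step(p, lr):
    grad = 2 * p          # gradient of f at p
    return p - lr * grad  # move against the gradient

p = 1.0
print(step(p, 0.1))  # 0.8: a small step toward the minimum at 0
print(step(p, 1.5))  # -2.0: too large a step overshoots the minimum
```

The same trade-off drives the learning-rate advice below: small steps converge slowly but reliably, large steps can overshoot and diverge.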
## The hyperparameter hierarchy
```python
hp = ODEHyperparameters(
    lr=5e-4,                                # Global learning rate
    max_epochs=2000,                        # Training duration
    gradient_clip_val=0.1,                  # Gradient clipping threshold
    training_data=GenerationConfig(         # Data configuration
        batch_size=100,
        collocations=6000,
    ),
    fields_config=MLPConfig(...),           # Network for solution fields
    params_config=...,                      # Network/scalar for parameters
    scheduler=ReduceLROnPlateauConfig(...), # LR scheduling
    pde_weight=1,                           # Loss term weights
    ic_weight=1,
    data_weight=1,
)
```
## Learning rate
The single most impactful hyperparameter. Start with `1e-3` for simple problems
and decrease to `5e-4` or `1e-4` for harder ones.
### Use a scheduler
`ReduceLROnPlateauConfig` automatically decreases the learning rate when
the loss plateaus. This is almost always better than a fixed rate.
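For example (the field names here are an assumption modeled on PyTorch's `ReduceLROnPlateau`; check `config.py` for the actual signature):

```python
scheduler=ReduceLROnPlateauConfig(
    factor=0.5,    # assumed field: multiply the LR by this on each plateau
    patience=100,  # assumed field: epochs without improvement before reducing
),
```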
## Network architecture
### Fields (solution approximation)
```python
fields_config=MLPConfig(
    in_dim=1,                         # 1 for ODEs, 2+ for PDEs
    out_dim=1,                        # 1 per field
    hidden_layers=[64, 128, 128, 64], # Width and depth
    activation="tanh",                # Smooth activation for derivatives
    output_activation=None,           # None for unconstrained output
)
```
Rules of thumb:
- Start with `[64, 128, 128, 64]`, which works for most problems
- Add depth (more layers) for complex dynamics before adding width
- Use `tanh` as the default activation; it's smooth and works well with automatic differentiation
- Use `softplus` as the output activation when the field must be positive
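The softplus recommendation is easy to verify in isolation: softplus(x) = log(1 + exp(x)) is smooth and strictly positive for every input, which is what makes it a safe output activation for positive fields. A standalone check in plain NumPy, independent of the config classes above:

```python
import numpy as np

def softplus(x):
    # Numerically stable softplus: log(1 + exp(x))
    return np.logaddexp(0.0, x)

x = np.linspace(-10.0, 10.0, 1001)  # inputs spanning negative and positive
y = softplus(x)
print(bool(y.min() > 0))  # True: the output never reaches zero
```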
### Parameters
For scalar parameters (constants to recover):
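The scalar snippet isn't shown above; as a hedged sketch only (the `ScalarConfig` name and `init_value` field are hypothetical placeholders, not the library's confirmed API; check `config.py` for the real type), it might look like:

```python
params_config=ScalarConfig(
    init_value=1.0,  # hypothetical field: initial guess for the constant
),
```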
For function-valued parameters (varying over the domain):
```python
params_config=MLPConfig(
    in_dim=1, out_dim=1,
    hidden_layers=[64, 128, 128, 64],
    activation="tanh",
    output_activation="softplus", # Often needed to keep values positive
)
```
## Loss weights
The three weights `pde_weight`, `ic_weight`, and `data_weight` control the
balance between physics enforcement, initial conditions, and data fitting.
See Loss Weighting Strategies for detailed guidance.
Quick start: leave all weights at 1 initially, then increase `data_weight`
if the model doesn't fit observations, or increase `pde_weight` if the
recovered parameters are physically unreasonable.
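Conceptually, the three weights scale a weighted sum of the individual loss terms, along these lines (a generic sketch of the idea, not the library's actual loss code):

```python
def total_loss(pde_loss, ic_loss, data_loss,
               pde_weight=1.0, ic_weight=1.0, data_weight=1.0):
    # Each weight scales how strongly its objective pulls on the optimizer.
    return (pde_weight * pde_loss
            + ic_weight * ic_loss
            + data_weight * data_loss)

# With equal weights every term counts once; raising data_weight makes
# the optimizer prioritize fitting the observations.
print(total_loss(0.2, 0.05, 0.5))                   # 0.75
print(total_loss(0.2, 0.05, 0.5, data_weight=2.0))  # 1.25
```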
## Collocation density
The number of collocation points controls how densely the physics is enforced:
```python
training_data=GenerationConfig(
    collocations=6000, # More = better physics, slower training
    batch_size=100,    # Points per training step
)
```
- Too few: The network may satisfy the equation at sampled points but violate it elsewhere
- Too many: Training becomes slow with diminishing returns
Start with 1000–5000 for ODEs and 5000–20000 for PDEs.
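Collocation points are simply locations in the domain where the equation's residual is evaluated and penalized. A common scheme is to sample them uniformly at random (a generic sketch, independent of `GenerationConfig`):

```python
import numpy as np

rng = np.random.default_rng(0)
t0, t1 = 0.0, 10.0       # illustrative ODE time domain
collocations = 6000

# Uniformly sampled collocation points: the equation residual is
# enforced at exactly these locations during training.
t_colloc = rng.uniform(t0, t1, size=(collocations, 1))
print(t_colloc.shape)    # (6000, 1)
```

Denser sampling leaves fewer gaps where the network can violate the physics unnoticed, at the cost of more residual evaluations per epoch.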
## Gradient clipping
Prevents exploding gradients during early training. Lower values (0.01–0.1)
add stability at the cost of slower convergence. Set to `None` to disable.
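Clipping rescales the gradient whenever its norm exceeds the threshold; here is a sketch of the standard clip-by-norm rule in plain NumPy (the library's trainer handles this for you via `gradient_clip_val`):

```python
import numpy as np

def clip_by_norm(grad, clip_val):
    # Standard clip-by-norm: if the gradient's L2 norm exceeds the
    # threshold, rescale it to that norm; otherwise leave it alone.
    norm = np.linalg.norm(grad)
    if norm > clip_val:
        return grad * (clip_val / norm)
    return grad

g = np.array([3.0, 4.0])  # norm 5.0
print(np.linalg.norm(clip_by_norm(g, 0.1)))       # ~0.1 after clipping
print(clip_by_norm(np.array([0.01, 0.02]), 0.1))  # unchanged: norm < 0.1
```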
## Diagnostic checklist
| Symptom | Likely cause | Fix |
|---|---|---|
| Loss doesn't decrease | Learning rate too low or too high | Try `1e-3`, `5e-4`, `1e-4` |
| Loss oscillates wildly | Learning rate too high | Reduce the LR, lower `gradient_clip_val` |
| Good data fit, bad parameter recovery | `pde_weight` too low | Increase `pde_weight` |
| Smooth but wrong solution | Not enough collocation points | Increase `collocations` |
| Training is very slow | Network too wide or too many collocations | Reduce `hidden_layers` or `collocations` |