Quickstart¶
Notebook Example¶
An example notebook can be found here for an interactive walkthrough.
Installation¶
From PyPI
pip install deepdiagnostics
From Source
git clone https://github.com/deepskies/DeepDiagnostics/
cd DeepDiagnostics
pip install poetry
poetry install
poetry run diagnose --help
Pre-requisites¶
DeepDiagnostics does not train models or generate data; both must be provided. Supported model formats are listed in Models, and data formats in Data. If you are using a simulator, it must be registered with deepdiagnostics.utils.register.register_simulator. More information can be found in Custom Simulations.
Output directories are created automatically. If no run ID is specified, one is generated; previous runs are overwritten only when a run ID is specified.
Configuration¶
A description of the configuration file, including defaults, can be found in Configuration. Below is a minimal example.
common:
    out_dir: "./deepdiagnostics_results/"
    random_seed: 42
data:
    data_engine: "H5Data"
    data_path: "./data/my_data.h5"
    simulator: "MySimulator"
    simulator_kwargs: # Any arguments used to initialize the simulator
        foo: bar
model:
    model_engine: "SBIModel"
    model_path: "./models/my_model.pkl"
plots_common: # Used across all plots
    parameter_labels: # Can either be plain strings or rendered LaTeX strings
        - "My favorite parameter"
        - "My least favorite parameter"
        - "My most mid parameter"
    parameter_colors: # Any color recognized by matplotlib
        - "#264a95"
        - "#ed9561"
        - "#89b7bb"
    line_style_cycle: # Any line type recognized by matplotlib
        - solid
        - dashed
        - dotted
    figure_size: # Approximate size, it can be scaled when adding additional subfigures
        - 6 # x length
        - 6 # y length
metrics_common: # Used across all metrics (and plots if the plots have a calculation step)
    samples_per_inference: 1000
    number_simulations: 100
    percentiles:
        - 68
        - 95
plots:
    CoverageFraction: # Arguments supplied to {plottype}.plot()
        include_coverage_std: True
        include_ideal_range: True
        reference_line_label: "Ideal Coverage"
    TARP:
        coverage_sigma: 4
        title: "TARP of My Model"
metrics:
    AllSBC: {} # No additional arguments
    Ranks:
        num_bins: 3
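Because plots_common holds several parallel lists, a quick consistency check before a run can catch a mismatch early. The sketch below mirrors part of the example configuration as a plain Python dictionary. check_plot_settings is a hypothetical helper, not part of DeepDiagnostics, and the rule that labels and colors must have equal length is an assumption made for illustration.

```python
# Hypothetical helper, not part of DeepDiagnostics: checks a config dictionary
# shaped like the example configuration before it is written to disk.

def check_plot_settings(config: dict) -> list:
    """Return a list of human-readable problems found in plots_common."""
    problems = []
    plots_common = config.get("plots_common", {})

    labels = plots_common.get("parameter_labels", [])
    colors = plots_common.get("parameter_colors", [])
    # Assumption: one color per labeled parameter.
    if len(labels) != len(colors):
        problems.append(
            f"{len(labels)} parameter_labels but {len(colors)} parameter_colors"
        )

    figure_size = plots_common.get("figure_size", [])
    # The example gives figure_size as [x length, y length].
    if len(figure_size) != 2:
        problems.append("figure_size should have exactly two entries")

    return problems


example_config = {
    "plots_common": {
        "parameter_labels": [
            "My favorite parameter",
            "My least favorite parameter",
            "My most mid parameter",
        ],
        "parameter_colors": ["#264a95", "#ed9561", "#89b7bb"],
        "line_style_cycle": ["solid", "dashed", "dotted"],
        "figure_size": [6, 6],
    }
}

print(check_plot_settings(example_config))
```

An empty list means no problems were found under these assumed rules; the checks actually applied by DeepDiagnostics may differ.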
Pipeline¶
DeepDiagnostics includes a CLI tool for analysis.
To run the tool using a configuration file:
diagnose --config {path to yaml}
To use defaults with specific models and data:
diagnose --model_path {model pkl} --data_path {data pkl} [--simulator {sim name}]
Additional arguments can be listed with diagnose -h.
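When the pipeline is launched from a script rather than a shell, the same command can be assembled as an argument list. The sketch below only uses the flags shown above (--config, --model_path, --data_path, --simulator); build_diagnose_command is a hypothetical helper, and the paths are taken from the example configuration.

```python
import shlex
import subprocess


def build_diagnose_command(config=None, model_path=None, data_path=None, simulator=None):
    """Assemble the argument list for the diagnose CLI from the flags shown above."""
    command = ["diagnose"]
    if config is not None:
        command += ["--config", config]
    if model_path is not None:
        command += ["--model_path", model_path]
    if data_path is not None:
        command += ["--data_path", data_path]
    if simulator is not None:
        command += ["--simulator", simulator]
    return command


command = build_diagnose_command(
    model_path="./models/my_model.pkl",
    data_path="./data/my_data.h5",
    simulator="MySimulator",
)
# Show the equivalent shell command.
print(shlex.join(command))

# Uncomment to run the pipeline; requires deepdiagnostics to be installed.
# subprocess.run(command, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems with paths that contain spaces.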
Standalone¶
DeepDiagnostics comes with the option to run different plots and metrics independently. This requires setting a configuration file ahead of time, and then running the plots.
All plots and metrics can be found in plots and metrics.
from deepdiagnostics.utils.configuration import Config
from deepdiagnostics.model import SBIModel
from deepdiagnostics.data import H5Data
from deepdiagnostics.plots import LocalTwoSampleTest, Ranks

# Load the configuration first; replace each {placeholder} with your own value
Config("{configuration_path}")
model = SBIModel("{model_path}")
data = H5Data("{data_path}", simulator="{simulator_name}")

# Each plot is built from the data and model, then called with its plot arguments
LocalTwoSampleTest(data=data, model=model, show=True)(use_intensity_plot=False, n_alpha_samples=200)
Ranks(data=data, model=model, show=True)(num_bins=3)
Custom Simulations¶
To use generative model diagnostics, a simulator has to be included. This is done by registering your simulation with a name and an associated class.
Once registered, DeepDiagnostics can find your simulation at a later time, and the simulation does not need to be loaded in memory when running the CLI pipeline or standalone modules.
from deepdiagnostics.utils.register import register_simulator

class MySimulation:
    def __init__(self):
        ...

# Register the class under a name so it can be looked up later
register_simulator(simulator_name="MySimulation", simulator=MySimulation)
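The idea of finding a simulation by name can be illustrated with a small registry. This is a conceptual sketch only: the in-memory dictionary and the names register and lookup are hypothetical, and this is not how register_simulator is implemented. In particular, a dictionary like this lives only for the current process, whereas the text above describes a simulation that can be found later without being loaded in memory.

```python
# Conceptual sketch of a name -> class registry; the names here are hypothetical
# and this is not DeepDiagnostics' implementation.
_REGISTRY = {}


def register(name: str, simulator_class: type) -> None:
    """Associate a simulator class with a name for later lookup."""
    _REGISTRY[name] = simulator_class


def lookup(name: str) -> type:
    """Return the class registered under name, or raise a clear error."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise KeyError(f"No simulator registered under {name!r}") from None


class MySimulation:
    pass


register("MySimulation", MySimulation)
simulation = lookup("MySimulation")()  # instantiate the class found by name
print(type(simulation).__name__)
```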
Simulations also require two methods: generate_context, which either loads or generates the non-theta input of the simulation (also called x), and simulate. This is enforced by using the abstract class deepdiagnostics.data.Simulator as a parent class.
from deepdiagnostics.data import Simulator
import numpy as np

class MySimulation(Simulator):
    def generate_context(self, n_samples: int) -> np.ndarray:
        """Give a number of samples (int) and get a numpy array of random samples to be used for the simulation"""
        # One random context value (x) per requested sample
        return np.random.uniform(0, 1, size=n_samples)

    def simulate(self, theta: np.ndarray, context_samples: np.ndarray) -> np.ndarray:
        """Give the parameters of the simulation (theta), and x values (context_samples) and get a simulation sample.
        Theta and context should have the same shape for dimension 0, the number of samples."""
        # np.zeros takes the shape as a single tuple
        simulation_results = np.zeros((theta.shape[0], 1))
        for index, context in enumerate(context_samples):
            simulation_results[index] = theta[index][0]*context + theta[index][1]*context
        return simulation_results
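Before registering a simulator, it can help to confirm that its two methods agree on the number of samples. The sketch below uses a standalone stand-in class with the same two methods; it deliberately does not subclass deepdiagnostics.data.Simulator so that it runs with only NumPy installed. check_simulator_shapes and the two-parameter theta are assumptions for illustration.

```python
import numpy as np


class LinearSimulation:
    """Stand-in with the same two methods as the example above."""

    def generate_context(self, n_samples: int) -> np.ndarray:
        # One random context value (x) per sample
        return np.random.uniform(0, 1, size=n_samples)

    def simulate(self, theta: np.ndarray, context_samples: np.ndarray) -> np.ndarray:
        # Vectorized form of the loop above: (theta_0 + theta_1) * x for each sample
        return ((theta[:, 0] + theta[:, 1]) * context_samples).reshape(-1, 1)


def check_simulator_shapes(simulator, n_samples: int = 8, n_parameters: int = 2) -> tuple:
    """Hypothetical helper: run the simulator once and verify dimension 0 matches."""
    theta = np.random.uniform(0, 1, size=(n_samples, n_parameters))
    context = simulator.generate_context(n_samples)
    results = simulator.simulate(theta, context)
    if context.shape[0] != n_samples or results.shape[0] != n_samples:
        raise ValueError("context and results must have n_samples entries along dimension 0")
    return results.shape


print(check_simulator_shapes(LinearSimulation()))
```

A check like this would surface a generate_context that returns a single value instead of one value per sample before the simulator is used in a diagnostic run.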