Data¶
- class deepdiagnostics.data.data.Data(path, simulator_name, simulator_kwargs=None, prior=None, prior_kwargs=None, simulation_dimensions=None)¶
Load stored data to use in diagnostics
- Parameters:
path (str) – path to the data file.
simulator_name (str) – Name of the registered simulator. If your simulator is not registered with utils.register_simulator, an error is raised here.
simulator_kwargs (dict, optional) – Any additional kwargs used to set up your simulator. Defaults to None.
prior (str, optional) – If the prior is not given in the data, use a numpy random distribution, specified by name. Choose from: {“normal”, “poisson”, “uniform”, “gamma”, “beta”, “binominal”}. Defaults to None.
prior_kwargs (dict, optional) – kwargs for the numpy prior. See the NumPy random distribution documentation for the arguments each distribution accepts. Defaults to None.
simulation_dimensions (Optional[int], optional) – 1 or 2. 1 -> the output of the simulator has one dimension; 2 -> the output has two dimensions (is an image). Defaults to None.
- get_sigma_true()¶
Look for the true sigma of the data. If supplied in the method, use that; otherwise, look in the configuration file. If neither is supplied, return 1.
- Returns:
Sigma value selected by the search.
- Return type:
Any
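The lookup order described for get_sigma_true can be sketched as a small fallback chain. This is a minimal illustration, not the library's implementation: the helper name resolve_sigma and the "sigma_true" configuration key are hypothetical, and reading "supplied in the method" as a value provided by the data class is an assumption.

```python
from typing import Any, Optional


def resolve_sigma(data_sigma: Optional[Any], config: dict) -> Any:
    """Hypothetical helper mirroring the documented lookup order."""
    # 1. Prefer a value supplied by the data itself.
    if data_sigma is not None:
        return data_sigma
    # 2. Otherwise fall back to the configuration ("sigma_true" is an assumed key).
    if config.get("sigma_true") is not None:
        return config["sigma_true"]
    # 3. Default when neither source provides a value.
    return 1


from_data = resolve_sigma(0.5, {"sigma_true": 2.0})
from_config = resolve_sigma(None, {"sigma_true": 2.0})
default = resolve_sigma(None, {})
```

The same pattern would apply to get_theta_true, with None as the final default instead of 1.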
- get_simulator_output_shape()¶
Run a single sample of the simulator to verify the output shape.
- Returns:
Output shape of a single sample of the simulator.
- Return type:
tuple[Sequence[int]]
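As a rough illustration of checking an output shape from a single sample, the sketch below runs a toy simulator once and reads the shape of the result. The function toy_simulate and the array sizes are hypothetical; whether the library includes or drops the leading sample axis in the reported shape is not stated here.

```python
import numpy as np


def toy_simulate(theta: np.ndarray, context: np.ndarray) -> np.ndarray:
    """Hypothetical simulator: a line evaluated at each context point."""
    slope, intercept = theta[:, 0:1], theta[:, 1:2]
    return slope * context + intercept


# One theta sample with two parameters, one context sample with 10 x values.
theta = np.zeros((1, 2))
context = np.linspace(0.0, 1.0, 10).reshape(1, 10)

# Run the simulator once and inspect the shape of what comes back.
single_output = toy_simulate(theta, context)
output_shape = single_output.shape
```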
- get_theta_true()¶
Look for the true theta given by the data. If supplied in the method, use that; otherwise, look in the configuration file. If neither is supplied, return None.
- Returns:
Theta value selected by the search.
- Return type:
Any
- load_prior(prior, prior_kwargs)¶
Load the prior. Either try to get it from data (if it has been implemented for the type of data), or use numpy to initialize a random distribution using the prior argument.
- Parameters:
prior (str) – Name of prior.
prior_kwargs (dict[str, any]) – kwargs for initializing the prior.
- Raises:
NotImplementedError – The selected prior is not included.
RuntimeError – The selected prior is missing arguments to initialize.
- Returns:
Prior that can be sampled from by calling it with prior(n_samples)
- Return type:
callable
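A minimal sketch of the numpy fallback described for load_prior, assuming the prior name is mapped onto a method of a NumPy random Generator. The function name load_numpy_prior, the fixed seed, and the subset of distributions shown are illustrative choices, not the library's code.

```python
import numpy as np


def load_numpy_prior(prior: str, prior_kwargs: dict):
    """Hypothetical stand-in: build a callable prior from a distribution name."""
    rng = np.random.default_rng(seed=0)
    # Subset of the documented names, mapped to Generator methods.
    choices = {
        "normal": rng.normal,
        "poisson": rng.poisson,
        "uniform": rng.uniform,
        "gamma": rng.gamma,
        "beta": rng.beta,
    }
    if prior not in choices:
        raise NotImplementedError(
            f"{prior} is not an option. Choose from {sorted(choices)}."
        )

    def sample(n_samples: int) -> np.ndarray:
        return choices[prior](size=n_samples, **prior_kwargs)

    # Probe once so missing or unexpected arguments fail at load time.
    try:
        sample(1)
    except TypeError as error:
        raise RuntimeError(
            f"Cannot initialize prior '{prior}' with {prior_kwargs}."
        ) from error
    return sample


prior = load_numpy_prior("uniform", {"low": 0.0, "high": 2.0})
samples = prior(5)
```

Returning a closure keeps the calling convention from the Returns field: the prior is sampled with prior(n_samples).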
- sample_prior(n_samples)¶
Draw samples from the prior.
- Parameters:
n_samples (int) – Number of samples to draw
- Return type:
np.ndarray
- simulated_context(n_samples)¶
Call the simulator’s generate_context method.
- Parameters:
n_samples (int) – Number of samples to draw.
- Returns:
context (x values), as defined by the simulator.
- Return type:
np.ndarray
- simulator_outcome(theta, condition_context=None, n_samples=None)¶
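Run the simulator on the given theta, using the supplied context if it is known and randomly drawn context otherwise.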
- Parameters:
theta (np.ndarray) – Theta value of shape (n_samples, theta_dimensions)
condition_context (np.ndarray, optional) – If x values for theta are known, use them. Defaults to None.
n_samples (int, optional) – If x values are not known for theta, draw them randomly. Defaults to None.
- Raises:
ValueError – If neither n_samples nor condition_context is supplied.
- Returns:
Simulator output of shape (n_samples, simulator_dimensions)
- Return type:
np.ndarray
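The choice between known and randomly drawn context can be sketched as below. This is an illustration under assumptions: the free functions generate_context and simulate stand in for a registered simulator, and the ValueError is taken to cover the case where neither condition_context nor n_samples is given.

```python
import numpy as np

rng = np.random.default_rng(seed=0)


def generate_context(n_samples: int) -> np.ndarray:
    """Hypothetical context generator: uniform x values."""
    return rng.uniform(0, 1, size=(n_samples, 1))


def simulate(theta: np.ndarray, context_samples: np.ndarray) -> np.ndarray:
    """Hypothetical simulator: scale each context value by its theta."""
    return theta * context_samples


def simulator_outcome(theta, condition_context=None, n_samples=None):
    if condition_context is None:
        if n_samples is None:
            raise ValueError("Provide either condition_context or n_samples.")
        # No known x values: draw them from the simulator's context generator.
        condition_context = generate_context(n_samples)
    return simulate(theta, condition_context)


theta = np.array([[2.0], [3.0]])
known = simulator_outcome(theta, condition_context=np.ones((2, 1)))
drawn = simulator_outcome(theta, n_samples=2)
```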
- true_context()¶
True data x values, if supplied by the data method.
- true_simulator_outcome()¶
Run the simulator on all true theta and true x values.
- Returns:
Array of shape (n_samples, simulator shape) holding the output of the simulator on all true samples in the data.
- Return type:
np.ndarray
- class deepdiagnostics.data.H5Data(path, simulator, simulator_kwargs=None, prior=None, prior_kwargs=None, simulation_dimensions=None)¶
Load data that has been saved in an h5 format.
- Data Parameters
- Xs:
[REQUIRED] The context, the x values. The data that was used to train a model on what conditions produce what posterior.
- Thetas:
[REQUIRED] The theta, the parameters of the external model. The data used to train the model’s posterior.
- Prior:
Distribution used to initialize the posterior before training.
- Sigma:
True standard deviation of the actual thetas, if known.
- get_sigma_true()¶
Try to get the true standard deviation of the data. If it is not supplied, return 1.
- Returns:
sigma.
- Return type:
Any
- get_theta_true()¶
Get stored theta used to train the model.
- Returns:
theta array
- Raises:
NotImplementedError – Data does not have thetas.
- prior()¶
If the data has a supplied prior, return it. If not, the data module will default back to picking a prior from a random distribution.
- Raises:
NotImplementedError – The data does not have a prior field.
- true_context()¶
Try to get the xs field of the loaded data.
- Raises:
NotImplementedError – The data does not have an xs field.
- class deepdiagnostics.data.PickleData(path, simulator, simulator_kwargs=None, prior=None, prior_kwargs=None, simulation_dimensions=None)¶
Load data that is saved as a .pkl file.
- save(data, path)¶
Save data in the form of a .pkl file.
- Parameters:
data (Any) – Data that can be encoded into a pkl.
path (str) – Out file path for the data. Must have a .pkl extension.
- Return type:
None
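A rough sketch of what saving to a .pkl file can look like with the standard library. The helper save_pkl, the ValueError on a wrong extension, and the "xs"/"thetas" keys in the payload are assumptions for illustration; the library's own save may validate and structure data differently.

```python
import os
import pickle
import tempfile
from typing import Any


def save_pkl(data: Any, path: str) -> None:
    """Hypothetical helper: pickle `data` to `path`, enforcing the extension."""
    if not path.endswith(".pkl"):
        raise ValueError("Path must have a .pkl extension.")
    with open(path, "wb") as file:
        pickle.dump(data, file)


with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "example.pkl")
    # Assumed field names, chosen to echo the data parameters listed for H5Data.
    payload = {"xs": [[0.0, 1.0]], "thetas": [[2.0, 3.0]]}
    save_pkl(payload, out_path)

    # Read the file back to confirm the round trip.
    with open(out_path, "rb") as file:
        restored = pickle.load(file)
```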
- class deepdiagnostics.data.simulator.Simulator¶
- abstract generate_context(n_samples)¶
[ABSTRACT, MUST BE FILLED]
Specify how the conditioning context is generated. Can come from data, or from a generic distribution.
Example:
    import numpy as np
    from deepdiagnostics.data.simulator import Simulator

    # Generate from a random distribution
    class MySim(Simulator):
        def generate_context(self, n_samples: int) -> np.ndarray:
            # One uniform draw per requested sample
            return np.random.uniform(0, 1, size=n_samples)

    # Draw from a sample
    class MySim(Simulator):
        def __init__(self):
            self.data_source = .....

        def generate_context(self, n_samples: int) -> np.ndarray:
            return self.data_source.sample(n_samples)
- Parameters:
n_samples (int) – Number of samples of context to pull
- Returns:
Conditioning context used to produce simulated outcomes with a given theta.
- Return type:
np.ndarray
- abstract simulate(theta, context_samples)¶
[ABSTRACT, MUST BE FILLED]
Specify a simulation S such that y_{theta} = S(context_samples|theta)
Example:
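    import numpy as np
    from deepdiagnostics.data.simulator import Simulator

    # Evaluate a model that is linear in the context, once per theta sample
    class MySim(Simulator):
        def simulate(self, theta: np.ndarray, context_samples: np.ndarray) -> np.ndarray:
            # One output row per theta sample; np.zeros takes the shape as a tuple
            simulation_results = np.zeros((theta.shape[0], 1))
            for index, context in enumerate(context_samples):
                simulation_results[index] = theta[index][0]*context + theta[index][1]*context
            return simulation_results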
- Parameters:
theta (np.ndarray) – Parameters of the simulation model
context_samples (np.ndarray) – Samples to use with the theta-primed simulation model
- Returns:
Simulated outcome.
- Return type:
np.ndarray
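To show the two abstract methods working together, here is a self-contained sketch that runs without the library installed. The Simulator class below is a stand-in that only mirrors the two documented abstract methods, and LinearSim with its slope/intercept parameterization is a hypothetical example.

```python
from abc import ABC, abstractmethod

import numpy as np


class Simulator(ABC):
    """Stand-in base class with the two documented abstract methods."""

    @abstractmethod
    def generate_context(self, n_samples: int) -> np.ndarray:
        ...

    @abstractmethod
    def simulate(self, theta: np.ndarray, context_samples: np.ndarray) -> np.ndarray:
        ...


class LinearSim(Simulator):
    """Hypothetical simulator: y = slope * x + intercept."""

    def generate_context(self, n_samples: int) -> np.ndarray:
        rng = np.random.default_rng(seed=0)
        return rng.uniform(0, 1, size=(n_samples, 1))

    def simulate(self, theta: np.ndarray, context_samples: np.ndarray) -> np.ndarray:
        # theta[:, 0:1] is the slope column, theta[:, 1:2] the intercept column.
        return theta[:, 0:1] * context_samples + theta[:, 1:2]


sim = LinearSim()
theta = np.array([[2.0, 1.0], [0.5, -1.0]])
context = sim.generate_context(n_samples=2)
outcome = sim.simulate(theta, context)
```

Because both methods are abstract, a subclass that omits either one cannot be instantiated, which is what "[ABSTRACT, MUST BE FILLED]" implies in practice.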