Data
- class deepdiagnostics.data.data.Data(path, simulator_name, simulator_kwargs=None, prior=None, prior_kwargs=None, simulation_dimensions=None)
Load stored data to use in diagnostics.
- Parameters:
path (str) – Path to the data file.
simulator_name (str) – Name of the registered simulator. If your simulator is not registered with utils.register_simulator, an error is raised here.
simulator_kwargs (dict, optional) – Any additional kwargs used to set up your simulator. Defaults to None.
prior (str, optional) – If the prior is not given in the data, use a NumPy random distribution, specified by name. Choose from: {"normal", "poisson", "uniform", "gamma", "beta", "binominal"}. Defaults to None.
prior_kwargs (dict, optional) – kwargs for the NumPy prior; see the NumPy random distribution documentation for the arguments each distribution accepts. Defaults to None.
simulation_dimensions (Optional[int], optional) – 1 or 2. 1 means the simulator output is one-dimensional; 2 means it is two-dimensional (an image). Defaults to None.
- get_sigma_true()
Look for the true sigma of the data. If it is supplied in the method, use that; otherwise, look in the configuration file. If neither is supplied, return 1.
- Returns:
Sigma value selected by the search.
- Return type:
Any
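The search order above can be sketched as a small fallback chain. This is a minimal illustration, not the library's implementation: the helper name `resolve_sigma_true` and the `"sigma_true"` configuration key are hypothetical.

```python
from typing import Any, Mapping, Optional


def resolve_sigma_true(supplied: Optional[Any], config: Mapping[str, Any]) -> Any:
    """Hypothetical helper mirroring the documented search order."""
    # 1. A value supplied directly takes precedence.
    if supplied is not None:
        return supplied
    # 2. Otherwise fall back to the configuration ("sigma_true" key is an assumption).
    if config.get("sigma_true") is not None:
        return config["sigma_true"]
    # 3. If neither source provides a value, default to 1.
    return 1
```

For example, `resolve_sigma_true(None, {})` falls through both checks and returns the default of 1.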
- load_prior(prior, prior_kwargs)
Load the prior. Either get it from the data (if this has been implemented for the data type), or use NumPy to initialize a random distribution named by the prior argument.
- Parameters:
prior (str) – Name of prior.
prior_kwargs (dict[str, any]) – kwargs for initializing the prior.
- Raises:
NotImplementedError – The selected prior is not included.
RuntimeError – The selected prior is missing arguments required to initialize it.
- Returns:
Prior that can be sampled from by calling it as prior(n_samples).
- Return type:
callable
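As a rough sketch of the NumPy branch of load_prior, the documented prior names can be mapped onto `numpy.random.Generator` methods and wrapped in a callable that takes `n_samples`. The helper `make_numpy_prior`, its `seed` parameter, and the trial draw used to detect missing arguments are assumptions for illustration; the `"binominal"` key follows the spelling listed for the prior parameter.

```python
from typing import Any, Callable, Dict, Optional

import numpy as np


def make_numpy_prior(
    prior: str,
    prior_kwargs: Optional[Dict[str, Any]] = None,
    seed: Optional[int] = None,
) -> Callable[[int], np.ndarray]:
    """Hypothetical sketch: turn a distribution name into a sampler callable."""
    rng = np.random.default_rng(seed)
    # Documented prior names mapped onto numpy.random.Generator methods.
    choices = {
        "normal": rng.normal,
        "poisson": rng.poisson,
        "uniform": rng.uniform,
        "gamma": rng.gamma,
        "beta": rng.beta,
        "binominal": rng.binomial,  # key spelled as in the documented list
    }
    if prior not in choices:
        raise NotImplementedError(f"Prior {prior!r} is not included.")
    distribution = choices[prior]
    kwargs = dict(prior_kwargs or {})

    # Trial draw: a missing required argument (e.g. gamma's `shape`) or an
    # unexpected keyword surfaces here as a TypeError.
    try:
        distribution(size=1, **kwargs)
    except TypeError as err:
        raise RuntimeError(f"Prior {prior!r} cannot be initialized: {err}") from err

    def sample(n_samples: int) -> np.ndarray:
        # Calling the prior with n_samples returns that many draws.
        return distribution(size=n_samples, **kwargs)

    return sample
```

For instance, `make_numpy_prior("beta", {"a": 2.0, "b": 5.0})(10)` draws ten samples, while `make_numpy_prior("gamma", {})` raises RuntimeError because gamma requires a `shape` argument.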
- sample_prior(n_samples)
Sample from the prior.
- Parameters:
n_samples (int) – Number of samples to draw.
- Returns:
Samples drawn from the prior.
- Return type:
np.ndarray
- class deepdiagnostics.data.H5Data(path, simulator, simulator_kwargs=None, prior=None, prior_kwargs=None, simulation_dimensions=None)
Load data that has been saved in HDF5 (.h5) format.
If you cast your problem as y = mx + b, these are the required fields and what they represent:
simulator_outcome – y
thetas – parameters of the model (m, b)
context – xs
- Data Parameters
- Xs:
[REQUIRED] The context, the x values. The data used to train the model on which conditions produce which posterior.
- Thetas:
[REQUIRED] The theta, the parameters of the external model. The data used to train the model’s posterior.
- Prior:
Distribution used to initialize the posterior before training.
- Sigma:
True standard deviation of the actual thetas, if known.
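To make the y = mx + b mapping concrete, here is a hedged sketch that builds the three required fields as NumPy arrays. The helper `make_linear_dataset`, the uniform ranges, and the assumption that the stored keys are exactly `thetas`, `context`, and `simulator_outcome` (the names used elsewhere on this page) are illustrative choices; the optional prior and sigma fields are left out.

```python
import numpy as np


def make_linear_dataset(n_samples: int, seed: int = 0) -> dict:
    """Hypothetical sketch of the required fields for y = m*x + b."""
    rng = np.random.default_rng(seed)
    # thetas: one (m, b) pair per sample, drawn from a uniform range (an assumption).
    thetas = rng.uniform(-1.0, 1.0, size=(n_samples, 2))
    # context: the x values the model is conditioned on.
    context = rng.uniform(0.0, 10.0, size=n_samples)
    # simulator_outcome: y = m*x + b for each (theta, x) pair.
    simulator_outcome = thetas[:, 0] * context + thetas[:, 1]
    return {
        "thetas": thetas,
        "context": context,
        "simulator_outcome": simulator_outcome,
    }
```

Each array could then be written to an .h5 file, for example with h5py's `File.create_dataset`.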
- get_sigma_true()
Try to get the true standard deviation of the data. If it is not supplied, return 1.
- Returns:
sigma.
- Return type:
Any
- prior()
If the data has a supplied prior, return it. If not, the data module falls back to picking a prior from a random distribution.
- Raises:
NotImplementedError – The data does not have a prior field.
- class deepdiagnostics.data.PickleData(path, simulator, simulator_kwargs=None, prior=None, prior_kwargs=None, simulation_dimensions=None)
Load data that is saved as a .pkl file.
- save(data, path)
Save data in the form of a .pkl file.
- Parameters:
data (Any) – Data that can be encoded into a pkl.
path (str) – Output file path for the data. Must have a .pkl extension.
- Return type:
None
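A minimal sketch of what saving to a .pkl file amounts to, using only the standard-library `pickle` module. The helper names `save_pkl` and `load_pkl`, and raising `ValueError` on a non-.pkl path, are assumptions; the library's own error behavior is not documented here.

```python
import os
import pickle
from typing import Any


def save_pkl(data: Any, path: str) -> None:
    """Hypothetical sketch of a .pkl writer with the documented extension rule."""
    # Enforce the documented requirement that the output path ends in .pkl.
    if os.path.splitext(path)[1] != ".pkl":
        raise ValueError(f"Expected a .pkl path, got {path!r}")
    with open(path, "wb") as f:
        pickle.dump(data, f)


def load_pkl(path: str) -> Any:
    """Read back an object written by save_pkl."""
    with open(path, "rb") as f:
        return pickle.load(f)
```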
- class deepdiagnostics.data.simulator.Simulator
- abstract generate_context(n_samples)
[ABSTRACT, MUST BE FILLED]
Specify how the conditioning context is generated. The context can come from data or from a generic distribution.
Example:

    import numpy as np
    from deepdiagnostics.data.simulator import Simulator

    # Generate the context from a random distribution
    class MySim(Simulator):
        def generate_context(self, n_samples: int) -> np.ndarray:
            # Draw n_samples values rather than a single scalar
            return np.random.uniform(0, 1, size=n_samples)

    # Draw the context from an existing data source
    class MySim(Simulator):
        def __init__(self):
            self.data_source = ...

        def generate_context(self, n_samples: int) -> np.ndarray:
            return self.data_source.sample(n_samples)
- Parameters:
n_samples (int) – Number of samples of context to pull
- Returns:
Conditioning context used to produce simulated outcomes with a given theta.
- Return type:
np.ndarray
- abstract simulate(theta, context_samples)
[ABSTRACT, MUST BE FILLED]
Specify a simulation S such that y_{theta} = S(context_samples | theta).
Example:

    import numpy as np
    from deepdiagnostics.data.simulator import Simulator

    # A linear simulator, y = m*x + b with theta = (m, b)
    class MySim(Simulator):
        def simulate(self, theta: np.ndarray, context_samples: np.ndarray) -> np.ndarray:
            # np.zeros takes the shape as a single tuple
            simulation_results = np.zeros((theta.shape[0], 1))
            for index, context in enumerate(context_samples):
                simulation_results[index] = theta[index][0] * context + theta[index][1]
            return simulation_results
- Parameters:
theta (np.ndarray) – Parameters of the simulation model
context_samples (np.ndarray) – Context samples to pass to the simulation model parameterized by theta
- Returns:
Simulated outcome.
- Return type:
np.ndarray
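A complete simulator needs both abstract methods. The sketch below shows a vectorized linear simulator for the y = mx + b setup, with one theta row (m, b) per context sample. So that it runs on its own, it defines a hypothetical stand-in base class, `SimulatorBase`; in practice you would subclass `deepdiagnostics.data.simulator.Simulator` instead and register the class with `utils.register_simulator`. The class name `LinearSim` and its `seed` parameter are also illustrative.

```python
from abc import ABC, abstractmethod

import numpy as np


class SimulatorBase(ABC):
    """Hypothetical stand-in for deepdiagnostics.data.simulator.Simulator."""

    @abstractmethod
    def generate_context(self, n_samples: int) -> np.ndarray: ...

    @abstractmethod
    def simulate(self, theta: np.ndarray, context_samples: np.ndarray) -> np.ndarray: ...


class LinearSim(SimulatorBase):
    """y = m*x + b with theta[:, 0] = m and theta[:, 1] = b."""

    def __init__(self, seed: int = 0):
        self.rng = np.random.default_rng(seed)

    def generate_context(self, n_samples: int) -> np.ndarray:
        # One x value per sample, drawn uniformly from [0, 1).
        return self.rng.uniform(0.0, 1.0, size=n_samples)

    def simulate(self, theta: np.ndarray, context_samples: np.ndarray) -> np.ndarray:
        theta = np.atleast_2d(theta)
        x = np.asarray(context_samples, dtype=float).reshape(-1)
        # Vectorized: one outcome per (theta row, context) pair, returned as a
        # column of shape (n_samples, 1).
        return (theta[:, 0] * x + theta[:, 1]).reshape(-1, 1)
```

Broadcasting replaces the per-sample Python loop; the result is reshaped to a column so it has the same (n_samples, 1) shape as a preallocated `np.zeros((n, 1))` array.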
- class deepdiagnostics.data.lookup_table_simulator.LookupTableSimulator(data, random_state, outside_range_limit=2.0, hash_precision=10)
A lookup table that mocks a simulator. It assumes your data is perfectly representative of the simulator (or that you are okay with nearest-neighbor matching).
It does not need to be registered; it is automatically available as the default simulator.
It assumes your data has the following fields, accessible as data["context"], data["thetas"], and data["simulator_outcome"], where context holds the xs, thetas holds the parameters, and simulator_outcome holds the ys (the outcomes).
- Parameters:
data (tensor)
random_state (Generator)
outside_range_limit (float)
hash_precision (int)
- generate_context(n_samples)
Draw samples from the context data.
- simulate(theta, context_samples)
Find the outcome y for a given theta and context sample. If there is no exact match, use the nearest neighbor (by the L2 norm of the normalized theta and context).
- Parameters:
theta (Union[np.ndarray, float]) – parameter(s) to simulate
context_samples (Union[np.ndarray, float]) – context(s) to condition on
- Returns:
Simulated outcome(s)
- Return type:
np.ndarray
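The nearest-neighbor behavior described for simulate can be sketched as follows. This is a hypothetical class, not LookupTableSimulator itself: it standardizes each theta and context column (one plausible reading of "normalized"), returns the outcome of the row with the smallest L2 distance, and ignores `outside_range_limit` and `hash_precision`, whose exact semantics are not described here. It expects one theta row per context row. An exact match has distance zero, so it is returned whenever one exists.

```python
import numpy as np


class NearestNeighborTable:
    """Hypothetical sketch of a lookup-table simulator (not the library class)."""

    def __init__(self, thetas, context, outcomes, random_state=None):
        n_rows = len(outcomes)
        self.thetas = np.asarray(thetas, dtype=float).reshape(n_rows, -1)
        self.context = np.asarray(context, dtype=float).reshape(n_rows, -1)
        self.outcomes = np.asarray(outcomes)
        self.rng = random_state if random_state is not None else np.random.default_rng()
        # Stack theta and context into one key per row, then z-score each column
        # so that no single feature dominates the L2 distance (assumed normalization).
        keys = np.hstack([self.thetas, self.context])
        self.mean = keys.mean(axis=0)
        self.std = keys.std(axis=0)
        self.std[self.std == 0] = 1.0  # avoid dividing by zero on constant columns
        self.keys = (keys - self.mean) / self.std

    def generate_context(self, n_samples):
        # Draw stored context rows at random, with replacement.
        idx = self.rng.integers(0, len(self.context), size=n_samples)
        return self.context[idx]

    def simulate(self, theta, context_samples):
        theta = np.asarray(theta, dtype=float).reshape(-1, self.thetas.shape[1])
        context = np.asarray(context_samples, dtype=float).reshape(-1, self.context.shape[1])
        query = (np.hstack([theta, context]) - self.mean) / self.std
        # Squared L2 distance from every query row to every table row.
        dists = ((query[:, None, :] - self.keys[None, :, :]) ** 2).sum(axis=-1)
        # argmin picks the closest stored row (distance 0 for an exact match).
        return self.outcomes[dists.argmin(axis=1)]
```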