Data¶

class deepdiagnostics.data.data.Data(path, simulator_name, simulator_kwargs=None, prior=None, prior_kwargs=None, simulation_dimensions=None)¶

Load stored data to use in diagnostics

Parameters:

path (str) – path to the data file.
simulator_name (str) – Name of the registered simulator. If your simulator is not registered with utils.register_simulator, an error is raised here.
simulator_kwargs (dict, optional) – Any additional kwargs used to set up your simulator. Defaults to None.
prior (str, optional) – If the prior is not given in the data, use a numpy random distribution, specified by name. Choose from: {"normal", "poisson", "uniform", "gamma", "beta", "binominal"}. Defaults to None.
prior_kwargs (dict, optional) – kwargs for the numpy prior. See the NumPy random distribution documentation for the arguments each distribution accepts. Defaults to None.
simulation_dimensions (Optional[int], optional) – 1 or 2. 1 means the simulator output has one dimension; 2 means the output has two dimensions (is an image). Defaults to None.

get_sigma_true()¶

Look for the true sigma of the data. If it is supplied by the data method, use that; otherwise look in the configuration file. If neither is supplied, return 1.

Returns:: Sigma value selected by the search.
Return type:: Any
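
The lookup order described above can be read as a simple fallback chain. The sketch below is an illustrative stand-in, not the library's code: `resolve_sigma`, `method_value`, and a `config` dictionary with a `sigma_true` key are all hypothetical names.

```python
from typing import Any, Optional


def resolve_sigma(method_value: Optional[Any], config: dict) -> Any:
    """Hypothetical helper: prefer the data method's value, then the config, then 1."""
    if method_value is not None:
        return method_value
    if config.get("sigma_true") is not None:
        return config["sigma_true"]
    # Neither source supplied a value, so fall back to the documented default.
    return 1
```

The same pattern would apply to get_theta_true(), with None as the final fallback instead of 1.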

get_simulator_output_shape()¶

Run a single sample of the simulator to verify the out-shape.

Returns:: Output shape of a single sample of the simulator.
Return type:: tuple[Sequence[int]]

get_theta_true()¶

Look for the true theta given by the data. If it is supplied by the data method, use that; otherwise look in the configuration file. If neither is supplied, return None.

Returns:: Theta value selected by the search.
Return type:: Any

load_prior(prior, prior_kwargs)¶

Load the prior. Either try to get it from data (if it has been implemented for the type of data), or use numpy to initialize a random distribution using the prior argument.

Parameters:

prior (str) – Name of prior.
prior_kwargs (dict[str, Any]) – kwargs for initializing the prior.

Raises:

NotImplementedError – The selected prior is not included.
RuntimeError – The selected prior is missing arguments to initialize.

Returns:

Prior that can be sampled from by calling it with prior(n_samples)

Return type:

callable
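
A minimal sketch of how a prior named by string might be resolved to a NumPy sampler. The helper `make_prior`, its `seed` argument, and the error messages are assumptions made for illustration; only `numpy.random.default_rng` and the Generator distribution methods are real NumPy calls.

```python
import numpy as np


def make_prior(name: str, prior_kwargs: dict, seed: int = 0):
    """Hypothetical helper mapping a prior name onto a NumPy Generator method."""
    rng = np.random.default_rng(seed)
    choices = {
        "normal": rng.normal,
        "poisson": rng.poisson,
        "uniform": rng.uniform,
        "gamma": rng.gamma,
        "beta": rng.beta,
    }
    if name not in choices:
        raise NotImplementedError(f"Prior '{name}' is not included.")
    sampler = choices[name]

    def prior(n_samples: int) -> np.ndarray:
        try:
            # Forward the user's kwargs (for example low/high for "uniform").
            return sampler(size=n_samples, **prior_kwargs)
        except TypeError as err:
            raise RuntimeError(
                f"Prior '{name}' could not be initialized with {prior_kwargs}"
            ) from err

    return prior
```

With this sketch, `make_prior("uniform", {"low": 0.0, "high": 2.0})(10)` returns ten draws from the interval [0, 2).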

sample_prior(n_samples)¶

Draw samples from the prior.

Parameters:: n_samples (int) – Number of samples to draw
Return type:: np.ndarray

simulated_context(n_samples)¶

Call the simulator’s generate_context method.

Parameters:: n_samples (int) – Number of samples to draw.
Returns:: context (x values), as defined by the simulator.
Return type:: np.ndarray

simulator_outcome(theta, condition_context=None, n_samples=None)¶

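Run the simulator for the given theta, using the supplied context if it is known and randomly drawn context otherwise.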

Parameters:

theta (np.ndarray) – Theta value of shape (n_samples, theta_dimensions)
condition_context (np.ndarray, optional) – If x values for theta are known, use them. Defaults to None.
n_samples (int, optional) – If x values are not known for theta, draw them randomly. Defaults to None.

Raises:

ValueError – If neither n_samples nor condition_context is supplied.

Returns:

Simulator output of shape (n samples, simulator_dimensions)

Return type:

np.ndarray
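
The choice between known and drawn context can be sketched as follows. This is a stand-alone illustration that assumes the error is raised when neither `condition_context` nor `n_samples` is given; `simulate_with_context` and its two callable arguments are hypothetical and stand in for the simulator's own methods.

```python
from typing import Callable, Optional

import numpy as np


def simulate_with_context(
    simulate: Callable[[np.ndarray, np.ndarray], np.ndarray],
    generate_context: Callable[[int], np.ndarray],
    theta: np.ndarray,
    condition_context: Optional[np.ndarray] = None,
    n_samples: Optional[int] = None,
) -> np.ndarray:
    """Hypothetical stand-in: use known context if given, otherwise draw it."""
    if condition_context is None:
        if n_samples is None:
            raise ValueError("Supply either condition_context or n_samples.")
        # No x values are known for theta, so draw them from the simulator.
        condition_context = generate_context(n_samples)
    return simulate(theta, condition_context)
```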

true_context()¶: True data x values, if supplied by the data method.

true_simulator_outcome()¶

Run the simulator on all true theta and true x values.

Returns:: array of (n samples, simulator shape) showing output of the simulator on all true samples in data.
Return type:: np.ndarray

class deepdiagnostics.data.H5Data(path, simulator, simulator_kwargs=None, prior=None, prior_kwargs=None, simulation_dimensions=None)¶

Load data that has been saved in h5 format.

Data Parameters

Xs:: [REQUIRED] The context, the x values. The data that was used to train a model on what conditions produce what posterior.
Thetas:: [REQUIRED] The theta, the parameters of the external model. The data used to train the model’s posterior.
Prior:: Distribution used to initialize the posterior before training.
Sigma:: True standard deviation of the actual thetas, if known.

get_sigma_true()¶

Try to get the true standard deviation of the data. If it is not supplied, return 1.

Returns:: sigma.
Return type:: Any

get_theta_true()¶

Get stored theta used to train the model.

Returns:: theta array
Raises:: NotImplementedError – Data does not have thetas.

prior()¶

If the data has a supplied prior, return it. If not, the data module will default back to picking a prior from a random distribution.

Raises:: NotImplementedError – The data does not have a prior field.

true_context()¶

Try to get the xs field of the loaded data.

Raises:: NotImplementedError – The data does not have an xs field.

class deepdiagnostics.data.PickleData(path, simulator, simulator_kwargs=None, prior=None, prior_kwargs=None, simulation_dimensions=None)¶

Load data that is saved as a .pkl file.

save(data, path)¶

Save data in the form of a .pkl file.

Parameters:

data (Any) – Data that can be encoded into a pkl.
path (str) – Out file path for the data. Must have a .pkl extension.

Return type:

None
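
As a rough illustration of the kind of file involved, the sketch below round-trips a dictionary through Python's standard `pickle` module. The `xs` and `thetas` keys mirror the fields listed for H5Data and are an assumption here; the layout PickleData actually expects is not stated in this section.

```python
import os
import pickle
import tempfile

# Assumed layout: context under "xs", parameters under "thetas".
data = {"xs": [[0.1], [0.2]], "thetas": [[1.0, 2.0], [3.0, 4.0]]}

# Write to a temporary location; the file name carries the .pkl extension.
path = os.path.join(tempfile.mkdtemp(), "example.pkl")
with open(path, "wb") as handle:
    pickle.dump(data, handle)

# Read the file back to confirm the contents survive the round trip.
with open(path, "rb") as handle:
    restored = pickle.load(handle)
```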

class deepdiagnostics.data.simulator.Simulator¶

abstract generate_context(n_samples)¶

[ABSTRACT, MUST BE FILLED]

Specify how the conditioning context is generated. Can come from data, or from a generic distribution.

Example:

# Generate from a random distribution
class MySim(Simulator):
    def generate_context(self, n_samples: int) -> np.ndarray:
        return np.random.uniform(0, 1, size=n_samples)

# Draw from a sample
class MySim(Simulator):
    def __init__(self):
        self.data_source = .....

    def generate_context(self, n_samples: int) -> np.ndarray:
        return self.data_source.sample(n_samples)

Parameters:: n_samples (int) – Number of samples of context to pull
Returns:: Conditioning context used to produce simulated outcomes with a given theta.
Return type:: np.ndarray

abstract simulate(theta, context_samples)¶

[ABSTRACT, MUST BE FILLED]

Specify a simulation S such that y_{theta} = S(context_samples|theta)

Example:

# Combine each theta row with its matching context sample
class MySim(Simulator):
    def simulate(self, theta: np.ndarray, context_samples: np.ndarray) -> np.ndarray:
        simulation_results = np.zeros((theta.shape[0], 1))
        for index, context in enumerate(context_samples):
            simulation_results[index] = theta[index][0]*context + theta[index][1]*context

        return simulation_results

Parameters:

theta (np.ndarray) – Parameters of the simulation model
context_samples (np.ndarray) – Samples to use with the theta-primed simulation model

Returns:

Simulated outcome.

Return type:

np.ndarray
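
Putting the two abstract methods together, a complete subclass might look like the sketch below. To keep it runnable on its own, it defines a stand-in `Simulator` base class rather than importing the library's; `LinearSim` and its output shape convention (one row per theta, one column per context sample) are hypothetical choices, not requirements stated in this section.

```python
from abc import ABC, abstractmethod

import numpy as np


class Simulator(ABC):
    """Stand-in for the library's base class so the sketch runs on its own."""

    @abstractmethod
    def generate_context(self, n_samples: int) -> np.ndarray: ...

    @abstractmethod
    def simulate(self, theta: np.ndarray, context_samples: np.ndarray) -> np.ndarray: ...


class LinearSim(Simulator):
    """Hypothetical simulator: y = slope * x + intercept for each theta row."""

    def generate_context(self, n_samples: int) -> np.ndarray:
        # Evenly spaced x values on [0, 1].
        return np.linspace(0.0, 1.0, n_samples)

    def simulate(self, theta: np.ndarray, context_samples: np.ndarray) -> np.ndarray:
        slope, intercept = theta[:, 0], theta[:, 1]
        # Broadcast each theta row against the full context: (n_theta, n_context).
        return slope[:, None] * context_samples[None, :] + intercept[:, None]


sim = LinearSim()
theta = np.array([[1.0, 0.0], [2.0, 1.0]])
context = sim.generate_context(3)
outcome = sim.simulate(theta, context)
```

In real use the class would inherit from deepdiagnostics.data.simulator.Simulator and be registered with utils.register_simulator before its name is passed as simulator_name.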
