Data

class deepdiagnostics.data.data.Data(path, simulator_name, simulator_kwargs=None, prior=None, prior_kwargs=None, simulation_dimensions=None)

Load stored data to use in diagnostics

Parameters:
  • path (str) – path to the data file.

  • simulator_name (str) – Name of the registered simulator. If your simulator has not been registered with utils.register_simulator, an error is raised here.

  • simulator_kwargs (dict, optional) – Any additional kwargs used to set up your simulator. Defaults to None.

  • prior (str, optional) – If the prior is not given in the data, use a numpy random distribution, specified by name. Choose from: {"normal", "poisson", "uniform", "gamma", "beta", "binominal"}. Defaults to None.

  • prior_kwargs (dict, optional) – kwargs for the numpy prior. See the numpy random distribution documentation for the arguments each distribution accepts. Defaults to None.

  • simulation_dimensions (Optional[int], optional) – 1 or 2. 1 -> the output of the simulator has one dimension; 2 -> the output has two dimensions (is an image). Defaults to None.

get_sigma_true()

Look for the true sigma of the data. If it is supplied by the data method, use that; otherwise look in the configuration file. If neither is supplied, return 1.

Returns:

Sigma value selected by the search.

Return type:

Any

get_simulator_output_shape()

Run a single sample of the simulator to verify the out-shape.

Returns:

Output shape of a single sample of the simulator.

Return type:

tuple[Sequence[int]]
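One way to picture this check is to push a single parameter sample through the simulator and read the shape of the result. The sketch below is an assumption about the approach, not the library's implementation; `toy_simulator` and `single_sample_shape` are hypothetical names introduced for illustration.

```python
import numpy as np


def toy_simulator(theta: np.ndarray, context: np.ndarray) -> np.ndarray:
    """Hypothetical simulator: one line y = m*x + b per row of theta."""
    slope, intercept = theta[:, 0:1], theta[:, 1:2]
    return slope * context[None, :] + intercept


def single_sample_shape(simulator, n_theta_dims: int, context: np.ndarray) -> tuple:
    """Run one sample through the simulator and return the per-sample output shape."""
    theta = np.zeros((1, n_theta_dims))  # a single placeholder sample
    output = simulator(theta, context)   # shape (1, *sample_shape)
    return output.shape[1:]              # drop the leading sample axis


context = np.linspace(0.0, 1.0, 10)
shape = single_sample_shape(toy_simulator, n_theta_dims=2, context=context)
print(shape)
```

Running one sample keeps the check cheap while still exercising the simulator end to end.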

get_theta_true()

Look for the true theta given by the data. If it is supplied by the data method, use that; otherwise look in the configuration file. If neither is supplied, return None.

Returns:

Theta value selected by the search.

Return type:

Any
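The lookup order described for get_sigma_true and get_theta_true (data method first, then configuration file, then a default) can be sketched as a small helper. This is an illustration only: `resolve_true_value` and the configuration keys `theta_true` and `sigma_true` are hypothetical and are not taken from the library.

```python
from typing import Any, Optional


def resolve_true_value(method_value: Optional[Any], config: dict, key: str, default: Any) -> Any:
    """Return the first available value: data method, then configuration, then default."""
    if method_value is not None:
        return method_value
    if config.get(key) is not None:
        return config[key]
    return default


config = {"theta_true": [0.5, 1.5]}  # hypothetical configuration contents

# Nothing from the data method, so the configuration value is used.
theta = resolve_true_value(None, config, "theta_true", default=None)
# Neither source has a sigma, so the default of 1 is returned.
sigma = resolve_true_value(None, config, "sigma_true", default=1)
# A value from the data method takes priority over everything else.
from_method = resolve_true_value(2.0, config, "sigma_true", default=1)
```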

load_prior(prior, prior_kwargs)

Load the prior. Either try to get it from data (if it has been implemented for the type of data), or use numpy to initialize a random distribution using the prior argument.

Parameters:
  • prior (str) – Name of prior.

  • prior_kwargs (dict[str, any]) – kwargs for initializing the prior.

Raises:
  • NotImplementedError – The selected prior is not included.

  • RuntimeError – The selected prior is missing arguments to initialize.

Returns:

Prior that can be sampled from by calling it with prior(n_samples)

Return type:

callable
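As a rough sketch of the numpy fallback, a prior name can be mapped to a method of numpy's random `Generator` and wrapped so that it is called as prior(n_samples). The function `load_prior_sketch`, its `seed` argument, and the trial draw used to detect missing arguments are assumptions made for this example; the library's own implementation may differ.

```python
from functools import partial

import numpy as np


def load_prior_sketch(prior: str, prior_kwargs: dict, seed: int = 0):
    """Build a callable prior(n_samples) from a numpy distribution name."""
    rng = np.random.default_rng(seed)
    choices = {
        "normal": rng.normal,
        "poisson": rng.poisson,
        "uniform": rng.uniform,
        "gamma": rng.gamma,
        "beta": rng.beta,
        "binominal": rng.binomial,  # spelling follows the documented option name
    }
    if prior not in choices:
        raise NotImplementedError(f"Prior '{prior}' is not one of {sorted(choices)}")

    sampler = partial(choices[prior], **prior_kwargs)
    try:
        sampler(size=1)  # trial draw to surface missing or unexpected arguments early
    except TypeError as err:
        raise RuntimeError(f"Cannot initialize prior '{prior}': {err}") from err

    return lambda n_samples: sampler(size=n_samples)


prior = load_prior_sketch("uniform", {"low": -1.0, "high": 1.0})
samples = prior(1000)
```

Distributions such as gamma and beta have no default for their shape parameters, so an empty `prior_kwargs` triggers the RuntimeError path in this sketch.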

sample_prior(n_samples)

Draw samples from the prior.

Parameters:

n_samples (int) – Number of samples to draw

Return type:

np.ndarray

simulated_context(n_samples)

Call the simulator’s generate_context method.

Parameters:

n_samples (int) – Number of samples to draw.

Returns:

context (x values), as defined by the simulator.

Return type:

np.ndarray

simulator_outcome(theta, condition_context=None, n_samples=None)

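Run the simulator on the given theta, using the supplied context if it is known and randomly drawn context otherwise.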

Parameters:
  • theta (np.ndarray) – Theta value of shape (n_samples, theta_dimensions)

  • condition_context (np.ndarray, optional) – If x values for theta are known, use them. Defaults to None.

  • n_samples (int, optional) – If x values are not known for theta, draw them randomly. Defaults to None.

Raises:

ValueError – If neither n_samples nor condition_context is supplied.

Returns:

Simulator output of shape (n_samples, simulator_dimensions).

Return type:

np.ndarray
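The choice between known and randomly drawn context can be sketched as follows. This assumes the error is raised when neither `condition_context` nor `n_samples` is given; `simulator_outcome_sketch`, `draw_context`, and `line` are hypothetical stand-ins rather than library functions.

```python
from typing import Callable, Optional

import numpy as np


def simulator_outcome_sketch(
    simulate: Callable[[np.ndarray, np.ndarray], np.ndarray],
    generate_context: Callable[[int], np.ndarray],
    theta: np.ndarray,
    condition_context: Optional[np.ndarray] = None,
    n_samples: Optional[int] = None,
) -> np.ndarray:
    """Simulate with known context if given, otherwise draw n_samples context values."""
    if condition_context is None:
        if n_samples is None:
            raise ValueError("Supply either condition_context or n_samples.")
        condition_context = generate_context(n_samples)
    return simulate(theta, condition_context)


# Hypothetical stand-ins for a registered simulator's two methods.
rng = np.random.default_rng(0)
draw_context = lambda n: rng.uniform(0.0, 1.0, size=n)
line = lambda theta, x: theta[:, 0:1] * x[None, :] + theta[:, 1:2]

theta = np.array([[2.0, 1.0], [0.5, -1.0]])
# Known x values: evaluate each (slope, intercept) row at x = 0 and x = 1.
known = simulator_outcome_sketch(line, draw_context, theta, condition_context=np.array([0.0, 1.0]))
# Unknown x values: draw five of them at random first.
drawn = simulator_outcome_sketch(line, draw_context, theta, n_samples=5)
```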

true_context()

True data x values, if supplied by the data method.

true_simulator_outcome()

Run the simulator on all true theta and true x values.

Returns:

Array of shape (n_samples, simulator shape) holding the simulator output for all true samples in the data.

Return type:

np.ndarray

class deepdiagnostics.data.H5Data(path, simulator, simulator_kwargs=None, prior=None, prior_kwargs=None, simulation_dimensions=None)

Load data that has been saved in h5 format.

Data Parameters
Xs:

[REQUIRED] The context, the x values. The data that was used to train a model on what conditions produce what posterior.

Thetas:

[REQUIRED] The theta, the parameters of the external model. The data used to train the model’s posterior.

Prior:

Distribution used to initialize the posterior before training.

Sigma:

True standard deviation of the actual thetas, if known.

get_sigma_true()

Try to get the true standard deviation of the data. If it is not supplied, return 1.

Returns:

sigma.

Return type:

Any

get_theta_true()

Get stored theta used to train the model.

Returns:

theta array

Raises:

NotImplementedError – Data does not have thetas.

prior()

If the data has a supplied prior, return it. If not, the data module will default back to picking a prior from a random distribution.

Raises:

NotImplementedError – The data does not have a prior field.

true_context()

Try to get the xs field of the loaded data.

Raises:

NotImplementedError – The data does not have a xs field.

class deepdiagnostics.data.PickleData(path, simulator, simulator_kwargs=None, prior=None, prior_kwargs=None, simulation_dimensions=None)

Load data that is saved as a .pkl file.

save(data, path)

Save data in the form of a .pkl file.

Parameters:
  • data (Any) – Data that can be encoded into a pkl.

  • path (str) – Out file path for the data. Must have a .pkl extension.

Return type:

None
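For orientation, saving and reloading a .pkl file with Python's standard `pickle` module looks roughly like the sketch below. The helpers `save_pickle` and `load_pickle` are hypothetical, the extension check is only one guess at how the ".pkl extension" requirement might be enforced, and the field names in `payload` are placeholders.

```python
import os
import pickle
import tempfile
from typing import Any


def save_pickle(data: Any, path: str) -> None:
    """Write data to path with pickle, refusing paths without a .pkl extension."""
    if not path.endswith(".pkl"):
        raise ValueError(f"Expected a .pkl path, got '{path}'")
    with open(path, "wb") as handle:
        pickle.dump(data, handle)


def load_pickle(path: str) -> Any:
    """Read back an object written by save_pickle."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Placeholder field names, used only to have something to round-trip.
payload = {"xs": [0.0, 0.5, 1.0], "thetas": [[1.0, 2.0]]}

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "example.pkl")
    save_pickle(payload, out_path)
    restored = load_pickle(out_path)
```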

class deepdiagnostics.data.simulator.Simulator

abstract generate_context(n_samples)

[ABSTRACT, MUST BE FILLED]

Specify how the conditioning context is generated. Can come from data, or from a generic distribution.

Example:

import numpy as np
from deepdiagnostics.data.simulator import Simulator

# Generate from a random distribution
class MySim(Simulator):
    def generate_context(self, n_samples: int) -> np.ndarray:
        # Return one context value per requested sample
        return np.random.uniform(0, 1, size=n_samples)

# Draw from a sample
class MySim(Simulator):
    def __init__(self):
        self.data_source = .....

    def generate_context(self, n_samples: int) -> np.ndarray:
        return self.data_source.sample(n_samples)

Parameters:

n_samples (int) – Number of samples of context to pull

Returns:

Conditioning context used to produce simulated outcomes with a given theta.

Return type:

np.ndarray

abstract simulate(theta, context_samples)

[ABSTRACT, MUST BE FILLED]

Specify a simulation S such that y_{theta} = S(context_samples|theta)

Example:

# Combine each context value with the matching row of theta
class MySim(Simulator):
    def simulate(self, theta: np.ndarray, context_samples: np.ndarray) -> np.ndarray:
        # np.zeros takes the shape as a single tuple
        simulation_results = np.zeros((theta.shape[0], 1))
        for index, context in enumerate(context_samples):
            simulation_results[index] = theta[index][0]*context + theta[index][1]*context

        return simulation_results

Parameters:
  • theta (np.ndarray) – Parameters of the simulation model

  • context_samples (np.ndarray) – Samples to use with the theta-primed simulation model

Returns:

Simulated outcome.

Return type:

np.ndarray
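Putting the two abstract methods together, a complete simulator might look like the sketch below. To keep it runnable on its own, `SimulatorBase` is a stand-in abstract class written for this example rather than the library's Simulator, and `LinearSim` with its slope and intercept interpretation of theta is hypothetical.

```python
from abc import ABC, abstractmethod

import numpy as np


class SimulatorBase(ABC):
    """Stand-in for the library's Simulator so the sketch runs on its own."""

    @abstractmethod
    def generate_context(self, n_samples: int) -> np.ndarray: ...

    @abstractmethod
    def simulate(self, theta: np.ndarray, context_samples: np.ndarray) -> np.ndarray: ...


class LinearSim(SimulatorBase):
    def __init__(self, seed: int = 0):
        self.rng = np.random.default_rng(seed)

    def generate_context(self, n_samples: int) -> np.ndarray:
        # One x value per requested sample, drawn uniformly from [0, 1).
        return self.rng.uniform(0.0, 1.0, size=n_samples)

    def simulate(self, theta: np.ndarray, context_samples: np.ndarray) -> np.ndarray:
        # theta has shape (n, 2): column 0 is the slope, column 1 the intercept.
        theta = np.atleast_2d(theta)
        slope, intercept = theta[:, 0:1], theta[:, 1:2]
        # Broadcasting gives one simulated line per row of theta: shape (n, n_context).
        return slope * context_samples[None, :] + intercept


sim = LinearSim()
context = sim.generate_context(4)
outcome = sim.simulate(np.array([[2.0, 1.0], [0.0, 3.0]]), context)
```

Broadcasting replaces the explicit loop over samples, which tends to matter once many theta rows are simulated at once.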