Data

class deepdiagnostics.data.data.Data(path, simulator_name, simulator_kwargs=None, prior=None, prior_kwargs=None, simulation_dimensions=None)

Load stored data to use in diagnostics

Parameters:
  • path (str) – path to the data file.

  • simulator_name (str) – Name of the registered simulator. If your simulator is not registered with utils.register_simulator, an error is raised here.

  • simulator_kwargs (dict, optional) – Any additional kwargs used to set up your simulator. Defaults to None.

  • prior (str, optional) – If the prior is not given in the data, use a numpy random distribution, specified by name. Choose from: {“normal”, “poisson”, “uniform”, “gamma”, “beta”, “binominal”}. Defaults to None.

  • prior_kwargs (dict, optional) – kwargs for the numpy prior. See the numpy random distribution documentation for the arguments each distribution accepts. Defaults to None.

  • simulation_dimensions (Optional[int], optional) – 1 or 2. 1 -> the output of the simulator has one dimension; 2 -> the output has two dimensions (is an image). Defaults to None.

get_sigma_true()

Look for the true sigma of the data. If it is supplied in the method, use that; otherwise look in the configuration file. If neither is supplied, return 1.

Returns:

Sigma value selected by the search.

Return type:

Any
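
The lookup order described above can be sketched as a simple fallback chain. This is a minimal illustration, not the library's implementation: resolve_sigma, its supplied argument, and a config mapping with a "sigma_true" key are all hypothetical names chosen for the example.

```python
from typing import Any, Mapping, Optional


def resolve_sigma(supplied: Optional[Any] = None,
                  config: Optional[Mapping[str, Any]] = None) -> Any:
    """Return the first available sigma: explicit value, then config, then 1."""
    if supplied is not None:
        return supplied
    # "sigma_true" is an assumed key name, used here only for illustration.
    if config is not None and "sigma_true" in config:
        return config["sigma_true"]
    # Neither source supplied a value, so fall back to 1.
    return 1
```

Calling resolve_sigma() with no arguments returns the default, while an explicitly supplied value always takes precedence over the configuration.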

load_prior(prior, prior_kwargs)

Load the prior. Either try to get it from data (if it has been implemented for the type of data), or use numpy to initialize a random distribution using the prior argument.

Parameters:
  • prior (str) – Name of prior.

  • prior_kwargs (dict[str, any]) – kwargs for initializing the prior.

Raises:
  • NotImplementedError – The selected prior is not included.

  • RuntimeError – The selected prior is missing arguments to initialize.

Returns:

Prior that can be sampled from by calling it with prior(n_samples)

Return type:

callable
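
One way a named numpy distribution could be turned into a callable prior is sketched below. It is an assumption-laden illustration rather than the library's code: make_prior and its seed argument are hypothetical, only a subset of the documented distribution names is wired up, and the error conditions simply mirror the Raises section above.

```python
from typing import Any, Callable, Dict, Optional

import numpy as np


def make_prior(name: str,
               kwargs: Optional[Dict[str, Any]] = None,
               seed: Optional[int] = None) -> Callable[[int], np.ndarray]:
    """Build a callable prior(n_samples) from a numpy distribution name."""
    rng = np.random.default_rng(seed)
    # Only a subset of the documented names is wired up in this sketch.
    choices = {
        "normal": rng.normal,
        "uniform": rng.uniform,
        "gamma": rng.gamma,
        "beta": rng.beta,
    }
    if name not in choices:
        raise NotImplementedError(f"Prior '{name}' is not included.")
    kwargs = dict(kwargs or {})
    sampler = choices[name]
    try:
        # Draw once so that missing arguments surface at construction time.
        sampler(size=1, **kwargs)
    except TypeError as err:
        raise RuntimeError(f"Prior '{name}' is missing arguments: {err}") from err

    def prior(n_samples: int) -> np.ndarray:
        return sampler(size=n_samples, **kwargs)

    return prior
```

For example, make_prior("uniform", {"low": 0.0, "high": 1.0}) returns a function that yields an array of shape (n_samples,) when called with n_samples, whereas make_prior("gamma") fails with RuntimeError because the gamma distribution requires a shape argument.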

sample_prior(n_samples)

Sample from the prior.

Parameters:

n_samples (int) – Number of samples to draw.

Returns:

Samples drawn from the prior.

Return type:

np.ndarray

class deepdiagnostics.data.H5Data(path, simulator, simulator_kwargs=None, prior=None, prior_kwargs=None, simulation_dimensions=None)

Load data that has been saved in a h5 format.

If you cast your problem to be y = mx + b, these are the fields required and what they represent:

  • simulator_outcome - y

  • thetas - parameters of the model - m, b

  • context - xs

Data Parameters
Xs:

[REQUIRED] The context (the x values). This is the data used to train the model on which conditions produce which posterior.

Thetas:

[REQUIRED] The thetas (the parameters of the external model). This is the data used to train the model’s posterior.

Prior:

Distribution used to initialize the posterior before training.

Sigma:

True standard deviation of the actual thetas, if known.
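
For the y = mx + b framing, the three required arrays might be assembled as in the sketch below. The array shapes, the sampling ranges, and the use of a plain dictionary are assumptions made for illustration; how the fields are written to an h5 file is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 100

# thetas: one (m, b) pair per sample.
thetas = rng.uniform(low=-1.0, high=1.0, size=(n_samples, 2))
# context: the x value at which each outcome is evaluated.
context = rng.uniform(low=0.0, high=10.0, size=(n_samples, 1))
# simulator_outcome: y = m * x + b for each sample.
simulator_outcome = thetas[:, [0]] * context + thetas[:, [1]]

data = {
    "context": context,
    "thetas": thetas,
    "simulator_outcome": simulator_outcome,
}
```

Each row of the three arrays then describes one simulated observation: the parameters that produced it, the x value it was evaluated at, and the resulting y.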

get_sigma_true()

Try to get the true standard deviation of the data. If it is not supplied, return 1.

Returns:

sigma.

Return type:

Any

prior()

If the data has a supplied prior, return it. If not, the data module falls back to picking a prior from a random distribution.

Raises:

NotImplementedError – The data does not have a prior field.

class deepdiagnostics.data.PickleData(path, simulator, simulator_kwargs=None, prior=None, prior_kwargs=None, simulation_dimensions=None)

Load data that is saved as a .pkl file.

save(data, path)

Save data in the form of a .pkl file.

Parameters:
  • data (Any) – Data that can be encoded into a pkl.

  • path (str) – Out file path for the data. Must have a .pkl extension.

Return type:

None

class deepdiagnostics.data.simulator.Simulator
abstract generate_context(n_samples)

[ABSTRACT, MUST BE FILLED]

Specify how the conditioning context is generated. Can come from data, or from a generic distribution.

Example:

# Generate from a random distribution
class MySim(Simulator):
    def generate_context(self, n_samples: int) -> np.ndarray:
        # Draw one context value per requested sample
        return np.random.uniform(0, 1, size=n_samples)

# Draw from a sample
class MySim(Simulator):
    def __init__(self):
        self.data_source = .....

    def generate_context(self, n_samples: int) -> np.ndarray:
        return self.data_source.sample(n_samples)

Parameters:

n_samples (int) – Number of samples of context to pull

Returns:

Conditioning context used to produce simulated outcomes with a given theta.

Return type:

np.ndarray

abstract simulate(theta, context_samples)

[ABSTRACT, MUST BE FILLED]

Specify a simulation S such that y_{theta} = S(context_samples|theta)

Example:

# Compute one outcome per (theta, context) pair
class MySim(Simulator):
    def simulate(self, theta: np.ndarray, context_samples: np.ndarray) -> np.ndarray:
        # np.zeros takes the shape as a single tuple
        simulation_results = np.zeros((theta.shape[0], 1))
        for index, context in enumerate(context_samples):
            simulation_results[index] = theta[index][0]*context + theta[index][1]*context
        return simulation_results

Parameters:
  • theta (np.ndarray) – Parameters of the simulation model

  • context_samples (np.ndarray) – Samples to use with the theta-primed simulation model

Returns:

Simulated outcome.

Return type:

np.ndarray
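
Putting both abstract methods together, a complete subclass for the y = mx + b case might look like the sketch below. To keep it runnable on its own, it defines a local stand-in for the Simulator base class instead of importing the real one; LinearSim and the choice of a uniform context are hypothetical.

```python
from abc import ABC, abstractmethod

import numpy as np


class Simulator(ABC):
    """Local stand-in for the Simulator base class, for illustration only."""

    @abstractmethod
    def generate_context(self, n_samples: int) -> np.ndarray: ...

    @abstractmethod
    def simulate(self, theta: np.ndarray, context_samples: np.ndarray) -> np.ndarray: ...


class LinearSim(Simulator):
    """Hypothetical simulator for y = m * x + b with theta = (m, b)."""

    def generate_context(self, n_samples: int) -> np.ndarray:
        # One x value per sample, drawn uniformly from [0, 1).
        return np.random.uniform(0, 1, size=n_samples)

    def simulate(self, theta: np.ndarray, context_samples: np.ndarray) -> np.ndarray:
        theta = np.atleast_2d(theta)
        # Vectorized over samples: column 0 is m, column 1 is b.
        y = theta[:, 0] * context_samples + theta[:, 1]
        return y.reshape(-1, 1)
```

The vectorized form avoids the per-sample loop and returns one row per sample, so the output shape is (n_samples, 1).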

class deepdiagnostics.data.lookup_table_simulator.LookupTableSimulator(data, random_state, outside_range_limit=2.0, hash_precision=10)

A lookup table that mocks a simulator. Assumes your data is perfectly representative of a simulator (or else that you are okay with nearest-neighbor matching).

Does not need to be registered; it is automatically available as the default simulator.

Assumes your data has the following fields, accessible as data[“context”], data[“thetas”], and data[“simulator_outcome”], where context holds the xs, thetas holds the parameters, and simulator_outcome holds the ys.

Parameters:
  • data (tensor)

  • random_state (Generator)

  • outside_range_limit (float)

  • hash_precision (int)

generate_context(n_samples)

Draw samples from the context data

simulate(theta, context_samples)

Find the outcome y for a given theta and context sample. If no exact match, take the nearest neighbor (via the L2 norm of normalized theta and context)

Parameters:
  • theta (Union[np.ndarray, float]) – parameter(s) to simulate

  • context_samples (Union[np.ndarray, float]) – context(s) to condition on

Returns:

Simulated outcome(s)

Return type:

np.ndarray
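
The nearest-neighbor fallback can be illustrated with a small standalone function. This is a sketch under stated assumptions, not the class's actual code: nearest_outcome and its argument names are hypothetical, min-max scaling is assumed as the normalization, and the roles of outside_range_limit and hash_precision are not modeled.

```python
import numpy as np


def nearest_outcome(table_thetas: np.ndarray,
                    table_context: np.ndarray,
                    table_outcomes: np.ndarray,
                    theta: np.ndarray,
                    context: np.ndarray) -> np.ndarray:
    """Return the stored outcome whose (theta, context) row is closest to the query."""
    keys = np.hstack([table_thetas, table_context]).astype(float)
    query = np.concatenate([np.ravel(theta), np.ravel(context)]).astype(float)
    # Min-max normalize each column so no single feature dominates the distance.
    lo = keys.min(axis=0)
    span = keys.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid dividing by zero on constant columns
    # L2 distance between every normalized table row and the normalized query.
    distances = np.linalg.norm((keys - lo) / span - (query - lo) / span, axis=1)
    return table_outcomes[np.argmin(distances)]
```

An exact match has distance zero and is therefore returned directly; any other query returns the outcome of the closest stored row.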