Generating Datasets

A DeepLenstronomy Demo.

deeplenstronomy is a software package that lets you interact with the strong gravitational lensing simulation software lenstronomy in a streamlined framework. Let's take a look at how it works!

deeplenstronomy works by reading user-prepared configuration files. Start by specifying the configuration file you will use to make the dataset.

The configuration file is a yaml-style file for specifying all the properties of your dataset. Here is what this file contains:
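The file can be displayed without leaving Python. The snippet below is a minimal sketch: the helper name show_config and the path data/demo.yaml are assumptions made for illustration, not part of deeplenstronomy.

```python
from pathlib import Path


def show_config(path):
    """Return the raw text of a configuration file so it can be printed."""
    return Path(path).read_text()


# Hypothetical path; replace it with the location of your own file.
config_file = "data/demo.yaml"

# Only print when the file is actually present on this machine.
if Path(config_file).exists():
    print(show_config(config_file))
```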

There's a lot of information in there. To learn about how configuration files for deeplenstronomy are structured, check out the Creating deeplenstronomy Configuration Files documentation.

Simulating a Dataset

Let's put deeplenstronomy to work!
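A minimal sketch of the call is shown here. It assumes the package is imported as deeplenstronomy.deeplenstronomy (aliased to dl, matching the dl.make_dataset call used later in this demo) and that the configuration file lives at the hypothetical path data/demo.yaml. The wrapper simulate_dataset is only an illustrative helper.

```python
import importlib.util
from pathlib import Path


def simulate_dataset(config_file):
    """Run deeplenstronomy on a configuration file and return the dataset object."""
    # Imported lazily so this snippet loads even where deeplenstronomy is absent.
    import deeplenstronomy.deeplenstronomy as dl

    return dl.make_dataset(config_file)


# Hypothetical path; point this at your own configuration file.
config_file = "data/demo.yaml"

# Run the simulation only when both the package and the file are available.
if importlib.util.find_spec("deeplenstronomy") and Path(config_file).exists():
    dataset = simulate_dataset(config_file)
```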

That's it. You now have your dataset.

Understanding the make_dataset() Function

All the information about the contents of the dataset is stored in the configuration file, but the make_dataset() function will tell deeplenstronomy what you want to do with your dataset. Let's look at the function definition:

def make_dataset(config, dataset=None, save_to_disk=False, store_in_memory=True,
                 verbose=False, store_sample=False, image_file_format='npy',
                 survey=None, return_planes=False, skip_image_generation=False,
                 solve_lens_equation=False):
    """
    Generate a dataset from a config file

    :param config: yaml file specifying dataset characteristics
                   OR
                   pre-parsed yaml file as a dictionary
    :param verbose: bool, if true, print status updates
    :param store_in_memory: bool, save images and metadata as attributes
    :param save_to_disk: bool, save images and metadata to disk
    :param store_sample: bool, save five images and metadata as attribute
    :param image_file_format: outfile format type (npy, h5)
    :param survey: str, a default astronomical survey to use
    :param return_planes: bool, if true, return the separate planes of simulated images
    :param skip_image_generation: bool, if true, skip image generation
    :param solve_lens_equation: bool, if true, calculate the source positions
    :return: dataset: instance of dataset class
    """

If all the default parameters work for you, then all you have to do is call dataset = dl.make_dataset(config_file) and you'll be good to go. If not, the parameters can be modified to produce the desired behavior.

make_dataset() Parameters:

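The first argument, config, is the filename of your configuration file as a string. As the docstring notes, a pre-parsed yaml file in the form of a dictionary is also accepted.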

If you would like brief status messages of the simulation to be printed as make_dataset() runs, you can set verbose=True.

By default, the simulated images and associated metadata will be stored in your computer's memory as attributes of the object returned by make_dataset(). If you would like to prevent this, typically motivated by just wanting to write the simulation products directly to output files, you can pass store_in_memory=False.

By default, the simulated images and associated metadata are not saved to output files. If you would like them to be saved, set save_to_disk=True and make sure to specify an out directory in your configuration file.

If you would like to take a peek at the simulated images and metadata to inspect them without storing the entire dataset in memory, you can set store_sample=True. This setting will store five images for each configuration in memory as attributes of the object returned by make_dataset().

When saving images to disk, you can choose the image file format. The currently supported options are image_file_format='npy' (the default) and image_file_format='h5'.

deeplenstronomy has built-in information about multiple large astronomical surveys. Currently, the options for the survey argument are des, delve, lsst, ztf, euclid, and hst. Specifying one of these will overwrite the values of the SURVEY and IMAGE sections of your configuration file at runtime.

By default, deeplenstronomy will calculate light from lenses, sources, point sources, and noise separately before stacking them into one output image. If you would prefer to keep the lens, source, point source, and noise information separate for your analysis, you can pass return_planes=True. The image arrays in your dataset will gain an extra dimension of length 4 (one for each of these possible sources of light).

If you would like to generate all the lenstronomy inputs specified in your configuration file but not generate images, for example to check that your distributions are working as expected, you can skip the image generation step while still outputting all the metadata by setting skip_image_generation=True.

If you would like to analyze the positions and numbers of individual images of the source galaxy in each of your dataset images, you will need to solve the lens equation. To do this, set solve_lens_equation=True and this information will be calculated and stored in the metadata. This calculation is skipped by default for performance reasons.
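The parameters above can be combined freely. As a rough sketch, the two keyword sets below correspond to "write everything to disk without holding it in memory" and "produce metadata only". The parameter names are copied from the function signature shown earlier; the helper check_kwargs and the variable names are illustrative and not part of deeplenstronomy.

```python
# Keyword names copied from the make_dataset() signature shown above.
VALID_KWARGS = {
    "dataset", "save_to_disk", "store_in_memory", "verbose", "store_sample",
    "image_file_format", "survey", "return_planes", "skip_image_generation",
    "solve_lens_equation",
}


def check_kwargs(kwargs):
    """Raise TypeError if a keyword is not in the make_dataset() signature."""
    unknown = set(kwargs) - VALID_KWARGS
    if unknown:
        raise TypeError(f"unknown make_dataset() arguments: {sorted(unknown)}")
    return kwargs


# Write products straight to disk as h5 files, keeping only a small sample in memory.
write_to_disk_only = check_kwargs(dict(
    save_to_disk=True,
    store_in_memory=False,
    store_sample=True,
    image_file_format="h5",
    verbose=True,
))

# Skip image generation but solve the lens equation, so only metadata is produced.
metadata_only = check_kwargs(dict(
    skip_image_generation=True,
    solve_lens_equation=True,
))

# With deeplenstronomy installed, either set would be passed along like this:
# dataset = dl.make_dataset(config_file, **write_to_disk_only)
```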

Interacting with the Dataset

Now that we have a dataset, let's look at what was stored in the dataset variable.

The configuration labels you specified in the configuration file are stored here. The reasoning is that if you plan to do some sort of supervised classification, you will probably want to have the images labeled.

The dataset name, size, and output directory are also stored as attributes of the dataset object.

There are a few other things that get stored automatically (that you can explore via dir(dataset)), but we'll shift our focus to the things we simulated.

The most interesting information is stored here in these attributes:

The _images attribute is a numpy.ndarray object and the _metadata attribute is a pandas.DataFrame object.
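To see which of these attributes an object carries, you can filter the output of dir(). The sketch below uses a synthetic stand-in for the dataset object so that it runs without deeplenstronomy; the array shape, the column name, and the helper simulated_attributes are all made up for illustration.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd

# Stand-in for the object returned by make_dataset(); all values are synthetic.
dataset = SimpleNamespace(
    CONFIGURATION_1_images=np.zeros((3, 5, 8, 8)),
    CONFIGURATION_1_metadata=pd.DataFrame({"example_column": [0.1, 0.2, 0.3]}),
)


def simulated_attributes(obj):
    """List attribute names that end in _images or _metadata."""
    return sorted(n for n in dir(obj) if n.endswith(("_images", "_metadata")))


# Print each attribute name next to the type of object it holds.
for name in simulated_attributes(dataset):
    print(name, type(getattr(dataset, name)).__name__)
```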

Images

Let's check out some of the images in the dataset.CONFIGURATION_1_images attribute.

What's in this array?

The array dimensions are (image index, band, x_pixels, y_pixels).

The number of images is the size of the dataset multiplied by the fraction of the dataset in CONFIGURATION_1, both of which you specify in the configuration file. The bands used are also specified in the configuration file. Finally, yup you guessed it, the image dimensions are also specified in the configuration file.
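The arithmetic behind the shape can be sketched as follows. The dataset size, fraction, band list, and pixel count are hypothetical values chosen for illustration, and the fraction is picked so that the product is a whole number, since how deeplenstronomy rounds a fractional count is not covered here.

```python
import numpy as np

# Hypothetical configuration values, chosen only for illustration.
dataset_size = 100                 # total number of images in the dataset
fraction = 0.25                    # share of the dataset in CONFIGURATION_1
bands = ["g", "r", "i", "z", "Y"]  # bands to simulate
num_pix = 100                      # image side length in pixels

# Number of images in this configuration, then the full array shape.
n_images = int(dataset_size * fraction)
expected_shape = (n_images, len(bands), num_pix, num_pix)

# A synthetic array with the same layout: (image index, band, x_pixels, y_pixels).
images = np.zeros(expected_shape)
print(images.shape)  # (25, 5, 100, 100)
```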

deeplenstronomy also has built-in visualization functions that are demonstrated in more detail in the Visualization Notebook.

Let's look at the r-band of image index 2 in CONFIGURATION_1:
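Selecting that image is plain array indexing. The sketch below uses a random synthetic array and assumes the band axis follows the order g, r, i, z, Y; check the band order in your own configuration file. The helper select_band is illustrative only.

```python
import numpy as np

bands = ["g", "r", "i", "z", "Y"]  # assumed band order along axis 1
rng = np.random.default_rng(0)

# Synthetic stand-in for dataset.CONFIGURATION_1_images.
images = rng.random((25, len(bands), 100, 100))


def select_band(images, bands, image_index, band):
    """Return the 2D image for one image index and one named band."""
    return images[image_index, bands.index(band)]


r_band_image = select_band(images, bands, image_index=2, band="r")
print(r_band_image.shape)  # (100, 100)

# The 2D array can then be passed to a plotting call such as matplotlib's imshow.
```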

You can also look at all the bands for this image at once.

Metadata

Once you have an image, you may want to consider the parameters that went into its generation to better understand what you made. To check that out, you can view the metadata saved by deeplenstronomy.

Let's look at the properties of the metadata.

Wow. That's a lot of columns. The number of columns increases with the complexity of your configurations, since there is more information for deeplenstronomy to keep track of. The columns are also broken up by band, so doubling the number of bands will double the number of columns in the metadata.

Let's look at the column names to see what information we have.

Every individual number used in the lenstronomy simulation is tracked.

The row index in the metadata dataframe also corresponds to the image index in the image array, so you can track which image has which properties. The dataframe contents can be accessed like this:
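A small pandas sketch of this kind of access follows. The dataframe is synthetic and the column names are hypothetical; they only imitate a per-band naming pattern, so substitute the names printed from your own metadata.

```python
import pandas as pd

# Synthetic metadata with hypothetical, per-band column names.
metadata = pd.DataFrame({
    "PLANE_1-OBJECT_1-REDSHIFT-g": [0.20, 0.30, 0.25, 0.40],
    "PLANE_1-OBJECT_1-REDSHIFT-r": [0.20, 0.30, 0.25, 0.40],
    "PLANE_2-OBJECT_1-REDSHIFT-g": [1.10, 0.90, 1.30, 1.00],
    "PLANE_2-OBJECT_1-REDSHIFT-r": [1.10, 0.90, 1.30, 1.00],
})

# Row 2 of the metadata describes image index 2 of the image array.
image_index = 2
row = metadata.loc[image_index]

# Keep only the columns that belong to the r band.
r_columns = [c for c in metadata.columns if c.endswith("-r")]
print(row[r_columns])

# A whole column is available as a numpy array through .values.
print(metadata["PLANE_1-OBJECT_1-REDSHIFT-g"].values)
```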

Saving Datasets

If you are working in interactive mode, you can straightforwardly save the images array and metadata dataframe in any file format you are comfortable with.
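For instance, numpy and pandas can write the two objects directly. This is a sketch with synthetic data; the helper save_configuration and the choice of npy and csv files are illustrative, and any format supported by numpy or pandas would work equally well.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Synthetic stand-ins for one configuration's image array and metadata.
images = np.zeros((4, 5, 8, 8))
metadata = pd.DataFrame({"example_column": [0.1, 0.2, 0.3, 0.4]})


def save_configuration(images, metadata, out_dir, name):
    """Write an image array to .npy and a metadata dataframe to .csv."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    image_path = out_dir / f"{name}_images.npy"
    metadata_path = out_dir / f"{name}_metadata.csv"
    np.save(image_path, images)
    # index=False keeps the row index out of the file; rows stay in image order.
    metadata.to_csv(metadata_path, index=False)
    return image_path, metadata_path


# A temporary directory is used here so the example cleans up after itself.
with tempfile.TemporaryDirectory() as tmp:
    paths = save_configuration(images, metadata, tmp, "CONFIGURATION_1")
    print([p.name for p in paths])
```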

If you instead choose to set the save_to_disk=True option when making your dataset, let's look at what gets saved where.

Recall that the dataset object has the user-specified out directory as an attribute.

Let's look in that directory.

The image arrays have been stored as numpy files. They can be loaded by doing

images = numpy.load('MySimulationResults/CONFIGURATION_1_images.npy')

The metadata dataframes have been written to csv files. They can be loaded by doing

metadata = pandas.read_csv('MySimulationResults/CONFIGURATION_1_metadata.csv')
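The two loading calls can be wrapped in one helper that also checks that the files belong together. This is a sketch: load_configuration is a hypothetical helper, and it assumes the file naming shown in the two calls above and the MySimulationResults directory used there.

```python
from pathlib import Path

import numpy as np
import pandas as pd


def load_configuration(out_dir, name):
    """Load the saved image array and metadata for one configuration."""
    out_dir = Path(out_dir)
    images = np.load(out_dir / f"{name}_images.npy")
    metadata = pd.read_csv(out_dir / f"{name}_metadata.csv")
    # Each metadata row describes one image, so the lengths should agree.
    if len(images) != len(metadata):
        raise ValueError("image array and metadata have different lengths")
    return images, metadata


# Load only if the output directory from this demo is present.
out_dir = Path("MySimulationResults")
if (out_dir / "CONFIGURATION_1_images.npy").exists():
    images, metadata = load_configuration(out_dir, "CONFIGURATION_1")
    print(images.shape, metadata.shape)
```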

Future versions of deeplenstronomy will include file format flexibility and built-in dataset loading functions.

The End

That's pretty much all there is to generating datasets with deeplenstronomy! Feel free to contact me with any suggestions or bugs.