magnet.data

Data

class magnet.data.Data(train, val=None, test=None, val_split=0.2, **kwargs)

A container that holds the training, validation and test sets and returns a DataLoader when called.

This is a convenient abstraction which is used downstream with the Trainer and various debuggers.

It works in tandem with the custom Dataset, DataLoader and Sampler sub-classes that MagNet defines.

Parameters:

train (Dataset) – The training set
val (Dataset) – The validation set. Default: None
test (Dataset) – The test set. Default: None
val_split (float) – The fraction of training data to hold out as validation if validation set is not given. Default: 0.2

Keyword Arguments:

num_workers (int) – How many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. Default: 0
collate_fn (callable) – Merges a list of samples to form a mini-batch. Default: pack_collate()
pin_memory (bool) – If True, the data loader will copy tensors into CUDA pinned memory before returning them. Default: False
timeout (numeric) – If positive, the timeout value for collecting a batch from workers. Should always be non-negative. Default: 0
worker_init_fn (callable) – If not None, this will be called on each worker subprocess with the worker id (an int in [0, num_workers - 1]) as input, after seeding and before data loading. Default: None
transforms (list or callable) – A list of transforms to be applied to each datapoint. Default: None
fetch_fn (callable) – A function which is applied to each datapoint before collating. Default: None
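
To make the val_split parameter concrete, the following is a minimal pure-Python sketch of holding out a fraction of the training indices as a validation set. It is not MagNet's implementation: the helper name split_train_val, the seed argument and the choice to shuffle before splitting are all assumptions made for illustration.

```python
import random


def split_train_val(n_samples, val_split=0.2, seed=0):
    """Return (train_indices, val_indices) for a dataset with n_samples items.

    Hypothetical helper: illustrates the idea behind val_split only.
    """
    if not 0.0 <= val_split < 1.0:
        raise ValueError("val_split must be in [0, 1)")

    indices = list(range(n_samples))
    # Shuffle with a local RNG so the split is reproducible.
    # (Shuffling before splitting is an assumption of this sketch.)
    random.Random(seed).shuffle(indices)

    # The first n_val shuffled indices become the validation set.
    n_val = int(n_samples * val_split)
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_train_val(100, val_split=0.2)
print(len(train_idx), len(val_idx))
```

With the default val_split of 0.2, one fifth of the training indices end up in the validation list and the two lists never overlap.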

__call__(batch_size=1, shuffle=False, replace=False, probabilities=None, sample_space=None, mode='train')

Returns a MagNet DataLoader that iterates over the dataset.

Parameters:

batch_size (int) – How many samples per batch to load. Default: 1
shuffle (bool) – Set to True to have the data reshuffled at every epoch. Default: False
replace (bool) – If True, datapoints are drawn with replacement, so the same datapoint can be sampled more than once per epoch. Default: False
probabilities (list or numpy.ndarray) – An array of probabilities of drawing each member of the dataset. Default: None
sample_space (float or int or list) – The fraction (float), length (int) or explicit indices (list) of the subset of the dataset to draw samples from. Default: None
mode (str) – One of ['train', 'val', 'test']. Default: 'train'
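
The sampling parameters above interact, so a small sketch may help. The code below is a pure-Python illustration, not MagNet's sampler: the helper names resolve_sample_space and draw_epoch are hypothetical, a float sample_space is assumed to mean "the first fraction of the dataset", an int is assumed to mean "the first N items", and probabilities are only honoured when replace is True.

```python
import random


def resolve_sample_space(n_samples, sample_space=None):
    """Turn a fraction / length / index list into a concrete list of indices."""
    if sample_space is None:
        return list(range(n_samples))
    if isinstance(sample_space, float):
        # Fraction of the dataset (assumed: the leading items).
        return list(range(int(n_samples * sample_space)))
    if isinstance(sample_space, int):
        # Number of items (assumed: the leading items).
        return list(range(sample_space))
    # Otherwise treat it as an explicit collection of indices.
    return list(sample_space)


def draw_epoch(indices, shuffle=False, replace=False, probabilities=None, seed=0):
    """Return the order in which indices are visited during one epoch."""
    rng = random.Random(seed)
    if replace:
        # With replacement: items may repeat, optionally weighted.
        return rng.choices(indices, weights=probabilities, k=len(indices))
    order = list(indices)
    if shuffle:
        rng.shuffle(order)
    return order


space = resolve_sample_space(10, sample_space=0.5)
print(draw_epoch(space, shuffle=True))
print(draw_epoch(space, replace=True, probabilities=[0.1, 0.1, 0.1, 0.1, 0.6]))
```

Without replacement every index in the sample space is visited exactly once per epoch; with replacement the epoch has the same length but may repeat some indices and skip others.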

magnet.data.core.MNIST(val_split=0.2, path=None, **kwargs)

The MNIST Dataset.

Parameters:

val_split (float) – The fraction of training data to hold out as validation if validation set is not given. Default: 0.2
path (pathlib.Path or str) – The path to save the dataset to. Default: Magnet Datapath

Keyword Arguments:

See Data for more details.

magnet.data.transforms.augmented_image_transforms(d=0, t=0, s=0, sh=0, ph=0, pv=0, resample=2)

Returns a list of augmented transforms to be applied to natural images.

Parameters:

d (sequence or float or int) – Range of degrees to select from. Default: 0
t (tuple) – Tuple of maximum absolute fraction for horizontal and vertical translations. Default: 0
s (tuple, optional) – Scaling factor interval. Default: 0
sh (sequence or float or int, optional) – Range of shear. Default: 0
ph (float) – The probability of flipping the image horizontally. Default: 0
pv (float) – The probability of flipping the image vertically. Default: 0
resample (int) – An optional resampling filter, given as a PIL filter constant (2 is PIL.Image.BILINEAR). Default: 2

See torchvision.transforms for more details.
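
As a rough illustration of what these ranges mean, the sketch below samples one set of augmentation parameters in pure Python, following the conventions torchvision documents for random affine transforms (a single number x expands to the range (-x, x); translations are fractions of the image size). The helpers as_range and sample_augmentation are hypothetical, only two-element ranges are handled, and treating 0 as "disabled" for t and s is an assumption about how the defaults behave.

```python
import random


def as_range(value):
    """Expand a single number x into (-x, x); pass two-element sequences through."""
    if isinstance(value, (int, float)):
        return (-value, value)
    return tuple(value)


def sample_augmentation(width, height, d=0, t=0, s=0, sh=0, ph=0, pv=0, rng=None):
    """Sample one set of augmentation parameters for a width x height image."""
    rng = rng or random.Random()

    angle = rng.uniform(*as_range(d))

    # Assumption: a falsy t or s (the default 0) disables that augmentation.
    max_dx, max_dy = (t[0] * width, t[1] * height) if t else (0, 0)
    dx = rng.uniform(-max_dx, max_dx)
    dy = rng.uniform(-max_dy, max_dy)
    scale = rng.uniform(*s) if s else 1.0

    shear = rng.uniform(*as_range(sh))

    return {
        "angle": angle,
        "translate": (dx, dy),
        "scale": scale,
        "shear": shear,
        # Flip with probability ph (horizontal) and pv (vertical).
        "hflip": rng.random() < ph,
        "vflip": rng.random() < pv,
    }


params = sample_augmentation(
    28, 28, d=15, t=(0.1, 0.1), s=(0.9, 1.1), ph=0.5, rng=random.Random(0)
)
print(params)
```

With all arguments left at 0, the sampled parameters describe the identity transform, which matches the intent of the defaults listed above.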

magnet.data.transforms.image_transforms(augmentation=0, direction='horizontal')

Returns a list of transforms to be applied to natural images.

Parameters:

augmentation (float) – The percentage of augmentation to be applied. Default: 0
direction (str) – The direction to flip the image at random. Default: 'horizontal'