magnet.data

Data

class magnet.data.Data(train, val=None, test=None, val_split=0.2, **kwargs)

A container that holds the training, validation and test sets and returns a DataLoader when called.

It is a convenience abstraction used downstream by the Trainer and various debuggers.

It works in tandem with the custom Dataset, DataLoader and Sampler sub-classes that MagNet defines.

Parameters:
  • train (Dataset) – The training set
  • val (Dataset) – The validation set. Default: None
  • test (Dataset) – The test set. Default: None
  • val_split (float) – The fraction of the training data to hold out for validation if no validation set is given. Default: 0.2
Keyword Arguments:
  • num_workers (int) – How many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. Default: 0
  • collate_fn (callable) – Merges a list of samples to form a mini-batch. Default: pack_collate()
  • pin_memory (bool) – If True, the data loader will copy tensors into CUDA pinned memory before returning them. Default: False
  • timeout (numeric) – If positive, the timeout for collecting a batch from workers. Must be non-negative. Default: 0
  • worker_init_fn (callable) – If not None, this will be called on each worker subprocess with the worker id (an int in [0, num_workers - 1]) as input, after seeding and before data loading. Default: None
  • transforms (list or callable) – A list of transforms to be applied to each datapoint. Default: None
  • fetch_fn (callable) – A function which is applied to each datapoint before collating. Default: None
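
The effect of val_split can be illustrated with a minimal, library-free sketch. This reference does not show how MagNet actually performs the split, so the helper name split_indices, the seed argument and the use of a shuffled index list are all assumptions made for illustration only.

```python
import random

# Hypothetical helper (not part of MagNet): holds out a fraction of
# indices for validation, mirroring what val_split is described as doing.
def split_indices(n, val_split=0.2, seed=0):
    """Return (train_idx, val_idx) with int(n * val_split) indices held out."""
    if not 0.0 <= val_split < 1.0:
        raise ValueError("val_split must be in [0, 1)")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # deterministic shuffle for a given seed
    n_val = int(n * val_split)            # number of held-out datapoints
    return indices[n_val:], indices[:n_val]

train_idx, val_idx = split_indices(10, val_split=0.2)
print(len(train_idx), len(val_idx))  # 8 2
```

With the default of 0.2, a training set of 10 datapoints would leave 8 for training and 2 for validation under this scheme.
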
__call__(batch_size=1, shuffle=False, replace=False, probabilities=None, sample_space=None, mode='train')

Returns a MagNet DataLoader that iterates over the dataset.

Parameters:
  • batch_size (int) – How many samples per batch to load. Default: 1
  • shuffle (bool) – Set to True to have the data reshuffled at every epoch. Default: False
  • replace (bool) – If True, datapoints are drawn with replacement, so any datapoint can be sampled more than once per epoch. Default: False
  • probabilities (list or numpy.ndarray) – An array of probabilities of drawing each member of the dataset. Default: None
  • sample_space (float or int or list) – The fraction / length / indices of the sample to draw from. Default: None
  • mode (str) – One of ['train', 'val', 'test']. Default: 'train'
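
One way to read sample_space, replace and probabilities together is sketched below using only the standard library. The function names resolve_sample_space and epoch_order are hypothetical, and several choices here are assumptions rather than documented MagNet behaviour: a fraction or length is taken from the start of the dataset, and probabilities are only honoured when replace is True.

```python
import random

# Hypothetical helpers (not part of MagNet) illustrating the parameters above.
def resolve_sample_space(n, sample_space=None):
    """Map a fraction, a length, or explicit indices to a list of indices."""
    if sample_space is None:
        return list(range(n))                      # whole dataset
    if isinstance(sample_space, float):
        return list(range(int(n * sample_space)))  # fraction of the dataset
    if isinstance(sample_space, int):
        return list(range(min(sample_space, n)))   # number of datapoints
    return list(sample_space)                      # explicit indices

def epoch_order(indices, shuffle=False, replace=False, probabilities=None, seed=0):
    """Return the order in which indices are visited during one epoch."""
    rng = random.Random(seed)
    if replace:
        # With replacement: a datapoint may appear more than once per epoch.
        # probabilities must align one-to-one with indices in this sketch.
        return rng.choices(indices, weights=probabilities, k=len(indices))
    if shuffle:
        return rng.sample(indices, len(indices))   # random permutation
    return list(indices)                           # sequential order

print(resolve_sample_space(10, 0.5))  # [0, 1, 2, 3, 4]
print(resolve_sample_space(10, 3))    # [0, 1, 2]
```

The float check comes before the int check so that a fraction such as 0.5 is never mistaken for a length.
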

Core Datasets

magnet.data.core.MNIST(val_split=0.2, path=None, **kwargs)

The MNIST Dataset.

Parameters:
  • val_split (float) – The fraction of the training data to hold out for validation if no validation set is given. Default: 0.2
  • path (pathlib.Path or str) – The path to save the dataset to. Default: MagNet Datapath
Keyword Arguments:
  • **kwargs – See Data for more details.

Transforms

magnet.data.transforms.augmented_image_transforms(d=0, t=0, s=0, sh=0, ph=0, pv=0, resample=2)

Returns a list of augmentation transforms to be applied to natural images.

Parameters:
  • d (sequence or float or int) – Range of degrees to select from. Default: 0
  • t (tuple) – Tuple of maximum absolute fraction for horizontal and vertical translations. Default: 0
  • s (tuple, optional) – Scaling factor interval. Default: 0
  • sh (sequence or float or int, optional) – Range of shear. Default: 0
  • ph (float) – The probability of flipping the image horizontally. Default: 0
  • pv (float) – The probability of flipping the image vertically. Default: 0
  • resample (int) – An optional resampling filter. Default: 2 (the integer value of PIL.Image.BILINEAR)

See torchvision.transforms for more details.
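
The roles of ph and pv can be illustrated with a small, library-free sketch. The function random_flip below is hypothetical and is not MagNet's or torchvision's implementation; it treats an image as a list of pixel rows and flips it along each axis with the given probability.

```python
import random

# Hypothetical illustration (not part of MagNet or torchvision).
def random_flip(image, ph=0.0, pv=0.0, rng=random):
    """Flip a row-major image (a list of rows) with probabilities ph and pv."""
    if rng.random() < ph:
        image = [row[::-1] for row in image]  # horizontal flip: mirror left-right
    if rng.random() < pv:
        image = image[::-1]                   # vertical flip: mirror top-bottom
    return image

image = [[1, 2], [3, 4]]
print(random_flip(image, ph=1.0))  # [[2, 1], [4, 3]]
print(random_flip(image, pv=1.0))  # [[3, 4], [1, 2]]
print(random_flip(image))          # [[1, 2], [3, 4]]
```

With the defaults of 0 for both probabilities, the image is always returned unchanged, which matches the documented default of no flipping.
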

magnet.data.transforms.image_transforms(augmentation=0, direction='horizontal')

Returns a list of transforms to be applied to natural images.

Parameters:
  • augmentation (float) – The percentage of augmentation to be applied. Default: 0
  • direction (str) – The direction in which the image is randomly flipped. Default: 'horizontal'