magnet.data
Data
class magnet.data.Data(train, val=None, test=None, val_split=0.2, **kwargs)
A container which holds the Training, Validation and Test Sets and provides DataLoaders on call.
This is a convenient abstraction which is used downstream with the Trainer and various debuggers.
It works in tandem with the custom Dataset, DataLoader and Sampler sub-classes that MagNet defines.
Parameters:
- train (Dataset) – The training set.
- val (Dataset) – The validation set. Default: None
- test (Dataset) – The test set. Default: None
- val_split (float) – The fraction of training data to hold out as validation if validation set is not given. Default: 0.2
Keyword Arguments:
- num_workers (int) – How many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. Default: 0
- collate_fn (callable) – Merges a list of samples to form a mini-batch. Default: pack_collate()
- pin_memory (bool) – If True, the data loader will copy tensors into CUDA pinned memory before returning them. Default: False
- timeout (numeric) – If positive, the timeout value for collecting a batch from workers. Should always be non-negative. Default: 0
- worker_init_fn (callable) – If not None, this will be called on each worker subprocess with the worker id (an int in [0, num_workers - 1]) as input, after seeding and before data loading. Default: None
- transforms (list or callable) – A list of transforms to be applied to each datapoint. Default: None
- fetch_fn (callable) – A function which is applied to each datapoint before collating. Default: None
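As an illustration of the worker_init_fn hook described above, here is a minimal sketch of a callable that gives each worker its own reproducible random stream. The names BASE_SEED and seed_worker are hypothetical and not part of MagNet, and only Python's standard random module is seeded; other libraries would need their own seeding calls.

```python
import random

# Hypothetical base seed (an assumption); any fixed integer works.
BASE_SEED = 1234


def seed_worker(worker_id):
    """Seed Python's RNG differently, but reproducibly, for each worker.

    worker_id is the int in [0, num_workers - 1] that the loader passes in.
    """
    random.seed(BASE_SEED + worker_id)


# Simulate what two workers would see right after initialisation.
seed_worker(0)
first_draw_worker0 = random.random()
seed_worker(1)
first_draw_worker1 = random.random()
```

A callable like this would be supplied through the worker_init_fn keyword argument documented above.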
__call__(batch_size=1, shuffle=False, replace=False, probabilities=None, sample_space=None, mode='train')
Returns a MagNet DataLoader that iterates over the dataset.
Parameters:
- batch_size (int) – How many samples per batch to load. Default: 1
- shuffle (bool) – Set to True to have the data reshuffled at every epoch. Default: False
- replace (bool) – If True, every datapoint can be resampled per epoch. Default: False
- probabilities (list or numpy.ndarray) – An array of probabilities of drawing each member of the dataset. Default: None
- sample_space (float or int or list) – The fraction / length / indices of the sample to draw from. Default: None
- mode (str) – One of ['train', 'val', 'test']. Default: 'train'
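The sampling options above can be easier to follow with a small model of how they might interact. The sketch below is not MagNet's sampler. It is a standard-library approximation built on these assumptions: a float sample_space is a fraction of the dataset taken from the start, an int is a count taken from the start, a list is explicit indices, and supplying probabilities implies drawing with replacement. The function names resolve_sample_space and draw_epoch, and the seed parameter, are hypothetical.

```python
import random


def resolve_sample_space(n, sample_space=None):
    """Turn sample_space into a list of candidate indices (assumed semantics)."""
    if sample_space is None:
        return list(range(n))
    if isinstance(sample_space, float):   # fraction of the dataset
        return list(range(round(n * sample_space)))
    if isinstance(sample_space, int):     # number of datapoints
        return list(range(min(sample_space, n)))
    return list(sample_space)             # explicit indices


def draw_epoch(n, shuffle=False, replace=False, probabilities=None,
               sample_space=None, seed=None):
    """Return the order in which indices would be visited in one epoch."""
    rng = random.Random(seed)
    indices = resolve_sample_space(n, sample_space)
    if replace or probabilities is not None:
        # Assumption: weighted draws are made with replacement.
        weights = None
        if probabilities is not None:
            weights = [probabilities[i] for i in indices]
        return rng.choices(indices, weights=weights, k=len(indices))
    if shuffle:
        rng.shuffle(indices)
    return indices
```

For example, draw_epoch(10, sample_space=0.5) visits indices 0 through 4 in order, while draw_epoch(4, probabilities=[0, 0, 1, 0]) can only ever return index 2.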
Core Datasets
magnet.data.core.MNIST(val_split=0.2, path=None, **kwargs)
The MNIST Dataset.
Parameters:
- val_split (float) – The fraction of training data to hold out as validation if validation set is not given. Default: 0.2
- path (pathlib.Path or str) – The path to save the dataset to. Default: Magnet Datapath
Keyword Arguments: See Data for more details.
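The val_split behaviour described above (holding out a fraction of the training data when no validation set is given) can be sketched with index lists. This is a minimal illustration, not MagNet's code: the helper name hold_out_split, the seed parameter, and the choice to shuffle before splitting are all assumptions.

```python
import random


def hold_out_split(n, val_split=0.2, seed=0):
    """Split range(n) into (train_indices, val_indices)."""
    if not 0 <= val_split < 1:
        raise ValueError("val_split must be in [0, 1)")
    indices = list(range(n))
    # Assumption: the held-out points are chosen at random.
    random.Random(seed).shuffle(indices)
    n_val = round(n * val_split)
    return indices[n_val:], indices[:n_val]


# With the default val_split of 0.2, 100 datapoints give an 80/20 split.
train_idx, val_idx = hold_out_split(100, val_split=0.2)
```

The two index lists are disjoint and together cover every datapoint, so nothing is used for both training and validation.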
Transforms
magnet.data.transforms.augmented_image_transforms(d=0, t=0, s=0, sh=0, ph=0, pv=0, resample=2)
Returns a list of augmented transforms to be applied to natural images.
Parameters:
- d (sequence or float or int) – Range of degrees to select from. Default: 0
- t (tuple) – Tuple of maximum absolute fraction for horizontal and vertical translations. Default: 0
- s (tuple, optional) – Scaling factor interval. Default: 0
- sh (sequence or float or int, optional) – Range of shear. Default: 0
- ph (float) – The probability of flipping the image horizontally. Default: 0
- pv (float) – The probability of flipping the image vertically. Default: 0
- resample (int) – An optional resampling filter. Default: 2
See torchvision.transforms for more details.
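To make the ph and pv parameters above concrete, the following sketch applies probabilistic flips to a tiny image stored as a list of rows. It only models the flip probabilities. The documented function returns transforms for real images, and the helper name random_flip and its rng parameter are hypothetical.

```python
import random


def random_flip(image, ph=0, pv=0, rng=random):
    """Flip a row-major image (list of rows) with the given probabilities."""
    if rng.random() < ph:    # horizontal flip: reverse each row
        image = [row[::-1] for row in image]
    if rng.random() < pv:    # vertical flip: reverse the row order
        image = image[::-1]
    return image


image = [[1, 2],
         [3, 4]]

always_horizontal = random_flip(image, ph=1)   # [[2, 1], [4, 3]]
always_vertical = random_flip(image, pv=1)     # [[3, 4], [1, 2]]
untouched = random_flip(image)                 # defaults of 0 never flip
```

With the default values of 0 the image is always returned unchanged, which matches the documented defaults meaning no flipping.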