magnet.data
Data

class magnet.data.Data(train, val=None, test=None, val_split=0.2, **kwargs)
A container which holds the training, validation and test sets and provides DataLoaders when called.
This is a convenient abstraction which is used downstream with the Trainer and various debuggers.
It works in tandem with the custom Dataset, DataLoader and Sampler sub-classes that MagNet defines.
Parameters:
- train (Dataset) – The training set.
- val (Dataset) – The validation set. Default: None
- test (Dataset) – The test set. Default: None
- val_split (float) – The fraction of training data to hold out as validation if no validation set is given. Default: 0.2
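The val_split parameter describes holding out part of the training data when no validation set is passed. MagNet's own splitting code is not shown here, so the following is only a minimal sketch, assuming a seeded random hold-out over datapoint indices; the helper name split_train_val is hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_train_val(indices: Sequence[int], val_split: float = 0.2,
                    seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: hold out a random `val_split` fraction of the
    training indices as a validation set (a sketch, not MagNet's code)."""
    if not 0.0 <= val_split < 1.0:
        raise ValueError("val_split must be in [0, 1)")
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_val = int(len(shuffled) * val_split)  # size of the held-out portion
    return shuffled[n_val:], shuffled[:n_val]


train_idx, val_idx = split_train_val(range(100), val_split=0.2)
print(len(train_idx), len(val_idx))  # 80 20
```

The two index lists are disjoint and together cover the original training set, which is the property a hold-out split needs.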
Keyword Arguments:
- num_workers (int) – How many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. Default: 0
- collate_fn (callable) – Merges a list of samples to form a mini-batch. Default: pack_collate()
- pin_memory (bool) – If True, the data loader will copy tensors into CUDA pinned memory before returning them. Default: False
- timeout (numeric) – If positive, the timeout value for collecting a batch from workers. Should always be non-negative. Default: 0
- worker_init_fn (callable) – If not None, this will be called on each worker subprocess with the worker id (an int in [0, num_workers - 1]) as input, after seeding and before data loading. Default: None
- transforms (list or callable) – A transform, or a list of transforms, to be applied to each datapoint. Default: None
- fetch_fn (callable) – A function which is applied to each datapoint before collating. Default: None
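To make "applied to each datapoint" and "merges a list of samples to form a mini-batch" concrete, here is a minimal pure-Python sketch. It is not pack_collate(), whose behaviour is not documented here; tuple_collate and apply_transforms are hypothetical names, and the sketch assumes each sample is an (input, label) tuple and that transforms run in list order.

```python
from typing import Any, Callable, List, Sequence, Tuple


def apply_transforms(x: Any, transforms: Sequence[Callable[[Any], Any]]) -> Any:
    """Apply each transform in order to a single datapoint."""
    for t in transforms:
        x = t(x)
    return x


def tuple_collate(samples: List[Tuple[Any, ...]]) -> Tuple[List[Any], ...]:
    """Hypothetical collate_fn: merge a list of (x, y, ...) samples into
    one list per field, e.g. [(x1, y1), (x2, y2)] -> ([x1, x2], [y1, y2])."""
    return tuple(list(field) for field in zip(*samples))


samples = [(1, 'a'), (2, 'b'), (3, 'c')]
# Transform only the input field of each sample, leaving the label as-is.
transformed = [(apply_transforms(x, [lambda v: v * 10, lambda v: v + 1]), y)
               for x, y in samples]
xs, ys = tuple_collate(transformed)
print(xs, ys)  # [11, 21, 31] ['a', 'b', 'c']
```

A real collate function for a PyTorch-based library would typically stack the per-field lists into tensors; plain lists keep this sketch dependency-free.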
__call__(batch_size=1, shuffle=False, replace=False, probabilities=None, sample_space=None, mode='train')
Returns a MagNet DataLoader that iterates over the dataset.
Parameters:
- batch_size (int) – How many samples per batch to load. Default: 1
- shuffle (bool) – Set to True to have the data reshuffled at every epoch. Default: False
- replace (bool) – If True, datapoints are sampled with replacement, so a datapoint can be drawn more than once per epoch. Default: False
- probabilities (list or numpy.ndarray) – An array of probabilities of drawing each member of the dataset. Default: None
- sample_space (float or int or list) – The fraction (float), length (int) or indices (list) of the dataset to draw from. Default: None
- mode (str) – One of ['train', 'val', 'test']. Default: 'train'
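The sampling parameters of __call__ interact: sample_space restricts the candidate pool, replace and probabilities decide how indices are drawn, and shuffle permutes them. The following is a minimal stdlib sketch of one epoch's index order under these assumptions; it is not MagNet's Sampler. The helpers resolve_sample_space, draw_indices and batches are hypothetical, and treating a float or int sample_space as the leading part of the dataset is an assumption.

```python
import random
from typing import List, Optional, Sequence, Union

SampleSpace = Union[None, float, int, Sequence[int]]


def resolve_sample_space(n: int, sample_space: SampleSpace) -> List[int]:
    """Turn sample_space into the list of candidate indices.

    float -> that fraction of the dataset, int -> that many datapoints,
    list -> exactly those indices. Taking the *leading* fraction or length
    is an assumption of this sketch.
    """
    if sample_space is None:
        return list(range(n))
    if isinstance(sample_space, float):
        return list(range(int(n * sample_space)))
    if isinstance(sample_space, int):
        return list(range(sample_space))
    return list(sample_space)


def draw_indices(n: int, shuffle: bool = False, replace: bool = False,
                 probabilities: Optional[Sequence[float]] = None,
                 sample_space: SampleSpace = None,
                 rng: Optional[random.Random] = None) -> List[int]:
    """Hypothetical sampler: the order in which one epoch visits the data."""
    rng = rng if rng is not None else random.Random()
    pool = resolve_sample_space(n, sample_space)
    weights = None if probabilities is None else [probabilities[i] for i in pool]
    if replace:
        # With replacement: a datapoint may be drawn several times per epoch.
        return rng.choices(pool, weights=weights, k=len(pool))
    if weights is not None:
        # Weighted sampling without replacement via Efraimidis-Spirakis keys;
        # the random keys also randomise the order, and zero-probability
        # datapoints are never drawn.
        keyed = [(rng.random() ** (1.0 / w), i) for i, w in zip(pool, weights) if w > 0]
        return [i for _, i in sorted(keyed, reverse=True)]
    if shuffle:
        return rng.sample(pool, len(pool))  # a fresh permutation on every call
    return pool


def batches(indices: List[int], batch_size: int = 1) -> List[List[int]]:
    """Chunk the index order into mini-batches; the last one may be smaller."""
    return [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]


print(batches(draw_indices(10, sample_space=0.5), batch_size=2))
# [[0, 1], [2, 3], [4]]
```

Calling draw_indices once per epoch with shuffle=True gives a new permutation each time, matching the "reshuffled at every epoch" description.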
Core Datasets

magnet.data.core.MNIST(val_split=0.2, path=None, **kwargs)
The MNIST Dataset.
Parameters:
- val_split (float) – The fraction of training data to hold out as validation if no validation set is given. Default: 0.2
- path (pathlib.Path or str) – The path to save the dataset to. Default: Magnet Datapath
Keyword Arguments: See Data for more details.
Transforms

magnet.data.transforms.augmented_image_transforms(d=0, t=0, s=0, sh=0, ph=0, pv=0, resample=2)
Returns a list of augmented transforms to be applied to natural images.
Parameters:
- d (sequence or float or int) – Range of degrees to select from. Default: 0
- t (tuple) – Tuple of maximum absolute fractions for horizontal and vertical translations. Default: 0
- s (tuple, optional) – Scaling factor interval. Default: 0
- sh (sequence or float or int, optional) – Range of shear. Default: 0
- ph (float) – The probability of flipping the image horizontally. Default: 0
- pv (float) – The probability of flipping the image vertically. Default: 0
- resample (int) – An optional resampling filter. Default: 2 (PIL.Image.BILINEAR)
See torchvision.transforms for more details.
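The augmentation parameters follow torchvision's RandomAffine conventions: a single number d or sh means the symmetric range (-d, d), t holds the maximum horizontal and vertical shift as fractions of the image width and height, and s is a (min, max) scaling interval. The following stdlib sketch of one random draw of these parameters plus the flip decisions is illustrative only. sample_augmentation and as_range are hypothetical names, and treating 0 as "disabled" is an assumption based on the defaults above.

```python
import random
from typing import Optional, Sequence, Tuple, Union

Number = Union[int, float]


def as_range(v: Union[Number, Sequence[Number]]) -> Tuple[float, float]:
    """A single number v means the symmetric range (-v, v); a pair is used as-is."""
    if isinstance(v, (int, float)):
        return (-float(v), float(v))
    lo, hi = v
    return (float(lo), float(hi))


def sample_augmentation(width: int, height: int, d=0, t=0, s=0, sh=0,
                        ph: float = 0.0, pv: float = 0.0,
                        rng: Optional[random.Random] = None) -> dict:
    """Hypothetical sketch: one random draw of augmentation parameters,
    following torchvision's RandomAffine conventions. 0 disables a part."""
    rng = rng if rng is not None else random.Random()
    angle = rng.uniform(*as_range(d)) if d else 0.0
    if t:
        # t holds the maximum |shift| as a fraction of the image size.
        max_dx, max_dy = t[0] * width, t[1] * height
        translate = (round(rng.uniform(-max_dx, max_dx)),
                     round(rng.uniform(-max_dy, max_dy)))
    else:
        translate = (0, 0)
    scale = rng.uniform(s[0], s[1]) if s else 1.0  # s is a (min, max) interval
    shear = rng.uniform(*as_range(sh)) if sh else 0.0
    flip_h = rng.random() < ph  # flip horizontally with probability ph
    flip_v = rng.random() < pv  # flip vertically with probability pv
    return {"angle": angle, "translate": translate, "scale": scale,
            "shear": shear, "flip_h": flip_h, "flip_v": flip_v}


params = sample_augmentation(32, 32, d=15, t=(0.1, 0.1), s=(0.9, 1.1), ph=0.5)
print(params)
```

For a 32x32 image with t=(0.1, 0.1), the shift in each direction stays within 3.2 pixels, so the rounded translation is at most 3 pixels.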