dnadna.transforms
Data transforms that can be applied during training.
Functions
Remove sites that are no longer polymorphic in the sample.
Classes
Compose: Pseudo-transform that composes multiple transforms by applying them in order, one after the other.
Crop: Crop the SNP matrix and position array to a maximum size.
ReformatPosition: Changes the format of the input position array.
Rotate: Given a sequence, return a random rotation of it along the SNP axis.
SnpFormat: This transform specifies in what format the SNP matrix and position arrays are combined to form the input to the network.
Subsample: Subsample the SNP matrix of shape (n, k), with n individuals and k SNPs, and return a matrix of shape (m, l), with m < n individuals and l <= k SNPs, since columns that no longer contain a SNP are not kept.
Transform: Dataset transform.
ValidateSnp: A special transform that does not actually modify the data, but merely performs certain verifications on it.
Exceptions
InvalidSNPSample: Exception raised when a sample doesn't meet the minimum requirements for the dataset.
TransformException: Exception raised when applying a Transform to an input.
- class dnadna.transforms.Compose(transforms)
Bases: object
Pseudo-transform that composes multiple transforms by applying them in order one after the other.
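As a rough illustration of "applying them in order one after the other", the sketch below chains plain Python callables over a (sample, learned_params, scenario_params) triple. ComposeSketch and the two toy transforms are hypothetical stand-ins, not dnadna's Compose; the sketch only assumes that each transform takes that triple and returns a triple of the same form.

```python
from typing import Any, Callable, Sequence, Tuple

# Assumed shape of the item each transform receives and returns.
Item = Tuple[Any, Any, Any]


class ComposeSketch:
    """Hypothetical stand-in for a composing transform (not dnadna's class)."""

    def __init__(self, transforms: Sequence[Callable[[Item], Item]]):
        self.transforms = list(transforms)

    def __call__(self, item: Item) -> Item:
        # Feed the output of each transform into the next, in list order.
        for transform in self.transforms:
            item = transform(item)
        return item


# Toy transforms acting on a list that stands in for the sample.
def append_one(item: Item) -> Item:
    sample, learned, scenario = item
    return (sample + [1], learned, scenario)


def double_all(item: Item) -> Item:
    sample, learned, scenario = item
    return ([2 * x for x in sample], learned, scenario)


pipeline = ComposeSketch([append_one, double_all])
result = pipeline(([0, 5], None, None))  # result[0] == [0, 10, 2]
```

Because each transform sees the previous one's output, order matters: swapping the two toy transforms yields [0, 10, 1] instead.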
- class dnadna.transforms.Crop(max_snp=None, max_indiv=None, keep_polymorphic_only=True)
Bases: dnadna.transforms.Transform
Crop the SNP matrix and position array to a maximum size.
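A minimal NumPy sketch of cropping to a maximum size, assuming (as in the Subsample description) that rows are individuals and columns are SNP sites, with one position per column. crop_sketch is hypothetical. Its handling of keep_polymorphic_only (dropping columns whose entries are all equal after cropping individuals, before applying max_snp) is an assumption about what the parameter of the same name does, as is the order of the steps.

```python
from typing import Optional, Tuple

import numpy as np


def crop_sketch(
    snp: np.ndarray,
    pos: np.ndarray,
    max_snp: Optional[int] = None,
    max_indiv: Optional[int] = None,
    keep_polymorphic_only: bool = True,
) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical crop: rows are individuals, columns are SNP sites."""
    if max_indiv is not None:
        snp = snp[:max_indiv]  # keep the first max_indiv individuals
    if keep_polymorphic_only:
        # A site is polymorphic if not every individual carries the same value.
        polymorphic = (snp != snp[:1]).any(axis=0)
        snp, pos = snp[:, polymorphic], pos[polymorphic]
    if max_snp is not None:
        snp, pos = snp[:, :max_snp], pos[:max_snp]  # keep the first max_snp sites
    return snp, pos


snp = np.array([[0, 1, 1, 0],
                [1, 1, 0, 0],
                [0, 0, 1, 1]])
pos = np.array([5, 460, 900, 952])
cropped_snp, cropped_pos = crop_sketch(snp, pos, max_snp=1, max_indiv=2)
# Sites 460 and 952 are monomorphic among the first two individuals, so they
# are dropped before max_snp is applied, leaving only the site at position 5.
```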
- exception dnadna.transforms.InvalidSNPSample(msg, sample=None)
Bases: Exception
Exception raised when a sample doesn't meet the minimum requirements for the dataset.
Used by ValidateSnp.
- class dnadna.transforms.ReformatPosition(distance=None, normalized=None, circular=None, chromosome_size=None, initial_position=None)
Bases: dnadna.transforms.Transform
Changes the format of the input position array.
It can convert between normalized and unnormalized positions, and between distance and absolute position formats.
When initializing this transform, you only need to specify the parameters you explicitly want to convert; any format property left unspecified is kept as-is.
Warning
This transform should be applied before any other transform (e.g. Rotate) that can modify the order of positions, since this transform assumes positions are in increasing order.
- Keyword Arguments
distance (bool) – (optional) – If True, convert absolute positions to distances between consecutive SNPs (the first entry being the distance from the start of the chromosome); if False, convert distances back to absolute positions; if left unspecified, the current position format is kept.
normalized (bool) – (optional) – Whether SNP positions/distances are divided by the chromosome size. If True, unnormalized positions are converted to normalized positions; if False, normalized positions are converted back to unnormalized ones; if left unspecified, the current normalization is kept. The chromosome_size argument is also required when changing the normalization, unless chromosome_size is already specified on the inputs.
chromosome_size (int) – (optional) – Length of the chromosome; required when converting from normalized to unnormalized positions. If left unspecified but the input SNPSample has a chromosome_size in its pos_format, that value is used.
circular (bool) – (optional) – Whether the chromosome should be treated as circular when performing the transformation. If left unspecified, the input's circularity is kept.
initial_position (int or float) – (optional) – A position to use as the initial position when converting to or from circular positions.
Examples
>>> from dnadna.snp_sample import SNPSample
>>> from dnadna.transforms import ReformatPosition
>>> import numpy as np
Initial example with unnormalized absolute positions and chromosome_size = 1000:
>>> sample = SNPSample(np.eye(4), [5, 460, 900, 952],
...                    pos_format={'normalized': False, 'distance': False,
...                                'chromosome_size': 1000})
>>> xf = ReformatPosition(normalized=True)
>>> xf((sample, None, None))[0]
SNPSample(
    snp=tensor(...),
    pos=tensor([0.0050, 0.4600, 0.9000, 0.9520], dtype=torch.float64),
    pos_format={'normalized': True, 'distance': False, 'chromosome_size': 1000}
)
>>> xf = ReformatPosition(distance=True)
>>> xf((sample, None, None))[0]
SNPSample(
    snp=tensor(...),
    pos=tensor([ 5, 455, 440, 52]),
    pos_format={'normalized': False, 'distance': True, 'chromosome_size': 1000}
)
>>> xf = ReformatPosition(distance=True, normalized=True)
>>> dist_norm = xf((sample, None, None))[0]
>>> dist_norm
SNPSample(
    snp=tensor(...),
    pos=tensor([0.0050, 0.4550, 0.4400, 0.0520], dtype=torch.float64),
    pos_format={'normalized': True, 'distance': True, 'chromosome_size': 1000}
)
Convert from normalized distances back to unnormalized positions:
>>> xf = ReformatPosition(distance=False, normalized=False)
>>> xf((dist_norm, None, None))[0]
SNPSample(
    snp=tensor(...),
    pos=tensor([ 5, 460, 900, 952]),
    pos_format={'normalized': False, 'distance': False, 'chromosome_size': 1000}
)
Convert from normalized linear distances to circular distances:
>>> xf = ReformatPosition(circular=True, initial_position=0.005)
>>> xf((dist_norm, None, None))[0]
SNPSample(
    snp=tensor(...),
    pos=tensor([0.0530, 0.4550, 0.4400, 0.0520], dtype=torch.float64),
    pos_format={'normalized': True, 'distance': True, 'chromosome_size': 1000, 'circular': True, 'initial_position': 0.005}
)
Convert from positions to circular distances:
>>> xf = ReformatPosition(distance=True, circular=True)
>>> xf((sample, None, None))[0]
SNPSample(
    snp=tensor(...),
    pos=tensor([ 53, 455, 440, 52]),
    pos_format={'normalized': False, 'distance': True, 'chromosome_size': 1000, 'circular': True, 'initial_position': 5}
)
>>> xf = ReformatPosition(distance=True, normalized=True, circular=True)
>>> circ_norm = xf((sample, None, None))[0]
>>> circ_norm
SNPSample(
    snp=tensor(...),
    pos=tensor([0.0530, 0.4550, 0.4400, 0.0520], dtype=torch.float64),
    pos_format={'normalized': True, 'distance': True, 'chromosome_size': 1000, 'circular': True, 'initial_position': 0.005}
)
Convert circular distances back to non-circular distances:
>>> xf = ReformatPosition(circular=False)
>>> xf((circ_norm, None, None))[0]
SNPSample(
    snp=tensor(...),
    pos=tensor([0.0050, 0.4550, 0.4400, 0.0520], dtype=torch.float64),
    pos_format={'normalized': True, 'distance': True, 'chromosome_size': 1000, 'circular': False, 'initial_position': 0.005}
)
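The arithmetic behind the examples above can be reproduced with plain NumPy. The helpers below are hypothetical, not dnadna's implementation. They assume distances are differences between consecutive positions (the first measured from the start of the chromosome), and that in circular mode the first distance instead wraps around from the last SNP to the first, which matches the 53 = (1000 - 952) + 5 seen above.

```python
import numpy as np


def to_distances(pos, chromosome_size=None, circular=False):
    """Hypothetical: convert increasing absolute positions to distances."""
    pos = np.asarray(pos)
    dist = np.diff(pos, prepend=0)  # first entry is measured from position 0
    if circular:
        # Wrap around: from the last SNP, past the chromosome end, to the first.
        # chromosome_size must be given in this case.
        dist[0] = chromosome_size - pos[-1] + pos[0]
    return dist


def to_positions(dist, initial_position=None):
    """Hypothetical inverse of to_distances."""
    dist = np.asarray(dist)
    if initial_position is None:
        # Linear distances: running sum from the chromosome start.
        return np.cumsum(dist)
    # Circular distances: dist[0] wrapped around the chromosome end, so start
    # from the known first position and accumulate the remaining gaps.
    return initial_position + np.concatenate(([0], np.cumsum(dist[1:])))


pos = np.array([5, 460, 900, 952])
size = 1000
dist = to_distances(pos)                        # [5, 455, 440, 52]
circ = to_distances(pos, size, circular=True)   # [53, 455, 440, 52]
circ_norm = circ / size                         # [0.053, 0.455, 0.44, 0.052]
```

Note that recovering absolute positions from circular distances needs the first position, which is consistent with initial_position being recorded in pos_format in the circular examples.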
- class dnadna.transforms.Rotate
Bases: dnadna.transforms.Transform
Given a sequence, return a random rotation of it along the SNP axis.
- Args: None
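One way to picture a random rotation along the SNP axis is a circular shift of the SNP columns together with their positions, as in the NumPy sketch below. rotate_sketch is hypothetical: how dnadna chooses the offset, and whether it also adjusts the position values, is not described here, so the sketch only shifts both arrays by the same random amount. The rotated positions are no longer in increasing order, which is why the warning under ReformatPosition asks for that transform to run first.

```python
from typing import Optional, Tuple

import numpy as np


def rotate_sketch(
    snp: np.ndarray, pos: np.ndarray, rng: Optional[np.random.Generator] = None
) -> Tuple[np.ndarray, np.ndarray, int]:
    """Hypothetical rotation: circularly shift SNP columns and positions together."""
    rng = np.random.default_rng() if rng is None else rng
    shift = int(rng.integers(pos.shape[0]))  # offset in [0, number of SNPs)
    # Shift columns (axis 1, the SNP axis) and positions by the same amount
    # so that each position stays aligned with its SNP column.
    return np.roll(snp, shift, axis=1), np.roll(pos, shift), shift


snp = np.eye(4, dtype=int)
pos = np.array([5, 460, 900, 952])
rotated_snp, rotated_pos, shift = rotate_sketch(snp, pos, np.random.default_rng(0))
```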
- class dnadna.transforms.SnpFormat(format='concat')
Bases: dnadna.transforms.Transform
This transform specifies in what format the SNP matrix and position arrays are combined to form the input to the network.
Currently this can be one of:
concat: the position array and the SNP matrix are concatenated vertically with the position array becoming the first row of the tensor (this is the default, even if this transform is not used explicitly).
product: the SNP matrix is multiplied by the position array, so that each active site has the value of its position, rather than just 1.
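The two formats can be sketched in NumPy as follows, again assuming individuals along rows and SNPs along columns. combine_sketch is hypothetical and only mirrors the prose description of concat and product above; its fmt parameter plays the role of SnpFormat's format argument.

```python
import numpy as np


def combine_sketch(snp: np.ndarray, pos: np.ndarray, fmt: str = "concat") -> np.ndarray:
    """Hypothetical illustration of the concat and product input formats."""
    if fmt == "concat":
        # Positions become the first row, stacked above the SNP matrix.
        return np.vstack([pos, snp])
    if fmt == "product":
        # Broadcasting multiplies each column by its position, so every
        # 1 (active site) takes the value of that site's position.
        return snp * pos
    raise ValueError(f"unknown format: {fmt!r}")


snp = np.array([[1, 0, 1],
                [0, 1, 1]])
pos = np.array([0.1, 0.5, 0.9])
concat = combine_sketch(snp, pos, "concat")    # shape (3, 3), first row == pos
product = combine_sketch(snp, pos, "product")  # [[0.1, 0, 0.9], [0, 0.5, 0.9]]
```

With concat the network input gains one extra row; with product it keeps the SNP matrix's shape.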
- class dnadna.transforms.Subsample(size, keep_polymorphic_only=True)
Bases: dnadna.transforms.Transform
Subsample the SNP matrix of shape (n, k), with n individuals and k SNPs, and return a matrix of shape (m, l), with m < n individuals and l <= k SNPs. The number of SNPs can shrink because columns that no longer contain a SNP after subsampling are not kept.
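A NumPy sketch of the behaviour described above: draw m of the n individuals without replacement, then (when keep_polymorphic_only is true) drop the columns that no longer vary. subsample_sketch is hypothetical; treating size as the number of individuals to keep, and the use of a NumPy random Generator, are assumptions.

```python
from typing import Optional, Tuple

import numpy as np


def subsample_sketch(
    snp: np.ndarray,
    pos: np.ndarray,
    size: int,
    keep_polymorphic_only: bool = True,
    rng: Optional[np.random.Generator] = None,
) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical subsampling of individuals (rows) from an (n, k) SNP matrix."""
    rng = np.random.default_rng() if rng is None else rng
    n = snp.shape[0]
    if not 0 < size < n:
        raise ValueError(f"size must satisfy 0 < size < n (n={n}), got {size}")
    rows = np.sort(rng.choice(n, size=size, replace=False))  # m distinct individuals
    snp = snp[rows]
    if keep_polymorphic_only:
        # Keep only columns where the remaining individuals still differ.
        polymorphic = (snp != snp[:1]).any(axis=0)
        snp, pos = snp[:, polymorphic], pos[polymorphic]
    return snp, pos


snp = np.array([[1, 0, 0, 1],
                [1, 1, 0, 0],
                [1, 0, 1, 1],
                [0, 0, 0, 1]])
pos = np.array([5, 460, 900, 952])
sub_snp, sub_pos = subsample_sketch(snp, pos, size=2, rng=np.random.default_rng(1))
```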
- class dnadna.transforms.Transform
Bases: dnadna.utils.plugins.Pluggable
Dataset transform.
When loading SNPSamples from the dataset, these transforms are applied to the samples to modify the position array, the SNP matrix, or both.
To implement a transform you must provide its __call__ method, which takes as input a tuple consisting of the SNPSample being loaded from the dataset, the parameters being trained as a LearnedParams, and the parameter values associated with the sample's scenario, as loaded from the Pandas DataFrame.
- classmethod get_schema()
Provide a schema for validating a single transform in a list of transforms in the config file (see the training config schema for example usage).
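The sketch below illustrates the __call__ protocol described above with a hypothetical transform that rescales positions. To stay self-contained it does not import dnadna: FakeSample is a stand-in exposing only the snp, pos and pos_format attributes seen in the SNPSample examples, and returning the full (sample, learned_params, scenario_params) triple is an assumption based on those examples indexing the result with [0]. A real plugin would subclass dnadna.transforms.Transform instead.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Tuple

import numpy as np


@dataclass
class FakeSample:
    """Hypothetical stand-in for SNPSample with only the attributes used here."""
    snp: np.ndarray
    pos: np.ndarray
    pos_format: dict = field(default_factory=dict)


class ScalePositions:
    """Hypothetical transform: multiply every position by a constant factor."""

    def __init__(self, factor: float):
        self.factor = factor

    def __call__(self, item: Tuple[FakeSample, Any, Any]) -> Tuple[FakeSample, Any, Any]:
        sample, learned_params, scenario_params = item
        # Build a new sample instead of mutating the one loaded from the dataset;
        # the SNP matrix object is reused unchanged.
        scaled = replace(sample, pos=sample.pos * self.factor)
        return scaled, learned_params, scenario_params


sample = FakeSample(np.eye(4), np.array([5, 460, 900, 952]))
out = ScalePositions(0.001)((sample, None, None))[0]
```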
- exception dnadna.transforms.TransformException(transform)
Bases: Exception
Exception raised when applying a Transform to an input.
- Parameters
transform (dnadna.transforms.Transform) – The transform that caused the exception.
- class dnadna.transforms.ValidateSnp(uniform_shape=True)
Bases: dnadna.transforms.Transform
A special transform that does not actually modify the data, but merely performs certain verifications on it.
If verification fails the data sample will be excluded from batches returned by the data loader.
Currently there is only one verification supported, which is to verify that all SNPs have the same shape (same number of SNPs and individuals).
This can be combined, e.g., with Crop to first crop each SNP matrix to a maximum size, then verify that its shape is consistent with the previous samples in the dataset.
- Keyword Arguments
uniform_shape (bool) – (optional) – Check whether all SNP samples in the dataset have the same shape (same number of SNPs and individuals).
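A minimal sketch of the uniform-shape check, assuming the first sample seen fixes the expected shape. ShapeValidatorSketch and RejectedSample are hypothetical stand-ins for ValidateSnp and InvalidSNPSample, and the sketch operates on a bare NumPy array rather than an SNPSample. How the data loader actually excludes a rejected sample from its batches is not shown.

```python
from typing import Optional, Tuple

import numpy as np


class RejectedSample(Exception):
    """Hypothetical stand-in for InvalidSNPSample."""


class ShapeValidatorSketch:
    """Hypothetical uniform-shape check; the data itself is never modified."""

    def __init__(self, uniform_shape: bool = True):
        self.uniform_shape = uniform_shape
        self.expected_shape: Optional[Tuple[int, ...]] = None

    def __call__(self, snp: np.ndarray) -> np.ndarray:
        if self.uniform_shape:
            if self.expected_shape is None:
                self.expected_shape = snp.shape  # first sample sets the reference
            elif snp.shape != self.expected_shape:
                raise RejectedSample(
                    f"shape {snp.shape} differs from expected {self.expected_shape}"
                )
        return snp  # returned unchanged


validate = ShapeValidatorSketch()
kept = [validate(np.zeros((4, 10))), validate(np.ones((4, 10)))]
try:
    validate(np.zeros((4, 7)))
    rejected = False
except RejectedSample:
    rejected = True
```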