dnadna.nets

DNADNA’s neural nets used for model training.

Module Attributes

DNADNANet

Alias for backwards-compatibility; use just dnadna.nets.Network instead.

Classes

CNN(n_outputs)

CNN is a basic convolutional neural network.

CustomCNN(n_snp, n_indiv, n_outputs[, concat])

CustomCNN is a convolutional neural network that infers demographic parameters from a SNP matrix and its associated vector of positions.

DNADNANet

Alias for backwards-compatibility; use just dnadna.nets.Network instead.

MLP(n_snp, n_indiv, n_outputs[, concat])

MLP is a basic fully connected network.

Network()

Base class for DNADNA neural nets.

SPIDNA(n_blocks, n_features, n_outputs)

SPIDNA is a convolutional neural network that infers evolutionary parameters.

SPIDNABlock(n_outputs, n_features)

Sub-part of the SPIDNA network.

class dnadna.nets.CNN(n_outputs)[source]

Bases: dnadna.nets.Network

CNN is a basic convolutional neural network. It can be used as a baseline or for testing dnadna.

Task

Regression / Classification

Constraints
  • min_snp (400)

  • max_snp (400)

  • min_indiv (50)

  • max_indiv (50)
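The constraints listed above bound the shape of the input SNP matrix. The following rough sketch checks a matrix's dimensions against such bounds. The `check_constraints` helper and the use of `None` to mean "no constraint" are hypothetical and are not part of the dnadna API.

```python
from typing import List, Optional


def check_constraints(n_snp: int, n_indiv: int,
                      min_snp: Optional[int] = None, max_snp: Optional[int] = None,
                      min_indiv: Optional[int] = None, max_indiv: Optional[int] = None) -> List[str]:
    """Return human-readable violations (empty list if the shape is acceptable).

    Hypothetical helper: a bound set to None means "no constraint".
    """
    problems = []
    if min_snp is not None and n_snp < min_snp:
        problems.append(f"n_snp={n_snp} < min_snp={min_snp}")
    if max_snp is not None and n_snp > max_snp:
        problems.append(f"n_snp={n_snp} > max_snp={max_snp}")
    if min_indiv is not None and n_indiv < min_indiv:
        problems.append(f"n_indiv={n_indiv} < min_indiv={min_indiv}")
    if max_indiv is not None and n_indiv > max_indiv:
        problems.append(f"n_indiv={n_indiv} > max_indiv={max_indiv}")
    return problems


# Bounds listed for CNN: exactly 400 SNPs and exactly 50 individuals.
CNN_BOUNDS = dict(min_snp=400, max_snp=400, min_indiv=50, max_indiv=50)

print(check_constraints(400, 50, **CNN_BOUNDS))  # []
print(check_constraints(500, 20, **CNN_BOUNDS))  # ['n_snp=500 > max_snp=400', 'n_indiv=20 < min_indiv=50']
```

Because CNN's minimum and maximum are equal, it only accepts one input size. Nets with "no constraint" on a bound would simply leave that argument as `None`.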

Warning

None

Parameters

n_outputs (int) – number of parameters or classes to infer

forward(x)[source]

The forward function of the network (see torch.nn.Module for more details). It should accept as input a batch of SNP matrices in either concat format (where the SNP matrix is concatenated to the position array) or product format (where the position array is multiplied by the SNP matrix).

If the operation of the net depends on which format the input is in, its __init__ method should accept a concat argument. It will be passed True or False by the network trainer depending on which format the inputs are in.
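The following plain-Python sketch makes the two input formats concrete for a single sample. It assumes the SNP matrix is stored with one row per individual and one column per SNP, and that in concat format the position vector becomes an extra first row. That layout is an assumption for illustration and is not stated on this page.

```python
def to_concat(snp, positions):
    """Concat format (assumed layout): positions become an extra first row."""
    if any(len(row) != len(positions) for row in snp):
        raise ValueError("every individual needs one entry per SNP position")
    return [list(positions)] + [list(row) for row in snp]


def to_product(snp, positions):
    """Product format: each individual's row is multiplied element-wise by the positions."""
    if any(len(row) != len(positions) for row in snp):
        raise ValueError("every individual needs one entry per SNP position")
    return [[g * p for g, p in zip(row, positions)] for row in snp]


snp = [[0, 1, 1],
       [1, 0, 1]]            # 2 individuals x 3 SNPs
positions = [0.1, 0.4, 0.9]  # one position per SNP

print(to_concat(snp, positions))   # [[0.1, 0.4, 0.9], [0, 1, 1], [1, 0, 1]]
print(to_product(snp, positions))  # [[0.0, 0.4, 0.9], [0.1, 0.0, 0.9]]
```

The concat result has one more row than there are individuals, whereas the product result keeps the original number of rows.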

class dnadna.nets.CustomCNN(n_snp, n_indiv, n_outputs, concat=True)[source]

Bases: dnadna.nets.Network

CustomCNN is a convolutional neural network that infers demographic parameters from a SNP matrix and its associated vector of positions. The number of SNPs is predefined and fixed.

The network is based on multiple 2D convolution filters of mixed sizes.

Task

Regression

Constraints
  • min_snp (400)

  • max_snp (no constraint)

  • min_indiv (50)

  • max_indiv (50)

Warning

None

Notes

This net was used to predict population sizes through time. It is called “custom CNN” in Sanchez et al. 2020 and was referred to in earlier versions of this code as “SPIDNA1”.

Publication

T. Sanchez, J. Cury, G. Charpiat, and F. Jay, “Deep learning for population size history inference: Design, comparison and combination with approximate Bayesian computation,” Mol Ecol Resour, p. 1755-0998.13224, Jul. 2020, doi: 10.1111/1755-0998.13224.

Parameters
  • n_snp (int) – number of SNPs in the alignment data

  • n_indiv (int) – number of individuals in the SNP data

  • n_outputs (int) – number of demographic parameters to infer

  • concat (bool) – Whether the position vector is concatenated to the SNP matrix. If True, the individuals dimension of the input is one larger (n_indiv + 1).
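The following sketch shows the shape bookkeeping implied by the concat flag. It assumes a batch laid out as (batch, rows, SNPs), where the rows are the individuals plus, in concat format, one extra row holding the positions. Both the layout and the `expected_input_shape` helper are assumptions for illustration.

```python
def expected_input_shape(batch_size: int, n_snp: int, n_indiv: int, concat: bool = True) -> tuple:
    """Hypothetical helper: batch shape for nets taking n_snp, n_indiv and concat.

    Assumed layout: (batch, rows, SNPs); concat adds one row for the positions.
    """
    n_rows = n_indiv + 1 if concat else n_indiv
    return (batch_size, n_rows, n_snp)


print(expected_input_shape(32, n_snp=400, n_indiv=50, concat=True))   # (32, 51, 400)
print(expected_input_shape(32, n_snp=400, n_indiv=50, concat=False))  # (32, 50, 400)
```

If a net flattens each sample (as a fully connected net might), the per-sample feature count under this layout would be 51 * 400 = 20400 in concat format versus 50 * 400 = 20000 otherwise.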

forward(x)[source]

The forward function of the network (see torch.nn.Module for more details). It should accept as input a batch of SNP matrices in either concat format (where the SNP matrix is concatenated to the position array) or product format (where the position array is multiplied by the SNP matrix).

If the operation of the net depends on which format the input is in, its __init__ method should accept a concat argument. It will be passed True or False by the network trainer depending on which format the inputs are in.

dnadna.nets.DNADNANet

Alias for backwards-compatibility; use just dnadna.nets.Network instead.

alias of dnadna.nets.Network

class dnadna.nets.MLP(n_snp, n_indiv, n_outputs, concat=True)[source]

Bases: dnadna.nets.Network

MLP is a basic fully connected network. It can be used as a baseline or for testing dnadna.

Task

Regression / Classification

Constraints
  • min_snp (no constraint)

  • max_snp (no constraint)

  • min_indiv (no constraint)

  • max_indiv (no constraint)

Warning

None

Parameters
  • n_snp (int) – number of SNPs in the alignment data

  • n_indiv (int) – number of individuals in the SNP data

  • n_outputs (int) – number of parameters or classes to infer

  • concat (bool) – Whether the position vector is concatenated to the SNP matrix. If True, the individuals dimension of the input is one larger (n_indiv + 1).

forward(x)[source]

The forward function of the network (see torch.nn.Module for more details). It should accept as input a batch of SNP matrices in either concat format (where the SNP matrix is concatenated to the position array) or product format (where the position array is multiplied by the SNP matrix).

If the operation of the net depends on which format the input is in, its __init__ method should accept a concat argument. It will be passed True or False by the network trainer depending on which format the inputs are in.

class dnadna.nets.Network[source]

Bases: torch.nn.modules.module.Module, dnadna.utils.plugins.Pluggable

Base class for DNADNA neural nets.

All neural nets, including user-defined neural nets in plugins, must use this base class, as it adds the net to the registry of nets known by the software. Sub-modules that a net uses but that are not meant to be used on their own should use plain torch.nn.Module as their base class instead.
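Registration through a base class can be illustrated with a generic Python pattern in which the base class records every subclass as it is defined. The following is a simplified stand-in built on the standard `__init_subclass__` hook. It is not dnadna's actual `Pluggable` mechanism, and the `NetBase` class, its `registry` attribute and its `lookup` method are hypothetical.

```python
class NetBase:
    """Simplified stand-in for a registering base class (not dnadna's implementation)."""

    registry = {}  # class name -> class, shared by all subclasses

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Each subclass is recorded as soon as its class statement runs.
        NetBase.registry[cls.__name__] = cls

    @classmethod
    def lookup(cls, name):
        return cls.registry[name]


class MyNet(NetBase):      # a user-defined net: registered automatically
    pass


class MyHelperLayer:       # a helper sub-module (torch.nn.Module in real code): not registered
    pass


print(sorted(NetBase.registry))          # ['MyNet']
print(NetBase.lookup("MyNet") is MyNet)  # True
```

This explains why helper sub-modules should not inherit from the registering base. If they did, they would appear in the registry as if they were standalone nets.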

abstract property forward

The forward function of the network (see torch.nn.Module for more details). It should accept as input a batch of SNP matrices in either concat format (where the SNP matrix is concatenated to the position array) or product format (where the position array is multiplied by the SNP matrix).

If the operation of the net depends on which format the input is in, its __init__ method should accept a concat argument. It will be passed True or False by the network trainer depending on which format the inputs are in.
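The rule that `concat` is passed only to nets whose `__init__` accepts it can be sketched with `inspect.signature`. The `instantiate` helper and the two toy classes are hypothetical. They illustrate the behavior described above and are not dnadna's trainer code.

```python
import inspect


def instantiate(net_cls, net_params, concat):
    """Hypothetical trainer-side helper: add `concat` only if __init__ accepts it."""
    params = dict(net_params)
    if "concat" in inspect.signature(net_cls.__init__).parameters:
        params["concat"] = concat
    return net_cls(**params)


class FormatAgnosticNet:
    def __init__(self, n_outputs):
        self.n_outputs = n_outputs


class FormatAwareNet:
    def __init__(self, n_outputs, concat=True):
        self.n_outputs = n_outputs
        self.concat = concat


a = instantiate(FormatAgnosticNet, {"n_outputs": 3}, concat=False)
b = instantiate(FormatAwareNet, {"n_outputs": 3}, concat=False)
print(hasattr(a, "concat"), b.concat)  # False False
```

A net that does not care about the input format can omit the argument entirely, and it will not receive an unexpected keyword.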

classmethod get_schema()[source]

Returns a schema pairing the network.name property with the valid network.params associated with that network (which may be very broad if the Network subclass does not specify its Network.schema).
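Such a pairing can be pictured as a JSON-Schema-style document in which `name` is fixed to the network's name and `params` is checked by the network's own schema. The sketch below shows only a plausible general shape; it is not the exact output of `get_schema()`, and `pair_schema` is hypothetical. In JSON Schema an empty schema `{}` accepts any value, which is why a subclass that leaves `schema` unspecified yields a very broad result.

```python
def pair_schema(net_name, params_schema):
    """Hypothetical sketch: pair a network name with the schema for its params."""
    return {
        "type": "object",
        "properties": {
            "name": {"const": net_name},
            # An empty schema ({}) places no restriction on params at all.
            "params": params_schema or {},
        },
        "required": ["name"],
    }


broad = pair_schema("CNN", {})
strict = pair_schema("SPIDNA", {
    "type": "object",
    "properties": {
        "n_blocks": {"type": "integer"},
        "n_features": {"type": "integer"},
        "n_outputs": {"type": "integer"},
    },
    "required": ["n_blocks", "n_features", "n_outputs"],
})
print(broad["properties"]["params"])  # {}
```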

schema = {}

Schema for the network’s net_params, the section of the training config holding the parameters the net instance should be instantiated with (e.g. n_snp and n_indiv, among others, in the case of CustomCNN, formerly called SPIDNA1).

It can be either a string containing the name (without the .yml extension) of a schema in the default schema path (for built-in nets) or a dict representing the schema.

If left empty, the net_params simply won’t be validated when loading the config.
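The three cases above (a schema name, an inline dict, or nothing) can be sketched as follows. The `resolve_net_schema` helper, its `schema_dir` argument and the `"spidna"` schema name are hypothetical. The only behavior taken from this page is that a string names a `.yml` file in the default schema path (given without its extension), a dict is used as the schema itself, and an empty schema disables validation.

```python
import os


def resolve_net_schema(schema, schema_dir):
    """Hypothetical helper mirroring the three documented cases.

    Returns ("file", path), ("inline", dict) or (None, None) when validation is skipped.
    """
    if isinstance(schema, str) and schema:
        # Built-in nets: the schema name is given without the .yml extension.
        return "file", os.path.join(schema_dir, schema + ".yml")
    if isinstance(schema, dict) and schema:
        return "inline", schema
    # Empty string or empty dict: net_params are not validated.
    return None, None


print(resolve_net_schema("spidna", "/schemas"))            # ('file', '/schemas/spidna.yml') on POSIX
print(resolve_net_schema({"type": "object"}, "/schemas"))  # ('inline', {'type': 'object'})
print(resolve_net_schema({}, "/schemas"))                  # (None, None)
```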

class dnadna.nets.SPIDNA(n_blocks, n_features, n_outputs)[source]

Bases: dnadna.nets.Network

SPIDNA is a convolutional neural network that infers evolutionary parameters.

This network’s predictions are invariant to the permutation of individuals in the SNP matrix and adaptive to the number of individuals.

It is also adaptive to the number of SNPs; however, because batch normalization is applied, it is recommended to evaluate its performance when the number of SNPs varies.
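One standard way to obtain predictions that do not depend on the order of individuals, and that accept any number of them, is to compute features per individual and then average over the individuals axis (a symmetric pooling). The plain-Python sketch below illustrates only that general idea. The per-individual feature function is a toy placeholder, and this page does not state which pooling SPIDNA itself uses.

```python
def per_individual_features(row):
    """Toy placeholder for a learned per-individual transformation."""
    return [sum(row), max(row) - min(row)]


def pooled_features(snp):
    """Average per-individual features over individuals (order-independent)."""
    feats = [per_individual_features(row) for row in snp]
    n = len(feats)
    return [sum(f[k] for f in feats) / n for k in range(len(feats[0]))]


snp = [[0, 1, 1, 0],
       [1, 1, 0, 0],
       [0, 0, 0, 1]]
permuted = [snp[2], snp[0], snp[1]]  # same individuals, different order

print(pooled_features(snp) == pooled_features(permuted))  # True
print(len(pooled_features(snp + [[1, 1, 1, 1]])))         # 2, same size with 4 individuals
```

Averaging also keeps the output size fixed regardless of how many rows there are, which is what makes such a design adaptive to the number of individuals.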

Task

Regression

Constraints
  • min_snp (400)

  • max_snp (no constraint)

  • min_indiv (2)

  • max_indiv (no constraint)

Warning

None

Notes

This net has been used to predict population sizes through time.

It is called “SPIDNA batch normalization” in Sanchez et al. 2020 and was trained with data cropped to a fixed number of SNPs (400) and individuals (50) without padding. However, this is not a constraint of the architecture, which is adaptive to the number of individuals and SNPs.
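Cropping to a fixed size without padding amounts to keeping a sub-block of the SNP matrix. Because nothing is padded, a sample that is smaller than the target cannot be cropped to that size. The sketch below keeps the first `n_indiv` individuals and the first `n_snp` SNPs together with their positions. That selection rule is an assumption, since Sanchez et al. 2020 may have chosen them differently. The `crop` helper is hypothetical.

```python
def crop(snp, positions, n_indiv=50, n_snp=400):
    """Hypothetical helper: crop a sample to n_indiv x n_snp without padding.

    Keeps the first n_indiv rows (individuals) and the first n_snp columns (SNPs).
    """
    if len(snp) < n_indiv or len(positions) < n_snp:
        raise ValueError("sample too small to crop without padding")
    return [row[:n_snp] for row in snp[:n_indiv]], positions[:n_snp]


# Toy sample: 60 individuals x 500 SNPs, positions 0..499.
snp = [[0] * 500 for _ in range(60)]
positions = list(range(500))
cropped, cropped_pos = crop(snp, positions)
print(len(cropped), len(cropped[0]), len(cropped_pos))  # 50 400 400
```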

Publication

T. Sanchez, J. Cury, G. Charpiat, and F. Jay, “Deep learning for population size history inference: Design, comparison and combination with approximate Bayesian computation,” Mol Ecol Resour, p. 1755-0998.13224, Jul. 2020, doi: 10.1111/1755-0998.13224.

Parameters
  • n_blocks (int) – number of SPIDNA blocks in the architecture

  • n_features (int) – number of convolution filters in each convolution layer, doubled for layers inside SPIDNA blocks. n_features should be greater than or equal to n_outputs.

  • n_outputs (int) – number of demographic parameters to infer
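The requirement that n_features be at least n_outputs can be checked before the net is built, for example when reading net_params from a config. The following is a minimal sketch. The helper name is hypothetical, and the parameter values in the usage lines are illustrative only.

```python
def check_spidna_params(n_blocks: int, n_features: int, n_outputs: int) -> None:
    """Hypothetical pre-flight check enforcing the documented rule only.

    n_blocks is accepted to mirror SPIDNA's signature but is not checked here.
    """
    if n_features < n_outputs:
        raise ValueError(
            f"n_features ({n_features}) must be >= n_outputs ({n_outputs})")


check_spidna_params(n_blocks=7, n_features=50, n_outputs=21)  # passes silently

try:
    check_spidna_params(n_blocks=7, n_features=10, n_outputs=21)
except ValueError as err:
    print(err)  # n_features (10) must be >= n_outputs (21)
```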

forward(x)[source]

The forward function of the network (see torch.nn.Module for more details). It should accept as input a batch of SNP matrices in either concat format (where the SNP matrix is concatenated to the position array) or product format (where the position array is multiplied by the SNP matrix).

If the operation of the net depends on which format the input is in, its __init__ method should accept a concat argument. It will be passed True or False by the network trainer depending on which format the inputs are in.

class dnadna.nets.SPIDNABlock(n_outputs, n_features)[source]

Bases: torch.nn.modules.module.Module

Sub-part of the SPIDNA network. The number of SPIDNABlock instances in a SPIDNA network is set by its n_blocks parameter.
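The signature forward(x, output) suggests that each block receives both the current features and a running prediction, so that n_blocks blocks can be chained in a loop. The plain-Python sketch below illustrates that chaining pattern with toy blocks. The block arithmetic is a placeholder, and the assumption that each block returns an updated (x, output) pair is not taken from SPIDNA's source.

```python
class ToyBlock:
    """Toy stand-in for a block with a forward(x, output) signature."""

    def __call__(self, x, output):
        return self.forward(x, output)

    def forward(self, x, output):
        # Placeholder feature step: halve the sequence by averaging adjacent pairs.
        x = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]
        # Each block adds its own contribution to the running prediction.
        output = output + sum(x) / len(x)
        return x, output


def run_blocks(x, n_blocks):
    """Chain n_blocks toy blocks; x needs at least 2 ** n_blocks elements."""
    blocks = [ToyBlock() for _ in range(n_blocks)]
    output = 0.0
    for block in blocks:
        x, output = block(x, output)
    return x, output


x, output = run_blocks([1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0], n_blocks=3)
print(x, output)  # [4.5] 13.5
```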

forward(x, output)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
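A stripped-down imitation of the mechanism shows why the instance should be called rather than forward directly. Here `__call__` runs registered hooks after `forward`, and calling `forward` directly bypasses them. This is a simplified toy rather than PyTorch's implementation. In PyTorch itself, such hooks are registered with methods like `register_forward_hook`.

```python
class ToyModule:
    """Simplified imitation of a module whose __call__ wraps forward with hooks."""

    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        # Hook signature mirrors PyTorch's forward hooks: hook(module, inputs, output).
        self._forward_hooks.append(hook)

    def __call__(self, *inputs):
        output = self.forward(*inputs)
        for hook in self._forward_hooks:
            result = hook(self, inputs, output)
            if result is not None:  # a hook may replace the output
                output = result
        return output

    def forward(self, x):
        return 2 * x


calls = []
m = ToyModule()
m.register_forward_hook(lambda mod, inp, out: calls.append(out))

print(m(3))          # 6, and the hook runs
print(m.forward(3))  # 6, but the hook is silently skipped
print(calls)         # [6]
```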