5 Polygenic Adaptation in SLiM
5.1 Overview
In this chapter I’ll provide a number of templates aiming to reconstruct some of the polygenic models we have seen in SLiM. I’ve aimed to make these templates highly customisable, with many features that can be adjusted as you see fit. As such, I’ll break down each section of the code to explain what it does, which may give you ideas for how it could be extended for your own use cases, or which parts can be removed if you don’t need them and want to maximise your speed and/or minimise memory use.
5.2 A Single Polygenic Trait (Chp5-1_1T.slim)
The first example we’ll go over models a single polygenic trait evolving through either drift, stabilising selection, or directional selection.
This template has been built to be very broad, with a large array of parameters. These parameters are defined at the top of the file -
any line beginning with SetCfgParam() is a different parameter you may adjust3. The defaults are set based on realistic values, with sources
cited in comments. See Box 5.2.1 for information on each of these parameters.
The model begins with a specified number of generations of burn-in (given via the parameter burnTime).
This burn-in period involves mutations subject to neutral drift, which yields standing genetic variation for selection to act on when a fitness function
is applied. The idea is for the population to be at mutation-drift equilibrium by the time the selection regime is applied.
I’ve found this typically occurs within 50,000 generations for a population size of 8,000, but you’ll likely want to adjust this as you see fit.
One way to confirm when a population is at equilibrium is by tracking heterozygosity over time. In the SLiM script (Chp5-1_1T.slim) you can
do this by setting SetCfgParam(printH, T); (Line 36). This will plot heterozygosity over time during your set burn-in period for you to view.
When the population is at equilibrium, this value should remain relatively stable around \(\theta = 4N_e\mu\), however the amount of stability around
this value (or any value at all) is dependent on several parameters including linkage (given by rwide), and background selection
(determined by del_mean, del_shape, and `mutWeights). During burn-in, a variety of output files are generated.
These include files containing
- Information on which genome positions are QTLs as opposed to non-QTL sites
- Heterozygosity and phenotypic means over time (sampled every X generations, defined by the first value in samplerate).
- The phenotypic location of the trait optimum.
Following burn-in, a certain selection regime takes place for a specified number of generations. This is set using the selType and
testTime parameters. Here, the population follows a stabilising or directinal selection trajectory, or continues via neutral drift
(“Brownian motion”). During this process, some more output is generated:
- Information on heterozygosity, phenotypic means, and fitness (sampled at a rate fiven by the first value in samplerate)
- Allelic effect sizes, frequencies, fixation times, etc. (sampled at a rate given by the second value in samplerate)
Note that mutation output is quite large (seeing as it stores many lines every time it writes to file), so you will want to adjust the sampling frequency as per your available storage, the size of your simulation (in number of generations, size of the genome, number of mutations/mutation rate etc.).
5.3 Parameters
The script template contains a lot of parameters. Some of these will be applicable to you, others won’t be. I’ll explain each of them here so you can get an idea. In brackets next to the parameter name is the type it expects, and how many arguments it expects (if > 1).
| Parameter (type) | Description |
|---|---|
| mu (float) | Genome-wide mutation rate |
| Ne (int) | Population size |
| del_mean and del_shape (float) | Mean and shape parameters used to pull effects of deleterious mutations |
| mutWeights (float, 3) | Proportions of mutations of different types |
| rwide (float) | Genome wide recombination rate |
| nloci (int) | The number of QTLs to model |
| genomelength (int) | The total length of the genome in loci |
| locimu, locisigma, locidist (float, float, string) | Parameters for QTL distribution of additive effects. |
| width (float) | Stabilising selection fitness function width parameter |
| opt (numeric) | Position of the optimum relative to the burn-in phenotype mean |
| selType (string) | The type of selection to use during the test: “s” = stabilising, “p” = positive/directional, “d” = drift |
| printH (bool) | Draw a graph of heterozygosity during burn-in to diagnose mutation-drift equilibrium |
| burnTime (int) | Number of generations to run neutral burn-in for. |
| testTime (int) | Number of generations to run the selective regime for. |
| samplerate (int, 2) | How often, in generations, to save output |
| out… (string) | Output filenames and paths for various output files |
| modelindex (int) | Identifier for Latin Hypercube job arrays |
Below are some tips regarding some of the parameters to guide you in designing your experiments:
5.3.1 Ne
The single most expensive thing to do in SLiM is increase the population size. Be careful with setting this and time your simulations as you adjust it so you can maximise your population size for the available time you have to run your simulations. Be aware that smaller population sizes = stronger drift, and so selection will be comparatively weaker in small populations.
5.3.2 del_mean and del_shape
When mutations appear in SLiM, a fitness effect is pulled from a user-specified distribution of fitness effects. Individuals with the mutation have their fitness adjusted according to this effect. Non-trait deleterious mutations are pulled from a gamma distribution in this simulation, which asks for a mean and shape parameter. This is slightly different to the standard mathematical notation, which usually specifies a \(\alpha\) and \(\beta\) parameter. In this case, the mean is equal to \(\alpha\beta\), and the shape parameter is \(\alpha\). You are able to see the shape of your distribution of effects for any mutation type by clicking the hand underneath the generation counter in SLiMgui and hovering over the mutation type:

5.3.3 mutWeights
In this simulation, not all genome positions are QTLs. Some are sites with nothing to do with the trait whatsoever.
Therefore, there are 3 mutation types: neutral (m1), which do nothing except give an idea of how linked variation
declines with other types; non-trait deleterious (m2), which act as a form of background selection: imagine these as
being mutations that don’t affect the trait but reduce fitness; and QTLs (m3), which affect the trait. All three of
these types can occur at a QTL site, but QTL mutations (m3) cannot occur at non-QTL sites. Imagine a mutation
occurs at a QTL that causes a pleiotropic effect elsewhere, resulting in an extremely deleterious effect. This is the
kind of situation this is modelling. Similarly, a neutral mutation could occur at a QTL site that is synonymous,
not changing the phenotype at all.
In this simulation, all mutations begin life as an m1 mutation. The simulation checks if any mutation exists at a QTL site
and then randomly chooses a type based on mutWeights. This parameter specifies the relative likelihood of a
given mutation type (neutral vs QTL vs deleterious) occurring.
5.3.4 rwide
From my experience, recombination rate in SLiM doesn’t typically affect performance too much when using realistic values.
However, if you want to model independent assortment (by setting rwide to 0.5) to get rid of linkage effects,
you will probably see a tiny\(^1\) performance drop\(^2\). I’d suggest testing this out, and if you want to model independent
assortment, do so with a small genome (or a small population). This parameter is set as a single genome-wide value, however
recombination maps can be constructed in SLiM using a vector of rates followed by a vector of endpoints where the next rate
begins. This will require some small adjustments to the script, but should be trivial given you read the documentation on the
initializeRecombinationRate() method.
\(^1\): Massive
\(^2\): Death of your computer
5.3.5 genomelength
This simulation assumes each ‘box’ in the SLiM genome is a locus rather than a base pair. This means all rates (rwide, mu etc.)
are scaled according to per locus per generation, rather than per base pair. nloci specifies the number of QTL positions
to be randomly scattered across the genome, whereas genomelength is the actual genome length in total loci.
Hence, nloci must be \(<=\) genomelength.
5.3.6 locimu, locisigma, and locidist
QTL mutations effects are pulled from a normal distribution by default in this simulation, with mean = locimu and variance = locisigma.
This can be changed though, using the locidist parameter. A normal distribution (default) is set using “n”.
A gamma distribution of effects can also be chosen using “g”.
5.3.7 width
The fitness function being used for stabilising selection comes from Lande 1976 (SOURCE). This equation comes out to the form \(f(z)\ =\ 0.01+\exp(-(z^{2})*\left(\frac{1}{2w^{2}}\right))\)
Where f(z) is the fitness of an individual z units from the optimum, w is the width of the function, and z is the Euclidean distance from the optimum. You can play around with that here. Moving the width will show you how fitness decays as an individual gets far from the optimum: higher width means weaker selection (lower difference between individuals at the optimum vs those far away).
The width parameter is precomputed to save a little bit of computing time - computers don’t like doing division
very much, and doing that every time we calculate fitness is wasteful.
Hence, we precompute \(\frac{1}{2w^{2}}\), and that is our width parameter.
5.3.8 printH
Setting this option to T will enable a diagnostic plot of heterozygosity over time to help visualise mutation-drift
equilibrium for your given mutation rate and population size (see Section 5.2). Dotted lines drawn indicate 10% CIs of \(\theta\),
a measure of nucleotide diversity given by \(\theta = 4N_e\mu\).
5.3.9 samplerate
There are two values set in the samplerate parameter. The first value corresponds to ordinary outputs
(means, heterozygosities etc.), which occur during burn-in and the test. The second value corresponds to mutation output,
which is considerably large due to printing many lines per generation. As a result, you likely will need to sample mutations less
frequently than other output in order to save space (and have your files openable in R without running out of memory).
5.3.10 modelindex
modelindex is an important parameter for Latin Hypercube sets of simulations, however we’ll get into that later.
For now, you can use it individually to quickly identify different simulations where you have changed a bunch of
parameters (e.g. if you have three levels of rwide, you could set three different modelindex values to
identify those models. This is more useful when combined with interactions between multiple parameters). It can also be
used as a proxy for all of your variables, saving considerable space when storing output. For example, you will only need
one column to identify the parameters which led to a result rather than a column for each parameter. This is particularly
input with regards to mutation output, which grows to potentially 100s of gigabytes in size.
5.4 Footnotes
SetCfgParam()is based on the SLiM functiondefineConstant(), but works for both command line and SLiMgui so no changes are necessary to switch between the two.↩︎