Genetic Linkage Analysis
Contributor(s)
Written 2002-05 Françoise Clerget-Darpoux
Unité de Recherche d'Epidémiologie Génétique, INSERM U535, Kremlin-Bicêtre, France
 I. Recombination fraction
 II. Definition of the "lod score" of a family
 III. Test for linkage
 IV. Estimation of the recombination fraction
 V. Recombination fraction for a disease locus and a marker locus
Investigating the joint segregation of genes located at different loci is a way of testing whether they are transmitted independently. This notion of independence is captured by the recombination fraction, q. It is the proportion of the gametes transmitted by a parent that are recombined. If the loci are transmitted independently, there are as many recombined gametes as parental gametes, and so q = 1/2. If they are not transmitted independently, parental gametes are transmitted in preference to recombined gametes, and 0 ≤ q < 1/2. In this case, the two loci are said to be "linked".
I. Recombination fraction
Let us consider the case of two loci, A and B, with two codominant alleles at each locus: A_{1}, A_{2} and B_{1}, B_{2} respectively. An individual who is heterozygous at both loci (A_{1}A_{2} B_{1}B_{2}) can produce four types of gamete:
A_{1}B_{1}, A_{2}B_{1}, A_{1}B_{2}, A_{2}B_{2}
Two situations are possible:
The loci A and B are on different chromosome pairs. In this case, the four gametes all have the same probability: 1/4.

The loci A and B are on the same chromosome pair
Here we have to distinguish between two possible situations. The alleles A_{1} and B_{1} may be on the same chromosome of the pair, in which case A_{1} and B_{1} are said to be "coupled". Alternatively, they may be on different chromosomes, in which case A_{1} and B_{1} are said to be in "repulsion". Suppose, for instance, that A_{1} and B_{1} are coupled. Four types of gamete are still produced. Gametes A_{1}B_{1} and A_{2}B_{2} are said to be "parental": in the offspring, as in the parent, A_{1} is coupled with B_{1} (and A_{2} with B_{2}).
The gametes A_{1}B_{2} and A_{2}B_{1} are described as "recombined": an odd number of crossovers ("crossing-over" events) has occurred between the A and B loci.
Assume that the number of crossovers between A and B on a pair of chromosomes follows a Poisson distribution. A parental gamete results from zero or an even number of crossovers, whereas a recombined gamete results from an odd number. It can then be shown that recombined gametes are never more frequent than parental gametes, and so
$$0 \le q \le 1/2$$
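The Poisson argument above can be checked numerically. This minimal sketch assumes, as the text does, that the number of crossovers between A and B on a gamete follows a Poisson distribution. Its mean is written m, which is a symbol introduced here and not in the original. Summing the odd terms gives q = e^{−m} sinh(m) = (1 − e^{−2m})/2. This value is below 1/2 for every finite m and tends to 1/2 as m grows. The function names are hypothetical.

```python
import math


def prob_odd_crossovers(m: float, terms: int = 400) -> float:
    """P(K is odd) for K ~ Poisson(m), summing the first `terms` terms."""
    total = 0.0
    p_k = math.exp(-m)  # P(K = 0)
    for k in range(terms):
        if k % 2 == 1:  # an odd number of crossovers gives a recombined gamete
            total += p_k
        p_k *= m / (k + 1)  # recurrence: P(K = k + 1) = P(K = k) * m / (k + 1)
    return total


def recombination_fraction(m: float) -> float:
    """Closed form of the same sum: q = (1 - exp(-2m)) / 2."""
    return 0.5 * (1.0 - math.exp(-2.0 * m))


for m in (0.0, 0.1, 0.5, 1.0, 3.0):
    print(f"m = {m:<4} q = {recombination_fraction(m):.4f}")
```

The series and the closed form agree. The closed form makes it plain why q can approach, but never exceed, 1/2.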
If q = 1/2, all the gamete types have the same probability and the alleles at loci A and B are transmitted independently. Loci A and B are then said not to be genetically linked. This is the case when A and B are on different chromosome pairs, and also when they are on the same pair but far apart.
However, if q < 1/2, then the two loci are genetically linked.
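The gamete frequencies of a double heterozygote A_{1}A_{2} B_{1}B_{2} follow from q and the phase. This is a small sketch with a hypothetical function name. It uses exact fractions so that the unlinked case q = 1/2 visibly gives 1/4 for each gamete.

```python
from fractions import Fraction
from typing import Dict


def gamete_probabilities(q: Fraction, coupled: bool) -> Dict[str, Fraction]:
    """Gamete frequencies of an A1A2 B1B2 parent with recombination fraction q."""
    parental, recombined = (1 - q) / 2, q / 2
    if coupled:  # A1 and B1 on the same chromosome: A1B1 and A2B2 are parental
        return {"A1B1": parental, "A2B2": parental, "A1B2": recombined, "A2B1": recombined}
    # repulsion: A1B2 and A2B1 are parental
    return {"A1B2": parental, "A2B1": parental, "A1B1": recombined, "A2B2": recombined}


print(gamete_probabilities(Fraction(1, 10), coupled=True))
print(gamete_probabilities(Fraction(1, 2), coupled=False))
```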
For a couple whose genotypes at the A and B loci are known, the probability of observing the genotypes of their offspring depends on the value of q.
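Let us assume a crossing between parent (1), a double heterozygote A_{1}A_{2} B_{1}B_{2}, and parent (2), a double homozygote. The type of each child is then determined by the gamete it receives from parent (1).
Such a couple can therefore have four types of offspring. Types (1) and (2) are the children who receive gamete A_{1}B_{1} or A_{2}B_{2} from parent (1). Types (3) and (4) are those who receive A_{1}B_{2} or A_{2}B_{1}.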
Assuming gametic equilibrium between the A and B loci, there is a probability of 1/2 that alleles A_{1} and B_{1} are coupled in parent (1), and a probability of 1/2 that they are in repulsion.

If A_{1} and B_{1} are coupled, the probability that parent (1) transmits gamete A_{1}B_{1}, or gamete A_{2}B_{2}, is (1 − q)/2. The probability that it transmits A_{1}B_{2}, or A_{2}B_{1}, is q/2. The probability that the couple has a child of type (1), or of type (2), is therefore (1 − q)/2, and that of a child of type (3), or of type (4), is q/2.
The probability of finding n_{1} children of type (1), n_{2} of type (2), n_{3} of type (3) and n_{4} of type (4) is therefore
$$\left(\frac{1-q}{2}\right)^{n_1+n_2} \left(\frac{q}{2}\right)^{n_3+n_4}$$
If A_{1} and B_{1} are in repulsion, the probability that parent (1) transmits gamete A_{1}B_{2}, or gamete A_{2}B_{1}, is (1 − q)/2. The probability that it transmits A_{1}B_{1}, or A_{2}B_{2}, is q/2.
The probability of the previous observation is therefore:
$$\left(\frac{q}{2}\right)^{n_1+n_2} \left(\frac{1-q}{2}\right)^{n_3+n_4}$$
Finally, suppose there is no additional information about the phase of A_{1} and B_{1}, and assume gametic equilibrium between the A and B loci. The probability of observing n_{1}, n_{2}, n_{3} and n_{4} children in categories (1), (2), (3) and (4) is then:
$$p(n_1,n_2,n_3,n_4 \mid q) = \frac{1}{2}\left[\left(\frac{1-q}{2}\right)^{n_1+n_2}\left(\frac{q}{2}\right)^{n_3+n_4} + \left(\frac{q}{2}\right)^{n_1+n_2}\left(\frac{1-q}{2}\right)^{n_3+n_4}\right]$$
So the likelihood of q for an observation (n_{1}, n_{2}, n_{3}, n_{4}) can be written:
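$$L(q \mid n_1,n_2,n_3,n_4) = \frac{1}{2}\left[\left(\frac{1-q}{2}\right)^{n_1+n_2}\left(\frac{q}{2}\right)^{n_3+n_4} + \left(\frac{q}{2}\right)^{n_1+n_2}\left(\frac{1-q}{2}\right)^{n_3+n_4}\right]$$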
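The likelihood above can be evaluated directly. This is a minimal sketch of the formula as written, with the hypothetical function name `likelihood`. It mixes the coupling and repulsion terms with weight 1/2 each, as the gametic-equilibrium assumption implies.

```python
def likelihood(q: float, n1: int, n2: int, n3: int, n4: int) -> float:
    """L(q | n1, n2, n3, n4) with the phase of parent (1) unknown."""
    a, b = n1 + n2, n3 + n4
    coupling = ((1 - q) / 2) ** a * (q / 2) ** b   # A1 and B1 coupled
    repulsion = (q / 2) ** a * ((1 - q) / 2) ** b  # A1 and B1 in repulsion
    return 0.5 * (coupling + repulsion)            # each phase has probability 1/2


for q in (0.0, 0.1, 0.3, 0.5):
    print(q, likelihood(q, 4, 3, 1, 0))
```

For a single child the function returns 0.25 whatever the value of q. Swapping the counts of classes (1)+(2) and (3)+(4) leaves the value unchanged, because the phase is unknown.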
Special case: a single child (n = 1)
Whatever the category to which this child belongs:
$$L(q) = \frac{1}{2}\cdot\frac{1-q}{2} + \frac{1}{2}\cdot\frac{q}{2} = \frac{1}{4}$$
The likelihood of this observation for the family does not depend on q. We can say that such a family is not informative for q.
Informative families
An "informative family" is a family whose likelihood varies with q.
When the phase is unknown, an essential condition for a family to be informative is therefore that it has more than one child. In addition, at least one of the parents must be heterozygous at both loci.
Definition: if one of the parents is doubly heterozygous and the other is
 a double homozygote, we have a backcross;
 a single homozygote (homozygous at one locus only), we have a simple backcross;
 a double heterozygote, we have a double intercross.
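The definitions above depend only on the number of loci at which each parent is heterozygous. A short helper can count these loci. The genotype encoding (a pair of alleles per locus) and the function names are assumptions made for illustration.

```python
from typing import Optional, Tuple

# A two-locus genotype: ((allele, allele) at A, (allele, allele) at B)
Genotype = Tuple[Tuple[str, str], Tuple[str, str]]


def heterozygous_loci(genotype: Genotype) -> int:
    """Number of loci (0, 1 or 2) at which the individual is heterozygous."""
    return sum(1 for first, second in genotype if first != second)


def mating_type(parent1: Genotype, parent2: Genotype) -> Optional[str]:
    """Name of the mating, or None when neither parent is doubly heterozygous."""
    h1, h2 = heterozygous_loci(parent1), heterozygous_loci(parent2)
    if max(h1, h2) < 2:
        return None  # not covered by the definitions above
    names = {0: "backcross", 1: "simple backcross", 2: "double intercross"}
    return names[min(h1, h2)]  # classified by the other parent


print(mating_type((("A1", "A2"), ("B1", "B2")), (("A2", "A2"), ("B2", "B2"))))
```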
II. Definition of the "lod score" of a family
Consider a family in which the genotypes of every member at the A and B loci are known.
Let L(q) be the likelihood of a recombination fraction q, with 0 ≤ q < 1/2. Let L(1/2) be the likelihood of q = 1/2, that is, of independent segregation of A and B.
The lod score of the family at q is:
$$Z(q) = \log_{10}\frac{L(q)}{L(1/2)}$$
Z can be taken to be a function of q defined over the range [0,1/2].
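As an illustration, the lod score can be computed for the phase-unknown backcross of Section I. This is a sketch that reuses that likelihood. The function names are hypothetical. Z(q) is set to −∞ where L(q) = 0, which happens at q = 0 as soon as the family has children in both classes.

```python
import math


def likelihood(q: float, n1: int, n2: int, n3: int, n4: int) -> float:
    """Phase-unknown likelihood L(q | n1, n2, n3, n4) from Section I."""
    a, b = n1 + n2, n3 + n4
    return 0.5 * (((1 - q) / 2) ** a * (q / 2) ** b + (q / 2) ** a * ((1 - q) / 2) ** b)


def lod(q: float, n1: int, n2: int, n3: int, n4: int) -> float:
    """Z(q) = log10[L(q) / L(1/2)]; -inf where L(q) = 0."""
    lq = likelihood(q, n1, n2, n3, n4)
    if lq == 0.0:  # e.g. q = 0 in a family with children of both classes
        return -math.inf
    return math.log10(lq / likelihood(0.5, n1, n2, n3, n4))


# Hypothetical family: five children, all of types (1) or (2)
for q in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(q, round(lod(q, 3, 2, 0, 0), 3))
```

For this family Z(0) = log10 16 ≈ 1.204. Five children in the same class give a positive lod score, but one well below +3.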
Lod score of a sample of families
The likelihood of a value of q for a sample of independent families is the product of the likelihoods of the individual families. The lod score of the whole sample is therefore the sum of the families' lod scores.
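Because the logarithm turns a product into a sum, a sample lod score can be accumulated family by family. This is a minimal sketch with hypothetical family counts and function names. It again uses the phase-unknown backcross likelihood of Section I. It assumes 0 < q ≤ 1/2, so that every L(q) is positive.

```python
import math


def likelihood(q, n1, n2, n3, n4):
    """Phase-unknown likelihood L(q | n1, n2, n3, n4) from Section I."""
    a, b = n1 + n2, n3 + n4
    return 0.5 * (((1 - q) / 2) ** a * (q / 2) ** b + (q / 2) ** a * ((1 - q) / 2) ** b)


def family_lod(q, family):
    """Lod score of one family given as (n1, n2, n3, n4)."""
    return math.log10(likelihood(q, *family) / likelihood(0.5, *family))


def sample_lod(q, families):
    """Lod score of independent families: the sum of the family lod scores."""
    return sum(family_lod(q, f) for f in families)


# Hypothetical sample: (n1, n2, n3, n4) for each family
families = [(3, 2, 0, 0), (2, 2, 1, 0), (1, 1, 0, 0)]
print(round(sample_lod(0.1, families), 3))
```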
III. Test for linkage
Several methods have been proposed to detect linkage: the U scores, the sib pair test, the likelihood ratios, the lod score method. The lod score method is the one most commonly used at present.
The test procedure in the lod score method is sequential. Information, i.e. the number of families in the sample, is accumulated until it is possible to decide between the hypotheses H0 and H1:

H0: genetic independence, q = 1/2
H1: linkage, q = q_{1} with 0 ≤ q_{1} < 1/2
The lod score of the sample at q_{1} is
$$Z(q_1) = \log_{10}\frac{L(q_1)}{L(1/2)}$$
(lod stands for "logarithm of the odds").
The decision thresholds of the test are usually set at −2 and +3, so that:
 if Z(q_{1}) ≥ 3, H0 is rejected and linkage is accepted;
 if Z(q_{1}) ≤ −2, linkage at q_{1} is rejected;
 if −2 < Z(q_{1}) < 3, it is impossible to decide between H0 and H1, and more information must be accumulated.
With these thresholds, the test has:
 a type I error α < 10^{−3};
 a type II error β < 10^{−2};
 a reliability 1 − ρ > 0.95 for all q_{1};
 a power P(q) > 0.80 for all q_{1}, if the true value of q is below 0.10.
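A lod score is a base-10 logarithm. Z(q_{1}) = 3 therefore means that the observations are 1000 times more likely under linkage at q_{1} than under independence. Z(q_{1}) = −2 means they are 100 times less likely. The decision rule for a single q_{1} can be sketched as follows. The function name and return labels are hypothetical.

```python
def decide(z: float, upper: float = 3.0, lower: float = -2.0) -> str:
    """Sequential decision for one value q1, given the sample lod score z = Z(q1)."""
    if z >= upper:
        return "linkage"           # H0 rejected, linkage accepted
    if z <= lower:
        return "linkage rejected"  # linkage at q1 rejected
    return "continue"              # undecided: keep accumulating families


for z in (3.4, 1.1, -2.5):
    print(z, decide(z), f"odds L(q1)/L(1/2) = {10 ** z:.3g}")
```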
In fact, what is tested is not a single value q_{1} against q = 1/2, but a whole set of values between 0 and 1/2, taken on a grid whose step varies (typically 0.01 or 0.05).
If there is a value of q_{1} such that Z(q_{1}) ≥ 3, linkage is concluded to exist.
If there is a value of q_{1} such that Z(q_{1}) = −2, linkage is excluded for any q ≤ q_{1}.
If −2 < Z(q) < 3 for every q, no conclusion can be drawn: the sample is not sufficiently informative.
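The grid version of the test can be sketched as follows, using the phase-unknown backcross likelihood of Section I as the example model. The function names are hypothetical. The exclusion limit is read, as an interpretation of the rule above, as the largest grid value q_{1} such that Z(q) ≤ −2 for every grid value q ≤ q_{1}.

```python
import math


def likelihood(q, n1, n2, n3, n4):
    """Phase-unknown likelihood L(q | n1, n2, n3, n4) from Section I."""
    a, b = n1 + n2, n3 + n4
    return 0.5 * (((1 - q) / 2) ** a * (q / 2) ** b + (q / 2) ** a * ((1 - q) / 2) ** b)


def sample_lod(q, families):
    """Sum of family lod scores; -inf as soon as one family has L(q) = 0."""
    total = 0.0
    for f in families:
        lq = likelihood(q, *f)
        if lq == 0.0:
            return -math.inf
        total += math.log10(lq / likelihood(0.5, *f))
    return total


def scan(families, step=0.01):
    """Apply the +3 / -2 rules over the grid 0, step, 2*step, ... (< 1/2)."""
    grid = [round(i * step, 10) for i in range(int(round(0.5 / step)))]
    z = [sample_lod(q, families) for q in grid]
    if max(z) >= 3:
        return "linkage", None
    excluded_up_to = None
    for q, zq in zip(grid, z):  # walk up from q = 0 while Z(q) <= -2
        if zq > -2:
            break
        excluded_up_to = q
    if excluded_up_to is not None:
        return "excluded", excluded_up_to  # larger q values remain undecided
    return "inconclusive", None


print(scan([(3, 2, 2, 3)]))
```

For the hypothetical family above, with five children in each class, linkage is excluded up to about q = 0.11.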
The proposed test has the advantage of being very simple, and of providing protection against falsely concluding linkage. However, some criticisms can be levelled, not only against the criteria chosen, but also against the entire principle of using a sequential procedure. The number of families typed is, indeed, rarely chosen in the light of the test results.
IV. Estimation of the recombination fraction
If the test, applied to a sample of families, has demonstrated linkage between the A and B loci, one may then want to estimate the recombination fraction between these loci.
The estimate of q is the value that maximizes the lod score function Z(q). Since L(1/2) does not depend on q, this is equivalent to taking the value of q that maximizes the likelihood, that is, the value for which the probability of the observations in the sample is greatest.
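Since Z(q) = log10 L(q) − log10 L(1/2), the maximum can be sought on log10 L(q) directly. This sketch uses a simple grid search over [0, 1/2] and the phase-unknown backcross likelihood of Section I. A numerical optimizer could replace the grid. The function names, step size and family counts are hypothetical.

```python
import math


def likelihood(q, n1, n2, n3, n4):
    """Phase-unknown likelihood L(q | n1, n2, n3, n4) from Section I."""
    a, b = n1 + n2, n3 + n4
    return 0.5 * (((1 - q) / 2) ** a * (q / 2) ** b + (q / 2) ** a * ((1 - q) / 2) ** b)


def log10_likelihood(q, families):
    """log10 of the product of family likelihoods; -inf if any L(q) = 0."""
    total = 0.0
    for f in families:
        lq = likelihood(q, *f)
        if lq == 0.0:
            return -math.inf
        total += math.log10(lq)
    return total


def estimate_q(families, step=0.001):
    """Grid value of q in [0, 1/2] maximizing the likelihood (hence the lod score)."""
    grid = [i * step for i in range(int(round(0.5 / step)) + 1)]
    return max(grid, key=lambda q: log10_likelihood(q, families))


print(estimate_q([(4, 4, 1, 1)]))
```

Consider a single family with eight children in classes (1) and (2) and two in classes (3) and (4). The estimate is close to 0.2, the proportion of children in the minority classes.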
V. Recombination fraction for a disease locus and a marker locus
Let us assume we are dealing with a disease due to a single gene, determined by an allele g_{0} at a locus G (g_{0}: disease allele; G_{0}: normal allele).
We would like to locate locus G relative to a marker locus T whose position on the genome is known. To do this, we can use families with one or more affected individuals in which the genotype of every member at the marker T is known.
To use the lod score method described above, we need:
 the frequency of allele g_{0};
 the penetrance vector (f_{1}, f_{2}, f_{3}), where
 f_{1} = P(affected | g_{0}g_{0})
 f_{2} = P(affected | g_{0}G_{0})
 f_{3} = P(affected | G_{0}G_{0})
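These parameters enter the calculation as follows. This minimal sketch assumes Hardy-Weinberg proportions at G, which is an assumption not stated in the text. The probability that an individual is affected follows from the allele frequency and the penetrance vector. So does the probability of each genotype given the observed phenotype. The function names and numerical values are hypothetical.

```python
from typing import List


def genotype_priors(p: float) -> List[float]:
    """Hardy-Weinberg frequencies of g0g0, g0G0, G0G0 (p = frequency of g0)."""
    return [p * p, 2 * p * (1 - p), (1 - p) ** 2]


def prob_affected(p: float, f1: float, f2: float, f3: float) -> float:
    """P(affected) = sum over genotypes of P(genotype) * penetrance."""
    return sum(g * f for g, f in zip(genotype_priors(p), (f1, f2, f3)))


def genotype_posterior(p: float, f1: float, f2: float, f3: float,
                       affected: bool = True) -> List[float]:
    """P(g0g0), P(g0G0), P(G0G0) given the phenotype, by Bayes' rule."""
    weights = [g * (f if affected else 1 - f)
               for g, f in zip(genotype_priors(p), (f1, f2, f3))]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical parameters: rare allele, incomplete penetrance, few phenocopies
print(prob_affected(0.01, 0.9, 0.9, 0.001))
print(genotype_posterior(0.01, 0.9, 0.9, 0.001, affected=True))
```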
Often, the information available for the marker is itself phenotypic rather than genotypic in nature. Here again, all possible genotypes must be considered.
As a general rule, the information available about a family concerns phenotypes. To calculate the likelihood of q, we must consider every possible configuration of genotypes at the two loci for this family and write the likelihood of q for each configuration. Each of these likelihoods is then weighted by the probability of that configuration and by the probability of the observed phenotypes at A and B given that configuration. The weighted terms are summed over all configurations.
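The weighted sum described above can be sketched for one small family. Everything about this family is a hypothetical assumption made for illustration. Parent (1) is affected and taken to be g_{0}G_{0} and T_{1}T_{2}, parent (2) is taken to be G_{0}G_{0} and T_{2}T_{2}, and the phase of parent (1) is unknown. Each child's disease genotype is unknown and must be summed over. The configurations are the two phases and, for each child, the disease allele received from parent (1). f_{1} is not needed because no child can be g_{0}g_{0} here. Factors that do not depend on q are omitted, such as the probabilities of the parents' own genotypes and phenotypes, because they cancel in the lod score. With f_{2} = 1 and f_{3} = 0 the sum reduces to the phase-unknown formula of Section I.

```python
# Hypothetical family: each child is (marker allele from parent (1), affected?)
children = [("T1", True), ("T1", True), ("T2", False), ("T2", True), ("T1", False)]


def family_likelihood(q, children, f2, f3):
    """L(q) summed over phases of parent (1) and children's disease genotypes.

    Assumes parent (1) is g0G0 / T1T2 and parent (2) is G0G0 / T2T2, so each
    child is g0G0 or G0G0 and its marker allele from parent (1) is known.
    """
    total = 0.0
    for g0_marker in ("T1", "T2"):  # marker allele on the chromosome carrying g0
        phase_term = 1.0
        for marker, affected in children:
            child_sum = 0.0
            for disease_allele in ("g0", "G0"):  # allele received from parent (1)
                parental = (disease_allele == "g0") == (marker == g0_marker)
                p_gamete = (1 - q) / 2 if parental else q / 2
                f = f2 if disease_allele == "g0" else f3  # penetrance of g0G0 or G0G0
                child_sum += p_gamete * (f if affected else 1 - f)
            phase_term *= child_sum  # children are independent given the phase
        total += 0.5 * phase_term  # each phase has probability 1/2
    return total


print(family_likelihood(0.1, children, f2=0.9, f3=0.01))
```

If f_{2} = f_{3}, the phenotype carries no information about the disease genotype, and the likelihood no longer depends on q.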
Knowledge of the genetic parameters at each locus (gene frequencies, penetrance values) is therefore necessary before q can be estimated.
Although simple in principle, calculating lod scores is therefore lengthy and tedious in practice, and dedicated software has been developed for linkage analysis.
Linkage analysis has made it possible to construct a genetic map by locating new polymorphisms relative to one another on the genome. The measure used on the genetic map is not the recombination fraction, which is not additive, but the genetic distance, which we will define below.
Citation
Clerget-Darpoux F
Atlas of Genetics and Cytogenetics in Oncology and Haematology 2002-05-01
Genetic Linkage Analysis
Online version: http://atlasgeneticsoncology.org/teaching/30031/geneticlinkageanalysis