Title: | Generating and Testing DNA Sequences |
---|---|
Description: | Generates DNA sequences based on Markov model techniques for matched sequences, and can be generalized to several sequences. The sequences (taxa) are then arranged in an evolutionary (phylogenetic) tree depicting how the taxa diverge from their common ancestors. The package also provides tests and estimation methods for the parameters of different models. Standard phylogenetic methods assume stationarity, homogeneity and reversibility for the Markov processes, and often impose further restrictions on the parameters. |
Authors: | Faisal Ababneh, John Robinson, Lars S Jermiin and Hasinur Rahaman Khan |
Maintainer: | Hasinur Rahaman Khan <[email protected]> |
License: | GPL-2 |
Version: | 1.0 |
Built: | 2024-11-14 05:13:49 UTC |
Source: | https://github.com/cran/DNAseqtest |
Package: | DNAseqtest |
Type: | Package |
Version: | 1.0 |
Date: | 2016-03-26 |
License: | GPL-2 |
Author: | Faisal Ababneh, John Robinson, Lars S Jermiin and Hasinur Rahaman Khan |
Maintainer: | Hasinur Rahaman Khan <[email protected]> |
Lars Sommer Jermiin, Vivek Jayaswal, Faisal Ababneh, John Robinson (2008). Phylogenetic model evaluation. Bioinformatics, Volume 452 of the series Methods in Molecular Biology, 331-364.
Faisal Ababneh, Lars S Jermiin, Chunsheng Ma, John Robinson (2006). Matched-pairs tests of homogeneity with applications to homologous nucleotide sequences. Bioinformatics, 22(10), 1225-1231.
Faisal Ababneh, Lars S Jermiin, John Robinson (2006). Generation of the Exact Distribution and Simulation of Matched Nucleotide Sequences on a Phylogenetic Tree. Journal of Mathematical Modelling and Algorithms, 5(3), 291-308.
library(DNAseqtest)
# Generate a 4^5 joint distribution array (K = 5 sequences)
merge2 <- matrix(c(-1,-4,-3,2,-2,-5,1,3), 4, 2)
theta <- c(rep(.25,3), rep(.25,3), rep(.25,3), c(.2,.35,.79,.01,.93,.47), 3,.1,.5,.8)
gn.sec <- gn(theta, merge2)
gn.sec
Array to Matrix
This function converts any array to a matrix.
artomat(fobs)
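Argument | Description |
---|---|
fobs | an array containing the observed divergence frequencies of K matched sequences |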
This function converts any array containing the observed divergence frequencies of K matched sequences into an m x K matrix, where m is the sum of the frequencies in the observed divergence array.
An m x K matrix with one row per site and one column per sequence
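The conversion can be illustrated with a short base-R sketch. It is not the package's code: the helper name `array_to_sites` is hypothetical, and it assumes the array holds non-negative integer counts indexed by nucleotide (1-4 for A, C, G, T) along each of its K dimensions.

```r
# Hypothetical re-implementation of the idea behind artomat(); not the package code.
array_to_sites <- function(fobs) {
  # One row per cell of the array: the nucleotide index (1-4) in each of the K dimensions
  cells <- arrayInd(seq_along(fobs), dim(fobs))
  # Repeat each cell's row as many times as that site pattern was observed
  cells[rep(seq_len(nrow(cells)), times = as.vector(fobs)), , drop = FALSE]
}

# Usage: two sequences (K = 2) and 3 sites in total
fobs <- array(0, dim = c(4, 4))
fobs[1, 1] <- 2  # pattern (A, A) observed twice
fobs[2, 4] <- 1  # pattern (C, T) observed once
array_to_sites(fobs)
```

`arrayInd()` turns linear cell positions into per-dimension indices, so every output row is one observed site pattern and the number of rows equals the total count m.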
Faisal Ababneh, Lars S Jermiin, Chunsheng Ma, John Robinson (2006). Matched-pairs tests of homogeneity with applications to homologous nucleotide sequences. Bioinformatics, 22(10), 1225-1231.
gn2, gn, Fmatrix
This function calculates the paralinear distances between all pairs of K matched DNA sequences.
Distance(F4)
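Argument | Description |
---|---|
F4 | the joint distribution array of K sequences or the observed divergence array N |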
This function calculates the paralinear distances between K matched DNA sequences from either the joint distribution array of these K sequences or the observed divergence array N.
A K x K symmetric matrix of the pairwise distances between the K sequences
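For reference, the paralinear distance between sequences x and y, with 4 x 4 joint divergence matrix $F_{xy}$ and diagonal matrices $\Pi_x$, $\Pi_y$ of their marginal nucleotide frequencies, is $d_{xy} = -\tfrac{1}{4}\left[\ln\det F_{xy} - \tfrac{1}{2}\ln\left(\det\Pi_x \det\Pi_y\right)\right]$. The base-R sketch below applies this to every pair of dimensions of a K-way array. The function names are hypothetical, and whether Distance() treats zero or negative determinants the same way is an assumption (here they yield Inf or NaN).

```r
# Paralinear distance for one 4 x 4 joint divergence matrix (counts or probabilities)
paralinear <- function(Fxy) {
  Fxy <- Fxy / sum(Fxy)   # the formula is scale-invariant; normalise for clarity
  px <- rowSums(Fxy)      # marginal nucleotide frequencies of sequence x
  py <- colSums(Fxy)      # marginal nucleotide frequencies of sequence y
  # log det(Pi_x) = sum(log(px)) because Pi_x is diagonal
  -0.25 * (log(det(Fxy)) - 0.5 * (sum(log(px)) + sum(log(py))))
}

# Pairwise distances for a K-dimensional 4 x ... x 4 array
paralinear_matrix <- function(Farr) {
  K <- length(dim(Farr))
  D <- matrix(0, K, K)
  for (i in seq_len(K - 1)) {
    for (j in (i + 1):K) {
      Fij <- apply(Farr, c(i, j), sum)   # 2-D margin for sequences i and j
      D[i, j] <- D[j, i] <- paralinear(Fij)
    }
  }
  D
}
```

With uniform marginals and equal off-diagonal entries, the expression reduces to the Jukes-Cantor distance $-\tfrac{3}{4}\ln(1 - \tfrac{4}{3}p)$, where p is the proportion of mismatched sites, which is a convenient sanity check.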
Faisal Ababneh, Lars S Jermiin, Chunsheng Ma, John Robinson (2006). Matched-pairs tests of homogeneity with applications to homologous nucleotide sequences. Bioinformatics, 22(10), 1225-1231.
gn2, gn, Fmatrix, Ntml
library(DNAseqtest)
merge2 <- matrix(c(-1,-4,-3,2,-2,-5,1,3), 4, 2)
theta <- c(rep(.25,3), rep(.25,3), rep(.25,3), c(.2,.35,.79,.01,.93,.47), 3,.1,.5,.8)
F1 <- gn(theta, merge2)
dn <- Distance(F1)
dn
This function calculates the joint distribution function for a two-edge tree.
Fmatrix(t1, t2, f0, Sx2, Sy2, Pix, Piy)
Argument | Description |
---|---|
t1 | represents the length from the tree root to the first node |
t2 | represents the length from the tree root to the second node |
f0 | the initial distribution for the four nucleotides |
Sx2 | a 4 x 4 symmetric matrix related to the first edge |
Sy2 | a 4 x 4 symmetric matrix related to the second edge |
Pix | a diagonal matrix for the stationary distribution of the first edge |
Piy | a diagonal matrix for the stationary distribution of the second edge |
This function calculates the joint distribution function for a two-edge tree whose edges may differ in length, stationary distribution and S matrix.
A 4 x 4 matrix containing the joint distribution for the two edges
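Because the root nucleotide evolves independently along each edge, the joint distribution of a two-edge tree can be written as $F = P_x(t_1)^\top\,\mathrm{diag}(f_0)\,P_y(t_2)$, i.e. $F_{ij} = \sum_k f_{0,k}\,P^x_{ki}(t_1)\,P^y_{kj}(t_2)$. The sketch below takes the two transition matrices directly instead of (t1, t2, Sx2, Sy2, Pix, Piy); `two_edge_joint` is a hypothetical helper, not Fmatrix itself.

```r
# Joint distribution of the nucleotides at the two tips of a two-edge tree.
# f0: root distribution (length 4); Px, Py: 4 x 4 transition matrices for the
# first and second edge (rows = ancestral state, columns = descendant state).
two_edge_joint <- function(f0, Px, Py) {
  stopifnot(length(f0) == 4, abs(sum(f0) - 1) < 1e-8)
  # F[i, j] = sum_k f0[k] * Px[k, i] * Py[k, j]
  t(Px) %*% diag(f0) %*% Py
}

# Usage with simple hand-made transition matrices (rows sum to 1)
f0 <- c(.1, .2, .3, .4)
Px <- matrix(0.1, 4, 4); diag(Px) <- 0.7
Py <- matrix(0.05, 4, 4); diag(Py) <- 0.85
Fj <- two_edge_joint(f0, Px, Py)
```

The row sums of the result equal `f0 %*% Px`, the nucleotide distribution at the end of the first edge, because each row of Py sums to one.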
Faisal Ababneh, Lars S Jermiin, Chunsheng Ma, John Robinson (2006). Matched-pairs tests of homogeneity with applications to homologous nucleotide sequences. Bioinformatics, 22(10), 1225-1231.
gn, Smatrix
library(DNAseqtest)
f0 <- c(.25, .25, .25, .25)
Pi1 <- diag(c(.2, .2, .2, .4))
Pi2 <- diag(c(.1, .1, .1, .7))
S1 <- Smatrix(c(.2, .2, .2, .2, .2, .2), diag(Pi1))
S2 <- Smatrix(c(.3, .3, .3, .3, .3, .3), diag(Pi2))
fm <- Fmatrix(1, .5, f0, S1, S2, Pi1, Pi2)
fm
This function calculates the joint distribution array for K matched sequences.
gn(theta, merge2)
Argument | Description |
---|---|
theta | a vector of variables containing the following parameters in this order: 1. the first three parameters from |
merge2 | a (K-1) x 2 matrix describing the tree topology |
This function calculates the joint distribution array for a tree with K matched sequences. It uses the functions Pt, Fmatrix and Smatrix.
An array containing the joint distribution of the K matched sequences
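The merge2 matrix in the examples appears to follow the convention of the `merge` component of R's `hclust` objects: row i joins two nodes, a negative entry -k denotes leaf (sequence) k, and a positive entry j denotes the node created in row j. Assuming that convention, the hypothetical helper below prints the topology as a Newick string, which is a quick way to check that a merge matrix encodes the intended tree.

```r
# Convert an hclust-style merge matrix into a Newick topology string
# (assumed convention: -k = leaf k, j > 0 = node formed in row j).
merge_to_newick <- function(merge) {
  labels <- character(nrow(merge))
  # Look up a node label: leaves by number, internal nodes by earlier rows
  node <- function(x) if (x < 0) as.character(-x) else labels[x]
  for (i in seq_len(nrow(merge))) {
    labels[i] <- paste0("(", node(merge[i, 1]), ",", node(merge[i, 2]), ")")
  }
  paste0(labels[nrow(merge)], ";")
}

# Usage: the merge2 matrix used throughout these examples (K = 5)
merge2 <- matrix(c(-1,-4,-3,2,-2,-5,1,3), 4, 2)
merge_to_newick(merge2)  # "((4,5),(3,(1,2)));"
```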
Lars Sommer Jermiin, Vivek Jayaswal, Faisal Ababneh, John Robinson (2008). Phylogenetic model evaluation. Bioinformatics, Volume 452 of the series Methods in Molecular Biology, 331-364.
Faisal Ababneh, Lars S Jermiin, Chunsheng Ma, John Robinson (2006). Matched-pairs tests of homogeneity with applications to homologous nucleotide sequences. Bioinformatics, 22(10), 1225-1231.
Fmatrix, Pt, Smatrix
library(DNAseqtest)
# Generate a 4^5 joint distribution array (K = 5 sequences)
merge2 <- matrix(c(-1,-4,-3,2,-2,-5,1,3), 4, 2)
theta <- c(rep(.25,3), rep(.25,3), rep(.25,3), c(.2,.35,.79,.01,.93,.47), 3,.1,.5,.8)
gn.sec <- gn(theta, merge2)
gn.sec
This function calculates the joint distribution array for K matched sequences (second option).
gn2(theta, merge2)
Argument | Description |
---|---|
theta | a vector of variables containing the following parameters in this order: 1. the first three parameters from |
merge2 | a (K-1) x 2 matrix describing the tree topology |
This function calculates the joint distribution array for a tree with K matched sequences. It uses the functions Pt, Fmatrix and Smatrix.
An array containing the joint distribution of the K matched sequences
Lars Sommer Jermiin, Vivek Jayaswal, Faisal Ababneh, John Robinson (2008). Phylogenetic model evaluation. Bioinformatics, Volume 452 of the series Methods in Molecular Biology, 331-364.
Faisal Ababneh, Lars S Jermiin, Chunsheng Ma, John Robinson (2006). Matched-pairs tests of homogeneity with applications to homologous nucleotide sequences. Bioinformatics, 22(10), 1225-1231.
Fmatrix, Pt, Smatrix
library(DNAseqtest)
# Generate a 4^5 joint distribution array (K = 5 sequences)
merge2 <- matrix(c(-1,-4,-3,2,-2,-5,1,3), 4, 2)
rho2 <- matrix(c(.3,.5,.3,.2,.3,.5,.8,2.7), 4, 2)
theta <- c(rep(.25,3), rep(.25,3), rep(.25,3), c(.2,.35,.79,.01,.93,.47), rho2)
g2 <- gn2(theta, merge2)  # a new name avoids masking the function gn2
g2
This function generates random DNA samples using the method of Rambaut and Grassly (1997).
gn3sim(theta, seqLength, merge2)
Argument | Description |
---|---|
theta | a vector of variables containing the following parameters in this order: 1. the first three parameters from |
seqLength | the length of the sequences to generate |
merge2 | a (K-1) x 2 matrix describing the tree topology |
This function generates a DNA array using the method of Rambaut and Grassly (1997). It depends on a set of variables theta, the sequence length and a merge matrix describing the tree topology.
An n x K observed divergence matrix
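The core step of simulating sequences on a tree in the style of Rambaut and Grassly (1997) is to draw a root sequence from the root distribution and then, for each site, draw the descendant nucleotide from the row of the edge's transition matrix P(t) indexed by the ancestral nucleotide. The base-R sketch shows that step for a single edge; the helper names are hypothetical, and how gn3sim walks the tree described by merge2 is not reproduced here.

```r
# Sketch of per-site simulation along one tree edge (Rambaut & Grassly style).
# Nucleotides are coded 1-4 (A, C, G, T).

# Draw a root sequence of length n from the root distribution f0
simulate_root <- function(n, f0) {
  sample.int(4, n, replace = TRUE, prob = f0)
}

# Evolve a sequence along one edge: each site's descendant nucleotide is
# drawn from the row of the transition matrix P for its ancestral nucleotide
evolve_edge <- function(parent, P) {
  vapply(parent, function(s) sample.int(4, 1, prob = P[s, ]), integer(1))
}

# Usage: one edge on which each site stays unchanged with probability 0.9
set.seed(1)
root <- simulate_root(1000, rep(0.25, 4))
P <- matrix(0.1 / 3, 4, 4); diag(P) <- 0.9
child <- evolve_edge(root, P)
mean(child != root)  # proportion of changed sites; its expected value is 0.1
```

Repeating `evolve_edge()` down every edge from the root, with each edge's own P(t), yields one simulated sequence per leaf.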
Faisal Ababneh, Lars S Jermiin, Chunsheng Ma, John Robinson (2006). Matched-pairs tests of homogeneity with applications to homologous nucleotide sequences. Bioinformatics, 22(10), 1225-1231.
Ntml, simapp, simemb, gn, gn2, Fmatrix
library(DNAseqtest)
# This will give a 4^5 observed divergence array
theta <- c(rep(.25,3), rep(.25,3), rep(.25,3), c(.2,.35,.79,.01,.93,.47), 3,.1,.5,.8)
n <- 1000
merge2 <- matrix(c(-1,-4,-3,2,-2,-5,1,3), 4, 2)
gn3 <- gn3sim(theta, n, merge2)
gn3
This function calculates the log-likelihood ratio value.
likelihood(thetast, fobs, merge2)
Argument | Description |
---|---|
thetast | a vector of starting values for the parameters to be estimated |
fobs | the observed divergence array |
merge2 | a (K-1) x 2 matrix describing the tree topology |
This function calculates the log-likelihood ratio value for F(t). It needs a vector of starting values for the parameter estimates, the observed divergence array and a merge matrix describing the tree topology.
The value of the log likelihood ratio
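One common form of the log-likelihood ratio compares the observed counts N with the counts $m\hat F$ expected under a fitted joint distribution $\hat F$: $G = 2\sum N \ln\bigl(N / (m\hat F)\bigr)$, i.e. $-2\ln\Lambda$. The sketch below computes only this final comparison for a given $\hat F$; it does not perform the parameter estimation from starting values that likelihood() also handles, and whether the package uses the same scaling is an assumption. `loglik_ratio` is a hypothetical name.

```r
# Log-likelihood ratio (G) statistic comparing observed counts N with the
# counts m * Fhat expected under a fitted joint distribution Fhat.
loglik_ratio <- function(N, Fhat) {
  stopifnot(all(dim(N) == dim(Fhat)))
  m <- sum(N)
  keep <- N > 0  # cells with zero counts contribute nothing to the sum
  2 * sum(N[keep] * log(N[keep] / (m * Fhat[keep])))
}

# Usage: a uniform model against counts concentrated in half of the cells
Fhat <- array(1 / 16, dim = c(4, 4))
N <- array(c(rep(20, 8), rep(0, 8)), dim = c(4, 4))
loglik_ratio(N, Fhat)
```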
Faisal Ababneh, Lars S Jermiin, Chunsheng Ma, John Robinson (2006). Matched-pairs tests of homogeneity with applications to homologous nucleotide sequences. Bioinformatics, 22(10), 1225-1231.
gn, gn2
library(DNAseqtest)
merge2 <- matrix(c(-1,-4,-3,2,-2,-5,1,3), 4, 2)
theta <- c(rep(.25,3), rep(.25,3), rep(.25,3), c(.2,.35,.79,.01,.93,.47), 3,.1,.5,.8)
F1 <- gn(theta, merge2)
lh <- likelihood(theta, F1, merge2)
lh
Generating random DNA samples from a multinomial distribution.
Ntml(N, Ft)
Argument | Description |
---|---|
N | sample size |
Ft | the joint distribution array of K matched sequences |
This function generates a DNA array from a multinomial distribution. It depends on the sample size to generate and the joint distribution array of K matched sequences.
An observed divergence array
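Multinomial sampling of this kind can be sketched with base R's `rmultinom()`: flatten the joint distribution array into a probability vector, draw N sites, and reshape the counts back to the array's 4 x ... x 4 shape. `sample_divergence` is a hypothetical name, and this is not claimed to be Ntml's internal code.

```r
# Draw N sites from the joint distribution Ft and return the counts as an
# array with the same 4 x ... x 4 shape (an observed divergence array).
sample_divergence <- function(N, Ft) {
  counts <- rmultinom(1, size = N, prob = as.vector(Ft))  # one multinomial draw
  array(counts, dim = dim(Ft))
}

# Usage: 1000 sites from a uniform joint distribution of K = 3 sequences
set.seed(42)
Ft <- array(1 / 64, dim = c(4, 4, 4))
Nt <- sample_divergence(1000, Ft)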
Faisal Ababneh, Lars S Jermiin, Chunsheng Ma, John Robinson (2006). Matched-pairs tests of homogeneity with applications to homologous nucleotide sequences. Bioinformatics, 22(10), 1225-1231.
simemb, simapp, gn3sim, gn, gn2, Fmatrix
library(DNAseqtest)
# This will give a 4^K observed divergence array
merge2 <- matrix(c(-1,-4,-3,2,-2,-5,1,3), 4, 2)
theta <- c(rep(.25,3), rep(.25,3), rep(.25,3), c(.2,.35,.79,.01,.93,.47), 3,.1,.5,.8)
F1 <- gn(theta, merge2)
Nt <- Ntml(1000, F1)
Nt
This function calculates the transition probability function for a process during a period of time.
Pt(S, Pi, t)
Argument | Description |
---|---|
S | a 4 x 4 symmetric matrix |
Pi | a diagonal matrix containing the stationary distribution for the process |
t | a period of time describing the length of the process |
This function takes the 4 x 4 symmetric matrix S, the diagonal matrix $\Pi$ of stationary probabilities and the process length t, and returns the transition probability matrix $P(t)$ for that process, where $P_{ij}(t)$ is the probability that the i-th nucleotide changes to the j-th nucleotide during the period t.
A 4 x 4 matrix containing the transition probabilities for a process.
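A common way to compute $P(t)$ for a reversible model goes through the rate matrix $R = S\Pi$. Since $B = \Pi^{1/2} S \Pi^{1/2}$ is symmetric and similar to R, $P(t) = e^{Rt} = \Pi^{-1/2} e^{Bt} \Pi^{1/2}$ can be evaluated with a symmetric eigendecomposition. The sketch assumes $R = S\Pi$ with the diagonal of S chosen so that every row of R sums to zero; that Pt uses exactly this parametrization is an assumption, and `transition_matrix` is a hypothetical name.

```r
# Sketch: P(t) = exp(R * t) with R = S %*% Pi (assumed parametrisation).
# S: symmetric 4 x 4 matrix whose diagonal makes rows of S %*% Pi sum to zero;
# Pi: diagonal matrix of stationary probabilities; edge_length: the time t.
transition_matrix <- function(S, Pi, edge_length) {
  sq <- sqrt(diag(Pi))
  # B = Pi^(1/2) S Pi^(1/2) is symmetric and similar to R
  B <- diag(sq) %*% S %*% diag(sq)
  e <- eigen(B, symmetric = TRUE)
  expBt <- e$vectors %*% diag(exp(e$values * edge_length)) %*% t(e$vectors)
  # Undo the similarity transform: exp(R t) = Pi^(-1/2) exp(B t) Pi^(1/2)
  diag(1 / sq) %*% expBt %*% diag(sq)
}

# Usage: Jukes-Cantor-like case with equal nucleotide frequencies
S <- matrix(1, 4, 4); diag(S) <- -3
P <- transition_matrix(S, diag(0.25, 4), 1)
```

In this equal-frequency case the result has the closed form $P_{ii}(t) = \tfrac14 + \tfrac34 e^{-t}$ and $P_{ij}(t) = \tfrac14 - \tfrac14 e^{-t}$ for $i \ne j$.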
Faisal Ababneh, Lars S Jermiin, Chunsheng Ma, John Robinson (2006). Matched-pairs tests of homogeneity with applications to homologous nucleotide sequences. Bioinformatics, 22(10), 1225-1231.
Smatrix
library(DNAseqtest)
Pi <- diag(c(.1, .1, .1, .7))
S <- Smatrix(c(.3, .3, .3, .3, .3, .3), diag(Pi))
t <- 1
p <- Pt(S, Pi, t)
p
This function generates random DNA samples using an approximation method.
simapp(theta, seqLength, merge1)
Argument | Description |
---|---|
theta | a vector of variables containing the following parameters in this order: 1. the first three parameters from |
seqLength | the length of the sequences to generate |
merge1 | a (K-1) x 2 matrix describing the tree topology |
This function generates a DNA array using an approximation method. It depends on a set of variables theta, the sequence length and a merge matrix describing the tree topology.
An n x K observed divergence matrix
Faisal Ababneh, Lars S Jermiin, Chunsheng Ma, John Robinson (2006). Matched-pairs tests of homogeneity with applications to homologous nucleotide sequences. Bioinformatics, 22(10), 1225-1231.
Ntml, simemb, gn3sim, gn, gn2, Fmatrix
library(DNAseqtest)
# This will give a 4^5 observed divergence array
theta <- c(rep(.25,3), rep(.25,3), rep(.25,3), c(.2,.2,.2,.2,.2,.2), 3,.1,.5,.8)
n <- 1000
merge2 <- matrix(c(-1,-4,-3,2,-2,-5,1,3), 4, 2)
sa <- simapp(theta, n, merge2)
sa
This function generates random DNA samples using an embedded Markov chain.
simemb(theta, seqLength, merge2)
Argument | Description |
---|---|
theta | a vector of variables containing the following parameters in this order: 1. the first three parameters from |
seqLength | the length of the sequences to generate |
merge2 | a (K-1) x 2 matrix describing the tree topology |
This function generates a DNA array using an embedded Markov chain. It depends on a set of variables theta, the sequence length and a merge matrix describing the tree topology.
An n x K observed divergence matrix
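The embedded (jump) chain view of a continuous-time Markov process simulates a site by alternating exponential holding times, with rate $-R_{ii}$ in state i, and jumps to state $j \ne i$ with probability $R_{ij}/(-R_{ii})$, stopping when the accumulated time exceeds the edge length. The sketch below does this for one site along one edge, given a rate matrix R whose rows sum to zero; `simulate_site_jump` is a hypothetical name, and how simemb builds R from theta is not shown.

```r
# Simulate one site along an edge of length len using the embedded jump chain
# of a continuous-time Markov chain with rate matrix R (rows sum to zero).
simulate_site_jump <- function(state, R, len) {
  elapsed <- 0
  repeat {
    rate <- -R[state, state]               # total rate of leaving `state`
    if (rate <= 0) return(state)           # absorbing state: no further changes
    elapsed <- elapsed + rexp(1, rate)     # exponential holding time
    if (elapsed > len) return(state)       # the edge ends before the next jump
    jump <- R[state, ]; jump[state] <- 0
    state <- sample.int(4, 1, prob = jump / rate)  # embedded-chain transition
  }
}

# Usage: equal-rates model; simulate 2000 sites that start as A (state 1)
R <- matrix(0.25, 4, 4); diag(R) <- -0.75
set.seed(7)
x <- replicate(2000, simulate_site_jump(1L, R, 1))
mean(x == 1)  # compare with the exact value 0.25 + 0.75 * exp(-1)
```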
Faisal Ababneh, Lars S Jermiin, Chunsheng Ma, John Robinson (2006). Matched-pairs tests of homogeneity with applications to homologous nucleotide sequences. Bioinformatics, 22(10), 1225-1231.
Ntml, simapp, gn3sim, gn, gn2, Fmatrix
library(DNAseqtest)
# This will give a 4^5 observed divergence array
theta <- c(rep(.25,3), rep(.25,3), rep(.25,3), c(.2,.35,.79,.01,.93,.47), 3,.1,.5,.8)
n <- 1000
merge2 <- matrix(c(-1,-4,-3,2,-2,-5,1,3), 4, 2)
sm <- simemb(theta, n, merge2)
sm
This function calculates the symmetric matrix S.
Smatrix(s, pix)
Argument | Description |
---|---|
s | a vector of variables containing the six free parameters in the S matrix |
pix | a vector giving the stationary probabilities for the four nucleotides A, C, G and T |
This function calculates the matrix S, which is used to calculate the rate matrix R.
A 4 x 4 symmetric matrix
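Under the common general time-reversible parametrization $R = S\Pi$, the six free parameters fill the off-diagonal entries of S symmetrically, and the diagonal is fixed by requiring each row of R to sum to zero: $s_{ii} = -\sum_{j \ne i} s_{ij}\pi_j / \pi_i$. The sketch assumes this convention and the parameter order AC, AG, AT, CG, CT, GT; both are assumptions about Smatrix, and `build_S` is a hypothetical name.

```r
# Build a symmetric 4 x 4 matrix S from six exchangeabilities and the
# stationary distribution pix. Assumed order of s: AC, AG, AT, CG, CT, GT.
build_S <- function(s, pix) {
  stopifnot(length(s) == 6, length(pix) == 4)
  S <- matrix(0, 4, 4)
  # lower.tri() is filled column by column: (2,1), (3,1), (4,1), (3,2), (4,2), (4,3)
  S[lower.tri(S)] <- s
  S <- S + t(S)
  # Diagonal chosen so that each row of R = S %*% diag(pix) sums to zero
  diag(S) <- -as.vector(S %*% pix) / pix
  S
}

# Usage with the values from the example below
pix <- c(.1, .1, .1, .7)
S <- build_S(c(.1, .2, .3, .4, .5, .6), pix)
R <- S %*% diag(pix)  # rate matrix under the assumed parametrization
```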
Faisal Ababneh, Lars S Jermiin, Chunsheng Ma, John Robinson (2006). Matched-pairs tests of homogeneity with applications to homologous nucleotide sequences. Bioinformatics, 22(10), 1225-1231.
Pt, Fmatrix, gn, gn2
library(DNAseqtest)
s <- c(.1, .2, .3, .4, .5, .6)
pix <- c(.1, .1, .1, .7)  # a new name avoids masking the built-in constant pi
sm <- Smatrix(s, pix)
sm
This function tests for symmetry between all the pairs of K matched DNA sequences.
TEST2(f)
Argument | Description |
---|---|
f | the observed divergence array N of K matched sequences |
This function calculates Bowker's test for symmetry, Stuart's test for
marginal symmetry and the test for internal symmetry. It depends on the observed divergence array N.
A list of three matrices whose values are stored in the lower triangle:
Component | Description |
---|---|
first | a (K-1) x (K-1) matrix whose lower triangle holds Bowker's test for all possible pairs of the K sequences |
second | a (K-1) x (K-1) matrix whose lower triangle holds Stuart's test for all possible pairs of the K sequences |
third | a (K-1) x (K-1) matrix whose lower triangle holds the test for internal symmetry for all possible pairs of the K sequences |
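For a single pair of sequences with 4 x 4 divergence matrix N, Bowker's statistic is $\sum_{i<j} (n_{ij}-n_{ji})^2/(n_{ij}+n_{ji})$, Stuart's statistic is $d^\top V^{-1} d$ with $d_i = n_{i\cdot} - n_{\cdot i}$ over the first three nucleotides, $V_{ii} = n_{i\cdot} + n_{\cdot i} - 2n_{ii}$ and $V_{ij} = -(n_{ij}+n_{ji})$, and the internal-symmetry statistic is their difference. The sketch computes the three values for one pair; TEST2 repeats this over all pairs. `symmetry_tests` is a hypothetical name, and the degrees of freedom in the comment (6, 3, 3) assume no pair of cells with $n_{ij}+n_{ji}=0$.

```r
# Matched-pairs symmetry tests for one 4 x 4 divergence matrix N
# (rows = nucleotides in sequence x, columns = nucleotides in sequence y).
symmetry_tests <- function(N) {
  # Bowker's test of symmetry over the pairs i < j (empty pairs are skipped)
  up <- upper.tri(N)
  num <- (N - t(N))[up]^2
  den <- (N + t(N))[up]
  bowker <- sum(num[den > 0] / den[den > 0])
  # Stuart's test of marginal symmetry, using the first three nucleotides
  d <- (rowSums(N) - colSums(N))[1:3]
  V <- -(N + t(N))
  diag(V) <- rowSums(N) + colSums(N) - 2 * diag(N)
  stuart <- as.numeric(t(d) %*% solve(V[1:3, 1:3], d))
  c(bowker = bowker, stuart = stuart, internal = bowker - stuart)
}

# Usage with a made-up divergence matrix
N <- matrix(c(50, 8, 5, 3,
              4, 60, 7, 6,
              9, 2, 55, 4,
              5, 3, 6, 40), 4, 4, byrow = TRUE)
res <- symmetry_tests(N)
# Approximate p-values: Bowker df = 6, Stuart df = 3, internal df = 3
pchisq(res, df = c(6, 3, 3), lower.tail = FALSE)
```

Base R's `mcnemar.test()` applied to a square table larger than 2 x 2 computes the same Bowker statistic, which makes a convenient cross-check.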
Lars Sommer Jermiin, Vivek Jayaswal, Faisal Ababneh, John Robinson (2008). Phylogenetic model evaluation. Bioinformatics, Volume 452 of the series Methods in Molecular Biology, 331-364.
Faisal Ababneh, Lars S Jermiin, Chunsheng Ma, John Robinson (2006). Matched-pairs tests of homogeneity with applications to homologous nucleotide sequences. Bioinformatics, 22(10), 1225-1231.
Ntml, simapp, simemb, TEST3
library(DNAseqtest)
merge2 <- matrix(c(-1,-4,-3,2,-2,-5,1,3), 4, 2)
theta <- c(rep(.25,3), rep(.25,3), rep(.25,3), c(.2,.35,.79,.01,.93,.47), 3,.1,.5,.8)
F1 <- gn(theta, merge2)
N1 <- Ntml(1000, F1)
t2 <- TEST2(N1)
t2
This function tests for marginal symmetry across K matched DNA sequences.
TEST3(Farray)
Argument | Description |
---|---|
Farray | the observed divergence array N of K matched sequences |
This function calculates the overall test for marginal symmetry. It depends on the observed divergence array N.
A single value giving the overall test for marginal symmetry between the K matched sequences
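One natural way to build an overall marginal-symmetry test for K sequences is to stack, for every site, the differences between the nucleotide indicators of sequences 1..K-1 and those of sequence K (first three nucleotides only), sum them into a vector d, estimate its covariance under the null as $V = \sum_s u_s u_s^\top$, and use $d^\top V^{-1} d$ with $3(K-1)$ degrees of freedom. For K = 2 this reduces exactly to Stuart's statistic. Whether TEST3 uses this covariance estimator is an assumption; the sketch takes an m x K matrix of nucleotide codes (one row per site, as produced by artomat) rather than an array, and `marginal_symmetry_K` is a hypothetical name.

```r
# Overall test of marginal symmetry for K matched sequences, given an
# m x K matrix X of nucleotide codes (1-4), one row per site.
marginal_symmetry_K <- function(X) {
  K <- ncol(X)
  # Indicators of nucleotides 1-3 for every sequence (the 4th is redundant)
  ind <- lapply(seq_len(K), function(k) outer(X[, k], 1:3, "==") * 1)
  # Per-site differences between each of sequences 1..K-1 and sequence K
  U <- do.call(cbind, lapply(seq_len(K - 1), function(k) ind[[k]] - ind[[K]]))
  d <- colSums(U)        # differences in nucleotide counts
  V <- crossprod(U)      # covariance of d estimated under marginal symmetry
  stat <- as.numeric(t(d) %*% solve(V, d))
  c(statistic = stat, df = 3 * (K - 1))
}

# Usage: 100 random sites for K = 3 sequences
set.seed(3)
X3 <- matrix(sample.int(4, 300, replace = TRUE), 100, 3)
r3 <- marginal_symmetry_K(X3)
pchisq(r3[["statistic"]], df = r3[["df"]], lower.tail = FALSE)
```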
Lars Sommer Jermiin, Vivek Jayaswal, Faisal Ababneh, John Robinson (2008). Phylogenetic model evaluation. Bioinformatics, Volume 452 of the series Methods in Molecular Biology, 331-364.
Faisal Ababneh, Lars S Jermiin, Chunsheng Ma, John Robinson (2006). Matched-pairs tests of homogeneity with applications to homologous nucleotide sequences. Bioinformatics, 22(10), 1225-1231.
Ntml, simapp, simemb, TEST2
library(DNAseqtest)
merge2 <- matrix(c(-1,-4,-3,2,-2,-5,1,3), 4, 2)
theta <- c(rep(.25,3), rep(.25,3), rep(.25,3), c(.2,.35,.79,.01,.93,.47), 3,.1,.5,.8)
F1 <- gn(theta, merge2)
N1 <- Ntml(1000, F1)
t3 <- TEST3(N1)
t3