Tuesday, July 28, 2009

Probability question, need help, thanks?

Two (short) DNA sequences are compared to judge the degree of relatedness of the organisms from which they were obtained. We will consider the random variable X = number of matching bases in corresponding positions on both sequences. For simplicity, we will ignore the possibility of insertions or deletions for this problem.


The sequences to be compared are:


A T T G C T C T A T T G T G G A C T A C


A T T G C T G T A C T G A G G A C T A C


(a) Suppose that the two pieces of DNA are random and unrelated. Then what is the distribution of the random variable X (state the name of the distribution and list its parameter(s))? On the other hand, if the sequences were related, then what would be a reasonable (qualitative, not quantitative) assumption to be made about the distribution parameter(s)?


(b) Formulate a null hypothesis and alternative (in terms of the distribution parameter(s) from part (a)). Our goal is to decide whether or not the two sequences are related.


(c) Write down the probability distribution function of the random variable X if the null hypothesis were true. This is your test statistic function.


(d) Draw a (rough) sketch of the test statistic distribution function. Compute the observed value of X and indicate it in your sketch.


(e) Compute the p-value for this hypothesis test and draw a conclusion at significance level α = 0.05.

Nice problem; particularly part A.





In the spirit of homework oversimplification, we can assume that each position takes one of 4 values (A, C, G, or T), each equally likely and independent of any adjacent bases.





(This is contrary to fact. In protein-coding DNA, the bases are read in triplets, called codons, and several different codons are "equal" in the sense that they code for the same amino acid.)


http://en.wikipedia.org/wiki/Codon





So the probability of a match in any one position is 1/4, and you are dealing with a binomial distribution: X counts the matches over n = 20 positions, each with success probability p = 1/4.
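To spell out where the 1/4 comes from, here is a short derivation. It assumes (as above) that the four bases are equally likely and that the two sequences are independent of each other; the symbols b, n, p and k are just notation introduced for this sketch.

```latex
% Match probability at a single position, summing over the four bases b
P(\text{match}) = \sum_{b \in \{A, C, G, T\}} P(b)\,P(b)
                = 4 \cdot \left(\tfrac{1}{4}\right)^{2} = \tfrac{1}{4}

% Distribution of X under the "unrelated" assumption, with n = 20 positions
X \sim \mathrm{Binomial}(n = 20,\; p = \tfrac{1}{4}), \qquad
P(X = k) = \binom{20}{k} \left(\tfrac{1}{4}\right)^{k} \left(\tfrac{3}{4}\right)^{20-k},
\quad k = 0, 1, \dots, 20
```

If the sequences were related, the natural qualitative assumption would be p > 1/4, since matches become more likely than chance.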





http://en.wikipedia.org/wiki/Binomial_di...
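If you want to check the numbers yourself, here is a minimal Python sketch. It assumes the binomial model above with p = 1/4 and treats "related" as a one-sided alternative (more matches than chance), so the p-value is the upper tail P(X >= observed). The function names count_matches, binom_pmf and upper_tail are made up for this example.

```python
from math import comb

# The two sequences from the question, with the spaces removed.
seq1 = "ATTGCTCTATTGTGGACTAC"
seq2 = "ATTGCTGTACTGAGGACTAC"


def count_matches(a, b):
    """Count positions where both sequences carry the same base."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length (no indels)")
    return sum(x == y for x, y in zip(a, b))


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the one-sided p-value."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))


n = len(seq1)                        # number of positions compared
x_obs = count_matches(seq1, seq2)    # observed value of X
p_value = upper_tail(x_obs, n, 0.25) # assumes p = 1/4 under the null

print(f"n = {n}, observed X = {x_obs}, p-value = {p_value:.3g}")
```

Under these assumptions the p-value comes out far below 0.05, so the null hypothesis of unrelated sequences would be rejected. A two-sided test or a different null value of p would need a different tail calculation.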





As for just how close the sequences have to be to be related, that's a problem for biology, not math. But I assume your teacher wants you to ignore that.

