Project Title: Statistical Estimation of Insertion-Deletion Stacked Pair Distance between Random Strings Generated by Markov Sources and Generating New DNA Codes.

Adviser: Vyacheslav Rykov

Description. We will use the concept of block isomorphic subsequences to describe new abstract weighted string metrics that are similar to the weighted Levenshtein insertion-deletion metric. The metrics will be used to model a thermodynamic distance function on single-stranded DNA sequences. Our model captures a key aspect of the nearest-neighbor thermodynamic model for hybridized DNA duplexes, and our thermodynamically weighted distance function is a metric in the rigorous mathematical sense. Thermodynamic distance functions are important components in the construction of DNA codes, which in turn are important components in biomolecular computing and other biotechnical applications that employ DNA hybridization assays. We will show how these new distances can be calculated, and we will create algorithms for generating new DNA codes.

In this project, we propose that the participating student engage in the following activities:

A. Study the theoretical aspects of the problem using the following sources:

1. A.G. D’yachkov, A.J. Macula, W.K. Pogozelski, T.E. Renz, V.V. Rykov, and D.C. Torney, A Weighted Insertion-Deletion Stacked Pair Thermodynamic Metric for DNA Codes, The Tenth International Meeting on DNA Computing, Milano-Bicocca, Italy, 2004.

2. A.G. D’yachkov, A.J. Macula, W.K. Pogozelski, T.E. Renz, V.V. Rykov, and D.C. Torney, An Insertion-Deletion Like Metric with Application to DNA Hybridization Thermodynamic Modeling, IEEE Information Theory, 2004.

3. R.J. Serfling, Approximation Theorems of Mathematical Statistics, John Wiley, 1985.

4. F.J. MacWilliams and N.J.A. Sloane, The Theory of Error-Correcting Codes, Amsterdam, The Netherlands: North-Holland, 1977.

This will help her/him understand the research topic and will serve as an introduction to the final written research report.

B. Develop algorithms for statistically estimating the distance between random strings, for generating strings from Markov sources, and for generating new DNA codes.

C. Write computer programs for generating random strings, performing the statistical estimation, and generating DNA codes.

D. Run the programs and help the adviser obtain statistical estimates of the distance for different Markov source parameters and string lengths, generate new DNA codes, and create a web site presenting the generated codes.

E. Compile her/his findings, the software written, significant graphs, tables, and so on into the final research report to be presented at the MAM Symposium.

OTHER REQUIREMENTS: Students interested in this project are expected to have taken and passed, with top or near-top grades, MATH 1950 (Calc I), MATH 1960 (Calc II), MATH 4050 (Linear Algebra), and MATH 4740 (Introduction to Probability and Statistics). They should be familiar with computers and MAPLE, and be willing to learn C++ and to take MATH 8670 (Topics in Probability/Statistics) in the spring semester. The student is also expected to meet with the adviser a couple of times a week (or communicate actively by e-mail) for discussions, guidance, and progress reports during the preparation period of the project.