Project
Title: Statistical Estimation of Insertion-Deletion Stacked Pair Distance between
Random Strings Generated by Markov Sources and Generating New DNA Codes.
Adviser: Vyacheslav Rykov
Description. We will use the concept of block isomorphic subsequences to describe new
abstract weighted string metrics that are similar to the weighted Levenshtein insertion-deletion metric. The metrics will be
used to model a thermodynamic distance function on single stranded DNA sequences.
Our model captures a key aspect of the nearest neighbor thermodynamic model for
hybridized DNA duplexes. Our thermodynamically weighted distance function is a
metric in the rigorous mathematical sense. Thermodynamic distance functions are
important components in the construction of DNA codes and DNA codes are
important components in biomolecular computing and other biotechnical
applications that employ DNA hybridization assays. We show how these new
distances can be calculated, and we will create algorithms for generating new DNA codes.
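As a point of reference for the distance calculation, here is a minimal sketch of the plain, unweighted insertion-deletion distance, which equals the sum of the two string lengths minus twice the length of a longest common subsequence. It is not the weighted stacked-pair metric studied in this project; the function names lcs_length and indel_distance are hypothetical and chosen only for illustration.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of a longest common subsequence of x and y (row-by-row DP)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the common subsequence ending before both symbols.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop one symbol from either string and keep the better result.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_distance(x: str, y: str) -> int:
    """Unweighted insertion-deletion distance: |x| + |y| - 2 * LCS(x, y)."""
    return len(x) + len(y) - 2 * lcs_length(x, y)


if __name__ == "__main__":
    print(indel_distance("ACGT", "AGT"))  # one deletion turns ACGT into AGT
```

The weighted metrics described above would replace the unit cost per matched symbol with thermodynamic weights on stacked pairs, so this sketch only fixes the general shape of the dynamic program.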
In this project we
propose that the participating student engage in the following activities:
A.
Study the theoretical aspects of the problem by using the following sources:
1. A.G. D’yachkov, A.J. Macula, W.K. Pogozelski,
T.E. Renz, V.V. Rykov, and D.C. Torney,
A Weighted Insertion-Deletion Stacked Pair Thermodynamic Metric for DNA Codes, The Tenth International Meeting on DNA Computing.
2. A.G. D’yachkov, A.J. Macula, W.K.
Pogozelski, T.E. Renz, V.V.
Rykov, and D.C. Torney, An Insertion-Deletion Like Metric with
Application to DNA Hybridization Thermodynamic Modeling, IEEE Information Theory, 2004.
3. R.J. Serfling,
Approximation Theorems of Mathematical Statistics, John Wiley, 1980.
4. F.J. MacWilliams, N.J.A. Sloane,
The Theory of Error-Correcting Codes.
This
will help her/him understand the research topic and will serve as an introduction to
the final written research report.
B. Develop algorithms
for statistically estimating the distance between
random strings, for simulating Markov sources of strings, and for generating new DNA codes.
C. Write computer programs
for generating random strings, statistical estimation, and generating DNA codes.
D. Run the
programs and help the adviser obtain statistical estimates of the distance for different
parameters of the Markov sources and different lengths of the
strings, generate new DNA codes, and build a web site presenting the generated codes.
E. Put together
her/his findings, written software, significant graphs, tables, and so on in
the final research report to be presented at the
MAM Symposium.
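Activities B through D can be sketched in a few lines of Python. The example below is an illustration under stated assumptions, not the project's method: it uses the plain unweighted insertion-deletion distance in place of the weighted stacked-pair metric, a first-order Markov source with a uniformly chosen first symbol, and a simple greedy filter for code generation that ignores reverse complements. All function and parameter names are hypothetical.

```python
import math
import random
from statistics import mean, stdev

ALPHABET = "ACGT"


def indel_distance(x, y):
    """Unweighted insertion-deletion distance |x| + |y| - 2 * LCS(x, y)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            curr.append(prev[j - 1] + 1 if a == b else max(prev[j], curr[j - 1]))
        prev = curr
    return len(x) + len(y) - 2 * prev[-1]


def markov_string(n, transition, rng):
    """Draw a length-n string from a first-order Markov source.

    transition[s] lists the probabilities of A, C, G, T following symbol s.
    The first symbol is uniform, which is an assumption of this sketch.
    """
    if n == 0:
        return ""
    symbols = [rng.choice(ALPHABET)]
    while len(symbols) < n:
        weights = transition[symbols[-1]]
        symbols.append(rng.choices(ALPHABET, weights=weights)[0])
    return "".join(symbols)


def estimate_mean_distance(n, transition, trials, seed=0):
    """Monte Carlo estimate of the mean distance between two independent strings.

    Returns (sample mean, standard error); requires trials >= 2.
    """
    rng = random.Random(seed)
    samples = [
        indel_distance(markov_string(n, transition, rng),
                       markov_string(n, transition, rng))
        for _ in range(trials)
    ]
    return mean(samples), stdev(samples) / math.sqrt(trials)


def greedy_code(n, min_distance, transition, attempts, seed=0):
    """Keep a random candidate only if it is far from every accepted codeword."""
    rng = random.Random(seed)
    code = []
    for _ in range(attempts):
        candidate = markov_string(n, transition, rng)
        if all(indel_distance(candidate, w) >= min_distance for w in code):
            code.append(candidate)
    return code


if __name__ == "__main__":
    uniform = {s: [0.25] * 4 for s in ALPHABET}
    m, se = estimate_mean_distance(n=20, transition=uniform, trials=200)
    print(f"estimated mean distance: {m:.2f} (standard error {se:.2f})")
    print(greedy_code(n=8, min_distance=6, transition=uniform, attempts=500))
```

Fixing the seed makes each run reproducible, which is convenient when tabulating estimates over several string lengths and transition matrices for the final report.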
OTHER REQUIREMENTS:
Students interested in this project are expected to have passed,
with top or near-top grades, MATH 1950 (Calc I), MATH 1960 (Calc II), MATH 4050
(Linear Algebra), and MATH 4740 (Introduction to Probability and
Statistics). They should be familiar with computers and MAPLE, and be willing
to learn C++ and to take MATH 8670 (Topics in Probability/Statistics)
in the spring semester. The student is also expected to meet with the adviser a
couple of times a week (or communicate actively by e-mail) for discussions,
guidance, and progress reports during the preparation period of the
project.