Events

5 November 2015, 11:00

Probability and Mathematical Statistics Section

On random sequence comparison

Juri Lember, University of Tartu, Tartu, Estonia

Seminar room (Auletta Seminari), 3rd floor

Abstract

Measuring the similarity of long strings such as texts or DNA sequences is an important problem in many fields. In computational molecular biology, for example, similar DNA sequences are deemed to have a common ancestor (to be homologous); in linguistics, the similarity of texts can indicate a shared topic or author. Typically, the similarity between two finite-length sequences is measured via a similarity score. A commonly used score is the length of the longest common subsequence, but many other scores are possible.
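As an illustration of the longest-common-subsequence score mentioned above, here is a minimal sketch in Python using the standard dynamic-programming recursion. The function name lcs_length and the example strings are chosen for illustration only and do not come from the talk.

```python
def lcs_length(x, y):
    """Return the length of the longest common subsequence of x and y."""
    # prev[j] is the LCS length of the already processed prefix of x and y[:j].
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                # Matching letters extend a common subsequence by one.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop the last letter of one of the two prefixes.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


print(lcs_length("AGGTAB", "GXTXAYB"))  # 4, e.g. the subsequence "GTAB"
```

The running time is proportional to the product of the two lengths, and only one row of the table is kept in memory at a time.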
Often the sequences are modeled as random processes, so the score is a random variable as well. To distinguish related sequences from unrelated (independent) ones, one has to know the distribution of the random score, at least asymptotically. This turns out to be a very difficult task that has not yet been solved even for the simplest models.
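Since the distribution of the score is not known in closed form, one way to get a feel for it is simulation. The sketch below assumes one particular simple model, two independent sequences of i.i.d. letters drawn uniformly from a four-letter alphabet, and uses the longest common subsequence as the score. The model, the function names and the parameter values are illustrative assumptions, not the speaker's setup.

```python
import random
import statistics


def lcs_length(x, y):
    """Return the length of the longest common subsequence of x and y."""
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def simulate_scores(n, trials, alphabet="ACGT", seed=0):
    """Sample LCS scores of independent uniform i.i.d. sequence pairs of length n."""
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    scores = []
    for _ in range(trials):
        x = rng.choices(alphabet, k=n)
        y = rng.choices(alphabet, k=n)
        scores.append(lcs_length(x, y))
    return scores


n = 200
scores = simulate_scores(n=n, trials=50)
# Empirical mean of the score per letter and empirical standard deviation.
print(statistics.mean(scores) / n, statistics.stdev(scores))
```

Repeating the experiment for increasing n gives an empirical impression of how the mean and the fluctuations of the score grow with the sequence length, which is no substitute for the asymptotic results the abstract refers to.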
The talk gives a brief overview of the main ideas and methods of sequence comparison, and discusses some recent results on the rate of convergence of the moments of the score.