Dichotomic classes, short range correlations and entropy optimization in coding sequences

Simone Giannerini
University of Bologna, Department of Statistical Sciences, Italy

In this talk we introduce and study dichotomic classes, quantities that arise naturally from a mathematical model of the genetic code. Dichotomic classes can be defined as nonlinear functions of the information contained in a dinucleotide, that is, a pair of adjacent bases. Interestingly, these classes, which represent precise biochemical interactions, emerge naturally from the mathematical model. Moreover, dichotomic classes possess precise symmetry properties and can be placed in a group-theoretic framework.

We use the dichotomic classes as a coding scheme for DNA sequences and study the mutual dependence between such classes. We obtain meaningful tests for dependence by combining an entropy-based measure that possesses many desirable properties with suitable resampling techniques. We find strong, universal short-range correlations between certain combinations of dichotomic classes. These correlations point to the existence of a local structure that might be related to the mechanisms of error correction and entropy optimization in the management of genetic information.
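
As a rough illustration of the kind of test described above, the following sketch computes a lagged dependence measure between two binary-coded sequences and assesses it by resampling. It is not the procedure used in the talk: plug-in mutual information stands in for the entropy-based measure, a plain shuffle of one sequence stands in for the resampling scheme, and the function names (mutual_information, lagged_mi, permutation_pvalue) are hypothetical. The binary input sequences are assumed to come from some coding of the DNA sequence into classes, which is not specified here.

```python
import math
import random
from collections import Counter


def mutual_information(x, y):
    """Plug-in mutual information (in bits) of two equal-length symbol sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        # Each term is p(a, b) * log2( p(a, b) / (p(a) * p(b)) ).
        mi += (c / n) * math.log2(c * n / (px[a] * py[b]))
    return mi


def lagged_mi(x, y, lag):
    """Mutual information between x[t] and y[t + lag], for lag >= 0."""
    if lag == 0:
        return mutual_information(x, y)
    return mutual_information(x[:-lag], y[lag:])


def permutation_pvalue(x, y, lag, n_perm=500, seed=0):
    """Shuffle-based p-value for the null hypothesis of no dependence at the given lag.

    Assumption: shuffling y is an adequate null model. It also destroys any
    serial structure within y, so this is only a simple stand-in.
    """
    rng = random.Random(seed)
    observed = lagged_mi(x, y, lag)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if lagged_mi(x, y_perm, lag) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

Calling permutation_pvalue(x, y, lag=1) on two class sequences returns the observed lag-1 dependence and its estimated p-value; repeating the call over a range of lags gives a crude picture of how quickly the dependence decays with distance.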