Institute of Cytology and Genetics

Laboratory of Theoretical Genetics

Head: N.A. Kolchanov, Dr.Biol.Sci., Prof.

Correlation between the helical twist angle and the nucleotide context of double-stranded DNA (as represented in the B-DNA-Video database)

A model of the DNA-protein interaction

Linear correlation between (a) DNA-USF affinity and twist and (b) DNA-USF affinity and minor groove depth

A leader fragment of E. coli thrS mRNA: the secondary structure predicted by the genetic algorithm

Fragment of the regulatory 5'-region controlling transcription of the rat tyrosine aminotransferase gene described in the TRRD database

Fragments of gene networks of erythrocyte maturation and differentiation

Informational biology occupies an exceptionally important position in modern biology. It provides the theoretical, computational, and informational background for genomic research, genetics and breeding, molecular genetics and molecular biology, gene and protein engineering, biotechnology, medical genetics, gene diagnostics, and gene therapy, that is, for the sciences whose outstanding achievements confer a leading status on biology in the coming century. Research in informational biology is a top priority throughout the world, and the Institute of Cytology and Genetics (IC&G) SB RAS is likewise committed to the development of integrative informational biology.

The main lines of research

The main goals of the Laboratory are:

  • development of theoretical and computer methods for genome analysis;

  • theoretical and computer-assisted studies of fundamental molecular-genetic processes, such as transcription, splicing, translation, mutation, and recombination;

  • theoretical and computer analysis of the structural and functional organization of genetic macromolecules (DNA, RNA, and proteins);

  • theoretical and computer studies of regulatory genetic systems;

  • analysis of the evolution of genetic macromolecules and molecular-genetic systems;

  • development of computer technologies for modeling complex molecular-genetic systems and processes;

  • development of databases compiling experimental data on the structure and function of DNA, RNA, and proteins;

  • creation of large-scale informational-programming systems for analysis of molecular-biological and molecular-genetic data.

This work requires an integrated approach: genetic systems and processes are analyzed theoretically and with computer assistance at different levels, from the molecule and the cell to the organism and the population.

Informational and program resources for analysis of genetic systems and processes

The current list of resources includes:

TRRD - a database on transcription regulatory regions in eukaryotic genes

ACTIVITY - for predicting the activity of functional sites on the basis of their nucleotide sequences

B-DNA Video - for investigating the physico-chemical and conformational properties of sites and for site recognition

ConsFreq - for investigating the contextual characteristics of sites and for site recognition by frequency matrices and consensus sequences

TFBSR - for analyzing and recognizing transcription factor binding sites using methods of realizations

GeneNet - a database on eukaryotic gene nets

Leader_mRNA - for estimating the level of eukaryotic mRNA translation

CRASP - for investigating the structural organization and evolution of proteins on the basis of analysis of co-adaptive substitutions

Nucleosome - for analyzing and recognizing nucleosome sites in DNA

Fitness - for RNA secondary structure prediction by genetic algorithm
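ConsFreq is described as recognizing sites with frequency matrices and consensus sequences. The sketch below illustrates the general frequency-matrix idea only and is not ConsFreq's actual algorithm. It counts bases at each position of aligned known sites, converts the counts to log-odds weights against a uniform background, and slides the matrix along a sequence to find the best-scoring window. The function names, the pseudocount of 0.5, and the uniform background of 0.25 are assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"  # only A, C, G, T are handled; other symbols raise KeyError


def frequency_matrix(sites: List[str]) -> List[Dict[str, int]]:
    """Count each base at each position of equal-length, pre-aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    counts = [{b: 0 for b in BASES} for _ in range(length)]
    for site in sites:
        for i, base in enumerate(site.upper()):
            counts[i][base] += 1
    return counts


def consensus(counts: List[Dict[str, int]]) -> str:
    """Most frequent base at each position (ties resolved in ACGT order)."""
    return "".join(max(BASES, key=lambda b: col[b]) for col in counts)


def log_odds_matrix(counts: List[Dict[str, int]], pseudocount: float = 0.5,
                    background: float = 0.25) -> List[Dict[str, float]]:
    """Convert counts to log2(p / background) weights; the pseudocount avoids log(0)."""
    weights = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        weights.append({b: math.log2((column[b] + pseudocount) / total / background)
                        for b in BASES})
    return weights


def best_hit(sequence: str, weights: List[Dict[str, float]]) -> Tuple[int, float]:
    """Return (0-based start, score) of the best window; (-1, -inf) if the sequence is too short."""
    width = len(weights)
    seq = sequence.upper()
    best = (-1, float("-inf"))
    for start in range(len(seq) - width + 1):
        score = sum(weights[i][seq[start + i]] for i in range(width))
        if score > best[1]:
            best = (start, score)
    return best
```

For example, building the matrix from three toy sites, CACGTG, CACGTG, and CACATG, gives the consensus CACGTG, and scanning TTTCACGTGAAA finds the best window at position 3 (0-based). A real recognizer would also need a score threshold calibrated on known sites and non-sites; none is assumed here.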

We are developing TRRD (Transcription Regulatory Regions Database), which is designed to accumulate experimental data on extended transcription regulatory regions of eukaryotic genes. TRRD describes the modular structure of transcription regulatory regions and the hierarchy of their constituent regulatory units: cis-elements (transcription factor binding sites); composite elements formed by pairs of neighboring sites; promoters, enhancers, and silencers; and long transcription regulatory regions located at the 5' and 3' ends of genes or in their introns, which contain all of the elements listed above.
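The hierarchy described above, in which sites and composite elements sit inside promoters, enhancers, and silencers, which in turn sit inside a gene's regulatory region, maps naturally onto nested records. The sketch below assumes a simple in-memory model. All class names, field names, and the coordinate convention are hypothetical and do not reproduce the actual TRRD schema or entry format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Site:
    """A cis-element: a transcription factor binding site."""
    factor: str
    start: int  # hypothetical convention: position relative to the transcription start
    end: int


@dataclass
class CompositeElement:
    """A pair of neighboring sites that act together."""
    first: Site
    second: Site


@dataclass
class RegulatoryUnit:
    """A promoter, enhancer, or silencer containing sites and composite elements."""
    kind: str  # e.g. "promoter", "enhancer", "silencer"
    sites: List[Site] = field(default_factory=list)
    composites: List[CompositeElement] = field(default_factory=list)


@dataclass
class RegulatoryRegion:
    """A long regulatory region of a gene (5' region, 3' region, or intron)."""
    gene: str
    location: str  # e.g. "5'", "3'", "intron"
    units: List[RegulatoryUnit] = field(default_factory=list)

    def factors(self) -> List[str]:
        """Distinct factor names bound anywhere in the region, in first-seen order."""
        seen: List[str] = []
        for unit in self.units:
            candidates = list(unit.sites)
            for comp in unit.composites:
                candidates += [comp.first, comp.second]
            for site in candidates:
                if site.factor not in seen:
                    seen.append(site.factor)
        return seen

    def units_of_kind(self, kind: str) -> List[RegulatoryUnit]:
        """All units of one type, e.g. every enhancer in the region."""
        return [u for u in self.units if u.kind == kind]
```

Nesting the records this way lets a query such as "which factors act on this gene's 5' region" walk down the hierarchy, while each level can still carry its own annotations.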

The conformational and physico-chemical properties of DNA underlie the function of regulatory genomic sequences (RGS) and define the specificity and efficiency of their interactions with regulatory proteins. These properties are context-dependent, and they determine the mutual orientation of base pairs in the DNA double helix, the physico-chemical properties of RGS, and the dynamic characteristics of RGS that are significant for DNA-protein interactions.
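Because these properties depend on nucleotide context, a common way to estimate them from sequence alone is to assign a value to each dinucleotide step and average that value over a sliding window. The sketch below assumes this dinucleotide-step model. The function name and window scheme are illustrative choices. The step table must be supplied by the caller; real values would come from a source such as the B-DNA-Video database, which this code does not access.

```python
from typing import Dict, List


def step_profile(sequence: str, step_values: Dict[str, float], window: int) -> List[float]:
    """Mean of a dinucleotide-step property over each run of `window` consecutive steps.

    `step_values` maps dinucleotides (e.g. "AG") to a property value, such as a
    twist angle in degrees. Steps missing from the table raise KeyError.
    """
    seq = sequence.upper()
    steps = [seq[i:i + 2] for i in range(len(seq) - 1)]
    values = [step_values[s] for s in steps]
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the number of steps")
    # Running sum: each window after the first costs O(1).
    total = sum(values[:window])
    profile = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        profile.append(total / window)
    return profile
```

With a toy table that assigns 1.0 to every step beginning with A and 0.0 to all others (not real measurements), the sequence AACC has steps AA, AC, and CC, so a window of two steps gives the profile [1.0, 0.5].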

We have developed the ACTIVITY system for revealing the conformational and physico-chemical properties of RGS. Analysis of different types of RGS demonstrates that each type is characterized by a specific set of conformational and physico-chemical properties and their activity values.
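The linear correlations in the figure captions, such as the one between DNA-USF affinity and twist, suggest the simplest form of activity prediction: fit a straight line relating one conformational or physico-chemical property to measured activity, then use the line to predict the activity of new sites. The sketch below shows ordinary least squares together with Pearson's correlation coefficient. This is a generic technique, not necessarily the model used in ACTIVITY, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of y = a + b*x; returns (a, b, Pearson r)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


def predict(intercept: float, slope: float, x: float) -> float:
    """Activity predicted by the fitted line for a new property value x."""
    return intercept + slope * x
```

Here x would hold a property value per site (for example, mean twist) and y the measured activity (for example, protein affinity). On the toy points x = [1, 2, 3, 4] and y = [3, 5, 7, 9], the fit recovers intercept 1, slope 2, and r = 1. Real data would call for combining several properties and validating the fit on sites not used to build it.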

The GeneExpress system, which we are developing, is designed to integrate our own resources on gene expression regulation with other resources available on the Internet.

We are developing the GeneNet database to accumulate and store information on gene networks and signal transduction pathways of eukaryotic organisms. Three hierarchical levels of a gene network are considered: the organism, the single cell, and the single gene. The following gene networks are described in the GeneNet database:

  • Regulation of antiviral response.

  • Regulation of erythrocyte maturation and differentiation.

  • Heat shock response.

  • Auto-regulation of the HSP-70 gene.

  • Regulation of steroid biosynthesis.

  • Regulation of lipid metabolism.

  • Regulation of plant seed germination.

  • Storage of nutritional substances in plant seeds.

  • Regulation of nitrogen fixation in legumes.
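GeneNet is described as storing gene networks at three hierarchical levels: organism, cell, and gene. The sketch below shows one way to represent such a network. Nodes are tagged with their level, regulatory interactions are directed and signed, and a breadth-first search lists everything downstream of a given node. The class name, the level tags, and the +1/-1 sign convention are hypothetical and do not reflect GeneNet's actual data format.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

LEVELS = {"organism", "cell", "gene"}


class GeneNetwork:
    def __init__(self) -> None:
        self.level: Dict[str, str] = {}
        # source node -> list of (target node, sign); +1 = activation, -1 = repression
        self.edges: Dict[str, List[Tuple[str, int]]] = defaultdict(list)

    def add_node(self, name: str, level: str) -> None:
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level}")
        self.level[name] = level

    def add_interaction(self, source: str, target: str, sign: int) -> None:
        if source not in self.level or target not in self.level:
            raise KeyError("both nodes must be added first")
        if sign not in (1, -1):
            raise ValueError("sign must be +1 or -1")
        self.edges[source].append((target, sign))

    def downstream(self, start: str) -> Set[str]:
        """Nodes reachable from `start` by breadth-first search.

        `start` itself is included only if a feedback loop leads back to it.
        """
        seen: Set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for target, _sign in self.edges.get(node, []):
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
        return seen
```

Signs are stored but not used by the search, so they remain available for distinguishing activating from repressive paths. A feedback loop, such as the autoregulation of the HSP-70 gene listed above, shows up as a cycle, which is why `downstream` can return its own starting node.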