Bioinformatics - Part 2

Data and databanks

We are interested in information about our DNA, our proteins and the function of those proteins. Genes and proteins can be sequenced: this means that the order of the bases in a gene, or of the amino acids in a protein, can be determined.

This information must be stored in an intelligent fashion, so that scientists can solve problems quickly and easily using all available information. Therefore, the information is stored in databanks, many of which are accessible to everyone on the internet.

A few examples are a databank containing protein structures (the PDB or Protein Data Bank), a databank containing protein sequences and their function (Swiss-Prot), a databank with information about enzymes and their function (ENZYME), and a databank with the nucleotide sequences of all genes sequenced to date (EMBL).

Due to the current state of technology, there are large differences between the sizes of databanks. EMBL, the nucleotide databank, contains many more sequences than the PDB contains protein structures. The reason is that it is much simpler to sequence a gene than to find out which protein the gene encodes and what that protein's function is. Determining the structure of the protein is more difficult still.

On November 22nd 2007 the databanks contained:

Besides Swiss-Prot, bioinformaticians use another protein database: UniProt. UniProt is much more complete than Swiss-Prot because it contains all data from Swiss-Prot plus the computer-generated translations of all genes in EMBL. The downside is that much of this computer-generated information has not yet been annotated or experimentally confirmed.
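To illustrate what a computer-generated translation of a gene involves, here is a minimal sketch in Python. It assumes the standard genetic code and a coding sequence that starts at its first base; the names CODON_TABLE and translate are made up for this example and do not come from any databank's software.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence into one-letter amino acid codes.

    Reading starts at the first base and stops at the first stop codon.
    Codons containing letters other than A, C, G or T raise a KeyError.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTGA"))  # ATG -> M, GCC -> A, TGA -> stop
```

A translation like this only gives the amino acid sequence. It says nothing about what the protein does, which is why such entries still need annotation and experimental confirmation.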

Using databanks, one can perform all kinds of comparisons and search queries. If, for example, you know a protein which causes a disease in humans, you might search a databank to see if a similar protein has previously been described and what this protein does in the human body. Using known information will make it easier and quicker to develop a drug against the disease or a test to detect the disorder at an early stage.
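The kind of search described above can be sketched with a toy example in Python. Everything here is an assumption made for illustration: the three sequences and their names are invented, and the similarity score comes from Python's general-purpose difflib module, not from the dedicated alignment methods that real databank searches rely on.

```python
from difflib import SequenceMatcher

# A made-up miniature "databank": names mapped to amino acid sequences.
TOY_DATABANK = {
    "protein_A": "MKTAYIAKQRQISFVKSHFSRQ",
    "protein_B": "MKTAYIAKQRQISFVKAHFSRQ",
    "protein_C": "GSHMLEDPVDAFKLLNE",
}


def similarity(seq_a, seq_b):
    """Return a rough similarity score between 0.0 and 1.0."""
    return SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


def search(query, databank):
    """Rank all databank entries by similarity to the query, best first."""
    hits = [(name, similarity(query, seq)) for name, seq in databank.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


query = "MKTAYIAKQRQISFVKSHFSRQ"
for name, score in search(query, TOY_DATABANK):
    print(name, round(score, 2))
```

The entry that scores highest is the best candidate for a previously described, similar protein, and its annotation is the known information one would look at first.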

There are many more applications to bioinformatics!

Exercise 5:

Give an example of how bioinformatics is applied in the following fields:

Answers

Progress

Of course, the technology involved in bioinformatics did not reach its current level overnight. Biotechnology has existed for thousands of years, ever since the Sumerians started brewing beer. The progress made by mankind lies on different levels:

The table below shows a number of events which have had an influence on bioinformatics as we know it today.

Exercise 6:

For each event, mention if it is a form of experimental progress, a discovery or an improvement in computer technology.

Year | Milestone | Type
1590 | Invention of the microscope |
1663 | First description of cells |
1830 | First description of proteins |
1833 | First purification of enzymes |
1906 | The word genetics makes its first appearance | --
1919 | The concept of biotechnology is introduced | --
1944 | Avery proves that DNA is the molecule containing genetic information |
1946 | ENIAC, the first programmable computer, able to perform 5000 calculations in one second, is built |
1951 | Pauling and Corey describe α-helices and β-sheets |
1953 | Watson and Crick describe the 3D structure of DNA |
1958 | Development of the first microchip |
1970 | A method is developed to compare sequences (Needleman-Wunsch) |
1971 | Ray Tomlinson invents a new way to communicate: e-mail |
1972 | Scientists realise that the DNA composition of chimpanzees is 99% identical to that of humans |
1973 | The Protein Data Bank, containing all known information about protein structures, is created |
1974 | The idea arises to link several computers through an internet |
1976 | Building of the first supercomputer: it could perform 160 million operations per second and had a memory of 8 MB |
1981 | A machine is built to bind nucleotides together and thus build small pieces of genes |
1981 | IBM introduces its Personal Computer: a computer for everyone |
1983 | A new technique is developed to make exact copies of DNA in large amounts: PCR |
1983 | The Compact Disc is brought to the market |
1986 | The term genomics is used for the first time and describes the science of sequencing, mapping and analyzing genes | --
1986 | The Swiss-Prot databank is started, containing all known information about proteins |
1988 | The Human Genome Project is launched | --
1991 | The European research center CERN develops the World Wide Web |
1994 | The first breast cancer gene is discovered |
1995 | For the first time, the complete genome of a bacterium is sequenced |
1997 | The sheep Dolly is cloned in Scotland |
1998 | The first complete animal genome is sequenced: a worm |
2001 | The human genome is published, although a few gaps remain |
2005 | The monkey genome is published |

Answers
