Role of Data Mining Techniques in Bioinformatics (2024)

Introduction

Bioinformatics integrates biology, mathematics, statistics, medicine, information technology, and computer science. It is the discipline of storing, retrieving, and analyzing large amounts of biological information such as DNA, RNA, and protein data (Bayat, 2002). Recent technological advances allow biologists to produce huge volumes of data, held in DNA databases, protein sequence and protein structure databases, phenotype databases, and genomic sequence databases. Bioinformatics holds great potential for analysis in areas such as genomics, proteomics, drug discovery and development, protein structure, cell biology, molecular modelling, and gene expression (Khan, 2018), as represented in Figure 1. One can analyze gene expression data and extract valuable patterns, classify protein structures, predict and identify genes, and diagnose different types of disease (such as cancer) based on which genes are expressed. Data mining offers the capability to analyze bioinformatics data and is useful for pattern identification, classification, prediction, and genetic network induction (Mabu, 2018).

Figure 1. Bioinformatics areas

In today’s world, data underpins everything, provided it is analyzed and extracted properly. In bioinformatics, various types of data are available for mining, as shown in Figure 2.

Figure 2. Types of data in bioinformatics

DNA: DNA carries the genetic code that determines the inherited characteristics of a living thing. It is hereditary material, meaning that a child inherits DNA from its parents. The smaller units of DNA are called nucleotides. Each nucleotide consists of three parts: a nitrogenous base, a sugar (deoxyribose), and a phosphate group. There are four types of nitrogenous base: adenine (A), thymine (T), guanine (G), and cytosine (C). The order of these bases governs the genetic code (Dua & Chowriappa, 2012).

Proteins: Proteins are large, complex molecules that are essential to the body. They are built from twenty different amino acids. The sequence of these amino acids determines each protein’s unique 3D structure and its specific function.

Gene: A gene is a segment of DNA made up of a sequence of As, Cs, Ts, and Gs in a particular order. Human genes range in size from a few hundred bases to more than a million bases.

Genome: The complete set of genetic material (DNA) of an organism, including all of its genes.
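To make the description of DNA as an ordered string of four bases concrete, the following minimal Python sketch validates a sequence and counts how often each base occurs. The function name `base_composition` and the example sequence are illustrative assumptions and do not come from the works cited above.

```python
from collections import Counter

# The four DNA bases named in the text: adenine, thymine, guanine, cytosine.
DNA_BASES = frozenset("ATGC")


def base_composition(sequence: str) -> dict[str, int]:
    """Count each of the four bases in a DNA sequence (hypothetical helper)."""
    seq = sequence.upper()
    # Reject any character that is not one of the four bases.
    invalid = set(seq) - DNA_BASES
    if invalid:
        raise ValueError(f"not a DNA base: {sorted(invalid)}")
    counts = Counter(seq)
    # Always report all four bases, even those with a count of zero.
    return {base: counts[base] for base in "ATGC"}


if __name__ == "__main__":
    # Made-up example sequence, for illustration only.
    print(base_composition("ATGCGATACG"))
```

Real sequence files often contain additional symbols, such as N for an unknown base, so a practical version would probably need a wider alphabet than the four letters assumed here.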

Data mining techniques can be used to identify correlations and patterns and to discover knowledge in bioinformatics datasets. Data mining refers to digging, or “mining”, knowledge from vast amounts of data. These techniques uncover important patterns and hidden information in a dataset. Data mining has been applied successfully in diverse domains such as retail, e-business, marketing, health care, and research, and bioinformatics is no exception. Indeed, any domain with a rich supply of data is a strong candidate for data mining. Hence, there is great potential to strengthen the interaction between data mining techniques and bioinformatics (Hashemi et al., 2018).
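As a small illustration of what discovering a pattern in sequence data can mean, the sketch below counts short substrings of length k (often called k-mers) across a set of DNA sequences and keeps those that recur. This is one simple frequent-pattern idea chosen for illustration, not a method described in the cited works; the function name `frequent_kmers`, its parameters, and the toy sequences are assumptions.

```python
from collections import Counter
from typing import Iterable


def frequent_kmers(sequences: Iterable[str], k: int = 3, min_count: int = 2) -> dict[str, int]:
    """Return k-mers that occur at least min_count times across all sequences."""
    counts: Counter[str] = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k along the sequence, one base at a time.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    # Keep only the k-mers that meet the frequency threshold.
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Toy sequences, made up for illustration only.
    print(frequent_kmers(["ATGATG", "CATG"], k=3, min_count=2))
```

Overlapping occurrences are counted separately, and sequences shorter than k contribute nothing; both are design choices of this sketch.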

Bioinformatics faces various challenges, such as the classification of proteins and genes and the identification of associations between co-occurring diseases. Data mining techniques help to overcome these challenges and add new insights for finding knowledge and patterns in biological databases.

In this paper, the author highlights the role of data mining techniques in bioinformatics. The remainder of the paper is organized as follows. Section 2 introduces the challenges involved in the field of bioinformatics. Section 3 describes the different data mining tasks in bioinformatics. An application of data mining in disease prediction is presented in Section 4, and Section 5 concludes the paper.