BLAST for dummies - mn/ibv/bioinfwiki (2024)

Contents

  • 1 BLAST for dummies
    • 1.1 Sequence similarity searches: queries and hits
    • 1.2 The BLAST database
    • 1.3 Aligning query and hit sequences
    • 1.4 Understanding the BLAST output

Sequence similarity searches: queries and hits

The BLAST algorithm is more or less the standard way of performing sequence similarity searches. By ‘sequences’, we mean biological (nucleotide or amino acid) sequences. There are many different reasons why such searches may be performed. Typically, the user has one (or many) unknown sequences and wants to understand what these sequences are or what they do. In the terminology used by BLAST, these are the query sequences. A sequence search will (hopefully) identify sequences that are similar (or even identical) to the queries. The identified sequences are often called the hit sequences (or just hits). Typically, much more is known about the hits than about the query. For instance, we may know that a specific hit is an enzyme. If the match between the query and the hit is sufficiently good, we may conclude that the query sequence is also an enzyme (but not necessarily with exactly the same specificity!). Sometimes, we also perform BLAST searches with queries that are already known to the user. In keeping with the previous example, we may use the sequence of a well-known enzyme as a query sequence. In that case, the hit sequences do not help us identify the nature of the query sequence, but they may tell us something about the distribution of this particular protein in other organisms (provided this information is included in the hit descriptions).

The BLAST database

From the above it is clear that, in order to provide information about a given query, BLAST needs a collection of sequences that the query is compared to. Such a collection of sequences is known as the database. When executing, BLAST will compare the query sequence to every single sequence in the database. If a similarity is detected, BLAST will output this sequence as a hit. Both the query and the database sequences must be supplied in FASTA format, i.e. each sequence must begin with a header line starting with the “>” character, followed by the actual sequence on the following lines. The database will often consist of one FASTA file containing a great many separate sequences. Such BLAST databases can be created by the user, but often previously created databases are used.
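As an illustration of the FASTA layout described above, the following minimal sketch reads FASTA text into a dictionary. The function name `parse_fasta` and the two example records are made up for this page; this is not how BLAST itself reads its input.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each header (without the leading '>') to its full sequence."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # A sequence may span several lines; join them together.
            records[header] += line
    return records


# Two invented records, only to show the format.
example = """>seq1 hypothetical enzyme
MKTAYIAKQR
QISFVKSHFS
>seq2 another made-up entry
MLEDPVAG
"""

records = parse_fasta(example.splitlines())
for name, seq in records.items():
    print(name, len(seq))
```

Each header line starts a new record, and all lines up to the next header belong to that record's sequence.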

Sometimes, the user wants to find hit sequences that are 100% identical to the query. Such a search is obviously easy to accomplish. Finding matches that are similar (but not identical) is a much more difficult task. BLAST is (within certain limits) able to do this. But this also implies that not all hits for a given query are equal; some hits will be better than others. In fact, some hits may display so little similarity with the query that we should disregard them altogether.

Aligning query and hit sequences

How does BLAST identify similarity between sequences? BLAST tries to create an alignment between the query and a given database sequence. To start with, a short 100% identical match must be found between the query and database sequences. If such a match is found, the alignment is extended in both directions. Matching characters are awarded points; if the sum of these points keeps increasing, the extension continues. If the sum of points drops below a limit, the alignment extension is stopped, and the hit is reported.
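The seed-and-extend idea can be sketched in a few lines. This is a simplified, ungapped illustration and not the actual BLAST implementation: the word size, the match and mismatch points and the drop-off limit (`WORD_SIZE`, `MATCH`, `MISMATCH`, `X_DROP`) are values assumed for this example, and all function names are hypothetical.

```python
WORD_SIZE = 4   # length of the exact seed match (assumed value)
MATCH = 1       # points for a matching character (assumed value)
MISMATCH = -2   # points for a mismatching character (assumed value)
X_DROP = 3      # stop once the score falls this far below the best so far


def find_seeds(query, subject):
    """Yield (q_pos, s_pos) for every exact word shared by both sequences."""
    index = {}
    for i in range(len(subject) - WORD_SIZE + 1):
        index.setdefault(subject[i:i + WORD_SIZE], []).append(i)
    for q in range(len(query) - WORD_SIZE + 1):
        for s in index.get(query[q:q + WORD_SIZE], []):
            yield q, s


def extend(query, subject, q, s, step):
    """Extend in one direction; return (best_score, length_at_best_score)."""
    score = best = 0
    length = best_len = 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        score += MATCH if query[q] == subject[s] else MISMATCH
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > X_DROP:
            break  # the score dropped too far: stop extending
        q += step
        s += step
    return best, best_len


def best_ungapped_hit(query, subject):
    """Return (score, q_start, q_end, s_start) of the best extension, or None."""
    best_hit = None
    for q, s in find_seeds(query, subject):
        right, r_len = extend(query, subject, q + WORD_SIZE, s + WORD_SIZE, +1)
        left, l_len = extend(query, subject, q - 1, s - 1, -1)
        score = WORD_SIZE * MATCH + left + right
        hit = (score, q - l_len, q + WORD_SIZE + r_len, s - l_len)
        if best_hit is None or hit[0] > best_hit[0]:
            best_hit = hit
    return best_hit


print(best_ungapped_hit("AAACGTTGCAAA", "CCCCGTTGCCCC"))  # (6, 3, 9, 3)
```

In the example, the shared stretch CGTTGC (query positions 3 to 8) is found from a four-letter seed and extended until the flanking mismatches pull the score down.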

It is not necessary to understand the exact mechanism behind this algorithm. But it is clear that BLAST needs to be instructed about the precise manner of scoring an alignment (i.e. awarding points for matching characters). If using nucleotide sequences, this is accomplished in a very simple manner: only matching characters increase the point score, while mismatches reduce it. But when using amino acid sequences, this becomes a bit more complicated. Some amino acids are so dissimilar that they are not awarded points (or indeed get a negative score). But some amino acids are quite similar to each other, such as leucine and isoleucine. These get a positive score, although a lower one than identical amino acids. The precise scores for every possible amino acid pair are defined in so-called matrix files. The standard BLAST matrix is called the BLOSUM62 matrix. Along with specifying a query and a database, the user can specify which matrix to use when running a protein BLAST search; BLOSUM62 is used unless another matrix is chosen.
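To make the idea of a scoring matrix concrete, here is a small sketch that scores two aligned peptides of equal length. `TOY_MATRIX` contains only a handful of pairs with illustrative values in the style of BLOSUM62; it is not the complete matrix, and the function names are invented for this example.

```python
# A toy excerpt of a substitution matrix (illustrative values only).
# L = leucine, I = isoleucine, D = aspartate.
TOY_MATRIX = {
    ("L", "L"): 4, ("I", "I"): 4, ("D", "D"): 6,
    ("L", "I"): 2,    # similar amino acids: positive score
    ("L", "D"): -4,   # dissimilar amino acids: negative score
    ("I", "D"): -3,
}


def pair_score(a, b, matrix=TOY_MATRIX):
    """Look up the score of a residue pair; the matrix is symmetric."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]


def ungapped_score(seq_a, seq_b, matrix=TOY_MATRIX):
    """Sum the pair scores of two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b, matrix) for a, b in zip(seq_a, seq_b))


print(ungapped_score("LLD", "LID"))  # 4 + 2 + 6 = 12
print(ungapped_score("LLD", "LDD"))  # 4 - 4 + 6 = 6
```

Replacing leucine with the similar isoleucine costs only a few points, whereas replacing it with aspartate lowers the total considerably.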

It is important to understand that this way of creating alignments is not a perfect algorithm. It is used in BLAST because it is very fast, but it will miss or under-report certain types of similarities. (The interested reader may look up “dynamic programming” to find algorithms that are guaranteed to produce the best-scoring alignment under a given scoring scheme.) The great advantage of BLAST is not its exactness, but its speed.
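For comparison, a dynamic programming approach can be sketched as follows. This is a minimal version of Smith-Waterman local alignment that returns only the best score; the scoring values (`match`, `mismatch`, `gap`) are assumptions chosen for the example. It fills in a table with one cell per pair of positions, which is why it is far slower than BLAST on large databases.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the score table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: start afresh (0), align the two characters,
            # or insert a gap in either sequence.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGT", "ACGT"))   # 8: four matches
print(smith_waterman_score("ACGTT", "ACTT"))  # 6: four matches and one gap
```

Only two rows of the table are kept in memory because the sketch reports the score alone; recovering the alignment itself would require keeping the whole table.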

Understanding the BLAST output

It should be clear from the above that the output of BLAST consists of a list of hits for a given query sequence. The hits are ordered according to their similarity with the query. The most basic measurement of similarity is the “bitscore” (or just “score”), which reflects the points awarded to the BLAST-generated alignment. The score is recalculated to provide the “E-value”, which estimates how many hits with an equal or better score would be expected just by chance in a database of that size. The lower the E-value, the less likely it is that the hit is a chance match.
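The relationship between raw score, bit score and E-value can be sketched with the standard Karlin-Altschul formulas: bit score = (λ·S − ln K) / ln 2 and E = m·n·2^(−bit score). The parameters λ (`lam`) and K (`k`) depend on the scoring system, and the numbers used below are placeholders assumed for illustration; BLAST also corrects the sequence lengths before applying the formula, which is omitted here.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score into a bit score."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits, query_len, db_len):
    """Expected number of chance hits scoring at least `bits`."""
    return query_len * db_len * 2 ** (-bits)


# Placeholder values, assumed for illustration only.
lam, k = 0.267, 0.041
query_len, db_len = 300, 5_000_000

for raw in (40, 80, 160):
    bits = bit_score(raw, lam, k)
    print(raw, round(bits, 1), e_value(bits, query_len, db_len))
```

A higher raw score gives a higher bit score and an exponentially smaller E-value, while searching a larger database increases the E-value for the same alignment.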

It is possible to run BLAST specifying multiple query sequences. In that case, BLAST simply processes one query at a time and appends the results to the same output file, with each block of results starting with a definition of the query used. If many queries are used in one BLAST run, the output can quickly become overwhelming. In that case, it is useful to use a tool to visualize the BLAST output. One such tool has been developed at the University of Oslo:

BLASTGrabber: a bioinformatic tool for visualization, analysis and sequence selection of massive BLAST data

(doi:10.1186/1471-2105-15-128)

If you are interested in other options, you can read the following paper:

BLAST output visualization in the new sequencing era

(doi:10.1093/bib/bbt009)
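For a quick first look at a multi-query run without a dedicated tool, the output can also be summarized with a short script. The sketch below assumes the search was run with the tab-separated output of BLAST+ (option `-outfmt 6`) using its default twelve columns; the function name `best_hits`, the E-value cutoff and the example rows are invented for illustration.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-5):
    """Return {query: (hit, bitscore, evalue)} with the best hit per query."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip empty lines and comment lines
        hit = dict(zip(COLUMNS, row))
        evalue = float(hit["evalue"])
        bits = float(hit["bitscore"])
        if evalue > max_evalue:
            continue  # too likely to be a chance match
        current = best.get(hit["qseqid"])
        if current is None or bits > current[1]:
            best[hit["qseqid"]] = (hit["sseqid"], bits, evalue)
    return best


# Made-up rows for two queries, only to show the layout.
example = (
    "q1\thitA\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-100\t365\n"
    "q1\thitB\t45.0\t180\t90\t4\t10\t189\t1\t176\t2e-20\t95.1\n"
    "q2\thitC\t30.0\t60\t40\t2\t1\t60\t3\t61\t0.5\t28.0\n"
)

summary = best_hits(io.StringIO(example))
for query, (hit, bits, evalue) in summary.items():
    print(query, hit, bits, evalue)
```

Here the query q2 disappears from the summary because its only hit has an E-value above the cutoff, which mirrors the earlier point that some hits are too weak to be taken seriously.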
