How is sequence similarity different than percent identity from BLASTP? (2024)


23 months ago

Viraj • 0

Hi Everyone,

I am new to Biostars. I am having trouble finding a concrete answer to the post's question.

My understanding is that sequence similarity is the fraction of residues that are similar between two different protein sequences. Percent identity is the number of characters that match exactly between two different sequences.

I read that sequence similarity is strongly correlated to percent identity. I also read that it is a subset of percent identity. These two are contradicting.

Can someone help me distinguish between the two concepts? Thanks

homology BLASTP • 7.7k views


Hi! Have you read this webpage? I think it is nicely explained :)

23 months ago by iraun 6.2k

Thank you for the link.

Looking at the link and these sequences:

A: AAGGCTT

B: AAGGC

I understand this has 100% identity. How is this 60% similar?

Edit distance is the minimal number of edit operations (insertions, deletions, and substitutions) needed to transform one sequence into an exact copy of the other sequence being aligned.

Similarity = 1 - (edit distance / unaligned length of the shorter sequence)

Therefore, similar = 1 - (2/2) or 1. Not sure how the author got 60%. Either the author made a typo in the similar definition or the math is wrong.

Can someone explain? Thanks.
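
For what it is worth, plugging these two sequences into the formula quoted above does give 60%, provided the denominator is the unaligned length of the shorter sequence (5) rather than 2. Below is a minimal Python sketch, assuming a plain Levenshtein distance with unit costs. The function names `edit_distance` and `similarity` are illustrative and do not come from the linked page.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimal number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1 - edit distance / unaligned length of the shorter sequence."""
    return 1 - edit_distance(a, b) / min(len(a), len(b))


seq_a, seq_b = "AAGGCTT", "AAGGC"
print(edit_distance(seq_a, seq_b))         # 2 (delete the trailing "TT")
print(round(similarity(seq_a, seq_b), 2))  # 0.6
```

The 100% identity figure is a different calculation: the five aligned positions (AAGGC against AAGGC) all match, and the unaligned TT is not counted.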

23 months ago by Viraj • 0

23 months ago

Mensur Dlakic ★ 28k

Sequence identity has a literal meaning that should be easy to understand. When the two sequences are aligned, any pair of residues is either identical or it is not.

Sequence similarity is a broader term, and always includes identity. That means identical residues are always similar by definition, while the opposite is not necessarily true. Therefore, sequence similarity is equal to or greater than sequence identity. Similarity includes conservative substitutions that usually have positive scores in substitution matrices.

The alignment below has 430/432 identical residues (see under Identities) and 432/432 similar residues (see under Positives). In the middle row of the alignment, residues that are similar but not identical are marked with a + sign instead of a residue letter (around positions 285 and 340). If the residues were not similar, there would be an empty space instead of +.

Score           Expect  Method                        Identities    Positives      Gaps
877 bits(2265)  0.0     Compositional matrix adjust.  430/432(99%)  432/432(100%)  0/432(0%)

Query  1    MRECISIHVGQAGVQIGNACWELYCLEHGIQPDGQMPSDKTIGGGDDSFNTFFSETGAGK  60
            MRECISIHVGQAGVQIGNACWELYCLEHGIQPDGQMPSDKTIGGGDDSFNTFFSETGAGK
Sbjct  1    MRECISIHVGQAGVQIGNACWELYCLEHGIQPDGQMPSDKTIGGGDDSFNTFFSETGAGK  60

Query  61   HVPRAVFVDLEPTVIDEVRTGTYRQLFHPEQLITGKEDAANNYARGHYTIGKEIIDLVLD  120
            HVPRAVFVDLEPTVIDEVRTGTYRQLFHPEQLITGKEDAANNYARGHYTIGKEIIDLVLD
Sbjct  61   HVPRAVFVDLEPTVIDEVRTGTYRQLFHPEQLITGKEDAANNYARGHYTIGKEIIDLVLD  120

Query  121  RIRKLADQCTGLQGFLVFHsfgggtgsgftsLLMERLSVDYGKKSKLEFSIYPAPQVSTA  180
            RIRKLADQCTGLQGFLVFHSFGGGTGSGFTSLLMERLSVDYGKKSKLEFSIYPAPQVSTA
Sbjct  121  RIRKLADQCTGLQGFLVFHSFGGGTGSGFTSLLMERLSVDYGKKSKLEFSIYPAPQVSTA  180

Query  181  VVEPYNSILTTHTTLEHSDCAFMVDNEAIYDICRRNLDIERPTYTNLNRLISQIVSSITA  240
            VVEPYNSILTTHTTLEHSDCAFMVDNEAIYDICRRNLDIERPTYTNLNRLISQIVSSITA
Sbjct  181  VVEPYNSILTTHTTLEHSDCAFMVDNEAIYDICRRNLDIERPTYTNLNRLISQIVSSITA  240

Query  241  SLRFDGALNVDLTEFQTNLVPYPRIHFPLATYAPVISAEKAYHEQLSVAEITNACFEPAN  300
            SLRFDGALNVDLTEFQTNLVPYPRIHFPLATYAPVISAEKAYHEQL+VAEITNACFEPAN
Sbjct  241  SLRFDGALNVDLTEFQTNLVPYPRIHFPLATYAPVISAEKAYHEQLTVAEITNACFEPAN  300

Query  301  QMVKCDPRHGKYMACCLLYRGDVVPKDVNAAIATIKTKRSIQFVDWCPTGFKVGINYQPP  360
            QMVKCDPRHGKYMACCLLYRGDVVPKDVNAAIATIKTKR+IQFVDWCPTGFKVGINYQPP
Sbjct  301  QMVKCDPRHGKYMACCLLYRGDVVPKDVNAAIATIKTKRTIQFVDWCPTGFKVGINYQPP  360

Query  361  TVVPGGDLAKVQRAVCMLSNTTAIAEAWARLDHKFDLMYAKRAFVHWYVGEGMEEGEFSE  420
            TVVPGGDLAKVQRAVCMLSNTTAIAEAWARLDHKFDLMYAKRAFVHWYVGEGMEEGEFSE
Sbjct  361  TVVPGGDLAKVQRAVCMLSNTTAIAEAWARLDHKFDLMYAKRAFVHWYVGEGMEEGEFSE  420

Query  421  AREDMAALEKDY  432
            AREDMAALEKDY
Sbjct  421  AREDMAALEKDY  432
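
The counting behind Identities and Positives can be sketched in a few lines of Python. This is only an illustration on a 20-residue excerpt of the alignment (columns 281-300), not how BLAST itself is implemented. It assumes that the S/T substitution has a positive score in BLOSUM62 (+1) and lists only that one pair; the names `POSITIVE_PAIRS` and `identities_and_positives` are hypothetical.

```python
# Assumption: BLOSUM62 scores the S/T substitution as +1, so it counts as a
# "positive". Only the pair needed for this excerpt is listed; a real
# analysis would look scores up in a complete substitution matrix.
POSITIVE_PAIRS = {frozenset("ST")}


def identities_and_positives(query: str, sbjct: str) -> tuple[int, int, str]:
    """Count identical and similar columns in a gap-free aligned pair and
    rebuild the BLAST-style middle line."""
    if len(query) != len(sbjct):
        raise ValueError("aligned sequences must have equal length")
    identities = positives = 0
    midline = []
    for q, s in zip(query.upper(), sbjct.upper()):
        if q == s:
            identities += 1
            positives += 1  # identical residues are also similar
            midline.append(q)
        elif frozenset((q, s)) in POSITIVE_PAIRS:
            positives += 1  # conservative substitution
            midline.append("+")
        else:
            midline.append(" ")
    return identities, positives, "".join(midline)


# Columns 281-300 of the alignment.
query = "AYHEQLSVAEITNACFEPAN"
sbjct = "AYHEQLTVAEITNACFEPAN"
ident, pos, mid = identities_and_positives(query, sbjct)
print(f"Identities {ident}/{len(query)}, Positives {pos}/{len(query)}")
print(mid)
```

On this excerpt the sketch reports 19/20 identities and 20/20 positives, with a + in the middle line at the S/T column, mirroring the full report on a smaller scale.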

23 months ago by Mensur Dlakic ★ 28k

Thank you for the comprehensive answer! This means sequence similarity is the positive score and sequence identity is the identity score from BLAST

23 months ago by Viraj • 0

This means sequence similarity is the positive score and sequence identity is the identity score from BLAST

Not quite. Sequence identity is a fraction of identical residues, and similarity is a fraction of similar residues. BLAST provides a single score that includes everything rather than breaking it down by identity or similarity.
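
To make the distinction concrete, the "877 bits(2265)" field in the report pairs a raw alignment score (2265) with its normalized bit score (877). A rough sketch of that normalization follows, using the standard relation S' = (λS - ln K) / ln 2. The values of λ and K are not given in this thread; the ones below are assumed to be the usual parameters for gapped BLOSUM62 with gap costs 11/1, and the function name `bit_score` is illustrative.

```python
import math

# Assumed Karlin-Altschul parameters for gapped BLOSUM62 with gap costs
# 11 (open) / 1 (extend); these values are not stated in the thread.
LAMBDA = 0.267
K = 0.041


def bit_score(raw_score: float, lam: float = LAMBDA, k: float = K) -> float:
    """Normalize a raw alignment score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


print(round(bit_score(2265)))  # 877
```

Identities and Positives, by contrast, are plain column counts divided by the alignment length.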

23 months ago by Mensur Dlakic ★ 28k

In your example above, the identical residues are 430/432 and the similar residues are 432/432. Similar residues include the identical residues plus the similar residues marked with a plus sign. Is this not right?

23 months ago by Viraj • 0

It is right, but they are not scores in the same sense as bit-score. They are fractions.

23 months ago by Mensur Dlakic ★ 28k

I see. Thank you for your answers and clarification. I appreciate the help!

23 months ago by Viraj • 0
