Last updated on Sep 6, 2024
- All
- Data Structures
Powered by AI and the LinkedIn community
1
Hashing basics
2
Hashing algorithms
3
Advantages of hashing
4
Disadvantages of hashing
5
Hashing tips
6
Here’s what else to consider
Hashing is a technique that converts a string into a numerical value, called a hash, that represents some characteristic of the string. Hashing can be used for string matching, which is the problem of finding whether a given pattern occurs in a text. Read on to learn about the advantages and disadvantages of using hashing for string matching, plus some of the common algorithms that use hashing.
Top experts in this article
Selected by the community from 33 contributions. Learn more
Earn a Community Top Voice badge
Add to collaborative articles to get recognized for your expertise on your profile. Learn more
- Nishant Chahar Building @Tayyari | Ex-Microsoft | 400k+ Subs on YT | NSIT
121
-
63
- Manan Solanki Backend SDE II @ Flipkart | DAIICT | Top 1% @ Leetcode | 4x Top Voice | Java, Spring
12
1 Hashing basics
Hashing is based on the idea that if two strings are equal, they should have the same hash value. Therefore, to compare two strings, you can actually compare their hashes, instead of comparing each character. This saves time and space, especially if the strings are long or have a fixed length. However, hashing isn't perfect. There's a possibility that two different strings have the same hash value. Called a collision, it can cause false positives or errors in string matching.
Help others by sharing more (125 characters min.)
- Nishant Chahar Building @Tayyari | Ex-Microsoft | 400k+ Subs on YT | NSIT
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Hashing is one of the best algorithms when it comes to string matching. The pros for hashing is that it is time efficient and gives an average time complexity of constant time.There are cons as well, like you need to implement collision handling as it is possible that 2 strings might have same hash even when they are different. Also hash tables use extra space, so where we have to optimise on space we cannot use hashing.
LikeLike
Celebrate
Support
Love
Insightful
Funny
121
- Deepika Aggarwal SDE 3 at Paypal | 29K+ | Mentor @preplaced | Content Creater |YT : codingwithDeepika
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Hashing is a very good way for searching in almost O(1) time complexity. It is very efficient and memory efficient as well. The only thing that makes hashing O(n) is collisions. If we can find optimal collision resolution mechanism than hashing is one of the best also.
LikeLike
Celebrate
Support
Love
Insightful
Funny
9
- Megha Kumari SDE III at Google
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Pros:1. Efficiency in Comparison:Hashing allows for constant-time average case comparisons. Once the hash values are computed, comparing the hash values of substrings is faster than comparing the substrings character by character.2. Constant-Time Average Case:On average, hashing provides constant-time complexity for comparisons. Cons:1. Collision Handling:Hash collisions can occur, where different strings hash to the same value. Collision resolution strategies are needed to address this issue, which can add complexity to the implementation.2. Memory Usage:Hashing may require additional memory to store hash values, which can be a concern in memory-constrained environments.
LikeLike
Celebrate
Support
Love
Insightful
Funny
7
- Md Kamran MTS Intern @GeeksforGeeks | Specialist(1536) at Codeforces | Guardian at Leetcode (2158) Top :- ~1% (1600+ Solved) | 4⭐ CodeChef (1921) | AtCoder(890) 6Kyu | 4000+ Algorithmic Problems | Aspiring SDE |
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Hashing transforms input data into a fixed-size value called a hash code. It's a key concept in computer science, especially for fast data retrieval. In string matching, it ensures quick comparisons by converting strings into numerical values.
LikeLike
Celebrate
Support
Love
Insightful
Funny
5
- Mayank Jain Software Engineering Specialist @ GE HealthCare
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Hashing is a process that converts input data (like a string) into a fixed-size string of characters, typically a hash code. In string matching, hashing is commonly used to efficiently compare strings. Here are the pros and cons:Pros:Efficiency: Enables quick comparison of strings for faster matching.Constant Time: Once hashed, comparison takes constant time, regardless of string length.Cons:Collisions: Possible occurrences of different strings having the same hash.Hash Sensitivity: Small changes in input can result in significantly different hash values, affecting matching accuracy.
LikeLike
Celebrate
Support
Love
Insightful
Funny
4
Load more contributions
2 Hashing algorithms
String matching can be achieved with a variety of hashing algorithms, all of which differ in how they handle collisions, compute the hash value, and search the text. For instance, the Rabin-Karp algorithm uses a rolling hash function to compute the hash value of a substring, enabling it to move quickly from one substring to another without recomputing. An alternative approach is Knuth-Morris-Pratt (KMP), which doesn't use hashing, instead creating a table that stores information about the pattern. This allows it to skip characters that don't match the pattern and avoid backtracking. While KMP can be faster than hashing algorithms, it uses more space.
Help others by sharing more (125 characters min.)
- Minh Chien Vu Ph.D | Senior data scientist | LinkedIn Top Voice
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
The hashing for string matching has of course the advantages and the disadvantages. In the Rabin-Karp algorithm, the rolling hash function is used to rapidly compute hash values for substrings thus, searching fast even for the case of multiple pattern matches in a text. On the contrary, hashing algorithms may be faced with collisions, whereby different strings are foiled to the same hash value, thus, a necessity for the auxiliary checks to authenticate the matches. Contrarily, the KMP algorithm does not use hashing and makes a table that stores the pattern information that thus, avoids unnecessary comparisons. On the contrary, KMP is more effective and avoids collision issues, but it usually needs more memory to store the table.
LikeLike
Celebrate
Support
Love
Insightful
Funny
9
-
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Some common hash function used by big tech companies includes MD5(Message Digest Algorithm 5) which produces a 128-bit hash value. SHA-1(Secure Hash Algorithm 1) but this is not widely used nowadays. One of the most famous algorithms is SHA-256 which is also used by WhatsApp to encrypt our messages and passwords. Rabin-Karp Algorithm is one of the most efficient for comparing hash values of substrings during string matching. The rolling hash function within this algorithm allows for quick updates as characters are added or removed.
LikeLike
Celebrate
Support
Love
Insightful
Funny
8
- Himanshu Agrawal (Serving Notice Period) SDE-2 @ Samsung | ACM-ICPC Regionals'21 | Founder @CodingBuddy | SmartThings Cloud | Java, Spring Boot, AWS, Kafka, Golang, Backed Engineer 🚀
(edited)
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
1)𝗥𝗮𝗯𝗶𝗻-𝗞𝗮𝗿𝗽 𝗔𝗹𝗴𝗼𝗿𝗶𝘁𝗵𝗺:Utilizes hashing to efficiently search for a pattern within a text by comparing hash values.It uses a rolling hash function (like polynomial hashing) to compute hash values for substrings.Compares hash values of the pattern and sliding window substrings to find matches.2)𝗞𝗻𝘂𝘁𝗵-𝗠𝗼𝗿𝗿𝗶𝘀-𝗣𝗿𝗮𝘁𝘁 (𝗞𝗠𝗣) 𝗔𝗹𝗴𝗼𝗿𝗶𝘁𝗵𝗺:Focuses on efficient pattern matching by utilizing a prefix function to avoid unnecessary character comparisons.Preprocesses the pattern to construct a prefix table (partial match table) that determines shifts during matching.Avoids redundant comparisons by using information from previous match failures.
LikeLike
Celebrate
Support
Love
Insightful
Funny
5
- Siddharth Ex SWE Intern @Netlux Systems | | Top Cloud Computing , Web , Frontend Voice | | Full Stack Developer | | WEB3 | | NEXT | | Django | | CLOUD SECURITY | | Moderator @ArcOfCode | | Contributor In Web3 Community
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
In my experience, delving into the world of hashing algorithms has been both intriguing and essential. Navigating the intricate landscape of cryptographic techniques, I've come to appreciate the significance of hash functions in data integrity and security. From MD5 to SHA-256, each algorithm carries its unique attributes, influencing applications ranging from password storage to digital signatures. Join me as I explore the pivotal role of hashing in safeguarding sensitive information and fostering digital trust.
LikeLike
Celebrate
Support
Love
Insightful
Funny
4
- Md Kamran MTS Intern @GeeksforGeeks | Specialist(1536) at Codeforces | Guardian at Leetcode (2158) Top :- ~1% (1600+ Solved) | 4⭐ CodeChef (1921) | AtCoder(890) 6Kyu | 4000+ Algorithmic Problems | Aspiring SDE |
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Algorithms like SHA-256, MD5, and Rabin-Karp are popular for hashing. They vary in complexity and use cases, with some optimized for security and others for speed. Choosing the right one depends on your specific needs, like collision resistance or performance.
LikeLike
Celebrate
Support
Love
Insightful
Funny
3
3 Advantages of hashing
Due to its simplicity, scalability, and versatility, hashing can be a beneficial approach for string matching. It only requires a hash function and a comparison operation, making it easy to implement. It can reduce the time and space complexity of string matching for long or fixed-length strings; This makes it efficient and scalable. Plus, hashing can handle multiple patterns, approximate matches, or other variations of string matching with ease.
Help others by sharing more (125 characters min.)
-
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Here are some of the advantages of using the Hash codesHashes are commonly used in databases to store and retrieve records quickly.Hashes are used in caches to quickly look up frequently accessed data.Hashes are used in symbol tables to store key-value pairs representing identifiers and their corresponding attributes.Hashes are used in cryptography to create digital signatures, verify the integrity of data, and store passwords securely.
LikeLike
Celebrate
Support
Love
Insightful
Funny
- Kamlesh Joshi Jr Software Engineer @ Egnyte || Ex- GeeksforGeeks || (Knight - Under 2.13%) @Leetcode || 5⭐ @Codechef || Specialist @Codeforces ||
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Mostly Hashing is used in cryptography to create digital signatures, and to store the passwords securely. We encrypt and decrypt while using the passwords
LikeLike
Celebrate
Support
Love
Insightful
Funny
4
- Himanshu Agrawal (Serving Notice Period) SDE-2 @ Samsung | ACM-ICPC Regionals'21 | Founder @CodingBuddy | SmartThings Cloud | Java, Spring Boot, AWS, Kafka, Golang, Backed Engineer 🚀
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
𝗙𝗮𝘀𝘁 𝗖𝗼𝗺𝗽𝗮𝗿𝗶𝘀𝗼𝗻: Hashing allows for quick comparisons between hash values, which can speed up the matching process significantly. Instead of comparing entire strings character by character, you compare their hash values.𝗘𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝗰𝘆: Once the hash values are computed, searching for matches can be faster than traditional string comparison methods, especially for large datasets.𝗥𝗲𝗱𝘂𝗰𝗲𝗱 𝗖𝗼𝗺𝗽𝗹𝗲𝘅𝗶𝘁𝘆: Hashing reduces the complexity of string comparison operations from linear to constant time (assuming the hash function has a good distribution and collisions are minimal).
LikeLike
Celebrate
Support
Love
Insightful
Funny
3
- Raktim Sarkar Enterprise Architect at TCS | DSA | Java | JavaScript | TypeScript | Python | Tech leadership | Automation-QE
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Advantage —-Password Storage : Instead of storing plain-text passwords , System stores the hash value of passwords . Its challenging for the attackers to reverse it and obtain the original password . Disadvantage —— of course Collison risk , like if there are two inputs havibg same hash code , collison happens . This concept is eectensively used in HashMap though in case of map , we can still optimise the collision by forming a red-black binary tree .
LikeLike
Celebrate
Support
Love
Insightful
Funny
2
- Md Kamran MTS Intern @GeeksforGeeks | Specialist(1536) at Codeforces | Guardian at Leetcode (2158) Top :- ~1% (1600+ Solved) | 4⭐ CodeChef (1921) | AtCoder(890) 6Kyu | 4000+ Algorithmic Problems | Aspiring SDE |
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Hashing offers rapid data retrieval, perfect for large datasets. It provides efficient storage and search capabilities, reducing time complexity from O(n) to O(1). Plus, it's versatile, supporting various applications from password storage to data integrity checks.
LikeLike
Celebrate
Support
Love
Insightful
Funny
2
Load more contributions
4 Disadvantages of hashing
Hashing can be a less than ideal method for string matching due to its potential for inaccuracy and unreliability caused by false positives or collisions. It can be vulnerable to the choice of hash function, size of hash value, or distribution of text and pattern. Hashing can also be costly and wasteful, as it may require additional operations to resolve collisions or verify matches.
Help others by sharing more (125 characters min.)
- Himanshu Agrawal (Serving Notice Period) SDE-2 @ Samsung | ACM-ICPC Regionals'21 | Founder @CodingBuddy | SmartThings Cloud | Java, Spring Boot, AWS, Kafka, Golang, Backed Engineer 🚀
(edited)
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
𝗥𝗶𝘀𝗸 𝗼𝗳 𝗖𝗼𝗹𝗹𝗶𝘀𝗶𝗼𝗻𝘀: Different strings yielding the same hash value can cause false positives, leading to incorrect matches.𝗛𝗮𝘀𝗵 𝗙𝘂𝗻𝗰𝘁𝗶𝗼𝗻 𝗤𝘂𝗮𝗹𝗶𝘁𝘆: Accuracy relies on a good hash function; poor ones increase collision chances, impacting matching precision.𝗘𝘅𝗮𝗰𝘁 𝗠𝗮𝘁𝗰𝗵𝗲𝘀 𝗢𝗻𝗹𝘆: Hashing allows only exact matches, unable to handle approximate or fuzzy matches.𝗛𝗮𝘀𝗵 𝗙𝘂𝗻𝗰𝘁𝗶𝗼𝗻 𝗢𝘃𝗲𝗿𝗵𝗲𝗮𝗱: Computationally expensive for complex hash functions or lengthy strings, potentially negating efficiency gains.𝗦𝗽𝗮𝗰𝗲 𝗖𝗼𝗺𝗽𝗹𝗲𝘅𝗶𝘁𝘆: Storing hash values alongside strings might consume more memory, especially if the hashes are larger.
LikeLike
Celebrate
Support
Love
Insightful
Funny
8
- Md Kamran MTS Intern @GeeksforGeeks | Specialist(1536) at Codeforces | Guardian at Leetcode (2158) Top :- ~1% (1600+ Solved) | 4⭐ CodeChef (1921) | AtCoder(890) 6Kyu | 4000+ Algorithmic Problems | Aspiring SDE |
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Hashing can suffer from collisions, where different inputs produce the same hash. It also requires good hash function design to avoid performance bottlenecks. Additionally, it’s a one-way process, making it unsuitable for scenarios needing data reversal.
LikeLike
Celebrate
Support
Love
Insightful
Funny
2
5 Hashing tips
When using hashing for string matching, select a suitable hash function that minimizes collisions and maximizes randomness. Additionally, using a large prime number as the modulus or the base of the hash function can reduce the probability of collisions, while also increasing hash values' diversity. To further improve accuracy and efficiency, combine hashing with other techniques, like preprocessing, randomization, or verification.
Help others by sharing more (125 characters min.)
- Manan Solanki Backend SDE II @ Flipkart | DAIICT | Top 1% @ Leetcode | 4x Top Voice | Java, Spring
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
When using hashing for string matching, it is important to use a good Hash function that minimizes collisions and fast to compute. One good option is Polynomial Rolling Hash is effective, with a formula:H(s) = (s[0] * p^(n-1) + s[1] * p^(n-2) + ... + s[n-1] * p^0) % mwhere:s[i] is the ASCII value of the i-th character in the string,p is a random prime number (e.g., 31 or 53),m is a large prime number (e.g., 2^61 - 1).It calculates the hash in O(n) time and can recalculate in O(1) time after rolling while using constant extra space.
LikeLike
Celebrate
Support
Love
Insightful
Funny
12
- Md Kamran MTS Intern @GeeksforGeeks | Specialist(1536) at Codeforces | Guardian at Leetcode (2158) Top :- ~1% (1600+ Solved) | 4⭐ CodeChef (1921) | AtCoder(890) 6Kyu | 4000+ Algorithmic Problems | Aspiring SDE |
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Choose a hash function that minimizes collisions and distributes hash values uniformly. Regularly update your hash functions to guard against vulnerabilities. Lastly, always test your hashing algorithm under different scenarios to ensure robustness and efficiency.
LikeLike
Celebrate
Support
Love
Insightful
Funny
6
6 Here’s what else to consider
This is a space to share examples, stories, or insights that don’t fit into any of the previous sections. What else would you like to add?
Help others by sharing more (125 characters min.)
-
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Hashing allows for quick insertions & deletions due to its constant-time complexity in average cases. In a real-time logging system, we used hashing to manage log entries dynamically. The ability to rapidly insert new logs and remove outdated ones ensured that the system remained responsive and efficient, even under heavy load. This performance benefit was critical for maintaining real-time monitoring & analysis.Handling dynamic data that frequently changes can introduce complexity when using hashing, particularly in maintaining consistent performance. In a dynamic user profile management system, we faced challenges with maintaining performance as user data continuously evolved.
LikeLike
Celebrate
Support
Love
Insightful
Funny
4
- Amruta Jahagirdar "Experienced .NET Architect | Masters in CS Georgia Tech USA| Author, Mentor, AI Enthusiast | Transforming Ideas into Innovative Solutions for 14 Years"
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Once the hash values are computed for the strings, comparing the hash values has constant time complexity O(1), which makes it efficient for searching and matching.Tis is great thing about hashing. But, you need to use it carefully.Hashing typically requires additional memory to store hash values, which can increase memory usage, especially for large datasets. This additional memory overhead should be considered, particularly in memory-constrained environments.
LikeLike
Celebrate
Support
Love
Insightful
Funny
3
- Soumik Mukherjee Software Engineer @ JFrog | Former SDE Intern @ JFrog, Intel, Hyland | Ex-MTS Intern @ GeeksForGeeks | Mentor @ SWOC 2.0, GWoC'21, LGM-SoC'21 | KiiT, Bhubaneswar
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
One time, while working on a project that required comparing huge datasets of user inputs, we relied heavily on hashing for string matching. At first, it worked great—super fast and efficient. But then we hit a snag: hash collisions. Two completely different strings were generating the same hash, causing incorrect matches. It was a reminder that, as powerful as hashing is, it's not perfect.We had to implement additional checks to handle those rare cases. So, while hashing can be a game-changer, you need to be aware of its limitations and always have a backup plan for dealing with collisions.
LikeLike
Celebrate
Support
Love
Insightful
Funny
3
Data Structures
Data Structures
+ Follow
Rate this article
We created this article with the help of AI. What do you think of it?
It’s great It’s not so great
Thanks for your feedback
Your feedback is private. Like or react to bring the conversation to your network.
Tell us more
Tell us why you didn’t like this article.
If you think something in this article goes against our Professional Community Policies, please let us know.
We appreciate you letting us know. Though we’re unable to respond directly, your feedback helps us improve this experience for everyone.
If you think this goes against our Professional Community Policies, please let us know.
More articles on Data Structures
No more previous content
- How do you handle noisy, incomplete, or inconsistent data when filtering big data? 13 contributions
- What are some common challenges or pitfalls of using skip trie trees for string matching? 13 contributions
- What are some real-world applications of divide and conquer sorting methods? 33 contributions
- How do you handle different types of data, such as text, images, or audio, with string algorithms? 12 contributions
- What are some common pitfalls or misconceptions about topological sort? 24 contributions
- What are some best practices and tips for writing clean and readable binary search tree traversal code? 26 contributions
- What are some best practices for designing and testing hash table data structures? 16 contributions
- How do you handle edge cases and corner cases when designing and testing your data structures? 18 contributions
- What are the trade-offs between speed and compression ratio in string algorithms? 20 contributions
- How do you optimize the performance and scalability of divide and conquer sorting algorithms? 34 contributions
- How do you use dynamic programming for string matching in a matrix or a graph? 14 contributions
- How do you apply quick sort to other data structures, such as linked lists or trees? 52 contributions
No more next content
More relevant reading
- Algorithms How do you use Prim's algorithm for finding minimum spanning trees?
- Algorithms How can binary search algorithm improve time complexity?
- Data Visualization How can you clean negative values in a logarithmic scale bar chart?
- Data Extraction How do you debug and test your scraping scripts before running them on large datasets?