Multiple ways to find duplicates in a C# array, with pros and cons (2024)

In C#, there are several ways to find duplicate elements in an array, and each has its pros and cons. This article covers three approaches:

  1. Using a HashSet
  2. Using a Dictionary
  3. Using LINQ

Let’s take the following array as an example. From it, we will extract the distinct numbers and the duplicate numbers.

int[] numbers = { 4, 7, 2, 3, 4, 5, 3, 6, 7, 8, 1, 8 };

We will use this array in all three approaches. The Dictionary and LINQ examples append one more 8, so that one value occurs three times. Let’s go through the approaches one by one.

A HashSet is a collection that holds only unique items, which makes it easy to find the duplicates in an array.

This approach iterates over the input array and adds each element to a HashSet. If an element is already present in the HashSet, it is a duplicate. The approach has an average time complexity of O(n) and is particularly useful when the input array is large and the order of the elements does not matter. It does have a few drawbacks, which are discussed after the example.

using System;
using System.Collections.Generic;
using System.Linq; // for ToArray() on HashSet<int>

public class Program
{
    static void Main()
    {
        int[] numbers = { 4, 7, 2, 3, 4, 5, 3, 6, 7, 8, 1, 8 };
        HashSet<int> distinctNumbers = new();
        HashSet<int> duplicateNumbers = new();

        foreach (int number in numbers)
        {
            // Add returns false when the number is already in the set.
            if (!distinctNumbers.Add(number))
                duplicateNumbers.Add(number);
        }

        Console.WriteLine("original array");
        PrintArray(numbers);

        Console.WriteLine("distinct Numbers");
        PrintArray(distinctNumbers.ToArray());

        Console.WriteLine("duplicate Numbers");
        PrintArray(duplicateNumbers.ToArray());

        Console.ReadLine(); // keep the console window open
    }

    // Prints the elements on one line, separated by spaces.
    static void PrintArray(int[] numArray)
    {
        foreach (int num in numArray)
        {
            Console.Write($"{num} ");
        }
        Console.WriteLine();
    }
}

Let’s break it down. We are extracting the distinct and duplicate numbers from this array:

int[] numbers = { 4, 7, 2, 3, 4, 5, 3, 6, 7, 8, 1, 8 };

We create two HashSet instances, distinctNumbers and duplicateNumbers.

HashSet<int> distinctNumbers = new();
HashSet<int> duplicateNumbers = new();

We iterate over the numbers array and try to add each number to distinctNumbers. Add returns true when the number was not yet in the set, which means this is its first occurrence. When Add returns false, the number has been seen before, so we add it to duplicateNumbers.

foreach (int number in numbers)
{
    if (!distinctNumbers.Add(number))
        duplicateNumbers.Add(number);
}

Output:

original array
4 7 2 3 4 5 3 6 7 8 1 8
distinct Numbers
4 7 2 3 5 6 8 1
duplicate Numbers
4 3 7 8
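
The duplicates above happen to appear in the order in which they were detected, but HashSet does not document any enumeration order. If the order matters, one option is to use the HashSet only for membership checks and collect the results in a List. The following is a minimal sketch under that assumption. The class name OrderedDuplicatesDemo and the method FindDuplicatesInOrder are hypothetical and not part of the original example.

```csharp
using System;
using System.Collections.Generic;

public static class OrderedDuplicatesDemo
{
    // Hypothetical helper: returns each duplicated value once,
    // in the order in which its second occurrence is seen.
    public static List<int> FindDuplicatesInOrder(int[] source)
    {
        var seen = new HashSet<int>();      // membership checks only
        var reported = new HashSet<int>();  // values already added to the result
        var duplicates = new List<int>();   // a List keeps insertion order

        foreach (int value in source)
        {
            // Add returns false when the value is already in the set.
            if (!seen.Add(value) && reported.Add(value))
                duplicates.Add(value);
        }
        return duplicates;
    }

    public static void Main()
    {
        int[] numbers = { 4, 7, 2, 3, 4, 5, 3, 6, 7, 8, 1, 8 };
        Console.WriteLine(string.Join(" ", FindDuplicatesInOrder(numbers)));
        // prints: 4 3 7 8
    }
}
```

The second set, reported, only exists so that a value occurring three or more times is added to the list once.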

⚠️ Downsides: Using a HashSet has a few downsides.

  1. Memory usage: The `HashSet` approach requires additional memory to store the distinct and duplicate elements separately. In the worst case, where all the elements are unique, you end up with a hash set that holds as many elements as the original array.
  2. Ordering: A `HashSet` does not guarantee any particular enumeration order, so if the order of the results matters in your application, you will need a different approach.
  3. Performance: The `Add` method runs in O(1) time on average, but it can degrade to O(n) in the worst case, when many elements collide in the same hash bucket. With such input, the whole pass can become much slower than O(n).
  4. Data type limitations:
  • A HashSet decides whether two elements are equal by calling their Equals and GetHashCode methods. Built-in types such as int, float, bool, and string already compare by value, so they work out of the box.
  • However, if you have an array of a custom class, like a Person class that has properties for name, age, and address, you may run into a problem. By default, a class inherits Equals and GetHashCode from object, and those compare references, not property values.
  • This means that two Person objects that have the same name, age, and address but are different objects in memory will not be considered equal by the HashSet.
  • To solve this problem, you need to tell the HashSet how to compare Person objects. You can implement the IEquatable<T> interface on your Person class and override Equals and GetHashCode so that they agree with each other, or you can pass an IEqualityComparer<T> to the HashSet constructor.
  • So, to summarize, if you’re using the HashSet approach to find duplicates and unique elements in an array, make sure that the element type defines equality the way you expect. For a custom class, that means implementing IEquatable<T> (or supplying an IEqualityComparer<T>) together with a matching GetHashCode.
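
To make the last point concrete, here is a minimal sketch of a Person class that a HashSet can compare by value. The class is hypothetical and reduced to two properties (Name and Age). It assumes a recent .NET version where HashCode.Combine and nullable reference type annotations are available.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Person type, used only for illustration.
public sealed class Person : IEquatable<Person>
{
    public string Name { get; }
    public int Age { get; }

    public Person(string name, int age)
    {
        Name = name;
        Age = age;
    }

    // Two people are equal when name and age match.
    public bool Equals(Person? other) =>
        other is not null && Name == other.Name && Age == other.Age;

    public override bool Equals(object? obj) => Equals(obj as Person);

    // Must agree with Equals: equal objects produce equal hash codes.
    public override int GetHashCode() => HashCode.Combine(Name, Age);
}

public static class PersonDuplicatesDemo
{
    public static void Main()
    {
        Person[] people =
        {
            new Person("Ann", 30),
            new Person("Bob", 25),
            new Person("Ann", 30), // same values, different object
        };

        var distinct = new HashSet<Person>();
        var duplicates = new HashSet<Person>();

        foreach (Person p in people)
        {
            if (!distinct.Add(p))
                duplicates.Add(p);
        }

        Console.WriteLine($"{distinct.Count} distinct, {duplicates.Count} duplicate");
        // prints: 2 distinct, 1 duplicate
    }
}
```

Without the Equals and GetHashCode overrides, the same program would report 3 distinct and 0 duplicate, because the two "Ann" objects are different references.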

The Dictionary approach is similar to using a HashSet, but instead of storing only the unique values, we store each element of the input array as a key in a Dictionary, with the number of times it appears in the array as the value. This approach has an average time complexity of O(n) and is useful when we need to keep track of how many times each value occurs.

int[] numbers = { 4, 7, 2, 3, 4, 5, 3, 6, 7, 8, 1, 8, 8 };
Dictionary<int, int> counts = new Dictionary<int, int>();
foreach (int num in numbers)
{
    // Key: the number. Value: how many times it has been seen so far.
    if (counts.ContainsKey(num))
        counts[num]++;
    else
        counts[num] = 1;
}

// Every key is a distinct number.
int[] distinctNumbers = counts.Select(kv => kv.Key).ToArray();

// Keys with a count above 1 are duplicates.
int[] duplicateNumbers = counts.Where(kv => kv.Value > 1)
                               .Select(kv => kv.Key)
                               .ToArray();

Console.WriteLine("original array");
PrintArray(numbers);

Console.WriteLine("distinct Numbers");
PrintArray(distinctNumbers);

Console.WriteLine("duplicate Numbers");
PrintArray(duplicateNumbers);

Console.ReadLine();

👉 Note: The PrintArray() method is defined in the previous example, so it is not repeated in this example or the next one.

Let’s break it down. We are extracting the distinct and duplicate numbers from this array, which has one extra 8 at the end so that 8 occurs three times:

int[] numbers = { 4, 7, 2, 3, 4, 5, 3, 6, 7, 8, 1, 8, 8 };

We create a dictionary named counts. Each number in the array becomes a key, and its number of occurrences is stored as the value.

Dictionary<int, int> counts = new Dictionary<int, int>();

We iterate over the numbers array and check whether each number already exists as a key. If it does not, this is its first occurrence, so we assign 1 to the key. Otherwise, we increment its value by 1.

foreach (int num in numbers)
{
    if (counts.ContainsKey(num))
        counts[num]++;
    else
        counts[num] = 1;
}
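
For a key that is already present, the loop above looks it up three times: once in ContainsKey, once to read the count, and once to write it back. A common variation uses TryGetValue, which needs one lookup to read and one to write. The sketch below shows that variation. The class and method names are hypothetical, and the result is the same as with the ContainsKey version.

```csharp
using System;
using System.Collections.Generic;

public static class CountWithTryGetValueDemo
{
    // Hypothetical helper: maps each number to its number of occurrences.
    public static Dictionary<int, int> CountOccurrences(int[] source)
    {
        var counts = new Dictionary<int, int>();
        foreach (int num in source)
        {
            // current is 0 when the key is missing (the default for int).
            counts.TryGetValue(num, out int current);
            counts[num] = current + 1;
        }
        return counts;
    }

    public static void Main()
    {
        int[] numbers = { 4, 7, 2, 3, 4, 5, 3, 6, 7, 8, 1, 8, 8 };
        Dictionary<int, int> counts = CountOccurrences(numbers);
        Console.WriteLine($"8 appears {counts[8]} times");
        // prints: 8 appears 3 times
    }
}
```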

Since the keys of a dictionary are always unique, we can store them in the distinctNumbers array.

If a key’s value is greater than 1, the number is a duplicate. We select those keys from the dictionary and store them in the duplicateNumbers array.

int[] distinctNumbers = counts.Select(kv => kv.Key).ToArray();

int[] duplicateNumbers = counts.Where(kv => kv.Value > 1)
                               .Select(kv => kv.Key)
                               .ToArray();

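Output:

original array
4 7 2 3 4 5 3 6 7 8 1 8 8
distinct Numbers
4 7 2 3 5 6 8 1
duplicate Numbers
4 7 3 8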

⚠️ Downsides: Using a Dictionary has a few downsides.

  1. Extra memory usage: The dictionary stores a key-value pair for every distinct element, which can be a problem if you’re working with very large arrays.
  2. Overhead: Building a dictionary involves allocating memory, computing hash codes, and resizing as it grows. For small arrays this is not noticeable, but for larger arrays the cost adds up.
  3. Unpredictable order: A Dictionary does not guarantee the order in which its entries are enumerated, so the distinct and duplicate elements may not come out in the order of the original array. This may not matter for some applications, but for others it can be important.
  4. Fixed key and value types: A Dictionary<TKey, TValue> is declared with one key type and one value type. As with HashSet, keys of a custom type need a suitable Equals and GetHashCode (or an IEqualityComparer<T>) to be matched by value.
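
On the last point, the key type does not have to be int. Dictionary<TKey, TValue> is generic, and its constructor accepts a comparer that decides which keys count as equal. As an illustration, the sketch below counts words while ignoring case by passing StringComparer.OrdinalIgnoreCase. The sample words and the names StringCountsDemo and FindDuplicateWords are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StringCountsDemo
{
    // Hypothetical helper: returns the words that occur more than once,
    // treating "apple" and "APPLE" as the same word.
    public static string[] FindDuplicateWords(string[] words)
    {
        // The comparer decides which keys count as "the same".
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (string word in words)
        {
            counts.TryGetValue(word, out int current);
            counts[word] = current + 1;
        }
        return counts.Where(kv => kv.Value > 1)
                     .Select(kv => kv.Key)
                     .ToArray();
    }

    public static void Main()
    {
        string[] words = { "apple", "Banana", "APPLE", "cherry", "banana" };
        Console.WriteLine(string.Join(" ", FindDuplicateWords(words)));
    }
}
```

With the sample input, two words are reported as duplicates (apple and banana, in whichever casing the dictionary kept as the key).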

The LINQ library provides a concise and expressive way to extract unique and duplicate values from an array. For example, we can use the Distinct() method to extract the unique values and the GroupBy() method to group equal values and find the duplicates. Both have an average time complexity of O(n).


int[] numbers = { 4, 7, 2, 3, 4, 5, 3, 6, 7, 8, 1, 8, 8 };
int[] distinctNumbers = numbers.Distinct().ToArray();
int[] duplicateNumbers = numbers
    .GroupBy(x => x)
    .Where(g => g.Count() > 1)
    .Select(g => g.Key)
    .ToArray();

Console.WriteLine("original array");
PrintArray(numbers);

Console.WriteLine("distinct Numbers");
PrintArray(distinctNumbers);

Console.WriteLine("duplicate Numbers");
PrintArray(duplicateNumbers);

Console.ReadLine();

There is not much to break down here. Distinct() returns the distinct numbers. GroupBy() puts equal numbers into one group, and any group with more than one element is a duplicate.

int[] distinctNumbers = numbers.Distinct().ToArray();
int[] duplicateNumbers = numbers
    .GroupBy(x => x)
    .Where(g => g.Count() > 1)
    .Select(g => g.Key)
    .ToArray();

Output:

original array
4 7 2 3 4 5 3 6 7 8 1 8 8
distinct Numbers
4 7 2 3 5 6 8 1
duplicate Numbers
4 7 3 8
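
Because each group already holds all occurrences of a value, the same query can also report how often each duplicate occurs, much like the Dictionary approach. The following sketch assumes that requirement. GroupCountsDemo and CountDuplicates are hypothetical names, and the result is returned as an array of value tuples.

```csharp
using System;
using System.Linq;

public static class GroupCountsDemo
{
    // Hypothetical helper: returns each duplicated value with its count.
    public static (int Value, int Count)[] CountDuplicates(int[] source) =>
        source
            .GroupBy(x => x)                                  // one group per distinct value
            .Where(g => g.Count() > 1)                        // keep groups with repeats
            .Select(g => (Value: g.Key, Count: g.Count()))
            .ToArray();

    public static void Main()
    {
        int[] numbers = { 4, 7, 2, 3, 4, 5, 3, 6, 7, 8, 1, 8, 8 };
        foreach (var (value, count) in CountDuplicates(numbers))
            Console.WriteLine($"{value} appears {count} times");
        // prints:
        // 4 appears 2 times
        // 7 appears 2 times
        // 3 appears 2 times
        // 8 appears 3 times
    }
}
```

GroupBy yields its groups in the order in which each key first appears in the source, which is why 7 comes before 3 here.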

⚠️ Downsides: Using LINQ has a few downsides.

  1. Performance: LINQ operations can be slower than direct loops or specialized data structures, especially for large datasets. If performance is critical, it may be better to use a more specialized approach.
  2. Memory usage: LINQ operations often create new objects or collections, which can increase memory usage. This can be a concern if memory usage is a limiting factor in the application.
  3. Maintenance: LINQ expressions can sometimes be difficult to read or understand, which can make it harder to maintain or modify the code in the future. Additionally, using LINQ may require a developer to have a solid understanding of the LINQ syntax and behavior.

Each approach has its benefits and downsides, so choose according to your needs.

HashSet is the most efficient and the recommended approach for large arrays, because membership checks take O(1) time on average. Like the other two approaches, it only reads the original array and never modifies it, so there is no need to make a copy first.

Dictionary is a good alternative when you also need the number of occurrences, but it stores a count alongside each key, so it does a little more work and uses more memory per entry than a HashSet.

LINQ is the most concise option, but it is typically slower than HashSet and Dictionary, which can matter for large arrays.
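
How large the difference is depends on the data and the runtime, so it is worth measuring on your own input. The sketch below is one rough way to do that with Stopwatch. The array size, value range, and random seed are arbitrary choices, all names are hypothetical, and a single Stopwatch run is only indicative. A dedicated benchmarking tool gives more reliable numbers.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class CompareApproachesDemo
{
    public static HashSet<int> WithHashSet(int[] source)
    {
        var seen = new HashSet<int>();
        var duplicates = new HashSet<int>();
        foreach (int n in source)
            if (!seen.Add(n))
                duplicates.Add(n);
        return duplicates;
    }

    public static HashSet<int> WithDictionary(int[] source)
    {
        var counts = new Dictionary<int, int>();
        foreach (int n in source)
        {
            counts.TryGetValue(n, out int current);
            counts[n] = current + 1;
        }
        return counts.Where(kv => kv.Value > 1).Select(kv => kv.Key).ToHashSet();
    }

    public static HashSet<int> WithLinq(int[] source) =>
        source.GroupBy(x => x)
              .Where(g => g.Count() > 1)
              .Select(g => g.Key)
              .ToHashSet();

    // Runs one approach once and prints the elapsed time.
    private static void Time(string label, Func<int[], HashSet<int>> find, int[] data)
    {
        var sw = Stopwatch.StartNew();
        HashSet<int> result = find(data);
        sw.Stop();
        Console.WriteLine($"{label,-10} {sw.ElapsedMilliseconds,5} ms, {result.Count} duplicates");
    }

    public static void Main()
    {
        // Arbitrary test data: one million values drawn from a smaller range,
        // so that many of them repeat. The fixed seed makes runs repeatable.
        var random = new Random(42);
        int[] data = Enumerable.Range(0, 1_000_000)
                               .Select(_ => random.Next(0, 500_000))
                               .ToArray();

        Time("HashSet", WithHashSet, data);
        Time("Dictionary", WithDictionary, data);
        Time("LINQ", WithLinq, data);
    }
}
```

All three methods return the same set of duplicates, so only the timings differ between the lines of output.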

If you found this article useful, consider clapping and sharing it. You can also follow me on YouTube, where I have created lots of content related to .NET Core and Angular.

Connect with me
👉 YouTube: https://youtube.com/@ravindradevrani
👉 Twitter: https://twitter.com/ravi_devrani
👉 GitHub: https://github.com/rd003

Become a supporter ❣️:
You can buy me a coffee 🍵 : https://www.buymeacoffee.com/ravindradevrani

Thanks a lot 🙂🙂
