Pages

Tuesday, 9 April 2013

Removing duplicates from an Array or Collection - Part 1


Problem:

Sometimes  we have an array or collection of objects from which we want to remove the duplicates.

Solution:

There are various duplicate-removal algorithms out there to resolve this issue. But the best suggested solution for this issue is to use a Set!
Since its the nature of Sets that there are no duplicates in them, so if we somehow convert our Array/Collection to a Set type, things would get simplified!
The Set is implemented in various programming languages like: Objective-C, C#, and Java.

In java, we can use a child class of interface java.util.Set, like a TreeSetHashSet, or a LinkedHashSet
The advantage of using a TreeSet is that it not only remove the duplicates, but also sorts the Data in Ascending order.
The HashSet does not preserve the ordering the Set, so it can be used if you are not interested in ordering.
The LinkedHashSet keeps the original ordering of the Set. So it is advisable to be used if you want to keep the original ordering of your Set!

In java, following is the way to convert an Array into a Set and return back the new Array without remove duplicates!

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
public static Integer[] removeDuplicates(Integer[] inputArray) {
    // first, convert Array into List
    List<Integer> integersList = Arrays.asList(inputArray);

    // pass this List into Set's constructor
    Set<Integer> integersSet = new TreeSet(integersList);

    // create an Array of length equal to set.size() 
    Integer[] outputArray = new Integer[integersSet.size()];

    // pass the newly created array to the toArray method to store Set's 
    // objects in Array and return the array
    return integersSet.toArray(outputArray);
}

Notice that we have used Integer instead of int. The reason is that in java, Collections can not contain primitive data types like: int, float, char, boolean, long, and double etc.
Therefor we must use the wrapper classes of these primitive data types like: Integer, Float, Character, Boolean, Long, and Double etc.
A simple use case of above method would be:


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
public static void main(String... args) {
    // create an unsorted array that contain duplicates
    Integer[] arrayWithDuplicates = new Integer[]{1, 3, 3, 5, 5, 6, 8, 9, 3, 4, 5, 3, 2, 0};
      
    // pass this array to removeDuplicates method and get output in another array
    Integer[] newArray = removeDuplicates(arrayWithDuplicates);
    
    // print the new Array!
    for (int i : newArray) {
        System.out.println(i);
    }
}

The output of this is:

run:
0
1
2
3
4
5
6
8
9
BUILD SUCCESSFUL (total time: 2 seconds)
Notice that the duplicates are not only removed but the array has also been sorted in Ascending order. This is due to the TreeSet.
If we want to preserve the order of the array, then use a LinkedHashSet instead of TreeSet.
To do so, replace following line:
Set<Integer> integersSet = new TreeSet(integersList);
with
Set<Integer> integersSet = new LinkedHashSet(integersList);
in removeDuplicates method.
This approach can be applied to remove duplicates  from Arrays/Collections of String, Integer, Float, Double, Character, Boolean, and Long etc.

In the the next (final) part, we will discuss how to remove duplicate Objects of user-defined Classes using this approach.

References:




No comments:

Post a Comment