Tuesday, 9 April 2013

Removing duplicates from an Array or Collection - Part 2 (last part)


In the previous post we discussed how to remove duplicates from Arrays/Collections.

But then a question comes to mind:
What if we want to remove duplicate objects of user-defined classes using this approach?
Suppose we have a class with its own fields. How do we remove duplicates of such a class from an Array or Collection?
Suppose we have a class like:


public class TestObject {

    private int id;    
    private String name;

    public TestObject(int id, String name) {
        this.id = id;
        this.name = name;
    }

    // getter and setter methods

    // Override toString() method to print the object in a readable 
    // format
    @Override
    public String toString() {
        return "TestObject{" + "id=" + id + ", name=" + name + '}';
    }
}

To remove duplicates from an Array of this class, we modify our method like this:

public static TestObject[] removeDuplicates(TestObject[] inputArray) {

    // first, convert Array into List
    List<TestObject> objectsList = Arrays.asList(inputArray);

    // pass this List into Set's constructor
    Set<TestObject> objectsSet = new LinkedHashSet<>(objectsList);

    // create an Array of length equal to set.size() 
    TestObject[] outputArray = new TestObject[objectsSet.size()];

    // pass the newly created array to the toArray method to store 
    // Set's objects in Array and return the array
    return objectsSet.toArray(outputArray);
}
An example of its use is given below:


public static void main(String... args) {

    // create some objects of TestObject class
    TestObject obj1 = new TestObject(3, "Abdullah");
    // notice that obj2 is similar to obj1
    TestObject obj2 = new TestObject(3, "Abdullah");
    TestObject obj3 = new TestObject(1, "Maaz");
    // notice that obj4 is similar to obj3
    TestObject obj4 = new TestObject(1, "Maaz");
    TestObject obj5 = new TestObject(6, "Hamza");
    TestObject obj6 = obj5;
    TestObject obj7 = new TestObject(2, "Salman");
    TestObject obj8 = obj7;
    TestObject obj9 = new TestObject(5, "Hammad");
    TestObject obj10 = obj9;

    // create an array that contain duplicate objects
    TestObject[] arrayWithDuplicates = new TestObject[]{obj1, obj2, obj3, obj4, obj5, obj6, obj7, obj8, obj9, obj10};

    // print array with duplicates
    System.out.println("Printing Array with Duplicates------------------- ");
    for (TestObject obj : arrayWithDuplicates) {
        System.out.println(obj);
    }

    // pass this array to removeDuplicates method and get output in another array
    TestObject[] newArray = removeDuplicates(arrayWithDuplicates);

    System.out.println("Printing new Array------------------- ");
    for (TestObject obj : newArray) {
        System.out.println(obj);
    }
}

The output of this is:
run:
Printing Array with Duplicates------------------- 
TestObject{id=3, name=Abdullah}
TestObject{id=3, name=Abdullah}
TestObject{id=1, name=Maaz}
TestObject{id=1, name=Maaz}
TestObject{id=6, name=Hamza}
TestObject{id=6, name=Hamza}
TestObject{id=2, name=Salman}
TestObject{id=2, name=Salman}
TestObject{id=5, name=Hammad}
TestObject{id=5, name=Hammad}

Printing new Array------------------- 
TestObject{id=3, name=Abdullah}
TestObject{id=3, name=Abdullah}
TestObject{id=1, name=Maaz}
TestObject{id=1, name=Maaz}
TestObject{id=6, name=Hamza}
TestObject{id=2, name=Salman}
TestObject{id=5, name=Hammad}
BUILD SUCCESSFUL (total time: 2 seconds)
As we can see, it has removed only the duplicates that share the same reference, i.e.:
    TestObject obj5 = new TestObject(6, "Hamza");
    TestObject obj6 = obj5;
    TestObject obj7 = new TestObject(2, "Salman");
    TestObject obj8 = obj7;
    TestObject obj9 = new TestObject(5, "Hammad");
    TestObject obj10 = obj9;

But it has not removed the duplicates that have the same values, like:
    TestObject obj1 = new TestObject(3, "Abdullah");
    TestObject obj2 = new TestObject(3, "Abdullah");
    TestObject obj3 = new TestObject(1, "Maaz");
    TestObject obj4 = new TestObject(1, "Maaz");
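
The reason can be seen in a small sketch. A class that does not override equals(Object) inherits the version from java.lang.Object, which only compares references. The names ReferenceEqualityDemo, PlainObject and countDistinct below are hypothetical stand-ins used for illustration, not part of the code above.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class ReferenceEqualityDemo {

    // Hypothetical stand-in for a class that does not override equals()
    static class PlainObject {
        private final int id;
        private final String name;

        PlainObject(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Counts how many elements remain after the array goes through a LinkedHashSet
    static int countDistinct(PlainObject[] input) {
        Set<PlainObject> set = new LinkedHashSet<>(Arrays.asList(input));
        return set.size();
    }

    public static void main(String... args) {
        PlainObject a = new PlainObject(3, "Abdullah");
        PlainObject b = new PlainObject(3, "Abdullah"); // same values, different object
        PlainObject c = a;                              // same reference as a

        System.out.println(a.equals(b)); // false: inherited equals compares references
        System.out.println(a.equals(c)); // true: both variables point to one object
        System.out.println(countDistinct(new PlainObject[]{a, b, c})); // 2
    }
}
```

Only c is dropped, because it is the very same object as a; b survives even though its fields are identical.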

Solution:

To achieve our goal, we have to override the equals(Object) method that TestObject inherits from the java.lang.Object class, so that it returns true when the given object has the same field values as this object, and false otherwise.
Because LinkedHashSet is a hash-based Set, we also have to override hashCode() consistently with equals(Object): objects that are equal must return the same hash code, otherwise the Set may still treat them as different elements.
After overriding these methods, our TestObject class looks like:


import java.util.Objects;

public class TestObject {

    private int id;    
    private String name;

    public TestObject(int id, String name) {
        this.id = id;
        this.name = name;
    }

    // getter and setter methods

    // Override toString() method to print the object in a readable 
    // format
    @Override
    public String toString() {
        return "TestObject{" + "id=" + id + ", name=" + name + '}';
    }

    @Override
    public boolean equals(Object object) {

        // if given object is null, return false
        if (object == null) {
            return false;
        }

        // check if given object is an instance of TestObject class.
        // instanceof is an operator in java that checks if an object
        // is an instance of a given class.
        if (object instanceof TestObject) {
            // cast object to TestObject 
            TestObject other = (TestObject) object;

            // match the values of given object with this object
            // and return true if the values match;
            // Objects.equals also handles a null name safely
            if (this.id == other.id && Objects.equals(this.name, other.name)) {
                return true;
            }
        }
        return false;
    }

    // hash-based Sets look elements up by hash code first, so equal
    // objects must produce the same hash code; build it from the same
    // fields that equals() compares
    @Override
    public int hashCode() {
        return Objects.hash(id, name);
    }
}
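
As a quick check of the equals/hashCode contract, the following minimal sketch uses a hypothetical Item class (a stand-in for illustration, not the TestObject class itself) that overrides both methods from the same fields. If hashCode() were left out, the two equal objects would normally get different identity-based hash codes and the Set would keep both.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

class EqualsHashCodeDemo {

    // Hypothetical value class that overrides both equals() and hashCode()
    static class Item {
        private final int id;
        private final String name;

        Item(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public boolean equals(Object object) {
            if (!(object instanceof Item)) {
                return false; // instanceof is false for null as well
            }
            Item other = (Item) object;
            return id == other.id && Objects.equals(name, other.name);
        }

        @Override
        public int hashCode() {
            // built from the same fields that equals() compares
            return Objects.hash(id, name);
        }
    }

    public static void main(String... args) {
        Item first = new Item(1, "Maaz");
        Item second = new Item(1, "Maaz");

        System.out.println(first.equals(second));                  // true
        System.out.println(first.hashCode() == second.hashCode()); // true

        Set<Item> set = new LinkedHashSet<>(Arrays.asList(first, second));
        System.out.println(set.size());                            // 1
    }
}
```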

After modifying the TestObject class, the output is:

run:
Printing Array with Duplicates------------------- 
TestObject{id=3, name=Abdullah}
TestObject{id=3, name=Abdullah}
TestObject{id=1, name=Maaz}
TestObject{id=1, name=Maaz}
TestObject{id=6, name=Hamza}
TestObject{id=6, name=Hamza}
TestObject{id=2, name=Salman}
TestObject{id=2, name=Salman}
TestObject{id=5, name=Hammad}
TestObject{id=5, name=Hammad}

Printing new Array------------------- 
TestObject{id=3, name=Abdullah}
TestObject{id=1, name=Maaz}
TestObject{id=6, name=Hamza}
TestObject{id=2, name=Salman}
TestObject{id=5, name=Hammad}
BUILD SUCCESSFUL (total time: 2 seconds)
Now we can see that all duplicates have been removed!
But the output is not sorted, because LinkedHashSet keeps the original insertion order.
If we try using a TreeSet in the removeDuplicates method instead, the code compiles but throws a ClassCastException at run time, because TestObject does not implement Comparable.
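
The failure can be reproduced with a minimal sketch. The class NotComparable and the helper failsInTreeSet are hypothetical names used only for illustration; like TestObject, NotComparable does not implement Comparable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TreeSetFailureDemo {

    // Hypothetical class that does not implement Comparable
    static class NotComparable {
        private final int id;

        NotComparable(int id) {
            this.id = id;
        }
    }

    // Returns true if building a TreeSet from the list fails with a ClassCastException
    static boolean failsInTreeSet(List<NotComparable> input) {
        try {
            // compiles fine: TreeSet places no compile-time bound on its element type
            Set<NotComparable> sorted = new TreeSet<>(input);
            return false;
        } catch (ClassCastException e) {
            // TreeSet tried to cast an element to Comparable in order to sort it
            return true;
        }
    }

    public static void main(String... args) {
        List<NotComparable> list = Arrays.asList(new NotComparable(2), new NotComparable(1));
        System.out.println(failsInTreeSet(list)); // true
    }
}
```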

In the next post, we will see how we can use TreeSet to remove duplicates and to sort Objects of such classes.

See Also:

Removing duplicates from an Array or Collection - Part 1


Problem:

Sometimes we have an array or collection of objects from which we want to remove the duplicates.

Solution:

There are various duplicate-removal algorithms out there to resolve this issue. But a commonly suggested solution is simply to use a Set!
Since it is the nature of Sets that they contain no duplicates, if we convert our Array/Collection to a Set, things get simplified!
Sets are available in various programming languages, such as Objective-C, C#, and Java.

In Java, we can use an implementation of the java.util.Set interface, like a TreeSet, HashSet, or a LinkedHashSet.
The advantage of using a TreeSet is that it not only removes the duplicates, but also sorts the data in ascending order.
The HashSet does not guarantee any ordering of its elements, so it can be used if you are not interested in ordering.
The LinkedHashSet keeps the original insertion order of the elements, so it is advisable if you want to keep the original ordering of your data!
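
The difference between the three implementations can be sketched with the same input passed through each of them. The class and method names below (SetOrderingDemo, viaTreeSet, viaLinkedHashSet, viaHashSet) are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetOrderingDemo {

    // TreeSet: removes duplicates and sorts in natural (ascending) order
    static List<String> viaTreeSet(List<String> input) {
        Set<String> set = new TreeSet<>(input);
        return new ArrayList<>(set);
    }

    // LinkedHashSet: removes duplicates and keeps first-seen order
    static List<String> viaLinkedHashSet(List<String> input) {
        Set<String> set = new LinkedHashSet<>(input);
        return new ArrayList<>(set);
    }

    // HashSet: removes duplicates, iteration order is unspecified
    static List<String> viaHashSet(List<String> input) {
        Set<String> set = new HashSet<>(input);
        return new ArrayList<>(set);
    }

    public static void main(String... args) {
        List<String> names = Arrays.asList("Maaz", "Abdullah", "Maaz", "Hamza", "Abdullah");

        System.out.println(viaTreeSet(names));       // [Abdullah, Hamza, Maaz]
        System.out.println(viaLinkedHashSet(names)); // [Maaz, Abdullah, Hamza]
        System.out.println(viaHashSet(names));       // same three names, order not guaranteed
    }
}
```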

In Java, the following is a way to convert an Array into a Set and return a new Array without duplicates!

public static Integer[] removeDuplicates(Integer[] inputArray) {
    // first, convert Array into List
    List<Integer> integersList = Arrays.asList(inputArray);

    // pass this List into Set's constructor
    Set<Integer> integersSet = new TreeSet<>(integersList);

    // create an Array of length equal to set.size() 
    Integer[] outputArray = new Integer[integersSet.size()];

    // pass the newly created array to the toArray method to store Set's 
    // objects in Array and return the array
    return integersSet.toArray(outputArray);
}

Notice that we have used Integer instead of int. The reason is that in Java, Collections cannot contain primitive data types like int, float, char, boolean, long, and double.
Therefore we must use the wrapper classes of these primitive data types, like Integer, Float, Character, Boolean, Long, and Double.
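
If the data starts out as a primitive int[] rather than an Integer[], one possible workaround is to let autoboxing do the conversion while filling the Set. Note that Arrays.asList applied to an int[] produces a List<int[]> with a single element, not a list of numbers. The sketch below uses a hypothetical helper named removeDuplicatesFromInts and keeps the original order with a LinkedHashSet.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class PrimitiveArrayDemo {

    // Hypothetical helper: removes duplicates from an int[] while keeping the original order
    static int[] removeDuplicatesFromInts(int[] input) {
        Set<Integer> seen = new LinkedHashSet<>();
        for (int value : input) {
            seen.add(value); // each int is autoboxed to an Integer
        }

        int[] output = new int[seen.size()];
        int index = 0;
        for (int value : seen) { // each Integer is unboxed back to an int
            output[index++] = value;
        }
        return output;
    }

    public static void main(String... args) {
        int[] result = removeDuplicatesFromInts(new int[]{1, 3, 3, 5, 5, 6});
        System.out.println(Arrays.toString(result)); // [1, 3, 5, 6]
    }
}
```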
A simple use case of the removeDuplicates(Integer[]) method would be:


public static void main(String... args) {
    // create an unsorted array that contain duplicates
    Integer[] arrayWithDuplicates = new Integer[]{1, 3, 3, 5, 5, 6, 8, 9, 3, 4, 5, 3, 2, 0};
      
    // pass this array to removeDuplicates method and get output in another array
    Integer[] newArray = removeDuplicates(arrayWithDuplicates);
    
    // print the new Array!
    for (int i : newArray) {
        System.out.println(i);
    }
}

The output of this is:

run:
0
1
2
3
4
5
6
8
9
BUILD SUCCESSFUL (total time: 2 seconds)
Notice that the duplicates are not only removed but the array has also been sorted in ascending order. This is due to the TreeSet.
If we want to preserve the order of the array, then use a LinkedHashSet instead of TreeSet.
To do so, change the line in the removeDuplicates method that creates integersSet so that it constructs a LinkedHashSet instead of a TreeSet:
Set<Integer> integersSet = new LinkedHashSet<>(integersList);
This approach can be applied to remove duplicates from Arrays/Collections of String, Integer, Float, Double, Character, Boolean, Long, etc.
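
Since the same steps work for all of these types, the method could also be written once with a type parameter instead of once per type. The following is only a sketch under that assumption; the class name GenericDedupDemo and the List-returning signature are choices made for this example, not part of the method shown earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class GenericDedupDemo {

    // Removes duplicates from any Collection, keeping the first occurrence of each element
    static <T> List<T> removeDuplicates(Collection<T> input) {
        Set<T> unique = new LinkedHashSet<>(input);
        return new ArrayList<>(unique);
    }

    public static void main(String... args) {
        List<String> letters = removeDuplicates(Arrays.asList("b", "a", "b", "c", "a"));
        System.out.println(letters); // [b, a, c]

        List<Double> numbers = removeDuplicates(Arrays.asList(2.5, 1.0, 2.5));
        System.out.println(numbers); // [2.5, 1.0]
    }
}
```

This relies on the element type having value-based equals and hashCode methods, which the wrapper classes and String already provide.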

In the next (final) part, we will discuss how to remove duplicate objects of user-defined classes using this approach.
