Tuesday, December 17, 2019

Python Copy and Deepcopy


Assignment statements (=) in Python do not copy objects; they create bindings between a target and an object. It is easy to assume that the = operator creates a new object, but it doesn’t. It only creates a new name that refers to the same object as the original. For mutable objects this matters, because a change made through one name is visible through the other. When you want a copy that you can modify without modifying the original at the same time, you need to create a “real copy” or “clone” of the object.
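A minimal sketch of the binding behaviour described above. The variable names are only illustrative.

```python
original = [1, 2, 3]
alias = original          # binds a second name to the same list object

alias.append(4)           # mutate the list through the alias

print(alias is original)  # True: both names refer to one object
print(original)           # [1, 2, 3, 4]
```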


In Python, there are two ways to create copies:

*   Deep copy
*   Shallow copy

Both are provided by Python's "copy" module.

What is Shallow Copy?


A shallow copy constructs a new collection object and then populates it with references to the child objects found in the original. Only the references are copied, not the child objects themselves. As a result, replacing an element of the copy does not affect the original, but a change made to a mutable child object through the copy is also visible in the original. In Python, this is implemented by the “copy()” function.

Note: The copying process does not recurse and therefore won’t create copies of the nested child objects themselves.
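A short sketch of what "does not recurse" means in practice, using identity checks. The names outer and shallow are chosen here for illustration.

```python
import copy

outer = [[1, 2], [3, 4]]
shallow = copy.copy(outer)

print(shallow is outer)        # False: the outer list is a new object
print(shallow[0] is outer[0])  # True: the nested list is shared

shallow.append([5, 6])         # changes only the copy's outer list
shallow[0].append(99)          # changes the shared nested list

print(outer)    # [[1, 2, 99], [3, 4]]
print(shallow)  # [[1, 2, 99], [3, 4], [5, 6]]
```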

What is Deep Copy?


A deep copy is one in which the copying process occurs recursively. It first constructs a new collection object and then recursively populates it with copies of the child objects found in the original. Because the child objects themselves are copied, changes made to the copy are not reflected in the original object. In Python, this is implemented by the “deepcopy()” function.

Two problems often exist with deep copy operations that don’t exist with shallow copy operations:

    *   Recursive objects (compound objects that, directly or indirectly, contain a reference to themselves) may cause a recursive loop.
    *   Because deep copy copies everything it may copy too much, such as data which is intended to be shared between copies.

How does deepcopy avoid such problems?

    *   Keeping a “memo” dictionary of objects already copied during the current copying pass.
    *   Letting user-defined classes override the copying operation or the set of components copied.
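The effect of the memo dictionary can be observed from the outside. The sketch below deep-copies a list that contains itself, and a list that holds the same child twice. All names are illustrative.

```python
import copy

# A self-referential list: its last element is the list itself.
node = [1, 2]
node.append(node)

clone = copy.deepcopy(node)    # completes without infinite recursion

print(clone is node)           # False
print(clone[2] is clone)       # True: the cycle is reproduced inside the copy

# An object referenced twice is copied once, so sharing inside the copy is kept.
shared = ["x"]
pair = [shared, shared]
pair_clone = copy.deepcopy(pair)

print(pair_clone[0] is pair_clone[1])  # True
print(pair_clone[0] is shared)         # False
```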

A simple example of a shallow copy without using the copy module.

L = ["A", "B", "C", "D", [1, 2, 3], "E", "F"]

print("Original List Before Copy", L)
L1 = list(L)
print("Original List After Copy", L)

print("Copy List L1 Before Changes", L1)
L1[0] = "a"
L1[1] = "b"
L1[4][0] = 4
L1[4][1] = 5 
print("Copy List L1 After Changes", L1)
print("Original List L After Changes in L1 List", L)

Original List Before Copy ['A', 'B', 'C', 'D', [1, 2, 3], 'E', 'F']
Original List After Copy ['A', 'B', 'C', 'D', [1, 2, 3], 'E', 'F']
Copy List L1 Before Changes ['A', 'B', 'C', 'D', [1, 2, 3], 'E', 'F']
Copy List L1 After Changes ['a', 'b', 'C', 'D', [4, 5, 3], 'E', 'F']
Original List L After Changes in L1 List ['A', 'B', 'C', 'D', [4, 5, 3], 'E', 'F']

As you can see, the changes made to L1 at positions 0 and 1 are not reflected in the original list. At position 4, however, both lists hold a reference to the same nested list object, so changes made to that object through L1 are also reflected in the original list L. This is the disadvantage of a shallow copy.
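The list() constructor is not the only built-in way to get a shallow copy of a list. As a quick sketch, slicing and the list.copy() method behave the same way: each returns a new outer list that still shares the nested list. The variable names below are illustrative.

```python
L = ["A", [1, 2, 3]]

by_constructor = list(L)
by_slice = L[:]
by_method = L.copy()

# Each copy is a distinct outer list whose nested list is the original's.
for candidate in (by_constructor, by_slice, by_method):
    print(candidate is L, candidate[1] is L[1])  # False True
```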


The same example using the copy module.


import copy
L = ["A", "B", "C", "D", [1, 2, 3], "E", "F"]

print("Original List Before Copy", L)
# using copy module
L1 = copy.copy(L)

print("Original List After Copy", L)
print("Copy List L1 Before Changes", L1)
L1[0] = "a"
L1[1] = "b"
L1[4][0] = 4
L1[4][1] = 5 
print("Copy List L1 After Changes", L1)
print("Original List L After Changes in L1 List", L)

Original List Before Copy ['A', 'B', 'C', 'D', [1, 2, 3], 'E', 'F']
Original List After Copy ['A', 'B', 'C', 'D', [1, 2, 3], 'E', 'F']
Copy List L1 Before Changes ['A', 'B', 'C', 'D', [1, 2, 3], 'E', 'F']
Copy List L1 After Changes ['a', 'b', 'C', 'D', [4, 5, 3], 'E', 'F']
Original List L After Changes in L1 List ['A', 'B', 'C', 'D', [4, 5, 3], 'E', 'F']

As we can see, the output is the same as above.

The same example using deepcopy.


import copy
L = ["A", "B", "C", "D", [1, 2, 3], "E", "F"]

print("Original List Before Copy", L)
# using the deepcopy function from the copy module
L1 = copy.deepcopy(L)

print("Original List After Copy", L)
print("Copy List L1 Before Changes", L1)
L1[0] = "a"
L1[1] = "b"
L1[4][0] = 4
L1[4][1] = 5 
print("Copy List L1 After Changes", L1)
print("Original List L After Changes in L1 List", L)

Original List Before Copy ['A', 'B', 'C', 'D', [1, 2, 3], 'E', 'F']
Original List After Copy ['A', 'B', 'C', 'D', [1, 2, 3], 'E', 'F']
Copy List L1 Before Changes ['A', 'B', 'C', 'D', [1, 2, 3], 'E', 'F']
Copy List L1 After Changes ['a', 'b', 'C', 'D', [4, 5, 3], 'E', 'F']
Original List L After Changes in L1 List ['A', 'B', 'C', 'D', [1, 2, 3], 'E', 'F']

As you can see in the result above, with deepcopy the changes made to L1 are not reflected in the original list, not even in the child object at position 4. This is the advantage of deepcopy: it creates a completely new copy of the original object, so changes made to the original and changes made to the copy do not affect each other.


Example of implementing the __copy__ and __deepcopy__ magic (dunder) methods.


import copy
class Employee(object):
    def __init__(self):
        self.name = "John"
        self.age = 35
    def __copy__(self):
        cls = self.__class__
        instance = cls.__new__(cls)
        instance.__dict__.update(self.__dict__)
        return instance
    def __deepcopy__(self, memo):
        cls = self.__class__
        instance = cls.__new__(cls)
        memo[id(self)] = instance
        for k, v in self.__dict__.items():
            setattr(instance, k, copy.deepcopy(v, memo))
        return instance

emp = Employee()
emp_1 = copy.copy(emp)

print(emp.age)
print(emp_1.age)

emp.age = 25
del emp.name
emp.salary = 25000
emp_2 = copy.deepcopy(emp)
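print(emp_2.age)
# name was deleted from emp before the deep copy, so the copy lacks it too
try:
    print(emp_2.name)
except AttributeError as exc:
    print("AttributeError:", exc)
print(emp_2.salary)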

35
35
25
AttributeError: 'Employee' object has no attribute 'name'
25000
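The Employee example deep-copies every attribute. As a sketch of the other use mentioned earlier, limiting the set of components copied, the hypothetical Report class below clones its rows but deliberately keeps one settings dictionary shared between copies. The class and its attribute names are assumptions made for illustration.

```python
import copy


class Report:
    """Hypothetical class: 'rows' is cloned, 'settings' stays shared."""

    def __init__(self, rows, settings):
        self.rows = rows
        self.settings = settings

    def __deepcopy__(self, memo):
        cls = self.__class__
        instance = cls.__new__(cls)
        memo[id(self)] = instance                       # register before recursing
        instance.rows = copy.deepcopy(self.rows, memo)  # independent copy
        instance.settings = self.settings               # deliberately shared
        return instance


settings = {"currency": "USD"}
report = Report([[1, 2], [3, 4]], settings)
clone = copy.deepcopy(report)

clone.rows[0].append(99)
print(report.rows)                        # [[1, 2], [3, 4]]
print(clone.rows)                         # [[1, 2, 99], [3, 4]]
print(clone.settings is report.settings)  # True
```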
