6.11. Shallow vs. deep copies

As a last, relatively minor point before some of the more advanced fundamentals, we have a small note about making copies. This doesn’t always give the behavior people are expecting when they are starting out.

If you have code such as

a = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
b = list(a)

b is a copy of a. However, it is a shallow copy. To save space in memory, the entries (0 to 8 in this case) aren’t actually copied. Rather, a new variable b is made, and its entries just point to the the original entries in a. This is shown below.

Illustration of a shallow copy

If you then run

a[0][0] = '1'

you’ll find that both a and b have updated. However, if you run

a.append([9, 10, 11])

b won’t have this added to it. b won’t know where any additional locations in a from after the copy was made. This use of memory is illustrated in the figure below.

Illustration of appending items to a shallow copy

This can lead to some confusing behavior if you’re not expecting it!

If you want two completely independent copies of a piece of data you need to make a deep copy. In Python this is done with

import copy
c = copy.deepcopy(a)

In general deep copies should only be used when you really need them. They will use more memory, and for large items it can take quite a lot of time to actually make a copy of all of the data (rather than just pointing to the data that’s already present).