Variable references in Python
In Python, assigning a value to a name binds that name to an object: the name is a reference to the object. For example, after a=1, the reference 'a' points to an object in memory with the value 1. Because I switched to Python after programming in C for a while, I used to think that adding another assignment with the same value to the same code, i.e. b=1, would create a new object with a new reference 'b' pointing to it. But I was wrong. That made me start looking at what reference counting means and how it operates in Python. Let's look at the following piece of code and how it executes:
a=1
b=1
c=2
For the first statement, as described above, the reference 'a' is bound to an object in memory with the value 1, and the reference count of that object is incremented. When Python executes the next statement, b=1, a new object is not initialized because the value is the same (1): the existing object with value 1 simply gets another reference, 'b'. This is part of Python's memory management; in CPython, small integers such as these are cached and reused, so this sharing is not guaranteed for every value. For c=2, a different object is used, since the value 2 is new, and 'c' is the reference pointing to it.
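The sharing described above can be checked directly with the `is` operator, which tests whether two names refer to the same object. This is only a minimal sketch and assumes CPython, where caching of small integers is an implementation detail; the names same_ab and same_ac are introduced here purely for illustration.

```python
# Assumes CPython, which caches small integers as an implementation detail.
a = 1
b = 1
c = 2

same_ab = a is b   # do 'a' and 'b' refer to one and the same object?
same_ac = a is c   # 1 and 2 are different values, so different objects

print(same_ab)
print(same_ac)
```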
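We can check the reference count of any object in Python by importing the standard-library module sys and calling its getrefcount function. The value it returns is generally one higher than you might expect, because passing the object as an argument creates a temporary reference.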
import sys
print(sys.getrefcount(a))  # parenthesized form runs on both Python 2 and Python 3
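The count for a shared value such as 1 is hard to interpret, because the interpreter itself holds many references to it. A minimal sketch with a freshly created object makes the effect of each reference easier to see. Marker is a hypothetical placeholder class, not something from the code above, and the sketch assumes CPython's reference counting.

```python
import sys

class Marker(object):
    """Hypothetical placeholder class; any freshly created object would do."""

obj = Marker()
before = sys.getrefcount(obj)      # includes the temporary reference held by the call

alias = obj                        # bind a second name to the same object
with_alias = sys.getrefcount(obj)

del alias                          # drop the extra reference again
after = sys.getrefcount(obj)

print(with_alias - before)         # how much the count grew with the extra name
print(after - before)              # difference once the extra name is deleted
```

Comparing differences rather than absolute values sidesteps the extra temporary reference that getrefcount adds for its own argument.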
We can also understand this better by looking at the memory addresses associated with these values using the built-in id function: a and b refer to the same address in memory, and c refers to a different address. The following code was executed using ipython:
In [1]: a=1
In [2]: b=1
In [3]: id(a)
Out[3]: 38127544
In [4]: id(b)
Out[4]: 38127544
In [5]: c=2
In [6]: id(c)
Out[6]: 38127520
Things get interesting with lists:
>>> L = [1,1,1,1]
>>> id(L[0])
41596856
>>> id(L[1])
41596856
>>> id(L[2])
41596856
>>> id(L[3])
41596856
All elements of the list refer to a single address, which holds the object with value 1.
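The same check can be written without reading addresses by eye. This is a small sketch; distinct_ids and all_same are names introduced here for illustration.

```python
L = [1, 1, 1, 1]

# Collect the identity of every element; one unique id means one shared object.
distinct_ids = set(id(item) for item in L)

# Equivalent check using the identity operator against the first element.
all_same = all(item is L[0] for item in L)

print(len(distinct_ids))
print(all_same)
```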
It would have been more helpful if sys.getrefcount() had been elaborated on more.
I couldn't get it to work:
>>> a
1
>>> b
1
>>> id(a)
140489734678048
>>> id(b)
140489734678048
>>> id(1)
140489734678048
>>> sys.getrefcount(a)
1072
>>> sys.getrefcount(b)
1072
>>> sys.getrefcount(1)
1074
What does the value returned by sys.getrefcount() mean?
Hi
It may be possible that the value 1 is being used elsewhere in your Python code or shell. Did you write any piece of code before this in the same shell?
You might like to test it by reassigning a or b to a different value, like a=2, and seeing whether the value of sys.getrefcount(1) decrements by 1.
Small integers are special. They are indeed the same objects, for optimization purposes (CPython caches the integers from -5 through 256). If you check some higher ones, they will usually be different objects.
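A minimal sketch of that difference, assuming CPython: the integers are built from strings at run time, because equal literals compiled together may be merged into one object by the compiler and would hide the effect. All variable names here are made up for illustration.

```python
# Assumes CPython; the cached range (-5 through 256) is an implementation detail.
small_a = int("100")
small_b = int("100")
big_a = int("1000008971")
big_b = int("1000008971")

small_shared = small_a is small_b   # both names get the cached object for 100
big_shared = big_a is big_b         # two separate objects with equal values

print(small_shared)
print(big_shared)
print(big_a == big_b)               # equality still holds even when identity differs
```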
Thanks. That is interesting to know and makes sense. After seeing your comment I tried very big numbers, and also modified the list to l=[1000008971, 890123455555, 1000008971, 1000008971], but I only see the address 140373007772400 (<type 'int'>) being used. I am using Python 2.7; I guess the behaviour may also be CPU dependent?
Check this link; I think the discussion there covers it better than I can from memory: https://stackoverflow.com/questions/306313/is-operator-behaves-unexpectedly-with-integers
When I run the code separately in ipython, I see the difference you mention in your first comment between x and y as small integers and as large integers. I think in this particular case, because it is a list, it has something to do with mutability and contiguous allocation of memory. I found another link on Python memory management that may be helpful as well: https://stackoverflow.com/questions/11596371/how-does-python-memory-management-work. It mentions that memory management works differently for different data types.
Nice practical look at reference counting and object identity!