Should pybindgen be using PyObject_GC_Del?

Asked by Mike Owen

I'm seeing some fairly nasty memory leaks occurring in my pybindgen wrapped code, and I'm wondering if garbage collection is being properly supported in the generated code. For instance, according to the documentation at http://docs.python.org/c-api/gcsupport.html objects that are containers of other objects that could also be containers need to be allocated using PyObject_GC_New and deallocated with PyObject_GC_Del. There's also the issue with PyObject_GC_Track, but I'm not sure if that's applicable. I note that when pybindgen wrapped objects have allow_subclassing=True they are allocated with PyObject_GC_New. However, it appears in no case are they correspondingly deallocated with PyObject_GC_Del. Does this mean there's a potential memory problem here? In my case I have a lot of objects that are exposed as pointers to objects that are owned by other objects. I use the custodian memory pattern to handle these cases, but I'm concerned that garbage collection is not getting to destroy these things at the appropriate times, which would explain the memory leaks I'm seeing.

I haven't been able to boil this down to a small standalone example yet unfortunately, but I can try if it's helpful. Right now I'm just wondering if there's a potential problem here due to the lack of PyObject_GC_Del, or if this isn't an issue.

Question information

Language:
English Edit question
Status:
Solved
For:
PyBindGen Edit question
Assignee:
No assignee Edit question
Solved by:
Mike Owen
Solved:
Last query:
Last reply:
Revision history for this message
Gustavo Carneiro (gjc) said :
#1

If you use reference counting, I think it is possible that a reference cycle is being created with the help of the links between C++ objects, if those links are hidden from Python. In that case, we would have to use PyObject_GC_New/Del, for starter, but also implement a special tp_traverse for the class that understands the links in C++ space. But this is a unlikely scenario, I think... Regarding PyObject_GC_Track/Untrack, these are lower level APIs, which are already used automatically by PyObject_GC_New/Del.

If you use custodians, notice that it asserts if the "custodian" class does not have allow_subclassing=True because it needs to access an instance dictionary, to access a __wards__ attribute, which is a list of wards. So if you use custodian the python GC should already be able to detect the cycles between wards and custodians and free them all. It works in the unit tests, so I would hope it works also in the real world.

So you are saying PyObject_GC_Del is not used, even with allow_subclassing=True? Hmm.. I'll have to investigate in that case, because I think it should...

Revision history for this message
Mike Owen (mikeowen) said :
#2

Well in my case I'm not implementing any reference counting under the covers. The links between objects are that some of the C++ objects are data members of others, hence the custodian and ward pattern. Some of these member objects are quite large though (big arrays), and it seems that accessing them from python repeatedly is chewing up memory until the machine actually crashes. This same code does not have this problem under Boost.Python, so I'm pretty sure it's a problem with my pybindgen bindings (it could easily be me misusing it somewhow of course). I got suspicious when I noticed that in the pybindgen generated code there are over 3600 PyObject_GC_New calls, but not a single PyObject_GC_Del! I don't even see in pybindgen itself anywhere that PyObject_GC_Del would be generated?

Revision history for this message
Gustavo Carneiro (gjc) said :
#3

Looking at the generated code, it calls the type object's tp_free slot, which should do the right thing:

static void
_wrap_PyFoo__tp_dealloc(PyFoo *self)
{
    [...]
    self->ob_type->tp_free((PyObject*)self);
}

About your memory leak, I have no idea. But I understand debugging memory leaks is always hard... maybe valgrind will help, at least to discover which object is leaking?..

Revision history for this message
Gustavo Carneiro (gjc) said :
#4

Also keep in mind that the Python GC may not be working as fast as you would like. To be sure, test by inserting the following code at critical places:

   while gc.collect():
       pass

This code will essentially tell Python: "collect all memory that you can _right now_".

Revision history for this message
Mike Owen (mikeowen) said :
#5

OK, here's a small example that seems odd to me. This could just be my misunderstanding how the custodian pattern should be working, so please correct me if so!

Consider this example module exposing two classes:

----------------------------------------------------------------------------------------------------------------
Header example.hh:

#include <iostream>

using namespace std;

class A {
public:
  A(int val = 0): mVal(val) { std::cout << "A constructor." << std::endl; }
  virtual ~A() {}
  int val() const { return mVal; }
  void val(int x) { mVal = x; }
private:
  int mVal;
};

class B {
public:
  B(): mAptr(new A) { std::cout << "B constructor with " << mAptr << std::endl; }
  virtual ~B() { delete mAptr; }
  A* Aptr() { return mAptr; }
private:
  A* mAptr;
};

----------------------------------------------------------------------------------------------------------------
Pybindgen generation code:

from pybindgen import *
import sys

mod = Module("example")
mod.add_include('"example.hh"')

a = mod.add_class("A", allow_subclassing=True)
a.add_constructor([param("int", "val", default_value="0")])
a.add_instance_attribute("val", "int", getter="val", setter="val")

b = mod.add_class("B", allow_subclassing=True)
b.add_constructor([])
b.add_method("Aptr", retval("A*", custodian=0), [])

f = open("example.C", "w")
mod.generate(FileCodeSink(f))
f.close()

----------------------------------------------------------------------------------------------------------------
Now run the following test code:

import example
b = example.B()
a = b.Aptr()
for i in xrange(100):
    b.Aptr().val = i
    print a.val, b.Aptr().val
import gc
while gc.collect():
    pass
print b.__wards__

----------------------------------------------------------------------------------------------------------------
Here's the output:

alastor1{owen}415: python -i example.py
A constructor.
B constructor with 0x819b20
0 0
1 1
2 2

.... I'm clipping out the expected middle numbers here....

96 96
97 97
98 98
99 99
[<example.A object at 0x2aaaaabd4e60>, <example.A object at 0x2aaaaabd4ea8>, <example.A object at 0x2aaaaabd4ef0>, <example.A object at 0x2aaaaabd4f38>, <example.A object at 0x2aaaaabd4f80>, <example.A object at 0x2aaaaabd4fc8>, <example.A object at 0x2aaaaabd9050>, <example.A object at 0x2aaaaabd9098>, <example.A object at 0x2aaaaabd90e0>, <example.A object at 0x2aaaaabd9128>, <example.A object at 0x2aaaaabd9170>, <example.A object at 0x2aaaaabd91b8>, <example.A object at 0x2aaaaabd9200>, <example.A object at 0x2aaaaabd9248>, <example.A object at 0x2aaaaabd9290>, <example.A object at 0x2aaaaabd92d8>, <example.A object at 0x2aaaaabd9320>, <example.A object at 0x2aaaaabd9368>, <example.A object at 0x2aaaaabd93b0>, <example.A object at 0x2aaaaabd93f8>, <example.A object at 0x2aaaaabd9440>, <example.A object at 0x2aaaaabd9488>, <example.A object at 0x2aaaaabd94d0>, <example.A object at 0x2aaaaabd9518>, <example.A object at 0x2aaaaabd9560>, <example.A object at 0x2aaaaabd95a8>, <example.A object at 0x2aaaaabd95f0>, <example.A object at 0x2aaaaabd9638>, <example.A object at 0x2aaaaabd9680>, <example.A object at 0x2aaaaabd96c8>, <example.A object at 0x2aaaaabd9710>, <example.A object at 0x2aaaaabd9758>, <example.A object at 0x2aaaaabd97a0>, <example.A object at 0x2aaaaabd97e8>, <example.A object at 0x2aaaaabd9830>, <example.A object at 0x2aaaaabd9878>, <example.A object at 0x2aaaaabd98c0>, <example.A object at 0x2aaaaabd9908>, <example.A object at 0x2aaaaabd9950>, <example.A object at 0x2aaaaabd9998>, <example.A object at 0x2aaaaabd99e0>, <example.A object at 0x2aaaaabd9a28>, <example.A object at 0x2aaaaabd9a70>, <example.A object at 0x2aaaaabd9ab8>, <example.A object at 0x2aaaaabd9b00>, <example.A object at 0x2aaaaabd9b48>, <example.A object at 0x2aaaaabd9b90>, <example.A object at 0x2aaaaabd9bd8>, <example.A object at 0x2aaaaabd9c20>, <example.A object at 0x2aaaaabd9c68>, <example.A object at 0x2aaaaabd9cb0>, <example.A object at 0x2aaaaabd9cf8>, <example.A object at 0x2aaaaabd9d40>, <example.A object at 0x2aaaaabd9d88>, <example.A object at 0x2aaaaabd9dd0>, <example.A object at 0x2aaaaabd9e18>, <example.A object at 0x2aaaaabd9e60>, <example.A object at 0x2aaaaabd9ea8>, <example.A object at 0x2aaaaabd9ef0>, <example.A object at 0x2aaaaabd9f38>, <example.A object at 0x2aaaaabd9f80>, <example.A object at 0x2aaaaabd9fc8>, <example.A object at 0x2aaaaabd8050>, <example.A object at 0x2aaaaabd8098>, <example.A object at 0x2aaaaabd80e0>, <example.A object at 0x2aaaaabd8128>, <example.A object at 0x2aaaaabd8170>, <example.A object at 0x2aaaaabd81b8>, <example.A object at 0x2aaaaabd8200>, <example.A object at 0x2aaaaabd8248>, <example.A object at 0x2aaaaabd8290>, <example.A object at 0x2aaaaabd82d8>, <example.A object at 0x2aaaaabd8320>, <example.A object at 0x2aaaaabd8368>, <example.A object at 0x2aaaaabd83b0>, <example.A object at 0x2aaaaabd83f8>, <example.A object at 0x2aaaaabd8440>, <example.A object at 0x2aaaaabd8488>, <example.A object at 0x2aaaaabd84d0>, <example.A object at 0x2aaaaabd8518>, <example.A object at 0x2aaaaabd8560>, <example.A object at 0x2aaaaabd85a8>, <example.A object at 0x2aaaaabd85f0>, <example.A object at 0x2aaaaabd8638>, <example.A object at 0x2aaaaabd8680>, <example.A object at 0x2aaaaabd86c8>, <example.A object at 0x2aaaaabd8710>, <example.A object at 0x2aaaaabd8758>, <example.A object at 0x2aaaaabd87a0>, <example.A object at 0x2aaaaabd87e8>, <example.A object at 0x2aaaaabd8830>, <example.A object at 0x2aaaaabd8878>, <example.A object at 0x2aaaaabd88c0>, <example.A object at 0x2aaaaabd8908>, <example.A object at 0x2aaaaabd8950>, <example.A object at 0x2aaaaabd8998>, <example.A object at 0x2aaaaabd89e0>, <example.A object at 0x2aaaaabd8a28>, <example.A object at 0x2aaaaabd8a70>, <example.A object at 0x2aaaaabd8ab8>, <example.A object at 0x2aaaaabd8b00>, <example.A object at 0x2aaaaabd8b48>, <example.A object at 0x2aaaaabd8b90>, <example.A object at 0x2aaaaabd8bd8>, <example.A object at 0x2aaaaabd8c20>, <example.A object at 0x2aaaaabd8c68>, <example.A object at 0x2aaaaabd8cb0>, <example.A object at 0x2aaaaabd8cf8>, <example.A object at 0x2aaaaabd8d40>, <example.A object at 0x2aaaaabd8d88>, <example.A object at 0x2aaaaabd8dd0>, <example.A object at 0x2aaaaabd8e18>, <example.A object at 0x2aaaaabd8e60>, <example.A object at 0x2aaaaabd8ea8>, <example.A object at 0x2aaaaabd8ef0>, <example.A object at 0x2aaaaabd8f38>, <example.A object at 0x2aaaaabd8f80>, <example.A object at 0x2aaaaabd8fc8>, <example.A object at 0x2aaaaabda050>, <example.A object at 0x2aaaaabda098>, <example.A object at 0x2aaaaabda0e0>, <example.A object at 0x2aaaaabda128>, <example.A object at 0x2aaaaabda170>, <example.A object at 0x2aaaaabda1b8>, <example.A object at 0x2aaaaabda200>, <example.A object at 0x2aaaaabda248>, <example.A object at 0x2aaaaabda290>, <example.A object at 0x2aaaaabda2d8>, <example.A object at 0x2aaaaabda320>, <example.A object at 0x2aaaaabda368>, <example.A object at 0x2aaaaabda3b0>, <example.A object at 0x2aaaaabda3f8>, <example.A object at 0x2aaaaabda440>, <example.A object at 0x2aaaaabda488>, <example.A object at 0x2aaaaabda4d0>, <example.A object at 0x2aaaaabda518>, <example.A object at 0x2aaaaabda560>, <example.A object at 0x2aaaaabda5a8>, <example.A object at 0x2aaaaabda5f0>, <example.A object at 0x2aaaaabda638>, <example.A object at 0x2aaaaabda680>, <example.A object at 0x2aaaaabda6c8>, <example.A object at 0x2aaaaabda710>, <example.A object at 0x2aaaaabda758>, <example.A object at 0x2aaaaabda7a0>, <example.A object at 0x2aaaaabda7e8>, <example.A object at 0x2aaaaabda830>, <example.A object at 0x2aaaaabda878>, <example.A object at 0x2aaaaabda8c0>, <example.A object at 0x2aaaaabda908>, <example.A object at 0x2aaaaabda950>, <example.A object at 0x2aaaaabda998>, <example.A object at 0x2aaaaabda9e0>, <example.A object at 0x2aaaaabdaa28>, <example.A object at 0x2aaaaabdaa70>, <example.A object at 0x2aaaaabdaab8>, <example.A object at 0x2aaaaabdab00>, <example.A object at 0x2aaaaabdab48>, <example.A object at 0x2aaaaabdab90>, <example.A object at 0x2aaaaabdabd8>, <example.A object at 0x2aaaaabdac20>, <example.A object at 0x2aaaaabdac68>, <example.A object at 0x2aaaaabdacb0>, <example.A object at 0x2aaaaabdacf8>, <example.A object at 0x2aaaaabdad40>, <example.A object at 0x2aaaaabdad88>, <example.A object at 0x2aaaaabdadd0>, <example.A object at 0x2aaaaabdae18>, <example.A object at 0x2aaaaabdae60>, <example.A object at 0x2aaaaabdaea8>, <example.A object at 0x2aaaaabdaef0>, <example.A object at 0x2aaaaabdaf38>, <example.A object at 0x2aaaaabdaf80>, <example.A object at 0x2aaaaabdafc8>, <example.A object at 0x2aaaaabdb050>, <example.A object at 0x2aaaaabdb098>, <example.A object at 0x2aaaaabdb0e0>, <example.A object at 0x2aaaaabdb128>, <example.A object at 0x2aaaaabdb170>, <example.A object at 0x2aaaaabdb1b8>, <example.A object at 0x2aaaaabdb200>, <example.A object at 0x2aaaaabdb248>, <example.A object at 0x2aaaaabdb290>, <example.A object at 0x2aaaaabdb2d8>, <example.A object at 0x2aaaaabdb320>, <example.A object at 0x2aaaaabdb368>, <example.A object at 0x2aaaaabdb3b0>, <example.A object at 0x2aaaaabdb3f8>, <example.A object at 0x2aaaaabdb440>, <example.A object at 0x2aaaaabdb488>, <example.A object at 0x2aaaaabdb4d0>, <example.A object at 0x2aaaaabdb518>, <example.A object at 0x2aaaaabdb560>, <example.A object at 0x2aaaaabdb5a8>, <example.A object at 0x2aaaaabdb5f0>, <example.A object at 0x2aaaaabdb638>, <example.A object at 0x2aaaaabdb680>, <example.A object at 0x2aaaaabdb6c8>, <example.A object at 0x2aaaaabdb710>, <example.A object at 0x2aaaaabdb758>, <example.A object at 0x2aaaaabdb7a0>]
>>>

Now what surprises me here is that the __wards__ preserved all those temporary loop variables, even though they should be garbage collectible. I think this is the kind of thing eating up memory in my real application, and maybe this is the expected behavior? If so, I probably don't want to be using the custodian pattern, but I'm not sure what I should be doing instead such that the b.Aptr method is not making copies?

Revision history for this message
Gustavo Carneiro (gjc) said :
#6

In your example, 'b' is the custodian object, therefore for as long as b is still alive then the A instance will not be garbage collected.

This example does manifest one problem, I admit, in that calling repeatedly b.Aptr() will create an additional wrapper for each call. The reason why this happens is that pybindgen does not use by default a "wrapper registry". A "wrapper registry" is a piece of code that is inserted that registers python wrappers as they are created, and each time a C++ to python of an object is requested it will first check if a Python wrapper already exists and if so reuse it.

For C++ code, you can enable the wrapper registry by:

    pybindgen.settings.wrapper_registry = pybindgen.settings.StdMapWrapperRegistry

For C code you'd have to implement your own wrapper registry implementation. Because I have not found time and patience to implement a wrapper registry for C code, and pybindgen does not know whether it will be used to generate C or C++ code, this StdMapWrapperRegistry is not enabled by default. But if you are using C++ code, it is definitely advisable to use it.

Revision history for this message
Mike Owen (mikeowen) said :
#7

My stuff is C++, so I gave the wrapper registry a try. Sadly, I found that my big application works for a while with this option, but eventually crashes with errors that appear to be due to getting back the wrong objects from queries. :(

However, I have found a fix that I think improves things when using __wards__. It seems to me that once the reference count for an object in the __wards__ list falls to 1, this means that the only thing that is still keeping this ward alive is the __wards__ list itself and it can be safely deleted since no one outside the custodian cares or knows about the ward anymore. I've tried modifying the _add_ward method in cppclass_typehandlers.py to do the following:

def _add_ward(wrapper, custodian, ward):
    wards = wrapper.declarations.declare_variable(
        'PyObject*', 'wards')
    wards_size = wrapper.declarations.declare_variable(
        'Py_ssize_t', 'wards_size')
    i = wrapper.declarations.declare_variable(
        'Py_ssize_t', 'i')
    item = wrapper.declarations.declare_variable(
        'PyObject*', 'item')
    wrapper.after_call.write_code(
        "%(wards)s = PyObject_GetAttrString(%(custodian)s, (char *) \"__wards__\");"
        % vars())
    wrapper.after_call.write_code(
        "if (%(wards)s == NULL) {\n"
        " PyErr_Clear();\n"
        " %(wards)s = PyList_New(0);\n"
        " PyObject_SetAttrString(%(custodian)s, (char *) \"__wards__\", %(wards)s);\n"
        "}" % vars())
    wrapper.after_call.write_code(
        'wards_size = PyList_Size(%(wards)s);\n'
        'if (%(wards_size)s > 0) {\n'
        ' for (%(i)s = %(wards_size)s - 1; %(i)s != 0; %(i)s--) {\n'
        ' %(item)s = PyList_GetItem(%(wards)s, %(i)s);\n'
        ' if (Py_REFCNT(%(item)s) <= 1) {\n'
        ' PySequence_DelItem(%(wards)s, %(i)s);\n'
        ' }\n'
        ' }\n'
        '}\n' % vars())
    wrapper.after_call.write_code("PyList_Append(%s, %s);" % (wards, ward))
    wrapper.after_call.add_cleanup_code("Py_DECREF(%s);" % wards)

So you see all I've done is add a loop that scans the __wards__ list and removes any objects for which the reference count has fallen to 1. This modification gets rid of all the temporary wards in the code example in my earlier post, and better still seems to solve a large fraction of the memory leak problems that have been killing my full up main application. Sadly though it appears that memory leaks are still occurring somewhere and taking me down, but this change *significantly* improves the behavior and lets me run much longer.

I've confirmed that all the pybindgen unit tests pass with this change (except those using pygccxml for automatic parsing since I don't have that installed). I'm not sure this is necessarily the best place to put this kind of culling of the __wards__ list, but I think something like this needed to keep the __wards__ from growing in an unbounded fashion. Is this sort of change something you want for pybindgen, or am I missing something this would break?

Revision history for this message
Mike Owen (mikeowen) said :
#8

I've found the other memory leak problem that's been plaguing me, and this one is definitely all me. In one of the methods I provided a while back to support the "sq_item" slot to provide support for __getitem__ container type accesses on pybindgen wrapped objects, I neglected a necessary Py_DECREF. On line 281 of "pytypeobject.py" the following should be inserted:

Py_DECREF(args);

The lack of this was causing a leak of a tuple every time the __getitem__ method was called, which led to memory hilarity. This change combined with the previous one I mention above for culling the __wards__ attribute seem to cure the problems I've been having -- I'm now seeing stable memory size behavior. This missing Py_DECREF thing is definitely a bug and I imagine it won't be a problem putting this is the main version of pybindgen. I'm not sure how you feel about culling the __wards__ list though. Since these changes cure the problems I've been having though I'll say this solves my problem.

Revision history for this message
Gustavo Carneiro (gjc) said :
#9

The Py_DECREF was committed. Thanks.

Regarding __wards__ culling. First of all, it is slightly worrying that your code crashes when you activate the wrapper registry. The wards culling would not be needed if it was used.

For the culling itself, I have to think about it some more.

The problem is that I implemented custodian option for some guy that never gave me any more feedback, and I don't really use it in my code at all. So the memory model for custodian/ward is not 100% to me, I have to admit :P I guess PBG strives to provide equivalent behaviour to boost::python's custodian_and_ward. Now, if only I could understand clearly boost's custodian_and_ward...

Revision history for this message
Mike Owen (mikeowen) said :
#10

I see your problem with implementing the culling of __wards__ since you're not sure if it will break some expected behavior. Still, it's hard for me to see how this would be a problem since the ref count of 1 should indicate no one outside the custodian has a reference to the object.

In my case the reason the wards can grow out of control is because I use the custodian pattern to expose the __getitem__ elements of big containers (like std::vector) of objects. My application is a physics simulation code that cycles the same set of code over and over again ~10^4 times. At each of those cycles some analysis methods have to index each element of several of these big containers, giving us another factor of ~10^4. You can imagine at these numbers the __wards__ will just grow ridiculously and unnecessarily, since the vast majority of those index operations result in temporary objects that increment some sum and are then thrown away.

I guess I see a few ways to proceed:

1. I can try to make a small example that breaks the use of the wrapper registry, and we can try and see what is going wrong there.

2. Even if we do that, I think the current default __wards__ implementation has a fairly serious problem without the culling. Perhaps it would be possible to make the culling an optional thing that can be selected by the user when the wrappers are generated?

Revision history for this message
Gustavo Carneiro (gjc) said :
#11

I originally thought that custodian/ward pattern was meant that the custodian was supposed to keep the ward alive for as long as the custodian is alive. From that, your premise, that if refcount == 1 in the ward it means no one but the custodian is referencing the ward and so the ward should be freed, is wrong. The ward should not be freed because it is part of the contract that it should be kept alive. Now I have to admit that I do not know the reason why the boost people invented such a contract.

Now for the memory growth, I think if the wrapper registry is not used we could try an alternative approach: when creating a new wrapper for an object with custodian/ward memory model, first check if custodian.__wards__ already contains a wrapper for the same object. If so, INCREF the existing wrapper and return it, else create a new wrapper and add it to __wards__.

Revision history for this message
Mike Owen (mikeowen) said :
#12

That sounds workable, but I think that's the same thing you're trying to do with the wrapper registry. Either solution (using the wrapper registry vs. making the __wards__ act more like the wrapper registry) would probably work for my case. It would probably be good for me to figure out how to make a small test case that reproduces the the errors I'm encountering trying to use the registry, so that either/both solutions could be tested against this pathology.