Python Module Name Clashes



[ 2012-April-16 21:44 ]

I wasted an hour or two today because I didn't realize that Python imports are relative, not absolute, and that they are relative to two things: the module's own location, and the main script's location. This is horrible, because it is very easy to accidentally pick a name that conflicts with a standard library module, and then it becomes difficult to import that module reliably. To explain how this burned me and what is actually happening, I'll walk you through an example. Let's say we have a Python program called script.py, and two modules in a package: mypackage.mymodule and mypackage.email. The directory structure looks like:

script.py
mypackage/__init__.py
mypackage/email.py
mypackage/mymodule.py

(You can download the sample code if you like)

Now script.py is one line: import mypackage.email, and the email module prints a message when imported. If we run script.py, Python finds mypackage/email.py as we expect:

$ ./script.py 
mypackage.email imported
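
If you don't have the sample code, the layout described above can be recreated with a short sketch. The file contents are my reconstruction from the description (the original files are not shown here), and build_example and run_script are hypothetical helper names.

```python
import os
import subprocess
import sys
import tempfile

# Assumed layout and contents, reconstructed from the description in the text.
FILES = {
    "script.py": "import mypackage.email\n",
    "mypackage/__init__.py": "",
    "mypackage/email.py": "print('mypackage.email imported')\n",
    "mypackage/mymodule.py": "",  # filled in later in the walkthrough
}


def build_example(root):
    """Write the assumed example files under root (hypothetical helper)."""
    for relpath, content in FILES.items():
        path = os.path.join(root, *relpath.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as handle:
            handle.write(content)


def run_script(root):
    """Run script.py with the current interpreter and return its stdout."""
    env = dict(os.environ)
    # Python 3.11+ skips the script directory when this variable is set.
    env.pop("PYTHONSAFEPATH", None)
    result = subprocess.run(
        [sys.executable, os.path.join(root, "script.py")],
        capture_output=True, text=True, check=True, env=env,
    )
    return result.stdout


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_example(root)
        print(run_script(root), end="")
```

Because script.py sits at the root of the tree, its directory is searched first and mypackage is found without any extra configuration.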

Great! Now mymodule.py also imports mypackage.email, and we modify script.py to import this module as well:

$ ./script.py 
mypackage.email imported
mymodule: mypackage.email = <module 'mypackage.email' from '.../mypackage/email.pyc'>

We start working on our program, and at some point we realize that inside mymodule, we want to call the standard library's email.utils.parseaddr() function. So we add one line, import email.utils, and then we get:

$ ./script.py 
mypackage.email imported
Traceback (most recent call last):
  File "./script.py", line 4, in <module>
    import mypackage.mymodule
  File ".../mypackage/mymodule.py", line 8, in <module>
    import email.utils
ImportError: No module named utils

Python can't find email.utils, even though it is part of the standard library! Why not?

Module-Relative Imports

The problem here is that imports are by default relative to the module (in Python 2, up to and including 2.7; see below). Thus, mymodule first searches in its own package, and finds our own email.py instead of the standard library module. If we change the import in mymodule to just import email, we get:

$ python ./script.py 
mypackage.email imported
mymodule: mypackage.email = <module 'mypackage.email' from '.../mypackage/email.pyc'>
mymodule: email = <module 'mypackage.email' from '.../mypackage/email.pyc'>

So how do we get the standard library module? We need to tell Python that we want absolute imports. This is the default in Python 3, but in Python 2.x it has to be enabled in each module that needs it, by adding from __future__ import absolute_import at the top of the file. If we add that line to mymodule, we now get the following output:

$ ./script.py 
mypackage.email imported
mymodule imported; mypackage.email = <module 'mypackage.email' from '.../mypackage/email.py'>
mymodule: email = <module 'email' from '/System/.../python2.7/email/__init__.pyc'>

Victory! We now can access both modules, one as mypackage.email and the other as email. If we explicitly want a relative import, we can use from . import email as local_email, and then we get:

$ ./script.py 
mypackage.email imported
mymodule: mypackage.email = <module 'mypackage.email' from '.../mypackage/email.pyc'>
mymodule: email = <module 'email' from '/System/.../python2.7/email/__init__.pyc'>
mymodule: local_email = <module 'mypackage.email' from '.../mypackage/email.pyc'>
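
To make the three import forms concrete, here is a sketch of what mymodule.py might look like at this point. The file contents are assumptions (the original mymodule.py is not shown), and write_tree and run_script are hypothetical helpers. The sketch runs under Python 3, where the __future__ line is accepted but changes nothing because absolute imports are already the default.

```python
import os
import subprocess
import sys
import tempfile

# Assumed contents of mymodule.py; the original file is not shown in the text.
MYMODULE = '''\
from __future__ import absolute_import  # needed on Python 2, no-op on 3

import email.utils                  # absolute: the standard library package
import mypackage.email              # absolute: our module, fully qualified
from . import email as local_email  # explicit relative: our module again

print("mymodule: email = %s" % email.__name__)
print("mymodule: local_email = %s" % local_email.__name__)
print("mymodule: parseaddr = %r" % (email.utils.parseaddr("Ann <ann@example.com>"),))
'''

FILES = {
    "script.py": "import mypackage.mymodule\n",
    "mypackage/__init__.py": "",
    "mypackage/email.py": "print('mypackage.email imported')\n",
    "mypackage/mymodule.py": MYMODULE,
}


def write_tree(root, files):
    """Write each relative path in files under root (hypothetical helper)."""
    for relpath, content in files.items():
        path = os.path.join(root, *relpath.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as handle:
            handle.write(content)


def run_script(root):
    """Run script.py from the package root and return its stdout."""
    env = dict(os.environ)
    env.pop("PYTHONSAFEPATH", None)  # keep the script directory on sys.path
    result = subprocess.run(
        [sys.executable, os.path.join(root, "script.py")],
        capture_output=True, text=True, check=True, env=env,
    )
    return result.stdout


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        write_tree(root, FILES)
        print(run_script(root), end="")
```

With absolute imports in effect, the name email refers to the standard library package and local_email refers to mypackage.email, so both can be used side by side in the same module.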

Next Problem

Now let's write a unit test for mypackage.mymodule. We create a file called mypackage/mymodule_test.py that imports mypackage.mymodule:

$ ./mypackage/mymodule_test.py 
Traceback (most recent call last):
  File "./mypackage/mymodule_test.py", line 4, in <module>
    import mypackage.mymodule
ImportError: No module named mypackage.mymodule

Ah right, we need to set our PYTHONPATH so it can find the module. Let's try again:

$ PYTHONPATH=. ./mypackage/mymodule_test.py
mypackage.email imported
mypackage.email imported
mymodule: mypackage.email = <module 'mypackage.email' from '.../mypackage/email.pyc'>
mymodule: email = <module 'email' from '.../mypackage/email.pyc'>
mymodule: local_email = <module 'mypackage.email' from '.../mypackage/email.pyc'>

Wait a second, look at that closely: In mymodule, it found our own email.py module for both import email and import mypackage.email, even though we are specifying that we want absolute imports. Didn't we just fix this problem? Why isn't it still fixed?

Script-Relative Imports

The problem now is that Python puts the script's directory at the beginning of the module search path, sys.path, ahead of any directories from PYTHONPATH. In Python, the main script is assumed to be at the root of the package tree. Doing anything differently, like trying to put the mymodule_test script inside mypackage, breaks things. The first warning sign here is that we needed to specify our own PYTHONPATH. However, even with PYTHONPATH set, the script's directory (mypackage/) is still searched first, so a plain import email finds mypackage/email.py before the standard library package. There are two "easy" but unsatisfying solutions: either put all the main Python scripts in the actual root of your package hierarchy, or rename files to avoid name clashes with standard library modules.
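
The search-path behavior is easy to observe directly. The following is a minimal sketch, assuming a stock CPython; first_path_entry is a hypothetical helper that writes a two-line script at the given location, runs it, and reports the sys.path[0] that script sees.

```python
import os
import subprocess
import sys
import tempfile


def first_path_entry(script_path):
    """Create a tiny script at script_path, run it, return its sys.path[0]."""
    with open(script_path, "w") as handle:
        handle.write("import sys\nprint(sys.path[0])\n")
    env = dict(os.environ)
    # Python 3.11+ omits the script directory when this variable is set.
    env.pop("PYTHONSAFEPATH", None)
    result = subprocess.run(
        [sys.executable, script_path],
        capture_output=True, text=True, check=True, env=env,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        package_dir = os.path.join(root, "mypackage")
        os.mkdir(package_dir)
        for script in (os.path.join(root, "script.py"),
                       os.path.join(package_dir, "mymodule_test.py")):
            # Each script reports its own directory as the first entry.
            print(script, "->", first_path_entry(script))
```

The script at the root reports the root, while the script inside mypackage reports mypackage itself, which is exactly the directory that contains the clashing email.py.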

I have, however, found a disgusting hack that fixes this problem: modify the first entry in sys.path. The easy solution is to just remove it (del sys.path[0]). This will require that you manually specify the correct PYTHONPATH. A more complex but "perfect" solution is to modify sys.path[0] to reflect the script's desired location in the package hierarchy, which looks like the following:

if __name__ == "__main__":
    import os
    import sys
    scriptdir = os.path.abspath(os.path.dirname(sys.argv[0]))
    # Check that this Python version does what we expect
    assert sys.path[0] == scriptdir
    # package root is up one level in the hierarchy
    sys.path[0] = os.path.normpath(os.path.join(scriptdir, ".."))

import mypackage.mymodule

This script now does what we expect, no matter how we invoke it:

$ ./mypackage/mymodule_test.py 
mypackage.email imported
mymodule: mypackage.email = <module 'mypackage.email' from '/Users/ej/example/mypackage/email.pyc'>
mymodule: email = <module 'email' from '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/email/__init__.pyc'>
mymodule: local_email = <module 'mypackage.email' from '/Users/ej/example/mypackage/email.pyc'>

Unfortunately, this is a lot of crap to include in the header of a script, but it could be made into its own module. For now, I just use the del sys.path[0] hack in my programs, and always explicitly specify the right PYTHONPATH.
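
As a rough idea of what such a module could look like, here is a sketch. The function name rebase_script_dir and its parameters are my own invention for illustration, not something from an existing library. Unlike the snippet above, it returns None instead of asserting when sys.path[0] is not the script's directory, so it fails quietly on interpreters that behave differently.

```python
import os
import sys


def rebase_script_dir(script_path, levels_up=1, path=None):
    """Replace the script-directory entry at path[0] with an ancestor.

    script_path: usually sys.argv[0]
    levels_up:   how many directories the package root sits above the script
    path:        list to modify; defaults to sys.path
    Returns the new first entry, or None (and changes nothing) if path[0]
    is not the script's directory.
    """
    if path is None:
        path = sys.path
    scriptdir = os.path.abspath(os.path.dirname(script_path))
    if not path or os.path.abspath(path[0]) != scriptdir:
        return None
    root = scriptdir
    for _ in range(levels_up):
        root = os.path.dirname(root)
    path[0] = root
    return root


if __name__ == "__main__":
    # Demonstrate on a copy so the real sys.path is left alone.
    fake_script = os.path.join("example", "mypackage", "mymodule_test.py")
    fake_path = [os.path.abspath(os.path.dirname(fake_script)), "other"]
    print(rebase_script_dir(fake_script, path=fake_path))
    print(fake_path)
```

One catch: the helper itself has to be importable before sys.path is fixed, so it would need to live next to the script or somewhere already on the search path.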

Conclusion

By default, Python's module search path starts with the directory of the main script that is being executed. On Python 2.7 and older, imports are also relative to the module doing the importing. This means if you get weird import errors, check for name clashes.
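
One way to check for a clash is to list every search-path entry that could satisfy a given import. This is a simplified sketch: find_candidates is a hypothetical helper, and it only looks for plain .py files and regular packages, ignoring extension modules, zip imports and namespace packages.

```python
import os
import sys


def find_candidates(name, search_path=None):
    """Return every search-path entry that could satisfy `import name`.

    Entries are returned in search order, so the first one is the one
    Python would actually use. More than one entry means a name clash.
    """
    if search_path is None:
        search_path = sys.path
    hits = []
    for entry in search_path:
        # An empty entry means the current working directory.
        base = os.path.join(entry or os.getcwd(), name)
        is_module = os.path.isfile(base + ".py")
        is_package = os.path.isfile(os.path.join(base, "__init__.py"))
        if is_module or is_package:
            hits.append(entry)
    return hits


if __name__ == "__main__":
    for entry in find_candidates("email"):
        print(entry)
```

Printing the __file__ attribute of the imported module, as the example output in this post does, answers the same question after the fact.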

The higher-level lesson here is that absolute imports are easier to understand. They always do the same thing in all programs, and don't depend on things that can change like the file's location, or the location of the main program. In this respect, Java probably got this right.