Created by: gwideman, Mar 10, 2011 5:45 pm
Revised by: gwideman, Oct 13, 2013 2:45 am (51 revisions)

Note 2013-10-12: This article and its subsidiary pages were written in spring 2011, concerning Python 3.1 and, I think, 3.2. The module site.py, and the module loader importlib.py, have undergone significant changes over the course of the 3.x evolution, so this article is not current, and I have not reviewed it to determine needed changes. Of particular note, this from Brett Cannon, who rewrote the import module in Python (i.e., replacing some C code) for version 3.3:
I'm hoping to find a readable version of the "big diagram" of import logic.

Overview

This article looks at the subject of how to organize locally-developed Python code that is common to several projects, and possibly under continued development.
Understanding the alternatives available requires understanding:
  • what Python features can be employed by Python programs to find and load such common code. (import and the various features it interacts with, such as sys.path.)
  • what different directory arrangements these features facilitate or prohibit.
I embarked on this investigation because there seemed no clear answers in the many Python online docs and paper books which I consulted. It seemed to me that in most programming languages there's a fairly rapid progression from "single file program" to "multi-file program" to "numerous programs which share some code", so I expected there would be a clearly marked trail for that progression.
Instead, I found plenty of online advice of the kind "make an installation with distutils" or "set up virtual env", both of which are (a) currently not-fully-cooked-looking for Python 3.x, and (b) way overkill for small in-house development that nonetheless wants organization better than a single folder full of files.

Outcome: Issue filed

Based on what I uncovered here, I filed a bug report, bugs.python.org/issue11553, which generated some discussion and revisions to the Python docs.

Scope

Here's a rough list of the topics, keywords and so on involved:
  • The module search path:
    • sys.path
    • PYTHONPATH environment variable
    • site-packages directories
    • ".pth" files
  • Package structure
    • A package is a directory containing
      • An __init__.py file
      • Zero or more additional module files
      • Optional subdirectory structure
  • The import statement
    • Make other packages, modules or objects accessible to calling module
    • Different forms of import statement
    • Absolute vs relative imports
  • External visibility of attributes of a module or package
    • Leading underscore for "private"
    • __all__ list
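As a small illustration of the last topic in the list above, here is a hedged sketch of how __all__ and a leading underscore affect "from module import *". The module name visdemo and its three functions are made up for this example, and the module is built in memory with types.ModuleType only to keep the sketch self-contained; a real module would normally be a file on sys.path.

```python
import sys
import types

# Build a throwaway module in memory (hypothetical name "visdemo").
source = '''
__all__ = ["public_func"]      # names exported by "from visdemo import *"

def public_func():
    return "public"

def unlisted_func():           # no underscore, but not listed in __all__
    return "unlisted"

def _private_func():           # leading underscore: "private" by convention
    return "private"
'''
visdemo = types.ModuleType("visdemo")
exec(source, visdemo.__dict__)
sys.modules["visdemo"] = visdemo

# Star-import into a scratch namespace so we can inspect what arrived.
star_ns = {}
exec("from visdemo import *", star_ns)
star_names = sorted(n for n in star_ns if not n.startswith("__"))
print("names brought in by star-import:", star_names)

# Neither mechanism blocks explicit access; both only limit star-imports.
print(visdemo.unlisted_func(), visdemo._private_func())
```

Both conventions are advisory: they shape what a star-import copies, but any attribute can still be reached through the module object.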

The module and package search path

  • Python's import mechanism finds modules and packages by following a search path which consists of a list of directories in the global variable sys.path
  • Contributors to sys.path:
    • Python startup adds standard directories (not sure where they come from, but in any case sufficient for basic operation, and mostly not of interest to ordinary developers?)
    • The directory containing the main module (the module that was launched) gets added to the beginning of sys.path.
    • PYTHONPATH environment variable: string containing a list of dirs separated by colons (Linux/Mac) or semicolons (Windows)
    • xxx.pth files, located only in specific "site-package" directories (more on this below)
    • sys.path is just a Python list, so code can add directories to (or delete dirs from) this list.
      • There are evidently some advanced techniques which do this. Not covered in the current article.
  • NOT included in sys.path
    • The current working directory (os.getcwd()) that was in effect when the Python script was launched seems NOT to be included in sys.path, according to my test of Python 3.1. (It could be added using code if desired.)
  • Module searching order follows order of dirs in sys.path, so the order in which dirs are added to sys.path may matter if there's more than one target package or module by the same name.
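The contributors listed above can be checked with a small experiment. The following is a sketch under stated assumptions, not a definitive test: the directory names app, common and elsewhere and the script name main.py are invented for the example, and the child interpreter is simply whatever sys.executable points to. It launches a script from an unrelated working directory with PYTHONPATH set, then inspects the sys.path that the script reports.

```python
import json
import os
import subprocess
import sys
import tempfile

def norm(path):
    """Normalize a path so entries can be compared reliably."""
    return os.path.normcase(os.path.realpath(path))

with tempfile.TemporaryDirectory() as root:
    script_dir = os.path.join(root, "app")        # holds the main module
    extra_dir = os.path.join(root, "common")      # listed in PYTHONPATH
    launch_dir = os.path.join(root, "elsewhere")  # working dir at launch
    for d in (script_dir, extra_dir, launch_dir):
        os.mkdir(d)

    script = os.path.join(script_dir, "main.py")
    with open(script, "w") as f:
        f.write("import json, sys\nprint(json.dumps(sys.path))\n")

    env = dict(os.environ, PYTHONPATH=extra_dir)
    env.pop("PYTHONSAFEPATH", None)  # keep the default sys.path[0] behavior
    out = subprocess.check_output([sys.executable, script],
                                  cwd=launch_dir, env=env)
    child_path = [norm(p) for p in json.loads(out.decode())]

    first_is_script_dir = child_path[0] == norm(script_dir)
    has_pythonpath_dir = norm(extra_dir) in child_path
    has_launch_dir = norm(launch_dir) in child_path

print("sys.path[0] is the script's directory:", first_is_script_dir)
print("PYTHONPATH directory present:", has_pythonpath_dir)
print("launch-time working directory present:", has_launch_dir)
```

Running the script from a different working directory is what separates "directory of the main module" from "current working directory"; launching from the script's own directory would hide the difference.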

Important subsidiary topics

Several more topics bear on the discussion here. See these separate pages:
  • Python- site-package dirs and .pth files
    • These directories, and .pth files, may be especially useful for structuring collections of locally-developed modules or packages.
  • Python- Package structure
    • Review of the details of package structure, since it interacts with the import statement, and how modules and their objects can be found, or for that matter, hidden.
  • Python- import statement
    • The behavior of the import statement is crucial to the topic of how common functionality is made available to, and invoked by, projects.

So, for local development, how to organize common code modules?

Finally I return to the original question of how to organize the modules or package(s) which implement common functionality. The matters to be decided include the following:
For each decision question, the alternatives are listed with their pros/cons and comments:
  • Form
    • Just a Bunch of Module Files (JBOMF)
      • Simplest imports, and no need to understand package structure, __init__.py etc., but modules need to be on sys.path. However, this choice is more vulnerable to name collisions unless modules are named with collision-avoiding prefixes.
    • One or more proper packages
      • This provides better control over names; e.g., a carefully named package contains numerous more-briefly-named modules. Slight cost of understanding package structure, __init__.py etc.
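The two forms can be compared side by side with a minimal sketch. Everything named here is hypothetical (the acme prefix, the acme package, the textutil module and its shout function), and a temporary directory stands in for wherever the common code would really live.

```python
import importlib
import os
import shutil
import sys
import tempfile

root = tempfile.mkdtemp()

# Form 1: a loose module file, with a collision-avoiding prefix in its name.
with open(os.path.join(root, "acme_textutil.py"), "w") as f:
    f.write("def shout(s):\n    return s.upper()\n")

# Form 2: a package; the short module name lives inside the package namespace.
pkg_dir = os.path.join(root, "acme")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")  # an empty __init__.py is enough to mark a package
with open(os.path.join(pkg_dir, "textutil.py"), "w") as f:
    f.write("def shout(s):\n    return s.upper() + '!'\n")

sys.path.insert(0, root)       # only the containing directory goes on sys.path
importlib.invalidate_caches()  # make sure the freshly written files are seen

import acme_textutil           # loose module: a top-level name
import acme.textutil           # packaged module: qualified by the package name

print(acme_textutil.shout("hi"))
print(acme.textutil.shout("hi"))

# Clean up; the already-imported modules stay usable.
sys.path.remove(root)
shutil.rmtree(root, ignore_errors=True)
```

In the package form only the name acme is visible at the top level, so the short name textutil cannot collide with an unrelated textutil elsewhere on sys.path.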

  • Location
    • [pythonhome]\Lib\site-packages (or unix equivalent)
      • Good location for functionality common to all users, but specific to a particular version of Python.
      • See next entries for some alternatives using site-packages.
    • A package within site-packages
      • This is a good place for a package. It is accessible via sys.path without any additional settings.
      • Recall that a package consists of a folder containing some modules and additional structure. So only the single folder appears at the level of the site-packages directory. Only the package name is directly accessible via sys.path, so the constituent modules are not at risk of name collisions with unrelated modules.
    • A module file stored in site-packages
      • A module within site-packages will be accessible via sys.path.
      • Poor choice, because adding many modules to site-packages adds clutter and invites name collisions.
    • A subdir containing modules, stored within site-packages
      • To make such a subdir accessible, add a .pth file to point to it.
      • Less cluttered than adding the modules directly to site-packages; however, each individual module is then directly exposed and vulnerable to name collisions.
    • [userhome]...Python##\site-packages
      • Good location for functionality for a specific user, and specific to a particular version of Python.
      • Modules and packages placed directly in the user's site-packages directory will be accessible via sys.path (for that user) with no further settings.
      • Similar comments apply here as to the site-wide site-packages case.
      • This location would also be useful for a developer/user who doesn't have write access to the site-wide site-packages directory.
    • SomeOtherConvenientDir
      • A location outside the special site-package dirs would be preferred for common packages or modules that apply to multiple versions of Python (and multiple users).
      • (And there's nothing to stop further organizing such a dir by Python version or user if needed.)
      • This location might also be used by a dev/user who lacks write access to the site-wide dir, or to personal dirs of other users.
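To see where the site-wide and per-user site-packages locations actually are for a given interpreter, the standard site module can be queried. This is a minimal sketch; the defensive getattr is an assumption on my part to cope with environments where site.getsitepackages() is not provided, and the printed paths will differ per machine and Python version.

```python
import site
import sys

# Site-wide site-packages directories for this interpreter.
# (Guarded in case the running environment's site module lacks the function.)
get_site = getattr(site, "getsitepackages", None)
site_wide = get_site() if get_site else []

# Per-user site-packages directory (it need not exist on disk yet).
user_site = site.getusersitepackages()

print("site-wide:", site_wide)
print("per-user :", user_site)
print("user site enabled:", site.ENABLE_USER_SITE)

# Which of these are actually on sys.path right now?
on_path = [d for d in site_wide + [user_site] if d in sys.path]
print("on sys.path:", on_path)
```

The per-user directory is only added to sys.path if it exists and user site handling is enabled, so it may be reported here without appearing in the last line.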
  • Adding to sys.path
    • [do nothing]
      • Modules or packages directly in one of the site-package directories are found automatically.
      • (But see note above about using a containing directory and .pth file for JBOMFs.)
    • Use PYTHONPATH environment variable
      • (If NOT putting packages or modules directly inside a site-packages directory.)
      • PYTHONPATH can be set as a system-wide variable, or on a per-user or even per-run basis.
      • It is especially useful if the particular version of packages/modules might need to be swapped in or out at the time the main script is launched.
    • ".pth" file (in site-package dir)
      • This method adds paths to sys.path with less vulnerability to mistakenly changed or omitted PYTHONPATH.
      • Arguably more robust, but also less flexibly changed per run, for example during development.
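The .pth mechanism can be tried without touching a real site-packages directory. In this sketch, which rests on a few assumptions, a temporary directory (the made-up fake-site-packages) stands in for a site directory, common.pth and common_modules are hypothetical names, and site.addsitedir() is used to ask the site module to process the stand-in directory the way it processes real site directories at startup.

```python
import os
import site
import sys
import tempfile

root = tempfile.mkdtemp()
site_dir = os.path.join(root, "fake-site-packages")   # stand-in site dir
common_dir = os.path.join(root, "common_modules")     # where shared code lives
os.mkdir(site_dir)
os.mkdir(common_dir)

# A .pth file is plain text: one directory per line.
with open(os.path.join(site_dir, "common.pth"), "w") as f:
    f.write(common_dir + "\n")

# Process site_dir as a site directory: it is appended to sys.path and
# its .pth files are read, appending each existing directory they name.
site.addsitedir(site_dir)

pth_dir_added = os.path.abspath(common_dir) in sys.path
print("directory named in common.pth is on sys.path:", pth_dir_added)
```

Directories named in a .pth file are only added if they exist, which is one reason a stale .pth entry fails silently rather than with an error.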
  • import
    • import mymodule
      • Simplest way for the caller to get access to a non-packaged module. Requires using fully qualified names in the body of the code.
      • x = mymodule.attr
      • y = mymodule.someclass()
    • import mypackage
      • This gives the caller access to the package, and possibly to subsidiary modules IF mypackage's __init__.py imports them explicitly. Access the module using fully qualified names.
      • x = mypackage.mymodule.attr
      • y = mypackage.mymodule.someclass()
    • import mypackage.mymodule as myname
      • Imports only the specific module from mypackage, and binds it to a short name, so the package qualifier is not needed.
      • x = myname.attr
      • y = myname.someclass()
    • from mypackage.mymodule import attr
      • The "from" form allows importing specific attributes, which can then be referenced with just a single name. Increased care is needed to avoid name collisions.
      • This form is especially useful for importing class definitions, where the containing module's name may match the name of the class, and hence seems redundant.
      • x = attr
      • from mypackage.someclassmod import someclass
      • y = someclass()
    • Numerous other permutations
      • See separate article on import.
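The import forms above can be exercised against a throwaway package. This is a sketch under stated assumptions: it reuses the placeholder names mypackage, mymodule, attr and someclass from the text, writes them into a temporary directory, and deliberately leaves __init__.py empty to show what a bare "import mypackage" does and does not load.

```python
import importlib
import os
import shutil
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "mypackage")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")  # deliberately does NOT import mymodule
with open(os.path.join(pkg_dir, "mymodule.py"), "w") as f:
    f.write("attr = 42\n\nclass someclass:\n    pass\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

import mypackage
# With an empty __init__.py the submodule has not been loaded yet.
loaded_by_bare_import = hasattr(mypackage, "mymodule")

import mypackage.mymodule as myname        # module under a short alias
from mypackage.mymodule import attr        # single attribute, unqualified
from mypackage.mymodule import someclass   # class, unqualified

x = myname.attr
y = someclass()
# Importing the submodule also binds it as an attribute of the package.
loaded_after_submodule_import = hasattr(mypackage, "mymodule")

print("submodule visible after bare import:", loaded_by_bare_import)
print("submodule visible after importing it:", loaded_after_submodule_import)

# Clean up; the already-imported modules stay usable.
sys.path.remove(root)
shutil.rmtree(root, ignore_errors=True)
```

If __init__.py contained "from . import mymodule", the first check would report the submodule as visible, which is the "IF" in the "import mypackage" entry.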

Reference documents

PEPs

Tutorial

Python Language Reference

Python Library Reference

Source files

  • [PYTHONHOME]\Lib\importlib\ module and test files
    • __init__.py: has incorrect reference (PEP 275 should be 273)
  • [PYTHONHOME]\Lib\site.py (for site-packages and .pth behavior)
    • Issue: uses incorrect case for Lib directory when searching for site-packages
  • [PYTHONHOME]\Lib\distutils\command\install.py (see extra_path and install_path_file comments)

Related Tracker Issues

Other references


Some Terminology

Notes-to-self on some terms.
  • "Global": Evidently in the Python context "global" is used to mean "global to a particular module" as opposed to program-wide globals, and as distinct from "locals", the local variables of a function.
    • built-in globals() function returns reference to module's dictionary.
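      • or import own module (import mymodule as me ... me.__dict__ ...)
        • Not sure why dict is not available without "importing self"
    • Python does have program-wide global variables, accessible via builtins.
      • import builtins; builtins.myvar = 'myvarvalue', then later print(myvar) from any module. (Subscripting __builtins__ directly is unreliable: in CPython it is a module in the main script but a dict in imported modules.)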
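The terminology notes above can be made concrete with a short sketch. The module name termdemo and the names counter, bump and namespace are invented for illustration, and the module is created in memory rather than from a file. Setting attributes on builtins is shown only to demonstrate the lookup rule; it is generally discouraged in real code.

```python
import builtins
import types

# A throwaway module (hypothetical name "termdemo") built in memory.
termdemo = types.ModuleType("termdemo")
exec(
    "counter = 0\n"
    "def bump():\n"
    "    global counter          # 'global' means: this module's namespace\n"
    "    counter += 1\n"
    "def namespace():\n"
    "    return globals()\n",
    termdemo.__dict__,
)

counter = 100                     # a different 'counter', global to THIS module
termdemo.bump()
print(termdemo.counter, counter)  # the two module-level names are independent

# globals() inside termdemo is the very same dict as termdemo.__dict__.
same_dict = termdemo.namespace() is termdemo.__dict__

# Program-wide name: an attribute set on the builtins module is visible
# from every module, because name lookup falls back to builtins last.
builtins.myvar = "myvarvalue"
exec("seen = myvar", termdemo.__dict__)   # lookup from inside termdemo
seen_here = myvar                         # lookup from this module
del builtins.myvar                        # clean up after the demonstration
```

The lookup order is local, then module-level ("global"), then builtins, which is why a name placed in builtins is visible everywhere but can be shadowed by any module.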

Useful strings for Google searches of Python docs

site:python.org
site:docs.python.org
+pth v3 "Python Standard Library" "Python Language Reference" "The Python Tutorial"
-"Usage Statistics"