What is __init__.py?
A Google search leads to stackoverflow which links to the python documentation. The gist is that
__init__.py is used to indicate that a directory is a python package. (A quick note about packages vs. modules from the python docs: “From a file system perspective, packages are directories and modules are files.”). If we want a folder to be considered a python package, we need to include a
__init__.py file. But what should we put in it?
What to put in
There is a range of options of what to put in an
__init__.py file. The most minimal thing to do is to leave the
__init__.py file totally empty. Moving slightly away from this, while still keeping things simple, you can use an
__init__.py only for determining import order. Another step up on the spectrum, you can use the
__init__.py file to define the API of the package. And the final step is you can actually just define your entire package in the
I will dig into the pro’s an cons of each of these 4 approaches and give examples of them in the wild in the rest of the post.
#1 Leave it Empty
This approach is the simplest to communicate and the simplest to enforce. There is no gray area about not including anything in an
__init__.py. The file will serve it’s purpose of indicating the folder should be considered a python package, and nothing else.
One disadvantage of this approach is that it fails to take advantage of the special status of the
__init__.py file. The code in the file will be executed in the course of importing any of the package’s submodules. For instance, if we have a project with the following directory structure:
foo ├── __init__.py ├── a.py ├── b.py └── bar └── __init__.py
And we want to import the “a” module, the statement
from foo import a looks in the
foo directory, sees the
__init__.py. It knows to treat
foo as a package, and it executes it’s
__init__.py, then looks for how to import
a. (You can verify this behavior by recreating this directory structure and putting print statements in the files. If you do
from foo import c, you’ll get an
ImportError, but not after the print statement in
foo/__init__.py executes. If you are interested in digging into the python source code, the code for
importlib is available on github. Also the spec for the generic Importer Protocol is in PEP-302). By leaving our
__init__.py file blank, we miss out on the opportunity to leverage this.
Another disadvantage is related to namespaces. In the directory structure listed above, importing
foo anywhere will be useless. In order to access any of our actual code, we have to import sub modules. In addition to making import statements longer, naming things is hard. It is unfortunate to come up with a great name for a package or a sub-package and then also need to come up with good names for sub-modules since that is what you will end up referring to.
An example in the python source of this approach being used is in urllib.
#2 Enforce Import Order
This is what mssaxm over at axialcorps.com recommends in a post titled 5 Simple Rules For Building Great Python Packages. This approach takes advantage of the special behavior of
__init__.py while still keeping the file simple. This approach really shines if your sub-modules have some static initialization. For example, let’s say
a.py writes a config file when it is imported, and
b.py reads from that file. We could have our
__init__.py ensure that
a.py is always run before
b.py by having it’s contents be:
import a import b
Then when we run
import foo.b, it is guaranteed that
a.py would be executed before
b.py. (This dependency example is a bit contrived; I do not mean to suggest that sub-modules should make a habit of writing out files on import.)
Since this approach does not allow non-import code in the
__init__.py, it seems to suffer from the namespace issue described in #1 above. However, this can be circumvented by importing member from individual packages. For instance, if we had a
my_func that we wanted to be able to access as
import foo; foo.my_func(), we could put
a.py and then have our
from a import my_func.
#3 Define API
In this approach, the
__init__.py file houses the most visible functionality for the package. It pieces together the functionality from the sub-modules. This approach has the advantage of providing a good starting point to look into a package, and makes it clear what the top level functionality is.
The disadvantage is that your
__init__.py file is more complicated. The more complicated it gets, and the more deeply nested your package structure gets, the greater the risk of this causing problems. Remember that importing a deeply nested package executes the
__init__.py of every parent package.
Another disadvantage of this approach is that it can be difficult to decide what deserves to be in the
__init__.py vs. in a sub-module. As the file gets bigger and more complex, a call will need to be made about when to pull things out.
An example of this approach in python library code is in the json module. The
__init__.py file exposes the
loads functions which rely on functionality defined in sub-modules.
#4 The Whole Package
The final approach is to put the entire package in the
__init__.py file. This can work well for small packages. It avoids needing to come up with a bunch of new names.
As the package gets larger however, a single file package can become unwieldy.
An example of this approach is collections module. (Although, technically it does have one sub-module.)
So what should you put in your
__init__.py? It depends on the project. Hopefully the information in this post can help you assess the pro’s and con’s of each of these approaches. For a guide on other general things to think about, I found a guide called Structuring Your Project on python-guide.org to be very helpful.