Manipulating Python's os.walk()
Jan 23, 2016
Python’s os.walk() is a function that walks a directory tree. For each directory it visits, it yields a tuple of the directory’s path, a list of its subdirectory names and a list of its file names. If you’re not close friends though, it can appear tricky to control. It may flood your screen with hidden files or generally have poor boundaries! This is my effort to help you two get to know each other a bit better.
How do I use os.walk()?
A standard way of walking down a path with os.walk() is to loop over it. Try it out in your Python REPL to get the gist of it: on each iteration you get the path of the directory being visited (root), a list of the subdirectory names in it (dirs) and a list of the file names in it (files).
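Here is a minimal sketch of that loop. The helper name show_tree is made up for illustration; it simply prints the three values os.walk() yields for each directory.

```python
import os

def show_tree(path):
    """Print each (root, dirs, files) triple that os.walk() yields."""
    for root, dirs, files in os.walk(path):
        print(root)   # the directory currently being visited
        print(dirs)   # names of its immediate subdirectories
        print(files)  # names of the files directly inside it
```

For example, show_tree(".") walks the current directory. The order of names inside dirs and files comes from the filesystem, so don’t count on it being sorted.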
Another way of doing it is to combine root with each name using os.path.join(), so that you get the full path of every directory and file instead of bare names. The result is a much cleaner listing.
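A sketch of the os.path.join() version, assuming you want the full paths collected in a list (list_full_paths is a hypothetical name, not part of the os module):

```python
import os

def list_full_paths(path):
    """Return the full path of every directory and file under path."""
    paths = []
    for root, dirs, files in os.walk(path):
        # os.walk() only hands us bare names, so join each one with the
        # directory it was found in to get a usable path.
        for name in dirs:
            paths.append(os.path.join(root, name))
        for name in files:
            paths.append(os.path.join(root, name))
    return paths
```

To print them, loop over list_full_paths(".") and print each item.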
But how can I tweak os.walk() in order to produce a customized tree?
What if you want to:
- Not go into hidden directories
- Not list a certain type of files
- Not go into directories that match a path pattern
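The plain os.walk() loop can seem like an unstoppable force of nature once provided with a path. It will go through every directory and every file until it can’t go any further! Python’s docs hint at a solution: when os.walk() runs top-down (the default), you can modify the dirs list in place, and it will only descend into the directories that remain in it. Let’s make that more comprehensible.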
How can I exclude hidden directories from os.walk()?
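The answer here lies in filtering the dirs list in place: build a filtered version with a list comprehension and assign it back with slice assignment (dirs[:] = ...), so that os.walk() sees the change.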
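With that list comprehension, our list of directories no longer includes hidden ones (names starting with a dot), and because the same list object was modified, os.walk() won’t go into those directories at all. We can do the same with the files list, although filtering files only changes what we see, not where os.walk() goes.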
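A minimal sketch of the hidden-entry filter, wrapped in a hypothetical generator called walk_visible. It assumes "hidden" means "name starts with a dot", which is the Unix convention; Windows marks hidden files with a file attribute instead, and this check does not look at that.

```python
import os

def walk_visible(path):
    """Yield os.walk() triples, skipping hidden (dot-prefixed) entries."""
    for root, dirs, files in os.walk(path):  # topdown=True is the default
        # Slice assignment replaces the contents of the same list object that
        # os.walk() uses to decide where to go next. A plain `dirs = [...]`
        # would only rebind the local name and prune nothing.
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        # Filtering files changes what we report, not where os.walk() goes.
        files[:] = [f for f in files if not f.startswith(".")]
        yield root, dirs, files
```

The detail that matters is dirs[:] = ... rather than dirs = ...: only the in-place version changes the list os.walk() consults next. It also only works top-down; with os.walk(path, topdown=False) the subdirectories have already been visited by the time you see dirs.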
How can I exclude other specific directories?
For example, you may have another list of directory names that you want to ignore during your os.walk(). One way to do this is the same as above, with a list comprehension. Another way is to check root on each iteration and, if it’s in the ignore list, empty both the dirs and the files lists in place.
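Both approaches are sketched below. The IGNORE set and the function names are hypothetical, and the second function assumes the ignore list holds bare directory names, so it compares them against os.path.basename(root).

```python
import os

# Hypothetical ignore list of directory *names*.
IGNORE = {"node_modules", "__pycache__", "build"}

def walk_pruned(path, ignore=IGNORE):
    """Approach 1: filter dirs in place, just like the hidden-directory case."""
    for root, dirs, files in os.walk(path):
        dirs[:] = [d for d in dirs if d not in ignore]
        yield root, dirs, files

def walk_root_check(path, ignore=IGNORE):
    """Approach 2: check each root and empty both lists if it's ignored."""
    for root, dirs, files in os.walk(path):
        # Assumes `ignore` holds bare names, so compare the last path component.
        if os.path.basename(root) in ignore:
            dirs[:] = []   # in place, so os.walk() won't descend any further
            files[:] = []  # hide this directory's own files too
        yield root, dirs, files
```

One difference to keep in mind: walk_pruned() never visits an ignored directory, while walk_root_check() still yields it once, with empty lists, before moving on.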
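This second approach will serve you better if you’re not working with an ignore list of names but with a path pattern. A path pattern describes the whole path rather than a bare directory name, so instead of filtering dirs by name you check whether root matches the pattern on each iteration. You can use fnmatch for that, or glob to expand the pattern into a list of concrete paths beforehand.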
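A sketch of the pattern approach using fnmatch, plus a glob variant that expands the pattern up front. The function names and example patterns are assumptions for illustration. Note that fnmatch’s * also matches path separators, so "*/tests" matches a tests directory at any depth, whereas glob needs ** together with recursive=True for that.

```python
import fnmatch
import glob
import os

def walk_excluding_pattern(path, pattern):
    """Skip every directory whose full path matches an fnmatch pattern."""
    for root, dirs, files in os.walk(path):
        if fnmatch.fnmatch(root, pattern):
            dirs[:] = []  # in place: os.walk() won't descend below this root
            continue      # and don't yield the matching directory itself
        yield root, dirs, files

def walk_excluding_glob(path, pattern):
    """Same idea, but expand a glob pattern (relative to path) beforehand."""
    # recursive=True lets "**" match any number of nested directories.
    skip = {os.path.normpath(p)
            for p in glob.glob(os.path.join(path, pattern), recursive=True)}
    for root, dirs, files in os.walk(path):
        if os.path.normpath(root) in skip:
            dirs[:] = []
            continue
        yield root, dirs, files
```

For instance, walk_excluding_pattern(".", "*/tests") or walk_excluding_glob(".", "**/tests"). The glob version takes a snapshot of matching paths when it starts, so directories created during the walk won’t be excluded.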
And with that, we conclude our little get-together with os.walk(). Remember, it’s all in the lists!