Aalok Trivedi · Feb 12, 2023
Python is a powerful, versatile programming language with countless uses for task automation. In this scenario, your company wants to perform an audit on certain machines, and our task is to build a script that pulls information on every file within a given directory.
Goals
- Create a function that gets a list of all files in a given directory.
- Extract the file name, path, size, and creation date of every file, and put this info into a list.
- Output the list into a readable JSON format.
Prerequisites
- Basic knowledge of Python fundamentals (functions, lists, dictionaries, for-loops)
- An IDE, such as VS Code
First, let’s import the necessary Python libraries to build this script:
- os: Allows access to various operating system methods.
- datetime: Allows us to convert raw time outputs into a readable format.
- json: Allows us to convert data into JSON format.
import os
import datetime
import json
The os module gives us time in raw seconds since the epoch, which isn’t very helpful for us lowly humans, so let’s first create a helper function that converts a creation date into a readable format. The function takes a timestamp parameter, which we’ll use later on to pass in the creation date (in seconds).
def convertDate(timestamp):
    d = datetime.datetime.utcfromtimestamp(timestamp)
    formattedDate = d.strftime('%b %d, %Y')
    return formattedDate
The datetime.datetime.utcfromtimestamp() method converts the seconds into a UTC datetime. For example, datetime.datetime.utcfromtimestamp(10000) gives us an output of 1970-01-01 02:46:40. This is a totally acceptable output, but if we want a more readable format, we can use the .strftime() method to produce a string in 'Mon DD, YYYY' format.
Here is a full list of format codes you can use with this method
Let’s test the function by printing the output for 10000 seconds.
print(convertDate(10000)) # Jan 01, 1970
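For reference, a few other common strftime format codes (a quick sketch, using the same 10000-second timestamp):

```python
import datetime

d = datetime.datetime.utcfromtimestamp(10000)
print(d.strftime('%Y-%m-%d'))        # 1970-01-01
print(d.strftime('%H:%M:%S'))        # 02:46:40
print(d.strftime('%A, %B %d, %Y'))   # Thursday, January 01, 1970
```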
Great! Now that our convertDate() function is working, we can move on to the meat and potatoes of the script: extracting file information from a given path. For more info on the datetime module, check out the official documentation.
To get specific file and directory information, we will use the os library and the various methods it provides.
Here’s a breakdown of what we need this function to do:
- Accept a custom path or default it to the current working directory.
- Take the path and find every file within the directory and its subdirectories.
- Extract the name, path, size, and creation date of each file
- Take the file information and put it into a dictionary and list.
- Take that list of dictionaries and convert it into JSON format.
Let’s start by defining the function getDirDetails(), which takes a path parameter that defaults to the current working directory. To get the current working directory, we can use the os.getcwd() method. Set the path parameter equal to the cwd as the default and test the function.
def getDirDetails(path=os.getcwd()):
    return path
print(getDirDetails())
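One caveat worth knowing: a default like path=os.getcwd() is evaluated once, when the function is defined, not each time it’s called, so if the script ever changes directories with os.chdir(), the default would be stale. It works fine for our script, but a common safer idiom (a sketch, not part of the original tutorial) resolves the default at call time:

```python
import os

def getDirDetails(path=None):
    # Resolve the default when the function is called, not when it's defined,
    # so it always reflects the *current* working directory.
    if path is None:
        path = os.getcwd()
    return path
```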
The first thing we need to do is create some variables that will determine helpful information about the path that will be passed in.
def getDirDetails(path=os.getcwd()):
fileList = []
pathExists = os.path.exists(path)
isFile = os.path.isfile(path)
isDir = os.path.isdir(path)
- fileList = []: An empty list that will eventually store our file info.
- pathExists = os.path.exists(path): Checks whether the provided path exists. Returns True or False.
- isFile = os.path.isfile(path): Checks whether the provided path is a file. Returns True or False.
- isDir = os.path.isdir(path): Checks whether the provided path is a directory. Returns True or False.
The last three will help us with error handling and validation.
Error handling
We only want the function to execute if the path provided exists AND is a directory. If the path does not exist or leads to a file → provide an error statement. We can use multiple if-else statements to check for each case.
def getDirDetails(path=os.getcwd()):
    fileList = []
    pathExists = os.path.exists(path)
    isFile = os.path.isfile(path)
    isDir = os.path.isdir(path)

    if pathExists and isDir:
        #do stuff
        print(f"'{path}' is a directory.")
    elif pathExists and isFile:
        print(f"Error: The path '{path}' must be a directory.")
    elif not pathExists:
        print(f"Error: The path '{path}' does not exist.")
#test it out
print(getDirDetails()) #cwd: should pass
print(getDirDetails("/Users/aaloktrivedi/package-lock.json")) #file: should fail
print(getDirDetails("/Users/sdsvdvd")) #invalid path: should fail
Fantastic! The error handling works as expected.
NOTE: I know Python has built-in error handling (raising exceptions and try/except blocks), so this might not be the best way to tackle validation, but for our purposes, it will work just fine.
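For completeness, here is one hedged sketch of what that exception-based approach could look like, raising Python’s built-in exceptions instead of printing error strings (the validatePath helper is hypothetical, not part of the original script):

```python
import os

def validatePath(path):
    # Raise built-in exceptions rather than printing messages,
    # so callers can decide how to handle each failure.
    if not os.path.exists(path):
        raise FileNotFoundError(f"The path '{path}' does not exist.")
    if os.path.isfile(path):
        raise NotADirectoryError(f"The path '{path}' must be a directory.")
    return path

# callers can then handle the failure however they like
try:
    validatePath("/definitely/not/a/real/path")
except FileNotFoundError as e:
    print(f"Error: {e}")
```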
Iterate through the directory
Now we’re ready to iterate through our directory to get all the files. One way to do this is with the os.walk() method, which walks the entire directory tree and yields a tuple of (dirpath, dirnames, filenames) for each directory it visits. We can use a for-loop to unpack each tuple into the root path (root), the directories (dirs), and the files (files). We then use a nested for-loop over files to reach each individual file.
def getDirDetails(path=os.getcwd()):
    fileList = []
    pathExists = os.path.exists(path)
    isFile = os.path.isfile(path)
    isDir = os.path.isdir(path)

    if pathExists and isDir:
        for root, dirs, files in os.walk(path):
            for file in files:
                print(file)
    ...

#outside of the function
getDirDetails("/Users/aaloktrivedi/LUIT/Projects/LUIT_python")
Perfect! We now have access to every file, even inside subdirectories.
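As an aside, the standard-library pathlib module offers an alternative way to do the same recursive traversal (this is a different technique from the os.walk() approach the article uses; the listFiles helper name is just for illustration):

```python
from pathlib import Path

def listFiles(path):
    # Path.rglob('*') recurses through every subdirectory;
    # is_file() filters out the directories themselves.
    return [p for p in Path(path).rglob('*') if p.is_file()]

for p in listFiles('.'):
    print(p.name)
```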
Let’s get more information about these files, such as the path, size, and creation date, and store it in variables for later use. For the path, we’ll use the os.path.join() method. We have access to the root path and the file name separately, and os.path.join() puts them back together into a full path.
filePath = os.path.join(root, file)
For the file size, we can use the os.path.getsize() method, which returns the size in bytes. Again, we need to pass in the whole path, so we can use the filePath variable we just created.
#I divided the bytes by 1024 to convert it into KB (optional).
#remember you need to pass in the whole file path, not just the file name.
fileSize = round(os.path.getsize(filePath) / 1024, 1)
Remember that convertDate() helper function we created earlier? Time to use it to convert our file creation date. We can use the os.path.getctime() method to get the time in seconds and pass that into our helper function.
fileCreationDate = convertDate(os.path.getctime(filePath))
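A small caveat: os.path.getctime() returns the creation time on Windows, but on most Unix-like systems it returns the time of the last metadata change; os.path.getmtime() (last modification) is often the more portable choice. A quick sketch comparing the two (the printed dates will vary by machine):

```python
import os
import datetime

def convertDate(timestamp):
    # same helper as defined earlier in the article
    d = datetime.datetime.utcfromtimestamp(timestamp)
    return d.strftime('%b %d, %Y')

somePath = __file__  # any existing file works here
print(convertDate(os.path.getctime(somePath)))  # creation (Windows) / metadata change (Unix)
print(convertDate(os.path.getmtime(somePath)))  # last modification time
```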
So far, our function code should look like this:
def getDirDetails(path=os.getcwd()):
    fileList = []
    pathExists = os.path.exists(path)
    isFile = os.path.isfile(path)
    isDir = os.path.isdir(path)

    if pathExists and isDir:
        for root, dirs, files in os.walk(path):
            for file in files:
                filePath = os.path.join(root, file)
                fileSize = round(os.path.getsize(filePath) / 1024, 1)
                fileCreationDate = convertDate(os.path.getctime(filePath))
    elif pathExists and isFile:
        print(f"Error: The path '{path}' must be a directory.")
    elif not pathExists:
        print(f"Error: The path '{path}' does not exist.")
Store data and add to list
We have our file information, so now it’s time to store this data in a dictionary. We can then add the dictionary to the empty fileList we created.
#create a dict for each file
fileDict = {
    'file_name': file,
    'path': filePath,
    'size_kb': fileSize,
    'date_created': fileCreationDate
}

#append the dict to fileList
fileList.append(fileDict)

print(fileList)
It works!… but this isn’t very readable. Let’s convert it to JSON format to make the data much more human-friendly. We can do this with the json.dumps() method (make sure this is outside of the for-loops).
pathFilesJSON = json.dumps(fileList, indent=4)
Finally, we want to return the JSON data (outside of the for-loops), so the function outputs the JSON whenever it’s called.
return pathFilesJSON
#outside of the function
print(getDirDetails("YOUR_PATH"))
Much better!
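As an optional extension (not part of the original script), json.dump() writes the report straight to a file instead of returning a string, which is handy if the audit results need to be saved:

```python
import json

# sample data in the same shape our script produces
fileList = [
    {'file_name': 'example.txt', 'path': '/tmp/example.txt',
     'size_kb': 1.2, 'date_created': 'Jan 01, 1970'}
]

# json.dumps returns a string; json.dump writes directly to a file object
with open('file_report.json', 'w') as f:
    json.dump(fileList, f, indent=4)
```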
We just successfully created a Python script that delivers data on every file within a directory!
Here is the full code:
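Assembled from the snippets above, the complete script looks like this:

```python
import os
import datetime
import json


def convertDate(timestamp):
    # convert seconds since the epoch into a readable 'Mon DD, YYYY' string
    d = datetime.datetime.utcfromtimestamp(timestamp)
    formattedDate = d.strftime('%b %d, %Y')
    return formattedDate


def getDirDetails(path=os.getcwd()):
    fileList = []
    pathExists = os.path.exists(path)
    isFile = os.path.isfile(path)
    isDir = os.path.isdir(path)

    if pathExists and isDir:
        for root, dirs, files in os.walk(path):
            for file in files:
                filePath = os.path.join(root, file)
                fileSize = round(os.path.getsize(filePath) / 1024, 1)
                fileCreationDate = convertDate(os.path.getctime(filePath))

                # create a dict for each file and append it to fileList
                fileDict = {
                    'file_name': file,
                    'path': filePath,
                    'size_kb': fileSize,
                    'date_created': fileCreationDate
                }
                fileList.append(fileDict)

        # convert the list of dicts to readable JSON (outside the for-loops)
        pathFilesJSON = json.dumps(fileList, indent=4)
        return pathFilesJSON
    elif pathExists and isFile:
        print(f"Error: The path '{path}' must be a directory.")
    elif not pathExists:
        print(f"Error: The path '{path}' does not exist.")


# example usage (replace with your own path):
# print(getDirDetails("YOUR_PATH"))
```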
Thank you for following me on my cloud computing journey. I hope this article was helpful and informative. Please give me a follow as I continue my journey, and I will share more articles like this!