Creating a python script to extract file information from a directory (2024)

Python is a powerful, versatile programming language with countless uses for task automation. In this scenario, your company wants to perform an audit on certain machines. Our task is to build a script that pulls information on every file within a given directory.

Goals

  • Create a function that gets a list of all files in a given directory.
  • Extract the file name, path, size, and creation date of every file, and put this info into a list.
  • Output the list into a readable JSON format.

Prerequisites

  1. Basic knowledge of Python fundamentals (functions, lists, dictionaries, for-loops).
  2. An IDE, such as VS Code.

First, let’s import the Python modules we need to build this script:

  1. os: Allows access to various operating system methods.
  2. datetime: Allows us to convert raw time outputs into a readable format.
  3. json: Allows us to convert data into JSON format.

import os
import datetime
import json

The os module reports file times as a number of seconds since the epoch (January 1, 1970), which isn’t very helpful for us lowly humans. So, let’s first create a helper function that converts the creation date into a readable format.

The function takes a timestamp parameter, which we’ll use later on to pass in the creation date (in seconds).

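def convertDate(timestamp):
    # timestamp: seconds since the epoch (January 1, 1970, UTC)
    # Note: utcfromtimestamp() is deprecated as of Python 3.12
    d = datetime.datetime.utcfromtimestamp(timestamp)
    formattedDate = d.strftime('%b %d, %Y')

    return formattedDate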

The datetime.datetime.utcfromtimestamp() method converts the seconds into a datetime object expressed in UTC.

For example, datetime.datetime.utcfromtimestamp(10000) gives us 1970-01-01 02:46:40. This is a perfectly acceptable output, but if we want a more readable format, we can use the .strftime() method. With the format string '%b %d, %Y', it returns a string such as Jan 01, 1970 (abbreviated month name, day, four-digit year).

A full list of the format codes you can use with this method is available in the Python documentation for strftime().
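To see how different format codes change the output, here is a minimal sketch. It uses the timezone-aware datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc) call instead of utcfromtimestamp(), which is deprecated as of Python 3.12. The function name formatTimestamp and its fmt parameter are hypothetical, introduced only for this illustration.

```python
import datetime

def formatTimestamp(timestamp, fmt='%b %d, %Y'):
    # Build a timezone-aware datetime in UTC from seconds since the epoch.
    d = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)
    # Render it using the requested strftime format codes.
    return d.strftime(fmt)

print(formatTimestamp(10000))                       # Jan 01, 1970
print(formatTimestamp(10000, '%Y-%m-%d %H:%M:%S'))  # 1970-01-01 02:46:40
print(formatTimestamp(10000, '%d/%m/%Y'))           # 01/01/1970
```

The month abbreviation produced by %b depends on the active locale; the outputs shown assume Python's default locale.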

Let’s test the function by printing the output for 10000 seconds.

print(convertDate(10000))

Great! Now that our convertDate() function is working, we can work on the meat-and-potatoes of the script, which will extract file information from a given path.

For more information on the datetime module, check out the official Python documentation.

To get specific file and directory information, we will use the os module and the various functions it provides.

Here’s a breakdown of what we need this function to do:

  1. Accept a custom path or default it to the current working directory.
  2. Take the path and find every file within the directory and its subdirectories.
  3. Extract the name, path, size, and creation date of each file.
  4. Store each file’s information in a dictionary and add that dictionary to a list.
  5. Take that list of dictionaries and convert it into JSON format.

Let’s start by defining the function getDirDetails() that will take a path parameter and default to the current working directory.

To get the current working directory, we can use the os.getcwd() method.

Set the path parameter equal to the cwd as the default and test the function.

def getDirDetails(path=os.getcwd()):

    return path

print(getDirDetails())
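One caveat worth knowing: Python evaluates default argument values once, when the function is defined, so path=os.getcwd() captures the working directory at that moment rather than at call time. For a short script that never changes directory, this makes no difference. If it does matter, a common pattern is to default to None and resolve the directory inside the function. The sketch below uses a hypothetical name, getDirDetailsLazy, so it does not clash with the function in this walkthrough.

```python
import os
import tempfile

def getDirDetailsLazy(path=None):
    # Resolve the default at call time instead of at definition time.
    if path is None:
        path = os.getcwd()
    return path

# Demonstration: change directory after the function has been defined.
original = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    try:
        resolved = getDirDetailsLazy()
    finally:
        os.chdir(original)  # always restore the working directory

# resolved points at the temporary directory, not the original one
print(resolved != original)
```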


The first thing we need to do is create some variables that will determine helpful information about the path that will be passed in.

def getDirDetails(path=os.getcwd()):
    fileList = []
    pathExists = os.path.exists(path)
    isFile = os.path.isfile(path)
    isDir = os.path.isdir(path)

fileList = [] : An empty list that will eventually store our file info.

pathExists = os.path.exists(path) : Checks if the path provided exists. Returns True or False.

isFile = os.path.isfile(path) : Checks if the path provided is a file. Returns True or False.

isDir = os.path.isdir(path) : Checks if the path provided is a directory. Returns True or False.

The last three will help us with error handling and validation.
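To make the three checks concrete, here is a small sketch that runs them against a directory, a file, and a path that does not exist. The temporary directory and the file name example.txt are made up for this illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one real file inside the temporary directory.
    filePath = os.path.join(tmp, "example.txt")
    with open(filePath, "w") as f:
        f.write("hello")
    missingPath = os.path.join(tmp, "does_not_exist")

    # Each value is a tuple of (exists, isfile, isdir).
    checks = {
        "directory": (os.path.exists(tmp), os.path.isfile(tmp), os.path.isdir(tmp)),
        "file": (os.path.exists(filePath), os.path.isfile(filePath), os.path.isdir(filePath)),
        "missing": (os.path.exists(missingPath), os.path.isfile(missingPath), os.path.isdir(missingPath)),
    }

for label, (exists, isFile, isDir) in checks.items():
    print(f"{label}: exists={exists}, isFile={isFile}, isDir={isDir}")
# directory: exists=True, isFile=False, isDir=True
# file: exists=True, isFile=True, isDir=False
# missing: exists=False, isFile=False, isDir=False
```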

Error handling

We only want the function to execute if the path provided exists AND is a directory. If the path does not exist or points to a file, we print an error message instead. We can use an if/elif chain to check for each case.

def getDirDetails(path=os.getcwd()):
    fileList = []
    pathExists = os.path.exists(path)
    isFile = os.path.isfile(path)
    isDir = os.path.isdir(path)

    if pathExists and isDir:
        #do stuff
        print(f"'{path}' is a directory.")

    elif pathExists and isFile:
        print(f"Error: The path '{path}' must be a directory.")

    elif not pathExists:
        print(f"Error: The path '{path}' does not exist.")

#test it out
#the function has no return value yet, so each print() also outputs None
print(getDirDetails()) #cwd: should pass
print(getDirDetails("/Users/aaloktrivedi/package-lock.json")) #file: should fail
print(getDirDetails("/Users/sdsvdvd")) #invalid path: should fail


Fantastic! The error handling works as expected.

NOTE: I know Python has built-in exception handling (try/except and exceptions such as FileNotFoundError), so this might not be the best way to tackle validation, but for our purposes, this will work just fine.
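For comparison, here is a rough sketch of what exception-based validation could look like. It is an alternative to the print-based checks above, not the approach this walkthrough uses. The function name validateDirectory is hypothetical; FileNotFoundError and NotADirectoryError are built-in Python exceptions.

```python
import os
import tempfile

def validateDirectory(path):
    # Raise a descriptive built-in exception instead of printing a message.
    if not os.path.exists(path):
        raise FileNotFoundError(f"The path '{path}' does not exist.")
    if not os.path.isdir(path):
        raise NotADirectoryError(f"The path '{path}' must be a directory.")
    return path

messages = []
with tempfile.TemporaryDirectory() as tmp:
    filePath = os.path.join(tmp, "example.txt")
    with open(filePath, "w") as f:
        f.write("hello")

    # Try a valid directory, a file, and a missing path in turn.
    for candidate in (tmp, filePath, os.path.join(tmp, "missing")):
        try:
            validateDirectory(candidate)
            messages.append("ok")
        except (FileNotFoundError, NotADirectoryError) as error:
            messages.append(type(error).__name__)

print(messages)  # ['ok', 'NotADirectoryError', 'FileNotFoundError']
```

Raising exceptions lets the caller decide how to react (log, retry, exit), whereas printing only informs whoever is watching the console.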

Iterate through the directory

Now we’re ready to iterate through our directory to get all the files. One way to do this is with the os.walk() function. It visits every directory in the given tree and, for each one, yields a tuple of (dirpath, dirnames, filenames).

We can use a for-loop to unpack these tuples into the root path (root), the subdirectory names (dirs), and the file names (files).

We then want to use another for-loop to iterate through just the files to get all the individual files.
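To show what os.walk() produces, the sketch below builds a tiny throwaway tree and records each tuple. The directory layout (a.txt and sub/b.txt) is invented for this example, and paths are stored relative to the temporary root so the output is easy to read.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny tree: tmp/a.txt and tmp/sub/b.txt
    os.mkdir(os.path.join(tmp, "sub"))
    for relPath in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(tmp, relPath), "w") as f:
            f.write("demo")

    walked = []
    for root, dirs, files in os.walk(tmp):
        # One tuple per directory visited, top-down by default.
        walked.append((os.path.relpath(root, tmp), sorted(dirs), sorted(files)))

for entry in walked:
    print(entry)
# ('.', ['sub'], ['a.txt'])
# ('sub', [], ['b.txt'])
```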

def getDirDetails(path=os.getcwd()):
    fileList = []

    pathExists = os.path.exists(path)
    isFile = os.path.isfile(path)
    isDir = os.path.isdir(path)

    if pathExists and isDir:
        for root, dirs, files in os.walk(path):
            for file in files:
                print(file)
    # ... (the elif branches from the previous step stay the same)

#outside of the function
getDirDetails("/Users/aaloktrivedi/LUIT/Projects/LUIT_python")


Perfect! We now have access to every file, even inside subdirectories.

Let’s get more information about these files, such as the path, size, and creation date, and store them in variables for later use.

For the path, we need to combine two pieces. We have access to the root path and the file name separately, and we can use os.path.join() from the os.path module to put them back together with the correct separator for the operating system.

filePath = os.path.join(root, file)

For the file size, we can use the os.path.getsize() function, which returns the size in bytes. Again, we need to pass in the whole path, so we can use the filePath variable we just created.

#I divided the bytes by 1024 to convert it into kb (optional).
#remember you need to pass in the whole file path, not just the file name.

fileSize = round(os.path.getsize(filePath) / 1024, 1)
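As a quick check of the arithmetic, this sketch writes a file of a known size and applies the same conversion. The file name sample.bin and the 1536-byte size are arbitrary choices for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Join the directory and file name into a full path.
    filePath = os.path.join(tmp, "sample.bin")
    with open(filePath, "wb") as f:
        f.write(b"\0" * 1536)  # exactly 1536 bytes

    sizeBytes = os.path.getsize(filePath)   # size in bytes
    sizeKb = round(sizeBytes / 1024, 1)     # 1536 / 1024 = 1.5

print(sizeBytes, sizeKb)  # 1536 1.5
```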

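Remember that convertDate() helper function we created earlier? Time to use it to convert our file creation date. We can use the os.path.getctime() function to get the time in seconds and pass that into our helper function. Note that getctime() returns the creation time on Windows, but on Unix-like systems it returns the time of the last metadata change.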

fileCreationDate = convertDate(os.path.getctime(filePath))
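Where a true creation time matters on systems whose ctime reflects metadata changes, os.stat() exposes an st_birthtime attribute on some platforms (for example macOS, and Windows from Python 3.12). The sketch below is one possible approach, not part of the original script: the helper name getCreationTimestamp is hypothetical, and it falls back to st_ctime where st_birthtime is unavailable.

```python
import os
import tempfile

def getCreationTimestamp(filePath):
    stats = os.stat(filePath)
    # st_birthtime exists only on some platforms; elsewhere fall back
    # to st_ctime, which is what os.path.getctime() reports.
    return getattr(stats, "st_birthtime", stats.st_ctime)

with tempfile.TemporaryDirectory() as tmp:
    filePath = os.path.join(tmp, "example.txt")
    with open(filePath, "w") as f:
        f.write("hello")
    createdAt = getCreationTimestamp(filePath)

# Seconds since the epoch, ready to pass to a date-formatting helper.
print(createdAt)
```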

So far, our function code should look like this:

def getDirDetails(path=os.getcwd()):
    fileList = []

    pathExists = os.path.exists(path)
    isFile = os.path.isfile(path)
    isDir = os.path.isdir(path)

    if pathExists and isDir:
        for root, dirs, files in os.walk(path):
            for file in files:
                filePath = os.path.join(root, file)
                fileSize = round(os.path.getsize(filePath) / 1024, 1)
                fileCreationDate = convertDate(os.path.getctime(filePath))

    elif pathExists and isFile:
        print(f"Error: The path '{path}' must be a directory.")

    elif not pathExists:
        print(f"Error: The path '{path}' does not exist.")

Store data and add to list

We have our file information, so now it’s time to store this data in a dictionary. We can then add the dictionary to the empty fileList we created.

#create a dict for each file
fileDict = {
    'file_name': file,
    'path': filePath,
    'size_kb': fileSize,
    'date_created': fileCreationDate
}

#append the dict to fileList
fileList.append(fileDict)

print(fileList)


It works!… but this isn’t very readable. Let’s convert it to JSON format to make the data much more human-friendly. We can do this by using the json.dumps() method (make sure this is outside of the for-loops).

pathFilesJSON = json.dumps(fileList, indent=4)
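To see what the indent argument changes, here is a small sketch comparing compact and indented output. The single entry in sampleList uses made-up values in the same shape as our file dictionaries.

```python
import json

# Hypothetical sample data shaped like one entry of fileList.
sampleList = [
    {
        'file_name': 'a.txt',
        'path': '/tmp/a.txt',
        'size_kb': 1.5,
        'date_created': 'Jan 01, 1970',
    },
]

compact = json.dumps(sampleList)           # everything on one line
pretty = json.dumps(sampleList, indent=4)  # one key per line, 4-space indent

print(compact)
print(pretty)

# Both strings decode back to the same Python data.
print(json.loads(compact) == json.loads(pretty))  # True
```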

Finally, we want to return the JSON data (outside of the for-loops). This ensures the function outputs the JSON whenever it’s called.

return pathFilesJSON

#outside of the function
print(getDirDetails("YOUR_PATH"))

Much better!

We just successfully created a Python script that delivers data on every file within a directory!

Here is the full code:
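The sketch below assembles the snippets from this walkthrough into one script. It makes two choices that are assumptions rather than something stated above: convertDate() uses the timezone-aware fromtimestamp(..., tz=datetime.timezone.utc) call because utcfromtimestamp() is deprecated as of Python 3.12, and the return statement sits inside the directory branch, so invalid paths print an error and return None. The demo at the bottom runs against a throwaway directory with an invented file, notes.txt.

```python
import os
import datetime
import json
import tempfile

def convertDate(timestamp):
    # Convert seconds since the epoch into a readable UTC date string.
    d = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)
    return d.strftime('%b %d, %Y')

def getDirDetails(path=os.getcwd()):
    fileList = []

    pathExists = os.path.exists(path)
    isFile = os.path.isfile(path)
    isDir = os.path.isdir(path)

    if pathExists and isDir:
        # Walk the directory tree and collect details for every file.
        for root, dirs, files in os.walk(path):
            for file in files:
                filePath = os.path.join(root, file)
                fileSize = round(os.path.getsize(filePath) / 1024, 1)
                fileCreationDate = convertDate(os.path.getctime(filePath))

                fileDict = {
                    'file_name': file,
                    'path': filePath,
                    'size_kb': fileSize,
                    'date_created': fileCreationDate
                }
                fileList.append(fileDict)

        # Outside of the for-loops: convert the list to JSON and return it.
        pathFilesJSON = json.dumps(fileList, indent=4)
        return pathFilesJSON

    elif pathExists and isFile:
        print(f"Error: The path '{path}' must be a directory.")

    elif not pathExists:
        print(f"Error: The path '{path}' does not exist.")

# Demo on a temporary directory so the example runs anywhere.
with tempfile.TemporaryDirectory() as demoDir:
    with open(os.path.join(demoDir, "notes.txt"), "w") as f:
        f.write("x" * 2048)  # 2048 bytes, i.e. 2.0 KB
    demoJSON = getDirDetails(demoDir)

print(demoJSON)
```

To audit a real directory, call print(getDirDetails("YOUR_PATH")) with your own path in place of the demo.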

Thank you for following me on my cloud computing journey. I hope this article was helpful and informative. Please give me a follow as I continue my journey, and I will share more articles like this!

Article information

Author: Twana Towne Ret
