Finding Descriptive Statistics for Columns in a DataFrame - Data Science Discovery (2024)

When we're presented with a new DataFrame, it can be a lot to deal with. A great way to familiarize ourselves with all the new information is to look at descriptive statistics (sometimes known as summary statistics) for all applicable variables.

The Movie Dataset

To demonstrate these functions, we'll use a DataFrame of five different movies, including information about their release date, how much money they made in US dollars, and a personal rating out of 10.

import pandas as pd

# Creates a DataFrame with "movie", "release date", "domestic box office", "worldwide box office", "personal rating", and "international box office" columns
df = pd.DataFrame([
    {"movie": "The Truman Show", "release date": "1996-06-05", "domestic box office": 125618201, "worldwide box office": 264118201, "personal rating": 10, "international box office": 138500000},
    {"movie": "Rogue One: A Star Wars Story", "release date": "2016-12-16", "domestic box office": 532177324, "worldwide box office": 1055135598, "personal rating": 9, "international box office": 522958274},
    {"movie": "Iron Man", "release date": "2008-05-02", "domestic box office": 318604126, "worldwide box office": 585171547, "personal rating": 7, "international box office": 266567421},
    {"movie": "Blade Runner", "release date": "1982-06-25", "domestic box office": 32656328, "worldwide box office": 39535837, "personal rating": 8, "international box office": 6879509},
    {"movie": "Breakfast at Tiffany's", "release date": "1961-10-05", "domestic box office": 9551904, "worldwide box office": 9794721, "personal rating": 7, "international box office": 242817}
])

# Display the DataFrame
df
                          movie  release date  domestic box office  worldwide box office  personal rating  international box office
0               The Truman Show    1996-06-05            125618201             264118201               10                 138500000
1  Rogue One: A Star Wars Story    2016-12-16            532177324            1055135598                9                 522958274
2                      Iron Man    2008-05-02            318604126             585171547                7                 266567421
3                  Blade Runner    1982-06-25             32656328              39535837                8                   6879509
4        Breakfast at Tiffany's    1961-10-05              9551904               9794721                7                    242817

List of Functions

Pandas has a great selection of functions for calculating descriptive statistics. In most cases, we only want to use these on columns with float and int dtypes, not strings. For example, we can't calculate the average movie title!

We'll go into detail about how to use these later. But for now, here are the most common and useful functions.

  • .count()
    • Returns how many non-null values are in a column
    • In other words, how many rows actually have a value for this column?
  • .sum()
    • Returns the sum of all values in a column
  • .mean()
    • Returns the mean (average) of the values in a column
  • .median()
    • Returns the median of the values in a column
  • .var()
    • Returns the variance of the values in a column
  • .std()
    • Returns the standard deviation of the values in the column
    • aka the square root of the variance
    • NOTE: By default, pandas calculates the sample standard deviation, not the population standard deviation. To calculate the population standard deviation, set the delta degrees of freedom to 0 by passing ddof=0 inside the parentheses, as in .std(ddof=0). The same parameter works for .var().
  • .min() and .max()
    • Return the minimum and maximum values in a column, respectively
    • See the guide for Finding Specific Values in a Column
  • .quantile()
    • Returns the quantiles of the values in a column
    • Takes a number between 0 and 1 to specify the quantile (if omitted, it defaults to 0.5, the median)
    • See the guide for Finding Quantiles of a Column in a DataFrame
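
As a rough sketch of the methods listed above, the block below applies each one to the personal ratings from the movie DataFrame. The ratings are rebuilt here as a standalone Series so the snippet runs on its own, and the variable names are only illustrative.

```python
import pandas as pd

# Personal ratings copied from the movie DataFrame (rebuilt so this runs alone)
ratings = pd.Series([10, 9, 7, 8, 7], name="personal rating")

count = ratings.count()                # number of non-null values
total = ratings.sum()                  # sum of all values
mean = ratings.mean()                  # average
median = ratings.median()              # middle value
sample_var = ratings.var()             # sample variance (ddof=1 by default)
sample_std = ratings.std()             # sample standard deviation (ddof=1 by default)
population_std = ratings.std(ddof=0)   # population standard deviation
lowest = ratings.min()
highest = ratings.max()
q75 = ratings.quantile(0.75)           # 75th percentile

print(count, total, mean, median, lowest, highest, q75)
print(sample_var, sample_std, population_std)
```

Note that the sample standard deviation is always a little larger than the population version, because it divides by one fewer than the number of values.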

Now, let's see these functions in action.

Finding a Descriptive Statistic for a Single Column

The most practical use of descriptive statistics is to apply the functions to a single column. This allows us to store the result in a variable and save it for future analysis.

We do this by specifying the column in brackets before applying the function. Let's say we wanted to find the average personal rating of these 5 movies.

df["personal rating"].mean()
8.2
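
Because the result is a single number, it can be stored in a variable and reused. The following is a minimal sketch, assuming we want to keep only the movies rated above the average. The names average_rating and above_average are made up for illustration, and the DataFrame is a trimmed copy of the one above.

```python
import pandas as pd

# Trimmed copy of the movie DataFrame: titles and personal ratings only
df = pd.DataFrame({
    "movie": ["The Truman Show", "Rogue One: A Star Wars Story", "Iron Man",
              "Blade Runner", "Breakfast at Tiffany's"],
    "personal rating": [10, 9, 7, 8, 7],
})

# Store the statistic in a variable for later use
average_rating = df["personal rating"].mean()

# Reuse it: keep only the rows whose rating is above the average
above_average = df[df["personal rating"] > average_rating]

print(average_rating)
print(above_average)
```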

Finding a Descriptive Statistic for All Columns

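If we don't specify the column first, the function is applied to every column and returns a pandas Series containing that statistic for each one. But be careful: not every column in the DataFrame contains floats and ints! In pandas 2.0 and later, calling df.median() on a DataFrame with string columns raises a TypeError, so we pass numeric_only=True to skip the non-numeric columns.

df.median(numeric_only=True)
domestic box office         125618201.0
worldwide box office        264118201.0
personal rating                     8.0
international box office    138500000.0
dtype: float64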
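
When a DataFrame mixes strings and numbers, one way to keep a whole-DataFrame statistic from failing is to restrict it to the numeric columns. The sketch below shows two options on a trimmed three-movie copy of the data: the numeric_only parameter, and selecting the numeric columns first with select_dtypes. The variable names are illustrative.

```python
import pandas as pd

# Trimmed copy of the movie data: one string column, two numeric columns
df = pd.DataFrame({
    "movie": ["The Truman Show", "Iron Man", "Blade Runner"],
    "domestic box office": [125618201, 318604126, 32656328],
    "personal rating": [10, 7, 8],
})

# Option 1: ask the method to skip non-numeric columns
medians = df.median(numeric_only=True)

# Option 2: select the numeric columns first, then apply the method
medians_alt = df.select_dtypes(include="number").median()

print(medians)
```

Both approaches should give the same Series. The second is handy when several statistics are needed from the same numeric subset.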

The Holy Grail: Finding All of the Basic Descriptive Statistics

All of the aforementioned functions find one descriptive statistic at a time. But if we want a simple way to see all this information at once, there's also a function for that: .describe(). There are a few different ways to use this function, which are detailed below.

Entire DataFrame

If we apply .describe() to an entire DataFrame, it returns a brand new DataFrame with rows that correspond to all essential descriptive statistics. By default, it will only include the columns with integer and float dtypes.

df.describe()
       domestic box office  worldwide box office  personal rating  international box office
count         5.000000e+00          5.000000e+00          5.00000              5.000000e+00
mean          2.037216e+08          3.907512e+08          8.20000              1.870296e+08
std           2.203103e+08          4.369559e+08          1.30384              2.172975e+08
min           9.551904e+06          9.794721e+06          7.00000              2.428170e+05
25%           3.265633e+07          3.953584e+07          7.00000              6.879509e+06
50%           1.256182e+08          2.641182e+08          8.00000              1.385000e+08
75%           3.186041e+08          5.851715e+08          9.00000              2.665674e+08
max           5.321773e+08          1.055136e+09         10.00000              5.229583e+08

That one line of code returns something pretty powerful.
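
Since .describe() returns an ordinary DataFrame, individual statistics can be pulled out of it by row and column label. This is a small sketch using .loc on a two-column copy of the movie data. The names summary, mean_rating, and top_domestic are only illustrative.

```python
import pandas as pd

# Two numeric columns copied from the movie DataFrame
df = pd.DataFrame({
    "domestic box office": [125618201, 532177324, 318604126, 32656328, 9551904],
    "personal rating": [10, 9, 7, 8, 7],
})

# The summary is itself a DataFrame: statistics are rows, original columns are columns
summary = df.describe()

# Look up a single statistic with .loc[row label, column label]
mean_rating = summary.loc["mean", "personal rating"]
top_domestic = summary.loc["max", "domestic box office"]

print(mean_rating, top_domestic)
```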

One Column

If you want to find all descriptive statistics for a single column at once, .describe() can do that, too. With only one column, the results are returned as a pandas Series rather than a DataFrame.

df["worldwide box office"].describe()
count    5.000000e+00
mean     3.907512e+08
std      4.369559e+08
min      9.794721e+06
25%      3.953584e+07
50%      2.641182e+08
75%      5.851715e+08
max      1.055136e+09
Name: worldwide box office, dtype: float64

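When we apply .describe() to a column of strings, we don't get an error. Instead, .describe() returns a Series of statistics that are more applicable to the string dtype: the number of values (count), the number of distinct values (unique), the most common value (top), and how many times that value appears (freq).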

df["movie"].describe()
count                   5
unique                  5
top       The Truman Show
freq                    1
Name: movie, dtype: object

Subsets of Columns

We can describe smaller subsets of columns, too. Just use double brackets to insert a list of the column names, with each name separated by a comma. The result will be a DataFrame.

df[["domestic box office", "worldwide box office"]].describe()
       domestic box office  worldwide box office
count         5.000000e+00          5.000000e+00
mean          2.037216e+08          3.907512e+08
std           2.203103e+08          4.369559e+08
min           9.551904e+06          9.794721e+06
25%           3.265633e+07          3.953584e+07
50%           1.256182e+08          2.641182e+08
75%           3.186041e+08          5.851715e+08
max           5.321773e+08          1.055136e+09

However, this is only effective when both columns contain numbers (floats and/or ints) or when both columns contain strings. If you select columns with contrasting dtypes, it will only show the numerical descriptive statistics by default.

df[["movie", "personal rating"]].describe()
       personal rating
count          5.00000
mean           8.20000
std            1.30384
min            7.00000
25%            7.00000
50%            8.00000
75%            9.00000
max           10.00000
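
If statistics for both the string and the numeric columns are wanted in one table, .describe() accepts an include parameter. The sketch below passes include="all" on a two-column copy of the movie data. Statistics that don't apply to a column's dtype (for example, the mean of the titles) are filled with NaN. The variable name mixed is illustrative.

```python
import pandas as pd

# One string column and one numeric column from the movie DataFrame
df = pd.DataFrame({
    "movie": ["The Truman Show", "Rogue One: A Star Wars Story", "Iron Man",
              "Blade Runner", "Breakfast at Tiffany's"],
    "personal rating": [10, 9, 7, 8, 7],
})

# include="all" reports string and numeric statistics side by side
mixed = df[["movie", "personal rating"]].describe(include="all")

print(mixed)
```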
Article information

Author: Allyn Kozey
