A variable is a characteristic that can be measured and that can assume different values. Height, age, income, province or country of birth, grades obtained at school and type of housing are all examples of variables. Variables may be classified into two main categories: categorical and numeric. Each category is then divided into two subcategories: nominal or ordinal for categorical variables, discrete or continuous for numeric variables. These types are briefly outlined in this section.
Categorical variables
A categorical variable (also called a qualitative variable) refers to a characteristic that cannot be quantified. Categorical variables can be either nominal or ordinal.
Nominal variables
A nominal variable is one that describes a name, label or category without natural order. Sex and type of dwelling are examples of nominal variables. In Table 4.2.1, the variable “mode of transportation for travel to work” is also nominal.
Mode of transportation for travel to work | Number of people |
---|---|
Car, truck, van as driver | 9,929,470 |
Car, truck, van as passenger | 923,975 |
Public transit | 1,406,585 |
Walked | 881,085 |
Bicycle | 162,910 |
Other methods | 146,835 |
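As a small illustration of what "without natural order" means in practice, the sketch below stores the counts from Table 4.2.1 in a Python dictionary. The variable names are illustrative assumptions, not part of the original table. Counting and finding the most frequent category are meaningful for a nominal variable, while the order in which the categories are listed carries no information.

```python
# Counts copied from Table 4.2.1 (mode of transportation for travel to work).
commuters = {
    "Car, truck, van as driver": 9_929_470,
    "Car, truck, van as passenger": 923_975,
    "Public transit": 1_406_585,
    "Walked": 881_085,
    "Bicycle": 162_910,
    "Other methods": 146_835,
}

# Counting people across categories is meaningful for a nominal variable.
total = sum(commuters.values())

# So is asking which category is the most frequent.
most_common = max(commuters, key=commuters.get)

# Listing the categories in another order (here, alphabetically) leaves the
# data unchanged, because the categories have no natural order.
reordered = dict(sorted(commuters.items()))
same_data = reordered == commuters

print(total)
print(most_common)
print(same_data)
```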
Ordinal variables
An ordinal variable is a variable whose values are defined by an order relation between the different categories. In Table 4.2.2, the variable “behaviour” is ordinal because the category “Excellent” is better than the category “Very good,” which is better than the category “Good,” etc. There is some natural ordering, but it is limited since we do not know by how much “Excellent” behaviour is better than “Very good” behaviour.
Behaviour | Number of students |
---|---|
Excellent | 5 |
Very good | 12 |
Good | 10 |
Bad | 2 |
Very bad | 1 |
It is important to note that even if categorical variables are not quantifiable, they can appear as numbers in a data set. Correspondence between these numbers and the categories is established during data coding. To be able to identify the type of variable, it is important to have access to the metadata (the data about the data) that should include the code set used for each categorical variable. For instance, categories used in Table 4.2.2 could appear as a number from 1 to 5: 1 for “very bad,” 2 for “bad,” 3 for “good,” 4 for “very good” and 5 for “excellent.”
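The coding described above can be sketched in Python as follows. This is only an illustration: the names `codebook` and `coded_counts` are assumptions, and the counts are those of Table 4.2.2 keyed by the 1 to 5 codes from the text. The code set plays the role of the metadata: without it, the numbers 1 to 5 could be mistaken for a numeric variable.

```python
# Code set (metadata) described in the text: 1 = "Very bad" ... 5 = "Excellent".
codebook = {1: "Very bad", 2: "Bad", 3: "Good", 4: "Very good", 5: "Excellent"}

# Counts from Table 4.2.2, stored under the numeric codes as they might
# appear in a coded data set.
coded_counts = {5: 5, 4: 12, 3: 10, 2: 2, 1: 1}

# Decoding restores the category labels.
decoded = {codebook[code]: n for code, n in coded_counts.items()}

# Because the variable is ordinal, sorting by code gives a meaningful ranking.
ranked = [codebook[code] for code in sorted(codebook, reverse=True)]

# The codes are labels, not measurements: 5 > 4 says "Excellent" ranks above
# "Very good", but 5 - 4 does not measure how much better it is.
print(decoded)
print(ranked)
```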
Numeric variables
A numeric variable (also called a quantitative variable) is a quantifiable characteristic whose values are numbers (except numbers that are codes standing in for categories). Numeric variables may be either continuous or discrete.
Continuous variables
A variable is said to be continuous if it can assume an infinite number of real values within a given interval. For instance, consider the height of a student. Height cannot take just any value: it cannot be negative and it cannot exceed three metres. But between 0 and 3 metres, the number of possible values is theoretically infinite. A student may be 1.6321748755… metres tall. In practice, the methods used and the accuracy of the measurement instrument restrict the precision of the variable. The reported height would be rounded to the nearest centimetre, so it would be 1.63 metres. Age is another example of a continuous variable, one that is typically rounded down.
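The two reporting conventions mentioned above, rounding to the nearest unit and rounding down, can be sketched as follows. The height is the value quoted in the text (whose remaining digits are elided there); the exact age is a hypothetical value chosen only for illustration.

```python
import math

# Height quoted in the text, in metres (further digits are elided there).
measured_height_m = 1.6321748755

# Reported to the nearest centimetre, i.e. two decimal places in metres.
reported_height_m = round(measured_height_m, 2)

# Hypothetical exact age in years; age is usually reported rounded down.
exact_age_years = 17.93
reported_age = math.floor(exact_age_years)

print(reported_height_m)  # 1.63
print(reported_age)       # 17
```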
Discrete variables
As opposed to a continuous variable, a discrete variable can assume only a finite number of real values within a given interval. An example of a discrete variable would be the score given by a judge to a gymnast in competition: the range is 0 to 10 and the score is always given to one decimal (e.g. a score of 8.5). You can enumerate all possible values (0, 0.1, 0.2…) and see that the number of possible values is finite: there are 101 of them. Another example of a discrete variable is the number of people in a household for a household of size 20 or less. The number of possible values is 20, because a household cannot contain a fractional number of people, such as 2.27.
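The enumeration described above can be checked with a short Python sketch. The variable names are illustrative assumptions. Scores are generated from integer tenths so that the count is not affected by floating-point rounding, and household sizes are assumed to start at 1.

```python
# All possible judge scores from 0 to 10 in steps of 0.1,
# built from integer tenths (0, 1, ..., 100) to avoid floating-point drift.
scores = [tenths / 10 for tenths in range(0, 101)]
n_scores = len(scores)

# Possible household sizes for households of 20 people or fewer
# (assuming a household has at least one person).
household_sizes = list(range(1, 21))
n_sizes = len(household_sizes)

print(n_scores)  # 101
print(n_sizes)   # 20
```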