To promote a thorough understanding of data collected during a study, it is important to use adequate and accurate modes of representation. Here are a few examples:
When you have a distribution containing either data that are all different, or a small number of data, it is possible to simply list them.
In a recent survey, 14 people were asked to count the number of minutes they spent in front of the television during the previous day. Here are the different responses:
|39,| |23,| |40,| |59,| |10,| |39,| |57,| |38,| |11,| |37,| |53,| |61,| |29,| |51|
This data can be represented in a table.
Person | Time (minutes) |
---|---|
1st person | |39| |
2nd person | |23| |
3rd person | |40| |
4th person | |59| |
5th person | |10| |
6th person | |39| |
7th person | |57| |
8th person | |38| |
9th person | |11| |
10th person | |37| |
11th person | |53| |
12th person | |61| |
13th person | |29| |
14th person | |51| |
Note: The data could have also been represented in a stem-and-leaf plot.
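As a rough illustration of the stem-and-leaf alternative, here is a minimal Python sketch. It assumes the tens digit is used as the stem and the units digit as the leaf; the function name `stem_and_leaf` is hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Minutes of television reported by the 14 people surveyed.
minutes = [39, 23, 40, 59, 10, 39, 57, 38, 11, 37, 53, 61, 29, 51]

def stem_and_leaf(values):
    """Group each value's units digit (leaf) under its tens digit (stem)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 39 -> stem 3, leaf 9
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(minutes)
for stem in sorted(plot):
    # Print one line per stem, with its leaves in increasing order.
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

Each printed line shows one stem followed by its leaves, so the 14 responses stay individually readable while being grouped by tens.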
When conducting a survey, it is useful to use a table to compile the data before analyzing it.
A class of |30| students are asked to name their favourite sport. The data is compiled in the table below:
Sport | Tally | Number of students |
---|---|---|
Football | IIIII I | |6| |
Hockey | IIIII IIIII | |10| |
Baseball | III | |3| |
Soccer | IIIII | |5| |
Dance | IIII | |4| |
Tennis | II | |2| |
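The compilation can also be done programmatically. The sketch below uses `collections.Counter` from Python's standard library. The raw list of answers is not given in the survey, so the `responses` list is a hypothetical one rebuilt from the counts in the table.

```python
from collections import Counter

# Hypothetical raw answers, rebuilt from the counts in the table above.
responses = (
    ["Football"] * 6 + ["Hockey"] * 10 + ["Baseball"] * 3
    + ["Soccer"] * 5 + ["Dance"] * 4 + ["Tennis"] * 2
)

counts = Counter(responses)  # maps each sport to its number of students

for sport, number in counts.items():
    # One "I" per student stands in for a hand-drawn tally mark.
    print(f"{sport:<10} {'I' * number:<12} {number}")

total = sum(counts.values())
print("Total:", total)
```

Checking that the total matches the class size is a quick way to confirm that no answer was missed or counted twice.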
When the amount of data is very large and several data values occur more than once, it is helpful to use a condensed data table, also called a distribution table.
Note that the table can only be used when the type of variable studied is quantitative discrete, or qualitative. For quantitative continuous variables, tables with data grouped into classes or intervals are used.
A table of condensed data may contain several columns: value (or category), frequency, cumulative frequency, relative frequency and cumulative relative frequency. In general, each row is associated with a value or category, except for the last row, which gives the column totals.
- A value is a data item that refers to a quantitative variable.
- A category is a data item that refers to a qualitative variable.
- A frequency is the number of times that a data value occurs in a distribution.
- A cumulative frequency is the sum of the frequency of a data item and the total frequency of all the data items that precede it.
- The total frequency is the total number of data in a distribution.
- A relative frequency is the percentage of a data item’s frequency in relation to the total frequency.
- A cumulative relative frequency is the percentage of the cumulative frequency of a data item in relation to the total number of data.
To find a cumulative frequency, add the frequency of the value to the frequencies of all the values that precede it. To find a relative frequency or a cumulative relative frequency, the following formulas can be used:
||\text{Relative Frequency} = \dfrac{\text{Frequency}}{\text{Total frequency}} \times 100||
||\begin{gathered}\text{Cumulative}\\\text{Relative Frequency}\end{gathered}=\dfrac{\text{Cumulative frequency}}{\text{Total frequency}}\times100||
- Determine the data.
- Determine the frequency of each data value.
- Calculate the cumulative frequency of each data value, as needed.
- Calculate the relative frequency of each data value, as needed.
- Calculate the cumulative relative frequency of each data value, as needed.
A woman stands in the courtyard of a secondary school and asks the people she meets their age. She gets the following distribution:
|14,| |16,| |13,| |12,| |12,| |13,| |17,| |15,| |15,| |15,| |18,| |12,| |13,| |13,| |14,| |13,| |14,| |15,| |16,| |15,| |15,| |12,| |17,| |17,| |16,| |14,| |14,| |14,| |15,| |15,| |13,| |16,| |17,| |15,| |13,| |17,| |14,| |12,| |15,| |13|
- Determine the data
In this distribution, the values of the ages are |12,| |13,| |14,| |15,| |16,| |17,| and |18| years old.
Age | Frequency | Cumulative frequency | Relative frequency |(\%)| | Cumulative relative frequency |(\%)| |
---|---|---|---|---|
|12| | ||||
|13| | ||||
|14| | ||||
|15| | ||||
|16| | ||||
|17| | ||||
|18| | ||||
Total |
- Determine the frequency of each data value
To do so, the number of times each value occurs in the distribution is recorded in the table. A tally table (compilation table) can be used if needed.
We then add them all together and find that there are |40| data values.
Age | Frequency | Cumulative frequency | Relative frequency |(\%)| | Cumulative relative frequency |(\%)| |
---|---|---|---|---|
|12| | |5| | |||
|13| | |8| | |||
|14| | |7| | |||
|15| | |10| | |||
|16| | |4| | |||
|17| | |5| | |||
|18| | |1| | |||
Total | |\boldsymbol{40}| |
- Calculate the cumulative frequency of each data value, as needed
To find the cumulative frequency of a certain value, we add its frequency to the frequencies of all the preceding values.
For the 1st row, which is the row with a value of |12| years, the cumulative frequency of people is equal to the frequency, which is |5.| For the 2nd row, the cumulative frequency equals |5+8,| or |13.| For the 3rd row, the cumulative frequency equals |13+7,| or |20.| The rest of the column is filled in the same way.
There is no total to fill in for this column. However, we must make sure that the cumulative frequency of the last value equals the total number of people surveyed.
Age | Frequency | Cumulative frequency | Relative frequency |(\%)| | Cumulative relative frequency |(\%)| |
---|---|---|---|---|
|12| | |5| | |5| | ||
|13| | |8| | |13| | ||
|14| | |7| | |20| | ||
|15| | |10| | |30| | ||
|16| | |4| | |34| | ||
|17| | |5| | |39| | ||
|18| | |1| | |40| | ||
Total | |\boldsymbol{40}| |
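The running sums in this column can be checked with a short Python sketch. It assumes the frequencies are already listed in increasing order of age and uses `itertools.accumulate` from the standard library.

```python
from itertools import accumulate

ages = [12, 13, 14, 15, 16, 17, 18]
frequencies = [5, 8, 7, 10, 4, 5, 1]

# Each cumulative frequency is the current frequency plus all preceding ones.
cumulative = list(accumulate(frequencies))

for age, frequency, running_total in zip(ages, frequencies, cumulative):
    print(age, frequency, running_total)

# The last cumulative frequency must equal the total frequency.
total = sum(frequencies)
assert cumulative[-1] == total
```

The final assertion mirrors the verification described above: the last cumulative frequency should match the total of the frequency column.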
- Calculate the relative frequency of each data value, as needed
This is determined using the relative frequency formula for each value. Here is an example of how this is calculated for the value |14.|
||\begin{align}\text{Relative Frequency}&= \dfrac{\text{Frequency}}{\text{Total frequency}} \times 100\\ &=\dfrac{7}{40} \times 100\\ &= 17.5\ \%\end{align}||
The total of this column should always equal |100\%.|
Age | Frequency | Cumulative frequency | Relative frequency |(\%)| | Cumulative relative frequency |(\%)| |
---|---|---|---|---|
|12| | |5| | |5| | |12.5| | |
|13| | |8| | |13| | |20.0| | |
|14| | |7| | |20| | |17.5| | |
|15| | |10| | |30| | |25.0| | |
|16| | |4| | |34| | |10.0| | |
|17| | |5| | |39| | |12.5| | |
|18| | |1| | |40| | |2.5| | |
Total | |\boldsymbol{40}| | | |\boldsymbol{100.0}| | |
- Calculate the cumulative relative frequency of each data value, as needed
This is determined by using the cumulative relative frequency formula for each value. Here is an example of how this is calculated for the value |16.|
||\begin{align}\begin{gathered}\text{Cumulative}\\\text{Relative Frequency}\end{gathered}&= \dfrac{\text{Cumulative frequency}}{\text{Total frequency}} \times 100\\&=\dfrac{34}{40} \times 100\\ &= 85.0\ \%\end{align}||
There is no total to fill in for this column. However, we must make sure that the cumulative relative frequency of the last value equals |100\%.|
Age | Frequency | Cumulative frequency | Relative frequency |(\%)| | Cumulative relative frequency |(\%)| |
---|---|---|---|---|
|12| | |5| | |5| | |12.5| | |12.5| |
|13| | |8| | |13| | |20.0| | |32.5| |
|14| | |7| | |20| | |17.5| | |50.0| |
|15| | |10| | |30| | |25.0| | |75.0| |
|16| | |4| | |34| | |10.0| | |85.0| |
|17| | |5| | |39| | |12.5| | |97.5| |
|18| | |1| | |40| | |2.5| | |100.0| |
Total | |\boldsymbol{40}| | | |\boldsymbol{100.0}| | |
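The whole procedure can be sketched in Python, starting from the raw list of ages. This is only one possible way to organize the calculation; the function name `condensed_table` and the tuple layout of each row are assumptions made for this example.

```python
from collections import Counter

# The 40 ages collected in the schoolyard.
ages = [14, 16, 13, 12, 12, 13, 17, 15, 15, 15, 18, 12, 13, 13, 14, 13, 14, 15, 16, 15,
        15, 12, 17, 17, 16, 14, 14, 14, 15, 15, 13, 16, 17, 15, 13, 17, 14, 12, 15, 13]

def condensed_table(data):
    """Return one row per value: (value, frequency, cumulative frequency,
    relative frequency in %, cumulative relative frequency in %)."""
    counts = Counter(data)
    total = len(data)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        frequency = counts[value]
        cumulative += frequency  # add the frequencies of all preceding values
        rows.append((value, frequency, cumulative,
                     frequency / total * 100, cumulative / total * 100))
    return rows

table = condensed_table(ages)
for row in table:
    print("{:>3} {:>4} {:>4} {:>6.1f} {:>6.1f}".format(*row))
```

Each printed row follows the column order of the table: age, frequency, cumulative frequency, relative frequency and cumulative relative frequency.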
When the number of values in the distribution is very large, or the type of variable studied is quantitative and continuous, a table of data grouped into classes is often used to organize the data.
A table of data grouped into classes or intervals contains roughly the same columns as a condensed data table: class, frequency, cumulative frequency, relative frequency, and cumulative relative frequency. Only the first column changes, from value to class/interval.
- A class is an interval of values written in square brackets.
- The range of a class is its highest value minus the lowest value.
There are several different ways to separate a distribution into classes. First, the number of classes is chosen (usually 5 to 8 classes). Then the range of each class, called the class interval, is determined using the following formula:
||\text{Class Interval}= \dfrac{\text{Range of the distribution}}{\text{Number of classes}}||
- Determine the classes by finding each class interval.
- Determine the frequency of each class.
- Calculate the cumulative frequency of each class, as needed.
- Calculate the relative frequency of each class, as needed.
- Calculate the cumulative relative frequency of each class, as needed.
In a study about the influence of climate on the size of different rodents, 20 rodents of the same species were measured. Here are the results in centimetres:
|12.1;| |12.3;| |12.4;| |12.5;| |13.2;| |13.7;| |14.2;| |14.8;| |14.9;| |14.9;| |14.9;| |15.0;| |15.2;| |15.3;| |15.3;| |15.4;| |15.5;| |15.6;| |16.3;| |17.3|
- Determine the classes by finding the class interval
We separate the data into |6| classes. To find the class interval, we take the range of the distribution and divide it by the desired number of classes.
||\begin{align}\text{Class Interval}&=\dfrac{\text{Range of the distribution}}{\text{Number of classes}}\\ &=\dfrac{17.3-12.1}{6}\\&=\dfrac{5.2}{6}\\&=0.8\overline{6}\end{align}||
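The calculated class interval is |0.8\overline{6}.| For convenience, we round it up to a class interval of |1| and begin the classes at |12,| so that the |6| classes cover every value from |12.1| to |17.3.|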
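The same calculation can be sketched in Python. Rounding the interval up to the next whole number and starting at the whole number just below the smallest value are assumptions that match the choices made in this example; other convenient values could be chosen.

```python
import math

# Sizes of the 20 rodents, in centimetres.
sizes = [12.1, 12.3, 12.4, 12.5, 13.2, 13.7, 14.2, 14.8, 14.9, 14.9,
         14.9, 15.0, 15.2, 15.3, 15.3, 15.4, 15.5, 15.6, 16.3, 17.3]
number_of_classes = 6

data_range = max(sizes) - min(sizes)            # range of the distribution
raw_interval = data_range / number_of_classes   # about 0.87

# Assumption: round up to a whole number and start at the floor of the minimum.
class_interval = math.ceil(raw_interval)
start = math.floor(min(sizes))

# Class boundaries: each class runs from one edge up to (not including) the next.
edges = [start + i * class_interval for i in range(number_of_classes + 1)]
print(round(raw_interval, 2), class_interval, edges)
```

Rounding up rather than down ensures that the chosen number of classes is wide enough to contain the largest value.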
Size (cm) | Frequency | Relative frequency |(\%)| |
---|---|---|
|[12, 13[| | ||
|[13, 14[| | ||
|[14, 15[| | ||
|[15, 16[| | ||
|[16, 17[| | ||
|[17, 18[| | ||
Total |
- Determine the frequency of each class
To do so, we count the number of data in each class.
Note: The data value |15.0| is placed in the class |[15,16[| and not in the |[14,15[| class.
Then, we add up the frequencies and find that there are indeed a total of |20| data values.
Size (cm) | Frequency | Relative frequency |(\%)| |
---|---|---|
|[12, 13[| | |4| | |
|[13, 14[| | |2| | |
|[14, 15[| | |5| | |
|[15, 16[| | |7| | |
|[16, 17[| | |1| | |
|[17, 18[| | |1| | |
Total | |\boldsymbol{20}| |
- Calculate the cumulative frequency of each class, as needed
This is not necessary for this example.
- Calculate the relative frequency of each class, as needed
This is done using the relative frequency formula for each class. Here is an example of how this is calculated for the |[15, 16[| class.
||\begin{align}\text{Relative Frequency}&= \dfrac{\text{Frequency}}{\text{Total frequency}} \times 100\\ &=\dfrac{7}{20} \times 100\\ &= 35\ \%\end{align}||
This column should always have a total of |100\ \%.|
Size (cm) | Frequency | Relative frequency |(\%)| |
---|---|---|
|[12, 13[| | |4| | |20| |
|[13, 14[| | |2| | |10| |
|[14, 15[| | |5| | |25| |
|[15, 16[| | |7| | |35| |
|[16, 17[| | |1| | |5| |
|[17, 18[| | |1| | |5| |
Total | |\boldsymbol{20}| | |\boldsymbol{100}| |
- Calculate the cumulative relative frequency of each class, as needed
This is not necessary for this example.
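To check the grouped table, here is a minimal Python sketch. It assumes classes that include their lower bound and exclude their upper bound, as the notation with an outward-facing bracket indicates; the function name `grouped_table` is hypothetical.

```python
# Sizes of the 20 rodents, in centimetres.
sizes = [12.1, 12.3, 12.4, 12.5, 13.2, 13.7, 14.2, 14.8, 14.9, 14.9,
         14.9, 15.0, 15.2, 15.3, 15.3, 15.4, 15.5, 15.6, 16.3, 17.3]
edges = [12, 13, 14, 15, 16, 17, 18]  # boundaries of the six classes

def grouped_table(data, edges):
    """Return one row per class: (label, frequency, relative frequency in %)."""
    total = len(data)
    rows = []
    for lower, upper in zip(edges, edges[1:]):
        # A value equal to the lower bound belongs to the class; the upper bound does not.
        frequency = sum(1 for x in data if lower <= x < upper)
        rows.append((f"[{lower}, {upper}[", frequency, frequency / total * 100))
    return rows

table = grouped_table(sizes, edges)
for label, frequency, relative in table:
    print(f"{label:<10} {frequency:>3} {relative:>5.0f}")
```

With the condition `lower <= x < upper`, the value 15.0 is counted in the class starting at 15 and not in the one ending at 15, in line with the note above.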