Content code
m1366
Slug (identifier)
box-and-whisker-plots
Grades
Secondary III
Topic
Mathematics
Tags
median
interquartile range
lower quartile
upper quartile
outlier
quarter range
box and whisker plot
Content
Contenu
Corps

The box and whisker plot allows you to see, at a glance, several details about the dispersion of the data in a distribution. It shows the quartiles (including the median), the minimum value, the maximum value, the interquartile range, the quarter range, and the outliers. In addition, the box and whisker plot makes it easy to assess the symmetry (or asymmetry) of a distribution.

A box and whisker plot is usually placed horizontally, but it is also possible for it to be placed vertically. Here is an example of each type.

Image
A horizontal box and whisker plot.
Image
A vertical box and whisker plot.
Content
Corps

A distribution may contain outliers, that is, data values that are not representative of the rest of the distribution. If such values exist, make sure they are indicated in the box and whisker plot. They cannot simply be eliminated; otherwise, the graph would be misleading and lose its credibility.

Content
Corps

An outlier is a value in the distribution that lies more than |1.5| times the interquartile range below |Q_1| or more than |1.5| times the interquartile range above |Q_3.|

Corps

In other words, a data value |x| of a distribution is an outlier if one of the following two conditions is met:

  • |x<Q_1-1.5\times IR|

  • |x>Q_3+1.5\times IR|
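The two conditions above can be sketched as a small Python check. This is only an illustration: the function name `is_outlier` and the quartile values used in the calls are hypothetical and do not come from the examples in this section.

```python
def is_outlier(x, q1, q3):
    """Return True if x is below Q1 - 1.5*IR or above Q3 + 1.5*IR."""
    ir = q3 - q1                    # interquartile range
    lower_limit = q1 - 1.5 * ir
    upper_limit = q3 + 1.5 * ir
    return x < lower_limit or x > upper_limit


# Hypothetical quartiles: Q1 = 20 and Q3 = 30, so IR = 10
print(is_outlier(4, 20, 30))    # True: 4 is less than 20 - 15 = 5
print(is_outlier(40, 20, 30))   # False: 40 is not greater than 30 + 15 = 45
```

A value exactly equal to one of the two limits does not satisfy either strict inequality, so it is not counted as an outlier.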

Links
Title (level 2)
Constructing Box and Whisker Plots
Title slug (identifier)
constructing-box-and-whisker
Contenu
Corps

Here are the steps to follow to construct a box and whisker plot:

Surtitle
Rule
Content
Corps
  1. Place the data in ascending order.

  2. Separate the data distribution into |4| equal quarters.

  3. Determine the value of the quartiles.

  4. Determine if there are any outliers.

  5. Determine the minimum and maximum.

  6. Draw the box and whisker plot.
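Steps 1 to 5 can be sketched in Python as follows. This is a minimal sketch, not a standard library routine: the function name `five_number_summary` and the sample data are hypothetical, the distribution is assumed to contain at least two values, and the quartiles are taken as the medians of the lower and upper halves (the middle value is left out of both halves when the number of data values is odd). Step 6, the drawing itself, is not covered.

```python
from statistics import median


def five_number_summary(data):
    """Apply steps 1 to 5 of the construction to a list of numbers."""
    values = sorted(data)                  # Step 1: ascending order
    n = len(values)
    half = n // 2                          # Step 2: size of each half
    lower = values[:half]                  # the median is left out when n is odd
    upper = values[n - half:]
    q1 = median(lower)                     # Step 3: quartiles
    q2 = median(values)
    q3 = median(upper)
    ir = q3 - q1                           # Step 4: outliers
    low_limit = q1 - 1.5 * ir
    high_limit = q3 + 1.5 * ir
    outliers = [v for v in values if v < low_limit or v > high_limit]
    inside = [v for v in values if low_limit <= v <= high_limit]
    # Step 5: minimum and maximum, ignoring any outliers
    return {"min": inside[0], "Q1": q1, "Q2": q2, "Q3": q3,
            "max": inside[-1], "outliers": outliers}


# Hypothetical data set with one unusually large value
summary = five_number_summary([4, 7, 5, 9, 6, 8, 30, 5])
print(summary)
```

For this hypothetical data set, the value 30 is reported as an outlier, so the maximum used for the whisker is 9.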

Content
Corps

Draw the box and whisker plot for the following distribution:

|15,| |26,| |31,| |16,| |19,| |38,| |12,| |22,| |36,| |27,| |30,| |18,| |29|


  1. Place the data in ascending order

    |12,| |15,| |16,| |18,| |19,| |22,| |26,| |27,| |29,| |30,| |31,| |36,| |38|

  2. Separate the data distribution into |\boldsymbol{4}| equal quarters

    This distribution has an odd number of data values |(13).| Therefore, |Q_2| is the data value at the centre of the distribution and separates it into |2| subgroups of |6| data values. |Q_1| and |Q_3| are therefore between data values, in order to create |4| quarters that each contain |3| data values.
    ||\begin{alignat}{20}&&&\boldsymbol{\color{#ec0000}{\underbrace{Q_2}}}\\12,15,16\ &\color{#3b87cd}{\Big\vert}\ 18,19,22\ &&\color{#ec0000}{\boxed{\boldsymbol{26}}}\ 27,29,30\ &&\color{#7cca51}{\Big\vert}\ 31,36,38\\&\!\!\!\!\boldsymbol{\color{#3b87cd}{\overbrace{Q_1}}}&&&&\!\!\!\!\boldsymbol{\color{#7cca51}{\overbrace{Q_3}}}\end{alignat}||

  3. Determine the value of the quartiles

    We start by determining the value of the median |(Q_2),| which corresponds to the 7th data value.
    ||\boldsymbol{\color{#ec0000}{Q_2}}=\boldsymbol{\color{#ec0000}{26}}||
    Then the 1st quartile |(Q_1)|, which corresponds to the middle of the 3rd and 4th data values, is calculated by finding the mean of those two data values.
    ||\begin{alignat}{20}12,15,\boldsymbol{16}\ &\color{#3b87cd}{\Big\vert}\ \boldsymbol{18},19,22\ &&\color{#ec0000}{\boxed{\boldsymbol{26}}}\ 27,29,30\ &&\color{#7cca51}{\Big\vert}\ 31,36,38\\&\!\!\!\!\boldsymbol{\color{#3b87cd}{\overbrace{Q_1}}}\end{alignat}||||\boldsymbol{\color{#3b87cd}{Q_1}}=\dfrac{16+18}{2}=\boldsymbol{\color{#3b87cd}{17}}||
    Finally, the 3rd quartile |(Q_3),| which corresponds to the mean of the 10th and 11th data values, is calculated.
    ||\begin{alignat}{20}12,15,16\ &\color{#3b87cd}{\Big\vert}\ 18,19,22\ &&\color{#ec0000}{\boxed{\boldsymbol{26}}}\ 27,29,\boldsymbol{30}\ &&\color{#7cca51}{\Big\vert}\ \boldsymbol{31},36,38\\&&&&&\!\!\!\!\boldsymbol{\color{#7cca51}{\overbrace{Q_3}}}\end{alignat}||||\boldsymbol{\color{#7cca51}{Q_3}}=\dfrac{30+31}{2}=\boldsymbol{\color{#7cca51}{30.5}}||

  4. Determine if there are any outliers

    First, the interquartile range is calculated.
    ||\begin{align}IR&=Q_3-Q_1\\&=30.5-17\\&=13.5\end{align}||
    Next, we verify if the data at the ends of the distribution are outliers.

Columns number
2 columns
Format
50% / 50%
First column
Corps

||\begin{align}Q_1-1.5\times IR&=17-1.5\times13.5\\&=-3.25\end{align}||No data value in the distribution is less than | -3.25.|

Second column
Corps

||\begin{align}Q_3+1.5\times IR&=30.5+1.5\times13.5\\&=50.75\end{align}||No data value in the distribution is greater than |50.75.|

Corps

Therefore, there are no outliers.

  5. Determine the minimum and maximum

Since there are no outliers, the minimum value |(x_\text{min})| corresponds to the data with the smallest value and the maximum value |(x_\text{max})| corresponds to the data with the largest value.||\begin{align}x_\text{min}&=12\\x_\text{max}&=38\end{align}||

  6. Draw the box and whisker plot

Using a number line and the values calculated in the previous steps, the box and whisker plot is drawn.

Image
A box and whisker plot.
Corps

It is not necessary to indicate the minimum, maximum and quartile values on the plot, since there is always a number line.
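The values found in this example can be checked with a few lines of Python. This is only a verification sketch using the standard `statistics.median` function; the variable names are illustrative.

```python
from statistics import median

data = [15, 26, 31, 16, 19, 38, 12, 22, 36, 27, 30, 18, 29]
values = sorted(data)
half = len(values) // 2                    # 6 values on each side of the median

q1 = median(values[:half])                 # mean of the 3rd and 4th values
q2 = median(values)                        # the 7th value
q3 = median(values[len(values) - half:])   # mean of the 10th and 11th values
ir = q3 - q1

print(q1, q2, q3)                          # 17.0 26 30.5
print(ir)                                  # 13.5
print(q1 - 1.5 * ir, q3 + 1.5 * ir)        # -3.25 50.75
```

Since every data value lies between |-3.25| and |50.75,| the check agrees that there are no outliers.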

The following is an example of a distribution where there is an outlier:

Content
Corps

Here are the grades obtained by students in Group 301 on a mathematics exam:

|63,| |96,| |60,| |84,| |52,| |68,| |70,| |12,| |98,| |75,| |72,| |65,| |60,| |74,| |92,| |76,| |94,| |68,| |65,| |88,| |76,| |80|

Construct the box and whisker plot for this distribution.

Columns number
2 columns
Format
50% / 50%
First column
Corps

Solution
Corps
  1. Place the data in ascending order

|12,| |52,| |60,| |60,| |63,| |65,| |65,| |68,| |68,| |70,| |72,| |74,| |75,| |76,| |76,| |80,| |84,| |88,| |92,| |94,| |96,| |98|

  2. Separate the data distribution into |\boldsymbol{4}| equal quarters

This distribution has an even number of data values |(22).| Therefore, |Q_2| is between the |2| data values at the centre of the distribution and separates it into |2| subgroups of |11| data values. |Q_1| and |Q_3| are therefore the data values in the middle of their respective subgroups, so as to create |4| quarters that contain |5| data values each.
||\begin{alignat}{20}&&&\!\!\!\!\boldsymbol{\color{#ec0000}{\underbrace{Q_2}}}\\12,52,60,60,63\ &\color{#3b87cd}{\boxed{\boldsymbol{65}}}\ 65,68,68,70,72\ &&\color{#ec0000}{\Big\vert}\ 74,75,76,76,80\ &&\color{#7cca51}{\boxed{\boldsymbol{84}}}\ 88,92,94,96,98\\&\!\!\;\boldsymbol{\color{#3b87cd}{\overbrace{Q_1}}}&&&&\!\!\;\boldsymbol{\color{#7cca51}{\overbrace{Q_3}}}\end{alignat}||

  3. Determine the value of the quartiles

We start by determining the value of the median |(Q_2),| which corresponds to the mean of the 11th and 12th data values.
||\begin{alignat}{20}&&&\!\!\!\!\boldsymbol{\color{#ec0000}{\underbrace{Q_2}}}\\12,52,60,60,63\ &\color{#3b87cd}{\boxed{\boldsymbol{65}}}\ 65,68,68,70,\boldsymbol{72}\ &&\color{#ec0000}{\Big\vert}\ \boldsymbol{74},75,76,76,80\ &&\color{#7cca51}{\boxed{\boldsymbol{84}}}\ 88,92,94,96,98\\\phantom{\boldsymbol{\overbrace{Q_1}}}\end{alignat}||||\boldsymbol{\color{#ec0000}{Q_2}}=\dfrac{72+74}{2}=\boldsymbol{\color{#ec0000}{73}}||
Next, we determine the 1st quartile |(Q_1),| which corresponds to the 6th data value.
||\boldsymbol{\color{#3b87cd}{Q_1}}=\boldsymbol{\color{#3b87cd}{65}}||
Finally, we determine the 3rd quartile |(Q_3),| which corresponds to the 17th data value.
||\boldsymbol{\color{#7cca51}{Q_3}}=\boldsymbol{\color{#7cca51}{84}}||

  4. Determine if there are any outliers

First, the interquartile range is calculated.
||\begin{align}IR&=Q_3-Q_1\\&=84-65\\&=19\end{align}||
Next, we verify if the data at the ends of the distribution are outliers.

Columns number
2 columns
Format
50% / 50%
First column
Corps

||\begin{align}Q_1-1.5\times IR&=65-1.5\times19\\&=36.5\end{align}||The 1st data value of the distribution |(12)| is less than |36.5.|

Second column
Corps

||\begin{align}Q_3+1.5\times IR&=84+1.5\times19\\&=112.5\end{align}||None of the data values in the distribution are greater than |112.5.|

Corps

Therefore, |12| is an outlier.

  5. Determine the minimum and maximum

Since |12| is an outlier, it is not considered the minimum value of the distribution.
||\begin{align}x_\text{min}&=52\\x_\text{max}&=98\end{align}||

  6. Draw the box and whisker plot

Using a number line and the values calculated in the previous steps, the box and whisker plot is drawn. The outlier |(12)| is represented by a star.

Image
A box and whisker plot containing an outlier.
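The outlier check and the choice of the minimum for the Group 301 grades can be sketched in Python. The quartiles |Q_1=65| and |Q_3=84| are taken from step 3 of the solution; the variable names are illustrative.

```python
grades = [63, 96, 60, 84, 52, 68, 70, 12, 98, 75, 72,
          65, 60, 74, 92, 76, 94, 68, 65, 88, 76, 80]

q1, q3 = 65, 84                 # quartiles found in step 3
ir = q3 - q1                    # 19
low_limit = q1 - 1.5 * ir       # 36.5
high_limit = q3 + 1.5 * ir      # 112.5

outliers = sorted(g for g in grades if g < low_limit or g > high_limit)
inside = [g for g in grades if low_limit <= g <= high_limit]

print(outliers)                 # [12]
print(min(inside), max(inside)) # 52 98
```

The whiskers therefore end at 52 and 98, and 12 is plotted separately as an outlier.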
Title (level 2)
Interpreting a Box and Whisker Plot
Title slug (identifier)
interpreting-box-and-whisker
Contenu
Corps

The number of data in a quarter should not be confused with the concentration of data in that same quarter.

Content
Corps

Each quarter of a box and whisker plot contains about |25\%| of the data in the distribution it represents.

Corps

In a box and whisker plot, a quarter that is longer than the others indicates that the data are more dispersed. Conversely, a quarter that is shorter than the others indicates that the data are more concentrated.
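As a rough illustration, the length of each quarter can be computed from the five values that define the plot. The sketch below uses the values found for the 13-value distribution earlier in this section (|x_\text{min}=12,| |Q_1=17,| |Q_2=26,| |Q_3=30.5| and |x_\text{max}=38|); the function name `quarter_lengths` is hypothetical.

```python
def quarter_lengths(x_min, q1, q2, q3, x_max):
    """Return the lengths of the 1st, 2nd, 3rd and 4th quarters."""
    return [q1 - x_min, q2 - q1, q3 - q2, x_max - q3]


lengths = quarter_lengths(12, 17, 26, 30.5, 38)
print(lengths)   # [5, 9, 4.5, 7.5]
```

Each quarter holds the same number of data values, but the 3rd quarter is the shortest (its data are the most concentrated) and the 2nd quarter is the longest (its data are the most dispersed).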

Content
Corps

With the intention of opening a new sportswear store, a company interviewed a sample of the population about how much each individual would be willing to pay for a high-quality piece of clothing.

To facilitate the interpretation of the data collected, the following box and whisker plot is constructed:

Image
A box and whisker plot.
Corps

Looking at this diagram, we can conclude that approximately |75\%| of people, that is, those in the 2nd, 3rd and 4th quarters, are prepared to pay between |\$ 60| and |\$ 120| for a top quality garment.

In addition, we note that the people in the 4th quarter are prepared to pay a price within a very narrow range (between |\$ 110| and |\$ 120|), whereas the people in the 1st quarter are prepared to pay a price within a very wide range (between |\$ 20| and |\$ 60|).

The company will need to keep this information in mind so that it does not set its prices too high or too low.

Content
Corps

It is possible to convert quarters to percentile ranks and quartiles to percentiles.

Columns number
2 columns
Format
50% / 50%
First column
Corps
  • The 1st quarter contains the percentile ranks 1 to 25.

  • The 2nd quarter contains the percentile ranks 26 to 50.

  • The 3rd quarter contains the percentile ranks 51 to 75.

  • The 4th quarter contains the percentile ranks 76 to 100.

Second column
Corps
  • The 1st quartile |(Q_1)| corresponds to the 25th percentile |(C_{25}).|

  • The median |(Q_2)| corresponds to the 50th percentile |(C_{50}).|

  • The 3rd quartile |(Q_3)| corresponds to the 75th percentile |(C_{75}).|

Title (level 2)
Comparing Box and Whisker Plots
Title slug (identifier)
comparing-box-and-whisker
Contenu
Corps

When comparing box and whisker plots, first compare the medians |(Q_2)| and then compare the lengths of the whiskers (1st and 4th quarters) and the lengths of the boxes (2nd and 3rd quarters) to get an idea of the symmetry and dispersion of each distribution.

Here is an example of interpreting and comparing 2 box and whisker plots.

Content
Corps

Serge is the manager of the shoe department in a sports equipment shop. To compare the performance of his two employees, Karine and Eric, he records their sales every day over a 30-day period. From the data he collects, Serge creates the following 2 box and whisker plots.

Image
Comparison of 2 box and whisker plots.
Corps

a) Who had the fewest sales in one day?

b) Who is the most consistent in terms of sales?

c) Considering their respective |15| best days, who made the most sales?

d) Did Karine necessarily have a day with |36| sales?

e) For how many days did Eric make between |22| and |40| sales per day?

Columns number
2 columns
Format
50% / 50%
First column
Corps

Solution
Corps

a) Who had the fewest sales in one day?

Simply compare the minimum data values |(x_\text{min})| for Karine and Eric.

Columns number
2 columns
Format
50% / 50%
First column
Corps

Karine||x_\text{min}=8||

Second column
Corps

Eric||x_\text{min}=12||

Corps

The person with the fewest sales in one day was Karine.

b) Who is the most consistent in terms of sales?

We can compare the range and the interquartile range |({IR})| for Karine and Eric, which will give an idea of the concentration of the data in each distribution.

Columns number
2 columns
Format
50% / 50%
First column
Corps

Karine

The minimum data value |(x_\text{min})| is |8| and the maximum data value |(x_\text{max})| is |52.|
||\begin{align}\text{Range}&=x_\text{max}-x_\text{min}\\&=52-8\\&=44\end{align}||
The 1st quartile |(Q_1)| is |26| and the 3rd quartile |(Q_3)| is |46.|
||\begin{align}IR&=Q_3-Q_1\\&=46-26\\&=20\end{align}||

Second column
Corps

Eric

The minimum data value |(x_\text{min})| is |12| and the maximum data value |(x_\text{max})| is |58.|
||\begin{align}\text{Range}&=x_\text{max}-x_\text{min}\\&=58-12\\&=46\end{align}||
The 1st quartile |(Q_1)| is |22| and the 3rd quartile |(Q_3)| is |50.|
||\begin{align}IR&=Q_3-Q_1\\&=50-22\\&=28\end{align}||

Corps

We see that Karine's range and interquartile range of sales are smaller than Eric's. In other words, compared to Eric, Karine's daily sales are more concentrated around the median, implying that Karine is a more consistent salesperson than Eric.
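The same comparison can be sketched in Python from the values read off the two plots. The dictionary layout and the function name `spread` are illustrative choices.

```python
summaries = {
    "Karine": {"min": 8, "Q1": 26, "Q2": 36, "Q3": 46, "max": 52},
    "Eric": {"min": 12, "Q1": 22, "Q2": 40, "Q3": 50, "max": 58},
}


def spread(s):
    """Return the range and the interquartile range of one plot."""
    return {"range": s["max"] - s["min"], "IR": s["Q3"] - s["Q1"]}


for name, s in summaries.items():
    print(name, spread(s))
# Karine {'range': 44, 'IR': 20}
# Eric {'range': 46, 'IR': 28}
```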

c) Considering their respective |\boldsymbol{15}| best days, who made the most sales?

To answer this question, we must analyze the data. However, since we do not know all the data values, we assume that they are evenly distributed in each quarter.

The sales data was collected over a period of |30| days. Therefore, we are interested in |50\%| of the days |\left(\dfrac{15}{30}\right)| that Karine and Eric made the most sales. In other words, we need to analyze the 3rd and 4th quarters of each box and whisker plot.

Columns number
2 columns
Format
50% / 50%
First column
Corps

Karine

The median |(Q_2)| is |36| and the maximum data value |(x_\text{max})| is |52.|

Second column
Corps

Eric

The median |(Q_2)| is |40| and the maximum data value |(x_\text{max})| is |58.|

Corps

We notice that the median and the maximum of Eric's sales are both higher than those of Karine. Therefore, we can conclude that Eric made the most sales during his best |15| days.

d) Did Karine necessarily have a day with |\boldsymbol{36}| sales?

We note that |36| is the median |(Q_2)| of Karine's distribution. However, the data was collected over a period of |30| days. The median is therefore the mean of the 15th and 16th data values.
||\begin{alignat}{20}&\!\!\!\!\!\!\!\!\!\!\!\boldsymbol{\color{#ec0000}{\underbrace{Q_2=36}}}\\x_\text{min}\ldots15\text{th}\text{ data value}\ &\color{#ec0000}{\Big\vert}\ 16\text{th}\text{ data value}\ldots x_\text{max}\end{alignat}||
For instance, the 15th data value might be |35| and the 16th data value |37.| The median would therefore be calculated as follows:
||\boldsymbol{\color{#ec0000}{Q_2}}=\dfrac{35+37}{2}=36||
We conclude that Karine did not necessarily have a day with |36| sales.
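This can be illustrated with a short Python sketch. The 30 values below are hypothetical and are not Karine's actual sales; they are chosen only so that the 15th value is 35 and the 16th value is 37.

```python
from statistics import median

# Hypothetical daily sales for 30 days: 21 to 35, then 37 to 51
sales = list(range(21, 36)) + list(range(37, 52))

print(len(sales))       # 30
print(median(sales))    # 36.0
print(36 in sales)      # False
```

The median is 36 even though no day in this hypothetical list has exactly 36 sales.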

e) For how many days did Eric make between |\boldsymbol{22}| and |\boldsymbol{40}| sales per day?

We are concerned with the number of days in the 2nd quarter, that is, between |22| |(Q_1)| and |40| |(Q_2).| We found earlier that the median |(Q_2)| separates the distribution into 2 subgroups of |15| data values. This means that |Q_1| corresponds to the 8th data value in the distribution.
||\begin{alignat}{20}&&&\!\!\!\!\!\!\!\!\!\!\!\boldsymbol{\color{#ec0000}{\underbrace{Q_2=40}}}\\x_\text{min}\ldots7\text{th}\text{ data value}\ &\color{#3b87cd}{\boxed{\boldsymbol{8}\textbf{th}\textbf{ data value}}}\ 9\text{th}\text{ data value}\ldots15\text{th}\text{ data value}\ &&\color{#ec0000}{\Big\vert}\ 16\text{th}\text{ data value}\ldots x_\text{max}\\&\ \:\boldsymbol{\color{#3b87cd}{\overbrace{Q_1=22}}}\end{alignat}||
Each quarter contains |7| data values. So, unless the data values |22| and |40| repeat, we can conclude that Eric made between |22| and |40| sales on |7| days.
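The positions of the quartiles in an ordered distribution of |n| data values can be sketched as follows, assuming the method used throughout this section (the quartiles are the medians of the two halves, and the middle value is left out of both halves when |n| is odd). The function name `quartile_positions` is hypothetical; a position such as 15.5 means the mean of the 15th and 16th data values.

```python
def quartile_positions(n):
    """Return the 1-based positions of Q1, Q2 and Q3 among n ordered values."""
    half = n // 2                       # number of values in each half
    q1 = (half + 1) / 2                 # middle of the lower half
    q2 = (n + 1) / 2                    # middle of the whole distribution
    q3 = (n - half) + (half + 1) / 2    # middle of the upper half
    return q1, q2, q3


print(quartile_positions(30))   # (8.0, 15.5, 23.0)
```

For |30| data values, |Q_1| is the 8th value and |Q_2| falls between the 15th and 16th values, which leaves the 9th to 15th values, that is, |7| data values, strictly between them.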

Contenu
Title
The evolution of a share on the stock exchange using box and whisker plots
Content
Content
Columns number
2 columns
Format
50% / 50%
First column
Corps

When analyzing a share on the stock market, a graph similar to a box and whisker plot is often used.

However, it should not be interpreted in the same way, since the different parts of each plot are not quarters, that is, they do not contain |25\%| of the data.

In this type of diagram, the length of the box corresponds to the difference between the share's opening and closing prices.

Second column
Image
Graph of a share on the stock exchange.
Description
Source of data: OANDA Corporation, 2023.
Title (level 2)
See also
Title slug (identifier)
see-also
Contenu
Links