Quantile algorithms

Published

October 20, 2024

1 Steps to calculate quantiles in R

  • The help page for the quantile() function in R describes the different algorithms used to calculate quantiles. However, I find the details very technical and hard for casual readers to follow.

  • Therefore, I have tried to simplify the algorithms, as discussed below.

  • R provides nine algorithms for calculating quantiles, selected with the type argument of quantile():

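    • Types \(1-3\) are discontinuous (they return an observed value or, for type \(2\), the average of two adjacent values) and suit discrete or ordinal data, while types \(4-9\) are continuous (they interpolate linearly between adjacent observations) and suit continuous data.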

    • Types \(1\) and \(3\) can be used for class “Date” and for ordered factors.

    • Type \(7\) is the default method in R.

    • Type \(6\) gives results similar to SPSS, Minitab, or GraphPad Prism.
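
  • As a minimal sketch of the points above, the following snippet uses only the base quantile() function. The `pain` factor reuses the pain-severity coding from this post, and the vector `z` is made-up illustration data; neither comes from the help page.

```r
# Illustration data (assumed): pain severity stored as an ordered factor.
pain <- factor(
  c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3),
  levels = 1:3,
  labels = c("mild", "moderate", "severe"),
  ordered = TRUE
)

# Types 1 and 3 accept ordered factors and return one of the levels.
q_type1 <- quantile(pain, probs = 0.5, type = 1)
q_type3 <- quantile(pain, probs = 0.5, type = 3)
print(q_type1)
print(q_type3)

# Leaving out `type` gives the same result as asking for type = 7.
z <- c(2.1, 3.4, 5.0, 7.7, 9.3)
print(identical(quantile(z), quantile(z, type = 7)))
```

  • Because the result for an ordered factor is itself an ordered factor, it can be compared with the levels directly, for example `q_type1 >= "moderate"`.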

  • Let \(n\) be the number of observations in the dataset.

  • Let \(p\) be a number between \(0\) and \(1\), so that \(Q_p\) denotes the quantile at probability \(p\), i.e., the \((p \times 100)^{th}\) percentile.

  • The first step is to calculate the product \(np\).

  • Next, calculate the rank \(j\) and the quantile \(Q_p\) according to the chosen algorithm:

    | Algorithm | \(j\) | Condition | Quantile \((Q_p)\) |
    |---|---|---|---|
    | \(1\) | \(\lfloor np \rfloor\) | \(np = j\) | \(x_j\) |
    | | | \(np \ne j\) | \(x_{j+1}\) |
    | \(2\) | \(\lfloor np \rfloor\) | \(np = j\) | \(\frac{1}{2}(x_j + x_{j+1})\) |
    | | | \(np \ne j\) | \(x_{j+1}\) |
    | \(3\) | \(\lfloor np - \frac{1}{2} \rfloor\) | \(np = j + \frac{1}{2}\) and \(j\) even | \(x_j\) |
    | | | otherwise | \(x_{j+1}\) |
    | \(4\) | \(\lfloor np \rfloor\) | | \(x_j + (x_{j+1} - x_j)(np - j)\) |
    | \(5\) | \(\lfloor np + \frac{1}{2} \rfloor\) | | \(x_j + (x_{j+1} - x_j)(np - j + \frac{1}{2})\) |
    | \(6\) | \(\lfloor np + p \rfloor\) | | \(x_j + (x_{j+1} - x_j)(np - j + p)\) |
    | \(7\) | \(\lfloor np - p + 1 \rfloor\) | | \(x_j + (x_{j+1} - x_j)(np - j - p + 1)\) |
    | \(8\) | \(\lfloor np + \frac{p+1}{3} \rfloor\) | | \(x_j + (x_{j+1} - x_j)(np - j + \frac{p+1}{3})\) |
    | \(9\) | \(\lfloor np + \frac{2p+3}{8} \rfloor\) | | \(x_j + (x_{j+1} - x_j)(np - j + \frac{2p+3}{8})\) |

    • The symbol \(\lfloor x \rfloor\) reads as the floor of \(x\). This function returns the largest integer not greater than \(x\) (i.e., rounds down \(x\) to the nearest integer). For example, \(\lfloor 3.9 \rfloor = 3\).

    • \(x_j\) and \(x_{j+1}\) are the \(j^{th}\) and \((j+1)^{th}\) order statistics, respectively (i.e., the observations having the rank \(j\) and \(j+1\) in the ordered array of the observations).
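
  • The rules in the table can be sketched as a single R function. This is only an illustration: `table_quantile()` and its inner helper `os()` are hypothetical names, not part of R, and clamping ranks below \(1\) or above \(n\) to the smallest or largest observation is an assumption made here so that \(p = 0\) and \(p = 1\) work.

```r
# Hypothetical helper (not part of R): compute Q_p from the rank/condition rules.
table_quantile <- function(x, p, type = 7) {
  stopifnot(length(p) == 1, p >= 0, p <= 1, type %in% 1:9)
  x <- sort(x)                 # order statistics x_1 <= ... <= x_n
  n <- length(x)
  np <- n * p

  # Assumption: ranks outside 1..n are clamped to the nearest valid rank.
  os <- function(k) x[min(max(k, 1), n)]

  if (type <= 3) {
    # Discontinuous algorithms: pick x_j, x_{j+1}, or their average.
    j <- if (type == 3) floor(np - 0.5) else floor(np)
    if (type == 1) {
      q <- if (np == j) os(j) else os(j + 1)
    } else if (type == 2) {
      q <- if (np == j) (os(j) + os(j + 1)) / 2 else os(j + 1)
    } else {
      q <- if (np == j + 0.5 && j %% 2 == 0) os(j) else os(j + 1)
    }
    return(q)
  }

  # Continuous algorithms: m is the quantity inside the floor in the table.
  m <- switch(as.character(type),
    "4" = np,
    "5" = np + 1 / 2,
    "6" = np + p,
    "7" = np - p + 1,
    "8" = np + (p + 1) / 3,
    "9" = np + (2 * p + 3) / 8
  )
  j <- floor(m)
  os(j) + (os(j + 1) - os(j)) * (m - j)
}

# Usage on a small made-up sample; compare with quantile(w, 0.4, type = 6).
w <- c(3.1, 4.7, 5.2, 6.0, 8.8)
print(table_quantile(w, 0.4, type = 6))
print(unname(quantile(w, 0.4, type = 6)))
```

  • The exact comparisons such as `np == j` are kept for readability. With probabilities that are not exactly representable in floating point, a small tolerance may be needed; R's own implementation guards against this.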

  • Let’s illustrate the above algorithms with some examples.

  • Example \(1\) (ordered factor): consider the following pain severity scores (\(1\): mild, \(2\): moderate, \(3\): severe) of \(10\) patients, already sorted in ascending order: \(1, 1, 1, 2, 2, 3, 3, 3, 3, 3\). The \(50^{th}\) percentile \((Q_{0.5})\) and the \(75^{th}\) percentile \((Q_{0.75})\) can be calculated using algorithms \(1-3\) as follows:

  • Algorithm \(1\), \(\small 50^{th}\) percentile:

    • \(\small p = 0.5\), \(\small np = 10 \times 0.5 = 5\), and \(\small j = \lfloor np \rfloor = \lfloor 5 \rfloor = 5\)

    • Since \(\small np = j\), then \(\small Q_{0.5} = x_j = x_5\) (the observation having the rank \(\small 5\) in the ordered array), which has the value \(\small 2\)

  • Algorithm \(1\), \(\small 75^{th}\) percentile:

    • \(\small p = 0.75\), \(\small np = 10 \times 0.75 = 7.5\), and \(\small j = \lfloor np \rfloor = \lfloor 7.5 \rfloor = 7\)

    • Since \(\small np \ne j\), then \(\small Q_{0.75} = x_{j+1} = x_8\) (the observation having the rank \(\small 8\) in the ordered array), which has the value \(\small 3\)

  • Check the result using the quantile() function in R:

    # Example 1 data: pain severity scores of 10 patients
    x <- c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3)
    quantile(
      x,
      probs = c(0.5, 0.75),
      type = 1
    )
    #> 50% 75%
    #>   2   3

  • Algorithm \(2\), \(\small 50^{th}\) percentile:

    • \(\small p = 0.5\), \(\small np = 10 \times 0.5 = 5\), and \(\small j = \lfloor np \rfloor = \lfloor 5 \rfloor = 5\)

    • Since \(\small np = j\), then \(\small Q_{0.5} = \frac{1}{2}(x_j + x_{j+1}) =\) \(\small \frac{1}{2}(x_5 + x_6) = \frac{1}{2}(2 + 3) = 2.5\)

  • Algorithm \(2\), \(\small 75^{th}\) percentile:

    • \(\small p = 0.75\), \(\small np = 10 \times 0.75 = 7.5\), and \(\small j = \lfloor np \rfloor = \lfloor 7.5 \rfloor = 7\)

    • Since \(\small np \ne j\), then \(\small Q_{0.75} = x_{j+1} = x_8 = 3\)

  • Check the result using the quantile() function in R:

    # Algorithm 2 on the Example 1 data
    quantile(
      x,
      probs = c(0.5, 0.75),
      type = 2
    )
    #> 50% 75%
    #> 2.5 3.0

  • Algorithm \(3\), \(\small 50^{th}\) percentile:

    • \(\small p = 0.5\), \(\small np = 10 \times 0.5 = 5\), and \(\small j = \lfloor np - 0.5 \rfloor = \lfloor 5 - 0.5 \rfloor = 4\)

    • Since \(\small np \ne j + 0.5\), then \(\small Q_{0.5} = x_{j+1} = x_5 = 2\)

  • Algorithm \(3\), \(\small 75^{th}\) percentile:

    • \(\small p = 0.75\), \(\small np = 10 \times 0.75 = 7.5\), and \(\small j = \lfloor np - 0.5 \rfloor = \lfloor 7.5 - 0.5 \rfloor = 7\)

    • Since \(\small np = j + 0.5\) but \(\small j = 7\) is odd, then \(\small Q_{0.75} = x_{j+1} = x_8 = 3\)

  • Check the result using the quantile() function in R:

    # Algorithm 3 on the Example 1 data
    quantile(
      x,
      probs = c(0.5, 0.75),
      type = 3
    )
    #> 50% 75%
    #>   2   3

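
  • The examples above never reach the \(x_j\) branch of algorithm \(3\), which requires \(np = j + \frac{1}{2}\) with \(j\) even. A small sketch that does, using the made-up vector `z <- 1:10` (chosen so that each order statistic equals its rank):

```r
z <- 1:10                             # assumed illustration data: x_j = j
probs <- c(0.25, 0.75)                # np = 2.5 and np = 7.5
j <- floor(length(z) * probs - 0.5)   # rank j for each probability
res <- quantile(z, probs = probs, type = 3)
print(j)
print(res)
```

  • For \(p = 0.25\), \(j = 2\) is even and \(np = j + 0.5\), so the result is \(x_2 = 2\). For \(p = 0.75\), \(j = 7\) is odd, so the result is \(x_8 = 8\). In effect, algorithm \(3\) rounds \(np\) to the nearest rank and breaks ties towards the even rank.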
  • Example \(2\) (continuous data): consider the following ordered set of observations: \((10.2,\ 10.4,\ 11.6,\ 12.3,\ 13.2,\ 14.7,\ 15.4,\ 16.1)\). The \(25^{th}\) percentile \((Q_{0.25})\) and the \(75^{th}\) percentile \((Q_{0.75})\) can be calculated using algorithms \(4-9\) as follows:

  • Algorithm \(4\), \(\small 25^{th}\) percentile:

    • \(\small p = 0.25\), \(\small np = 8 \times 0.25 = 2\), and \(\small j = \lfloor np \rfloor = \lfloor 2 \rfloor = 2\)

    • \(\small Q_{0.25} = x_j + (x_{j+1} - x_j)(np - j) = x_2 + (x_3 - x_2)(2 - 2) = 10.4\)

  • Algorithm \(4\), \(\small 75^{th}\) percentile:

    • \(\small p = 0.75\), \(\small np = 8 \times 0.75 = 6\), and \(\small j = \lfloor np \rfloor = \lfloor 6 \rfloor = 6\)

    • \(\small Q_{0.75} = x_j + (x_{j+1} - x_j)(np - j) = x_6 + (x_7 - x_6)(6 - 6) = 14.7\)

  • Check the result using the quantile() function in R:

    # Example 2 data: eight continuous measurements
    y <- c(10.2, 10.4, 11.6, 12.3, 13.2, 14.7, 15.4, 16.1)
    quantile(
      y,
      probs = c(0.25, 0.75),
      type = 4
    )
    #>  25%  75%
    #> 10.4 14.7

  • Algorithm \(5\), \(\small 25^{th}\) percentile:

    • \(\small p = 0.25\), \(\small np = 8 \times 0.25 = 2\), and \(\small j = \lfloor np + 0.5 \rfloor = \lfloor 2.5 \rfloor = 2\)

    • \(\small Q_{0.25} = x_j + (x_{j+1} - x_j)(np - j + 0.5) = x_2 + (x_3 - x_2)(2 - 2 + 0.5) =\)

      \(\small 10.4 + (11.6 - 10.4) \times 0.5 = 11\)

  • Algorithm \(5\), \(\small 75^{th}\) percentile:

    • \(\small p = 0.75\), \(\small np = 8 \times 0.75 = 6\), and \(\small j = \lfloor np + 0.5 \rfloor = \lfloor 6.5 \rfloor = 6\)

    • \(\small Q_{0.75} = x_j + (x_{j+1} - x_j)(np - j + 0.5) = x_6 + (x_7 - x_6)(6 - 6 + 0.5) =\) \(\small 14.7 + (15.4 - 14.7) \times 0.5 = 15.05\)

  • Check the result using the quantile() function in R:

    # Algorithm 5 on the Example 2 data
    quantile(
      y,
      probs = c(0.25, 0.75),
      type = 5
    )
    #>   25%   75%
    #> 11.00 15.05

  • Algorithm \(6\), \(\small 25^{th}\) percentile:

    • \(\small p = 0.25\), \(\small np = 8 \times 0.25 = 2\), and \(\small j = \lfloor np + p \rfloor = \lfloor 2 + 0.25 \rfloor = 2\)

    • \(\small Q_{0.25} = x_j + (x_{j+1} - x_j)(np - j + p) = x_2 + (x_3 - x_2)(2 - 2 + 0.25) =\)

      \(\small 10.4 + (11.6 - 10.4) \times 0.25 = 10.7\)

  • Algorithm \(6\), \(\small 75^{th}\) percentile:

    • \(\small p = 0.75\), \(\small np = 8 \times 0.75 = 6\), and \(\small j = \lfloor np + p \rfloor = \lfloor 6 + 0.75 \rfloor = 6\)

    • \(\small Q_{0.75} = x_j + (x_{j+1} - x_j)(np - j + p) = x_6 + (x_7 - x_6)(6 - 6 + 0.75) =\) \(\small 14.7 + (15.4 - 14.7) \times 0.75 = 15.225\)

  • Check the result using the quantile() function in R:

    # Algorithm 6 on the Example 2 data
    quantile(
      y,
      probs = c(0.25, 0.75),
      type = 6
    )
    #>    25%    75%
    #> 10.700 15.225

  • Algorithm \(7\), \(\small 25^{th}\) percentile:

    • \(\small p = 0.25\), \(\small np = 8 \times 0.25 = 2\), and \(\small j = \lfloor np - p + 1 \rfloor = \lfloor 2 - 0.25 + 1 \rfloor = 2\)

    • \(\small Q_{0.25} = x_j + (x_{j+1} - x_j)(np - j - p + 1) = x_2 + (x_3 - x_2)(2 - 2 - 0.25 + 1) =\)

      \(\small 10.4 + (11.6 - 10.4) \times 0.75 = 11.3\)

  • Algorithm \(7\), \(\small 75^{th}\) percentile:

    • \(\small p = 0.75\), \(\small np = 8 \times 0.75 = 6\), and \(\small j = \lfloor np - p + 1 \rfloor = \lfloor 6 - 0.75 + 1 \rfloor = 6\)

    • \(\small Q_{0.75} = x_j + (x_{j+1} - x_j)(np - j - p + 1) = x_6 + (x_7 - x_6)(6 - 6 - 0.75 + 1) =\) \(\small 14.7 + (15.4 - 14.7) \times 0.25 = 14.875\)

  • Check the result using the quantile() function in R:

    # Algorithm 7 (R's default) on the Example 2 data
    quantile(
      y,
      probs = c(0.25, 0.75),
      type = 7
    )
    #>    25%    75%
    #> 11.300 14.875

  • Algorithm \(8\), \(\small 25^{th}\) percentile:

    • \(\small p = 0.25\), \(\small np = 8 \times 0.25 = 2\), and \(\small j = \lfloor np + \frac{p + 1}{3} \rfloor = \lfloor 2 + \frac{0.25 + 1}{3} \rfloor = 2\)

    • \(\small Q_{0.25} = x_j + (x_{j+1} - x_j)(np - j + \frac{p + 1}{3}) = x_2 + (x_3 - x_2)(2 - 2 + \frac{0.25 + 1}{3}) =\)

      \(\small 10.4 + (11.6 - 10.4) \times 0.41667 = 10.9\)

  • Algorithm \(8\), \(\small 75^{th}\) percentile:

    • \(\small p = 0.75\), \(\small np = 8 \times 0.75 = 6\), and \(\small j = \lfloor np + \frac{p + 1}{3} \rfloor = \lfloor 6 + \frac{0.75 + 1}{3} \rfloor = 6\)

    • \(\small Q_{0.75} = x_j + (x_{j+1} - x_j)(np - j + \frac{p + 1}{3}) = x_6 + (x_7 - x_6)(6 - 6 + \frac{0.75 + 1}{3}) =\) \(\small 14.7 + (15.4 - 14.7) \times 0.58333 = 15.10833\)

  • Check the result using the quantile() function in R:

    # Algorithm 8 on the Example 2 data
    quantile(
      y,
      probs = c(0.25, 0.75),
      type = 8
    )
    #>      25%      75%
    #> 10.90000 15.10833

  • Algorithm \(9\), \(\small 25^{th}\) percentile:

    • \(\small p = 0.25\), \(\small np = 8 \times 0.25 = 2\), and \(\small j = \lfloor np + \frac{2p + 3}{8} \rfloor = \lfloor 2 + \frac{2 \times 0.25 + 3}{8} \rfloor = 2\)

    • \(\small Q_{0.25} = x_j + (x_{j+1} - x_j)(np - j + \frac{2p + 3}{8}) = x_2 + (x_3 - x_2)(2 - 2 + \frac{2 \times 0.25 + 3}{8}) =\)

      \(\small 10.4 + (11.6 - 10.4) \times 0.4375 = 10.925\)

  • Algorithm \(9\), \(\small 75^{th}\) percentile:

    • \(\small p = 0.75\), \(\small np = 8 \times 0.75 = 6\), and \(\small j = \lfloor np + \frac{2p + 3}{8} \rfloor = \lfloor 6 + \frac{2 \times0.75 + 3}{8} \rfloor = 6\)

    • \(\small Q_{0.75} = x_j + (x_{j+1} - x_j)(np - j + \frac{2p + 3}{8}) = x_6 + (x_7 - x_6)(6 - 6 + \frac{2 \times 0.75 + 3}{8}) =\) \(\small 14.7 + (15.4 - 14.7) \times 0.5625 = 15.09375\)

  • Check the result using the quantile() function in R:

    # Algorithm 9 on the Example 2 data
    quantile(
      y,
      probs = c(0.25, 0.75),
      type = 9
    )
    #>      25%      75%
    #> 10.92500 15.09375
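
  • To see how much the six continuous algorithms differ on the same data, the individual checks above can be collected into one table. This is just a convenience sketch built on the base quantile() function; the object name `comparison` is arbitrary.

```r
# Example 2 data, repeated so the snippet runs on its own.
y <- c(10.2, 10.4, 11.6, 12.3, 13.2, 14.7, 15.4, 16.1)
types <- 4:9

# One row per algorithm, one column per requested probability.
comparison <- t(sapply(
  types,
  function(k) quantile(y, probs = c(0.25, 0.75), type = k)
))
rownames(comparison) <- paste("type", types)
print(round(comparison, 5))
```

  • Each row should match the corresponding hand calculation above, which makes it easy to spot that the algorithms differ only in how far they interpolate between \(x_j\) and \(x_{j+1}\).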

2 References

quantile: Sample Quantiles. Retrieved October 20, 2024, from https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/quantile

