
How (and why) to use the Outliers function in Excel



An outlier is a value that is significantly higher or lower than most of the values in your data. When you use Excel to analyze data, outliers can skew the results. For example, the mean of a data set may no longer reflect your typical values. Excel provides a few useful functions to help you manage your outliers, so let's take a look.

A quick example

In the image below, the outliers are reasonably easy to spot: the value of 2 assigned to Eric and the value of 173 assigned to Ryan. In a data set like this, it is easy enough to spot and deal with these outliers manually.

  [Image: Values containing outliers]

In a larger set of data, that will not be the case. Being able to identify the outliers and remove them from statistical calculations is important, and that is what we will look at in this article.
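As a rough illustration of how much two extreme values can pull the mean, the sketch below uses Python's standard `statistics` module on a made-up list of 13 values. Only the 2 and the 173 come from the example above; the other eleven numbers and the names `scores` and `typical` are assumptions for illustration.

```python
from statistics import mean, median

# Hypothetical data: only 2 and 173 come from the article's example;
# the remaining values are invented for illustration.
scores = [35, 2, 48, 41, 52, 39, 173, 44, 50, 37, 46, 43, 40]

# The same list with the two extreme values left out.
typical = [v for v in scores if v not in (2, 173)]

print(mean(scores))    # mean of all 13 values
print(median(scores))  # middle value, far less sensitive to extremes
print(mean(typical))   # mean of the 11 remaining values
```

With these numbers the overall mean is 50, even though eleven of the thirteen values lie between 35 and 52 and their own mean is roughly 43.2.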

How to find outliers in your data

To find the outliers in a data set, we use the following steps:

  1. Calculate the 1st and 3rd quartiles (we will explain what these are shortly).
  2. Evaluate the interquartile range (we will also explain this below).
  3. Return the upper and lower limits of our data range.
  4. Use these limits to identify the outlying data points.

The cell range to the right of the data set, seen in the image below, will be used to store these values.

  [Image: Quartile range]

Let's Get Started

Step One: Calculate Quartiles

If you divide your data into quarters, each of those sets is called a quartile. The lowest 25% of the numbers in the range make up the 1st quartile, the next 25% the 2nd quartile, and so on. We take this step first because the most widely used definition of an outlier is a data point that is more than 1.5 interquartile ranges (IQRs) below the 1st quartile or more than 1.5 interquartile ranges above the 3rd quartile. To determine those values, we first have to work out what the quartiles are.

Excel provides a QUARTILE function to calculate quartiles. It requires two pieces of information: the array and the quart.

  =QUARTILE(array, quart)

The array is the range of values that you are evaluating. The quart is a number that represents the quartile you want to return (e.g., 1 for the 1st quartile, 2 for the 2nd quartile, and so on).

Note: In Excel 2010, Microsoft released the QUARTILE.INC and QUARTILE.EXC functions as improvements to the QUARTILE function. QUARTILE is more backward compatible when working across multiple versions of Excel.
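For readers who want to cross-check quartiles outside Excel, here is a minimal Python sketch. It assumes that `statistics.quantiles` with `method="inclusive"` interpolates the same way as QUARTILE and QUARTILE.INC, and that `method="exclusive"` corresponds to QUARTILE.EXC; treat that mapping as an assumption to verify against your own workbook. The list `scores` is hypothetical (only 2 and 173 come from the example).

```python
from statistics import quantiles

# Hypothetical data: only 2 and 173 come from the article's example.
scores = [35, 2, 48, 41, 52, 39, 173, 44, 50, 37, 46, 43, 40]

# n=4 splits the data into quarters and returns the three cut points:
# 1st quartile, median (2nd quartile), 3rd quartile.
q1_inc, q2_inc, q3_inc = quantiles(scores, n=4, method="inclusive")
q1_exc, q2_exc, q3_exc = quantiles(scores, n=4, method="exclusive")

print("inclusive:", q1_inc, q3_inc)
print("exclusive:", q1_exc, q3_exc)
```

For this list the inclusive method gives 39 and 48, while the exclusive method gives 38 and 49, so it is worth picking one method and using it consistently.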

Let's return to our example table.

To calculate the 1st quartile, we can use the following formula in cell F2:

  =QUARTILE(B2:B14,1)

As you enter the formula, Excel provides a list of options for the quart argument.

  [Image: Using the QUARTILE function]

To calculate the 3rd quartile, we can enter a formula like the previous one in cell F3, but using a three instead of a one.

  =QUARTILE(B2:B14,3)

Now we have the quartile data points displayed in the cells.

  [Image: 1st and 3rd quartile values]

Step Two: Evaluate the Interquartile Range

The interquartile range (or IQR) is the middle 50% of the values in your data. It is calculated as the difference between the 1st quartile value and the 3rd quartile value.

We will use a simple formula in cell F4 that subtracts the 1st quartile from the 3rd quartile:

  =F3-F2

Now we can see our interquartile range displayed.

  [Image: Interquartile range value]

Step Three: Return the Lower and Upper Limits

The lower and upper limits are the smallest and largest values of the data range that we want to use. Any values smaller or larger than these limits are the outliers.

We calculate the lower limit in cell F5 by multiplying the IQR by 1.5 and then subtracting it from the Q1 data point:

  =F2-(1.5*F4)

Note: The parentheses in this formula are not necessary, because the multiplication is calculated before the subtraction, but they make the formula easier to read.

To calculate the upper limit in cell F6, we multiply the IQR by 1.5 again, but this time add it to the Q3 data point:

  =F3+(1.5*F4)

  [Image: Lower and upper limit values]
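The same arithmetic can be sketched in Python as a cross-check. This is only an illustration: the function name `iqr_limits`, the `multiplier` parameter, and the list `scores` are hypothetical (only 2 and 173 come from the example), and the inclusive quantile method is assumed to match QUARTILE.

```python
from statistics import quantiles


def iqr_limits(values, multiplier=1.5):
    """Return (lower, upper) limits using the 1.5 x IQR rule."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                    # mirrors =F3-F2
    lower = q1 - multiplier * iqr    # mirrors =F2-(1.5*F4)
    upper = q3 + multiplier * iqr    # mirrors =F3+(1.5*F4)
    return lower, upper


# Hypothetical data: only 2 and 173 come from the article's example.
scores = [35, 2, 48, 41, 52, 39, 173, 44, 50, 37, 46, 43, 40]

lower, upper = iqr_limits(scores)
print(lower, upper)
```

For this made-up list the quartiles are 39 and 48, the IQR is 9, and the limits come out at 25.5 and 61.5.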

Step Four: Identify the Outliers

Now that we have all our underlying data set up, it is time to identify our outlying data points: those that are lower than the lower limit or higher than the upper limit.

We will use the OR function to perform this logical test and show the values that meet these criteria by entering the following formula in cell C2:

  =OR(B2<$F$5,B2>$F$6)

  [Image: OR function to identify outliers]

We then copy that formula down to cells C3:C14. A TRUE value indicates an outlier, and as you can see, we have two in our data.
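As a rough equivalent of the copied-down OR formula, the Python sketch below returns one True/False flag per value. The function name `flag_outliers` and the list `scores` are hypothetical (only 2 and 173 come from the example), and the inclusive quantile method is assumed to match QUARTILE.

```python
from statistics import quantiles


def flag_outliers(values, multiplier=1.5):
    """Return one True/False flag per value; True marks an outlier."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - multiplier * iqr
    upper = q3 + multiplier * iqr
    # Same test as =OR(B2<$F$5,B2>$F$6), applied to every value.
    return [v < lower or v > upper for v in values]


# Hypothetical data: only 2 and 173 come from the article's example.
scores = [35, 2, 48, 41, 52, 39, 173, 44, 50, 37, 46, 43, 40]

flags = flag_outliers(scores)
outliers = [v for v, is_outlier in zip(scores, flags) if is_outlier]
print(outliers)
```

The flags stay in the original row order, so they line up with the data the same way column C lines up with column B.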

Ignoring the Outliers When Calculating the Mean

Using the QUARTILE function, we can calculate the IQR and work with the most widely used definition of an outlier. However, when you only need the mean of a range of values with the outliers ignored, there is a quicker and easier function to use. This technique will not identify an outlier as before, but it does let us be flexible about what we consider to be the outlying portion of the data.

The function we need is called TRIMMEAN, and you can see its syntax below:

  =TRIMMEAN(array, percent)

The array is the range of values you want to average. The percent is the percentage of data points to exclude from the top and bottom of the data set (you can enter it as a percentage or a decimal value).

We entered the formula below in cell D3 to calculate the mean while excluding 20% of the data points:

  =TRIMMEAN(B2:B14, 20%)

  [Image: TRIMMEAN formula for the mean excluding outliers]
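To make the trimming step concrete, here is a minimal Python sketch of a trimmed mean. It assumes TRIMMEAN splits the excluded points evenly between the top and bottom and rounds the count down so the trim stays symmetric, which is how Microsoft's documentation describes it; it is an approximation for illustration, not a guaranteed match for Excel. The name `trimmed_mean` and the list `scores` are hypothetical (only 2 and 173 come from the example).

```python
import math


def trimmed_mean(values, percent):
    """Mean after dropping an equal number of values from each end."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= percent < 1:
        raise ValueError("percent must be in the range [0, 1)")
    ordered = sorted(values)
    # Values to drop from EACH end; rounding down keeps the trim symmetric.
    drop_each_end = math.floor(len(ordered) * percent / 2)
    kept = ordered[drop_each_end:len(ordered) - drop_each_end]
    return sum(kept) / len(kept)


# Hypothetical data: only 2 and 173 come from the article's example.
scores = [35, 2, 48, 41, 52, 39, 173, 44, 50, 37, 46, 43, 40]

print(trimmed_mean(scores, 0.20))  # one value dropped from each end
print(trimmed_mean(scores, 0))     # plain mean, nothing dropped
```

With 13 values and 20%, 2.6 points would be excluded, which rounds down to one value from each end. For this list that drops exactly the 2 and the 173, moving the mean from 50 to roughly 43.2.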


There you have two different functions for handling outliers. Whether you want to identify them for reporting purposes or exclude them from calculations such as averages, Excel has a function to fit your needs.

