How to calculate the average in Excel. Calculating the minimum, maximum and average value in Microsoft Excel

The most common type of average is the arithmetic average.

Simple arithmetic mean

The simple arithmetic mean is the value each unit of a population would have if the total amount of an attribute were distributed equally among all units. For example, the average annual output per worker is the output that would fall to each employee if the organization's entire output were spread equally across all employees. The simple arithmetic mean is calculated by the formula:

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

The simple arithmetic mean is equal to the ratio of the sum of the individual values of an attribute to the number of values in the population.

Example 1. A team of 6 workers receives 3, 3.2, 3.3, 3.5, 3.8 and 3.1 thousand rubles per month.

Find the average salary.
Solution: (3 + 3.2 + 3.3 + 3.5 + 3.8 + 3.1) / 6 = 19.9 / 6 ≈ 3.32 thousand rubles.
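The same calculation can be checked outside Excel. The snippet below is a minimal sketch in Python using the standard `statistics` module; the variable names are illustrative only.

```python
from statistics import mean

# Monthly pay of the six workers from Example 1, in thousand rubles
salaries = [3, 3.2, 3.3, 3.5, 3.8, 3.1]

# Simple arithmetic mean: sum of the values divided by their count
average_salary = mean(salaries)

print(round(average_salary, 2))
```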

Weighted arithmetic mean

If the data set is large and is presented as a distribution series, a weighted arithmetic mean is calculated. This is how the weighted average price per unit of production is determined: the total cost of production (the sum of the products of quantity and unit price) is divided by the total quantity produced.

We represent this in the form of the following formula:

$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i}$$

The weighted arithmetic mean is equal to the ratio of the sum of the products of each attribute value and its frequency to the sum of all frequencies. It is used when the values in the population under study occur an unequal number of times.
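As a rough illustration of the weighted mean, here is a short Python sketch. The function name `weighted_mean` and the wage data are hypothetical; they are not the data of Example 2, whose table is not reproduced here.

```python
def weighted_mean(values, frequencies):
    """Sum of value * frequency divided by the sum of the frequencies."""
    total_frequency = sum(frequencies)
    if total_frequency == 0:
        raise ValueError("frequencies must not sum to zero")
    return sum(x * f for x, f in zip(values, frequencies)) / total_frequency

# Hypothetical distribution series: wage (thousand rubles) and number of workers
wages = [3.0, 3.5, 4.0]
workers = [10, 20, 10]

avg_wage = weighted_mean(wages, workers)  # (30 + 70 + 40) / 40
print(avg_wage)
```

In Excel the same ratio is commonly written as =SUMPRODUCT(values, frequencies)/SUM(frequencies).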

Example 2. Find the average monthly wage of shop workers.

The average wage can be obtained by dividing the total wage by the total number of workers:

Answer: 3.35 thousand rubles.

Arithmetic mean for an interval series

When calculating the arithmetic mean for an interval variation series, the midpoint of each interval is first determined as half the sum of its lower and upper boundaries, and then the weighted mean of these midpoints is taken over the entire series. For open intervals, the width of the first or last interval is assumed to be equal to the width of the interval adjacent to it.

Example 3. Determine the average age of students in the evening department.

Averages calculated from interval series are approximate. The degree of their approximation depends on the extent to which the actual distribution of population units within the interval approaches uniform.
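The interval procedure can be sketched in Python as follows. The age groups below are hypothetical and stand in for the missing table of Example 3; `interval_mean` is an illustrative name.

```python
# Hypothetical interval series: (lower bound, upper bound, number of students)
age_groups = [(18, 20, 5), (20, 22, 10), (22, 24, 5)]

def interval_mean(groups):
    """Approximate mean of an interval series using interval midpoints."""
    total = sum(f for _, _, f in groups)
    # Each interval is represented by the half-sum of its boundaries
    weighted = sum((lo + hi) / 2 * f for lo, hi, f in groups)
    return weighted / total

average_age = interval_mean(age_groups)  # (19*5 + 21*10 + 23*5) / 20
print(average_age)
```

The result is only as accurate as the assumption that values are spread evenly within each interval.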

When calculating averages, relative values (relative frequencies) can be used as weights as well as absolute ones:

$$\bar{x} = \sum x_i w_i, \qquad w_i = \frac{f_i}{\sum f_i}$$

The arithmetic mean has a number of properties that more fully reveal its essence and simplify the calculation:

1. The product of the mean and the sum of the frequencies is always equal to the sum of the products of the values and their frequencies, i.e.

$$\bar{x} \sum f_i = \sum x_i f_i$$

2. The arithmetic mean of a sum of varying quantities is equal to the sum of the arithmetic means of these quantities:

$$\overline{x + y} = \bar{x} + \bar{y}$$

3. The algebraic sum of the deviations of the individual values of the attribute from the mean is zero:

$$\sum (x_i - \bar{x}) f_i = 0$$

4. The sum of the squared deviations of the values from the mean is less than the sum of the squared deviations from any other arbitrary value A, i.e.

$$\sum (x_i - \bar{x})^2 f_i < \sum (x_i - A)^2 f_i, \qquad A \neq \bar{x}$$
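The four properties can be checked numerically. The sketch below uses small hypothetical series; it illustrates the properties on one example and is not a proof.

```python
import math

values = [2, 4, 6]   # hypothetical values x
freqs = [1, 2, 1]    # hypothetical frequencies f

total_f = sum(freqs)
weighted_sum = sum(x * f for x, f in zip(values, freqs))
mean = weighted_sum / total_f

# Property 1: mean * sum of frequencies equals the sum of value * frequency
prop1_holds = math.isclose(mean * total_f, weighted_sum)

# Property 2: the mean of (x + y) equals mean(x) + mean(y)
xs, ys = [1, 2, 3], [10, 20, 30]
mean_of_sums = sum(x + y for x, y in zip(xs, ys)) / len(xs)
sum_of_means = sum(xs) / len(xs) + sum(ys) / len(ys)

# Property 3: weighted deviations from the mean sum to zero
deviation_sum = sum((x - mean) * f for x, f in zip(values, freqs))

# Property 4: squared deviations are smallest when taken from the mean
def squared_deviations(center):
    return sum((x - center) ** 2 * f for x, f in zip(values, freqs))

print(prop1_holds, mean_of_sums == sum_of_means, deviation_sum,
      squared_deviations(mean), squared_deviations(5))
```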

How to calculate the average of numbers in Excel

You can find the arithmetic mean of numbers in Excel using the AVERAGE function.

Syntax of AVERAGE

=AVERAGE(number1,[number2],…)

Arguments of AVERAGE

number1 - the first number or range of numbers for which the arithmetic mean is calculated;
number2 (optional) - the second number or range of numbers for which the arithmetic mean is calculated. The maximum number of function arguments is 255.

To calculate, do the following steps:

Select any cell;
Type the formula =AVERAGE( in it;
Select the range of cells for which you want to make the calculation;
Press the Enter key on the keyboard.

The function will calculate the average value in the specified range among those cells that contain numbers.

How to find the average value of a range that contains text

The AVERAGE function skips empty cells, text and logical values in a range. To take such cells into account, use the AVERAGEA function: it treats text in the range (including empty text strings) as zero, the logical value FALSE as 0, and TRUE as 1.
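To make the difference concrete, here is a hedged Python sketch that mimics how the two functions treat the cells of a range. The helper names `excel_average` and `excel_averagea` are hypothetical, `None` stands for an empty cell, and the sketch covers only range arguments, not values typed directly into the formula.

```python
def excel_average(cells):
    """Mimic AVERAGE on a range: only numeric cells are counted."""
    numbers = [c for c in cells
               if isinstance(c, (int, float)) and not isinstance(c, bool)]
    return sum(numbers) / len(numbers)

def excel_averagea(cells):
    """Mimic AVERAGEA on a range: text counts as 0, TRUE as 1, FALSE as 0."""
    converted = []
    for c in cells:
        if c is None:            # empty cell: skipped
            continue
        if isinstance(c, bool):  # logical value
            converted.append(1 if c else 0)
        elif isinstance(c, str): # any text, including ""
            converted.append(0)
        else:
            converted.append(c)
    return sum(converted) / len(converted)

cells = [10, 20, "n/a", True, None]
print(excel_average(cells))   # averages 10 and 20
print(excel_averagea(cells))  # averages 10, 20, 0 and 1
```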

How to find the arithmetic mean by condition

The AVERAGEIF function is used to calculate the average by a condition or criterion. For example, let's say we have product sales data:

Our task is to calculate the average sales of pens. To do this, we will take the following steps:

In cell A13, type the product name "Pens";
In cell B13, enter the formula:

=AVERAGEIF(A2:A10,A13,B2:B10)

The cell range A2:A10 is the list of products in which the function looks for the word "Pens". The argument A13 is a reference to the cell containing the text to search for in the product list. The cell range B2:B10 holds the sales data; the function averages the values in those rows where the product is "Pens".
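The logic of the formula above can be sketched in Python. The product list and sales figures are hypothetical (the original table is not reproduced here), and `average_if` is an illustrative helper that handles only the simple "equals this text" criterion, compared without regard to case.

```python
def average_if(criteria_range, criterion, average_range):
    """Average the values whose matching label equals the criterion."""
    matched = [value for label, value in zip(criteria_range, average_range)
               if str(label).casefold() == str(criterion).casefold()]
    if not matched:
        # Excel reports #DIV/0! when no cell meets the criterion
        raise ValueError("no cells match the criterion")
    return sum(matched) / len(matched)

# Hypothetical sales data: column A (product) and column B (sales)
products = ["Pens", "Pencils", "Pens", "Notebooks", "Pens"]
sales = [100, 40, 140, 75, 120]

pens_average = average_if(products, "Pens", sales)  # (100 + 140 + 120) / 3
print(pens_average)
```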

In most cases, the data is concentrated around some central point. Thus, a data set can be summarized first of all by its average value. Let us consider in turn three numerical characteristics that are used to estimate the center of a distribution: the arithmetic mean, the median, and the mode.

Arithmetic mean

The arithmetic mean (often referred to simply as the mean) is the most common estimate of the center of a distribution. It is the result of dividing the sum of all observed numerical values by their number. For a sample of numbers X_1, X_2, ..., X_n, the sample mean (denoted by the symbol $\bar{X}$) equals

$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$

where $\bar{X}$ is the sample mean, n is the sample size, and X_i is the i-th element of the sample.

Consider calculating the arithmetic average of the five-year average annual returns of 15 very high-risk mutual funds (Figure 1).

Fig. 1. Average annual return on 15 very high-risk mutual funds

The sample mean is calculated as follows:

$$\bar{X} = \frac{X_1 + X_2 + \dots + X_{15}}{15} = 6.08$$

This is a good return, especially when compared to the 3-4% return that bank or credit union depositors received over the same time period. If you sort the return values, it is easy to see that eight funds have a return above the mean and seven below it. The arithmetic mean acts as a balance point, so that low-return funds balance out high-return funds. All elements of the sample are involved in the calculation of the mean. None of the other estimates of the center of a distribution has this property.

When to calculate the arithmetic mean. Since the arithmetic mean depends on all elements of the sample, extreme values significantly affect the result. In such situations, the arithmetic mean can distort the meaning of the numerical data. Therefore, when describing a data set that contains extreme values, report either the median or both the arithmetic mean and the median. For example, if the return of the RS Emerging Growth fund is removed from the sample, the sample mean of the returns of the remaining 14 funds decreases by almost 1 percentage point, to 5.19%.

Median

The median is the middle value of an ordered array of numbers. If the array contains no repeated numbers, half of its elements are less than the median and half are greater. If the sample contains extreme values, it is better to use the median rather than the arithmetic mean to estimate the center of the distribution. To calculate the median of a sample, the sample must first be sorted:

Median = the (n + 1)/2-th element of the ordered array.

How this rule is applied depends on whether the sample size n is odd or even:

If the sample contains an odd number of elements, the median is the (n + 1)/2-th element.
If the sample contains an even number of elements, the median lies between the two middle elements of the sample and is equal to the arithmetic mean calculated over these two elements.
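The two cases can be written as a short Python function. This is a minimal sketch with hypothetical input data; the standard library's `statistics.median` gives the same results.

```python
def median(sample):
    """Median of a sample; the data does not need to be sorted beforehand."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the (n + 1)/2-th element (index mid when counting from 0)
        return ordered[mid]
    # Even n: arithmetic mean of the two middle elements
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))     # odd number of elements
print(median([4, 1, 3, 2]))  # even number of elements
```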

To calculate the median for the sample of 15 very high-risk mutual funds, we first need to sort the raw data (Fig. 2). The median is then the middle element of the sorted sample, which in our example is element number 8. Excel has a special function =MEDIAN() that also works with unordered arrays.

Fig. 2. Median of 15 funds

Thus, the median is 6.5. This means that the returns of half of the very high-risk funds do not exceed 6.5, while the returns of the other half are at least 6.5. Note that the median of 6.5 is slightly larger than the mean of 6.08.

If we remove the return of the RS Emerging Growth fund from the sample, the median of the remaining 14 funds decreases to 6.2%, that is, not as significantly as the arithmetic mean (Fig. 3).

Fig. 3. Median of 14 funds

Mode

The term was first introduced by Pearson in 1894. The mode is the value that occurs most often in the sample. The mode describes well, for example, the typical reaction of drivers to a traffic signal to stop. A classic example of using the mode is choosing the sizes for a production batch of shoes or the color of wallpaper.

If a distribution has several modes, it is said to be multimodal (it has two or more "peaks"). A multimodal distribution provides important information about the nature of the variable under study. For example, in sociological surveys, if a variable represents a preference or attitude towards something, multimodality could mean that there are several distinctly different opinions. Multimodality also indicates that the sample may not be homogeneous and that the observations may have been generated by two or more overlapping distributions.

Unlike the arithmetic mean, the mode is not affected by outliers. For continuously distributed random variables, such as the average annual returns of mutual funds, the mode sometimes does not exist at all (or does not make sense). Since such variables can take on a wide variety of values, repeated values are extremely rare.
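A small Python sketch shows both situations: a discrete variable with two modes and a continuous-looking variable where no value repeats. The data is hypothetical; `statistics.multimode` returns every most frequent value in order of first appearance.

```python
from statistics import multimode

# Hypothetical shoe sizes sold: two sizes tie for the most frequent value
shoe_sizes = [38, 39, 39, 40, 40, 41]
size_modes = multimode(shoe_sizes)

# Hypothetical returns: no value repeats, so every value is "most frequent"
# and the mode carries no useful information
returns = [1.2, 3.4, 5.6]
return_modes = multimode(returns)

print(size_modes)
print(return_modes)
```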

Quartiles

Quartiles are measures that are most commonly used to evaluate the distribution of data when describing the properties of large numerical samples. While the median splits the ordered array in half (50% of the array elements are less than the median and 50% are greater), quartiles break the ordered data set into four parts. The values Q1, the median and Q3 are the 25th, 50th and 75th percentiles, respectively. The first quartile Q1 is a number that divides the sample into two parts: 25% of the elements are less than the first quartile and 75% are greater.

The third quartile Q3 is a number that also divides the sample into two parts: 75% of the elements are less than the third quartile and 25% are greater.

To calculate quartiles in Excel 2007 and earlier, the function =QUARTILE(array, quart) was used. Starting with Excel 2010, two functions are available:

=QUARTILE.INC(array, quart)
=QUARTILE.EXC(array, quart)

These two functions give slightly different values (Fig. 4). For example, for the sample containing the average annual returns of 15 very high-risk mutual funds, Q1 = 1.8 with QUARTILE.INC and Q1 = -0.7 with QUARTILE.EXC. Incidentally, the older QUARTILE function corresponds to the modern QUARTILE.INC function. The data array does not have to be sorted to calculate quartiles with these functions.
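The difference between the two conventions can be reproduced in Python. The sketch below assumes that the `inclusive` and `exclusive` methods of `statistics.quantiles` use the same interpolation rules as QUARTILE.INC and QUARTILE.EXC, respectively; the sample is hypothetical, not the fund data from Fig. 4.

```python
from statistics import quantiles

# Hypothetical sample; the order of the values does not matter
sample = [7, 1, 4, 2, 6, 3, 5]

# Each call returns the three cut points [Q1, median, Q3]
inclusive = quantiles(sample, n=4, method="inclusive")
exclusive = quantiles(sample, n=4, method="exclusive")

print(inclusive)
print(exclusive)
```

Both conventions agree on the median here and differ only in Q1 and Q3; the gap tends to shrink as the sample grows.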

Fig. 4. Calculating quartiles in Excel

Let us emphasize once more: Excel can calculate quartiles for a univariate discrete series containing the values of a random variable. The calculation of quartiles for a frequency-based distribution is given in the section below.

Geometric mean

Unlike the arithmetic mean, the geometric mean measures how much a variable has changed over time. The geometric mean is the n-th root of the product of n values (in Excel, the =GEOMEAN function is used):

$$G = (X_1 \cdot X_2 \cdot \ldots \cdot X_n)^{1/n}$$

A similar parameter, the geometric mean rate of return, is determined by the formula:

$$G = [(1 + R_1)(1 + R_2) \cdots (1 + R_n)]^{1/n} - 1$$

where R_i is the rate of return for the i-th period of time.

For example, suppose the initial investment is $100,000. By the end of the first year it drops to $50,000, and by the end of the second year it recovers to the original $100,000. The rate of return on this investment over the two-year period is 0, since the initial and final amounts are equal. However, the arithmetic mean of the annual rates of return is (-0.5 + 1) / 2 = 0.25, or 25%, since the rate of return in the first year is R1 = (50,000 - 100,000) / 100,000 = -0.5, and in the second year R2 = (100,000 - 50,000) / 50,000 = 1. At the same time, the geometric mean rate of return for the two years is G = [(1 - 0.5) * (1 + 1)]^(1/2) - 1 = 1^(1/2) - 1 = 1 - 1 = 0. Thus, the geometric mean reflects the change (more precisely, the absence of change) in the value of the investment over the two years more accurately than the arithmetic mean.
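The investment example can be reproduced with a few lines of Python. The function name `geometric_mean_return` is illustrative; the rates are the ones computed in the example.

```python
import math

def geometric_mean_return(rates):
    """Geometric mean rate of return: n-th root of the product of (1 + R_i), minus 1."""
    growth = math.prod(1 + r for r in rates)
    return growth ** (1 / len(rates)) - 1

# Year 1: 100,000 -> 50,000 (R1 = -0.5); year 2: 50,000 -> 100,000 (R2 = 1)
rates = [-0.5, 1.0]

arithmetic = sum(rates) / len(rates)      # suggests a 25% average gain
geometric = geometric_mean_return(rates)  # reflects that nothing was gained

print(arithmetic, geometric)
```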

Interesting facts. First, the geometric mean of positive numbers is always less than the arithmetic mean of the same numbers, except when all the numbers are equal, in which case the two means coincide. Second, the properties of a right triangle explain why this mean is called geometric. The altitude of a right triangle dropped onto the hypotenuse is the mean proportional between the projections of the legs onto the hypotenuse, and each leg is the mean proportional between the hypotenuse and its own projection onto the hypotenuse (Fig. 5). This gives a geometric way of constructing the geometric mean of two segment lengths: draw a circle with the sum of the two segments as its diameter; the perpendicular raised from the point where the segments join, up to its intersection with the circle, has the desired length:

Fig. 5. The geometric nature of the geometric mean (figure from Wikipedia)

The second important property of numerical data is their variation, which characterizes the degree of dispersion of the data. Two different samples can differ both in mean and in variation. However, as shown in Figs. 6 and 7, two samples can have the same variation but different means, or the same mean and completely different variation. The data corresponding to polygon B in Fig. 7 vary much less than the data from which polygon A was built.

Fig. 6. Two symmetric bell-shaped distributions with the same spread and different means

Fig. 7. Two symmetric bell-shaped distributions with the same mean and different spread

There are five measures of data variation:

range,
interquartile range,
variance,
standard deviation,
coefficient of variation.

Range

The range is the difference between the largest and smallest elements of the sample:

Range = Xmax - Xmin

The range of a sample containing data on the average annual returns of 15 very high-risk mutual funds can be calculated using an ordered array (see Figure 4): range = 18.5 - (-6.1) = 24.6. This means that the difference between the highest and lowest average annual returns for very high risk funds is 24.6%.

The range measures the overall spread of the data. Although the sample range is a very simple estimate of the total spread of the data, its weakness is that it does not take into account exactly how the data is distributed between the minimum and maximum elements. This effect is well seen in Fig. 8 which illustrates samples having the same range. The B scale shows that if the sample contains at least one extreme value, the sample range is a very inaccurate estimate of the spread of the data.

Fig. 8. Comparison of three samples with the same range; the triangle represents the fulcrum of a balance, and its position corresponds to the sample mean

Interquartile range

The interquartile range, or midspread, is the difference between the third and first quartiles of the sample:

Interquartile range = Q3 - Q1

This value makes it possible to estimate the spread of the middle 50% of the elements without taking into account the influence of extreme elements. The interquartile range for the sample containing the average annual returns of 15 very high-risk mutual funds can be calculated using the data in Fig. 4 (for example, for the QUARTILE.EXC function): Interquartile range = 9.8 - (-0.7) = 10.5. The interval between -0.7 and 9.8 is often referred to as the middle half.

It should be noted that the values of Q1 and Q3, and hence the interquartile range, do not depend on the presence of outliers, since their calculation does not take into account any value that is less than Q1 or greater than Q3. Summary measures such as the median, the first and third quartiles, and the interquartile range, which are not affected by outliers, are called robust measures.
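The contrast between the range and the interquartile range can be seen in a short Python sketch. The two samples are hypothetical and differ only in their largest element; quartiles are taken with the `exclusive` method of `statistics.quantiles`, assumed here to match QUARTILE.EXC.

```python
from statistics import quantiles

def sample_range(sample):
    """Difference between the largest and smallest elements."""
    return max(sample) - min(sample)

def interquartile_range(sample, method="exclusive"):
    """Q3 - Q1 under the chosen quartile convention."""
    q1, _, q3 = quantiles(sample, n=4, method=method)
    return q3 - q1

# Hypothetical samples: identical except for one extreme value
without_outlier = [1, 2, 3, 4, 5, 6, 7]
with_outlier = [1, 2, 3, 4, 5, 6, 100]

print(sample_range(without_outlier), sample_range(with_outlier))
print(interquartile_range(without_outlier), interquartile_range(with_outlier))
```

The range jumps when the outlier appears, while the interquartile range stays the same.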

Variance and standard deviation

While the range and the interquartile range estimate the total spread and the spread of the middle half of the sample, respectively, neither takes into account how the data are distributed. The variance and the standard deviation are free from this shortcoming. These measures show how strongly the data fluctuate around the mean. The sample variance is approximately the arithmetic mean of the squared differences between each sample element and the sample mean. For a sample X_1, X_2, ..., X_n the sample variance (denoted by the symbol S^2) is given by the following formula:

$$S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n - 1}$$

In general, the sample variance is the sum of the squared differences between the sample elements and the sample mean, divided by the sample size minus one:

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

where $\bar{X}$ is the arithmetic mean, n is the sample size, and X_i is the i-th element of the sample X. In Excel 2007 and earlier, the function =VAR() was used to calculate the sample variance; starting with Excel 2010, the function =VAR.S() is used.

The most practical and widely accepted estimate of data scatter is the standard deviation. This measure is denoted by the symbol S and is equal to the square root of the sample variance:

$$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

In Excel 2007 and earlier, the =STDEV() function was used to calculate the sample standard deviation; starting with Excel 2010, the =STDEV.S() function is used. The data array does not have to be sorted for these functions.
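The two formulas translate directly into Python. The sketch below uses a hypothetical sample and compares the hand-written functions with `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
import math
from statistics import stdev, variance

def sample_variance(sample):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / (n - 1)

def sample_std(sample):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(sample))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, mean = 5

print(sample_variance(data), variance(data))
print(sample_std(data), stdev(data))
```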

Neither the sample variance nor the sample standard deviation can be negative. The only situation in which the indicators S 2 and S can be zero is if all elements of the sample are equal. In this completely improbable case, the range and interquartile range are also zero.

Numeric data is inherently volatile. Any variable can take on many different values. For example, different mutual funds have different rates of return and loss. Due to the variability of numerical data, it is very important to study not only estimates of the mean, which are summative in nature, but also estimates of the variance, which characterize the scatter of the data.

The variance and standard deviation allow us to estimate the spread of the data around the mean, in other words, to judge how far the sample elements lie below and above the mean. The variance has some valuable mathematical properties. However, it is expressed in squared units of measurement: squared percent, squared dollars, squared inches, etc. Therefore, the natural measure of spread is the standard deviation, which is expressed in the usual units of measurement: percent of income, dollars or inches.

The standard deviation allows you to estimate the amount of fluctuation of the sample elements around the mean value. In almost all situations, the majority of observed values lie within plus or minus one standard deviation from the mean. Therefore, knowing the arithmetic mean of the sample elements and the standard sample deviation, it is possible to determine the interval to which the bulk of the data belongs.

The standard deviation of the returns of the 15 very high-risk mutual funds is 6.6 (Fig. 9). This means that the returns of the bulk of the funds differ from the mean by no more than 6.6 percentage points, i.e. they lie between $\bar{X} - S$ = 6.08 - 6.6 ≈ -0.5 and $\bar{X} + S$ = 6.08 + 6.6 ≈ 12.7. In fact, this interval contains the five-year average annual returns of 53.3% (8 out of 15) of the funds.

Fig. 9. Standard deviation

Note that in the process of summing the squared differences, items that are farther from the mean gain more weight than items that are closer. This property is the main reason why the arithmetic mean is most often used to estimate the mean of a distribution.

Coefficient of variation

Unlike the previous measures of scatter, the coefficient of variation is a relative measure. It is always expressed as a percentage, not in the units of the original data. The coefficient of variation, denoted by the symbol CV, measures the scatter of the data relative to the mean. It is equal to the standard deviation divided by the arithmetic mean and multiplied by 100%:

$$CV = \frac{S}{\bar{X}} \cdot 100\%$$

where S is the sample standard deviation and $\bar{X}$ is the sample mean.

The coefficient of variation allows you to compare two samples whose elements are expressed in different units of measurement. For example, the manager of a mail delivery service intends to upgrade the fleet of trucks. When loading packages, there are two kinds of restrictions to consider: the weight (in pounds) and the volume (in cubic feet) of each package. Assume that in a sample of 200 packages the average weight is 26.0 pounds, the standard deviation of the weight is 3.9 pounds, the average package volume is 8.8 cubic feet, and the standard deviation of the volume is 2.2 cubic feet. How can the spread of the weight and the volume of the packages be compared?

Since weight and volume are measured in different units, the manager must compare the relative spread of these quantities. The coefficient of variation of the weight is CV_W = 3.9 / 26.0 * 100% = 15%, and the coefficient of variation of the volume is CV_V = 2.2 / 8.8 * 100% = 25%. Thus, the relative scatter of the package volumes is much larger than the relative scatter of their weights.
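The package example can be verified with a few lines of Python; `coefficient_of_variation` is an illustrative helper, and the inputs are the figures from the example.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return std_dev / mean * 100

cv_weight = coefficient_of_variation(3.9, 26.0)  # weight, pounds
cv_volume = coefficient_of_variation(2.2, 8.8)   # volume, cubic feet

print(round(cv_weight, 1), round(cv_volume, 1))
```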

Shape of the distribution

The third important property of a sample is the shape of its distribution. The distribution can be symmetric or asymmetric. To describe the shape of a distribution, it is necessary to calculate its mean and median. If these two measures are the same, the variable is said to be symmetrically distributed. If the mean of a variable is greater than the median, its distribution has a positive skewness (Fig. 10). If the median is greater than the mean, the distribution of the variable is negatively skewed. Positive skewness occurs when unusually high values pull the mean up. Negative skewness occurs when unusually small values pull the mean down. A variable is symmetrically distributed if it does not take on extreme values in either direction, so that large and small values of the variable cancel each other out.

Fig. 10. Three types of distributions

The data depicted on scale A have a negative skewness. The figure shows a long tail and a skew to the left caused by unusually small values. These extremely small values shift the mean to the left, and it becomes less than the median. The data shown on scale B are distributed symmetrically. The left and right halves of the distribution are mirror images of each other. Large and small values balance each other, and the mean and median are equal. The data shown on scale C have a positive skewness. The figure shows a long tail and a skew to the right caused by unusually high values. These extremely large values shift the mean to the right, and it becomes larger than the median.

In Excel, descriptive statistics can be obtained using the Analysis ToolPak add-in. Go to Data → Data Analysis, select Descriptive Statistics in the window that opens and click OK. In the Descriptive Statistics window, be sure to specify the Input Range (Fig. 11). If you want to see the descriptive statistics on the same sheet as the original data, select the Output Range radio button and specify the cell where the upper left corner of the output should be placed (in our example, $C$1). If you want to output the data to a new sheet or to a new workbook, simply select the appropriate radio button. Check the box next to Summary statistics. Optionally, you can also select Confidence Level for Mean, Kth Smallest and Kth Largest.

If you don't see the Data Analysis icon in the Analysis group on the Data tab, you must first install the Analysis ToolPak add-in.

Fig. 11. Descriptive statistics of the five-year average annual returns of funds with very high levels of risk, calculated using the Data Analysis add-in of Excel

Excel calculates a number of the statistics discussed above: mean, median, mode, standard deviation, variance, range, minimum, maximum, and sample size (labeled Count). In addition, Excel calculates some statistics that are new to us: standard error, kurtosis, and skewness. The standard error equals the standard deviation divided by the square root of the sample size. Skewness characterizes the deviation of the distribution from symmetry and is a function of the cubed differences between the sample elements and the mean. Kurtosis is a measure of the relative concentration of data around the mean versus the tails of the distribution, and depends on the differences between the sample elements and the mean raised to the fourth power.

Calculating descriptive statistics for a population

The mean, spread, and shape of the distribution discussed above are characteristics determined from a sample. However, if the data set contains numerical measurements of the entire population, its parameters can be calculated. These parameters include the mean (expected value), variance, and standard deviation of the population.

The expected value is equal to the sum of all values in the population divided by the size of the population:

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$

where µ is the expected value, X_i is the i-th observation of the variable X, and N is the size of the population. In Excel, the expected value is calculated with the same function as the arithmetic mean: =AVERAGE().

The population variance is equal to the sum of the squared differences between the elements of the population and the expected value, divided by the size of the population:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

where σ² is the population variance. In Excel 2007 and earlier, the =VARP() function is used to calculate the population variance; starting with Excel 2010, =VAR.P().

The population standard deviation is equal to the square root of the population variance:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

In Excel 2007 and earlier, =STDEVP() is used to calculate the population standard deviation; starting with Excel 2010, =STDEV.P(). Note that the formulas for the population variance and standard deviation differ from the formulas for the sample variance and standard deviation. When calculating the sample statistics S² and S, the denominator of the fraction is n - 1, and when calculating the parameters σ² and σ, it is the size of the population N.
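The effect of the two denominators can be seen by running both kinds of functions on the same data. In the sketch below the data set is hypothetical; `pvariance` and `pstdev` from Python's `statistics` module divide by N, while `variance` and `stdev` divide by n - 1.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, mean = 5

# Treating the data as an entire population: divide by N = 8
population_variance = pvariance(data)
population_std = pstdev(data)

# Treating the same data as a sample: divide by n - 1 = 7
sample_variance_value = variance(data)
sample_std_value = stdev(data)

print(population_variance, population_std)
print(sample_variance_value, sample_std_value)
```

The sample versions are always somewhat larger, and the difference shrinks as the number of observations grows.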

Rule of thumb

In most situations, a large proportion of the observations are concentrated around the median, forming a cluster. In data sets with positive skewness, this cluster is located to the left of (i.e., below) the mean, and in sets with negative skewness, it is located to the right of (i.e., above) the mean. Symmetric data have the same mean and median, and the observations cluster around the mean, forming a bell-shaped distribution. If the distribution does not have a pronounced skewness and the data is concentrated around a certain center of gravity, a rule of thumb (the empirical rule) can be used to estimate variability. It says: if the data has a bell-shaped distribution, then approximately 68% of the observations are within one standard deviation of the mean, approximately 95% are within two standard deviations, and 99.7% are within three standard deviations.

Thus, the standard deviation, which is an estimate of the average fluctuation around the mean, helps to understand how the observations are distributed and to identify outliers. It follows from the rule of thumb that for bell-shaped distributions only one value in twenty differs from the mean by more than two standard deviations. Therefore, values outside the interval µ ± 2σ can be considered outliers. In addition, only three out of 1000 observations differ from the mean by more than three standard deviations. Thus, values outside the interval µ ± 3σ are almost always outliers. For distributions that are highly skewed or not bell-shaped, the Bienaymé-Chebyshev rule can be applied.

More than a hundred years ago, the mathematicians Bienaymé and Chebyshev independently discovered a useful property of the standard deviation. They found that for any data set, regardless of the shape of the distribution, the percentage of observations that lie within k standard deviations of the mean is at least

$$\left(1 - \frac{1}{k^2}\right) \cdot 100\%$$

For example, if k = 2, the Bienaymé-Chebyshev rule states that at least (1 - (1/2)²) × 100% = 75% of the observations must lie in the interval µ ± 2σ. This rule is true for any k greater than one. The Bienaymé-Chebyshev rule is very general and is valid for distributions of any kind. It gives the minimum share of observations whose distance from the mean does not exceed a given value. However, if the distribution is bell-shaped, the rule of thumb estimates the concentration of data around the mean more accurately.
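The Bienaymé-Chebyshev bound is easy to tabulate. The following Python sketch computes the guaranteed minimum share for a few values of k; the function name is illustrative.

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is informative only for k > 1")
    return (1 - 1 / k ** 2) * 100

for k in (1.5, 2, 3):
    print(k, round(chebyshev_lower_bound(k), 1))
```

For k = 2 and k = 3 the guaranteed shares are well below the approximately 95% and 99.7% that the rule of thumb gives for bell-shaped data, which is the price of making no assumption about the shape of the distribution.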

Computing descriptive statistics for a frequency-based distribution

If the original data is not available, the frequency distribution becomes the only source of information. In such situations, you can calculate the approximate values of quantitative indicators of the distribution, such as the arithmetic mean, standard deviation, quartiles.

If the sample data is presented as a frequency distribution, an approximate value of the arithmetic mean can be calculated by assuming that all values within each class are concentrated at the midpoint of the class:

$$\bar{X} = \frac{\sum_{j=1}^{c} m_j f_j}{n}$$

where $\bar{X}$ is the sample mean, n is the number of observations (the sample size), c is the number of classes in the frequency distribution, m_j is the midpoint of the j-th class, and f_j is the frequency of the j-th class.

To calculate the standard deviation from the frequency distribution, it is likewise assumed that all values within each class are concentrated at the midpoint of the class:

$$S = \sqrt{\frac{\sum_{j=1}^{c} (m_j - \bar{X})^2 f_j}{n - 1}}$$
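The grouped-data approximation can be sketched in Python as follows. The class midpoints and frequencies are hypothetical, and the standard deviation uses the n - 1 denominator, consistent with the sample formulas earlier in this note.

```python
import math

# Hypothetical frequency distribution: class midpoints and class frequencies
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]

n = sum(frequencies)

# Approximate mean: every value in a class is replaced by the class midpoint
grouped_mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n

# Approximate sample standard deviation under the same assumption
grouped_variance = sum((m - grouped_mean) ** 2 * f
                       for m, f in zip(midpoints, frequencies)) / (n - 1)
grouped_std = math.sqrt(grouped_variance)

print(grouped_mean, round(grouped_std, 2))
```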

To understand how the quartiles of the series are determined based on frequencies, let us consider the calculation of the lower quartile based on the data for 2013 on the distribution of the Russian population by average per capita cash income (Fig. 12).

Fig. 12. Distribution of the population of Russia by average per capita monetary income per month, rubles

To calculate the first quartile of an interval variation series, you can use the formula:

$$Q_1 = x_{Q_1} + i \cdot \frac{\tfrac{1}{4} \sum f - S_{Q_1 - 1}}{f_{Q_1}}$$

where Q_1 is the value of the first quartile; x_{Q_1} is the lower limit of the interval containing the first quartile (this is the first interval whose cumulative frequency exceeds 25%); i is the width of the interval; Σf is the sum of the frequencies of the entire sample (equal to 100% when the frequencies are given as percentages); S_{Q_1 - 1} is the cumulative frequency of the interval preceding the one that contains the lower quartile; f_{Q_1} is the frequency of the interval containing the lower quartile. The formula for the third quartile differs in that Q_3 is used everywhere instead of Q_1, and ¾ is substituted for ¼.

In our example (Fig. 12), the lower quartile is in the interval 7000.1 - 10,000, whose cumulative frequency is 26.4%. The lower limit of this interval is 7000 rubles, the width of the interval is 3000 rubles, the cumulative frequency of the interval preceding the one containing the lower quartile is 13.4%, and the frequency of the interval containing the lower quartile is 13.0%. Thus: Q1 = 7000 + 3000 * (¼ * 100 - 13.4) / 13 ≈ 9677 rubles.
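The interpolation for the lower quartile can be reproduced in Python. `grouped_quartile` is an illustrative helper; the arguments in the call are the figures quoted in the example above.

```python
def grouped_quartile(lower_limit, width, total_freq, cum_freq_before,
                     interval_freq, share=0.25):
    """Quartile of an interval series by linear interpolation inside the interval.

    share is 0.25 for the first quartile and 0.75 for the third.
    """
    return lower_limit + width * (share * total_freq - cum_freq_before) / interval_freq

# Interval 7000.1 - 10,000: lower limit 7000, width 3000,
# cumulative frequency before it 13.4%, own frequency 13.0%, total 100%
q1 = grouped_quartile(7000, 3000, 100, 13.4, 13.0)

print(round(q1))
```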

Pitfalls associated with descriptive statistics

In this note, we looked at how to describe a dataset using various statistics that estimate its mean, scatter, and distribution. The next step is to analyze and interpret the data. So far, we have studied the objective properties of data, and now we turn to their subjective interpretation. Two mistakes lie in wait for the researcher: an incorrectly chosen subject of analysis and an incorrect interpretation of the results.

The analysis of the performance of 15 very high-risk mutual funds is fairly unbiased. It led to completely objective conclusions: all mutual funds have different returns, the fund returns range from -6.1 to 18.5, and the average return is 6.08. The objectivity of data analysis is ensured by the correct choice of summary quantitative measures of the distribution. Several methods for estimating the mean and scatter of data were considered, and their advantages and disadvantages were indicated. How do you choose the right statistics to provide an objective and unbiased analysis? If the data distribution is slightly skewed, should the median be chosen over the arithmetic mean? Which indicator more accurately characterizes the spread of data: standard deviation or range? Should the positive skewness of the distribution be indicated?

On the other hand, data interpretation is a subjective process. Different people come to different conclusions when interpreting the same results. Everyone has their own point of view. Some consider the overall average annual returns of the 15 very high-risk funds to be good and are quite satisfied with the income received. Others may think that these funds have returns that are too low. Thus, subjectivity should be compensated by honesty, neutrality and clarity of conclusions.

Ethical Issues

Data analysis is inextricably linked to ethical issues. One should be critical of the information disseminated by newspapers, radio, television and the Internet. Over time, you will learn to be skeptical not only about the results, but also about the goals, subject and objectivity of research. The famous British politician Benjamin Disraeli said it best: “There are three kinds of lies: lies, damned lies and statistics.”

As mentioned in this note, ethical issues arise when choosing the results that should be presented in a report. Both positive and negative results should be published. In addition, when giving a presentation or writing a report, the results must be presented honestly, neutrally and objectively. Distinguish between poor presentations and dishonest ones. To do this, it is necessary to determine what the intentions of the speaker were. Sometimes the speaker omits important information out of ignorance, and sometimes deliberately (for example, by using the arithmetic mean to estimate the center of clearly skewed data in order to get the desired result). It is also dishonest to suppress results that do not correspond to the point of view of the researcher.

Materials from the book Levin et al. Statistics for managers are used. - M.: Williams, 2004. - p. 178–209

The QUARTILE function is retained for compatibility with earlier versions of Excel.

In mathematics, the arithmetic mean of numbers (or simply the average) is the sum of all the numbers in a given set divided by their count. This is the most general and widespread concept of the average value. As you have already understood, to find it you need to add up all the numbers given to you and divide the result by the number of terms.

What is the arithmetic mean?

Let's look at an example.

Example 1. Numbers are given: 6, 7, 11. You need to find their average value.

Solution.

First, let's find the sum of all the given numbers: 6 + 7 + 11 = 24.

Now we divide the resulting sum by the number of terms. Since we have three terms, we divide by three: 24 / 3 = 8.

Therefore, the average of 6, 7, and 11 is 8. Why 8? Yes, because the sum of 6, 7 and 11 will be the same as three eights. This is clearly seen in the illustration.

The average value is somewhat reminiscent of the "alignment" of a series of numbers. As you can see, the piles of pencils have become one level.

Consider another example to consolidate the knowledge gained.

Example 2 Numbers are given: 3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29. You need to find their arithmetic mean.

Solution.

We find the sum.

3 + 7 + 5 + 13 + 20 + 23 + 39 + 23 + 40 + 23 + 14 + 12 + 56 + 23 + 29 = 330

Divide by the number of terms (in this case, 15): 330 / 15 = 22.

Therefore, the average value of this series of numbers is 22.

Now consider negative numbers. Let's remember how to sum them up. For example, you have two numbers 1 and -4. Let's find their sum.

1 + (-4) = 1 - 4 = -3

Knowing this, consider another example.

Example 3 Find the average value of a series of numbers: 3, -7, 5, 13, -2.

Solution.

Finding the sum of numbers.

3 + (-7) + 5 + 13 + (-2) = 12

Since there are 5 terms, we divide the resulting sum by 5.

Therefore, the arithmetic mean of the numbers 3, -7, 5, 13, -2 is 2.4.
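The three examples above can be checked with a few lines of Python. This is a minimal sketch, and the function name is chosen here only for illustration.

```python
def arithmetic_mean(values):
    """Sum of the values divided by how many there are."""
    if not values:
        raise ValueError("cannot average an empty list")
    return sum(values) / len(values)


example_1 = arithmetic_mean([6, 7, 11])
example_2 = arithmetic_mean([3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29])
example_3 = arithmetic_mean([3, -7, 5, 13, -2])

print(example_1, example_2, example_3)  # 8.0 22.0 2.4
```

Negative numbers need no special handling: they simply reduce the sum before it is divided.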

In our time of technological progress, it is much more convenient to use computer programs to find the average value. Microsoft Office Excel is one of them. Finding the average in Excel is quick and easy. Moreover, this program is included in the Microsoft Office software package. Let's go through brief instructions for finding the average value using this program.

In order to calculate the average value of a series of numbers, you must use the AVERAGE function. The syntax for this function is:
=AVERAGE(argument1, argument2, ... argument255)
where argument1, argument2, ... argument255 are either numbers or cell references (including ranges and arrays).

To make it clearer, let's test the knowledge gained.

Enter the numbers 11, 12, 13, 14, 15, 16 in cells C1 - C6.
Select cell C7 by clicking on it. In this cell, we will display the average value.
Click on the "Formulas" tab.
Select More Functions > Statistical to open the list of statistical functions.
Select AVERAGE. After that, a dialog box should open.
Select cells C1-C6 by dragging over them to set the range in the dialog box.
Confirm your actions with the "OK" button.
If you did everything correctly, cell C7 should contain the answer: 13.5. When you click on cell C7, the function (=AVERAGE(C1:C6)) will be displayed in the formula bar.

It is very useful to use this function for accounting, invoices, or when you just need to find the average of a very long range of numbers. Therefore, it is often used in offices and large companies. This allows you to keep the records in order and makes it possible to quickly calculate something (for example, the average income per month). You can also use Excel to find the mean of a function.

A very convenient invention of the computer world is spreadsheets. You can enter data into them, beautifully arrange them in the form of documents to your taste (or to the taste of the authorities).

You can create such a document once - in fact, immediately a whole family of documents, which, according to Excel terminology, is called a “workbook” (English workbook).

How Excel behaves

After that, when the data change, you only need to change a few source numbers, and Excel will immediately perform a number of operations, arithmetic and otherwise, throughout the document.

To do this, a spreadsheet program (and Excel is far from the only one) has a whole arsenal of arithmetic tools and ready-made functions implemented as debugged, working routines. When writing a formula in a cell, you only need to give, among the other operands, the name of the required function followed by its arguments in parentheses.

There are many functions, and they are grouped by purpose:

To generalize multiple data, there is a whole set of statistical functions. Getting the average of some data is probably the very first thing a statistician thinks of when he looks at the numbers.

What is an average?

A series of numbers is taken and two values are calculated for it: the count of the numbers and their total sum. Then the second is divided by the first. The result is a number whose value lies somewhere in the middle of the series. It may even coincide with one of the numbers in the series.

Well, let's say the number was terribly lucky in that case, but usually the arithmetic mean not only differs from every number in its series but, as the saying goes, "doesn't fit through any gate" in that series. For example, the average number of people living in the apartments of some city N may be 5.216. What does that mean? Five people and an extra 216 thousandths of another one? Those in the know will only smirk: what did you expect? That's statistics!

Statistical (or simply accounting) tables can come in completely different shapes and sizes. Strictly speaking, the shape is always a rectangle, but tables can be wide, narrow, repeating (say, data for a week, day by day), or scattered across different sheets of your workbook.

They can even be in other workbooks, on other computers in the local network or, scary to say, in other parts of the wide world, now united by the all-powerful Internet. A lot of information can be obtained in finished form from very reputable sources on the Internet and then processed, analyzed and used to draw conclusions and write articles, dissertations...

As a matter of fact, today we just need to calculate the average over an array of homogeneous data using this miraculous spreadsheet program. Homogeneous means data about similar objects in the same units of measurement, so that people are never added to bags of potatoes, or kilobytes to rubles and kopecks.

Average search example

Let us have initial data written in some cells. Usually, generalized data, or data obtained from the original data, is somehow recorded here.

The initial data are located on the left side of the table (for example, one column is the number of parts manufactured by one worker A, which corresponds to a separate line in the table, and the second column is the price of one part), the last column shows the output of worker A in money.

Previously, this was done with a calculator, now you can entrust such a simple task to a program that never makes mistakes.

Simple table of daily earnings

In the picture, the amount of earnings is calculated for each employee in column E by multiplying the number of parts (column C) by the price of a part (column D).

Then he will not even be able to set foot in other places of the table, and he will not be able to look at the formulas. Although, of course, everyone in that shop knows how the output of an individual worker translates into money earned by him in a day.

Total values

Then the total values are usually calculated. These are the aggregate figures for the whole workshop, site, or entire team. Usually these figures are reported by some bosses to others, their superiors.

This is how you can calculate the amounts in the columns of the source data, and at the same time in the derived column, that is, the earnings column

I should note right away that while the Excel table is being created, the cells are not protected. Otherwise, how would we lay out the table itself, apply the formatting, color it and enter clever and correct formulas? Once everything is ready, before the workbook (that is, the spreadsheet file) is handed over to a different person to work with, protection is turned on. It simply guards against careless actions, so that the formulas are not damaged by accident.

And now the self-calculating table goes to work in the workshop alongside the rest of the workshop's hard workers. After the working day is over, all such tables with data on the workshop's work (and not only this one workshop) are passed to senior management, who will summarize the data the next day and draw some conclusions.

Here it is, the average (mean - in English)

First of all, they calculate the average number of parts made per employee per day, as well as the average daily earnings for the workers of the shop (and then for the plant). We will do the same in the last, bottom row of our table.

As you can see, you can use the amounts already calculated in the previous line, just divide them by the number of employees - 6 in this case.

In formulas, dividing by constants (fixed numbers) is bad form. What if something out of the ordinary happens and the number of employees decreases? Then you would have to go through all the formulas and change the number six to something else everywhere. Instead, you can, for example, "trick" the table like this:

Instead of a specific number, put in the formula a reference to cell A7, which holds the serial number of the last employee in the list. That number is the number of employees, so we correctly divide the sum of the column we are interested in by the count and get the average value. As you can see, the average number of parts turned out to be 73 plus a fractional tail that is long in digits (though not in significance), which is usually discarded by rounding.
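The point about not hard-coding the divisor can be sketched in Python. The part counts below are hypothetical (the original table is not reproduced here) and the function names are made up for illustration.

```python
def average_with_fixed_divisor(values, divisor=6):
    """Divides by a hard-coded constant, like typing 6 into the formula."""
    return sum(values) / divisor


def average_with_counted_divisor(values):
    """Divides by the actual number of rows, like referring to a count cell."""
    return sum(values) / len(values)


# Hypothetical daily output of six workers.
parts_made = [70, 75, 68, 80, 72, 74]
average_parts = average_with_counted_divisor(parts_made)

# One worker leaves: the counted version stays correct, the fixed one does not.
smaller_team = parts_made[:-1]
correct_average = average_with_counted_divisor(smaller_team)
wrong_average = average_with_fixed_divisor(smaller_team)

print(round(average_parts, 2), correct_average, round(wrong_average, 2))
```

With six workers both versions agree. The difference only shows up once the list changes, which is exactly when a forgotten constant causes trouble.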

Rounding to kopecks

Rounding is a common action when in formulas, especially accounting ones, one number is divided by another. Moreover, this is a separate topic in accounting. Accountants have been rounding for a long time and scrupulously: they immediately round each number obtained by dividing to kopecks.

Excel is a mathematical program. It is not in awe of a fraction of a kopeck, and it does not worry about what to do with it. Excel simply stores numbers as they are, with all their decimal places, and keeps using those numbers in further calculations. It can round the final result (if we give the command).

Only the accounting department will say that this is a mistake. Because they round each received "crooked" number to whole rubles and kopecks. And the end result usually turns out to be slightly different than that of a program indifferent to money.
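The discrepancy described above can be illustrated with a small Python sketch. The amount is invented, and half-up rounding to two decimal places is only an assumption about how the accounting department rounds.

```python
from decimal import Decimal, ROUND_HALF_UP

KOPECK = Decimal("0.01")


def to_kopecks(amount):
    """Round a Decimal amount to two decimal places (assumed half-up rule)."""
    return amount.quantize(KOPECK, rounding=ROUND_HALF_UP)


# Hypothetical case: 100 rubles split into three equal shares.
shares = [Decimal("100") / 3 for _ in range(3)]

# Accounting style: round every intermediate number, then add.
rounded_each_step = sum(to_kopecks(share) for share in shares)

# Spreadsheet style: keep all decimal places, round only the final result.
rounded_at_end = to_kopecks(sum(shares))

print(rounded_each_step, rounded_at_end)  # 99.99 100.00
```

Neither result is a calculation error. They differ by one kopeck only because the rounding happens at different stages.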

But now I will tell you the main secret. Excel can find the average without our help: it has a built-in function for this. It only needs to be given a range of data. It will then add the values up, count them, and divide the sum by the count. The result will be exactly the same as the one we worked out step by step.

To find this function, we select cell E9, where its result (the average value in column E) should go, and click the fx icon to the left of the formula bar.

A panel called "Function Wizard" will open. This is a multi-step dialog (a Wizard, in English) with which the program helps you build complex formulas. Note that the help has already begun: the program has entered the = sign in the formula bar for us.
Now you can be calm, the program will guide us through all the difficulties (even in Russian, even in English) and as a result, the correct formula for the calculation will be built.

The upper box (“Search for a function:”) explains what can be searched for here. You can type “average” there and click the “Find” button. But you can also do it differently. We know that this function belongs to the statistical category, so we pick that category in the second box. In the list that opens below, we find the function "AVERAGE".

At the same time, we will see how many functions there are in the statistical category: there are 7 averages alone. For each function, if you move the pointer over it, a brief annotation appears below. And if you click even lower, on the "Help for this function" link, you can get a very detailed description of it.

And now we just calculate the average value. We click "OK" (this is how consent is expressed in English, although, rather, it is in American) on the button below.

The program has entered the beginning of the formula, and now you need to set the range for the first argument. Just select it with the mouse. Click OK and get the result. All that remains is to add rounding here, as we did in cell C9, and the table is ready for daily work.