Give a brief definition of the general population. General and sample populations. The concept of representativeness

Lecture 6. Elements of mathematical statistics

Questions to control knowledge and summarize the lecture

1. Define a random variable.

2. Write the formulas for the mathematical expectation and variance of discrete and continuous random variables.

3. State Laplace's local and integral limit theorems.

4. Write formulas for the binomial distribution, hypergeometric distribution, Poisson distribution, uniform distribution, and normal distribution.

Purpose: To study the basic concepts of mathematical statistics

1. Population and sample

2. Statistical distribution of the sample. Polygon. Histogram.

3. Estimates of the parameters of the general population based on its sample

4. General and sample averages. Methods for their calculation.

5. General and sample variances.

6. Questions to control knowledge and summarize the lecture

We now begin studying the elements of mathematical statistics, the field that develops scientifically grounded methods for collecting and processing statistical data.

1. General population and sample. Suppose we need to study a set of homogeneous objects (such a set is called a statistical aggregate) with respect to some qualitative or quantitative feature that characterizes these objects. For example, for a batch of parts, whether a part meets the standard can serve as a qualitative feature, and the controlled dimension of the part as a quantitative feature.

The best option is a complete survey, i.e., examining every object. In most cases, however, this is impossible for various reasons: the number of objects may be very large, or the objects may be inaccessible. Moreover, if, for example, we need to know the average depth of the crater produced by the explosion of a shell from an experimental batch, a complete survey would destroy the entire batch.

If a complete survey is not possible, then a part of the objects is selected for study from the entire population.

The statistical aggregate from which some of the objects are selected is called the general population. A set of objects randomly selected from the general population is called a sample.

The numbers of objects in the general population and in the sample are called, respectively, the population size and the sample size.

Example 10.1. The fruits of one tree (200 pieces) are examined for the taste characteristic of this variety. For this purpose, 10 fruits are selected. Here 200 is the population size and 10 is the sample size.

If each selected object is examined and returned to the general population before the next object is drawn, the sample is called repeated (sampling with replacement). If the selected objects are not returned to the general population, the sample is called non-repeated (sampling without replacement).



In practice, non-repeated sampling is used more often. If the sample size is a small fraction of the population size, the difference between repeated and non-repeated sampling is negligible.
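The difference between the two schemes can be illustrated with Python's standard `random` module: `choices` draws with replacement (repeated sampling) and `sample` draws without replacement (non-repeated sampling). This is a minimal sketch; the population of 200 numbered fruits (borrowed from Example 10.1) and the seed are illustrative assumptions.

```python
import random

def draw_repeated(population, n, rng):
    """Repeated sampling: each drawn unit is returned, so it can be drawn again."""
    return rng.choices(population, k=n)

def draw_non_repeated(population, n, rng):
    """Non-repeated sampling: drawn units are not returned, so all n units are distinct."""
    return rng.sample(population, n)

rng = random.Random(2024)          # fixed seed for a reproducible illustration
fruits = list(range(1, 201))       # 200 numbered fruits, as in Example 10.1
repeated = draw_repeated(fruits, 10, rng)
non_repeated = draw_non_repeated(fruits, 10, rng)

print("repeated:    ", repeated)
print("non-repeated:", non_repeated)
```

Duplicates can occur in the first list but never in the second; as the population grows relative to the sample, duplicates become ever rarer, which is why the two schemes then give practically the same results.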

The properties of the objects in the sample must correctly reflect the properties of the objects in the population, or, as they say, the sample must be representative. A sample is considered representative if every object of the general population has the same probability of being included in it, i.e., if the selection is random. For example, to estimate a future harvest, one can take a sample from the general population of fruits that have not yet ripened and examine their characteristics (weight, quality, etc.). If the entire sample is taken from one tree, it will not be representative. A representative sample should consist of randomly selected fruits from randomly selected trees.

2. Statistical distribution of the sample. Polygon. Histogram. Let a sample be drawn from the general population in which the value $x_1$ was observed $n_1$ times, $x_2$ was observed $n_2$ times, ..., $x_k$ was observed $n_k$ times, and $n_1 + n_2 + \dots + n_k = n$ is the sample size. The observed values $x_1, x_2, \dots, x_k$ are called variants, and the sequence of variants written in ascending order is called a variation series. The numbers of observations $n_1, n_2, \dots, n_k$ are called frequencies, and their ratios to the sample size, $W_1 = n_1/n, W_2 = n_2/n, \dots, W_k = n_k/n$, are called relative frequencies. Note that the sum of the relative frequencies is equal to one: $\sum_{i=1}^{k} W_i = 1$.
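These definitions translate directly into code. The Python sketch below builds the variation series, the frequencies $n_i$ and the relative frequencies $W_i$ for a small hypothetical sample; it uses `fractions.Fraction` so that the check that the relative frequencies sum to one is exact rather than subject to floating-point rounding.

```python
from collections import Counter
from fractions import Fraction

def statistical_distribution(sample):
    """Return (variants, frequencies, relative frequencies), variants in ascending order."""
    counts = Counter(sample)
    variants = sorted(counts)                  # distinct values in ascending order
    freqs = [counts[x] for x in variants]      # n_i
    n = len(sample)
    rel = [Fraction(f, n) for f in freqs]      # W_i = n_i / n, kept exact
    return variants, freqs, rel

sample = [2, 5, 2, 7, 5, 5, 2, 7, 5, 5]        # hypothetical data, n = 10
variants, freqs, rel = statistical_distribution(sample)
print(variants)  # [2, 5, 7]
print(freqs)     # [3, 5, 2]
assert sum(freqs) == len(sample)               # n_1 + ... + n_k = n
assert sum(rel) == 1                           # W_1 + ... + W_k = 1
```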

The statistical distribution of a sample is the list of variants and their corresponding frequencies or relative frequencies. A statistical distribution can also be specified as a sequence of intervals and their corresponding frequencies (continuous distribution); the frequency assigned to an interval is the sum of the frequencies of the variants that fall into it. Polygons and histograms are used to represent a statistical distribution graphically.

To build a polygon, the variants $x_i$ are plotted on the Ox axis and the frequencies $n_i$ (or relative frequencies $W_i$) on the Oy axis; the points $(x_i, n_i)$ (or $(x_i, W_i)$) are then connected by line segments.

Example 10.2. Fig. 10.1 shows the polygon of the following distribution:

A polygon is usually used when the number of variants is small. When the number of variants is large, or when the feature has a continuous distribution, histograms are built more often. To do this, the interval containing all observed values of the feature is divided into several partial intervals of length $h$, and for each partial interval one finds $n_i$, the sum of the frequencies of the variants that fall into the $i$-th interval. Then rectangles are built on these intervals as bases, with heights $n_i/h$ (or $n_i/(nh)$, where $n$ is the sample size).

The area of the $i$-th partial rectangle is $h \cdot n_i/h = n_i$ (or $h \cdot n_i/(nh) = n_i/n = W_i$).

Therefore, the area of the histogram is equal to the sum of all frequencies (or relative frequencies), i.e., to the sample size (or to one).
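A minimal Python sketch of the histogram construction, assuming partial intervals $[a + ih, a + (i+1)h)$ with the last one closed on the right; the starting point $a$, the width $h$ and the data are hypothetical. It checks numerically that the total area equals $n$ for heights $n_i/h$ and equals 1 for heights $n_i/(nh)$.

```python
import math

def histogram(sample, a, h, k):
    """Group a sample into k partial intervals [a + i*h, a + (i+1)*h) of length h.

    Assumes a <= x <= a + k*h for every x. Returns the interval frequencies n_i,
    the heights n_i / h and the heights n_i / (n*h). The last interval is treated
    as closed on the right so that the maximum value is counted.
    """
    n = len(sample)
    counts = [0] * k
    for x in sample:
        i = min(int((x - a) // h), k - 1)   # index of the partial interval containing x
        counts[i] += 1
    heights = [c / h for c in counts]
    rel_heights = [c / (n * h) for c in counts]
    return counts, heights, rel_heights

h = 5
sample = [1, 3, 4, 6, 7, 7, 9, 10, 12, 14]   # hypothetical data, n = 10
counts, heights, rel_heights = histogram(sample, a=0, h=h, k=3)
area = sum(h * y for y in heights)            # total area with heights n_i / h
rel_area = sum(h * y for y in rel_heights)    # total area with heights n_i / (n*h)
print(counts)  # [3, 4, 3]
print(math.isclose(area, len(sample)), math.isclose(rel_area, 1.0))  # True True
```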

Example 10.3. Fig. 10.2 shows the histogram of a continuous distribution of a sample of size $n = 100$, given in the following table.

General population (in English, simply "population") is the totality of all objects (units) about which the researcher intends to draw conclusions when studying a specific problem.

The general population consists of all objects that are subject to study, and its composition depends on the objectives of the study. Sometimes the general population is the entire population of a certain region (for example, when the attitude of potential voters toward a candidate is studied); more often, several criteria are set that define the object of study. For example: men aged 30-50 who use a certain brand of razor at least once a week and have an income of at least $100 per family member.

Sample, or sample population, is a set of cases (subjects, objects, events, specimens) selected from the general population by a certain procedure for participation in the study.

Sample characteristics:

· Qualitative characteristics of the sample - who exactly we choose and what methods of sample construction we use for this.

· The quantitative characteristic of the sample is how many cases we select, in other words, the sample size.

Need for sampling

· The object of study is very broad. For example, consumers of the products of a global company are a huge number of geographically dispersed markets.

· There is a need to collect primary information.

Sample size

Sample size is the number of cases included in the sample. For statistical reasons, it is recommended that the number of cases be at least 30-35.

Dependent and independent samples

When two (or more) samples are compared, an important parameter is whether they are dependent. If a homomorphic pair can be established for each case in the two samples (that is, one case from sample X corresponds to one and only one case from sample Y, and vice versa), and this basis of correspondence is important for the trait measured in the samples, the samples are called dependent. Examples of dependent samples:

· pair of twins

· two measurements of any feature before and after experimental exposure,

· husbands and wives

· etc.

If there is no such relationship between the samples, they are considered independent, for example:

· men and women,

· psychologists and mathematicians.

Accordingly, dependent samples always have the same size, while the size of independent samples may differ.

Samples are compared using various statistical criteria:

· Student's t-test

· Wilcoxon test

· Mann-Whitney U test

· Sign test

· etc.

Representativeness

The sample may be considered representative or non-representative.

An example of a non-representative sample

One of the most famous historical examples of an unrepresentative sample in the USA is the incident that occurred during the 1936 presidential election. The Literary Digest, which had successfully predicted the outcomes of several previous elections, made a wrong forecast after sending out ten million test ballots to its subscribers, to people selected from phone books across the country, and to people from car registration lists. About 25% of the ballots (nearly 2.5 million) were returned, and the votes in them were distributed as follows:

· 57% preferred Republican candidate Alf Landon

· 40% chose the incumbent Democratic President Franklin Roosevelt

As is well known, Roosevelt won the actual election with more than 60% of the vote. The Literary Digest's mistake was this: wanting to increase the representativeness of the sample, since they knew that most of their subscribers considered themselves Republicans, they expanded the sample with people selected from phone books and registration lists. However, they did not take contemporary realities into account and in fact recruited even more Republicans: during the Great Depression, it was mostly the middle and upper classes (that is, mostly Republicans rather than Democrats) who could afford to own phones and cars.

Types of plans for forming groups from samples

There are several main types of group-formation plans:

1. Study with experimental and control groups, which are placed in different conditions.

2. Study with experimental and control groups using a paired selection strategy

3. Study using only one group - experimental.

4. A study using a mixed (factorial) plan - all groups are placed in different conditions.

Sample types

Samples are divided into two types:

· probabilistic

· non-probability

Probability samples

1. Simple probability sampling:

· Simple repeated sampling (with replacement). This sample is based on the assumption that every respondent is equally likely to be included in it. Cards with the respondents' numbers are prepared from the list of the general population. They are placed in a deck and shuffled; a card is drawn at random, its number is written down, and the card is returned. The procedure is repeated as many times as the required sample size. Drawback: the same sampling unit may be selected more than once.

The procedure for constructing a simple random sample includes the following steps:

1. obtain a complete list of the members of the general population and number it. Such a list, recall, is called the sampling frame;

2. determine the expected sample size, that is, the expected number of respondents;

3. take as many numbers from a table of random numbers as there are sample units needed. If the sample should include 100 people, 100 random numbers are taken from the table. These random numbers can also be generated by a computer program;

4. select from the base list the observations whose numbers match the random numbers written down.
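The four steps can be sketched in Python, with a pseudo-random generator from the standard `random` module standing in for the table of random numbers. This is an illustration under stated assumptions: the frame of 500 students is hypothetical, and distinct numbers are drawn (no repeats), as in non-repeated selection.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Simple random sample of n members from a sampling frame.

    1. number the frame members 1..N;
    2. the sample size n is given;
    3. draw n distinct random numbers from 1..N (a generator replaces the table);
    4. return the members whose numbers were drawn.
    """
    numbered = dict(enumerate(frame, start=1))            # step 1
    rng = random.Random(seed)
    numbers = rng.sample(range(1, len(frame) + 1), n)      # step 3
    return [numbered[i] for i in sorted(numbers)]          # step 4

frame = [f"student {i:03d}" for i in range(1, 501)]       # hypothetical frame, N = 500
chosen = simple_random_sample(frame, 100, seed=42)         # n = 100
print(len(chosen), chosen[:3])
```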

· Simple random sampling has obvious advantages. The method is extremely easy to understand, and the results of the study can be generalized to the population under study. Most approaches to statistical inference assume that the data were collected by simple random sampling. However, simple random sampling has at least four significant limitations:

1. It is often difficult to create a sampling frame that would allow for a simple random sample.

2. A simple random sample may turn out to be very large or scattered over a large geographical area, which significantly increases the time and cost of data collection.

3. The results of a simple random sample are often less precise, with a larger standard error, than the results of other probability methods.

4. Simple random sampling (SRS) can produce an unrepresentative sample. Although samples obtained by simple random selection represent the population adequately on average, some of them represent it very poorly. This is especially likely when the sample size is small.

· Simple non-repeated sampling (without replacement). The sample is constructed in the same way, except that the cards with the respondents' numbers are not returned to the deck.

2. Systematic probability sampling. This is a simplified version of simple probability sampling: respondents are selected from the list of the general population at a fixed interval (K). Usually K is the ratio of the population size to the sample size, and the starting position within the first interval is chosen at random. The most reliable results are obtained when the general population is homogeneous; otherwise, the step size may coincide with some internal cyclic pattern in the list, which biases the sample. Drawbacks: the same as for simple probability sampling.

3. Serial (cluster, nested) sampling. The sampling units are statistical series (families, schools, teams, etc.), and every element of a selected series is examined. The series themselves can be selected randomly or systematically. Drawback: the selected series may be more homogeneous than the general population.

4. Zoned sample. If the population is heterogeneous, it is recommended to divide it into homogeneous parts before applying probability sampling with any selection technique; such a sample is called a zoned sample. The zoning groups can be natural formations (for example, city districts) or be defined by any feature underlying the study. The feature used for the division is called the stratification (zoning) feature.

5. "Convenience" sampling. This procedure consists in contacting "convenient" sampling units: a group of students, a sports team, friends and neighbors. Such a sample is quite reasonable when information about people's reactions to a new concept is needed. Convenience sampling is often used for pilot testing of questionnaires.
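Systematic selection can be sketched in Python as follows. The sketch assumes, as is common, that the interval is $K = \lfloor N/n \rfloor$ and that only the starting position is random; the frame of 1000 numbered units is hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Systematic sampling: every K-th unit of an ordered list after a random start.

    Assumes K = N // n and a random start in positions 0..K-1.
    """
    N = len(frame)
    K = N // n                                 # sampling interval
    if K < 1:
        raise ValueError("sample size cannot exceed the population size")
    start = random.Random(seed).randrange(K)   # random starting position
    return [frame[start + j * K] for j in range(n)]

frame = list(range(1, 1001))                   # hypothetical numbered list, N = 1000
chosen = systematic_sample(frame, 50, seed=7)  # K = 20
print(chosen[:5])
```

Because all selected units sit exactly K positions apart, a list with a cycle of period K (for example, every 20th entry being of one special kind) would put every selected unit at the same phase of that cycle; this is the source of the bias described for non-homogeneous populations.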

Non-probability samples

The selection in such a sample is carried out not according to the principles of chance, but according to subjective criteria - accessibility, typicality, equal representation, etc.

1. Quota sampling. The sample is constructed as a model that reproduces the structure of the general population in the form of quotas (proportions) of the characteristics under study. The number of sample units with each combination of these characteristics is chosen to match their share (proportion) in the general population. For example, if a general population of 5,000 people includes 2,000 women and 3,000 men, the quota sample will contain 20 women and 30 men, or 200 women and 300 men. Quota samples are most often based on demographic criteria: gender, age, region, income, education, and others. Drawbacks: such samples are usually not representative, because it is impossible to take many social parameters into account at once. Advantages: the material is easily accessible.

2. Snowball method. The sample is constructed as follows: each respondent, starting with the first, is asked to contact friends, colleagues and acquaintances who fit the selection conditions and could take part in the study. Thus, except for the first step, the sample is formed with the participation of the objects of study themselves. The method is often used to find and interview hard-to-reach groups of respondents (for example, high-income respondents, members of the same professional group, or people who share some hobby or passion).

3. Spontaneous sampling, the sampling of "first comers". It is often used in television and radio polls. The size and composition of spontaneous samples are not known in advance and are determined by a single parameter, the activity of the respondents. Drawback: it is impossible to determine which general population the respondents represent, and therefore impossible to assess representativeness.

4. Route survey. It is often used when the unit of study is the family. All streets on the map of the locality where the survey will be performed are numbered. Multi-digit numbers are then selected with a table (generator) of random numbers. Each such number is treated as consisting of three components: the street number (the first 2-3 digits), the house number and the apartment number. For example, in the number 14832, 14 is the street number on the map, 8 is the house number and 32 is the apartment number.

5. Zoned sampling with selection of typical objects. If, after zoning, a typical object is selected from each group, i.e. an object that, according to most of the characteristics studied in the study, approaches the average, such a sample is called zoned with the selection of typical objects.
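The quota arithmetic from item 1 can be sketched in Python: each group's quota is its population share multiplied by the sample size. The group names and counts reproduce the example in the text; the rounding rule is an assumption, and with real data the rounded quotas may need adjusting so that they add up to $n$.

```python
def quotas(population_counts, n):
    """Proportional quotas: each group's share of the sample equals its share of the population."""
    N = sum(population_counts.values())
    return {group: round(n * count / N) for group, count in population_counts.items()}

population = {"women": 2000, "men": 3000}   # the example from the text, N = 5000
print(quotas(population, 50))    # {'women': 20, 'men': 30}
print(quotas(population, 500))   # {'women': 200, 'men': 300}
```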

Group Building Strategies

Groups for a psychological experiment are selected using various strategies, which are needed to ensure the greatest possible internal and external validity.

· Randomization (random selection)

· Pairwise selection

· Stratometric selection

· Approximate modeling

· Engaging Real Groups

Randomization, or random selection, is used to create simple random samples. The use of such a sample is based on the assumption that each member of the population is equally likely to be included in the sample. For example, to make a random sample of 100 university students, you can put pieces of paper with the names of all university students in a hat, and then take 100 pieces of paper out of it - this will be random selection (Goodwin J., p. 147).

Pairwise selection is a strategy for constructing sample groups in which the groups consist of subjects who are equivalent in secondary parameters that are significant for the experiment. This strategy is effective for experiments with experimental and control groups; the best option is to recruit twin pairs (mono- and dizygotic), as it allows you to create ...

Stratometric selection is randomization with the allocation of strata (or clusters). With this sampling method, the general population is divided into groups (strata) with certain characteristics (sex, age, political preferences, education, income level, etc.), and subjects with the corresponding characteristics are selected.

Approximate modeling is drawing up limited samples and generalizing the conclusions about them to a wider population. For example, when a study is conducted on 2nd-year university students, its data are extended to "people aged 17 to 21." The admissibility of such generalizations is extremely limited.

Approximate modeling is the formation of a model that, for a clearly defined class of systems (processes), describes its behavior (or desired phenomena) with acceptable accuracy.

http://www.hi-edu.ru/e-books/xbook096/01/index.html?part-011.htm- a very useful site!

The sampling method is the main method of statistical research. This is natural, since the set of objects under study is usually infinite, and even when it is finite, it is very difficult to examine all of its objects, so one has to be content with only a part of them, a sample.

General and sample populations

The general population is the totality of all elements studied in a given experiment.

A sample set (or sample) is a finite set of objects randomly selected from the general population.

The size of a population (sample or general) is the number of objects in it.

Example of general and sample populations

Suppose we investigate a person's psychological predisposition to divide a given segment according to the golden section. Since the very concept of the golden section originates in the anthropometry of the human body, in this case the general population is every human being who has reached physical maturity and acquired final proportions, that is, the entire adult part of humanity. The size of this population is practically infinite.

If this predisposition is studied only in the artistic environment, the general population consists of people directly involved in design: artists, architects, designers. There are also a great many such people, and the size of the general population in this case can also be considered infinite.

In both cases, we are forced to limit the study to reasonable sample sizes, choosing as representatives of the two populations students of technical specialties (as people far from the world of art) or design students (as people directly connected with the world of artistic images).

Representativeness

The main problem of the sampling method is the question of how accurately the objects selected from the general population for the study represent the studied characteristics of the general population, that is, the question of the representativeness of the sample.

Thus, a sample is called representative if it represents the quantitative relationships of the general population sufficiently accurately.

Of course, it is hard to say exactly what lies behind the vague wording "sufficiently accurately". Questions of representativeness are generally the most controversial part of any experimental study. There are many now-classic examples in which an insufficiently representative sample led experimenters to absurd results.

As a rule, issues of representativeness are resolved by expert review: the scientific community accepts the opinion of a group of authoritative specialists on whether the study is correct.

Representativeness example

Let us return to the segment-division example. Here the representativeness of the samples lies at the very heart of the study: under no circumstances should subjects be mixed across groups with respect to whether or not they belong to the artistic milieu.

Statistical distribution of the observed feature

Observed value frequency

Suppose that, as a result of an experiment, the observed feature in a sample of size $n$ took the values $x_1, x_2, \dots, x_k$, where $x_1$ was observed $n_1$ times, $x_2$ was observed $n_2$ times, ..., and $x_k$ was observed $n_k$ times. Then the number $n_1$ is called the frequency of the observed value $x_1$, the number $n_2$ the frequency of $x_2$, and so on.

Relative frequency of observed value

The relative frequency of an observed value of a feature is the ratio of its frequency to the sample size: $W_i = n_i / n$.

Clearly, the sum of the frequencies of the observed feature must equal the sample size,

$$n_1 + n_2 + \dots + n_k = n,$$

and the sum of the relative frequencies must equal one:

$$W_1 + W_2 + \dots + W_k = 1.$$

These relations can be used to check statistical tables: if the equalities do not hold, an error was made when recording the results of the experiment.

Statistical distribution of observed value

The statistical distribution of an observed feature is the correspondence between the observed values of the feature and their frequencies (or relative frequencies).

As a rule, the statistical distribution is written as a two-row table in which the first row lists the observed values of the feature and the second row the corresponding frequencies (or relative frequencies):

$$
\begin{array}{c|cccc}
x_i & x_1 & x_2 & \dots & x_k \\
\hline
n_i & n_1 & n_2 & \dots & n_k
\end{array}
$$

Statistical population

A statistical population consists of materially existing objects (employees, enterprises, countries, regions) and is the object of statistical research. A statistical population is a set of units characterized by mass character, typicality, qualitative homogeneity and the presence of variation.

A population unit is each individual unit of the statistical population.

One and the same statistical population can be homogeneous in one feature and heterogeneous in another.

Qualitative homogeneity is the similarity of all units of the population in some essential feature, while they may differ in all the others.

In a statistical population, the differences between units are more often quantitative. Quantitative changes in the values of a feature across different units of the population are called variation.

Variation of a feature is the change in its value (for a quantitative feature) from one unit of the population to another.

A feature (attribute) is a property, characteristic or other trait of units, objects and phenomena that can be observed or measured. Features are divided into quantitative and qualitative. The diversity and variability of the values of a feature across individual units of the population is called variation.

Attributive (qualitative) features cannot be expressed numerically (e.g., the composition of the population by sex). Quantitative features have a numerical expression (e.g., the composition of the population by age).

An indicator is a generalizing quantitative and qualitative characteristic of some property of individual units or of the population as a whole under specific conditions of time and place.

A system of indicators is a set of indicators that comprehensively reflects the phenomenon under study.

For example, consider wages:
  • Feature: wages
  • Statistical population: all employees
  • Population unit: each employee
  • Qualitative homogeneity: accrued wages
  • Variation of the feature: the series of wage values

General population and sample from it

The basis of statistical research is a set of data obtained by measuring one or more features. The actually observed set of objects, statistically represented by a series of observations of a random variable $X$, is the sample, and the hypothetically existing (conceptual) set is the general population. The general population can be finite (the number of observations $N$ is a constant) or infinite ($N = \infty$), while a sample from the general population is always the result of a limited number of observations. The number of observations that make up a sample is called the sample size. If the sample size is large enough ($n \to \infty$), the sample is considered large; otherwise it is called a sample of limited size. A sample is considered small if, when a one-dimensional random variable is measured, the sample size does not exceed 30 ($n \le 30$), or, when several ($k$) features are measured simultaneously in a multidimensional space, the ratio of $n$ to $k$ is less than 10 ($n/k < 10$). The sample forms a variation series if its members are order statistics, i.e., the sample values of the random variable $X$ are sorted in ascending order (ranked); the values of the feature are then called variants.

Example. The same randomly selected set of objects, the commercial banks of one administrative district of Moscow, can be considered a sample from the general population of all commercial banks in that district, a sample from the general population of all commercial banks in Moscow, a sample of the commercial banks of the whole country, and so on.

Basic sampling methods

The reliability of statistical conclusions and the meaningful interpretation of results depend on the representativeness of the sample, i.e., on how completely and adequately it represents the properties of the general population with respect to which it can be considered representative. The statistical properties of a population can be studied in two ways: by continuous or by non-continuous observation. Continuous observation examines all units of the population under study, while non-continuous (sample) observation examines only a part of it.

There are five main ways to organize sampling:

1. simple random selection, in which objects are extracted from the general population at random (for example, using a table or a generator of random numbers), and each possible sample has the same probability. Such samples are called proper random samples;

2. simple selection through a regular procedure is carried out using a mechanical component (for example, dates, days of the week, apartment numbers, letters of the alphabet, etc.) and the samples obtained in this way are called mechanical;

3. stratified selection, in which the general population of size $N$ is divided into subsets, or layers (strata), of sizes $N_1, N_2, \dots, N_k$ so that $N_1 + N_2 + \dots + N_k = N$. Strata are groups of objects that are homogeneous in their statistical characteristics (for example, the population is divided into strata by age group or social class, and enterprises by industry). In this case the samples are called stratified (also layered, typical or zoned);

4. methods of serial selection are used to form serial, or nested, samples. They are convenient when a "block" or series of objects has to be examined at once (for example, a consignment of goods, products of a certain series, or the population of a territorial-administrative unit of the country). Series can be selected randomly or mechanically, and then the whole selected batch of goods or territorial unit (a residential building or a block) is surveyed completely;

5. combined (multistage) selection combines several selection methods at once (for example, stratified and random, or random and mechanical); such a sample is called combined.
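Two of these schemes, stratified selection with proportional allocation and serial (cluster) selection, are sketched below in Python. This is an illustrative sketch: the strata, the series and the allocation rule (draw $n N_i / N$ units, rounded, from stratum $i$) are assumptions rather than prescriptions from the text.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Proportional stratified sampling.

    strata maps a stratum name to its units; the sizes N_1, ..., N_k sum to N.
    About n * N_i / N units are drawn at random, without replacement, from stratum i.
    """
    rng = random.Random(seed)
    N = sum(len(units) for units in strata.values())
    return {
        name: rng.sample(units, round(n * len(units) / N))
        for name, units in strata.items()
    }

def cluster_sample(series, m, seed=None):
    """Serial (cluster) sampling: pick m whole series at random and keep all of their units."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(series), m)
    return {name: list(series[name]) for name in chosen}

# Hypothetical data: 1000 enterprises in two industries, and 10 batches of 20 items.
strata = {
    "industry A": [f"A{i}" for i in range(600)],
    "industry B": [f"B{i}" for i in range(400)],
}
batches = {f"batch {j}": [f"item {j}-{i}" for i in range(20)] for j in range(10)}

by_stratum = stratified_sample(strata, 50, seed=1)
print({name: len(units) for name, units in by_stratum.items()})  # {'industry A': 30, 'industry B': 20}
surveyed = cluster_sample(batches, 2, seed=1)
print(sorted(surveyed))
```

In the stratified case every stratum is guaranteed a share of the sample that matches its share of the population; in the cluster case only the choice of series is random, and each chosen series is surveyed completely.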

Selection types

By type, selection is divided into individual, group and combined. In individual selection, separate units of the general population are chosen for the sample; in group selection, qualitatively homogeneous groups (series) of units are chosen; combined selection combines the first and second types.

By method, repeated and non-repeated selection are distinguished.

Non-repeated selection is selection in which a unit that has entered the sample is not returned to the original population and does not take part in further selection, so the number of units remaining in the general population, initially $N$, decreases during the selection process. In repeated selection, a unit that has entered the sample is returned to the general population after being recorded and thus keeps the same chance as the other units of being selected again; the number of units of the general population $N$ remains unchanged (this method is rarely used in socio-economic studies). However, when $N$ is large ($N \to \infty$), the formulas for non-repeated selection approach those for repeated selection, and in practice the latter are used more often.
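The closeness of the two sets of formulas can be checked numerically for the standard error of the sample mean. The sketch below uses the standard textbook result for simple random sampling: $\sigma/\sqrt{n}$ for repeated selection, multiplied by $\sqrt{(N-n)/(N-1)}$ for non-repeated selection, where $\sigma$ is the population standard deviation; the values of $\sigma$, $n$ and $N$ are hypothetical. Many texts use the close approximation $\sqrt{1 - n/N}$ for the correction factor instead.

```python
import math

def std_error_mean(sigma, n, N=None):
    """Standard error of the sample mean.

    N=None: repeated selection (with replacement), sigma / sqrt(n).
    N given: non-repeated selection, multiplied by sqrt((N - n) / (N - 1)).
    sigma is the population standard deviation (divisor N).
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))   # finite population correction
    return se

sigma, n = 10.0, 100
for N in (200, 1_000, 100_000):
    print(N, round(std_error_mean(sigma, n), 4), round(std_error_mean(sigma, n, N), 4))
```

With $n$ fixed, the correction factor tends to 1 as $N$ grows, so for a sample that is a small fraction of the population the two standard errors are practically equal.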

The main characteristics of the parameters of the general and sample population

Statistical conclusions are based on the distribution of a random variable $X$, and its observed values $(x_1, x_2, \dots, x_n)$ are called realizations of $X$ ($n$ is the sample size). The distribution of a random variable in the general population is theoretical, ideal in nature, while its sample analogue is the empirical distribution. Some theoretical distributions are specified analytically, i.e., their parameters determine the value of the distribution function at each point of the space of possible values of the random variable. For a sample, it is difficult and sometimes impossible to determine the distribution function, so the parameters are estimated from empirical data and then substituted into the analytical expression describing the theoretical distribution. The assumption (hypothesis) about the type of distribution may be either statistically correct or erroneous, but in any case the empirical distribution reconstructed from a sample characterizes the true distribution only approximately. The most important distribution parameters are the expected value and the variance.

By their nature, distributions are continuous or discrete. The best-known continuous distribution is the normal distribution; the sample analogues of its parameters $\mu$ (expected value) and $\sigma^2$ (variance) are the sample mean $\bar{x}$ and the empirical variance $s^2$. Among discrete distributions, the one most often used in socio-economic studies is the alternative (dichotomous) distribution. Its expectation parameter is the relative share of population units that have the characteristic under study (denoted by $p$); the share of units that do not have this characteristic is denoted by $q$ ($q = 1 - p$). The variance of the alternative distribution also has an empirical analogue.
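A minimal Python sketch of these sample analogues: the sample mean, the empirical variance (here with divisor $n$; the corrected variance with divisor $n - 1$ is also common), and, for a 0/1 feature, the share $p$, the share $q = 1 - p$ and the variance $pq$ of the alternative distribution. The data are hypothetical.

```python
def sample_mean(xs):
    """Sample mean (x-bar), the sample analogue of the expected value."""
    return sum(xs) / len(xs)

def empirical_variance(xs):
    """Empirical variance with divisor n, the sample analogue of sigma^2."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def alternative_parameters(flags):
    """For a 0/1 feature: share p of units that have it, q = 1 - p, and variance p * q."""
    p = sum(flags) / len(flags)
    q = 1 - p
    return p, q, p * q

xs = [2, 4, 4, 4, 5, 5, 7, 9]                      # hypothetical measurements
print(sample_mean(xs), empirical_variance(xs))      # 5.0 4.0
p, q, pq = alternative_parameters([1, 0, 0, 1, 0])  # 1 = unit has the characteristic
print(p, q, pq)
```

Note that the mean of the 0/1 values is exactly the share $p$, which is why $p$ plays the role of the expectation parameter of the alternative distribution.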

The characteristics of the distribution parameters are calculated differently depending on the type of distribution and on the method of selecting population units. The main ones for the theoretical and empirical distributions are given in Table 9.1.

The sampling fraction $k_n$ is the ratio of the number of units in the sample population to the number of units in the general population:

$$k_n = n/N.$$

The sample share w is the ratio of the number $n_x$ of sample units that have the characteristic under study x to the sample size n:

$$w = n_x / n.$$

Example. In a batch of goods containing 1000 units, a 5% sample gives the sampling fraction $k_n = 0.05$, i.e. a sample of n = 0.05N = 50 units; if 2 defective products are found in this sample, the sample share is w = 2/50 = 0.04, or 4%.
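The arithmetic of this example can be written as a small Python sketch; the function names are hypothetical and simply mirror the two ratios $k_n$ and w.

```python
def sampling_fraction(n, N):
    """k_n: share of the general population included in the sample."""
    return n / N

def sample_share(n_x, n):
    """w: share of sample units having the feature under study."""
    return n_x / n

N = 1000                 # units in the batch
n = round(0.05 * N)      # a 5% sample gives n = 50 units
k_n = sampling_fraction(n, N)
w = sample_share(2, n)   # 2 defective products found in the sample
print(n, k_n, w)
```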

Since the sample population differs from the general population, sampling errors arise.

Table 9.1 Main parameters of the general and sample populations

Mathematical statistics is a branch of mathematics that studies approximate methods for finding distribution laws and numerical characteristics from the results of an experiment.

The general population is the set of all conceivable observations (objects), homogeneous with respect to some feature, that could be made.

A sample is a collection of observations (objects) randomly selected from the general population for direct study.

The statistical distribution of a sample is the list of variants $x_i$ together with their corresponding frequencies $n_i$.
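A statistical distribution can be built from raw observations by counting how often each variant occurs. A sketch using `collections.Counter` from the Python standard library; the function name and the sample values are hypothetical.

```python
from collections import Counter

def statistical_distribution(sample):
    """Return the variants x_i in increasing order together with their frequencies n_i."""
    counts = Counter(sample)
    return [(x, counts[x]) for x in sorted(counts)]

sample = [3, 5, 3, 2, 5, 5, 7, 3, 5, 2]          # hypothetical observations
dist = statistical_distribution(sample)
n = len(sample)
relative = [(x, n_i / n) for x, n_i in dist]     # relative frequencies n_i / n
print(dist)
print(relative)
```

The frequencies sum to n and the relative frequencies to 1; the pairs (x_i, n_i) in increasing order of x_i are also the vertices of the frequency polygon.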

A frequency histogram is a stepped figure consisting of adjacent rectangles built on one straight line; their bases are equal to the class width, and their heights are equal either to the frequency $n_i$ of falling into the interval or to the relative frequency $n_i/n$. The interval width i can be determined by the Sturges formula:

$$i = \frac{x_{max} - x_{min}}{1 + 3.32 \lg n},$$

where $x_{max}$ is the maximum and $x_{min}$ the minimum value of the variants (their difference is called the range of variation), and n is the sample size.
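The Sturges rule and the grouping of observations into classes of equal width can be sketched in Python as follows. Assumptions: the classes start at x_min, the number of classes is the range divided by the width and rounded up, and x_max is placed in the last class; these conventions and the function names are illustrative choices, not prescribed by the lecture.

```python
import math

def sturges_width(x_min, x_max, n):
    """Class width by the Sturges formula: i = (x_max - x_min) / (1 + 3.32 * lg n)."""
    return (x_max - x_min) / (1 + 3.32 * math.log10(n))

def class_frequencies(data):
    """Group data into classes of Sturges width starting at x_min.

    Returns (width, list of class frequencies n_i). Assumes max(data) > min(data).
    """
    x_min, x_max = min(data), max(data)
    width = sturges_width(x_min, x_max, len(data))
    k = math.ceil((x_max - x_min) / width)        # number of classes (range / width, rounded up)
    freqs = [0] * k
    for x in data:
        j = min(int((x - x_min) / width), k - 1)  # x_max falls into the last class
        freqs[j] += 1
    return width, freqs

data = [2, 3, 3, 5, 6, 7, 7, 7, 9, 10]   # hypothetical sample, n = 10
width, freqs = class_frequencies(data)
print(round(width, 3), freqs)
```

For these ten values lg n = 1, so i = 8/4.32 ≈ 1.85 and the five class frequencies are [3, 1, 4, 1, 1]; dividing them by n gives the heights of a relative-frequency histogram.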

A frequency polygon is a broken line whose segments connect the points with coordinates $(x_i, n_i)$.

5. Characteristics of position (mode, median, sample mean) and of dispersion (sample variance and sample standard deviation).

The mode ($M_o$) is a value of the variant whose preceding and following values have lower frequencies of occurrence.

For unimodal distributions, the mode is the most frequently occurring variant in a given population.

For an interval series, the mode is determined by the formula:

$$M_o = x_{lower} + i\,\frac{n_2 - n_1}{2n_2 - n_1 - n_3},$$

where $x_{lower}$ is the lower bound of the modal class, i.e. the class with the highest frequency of occurrence $n_2$; $n_2$ is the frequency of the modal class; $n_1$ is the frequency of the class preceding the modal one; $n_3$ is the frequency of the class following the modal one; i is the width of the class interval.
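A sketch of the interval-series mode formula in Python. The function name and the example classes are hypothetical; if the modal class is the first or the last one, the missing neighbouring frequency is taken as 0, which is one common convention and an assumption here.

```python
def interval_mode(lower_bounds, freqs, width):
    """Mode of an interval series: M_o = x_lower + i * (n2 - n1) / (2*n2 - n1 - n3)."""
    k = max(range(len(freqs)), key=lambda j: freqs[j])   # modal class (first highest frequency)
    n2 = freqs[k]
    n1 = freqs[k - 1] if k > 0 else 0                   # assumed 0 if there is no preceding class
    n3 = freqs[k + 1] if k + 1 < len(freqs) else 0      # assumed 0 if there is no following class
    return lower_bounds[k] + width * (n2 - n1) / (2 * n2 - n1 - n3)

# Hypothetical interval series: classes [0,10), [10,20), [20,30), [30,40)
mo = interval_mode([0, 10, 20, 30], [5, 12, 8, 3], width=10)
print(round(mo, 2))
```

Here the modal class is [10, 20) with n_2 = 12, n_1 = 5 and n_3 = 8, so M_o = 10 + 10 · 7/11 ≈ 16.36.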

The median ($M_e$) is the value of the feature that divides the distribution series into two parts of equal size.

The sample mean is the arithmetic mean of the variants of the statistical series:

$$\bar{x}_s = \frac{1}{n}\sum_i x_i n_i.$$

The sample variance is the arithmetic mean of the squared deviations of the variants from their mean value:

$$S_s^2 = \frac{1}{n}\sum_i (x_i - \bar{x}_s)^2 n_i.$$

The sample standard deviation is the square root of the sample variance:

$$S_s = \sqrt{S_s^2}.$$
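The characteristics of position and dispersion above can be computed directly from a statistical distribution given as pairs (x_i, n_i). A minimal Python sketch: the variance uses the divisor n, matching its definition as the arithmetic mean of squared deviations, and the mode is taken as the most frequent variant (assuming a unimodal distribution). Function and variable names are hypothetical.

```python
import math
import statistics

def sample_characteristics(dist):
    """Mean, variance (divisor n) and standard deviation of a distribution [(x_i, n_i), ...]."""
    n = sum(n_i for _, n_i in dist)
    mean = sum(x * n_i for x, n_i in dist) / n
    variance = sum((x - mean) ** 2 * n_i for x, n_i in dist) / n
    return mean, variance, math.sqrt(variance)

dist = [(2, 3), (5, 2), (7, 5)]          # hypothetical variants x_i and frequencies n_i
mean, variance, sd = sample_characteristics(dist)

values = [x for x, n_i in dist for _ in range(n_i)]   # expand back to individual observations
median = statistics.median(values)
mode = max(dist, key=lambda pair: pair[1])[0]         # most frequent variant (unimodal case)
print(mean, variance, sd, median, mode)
```

For this distribution the mean is 5.1, the variance 4.69, and the median 6 (the average of the 5th and 6th ordered values, 5 and 7).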

6. Estimation of the parameters of the general population based on its sample (point and interval). Confidence interval and confidence probability.

Numerical values characterizing the general population are called parameters.

Statistical estimation can be done in two ways:

1) point estimation, in which the parameter is estimated by a single number (a point);

2) interval estimation, in which the sample data are used to find an interval that contains the true value of the parameter with a given probability.

A point estimate is an estimate determined by a single number, and this number is computed from the sample.

A point estimate is called consistent if, as the sample size increases, the sample characteristic tends (in probability) to the corresponding characteristic of the general population.

A point estimate is called efficient if its sampling distribution has the smallest variance compared with other estimates of the same parameter.

A point estimate is called unbiased if its mathematical expectation equals the parameter being estimated for any sample size.
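Unbiasedness can be checked exactly on a tiny population by enumerating every equally likely repeated sample and averaging the estimator over them. This Python sketch (hypothetical names, population {1, 2, 3}, samples of size n = 2) shows that the sample mean is unbiased, while the variance with divisor n has expectation σ²(n - 1)/n. Dividing by n - 1 instead removes the bias; that corrected variance is a standard result added here for illustration only.

```python
from fractions import Fraction
from itertools import product

def mean(xs):
    return Fraction(sum(xs), len(xs))

def var_n(xs):
    """Variance with divisor n (arithmetic mean of squared deviations)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def var_n1(xs):
    """Variance with divisor n - 1 (corrected variance)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

population = [1, 2, 3]
n = 2
samples = list(product(population, repeat=n))  # all 9 equally likely repeated samples

def expectation(stat):
    """Exact mean of an estimator over all samples."""
    return sum(stat(s) for s in samples) / len(samples)

print(expectation(mean), mean(population))     # equal: the sample mean is unbiased
print(expectation(var_n), var_n(population))   # not equal: divisor n is biased
print(expectation(var_n1))                     # equals the population variance
```

Because `Fraction` arithmetic is exact, the expectations come out as 2, 1/3 and 2/3, against a population mean of 2 and a population variance of 2/3.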

An unbiased estimator of the general mean (mathematical expectation) is the sample mean $\bar{x}_s$:

$$\bar{x}_s = \frac{1}{n}\sum_i x_i n_i,$$

where $x_i$ are the sample variants, $n_i$ is the frequency of occurrence of the variant $x_i$, and n is the sample size.

An interval estimate is a numerical interval determined by two numbers, the boundaries of an interval that contains an unknown parameter of the general population.

A confidence interval is an interval that covers an unknown parameter of the general population with a given, predetermined probability.

The confidence probability p is a probability such that an event with probability (1 - p) can be considered practically impossible; α = 1 - p is the significance level. Probabilities close to 1 are usually chosen as confidence probabilities, so that the event "the interval covers the parameter" is practically certain. The usual values are p ≥ 0.95, p ≥ 0.99 and p ≥ 0.999.

For a small sample (n < 30) of a normally distributed quantitative feature x, the confidence interval for the general mean can be written as:

$$\bar{x}_s - mt \le \bar{x}_g \le \bar{x}_s + mt \quad (p \ge 0.95),$$

where $\bar{x}_g$ is the general mean; $\bar{x}_s$ is the sample mean; t is Student's coefficient (the normalized deviation of Student's distribution) with (n - 1) degrees of freedom, determined by the probability that the general parameter falls into this interval; m is the error of the sample mean.
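The small-sample interval can be sketched in Python. Assumptions not stated in the lecture: the error of the sample mean is computed as m = s/√n, with s the standard deviation using the divisor n - 1 (`statistics.stdev`), and the Student coefficient t is supplied by the caller from a table rather than computed. The value 2.776 used below is the tabulated two-sided coefficient for p = 0.95 and 4 degrees of freedom; the function name and the data are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(data, t):
    """Interval x_bar - m*t <= general mean <= x_bar + m*t for a small normal sample.

    t: Student's coefficient for (n - 1) degrees of freedom, read from a table.
    m: error of the sample mean, assumed here to be s / sqrt(n) with divisor n - 1 in s.
    """
    n = len(data)
    x_bar = statistics.fmean(data)
    m = statistics.stdev(data) / math.sqrt(n)   # error of the sample mean
    return x_bar - m * t, x_bar + m * t

data = [10, 12, 14, 16, 18]                            # hypothetical measurements, n = 5
low, high = mean_confidence_interval(data, t=2.776)    # t for p = 0.95, 4 degrees of freedom
print(low, high)
```

Here n = 5, the sample mean is 14, s = √10 and m = √2, so the interval is roughly 14 ± 3.93.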