Stat - Exam #1 Flashcards (Spring 2015, 134 cards)

What is the method for finding the “kth” percentile?

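1. RANK the data from low to high (a MUST);
2. Fill in a table of (Index, Position, Value):
-Index: i = (k/100) x n, where n = number of data points;
-- If i is an integer, the kth percentile is the average of the data points in positions i and (i+1);
-- If i is a decimal, round up: the kth percentile is the data point in the next larger position;
3. Value: read the value from that position in the ranked set of data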
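The steps above can be sketched in Python. This is a minimal sketch assuming the card's convention (1-based positions, average positions i and i+1 when i is an integer, round up otherwise) and 0 < k < 100; the function name `percentile` is hypothetical, and library routines such as NumPy's `percentile` use a different interpolation rule by default, so their answers can differ.

```python
import math
from fractions import Fraction


def percentile(data, k):
    """kth percentile using the index method (hypothetical helper, 0 < k < 100)."""
    if not 0 < k < 100:
        raise ValueError("k must be strictly between 0 and 100")
    ranked = sorted(data)                    # Step 1: rank from low to high
    n = len(ranked)
    i = Fraction(k) * n / 100                # Step 2: index i = (k/100) x n, kept exact
    if i.denominator == 1:                   # i is an integer
        i = int(i)
        # average the data points in positions i and i+1 (1-based)
        return (ranked[i - 1] + ranked[i]) / 2
    position = math.ceil(i)                  # i is a decimal: next larger position
    return ranked[position - 1]              # Step 3: value at that position


data = [12, 5, 8, 3, 10, 7, 15, 9]
print(percentile(data, 25))   # i = 2   -> average of positions 2 and 3
print(percentile(data, 60))   # i = 4.8 -> position 5
```

The index is kept as an exact `Fraction` so that an index such as 3 is not computed as 3.0000000000000004 and misread as a decimal.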


What do the INDEX and POSITION show for the “kth” percentile?

-The INDEX gets you to just below the desired percentile;
-The POSITION takes you the rest of the way


What are Quartiles?

-Quartiles are the most commonly used percentiles;
-They divide a set of ranked data points into four equal parts; each part contains 25% of the data points


What is the First Quartile?

-The number such that 25% of the ranked data points are smaller, and 75% are greater;
-Denoted: Q1


What is the Third Quartile?

-The number such that 75% of the ranked data points are smaller and 25% are greater;
-Denoted: Q3


What is the Second Quartile?

-It is actually the MEDIAN (and is usually called that rather than Q2)


What is the main disadvantage of using the RANGE as a summary number for spread?

-It is NOT a RESISTANT statistic and is strongly affected by extreme data points;
-It may not represent the bulk of the data, especially if the two extremes are considerable outliers;
-Corrected by measuring the range of only the middle 50% of the data points, so the extremes have no influence: the interquartile range, IQR = Q3 - Q1


What are Fences?

-Used to check for extreme observations, or outliers;
-Do NOT automatically throw a point out; check it out;
-Use the fences to decide which points are outliers:
-OUTLIERS = points smaller than the LOWER fence or larger than the UPPER fence


Formula for LOWER Fence?

Q1 - 1.5(IQR)


Formula for UPPER Fence?

Q3 + 1.5(IQR)
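Putting the IQR and the two fence formulas together gives a minimal sketch like the one below. It assumes Q1 and Q3 come from the same index method as the kth-percentile card (other quartile conventions can shift the fences slightly); the names `quartile` and `fences` are hypothetical.

```python
import math
from fractions import Fraction


def quartile(data, k):
    # kth percentile by the index method (assumed convention; k = 25 or 75 here)
    ranked = sorted(data)
    i = Fraction(k) * len(ranked) / 100
    if i.denominator == 1:
        return (ranked[int(i) - 1] + ranked[int(i)]) / 2
    return ranked[math.ceil(i) - 1]


def fences(data):
    q1, q3 = quartile(data, 25), quartile(data, 75)
    iqr = q3 - q1                    # IQR = Q3 - Q1
    lower = q1 - 1.5 * iqr           # LOWER fence
    upper = q3 + 1.5 * iqr           # UPPER fence
    # flag (do not delete) anything outside the fences
    outliers = [x for x in data if x < lower or x > upper]
    return lower, upper, outliers


print(fences([3, 5, 7, 8, 9, 10, 12, 40]))
```

For this illustrative data, Q1 = 6.0 and Q3 = 11.0, so IQR = 5.0 and only 40 falls outside the fences; it should be checked, not automatically removed.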


What are the important RESISTANT measures of location and spread?

1. Median = resistant for location;
2. IQR = resistant for spread


What 3 numbers give the most information about data?

(Q1, M, Q3);
-They miss only the tails of the data (min and max)


What is the Five-Number Summary?

-A set of numbers consisting of the smallest data value (min), Q1, the median (M), Q3, and the largest data value (max);
-{Min, Q1, M, Q3, Max}
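As a sketch, the five-number summary can be computed with the same index method for the quartiles (an assumed convention; the helper names are hypothetical). With that method the 50th percentile works out to the usual median for both odd and even n.

```python
import math
from fractions import Fraction


def percentile(data, k):
    # Index method from the kth-percentile card (assumed convention, 0 < k < 100)
    ranked = sorted(data)
    i = Fraction(k) * len(ranked) / 100
    if i.denominator == 1:                  # integer index: average positions i and i+1
        return (ranked[int(i) - 1] + ranked[int(i)]) / 2
    return ranked[math.ceil(i) - 1]         # decimal index: round up to the next position


def five_number_summary(data):
    """Return {Min, Q1, M, Q3, Max} as a tuple."""
    return (min(data), percentile(data, 25), percentile(data, 50),
            percentile(data, 75), max(data))


print(five_number_summary([12, 5, 8, 3, 10, 7, 15, 9]))
```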


What is a Boxplot?

-Picture of the 5-number summary


How can the SHAPE of a column of data be seen from a boxplot?

(Shape - Median - Tails)
1. Symmetric = center median = equal tails;
2. Skew Right = median left = right tail longer;
3. Skew Left = median right = left tail longer


Properties of Normal Distribution

-Probability = Area under curve;
-z-transformation: z = (x - μ)/σ, where x = data value, μ = population mean, σ = population standard deviation;
-Normal probability plot


What are the data characteristics for Normal Distribution?

Data type = Continuous
Data Distribution = Normal


What is a Probability Density Function (PDF)?

-The equation of a curve used to compute probabilities for a continuous random variable; it must satisfy 2 conditions:
1. The area under the ENTIRE curve must equal 1;
2. The curve must be greater than or equal to zero at every point (it CANNOT be negative)


What is the Normal Probability Density Function?

-Given by an equation (we don't have to integrate it ourselves);
-Describes a symmetric, bell-shaped curve;
-Completely defined by the mean and variance (standard deviation)
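The normal PDF equation is f(x) = 1/(σ√(2π)) · e^(-(x - μ)²/(2σ²)). The minimal sketch below codes it directly and checks the two PDF conditions numerically. The trapezoidal-rule helper `area` and the choice μ = 50, σ = 10 are illustrative assumptions, and integrating over μ ± 5σ only approximates the area over the whole line.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    coeff = 1 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))


def area(f, a, b, steps=10_000):
    """Approximate the area under f from a to b with the trapezoidal rule."""
    h = (b - a) / steps
    inner = sum(f(a + j * h) for j in range(1, steps))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))


mu, sigma = 50, 10


def pdf(x):
    return normal_pdf(x, mu, sigma)


total = area(pdf, mu - 5 * sigma, mu + 5 * sigma)   # condition 1: close to 1
print(total)
print(min(pdf(x) for x in range(0, 101)))           # condition 2: never negative
```

Only the mean and standard deviation enter the formula, which is why they completely define the curve.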


What defines the shape of normal curves?

-The shape is fixed by the equation; the only differences between normal curves are their LOCATION (mean) and SPREAD (standard deviation)


What are the properties of Normal Distribution?

1. Symmetric about the mean (μ):
-- The mode, median, and mean are the same point;
-- The area under the curve to the RIGHT of the mean (μ) equals the area to the LEFT of the mean (each area = 0.5);
2. The curve approaches, but never touches, the horizontal axis (zero) in both tails;
3. Area under the curve is exactly 1 by definition


What are the Two Symmetries?

-These properties lead to two symmetries of the standard normal curve (symmetric about 0). If the area under the curve to the LEFT of point -a is A, then the area under the curve to the RIGHT of:
Symmetry 1: point -a is (1 - A);
Symmetry 2: point a is A
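A quick numerical check of both symmetries, as a sketch: it assumes the standard normal curve and uses Python's `math.erf` to get the area to the left of a point, Φ(z) = (1 + erf(z/√2))/2, in place of a z-table. The helper name `phi` and the value a = 1.5 are illustrative.

```python
import math


def phi(z):
    """Area under the standard normal curve to the LEFT of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


a = 1.5
A = phi(-a)                        # area to the left of -a
right_of_minus_a = 1 - phi(-a)     # Symmetry 1: should equal 1 - A
right_of_a = 1 - phi(a)            # Symmetry 2: should equal A
print(A, right_of_minus_a, right_of_a)
```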


What does the Area Under the Curve give?

-Area under the curve for an event gives the probability of the event happening, if the curve is a PDF.


What are the types of Probability?

1. PROPORTION of population described by the event;
2. PROBABILITY that a randomly selected individual from the population will be described by the event


What is the Empirical Rule?

For any NORMAL curve:
-Between pop. mean (μ) ± 1 SD = 68% of the area;
-Between pop. mean (μ) ± 2 SD = 95% of the area;
-Between pop. mean (μ) ± 3 SD = 99.7% of the area;
Also, for the quartiles:
-Between pop. mean (μ) ± 0.67 SD = 50% of the area (Q1 ≈ μ - 0.67 SD and Q3 ≈ μ + 0.67 SD)
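The percentages above can be checked with a short sketch. After standardizing, the area between μ - k SD and μ + k SD is the standard normal area between -k and k, which equals erf(k/√2); the helper name `area_within` is hypothetical, and the printed values are rounded.

```python
import math


def area_within(k):
    """Area under any normal curve between mu - k*SD and mu + k*SD."""
    # Standardizing maps the bounds to -k and +k; that standard normal area is erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3, 0.67, 0.6745):
    print(k, round(area_within(k), 4))
```

The 0.67 multiplier is itself rounded (it is closer to 0.674), so its area comes out slightly under 0.5.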


How do we find the area under a certain part of any normal curve?

-Convert it into the STANDARD NORMAL CURVE (mean 0, SD 1) and use one table for every problem


What is standardizing a normal random variable?

-Converting a column of data from x-values (normal distribution) to z-scores (standard normal distribution); this is the z-transformation
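As a sketch of the z-transformation applied to a whole column, assuming the population mean μ and standard deviation σ are known (the name `standardize` and the sample values are illustrative):

```python
def standardize(column, mu, sigma):
    """z-transformation of each x-value: z = (x - mu) / sigma."""
    return [(x - mu) / sigma for x in column]


scores = [85, 100, 130]            # illustrative x-values
print(standardize(scores, mu=100, sigma=15))
```

Each z-score says how many standard deviations the value lies above (positive) or below (negative) the mean.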


P (z < a)

Probability that a standard normal random variable is...

LESS than a


P (a < z)

Probability that a standard normal random variable is...

GREATER than a
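Both probabilities can be computed without a table, as a sketch: `phi` below (a hypothetical helper built on Python's `math.erf`) gives the area to the LEFT of a under the standard normal curve, and the area to the right is its complement because the total area is 1.

```python
import math


def phi(z):
    """Area under the standard normal curve to the LEFT of z (what a z-table lists)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def p_less(a):
    """P(z < a): area to the left of a."""
    return phi(a)


def p_greater(a):
    """P(a < z): area to the right of a, since the total area is 1."""
    return 1 - phi(a)


print(p_less(1.96), p_greater(1.96))
```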


P (a < z < b)

Probability that a standard normal random variable is...