Flashcards in Stat - Exam #1 Deck (134)
What are Summary Numbers?
Numerical quantity used to describe data
What are the Numerical Measures (stats) that represent/summarize characteristics?
-Number of Observations = Size;
-Mean, Median, Mode = Location;
-Range, variance, standard deviation = Spread;
-Correlation, least-squares regression = Association
What is a Parameter?
-A summary number for a POPULATION;
— Constant as population does not change;
— Greek letters
What is a Statistic?
-A summary number for a SAMPLE;
— Variable as the sample varies (changes)
What is a Resistant Statistic?
-A stat that is NOT sensitive to extreme data values;
**MEDIAN is more resistant than the MEAN: the median typically won’t change much when an outlier is present
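A minimal sketch of resistance, using a made-up data set and a hypothetical outlier of 100 (neither comes from the deck). It compares how far the mean and the median move when the extreme value is added:

```python
from statistics import mean, median

# Hypothetical data, chosen only for illustration.
data = [2, 3, 4, 5, 6]
with_outlier = data + [100]  # add one extreme value

# How far each summary number moves once the outlier is included.
mean_shift = mean(with_outlier) - mean(data)
median_shift = median(with_outlier) - median(data)

print("mean shift:", mean_shift)
print("median shift:", median_shift)
```

The mean moves much further than the median here, which is what "resistant" is meant to capture.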
What is Size?
The size of a set of data is the NUMBER of INDIVIDUALS (data points) in the set;
— Sample = n;
— Population = N
What are summary numbers for Location?
-Tell where the MIDDLE of the data is located on the real number line;
-To find the middle, imagine a histogram;
-Determine the middle of the histogram and find where the middle hits the number line;
**Condensing column of data to ONE number
What are the numerical measures for location?
-Mean, Median, and Mode;
-Best measure of location depends on:
1. TYPE of data;
2. SHAPE of distribution
What are the measures per data types for location?
-Quantitative = Symmetric = MEAN (most info);
-Quantitative = Skewed = MEDIAN (“ugly” graphs);
-Qualitative = No shape = MODE
What is Binomial Data?
-Special case of qualitative data;
-Only TWO values: 0 for failure and 1 for success;
-Best measure is the PROPORTION of successes, which is the average of the data (x-bar)
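As a small illustration, the proportion of successes can be computed as the ordinary average of the 0/1 values. The outcomes below are invented for the example:

```python
# Hypothetical binomial data: 1 = success, 0 = failure.
outcomes = [1, 0, 1, 1, 0, 1, 0, 1]

# The proportion of successes is just the mean (x-bar) of the 0/1 values.
proportion = sum(outcomes) / len(outcomes)

print(proportion)
```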
What is the Mean?
-Arithmetic average of all data points (balance point);
-Sample = X-bar;
-Population = mu;
-Used for discrete and continuous data;
-Advantages = easily understood; uses all points;
-Disadvantages = affected by extreme data
Why is the mean the most commonly used measure of location?
-Algebraically easy to use;
-Statistically more stable in that it tends to vary less from sample to sample than other measures
What is the method to find the MEAN?
1. Rank the data points from lowest to highest (optional);
2. Sum of all values;
3. Divide by the number of data points
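The steps above can be sketched as a short function. The name `sample_mean` and the data are hypothetical, used only for illustration:

```python
def sample_mean(values):
    """Arithmetic average: sum of all values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    total = sum(values)          # step 2: sum of all values
    return total / len(values)   # step 3: divide by the number of data points


# Hypothetical sample; ranking (step 1) is optional and does not change the result.
x_bar = sample_mean([4, 8, 15, 16, 23, 42])
print(x_bar)
```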
What is the Median?
-The numerical value that lies in the middle of a ranked set of data;
-Sample = M;
-Population = M;
-Used for discrete and continuous data;
-Advantages = NOT affected by outliers (RESISTANT);
-Disadvantages = uses information from only the position of data
The median is the balance of what?
-The balance point for the NUMBER of DATA POINTS;
-Half below and half above;
-Median will be ONLY the:
1. Value of a data point, or;
2. Simple average between two adjacent data points
What is the method to find the Median?
1. RANK (must) the data points from low to high;
2. TABLE, fill in Index, Position, Value;
(Find the singular middle value of an odd numbered set, or find the average of the 2 middle in an even numbered set)
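A minimal sketch of the method above, assuming the usual convention of averaging the two middle values for an even-sized set. The function name `median_of` and the data are hypothetical:

```python
def median_of(values):
    """Middle value of the ranked data (average of the two middle values if n is even)."""
    ranked = sorted(values)      # step 1: ranking is required
    n = len(ranked)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                       # odd n: the single middle value
    return (ranked[mid - 1] + ranked[mid]) / 2   # even n: average of the two middle values


print(median_of([7, 1, 5]))      # odd-sized set
print(median_of([7, 1, 5, 3]))   # even-sized set
```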
What is the Mode?
-The value that occurs MOST frequently in a set of data;
-Sample = Mode;
-Population = Mode;
-Advantages = Easy to find;
-Disadvantages = Not unique and uses info from only part of data
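To illustrate why the mode is "not unique", here is a small sketch that returns every value tied for the highest count. The function name `modes` and the color data are made up for the example:

```python
from collections import Counter


def modes(values):
    """Return all values that occur most frequently (there may be more than one)."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    # Keep every value whose count ties for the maximum.
    return [value for value, count in counts.items() if count == top]


# Hypothetical qualitative data with a two-way tie.
print(modes(["red", "blue", "red", "green", "blue"]))
```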
What “completely” describes a column of numbers?
-Spread tells how WIDESPREAD data is on the real number line;
-WIDTH = indication of variability
How does Variability indicate variance?
-More variable data has a GREATER width;
-Less variable data has SMALLER width
What are the common measures of SPREAD?
1. Range;
2. Variance;
3. Standard Deviation;
4. Interquartile Range (IQR)
What is the Range?
-The difference between the largest and smallest data value;
— Sample = R;
— Population = R;
-Advantages = Easy calc;
-Disadvantages = Not resistant; uses only the two most extreme data points (NOT all data points)
What is the method to find the Range?
1. Rank the data from low to high (MUST);
2. Find the largest data value and the smallest data value;
3. Take the difference: R = max - min
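The calculation R = max - min can be sketched in a few lines. The function name `data_range` and the values are hypothetical:

```python
def data_range(values):
    """Range: largest data value minus smallest data value."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


# Hypothetical data; max() and min() find the extremes without explicit ranking.
r = data_range([3, 9, 4, 12, 7])
print(r)
```

Because only the two extreme points enter the calculation, a single outlier changes the result directly.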
What is the Interquartile Range (IQR)?
The difference between the 75th percentile and the 25th percentile;
— Sample = IQR;
— Population = IQR;
-Advantages = RESISTANT version of the range (removes outliers);
-Disadvantages = Only gives spread of 50% of data;
EX: P(75) - P(25) = IQR
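A sketch of IQR = P(75) - P(25), using Python's `statistics.quantiles` with its default ("exclusive") method. Percentile conventions differ between textbooks and software, so the quartiles here are an assumption and may not match a hand method taught in class. The data are made up:

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, _q2, q3 = quantiles(values, n=4)  # default method is 'exclusive'
    return q3 - q1


# Hypothetical data: the integers 1..11, then the same set with 11 replaced by 1000.
data = list(range(1, 12))
with_outlier = data[:-1] + [1000]

print(iqr(data), max(data) - min(data))
print(iqr(with_outlier), max(with_outlier) - min(with_outlier))
```

In this example the range changes sharply with the outlier while the IQR does not, which illustrates the resistance noted above.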
What is the Variance?
*Most important summary number for spread in stats;
-The ‘average’ of the squared deviations of the data points from the mean;
-NOT the ‘true average’ because it is divided by the degrees of freedom and not by the number of data points;
— Sample = s^2;
— Population = sigma^2;
-Advantages = BEST estimator of spread;
-Disadvantages = Uses DIFFERENT units of measurement than the mean (squared units)
What is the method to find the Variance?
1. Rank the data points;
2. Use the Sum-of-Squares Table;
3. Calculation: s^2 = [Sum of (X(i) - avg.)^2] / (n - 1)
*Divide the Sum-of-Squares by the degrees of freedom
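The method above can be sketched as follows: compute the Sum-of-Squares, then divide by the degrees of freedom (n - 1). The function names and the data are hypothetical, used only for illustration:

```python
def sum_of_squares(values):
    """Sum of squared deviations of each data point from the sample mean."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values)


def sample_variance(values):
    """Sample variance: Sum-of-Squares divided by the degrees of freedom (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two data points")
    return sum_of_squares(values) / (n - 1)


# Hypothetical sample: mean is 3, squared deviations are 4, 1, 0, 1, 4.
data = [1, 2, 3, 4, 5]
print(sum_of_squares(data))
print(sample_variance(data))
```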
What is the formula for Sum-of-Squares?
Summation of (X(i) - avg.)^2
What are the Degrees of Freedom?
-Sample size minus one (n - 1);
-The mean was estimated from the sample in order to calculate the variance, so one degree of freedom is lost;
-Or: once the value of the mean is known, every data point can take any possible value except the last one, which must take one specific value to give the correct mean and so is not free;
= Number of FREE data points
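A small sketch of why one data point is "not free": once the mean and the first n - 1 values are known, the last value is forced. The numbers are invented for the example:

```python
# Hypothetical sample of n = 4 data points.
data = [3, 5, 7, 9]
n = len(data)
x_bar = sum(data) / n

# Suppose only the mean and the first n - 1 values are known.
free_values = data[:-1]

# The last value is then determined: all n values must sum to n * x_bar.
last_value = n * x_bar - sum(free_values)

degrees_of_freedom = n - 1
print(last_value, degrees_of_freedom)
```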
What is the Standard Deviation?
-Square root of the variance;
-*Most COMMON summary for spread;
— Sample = s;
— Population = sigma;
-Gives the “average” deviation of the data from the mean; not a true average since it is divided by the degrees of freedom, not the number of data points
What is the method to find the Standard Deviation?
1. Take the square-root of the variance: s = sq. root of s(^2)
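A minimal sketch of s = sq. root of s^2, computing the sample variance first and then taking its square root. The function name `sample_std` and the data are hypothetical:

```python
import math


def sample_std(values):
    """Sample standard deviation: square root of the sample variance."""
    n = len(values)
    if n < 2:
        raise ValueError("sample standard deviation needs at least two data points")
    x_bar = sum(values) / n
    variance = sum((x - x_bar) ** 2 for x in values) / (n - 1)  # divide by n - 1
    return math.sqrt(variance)


# Hypothetical sample with variance 2.5.
s = sample_std([1, 2, 3, 4, 5])
print(s)
```

Unlike the variance, the result is in the same units as the original data.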