Assessment: Principles of Test Construction Flashcards

1
Q

Validity

A

How accurately an instrument measures a given construct. Validity is concerned with what an instrument measures, how well it does so, and the extent to which meaningful inferences can be made from the instrument’s results. The three main types of validity are (a) content validity, the extent to which an instrument’s content seems appropriate to its intended purpose; (b) criterion-related validity, the effectiveness of an instrument in predicting an individual’s performance on a specific criterion, which can be either predictive or concurrent; and (c) construct validity, the extent to which an instrument measures a theoretical construct (i.e., an idea or concept).

2
Q

six types of reliability:

A
  • test-retest
  • alternate form
  • internal consistency
  • split-half reliability
  • inter-item consistency
  • inter-rater

3
Q

face validity

A

A superficial measure that is concerned with whether an instrument looks valid or credible. Face validity is not a true type of validity.

4
Q

validity coefficient

A

A correlation between test scores and the criterion measure; often used to report criterion-related validity.
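
For illustration, a minimal sketch (hypothetical scores; numpy assumed available) of computing a validity coefficient as the Pearson correlation between test scores and a criterion measure:

```python
import numpy as np

# Hypothetical data: admissions test scores and a later criterion (first-year GPA)
test_scores = np.array([52, 47, 61, 38, 55, 66, 44, 59])
criterion = np.array([3.1, 2.8, 3.6, 2.4, 3.2, 3.8, 2.9, 3.4])

# The validity coefficient is the correlation between the test scores and the criterion
validity_coefficient = np.corrcoef(test_scores, criterion)[0, 1]
print(f"validity coefficient r = {validity_coefficient:.2f}")
```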

5
Q

factor analysis

A

A statistical test used to reduce a larger number of variables (often items on an assessment) to a smaller number of factors (groups of related variables). The two forms of factor analysis are (a) exploratory factor analysis (EFA), which involves an initial examination of potential models (or factor structures) that best categorize the variables, and (b) confirmatory factor analysis (CFA), which is used to confirm the factor structure identified by the EFA.
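
As a rough illustration of the exploratory step, a sketch with simulated data (scikit-learn's FactorAnalysis is assumed available; dedicated EFA/CFA packages would normally be used for real instruments):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Simulate 200 respondents answering 6 items driven by 2 underlying factors
latent = rng.normal(size=(200, 2))
loadings = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.0],   # items 1-3 load on factor 1
                     [0.0, 0.9], [0.1, 0.8], [0.0, 0.7]])  # items 4-6 load on factor 2
items = latent @ loadings.T + rng.normal(scale=0.3, size=(200, 6))

# Exploratory step: extract 2 factors and inspect the estimated loadings;
# a confirmatory step would then test how well this structure fits new data
efa = FactorAnalysis(n_components=2, random_state=0).fit(items)
print(np.round(efa.components_.T, 2))  # rows = items, columns = factors
```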

6
Q

standard error of estimate

A

A statistic that indicates the expected margin of error in a predicted criterion score due to the imperfect validity of the test.
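
A small worked example (hypothetical numbers) using the standard formula SEE = SD of the criterion times the square root of (1 minus the squared validity coefficient):

```python
import math

# Hypothetical values for a test used to predict a criterion
sd_criterion = 10.0  # standard deviation of the criterion scores
r_xy = 0.60          # validity coefficient between the test and the criterion

# Standard error of estimate: expected margin of error in a predicted criterion score
see = sd_criterion * math.sqrt(1 - r_xy ** 2)
print(f"SEE = {see:.1f}")  # 8.0; roughly 68% of actual scores fall within +/- 8 of the prediction
```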

7
Q

sensitivity

A

The instrument’s ability to accurately identify the presence of a phenomenon.

8
Q

specificity

A

The instrument’s ability to accurately identify the absence of a phenomenon.

9
Q

false positive

A

An instrument inaccurately identifying the presence of a phenomenon.

10
Q

false negative

A

An instrument inaccurately identifying the absence of a phenomenon.

11
Q

efficiency

A

Total correct decisions divided by the total number of decisions.

12
Q

incremental validity

A

The extent to which an instrument enhances the accuracy of prediction of a specific criterion.

13
Q

decision accuracy

A

The accuracy of an instrument in supporting counselor decisions. Decision accuracy often assesses sensitivity (the instrument’s ability to accurately identify the presence of a phenomenon); specificity (the instrument’s ability to accurately identify the absence of a phenomenon); false positive error (an instrument inaccurately identifying the presence of a phenomenon); false negative error (an instrument inaccurately identifying the absence of a phenomenon); efficiency (total correct decisions divided by the total number of decisions); and incremental validity (the extent to which an instrument enhances the accuracy of prediction of a specific criterion).
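
A minimal sketch (hypothetical counts) showing how sensitivity, specificity, and efficiency are computed from a 2 x 2 decision table:

```python
# Hypothetical screening results cross-tabulated against a diagnostic "gold standard"
true_positives = 40    # instrument flags the condition; condition actually present
false_positives = 10   # instrument flags the condition; condition actually absent
false_negatives = 5    # instrument misses the condition; condition actually present
true_negatives = 45    # instrument rules out the condition; condition actually absent

total = true_positives + false_positives + false_negatives + true_negatives

sensitivity = true_positives / (true_positives + false_negatives)  # presence correctly identified
specificity = true_negatives / (true_negatives + false_positives)  # absence correctly identified
efficiency = (true_positives + true_negatives) / total             # correct decisions / total decisions

print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}, efficiency = {efficiency:.2f}")
```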

14
Q

Reliability

A

Consistency of scores attained by the same person on different administrations of the same test. Reliability is concerned with measuring the error, or difference, between an individual’s observed test score and true test score: X = T + e (observed score = true score + error). There are several different types: (a) test-retest reliability (sometimes called temporal stability) determines the correlation between the scores obtained from two different administrations of the same test, thus evaluating the consistency of scores across time; (b) alternate form reliability (sometimes called equivalent form reliability or parallel form reliability) compares the consistency of scores from two alternative, but equivalent, forms of the same test; (c) internal consistency measures the consistency of responses within a single administration of the instrument (two common types of internal consistency are split-half reliability and interitem reliability, e.g., the KR-20 and coefficient alpha); and (d) interscorer reliability, sometimes called inter-rater reliability, is used to calculate the degree of consistency of ratings between two or more persons observing the same behavior or assessing an individual through observational or interview methods.
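
As an illustration of interitem (internal) consistency, a minimal sketch (hypothetical responses; numpy assumed available) of Cronbach's coefficient alpha:

```python
import numpy as np

# Hypothetical responses: 6 test-takers x 4 items on a 1-5 scale
responses = np.array([
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 4],
    [3, 2, 3, 3],
])

k = responses.shape[1]                               # number of items
item_variances = responses.var(axis=0, ddof=1)       # variance of each item
total_variance = responses.sum(axis=1).var(ddof=1)   # variance of total scores

# Cronbach's coefficient alpha, one index of internal consistency
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"coefficient alpha = {alpha:.2f}")
```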

15
Q

reliability coefficient

A

A measure of the reliability of a set of scores on a test. It ranges from 0 to 1.00; the closer the coefficient is to 1.00, the more reliable the scores.

16
Q

standard error of measurement (SEM)

A

A statistic that indicates how scores from repeated administrations of the same instrument to the same individual are distributed around the true score. The standard error of measurement is computed using the standard deviation and reliability coefficient of the test instrument.
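
A small worked example (hypothetical values) using the usual formula SEM = SD times the square root of (1 minus the reliability coefficient):

```python
import math

# Hypothetical instrument values
sd = 15.0           # standard deviation of the test scores
reliability = 0.91  # reliability coefficient of the test

# Standard error of measurement: spread of obtained scores around the true score
sem = sd * math.sqrt(1 - reliability)
print(f"SEM = {sem:.1f}")  # 4.5; a 68% confidence band is the obtained score +/- 4.5
```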

17
Q

Item Analysis

A

A procedure that involves statistically examining test-taker responses to individual test items with the intent to assess the quality of test items as well as the test as a whole. Item analysis is frequently used to eliminate confusing, overly easy, or overly difficult items from a test that will be used again.

18
Q

item difficulty

A

The percentage of test-takers who answer a test item correctly, calculated by dividing the number of individuals who correctly answered the item by the total number of test-takers.
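
A quick worked example (hypothetical counts):

```python
# Hypothetical item: 80 of 100 test-takers answered it correctly
number_correct = 80
total_test_takers = 100

# Item difficulty (p): proportion answering correctly; higher values mean an easier item
item_difficulty = number_correct / total_test_takers
print(f"p = {item_difficulty:.2f}")  # 0.80
```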

19
Q

item discrimination

A

The degree to which a test item is able to correctly differentiate test-takers who vary according to the construct measured by the test. It is calculated by subtracting the performance of the bottom quarter of total scorers from that of the top quarter on a given test item.
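
A quick worked example (hypothetical counts) of the upper-minus-lower-group discrimination index:

```python
# Hypothetical item results for the top and bottom quarters of total scorers (25 each)
top_group_correct = 22     # test-takers in the top quarter who answered the item correctly
bottom_group_correct = 9   # test-takers in the bottom quarter who answered it correctly
group_size = 25

# Discrimination index: top-group performance minus bottom-group performance
discrimination = (top_group_correct - bottom_group_correct) / group_size
print(f"D = {discrimination:.2f}")  # 0.52; values near +1 discriminate well, values near 0 do not
```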

20
Q

Test Theory

A

Assumes that test constructs, in order to be considered empirical, must be measurable for quality and quantity (Erford, 2013); consequently, test theory strives to reduce test error and enhance construct reliability and validity. The two common types of test theory are (a) classical test theory, which postulates that an individual’s observed score is the sum of the true score and the amount of error present during test administration and (b) item response theory, also referred to as modern test theory, which applies mathematical models to the data collected from assessments to evaluate how well individual test items and the test as a whole work.
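
A minimal sketch contrasting the two ideas (hypothetical numbers; the two-parameter logistic shown here is just one common IRT model):

```python
import math

# Classical test theory: an observed score is the true score plus random error (X = T + e)
true_score, error = 100, -3
observed_score = true_score + error  # X = 97

# Item response theory (two-parameter logistic model): probability that a test-taker
# with ability theta answers an item correctly
def p_correct(theta, a, b):
    """a = item discrimination, b = item difficulty, theta = test-taker ability."""
    return 1 / (1 + math.exp(-a * (theta - b)))

print(observed_score)
print(f"P(correct) = {p_correct(theta=1.0, a=1.2, b=0.5):.2f}")  # ability exceeds item difficulty
```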

21
Q

Scale

A

A collection of items or questions that combine to form a composite score on a single variable. Scales can measure discrete or continuous variables and can describe data quantitatively or qualitatively.

22
Q

Likert scale

A

Commonly used to measure attitudes or opinions; typically includes a statement regarding the concept in question followed by answer choices that range from Strongly Agree to Strongly Disagree. Sometimes called a Likert-type scale.
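
A minimal scoring sketch (hypothetical 5-point items; reverse-scoring negatively worded items is a common practice, not something this definition requires):

```python
# Hypothetical 5-point Likert scoring, including one reverse-worded item
scale_points = {"Strongly Disagree": 1, "Disagree": 2, "Neutral": 3,
                "Agree": 4, "Strongly Agree": 5}

responses = ["Agree", "Strongly Agree", "Disagree", "Agree"]  # item 3 is reverse-worded
reverse_items = {2}  # zero-based index of the reverse-worded item

scores = [scale_points[r] for r in responses]
scores = [6 - s if i in reverse_items else s for i, s in enumerate(scores)]  # reverse-score
print(sum(scores))  # composite attitude score: 4 + 5 + 4 + 4 = 17
```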

23
Q

semantic differential

A

A scaling technique rooted in the belief that people think dichotomously; it commonly presents an affective question or statement followed by a scale that asks test-takers to place a mark between two dichotomous (opposite) adjectives. Also referred to as self-anchored scales.

24
Q

Thurstone scale

A

Measures multiple dimensions of an attitude by asking respondents to express their beliefs through agreeing or disagreeing with item statements. Thurstone scales can be constructed using equal-appearing intervals, successive intervals, or the paired comparison method.

25
Q

Guttman scale

A

Measures the intensity of the variable being measured. Items are presented in progressive order so that a respondent who agrees with an extreme test item will also agree with all previous, less extreme items.
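
A minimal sketch (hypothetical responses) of the cumulative pattern a Guttman scale assumes:

```python
# Hypothetical Guttman-style items ordered from least to most extreme; 1 = agree, 0 = disagree.
# A response pattern fits the scale if agreement with a more extreme item is always
# accompanied by agreement with every less extreme item before it.
def fits_guttman_pattern(responses):
    return all(responses[i] >= responses[i + 1] for i in range(len(responses) - 1))

print(fits_guttman_pattern([1, 1, 1, 0, 0]))  # True: agrees up to a point, then stops
print(fits_guttman_pattern([1, 0, 1, 0, 0]))  # False: skips a less extreme item
```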