Test Construction Flashcards

1
Q

Some test experts use the term _____________ to refer to the extent to which test items contribute to achieving the stated goals of testing.

A

relevance

2
Q

A determination of relevance is based on a qualitative judgement that takes into account which factors?

A

Content appropriateness: Does the item actually assess the content or behavior domain that the test is designed to evaluate?
Taxonomic level: Does the item reflect the appropriate cognitive or ability level?
Extraneous abilities: To what extent does the item require knowledge, skills, or abilities outside the domain of interest?

3
Q

An item’s difficulty is measured by calculating an item difficulty index (p), which is what equation?

A

p = the number of examinees who answered the item correctly divided by the total number of examinees. The value of p ranges from 0 to 1.0, with larger values indicating easier items. When p is equal to 1.0, this means the item was answered correctly by all examinees; when p is 0, this indicates that none of the examinees answered the item correctly.
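As a quick sketch (not part of the original card), the difficulty index defined above can be computed directly from a list of scored responses:

```python
# Sketch: item difficulty index (p) for a single item.
# responses holds one 0/1 entry per examinee (1 = answered correctly).
def item_difficulty(responses):
    return sum(responses) / len(responses)

# Example: 8 of 10 examinees answered correctly, so p = 0.8 (an easy item).
print(item_difficulty([1, 1, 1, 1, 1, 1, 1, 1, 0, 0]))  # 0.8
```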

4
Q

In most situations, a p value of _____ is optimal. One exception is the case of a true/false test, for which the optimal p value is _____.

A

.50; .75

5
Q

______________________ refers to the extent to which a test item differentiates between examinees who obtain high versus low scores on the entire test or on an external criterion.

A

Item discrimination

6
Q

The item discrimination index ranges from _____ to _____.

A

-1.0; +1.0
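A hedged sketch of one common way D is computed (the extreme-groups method, which is an assumption here, not stated on the card): the proportion of the upper-scoring group answering correctly minus the proportion of the lower-scoring group answering correctly.

```python
# Sketch (assumed extreme-groups method): D ranges from -1.0 to +1.0.
def discrimination_index(upper_correct, upper_n, lower_correct, lower_n):
    return upper_correct / upper_n - lower_correct / lower_n

# All of the upper group and none of the lower group answered correctly:
print(discrimination_index(20, 20, 0, 20))   # 1.0
# Both groups perform identically: the item does not discriminate.
print(discrimination_index(10, 20, 10, 20))  # 0.0
```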

7
Q

For most tests, an item with a discrimination index of _____ or higher is considered acceptable.

A

.35

8
Q

When using item response theory, an ______________________ is constructed for each item by plotting the proportion of examinees in the tryout sample who answered the item correctly against either the total test score, performance on an external criterion, or a mathematically-derived estimate of a latent ability or trait.

A

Item characteristic curve (ICC)

9
Q

The theory of measurement that regards observed variability in test scores as reflecting two components: true differences between examinees on the attributes measured by the test and the effects of measurement (random) error

A

Classical test theory

10
Q

______________ is a measure of true score variability. It refers to the consistency of test scores; i.e., the extent to which a test measures an attribute without being affected by random fluctuations (measurement error) that produce inconsistencies over time, across items, or across different forms.

A

Reliability

11
Q

When a test is ____________, it provides dependable, consistent results and, for this reason, the term consistency is often given as a synonym.

A

Reliable

12
Q

What are some methods for establishing reliability?

A

test-retest, alternate forms, split-half, coefficient alpha, and inter-rater

13
Q

Most methods for estimating reliability produce a ______________________, which is a correlation coefficient that ranges in value from 0.0 to 1.0.

A

Reliability coefficient

14
Q

What does it mean if a test’s reliability coefficient is 0.0?

A

All variability in obtained test scores is due to measurement error.

15
Q

When a test’s reliability coefficient is 1.0, this indicates that all variability reflects what?

A

True score variability

16
Q

If a test has a reliability coefficient of .91, this means that ____% of variability in obtained test scores is due to ______________ variability, while the remaining 9% reflects _____________.

A

91; true score; measurement error

17
Q

Match the method for estimating reliability to the correct definition:
a. Test-Retest Reliability
b. Alternate (Equivalent, Parallel) Forms Reliability
c. Internal Consistency Reliability
d. Inter-Rater (Inter-Scorer, Inter-Observer) Reliability
1. ___ To assess this, two equivalent forms of the test are administered to the same group of examinees and the two sets of scores are correlated. Indicates the consistency of responding to different item samples and, when the forms are administered at different times, the consistency of responding over time. Considered the most thorough method.
2. ___ Involves administering the same test to the same group of examinees on two different occasions and then correlating the two sets of scores. It is used for determining the reliability of tests designed to measure attributes that are relatively stable over time and that are not affected by repeated measurement (e.g., aptitude).
3. ___ Split-half reliability and coefficient alpha are two methods for evaluating this. Both involve administering the test once to a single group of examinees. It is useful when a test is designed to measure a single characteristic, when the characteristic measured by the test fluctuates over time, or when scores are likely to be affected by repeated exposure to the test.
4. ___ Is of concern whenever test scores depend on a rater's judgment. It is assessed either by calculating a correlation coefficient or by determining the percent of agreement between two or more raters.

A
1. b; 2. a; 3. c; 4. d
18
Q

Match the terms that belong together:
a. Spearman-Brown formula
b. KR-20
c. Kappa statistic
1. Inter-rater reliability
2. Split-half reliability
3. Coefficient alpha

A
1. c; 2. a; 3. b
19
Q

_________________ reliability is the most thorough method for estimating reliability.

A

Alternate forms

20
Q

_________________ reliability is not appropriate for speed tests.

A

Internal consistency

21
Q

The magnitude of a reliability coefficient is affected by several factors. In general, the longer a test, the _______________ its reliability coefficient. The _______________ formula is used to estimate the effect of lengthening or ______________ a test on its reliability coefficient. If the new items do not represent the same content domain as the original items or are more susceptible to measurement error, this formula is likely to _____________ the effects of lengthening the test. Like other correlation coefficients, the reliability coefficient is affected by the range of scores: The greater the range, the ___________ the reliability coefficient. To maximize a test's reliability coefficient, the sample of examinees should include people who are ___________ with regard to the attributes measured by the test. A reliability coefficient is also affected by the probability that an examinee can select the correct answer to a test question by guessing. The easier it is to guess the correct answer, the ___________ the reliability coefficient.

A

larger; Spearman-Brown; shortening; overestimate; larger; heterogeneous; smaller
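The Spearman-Brown formula named on this card can be sketched as follows (a standard statement of the formula, where k is the factor by which the test is lengthened):

```python
# Sketch of the Spearman-Brown prophecy formula:
# r is the current reliability; k is the length factor (2 = doubled,
# 0.5 = halved). Returns the estimated reliability of the altered test.
def spearman_brown(r, k):
    return (k * r) / (1 + (k - 1) * r)

# Doubling a test whose reliability is .60 raises the estimate to .75:
print(round(spearman_brown(0.60, 2), 2))  # 0.75
```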

22
Q

While the reliability coefficient is useful for assessing the amount of variability in test scores that is due to _____________ variability for a group of examinees, it does not directly indicate how much we can expect an individual examinee's obtained score to reflect his or her true score. For this purpose, the standard error of ________________ is used. It is calculated by multiplying the standard deviation of the test scores by the ___________________ of one minus the reliability coefficient.

A

true score; measurement; square root
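The calculation described on this card, as a short sketch:

```python
import math

# Sketch: standard error of measurement, per the card's definition:
# SEM = SD of test scores * square root of (1 - reliability coefficient).
def sem(sd, reliability):
    return sd * math.sqrt(1 - reliability)

# A test with SD = 15 and reliability = .91:
print(round(sem(15, 0.91), 2))  # 4.5
```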

23
Q

_____________ refers to a test’s accuracy.

A

Validity

24
Q

There are three main forms of validity: ___________ validity is of concern whenever a test has been designed to measure one or more content or behavior domains. ________________ validity is important when a test will be used to measure a hypothetical trait such as achievement, motivation, intelligence, or mechanical aptitude. ___________ validity is of interest when a test has been designed to estimate or predict performance on another measure.

A

Content Construct Criterion-related

25
Q

One method for assessing a test’s construct validity is to determine if the test has both ______________ and _______________ validity.

A

Convergent; discriminant (divergent)

26
Q

When a test has high correlations with measures that assess the same construct, this provides evidence of the test's _______________ validity; when a test has low correlations with measures of unrelated characteristics, this indicates that the test has _______________ validity.

A

Convergent; discriminant (divergent)

27
Q

_____________________ is used to identify the dimensions that underlie the intercorrelations among a set of tests.

A

Factor analysis

28
Q

In factor analysis, a test is shown to have construct validity when it has _______ correlations with the factors it is expected to correlate with and ______ correlations with the factors it is not expected to correlate with.

A

high; low

29
Q

__________________ validity is of interest whenever test scores are to be used to draw conclusions about an examinee’s likely standing or performance on another measure.

A

Criterion-related

30
Q

What are the two forms of criterion related validity?

A

Concurrent and predictive

31
Q

When establishing ______________ validity, the predictor is administered to a sample of examinees prior to the criterion. It is the appropriate type of validity when the goal of testing is to predict __________ status on the criterion. When evaluating _____________ validity, the predictor and criterion are administered at about the same time. It is the preferred method for assessing validity when the purpose of testing is to estimate __________ status on the criterion.

A

predictive; future; concurrent; current

32
Q

The data collected in a concurrent or predictive validity study can also be used to assess a predictor’s ________________, or the increase in correct decisions that can be expected if the predictor is used as a decision-making tool.

A

Incremental validity

33
Q

_______________ occurs when a rater’s knowledge of a person’s predictor performance affects how he/she rates the person on the criterion.

A

Criterion contamination

34
Q

A ____________ expresses an examinee’s raw score in terms of the percentage of examinees in the norm sample who achieved lower scores.

A

percentile rank
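The definition on this card can be sketched as a small computation (the norm sample below is made up for illustration):

```python
# Sketch: percentile rank = percentage of the norm sample who scored
# below a given raw score (the definition on this card).
def percentile_rank(score, norm_sample):
    below = sum(1 for s in norm_sample if s < score)
    return 100 * below / len(norm_sample)

# Hypothetical norm sample of 10 scores; 5 fall below a raw score of 80.
sample = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
print(percentile_rank(80, sample))  # 50.0
```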

35
Q

When an examinee’s raw test score is converted to a ___________________, the transformed score indicates the examinee’s position in the normative sample in terms of standard deviations from the mean.

A

Standard score

36
Q

The ________ equivalent for an examinee's raw score is calculated by subtracting the mean of the distribution from the raw score to obtain a deviation score and then dividing the deviation score by the distribution's standard deviation.

A

z-score; z = (X - M) / SD
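The card's formula as a one-line sketch (the M = 100, SD = 15 example values are illustrative, not from the card):

```python
# Sketch of the z-score transformation on this card: z = (X - M) / SD.
def z_score(x, mean, sd):
    return (x - mean) / sd

# A raw score of 120 on a test with M = 100 and SD = 15:
print(round(z_score(120, 100, 15), 2))  # 1.33
```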

37
Q

The optimal item difficulty level for a true/false test is:

a. .25
b. .50
c. .75
d. 1.00

A

c

38
Q

For a test item that has an item discrimination index (D) of +1.0, you would expect:

a. high achievers to be more likely to answer the item correctly than low achievers
b. low achievers to be more likely to answer the item correctly than high achievers
c. low and high achievers to be equally likely to answer the item correctly
d. low and high achievers to be equally likely to answer the item incorrectly

A

a. When all examinees in the upper group and none in the lower group answered the item correctly, D is equal to +1.0

39
Q

Dina receives a percentile rank of 48 on a test, and her twin brother, Dino, receives a percentile rank of 98. Their teacher realizes that she made an error in scoring their tests and adds four points to Dina's and Dino's raw scores. (The other students' tests were scored correctly.) When she recalculates Dina's and Dino's percentile ranks, she will find that:

a. Dina’s percentile rank will change by more points than Dino’s
b. Dino’s percentile rank will change by more points than Dina’s
c. Dina and Dino’s percentile ranks will change by the same number of points
d. Dina and Dino’s percentile ranks will not change

A

a

40
Q

Percentile ranks and standard scores share in common which of the following:

a. both types of transformed scores are normally distributed regardless of the shape of the raw score distribution
b. both report an examinee’s test score in terms of standard deviation units from the mean
c. both reference an examinee’s score to a prespecified external standard
d. both reference an examinee’s score to those achieved by examinees in the standardization sample

A

d. Percentile ranks and standard scores are both norm-referenced scores

41
Q

A Wechsler IQ score is a(n):

a. percentile rank
b. standard score
c. ipsative score
d. stanine score

A

b

42
Q

Assuming a normal distribution, which of the following represents the highest score:

a. a z-score of 1.5
b. a T-score of 70
c. A WAIS score of 120
d. a percentile rank of 92

A

b. a T score of 70 is two standard deviations above the mean

43
Q

A test developer uses a sample of 50 employees to develop a new selection technique. When she correlates scores on the selection test with scores on a measure of job performance, she obtains a validity coefficient of .35. When the test developer administers the test and measure of job performance to another sample of 50 employees, she will most likely obtain a validity coefficient that is:

a. greater than .35
b. less than .35
c. about .35
d. negative in value

A

b. When a test is cross-validated on another sample, the validity coefficient ordinarily shrinks (is smaller)

44
Q

In terms of item response theory, the slope (steepness) of the item characteristic curve indicates the item's:

a. difficulty
b. discriminability
c. reliability
d. validity

A

b

45
Q

A researcher correlates scores on two alternate forms of an achievement test and obtains a correlation coefficient of .80. This means that ___% of observed test score variability reflects true score variability.

a. 80
b. 64
c. 36
d. 20

A

a

46
Q

To estimate the effects of lengthening a 50-item test to 100 items on the test’s reliability, you would use which of the following:

a. Pearson r
b. Kuder-Richardson Formula 20
c. kappa coefficient
d. Spearman-Brown Formula

A

d