Psychological tests are often administered to measure specific mental and/or
behavioral characteristics. Before administering a psychological test, however,
it is essential to choose the most appropriate testing instrument for
collecting data. When an inappropriate test method is used, the final results
can show inadequate levels of reliability and/or validity.
Because this can occur, my goal for this paper is to provide a better
understanding of test reliability and validity by addressing aspects that are
directly associated with these concepts. I will first discuss what the
reliability of a test is, why it is important, and how its two main types,
test-retest reliability and internal consistency reliability, can be measured.
I will then address what the validity of a test is, why it is important, and
how its four main forms, face validity, content validity, criterion-related
validity, and construct validity, can be measured.
The next section will identify steps that psychologists need to take to ensure
that selected test methods have adequate levels of reliability and validity.
Three of these are gathering personal information about each participant,
choosing well-established test methods, and addressing any ethical, legal,
individual, and socio-cultural issues that may arise. The specific ethical
considerations addressed are confidentiality, cross-cultural sensitivity,
informed consent, protection from harm, and test administration.
Finally, I will discuss why all four forms of validity must be considered and
applied when using test methods in different types of settings. The validity of
a measure can change due to various factors within each setting. For example, a
psychologist might decide to use an intelligence test designed for adults with
elementary school students. In this case, validity would likely be lacking
because the test's items assume a level of knowledge and development that young
children have not yet reached. The test would therefore not be appropriate to
use with both age groups.
Reliability of a Test
Reliability can be defined as the degree to which scores from a specific test
are consistent and free from errors of measurement. Possible sources of
measurement error include the test environment, poorly worded questions,
nervous test takers, and unclear instructions from the examiner. Two types of
reliability can be measured. The first is test-retest reliability, which
reflects how consistent participant responses are over time; it is measured by
administering the same test to the same participants at different times and
comparing the scores. The second is internal consistency reliability, which
reflects how consistently the items on a single test measure the same
construct; it is measured by administering the test once and then calculating
the average of all correlations among items (Zechmeister, Zechmeister, &
Shaughnessy, 2001).
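As a rough illustration of the two calculations described above, the following Python sketch computes a test-retest coefficient as the Pearson correlation between two administrations, and internal consistency as the average of all inter-item correlations. The function names and all scores are hypothetical and are not taken from Zechmeister et al. (2001); the sketch also assumes that every list of scores varies (a constant list would make the correlation undefined).

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def retest_reliability(first_scores, second_scores):
    """Correlate the same participants' scores from two administrations."""
    return pearson(first_scores, second_scores)


def mean_inter_item_correlation(item_scores):
    """Average the correlations among all pairs of items.

    item_scores holds one list per item, each with one score per participant.
    """
    pairs = list(combinations(item_scores, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical data: five participants who took the same test twice.
time_1 = [12, 15, 9, 20, 17]
time_2 = [13, 14, 10, 19, 18]

# Hypothetical data: three items answered once by the same five participants.
items = [
    [3, 4, 2, 5, 4],
    [3, 5, 2, 4, 4],
    [2, 4, 1, 5, 3],
]

print(f"Test-retest r: {retest_reliability(time_1, time_2):.2f}")
print(f"Mean inter-item r: {mean_inter_item_correlation(items):.2f}")
```

In both cases, coefficients closer to 1 indicate more consistent measurement, while values near 0 suggest that the scores are dominated by measurement error.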
Determining test-retest reliability and internal consistency reliability is
important because it can confirm whether a specific test is consistent over
time and whether its items consistently measure the same construct. This
information also allows psychologists to confirm which testing methods are most
reliable and appropriate for particular participants and/or needs. If a test's
reliability cannot be established, then its validity cannot be confirmed
either, and the test method in question should not be used.
Validity of a Test
Validity can be defined as the degree to which a specific test measures what it
is intended to measure. Several forms of validity must be addressed during a
validation process. The first form is face validity, which is used to determine
whether a test appears to measure the criteria within a specific domain. In
many areas of psychology, however, this is not considered a true form of
validity because the appearance of test items is not necessarily an accurate
representation of the intended domain (Neukrug & Fawcett, 2010). It is
therefore regarded as a basic observational type of validation that assesses a
test at face value only. For example, a psychologist who designs a test to
assess mathematical skill could ask laypeople whether they agree, based on the
test's appearance, that it measures mathematical skill.
Providing evidence of face validity is also important because formal testing
instruments are generally not accepted in the field or used in future research
and design without it. However, there are instances when informal assessment
tools that lack face validity may still be used. One example is an online
survey whose actual function is to initiate sales of self-help products rather
than to serve its stated purpose of simply collecting consumer data.
The second form of validity is content validity, which is used to measure “how
adequately a test samples behavior representative of the universe of behavior
that the test was designed to sample” (Cohen & Swerdlik, 2010, p. 176). This is
similar to face validity, but it confirms whether a test actually measures the
criteria within a specific domain instead of just assuming that it does. For
example, a psychologist could have mathematical experts confirm that a test
used to assess mathematical skill actually measures that domain.
Confirming content validity is also important because it allows psychologists
to determine which test instruments have an adequate level of content validity
and which do not. If a test instrument does not have an adequate level of
content validity, then a more valid method should be used so that the collected
data will have a higher level of overall validity and accuracy.
A third form of validity is criterion-related validity, which determines
whether a test method produces results that agree with an established, valid
measure of the same variable (the criterion). One example of criterion-related
validity is validating employee selection tests against a criterion such as job
performance. There are two types of criterion-related validity: predictive and
concurrent. Predictive validity is based on how well test scores predict an
individual's performance on a future measure, and concurrent validity is based
on how test scores compare to those from similar instruments that measure the
same criterion at about the same time.
Confirming evidence of criterion-related validity is also important because it
allows psychologists to predict outcomes for future participants and to
determine which instruments have an adequate level of criterion-related
validity when compared to established, valid tests. If a test instrument does
not have an adequate level of criterion-related validity, then a more valid
method should be used so that the collected data will closely reflect the
results obtained with established, valid test methods.
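The employee selection example above can be sketched in Python as a simple validity coefficient, that is, the correlation between test scores and the criterion. This is only a minimal sketch: the function name, the two designs shown, and all scores are hypothetical, and a real validation study would use far larger samples.

```python
from math import sqrt


def validity_coefficient(test_scores, criterion_scores):
    """Pearson correlation between test scores and a criterion measure."""
    n = len(test_scores)
    mean_t = sum(test_scores) / n
    mean_c = sum(criterion_scores) / n
    cov = sum((t - mean_t) * (c - mean_c)
              for t, c in zip(test_scores, criterion_scores))
    spread_t = sqrt(sum((t - mean_t) ** 2 for t in test_scores))
    spread_c = sqrt(sum((c - mean_c) ** 2 for c in criterion_scores))
    return cov / (spread_t * spread_c)


# Predictive design (hypothetical): selection-test scores at hiring and
# job performance ratings for the same six employees collected later.
selection_test = [55, 62, 70, 48, 81, 66]
later_performance = [3.1, 3.4, 3.9, 2.8, 4.5, 3.5]

# Concurrent design (hypothetical): a new test and an established test of
# the same criterion, both given to the same people in one session.
new_test = [20, 25, 31, 18, 36, 28]
established_test = [41, 47, 60, 39, 68, 52]

predictive_r = validity_coefficient(selection_test, later_performance)
concurrent_r = validity_coefficient(new_test, established_test)
print(f"Predictive validity coefficient: {predictive_r:.2f}")
print(f"Concurrent validity coefficient: {concurrent_r:.2f}")
```

The same calculation serves both designs; what differs is when the criterion is collected, later for predictive validity and at about the same time for concurrent validity.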
A fourth form is construct validity. According to Bordens and Abbott (2008),
this type of “validity applies when a test is designed to measure a
"construct" or variable "constructed" to describe or explain behavior on the
basis of theory” (p. 130). Establishing construct validity can be a tedious
process because it requires a gradual accumulation of evidence that scores
relate to observable behaviors in the way predicted by an underlying theory.
For example, a psychologist can examine construct validity by testing whether
participants who have higher intelligence scores achieve higher grades in
school. There are two types of construct validity: convergent and discriminant.
Convergent validity is shown when results are similar to those from a different
test that measures the same construct, and discriminant validity is shown when
the selected test does not measure constructs that it was not intended to
measure (Bordens & Abbott, 2008).
Establishing construct validity is also important because it allows
psychologists to determine whether a test produces results similar to those of
comparable methods and does not measure constructs for which it was not
intended. If a testing instrument does not have an adequate level of construct
validity, then a more valid method should be used so that the results are not
distorted by measurements of irrelevant constructs.
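One simple way to look for convergent and discriminant evidence is to correlate a new test with an established test of the same construct and with a measure of a different construct. The Python sketch below illustrates this under stated assumptions: the function names, the choice of an extraversion scale as the unrelated measure, and all scores are hypothetical, and correlations are only one of several kinds of evidence that accumulate toward construct validity.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def construct_evidence(new_scores, same_construct_scores, other_construct_scores):
    """Return (convergent r, discriminant r) for a new test."""
    convergent = pearson(new_scores, same_construct_scores)
    discriminant = pearson(new_scores, other_construct_scores)
    return convergent, discriminant


# Hypothetical scores for the same six participants on three measures.
new_intelligence_test = [98, 110, 105, 121, 90, 115]
established_intelligence_test = [100, 108, 107, 119, 93, 112]
extraversion_scale = [30, 22, 35, 27, 31, 24]  # a different construct

convergent_r, discriminant_r = construct_evidence(
    new_intelligence_test, established_intelligence_test, extraversion_scale
)
print(f"Convergent r: {convergent_r:.2f}")
print(f"Discriminant r: {discriminant_r:.2f}")
```

A high convergent correlation together with a discriminant correlation near zero would be consistent with the test measuring its intended construct and not an irrelevant one.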
How Psychologists Can Ensure Adequate Test Reliability and Validity
Because adequate levels of test reliability and validity are essential for
obtaining results with fewer errors, there are steps that psychologists need to
take before testing each participant. First, psychologists should gather
pertinent personal information about each participant before beginning the
testing process. This detailed background information can be used to choose a
test based on the individual needs of each participant. It can also reveal any
racial, gender, educational, and/or cultural background issues that may reduce
overall reliability and validity. Two examples of such mismatches are
administering a test written in English to a participant who can only read
Spanish and administering a test about auto-mechanic skills to a 5-year-old
child.
Second, psychologists should choose testing instruments whose reliability and
validity have already been evaluated. If previous research has established
adequate levels of these measures, then the instrument is more likely to
produce reliable and valid results again. Reviewing prior evidence of
reliability and validity can also aid psychologists in determining which
testing methods are most appropriate for each individual participant and/or
need.
Third, psychologists should follow all ethical and/or legal standards. These
standards were created to protect participants from the kinds of negative
psychological and/or physical effects that have occurred in the past. The
psychologist will therefore need to address all pertinent standards that apply
throughout the entire duration of each testing process.
One ethical standard that may apply when using testing methods with
participants is confidentiality. It helps protect the rights of all
participants by mandating that personal information can only be released under
specific circumstances. Following this standard is also important because it
helps ensure that no harm occurs from personal information being released to
third parties in a malicious or damaging manner. However, the Behavior Analyst
Certification Board (2004) has determined that a professional can disclose
confidential information when it is mandated by law or for a valid purpose.
Examples include providing services for an individual or organization,
obtaining payment for services previously rendered, and situations in which a
client is considered a danger to himself or others.
A second ethical standard that may apply when conducting assessment testing is
cross-cultural sensitivity. It states that professionals must be aware of their
own potential biases when selecting and administering tests and interpreting
results, and must acknowledge the potential effects of differences in age,
cultural background, ethnicity, disability, gender, religion, socioeconomic
status, and sexual orientation. One example of a violation would be a
psychologist refusing to work with a participant from a foreign country.
A third ethical standard that may apply when using certain testing methods is
informed consent. It states that professionals must obtain permission before
assessing any participant. If the participant is a minor, a parent or caretaker
must give consent before any testing can occur. This standard can be met by
ensuring that all pertinent consent forms are collected before the testing
process begins.
A fourth ethical standard that normally applies when using test methods is
protection from harm. It requires that no psychological or physical harm occur
to research participants. Psychologists will therefore need to determine the
safest possible way to use a specific testing method, and if no safe method is
available, the test cannot be completed (Schacter, Gilbert, & Wegner, 2009).
This standard can be implemented by identifying any aspects of testing that may
be harmful to one or more participants and then taking precautions to prevent
that harm from occurring.
Finally, a fifth ethical standard that should be addressed before using most
testing methods is test administration. It states that tests should be
administered in the way they were established, and any alterations should be
noted and/or adjusted for accordingly. This helps ensure that the results
reflect measurements of the intended construct and/or domain. When this occurs,
it may also be easier to establish adequate levels of reliability and validity
for the specific testing method that is used (Schacter, Gilbert, & Wegner,
2009).
Why It’s Important to Ensure Validity in Different Types of Settings
Once the validity of a test has been established, a psychologist will need to
ensure that it holds when assessing participants in various types of settings,
because the level of test validity can change based on factors that differ
across settings. Certain steps may therefore need to be completed to ensure an
adequate level of validity regardless of the setting or the factors involved.
One setting that regularly uses psychological testing is educational
institutions. Achievement and aptitude tests are routinely used there to
measure a student’s overall level of knowledge about specific topics or the
aptitude needed to master material within a certain domain. A widely used
instrument for assessing these constructs is the Scholastic Aptitude Test
(SAT), whose scores institutions of higher education can use to make student
admissions decisions (CollegeBoard, 2013).
Because this test is designed to assess a student’s achievement or aptitude for
future success, it is important that all forms of validity are present. Face
validity indicates that the test appears to measure achievement and/or aptitude
at a level that is acceptable for continued research and design. Content
validity indicates that the test actually does measure achievement and aptitude
at an acceptable level. Criterion-related validity indicates that the test
produces results similar to those of established, valid instruments that
measure achievement and aptitude. Construct validity indicates that its results
are similar to those of valid achievement and aptitude tests and that it does
not measure irrelevant constructs that can negatively affect overall validity.
A second setting that regularly uses psychological testing is mental health
clinics, where various test methods are used to better understand individual
style or to aid in clinical diagnoses. One paper-and-pencil method that is
widely used to assess personality is the Minnesota Multiphasic Personality
Inventory (MMPI). However, this test is not perfect and has raised questions
about its reliability and validity. A revised version, the MMPI-2, is therefore
now available; it contains 567 test items with scales that measure additional
traits associated with abnormal behavior (Lezak, Howieson, & Loring, 2004).
Because testing methods like the MMPI and MMPI-2 are designed to assess
individual personality for research and diagnostic purposes, it is also
important that all forms of validity are present. Face validity indicates that
the test appears to measure individual personality at a level that is
acceptable for continued research and design. Content validity indicates that
the test actually does measure personality at an acceptable level.
Criterion-related validity indicates that the test produces results similar to
those of established, valid instruments that measure personality. Construct
validity indicates that its results are similar to those of valid personality
tests and that it does not measure irrelevant constructs. Lezak, Howieson, and
Loring (2004) also state that the MMPI-2 appears to have a higher degree of
construct validity because it is supported by more evidence of convergent and
discriminant validation.
Summary
For many years, psychological testing has been conducted because it gives
psychologists the opportunity to measure specific mental and/or behavioral
characteristics in human beings. Whenever testing is planned, it is important
to choose a test method that has adequate levels of reliability and validity,
because the final data will then be considered more valuable and credible by
other professionals within the field.
With these things in mind, my goal for this paper was to provide a better
understanding of reliability and validity by addressing aspects that are
directly associated with these concepts. I first discussed what the reliability
of a test is, why it is important, and how its two main types, test-retest
reliability and internal consistency reliability, can be measured. I then
addressed what the validity of a test is, why it is important, and how its four
main forms, face validity, content validity, criterion-related validity, and
construct validity, can be measured.
This was followed by identifying steps that psychologists need to take to
ensure that selected test methods have adequate levels of reliability and
validity. Three of these are gathering personal information about each
participant, choosing well-established methods, and addressing all ethical,
legal, individual, and socio-cultural issues that apply. The specific ethical
considerations addressed in that section were confidentiality, cross-cultural
sensitivity, informed consent, protection from harm, and test administration.
Finally, I discussed why all four forms of validity must be considered when
using test methods in different types of settings: the validity of a measure
can change due to various factors within each setting. I am confident that if
psychologists follow this information, it will be easier to confirm both
reliability and validity when using test methods in all types of settings.
References:
Behavior Analyst Certification Board. (2004). Guidelines for responsible conduct for behavior analysts. Retrieved via Kaplan Online Campus at http://contentasc.kaplan.edu.edgesuite.net/PS502_1004A/images/product/Guidelines%20for%20Responsible%20Conduct.pdf
Bordens, K., & Abbott, B. (2008). Research design and methods (7th ed.). New York, NY: The McGraw-Hill Companies, Inc.
Cohen, R. J., & Swerdlik, M. E. (2010). Psychological testing and assessment: An introduction to tests and measurement. Boston, MA: McGraw-Hill Higher Education.
CollegeBoard.Org. (2012). SAT Validity Studies. Retrieved from http://professionals.collegeboard.com/data-reports-research/sat/validity-studies
Lezak, M., Howieson, D., & Loring, D. (2004). Neuropsychological assessment (4th ed.). Oxford: Oxford University Press.
Neukrug, E. S., & Fawcett, R. C. (2010). Essentials of testing and assessment: A practical guide for counselors, social workers, and psychologists (2nd ed.). Belmont, CA: Brooks/Cole Cengage Learning.
Zechmeister, J. S., Zechmeister, E. B., & Shaughnessy, J. J. (2001). Essentials of research methods in psychology. New York, NY: The McGraw-Hill Companies, Inc.