Saturday, June 15, 2013

UNDERSTANDING RELIABILITY & VALIDITY OF A TEST

     There are many times in today’s society when psychological tests may be administered in order to measure certain mental and/or behavioral characteristics. Before administering a psychological test, however, it is essential to choose the most appropriate testing instrument for collecting the data. When an inappropriate test method is used, the results may lack adequate levels of reliability and/or validity.
     Because this can occur, my goal for this paper is to provide a better understanding of test reliability and validity by addressing the aspects most directly associated with these concepts. I will first discuss what the reliability of a test is, why it is important, and how its two main types, test-retest reliability and internal consistency reliability, can be measured. I will then address what the validity of a test is, why it is important, and how its four main forms, face validity, content validity, criterion-related validity, and construct validity, can be measured.
     The next section will identify the things that psychologists need to do to ensure that a selected test method will yield adequate levels of reliability and validity. Three of these are gathering personal information about each participant, choosing well-established test methods, and addressing any ethical, legal, individual, and socio-cultural issues that may arise. Specific ethical considerations that will be addressed include confidentiality, cross-cultural sensitivity, informed consent, protection from harm, and test administration.
     Finally, I will discuss why all four forms of validity must be considered and applied when using test methods in different types of settings. The validity of a measure can change due to various factors within each setting. For example, if a psychologist uses an intelligence test designed for adults with elementary school students, overall validity may suffer because the test was never designed or normed for young children, and it would therefore not be appropriate for both age groups.
Reliability of a Test
     Reliability can be defined as the degree to which scores from a specific test are consistent and free from errors of measurement. Possible sources of measurement error include the test environment, poorly worded questions, nervous test takers, and unclear instructions from the examiner. Two types of reliability can be measured. The first, test-retest reliability, is based on how consistent participant responses are over time and can be measured by administering the same test at different times. The second, internal consistency reliability, is based on how consistently the items of a single test administration measure the same construct, and it can be estimated from the correlations among the items (Zechmeister, Zechmeister, & Shaughnessy, 2001).
     Determining test-retest reliability and internal consistency reliability is important because it confirms whether a specific test is consistent over time and measures the construct it was designed to measure. These data also allow psychologists to determine which testing methods are most reliable and appropriate for a given participant and/or purpose. If neither type of reliability can be established, then validity cannot be confirmed either, and the test method in question should not be used.
Validity of a Test
     Validity can be defined as the degree to which a test measures what it is intended to measure. Several observational forms of validity must be addressed when completing a validation process. The first form, face validity, is used to determine whether a test appears to measure criterion within a specific domain. In many areas of psychology, however, this may not be considered a true form of validity because the appearance of test items is not necessarily an accurate representation of the intended domain (Neukrug & Fawcett, 2010). It is therefore a basic observational type of validation that assesses a test at face value only. For example, if a psychologist designs a test to assess mathematical skill, he or she could ask laypeople whether, based on its appearance, the test seems to measure mathematical skill.
     Evidence of face validity is also important because test takers and other stakeholders are less likely to accept, or respond seriously to, an instrument that does not appear to measure what it claims to measure. There are instances, however, when informal assessment tools that lack face validity may still be used, such as an online survey whose stated purpose is simply to collect consumer data but whose actual function is to initiate sales of self-help products.
     The second and simplest form of validity is content validity, which measures “how adequately a test samples behavior representative of the universe of behavior that the test was designed to sample” (Cohen & Swerdlik, 2010, p. 176). It is similar to face validity, but it confirms that a test actually measures criterion within a specific domain instead of merely assuming that it does. For example, a psychologist could have mathematical experts confirm that a test used to assess mathematical skill does in fact measure that domain.
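     Expert judgments like these are sometimes quantified. One common approach, not discussed above, is Lawshe's content validity ratio (CVR), which summarizes how many experts on a panel rate a test item as essential; the sketch below (Python; the panel size and ratings are hypothetical) shows the calculation.

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR = (n_e - N/2) / (N/2), where n_e is the number of
    experts rating an item 'essential' and N is the panel size.
    Ranges from -1 (no expert agrees) to +1 (every expert agrees)."""
    return (n_essential - n_experts / 2) / (n_experts / 2)

# Hypothetical panel of 10 mathematics experts rating two test items
print(content_validity_ratio(9, 10))   # 0.8  -> strong agreement, keep the item
print(content_validity_ratio(4, 10))   # -0.2 -> weak agreement, reconsider the item
```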
     Confirming content validity is also important because it allows psychologists to distinguish test instruments with adequate content validity from those without it. If an instrument does not have an adequate level of content validity, a more valid method should be used so that the collected data are more accurate and trustworthy.
     A third form of validity is criterion-related validity, which determines whether a test method produces results similar to those of valid, established instruments that measure the same variable. One example of criterion-related validity is when employee selection tests are validated against a measure of a criterion such as job performance. There are two types of criterion-related validity: predictive and concurrent. Predictive validity reflects how well an individual's performance on a future measure is predicted, while concurrent validity reflects how a test compares to similar instruments that measure the same criterion at the same time.
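     The employee-selection example can be sketched as a simple validity coefficient: the correlation between test scores at hiring and a later criterion measure. All numbers below are hypothetical (Python with NumPy).

```python
import numpy as np

# Hypothetical data: selection-test scores at hiring, and supervisor
# job-performance ratings gathered months later (a predictive validity design)
test_scores = [55, 70, 62, 85, 48, 77, 66, 90]
performance = [3.5, 3.9, 3.0, 4.5, 2.8, 3.7, 4.0, 4.4]

# The validity coefficient is the Pearson correlation between the two
validity_coefficient = np.corrcoef(test_scores, performance)[0, 1]
print(round(validity_coefficient, 2))   # about 0.87 -> scores track later performance
```

Had the performance ratings instead been collected at the same time as the test (for example, from current employees), the same correlation would serve as evidence of concurrent rather than predictive validity.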
     Confirming evidence of criterion-related validity is also important because it gives psychologists the opportunity to predict measures with future participants and to determine which instruments show an adequate level of criterion-related validity when compared to valid, established tests. If a test instrument does not have an adequate level of criterion-related validity, a more valid method should be used, since this helps ensure that the collected data will closely reflect the results obtained with valid, more established test methods.
     A fourth form is construct validity. According to Bordens and Abbott (2008), this type of “validity applies when a test is designed to measure a ‘construct’ or variable ‘constructed’ to describe or explain behavior on the basis of theory” (p. 130). Establishing construct validity can be a tedious process because it requires a gradual accumulation of evidence that scores relate to observable behaviors in the way predicted by an underlying theory. One example is a test designed to show that participants with higher intelligence scores will achieve higher grades in school. There are two types of construct validity: convergent and discriminant. Convergent validity is demonstrated when results are similar to those of a different test that measures the same construct, while discriminant validity is demonstrated when the test does not measure constructs that it was not intended to measure (Bordens & Abbott, 2008).
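     Convergent and discriminant evidence can both be read off a pair of correlations. In the hypothetical sketch below (Python with NumPy; all scores invented for illustration), a new intelligence test should correlate strongly with an established test of the same construct and only weakly with a test of an unrelated construct.

```python
import numpy as np

# Hypothetical scores for six participants
new_test        = [95, 110, 102, 120, 88, 105]   # the test being validated
same_construct  = [98, 108, 100, 118, 90, 107]   # established test, same construct
other_construct = [30, 33, 28, 35, 38, 31]       # unrelated trait scale

convergent = np.corrcoef(new_test, same_construct)[0, 1]     # should be high
discriminant = np.corrcoef(new_test, other_construct)[0, 1]  # should be near zero
print(round(convergent, 2), round(discriminant, 2))
```

A high convergent correlation alongside a near-zero discriminant correlation is exactly the pattern of evidence that supports construct validity.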
     Establishing construct validity is also important because it allows psychologists to determine whether a test produces measures similar to those of comparable methods and does not measure constructs it was not intended for. If a testing instrument does not have an adequate level of construct validity, a more valid method should be used, since this helps ensure that the results are not contaminated by measures of irrelevant constructs.
How Psychologists Can Ensure Adequate Test Reliability and Validity
     Since adequate levels of test reliability and validity are essential for obtaining reliable and valid results with fewer errors, there are certain things psychologists need to do to ensure these qualities before testing each participant. The first is to gather pertinent personal information about each participant before beginning the testing process. This gives the psychologist detailed background information for choosing a test based on the individualized needs of each participant, and it can reveal any racial, gender, educational, and/or cultural background issues that might reduce overall reliability and validity. Two examples would be administering a test written in English to a participant who can only read Spanish, or administering a test about auto-mechanic skills to a 5-year-old child.
     A second step is to choose testing instruments whose reliability and validity have already been established. If previous research has demonstrated adequate levels of these qualities, there is a higher chance that they will be measured again. Established reliability and validity can also aid psychologists in determining which testing methods are most appropriate for each individual participant and/or specified need.
     A third step is to follow all ethical and/or legal standards. This is important because these standards were created to protect participants from the kinds of negative psychological and/or physical effects that have occurred in the past. The psychologist therefore needs to address all pertinent standards throughout the entire duration of each testing process.
     One specific ethical standard that may apply when using testing methods with participants is confidentiality, which helps protect the rights of all participants by mandating that personal information be released only under specific circumstances. Following this standard is important because it helps ensure that no harm occurs from personal information being released in a malicious or damaging manner to third parties. The Behavior Analyst Certification Board (2004) has determined, however, that a professional may disclose confidential information when it is mandated by law or serves a valid purpose, such as providing services to an individual or organization, obtaining payment for services previously rendered, or responding to a client who is considered a danger to himself or others.
     A second ethical standard that may apply when conducting assessment testing is cross-cultural sensitivity. Professionals must be aware of their own potential biases when selecting and administering tests and interpreting results, and must acknowledge the potential effects of differences in age, cultural background, ethnicity, disability, gender, religion, socioeconomic status, and sexual orientation. One example of a violation would be a psychologist refusing to work with a participant from a foreign country.
     A third ethical standard that may apply when using certain testing methods is informed consent. This is important because it states that professionals must acquire permission prior to assessing any participant. If the participant is a minor, a parent or caretaker must give consent before any testing can occur. This can also be addressed by ensuring that all pertinent consent forms are collected prior to beginning the overall testing process.
     A fourth ethical standard that normally applies when using test methods is protection from harm, which ensures that no psychological or physical harm comes to research participants. Psychologists need to determine the safest possible way to use a specific testing method, and if no safe method is available, the test cannot be conducted (Schacter, Gilbert, & Wegner, 2009). This standard can be implemented by identifying any aspects of testing that may be harmful to one or more participants and then taking precautions to prevent that harm from ever occurring.
     Finally, a fifth ethical standard that should be addressed prior to using most testing methods is test administration. Tests should be administered according to how they were established, and any alterations should be noted and/or adjusted for accordingly. This matters because it helps ensure that the results reflect measurements of the intended construct and/or domain, which in turn makes it easier to establish adequate levels of reliability and validity for the testing method used (Schacter, Gilbert, & Wegner, 2009).
Why It’s Important to Ensure Validity in Different Types of Settings
     Once the validity of a test has been established, a psychologist needs to ensure that it holds when assessing participants in various types of settings. The level of test validity can change based on factors that vary across settings, so certain steps may need to be taken to ensure that an adequate level of validity can be measured no matter which setting is used.
     One specific setting that utilizes psychological testing on a regular basis is the educational institution. Achievement and aptitude tests are routinely used to measure a student’s overall knowledge of specific topics or the aptitude needed to master material within a certain domain. One widely used instrument for assessing these constructs is the Scholastic Aptitude Test (SAT), whose scores institutions of higher education use to make student admissions decisions (CollegeBoard, 2012).
     Since this test is designed to assess a student’s achievement or aptitude for future success, it is important that all forms of validity are present. Face validity indicates that the test appears to measure achievement and/or aptitude at an acceptable level, which supports continued research and design. Content validity indicates that the test actually does measure achievement and aptitude at an acceptable level. Criterion-related validity indicates that the test produces results similar to those of valid, established instruments that measure achievement and aptitude. Construct validity indicates that the test’s measures agree with those of valid achievement and aptitude tests and that it does not measure irrelevant constructs that could reduce overall validity.
     A second setting that utilizes psychological testing on a regular basis is the mental health clinic. Various test methods are regularly used there to better understand individual style or to aid in clinical diagnoses. One widely used paper-and-pencil method for assessing personality is the Minnesota Multiphasic Personality Inventory, or MMPI. This test is not perfect, however, and has raised questions concerning adequate measures of reliability and validity. A revised version, the MMPI-2, can now be used instead; it contains 567 test items with scales that measure even more traits associated with abnormal behavior (Lezak, Howieson, & Loring, 2004).
     Since testing methods like the MMPI and MMPI-2 are designed to assess individual personality for research and diagnostic purposes, it is also important that all forms of validity are present. Face validity indicates that the test appears to measure individual personality at a level acceptable for continued research and design. Content validity indicates that the test actually does measure personality at an acceptable level. Criterion-related validity indicates that the test produces results similar to those of valid, established instruments that measure personality. Construct validity indicates that the test’s measures agree with those of valid personality tests and that it does not measure irrelevant constructs. Lezak, Howieson, and Loring (2004) also state that the MMPI-2 appears to establish a higher degree of construct validity because it is supported by more evidence of convergent and discriminant validation.
Summary
     For several years, psychological testing has been conducted because it gives psychologists the opportunity to measure specific mental and/or behavioral characteristics in human beings. Whenever this process is undertaken, it is important to choose a test method that will yield accurate levels of reliability and validity, since the resulting data will be considered more valuable and viable by other professionals in the field.
     With these things in mind, my goal for this paper was to provide a better understanding of reliability and validity by addressing the aspects most directly associated with these concepts. I first discussed what the reliability of a test is, why it is important, and how its two main types, test-retest reliability and internal consistency reliability, can be measured. I then addressed what the validity of a test is, why it is important, and how its four main forms, face validity, content validity, criterion-related validity, and construct validity, can be measured.
     This was followed by identifying the things psychologists need to do to ensure that selected test methods will yield adequate levels of reliability and validity. Three of these are gathering personal information about each participant, choosing well-established methods, and addressing all ethical, legal, individual, and socio-cultural issues that apply. Specific ethical considerations addressed in this section include confidentiality, cross-cultural sensitivity, informed consent, protection from harm, and test administration.
     Finally, I discussed why all four forms of validity must be considered when using test methods in different types of settings, since the validity of a measure can change due to various factors within each setting. I am confident that if psychologists follow this information, it will be easier to confirm both reliability and validity when using test methods in all types of settings.
References:
Behavior Analyst Certification Board. (2004). Guidelines for responsible conduct for behavior analysts. Retrieved from http://contentasc.kaplan.edu.edgesuite.net/PS502_1004A/images/product/Guidelines%20for%20Responsible%20Conduct.pdf

Bordens, K., & Abbott, B. (2008). Research design and methods (7th ed.). New York, NY: McGraw-Hill.

Cohen, R. J., & Swerdlik, M. E. (2010). Psychological testing and assessment: An introduction to tests and measurement. Boston, MA: McGraw-Hill Higher Education.

CollegeBoard. (2012). SAT validity studies. Retrieved from http://professionals.collegeboard.com/data-reports-research/sat/validity-studies

Lezak, M., Howieson, D., & Loring, D. (2004). Neuropsychological assessment (4th ed.). Oxford: Oxford University Press.

Neukrug, E. S., & Fawcett, R. C. (2010). Essentials of testing and assessment: A practical guide for counselors, social workers, and psychologists (2nd ed.). Belmont, CA: Brooks/Cole Cengage Learning.

Zechmeister, J. S., Zechmeister, E. B., & Shaughnessy, J. J. (2001). Essentials of research methods in psychology. New York, NY: McGraw-Hill.





