Increasingly, schools of nursing are choosing to administer a new generation of standardized exams, such as the computerized exams developed by Health Education Systems, Inc. (HESI), to assess student competency and evaluate achievement of curricular outcomes. According to HESI database records, the number of schools of nursing using HESI exams increased from 85 in December 1999 to 565 in December 2003, an increase of 565% in 4 years.1 The methods used to determine the reliability and validity of HESI exams are of paramount interest to nursing faculties. It is not enough simply to state that an exam is reliable and valid. Faculties must be accountable for determining how these constructs are measured so that evidence-based decisions can be made about the usefulness of incorporating these exams into their nursing curricula.
HESI provides a variety of exams: the HESI Admission Assessment, an entrance exam; specialty exams, which evaluate specific clinical content; custom exams, which are specialty exams designed to evaluate faculty-specified nursing content; the HESI Exit Exam (E2), a comprehensive exit exam available in both registered nurse (RN) and practical nurse (PN) versions; an RN-BSN exam, used as an outcome measure for RNs pursuing a bachelor's degree in nursing; and a critical thinking exam, as well as custom exams for practice settings. This article focuses on two types of HESI exams: specialty exams and exit exams. Its purpose is to describe (1) the methods used to measure the reliability and validity of these exams, (2) the current reliability findings for them, and (3) the current validity data for them.
DESCRIPTION OF HESI EXAMS
Purpose of HESI Exams
HESI specialty exams were developed to assess students' knowledge and their ability to apply nursing concepts within specific content areas. Faculties often use scores from these specialty exams as a substitute for teacher-made final exams. The E2 was developed to assess students' preparedness for the licensing exam. Data provided by these exams can be useful in preparing the self-studies required by accrediting agencies: both the National League for Nursing Accrediting Commission (NLNAC)2 and the Commission on Collegiate Nursing Education (CCNE)3 require schools of nursing to demonstrate systematic program evaluation. HESI Summary Reports provide content area scores that can be used to evaluate curricular strengths and weaknesses.4 Because these exams provide interval-level data, the data can be analyzed using a variety of statistical methods. Because the E2 has demonstrated a high degree of accuracy in predicting outcomes of the NCLEX-RN and NCLEX-PN,4-7 nursing faculties are increasingly choosing to use E2 scores as benchmarks for progression and remediation.8,9
Conceptual Framework
The conceptual framework used to develop HESI exams is grounded in classical test theory and critical thinking theory. The creation, administration, and interpretation of tests are accomplished through educational and psychological measurement processes. Crocker and Algina10 stated that measurement of psychological attributes occurs when quantitative values are assigned to a sample of behaviors obtained from administering a test. By observing and classifying similar behaviors, the test designer is able to draw inferences about the psychological constructs that contribute to the makeup of the test taker. The test designer may also be able to identify relationships between psychological constructs and practical consequences, thereby predicting outcomes such as success in academic programs or in nursing practice. To make such predictions, the test designer must first quantify the observations representing the constructs that define these behaviors. The nurses who design and revise the nursing exams use course syllabi from nursing programs across the United States in combination with NCLEX test blueprints provided by the National Council of State Boards of Nursing (NCSBN)11,12 to define the constructs indicative of behaviors required for entry-level practice. HESI item writers create test items that specifically measure these behaviors. Figure 1 describes the theoretical framework for development of HESI exams.
Development of HESI Test Items
The method used by HESI for development of critical thinking test items13,14 is based on concepts derived from the critical thinking theory described by Paul15 and the cognitive taxonomy developed by Bloom.16 Test items on all HESI exams are written and reviewed by nurse educators and clinicians who evaluate the merit of the items as current measures of nursing practice; specifically, they assess whether the items reflect high-priority and high-frequency activities characteristic of entry-level RN practice. All submitted test items are reviewed by HESI nurse educators and modified, as needed, by HESI editors. Each test item is categorized by numerous subject areas, and each subject area provides subset scores. All test items are stored in a database along with their item analysis data.
HESI Specialty Exams
HESI specialty exams are available for RN curricula only; PN specialty exams are currently in the pilot stage. This article addresses only the eight most widely used RN specialty exams.
Specialty exams are designed to measure the student's ability to apply concepts related to specific clinical nursing content areas. Typically, specialty exams consist of 50 test items. Test blueprints for these exams are developed by HESI nurse educators whose clinical practice area is congruent with the exam being developed. Test items that measure nursing knowledge and competencies within the specific clinical specialty area are selected from the HESI database.
Custom exams are specialty exams that are designed to meet specific curricular evaluation needs. Typically, custom specialty exams consist of 50 test items. Test blueprints for custom exams are developed by HESI nurse educators and include the content domain specified in the syllabus or syllabi that are provided to HESI by the faculty requesting the development of a custom exam. Test items that best measure nursing knowledge and competencies within the designated content area are selected from the HESI database. Custom exams are completed following consultation between faculty and HESI nurse educators to ensure that the final products are valid for the constructs to be tested.
Midcurricular exams are custom exams that evaluate content from several nursing courses. Typically, midcurricular exams consist of 100 test items. They are administered halfway through the curriculum and can therefore serve as exit exams for the first half of the curriculum. While it is impractical to cite reliability findings for every custom exam that has been developed, the reliability of custom exams is determined in the same manner as for all other HESI exams.
HESI Exit Exam
The E2 is a 150-item comprehensive exam that is designed for administration near the completion of the curriculum to measure student preparedness for the NCLEX-RN or NCLEX-PN. It is used to identify students' strengths and weaknesses and any need for remediation prior to taking the licensure exam. Four versions of the E2 are available for RN students, and two versions are available for PN students. RN students serve as the norming group for the test items included on the RN versions of the E2, whereas PN students serve as the norming group for the PN versions. Different versions of the E2 are often used to retest students who require remediation so that the success of such remediation can be evaluated.
Scoring of HESI Exams
The HESI Predictability Model, a proprietary mathematical model, is used to calculate scores for HESI specialty exams and HESI exit exams. All scores provided by these exams are based on the application of this model to the raw data. Test items are individually weighted based on their difficulty level, which is determined by dividing the number of correct responses to the item by the total number of responses to that item, yielding the percentage of correct responses. Each HESI specialty exam and exit exam also provides a conversion score, presented as a percentage that reflects both the average weight of all the test items on the exam and the average weight of the test items answered correctly. This conversion score is therefore a weighted percentage score that faculty can include as part of the student's final course grade.
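For illustration, the sketch below (in Python) shows the general shape of difficulty weighting and a weighted conversion score. Only the difficulty calculation, correct responses divided by total responses, comes from the description above; because the HESI Predictability Model is proprietary, the weighting rule used here (harder items carry more weight) is an assumption made solely for the example.

```python
# Sketch only: the HESI Predictability Model is proprietary. The weighting
# rule below (lower difficulty p -> higher weight) is an assumption.

def item_difficulty(responses):
    """Proportion of examinees who answered the item correctly."""
    return sum(responses) / len(responses)

def conversion_score(weights, answered_correctly):
    """Weighted percentage: weight earned over weight possible, times 100."""
    earned = sum(w for w, ok in zip(weights, answered_correctly) if ok)
    return 100.0 * earned / sum(weights)

# Hypothetical pilot data: each inner list holds one item's responses.
pilot = [
    [True, True, True, False, True],    # p = 0.8 (easy)
    [True, False, True, False, False],  # p = 0.4
    [True, True, False, True, True],    # p = 0.8
    [False, False, True, False, True],  # p = 0.4 (hard)
]
difficulties = [item_difficulty(r) for r in pilot]
weights = [2.0 - p for p in difficulties]  # assumed weighting rule

student = [True, False, True, True]  # one student's scored responses
print(f"conversion score = {conversion_score(weights, student):.1f}")  # 71.4
```

Under this assumed weighting, a student who answers mostly difficult items correctly earns a higher conversion score than one who answers the same number of easy items correctly.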
Evaluation of HESI Exams
Crocker and Algina10 identified the basic elements of an exam within the framework of classical test theory as the observed score, the true (universe) score, and the error score (error of measurement). The relationship among these scores is described by the formula: observed score = true score + error score. Crocker and Algina10 and Sax17 asserted that the reduction of systematic and random error is critically important to ensure that scores obtained on tests closely represent the student's true score. All item analysis and reliability calculations for HESI exams are based on this interpretation.
The item analysis data include each test item's difficulty level and discrimination data, expressed as the point-biserial correlation coefficient, as well as the number of times the test item has been used on an exam. These data are stored for the last administration of the test item as well as for all administrations of the test item (cumulative data). No test item is scored on any HESI exam until it has been piloted and item analysis data have been obtained for it. HESI nurse educators review and revise test items based on the results of pilot testing. The parameters HESI uses to judge the quality of test items are a cumulative difficulty level of no less than 40% and a point-biserial correlation coefficient of 0.15 or above. HESI exams that contain 50 test items include five pilot (nonscored) items, and exams that contain 75 or more test items include 10 pilot items.
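The following minimal sketch applies this item-level screening, using the standard point-biserial formula and the two thresholds stated above; all response data are hypothetical.

```python
import math

def point_biserial(item, totals):
    """Point-biserial correlation between a dichotomous item (0/1) and
    total scores: r_pb = (M_p - M) / s * sqrt(p / q)."""
    n = len(item)
    p = sum(item) / n                     # item difficulty
    q = 1.0 - p
    mean_all = sum(totals) / n
    sd = math.sqrt(sum((t - mean_all) ** 2 for t in totals) / n)
    mean_p = sum(t for i, t in zip(item, totals) if i) / sum(item)
    return (mean_p - mean_all) / sd * math.sqrt(p / q)

def keep_item(p, r_pb):
    """Thresholds stated in the text: cumulative difficulty of no less
    than 40% correct and a point-biserial of 0.15 or above."""
    return p >= 0.40 and r_pb >= 0.15

# Hypothetical responses to one item, with examinees' total exam scores.
item = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
totals = [42, 45, 30, 40, 28, 44, 38, 33, 41, 46]
p = sum(item) / len(item)
r_pb = point_biserial(item, totals)
print(f"p = {p:.2f}, r_pb = {r_pb:.2f}, keep = {keep_item(p, r_pb)}")
```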
PSYCHOMETRIC PROPERTIES OF HESI EXAMS
Reliability
HESI determines the reliability of HESI exams by conducting an item analysis on each exam that is administered and returned to the company, producing a composite report of the aggregate data. Discrimination data are obtained for each test item by calculating a point-biserial correlation coefficient. As a measure of each test's overall reliability, the Kuder-Richardson Formula 20 (KR-20) is calculated for every exam administered. Data obtained from these calculations are used to estimate the reliability of an exam prior to administration. These reliability estimates are based on all previous administrations of the test items on each exam and reflect the most recently updated item analysis. Reliability estimates are recalculated every time a HESI exam is scored, and they are updated concurrently on all exams that include any of the same test items. Table 1 describes the estimated reliability coefficients and the range of the number of uses for items contained on eight HESI specialty exams, four versions of the E2 for RN students, and two versions of the E2 for PN students as of December 31, 2003. The estimated reliability coefficients for these HESI exams ranged from 0.86 to 0.99, and the number of times the items were used on these exams ranged from 180 to 47,320.
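For reference, the KR-20 statistic mentioned above can be computed from a matrix of dichotomous item scores, as in this textbook-style sketch; the score matrix is hypothetical and the implementation is generic, not HESI's.

```python
def kr20(matrix):
    """Kuder-Richardson Formula 20 for dichotomous item scores.
    matrix: rows = examinees, columns = items (1 = correct)."""
    n = len(matrix)            # number of examinees
    k = len(matrix[0])         # number of items
    totals = [sum(row) for row in matrix]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in matrix) / n   # item difficulty
        pq += p * (1.0 - p)
    return (k / (k - 1)) * (1.0 - pq / var_t)

# Hypothetical scores: 6 examinees by 5 items.
scores = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
print(f"KR-20 = {kr20(scores):.2f}")  # approximately 0.80 for these data
```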
Validity
Research designed to quantify the degree of validity of all HESI exams is ongoing. The most current evidence of validity for the various HESI exams is obtained through assessment of content validity, construct validity, and criterion-related validity, as described in classical test theory.
CONTENT VALIDITY
Content validity refers to the effectiveness of the test items in measuring the basic nursing knowledge and skills of students. Expert nurse educators and clinicians establish content validity for each HESI test item by evaluating the relevance of the content to entry-level nursing practice. This evaluation is conducted before test items are placed into the HESI item banks and periodically thereafter to determine their continued relevance to current nursing practice. HESI uses course syllabi from nursing programs and NCLEX test blueprints to define the content for the E2. The HESI database provides a test blueprint report that describes the distribution of test items in each subject area, including client needs as defined by the NCSBN.11,12 When an E2 is designed and test items are selected from the database, this test blueprint report is reviewed and revised as necessary until the distribution of test items in the subject areas matches the distribution described by the NCSBN.
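A minimal sketch of such a blueprint check follows; the four client-needs categories are those defined by the NCSBN, but the target percentages and item counts are placeholders, not NCSBN figures.

```python
# Compare an exam's item distribution with blueprint targets.
# Target percentages below are placeholders, not the NCSBN values.
from collections import Counter

targets = {
    "Safe and Effective Care Environment": 0.30,
    "Health Promotion and Maintenance": 0.15,
    "Psychosocial Integrity": 0.15,
    "Physiological Integrity": 0.40,
}

# Hypothetical 150-item exam, each item tagged with one category.
exam_items = (["Physiological Integrity"] * 57
              + ["Safe and Effective Care Environment"] * 48
              + ["Health Promotion and Maintenance"] * 21
              + ["Psychosocial Integrity"] * 24)

counts = Counter(exam_items)
for category, target in targets.items():
    actual = counts[category] / len(exam_items)
    print(f"{category:38s} target {target:4.0%}  actual {actual:4.0%}")
```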
The content validity of HESI specialty exams and HESI custom exams is determined by reviewing the syllabi of the nursing courses that these exams are designed to evaluate. As of December 31, 2003, HESI nurse educators had developed 1203 custom exams. At least one course syllabus was reviewed when developing each of these custom exams, and in many cases several course syllabi were reviewed. HESI nurse educators determine the content domain for inclusion in HESI specialty exams and custom exams by reviewing these course syllabi. Five or more syllabi typically arrive at HESI each month for the development of custom exams. Based on a review of these syllabi, HESI nurse educators write new test items as needed.
CONSTRUCT VALIDITY
Construct validity refers to the extent to which a test measures specified traits or attributes at an abstract level. A construct is a trait, attribute, or quality that cannot be observed directly but that can be inferred from testing. HESI specialty exams and HESI exit exams measure constructs that are essential to entry-level nursing practice. These constructs, which are reflected in the NCLEX test blueprints,11,12 are defined both by nursing faculties and by the NCSBN's practice analyses of recently graduated nurses.18 Nursing faculties also use HESI scores to make inferences regarding the appropriateness of their nursing curricula and the competence of their students in specific nursing content areas. HESI Summary Reports for specialty exams and exit exams describe individual and aggregate data on student performance in the subject areas tested.
The increased use of HESI specialty exams and HESI exit exams may indicate that faculties trust the data reported by these exams, and such confidence provides an additional indication of construct validity. According to HESI database records, the administration of specialty exams, including custom exams, increased from 8702 in academic year 1999-2000 to 30,004 in the academic year 2002-2003, an increase of 245% in 4 years.1 Although there are myriad possible reasons why faculties are increasingly choosing to administer HESI specialty exams and HESI custom exams, this increase in use suggests that faculties find these exams worthwhile evaluation tools for measuring student outcomes within particular nursing courses. Use of HESI exit exams increased from 7193 administrations in academic year 1999-2000 to 25,241 administrations in academic year 2002-2003, an increase of 251% in 4 years.1 This increase in use suggests that faculties find the E2 a useful tool in identifying students' remediation needs8 and predicting NCLEX success.
Efforts made to demonstrate convergent validity offer support for a test's construct validity.10,17 Evidence of convergent validity was obtained by comparing HESI exam scores with other measures of the same constructs. In three as-yet unpublished studies, associate degree nursing (ADN) and bachelor of science in nursing faculties that use HESI exams provided evidence of convergent validity for these exams by correlating students' HESI exam scores with their final course grades and cumulative grade point averages (GPAs). Murray and Nibert19 correlated ADN students' (N = 52) HESI specialty exam scores with their final course grades in the three courses the exams were designed to evaluate. The correlations were statistically significant (P ≤ .01) for maternity nursing (r = 0.515), pediatric nursing (r = 0.517), and psychiatric-mental health nursing (r = 0.494).19 Three custom exams were administered in the first year of the ADN program. Scores for two of the three custom exams were significantly correlated (P ≤ .01) with students' final course grades in the courses they were designed to evaluate, Custom-2 (r = 0.569) and Custom-3 (r = 0.691).19 The Custom-1 exam was designed to evaluate three courses, and scores on this exam were significantly correlated (P ≤ .01) with final course grades in two of the three, fundamentals (r = 0.581) and pharmacology (r = 0.463), but not with final course grades in therapeutic communications.19
M. Owings (unpublished data, 2002) correlated the HESI specialty exam scores of second-year ADN students (N = 19) with their cumulative objective course grades (scores for papers, presentations, and clinical grades were not included). The HESI Pediatric Nursing specialty exam scores were significantly correlated (P ≤ .05) with test scores in the pediatric nursing course (r = 0.402). However, HESI Maternity Nursing specialty exam scores were not significantly correlated with test scores in the maternity nursing course.
L. Symes (unpublished data, 2002) found a significant correlation (P ≤ .01) between the cumulative GPAs and E2 scores (r = 0.498) of senior students completing a baccalaureate nursing program (N = 27). Based on the findings of these studies, conducted by faculty members in two types of nursing programs at three different schools, it can be concluded that the HESI exams studied demonstrated convergent validity for these three schools at the time the studies were conducted, with the exception of the Maternity Nursing specialty exam scores for one group of students (N = 19).
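The convergent validity evidence above rests on Pearson correlations between exam scores and course outcomes. The sketch below performs the same kind of analysis on hypothetical data; the cited studies used actual student records.

```python
# Convergent validity sketch: correlate exam scores with course grades.
# All values are hypothetical.
from scipy import stats

hesi_scores = [842, 910, 765, 880, 799, 934, 701, 858, 812, 890]
course_grades = [78, 88, 71, 85, 74, 92, 65, 83, 79, 87]

r, p_value = stats.pearsonr(hesi_scores, course_grades)
print(f"r = {r:.3f}, P = {p_value:.4f}")  # P <= .01 would parallel the cited findings
```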
CRITERION-RELATED VALIDITY
Criterion-related validity refers to inferences made from analyses of test scores for the purpose of predicting student outcomes on another criterion of interest, such as performance in an entry-level nursing position or success on the NCLEX-RN or NCLEX-PN. HESI scores are used to make inferences about students' nursing content knowledge and their ability to apply concepts to nursing problems. Specialty exam scores, including custom exam scores, and exit exam scores support inferences about students' likelihood of succeeding on the NCLEX.
Evidence of criterion-related validity for the E2 was obtained from four annual validity studies conducted to determine the accuracy of this exam in predicting NCLEX-RN and NCLEX-PN outcomes. Based on the aggregate data collected from 19,554 subjects over four consecutive years, the E2 was found to be 96.36% to 98.46% accurate in predicting NCLEX-RN and NCLEX-PN success.4-7 Additionally, two other studies described the E2 as 96.42%20 and 100%21 accurate in predicting NCLEX-RN failures. A chi-square goodness-of-fit test revealed that the predictive accuracy of the E2 did not differ significantly across the 4 years of study. Furthermore, there were no significant differences in predictive accuracy by type of program examined: associate degree, baccalaureate degree, diploma, or practical nursing. Nibert et al7 concluded that the E2 was a valid measure of students' preparedness for the licensure exam.
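One way to frame such a goodness-of-fit test is to ask whether the yearly counts of correct E2 predictions fit a single pooled accuracy rate, as in the sketch below; all counts are invented for illustration, and the published analyses are described in the cited studies.4-7

```python
# Hypothetical goodness-of-fit check: is E2 predictive accuracy stable
# across 4 years? Counts are invented for illustration only.
from scipy.stats import chisquare

predictions = [4200, 4800, 5100, 5454]  # E2 predictions made per year
correct = [4092, 4665, 4972, 5301]      # predictions confirmed on NCLEX

pooled_rate = sum(correct) / sum(predictions)
expected = [n * pooled_rate for n in predictions]
# ddof=1 because the pooled rate is estimated from the same data.
chi2, p_value = chisquare(correct, f_exp=expected, ddof=1)
print(f"chi-square = {chi2:.2f}, P = {p_value:.3f}")  # large P: stable accuracy
```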
Validity can also be evaluated by examining evidence of the consequence or meaning given to the test.17,18,22 Increasing numbers of nursing schools are establishing policies that incorporate HESI exams as benchmarks for progression and remediation. In a recent study, Nibert et al8 reported that 45 of 149 RN programs (30.20%) had established policies that used E2 scores as benchmarks for progression. The authors also described three consequences of such progression policies: an incomplete or failing grade in the capstone course (34.29%), denial of eligibility for graduation (51.43%), and withholding of approval for NCLEX candidacy (14.29%).8 Morrison et al9 interviewed administrators at seven nursing programs and found that NCLEX-RN pass rates increased by 9% to 41% within 2 years after implementation of policies that used E2 scores as a benchmark for progression.
SUMMARY AND RECOMMENDATIONS
Sufficient scientific data exist to reassure nurse educators that HESI specialty exams and HESI exit exams can be used confidently to assess students' progress throughout the nursing curriculum and their preparedness for the licensure exam. HESI uses item analysis data from all previous administrations of the test items included on an exam to calculate an estimated reliability coefficient. This methodology ensures that the reliability estimates for HESI exams are updated continuously.
The methods used for determining the validity of HESI specialty exams and HESI exit exams are rooted in classical test theory. On this foundation, the validity of these exams is well established. However, measurement of exam validity is an ongoing process, and additional approaches to establishing validity, beyond those described in classical test theory, are needed to provide further quantifiable evidence.17,22,23 Three proposals for future research are therefore presented. First, studies should be conducted that focus on objective methods for collecting and analyzing data obtained from the nurses who review HESI test items. A random sample of HESI test items should be provided to nurse reviewers along with a questionnaire about the items' relevance to nursing practice and their value in relation to the model for writing critical thinking test items described by Morrison et al14 and Morrison and Free.13 Responses to such a questionnaire could then be analyzed to identify the degree of validity exhibited by these exams. Second, differential item functioning studies need to be conducted on outcome data from HESI exam administrations. With the oversight of a committee charged with the protection of human subjects, such data could be obtained and analyzed with the consent of a sample of students whose individual outcomes on both the HESI exams and the NCLEX-RN or NCLEX-PN are known. Third, questionnaires should be used to obtain information regarding faculties' degree of satisfaction with HESI exams as indicators of student competency and curricular outcome achievement. Data obtained from these studies could provide additional quantifiable evidence of the validity of HESI exams.
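As one illustration of the second proposal, the sketch below applies the Mantel-Haenszel procedure, a widely used method for detecting differential item functioning; the article does not specify a method, and the groups, strata, and counts shown are hypothetical.

```python
# Mantel-Haenszel DIF sketch. A real study would stratify examinees on
# total score and would require human-subjects oversight and consent.
from collections import defaultdict

def mantel_haenszel_or(records):
    """records: (stratum, group, correct) triples, group 0 = reference,
    group 1 = focal, correct in {0, 1}. Returns the common odds ratio;
    values near 1.0 suggest the item shows no DIF."""
    tables = defaultdict(lambda: [[0, 0], [0, 0]])  # stratum -> [group][correct]
    for stratum, group, correct in records:
        tables[stratum][group][correct] += 1
    num = den = 0.0
    for (ref_wrong, ref_right), (foc_wrong, foc_right) in tables.values():
        n = ref_wrong + ref_right + foc_wrong + foc_right
        if n:
            num += ref_right * foc_wrong / n
            den += foc_right * ref_wrong / n
    return num / den if den else float("nan")

# Hypothetical responses to one item, stratified by total-score band.
records = ([("low", 0, 1)] * 30 + [("low", 0, 0)] * 20
           + [("low", 1, 1)] * 24 + [("low", 1, 0)] * 26
           + [("high", 0, 1)] * 45 + [("high", 0, 0)] * 5
           + [("high", 1, 1)] * 38 + [("high", 1, 0)] * 12)
print(f"MH odds ratio = {mantel_haenszel_or(records):.2f}")  # about 1.97 here
```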
Acknowledgment
The authors acknowledge Mrs Donna Boyd for her editorial assistance in the preparation of this manuscript.
REFERENCES