SMART Goals:
1. At the end of this workshop, the learner will be able to identify the components of a research design for both qualitative and quantitative research.
2. At the end of this workshop, the learner will be able to identify at least two types of reliability and two types of validity.
3. At the end of this workshop, the learner will be able to identify ways to measure reliability.
Introduction
This website was created to inform the reader about types of research studies, terminology associated with research, and information on clicker systems in K-12 education. Within this workshop you will find information on the technology of clicker systems, the definition of measurement in research studies, the types of reliability and validity that are important to this research, and sample designs for qualitative and quantitative research studies, along with an assessment to measure your learning from this experience.
What are Clicker Systems?
For my research, I will be investigating clicker systems. These systems go by many names: classroom response system, personal response system, student response system, or audience response system (Deal, 2007). They involve both hardware and software that allow students to interact with the instructor by answering questions, using handheld devices that may be wired or wireless. The power of this technology is the active engagement it produces (Draper, 2004). Students answer questions posed by the instructor, and within seconds the results for the entire class are posted for review and discussion.
Instructionally, this allows a teacher to quickly assess whether students are grasping the information presented (Beatty, 2009). These systems display results, including the number of participants, giving both the teacher and the students an opportunity to see that everyone is participating.
Why are Clicker Systems important?
Technology is a part of our everyday lives, and students are accustomed to using it in all aspects of theirs. Research shows that students are more likely to be engaged in learning when using some form of technology (Draper, 2004). Traditional instruction alone can no longer support students' diverse learning styles. Many different technologies are available; however, clicker systems are a fairly simple one that can be incorporated into nearly all subject areas. Many types of questions can be used in a clicker system, such as application, critical thinking, or student perspective questions, to name a few (Penuel, 2007). Clicker systems can also be used for various types of assessment. Teachers can use the system formatively, to assess as the lesson progresses, or summatively, once a concept has been completely presented, to see a student's total understanding. Other uses include gauging the class's background knowledge at the beginning of a lesson or identifying student attitudes on a particular topic (Scornavacca, 2007).
Measurement Components
Reliability and Validity in Measurement
A key to effective research is having data that is measured accurately and can be generalized to the larger population. This aspect of the research process is critical. Having instruments that accurately measure the research topic is essential to answering the research question and to determining whether you accept or reject the research hypothesis (Haladyna, 2006). The two components that are significant in this phase of research are reliability and validity.
Reliability is often thought of as consistency: a measure is reliable when it produces the same results each time it is given (Downing, 2004). Two types of reliability are inter-rater reliability and internal consistency reliability. Inter-rater reliability measures how well the raters of an instrument agree in their scoring (Haladyna, 2006). It is often less time consuming for multiple raters to gather or score the data, but when this happens the raters must score consistently or the data will not be usable in the research. Raters need to be "calibrated," that is, measured on a fixed set of data and evaluated to see whether their tendency is to score high or low. Once this is established, rater agreement percentages should be calculated to determine whether the raters are in fact producing reliable scoring. Internal consistency reliability measures how well the items on a test measure the same concept (Haladyna, 2006). It is expressed as a correlation coefficient; the closer the coefficient is to 1, the better the reliability. The measure most often used for internal consistency is Cronbach's alpha (Downing, 2004).
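To make the internal-consistency idea concrete, here is a minimal sketch of Cronbach's alpha computed directly from its definition (item variances versus total-score variance); the student scores below are made up for illustration:

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a table of scores.

    item_scores[s][i] is student s's score on test item i.
    """
    k = len(item_scores[0])  # number of items
    # Variance of each item's scores across students
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of each student's total score
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical scores: 4 students x 3 test items
scores = [
    [3, 3, 2],
    [4, 4, 4],
    [2, 2, 1],
    [5, 4, 5],
]
print(round(cronbach_alpha(scores), 2))  # prints 0.95
```

Here the items rise and fall together across students, so alpha is close to 1, suggesting the items measure the same concept.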
Validity looks at whether the instrument used for testing actually measures what you need to have measured; it relates to the accuracy, or soundness, of the data (Messick, 1980). When a researcher uses a measurement tool, it is essential that the tool measure what the researcher is investigating; otherwise the data may not be usable for answering the research question or for generalizing. Two types of validity are content validity and construct validity. Content validity is the extent to which an assessment measures all aspects of a construct (Fitzpatrick, 2016). Put simply, is the assessment content appropriate for the criteria being measured; does the assessment match the topic? Typically, an expert in the content area would validate that the assessment is accurate and sound in the topic it covers (Nevo, 1985). This type of validity matters because achievement tests are used in many professions with adverse consequences, and content validity helps ensure that these assessments accurately represent what the test creator intends to assess (Nevo, 1985). Construct validity tends to be more difficult to define. It looks at how well the test measures what it claims to measure (McLeod, 2013). For example, if a math teacher gives a test on rate, time, and distance, construct validity asks whether the assessment truly measures that topic. If the questions required long and complex reading, the construct validity would be weak, because the assessment might then be measuring reading level rather than math proficiency with rate, time, and distance.
Reliability and Validity in K -12 Education
In the state of Ohio, students in grades 3 through 12 are required to take achievement tests in various subjects each year. These scores are used to "evaluate" the success of a school district, building, and teacher. Having assessments that are reliable and valid is therefore critically important to everyone in public education. Surprisingly, the data on these tests has fluctuated over time, prompting educators to question the value of these assessments even more loudly (Haladyna, 2006). Aside from these high-stakes tests with graduation implications, the notions of reliability and validity resonate in every classroom in our schools. Each grade level has multiple teachers teaching and assessing students against common standards or expectations. In theory, if all teachers created and used reliable and valid assessments, all students would have the same opportunity for success. However, it is only within the past few years that teachers have begun to use "common" assessments to measure student progress. These tests are often teacher created, and unfortunately most teachers have not been instructed in how to create tests that truly measure the material or content being covered. Therefore, skeptics question the reliability of the test results and the validity of the tests themselves (Paris, 1991).
In the area of reliability, one type at work in many classrooms is inter-rater reliability. A grade-level team of six teachers may create a common assessment with open-ended questions. While the assessment is created together, each teacher will likely go off and grade their own students' work. This causes a problem if Teacher A is an "easier" scorer than Teacher B. Without proper calibration, or scoring the assessments as a group, this form of reliability may be in jeopardy, and the resulting data for the grade level may not be accurate. The key to inter-rater reliability is that the various raters perform at a similar level to ensure consistent scoring (Haladyna, 2006). As the scenario illustrates, without proper training teachers may not be consistent across the grade level, and the data will not accurately reflect the assessment. With testing being used in decisions about placement and retention, it is critical that our assessments are reliable (Burger, 2003).
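The calibration described above often begins with a simple percent-agreement check: the fraction of papers on which two raters gave the same score. A minimal sketch, using hypothetical rubric scores from two teachers:

```python
def percent_agreement(rater_a, rater_b):
    """Fraction of items on which two raters gave the same score."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)

# Hypothetical 0-4 rubric scores from two teachers on ten essays
teacher_a = [3, 4, 2, 4, 1, 3, 3, 2, 4, 0]
teacher_b = [3, 4, 2, 3, 1, 3, 4, 2, 4, 0]
print(percent_agreement(teacher_a, teacher_b))  # prints 0.8
```

An agreement level a team considers acceptable would be set in advance; raters falling below it would be recalibrated before scoring real student work. (More robust statistics, such as Cohen's kappa, additionally correct for agreement expected by chance.)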
With respect to validity, the notion of construct validity is critical to achievement testing in public schools. As mentioned above, if an achievement test is created to assess a student's understanding of algebra, extraneous factors such as reading level or vocabulary cannot be allowed to affect the outcome. In Ohio, this has been an issue in creating the state achievement tests: reading level is often questioned as having an adverse effect on the outcome of non-reading subject tests (Haladyna, 2006). Another aspect to consider is the vocabulary used in the questions. Some students may never have encountered a miter saw, yet a test question could discuss the angle at which the saw cuts a board. If the student has no experience with this tool, the assessment is now testing the student's vocabulary and its application, as opposed to the mathematical concept of angles, creating a situation where the construct validity is weak (Burger, 2003). Teachers need to be extremely mindful of these two ways validity can be lost when creating assessments. At the state level, this information should be considered when writing high-stakes test questions. Although the state has a fairness and sensitivity committee to monitor issues of diversity, that group should also consider reading level.
Given the consequences of high-stakes achievement tests, it is imperative that state officials consider the reliability and validity of these assessments (Burger, 2003). As discussed previously, these areas are under scrutiny by many critics, and until the test creators address these concerns, the pressure will likely continue. As for teachers, more training and practice is needed in creating questions that are better indicators of students' understanding, a practice rarely discussed in undergraduate education coursework.
Qualitative Research on Clicker Systems
Below you will find a presentation of a qualitative research study about the use of clicker systems in high school math classes. This study will be helpful to high school math instructors who are preparing to implement clickers as an instructional method. It will involve teacher and student interviews, as well as classroom observations. The data gathered will provide insights into the strengths and weaknesses of clicker systems from both a teacher and a student perspective, generating valid and reliable data that can be generalized to the field. The research questions for this study are:
- In what ways do students describe their experience of using clicker systems in their high school class?
- In what ways do teachers describe their experience of implementing clicker systems into their high school classes?
Quantitative Research on Clicker Systems
This quantitative research study will investigate the use of clicker systems in high school classrooms. It will focus on the impact clicker systems may have on student achievement, as well as the perception of how the clicker system helps students learn. The study will include pre- and post-test assessments of math content, administered at the beginning and end of the courses, along with surveys of students using the clicker systems and their corresponding teachers. These tools will collect reliable and valid data to address the research questions. The research questions for this study are:
1. How does the use of Clicker Systems impact students’ learning in math class?
2. How do students and teachers perceive the ease and enjoyment of using clicker systems?
Results from this study will provide evidence of the impact of clickers on student achievement. The surveys may also suggest areas for future research and study.
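A pre/post design like the one above is commonly analyzed with a paired t-test on each student's score gain. The sketch below computes the t statistic from its definition, with made-up scores standing in for real data:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(pre, post):
    """Paired t statistic for matched pre/post scores.

    t = mean(gain) / (stdev(gain) / sqrt(n)); large |t| suggests a
    real change rather than chance variation.
    """
    gains = [b - a for a, b in zip(pre, post)]
    n = len(gains)
    return mean(gains) / (stdev(gains) / sqrt(n))

# Hypothetical math scores for eight students, before and after
# a semester of clicker-based instruction
pre  = [62, 70, 55, 80, 66, 74, 58, 69]
post = [70, 74, 63, 84, 75, 73, 66, 77]
print(round(paired_t(pre, post), 2))  # prints 4.96
```

The resulting t statistic would then be compared against a t distribution with n - 1 degrees of freedom to obtain a p-value; statistics packages automate this step.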
References
Beatty, I. D. & Gerace, W. J. (2009). Technology-enhanced formative assessment: A research-based pedagogy for teaching science with classroom response technology. Journal of Science Education & Technology, 18(2), 146-162.
Burger, J., & Krueger, M. (2003). A Balanced Approach to High-Stakes Achievement Testing: An Analysis of the Literature with Policy Implications. Retrieved from http://iejll.journalhosting.ucalgary.ca/iejll/index.php/ijll/article/view/413/75 on January 28, 2017.
Deal, A. (2007). Classroom response systems: A teaching with technology white paper. Retrieved from http://www.cmu.edu/teaching/resources/PublicationsArchives/StudiesWhitepapers/ClassroomResponse_Nov07.pdf on January 19, 2017.
Downing S. (2004). Reliability: on the reproducibility of assessment data. Med Educ., 38(9), 1006–1012.
Draper, S.W., & Brown, M.I. (2004). Increasing interactivity in lectures using an electronic voting system. Journal of Computer Assisted Learning, 20(2), 81-94.
Fitzpatrick, A. (2016). The Meaning of Content Validity. Applied Psychological Measurement, 7(1), 3-13.
Haladyna, T. (2006). Perils of Standardized Achievement Testing. Educational Horizons, 85(1), 30-43.
Hodges, L. (2010). Engaging students, assessing learning: Just a click away. Essays on Teaching Excellence, 21(3).
McLeod, S. A. (2013). What is Validity? Retrieved from www.simplypsychology.org/validity.html on January 27, 2017.
Messick S. (1980). Test validity and the ethics of assessment. American Psychology, 35(11), 1012–1027.
Nevo, B. (1985). Face validity revisited. Journal of Educational Measurement, 22(4), 287-293.
Paris, S. (1991). A Developmental Perspective on Standardized Achievement Testing. Educational Researcher, 20(5), 12 – 20.
Patry, M. (2009). Clickers in large classes: From student perceptions towards an understanding of best practices. International Journal of the Scholarship of Teaching and Learning, 3(2).
Penuel, W. R., Boscardin, C. K., Masyn, K., & Crawford, V. M. (2007). Teaching with student response systems in elementary and secondary education settings: A survey study. Educational Technology, Research and Development, 55(4).
Scornavacca, E., & Marshall, S. (2007). TXT-2-LRN: Improving students’ learning experience in the classroom through interactive SMS. Presented at the 40th Hawaii International Conference on System Sciences. Retrieved from http://www.massey.ac.nz/massey/fms/Molta/Scornavacca.pdf on January 22, 2017.