
Educational Research Competencies for Analysis and Applications Eleventh Edition


Educational Research
Competencies for Analysis and Applications
Eleventh Edition
GLOBAL Edition

Geoffrey E. Mills
Southern Oregon University

L. R. Gay
Late of Florida International University

Boston Columbus Indianapolis New York San Francisco Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montréal Toronto Delhi Mexico City São Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo

Vice President and Editorial Director: Jeffery W. Johnston
Vice President and Publisher: Kevin M. Davis
Development Editor: Gail Gottfried
Editorial Assistant: Marisia Styles
Executive Field Marketing Manager: Krista Clark
Senior Product Marketing Manager: Christopher Barry
Project Manager: Lauren Carlson
Senior Acquisitions Editor, Global Edition: Sandhya Ghoshal
Senior Project Editor, Global Edition: Daniel Luiz
Manager, Media Production, Global Edition: M. Vikram Kumar
Senior Manufacturing Controller, Production, Global Edition: Trudy Kimber
Procurement Specialist: Carol Melville
Senior Art Director: Diane Lorenzo
Text Designer: Candace Rowley
Cover Designer: Lumina Datamatics
Cover Art: Shutterstock/Linda Bucklin
Media Project Manager: Tammy Walters
Full-Service Project Management: Integra
Composition: Integra

Credits and acknowledgments for material borrowed from other sources and reproduced, with permission, in this textbook appear on the appropriate page within the text.

Every effort has been made to provide accurate and current Internet information in this book. However, the Internet and information posted on it are constantly changing, so it is inevitable that some of the Internet addresses listed in this textbook will change.

Pearson Education Limited
Edinburgh Gate
Harlow
Essex CM20 2JE
England

and Associated Companies throughout the world

Visit us on the World Wide Web

© Pearson Education Limited 2016

The rights of Geoffrey E. Mills and Lorraine R. Gay to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.

Authorized adaptation from the United States edition, entitled Educational Research: Competencies for Analysis and Applications, 11th edition, ISBN 978-0-13-385938-6, by Geoffrey E. Mills and Lorraine R. Gay, published by Pearson Education © 2016.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without either the prior written permission of the publisher or a license permitting restricted copying in the United Kingdom issued by the Copyright Licensing Agency Ltd, Saffron House, 6–10 Kirby Street, London EC1N 8TS.

All trademarks used herein are the property of their respective owners. The use of any trademark in this text does not vest in the author or publisher any trademark ownership rights in such trademarks, nor does the use of such trademarks imply any affiliation with or endorsement of this book by such owners.

ISBN 10: 1292106174
ISBN 13: 9781292106175

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

10 9 8 7 6 5 4 3 2 1
14 13 12 11 10

Typeset in 10/12 ITC Garamond Std by Integra.
Printed and bound by RR Donnelley Kendallville in the United States of America.

Preface

New to This Edition

Like the tenth edition, the eleventh edition reflects a combination of both unsolicited and solicited input. Positive feedback suggested aspects of the text that should not be changed: the writing style and the focus on ethical practice, for example. Those aspects remain. However, for the first time in many years, the Table of Contents reflects a new organization for the book. Part I, Foundational Concepts and Processes, retains the same six chapters from the 10th edition, but Part II, Research Designs, includes all of the research design chapters that were previously separated into quantitative research designs and qualitative research designs. This reflects our decision to provide a comprehensive discussion of all the research designs before discussing data analysis and interpretation. Part III, Working with Quantitative and Qualitative Data, brings together discussions of descriptive statistics, inferential statistics, and qualitative data collection and analysis. The intent of this new section is to provide a comprehensive section on both quantitative and qualitative data analysis and interpretation that reflects the increasing application of mixed methods designs in educational research. Part IV, Reporting and Critiquing Research, effectively remains the same.

Content changes reflect the inclusion of new topics and the expansion or clarification of existing topics. There are many improvements in this edition, and we describe the more significant highlights here:

■ All research articles have been annotated and now include descriptive annotations (what is the researcher doing) and reflective/evaluative annotations (how did the researcher’s decisions support or challenge the chosen research design). These annotations will scaffold readers’ understanding by connecting the content of the chapters to the sample journal articles.
■ Chapter 1 (and subsequent chapters throughout the book) includes a new “Write Like a Researcher” feature that has been designed specifically with the purpose of encouraging new researchers to start writing early in the research process.
■ Chapter 3 has undergone significant revision because of the way technology has affected the literature review process. Changes include a Digital Research Tools feature on Google Books and Google Scholar, step-by-step directions for an ERIC EBSCO search that maximizes the power of university library consortium agreements to identify fully online journal articles, and a “Write Like a Researcher” feature that encourages new researchers to start their writing of the review of related literature very early in the research process.
■ Chapter 8 on experimental research has been significantly updated to reflect 21st century discussions about validity, effect size, power, and quasi-experimental designs.
■ Chapter 15 on mixed methods designs has been significantly updated to reflect the expansion of three basic and three advanced mixed methods designs currently being used in educational research settings.
■ The chapters on Descriptive and Inferential Statistics (now Chapters 17 and 18 in Part III, Working with Quantitative and Qualitative Data) have been updated to reflect new versions of SPSS and Excel.

In addition, we have added new tables and figures throughout the text. Every chapter has been edited and updated. References have been updated. Appendix A, which historically contained tables related to random numbers and so on, has been deleted and replaced with links throughout the book to online sources that provide the same information.

Philosophy and Purpose

This text is designed primarily for use in the introductory course in educational research that is a basic requirement for many graduate programs.

Because the topic coverage of the text is relatively comprehensive, it may be easily adapted for use in either a senior-level undergraduate course or a more advanced graduate-level course.

The philosophy that guided the development of the current and previous editions of this text was the conviction that an introductory research course should be more oriented toward skill and application than toward theory. Thus, the purpose of this text is for students to become familiar with research mainly at a “how-to” skill and application level. The authors do not mystify students with theoretical and statistical jargon. They strive to provide a down-to-earth approach that helps students acquire the skills and knowledge required of a competent consumer and producer of educational research. The emphasis is not just on what the student knows but also on what the student can do with what he or she knows. It is recognized that being a “good” researcher involves more than the acquisition of skills and knowledge; in any field, important research is usually produced by those who through experience have acquired insights, intuitions, and strategies related to the research process. Research of any worth, however, is rarely conducted in the absence of basic research skills and knowledge. A fundamental assumption of this text is that the competencies required of a competent consumer of research overlap considerably with those required of a competent producer of research. A person is in a much better position to evaluate the work of others after she or he has performed the major tasks involved in the research process.

Organization and Strategy

The overall strategy of the text is to promote students’ attainment of a degree of expertise in research through the acquisition of knowledge and by involvement in the research process.

Organization

In the eleventh edition, Part I “Foundational Concepts and Processes” includes discussion of the scientific and disciplined inquiry approach and its application in education. The main steps in the research process and the purpose and methods of the various research designs are discussed. In Part I, each student selects and delineates a research problem of interest that has relevance to his or her professional area. Throughout the rest of the text, the student then simulates the procedures that would be followed in conducting a study designed to investigate the research problem; each chapter develops a specific skill or set of skills required for the execution of such a research design. Specifically, the student learns about the application of the scientific method in education and the ethical considerations that affect the conduct of any educational research (Chapter 1), identifies a research problem and formulates hypotheses (Chapter 2), conducts a review of the related literature (Chapter 3), develops a research plan (Chapter 4), selects and defines samples (Chapter 5), and evaluates and selects measuring instruments (Chapter 6). Throughout these chapters are parallel discussions of quantitative and qualitative research constructs. This organization, with increased emphasis on ethical considerations in the conduct of educational research and the skills needed to conduct a comprehensive review of related literature, allows the student to see the similarities and differences in research designs and to understand more fully how the nature of the research question influences the selection of a research design. Part II “Research Designs” includes description and discussion of different quantitative research designs, qualitative research designs, mixed methods research designs, and action research designs. Part III “Working with Quantitative and Qualitative Data” includes two chapters devoted to the statistical approaches and the analysis and interpretation of quantitative data, and two chapters describing the collection, analysis, and interpretation of qualitative data. Part IV “Reporting and Critiquing Research” focuses on helping the student prepare a research report, either for the completion of a degree requirement or for publication in a refereed journal, and an opportunity for the student to apply the skills and knowledge acquired in Parts I through III to critique a research report.

Strategy

This text represents more than just a textbook to be incorporated into a course; it is a total instructional system that includes stated learning outcomes, instruction, and procedures for evaluating each outcome. The instructional strategy of the system emphasizes the demonstration of skills and individualization within this structure. Each chapter begins with a list of learning outcomes that describes the knowledge and skills that the student should gain from the chapter. In many instances, learning outcomes may be assessed either as written exercises submitted by students or by tests, whichever the instructor prefers. In most chapters, a task to be performed is described next. Tasks require students to demonstrate that they can perform particular research skills. Because each student works with a different research problem, each student demonstrates the competency required by a task as it applies to his or her own problem. With the exception of Chapter 1, an individual chapter is directed toward the attainment of only one task (occasionally, students have a choice between a quantitative and qualitative task).

Text discussion is intended to be as simple and straightforward as possible. Whenever feasible, procedures are presented as a series of steps, and concepts are explained in terms of illustrative examples. In a number of cases, relatively complex topics or topics beyond the scope of the text are presented at a very elementary level, and students are directed to other sources for additional, in-depth discussion. There is also a degree of intentional repetition; a number of concepts are discussed in different contexts and from different perspectives. Also, at the risk of eliciting more than a few groans, an attempt has been made to sprinkle the text with touches of humor, a hallmark of this text spanning three decades and perhaps best captured by the pictures and quotes that open each chapter. Each chapter includes a detailed, often lengthy summary with headings and subheadings directly parallel to those in the chapter. The summaries are designed to facilitate both the review and location of related text discussion. Finally, each chapter (or part) concludes with suggested criteria for evaluating the associated task and with an example of the task produced by a former introductory educational research student. Full-length articles, reprinted from the educational research literature, appear at the ends of all chapters presenting research designs and serve as illustrations of “real-life” research using that design. For the 11th edition all of these articles have been annotated with descriptive and evaluative annotations.

Supplementary Materials

The following resources are available for instructors to download from […]mills. Download the supplement you need. If you require assistance in downloading any resources, contact your Pearson representative.

Instructor’s Resource Manual With Test Bank

The Instructor’s Resource Manual with Test Bank is divided into two parts. The Instructor’s Resource Manual contains, for each chapter, suggested activities that have been effectively used in Educational Research courses, strategies for teaching, and selected resources to supplement the textbook content. The test bank contains multiple-choice items covering the content of each chapter, newly updated for this edition, and can be printed and edited or used with TestGen®.

TestGen®

TestGen is a powerful test generator available exclusively from Pearson Education publishers. You install TestGen on your personal computer and create your own tests for classroom testing and for other specialized delivery options, such as over a local area network or on the web. A test bank, which is also called a Test Item File (TIF), typically contains a large set of test items, organized by chapter and ready for your use in creating a test, based on the associated textbook material. Assessments may be created for both print and testing online.

PowerPoint® Slides

The PowerPoint® slides highlight key concepts and summarize text content to help students understand, organize, and remember core concepts and ideas. They are organized around chapter learning outcomes to help instructors structure class presentations.

Acknowledgments

I sincerely thank everyone who provided input for the development of this edition. The following individuals made thoughtful and detailed suggestions and comments for improving the eleventh edition: M.H. Clark, University of Central Florida; Anne Dahlman, Minnesota State University, Mankato; Dwight R. Gard, Texas Tech University; Jann W. MacInnes, University of Florida; Lauren Saenz, Boston College; and Rishi Sriram, Baylor University. These reviewers contributed greatly to the eleventh edition and their efforts are very much appreciated.

This edition benefited from the efforts of two editors: Kevin Davis and Gail Gottfried. A few words of thanks are in order here. For nearly 20 years I have been fortunate to work with Kevin Davis, Vice President and Publisher at Pearson. Kevin gave me my textbook start in 1997 when he offered me a contract to write Action Research: A Guide for the Teacher Researcher (now in its fifth edition). Kevin has taught me a great deal about writing, and I will always be indebted to him for trusting me with stewardship of this wonderful text. I have also been fortunate to work with my Developmental Editor, Gail Gottfried, for a number of years spanning both my action research and educational research books. My virtual relationship with Gail is remarkable. While we have never met face-to-face, I trust and respect all the contributions she has made to my work over the years. I benefit greatly from Gail’s creative thinking about how to make an educational research textbook meaningful and fun. Also at Pearson, Lauren Carlson ably shepherded the manuscript through development and production, responded to my cries for help, and kept me on track. An author does not take on the task of a major revision of a text of this magnitude without the commitment and support of excellent editors. Kevin and Gail were instrumental in the development of this edition and I sincerely thank them for their professionalism, patience, caring, and sense of humor.

I believe that I have made a positive contribution to this text, now my fourth edition, and added to the wisdom of earlier editions by L. R. Gay and Peter Airasian. Long-time users of the text will still “hear” Lorrie Gay’s voice throughout the text, but increasingly there is an Aussie accent and sense of humor creeping its way into the pages!

I wish to thank my friend and colleague Dr. Ken Kempner (Emeritus Professor, Southern Oregon University) for his thoughtful work on revising the descriptive and inferential statistics chapters and feedback on other quantitative chapters in the text.

Finally, I want to thank my best friend and wife, Dr. Donna Mills (Southern Oregon University), and my son, Jonathan, for their love, support, and patience. Their commitment to my work is always appreciated and never taken for granted. The completion of this edition signals another new era in my life as my son Jonathan completes his undergraduate degree and contemplates work and graduate school, and Donna prepares for retirement after a very successful university career. I continue to suggest to Jonathan that one day he may want to take over my books. While it is safe to say that he is less than excited by the prospect, his undergraduate experiences in the Clark Honors College at the University of Oregon and his study abroad experiences at the University of Oxford have seen his interest in research increase dramatically!

Geoff Mills
Southern Oregon University

Brief Contents

Part I  Foundational Concepts and Processes
  Chapter 1  Educational Research: Method, Purpose, and Ethics
  Chapter 2  Identifying and Stating a Research Problem
  Chapter 3  Literature Review
  Chapter 4  Preparing and Refining a Research Plan
  Chapter 5  Sampling
  Chapter 6  Constructs, Variables, and Tests

Part II  Research Designs
  Chapter 7  Survey Research
  Chapter 8  Correlational Research
  Chapter 9  Causal–Comparative Research
  Chapter 10  Experimental Research
  Chapter 11  Single-Subject Experimental Research
  Chapter 12  Narrative Research
  Chapter 13  Ethnographic Research
  Chapter 14  Case Study Research
  Chapter 15  Mixed Methods Research: Integrating Quantitative and Qualitative Research Designs
  Chapter 16  Action Research

Part III  Data in Research
  Chapter 17  Organizing and Graphing Data
  Chapter 18  Inferential Statistics
  Chapter 19  Fieldwork
  Chapter 20  Analyzing and Interpreting Data

Part IV  Writing and Evaluating Research Reports
  Chapter 21  Preparing and Publishing a Research Report
  Chapter 22  Analyzing and Critiquing Research


Contents

Part I  Foundational Concepts and Processes

Chapter 1  Educational Research: Method, Purpose, and Ethics
  Tasks 1A, 1B
  Task 1C
  Welcome!
  The Scientific Method
  Limitations of the Scientific Method
  Application of the Scientific Method in Education
  Different Approaches to Educational Research
  The Continuum of Research Philosophies
  Quantitative Research
  Qualitative Research
  Mixed Methods Research
  Characteristics of Quantitative and Qualitative Research Approaches
  Classification of Research by Design
  Quantitative Approaches
  Qualitative Approaches
  Classification of Research by Purpose
  Basic and Applied Research
  Evaluation Research
  Research and Development (R&D)
  Action Research
  The Ethics of Educational Research
  Informed Consent and Protection from Harm
  Deception
  Ethical Issues Unique to Qualitative Research
  Gaining Entry to the Research Site
  Summary
  Performance Criteria Task 1
  Tasks 1A and 1B
  Task 1C
  Task 1A Quantitative Example
  Task 1B Qualitative Example

Chapter 2  Identifying and Stating a Research Problem
  The Research Problem
  Identifying a Research Problem
  Sources of Research Problems
  Narrowing the Problem
  Characteristics of Good Problems
  Stating the Research Problem
  Developing Research Questions
  Formulating and Stating a Hypothesis
  Definition and Purpose of Hypotheses in Quantitative Studies
  Types of Hypotheses
  Stating the Hypothesis
  Testing the Hypothesis
  Definition and Purpose of Hypotheses in Qualitative Studies
  Summary

Chapter 3  Literature Review
  Task 2A
  Task 2B
  Review of Related Literature: Purpose and Scope
  Qualitative Research and the Review of Related Literature
  Identifying Keywords and Subject Terms, and Identifying, Evaluating, and Annotating Sources
  Identifying Keywords
  Identifying Your Sources
  Evaluating Your Sources
  Annotating Your Sources
  Analyzing, Organizing, and Reporting the Literature
  Meta-Analysis
  Summary
  Performance Criteria Task 2A and 2B
  Task 2 Example

Chapter 4  Preparing and Refining a Research Plan
  Task 3A
  Task 3B
  Definition and Purpose of a Research Plan
  Components of the Quantitative Research Plan
  Introduction Section
  Method Section
  Data Analysis
  Time Schedule
  Budget
  Components of the Qualitative Research Plan
  Prior Fieldwork
  Title
  Introduction Section
  Research Procedures Section
  Appendixes
  Revising and Improving the Research Plan
  Summary
  Performance Criteria Task 3
  Task 3 Example

Chapter 5  Sampling
  Task 4A
  Task 4B
  Sampling in Quantitative Research
  Defining a Population
  Selecting a Random Sample
  Determining Sample Size
  Avoiding Sampling Error and Bias
  Selecting a Nonrandom Sample
  Sampling in Qualitative Research
  Selecting Research Participants: Purposive Sampling Approaches
  Determining Sample Size
  Summary
  Performance Criteria Task 4
  Task 4A Example

Chapter 6  Constructs, Variables, and Tests
  Task 5
  Vignette: Big Pine School District
  Constructs
  Variables
  Measurement Scales and Variables
  Quantitative and Qualitative Variables
  Dependent and Independent Variables
  Characteristics of Measuring Instruments
  Instrument Terminology
  Quantitative and Qualitative Data Collection Methods
  Interpreting Instrument Data
  Types of Measuring Instruments
  Cognitive Tests
  Affective Tests
  Projective Tests
  Criteria for Good Measuring Instruments
  Validity of Measuring Instruments
  Reliability of Measuring Instruments
  Test Selection, Construction, and Administration
  Selecting a Test
  Sources of Test Information
  Selecting from Alternatives
  Constructing Tests
  Test Administration
  Summary
  Performance Criteria Task 5
  Task 5 Example

Part II  Research Designs

Chapter 7  Survey Research
  Task 6A
  Survey Research: Definition and Purpose
  Survey Research Designs
  Cross-Sectional Surveys
  Longitudinal Surveys
  Conducting Survey Research
  Conducting a Questionnaire Study
  Administering the Questionnaire
  Summary
  Example: Survey Study

Chapter 8  Correlational Research
  Task 6B
  Correlational Research: Definition and Purpose
  The Correlational Research Process
  Problem Selection
  Participant and Instrument Selection
  Design and Procedure
  Data Analysis and Interpretation
  Relationship Studies
  Data Collection
  Data Analysis and Interpretation
  Prediction Studies
  Data Collection
  Data Analysis and Interpretation
  Other Correlation-Based Analyses
  Problems to Consider in Interpreting Correlation Coefficients
  Summary
  Example: Correlational Study

Chapter 9  Causal–Comparative Research
  Task 6C
  Causal–Comparative Research: Definition and Purpose
  The Causal–Comparative Research Process
  Design and Procedure
  Control Procedures
  Data Analysis and Interpretation
  Summary
  Example: Causal–Comparative Study

Chapter 10  Experimental Research
  Task 6D
  Experimental Research: Definition and Purpose
  The Experimental Process
  Manipulation and Control
  Threats to Experimental Validity
  Threats to Internal Validity
  Threats to External Validity
  Control of Extraneous Variables
  Group Experimental Designs
  Single-Variable Designs
  Factorial Designs
  Summary

Chapter 11  Single-Subject Experimental Research
  Task 6E
  Single-Subject Experimental Designs
  Single-Subject versus Group Designs
  The Single-Variable Rule
  Types of Single-Subject Designs
  Data Analysis and Interpretation
  Threats to Validity
  External Validity
  Internal Validity
  Replication
  Summary
  Performance Criteria Task 6
  Task 6 Example
  Example: Single-Subject Study

Chapter 12  Narrative Research
  Task 7A
  Narrative Research: Definition and Purpose
  Types of Narrative Research
  Narrative Analysis and the Analysis of Narrative
  The Narrative Research Process
  Key Characteristics of Narrative Research
  Narrative Research Techniques
  Restorying
  Oral History
  Examining Photographs, Memory Boxes, and Other Artifacts
  Storytelling
  Letter Writing
  Autobiographical and Biographical Writing
  Other Narrative Data Sources
  Writing the Narrative
  Summary

Chapter 13  Ethnographic Research
  Task 7B
  Ethnographic Research: Definition and Purpose
  The Ethnographic Research Process
  Key Characteristics of Ethnographic Research
  Types of Ethnographic Research
  Ethnographic Research Techniques
  Triangulation
  Participant Observation
  Field Notes
  Observing and Recording Everything You Possibly Can
  Looking for Nothing in Particular; Looking for Bumps and Paradoxes
  Summary

Chapter 14  Case Study Research
  Task 7C
  Case Study Research: Definition and Purpose
  When to Use Case Study Research
  Characteristics of Case Study Research
  Case Study Research Design
  Sample Selection in Case Study Research
  Data Collection Techniques
  Conducting and Analyzing Multiple Case Studies
  Summary

Chapter 15  Mixed Methods Research: Integrating Quantitative and Qualitative Research Designs
  Task 7D
  Mixed Methods Research: Definition and Purpose
  Types of Mixed Methods Research Designs
  Basic Mixed Methods Designs
  Advanced Mixed Methods Research Designs
  Conducting Mixed Methods Research
  Identifying Studies Using Mixed Method Designs
  Evaluating a Mixed Methods Study
  Summary
  Performance Criteria Task 7
  Task 7 Example

Chapter 16  Action Research
  Task 8
  Action Research: Definition and Purpose
  Key Characteristics of Action Research
  Action Research Is Persuasive and Authoritative
  Action Research Is Relevant
  Action Research Is Accessible
  Action Research Challenges the Intractability of Reform of the Educational System
  Action Research Is Not a Fad
  Types of Action Research
  Critical Action Research
  Practical Action Research
  Levels of Action Research
  The Action Research Process
  Identifying an Area of Focus
  Data Collection, Analysis, and Interpretation
  Action Planning
  Summary
  Performance Criteria and Examples Task 8
  Write an Area-of-Focus Statement
  Define the Variables
  Develop Research Questions
  Describe the Intervention or Innovation
  Describe the Membership of the Action Research Group
  Describe Negotiations That Need to Be Undertaken
  Develop a Time Line
  Develop a Statement of Resources
  Develop Data Collection Ideas
  Example: Action Research

Part III  Data in Research

Chapter 17  Organizing and Graphing Data
  The Language of Statistics
  Preparing Data for Analysis
  Scoring Procedures
  Tabulation and Coding Procedures
  Types of Descriptive Statistics
  Frequencies
  Measures of Central Tendency
  Measures of Variability
  The Normal Curve
  Skewed Distributions
  Measures of Relative Position
  Measures of Relationship
  Graphing Data
  Postscript
  Summary

Chapter 18  Inferential Statistics
  Task 9
  Concepts Underlying Inferential Statistics
  Standard Error
  Hypothesis Testing
  Tests of Significance
  Two-Tailed and One-Tailed Tests
  Type I and Type II Errors
  Degrees of Freedom
  Selecting Among Tests of Significance
  The t Test
  Analysis of Variance
  Multiple Regression
  Chi Square
  Other Investigative Techniques: Data Mining, Factor Analysis, and Structural Equation Modeling
  Types of Parametric and Nonparametric Statistical Tests
  Summary
  Performance Criteria Task 9
  Task 9 Example

Chapter 19  Fieldwork
  Data Collection Sources and Techniques
  Observing
  Interviewing
  Questionnaires
  Examining Records
  Validity and Reliability in Qualitative Research
  Validity in Qualitative Research
  Reliability in Qualitative Research
  Getting Started
  Summary

Chapter 20  Analyzing and Interpreting Data
  Data Analysis and Interpretation: Definition and Purpose
  Data Analysis During Data Collection
  Data Analysis after Data Collection
  Steps in Analyzing Qualitative Research Data
  Reading/Memoing
  Describing
  Classifying
  Data Analysis Strategies
  Example of Coding an Interview
  Developing a Concept Map
  Qualitative Data Analysis: An Example
  Data Interpretation Strategies
  Ensuring Credibility in Your Study
  Summary

Part IV  Writing and Evaluating Research Reports

Chapter 21  Preparing and Publishing a Research Report
  Task 10
  Guidelines for Writing a Research Report
  Format and Style
  Formatting Theses and Dissertations
  Preliminary Pages
  The Main Body
  Writing for Journal Publication
  Summary
  Performance Criteria Task 10
  Task 10 Example

Chapter 22  Analyzing and Critiquing Research
  Task 11
  General Evaluation Criteria
  Introduction
  Method
  Results
  Discussion (Conclusions and Recommendations)
  Abstract or Summary
  Design-Specific Evaluation Criteria
  Survey Research
  Correlational Research
  Causal–Comparative Research
  Experimental Research
  Single-Subject Research
  Qualitative Research (in General)
  Evaluating Validity and Reliability in Qualitative Studies
  Narrative Research
  Ethnographic Research
  Case Study Research
  Mixed Methods Research
  Action Research
  Summary
  Performance Criteria Task 11
  Task 11 Example

Appendix A  Statistical References
Appendix B  Suggested Responses
Glossary
Name Index
Subject Index


Research Articles

Chapter 1
Can Instructional and Emotional Support in the First-Grade Classroom Make a Difference for Children at Risk of School Failure?
Developing Teacher Epistemological Sophistication About Multicultural Curriculum: A Case Study

Chapter 7
To What Extent Are Literacy Initiatives Being Supported: Important Questions for Administrators

Chapter 8
Parental Involvement and Its Influence on the Reading Achievement of 6th Grade Students

Chapter 9
Comparing Longitudinal Academic Achievement of Full-Day and Half-Day Kindergarten Students

Chapter 10
Effects of Mathematical Word Problem–Solving Instruction on Middle School Students with Learning Problems

Chapter 11
Effects of Functional Mobility Skills Training for Young Students with Physical Disabilities

Chapter 12
For Whom the School Bell Tolls: Conflicting Voices Inside an Alternative High School

Chapter 13
Preparing Preservice Teachers in a Diverse World

Chapter 14
Using Community as a Resource for Teacher Education: A Case Study

Chapter 15
How Should Middle-School Students with LD Approach Online Note Taking? A Mixed Methods Study

Chapter 16
“Let’s Talk”: Discussions in a Biology Classroom: An Action Research Project

Chapter 22
Gender and Race as Variables in Psychosocial Adjustment to Middle and High School


Chapter One
Educational Research: Method, Purpose, and Ethics

Little Heroes 3, 2002

“Despite a popular stereotype that depicts researchers as spectacled, stoop-shouldered, elderly gentlemen who endlessly add chemicals to test tubes, every day thousands of men and women of all ages, shapes, and sizes conduct educational research in a wide variety of settings.” (p. 21)

Learning Outcomes

After reading Chapter 1, you should be able to do the following:

1. Briefly describe the reasoning involved in the scientific method.
2. Explain why researchers would use quantitative, qualitative, mixed methods, or action research designs to address a specific research problem.
3. Briefly define and state the major characteristics of these research designs: survey, correlational, causal–comparative, experimental, single-subject, narrative, ethnographic, case study, mixed methods, and action research.
4. Explain the purposes of basic research, applied research, evaluation research, research and development (R&D), and action research.
5. Explain the ethical obligations that educational researchers have and describe the codes and procedures they must follow to ensure they adhere to them.

Completing Chapter 1 should enable you to perform the following tasks:

Tasks 1A, 1B
Identify and briefly state the following for both research studies at the end of this chapter:
1. The research design
2. The rationale for the choice of the research design
3. The major characteristics of the research design, including research procedures, method of analysis, and major conclusions
4. Ethical issues the authors experienced and how they were addressed
(See Performance Criteria, p. 51.)

Task 1C
Classify given research studies based on their characteristics and purposes. (See Performance Criteria, p. 51.)

Welcome!

If you are taking a research course because it is required in your program of studies, raise your right hand. If you are taking a research course because it seems like it will be a really fun elective, raise your left hand. We thought you may not be here of your own free will. Although you may be required to take this course, you are not the innocent victim of one or more sadists. Your professors have several legitimate reasons for believing this research course is an essential component of your education.

First, educational research findings contribute significantly to both educational theory and educational practice. As a professional, you need to know how to find, understand, and evaluate these findings. And when you encounter research findings in professional publications or in the media, you have a responsibility, as a professional, to distinguish between legitimate and ill-founded research claims. Second, although many of you will be primarily critical consumers of research, some of you will decide to become educational researchers. A career in research opens the door to a variety of employment opportunities in universities, research centers, and business and industry.

Despite a popular stereotype that depicts researchers as spectacled, stoop-shouldered, elderly gentlemen (a stereotype I am rapidly approaching!) who endlessly add chemicals to test tubes, every day thousands of men and women of all ages and postures conduct educational research in a wide variety of settings. Every year many millions of dollars are spent in the quest for knowledge related to teaching and learning. Educational research has contributed many findings concerning principles of behavior, learning, and retention of knowledge, many of which can also be applied to curriculum, instruction, instructional materials, and assessment techniques. Both the quantity and the quality of research are increasing, partly because researchers are better trained. Educational research classes have become core components of preservice teacher education programs, as well as the cornerstone of advanced degree programs.

We recognize that educational research is a relatively unfamiliar discipline for many of you. Our first goals, then, are to help you acquire a general understanding of research processes and to help you develop the perspective of a researcher. We begin by examining the scientific method.

The Scientific Method

What is knowledge? And how do we come to “know” something? Experience is certainly one of the fundamental ways we come to know about and understand our world. For example, a child who touches something hot learns that high heat hurts. We know other things because a trusted authority, such as a parent or a teacher, told us about them. Most likely, much of your knowledge of current world events comes secondhand, from things you have read or heard from a source you trust.

Another way we come to know something is through thinking, through reasoning. Reasoning refers to the process of using logical thought to reach a conclusion. We can reason inductively or deductively. Inductive reasoning involves developing generalizations based on observation of a limited number of related events or experiences. Consider the following example of inductive reasoning:

Observation: An instructor examines five research textbooks. Each contains a chapter about sampling.

Generalization: The instructor concludes that all research textbooks contain a chapter about sampling.

Deductive reasoning involves essentially the reverse process, arriving at specific conclusions based on general principles, observations, or experiences (i.e., generalizations), as shown in the next example.

Observations: All research textbooks contain a chapter on sampling. The book you are reading is a research text.

Generalization: This book must contain a chapter on sampling. (Does it?)

Although people commonly use experience, authority, inductive reasoning, and deductive reasoning to learn new things and draw new conclusions from that knowledge, each of these approaches to understanding has limitations when used in isolation. Some problems associated with experience and authority as sources of knowledge are graphically illustrated in a story told about Aristotle. According to the story, one day Aristotle caught a fly and carefully counted and recounted the legs. He then announced that flies have five legs. No one questioned the word of Aristotle. For years his finding was accepted uncritically. Unfortunately, the fly that Aristotle caught just happened to be missing a leg! Whether or not you believe the story, it illustrates the limitations of relying on personal experience and authority as sources of knowledge.

The story also points out a potential problem with inductive reasoning: Generalizing from a small sample, especially one that is atypical, can lead to errors. Deductive reasoning, too, is limited by the evidence in the original observations. If every research text really does have a chapter on sampling, and if this book really is a research text, then it follows that this book must have a chapter on sampling. However, if one or more of the premises is false (perhaps some research texts do not have a chapter on sampling), your conclusion may also be wrong.

When we rely exclusively on these common approaches to knowing, the resulting knowledge is susceptible to error and may be of limited value to understanding the world beyond our immediate experience. However, experience, authority, and inductive and deductive reasoning are very effective when used together as integral components of the scientific method. The scientific method is an orderly process entailing a number of steps: recognition and definition of a problem, formulation of hypotheses, collection of data, analysis of data, and statement of conclusions regarding confirmation or disconfirmation of the hypotheses (i.e., a researcher forms a hypothesis, an explanation for the occurrence of certain behaviors, phenomena, or events, as a way of predicting the results of a research study and then collects data to test that prediction). These steps can be applied informally to solve everyday problems such as the most efficient route to take from home to work or school, the best time to go to the bank, or the best kind of computer to purchase. The more formal application of the scientific method is standard in research; it is more efficient and more

reliable than relying solely on experience, authority, inductive reasoning, and deductive reasoning as sources of knowledge.

Limitations of the Scientific Method

The steps in the scientific method guide researchers in planning, conducting, and interpreting research studies. However, it is important to recognize some limitations of the method. First, the scientific method cannot answer all questions. For example, applying the scientific method will not resolve the question “Should we legalize euthanasia?” The answers to questions like this one are influenced by personal philosophy, values, and ethics.

Second, application of the scientific method can never capture the full richness of the individuals and the environments under study. Although some applications of the method lead to deeper understanding of the research context than others, no application (and in fact no research approach) provides full comprehension of a site and its inhabitants. No matter how many variables one studies or how long one is immersed in a research context, other variables and aspects of context will remain unexamined. Thus, the scientific method and, indeed, all types of inquiry give us a simplified version of reality.

Third, our measuring instruments always have some degree of error. The variables we study are often proxies for the real behavior we seek to examine. For example, even if we use a very precisely constructed multiple-choice test to assess a person’s values, we will likely gather information that gives us a picture of that person’s beliefs about his or her values. However, we aren’t likely to have an adequate picture of how that person acts, which may be the better reflection of the person’s real values.

More broadly, all educational inquiry, not just the scientific method, is carried out with the cooperation of participants who agree to provide researchers with data. Because educational researchers deal with human beings, they must consider a number of ethical concerns and responsibilities to the participants. For example, they must shelter participants from real or potential harm. They must inform participants about the nature of the planned research and address the expectations of the participants. These factors can limit and skew results. All these limitations will be addressed in later sections of this book.

Application of the Scientific Method in Education

Research is the formal, systematic application of the scientific method to the study of problems; educational research is the formal, systematic application of the scientific method to the study of educational problems. The goal of educational research is essentially the same as the goal of all science: to describe, explain, predict, or control phenomena, in this case educational phenomena. As we mentioned previously, it can be quite difficult to describe, explain, predict, and control situations involving human beings, who are by far the most complex of all organisms. So many factors, known and unknown, operate in any educational environment that it can be extremely difficult to identify specific causes of behaviors or to generalize or replicate findings. The kinds of rigid controls that can be established and maintained in a biochemistry laboratory, for instance, are impossible in an educational setting. Even describing behaviors, based on observing people, has limits. Observers may be subjective in recording behaviors, and people who are observed may behave atypically just because they are being watched. Chemical reactions, on the other hand, are certainly not aware of being observed! Nevertheless, behavioral research should not be viewed as less scientific than natural science research conducted in a lab.

Despite the difficulty and complexity of applying the scientific method in educational settings, the steps of the scientific method used by educational researchers are the same as those used by researchers in other more easily controlled settings:

1. Selection and definition of a problem. A problem is a question of interest that can be tested or answered through the collection and analysis of data. Upon identifying a research question, researchers typically review previously published research on the same topic and use that information to hypothesize about the results. In other words, they make an educated guess about the answer to the question.

2. Execution of research procedures. The procedures reflect all the activities involved in collecting data related to the problem

(e.g., how data are collected and from whom). To a great extent, the specific procedures are dictated by the research question and the variables involved in the study.

3. Analysis of data. Data are analyzed in a way that permits the researcher to test the research hypothesis or answer the research question. Analysis usually involves application of one or more statistical techniques. For some studies, data analysis involves verbal synthesis of narrative data; these studies typically involve new insights about the phenomena in question, generate hypotheses for future research, or both.

4. Drawing and stating conclusions. The conclusions, which should advance our general knowledge of the topic in question, are based on the results of data analysis. They should be stated in terms of the original hypothesis or research question. Conclusions should indicate, for example, whether the research hypothesis was supported or not. For studies involving verbal synthesis, conclusions are much more tentative.

Different Approaches to Educational Research

All educational inquiry ultimately involves a decision to study or describe something: to ask some question and seek an answer. All educational inquiry necessitates that data of some kind be collected, that the data be analyzed in some way, and that the researcher come to some conclusion or interpretation. In other words, all educational inquiry shares the same four basic actions we find in the scientific method. However, it is not accurate to say that all educational research is an application of the scientific method. Important differences exist between the types of problems researchers investigate and the questions they ask, the types of data they collect, the form of data analysis, and the conclusions that the researcher can draw meaningfully and with validity.

The Continuum of Research Philosophies

Historically, educational researchers used approaches that involved the use of the scientific method. However, over the last four decades, researchers have adopted diverse philosophies toward their research. Now, there are certain philosophical assumptions that underpin an educational researcher’s decision to conduct research. These philosophical assumptions address issues related to the nature of reality (ontology), how researchers know what they know (epistemology), and the methods used to study a particular phenomenon (methodology), with an emphasis on quantitative or qualitative methods. As Creswell¹ notes, historically, researchers compared the philosophical assumptions that underpinned qualitative and quantitative research approaches in order to establish the legitimacy of qualitative research, but given the evolution of qualitative and quantitative research over the past four decades, there is no longer any need to justify one set of philosophical assumptions over another set of assumptions.

¹ Creswell, J. W. (2013). Qualitative Inquiry & Research Design: Choosing Among Five Approaches (3rd ed.). Thousand Oaks, CA: Sage.

Quantitative Research

Educational researchers have also followed well-defined, widely accepted procedures for stating research topics, carrying out the research process, analyzing the resulting data, and verifying the quality of the study and its conclusions. Often, these research procedures are based on what has come to be known as a quantitative approach to conducting and obtaining educational understandings. The quantitative framework in educational research involves the application of the scientific method to try to answer questions about education. At the end of this chapter you will find an example of quantitative research published in Child Development (a refereed journal): “Can Instructional and Emotional Support in the First-Grade Classroom Make a Difference for Children at Risk of School Failure?” (Hamre & Pianta, 2005). As this title suggests, this research investigates the ways in which children’s risk of school failure may be moderated by instructional and emotional support from teachers.

Quantitative research is the collection and analysis of numerical data to describe, explain, predict, or control phenomena of interest. Part II of the text will address in detail specific quantitative research designs that satisfy the assumptions

underpinning a quantitative approach to research. A quantitative research approach entails more than just the use of numerical data. At the outset of a study, quantitative researchers state the hypotheses to be examined and specify the research procedures that will be used to carry out the study. They also maintain control over contextual factors that may interfere with the data collection and identify a sample of participants large enough to provide statistically meaningful data. Many quantitative researchers have little personal interaction with the participants they study because they frequently collect data using paper-and-pencil, noninteractive instruments. The analysis of numerical data can be complex, but it can be addressed systematically, and Part III of the text will provide a detailed description of how to work with quantitative data.

Underlying quantitative research methods is the philosophical belief or assumption that we inhabit a relatively stable, uniform, and coherent world that we can measure, understand, and generalize about. This view, adopted from the natural sciences, implies that the world and the laws that govern it are somewhat predictable and can be understood by scientific research and examination. In this quantitative perspective, claims about the world are not considered meaningful unless they can be verified through direct observation.

Qualitative Research

Qualitative research is the collection, analysis, and interpretation of comprehensive narrative and visual (i.e., non-numerical) data to gain insights into a particular phenomenon of interest. Part II of the text will address in detail specific qualitative research designs that satisfy the underpinning assumptions of a qualitative approach to research. Qualitative research approaches are based on different beliefs and designed for different purposes than quantitative research approaches. For example, qualitative researchers do not necessarily accept the view of a stable, coherent, uniform world. They argue that all meaning is situated in a particular perspective or context, and because different people and groups often have different perspectives and contexts, the world has many different meanings, none of which is necessarily more valid or true than another.

Qualitative research approaches tend to evolve as understanding of the research context and participants deepens (think back to the discussion of inductive reasoning). As a result, qualitative researchers often avoid stating hypotheses before data are collected, and they may examine a particular phenomenon without a guiding statement about what may or may not be true about that phenomenon or its context. However, qualitative researchers do not enter a research setting without any idea of what they intend to study. Rather, they commence their research with “foreshadowed problems.”² This difference is important: quantitative research usually tests a specific hypothesis; qualitative research often does not.

² Argonauts of the Western Pacific (p. 9), by B. Malinowski, 1922. London: Routledge.

Additionally, in qualitative research, context is not controlled or manipulated by the researcher. The effort to understand the participants’ perspective requires researchers using qualitative methods to interact extensively and intimately with participants during the study, using time-intensive data collection methods such as interviews and observations. As a result, the number of participants tends to be small, and qualitative researchers analyze the data inductively by categorizing and organizing it into patterns that produce a descriptive, narrative synthesis.

Qualitative research differs from quantitative research in two additional ways: (1) Qualitative research often involves the simultaneous collection of a wealth of narrative and visual data over an extended period of time, and (2) as much as is possible, data collection occurs in a naturalistic setting. In quantitative studies, in contrast, research is most often conducted in researcher-controlled environments under researcher-controlled conditions, and the activities of data collection, analysis, and writing are separate, discrete activities. Because qualitative researchers strive to study people and events in their naturalistic settings, qualitative research is sometimes referred to as naturalistic research, naturalistic inquiry, or field-oriented research.

These two characteristics of qualitative research, the simultaneous study of many aspects of a phenomenon and the attempt to study things as they exist naturally, help in part to explain the growing enthusiasm for qualitative research in education, especially in applied teacher practitioner–oriented research. Some researchers and educators

feel that certain kinds of educational problems and questions do not lend themselves well to quantitative methods, which use principally numerical analysis and try to control variables in very complex environments. As qualitative researchers point out, findings should be derived from research conducted in real-world settings to have relevance to real-world settings.

At the end of this chapter, you will find an example of qualitative research published in Action in Teacher Education (a refereed journal): “Developing Teacher Epistemological Sophistication about Multicultural Curriculum: A Case Study” (Sleeter, 2009). This research investigates how teachers’ thinking about curriculum develops during a teacher preparation program and how the lessons from the case study might inform teacher education pedagogy. And, of course, the use of the word epistemological in the title introduces you to the language of educational research!

Mixed Methods Research

Mixed methods research combines quantitative and qualitative approaches by including both quantitative and qualitative data in a single study. The purpose of mixed methods research is to build on the synergy and strength that exists between quantitative and qualitative research approaches to understand a phenomenon more fully than is possible using either quantitative or qualitative approaches alone. Chapter 15 will describe in detail six mixed methods research designs (convergent-parallel, explanatory, exploratory, experimental, social justice, and multistage evaluation). However, the basic differences among the designs are related to the priority given to the following areas:

■ the type of data collected (i.e., qualitative and quantitative data are of equal weight, or one type of data has greater weight than the other)
■ the sequence of data collection (i.e., both types of data are collected during the same time period, or one type of data is collected in each sequential phase of the project)
■ the analysis techniques (i.e., either an analysis that combines the data or one that keeps the two types of data separate).

Characteristics of Quantitative and Qualitative Research Approaches

Earlier in this chapter, we presented four general, conceptual research steps used in the scientific method. In this section we expand the number of steps to six, which are followed by both quantitative researchers and qualitative researchers. As we discuss in subsequent chapters in Part II, however, the application of the steps differs depending on the research design. For example, the research procedures in qualitative research are often less rigid than those in quantitative research. Similarly, although both quantitative and qualitative researchers collect data, the nature of the data differs. Figure 1.1 compares the six steps of qualitative and quantitative research approaches and lists traits that characterize each approach at every step:

1. Identifying a research topic. Often the initial topic is narrowed to be more manageable.
2. Reviewing the literature. The researcher examines existing research to identify useful information and strategies for carrying out the study.
3. Selecting participants. Participants are purposefully selected (i.e., not randomly selected) and are usually fewer in number than in quantitative research.
4. Collecting data. Qualitative data tend to be gathered from interviews, observations, and artifacts.
5. Analyzing and interpreting data. The researcher analyzes the themes and general tendencies and provides interpretations of the data.
6. Reporting and evaluating the research. The researcher summarizes and integrates the qualitative data in narrative and visual form.

Table 1.1 provides another snapshot of quantitative and qualitative research characteristics. Despite the differences between them, you should not consider quantitative and qualitative research approaches to be oppositional. Taken together, they represent the full range of educational research designs. The terms quantitative and qualitative are used to differentiate one approach from the other conveniently. If you see yourself as a positivist, holding the belief that

qualities of natural phenomena must be verified by evidence before they can be considered knowledge, that does not mean you cannot use or learn from qualitative research methods. The same holds true for nonpositivist, phenomenologist qualitative researchers. Depending on the nature of the question, topic, or problem to be investigated, one of these approaches will generally be more appropriate than the other, although selecting a primary approach does not preclude borrowing from the other. In fact, both may be utilized in the same studies, as when the administration of a (quantitative) questionnaire is followed by a small number of detailed (qualitative) interviews to obtain deeper explanations for the numerical data.

Figure 1.1 • Characteristics of quantitative and qualitative research

Steps in the Process of Research, with quantitative and qualitative characteristics at each step:

Identifying a Research Problem
  Quantitative: Description and explanation-oriented
  Qualitative: Exploratory and understanding-oriented

Reviewing the Literature
  Quantitative: Major role; justification for the research problem and specification for the need for the study
  Qualitative: Minor role; justification for the research problem

Selecting Participants/Sample
  Quantitative: Specific and narrow; measurable, observable data
  Qualitative: General and broad; participants' experiences

Collecting Data
  Quantitative: Predetermined instruments; numeric (numbered) data; large number of individuals
  Qualitative: Emerging protocols; text or image data; small number of individuals or sites

Analyzing and Interpreting Data
  Quantitative: Statistical analysis; description of trends, comparison of groups, or relationships among variables; a comparison of results with predictions and past studies
  Qualitative: Text analysis; description, analysis, and thematic development; the larger meaning of findings

Reporting and Evaluating Research
  Quantitative: Standard and fixed; objective and unbiased
  Qualitative: Flexible and emerging; reflexive and biased

Source: Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research (5th ed.), (pp. 20, 464, 504, 541), by Creswell, John W., © 2015. Reprinted by permission of Pearson Education, Inc., Upper Saddle River, NJ.

Table 1.1 • Overview of qualitative and quantitative research characteristics

Type of data collected
  Quantitative research: Numerical data
  Qualitative research: Non-numerical narrative and visual data

Research problem
  Quantitative research: Hypothesis and research procedures stated before beginning the study
  Qualitative research: Research problems and methods evolve as understanding of topic deepens

Manipulation of context
  Quantitative research: Yes
  Qualitative research: No

Sample size
  Quantitative research: Larger
  Qualitative research: Smaller

Research procedures
  Quantitative research: Relies on statistical procedures
  Qualitative research: Relies on categorizing and organizing data into patterns to produce a descriptive, narrative synthesis

Participant interaction
  Quantitative research: Little interaction
  Qualitative research: Extensive interaction

Underlying belief
  Quantitative research: We live in a stable and predictable world that we can measure, understand, and generalize about.
  Qualitative research: Meaning is situated in a particular perspective or context that is different for people and groups; therefore, the world has many meanings.

Classification of Research by Design

A research design comprises the overall strategy followed in collecting and analyzing data. Although there is some overlap, most research studies follow a readily identifiable design. The largest distinction we can make in classifying research by design is the distinction between quantitative and qualitative approaches. Quantitative and qualitative research approaches, in turn, include several distinct types or designs with a focus on unique research problems.

Quantitative Approaches

Quantitative research approaches are applied to describe current conditions, investigate relations, and study cause–effect phenomena. Survey research is often designed to describe current conditions. Studies that investigate the relations between two or more variables are correlational research. Experimental studies and causal–comparative studies provide information about cause–effect outcomes. Studies that focus on the behavior change an individual exhibits as a result of some intervention fall under the heading of single-subject research.

Survey Research

Survey research determines and reports the way things are; it involves collecting numerical data to test hypotheses or answer questions about the current status of the subject of study. One common type of survey research involves assessing the preferences, attitudes, practices, concerns, or interests of a group of people. A pre-election political poll and a survey about community members’ perception of the quality of the local schools are examples. Survey research data are mainly collected through questionnaires, interviews, and observations.

Although survey research sounds very simple, there is considerably more to it than just asking questions and reporting answers. Because researchers often ask questions that have not been asked before, they usually have to develop their own measuring instrument for each survey study. Constructing questions for the intended respondents requires clarity, consistency, and tact. Other

Find more at chapter 1  •  Educational Research: Method, Purpose, and Ethics 29major challenges facing survey researchers are The purpose of a correlational study may beparticipants’ failure to return questionnaires, their to establish relations or use existing relations towillingness to be surveyed over the phone, and make predictions. For example, a college admis-their ability to attend scheduled interviews. If the sions director may be interested in answeringresponse rate is low, then valid, trustworthy con- the question “How do the SAT scores of highclusions cannot be drawn. For example, suppose school seniors correspond to the students’ first-you are doing a study to determine the attitudes semester college grades?” If students’ SAT scoresof principals toward research in their schools. You are strongly related to their first-semester grades,send a questionnaire to 100 principals and include SAT scores may be useful in predicting how stu-the question “Do you usually cooperate if your dents will perform in their first year of college. Onschool is asked to participate in a research study?” the other hand, if there is little or no correlationForty principals respond, and they all answer “Yes.” between the two variables, SAT scores likely willIt’s certainly a mistake to conclude that princi- not be useful as predictors.pals in general cooperate. Although all those whoresponded said yes, those 60 principals who did Correlation refers to a quantitative measurenot respond may never cooperate with researchers. of the degree of correspondence. The degree toAfter all, they didn’t cooperate with you! Without which two variables are related is expressed asmore responses, it is not possible to make gener- a correlation coefficient, which is a numberalizations about how principals feel about research between +1.00 and −1.00. Two variables that arein their schools. not related have a correlation coefficient near 0.00. 
Two variables that are highly correlated will Following are examples of questions that can have a correlation coefficient near +1.00 or − investigated in survey research studies, along A number near +1.00 indicates a positive correla-with typical research designs: tion: As one variable increases, the other variable also increases (e.g., students with high SAT scores■ How do second-grade teachers spend their may also have high grade point averages [GPAs]). teaching time? Second-grade teachers are A number near −1.00 indicates a negative corre- asked to fill out questionnaires, and results lation: As one variable increases, the other vari- are presented as percentages (e.g., teachers able decreases (e.g., a high GPA may correlate spent 50% of their time lecturing, 20% asking negatively with the likelihood of dropping out). or answering questions, 20% in discussion, and Because very few pairs of variables are perfectly 10% providing individual student help). correlated, predictions based on them are rarely +1.0 or −1.0.■ How will citizens of Yourtown vote in the next school board election? A sample of Yourtown It is very important to note that the results citizens complete a questionnaire or interview, of correlational studies do not suggest cause– and results are presented as percentages (e.g., effect relations among variables. Thus, a posi- 70% said they will vote for Peter Pure, 20% tive correlation between, for example, self-concept named George Graft, and 10% are undecided). and achievement does not imply that self-concept Survey research is described in more detail in causes achievement or that achievement causes Chapter 7. self-concept. The correlation indicates only that students with higher self-concepts tend to haveCorrelational Research higher levels of achievement and that students with lower self-concepts tend to have lower levelsCorrelational research involves collecting data to of achievement. 
We cannot conclude that one vari-determine whether, and to what degree, a relation able is the cause of the other.exists between two or more quantifiable variables.A variable is a placeholder that can assume any Following are examples of research questionsone of a range of values; for example, intelligence, tested with correlational studies:height, and test score are variables. At a minimum,correlational research requires information about ■ What is the relation between intelligence andat least two variables obtained from a single group self-esteem? Scores on an intelligence testof participants. and a measure of self-esteem are acquired

Find more at www.downloadslide.com30 chapter 1  •  Educational Research: Method, Purpose, and Ethics from each member of a given group. The two A weakness of causal–comparative studies is sets of scores are analyzed, and the resulting that, because the cause under study has already coefficient indicates the degree of correlation. occurred, the researcher has no control over it. For■ Does an algebra aptitude test predict success example, suppose a researcher wanted to investi- in an algebra course? Scores on the algebra gate the effect of heavy smoking on lung cancer aptitude test are correlated with final exam and designs a study comparing the frequency of scores in the algebra course. If the correlation lung cancer diagnoses in two groups, long-time is high, the aptitude test is a good predictor of smokers and nonsmokers. Because the groups success in algebra. are preexisting, the researcher did not control the conditions under which the participants smoked Correlational research is described in detail in or did not smoke (this lack of researcher control isChapter 8. why the variable is known as a grouping variable rather than an independent variable). Perhaps aCausal–Comparative Research large number of the long-time smokers lived in a smoggy, urban environment, whereas only a fewCausal–comparative research attempts to deter- of the nonsmokers were exposed to those condi-mine the cause, or reason, for existing differences tions. In that case, attempts to draw cause–effectin the behavior or status of groups of individuals. conclusions in the study would be tentative at best.The cause is a behavior or characteristic believed Is it smoking that causes higher rates of lung can-to influence some other behavior or characteris- cer? Is it living in a smoggy, urban environment?tic and is known as the grouping variable. The Or is it some unknown combination of smokingchange or difference in a behavior or character- and environment? 
A clear cause–effect link cannotistic that occurs as a result—that is, the effect—is be obtained.known as the dependent variable. Put simply,causal–comparative research attempts to establish Although causal–comparative research pro-cause–effect relations among groups. duces limited cause–effect information, it is an important form of educational research. True Following are examples of research questions cause–effect relations can be determined onlytested with causal–comparative studies (note that through experimental research (discussed in thethe word is causal, not casual): next section), in which the researcher maintains control of an independent variable; but in many■ How does preschool attendance affect social cases, an experimental study is inappropriate or maturity at the end of the first grade? The unethical. The causal–comparative approach is grouping variable is preschool attendance chosen precisely because the grouping variable (i.e., the variable can take one of two values— either cannot be manipulated (e.g., as with gen- students attending preschool and students not der, height, or year in school) or should not be attending); the dependent variable, or effect, is manipulated (e.g., as with smoking or prena- social maturity at the end of the first grade. The tal care). For example, to conduct the smoking researcher identifies a group of first-graders study as an experiment, a researcher would need who attended preschool and a group who did to select a large number of participants who not, gathers data about their social maturity, had never smoked and divide them into two and then compares the two groups. groups, one directed to smoke heavily and one forbidden to smoke. Obviously, such a study is■ How does having a working mother affect unethical because of the potential harm to those a child’s school absenteeism? The grouping forced to smoke. 
A causal–comparative study, variable is the employment status of the mother which approximates cause–effect results without (again with two possible values—the mother harming the participants, is the only reasonable works or does not work); the dependent approach. Like descriptive and correlational stud- variable is absenteeism, measured as number of ies, however, causal–comparative research does days absent. The researcher identifies a group not produce true experimental research outcomes. of students who have working mothers and Causal-comparative research is described in detail a group whose mothers do not work, gathers in Chapter 9. information about their absenteeism, and compares the groups.
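To make the correlation coefficient discussed under Correlational Research more concrete, the following minimal sketch computes one for two small sets of scores. It assumes the Pearson product-moment coefficient, which is one common choice; the text does not say which coefficient a given study would use. The function name `pearson_r` and all of the scores are hypothetical and were invented for illustration only.

```python
from math import sqrt


def pearson_r(x, y):
    """Return the Pearson correlation coefficient for two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of the cross-products of each pair's deviations from the two means
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sum of squared deviations for each variable
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable does not vary")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical data for six students (not taken from any real study)
sat_scores = [1050, 1120, 1190, 1260, 1330, 1400]
first_semester_gpa = [2.4, 2.9, 2.7, 3.3, 3.1, 3.8]
days_absent = [14, 11, 12, 7, 9, 3]

r_sat_gpa = pearson_r(sat_scores, first_semester_gpa)
r_gpa_absent = pearson_r(first_semester_gpa, days_absent)

print(f"SAT vs. first-semester GPA: r = {r_sat_gpa:+.2f}")
print(f"GPA vs. days absent:        r = {r_gpa_absent:+.2f}")
```

With these invented scores, the first coefficient is positive (higher SAT scores go with higher GPAs) and the second is negative (higher GPAs go with fewer days absent). As the chapter stresses, neither value says anything about which variable, if either, causes the other.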

Experimental Research

In experimental research, at least one independent variable is manipulated, other relevant variables are controlled, and the effect on one or more dependent variables is observed. True experimental research provides the strongest results of any of the quantitative research approaches because it provides clear evidence for linking variables. As a result, it also offers generalizability, or applicability of findings to settings and contexts different from the one in which they were obtained.

Unlike causal–comparative researchers, researchers conducting an experimental study can control an independent variable. They can select the participants for the study, divide the participants into two or more groups that have similar characteristics at the start of the research experiment, and then apply different treatments to the selected groups. They can also control the conditions in the research setting, such as when the treatments will be applied, by whom, for how long, and under what circumstances. Finally, the researchers can select tests or measurements to collect data about any changes in the research groups. The selection of participants from a single pool of participants and the ability to apply different treatments or programs to participants with similar initial characteristics permit experimental researchers to draw conclusions about cause and effect. The essence of experimentation is control, although in many education settings it is not possible or feasible to meet the stringent control conditions required by experimental research.

Following are examples of research questions that are explored with experimental studies:

■ Is personalized instruction from a teacher more effective for increasing students’ computational skills than computer instruction? The independent variable is type of instruction (with two values: personalized instruction and computer instruction); the dependent variable is computational skills. A group of students who have never experienced either personalized teacher instruction or computer instruction are selected and randomly divided into two groups, each taught by one of the methods. After a predetermined time, the students’ computational skills are measured and compared to determine which treatment, if either, produced higher skill levels.
■ Is there an effect of reinforcement on students’ attitude toward school? The independent variable is type of reinforcement (with three values: positive, negative, or no reinforcement); the dependent variable is attitude toward school. The researcher randomly forms three groups from a single large group of students. One group receives positive reinforcement, another negative reinforcement, and the third no reinforcement. After the treatments are applied for a predetermined time, student attitudes toward school are measured and compared for each of the three groups.

Experimental research is described in detail in Chapter 10.

Single-Subject Research

Rather than compare the effects of different treatments (or treatment versus no treatment) on two or more groups of people, experimental researchers sometimes compare a single person’s behavior before treatment to behavior exhibited during the course of the experiment. They may also study a number of people together as one group, rather than as individuals. Single-subject experimental designs are those used to study the behavior change that an individual or group exhibits as a result of some intervention or treatment. In these designs, the size of the sample (the individuals selected from a population for a study) is said to be one.

Following are examples of published studies that used single-subject designs:

■ The effects of a training program with and without reinforced directed rehearsal as a correction procedure in teaching expressive sign language to nonverbal students with mental retardation. Ten students with moderate to severe mental retardation were studied.3
■ The effects of instruction focused on assignment completion on the homework performance of students with learning disabilities. A single-subject experiment design was used to determine how instruction in a comprehensive, independent assignment completion strategy affected the quality of homework and the homework completion rate of eight students with learning disabilities.4

Single-subject experimental research is described in detail in Chapter 11.

3 “Effects of Reinforced Directed Rehearsal on Expressive Sign Language Learning by Persons with Mental Retardation,” by A. J. Dalrymple and M. A. Feldman, 1992, Journal of Behavioral Education, 2(1), pp. 1–16.
4 “Effects of Instruction in an Assignment Completion Strategy on the Homework Performance of Students with Learning Disabilities in General Education Classes,” by C. A. Hughes, K. L. Ruhl, J. B. Schumaker, and D. D. Deshler, 2002, Learning Disabilities Research and Practice, 17(1), pp. 1–18.

Qualitative Approaches

Qualitative research seeks to probe deeply into the research setting to obtain in-depth understandings about the way things are, why they are that way, and how the participants in the context perceive them. To achieve the detailed understandings they seek, qualitative researchers must undertake sustained in-depth, in-context research that allows them to uncover subtle, less overt, personal understandings. The field of qualitative research uses a variety of common qualitative research designs. For example, some qualitative researchers focus on the exploration of phenomena that occur within a bounded system (e.g., a person, event, program, life cycle; in a case study); some focus in depth on a group’s cultural patterns and perspectives to understand participants’ behavior and their context (i.e., using ethnography); some examine how multiple cultures compare to one another (i.e., ethnology); some examine people’s understanding of their daily activities (i.e., ethnomethodology); some derive theory using multiple steps of data collection and interpretation that link actions of participants to general social science theories or work inductively to arrive at a theory that explains a particular phenomenon (i.e., grounded theory); some ask about the meaning of this experience for these participants (i.e., phenomenology); some look for common understandings that have emerged to give meaning to participants’ interactions (i.e., symbolic interaction); some seek to understand the past by studying documents, relics, and interviews (i.e., historical research); and some describe the lives of individuals (i.e., narrative). Overall, a collective, generic name for these qualitative approaches is interpretive research.5

5 For a discussion, see Qualitative Evaluation and Research Methods (3rd ed.), by M. Q. Patton, 2002, Thousand Oaks, CA: Sage.

Narrative Research

Narrative research is the study of how different humans experience the world around them; it involves a methodology that allows people to tell the stories of their “storied lives.”6 The researcher typically focuses on a single person and gathers data by collecting stories about the person’s life. The researcher and participant then construct a written account, known as a narrative, about the individual’s experiences and the meanings the individual attributes to the experiences. Because of the collaborative nature of narrative research, it is important for the researcher and participant to establish a trusting and respectful relationship. Another way to think of narrative research is that the narrative is the story of the phenomenon being investigated, and narrative is also the method of inquiry being used by the researcher.7 One of the goals of narrative research in education is to increase understanding of central issues related to teaching and learning through the telling and retelling of teachers’ stories.

Following is an example of the narrative research approach:

Kristy, an assistant professor of education, is frustrated by what she perceives as the gender-biased distribution of resources within the School of Education (SOE). Kristy shares her story with Winston, a colleague and researcher. In the course of their lengthy recorded conversations, Kristy describes in great detail her view that the SOE dean, George, is allocating more resources for technology upgrades, curriculum materials, and conference travel to her male colleagues. Kristy also shares with Winston her detailed journals, which capture her experiences with George and other faculty members in interactions dealing with the allocation of resources. In addition, Winston collects artifacts (including minutes of faculty meetings, technology orders, and lists of curriculum materials ordered for the library at the university) that relate to resource allocation.

After collecting all the data that will influence the story, Winston reviews the information, identifies important elements and themes, and retells Kristy’s story in a narrative form. After constructing the story with attention given to time, place, plot, and scene, he shares the story with Kristy, who collaborates on establishing its accuracy. In his interpretation of Kristy’s unique story of gender bias, Winston describes themes related to power and influence in a hierarchical school of education and the struggles faced by beginning professors to establish their career paths in a culture that is remarkably resistant to change.

Narrative research is described in detail in Chapter 12.

6 “Stories of Experience and Narrative Inquiry,” by F. M. Connelly and D. J. Clandinin, 1990, Educational Researcher, 19(5), p. 2.
7 “Stories,” Connelly and Clandinin, pp. 2–14.

Ethnographic Research

Ethnographic research, or ethnography, is the study of the cultural patterns and perspectives of participants in their natural settings. Ethnography focuses on a particular site or sites that provide the researcher with a context in which to study both the setting and the participants who inhabit it. An ethnographic setting can be defined as anything from a bowling alley to a neighborhood, from a nomadic group’s traveling range to an elementary principal’s office. The participants are observed as they take part in naturally occurring activities within the setting.

The ethnographic researcher avoids making interpretations and drawing conclusions too early in the study. Instead, the researcher enters the setting slowly, learning how to become accepted by the participants and gaining rapport with them. Then, over time, the researcher collects data in waves, making initial observations and interpretations about the context and participants, then collecting and examining more data in a second wave of refining the initial interpretation, then collecting another wave of data to further refine observations and interpretation, and so on, until the researcher has obtained a deep understanding of both the context and its participants’ roles in it. Lengthy engagement in the setting is a key facet of ethnographic research. The researcher organizes the data and undertakes a cultural interpretation. The result of the ethnographic study is a holistic description and cultural interpretation that represents the participants’ everyday activities, values, and events. The study is written and presented as a narrative, which, like the study from which it was produced, may also be referred to as an ethnography.

Following is an example of an ethnographic approach to a research question:

■ What is the Hispanic student culture in an urban community college? After selecting a general research question and a research site in a community college that enrolls many Hispanic students, the researcher first gains entry to the college and establishes rapport with the participants of the study. Building rapport can be a lengthy process, depending on the characteristics of the researcher (e.g., non-Hispanic versus Hispanic; Spanish speaking versus non-Spanish speaking). As is common in qualitative approaches, the researcher simultaneously collects and interprets data to help focus the general research question initially posed.

Throughout data collection, the ethnographic researcher identifies recurrent themes, integrates them into existing categories, and adds new categories as new themes or topics arise. The success of the study relies heavily on the researcher’s skills in analyzing and synthesizing the qualitative data into coherent and meaningful descriptions. The research report includes a holistic description of the culture, the common understandings and beliefs shared by participants, a discussion of how these beliefs relate to life in the culture, and a discussion of how the findings compare to literature already published about similar groups. In a sense, the successful researcher provides guidelines that enable someone not in the culture to know how to think and behave in the culture.

Ethnographic research is described in detail in Chapter 13.

Case Study Research

Case study research is a qualitative research approach to conducting research on a unit of study or bounded system (e.g., an individual teacher, a classroom, or a school can be a case). Case study research is an all-encompassing method covering design, data collection techniques, and specific approaches to data analysis.8 A case study is also the name for the product of case study research, which is different from other field-oriented research approaches such as narrative research and ethnographic research.

Following is an example of a study that used the case study research approach:

Mills (1988)9 asked, “How do central office personnel, principals, and teachers manage and cope with multiple innovations?” and studied educational change in one American school district. Mills described and analyzed how change functioned and what functions it served in this district. The function of change was viewed from the perspectives of central office personnel (e.g., superintendent, director of research and evaluation, program coordinators), principals, and teachers as they coped with and managed multiple innovations, including the introduction of kindergartens to elementary schools, the continuation of a program for at-risk students, and the use of California Achievement Test (CAT) scores to drive school improvement efforts. Mills used qualitative data collection techniques including participant observation, interviewing, written sources of data, and nonwritten sources of data.

Case study research is described in detail in Chapter 14.

8 Yin, R. K. (2014). Case Study Research: Design and Methods (5th ed.). Thousand Oaks, CA: Sage.
9 Mills, G. E. (1988). Managing and Coping with Multiple Educational Changes: A Case Study. Unpublished doctoral dissertation, University of Oregon, Eugene.

Classification of Research by Purpose

Research designs can also be classified by the degree of direct applicability of the research to educational practice or settings. When purpose is the classification criterion, all research studies fall into one of two categories: basic research and applied research. Applied research can be subdivided into evaluation research, research and development (R&D), and action research.

Basic and Applied Research

It is difficult to discuss basic and applied research separately because they are on a single continuum. In its purest form, basic research is conducted solely for the purpose of developing or refining a theory. Theory development is a conceptual process that requires many research studies conducted over time. Basic researchers may not be concerned with the immediate utility of their findings because it may be years before basic research leads to a practical educational application. For example, one of the articles listed at the end of this chapter focuses on basic research to develop and refine theories of children’s adaptation to new school settings (Hamre & Pianta, 2005).

Applied research, as the name implies, is conducted for the purpose of applying or testing a theory to determine its usefulness in solving practical problems. A teacher who asks, “Will the theory of multiple intelligences help improve my students’ learning?” is seeking an answer to a practical classroom question. This teacher is not interested in building a new theory or even generalizing beyond her classroom; instead, she is seeking specific helpful information about the impact of a promising practice (i.e., a teaching strategy based on the theory of multiple intelligences) on student learning. For example, one of the articles listed at the end of this chapter focuses on how a beginning teacher integrates university coursework on multicultural education into her classroom teaching and the decision-making process related to the implementation of a multicultural curriculum (Sleeter, 2009).

Educators and researchers sometimes disagree about which end of the basic–applied research continuum should be emphasized. Many educational research studies are located on the applied end of the continuum; they are more focused on what works best than on finding out why it works as it does. However, both basic research and applied research are necessary. Basic research provides the theory that produces the concepts for solving educational problems. Applied research provides data that can help support, guide, and revise the development of theory. Studies located in the middle of the basic–applied continuum seek to integrate both purposes. Figure 1.2 illustrates the educational research continuum.

Figure 1.2 • The educational research continuum
[Figure: a continuum running from Basic Research (develop and refine theory) through Applied Research (solve educational problems) to Evaluation Research (monitor progress, judge impact, make decisions). Data at every point come from quantitative and qualitative methods.]

Evaluation Research

At the applied end of the research continuum is evaluation research, an important, widely used, and explicitly practical form of research. Evaluation research is the systematic process of collecting and analyzing data about the quality, effectiveness, merit, or value of programs, products, or practices. Unlike other forms of research that seek new knowledge or understanding, evaluation research focuses mainly on making decisions about those programs, products, and practices. For example, following evaluation, administrators may decide to continue a program or to abandon it, to adopt a new curriculum or to keep the current one. Some typical evaluation research questions are “Is this special science program worth its costs?” “Is the new reading curriculum better than the old one?” “Did students reach the objectives of the diversity sensitivity program?” and “Is the new geography curriculum meeting the teachers’ needs?”

Evaluations come in various forms and serve different functions.10 An evaluation may be either formative or summative, for example. Formative evaluation occurs during the design phase when a program or product is under development and is conducted during implementation so that weaknesses can be remedied. Summative evaluation focuses on the overall quality or worth of a completed program or product.

Research and Development (R&D)

Research and development (R&D) is the process of researching consumer needs and then developing products to fulfill those needs. The purpose of R&D efforts in education is not to formulate or test theory but to develop effective products for use in schools. Such products include teacher-training materials, learning materials, sets of behavioral objectives, media materials, and management systems. R&D efforts are generally quite extensive in terms of objectives, personnel, and time to completion. Products are developed according to detailed specifications. Once completed, products are field-tested and revised until a prespecified level of effectiveness is achieved. Although the R&D cycle is expensive, it results in quality products designed to meet specific educational needs. School personnel who are the consumers of R&D endeavors may, for the first time, really see the value of educational research.

10 See Evaluation Models: Viewpoints on Educational and Human Services Evaluation, by D. Stufflebeam, G. Madaus, and T. Kellaghan, 2000, Norwell, MA: Kluwer Academic; Program Evaluation, by M. Gredler, 1996, Upper Saddle River, NJ: Prentice Hall; The Program Evaluation Standards: How to Assess Evaluations of Educational Programs (2nd ed.), by Joint Committee on Standards for Educational Evaluation, 1994, Thousand Oaks, CA: Sage.

Action Research

Action research in education is any systematic inquiry conducted by teachers, principals, school counselors, or other stakeholders in the teaching–learning environment to gather information about the ways in which their particular schools operate, the teachers teach, and the students learn. Its purpose is to provide teacher-researchers with a method for solving everyday problems in their own settings. Because the research is not characterized by the same kind of control evident in other categories of research, however, study results cannot be applied to other settings. The primary goal of action research is the solution of a given problem, not contribution to science. Whether the research is conducted in one classroom or in many classrooms, the teacher is very much a part of the process. The more research training the teachers have had, the more likely it is that the research will produce valid results. Action research can use quantitative, qualitative, or mixed methods research designs depending on the nature of the research problem.

Following are examples of action research:

■ A study to determine how mathematics problem-solving strategies are integrated into student learning and transferred to real-life settings outside the classroom. An elementary teacher conducts the study in his own school.

■ A study on how a school grading policy change affects student learning. A team of high school teachers works collaboratively to determine how replacing number and letter grades with narrative feedback affects student learning and attitudes toward learning.

The value of action research is confined primarily to those conducting it. Despite this limitation, action research represents a scientific approach to problem solving that is considerably better than change based on the alleged effectiveness of untried procedures and infinitely better than no change at all. It is a means by which concerned school personnel can attempt to improve the educational process, at least within their environment. Action research is described in detail in Chapter 16.

The Ethics of Educational Research

Ethical considerations play a role in all research studies, and all researchers must be aware of and attend to the ethical considerations related to their studies. In research, the ends do not justify the means, and researchers must not put the need or desire to carry out a study above the responsibility to maintain the well-being of the study participants. Research studies are built on trust between the researcher and the participants, and researchers have a responsibility to behave in a trustworthy manner, just as they expect participants to behave in the same manner (e.g., by providing responses that can be trusted). The two overriding rules of ethics are that participants should not be harmed in any way (physically, mentally, or socially) and that researchers obtain the participants’ informed consent, as discussed in the following sections.

To remind researchers of their responsibilities, professional organizations have developed codes of ethical conduct for their members. The general principles from the Ethical Principles of Psychologists and Code of Conduct adopted by the American Psychological Association (June 1, 2010) provides guidelines and contains specific ethical standards in 10 categories, which are not limited to research: (1) Resolving Ethical Issues, (2) Competence, (3) Human Relations, (4) Privacy and Confidentiality, (5) Advertising and Other Public Statements, (6) Record Keeping and Fees, (7) Education and Training, (8) Research and Publication, (9) Assessment, and (10) Therapy. You may read the full text online at the website for the American Psychological Association.

The American Educational Research Association (AERA) approved a code of ethics in February 2011 (for a comprehensive discussion see Educational Researcher, 40[3], 145–156). The code of ethics of AERA outlines a set of values on which educational researchers should build their research practices. Included in the code of ethics are five principles and 22 ethical standards. The principles are intended to serve as a guide for education researchers in determining ethical behavior in various contexts and include: (a) Professional Competence; (b) Integrity; (c) Professional, Scientific, and Scholarly

Chapter Two

Identifying and Stating a Research Problem

Doctor’s Orders, 1930

“Some graduate students spend many anxiety-ridden days and sleepless nights worrying about where they are going to find a problem to address in their theses or dissertations.” (p. 89)

Learning Outcomes

After reading Chapter 2, you should be able to do the following:

1. Describe the importance of selecting and defining a good research problem.
2. Identify a research problem.
3. Formulate and state a hypothesis.

The Research Problem

Selecting and defining a research problem is the first step in applying the scientific method. Before you read more about this first step, a few comments about the research process seem appropriate. Textbooks tend to present the research process and its steps in a simple, linear form: Do this and then this and then this, and ultimately you’ll get to where you want to be. Although a linear format provides a necessary template for student learning, the reality of educational research is that progress is seldom so straightforward. Educational research is truly a process of trial and error. As you investigate and refine a research problem, for instance, you will find things that don’t fit as expected, ideas that are not as clear on paper as they were in your head, and ideas that require considerable rethinking and rewriting. That is the reality of research. However, your ability to work through these challenges is an important and satisfying measure of your understanding. Remember this as you embark on this learning experience.

The research problem provides focus and structure for the remaining steps in the scientific method; it is the thread that binds everything together. Selecting and defining a problem should entail considerable thought. An initial problem that is broad and complex often proves unmanageable for study, and the researcher must narrow its scope to implement or complete the study. When properly defined, the research problem reduces a study to a manageable size.

The research problem that you ultimately select is the problem you will work with in succeeding chapters of this text. Therefore, it is important that you select a problem relevant to your area of study and of particular interest to you. The Chapter 2 outcomes are for you to identify and define a meaningful problem and to state a testable hypothesis. A problem statement and a hypothesis are components of both a written research plan and a research report.

Identifying a Research Problem

Throughout our school careers, we are taught to solve problems of various kinds. Ask people to list the 10 most important outcomes of education, and most will mention problem-solving skills. Now, after many years of emphasis on solving problems, you face a research task that asks you to find, rather than solve, a problem. If you are like most people, you have had little experience finding problems. For beginning researchers, selection of a problem is the most difficult step in the research process. Some graduate students spend many anxiety-ridden days and sleepless nights worrying about where they are going to find a problem to address in their theses or dissertations.

The first step in selecting a research problem is to identify a general subject area that is related to your area of expertise and is of particular interest to you. Remember, you will be spending a great deal of time reading about and working with your chosen problem. Having one that interests you will help you maintain focus during the months of conducting your study.

Sources of Research Problems

You may be asking yourself, “Where do research problems come from? Where should I look to ferret out problems to study?” The four main sources of research problems are theories, personal experiences, previous studies that can be replicated, and library searches. Additional sources are discussed in the section on digital research tools for the 21st century and include: RSS feeds, Facebook, Twitter, blogs, and electronic mailing lists.

Theories

The most meaningful problems are generally those derived from theories. A theory is an organized body of concepts, generalizations, and principles that can be investigated. Educationally relevant theories, such as theories of learning and behavior, can provide the inspiration for many research problems. For example, Jean Piaget posited that children’s thinking develops in four stages: the sensorimotor stage (birth to approximately age 2 years), preoperational stage (approximately age 2 to 7 years), concrete operational stage (approximately age 7 to 11 years), and formal operational stage (approximately age 11 years and older). Piaget described tasks and behaviors that children can and cannot do at each stage. Whether aspects of Piaget’s theory operate as suggested is a good basis for many possible research problems. For example, a researcher may explore certain factors that may affect the length of time children take to pass from one stage to the next.

Research focused on aspects of a theory is not only conceptually rich; such research also provides information that confirms or disconfirms one or more of those aspects and may suggest additional studies to test the theory further. Take a moment now to think of two other theories that are popular in education and, from them, identify a few problems to investigate.

Personal Experiences

Another common way to identify research problems is to examine some of the questions we often ask ourselves about education. Questions may arise when we participate in class discussion, read articles in local newspapers and educational journals, or interact with others. When we observe or read about schools, teachers, and programs, we should ask ourselves questions such as “Why does that happen?” “What causes that?” “What would happen if …?” and “How would a different group respond to this?” Normally we think only briefly about such questions before returning to our everyday business, but such questions are probably the most common source of research problems because they capture our interest. It is hard to imagine an educator who has never had a hunch about a better way to do something (e.g., increase learning or improve student behavior) or asked questions about a program or materials whose effectiveness was untested (e.g., questioning why a writing program was successful or why science materials were not). A possible research problem based on personal experience could be “What is the impact of No Child Left Behind (NCLB) Adequate Yearly Progress (AYP) reporting requirements on the ways teachers teach?” For classroom practitioners, another source of a research problem would be daily classroom life and the effects of teaching practices on student outcomes, the starting place for action research.

Studies That Can Be Replicated

An additional source of research problems is previously published studies, many of which can be replicated. A replication is a repetition of a study using different subjects to retest its hypothesis. No single study, regardless of its focus or breadth, provides the certainty needed to assume that similar results occur in all or most similar situations. Progress through research usually comes from accumulated understandings and explanations, and replication is a tool to provide such accumulated information.

In most cases, the method of a replication is not identical to the original study. Rather, some feature or features of the original study are altered in an attempt to stretch or move beyond the original findings. For example, the researcher may select a different sample of participants for the replication in the hope of determining whether the results are the same as those found in the original study. Or the researcher may examine a different kind of community or student, use a different questionnaire, or apply a different method of data analysis. There are many different interesting and useful ways to replicate studies in the many domains of education. For example, a possible replication study may focus on how students’ use of computers in classrooms affects their achievement, and the study may extend original studies in the area by providing computers to children who have not previously had access to such technology.

Library Searches

Another commonly cited source for a research problem is a library search. Many students are encouraged to immerse themselves in the library and read voraciously in their areas of study until research problems emerge. Although some
research problems may emerge from library immersion, they are considerably fewer than those emerging from theories, personal experiences, and previous studies. Trying to identify a problem amid the enormous possibilities in a library is akin to looking for a needle in a haystack: sometimes we find it, but not very often. Clearly libraries are essential sources of information in the research process, but the library is most useful to the researcher after a problem has been narrowed. Then library resources can provide information to place the problem in perspective, reveal what researchers have already learned about the problem, and suggest methods for carrying out a study.

Electronic Mailing Lists

Researchers frequently use email to solicit advice and feedback and conduct dialogue with peers and experts in their fields. The most common way to do so is by subscribing to an electronic mailing list service. A well-known example is Listserv, run by L-Soft International. Electronic mailing lists are designed by organizations or special interest groups to facilitate communication among their members. Through one of these lists, you can expect to receive announcements and bulletins related to your area of interest. In addition, you can post comments or questions to the mailing list. Your messages will be read by members of the list, who may respond to you personally or to the mailing list as a whole.

An electronic mailing list is a good resource to consult when you are devising a research problem. You can ask list members what they think of a particular problem, if they know of other research pertaining to your problem, or for links (electronic or otherwise) to resources of interest. You can also bounce ideas off other list members at each stage of your research. You can even ask for volunteers to read your work in progress!

To subscribe to an electronic mailing list, you are generally required to send a short email message to the list address. When you are subscribed, you will receive detailed information about how to post messages, how to unsubscribe, and rules for use. Examples of electronic mailing lists for educational problems include:

American Educational Research Association List
AERA Division K Teaching and Teacher Education Forum
Educational Administration Discussion List

A useful website to consult in your search for appropriate electronic mailing lists is sponsored by L-Soft International and contains a catalog of more than 50,000 public electronic mailing lists. At this site, you can browse public lists on the Internet, search for mailing lists of interest, and get information about host sites. A recent search for education lists yielded hundreds of electronic mailing lists. Appropriately, this site is called CataList! Additional digital sources for research problems are included in the feature “Digital Research Tools for the 21st Century.”

Digital Research Tools for the 21st Century

Developing Research Problems

Many efficiency-based digital tools are available to educational researchers, primarily in the realm of assisting with reviewing the literature, data collection, data analysis, and publishing (these will be discussed in the chapters that deal with these topics). We suggest the following digital research tools to assist with the development of your research problem or questions.

Rich Site Summary (RSS) Feeds

Staying current in your area of interest (or challenge) will help you stay on top of what other professionals in the field are researching and contributing to the existing body of knowledge. Arguably, one of the best digital tools to assist with the development of your research problem is the use of Rich Site Summary (RSS) feeds (also known as web feeds or channels). RSS feeds allow you to subscribe to a content distributor (e.g., publisher, professional organization, and individual educational researcher) and to receive regular updates on everything from a specific journal’s table of contents to upcoming podcasts and an individual’s blog posting.

Getting started is as simple as selecting a free RSS service (e.g., NetNewsWire for Macintosh users or SharpReader for Windows users) and subscribing to RSS feeds of interest to you. A simple Google search of “educational research RSS feeds” resulted in 1.3 million hits, so you probably want to be selective about the feeds you choose, for example, professional journals and organizations in your area of interest, and government-sponsored feeds.

One advantage of subscribing to RSS feeds is that your email inbox will not be inundated with regular postings from publishers and professional organizations; your RSS reader will simply indicate whenever you have updates to read. Similarly, many of the updates provided to you will include web links to journal abstracts and full online versions of articles that have become available. And in an era in which we must all be concerned about identity theft, subscription to RSS feeds does not require disclosure of personal email addresses that may make you vulnerable to spam, viruses, and phishing.

Facebook

Facebook is a social networking website that allows users to maintain an updated personal profile and to notify friends and colleagues about themselves. Users can also join (and form) networks, which are increasingly being formed by schools and colleges. Universities have turned to using Facebook as a recruiting tool as well as a way to notify their current students and faculty members about changes at the university. Educational research organizations such as the American Educational Research Association (AERA) use Facebook as a tool to connect divisions within the organization, thus creating a mechanism that connects like-minded scholars. Participation in one of these groups is one way to connect to other researchers investigating your current area of interest.

Twitter

Twitter is another social networking website that allows users to send short (140-character) messages to subscribers (followers). These short messages are referred to as tweets and have become the short message service (SMS) of the Internet. Twitter has also become popular with schools, colleges, and universities as a way of connecting with Generation Y (also known as the Net Generation) potential students. However, Twitter has also become a popular tool with researchers and with journals as yet another way of providing educational researchers with the ability to subscribe to live tweets about interesting problems or individuals. As was reported at one website, the “edublogosphere” is way too big to try and capture in a 140-character tweet; however, Twitter serves as another potentially valuable way of connecting with like-minded educational researchers.

Blogs

Blogs are another way of tracking what educational researchers are investigating at any given time and provide followers with another way of connecting with individual researchers’ journeys. For example, I recently used a blog to track my work experiences of teaching educational research in Greenland and was surprised by the number of followers who tracked my journey and who wanted to engage in conversations, especially about the challenges of teaching educational research in a setting where English is a distant third language.

Narrowing the Problem

For most quantitative researchers and some qualitative researchers, the general problem area must be narrowed to a more specific, researchable one. A problem that is too broad can lead to grief. First, a broad problem enlarges the task of reviewing the related literature, likely resulting in many extra hours spent in the library. Second, broad problems complicate the organization of the literature review itself. Finally, and most important, a problem that is too broad tends to
result in a study that is general, difficult to carry out, and difficult to interpret. Conversely, a well-defined, manageable problem results in a well-defined, manageable study.

Note that the appropriate time to narrow a problem differs for quantitative and qualitative approaches. Quantitative research typically requires that the researcher spell out a specific and manageable problem at the start of the research process. For most qualitative research, the researcher often enters the research setting with only a general problem in mind. Following observation over a period of time, the qualitative researcher formulates a narrower research problem.

For ideas on narrowing your problem, you may begin by talking to your faculty advisers and to specialists in your area to solicit specific suggestions for study. You may also want to read sources that provide overviews of the current status of research in your problem area and search through handbooks that contain many chapters focused on research in a particular area (e.g., Handbook of Research in Educational Administration, The Handbook of Educational Psychology, Handbook of Research on Curriculum, Handbook of Research on Teacher Education, Handbook of Sport Psychology, International Handbook of Early Child Education, International Handbook of Self-Study of Teacher Education Practices, and many more). You can also check the Encyclopedia of Educational Research or journals such as the Review of Educational Research, which provide reviews of research in many areas. These sources often identify what may be called next-step studies, or those that need to be conducted. For example, following a study investigating the effectiveness of computer-assisted instruction in elementary arithmetic, the researcher may suggest the need for similar studies in other curriculum areas. At this stage in the research process, look for general research overviews that describe the nature of research in an area and can suggest more specific problems in your chosen area.

In narrowing your problem, you should select an aspect of the general problem area that is related to your area of expertise. For example, from the general problem area, “the use of reviews to increase retention,” you may generate many specific problems, such as the comparative effectiveness of immediate versus delayed review on the retention of geometric concepts and the effect of review games on the retention of vocabulary words by second-graders. In your efforts to delineate a problem sufficiently, however, be careful not to get carried away: a problem that is too narrow is just as bad as a problem that is too broad. A study on the effectiveness of preclass reminders in reducing instances of pencil sharpening during class time, for example, would probably contribute little, if anything, to the education knowledge base. Table 2.1 provides examples of broad and narrow research statements focused on the same problem.

Table 2.1 • Comparison of broad and narrow research problems

1. Broad research problem: How is passing through Piaget’s four stages of cognitive development related to success at college?
   Narrow research problem: What factors affect the length of time children take to pass from one Piagetian stage to the next?

2. Broad research problem: How do the requirements of No Child Left Behind (NCLB) legislation affect whether or not children become good citizens?
   Narrow research problem: What is the impact of No Child Left Behind (NCLB) Adequate Yearly Progress (AYP) reporting requirements on the ways teachers teach?

3. Broad research problem: How is use of the Internet by elementary-school-age children related to success at college?
   Narrow research problem: How does providing computers to children who have not previously had access to such technology affect their achievement?

Characteristics of Good Problems

Selecting a good problem is well worth the time and effort. As mentioned previously, there is no shortage of significant educational problems that
need to be studied; there is really no excuse for selecting a trite, overly narrow problem. Besides, it is generally to your advantage to select a worthwhile problem because you will certainly get a great deal more out of it professionally and academically. If the subsequent study is well conducted and reported, not only will you make a contribution to the existing knowledge base but you may also find your work published in a professional journal. The potential personal benefits to be derived from publication include increased professional status and job opportunities, not to mention tremendous self-satisfaction.

Working with an interesting problem helps a researcher stay motivated during months of study. Being interesting, however, is only one characteristic of a good research problem. A research problem, by definition, is an issue in need of investigation, so it follows that a fundamental characteristic of a good problem is that it is researchable. A researchable problem is one that can be investigated through collecting and analyzing data. Problems dealing with philosophical or ethical issues are not researchable. Research can assess how people feel about such issues, but it cannot resolve them. In education, many issues make great problems for debate (e.g., “Should prayer be allowed in the schools?”), but they are not researchable problems; there is no way to resolve these issues through collecting and analyzing data. Generally, problems or questions that contain the word should cannot be answered by research of any kind because they are ultimately matters of opinion.

However, a slight wording change can turn an unresearchable problem into a researchable one. For example, studies that examine the effects of school prayer on teachers and students, the effects of grouping practices on classroom learning, or the consequences of being held back in a grade can be carried out. Such studies, as worded, can tell us about the varied consequences of these practices. The decisions that any school or teacher makes regarding those practices can then be based in part on those studies, but ultimately those decisions also involve issues that go beyond any research study.

A third characteristic of a good research problem is that it has theoretical or practical significance. People’s definitions of significance vary, but a general rule of thumb is that a significant study is one that contributes in some way to the improvement or understanding of educational theory or practice.

A fourth characteristic of a good problem is that the research is ethical. That is, the research must not potentially harm the research participants. Harm encompasses not only physical danger but emotional danger as well.

A fifth important characteristic is that the problem is manageable for you. Choosing an interesting problem in an area in which you have expertise is not sufficient. You must choose a problem that you can investigate adequately, given your current level of research skill, the resources available to you, and the time you can commit to carrying out the study. The availability of appropriate participants and measuring instruments, for example, is an important consideration.

The characteristics of a good research problem are summarized in Table 2.2. As you assess a problem for its appropriateness and feasibility, you may want to consult your faculty advisers for their opinions.

Stating the Research Problem

After you have selected and narrowed your research problem, you should draft a written statement of that problem. The way in which a problem is stated varies according to the type of research undertaken and the preferences of the researcher. As with other parts of the research process, the approach differs somewhat for quantitative and qualitative studies.

Stating Quantitative Research Problems

For a quantitative study, a well-written problem statement generally describes the variables of interest, the specific relations among those variables, and (ideally) important characteristics of the participants (e.g., gifted students, fourth-graders with learning disabilities, teenage mothers). An example of a problem statement is: “The problem to be investigated in this study is the effect of positive reinforcement on the quality of 10th-graders’ English compositions.” It is clear that the variables in this study are positive reinforcement and quality of English compositions, and the participants will be 10th-graders.

Find more at chapter 2  •  Identifying and Stating a Research Problem 95 Table 2.2 • Choosing a good research problem Following are examples of general 1. Your problem is interesting. It will hold your interest throughout the statements that may be drafted in the early stages of the qualitative entire research process. research process: 2. Your problem is researchable. It can be investigated through the ■ “The purpose of this study is tocollection and analysis of data and is not stated as an effort to describe the nature of children’sdetermine what should be done. engagement with mathematics. 3. Your problem is significant. It contributes in some way to the The intention is to gatherimprovement or understanding of educational theory or practice. details about children’s ways of 4. Your problem is ethical. It does not involve practices or strategies that entering into and sustaining their may embarrass or harm participants. involvement with mathematics.” 5. Your problem is manageable. It fits your level of skill, the available ■ “This qualitative study examines resources, and the time restrictions. how members of an organization identify, evaluate, and respond to organizational change. 
The study examines the events that members of an organization identify asOther possible problem statements include the significant change events and whether differentfollowing: events are seen as significant by subgroups in■ “The problem to be investigated in this study is the organization.” secondary teachers’ attitudes toward required ■ “The purpose of this research is to study the after-school activities.” social integration of children with disabilities in■ “The purpose of this study is to investigate a general education third-grade class.”the relation between school entrance age andreading comprehension skills of primary-level Developing Research Questionsstudents.”■ “The problem to be studied is the effect of Developing research questions breathes life into the wearing required school uniforms on the self- research problem statements. To use a teaching anal- esteem of socioeconomically disadvantaged ogy, it is like taking the aims of the lesson (the prob- sixth-grade students.” lem statement, broad statement of outcomes) andTry to identify the variable or variables in each developing the instructional objectives for the lessonexample and suggest the quantitative research (the research questions; bite-size, narrow questions).approach that would likely be employed to carry These research questions also validate that you have a workable way to proceed with your research. (Seeout the study. Figure 2.1.) 
There is a direct connection between theStating Qualitative Research Problems research question and the data collection strategiesAt this point in the research process, qualitative that the researcher will use to answer the question.research problems are often stated in more gen- The research questions add another level of specific-eral language than quantitative ones because, in ity to the development of the research and providemany cases, the qualitative researcher needs to the researcher with an action plan for the develop-spend time in the research context for the focus ment and identification of research instruments.of the study to emerge. Remember, the qualita- Following are examples of research questions devel-tive researcher usually is much more attuned to oped from the earlier quantitative research problems:the specifics of the context in which the study ■ “The problem to be investigated in this study istakes place than is the quantitative researcher. secondary teachers’ attitudes toward requiredQualitative problem statements eventually narrow after-school activities.”as the researcher learns more about the research Research questions: What are secondarycontext and its inhabitants, and these more pre- teachers’ attitudes toward varsity athleticscise statements appear in the research report. programs? What instructional strategies do

secondary teachers use to accommodate student-athletes? How do these instructional strategies affect student achievement?

■ “The purpose of this study is to investigate the relation between school entrance age and reading comprehension skills of primary-level students.”
Research question: What is the correlation between student entrance age at the beginning of primary school and students’ performance in reading comprehension at the end of first grade?

■ “The problem to be studied is the effect of wearing required school uniforms on the self-esteem of socioeconomically disadvantaged sixth-grade students.”
Research question: What is the effect of mandatory school uniforms on the self-esteem of socioeconomically disadvantaged sixth-grade students?

Figure 2.1 • Framework for conceptualizing research questions

Activity: Identifying a topic
Characteristics: Theories. Personal experiences. Replication studies. Library searches. RSS feeds. Twitter. Blogs. Electronic mailing lists.

Activity: Narrowing the topic
Characteristics: Talk with faculty advisors. Consult overviews of current research in your topic area. Select a general problem area related to your area of expertise.

Activity: Developing a good topic
Characteristics: The topic is interesting. The topic is researchable. The topic is significant. The topic is ethical. The topic is manageable.

Activity: Stating the research topic
Characteristics: A well-written topic statement generally describes the variables of interest and, ideally, important characteristics of the participants.

Activity: Developing research questions
Characteristics: Breathes life into the research topic statements and validates that you have a workable way to proceed with your research. Provides direction for the development of research instruments.

Following are examples of research questions developed from the earlier qualitative research problems:

■ “The purpose of this study is to describe the nature of children’s engagement with mathematics. The intention is to gather details about children’s ways of entering into and sustaining their involvement with mathematics.”
Research questions: What strategies do children use to engage in learning mathematics? How do these strategies sustain student involvement in learning mathematics? How does being engaged with mathematics content affect student attitudes toward mathematics?

■ “This qualitative study examines how members of an organization identify, evaluate, and respond to organizational change. The study examines the events that members of an organization identify as significant change

events and whether different events are seen as significant by subgroups in the organization.”
Research questions: What are the unintended consequences of teacher involvement in the schoolwide reform efforts? How do the school administrators involve teachers, students, and community members in the schoolwide reform efforts? What are the major challenges facing school administrators in building teacher support for the schoolwide reform efforts?

■ “The purpose of this research is to study the social integration of children with disabilities in a general education third-grade class.”
Research questions: What instructional strategies do regular education teachers use to integrate children with learning disabilities into their general education third-grade class? How do regular education students accommodate children with learning disabilities in their regular classroom activities?

These examples illustrate the importance of translating qualitative and quantitative research problems into specific research questions that provide the researcher with a methodological road map for how to proceed with the development of the research proposal (discussed in detail in Chapter 4).

Placement and Nature of the Problem Statement in a Study

It’s helpful to understand how the problem statement is used in later stages of the research process. A statement of the problem is the first component of both a research plan and the completed research report, and it gives direction to the remaining aspects of both the plan and report. The statement is accompanied by a presentation of the background of the problem, a justification for the study (i.e., a discussion of its significance), and often limitations of the study. The background includes information needed by readers to understand the nature of the problem.

To provide a justification of the study, the researcher must explain how investigation of the research problem can contribute to educational theory or practice. For example, consider an introduction that begins with this problem statement, “The purpose of this study is to compare the effectiveness of salaried paraprofessionals and nonsalaried parent volunteers with respect to the reading achievement of first-grade children.” This statement may be followed by a discussion of (1) the role of paraprofessionals; (2) the increased utilization of paraprofessionals by schools; (3) the expense involved; and (4) the search for alternatives, such as parent volunteers. The significance of the problem is that, if parent volunteers and paid paraprofessionals are equally effective, volunteers can be substituted for salaried paraprofessionals at great savings. Any educational practice that may increase achievement at no additional cost is certainly worthy of investigation!

Thinking about the significance of your problem will help you develop a tentative hypothesis, which is a prediction about the research findings. A researcher typically uses the tentative hypothesis as a guiding hypothesis during the process of reviewing literature related to the research problem. In the example just given, a tentative hypothesis is that parent volunteers are equally as effective as salaried paraprofessionals. The tentative hypothesis is likely to be modified, even changed radically, as a result of the review of the literature, but it gives direction to the literature search and helps the researcher narrow the scope of the search to include only relevant problems. Clearly, it is important to develop a guiding hypothesis prior to starting your literature review.

Formulating and Stating a Hypothesis

A hypothesis is a researcher’s prediction of the research findings, a statement of the researcher’s expectations about the relations among the variables in the research problem. Many studies contain a number of variables, and it is not uncommon to have more than one hypothesis for a research problem. The researcher does not set out to prove a hypothesis but rather collects data that either support or do not support it. A written statement of your hypothesis is part of your research plan and report.

Both quantitative and qualitative researchers deal with hypotheses, but the nature of each approach differs. We first discuss the quantitative

use of hypotheses and then discuss the qualitative counterpart.

Definition and Purpose of Hypotheses in Quantitative Studies

Hypotheses are essential to all quantitative research studies, with the possible exception of some survey studies whose purpose is to answer certain specific questions. A quantitative researcher formulates a hypothesis before conducting the study because the nature of the study is determined by the hypothesis. Every aspect of the research is affected, including participants, measuring instruments, design, procedures, data analysis, and conclusions.

Hypotheses are typically derived from theories or from knowledge gained while reviewing the related literature, which often leads the researcher to expect a certain finding. For example, studies finding white chalk to be more effective than yellow chalk in teaching mathematics may lead a researcher to expect white chalk to be more effective in teaching physics as well, if there are no other findings to the contrary. Similarly, a theory suggesting that the ability to think abstractly is quite different for 10-year-olds than for 15-year-olds may lead a researcher to propose a hypothesis that 10- and 15-year-olds perform differently on tests of abstract reasoning.

Although all hypotheses are based on theory or previous knowledge and are aimed at extending knowledge, they are not all of equal worth. A number of criteria can be, and should be, applied to determine the value of a hypothesis. The following guidelines help ensure that you develop a good research hypothesis:

1. A hypothesis should be based on a sound rationale. It should derive from previous research or theory, and its confirmation or disconfirmation should contribute to educational theory or practice. Therefore, a major characteristic of a good hypothesis is that it is consistent with theory or previous research. The chances are slim that you’ll be a Christopher Columbus of educational research who shows that something believed to be flat is really round! Of course, in areas of research where results are conflicting, your hypothesis won’t be consistent with every study, but it should follow from the rule, not from the exception.

2. A good hypothesis provides a reasonable explanation for the predicted outcome. If your telephone is out of order, you may hypothesize that butterflies are sitting on your telephone wires, but such a hypothesis does not provide a reasonable explanation. More reasonable hypotheses are that you forgot to pay your bill or that a repair crew is working outside. As another example, a hypothesis suggesting that schoolchildren with freckles attend longer to tasks than schoolchildren without freckles does not provide a reasonable explanation for children’s attention behavior. A hypothesis suggesting that children who eat a nutritious breakfast pay attention longer than children who have no breakfast is more reasonable.

3. A good hypothesis states as clearly and concisely as possible the expected relation (or difference) between variables and defines those variables in operational, measurable terms. A simple but clearly stated hypothesis makes the relation easier for readers to understand, is simpler to test, and facilitates the formulation of conclusions. A relation between variables may be expressed as a correlational or a causal one. For example, in a study focused on the relation between anxiety and math achievement, the hypothesis may be that anxiety and math achievement are negatively correlated, such that students who are highly anxious also have low math achievement, and students with higher math achievement have low anxiety. In a causal study addressing the same variables, a researcher may hypothesize that anxiety causes poor performance on a math test.

This example illustrates the need for operational definitions that clearly describe variables in measurable ways. Operational definitions clarify important terms in a study so that all readers understand the precise meaning the researcher intends. To define the variables in these studies, a researcher must ask such questions as “How can we measure math achievement?” “What

does ‘poor performance’ mean?” “What observable behaviors define high anxiety?” In this example, “high anxiety” may be a score on the Acme Anxiety Inventory in the upper 30% of student scores, and “low anxiety” may be a score in the lowest 30% of students. “Poor” performance on a math test may be operationally defined in terms of certain math subtest scores on the California Achievement Test.

If you can operationally define your variables within the hypothesis statement without making it unwieldy, you should do so. If not, state the hypothesis and define the appropriate terms immediately afterward. Of course, if all necessary terms have already been defined, either within or immediately following the problem statement, repeating the definitions in the statement of the hypothesis is not necessary. The general rule of thumb is to define terms the first time you use them, but it does not hurt to remind readers of these definitions occasionally.

4. A well-stated and well-defined hypothesis must also be testable, and it will be testable if it is well formulated and stated. It should be possible to test the hypothesis by collecting and analyzing data. It is not possible to test a hypothesis that some students behave better than others because some have an invisible little angel on their right shoulders and some have an invisible little devil on their left shoulders; a researcher would have no way to collect data to support the hypothesis.

A good hypothesis should normally be testable within some reasonable period of time. For example, the hypothesis that first-grade students who read after lunch every day will have bigger vocabularies at age 60 would obviously take a very long time to test, and the researcher would very likely be long gone before the study was completed. A more manageable hypothesis with the same theme is that first-grade children who read after lunch every day will have bigger vocabularies at the end of the first grade than those who don’t read daily.

See Table 2.3 for a summary of the characteristics of a good hypothesis.

Types of Hypotheses

Hypotheses can be classified in terms of how they are derived (i.e., inductive versus deductive hypotheses) or how they are stated (i.e., directional versus null hypotheses). If you recall the discussion of inductive and deductive reasoning in Chapter 1, you may guess that an inductive hypothesis is a generalization based on specific observations. The researcher observes that certain patterns or associations among variables occur in a number of situations and uses these tentative observations to form an inductive hypothesis. For example, a researcher observes that, in some eighth-grade classrooms, students who take essay tests appear to show less test anxiety than those who take multiple-choice tests. This observation could become the basis for an inductive hypothesis. A deductive hypothesis, in contrast, is derived from theory and provides evidence that supports, expands, or contradicts the theory.

A research hypothesis states an expected relation or difference between variables. In other words, the quantitative researcher specifies the relation he or she expects to test in the research study. Research hypotheses can be nondirectional or directional. A nondirectional hypothesis states simply that a relation or difference between variables exists. A directional hypothesis states the expected direction of the relation or difference.

Table 2.3 • Characteristics of a good hypothesis

1. A good hypothesis is based on sound reasoning that is consistent with theory or previous research.
2. A good hypothesis provides a reasonable explanation for the predicted outcome.
3. A good hypothesis clearly states the expected relation or difference between defined variables.
4. A good hypothesis is testable within a reasonable time frame.
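The operational definitions in the anxiety example under guideline 3 (a score in the upper 30% counts as “high anxiety,” a score in the lowest 30% as “low anxiety”) can be made concrete with a short sketch. The Python below is only an illustration: the scores are invented, the function names are hypothetical, and the handling of ties and very small samples is an assumption rather than part of any published scoring procedure for an anxiety inventory.

```python
def anxiety_cutoffs(scores, fraction=0.30):
    """Return (low_cut, high_cut) for a list of inventory scores.

    Scores at or below low_cut fall in the lowest `fraction` of the sample;
    scores at or above high_cut fall in the upper `fraction`.
    Assumption: tied scores all receive the same label.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores)
    k = max(1, round(len(ordered) * fraction))  # number of students per tail
    return ordered[k - 1], ordered[-k]


def label_anxiety(score, low_cut, high_cut):
    """Apply the operational definition to a single score."""
    if score >= high_cut:
        return "high anxiety"
    if score <= low_cut:
        return "low anxiety"
    return "neither"


# Hypothetical inventory scores for ten students (invented for illustration).
scores = [12, 35, 18, 41, 27, 22, 39, 15, 30, 25]
low_cut, high_cut = anxiety_cutoffs(scores)
labels = {s: label_anxiety(s, low_cut, high_cut) for s in scores}

for s in sorted(scores):
    print(s, labels[s])
```

Writing the definition down this explicitly forces the decisions that a reader would otherwise have to guess, such as which scores sit exactly on the cutoff.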

For example, a nondirectional hypothesis may state the following:

    The achievement of 10th-grade biology students who are instructed using interactive multimedia is significantly different than the achievement of those who receive regular instruction only.

The corresponding directional hypothesis may read:

    Tenth-grade biology students who are instructed using interactive multimedia achieve at a higher level than those who receive regular instruction only.

The nondirectional hypothesis predicts a difference between the groups, whereas the directional hypothesis predicts not only the difference but also that the difference favors interactive instruction. A directional hypothesis should be stated only if you have a basis for believing that the results will occur in the stated direction. Nondirectional and directional hypotheses involve different types of statistical tests of significance, which are examined in Chapter 18.

Finally, a null hypothesis states that there is no significant relation or difference between variables. For example, a null hypothesis may be stated as follows:

    The achievement level of 10th-grade biology students who are instructed using interactive multimedia is not significantly different than the achievement level of those who receive regular instruction.

The null hypothesis is the hypothesis of choice when a researcher has little research or theoretical support for a hypothesis. Also, statistical tests for the null hypothesis are more conservative than they are for directional hypotheses. The disadvantage of null hypotheses is that they rarely express the researcher’s true expectations based on literature, insights, and logic. Given that few studies can be designed to test for the nonexistence of a relation, it seems logical that most studies should not be based on a null hypothesis.

Stating the Hypothesis

A good hypothesis is stated clearly and concisely, expresses the relation or difference between variables, and defines those variables in measurable terms. A general model for stating hypotheses for experimental studies is as follows:

    P who get X do better on Y than P who do not get X (or get some other X)

In the model,

    P = the participants
    X = the treatment, the causal or independent variable (IV)
    Y = the study outcome, the effect or dependent variable (DV)

Although this model is an oversimplification and may not always be appropriate, it should help you to understand the statement of a hypothesis. Further, this model, sometimes with variations, is applicable in many situations.

Study the following problem statement, and see if you can identify P, X, and Y:

    The purpose of this study is to investigate the effectiveness of 12th-grade mentors on the absenteeism of low-achieving 10th-graders.

In this example,

    P = low-achieving 10th-graders
    X = presence or absence of a 12th-grade mentor (IV)
    Y = absenteeism, measured as days absent or, stated positively, days present (DV)

A review of the literature may indicate that mentors are effective in influencing younger students. Therefore, the directional hypothesis resulting from this problem may read as follows:

    Low-achieving 10th-graders (P) who have a 12th-grade mentor (X) have less absenteeism (Y) than low-achieving 10th-graders who do not.

As another example, consider this problem statement:

    The purpose of the proposed research is to investigate the effectiveness of different conflict resolution techniques in reducing the aggressive behaviors of high school students in an alternative educational setting.

For this problem statement,

    P = high school students in an alternative educational setting
    X = type of conflict resolution: punishment or discussion (IV)
    Y = instances of aggressive behaviors (DV)

The related nondirectional hypothesis may read as follows:

    For high school students in an alternative educational setting, the number of aggressive behaviors will be different for students who receive punishment than for students who engage in discussion approaches to conflict resolution.

Of course, in all these examples, the terms require operational definition (e.g., “aggressive behaviors”).

Got the idea? Let’s try one more. Here is the problem statement:

    This study investigates the effectiveness of token reinforcement, in the form of free time given for the completion of practice worksheets, on the math computation skills of ninth-grade general math students.

    P = ninth-grade general math students
    X = token reinforcement in the form of free time for completion of practice worksheets
    Y = math computation skills

The directional hypothesis may be written as follows:

    Ninth-grade general math students who receive token reinforcement in the form of free time when they complete their practice worksheets have higher math computation skills than ninth-grade general math students who do not receive token reinforcement for completed worksheets.

The null hypothesis may take this form:

    There is no difference on Y (the outcome of the study) between P1 (treatment A) and P2 (treatment B).

    P1 (treatment A) = free time
    P2 (treatment B) = no free time

See if you can write the null hypothesis for the following problem statement:

    The purpose of this study is to assess the impact of formal versus informal preschool reading instruction on children’s reading comprehension at the end of the first grade.

Testing the Hypothesis

You will use your hypothesis as you conduct your research study. The researcher selects the sample, measuring instruments, design, and procedures that will enable him or her to collect the data necessary to test the hypothesis. During the course of a research study, those data are analyzed in a manner that permits the researcher to determine whether the hypothesis is supported. Remember that analysis of the data does not lead to a hypothesis being proven or not proven, only supported or not supported for this particular study. The results of analysis indicate whether a hypothesis is supported or not supported for the particular participants, context, and instruments involved.

Many beginning researchers have the misconception that if the hypothesis is not supported by the data, then the study is a failure and, conversely, if the hypothesis is supported, then the study is a success. Neither of these beliefs is true. If a hypothesis is not supported, a valuable contribution may be made through the development of new research methods or even a revision of some aspect of a theory. Such revisions can generate new or revised hypotheses and new and original studies. Thus, hypothesis testing contributes to education primarily by expanding, refining, or revising its knowledge base.

Definition and Purpose of Hypotheses in Qualitative Studies

The aims and strategies of qualitative researchers may differ substantially from those of quantitative researchers. Typically, qualitative researchers do not state formal hypotheses before conducting studies; rather, they seek to understand the nature of their participants and contexts before stating a research focus or hypothesis. As noted earlier, however, qualitative researchers may develop guiding hypotheses for the proposed research. Rather than

testing hypotheses, qualitative researchers are much more likely to generate new hypotheses as a result of their studies. The inductive process widely used in qualitative research is based on observing patterns and associations in the participants’ natural setting without prior hunches or hypotheses about what researchers will study and observe. Qualitative researchers’ reluctance to identify variables and predictions immediately stems from the view that contexts and participants differ and must be understood on their own terms before a researcher can begin hypothesizing or judging. Thus, qualitative researchers have more discretion in determining when and how to examine or narrow a problem.

Identifying patterns and associations in the setting often helps a researcher discover ideas and questions that lead to new hypotheses. For example, the repeated observation that, early in the school year, first-grade students can accurately identify the “smart” and the “not smart” students in class may suggest a hypothesis about how teachers’ actions and words communicate students’ status in the classroom. In simple terms, it is generally appropriate to say that a strength of qualitative research is in generating hypotheses, not testing hypotheses.

Having identified a guiding hypothesis, the qualitative researcher may operationalize the hypothesis through the development of research questions that provide a focus for data collection. Qualitative research questions encompass a range of problems, but most focus on participants’ understanding of meanings and social life in a particular context. However, these general problems must necessarily be more focused to become useful and researchable questions. For example, the problem “What are the cultural patterns and perspectives of this group in its natural setting?” can be narrowed by asking, “What are the cultural patterns and perspectives of teachers during lunch in the teachers’ lounge?” Similarly, the problem “How do people make sense of their everyday activities to behave in socially acceptable ways?” may be narrowed by asking, “How do rival gang members engage in socially acceptable ways when interacting with each other during the school day?” Clearly, there are many ways to restate these questions to make them viable and focused research questions. In most cases, the purpose of narrowing questions is to reduce aspects of the problem, much as a hypothesis does for quantitative research, because most researchers overestimate the proper scope of a study.

Summary

Identifying a Research Problem

1. The first step in selecting a research problem is to identify a general subject that is related to your area of expertise and is of particular interest to you.

Sources of Research Problems

2. The five main sources of research problems are theories, personal experiences, previous studies that can be replicated, electronic mailing lists, and library searches.
3. Theories are organized bodies of concepts, generalizations, and principles. Researchers often study particular aspects of a theory to determine its applicability or generalizability.
4. A researcher’s personal experiences and concerns often lead to useful and personally rewarding studies. Common questions, such as “Why does that happen?” and “What would happen if …?” can be rich problem sources.
5. Existing studies are a common source of research problems. Replication of a study usually involves changing some feature from the original study.
6. Library searches are generally not efficient ways to identify research problems. Handbooks, encyclopedias, and yearbooks that cover many problems briefly are more useful. Library resources are invaluable, however, after you have identified a problem to study.
7. Electronic mailing list services are designed by organizations to facilitate communication (usually via the Internet) among their members. Other digital tools such as RSS feeds, Facebook, Twitter, and blogs keep researchers updated on what others are investigating.

Narrowing the Problem

8. After an initial problem is identified, it often needs to be narrowed and focused into a manageable problem to study.
9. Quantitative research problems are usually narrowed quickly at the start of a study. Qualitative research problems are not usually narrowed until the researcher has more information about the participants and their setting.

Characteristics of Good Problems

10. Two basic characteristics of a good research problem are that it is of interest to the researcher and that it is researchable using the collection and analysis of data. Problems related to philosophical and ethical issues (i.e., should questions) are not researchable.
11. A good problem has theoretical or practical significance; its solution contributes in some way to improving the educational process.
12. A good problem is one that is ethical and does not harm participants in any way.
13. A good problem for you must be a problem that can be adequately investigated given your current level of research skill, the available resources, and time and other restrictions.

Stating the Research Problem

14. The problem statement is the first item in the introduction to a research plan and the introduction to the final research report. It provides direction for the remaining aspects of both.
15. A well-written problem statement for a quantitative study generally indicates the variables of interest, the specific relations among those variables, and (ideally) the characteristics of the participants. Qualitative research problems are usually stated in general language because qualitative researchers need to become attuned to the research context before narrowing their problem.

Developing Research Questions

16. Developing research questions breathes life into the research problem statements.
17. The research questions add another level of specificity to the development of the research problem and provide the researcher with an action plan for the development and identification of research instruments.

Formulating and Stating a Hypothesis

18. A hypothesis is a researcher’s prediction of the research findings.
19. Researchers do not set out to prove a hypothesis but rather collect data that either support or do not support it.

Definition and Purpose of Hypotheses in Quantitative Studies

20. A hypothesis in a quantitative study is formulated based on theory or on knowledge gained while reviewing the related literature.
21. A critical characteristic of a good hypothesis is that it is based on a sound rationale. A hypothesis is a reasoned prediction, not a wild guess. It is a tentative but rational explanation for the predicted outcome.
22. A good hypothesis states clearly and concisely the expected relations or differences between variables. Variables should be stated in measurable terms.
23. A well-stated and well-defined hypothesis must be testable.

Types of Hypotheses

24. An inductive hypothesis is a generalization made from a number of observations. A deductive hypothesis is derived from theory and is aimed at providing evidence that supports, expands, or contradicts aspects of a given theory.
25. A research hypothesis states the expected relation or difference between variables, which the researcher expects to test through the collection and analysis of data.
26. A nondirectional hypothesis predicts only that a relation or difference exists; a directional hypothesis indicates the direction of the difference as well. A null hypothesis predicts that there is no significant relation or difference between variables.

Stating the Hypothesis

27. A general paradigm, or model, for stating hypotheses for experimental studies is P who get X do better on Y than P who do not get X (or get some other X). P refers to participants, X refers to the treatment or independent variable (IV), and Y refers to the outcome or dependent variable (DV).

Testing the Hypothesis

28. The researcher selects the sample, measuring instruments, design, and procedures that will enable him or her to collect the data necessary to test the hypothesis. Those data are analyzed to determine whether or not the hypothesis is supported.

Definition and Purpose of Hypotheses in Qualitative Studies

29. Typically, qualitative researchers do not state formal hypotheses prior to the study. However, a qualitative researcher may develop guiding hypotheses for the proposed research.
30. Having identified a guiding hypothesis, the qualitative researcher may operationalize the hypothesis through the development of research questions that provide a focus for data collection. Qualitative researchers are likely to generate new hypotheses as a result of their studies.


Chapter Three

Literature Review

Real Genius, 1985

“Too often the review of related literature is seen as a necessary evil to be completed as fast as possible so that one can get on with the ‘real research’.” (p. 107)

Learning Outcomes

After reading Chapter 3, you should be able to do the following:

1. Define the purpose and scope of a review of related literature.
2. Describe the role of the literature review in qualitative research.
3. Identify keywords and identify, evaluate, and annotate sources.
4. Describe the steps involved in analyzing, organizing, and reporting a review of the literature.
5. Define meta-analysis and describe the process for conducting a meta-analysis.

The chapter outcomes form the basis for Tasks 2A and 2B, which require you to

■ Identify 10 to 15 good references (sources) that relate directly to a problem of interest. The references should include a variety of source types (e.g., books, articles, Internet reports).
■ Evaluate and abstract those references.
■ Write an introduction for a research plan, including a complete review of the literature that supports a testable hypothesis.

Task 2A

Write an introduction for a quantitative research plan. Include a statement of the research problem, a statement concerning the importance or significance of the problem, a brief review of related literature, and a testable hypothesis regarding the outcome of your study. Include definitions of terms where appropriate (see Performance Criteria, p. 132).

Task 2B

Write an introduction for a qualitative research plan. Include a statement of the research problem, a statement concerning the importance or significance of the problem, a brief review of related literature, and a guiding hypothesis for your study. Include definitions of terms where appropriate (see Performance Criteria, p. 132).

Review of Related Literature: Purpose and Scope

Having happily found a suitable problem, the beginning researcher is usually raring to go. Too often the review of related literature is seen as a necessary evil to be completed as fast as possible so that one can get on with the “real research.” This perspective reflects a lack of understanding of the purposes and importance of the review and a feeling of uneasiness on the part of students who are not sure how to report the literature. Nonetheless, the review of related literature is as important as any other component of the research process and can be conducted quite painlessly if approached in an orderly manner. Some researchers even find the process quite enjoyable!

The review of related literature involves the systematic identification, location, and analysis of documents containing information related to the research problem. The term is also used to describe the written component of a research plan or report that discusses the reviewed documents. These documents can include articles, abstracts, reviews, books, dissertations, government publications, and other research reports. The major purpose of reviewing the literature is to determine what has already been done that relates to your problem. This knowledge not only prevents you from unintentionally duplicating another person’s research but also gives you the understanding and insight you need to place your problem within a logical framework. Previous studies can provide the rationale for your research hypothesis and indicate what needs to be done to help you justify the significance of your study. Put simply, the review tells you what has been done and what needs to be done.

Another important purpose of reviewing the literature is to discover research strategies and specific data collection approaches that have or have not been productive in investigations of problems similar to yours. This information will help you avoid other researchers’ mistakes and profit from their experiences. It may suggest approaches and procedures that you previously had not considered.

For example, suppose your problem involved the comparative effects of a brand-new experimental method versus the traditional method on the achievement of eighth-grade science students. The review of literature may reveal 10 related studies that found no differences in achievement. Several of the studies, however, may suggest that the brand-new method is more effective for certain kinds of students than for others. Thus, you may reformulate your problem to involve the comparative effectiveness of the brand-new method versus the traditional method on the achievement of a subgroup of eighth-grade science students, such as those with low aptitude.

Being familiar with previous research also facilitates interpretation of your study results. The results can be discussed in terms of whether or how they agree with previous findings. If the results contradict previous findings, you can describe differences between your study and the others, providing a rationale for the discrepancy. If your results are consistent with other findings, your report should include suggestions for the next step; if they are not consistent, your report should include suggestions for studies that may resolve the conflict.

Beginning researchers often have difficulty determining how broad and comprehensive their literature reviews should be. At times, all the literature will seem directly related to the problem, so it may be difficult to determine when to stop. Determining if an article is truly relevant to the problem is complicated and requires time. Unfortunately, there is no simple formula to solve the problem. You must decide using your own judgment and the advice of your teachers or advisers. The following general guidelines can assist you:

■ Avoid the temptation to include everything you find in your literature review. Bigger does not mean better. A smaller, well-organized review is definitely preferred to a review containing many studies that are only tangentially related to the problem.
■ When investigating a heavily researched area, review only those works that are directly related to your specific problem. You will find plenty of references and should not have to rely on less relevant studies. For example, the role of feedback for verbal and nonverbal learning has been studied extensively in both non-human animals and human beings for a variety of different learning tasks. Focus on those using similar subjects or similar variables. For example, if you were concerned with the relation between frequency of feedback and chemistry achievement, you would probably not have to review feedback studies related to non-human animal learning.
■ When investigating a new or little-researched problem area, review any study related in some meaningful way to your problem. Gather enough information to develop a logical framework for the study and a sound rationale for the research hypothesis. For example, suppose you want to study the effects of an exam for non-English-speaking students on grade point average (GPA). The students must pass the exam to graduate. Your literature review would probably include any studies that involved English as a second language (ESL) classes and the effects of culture-specific grading practices as well as studies that identified strategies to improve the learning of ESL students. In a few years, there will probably be enough research on the academic consequences of such an exam on non-English-speaking students to permit a much more narrowly focused literature review.

A common misconception among beginning researchers is that the worth of a problem is directly related to the amount of literature available about it. This is not the case. For many new and important areas of research, few studies have been published. The effect(s) of high-stakes testing is one such area. The very lack of such research often increases the worth of its study. On the other hand, the fact that a thousand studies have already been done in a given problem area does not mean there is no further need for research in that area. Such an area will generally be very well developed, and subproblems that need additional research will be readily identifiable.

Qualitative Research and the Review of Related Literature

Both qualitative and quantitative researchers construct literature reviews. Unlike quantitative researchers, however, who spend a great deal of

time examining the research on their problems at the outset of the study, some qualitative researchers do not delve deeply into their literature until the problem has emerged over time. Qualitative researchers disagree about the role of the literature review in the research process. Some qualitative researchers have argued that reviewing the literature curtails inductive analysis (using induction to determine the direction of the research) and should be avoided at the early stages of the research process.1 Others suggest that the review of related literature is important early in the qualitative research process because it serves the following functions:2

■ The literature review demonstrates the underlying assumptions (i.e., propositions) behind the research questions that are central to the research proposal.
■ The literature review provides a way for the novice researcher to convince the proposal reviewers that he or she is knowledgeable about the related research and the intellectual traditions that support the proposed study.3
■ The literature review provides the researcher with an opportunity to identify any gaps that may exist in the body of literature and to provide a rationale for how the proposed study may contribute to the existing body of knowledge.
■ The literature review helps the researcher to refine the research questions and embed them in guiding hypotheses that provide possible directions the researcher may follow.

We recommend that qualitative researchers conduct a review of related literature but also recognize that the review serves a slightly different purpose than the one outlined for quantitative researchers.

Conducting a literature review follows a basic set of steps for both quantitative and qualitative research. Table 3.1 outlines the basic process to take when reviewing the literature.

1 Qualitative Research for Education: An Introduction to Theory and Methods (3rd ed.), by R. C. Bogdan and S. K. Biklen, 1998, Boston, MA: Allyn & Bacon.
2 Designing Qualitative Research (2nd ed.), by C. Marshall and G. Rossman, 1995, Thousand Oaks, CA: Sage.
3 Ibid., p. 28.

Table 3.1 • Conducting a literature review

1. Identify and make a list of keywords to guide your literature search.
2. Search appropriate databases using your keywords and identify authoritative subject headings to locate primary and secondary sources pertaining to your research problem.
3. Evaluate your sources for quality.
4. Abstract your sources.
5. Analyze and organize your sources using a literature matrix.
6. Write the literature review.

Identifying Keywords and Subject Terms, and Identifying, Evaluating, and Annotating Sources

Identifying Keywords

The words you select for your searches dictate the success of your research. Before you begin your research, list possible keywords to guide your literature search. As you progress through your searching, add keywords and subject headings related to your search. Most of the initial source works you consult will have an alphabetical subject index or a thesaurus to help you locate information about your problem. You can look in these indexes for the keywords you have selected. Databases such as the Education Resources Information Center (ERIC) and Education Full Text generate subject headings or descriptors with the search results.

For example, if your problem concerns the effect of interactive multimedia on the achievement of 10th-grade biology students, the logical keywords would be interactive multimedia and biology. However, when beginning with a keyword search for interactive multimedia in a database such as ERIC, you will see a list of possible subject headings such as multimedia instruction, computer-assisted instruction, multimedia materials, games, or hypermedia. These subject headings may also be called descriptors. It is important that you understand the difference between the keyword search and a subject heading and, perhaps more

important, why you want to connect to the subject headings.

Every article that is indexed in a database such as ERIC or Education Full Text is read by a human being who determines the problems addressed in the article. The problems are listed as subject headings or descriptors in the article citation. Therefore, a subject search is more precise than a keyword search that searches for the words anywhere in the complete record of an article. If the words appear one time in the full text of an article, you will retrieve that article even though it may not be very relevant to your search. Subject headings or descriptors connect you with the major concepts that you are searching for, not just the words. You may have to mix and match your search terms to retrieve more accurate and relevant results. The keywords and subject headings are sometimes obvious for some searches, such as biology. For others, you may have to play detective. Giving a bit of thought to possible keywords and subject headings should facilitate an efficient beginning to an effective search. As you progress through your search, try to identify additional keywords and subject headings that you can use to reformulate a search to produce different and more relevant results.

Identifying Your Sources

For your review, you will examine a range of sources that are pertinent to your problem. To start, it is best to consult educational encyclopedias, handbooks, and annual reviews found in libraries. These resources, some of which were mentioned earlier in the discussion on narrowing your problem, provide summaries of important problems in education and reviews of research on various problems. They allow you to get a picture of your problem in the broader context and help you understand where it fits in the field. You may also find these sources useful for identifying search terms and aspects related to your problem that you may not have considered.

The following are some examples of handbooks, encyclopedias, and reviews relevant to educational research:

■ The International Encyclopedia of Education
■ Encyclopedia of Curriculum Studies
■ Handbook of Research on Teacher Education: Enduring Questions in Changing Contexts
■ Handbook of Research on the Education of Young Children
■ Handbook of Latinos and Education: Theory, Research, and Practice
■ Handbook of Research on Practices and Outcomes in E-Learning: Issues and Trends
■ Handbook of Reading Disability Research
■ Handbook of Research on the Education of School Leaders
■ Handbook of Research on New Media Literacy at the K–12 Level: Issues and Challenges
■ Handbook of Research on Children’s and Young Adult Literature
■ Handbook of Research on Teaching the English Language Arts
■ Handbook of Education Policy Research
■ Handbook of Research on School Choice
■ Handbook of Research on Literacy and Diversity
■ Handbook of Education Finance and Policy
■ Handbook of Research in the Social Foundations of Education
■ Handbook of Research on Schools, Schooling, and Human Development

It’s important to distinguish between two types of sources used by educational researchers: primary and secondary sources. A primary source contains firsthand information, such as an original document or a description of a study written by the person who conducted the study. The data are factual rather than interpretive, so the study is more valued than secondary research. Research reports, dissertations, experiments, surveys, conference proceedings, letters, and interviews are some examples of primary sources. There is a difference between the opinion of an author and the results of an empirical study. The latter is more valued in a review.

A secondary source is a source that interprets or analyzes the work of others, either a primary source or another secondary source, such as a brief description of a study written by someone other than the person who conducted it. Secondary sources are often used to review what has already been written or studied. Education encyclopedias, handbooks, and other reference works typically contain secondhand information summarizing research studies conducted on a given problem. Secondary sources usually give complete bibliographic information for the references cited, so they can direct you to relevant primary sources, which are preferred over secondary sources.
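The difference between a keyword search and a subject-heading search, described earlier in this section, can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the three records, their titles and descriptors, and the function names are invented, and real databases such as ERIC use far more elaborate indexing. The sketch only shows why matching words anywhere in a record can return a loosely related item and miss a relevant one, whereas matching an assigned descriptor returns items on the concept itself.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A made-up database record: title, abstract text, assigned descriptors."""
    title: str
    text: str
    descriptors: list = field(default_factory=list)


# Hypothetical records, invented for this illustration only.
RECORDS = [
    Record("Hypermedia labs in tenth-grade biology",
           "Students used interactive multimedia simulations of cell division.",
           ["multimedia instruction", "biology"]),
    Record("Budgeting for school libraries",
           "One district also bought interactive multimedia kiosks for its lobby.",
           ["educational finance", "school libraries"]),
    Record("Computer-assisted genetics tutorials",
           "Animated tutorials improved achievement on a genetics unit.",
           ["multimedia instruction", "computer-assisted instruction"]),
]


def keyword_search(records, phrase):
    """Match the phrase anywhere in the complete record."""
    phrase = phrase.lower()
    return [r.title for r in records
            if phrase in " ".join([r.title, r.text, *r.descriptors]).lower()]


def subject_search(records, descriptor):
    """Match only the descriptors a human indexer assigned to the record."""
    descriptor = descriptor.lower()
    return [r.title for r in records
            if descriptor in [d.lower() for d in r.descriptors]]


print(keyword_search(RECORDS, "interactive multimedia"))
print(subject_search(RECORDS, "multimedia instruction"))
```

In this toy collection the keyword search returns the library budgeting record, which mentions the phrase only in passing, and misses the genetics tutorial record, which never uses the phrase. The subject search returns the two records that were indexed under the concept.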

Searching for Books on Your Research Problem in the Library

Having identified your keywords and some potential resources, you are ready to make an initial foray into your university library. Because it will be a second home to you, at least for a while, you should become familiar with the library. The time you spend here initially will save more in the long run. You should learn about the resources that are available and how to access them both within and outside the building. You should know how to navigate your library’s entire website and how to access resources from any location with a connection to the Internet. Most libraries, especially university libraries, provide help and education in the use of their resources. You should be familiar with the services offered by the library as well as the rules and regulations regarding the use of library materials. Libraries provide access to materials through interlibrary loan for articles and to collective library catalogs to find books that are available from other institutions.

Most university libraries have reference librarians on duty and 24/7 online chat reference to assist you with your research questions and to help you find materials. There also is a librarian who is the liaison to the education department and specializes in research within the discipline. This librarian has experience in both K–12 and graduate education and is very skilled in helping folks track down resources. Librarians are usually very willing to help you, but you should also learn to navigate the library on your own. The librarian is available to work with you, not to do your research. With or without a librarian’s help, you should take advantage of the resources available from the library, such as the online catalog, databases like ERIC and Education Full Text, and government publications. Learn to browse the stacks to search for books on your problem. Most of all, make a point to know the education librarian. Generally, if you are taking more than 15 to 20 minutes trying to find something, then you should ask for assistance. You may even want to set up an appointment with the education librarian.

Using Library Catalogs. Although significant technological advances have changed the way research is conducted in the library, individual libraries vary greatly in their ability to capitalize on increasingly available options. In today’s academic libraries, card catalogs of previous generations have been replaced with online catalogs. These online catalogs provide access to the resources in the library as well as to collective catalogs accessing materials from other libraries within a particular region as part of a library’s consortium agreement with other institutions. For students, getting books through the collective catalog and having them delivered to the library at the students’ institution is generally free. These electronic catalogs are extremely user-friendly and connect you with sources beyond your particular library. The online catalog often offers you an opportunity to preview the work and review the table of contents, the introduction, and parts of the chapters that will help you decide whether there is material relevant to your area.

To locate books, video, and other materials such as government documents, you need to conduct a search of the library catalog. To search by problem, begin with a keyword search. In library catalogs, a keyword search searches the entire record of an item, which includes the content notes (chapter headings or titles of essays within a book). For example, to find summaries of research previously conducted in an area of psychology, you may enter the keywords “handbook” and “psychology” or “encyclopedia” and “psychology.” If you search for a particular problem such as transformative learning, enter those terms as a keyword search. The keyword search is important when you are looking for books because the search retrieves items with your keywords in the title, subject heading, and the content notes listed in the item record. Because the content notes provide a listing of essays and chapter headings within a book, a keyword search could retrieve an essay about transformative learning in a book about adult learning. Always check the subject headings of any relevant item that you locate. You will find the subject headings important for finding similar items and further information. If you know a title of a book or an author, then you can also search for the specific title or author.

If you are at the beginning of your search for primary sources, add search terms slowly and thoughtfully. Refrain from searching phrases such as “developing active learning activities in the classroom.” Choose the main concepts from your research question: “active learning” and “classroom

activities.” Add additional search terms, concept by concept, depending on the amount of materials you retrieve and how narrow you want your search. If you need a relatively small number of references and if a significant amount of research has been published about your problem, a narrow search will likely be appropriate. If you need a relatively large number of references and very little has been published about your problem, a broad search will be better. If you do not have a sense of what is available, your best strategy is to start narrow and broaden as necessary. For example, if you find very few references related to the effect of interactive multimedia on the achievement of 10th-grade biology students, you can broaden your search by including all sciences or all secondary students.

A useful way to narrow or broaden a keyword search is to use Boolean operators, words that tell the computer the keywords you want your search results to include or exclude. Common Boolean operators are the words AND, OR, and NOT. Put simply, using the connector AND or NOT between keywords narrows a search, whereas using the connector OR broadens one. If you search “multiple intelligences AND music,” you will obtain a list of references that refer to both multiple intelligences and music. If you search “multiple intelligences NOT music,” your search will retrieve references pertaining to multiple intelligences but will exclude references pertaining to music. A search for “multiple intelligences OR music” retrieves references that relate to either or both concepts. By using various combinations of the AND and OR connectors, you can vary your search strategy as needed. Note that it is difficult to develop a search model that can be followed in every library and every catalog or database. You must get acquainted with the unique search strategies and methods that are successful within your library environment. It is a good idea to check with the education librarian to determine if additional Boolean operator strategies are best suited to your search.

Browsing the Stacks. With access to online catalogs, many new researchers may not consider an older strategy for locating books: browsing the stacks. This strategy is similar to the kind of activity you may undertake at a public library when looking for a new fiction book to read. If you can locate the area of the library with books related to your area of focus, it can be productive to browse and pull interesting books off the shelves. You may also find leads to related materials when you initiated your search on the computer. Remember, libraries try to organize like objects with like objects. When you locate a relevant item on the shelves, it is always prudent to look at other items nearby.

Steps for Searching Computer Databases

The online catalog found in a library is an example of a database, a sortable, analyzable collection of records representing items such as books, documents, DVDs, and videos that are maintained on a computer. Other types of subject-specific databases are also used in research to search indexes of articles (some of which are full text), books, abstracts, or other documents. These databases, such as ERIC, Education Full Text, PsycINFO, and others, provide an excellent way to identify primary sources and secondary sources.

The steps involved in searching a research database are similar to those involved in a book search except that it is more critical to identify appropriate subject headings or descriptors to retrieve highly relevant material.

1. Identify keywords related to your problem.
2. Select the appropriate databases. Some databases using the same interface may allow you to search more than one database simultaneously.
3. Initiate a search using your keywords selectively. Some databases map to subject headings or descriptors, requiring you to build your search term by term. Other databases provide a list of subject headings or descriptors based on the results of your search. For example, in Figure 3.1, you can see the results of a keyword search using “cooperative learning” and “student achievement” with a possible 1,761 articles. These initial hits require additional sorting in order to determine the relevancy for your review of related literature.
4. Reformulate your search using appropriate subject headings or descriptors, combining terms as appropriate. Remember that combining too many terms may result in little or no retrieved items. If this occurs, mix and match search terms or broaden the search to produce better results. For example, in Figure 3.2, “student achievement” is called “academic achievement” and results in a more targeted list of references.

Figure 3.1 • Sample of EBSCO keyword search

Figure 3.2 • Sample of ERIC/EBSCO search: reformulating with subject descriptors
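The Boolean connectors AND, OR, and NOT used when combining search terms behave like operations on sets of records, which may help when deciding whether a reformulated search will narrow or broaden the results. The sketch below is only an illustration under stated assumptions: the index, its record IDs, and the function name `boolean_search` are invented, and no real catalog or database works from a table this small.

```python
# Hypothetical index: each search term maps to the set of record IDs it matches.
# The two terms come from the text; the IDs are invented for illustration.
INDEX = {
    "multiple intelligences": {101, 102, 103, 104},
    "music": {103, 104, 105},
}


def boolean_search(left, operator, right, index=INDEX):
    """Combine the hits for two terms with AND, OR, or NOT."""
    a, b = index[left], index[right]
    if operator == "AND":   # records that mention both terms (narrows)
        return a & b
    if operator == "OR":    # records that mention either term (broadens)
        return a | b
    if operator == "NOT":   # records with the left term but not the right (narrows)
        return a - b
    raise ValueError(f"unsupported operator: {operator}")


for op in ("AND", "OR", "NOT"):
    hits = boolean_search("multiple intelligences", op, "music")
    print(f"multiple intelligences {op} music -> {sorted(hits)}")
```

With these invented IDs, AND keeps the two records shared by both terms, OR returns all five records, and NOT keeps the two records on multiple intelligences that do not mention music.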

Figure 3.3 • Sample ERIC/EBSCO: sample record

5. Once you have found a relevant article, check the item record for links to additional subject headings or descriptors, author(s), cited references, times cited in database, or other references for finding additional related items using the features within the database. For example, the record in Figure 3.3 gives other descriptors used to classify and a link to other articles written by the same author that are in the database.
6. Most databases provide a link that creates a citation in various formats including American Psychological Association (APA). Although the citations still need to be checked to see if they are exactly correct, they provide an excellent start to creating your list of references. ERIC/EBSCO allows you to create an account and to save your references in either APA, American Medical Association (AMA), Chicago Manual of Style, or Modern Language Association (MLA) formats (see Figure 3.4).

Figure 3.4 • Sample APA citation

7. Many databases allow you to create an account, so you can log in to save and manage your searches as well as your relevant research articles. This feature is an important part of utilizing the capacity of a particular database not only to retrieve relevant research but also to manage your sources. For example, Figure 3.5 shows how you can return to your MyEBSCOhost account at any time to copy your references, which can be pasted into your review of related literature document. When it comes time to write your review of related literature, you will be very thankful to have created this account!

Figure 3.5 • Managing references in a database

The following sections describe some of the commonly used databases for searches of education literature.

Education Resources Information Center (ERIC). ERIC was established in 1966 by the National Library of Education as part of the United States Department of Education’s Office of Educational Research and Improvement and is now sponsored by the Institute of Education Sciences (IES)

of the U.S. Department of Education. ERIC is the largest digital library of education literature in the world. The online database provides information on subjects ranging from early childhood and elementary education to education for gifted children and rural and urban education. ERIC is used by more than 500,000 people each year, providing them with access to more than 1.3 million bibliographic records of journal articles and more than 107,000 full-text non-journal documents.

In 2004, the ERIC system was restructured by the Department of Education. The ERIC database is available at almost every academic library or via the ERIC website. The website uses the most up-to-date retrieval methods for the ERIC databases, but it is no match for the database interfaces provided by your academic library, particularly because the database links to the full text for many of your articles. Given a choice, search ERIC via the interface available through your library, such as EBSCO. Doing so will allow you to link automatically to full-text articles when they are available through your library. Regardless of whether you choose to use your library’s database interfaces or the government-sponsored ERIC website, ERIC is a formidable database for searching educational materials that is relatively quick and easy to search.

When you search ERIC, you may notice that documents are categorized with an ED or EJ designation. An ED designation, or ERIC Document, is generally used for unpublished documents, such as reports, studies, and lesson plans. For the most part, ED references are available in academic libraries as full-text online documents within the database or via microfiche if they are very old. An EJ designation, or ERIC Journal, is used for articles that have been published in professional journals. EJ articles are often available in full text from the ERIC database at an academic library. If you are using the ERIC collection on the Web, then the full text may not be available and must be tracked down in the periodicals collection of a library or purchased from an article reprint company.

Although ERIC is the largest computer database for searches of education literature, it is not the only source available. Other commonly used databases in education are described next.

Education Full Text. The Education Full Text database is composed of articles historically available within the Wilson Education Index and references articles published in educational periodicals since 1983. In June 2011, the H.W. Wilson database merged with EBSCO, so the interface is the same as the ERIC database. The Education Full Text database provides reference to many full-text articles that are not available in the ERIC database, so it is important to search both databases for more comprehensive research. In addition to article abstracts, the database includes citations for yearbooks and monograph series, videorecordings, motion picture and computer program reviews, and law cases.

PsycINFO. The PsycINFO database is the online version of Psychological Abstracts, a former print source that presents summaries of completed psychological research studies. Psychological Abstracts ceased its print publication in December 2006. PsycINFO contains summaries of journal articles, technical reports, book chapters, and books in the field of psychology. It is organized by subject area according to the PsycINFO classification codes for easy browsing. The classification codes can be accessed at …tips-classcodes.html. These classification codes allow you to retrieve abstracts for studies in a specific category, for example, Developmental Disorders and Autism (3250) or Speech and Language Disorders (3270).

Dissertation Abstracts. Dissertation Abstracts contains bibliographic citations and abstracts from all subject areas for doctoral dissertations and master’s theses completed at more than 1,000 accredited colleges and universities worldwide. The database dates back to 1861, with abstracts included from 1980 forward. If after reading an abstract you want to obtain a copy of the complete dissertation, check to see if it is available in your library. If not, speak to a librarian about how to obtain a copy. You can request a dissertation from your library through interlibrary loan. Be aware that there may be charges to get the dissertation from the lending library.

Searching the Internet and the World Wide Web

An abundance of educational materials is available on the Web, from primary research articles and educational theory to lesson plans and research guides. Currently, a proficient researcher can access

information in a variety of formats, such as video, images, multimedia, PowerPoint presentations, screen captures, tutorials, and more. Blogs, RSS feeds, podcasts, wikis, email, and other Web 2.0 tools offer researchers a variety of alternative means for finding information. Also, as search engines develop to include more sophisticated methods for finding research, both the technologically savvy and traditional researchers can find primary sources using tools such as Google Scholar, Google Books, and more. Even Wikipedia can provide background information that can help a researcher understand fundamental concepts and theory that lead to better keywords and strategies for searching. YouTube EDU provides access to educational videos that include lectures and illustrations. For further discussion on using Google, see the box feature in this chapter called Digital Research Tools for the 21st Century: Google Searches.

Internet search engines are structured and function differently than databases. The essential difference is that a search engine is compiled and organized through programming. Search results are retrieved using complex algorithms that establish an essential relevancy for each site or page. No person reads or reviews the list of sites compiled within a search engine. This somewhat explains why an Internet search breaks down after the second or third page and begins to contain more and more irrelevant information. A database is also compiled through programming, but it is organized by humans who have read the contents, such as the articles. It is essential to connect with the human organization of a database by using the appropriate subject headings and descriptors that are not present in an Internet search engine.

The resources you can find on the Web are almost limitless. More and more print material is being digitized, and new sites are constantly being developed and tested to provide more and more access to information. With just a few clicks, you can access electronic educational journals that provide full-text articles, bibliographic information, and abstracts. You can obtain up-to-the-minute research reports and information about educational research activities undertaken at various research centers, and you can access education sites that provide links to a range of resources that other researchers have found especially valuable. But be warned: there is little quality control on the Internet, and at times, the sheer volume of information on the Web can be overwhelming. Some Internet sites post research articles selected specifically to promote or encourage a particular point of view or even an educational product. Blogs and wikis provide excellent modes of creating and manipulating content to share and communicate ideas and concepts, but they are not always as robust as peer-reviewed academic research. The key is to make sure you understand the strengths and limits of the sources you use.

The following are some websites that are especially useful to educational researchers:

CSTEEP: The Center for the Study of Testing, Evaluation, and Educational Policy. The website for this educational research organization contains information on testing, evaluation, and public policy studies on school assessment practices and international comparative research.

National Center for Education Statistics. This site contains statistical reports and other information on the condition of U.S. education. It also reports on education activities internationally.

Developing Educational Standards. This site contains a wealth of up-to-date information regarding educational standards and curriculum frameworks from all sources (e.g., national, state, local, and other). Information on standards and frameworks can be linked to by subject area, state, governmental agency, or organization. Entire standards and frameworks are available.

U.S. Department of Education. This site contains links to the education databases supported by the U.S. government (including ERIC). It also makes available full-text reports on current findings on education and provides links to research offices and organizations as well as research publications and products.

Becoming a Member of Professional Organizations

Another way to find current literature related to your research problem is through membership in

professional organizations. The following list gives names of a few U.S.-based professional organizations that can be valuable resources for research reports and curriculum materials. In countries other than the United States, there are likely to be similar organizations that can also be accessed through an Internet search. This list of professional organizations is not intended to be comprehensive because there are as many professional organizations as there are content areas (e.g., reading, writing, mathematics, science, social studies, music, health, and physical education) and special interest groups (e.g., Montessori education). Try searching the Education Resource Organizations Directory or browse the About ED - Educational Associations and Organizations site ( gen/othersites/associations.html) to discover and learn about some of the associations that support teachers and specific disciplines in education.

Association for Supervision and Curriculum Development (ASCD). Boasting 160,000 members in more than 135 countries, ASCD is one of the largest educational organizations in the world. ASCD publishes books, newsletters, audiotapes, videotapes, and some excellent journals that are valuable resources for teacher researchers, including Educational Leadership and the Journal of Curriculum and Supervision.

Digital Research Tools for the 21st Century
Google Searches

Google Books

Google Books searches for books, and within the content of books, from Google’s digital book collection. The searchable collection of digitized books contains full text as well as limited selections and previews or snippet views of the content, including front cover, table of contents, indexes, and other relevant information like related books, posted reviews, and key terms. As such, Google Books offers an alternative search mechanism to a library catalog for finding and previewing books and information inside books. Google Books searches full-text content, so a search can often retrieve more specific information that a library catalog will not retrieve.

In most cases, however, Google Books does not provide the full text of all the books that it finds, so it is best used in conjunction with a library catalog or the collective catalog from a consortium of libraries. For example, you may search Google Books and find a relevant book. After reviewing the information such as the table of contents and the limited preview of the book, you may want to search your library catalog to obtain the book. On the other hand, you may find an item record of a book using your library catalog that does not contain much information about the book in the record. You may not be able to see the table of contents or other information other than the title and the subject headings. As an alternative, you could search the title of the book in Google Books to find additional information such as the table of contents or even a preview of the contents of the book.

Google Books began in 2004, and as Google digitizes more and more content into the Google Books database, the usefulness to a researcher will continue to expand. Be aware that you will want to consider limiting the publication date of a search using the advanced feature to retrieve more current materials. The default search is set to relevancy, and the most relevant material may be too old for the research you are doing.

Google Scholar

Google Scholar offers simple and free access to scholarly information. Originally released in a beta version in November 2004, Google Scholar searches for articles in full text and the citation and abstract. It also searches the Google Books database for books. To take full advantage of Google Scholar, you should click on Scholar Preferences and set the Library Links feature to access your library. This allows you to obtain the full text of the articles you find through your library and your library databases. You may also want to set your preferences to retrieve only articles. Google

Scholar also includes links to other articles that cited an article and to related articles.

Again, for finding scholarly and peer-reviewed journal articles, you will ultimately want to use your library’s access to the ERIC or Education Full Text databases. However, Google Scholar can often help you tease out the correct descriptors or subject headings for finding articles in your library databases. This is especially true if you find the full text of an article in a database from your library. If you are familiar with searching Google, then searching Google Scholar allows you to find relevant information using the simple and familiar search strategies you use to search the Web. Your initial search can start with Scholar, which can ultimately lead you to more sophisticated searching in library databases.

Government search

A government search site is a powerful online tool for finding U.S. federal government and state government information. For education research, it refines a typical search by limiting the results to information from federal and state domains. For example, you may search for “standards aligned curriculum” to determine what activities are happening in various states. You can also limit a search to a particular state, such as Oregon, to retrieve information specifically from sites such as the Oregon Department of Education. Because so much educational information and decision making can be found on government sites, such a search is a good option for finding relevant primary information not found in books and journal articles.

National Council of Teachers of Mathematics (NCTM). With nearly 100,000 members, NCTM is dedicated to the teaching and learning of mathematics and offers vision and leadership for mathematics educators at all age levels. NCTM provides regional and national professional development opportunities and publishes the following journals: Teaching Children Mathematics, Mathematics Teaching in the Middle School, Mathematics Teacher, Online Journal for School Mathematics, and Journal for Research in Mathematics Education.

National Council for the Social Studies (NCSS). NCSS supports and advocates social studies education. Its resources for educators include the journals Social Education and Social Studies and the Young Learner.

National Education Association (NEA). The mission of NEA is to advocate for education professionals and fulfill the promise of public education to prepare students to succeed in a diverse and interdependent world.

National Science Teachers Association (NSTA). NSTA, with more than 55,000 members, provides many valuable resources for science teachers. It develops the National Science Education Standards and publishes the journals Science and Children, Science Scope, The Science Teacher, and Journal of College Science Teaching.

International Reading Association (IRA). IRA provides resources to an international audience of reading teachers through its publication of the journals The Reading Teacher, Journal of Adolescent and Adult Literacy, and Reading Research Quarterly.

About ED - Educational Associations and Organizations ( gen/othersites/associations.html). This U.S. Department of Education site lists a variety of educational associations and organizations.

Education Resource Organizations Directory. The Education Resource Organizations Directory can help you identify and contact a wide range of educational organizations in the discipline.

Evaluating Your Sources

When you have retrieved a list of sources, you will need to evaluate them to determine not only if these sources are relevant but also whether they are reliable and legitimate. Good researchers must be able to discern the quality and limitations of a source, so good research requires excellent judgment. The statements in Table 3.2 can serve as a rubric for

Table 3.2 • Rubric for evaluating print and Internet sources

Evaluation criteria: 1 = Poor, 2 = Below Average, 3 = Average, 4 = Above Average, 5 = Excellent

Dimension: Relevancy
1. The source does not address the research interests of your study.
2. The source addresses one of the research interests of your study.
3. The source addresses most of the research interests of your study.
4. The source meets all of the research interests of your study.
5. The source meets all of the research interests of your study and provides a conceptual framework for a study that is replicable.

Dimension: Author
1. Unclear who authored the study.
2. Author name and contact information are provided.
3. Author name, contact information, and some credentials are included in the article.
4. Author name, contact information, and full credentials are included in the article.
5. Author is a well-known researcher in the research area under investigation and provides links to other research related to the current study.

Dimension: Source
1. Source is a nonrefereed website and is a summary of the author’s opinion.
2. Source is a nonrefereed website and must be closely examined for bias, subjectivity, intent, accuracy, and reliability before inclusion in the review of related literature.
3. Source is a scholarly or peer-reviewed journal, an education-related magazine, or a popular magazine.
4. Source is a scholarly or peer-reviewed journal.
5. Source is a scholarly or peer-reviewed journal with links to related literature by same author(s) and ability to download complete online versions of the article.

Dimension: Methodology
1. It is not possible to determine from the description of the study whether or not an appropriate methodology was used to investigate the research problem.
2. The description of the methodology does not include sufficient information to determine if the sample size was acceptable given the research problem.
3. The source includes a full description of the research problem and the appropriateness of the methodology to investigate the problem.
4. The source includes a full description of the research problem and the appropriateness of the methodology to investigate the problem. The results are presented objectively and can be connected to the data presented in the study.
5. The source includes a full description of the research problem and the appropriateness of the methodology to investigate the problem. Issues of validity and reliability are discussed along with limitations of the study. There is sufficient information in the source to enable a replication of the study.

Dimension: Date
1. No date of publication is included in the source.
2. Date of publication is included, but it is too old to be helpful for the current research problem.
3. Current date of publication.
4. Current date of publication with a list of references consulted by the author.
5. Current date of publication with a list of references consulted by the author, including links to full online articles.
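
One way to apply the rubric in Table 3.2 consistently is to record a rating from 1 to 5 on each of the five dimensions for every source. The following Python sketch is only an illustration: the function name rate_source, the dictionary layout, and the decision to report the median rating and the weakest dimension are assumptions made here, not part of the rubric itself.

```python
from statistics import median

# The five dimensions and five rating labels come from Table 3.2.
DIMENSIONS = ("relevancy", "author", "source", "methodology", "date")
LABELS = {1: "Poor", 2: "Below Average", 3: "Average",
          4: "Above Average", 5: "Excellent"}


def rate_source(ratings):
    """Summarize one source's rubric ratings (hypothetical helper).

    `ratings` maps each dimension name to an integer from 1 to 5.
    """
    for dimension in DIMENSIONS:
        if dimension not in ratings:
            raise ValueError(f"missing rating for {dimension}")
        if ratings[dimension] not in LABELS:
            raise ValueError(f"rating for {dimension} must be 1 to 5")

    # Rubric ratings are ranks, so the median is reported rather than a mean.
    return {
        "labels": {d: LABELS[ratings[d]] for d in DIMENSIONS},
        "median": median(ratings[d] for d in DIMENSIONS),
        "weakest": min(DIMENSIONS, key=lambda d: ratings[d]),
    }


# Invented ratings for a single source.
example = rate_source({"relevancy": 5, "author": 3, "source": 4,
                       "methodology": 2, "date": 4})
print(example["weakest"], example["median"])
```

Reporting the weakest dimension keeps a single low rating, such as an undocumented methodology, from being hidden by strong ratings elsewhere.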

evaluating your sources, whether those sources are from scholarly journals, magazines, or websites.

A note of caution: Anyone can post a professional-looking website on the Internet;4 do not be fooled by looks, and apply the same criteria for evaluating web-based materials as you would for print materials. Critically evaluating your sources will save you time and energy reading and annotating sources that may contribute little to your understanding of a research problem. This section includes an evaluation rubric using the categories of relevancy, author, source, methodology, and date (including references).

4 For a comprehensive discussion on evaluating web-based materials, see the guides from Johns Hopkins University and the University of Maryland.

Relevance

■ What is the purpose or problem statement of the study? Obviously, the first thing to do is to determine if the source really applies to your research problem and qualifies to be included in a review of related literature. Does the title of the source reflect research related to your work? Is there a well-refined question or statement of purpose? The problem statement is often found in the abstract and allows you to determine the relevance of the research to your own research.

Author

■ Who is the author? What are the qualifications, reputation, and status of the author? In most databases, the name of the author links to any other published works in the database. Is the subject matter a primary interest in other published works of the author? Is the author affiliated with any institution or organization? Most important, can you contact the author? Does the author have a personal website with curriculum vitae and contact information?

Source

■ What is the source of the publication? Does the information come from a scholarly or peer-reviewed journal, an education-related magazine, or a popular magazine? Is the information personal opinion or the result of a research study? Clearly, sources of different types merit different weight in your review. For instance, did you find your source in a refereed or a nonrefereed journal? In a refereed journal, articles are reviewed by a panel of experts in the field and are more scholarly and trustworthy than articles from nonrefereed or popular journals. Research articles in refereed journals are required to comply with strict guidelines regarding not only format but also research procedures. Special care and caution must also be taken when evaluating websites because anyone can post information on the Internet. Websites must be closely examined for bias, subjectivity, intent, accuracy, and reliability. These important quality-control questions will help you determine whether a source is worthy of inclusion in your review of related literature.

Methodology

■ How was the study conducted? It is important to verify that the information presented in a particular source is objective and impartial. What was the methodology used to investigate the problem or test the hypothesis? Was an appropriate method used? Can the research be replicated by others? Was the sample size suitable for the research? Does the source add to the information you have already gathered about your problem? Is the information presented in the source accurate? Does the author present evidence to support the interpretations? Does the content of the article consist mainly of opinion, or does it contain appropriately collected and analyzed data? How accurate are the discussion and conclusions of the findings? Do the findings present any contrary data or assumptions?

Date

■ When was the research conducted? The date of publication is of primary importance in evaluating a source. Look at the copyright date of books and the dates when articles were published. Websites should always include a reference to the last updated or revised date. Review the references listed in the bibliography or notes. These sources should be primarily current, particularly in problems of current interest with continuing developments. Searching for recent references does not mean that older research should be disregarded. Older research, as opposed to out-of-date research,

is often pertinent to your worldview as an educator and is still relevant. The importance of seminal theoretical works is evident throughout this book, such as the theoretical work conducted in educational psychology by Jean Piaget. Check the bibliography of a source to help determine the quality of the research. Do the references reflect current, scholarly peer-reviewed research? Are they robust enough for the subject matter? Do they reflect original sources and alternative perspectives? Who are the authors? The list of references can yield an abundance of information when evaluating the quality of a source. Remember, the quality of your research will also be judged by the references you choose, so you should be careful to select the best research to support your work.

Conducting effective library and Internet searches yields very useful information about your problem. By using multiple search methods and strategies, you can collect information that is current, accurate, and comprehensive. As you become more experienced, you will learn to conduct more efficient and effective searches, identifying better sources that focus on your problem and accurately represent the needed information for your research.

Annotating Your Sources

After you have identified the primary references related to your problem, you are ready to move on to the next phase of a review of related literature: annotating the references. Many databases include an abstract or summary of a study that describes the hypotheses, procedures, and conclusions. An abstract is descriptive in nature and does not assess the value or intent of the source. An annotation assesses the quality, relevance, and accuracy of a source. The annotation also describes how the source relates to the problem and its relative importance. Basically, annotating involves reviewing, summarizing, and classifying your references. Students sometimes ask why it is necessary to read and annotate original, complete articles or reports if they already have perfectly good abstracts. By assessing the quality and usefulness of a source, annotations articulate your response to a source and why the source is important to your research. After completing annotations, many students discover that these same annotations contributed heavily to the writing of their review of related literature.

To begin the annotation process, arrange your articles and other sources in reverse chronological order. Beginning with the latest references is a good research strategy because the most recent research is likely to have profited from previous research. Also, recent references may cite preceding studies you may not have identified. For each reference, complete the following steps:

1. If the article has an abstract or a summary, which most do, read it to determine the article’s relevance to your problem.

2. Skim the entire article, making mental notes of the main points of the study.

3. On an index card or in a document, write a complete bibliographic reference for the work. Include the library call number if the source work is from a book. This step can be tedious, but it is important. You will spend much more time trying to find the complete bibliographic information for an article or book you failed to annotate completely than you will annotating it in the first place. If you know that your final report must follow a particular editorial style, such as that described in the Publication Manual of the American Psychological Association (APA), put your bibliographic reference in that form. Remember, most databases will put the citation of a source in a citation style. For example, an APA-style reference for a journal article looks like this:

Snurd, B. J. (2013). The use of white versus yellow chalk in the teaching of advanced calculus. Journal of Useless Findings, 105(2), 465–477.

In this example, 2013 is the date of publication, 105 is the volume, and 2 is the issue number of the journal. The page numbers are 465 to 477. A style manual provides reference formats for all types of sources. Whatever format you use, use it consistently and be certain your bibliographic references are accurate. You never know when you may have to go back and get additional information from an article.

4. Classify and code the article according to some system, and then add the code to the annotation in a conspicuous place, such as an upper corner. The code should be one that can be easily accessed when you want

Chapter 6 • Constructs, Variables, and Tests

must understand the relations among constructs, variables, and instruments.

A construct is an abstraction that cannot be observed directly; it is a concept invented to explain behavior. Examples of educational constructs are intelligence, personality, teacher effectiveness, creativity, ability, achievement, and motivation. To be measurable, constructs must be operationally defined, that is, defined in terms of processes or operations that can be observed and measured. To measure a construct, it is necessary to identify the scores or values it can assume. For example, the construct “personality” can be made measurable by defining two personality types, introverts and extroverts, as measured by scores on a 30-item questionnaire, with a high score indicating a more introverted personality and a low score indicating a more extroverted personality. Similarly, the construct “teacher effectiveness” may be operationally defined by observing a teacher in action and judging effectiveness based on four levels: unsatisfactory, marginal, adequate, and excellent. When constructs are operationally defined, they become variables.

Variables

Earlier we defined variable as a placeholder that can assume any one of a range of values. The variable must be able to take on at least two values or scores. We deal with variables in all our research studies. Gender, ethnicity, socioeconomic status (SES), test scores, age, and teacher experience are all variables; people differ on these characteristics. Can you identify the variables in the following research problems and hypotheses?

1. Is there a relation between middle school students’ grades and their self-confidence in science and math?

2. What do high school principals consider to be the most pressing administrative problems they face?

3. Do students learn more from a new social studies program than from the previous one?

4. What were the effects of the GI Bill on state colleges in the Midwest in the 1950s?

5. How do the first 5 weeks of school in Ms. Foley’s classroom influence student activities and interactions in succeeding months?

6. There will be a statistically significant relation between number of years a teacher has been teaching and his or her interest in taking new courses.

7. Ninth-grade girls will have statistically different attitudes toward science than ninth-grade boys.

The variables in these examples are as follows: (1) grades and self-confidence, (2) administrative problems, (3) learning and the new social studies program (note that the social studies program has two forms, new and old, and thus is also a variable), (4) effects of the GI Bill, (5) student activities and student interactions, (6) years teaching and interest in taking new courses, and (7) gender and attitudes toward science.

There are many different approaches to measuring a variable and many instruments for doing so (in educational research, an instrument is a tool used to collect data). For example, to measure sixth-grade students’ mathematics achievement, we can choose from a number of existing measuring instruments, such as the Stanford Achievement Test or the Iowa Tests of Basic Skills. We can also use a teacher-made test to measure math achievement. Information about the instruments should be included in the procedure section of the research plan.

Variables themselves differ in many ways. For example, variables can be represented by different kinds of measurements, they can be identified as categorical or quantitative, or they can be classified as dependent or independent. The following sections discuss these distinctions.

Measurement Scales and Variables

Researchers use four types of measurement scales: nominal, ordinal, interval, and ratio scales. A measurement scale is a system for organizing data so that it may be inspected, analyzed, and interpreted. In other words, the scale is the instrument used to provide the range of values or scores for each variable. It is important to know which type of scale is represented in your data because, as we discuss in later chapters, different scales require different methods of statistical analysis.

Table 6.1 summarizes the measurement scales and provides examples of each. In general, a nominal scale measures nominal variables, an ordinal scale measures ordinal variables, and so on, as discussed in the following subsections.
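
The practical consequence of scale type can be sketched in a few lines of Python. This example assumes one common convention (the mode for nominal data, the median for ordinal data, the mean for interval and ratio data, and ratio comparisons only for ratio data); the data values and the function name summarize are invented for illustration and are not drawn from any study.

```python
from statistics import mean, median, mode

# Invented data, one list per measurement scale.
school_type = ["public", "private", "public", "charter", "public"]  # nominal
class_rank = [1, 2, 3, 4, 5]                                         # ordinal
test_score = [84, 81, 77, 75, 70]                                    # treated as interval
height_in = [38, 50, 64, 76]                                         # ratio (inches)


def summarize(values, scale):
    """Return a summary that is meaningful for the stated scale."""
    if scale == "nominal":
        # Categories are only labels, so count them; do not average them.
        return {"mode": mode(values)}
    if scale == "ordinal":
        # Ranks have order but unequal units, so use the middle rank.
        return {"median": median(values)}
    if scale == "interval":
        # Equal units make the mean meaningful, but ratios are not.
        return {"mean": mean(values)}
    if scale == "ratio":
        # A true zero point also makes "twice as much" meaningful.
        return {"mean": mean(values),
                "max_to_min_ratio": max(values) / min(values)}
    raise ValueError(f"unknown scale: {scale}")


for values, scale in [(school_type, "nominal"), (class_rank, "ordinal"),
                      (test_score, "interval"), (height_in, "ratio")]:
    print(scale, summarize(values, scale))
```

The point of the sketch is the branching, not the numbers: each step up the hierarchy of scales permits every summary allowed below it plus one more.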

Table 6.1 • Comparison of measurement scales

Scale: Nominal
Description: Categorical
Examples: Northerners, Southerners; Republicans, Democrats; eye color; male, female; public, private; gifted student, typical student

Scale: Ordinal
Description: Rank order, unequal units
Examples: Scores of 5, 6, 10 are equal to scores of 1, 2, 3

Scale: Interval
Description: Rank order and interval units but no zero point
Examples: A score of 10 and a score of 30 have the same degree of difference as a score of 70 and a score of 90

Scale: Ratio
Description: Rank order, intervals, and a defined zero point
Examples: A person is 5 feet tall and her friend is two-thirds as tall as she

Nominal Variables

A nominal variable is also called a categorical variable because the values include two or more named categories (i.e., the word nominal comes from the Latin word for “name”). Nominal variables include sex (e.g., female, male), employment status (e.g., full-time, part-time, unemployed), marital status (e.g., married, divorced, single), and type of school (e.g., public, private, charter). For identification purposes, nominal variables are often represented by numbers. For example, the category “male” may be represented by the number 1 and “female” by the number 2. It is critically important to understand that such numbering of nominal variables does not indicate that one category is higher or better than another. That is, representing male with a 1 and female with a 2 does not indicate that males are lower or worse than females or that males are at a higher rank than females. The numbers are only labels for the groups. To avoid such confusion, it is often best to label the levels of nominal variables with names or letters (A, B, C, etc.).

Ordinal Variables

An ordinal variable not only classifies persons or objects, it also ranks them. In other words, ordinal variables have, as their values, rankings in order from highest to lowest or from most to least. For example, if 50 students were placed into five reading groups, with each group representing a different reading ability, a student in reading group 1 would be in the highest-achieving group and a student in reading group 5 would be in the lowest-achieving group. Rankings make it possible to make comparisons, for example, to say that one student is achieving at a higher level than another student. Class rank is another example of an ordinal variable.

Although ordinal variables permit us to describe performance as higher, lower, better, or worse, they do not indicate how much higher one person performed compared to another. In other words, intervals between ranks are not equal; the difference between rank 1 and rank 2 is not necessarily the same as the difference between rank 2 and rank 3. Consider the following example of ordinal data from BPSD where the five elementary schools were ranked based on student test scores in math (as measured by the statewide assessments):

Rank   School       Average Test Score
1      Pinecrest    84
2      Madrone      81
3      Spruce       77
4      Cedar        75
5      Poison Oak   70

The difference in average test score between rank 1 and rank 2 is 3 points; the difference between rank 2 and rank 3 is 4 points, the difference between rank 3 and rank 4 is 2 points, and the difference between rank 4 and rank 5 is 5 points. Although an ordinal variable can be used to rank schools or students, it does not have equal scale intervals. This characteristic limits the statistical methods used to analyze ordinal variables.

Interval Variables

An interval variable also has values that are ranked in order, but its values also represent equal intervals. Scores on most tests used in educational research, such as achievement, aptitude, motivation, and attitude tests, are treated as interval variables. When variables have equal intervals, it is assumed that the difference between a score of 30 and a score of 40 is essentially the same as the

difference between a score of 50 and a score of 60, and the difference between 81 and 82 is about the same as the difference between 82 and 83. Interval scales, however, do not have a true zero point. Thus, if Roland’s science achievement test score is 0 on an interval scale of 0 to 100, his score does not indicate the total absence of science knowledge. Nor does Gianna’s score of 100 indicate complete mastery. Without a true zero point, we can say that a test score of 90 is 45 points higher than a score of 45, but we cannot say that a person scoring 90 knows twice as much as a person scoring 45. Variables that have or are treated as having equal intervals are subject to an array of statistical data analysis methods.

Ratio Variables

A ratio variable has equal intervals in rank order and, in addition, its measurement scale has a true zero point. Height, weight, time, distance, and speed are examples of ratio scales. The concept of “no weight,” for example, is a meaningful one. Because of the true zero point, we can say not only that the difference between a height of 3 ft 2 in. and a height of 4 ft 2 in. is the same as the difference between 5 ft 4 in. and 6 ft 4 in. but also that a person 6 ft 4 in. is twice as tall as one who is 3 ft 2 in. As another example, the total number of correct items on a test can be measured on a ratio scale; that is, a student can get zero items correct; a student with 20 items correct has twice as many correct answers as a student with 10 items correct.

Quantitative and Qualitative Variables

Quantitative variables exist on a continuum that ranges from low to high, or less to more. Ordinal, interval, and ratio variables are all quantitative variables because they describe performance in quantitative terms. Examples are test scores, heights, speed, age, and class size.

Nominal variables do not provide quantitative information about how people or objects differ. They provide information about qualitative differences only. Nominal variables distinguish persons or things that represent different qualities (e.g., eye color, religion, gender, political party) but not different quantities.

Dependent and Independent Variables

As discussed earlier, the dependent variable in an experimental study is the variable hypothesized to depend on or to be caused by another variable, the independent variable. Recall this research problem from Chapter 1:

Is there an effect of reinforcement on students’ attitudes toward school?

You probably had little trouble identifying attitudes toward school as a variable. Because reinforcement is hypothesized to affect students’ attitudes toward school, “attitudes toward school” is the dependent variable in this example. It is also a quantitative variable because it is likely measured on a numerical scale (e.g., strongly favorable toward school could be assigned a higher number, whereas strongly unfavorable could be assigned a lower one).

If the research question is rephrased as, “Do positive and negative reinforcement affect elementary students’ attitudes toward school?” it is easy to see a second variable, type of reinforcement, which contains two levels, positive and negative. Because it has two named categories as its levels, it is a categorical variable. And because it is manipulated by the researcher (i.e., the researcher selected the two types of reinforcement and then assigned participants to experience one or the other), type of reinforcement is the independent variable. The independent variable in a research study is sometimes called the experimental variable, the manipulated variable, the cause, or the treatment variable, but regardless of the label, the independent variable is always the hypothesized cause of the dependent variable (also called the criterion variable, the effect, the outcome, or the posttest). Independent variables are primarily used in experimental research studies (and grouping variables are used in similar ways in causal–comparative studies).

It is important to remember that the independent variable must have at least two levels of treatments. Thus, neither positive nor negative reinforcement is a variable by itself. The independent variable is type of reinforcement; positive reinforcement and negative reinforcement are the two levels of the variable. Try to identify

the independent and dependent variables in this research hypothesis:

  Teachers who participated in the new professional development program are less likely to express approval of new teaching strategies than teachers who did not.

Characteristics of Measuring Instruments

In this section, we examine the range of measuring instruments used to collect data in qualitative and quantitative research studies. There are three major ways to collect research data:

1. Administer a standardized instrument.
2. Administer a self-developed instrument.
3. Record naturally occurring or already available data (e.g., make observations in a classroom or record existing grade point averages).

This chapter is concerned with published, standardized tests and teacher-prepared tests.

Qualitative studies, such as ethnographic studies, are often built around the idea that the researcher will work with naturally occurring or existing data. Although using naturally occurring or existing data requires a minimum of effort and sounds very attractive, existing data are appropriate for very few qualitative studies and, even when appropriate, available data can lead to problems. For example, two different teachers may give the same grade for different reasons (e.g., A for effort, A for achievement). The grades, then, do not necessarily represent the same standard of behavior, and conclusions based on the data may not be trustworthy. However, developing a new, high-quality instrument to collect data also has drawbacks: It requires considerable effort and skill and greatly increases the total time needed to conduct the study. At a minimum, you would need a course in measurement to gain the skills needed for proper instrument development. At times, however, constructing your own instrument will be necessary, especially if your research problem and concepts are original or barely researched.

Selecting an appropriate instrument that is already standardized invariably takes less time than developing an instrument yourself. A standardized instrument is one that is administered, scored, and interpreted in the same way no matter where or when it is used. Standardized instruments tend to be developed by experts, who possess needed test construction skills. From a research point of view, an additional advantage of using a standardized instrument is that results from different studies using the same instrument can be compared.

Thousands of published and standardized instruments are available and yield a variety of data for a variety of variables. Major areas for which numerous measuring instruments have been developed include achievement, personality, attitude, interest, and aptitude. Each area can, in turn, be further divided into many subcategories. Personality instruments, for example, can be classified as nonprojective or projective, as discussed later in this chapter. Choosing an instrument for a particular research purpose involves identifying and selecting the most appropriate instrument from among alternatives. To choose intelligently, researchers must be familiar with a variety of instruments and know the criteria they should apply in selecting the best alternatives.

Instrument Terminology

Given the array of instruments in educational research, it is important to know some of the basic terminology used to describe them. We start with the terms test, assessment, and measurement.

A test is a formal, systematic, usually paper-and-pencil procedure for gathering information about people's cognitive and affective characteristics (a cognitive characteristic is a mental characteristic related to intellect, such as achievement; an affective characteristic is a mental characteristic related to emotion, such as attitude). Tests typically produce numerical scores. A standardized test is one that is administered, scored, and interpreted in the same way no matter where or when it is used. For example, the SAT, ACT, Iowa Tests of Basic Skills, Stanford Achievement Test, and other nationally used tests have been crafted to ensure that all test takers experience the same conditions when taking them. Such standardization allows comparisons among test takers from across the nation. You may remember taking national standardized achievement tests in school. They probably had a stop sign every few pages that warned, "Stop! Do not turn the page until instructed." These stops are to ensure

that all test takers have the same amount of time for each part of the test.

Assessment is a broad term that encompasses the entire process of collecting, synthesizing, and interpreting information, whether formal or informal, numerical or textual. Tests are a subset of assessment, as are observations and interviews.

Measurement is the process of quantifying or scoring performance on an assessment instrument. Measurement occurs after data are collected.

Quantitative and Qualitative Data Collection Methods

Researchers typically use paper-and-pencil methods, observations, or interviews to collect data. Observation and interviewing are used predominantly by qualitative researchers (and are discussed in detail in Chapter 19), whereas paper-and-pencil methods are favored by quantitative researchers.

Paper-and-pencil methods are divided into two general categories: selection and supply. With selection methods (or selection items on an instrument), the test taker has to select from among a set of given answers; these methods include multiple-choice, true/false, and matching questions. In supply methods (or supply items), the test taker has to supply an answer; supply items include questions that require the responder to fill in the blank or write a short answer or essay.

Current emphasis on supply methods in schools has spawned the rise of so-called performance assessments. A performance assessment, also known as an authentic assessment or alternative assessment, is a type of assessment that emphasizes a student process (e.g., lab demonstration, debate, oral speech, or dramatic performance) or product (e.g., an essay, a science fair project, or a research report). By asking students to do or create something, educators seek to assess tasks more complex than memorization. If a researcher is conducting research in schools, it is likely that performance assessments are used to collect data.

Interpreting Instrument Data

Data from an assessment can be reported and interpreted in various ways. A raw score is the number or point value of items a person answered correctly on an assessment. For example, if a student at Pinecrest Elementary achieved 78 of 100 points on a science test, the student's raw score is 78. In most quantitative research, raw scores are the basic (unanalyzed) data. By themselves, however, raw scores don't give us much information. To learn more, we must put the scores into a context. In other words, we must interpret the scores in some way. Norm-referenced, criterion-referenced, and self-referenced scoring approaches represent three ways of interpreting performance on tests and measures.

In norm-referenced scoring, a student's performance on an assessment is compared to the performance of others. For example, if we ask how well a Pinecrest Elementary student performed in science compared to other students in the same grade across the nation, we are asking for norm-referenced information. The interpretation of the student's score of 78 is based on how the student performed compared to the class or a national group of students in the same grade. Norm-referenced scoring is also called grading on the curve, where the curve is a bell-shaped distribution of the percentages of students who receive each grade. Standardized tests and assessments frequently report norm-referenced scores in the form of derived scores such as percentile ranks (discussed in detail in Chapter 17).

In criterion-referenced scoring, an individual's performance on an assessment is compared to a predetermined, external standard rather than to the performance of others. For example, a teacher may say that test scores of 90 to 100 are an A, scores of 80 to 89 are a B, scores of 70 to 79 are a C, and so on. A student's score is compared to the preestablished performance levels (that is, to preestablished criteria) to determine the grade. Anyone who scores between 90 and 100 will get an A. If no one scores between 90 and 100, no one will get an A. If all students score between 90 and 100, they all will get A's. This scenario could not happen in norm-referenced scoring, which requires that different scores, even very close ones, get different grades.

Self-referenced scoring approaches involve measuring how an individual student's performance on a single assessment changes over time. Student performances at different times are compared to determine improvement or decline.
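The three interpretations of a raw score can be contrasted in a short Python sketch. It is only an illustration under stated assumptions: the norm-group scores are made up, the percentile-rank formula (percent scoring below, plus half of any ties) is one common convention rather than the one any particular test publisher uses, and the D and F bands extend the text's "and so on" by assumption.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores):
    """Norm-referenced: percent of the norm group scoring below `score`,
    counting half of any tied scores (one common convention, assumed here)."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def criterion_grade(score):
    """Criterion-referenced: compare the score to fixed cutoffs.
    A, B, and C bands follow the text; D and F bands are assumed."""
    for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return grade
    return "F"


def self_referenced_change(scores_over_time):
    """Self-referenced: change from the first to the last administration."""
    return scores_over_time[-1] - scores_over_time[0]


# Hypothetical norm group; the raw score of 78 comes from the Pinecrest example.
norm_group = [55, 60, 62, 70, 71, 78, 80, 85, 90, 95]
raw_score = 78

print(percentile_rank(raw_score, norm_group))  # 55.0
print(criterion_grade(raw_score))              # C
print(self_referenced_change([70, 74, 78]))    # 8
```

The same raw score of 78 thus reads as "about the middle of this norm group," "a C," or "an 8-point gain," depending on the frame of reference. Because criterion_grade never looks at other students, a class that all scores 90 or above all receives A's, as the text describes.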

Types of Measuring Instruments

Many different kinds of tests are available, and there are many different ways to classify them. The Mental Measurements Yearbook (MMY), published by the Buros Institute of Mental Measurements ( mental-measurements-yearbook), is a major source of test information for educational researchers. The yearbook, which can be found in most large libraries, provides information and reviews of more than 3,000 published tests in various school subject areas (such as English, mathematics, and reading) as well as personality, intelligence, aptitude, speech and hearing, and vocational tests. Of all the types of measuring instruments available, cognitive, affective, and projective tests are the most commonly used in educational research.

Cognitive Tests

A cognitive test measures intellectual processes, such as thinking, memorizing, problem solving, analyzing, reasoning, and applying information. Most tests that school pupils take are cognitive achievement tests.

Achievement Tests

An achievement test measures an individual's current proficiency in given areas of knowledge or skill. Typically administered in school settings, achievement tests are designed to provide information about how well test takers have learned the material introduced in school. The tests are standardized, and an individual's performance is usually determined by comparing it to the norm, the performance of a national group of students in the individual's grade or age level who took the same test. Thus, these tests can provide comparisons of a given student to similar students nationally.

Standardized achievement tests typically cover a number of different curriculum areas, such as reading, vocabulary, language, and mathematics. A standardized test that measures achievement in several curriculum areas is called a test battery, and the assessment of each area is done with a subtest. The California Achievement Test, Stanford Achievement Tests, TerraNova, and the Iowa Tests of Basic Skills are examples of cognitive achievement tests commonly used in U.S. classrooms. Depending on the number of subtests and other factors, standardized achievement batteries can take from 1 to 5 hours to complete.

Some achievement tests, such as the Gates-MacGinitie Reading Tests, focus on achievement in a single subject area. Single-subject tests are sometimes used as diagnostic tests. A diagnostic test yields multiple scores to facilitate identification of a student's weak and strong areas within the subject area. The Stanford Diagnostic Reading Test and the Key Math Diagnostic Inventory of Essential Mathematics Test are examples of widely used diagnostic achievement instruments.

Aptitude Tests

Tests of general aptitude are also referred to as scholastic aptitude tests and tests of general mental ability. Unlike an achievement test, which is used to assess what individuals have learned, an aptitude test is commonly used to predict how well an individual is likely to perform in a future situation. Aptitude tests are standardized and are often administered as part of a school testing program; they are also used extensively in job hiring.

Aptitude tests usually include cognitive measures, but ones that are not normally part of classroom tests. For example, many require that participants respond to a variety of verbal and nonverbal tasks intended to measure the individual's ability to apply knowledge and solve problems. Such tests often yield three scores: an overall score, a verbal score, and a quantitative score.

Aptitude tests can be administered to groups, or they can be individually administered. A commonly used group-administered battery is the Columbia Mental Maturity Scale (CMMS). The CMMS has six versions and can be administered to school-age children, college students, and adults. It includes 12 subtests representing five aptitude factors: logical reasoning, spatial relations, numerical reasoning, verbal concepts, and memory. Another frequently administered group aptitude test is the Otis-Lennon School Ability Test, which has versions designed for children in grades K–12. The Otis-Lennon School Ability Test measures four factors: verbal comprehension, verbal reasoning, figurative reasoning, and quantitative reasoning. The Differential Aptitude Tests is another battery that includes tests of space relations, mechanical reasoning, and clerical speed and accuracy, among

other areas, and is designed to predict success in various job areas.

Individually administered tests are preferred for some test takers (e.g., very young children or students with disabilities). Probably the most well known of the individually administered tests are the Stanford-Binet Intelligence Scale and the Wechsler scales. The Stanford-Binet is appropriate for young children and adults. Wechsler scales are available to measure intelligence: the Wechsler Preschool and Primary Scale of Intelligence, Third Edition (WPPSI–III) is appropriate for children age 2 years through 7 years; the Wechsler Intelligence Scale for Children, Fourth Edition (WISC–IV) is designed for children age 6 years through 17 years; and the Wechsler Adult Intelligence Scale, Fourth Edition (WAIS–IV) is given to older adolescents and adults. As an example, the WISC is a scholastic aptitude test that includes verbal tests (e.g., general information, vocabulary) and performance tests (e.g., picture completion, object assembly). Other commonly used individually administered aptitude tests are the McCarthy Scales of Children's Abilities and the Kaufman Assessment Battery for Children.

Affective Tests

An affective test is an assessment designed to measure affective characteristics, that is, mental characteristics related to emotion, such as attitude, interest, and value. Affective tests are often used in educational research and exist in many different formats. Most are nonprojective; that is, they are self-report measures in which the test taker responds to a series of questions or statements about him- or herself. For example, a question may be "Which would you prefer, reading a book or playing basketball?" Self-report tests are frequently used in survey studies (e.g., to describe the personality structure of various groups, such as high school dropouts), correlational studies (e.g., to determine relations among various personality traits and other variables, such as achievement), and causal–comparative or experimental studies (e.g., to investigate the comparative effectiveness of different instructional methods for different personality types).

Instruments that examine values, attitudes, interests, and personalities tap the test takers' emotions and perceptions. Values are deeply held beliefs about ideas, persons, or objects. For example, we may value our free time, our special friendships, or a vase given by our great-grandmother. Attitudes indicate our favorable or unfavorable feelings; they reflect our tendencies to accept or reject groups, ideas, or objects. For example, Greg's attitude toward Brussels sprouts is much more favorable than his attitude toward green beans (this attitude puts Greg in a distinct minority). Interests indicate the degree to which we seek out or desire to participate in particular activities, objects, and ideas. Personality is composed of a number of characteristics that represent a person's typical behaviors; it describes what we do in our natural life circumstances.

Attitude Scales

An attitude scale is an instrument that measures what an individual believes, perceives, or feels about self, others, activities, institutions, or situations. Five basic types of scales are used to measure attitudes: Likert scales, semantic differential scales, rating scales, Thurstone scales, and Guttman scales. The first three are frequently used in educational research.

Likert Scales.  A Likert scale requires an individual to respond to a series of statements by indicating whether he or she strongly agrees (SA), agrees (A), is undecided (U), disagrees (D), or strongly disagrees (SD). Each response is assigned a point value, and an individual's score is determined by adding the point values of all the statements. For example, the following point values are typically assigned to positive statements: SA = 5, A = 4, U = 3, D = 2, SD = 1. An example of a positive statement is "Short people are entitled to the same job opportunities as tall people." A score of 5 or 4 on this item indicates a positive attitude toward equal opportunity for short people. A high total score across all items on the test would be indicative of an overall positive attitude. For negative statements, the point values would be reversed; that is, SA = 1, A = 2, U = 3, D = 4, and SD = 5. An example of a negative statement is "Short people are not entitled to the same job opportunities as tall people." On this item, scores should be reversed; "disagree" or "strongly disagree" indicates a positive attitude toward opportunities for short people.

Semantic Differential Scales.  A semantic differential scale requires an individual to indicate his or her attitude about a topic (e.g., property

taxes) by selecting a position on a continuum that ranges from one bipolar adjective (e.g., fair) to another (e.g., unfair). Each position on the continuum has an associated score value. For example, a scale concerning attitudes toward property taxes may include the following items and values:

  Necessary   ___   ___   ___   ___   ___   ___   ___   Unnecessary
               3     2     1     0    -1    -2    -3

  Fair        ___   ___   ___   ___   ___   ___   ___   Unfair
               3     2     1     0    -1    -2    -3

  Better      ___   ___   ___   ___   ___   ___   ___   Worse
               3     2     1     0    -1    -2    -3

This scale is typical of semantic differential scales, which usually have 5 to 7 intervals, with a neutral attitude assigned a score value of 0. A person who checks the first interval (i.e., a score of 3) on each of these items has a very positive attitude toward property taxes. Totaling the score values for all items yields an overall score. Usually, summed scores (i.e., interval data) are used in statistical data analysis.

Rating Scales.  A rating scale may also be used to measure a respondent's attitudes toward self, others, activities, institutions, or situations. One form of rating scale provides descriptions of performance or preference and requires the individual to check the most appropriate description.

  Select the choice that best describes your actions in the first 5 minutes of the classes you teach.

  ____ State lesson objectives and overview at the start of the lesson.
  ____ State lesson objectives but no overview at the start of the lesson.
  ____ Don't state objectives or give overview at the start of the lesson.

A second type of rating scale asks the individual to rate performance or preference using a numerical scale similar to a Likert scale.

  Circle the number that best describes the degree to which you state lesson objectives and give an overview before teaching a lesson.

  5 = always
  4 = almost always
  3 = about half the time
  2 = rarely
  1 = never

  1   2   3   4   5

Likert, semantic differential, and rating scales are similar, requiring the respondent to self-report along a continuum of choices. However, in certain situations, such as observing performance or judging teaching competence, Likert, semantic differential, and rating scales can be used by others (e.g., a researcher, a principal, a colleague) to collect information about study participants. For example, in some studies it may be best to have the principal, rather than the teacher, use a Likert, semantic differential, or rating scale to collect data about that teacher.

Thurstone and Guttman Scales.  A Thurstone scale requires participants to select from a list of statements that represent different points of view on a topic. Each item has an associated point value between 1 and 11; the point value for each item is determined by averaging the values assigned to that item by a number of judges. An individual's attitude score is the average point value of all the statements checked by that individual. A Guttman scale also requires respondents to agree or disagree with a number of statements; it is then used to determine whether an attitude is unidimensional. An attitude is unidimensional if it produces a cumulative scale in which an individual who agrees with a given statement also agrees with all related preceding statements. For example, if you agree with Statement 3, you also agree with Statements 2 and 1.

Interest Inventories

An interest inventory requires participants to indicate personal likes and dislikes, such as the kinds of activities they prefer. The respondent's pattern of interest is compared to the interest patterns of others. For example, for an occupational interest inventory, responses are compared to those typical of successful persons in various occupational fields. Interest inventories are widely used in this

way to suggest the fields in which respondents may be most happy and successful.

Two frequently used interest inventories are the Strong-Campbell Interest Inventory and the Kuder Preference Record-Vocational. The Strong-Campbell Interest Inventory examines areas of interest in occupations, school subjects, activities, leisure activities, and day-to-day interactions with various types of people. Test takers are presented with many items related to these five areas and are asked to indicate whether they like (L), dislike (D), or are indifferent (I) to each item. The second part of the Strong-Campbell inventory consists of a choice between two options such as "dealing with people" or "dealing with things" and a number of self-descriptive statements that the individual responds to by choosing yes (like me), no (not like me), or ? (not sure). The Kuder Occupational Interest Survey addresses 10 broad categories of interest: outdoor, mechanical, computational, scientific, persuasive, artistic, literary, musical, social service, and clerical. Individuals are presented with three choices related to these categories and must select the one they most like and the one they least like. For example, an individual may be presented with this item:

  Would you rather: dig a hole, read a book, or draw a picture? Choose the one that you most would like to do and the one that you least would like to do.

The Strong-Campbell and the Kuder are both self-report instruments that provide information about a person's interests. Scoring the instruments requires sending data to the testing companies that produce them for computer analysis. You cannot score them yourself. Information on the Strong-Campbell, the Kuder, and other attitudinal, value, and personality instruments may be found in the Mental Measurements Yearbook.

Values Tests

The Study of Values instrument (Riverside Publishing Co.) measures the relative strength of an individual's values in six different areas: theoretical (e.g., discovery of truth, empirical approach), economic (e.g., practical values), aesthetic (e.g., symmetry, form, harmony), social (e.g., altruism, philanthropic), political (e.g., personal power, influence), and religious (e.g., unity of experience, cosmic coherence). Individuals are presented with items consisting of choices and are asked to allocate points to the alternatives according to how much they value them. For example, a two-alternative item may be the following:

  Suppose you had the choice of reading one of two books first. If the books were titled Making Money in the Stock Market and The Politics of Political Power, which would you read first?

Respondents allocate points to the two choices, indicating degree of preference. By summing the points given to each of the six areas, the scorer can obtain an indication of an individual's preference among the six categories. A second form of question provides four choices that the respondent must rank from 4 to 1 in order of preference. The Study of Values is used primarily in research studies to categorize individuals or to measure the value orientation of different groups, such as scientists or newspaper writers.

Personality Inventories

A personality inventory includes questions or statements that describe behaviors characteristic of certain personality traits. Respondents indicate whether or to what degree each statement describes them. Some inventories are presented as checklists; respondents simply check items they feel characterize them. An individual's score is based on the number of responses characteristic of the trait being measured. An introvert, for example, would be expected to respond yes to the statement "Reading is one of my favorite pastimes," and no to the statement "I love large parties." Personality inventories may measure only one trait or many traits.

General personality inventories frequently used in educational research studies include the Personality Adjective Checklist, California Psychological Inventory, Minnesota Multiphasic Personality Inventory, Mooney Problem Checklist, Myers-Briggs Type Indicator, and the Sixteen Personality Factors Questionnaire. The Minnesota Multiphasic Personality Inventory (MMPI) alone has been utilized in hundreds of educational research studies. Its items were originally selected on the basis of response differences between psychiatric and nonpsychiatric patients. The MMPI measures many personality traits, such as depression, paranoia, schizophrenia, and social introversion. It contains more than 370 items to which a test taker responds true (of me), false

(of me), or cannot say. It also has nearly 200 items that form additional scales for anxiety, ego strength, repression, and alcoholism.

General personality instruments that require self-report, such as the MMPI, are complex and require a substantial amount of knowledge of both measurement and psychology to score. Beginning researchers should avoid their use unless they have more than a passing knowledge of these areas.

Problems with Self-Report Instruments

Self-report instruments such as attitude, interest, values, and personality scales have notable limits. The researcher can never be sure that individuals are expressing their true attitudes, interests, values, or personalities. A common problem with studies that use self-report instruments is the existence of a response set, the tendency of an individual to respond in a particular way to a variety of instruments. One common response set occurs when individuals select responses that are believed to be the most socially acceptable, even if they are not necessarily characteristic of the respondents themselves. Another form of a response set is when a test taker continually responds yes, agree, or true to items because he or she believes that is what the researcher desires. Because scores are meaningful only to the degree that respondents are honest and select choices that truly characterize them, every effort should be made to increase honesty of response by giving appropriate directions to those completing the instruments. One strategy to overcome the problem of response sets is to allow participants to respond anonymously.

Both affective and cognitive instruments are also subject to bias, which is distortion of research data that renders the data suspect or invalid. Bias is present when respondents' characteristics, such as ethnicity, race, gender, language, or religious orientation, distort their performance or responses. For example, low scores on reading tests by students who speak little English or nonstandard forms of English are probably due in large part to language disadvantages, not reading difficulties. For these students, test performance means something different than it does for English-fluent students who took the same test. Similarly, if one's culture discourages competition, making eye contact, or speaking out, the responses on self-report instruments can differ according to cultural background, not personality, values, attitudes, or interests. These issues need to be recognized in selecting and interpreting the results of both cognitive and affective instruments.

Projective Tests

Projective tests were developed in part to eliminate some of the problems inherent in the use of self-report and forced-choice measures. Projective tests are ambiguous and not obvious to respondents. Such tests are called projective because respondents project their true feelings or thoughts onto an ambiguous stimulus. The classic example of a projective test is the Rorschach inkblot test. Respondents are shown a picture of an inkblot and are asked to describe what they see in it. The inkblot is really just that: an inkblot made by putting a dab of ink on a paper and folding the paper in half. There are no right or wrong answers to the question "What do you see in the inkblot?" (It's only an inkblot, honest.) The test taker's descriptions of such blots are believed to be projections of his or her feelings or personality, which the administrator interprets. Because the purpose of the test is not clear, conscious dishonesty of response is reduced.

The most commonly used projective technique is the method of association. Presented with a stimulus such as a picture, inkblot, or word, participants respond with a reaction or interpretation. Word-association tests are probably the most well known of the association techniques. (How many movie psychiatrists deliver the line, "I'll say a word and you tell me the first thing that comes to your mind?") Similarly, in the Thematic Apperception Test, the individual is shown a series of pictures and is asked to tell a story about what is happening in each picture.

In the past, all projective tests were required to be administered individually. There have been some recent efforts, however, to develop group projective tests. One such test is the Holtzman Inkblot Technique, which is intended to measure the same variables as the Rorschach Test.

From the preceding discussion, it should not be a surprise that projective tests are used mainly by clinical psychologists and very infrequently by educational researchers. Administering, scoring, and interpreting projective tests require lengthy and specialized training. Because of the training required, projective testing is not recommended for beginning researchers.
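Returning briefly to self-report scoring: the Likert point values given under Attitude Scales (SA = 5 through SD = 1 for positive statements, reversed for negative statements) can be sketched in Python as follows. The four-item instrument, the choice of which items are negatively worded, and the same_answer_throughout check are all hypothetical; the check is an illustrative heuristic for spotting a possible response set, not a procedure taken from the text.

```python
# Point values for positively worded statements, as given in the text.
POSITIVE = {"SA": 5, "A": 4, "U": 3, "D": 2, "SD": 1}
# Negatively worded statements use the reversed values (SA = 1 ... SD = 5).
NEGATIVE = {answer: 6 - points for answer, points in POSITIVE.items()}


def likert_total(responses, negatively_worded):
    """Sum the point values of all items, reversing negatively worded ones.

    `responses` is a list of answers such as "SA" or "D";
    `negatively_worded` is a set of item positions (0-based) to reverse.
    """
    total = 0
    for index, answer in enumerate(responses):
        key = NEGATIVE if index in negatively_worded else POSITIVE
        total += key[answer]
    return total


def same_answer_throughout(responses):
    """Illustrative heuristic (an assumption, not from the text): identical
    answers to every item, including reversed ones, may signal a response set."""
    return len(responses) > 1 and len(set(responses)) == 1


# Hypothetical four-item instrument; items 1 and 3 are negatively worded.
negative_items = {1, 3}
consistent = ["SA", "SD", "A", "D"]
yea_sayer = ["A", "A", "A", "A"]

print(likert_total(consistent, negative_items))  # 5 + 5 + 4 + 4 = 18
print(likert_total(yea_sayer, negative_items))   # 4 + 2 + 4 + 2 = 12
print(same_answer_throughout(yea_sayer))         # True
```

Mixing positive and negative statements means that a respondent who agrees with everything ends up near the middle of the score range rather than at the top, which is one reason reverse-worded items are often included. A flagged pattern only prompts a closer look; it does not show that the respondent was dishonest.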

There are other types and formats of measuring instruments beyond those discussed here. The intent of this section is to provide an overview of types of tests, formats for gathering data, scoring methods, interpretation strategies, and limitations. To find more information about the specific tests described in this chapter and many other tests we have not described, refer to the Mental Measurements Yearbook.

Criteria for Good Measuring Instruments

If researchers’ interpretations of data are to be valuable, the measuring instruments used to collect those data must be both valid and reliable. The following sections give an overview of both validity and reliability. More specific information on these topics and on testing in general can be found in the Standards for Educational and Psychological Testing.1

1 Standards for Educational and Psychological Testing, by American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, 2014, Washington, DC: American Psychological Association.

Validity of Measuring Instruments

Validity refers to the degree to which a test measures what it is supposed to measure and thus permits appropriate interpretation of scores. Validity is therefore critically important when considering measuring instruments. When we test, we test for a purpose, and our measurement tools must help us achieve that purpose. For example, in Big Pine School District, the curriculum director may want to conduct an experimental study to compare learning for science students taught by method A (e.g., hands-on constructivist learning) and learning for those taught by method B (e.g., textbook or rote learning). A key question for these and other such test users is “Does this test or instrument permit the curriculum director to select the best teaching method?”

Validity is important in all forms of research and in all types of tests and measures. It is best thought of in terms of degree: highly valid, moderately valid, and generally invalid. Validation begins with an understanding of the interpretation(s) to be made from the selected tests or instruments. It then requires the collection of sources of evidence to support the desired interpretation.

In some situations, a test or instrument is used for several different purposes and thus must be validated for each. For example, at Big Pine High School, a chemistry achievement test may be used to assess students’ end-of-year chemistry learning, to predict students’ future performance in science courses, and even to select students for advanced placement chemistry. Because each use calls for a different interpretation of the chemistry test scores, each requires its own validation. Further, the same test may be given to groups of respondents with significant differences (e.g., one group who has studied the test material and one who has not); the differences may or may not have been considered when the test was developed. Validity is specific to the interpretation being made and to the group being tested. In other words, we cannot simply say, “This test is valid.” Rather, we must say, “This test is valid for this particular interpretation and this particular group.”

Researchers generally discuss four types of test validity: content validity, criterion-related validity, construct validity, and consequential validity. They are viewed as interrelated, not independent, aspects of validity. Table 6.2 summarizes the four forms of validity.

Content Validity

Content validity is the degree to which a test measures an intended content area. Content validity requires both item validity and sampling validity. Item validity is concerned with whether the test items are relevant to the measurement of the intended content area. Sampling validity is concerned with how well the test samples the total content area being tested. For example, a test designed to measure knowledge of biology facts would have good item validity if all the items are relevant to biology, but it would have poor sampling validity if all the test items are about vertebrates. If, instead, the test adequately sampled the full content of biology, it would have good content validity. Content validity is important because we cannot possibly measure every topic in a content area, yet we want to make inferences about test takers’ performance on the entire content area. Such inferences are possible only if the test

items adequately sample the domain of possible items. For this reason, you should clearly identify and examine the bounds of the content area to be tested before constructing or selecting a test or measuring instrument.

Table 6.2 • Forms of validity

Form | Method | Purpose
Content validity | Compare content of the test to the domain being measured. | To what extent does this test represent the general domain of interest?
Criterion-related validity | Correlate scores from one instrument to scores on a criterion measure, at the same (concurrent) or a different (predictive) time. | To what extent does this test correlate highly with another test?
Construct validity | Amass convergent, divergent, and content-related evidence to determine that the presumed construct is what is being measured. | To what extent does this test reflect the construct it is intended to measure?
Consequential validity | Observe and determine whether the test has adverse consequences for test takers or users. | To what extent does the test create harmful consequences for test takers?

Content validity is of particular importance for achievement tests. A test score cannot accurately reflect a student’s achievement if it does not measure what the student was taught and is supposed to have learned. Content validity is compromised if the test covers topics not taught or if it does not cover topics that have been taught. Early studies that compared the effectiveness of “new” math with “old” math are classic cases in which achievement test validity was called into question. Scores on achievement tests suggested no differences in students’ learning under the two approaches. The problem was that the “new” math emphasized concepts and principles, but achievement tests assessed computational skills. When tests that contained an adequate sampling of items measuring concepts and principles were developed, researchers began to find that the two approaches to teaching math resulted in essentially equal computational ability but that the “new” math resulted in better conceptual understanding. The moral of the story is that you should take care that your test measures what the students were expected to learn in the treatments. That is, be sure that the test has content validity for your study and for your research participants.

Content validity is determined by expert judgment (i.e., content validation). There is no formula or statistic by which it can be computed, and there is no way to express it quantitatively. Often experts in the topic covered by the test are asked to assess its content validity. These experts carefully review the process used to develop the test as well as the test itself, and then they make a judgment about how well items represent the intended content area. In other words, they compare what was taught and what is being tested. When the two coincide, the content validity is strong.

The term face validity is sometimes used to describe the content validity of tests. Although its meaning is somewhat ambiguous, face validity basically refers to the degree to which a test appears to measure what it claims to measure. Although determining face validity is not a psychometrically sound way of estimating validity, the process is sometimes used as an initial screening procedure in test selection. It should be followed up by content validation.

Criterion-Related Validity

Criterion-related validity is determined by relating performance on a test to performance on a second test or other measure. The second test or measure is the criterion against which the validity of the initial test is judged. Criterion-related validity has two forms: concurrent validity and predictive validity.

Concurrent Validity. Concurrent validity is the degree to which scores on one test are related to scores on a similar, preexisting test administered in the same time frame or to some other valid measure available at the same time. Often, for example, a test is developed that claims to do the same job as some other test but is easier or faster. One way to determine whether the claim is true is to administer the new and the old test to the same group and compare the scores.

Concurrent validity is determined by establishing a relationship or discrimination. The relationship method involves determining the correlation between scores on the test under study (e.g., a new

test) and scores on some other established test or criterion (e.g., grade point average). The steps are as follows:

 1. Administer the new test to a defined group of individuals.
 2. Administer a previously established, valid criterion test (the criterion) to the same group at the same time or shortly thereafter.
 3. Correlate the two sets of scores.
 4. Evaluate the results.

The resulting correlation, or validity coefficient, indicates the degree of concurrent validity of the new test; if the coefficient is high (near 1.0), the test has good concurrent validity. For example, suppose Big Pine School District uses a 5-minute test of children’s English language proficiency. If scores on this test correlate highly with scores on the American Language Institute test of English language proficiency (which must be administered to one child at a time and takes at least an hour), then the test from Big Pine School District is definitely preferable in a great many situations.

The discrimination method of establishing concurrent validity involves determining whether test scores can be used to discriminate between persons who possess a certain characteristic and those who do not or who possess it to a greater degree. For example, a test of personality disorder would have concurrent validity if scores resulting from it could be used to classify institutionalized and noninstitutionalized persons correctly.

Predictive Validity. Predictive validity is the degree to which a test can predict how well an individual will do in a future situation. For example, if an algebra aptitude test administered at the start of school can predict which students will perform well or poorly in algebra at the end of the school year (the criterion) fairly accurately, the aptitude test has high predictive validity.

Predictive validity is extremely important for tests that are used to classify or select individuals. An example many of you are familiar with is the use of Graduate Record Examination (GRE) scores to select students for admission to graduate school. Many graduate schools require a minimum score for admission in the belief that students who achieve that score have a higher probability of succeeding in graduate school than those scoring lower. Other tests used to classify or select people include those used to determine eligibility for special education services and the needs of students receiving such services. It is imperative in these situations that decisions about appropriate programs be based on the results of measures with predictive validity.

The predictive validity of an instrument may vary depending on a number of factors, including the curriculum involved, textbooks used, and geographic location. The Mindboggling Algebra Aptitude Test, for example, may predict achievement better in courses using the Brainscrambling Algebra I textbook than in courses using other textbooks. Likewise, studies on the GRE have suggested that, although the test appears to have satisfactory predictive validity for success in some areas of graduate study (such as English), its validity in predicting success in other areas (such as art education) appears to be questionable. Thus, if a test is to be used for prediction, it is important to compare the description of its validation with the situation for which it is to be used.

Of course, no test has perfect predictive validity. In other words, predictions based on the scores of any test will be imperfect. However, predictions based on a combination of several test scores will invariably be more accurate than predictions based on the scores of any single test. Therefore, when important classification or selection decisions are to be made, they should be based on data from more than one indicator.

In establishing the predictive validity of a test (called the predictor because it is the variable upon which the prediction is based), the first step is to identify and carefully define the criterion, or predicted variable, which must be a valid measure of the performance to be predicted. For example, if we want to establish the predictive validity of an algebra aptitude test, final examination scores at the completion of a course in algebra may be considered a valid criterion. As another example, if we are interested in establishing the predictive validity of a given test for forecasting success in college, the grade point average at the end of the first year would probably be considered a valid criterion, but the number of extracurricular activities in which the student participated probably would not. Once the criterion has been identified and defined, the procedure for determining predictive validity is as follows:

 1. Administer the predictor variable to a group.
 2. Wait until the behavior to be predicted, the criterion variable, occurs.

 3. Obtain measures of the criterion for the same group.
 4. Correlate the two sets of scores.
 5. Evaluate the results.

The resulting correlation, or validity coefficient, indicates the predictive validity of the test; if the coefficient is high, the test has good predictive validity. You may have noticed that the procedures for determining concurrent validity and predictive validity are very similar. The major difference has to do with when the criterion measure is administered. In establishing concurrent validity, the criterion measure is administered at about the same time as the predictor. In establishing predictive validity, the researcher usually has to wait for time to pass before criterion data can be collected.

In the discussion of both concurrent and predictive validity, we have noted that a high coefficient indicates that the test has good validity. You may have wondered, “How high is high?” Although there is no magic number that a coefficient should reach, a number close to 1.0 is best.

Construct Validity

Construct validity is the most important form of validity because it asks the fundamental validity question: What is this test really measuring? In other words, construct validity reflects the degree to which a test measures an intended hypothetical construct. All variables derive from constructs, and constructs are nonobservable traits, such as intelligence, anxiety, and honesty, “invented” to explain behavior. Constructs underlie the variables that researchers measure. You cannot see a construct; you can only observe its effect. Constructs do an amazingly good job, however, of explaining certain differences among individuals. For example, some students learn faster than others, learn more, and retain information longer. To explain these differences, scientists hypothesized that a construct called intelligence is related to learning, and everyone possesses intelligence to a greater or lesser degree. A theory of intelligence was born, and tests were developed to measure a person’s intelligence. As it happens, students who have high intelligence scores (i.e., “more” intelligence) tend to do better in school and other learning environments than those who have lower intelligence scores (i.e., “less” intelligence). It is important to remember, however, that research studies involving a construct are valid only to the extent that the instrument selected for the study actually measures the intended construct rather than some unanticipated, intervening variable.

Determining construct validity is by no means easy. It usually involves gathering a number of pieces of evidence to demonstrate validity; no single validation study can establish the construct validity of a test. For example, if we wanted to determine whether the intelligence test developed by Big Pine School District, the Big Pine Intelligence Test (BPIT), had construct validity, we could carry out all or most of the following validation studies. We could see whether students who scored high on the BPIT learned faster, more, and with greater retention than students with low scores. We could correlate scores on the BPIT taken at the beginning of the school year with students’ grades at the end of the school year. We could also correlate performance on the BPIT with performance on other, well-established intelligence tests to see whether the correlations were high. We could have scholars in the field of intelligence examine the BPIT test items to judge whether they represented typical topics in the field of intelligence. Notice how content and criterion-related forms of validity are used in studies to determine a test’s construct validity.

In addition to the confirmatory evidence just described (i.e., evidence that a test is valid for a construct), we could seek disconfirmatory validity information (i.e., evidence that a test is not valid for a different, unrelated construct). For example, we would not expect scores on an intelligence test to correlate highly with self-esteem or height. If we correlated the BPIT with self-esteem and height and found low or moderate correlations, we could conclude that the test is measuring something different from self-esteem or height. We would then have evidence that the BPIT correlates highly with other intelligence tests (i.e., confirmatory validation) and does not correlate highly with self-esteem and height (i.e., disconfirmatory validation).

Consequential Validity

Consequential validity, as the name suggests, is concerned with the consequences that occur from tests. As more and more tests are being administered to more and more individuals, and as the consequences of testing are becoming more important, concern over the consequences of testing has increased. All tests have intended purposes (I mean,

really, who would create these things just for fun?) and, in general, the intended purposes are valid and appropriate. However, some testing instances produce (usually unintended) negative or harmful consequences to the test takers. Consequential validity, then, is the extent to which an instrument creates harmful effects for the user. Examining consequential validity allows researchers to ferret out and identify tests that may be harmful to students, teachers, and other test users, whether the problem is intended or not.

The key question in consequential validity is “What are the effects of various forms of testing on teachers or students?” For example, how does testing students solely with multiple-choice items affect students’ learning compared with assessing them with other, more open-ended items? Should non-English speakers be tested in the same way as English speakers? Can people who see the test results of non-English speakers but do not know about their lack of English make harmful interpretations for such students? Although most tests serve their intended purpose in benign ways, consequential validity reminds us that testing can and sometimes does have negative consequences for test takers or users.

Factors That Threaten Validity

A number of factors can diminish the validity of tests and instruments used in research, including:

■ Unclear test directions
■ Confusing and ambiguous test items
■ Vocabulary too difficult for test takers
■ Overly difficult and complex sentence structures
■ Inconsistent and subjective scoring methods
■ Untaught items included on achievement tests
■ Failure to follow standardized test administration procedures
■ Cheating, either by participants or by someone teaching the correct answers to the specific test items

These factors diminish the validity of tests because they distort or produce atypical test performance, which in turn distorts the desired interpretation of the test scores.

Validity Standards

The Standards for Educational and Psychological Testing manual (2014) developed by the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME) includes a comprehensive list of validity standards that, if met, allow educational researchers to make robust claims about the context-specific interpretations they make. For novice researchers interested in a comprehensive discussion of all the standards, we recommend that you read The Standards for Educational and Psychological Testing (2014) Part I: Validity. The discussion presented there expands our treatment of different forms of validity and provides a comprehensive discussion of generally accepted professional validity standards.

To summarize, validity is the most important characteristic that a test or measure can have. Without validity, the interpretations of the data have inappropriate (or no) meaning. In the end, the test user makes the final decision about the validity and usefulness of a test or measure. The bases for that decision should be described in the procedure section of your research plan.

Reliability of Measuring Instruments

In everyday English, reliability means dependability or trustworthiness. The term means the same thing when describing measurements. Reliability is the degree to which a test consistently measures whatever it is measuring. The more reliable a test is, the more confidence we can have that the scores obtained from the test are essentially the same scores that would be obtained if the test were readministered to the same test takers at another time or by a different person. If a test is unreliable (i.e., if it provides inconsistent information about performance), then scores will likely be quite different every time the test is administered. For example, if an attitude test is unreliable, then a student with a total score of 75 today may score 45 tomorrow and 95 the day after tomorrow. If the test is reliable, and if the student’s total score is 75 on one day, then the student’s score will not vary much on retesting (e.g., likely between 70 and 80). Of course, we should not expect the student’s score to be exactly the same on other retestings. The reliability of test scores is similar to the reliability of sports scores, such as scores for golf, bowling, or shot put. Golfers, bowlers, and shot-putters rarely produce identical scores time after time after time. An individual’s health, motivation, anxiety, luck, attitude, and attention change from time to time and influence performance of

these activities, just as they affect performance on tests; this variation is known as error. All test scores have some degree of measurement error, and the smaller the amount of error, the more reliable the scores and the more confidence we have in the consistency and stability of test takers’ performance.

Reliability is expressed numerically, usually as a reliability coefficient, which is obtained by using correlation. A perfectly reliable test would have a reliability coefficient of 1.00, meaning that students’ scores perfectly reflected their true status with respect to the variable being measured, but alas, no test is perfectly reliable. High reliability (i.e., a coefficient close to 1.00) indicates minimum error; that is, the effect of errors of measurement is small.

Reliability tells about the consistency of the scores produced; validity tells about the appropriateness of a test. Both are important for judging the suitability of a test or measuring instrument. Although a valid test is always reliable, a reliable test is not always valid. In other words, if a test is measuring what it is supposed to be measuring, it will be reliable, but a reliable test can consistently measure the wrong thing and be invalid! For example, suppose an instrument intended to measure social studies concepts actually measured only social studies facts. It would not be a valid measure of concepts, but it could certainly measure the facts very consistently. As another example, suppose the reported reliability coefficient for a test was 0.24, which is definitely quite low. This low coefficient would tell you that the validity was also low: if the test consistently measured what it was supposed to measure, the reliability coefficient would be higher. On the other hand, if the reported reliability coefficient were 0.92 (which is definitely good), you wouldn’t know much about validity: the test could be consistently measuring the wrong thing. In other words, reliability is necessary but not sufficient for establishing validity.

As with validity, there are different types of reliability, each of which deals with a different kind of test consistency and is established in a different manner. The following sections describe five general types of reliability, which are summarized in Table 6.3.

Stability

Stability, also called test–retest reliability, is the degree to which scores on the same test are consistent over time. In other words, this type of reliability provides evidence that scores obtained on a test at one time (test) are the same or close to the same when the test is readministered some other time (retest). The more similar the scores on the test over time, the more stable the test scores. Test stability is especially important for tests used to make predictions because these predictions are based heavily on the assumption that the scores will be stable over time. The procedure for determining test–retest reliability is quite simple:

 1. Administer the test to an appropriate group.
 2. After some time has passed, say, 2 weeks, administer the same test to the same group.

Table 6.3 • Five types of reliability

Name | What Is Measured | Procedure
Stability (test–retest) | Stability of scores over time | Give one group the same test at two different times, and correlate the two scores.
Equivalence (equivalent-forms) | Relationship between two versions of a test intended to be equivalent | Give alternative test forms to a single group, and correlate the two scores.
Equivalence and stability | Relationship between equivalent versions of a test given at two different times | Give two alternative tests to a group at two different times, and correlate the scores.
Internal consistency | The extent to which the items in a test are similar to one another in content | Give tests to one group, and apply split-half, Kuder-Richardson, or Cronbach’s alpha to estimate the internal consistency of the test items.
Scorer/rater | The extent to which independent scorers or a single scorer over time agree on the scoring of an open-ended test | Give copies of a set of tests to independent scorers or a single scorer at different times, and correlate or compute the percentage of scorer agreement.
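Most of the procedures listed in Table 6.3 come down to a few small calculations. The sketch below is only an illustration and is not taken from the chapter: the function names and all score lists are hypothetical, Pearson's r is assumed as the correlation, the Spearman-Brown step-up is shown as the usual correction paired with the split-half approach, and the Cronbach's alpha expression is the standard textbook formula rather than one stated here.

```python
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)


def cronbach_alpha(item_scores):
    """item_scores: one list per test taker, one numeric score per item."""
    k = len(item_scores[0])
    item_vars = [pvariance([person[i] for person in item_scores]) for i in range(k)]
    total_var = pvariance([sum(person) for person in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def percent_agreement(scorer_a, scorer_b):
    """Percentage of papers on which two scorers assigned the same score."""
    matches = sum(1 for a, b in zip(scorer_a, scorer_b) if a == b)
    return 100 * matches / len(scorer_a)


# Hypothetical data: the same six test takers, tested twice (stability).
first_scores = [75, 62, 88, 70, 95, 58]
second_scores = [78, 60, 85, 72, 93, 61]
print("coefficient of stability:", round(pearson_r(first_scores, second_scores), 2))

# A split-half correlation of 0.80 stepped up to full test length.
print("split-half, corrected:", round(spearman_brown(0.80), 2))  # 0.89

# Hypothetical Likert-type responses: five test takers, three items.
likert = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3], [1, 2, 2]]
print("Cronbach's alpha:", round(cronbach_alpha(likert), 2))

# Two hypothetical scorers rating the same five open-ended papers.
print("scorer agreement (%):", percent_agreement([3, 4, 2, 5, 1], [3, 4, 3, 5, 1]))
```

With real data, each function expects scores from the same group of test takers listed in the same order. The resulting coefficients are then judged as the text describes, with values closer to 1.00 indicating less measurement error.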

 3. Correlate the two sets of scores.
 4. Evaluate the results.

If the resulting coefficient, referred to as the coefficient of stability, is high, the test has good test–retest reliability. A major problem with this type of reliability is that it is difficult to know how much time should elapse between the two testing sessions. If the interval is too short, the students may remember responses they made on the test the first time; if they do, the estimate of reliability will be artificially high. If the interval is too long, students may improve on the test due to intervening learning or maturation; if they do, the estimate of reliability will be artificially low. Generally, although not universally, a period of from 2 to 6 weeks is used to determine the stability of a test. When stability information about a test is given, the stability coefficient and the time interval between testing should also be given.

Equivalence

Equivalence, also called equivalent-forms reliability, is the degree to which two similar forms of a test produce similar scores from a single group of test takers. The two forms measure the same variable and have the same number of items, the same structure, the same difficulty level, and the same directions for administration, scoring, and interpretation. Only the specific items are not the same, although they measure the same topics or objectives. The equivalent forms are constructed by randomly sampling two sets of items from the same, well-described population. If two tests are equivalent, they can be used interchangeably. It is reassuring to know that a person’s score will not be greatly affected by the particular form administered. In some research studies, two forms of a test are administered to the same group, one as a pretest and the other as a posttest.

The procedure for determining equivalent-forms reliability is similar to that for determining test–retest reliability:

 1. Administer one form of the test to an appropriate group.
 2. At the same session, or shortly thereafter, administer the second form of the test to the same group.
 3. Correlate the two sets of scores.
 4. Evaluate the results.

If the resulting coefficient of equivalence is high, the test has good equivalent-forms reliability. Equivalent-forms reliability is the most commonly used estimate of reliability for most tests used in research. The major problem with this method of estimating reliability is the difficulty of constructing two forms that are essentially equivalent. Even though equivalent-forms reliability is considered to be a very good estimate of reliability, it is not always feasible to administer two different forms of the same test. Imagine your instructor saying you have to take two final examinations!

Equivalence and Stability

This form of reliability combines equivalence and stability. If the two forms of the test are administered at two different times (the best of all possible worlds!), the resulting coefficient is referred to as the coefficient of stability and equivalence. In essence, this approach assesses stability of scores over time as well as the equivalence of the two sets of items. Because more sources of measurement error are present, the resulting coefficient is likely to be somewhat lower than a coefficient of equivalence or a coefficient of stability. Thus, the coefficient of stability and equivalence represents a conservative estimate of reliability. The procedure for determining equivalence and stability reliability is as follows:

 1. Administer one form of the test to an appropriate group.
 2. After a period of time, administer the other form of the test to the same group.
 3. Correlate the two sets of scores.
 4. Evaluate the results.

Internal Consistency

Internal consistency reliability is the extent to which items in a single test are consistent among themselves and with the test as a whole. It is measured through three different approaches: split-half, Kuder-Richardson, or Cronbach’s alpha. Each provides information about items in a single test that is taken only once. Because internal consistency approaches require only one test administration, some sources of measurement errors, such as differences in testing conditions, are eliminated.

Split-Half Reliability. Split-half reliability is a measure of internal consistency that involves dividing a test into two halves and correlating the scores on the two halves. It is especially appropriate when

a test is very long or when it would be difficult to administer either the same test at two different times or two different forms to a group. The procedure for determining split-half reliability is as follows:

1. Administer the total test to a group.
2. Divide the test into two comparable halves, or subtests, most commonly by selecting odd items for one subtest and even items for the other subtest.
3. Compute each participant's score on the two halves; each participant will have a score for the odd items and a score for the even items.
4. Correlate the two sets of scores.
5. Apply the Spearman-Brown correction formula.
6. Evaluate the results.

The odd–even strategy for splitting the test works out rather well regardless of how a test is organized. Suppose, for example, we have a 20-item test in which the items get progressively more difficult. Items 1, 3, 5, 7, 9, 11, 13, 15, 17, and 19 as a group should be approximately as difficult as items 2, 4, 6, 8, 10, 12, 14, 16, 18, and 20. In effect, we are artificially creating two equivalent forms of a test and computing equivalent-forms reliability. In split-half reliability the two equivalent forms are parts of the same test; thus the label internal consistency reliability.

Notice that the procedure does not stop after the two sets of scores are correlated. Because longer tests tend to be more reliable and the split-half reliability coefficient represents the reliability of a test only half as long as the actual test, a correction formula must be applied to determine the reliability of the whole test. The correction formula used is the Spearman-Brown prophecy formula. For example, suppose the split-half reliability coefficient for a 50-item test were 0.80. The 0.80 number would be based on the correlation between scores on 25 even items and 25 odd items and would therefore be an estimate of the reliability of a 25-item test, not a 50-item test. The Spearman-Brown formula provides an estimate of the reliability of the full 50-item test. The formula is very simple and is applied to our example in the following way:

$$r_{\text{total test}} = \frac{2\,r_{\text{split half}}}{1 + r_{\text{split half}}}$$

$$r_{\text{total test}} = \frac{2(.80)}{1 + .80} = \frac{1.60}{1.80} = .89$$

Kuder-Richardson and Cronbach's Alpha. Kuder-Richardson 20 (KR-20) and Cronbach's alpha estimate internal consistency reliability by determining how all items on a test relate to all other test items and to the total test. Internal consistency results when all the items or tasks on a test are related or, in other words, are measuring similar things. Both techniques provide reliability estimates that are equivalent to the average of the split-half reliabilities computed for all possible halves; Cronbach's alpha is a general formula of which the KR-20 formula is a special case. KR-20 is a highly regarded method of assessing reliability but is useful only for items that are scored dichotomously (i.e., every item is given one of two scores: one for the right answer, one for the wrong answer); multiple-choice items and true/false items are examples of dichotomously scored items. If items can have more than two scores (e.g., "How many previous research classes have you taken? Select from among the following choices: 0, 1, 2, 3"), then Cronbach's alpha should be used. As another example, many affective instruments and performance tests are scored using more than two choices (e.g., with a Likert scale). If numbers are used to represent the response choices, analysis for internal consistency can be accomplished using Cronbach's alpha.

Kuder and Richardson provided an alternative, more easily computed form of their formula, called Kuder-Richardson 21 (KR-21). It requires less time than any other method of estimating reliability, although it provides a more conservative estimate of reliability. The KR-21 formula is:

$$r_{\text{total test}} = \frac{(K)(SD^2) - \bar{X}(K - \bar{X})}{(SD^2)(K - 1)}$$

where

K = the number of items in the test
SD = the standard deviation of the scores
$\bar{X}$ = the mean of the scores

In Chapter 17 you will learn how to compute the mean and standard deviation of a set of scores. For now, recognize that the mean, $\bar{X}$, is the average score on the test for the group that took it, and the standard deviation (SD) is an indication of

the amount of score variability, or how spread out the scores are. For example, assume that you have administered a 50-item test and have calculated the mean to be 40 ($\bar{X}$ = 40) and the standard deviation to be 4 (SD = 4). The reliability of the test (which in this example turns out to be rather poor) would be calculated as follows:

$$r_{\text{total test}} = \frac{(50)(4^2) - 40(50 - 40)}{(4^2)(50 - 1)} = \frac{(50)(16) - 40(10)}{(16)(49)} = \frac{800 - 400}{784} = \frac{400}{784} = .51$$

Scorer/Rater Reliability

Reliability must also be investigated with regard to the individuals who score the tests. Subjectivity occurs when a single scorer over time is inconsistent or different scorers do not agree on the scores of a single test. Essay tests, short-answer tests, performance and product tests, projective tests, and observations (almost any test that calls for more than a one-word response) raise concerns about the reliability of scoring. Interjudge reliability (i.e., interrater reliability) refers to the consistency of two or more independent scorers, raters, or observers; intrajudge reliability (i.e., intrarater reliability) refers to the consistency of one individual's scoring, rating, or observing over time.

Subjective scoring is a major source of errors of measurement, so it is important to determine the reliability of the individuals who score open-ended tests. It is especially important to determine scorer/rater reliability when performance on a test has serious consequences for the test taker. For example, some tests are used to determine who will be awarded a high school diploma or promoted to the next grade. The more open-ended test items are, the more important it is to seek consensus in scoring among raters. Subjective scoring reduces reliability and, in turn, diminishes the validity of the interpretations that the researcher or tester can make from the scores.

Reliability Coefficients

What is an acceptable level of reliability? The minimum level of acceptability differs among test types. For example, standardized achievement and aptitude tests should have high reliability, often higher than 0.90. On the other hand, personality measures and other projective tests do not typically report such high reliabilities (although certainly some do), and a researcher using one of these measures should be satisfied with a reliability somewhat lower than that expected from an achievement test. Moreover, when tests are developed in new areas, reliability is often low initially. The best way to evaluate the level of reliability in a test that you are using is to gather information from other, similar tests to use as a benchmark.

If a test is composed of several subtests that will be used individually in a study, then the reliability of each subtest should be evaluated. Because reliability is a function of test length, the reliability of any particular subtest is typically lower than the reliability of the total test.

Researchers should report reliability information about tests in their research plans; they must also be sure to evaluate and report reliability for their own research participants. Reliability, like validity, is dependent on the group being tested. The more heterogeneous the test scores of a group, the higher the reliability will be. Thus, if Group A and Group B both took the same test, but Group A was made up of valedictorians and Group B was made up of students ranging from low to high performers, the test would be more reliable for Group B than for Group A.

Standard Error of Measurement

Reliability can also be expressed by stating the standard error of measurement. The standard error of measurement is an estimate of how often one can expect errors of a given size in an individual's test score. Thus, a small standard error of measurement indicates high reliability, and a large standard error of measurement indicates low reliability. You should be familiar with this concept because such data are often reported for a test.

If a test were perfectly reliable (which no test is), a person's test score would be the true score, that is, the score obtained under ideal conditions. However, we know that if you administered the same test over and over to the same individual, the scores would vary, like the golf, bowling, and shot-put scores. The amount of variability is a function of the reliability of the test: Variability is small for

a highly reliable test and large for a test with low reliability. If we could administer a test many times to the same individual or group of individuals, we could see how much variation actually occurred. Of course, realistically we can't do this, but it is possible to estimate this degree of variation (i.e., the standard error of measurement) using the data from the administration of a single test. In other words, the standard error of measurement allows us to estimate how much difference there may be between a person's obtained score and that person's true score. The size of this difference is a function of the reliability of the test. We can estimate the standard error of measurement using the following simple formula:

$$SE_m = SD\sqrt{1 - r}$$

where

$SE_m$ = standard error of measurement
SD = the standard deviation of the test scores
r = the reliability coefficient

As an example, for a 25-item test, we calculate the standard deviation of a set of scores to be 5 (SD = 5) and the reliability coefficient to be .84 (r = .84). The standard error of measurement would then be calculated as follows:

$$SE_m = SD\sqrt{1 - r} = 5\sqrt{1 - .84} = 5\sqrt{.16} = 5(.4) = 2.0$$

As this example illustrates, the size of the SEm is a function of both the SD and the reliability coefficient. Higher reliability is associated with a smaller SEm, and a smaller SD is associated with a smaller SEm. If the reliability coefficient in the previous example were .64, would you expect SEm to be larger or smaller? It would be larger: 3.0. If the standard deviation in the example were 10, what would you expect to happen to SEm? Again, it would be larger: 4.0. Although a small SEm indicates less error, it is impossible to say how small the SEm should be because the size of the SEm is relative to the size of the test. Thus, an SEm of 5 would be large for a 20-item test but small for a 200-item test. In our example, an SEm of 2.0 would be considered moderate. To facilitate better interpretation of scores, some test publishers present not only the SEm for the total group but also a separate SEm for each of a number of identified subgroups.

Test Selection, Construction, and Administration

Selecting a Test

A very important guideline for selecting a test is this: Do not stop with the first test you find that appears to measure what you want, say, "Eureka! I have found it!" and blithely use it in your study. Instead, identify a group of tests that are appropriate for your study, compare them on relevant factors, and select the best one. If you become knowledgeable concerning the qualities a test should possess and familiar with the various types of tests that are available, then selecting an instrument will be a very orderly process. Assuming that you have defined the purpose of your study, the first step in choosing a test is to determine precisely what type of test you need. The next step is to identify and locate appropriate tests. Finally, you must do a comparative analysis of the tests and select the best one for your needs.

Sources of Test Information

Mental Measurements Yearbook

After you have determined the type of test you need (e.g., a test of reading comprehension for second-graders or an attitude measure for high school students), a logical place to start looking for specific tests to meet your needs is in the Mental Measurements Yearbook (MMY). The MMY is the most comprehensive source of test information available to educational researchers. The Nineteenth Mental Measurements Yearbook (2014) is the latest publication in a series that includes the MMY, Tests in Print, and many other related works such as Vocational Tests and Reviews. The MMY, which can be found in most university libraries, is expressly designed to assist users in making informed test selection decisions. The stated purposes are to provide (1) factual information on all known new or revised tests in the English-speaking world; (2) objective test reviews written specifically for the MMY; and (3) comprehensive bibliographies for specific tests, including related references from published literature. Some of this information is available free of charge from the Buros Institute website, and a fee is charged for the test reviews.

Getting maximum benefit from the MMY requires, at the very least, that you familiarize yourself with the organization and the indexes provided. Perhaps the most important thing to know in using the MMY is that the numbers given in the indexes are test numbers, not page numbers. For example, in the Classified Subject Index, under "Achievement" you will find the following entry (among others): i-Ready Diagnostic and Instruction. Students in Grades K-8 (Diagnostic) and K-6 (Instruction); 86. The 86 indicates that the description of the i-Ready Diagnostic and Instruction test is entry 86 in the main body of the volume, not on page 86 (it is actually on page 360).

The MMY provides six indexes with information about tests: Index of Titles, Index of Acronyms, Classified Subject Index (i.e., alphabetical list of test subjects), Publishers Directory and Index (i.e., names and addresses of publishers), Index of Names (i.e., names of test developers and test reviewers), and Score Index (i.e., types of scores obtained from the tests). For example, if you heard that Professor Jeenyus had developed a new interest test but you did not know its name, you could look in the Index of Names under "Jeenyus." There, you would find test numbers for all tests developed by Professor Jeenyus that were included in the volume.

If you are looking for information on a particular test, you can find it easily by using the alphabetical organization of the most recent MMY. If you are not sure of the title or know only the general type of test you need, you may use the following procedure:

1. If you are not sure of a title for a test, look through the Index of Titles for possible variants of the title or consult the appropriate subject area in the Classified Subject Index for that particular test or related ones.
2. If you know the test publisher, consult the Publishers Directory and Index and look for the test you seek.
3. If you are looking for a test that yields a particular type of score, search for tests in that category in the Score Index.
4. Using the entry numbers listed in all the sections described previously, locate the test descriptions in the Tests and Reviews section (i.e., the main body of the volume).

An example of an MMY entry is shown in Figure 6.1. The entry contains the suggested ages of the participants, the author and publisher, a review by a researcher in the subject area, information about validity and reliability, and other useful information about the test.

Tests in Print

A very useful supplemental source of test information is Tests in Print (TIP). TIP is a comprehensive bibliography of all known commercially available tests that are currently in print. It also serves as a master index that directs the reader to all original reviews of tests that have appeared in all the editions of the MMY to date. It is most often used to determine the availability of a test. If you know that a test is available, you can look it up in the MMY to evaluate whether it is appropriate for your purpose. The main body of the latest TIP edition is organized alphabetically. TIP provides information on many more tests than the MMY, but the MMY contains more comprehensive information for each test.

Pro-Ed Publications

Some other sources of test information come from Pro-Ed Publications. Tests: A Comprehensive Reference for Assessments in Psychology, Education, and Business (2008, T. Maddox, Ed.), now in its sixth edition, provides descriptions of more than 2,000 tests. Although no reviews are included, complete information about test publishers is provided to enable users to call or write for additional information. In addition, tests appropriate for individuals with physical, visual, and hearing impairments are listed, as are tests that are available in a variety of languages. Information on Tests can be found in Pro-Ed's online catalog at default. asp. In addition to these reference works, Pro-Ed publishes numerous tests, which are also described in the catalog.

Professional Journals

A number of journals, many of which are American Psychological Association publications, regularly publish information of interest to test users. For example, Psychological Abstracts is a potential source of test information. Other journals of interest to test users include Journal of Applied Measurement, Journal of Consulting Psychology, Journal of Educational Measurement, and Educational and Psychological Measurement.

Figure 6.1 • Sample entry from the Mental Measurements Yearbook

[6]
Assessment of Classroom Environments.
Purpose: "Identifies [teachers'] preferences [and approaches] for establishing classroom environments [by comparing] the Leadership Model, the Guidance Model, and the Integrated Model."
Population: Teachers.
Publication Dates: 2000–2008.
Acronym: ACE.
Scores: 3 models (Leadership, Guidance, Integration) for each of 8 scales: Self-Attributions, Self-Reflections, Ideal Teacher, Peers, Students, Supervisors, General Form, Comparative Form.
Administration: Group.
Forms, 8: Self-Attributions (ratings by teacher), Self-Reflections (ratings by teacher [teacher's perception of how students, peers, and supervisors view teacher]), 4 Observation Checklists (General Form [ratings by "community members, parents, visitors, [or] college students in teacher preparation programs"], Peer Form [ratings by teacher's peers], Student Form [ratings by teacher's students], Supervisor Form [ratings by teacher's supervisors]), Ideal Checklist (ratings by teacher [teacher's perception of the ideal classroom environment]), Comparative Form (ratings by teacher [comparison of the teacher's classroom environment, other professional teachers' classroom environment, and the ideal classroom environment]).
Price Data, 2009: $50 per 25 Self-Attributions forms; $50 per 25 Self-Reflections forms; $50 per 25 Observation Checklist-General forms; $50 per 25 Observation Checklist-Peer forms; $50 per 25 Observation Checklist-Student forms; $50 per 25 Observation Checklist-Supervisor forms; $50 per 25 Ideal Checklist forms; $50 per 25 Comparative forms; $40 per test manual (2008, 34 pages); $.40 per scoring/profiling per scale; $40 per analysis report.
Time: Administration time not reported.
Authors: Louise M. Soares and Anthony T. Soares (test).
Publisher: Castle Consultants.

Review of the Assessment of Classroom Environments by AMANDA NOLEN, Assistant Professor, Educational Foundations/Teacher Education, College of Education, University of Arkansas at Little Rock, Little Rock, AR:

DESCRIPTION. The Assessment of Classroom Environments (A.C.E.) is a group-administered battery of rating scales designed to profile an individual's teaching style as reflecting one of three models: Leadership (teacher centered), Guidance (student-centered), or Integration (information-processing). Although not specified in the test manual or test instruments, the instrument appears to be designed for teachers in the K-12 setting.

The A.C.E. consists of eight scales to be completed by the teacher, peers, students, supervisors, and community members. The Self-Attribution Scale, the Self-Reflection Scale, and the Ideal Teacher scale are all completed by the teacher. Observation checklists are completed by peer teachers, students, supervisors, and a community member such as a parent or other adult. Finally, a Comparative Scale is completed by the teacher that identifies attributes most descriptive of self, others, and the ideal teacher.

All of the scales consist of 25 identical triads of statements that demonstrate a teacher's style preference across six factors: classroom management, learning environment, instructional approach, teacher efficacy, assessment, and instructional practices. Each of the statements in a triad represents one of three teaching models identified by the test authors. The statement that is believed to be most descriptive of the teacher's approach in the classroom is given a rank of +1. The statement that is believed to be least descriptive of the teacher is given a rank of +3. The remaining statement in the triad is given a rank of +2. These rankings are additive and the model with the lowest composite score is then considered to be most indicative of that teacher's style in the classroom.

The technical manual provides instructions for administration as well as instructions for scoring and profiling.

The primary objective for the A.C.E. is to create an accurate profile of an individual teacher's instructional style using an integrated approach of self-report as well as objective observations of others.

Source: J. C. Conoley and J. C. Impara (Eds.), The Twelfth Mental Measurements Yearbook (1995), pp. 380–381. Lincoln, NE: Buros Institute of Mental Measurements.

Test Publishers and Distributors

After narrowing your search to a few acceptable tests, you should review the manuals for the tests, which are available from the publishers. A manual typically includes detailed technical information, a description of the population for whom the test is intended, a detailed description of norming procedures, conditions of administration, detailed scoring instructions, and requirements for score interpretation.

Final selection of a test usually requires examining the test itself. A test that appears from all descriptions to be exactly what you need may not be what you need after all. For example, it may contain many items measuring content not covered, or its language level may be too high or low for your participants. Above all, remember that, in selecting tests, you must be a good consumer, one who finds an instrument that fits your needs. The feature Digital Research Tools for the 21st Century discusses online resources to help you identify useful sources of information about specific tests.

Digital Research Tools for the 21st Century

Online Test Sources

Many web-based, commercial test databases allow researchers to identify useful sources of information about specific tests. Three of them are described here.

ETS Test Collection Database

A joint project of the Educational Testing Service (ETS) and the Education Resources Information Center (ERIC) Clearinghouse on Assessment and Evaluation, the ETS Test Collection Database is an online searchable database containing descriptions of more than 25,000 tests and research instruments in almost all fields. In contrast to the MMY, the database includes unpublished as well as published tests but provides much less information on each test. The ETS Test Collection Database can be accessed on the Web. There, you can search for tests and research instruments by title or keyword; each entry included in the database contains the title, author, publication date, target population, publisher or source, and an annotation describing the purpose of the instrument.

Educational Research Service

The Educational Research Service (ERS) is a nonprofit organization focused on providing educators with information about testing programs and their impact on policy decisions related to student achievement. ERS provides a searchable online catalog of tests as well as research-based resources developed to provide timely information about specific testing issues and concerns. For further information, visit the ERS website.

The National Board on Educational Testing and Public Policy

The National Board on Educational Testing and Public Policy (NBETPP) monitors tests for appropriate use and technical adequacy. Housed in the Lynch School of Education at Boston College, the NBETPP is an independent organization that monitors testing in the United States. The NBETPP is particularly useful for educators wishing to investigate the policy implications of any test identified for use as part of a study. For further information about the board and links to its resources, visit the board's website.

Selecting from Alternatives

After you have narrowed the number of test candidates and acquired relevant information, you must conduct a comparative analysis of the tests. Although a number of factors should be considered in choosing a test, these factors are not of equal importance. For example, the least expensive test is not necessarily the best test. As you undoubtedly know by now, the most important factor to be considered in test selection is validity. Is one test more appropriate for your sample than the others? If you are interested in prediction, does one test have a significantly higher validity coefficient than the others? If content validity is of prime importance, are the items of one test more relevant to the problem of your study than those on other tests?

If several tests seem appropriate after the validity comparisons, the next factor to consider is reliability. You would presumably select the test with the highest reliability, but other considerations may be equally important, such as ease of test use. For example, a test that can be administered during one class period would be considerably more convenient than a 2-hour test. Shorter tests generally are also preferable because they are less tiring and more motivating for test takers. However, a shorter test will tend to be less reliable than a longer one. If one test takes half as long to administer as another and is only slightly less reliable, the shorter test is probably better.

By the time you get to this point, you will probably have made a decision. The test you choose will probably be group-administered rather than individually administered. Of course, if the nature of your research study requires an individually administered test, select it, but be certain you have the qualifications needed to administer, score, and interpret the results. If you do not, can you afford

to hire the necessary personnel? If, after all this soul searching, you still have more than one test in the running, by all means pick the cheapest one!

Two additional considerations in test selection have nothing to do with their psychometric qualities. Both are related to the use of tests in schools. If you are planning to include schoolchildren in your study, you should identify any tests they have already taken so that you do not administer a test with which test takers are already familiar. Second, you should be sensitive to the fact that some parents or administrators object to a test that contains sensitive or personal items. Certain attitude, values, and personality tests, for example, contain questions related to the personal beliefs and behaviors of the respondents. If the test contains potentially objectionable items, either choose another test or acquire appropriate permissions before administering the test.

Constructing Tests

On rare occasions you may not be able to locate a suitable test. One logical solution is to construct your own test. Good test construction requires a variety of skills. If you don't have them, get some help. As mentioned previously, experience at least equivalent to a course in measurement is needed. You should buy and read one of the many useful classroom assessment test books. In addition, if you develop your own test, you must collect validity and reliability data. A self-developed test should not be utilized in a research study unless it has first been pilot-tested by a group of 5 to 10 persons similar to the group you will be testing in the actual study. The following discussion gives an overview of some guidelines to follow if you need to construct a test to administer to schoolchildren.

Writing Your Own Paper-and-Pencil Test Items

To create a paper-and-pencil test, you will need to determine what type or types of test items to include. Selection items include multiple-choice, true/false, and matching. Supply items include short-answer items, completion items, and essays. Note that scoring or judging responses is much more difficult for essays than the other types of test items. Get help if needed.

The following suggestions provide elementary strategies for constructing your own paper-and-pencil test items.² Table 6.4 presents additional suggestions for preparing items.

² Test items on these pages are from Classroom Assessment: Concepts and Applications (7th ed., pp. 182–190), by M. Russell and P. W. Airasian, 2012, New York: McGraw-Hill. Copyright 2012 by The McGraw-Hill Companies, Inc. Reprinted with permission.

■ Avoid wording and sentence structure that is ambiguous and confusing.
Poor: All but one of the following items are not elements. Which one is not?
Better: Which one of the following is an element?

■ Use appropriate vocabulary.
Poor: The thesis of capillary execution serves to illuminate how fluids are elevated in small tubes. True False
Better: The principle of capillary action helps explain how liquids rise in small tubes. True False

■ Write items that have only one correct answer.
Poor: Ernest Hemingway wrote ______________.
Better: The author of The Old Man and the Sea is ______________.

■ Give information about the nature of the desired answer.
Poor: Compare and contrast the North and the South in the Civil War. Support your views.
Better: What forces led to the outbreak of the Civil War? Indicate in your discussion the economic, foreign, and social conditions. You will be judged in terms of these three problems. Your essay should be five paragraphs in length, and spelling and grammar will count in your grade.

■ Do not provide clues to the correct answer.
Poor: A figure that has eight sides is called an
a. pentagon
b. quadrilateral
c. octagon
d. ogive
Better: Figures that have eight sides are called
a. pentagons
b. quadrilaterals
c. octagons
d. ogives

Be sure to assess only content that has been taught. Aligning instruction and assessment will help you ensure valid results. We strongly suggest that any test you construct should be tried in advance. Ask four or five insightful teachers or individuals experienced in test-item writing to critique your test for

clarity and logic. On the basis of their suggestions, you can improve your test. Also conduct a small pilot study. It is not necessary to have a large number of participants to find out if your test is valid and clear.

Table 6.4 • Preparing test items

Multiple-Choice Items
• Set pupils' task in the item stem.
• Include repeated words in the stem.
• Avoid grammatical clues.
• Use positive wording if possible.
• Include only plausible options.
• Avoid using "all of the above" or "none of the above."

Matching Items
• Use a homogeneous problem.
• Put longer options in the left column.
• Provide clear direction.
• Use an unequal number of entries in the two columns.

Essay Items
• Use several short-essay questions rather than one long one.
• Provide a clear focus in questions.
• Indicate scoring criteria to pupils.

True/False Items
• Make statements clearly true or false.
• Avoid specific determiners.
• Do not arrange responses in a pattern.
• Do not select textbook sentences.

Completion and Short-Answer Items
• Provide a clear focus for the desired answer.
• Avoid grammatical clues.
• Put blanks at the end of the item.
• Do not select textbook sentences.

Source: From Classroom Assessment: Concepts and Applications (7th ed., p. 192), by M. Russell and P. W. Airasian, 2012, New York: McGraw-Hill. Copyright 2012 by The McGraw-Hill Companies, Inc. Reprinted with permission.

Test Administration

You should be aware of several general guidelines for test administration. First, if testing is to be conducted in a school setting, arrangements should be made beforehand with the appropriate persons. Consultation with the principal should result in agreement about when the testing will take place, under what conditions, and with what assistance from school personnel. The principal can be very helpful in supplying information such as dates for which testing is inadvisable (e.g., assembly days and days immediately preceding or following holidays). Second, whether you are testing in the schools or elsewhere, you should do everything you can to ensure ideal testing conditions; a comfortable, quiet environment is more conducive to participant cooperation. You should monitor test takers carefully to minimize cheating. Also, if testing is to take place in more than one session, the conditions of the sessions should be as similar as possible. Third, be prepared. Be thoroughly familiar with the administration procedures presented in the test manual, and follow the directions precisely. If the procedures are at all complicated, practice beforehand. Administer the test to some group, or stand in front of a mirror and give it to yourself.

As with everything in life, good planning and preparation usually pay off. If you have made all necessary arrangements, secured all necessary cooperation, and are very familiar and comfortable with the administration procedures, the testing situation should go well. If some unforeseen catastrophe, such as an earthquake or a power failure, occurs during testing, make careful note of the incident. If it is serious enough to invalidate the testing, you may have to try again another day with another group. At minimum, note the occurrence of the incident in your final research report. You cannot predict every problem that may arise, but you can greatly increase the probability of all going well if you plan and prepare adequately for the big day.

Summary

Constructs

1. All types of research require collecting data. Data are pieces of evidence used to examine a research problem or hypothesis.
2. Constructs are mental abstractions such as personality, creativity, and intelligence that cannot be observed or measured directly. Constructs become variables when they are stated in terms of operational definitions.

Variables

3. Variables are placeholders that can assume any one of a range of values.
4. Categorical variables assume nonnumerical (nominal) values; quantitative variables assume numerical values and are measured on an ordinal, interval, or ratio scale.
5. An independent variable is the treatment or cause, and the dependent variable is the outcome or effect of the independent variable.

Characteristics of Measuring Instruments

6. Three main ways to collect data for research studies include administering an existing instrument, constructing an original instrument, and recording naturally occurring events (i.e., observation).
7. The time and skill it takes to select an appropriate instrument are invariably less than the time and skill it takes to develop an original instrument.
8. Thousands of standardized and nonstandardized instruments are available for researchers. A standardized test is administered, scored, and interpreted in the same way no matter when and where it is administered.
9. Most quantitative tests are paper-and-pencil tests, whereas most qualitative researchers collect data by observation and oral questioning.
10. Raw scores indicate the number of items or points a person got correct.
11. Norm-referenced scoring compares a student’s test performance to the performance of other test takers; criterion-referenced scoring compares a student’s test performance to predetermined standards of performance.

Types of Measuring Instruments

12. Cognitive tests measure intellectual processes. Achievement tests measure the current status of individuals on school-taught subjects.
13. Aptitude tests are used to predict how well a test taker is likely to perform in the future. General aptitude tests typically ask the test taker to perform a variety of verbal and nonverbal tasks.
14. Affective tests are assessments designed to measure characteristics related to emotion.
15. Most affective tests are nonprojective, self-report measures in which the individual responds to a series of questions about him- or herself.
16. Five basic types of scales are used to measure attitudes: Likert scales, semantic differential scales, rating scales, Thurstone scales, and Guttman scales. The first three are the most used.
17. Attitude scales measure respondents’ feelings about various objects, persons, and activities. People respond to Likert scales by indicating their feelings along a scale such as strongly agree, agree, undecided, disagree, and strongly disagree. Semantic differential scales present a continuum of attitudes on which the respondent selects a position to indicate the strength of attitude, and rating scales present statements that respondents must rate on a continuum from high to low.
18. Interest inventories allow individuals to indicate personal likes and dislikes. Responses are generally compared to interest patterns of other people.
19. Personality describes characteristics that represent a person’s typical behavior. Personality inventories include lists of statements describing human behaviors, and participants must indicate whether each statement pertains to them.
20. Personality inventories may be specific to a single trait (introversion–extroversion) or may be general and measure a number of traits.

21. Use of self-report measures creates a concern about whether an individual is expressing his or her true attitude, values, interests, or personality.
22. Test bias in both cognitive and affective measures can distort the data obtained. Bias is present when one’s ethnicity, race, gender, language, or religious orientation influences test performance.
23. Projective tests present an ambiguous situation and require the test taker to “project” her or his true feelings on the ambiguous situation.
24. Association is the most commonly used projective technique and is exemplified by the inkblot test. Only the specially trained can administer and interpret projective tests.

Criteria for Good Measuring Instruments

25. Validity is the degree to which a test measures what it is supposed to measure, thus permitting appropriate interpretations of test scores.
26. A test is not valid per se; it is valid for a particular interpretation and for a particular group. Each intended test use requires its own validation. Validity is measured on a continuum: tests are highly valid, moderately valid, or generally invalid.
27. Content validity assesses the degree to which a test measures an intended content area. Content validity is of prime importance for achievement tests.
28. Content validity is determined by expert judgment, not by statistical means.
29. Criterion-related validity is determined by relating performance on a test to performance on a second test or other measure.
30. Criterion validity has two forms, concurrent and predictive. Concurrent validity is the degree to which the scores on a test are related to scores on another test administered at the same time or to another measure available at the same time. Predictive validity is the degree to which scores on a test are related to scores on another test administered in the future. In both cases, a single group must take both tests.
31. Construct validity is a measure of whether the construct underlying a variable is actually being measured.
32. Construct validity is determined by a series of validation studies that can include content and criterion-related approaches. Both confirmatory and disconfirmatory evidence are used to determine construct validity.
33. Consequential validity is concerned with the potential of tests to create harmful effects for test takers.
34. The validity of any test or measure can be diminished by factors such as unclear test directions, ambiguous or difficult test items, subjective scoring, and nonstandardized administration procedures.

Reliability of Measuring Instruments

35. Reliability is the degree to which a test consistently measures whatever it measures. Reliability is expressed numerically, usually as a coefficient ranging from 0.0 to 1.0; a high coefficient indicates high reliability.
36. Measurement error refers to the inevitable fluctuations in scores due to person and test factors. No test is perfectly reliable, but the smaller the measurement error, the more reliable the test.
37. The five general types of reliability are stability, equivalence, equivalence and stability, internal consistency, and scorer/rater.
38. Stability, also called test–retest reliability, is the degree to which test scores are consistent over time. It is determined by correlating scores from the same test, administered more than once.
39. Equivalence, also called equivalent-forms reliability, is the degree to which two similar forms of a test produce similar scores from a single group of test takers.
40. Equivalence and stability reliability is the degree to which two forms of a test given at two different times produce similar scores, as measured by correlations.
41. Internal consistency deals with the reliability of a single test taken at one time. It measures the extent to which the items in the test are consistent among themselves and with the test as a whole. Split-half, Kuder-Richardson 20 and 21, and Cronbach’s alpha are the main approaches to measuring internal consistency.
42. Split-half reliability is determined by dividing a test into two equivalent halves (e.g., odd items versus even items), correlating the two halves, and using the Spearman-Brown formula to determine the reliability of the whole test.
43. Kuder-Richardson reliability deals with the internal consistency of tests that are scored dichotomously (i.e., right, wrong), whereas Cronbach’s alpha deals with the internal consistency of tests that are scored with more than two choices (agree, neutral, disagree, or 0, 1, 2, 3).
44. Scorer/rater reliability is important when scoring tests that are potentially subjective. Interjudge reliability refers to the reliability of two or more independent scorers, whereas intrajudge reliability refers to the reliability of a single individual’s ratings over time.

Reliability Coefficients

45. The acceptable level of reliability differs among test types, with standardized achievement tests having very high reliabilities and projective tests having considerably lower reliabilities.
46. If a test is composed of several subtests that will be used individually in a study, the reliability of each subtest should be determined and reported.

Standard Error of Measurement

47. The standard error of measurement is an estimate of how often one can expect test score errors of a given size. A small standard error of measurement indicates high reliability; a large standard error of measurement indicates low reliability.
48. The standard error of measurement is used to estimate the difference between a person’s obtained and true scores. Big differences indicate low reliability.

Test Selection, Construction, and Administration

49. Do not choose the first test you find that appears to meet your needs. Identify a few appropriate tests and compare them on relevant factors.

Sources of Test Information

50. The Mental Measurements Yearbook (MMY) is the most comprehensive source of test information available. It provides factual information on all known or revised tests, test reviews, and comprehensive bibliographies and indexes.
51. Tests in Print (TIP) is a comprehensive bibliography of all tests that have appeared in preceding editions of MMY. Pro-Ed Publications’ Tests describes more than 2,000 tests in education, psychology, and business.
52. The ETS Test Collection Database describes more than 25,000 tests, published and unpublished.
53. Other sources of test information are professional journals and test publishers or distributors.

Selecting from Alternatives

54. The three most important factors to consider in selecting a test are its validity, reliability, and ease of use.

Constructing Tests

55. Self-constructed tests should be pilot-tested before use to determine validity, reliability, and feasibility.
56. Be certain to align instruction and assessment to ensure valid test results.

Test Administration

57. Every effort should be made to ensure ideal test administration conditions. Failing to administer procedures precisely or altering the administration procedures, especially on standardized tests, lowers the validity of the test.
58. Monitor test takers to minimize cheating.
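The split-half procedure and the standard error of measurement summarized above can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not a procedure taken from this text: the function names and sample data are hypothetical, the Spearman-Brown step uses the standard correction (whole-test reliability = 2r / (1 + r), where r is the correlation between halves), and the standard error of measurement uses the common formula SD × sqrt(1 − reliability).

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Estimate whole-test reliability from the half-test correlation."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores: one row per test taker, each row a list of item scores.

    Splits the test into odd-numbered and even-numbered items,
    correlates the two half scores, then applies Spearman-Brown.
    """
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_half, even_half))


def standard_error_of_measurement(sd, reliability):
    """SEM from the test's standard deviation and a reliability coefficient."""
    return sd * sqrt(1 - reliability)


# Hypothetical data: three test takers, four right/wrong items each.
scores = [[1, 1, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
print(split_half_reliability(scores))
print(standard_error_of_measurement(sd=10, reliability=0.91))
```

With a half-test correlation of .60, for instance, the Spearman-Brown correction gives a whole-test estimate of about .75, which shows why the correlation between halves understates the reliability of the full-length test.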


Performance Criteria: Task 5

All the information required for the descriptions of the tests can be found in the Mental Measurements Yearbook. Following the descriptions, you should present a comparative analysis of the three tests that forms a rationale for your selection of the “most acceptable” test for your study. As an example, you might indicate that all three tests have similar reliability coefficients reported but that one of the tests is more appropriate for your participants.

An example that illustrates the performance called for by Task 5 follows (see Task 5 Example). The task in the example was submitted by the same student whose work for Tasks 2, 3A, and 4A was presented in previous chapters.

Task 5 Example

Effect of Interactive Multimedia on the Achievement of 10th-Grade Biology Students

Test One (from an MMY, test #160)
a) High School Subject Tests, Biology, 1980–1990. American Testronics. $33.85 per 35 tests with administration directions; $13.25 per 35 machine-scorable answer sheets; $19.45 per Teacher’s Manual (’90, 110 pages).
b) The Biology test of the High School Subject Tests is a group-administered achievement test that yields 10 scores (Cell Structure and Function, Cellular Chemistry, Viruses/Monerans/Protists/Fungi, Plants, Animals, Human Body Systems and Physiology, Genetics, Ecology, Biological Analysis and Experimentation).
c) Reviewers state that reliability values (KR-20s) for the various subject tests ranged from .85 to .93, with a median of .88. Content validity should be examined using the classification tables and objective lists provided in the teacher’s manual so that stated test objectives and research objectives can be matched.
d) Grades 9–12.
e) Administration time is approximately 40 minutes.
f) Scoring services are available from the publisher.
g) Reviewers recommend the test as a useful tool in the evaluation of instructional programs, recognizing that the test fairly represents the content for biology in the high school curriculum. However, they do caution that a match should be established between stated test objectives and local objectives.

Test Two (from an MMY, test #256)
a) National Proficiency Survey Series: Biology (NPSS:B), 1989. The Riverside Publishing Company. $34.98 per 35 test booklets including directions for administration; $19.98 per 35 answer sheets; $9 per technical manual (26 pages) (1990 prices).
b) The NPSS:B is a group-administered achievement test with 45 items designed to measure “knowledge about the living world ranging from single-celled organisms to the human body.”
c) Content validity is good; items were selected from a large item bank provided by classroom teachers and curriculum experts. The manual alerts users that validity depends in large measure upon the purpose of the test. Although the standard error of measurement is not given for the biology test, the range of KR-20s for the entire battery is from .82 to .91, with a median of .86.
d) Grades 9–12.
e) Administration time is approximately 45 minutes.
f) Tests can be machine scored or self-scored. A program is available on diskette so that machine scoring may be done on site. Both percentile rank and NCE scores are used. NCEs allow users to make group comparisons.
g) The reviewer finds the reliability scores to be low if the test is to be used to make decisions concerning individual students. However, he praises the publishers for their comments regarding content validity, which state that “information should always be interpreted in relation to the user’s own purpose for testing.”

Test Three (from an MMY, test #135)
a) End of Course Tests (ECT), 1986. CTB/McGraw-Hill. $21 per complete kit including 35 test booklets (Biology 13 pages) and examiner’s manual.
b) The ECT covers a wide range of subjects in secondary school. Unfortunately, detailed information is not available for individual subjects. The number of questions range from 42 to 50 and are designed to measure subject matter content most commonly taught in a first-year course.
c) No statistical validity evidence is provided for the ECT and no demographic breakdown is provided to understand the representativeness of the standardization samples. However, reliability estimates were given and ranged from .80 to .89 using the KR-20 formula.
d) Secondary school students.
e) Administration time is from 45 to 50 minutes for any one subject test.
f) Both machine scoring and hand scoring are available. A Class Record Sheet is provided in the manual to help those who hand score to summarize the test results.
g) Users must be willing to establish local norms and validation evidence for effectual use of the ECT, since no statistical validity evidence is provided.

Conclusion

All three batteries have a biology subtest; The High School Subject Tests (HSST) and the NPSS:B are designed specifically for 10th-grade students, while the ECT is course, rather than grade, oriented. It is acknowledged that more data are needed for all three tests, but reported validity and reliability data suggest that they all would be at least adequate for the purpose of this study (i.e., to assess the effectiveness of the use of interactive multimedia in biology instruction).

Of the three tests, the least validity evidence is provided for the ECT, so it was eliminated from contention first. Both the HSST and the NPSS:B provide tables and objective lists in their manuals that may be used to establish a match between stated test objectives and research objectives. The HSST and the NPSS:B both have good content validity but the HSST does not cross-index items to objectives, as does the NPSS:B. Also, norming information indicates that Catholic school students were included in the battery norm group. Therefore, of the three tests, the NPSS:B seems to be the most valid for the study.

With respect to reliability, all three tests provide a comparable range of KR-20 values for battery subtests. While specific figures are not given for the biology subtests, the reported ranges (low eighties to low nineties) suggest that they all have adequate internal consistency reliability.

The NPSS:B appears to be the most appropriate instrument for the study. The items (which were provided by both classroom teachers and curriculum experts) appear to match the objectives of the research study quite well. The KR-20 reliability is good, both in absolute terms and as compared to that of the other available tests. Both machine- and self-scoring are options, but an added advantage is that machine scoring can be done on site using a program provided by the publisher.

Thus, the NPSS:B will be used in the current study. As a cross-check, internal consistency reliability will be computed based on the scores of the subjects in the study.
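The cross-check mentioned at the end of the example, computing internal consistency from the study’s own scores, might look something like the following Python sketch of the KR-20 formula for right/wrong items. This is an illustration only: the function name and the response data are hypothetical, and the sketch assumes the usual form of KR-20, (k / (k − 1)) × (1 − Σpq / variance of total scores), with the variance computed over the whole group of test takers.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 internal consistency for dichotomously scored items.

    responses: one row per test taker, each row a list of 0/1 item scores.
    """
    n_people = len(responses)
    n_items = len(responses[0])

    # Variance of each person's total score across the group.
    totals = [sum(row) for row in responses]
    total_variance = pvariance(totals)

    # Sum over items of p * q, where p is the proportion answering
    # correctly and q = 1 - p.
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_people
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - sum_pq / total_variance)


# Hypothetical responses: four students, three items.
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(data))
```

A real biology test would have far more items and students than this toy data set; the small example is used only so that the arithmetic can be followed by hand.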

Chapter Seven
Survey Research

Paradise, 2013

“Turning people off is certainly not the way to get them to respond.” (p. 212)

Learning Outcomes

After reading Chapter 7, you should be able to do the following:

1. Define survey research, and differentiate between sample surveys and census surveys, and between cross-sectional surveys and longitudinal surveys.
2. Describe survey research designs, including cross-sectional studies and longitudinal studies.
3. Describe the procedures involved in conducting survey research.

The chapter learning outcomes form the basis for the following task, which requires you to develop the method section of a research report. In addition, an example of a published study using survey methods appears at the end of this chapter.

Task 6A

For a quantitative study, you have created research plan components (Tasks 2 and 3A), described a sample (Task 4A), and considered appropriate measuring instruments (Task 5). If your study involves survey research, you should develop the method section of the research report for this task. Include a description of participants, data collection method, and research design (see Performance Criteria at the end of Chapter 11, p. 348).

SUMMARY: SURVEY RESEARCH

Definition: Survey research involves collecting data to test hypotheses or to answer questions about people’s opinions on some problem or issue.

Design(s): Cross-sectional or longitudinal

Types of appropriate research questions: Questions about people’s opinions on some problem or issue

Key characteristics:
• Sampling from a population
• Collecting data through questionnaires or interviews
• Construction or identification of survey instrument for data collection
• High response rate

Steps in the process:
1. State the problem.
2. Construct or locate the questionnaire/survey tool.
3. Pilot-test the questionnaire.
4. Prepare the cover letter.
5. Administer the questionnaire: select participants, distribute the questionnaire, conduct follow-up activities.
6. Tabulate the questionnaire responses.
7. Analyze the results.
8. Write the report.

Potential challenges:
• Response rate of 50% or greater

Example: A school superintendent wants to know how high school teachers perceive their schools.

Chapter 9 • Causal–Comparative Research

If a researcher wanted to compare a group of students with an unstable home life to a group of students with a stable home life, the terms unstable and stable would have to be operationally defined. An unstable home life could refer to any number of things, such as life with a parent who abuses alcohol, who is violent, and/or who neglects the child. It could refer to a combination of these or other factors. Operational definitions help define the populations and guide sample selection.

Random selection from the defined populations is generally the preferred method of participant selection. The important consideration is to select samples that are representative of their respective populations. Note that, in causal–comparative research, the researcher samples from two already existing populations, not from a single population. The goal is to have groups that are as similar as possible on all relevant variables except the grouping variable. To determine the equality of groups, information on a number of background and current status variables may be collected and compared for each group. For example, information on age, years of experience, gender, and prior knowledge may be obtained and examined for the groups being compared. The more similar the two groups are on such variables, the more homogeneous they are on everything but the variable of interest. This homogeneity makes a stronger study and reduces the number of possible alternative explanations of the research findings. Not surprisingly, then, a number of control procedures correct for identified inequalities on such variables.

Control Procedures

Lack of randomization, manipulation, and control are all sources of weakness in a causal–comparative study. In other study designs, random assignment of participants to groups is probably the best way to try to ensure equality of groups, but random assignment is not possible in causal–comparative studies because the groups are naturally formed before the start of the study. Without random assignment, the groups are more likely to be different on some important variable (e.g., gender, experience, age) other than the variable under study. This other variable may be the real cause of the observed difference between the groups. For example, a researcher who simply compared a group of students who had received preschool education to a group who had not may conclude that preschool education results in higher first-grade reading achievement. However, if all preschool programs in the region in which the study was conducted were private and required high tuition, the researcher would really be investigating the effects of preschool education combined with membership in a well-to-do family. Perhaps parents in such families provide early informal reading instruction for their children. In this case, it is very difficult to disentangle the effects of preschool education from the effects of affluent families on first-grade reading. A researcher aware of the situation could control for this variable by studying only children of well-to-do parents. Thus, the two groups to be compared would be equated with respect to the extraneous variable of parents’ income level. This example is but one illustration of a number of statistical and nonstatistical methods that can be applied in an attempt to control for extraneous variables.

The following sections describe three control techniques: matching, comparing homogeneous groups or subgroups, and analysis of covariance.

Matching

Matching is a technique for equating groups on one or more variables. If researchers identify a variable likely to influence performance on the dependent variable, they may control for that variable by pair-wise matching of participants. In other words, for each participant in one group, the researcher finds a participant in the other group with the same or very similar score on the control variable. If a participant in either group does not have a suitable match, the participant is eliminated from the study. Thus, the resulting matched groups are identical or very similar with respect to the identified extraneous variable. For example, if a researcher matched participants in each group on IQ, a participant in one group with an IQ of 140 would be matched with a participant with an IQ at or near 140 in the other group. A major problem with pair-wise matching is that invariably some participants have no match and must therefore be eliminated from the study. The problem becomes even more serious when the researcher attempts to match participants on two or more variables simultaneously.
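The pair-wise matching procedure described above can be sketched in a few lines of Python. The sketch rests on assumptions that are not part of the text: the function name, the participant identifiers, and the tolerance of 3 IQ points for a “very similar” score are all hypothetical, and the simple first-come, closest-available rule used here is only one of several reasonable ways to form pairs.

```python
def match_pairs(group_a, group_b, tolerance=3):
    """Pair participants across two groups on a control variable.

    Each group is a list of (participant_id, score) tuples.
    Returns (pairs, dropped): matched id pairs, and the ids of
    participants from either group who had no suitable match.
    """
    available = list(group_b)  # participants in group B not yet matched
    pairs, dropped = [], []

    for pid, score in group_a:
        # Candidates are unmatched group-B members within the tolerance.
        candidates = [p for p in available if abs(p[1] - score) <= tolerance]
        if not candidates:
            dropped.append(pid)  # no suitable match: eliminated
            continue
        best = min(candidates, key=lambda p: abs(p[1] - score))
        available.remove(best)
        pairs.append((pid, best[0]))

    # Anyone left over in group B is also eliminated.
    dropped.extend(p[0] for p in available)
    return pairs, dropped


# Hypothetical IQ scores for two naturally formed groups.
group_a = [("a1", 140), ("a2", 100), ("a3", 85)]
group_b = [("b1", 138), ("b2", 101), ("b3", 120)]
pairs, dropped = match_pairs(group_a, group_b)
print(pairs)
print(dropped)
```

Even in this tiny example, one participant from each group is lost for lack of a match, which illustrates the attrition problem noted above; matching on a second variable at the same time would shrink the pool of acceptable partners further.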

Comparing Homogeneous Groups or Subgroups

Another way to control extraneous variables is to compare groups that are homogeneous with respect to the extraneous variable. In the study about preschool attendance and first-grade achievement, the decision to compare children only from well-to-do families is an attempt to control extraneous variables by comparing homogeneous groups. If, in another situation, IQ were an identified extraneous variable, the researcher could limit groups only to participants with IQs between 85 and 115 (i.e., average IQ). This procedure may lower the number of participants in the study and also limit the generalizability of the findings because the sample of participants includes such a limited range of IQ.

A similar but more satisfactory approach is to form subgroups within each group to represent all levels of the control variable. For example, each group may be divided into subgroups based on IQ: high (e.g., 116 and above), average (e.g., 85 to 115), and low (e.g., 84 and below). The existence of comparable subgroups in each group controls for IQ. This approach also permits the researcher to determine whether the target grouping variable affects the dependent variable differently at different levels of IQ, the control variable. That is, the researcher can examine whether the effect on the dependent variable is different for each subgroup.

If subgroup comparison is of interest, the best approach is not to do separate analyses for each subgroup but to build the control variable into the research design and analyze the results with a statistical technique called factorial analysis of variance. A factorial analysis of variance (discussed further in Chapter 18) allows the researcher to determine the effects of the grouping variable (for causal–comparative designs) or independent variable (for experimental designs) and the control variable both separately and in combination. In other words, factorial analysis of variance tests for an interaction between the independent/grouping variable and the control variable such that the independent/grouping variable operates differently at each level of the control variable. For example, a causal–comparative study of the effects of two different methods of learning fractions may include IQ as a control variable. One potential interaction between the grouping and control variable would be that a method involving manipulation of blocks is more effective than other methods for students with lower IQs, but the manipulation method is no more effective than other methods for students with higher IQs.

Analysis of Covariance

Analysis of covariance is a statistical technique for adjusting initial group differences on variables used in causal–comparative and experimental studies. In essence, analysis of covariance adjusts scores on a dependent variable for initial differences on some other variable related to performance on the dependent variable. For example, suppose we planned a study to compare two methods, X and Y, of teaching fifth graders to solve math problems. When we gave the two groups a test of math ability prior to introducing the new teaching methods, we found that the group to be taught by Method Y scored much higher than the group to be taught by Method X. This difference suggests that the Method Y group will be superior to the Method X group at the end of the study just because members of the group began with higher math ability than members of the other group. Analysis of covariance statistically adjusts the scores of the Method Y group to remove the initial advantage so that, at the end of the study, the results can be fairly compared, as if the two groups started equally.

Data Analysis and Interpretation

Analysis of data in causal–comparative studies involves a variety of descriptive and inferential statistics. All the statistics that may be used in a causal–comparative study may also be used in an experimental study. Briefly, however, the most commonly used descriptive statistics are the mean, which indicates the average performance of a group on a measure of some variable, and the standard deviation, which indicates the spread of a set of scores around the mean, that is, whether the scores are relatively close together and clustered around the mean or widely spread out around the mean. The most commonly used inferential statistics are the t test, used to determine whether the scores of two groups are significantly different from one another; analysis of variance, used to test for significant differences among the scores for three or more groups; and chi square, used to compare group frequencies, that is, to see if an event occurs more frequently in one group than another.

Again, remember that interpreting the findings in a causal–comparative study requires considerable caution. Without randomization, manipulation, and control factors, it is difficult to establish cause–effect relations with any great degree of confidence. The cause–effect relation may in fact be the reverse of the one hypothesized (i.e., the alleged cause may be the effect, and vice versa). Reversed causality is not a reasonable alternative in every case, however. For example, preschool training may affect reading achievement in third grade, but reading achievement in third grade cannot affect preschool training. Similarly, one’s gender may affect one’s achievement in mathematics, but one’s achievement in mathematics certainly does not affect one’s gender! When reversed causality is plausible, it should be investigated. For example, it is equally plausible that excessive absenteeism produces, or leads to, involvement in criminal activities as it is that involvement in criminal activity produces, or leads to, excessive absenteeism. The way to determine the correct order of causality (which variable caused which) is to determine which one occurred first. If, in the preceding example, a period of excessive absenteeism were frequently followed by a student getting in trouble with the law, then the researcher could reasonably conclude that excessive absenteeism leads to involvement in criminal activities. On the other hand, if a student’s first involvement in criminal activities were preceded by a period of good attendance but followed by a period of poor attendance, then the conclusion that involvement in criminal activities leads to excessive absenteeism would be more reasonable.

The possibility of a third, common explanation is also plausible in many situations. Recall the example of parental attitude affecting both self-concept and achievement, presented earlier in the chapter. As mentioned, one way to control for a potential common cause is to compare homogeneous groups. For example, if students in both the strong self-concept group and the weak self-concept group could be selected from parents who had similar attitudes, the effects of parents’ attitudes would be removed because both groups would have been exposed to the same parental attitudes. To investigate or control for alternative hypotheses, the researcher must be aware of them and must present evidence that they are not better explanations for the behavioral differences under investigation.

Summary

Causal–Comparative Research: Definition and Purpose

1. In causal–comparative research, the researcher attempts to determine the cause, or reason, for existing differences in the behavior or status of groups.
2. The basic causal–comparative approach is retrospective; that is, it starts with an effect and seeks its possible causes. A variation of the basic approach is prospective; that is, it starts with a cause and investigates its effect on some variable.
3. An important difference between causal–comparative and correlational research is that causal–comparative studies involve two (or more) groups of participants and one grouping variable, whereas correlational studies typically involve two (or more) variables and one group of participants. Neither causal–comparative nor correlational research produces true experimental data.
4. The major difference between experimental research and causal–comparative research is that, in experimental research, the researcher can randomly form groups and manipulate the independent variable. In causal–comparative research the groups are already formed and already differ in terms of the variable in question.
5. Grouping variables in causal–comparative studies cannot be manipulated, should not be manipulated, or simply are not manipulated but could be.
6. Causal–comparative studies identify relations that may lead to experimental studies, but only if a relation is established clearly. The alleged cause of an observed causal–comparative effect may in fact be the supposed cause, the effect, or a third variable that may have affected both the apparent cause and the effect.

The Causal–Comparative Research Process

Design and Procedure

7. The basic causal–comparative design involves selecting two groups differing on some variable of interest and comparing them on some dependent variable. One group may possess a characteristic that the other does not, or one group may possess more of a characteristic than the other.
8. Samples must be representative of their respective populations and similar with respect to critical variables other than the grouping variable.

Control Procedures

9. Lack of randomization, manipulation, and control are all sources of weakness in a causal–comparative design. It is possible that the groups are different on some other major variable besides the target variable of interest, and this other variable may be the cause of the observed difference between the groups.
10. Three approaches to overcoming problems of initial group differences on an extraneous variable are matching, comparing homogeneous groups or subgroups, and analysis of covariance.

Data Analysis and Interpretation

11. The descriptive statistics most commonly used in causal–comparative studies are the mean, which indicates the average performance of a group on a measure of some variable, and the standard deviation, which indicates how spread out a set of scores is; that is, whether the scores are relatively close together and clustered around the mean or widely spread out around the mean.
12. The inferential statistics most commonly used in causal–comparative studies are the t test, which is used to determine whether the scores of two groups are significantly different from one another; analysis of variance, used to test for significant differences among the scores for three or more groups; and chi square, used to see if an event occurs more frequently in one group than another.
13. Interpreting the findings in a causal–comparative study requires considerable caution. The alleged cause may be the effect, and vice versa, or a third factor may be the cause of both variables. The way to determine the correct order of causality is to determine which one occurred first.
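The descriptive and inferential statistics named in the summary can be sketched in a few lines of code. The following is a minimal illustration using only the Python standard library; the function names and the two sets of scores are hypothetical and are not taken from the chapter. It computes a mean and standard deviation for each group, a pooled-variance t statistic for two independent groups, and a Pearson chi-square statistic for a 2 × 2 frequency table. Converting a statistic to a p value would still require a t or chi-square table (or a statistics package).

```python
import math
import statistics


def describe(scores):
    """Return the mean and sample standard deviation of a list of scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, sd_a = describe(group_a)
    mean_b, sd_b = describe(group_b)
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_error, n_a + n_b - 2


def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2 x 2 table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Hypothetical achievement scores for two already-formed groups.
strong = [78, 85, 90, 72, 88]
weak = [70, 65, 80, 60, 75]

t_stat, df = independent_t(strong, weak)
print("strong group (mean, SD):", describe(strong))
print("weak group (mean, SD):", describe(weak))
print("t =", round(t_stat, 2), "with df =", df)

# Hypothetical frequencies: rows are groups, columns are event occurred / did not occur.
print("chi-square =", round(chi_square_2x2([[10, 20], [20, 10]]), 2))
```

Because the groups in a causal–comparative study are not randomly formed, a significant statistic from a sketch like this one shows only that the groups differ, not why they differ.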


Example: Causal-Comparative Study

Comparing Longitudinal Academic Achievement of Full-Day and Half-Day Kindergarten Students

Jennifer R. Wolgemuth, R. Brian Cobb, Marc A. Winokur
Colorado State University
Nancy Leech
University of Colorado–Denver
Dick Ellerby
Poudre School District

Abstract  The authors compared the achievement of children who were enrolled in full-day kindergarten (FDK) to a matched sample of students who were enrolled in half-day kindergarten (HDK) on mathematics and reading achievement in Grades 2, 3, and 4, several years after they left kindergarten. Results showed that FDK students demonstrated significantly higher achievement at the end of kindergarten than did their HDK counterparts, but that advantage disappeared quickly by the end of the first grade. Interpretations and implications are given for that finding.

Key words: academic achievement of full and half-day kindergarten students, mathematics and reading success in elementary grades.

Coinciding with increases in pre-kindergarten enrollment and the number of parents working outside of the home, full-day kindergarten (FDK) has become exceedingly popular in the United States (Gullo & Maxwell, 1997). The number of students attending FDK classes in the United States rose from 30% in the early 1980s (Holmes & McConnell, 1990) to 55% in 1998 (National Center for Education Statistics, 2000), reflecting societal changes and newly emerging educational priorities. Whereas kindergarten students were required to perform basic skills, such as reciting the alphabet and counting to 20, they are now expected to demonstrate reading readiness and mathematical reasoning while maintaining the focus and self-control necessary to work for long periods of time (Nelson, 2000).

In contrast, the popularity of half-day kindergarten (HDK) has decreased for similar reasons.
For example, parents prefer FDK over HDK for the time it affords (Clark & Kirk, 2000) and for providing their children with further opportunities for academic, social, and personal enrichment (Aten, Foster, & Cobb, 1996; Cooper, Foster, & Cobb, 1998a, 1998b).

The shift in kindergarten preferences has resulted in a greater demand for research on the effects of FDK in comparison with other scheduling approaches (Gullo & Maxwell, 1997). Fusaro (1997) cautioned[,]…that “Before a school district decides to commit additional resources to FDK classes, it should have empirical evidence that children who attend FDK manifest greater achievement than children who attend half-day kindergarten” (p. 270). According to the literature, there is mounting evidence that supports the academic, social, and language development benefits of FDK curricula (Cryan, Sheehan, Wiechel, & Bandy-Hedden, 1992; Hough & Bryde, 1996; Karweit, 1992; Lore, 1992; Nelson, 2000). Successful FDK programs specifically extend traditional kindergarten objectives and use added class hours to afford children more opportunities to fully integrate new learning (Karweit, 1992). Furthermore, most education stakeholders support FDK because they believe that it provides academic advantages for students, meets the needs of busy parents, and allows primary school teachers to be more effective (Ohio State Legislative Office of Education Oversight [OSLOEO], 1997).

Address correspondence to R. Brian Cobb, College of Applied Human Sciences, Colorado State University, 222 West Laurel Street, Fort Collins, CO 80521. Copyright © 2006 Heldref Publications.

Length of School Day

According to Wang and Johnstone (1999), the “major argument for full-day kindergarten is that additional hours in school would better prepare children for first grade and would result in a decreased need for grade retention” (p. 27). Furthermore, extending the kindergarten day provides educational advantages resulting from increased academic emphasis, time on task, and content coverage (Karweit, 1992; Nelson, 2000; Peck, McCaig, & Sapp, 1988). Advocates of FDK also contend that a longer school day allows teachers to provide a relaxed classroom atmosphere in which children can experience kindergarten activities in a less hurried manner (McConnell & Tesch, 1986). Karweit (1992) argued that consistent school schedules and longer school days help parents to better manage family and work responsibilities while providing more time for individualized attention for young children.

Critics of FDK express concern that “children may become overly tired with a full day of instruction, that children might miss out on important learning experiences at home, and that public schools should not be in the business of providing ‘custodial’ child care for 5-year-olds” (Elicker & Mathur, 1997, p. 461). Peck and colleagues (1988) argued that some FDK programs use the extra time to encroach on the first-grade curriculum in an ill-advised attempt to accelerate children’s cognitive learning. However, in a 9-year study of kindergarten students, the Evansville-Vanderburgh School Corporation (EVSC, 1988) found that school burnout and academic stress were not issues for FDK students. Others conclude convincingly that the events that occur in classrooms (e.g., teacher philosophy, staff development), rather than the length of the school day, determine whether curricula and instruction are developmentally appropriate for young students (Clark & Kirk, 2000; Elicker & Mathur, 1997; Karweit, 1994).

Parent Choice

A critical factor driving the growth of FDK is greater parent demand for choice in kindergarten programs. Although surveys of parents with children in HDK often mention the importance of balancing education outside the home with quality time in the home, Elicker and Mathur (1997) found that a majority of these parents would select an FDK program for their child if given the opportunity. However, Cooper and colleagues (1998a) found that parents of FDK students were even more supportive of having a choice of programs than were parents of HDK students.

Although some parents expressed concern about the length of time that children were away from home, most were content with the option of FDK (Nelson, 2000). In addition to the belief that FDK better accommodates their work schedules (Nelson), “parents of full-day children expressed higher levels of satisfaction with program schedule and curriculum, citing benefits similar to those expressed by teachers: more flexibility; more time for child-initiated, in-depth, and creative activities; and less stress and frustration” (Elicker & Mathur, 1997, p. 459). Furthermore, Cooper and colleagues (1998a) found that parents of full-day students were happy with the increased opportunities for academic learning afforded by FDK programs.

Student Achievement

Most researchers who compared the academic achievement levels of FDK and HDK kindergarten students found improved educational performance within FDK programs (Cryan et al., 1992; Elicker & Mathur, 1997; Holmes & McConnell, 1990; Hough & Bryde, 1996; Koopmans, 1991; Wang & Johnstone, 1999). In a meta-analysis of FDK research, Fusaro (1997) found that students who attended FDK demonstrated significantly higher academic achievement than did students in half-day programs. Hough and Bryde (1996) matched six HDK programs with six FDK programs and found that FDK students outperformed HDK students on language arts and mathematics criterion-referenced assessments. In a study of 985 kindergarten students, Lore (1992) found that 65% of the students who attended an FDK program showed relatively stronger gains on the reading and oral comprehension sections of the Comprehensive Test of Basic Skills. In a 2-year evaluation of a new FDK program, Elicker and Mathur (1997) reported that FDK students demonstrated significantly more progress in literacy, mathematics, and general learning skills, as compared with students in HDK programs. However, some researchers have not found significant differences between the academic achievement of students from FDK and HDK programs (e.g., Gullo & Clements, 1984; Holmes & McConnell, 1990; Nunnally, 1996).

Longitudinal Student Achievement

Evidence supporting the long-term effectiveness of FDK is less available and more inconsistent than is evidence of its short-term effectiveness (Olsen & Zigler, 1989).
For example, the EVSC (1988) reported that FDK students had higher grades than did HDK students throughout elementary and middle school, whereas Koopmans (1991) found that the “significance of the differences between all-day and half-day groups disappears in the long run [as] test scores go down over time in both cohorts” (p. 16). Although OSLOEO (1997) concluded that the academic and social advantages for FDK students were diminished after the second grade, Cryan and colleagues (1992) found that the positive effects from the added time offered by FDK lasted well into the second grade.

Longitudinal research of kindergarten programming conducted in the 1980s (Gullo, Bersani, Clements, & Bayless, 1986; Puleo, 1988) has been criticized widely for its methodological flaws and design weaknesses. For example, Elicker and Mathur (1997) identified the noninclusion of initial academic abilities in comparative models as a failing of previous longitudinal research on the lasting academic effects of FDK.

Study Rationale

In 1995, the Poudre School District (PSD) implemented a tuition-based FDK program in addition to HDK classes already offered. Although subsequent surveys of parent satisfaction revealed that FDK provided children with further opportunities for academic enrichment (Aten et al., 1996; Cooper et al., 1998a, 1998b), researchers have not determined the veracity of these assumptions. Thus, we conducted the present study to address this gap in the empirical evidence base.

Research Questions

[Annotation: Research questions are appropriate given the focus on the variable of full-day kindergarten versus half-day kindergarten and the effects of these two program choices on student achievement in mathematics and reading abilities as they progress through elementary school. It would not be ethical to randomly assign children to FDK and HDK programs. Researchers are also interested in whether the organismic variable of gender plays a role in student achievement.]

Because of the inconclusiveness in the research literature on the longitudinal academic achievement of FDK versus HDK kindergarten students, we did not pose a priori research hypotheses. We developed the following research questions around the major main effects and interactions of the kindergarten class variable (full day vs. half day), covariates (age and initial ability), and dependent variables (K–5 reading and mathematics achievement).

1. What difference exists between FDK and HDK kindergarten students in their mathematics and reading abilities as they progress through elementary school, while controlling for their initial abilities?
2. How does this differential effect vary, depending on student gender?

Method

Participants

[Annotation: Participant selection of 489 students provided a substantial sample that far exceeds the minimum requirement of 15 participants per group.]

The theoretical population for this study included students who attended elementary schools in moderately sized, middle-to-upper class cities in the United States. The actual sample included 489 students who attended FDK or HDK from 1995 to 2001 at one elementary school in a Colorado city of approximately 125,000 residents. Because this study is retrospective, we used only archival data to build complete cases for each student in the sample. Hence, no recruitment strategies were necessary.

[Annotation: Researchers relied on archival data to build complete cases for each student in the sample.]

Students were enrolled in one of three kindergarten classes: 283 students (57.9%) attended half-day classes (157 half-day morning and 126 half-day afternoon) and 206 students (42.1%) attended full-day classes. Students’ ages ranged from 5 years 0 months to 6 years 6 months upon entering kindergarten; overall average age was 5 years 7 months. The total study included 208 girls (44.0%) and 265 boys (56.0%). The majority of students received no monetary assistance for lunch, which was based on parent income (89.0%, n = 424); 49 students (10.0%) received some assistance. Twenty-six students (5.3%) spoke a language at home other than English. The majority of students (90.5%, n = 428) were Caucasian; 31 students (6.3%) were Hispanic; and 14 students (2.8%) were African American, Native American, or Asian American. Those data reflect the community demographics within the school district. Because of the potential for individual identification based on the small numbers of students within the various ethnic groups and those receiving lunch assistance, our analyses excluded ethnicity and lunch assistance as control variables.

[Annotation: The researchers chose not to include participants from various ethnic groups represented in the population and those receiving lunch assistance from the study. What potential impact could this decision have on the outcomes of the study?]

Intervention

We excluded from the study students who switched during the academic year from FDK to HDK (or vice versa). FDK comprised an entire school day, beginning at 8:30 a.m. and ending at 3:00 p.m. HDK morning classes took place from 8:30 a.m. to 11:15 a.m.; HDK afternoon classes occurred from 12:15 p.m. to 3:00 p.m. FDK recessed at lunch and provided a 30-min rest period in the afternoon when students typically napped, watched a video, or both.
HDK students also recessed but did not have a similar rest period. Both kindergarten programs employed centers (small ability-based groups) as part of their reading and mathematics instruction, and all kindergarten teachers met weekly to discuss and align their curriculum. The amount of time spent on reading instruction was two or three times greater than that dedicated to mathematics.

Reading Curriculum.  The kindergarten reading curriculum was based predominantly on the Open Court system, which emphasizes phonemic awareness. Students learned to segment and blend words by pronouncing and repronouncing words when beginnings and endings were removed. Teachers also included daily “letters to the class” on which students identified the letters of the day and circled certain words. Teachers also read stories to students, helped students write capital and lowercase letters and words, and encouraged them to read on their own and perform other reading activities. Teachers expected the students to know capital and lowercase letters and their sounds, and some words by sight when they completed kindergarten.

Mathematics Curriculum.  The kindergarten mathematics curriculum was predominantly workbook based and integrated into the whole curriculum. Students worked with mathematics problems from books, played number games with the calendar, counted while standing in line for lunch and recess, and practiced mathematical skills in centers. Once a week, the principal came into the kindergarten classes and taught students new mathematics games with cards or chips. The games included counting-on, skip-counting, and simple addition and subtraction. Students were expected to leave kindergarten knowing how to count and perform basic numerical operations (i.e., adding and subtracting 1).

Measures

[Annotation: Control measures for the study included initial reading ability, initial mathematics ability, and K–2 reading fluency.]

Initial Reading Ability Covariate.  When each participant entered kindergarten, school personnel (kindergarten teacher or school principal) assessed the student’s ability to recognize capital and lowercase letters and to produce their sounds. This letter-knowledge assessment requested that students name all uppercase and lowercase letters (shown out of order) and make the sounds of the uppercase letters. Students received individual testing, and school personnel recorded the total number of letters that the student identified correctly out of a possible 78 letters.
Letter-name and -sound knowledge are both essential skills in reading development (Stage, Sheppard, Davidson, & Browning, 2001). Simply put, theory suggests that letter-name knowledge facilitates the ability to produce letter sounds, whereas letter-sounding ability is the foundation for word decoding and fluent reading (Ehri, 1998; Kirby & Parrila, 1999; Trieman, Tincoff, Rodriguez, Mouzaki, & Francis, 1998). Predictive validity is evidenced in the numerous studies in which researchers have reported high correlations (r = .60 to r = .90) between letter-naming and letter-sounding ability and subsequent reading ability and achievement measures (Daly, Wright, Kelly, & Martens, 1997; Kirby & Parrila, 1999; McBride-Chang, 1999; Stage et al., 2001).

Initial Mathematics Ability Covariate.  When the students entered kindergarten, school personnel (kindergarten teacher or school principal) assessed their initial mathematics ability. The assessment consisted of personnel asking students to identify numbers from 0 to 10. They recorded the total number that the student named out of a possible 11. The abilities to recognize numbers and to perform basic numerical operations, such as counting to 10, are recognized as important indicators of kindergarten readiness (Kurdek & Sinclair, 2001). Researchers have shown that basic number skills (counting and number recognition) in early kindergarten predict mathematics achievement in first grade (Bramlett, Rowell, & Madenberg, 2000) and in fourth grade (Kurdek & Sinclair, 2001).

K–2 Reading Fluency Dependent Variable: One–Minute Reading (OMR) Assessment.  The school principal assessed K–2 reading achievement by conducting 1-min, grade-appropriate reading samples with each student at the beginning and end of the school year. The kindergarten reading passage contained 67 words, the first-grade passage had 121 words, and the second-grade passage included 153 words. Students who finished a passage in less than 1 min returned to the beginning of the passage and continued reading until the minute expired. The principal recorded the total number of words that a student read correctly in 1 min. Students who read passages from grades higher than their own were excluded from subsequent analyses.

The OMR is a well-known curriculum-based measure of oral fluency that is theoretically and empirically linked to concurrent and future reading achievement (Fuchs, Fuchs, Hosp, & Jenkins, 2001). Scores on the OMR correlate highly with concurrent criteria (r = .70 to .90; Parker, Hasbrouck, & Tindal, 1992). Evidence of oral fluency criterion validity includes high correlations with teacher student-ability judgments (Jenkins & Jewell, 1993), standardized student achievement test scores (Fuchs, Fuchs, & Maxwell, 1988; Jenkins & Jewell), reading inventories (Parker et al., 1992), and reading-comprehension tests (Hintze, Shapiro, Conte, & Basile, 1997; Kranzler, Brownell, & Miller, 1998).

Dependent Variables for Reading- and Mathematics-Achievement-Level Tests: Reading and Mathematics Levels.  The Northwest Evaluation Association (NWEA) developed standardized reading-, mathematics-, and science-level tests for the Poudre School District. NWEA generated the tests from a large data bank of items that were calibrated on a common scale using Rasch measurement techniques. The tests measure student performance on a Rasch unit (RIT) scale that denotes a student’s ability, independent of grade level. The elementary school conducted reading- and mathematics-level tests once a year in the spring with all second- through sixth-grade students who could read and write. NWEA (2003) reported that the levels tests correlate highly with other achievement tests, including the Colorado State Assessment Program test (r = .84 to .91) and the Iowa Tests of Basic Skills (r = .74 to .84).
Test–retest reliability results were similarly favorable, ranging from .72 to .92, depending on grade level and test (NWEA).

Results

[Annotation: The authors introduce us to a new term here: sphericity. In statistical analysis (which we haven’t covered yet!), sphericity relates to the equality of the variances of the differences between levels of the repeated measures. In the case of this study, the repeated measures of reading and mathematics levels were found to have unequal variances and covariances. More to follow on this in Chapter 18.]

Rationale for Analyses.  We considered several alternatives when we analyzed the data from this study. Our first choice was to analyze the data by using three multiway mixed analyses of covariance (ANCOVAs) with kindergarten group and gender as the between-groups factors and the repeated measurements over time as the within-subjects factor. However, we rejected that analytic technique for two reasons. First and foremost, all three analyses evidenced serious violations of sphericity. Second, this analytic design requires that all cases have all measures on the dependent variable (the within-subjects factor). That requirement reduced our sample size by as much as 75% in some of the analyses when compared with our final choice of separate univariate, between-groups ANCOVAs.

[Annotation: This resulted in a reduction of the sample size by up to 75%.]

Our second choice was to analyze the data with three 2 × 2 (Kindergarten Group [full day vs. half day] × Gender) between-groups multivariate analyses of covariance (MANCOVAs) with the multiple dependent variable measures included simultaneously in the analysis. Field (2000) recommended switching from repeated-measure ANCOVAs to MANCOVAs when sample sizes are relatively high and violations of sphericity are fairly severe, as in our situation. Unfortunately, there also are difficulties when researchers use MANCOVAs. First, the analysis and interpretation of MANCOVA are extraordinarily complex and cumbersome.
More important, a number of statisticians (e.g., Tabachnick & Fidell, 1996) have counseled against using MANCOVA when strong intercorrelations exist between the dependent measures. Finally, our data violated the homogeneity of covariance matrices, which is an additional assumption of MANCOVA.

Our final choice was to conduct separate univariate ANCOVAs with appropriate Bonferroni adjustments to prevent inflation in the Type I error rate. For the OMR, we began our analyses with five 2 × 2 (Kindergarten Group × Gender) ANCOVAs, with initial reading ability as the covariate. We measured OMR at the end of kindergarten and at the beginning and end of first and second grades. The alpha level was set at .01 for each of the five analyses.

For the reading-level analyses, we conducted three 2 × 2 ANCOVAs because reading achievement tests were given in the spring of the second, third, and fourth grades. The alpha level was set at .017 for each of the analyses. For the mathematics levels analyses, we conducted three 2 × 2 ANCOVAs with the mathematics achievement tests given in the spring of the second, third, and fourth grades. The alpha level was also set at .017 for those analyses.

Assessing Assumptions.  We began our univariate ANCOVA analyses by testing for univariate and multivariate normality. Univariate normality existed in all 11 analyses, at least with respect to skewness. There were two instances in which univariate kurtosis exceeded acceptable boundaries for normality. Although there were a limited number of instances in which multivariate normality was mildly violated, visual inspection of the histograms and Q-Q plots suggested no substantive deviations from normality, except for the OMR test given at the end of kindergarten. Hence, we eliminated the test from our final set of analyses. Given the large sample sizes and the relative robustness of ANCOVA against violations of normality, we proceeded with the remaining 10 ANCOVAs.
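
The per-analysis alpha levels reported above (.01 for five analyses, .017 for three) are what a Bonferroni adjustment yields when a familywise alpha is divided by the number of analyses in the family. The article does not state the familywise alpha; the short sketch below assumes the conventional value of .05, and the function name is hypothetical.

```python
def bonferroni_alpha(familywise_alpha, n_tests):
    """Per-test alpha that keeps the familywise Type I error rate at or below the target."""
    return familywise_alpha / n_tests


FAMILYWISE_ALPHA = 0.05  # assumed; not stated in the article

# Five OMR analyses, three reading-level analyses, three mathematics-level analyses.
for label, n_tests in [("OMR", 5), ("reading levels", 3), ("mathematics levels", 3)]:
    per_test = bonferroni_alpha(FAMILYWISE_ALPHA, n_tests)
    print(f"{label}: alpha per analysis = {per_test:.3f}")
```

Under that assumption the results round to .010 and .017, which agrees with the alpha levels the authors report.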

We next assessed the assumption of homogeneity of regression slope, which, if violated, generates much more difficulty in the interpretation of the results of the analyses. None of the five OMR analyses and none of the three mathematics levels analyses violated that assumption. However, the third-grade reading-level analysis violated the assumption. Hence, we removed that analysis from the study, leaving only two analyses of reading achievement, at the second- and fourth-grade levels.

Finally, we assessed the correlation between the covariate and the dependent variable. We began by assuming that the participant’s age (measured in months) might be correlated significantly with the dependent variables and should be included in our analyses as a covariate. Tables 1 and 2 show the results of this analysis and that none of the correlations were statistically significant. Hence, we did not include age in the analyses as a covariate.

Table 1
Correlations of Dependent Variables with Initial Reading Ability and Age

                           Initial reading ability         Age
Dependent variable             r         n              r        n
OMR end kindergarten         .47**      403            .05      453
OMR beginning Grade 1        .50**      265            .03      301
OMR end Grade 1              .40**      198            .01      231
OMR beginning Grade 2        .39**       97            .03      105
OMR end Grade 2              .41**      182            .07      208
Level 2                      .40**      234            .03      266
Level 4                      .30**      103           −.10      127

Note: OMR = One-Minute Reading.
** p < .01.

Table 2
Correlations of Dependent Variables with Initial Mathematics Ability and Age

                           Initial mathematics ability     Age
Dependent variable             r         n              r        n
Level 2                      .35**      194            .03      264
Level 3                      .30**      180           −.02      189
Level 4                      .22*       120           −.09      127

* p < .05.
** p < .01.

Initial reading and mathematics abilities were the other covariates included in the analyses. Our a priori assumption was that those covariates had to correlate significantly with their appropriate dependent variable to be included in the analyses. As Tables 1 and 2 show, all of the final correlations were statistically significant, confirming the propriety of their use as covariates.

Findings  Tables 3, 4, and 5 show the source tables for the OMR, the reading levels, and the mathematics levels, respectively. In each table, the kindergarten grouping independent variable is included in the table, regardless of whether it achieved statistical significance. Gender, on the other hand, is included in the source tables only in those analyses in which it achieved statistical significance (second-grade mathematics achievement).

Table 3 shows that kindergarten class was statistically significant at the end of kindergarten, F (1, 400) = 35.08, p < .001, at the beginning of first grade, F (1, 261) = 11.43, p < .01, and at the end of first grade, F (1, 194) = 6.26, p < .05. The covariate, as expected, was strongly significant at all levels, and gender was not statistically significant at any level in the analyses. Significance levels and the estimates of effect size declined as the participants progressed in school within and across academic years.

Table 4 shows that the covariate was highly significant (as expected) but with no statistically significant effect for either kindergarten class or gender.
Table 5 shows a pattern similar to that in the two preceding tables, with (a) a statistically significant covariate, (b) absence of statistical significance for the kindergarten class, and (c) declining estimates of effect size as time in school increased. Gender was statistically significant at the second grade.

Table 6 shows the subsample sizes, means, standard deviations, and corrected effect sizes for each of the two kindergarten alternatives across all dependent measures. The only effect size estimate whose magnitude approaches Cohen’s (1998) standard for minimal practical significance (.25) is the first one reported in Table 6 (.44). That effect size indicates that FDK confers a small-to-moderate advantage on reading ability at the end of the kindergarten experience. At the beginning and end of first grade, that advantage is no longer practically significant, although it is still positive. Beginning in second grade, the advantage in reading and mathematics is neither practically significant nor positive for FDK students.

Table 3
Analysis of Covariance Results for OMR Fluency Tests as a Function of Kindergarten Class, Controlling for Initial Reading Ability

Variable and source           df          MS         F       p
OMR (end kindergarten)
  Kindergarten class           1   14,405.36     32.79   <.001
  Initial reading ability      1   59,031.95    134.37   <.001
  Error                      398      439.33
OMR (beginning Grade 1)
  Kindergarten class           1   10,339.87     10.76    .001
  Initial reading ability      1   96,556.42    100.43   <.001
  Error                      260      961.42
OMR (end Grade 1)
  Kindergarten class           1    5,261.69      5.73    .018
  Initial reading ability      1   39,604.41     43.15   <.001
  Error                      193      917.79
OMR (beginning Grade 2)
  Kindergarten class           1      185.25       .22    .64
  Initial reading ability      1   14,922.39     17.45   <.001
  Error                       92      855.23
OMR (end Grade 2)
  Kindergarten class           1      100.23       .14    .71
  Initial reading ability      1   25,530.89     35.52   <.001
  Error                      177      718.73

Note: OMR = One-Minute Reading.

Table 4
Analysis of Covariance Results for Reading Achievement Tests as a Function of Kindergarten Class, Controlling for Initial Reading Ability

Variable and source           df          MS         F       p
Level 2 reading
  Kindergarten class           1       43.82       .37    .55
  Initial reading ability      1    5,496.21     45.79   <.001
  Error                      228      120.02
Level 4 reading
  Kindergarten class           1       12.85       .10    .76
  Initial reading ability      1    1,265.53      9.50    .003
  Error                       98      133.22
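Each F value in Tables 3 and 4 is the ratio of the mean square for a source to the error mean square. The short Python sketch below rechecks three of the kindergarten-class rows in Table 3 under that standard definition. The helper name `f_ratio` is illustrative and does not come from the article, and small differences in the second decimal place are to be expected because the published mean squares are themselves rounded.

```python
def f_ratio(ms_effect: float, ms_error: float) -> float:
    """Return F = MS_effect / MS_error for one source in a source table."""
    if ms_error <= 0:
        raise ValueError("error mean square must be positive")
    return ms_effect / ms_error


# (label, MS for kindergarten class, MS error, F as printed in Table 3)
table3_rows = [
    ("OMR (end kindergarten)", 14405.36, 439.33, 32.79),
    ("OMR (beginning Grade 1)", 10339.87, 961.42, 10.76),
    ("OMR (end Grade 1)", 5261.69, 917.79, 5.73),
]

for label, ms_class, ms_error, reported in table3_rows:
    computed = f_ratio(ms_class, ms_error)
    print(f"{label}: computed F = {computed:.2f}, reported F = {reported:.2f}")
```

The recomputed ratios agree with the printed values to within about .01, which suggests the reconstructed rows are internally consistent.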

Table 5
Analysis of Covariance Results for Kindergarten and Mathematics Achievement Tests as a Function of Kindergarten Class, Controlling for Initial Mathematics Ability

Variable and source              df         MS        F       p
Level 2 mathematics
  Kindergarten class              1      22.53      .22    .64
  Gender                          1     707.76     6.87    .009
  Initial mathematics ability     1   3,485.92    33.82   <.001
  Error                         248     103.08
Level 3 mathematics
  Kindergarten class              1      29.74      .34    .56
  Initial mathematics ability     1   1,464.35    16.66   <.001
  Error                         175      87.89
Level 4 mathematics
  Kindergarten class              1      12.47      .11    .75
  Initial mathematics ability     1     756.59     6.37    .013
  Error                         115     118.79

Table 6
Descriptive Information for Statistically Significant Comparison for Full-Day Versus Half-Day Kindergarten, on All Dependent Variables

                                           Half-day                Full-day
Dependent variable                     N      Ma       SD      N      Ma       SD    ES (d)
OMR (end Kindergarten)               220    25.33    23.72   183    37.52    25.03    .50
OMR (beginning Grade 1)              156    44.62    36.57   109    57.61    36.47    .36
OMR (end of Grade 1)                 120    84.56    33.50    78    95.87    33.77    ns
OMR (beginning Grade 2)               65    62.31    29.96    32    65.54    34.65    ns
OMR (end of Grade 2)                 108    95.81    28.47    74    97.43    30.53    ns
Reading achievement (Grade 2)        137   195.95    12.05    96   196.86    11.77    ns
Reading achievement (Grade 4)         70   214.90    11.01    33   214.11    14.11    ns
Mathematics achievement (Grade 2)    151   199.71    11.08   102   199.09    10.86    ns
Mathematics achievement (Grade 3)    109   212.60     9.78    71   213.45    10.09    ns
Mathematics achievement (Grade 4)     82   218.94    11.14    38   219.64    11.09    ns

Note: OMR = One-Minute Reading.
a Covariate adjusted means.
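The two numeric effect sizes in Table 6 can be approximated from the group sizes, adjusted means, and standard deviations in the same rows. The sketch below uses the pooled-standard-deviation form of Cohen's d. That choice is an assumption: the article describes its effect sizes as "corrected" without giving a formula, so the function `cohens_d` is an illustration of one common computation and not the authors' stated method.

```python
import math


def cohens_d(n1, m1, sd1, n2, m2, sd2):
    """Cohen's d for two independent groups, using the pooled SD.

    Group 1 is the comparison group and group 2 the treatment group,
    so a positive d favors group 2.
    """
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (m2 - m1) / math.sqrt(pooled_var)


# Half-day values first, full-day values second, as read from Table 6.
d_end_k = cohens_d(220, 25.33, 23.72, 183, 37.52, 25.03)
d_begin_g1 = cohens_d(156, 44.62, 36.57, 109, 57.61, 36.47)

print(f"OMR (end Kindergarten): d = {d_end_k:.2f}")       # prints 0.50
print(f"OMR (beginning Grade 1): d = {d_begin_g1:.2f}")   # prints 0.36
```

Both values round to the entries printed in the ES (d) column, so whatever correction the authors applied appears to have little effect at these sample sizes.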

Follow-Up Interviews

[Margin note: The researchers added follow-up interviews to their research design. Note that the use of interviews is not part of a causal–comparative design; however, their use adds another lens to the findings of the study. In Chapter 15 we will discuss mixed methods research and how the use of an explanatory mixed methods design can contribute to our understanding of the research results.]

(35) As a follow-up to our analyses, we interviewed the four kindergarten teachers in January 2004, for their views on (a) the kindergarten curriculum, (b) their perceived differences between FDK and HDK programming, and (c) their explanations for the findings that we observed between FDK and HDK students in reading and mathematics achievement. The teachers were women who had taught for 14, 9, 8, and 6 years, respectively. They had previously taught FDK and HDK kindergarten and had been teaching kindergarten at the elementary school research site for 10, 9, 6, and 4 years, respectively. Two of the teachers were still teaching kindergarten; the other two teachers were teaching second and sixth grades, respectively. One teacher admitted that she had a “half-day bias,” whereas another teacher was a “proponent of full-day kindergarten.”

(36) All interviews consisted of open-ended questions and lasted between 30 min and 1 hr. The interviews were tape-recorded and transcribed and returned to the teachers for review. After approval of the transcripts, we coded the interviews by using constant comparative analytic techniques (Strauss & Corbin, 1994), which involved inductively identifying themes and developing written summaries.

(37) When questioned about the differences between FDK and HDK, all teachers stated that they would have expected FDK students, in general, to perform better academically than HDK students at the end of kindergarten. They attributed the difference to the increased time that FDK students spent reviewing and practicing material.
However, consistent with our findings, all teachers were equally doubtful that the differences would last. They believed that the academic disparity between FDK and HDK students would disappear during first through third grades. For example, one teacher stated that “kids, by third grade, catch up or things kind of level out so I don’t think there’d be much of a difference.”

(38) Although teachers agreed that the FDK advantage probably did not extend past early elementary education, their explanations for the ephemeral differences varied and fell into three general categories: (a) effects of differentiated instruction, (b) individual student development, and (c) individual student attributes.

(39) Differentiated Instruction. All teachers, in various ways, suggested that differentiated instruction would need to occur in every grade subsequent to kindergarten to, at least partially, maintain higher achievement levels evidenced by FDK students. When asked to define differentiated instruction, one teacher said:

    What it means to me is that I need to meet that child where they are. I mean I need to have appropriate material and appropriate instruction for that child…. I need to make sure that they’re getting what they need where they are…. But, I think you need to set the bar pretty high and expect them to reach that; on the other hand, I think you need to not set it so high that you’re going to frustrate the kids that aren’t ready.

(40) However, the kindergarten teachers recognized the challenges of using differentiated instruction and were careful not to place blame on first- through third-grade teachers. One teacher stated, “I’m not saying that not everyone does differentiated instruction. But I think that you have to be careful you don’t do too much whole group teaching to a group of kids that’s way past where they’re at.” Although all of the teachers agreed that differentiated instruction would be necessary to maintain differences after kindergarten, not all of them believed that this technique would be singularly sufficient. Some teachers believed strongly that the “leveling out” was predominantly a result of individual student development or student attributes, or both, rather than teaching methods.

(41) Student Development. Two teachers felt that the leveling out of academic differences between FDK and HDK students by second grade resulted from natural developmental growth occurring after kindergarten. They explained:

    You have kids that cannot hear a sound. They cannot hear, especially the vowel sounds. They are not ready to hear those. They are not mature enough to hear those sounds. You could go over that eight billion times and they just aren’t ready to hear those sounds. They go into first grade and they’ve grown up over the summer and…it clicks with them. And they might have been low in my class, but they get to first grade and they’re middle kids. They’ve kind of reached where their potential is.

    I mean, there’s big developmental gap in K, 1, 2 and by third grade the kids that look [sic] behind, if they’re going to be average or normal, they catch-up by then…. Like some kids in second grade, they still struggle with handwriting and reversal and by now it’s a red flag if they’re still doing that, developmentally everything should be fitting together in their little bodies and minds and they should be having good smooth handwriting and writing in the right direction. And if that’s not happening then that’s a red flag. And by third grade…if they’re not forming like an average student then there’s something else that needs to be looked at. So it’s a development thing and it’s just when kids are ready.
(42) Yet, both of those teachers acknowledged that HDK students do have to work to catch up to FDK students, citing (a) less time spent on material, (b) differences in FDK and HDK teachers’ instructional philosophies, and (c) lack of familiarity with all-day school as disadvantages that HDK students must overcome in first grade to equal their FDK counterparts.

(43) Student Attributes. A final explanation that teachers offered for the leveling out of differences suggested that individual student attributes accounted for student differences in subsequent grades. Three teachers believed that, no matter what kindergarten program students attended, their inherent level of academic ability or level of parent involvement, or both, were most important in eventually determining how individual students would compare with other students. For example,

    I think they get to where their ability is, regardless of…. You can give them a good start and I think that can make a difference, but a high kid is going to be high whether they were in full or half. And those gray kids, you can give them a boost and they can be higher than maybe they would have been in half-day, you know you can give them a better start.

(44) Thus, these three teachers believed that student attributes, such as inherent ability or degree of parent involvement in their schooling, would ultimately play a more significant role in how students would eventually compare with one another in second and third grades, regardless of whether they attended FDK or HDK programs.

Discussion

(45) What can be determined about the effects of FDK versus HDK kindergarten as a result of our analyses? Children who attend FDK can and do learn more through that experience than do their HDK counterparts. Nonetheless, the additional learning appears to decline rapidly, so much so that by the start of first grade, the benefits of FDK have diminished to a level that has little practical value. That effect was consistent across two measures of reading and one measure of mathematics. The effect also was consistent across gender, given that there was a gender by kindergarten-group interaction in only one of the analyses.

(46) Our findings are consistent with past meta-analytic research (Fusaro, 1997) and high-quality empirical studies (e.g., Hough & Bryde, 1996) in that FDK confers initial benefits on academic achievement but that these benefits diminish relatively rapidly (OSLOEO, 1997). We are unclear why the rapid decline occurs, but we offer this insight from several school administrators and teachers with whom we interacted in our discussions of these data:

    Teachers in the first few grades are so concerned with students who enter their classes [with] nonexistent reading and math skills that they spend the majority of their time bringing these students up to minimal math and reading criteria at the expense of working equally hard with students whose reading and math achievement are above average. Hence, the high-achieving students’ gains at the end of kindergarten gradually erode over the next few years with lack of attention.

(47) We concur with Fusaro (1997) that districts must make their choices involving FDK with a full understanding of what the benefits may be for academic achievement and nonachievement outcomes. Our findings of initial gains place the onus of maintaining those gains on schools and teachers through their own internal policies, procedures, and will to sustain those gains.

(48) Our study, of course, is not without limitations. We studied only one school, albeit over a relatively long period of time, with well-established measures and with reasonably well-equated groups.
The greatest reservation we have about the generalizability of our findings clearly focuses on the predicted decline in long-term benefits of FDK for schools, making it a priority to assure that teachers provide differentiated instruction to all students to advance each one as far as possible during the academic year rather than to move all students to a common set of expected learning at the end of the academic year. We recognize that school policies, procedures, and culture play important roles in the variability in student achievement, regardless of the skill levels of students entering first grade. Although our results will likely generalize to a wide variety of elementary school children, they also will likely generalize to those children who attend schools whose instructional policies and practices in the early grades are similar to the school in this study.

Note

The authors appreciate the thoughtful participation of Suzie Gunstream and the other elementary teachers whose invaluable practitioner insights helped us make sense of the findings.

References

Aten, K. K., Foster, A., & Cobb, B. (1996). Lopez full-day kindergarten study. Fort Collins, CO: Research and Development Center for the Advancement of Student Learning.
Bramlett, R. K., Rowell, R. K., & Madenberg, K. (2000). Predicting first grade achievement from kindergarten screening measures: A comparison of child and family predictors. Research in Schools, 7, 1–9.
Clark, P., & Kirk, E. (2000). All-day kindergarten. Childhood Education, 76, 228–231.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cooper, T., Foster, A., & Cobb, B. (1998a). Half- or full-day kindergarten: Choices for parents in Poudre School District. Fort Collins, CO: Research and Development Center for the Advancement of Student Learning.
Cooper, T., Foster, A., & Cobb, B. (1998b). Full- and half-day kindergarten: A study of six elementary schools. Fort Collins, CO: Research and Development Center for the Advancement of Student Learning.
Cryan, J. R., Sheehan, R., Wiechel, J., & Brandy-Hedden, I. G. (1992). Success outcomes of all-day kindergarten: More positive behavior and increased achievement in the years after. Early Childhood Research Quarterly, 7, 187–203.
Daly, E. J., III, Wright, J. A., Kelly, S. Q., & Martens, B. K. (1997). Measures of early academic reading skills: Reliability and validity with a first grade sample. School Psychology Quarterly, 12, 268–280.
Ehri, L. C. (1998). Grapheme-phoneme knowledge is essential for learning to read words in English. In J. L. Metsala & L. C. Ehri (Eds.), Word recognition in beginning reading (pp. 3–40). Mahwah, NJ: Erlbaum.
Elicker, J., & Mathur, S. (1997). What do they do all day? Comprehensive evaluation of a full-day kindergarten. Early Childhood Research Quarterly, 12, 459–480.
Evansville-Vanderburgh School Corporation. (1998). A longitudinal study of the consequences of full-day kindergarten: Kindergarten through grade eight. Evansville, IN: Author.
Field, A. (2000). Discovering statistics using SPSS for Windows. London: Sage.
Fuchs, L. S., Fuchs, D., Hosp, M. K., & Jenkins, J. R. (2001). Oral fluency as an indicator of reading competence: A theoretical, empirical, and historical analysis. Scientific Studies of Reading, 5, 239–256.
Fuchs, L. S., Fuchs, D., & Maxwell, L. (1998). The validity of informal measures of reading comprehension. Remedial and Special Education, 9, 20–28.
Fusaro, J. A. (1997). The effect of full-day kindergarten on student achievement: A meta-analysis. Child Study Journal, 27, 269–277.
Gullo, D. F., Bersani, C. U., Clements, D. H., & Bayless, K. M. (1986). A comparative study of “all-day,” “alternate-day,” and “half-day” kindergarten schedules: Effects on achievement and classroom social behaviors. Journal of Research in Childhood Education, 1, 87–94.
Gullo, D. F., & Clements, D. H. (1984). The effects of kindergarten schedule on achievement, classroom behavior, and attendance. The Journal of Educational Research, 78, 51–56.
Gullo, D. F., & Maxwell, C. B. (1997). The effects of different models of all-day kindergarten on children’s development competence. Early Child Development and Care, 139, 119–128.
Hintze, J. M., Shapiro, E. S., Conte, K. L., & Basile, I. A. (1997). Oral reading fluency and authentic reading material: Criterion validity of the technical features of CBM survey-level assessment. School Psychology Review, 26, 535–553.
Holmes, C. T., & McConnell, B. M. (1990, April). Full-day versus half-day kindergarten: An experimental study. Paper presented at the annual meeting of the American Educational Research Association, Boston, MA.
Hough, D., & Bryde, S. (1996, April). The effects of full-day kindergarten on student achievement and affect. Paper presented at the annual meeting of the American Educational Research Association, New York.
Jenkins, J. R., & Jewell, M. (1993). Examining the validity of two measures for formative teaching: Reading aloud and maze. Exceptional Children, 59, 421–432.
Karweit, N. L. (1992). The kindergarten experience. Educational Leadership, 49(6), 82–86.
Karweit, N. L. (1994). Issues in kindergarten organization and curriculum. In R. E. Slavin, N. L. Karweit, & B. A. Wasik (Eds.), Preventing early school failure: Research, policy and practice. Needham Heights, MA: Allyn & Bacon.
Kirby, J. R., & Parrila, R. K. (1999). Theory-based prediction of early reading. The Alberta Journal of Educational Research, 45, 428–447.
Koopmans, M. (1991). A study of the longitudinal effects of all-day kindergarten attendance on achievement. Newark, NJ: Board of Education, Office of Research, Evaluation, and Testing.
Kranzler, J. H., Brownell, M. T., & Miller, M. D. (1998). The construct validity of curriculum-based measurement of reading: An empirical test of a plausible rival hypothesis. Journal of School Psychology, 36, 399–415.
Kurdek, L. A., & Sinclair, R. J. (2001). Predicting reading and mathematics achievement in fourth-grade children from kindergarten readiness scores. Journal of Educational Psychology, 93, 451–455.
Lore, R. (1992). Language development component: Full-day kindergarten program 1990–1991 final evaluation report. Columbus, OH: Columbus Public Schools, Department of Program Evaluation.
McBride-Chang, C. (1999). The ABC’s of the ABCs: The development of letter-name and letter-sound knowledge. Merrill-Palmer Quarterly, 45, 285–308.
McConnell, B. B., & Tesch, S. (1986). Effectiveness of kindergarten scheduling. Educational Leadership, 44(3), 48–51.
National Center for Education Statistics. (2000). America’s kindergartens. Washington, DC: Author.
Nelson, R. F. (2000). Which is the best kindergarten? Principal, 79(5), 38–41.
Northwest Evaluation Association. (2003). Reliability estimates and validity evidence for achievement level tests and measures of academic progress. Retrieved March 30, 2003, from NorthingStudy.htm
Nunnally, J. (1996). The impact of half-day versus full-day kindergarten programs on student outcomes: A pilot project. New Albany, IN: Elementary Education Act Title I. (ERIC Document Reproduction Service No. ED396857)
Ohio State Legislative Office of Education Oversight. (1997). An overview of full-day kindergarten. Columbus, OH: Author.
Olsen, D., & Zigler, E. (1989). An assessment of the all-day kindergarten movement. Early Childhood Research Quarterly, 4, 167–186.
Parker, R., Hasbrouck, J. E., & Tindal, G. (1992). Greater validity for oral reading fluency: Can miscues help? The Journal of Special Education, 25, 492–503.
Peck, J. T., McCaig, G., & Sapp, M. E. (1988). Kindergarten policies: What is best for children? Washington, DC: National Association for the Education of Young Children.
Puleo, V. T. (1988). A review and critique of research on full-day kindergarten. The Elementary School Journal, 88, 427–439.
Stage, S. A., Sheppard, J., Davidson, M. M., & Browning, M. M. (2001). Prediction of first graders’ growth in oral reading fluency using kindergarten letter fluency. The Journal of School Psychology, 39, 225–237.
Strauss, A., & Corbin, J. (1994). Grounded theory methodology: An overview. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research (pp. 273–285). Thousand Oaks, CA: Sage.
Tabachnick, B. G., & Fidell, L. S. (1996). Using multivariate statistics (3rd ed.). New York: Harper & Row.
Trieman, R., Tincoff, R., Rodriguez, K., Mouzaki, A., & Francis, D. J. (1998). The foundations of literacy: Learning the sounds of letters. Childhood Development, 69, 1524–1540.
Wang, Y. L., & Johnstone, W. G. (1999). Evaluation of a full-day kindergarten program. ERS Spectrum, 17(2), 27–32.

Note: “Comparing Longitudinal Academic Achievement of Full-Day and Half-Day Kindergarten Students,” by J. R. Wolgemuth, R. B. Cobb, & M. A. Winokur, The Journal of Educational Research 99(5), pp. 260–269, 2006. Reprinted with permission of the Helen Dwight Reid Educational Foundation. Published by Heldref Publications, 1319 Eighteenth St., NW, Washington, DC 20036–1802, Copyright © 2006.

Chapter Ten
Experimental Research

[Chapter-opening image: The Evil of Frankenstein, 1964]

“When well conducted, experimental studies produce the soundest evidence concerning cause–effect relations.” (p. 286)

Learning Outcomes

After reading Chapter 10, you should be able to do the following:

 1. Briefly define and state the purpose of experimental research.
 2. Briefly explain the threats to validity in experimental research.
 3. Define and provide examples of group experimental designs.

These chapter learning outcomes form the basis for the following task, which requires you to develop the method section of a research report for an experimental study.

Task 6D

For a quantitative study, you have created research plan components (Tasks 2 and 3A), described a sample (Task 4A), and considered appropriate measuring instruments (Task 5). If your study involves experimental research, now develop the method section of a research report. Include a description of participants, data collection methods, and the research design (see Performance Criteria at the end of Chapter 11, p. 348).

SUMMARY: EXPERIMENTAL RESEARCH

Definition
In experimental research, the researcher manipulates at least one independent variable, controls other relevant variables, and observes the effect on one or more dependent variables.

Design(s)
An experiment typically involves a comparison of two groups (although some experimental studies have only one group or even three or more groups). The experimental comparison is usually one of three types: (1) comparison of two different approaches (A versus B), (2) comparison of a new approach and the existing approach (A versus no A), and (3) comparison of different amounts of a single approach (a little of A versus a lot of A). Group experimental designs include: pre-experimental designs (the one-group posttest-only design, the one-group pretest–posttest design, and the posttest-only design with nonequivalent groups), true experimental designs (the pretest–posttest control group design, the posttest-only control group design, and the Solomon four-group design), quasi-experimental designs (the nonequivalent control group design, the time-series design, the counterbalanced designs), and factorial designs.

Types of appropriate research questions
In experimental educational research, the types of research questions are often focused on independent variables such as method of instruction, type of reinforcement, arrangement of learning environment, type of learning materials, and length of treatment.

Key characteristics
• The manipulation of an independent variable is the primary characteristic that differentiates experimental research from other types of research.
• An experimental study is guided by at least one hypothesis that states an expected causal relation between two variables.
• In an experiment, the group that receives the new treatment is called the experimental group, and the group that receives a different treatment or is treated as usual is called the control group.
• The use of randomly formed treatment groups is a unique characteristic of experimental research.

Steps in the process
 1. Select and define a problem.
 2. Select participants and measuring instruments.
 3. Prepare a research plan.
 4. Execute procedures.
 5. Analyze the data.
 6. Formulate conclusions.

Potential challenges
• Experimental studies in education often suffer from two problems: a lack of sufficient exposure to treatments and failure to make the treatments substantially different from each other.
• An experiment is valid if results obtained are due only to the manipulated independent variable and if they are generalizable to individuals or contexts beyond the experimental setting. These two criteria are referred to, respectively, as the internal validity and external validity of an experiment.
• Threats to internal validity include history, maturation, testing, instrumentation, statistical regression, differential selection of participants, mortality, selection–maturation interactions, and other interactive effects.
• Threats to external validity include pretest–treatment interaction, multiple-treatment interference, selection–treatment interaction, specificity of variables, treatment diffusion, experimenter effects, and reactive arrangements.

Example
What are the differential effects of two problem-solving instructional approaches (schema-based instruction and general strategy instruction) on the mathematical word-problem-solving performance of 22 middle school students who had learning disabilities or were at risk for mathematics failure?

Experimental Research: Definition and Purpose

Experimental research is the only type of research that can test hypotheses to establish cause–effect relations. It represents the strongest chain of reasoning about the links between variables. In experimental research, the researcher manipulates at least one independent variable, controls other relevant variables, and observes the effect on one or more dependent variables. The researcher determines who gets what; that is, the researcher has control over the selection and assignment of groups to treatments. The manipulation of an independent variable is the primary characteristic that differentiates experimental research from other types of research. The independent variable, also called the treatment, causal, or experimental variable, is that treatment or characteristic believed to make a difference. In educational research, independent variables that are frequently manipulated include method of instruction, type of reinforcement, arrangement of learning environment, type of learning materials, and length of treatment. This list is by no means exhaustive. The dependent variable, also called the criterion, effect, or posttest variable, is the outcome of the study, the change or difference in groups that occurs as a result of the independent variable. It gets its name because it is dependent on the independent variable. The dependent variable may be measured by a test or some other quantitative measure (e.g., attendance, number of suspensions, time on task). The only restriction on the dependent variable is that it must represent a measurable outcome.

Experimental research is the most structured of all research types. When well conducted, experimental studies produce the soundest evidence concerning cause–effect relations. The results of experimental research permit prediction, but not the kind that is characteristic of correlational research. A correlational study predicts a particular score for a particular individual. Predictions based on experimental findings are more global and often take the form, “If you use Approach X, you will probably get different results than if you use Approach Y.” Of course, it is unusual for a single experimental study to produce broad generalization of results because any single study is limited in context and participants. However, replications of a study involving different contexts and participants often produce cause–effect results that can be generalized widely.

The Experimental Process

The steps in an experimental study are basically the same as in other types of research: selecting and defining a problem, selecting participants and measuring instruments, preparing a research plan, executing procedures, analyzing the data, and formulating conclusions. An experimental study is guided by at least one hypothesis that states an expected causal relation between two variables. The experiment is conducted to test the experimental hypothesis. In addition, in an experimental study, the researcher is in on the action from the very beginning, selecting the groups, deciding how to allocate treatment to the groups, controlling extraneous variables, and measuring the effect of the treatment at the end of the study.

It is important to note that the experimental researcher controls both the selection and the assignment of the research participants. That is, the researcher randomly selects participants from a single, well-defined population and then randomly assigns these participants to the different treatment conditions. This ability to select and assign participants randomly to treatments makes experimental research unique: the random assignment of participants to treatments, also called manipulation of the treatments, is the feature that distinguishes it from causal–comparative research. Experimental research has both random selection and random assignment, whereas causal–comparative research has only random selection, not assignment, because random assignment to a treatment from a single population is not possible in causal–comparative studies. Rather, participants in causal–comparative studies are obtained from different, already existing populations.

An experiment typically involves a comparison of two groups (although some experimental studies have only one group or even three or more groups). The experimental comparison is usually one of three types: (1) comparison of two different approaches (A versus B), (2) comparison of a new approach and the existing approach (A versus no A), and (3) comparison of different amounts of a single approach (a little of A versus a lot of A). An example of an A versus B comparison is a study that compares the effects of a computer-based approach to teaching first-grade reading to a teacher-based approach. An example of an A versus no A comparison is a study that compares a new handwriting method to the classroom teacher’s existing approach. An example of a little of A versus a lot of A comparison is a study that compares the effect of 20 minutes of daily science instruction on fifth graders’ attitudes toward science to the effect of 40 minutes of daily science instruction. Experimental designs are sometimes quite complex and may involve simultaneous manipulation of several independent variables. At this stage of the game, however, we recommend that you stick to just one!

In an experiment, the group that receives the new treatment is called (not surprisingly) the experimental group, and the group that receives a different treatment or is treated as usual is called the control group. A common misconception is that a control group always receives no treatment, but a group with no treatment would rarely provide a fair comparison. For example, if the independent variable were type of reading instruction, the experimental group may be instructed with a new method, and the control group may continue instruction with the method currently used. The control group would still receive reading instruction; members would not sit in a closet while the study was conducted. If they did, the study would be a comparison of the new method with no reading instruction at all. Any method of instruction is bound to be more effective than no instruction. An alternative to labeling the groups as control and experimental is to describe the treatments as comparison groups, treatment groups, or groups A and B.

The groups that are to receive the different treatments should be equated on all variables that may influence performance on the dependent variable. For example, in the previous example, initial reading readiness should be very similar in each treatment group at the start of the study. The researcher must make every effort to ensure that the two groups are similar on all variables except the independent variable. The main way that groups are equated is through simple random or stratified random sampling.
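The two-step procedure described in this section, random selection from a single population followed by random assignment to treatment conditions, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a procedure taken from the text: the function name `select_and_assign`, the student identifiers, and the condition labels are all hypothetical, and simple (not stratified) random sampling is used.

```python
import random


def select_and_assign(population, sample_size, conditions, seed=None):
    """Randomly select participants, then randomly assign them to conditions.

    population  -- list of participant identifiers (one well-defined population)
    sample_size -- number of participants to select
    conditions  -- list of treatment condition labels
    seed        -- optional seed so the draw can be reproduced
    """
    rng = random.Random(seed)
    # Step 1: random selection (simple random sample without replacement).
    sample = rng.sample(population, sample_size)
    # Step 2: random assignment. Shuffle, then deal participants out in turn
    # so the groups end up as equal in size as possible.
    rng.shuffle(sample)
    groups = {condition: [] for condition in conditions}
    for index, participant in enumerate(sample):
        groups[conditions[index % len(conditions)]].append(participant)
    return groups


# Hypothetical population of 200 first graders, identified only by code.
population = [f"S{i:03d}" for i in range(1, 201)]
groups = select_and_assign(population, 40, ["new method", "current method"], seed=1)
for condition, members in groups.items():
    print(condition, len(members))
```

Because chance alone decides who lands in which group, preexisting differences such as reading readiness are expected to balance out on average, which is the sense in which randomization equates the groups.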

After the groups have been exposed to the treatment for some period, the researcher collects data on the dependent variable from the groups and tests for a significant difference in performance. In other words, using statistical analysis, the researcher determines whether the treatment made a real difference. For example, suppose that, at the end of an experimental study evaluating reading method, one group had an average score of 29 on a measure of reading comprehension and the other group had an average score of 27. Clearly the groups are different. But is a 2-point difference a meaningful difference, or is it just a chance difference produced by measurement error? Statistical analysis allows the researcher to answer this question with confidence.

Experimental studies in education often suffer from two problems: a lack of sufficient exposure to treatments and failure to make the treatments substantially different from each other. Regarding the first problem, no matter how effective a treatment is, it is not likely to be effective if students are exposed to it for only a brief period. To test a hypothesis concerning the effectiveness of a treatment adequately, an experimental group would need to be exposed to it long enough that the treatment has a chance to work (i.e., produce a measurable effect). Regarding the second problem (i.e., difference in treatments), it is important to operationalize the variables so that the difference between groups is clear. For example, in a study comparing team teaching and traditional lecture teaching, team teaching must be operationalized in a manner that clearly differentiates it from the traditional method. If team teaching simply meant two teachers taking turns lecturing in the traditional way, it would not be very different from so-called traditional teaching, and the researcher would be very unlikely to find a meaningful difference between the two study treatments.

Manipulation and Control

As noted several times previously, direct manipulation by the researcher of at least one independent variable is the characteristic that differentiates experimental research from other types of research. Manipulation of an independent variable is often a difficult concept to grasp. Quite simply, it means that the researcher selects the treatments and decides which group will get which treatment. For example, if the independent variable in a study were number of annual teacher reviews, the researcher may decide to form three groups, representing three levels of the independent variable: one group receiving no review, a second group receiving one review, and a third group receiving two reviews. Having selected research participants from a single, well-defined population (e.g., teachers at a large elementary school), the researcher would randomly assign participants to treatments. Independent variables that are manipulated by the experimenter are also known as active variables.

Control refers to the researcher’s efforts to remove the influence of any variable, other than the independent variable, that may affect performance on the dependent variable. In other words, in an experimental design, the groups should differ only on the independent variable. For example, suppose a researcher conducted a study to test whether student tutors are more effective than parent tutors in teaching first graders to read. In this study, suppose the student tutors were older children from higher grade levels, and the parent tutors were members of the PTA. Suppose also that student tutors helped each member of their group for 1 hour per school day for a month, whereas the parent tutors helped each member of their group for 2 hours per week for a month. Finally, suppose the results of the study indicate that the student tutors produced higher reading scores than the parent tutors. Given this study design, concluding that student tutors are more effective than parent tutors would certainly not be fair. Participants with the student tutors received 2½ times as much help as that provided to the parents’ group (i.e., 5 hours per week versus 2 hours per week). Because this researcher did not control the time spent in tutoring, he or she has several possible conclusions: student tutors may in fact be more effective than parent tutors, longer periods of tutoring may be more effective than shorter periods regardless of type of tutor, or the combination of more time/student tutors may be more effective than the combination of less time/parent tutors. To make the comparison fair and interpretable, both students and parents should tutor for the same
amount of time; in other words, time of tutoring must be controlled.

A researcher must consider many factors when attempting to identify and control extraneous variables. Some variables may be relatively obvious; for example, the researcher in the preceding study should control for reading readiness and prior reading instruction in addition to time spent tutoring. Some variables may not be as obvious; for example, both student tutors and parent tutors should use similar reading texts and materials. Ultimately, two different kinds of variables need to be controlled: participant variables and environmental variables. A participant variable (such as reading readiness) is one on which participants in different groups in a study may differ; an environmental variable (such as learning materials) is a variable in the setting of the study that may cause unwanted differences between groups. A researcher should strive to ensure that the characteristics and experiences of the groups are as equal as possible on all important variables except the independent variable. If relevant variables can be controlled, group differences on the dependent variable can be attributed to the independent variable.

Control is not easy in an experiment, especially in educational studies, where human beings are involved. It certainly is a lot easier to control solids, liquids, and gases! Our task is not an impossible one, however, because we can concentrate on identifying and controlling only those variables that may really affect or interact with the dependent variable. For example, if two groups had significant differences in shoe size or height, such differences would probably not affect the results of most education studies. Techniques for controlling extraneous variables are presented later in this chapter.

Threats to Experimental Validity

As noted, any uncontrolled extraneous variables affecting performance on the dependent variable are threats to the validity of an experiment. An experiment is valid if results obtained are due only to the manipulated independent variable and if they are generalizable to individuals or contexts beyond the experimental setting. Validity can also be thought of as the “approximate truth of an inference.”1 Researchers focus on four types of experimental validity, which we describe in greater detail next.

Statistical conclusion validity refers to the appropriate use of statistics to infer whether the presumed independent and dependent variables co-vary in the experiment. Threats to statistical conclusion validity include low statistical power and violated assumptions of statistical tests. As Shadish et al.2 suggest, “An insufficiently powered experiment may incorrectly conclude that the relationship between treatment and outcome is not significant,” with resulting implications for Type I and Type II errors. Similarly, “violations of statistical test assumptions can lead to either overestimating or underestimating the size and significance of an effect.”

Internal validity refers to the degree to which observed differences on the dependent variable are a direct result of manipulation of the independent variable, not some other variable. In other words, an examination of internal validity focuses on threats or rival explanations that influence the outcomes of an experimental study but are not due to the independent variable. In the example of student tutors and parent tutors, a plausible threat or rival explanation for the research results is the difference in the amount of tutoring time. The degree to which experimental research results are attributable to the independent variable and not to another rival explanation is the degree to which the study is internally valid.

Construct validity is “the degree to which inferences are warranted from the observed persons, settings, and cause and effect operations included in a study to the constructs that these instances might represent,”3 that is, the validity of inferences about the variables (constructs) in a study.

1 Experimental and Quasi-Experimental Designs for Generalized Causal Inference by W. R. Shadish, T. D. Cook, and D. T. Campbell, 2002, Belmont, CA: Wadsworth Cengage Learning, p. 34.
2 Shadish, Cook, and Campbell, p. 45.
3 Shadish, Cook, and Campbell, p. 38.

External validity, also called ecological validity, is the degree to which study results are generalizable, or applicable, to groups and environments outside the experimental setting. In other words, an
examination of external validity focuses on threats or rival explanations that disallow the results of a study to be generalized to other settings or groups. A study conducted with groups of gifted ninth graders, for example, should produce results that are applicable to other groups of gifted ninth graders. If research results were never generalizable outside the experimental setting, then no one could profit from research. An experimental study can contribute to educational theory or practice only if its results and effects are replicable and generalize to other places and groups. If results cannot be replicated in other settings by other researchers, the study has low external, or ecological, validity.

Thus, all one has to do to conduct a valid experiment is to maximize statistical conclusion validity, internal validity, construct validity, and external validity, right? Wrong. Unfortunately, a “Catch-22” complicates the researcher’s experimental life. For example, to maximize internal validity, the researcher must exercise very rigid controls over participants and conditions, producing a laboratory-like environment. However, the more a research situation is narrowed and controlled, the less realistic and generalizable it becomes. A study can contribute little to educational practice if techniques that are effective in a highly controlled setting are not also effective in a less controlled classroom setting. On the other hand, the more natural the experimental setting becomes, the more difficult it is to control extraneous variables. It is very difficult, for example, to conduct a well-controlled study in a classroom. Thus, the researcher must strive for balance between control and realism. If a choice is involved, the researcher should err on the side of control rather than realism4 because a study that is not internally valid is worthless. A useful strategy to address this problem is to demonstrate an effect in a highly controlled environment (i.e., with maximum internal validity) and then redo the study in a more natural setting (i.e., to examine external validity). In the final analysis, however, the researcher must seek a compromise between a highly controlled and highly natural environment.

In the following pages, we describe the many threats to validity. Some extraneous variables are threats to internal validity, some are threats to external validity, and some may be threats to both. How potential threats are classified is not of great importance; what is important is that you be aware of their existence and how to control for them. As you read, you may begin to feel that there are just too many threats for a researcher to control. However, the task is not as formidable as it may appear at first because experimental designs can control many or most of the threats you are likely to encounter. Also, remember that each threat is a potential threat only; it may not be a problem in a particular study. Probably the most authoritative sources on experimental design and threats to experimental validity are the work that was done over 50 years ago by Donald Campbell, Julian Stanley, and Thomas Cook,5 and most recently, the work by William Shadish.6 And while the typology used by these authors has expanded over the years to include statistical conclusion validity and construct validity, our discussion here will focus on threats to the two primary types of experimental validity: internal validity and external validity.

4 This is a clear distinction between the emphases of quantitative and qualitative research.
5 Experimental and Quasi-Experimental Designs for Research, by D. T. Campbell and J. C. Stanley, 1971, Chicago, IL: Rand McNally; Quasi-Experimentation: Design and Analysis Issues for Field Settings, by T. D. Cook and D. T. Campbell, 1979, Chicago, IL: Rand McNally.
6 Experimental and Quasi-Experimental Designs for Generalized Causal Inference by W. R. Shadish, T. D. Cook, and D. T. Campbell, 2002, Belmont, CA: Wadsworth Cengage Learning.

Threats to Internal Validity

There are eight main threats to internal validity: history, maturation, testing, instrumentation, statistical regression, differential selection of participants, mortality, and selection–maturation interaction, which are summarized in Table 10.1. Before describing these threats to internal validity, however, we note the role of experimental research in overcoming these threats. You are not rendered helpless when faced with them. Quite the contrary, the use of random selection of participants, the researcher’s assignment of participants to treatments, and control of other variables are powerful approaches to overcoming the threats. As you read about the threats, note how random selection and assignment to treatments can control most threats.

History

When discussing threats to validity, history refers to any event occurring during a study that is not part of the experimental treatment but may affect
the dependent variable. The longer a study lasts, the more likely it is that history will be a threat. A bomb scare, an epidemic of measles, or global current events are examples of events that may produce a history effect. For example, suppose you conducted a series of in-service workshops designed to increase the morale of teacher participants. Between the time you conducted the workshops and the time you administered a posttest measure of morale, the news media announced that, due to state-level budget problems, funding to the local school district was to be significantly reduced, and promised pay raises for teachers would likely be postponed. Such an event could easily wipe out any effect the workshops may have had, and posttest morale scores may well be considerably lower than they otherwise may have been.

Table 10.1 • Threats to internal validity

Threat: Description
History: Unexpected events occur between the pre- and posttest, affecting the dependent variable.
Maturation: Changes occur in the participants, from growing older, wiser, more experienced, and so on, during the study.
Testing: Taking a pretest alters the result of the posttest.
Instrumentation: The measuring instrument is changed between pre- and posttesting, or a single measuring instrument is unreliable.
Statistical regression: Extremely high or extremely low scorers tend to regress to the mean on retesting.
Differential selection of participants: Participants in the experimental and control groups have different characteristics that affect the dependent variable differently.
Mortality: Different participants drop out of the study in different numbers, altering the composition of the treatment groups.
Selection–maturation interaction: The participants selected into treatment groups have different maturation rates. Selection interactions also occur with history and instrumentation.

Maturation

Maturation refers to physical, intellectual, and emotional changes that naturally occur within individuals over a period of time. In a research study, these changes may affect participants’ performance on a measure of the dependent variable. Especially in studies that last a long time, participants become older and perhaps more coordinated, less coordinated, unmotivated, anxious, or just plain bored. Maturation is more likely to be a problem in a study designed to test the effectiveness of a psychomotor training program on 3-year-olds than in a study designed to compare two methods of teaching algebra. Young participants typically undergo rapid biological changes, raising the question of whether changes on the dependent variable are due to the training program or to maturation.

Testing

Testing, also called pretest sensitization, refers to the threat of improved performance on a posttest that results from a pretest. In other words, simply taking a pretest may improve participants’ scores on a posttest, regardless of whether they received any treatment or instruction in between. Testing is more likely to be a threat when the time between the tests is short; a pretest taken in September is not likely to affect performance on a posttest taken in June. The testing threat to internal validity is most likely to occur in studies that measure factual information that can be recalled. For example, taking a pretest on solving algebraic equations is less likely to improve posttest performance than taking a pretest on multiplication facts would.

Instrumentation

The instrumentation threat refers to unreliability, or lack of consistency, in measuring instruments that may result in an invalid assessment of
performance. Instrumentation may threaten validity in several different ways. A problem may occur if the researcher uses two different tests, one for pretesting and one for posttesting, and the tests are not of equal difficulty. For example, if the posttest is more difficult than the pretest, improvement may be masked. Alternatively, if the posttest is less difficult than the pretest, it may indicate improvement that is not really present. If data are collected through observation, the observers may not be observing or evaluating behavior in the same way at the end of the study as at the beginning. In fact, if they are aware of the nature of the study, they may record only behavior that supports the researcher’s hypothesis. If data are collected through the use of a mechanical device, the device may be poorly calibrated, resulting in inaccurate measurement. Thus, the researcher must take care in selecting tests, observers, and mechanical devices to measure the dependent variable.

Statistical Regression

Statistical regression usually occurs in studies where participants are selected on the basis of their extremely high or extremely low scores. Statistical regression is the tendency of participants who score highest on a test (e.g., a pretest) to score lower on a second, similar test (e.g., a posttest) and of participants who score lowest on a pretest to score higher on a posttest. The tendency is for scores to regress, or move, toward a mean (i.e., average) or expected score. Thus, extremely high scorers regress (i.e., move lower) toward the mean, and extremely low scorers regress (i.e., move higher) toward the mean. For example, suppose a researcher wanted to test the effectiveness of a new method of instruction on the spelling ability of poor spellers. The researcher could administer a 100-item, four-alternative, multiple-choice spelling pretest, with questions reading, “Which of the following four words is spelled incorrectly?” The researcher could then select for the study the 30 students who scored lowest. However, perhaps none of the students knew any of the words and guessed on every question. With 100 items, and four choices for each item, a student would be expected to receive a score of 25 just by guessing. Some students, however, would receive scores much lower than 25 due simply to rotten guessing, and other students, equally by chance, would receive much higher scores than 25. If all these students took the test a second time, without any instruction intervening, their expected scores would still be 25. Thus, students who scored very low the first time would be expected to have a second score closer to 25, and students who scored very high the first time would also be expected to score closer to 25 the second time. Whenever participants are selected on the basis of their extremely high or extremely low performance, statistical regression is a viable threat to internal validity.

Differential Selection of Participants

Differential selection of participants is the selection of participants who have differences before the start of a study that may account at least partially for differences found in a posttest. The threat that the groups are different before the study begins is more likely when a researcher is comparing already-formed groups. Suppose, for example, you receive permission to invite two of Ms. Hynd’s English classes to participate in your study. You have no guarantee that the two classes are equivalent. If your luck is really bad, one class may be the honors English class and the other class may be the remedial English class; it would not be too surprising if the honors class did much better on the posttest! Already formed groups should be avoided if possible; when they are included in a study, the researcher should select groups that are as similar as possible and should administer a pretest to check for initial equivalence.

Mortality

First, let us make it perfectly clear that the mortality threat is usually not related to participants dying! Mortality, or attrition, refers to a reduction in the number of research participants; this reduction occurs over time as individuals drop out of a study. Mortality creates problems with validity particularly when different groups drop out for different reasons and with different frequency. A researcher can assess the mortality of groups by obtaining demographic information about the participant groups before the start of the study and then determining if the makeup of the groups has changed at the end of the study.

A change in the characteristics of the groups due to mortality can have a significant effect on the results of the study. For example, participants who
drop out of a study may be less motivated or uninterested in the study than those who remain. This type of attrition frequently occurs when the participants are volunteers or when a study compares a new treatment to an existing treatment. Participants rarely drop out of control groups or existing treatments because few or no additional demands are made on them. However, volunteers or participants using the new, experimental treatment may drop out because too much effort is required for participation. The experimental group that remains at the end of the study then represents a more motivated group than the control group. As another example of mortality, suppose Suzy Shiningstar (a high-IQ-and-all-that student) got the measles and dropped out of your control group. Before Suzy dropped out, she managed to infect her friends in the control group. Because birds of a feather often flock together, Suzy’s control-group friends may also be high-IQ-and-all-that students. The experimental group may end up looking pretty good when compared to the control group simply because many of the top students dropped out of the control group. The researcher cannot assume that participants drop out of a study in a random fashion and should, if possible, select a design that controls for mortality. For example, one way to reduce mortality is to provide some incentive to participants to remain in the study. Another approach is to identify the kinds of participants who drop out of the study and remove similar participants from the other groups in equal numbers.

Selection–Maturation Interaction and Other Interactive Effects

The effects of differential selection may also interact with the effects of maturation, history, or testing, with the resulting interaction threatening internal validity. In other words, if already formed groups are included in a study, one group may profit more (or less) from a treatment or have an initial advantage (or disadvantage) because of maturation, history, or testing factors. The most common of these interactive effects is selection–maturation interaction, which exists if participants selected into the treatment groups matured at different rates during the study. For example, suppose that you received permission to include two of Ms. Hynd’s English classes in your study; both classes are average and apparently equivalent on all relevant variables. Suppose, however, that for some reason Ms. Hynd had to miss one of her classes but not the other (maybe she had to have a root canal) and Ms. Alma Mater took over Ms. Hynd’s class. As luck would have it, Ms. Alma Mater proceeded to cover much of the material now included in your posttest (i.e., a problem with history). Unbeknownst to you, your experimental group would have a definite advantage, and this advantage, not the independent variable, may cause posttest differences in the dependent variable. A researcher must select a design that controls for potential problems such as this or make every effort to determine if they are operating in the study.

Threats to External Validity

Several major threats to external validity can limit generalization of experimental results to other populations. Building on the work of Campbell and Stanley, Bracht and Glass7 refined and expanded the discussion of threats to external validity and classified these threats into two categories. Threats affecting “generalizing to whom” (that is, threats affecting the groups to which research results can be generalized) make up threats to population validity. Threats affecting “generalizing to what” (that is, threats affecting the settings, conditions, variables, and contexts to which results can be generalized) make up threats to ecological validity. The following discussion incorporates the contributions of Bracht and Glass into Campbell and Stanley’s (1971) conceptualizations; the threats to external validity are summarized in Table 10.2.

7 “The External Validity of Experiments,” by G. H. Bracht and G. V. Glass, 1968, American Educational Research Journal, 5, pp. 437–474.

Pretest–Treatment Interaction

Pretest–treatment interaction occurs when participants respond or react differently to a treatment because they have been pretested. Pretesting may sensitize or alert participants to the nature of the treatment, potentially making the treatment effect different than it would have been had participants not been pretested. Campbell and Stanley illustrated this effect by pointing out the probable differences between two groups: participants who view the anti-prejudice film Gentleman’s Agreement after
taking a lengthy pretest dealing with anti-Semitism and participants who view the movie without a pretest. Individuals not pretested could conceivably enjoy the movie as a good love story, unaware that it deals with a social issue. Individuals who had taken the pretest, in contrast, may be much more likely to see a connection between the pretest and the message of the film. If pretesting affects participants’ responses on the dependent measure, the research results are generalizable only to other pretested groups; the results are not even generalizable to the population from which the sample was selected.

Table 10.2 • Threats to external validity

Threat: Description
Pretest–treatment interaction: The pretest sensitizes participants to aspects of the treatment and thus influences posttest scores.
Selection–treatment interaction: The nonrandom or volunteer selection of participants limits the generalizability of the study.
Multiple-treatment interference: When participants receive more than one treatment, the effect of prior treatment can affect or interact with later treatment, limiting generalizability.
Specificity of variables: Poorly operationalized variables make it difficult to identify the setting and procedures to which the variables can be generalized.
Treatment diffusion: Treatment groups communicate and adopt pieces of each other’s treatment, altering the initial status of the treatment’s comparison.
Experimenter effects: Conscious or unconscious actions of the researchers affect participants’ performance and responses.
Reactive arrangements: The fact of being in a study affects participants so that they act in ways different from their normal behavior. The Hawthorne and John Henry effects are reactive responses to being in a study.

For some studies, the potential interactive effect of a pretest is a more serious consideration than for others. For example, taking a pretest on algebraic algorithms would probably have very little impact on a group’s responsiveness to a new method of teaching algebra, but studies involving self-report measures, such as attitude scales and interest inventories, are especially susceptible to this threat. The pretest–treatment interaction is also minimal in studies involving very young children, who would probably not see or remember a connection between the pretest and the subsequent treatment. Similarly, for studies conducted over a period of months or longer, the effects of the pretest would probably have worn off or be greatly diminished by the time a posttest is given.

When a study is threatened by pretest–treatment interaction, researchers should select a design that either controls for the threat or allows the researchers to determine the magnitude of the effect. For example, the researcher can (if it’s feasible) make use of unobtrusive measures (ways to collect data that do not intrude on or require interaction with research participants), such as reviewing school records, transcripts, and other written sources.

Multiple-Treatment Interference

Sometimes the same research participants receive more than one treatment in succession. Multiple-treatment interference occurs when carryover effects from an earlier treatment make it difficult to assess the effectiveness of a later treatment. For example, suppose you were interested in comparing two different approaches to improving classroom behavior, behavior modification and corporal punishment (admittedly an extreme example we’re using to make a point!). For 2 months, behavior modification techniques were systematically applied to the participants. At the end of this period, you found behavior to be significantly better than before the study began. For the next 2 months, the same participants were physically punished (with hand slappings, spankings, and
the like) whenever they misbehaved, and at the end of the 2 months, behavior was equally as good as after the 2 months of behavior modification. Could you then conclude that behavior modification and corporal punishment are equally effective methods of behavior control? Certainly not. In fact, the goal of behavior modification is to produce self-maintaining behavior, that is, behavior that continues after direct intervention is stopped. The good behavior exhibited by the participants at the end of the corporal punishment period could well be due to the effectiveness of previous exposure to behavior modification; this good behavior could exist in spite of, rather than because of, exposure to corporal punishment. If it is not possible to select a design in which each group receives only one treatment, the researcher should try to minimize potential multiple-treatment interference by allowing sufficient time to elapse between treatments and by investigating distinctly different types of independent variables.

Multiple-treatment interference may also occur when participants who have already participated in a study are selected for inclusion in another, apparently unrelated study. If the accessible population for a study is one whose members are likely to have participated in other studies (e.g., psychology majors), then information on previous participation should be collected and evaluated before participants are selected for the current study. If any members of the accessible population are eliminated from consideration because of previous research activities, a note should be made in the research report.

Selection–Treatment Interaction

Selection–treatment interaction, another threat to population validity, occurs when study findings apply only to the (nonrepresentative) groups involved and are not representative of the treatment effect in the extended population. This interaction occurs when study participants at one level of a variable react differently to a treatment than other potential participants in the population, at another level, would have reacted. For example, a researcher may conduct a study on the effectiveness of computer-assisted instruction on the math achievement of junior high students. Classes available to the researcher (i.e., the accessible population) may represent an overall ability level at the lower end of the ability spectrum for all junior high students (i.e., the target population). If this is true, the positive effect shown by the participants in the sample may be valid only for lower-ability students rather than for the target population of all junior high students. Similarly, if computer-assisted instruction appears ineffective for this sample, it may still be effective for the target population.

Selection–treatment interaction, like the problem of differential selection of participants associated with internal validity, mainly occurs when participants are not randomly selected for treatments. But this threat can occur in designs involving randomization as well, and the way a given population becomes available to a researcher may threaten generalizability, no matter how internally valid an experiment may be. For example, suppose that, in seeking a sample, a researcher is turned down by nine school systems before finally being accepted by a tenth. The accepting system is very likely to be different from the other nine systems and also from the population of schools to which the researcher would like to generalize the results. Administrators and instructional personnel in the tenth school may have higher morale, less fear of being inspected, or more zeal for improvement than personnel in the other nine schools. In the research report, researchers should describe any problems they encountered in acquiring participants, including the number of times they were turned down, so that the reader can judge the seriousness of a possible selection–treatment interaction.

Specificity of Variables

Like selection–treatment interaction, specificity of variables is a threat to generalizability of research results regardless of the particular experimental design. Any given study has specificity of variables; that is, the study is conducted with a specific kind of participant, using specific measuring instruments, at a specific time, and under a specific set of circumstances. We have discussed the need to describe research procedures in sufficient detail to permit another researcher to replicate the study. Such detailed descriptions also permit interested readers to assess how applicable findings are to their situations. When studies that supposedly manipulated the same independent variable get quite different results, it is often difficult to determine the reasons for the differences because researchers have not provided clear, operational definitions of their independent variables. When operational definitions are

available, they often reveal that two independent variables with the same name were defined quite differently in the separate studies. Because such terms as discovery method, whole language, and computer-based instruction mean different things to different people, it is impossible to know what a researcher means by these terms unless they are clearly defined. Generalizability of results is also tied to the clear definition of the dependent variable, although in most cases the dependent variable is clearly operationalized as performance on a specific measure. When a researcher has a choice of measures to select from, he or she should address the comparability of these instruments and the potential limits on generalizability arising from their use.

Generalizability of results may also be affected by short- or long-term events that occur while the study is taking place. This threat is referred to as the interaction of history and treatment effects and describes the situation in which events extraneous to the study alter the research results. Short-term, emotion-packed events, such as the firing of a superintendent, the release of district test scores, or the impeachment of a president may affect the behavior of participants. Usually, however, the researcher is aware of such happenings and can assess their possible impact on results, and accounts of such events should be included in the research report. The impact of long-term events, such as wars and economic depressions, however, is more subtle and tougher to evaluate.

Another threat to external validity is the interaction of time of measurement and treatment effect. This threat results from the fact that posttesting may yield different results depending on when it is done. A posttest administered immediately after the treatment may provide evidence for an effect that does not show up on a posttest given some time after treatment. Conversely, a treatment may have a long-term but not a short-term effect. The only way to assess the generalizability of findings over time is to measure the dependent variable at various times following treatment.

To summarize, the researcher must deal with the threats associated with specificity by defining variables operationally, in a way that has meaning outside the experimental setting, and must be careful in stating conclusions and generalizations.

Treatment Diffusion

Treatment diffusion occurs when different treatment groups communicate with and learn from each other. When participants in one treatment group know about the treatment received by a different group, they often borrow aspects from that treatment. When such borrowing occurs, the study no longer has two distinctly different treatments but rather has two overlapping ones. The integrity of each treatment is diffused. Often, the more desirable treatment (the experimental treatment or the treatment with additional resources) is diffused into the less desirable treatment. For example, suppose Mr. Darth’s and Ms. Vader’s classes were trying two different treatments to improve spelling. Mr. Darth’s class received videos, new and colorful spelling texts, and prizes for improved spelling. In Ms. Vader’s class, the students were asked to list words on the board, copy them into notebooks, use each word in a sentence, and study at home. After the first week of treatments, the students began talking to their teachers about the different spelling classes. Ms. Vader asked Mr. Darth if she could try the videos in her class, and her students liked them so well that she incorporated them into her spelling program. The diffusion of Mr. Darth’s treatment into Ms. Vader’s treatment produced two overlapping treatments that did not represent the initial intended treatments. To reduce treatment diffusion, a researcher may ask teachers who are implementing different treatments not to communicate with each other about the treatments until the study is completed or may carry out the study in more than one location, thus allowing only one treatment per school.

Experimenter Effects

Researchers themselves also present potential threats to the external validity of their own studies. A researcher’s influences on participants or on study procedures are known as experimenter effects. Passive experimenter effects occur as a result of characteristics or personality traits of the experimenter, such as gender, age, race, anxiety level, and hostility level. These influences are collectively called experimenter personal-attributes effects. Active experimenter effects occur when the researcher’s expectations of the study results affect his or her behavior and contribute to producing certain research outcomes. This effect is referred to

as the experimenter bias effect. An experimenter may unintentionally affect study results, typically in the desired direction, simply by looking, feeling, or acting a certain way.

One form of experimenter bias occurs when the researcher affects participants’ behavior or is inaccurate in evaluating behavior because of previous knowledge of the participants. For example, suppose a researcher hypothesizes that a new reading approach will improve reading skills. If the researcher knows that Suzy Shiningstar is in the experimental group and that Suzy is a good student, she may give Suzy’s reading skills a higher rating than they actually warrant. This example illustrates another way a researcher’s expectations may contribute to producing those outcomes: Knowing or even believing that participants are in the experimental or the control group may cause the researcher, unintentionally, to evaluate their performances in a way consistent with the expectations for that group.

It is difficult to identify experimenter bias in a study, which is all the more reason for researchers to be aware of its consequences on the external validity of a study. The researcher should strive to avoid communicating emotions and expectations to participants in the study. Experimenter bias effects can also be reduced by blind scoring, in which the researcher doesn’t know whose performance is being evaluated.

Reactive Arrangements

Reactive arrangements, also called participant effects, are threats to validity that are associated with the way in which a study is conducted and the feelings and attitudes of the participants involved. As discussed previously, to maintain a high degree of control and obtain internal validity, a researcher may create an experimental environment that is highly artificial and not easily generalizable to nonexperimental settings; this is a reactive arrangement.

Another type of reactive arrangement results from participants’ knowledge that they are involved in an experiment or their feeling that they are in some way receiving special attention. The effect that such knowledge or feelings can have on participants was demonstrated at the Hawthorne Plant of the Western Electric Company in Chicago some years ago. As part of a study to investigate the relation between various working conditions and productivity, researchers investigated the effect of light intensity on worker output. The researchers increased light intensity and production went up. They increased it some more and production went up some more. The brighter the place became, the more production rose. As a check, the researchers decreased the light intensity, and guess what, production went up! The darker it got, the more workers produced. The researchers soon realized that it was the attention given the workers, not the illumination, that was affecting production. To this day, the term Hawthorne effect is used to describe any situation in which participants’ behavior is affected not by the treatment per se but by their awareness of participating in a study.

A related reactive effect, known as compensatory rivalry or the John Henry effect, occurs when members of a control group feel threatened or challenged by being in competition with an experimental group and they perform way beyond what would normally be expected. Folk hero John Henry, you may recall, was a “steel drivin’ man” who worked for a railroad. When he heard that a steam drill was going to replace him and his fellow steel drivers, he set out to beat the machine. Through tremendous effort he managed to win the ensuing contest, dropping dead at the finish line. In the John Henry effect, research participants who are told that they will form the control group for a new, experimental method start to act like John Henry. They decide to challenge the new method by putting extra effort into their work, essentially saying (to themselves), “We’ll show them that our old ways are as effective as their newfangled ways!” By doing this, however, the control group performs atypically; their performance provides a rival explanation for the study results. When the John Henry effect occurs, the treatment under investigation does not appear to be very effective because posttest performance of the experimental group is not much (if at all) better than that of the control group.

As an antidote to the Hawthorne and John Henry effects, educational researchers often attempt to achieve a placebo effect. The term comes from medical researchers who discovered that any apparent medication, even sugar and water, could make participants feel better; any beneficial effect caused by a person’s expectations about a treatment rather than the treatment itself became known as the placebo effect. To counteract this effect, a placebo

potential. A support model approach to programming provides a framework for identifying adult outcomes, determining current levels of functioning, and identifying supports needed to achieve the targeted [outcomes].

[Margin note: The research problem focuses on answering the question: “Do functional mobility skills in students with physical disabilities improve as a result of direct training using the MOVE Curriculum and will skills be maintained over time?” The researchers define the dependent variable of “functional mobility skills” and independent variable of “MOVE curriculum” as part of the preceding review of literature.]

(03) As educational practices change, therapy approaches that stressed remediation of individual skills in isolated environments are being replaced by the practice of integrated therapy in which services are provided in natural settings where skills will be functional and performance meaningful for individual students (Rainforth & York-Barr, 1997). Integrated therapy breaks from the more traditional, multidisciplinary model where team members conduct assessments and set goals in relative isolation (Orelove & Sobsey, 1996). Parents, teachers, and therapists collaborate as a team to assess the student, write goals, and implement intervention. The team develops the IEP together by setting priorities and developing child-centered goals through consensus (Rainforth & York-Barr). In this way, all team members are aware of the IEP goals and can work cooperatively to embed them into the child’s natural activities.

(04) As the fields of physical therapy, occupational therapy, and education have begun to move away from a developmental approach toward a functional model that emphasizes potential and support, the link between special education and pediatric therapy has been strengthened (McEwen & Shelden, 1995). Recent research suggests that when therapy is integrated into the student’s natural environments, treatment is just as effective as traditional therapy and that the integrated approach is more preferred by the school team (Giangreco, 1986; Harris, 1991). The benefits of providing therapy in integrated settings include (a) the availability of natural motivators (Atwater, 1991; Campbell et al., 1984), (b) repeated opportunities for practicing motor skills in meaningful situations (Campbell et al., 1984; Fetters, 1991), and (c) increased generalization of skills across different environmental settings (Campbell et al., 1984; Craig et al., 1999; Harris, 1991). “Although intervention has historically focused on deficient skills with the assumption that isolated skills must be learned and then eventually transferred to functional activities, we now know that for learners with severe disabilities, task-specific instruction must take place in the natural environment for retention to occur” (Shelden, 1998, p. 948).

(05) In response to the shortcomings of traditional motor treatment approaches, the MOVE (Mobility Opportunities Via Education) Curriculum was developed to teach functional mobility skills to students with severe disabilities (Kern County Superintendent of Schools, 1999). MOVE is a top-down, activity-based curriculum designed to link educational programs and therapy by providing functional mobility practice within typical daily activities in the natural context. Individuals using the MOVE Curriculum follow a top-down approach to program planning, rather than selecting skills from a developmental hierarchy. A transdisciplinary team that includes parents, educators, and therapists works collaboratively to assess the student’s skills, design an individualized program, and teach targeted skills while the student participates in school and community activities (for additional information on the MOVE Curriculum see Bidabe, Barnes, & Whinnery, 2001).

(06) Since the inception of the MOVE Curriculum in 1986, this seemingly successful approach has spread to a great number of classrooms, rehabilitation facilities, and homes for students with disabilities across the United States as well as throughout Europe and Asia. Although testimony from practitioners and families as well as informal studies have praised the effectiveness of MOVE, there has been no systematic research related to the effectiveness of this approach to teaching functional mobility skills. While the great number of anecdotal reports of student successes in the MOVE Curriculum should not be disregarded, there is a critical need for demonstrable data to support the efficacy of the program. Therefore, this study asked the following question: Do functional mobility skills in students with physical disabilities improve as a result of direct training using the MOVE Curriculum and will these skills be maintained over time?

Method

Participants

[Margin note: Participants in the study described as five children with severe, multiple disabilities between the ages of 3 and 9. Selection criteria for research participants clearly stated.]

(07) Five children with severe, multiple disabilities between the ages of 3 and 9 were selected to participate in this study. All of the children attended a public elementary school located in an urban, southeastern school district, were served in special education classes, and received occupational and physical therapy as related services. Four of the participants were served in a preschool classroom for students with severe, multiple disabilities. The remaining participant was served in a varying exceptionalities classroom for students with moderate to severe disabilities.

(08) The following criteria were used to select participants for the study: (a) diagnosis of a severe, multiple disability including a physical impairment, (b) parental consent, (c) medical eligibility, (d) willingness of the school team to participate and to be trained in MOVE, and (e) no prior implementation of the MOVE Curriculum. Five of the 17 students served in the two classes met all the selection criteria.

(09) The primary means of mobility for all participants was either being pushed in a wheelchair or being carried. Participant 1, Kim, was a 7-year-old female diagnosed with Down syndrome, severe mental retardation, general hypotonia in all extremities, and a seizure disorder for which she took anticonvulsant medication. Kim was able to bear her own weight in standing while holding a stationary object and could move her feet reciprocally while being supported for weight shifting and balance. Although she demonstrated these skills on rare occasions in physical therapy, she typically refused to use them.

(10) Participant 2, Melissa, was a 4-year-old female diagnosed with a developmental delay and cerebral palsy with hypotonia. Melissa was able to bear weight in standing while holding a stationary object and move her legs reciprocally while being supported for balance and weight shifting in physical therapy, but she also refused to use these skills.

(11) Participant 3, Kevin, was a 3-year-old male diagnosed with cerebral palsy with hypotonia, right hemiparesis, cortical blindness, and a seizure disorder for which he took medication. Kevin was unable to bear weight in standing unless his knees, hips, and trunk were held in alignment by a standing device.

(12) Participant 4, David, was a 9-year-old male diagnosed with spastic quadriplegic cerebral palsy and asthma for which he took medication. David had the ability to maintain hip and knee extension when supported by an adult and to tolerate fully prompted reciprocal steps when supported in a walker.

(13) Participant 5, Caleb, was a 4-year-old male diagnosed with global developmental delays, spastic quadriplegic cerebral palsy, chronic lung disorder, and a seizure disorder for which he took medication. Additionally, Caleb had a tracheostomy that required frequent suctioning, a gastrostomy tube, and occasionally had breathing distress. He required a one-on-one nurse in attendance at all times, and his medical complications sometimes resulted in extended absences. Caleb had the ability to maintain hip and knee extension when supported by an adult and to tolerate fully prompted reciprocal steps when supported in a walker.

Research Methodology

[Margin note: Research design described as a single-subject, multiple-baseline across subjects study. This design can be annotated as follows: A-B-A-B.]

(14) A single-subject, multiple-baseline across subjects study was employed. The independent variable was the MOVE Curriculum that consists of six steps: (1) Testing, (2) Setting Goals, (3) Task Analysis, (4) Measuring Prompts, (5) Reducing Prompts, and (6) Teaching the Skills. The dependent variable was the number of reciprocal steps. A reciprocal step was defined as a step within a time interval of not more than 10 seconds between initial contact of one foot and initial contact of the opposite foot in a forward motion.
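The operational definition of a reciprocal step given above can be expressed as a small counting rule. The following is a minimal sketch only: the study recorded steps by direct observation, so the timestamped `Contact` records, the field names, and the function name are all hypothetical and were chosen purely for illustration.

```python
from dataclasses import dataclass

# From the study's definition: at most 10 seconds between contacts.
MAX_INTERVAL_S = 10.0


@dataclass
class Contact:
    """One observed initial foot contact (hypothetical record format)."""
    time_s: float   # time of initial contact, in seconds
    foot: str       # "L" or "R"
    forward: bool   # True if the contact was made in a forward motion


def count_reciprocal_steps(contacts):
    """Count contacts that follow a contact of the opposite foot
    within MAX_INTERVAL_S seconds and in a forward motion."""
    steps = 0
    for prev, cur in zip(contacts, contacts[1:]):
        opposite_foot = cur.foot != prev.foot
        within_time = (cur.time_s - prev.time_s) <= MAX_INTERVAL_S
        if opposite_foot and within_time and cur.forward:
            steps += 1
    return steps


# Illustrative data, not taken from the study.
trial = [
    Contact(0.0, "L", True),
    Contact(3.0, "R", True),   # counts: opposite foot, 3 s later
    Contact(15.0, "L", True),  # does not count: 12 s gap
    Contact(18.0, "L", True),  # does not count: same foot
    Contact(20.0, "R", True),  # counts: opposite foot, 2 s later
]
print(count_reciprocal_steps(trial))
```

Writing the rule out this way makes each part of the definition (opposite foot, time limit, forward motion) a separate, checkable condition, which is the point of an operational definition.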

Setting

(15) Mobility practice was conducted in the natural context in accordance with the principles of the MOVE Curriculum. Meaningful and relevant activities that naturally occur during the school day were selected for each participant. These activities occurred throughout the school campus.

(16) The study was conducted over the course of one school year beginning in the third week of the fall term and lasting until the 27th week in spring. Maintenance data were collected over a 2-week period 2 years following the intervention year.

Staff Training

(17) Two special education teachers, a physical therapist, and an occupational therapist from the selected school participated in a 2-day MOVE International Basic Provider training on the MOVE Curriculum. Basic Provider training incorporates 16 hrs of instruction on the six steps of the MOVE Curriculum including hands-on instruction in assessment, goal setting, and adaptive prompts and equipment with families and individuals with disabilities.

Materials and Equipment

(18) The Rifton Gait Trainer (Community Playthings, 1999) was used during intervention. The Gait Trainer, also known as the Front Leaning Walker, provides support for an individual to learn to take reciprocal steps. The Gait Trainer is designed to provide total support (if needed) for individuals who are just beginning to bear weight in standing. The prompts can be removed as an individual requires less support with the long-term goal of independent walking.

Procedures

Baseline

[Margin note: Research procedures for the A-B-A-B study included baseline data collection, twice weekly, of the number of reciprocal steps taken.]

(19) During the baseline phase, repeated measures of the number of reciprocal steps were taken twice a week until a pattern of stable performance was established. Baseline measures began by the fifth week of the school year. Due to the multiple-baseline design, baseline was collected for 1½ weeks for the first participant, Kim, and continued for 12 weeks for the last participant, Caleb. Each participant was given the least amount of adult assistance (i.e., one or two hands held or support at trunk) necessary for weight bearing in standing and verbal directions to walk. No assistance was provided for taking reciprocal steps. Baseline measures of reciprocal stepping were taken with adult support for all participants. Additional measurements without assistance were taken for Kim and Melissa because they had demonstrated the ability to bear weight in standing while holding a stationary object. Baseline measures occurred within the participants’ normal school environments; however, no measures were taken using the Gait Trainer or within functional activities because these were considered to be part of the intervention.

Data Collection

(20) Although practice of walking skills occurred throughout the day, measurement of the number of reciprocal steps was taken twice a week during specifically targeted activities to provide consistency of measurement. Measurement was taken at the first walking opportunity during each activity.

(21) For the purpose of data collection, three general levels of support were used for participants according to their abilities and needs. Support was defined as (a) no outside assistance or independent, (b) adult assistance for postural control with independent weight bearing (e.g., one or two hands held or support at trunk), and (c) use of the Gait Trainer to provide postural control and partial weight bearing support when necessary. As students’ reciprocal

stepping skills increased, the level of support decreased progressively from the use of the Gait Trainer to adult assistance to no assistance as appropriate. Therefore, measurements were taken in multiple ways for each participant because it was not possible to predict changing levels of necessary support during intervention or the eventual level of independent mobility after intervention. For all participants, data were collected concurrently at the level of support required at the beginning of intervention and at the next more independent level. In addition, measurements were taken for Melissa at all three levels of support. Although Melissa required the use of the Gait Trainer, measurements were taken for independent walking because she had previously demonstrated the ability to bear her own weight and occasionally take one or two reciprocal steps with both hands held. Because Kim began to take reciprocal steps independently during the second trial of intervention, data collection with “adult assistance” was discontinued.

Interobserver Agreement

(22) Although the intervention was implemented by all team members, measurements were taken by the first author to increase reliability. In addition, interobserver agreement checks were made by the second author to ensure accurate measurement. For each participant, a minimum of two checks was conducted for each targeted behavior. Percentage of agreement was calculated by dividing the total number of agreements by the sum of agreements and disagreements and multiplying that number by 100 (White & Haring, 1980). Agreement for the number of reciprocal steps taken equaled 100%.

Intervention Phase

(23) The intervention consisted of the implementation of the six steps of the MOVE Curriculum for each participant. Using the information obtained during Step 1, Testing, the team was able to identify each participant’s consistent use of mobility skills and to select functional activities during Step 2. These activities were task analyzed in Step 3 in order to identify the critical mobility skills to be addressed in each activity.

(24) Once meaningful daily activities were identified for mobility practice, the level and type of physical support needed to accomplish the activity were determined in Step 4, Measuring Prompts. A critical component of the MOVE program is to provide the necessary but minimal prompts (physical support) needed for functional mobility within an activity. This level was determined for each individual based upon assessment data collected in Step 1. Therefore, not all participants required assistance and all three levels of support.

(25) As is advocated in Step 5 of MOVE, physical support was faded as soon as the students demonstrated an increase in skill level as indicated by the data. The reduction of prompts differed for participants according to their individual rate of progress.

(26) During Teaching the Skills, Step 6 of MOVE, instruction of skills was embedded into typical daily activities in order to provide meaningful, intensive, and consistent practice of reciprocal stepping. An important component of this step is the identification of practice activities that are relevant and motivating to the individual to encourage active participation. From these practice opportunities, one activity per participant was selected for data collection. Data were collected twice a week.

Maintenance Phase

(27) After a period of 2 years, maintenance measures were taken on dependent variables for 4 of the 5 participants. David had moved from the area and was unavailable. Data were collected during the participants’ natural activities at the time. Some students had moved into new classrooms and many were participating in different activities than those used during the initial intervention phase.
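The percentage-of-agreement calculation described under Interobserver Agreement (agreements divided by agreements plus disagreements, times 100) can be sketched as below. The function names and the session-by-session comparison of two observers' step counts are assumptions made for illustration; the article does not report how individual agreements were tallied.

```python
def percent_agreement(agreements, disagreements):
    """Agreements / (agreements + disagreements) * 100."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("at least one observation is required")
    return agreements / total * 100


def agreement_from_counts(observer_a, observer_b):
    """Hypothetical tally: a session counts as an agreement when both
    observers recorded the same number of reciprocal steps."""
    if len(observer_a) != len(observer_b):
        raise ValueError("observers must score the same sessions")
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    disagreements = len(observer_a) - agreements
    return percent_agreement(agreements, disagreements)


# Illustrative counts, not data from the study.
print(agreement_from_counts([3, 5, 0, 12], [3, 5, 0, 12]))
print(agreement_from_counts([3, 5, 0, 12], [3, 4, 0, 12]))
```

With identical counts from both observers the result is 100, which corresponds to the agreement level the authors report.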

(28) Measurements were taken for participants at their current level of support necessary for functional walking (e.g., independent walking for Kim and Melissa, walking with adult assistance for Caleb, and walking with the use of the Gait Trainer for Kevin).

Results

Data Analysis

(29) Data were analyzed using visual inspection of the graphs including changes in means, levels, and trends as well as percentage of overlap across phases (Kazdin, 1982). Performance data for intervention and maintenance are presented in Figures 1–3.

[Margin note: Data analysis for each participant in the study is represented with a visual display that captures the changes in the number of reciprocal steps taken across the baseline, intervention, and maintenance phases of the study.]

(30) Kim. A stable baseline with a mean and range of 0 steps was observed for walking forward independently (see Figure 1). The mean for intervention phase was 5.25 steps with a range from 0 to 14 steps. There was only a 9% overlap of data points (4 of 45 data points) from baseline with a 5.25-point increase in the mean. There was an upward trend in reciprocal steps observed in intervention phase with one notable decrease coinciding with an increase in seizure episodes.

(31) During the maintenance phase, there was a sizable increase in the number of independent reciprocal steps recorded. All measurements during maintenance revealed that Kim was able to walk over 500 ft independently. This resulted in a 494.74-point increase in the mean number of steps taken with a 0% overlap of data points from intervention phase to the maintenance phase.

(32) Melissa. A stable baseline with a mean and range of 0 steps was observed for walking (see Figure 1). Intervention included the use of the Gait Trainer (see Figure 3) and adult support (see Figure 2).
[Figure 1. Independent walking for Kim and Melissa across baseline, intervention, and maintenance phases. Panels: Kim, Melissa. Axes: reciprocal steps (y) by trials (x), with Baseline, MOVE, and Maintenance phases marked.]

[Figure 2. Walking with adult support for Melissa (2-hand assistance), Kevin (support from behind at trunk), David (2-hand assistance), and Caleb (support from behind at trunk) across baseline, intervention, and maintenance phases. Axes: reciprocal steps (y) by trials (x), with Baseline, MOVE, and Maintenance phases marked.]

[Figure 3. Walking with the use of the gait trainer for Melissa, Kevin, and David across baseline, intervention, and maintenance phases. Axes: reciprocal steps (y) by trials (x). The Kevin panel is labeled “Would Not Bear Weight.”]

As Melissa required less
support, the Gait Trainer was discontinued (after the 37th trial). For adult support, there was a general increase in the number of steps with a mean of 30.16 with a range from 0 to 100 steps. A 38% overlap of data points (10 of 26 data points) was noted from baseline to intervention.

(33) During the maintenance phase, measurements were taken on independent reciprocal steps since Melissa no longer required the use of the Gait Trainer or adult support. All measurements during maintenance showed that Melissa was able to walk over 500 ft independently. A 500-point increase in the mean with a 0% overlap of data points from intervention to maintenance was observed.

(34) Kevin. For walking forward with adult support, Kevin was unable to bear weight or to take any steps during baseline or intervention (see Figure 2). Additionally, he would not accept being placed into the Gait Trainer during intervention (see Figure 3). During maintenance, however, Kevin was taking reciprocal steps both with adult support and while using the Gait Trainer. The mean for walking forward with adult support during maintenance was 3.33 steps with a range from 2 to 4 steps. This showed a slight upward trend and a mean of 3.33 steps. There was also a 0% overlap of data points (0 of 3 points) between intervention and maintenance. While using the Gait Trainer, Kevin consistently was able to take independent reciprocal steps for a minimum of 100 steps.

(35) David. For walking forward with adult support, there was a stable baseline with a mean of 1.64 steps and a range from 1 to 3 steps (see Figure 2). An upward trend was noted during the intervention phase with a mean of 6.91 steps and a range from 0 to 26 steps. A 55% overlap of data points (12 of 22 data points) from baseline to intervention with a 5.27-point increase in the mean was observed.
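The overlap percentages and mean changes reported above follow from simple arithmetic on the counts and phase means given in the text. The following is a minimal Python sketch of that arithmetic, not the authors' procedure. The function names are hypothetical, the inputs are the summary values reported in the Results (the raw trial data are not reproduced here), and `count_overlap` assumes one common convention: counting the intervention points that fall within the range of the baseline data.

```python
def percent_overlap(overlapping_points, total_points):
    """Percentage of data points that overlap the adjacent phase, rounded to a whole percent."""
    if total_points <= 0:
        raise ValueError("total_points must be positive")
    return round(100 * overlapping_points / total_points)


def mean_change(mean_before, mean_after):
    """Change in the phase mean, rounded to two decimals as in the report."""
    return round(mean_after - mean_before, 2)


def count_overlap(baseline, intervention):
    """Count intervention points inside the baseline range (an assumed convention)."""
    low, high = min(baseline), max(baseline)
    return sum(1 for value in intervention if low <= value <= high)


# Summary values as reported in the text: (overlapping points, total points).
reported_overlap = {"Kim": (4, 45), "Melissa": (10, 26), "David": (12, 22)}
for name, (overlapping, total) in reported_overlap.items():
    print(name, percent_overlap(overlapping, total), "% overlap")

# Phase means as reported in the text: (baseline mean, intervention mean).
reported_means = {"Kim": (0.0, 5.25), "David": (1.64, 6.91)}
for name, (before, after) in reported_means.items():
    print(name, mean_change(before, after), "point change in the mean")

# Hypothetical raw series, for illustration only (not data from the study).
example_baseline = [0, 0, 0]
example_intervention = [0, 2, 0, 5, 7]
overlapping = count_overlap(example_baseline, example_intervention)
print(percent_overlap(overlapping, len(example_intervention)), "% overlap (hypothetical)")
```

Run on the reported counts and means, the sketch gives the same rounded figures as the text: 9%, 38%, and 55% overlap, and increases of 5.25 and 5.27 points in the mean.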
(36) Initially, measurements of reciprocal steps while walking in the Gait Trainer were taken since David was unable to walk the 70-ft distance from the bathroom to the classroom (see Figure 3). However, once David was able to consistently walk the entire distance, the measurement of distance was discontinued and the measurement of time was added (after the 30th trial). For the duration of the study, measurements of time indicated a steady decrease from 9 min 20 sec to 4 min 54 sec by the 53rd trial. David was unavailable during the maintenance phase due to a family move.

(37) Caleb. For walking forward with adult support, a stable baseline with a mean of .60 steps and a range from 0 to 1 was observed (see Figure 2). The mean for the intervention phase was 4.47 steps with a range from 0 to 12 steps. However, after the 33rd trial, all walking was discontinued for 3 weeks due to medical complications. This resulted in a substantial decrease in walking skills following this period. A reintroduction of the intervention was followed by another upward trend. There was a 3.87-point increase in the mean from baseline to intervention with a 21% overlap of data points (4 of 19 data points). There was a significant decrease in skill level in the maintenance phase demonstrated by a mean of 0 steps. This represented a 4.47-point decrease from the intervention phase with a 0% overlap of data points.

Discussion

[Margin note: Conclusion: The results of this study provide support for the use of the MOVE Curriculum to increase functional mobility skills for students with severe disabilities.]

(38) The current study investigated the effects of the MOVE Curriculum on functional mobility skills (e.g., walking forward) with 5 students with severe, multiple disabilities. The results of this study provide support for the use of the MOVE Curriculum to increase functional mobility skills for students with severe disabilities. A clear functional relationship between the behaviors and the intervention procedures was demonstrated. Four of the 5 participants showed increases in walking skills from baseline to intervention. Although the fifth participant, Kevin, did not make any gains during intervention, he did show a dramatic increase in walking during the maintenance phase.

(39) In general, prior to intervention none of the participants was able to demonstrate functional walking skills either independently or with support. David and Caleb did demonstrate the ability to take a few steps with support during baseline, but these minimal levels did not increase their functional participation in daily activities. By the end of intervention, however, Kim was able to walk short distances independently. Melissa, David, and Caleb were able to walk with adult support to participate more fully in their selected activities. As the students gained functional walking skills, they were able to participate in other school and community activities without the use of their wheelchairs.

(40) Although Kim demonstrated very little interest in her environment and would not attempt to take any independent steps during baseline, the addition of a motivating activity appeared to have a positive impact. Initially, Melissa resisted all attempts at walking and required the use of the Gait Trainer as well as full physical prompting to move her legs reciprocally. By the end of the intervention phase, however, she no longer required the use of the Gait Trainer and did not need to use her wheelchair during the school day. Despite Kevin’s inability to take reciprocal steps during the intervention phase, the transdisciplinary team continued to practice supported weight bearing in standing and transfers from sitting to standing with the expectation that reciprocal stepping would develop. This was found to be true when measurements were taken during maintenance.

(41) Although David was able to take a few steps with adult support during baseline, he used the Gait Trainer to practice walking for longer distances during intervention. David’s ability to walk forward with adult support also improved significantly during intervention. This new skill allowed David to walk short distances without the use of his wheelchair, allowing increased participation in crowded environments.

(42) Caleb showed a fairly steady increase in reciprocal steps with adult support during intervention. There was a 3-week period when Caleb’s nurse restricted all walking due to medical complications. After this break, Caleb experienced a temporary regression in walking skills followed by progress beyond earlier achievements.

(43) During the maintenance phase, only 4 of the 5 participants were available. Three of the 4 participants not only maintained the gains made in walking skills, but they also continued to make improvements beyond the intervention year. The remaining participant, Caleb, experienced a significant setback in walking skills.

(44) Kim was consistently walking over 500 ft on a variety of surfaces (e.g., sand, grass) and no longer used her wheelchair at school. Melissa was able to walk independently over 500 ft on a variety of surfaces, and her mother reported that she had greater access to the community. Kevin was consistently walking over 100 ft in the Gait Trainer and was bearing his own weight to take some steps with adult support. This was in sharp contrast to his performance during intervention when he was unable to bear his own weight. Caleb experienced numerous medical complications during the time period between intervention and maintenance. Walking skill practice was not a regular component of Caleb’s education program resulting in regression in reciprocal stepping. During maintenance data collection, Caleb could bear his own weight for short periods of time, but was not strong enough to independently take reciprocal steps.

(45) This study has limitations including a small sample size, variability of data, and difficulty in establishing a cause and effect relationship. The first issue of small sample size is characteristic of single-subject designs. The limitations of this design were reduced by the use of repeated measurements over time and multiple baselines. Additionally, the dramatic effects required of single-subject designs may be more generalizable across individuals than are larger group designs that meet relatively weaker statistical standards (Gall, Borg, & Gall, 1996; Kazdin, 1982).

(46) A second limitation of this study is the variability of the data that makes interpretation of treatment effects difficult. The variety of influences in the natural environment and the characteristics of individuals with severe disabilities (i.e., frequent illnesses and absences, medical complexities, etc.) typically result in variations in the data. While traditional research methods consider variability to be a weakness, researchers studying the multiple factors affecting skill development advocate for the preservation of variability because it provides valuable information about behavior changes (Kamm, Thelen, & Jensen, 1990; Kratochwill & Williams, 1988).

(47) A third limitation of this study is the difficulty in establishing a causal relationship between MOVE and increases in reciprocal stepping. In single-subject multiple-baseline designs, causal relationships can be inferred when performance changes at each point that intervention is introduced (Kazdin, 1982; Tawney & Gast, 1984). In this study, some participants did not demonstrate immediate increases in skill with the introduction of the intervention. The slow rate of change of some participants also resulted in introduction of the curriculum for some students prior to significant increases of the previous participant. Although this violation of multiple-baseline design lessens the degree of experimental control, the decision was made to expose all participants to the intervention within the school year. However, a slow rate of behavior change is characteristic for the population studied (Beirne-Smith et al., 2002; Shelden, 1998). Additionally, a visual inspection of the data indicates that participants made dramatic changes in reciprocal stepping skills in either intervention or maintenance. Such dramatic changes in behavior provide more support for a causal relationship between the intervention and an increase in behavior (Kazdin, 1982; Tawney & Gast, 1984).

(48) A second consideration related to the inference of causal relationships is the effect of external variables in relation to the intervention. In this study stable and staggered baselines helped to reduce the influence of competing variables such as maturation and historical events. Further, there were no gains in functional mobility skills for any participant until after the intervention was introduced for that individual.

(49) The results of this preliminary study suggest that systematic mobility training programs, such as MOVE, can lead to an increase in functional mobility skills. Additional research investigating the effects of the MOVE Curriculum is warranted. Systematic direct replication should be conducted to help establish reliability and generalizability and could be replicated across at least five dimensions (e.g., subjects, behaviors, settings, procedures, and processes). In addition to replication of initial outcomes of this study, future research should investigate the criticality of specific components of the MOVE Curriculum (e.g., levels of family involvement, student-selected versus adult-selected activities, and systematic prompt reduction).

Implications for Practice

(50) This study emphasizes the importance of environmental support in the development of new skills. The importance of supports was obvious with David who was unable to walk independently, but could walk short distances with adult support and even greater distances with the use of the Gait Trainer. This discrepancy would suggest that without assistance, David would not have had the opportunity to practice walking skills. Kevin not only required postural support, but he also needed to develop weight-bearing skills before he experienced gains in walking. As muscle strength and proprioceptive awareness developed, he eventually was able to walk proficiently in the Gait Trainer. The implications of this study are significant for individuals with severe disabilities who may not have opportunities to participate in meaningful life activities without environmental support.

A second implication of this study was related to motivation. In addition to a lack of postural balance and strength, some of the participants appeared to have no interest in walking. This seemed to be the case with Melissa who showed no signs of progress for over 2 months before making rather rapid and dramatic gains in her walking skills. The use of the Gait Trainer as well as full physical prompting allowed the school team to provide Melissa with the experience of walking to many different environments until she became actively engaged in walking. Thus, it appeared that once Melissa was motivated to walk within meaningful activities, she made dramatic increases in her functional walking abilities. Motivation was a factor with Kim’s program, also. Once Kim developed walking skills, she would walk only to the table to eat or when returning to the “safety” of her classroom. As she became more excited about the freedom walking gave her, she began to generalize this skill across many environments and activities. The natural motivation associated with activity-based instruction is a key component of the MOVE Curriculum, and this study supports the importance of practice during motivating activities.

(52) A third implication relates to the need for increased opportunities to practice new skills. For many individuals with severe disabilities, these opportunities do not always naturally occur. The results from this study support the use of practice opportunities that are both meaningful and continuous. Melissa appeared to have the potential to walk, but not the motivation. The continuous opportunities for practice seemed to be related to her increased desire to walk. With Caleb and Kevin who appeared to lack both the skill and the will to walk, increased mobility opportunities provided the consistent practice necessary for the acquisition of walking skills.
When progress is not immediately apparent, as with Melissa and Kevin, educational teams must be committed to continuous meaningful practice. In both cases, the participants initially appeared to be unaffected by the intervention, but eventually made significant gains in functional walking. Regardless of whether lack of progress is due to limited skills or low motivation, continuous opportunities for practice should be a critical component of mobility training.

References

Atwater, S. W. (1991). Should the normal motor developmental sequence be used as a theoretical model in pediatric physical therapy? In J. M. Lister (Ed.), Contemporary management of motor control problems: Proceedings of the II Step Conference (pp. 89–93). Alexandria, VA: The Foundation for Physical Therapy.

Barnes, S. B. (1999). The MOVE Curriculum: An application of contemporary theories of physical therapy and education. (Doctoral dissertation, University of West Florida, 1999). Dissertation Abstracts International, 9981950.

Beirne-Smith, M., Ittenbach, R. F., & Patton, J. R. (2002). Mental retardation. (5th ed.). Upper Saddle River, NJ: Prentice-Hall.

Bidabe, D. L., Barnes, S. B., & Whinnery, K. W. (2001). M.O.V.E.: Raising expectations for individuals with severe disabilities. Physical Disabilities: Education and Related Services, 19(2), 31–48.

Brown, L., Branson, M. B., Hamre-Nietupski, S., Pumpian, I., Certo, N., & Gruenewald, L. (1979). A strategy for developing chronological age-appropriate and functional curricular content for severely handicapped adolescents and young adults. The Journal of Special Education, 13(1), 81–90.

Butterfield, N., & Arthur, M. (1995). Shifting the focus: Emerging priorities in communication programming for students with a severe intellectual disability. Education and Training in Mental Retardation and Developmental Disabilities, 30(1), 41–50.

Campbell, P. H., McInerney, W. F., & Cooper, M. A. (1984). Therapeutic programming for students with severe handicaps. The American Journal of Occupational Therapy, 38(9), 594–602.

Community Playthings. (1999). Rifton equipment (Catalog). Rifton, NY: Community Products LLC.

Craig, S. E., Haggart, A. G., & Hull, K. M. (1999). Integrating therapies into the educational setting: Strategies for supporting children with severe disabilities. Physical Disabilities: Education and Related Services, 17(2), 91–109.

Dunn, W. (1989). Integrated related services for preschoolers with neurological impairments: Issues and strategies. Remedial and Special Education, 10(3), 31–39.

Fetters, L. (1991). Cerebral palsy: Contemporary treatment concepts. In J. M. Lister (Ed.), Contemporary management of motor control problems: Proceedings of the II Step Conference (pp. 219–224). Alexandria, VA: The Foundation for Physical Therapy.

Gall, M. D., Borg, W. R., & Gall, J. P. (1996). Educational research: An introduction (6th ed.). White Plains, NY: Longman.

Giangreco, M. F. (1986). Effects of integrated therapy: A pilot study. Journal of the Association for Persons with Severe Handicaps, 11(3), 205–208.

Harris, S. R. (1991). Functional abilities in context. In J. M. Lister (Ed.), Contemporary management of motor control problems: Proceedings of the II Step Conference (pp. 253–259). Alexandria, VA: The Foundation for Physical Therapy.

Kamm, K., Thelen, E., & Jensen, J. L. (1990). A dynamical systems approach to motor development. Physical Therapy, 70, 763–775.

Kazdin, A. E. (1982). Single-case research designs. New York: Oxford.

Kern County Superintendent of Schools. (1999). MOVE: Mobility Opportunities Via Education. Bakersfield, CA: Author.

Kratochwill, T. R., & Williams, B. L. (1988). Perspectives on pitfalls and hassles in single-subject research. Journal of the Association for Persons with Severe Handicaps, 13(3), 147–154.

McEwen, I. R., & Shelden, M. L. (1995). Pediatric therapy in the 1990’s: The demise of the educational versus medical dichotomy. Occupational and Physical Therapy in Educational Environments, 15(2), 33–45.

Orelove, F. P., & Sobsey, D. (1996). Educating children with multiple disabilities: A transdisciplinary approach. Baltimore: Paul H. Brookes.

Rainforth, B., & York-Barr, J. (1997). Collaborative teams for students with severe disabilities: Integrating therapy and educational services. Baltimore: Paul H. Brookes.

Shelden, M. L. (1998). Invited commentary. Physical Therapy, 78, 948–949.

Tawney, J. W., & Gast, D. L. (1984). Single subject research in special education. Columbus, OH: Merrill.

White, O. R., & Haring, N. G. (1980). Exceptional teaching (2nd ed.). Columbus, OH: Merrill.

Source: “Effects of Functional Mobility Skills Training for Young Students with Physical Disabilities,” by S. B. Barnes and K. W. Whinnery, Exceptional Children, 68, pp. 313–324. Copyright © 2002 by The Council for Exceptional Children. Reprinted with permission.

Chapter Twelve

Narrative Research

House of Frankenstein, 1944

“If you are a person who does not interact well with others, narrative research is probably not for you!” (p. 369)

Learning Outcomes

After reading Chapter 12, you should be able to do the following:

1. Briefly state the definition and purpose of narrative research.
2. Describe the narrative research process.
3. Describe the key characteristics of narrative research.
4. Describe narrative research techniques, including restorying, oral history, examining artifacts, storytelling, letter writing, and autobiographical and biographical writing.
5. Outline the steps involved in writing a narrative.

The chapter learning outcomes form the basis for the following task, which requires you to develop the research procedures section of a research report for a narrative research study.

Task 7A

For a qualitative study, you have already created research plan components (Tasks 2 and 3B) and described a sample (Task 4B). If your study involves narrative research, now develop the research procedures section of the research report. Include in the plan the overall approach and rationale for the study, site and sample selection, the researcher’s role, data collection methods, data management strategies, data analysis strategies, trustworthiness features, and ethical considerations (see Performance Criteria at the end of Chapter 15, p. 455).

Summary: Narrative Research

Definition: Narrative research is the study of how different humans experience the world around them, and it involves a methodology that allows people to tell the stories of their “storied lives.”

Design(s): Narrative studies usually focus on the experiences of individuals and their chronology and context using the technique of restorying to collaboratively construct a narrative account. The goal of a narrative research design is to collaboratively explore a phenomenon of interest with an individual in an effort to understand how individuals’ past experiences impact the present and, potentially, the future.

Types of appropriate research questions: Narrative research can contribute to our understanding of educational issues such as adolescent drug use, cultural differences in diverse urban school settings, and the achievement gap that separates children raised in poverty from children who are less economically disadvantaged.

Key characteristics:
• A focus on the experiences of individuals
• A concern with the chronology of individuals’ experiences
• A focus on the construction of life stories based on data collected through interviews
• Restorying as a technique for constructing the narrative account
• Inclusion of context and place in the story
• A collaborative approach that involves the researcher and the participants in the negotiation of the final text
• A narrative constructed around the question “And then what happened?”

Steps in the process: The narrative research process is a highly personal, intimate approach to educational research that demands a high degree of caring and sensitivity on the part of the researcher.
1. Identify the purpose of the research study, and identify a phenomenon to explore.
2. Identify an individual who can help you learn about the phenomenon.
3. Develop initial narrative research questions.
4. Consider the researcher’s role (e.g., entry to the research site, reciprocity, and ethics) and obtain necessary permissions.
5. Negotiate entry to the research setting in terms of a shared narrative with the research participant.
6. Establish a relationship between researcher and participant that is mutually constructed and characterized by an equality of voice.
7. Collaborate with the research participant to construct the narrative and to validate the accuracy of the story.

Potential challenges:
• Trust
• Developing and maintaining a mutually constructed relationship that is characterized by caring, respectfulness, and equality of voice

Example: How do teachers confront, and deal with, high school students who have drug problems?

Narrative Research: Definition and Purpose

Narrative research is the study of how different humans experience the world around them, and it involves a methodology that allows people to tell the stories of their “storied lives.”1 Narrative researchers collect data about people’s lives and, with the participants, collaboratively construct a narrative (i.e., written account) about the experiences and the meanings they attribute to the experiences.

Narrative research has a long history in diverse disciplines such as literature, history, art, film, theology, philosophy, psychology, anthropology, sociology, sociolinguistics, and education, and as such it does not fit neatly into a single scholarly field. Within the field of education, a number of recent trends have influenced the development of narrative research:

■ The increased emphasis in the past 15 years on teacher reflection, teacher research, action research, and self-study
■ The increased emphasis on teacher knowledge (for example, what teachers know, how they think, how they develop professionally, and how they make decisions in the classroom)
■ The increased emphasis on empowering teachers by giving them voices in the educational research process through collaborative educational research efforts

These trends in education have resulted in a changing landscape of educational research and the promotion of scientifically based research practices to address social, cultural, and economic issues.

1 “Stories of Experience and Narrative Inquiry,” by F. M. Connelly and D. J. Clandinin, 1990, Educational Researcher, 19(5), p. 2.

We live (and perhaps teach or work in schools in some other capacity) in a time when we are being challenged by educational issues such as adolescent drug use, cultural differences in diverse

urban school settings, and the achievement gap that separates children raised in poverty from children who are less economically disadvantaged. There are no silver bullets to solve these (and many other) issues that have come to the forefront of political and educational policy in the late 20th and early 21st centuries, but we can try to understand them better. By using narrative research in education, we attempt to increase understanding of central issues related to teaching and learning through the telling and retelling of teachers’ stories. Narrative research provides educational researchers with an opportunity to validate the practitioner’s voice in these important political and educational debates.

To visualize what narrative and research in the same sentence really mean, consider an example:

Hilda, a teacher at High High School, has students in her class who appear “distracted” (which is perhaps teacher code for under the influence of drugs). As an educational researcher, you decide that it would be helpful to know more about how Hilda deals with this significant educational issue and what she does to work with the distracted, possibly drug-using adolescents in her classroom. You think of a research question: “What have been Hilda’s experiences in confronting and dealing with a student who has a drug problem?” To study this question, you plan to interview Hilda and listen to stories about her experiences working with one particular distracted student. You will talk to the student, the student’s parents, other teachers, administrators, and counselors, all of whom are stakeholders in the student’s educational experience. You also want to know about Hilda’s life and any significant events that have affected her ability to work effectively with adolescent drug users. Perhaps Hilda holds economic, social, cultural, or religious beliefs and values that affect her ability to deal with the drug culture in her school.

From the information you collect in interviews, you will slowly construct a story of Hilda’s work with the troubled student. You will then share (i.e., retell) the story and, with Hilda’s help, shape the final report of the narrative research. This final report will be Hilda’s story of working with a student who is troubled by drug use, and it will contribute to our understanding of what it takes, on the part of a teacher, to work with adolescent drug users in our schools.

This example shows how narrative research allows the researcher to share the storied lives of teachers to provide insights and understandings about challenging educational issues as well as to enrich the lives of those teachers. Narrative research can contribute to our understanding of the complex world of the classroom and the nuances of the educational enterprise that exist between teachers and students. It simply is not always possible, nor desirable, to reduce our understanding of teaching and learning to numbers.

Types of Narrative Research

Like other types of qualitative research, narrative research may take a variety of forms. Some of these forms are listed in Figure 12.1.

Figure 12.1 • Examples of types of narrative research forms
• Autobiographies
• Biographies
• Life writing
• Personal accounts
• Personal narratives
• Narrative interviews
• Personal documents
• Documents of life
• Life stories and life histories
• Oral histories
• Ethnohistories
• Ethnobiographies
• Autoethnographies
• Ethnopsychologies
• Person-centered ethnographies
• Popular memories
• Latin American testimonios
• Polish memoirs
Source: Creswell, John W., Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research, 5th Edition, © 2015, p. 506. Reprinted by permission of Pearson Education, Inc., Upper Saddle River, NJ.

How a particular narrative research approach is categorized depends on five characteristics: who authored the account (e.g., the researcher or the participant; note that the researcher is the participant in an autobiography), the scope of the narrative

(e.g., an entire life or an episode in a life), who provides the story (e.g., teachers or students), the kind of theoretical/conceptual framework that has influenced the study (e.g., critical or feminist theory), and whether all these elements are included in one narrative.2 The nuances that distinguish the different forms of narrative research listed in Figure 12.1 are embedded in the disciplines in which they originated. If one specific style of narrative research piques your interest, you would do well to focus on the discipline-based literature to guide your research efforts.3

Narrative Analysis and the Analysis of Narrative

It is important to distinguish between narrative analysis and the analysis of narrative, which, despite their similar terminology, reflect unique processes.4 In narrative analysis, the researcher collects descriptions of events through interviews and observations and synthesizes them into narratives or stories, similar to the process of restorying. In this type of narrative research, the story is the outcome of the research, an attempt by the researcher to answer how and why a particular outcome came about. Analysis of narrative is a process in which the researcher collects stories as data and analyzes common themes to produce a description that applies to all the stories captured in the narratives. Using this approach, the researcher develops a statement of themes as general knowledge about a collection of stories, but in so doing, underemphasizes the unique aspects of each story.

In this chapter, the focus of discussion is narrative analysis. That is, we are describing the development of a narrative or story that focuses on particular knowledge about how or why an outcome occurred rather than the development of a collection of stories and the search for themes to develop general knowledge about the collection of stories.

The Narrative Research Process

The narrative research process is a highly personal, intimate approach to educational research that demands a high degree of caring and sensitivity on the part of the researcher. Although negotiating entry to the research setting is usually considered an ethical matter with assurances of confidentiality and anonymity, in narrative research it is necessary to think about this negotiation in terms of a shared narrative. That is, narrative research necessitates a relationship between the researcher and the participant more akin to a close friendship, where trust is a critical attribute. However, this friendship quality is not easily attained in an educational research setting (let alone in our lives in general). It is not uncommon for teachers, for example, to be cynical about any educational research, especially a style of research whose success relies on a friendship between the researcher and participant. Imagine how you would feel if approached by one of your educational research classmates (or colleagues at school) with a proposition such as this one: “I heard you talking about the difficulty you were having teaching kids who come to school stoned and wondered how you would feel about spending a lot of time talking to me about it. Maybe by working on the problem together, we can gain a greater understanding of the issues involved.” Think about the kind of person you would trust to undertake this kind of research in your workplace; for your narrative study to succeed, you need to become that person. If you are a person who does not interact well with others, narrative research is probably not for you!

2 Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research (5th ed.) by J. W. Creswell, 2015, Upper Saddle River, NJ: Pearson Education, Inc.
3 For examples of how narrative research has been applied to a wide range of contexts (e.g., school-based violence, Holocaust survivors, undocumented immigrant families, and other challenging social problems), consider reading Narrative Analysis: Studying the Development of Individuals in Society, by C. Daiute and C. Lightfoot (Eds.), 2004, Thousand Oaks, CA: Sage.
4 “Narrative Analysis in Qualitative Research,” by D. E. Polkinghorne, 1995, in Life History and Narrative (pp. 5–23), by J. A. Hatch and R. Wisniewski (Eds.), London: Falmer Press.
5 “Stories,” by Connelly and Clandinin, 1990, pp. 2–14; Narrative Inquiry: Experience and Story in Qualitative Research, by D. J. Clandinin and F. M. Connelly, 2000, San Francisco, CA: Jossey-Bass.

As Connelly and Clandinin5 have suggested, it is important that the relationship between researcher and participant be a mutually constructed one that is caring, respectful, and characterized by an equality of voice. If the researcher is unable to let go of the control that is typical in many styles of educational research, the narrative research process is not likely to succeed. The educational researcher using a narrative research methodology must be prepared to follow the lead of the research participant and, in the immortal

words of Star Trek, go where “no man [or woman] has gone before!” In a very real sense, narrative research is a pioneering effort that takes a skilled researcher committed to living an individual’s story and working in tandem with that individual.

Equality of voice is especially critical in the researcher–participant relationship because the participant (in all likelihood a teacher) must feel empowered to tell the story. Throughout the research process, the researcher must leave any judgmental baggage at home. The first hint of criticism or “ivory tower” superiority will be a nail in the research coffin. The researcher’s intent must be clear: to empower the participant to tell the story and to be collaborative and respectful in the process. The researcher should listen to the participant’s story before contributing his or her own perspective, even if asked. That is, the narrative researcher must not become an informant. After all, it is the participant’s story we are trying to tell. As a patient listener, the researcher has the opportunity to validate the participant’s voice and allows the participant to gain authority during the telling of the story.

A researcher interested in a narrative study must thus decide if he or she has the time, access, experience, personal style, and commitment to undertake this particular style of on-site research. Once the decision is made, the researcher can begin planning the study. Each study will have unique requirements, and the steps that follow are meant simply as guideposts, but you should notice a parallel between the steps and the outline for writing a qualitative research proposal.

To illustrate the steps in planning and conducting narrative research, we build on the example of our teacher, Hilda.

1. Identify the purpose of the research study, and identify a phenomenon to explore.
   The purpose of the study at High High School is to describe Hilda’s experiences in confronting and dealing with a student who has a drug problem. The specific phenomenon that will be explored is that of adolescent drug use in high school.
2. Identify an individual who can help you learn about the phenomenon.
   Hilda, a teacher at High High School, has volunteered to work collaboratively with the researcher.
3. Develop initial narrative research questions.
   What have been Hilda’s experiences in confronting and dealing with a student who has a drug problem? What life experiences influence the way Hilda approaches the problem?
4. Consider the researcher’s role (e.g., entry to the research site, reciprocity, and ethics) and obtain necessary permissions.
   The researcher should seek permission from the Institutional Review Board (IRB), as well as any other permission required by the school or school district. In addition, the researcher must ask Hilda to sign an informed consent form.
5. Develop data collection methods, paying particular attention to interviewing, and collect the data.
   A narrative researcher utilizes a variety of narrative research data collection techniques, including interviewing and examining written and nonwritten sources of data.
6. Collaborate with the research participant to construct the narrative and to validate the accuracy of the story.
   The researcher and Hilda participate collaboratively in restorying the narrative and then validating the final written account (restorying, a writing process that involves synthesizing story elements, is described later in this chapter).
7. Write the narrative account.

6 Elements of narrative research were adapted from those in Educational Research, Creswell, 2015, and “Narrative Analysis,” by C. K. Riessman, 2002, in The Qualitative Researcher’s Companion, by A. M. Huberman and M. B. Miles, Thousand Oaks, CA: Sage.

Key Characteristics of Narrative Research

Narrative research can be characterized by the following elements:6

■ A focus on the experiences of individuals
■ A concern with the chronology of individuals’ experiences
■ A focus on the construction of life stories based on data collected through interviews

■ Restorying as a technique for constructing the narrative account
■ Inclusion of context and place in the story
■ A collaborative approach that involves the researcher and the participants in the negotiation of the final text
■ A narrative constructed around the question “And then what happened?”

The narrative research process is similar to the construction of a biography in that the educational researcher does not have direct access to observational data but must rely on primary data sources (e.g., the participant’s recollections) and secondary sources (e.g., written documents by the participant); data are collected primarily through interviews and written exchanges. As mentioned previously, narrative research places considerable emphasis on the collaborative construction of the written account, the narrative text. Although researchers using other styles of on-site research may share accounts with research participants as a way to test the trustworthiness of those accounts, they place little emphasis on the restorying process that is quite unique to narrative research.

Narrative Research Techniques

Empirical data are central to narrative research in spite of the inevitable interpretation that occurs during the data collection process (e.g., during the telling and restorying activities). However, interpretation does not mean that the outcome of the process is fiction. The narrative researcher, like researchers using other on-site research approaches, must be prepared to use multiple data sources to counteract challenges that narratives could be written without ever leaving home. Accordingly, Clandinin and Connelly7 recommend that data be in the form of field notes on shared research experiences. These experiences occur as the researcher collects data through journal and letter writing and documents such as lesson plans and class newsletters.

The immensity of the writing task for the narrative researcher becomes clear if you consider what is involved, for both the researcher and the participant, in “living the story.” The main challenge involves the participants’ abilities to live their lives while telling their stories. Picture yourself as Hilda, the teacher focused on coping with adolescent drug users in her classroom. Can you imagine yourself fully engaged in living the daily life of a classroom teacher while relaying the story of your daily events and the meaning of your actions to a researcher? You might feel as if you were having a kind of out-of-body experience in which you had to look down on yourself from above. As Connelly and Clandinin noted, “A person is, at once, engaged in living, telling, retelling, and reliving stories.”8 Now imagine yourself as the researcher who is faced with the task of recording and communicating Hilda’s story. It is no wonder that the researcher and the research participant must establish a high degree of trust and respect akin to the kind of relationship we all expect in a close friendship.

As with other methods used in qualitative research, narrative research relies on the triangulation of data to address issues of trustworthiness. As noted earlier, the data collection techniques used in narrative research are sometimes criticized as leading to fictitious, romanticized versions of life in schools. Researchers can best counter this criticism by ensuring the use of multiple data sources as well as the collaborative negotiation of the written narrative account.

In the following sections, we focus on data collection techniques somewhat unique to narrative research (e.g., storytelling, letter writing, autobiographical and biographical writing, and other narrative sources). In writing about personal experience methods, Clandinin and Connelly described these data collection techniques as “field texts”9 that are focused on capturing the essence of collaboratively created artifacts of the field experience of the researcher and the participant.

7 Narrative Inquiry, Clandinin and Connelly, 2000.
8 “Stories,” Connelly and Clandinin, 1990, p. 4.
9 “Personal Experience Methods,” by D. J. Clandinin and F. M. Connelly, 1994, in Handbook of Qualitative Research (p. 419), by N. K. Denzin and Y. S. Lincoln (Eds.), Thousand Oaks, CA: Sage.

Restorying

A characteristic of narrative research that distinguishes it from other on-site research approaches is the technique of restorying the stories that individuals tell about their life experiences. According to Creswell, restorying is “the process in which the

researcher gathers stories, analyzes them for key elements of the story (e.g., time, place, plot, and scene), and then rewrites the story to place it in a chronological sequence.”10 Often, individuals share stories about their experiences with researchers but without attention to the real-time order of events. For example, participants may share specific details of a vacation in a somewhat random sequence, backtracking to fill in earlier omissions (e.g., “Oh, I forgot to tell you that before we got to the campsite . . .”) or jumping forward as certain details of the event call to mind other, related events (e.g., “Telling you about this trip makes me think of the one we took last year, when the bear showed up . . .”). With each interview, the researcher records these recollections, amassing many pages of notes, which serve as the raw data for the narrative account. Although the notes contain many interesting stories and details, they do not constitute a narrative account of the participant’s experiences because they lack chronology and coherence. The researcher must go through the process of restorying to provide that coherence.

The restorying process has three steps:11

1. The researcher conducts the interview and transcribes the audio recording to provide a written record of the raw data from the interview. This process involves noting not just the spoken words but also the nuances of the interview (for example, humor, laughter, anger, and so on).
2. The researcher retranscribes the data (i.e., condenses and annotates the transcripts) based on the key elements that are identified in the story. For example, suppose that Hilda (our teacher at High High School) described how she copes with students who come to class under the influence of drugs. From her comments, we may identify and highlight certain themes, such as seeking assistance from a school nurse or counselor and establishing individual educational plans and contracts.
3. The researcher organizes the story into a chronological sequence with attention to the setting, characters, actions, problems, and resolutions. For example, Hilda’s story may be set in the context of her classroom with the adolescents who use drugs (i.e., characters) and may be focused on the actions of the students (e.g., their off-task behavior and other relevant classroom behavior), the problems caused by the actions (e.g., other students distracted, teacher time focused on a few students), and any resolutions to the problems that Hilda employed (e.g., seeking assistance from outside the classroom, establishing learning contracts with students).

After restorying is completed, the researcher invites the participant to collaborate on the final narrative of the individual’s experiences. For example, the educational researcher and Hilda would collaboratively construct a narrative that describes Hilda’s experiences working with adolescent drug users, as well as the meaning these experiences had for Hilda. This collaboration between researcher and participant is critical to ensure that there is no gap between the “narrative told and narrative reported.”12 One test of the trustworthiness of the narrative account is the participant’s validation that the account is representative of the individual’s lived experiences, as relayed in the interviews. A valid and clear narrative should increase our collective understanding of the phenomenon under study: in Hilda’s case, how a teacher confronts and deals with adolescent drug users in the classroom.

10 Educational Research (p. 511), Creswell, 2015.
11 Ibid.
12 Ibid., p. 514.

Oral History

One method for creating field texts is to have participants share their oral histories. An oral history may be obtained by the researcher during a structured interview schedule with predetermined questions (and hence with the researcher’s agenda clearly stated) or through an open-ended approach in which the researcher asks participants to tell their own stories in their own ways. In constructing an oral history, a researcher may ask a participant to create a time line (also known as a chronicle) that is divided into segments of significant events or memories. An oral history of a teacher working with adolescents who use drugs, for example, may include a time line from the beginning of the year (or previous years) that indicates significant events related to student drug use, such as when students were suspended from school because they violated a zero tolerance policy or when students were arrested for drug

possession. The time line is a helpful tool for the narrative researcher attempting to make sense of the importance of these events in the teacher’s overall story. The teacher may also be asked to expand on these significant events and to write a description in a journal. Together, the chronicle and journal of the teacher’s experiences provide the narrative researcher with a powerful descriptive tool.

Examining Photographs, Memory Boxes, and Other Artifacts

Teachers have a proclivity for acting like pack rats. The materials they collect, apart from the obvious curriculum materials, often include cards from former students, newspaper clippings, yearbooks, photographs, and audio and video recordings of student performances. Often, these artifacts adorn a teacher’s desk and bulletin board as badges of honor. The narrative researcher can use these artifacts as prompts to elicit details about the teacher’s life in school and in particular the phenomenon under investigation. For example, a teacher may share thank-you cards from students who, due to the teacher’s intervention, were able to kick a drug habit.

Storytelling

Narrative research affords many opportunities to engage participants in storytelling. Teachers, by nature, are master storytellers, and many will happily share stories about their experiences in school as “competent narrators of their lives.”13 The manner in which narrative researchers engage participants in storytelling sessions has a great impact on the nature of the story. That is, when storytelling is a routine part of the narrative research process, researchers can regularly add to their understanding of a “day in the life” of a teacher who is focused on finding a resolution to a challenging educational problem. Often, stories are shared when a recorder is not handy, and the researcher will have to record field notes and verbatim accounts as necessary. These stories are critical in providing insights into teachers’ work and explanations of their actions.

Letter Writing

Letter writing (or email exchange) is another way to engage participants in writing about their lived experiences and to engage the narrative researcher and participant in a dialogue. The commitment of thought to text helps both the researcher and the participant. Because email is widely available, this kind of dialogue can be easily initiated and maintained. The dialogue serves as a working chronicle of the participant’s thoughts on issues related to the research phenomenon and thus provides the narrative researcher with valuable insights into the evolving, tentative interpretations that the participant may be considering. Further, if each email includes previous messages, the narrative researcher and the participant can reflect on the evolution of the themes by reading the increasing record of the narrative dialogue.

Autobiographical and Biographical Writing

Engaging a participant in constructing or collaboratively constructing a life history through autobiographical or biographical writing has the potential to broaden the narrative researcher’s understandings about past events and experiences that have affected the participant’s experiences with the phenomenon under investigation. Perhaps Hilda, for example, has had other professional or personal experiences with adolescent drug users that would contribute to an understanding of how she operates in her current educational setting. Autobiographical or biographical writing about Hilda’s life could bring these experiences to light. Again, the use of email could provide a wonderful electronic record of the emerging narrative.

Other Narrative Data Sources

A researcher can access many other narrative data sources that can contribute to the construction of the written narrative. For example, documents such as lesson plans, parent newsletters, and personal philosophy statements are readily available. These sources provide windows into a world of classrooms that is not easily accessible to outsiders.

13 The Active Interview (p. 29), by J. A. Holstein and J. F. Gubrium, 1995, Thousand Oaks, CA: Sage.

Narrative research relies heavily on interviewing and observing, which comes with the challenges of transcribing recorded interviews and

recording field notes. Thus, the use of new, readily accessible digital dictation tools is described in the Digital Research Tools for the 21st Century feature.

Writing the Narrative

The final step in the narrative research process is writing the narrative, which is again a collaboration between participant and researcher. Many data collection techniques used in narrative research result in products (such as email, letters, and a participant’s biography) that often end up as part of the final written account. Given the collaborative nature of narrative research from the beginning until the end, the negotiation of the final narrative account should be relatively easy to achieve. However, it is worth remembering that the goal in conducting narrative research is to “learn about the general from the particular.”14 As such, we should be modest in the claims we make for the collaboratively constructed written narrative that is the final product of our research efforts.

14 “Narrative Analysis,” Riessman, 2002, p. 262.

Digital Research Tools for the 21st Century

Dragon Mobile Assistant, Dragon Dictation, and Dragon Dictate for Mac 3

Speech recognition programs have been available for many years but were often cumbersome to use and expensive to purchase. However, there are now many smartphone and computer applications available that will save the narrative researcher some of the time spent writing field notes and transcribing interviews.

Dragon Mobile Assistant

A new app for your mobile phone, Dragon Mobile Assistant combines the easy-to-use voice recognition software application with a host of other tools for the on-the-go researcher. Need help scheduling an interview? Check your calendar and send an email to your research participants while driving to another research site. This free app can help record your field notes, send emails and texts, and make your dinner reservations while automatically detecting the need for hands-free operation.

Dragon Dictation

Dragon Dictation is an easy-to-use voice recognition software application that allows you to speak and instantly see your content in a text form that can be edited, emailed, or even posted to blogs. With a little practice, Dragon Dictation gives the researcher the potential to record observations, field notes, and interviews at five times the speed of typing on a keyboard. This is also a great tool to use to record your thoughts in the car while you are driving to your home or office, and best of all, it’s a free application for smartphone users. As Dragon Dictation claims, “Turn talk into type while you are on the go.”

Dragon Dictate for Mac 3

If you’re not comfortable with talking and driving and you are looking for a more advanced software package, Dragon Dictate for Mac 3 allows you to convert talk to type at a computer (and to interact with Mac applications by using only your voice). This program could be used to record interviews with research participants and would therefore save the researcher time spent transcribing. Unlike Dragon Dictation, it is not free, but it may become your favorite computer application and narrative research time-saving tool.

Summary

Narrative Research: Definition and Purpose

1. Narrative research is the study of the lives of individuals as told through the stories of their experience, including a discussion of the meaning of those experiences for the individual.
2. Narrative research is conducted to increase understanding of central issues related to teaching and learning through the telling and retelling of teachers’ stories.

Types of Narrative Research

3. How a narrative research approach is characterized depends on who authored the account, the scope of the narrative, the kind of theoretical/conceptual framework that has influenced the study, and whether all these elements are included in the one narrative.

Narrative Analysis and the Analysis of Narrative

4. In narrative analysis, the researcher collects descriptions of events through interviews and observations and synthesizes them into narratives or stories. Analysis of narrative is a process in which the researcher collects stories as data and analyzes common themes to produce a description that applies to all the stories captured in the narratives.

The Narrative Research Process

5. The relationship between researcher and participant must be a mutually constructed one that is caring, respectful, and characterized by an equality of voice. Participants in narrative research must feel empowered to tell their stories.
6. A narrative researcher first identifies a phenomenon to explore, selects an individual, seeks permissions, and poses initial research questions. After determining the role of the researcher and the data collection methods, the researcher and participant collaborate to construct the narrative, validate the accuracy of the story, and write the narrative account.

Key Characteristics of Narrative Research

7. Narrative research focuses on the experiences of individuals and their chronology.
8. Narrative research uses the technique of restorying to construct a narrative account based on data collected through interviews.
9. Narrative research incorporates context and place in the story.
10. The construction of a narrative always involves responding to the question “And then what happened?”

Narrative Research Techniques

11. Narrative researchers employ a number of unique data collection techniques, including restorying, oral history, examination of artifacts, storytelling, letter writing, and autobiographical and biographical writing.

Writing the Narrative

12. Writing the narrative is a collaboration between participant and researcher.


For Whom the School Bell Tolls: Conflicting Voices Inside an Alternative High School

Jeong-Hee Kim
Kansas State University

Abstract  This article is a study of conflicting voices inside an alternative high school in Arizona. Voices of alternative schools are, quite often, not included in the discourse of curriculum reform even though the number of alternative schools is growing every year. Bakhtinian novelness of polyphony, chronotope, and carnival are incorporated into an arts-based, storied form of representation to provoke empathic understanding among readers. Multiple voices (polyphony) of the school are juxtaposed within a certain time and space (chronotope) while all the different voices are valued equally (carnival) to represent conflicting views on public alternative school experiences. The purpose of the article is to provide readers with vicarious access to tensions that exist in an alternative school, so that they may engage in questioning the nature and purpose of these spaces. In so doing, the study aims to promote dialogic conversations about “best practice” for disenfranchised students who are subject to experiencing educational inequalities in the current era of accountability and standardization.

Margin note: The purpose of this research is to promote dialogic conversations among educators about ways in which educators can better serve a growing number of students who are at risk of school failure. Specifically, the research uses an arts-based narrative research approach to capture the voices of five participants (“inhabitants”) of the alternative high school. Is this purpose statement clear enough to allow the reader to understand the research questions that will be investigated?

Introduction

(01) One of the school experiences that are available for teenagers who dropped out or were expelled from traditional high schools is the alternative school. One of its goals is to provide students with a second chance at school success. Although definitions or characteristics of alternative schools vary by state or even school district, one of the commonalities they share is that students who attend an alternative school did not do well in traditional schools. These students tend to be labeled as “at risk” of school failure no matter how much potential they may have, and are likely to be excluded in the discourse of curriculum reform. As Oakes points out in the foreword for Kelly (1993), alternative schooling tends to perpetuate social, political, economic, and educational inequalities and continues to be an undercurrent of education without scrutiny. While many alternative education programs serving the growing population of at-risk students are run by school districts, little research has been done to evaluate the success or the failure of the public alternative schools or programs (Conley, 2002).

(02) This article is a case study of Borderlands Alternative High School (pseudonym) in Arizona, which is a public school that serves about 250 students. Five different voices of its inhabitants: the principal, the security guard, a teacher, and two students, are presented in arts-based, narrative inquiry. These voices reveal tensions and conflicts that exist inside Borderlands, which may reflect issues and problems that exist in other alternative schools. Rather than to provide a final solution, the purpose of the article is to promote dialogic conversations among educators about ways in which educators can better serve a growing number of students who are at risk of school failure. The article begins with a brief review of the literature on alternative education, then specific research methods are considered, next the theoretical

Find more at www.downloadslide.comframework of Bakhtinian novelness is briefly explicated, this is followed bythe voices of the five protagonists, finally in the epilogue, the voice of the re-searcher is presented.Review of the Literature on Alternative Education (03) (04)Alternative education proliferated in the United States in the late 1960s and theearly 1970s as educational priorities shifted back to the progressive educationmovement. People who were unhappy with traditional curriculum hailed alter-native public schools that subscribed to the ideas of progressive education,which called for a free, open policy that emphasized the development of self-concept, problem solving, and humanistic approaches (Conley, 2002; Goodman,1999; Raywid, 1995; Young, 1990). These alternative schools attempted to offerplaces where students would have greater freedom and opportunities for suc-cess than in traditional schools, affirming that one unified curriculum could notbe sufficient for all (Conley, 2002). Many disgruntled parents transferred theirchildren to alternative schools that incorporated the concepts of “Free School”and “Open School” into the school curricula in order to meet students’ differentlearning styles, needs, and interests. However, most alternative schools of thisera were short-lived for various reasons, e.g., internal financial mismanage-ment, public pressure for school accountability and the “Back to Basics” move-ment that followed in the 1980s (Marsh & Willis, 2003). In the mid 1990s, alternative learning programs and schools includingpublic and private voucher programs, charter schools, and magnet pro-grams, started emerging in an effort to solve issues of poor student achieve-ment, ineffective pedagogical methods, and the increasing inability to meetthe needs of diverse families (Conley, 2002). Alternative schools in this era“satisfy the need to provide choice and diversity within a monopolistic­bureaucratic giant of public education” (Conley, 2002, p. 
177). For instance,alternative schools in Washington State have been successful as an alterna-tive to traditional public education, with schools effectively meeting stu-dents’ different needs (see Billings, 1995). Billings states:Experiential learning, off-campus course work, learning contracts, demo-cratic decision making, new learning environments, restructuring of time,outcome-based credit, parental involvement, project based learning, sensitiv-ity to diverse learning styles, process focused curriculum, and small size arejust a few of the features that have long characterized alternative schools inWashington. (p. 1) Other recent research on alternative education, however, shows that the (05)public views alternative schools as places for students whose behaviors are (06)disruptive, deviant, and dysfunctional (see Dryfoos, 1997; Howell, 1995; Leone,Rutherford, & Nelson, 1991; Mcgee, 2001). Rather than being recognized as al-ternative solutions for students whose needs are not being met by traditionalschools, alternative schools are believed to exist to keep all the “trouble­makers” in one place in order to protect the students who remain in traditionalschools (Mcgee, 2001; National Association of State Boards of Education, 1994).They also tend to work to keep the expelled students off the streets in order toprevent them from committing a crime (Sprague & Tobin, 2000). Furthermore,Nolan and Anyon (2004) raise a concern that some alternative schools serve as“an interface between the school and the prison,” calling it the “school/prisoncontinuum” (p. 134). According to the first national study about public alternative schools andprograms conducted by the National Center for Educational Statistics (NCES),there were 10,900 public alternative schools and programs serving approxi-mately 612,900 at-risk students in the nation during the 2000–2001 schoolyear (National Center for Educational Statistics, 2002). 
NCES also reported that alternative schools were disproportionately located in urban districts, districts

with high minority students, and districts with high poverty concentrations. This situation, in some cases, has rendered alternative schools as “enclaves for black, Latino, native American, and poor white students” (Arnove & Strout, 1980, p. 463), and “warehouses for academically underprepared sons and daughters of working-class families or single parents receiving welfare” (Kelly, 1993, p. 3).

(07) More specifically, in the State of Arizona, the State Department of Education announced formal definitions of alternative schools in 1999. According to the Arizona Department of Education (ADE), the school must intend to serve students exclusively in one or more of the following categories: students with behavioral issues (documented history of disruptive behavior); students identified as dropouts; students in poor academic standing who are more than one year behind on academic credits, or who have a demonstrated pattern of failing grades; pregnant and/or parenting students; and adjudicated youth (Arizona Department of Education, 2002). Every alternative school must meet the “achievement profile” provided by the ADE in the information packet on Standards and Accountability. This profile includes: ninety-five percent (95%) of students taking Arizona’s Instrument to Measure Standards (AIMS), which is a state exit exam that all high school students have to pass to be able to graduate with a high school diploma; decreasing dropout rate; and increasing percentage of graduates who demonstrate proficiency on the Standards via AIMS. Every alternative school is expected to have 100% of graduates demonstrate proficiency on the Standards via AIMS by 2006 (Arizona Department of Education, 2002).

(08) The research site, Borderlands Alternative High School, is one of the twelve public alternative schools in the East Valley school district in Arizona.
Borderlands houses students from ninth through twelfth grade and accepts students only by referrals from principals of conventional public schools. Enrollment at Borderlands has increased every year since the school opened in 1999. One hundred and fifty-two students enrolled at Borderlands during the 1999–2000 school year, 291 students during the 2000–2001 school year, and 350 students during the 2001–2002 school year.

Research Methods and Methodology

(09) Fieldwork was conducted from August through December 2003. Data were collected Monday through Thursday, about five hours each day, by means of observation and participant observation. I took part in classroom activities, interacted with students and faculty, helped students with schoolwork, and invited them to talk about their school and life experiences while having lunch. A main approach to the fieldwork was “conversation as research” (Kvale, 1996), in which conversations about school experiences and daily life with students, teachers, and the school staff were made during break time, lunch hours, and in class. This approach not only helped me build informal relationships with each member of the school community, but also helped me understand the ways the school was perceived by them.

Researcher’s role in the study was as an observer and participant observer. The researcher participated in “classroom activities, interacted with students and faculty, helped students with schoolwork, and invited them to talk about their school and life experiences while having lunch.” How does the researcher’s role reflect the goals of narrative research?

(10) More formal conversations with students and staff took the form of semi-structured interviews with open-ended questions. The five protagonists in this study: Mrs. Principal, Mr.
Hard (pseudonym, school security guard), Holly (pseudonym, female student), Jose (pseudonym, male student) and Ms. Bose (pseudonym, teacher), were interviewed individually during their school hours except for Ms. Bose. Ms. Bose invited me to her home for dinner where the interview was conducted. Each interview lasted about an hour and a half. The interviewees were asked to talk about their backgrounds, views on the alternative schooling, and their school experiences. Interviews were tape-recorded and then transcribed.

Data collection methods included the use of semi-structured interviews with open-ended questions.

(11) In terms of research methodology, this study employs narrative inquiry, which has become an increasingly influential technique within teacher education during the last decade (Goodson, 1995). Using narrative inquiry,

educational researchers interrogate the nature of the dominant stories through which we have shaped our understandings of education, and challenge the view of schooling framed in a predictable, fragmented, and paradigmatic way (Casey, 1993; Connelly & Clandinin, 1990; Goodson, 1995, 1992; Munro, 1998; Sparkes, 1994). In this study, data are analyzed through narrative analysis or narrative configuration. This is the “procedure through which the researcher organizes the data elements into a coherent developmental account” (Polkinghorne, 1995, p. 15). That is, in the process of narrative analysis, the researcher extracts an emerging theme from the fullness of lived experiences presented in the data themselves and configures stories, making a range of disconnected research data elements coherent, so that the story can appeal to the reader’s understanding and imagination (Kerby, 1991; Polkinghorne, 1995; Spence, 1986).

Data were analyzed through narrative analysis that resulted in a narrative that captured themes from the lived experiences of the study’s participants.

Researcher collaborated with participants to construct first-person accounts of their experiences. Did the researcher employ restorying as the technique for constructing the participants’ narratives?

(12) This narrative analysis creates arts-based research texts as an outcome of research. According to Barone and Eisner (1997), some qualities that make educational stories arts-based texts include: the use of expressive, contextualized, and vernacular forms of language; the promotion of empathic understanding of the lives of characters; and the creation of a virtual reality. A virtual reality means that the stories seem real to the reader so that the reader is able to identify the episodes in the text from his/her own experiences, and thus believe in the possibility or the credibility of the virtual world as an analogue to the “real” one (Barone & Eisner, 1997). Virtual reality is an important element of an arts-based text as it promotes empathic understanding of the lives of the protagonists.

(13) In this article the five protagonists share their backgrounds, views, emotions, and reflections about their alternative school experiences using their expressive, contextualized, and vernacular language. Their stories are constructed in the first person. When stories are told in the first person, they can give the reader the illusion of spontaneous speech, that is, “the impression of listening to an unrehearsed, rambling monologue” (Purcell, 1996, p. 277), contributing to the creation of a virtual reality.

Theoretical Framework: Bakhtinian Novelness

(14) Through narrative inquiry, educational researchers try to understand the lived experiences of teachers or students and transform this understanding into significant social and educational implications (Phillion, He, & Connelly, 2005). Using Bakhtinian novelness as a theoretical framework is particularly important in the story-telling nature of narrative inquiry as it facilitates the understanding of human experiences in a social and educational context. It allows each protagonist to speak for him- or herself, while there is no single, unified point of view that dominates (Tanaka, 1997).

(15) According to Bakhtin (1975/1981), all stories are not the same. Depending on what kind of purpose a story has, it becomes either an epic or a novel. In an epic, stories are told from one point of view in one language, outside of considerations of time and particular places. There is only one world, one reality that is ordered and complete. On the other hand, a novel represents many languages competing for truth from different vantage points. The world of the novel is incomplete and imperfect. There is not a sense of formal closure in a novel: “One may begin the story at almost any moment, and finish at almost any moment” (Bakhtin, 1975/1981, p. 31). This “impulse to continue” and “impulse to end” are found in novels and they are possible only in a world with open-endedness.

(16) Bakhtin posits three concepts to specify the nature of the novel, or “novelness”: polyphony, chronotope, and carnival. First, polyphony, or a language of heteroglossia, refers to “a plurality of independent, unmerged voices and consciousness” (Bakhtin, 1963/1984, p. 6). The polyphonic, dialogized heteroglossia of the novel involves a special multivoiced use of language, in which no language enjoys an absolute privilege. Different

languages are used and different voices are heard without having one voice privileged over the others. Each language or voice is continually tested and retested in relation to others, any one of which may turn out to be capable of becoming as good or better a language of truth, if only tentatively, on a specific set of occasions, or with respect to particular questions (Morson & Emerson, 1990). In this way, the novel can offer rich images of languages. The creation of images of languages is, in turn, a form of sociological probing and an exploration of values and beliefs, and these images are tools for understanding the social belief systems (Morson & Emerson, 1990).

(17) The second concept of novelness, chronotope, emphasizes time and space. For Bakhtin, polyphony is not enough to promote dialogic conversations. A chronotope is a way of understanding experiences; it is a specific form-shaping ideology for understanding the nature of events and actions (Morson & Emerson, 1990). For the voices to reflect believable individual experiences, they should be put in particular times and particular spaces. Bakhtin (1975/1981) states that “time, as it were, thickens, takes on flesh, becomes artistically visible; likewise, space becomes charged and responsive to the movement of time, plot and history” (p. 84, cited in Morson & Emerson, 1990). Chronotope, therefore, becomes important in understanding our lives as individuals and social beings.

(18) The third concept of the dialogic nature of “novelness” is the concept of carnival or “the carnivalesque.” Carnival, according to Bakhtin (1975/1981), is a concept in which everyone is an active participant, openness is celebrated, hierarchy is invisible, and norms are reversed, like in popular festivals. The carnivalesque novel, through “laughter, irony, humor, and elements of self-parody” (Bakhtin 1975/1981, p.
7), offers an unofficial truth, where the symbols of power and violence are disturbed and counter-narratives are promoted with equal value. The novel is indebted to the spirit of carnival in creating a genuine dialogue. Bakhtin believes that the novel should play the same role in literature that carnival is alleged to play in the real life of cultures (Morson & Emerson, 1990). One formal and privileged way of life or way of thinking is discarded, but different views and styles are valued by representing the wide range of languages and experiences in the novel. In the carnival, voices of the marginalized or silenced are promoted and respected.

(19) In brief, using Bakhtinian novelness of polyphony, chronotope, and carnival as a theoretical framework is particularly effective for the issues of power, resistance, tensions, and conflicts that occur in schools (Tanaka, 1997). As such, conflicting voices heard in a text with Bakhtinian novelness may “raise important questions about the topics under discussion, challenging the reader to rethink the values that undergird certain social practices” (Barone, 2001, p. 157).

(20) In the following narratives, you will hear five different voices: first, Mr. Hard is the school security guard, a big, White, middle-class, former police officer, who has been working at Borderlands for two years; second, Holly is a ninth grader, White, working-class girl, who wants to be a lawyer; third, Ms. Bose is a White, Italian descent and ninth grade science and math teacher, who has been working with at-risk students for 25 years; fourth, Jose is a half-Hispanic and half-White male student, who wants to be a great musician; and finally, Mrs. Principal is a White, middle-class administrator, who is devoted to making her school an “achieving” school.

The Voice of Mr. Hard, the Security Guard

(21) I am the security guard at this alternative high school. I got retired from a police department where I worked for 20 years before I came here.
My wife is a director at a hospital here in Phoenix. Her job brought us here from

Pittsburgh two years ago. I have two sons and a daughter. Two of them are happily married, and my youngest son is in college. My hobby is fixing and building stuff around the house on weekends, and Home Depot is my favorite shopping place.

This is my second year in this school, and I’ve been enjoying my job so far. My main responsibility is to make sure that our school is a safe place. As you know, kids these days can be dangerous. Especially kids in this school have a lot of problems that regular schools don’t want to deal with. That’s why they are here. A lot of kids have a criminal history. Some kids have already been to jail. My previous career working as a cop has helped me a lot dealing with these kids who have a potential to commit a crime. That’s why I got hired so quickly. Our principal whom I’m closely working with gave me the authority to be in charge of the student discipline. My position here is to be a hard-liner. I’m the final set of rules that students have to abide by. That’s my background. I spent a lot of money on my education at the police academy and I’m bringing that knowledge to discipline these kids. That’s what I like about my job. I try to help them succeed by using my resources. If a student fails to go by rules, then he or she has to deal with me. You know, they’re here because they can’t control their attitudes. They can’t control what they’re saying. They are violent, throw temper tantrums, and talk back. There are different ways to deal with them and they are not in the textbook.

Teachers can be flexible. When they don’t want to deal with disruptive students, they can send them to me. My job here is to inculcate rules to kids. Some of you go to football games Sunday afternoon. When there are no referees, what kind of game is it? It’s going to be a mess, right? With referees and rules, we have an organized game. Likewise, I’m the referee here. I’m the rules.
Students have to face me if they don’t follow the rules. I’m the one who keeps the game organized, and keeps the game from getting out of hand. My responsibility is to maintain the rules. We’re trying to help these kids become successful young adults in the society. In that sense, we’ve been very productive. I’ve seen a lot of difference among students since I started working here.

Kids try to avoid me at school. Out of sight, out of fight. I know they don’t like me. That’s fine with me. I don’t want to be liked. I just want to be respected. Don’t get me wrong. I’m not saying that I don’t have sympathy for them. I do feel sorry for these kids because they have a lot of baggage. They come from broken, poor, and abusive families. They don’t fit the mainstream. They have lost the idea of where the main road is. So, our job is to put them back on the right track. It can be done only by strict discipline. They need to learn how to behave so that they can function in a society as a cashier or something. If they don’t follow the rules, we kick them out of school. In fact, we suspended a lot of students this year. It’s our way of showing them they are wrong.

As you can imagine, we have a zero-tolerance policy for students who violate school rules. Holly has been my target these days. She is just impossible. I don’t know what she’s gonna turn into in the future. She’s violent and gets into trouble every other day. She smokes, violates dress codes, and talks back to teachers, just to name a few. We have given her several warnings. She’s quite smart, but being smart doesn’t count here. What matters is whether or not one obeys the rules. On the first week of October, I caught her smoking in the restroom again. When I asked her to come with me, she wouldn’t. So I tried to call the police, but Holly picked up a handful of rocks and started throwing them at me. She was ferocious! We gave her a five-day suspension. And then, our school threw a Halloween party for students three weeks later.
Teachers and staff donated money to buy hamburger patties, sausages, and other stuff for students. I brought my own barbeque grill

and tools from home and took charge of barbequing. I was happy to be the chef of the day. I was happy to see students relaxing, having fun, and enjoying food that I cooked. It was so nice to see students and teachers mingling together, playing basketball and other games. It was a nice change. The party was going well for the most part. But, right before the party was over, Holly got into an argument with this Black girl, Shawnee. Holly got mad at her and mooned Shawnee who was with other ninth graders. This incident was reported to the principal, who called Holly’s mom to ask her to appear at the school the next day. Holly got expelled after the “happy” Halloween party. Hope this expulsion will teach her something!

The Voice of Holly, the Goofy Snoopy

(27) My name is Holly. I just turned fifteen in July. I was born in Mesa, Arizona, and have never moved out of Arizona. I’m a White girl with a little bit of Native American descent from my mom’s side. I heard my mom’s great-grandma was some sort of a Native American. I don’t know what tribe, though. I’m tall, about five feet seven inches, and have long blonde hair with red highlights. I like to wear tight, low-rise jeans and a black “dead-rose” shirt that has a picture of a human skull surrounded by roses. I used to wear the Gothic style of clothes in my junior high, all in black from head to toe, wearing heavy, clumpy army boots. But I got tired of it, so, now I’m into Punk. I have a tattoo on my lower back and have a silver ring on the center of my tongue. I got my tongue pierced on my 15th birthday. I like it a lot. My mom hates it, though. But I don’t care. She hates whatever I do, anyway. She’s a bitch. She works at a car body shop, buffing and painting old cars with her boyfriend who is living with us. I can’t wait to leave home. As soon as I turn 18, I’ll say bye to them and leave home. I’m tired of them ordering me to do this and that.

(28) Anyways . . . My nickname is Snoopy.
I got it in eighth grade for jumping and dancing like Snoopy at the Fiesta Shopping Mall. I just felt like doing it. People gathered around me and shouted, “Snoopy, Snoopy!” I did that for an hour. I didn’t feel embarrassed at all. Since then, my friends started calling me Snoopy. They think I’m goofy. Yes, I am goofy. I don’t care what others think about me. If I feel like doing something, I just do it. No second thought. But at school, I get into trouble because of that. Teachers don’t like my personality. They think I’m just acting out. In fact, I was very upset when Ms. Bose told me the other day to change my personality. Do you know what she told me? She said, “I don’t like your personality. You need to stop acting out. You need to change your personality. Then, your school life will be a lot easier.” I said to myself, ‘Bullshit!’ Change my personality? It took me fifteen years to develop it, for Christ’s sake! I don’t care if she likes it or not. I’m unique. I’m different. I have my own opinions unlike other kids. But teachers think I’m acting out, disruptive, unruly, and rude. Because I like to speak up, I have a history of being kicked out of classrooms and sent to ALC (Alternative Learning Center) where other “disruptive” kids are isolated, supposedly working on their individual assignments.

(29) My friends like to talk to me about their personal issues because I give them a solution. Having said that, I think I have a leadership personality. I want to be a lawyer. I like to argue with people: my mom, her boyfriend, teachers, and my classmates. I win them all. Teachers are actually my worst enemies, but I’m not scared of them. A lot of times, they don’t make sense. Last week, for example, I whistled in Ms. Bose’s math class because I was happy to finish my work sheet earlier than other kids. Well, we’re supposed to be ninth graders, but we were learning things that I had already learned in seventh grade. So this worksheet was super easy for me.
So, I whistled to let everybody know that I finished my assignment. But here goes Ms. Bose.

“Holly! Stop whistling. You’re getting a zero point for today for being disruptive.” “What? I’m getting a zero point even though I finished my assignment? That doesn’t make sense!” “Yes, you’re getting a zero point no matter what, because you are being disruptive.” “Fine! If I’m getting a zero point for the day, I might as well keep whistling. What the hell!” I just kept whistling. Ms. Bose started yelling at me, “Holly, stop whistling right now! Otherwise, I’m gonna call the office.” “Whatever!” It was one heck of a yelling match. Finally, Ms. Bose called the office. Five minutes later, Mr. Hard came to our classroom to get me. He took me to the ALC. So, the day became another “do-nothing-at-school” day.

(30) This school sucks, if you ask me. They put a bunch of “bad” kids here all together like a warehouse. There is nothing attractive here. Look at these ugly portable buildings without any windows. They are called “classrooms.” We don’t have a cafeteria, so we have to eat our lunch at outdoor picnic tables near the restrooms. We get to enjoy this picnic every single day even under the hot temperature of one hundred five degree heat of the desert. Go figure. We use old, “hand-me-down” textbooks that came from a neighboring high school. It’s like we are the disposables of education. We don’t mean much. Our classes have six or seven students. I like this small class. But we don’t really cover all the stuff in the textbook. We learn easy stuff, and I get bored with that. I had to do the multiplication table again because our Mexican boy, Guillermo, didn’t know how to do multiplications! When I run into difficult stuff, I just copy answers from the textbook to fill out the worksheets without understanding. And I get a good point format as long as I behave. I want to be a lawyer. But I don’t know if I will ever be able to achieve my dream. I know I’m not stupid. But there is no counselor I can talk to about it.

(31) There are more rules and regulations here than regular schools. Look at Mr. Hard, the old, fat, security guard who retired from the police department. I hate that guy. He is obsessed with rules. He goes, “Follow the rules, follow the rules. That’s the rule number one here, otherwise you deal with me.” We try to avoid running into him because he will make sure to find something wrong with us. He randomly calls one or two kids into his office and starts searching their backpacks. We hate it.

(32) It’s such an insult. Recently, Mr. Hard has been watching me like a hawk. I don’t know when I became his target. Somehow, he decided to pick on me. On a gloomy day in October, I felt like smoking. The weather was weird, and I had a fight with my mom again that morning. I was having a bad day, you know. I needed to smoke to release my stress. When I was smoking in the restroom, Mr. Hard caught me on the spot. He asked me to come with him to his office. I said no. He asked me again. I said no again. Then, he started calling the police. I quickly grabbed some rocks on the ground and threw them at the son of a bitch. He ran away like a chicken with his head chopped off. I beat him finally! That night, I had a dream of him. I had a screw driver and shoved it into his neck, saying, “Leave me alone!” He was scared of me!

The Voice of Ms. Bose, the Boss

(33) “Hey, guys. There are times when I’ll be asking you to leave the classroom if you get on my nerves. When I say ‘Leave,’ I want you to get out of here. Get out of my sight for five minutes or so, go walk around or something and come back in, instead of fighting me. I’m the boss here. I’m the dictator. It’s me who makes a decision for you guys. So you have to follow my order. They pay me a lot of money to keep me here. I get paid more than any other teachers here. Yes. I make a lot of money for educating you to become a good person.
So when you and I have an argument and when I say to you to get out of here, you need to leave the classroom for five minutes.”

(34) This is what I usually say to my students in the first day of class. It is important to let them know who is the boss here. Otherwise, they will be out of control. I’ve been teaching for almost 25 years including five years of teaching at a prison in St. Louis before I moved to Arizona. After taking a break from teaching for a couple of years to raise my boys, I volunteered to teach at the poorest and worst school where there was nothing but gangsters. I never wanted to teach at a “nice” school where all the good kids attended. It is my strength that I can easily be sympathetic with kids who have issues and problems, like gang members, because I have been there. I myself came from a poor immigrant family background from Italy. I grew up in a poor area where crimes took place every day. I know what it is like to live in poverty. My father was a cop, but his paycheck was not thick enough to feed seven family members in the 50s and 60s. I still remember those days when our family had to skip meals as often as we ate. From that kind of environment, I learned to be tough. I needed to be as tough as iron to be able to survive. I also learned to control rather than being controlled.

(35) I enjoy teaching at-risk kids. I have never been afraid of those kids even though some of them are gangsters. I believe that we, human beings, are basically the same, no matter how stupid or how smart we are. We are all vulnerable and fragile. We all make mistakes and regret. We tend to repeat the cycle. But I need to teach these kids to break the cycle. I have to be a therapist first rather than a teacher in order to be able to do that. Teaching how to read and write can’t be a main focus. What they need is a mental therapy, not an education, because they are “emotionally handicapped.” It is their poor emotional well-being and low self-esteem that causes them to get into trouble.
(36) But under the No Child Left Behind of 2001, terms like achievement, accountability, standardization, and testing, have become our everyday language at schools. Alternative schools are not an exception. In the year 2003, we got a new principal who believed that these kids needed to be taught to standardized tests. According to Mrs. Principal, my therapeutic method was not helpful in raising students’ test scores. She said teachers need to focus on teaching kids how to take the standardized test, especially AIMS, if we don’t want our school to be shut down under the NCLB. But, look, these kids are former drop-outs from regular schools. They are way behind their grade level because they have been skipping classes. There is no way we can make them pass the AIMS test until we subdue these kids’ acting-out behaviors first. That’s why Mr. Vee, our former principal, left this school. He couldn’t stand the pressure from the school district about these alternative kids meeting the standards. He was the kind of liberal educator (too liberal for me, by the way) who emphasized students’ personal growth. Portfolio assessment was one of his initiatives that tried to help students to reflect on their growth. But Mrs. Principal got rid of that. I didn’t care for the portfolio assessment anyway, but it just shows how our school is changing under this accountability and standards movement.

(37) My perspective on educating these kids is different from both principals. My focus here is to get them to listen, that is, to make them behave and make them be positive. A lot of kids have ADHD (Attention Deficit Hyperactivity Disorder). A good example is Holly. She cannot sit still for a minute; she’s loud, annoying, and disturbing. She’s pretty smart, at least smarter than other kids here, but her problem is that she likes to argue with everybody. She knows it all. She tries to get tough with me, but I am tougher. I let her know who is the boss here.
I have a firm belief that resistance from students like Holly needs to be controlled by strong authority. I can be as warm as freshly baked bread when students listen to me, but I can also be as tough as iron when they don’t listen. I believe that a good education for these kids is to teach them to behave and have a good positive attitude, so that they can function well in this society. I mean, what kind of boss would want to have an employee like Holly who talks back and is disobedient? It’s

my responsibility to teach my students to have good attitudes, which will eventually lead them to get a job after graduation. Here’s my phrase for my students: “Attitude, attitude, attitude. You gotta have a good attitude.”

The Voice of Jose, the Silent Rebel

(38) I am 17 years old, about five feet ten inches tall. I was born in California on April 23rd. I’m half-Hispanic, half-White. My biological dad is Hispanic from Mexico. I haven’t seen him since my parents got divorced when I was three. My mom got remarried when I was five, and that’s when we moved to Arizona. Since then, my mom went through two more divorces, and now she’s with her fourth husband. Right now, I’m living with my mom, my older brother, my fourth step-dad, and two of his children. My mom changed jobs several times, and she currently works as a gate-keeper for a housing company. Her current husband is a construction worker. I’m supposed to be a senior but am taking junior classes due to the lack of credits. I have attended Borderlands for two years to catch up with credits. I was in and out of school during my freshman and sophomore years because I was struggling with a lot of personal issues. My mom’s frequent divorces and remarriages have badly affected me. I went to jail a couple of times for doing drugs, which I started when I was fifteen, and I’m on probation because of that. In addition, I was in a rehabilitation center for eight weeks for being depressed and suicidal. I used to be in serious depression, and used to cut myself with a razor. But the rehabilitation program didn’t do much good for me because I’m still depressed most of the day and not talking to anybody.

(39) There is one thing that keeps me going, though. It is music. Whenever I feel frustrated and depressed, I play the guitar. Bass guitar. That’s what keeps me sane. I express myself through music. I write a song, sing, and play. I also organized a band with my friends like six months ago. The garage of my house is our practice room. We get together once a week, sometimes twice a week for practice. We’re planning to play at a bar on Saturdays when we play better. Actually, some kids at school asked us to play at the Halloween party. We asked Mrs. Principal for her permission, but she said ‘no’ after she examined our lyrics. Her reason was that our music was not appropriate for a school environment. She said there were too many cuss words in our songs, so students would be badly influenced by our music. We were pissed off when we heard it. Kids would have loved it! What does she know about pop music, hard rock, or punk rock? Nothing!

(40) I bet she doesn’t know who Jim Morrison is. I’m sure she has never heard of the legendary band, the Doors. Morrison is my idol, although he died even before I was born. Morrison and his music influenced me so much. It was Jim Morrison who taught me how to see the world, not the teachers, not my parents. I see the world through Morrison’s eyes and his music. I wanna be a great musician like him. He wrote songs and poems. I love his poetry. Through his poetry, some of which became the lyrics of his songs, he criticized the society for destroying people’s souls with money, authority, and momentary pleasure. His songs are about the feeling of isolation, disconnectedness, despair, and loneliness that are caused by the problems of society. He was a free soul who was against authority. He taught me to stand up for myself to be able to survive in this world. He taught me to stand against authority. Maybe that’s why I cannot stand Mr. Schiedler, our social studies teacher. I call him a “lost soul.” Whenever I say something that challenges what he says, he goes, “Be quiet!”, “Shut up!” He is a BIG controlling dude. He has to make an issue about everything I do. He doesn’t understand students at all. He just thinks we are a bunch of losers.

(41) In fact, many teachers are lost souls.
I’ve been attending this school fortwo years, but I find teachers to be so annoying. They are only interested inkeeping their job, so they just regurgitate the stuff they are supposed toteach and show no compassion. A lot of things they teach are biased and 385

Find more at pointless. Just straight facts that have nothing to do with life. There is so much going on in the world, and there are so many other things we need to learn about. But all we do, like in Mr. Schiedler’s social studies class, is to  copy a bunch of god damn definitions of terms from the textbook and take a test that has 150 questions on it. One hundred fifty questions! I don’t even read the questions. I just choose answers in alphabetical order: A, B, C, D, A, B, C, D . . . . (42) Teachers expect us to believe whatever they say. It’s like going against them is a sin. I think it’s propaganda that brainwashes and pollutes students’ minds. But not mine. Jim Morrison taught me not to believe everything that adults say. That’s why I get into so many arguments with teachers. I give them a piece of my mind. I have gotten suspended and kicked out of school many times, but I don’t care. Schools don’t mean much to me. I have a tattoo on my right arm. It is one red word, “Revolution.” The Voice of Mrs. Principal (43) How did I get here? Umm, it was last year, November 2002, when my district office contacted me and asked if I wanted to transfer to Borderlands as ­principal. I was told that Mr. Vee had to resign because he was having some issues with the district office. At that time, I was an assistant principal at a junior high, which was also an alternative school for 6th thru 8th grade. Of course, I happily accepted the offer because it was a promotion for me. For 20 years of my involvement with education, I always liked working with those at-risk kids who were struggling in every way. It’s a challenge, but it’s a good challenge that I enjoy because I feel much more successful and much more needed. (44) I started my job here in January this year. My district superintendent told me that our school would be a “referral basis only” starting spring ­semester. It means that our school is not a choice school any longer. 
If there are students who are deviant, unruly, disruptive, skipping classes, and violating school rules, a school principal refers them to me. It has made my job more difficult especially under the NCLB because we have to spend a lot of time dealing with students’ behavioral problems when we can use that time for preparing them for the tests. (45) I brought several teachers with me from my previous school because it’s easy for me to work with teachers whom I know and trust. They are like my buddies. And they know me well. They know I have a ranch home far away from the school with three horses. They know my 15-year-old daughter is into horseback riding and enters a horse race every spring. Actually my hus- band and I took her to a horse show held in the West World close to Scottsdale two weeks ago. Yea, we like horses. That’s an important part of my personal life. (46) Sorry about the digression. Anyway, the teachers who came with me are very cooperative in making the school run smooth. They are not only teach- ing subject matter but also teaching kids social and life skills. They work on disciplining the students. We have a zero tolerance policy for anybody who violates the school rules and regulations. Mr. Hard has been playing a key role in implementing the policy. He’s really good at taking care of kids who have issues of drugs, violence, smoking, fighting, etc., all kinds of problems our students have. Since he started working with us, discipline issues got a lot better. Kids are scared of him. They try to follow the rules as much as they can, so that they don’t need to face him. Holly and Jose have been ex- ceptions, though. They tend to act out too much, making a bad influence on others. The other day, Jose was trying to bring his band to school for the Halloween party, but I flatly said no. Their songs were full of “F” words, talk- ing about getting high, going against authority, and revolt, all kinds of bad stuff. And I know his band members do drugs. 
No way we would allow them to play at school.386

Source: Mari E. Loemer, Najwa Abdul-Tawwab, Using Community as a Resource for Teacher Education: A Case Study, Equity and Excellence in Education, 39, pp. 37–46. Reprinted by permission of the publisher (Taylor & Francis Group).

Chapter Fifteen

Mixed Methods Research: Integrating Quantitative and Qualitative Research Designs

Julie and Julia, 2009

“Mixed methods research designs involve the collection, analysis, and ‘mixing’ of quantitative and qualitative research designs.” (p. 444)

Learning Outcomes

After reading Chapter 15, you should be able to do the following:

1. Define mixed methods research, and describe the purpose of a mixed methods study.
2. Distinguish among the various basic and advanced mixed methods research designs.
3. Describe the processes involved in conducting mixed methods research.
4. Identify studies that use mixed methods designs.
5. Evaluate a mixed methods study using a series of questions and criteria.

The chapter learning outcomes form the basis for the following task, which requires you to develop the research procedures section of a research report.

Task 7D

For a qualitative and/or quantitative study, you have already created research plan components (Tasks 2 and 3) and described a sample (Task 4). If your study involves mixed methods research, now develop the research procedures section of the research report. Include in the plan the overall approach and rationale for the study, site and sample selection, the researcher’s role, data collection methods, data management strategies, data analysis strategies, trustworthiness features, and ethical considerations (see Performance Criteria at the end of this chapter, p. 455).

Summary: Mixed Methods Research

Definition
Mixed methods research combines quantitative and qualitative research designs by including both quantitative and qualitative data in a single study. The purpose of mixed methods research is to understand a phenomenon more fully than is possible using either quantitative or qualitative designs alone.

Design(s)
There are three common, basic types of mixed methods research design:
• Explanatory sequential (also known as the QUAN → qual) design
• Exploratory sequential (also known as the QUAL → quan) design
• Convergent parallel (also known as the QUAN + QUAL) design
Three advanced types of mixed methods research designs are also frequently used:
• Experimental design
• Social justice design
• Multistage evaluation design

Types of appropriate research questions
Questions that involve the collection and analysis of both quantitative and qualitative data in order to better understand the phenomenon under investigation.

Key characteristics
The differences among the basic designs are related to the priority given to the following areas:
• the weight given to the type of data collected (i.e., qualitative and quantitative data are of equal weight, or one type of data has greater weight than the other)
• the sequence of data collection (i.e., both quantitative and qualitative data are collected during the same time period, or one type of data is collected in each sequential phase of the project)
• the analysis techniques (i.e., either an analysis that combines the data or one that keeps the two types of data separate)
The differences among the advanced designs are related to the following area:
• the framework in which the basic design is embedded (i.e., experimental, social justice, program evaluation)

Steps in the process
1. Identify the purpose of the research.
2. State research questions that require both quantitative and qualitative research designs.
3. Determine the priority to be given to the type of data collected.
4. Determine the sequence of data collection (and hence the appropriate mixed methods design).
5. Collect data.
6. Conduct data analysis appropriate for each kind of data.
7. Write a report that draws conclusions based on both qualitative and quantitative data and analysis.

Potential challenges
• Few researchers possess all the knowledge and skills needed to master the full range of research techniques encompassed in quantitative and qualitative research approaches.
• Researchers who undertake a mixed methods study must have the considerable time and resources needed to implement such a comprehensive approach to research.
• A high level of skill is required to analyze quantitative and qualitative data sources concurrently or in sequence and to find both points of intersection and discrepancies.

Example
What are college students’ attitudes toward and use of birth control measures?

In this chapter, we present an introduction to mixed methods research, focusing on how to integrate both qualitative and quantitative designs.

Mixed Methods Research: Definition and Purpose

Mixed methods research designs involve the collection, analysis, and “mixing” of quantitative and qualitative research designs to understand a research problem. They include both quantitative and qualitative data collection strategies within the same study. The main purpose of mixed methods research is to use the advantages of both quantitative and qualitative research designs and data collection strategies to understand a phenomenon more fully than is possible using either quantitative or qualitative design alone.

Although the benefits of this approach to research may appear obvious (i.e., of course we want a complete understanding of any phenomenon worthy of investigation), mixed methods research can be challenging because it requires a thorough understanding of both quantitative and qualitative research procedures. The choice of a mixed methods design also assumes that the research problem, and hence the research questions, cannot be answered adequately by either a qualitative or a quantitative research design alone.

For example, let’s say that you are interested in students’ attitudes toward and use of birth control. To answer this question with a mixed methods design study, you might collect quantitative data in the first phase and then follow up with qualitative data in the second phase. For the initial quantitative phase, your research question might be “What are the factors that affect college students’ attitudes
toward and use of appropriate birth control measures?” and you might administer a survey to a random sample of college students. In the follow-up qualitative phase, your research question may be “When students mention alcohol as an ‘influencing factor’ with respect to their use of birth control, what do they mean?” This phase might include a series of interviews with individual college students, with the interview questions arising from the results of the earlier survey. Alternatively, you could begin with a qualitative interview or focus group of college students to help determine the areas of concern related to attitudes and use of birth control among sexually active college students and use the themes that emerge from the interviews or focus group to develop a quantitative survey instrument to be administered to a random sample of college students. This mixed methods study would provide an understanding both broad (i.e., from survey results) and deep (i.e., from interview data), one that would not be possible to achieve using either a quantitative design or a qualitative design by itself.

Types of Mixed Methods Research Designs

Our discussion in this chapter focuses on three basic mixed methods designs:

■ Explanatory sequential (QUAN → qual)
■ Exploratory sequential (QUAL → quan)
■ Convergent parallel (QUAN + QUAL)

and three advanced mixed methods designs, which build on the basic designs:

■ Experimental
■ Social justice
■ Multistage evaluation

Basic Mixed Methods Designs

Basic mixed methods designs are most commonly used in education and should be viewed as a good starting point for educational researchers who seek to combine the strengths of quantitative and qualitative research designs and data collection strategies. Figure 15.1 provides a summary of basic mixed methods designs.

In the nomenclature for mixed methods designs, our use of uppercase and lowercase letters follows the conventions presented by Morse.¹ This lettering system consists of three important distinctions:

1. Whether the research is qualitatively (QUAL) or quantitatively (QUAN) oriented.
2. Which aspect of the mixed methods design is dominant, as indicated through the use of uppercase letters, and which aspect of the design is less dominant, as indicated through the use of lowercase letters.
3. Whether mixed methods designs are conducted simultaneously, as designated by a plus sign (+), or sequentially, as designated by an arrow (→).

¹ Morse, J. M. (2003). Principles of mixed methods and multimethod research design. In A. Tashakkori & C. Teddlie (Eds.), Handbook of mixed methods in social and behavioral research (pp. 189–208). Thousand Oaks, CA: Sage.

The Explanatory Sequential (QUAN → qual) Design

In the explanatory sequential mixed methods design, quantitative data are collected first and are more heavily weighted than are qualitative data. In the first study or phase, the researcher formulates a hypothesis, collects quantitative data, and conducts data analysis. The findings of the quantitative study then determine the type of data collected in a second phase, which includes data collection, analysis, and interpretation of qualitative data. The researcher can then use the qualitative analysis and interpretation to help explain or elaborate on the quantitative results. When quantitative methods are dominant, researchers may enliven their quantitative findings by collecting and writing case vignettes.

Let’s return to our earlier example: What are college students’ attitudes toward and use of birth control measures? What might an explanatory sequential mixed methods design that attempted to answer this question look like? In a study using this design, you would collect quantitative data in the first phase and then, in the second phase, collect and analyze qualitative data to explain the findings from the first phase. As we discussed earlier, you would likely first administer a quantitative survey to a simple random sample of college students, then analyze the data to show the percentage of students who agreed or disagreed with particular statements about using birth control. In the second
phase of the design, you could include qualitative interviews or focus groups of college students selected from the group who participated in the survey, with the questions focused on explanations for common patterns seen in the analysis of the quantitative data. For example, if a large percentage of students disagreed with one particular statement in the questionnaire, you could use the interview to explore more deeply the reasons for that disagreement. The qualitative data would then be analyzed, and the themes that emerge from the analysis could be used to help understand and perhaps even challenge initial interpretations of the survey data.

Figure 15.1 • Basic mixed methods designs
Convergent Parallel Design: Quantitative Data Collection and Analysis → Quantitative Results, in parallel with Qualitative Data Collection and Analysis → Qualitative Results; then Merge Results for Comparison → Interpret or Explain Convergence/Divergence.
Explanatory Sequential Design: Quantitative Data Collection and Analysis → Quantitative Results → Determine Quantitative Results to Explain → Qualitative Data Collection and Analysis → Qualitative Results → Interpret How Qualitative Data Explains Quantitative Results.
Exploratory Sequential Design: Qualitative Data Collection and Analysis → Qualitative Results → Use Results to Form Variables, Instruments, Interventions → Quantitative Data Collection and Analysis Based on Variables, Instruments, Interventions → Quantitative Results → Interpret How Quantitative Results Provide New Results, New, Better Instruments, and Better Interventions.
Source: Creswell, John W., Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research, 5th Edition, © 2015. Reprinted by permission of Pearson Education, Inc., Upper Saddle River, NJ.

The Exploratory Sequential (QUAL → quan) Design

In the exploratory sequential mixed methods design, qualitative data are collected first and given more emphasis or attention than the quantitative data. When the qualitative study (or phase in a study) comes first, it is typically an exploratory study in which observation and open-ended interviews with individuals or groups are conducted. Analysis of these qualitative data helps the researcher identify key concepts and potential hypotheses to explore and test with quantitative techniques such as survey, census, and Likert-scale data that can be analyzed along with narrative data. With this study design, the validity of the qualitative results can be enhanced by the quantitative results.

In the exploratory sequential design, the researcher usually begins by working inductively, collecting qualitative data from a purposive sample (as described in Chapter 5) with a goal of increasing understanding of the phenomenon under investigation. Data collection would be followed by analyses appropriate for qualitative data (e.g., coding, identifying themes and categories, concept mapping, and visually displaying the data), and the researcher
would identify key unanswered questions. The second phase of the study would then build on the findings of the qualitative phase of the study, involving quantitative data collection (e.g., a survey) with a large, randomly chosen sample (as described in Chapter 5). Ideally, analyses of the quantitative data would ultimately lead to results that are generalizable to a larger population. Each phase of the study must adhere to the methodological assumptions for the qualitative and quantitative processes and designs used in the mixed methods study if the findings of the study are to be generalizable.

We return to our earlier example, What are college students’ attitudes toward and use of birth control measures? What might an exploratory sequential mixed methods design that attempted to answer this question look like? You could begin with a series of qualitative interviews or a focus group of college students to help determine the areas of concern related to attitudes and use of birth control. You would then analyze the data and use the themes that emerge to help understand the concerns and attitudes related to the use of birth control in this sample of students. In the second phase of the study, you could use those themes to develop and administer a quantitative survey. For example, if students indicated in the qualitative phase that they preferred not to use condoms, you could develop quantitative survey questions that ask students to rate whether they agree or disagree with statements about condom use, such as “The use of condoms for ‘safe sex’ is the only responsible thing to do.” The resulting survey could be administered to a simple random sample of college students from the same population or from a different or larger population. The findings from the survey could then be compared to the themes identified during the qualitative phase of the study, providing both a check on validity and a more complete understanding of the phenomenon.

The Convergent Parallel (QUAN + QUAL) Design

In the convergent parallel mixed methods design, quantitative and qualitative data are given equal attention and emphasis and are collected concurrently throughout the same study; the data are not collected in separate studies or distinct phases, as in the other two basic designs. The main advantage of this design is that the strengths of qualitative data (e.g., data about the context) offset the weaknesses of quantitative data (e.g., ecological validity), and the strengths of quantitative data (e.g., generalizability) offset the weaknesses of qualitative data (e.g., context dependence). The fully integrated convergent parallel design is the most challenging type of mixed methods research because it requires that the researcher place equal emphasis and attention on concurrently collected quantitative and qualitative data and that the researcher look critically at the results of the quantitative and qualitative analysis to determine if the datasets reveal similar findings.

A convergent parallel mixed methods design focused on the research question What are college students’ attitudes toward and use of birth control measures? might look like the following: You would likely begin by designing a quantitative survey and developing questions for qualitative interviews and/or focus groups simultaneously, based on a review of the literature. Then you would collect both types of data at the same time, although not necessarily from the same sample (e.g., the survey could be sent to a large sample from many colleges and universities, while the focus groups might be conducted with a smaller, purposive sample of students from one college). After completing all the data collection (both quantitative and qualitative), you would then merge the two datasets together for analysis, using each type of data as a check or an expansion of the other. For example, the percentages of students agreeing with particular statements could be computed and then annotated or enhanced with quotes from individual students.

To summarize, the differences among the basic designs are related to the priority given to the type of data collected (i.e., qualitative and quantitative data are of equal emphasis, or one type of data has greater emphasis than the other), the sequence of data collection (i.e., both types of data are collected during the same time period, or one type of data is collected in each sequential phase of the project), and the analysis techniques (i.e., either an analysis that combines the data or one that keeps the two types of data separate).

Advanced Mixed Methods Research Designs

Advanced mixed methods research designs have emerged in the last few years because mixed methods research has become more common within
many social and health science disciplines. The three advanced designs discussed in this chapter each take a basic design and frame it within some larger worldview or broad context: an experiment, a social justice issue, or a program evaluation process. Figure 15.2 provides a summary of the advanced mixed methods designs.

Figure 15.2 • Advanced mixed methods designs
Experimental Mixed Methods Design: an experimental study (experiment group: pre-test → intervention → post-test; control group: pre-test → post-test), with qualitative data collection, analysis, and results before the experiment (exploratory), during it (convergent), or after it (explanatory).
Social Justice Design (using an Explanatory Sequential Design example): Theory → Research Questions → Quantitative Data Collection (e.g., survey) → Quantitative Results → Qualitative Data Collection and Analysis → Qualitative Results → Interpret How Qualitative Data Explains Quantitative Results and Calls for Action; the overall aim is to promote social justice.
Multistage Evaluation Design (using an Exploratory Sequential Design example): stages organized around a single program objective, including needs assessment, theory/conceptual framework, instrument development, formative program assessment, summative program evaluation, and program revision.
Source: Creswell, John W., Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research, 5th Edition, © 2015. Reprinted by permission of Pearson Education, Inc., Upper Saddle River, NJ.

The Experimental Design

In the experimental mixed methods design, the quantitative element of the study is always an experimental design (as described in Chapter 10), and these data are the primary, more heavily weighted data in the study. The qualitative phase of the study is intended to provide data to support or supplement the quantitative data from the experimental design, involving the participants more directly in the experimental experience than would be possible in a simple experimental design. In this advanced design, the qualitative data may be collected before, during, or after the conclusion of the experiment. In other words, the experimental mixed methods design can be based on a convergent, exploratory, or explanatory basic design. For example, you might conduct an experiment to test for any effects of a particular treatment (say, whether a seminar in sexual behavior on campus affected college students’ attitudes toward birth control) and collect qualitative data at the same time (convergent basic design) that is intended to answer the research question “What were the experiences of the participants as they participated in the treatment/intervention phase of the experimental study?” Or you might collect qualitative data immediately
afterward (explanatory basic design), to answer the research question “Did students feel that the program would influence their subsequent behavior?” As in all mixed methods designs, you would combine the data in some way during analysis and when drawing conclusions in the experimental mixed methods design.

The Social Justice Design

The social justice advanced design can also be based on any of the three basic mixed methods designs; regardless of the basic design, however, the social justice design always has a specific purpose: to address one or more injustices faced by some group in society and ultimately evoke societal change. As Creswell asserts, when using this design, “The mixed methods researcher uses as an overall orientating lens in the study as a transformative framework. This framework may be a feminist perspective, a racial or ethnic perspective, or some other perspective. It is this framework that shapes many aspects of the mixed methods design, such as the framing of the theory, the questions, the methods, and the conclusions.”² The social justice design shares some of the characteristics of critical action research (discussed in Chapter 16), which has a goal of emancipation for the study’s participants; it is democratic, equitable, liberating, and life-enhancing. The goal of the researcher in implementing a social justice advanced design is to bring about positive change for an underrepresented group.

² Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research (5th ed., p. 550), by J. W. Creswell, 2015, Upper Saddle River, NJ: Pearson Education, Inc.

The Multistage Evaluation Design

Multistage evaluation is the systematic process of collecting and analyzing data about the quality, effectiveness, merit, or value of programs, products, or practices (as described in Chapter 1). A multistage evaluation design is used when a researcher wants to evaluate the impact of a program using both qualitative and quantitative data collection techniques. Multistage evaluation designs are conducted, as the name implies, as a series of stages that usually include assessing needs, planning some sort of program or developing an instrument, implementing that program or instrument, and assessing the impact. Moreover, it is common for multistage evaluation studies to include formative evaluation stages in which the new program or instrument is pilot-tested, revised, retested, revised again, and so on, before full implementation and a summative evaluation. The stages of the design are typically interrelated, all focused on addressing a single question or objective.

Depending on the overall aim of the study, the stages can be done sequentially or concurrently, and thus the multistage evaluation design can be built on any of the three basic designs, and each individual stage can require qualitative or quantitative data, or both. For example, perhaps your study of college students’ attitudes toward and use of birth control suggested that students might benefit from a semester-long program on safe sexual practices on campus. You might then choose to develop a program and test it with a multistage evaluation design, collecting quantitative data on student participation and student demographics, quantitative survey data on students’ perceptions of the materials available during the program, qualitative case study or individual interview data from students participating in the program and from the staff and administrators who run the program, and so on. Based on early data from a planning stage, you might decide to alter the program in some way, and then you would collect new data (qualitative, quantitative, or both) after implementing the change. Ultimately, the data from all the stages would go into a full report that reflects how well the program addressed the original need.

Conducting Mixed Methods Research

Your research question(s) dictate whether a mixed methods study is appropriate for your research, and the decision about whether one type of data will be primary and one will be secondary determines the basic design. Whether or not you tackle a problem within an overarching framework determines whether you need an advanced design, and the particular framework guides your final selection. Regardless of the design you select, you need to adhere to the basic principles for quantitative and qualitative research that relate to the procedures you choose. For example, a review

Find more at www.downloadslide.com450 chapter 15  •  Mixed Methods Research: Integrating Quantitative and Qualitative Research Designsof previous literature plays a different role for individuals in that sample are not appropriate par-qualitative research than it does for quantitative ticipants for the second quantitative phase, whichresearch (see Chapter 3). Thus, when conducting will likely require a larger, random sample.a mixed methods study, a researcher needs to con-sider how and when to conduct a literature review, Let’s return to our example from earlier in thewhether or not to let it guide hypotheses or other chapter. Students who participate first in a focusexpectations for the study, and how to present it in group about their attitudes toward birth controla final report (see Chapter 21). may be influenced by the other participants in the focus group, or perhaps they may be influ- Additionally, all procedures to be used in a enced by a subtle response by the must be reviewed by an Institutional Review If these students are then given a survey, theyBoard (IRB) prior to data collection for that phase may—intentionally or unknowingly—change their(see Chapter 1). Thus a researcher conducting a answers. Moreover, they may be concerned thatconcurrent mixed methods study should submit their responses to the survey would not be anony-an initial proposal, including both the qualita- mous, given their previous interactions with thetive and the quantitative components of the study, researcher, and thus not give honest responses.but a researcher conducting an explanatory mixed In this type of mixed methods design, an entirely­methods study may need to submit an initial pro- new random sample may be needed for the sec-posal for the first, quantitative phase of the study ond phase of the study.and then, after that phase is complete, a secondp­roposal for the qualitative phase of the study. 
Adherence to the sampling requirements forTwo (or more) proposals are likely necessary if the each phase of the mixed methods study is alsosecond phase of the study is based on the findings crucial so that the researcher can use the statisti-from the first phase. cal tests and analysis methods appropriate for each type of data. For example, if you conduct Sampling (i.e., selecting participants; see an ethnography as the qualitative component ofChapter 5) is a particularly important consider- an exploratory design, you would transcribe theation when conducting a mixed methods study data and organize it to look for themes. If youbecause sampling for each phase of the study conducted a survey as the quantitative compo-(qualitative and quantitative) must be compatible nent, you’d tabulate the responses and summa-with the assumptions belonging to that part of the rize them with some type of descriptive statisticsdesign. Consider the sampling procedures needed (in Chapters 17 to 20, we discuss basic principlesfor an explanatory mixed methods design. The first and procedures for analysis in more detail).phase is quantitative, and quantitative research Note that, in many mixed methods studies, thegenerally requires some type of random sample so researcher analyzes both types of data together,that the findings can be generalized to the broader looking for points of intersection and possiblepopulation. For the second (qualitative) phase, discrepancies.however, purposive sampling is appropriate. 
Theresearcher may choose a sample that provides use- For a more expansive discussion of the nutsful data most effectively, and in many cases that and bolts of conducting mixed methods researchpurposive sample is a subset of the original sam- designs and the inherent methodological chal-ple that was randomly selected for the quantitative lenges of these designs, we recommend the workphase—as long as the participants give consent of Morse and Niehaus,3 which will help to guidefor this phase of the study as well (see Chapter 1). you through the nuances of these designs. For aAfter all, the goal is often to have the individuals comprehensive discussion of data analysis consid-who participated in the quantitative study explain erations in mixed methods designs, we encouragetheir responses in more detail or (as in a multi- you to review Teddlie and Tashakkori’s4 guidelines.stage evaluation study) discuss their experienceas a participant in the study. On the other hand, 3 Morse, J. M., and Niehaus, L. (2009). Mixed Method Design:consider the exploratory mixed methods design. Principles and Procedures. Walnut Creek, CA: Left Coast Press.Because the primary phase of the design is quali- 4 Teddlie, C., & Tashakkori, A. (2009). Foundations of Mixedtative, purposive sampling is appropriate, but the Methods Research: Integrating Quantitative and Qualitative Approaches in the Social and Behavioral Sciences (pp. 249–285). Thousand Oaks, CA: Sage.
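The two-stage sampling described above for an explanatory design (a random sample for the quantitative phase, then a purposive subset of consenting participants for the qualitative phase) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a procedure prescribed by the chapter: the function names, the synthetic survey scores, the 60% consent rate, and the selection criterion (respondents whose scores lie farthest from the sample mean) are all hypothetical choices made for the example.

```python
import random


def draw_random_sample(population_ids, n, seed=0):
    """Phase 1 (quantitative): simple random sample without replacement."""
    rng = random.Random(seed)
    return rng.sample(population_ids, n)


def draw_purposive_subset(survey_rows, k):
    """Phase 2 (qualitative): purposive subset of phase-1 participants.

    Only participants who consented to the second phase are eligible.
    Hypothetical criterion: keep the k consenting respondents whose
    scores are farthest from the sample mean, on the assumption that
    they may have the most to explain in an interview.
    """
    consenting = [row for row in survey_rows if row["consented_phase2"]]
    mean_score = sum(row["score"] for row in survey_rows) / len(survey_rows)
    ranked = sorted(
        consenting,
        key=lambda row: abs(row["score"] - mean_score),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical population of 1,000 student ID numbers.
population = list(range(1, 1001))
phase1_ids = draw_random_sample(population, n=100, seed=42)

# Synthetic survey results, invented purely for illustration:
# a 1-5 attitude score and a flag for consent to a follow-up interview.
rng = random.Random(7)
survey_rows = [
    {
        "id": pid,
        "score": rng.randint(1, 5),
        "consented_phase2": rng.random() < 0.6,
    }
    for pid in phase1_ids
]

phase2_participants = draw_purposive_subset(survey_rows, k=8)
print([row["id"] for row in phase2_participants])
```

For an exploratory design the order would be reversed, and, as the text notes, the phase-2 random sample would usually be drawn fresh from the population rather than from the phase-1 participants.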

Identifying Studies Using Mixed Methods Designs

You now have the tools to identify a mixed methods study. When reading a study that may use mixed methods, look for the following characteristics:

1. The study title includes terms such as quantitative and qualitative, mixed methods, convergent parallel, explanatory, exploratory, simultaneous, sequential, or other terms that suggest a mixture of methods.
2. The purpose statement or the research questions indicate that a mixed methods design was used, or, if the researcher does not explicitly state the design(s), you recognize that the questions can be answered only with a combination of quantitative and qualitative data.
3. The researcher states that both qualitative and quantitative methods were used for collecting and analyzing data, or it is clear from the description of the method and the results that both narrative and numerical data were collected and analyzed.

Once you’ve identified a study as a mixed methods study, you can also tell from the research report whether the researchers used a basic or an advanced mixed methods design, and then which type of basic or advanced design. A researcher using an advanced design directly or indirectly specifies an overarching framework, such as a program evaluation or a focus on a group that currently experiences some type of injustice. The focus of the framework determines, by definition, whether the study is an experimental, social justice, or multistage evaluation design. If no framework is provided or implied, the study is a basic design, and you can identify the type of basic design by looking at both sequencing and the weight or priority given to the phases of data collection.

Figure 15.3 shows an example of an abstract for a combined qualitative and quantitative research study. The abstract provides an overview of how combined quantitative and qualitative research can work together to broaden educational research from a single to a multiple perspective.

Evaluating a Mixed Methods Study

When reading a mixed methods study, you should ask yourself the following questions:5

■ Does the study include a rationale for using a mixed methods research design?
■ Is the correct type of mixed methods research design used?
■ Does the study use both quantitative and qualitative data collection techniques appropriately?
■ Is the priority given to quantitative and qualitative data collection and the sequence of their use reasonable given the research question?
■ Was the study feasible given the amount of data to be collected and concomitant issues of resources, time, and expertise?
■ Does the study identify qualitative and quantitative data collection techniques clearly?
■ Does the study use appropriate data analysis techniques for both qualitative and quantitative data?

Armed with answers to these questions, you will be prepared to evaluate a mixed methods study should you encounter one during a review of related literature. You will also be able to evaluate potential designs for your own research. Given the complexity of planning and conducting a mixed methods study, a novice researcher interested in this type of research is advised to team up with a colleague who possesses a skill set, in either qualitative or quantitative research, that complements that of the novice.

5 Questions adapted from Educational Research (p. 559), Creswell.

Figure 15.3 • Abstract of a mixed methods study

AUTHOR: Holbrook, Allyson; Bourke, Sid; Owen, John M.; McKenzie, Phil; Ainley, John
TITLE: Mapping Educational Research and Exploring Research Impact: A Holistic, Multi-Method Approach.
PUB DATE: 2000
NOTE: 31P.; Paper presented at the Annual Meeting of the American Educational Research Association (New Orleans, LA, April 24–26, 2000). “Mapping Educational Research and Its Impact on Schools” was one of three studies of the “Impact of Educational Research” commissioned and funded by the Australian Federal Dept. of Education, Training, and Youth Affairs in 1999 (Minister: The Honorable Dr. David Kemp, MP).
PUB TYPE: Reports - Research (143); Speeches/Meeting Papers (150)
EDRS PRICE: MF01/PC02 Plus Postage.
DESCRIPTORS: Administrator Attitudes; Databases; Educational Administration; Educational Policy; *Educational Research; Elementary Secondary Education; Foreign Countries; *Graduate Students; Higher Education; *Principals; *Research Utilization; *Teacher Attitudes; Theory Practice Relationship
IDENTIFIERS: *Australia
ABSTRACT: This paper discusses the main analytical techniques used in “Mapping Educational Research and Its Impact on Schools.” The study considered the impact of the outcomes of educational research on the practice of teaching and learning in Australian schools and on educational policy and administration. Mixed methods were used, beginning with a review of the literature and the exploration of the Australian Education Index (AEI) educational research database. Documents were collected from faculties of education in Australia, and questionnaires about the use of educational literature were developed for postgraduate students (n = 1,267), school principals (n = 73), and representatives of 72 professional associations. Interviews were then conducted with seven policymakers and selected respondents to the postgraduate student questionnaires. The study indicates that it is possible to use an existing database to monitor educational research in Australia. A clear majority of all three groups surveyed provided evidence of the awareness, acceptance, and valuing of educational research in Australia. Interviews with policymakers also showed the use of educational research in policy formation. The multiple perspectives of this study give a picture of the links between research and its use in schools and departments of education in Australia. An appendix summarizes the database descriptors from the database investigation. (Contains 3 tables, 3 figures, and 34 references.) (SLD)

[Margin note on the title: Note that the title indicates that the research involves both qualitative and quantitative methods.]
[Margin note on the abstract: The topic states that the study “considered the impact” (not “determined the impact”), giving the abstract a distinct qualitative flavor.]
[Margin note on the abstract: Note that the study used both questionnaires (quantitative) and interviews (qualitative) to collect data. It is common in mixed method studies to combine these two data-collection methods.]

Source: Holbrook, Allyson; Bourke, Sid; Owen, John M.; McKenzie, Phil; Ainley, John. “Mapping Educational Research and Exploring Research Impact: A Holistic, Multi-Method Approach.” Paper presented at the Annual Meeting of the American Educational Research Association (New Orleans, LA, April 24–28, 2000). Used with permission.

Summary

Mixed Methods Research: Definition and Purpose

1. Mixed methods research uses procedures for conducting research that are typically applied in both quantitative and qualitative studies to understand a research problem more fully.

Types of Mixed Methods Research Designs

2. Basic mixed methods research designs are distinguished based on the order and the weight of the qualitative and the quantitative components.
3. In the explanatory sequential mixed methods design, quantitative data are collected first and are more heavily weighted than are qualitative data. The findings of the quantitative phase determine the type of data collected in the qualitative phase.
4. In the exploratory sequential mixed methods design, qualitative data are collected before quantitative data. The qualitative phase is exploratory and leads to potential hypotheses that are then tested with quantitative techniques.
5. In the convergent parallel mixed methods design, quantitative and qualitative data are given equal attention and emphasis and are collected concurrently.
6. Advanced mixed methods research designs take a basic design and frame it within some larger worldview or broad context: an experiment, a social justice issue, or a program evaluation process.

Conducting Mixed Methods Research

7. It is critical that the mixed methods researcher adhere to the basic principles for quantitative and qualitative research that relate to the procedures used in the study.
8. Sampling must be compatible with the assumptions belonging to the quantitative and qualitative research designs included in the study.

Identifying Studies Using Mixed Methods Designs

9. Look to the title, purpose statement, and research questions for terms such as quantitative and qualitative, mixed methods, convergent parallel, explanatory, exploratory, simultaneous, sequential, or other terms that suggest a mixture of methods.
10. Look to the data collection section and the analysis section to determine whether both qualitative and quantitative data were collected and analyzed.
11. To determine the particular type of mixed methods design, identify where the researcher indicates the order and the preference given to qualitative or quantitative data collection techniques and whether the study has an overarching framework.

Evaluating a Mixed Methods Study

12. A mixed methods study can be evaluated by answering questions related to the use of at least one qualitative and one quantitative research method, the rationale for using a mixed methods research design, the priority and sequence given to qualitative and quantitative data collection, the use of qualitative and quantitative research questions and matching data collection techniques, and the use of appropriate data analysis techniques for mixed methods designs.


TASK 7 Performance Criteria

The qualitative research topic or problem should be open-ended and exploratory in nature. Your qualitative research questions should be worded to illuminate an issue and provide understanding of a topic, not answer specific questions. You should mention the type of research approach you will use; for instance, you should note whether it is a case study, a grounded theory study, an ethnography, or a narrative study. The reason you chose this topic, or the nature of its importance, should be mentioned.

Qualitative studies may include literature citations in the introduction of a study to provide background information for the reader and to build a case for the need for the study. Literature relevant to the research topic should be presented in your example, and citations should follow American Psychological Association (APA) style, for example, Smith (2002). Despite the fact that your study may not require a literature review until data collection begins, cite some related texts anyway to get practice weaving the literature into the plan.

The description of participants should include the number of participants, how they were selected, and major characteristics (for example, occupation). Participants are ideally interviewed or observed in their natural setting to keep the interview or observation as authentic as possible. The description of the setting should be included.

Data collection methods should be described, and there may be more than one data collection method in a study. Qualitative data are descriptive; they are collected as words. Data may be in the form of interview notes and transcripts, observation field notes, and the like. The researcher will be immersed in the data and participate in data collection. Instruments may be video cameras, audio recorders, notepads, researcher-created observation records, and so forth. The description of the instruments should also describe their validity and reliability.

An example that illustrates the performance called for by Task 7 appears on the following pages (see Task 7D Example). The task example is representative of the level of understanding you should have after studying Chapters 12 through 15. When the researcher created her plan, she still needed to choose her core participants, carry out data collection and data analysis, and write the final study. She constructed her plan taking into consideration the six steps in the research process. Note that not all plans or proposals require a results section.

TASK 7 Example

Research Plan for: How Do Teachers Grade Student Essays?

The Research Aim

The purpose of this study was to examine the ways that freshman and sophomore high school English teachers grade their students’ essays. I chose this topic to study because students in my school complain that their essays are graded unfairly. For example, one student said, “Teachers give the same scores for essays of different length.” Other comments I hear include, “Teachers don’t provide enough information about the number of examples they want included in an essay” and “Teachers don’t give enough information about features they want included in essays so I can never match what they expect.” I wanted to understand how teachers actually grade student essays. I also wanted to find out what criteria teachers use and whether they explain their essay grading criteria to their students, or whether the students’ complaints are legitimate.

At the beginning of my exploration, my topic was stated generally, but through my initial investigations, it has narrowed a bit. Because qualitative research involves recurring study and examination, the topic may narrow some more. My approach is to carry out an ethnographic study. The research context is participants’ classrooms.

Literature Review

An initial concern was the decision of whether to obtain and study existing literature, and if so, at what point in the study. For this study, I have consulted two assessment books frequently read by teachers in teacher education programs, Nitko (2001) and Linn and Gronlund (2000), to find out what sort of training teachers receive in scoring essays. Having some understanding of the following will help me to recognize scoring practices in the teachers I plan to interview: forms and uses of essay questions, their advantages and limitations, how essay questions should be constructed to measure the attainment of learning outcomes, and essay question scoring criteria. Later in the study, I may find a need to examine additional literature, but for now, this has been sufficient.

Choosing Participants

I identified teachers in freshman and sophomore English classes in an urban high school as participants for the study. The high school is the context for the ethnographic setting of the study. I contacted the school principal initially to propose the study and receive her approval before proceeding and contacting potential participants. She was cordial, and asked for more information about the role that the teachers and students would have. We discussed her concerns and she consented. She indicated that she would send informed consent forms to the parents involved in the study. All but two of the students’ parents agreed to let their children participate. There was no indication why the parents declined the request. The principal of the school also provided me with copies of the school’s Human Subject Review Form to give to the teachers to sign when I explained their part in the study to them.

I contacted the freshman and sophomore English teachers in the school, and described the project to them as an exploratory study about how teachers plan lessons and assess their students. I also told them, and the principal concurred, that each teacher participant would be identified by a number rather than by a name in the final written study. Only I would know the identities of the teachers. I thought this would allow the teachers to be more open when providing data. All but three teachers agreed to become participants in the study. Two of these teachers asked for more information about what would be asked of them. One of the two teachers agreed to participate after more discussion, but the remaining two teachers still opted not to participate. I thought this final number, 8 teacher participants, was a good sample for the study. The principal has been a helpful gatekeeper and interested observer. In general, the school personnel are supportive.

I will identify approximately 10 students to participate in the study and provide comments about essay items and graded essays. I suspect that this number will decrease once data collection begins and I determine which participants can provide the most helpful comments.

As the research data are collected, I will note the comments of the teachers, not only to obtain their comments on grading essays, but also to identify the most articulate and conceptual teachers to focus on during data collection. Ultimately, I will have a smaller number of core participants than I began with.

Data Collection

The teachers will be studied in their own context, in each teacher’s own classroom. If this is not possible, data collection will take place somewhere else in the school. Ethnographic data collection relies heavily on asking questions, interviewing, and observing participants. Each of these methods will be applied over a period of 12 weeks. I plan to collect data in the form of completed and graded student essays from the teachers. I expect to collect approximately 7 to 9 essays per teacher over the 12 weeks. I think this will be sufficient to capture and integrate the data.

I have arranged to receive a copy of each essay exam or assignment. The purpose of this form of data collection is to assess the characteristics of the essay items. I plan to examine the essay items to evaluate whether students understand what is expected of them in this type of performance assessment. I will also look at samples of the teachers’ essay items that will be critiqued by students. Again, the names of the students will be confidential.

I also plan to informally interview teachers and ask questions such as “Tell me how you grade your essays and why,” “What do you consider to be the best feature of your essay grading?” and “What do you consider to be the weakest feature of your essay grading?” Similar questions will be asked of the students in informal interviews.

During the 12-week period, I plan to hold several focus groups, one with teachers, and one with students, in which grading is discussed. The focus groups will be audiotaped and then transcribed. Finally, I will employ observation to obtain data. I will observe teachers grading student essays, question them about why they assign the grade they do, note the time it takes them to grade the items, and so forth. If written feedback is provided for the graded essays, I will collect a copy as data, for later analysis. I will also follow these graded essays and ask the students whether they feel the essay items are fairly graded. Therefore, my data will include student artifacts, audiotapes, field notes and memos from informal questioning and interviews, and field notes from observations.

Data Analysis

As data are collected from the participants, I will examine and reexamine the data in search of themes and integration in the data to arrive at a number of themes. I anticipate that analyzing and synthesizing the data will take approximately three to four weeks, eight hours a day, after data collection ends. Triangulation among asking questions, observing, interviewing, and analyzing essays will help to integrate the analysis.

Results

The final step will be to describe the procedures and interpretation in a written format for others to examine and critique. Before writing up the study, it will be important to spend time thinking about the data analysis and interpret what the data reveal. I hope to be able to express a contribution or insight that emerges from this study.

Find more at www.downloadslide.comHow Should Middle-School Students withLD Approach Online Note Taking?A Mixed-Methods StudyL. Brent IgoClemson UniversityRoger H. BruningUniversity of NebraskaPaul J. RiccominiClemson UniversityGinger G. PopeSpecial Education Teacher Abstract  This explanatory sequential mixed-methods study explored how the encoding of text ideas is affected when students with learn- ing disabilities (LD) take notes from Web-based text. In the quantitative phase of the study, 15 students took three kinds of notes—typed, copy and paste, and written—with each kind of notes addressing a different topic. After taking notes, students performed poorly on two immediate measures of facts learning. Cued-recall test performances were best for topics noted by writing, whereas multiple-choice test performances were best for topics noted by copying and pasting. Students performed worse on the cued-recall test when it was readministered four days later. In the qualitative phase of the study, followup interviews indicated students preferred copying and pasting their notes (for practical reasons) and found typing notes to be distracting, which made learning problematic. A textual analysis of students’ notes confirmed that students took mostly verbatim notes when typing or writing, which has been linked to shallow processing, and perhaps further accounts for the low level of learning that occurred. The mixing of quantitative and qualitative data (in the qual- itative data analysis phase of the study), along with learning and motiva- tion theories, provides justification for teachers to instruct middle-school students with LD to use copy and paste to take notes from W­ eb-based sources.Access to the general education curriculum is mandated for students with dis- (01)abilities by the Individuals with Disabilities Education Act Amendments of (02)1997 (Federal Register, 1999) and reiterated in the 2004 reauthorization(Council for Exceptional Children, 2004). 
The importance placed on studentaccess and progress requires educators to provide students with disabilitiesinstruction on the essential skills and concepts emphasized through the gen-eral education curriculum. Advances in technologies over the last decade may offer a path to ­improvedstrategies for students with learning disabilities (LD) to successfully access and 459

Find more at progress through the general education curriculum. Note taking is an especially useful skill that can be applied to learning from Web-based sources. (03) Web-based note taking is increasingly common, as students are more readily using the Internet for research purposes (Dabbagh & Bannan-Ritland, 2005). However, to learn from online sources students need more than ­access to learning technologies; they need proper instruction related to on- line learning (Dabbagh & Bannan-Ritland, 2005). (04) Recent research has addressed this issue, suggesting that teachers can i­mprove the effectiveness of students’ Web-based note taking by providing stu- dents with a cued note chart (to ensure appropriate information is gathered) or by instructing students to type their notes instead of copying and pasting them from Internet sources (Igo, Bruning, McCrudden, & Kauffman, 2003). Unfortunately, to date, only general education students have been included in Web-based note-taking research. More specific investigation is needed if gener- alizations are to be made to the instruction of students with LD. Facilitation of Encoding (05) Note taking can promote learning in two phases: the external storage phase and the encoding phase (Divesta & Gray, 1972; Kiewra et al., 1991). In the external storage phase, learning occurs as students study a set of notes that have already been recorded. For example, when a student studies her prere- corded lecture notes in preparation for a test, she employs the external ­storage phase of note learning. In the encoding phase, learning occurs as students take notes. For example, students who take notes while listening to a lecture can sometimes comprehend the lecture better (Kiewra, 1985) and remember more of the ideas presented (Aiken, Thomas, & Shennum, 1975) than students who simply listen. 
In short, although studying notes can result in a great deal of learning, students can encode (or learn) information through the note-taking process alone. (06) The amount of information students encode through the note-taking process is largely a function of the kinds of notes they take (Igo et al., 2003; Igo, Bruning, & McCrudden, 2005a; Slotte & Lonka, 1999). For example, stu- dents who take summary notes remember more from lectures than students who take verbatim notes (Slotte & Lonka, 1999). Similarly, note taking that involves identification of main ideas or paraphrasing seems to boost encod- ing (Blanchard, 1985; Hidi & Anderson, 1986; Igo et al., 2003; Mayer, 2002; McAndrew, 1983; Rinehart & Thomas, 1993). (07) One explanation for these differences in encoding is that the mental processes required to create summaries and paraphrases (or to identify and note main ideas) are deeper than those required to record verbatim notes (see, e.g., Craik & Lockhart, 1972). Consequently, researchers have described the boost in encoding related to different kinds of note taking as a depth-of-processing effect (Divesta & Gray, 1972, 1973; Igo et al., 2003; Igo et al., 2005a; Mayer, 2002), where deeper levels of thinking result in more encoded ideas than shallow levels of thinking. (08) Depth-of-processing effects have been documented in research address- ing students with LD (Boyle & Weishar, 2001). For example, students with LD can deepen their processing by relating new information to prior knowl- edge (Alley & Deshler, 1979) or by identifying main ideas (Deshler, Shumaker, Alley, Clark, & Warner, 1981; Ellis & Lenz, 1987) while they take notes. In short, encoding is facilitated by the deep kinds of thinking that are necessary to create certain kinds of notes. Web-Based Note Taking (09) The depth-of-processing effect has also been documented in Web-based ­note-taking environments. For example, in a study by Igo et al. 
(2003), general education high-school students who typed notes from Web-based text

were likely to create paraphrases as a default strategy. Presumably, using the paraphrase strategy required them to deepen their mental processes; in turn, the students who typed notes learned more than students who copied and pasted their notes from Web-based text.

(10) Further, Igo et al. (2005a) found that college students who copied and pasted notes with greater text selectivity encoded more ideas from Web-based text than students who pasted notes less selectively. In post-note-taking interviews, the selective pasters described engaging in deeper mental processes (e.g., evaluation of text ideas) while taking notes than the less selective pasters. Again, the encoding function of Web-based note taking was related to the depth of processing in which students engaged.

(11) To date, the encoding phase of Web-based note taking for students with LD has been neglected in the research literature. Thus, the depth-of-processing effect that has been documented with general education students may not materialize in populations of students with LD. First, students with LD often struggle to process text in deep, meaningful, or strategic ways (Mastropieri & Scruggs, 2000; Mercer & Mercer, 2001; Sawyer, Graham, & Harris, 1992). Students with LD face other obstacles to encoding, such as the distraction imposed by spelling and punctuation monitoring (Hughes & Smith, 1990; Poteet, 1979). Consequently, in Web-based note-taking environments, students with LD may not choose to paraphrase while typing notes and, therefore, may not attain improvements in encoding. Also, it is unclear how students with LD are affected by the use of copy and paste while taking notes. Although general education high-school students (Igo et al., 2003) and college students (Igo et al., 2005a) learn less when they use their own copy-and-paste strategies, the same might not be true for students with LD. Finally, when students with LD take notes from the Web, it might be more beneficial for them to write their notes instead of taking electronic notes.

(12) The purpose of this sequential, explanatory mixed-methods study was to explore the encoding function of Web-based note taking for middle-school students with LD. In the quantitative, first phase of the study, 15 students read Web-based text covering three topics and noted each topic in a different way: by writing, typing, or copying and pasting. In Latin-square fashion, each student took three kinds of notes, but the combinations of their topic and note-taking styles differed, so that different students pasted, typed, or wrote notes on different topics (see Figure 1). After taking notes, students were (a) immediately tested to examine any differences in encoding prompted by the three note-taking techniques and (b) given a delayed measure of recall of text ideas (four days later).

Margin note: Basic mixed methods design: explanatory sequential (QUAN→qual) design. The purpose of this study was to explore the encoding function of Web-based note taking for middle-school students with LD. In the quantitative, first phase of the study, 15 students read Web-based text covering three topics and noted each topic in a different way: writing, typing, or copying and pasting. In the qualitative (second) phase of the study, each student was interviewed to explore [her or his] perspectives of the three kinds of note taking, examine how [each student] approached using the three techniques, and further explain the quantitative findings.

Margin note: Is the explanatory sequential mixed methods design the best choice of design for this problem?

Margin note: The quantitative phase of the mixed methods design used an experimental design that tested two competing hypotheses: the depth hypothesis and the transfer-appropriate hypothesis. The 15 students who participated in the study were from an intact, self-contained social studies classroom. The experiment occurred over one day, with the delayed test taking place four days later. Students met in their usual classroom where class roll was taken; students were then assigned randomly to one of six experimental groups that differed in the combination of topics to be noted and kinds of notes to be taken.

Margin note: Does the study adhere to the methodological assumptions of experimental research? Is the sampling procedure appropriate for an experimental study?

Students/topics   bauxite   coal    uranium
1, 7, 13          Type      Write   Paste
2, 8, 14          Type      Paste   Write
3, 9, 15          Write     Paste   Type
4, 10             Write     Type    Paste
5, 11             Paste     Write   Type
6, 12             Paste     Type    Write

Figure 1. Latin-square assignment of students (numbers) to conditions (note-taking styles x topic)
Note: The three treatment conditions (type, write, paste) were assigned randomly to each student.
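The assignment in Figure 1 can be sketched in a few lines of code. This is only an illustration: the six groups are copied from the figure, while the helper name `assign` and the round-robin rule (student n falls in row (n - 1) mod 6) are assumptions inferred from the student numbering, not a procedure described by the authors, who report that assignment was random.

```python
from itertools import permutations

TOPICS = ("bauxite", "coal", "uranium")

# Row order copied from Figure 1: styles for bauxite, coal, uranium.
GROUPS = (
    ("type", "write", "paste"),
    ("type", "paste", "write"),
    ("write", "paste", "type"),
    ("write", "type", "paste"),
    ("paste", "write", "type"),
    ("paste", "type", "write"),
)


def assign(student: int) -> dict:
    """Map a student number (1-15) to one note-taking style per topic.

    Hypothetical helper: the round-robin rule only mirrors the numbering
    shown in Figure 1; it is not the authors' randomization procedure.
    """
    styles = GROUPS[(student - 1) % len(GROUPS)]
    return dict(zip(TOPICS, styles))


# The six rows are exactly the 3! orderings of the three styles,
# so every student uses each style once and each topic once.
assert set(GROUPS) == set(permutations(("type", "write", "paste")))

for student in (1, 8, 15):
    print(student, assign(student))
```

Because 15 students cannot be split evenly across six orderings, three of the groups hold three students and three hold two, as the figure shows.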

(13) In the qualitative phase of the study, each student was interviewed to explore their perspectives of the three kinds of note taking, examine how they approached using the three techniques, and further explain the quantitative findings. Finally, a textual analysis of students’ notes was conducted to help explain students’ strategies, learning, and mental processing.

Quantitative Hypotheses and Predictions

(14) Two competing hypotheses were constructed for the quantitative phase of the study: the depth hypothesis and the transfer-appropriate hypothesis. The depth hypothesis stems from levels-of-processing theory and its related research (Craik, 2000; Craik & Lockhart, 1972). According to this theory, students who process text at deeper levels encode more information while taking notes than students who process text at shallower levels (Cermak & Craik, 1979; Craik, 2000). In previous research, students were found to display depth-of-processing effects (and learned more) when they typed paraphrase notes but not when they took copy-and-paste notes (Igo et al., 2003). In fact, in an in-depth mixed-methods study, Igo et al. (2005a) found that many college students take copy-and-paste notes in a decidedly mindless way, pasting large amounts of text and remembering little (if anything) of what they had pasted. Thus, for purposes of this study, the depth hypothesis predicted that learning will be more robust when notes are taken by typing or writing, as these two techniques allow students the opportunity to create paraphrases and deepen their processing. On immediate and delayed tests, then, students should perform better on items assessing knowledge of topics that were written or typed than on items assessing knowledge of topics that were pasted.

(15) The transfer-appropriate hypothesis stems from transfer-appropriate processing theory and its related research (Baguley & Payne, 2000; Bransford, Franks, Morris, & Stein, 1979).
According to this theory, memory performances (and encoding) are maximized when the cognitive skills used in learning are the same as the cognitive skills required by an assessment procedure. As such, this hypothesis predicts that when students copy and paste their notes (where they must identify and select the appropriate information to note), their performances will be highest on a multiple-choice test (because they are required to identify and select the appropriate information for their answers). Similarly, this hypothesis predicts that when students type or write their notes (where they must generate words to note), their memory performances will be highest on a cued recall test (where they must generate words for their answers).

Method

Participants

(16) Participants were 7th- and 8th-grade students in a rural southeastern town. The middle school has a large number of migratory students and students from low socioeconomic backgrounds. The 15 students who participated in the study were from an intact self-contained social studies classroom. Ten were in 7th grade and 5 were in 8th grade. Twelve participants were male and 3 were female, ranging in age from 13 years to 14 years 6 months.

(17) Eleven of the participants were identified with a learning disability, 2 students were identified with emotional and behavioral disorders, and 2 students were identified with other health impairments (OHI) for attention difficulties. All 15 participants were described by their teacher as being poor readers with low motivation. The participants’ demographic information is summarized in Table 1. It is important to note that the students’ achievement scores were not available to the researchers and, therefore, are not reported.

Table 1
Summary of Participant Demographic Information

                                Participants
Grade Level
  7th grade                     10
  8th grade                     5
Gender
  Male                          12
  Female                        3
  Total                         15
Age
  Mean                          13 years 10 months
  Range                         13–14 years 6 months
Race/Ethnicity
  Anglo                         9
  Hispanic                      1
  African-American              5
SES
  High                          0
  Middle                        1
  Low                           14
Disability Category
  LD                            11
  E/BD                          2
  OHI                           2
Educational Placement
  Time in Sp. Ed. Placement     >60%
  Level of Placement            Self-contained
Intelligence
  Mean (SD)                     82 (11.8)
  Range                         68–100
Reading Achievement
  Mean                          4th grade
  Range                         2nd–7th grade

Note: Intelligence as measured by WISC III, Universal Nonverbal Intelligence Test, and Stanford-Binet Intelligence Scale.

(18) The number of participants was limited to 15 by the researchers to provide experimental content consistent with the teacher’s goal and the number of classes covering the same content.

The students’ low achievement in all areas is evident in their achievement test scores, which were obtained at the district office. Seven students had been assessed using the Brigance Comprehensive Inventory of Basic Skills. On math computation all 7 students scored at the 3rd-grade level and ranged from 2nd to 4th grade on problem solving. Reading scores for word recognition, oral reading, and comprehension ranged from 2nd to 7th grade. Writing scores for spelling and sentence writing ranged from 3rd to 5th grade. Five students had been assessed with the Wide Range Achievement

Test. For math, reading, and spelling, their scores ranged from 2nd to 6th grade. The overall achievement of three students is not reported because of incomplete files as a result of students having recently moved into the school district.

Materials

(19) Materials included a researcher-constructed text passage from which students took notes. The passage was 763 words long, described three native Australian minerals (coal, bauxite, and uranium), and was presented on a single, continuous Web page (HTML document) accessed through Microsoft Internet Explorer. The text was of comparable length to text used in previous text-encoding research (e.g., Blanchard, 1985; Golding & Fowler, 1992; Marxen, 1996; Peterson, 1992; Spiegel & Barufaldi, 1994; Wade & Trathen, 1989). The text described each mineral along parallel lines, identifying each mineral’s (a) supply, (b) production, (c) uses, (d) geographic location, (e) first characteristic, and (f) second characteristic. After controlling for certain vocabulary words that occurred several times throughout the text (e.g., bauxite, uranium, nuclear), the Flesch-Kincaid grade level of the text was 6.4.

(20) Students took notes in a Web-based, note-taking tool and (for one topic) on a paper chart. The note-taking tool was a word-processing chart fit with the text’s structure. It contained three columns corresponding to the three text topics (minerals) and six rows corresponding to the six text categories. The three columns were labeled from left to right as bauxite, coal, and uranium. The six rows were labeled production, supply, uses, location, first characteristic, and second characteristic. Thus, at the outset, the tool presented students with 18 blank cells, 6 for each mineral addressed in the text, with cues directing them to find information intersecting topics and categories. The tool itself could be minimized, maximized, or reduced in the same way as other computer programs.
For example, students could choose to have the tool appear on the screen as they took notes, or they could expand the text to cover the screen and hide the chart.

(21) The paper note-taking chart was a paper version (8½ × 11 in.) of the online note-taking tool. It presented students with the same cues and blank cells as the Web-based tool, but students filled in one topic by writing in the paper chart, whereas two topics in the Web-based tool were filled by typing and copying and pasting.

Dependent Measures

(22) Two researcher-constructed tests assessed student learning of facts from the text. The cued-recall-of-facts test was administered twice: immediately after the note taking and after a four-day delay. Students filled in a cued paper chart (similar to the online note-taking chart) with all, or any part of, the information that they could remember reading or typing, writing, or pasting into their notes. The columns and rows were labeled in the same way as the note-taking chart; the cells were blank. The test was scored by awarding 1 point per idea recalled and placed in the correct, cued cell corresponding to an idea from the text, whether the idea was originally noted or not. Two raters scored the quiz, blind to experimental conditions, with a clearly acceptable level of inter-rater reliability (Cohen’s κ = .89).

(23) An 18-item multiple-choice test (α = .73) required students to recognize factual information presented in the text. For each item, students read a fact and then decided to which of the three minerals it corresponded. One point was awarded for each correct response.
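The inter-rater statistic reported above, Cohen's kappa, compares the observed agreement between two raters with the agreement expected by chance. The sketch below shows the standard calculation; the function name and the ten made-up scores are purely illustrative and are not the study's data, and the article does not say at what level (cell, idea, or total score) the raters' agreement was computed.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Made-up scores (1 = idea credited, 0 = not credited), not study data.
rater_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
rater_2 = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(round(cohens_kappa(rater_1, rater_2), 2))  # 0.78
```

In this toy example the raters agree on 9 of 10 items (.90) while chance agreement is .54, which is why kappa (.78) is lower than the raw agreement rate.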

Procedure

(24) Prior to the experiment, the students received a brief, informal tutorial from their teacher on how to use the type and copy-and-paste functions of a computer. The teacher indicated that while most students were already familiar with each technique, they struggled when using them.

(25) The experiment occurred over one day, with the delayed test taking place four days later. Students met in their usual classroom where class roll was taken; students were then assigned randomly to one of six experimental groups that differed in the combination of topics to be noted and kinds of notes to be taken (see Figure 1). Next, students were given an overview of the note-taking task. Specifically, the primary researcher told the students that they (a) were to read and take notes over material as per their assigned condition, (b) would be given two brief tests after they finished, and (c) would be given another brief test four days later. The students then moved as a group to the school’s computer lab.

(26) Next, students logged on to computers (which they were allowed to choose) and created user names and passwords (which permitted them to use the note-taking tool and allowed their notes to be saved on a university server and printed). The students were instructed by the primary researcher to take notes using the cues provided in the chart (for example, uses and supply), to read and take notes at a pace comfortable to them, and to take as much time as they needed to complete their notes. Students then read the text, completed their notes, and saved their notes on the computer (and turned in their paper note sheets). Most students completed the note-taking task in 14–18 minutes. Because students finished the note-taking portion of the experiment in varying amounts of time regardless of experimental condition, and because each student took all three kinds of notes, differences in engagement time were judged to be minimal and realistic of typical classroom behavior.

Results

Immediate

(27) ANOVA results indicated a significant effect on students’ immediate cued recall test performances, F (2, 42) = 4.8, p < .05. The strength of the relationship between kind of note taking and cued recall was strong as assessed by eta square, with the kind of note taking accounting for 17% of the variance in cued recall. Results of the LSD post-hoc test indicated that cued recall was significantly higher for topics that were noted by writing than for topics that were noted by pasting (see Table 2). Recall performances for topics that were noted by typing fell between those of writing and pasting and did not differ from either.

(28) Results also indicated a significant effect on students’ multiple-choice test performances, F (2, 42) = 3.96, p < .05. The strength of the relationship between the kind of note taking and fact identification was strong as assessed by eta square, with the kind of note taking accounting for 16% of the variance in fact identification. Results of the LSD post-hoc test indicated that multiple-choice performances were significantly higher for topics that were noted by pasting than for topics that were noted by writing and typing (see Table 2).

Delayed

(29) ANOVA results indicated no significant effect on students’ delayed cued recall test performances, F (2, 42) = 1.6, p = .38. Performance means for students’ recall of text ideas ranged from .67 points for written topics to .53 for typed topics and .24 for pasted topics.
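The variance-explained figures in the Results can be roughly checked from the reported F ratios. For a one-way ANOVA, eta squared is SS_between / SS_total, which can be rewritten as (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity; the function name is illustrative, and the calculation assumes the reported F (2, 42) values come from a one-way analysis with no other terms in the model.

```python
def eta_squared(f_ratio: float, df_effect: int, df_error: int) -> float:
    """Eta squared recovered from a one-way ANOVA F ratio."""
    explained = f_ratio * df_effect
    return explained / (explained + df_error)


# Multiple-choice test: F(2, 42) = 3.96
print(round(eta_squared(3.96, 2, 42), 2))  # 0.16

# Cued recall, using the F of 4.51 listed in Table 2
print(round(eta_squared(4.51, 2, 42), 2))  # 0.18
```

The multiple-choice value reproduces the reported 16%. The cued recall value comes out at about .18, close to but not identical with the reported 17%, so rounding or a slightly different computation may be involved.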

Table 2
Means Summary of Two Tests and Analysis of Variance

Measure and Note-Taking Style   Mean    SD      F      Power   η²
Cued Recall                                     4.51   .84     .17
  Write                         1.47*   1.35
  Type                           .87     .99
  Paste                          .33     .61
M/C Facts                                       3.96   .71     .16
  Write                         2.87    1.30
  Type                          2.73    1.02
  Paste                         4.07*   1.31

*p < .05.

Discussion

Quantitative Phase

(30) The experimental results did not support our depth hypothesis, which predicted that learning through writing and typing notes is superior to learning through pasting notes. In previous research, typing notes produced higher levels of student learning across different tests assessing learning from Web-based text. This was attributed to the use of paraphrase strategies and subsequent deepening of mental processes during note taking (Igo et al., 2003). In the present study, however, students’ writing and typing behaviors (which presumably afforded them the opportunity to paraphrase) yielded inconsistent performances across tests. A deepened level of processing should have resulted in boosted performances on each test. The absence of such consistency suggests the absence of deep processing.

Margin note: Findings from the quantitative (experimental) phase of the explanatory sequential mixed methods design: The experimental results did not support our depth hypothesis, which predicted that learning through writing and typing notes is superior to learning through pasting notes. The results do support the transfer-appropriate hypothesis, as students’ test performances differed with respect to both note-taking style and test type.

Margin note: Is the statistical use of ANOVA appropriate given the sample size and selection procedures?

(31) Also in previous research, copying and pasting notes decreased learning. This was attributed to the imposition of verbatim note taking on the student, which has been linked to shallow processing in previous research (Igo et al., 2003; Slotte & Lonka, 1999). In the present study, however, copying and pasting yielded higher performances on one of the immediate tests (multiple choice). This is not to say that the students were engaging in deeper levels of processing while pasting, however, because, again, the deepened processing would have resulted in consistently higher performances across tests.

(32) Finally, the absence of deep processing is made clearer by the results of the delayed, cued recall test. On average, students recalled about half of an idea across topics noted by all three kinds of note taking. This result, coupled with the relationship between depth-of-processing theory and long-term memory (Craik, 2000), suggests students were not thinking deeply while they were taking notes.

(33) By contrast, our results do support the transfer-appropriate hypothesis, as students’ test performances differed with respect to both note-taking style and test type. In transfer-appropriate fashion (Baguley & Payne, 2000), students tended to perform better when the type of engagement during a particular kind of note taking was closely matched to the type of engagement required by the test. For example, points were awarded on students’ cued recall tests only if students were able to generate correct written facts in the appropriate cell of their cued recall charts. They had already done this once, during the note-taking phase of the experiment when they noted topics by writing. The cognitive engagement necessary to complete the written portion

of the notes was closely related to the engagement necessary to answer the cued recall test, and performances were highest when there was such a match. The same effect was evident in the multiple-choice test, which required students to search for and then select information for their answers. Performances were highest for topics that were copied and pasted, which required students to select and paste the correct ideas into notes.

(34) This is an interesting finding because transfer-appropriate processing effects are typically regarded as weak learning effects (see, e.g., Neath, 1998), occurring in the absence of deep processing. Again, for evidence of this, see the means in Table 2, which suggest that students, in general, learned little during any of the three kinds of note taking. Specifically, 6 points were possible in the cued recall of facts test, but the mean for each kind of note taking on the immediate test was below 2 points, with pasting and typing below 1 point. As expected, performances were even worse on the delayed test, where all means fell below 1 point.

(35) Although the experimental findings support the transfer-appropriate hypothesis, they do not explain the weak learning effects. Certain other characteristics of the results also are unexplainable in light of only the experimental findings. For example, students’ performances were slightly lower for typed than for written notes. This phenomenon occurred on each dependent measure, and it is not explained by either of our theoretical hypotheses. Perhaps this finding was due to the small number of students who participated in the study. Similarly, the relatively small number of items on each dependent measure might have influenced the results. But each of these explanations might also be inconclusive. In cases where quantitative, experimental inquiry does not provide enough description of a phenomenon, researchers can use qualitative follow-up procedures to aid understanding (Creswell, 2003; Newman & Benz, 1998; Tashakkori & Teddlie, 1998).

Qualitative Phase

(36) In order to gain a more detailed view of the students’ note-taking behaviors and attitudes, as well as the impact of those behaviors and attitudes on test performances and processing, two kinds of qualitative data were collected and analyzed: interview data and students’ notes. The interview data were analyzed to ascertain the students’ least and most preferred note-taking techniques, to describe why the students subscribed to those beliefs, and to help explain the experimental results. Further, students’ notes were analyzed to examine how students approached using the three kinds of notes and to explain how their approaches might account for the experimental findings.

Margin note: The qualitative phase of the study used two kinds of qualitative data: interview data and students’ notes.

Analysis of Notes and QUAN–QUAL Data Mixing

(37) Students’ notes were analyzed in three ways. First, they were checked for completeness. As documented in previous research using cued note charts (see, e.g., Igo et al., 2003; Kiewra, 1989), students in the current study completed their note charts appropriately. All cells were filled in each student’s chart. The notes also were checked for appropriateness of ideas. Again, as in previous research, the information in each note-taking cell correctly corresponded to the cues that were provided in the chart. Because the chart and cues were fit with the text’s structure, the notes were appropriate to the text as well. Finally, the notes were checked for the presence or absence of paraphrases and the presence or absence of verbatim text ideas, except for topics that were noted by pasting, which were all identical to the original text.

(38) In general, students’ notes, whether typed or written, were constructed in verbatim fashion. This preference has been documented in older students with LD (see, e.g., Suritsky, 1992), whereas general education students prefer to create paraphrases in lieu of verbatim notes (Igo et al., 2003). In the present study, some note-taking cells were filled with verbatim

sentences from the text, but more often they contained sentence fragments from the original text (varying in length from two to seven words). In most cases, students selected sentence fragments appropriately. That is, no real meaning was lost through their choice to include fewer words rather than entire sentences.

Margin note: The intent of the explanatory sequential mixed methods design is to explain the quantitative results with qualitative data. In this study, the analysis of notes further confirmed the basis for rejecting the depth hypothesis in the quantitative phase of this study.

(39) In some cases, albeit few, students attempted to paraphrase text ideas in their notes. Interestingly, the paraphrases were short. In fact, in most cases, they simply took the form of word substitution rather than typical sentence or paragraph paraphrases. For example, one note cell was cued to be filled with the uses of uranium. Whereas the text presented the uses of “providing electrical power” and “used to make nuclear weapons,” one student wrote “bombs.” There were other similar examples of such paraphrase attempts.

(40) But unlike when students wrote or typed verbatim sentence fragments, when they attempted to paraphrase, part of the text’s meaning was lost in the transition of ideas from the original text to the students’ notes. In the “bombs” example given above, the student’s notes were perhaps not as complete as those of another student, who chose to write verbatim each of the uses of uranium. In short, in the rare cases where students chose to take paraphrase notes, their attempts seemed to come at the expense of note quality. That is, they built notes inferior to those that contained verbatim text ideas.

(41) The analysis of notes thus further confirmed our basis for rejecting the depth hypothesis in the quantitative phase of this study. As mentioned, in previous research, verbatim note taking has been linked to shallow levels of processing (Slotte & Lonka, 1999), and shallow levels of processing have been linked to poor memory performances (Craik, 2000). Because students in the present study performed poorly on the tests and took mostly verbatim notes, the absence of deep processing becomes an increasingly more plausible account of our results.

Student Interviews

(42) Immediately following completion of the two tests, students were interviewed separately by the primary researcher, who typed their responses verbatim on a laptop computer. After each student’s responses were recorded, they were read back to the student to ensure that they communicated what had been intended. The items to which students responded came from a protocol consisting of seven questions/prompts:

■ Which type of note taking did you like the best?
■ And why is that so?
■ Which type of note taking did you like the least?
■ And why is that so?
■ Explain the process you used when you typed your notes.
■ Explain the process you used when you pasted your notes.
■ Explain the process you used when you wrote your notes.

(43) Additional questions were asked to further prompt answers from interviewees who at first gave brief or nondescript answers to one or more of the questions. In general, the interviews lasted from four to six minutes.

(44) After the interviews, students’ responses were printed and cut into slips of paper, each containing a statement addressing one of the questions. A predetermined set of coding schemes was subsequently used to sort the statements into three categories: typed, written, or pasted notes. The statements within each category then were read and examined several times for any commonality or thread. Similar coding systems have been used in previous research (Igo et al., 2005a). For example, Igo et al. (2005a) used three predetermined categories to sort statements from student interviews into categories addressing shallow, moderate, and deep processing. In this study, some

of the themes were identified easily, as students’ responses to the questions were in some ways quite similar. Other themes were at first more elusive, but emerged after several examinations of the statements.

(45) Following the prescriptions of Miles and Huberman (1994), an effects matrix was constructed to serve three purposes: organization of the interview data, explanation of effects, and drawing of conclusions. As shown in Table 3 (a condensed version of the matrix), the effects matrix organized the interview data by kind of note taking and student preference. Four themes emerged once the data were organized.

Table 3
Condensed Effects Matrix That Organized the Interview Data

Note Taking         Write   Type   Paste
Preferred Method    2       1      12
Least Preferred     2       13     0

Theme (Preferred Method)
1. 12 students preferred pasting; 10 because it removed the need to monitor spelling and grammar.
2. 3 students who were confident in their spelling found writing/typing notes to be the most time efficient.

Theme (Least Preferred)
3. 11 students described the worry and stress of monitoring spelling while writing and/or typing notes.
4. 6 students found typing notes cumbersome, as need to monitor spelling was amplified by the need to search keyboard for the correct letters.

Exemplar Quote (in the order listed in the matrix)
“When I pasted stuff, I . . . took it and put it in without worrying about how it looked.”
“I could write faster than I could do the others.”
“ . . . sometimes I had to look back [to the text] a couple of times for the same word if I didn’t know how to spell it.”
“It was hard to find some of the [letter keys . . . ]”
“I would forget how to spell [the word] I was typing.”

Explanation of Effects

(46) As seen in Table 3, the first theme that emerged from the interview data relates to an overwhelming note-taking preference of the students in this study. Of the 15 students, 12 described preferring copy and paste to the other two types of note taking. We judged this to be a theme because of the sheer percentage of students who gave this answer. Previous research has documented a similar preference in high-school general education populations (Igo et al., 2003), as well as college students (Katayama & Crooks, 2003; Katayama & Robinson, 2000). Another dimension of this theme was why students preferred copying and pasting. Ten students indicated that pasting notes was their favorite because it removed the barriers of spelling and grammar while they took notes. This was not documented in the previous research with general education students.

(47) The second theme was similar to the first, but relates to students who preferred to write notes. Although only two students described this preference, it is important to note that they both stated that writing was the easiest way for them to take notes. When the students gave this answer during the interviews, the interviewer asked them a further question, “Are you a good speller?” Each student said “yes” in response. In fact, the lone student who preferred to type his notes made it clear that he too was confident in his spelling, indicating that alleviating the spelling need, which seemed important to most students, was to these three students not an enticing feature of the copy-and-paste function.

(48) The third theme from the interview data relates to spelling concerns as well. First, it is important to note that all students reported least preferring

either to type or write notes. None assigned that label to copy and paste. Of the 15 students, 11 described worrying about spelling while taking notes as the primary reason why they disliked typing or writing. In fact, the note-taking tool had no spell-check function, so students could not “right-click” to find spelling options. Interestingly, 5 students stated that they were less likely to be concerned about spelling when they were writing notes and more likely to be concerned when typing their notes. Perhaps this is because, when typed, a misspelled word looks suspiciously incorrect, whereas in handwriting the error may not appear as blatant (as indicated by one student during the interviews).

(49) Finally, the fourth theme relates solely to typed notes. Typing was problematic for some students. Six students specifically mentioned having to search the keyboard to find the appropriate letter keys. This frustration was complicated even further for certain students. Thus, three mentioned having to look back to the text several times per word while searching the keyboard, as they tended to forget how to spell the word during the time it took to find the letters on the keyboard.

Conclusions and QUAN-QUAL Data Mixing

(50) The four qualitative interview themes, coupled with an analysis of student notes, offer a sound explanation of the transfer-appropriate processing effects found in lieu of depth-of-processing effects in the quantitative phase of this study. First, as indicated by the analysis of notes, most students created verbatim notes; they tended to write (or type) one or two words at a time while trying to match their notes to the main text. Verbatim note taking has been linked to shallow processing with both general-education and LD populations (Igo et al., 2003; Slotte & Lonka, 1999; Suritsky, 1992). Further, during the interviews, students consistently described the need to monitor spelling while typing and writing notes. As such, they most likely would have had to shift their mental efforts away from the meaning of the text from time to time as they took notes, which can result in diminished encoding of the text ideas (Igo, Bruning, & McCrudden, 2005b). Finally, some students described the added distraction of searching the keyboard for letters while they took notes. This, too, forces students to shift their mental efforts to a task unrelated to the message in the text.

Margin note: The intent of the explanatory sequential mixed methods design is to explain the quantitative results with qualitative data. The qualitative phase of this study . . . helps explain why students performed slightly better on tests for topics [on which they took notes] . . . by writing [rather] than by typing.

(51) The qualitative phase of this study also helps explain why students performed slightly better on tests for topics that were noted by writing than by typing. For example, although students indicated that spelling was a concern in both the writing and typing conditions, three students noted that they were able to write their notes more quickly than typing them. Similarly, five students described feeling less pressure to spell correctly while writing than while typing. Together, these two findings could account for the slightly higher performance on written topics, as each suggests that less time was spent on distracting tasks (see, e.g., Baddeley, 1998).

Implications for Practice

(52) The results of the present study suggest that middle-school students with LD struggle to encode (or learn) text ideas simply through the note-taking process regardless of the kind of notes they take (typed, written, or pasted). The encoding phase, then, results in little actual learning. Therefore, it might be of optimal benefit for students to ensure clarity and completeness of their notes in order to maximize the external storage function of note taking. In other words, if students won’t remember much of what they have noted, they should at least have a good set of notes to study from.

(53) The question becomes, “How do we best ensure that middle-school students with LD create a good set of notes from Web-based sources?” Based on our qualitative findings, one answer is that most students should use copy and paste in lieu of typing or writing for practical, motivational and

Find more at www.downloadslide.comlearning-related reasons. On a practical note, this population of students chose, (54)in general, to create verbatim notes in the writing and typing conditions. Copy (55)and paste essentially does the same thing, but it does so in a more t­ime-efficientfashion. In terms of motivation, students described a measure of anxiety regard-ing spelling and grammar while typing and writing that was not presentwhen they pasted their notes. Copying and pasting their notes, then, wouldbe a less intimidating and more comfortable experience when learning­online. Reducing the anxiety associated with note taking (spelling and gram-mar concerns) may motivate students to engage in the note-taking process(Barlow, 2000; Beck, 2004; Gray, 1982). In terms of learning, the quality of students’ notes tended to suffer whenthey attempted to paraphrase ideas while taking notes in the writing andtyping conditions. Ultimately, this would have negatively affected the poten-tial of the external storage phase (the study of notes), as their paraphrasednotes were in some ways incomplete (Divesta & Gray, 1972). Notes createdwith copy and paste, however, addressed the note-taking cues the studentswere provided.Limitations and Future Research (56) (57)At least two practical concerns with the present study should be addressed infuture research. First, the use of a single text may be problematic. Becausetexts differ with respect to their density of ideas, content, and general length,a different text might produce different results. If possible, future researchcould require students to take notes from multiple texts or different textsthan this study. Second, the present study included only 15 participants.A Latin-square design was, therefore, employed to test the effect of the threekinds of note taking on learning. Different results might be obtained if moreparticipants were included and assigned to three experimental groups thatdiffer in the kinds of notes they take. 
Last, future research should test the­external storage function of note taking by allowing students to study­before learning is assessed. In conclusion, the results from this study appear to indicate that stu-dents with LD have unique needs when it comes to gathering informationfrom Web-based text. Because more and more students with LD are requiredto use the Internet and other Web-based formats for school-related activities,it is important that teachers consider these characteristics when planninginstructional activities. Thirteen of the 15 students indicated a preference forusing the copy-and-paste tool for note taking. If copy-and-paste note takingimproves students’ potential to learn, eliminates spelling errors, and benefitsmotivation by reducing anxiety, teachers should consider instructing theirstudents in how to use copy and paste to take notes.References ­construction processes and model structure.Aiken, E. G., Thomas, G. S., & Shennum, W. A. (1975). Quarterly Journal of  Experimental Psychology, 53(2), 479–512. Memory for lecture: Effects of notes, lecture rate, Barlow, D. H. (2000). Unraveling the mysteries of anx- and informational density. Journal of Educational iety and its disorders from the perspective of emo- Psychology, 67, 439–444. tion theory. American Psychologist, 55, 1245–1263.Alley, G., & Deshler, D. (1979). Teaching the learn- Beck, R. C. (2004). Motivation: Theories and princi- ing disabled adolescent: Strategies and methods. ples. Upper Saddle River, NJ: Pearson Prentice Hall. Denver, CO: Love. Blanchard, J. S. (1985). What to tell students aboutBaddeley, A. (1998). Human memory: Theory and underlining . . . and why. Journal of Reading, 29, practice. Boston: Allyn Bacon. 199–203.Baguley, T., & Payne, S. J. (2000). Long-term mem- ory for spatial temporal mental models includes 471

Boyle, J. R., & Weishaar, M. (2001). The effects of a strategic note-taking technique on the comprehension and long term recall of lecture information for high school students with LD. Learning Disability Research and Practice, 16(3), 125–133.

Bransford, J. D., Franks, J. J., Morris, C. D., & Stein, B. S. (1979). Some general constraints on learning and memory research. In L. S. Cermak & F. I. M. Craik (Eds.), Levels of processing in human memory (pp. 331–354). Hillsdale, NJ: Erlbaum.

Cermak, L., & Craik, F. (1979). Levels of processing in human memory. Hillsdale, NJ: Erlbaum.

Council for Exceptional Children. (2004). The NEW IDEA: CEC’s summary of significant issues. Retrieved January 17, 2005, from cec IDEA_ 120204.pdf

Craik, F. I. M. (2000). Memory: Coding process. In A. Kazdin (Ed.), Encyclopedia of psychology. Washington, DC: American Psychological Association.

Craik, F., & Lockhart, R. (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning & Verbal Behavior, 11, 671–684.

Creswell, J. W. (2003). Research design: Qualitative, quantitative, and mixed methods approaches (2nd ed.). Thousand Oaks, CA: Sage.

Dabbagh, N., & Bannan-Ritland, B. (2005). Online learning: Concepts, strategies, and application. Upper Saddle River, NJ: Prentice Hall.

Deshler, D. D., Shumaker, J. B., Alley, G. R., Clark, E. L., & Warner, M. M. (1981). Paraphrasing strategy. University of Kansas, Institute for Research in Learning Disabilities (Contract No. 300-77-0494). Washington, DC: Bureau of Education for the Handicapped.

Divesta, F. J., & Gray, G. S. (1972). Listening and note taking. Journal of Educational Psychology, 63, 8–14.

Divesta, F. J., & Gray, G. S. (1973). Listening and note taking II. Journal of Educational Psychology, 64, 278–287.

Ellis, E. S., & Lenz, B. K. (1987). A component analysis of effective learning strategies for LD students. Learning Disabilities Focus, 2(2), 94–107.

Federal Register. (1999). 34 C. F. R. Parts 300 and 303.

Golding, J. M., & Fowler, S. B. (1992). The limited facilitative effect of typographical signals. Contemporary Educational Psychology, 17, 99–113.

Gray, J. A. (1982). The neuropsychology of anxiety: An inquiry into the functions of the septohippocampal system. Oxford: Oxford University Press.

Hidi, S., & Anderson, V. (1986). Producing written summaries: Task demands, cognitive operations, and implications for instruction. Review of Educational Research, 56, 473–493.

Hughes, C. A., & Smith, J. O. (1990). Cognitive and academic performance of college students with learning disabilities: A synthesis of the literature. Learning Disability Quarterly, 13, 66–79.

Igo, L. B., Bruning, R. H., & McCrudden, M. (2005a). Exploring differences in students copy and paste decision-making and processing: A mixed methods study. Journal of Educational Psychology, 97, 103–116.

Igo, L. B., Bruning, R., & McCrudden, M. (2005b). Encoding disruption associated with copy and paste note taking. In L. PytlikZillig, M. Bodvarsson, & R. Bruning (Eds.), Technology-based education: Bringing researchers and practitioners together (pp. 107–119). Greenwich, CT: Information Age Publishing.

Igo, L. B., Bruning, R., McCrudden, M., & Kauffman, D. F. (2003). InfoGather: A tool for gathering and organizing information from the Web. In R. Bruning, C. Horn, & L. PytlikZillig (Eds.), Web-based learning: What do we know? Where do we go? (pp. 57–77). Greenwich, CT: Information Age Publishing.

Katayama, A. D., & Crooks, S. M. (2003). Online notes: Differential effects of studying complete or partial graphically organized notes. Journal of Experimental Education, 71, 293–312.

Katayama, A. D., & Robinson, D. H. (2000). Getting students “partially” involved in note-taking using graphic organizers. Journal of Experimental Education, 68, 119–133.

Kiewra, K. A. (1985). Investigating note taking and review: A depth of processing alternative. Educational Psychologist, 20, 23–32.

Kiewra, K. A. (1989). A review of note taking: The encoding-storage paradigm and beyond. Educational Psychology Review, 3, 147–172.

Kiewra, K., Dubois, N., Christain, D., McShane, A., Meyerhoffer, M., & Roskelley, D. (1991). Note-taking functions and techniques. Journal of Educational Psychology, 83, 240–245.

Marxen, D. E. (1996). Why reading and underlining a passage is a less effective strategy than simply rereading the passage. Reading Improvement, 33, 88–96.

Mastropieri, M., & Scruggs, T. (2000). The inclusive classroom: Strategies for effective instruction. Columbus, OH: Merrill.

Mayer, R. R. (2002). The promise of educational psychology: Teaching for meaningful learning (Vol. II). Columbus, OH: Merrill Prentice Hall.

McAndrew, D. A. (1983). Underlining and note taking: Some suggestions from research. Journal of Reading, 27, 103–108.

Mercer, C. D., & Mercer, A. R. (2001). Teaching students with learning problems (6th ed.). Upper Saddle River, NJ: Merrill/Prentice Hall.

Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis. Thousand Oaks, CA: Sage.

Chapter 18 • Inferential Statistics

Figure 18.1 • Regions of rejection for α = .05 and α = .01
[Figure: two bell curves, each with a horizontal axis running from −3 SE to 3 SE around X̄. Left panel, significance level α = .05: the region of chance covers 95% of the area, with a region of rejection in each tail. Right panel, significance level α = .01: the region of chance covers 99+% of the area, with a region of rejection in each tail.]

Figure 18.2 • Significance areas for one-tailed and two-tailed tests with α = .05
[Figure: two bell curves. Left panel, two-tailed test, α = .05: a region of rejection holding 2.5% of the area under the curve in each tail, with the region of chance between them. Right panel, one-tailed test, α = .05: a single region of rejection holding 5% of the area under the curve in one tail, with the region of chance covering the rest.]

are not simply due to chance. Note that for both bell curves, a total of 5% of the scores fall into the shaded range: Alpha is set at .05.

A concrete example is useful to help understand the graphs and the distinction between a one-tailed and a two-tailed test. Consider the following null hypothesis:

There is no difference between the behavior during the hour before lunch of kindergarten students who receive a midmorning snack and that of kindergarten students who do not receive a midmorning snack.

What if the null hypothesis is true, so that the midmorning snack doesn’t matter? If we take repeated samples of kindergarten children and randomly divide the children in each sample into two groups, we can expect that, for most of our samples, the two groups will have very similar behavior during the hour before lunch. In other words, if we make a graph with “no difference” at the middle and “big differences” at the ends, most scores will fall into the region of chance illustrated in Figure 18.2. Sometimes, however, the two groups will appear very different (although just by chance, if the null hypothesis is true): in some cases, the group with the snack will be better behaved (i.e., one tail on the graph), and in other cases, the group without the snack will be better behaved (i.e., the other tail on the graph).

When conducting our study, we want to know if we can reject the null hypothesis; we believe it’s not true. Assume, then, we have a directional research hypothesis:

Kindergarten children who receive a midmorning snack exhibit better behavior during the hour before lunch than kindergarten students who do not receive a midmorning snack.

To reject the null hypothesis and claim support for our research hypothesis, we need to find not just that there’s a difference between groups but also that children who get snacks exhibit better behavior than their peers who don’t get snacks, and we need to feel confident that our results aren’t simply due to chance. We set α = .05; a statistically significant difference between the groups (i.e., not likely to be due to chance) will be large enough to fall into the region of rejection, or the shaded region on the right tail of the bell curve on the right of Figure 18.2. We look at only one tail because, according to our hypothesis, we’re interested only in seeing whether the group receiving snacks behaves better than the group without snacks.

But what if the outcome is reversed, and children who get snacks behave much worse than children who don’t? We haven’t supported our research hypothesis (in fact, we’ve found the exact opposite!), and although we’ve found a big difference between groups, the mean difference doesn’t fall into the region of rejection on our one-tailed graph. We can’t reject the null hypothesis, then, but the null hypothesis clearly doesn’t reflect the true state of affairs either! It should be clear that a two-tailed test of significance would help us because it allows for both possibilities: that the group that received a snack would be better behaved or that the group without a snack would be better behaved.

Tests of significance are almost always two-tailed. To select a one-tailed test of significance, the researcher has to be very certain that a difference will occur in only one direction, and this is not very often the case. However, when appropriate, a one-tailed test has one major advantage: The score difference required for significance is smaller than for a two-tailed test. In other words, it is “easier” to obtain a significant difference when predicting change in only one direction. To understand this concept in more detail, reconsider Figure 18.2. Because α = .05, the region of rejection represents 5% of the area under the curve. In the graph for the two-tailed test, however, that 5% is split into two regions of 2.5% each to cover both possible outcomes (e.g., the children with snacks will behave better or the children without snacks will behave better). As should be clear from the graphs, the values that fall into the two shaded tails of the graph on the left are more extreme than the values that fall into the one shaded tail of the graph on the right. For example, when using a two-tailed test, the two groups of kindergarteners (i.e., with or without snacks) need to be quite different, more different than they need to be if you are using only a one-tailed test.

Type I and Type II Errors

Based on a test of significance, as we have discussed, the researcher will either reject or not reject the null hypothesis. In other words, the researcher will make the decision that the difference between the means is, or is not, likely due to chance. Because we are dealing with probability, not certainty, we never know for sure whether we are absolutely correct. Sometimes we make mistakes: we decide that a difference is a real difference when it’s really due to chance, or we decide that a difference is due to chance when it’s not. These mistakes are known as Type I and Type II errors, respectively.

To understand these errors, reconsider our example of the two methods of reading at Pinecrest Elementary. Our decision-making process can lead to four possible outcomes (see Figure 18.3):

1. The null hypothesis can, in reality, be true for the population (i.e., no difference between the reading methods: new method = old method). If we decide that any difference we find is just due to chance, we fail to reject the null hypothesis, and we have made a correct decision.
   ■ Correct: Null hypothesis is true; the researcher fails to reject it and concludes no significant difference between groups.
2. The null hypothesis, in reality, is false (i.e., new method ≠ old method). If we decide that we are reasonably confident that our results are not simply due to chance, we reject the null hypothesis. We have made a correct decision.
   ■ Correct: Null hypothesis is false; the researcher rejects it and concludes that the groups are significantly different.
3. The null hypothesis is true (i.e., new method = old method), but we reject it, believing that the results are not simply due to chance and that the methods are different. In this case, we have made an incorrect decision. We have mistakenly assumed there is a difference in the reading programs when there is none.
   ■ Incorrect: Null hypothesis is true, but the researcher rejects it and concludes that the groups are significantly different.
4. The null hypothesis is false (i.e., new ≠ old), but we fail to reject it, believing that the groups are really the same. We are incorrect because we have concluded there is no difference when indeed there is a difference.
   ■ Incorrect: Null hypothesis is false, but the researcher fails to reject it and concludes that the groups are not significantly different.

Figure 18.3 • The four possible outcomes of decision making concerning rejection of the null hypothesis

                                         The true status of the null hypothesis. It is really
The researcher's decision. The           True                        False
researcher concludes that the            (should not be rejected)    (should be rejected)
null hypothesis is
  True (does not reject)                 Correct Decision            Type II Error
  False (rejects)                        Type I Error                Correct Decision

If the researcher incorrectly rejects a null hypothesis (i.e., possibility 3), the researcher has made a Type I error. If the researcher incorrectly fails to reject a null hypothesis (i.e., possibility 4), the researcher has made a Type II error.

The probability level selected by the researcher determines the probability of committing a Type I error, that is, of rejecting a null hypothesis that is really true (i.e., thinking you’ve found an effect when you haven’t). Thus, if you select α = .05, you have a 5% probability of making a Type I error, whereas if you select α = .01, you have only a 1% probability of committing a Type I error. For example, at Pinecrest Elementary, we found a difference between the means of the old reading program and the new reading program; suppose our inferential statistics show that the difference is significant at our preselected level of α = .05, and we reject the null hypothesis of no difference. In essence, we are saying that we are confident that the difference resulted from the independent variable (i.e., the new method of reading), not random error, because the chances are only 5 out of 100 (.05) that a difference in mean reading scores as large as (or larger than) the one we have found would occur solely by chance. What if we are worried that we have too much at stake if we make the wrong decision about our results? For example, if we do not want to risk giving the superintendent the wrong advice, we may decide that a more stringent level of significance (α = .01) is necessary. We are saying that a difference as large as the one we have found between the reading scores at Pinecrest would be expected to occur by chance only once for every 100 samples from our population; there’s only one chance in 100 that we make a Type I error if we conclude that the new reading method is better.

So why not set our probability level at α = .000000001 and hardly ever be wrong? If you select α to be very, very small, you definitely decrease your chances of committing a Type I error; you will hardly ever reject a true null hypothesis. But as you decrease the probability of committing a Type I error, you increase the probability of committing a Type II error, that is, of not rejecting a null hypothesis when you should. Because the choice of a probability level, α, is made before execution of the study, researchers need to consider the relative seriousness of committing a Type I versus a Type II error and select α accordingly. We must compare the consequences of making each wrong decision. For example, perhaps the new method of reading at Pinecrest Elementary is much more expensive to implement than the old, traditional method of reading: If we adopt it, we will have to spend a great deal of money on materials, in-service training, and new testing. Given the expense of the new program, we can set our level of significance (α) to .01; we want to reduce the likelihood of Type I error. In other words, we want to be confident (at α = .01) that, if we recommend the new program, it works better than the old one, and that any difference we may find is not simply random error. We may be more willing to risk a Type II error (i.e., concluding the new method isn’t better, although it really is) because it is such an expensive program to implement and we suspect we may find a better but cheaper program.

We’re willing to take that risk because, in this example, a Type I error is the more serious error for Pinecrest Elementary. If we conclude that the new program really works, but it’s not really any better than the old program (i.e., a Type I error), the superintendent is going to be very upset if a big investment is made based on our decision and the students show no difference in achievement at the end of the year. On the other hand, if we conclude that the new program does not really make a difference, but it really is better (i.e., a Type II error), it’s likely that nothing adverse will happen. In fact, the superintendent would not know that the new method is better because, of course, we never implemented it. We just hope the superintendent does not find any research suggesting that the new program was effective elsewhere. Given the choice, then, we would rather commit a Type II error (i.e., rejecting a successful program) than a Type I error (i.e., going to the expense of implementing an unsuccessful program).

The choice of which type of error is worth risking may not always be so clear, however. For example, if Pinecrest Elementary is not meeting its Adequate Yearly Progress (AYP) goals for reading under No Child Left Behind (NCLB), the stakes are high. We need to find a reading program better than the old method, or we risk losing funding and potentially closing the school. Under these circumstances, we may be more likely to risk committing a Type I error: we have little to lose if we select a program that is no better than the program we are now using, and a lot to gain if the program is in fact better. Therefore, we can use a level of significance that is less stringent; we can accept the greater risk that a difference of the size we find could occur by chance in 5 out of 100 studies (α = .05) or 10 out of 100 (α = .10).

The decision about level of significance for a particular study is based on both risk and practical significance. If the consequences of committing a Type I error are not severe or life threatening, we usually accept a less stringent level of significance (e.g., α = .05 rather than α = .01). A study conducted by one of the authors can help explain these choices further. A social service agency needed to make a choice about whether to use a day-treatment or residential-treatment program for adolescent drug abusers. A study of the two programs found the residential program was significantly better at the predetermined alpha of .10. Because the risk of committing a Type I error (i.e., claiming the residential program was better when it wasn’t) was not high, α = .10 was an acceptable level of risk. Unfortunately, the residential-treatment program, as you would imagine, was quite expensive. Even though the residential program was significantly different from the day-treatment program, the researchers recommended using the day-treatment program. This example clearly shows the difference between statistical and practical significance. The higher cost of the residential program did not justify the statistical advantage over the day-treatment program. Furthermore, the researchers subsequently found that, had they set the level of significance at a more stringent level (α = .01), the difference in programs would not have been statistically significant. The researchers concluded, as did the program administrators, that the cheaper program was the better choice.

Both as researchers and as consumers, we make choices every day based on acceptable levels of risk. For example, we may choose to take vitamins each morning based on studies of their effectiveness that show only marginally significant results. But because the risk of being wrong (i.e., a Type I error) is not severe, we go ahead and take the vitamins: So what if they might not really work? As long as they don’t hurt, it’s worth a try. On the other hand, if we decided to jump out of an airplane, we would want to use a parachute that has a very high probability of working correctly and would want to know how this type of parachute performed in repeated trials. And we would want a highly stringent probability level, such as α = .000001 or beyond. The risk of being wrong is fatal. When you are unsure what level of risk is acceptable, selecting α = .05 is a standard practice that provides an acceptable balance between Type I and Type II error. Otherwise, consider the risk: Are you jumping out of an airplane or are you trying to decide if you should take vitamin C? Fortunately, most choices in the fields of education and human services are not life or death.

Find more at www.downloadslide.com532 chapter 18  •  Inferential StatisticsDegrees of Freedom Selecting Among Tests of SignificanceAfter determining whether the significance testwill be two-tailed or one-tailed and select- Many different statistical tests of significance caning a probability cutoff (i.e., alpha), we then be applied in research studies. Factors such asselect an appropriate statistical test and con- the scale of measurement represented by the dataduct the analysis. When computing the statistics (e.g., nominal, ordinal, etc.), method of participantby hand, we check to see if we have significant selection, number of groups being compared, andresults by consulting the appropriate table at number of independent variables determine whichthe intersection of the probability level and test of significance should be used in a givendegrees of freedom (df) used to evaluate signifi- study. It is important that the researcher selectcance. When the analysis is conducted on the the appropriate test because an incorrect test cancomputer, the output contains the exact level of lead to incorrect conclusions. The first decisionsignificance (i.e., the exact probability that the in selecting an appropriate test of significance isresults are due to chance), and the degrees of whether a parametric test or a nonparametric testfreedom. must be selected. Parametric tests are usually more powerful and generally preferable when practical. An example may help illustrate the concept “More powerful” in this case means that, based onof degrees of freedom, defined as the number the results, the researcher is more likely to reject aof observations free to vary around a param- null hypothesis that is false; in other words, use ofeter. Suppose we ask you to name any five a powerful test makes the researcher more likelynumbers. 
You agree and say, “One, two, three, to identify a true effect and thus less likely to com-four, five.” In this case, N is equal to 5; you had mit a Type II error.five choices and you could select any numberfor each choice. In other words, each number A parametric test, however, requires certainwas “free to vary;” it could have been any num- assumptions to be met for it to be valid. Forber you wanted. Thus, you had five degrees example, the measured variable must be normallyof freedom for your selections (df = N). Now distributed in the population (or at least that thesuppose we tell you to name five numbers, you shape of the distribution is known). Many variablesbegin with, “One, two, three, four, . . .,” and we studied in education are normally distributed,say: “Wait! The mean of the five numbers you so this assumption is often met. A second majorchoose must be 4.” Now you have no choice for assumption is that the data represent an intervalthe final number—it must be 10 to achieve the or ratio scale of measurement, although in somerequired mean of 4 (i.e., 1 + 2 + 3 + 4 + 10 = 20, cases, ordinal data, such as from a Likert-typeand 20/5 = 4). That final number is not scale, may be included. Because many measuresfree to vary; in the language of statistics, you used in education represent or are assumed tolost one degree of freedom because of the represent interval data, this assumption is usuallyrestriction that the mean must be 4. In this met. In fact, this is one major advantage of usingsituation, you only had four degrees of freedom an interval scale—it permits the use of a paramet-(df = N - 1). ric test. A third assumption is that the selection of participants is independent. In other words, the Each test of significance has its own for- selection of one subject in no way affects selec-mula for determining degrees of freedom. For tion of any other subject. 
Recall from Chapter 5example, for the product moment correlation that, with random sampling, every member of thecoefficient, Pearson r, the formula is N - 2. The population has an equal and independent chancenumber 2 is a constant, requiring that degrees of to be selected for the sample. Thus, if randomiza-freedom for r are always determined by subtract- tion is used in participant selection, the assump-ing 2 from N, the number of participants. Each tion of independence is met. Another assumptionof the inferential statistics discussed in the next is that the variances of the comparison groups aresection also has its own formula for degrees of equal (or at least that the ratio of the variances isfreedom, but in every case, the value for df is known). Remember that the variance of a groupimportant in determining whether the results arestatistically significant.

of scores is the square of the standard deviation (see Chapter 17 for discussion of variance and standard deviation).

With the exception of independence, a small violation of one or more of these assumptions usually does not greatly affect the results of tests for significance. Because parametric statistics seem to be relatively hardy, doing their job even with moderate assumption violation, they are usually selected for analysis of research data. However, if one or more assumptions are violated to a large degree (for example, if the distribution is extremely skewed), parametric statistics should not be used. In such cases, a nonparametric test, which makes no assumptions about the shape of the distribution, should be used. Nonparametric tests are appropriate when the data represent an ordinal or nominal scale, when a parametric assumption has been greatly violated, or when the nature of the distribution is not known.

Nonparametric tests are not as powerful as parametric tests. In other words, it is more difficult with a nonparametric test to reject a null hypothesis at a given level of significance; usually, a larger sample size is needed to reach the same level of significance as in a parametric test. Additionally, many hypotheses cannot be tested with nonparametric tests. Nevertheless, we often have no choice but to use nonparametric statistics when we are dealing with societal variables that are not conveniently measured on an interval scale, such as religion, race, or ethnicity.

In the following sections, we examine both parametric and nonparametric statistics. Although we cannot discuss every statistical test available to the researcher, we describe several statistics commonly used in educational research.

The t Test

The t test is used to determine whether two groups of scores are significantly different at a selected probability level. A one-sample t test considers whether a sample mean is significantly different from a hypothesized value or population estimate. A two-sample t test is used to determine if two levels/values for one variable differ from each other statistically. For example, a two-sample t test can be used to compare the reading scores for males and females at Pinecrest Elementary School.

The basic strategy of the t test is to compare the actual difference between the means of the groups ($\bar{X}_1 - \bar{X}_2$) with the difference expected by chance. For our data from Pinecrest Elementary School, we can use a t test to determine if the difference between the reading scores of the boys and the girls is statistically significant, that is, whether a difference of the size we find is unlikely to have occurred by chance. The test forms a ratio from the scores for the boys and the girls, as shown in the formula below:

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left(\dfrac{SS_1 + SS_2}{n_1 + n_2 - 2}\right)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} $$

In the formula, the numerator is the difference between the sample means $\bar{X}_1$ and $\bar{X}_2$, and the denominator is the chance difference that would be expected if the null hypothesis were true (i.e., no difference between the boys' and the girls' scores). $SS_1$ and $SS_2$ are the sums of squares for the two groups, and $n_1$ and $n_2$ are the group sizes. In other words, the denominator is the standard error of the difference between the means, a function of both sample size and group variance. Smaller sample sizes and greater variation within groups are associated with greater random differences between groups. Even if the null hypothesis were true, we do not expect two sample means to be identical; there will always be some chance variation. The t test determines the likelihood that a difference of this size would be expected solely by chance.

If we were making the t test calculation by hand, we would divide the numerator by the denominator and then determine whether the resulting t value reflects a statistically significant difference between the groups by comparing the t we computed to a table of t values (you can easily find a table of t values using an Internet search). If the t value (ignoring its sign) is equal to or greater than the table value for the required df (i.e., reflecting sample size) and alpha (i.e., reflecting significance level), then we can reject the null hypothesis: Our results suggest a significant difference between the groups. If the t value we compute is less than the table value, we fail to reject the null hypothesis: Any difference we have found is likely due to sampling error (i.e., chance). Of course, typically, we would conduct the t test with the computer, which produces output

showing the t value, its level of significance, and the degrees of freedom.

In determining significance, the t test is adjusted for the fact that the distribution of scores for small samples becomes increasingly different from the normal distribution as sample sizes become increasingly smaller. For example, distributions for smaller samples tend to be higher at the mean and at the two ends of the distribution. As a result, the t values required to reject a null hypothesis are higher for small samples. As the size of the samples becomes larger, the score distribution approaches normality. Keep in mind that, as the number of participants increases, degrees of freedom also increase, and the t value (or test statistic) needed to reject the null hypothesis becomes smaller. Furthermore, as alpha becomes smaller (e.g., .01 vs .05), a larger t value is required to reject the null hypothesis.

The t Test for Independent Samples

The t test for independent samples is a parametric test of significance used to determine whether, at a selected probability level, a significant difference exists between the means of two independent samples. Independent samples are randomly formed without any type of matching: the members of one sample are not related to members of the other sample in any systematic way other than that they are selected from the same population. If two groups are randomly formed, the expectation is that, at the beginning of a study, they are essentially the same with respect to performance on the dependent variable. Therefore, if they are also essentially the same at the end of the study (i.e., their means are close), the null hypothesis is probably true. On the other hand, if their means are not close at the end of the study, the null hypothesis is probably false and should be rejected. The key word is essentially. We do not expect the means to be identical at the end of the study; they are bound to be somewhat different. The question of interest, of course, is whether they are significantly different.

Calculating the t Test for Independent Samples Using SPSS

As we have discussed, the t test for independent samples is used when we want to compare the scores for two groups. For example, at Pinecrest Elementary School, we would want to know if the boys' reading scores are statistically different from the girls' scores. Even though we know from our previous example that the mean for the girls on the fall reading score ($\bar{X}$ = 52.533) was higher than that for the boys ($\bar{X}$ = 41.054), we do not know how likely it is that this difference would occur by chance or if it is a meaningful difference statistically. The t test helps us decide whether the difference between the boys' and the girls' scores is statistically significant, that is, not likely to have occurred by chance.

We can use Excel, SPSS (Statistical Package for the Social Sciences), or a variety of other software applications to conduct a t test. Although each program has advantages and disadvantages, dedicated statistical packages, such as SPSS, are set up in terms of dependent and independent variables, and as our analyses become more complex and use larger numbers of variables and cases, statistical packages offer more advantages. Although the setup procedures for analyses on different statistical software packages are slightly different, they are all somewhat similar; that is, we select the variables to be compared and the statistical test to run. To help you understand the variable selection process required in statistical tests, we present a step-by-step example of the t test procedure using SPSS. Explanations for other procedures we use in this chapter are available in Appendix A.

To perform the t test in SPSS, first click on Analyze and choose Compare Means from the pull-down menu. A submenu appears, as shown in Figure 18.4. From this submenu, choose Independent-Samples T Test . . . . In summary, the options are as follows:

Analyze
Compare Means
Independent-Samples T Test . . .

Selecting these options produces the Independent-Samples T Test window, shown in Figure 18.5.

In our example, we are comparing the fall reading test scores (ReadF) of the boys in Ms. Alvarez's third-grade class to the girls' scores. We need to move the dependent (i.e., outcome) variable, fall reading score (ReadF), into the Test Variable(s) section. Next, we need to specify that we would like to compare the group of boys and girls; we need to select the Grouping Variable, which is

Figure 18.4 • SPSS menu options for independent-samples t test

Figure 18.5 • Independent-samples t test window

gender. We define the groups by selecting the Define Groups button just underneath Grouping Variable, as shown in Figure 18.6. Because the two groups in our dataset are specified as Group 1 for boys and Group 2 for girls, we simply type the number 1 (for Group 1) and the number 2 (for Group 2). Remember that you need to specify the codes for the groups to be tested, which may not always be 1 or 2, as in this example.

Figure 18.6 • Independent-samples t test window with Define Groups button and Define Groups window

To run the analysis, click Continue to return to the Independent-Samples T Test window. Click on the OK button. The analysis runs, and an output window opens, showing two tables. The first table is the Group Statistics table, shown in Table 18.1. This table shows you the sample size for each group (represented in the N column) as well as the mean test score for each group, the standard deviation, and the standard error of the mean. Table 18.2 shows the results of the independent-samples t test, including additional statistics to assist with the interpretation. The first set of statistics comes under the heading Levene's Test for Equality of Variances. This test determines if the variances of the two groups in the analysis are equal. If they are not, then SPSS makes an adjustment to the remainder of the statistics to account for this difference. When the observed probability value of the Levene's test (shown in the Sig. column) is greater than .05, you should read the results on the top row of t test statistics, equal variances assumed, because we have found no significant difference in the variances. When the observed probability value for the Levene's test is less than .05, you should read the results from the bottom row of t test statistics, equal variances not assumed, because the difference between the group variances is significant. In Table 18.2, the observed probability value for the Levene's test is greater than .05 (i.e., Sig. = .560, no significant difference found), so you should read from the top row of the t test statistics, equal variances assumed.

Table 18.1 • Independent-samples output

Group Statistics

         Gender   N    Mean     Std. Deviation   Std. Error Mean
ReadF    1        13   41.054   14.0581          3.8990
         2        12   52.533   13.2480          3.8244
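As a rough cross-check on the SPSS output, the hand calculation described earlier for the t ratio can be sketched from the summary statistics in Table 18.1 alone. This is a minimal sketch, not the SPSS procedure itself: the variable names are illustrative, and the "equal variances not assumed" row is assumed here to use the Welch-Satterthwaite adjustment, which is consistent with the fractional degrees of freedom SPSS reports in Table 18.2.

```python
import math

# Summary statistics copied from Table 18.1 (Group 1 = boys, Group 2 = girls).
n1, mean1, sd1 = 13, 41.054, 14.0581
n2, mean2, sd2 = 12, 52.533, 13.2480

# Sum of squares for each group: SS = (n - 1) * variance, and variance = SD squared.
ss1 = (n1 - 1) * sd1 ** 2
ss2 = (n2 - 1) * sd2 ** 2

# Equal variances assumed: pooled standard error of the difference between means.
df_pooled = n1 + n2 - 2
se_pooled = math.sqrt((ss1 + ss2) / df_pooled * (1 / n1 + 1 / n2))
t_pooled = (mean1 - mean2) / se_pooled

# Equal variances not assumed (assumption: Welch-Satterthwaite adjustment).
v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
se_welch = math.sqrt(v1 + v2)
t_welch = (mean1 - mean2) / se_welch
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

print(f"equal variances assumed:     t = {t_pooled:.3f}, df = {df_pooled}, SE = {se_pooled:.4f}")
print(f"equal variances not assumed: t = {t_welch:.3f}, df = {df_welch:.3f}, SE = {se_welch:.4f}")
```

Both rows should agree with Table 18.2 to the precision shown there. The sketch stops at the t value; the probability (Sig.) still has to come from a t table or a statistical package.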

Table 18.2 • Independent-samples t test statistics

Independent Samples Test (ReadF)

                              Levene's Test for
                              Equality of Variances   t-test for Equality of Means
                                                                                                                 95% Confidence Interval
                                                                                                                 of the Difference
                              F      Sig.             t        df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower      Upper
Equal variances assumed       .350   .560             -2.097   23       .047              -11.4795          5.4750                  -22.8055   -.1535
Equal variances not assumed                           -2.102   22.987   .047              -11.4795          5.4615                  -22.7778   -.1811

Having selected the appropriate row to read, we can find the observed t statistic value and its corresponding probability value. In Table 18.2, the observed t statistic we use for equal variances is -2.097 with its observed level of significance (Sig.) = .047. The value for t is negative because SPSS subtracts the second number from the first, but the sign (i.e., negative or positive) has no effect on how we interpret the level of significance. The significance level of this t test (p = .047) indicates that a difference between the means this large (i.e., -11.4795) would happen by chance only 4.7 times out of 100 in repeated studies. Because .047 is smaller than the standard alpha level (α = .05) that we had preselected for our level of significance, this example shows a statistically significant difference between the fall reading scores of the boys and those of the girls. We can thus have confidence as we inform our colleagues at Pinecrest Elementary that the boys are entering third grade with lower reading achievement than the girls; we may suggest that the school staff accommodate for this difference using the current reading curriculum as the school year proceeds, or we may recommend new programs to benefit the boys in younger grades.

Note that, although we've used SPSS to present our example, we could use other programs and achieve the same result. For example, in Excel, we would select the appropriate t test in the Data Analysis menu and then define our variable range (e.g., boys' fall reading scores compared to the girls' scores) to conduct the test. The value for t and the probability that the results are due to chance will be the same as those generated by SPSS.

The t Test for Nonindependent Samples

The t test for nonindependent samples is used to compare groups that are formed by some type of matching. The nonindependent procedure is also used to compare a single group's performance on a pretest and posttest or on two different treatments. For example, at Pinecrest Elementary School, we want to know if students' reading scores improved from the beginning of the year to the end of the year. Because we have fall and spring test scores for each student, this sample has nonindependent scores. When scores are nonindependent, they are systematically related: Because at Pinecrest the reading scores are from the same students at two different times, they are expected to correlate positively with each other; students with high scores in the fall will likely have high scores in the spring, and students with low scores in the fall will likely have low scores in the spring. When the scores are nonindependent, a special t test for nonindependent samples is needed. The error term of the t test tends to be smaller than for independent samples, and the probability that the null hypothesis will be rejected is higher.

Calculating the t Test for Nonindependent Samples Using SPSS

Even though we are using the same student data for Pinecrest Elementary School in our examples, the different questions we ask allow equally different analyses. To answer our question about whether the students' reading scores improved over the school year, we conduct a t test of

nonindependent samples, comparing fall reading scores to spring reading scores. Because we are also interested to know if the students' math scores also improved, we conduct a second comparison including MathF (i.e., fall scores) and MathS (i.e., spring scores). It is easy to conduct several analyses of nonindependent samples with SPSS; we just designate the additional variables in the t test.

To conduct these analyses, select Analyze in the SPSS Data Editor window. To find the nonindependent t test, scroll down the Analyze menu and select the Compare Means option. From the submenu, choose the option called Paired Samples T Test. This difference in designation may seem a bit confusing. One way to think of it is that we are comparing two sets of scores (i.e., fall and spring) for the same group of students. Therefore, the relation between the sets of scores is dependent on the group of people, our Pinecrest students. We are not, however, matching each student to the scores; rather, we are comparing the group means in the fall to the group means in the spring (i.e., a pair of data points for each participant).

In summary, the procedures for the SPSS analysis are as follows:

Analyze
Compare Means
Paired-Samples T Test . . .

Because this analysis is somewhat similar to the previous t test example, you may refer to the test menu options in Appendix A, Figures A18.1 and A18.2.

The paired samples t test requires that you choose the variables to include in the analysis and, as with the test for independent samples, move them to the right section of the window, labeled Paired Variables. As you select the variables, the Current Selections section at the bottom of the window (Figure B18.2) shows them. Click the arrow button to move them into the Paired Variables section on the right of the screen, and click the OK button to conduct the analysis.

The first section of the output, showing descriptive statistics for each variable, is displayed in Table 18.3. First, the mean for each variable is shown. For ReadF, $\bar{X}$ = 46.564; for MathF, $\bar{X}$ = 45.408. Similarly, for ReadS, $\bar{X}$ = 53.964, and for MathS, $\bar{X}$ = 45.560. The next number shown is the number of cases, or the sample size (N). The output shows that 25 people (the number of students in Ms. Alvarez's class) took each of the tests. The third statistic shown is the standard deviation for each set of test scores. SD is used to compute the final statistic shown in the table, the standard error of the mean scores. These values are used to compute t, shown in the remainder of the output (Table 18.4).

Table 18.3 • Dependent samples output

Paired Samples Statistics

                 Mean     N    Std. Deviation   Std. Error Mean
Pair 1   ReadF   46.564   25   14.6123          2.9225
         ReadS   53.964   25   17.1274          3.4255
Pair 2   MathF   45.408   25   15.3108          3.0622
         MathS   45.560   25   15.3964          3.0793

Table 18.4 • Dependent samples t test output

Paired Samples Test

                         Paired Differences
                                                                      95% Confidence Interval
                                                                      of the Difference
                         Mean      Std. Deviation   Std. Error Mean   Lower     Upper      t        df   Sig. (2-tailed)
Pair 1   ReadF–ReadS     -7.4000   5.5039           1.1008            -9.6719   -5.1281    -6.722   24   .000
Pair 2   MathF–MathS     -.1520    1.5286           .3057             -.7830    .4790      -.497    24   .624
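The Paired Differences columns of Table 18.4 contain everything needed to reproduce the t values and confidence intervals by hand. The sketch below is illustrative only: the function name is hypothetical, and the critical value 2.0639 (two-tailed, df = 24, alpha = .05) is an assumption taken from a standard t table, not a number given in the chapter.

```python
import math

def paired_t(mean_diff, sd_diff, n, t_critical):
    """Return (standard error, t, lower, upper) for a paired-samples t test."""
    se = sd_diff / math.sqrt(n)      # standard error of the mean difference
    t = mean_diff / se               # observed t statistic, with df = n - 1
    margin = t_critical * se         # half-width of the confidence interval
    return se, t, mean_diff - margin, mean_diff + margin

# Assumption: two-tailed critical t for df = 24 and alpha = .05, from a t table.
T_CRIT_DF24 = 2.0639
N = 25  # students in the class

# Mean and standard deviation of the paired differences, from Table 18.4.
pairs = {"ReadF-ReadS": (-7.4000, 5.5039), "MathF-MathS": (-0.1520, 1.5286)}

results = {name: paired_t(m, sd, N, T_CRIT_DF24) for name, (m, sd) in pairs.items()}
for name, (se, t, lower, upper) in results.items():
    print(f"{name}: SE = {se:.4f}, t = {t:.3f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
```

The reading interval lies entirely below zero while the math interval straddles zero, which is another way of reading the two Sig. values in the table.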

SPSS generates and displays the t value, degrees of freedom, and the significance value (i.e., Sig. in the output, also referred to as the p value) showing the probability that these results could be obtained simply by chance. The first box in the table shows the variables that are being compared. The next four boxes show the difference between the mean scores, the standard deviation, the standard error of the difference between the mean scores, and the 95% confidence interval (i.e., the range of values in which you can be 95% confident that the real difference between mean scores falls). The last three boxes show the t value, the degrees of freedom, and the significance value. If the value in the box labeled Sig. (2-tailed) is less than or equal to α = .05, then the students' fall reading and math tests (i.e., ReadF and MathF) are significantly different than their spring reading and math tests (i.e., ReadS and MathS).

The finding for reading is good news for the students and Pinecrest Elementary teachers: Students' reading scores in the spring are better than their scores in the fall (t = -6.722, p = .000; note that probability isn't equal to zero; rather, it's so small that it can't be shown in only three decimal places). We are able to reject the null hypothesis of no difference in the scores. We don't know why, however, the scores are different. For example, if we were testing a new reading program at Pinecrest, our findings would provide support for the new program. Or the improvement in reading scores could be due to Ms. Alvarez's skill as a teacher. The improvement could also be due to parental involvement or other variables we may have not controlled or considered. The t test can only tell us if the difference between the means is likely due to chance, not why the difference occurs.

What happened to the math scores? Table 18.4 shows that the fall math scores ($\bar{X}$ = 45.408) and the spring math scores ($\bar{X}$ = 45.560) were not significantly different. Without even conducting the t test, we can expect intuitively that this difference is not significant because the scores increased only .1520 between the fall and the spring; not much happened. The test confirms our intuition: t = -.497, Sig. = .624. In other words, we can expect this finding by chance in more than 62% of other similar studies. The students did not appear to learn very much, and we want to find out why, so we have more work to do. The t test cannot tell us why the math scores have not improved, but it has notified us that we have a problem.

One final note on the t test: In this example, the t value is negative because SPSS automatically subtracts the second mean score in the list from the first. If the students had done better on the fall reading test than the spring reading test (i.e., if the mean scores had been reversed), then the difference in the means and the resulting t test statistic would have been positive. We would hope not to have a positive t value for Pinecrest because a positive value would suggest that students did worse in the spring than they did in the fall.

Analysis of Gain or Difference Scores

As we noted previously, when comparing the reading and math scores for students between fall and spring, we did not match each student's score. Instead we compared the mean of all students in the fall to that of all students in the spring. Some researchers think, however, that a viable way to analyze data from two groups who are pretested, treated, and then tested again is to: (1) subtract each participant's pretest score from his or her posttest score (resulting in a gain, or difference, score), (2) compute the mean gain or difference for each group, and (3) calculate a t value for the difference between the two average mean differences. This approach has two main problems. First, not every participant, or student in our example, has the same potential to gain. A participant who scores very low on a pretest has a large opportunity to gain, but a participant who scores very high has only a small opportunity to improve (the latter situation, where participants score at or near the high end of the possible range, is referred to as the ceiling effect). Who has improved, or gained, more: a participant who goes from 20 to 70 (a gain of 50) or a participant who goes from 85 to 100 (a gain of only 15 but perhaps a perfect score)? Second, gain or difference scores are less reliable than analysis of posttest scores alone.

The appropriate analysis for data from pretest–posttest designs depends on the performance of the two groups on the pretest. For example, if both groups are essentially the same on the pretest and neither group has been previously exposed to the treatment planned for it, then posttest scores are best compared using a t test. On the other hand, if there is a difference between the groups on the

pretest, the preferred approach is the analysis of covariance. As discussed in Chapter 9, analysis of covariance adjusts posttest scores for initial differences on some variable (in this case, the pretest) related to performance on the dependent variable. To determine whether analysis of covariance is necessary, calculate a t test using the two pretest means. If the two pretest means are significantly different, use the analysis of covariance. If they are not, a simple t test can be computed on the posttest means.

Analysis of Variance

Analysis of variance (ANOVA) is a parametric test of significance used to determine whether scores from two or more groups are significantly different at a selected probability level. Note that, if only two groups are being compared, the analysis of variance computation will yield results equivalent to an independent-samples t test comparing those two groups (with equal variances assumed, F equals the square of t).

Simple Analysis of Variance

Simple, or one-way, analysis of variance (ANOVA) is used when the comparison involves one variable with two or more levels. Because it involves only one variable, it is also known as one-way ANOVA. For our example, we introduce a new dataset composed of freshman college students at Pacific Crest College. Table A18.1 in Appendix A displays the Pacific Crest College dataset, which includes data from 125 students. Although this dataset for Pacific Crest College is still not large, it is sufficient to allow us to accomplish basic ANOVA and multiple regression analyses. We use ANOVA to test whether Pacific Crest College students' college grade point averages (CollGPA) differ for groups of students based on their economic levels (ECON). In other words, economic level is the grouping variable, with three levels (low, medium, and high), and college GPA is the dependent variable.

Three (or more) means are very unlikely to be identical; the key question is whether the differences among the means represent true, significant differences or chance differences due to sampling error. To answer this question, simple ANOVA is used: An F ratio is computed. Although it is possible to compute a series of t tests, one for each pair of means, to do so raises some statistical problems concerning distortion of the probability of a Type I error, and it is certainly more convenient to perform one simple ANOVA than several t tests. For example, to analyze four means, six separate t tests would be required ($\bar{X}_1 - \bar{X}_2$, $\bar{X}_1 - \bar{X}_3$, $\bar{X}_1 - \bar{X}_4$, $\bar{X}_2 - \bar{X}_3$, $\bar{X}_2 - \bar{X}_4$, $\bar{X}_3 - \bar{X}_4$). ANOVA is much more efficient and keeps the error rate under control.

The concept underlying ANOVA is that the total variation, or variance, of scores can be divided into two sources: variance between groups and variance within groups. Between-group variance considers, overall, how the individuals in a particular group differ from individuals in the other groups. In our example of the Pacific Crest College students, between-group variance refers to the ways in which students from different economic backgrounds differ from one another. Ultimately, between-group differences are what researchers are usually interested in. The within-group variance considers how students vary from others in the same group. Not every student in the high-economic group has the same GPA, for example. These differences are known as the within-group variance, or error variance.

To ensure that apparent group differences aren't just due to these differences among people in general (i.e., just error), ANOVA involves a ratio, known as F, with group differences as the numerator (i.e., variance between groups) and error as the denominator (i.e., variance within groups). If the variance between groups is much greater than the variance within groups, greater than would be expected by chance, the ratio will be large, and a significant effect will be apparent. On the other hand, if the variance between groups and the variance within groups do not differ by more than would be expected by chance, the resulting F ratio is small; the groups are not significantly different.

To summarize, the greater the difference in variance, the larger the F ratio; larger F's are more likely to reflect significant differences among groups. However, a significant finding tells the researcher only that the groups are not all the same. To identify how the groups differ (i.e., which means are different from one another), additional statistics, described next, are needed.
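To see why a series of t tests distorts the Type I error rate, it can help to count the pairwise tests and estimate the chance of at least one false rejection. The sketch below is illustrative only: the function names are hypothetical, and the error calculation assumes the tests are independent, which pairwise tests on the same dataset are not exactly, so the result is a rough guide rather than an exact rate.

```python
from itertools import combinations

def pairwise_tests(group_labels):
    """List every pair of groups that would need its own t test."""
    return list(combinations(group_labels, 2))

def familywise_error(alpha, n_tests):
    """Chance of at least one Type I error across n_tests,
    assuming (for illustration only) that the tests are independent."""
    return 1 - (1 - alpha) ** n_tests

pairs = pairwise_tests(["X1", "X2", "X3", "X4"])
print(len(pairs), pairs)
print(round(familywise_error(0.05, len(pairs)), 3))
```

With four means the sketch lists the six pairs named in the text, and under the independence assumption the chance of at least one Type I error at alpha = .05 is about .26 rather than .05.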

Multiple Comparisons

In essence, multiple comparisons involve calculation of a special form of the t test. This special t adjusts for the fact that many tests are being executed. Each time we conduct a significance test, we accept a particular probability level, α. For example, we agree that if the results we find would occur by chance only five times in every 100 samples, we will conclude that our results are meaningful and not simply due to chance. However, if we conduct two tests of significance on the same dataset, the chance of finding a significant difference increases (i.e., we now have two tests that could show a significant difference), but the chance of committing a Type I error increases as well (i.e., we now have two chances to commit this error, one for each test). When multiple comparisons are involved, special statistics are needed to keep the error low.

In general, when conducting a study that involves more than two groups, researchers plan a set of comparisons between specific groups before collecting the data, based on the research hypotheses. Such comparisons are called a priori (i.e., "before the fact") or planned comparisons. For example, in our study of Pacific Crest College, we may predict that the GPAs of high-income students will differ from those of low-income students and plan to conduct that comparison. Often, however, it is not possible to state a priori predictions. In these cases, we can use a posteriori, or post hoc (i.e., "after the fact"), comparisons. In either case, multiple comparisons should not be a fishing expedition in which researchers look for any possible difference; they should be motivated by hypotheses.

Calculating ANOVA with Post Hoc Multiple Comparison Tests Using SPSS

In this example, we use SPSS to run an ANOVA to determine whether and how the college GPAs differ for students in the high, middle, and low economic groups. We selected the Scheffé test as the multiple-comparison procedure because it is somewhat conservative in its analysis, requiring a large difference between means to show significance. A number of other multiple-comparison techniques are also available to the researcher and can be selected in the SPSS analysis; the discussion of each is beyond the scope of this chapter.

Because ANOVA is an analytical method for comparing means, we begin the SPSS procedure as we have previously by selecting:

Analyze
Compare Means
One-Way ANOVA

For your reference, the menu options for ANOVA are shown in Appendix A, Figure A18.3.

The second step is to select the Post Hoc . . . button in the One-Way ANOVA window (shown in Figure A18.4). For this analysis, use the Scheffé multiple-comparison technique by checking the appropriate box (displayed in Figure A18.5). Click the Continue button to conduct the analysis.

SPSS produces a series of tables as output. The first table, shown in Table 18.5, gives the ratio of between-group variance to within-group variance, F = 37.060, and the associated probability value, p = .000. From this ANOVA, we can conclude that college GPA (CollGPA) differs for students at different economic levels (Econ). In the language of statistics, we can reject the null hypothesis of no difference between the students' GPAs; that is, students' GPAs appear to be dependent on their economic level. Notice we say "appear"; we never have definitive proof because inferential statistics simply provide an evaluation of probability. We can be quite confident of our conclusion, however, because we have a relatively high F statistic (37.060), which would occur by chance fewer than once in 1,000 samples (i.e., p = .000; remember that SPSS shows only three decimal places).

Table 18.5 • Overall ANOVA solution

ANOVA

CollGPA

                   Sum of Squares   df    Mean Square   F        Sig.
Between Groups     12.445           2     6.223         37.060   .000
Within Groups      20.484           122   .168
Total              32.929           124
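The F ratio in Table 18.5 follows directly from the sums of squares and degrees of freedom in the same table. The short sketch below recomputes it; the variable names are illustrative, and the figures are copied from the table rather than recalculated from the raw Pacific Crest data.

```python
# Sums of squares and degrees of freedom copied from Table 18.5.
ss_between, df_between = 12.445, 2     # 3 economic groups - 1
ss_within, df_within = 20.484, 122     # 125 students - 3 groups

# A mean square is a sum of squares divided by its degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F is the ratio of between-group variance to within-group (error) variance.
f_ratio = ms_between / ms_within

print(f"MS between = {ms_between:.4f}")
print(f"MS within  = {ms_within:.4f}")
print(f"F({df_between}, {df_within}) = {f_ratio:.3f}")
```

Because the between-group mean square is roughly 37 times the within-group mean square, the group differences are far larger than the variation among students within the same economic level.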

Table 18.6 • SPSS summary table for Scheffé multiple-comparison test

Multiple Comparisons

Dependent Variable: CollGPA
Scheffe

                                                                      95% Confidence Interval
(I) Econ   (J) Econ   Mean Difference (I − J)   Std. Error   Sig.    Lower Bound   Upper Bound
1          2          -.01385                   .08732       .988    -.2302        .2025
           3          -.70290*                  .08995       .000    -.9258        -.4800
2          1          .01385                    .08732       .988    -.2025        .2302
           3          -.68906*                  .09414       .000    -.9223        -.4558
3          1          .70290*                   .08995       .000    .4800         .9258
           2          .68906*                   .09414       .000    .4558         .9223

* The mean difference is significant at the .05 level.

The Scheffé test for our comparisons is displayed in Table 18.6. This table shows the mean GPA of each group compared with that for each other group. For example, the first row shows a comparison between the low and the middle economic groups, and the second row shows a comparison between the low and the high economic groups. The difference between the mean scores is shown, along with the standard error of the difference and a probability value for the test.

From this test, we can see that the GPAs of the students in the low-economic group do not differ from those of students in the middle-economic group (i.e., row 1 in the table, Sig. = .988). However, the GPAs of the students in the low-economic group are significantly different from those of students in the high-economic group (i.e., row 2 in the table; Sig. = .000). The positive and negative signs for the mean differences allow us to determine further that the students in the highest economic level had the highest mean on the GPA and students at the lowest economic level had the lowest mean.

In this example, the multiple-comparison procedure allows us to identify that the overall difference shown by the ANOVA is due to the students in the higher economic level having higher GPAs than students at the middle and lower economic levels. Our findings match previously published research indicating that more economically advantaged students, as a group, are likely to have higher GPAs in college than students with fewer economic resources.

Multifactor Analysis of Variance

If a research study uses a factorial design to investigate two or more independent variables and the interactions between them, the appropriate statistical analysis is a factorial, or multifactor, analysis of variance. This analysis yields a separate F ratio for each independent variable and one for each interaction. When two independent variables are analyzed, the ANOVA is considered a two-way ANOVA; when three independent variables are analyzed, it is considered a three-way ANOVA; and so forth. In some analyses, two dependent variables are analyzed in a multivariate analysis of variance (MANOVA). For example, suppose that we want to consider whether gender and economic level both affect students' college achievement. MANOVA would allow us to consider both independent variables (i.e., economic level, gender) and multiple dependent variables (e.g., college GPA as well as other test scores we may have from math or language classes). As you can imagine, however, we need a large dataset to run increasingly complex analyses with multiple independent and dependent variables. For example, of the 125 students at Pacific Crest College, there are no women in the highest economic group who are at the lowest level of reading. Complex statistical analyses are not warranted without a larger sample size that has meaningful variation among the variables.

Although a factorial ANOVA is a more complex procedure to conduct and interpret than a one-way

ANOVA, the basic process is similar. SPSS or other statistical packages provide the appropriate statistical tests; we simply specify the independent variables and dependent variable for the analysis.

Analysis of Covariance

Analysis of covariance (ANCOVA) is a form of ANOVA that accounts for the different ways in which the independent variables are measured, taking into account the design of the study. When a study has two or more dependent variables, multivariate analysis of covariance (MANCOVA) is an appropriate test. ANCOVA is used in two major ways: as a technique for controlling extraneous variables and as a means of increasing the power of a statistical test.

For controlling variables, use of ANCOVA is basically equivalent to matching groups on the variable or variables to be controlled. ANCOVA adjusts posttest scores for initial differences on a variable; in other words, groups are equalized with respect to the control variable and then compared. ANCOVA is thus similar to handicapping in bowling or golf. In an attempt to equalize teams, high scorers are given little or no handicap, and low scorers are given big handicaps. Any variable that is correlated with the dependent variable can be controlled for using covariance. Examples of variables commonly controlled using ANCOVA are pretest performance, IQ, readiness, and aptitude. By analyzing covariance, we are attempting to reduce variation in posttest scores that is attributable to another variable. Ideally, we would like all posttest variance to be attributable to the treatment conditions.

ANCOVA is used both in causal–comparative studies in which already formed but not necessarily equal groups are involved, and in experimental studies in which either existing groups or randomly formed groups are involved. Unfortunately, the situation for which ANCOVA is least appropriate is the situation for which it is used most often. Use of ANCOVA assumes that participants have been randomly assigned to treatment groups. Thus, it is best used in true experimental designs. If existing or intact groups are not randomly selected but are assigned to treatment groups randomly, ANCOVA may still be used, but results must be interpreted with caution. If ANCOVA is used with existing groups and nonmanipulated independent variables, as in causal–comparative studies, the results are likely to be misleading at best. Other assumptions associated with the use of ANCOVA are not as serious if participants have been randomly assigned to treatment groups.

A second function of ANCOVA is that it increases the power of a statistical test by reducing within-group (error) variance. Power refers to the ability of a significance test to identify a true research finding (i.e., there’s really a difference, and the statistical test shows a significant difference), allowing the experimenter to reject a null hypothesis that is false. In the language of statistics, increasing power reduces the likelihood that the experimenter will commit a Type II error. Because ANCOVA can reduce random sampling error by statistically equating different groups, it increases the power of the significance test. The power-increasing function of ANCOVA is directly related to the degree of randomization involved in formation of the groups. Although increasing sample size also increases power, we are often limited to samples of a given size for financial and practical reasons (e.g., Ms. Alvarez’s class at Pinecrest Elementary includes only 25 students); ANCOVA thus is often the only way to increase power for a particular study.

Of course, SPSS and many other statistical software packages provide the procedures for conducting ANCOVA and MANCOVA. The procedures are somewhat similar in that we designate the dependent and independent variables for analysis and the program produces tables of results. However, given the assumptions underlying the use of these complex statistics, researchers must be mindful of when and how these analyses should be employed. We cannot stress enough that analyses and their subsequent meanings need to be formulated and interpreted in relation to the research design and hypotheses you have formulated, not based exclusively on what appears on the computer screen. For example, the results of ANCOVA and MANCOVA are least likely to be valid when groups have not been randomly selected and assigned, yet in educational research, we often are faced with this situation. Can you imagine trying to seek permission by assuring parents that if their child is randomly assigned to the less successful method, we could always have the student repeat the same grade again next year with the more successful method? Often, reality clashes with our knowledge of the most appropriate research and statistics methods.
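The idea that ANCOVA "adjusts posttest scores for initial differences" can be illustrated with a small sketch. It assumes the common textbook form of the adjustment, in which each group's posttest mean is shifted by the pooled within-group slope times that group's distance from the grand pretest mean. That formula, the function names, and the scores below are illustrative assumptions, not taken from this chapter or from SPSS.

```python
from statistics import mean

def pooled_within_slope(groups):
    """Pooled within-group slope of posttest on pretest (the covariate)."""
    sxy = sxx = 0.0
    for pairs in groups.values():
        mx = mean(x for x, _ in pairs)
        my = mean(y for _, y in pairs)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx

def adjusted_means(groups):
    """Posttest means adjusted to the grand mean of the pretest."""
    slope = pooled_within_slope(groups)
    grand_x = mean(x for pairs in groups.values() for x, _ in pairs)
    return {
        name: mean(y for _, y in pairs)
        - slope * (mean(x for x, _ in pairs) - grand_x)
        for name, pairs in groups.items()
    }

# Hypothetical (pretest, posttest) scores, invented for illustration only.
scores = {
    "method_a": [(10, 20), (12, 24), (14, 28)],
    "method_b": [(14, 30), (16, 34), (18, 38)],
}

adjusted = adjusted_means(scores)
raw_gap = mean(y for _, y in scores["method_b"]) - mean(y for _, y in scores["method_a"])
adjusted_gap = adjusted["method_b"] - adjusted["method_a"]
print("raw gap:", raw_gap, "adjusted gap:", adjusted_gap)
```

In this invented data the second group started with higher pretest scores, so most of the raw 10-point posttest gap disappears once the groups are equated on the pretest. This is the "handicapping" described above; a full ANCOVA would also test whether the remaining gap is significant.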

Multiple Regression

The more independent variables we have measured or observed, the more likely we are to explain the outcomes of the dependent variables. Multivariate statistical analyses tell us how much of the variance found in the outcome variable is attributed to the independent variables. Whereas ANOVA is the appropriate analysis when the independent variables are categorical, multiple regression is used with ratio or interval variables. Multiple regression combines variables that are known individually to predict (i.e., correlate with) the criterion into a prediction equation known as a multiple regression equation. Multiple regression is an extremely valuable procedure for analyzing the results of a variety of experimental, causal–comparative, and correlational studies because it determines not only whether variables are related but also the degree to which they are related. Understanding how variables are related is beneficial both for researchers and for groups needing to make data-based decisions.

Step-wise analysis is often used for regression because it allows us to enter predictor variables into the regression equation, or omit them, step by step (i.e., one variable at a time). We can see which of the predictor variables are making the most significant contribution to the criterion variable, and we can remove variables from our predictive model if they are not making a significant contribution.

Multiple regression is also the basis for path analysis that begins with a predictive model (see Figure 18.7). Path analysis identifies the degree to which predictor variables interact with each other and contribute to the variance of the dependent variables. Basically, path analysis involves multiple regressions between and among all the variables in the model and then specifies the direct and indirect effects of the predictor variables onto the criterion variable. Although somewhat more complex to calculate than a simple multiple regression, path analysis provides an excellent picture of the causal relations among all the variables in a predictive model.

As an example of multiple regression, we use the Pacific Crest College dataset (see Appendix A, Table A.18.1) that we used previously with ANOVA. A distinct advantage of multiple regression is that we must create a predictive model that posits in advance which variables predict the criterion variable or variables, as illustrated in Figure 18.7.

Figure 18.7 • Multiple regression model for Pacific Crest College. Independent variables: High School GPA (HSGPA), SAT, High School Math Score (Math), High School Language Score (Lang). Dependent variable: College GPA (CollGPA).

In our example, we consider the effects of high school math score (Math), high school language score (Lang), SAT score (SAT), and high school GPA (HSGPA) on Pacific Crest College students’ college GPA. We consider only the direct effects of each variable in the model on the single criterion variable.

At Pacific Crest College, several groups are interested in determining the variables that best predict a student’s college GPA. The admissions office at Pacific Crest College, for example, wants to make correct decisions by admitting students who will most likely be successful. High school counselors who recommend the college also would like to know what they can do at the high school level to increase a student’s chance for success in college: can any of the variables that predict success be controlled (i.e., are any variables malleable)? In fact, two such variables may be influenced by the high school counselors: math and language scores. If our multiple regression shows that higher math or language scores are associated with a higher college GPA, we may recommend to the high school counselors that they encourage students to improve their math and language knowledge. If reading and math skills are predictors of college success, then it behooves students to improve these skills while they are still in high school.
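The predictive model in Figure 18.7 can be written out as a multiple regression equation. The line below is a sketch in conventional notation: the symbols a (the constant) and b_1 through b_4 (the weights attached to each predictor) are labels chosen here for illustration, not notation from the text.

```latex
\widehat{\text{CollGPA}} = a + b_1\,\text{HSGPA} + b_2\,\text{SAT} + b_3\,\text{Math} + b_4\,\text{Lang}
```

Read this way, the regression analysis estimates the constant and the four weights from the data. Removing a predictor, as in a step-wise analysis, amounts to dropping its term from the equation.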

Our basic question for the multiple regression analysis is: What are the best predictors of college GPA? If we had a large number of variables from which to choose, we would select the variables that have the greatest likelihood of predicting success, based on previous research. The results of multiple regression would give us the answer to our question and would also tell us the relative contribution of each predictor variable on the criterion variable. For example, we may find that high school GPA, language score, and SAT scores are the best predictors of college GPA, with math score as the variable contributing the least. We could then run another multiple regression excluding math score, or we could add other variables to our multiple regression equation. At this point we have a choice of procedures for the multiple regression. We can enter variables one at a time, or we can enter them all at once, as shown in the following example. The outcomes would be similar; we just have several choices of how we build and interpret them.

Calculating Multiple Regression Using SPSS

Statistical software packages such as SPSS provide various choices for conducting multiple regression, and some are quite complex. Because our goal in this chapter is to give you a conceptual understanding of inferential statistics, we provide a simplified analysis of the model illustrated in Figure 18.7. Other options for accomplishing our analysis include a complete step-wise regression where we enter independent variables one at a time to consider their cumulative effects or we could conduct a path analysis where we would consider all the multiple effects between the predictor variables and criterion variable. For the purposes of our example, we enter all the predictor variables together to consider their cumulative effect, the basic multiple regression procedure called Enter.

As with all statistical analyses, we specify our variables and follow the SPSS options as follows (refer to Appendix A Figure A18.6 and Figure A18.7 for the Linear Regression window):

Analyze
Regression
Linear

Once the Linear Regression window is open, we select the criterion (labeled by SPSS as dependent) variable (CollGPA) from the list of variables in the box on the left. Next, we select the four predictor (labeled by SPSS as independent) variables (HSGPA, SAT, Math, Lang) by highlighting each variable and clicking on the arrow to move the variables into the appropriate box. Underneath the independent variables box, we select a multiple regression procedure; we have selected Enter for this analysis. Also notice the box below the independent variable is labeled Selection Variable. Although multiple regression uses ratio or interval variables, SPSS will run separate analyses using a nominal variable if it is included in the Selection Variable box. This option allows us to compare multiple regression analyses by gender or economic level, for example, both of which are included in our Pacific Crest College dataset and included in the variables window on the left in Figure A18.7. If we wanted to compare multiple regression outcomes of the males and females on college GPA, we would select gender as the selection variable and then, with the Rule button, select 1 for males as our first analysis. For the next analysis, we would then simply select 2 to conduct the multiple regression test for the females. Multiple regression also allows the user to transform a nominal variable into an interval variable, known as a dummy variable. For example, a nominal variable, such as gender, can be coded as 0 and 1 and entered into the multiple regression as if it were an interval variable. The interpretation of dummy variables indicates order of the variable in the multiple regression equation and has meaning only for purposes of the statistical calculation.

Table 18.7 • Multiple regression output summary

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .893(a) | .798 | .791 | .23545 |

a Predictors: (Constant), LANG, SAT, HSGPA, MATH

The multiple regression summary output of our example for predicting college GPA is shown in Table 18.7. The complete model yields an R value of .893, which is a calculation by SPSS from the multiple regression equation. This R value is quite helpful because, when it is squared (R² = .798), it provides the percentage of variance in the criterion variable explained by the predictor variables: the

four predictor variables explain 79.8% of the variance in college GPA. A simple way to explain this finding is that, first, we know there are many reasons that some students have higher GPAs or lower GPAs than others (i.e., there is variance), and second, we’ve identified some of the reasons, though not all of them (i.e., R² is not 1.00); our predictive model is quite good because our four predictor variables explain or account for about 80% of the variance in college GPA. If we can find other variables to explain the remaining 20% of the variance, we can predict college GPA with an even higher amount of certainty. Remember, of course, there is always a risk of being wrong or predicting incorrectly; we never have certainty.

Table 18.8 • Multiple regression: analysis of variance and of coefficients

ANOVA(b)

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 26.277 | 4 | 6.569 | 118.494 | .000* |
| | Residual | 6.653 | 120 | .055 | | |
| | Total | 32.929 | 124 | | | |

a Predictors: (Constant), LANG, SAT, HSGPA, MATH
b Dependent Variable: CollGPA

Coefficients(a)

| Model | | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 1.068 | .139 | | 7.698 | .000 |
| | HSGPA | .196 | .045 | .211 | 4.366 | .000 |
| | SAT | 3.96E-005 | .000 | .007 | .137 | .892 |
| | MATH | | .002 | -.016 | -.263 | .793 |
| | LANG | .022 | .002 | .792 | 13.240 | .000 |

a Dependent Variable: CollGPA

Now that we know all four independent variables provide an effective predictive model, our next step is to understand which variables are the best predictors. Table 18.8 shows analysis of the variance from the SPSS Regression procedure. The SPSS output first lists the F ratio (F = 118.494) and the level of significance of the whole model we are testing; in this case, the probability of this finding occurring by chance is less than 1 in 1,000 (i.e., although the output notes Sig. = .000, remember that there is always a probability, very small in this example, that this finding could occur by chance). What is especially helpful about multiple regression is that it gives us information about the individual contribution each variable is making to the variance in the criterion variable. To interpret the information in Table 18.8, we look first at the t value calculation and its level of significance. This calculation gives us the individual effect of each variable in the model (including the “constant” or “Y” value, which is part of the regression equation). The strongest predictors in the model are language score (Lang; t = 13.240, Sig. = .000) and high school GPA (HSGPA; t = 4.366, Sig. = .000).

Additionally, SPSS provides individual weights or coefficients to explain the contribution each variable has on the criterion. Coefficients are calculated as unstandardized B and as standardized beta, which accounts for the standard error. In this example, the high beta weight (.792) for the language score (Lang) also shows it is the strongest predictor. SAT, in contrast, is not a very good indicator of college GPA for these students in this particular regression model: the beta weight for the SAT score is quite low (.007), as is its t value (.137); the probability that the finding is due to chance is quite high (.892). Our regression model also suggests that math score is not a good predictor of college GPA (beta = -.016, t = -.263, Sig. = .793). Note, however, that the significance level of a variable in a multiple regression model is dependent on the model, and in particular the other variables included in the model, especially when there are strong relations between variables. It is important to consider all combinations of variables in order to understand the effect of each individual variable.
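The summary figures in Tables 18.7 and 18.8 are tied together by a few standard formulas, and the following sketch recomputes them from the sums of squares reported in the ANOVA table. The function and variable names are illustrative assumptions. Because the inputs are the rounded values printed in the table, the results agree with the SPSS output only to about three decimal places.

```python
import math

def model_summary(ss_regression, ss_residual, ss_total, n_cases, n_predictors):
    """Recompute regression summary statistics from sums of squares."""
    df_regression = n_predictors
    df_residual = n_cases - n_predictors - 1
    r_squared = ss_regression / ss_total
    adjusted_r_squared = 1 - (1 - r_squared) * (n_cases - 1) / df_residual
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return {
        "R": math.sqrt(r_squared),
        "R Square": r_squared,
        "Adjusted R Square": adjusted_r_squared,
        "Std. Error of the Estimate": math.sqrt(ms_residual),
        "F": ms_regression / ms_residual,
        "df": (df_regression, df_residual),
    }

# Values copied from Table 18.8 (125 students, 4 predictors).
summary = model_summary(
    ss_regression=26.277,
    ss_residual=6.653,
    ss_total=32.929,
    n_cases=125,
    n_predictors=4,
)

for label, value in summary.items():
    print(label, value)
```

Seen this way, R Square is simply the share of the total sum of squares captured by the regression, and the F ratio compares the mean square for the regression with the mean square for the residual.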

It is also important to remember that the results shown for Lang, HSGPA, SAT, and Math are for this sample only. Other samples using different regression models will likely show different results. The design and validity of the study are important: we want to include the variables that have the most meaning or best predict the criterion variable. Our regression models are only as good as the data we collect and the choices we make regarding the variables to include in our analyses.

Nevertheless, this example suggests that students’ language scores in high school and their high school GPAs are highly effective predictors of GPA at Pacific Crest College. In contrast, their SAT and math scores add very little to the accuracy of predicting their college GPAs. Based on this analysis, we can tell the admissions office that the students with the best chances of success in college are most likely those with higher language scores and high school GPAs. Of course, we make this interpretation cautiously because we did not study all variables that can possibly affect college success. Obviously, some high school students with high language scores and high GPAs will not succeed at college for other reasons; for example, psychological, personal, or family effects can contribute to college success or failure. However, considering the variables we have measured and over which we know counselors and students have some control (e.g., improving language skills), we can offer advice about two very good predictors of college success at Pacific Crest. We cannot, however, make predictions for other colleges because Pacific Crest students may be different from other college students.

Chi Square

Chi square, symbolized as χ², is a nonparametric test of significance appropriate when the data are in the form of frequency counts or percentages and proportions that can be converted to frequencies. It is used to compare frequencies occurring in different categories or groups. Chi square is thus appropriate for nominal data that fall into either true categories or artificial categories. A true category is one in which persons or objects naturally fall, independent of any research study (e.g., male versus female), whereas an artificial category is one that is operationally defined by a researcher (e.g., tall versus short). Two or more mutually exclusive categories are required. In educational research, we are often interested in the effects of nominal variables, such as race, class, or religion, so chi square offers an excellent analytical tool.

Simple frequency counts for the variables under consideration are often presented in contingency tables, such as those shown in Chapter 17, Tables 17.3 and 17.4. A contingency table by itself presents basic descriptive data; a chi-square analysis helps determine if any observed differences between the variables are meaningful and is computed by comparing the frequencies of each variable observed in a study to the expected frequencies. Expected proportions are usually the frequencies that would be expected if the groups were equal (i.e., no difference between groups), although occasionally they also may be based on past data. The chi-square value increases as the difference between observed and expected frequencies increases; large chi-square values indicate statistically significant differences.

As an example of a chi-square analysis, we consider the relation between gender and reading level for the students at Pacific Crest College. We use chi square because we have two nominal variables: gender (i.e., male, female) and reading level (i.e., low, medium, high). Reading level (ReadLevel) is a composite variable that considers the language score, reading fluency, and a placement assessment of reading and language ability when the students started college. Reading could be considered an ordinal variable because the reading levels are in order from low to medium to high. Because reading level is a composite of both qualitative and quantitative considerations, however, the distance between low and medium is not likely the same as between medium and high. For purposes of our example, then, ReadLevel is considered nominal and should be analyzed with a nonparametric measure, such as chi square.

As illustrated in Table 18.9, the basic contingency table shows the distribution of males and females at each of the three reading levels. We are interested in whether the pattern for the males (i.e., the distribution across reading level) is significantly different than the pattern for the females, and on first inspection, it appears so: only one female is at ReadLevel 1, whereas 18 males are; 44 females are at ReadLevel 3, whereas only 7 males are. Although our data suggest differences, we do not yet know if these differences are meaningful until we consider the outcome of the chi-square

analysis. In the language of statistics, a significant chi square tells us that these variables (i.e., gender and reading level) are not independent.

Table 18.9 • Contingency table of gender and reading level

Gender * ReadLevel Crosstabulation
Count

| | ReadLevel 1 | ReadLevel 2 | ReadLevel 3 | Total |
|---|---|---|---|---|
| Gender 1 | 18 | 34 | 7 | 59 |
| Gender 2 | 1 | 21 | 44 | 66 |
| Total | 19 | 55 | 51 | 125 |

To determine if the variables are independent or not, we compare the frequencies we actually observed (symbolized as O) with the expected frequencies (symbolized as E). The expected frequencies are the numbers we would find if the variables are independent; in other words, the pattern of distribution across reading level is the same for the males and the females. Therefore, the expected frequencies reflect the null hypothesis of no difference. The expected frequencies and percentage distributions are presented in the expanded cross-tabulation shown in Table 18.10.

Table 18.10 • SPSS crosstabulation table of gender and reading level with percentages

Gender * ReadLevel Crosstabulation

| Gender | | ReadLevel 1 | ReadLevel 2 | ReadLevel 3 | Total |
|---|---|---|---|---|---|
| 1 | Count | 18 | 34 | 7 | 59 |
| | Expected Count | 9.0 | 26.0 | 24.1 | 59.0 |
| | % within Gender | 30.5% | 57.6% | 11.9% | 100.0% |
| | % within ReadLevel | 94.7% | 61.8% | 13.7% | 47.2% |
| | % of Total | 14.4% | 27.2% | 5.6% | 47.2% |
| 2 | Count | 1 | 21 | 44 | 66 |
| | Expected Count | 10.0 | 29.0 | 26.9 | 66.0 |
| | % within Gender | 1.5% | 31.8% | 66.7% | 100.0% |
| | % within ReadLevel | 5.3% | 38.2% | 86.3% | 52.8% |
| | % of Total | .8% | 16.8% | 35.2% | 52.8% |
| Total | Count | 19 | 55 | 51 | 125 |
| | Expected Count | 19.0 | 55.0 | 51.0 | 125.0 |
| | % within Gender | 15.2% | 44.0% | 40.8% | 100.0% |
| | % within ReadLevel | 100.0% | 100.0% | 100.0% | 100.0% |
| | % of Total | 15.2% | 44.0% | 40.8% | 100.0% |

Although we typically conduct chi square on the computer, the hand calculation is manageable when we have a simple cross-tabulation table, such as Table 18.9. The formula (which is also used by statistical programs such as SPSS) is:

$$\chi^2 = \sum \left[ \frac{(f_o - f_e)^2}{f_e} \right]$$

In this formula, $f_o$ is the observed frequency, and $f_e$ is the expected frequency. As with other hand calculations, we refer to a statistical table to determine whether the value computed for χ² is significant, taking into account the degrees of freedom and probability level (i.e., the level of risk we accept that this finding would occur by chance). Degrees of freedom for chi square are computed by multiplying the number of rows in the contingency table, minus one (i.e., the number of levels of one variable, minus one) by the number of columns, minus one (i.e., the number of levels of the other variable, minus one): df = (R - 1)(C - 1). In this example, with two rows (gender) and three columns (reading level), df = (2 - 1)(3 - 1) = 2.

Calculating Chi Square Using SPSS

To specify the chi-square statistic for a given set of data, we go to the Descriptive Statistics submenu by clicking Analyze (because chi square is a nonparametric statistic, it is listed with descriptive statistics). Within this submenu, we choose the Crosstabs . . . option. Chi square is one of the few analysis procedures in SPSS that does not have

a dedicated menu option. We display the SPSS menu options in Appendix A, Figure A.18.8, but they can be summarized as follows:

Analyze
Descriptive Statistics
Crosstabs . . .

Once in the Crosstabs . . . window, we need to specify the variables to go in the rows and columns of the table; in this example they are gender and reading level, as shown in Figure A.18.9. To compute chi square, we click on the Statistics button at the bottom of the window and then select the Chi-square statistic button in the next screen. Finally, we click the Continue button in the upper right of this screen to complete the analysis. If we want to display the expected frequencies in each cell, we can return to the Crosstabs . . . window and click on the Cells button (shown in Figure A.18.9).

The first table that SPSS generates is the cross-tabulation table, as illustrated in Table 18.9, which shows the observed values for each cell. The next table (see Table 18.10) presents further information on the expected frequencies and percentage of students in each cell in the contingency table. These percentages are helpful to interpret the meaning of the chi-square analysis.

The outcome of the chi-square calculation is presented in Table 18.11. The first line shows a Pearson chi-square value, χ² = 7.747, which yields a significance level of .021. Although SPSS provides a larger variety of statistical computations, the Pearson chi-square is adequate for our purposes of determining the relation between gender and reading level. With the significant chi-square statistic, we can conclude that gender and reading level are not independent; in other words, the patterns for the males are different than the patterns for the females. Males and females are not distributed in the same way at each reading level.

Table 18.11 • Chi-square analysis of gender and reading*

Chi-Square Tests

| | Value | df | Asymp. Sig. (2-sided) |
|---|---|---|---|
| Pearson Chi-Square | 7.747(a) | 2 | .021 |
| Likelihood Ratio | 8.005 | 2 | .018 |
| Linear-by-Linear Association | 6.783 | 1 | .009 |
| N of Valid Cases | 125 | | |

* 0 cells (.0%) have expected count less than 5. The minimum expected count is 8.02.

The Pearson chi-square value tells us only that the patterns are not the same; it does not tell us how they differ. In other words, we don’t know if the number of males differs significantly from the number of females at each reading level or only at some of the reading levels. We need to go back to the crosstabulation table (i.e., Table 18.10), which shows that a higher proportion of the females is found at the highest level of reading. In this example, 44 females are at the highest level of reading; these 44 females account for 86.3% of the students (males and females) found at the highest level of reading. Clearly, the females greatly outnumber the males. Likewise, 18 males and only 1 female are at the lowest level of reading; that is, 94.7% of the students in the lowest level of reading are males.

To summarize, we have found from the chi-square analysis that the observed distribution of students across gender and reading level is not what we would expect from chance alone. The two variables, gender and reading level, are not independent. From this finding with the Pacific Crest College students, we may conclude that males need additional reading help; we may consider improving the high school language curriculum, providing a remedial reading program for first-year college males, or providing support services accordingly.

Chi square may also be used with more than two variables. Because contingency tables can be of two, three, or more dimensions, depending on the number of variables, a multidimensional chi square can be thought of as a factorial chi square. Of course, as the contingency table gets larger by adding more variables, interpreting the chi square becomes more complex. We are also limited by sample size if we want to expand the number of variables to include in a contingency table. For example, because we have only 125 students in our Pacific Crest College sample, we can very easily have almost as many cells in the table as we have students. If we are interested in looking at the reading levels of students sorted by economic level, gender, and ethnicity, we quickly run out of students to fill all the cells in the contingency table. Gender has two values, reading level has three values, economic

level has three levels, and ethnicity has five: this table would have 90 cells (2 × 3 × 3 × 5), and we only have 125 students to fill the table. It's possible that some cells would have no students that fit into that combination of variables (e.g., a White, high-economic level woman at the lowest reading level). Obviously, we need more students or fewer variables to conduct an appropriate chi square.

Other Investigative Techniques: Data Mining, Factor Analysis, and Structural Equation Modeling

In addition to some of the standard statistical tests we have presented, a number of other valuable analytical tools are extremely helpful, depending on the purpose of the research and the data available. Data mining, as an example, uses analytical tools to identify and predict patterns in datasets or large warehouses of data that have been collected from thousands of subjects and about hundreds of variables. Data mining is often used in business and scientific research to discover relations and predict patterns among variables and outcomes. In business, for example, data-mining techniques may be used to discover purchasing patterns (who buys what products, how often, and for what purposes) to identify where advertisements should be placed and the products that should be sold in particular stores. Likewise, credit card companies are interested in who makes which purchases from which stores and how often. These buying patterns are also important for security reasons to detect fraudulent purchases, as when thousands of dollars of video game purchases are charged to an 80-year-old's credit card. Obviously, quite sophisticated statistical techniques are needed to test multiple hypotheses with such large databases. Among other statistical software packages, SPSS and SAS (Statistical Analysis System) offer data-mining procedures in the more advanced versions. "Clementine," for example, is the data-mining procedure available on the full version of SPSS. Newer advancements in data mining include text mining and Web mining procedures to provide predictive models beyond the data the researcher or business has collected.

Factor analysis is a statistical procedure used to identify relations among variables in a correlation matrix. Basically, factor analysis determines how variables group together based on what they may have in common. Factor analysis is commonly used to reduce a large number of responses or questions to a few more meaningful groupings, known as factors. For example, we may give our students or subjects a 100-question personality inventory. To reduce these 100 responses to a manageable number, we can perform a factor analysis to identify several key factors that the responses have in common. A number of psychological inventories, such as the Minnesota Multiphasic Personality Inventory (MMPI), were created with the assistance of factor analysis. Responses to the MMPI are scored on 10 scales that represent indicators of factors such as schizophrenia, depression, and hysteria. However, factor analysis indicates only how the responses group together; the names and meaning of the factors must be determined by the researchers. The intelligence quotient (IQ) was also created through factor analysis. Because the IQ itself represents one factor that emerged from factor analysis, a number of scholars are quite critical of how well the IQ actually measures a concept as complex as intelligence.² Interpreting the meaning of factors is challenging; factor analysis may be as much an art form as a statistical analysis.

Structural Equation Modeling (SEM) can be conducted by several software programs; the most widely used is LISREL (Linear Structural Relationships), which is available on SPSS. LISREL can be thought of as an ultracombination of path analysis and factor analysis: It is an extremely complicated procedure that builds a structural model to explain the interactive relations among a relatively large number of variables. The distinct advantage of LISREL is that it begins with the creation of a complex path model that considers multiple relations among independent and dependent variables as well as latent variables that are unobserved but responsible for measurement error. Factor analysis yields groupings of variables or factors that are tested with path analysis (i.e., multiple regression) to show the strength of the factors in the model. The disadvantage of LISREL is that it requires a large dataset and is quite complex to interpret. Nonetheless, for the advanced researcher it is a powerful tool that uses the best capabilities of path analysis and factor analysis.

² See Gould, Stephen Jay. The Mismeasure of Man. New York: W. W. Norton, 1981, 1996; Gardner, Howard. Frames of Mind: The Theory of Multiple Intelligences. New York: Basic Books, 1983.
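To make the sparse-table problem in the chi-square example at the start of this section concrete, here is a minimal Python sketch that counts the cells in the cross-classification and the average number of students per cell. The function name `cell_summary` is hypothetical; the level counts (2, 3, 3, 5) and the 125 students come from the example in the text.

```python
from math import prod

def cell_summary(levels, n_students):
    """Return (number of cells, average students per cell) for a cross-classified table."""
    cells = prod(levels)              # one cell per combination of levels
    return cells, n_students / cells

# Level counts from the text: 2 x 3 x 3 x 5, with 125 students.
cells, per_cell = cell_summary([2, 3, 3, 5], 125)
print(cells)               # 90
print(round(per_cell, 2))  # 1.39
```

With fewer than two students per cell on average, empty cells are likely, which is why the text calls for more students or fewer variables.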

Types of Parametric and Nonparametric Statistical Tests

There are too many parametric and nonparametric statistical methods to describe in detail here. Table 18.12 provides an overview of some of the more commonly used tests and their associated purposes. The table is best used by first identifying the levels of measurement of the study. Then examine the purpose statements that fit the levels of measurement, and select the one that provides the best match. Other information in the table also helps in carrying out the appropriate significance test. Of course, researchers should use statistical tests only if they can justify their use and interpret the outcomes with confidence. Many a graduate student has needlessly suffered in a thesis defense when trying to explain an unfamiliar, overly complex statistical procedure. Select appropriate procedures you understand and be parsimonious in your explanations.

Table 18.12 • Commonly used parametric and nonparametric significance tests

| Name of Test | Test Statistic | df | Parametric (P) or Nonparametric (NP) | Purpose | Variable 1 (Independent) | Variable 2 (Dependent) |
|---|---|---|---|---|---|---|
| t test for independent samples | t | n1 + n2 - 2 | P | Test the difference between means of two independent groups | Nominal | Interval or ratio |
| t test for dependent samples | t | N - 1 | P | Test the difference between means of two dependent groups | Nominal | Interval or ratio |
| Analysis of variance | F | SSB = groups - 1; SSW = participants - groups - 1 | P | Test the difference among three or more independent groups | Nominal | Interval or ratio |
| Pearson product correlation | r | N - 2 | P | Test whether a correlation is different from zero (a relationship exists) | Interval or ratio | Interval or ratio |
| Chi-square test | χ² | (rows - 1) × (columns - 1) | NP | Test the difference in proportions in two or more groups | Nominal | Nominal |
| Median test | χ² | (rows - 1) × (columns - 1) | NP | Test the difference of the medians of two independent groups | Nominal | Ordinal |
| Mann-Whitney U test | U | N - 1 | NP | Test the difference of the medians of two independent groups | Nominal | Ordinal |
| Wilcoxon signed rank test | Z | N - 2 | NP | Test the difference in the ranks of two related groups | Nominal | Ordinal |
| Kruskal-Wallis test | H | groups - 1 | NP | Test the difference in the ranks of three or more independent groups | Nominal | Ordinal |
| Friedman test | χ² | groups - 1 | NP | Test the difference in the ranks of three or more dependent groups | Nominal | Ordinal |
| Spearman rho | r | N - 2 | NP | Test whether a correlation is different from zero | Ordinal | Ordinal |
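As a rough illustration of how the df column in Table 18.12 is used, the sketch below encodes four of the table's degrees-of-freedom formulas as small Python functions. The function names are hypothetical, chosen for this example only, and the sample sizes in the usage lines are invented. Reading N for the dependent-samples t test as the number of pairs is an assumption; the table itself says only N - 1.

```python
def df_t_independent(n1, n2):
    """t test for independent samples: n1 + n2 - 2."""
    return n1 + n2 - 2

def df_t_dependent(n_pairs):
    """t test for dependent samples: N - 1 (N taken here as the number of pairs)."""
    return n_pairs - 1

def df_pearson(n):
    """Pearson product correlation: N - 2."""
    return n - 2

def df_chi_square(rows, columns):
    """Chi-square test: (rows - 1) x (columns - 1)."""
    return (rows - 1) * (columns - 1)

# Invented sample sizes, for illustration only.
print(df_t_independent(15, 15))  # 28
print(df_t_dependent(20))        # 19
print(df_pearson(30))            # 28
print(df_chi_square(2, 3))       # 2
```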

Summary

Concepts Underlying Inferential Statistics

1. Inferential statistics deal with inferences about populations based on the behavior of samples. Inferential statistics are used to determine how likely it is that results based on a sample or samples are the same results that would have been obtained for the entire population.

2. The degree to which the results of a sample can be generalized to a population is always expressed in terms of probabilities, not in terms of proof.

Standard Error

3. Expected, chance variation among means is referred to as sampling error. The question that guides inferential statistics is whether observed differences are real or only the result of sampling errors.

4. A useful characteristic of sampling errors is that they are usually normally distributed. If a sufficiently large number of equal-size large samples are randomly selected from a population, the means of those samples will be normally distributed around the population mean. The mean of all the sample means will yield a good estimate of the population mean.

5. A distribution of sample means has its own mean and its own standard deviation. The standard deviation of the sample means (i.e., the standard deviation of sampling errors) is usually called the standard error of the mean ($SE_{\bar{X}}$).

6. In a normal curve, approximately 68% of the sample means will fall between ±1 standard error of the population mean, 95% will fall between ±2 standard errors, and 99+% will fall between ±3 standard errors.

7. In most cases, we do not know the mean or standard deviation of the population, so we estimate the standard error with the following formula:

$$SE_{\bar{X}} = \frac{SD}{\sqrt{N - 1}}$$

8. The smaller the standard error of the mean, the less sampling error. As the size of the sample increases, the standard error of the mean decreases. A researcher should make every effort to acquire as large a sample as possible.

9. Standard error can also be calculated for other measures of central tendency, as well as for measures of variability, relationship, and relative position. Standard error can also be determined for the difference between means.

Hypothesis Testing

10. Hypothesis testing is a process of decision making in which researchers evaluate the results of a study against their original expectations. In short, hypothesis testing is the process of determining whether to reject the null hypothesis (i.e., no meaningful differences, only those due to sampling error) in favor of the research hypothesis (i.e., the groups are meaningfully different; one treatment is more effective than another).

11. Because we can never completely control all the factors that may be responsible for the outcome or test all the possible samples, we can never prove a research hypothesis. However, if we can reject the null hypothesis, we have supported our research hypothesis, gaining confidence that our findings reflect the true state of affairs in the population.

Tests of Significance

12. A test of significance is a statistical procedure in which we determine the likelihood (i.e., probability) that the results from our sample are just due to chance. Significance refers to a selected probability level that indicates how much risk we are willing to take if the decision we make is wrong.

13. When conducting a test of significance, researchers set a probability level at which they feel confident that the results are not simply due to chance. This level of significance is known as alpha, symbolized as α. The smaller the probability level, the less likely it is that this finding would occur by chance.
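The standard error estimate in item 7 and the sample-size effect in item 8 can be sketched in a few lines of Python. This is only an illustration: the function name `standard_error_of_mean` and the sample SD of 10 are hypothetical, and the sample sizes were picked so that the square roots come out evenly.

```python
from math import sqrt

def standard_error_of_mean(sd, n):
    """Estimate the standard error of the mean as SD / sqrt(N - 1)."""
    if n < 2:
        raise ValueError("need at least two observations")
    return sd / sqrt(n - 1)

# Hypothetical sample SD of 10 at three sample sizes.
for n in (26, 101, 401):
    se = standard_error_of_mean(10, n)
    # Per item 6, about 95% of sample means fall within 2 standard errors.
    print(n, se, 2 * se)
```

With these inputs the standard error falls from 2.0 to 1.0 to 0.5 as N grows from 26 to 101 to 401, so roughly quadrupling the sample halves the expected sampling error.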

14. The standard preselected probability level used by educational researchers is usually five out of 100 chances that the observed difference occurred by chance (symbolized as α = .05).

Two-Tailed and One-Tailed Tests

15. Tests of significance can be either one-tailed or two-tailed. "Tails" refer to the extreme ends of the bell-shaped curve of a sampling distribution.

16. A one-tailed test assumes that a difference can occur in only one direction; the research hypothesis is directional. To select a one-tailed test of significance, the researcher should be quite sure that the results can occur only in the predicted direction.

17. A two-tailed test assumes that the results can occur in either direction; the research hypothesis is nondirectional.

18. When appropriate, a one-tailed test has one major advantage: It is statistically "easier" to obtain a significant difference when using a one-tailed test.

Type I and Type II Errors

19. A Type I error occurs when the null hypothesis is true, but the researcher rejects it, believing, incorrectly, that the results from the sample are not simply due to chance. For example, the groups aren't different, but the researcher concludes incorrectly that they are.

20. A Type II error occurs when the null hypothesis is false, but the researcher fails to reject it, believing incorrectly that the results from the sample are simply due to chance. For example, the groups are different, but the researcher concludes incorrectly that they aren't.

21. The preselected probability level (alpha) determines the probability of committing a Type I error, that is, of rejecting a null hypothesis that is really true.

22. As the probability of committing a Type I error decreases, the probability of committing a Type II error (that is, of not rejecting a null hypothesis when you should) increases.

23. The consequences of committing a Type I error thus affect the decision about level of significance for a particular study.

Degrees of Freedom

24. Degrees of freedom are important in determining whether the results of a study are statistically significant. Each test of significance has its own formula for determining degrees of freedom based on factors such as the number of subjects and the number of groups.

Selecting Among Tests of Significance

25. Different tests of significance are appropriate for different types of data. The first decision in selecting an appropriate test of significance is whether a parametric test or nonparametric test should be selected.

26. Parametric tests are more powerful and appropriate when the variable measured is normally distributed in the population and the data represent an interval or ratio scale of measurement. Parametric tests also assume that participants are randomly selected for the study and that the variances of the population comparison groups are equal.

27. Nonparametric tests make no assumptions about the shape of the distribution and are used when the data represent an ordinal or nominal scale, when a parametric assumption has been greatly violated, or when the nature of the distribution is not known.

The t Test

28. The t test is used to determine whether two groups of scores are significantly different at a selected probability level. The basic strategy of the t test is to compare the actual difference between the means of the groups ($\bar{X}_1 - \bar{X}_2$) with the difference expected by chance if the null hypothesis (i.e., no difference) is true. This ratio is known as the t value.

29. If the t value is equal to or greater than the value statistically established for the predetermined significance level, we can reject the null hypothesis.

30. The t test is adjusted for the fact that the distribution of scores for small samples becomes increasingly different from a normal distribution as sample sizes become increasingly smaller.
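The t ratio described in items 28 and 29, and the one-tailed advantage noted in item 18, can be illustrated with a short Python sketch. It assumes the standard pooled-variance form of the independent-samples t test, which this summary does not spell out; the function name `t_independent` and the two sets of scores are hypothetical. The critical values are the usual t-table entries for df = 8 at α = .05.

```python
from math import sqrt

def t_independent(group1, group2):
    """Return (t, df) for two independent samples using a pooled variance estimate."""
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = sum(group1) / n1, sum(group2) / n2
    ss1 = sum((x - mean1) ** 2 for x in group1)   # sum of squares, group 1
    ss2 = sum((x - mean2) ** 2 for x in group2)   # sum of squares, group 2
    df = n1 + n2 - 2
    pooled_variance = (ss1 + ss2) / df
    # Difference between means expected by chance (standard error of the difference).
    se_difference = sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_difference, df

# Hypothetical scores, invented for this sketch.
t, df = t_independent([3, 4, 5, 6, 7], [1, 2, 3, 4, 5])

# Critical values of t for df = 8 at alpha = .05, from a standard t table.
CRITICAL_TWO_TAILED = 2.306
CRITICAL_ONE_TAILED = 1.860

print(t, df)                          # 2.0 8
print(abs(t) >= CRITICAL_TWO_TAILED)  # False: do not reject the null hypothesis
print(t >= CRITICAL_ONE_TAILED)       # True: reject it with a directional hypothesis
```

The same t value of 2.0 is significant with a one-tailed test but not with a two-tailed test, which is the sense in which a one-tailed test makes significance "easier" to obtain.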

31. The t test for independent samples is used to determine whether, at a selected probability level, a significant difference exists between the means of two independent samples. Independent samples are randomly formed without any type of matching.

32. The t test for nonindependent samples is used to compare groups that are formed by some type of matching or to compare a single group's performance on two occasions or on two different measures.

Analysis of Gain or Difference Scores

33. Subtracting each participant's pretest score from his or her posttest score results in a gain, or difference, score. Analyzing difference scores is problematic because every participant does not have the same potential to gain.

Analysis of Variance

34. Analysis of variance (ANOVA) is a parametric test of significance used to determine whether scores from two or more groups are significantly different at a selected probability level.

38. The comparisons to be examined should be planned before, not after, the data are collected.

39. Of the many multiple comparison techniques available, a commonly used one is the Scheffé test, which is a very conservative test.

Multifactor Analysis of Variance

40. Multifactor analysis of variance is appropriate if a research study is based on a factorial design; it investigates two or more independent variables and the interactions between them. This analysis yields a separate F ratio for each independent variable and one for each interaction.

Analysis of Covariance

41. Analysis of covariance (ANCOVA) is a form of ANOVA used for controlling extraneous variables. ANCOVA adjusts posttest scores for initial differences on some variable and compares adjusted scores.

42. ANCOVA is also used as a means of increasing the power of a statistical test. Power refers to the ability of a significance test to identify a true research finding, thus allowing the experimenter to reject a null hypothesis that is false.

43. ANCOVA is based on the assumption that participants have been randomly assigned to treatment groups. It is best used in conjunction with true experimental designs. If existing, or intact, groups are involved but treatments are assigned to groups randomly, ANCOVA may still be used but results must be interpreted with caution.

Simple Analysis of Variance

35. Simple, or one-way, analysis of variance (ANOVA) is used when the comparison involves one variable with two or more levels.

36. In ANOVA, the total variance of scores is attributed to two sources: variance between groups (variance caused by the treatment or other independent variables) and variance within groups (error variance). ANOVA involves a ratio, known as F, with variance between groups as the numerator and error variance as the denomin