
1. DeVellis RF. Scale Development: Theory and Application. Los Angeles, CA: Sage Publications; (2012). [Google Scholar]

2. Raykov T, Marcoulides GA. Introduction to Psychometric Theory. New York, NY: Routledge, Taylor & Francis Group; (2011). [Google Scholar]

3. Streiner DL, Norman GR, Cairney J. Health Measurement Scales: A Practical Guide to Their Development and Use. Oxford University Press; (2015). [Google Scholar]

4. McCoach DB, Gable RK, Madura JP. Instrument Development in the Affective Domain. School and Corporate Applications, 3rd Edn. New York, NY: Springer; (2013). [Google Scholar]

5. Morgado FFR, Meireles JFF, Neves CM, Amaral ACS, Ferreira MEC. Scale development: ten main limitations and recommendations to improve future research practices. Psicol Reflex E Crítica (2018) 30:3 10.1186/s41155-016-0057-1 [CrossRef] [Google Scholar]

6. Glanz K, Rimer BK, Viswanath K. Health Behavior: Theory, Research, and Practice. San Francisco, CA: John Wiley & Sons, Inc; (2015). [Google Scholar]

7. Ajzen I. From intentions to actions: a theory of planned behavior. In: Action Control SSSP Springer Series in Social Psychology Berlin; Heidelberg: Springer, (1985). p. 11–39. [Google Scholar]

8. Bai Y, Peng C-YJ, Fly AD. Validation of a short questionnaire to assess mothers' perception of workplace breastfeeding support. J Acad Nutr Diet (2008) 108:1221–5. 10.1016/j.jada.2008.04.018 [PubMed] [CrossRef] [Google Scholar]

9. Hirani SAA, Karmaliani R, Christie T, Rafique G. Perceived Breastfeeding Support Assessment Tool (PBSAT): development and testing of psychometric properties with Pakistani urban working mothers. Midwifery (2013) 29:599–607. 10.1016/j.midw.2012.05.003 [PubMed] [CrossRef] [Google Scholar]

10. Boateng GO, Martin S, Collins S, Natamba BK, Young SL. Measuring exclusive breastfeeding social support: scale development and validation in Uganda. Matern Child Nutr. (2018). 10.1111/mcn.12579. [Epub ahead of print]. [PMC free article] [PubMed] [CrossRef] [Google Scholar]

11. Arbach A, Natamba BK, Achan J, Griffiths JK, Stoltzfus RJ, Mehta S, et al.. Reliability and validity of the center for epidemiologic studies-depression scale in screening for depression among HIV-infected and -uninfected pregnant women attending antenatal services in northern Uganda: a cross-sectional study. BMC Psychiatry (2014) 14:303. 10.1186/s12888-014-0303-y [PMC free article] [PubMed] [CrossRef] [Google Scholar]

12. Natamba BK, Kilama H, Arbach A, Achan J, Griffiths JK, Young SL. Reliability and validity of an individually focused food insecurity access scale for assessing inadequate access to food among pregnant Ugandan women of mixed HIV status. Public Health Nutr. (2015) 18:2895–905. 10.1017/S1368980014001669 [PubMed] [CrossRef] [Google Scholar]

13. Neilands TB, Chakravarty D, Darbes LA, Beougher SC, Hoff CC. Development and validation of the sexual agreement investment scale. J Sex Res. (2010) 47:24–37. 10.1080/00224490902916017 [PMC free article] [PubMed] [CrossRef] [Google Scholar]

14. Neilands TB, Choi K-H. A validation and reduced form of the female condom attitudes scale. AIDS Educ Prev. (2002) 14:158–71. 10.1521/aeap.14.2.158.23903 [PubMed] [CrossRef] [Google Scholar]

15. Lippman SA, Neilands TB, Leslie HH, Maman S, MacPhail C, Twine R, et al.. Development, validation, and performance of a scale to measure community mobilization. Soc Sci Med. (2016) 157:127–37. 10.1016/j.socscimed.2016.04.002 [PMC free article] [PubMed] [CrossRef] [Google Scholar]

16. Johnson MO, Neilands TB, Dilworth SE, Morin SF, Remien RH, Chesney MA. The role of self-efficacy in HIV treatment adherence: validation of the HIV treatment adherence self-efficacy scale (HIV-ASES). J Behav Med. (2007) 30:359–70. 10.1007/s10865-007-9118-3 [PMC free article] [PubMed] [CrossRef] [Google Scholar]

17. Sexton JB, Helmreich RL, Neilands TB, Rowan K, Vella K, Boyden J, et al.. The Safety Attitudes Questionnaire: psychometric properties, benchmarking data, and emerging research. BMC Health Serv Res. (2006) 6:44. 10.1186/1472-6963-6-44 [PMC free article] [PubMed] [CrossRef] [Google Scholar]

18. Wolfe WS, Frongillo EA. Building household food-security measurement tools from the ground up. Food Nutr Bull. (2001) 22:5–12. 10.1177/156482650102200102 [CrossRef] [Google Scholar]

19. González W, Jiménez A, Madrigal G, Muñoz LM, Frongillo EA. Development and validation of measure of household food insecurity in urban costa rica confirms proposed generic questionnaire. J Nutr. (2008) 138:587–92. 10.1093/jn/138.3.587 [PubMed] [CrossRef] [Google Scholar]

20. Boateng GO, Collins SM, Mbullo P, Wekesa P, Onono M, Neilands T, et al. A novel household water insecurity scale: procedures and psychometric analysis among postpartum women in western Kenya. PloS ONE. (2018). 10.1371/journal.pone.0198591 [PMC free article] [PubMed] [CrossRef] [Google Scholar]

21. Melgar-Quinonez H, Hackett M. Measuring household food security: the global experience. Rev Nutr. (2008) 21:27s−37s. 10.1590/S1415-52732008000700004 [CrossRef] [Google Scholar]

22. Melgar-Quiñonez H, Zubieta AC, Valdez E, Whitelaw B, Kaiser L. Validación de un instrumento para vigilar la inseguridad alimentaria en la Sierra de Manantlán, Jalisco. Salud Pública México (2005) 47:413–22. 10.1590/S0036-36342005000600005 [PubMed] [CrossRef] [Google Scholar]

23. Hackett M, Melgar-Quinonez H, Uribe MCA. Internal validity of a household food security scale is consistent among diverse populations participating in a food supplement program in Colombia. BMC Public Health (2008) 8:175. 10.1186/1471-2458-8-175 [PMC free article] [PubMed] [CrossRef] [Google Scholar]

24. Hinkin TR. A review of scale development practices in the study of organizations. J Manag. (1995) 21:967–88. 10.1016/0149-2063(95)90050-0 [CrossRef] [Google Scholar]

25. Haynes SN, Richard DCS, Kubany ES. Content validity in psychological assessment: a functional approach to concepts and methods. Psychol Assess. (1995) 7:238–47. 10.1037/1040-3590.7.3.238 [CrossRef] [Google Scholar]

26. Kline P. A Handbook of Psychological Testing. 2nd Edn. London: Routledge; Taylor & Francis Group; (1993). [Google Scholar]

27. Hunt SD. Modern Marketing Theory. Cincinnati: South-Western Publishing; (1991). [Google Scholar]

28. Loevinger J. Objective tests as instruments of psychological theory. Psychol Rep. (1957) 3:635–94. 10.2466/pr0.1957.3.3.635 [CrossRef] [Google Scholar]

29. Clark LA, Watson D. Constructing validity: basic issues in objective scale development. Psychol Assess. (1995) 7:309–19. 10.1037/1040-3590.7.3.309 [CrossRef] [Google Scholar]

30. Schinka JA, Velicer WF, Weiner IR. Handbook of Psychology, Vol. 2, Research Methods in Psychology. Hoboken, NJ: John Wiley & Sons, Inc. (2012). [Google Scholar]

31. Fowler FJ. Improving Survey Questions: Design and Evaluation. Thousand Oaks, CA: Sage Publications; (1995). [Google Scholar]

32. Krosnick JA. Questionnaire design. In: Vannette DL, Krosnick JA, editors. The Palgrave Handbook of Survey Research. Cham: Palgrave Macmillan; (2018), pp. 439–55. [Google Scholar]

33. Krosnick JA, Presser S. Question and questionnaire design. In: Wright JD, Marsden PV, editors. Handbook of Survey Research. San Diego, CA: Elsevier; (2009), pp. 263–314. [Google Scholar]

34. Rhemtulla M, Brosseau-Liard PÉ, Savalei V. When can categorical variables be treated as continuous? A comparison of robust continuous and categorical SEM estimation methods under suboptimal conditions. Psychol Methods (2012) 17:354–73. 10.1037/a0029315 [PubMed] [CrossRef] [Google Scholar]

35. MacKenzie SB, Podsakoff PM, Podsakoff NP. Construct measurement and validation procedures in MIS and behavioral research: integrating new and existing techniques. MIS Q. (2011) 35:293 10.2307/23044045 [CrossRef] [Google Scholar]

36. Messick S. Validity of psychological assessment: validation of inferences from persons' responses and performance as scientific inquiry into score meaning. Am Psychol. (1995) 50:741–9. 10.1037/0003-066X.50.9.741 [CrossRef] [Google Scholar]

37. Campbell DT, Fiske DW. Convergent and discriminant validity by the multitrait-multimethod matrix. Psychol Bull. (1959) 56:81–105. 10.1037/h0046016 [PubMed] [CrossRef] [Google Scholar]

38. Dennis C. Theoretical underpinnings of breastfeeding confidence: a self-efficacy framework. J Hum Lact. (1999) 15:195–201. 10.1177/089033449901500303 [PubMed] [CrossRef] [Google Scholar]

39. Dennis C-L, Faux S. Development and psychometric testing of the Breastfeeding Self-Efficacy Scale. Res Nurs Health (1999) 22:399–409. 10.1002/(SICI)1098-240X(199910)22:5<399::AID-NUR6>3.0.CO;2-4 [PubMed] [CrossRef] [Google Scholar]

40. Dennis C-L. The breastfeeding self-efficacy scale: psychometric assessment of the short form. J Obstet Gynecol Neonatal Nurs. (2003) 32:734–44. 10.1177/0884217503258459 [PubMed] [CrossRef] [Google Scholar]

41. Frongillo EA, Nanama S. Development and validation of an experience-based measure of household food insecurity within and across seasons in Northern Burkina Faso. J Nutr. (2006) 136:1409S−19S. 10.1093/jn/136.5.1409S [PubMed] [CrossRef] [Google Scholar]

42. Guion R. Content validity - the source of my discontent. Appl Psychol Meas. (1977) 1:1–10. 10.1177/014662167700100103 [CrossRef] [Google Scholar]

43. Lawshe C. A quantitative approach to content validity. Pers Psychol. (1975) 28:563–75. 10.1111/j.1744-6570.1975.tb01393.x [CrossRef] [Google Scholar]

44. Lynn M. Determination and quantification of content validity. Nurs Res. (1986) 35:382–5. 10.1097/00006199-198611000-00017 [PubMed] [CrossRef] [Google Scholar]

45. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas. (1960) 20:37–46. 10.1177/001316446002000104 [CrossRef] [Google Scholar]

46. Wynd CA, Schmidt B, Schaefer MA. Two quantitative approaches for estimating content validity. West J Nurs Res. (2003) 25:508–18. 10.1177/0193945903252998 [PubMed] [CrossRef] [Google Scholar]

47. Linstone HA, Turoff M. (eds). The Delphi Method. Reading, MA: Addison-Wesley; (1975). [Google Scholar]

48. Augustine LF, Vazir S, Rao SF, Rao MV, Laxmaiah A, Ravinder P, et al.. Psychometric validation of a knowledge questionnaire on micronutrients among adolescents and its relationship to micronutrient status of 15–19-year-old adolescent boys, Hyderabad, India. Public Health Nutr. (2012) 15:1182–9. 10.1017/S1368980012000055 [PubMed] [CrossRef] [Google Scholar]

49. Beatty PC, Willis GB. Research synthesis: the practice of cognitive interviewing. Public Opin Q. (2007) 71:287–311. 10.1093/poq/nfm006 [CrossRef] [Google Scholar]

50. Alaimo K, Olson CM, Frongillo EA. Importance of cognitive testing for survey items: an example from food security questionnaires. J Nutr Educ. (1999) 31:269–75. 10.1016/S0022-3182(99)70463-2 [CrossRef] [Google Scholar]

51. Willis GB. Cognitive Interviewing and Questionnaire Design: A Training Manual. Cognitive Methods Staff Working Paper Series. Hyattsville, MD: National Center for Health Statistics; (1994). [Google Scholar]

52. Willis GB. Cognitive Interviewing: A Tool for Improving Questionnaire Design. Thousand Oaks, CA: Sage Publications; (2005). [Google Scholar]

53. Tourangeau R. Cognitive aspects of survey measurement and mismeasurement. Int J Public Opin Res. (2003) 15:3–7. 10.1093/ijpor/15.1.3 [CrossRef] [Google Scholar]

54. Morris MD, Neilands TB, Andrew E, Mahar L, Page KA, Hahn JA. Development and validation of a novel scale for measuring interpersonal factors underlying injection drug using behaviours among injecting partnerships. Int J Drug Policy (2017) 48:54–62. 10.1016/j.drugpo.2017.05.030 [PMC free article] [PubMed] [CrossRef] [Google Scholar]

55. Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap)—a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. (2009) 42:377–81. 10.1016/j.jbi.2008.08.010 [PMC free article] [PubMed] [CrossRef] [Google Scholar]

56. Goldstein M, Benerjee R, Kilic T. Paper v Plastic Part 1: The Survey Revolution Is in Progress. The World Bank Development Impact; (2012). Available online at: http://blogs.worldbank.org/impactevaluations/paper-v-plastic-part-i-the-survey-revolution-is-in-progress (Accessed November 10, 2017). [Google Scholar]

57. Fanning J, McAuley E. A Comparison of tablet computer and paper-based questionnaires in healthy aging research. JMIR Res Protoc. (2014) 3:e38. 10.2196/resprot.3291 [PMC free article] [PubMed] [CrossRef] [Google Scholar]

58. Greenlaw C, Brown-Welty S. A Comparison of web-based and paper-based survey methods: testing assumptions of survey mode and response cost. Eval Rev. (2009) 33:464–80. 10.1177/0193841X09340214 [PubMed] [CrossRef] [Google Scholar]

59. MacCallum RC, Widaman KF, Zhang S, Hong S. Sample size in factor analysis. Psychol Methods (1999) 4:84–99. 10.1037/1082-989X.4.1.84 [CrossRef] [Google Scholar]

60. Nunnally JC. Psychometric Theory. New York, NY: McGraw-Hill; (1978). [Google Scholar]

61. Guadagnoli E, Velicer WF. Relation of sample size to the stability of component patterns. Psychol Bull. (1988) 103:265–75. 10.1037/0033-2909.103.2.265 [PubMed] [CrossRef] [Google Scholar]

62. Comrey AL. Factor-analytic methods of scale development in personality and clinical psychology. J Consult Clin Psychol. (1988) 56:754–61. [PubMed] [Google Scholar]

63. Comrey AL, Lee H. A First Course in Factor Analysis. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc; (1992). [Google Scholar]

64. Ong DC. A Primer to Bootstrapping and an Overview of doBootstrap. Stanford, CA: Department of Psychology, Stanford University; (2014). [Google Scholar]

65. Osborne JW, Costello AB. Sample size and subject to item ratio in principal components analysis. Pract Assess Res Eval. (2004) 9:1–15. Available online at: http://pareonline.net/htm/v9n11.htm [Google Scholar]

66. Ebel R, Frisbie D. Essentials of Educational Measurement. Englewood Cliffs, NJ: Prentice-Hall; (1979). [Google Scholar]

67. Hambleton R, Jones R. An NCME instructional module on comparison of classical test theory and item response theory and their applications to test development. Educ Meas Issues Pract. (1993) 12:38–47. 10.1111/j.1745-3992.1993.tb00543.x [CrossRef] [Google Scholar]

68. Raykov T. Scale Construction and Development. Lecture Notes. Measurement and Quantitative Methods. East Lansing, MI: Michigan State University; (2015). [Google Scholar]

69. Whiston SC. Principles and Applications of Assessment in Counseling. Cengage Learning (2008). [Google Scholar]

70. Brennan RL. A generalized upper-lower item discrimination index. Educ Psychol Meas. (1972) 32:289–303. 10.1177/001316447203200206 [CrossRef] [Google Scholar]

71. Popham WJ, Husek TR. Implications of criterion-referenced measurement. J Educ Meas. (1969) 6:1–9. 10.1111/j.1745-3984.1969.tb00654.x [CrossRef] [Google Scholar]

72. Rasiah S-MS, Isaiah R. Relationship between item difficulty and discrimination indices in true/false-type multiple choice questions of a para-clinical multidisciplinary paper. Ann Acad Med Singap. (2006) 35:67–71. Available online at: http://repository.um.edu.my/id/eprint/65455 [PubMed] [Google Scholar]

73. DeMars C. Item Response Theory. New York, NY: Oxford University Press; (2010). [Google Scholar]

74. Lord FM. Applications of Item Response Theory to Practical Testing Problems. Englewood Cliffs, NJ; (1980). [Google Scholar]

75. Bazaldua DAL, Lee Y-S, Keller B, Fellers L. Assessing the performance of classical test theory item discrimination estimators in Monte Carlo simulations. Asia Pac Educ Rev. (2017) 18:585–98. 10.1007/s12564-017-9507-4 [CrossRef] [Google Scholar]

76. Piedmont RL. Inter-item correlations. In Encyclopedia of Quality of Life and Well-Being Research. Dordrecht: Springer; (2014). p. 3303–4. 10.1007/978-94-007-0753-5_1493 [CrossRef] [Google Scholar]

77. Tarrant M, Ware J, Mohammed AM. An assessment of functioning and non-functioning distractors in multiple-choice questions: a descriptive analysis. BMC Med Educ. (2009) 9:40. 10.1186/1472-6920-9-40 [PMC free article] [PubMed] [CrossRef] [Google Scholar]

78. Fulcher G, Davidson F. The Routledge Handbook of Language Testing. New York, NY: Routledge; (2012). [Google Scholar]

79. Cizek GJ, O'Day DM. Further investigation of nonfunctioning options in multiple-choice test items. Educ Psychol Meas. (1994) 54:861–72. [Google Scholar]

80. Haladyna TM, Downing SM. Validity of a taxonomy of multiple-choice item-writing rules. Appl Meas Educ. (1989) 2:51–78. 10.1207/s15324818ame0201_4 [CrossRef] [Google Scholar]

81. Tappen RM. Advanced Nursing Research. Sudbury, MA: Jones & Bartlett Publishers; (2011). [Google Scholar]

82. Enders CK, Bandalos DL. The relative performance of full information maximum likelihood estimation for missing data in structural equation models. Struct Equ Model. (2009) 8:430–57. 10.1207/S15328007SEM0803_5 [CrossRef] [Google Scholar]

83. Kenward MG, Carpenter J. Multiple imputation: current perspectives. Stat Methods Med Res. (2007) 16:199–218. 10.1177/0962280206075304 [PubMed] [CrossRef] [Google Scholar]

84. Gottschall AC, West SG, Enders CK. A Comparison of item-level and scale-level multiple imputation for questionnaire batteries. Multivar Behav Res. (2012) 47:1–25. 10.1080/00273171.2012.640589 [CrossRef] [Google Scholar]

85. Cattell RB. The Scree test for the number of factors. Multivar Behav Res. (1966) 1:245–76. 10.1207/s15327906mbr0102_10 [PubMed] [CrossRef] [Google Scholar]

86. Horn JL. A rationale and test for the number of factors in factor analysis. Psychometrika (1965) 30:179–85. 10.1007/BF02289447 [PubMed] [CrossRef] [Google Scholar]

87. Velicer WF. Determining the number of components from the matrix of partial correlations. Psychometrika (1976) 41:321–7. 10.1007/BF02293557 [CrossRef] [Google Scholar]

88. Lorenzo-Seva U, Timmerman ME, Kiers HAL. The hull method for selecting the number of common factors. Multivar Behav Res. (2011) 46:340–64. 10.1080/00273171.2011.564527 [PubMed] [CrossRef] [Google Scholar]

89. Jolijn Hendriks AA, Perugini M, Angleitner A, Ostendorf F, Johnson JA, De Fruyt F, et al. The five-factor personality inventory: cross-cultural generalizability across 13 countries. Eur J Pers. (2003) 17:347–73. 10.1002/per.491 [CrossRef] [Google Scholar]

90. Bond TG, Fox C. Applying the Rasch Model: Fundamental Measurement in the Human Sciences. Mahwah, NJ: Erlbaum; (2013). [Google Scholar]

91. Brown T. Confirmatory Factor Analysis for Applied Research. New York, NY: Guildford Press; (2014). [Google Scholar]

92. Morin AJS, Arens AK, Marsh HW. A bifactor exploratory structural equation modeling framework for the identification of distinct sources of construct-relevant psychometric multidimensionality. Struct Equ Model Multidiscip J. (2016) 23:116–39. 10.1080/10705511.2014.961800 [CrossRef] [Google Scholar]

93. Cochran WG. The χ2 test of goodness of fit. Ann Math Stat. (1952) 23:315–45. 10.1214/aoms/1177729380 [CrossRef] [Google Scholar]

94. Brown MW. Confirmatory Factor Analysis for Applied Research. New York, NY: Guildford Press; (2014). [Google Scholar]

95. Tucker LR, Lewis C. A reliability coefficient for maximum likelihood factor analysis. Psychometrika (1973) 38:1–10. 10.1007/BF02291170 [CrossRef] [Google Scholar]

96. Bentler PM, Bonett DG. Significance tests and goodness of fit in the analysis of covariance structures. Psychol Bull. (1980) 88:588–606. 10.1037/0033-2909.88.3.588 [CrossRef] [Google Scholar]

97. Bentler PM. Comparative fit indexes in structural models. Psychol Bull. (1990) 107:238–46. 10.1037/0033-2909.107.2.238 [PubMed] [CrossRef] [Google Scholar]

98. Hu L, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Struct Equ Model Multidiscip J. (1999) 6:1–55. 10.1080/10705519909540118 [CrossRef] [Google Scholar]

99. Jöreskog KG, Sörbom D. LISREL 8.54. Structural Equation Modeling With the Simplis Command Language (2004) Available online at: http://www.unc.edu/~rcm/psy236/holzcfa.lisrel.pdf

100. Browne MW, Cudeck R. Alternative ways of assessing model fit. In: Bollen KA, Long JS, editors. Testing Structural Equation Models. Newbury Park, CA: Sage Publications; (1993). p. 136–62. [Google Scholar]

101. Yu C. Evaluating Cutoff Criteria of Model Fit Indices for Latent Variable Models With Binary and Continuous Outcomes. Los Angeles, CA: University of California, Los Angeles; (2002). [Google Scholar]

102. Gerbing DW, Hamilton JG. Viability of exploratory factor analysis as a precursor to confirmatory factor analysis. Struct Equ Model Multidiscip J. (1996) 3:62–72. 10.1080/10705519609540030 [CrossRef] [Google Scholar]

103. Reise SP, Morizot J, Hays RD. The role of the bifactor model in resolving dimensionality issues in health outcomes measures. Qual Life Res. (2007) 16:19–31. 10.1007/s11136-007-9183-7 [PubMed] [CrossRef] [Google Scholar]

104. Gibbons RD, Hedeker DR. Full-information item bi-factor analysis. Psychometrika (1992) 57:423–36. 10.1007/BF02295430 [CrossRef] [Google Scholar]

105. Reise SP, Moore TM, Haviland MG. Bifactor models and rotations: exploring the extent to which multidimensional data yield univocal scale scores. J Pers Assess. (2010) 92:544–59. 10.1080/00223891.2010.496477 [PMC free article] [PubMed] [CrossRef] [Google Scholar]

106. Brunner M, Nagy G, Wilhelm O. A Tutorial on hierarchically structured constructs. J Pers. (2012) 80:796–846. 10.1111/j.1467-6494.2011.00749.x [PubMed] [CrossRef] [Google Scholar]

107. Vandenberg RJ, Lance CE. A review and synthesis of the measurement invariance literature: suggestions, practices, and recommendations for organizational research. Organ Res Methods (2000) 3:4–70. 10.1177/109442810031002 [CrossRef] [Google Scholar]

108. Sideridis GD, Tsaousis I, Al-harbi KA. Multi-population invariance with dichotomous measures: combining multi-group and MIMIC methodologies in evaluating the general aptitude test in the Arabic language. J Psychoeduc Assess. (2015) 33:568–84. 10.1177/0734282914567871 [CrossRef] [Google Scholar]

109. Joreskog K. A general method for estimating a linear equation system. In: Goldberger AS, Duncan OD, editors. Structural Equation Models in the Social Sciences. New York, NY: Seminar Press; (1973). pp. 85–112. [Google Scholar]

110. Kim ES, Cao C, Wang Y, Nguyen DT. Measurement invariance testing with many groups: a comparison of five approaches. Struct Equ Model Multidiscip J. (2017) 24:524–44. 10.1080/10705511.2017.1304822 [CrossRef] [Google Scholar]

111. Muthén B., Asparouhov T. BSEM Measurement Invariance Analysis. (2017). Available online at: https://www.statmodel.com/examples/webnotes/webnote17.pdf

112. Asparouhov T, Muthén B. Multiple-group factor analysis alignment. Struct Equ Model. (2014) 21:495–508. 10.1080/10705511.2014.919210 [CrossRef] [Google Scholar]

113. Reise SP, Widaman KF, Pugh RH. Confirmatory factor analysis and item response theory: two approaches for exploring measurement invariance. Psychol Bull. (1993) 114:552–66. 10.1037/0033-2909.114.3.552 [PubMed] [CrossRef] [Google Scholar]

114. Pushpanathan ME, Loftus AM, Gasson N, Thomas MG, Timms CF, Olaithe M, et al.. Beyond factor analysis: multidimensionality and the Parkinson's disease sleep scale-revised. PLoS ONE (2018) 13:e0192394. 10.1371/journal.pone.0192394 [PMC free article] [PubMed] [CrossRef] [Google Scholar]

115. Armor DJ. Theta reliability and factor scaling. Sociol Methodol. (1973) 5:17–50. 10.2307/270831 [CrossRef] [Google Scholar]

116. Porta M. A Dictionary of Epidemiology. New York, NY: Oxford University Press; (2008). [Google Scholar]

117. Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika (1951) 16:297–334. 10.1007/BF02310555 [CrossRef] [Google Scholar]

118. Zumbo B, Gadermann A, Zeisser C. Ordinal versions of coefficients alpha and theta for likert rating scales. J Mod Appl Stat Methods (2007) 6:21–9. 10.22237/jmasm/1177992180 [CrossRef] [Google Scholar]

119. Gadermann AM, Guhn M, Zumbo B. Estimating ordinal reliability for Likert type and ordinal item response data: a conceptual, empirical, and practical guide. Pract Assess Res Eval. (2012) 17:1–13. Available online at: http://www.pareonline.net/getvn.asp?v=17&n=3 [Google Scholar]

120. McDonald RP. Test Theory: A Unified Treatment. Mahwah, NJ: Lawrence Erlbaum Associates, Inc; (1999). [Google Scholar]

121. Revelle W. Hierarchical cluster analysis and the internal structure of tests. Multivar Behav Res. (1979) 14:57–74. 10.1207/s15327906mbr1401_4 [PubMed] [CrossRef] [Google Scholar]

122. Revelle W, Zinbarg RE. Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika (2009) 74:145 10.1007/s11336-008-9102-z [CrossRef] [Google Scholar]

123. Bernstein I, Nunnally JC. Psychometric Theory. New York, NY: McGraw-Hill; (1994). [Google Scholar]

124. Weir JP. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. J Strength Cond Res. (2005) 19:231–40. 10.1519/15184.1 [PubMed] [CrossRef] [Google Scholar]

125. Rousson V, Gasser T, Seifert B. Assessing intrarater, interrater and test–retest reliability of continuous measurements. Stat Med. (2002) 21:3431–46. 10.1002/sim.1253 [PubMed] [CrossRef] [Google Scholar]

126. Churchill GA. A paradigm for developing better measures of marketing constructs. J Mark Res. (1979) 16:64–73. 10.2307/3150876 [CrossRef] [Google Scholar]

127. Bland JM, Altman DG. A note on the use of the intraclass correlation coefficient in the evaluation of agreement between two methods of measurement. Comput Biol Med. (1990) 20:337–40. 10.1016/0010-4825(90)90013-F [PubMed] [CrossRef] [Google Scholar]

128. Hebert JR, Miller DR. The inappropriateness of conventional use of the correlation coefficient in assessing validity and reliability of dietary assessment methods. Eur J Epidemiol. (1991) 7:339–43. 10.1007/BF00144997 [PubMed] [CrossRef] [Google Scholar]

129. McPhail SM. Alternative Validation Strategies: Developing New and Leveraging Existing Validity Evidence. San Francisco, CA: John Wiley & Sons, Inc; (2007). [Google Scholar]

130. Dray S, Dunsch F, Holmlund M. Electronic Versus Paper-Based Data Collection: Reviewing the Debate. The World Bank Development Impact; (2016). Available online at: https://blogs.worldbank.org/impactevaluations/electronic-versus-paper-based-data-collection-reviewing-debate (Accessed November 10, 2017). [Google Scholar]

131. Ellen JM, Gurvey JE, Pasch L, Tschann J, Nanda JP, Catania J. A randomized comparison of A-CASI and phone interviews to assess STD/HIV-related risk behaviors in teens. J Adolesc Health (2002) 31:26–30. 10.1016/S1054-139X(01)00404-9 [PubMed] [CrossRef] [Google Scholar]

132. Chesney MA, Neilands TB, Chambers DB, Taylor JM, Folkman S. A validity and reliability study of the coping self-efficacy scale. Br J Health Psychol. (2006) 11(Pt 3):421–37. 10.1348/135910705X53155 [PMC free article] [PubMed] [CrossRef] [Google Scholar]

133. Thurstone L. Multiple-Factor Analysis. Chicago, IL: University of Chicago Press; (1947). [Google Scholar]

134. Fan X. Item response theory and classical test theory: an empirical comparison of their item/person statistics. Educ Psychol Meas. (1998) 58:357–81. 10.1177/0013164498058003001 [CrossRef] [Google Scholar]

135. Glockner-Rist A, Hoijtink H. The best of both worlds: factor analysis of dichotomous data using item response theory and structural equation modeling. Struct Equ Model Multidiscip J. (2003) 10:544–65. 10.1207/S15328007SEM1004_4 [CrossRef] [Google Scholar]

136. Keeves JP, Alagumalai S. editors. Applied Rasch Measurement: A Book of Exemplars: Papers in Honour of John P. Keeves. Dordrecht; Norwell, MA: Springer; (2005). [Google Scholar]

137. Cappelleri JC, Lundy JJ, Hays RD. Overview of classical test theory and item response theory for quantitative assessment of items in developing patient-reported outcome measures. Clin Ther. (2014) 36:648–62. 10.1016/j.clinthera.2014.04.006 [PMC free article] [PubMed] [CrossRef] [Google Scholar]

138. Harvey RJ, Hammer AL. Item response theory. Couns Psychol. (1999) 27:353–83. 10.1177/0011000099273004 [CrossRef] [Google Scholar]

139. Cook KF, Kallen MA, Amtmann D. Having a fit: impact of number of items and distribution of data on traditional criteria for assessing IRT's unidimensionality assumption. Qual. Life Res. (2009) 18:447–60. 10.1007/s11136-009-9464-4 [PMC free article] [PubMed] [CrossRef] [Google Scholar]

140. Greca AML, Stone WL. Social anxiety scale for children-revised: factor structure and concurrent validity. J Clin Child Psychol. (1993) 22:17–27. 10.1207/s15374424jccp2201_2 [CrossRef] [Google Scholar]

141. Frongillo EA, Nanama S, Wolfe WS. Technical Guide to Developing a Direct, Experience-Based Measurement Tool for Household Food Insecurity. Washington, DC: Food and Nutrition Technical Assistance Project; (2004). [Google Scholar]



The three phases and nine steps of scale development and validation.

Each entry below gives the Activity, its Purpose, how to explore or estimate it (How), and the supporting References.
PHASE 1: ITEM DEVELOPMENT
Step 1: Identification of Domain and Item Generation: Selecting Which Items to Ask
Activity: Domain identification
Purpose: To specify the boundaries of the domain and facilitate item generation
How: 1.1 Specify the purpose of the domain. 1.2 Confirm that there are no existing instruments. 1.3 Describe the domain and provide a preliminary conceptual definition. 1.4 Specify the dimensions of the domain if they exist a priori. 1.5 Define each dimension.
References: (1–4), (25)

Activity: Item generation
Purpose: To identify appropriate questions that fit the identified domain
How: 1.6 Deductive methods: literature review and assessment of existing scales. 1.7 Inductive methods: exploratory research methodologies, including focus group discussions and interviews.
References: (2–5), (24–41)

Step 2: Content Validity: Assessing if the Items Adequately Measure the Domain of Interest
Activity: Evaluation by experts
Purpose: To evaluate each of the items constituting the domain for content relevance, representativeness, and technical quality
How: 2.1 Quantify the assessments of 5–7 expert judges using formalized scaling and statistical procedures, including the content validity ratio, the content validity index, or Cohen's kappa coefficient. 2.2 Conduct the Delphi method with expert judges.
References: (1–5), (24, 42–48)
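
Step 2.1 mentions the content validity ratio and the content validity index. As a rough illustration of how those two statistics fall out of expert ratings, the Python sketch below uses a hypothetical panel of seven judges and invented ratings; the variable names and data are not from the original table.

```python
import numpy as np

# Hypothetical ratings from 7 expert judges for 3 candidate items.
# essential[j, i] is True if judge j rated item i "essential" (Lawshe's scale).
essential = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
], dtype=bool)

n_judges = essential.shape[0]

# Lawshe's content validity ratio: CVR = (n_e - N/2) / (N/2),
# where n_e is the number of judges rating the item "essential".
n_essential = essential.sum(axis=0)
cvr = (n_essential - n_judges / 2) / (n_judges / 2)

# Item-level content validity index: proportion of judges endorsing the item
# (the same binary judgment is reused here for simplicity).
i_cvi = n_essential / n_judges

# Scale-level CVI (average method): mean of the item-level CVIs.
s_cvi_ave = i_cvi.mean()

for i, (r, c) in enumerate(zip(cvr, i_cvi), start=1):
    print(f"Item {i}: CVR = {r:+.2f}, I-CVI = {c:.2f}")
print(f"S-CVI/Ave = {s_cvi_ave:.2f}")
```
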
Activity: Evaluation by target population
Purpose: To evaluate each item constituting the domain for representativeness of actual experience from the target population
How: 2.3 Conduct cognitive interviews with end users of the scale items to evaluate face validity.
References: (20, 25)

PHASE 2: SCALE DEVELOPMENT
Step 3: Pre-testing Questions: Ensuring the Questions and Answers Are Meaningful
Activity: Cognitive interviews
Purpose: To assess the extent to which questions reflect the domain of interest and that answers produce valid measurements
How: 3.1 Administer draft questions to 5–15 interviewees in 2–3 rounds, while allowing respondents to verbalize the mental process entailed in providing answers.
References: (49–54)

Step 4: Survey Administration and Sample Size: Gathering Enough Data from the Right People
Activity: Survey administration
Purpose: To collect data with minimum measurement error
How: 4.1 Administer the potential scale items to a sample that reflects the range of the target population, using paper or an electronic device.
References: (55–58)

Activity: Establishing the sample size
Purpose: To ensure the availability of sufficient data for scale development
How: 4.2 The recommended sample size is 10 respondents per survey item and/or 200–300 observations.
References: (29, 59–65)
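
The 10-respondents-per-item guideline in Step 4.2, combined with the 200–300 observation floor, reduces to simple arithmetic. A trivial sketch, with the item count chosen purely for illustration:

```python
n_items = 25                       # hypothetical number of draft items
per_item_rule = 10 * n_items       # 10 respondents per survey item
absolute_floor = 300               # upper end of the 200-300 observation guideline

target_n = max(per_item_rule, absolute_floor)
print(f"Target sample size: {target_n}")   # max(250, 300) -> 300
```
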
Activity: Determining the type of data to use
Purpose: To ensure the availability of data for scale development and validation
How: 4.3 Use cross-sectional data for exploratory factor analysis. 4.4 Use data from a second time point (at least 3 months later in a longitudinal dataset) or an independent sample for the test of dimensionality (Step 7).

Step 5: Item Reduction: Ensuring Your Scale Is Parsimonious
Activity: Item difficulty index
Purpose: To determine the proportion of correct answers given per item (CTT); to determine the probability of a particular examinee correctly answering a given item (IRT)
How: 5.1 The proportion can be calculated for CTT, and the item difficulty parameter estimated for IRT, using statistical packages.
References: (1, 2, 66–68)

Activity: Item discrimination test
Purpose: To determine the degree to which an item or set of test questions measures a unitary attribute (CTT); to determine how steeply the probability of a correct response changes as ability increases (IRT)
How: 5.2 Estimate biserial correlations or the item discrimination parameter using statistical packages.
References: (69–75)
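
Under CTT, Steps 5.1 and 5.2 can be computed directly from a scored item matrix. A minimal numpy sketch, assuming dichotomously scored (0/1) items and using the item-rest point-biserial correlation as one common discrimination estimate; the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical scored responses: 200 examinees x 8 dichotomous items (1 = correct).
X = (rng.random((200, 8)) > 0.4).astype(float)

# Step 5.1 (CTT): item difficulty = proportion of correct answers per item.
difficulty = X.mean(axis=0)

# Step 5.2 (CTT): discrimination via the point-biserial correlation between each
# item and the total score computed from the remaining items (the "rest" score).
total = X.sum(axis=1)
discrimination = np.empty(X.shape[1])
for i in range(X.shape[1]):
    rest = total - X[:, i]
    discrimination[i] = np.corrcoef(X[:, i], rest)[0, 1]

for i, (p, d) in enumerate(zip(difficulty, discrimination), start=1):
    print(f"Item {i}: difficulty p = {p:.2f}, item-rest r = {d:.2f}")
```
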
Activity: Inter-item and item-total correlations
Purpose: To determine the correlations between scale items, as well as the correlations between each item and the sum score of the scale items
How: 5.3 Estimate inter-item correlations/item communalities, item-total correlations, and adjusted item-total correlations using statistical packages.
References: (1, 2, 68, 76)
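
For Step 5.3, the inter-item correlation matrix and the adjusted (corrected) item-total correlations follow the same pattern for Likert-type items. A pandas sketch on simulated responses; the column names are placeholders.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Hypothetical 5-point Likert responses from 250 participants on 6 items.
items = pd.DataFrame(
    rng.integers(1, 6, size=(250, 6)),
    columns=[f"item{i}" for i in range(1, 7)],
)

# Step 5.3: inter-item correlation matrix.
inter_item = items.corr()

# Adjusted item-total correlation: correlate each item with the sum of the
# remaining items, so the item does not inflate its own total.
total = items.sum(axis=1)
adjusted = {col: items[col].corr(total - items[col]) for col in items.columns}

print(inter_item.round(2))
print(pd.Series(adjusted).round(2))
```
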
Activity: Distractor efficiency analysis
Purpose: To determine the distribution of incorrect options and how they contribute to the quality of items
How: 5.4 Conduct distractor analysis using statistical packages.
References: (77–80)

Activity: Deleting or imputing missing cases
Purpose: To ensure the availability of complete cases for scale development
How: 5.5 Delete items with many permanently missing cases, or handle missing data with multiple imputation or full-information maximum likelihood.
References: (81–84)
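
Step 5.5 can be illustrated with listwise deletion and a model-based imputation. The sketch below uses scikit-learn's IterativeImputer as a stand-in for a multiple-imputation workflow (FIML is normally handled inside SEM software and is not shown); the data and missingness pattern are simulated.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates the estimator)
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(2)
items = pd.DataFrame(rng.integers(1, 6, size=(100, 5)).astype(float),
                     columns=[f"item{i}" for i in range(1, 6)])
# Knock out ~10% of responses at random to mimic item nonresponse.
mask = rng.random(items.shape) < 0.10
items = items.mask(mask)

# Option A: listwise deletion (keep complete cases only).
complete_cases = items.dropna()

# Option B: model-based single imputation; rerunning with different random
# states approximates a multiple-imputation workflow.
imputer = IterativeImputer(random_state=0)
imputed = pd.DataFrame(imputer.fit_transform(items), columns=items.columns)

print(f"Complete cases: {len(complete_cases)} of {len(items)}")
print(imputed.head().round(2))
```
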
Step 6: Extraction of Factors: Exploring the Number of Latent Constructs that Fit Your Observed Data
Activity: Factor analysis
Purpose: To determine the optimal number of factors or domains that fit a set of items
How: 6.1 Use scree plots, exploratory factor analysis, parallel analysis, the minimum average partial procedure, and/or the Hull method.
References: (2–4), (85–90)
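
Among the retention methods named in Step 6.1, Horn's parallel analysis is easy to sketch with numpy: retain the factors whose observed eigenvalues exceed the average eigenvalues obtained from random data of the same dimensions. This version works on principal-component eigenvalues and simulated data, so it is illustrative rather than a full EFA.

```python
import numpy as np

rng = np.random.default_rng(3)
n_obs, n_items = 300, 10
# Hypothetical item responses generated from two latent factors plus noise.
latent = rng.normal(size=(n_obs, 2))
loadings = rng.uniform(0.4, 0.8, size=(2, n_items))
X = latent @ loadings + rng.normal(scale=1.0, size=(n_obs, n_items))

def eigenvalues(data):
    """Eigenvalues of the correlation matrix, sorted in descending order."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

observed = eigenvalues(X)

# Horn's parallel analysis: average eigenvalues across random datasets of the
# same shape, then retain components whose observed eigenvalue exceeds the
# corresponding random mean.
n_sims = 100
random_eigs = np.mean(
    [eigenvalues(rng.normal(size=(n_obs, n_items))) for _ in range(n_sims)],
    axis=0,
)
n_retain = int(np.sum(observed > random_eigs))
print(f"Observed eigenvalues: {np.round(observed, 2)}")
print(f"Suggested number of factors: {n_retain}")
```
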
PHASE 3: SCALE EVALUATION
Step 7: Tests of Dimensionality: Testing if Latent Constructs Are as Hypothesized
Activity: Test dimensionality
Purpose: To address queries on the latent structure of the scale items and their underlying relationships, i.e., to validate whether the previously hypothesized structure fits the items
How: 7.1 Estimate an independent cluster model (confirmatory factor analysis); cf. Table 2. 7.2 Estimate bifactor models to eliminate ambiguity about the type of dimensionality (unidimensionality, bidimensionality, or multi-dimensionality). 7.3 Estimate measurement invariance to determine whether the hypothesized factors and dimensions are congruent across groups or multiple samples.
References: (91–114)
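
Step 7.1's confirmatory factor analysis can be specified with lavaan-style syntax in Python through the third-party semopy package. The sketch below is an assumption-laden outline: the factor names, item names, and simulated data are hypothetical, and the fit indices reported by calc_stats (e.g., CFI, TLI, RMSEA) are the ones discussed in references (95–101).

```python
import numpy as np
import pandas as pd
import semopy

rng = np.random.default_rng(8)
n = 300
# Simulate two correlated latent factors and eight indicator items.
f1 = rng.normal(size=n)
f2 = 0.4 * f1 + rng.normal(scale=0.9, size=n)
items = {}
for i, loading in enumerate([0.8, 0.7, 0.6, 0.7], start=1):
    items[f"item{i}"] = loading * f1 + rng.normal(scale=0.6, size=n)
for i, loading in enumerate([0.8, 0.7, 0.6, 0.7], start=5):
    items[f"item{i}"] = loading * f2 + rng.normal(scale=0.6, size=n)
df = pd.DataFrame(items)

# Two-factor independent cluster model in lavaan-style syntax.
model_desc = """
Support =~ item1 + item2 + item3 + item4
Efficacy =~ item5 + item6 + item7 + item8
"""

model = semopy.Model(model_desc)
model.fit(df)
print(model.inspect())               # loadings and factor covariance
print(semopy.calc_stats(model).T)    # fit indices such as CFI, TLI, RMSEA
```
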
Activity: Score scale items
Purpose: To create scale scores for substantive analysis, including the reliability and validity of the scale
How: 7.4 Calculate scale scores using an unweighted approach, which includes summing standardized item scores or raw item scores, or computing the mean of raw item scores. 7.5 Calculate scale scores using a weighted approach, which includes creating factor scores via confirmatory factor analysis or structural equation models.
References: (115)
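
The unweighted scoring options in Step 7.4 are one-liners in pandas (sum of raw scores, mean of raw scores, or sum of standardized scores); the weighted option in Step 7.5 would instead use factor scores from a fitted CFA/SEM. The data and column names below are placeholders.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(9)
# Hypothetical retained items on a 1-5 response scale.
items = pd.DataFrame(rng.integers(1, 6, size=(200, 6)),
                     columns=[f"item{i}" for i in range(1, 7)])

# Step 7.4: unweighted scale scores.
scores = pd.DataFrame({
    "sum_raw": items.sum(axis=1),                                    # sum of raw item scores
    "mean_raw": items.mean(axis=1),                                  # mean of raw item scores
    "sum_std": ((items - items.mean()) / items.std()).sum(axis=1),   # sum of z-scored items
})
print(scores.describe().round(2))
```
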
Step 8: Tests of Reliability: Establishing if Responses Are Consistent When Repeated
Activity: Calculate reliability statistics
Purpose: To assess the internal consistency of the scale, i.e., the degree to which the set of items in the scale co-vary, relative to their sum score
How: 8.1 Estimate using Cronbach's alpha. 8.2 Other statistics, such as Raykov's rho, ordinal alpha, and Revelle's beta, can also be used to assess scale reliability.
References: (116–123)
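
Step 8.1's coefficient alpha (117) can be computed from the item variances and the variance of the total score: alpha = k/(k-1) * (1 - sum of item variances / variance of the total). A short sketch on simulated Likert-type items follows.

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items):
    """Coefficient alpha: k/(k-1) * (1 - sum(item variances) / variance(total))."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(4)
# Hypothetical correlated Likert-type items sharing a latent component.
latent = rng.normal(size=(250, 1))
items = pd.DataFrame(
    np.clip(np.round(3 + latent + rng.normal(scale=1.0, size=(250, 5))), 1, 5),
    columns=[f"item{i}" for i in range(1, 6)],
)
print(f"Cronbach's alpha = {cronbach_alpha(items):.2f}")
```
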
Activity: Test–retest reliability
Purpose: To assess the degree to which the participant's performance is repeatable, i.e., how consistent their scores are across time
How: 8.3 Estimate the strength of the relationship between scale items over two or three time points; a variety of measures is possible.
References: (1, 2, 124, 125)
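
For Step 8.3, one common test-retest statistic is the intraclass correlation coefficient from a two-way ANOVA decomposition (124, 125). The sketch below computes ICC(2,1) for simulated scores at two administrations; other ICC forms may be more appropriate depending on the design.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 120
true_score = rng.normal(50, 10, size=n)
# Hypothetical scale scores at two administrations several months apart.
scores = np.column_stack([
    true_score + rng.normal(scale=5, size=n),
    true_score + rng.normal(scale=5, size=n),
])

n_subj, k = scores.shape
grand = scores.mean()
subj_means = scores.mean(axis=1)
time_means = scores.mean(axis=0)

# Two-way ANOVA sums of squares and mean squares.
ss_subj = k * ((subj_means - grand) ** 2).sum()
ss_time = n_subj * ((time_means - grand) ** 2).sum()
ss_total = ((scores - grand) ** 2).sum()
ss_error = ss_total - ss_subj - ss_time

ms_subj = ss_subj / (n_subj - 1)
ms_time = ss_time / (k - 1)
ms_error = ss_error / ((n_subj - 1) * (k - 1))

# ICC(2,1): two-way random effects, absolute agreement, single measurement.
icc_2_1 = (ms_subj - ms_error) / (
    ms_subj + (k - 1) * ms_error + k * (ms_time - ms_error) / n_subj
)
print(f"Test-retest ICC(2,1) = {icc_2_1:.2f}")
```
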
Step 9: Tests of Validity: Ensuring You Measure the Latent Dimension You Intended
Criterion validity
Activity: Predictive validity
Purpose: To determine if scores predict future outcomes
How: 9.1 Use bivariate and multivariable regression; stronger and significant associations or causal effects suggest greater predictive validity.
References: (1, 2, 31)

Activity: Concurrent validity
Purpose: To determine the extent to which scale scores relate to criterion measurements made at or near the time of administration
How: 9.2 Estimate the association between the scale scores and a "gold standard" measurement; a stronger, significant Pearson product-moment correlation suggests support for concurrent validity.
References: (2)
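
Steps 9.1 and 9.2 reduce to a regression of a later outcome on the scale score and a correlation with a concurrent criterion. A statsmodels/scipy sketch on simulated data; the outcome and criterion variables are hypothetical.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import pearsonr

rng = np.random.default_rng(6)
n = 200
scale_score = rng.normal(size=n)                                    # newly developed scale
future_outcome = 0.5 * scale_score + rng.normal(size=n)             # hypothetical later outcome
gold_standard = 0.7 * scale_score + rng.normal(scale=0.7, size=n)   # hypothetical concurrent criterion

# Step 9.1 (predictive validity): regress the future outcome on the scale score.
ols = sm.OLS(future_outcome, sm.add_constant(scale_score)).fit()
print(ols.params, ols.pvalues)

# Step 9.2 (concurrent validity): Pearson correlation with the "gold standard".
r, p = pearsonr(scale_score, gold_standard)
print(f"Concurrent r = {r:.2f} (p = {p:.3g})")
```
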
Construct validity
Activity: Convergent validity
Purpose: To examine if the same concept measured in different ways yields similar results
How: 9.3 Estimate the relationship between scale scores and similar constructs using the multi-trait multi-method matrix, latent variable modeling, or the Pearson product-moment coefficient; higher/stronger correlation coefficients suggest support for convergent validity.
References: (2, 37, 126)

Activity: Discriminant validity
Purpose: To examine if the concept measured is different from some other concept
How: 9.4 Estimate the relationship between scale scores and distinct constructs using the multi-trait multi-method matrix, latent variable modeling, or the Pearson product-moment coefficient; lower/weaker correlation coefficients suggest support for discriminant validity.
References: (2, 37, 126)

Activity: Differentiation by "known groups"
Purpose: To examine if the concept measured behaves as expected in relation to "known groups"
How: 9.5 Select known grouping variables based on theoretical and empirical knowledge and examine the distribution of the scale scores across the known groups; use t-tests for binary groups and ANOVA for multiple groups.
References: (2, 126)
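
Step 9.5's known-groups comparison is a t-test when the grouping variable is binary and a one-way ANOVA when there are more groups. A scipy sketch on simulated group scores; the groups and their expected ordering are invented for illustration.

```python
import numpy as np
from scipy.stats import ttest_ind, f_oneway

rng = np.random.default_rng(7)
# Hypothetical scale scores for groups expected, a priori, to differ.
group_a = rng.normal(60, 8, size=80)
group_b = rng.normal(54, 8, size=90)

# Binary known group: independent-samples t-test (Welch's variant shown).
t, p = ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {t:.2f}, p = {p:.3g}")

# More than two known groups: one-way ANOVA.
group_c = rng.normal(57, 8, size=70)
f, p_anova = f_oneway(group_a, group_b, group_c)
print(f"F = {f:.2f}, p = {p_anova:.3g}")
```
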
Activity: Correlation analysis
Purpose: To determine the relationship between existing measures or variables and newly developed scale scores
How: 9.6 Correlate scale scores and existing measures or, preferably, use linear regression, the intraclass correlation coefficient, and analysis of the standard deviations of the differences between scores.
References: (2, 127, 128)