Ethnic Differences in Students’ Attitudes to the Arts: Providing Validity Evidence to Make Comparisons

Previous research suggests that non-cognitive factors play an important role in promoting success at school and beyond, aligning with the multifaceted goals of education. Enhancing students’ attitudes to learning in school is expected to have positive impacts on various schooling outcomes. To date, very few studies have focused on measuring and understanding students’ attitude to the arts. This study aims to address a gap in current research in this area by introducing instruments designed to measure attitude to dance, drama, music and visual arts. Confirmatory factor analysis and measurement invariance analyses are employed to examine the factorial validity and measurement equivalence of the scales of attitude to the arts disciplines for different ethnic groups in New Zealand. Findings support the utility of the scales as valid measures of attitude to dance, drama, music and visual arts. Noticeable differences are reported among New Zealand European, Maori, Pasifika and Asian students regarding their attitudes to dance, drama, music and visual arts.


Introduction
Mounting evidence from research suggests that non-cognitive factors play an important role in promoting success at school and beyond, aligning with the multifaceted goals of education (Bertling et al., 2016). Attitude to a learning area is one such factor (Lipnevich et al., 2013) that has received growing recognition by educators and researchers in its ability to influence life outcomes and predict students' academic success (Popham, 2005).
Enhancing students' attitudes to learning in school is expected to have positive impacts on student achievement. However, investigations of relationships between attitudes and achievement present complex and sometimes conflicting patterns. The relationships are mostly positive and statistically significant, but weak. The magnitude of the relationship may vary across specific groups (e.g., by gender or ethnicity), across learning areas or cultural contexts, and with how attitudes are measured, including the psychometric quality and validity of the measures for different groups (Aiken, 1970, 1976; Hansford & Hattie, 1982; Logan & Johnston, 2009; Ma & Kishor, 1997; Silverman & Subramaniam, 2000; Steinkamp & Maehr, 1983; Weinburg, 1995). Berry et al. (2002) argue that cultural factors may have strong impacts on affective measures, such as attitudes, threatening the generalizability of psychological constructs across groups.
The National Monitoring Study of Student Achievement (NMSSA) is designed to assess and understand student achievement across the New Zealand Curriculum (NZC). As well as gathering student achievement data, the study collects contextual information about attitudes, learning experiences and learning environments, from students, teachers and principals. As indicated in their reports, NMSSA is designed to recognize and take account of the identity, language and culture of a multi-cultural population.
Although student success is generally measured by narrow standardized tests and the curricula are focused on the disciplines that are measured through those tests in most countries, research suggests positive effects of artistic engagement in cognitive development, and social and emotional well-being (Caldwell & Vaughan, 2011). For example, Werner (2001) showed that using dance to teach math resulted in a significant difference in students' math scores and motivation to learn. Skinner and Pitzer (2012) state that arts-integrated lessons provide high levels of academic engagement. Furthermore, artistic behaviors are considered tools for communicating and expressing ideas and feelings, and for creating and maintaining social bonds among young students (Blatt-Gross, 2013), thus helping them to improve team-work skills and become active members of society.
Studies of students' attitudes towards the arts are scarce, and the approaches they take to measuring those attitudes vary. For example, Smithrim and Upitis (2005) used two item stems (i.e., "I like…", and "I would like to do more…") for music, drama and dance; these comprised one factor. The National Assessment of Educational Progress (NAEP) uses various numbers of items to survey eighth grade students' attitudes towards music and visual arts; however, results are examined in terms of individual items rather than scales. Xu et al. (2018) used partial least squares structural equation modelling to predict student achievement. The formative measurement models used in their study showed that three of the seven items about students' attitudes and behaviors (i.e., personal preferences) were not significant predictors of achievement. Pavlou and Kambouri (2007) measured elementary school students' attitudes to art education through a 34-item scale with four dimensions: enjoyment, confidence, usefulness and support needed. They found high intercorrelations (0.81 to 0.90) among enjoyment, confidence and usefulness, and the comparative fit index (CFI) and goodness of fit index (GFI) were below the threshold of 0.90, which indicates a possible lack of fit. Given the limited literature available, NMSSA sought to develop items to create new scales to measure students' attitudes to the arts as part of its assessment program.
The use of the same measurement instrument does not ensure comparability of the results across groups, as the items or statements may have different meanings for students from different ethnic, language or cultural subgroups. Therefore, it is important that NMSSA provides evidence that the measurement of a non-cognitive construct like attitude is equally valid for all the subgroups that will be compared. NMSSA uses item response theory (IRT) to investigate the quality and validity of cognitive and non-cognitive scales, including analyses of differential item functioning (DIF), a form of item-level measurement invariance (MI), to examine whether the items are equivalent across subgroups. However, it is also important to examine MI at the scale level to see whether the accumulation of item-level DIF yields non-invariance at the scale level. Measurement non-invariance at the scale level would suggest that the meaning of the construct being measured may not be the same for the subgroups that are compared.
Thus, the goal of this study was to investigate the extent to which NMSSA scales of attitude to the arts were valid measurements for use across different ethnic groups.

Importance of Testing Measurement Invariance
To make meaningful and valid inferences, measurement instruments need to be interpreted in a similar way and responded to using the same reference framework across subgroups. Therefore, establishing MI across subgroups is regarded as a prerequisite before making comparisons between those subgroups (Bauer, 2017; Ercikan & Lyons-Thomas, 2013). Quite recently, the International Test Commission (ITC) published guidelines for the large-scale assessment of linguistically diverse populations (ITC, 2017). The commission underlined the necessity and importance of testing MI across subgroups by introducing Guideline 2.12, which requires evidence to: Evaluate the invariance of the internal factor structure of the assessment across the L1 (refers to the home/first language(s) of a test taker) and L2 (refers to the second language or foreign language of a test taker) populations. (p. 14) Similarly, the second edition of the ITC guidelines for translating and adapting tests (ITC, 2016) advises test developers to "only compare scores across populations when the level of invariance has been established on the scale on which scores are reported" (Guideline: SSI-2 (16), p. 17). MI across subgroups should not simply be assumed or taken for granted but should be conceptually and statistically supported before making comparisons between subgroup means (Asil & Brown, 2016; Van de Vijver & He, 2016).
MI is defined as "the equivalence of measured constructs in two or more independent groups to assure that the same constructs are being assessed in each group" (Chen et al., 2005, p. 472). In other words, it evaluates whether there is any differential scale functioning for different groups. Do all groups interpret and understand the construct in the same way?
MI is generally tested with a multi-group confirmatory factor analytic approach (MG-CFA) with nested model comparisons. MI is accepted if the differences in fit between nested models are small. There are four levels of invariance that are sequentially tested by putting additional constraints on the item parameters: configural invariance, weak (metric) invariance, strong (scalar) invariance, and strict (residual) invariance (Meredith, 1993).
Configural invariance is the least stringent and requires similar item-factor patterns in each group (Horn & McArdle, 1992) while allowing factor loadings to differ. Metric invariance implies that the unit of measurement (factor loadings) is the same across groups. Even when metric invariance is achieved, comparing latent factor means is still not possible, as the origin of the scales may vary (Chen et al., 2005). Scalar invariance refers to the equivalence of intercepts (scale origins) across groups. Achieving scalar invariance indicates that differences in latent factor means can be attributed to group characteristics rather than to measurement artefacts (Brown et al., 2017). There seems to be a consensus in the literature that scalar invariance is sufficient to compare latent means meaningfully across subgroups (Schmitt & Kuljanin, 2008).
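The nested constraints described above can be written compactly. The notation below is not taken from the NMSSA documentation; it follows a common textbook convention and is offered only as a sketch. For group g, x is the vector of item responses, τ the item intercepts, Λ the factor loadings, ξ the latent attitude factor with mean κ, and δ the residuals with covariance matrix Θ.

```latex
\begin{align*}
\text{Measurement model (group } g\text{):}\quad
  & x^{(g)} = \tau^{(g)} + \Lambda^{(g)} \xi^{(g)} + \delta^{(g)} \\
\text{Configural:}\quad
  & \text{same pattern of free and fixed loadings in } \Lambda^{(g)} \text{ for all } g \\
\text{Metric:}\quad
  & \Lambda^{(1)} = \Lambda^{(2)} = \dots = \Lambda^{(G)} = \Lambda \\
\text{Scalar:}\quad
  & \text{metric, plus } \tau^{(1)} = \tau^{(2)} = \dots = \tau^{(G)} = \tau \\
\text{Strict:}\quad
  & \text{scalar, plus } \Theta^{(1)} = \Theta^{(2)} = \dots = \Theta^{(G)} \\[4pt]
\text{Under scalar invariance:}\quad
  & E\big[x^{(g)}\big] = \tau + \Lambda \kappa^{(g)}
\end{align*}
```

The last line shows why scalar invariance is the usual requirement for comparing means: once τ and Λ are shared, any difference between groups in expected item responses can come only from the latent means κ.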
To establish the invariance of the construct of attitudes to the arts across different ethnic groups, this study applied confirmatory factor analysis (CFA). This procedure has been adopted by large-scale international surveys, such as the Programme for International Student Assessment (PISA), the Teaching and Learning International Survey (TALIS), and the Programme for the International Assessment of Adult Competencies (PIAAC) (Van de Vijver et al., 2019).

National Monitoring Study of Student Achievement
NMSSA assesses students' achievement and attitudes across all learning areas of the NZC in a five-year cycle (NMSSA, 2017a). NMSSA is a collaboration between the Educational Assessment Research Unit (EARU) of the University of Otago, and the New Zealand Council for Educational Research (NZCER).
The purpose of NMSSA is to get a broad picture of student achievement in New Zealand (NZ) state and state-integrated primary schools and to collect contextual information to understand factors related to student achievement. This includes, but is not limited to, attitude, engagement and opportunities to learn in the learning areas of the curriculum. NMSSA collects data from a linguistically, culturally and socio-economically heterogeneous NZ sample, and seeks to investigate which contextual factors may influence achievement.
While NZ participates in international assessment programs, NMSSA is the only data source for the NZ Ministry of Education that covers all learning areas of the NZC. This unique feature of NMSSA provides an opportunity for the project to produce system-level information that informs national policy and practice. Consequently, NMSSA results are highly valued by educators and used to improve student learning and accelerate progress.
Given their importance, many large-scale national and international studies (e.g., PISA, TIMSS, TALIS and PIRLS) collect information from students about their non-cognitive skills in their efforts to explain achievement differences and trends. However, most of these large-scale studies and researchers have focused mainly on the learning areas of reading (Yurdakal, 2019), mathematics (Aktas & Tabak, 2018) and science (Kayacan & Sonmez, 2019) rather than other learning areas, such as the arts.
There is a vast body of research (for example, Iwai, 2002; Kairavuori & Sintonen, 2012) that shows how arts education improves students' aesthetic, socio-emotional, socio-cultural, and cognitive development and their academic achievement. Thus, many countries have implemented arts education in their curricula. As de Eça et al. (2017) state, "art education is preparing students with not only knowledge about art and the contemporary world they live in but also to think critically, use problem-solving skills and develop skills for living together in harmony" (p. 104). These are all considered important 21st-century citizenship skills. Therefore, it is important for today's schools to offer equal opportunities and experiences in arts education, which will encourage students to develop positive attitudes. Attitudes are formed and shaped largely by experiences, and these attitudes, in turn, influence motivation for behavior (Morris & Stuckhardt, 1977). Students come to school from various cultural and socioeconomic backgrounds with different childhood experiences, which influence their attitudes to certain arts disciplines. Moreover, population demographics in NZ are changing dramatically. In light of these factors, it was important in this study to seek information about students' attitudes to the arts within the multicultural NZ context. Encouraging positive attitudes and improving learning outcomes for all students are not challenges unique to NZ; they are global. Therefore, the findings of this study are expected to be of interest to educators and researchers in multicultural societies.

Research Goal
The goal of this study was twofold: (i) to use CFA to examine the factorial validity of the attitude to arts scales in dance, drama, music and visual arts for different ethnic groups in NZ (NZ European, Maori, Pasifika, and Asian), and (ii) if MI is established (i.e., the construct is similar for all ethnic groups), to investigate any differences in attitudes between NZ European, Maori, Pasifika, and Asian students.

The research questions were:
 To what extent is there evidence to support the factorial validity of the attitude to arts scales (dance, drama, music and visual arts) for use with different ethnic groups in the NZ context?
 If appropriate, are there any differences in attitudes to dance, drama, music and visual arts by ethnicity?

Methodology
Sample
NMSSA reports on achievement in different learning areas for year 4 and year 8 student populations in NZ. Nationally representative samples are drawn so that all year 4 and year 8 students in schools (with at least eight students at the respective year level) have an approximately equal chance of being selected. To achieve this, NMSSA carries out a two-step sampling procedure. The first step involves stratifying all eligible schools that have year 4 students and all eligible schools that have year 8 students using region, decile and school size as stratification variables. A stratified random sample of 100 schools at year 4 and 100 schools at year 8 is selected. Within each school selected, a random sample of up to 25 students from the year 4 or year 8 cohort is selected.
Approximately 5000 students are sampled each year from 200 schools. More detailed information about the NMSSA sampling algorithm and the characteristics of the 2015 sample can be found in the 2015 NMSSA Technical Information report (NMSSA, 2017f). The 2015 sample was deemed to be nationally representative across gender, ethnicity, school decile (socio-economic status), type of school and region.
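The two-step procedure can be illustrated with a short sketch. This is not the NMSSA sampling algorithm, which is documented in the Technical Information report; the proportional allocation of schools to strata, the function names, and the dictionary keys ('id', 'stratum', 'students') are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def allocate(stratum_sizes, n_schools):
    """Split n_schools across strata in proportion to stratum size.

    Proportional allocation with largest-remainder rounding is an
    assumption here, not a documented NMSSA rule.
    """
    total = sum(stratum_sizes.values())
    quotas = {k: n_schools * size / total for k, size in stratum_sizes.items()}
    alloc = {k: int(q) for k, q in quotas.items()}
    leftover = n_schools - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


def two_step_sample(schools, n_schools=100, max_students=25, seed=0):
    """schools: list of dicts with hypothetical keys 'id', 'stratum', 'students'.

    Assumes n_schools does not exceed the number of eligible schools.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for school in schools:
        strata[school["stratum"]].append(school)
    alloc = allocate({k: len(v) for k, v in strata.items()}, n_schools)

    sample = {}
    for stratum, members in strata.items():
        # Step 1: random sample of schools within each stratum.
        for school in rng.sample(members, alloc[stratum]):
            # Step 2: random sample of up to max_students within the school.
            n = min(max_students, len(school["students"]))
            sample[school["id"]] = rng.sample(school["students"], n)
    return sample
```

In this sketch a stratum would be one combination of region, decile band and school-size band, and the function would be run once for the year 4 school list and once for the year 8 list.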
The assessment program was developed by the NMSSA team and used a variety of approaches to collect the data (see NMSSA, 2017a, for details of the assessment framework for the study). Twelve experienced teachers were employed and specifically trained in 2015 to conduct the program of assessments and other data collection over two and a half days within each school. Data about students' attitudes to dance, drama, music and visual arts were collected via questionnaires presented on laptops.
Because of the specific needs of the program, NMSSA used a matrix booklet design in 2015 for the student questionnaire, which meant that students in a random half of the schools responded to questions related to drama and music, and students in the other half of the schools responded to questions related to dance and visual arts. The number of students, by ethnicity, who responded to each arts discipline and were included in this study is given in Table 1. For the study reported in this paper, the year 4 and year 8 students were combined, as they were asked the same questions regarding their attitudes to the arts disciplines. The total number of students responding to attitude items in dance and visual arts was 2084 (47.6% female), and in drama and music 2213 (49.8% female).

Measures
Attitude Statements and Scales
The NZC categorizes 'the arts' as four distinct but related disciplines: dance, drama, music and visual arts. The attitude statements and scales were developed by the NMSSA team in 2014-2015. The statement generation process began by reviewing the relevant literature and other large-scale studies that discussed indicators of attitudes to arts disciplines. The NMSSA team worked with teachers, principals, members of the reference groups (Technical, Maori, Pasifika and Special education), members of the curriculum advisory panel in the arts, Ministry of Education officials and academic researchers to discuss and re-shape the statements within each discipline to ensure that they validly represented the attitude construct. The questionnaires were extensively peer-reviewed and piloted to ensure that the crafted statements would be understood by students. Following this, the revised versions were trialed with a total of 941 year 4 and year 8 students. At the end of this process, five items per discipline were retained for the final versions to be used in the 2015 study.
Each scale comprised five statements (e.g. "I like doing dance at school"). Students were asked to indicate how much they agreed with each statement on a 4-point Likert scale ("do not agree at all", "agree a little", "mostly agree", "totally agree"). Each scale was a measure of the extent to which students enjoyed and felt confident about their learning in the respective arts.

Data Analysis
Data analysis was conducted in three stages using Mplus 7.1 software (Muthen & Muthen, 2015) to answer each research question. Firstly, the factorial structure of each attitude scale was examined to determine whether the scales were unidimensional and represented the construct of attitude to each discipline of the arts. To do so, we used CFA with maximum likelihood estimation with robust standard errors (MLR), an estimator that is robust to non-normality (Sass et al., 2014). Goodness-of-fit was evaluated using multiple criteria (Cheung & Rensvold, 2002; Fan & Sivo, 2005, 2007; Hu & Bentler, 1999; Vandenberg & Lance, 2000; Wu et al., 2007). Because of the sensitivity of the χ² statistic to sample size (Brown, 2006; Kline, 2005), acceptable and good model fit was determined by these measures: the root mean square error of approximation (RMSEA) and the standardized root mean square residual (SRMR) with values < 0.08 (acceptable) or < 0.05 (good); and the comparative fit index (CFI) and the Tucker-Lewis index (TLI) with values > 0.90 (acceptable) or > 0.95 (good).
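The cut-offs above can be expressed as a small helper that labels each index separately. The sketch below is illustrative only: the function name is hypothetical, and the index values in the usage note are made up rather than taken from the tables in this study.

```python
def classify_fit(rmsea, srmr, cfi, tli):
    """Label each fit index using the cut-offs stated in the text."""

    def residual_based(value):
        # RMSEA and SRMR: smaller is better.
        if value < 0.05:
            return "good"
        if value < 0.08:
            return "acceptable"
        return "poor"

    def incremental(value):
        # CFI and TLI: larger is better.
        if value > 0.95:
            return "good"
        if value > 0.90:
            return "acceptable"
        return "poor"

    return {
        "RMSEA": residual_based(rmsea),
        "SRMR": residual_based(srmr),
        "CFI": incremental(cfi),
        "TLI": incremental(tli),
    }
```

For example, `classify_fit(0.07, 0.02, 0.97, 0.93)` labels RMSEA and TLI as acceptable and SRMR and CFI as good. Reporting the indices separately, rather than collapsing them into one verdict, mirrors the use of multiple criteria described above.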
Secondly, to see if the same construct was being measured in the same way for different ethnic groups, the MI for each attitude scale was examined by ethnicity using MG-CFA with means and covariance structure (MACS) (Sorbom, 1974).
To determine if valid and meaningful comparisons between the ethnic groups could be made, we tested three progressively stringent levels of MI: configural, metric and scalar. We applied the decision rule of ΔCFI ≤ .01 (Cheung & Rensvold, 2002; Vandenberg & Lance, 2000; Wu et al., 2007) because of the over-sensitivity of the chi-square difference test to large sample sizes.
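The ΔCFI decision rule lends itself to a short sketch. The function below is hypothetical and assumes that the configural model has already been judged to fit adequately; it takes the CFI of each nested model and returns the highest level of invariance supported. Differences are rounded to three decimals so that a drop of exactly .01 is not rejected because of floating-point error.

```python
def invariance_level(cfi_configural, cfi_metric, cfi_scalar, threshold=0.01):
    """Return the highest invariance level supported by the delta-CFI rule.

    A more constrained model is retained when its CFI is no more than
    `threshold` below the CFI of the preceding, less constrained model.
    """
    if round(cfi_configural - cfi_metric, 3) > threshold:
        return "configural"
    if round(cfi_metric - cfi_scalar, 3) > threshold:
        return "metric"
    return "scalar"
```

With made-up values, `invariance_level(0.985, 0.982, 0.978)` supports scalar invariance, whereas `invariance_level(0.985, 0.980, 0.960)` stops at metric invariance, in which case latent means would not be compared.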
Thirdly, assuming scalar invariance was established, we compared the latent means of the ethnic groups to investigate whether the groups had similar attitudes to dance, drama, music and visual arts.

Results
The results are presented in three sections. The first section reports the descriptive statistics for the attitude scales. The second section displays the results of the CFA analyses conducted to determine the unidimensionality of each scale and to determine if it was the same for each ethnic subgroup. The third section presents the comparisons of attitudes of NZ European, Maori, Pasifika, and Asian students across the arts disciplines. Table 2 presents the descriptive statistics for the five attitude statements, and Cronbach's alpha values for each scale of attitude to dance, drama, music and visual arts. Overall, students had positive attitudes to the arts disciplines, particularly visual arts. On average, they reported the least positive attitudes to dance. The skewness and kurtosis values for the statements were within the acceptable range, indicating no violation of univariate normality. Cronbach's alpha internal consistency estimates for the respective scales of attitude to the arts were above the recommended level of 0.70.
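For readers unfamiliar with the statistic, Cronbach's alpha for a k-item scale is k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The sketch below computes it from raw responses using only the standard library; the function name and the data layout (one row per student, one column per statement) are assumptions, and this is not the software used in the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows, each row holding k item scores.

    Uses sample variances (n - 1 denominator) for items and total scores.
    """
    k = len(responses[0])
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
```

Each row would hold one student's responses to the five statements of a scale, coded 1 to 4. When every student gives the same response to all five statements, the items are perfectly consistent and the function returns 1.0.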

Confirmatory Factor Analysis
Separate unidimensional CFAs for each attitude scale were performed using the MLR estimation method. The model-data fit statistics are summarized in Table 3. As can be seen from Table 3, unidimensional models of each attitude scale provided acceptable (e.g., RMSEA < .08) to good (e.g., RMSEA < .05) fit, indicating that the one-factor model of these measures was supported. Therefore, we concluded that CFA results were in line with the IRT estimates provided in the NMSSA report (NMSSA, 2017a).
The factor loadings of the attitude to dance, drama, music and visual arts scales are presented in Table 4. All standardized factor loadings ranged from .60 to .93, and were statistically significant, indicating that these statements were good indicators of each arts discipline-specific construct. The CFA results provided support for the factorial validity of each scale.

Measurement Invariance Analyses
After establishing good model fit, we used MG-CFA to test the MI of each scale. The results of the MI analyses by ethnicity for each attitude scale are summarized in Table 5. The configural models fitted the data reasonably well (range of RMSEA: .05-.08, range of CFI/TLI: .97-.99, and range of SRMR: .01-.02) for all ethnic groups, indicating that students from different ethnic subgroups used the same frame of reference to answer the attitude items. Metric and scalar invariance results supported making meaningful comparisons of the latent mean scores between ethnic subgroups. Overall, the analyses supported the MI of the unidimensional attitude scales for each ethnic group in the sample. That is, NZ European, Maori, Pasifika, and Asian students all interpreted the attitude statements in the same way.
Table 6 presents the comparisons of the latent mean scores of attitudes to dance, drama, music and visual arts for different ethnic groups; specifically, it shows whether Maori, Pasifika, and Asian students differed from NZ European students (who were treated as the reference group). Effect sizes (ES) were used to judge the meaningfulness of the differences. Cohen's (1988) d estimates of ES are judged to be small when d = 0.20; medium when d = 0.50; and large when d = 0.80.

Comparison of Attitudes to the Arts by Ethnicity
Pasifika students had higher mean attitude scale scores than NZ European students across all arts disciplines. However, the ES for music was medium and for dance, drama and visual arts, it was small. Maori students had higher mean attitude scale scores than NZ European students for dance, music and visual arts, and a lower mean scale score for drama. The ES for music was small and negligible for the others. Asian students had higher mean scale scores for drama, music and visual arts, although the mean differences were negligible or small.
A small effect size is not necessarily trivial. According to some scholars in education (Slavin, 1990), effect sizes above .25 are considered large enough to be educationally meaningful.
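A brief sketch shows how a standardized mean difference can be computed and labeled against Cohen's benchmarks. It uses the pooled standard deviation form of d for two independent groups, which may differ from how the latent mean effect sizes in this study were obtained. Treating the benchmarks as bin boundaries (for example, anything below 0.20 as negligible) is also an assumption, and the function names are hypothetical.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference (group a minus group b), pooled SD."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


def label_effect(d):
    """Label |d| using Cohen's (1988) benchmarks as lower bounds of each bin."""
    size = abs(d)
    if size < 0.20:
        return "negligible"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "medium"
    return "large"
```

Under Slavin's (1990) criterion, a d of 0.30 would be labeled small by `label_effect` yet still count as educationally meaningful, which is why the label alone should not be read as a verdict on importance.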

Discussion
MI is an important prerequisite for making meaningful and valid comparisons between subgroups on cognitive and non-cognitive measures. Comparisons may be made on the basis of numerous variables, such as gender, ethnicity, socio-economic status and region. Given the increasing cultural diversity of NZ and interest in cross-cultural research, a variable of particular interest was ethnicity. This study examined the MI of scales that measure students' attitudes to four arts disciplines across different ethnic groups in NZ and investigated the differences in attitudes between those subgroups. This study provided validity and reliability evidence for measuring students' attitudes to dance, drama, music, and visual arts, whereas previous research either had a narrow focus on the areas of the arts, such as drama and visual arts (Smithrim & Upitis, 2005), considered art as one subject, art making (Pavlou & Kambouri, 2007), or had not investigated the arts at all. The present study adds to the research literature on attitudes to the arts by providing empirical evidence supporting the MI of the unidimensional attitude to dance, drama, music and visual arts scales across four ethnic subgroups in NZ primary schools.
The findings of the study supported the scales of attitude to dance, drama, music and visual arts as unidimensional scales that were equally valid for each ethnic group. These findings corresponded with the IRT (Rasch) approach used by NMSSA to construct cognitive and non-cognitive scales (NMSSA, 2017a) and provided support for an endorsement of the procedures used by NMSSA for constructing scales in the learning areas of the NZC.
The comparisons of attitudes to dance, drama, music and visual arts by ethnic subgroup yielded findings similar to those reported in the NMSSA reports (NMSSA, 2017b, 2017c, 2017d). Mean scores on the attitude scales indicated generally positive attitudes towards the arts disciplines. However, there were attitudinal differences with respect to ethnicity. Similar to the other learning areas assessed by NMSSA, Pasifika students tended to have higher mean attitude scores than the other ethnic groups in the arts disciplines, especially for dance and music. Mean attitude scale score differences between Asian and NZ European students were generally negligible, while the mean attitude to music scale score for Maori students was higher than that of their NZ European peers.
The psychometric investigation in this study, and the similarity of its findings to the published NMSSA findings, give us confidence that reported comparisons between subgroups on cognitive and non-cognitive measures reflect actual differences and are not influenced by extraneous psychometric variables.
Performing arts, especially music and dance, are important elements of Pasifika and Maori cultures. Thompson et al. (2009) and Whitinui (2008) have argued that Pasifika and Maori students engage and learn more through the performing arts because these enable them to make stronger links with their own cultural identity and heritage. These arguments may partly explain our findings about Pasifika and Maori students' more positive attitudes to these disciplines and provide further construct validity evidence.
According to the NZC, the arts and culture are inextricably interconnected. The importance of culturally inclusive programs in the arts is highlighted in the NZC document: "The Arts in the New Zealand Curriculum places emphasis on all students having opportunities to learn about the indigenous heritage of Maori and the diverse traditions of the European, Pacific, and other cultures that make up our nation" (Ministry of Education, 2007, p. 20).

Conclusion
This study has demonstrated that the IRT procedures adopted by NMSSA to construct cognitive and non-cognitive scales meet the psychometric expectations required to make comparisons between subgroups in large-scale assessment programs. Having established that the scales of attitudes to dance, drama, music and visual arts were unidimensional, we used multiple criteria to evaluate and establish the measurement invariance of the scales. The scale-level MI results reported in this study were similar to the item-level MI findings reported by NMSSA. Therefore, the readers and users of NMSSA findings can be reassured of the soundness of the psychometric procedures used and of the findings reported.
Scales measuring student attitudes to the arts have seldom been constructed. Understanding attitudes to different learning areas and possible ethnic differences on those constructs may help educators and policy makers reflect on their practices and provide strategies and intervention programs aimed at encouraging positive attitudes, and thereby improving engagement and schooling outcomes for all students.

Suggestions
Further studies are needed to understand the relationship between achievement in school learning areas and attitude to that learning area. NMSSA has sought to report this for all learning areas in the NZC, but other measures of attitudes also merit investigation. The literature suggests that engagement in the arts may help increase achievement as well as critical thinking and problem-solving skills. Studies examining the relationship between attitudes to the arts and achievement in both the learning area of arts and other curriculum areas are encouraged. The roles of external variables, such as socio-economic background, gender, teacher and school characteristics on students' attitudes to the arts could also be investigated through further research. MI studies of these attitude scales over time also merit investigation given the rapid technological advances in the practice of the arts.

Limitations
This study is not immune to limitations. The data used in this research came from self-report instruments. New studies are encouraged to evaluate the criterion validity of the scales using various measures. For example, the relationship between student-reported attitudes and in-class observations during art classes or the achievement scores awarded by art teachers could be investigated.