The Indonesian Version of the Physics Metacognition Inventory: Confirmatory Factor Analysis and Rasch Model

A metacognition inventory supports the awareness and self-control that improve students' academic success, including in physics. However, instruments for revealing physics metacognition, especially in Indonesia, remain limited. This study aims to explore and evaluate the psychometric properties of the Physics Metacognition Inventory (PMI). This survey research involved 479 students from three high schools in Indonesia. The psychometric properties of the Indonesian version (I-PMI) were evaluated using Confirmatory Factor Analysis and the Rasch Model. The results show that the I-PMI comprises 26 items grouped into six constructs. Validity, reliability, and model fit were analyzed with good results, and the five-point rating scale functioned adequately. This research presents more comprehensive information about the Physics Metacognition Inventory in the context of Indonesian culture. The study has implications for using the I-PMI to assess students' metacognition at the high school level in Indonesia and offers recommendations for future research.


Introduction
Metacognition has become one of the central issues in educational research worldwide over the last four decades (Asy'ari et al., 2019; Kim et al., 2017; Zohar & Barzilai, 2013). The number of these studies has increased significantly since the 1970s (Harrison & Vallin, 2018). The positive influence of metacognition extends to almost all fields of science, including physics. Metacognitive awareness has significantly improved students' academic achievement (Hikmah et al., 2021; Taasoobshirazi & Farley, 2013; Uopasai et al., 2018). Students with high metacognitive awareness tend to show better learning outcomes (Panggayuh, 2017), conceptual understanding (Çetin, 2017; Taasoobshirazi et al., 2015), academic motivation (Mirzaei et al., 2012), and problem-solving abilities (Akben, 2020; Şahin & Kendir, 2013; Tachie, 2019). The next challenge is how to increase students' metacognitive awareness through effective problem-solving. This challenge is reflected in the various learning models adopted worldwide, including in Indonesia.
Metacognition is an essential component that drives student success in problem-solving (Koyunlu Ünlü & Dökme, 2019). The term metacognition was first introduced by Flavell (1976) as an aspect of problem-solving. It is a higher-order thinking process that involves control and monitoring of learning (Livingston, 2003). Flavell conceptualized metacognition as an individual's knowledge of cognitive activity or anything correlated with it (Coşkun, 2018; Cubukcu, 2009; Kim et al., 2017; Taasoobshirazi et al., 2015). In another view, metacognition is a person's activity or awareness in managing and controlling cognitive dynamics in order to learn effectively (Sukarelawan & Sriyanto, 2019; Sulaiman et al., 2021). Metacognition has since been developed in the contexts of problem-solving in collaborative groups (Graesser et al., 2018), skills and awareness (Akben, 2020), and success (Ali et al., 2018). However, there are few inventories available to assess the impact of metacognition on students' problem-solving. With this awareness, Taasoobshirazi and Farley (2013) initiated a metacognitive inventory by developing the 24-item Physics Metacognition Inventory (PMI). This inventory covers the essential constructs needed to validate the instrument. Taasoobshirazi and Farley (2013) report that Exploratory Factor Analysis (EFA) provided good evidence of construct validity and grouped the 24 items into six constituent metacognitive factors, namely: knowledge of cognition, planning, monitoring, evaluation, debugging, and information management. In 2015, they revised and further developed the PMI to 26 items. Confirmatory Factor Analysis (CFA) and Rasch analysis were used to explore the psychometric properties of the PMI. As in their first study, they involved college physics students as respondents. The PMI uses a five-point Likert-type scale. Based on the CFA results, the 26 items that make up the six factors of the PMI show a good model fit.
Rasch analysis provided further evidence of the fit of the PMI to the Rasch model, indicating that the 26 items developed contribute to the six factors underlying the PMI. The PMI is therefore a valid and reliable inventory for measuring metacognition in the physics problem-solving of college students.
The PMI cannot be directly applied to different cultures, even though it has excellent psychometric properties; a cross-cultural adaptation process is needed. Several researchers have adapted the PMI to other cultures (Haeruddin et al., 2020; Koyunlu Ünlü & Dökme, 2019). Koyunlu Ünlü and Dökme adapted the PMI into Turkish, involving 8 experts in the adaptation process and 458 students for empirical validation. The experts judged the Turkish version of the PMI to be compatible with Turkish culture, and based on the CFA analysis, the Turkish version has a good model fit and meets the criteria for validity and reliability. Haeruddin et al. (2020) then adapted the PMI into an Indonesian version, involving 3 linguists and 715 physics students. The linguists recommended using the PMI because it is consistent with Indonesian culture. EFA results on the 26 items indicated the formation of six factors, and the CFA results supported a good model fit for the Indonesian version of the PMI. In another case, Sukarelawan et al. (2021) developed a similar inventory for heat and temperature material, showing six constructs: knowledge of cognition, planning, information management, monitoring, debugging, and evaluation. That Indonesian version also had good validity and reliability. However, these studies were limited in exploring the psychometric properties of the PMI: they did not report Differential Item Functioning (DIF) or the validity of the rating scale. DIF needs to be reported to determine whether the items making up the inventory are biased towards specific attributes of the respondents, while the validity of the rating scale needs to be reported to verify that the 5-point Likert scale functions properly.
Rasch analysis needs to be used as an integrated approach to provide more comprehensive information on the psychometric properties of the inventories developed (Ning, 2018). Combining CFA and Rasch analysis offers richer and deeper insight into the psychometric properties of the student PMI in Indonesia (I-PMI). In addition to having good validity, reliability, and model fit, the Indonesian version of the PMI should be free from gender bias. This study also provides evidence that the rating scale does not confuse students, an advantage offered by Rasch analysis. Therefore, this study aims to bridge the gap in the literature through the adaptation of the PMI into Indonesian for high school students. The findings offer new and significant information on the psychometric properties of the Indonesian version of the PMI, which teachers and practitioners can use in assessing students' metacognition.

Research Design
We explored the I-PMI through pre-existing constructs (Haeruddin et al., 2020; Koyunlu Ünlü & Dökme, 2019; Taasoobshirazi et al., 2015). The resulting inventory comprises 26 metacognitive items grouped into six factors (constructs), namely: knowledge of cognition (K, 6 items), planning (P, 5 items), free-body diagrams (F, 4 items), monitoring (M, 4 items), debugging (D, 3 items), and evaluation (E, 4 items). The I-PMI uses a 5-point Likert scale from "Never suits me" (rated 1) to "Always suits me" (rated 5). We consulted three English language experts, who concluded that the I-PMI was consistent with Indonesian culture but needed minor revision.

Respondents
The questionnaire was piloted on students in public secondary schools to obtain a meaningful inventory. For CFA, the sample size should be at least five times the number of observed variables (Hair et al., 2014; Kyriazos, 2018), so the 146 respondents used for the CFA of the 26 I-PMI items meet the required sample size. For Rasch analysis, between 50 and 250 respondents are recommended to assess model goodness (Ling Lee et al., 2020); this study therefore involved 333 students for the Rasch analysis to satisfy the assumption of data stability, with outlier and misfit data also considered. All respondents came from three public high schools in Indonesia, selected using stratified random sampling. The characteristics of the respondents are shown in Table 1.
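The five-respondents-per-item rule of thumb can be verified with simple arithmetic; a minimal sketch using the respondent counts reported in this study:

```python
n_items = 26                  # observed variables in the I-PMI
min_cfa_sample = 5 * n_items  # Hair et al.'s rule: at least 5 respondents per item
n_cfa_respondents = 146       # respondents allocated to the CFA in this study

print(min_cfa_sample)                       # minimum required sample
print(n_cfa_respondents >= min_cfa_sample)  # the CFA sample satisfies the rule
```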

Data collection and ethics
Data collection was carried out in collaboration with local school teachers. We discussed with the teachers what to convey to prospective respondents, and before collecting data, the teachers were asked to explain the purpose of the research. Student participation was voluntary, and students were given the right to withdraw their responses without penalty. Confidentiality and anonymity were also guaranteed (Lee et al., 2018; Taasoobshirazi et al., 2015). We asked the teachers to inform students that their responses would not directly impact their learning outcomes.

Data Analysis
Several statistical applications were used to support the data analysis. Microsoft Excel and IBM SPSS Statistics 24 were used to tabulate the raw data, Confirmatory Factor Analysis (CFA) was performed using Lisrel 8.50, and the Rasch probability model was fitted using Winsteps 4.6.1.
The factor structure of the I-PMI was analyzed using CFA because the original version reported the formation of six factors on the metacognitive dimension. This follows the CFA paradigm, which assumes that the relationship between the observed variables and the latent variables is known (Byrne, 2013). Model parameters were estimated using the maximum likelihood method. Before conducting the overall fit test of the model, we checked the factor loading estimates through each item's t-value (accepted if > 1.96) and Standardized Loading Factor (SLF, accepted if ≥ 0.50) (Wijanto, 2008). The overall fit of the model was evaluated from several fit indices: absolute measures (such as χ²/df, RMSEA, and SRMR), incremental measures (such as TLI), and parsimony fit measures (such as PNFI). The overall fit of the model is met if χ²/df ≤ 3, RMSEA ≤ 0.08, SRMR ≤ 0.08, TLI ≥ 0.95, and PNFI > 0.60 (Hair et al., 2014).
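As an illustration, the fit-index criteria above amount to simple threshold checks. A minimal Python sketch; the index values here are hypothetical, not the study's results:

```python
# Hypothetical fit indices for illustration only (not the I-PMI results)
fit_indices = {"chi2_df": 2.1, "RMSEA": 0.06, "SRMR": 0.05, "TLI": 0.96, "PNFI": 0.72}

# Acceptance thresholds following Hair et al. (2014)
criteria = {
    "chi2_df": lambda v: v <= 3.0,
    "RMSEA":   lambda v: v <= 0.08,
    "SRMR":    lambda v: v <= 0.08,
    "TLI":     lambda v: v >= 0.95,
    "PNFI":    lambda v: v > 0.60,
}

# Each index is judged against its own threshold
results = {name: check(fit_indices[name]) for name, check in criteria.items()}
print(results)
```

Overall model fit is declared only when every index passes its threshold.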
The Rasch model makes it possible to convert ordinal questionnaire data into interval data (Setiawan et al., 2018) and can reveal respondents' behavior in relation to the measured items (Habibi et al., 2019). Instrument reliability was evaluated in terms of the Cronbach's alpha coefficient, the person reliability coefficient, and the item reliability coefficient. Person data are used to assess the statistical fit of respondents, while item data are used to assess the appropriateness of the items in the instrument (Setiawan et al., 2018). The separation indices for persons and items were evaluated to examine the functioning of the measurement instrument (Boone et al., 2014).
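Cronbach's alpha, one of the reliability coefficients mentioned above, can be computed directly from a raw response matrix. A minimal sketch with toy data (illustrative only, not the study's responses):

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """scores: one row per respondent, one column per item (raw ratings)."""
    k = len(scores[0])
    item_vars = [pvariance(col) for col in zip(*scores)]  # variance of each item
    total_var = pvariance([sum(row) for row in scores])   # variance of total scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Toy data: four respondents answering three items in a perfectly consistent pattern
toy = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
print(cronbach_alpha(toy))  # close to 1.0 for this maximally consistent pattern
```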
The fit of the measurement model was evaluated using the infit and outfit MNSQ statistics, with model fit accepted in the range 0.5 < MNSQ < 1.5 (Setiawan et al., 2018). The Cronbach's alpha coefficient was used to evaluate internal consistency. Bias in I-PMI items was detected through Differential Item Functioning (DIF): an item is biased towards the gender attribute if its probability value is below 5% (0.05) (Bond & Fox, 2015; Sumintono & Widhiarso, 2014). Validation of the rating scale was carried out to verify the rating options used in the I-PMI; a rating scale functions well if respondents can distinguish the categories provided. The rating scale was evaluated against five criteria. First, the observed counts have a unimodal distribution. Second, each rating category has a frequency of at least 10. Third, the observed averages increase monotonically. Fourth, the outfit MNSQ is below 2.0. Finally, the step calibrations increase monotonically (Papini et al., 2020; Sumintono & Widhiarso, 2014).
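The five rating-scale criteria can be expressed as mechanical checks over the category statistics reported by Rasch software. A minimal Python sketch with hypothetical category statistics (not the study's actual values):

```python
def is_unimodal(seq):
    """True if the sequence rises at most once and then only falls (single peak)."""
    falling = False
    for a, b in zip(seq, seq[1:]):
        if b < a:
            falling = True
        elif falling and b > a:
            return False
    return True

# Hypothetical statistics per category of a 5-point scale:
# (observed count, observed average, outfit MNSQ, Andrich threshold)
categories = [
    (115,  -0.62, 1.10, None),   # category 1 has no threshold
    (340,  -0.18, 0.95, -2.30),
    (910,   0.35, 0.88, -0.70),
    (1480,  0.96, 0.92,  0.85),
    (760,   1.64, 1.30,  2.15),
]

counts     = [c[0] for c in categories]
averages   = [c[1] for c in categories]
outfits    = [c[2] for c in categories]
thresholds = [c[3] for c in categories if c[3] is not None]

checks = {
    "counts_unimodal":      is_unimodal(counts),
    "min_count_10":         all(n >= 10 for n in counts),
    "avg_monotonic":        all(a < b for a, b in zip(averages, averages[1:])),
    "outfit_below_2":       all(o < 2.0 for o in outfits),
    "thresholds_monotonic": all(a < b for a, b in zip(thresholds, thresholds[1:])),
}
print(checks)  # all five criteria hold for these hypothetical values
```

A scale passes only when all five checks are true; a failed check signals categories that respondents cannot reliably distinguish.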

Confirmatory Factor Analysis
This study aims to explore the psychometric properties of the Indonesian version, the I-PMI, for use with high school students. CFA was used to confirm the factor structure reported in previous studies. Before carrying out the overall model fit test, the Standardized Loading Factor (SLF) values were inspected for indications of an offending estimate. Table 2 summarizes the loadings of the 26 items on each I-PMI factor. These results were then compared to previous studies, namely version 1 (Taasoobshirazi et al., 2015) and version 2 (Haeruddin et al., 2020). The CFA results show that no item indicates an offending estimate. The factor loading of each item is statistically significant, and the items in the I-PMI correlate strongly with the factors formed; each item's loading exceeds the predetermined criterion. Item E2 ("I rechecked my work after completing the physics problem") had the highest correlation with its factor. Meanwhile, items P3 ("Before solving physics problems, I ignored the information I didn't need in the problem") and D1 ("After finishing physics questions, I double-checked my answers") had the lowest correlations among the items. Composite reliability is a better approach to measuring internal consistency than Cronbach's alpha because it accounts for the standardized loadings of the manifest variables (Fornell & Larcker, as cited in Zulherman et al., 2021, p. 1702). The composite reliability of each factor/construct is between 0.70 and 0.92 (≥ 0.70), showing that internal consistency has been met. Thus, the adapted I-PMI has adequate validity and reliability (internal consistency).
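Composite reliability can be computed from standardized loadings with the usual Fornell–Larcker formula, CR = (Σλ)² / ((Σλ)² + Σ(1 − λ²)). A minimal sketch with hypothetical loadings (not the study's values):

```python
def composite_reliability(loadings):
    """Composite reliability from standardized factor loadings."""
    s = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)  # residual variance per indicator
    return s ** 2 / (s ** 2 + error)

# Hypothetical standardized loadings for one construct (illustrative only)
cr = composite_reliability([0.62, 0.71, 0.68, 0.75])
print(round(cr, 2))  # meets the common 0.70 benchmark for these loadings
```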
The overall fit of the model was tested using the maximum likelihood method. Table 3 summarizes the overall fit indices of the I-PMI model. Based on the previously determined criteria, the values of χ²/df, RMSEA, SRMR, TLI, and PNFI all meet the specified thresholds, indicating that the I-PMI model has a good fit.

Rasch Analysis
Verification of the rating categories in the I-PMI was carried out to test the validity of the rating scale. A rating scale functions well if respondents are not confused by the ranking choices provided. Table 4 presents the results of the analysis of the I-PMI rating scale structure. The analysis shows that the frequency of each rating category exceeds 10 and the distribution of counts is unimodal. The observed averages and the Rasch-Andrich threshold (step calibration) values increase monotonically. The outfit MNSQ scores of the five categories are in the range 0.80-1.47, below 2.0. In addition, the step calibrations increase by more than 1.0 logit, evidence that the response categories do not overlap. Overall, therefore, the I-PMI rating scale fits and functions well.

Figure 1. Category Probabilities curve for I-PMI
The summary of I-PMI statistics in Table 4 includes the reliability and separation indices for persons and items. The Cronbach's alpha value is 0.94 (excellent reliability), indicating appropriate and reliable person-item interaction in the use of the I-PMI. Person and item reliability values are 0.92 and 0.98, respectively, indicating that respondents answered consistently and that the quality of the items in the I-PMI is very good. The average person measure is 0.71 logits, showing a tendency for respondents to agree with the various attributes of the I-PMI. Based on the separation indices, the person group (3.47) can be divided into 5 strata and the item group (7.06) into 10 strata. A high separation value indicates the ability of the inventory to distinguish groups of respondents and groups of items across strata (Sumintono & Widhiarso, 2014). Principal component analysis of residuals was used to assess the unidimensionality of the I-PMI. This measure is essential for evaluating whether the I-PMI measures the metacognitive construct. Scale unidimensionality is met if the raw variance explained by the measures is at least 40% and the unexplained variance does not exceed 15% (Sumintono & Widhiarso, 2014). Empirically, the raw variance explained by the measures is 45.7%, and the unexplained variance is 8.3%, indicating that the I-PMI has good unidimensionality. Table 6 summarizes the I-PMI's infit and outfit MNSQ statistics. Based on Table 6, item P3 ("Before solving the physics problem, I ignored the information that I didn't need in the problem") has infit and outfit MNSQ statistics outside the acceptance range of 0.5-1.5. A large infit MNSQ value indicates a response pattern that is less sensitive to items targeted at the person's level, while a large outfit MNSQ value indicates a response pattern that is less sensitive to items of a certain difficulty (Sumintono & Widhiarso, 2014).
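The strata counts reported above follow the standard Rasch formula H = (4G + 1)/3, where G is the separation index. A short Python check reproduces the reported values:

```python
def strata(separation):
    """Number of statistically distinct levels implied by a separation index."""
    return round((4 * separation + 1) / 3)

print(strata(3.47))  # person separation -> 5 strata
print(strata(7.06))  # item separation -> 10 strata
```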
The infit and outfit MNSQ values of item P3 indicate that its psychometric properties are inadequate for measuring the planning construct. This is in line with the CFA results: item P3 has an SLF value only slightly above the acceptance limit, so item P3 needs to be improved. However, the overall infit and outfit MNSQ statistics show good alignment between response patterns and target items and between item response patterns and difficulty levels (Pozo Muñoz & Bretones Nieto, 2019). This supports the good construct validity of the I-PMI. Table 6 shows indications of DIF through the probability values of items across gender. A total of 12 of the 26 items (46%) were biased towards gender (numbers in bold). Nevertheless, the differences influenced by gender need to be examined more deeply by inspecting the DIF trends in the I-PMI, as shown in Figure 2. In the knowledge of cognition construct, items K3 and K4 show gender DIF and tend to be easier for the male group to agree with (see Figure 2a). Items P1 and P2 in the planning construct and items F3 and F4 in the free-body diagram construct also show gender DIF; like K3 and K4, they tend to be more difficult for female students to agree with (see Figures 2b and 2c). In the monitoring construct, items M3 and M4 show gender DIF with the opposite trend to the six items in the previous three constructs (Figure 2d): they tend to be easier for female students than for male students to agree with. Two items in the debugging construct (D1 and D2) and two in the evaluation construct (E1 and E2) also show DIF (Figures 2e and 2f); these four items tend to be more difficult for male students to agree with. Biased items can therefore be selected for removal or retention in further revision.
Next, the person-item distribution map was examined to assess the sensitivity and reliability of the I-PMI construct, as shown in Figure 3. The left section shows the distribution of persons based on their logit scores, which ranged from -2.28 logits (persons who strongly disagree, or students with low metacognitive levels) to +3.76 logits (persons who strongly agree, or students with high metacognitive levels). Students with high metacognitive levels are placed at the top of the map. The right side of the map shows the distribution of items by difficulty, with logit scores ranging from -1.09 logits (item D1, the easiest for students to agree with) to +0.88 logits (item F1, the most difficult for students to agree with). The average person measure (+0.71 logits) is higher than the average item measure (0.0 logits), showing that students' metacognition is above the average item difficulty. Several items need to be added to fill the gap in the interval -1.06 to -0.48 logits so that the inventory provides optimal information when applied to low-ability students. This would increase the instrument's sensitivity and reliability for respondents (Pozo Muñoz & Bretones Nieto, 2019).

Discussion
The focus of this research is exploring a physics metacognition inventory for high school students. Based on the analysis results, the I-PMI shows good psychometric behavior among high school students taking physics classes. The CFA and Rasch analysis results provide logical support for the six constructs/factors of the original scale: the CFA analysis confirms 26 items spread over six factors. This also suggests that the two main components of metacognition, knowledge of cognition and regulation of cognition (Schraw & Moshman, 1995; Taasoobshirazi et al., 2015; Zohar & Barzilai, 2013), can be broken down into six more specific components. First, knowledge of cognition covers the knowledge, ideas, and theories students hold about goals and strategies for completing tasks (Çetin, 2017; Zohar & Barzilai, 2013). It includes three sub-components: declarative, procedural, and conditional knowledge (Dafik et al., 2019; Schraw & Moshman, 1995; Taasoobshirazi et al., 2015). Second, regulation of cognition is the sequence of activities or actions students take to control their thinking or learning (Mahdavi, 2014). Early in its development, regulation of cognition consisted of three components: planning, monitoring, and evaluation (Çetin, 2017; González et al., 2017; Kallio et al., 2017; Mahdavi, 2014). Later, debugging and information management/free-body diagrams complemented regulation of cognition into five components (Asy'ari et al., 2019; Rahmat & Chanunan, 2018; Schraw & Dennison, 1994; Taasoobshirazi et al., 2015).
The results also show that the I-PMI in this study complements previous results. They align with the PMI structure reported by Taasoobshirazi et al. (2015) and Haeruddin et al. (2020). The comparison in Table 2 shows that the I-PMI has a factor structure consistent with the model proposed by Taasoobshirazi et al. (2015) and supports the factor structure formed in the research of Haeruddin et al. (2020) in the Indonesian context. This study therefore strengthens the evidence for the suitability of the PMI across cultures and educational levels (Haeruddin et al., 2020; Koyunlu Ünlü & Dökme, 2019; Taasoobshirazi et al., 2015). Furthermore, the I-PMI is indicated as suitable for assessing students' metacognition at the high school level. This research presents a more comprehensive analysis by reporting CFA, the Rasch model, item fit, DIF, and the person data distribution. This may serve as input for construct developers to use all of these tools so that the resulting model better represents an inventory of students' metacognition measures.

Conclusion
The psychometric properties of the Indonesian version of the Physics Metacognition Inventory (I-PMI) have been evaluated using Confirmatory Factor Analysis and the Rasch Model. The adapted I-PMI is consistent with Indonesian culture and has good psychometric properties when applied to public high school students in Indonesia. The CFA analysis supports the six-factor/construct structure of the original scale, and the Rasch analysis shows that the 5-point rating scale works well. However, some items in each construct show a gender bias.

Recommendations
This study has supported the six-factor structure of the metacognitive awareness scale in the Indonesian context. Given the limitations of this study, we recommend that future research consider the diversity of schools, cross-interest programs, cultural backgrounds, and even broader education levels. A more comprehensive analysis could also involve structural equation modeling, which could help detect distinct components that weaken validity. Further research can also identify the factors in the I-PMI that contribute most to building students' awareness in managing their cognition.

Limitations
This research still has some limitations, including gender bias in some items, a narrow age range, the absence of private and cross-interest schools, and a lack of heterogeneity in respondents' cultural backgrounds. The DIF results show that the items making up a scale can depend on the gender composition of the sample in which the scale is developed (Williams et al., 2012). The emergence of bias threatens the validity of items in the I-PMI (Myers et al., 2006); I-PMI items are good if they function equally for each gender group (Papini et al., 2020). This may be because data collection was carried out in public schools only, so the results cannot be generalized to students from private schools or cross-interest programs. However, this study has contributed by evaluating significant psychometric properties not reported by previous similar studies (Haeruddin et al., 2020; Koyunlu Ünlü & Dökme, 2019; Taasoobshirazi et al., 2015). These findings provide significant implications for practitioners, counselors, and parents in designing and developing appropriate learning strategies for students.

Authorship Contribution Statement
Sukarelawan: Contributed to the research concept and design, data acquisition, data analysis, and drafting of the manuscript. Jumadi: Contributed to the concept and design, critical manuscript revision, supervision, and final approval. Heru Kuswanto: Contributed to critical manuscript revision, supervision, and final approval. Thohir: Strengthened the discussion and provided administrative, technical, or material support.

References
Akben, N. (2020). Effects of the problem-posing approach on students' problem solving skills and metacognitive awareness in science education.