Assessment of the Validity and Reliability of Mental Health Instruments for High School Students in Indonesia

This study aims to develop a standard instrument for measuring mental health among urban adolescents in Indonesia. The objective is to produce a valid and reliable school adolescent mental health instrument that agencies or schools can use to identify students' mental health. The survey was conducted in Jakarta and South Tangerang with a total of 1007 respondents across two trials: the first trial involved 597 students and the second trial involved 410 students. Measurements were made using a Likert scale questionnaire. Instrument testing began with a theoretical validity test by 4 experts and 20 panelists, who examined the instrument in terms of construction, content and language. The experts analyzed and corrected the instrument qualitatively. The instrument was then reviewed and analyzed quantitatively by the panelists using the Aiken index. At this stage, 44 items, 9 indicators and 3 variable dimensions were obtained. Empirical validity was then tested by analyzing the measurement model using Confirmatory Factor Analysis (CFA) with the LISREL 8.80 Full Version program. Using the criteria of a Standardized Loading Factor (SLF) ≥ 0.30 and a t-value ≥ 1.96, and calculating reliability with construct reliability (CR) at the level > 0.70, the results of the second trial showed that 35 items were valid. The Goodness-of-Fit test showed that the theoretical model fits the empirical model for the mental health instrument in this study.


Introduction
Mental health involves an individual's ability to feel comfortable with stress in their environment. The ability to respond to stress can be seen through thoughts, feelings, and behaviors that are appropriate to age, norms and local culture. Mentally healthy individuals are also able to actualize their potential. Maslow (1981) claims that self-actualization is a psychologically healthy state, as the fulfillment of the highest needs (Townsend & Morgan, 2017). Because mental health plays such an important role, several measurement instruments are widely used across a wide range of ages. However, Hammarström et al. (2016) highlighted the lack of validity evidence for measures of mental health problems among adolescents. For this reason, this study develops a mental health instrument in the form of a scale. This instrument is expected to be used as a first step to identify the mental health of students in religious schools or public educational institutions. The development is focused on high school students who are in the adolescent phase (Clauss-Ehlers et al., 2013; Omoniyi, 2016).
Good instruments are generally valid and reliable (Cohen et al., 1996; Linn, 2008). An instrument is said to be valid if it measures accurately what it is intended to measure, while a reliable instrument yields stable or consistent scores (Reynolds et al., 2009). Meeting the requirements of validity and reliability shows that an instrument has been standardized (Eryılmaz & Sapsağlam, 2018). This means that the instrument is developed through a rational validation process by experts and panelists and is then refined according to their advice. After the empirical validation process, or trials, the results are calculated and analyzed, and the items are assembled into a set of valid and reliable instruments (Kubiszyn & Borich, 1987; de Almeida Vieira & Fernandes, 2016).
This study follows the stages of developing standard instruments with regard to the construct of the measured variable, mental health. The study was conducted among students of Islamic high schools, or Madrasah Aliyah (MA), who are in adolescence, i.e. in a transition period. Based on the existing problems, this study aims to produce a valid and reliable school adolescent mental health instrument that institutions or schools can use to identify the initial mental health of students and to obtain a general picture of their mental health (Liu et al., 2013). Students who are mentally healthy are characterized by having positive feelings, good personal development, developed social aspects, empathy, and psychology. Mental health has an influence on behavior (Tengland, 2012): mentally healthy students are able to adapt to the environment and to the prevailing norms, and are able to avoid stress, depression, alcohol abuse, drug abuse and other harmful behavior (Hardy et al., 2013). Martin (2012) stated that mental health contributes to learning achievement, perseverance, graduation and overall student success. Students may have difficulties in adjusting themselves, and all of these may cause problems for them. Adults in the family, school and community need to understand mental health so that they can be wiser in dealing with, guiding and fostering adolescents. They must help adolescents to get through this phase successfully (Bohnenkamp et al., 2015). Finally, it can be concluded that students' mental health is the existence of genuine harmony between the psychological functions, the ability to adjust according to religious guidance, and the ability to actualize one's potential.

Literature Review
Mental Health
Mace (2007) asserts that health is defined not only as a person's freedom from mental illness but also as the ability to adjust, which shows serenity. Whether a behavior is categorized as normal or not depends on cultural or community norms. Therefore, behavior that is regarded as normal, and as evidence of mental health, in one community can be deemed not normal in other societies. This means that the state of being normal is not absolute but relative (Townsend & Morgan, 2017). The different response of each individual to a source of stress or to problems depends on individual perception and abilities. In stressful situations, besides having positive emotions, individuals must face negative emotions. Positive emotions can eliminate some of the effects of negative emotions, especially physiological effects. Positive emotion is the energy that humans produce through the spiritual meaning they attach to stressful events, which enables them to respond to stressors in a more adaptive way (Kring & Caponigro, 2010; Szaflarski et al., 2012). Lamborn et al. (2018) said that mental health is a reflection of a good life, including positive feelings that reach an optimal level of life, which can be constructed into three components, namely emotional, social and psychological well-being. Gould (2016) identifies four attributes of mental illnesses, namely organic, psychotherapy, sociotherapy and medical. The organic orientation attributes mental disorders to physiological abnormalities, while psychotherapy holds that mental stress is the result of psychological conflict. The sociotherapy orientation holds that mental problems are caused by social and environmental aspects. Lastly, the medical model is a combination of the three orientations, namely organic, psychotherapeutic and sociotherapeutic. The fashionable model states that every disease is the result of two environmental factors that integrate into an organism (Gould, 2016).
The World Health Organization (WHO) states that mental health is an integral part of health and well-being. Mental health is a condition of being healthy physically, psychologically, and socially. It also refers to a condition of being free from mental illness. The determinants of mental health and mental disorders include not only a person's ability to manage thoughts, emotions, behavior, and interactions with others, but also social, cultural, economic and political factors, among others (World Health Organization, 2013).

Mental Health Construction
Contemporary literature agrees that mental health is a three-dimensional construct, namely emotional, psychological and social. These dimensions differ in form and in level of well-being. The study of well-being is divided into two streams: the hedonic stream, which equates well-being with happiness, and the eudaimonic stream, which concerns the realization of human potential. The hedonic approach relates to the presence of positive emotions, namely maximizing positive emotions and minimizing negative emotions. Meanwhile, the eudaimonic approach considers optimal psychological functioning in terms of psychological and social well-being (Petrillo et al., 2015). Previous studies showed that clients from religious backgrounds can increase hedonic well-being (life satisfaction) through eudaimonic well-being. Religion and spirituality have potential as a source of assistance in counseling, helping to improve adaptability, to have goals and optimism in life, and to feel that life is meaningful (Yoon et al., 2015; Hefti, 2011).
Religion can affect a person's psychiatric needs by providing a sense of security, in the sense of reducing anxiety about life (Flannelly & Galek, 2010). The religious component of mental health is associated with holistic mental health therapy, because a religious approach speeds up the client's therapy or recovery. Religiosity-based therapy can be more effective than other therapies, even drug therapy, by improving both physical and mental health (Chidarikire, 2012). There are three things that make religion a part of mental health. First, religion with its teachings can guide life and help solve problems. Second, religion has many lessons contained in every religious teaching. Third, religion can be the basis for seeking meaning by feeling the presence of God, feeling meaning in life, and forming a self-identity as a model of goodness in the environment (Behere et al., 2013). As mentioned by the World Health Organization (WHO), mental health is health and well-being, although there are diverse interpretations of the definition of health that include spirituality (Nagase, 2012). Gillam (2018) mentions that being mentally healthy means not only being free from mental disorders and mental illness, but also having psychological, physical, social and spiritual well-being. Spirituality has an important role in mental health. There are a number of notions of mental health and, in principle, none of them is better than the others, because they complement each other and together give a more comprehensive understanding of mental health in humans.

Research Design
In assessing the mental health instrument, theoretical and empirical validity tests were conducted (Hammarström et al., 2016; McMillan, 2015). Theoretical validation was carried out by 4 experts, who work as psychology lecturers and as psychometric and evaluation experts, and by 20 panelists, most of whom have completed or are currently pursuing doctoral education in the field of Islamic education and evaluation. They were involved to correct the instrument items in terms of content or substance, construction and language. The experts consisted of 2 people from the field of psychology and 2 people from the field of educational measurement. The experts conducted a qualitative review, while the panelists conducted a quantitative review (Cecil et al., 2009).
After being refined based on expert suggestions, the instrument was reviewed by panelists to provide a quantitative assessment of the construct by looking at: (1) whether the existing dimensions are components of the construct of a variable, (2) whether the indicators are part of the dimensions, and (3) whether the items in the instrument are the development of the research variable indicators. Panelist validation was carried out using a Likert scale model with five answer choices, and the answers from the panelists were analyzed using the Aiken V index.
$$V = \frac{\sum n_i \, \lvert r - lo \rvert}{n\,(c - 1)}$$
Note: V = validity index from Aiken; n_i = number of experts who choose criterion i; r = criterion i; lo = lowest rating; n = the number of all experts; c = number of ratings/criteria.
In this study, the validity index was classified as follows: 0.00-0.20 = very low; 0.21-0.40 = low; 0.41-0.60 = sufficient; 0.61-0.80 = high; and 0.81-1.00 = very high. This theoretical validation was then followed by empirical validation.
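As an illustration of the Aiken index defined in the note above, the following is a minimal Python sketch. The function names and the example ratings are hypothetical and are not taken from the study's data; the classification bands are those listed above.

```python
from collections import Counter


def aiken_v(ratings, lo=1, c=5):
    """Aiken's V for one item: sum(n_i * (r - lo)) / (n * (c - 1)).

    ratings: list of integer ratings given by the panelists for the item.
    lo: lowest possible rating; c: number of rating categories.
    """
    n = len(ratings)
    if n == 0:
        raise ValueError("at least one rating is required")
    counts = Counter(ratings)  # n_i: number of panelists choosing rating r
    s = sum(n_i * (r - lo) for r, n_i in counts.items())
    return s / (n * (c - 1))


def classify_v(v):
    """Map V onto the five bands used in this study."""
    if v <= 0.20:
        return "very low"
    if v <= 0.40:
        return "low"
    if v <= 0.60:
        return "sufficient"
    if v <= 0.80:
        return "high"
    return "very high"


# Hypothetical ratings from 20 panelists on a 5-point scale (illustration only).
example = [5] * 12 + [4] * 6 + [3] * 2
v = aiken_v(example)
print(v, classify_v(v))
```

With these made-up ratings the numerator is 12×4 + 6×3 + 2×2 = 70 and the denominator is 20×4 = 80, so V = 0.875, which falls in the "very high" band.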
Empirical validation was carried out on a questionnaire in the form of a 5-point Likert scale, with five alternative answer choices whose scores are reversed between positive and negative statements. For positive (favorable/F) items, the scores were 5 (very suitable), 4 (suitable), 3 (less suitable), 2 (unsuitable), and 1 (very unsuitable). Conversely, for negative (unfavorable/UF) items, the scores were 1 (very suitable), 2 (suitable), 3 (less suitable), 4 (unsuitable), and 5 (very unsuitable) (McMillan, 2015; Linn, 2008). More detailed information on these items can be seen in the Appendix.
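The scoring rule for favorable and unfavorable items can be sketched as follows. This is a minimal illustration in Python; the function names and the example respondent are hypothetical, and the English labels are a simplification of the answer choices described above.

```python
# Answer choices ordered from least to most agreement with the statement.
LABELS = ["very unsuitable", "unsuitable", "less suitable",
          "suitable", "very suitable"]


def score_response(label, favorable=True):
    """Return the 1-5 score for one answer.

    Favorable (F) items score 'very suitable' as 5;
    unfavorable (UF) items are reverse-scored, so 'very suitable' is 1.
    """
    rank = LABELS.index(label) + 1  # 1..5, higher means more agreement
    return rank if favorable else len(LABELS) + 1 - rank


def total_score(answers):
    """answers: list of (label, favorable) pairs for one respondent."""
    return sum(score_response(label, fav) for label, fav in answers)


# Hypothetical respondent answering two F items and one UF item.
respondent = [("very suitable", True),
              ("suitable", True),
              ("unsuitable", False)]
print(total_score(respondent))
```

Reverse-scoring the unfavorable items before summing keeps a higher total pointing in the same direction for every item.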

Sampling
Empirical validation was carried out twice among 16-17-year-old students of Madrasah Aliyah in Jakarta and South Tangerang. The research comprised two trials with a different sample size in each stage, followed by a final stage that assessed the fit of the model built on the valid items. The sample size was calculated following Gable and Wolf (2012), who suggest that, in order to develop a standard instrument, the number of respondents in the try-out should be 6-10 times the number of items in the instrument. This study took 10 times the number of items as the sampling basis. Taking into account an estimated response rate of 50 percent of the questionnaires distributed, the sample size for the first stage was 44 items x 10 x (100/50) = 880, rounded to 850. In the second stage, the number of items was 37 and the estimated response rate was 55 percent, so the sample size was 37 items x 10 x (100/55) ≈ 673, rounded to 650.
In the first stage, 850 questionnaires were distributed and 597 were returned (a response rate of 70.24%). In the second stage, 650 questionnaires were distributed and 410 were returned (a response rate of 63.08%).
In the third stage, model fitness was analyzed by observing the fit tests of CFA 1 and CFA 2 for the 35 valid items.
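The sample-size arithmetic and the response rates reported in this section can be reproduced with a few lines of Python. This is only a sketch of the calculation as described in the text; the function names are hypothetical.

```python
def questionnaires_needed(n_items, respondents_per_item=10,
                          expected_response_rate=0.50):
    """Questionnaires to distribute so that the expected number of returns
    is respondents_per_item times the number of items."""
    return n_items * respondents_per_item / expected_response_rate


def response_rate(returned, distributed):
    """Response rate in percent."""
    return 100 * returned / distributed


# Stage 1: 44 items, 50% expected response rate.
print(questionnaires_needed(44, 10, 0.50))
# Stage 2: 37 items, 55% expected response rate.
print(round(questionnaires_needed(37, 10, 0.55), 1))

# Observed response rates for the two trials.
print(round(response_rate(597, 850), 2))
print(round(response_rate(410, 650), 2))
```

The computed targets (880 and about 672.7) are the values before the authors rounded them to 850 and 650 questionnaires.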

Data Collection
As explained above, the research was conducted in two trials with a different sample size in each stage. The first trial used a questionnaire containing 3 dimensions, 9 indicators and 44 items. The questionnaire was given to 850 students, 597 of whom completed and returned it. The second trial used a questionnaire containing 3 dimensions, 9 indicators and 37 items. The questionnaire was given to 650 students, 410 of whom completed and returned it. In the first trial, with 597 respondents, the evaluation of the measurement model showed that, out of 44 items, 37 items were valid according to the criteria of an SLF value ≥ 0.30 and a t-value ≥ 1.96, and 7 items were invalid. Construct Reliability (CR) was 0.94, which shows that the items are reliable. The second stage evaluated the structural model using the criteria of an SLF value ≥ 0.30 and a t-value of the loading factor ≥ 1.96, with Construct Reliability (CR) = 0.93 > 0.70. In this stage, responses were obtained from 410 students out of 650 questionnaires distributed. The results of the second trial showed that, out of a total of 37 items, 2 items were invalid, leaving 35 valid items as the basis for the mental health instrument. Furthermore, in the third stage, model fitness was evaluated by observing the fit tests of CFA 1 and CFA 2, which showed that the mental health instrument model fits as a whole.

Variable Measurements
The instruments used in this study were adopted from the psychometric measurement and assessment approaches developed by Reynolds et al. (2006) and Cohen et al. (1996). The description of the indicators of the dimensions of the mental health variable is shown in Table 1 (see also the Appendix).

Analysis Technique
Item validity at the first and second stages was tested using Structural Equation Modeling (SEM) analysis. The measurement model was analyzed using Confirmatory Factor Analysis (CFA) with the LISREL 8.80 Full Version program. If the Standardized Loading Factor (SLF) is smaller than the critical limit of 0.70 or 0.50, the observed variable should be removed from the model. In addition to these two critical limits, Igbaria and Baroudi (1993) and Wijayanto (2008) added that if the loading factor satisfies 0.30 ≤ SLF < 0.50 and the t-value is ≥ 1.96, the observed variable can be retained in the model. Hair Jr. et al. (2014) stated that the standard for validity in CFA is a loading factor ≥ 0.30. The same analysis was used to calculate reliability: the consistency of the manifest variables in measuring the latent construct was determined using Construct Reliability (CR), with an acceptance standard of CR > 0.70 (Kline, 2015).
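The item retention rule described above (SLF ≥ 0.30 and t-value ≥ 1.96) can be expressed as a simple filter. The sketch below is in Python and assumes the loadings and t-values have already been exported from the CFA output; the item names and values are hypothetical and are not results from this study.

```python
def keep_item(slf, t_value, min_slf=0.30, min_t=1.96):
    """An item is retained only if both criteria are met."""
    return slf >= min_slf and t_value >= min_t


def screen_items(estimates, min_slf=0.30, min_t=1.96):
    """estimates: dict mapping item name -> (SLF, t-value).

    Returns two lists: retained items and dropped items.
    """
    valid, invalid = [], []
    for name, (slf, t_value) in estimates.items():
        target = valid if keep_item(slf, t_value, min_slf, min_t) else invalid
        target.append(name)
    return valid, invalid


# Hypothetical CFA estimates (illustration only).
estimates = {
    "item_01": (0.62, 9.80),   # passes both criteria
    "item_02": (0.34, 4.10),   # 0.30 <= SLF < 0.50 with significant t: retained
    "item_03": (0.21, 2.50),   # loading too low
    "item_04": (0.33, 1.20),   # loading acceptable, t not significant
}
valid, invalid = screen_items(estimates)
print(valid, invalid)
```

Both conditions are required, so an item with an acceptable loading but a non-significant t-value is still dropped.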

Theoretical Validity
The experts' qualitative theoretical validation of the student mental health instrument suggested improvements in terms of construction or composition, language and content (Topkaya et al., 2017). The items must be more specific in measuring the behavioral indicators in order to avoid stereotypical answers. Some items were to be arranged in the favorable direction and others in the unfavorable direction. The experts suggested that 5 items should place the subject at the beginning of the sentence. The focus of 2 items should be on the respondent's environment, and 4 items were suggested to be added. Moreover, 3 items had to be substituted, 1 item was to be simplified, 2 items were suggested to be added, 3 items had to use standard Indonesian, and 1 word in 3 sentences had to be deleted and not used in the instrument.
The results of the quantitative theoretical validation by the panelists show that, for the 44 items of the mental health instrument, the dimensions, indicators and statement items fit one another and are conceptually valid. The panelists' ratings were analyzed using the Aiken index, and the results showed that all mental health instrument items were valid, with values in the high (0.61-0.80) and very high (0.81-1.00) ranges.

Empirical Validity
The item validity test in the first and second stages was conducted using Structural Equation Modeling (SEM) analysis. The mental health instrument was developed using first-order and second-order confirmatory factor analysis, through the stages of measurement model evaluation, structural model evaluation, and model fitness testing. The CFA 1 calculation shows that the mental health instrument contains 37 valid items, as presented in Table 2. Figure 1 presents the second-order confirmatory factor analysis with the t-value model, and shows that all items have a t-value > 1.96, meaning that all factor loadings are significant. Based on Table 3, the model fit test shows that the mental health instrument model fits overall.

Structural Model Evaluation
The structural model is composed of exogenous latent variables and endogenous variables, and describes the relationship of one variable to another. The validity of the variable construct can be assessed through the values of the structural model. Figure 2 shows that the observed loading factors of the constructs are valid. The loading value of feeling is 0.68, mind 0.61, behavior 0.88, self-adjustment 0.97, adjustment to others 0.79, adjustment to the environment 0.70, learning activities 0.88, developing interest 0.88 and exercise 0.52. Thus, it can be concluded that the indicators that measure the latent variables of the mental health instrument are valid, with t-values ≥ 1.96 and Standardized Loading Factors (SLF) ≥ 0.30.

Figure 2: Diagram of the Construct Model of Mental Health Variables
Furthermore, structural model evaluation is done by testing validity and reliability with the calculations shown in Table 4.
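To illustrate the construct reliability criterion (CR > 0.70), the sketch below applies a commonly used formula, CR = (Σλ)² / ((Σλ)² + Σ(1 − λ²)), to the nine indicator loadings reported in this section. The paper does not print the formula it used, so the formula choice, the treatment of all nine loadings as indicators of a single construct, and the function name are assumptions made for illustration only.

```python
def construct_reliability(loadings):
    """CR from standardized loadings, assuming each error variance
    equals 1 - loading**2 (standardized solution)."""
    loadings = list(loadings)
    total = sum(loadings)
    error = sum(1 - lam ** 2 for lam in loadings)
    return total ** 2 / (total ** 2 + error)


# Indicator loadings as reported in the structural model evaluation.
reported_loadings = {
    "feeling": 0.68,
    "mind": 0.61,
    "behavior": 0.88,
    "self-adjustment": 0.97,
    "adjustment to others": 0.79,
    "adjustment to the environment": 0.70,
    "learning activities": 0.88,
    "developing interest": 0.88,
    "exercise": 0.52,
}

cr = construct_reliability(reported_loadings.values())
print(round(cr, 2), cr > 0.70)
```

Under these assumptions the result is about 0.93, above the 0.70 threshold. This is close to the CR reported for the second trial, although the paper does not state which loadings entered its own calculation.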

Model Fitness Evaluation
The CFA model is used to test the fitness of the model to find out the causal relationship of each latent variable with an indicator or item. The ideal Goodness-of-Fit (GoF) criteria are presented in Table 5. Based on Table 5, it can be seen that there are 3 indices with poor fit, 5 indices with good fit, and 7 indices with good fit. From the results of the model fit test, it can be seen that the fitness of the overall mental health instrument model is good.
The theoretical and empirical validation processes carried out on the mental health instrument show that there were 7 invalid items in the first trial and 2 invalid items in the second trial. This reduces the number of items from 44 to 35. The 35 valid items were then compiled into a final mental health instrument, which forms the basis of the questionnaire structure (Appendix). The mental health instrument grid is shown in Table 6.
Lastly, a mental health assessment was conducted to give a general description of the mental health condition of the students across the stages, based on the 35 questionnaire items. The basis for assessing the mental health of the students involved in this study was obtained from the highest and lowest scores in each trial, which were divided into 3 classifications (good, moderate, and less). The highest and lowest scores in the first stage were 181 and 75, respectively. Meanwhile, in the second stage, the highest score was 175 and the lowest score was 91 (Table 7). The results of the first and second trials showed that the majority of Madrasah Aliyah (MA) students were in the moderate mental health category. In the first trial, out of 597 students, 416 students (69.68%) were in the moderate mental health category, 140 students (23.45%) were in the good mental health category, and 41 students (6.87%) were in the less mental health category. In the second trial, out of 410 students, 252 students (61.46%) were in the moderate mental health category, 133 students (32.44%) were in the mental health category, and 25 students (6.10%) were in the good mental health category. Based on these data, it can be concluded that most Madrasah Aliyah (MA) students have moderate mental health. This may be because students of Madrasah Aliyah (MA) are in their teens.
Adolescence is a period of transition from childhood and of preparation for adulthood. During this period, physical and psychological development is rapid, while adolescents are not yet able to master their physical and psychological functions or use them properly. The assessment generally indicated that emotional stability that has not yet formed properly in adolescence is likely to affect psychological processes. Here, the environment has a role to play through preventive and constructive measures that improve adolescent mental health and minimize disturbances to it. The good environment that adolescents need is a family, school and community environment that can meet their psychological needs. The specific needs to be met by these environments include a social atmosphere, healthy and safe conditions, emotional and behavioral education, and counseling on religious values. Adults also have a role in understanding and responding well to the situation of adolescents, who are experiencing a sensitive period in their psychological development.
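The category shares reported for the two trials can be checked with the short Python sketch below. It also shows one possible way to split the range between the lowest and highest scores into three bands. The paper does not state how the cut points were set, so the equal-width split and all function names here are assumptions for illustration only.

```python
def share(count, total):
    """Percentage of students in a category, rounded to two decimals."""
    return round(100 * count / total, 2)


def three_bands(lowest, highest):
    """Cut points for three equal-width bands (an assumed rule,
    not necessarily the one used in the study)."""
    width = (highest - lowest) / 3
    return lowest + width, lowest + 2 * width


def classify(score, lowest, highest):
    """Assign a total score to 'less', 'moderate' or 'good'."""
    cut1, cut2 = three_bands(lowest, highest)
    if score < cut1:
        return "less"
    if score < cut2:
        return "moderate"
    return "good"


# Counts reported for the first trial (597 students) and second trial (410).
first_trial = [416, 140, 41]
second_trial = [252, 133, 25]
print([share(c, sum(first_trial)) for c in first_trial])
print([share(c, sum(second_trial)) for c in second_trial])

# Assumed equal-width cut points for the first-stage range of 75 to 181.
print(three_bands(75, 181))
```

The counts in each trial add up to the number of respondents (597 and 410), and the computed percentages match those reported in the text.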

Discussion
This study analyzes the stages of the development and validation of a standard instrument as an analytical tool to measure the mental health of school-age adolescents. In this study, empirical validation, or testing of the instrument, was carried out in the field. The validity of the items was tested empirically at the first and second stages using Structural Equation Modeling (SEM) analysis. The measurement model analysis was performed using Confirmatory Factor Analysis (CFA), namely CFA 1 and CFA 2, with the LISREL 8.80 Full Version program. The items were measured with attention to the construct of the measured variable, namely the mental health variable. The general requirement for a good, or quality, instrument is that it is valid and reliable (Miller et al., 2009; Cohen & Swerdlik, 2010). The findings of this study confirm the validity of the items measuring adolescent mental health, particularly among students of Madrasah Aliyah in Jakarta and South Tangerang. This is in accordance with Cecil et al. (2009), who state that a valid instrument is fit to perform its measuring function, and that a reliable instrument yields stable or consistent scores. Furthermore, the panelists' opinions were used in this study to judge the fitness of the items to the context of adolescent mental health, and to modify them. This is in accordance with Kubiszyn and Borich (2010), who state that when the validity and reliability of an instrument are fulfilled, the instrument is standardized. Standardized instruments are compiled through a rational validation process by experts and panelists and are then refined according to their suggestions. After that, the empirical validation process is carried out: the instrument is tested, and the results are calculated, analyzed and assembled into a set of valid and reliable instruments.
In this study, empirical validation was carried out twice among students of Madrasah Aliyah in South Jakarta and South Tangerang. The findings are consistent with Hudson et al. (2020) and Arici-Ozcan et al. (2019), who found a relationship between the psychological stress experienced by adolescents and the influence of cognitive factors in strengthening the mental health of school-age adolescents. In the context of mental health development, several previous studies have analyzed the relationship between validation and measurement and educational performance in general. In the context of secondary schools in Indonesia, this study highlights aspects of measurement validation, which were also expressed by Abubakar et al. (2015). Furthermore, Tran et al. (2019) analyzed anxiety disorders among adolescents in Indonesia. The findings of this study are also in line with Garey et al. (2020), who evaluated the role of religious orientation in the mental health of adolescents in Indonesia.

Conclusion
The development of the mental health instrument was carried out through theoretical validation and empirical validation. The theoretical validation was done by experts and panelists, who examined the items in terms of construction, content and language, and analyzed whether the items represent the indicators. Indicators are precise descriptions of dimensions. Dimensions are the exact description of the operational definition. The conceptual definition of the variable represents the construct of the mental health variable, which is a synthesis of theories. The qualitative review by the 4 experts showed that they generally judged the items to be good. There were only a few suggestions from the experts to revise a small portion of the mental health instrument items in terms of construction, content and language. Thus, it could be concluded that, in principle, there were no meaningful changes. After the items were revised according to the experts' advice, theoretical validation was carried out quantitatively by 20 panelists using the Aiken index. The calculated results showed that all instrument items were valid, with values interpreted as high (0.61-0.80) and very high (0.81-1.00). In the model fitness evaluation, the fit tests of the CFA 1 and CFA 2 models showed the same results. In conclusion, the overall mental health instrument model has a good model fit.
The findings of this study concern the validity and reliability of an adolescent mental health instrument in Indonesia, and they lead to both theoretical and practical recommendations. Theoretically, this study contributes valid items for measuring adolescent mental health, covering feeling, mind, behavior, self, relationships with other people, environment, learning, interest and exercise. Practically, this study underlines the importance of harmony of mental functions, adaptability and the ability to actualize self-potential in the formation of adolescent mental health.
Finally, although this study provides a measure with evidence of construct validity and reliability that can be used for further research on adolescent mental health, several limitations restrict its applicability. Firstly, since this study was conducted in Islamic high schools, or Madrasah Aliyah (MA), its findings can be generalized only to adolescent students of Islamic high schools aged 15 to 18 years. Secondly, this study focused on the relationship between the mental health variable and its various constructs. Although it provides clear and measurable empirical evidence regarding the validity and reliability of the items used, the lack of testing between items and between variables makes the items less suitable for direct adoption in testing. Thirdly, this study did not classify the demographic characteristics of the sample as a basis for cross-sectional testing. Furthermore, the test is limited to adolescents at the high school level in formal religious education institutions in Indonesia. Future research is expected to broaden the baseline of measurement to adolescents over a wider age range, and to use the demographic characteristics of respondents to analyze differences in mental health across demographic groups. Lastly, further studies also need to assess external validity by administering the instrument to mentally healthy and mentally unhealthy adolescents in order to examine its discriminating power.

Appendix
Items used for measuring mental health with the explanation of stage by stage validation