The difference between the agreement actually observed and the agreement expected by chance represents agreement beyond chance. Both percent agreement and kappa have strengths and limitations. The percent agreement statistic is easy to calculate and directly interpretable. Its key limitation is that it does not account for the possibility that raters guessed on some scores; it may therefore overestimate the true agreement among raters. Kappa was designed to account for the possibility of guessing, but the assumptions it makes about rater independence and other factors are not well supported, and it may therefore lower the estimate of agreement excessively. In addition, kappa cannot be interpreted directly, and it has therefore become common for researchers to accept low kappa values in their interrater reliability studies. Low interrater reliability is unacceptable in health care or clinical research, especially when study results may change clinical practice in ways that lead to poorer patient outcomes. Perhaps the best advice for researchers is to calculate both percent agreement and kappa. If there is likely to be much guessing among the raters, it may make sense to use the kappa statistic; but if the raters are well trained and little guessing is likely, the researcher may safely rely on percent agreement to determine interrater reliability. In theory, the confidence interval is obtained by subtracting from, and adding to, kappa the desired confidence-level value multiplied by the standard error of kappa. Because the most commonly desired confidence level is 95%, the formula uses 1.96 as the constant by which the standard error of kappa (SE) is multiplied.
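The two measures compared above can be sketched in code for the two-rater case. The function names and the sample ratings below are illustrative, not taken from the article; the kappa formula is the standard (observed − chance) / (1 − chance) definition:

```python
from collections import Counter

def percent_agreement(ratings_a, ratings_b):
    """Proportion of cases on which the two raters assigned the same score."""
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    n = len(ratings_a)
    po = percent_agreement(ratings_a, ratings_b)  # observed agreement
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    pe = sum(count_a[c] * count_b[c] for c in count_a) / n**2
    return (po - pe) / (1 - pe)

# Fictitious ratings (not the article's data)
a = ["yes", "yes", "no", "yes", "no"]
b = ["yes", "no", "no", "yes", "no"]
print(percent_agreement(a, b))  # 0.8
print(round(cohens_kappa(a, b), 3))  # 0.615
```

Note how kappa (about 0.62 here) is lower than percent agreement (0.80) once the agreement expected by chance is removed, which illustrates why the two statistics can suggest different conclusions about the same data.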

The formula for a 95% confidence interval is 0.85 − 1.96 × 0.037 to 0.85 + 1.96 × 0.037, which works out to an interval of 0.77748 to 0.92252, yielding a confidence interval of 0.78 to 0.92. It should be noted that the SE depends in part on sample size: the greater the number of observations measured, the smaller the expected standard error. Although kappa can be calculated for fairly small sample sizes (e.g., 5), the CI for such studies is likely to be so wide that "no agreement" may fall within it. As a general heuristic, the sample size should not consist of fewer than 30 comparisons. Sample sizes of 1,000 or more are mathematically the most likely to produce very narrow CIs, which means the estimate of agreement is likely to be very precise. To this point, the discussion has assumed that the majority of raters were correct, that the minority raters were wrong in their scores, and that all raters made a deliberate scoring choice. Jacob Cohen recognized that this assumption could be false.
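The interval arithmetic above can be verified with a short sketch. The kappa of 0.85 and SE of 0.037 are the article's worked values; the helper function itself is an illustrative implementation of kappa ± z × SE, not code from the article:

```python
def kappa_confidence_interval(kappa, se, z=1.96):
    """Confidence interval as kappa minus/plus z times the standard error.

    z = 1.96 corresponds to the 95% confidence level discussed in the text.
    """
    return kappa - z * se, kappa + z * se

lower, upper = kappa_confidence_interval(0.85, 0.037)
print(round(lower, 5), round(upper, 5))  # 0.77748 0.92252
print(round(lower, 2), round(upper, 2))  # 0.78 0.92
```

A larger SE (as expected with small samples) widens this interval symmetrically, which is why a study with only a handful of comparisons can produce a CI that reaches down to zero.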

Indeed, he stated expressly that "in the typical situation, there is no criterion for the 'correctness' of judgments" (5). Cohen raised the possibility that, for at least some of the variables, none of the raters was sure of the correct score and simply made random guesses. In that case, the agreement achieved is spurious. Cohen's kappa was designed to address this concern.

Calculation of percent agreement (fictitious data). Percent agreement across multiple data collectors (fictitious data).

There are a number of statistics that have been used to measure interrater and intrarater reliability. A partial list includes percent agreement, Cohen's kappa (for two raters), Fleiss' kappa (an adaptation of Cohen's kappa for three or more raters), the contingency coefficient, Pearson's r and Spearman's rho, the intraclass correlation coefficient, the concordance correlation coefficient, and Krippendorff's alpha (useful when there are multiple raters and multiple ratings). The use of correlation coefficients such as Pearson's r can be a poor reflection of agreement among raters, leading to extreme over- or underestimation of the true level of interrater agreement (6).
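For more than two raters, as in the multiple-data-collector table mentioned above, percent agreement is often computed as the average agreement across all rater pairs. This is an illustrative sketch of that pairwise approach, not the article's own procedure or data:

```python
from itertools import combinations

def pairwise_percent_agreement(ratings_by_rater):
    """Mean, over all rater pairs, of the proportion of cases they score identically."""
    per_pair = [
        sum(x == y for x, y in zip(a, b)) / len(a)
        for a, b in combinations(ratings_by_rater, 2)
    ]
    return sum(per_pair) / len(per_pair)

# Fictitious scores from three data collectors on three cases
r1 = ["y", "y", "n"]
r2 = ["y", "n", "n"]
r3 = ["y", "y", "y"]
print(round(pairwise_percent_agreement([r1, r2, r3]), 3))  # 0.556
```

Here the three pairs agree on 2/3, 2/3, and 1/3 of the cases respectively, averaging to 5/9. Fleiss' kappa, listed above, extends this multi-rater setting by additionally correcting for chance agreement.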

In this paper, we will consider only two of the most common measures: percent agreement and Cohen's kappa.