The Kruskal-Wallis test is an inferential technique for assessing two competing hypotheses about the population locations (mean ranks) of k groups. Specifically, it assesses whether the k population locations (mean ranks) are equal. If there is significant evidence that the population locations differ, we conduct post hoc tests to investigate where the differences lie.
The Kruskal-Wallis test evaluates the equality of the k population locations (mean ranks) from observed data, assuming that the observations are independent and representative of the population of interest.
The Kruskal-Wallis test compares the observed data with what we would expect under the null hypothesis that all group locations (mean ranks) are equal. The resulting p-value is the probability of observing evidence against the null hypothesis at least as strong as the evidence in our data, assuming the null hypothesis is true. If the p-value is less than the specified significance level (e.g., 0.05), we reject the null hypothesis in favor of the alternative hypothesis; in that case, we have significant evidence that at least one population location (mean rank) is different. Otherwise, we fail to reject the null hypothesis, meaning we do not have significant evidence that any population location (mean rank) differs.
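As a rough illustration of the statistic behind this decision rule, here is a minimal Python sketch (standard library only) of the textbook Kruskal-Wallis formula H = 12 / (N(N+1)) * sum of n_i * (Rbar_i - (N+1)/2)^2, divided by the usual tie-correction factor, with the p-value taken from the large-sample chi-square approximation with k - 1 degrees of freedom. The function names are hypothetical, and the app's own implementation may differ in details.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank the pooled values 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """P(X >= x) for a chi-square variable with integer df, via closed forms."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # odd df: erfc term plus a finite series
    total = math.erfc(math.sqrt(half))
    if df > 1:
        term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
        total += term
        for i in range(2, (df - 1) // 2 + 1):
            term *= x / (2 * i - 1)
            total += term
    return total


def kruskal_wallis(*groups):
    """Return (H, p) for k independent samples, with the tie correction."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        r = ranks[start:start + len(g)]
        start += len(g)
        h += len(g) * (sum(r) / len(g) - (n + 1) / 2) ** 2
    h *= 12.0 / (n * (n + 1))
    # divide by the tie-correction factor 1 - sum(t^3 - t) / (N^3 - N)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)
    return h, chi2_sf(h, len(groups) - 1)


print(kruskal_wallis([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

The closed-form chi-square tail keeps the sketch dependency-free; a statistics library would normally supply this function.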
If the Kruskal-Wallis test result is significant, we can conduct post hoc tests to investigate which population locations differ. The post hoc test for the Kruskal-Wallis test is Dunn's test, and we account for multiple simultaneous comparisons by applying a Bonferroni or Benjamini-Hochberg correction.
Note that in the two-sample case we employ the Wilcoxon rank-sum test, which also produces a confidence interval for the difference in locations that aids interpretation. For two groups, the Wilcoxon rank-sum test is numerically indistinguishable from the Kruskal-Wallis test and Dunn's test: without a continuity correction, the Kruskal-Wallis statistic equals the square of the rank-sum z statistic, so all three give the same p-value.
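To make the post hoc step concrete, here is a minimal sketch of Dunn's z statistic for every pair of groups, using the usual tie-corrected variance [N(N+1)/12 - sum(t^3 - t) / (12(N - 1))] * (1/n_a + 1/n_b), together with Bonferroni and Benjamini-Hochberg adjustments of the two-sided p-values. The function names and the toy data are hypothetical; the software behind the app may report p-values or apply the corrections somewhat differently.

```python
import math
from collections import Counter
from itertools import combinations


def average_ranks(values):
    """Rank the pooled values 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def dunn_test(groups):
    """Two-sided Dunn z-tests for every pair in a dict {name: values}."""
    names = list(groups)
    pooled = [v for name in names for v in groups[name]]
    n = len(pooled)
    ranks = average_ranks(pooled)
    mean_rank, size, start = {}, {}, 0
    for name in names:
        m = len(groups[name])
        mean_rank[name] = sum(ranks[start:start + m]) / m
        size[name] = m
        start += m
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    base_var = n * (n + 1) / 12 - ties / (12 * (n - 1))
    results = []
    for a, b in combinations(names, 2):
        se = math.sqrt(base_var * (1 / size[a] + 1 / size[b]))
        z = (mean_rank[a] - mean_rank[b]) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        results.append(((a, b), z, p))
    return results


def bonferroni(pvals):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up, kept monotone)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


results = dunn_test({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
raw = [p for _, _, p in results]
for (pair, z, _), p_adj in zip(results, benjamini_hochberg(raw)):
    print(pair, round(z, 3), round(p_adj, 4))
```

Bonferroni controls the family-wise error rate and is more conservative; Benjamini-Hochberg controls the false discovery rate and usually flags more differences.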
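The two-group equivalence can be checked numerically. This sketch computes the normal-approximation rank-sum z statistic (tie-corrected variance, no continuity correction) and the tie-corrected Kruskal-Wallis H for the same two samples; the function names and data values are illustrative assumptions, not output from the app.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank the pooled values 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_z(x, y):
    """Wilcoxon rank-sum z (normal approximation, no continuity correction)."""
    pooled = list(x) + list(y)
    n1, n2, n = len(x), len(y), len(x) + len(y)
    w = sum(average_ranks(pooled)[:n1])  # rank sum of the first sample
    expected = n1 * (n + 1) / 2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    return (w - expected) / math.sqrt(var)


def kruskal_h(x, y):
    """Tie-corrected Kruskal-Wallis H for two groups."""
    pooled = list(x) + list(y)
    n = len(pooled)
    ranks = average_ranks(pooled)
    h = 0.0
    for r in (ranks[:len(x)], ranks[len(x):]):
        h += len(r) * (sum(r) / len(r) - (n + 1) / 2) ** 2
    h *= 12 / (n * (n + 1))
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


x = [3.1, 4.5, 4.5, 6.0, 7.2]
y = [1.0, 2.2, 4.5, 5.1]
z = rank_sum_z(x, y)
print(kruskal_h(x, y), z ** 2)  # H and z**2 agree up to floating-point rounding
```

Because H = z^2, the chi-square p-value with 1 degree of freedom equals the two-sided normal p-value for z.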
Step 1: To use this app, go to the 'Dataset & Hypothesis' tab and upload your dataset as a .csv file.
Step 2: Check the assumptions in the 'Summary & Assumptions Check' tab.
Step 3: View the results of the Kruskal-Wallis or Wilcoxon rank-sum procedure (test statistic, decision, and test visualization) in the 'Hypothesis Test' tab.
Step 4 (Optional): If the hypothesis test produces a significant result, you can view the results of the appropriate post hoc procedures in the 'Post Hoc' tab.
If you have any questions, please contact us at datascience@colgate.edu.
Within the Kruskal-Wallis test app, we provide the penguin data, which include measurements of penguins inhabiting islands in the Palmer Archipelago and are available through the palmerpenguins package for R (Gorman et al., 2014). Suppose researchers aim to evaluate whether the penguin species (Adelie, Chinstrap, and Gentoo) differ in flipper length (mm).
Here, we have three samples of observations (the species) and a continuous attribute (flipper length). We will use the Kruskal-Wallis test to evaluate whether the data support the claim that at least one species has a different population location (mean rank of flipper length).
First, we load the Kruskal-Wallis test app. Second, we click 'Sample Data' to load the penguin data. Once the data are loaded, we select the variable (flipper_length_mm) and the sample (species). The data summary provides our first look at the data.
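Conceptually, choosing a variable and a sample splits one numeric column into one group per level of another column. The sketch below shows that step for a CSV file using Python's csv module; the function name is hypothetical, skipping non-numeric entries such as "NA" is an assumption about how missing values are handled, and the tiny inline table uses illustrative values, not real observations.

```python
import csv
import io
from collections import defaultdict


def groups_from_csv(handle, variable, sample):
    """Collect the numeric `variable` column into one list per level of `sample`.

    Rows whose variable entry is missing or non-numeric (e.g. "NA") are skipped.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        try:
            value = float(row[variable])
        except (TypeError, ValueError):
            continue  # skip missing or non-numeric entries
        groups[row[sample]].append(value)
    return dict(groups)


# Illustrative values only, using the same column names as the penguin data.
demo = io.StringIO(
    "species,flipper_length_mm\n"
    "Adelie,181\nAdelie,NA\nGentoo,217\nChinstrap,192\n"
)
print(groups_from_csv(demo, "flipper_length_mm", "species"))
```

Each resulting list can then be passed as one sample to a Kruskal-Wallis routine.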
This plot shows that, within each species, the data are roughly normally distributed: the densities are symmetric and bell-shaped. Unlike with ANOVA, we do not need to check whether the variances are similar or whether the data are normally distributed. Still, evaluating whether the observations are independent and representative of the population of interest is more challenging. These data were collected from many penguin nests across three islands in the Palmer Archipelago, so they are likely representative. We trust that the researchers collected the data in a way that made the observations nearly independent.
The 'Hypothesis Test' tab shows the result of the Kruskal-Wallis test. As we might expect after observing the data summaries, there is significant evidence that the population locations (mean ranks of flipper lengths) differ across species (χ²=244.8905, p<0.0001). This tells us that at least one population location (mean rank of flipper length) is different, but not which population locations or in what direction.
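With three species, the chi-square reference distribution has k - 1 = 2 degrees of freedom, and for 2 degrees of freedom the upper-tail probability has the closed form P(χ² ≥ H) = exp(-H/2). Assuming the app uses this standard large-sample approximation, a quick calculation with the reported statistic shows why the p-value is displayed only as p<0.0001:

```python
import math

h = 244.8905  # Kruskal-Wallis statistic reported for the penguin data
# three species -> k - 1 = 2 degrees of freedom;
# for df = 2 the chi-square upper tail is exp(-h / 2)
p = math.exp(-h / 2)
print(f"p = {p:.2e}")
```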
To evaluate differences among the populations, open the 'Post Hoc' tab. All pairwise differences are statistically significant. This is not surprising: the data summaries show that Gentoo penguins have visibly different flipper lengths (mm). While Chinstrap and Adelie penguins are closer in flipper length (mm), their difference is still statistically significant based on our observations.
Gorman, K. B., Williams, T. D., & Fraser, W. R. (2014). Ecological sexual dimorphism and environmental variability within a community of Antarctic penguins (genus Pygoscelis). PLoS ONE, 9(3), e90081. https://doi.org/10.1371/journal.pone.0090081
Within the Kruskal-Wallis test app, we provide the MFAP4 data, which include measurements for Hepatitis C patients collected by the German Network of Excellence for Viral Hepatitis and studied by Bracht et al. (2016). These researchers aimed to evaluate whether the level of human microfibrillar-associated protein 4 (MFAP4, U/ml) varies across the disease (fibrosis) stages of Hepatitis C (0, 1, 2, 3, 4). Researchers can use the Kruskal-Wallis test to evaluate MFAP4 as a biomarker for these disease stages.
Here, we have five samples of observations (the disease stages) and a continuous attribute (MFAP4, U/ml). We will use the Kruskal-Wallis test to evaluate whether the data support the claim that MFAP4 levels vary across disease stages, i.e., that at least one disease stage has a different population location (mean rank of MFAP4 U/ml).
First, we load the Kruskal-Wallis test app. Second, we click 'Sample Data' to load the MFAP4 data. Once the data are loaded, we select the variable (MFAP4) and the sample (Fibrosis.Stage). The data summary provides our first look at the data.
At this point, we observe that the data are heavily skewed and that the variances differ across stages. Unlike with ANOVA, we do not need to check whether the variances are similar or whether the data are normally distributed. However, evaluating whether the observations are independent and representative of the population of interest is more challenging. In their paper, Bracht et al. (2016) report that these data were collected at different sites using a protocol meant to reduce bias, so the data are likely representative. We trust that the researchers collected the data in a way that made the observations nearly independent.
The 'Hypothesis Test' tab shows the result of the Kruskal-Wallis test. As we might expect after observing the data summaries, there is significant evidence that the population locations (mean ranks of MFAP4 U/ml) differ across disease stages (χ²=117.7364, p<0.0001). This tells us that at least one population location is different, but not which population location(s) or in what direction.
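With five stages, the reference distribution has k - 1 = 4 degrees of freedom, and for 4 degrees of freedom the chi-square upper tail has the closed form exp(-H/2) * (1 + H/2). Again assuming the standard large-sample chi-square approximation, this sketch turns the reported statistic into a p-value:

```python
import math

h = 117.7364  # Kruskal-Wallis statistic reported for the MFAP4 data
# five stages -> k - 1 = 4 degrees of freedom;
# for df = 4 the chi-square upper tail is exp(-h / 2) * (1 + h / 2)
p = math.exp(-h / 2) * (1 + h / 2)
print(f"p = {p:.2e}")
```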
To evaluate differences among the populations, open the 'Post Hoc' tab. We see that stage 0 differs from stages 2-4; stage 1 differs from stages 2-4; stage 2 differs from stages 0-1 and 3-4; stage 3 differs from stages 0-2; and stage 4 differs from stages 0-2. That is, the post hoc comparisons following the Kruskal-Wallis test suggest the groupings (0-1), 2, and (3-4). Note that this result differs slightly from that of the ANOVA procedure, which Bracht et al. (2016) use to suggest that MFAP4 is a promising biomarker for distinguishing patients with no to moderate hepatic fibrosis (stages 0-2) from patients with severe fibrosis and cirrhosis (stages 3-4).
Looking at the graphical summary of the data, we see that stage 2 patients have slightly higher MFAP4 U/ml values at the median but a distribution similar to that of stages 0-1. This difference is not statistically significant under the ANOVA procedure, but it is significant under the Kruskal-Wallis procedure. That is, we need to consider the full picture of the data, not just whether the differences are statistically significant.
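One simple way to turn pairwise post hoc decisions into groupings such as (0-1), 2, and (3-4) is to link any two stages whose difference is not significant and collect the connected stages. This is a simplification assumed for illustration, not necessarily how the app or Bracht et al. (2016) form groups; the non-significant pairs below are the ones implied by the comparisons listed in the MFAP4 post hoc results, and the function name is hypothetical.

```python
def group_levels(levels, not_significant):
    """Merge levels connected by non-significant pairwise differences.

    Simplification: if "not different" is not transitive, connected
    components can merge levels that do differ significantly.
    """
    parent = {lvl: lvl for lvl in levels}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a, b in not_significant:
        parent[find(a)] = find(b)  # union the two groups

    groups = {}
    for lvl in levels:
        groups.setdefault(find(lvl), []).append(lvl)
    return list(groups.values())


stages = [0, 1, 2, 3, 4]
# Pairs not reported as significantly different in the MFAP4 post hoc results
print(group_levels(stages, [(0, 1), (3, 4)]))  # -> [[0, 1], [2], [3, 4]]
```

Compact letter displays are a more careful alternative when non-significant pairs overlap.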
Bracht, T., Molleken, C., Ahrens, M., Poschmann, G., Schlosser, A., Eisenacher, M., ... & Sitek, B. (2016). Evaluation of the biomarker candidate MFAP4 for non-invasive assessment of hepatic fibrosis in hepatitis C patients. Journal of Translational Medicine, 14(1), 1-9.
Within the Kruskal-Wallis test app, we provide U.S. News and World Report's College Data, which include measurements for many U.S. colleges from the 1995 issue of U.S. News and World Report and are available through the ISLR package for R (James et al., 2017). Suppose we aim to evaluate whether alumni donate at different rates at private and public colleges and universities.
Here, we have two samples of observations (private/public) and a discrete attribute (percentage of alumni who donate). We will use the Kruskal-Wallis test, in its two-sample form (the Wilcoxon rank-sum test), to evaluate whether the data support the claim that the population locations (mean ranks of the percentage of alumni who donate) differ between types of colleges and universities.
First, we load the Kruskal-Wallis test app. Second, we click 'Sample Data' to load the U.S. News College data. Once the data are loaded, we select the variable (perc.alumni) and the sample (private). The data summary provides our first look at the data.
We note that our data are discrete: there are only 61 unique values for the percentage of alumni donating among 777 institutions. The Kruskal-Wallis test can suffer when there is a substantial number of ties, but it performs reasonably in cases like this one, where the number of ties is low or moderate.
The 'Hypothesis Test' tab shows the result of the Wilcoxon rank-sum test, the two-sample version of the Kruskal-Wallis test. As we might expect after observing the data summaries, there is significant evidence that the population locations (mean ranks of the percentage of alumni who donate) differ between types of colleges and universities (Z=-12.1008, p<0.0001). With only two groups, this tells us that the two population locations differ; the confidence interval in the 'Post Hoc' tab shows in which direction.
To evaluate the difference between the populations, open the 'Post Hoc' tab. Private institutions have a higher percentage of alumni donating than non-private institutions because the confidence interval for the difference (Private - Not Private) lies entirely above zero. This is not surprising, because the graphical and numerical summaries show that the mean and median are much larger for private institutions. That is, the difference is both statistically significant and large.
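One way to gauge how much ties matter is the standard Kruskal-Wallis tie-correction factor 1 - sum(t^3 - t) / (N^3 - N), where t is the size of each group of tied values; H is divided by this factor, so a value close to 1 means ties have little effect. The sketch below, with a hypothetical function name and toy values, counts distinct values and computes that factor.

```python
from collections import Counter


def tie_summary(values):
    """Return (number of distinct values, Kruskal-Wallis tie-correction factor)."""
    n = len(values)
    counts = Counter(values)
    ties = sum(t ** 3 - t for t in counts.values())  # untied values contribute 0
    correction = 1 - ties / (n ** 3 - n)  # H is divided by this factor
    return len(counts), correction


print(tie_summary([10, 10, 20, 30, 30, 30]))
```

Applied to the perc.alumni column, the same two numbers would show how many distinct values occur and how strongly the ties adjust H.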
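For a normal-approximation Z statistic, the two-sided p-value is 2 * P(Z ≥ |z|), which equals erfc(|z| / sqrt(2)). Assuming the reported Z is referred to the standard normal distribution, this sketch confirms that the reported value lies far beyond the p<0.0001 threshold:

```python
import math

z = -12.1008  # rank-sum statistic reported for the college data
# two-sided normal p-value: 2 * P(Z >= |z|) = erfc(|z| / sqrt(2))
p = math.erfc(abs(z) / math.sqrt(2))
print(f"p = {p:.2e}")
```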
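A common way to estimate the shift between two groups alongside the rank-sum test is the Hodges-Lehmann estimate, the median of all pairwise differences x_i - y_j, with confidence limits taken from order statistics of those differences. The sketch below uses a textbook normal approximation for choosing the order statistics and ignores ties; the function name and toy data are hypothetical, and R's wilcox.test or the app may compute the interval differently.

```python
import math
from statistics import NormalDist, median


def hodges_lehmann_ci(x, y, level=0.95):
    """Shift estimate and approximate CI for location(x) - location(y).

    Uses all pairwise differences x_i - y_j; the CI endpoints are order
    statistics chosen with a normal approximation (no tie correction).
    """
    diffs = sorted(a - b for a in x for b in y)
    m, n = len(x), len(y)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    k = math.floor(m * n / 2 - z * math.sqrt(m * n * (m + n + 1) / 12))
    k = max(k, 1)  # guard for very small samples
    estimate = median(diffs)
    # k-th smallest and k-th largest pairwise differences
    return estimate, (diffs[k - 1], diffs[m * n - k])


group_a = [10, 11, 12, 13, 14]  # toy values, e.g. "Private"
group_b = [1, 2, 3, 4, 5]       # toy values, e.g. "Not Private"
print(hodges_lehmann_ci(group_a, group_b))
```

If the whole interval lies above zero, as with these toy values, the first group tends to have larger values than the second.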
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2017). ISLR: Data for an introduction to statistical learning with applications in R (R package version 1.2). https://CRAN.R-project.org/package=ISLR