This section addresses the statistical analysis of the data obtained throughout
the main, within-subject, user study (n = 24) that consisted of five repeated
measurements. Because the data was from related, dependent groups, we used
\textit{Repeated Measures \gls{ANOVA}} if all required assumptions were met
and \textit{Friedman's Test} otherwise. To identify the specific pairs of
treatments that differed significantly, we ran either \textit{Dependent T-Tests}
or \textit{Wilcoxon Signed Rank Tests} (both with \textit{Holm correction
(sequentially rejective Bonferroni test)} \cite{holm_correction}) as post-hoc
tests \cite{field_stats, downey_stats}. The reliability of the two sub-scales
(hedonic and pragmatic quality) in the \glsfirst{UEQ-S} was estimated using
\textit{Cronbach's alpha} \cite{tavakol_cronbachs_alpha}. Results are reported
as statistically significant at an $\alpha$-level of 0.05 ($p < 0.05$). We used
95\% confidence intervals in visualizations of certain results. Normality of
the data or residuals was checked by visual assessment of \gls{Q-Q} plots and,
additionally, with the \textit{Shapiro-Wilk} Test \cite{field_stats, downey_stats}.
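
The Holm correction applied to the post-hoc tests can be illustrated with a short sketch. The code below is not the analysis script used for this study; the function names \texttt{holm\_adjust} and \texttt{holm\_reject} are hypothetical, and only the Python standard library is assumed. It multiplies the $k$-th smallest of $m$ raw p-values by $m - k + 1$, caps the result at 1, and enforces that adjusted values never decrease with rank.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order (illustrative helper)."""
    m = len(p_values)
    # Indices of the raw p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is scaled by m - k + 1.
        scaled = min(1.0, (m - rank) * p_values[idx])
        # Keep the adjusted values monotone in the sorted order.
        running_max = max(running_max, scaled)
        adjusted[idx] = running_max
    return adjusted


def holm_reject(p_values, alpha=0.05):
    """Flag comparisons whose adjusted p-value is below alpha (p < alpha, as in the text)."""
    return [p < alpha for p in holm_adjust(p_values)]
```

With four hypothetical raw p-values 0.01, 0.04, 0.03 and 0.005, only the first and the last comparison remain significant after adjustment, although all four are below 0.05 before it.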
\subsubsection{Own Keyboard \& Reference Values}
\label{sec:res_OPC}
As mentioned in Section~\ref{sec:main_design}, the keyboard \textit{Own} was
used as a reference for some metrics captured during the experiment. Since the
measurements with \textit{Own} took place at the start (T0\_1) and end (T0\_2)
of the experiment, we compared the results of both typing tests to detect
possible variations in performance due to fatigue. Dependent T-tests showed no
significant difference in \glsfirst{KSPS} between T0\_1 (M = 5.39, sd = 1.49)
and T0\_2 (M = 5.47, sd = 1.48, t = $-1.53$, p = 0.139). \glsfirst{UER} was
negligible overall, with T0\_1 (M = 0.005, sd = 0.013, 85th percentile =
0.0051) and T0\_2 (M = 0.008, sd = 0.028, 85th percentile = 0.0052).
\glsfirst{WPM} approached, but did not reach, significance between T0\_1 (M =
54.2, sd = 14.7) and T0\_2 (M = 53.0, sd = 14.5, t = 1.92, p = 0.067). The
dependent T-tests did, however, reveal statistically significant differences
between T0\_1 and T0\_2 in \glsfirst{AdjWPM} (T0\_1: M = 53.9, sd = 14.5;
T0\_2: M = 52.5, sd = 14.3, t = 2.44, p = 0.023), \glsfirst{CER} (T0\_1: M =
0.057, sd = 0.028; T0\_2: M = 0.078, sd = 0.038, t = $-3.54$, p = 0.002) and
\glsfirst{TER} (T0\_1: M = 0.063, sd = 0.031; T0\_2: M = 0.086, sd = 0.039,
t = $-4.27$, p = 0.0003). Because of these differences, we decided to use, for
each participant, the mean of the T0\_1 and T0\_2 values of every metric as
the reference value to compute the \textit{\gls{OPC}} for the test keyboards
(\textit{Athena, Aphrodite, Nyx} and \textit{Hera}).
Additionally, using a dependent T-test, we compared the muscle activity (\% of
\glsfirst{MVC}) and found a significant difference in left flexor
(\glsfirst{FDP} \& \glsfirst{FDS}) \%\gls{MVC} between T0\_1 (M = 12.0, sd =
8.27) and T0\_2 (M = 8.53, sd = 7.16, t = 3.18, p = 0.004). The residuals for
the right flexor (\gls{FDP} \& \gls{FDS}) were not normally distributed; we
therefore used the Wilcoxon Signed Rank Test and found a significant difference
between T0\_1 (M = 10.8, sd = 8.18, Med = 9.52) and T0\_2 (M = 7.71, sd = 6.08,
Med = 5.32, p = 0.021). Note that two erroneous measurements had to be removed
for the right flexor (n = 22). No significant differences were found in left or
right extensor (\glsfirst{ED}) \%\gls{MVC} between T0\_1 and T0\_2.
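
The test statistic of the dependent T-tests reported in this subsection can be sketched as follows. This is a minimal illustration that assumes only the Python standard library; the function name \texttt{paired\_t} is hypothetical and the code is not the script used for the analysis. It computes the statistic and the degrees of freedom only; a p-value would additionally require the t-distribution, which the standard library does not provide.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t, df, mean_diff) for a dependent (paired) t-test (illustrative helper)."""
    if len(first) != len(second):
        raise ValueError("paired samples must have equal length")
    # Per-participant differences, e.g. T0_1 minus T0_2.
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample SD uses n - 1).
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / se, n - 1, mean_diff
```

A positive statistic indicates that the first measurement was larger on average, which is why the error rates above yield negative values for T0\_1 compared to T0\_2.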
\begin{table}[ht]
\centering
\ra{1.3}
\begin{tabular}{?l^l^l^l^l^l^l^l}
\toprule
\rowstyle{\itshape}
Metric & Comparison & Statistic & p & Estimate & 95\% CI & Method & Alternative \\
\midrule
WPM & T0\_1 $-$ T0\_2 & 1.92 & 0.067 & 1.18 & [$-0.09$, 2.45] & T-test & two-sided \\
AdjWPM & T0\_1 $-$ T0\_2 & 2.44 & 0.023* & 1.35 & [0.21, 2.50] & T-test & two-sided \\
KSPS & T0\_1 $-$ T0\_2 & $-1.53$ & 0.139 & $-0.08$ & [$-0.19$, 0.03] & T-test & two-sided \\
CER & T0\_1 $-$ T0\_2 & $-3.54$ & 0.002* & $-0.02$ & [$-0.03$, $-0.01$] & T-test & two-sided \\
TER & T0\_1 $-$ T0\_2 & $-4.27$ & $<$ 0.001* & $-0.02$ & [$-0.03$, $-0.01$] & T-test & two-sided \\
\%MVC$_{\mathrm{LF}}$ & T0\_1 $-$ T0\_2 & 3.18 & 0.004* & 3.44 & [1.20, 5.68] & T-test & two-sided \\
\%MVC$_{\mathrm{LE}}$ & T0\_1 $-$ T0\_2 & 1.44 & 0.163 & 0.96 & [$-0.42$, 2.33] & T-test & two-sided \\
\%MVC$_{\mathrm{RF}}$ & T0\_1 $-$ T0\_2 & -- & 0.021* & -- & -- & Wilcoxon & two-sided \\
\%MVC$_{\mathrm{RE}}$ & T0\_1 $-$ T0\_2 & -- & -- & -- & -- & T-test & two-sided \\
\bottomrule
\end{tabular}
\end{table}
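
As a plausibility check, the confidence intervals in the table can be recovered from the estimate and the test statistic of a dependent T-test. The following is a brief sketch for \gls{WPM}, assuming the full sample ($n = 24$, i.e. 23 degrees of freedom), a 95\% confidence level and the rounded values shown in the table; the symbols $\bar{d}$ (mean difference) and $SE$ (its standard error) are introduced here for illustration only.

\[
SE = \frac{\bar{d}}{t} = \frac{1.18}{1.92} \approx 0.615
\]
\[
\bar{d} \pm t_{0.975,\,23} \cdot SE \approx 1.18 \pm 2.069 \cdot 0.615 \approx 1.18 \pm 1.27 = [-0.09,\ 2.45]
\]

This agrees with the interval reported for \gls{WPM}, which contains zero and is therefore consistent with the non-significant result.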
\subsection{Performance Metrics}
\label{sec:res_perf}
\subsubsection{Typing Speed}
\label{sec:res_typing_speed}
The typing speed for each individual keyboard and typing test was captured
automatically by the typing test functionality of \glsfirst{GoTT}. We captured
\gls{WPM}, \gls{AdjWPM} and \gls{KSPS} according to the formulas given in
Section~\ref{sec:meas_perf}. The individual measurements were then converted
into percentages of the mean reference values gathered from the typing tests
with keyboard \textit{Own}. None of the data gathered for the individual
treatments was normally distributed; thus, Friedman's Test was applied.
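
The conversion into percentages of the reference values can be sketched as follows. This is a minimal illustration, assuming that the reference for a participant is the mean of the two \textit{Own} sessions (T0\_1 and T0\_2); the function names \texttt{reference\_value} and \texttt{percent\_of\_reference} are hypothetical and not taken from the analysis scripts.

```python
def reference_value(own_first, own_second):
    """Mean of the two Own-keyboard sessions (T0_1 and T0_2) for one participant."""
    return (own_first + own_second) / 2


def percent_of_reference(measurement, reference):
    """Express a test-keyboard measurement as a percentage of the reference value."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return 100.0 * measurement / reference
```

For a hypothetical participant with 54.0 and 53.0 \gls{WPM} in the two \textit{Own} sessions, the reference is 53.5, so a test-keyboard result of 48.15 \gls{WPM} corresponds to 90\% of the reference.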