% A rapid method that creates many corrected errors, has efficient error correction, and leaves
% few uncorrected errors can still be considered a successful method, since it produces
% accurate text in relatively little time. pp. 56 MacKenzie
\section{Results}
\label{sec:results}
This section presents the statistical analysis of the data obtained in the
main within-subject user study (n = 24), which consisted of five repeated
measurements. Because the data came from related, dependent groups, we used
\textit{\gls{rmANOVA}} if all required assumptions were met and
\textit{Friedman's Test} otherwise. To identify the specific pairs of treatments
that differed significantly, we ran either \textit{Dependent T-Tests} or
\textit{Wilcoxon Signed Rank Tests} (both with \textit{Holm correction
(sequentially rejective Bonferroni test)} \cite{holm_correction}) as post-hoc
tests \cite{field_stats, downey_stats}. The reliability of the two sub-scales
(hedonic and pragmatic quality) in the \glsfirst{UEQ-S} was estimated using
\textit{Cronbach's alpha} \cite{tavakol_cronbachs_alpha}. Results are
reported as statistically significant at a significance level of $\alpha = 0.05$
(i.e., $p < 0.05$). Where confidence intervals are presented, they are 95\%
confidence intervals. Normality of data or
residuals was checked by visual assessment of \gls{Q-Q} plots and
additionally with the \textit{Shapiro-Wilk} Test. Further, we used \textit{Mauchly's Test
for Sphericity} to evaluate whether the variances of the differences between
all pairs of conditions differed significantly \cite{field_stats,
downey_stats}.
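
To make the Holm correction concrete, the procedure can be sketched as follows.
The symbols $m$, $p_{(j)}$ and $\tilde{p}_{(i)}$ are introduced here for
illustration only. For $m$ pairwise comparisons with ordered unadjusted p values
$p_{(1)} \leq \dots \leq p_{(m)}$, the adjusted p values can be written as
\[
  \tilde{p}_{(i)} = \max_{j \leq i} \, \min\bigl\{ (m - j + 1)\, p_{(j)},\; 1 \bigr\},
  \qquad i = 1, \dots, m .
\]
The smallest p value is thus multiplied by $m$, the second smallest by $m - 1$,
and so on, while the running maximum keeps the adjusted values in the same order
as the unadjusted ones. For the four test keyboards there are
$m = 4 \cdot 3 / 2 = 6$ pairwise comparisons.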
\subsection{Own Keyboard}
\label{sec:res_OPC}
As mentioned in Section \ref{sec:main_design}, the keyboard \textit{Own} was
used as a reference for some metrics captured during the experiment. Since the
measurements with \textit{Own} took place at the start (T0\_1) and end (T0\_2)
of the experiment, we compared the results of both typing tests to detect
possible variations in performance due to fatigue. Using dependent T-tests, we
found no significant difference in \glsfirst{KSPS} between T0\_1 (M
= 5.39, sd = 1.49) and T0\_2 (M = 5.47, sd = 1.48, t = -1.53, p =
0.139). \glsfirst{UER} was negligible overall in both T0\_1 (M = 0.005, sd = 0.013,
85th percentile = 0.0051) and T0\_2 (M = 0.008, sd = 0.028, 85th percentile =
0.0052), and \glsfirst{WPM} showed a trend towards significance between T0\_1 (M
= 54.2, sd = 14.7) and T0\_2 (M = 53.0, sd = 14.5, t = 1.92, p =
0.067). Further, the dependent T-tests revealed statistically
significant differences in \glsfirst{AdjWPM} between T0\_1 (M = 53.9, sd = 14.5) and
T0\_2 (M = 52.5, sd = 14.3, t = 2.44, p = 0.023), in \glsfirst{CER} between T0\_1 (M =
0.057, sd = 0.028) and T0\_2 (M = 0.078, sd = 0.038, t = -3.54, p = 0.002) and
in \glsfirst{TER} between T0\_1 (M = 0.063, sd = 0.031) and T0\_2 (M = 0.086, sd =
0.039, t = -4.27, p = 0.0003). Because of these differences, we decided to use the
means of all metrics gathered for each participant through T0\_1 and T0\_2 as
the reference values to compute the \textit{\gls{OPC}} for the test keyboards
(\textit{Athena, Aphrodite, Nyx} and \textit{Hera}). This value was later used
to make statements about the performance of the individual test keyboards
compared to the participant's own, familiar keyboard.
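Assuming that the \gls{OPC} expresses a measurement as a percentage of this
reference value, the conversion for one participant and one metric can be
sketched as follows, where $x_{\mathrm{kb}}$ denotes the value measured with a
test keyboard and $x_{\mathrm{T0,1}}$, $x_{\mathrm{T0,2}}$ the values measured
with \textit{Own} (illustrative notation introduced here):
\[
  \mathrm{OPC} = 100 \cdot
  \frac{x_{\mathrm{kb}}}{\frac{1}{2}\left(x_{\mathrm{T0,1}} + x_{\mathrm{T0,2}}\right)} .
\]
For instance, a hypothetical participant with 54 and 52 \gls{WPM} in T0\_1 and
T0\_2 and 50 \gls{WPM} on a test keyboard would obtain
$100 \cdot 50 / 53 \approx 94.3$\,\%.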
Additionally, using a dependent T-test, we compared the muscle activity (\% of
\glsfirst{MVC}) and found significant differences in left flexor
(\glsfirst{FDP} \& \glsfirst{FDS}) \%\gls{MVC} between T0\_1 (M = 12.0, sd = 8.27)
and T0\_2 (M = 8.53, sd = 7.16, t = 3.18, p = 0.004). Residuals of the right flexor
(\gls{FDP} \& \gls{FDS}) were not normally distributed; we therefore used the
Wilcoxon Signed Rank Test and found a significant difference between T0\_1 (M =
10.8, sd = 8.18, Med = 9.52) and T0\_2 (M = 7.71, sd = 6.08, Med = 5.32, p =
0.021). Note that we had to remove two erroneous measurements for
the right flexor (n = 22). No significant differences were found in left or
right extensor (\glsfirst{ED}) \%\gls{MVC} between T0\_1 and T0\_2. All results
are listed in Table \ref{tbl:res_own_before_after}.
\begin{table}[H]
\centering
\small
\ra{1.3}
\begin{tabular}{?l^l^l^l^l^l^l^l}
\toprule
\rowstyle{\itshape}
Y & Comparison & Statistic & p & Estimate & CI & Hypothesis \\
\midrule
\multicolumn{6}{l}{\textbf{Parametric (Dependent T-test)}} \\
WPM & T0\_1 - T0\_2 & 1.92 & $0.07^\dagger$ & 1.18 & [-0.09, 2.45] & two-tailed \\
AdjWPM & T0\_1 - T0\_2 & 2.44 & $0.02^*$ & 1.35 & [0.21, 2.50] & two-tailed \\
KSPS & T0\_1 - T0\_2 & -1.53 & 0.14 & -0.08 & [-0.19, 0.03] & two-tailed \\
CER & T0\_1 - T0\_2 & -3.54 & $0.002^*$ & -0.02 & [-0.03, -0.01] & two-tailed \\
TER & T0\_1 - T0\_2 & -4.27 & $0.0003^*$ & -0.02 & [-0.03, -0.01] & two-tailed \\
$\%MVC_{LF}$ & T0\_1 - T0\_2 & 3.18 & $0.004^*$ & 3.44 & [1.20, 5.68] & two-tailed \\
$\%MVC_{LE}$ & T0\_1 - T0\_2 & 1.44 & 0.163 & 0.956 & [-0.42, 2.33] & two-tailed \\
\multicolumn{6}{l}{\textbf{Non Parametric (Wilcoxon Signed Rank Test)}} \\
$\%MVC_{RF}$ & T0\_1 - T0\_2 & 197 & $0.021^*$ & 1.83 & [0.39, 3.93] & two-tailed \\
$\%MVC_{RE}$ & T0\_1 - T0\_2 & 173 & 0.527 & 0.28 & [-0.58, 0.91] & two-tailed \\
\bottomrule
\end{tabular}
\caption{Statistical analysis of differences between typing tests T0\_1 and
T0\_2 for keyboard \textit{Own}. For $\%MVC_{RF}$ two erroneous measurements
were removed (n = 22). Statistically significant differences ($p < 0.05$) are
marked with an asterisk and p values indicating a trend towards significance
are denoted with $\dagger$. Confidence intervals are given for the estimate
of the difference in means (T-test) and the difference of the location parameter
(Wilcoxon). The subscripts LF, RF, LE and RE stand for left or right forearm
flexor or extensor muscles}
\label{tbl:res_own_before_after}
\end{table}
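As a consistency check on Table \ref{tbl:res_own_before_after}, the confidence
interval of a dependent T-test can be recovered from the reported estimate and
statistic. The following sketch uses \gls{WPM} with $n = 24$ and the two-sided
critical value $t_{0.975,\,23} \approx 2.069$. The symbols $\bar{d}$ and
$\mathrm{SE}_{\bar{d}}$ are our own shorthand for the mean difference and its
standard error:
\[
  \mathrm{SE}_{\bar{d}} = \frac{\bar{d}}{t} = \frac{1.18}{1.92} \approx 0.615,
  \qquad
  \bar{d} \pm t_{0.975,\,23} \cdot \mathrm{SE}_{\bar{d}}
  \approx 1.18 \pm 1.27 \approx [-0.09,\, 2.45] .
\]
The result agrees with the tabulated interval, and because the interval contains
zero, the difference is not significant at the 5\% level.
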
We also evaluated the means of \glsfirst{KCQ} questions 8 to 12, which concerned
perceived fatigue in fingers, wrists, arms, shoulders and neck respectively
(7-point Likert scale), and the slopes (improving, deteriorating, stable) of the
UX-curves drawn by each participant after the whole experiment, to identify
possible differences in perceived fatigue from T0\_1 to T0\_2. As shown in
Figure \ref{fig:res_own_per_fat}, participants reported in the \gls{KCQ} slight
improvements in finger (diff = 0.33) and wrist (diff = 0.33) fatigue in
T0\_2 compared to T0\_1, no difference in arm fatigue (diff = 0) and very
slightly increased fatigue in shoulder (diff = -0.12) and neck (diff = -0.13).
Sixteen of the twenty-four UX-curves regarding overall
perceived fatigue had a positive slope when measured from the start of T0\_1 to
the end of T0\_2 ($\pm$ 1 mm). The subjective reports of decreased finger and
wrist fatigue are consistent with the decrease in flexor muscle activity
described above.
\begin{figure}[H]
\centering
\includegraphics[width=1.0\textwidth]{images/res_own_per_fat}
\caption{Trends for reported fatigue through the \gls{KCQ} (questions 8:
finger, 9: wrist, 10: arm, 11: shoulder, 12: neck) and histogram for the
slopes (IM: improving, DE: deteriorating, ST: stable) of UX-curves
concerning perceived fatigue. The curves were evaluated by comparing the y
value of the starting point of T0\_1 with the y value of the end
point of T0\_2, with a margin of $\pm$ 1 mm}
\label{fig:res_own_per_fat}
\end{figure}
\subsection{Performance Metrics}
% As briefly mentioned in the last section, the individual measurements were then converted into
% percentage values of the mean of the reference values gathered from typing tests
% with keyboard \textit{Own} (\gls{OPC}).
\label{sec:res_perf}
\subsubsection{Typing Speed}
\label{sec:res_typing_speed}
The typing speed for each individual keyboard and typing test was automatically
captured with the help of the typing test functionality offered by
\glsfirst{GoTT}. We captured \gls{WPM}, \gls{AdjWPM} and \gls{KSPS} according to
the formulas mentioned in Section \ref{sec:meas_perf}. We used the mean of the
results for both typing tests performed with each keyboard to conduct the
following statistical analysis. A \gls{rmANOVA} revealed
differences between at least two of the test keyboards (\textit{Athena,
Aphrodite, Nyx} and \textit{Hera}) in terms of \gls{WPM} (F(3, 69) = 6.036, p
= 0.001). We performed dependent T-tests with Holm correction and found
significant differences between \textit{Aphrodite} (M = 51.5, sd = 14.0) and
\textit{Nyx} (M = 49.4, sd = 13.3, t = 3.33, p = 0.014), \textit{Athena} (M =
51.5, sd = 14.2) and \textit{Nyx} (M = 49.4, sd = 13.3, t = 2.76, p = 0.044) and
\textit{Hera} (M = 51.9, sd = 14.6) and \textit{Nyx} (M = 49.4, sd = 13.3, t =
3.53, p = 0.01). Further, the \gls{rmANOVA} was also significant for
\gls{AdjWPM} (F(3, 69) = 6.197, p = 0.0009) and for \gls{KSPS} (F(3, 69) =
3.566, p = 0.018). The summary of the performance data and all
relevant results of the post-hoc tests are given in
Tables \ref{tbl:sum_tkbs_speed} and \ref{tbl:res_tkbs_speed}, respectively.
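The degrees of freedom reported for the \gls{rmANOVA} follow from the design
with $k = 4$ test keyboards and $n = 24$ participants (the symbols $k$ and $n$
are used here for illustration):
\[
  \mathit{df}_1 = k - 1 = 3, \qquad
  \mathit{df}_2 = (k - 1)(n - 1) = 3 \cdot 23 = 69 .
\]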
\begin{table}[H]
\centering
\footnotesize
\ra{1.2}
\toprule
\parbox{.49\linewidth}{
\begin{tabular}{?r^l^l^l^l^l}
\multicolumn{6}{c}{\textbf{\gls{WPM}}} \\
\rowstyle{\itshape}
Pseud. & Mean & Min & Max & SD & SE \\
\midrule
Athena & 51.47 & 17.96 & 73.86 & 14.21 & 2.90 \\
Aphrodite & 51.46 & 20.76 & 76.36 & 14.01 & 2.86 \\
Nyx & 49.39 & 20.80 & 74.26 & 13.28 & 2.71 \\
Hera & 51.87 & 18.10 & 76.06 & 14.55 & 2.97 \\
\end{tabular}
}
\parbox{.49\linewidth}{
\begin{tabular}{?r^l^l^l^l^l^l^l}
\multicolumn{6}{c}{\textbf{\gls{AdjWPM}}} \\
\rowstyle{\itshape}
Pseud. & Mean & Min & Max & SD & SE \\
\midrule
Athena & 51.04 & 17.94 & 73.19 & 14.07 & 2.87 \\
Aphrodite & 50.97 & 20.76 & 75.78 & 13.95 & 2.85 \\
Nyx & 48.84 & 20.80 & 73.62 & 13.17 & 2.69 \\
Hera & 51.32 & 18.06 & 75.14 & 14.40 & 2.94 \\
\end{tabular}
}
\begin{tabular}{?r^l^l^l^l^l^l^l}
\\
\multicolumn{6}{c}{\textbf{\gls{KSPS}}} \\
\rowstyle{\itshape}
Pseud. & Mean & Min & Max & SD & SE \\
\midrule
Athena & 5.23 & 1.68 & 7.94 & 1.54 & 0.31 \\
Aphrodite & 5.32 & 2.00 & 8.14 & 1.50 & 0.31 \\
Nyx & 5.31 & 1.95 & 8.15 & 1.48 & 0.30 \\
Hera & 5.37 & 1.72 & 8.15 & 1.57 & 0.32 \\
\end{tabular}
\bottomrule
\caption{Summaries for \glsfirst{WPM}, \glsfirst{AdjWPM} and \glsfirst{KSPS} for the test keyboards}
\label{tbl:sum_tkbs_speed}
\end{table}
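The standard errors (SE) in Table \ref{tbl:sum_tkbs_speed} are consistent with
the usual definition as the standard deviation divided by the square root of
the sample size. As a worked example for the \gls{WPM} of \textit{Athena} with
$n = 24$:
\[
  \mathrm{SE} = \frac{\mathrm{SD}}{\sqrt{n}} = \frac{14.21}{\sqrt{24}} \approx 2.90 .
\]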
\begin{table}[H]
\centering
\small
\ra{1.3}
\begin{tabular}{?l^l^l^l^l^l^l^l}
\toprule
\rowstyle{\itshape}
Y & Comparison & Statistic & p & Estimate & CI & Hypothesis \\
\midrule
\multicolumn{6}{l}{\textbf{Parametric (Dependent T-test)}} \\
WPM & Athena - Nyx & 2.765 & $0.044^*$ & 2.083 & [0.52, 3.64] & two-tailed \\
WPM & Aphrodite - Nyx & 3.332 & $0.014^*$ & 2.069 & [0.78, 3.35] & two-tailed \\
WPM & Hera - Nyx & 3.541 & $0.010^*$ & 2.479 & [1.03, 3.93] & two-tailed \\
AdjWPM & Athena - Nyx & 2.868 & $0.035^*$ & 2.200 & [0.61, 3.79] & two-tailed \\
AdjWPM & Aphrodite - Nyx & 3.443 & $0.011^*$ & 2.132 & [0.85, 3.41] & two-tailed \\
AdjWPM & Hera - Nyx & 3.515 & $0.011^*$ & 2.475 & [1.02, 3.93] & two-tailed \\
KSPS & Athena - Hera & -2.834 & $0.056^\dagger$ & -0.145 & [-0.25, -0.04] & two-tailed \\
KSPS & Aphrodite - Athena & 2.566 & $0.086^\dagger$ & 0.095 & [0.02, 0.17] & two-tailed \\
\bottomrule
\end{tabular}
\caption{Relevant post-hoc results of speed related metrics for the test
keyboards. Significant p values are denoted with * and p values indicating a
trend towards significance are marked with $\dagger$. Confidence intervals
are given for the estimate in the difference in means}
\label{tbl:res_tkbs_speed}
\end{table}
\subsubsection{Error Rate}
\label{sec:res_error_rate}
\gls{GoTT} also automatically tracked various error-related metrics, of which
we analyzed \glsfirst{UER}, \glsfirst{CER} and \glsfirst{TER}. Since we were
interested in whether higher actuation forces lead to lower error rates
than lower actuation forces, we conducted one-tailed post-hoc tests for
the following statistical analyses. As in Section \ref{sec:res_typing_speed},
we used the means of the results from both typing tests for each keyboard to
conduct the analysis. Friedman's Test for \gls{TER} ($\chi^2$(3) = 25.4, p
= 0.00001) and the \gls{rmANOVA} for \gls{CER} (F(3, 69) = 13.355, p = 0.0000408
(\gls{GG})) revealed differences between at least two test keyboards. Friedman's
Test for \gls{UER} ($\chi^2$(3) = 2.59, p = 0.46) yielded no statistically
significant difference. Note that the 90th percentile of
\gls{UER} was below 1\% for all keyboards. Summaries for the individual
metrics and results for all post-hoc tests can be seen in Tables
\ref{tbl:sum_tkbs_err} and \ref{tbl:res_tkbs_err}.
\begin{table}[H]
\centering
\footnotesize
\ra{1.2}
\toprule
\begin{tabular}{?r^l^l^l^l^l}
\multicolumn{6}{c}{\textbf{\gls{TER}}} \\
\rowstyle{\itshape}
Pseud. & Mean & Min & Max & SD & SE \\
\midrule
Athena & 0.08 & 0.02 & 0.17 & 0.03 & 0.01 \\
Aphrodite & 0.09 & 0.02 & 0.20 & 0.04 & 0.01 \\
Nyx & 0.11 & 0.03 & 0.25 & 0.06 & 0.01 \\
Hera & 0.09 & 0.02 & 0.21 & 0.04 & 0.01 \\
\end{tabular}
\\
\parbox{.49\linewidth}{
\begin{tabular}{?r^l^l^l^l^l^l^l}
\multicolumn{6}{c}{\textbf{\gls{UER}}} \\
\rowstyle{\itshape}
Pseud. & Mean & Min & Max & SD & SE \\
\midrule
Athena & 0.01 & 0.00 & 0.14 & 0.03 & 0.01 \\
Aphrodite & 0.01 & 0.00 & 0.17 & 0.03 & 0.01 \\
Nyx & 0.01 & 0.00 & 0.21 & 0.04 & 0.01 \\
Hera & 0.01 & 0.00 & 0.18 & 0.04 & 0.01 \\
\end{tabular}
}
\parbox{.49\linewidth}{
\begin{tabular}{?r^l^l^l^l^l^l^l}
\multicolumn{6}{c}{\textbf{\gls{CER}}} \\
\rowstyle{\itshape}
Pseud. & Mean & Min & Max & SD & SE \\
\midrule
Athena & 0.07 & 0.02 & 0.13 & 0.03 & 0.01 \\
Aphrodite & 0.08 & 0.02 & 0.18 & 0.04 & 0.01 \\
Nyx & 0.10 & 0.03 & 0.23 & 0.05 & 0.01 \\
Hera & 0.08 & 0.02 & 0.14 & 0.04 & 0.01 \\
\end{tabular}
}
\bottomrule
\caption{Summaries for \glsfirst{TER}, \glsfirst{UER} and \glsfirst{CER} for the test keyboards}
\label{tbl:sum_tkbs_err}
\end{table}
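The means in Table \ref{tbl:sum_tkbs_err} can be cross-checked under the
assumption that the total error rate is the sum of the corrected and the
uncorrected error rate. This relation is a common definition and is assumed
here; it is not taken from the metric definitions of this thesis. For
\textit{Nyx}, for example:
\[
  \mathrm{TER} = \mathrm{CER} + \mathrm{UER} = 0.10 + 0.01 = 0.11 .
\]
The same relation holds for the rounded means of the other test keyboards.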
\begin{table}[H]
\centering
\small
\ra{1.3}
\begin{tabular}{?l^l^l^l^l^l^l^l}
\toprule
\rowstyle{\itshape}
Y & Comparison & Statistic & p & Estimate & CI & Hypothesis \\
\midrule
\multicolumn{6}{l}{\textbf{Non Parametric (Wilcoxon Signed Rank Test)}} \\
TER & Athena - Hera & 38.0 & $0.004^*$ & -0.011 & [-Inf, -0.01] & less \\
TER & Athena - Aphrodite & 58.5 & $0.009^*$ & -0.012 & [-Inf, 0] & less \\
TER & Athena - Nyx & 18.0 & $0.00009^*$ & -0.027 & [-Inf, -0.02] & less \\
TER & Aphrodite - Nyx & 35.5 & $0.002^*$ & -0.018 & [-Inf, -0.01] & less \\
TER & Hera - Aphrodite & 181.0 & 0.816 & 0.002 & [-Inf, 0.01] & less \\
TER & Hera - Nyx & 29.5 & $0.002^*$ & -0.016 & [-Inf, -0.01] & less \\
\multicolumn{6}{l}{\textbf{Parametric (Dependent T-test)}} \\
CER & Athena - Hera & -2.796 & $0.015^*$ & -0.011 & [-Inf, 0] & less \\
CER & Athena - Aphrodite & -2.772 & $0.015^*$ & -0.011 & [-Inf, 0] & less \\
CER & Athena - Nyx & -4.356 & $0.0007^*$ & -0.030 & [-Inf, -0.02] & less \\
CER & Aphrodite - Nyx & -3.821 & $0.002^*$ & -0.019 & [-Inf, -0.01] & less \\
CER & Hera - Aphrodite & 0.050 & 0.520 & 0.000 & [-Inf, 0.01] & less \\
CER & Hera - Nyx & -3.825 & $0.002^*$ & -0.019 & [-Inf, -0.01] & less \\
\bottomrule
\end{tabular}
\caption{Post-hoc results of error rates for the test keyboards. Significant p
values are denoted with *. Confidence intervals are given for the estimate
in the difference in means (T-test) and difference of the location parameter
(Wilcoxon)}
\label{tbl:res_tkbs_err}
\end{table}
\subsection{Muscle Activity}
\label{sec:res_muscle_activity}
We utilized the \gls{EMG} device described in Section \ref{sec:main_design} to
gather data about the muscle activities (\% of \glsfirst{MVC}) during typing
tests for the extensor and flexor muscles of both forearms. For our analysis, we
used the mean values of the results for both typing tests with each keyboard.
Note that we had to remove two erroneous measurements concerning
the right flexor muscle (n = 22). We found no significant differences in
\%\gls{MVC} between any of the test keyboards in either flexor or extensor
\gls{EMG} measurements. Further, we analyzed the effect of the individual
keyboards on \%\gls{MVC} separately for the first and second typing tests (Tn\_1 \&
Tn\_2, n := 1, ..., 4), but did not find any statistically significant results
either. Lastly, we analyzed possible differences between the \%\gls{MVC}
measurements of the first and second typing test for each individual keyboard,
using either dependent T-tests or Wilcoxon Signed Rank Tests. There were no
statistically significant differences in \%\gls{MVC} between the first and the
second typing test for any keyboard/muscle combination. The summaries of the
mean values of both typing tests combined are given for all test keyboards
in Table \ref{tbl:sum_tkbs_emg}.
\begin{table}[H]
\centering
\footnotesize
\ra{1.2}
\toprule
\parbox{.49\linewidth}{
\begin{tabular}{?r^l^l^l^l^l}
\multicolumn{6}{c}{\textbf{Left Flexor \%\gls{MVC}}} \\
\rowstyle{\itshape}
Pseud. & Mean & Min & Max & SD & SE \\
\midrule
Athena & 9.90 & 0.94 & 41.91 & 9.03 & 1.84 \\
Aphrodite & 8.82 & 0.26 & 23.10 & 6.37 & 1.30 \\
Nyx & 8.84 & 2.13 & 24.37 & 6.65 & 1.36 \\
Hera & 9.98 & 2.82 & 25.18 & 6.91 & 1.41 \\
\end{tabular}
}
\parbox{.49\linewidth}{
\begin{tabular}{?r^l^l^l^l^l^l^l}
\multicolumn{6}{c}{\textbf{Right Flexor \%\gls{MVC}} \textit{(n = 22)}} \\
\rowstyle{\itshape}
Pseud. & Mean & Min & Max & SD & SE \\
\midrule
Athena & 9.69 & 2.13 & 23.88 & 5.67 & 1.21 \\
Aphrodite & 9.33 & 2.15 & 16.96 & 4.51 & 0.96 \\
Nyx & 8.60 & 1.68 & 16.16 & 4.43 & 0.94 \\
Hera & 9.26 & 1.42 & 20.39 & 5.75 & 1.23 \\
\end{tabular}
}
\\
\parbox{.49\linewidth}{
\begin{tabular}{?r^l^l^l^l^l^l^l}
\multicolumn{6}{c}{\textbf{Left Extensor \%\gls{MVC}}} \\
\rowstyle{\itshape}
Pseud. & Mean & Min & Max & SD & SE \\
\midrule
Athena & 12.24 & 5.17 & 18.98 & 4.11 & 0.84 \\
Aphrodite & 11.60 & 4.80 & 16.86 & 3.67 & 0.75 \\
Nyx & 11.43 & 5.14 & 16.45 & 3.87 & 0.79 \\
Hera & 11.73 & 4.80 & 21.05 & 4.10 & 0.84 \\
\end{tabular}
}
\parbox{.49\linewidth}{
\begin{tabular}{?r^l^l^l^l^l^l^l}
\multicolumn{6}{c}{\textbf{Right Extensor \%\gls{MVC}}} \\
\rowstyle{\itshape}
Pseud. & Mean & Min & Max & SD & SE \\
\midrule
Athena & 10.78 & 3.34 & 17.58 & 3.86 & 0.79 \\
Aphrodite & 10.66 & 3.56 & 19.05 & 4.41 & 0.90 \\
Nyx & 10.57 & 3.81 & 21.55 & 4.33 & 0.88 \\
Hera & 10.79 & 4.11 & 19.50 & 4.09 & 0.83 \\
\end{tabular}
}
\bottomrule
\caption{Summaries for the \textit{mean values of} measured muscle activity
(\% of \glsfirst{MVC}) in \textit{both typing tests} conducted with each
keyboard.}
\label{tbl:sum_tkbs_emg}
\end{table}
\pagebreak
\subsection{Questionnaires}
\label{sec:res_questionnaires}
\subsubsection{Keyboard Comfort Questionnaire}
\label{sec:res_kcq}
The \glsfirst{KCQ} was filled out by the participants after each individual
typing test. The questionnaire featured twelve questions regarding the
previously used keyboard which are labelled as follows:
\begin{table}[H]
\centering
\ra{0.8}
\small
\begin{tabular}{llll}
\textbf{KCQ1:} & \textit{``Required operating force during usage?''} & \textbf{KCQ7:} & \textit{``Ease of use?''} \\
\textbf{KCQ2:} & \textit{``Perceived uniformity during usage?''} & \textbf{KCQ8:} & \textit{``Fatigue of the fingers?''} \\
\textbf{KCQ3:} & \textit{``Effort required during usage?''} & \textbf{KCQ9:} & \textit{``Fatigue of the wrists?''} \\
\textbf{KCQ4:} & \textit{``Perceived accuracy?''} & \textbf{KCQ10:} & \textit{``Fatigue of the arms?''} \\
\textbf{KCQ5:} & \textit{``Acceptability of speed?''} & \textbf{KCQ11:} & \textit{``Fatigue of the shoulders?''} \\
\textbf{KCQ6:} & \textit{``Overall satisfaction?''} & \textbf{KCQ12:} & \textit{``Fatigue of the neck?''} \\
\end{tabular}
\end{table}
All questions featured a 7-point Likert scale where 1 always denoted the worst
and 7 the best possible experience \cite{iso9241-411}. We conducted Friedman's
Tests for all questions and found differences for at least two of the test
keyboards in \textit{KCQ3} ($\chi^2$(3) = 9.49, p = 0.024), \textit{KCQ4}
($\chi^2$(3) = 18.4, p = 0.0004), \textit{KCQ6} ($\chi^2$(3) = 10.2, p = 0.017)
and \textit{KCQ8} ($\chi^2$(3) = 12.0, p = 0.0075). Further, we noticed a trend
towards significance for question \textit{KCQ1} ($\chi^2$(3) = 7.02, p =
0.071). The mean values for all answers can be seen in Figure
\ref{fig:kcq_tkbs_res} and the post-hoc tests for relevant answers are shown in
Table \ref{tbl:res_kcq}.
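For reference, the Friedman statistic without tie correction can be sketched as
follows, with $n = 24$ participants, $k = 4$ test keyboards and $R_j$ the sum of
the within-participant ranks of keyboard $j$. The notation is illustrative, and
statistical software typically applies an additional correction for ties, which
are frequent in Likert data:
\[
  \chi^2_F = \frac{12}{n k (k + 1)} \sum_{j = 1}^{k} R_j^2 - 3 n (k + 1),
  \qquad \mathit{df} = k - 1 = 3 .
\]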
\begin{figure}[H]
\centering
\includegraphics[width=1.0\textwidth]{images/kcq_tkbs_res}
\caption{Means of the responses for all questions of the \glsfirst{KCQ}}
\label{fig:kcq_tkbs_res}
\end{figure}
\begin{table}[H]
\centering
\small
\ra{1.3}
\begin{tabular}{?l^l^l^l^l^l^l^l}
\toprule
\rowstyle{\itshape}
Y & Comparison & Statistic & p & Estimate & CI & Hypothesis \\
\midrule
\multicolumn{6}{l}{\textbf{Non Parametric (Wilcoxon Signed Rank Test)}} \\
KCQ1 & Aphrodite - Athena & 191.5 & $0.051^\dagger$ & 1.5 & [0.5, 2.5] & two-tailed \\
\midrule
KCQ3 & Aphrodite - Athena & 209.5 & $0.03^*$ & 1.25 & [0.25, 2] & two-tailed \\
KCQ3 & Athena - Hera & 37.0 & $0.022^*$ & -1.25 & [-2, -0.5] & two-tailed \\
KCQ3 & Athena - Nyx & 31.0 & $0.03^*$ & -1.5 & [-2.5, -0.5] & two-tailed \\
\midrule
KCQ4 & Aphrodite - Nyx & 161.5 & $0.038^*$ & 1.5 & [0.75, 2.5] & two-tailed \\
KCQ4 & Athena - Hera & 168.5 & $0.072^\dagger$ & 1.0 & [0.25, 1.5] & two-tailed \\
KCQ4 & Athena - Nyx & 193.5 & $0.006^*$ & 2.0 & [1, 2.75] & two-tailed \\
\bottomrule
\end{tabular}
\caption{Post-hoc tests for questions from the \gls{KCQ}. Statistically
significant differences ($p < 0.05$) are marked with an asterisk and p values
indicating a trend towards significance are denoted with
$\dagger$. Confidence intervals are given for the difference of the location
parameter}
\label{tbl:res_kcq}
\end{table}
\subsubsection{User Experience Questionnaire (Short)}
\label{sec:res_ueqs}