% A rapid method that creates many corrected errors, has efficient error correction, and leaves
% few uncorrected errors can still be considered a successful method, since it produces
% accurate text in relatively little time. p. 56 MacKenzie
\section{Results of the Main User Study}
\label{sec:results}
This section presents the statistical analysis of the data obtained in the main
within-subject user study (n = 24), which consisted of five repeated
measurements. Because the measurements came from related (dependent) groups, we
used \textit{\gls{rmANOVA}} if all of its assumptions were met and
\textit{Friedman's Test} otherwise. To identify which specific pairs of
treatments differed significantly, we ran either \textit{Dependent T-Tests} or
\textit{Wilcoxon Signed Rank Tests} as post-hoc tests \cite{field_stats,
downey_stats}, both with \textit{Holm correction (sequentially rejective
Bonferroni test)} \cite{holm_correction}. The reliability of the two sub-scales
(hedonic and pragmatic quality) of the \glsfirst{UEQ-S} was estimated using
\textit{Cronbach's alpha} \cite{tavakol_cronbachs_alpha}. Results are reported
as statistically significant at an $\alpha$-level of 0.05 ($p < 0.05$), and all
confidence intervals reported in this section are 95\,\% intervals. Normality of
the data or residuals was checked by visual inspection of \gls{Q-Q} plots,
complemented by the \textit{Shapiro-Wilk} test. Further, we used
\textit{Mauchly's Test for Sphericity} to check whether the variances of the
differences between all pairs of conditions were equal \cite{field_stats,
downey_stats}.

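The Holm correction can be illustrated with a short sketch. The following Python function is not the code used for this analysis (this section does not state which software was used); it is a minimal illustration with a hypothetical name (\texttt{holm\_adjust}) and made-up p values. It sorts the $m$ raw p values in ascending order, multiplies the $i$-th smallest one by $m - i + 1$, enforces monotonicity with a running maximum and caps the result at 1, which should give the same values as R's \texttt{p.adjust} with the Holm method.

\begin{verbatim}
def holm_adjust(p_values):
    """Return Holm-adjusted p values in the original order (hypothetical helper)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):  # rank = 0 for the smallest p value
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # adjusted p values never decrease
        adjusted[idx] = running_max
    return adjusted


# Made-up raw p values from three pairwise comparisons.
raw = [0.01, 0.04, 0.03]
adj = holm_adjust(raw)
print([round(p, 3) for p in adj])  # [0.03, 0.06, 0.06]
print([p < 0.05 for p in adj])     # [True, False, False]
\end{verbatim}

In this example the raw p value of 0.04 would be significant on its own, but not after the correction.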
\subsection{Own Keyboard}
\label{sec:res_OPC}
As mentioned in Section \ref{sec:main_design}, the keyboard \textit{Own} was
used as a reference for several metrics captured during the experiment. Since
the measurements with \textit{Own} took place at the start (T0\_1) and at the
end (T0\_2) of the experiment, we compared the results of both typing tests to
detect possible changes in performance due to fatigue. Dependent T-tests
revealed no significant difference in \glsfirst{KSPS} between T0\_1 (M = 5.39,
sd = 1.49) and T0\_2 (M = 5.47, sd = 1.48, t = -1.53, p = 0.139) and only a
trend towards significance in \glsfirst{WPM} between T0\_1 (M = 54.2, sd =
14.7) and T0\_2 (M = 53.0, sd = 14.5, t = 1.92, p = 0.067). The
\glsfirst{UER} was negligible overall in both T0\_1 (M = 0.005, sd = 0.013,
85th percentile = 0.0051) and T0\_2 (M = 0.008, sd = 0.028, 85th percentile =
0.0052). However, dependent T-tests showed statistically significant
differences between T0\_1 and T0\_2 in \glsfirst{AdjWPM} (T0\_1: M = 53.9, sd =
14.5; T0\_2: M = 52.5, sd = 14.3; t = 2.44, p = 0.023), \glsfirst{CER} (T0\_1:
M = 0.057, sd = 0.028; T0\_2: M = 0.078, sd = 0.038; t = -3.54, p = 0.002) and
\glsfirst{TER} (T0\_1: M = 0.063, sd = 0.031; T0\_2: M = 0.086, sd = 0.039;
t = -4.27, p = 0.0003). Because of these differences, we used each
participant's mean of T0\_1 and T0\_2 for every metric as the reference value
for computing the \textit{\gls{OPC}} of the test keyboards (\textit{Athena,
Aphrodite, Nyx} and \textit{Hera}). The \gls{OPC} values were then used to
compare the performance of each test keyboard with that of the participant's
own, familiar keyboard.

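To illustrate how an \gls{OPC} value can be obtained, the following minimal Python sketch assumes that a test keyboard's value is the mean of its two typing tests and that the reference is the mean of T0\_1 and T0\_2 with \textit{Own}, both for the same participant and metric. The function name \texttt{opc} and all numbers are hypothetical. Values above 100\,\% mean that the test keyboard scored higher than \textit{Own}; for error rates such as \gls{TER}, where lower is better, values below 100\,\% indicate higher accuracy.

\begin{verbatim}
from statistics import mean


def opc(test_runs, own_runs):
    """Hypothetical helper: a test keyboard's metric as a percentage of the
    participant's Own reference (mean of T0_1 and T0_2).

    Raises ZeroDivisionError if the reference is 0 (possible for error rates).
    """
    reference = mean(own_runs)
    return 100.0 * mean(test_runs) / reference


# Made-up WPM values for one participant.
own_wpm = [54.0, 52.0]   # T0_1 and T0_2 with Own -> reference 53.0
test_wpm = [55.0, 56.3]  # both typing tests with one test keyboard -> mean 55.65
print(round(opc(test_wpm, own_wpm), 1))  # 105.0
\end{verbatim}
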
Additionally, we compared the muscle activity (\% of \glsfirst{MVC}) between
T0\_1 and T0\_2. A dependent T-test showed a significant difference in left
flexor (\glsfirst{FDP} \& \glsfirst{FDS}) \%\gls{MVC} between T0\_1 (M = 12.0,
sd = 8.27) and T0\_2 (M = 8.53, sd = 7.16, t = 3.18, p = 0.004). The residuals
for the right flexor (\gls{FDP} \& \gls{FDS}) were not normally distributed;
therefore, we used a Wilcoxon Signed Rank Test, which also showed a significant
difference between T0\_1 (M = 10.8, sd = 8.18, Med = 9.52) and T0\_2 (M = 7.71,
sd = 6.08, Med = 5.32, p = 0.021). Note that two erroneous measurements had to
be removed for the right flexor (n = 22). No significant differences were found
in left or right extensor (\glsfirst{ED}) \%\gls{MVC} between T0\_1 and T0\_2.
All results are listed in Table \ref{tbl:res_own_before_after}.

\begin{table}[H]
\centering
\small
\ra{1.3}
\begin{tabular}{?l^l^l^l^l^l^l}
\toprule
\rowstyle{\itshape}
Y & Comparison & Statistic & p & Estimate & CI & Hypothesis \\
\midrule
\multicolumn{7}{l}{\textbf{Parametric (Dependent T-test)}} \\
WPM & T0\_1 - T0\_2 & 1.92 & 0.07$^\dagger$ & 1.18 & [-0.09, 2.45] & two-tailed \\
AdjWPM & T0\_1 - T0\_2 & 2.44 & 0.02$^*$ & 1.35 & [0.21, 2.50] & two-tailed \\
KSPS & T0\_1 - T0\_2 & -1.53 & 0.14 & -0.08 & [-0.19, 0.03] & two-tailed \\
CER & T0\_1 - T0\_2 & -3.54 & 0.002$^*$ & -0.02 & [-0.03, -0.01] & two-tailed \\
TER & T0\_1 - T0\_2 & -4.27 & 0.0003$^*$ & -0.02 & [-0.03, -0.01] & two-tailed \\
\%MVC\textsubscript{LF} & T0\_1 - T0\_2 & 3.18 & 0.004$^*$ & 3.44 & [1.20, 5.68] & two-tailed \\
\%MVC\textsubscript{LE} & T0\_1 - T0\_2 & 1.44 & 0.163 & 0.956 & [-0.42, 2.33] & two-tailed \\
\multicolumn{7}{l}{\textbf{Non-Parametric (Wilcoxon Signed Rank Test)}} \\
\%MVC\textsubscript{RF} & T0\_1 - T0\_2 & 197 & 0.021$^*$ & 1.83 & [0.39, 3.93] & two-tailed \\
\%MVC\textsubscript{RE} & T0\_1 - T0\_2 & 173 & 0.527 & 0.28 & [-0.58, 0.91] & two-tailed \\
\bottomrule
\end{tabular}
\caption{Statistical analysis of the differences between typing tests T0\_1 and
T0\_2 for keyboard \textit{Own}. For \%MVC\textsubscript{RF}, two erroneous
measurements were removed (n = 22). Statistically significant differences
($p < 0.05$) are marked with an asterisk and p values indicating a trend
towards significance with $\dagger$. Confidence intervals are given for the
estimated difference in means (T-test) and the estimated difference in
location (Wilcoxon). The subscripts LF, RF, LE and RE denote the left and right
forearm flexor and extensor muscles}
\label{tbl:res_own_before_after}
\end{table}

We also evaluated the mean responses to \glsfirst{KCQ} questions 8 to 12, which
concern perceived fatigue in the fingers, wrists, arms, shoulders and neck,
respectively (7-point Likert scale), as well as the slopes (improving,
deteriorating, stable) of the \gls{UX Curve}s drawn by each participant after
the whole experiment, to identify possible differences in perceived fatigue
between T0\_1 and T0\_2. As shown in Figure \ref{fig:res_own_per_fat}, the
participants' \gls{KCQ} responses indicated slightly less fatigue in T0\_2 than
in T0\_1 for the fingers (diff = 0.33) and wrists (diff = 0.33), no difference
for the arms (diff = 0) and very slightly more fatigue for the shoulders
(diff = -0.12) and neck (diff = -0.13). Sixteen of the twenty-four \gls{UX
Curve}s for overall perceived fatigue had a positive (improving) slope between
the start of T0\_1 and the end of T0\_2 (margin of $\pm 1$\,mm). The reported
decrease in finger and wrist fatigue is consistent with the decrease in flexor
muscle activity described above.

\begin{figure}[H]
\centering
\includegraphics[width=0.98\textwidth]{images/res_own_per_fat}
\caption{Trends in fatigue reported through the \gls{KCQ} (questions 8:
finger, 9: wrist, 10: arm, 11: shoulder, 12: neck) and histogram of the
slopes (IM: improving, DE: deteriorating, ST: stable) of the \gls{UX Curve}s
for perceived fatigue. Slopes were determined by comparing the y value of the
starting point of T0\_1 with the y value of the end point of T0\_2, using a
margin of $\pm 1$\,mm}
\label{fig:res_own_per_fat}
\end{figure}

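The slope classification of the \gls{UX Curve}s can be sketched as follows. This minimal Python example is an assumption-based illustration: it takes the y positions of the starting and end points in millimetres, assumes that y increases towards the better end of the scale (so that an end point above the starting point counts as improving, in line with the definition in Section \ref{sec:res_uxc}) and treats a difference of exactly 1\,mm as still stable. The function name \texttt{classify\_slope} and the example values are hypothetical.

\begin{verbatim}
from collections import Counter


def classify_slope(sp_y_mm, ep_y_mm, margin_mm=1.0):
    """Hypothetical helper: classify a UX Curve from the y positions (mm) of its
    starting point (SP) and end point (EP).

    Assumes y grows towards the better end of the scale and that a difference
    of exactly margin_mm still counts as stable.
    """
    delta = ep_y_mm - sp_y_mm
    if abs(delta) <= margin_mm:
        return "ST"  # stable: SP and EP within the margin
    return "IM" if delta > 0 else "DE"  # improving if SP < EP, otherwise deteriorating


# Made-up (SP, EP) pairs for three participants' fatigue curves.
curves = [(10.0, 14.5), (22.0, 20.5), (15.0, 15.8)]
print(Counter(classify_slope(sp, ep) for sp, ep in curves))
\end{verbatim}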
\subsection{Performance Metrics}
% As briefly mentioned in the last section, the individual measurements were then converted into
% percentage values of the mean of the reference values gathered from typing tests
% with keyboard \textit{Own} (\gls{OPC}).
\label{sec:res_perf}
\subsubsection{Typing Speed}
\label{sec:res_typing_speed}
The typing speed for each keyboard and typing test was captured automatically by
the typing test functionality of \glsfirst{GoTT}. We computed \gls{WPM},
\gls{AdjWPM} and \gls{KSPS} according to the formulas given in Section
\ref{sec:meas_perf}. For the following statistical analysis, we used the mean
of both typing tests performed with each keyboard. An \gls{rmANOVA} revealed a
significant difference in \gls{WPM} between at least two of the test keyboards
(\textit{Athena, Aphrodite, Nyx} and \textit{Hera}; F(3, 69) = 6.036, p =
0.001). Post-hoc dependent T-tests with Holm correction showed that
\textit{Nyx} (M = 49.4, sd = 13.3) was significantly slower than
\textit{Aphrodite} (M = 51.5, sd = 14.0, t = 3.33, p = 0.014), \textit{Athena}
(M = 51.5, sd = 14.2, t = 2.76, p = 0.044) and \textit{Hera} (M = 51.9, sd =
14.6, t = 3.53, p = 0.01). The \gls{rmANOVA}s for \gls{AdjWPM} (F(3, 69) =
6.197, p = 0.0009) and \gls{KSPS} (F(3, 69) = 3.566, p = 0.018) were also
significant; for \gls{KSPS}, however, the post-hoc tests only indicated trends
towards significance. Descriptive statistics and the relevant post-hoc results
are given in Tables \ref{tbl:sum_tkbs_speed} and \ref{tbl:res_tkbs_speed}. We
further examined which of the four test keyboards was the fastest for each
participant and found that \textit{Hera} was the fastest keyboard in terms of
\gls{WPM} for 46\,\% (11) of the twenty-four participants. Additionally, we
analyzed the \gls{WPM} of all test keyboards as a percentage of \textit{Own}
(\gls{OPC}) to determine which keyboards exceeded the performance of the
participant's own keyboard. Three participants reached \gls{OPC}\_\gls{WPM}
values greater than 100\,\% with all four test keyboards. Also,
\textit{Athena, Aphrodite} and \textit{Hera} exceeded 100\,\% of
\gls{OPC}\_\gls{WPM} eight, seven and six times, respectively. Detailed results
are presented in Figure \ref{fig:max_opc_wpm}.

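The two per-participant counts above (the fastest test keyboard and the test keyboards faster than \textit{Own}) can be expressed in a few lines of Python. This is a hypothetical sketch with made-up data structures and values: \texttt{wpm} maps each participant to the mean \gls{WPM} per test keyboard and \texttt{own\_ref} to the participant's \textit{Own} reference value, so that exceeding the reference corresponds to \gls{OPC}\_\gls{WPM} $>$ 100\,\%. Ties for the fastest keyboard, which the text does not mention for \gls{WPM}, are credited to every tied keyboard.

\begin{verbatim}
from collections import Counter


def tally(wpm, own_ref):
    """Hypothetical helper: count per test keyboard how often it was a
    participant's fastest and how often it beat the participant's Own
    reference (i.e. OPC_WPM > 100 %)."""
    fastest, above_own = Counter(), Counter()
    for pid, scores in wpm.items():
        best = max(scores.values())
        fastest.update(k for k, v in scores.items() if v == best)  # ties: all credited
        above_own.update(k for k, v in scores.items() if v > own_ref[pid])
    return fastest, above_own


# Made-up data for two participants.
wpm = {
    "P01": {"Athena": 52.0, "Aphrodite": 50.5, "Nyx": 49.0, "Hera": 53.1},
    "P02": {"Athena": 61.2, "Aphrodite": 60.0, "Nyx": 58.4, "Hera": 59.9},
}
own_ref = {"P01": 51.0, "P02": 62.0}
fastest, above_own = tally(wpm, own_ref)
print(fastest)    # Hera once (P01), Athena once (P02)
print(above_own)  # Athena and Hera once each (both from P01)
\end{verbatim}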
\begin{table}[H]
\centering
\footnotesize
\ra{1.2}
\toprule
\parbox{.49\linewidth}{
\begin{tabular}{?r^l^l^l^l^l}
\multicolumn{6}{c}{\textbf{\gls{WPM}}} \\
\rowstyle{\itshape}
Pseud. & Mean & Min & Max & SD & SE \\
\midrule
Athena & 51.47 & 17.96 & 73.86 & 14.21 & 2.90 \\
Aphrodite & 51.46 & 20.76 & 76.36 & 14.01 & 2.86 \\
Nyx & 49.39 & 20.80 & 74.26 & 13.28 & 2.71 \\
Hera & 51.87 & 18.10 & 76.06 & 14.55 & 2.97 \\
\end{tabular}
}
\parbox{.49\linewidth}{
\begin{tabular}{?r^l^l^l^l^l}
\multicolumn{6}{c}{\textbf{\gls{AdjWPM}}} \\
\rowstyle{\itshape}
Pseud. & Mean & Min & Max & SD & SE \\
\midrule
Athena & 51.04 & 17.94 & 73.19 & 14.07 & 2.87 \\
Aphrodite & 50.97 & 20.76 & 75.78 & 13.95 & 2.85 \\
Nyx & 48.84 & 20.80 & 73.62 & 13.17 & 2.69 \\
Hera & 51.32 & 18.06 & 75.14 & 14.40 & 2.94 \\
\end{tabular}
}
\begin{tabular}{?r^l^l^l^l^l}
\\
\multicolumn{6}{c}{\textbf{\gls{KSPS}}} \\
\rowstyle{\itshape}
Pseud. & Mean & Min & Max & SD & SE \\
\midrule
Athena & 5.23 & 1.68 & 7.94 & 1.54 & 0.31 \\
Aphrodite & 5.32 & 2.00 & 8.14 & 1.50 & 0.31 \\
Nyx & 5.31 & 1.95 & 8.15 & 1.48 & 0.30 \\
Hera & 5.37 & 1.72 & 8.15 & 1.57 & 0.32 \\
\end{tabular}
\bottomrule
\caption{Descriptive statistics for \glsfirst{WPM}, \glsfirst{AdjWPM} and \glsfirst{KSPS} for the test keyboards}
\label{tbl:sum_tkbs_speed}
\end{table}

\begin{table}[H]
\centering
\small
\ra{1.3}
\begin{tabular}{?l^l^l^l^l^l^l}
\toprule
\rowstyle{\itshape}
Y & Comparison & Statistic & p & Estimate & CI & Hypothesis \\
\midrule
\multicolumn{7}{l}{\textbf{Parametric (Dependent T-test)}} \\
WPM & Athena - Nyx & 2.765 & 0.044$^*$ & 2.083 & [0.52, 3.64] & two-tailed \\
WPM & Aphrodite - Nyx & 3.332 & 0.014$^*$ & 2.069 & [0.78, 3.35] & two-tailed \\
WPM & Hera - Nyx & 3.541 & 0.010$^*$ & 2.479 & [1.03, 3.93] & two-tailed \\
AdjWPM & Athena - Nyx & 2.868 & 0.035$^*$ & 2.200 & [0.61, 3.79] & two-tailed \\
AdjWPM & Aphrodite - Nyx & 3.443 & 0.011$^*$ & 2.132 & [0.85, 3.41] & two-tailed \\
AdjWPM & Hera - Nyx & 3.515 & 0.011$^*$ & 2.475 & [1.02, 3.93] & two-tailed \\
KSPS & Athena - Hera & -2.834 & 0.056$^\dagger$ & -0.145 & [-0.25, -0.04] & two-tailed \\
KSPS & Aphrodite - Athena & 2.566 & 0.086$^\dagger$ & 0.095 & [0.02, 0.17] & two-tailed \\
\bottomrule
\end{tabular}
\caption{Relevant post-hoc results (Holm-corrected p values) for the
speed-related metrics of the test keyboards. Significant p values ($p < 0.05$)
are marked with an asterisk and p values indicating a trend towards
significance with $\dagger$. Confidence intervals are given for the estimated
difference in means}
\label{tbl:res_tkbs_speed}
\end{table}

\begin{figure}[H]
\centering
\includegraphics[width=1.0\textwidth]{images/max_opc_wpm}
\caption{The left graph shows the fastest test keyboard in terms of \gls{WPM}
for each participant. The right graph shows which test keyboards were even
faster than the participant's own keyboard (\gls{OPC}\_\gls{WPM} $>$ 100\,\%)}
\label{fig:max_opc_wpm}
\end{figure}

\subsubsection{Error Rate}
\label{sec:res_error_rate}
\gls{GoTT} also automatically tracked several error-related metrics, of which
we analyzed \glsfirst{UER}, \glsfirst{CER} and \glsfirst{TER}. Since we were
interested in whether higher actuation forces lead to lower error rates than
lower actuation forces, we conducted one-tailed post-hoc tests for the
following analyses. As in Section \ref{sec:res_typing_speed}, we used the mean
of both typing tests with each keyboard. A Friedman's Test for \gls{TER}
($\chi^2$(3) = 25.4, p = 0.00001) and an \gls{rmANOVA} for \gls{CER} (F(3, 69)
= 13.355, p = 0.0000408 (\gls{GG})) revealed differences between at least two
test keyboards. The Friedman's Test for \gls{UER} ($\chi^2$(3) = 2.59, p =
0.46) yielded no statistically significant difference; note that the 90th
percentile of \gls{UER} was below 1\,\% for all keyboards. Descriptive
statistics for the individual metrics and the results of all post-hoc tests are
given in Tables \ref{tbl:sum_tkbs_err} and \ref{tbl:res_tkbs_err}.
Furthermore, we compared the \gls{TER} of all test keyboards for each
participant and found that \textit{Athena} was most often the keyboard with
which participants typed most accurately. Two participants achieved identical
\gls{TER}s with two test keyboards, so the total number of ``1st-placed''
keyboards is twenty-six rather than twenty-four. Finally, comparing the test
keyboards with the participants' own keyboards showed that eleven participants
achieved a lower \gls{TER} with \textit{Athena} than with \textit{Own}
(\gls{OPC}). All data are shown in Figure \ref{fig:max_opc_ter}.

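To make the one-tailed post-hoc comparisons more concrete, the following Python sketch implements a paired Wilcoxon Signed Rank Test for the alternative \textit{less}, i.e.\ that the first keyboard tends to yield lower values than the second. It is a hypothetical illustration rather than the analysis code used in this study: it drops zero differences, assigns average ranks to ties and uses the normal approximation with tie and continuity correction, which is, for example, what R's \texttt{wilcox.test} falls back to when ties are present. The function names and the \gls{TER} values (in percent, so that the differences are exact) are made up.

\begin{verbatim}
import math
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks (ties get their average rank) and the tie group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks, tie_sizes = [0.0] * len(values), []
    i = 0
    while i < len(order):
        j = i
        # extend j over all positions whose value equals the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def wilcoxon_less(x, y):
    """Paired, one-tailed signed-rank test of H1: x tends to be smaller than y.

    Returns (sum of the ranks of positive differences, approximate p value).
    """
    d = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks, tie_sizes = average_ranks([abs(v) for v in d])
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    mu = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - sum(t**3 - t for t in tie_sizes) / 48
    z = (w_plus - mu + 0.5) / math.sqrt(var)  # +0.5: continuity correction for "less"
    return w_plus, NormalDist().cdf(z)


# Made-up TER values in percent (mean of both tests) of six participants.
ter_first = [6.0, 7.0, 5.0, 9.0, 8.0, 7.0]
ter_second = [8.0, 9.0, 6.0, 11.0, 9.0, 10.0]
w_plus, p = wilcoxon_less(ter_first, ter_second)
print(w_plus, round(p, 4))
\end{verbatim}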
\begin{table}[H]
\centering
\footnotesize
\ra{1.2}
\toprule
\begin{tabular}{?r^l^l^l^l^l}
\multicolumn{6}{c}{\textbf{\gls{TER}}} \\
\rowstyle{\itshape}
Pseud. & Mean & Min & Max & SD & SE \\
\midrule
Athena & 0.08 & 0.02 & 0.17 & 0.03 & 0.01 \\
Aphrodite & 0.09 & 0.02 & 0.20 & 0.04 & 0.01 \\
Nyx & 0.11 & 0.03 & 0.25 & 0.06 & 0.01 \\
Hera & 0.09 & 0.02 & 0.21 & 0.04 & 0.01 \\
\end{tabular}
\\
\parbox{.49\linewidth}{
\begin{tabular}{?r^l^l^l^l^l}
\multicolumn{6}{c}{\textbf{\gls{UER}}} \\
\rowstyle{\itshape}
Pseud. & Mean & Min & Max & SD & SE \\
\midrule
Athena & 0.01 & 0.00 & 0.14 & 0.03 & 0.01 \\
Aphrodite & 0.01 & 0.00 & 0.17 & 0.03 & 0.01 \\
Nyx & 0.01 & 0.00 & 0.21 & 0.04 & 0.01 \\
Hera & 0.01 & 0.00 & 0.18 & 0.04 & 0.01 \\
\end{tabular}
}
\parbox{.49\linewidth}{
\begin{tabular}{?r^l^l^l^l^l}
\multicolumn{6}{c}{\textbf{\gls{CER}}} \\
\rowstyle{\itshape}
Pseud. & Mean & Min & Max & SD & SE \\
\midrule
Athena & 0.07 & 0.02 & 0.13 & 0.03 & 0.01 \\
Aphrodite & 0.08 & 0.02 & 0.18 & 0.04 & 0.01 \\
Nyx & 0.10 & 0.03 & 0.23 & 0.05 & 0.01 \\
Hera & 0.08 & 0.02 & 0.14 & 0.04 & 0.01 \\
\end{tabular}
}
\bottomrule
\caption{Descriptive statistics for \glsfirst{TER}, \glsfirst{UER} and
\glsfirst{CER} for the test keyboards}
\label{tbl:sum_tkbs_err}
\end{table}

\begin{table}[H]
\centering
\small
\ra{1.3}
\begin{tabular}{?l^l^l^l^l^l^l}
\toprule
\rowstyle{\itshape}
Y & Comparison & Statistic & p & Estimate & CI & Hypothesis \\
\midrule
\multicolumn{7}{l}{\textbf{Non-Parametric (Wilcoxon Signed Rank Test)}} \\
TER & Athena - Hera & 38.0 & 0.004$^*$ & -0.011 & $(-\infty, -0.01]$ & less \\
TER & Athena - Aphrodite & 58.5 & 0.009$^*$ & -0.012 & $(-\infty, 0]$ & less \\
TER & Athena - Nyx & 18.0 & 0.00009$^*$ & -0.027 & $(-\infty, -0.02]$ & less \\
TER & Aphrodite - Nyx & 35.5 & 0.002$^*$ & -0.018 & $(-\infty, -0.01]$ & less \\
TER & Hera - Aphrodite & 181.0 & 0.816 & 0.002 & $(-\infty, 0.01]$ & less \\
TER & Hera - Nyx & 29.5 & 0.002$^*$ & -0.016 & $(-\infty, -0.01]$ & less \\
\multicolumn{7}{l}{\textbf{Parametric (Dependent T-test)}} \\
CER & Athena - Hera & -2.796 & 0.015$^*$ & -0.011 & $(-\infty, 0]$ & less \\
CER & Athena - Aphrodite & -2.772 & 0.015$^*$ & -0.011 & $(-\infty, 0]$ & less \\
CER & Athena - Nyx & -4.356 & 0.0007$^*$ & -0.030 & $(-\infty, -0.02]$ & less \\
CER & Aphrodite - Nyx & -3.821 & 0.002$^*$ & -0.019 & $(-\infty, -0.01]$ & less \\
CER & Hera - Aphrodite & 0.050 & 0.520 & 0.000 & $(-\infty, 0.01]$ & less \\
CER & Hera - Nyx & -3.825 & 0.002$^*$ & -0.019 & $(-\infty, -0.01]$ & less \\
\bottomrule
\end{tabular}
\caption{One-tailed post-hoc results for the error rates of the test
keyboards. Significant p values ($p < 0.05$) are marked with an asterisk.
Confidence intervals are given for the estimated difference in means (T-test)
and the estimated difference in location (Wilcoxon)}
\label{tbl:res_tkbs_err}
\end{table}

\begin{figure}[H]
\centering
\includegraphics[width=1.0\textwidth]{images/max_opc_ter}
\caption{The left graph shows the test keyboard with the lowest \gls{TER} for
each participant. The right graph shows which test keyboards were more
accurate than the participant's own keyboard (\gls{OPC}\_\gls{TER} $<$
100\,\%)}
\label{fig:max_opc_ter}
\end{figure}

\subsection{Muscle Activity Measurements}
\label{sec:res_muscle_activity}
We used the \gls{EMG} device described in Section \ref{sec:main_design} to
record the activity (\% of \glsfirst{MVC}) of the extensor and flexor muscles
of both forearms during the typing tests. For our analysis, we used the mean of
both typing tests with each keyboard. Note that two erroneous measurements of
the right flexor muscle had to be removed (n = 22). We found no significant
differences in \%\gls{MVC} between the test keyboards for either the flexor or
the extensor \gls{EMG} measurements. Analyzing the effect of the keyboards on
\%\gls{MVC} separately for the first and second typing tests (Tn\_1 and Tn\_2,
$n = 1, \dots, 4$) did not yield any statistically significant results either.
Additionally, we tested for differences in \%\gls{MVC} between the first and
the second typing test with each individual keyboard, using either dependent
T-tests or Wilcoxon Signed Rank Tests; there were no statistically significant
differences for any keyboard/muscle combination. Descriptive statistics of the
means of both typing tests with each test keyboard are given in Table
\ref{tbl:sum_tkbs_emg}. Finally, for each muscle group we counted how often
each keyboard produced the highest \%\gls{MVC} of all test keyboards for a
participant (Figure \ref{fig:max_emg_tkbs}). \textit{Athena} most frequently
($\approx$45\,\%) produced the highest extensor muscle activity in both arms.
For both flexor muscle groups, the highest muscle activity was evenly
distributed among all test keyboards, with the slight exception of
\textit{Nyx}, which produced the highest \%\gls{MVC} for only $\approx$14\,\%
of participants.

\begin{figure}[H]
\centering
\includegraphics[width=1.0\textwidth]{images/max_emg_tkbs}
\caption{Histograms for all \gls{EMG} measurements showing, for each
participant, which of the four test keyboards produced the highest mean \% of
\glsfirst{MVC}}
\label{fig:max_emg_tkbs}
\end{figure}

\begin{table}[H]
\centering
\footnotesize
\ra{1.2}
\toprule
\parbox{.49\linewidth}{
\begin{tabular}{?r^l^l^l^l^l}
\multicolumn{6}{c}{\textbf{Left Flexor \%\gls{MVC}}} \\
\rowstyle{\itshape}
Pseud. & Mean & Min & Max & SD & SE \\
\midrule
Athena & 9.90 & 0.94 & 41.91 & 9.03 & 1.84 \\
Aphrodite & 8.82 & 0.26 & 23.10 & 6.37 & 1.30 \\
Nyx & 8.84 & 2.13 & 24.37 & 6.65 & 1.36 \\
Hera & 9.98 & 2.82 & 25.18 & 6.91 & 1.41 \\
\end{tabular}
}
\parbox{.49\linewidth}{
\begin{tabular}{?r^l^l^l^l^l}
\multicolumn{6}{c}{\textbf{Right Flexor \%\gls{MVC}} \textit{(n = 22)}} \\
\rowstyle{\itshape}
Pseud. & Mean & Min & Max & SD & SE \\
\midrule
Athena & 9.69 & 2.13 & 23.88 & 5.67 & 1.21 \\
Aphrodite & 9.33 & 2.15 & 16.96 & 4.51 & 0.96 \\
Nyx & 8.60 & 1.68 & 16.16 & 4.43 & 0.94 \\
Hera & 9.26 & 1.42 & 20.39 & 5.75 & 1.23 \\
\end{tabular}
}
\\
\parbox{.49\linewidth}{
\begin{tabular}{?r^l^l^l^l^l}
\multicolumn{6}{c}{\textbf{Left Extensor \%\gls{MVC}}} \\
\rowstyle{\itshape}
Pseud. & Mean & Min & Max & SD & SE \\
\midrule
Athena & 12.24 & 5.17 & 18.98 & 4.11 & 0.84 \\
Aphrodite & 11.60 & 4.80 & 16.86 & 3.67 & 0.75 \\
Nyx & 11.43 & 5.14 & 16.45 & 3.87 & 0.79 \\
Hera & 11.73 & 4.80 & 21.05 & 4.10 & 0.84 \\
\end{tabular}
}
\parbox{.49\linewidth}{
\begin{tabular}{?r^l^l^l^l^l}
\multicolumn{6}{c}{\textbf{Right Extensor \%\gls{MVC}}} \\
\rowstyle{\itshape}
Pseud. & Mean & Min & Max & SD & SE \\
\midrule
Athena & 10.78 & 3.34 & 17.58 & 3.86 & 0.79 \\
Aphrodite & 10.66 & 3.56 & 19.05 & 4.41 & 0.90 \\
Nyx & 10.57 & 3.81 & 21.55 & 4.33 & 0.88 \\
Hera & 10.79 & 4.11 & 19.50 & 4.09 & 0.83 \\
\end{tabular}
}
\bottomrule
\caption{Descriptive statistics for the \textit{mean values of} measured
muscle activity (\% of \glsfirst{MVC}) in \textit{both typing tests}
conducted with each keyboard.}
\label{tbl:sum_tkbs_emg}
\end{table}
\pagebreak
\subsection{Questionnaires}
\label{sec:res_questionnaires}
\subsubsection{Keyboard Comfort Questionnaire}
\label{sec:res_kcq}
Participants filled out the \glsfirst{KCQ} after each individual typing test.
The questionnaire comprised twelve questions about the keyboard just used, which
are labeled as follows:

\begin{table}[H]
\centering
\ra{0.8}
\small
\begin{tabular}{llll}
\textbf{KCQ1:} & \textit{``Required operating force during usage?''} & \textbf{KCQ7:} & \textit{``Ease of use?''} \\
\textbf{KCQ2:} & \textit{``Perceived uniformity during usage?''} & \textbf{KCQ8:} & \textit{``Fatigue of the fingers?''} \\
\textbf{KCQ3:} & \textit{``Effort required during usage?''} & \textbf{KCQ9:} & \textit{``Fatigue of the wrists?''} \\
\textbf{KCQ4:} & \textit{``Perceived accuracy?''} & \textbf{KCQ10:} & \textit{``Fatigue of the arms?''} \\
\textbf{KCQ5:} & \textit{``Acceptability of speed?''} & \textbf{KCQ11:} & \textit{``Fatigue of the shoulders?''} \\
\textbf{KCQ6:} & \textit{``Overall satisfaction?''} & \textbf{KCQ12:} & \textit{``Fatigue of the neck?''} \\
\end{tabular}
\end{table}

All questions used a 7-point Likert scale on which 1 always denoted the worst
and 7 the best possible experience \cite{iso9241-411}. Friedman's Tests for all
questions revealed differences between at least two of the test keyboards for
\textit{KCQ3} ($\chi^2$(3) = 9.49, p = 0.024), \textit{KCQ4} ($\chi^2$(3) =
18.4, p = 0.0004), \textit{KCQ6} ($\chi^2$(3) = 10.2, p = 0.017) and
\textit{KCQ8} ($\chi^2$(3) = 12.0, p = 0.0075). Further, we observed a trend
towards significance for \textit{KCQ1} ($\chi^2$(3) = 7.02, p = 0.071). The
mean responses to all questions are shown in Figure \ref{fig:kcq_tkbs_res} and
the post-hoc tests for the relevant questions in Table \ref{tbl:res_kcq}.

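A minimal, standard-library-only Python sketch of the Friedman's Test may help to illustrate how ties in Likert answers are handled. It ranks the four keyboards within each participant (ties receive average ranks), applies the usual tie correction to the statistic and computes the p value from the $\chi^2$ distribution with $k - 1$ degrees of freedom via its closed-form series for integer degrees of freedom. This is a hypothetical illustration, not the analysis code used in this study; the function names and the example answers are made up.

\begin{verbatim}
import math


def average_ranks(row):
    """Average (1-based) ranks within one participant's row; also return the
    row's tie term sum(t^3 - t) over all groups of tied values."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks, ties, i = [0.0] * len(row), 0, 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        t = j - i + 1
        ties += t * (t * t - 1)  # contribution to the tie correction
        i = j + 1
    return ranks, ties


def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-square distribution with integer df."""
    q, d = (math.exp(-x / 2), 2) if df % 2 == 0 else (math.erfc(math.sqrt(x / 2)), 1)
    while d < df:  # Q(d + 2) = Q(d) + (x/2)^(d/2) * e^(-x/2) / Gamma(d/2 + 1)
        q += (x / 2) ** (d / 2) * math.exp(-x / 2) / math.gamma(d / 2 + 1)
        d += 2
    return q


def friedman(rows):
    """rows: one list per participant with one value per keyboard.
    Returns (tie-corrected chi-square statistic, p value)."""
    n, k = len(rows), len(rows[0])
    rank_sums, ties = [0.0] * k, 0
    for row in rows:
        ranks, row_ties = average_ranks(row)
        rank_sums = [s + r for s, r in zip(rank_sums, ranks)]
        ties += row_ties
    stat = 12 * sum(s * s for s in rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)
    stat /= 1 - ties / (n * k * (k * k - 1))
    return stat, chi2_sf(stat, k - 1)


# Made-up KCQ answers (1-7) of three participants for four keyboards.
answers = [[5, 4, 3, 6], [6, 4, 2, 5], [5, 5, 3, 6]]
print(friedman(answers))
\end{verbatim}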
\begin{figure}[H]
\centering
\includegraphics[width=1.0\textwidth]{images/kcq_tkbs_res}
\caption{Means of the responses for all questions of the \glsfirst{KCQ}}
\label{fig:kcq_tkbs_res}
\end{figure}

\begin{table}[H]
\centering
\small
\ra{1.3}
\begin{tabular}{?l^l^l^l^l^l^l}
\toprule
\rowstyle{\itshape}
Y & Comparison & Statistic & p & Estimate & CI & Hypothesis \\
\midrule
\multicolumn{7}{l}{\textbf{Non-Parametric (Wilcoxon Signed Rank Test)}} \\
KCQ1 & Aphrodite - Athena & 191.5 & 0.051$^\dagger$ & 1.5 & [0.5, 2.5] & two-tailed \\
\midrule
KCQ3 & Aphrodite - Athena & 209.5 & 0.03$^*$ & 1.25 & [0.25, 2] & two-tailed \\
KCQ3 & Athena - Hera & 37.0 & 0.022$^*$ & -1.25 & [-2, -0.5] & two-tailed \\
KCQ3 & Athena - Nyx & 31.0 & 0.03$^*$ & -1.5 & [-2.5, -0.5] & two-tailed \\
\midrule
KCQ4 & Aphrodite - Nyx & 161.5 & 0.038$^*$ & 1.5 & [0.75, 2.5] & two-tailed \\
KCQ4 & Athena - Hera & 168.5 & 0.072$^\dagger$ & 1.0 & [0.25, 1.5] & two-tailed \\
KCQ4 & Athena - Nyx & 193.5 & 0.006$^*$ & 2.0 & [1, 2.75] & two-tailed \\
\midrule
KCQ6 & Aphrodite - Nyx & 240.0 & 0.061$^\dagger$ & 1.0 & [0.25, 1.75] & two-tailed \\
\midrule
KCQ8 & Athena - Hera & 18.0 & 0.007$^*$ & -1.25 & [-1.75, -0.75] & two-tailed \\
KCQ8 & Athena - Nyx & 12.5 & 0.007$^*$ & -1.25 & [-2, -0.75] & two-tailed \\
\bottomrule
\end{tabular}
\caption{Post-hoc tests for questions of the \gls{KCQ}. Statistically
significant differences ($p < 0.05$) are marked with an asterisk and p values
indicating a trend towards significance with $\dagger$. Confidence intervals
are given for the estimated difference in location}
\label{tbl:res_kcq}
\end{table}
\subsubsection{User Experience Questionnaire (Short)}
\label{sec:res_ueqs}
In addition to the \gls{KCQ}, we used the \glsfirst{UEQ-S}. It features eight
questions on a 7-point Likert scale, which form two scales (pragmatic and
hedonic). We added one extra question, answered on a \glsfirst{VAS} from 0 to
100. Participants filled out the survey after completing both typing tests with
a keyboard. The questions of our modified \gls{UEQ-S} are labeled as follows:

\begin{table}[H]
\centering
\ra{0.8}
\small
\begin{tabular}{llll}
\multicolumn{2}{c}{Pragmatic Scale} & \multicolumn{2}{c}{Hedonic Scale} \\
\\
\textbf{PRA1:} & \textit{``Obstructive or Supportive?''} & \textbf{HED1:} & \textit{``Boring or Exciting?''} \\
\textbf{PRA2:} & \textit{``Complicated or Easy?''} & \textbf{HED2:} & \textit{``Not interesting or Interesting?''} \\
\textbf{PRA3:} & \textit{``Inefficient or Efficient?''} & \textbf{HED3:} & \textit{``Conventional or Inventive?''} \\
\textbf{PRA4:} & \textit{``Confusing or Clear?''} & \textbf{HED4:} & \textit{``Usual or Leading Edge?''} \\
\\
\multicolumn{4}{c}{Additional Question (\gls{VAS})} \\
\\
\textbf{SATI:} & \multicolumn{3}{l}{\textit{``How satisfied have you been with this keyboard?''}}
\end{tabular}
\end{table}

The 7-point Likert scale items (PRA1-4, HED1-4) were then transformed to a
scale from $-3$ to $+3$, where $-3$ represents the left term and $+3$ the right
term of the ``or'' questions. Both sub-scales, pragmatic ($\alpha$ =
0.90)\footnote{PRA: Athena ($\alpha$ = 0.83), Aphrodite ($\alpha$ = 0.95), Nyx
($\alpha$ = 0.90), Hera ($\alpha$ = 0.85)} and hedonic ($\alpha$ =
0.88)\footnote{HED: Athena ($\alpha$ = 0.89), Aphrodite ($\alpha$ = 0.89), Nyx
($\alpha$ = 0.91), Hera ($\alpha$ = 0.90)}, exceeded the recommended threshold
for Cronbach's alpha of $\alpha > 0.7$ \cite{schrepp_ueq_handbook}. The mean
responses to all \gls{UEQ-S} items are shown in Figure \ref{fig:ueq_tkbs_res}
and the individual responses to the additional question (SATI) in Figure
\ref{fig:res_tkbs_sati}. The \gls{rmANOVA}s for the two sub-scales revealed no
statistically significant differences for either the pragmatic scale (F(3, 69)
= 3.254, p = 0.06; the post-hoc tests did not reveal any tendencies) or the
hedonic scale (F(3, 69) = 0.425, p = 0.74). In contrast, the \gls{rmANOVA} for
the additional question \textit{SATI} indicated statistically significant
differences (F(3, 69) = 3.254, p = 0.027). For the post-hoc analysis of SATI,
we used Wilcoxon Signed Rank Tests because we were interested in differences in
medians and because of the relatively high power of this test for \gls{VAS}
data \cite{heller_vas}. The results and descriptive statistics for the test
keyboards are given in Tables \ref{tbl:res_tkbs_sati} and
\ref{tbl:sum_tkbs_sati}.

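The rescaling of the \gls{UEQ-S} items and the reliability estimate can be sketched in Python as follows. This is a hypothetical illustration with made-up responses, assuming raw answers coded 1 to 7 and Cronbach's alpha computed as $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)$ for $k$ items with item variances $s_i^2$ and variance $s_T^2$ of the summed scale scores. Because alpha depends only on variances, shifting the answers from the range 1 to 7 to the range $-3$ to $+3$ does not change it.

\begin{verbatim}
from statistics import variance


def to_bipolar(answer):
    """Map a 7-point answer (1..7) to the UEQ range -3..+3."""
    return answer - 4


def cronbach_alpha(items):
    """items: one list per item, each holding the answers of all participants."""
    k = len(items)
    totals = [sum(answers) for answers in zip(*items)]  # scale score per participant
    item_var = sum(variance(answers) for answers in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Made-up raw answers of five participants to the four pragmatic items.
pra = [
    [5, 6, 4, 7, 5],  # PRA1
    [5, 7, 4, 6, 5],  # PRA2
    [4, 6, 3, 7, 5],  # PRA3
    [6, 6, 4, 7, 4],  # PRA4
]
pra_bipolar = [[to_bipolar(a) for a in item] for item in pra]
print(round(cronbach_alpha(pra_bipolar), 2))
\end{verbatim}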
\begin{figure}[H]
\centering
\includegraphics[width=0.92\textwidth]{images/ueq_tkbs_res}
\caption{Means of the responses for all questions of the \glsfirst{UEQ-S}}
\label{fig:ueq_tkbs_res}
\end{figure}

\begin{table}[H]
\centering
\small
\ra{1.2}
\begin{tabular}{?l^l^l^l^l^l^l}
\toprule
\rowstyle{\itshape}
Y & Comparison & Statistic & p & Estimate & CI & Hypothesis \\
\midrule
\multicolumn{7}{l}{\textbf{Non-Parametric (Wilcoxon Signed Rank Test)}} \\
SATI & Aphrodite - Nyx & 217.0 & 0.046$^*$ & 14.0 & $[5, \infty)$ & greater \\
SATI & Aphrodite - Athena & 201.5 & 0.046$^*$ & 12.5 & $[4.5, \infty)$ & greater \\
SATI & Nyx - Athena & 125.5 & 1.0 & -3.0 & $[-11.5, \infty)$ & greater \\
SATI & Hera - Athena & 205.5 & 0.174 & 8.5 & $[0, \infty)$ & greater \\
SATI & Hera - Aphrodite & 118.5 & 1.0 & -2.5 & $[-12.5, \infty)$ & greater \\
SATI & Hera - Nyx & 223.5 & 0.074$^\dagger$ & 12.5 & $[2.5, \infty)$ & greater \\
\bottomrule
\end{tabular}
\caption{Post-hoc tests for the additional question \textit{``How satisfied
have you been with this keyboard?''}. Statistically significant differences
($p < 0.05$) are marked with an asterisk and p values indicating a trend
towards significance with $\dagger$. Confidence intervals are given for the
estimated difference in location. We only tested keyboards with lower
actuation force against keyboards with higher actuation force. The first
comparison, Aphrodite (50\,g) against Nyx (35\,g), was added because of the
noticeable differences in the visual assessment of Figure
\ref{fig:res_tkbs_sati}}
\label{tbl:res_tkbs_sati}
\end{table}

\begin{table}[H]
\centering
\footnotesize
\ra{1.1}
\begin{tabular}{?r^l^l^l^l^l^l^l}
\toprule
\rowstyle{\itshape}
Pseud. & Mean & Median & Min & Max & SD & SE \\
\midrule
Athena & 54.12 & 50.00 & 1.00 & 95.00 & 25.43 & 5.19 \\
Aphrodite & 65.08 & 71.50 & 10.00 & 94.00 & 22.56 & 4.61 \\
Nyx & 51.42 & 55.00 & 0.00 & 90.00 & 23.40 & 4.78 \\
Hera & 63.29 & 70.00 & 12.00 & 92.00 & 19.95 & 4.07 \\
\bottomrule
\end{tabular}
\caption{Descriptive statistics for the additional question \textit{``How
satisfied have you been with this keyboard?''} for all four test
keyboards}
\label{tbl:sum_tkbs_sati}
\end{table}
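The SE column of Table \ref{tbl:sum_tkbs_sati} is consistent with
$\mathrm{SE} = \mathrm{SD}/\sqrt{n}$ for $n = 24$ participants (e.g.,
$25.43/\sqrt{24} \approx 5.19$ for \textit{Athena}). The Python sketch below
computes the columns of such a summary from a list of ratings using only the
standard library. It assumes that SD denotes the sample standard deviation
(denominator $n - 1$); the function name \texttt{describe} and the example
ratings are hypothetical.

```python
import math
import statistics
from typing import Sequence


def describe(ratings: Sequence[float]) -> dict[str, float]:
    """Mean, median, min, max, sample SD (n - 1) and standard error of the mean."""
    n = len(ratings)
    sd = statistics.stdev(ratings)  # sample standard deviation; needs n >= 2
    return {
        "mean": statistics.mean(ratings),
        "median": statistics.median(ratings),
        "min": min(ratings),
        "max": max(ratings),
        "sd": sd,
        "se": sd / math.sqrt(n),  # SE = SD / sqrt(n)
    }


if __name__ == "__main__":
    print(describe([1, 2, 3, 4]))  # hypothetical ratings, not study data
    # SE recomputed from the reported SD of Athena with n = 24.
    print(round(25.43 / math.sqrt(24), 2))  # 5.19
```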
|
|
|
|
\begin{figure}[H]
\centering
\includegraphics[width=1.0\textwidth]{images/sati_tkbs_res}
\caption{Responses for the additional question \textit{``How satisfied have
you been with this keyboard?''}, with the means over all participants
represented as horizontal lines}
\label{fig:res_tkbs_sati}
\end{figure}
|
|
|
|
\subsection{UX Curves and Semi-Structured Interviews}
\label{sec:res_uxc}
To give all participants the chance to recapitulate the whole experiment and
to give retrospective feedback on each individual keyboard, we conducted a
semi-structured interview that included drawing \gls{UX Curve}s for perceived
fatigue and perceived typing speed. We evaluated each curve by measuring the
y positions of its \gls{SP} and of its \gls{EP} and determining the resulting
slope. A slope is defined as improving if \gls{SP} $<$ \gls{EP},
deteriorating if \gls{SP} $>$ \gls{EP}, and stable if \gls{SP} $=$ \gls{EP}
within a margin of $\pm 1$\,mm. A curve can represent either one typing test
(C1 or C2) or the whole experience with one keyboard over the course of both
typing tests (C12). All curves are shown in Appendix \ref{app:uxc}, and the
resulting slopes for all curve types are shown in Figure \ref{fig:res_uxc}.

During the semi-structured interview, we asked the participants to rank the
keyboards from 1 (favorite) to 5 (least favorite). If in doubt, participants
were allowed to place two keyboards on the same rank. In addition, we asked a
subset of the participants (n = 19) to rank the keyboards from lowest (1) to
highest (5) actuation force. The participants' own keyboard was placed first
four times more often than any other keyboard. \textit{Hera} was the only
keyboard that was never placed fifth and, apart from \textit{Own}, the
keyboard placed most often in the top three. The ranking of the perceived
actuation force revealed that participants were able to identify
\textit{Nyx} (35\,g) and \textit{Athena} (80\,g) as the keyboards with the
lowest and highest actuation force, respectively. All results for both
rankings are visualized in Figure \ref{fig:res_interview}.

Lastly, we analyzed the recordings of all interviews and found several
recurring statements about specific keyboards. Twelve participants noted that
the new form factor of the test keyboards required additional familiarization
before they felt comfortable; nine of them specifically mentioned the height
of the keyboard as the main difference. Fourteen subjects made statements
such as \textit{``Because Nyx had such a low resistance, I kept making
mistakes!''}. Four participants explicitly noted that \textit{Hera} felt very
pleasant, and two subjects remarked \textit{``I had really good flow''} and
\textit{``It somehow just felt right''}. Ten participants reported that typing
on \textit{Athena} was exhausting. \textit{Aphrodite} was mentioned less often
than the other keyboards, which could be related to a comment made by two
subjects: \textit{``It felt very similar to my own keyboard''}.
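The slope classification of the \gls{UX Curve}s described above can be
sketched in Python as follows. We read the margin of $\pm 1$\,mm as: a curve
whose start and end points differ in y position by at most 1\,mm counts as
stable, and otherwise the comparison of \gls{SP} and \gls{EP} decides. This
interpretation, the function name \texttt{classify\_slope}, and the example
coordinates are assumptions for illustration; the returned labels match the
abbreviations used in Figure \ref{fig:res_uxc}.

```python
from collections import Counter


def classify_slope(sp_mm: float, ep_mm: float, margin_mm: float = 1.0) -> str:
    """Classify a UX Curve by the y positions (in mm) of its start and end point.

    Assumption: |EP - SP| <= margin_mm counts as stable ("ST"); otherwise
    SP < EP is improving ("IM") and SP > EP is deteriorating ("DE").
    """
    if abs(ep_mm - sp_mm) <= margin_mm:
        return "ST"
    return "IM" if sp_mm < ep_mm else "DE"


if __name__ == "__main__":
    # Hypothetical (SP, EP) y positions in mm for three curves.
    curves = [(10.0, 10.8), (10.0, 15.0), (20.0, 5.0)]
    print(Counter(classify_slope(sp, ep) for sp, ep in curves))
```

Counting the labels per curve type (C1, C2, C12) in this way yields the kind
of frequencies summarized in the slope figure.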
|
|
\begin{figure}[H]
\centering
\includegraphics[width=1.0\textwidth]{images/res_uxc}
\caption{\centering Evaluation of \gls{UX Curve} slopes for perceived fatigue
and perceived typing speed. \\
\textit{DE:} deteriorating, \textit{IM:} improving, \textit{ST:} stable}
\label{fig:res_uxc}
\end{figure}
|
|
\begin{figure}[H]
\centering
\includegraphics[width=1.0\textwidth]{images/res_interview}
\caption{Rankings of the favorite keyboard and of the perceived actuation
force for all keyboards, including \textit{Own}. The graphs show how often
each keyboard was placed at each rank}
\label{fig:res_interview}
\end{figure}
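The rank frequencies visualized in Figure \ref{fig:res_interview} amount to a
simple tally per keyboard and rank. A minimal Python sketch, assuming each
participant's ranking is stored as a mapping from keyboard name to rank, with
ties allowed; the function name \texttt{rank\_frequencies}, the keyboard names
\texttt{A}, \texttt{B}, \texttt{C}, and the rankings are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Mapping, Sequence


def rank_frequencies(rankings: Sequence[Mapping[str, int]]) -> dict[str, Counter]:
    """Count how often each keyboard was placed at each rank (1 = best)."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for ranking in rankings:
        for keyboard, rank in ranking.items():
            counts[keyboard][rank] += 1  # ties simply increment the shared rank
    return dict(counts)


if __name__ == "__main__":
    rankings = [
        {"A": 1, "B": 2, "C": 3},
        {"A": 1, "B": 1, "C": 2},  # tie: A and B share rank 1
        {"A": 2, "B": 3, "C": 1},
    ]
    freq = rank_frequencies(rankings)
    print(freq["A"][1])  # 2
```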