% A rapid method that creates many corrected errors, has efficient error correction, and leaves
% few uncorrected errors can still be considered a successful method, since it produces
% accurate text in relatively little time. pp. 56 MacKenzie
\section{Results of the Main User Study}
\label{sec:results}
This section presents the statistical analysis of the data obtained in the main within-subject user study (n = 24), which consisted of five repeated measurements. Because the data came from related, dependent groups, we used \textit{\gls{rmANOVA}} if all required assumptions were met and \textit{Friedman's Test} otherwise. To identify the specific pairs of treatments that differed significantly, we ran either \textit{Dependent T-Tests} or \textit{Wilcoxon Signed Rank Tests} (both with \textit{Holm correction (sequentially rejective Bonferroni test)} \cite{holm_correction}) as post-hoc tests \cite{field_stats, downey_stats}. The reliability of the two sub-scales (hedonic and pragmatic quality) in the \glsfirst{UEQ-S} was estimated using \textit{Cronbach's alpha} \cite{tavakol_cronbachs_alpha}. Results are reported as statistically significant at an $\alpha$-level of 0.05 ($p < 0.05$). We used 95\,\% confidence intervals when presenting certain results. Normality of data or residuals was checked by visual assessment of \gls{Q-Q} plots and, additionally, the \textit{Shapiro-Wilk} Test. Further, we used \textit{Mauchly's Test for Sphericity} to evaluate whether the variances of the differences between all pairs of groups differed significantly \cite{field_stats, downey_stats}.

\subsection{Own Keyboard}
\label{sec:res_OPC}
As mentioned in Section \ref{sec:main_design}, the keyboard \textit{Own} was used as a reference for some metrics captured during the experiment.
Since the measurements with \textit{Own} took place at the start (T0\_1) and end (T0\_2) of the experiment, we compared the results of both typing tests to detect possible variations in performance due to fatigue. Using dependent T-tests, we found no significant difference in \glsfirst{KSPS} between T0\_1 (M = 5.39, sd = 1.49) and T0\_2 (M = 5.47, sd = 1.48, t = -1.53, p = 0.139). \glsfirst{UER} was negligible overall for both T0\_1 (M = 0.005, sd = 0.013, 85th percentile = 0.0051) and T0\_2 (M = 0.008, sd = 0.028, 85th percentile = 0.0052), and \glsfirst{WPM} showed a trend towards significance between T0\_1 (M = 54.2, sd = 14.7) and T0\_2 (M = 53.0, sd = 14.5, t = 1.92, p = 0.067). Further, dependent T-tests revealed statistically significant differences in \glsfirst{AdjWPM} between T0\_1 (M = 53.9, sd = 14.5) and T0\_2 (M = 52.5, sd = 14.3, t = 2.44, p = 0.023), in \glsfirst{CER} between T0\_1 (M = 0.057, sd = 0.028) and T0\_2 (M = 0.078, sd = 0.038, t = -3.54, p = 0.002) and in \glsfirst{TER} between T0\_1 (M = 0.063, sd = 0.031) and T0\_2 (M = 0.086, sd = 0.039, t = -4.27, p = 0.0003). Because of these differences, we decided to use, for each participant, the mean of each metric across T0\_1 and T0\_2 as the reference value to compute the \textit{\gls{OPC}} for the test keyboards (\textit{Athena, Aphrodite, Nyx} and \textit{Hera}). This value was later used to make statements about the performance of the individual test keyboards compared to the participant's own, familiar keyboard. Additionally, using a dependent T-test, we compared the muscle activity (\% of \glsfirst{MVC}) and found a significant difference in left flexor (\glsfirst{FDP} \& \glsfirst{FDS}) \%\gls{MVC} between T0\_1 (M = 12.0, sd = 8.27) and T0\_2 (M = 8.53, sd = 7.16, t = 3.18, p = 0.004).
Residuals of right flexor (\gls{FDP} \& \gls{FDS}) were not normally distributed; therefore, we used the Wilcoxon Signed Rank Test and found a significant difference between T0\_1 (M = 10.8, sd = 8.18, Med = 9.52) and T0\_2 (M = 7.71, sd = 6.08, Med = 5.32, p = 0.021). Note that we had to remove two erroneous measurements for the right flexor (n = 22). No significant differences were found in left or right extensor (\glsfirst{ED}) \%\gls{MVC} between T0\_1 and T0\_2. All results are shown in Table \ref{tbl:res_own_before_after}.

\begin{table}[H]
\centering
\small
\ra{1.3}
\begin{tabular}{?l^l^l^l^l^l^l^l}
\toprule
\rowstyle{\itshape} Y & Comparison & Statistic & p & Estimate & CI & Hypothesis \\
\midrule
\multicolumn{6}{l}{\textbf{Parametric (Dependent T-test)}} \\
WPM & T0\_1 - T0\_2 & 1.92 & 0.07^\dagger & 1.18 & [-0.09, 2.45] & two-tailed \\
AdjWPM & T0\_1 - T0\_2 & 2.44 & 0.02^* & 1.35 & [0.21, 2.50] & two-tailed \\
KSPS & T0\_1 - T0\_2 & -1.53 & 0.14 & -0.08 & [-0.19, 0.03] & two-tailed \\
CER & T0\_1 - T0\_2 & -3.54 & 0.002^* & -0.02 & [-0.03, -0.01] & two-tailed \\
TER & T0\_1 - T0\_2 & -4.27 & 0.0003^* & -0.02 & [-0.03, -0.01] & two-tailed \\
\%MVC_{LF} & T0\_1 - T0\_2 & 3.18 & 0.004^* & 3.44 & [1.20, 5.68] & two-tailed \\
\%MVC_{LE} & T0\_1 - T0\_2 & 1.44 & 0.163 & 0.956 & [-0.42, 2.33] & two-tailed \\
\multicolumn{6}{l}{\textbf{Non Parametric (Wilcoxon Signed Rank Test)}} \\
\%MVC_{RF} & T0\_1 - T0\_2 & 197 & 0.021^* & 1.83 & [0.39, 3.93] & two-tailed \\
\%MVC_{RE} & T0\_1 - T0\_2 & 173 & 0.527 & 0.28 & [-0.58, 0.91] & two-tailed \\
\bottomrule
\end{tabular}
\caption{Statistical analysis of differences between typing tests T0\_1 and T0\_2 for keyboard \textit{Own}. For $\%MVC_{RF}$ two erroneous measurements were removed (n = 22). Statistically significant differences ($p < 0.05$) are marked with an asterisk and p values indicating a trend towards significance are denoted with $\dagger$.
Confidence intervals are given for the estimated difference in means (T-test) and the difference of the location parameter (Wilcoxon). The subscripts LF, RF, LE and RE stand for left or right forearm flexor or extensor muscles}
\label{tbl:res_own_before_after}
\end{table}

We also evaluated the means of \glsfirst{KCQ} questions 8 to 12, which concerned perceived fatigue in fingers, wrists, arms, shoulders and neck respectively (7-point Likert scale), as well as the slopes (improving, deteriorating, stable) of the \gls{UX Curve}s drawn by each participant after the whole experiment, to identify possible differences in perceived fatigue from T0\_1 to T0\_2. As shown in Figure \ref{fig:res_own_per_fat}, participants reported in the \gls{KCQ} slight improvements in finger (diff = 0.33) and wrist (diff = 0.33) fatigue in T0\_2 compared to T0\_1, no difference in arm fatigue (diff = 0) and very slightly increased fatigue in shoulder (diff = -0.12) and neck (diff = -0.13). Sixteen of the twenty-four \gls{UX Curve}s regarding overall perceived fatigue had a positive slope when measured from the start of T0\_1 to the end of T0\_2 ($\pm$ 1 mm). The subjective reports of decreased finger and wrist fatigue are consistent with the decrease in flexor muscle activity described in the previous paragraph.

\begin{figure}[H]
\centering
\includegraphics[width=0.98\textwidth]{images/res_own_per_fat}
\caption{Trends for reported fatigue through the \gls{KCQ} (questions 8: finger, 9: wrist, 10: arm, 11: shoulder, 12: neck) and histogram for the slopes (IM: improving, DE: deteriorating, ST: stable) of \gls{UX Curve}s concerning perceived fatigue.
The curves were evaluated by comparing the y value of the starting point for T0\_1 with the y value of the end point for T0\_2, with a margin of $\pm$ 1 mm}
\label{fig:res_own_per_fat}
\end{figure}

\subsection{Performance Metrics}
% As briefly mentioned in the last section, the individual measurements were then converted into
% percentage values of the mean of the reference values gathered from typing tests
% with keyboard \textit{Own} (\gls{OPC}).
\label{sec:res_perf}

\subsubsection{Typing Speed}
\label{sec:res_typing_speed}
The typing speed for each individual keyboard and typing test was captured automatically by the typing test functionality of \glsfirst{GoTT}. We captured \gls{WPM}, \gls{AdjWPM} and \gls{KSPS} according to the formulas given in Section \ref{sec:meas_perf}. For the following statistical analysis, we used the mean of the results of both typing tests performed with each keyboard. A \gls{rmANOVA} revealed differences between at least two of the test keyboards (\textit{Athena, Aphrodite, Nyx} and \textit{Hera}) in terms of \gls{WPM} (F(3, 69) = 6.036, p = 0.001). We performed dependent T-tests with Holm correction and found significant differences between \textit{Aphrodite} (M = 51.5, sd = 14.0) and \textit{Nyx} (M = 49.4, sd = 13.3, t = 3.33, p = 0.014), between \textit{Athena} (M = 51.5, sd = 14.2) and \textit{Nyx} (M = 49.4, sd = 13.3, t = 2.76, p = 0.044) and between \textit{Hera} (M = 51.9, sd = 14.6) and \textit{Nyx} (M = 49.4, sd = 13.3, t = 3.53, p = 0.01). Further, the \gls{rmANOVA}s for \gls{AdjWPM} (F(3, 69) = 6.197, p = 0.0009) and for \gls{KSPS} (F(3, 69) = 3.566, p = 0.018) also indicated significant differences. The summary of the performance data and all relevant results of the post-hoc tests are shown in Tables \ref{tbl:sum_tkbs_speed} and \ref{tbl:res_tkbs_speed}.
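
The Holm correction applied to the post-hoc tests above can be sketched as follows. This is only an illustration of the standard step-down adjustment, assuming that all pairwise comparisons between the four test keyboards enter the correction, so that $m = 4 \cdot 3 / 2 = 6$. The symbols $m$, $p_{(i)}$ and $\tilde{p}_{(i)}$ are introduced solely for this sketch.
\[
\tilde{p}_{(i)} = \max_{j \le i} \, \min\bigl( 1, \; (m - j + 1) \, p_{(j)} \bigr), \qquad i = 1, \dots, m,
\]
where $p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}$ denote the ordered unadjusted p values. Under this assumption, the smallest unadjusted p value is multiplied by 6, the second smallest by 5, and so on, and a comparison is reported as significant if $\tilde{p}_{(i)} < 0.05$.
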
We further examined which of the four test keyboards was the fastest for each participant and found that \textit{Hera} was the fastest keyboard in terms of \gls{WPM} for 46\,\% (11) of the twenty-four subjects. Additionally, we analyzed the \gls{WPM} percentage of \textit{Own} (\gls{OPC}) for all test keyboards to determine which keyboards exceeded the performance of the participant's own keyboard. We found that three subjects reached \gls{OPC}\_\gls{WPM} values greater than 100\,\% with all four test keyboards. Also, \textit{Athena, Aphrodite} and \textit{Hera} exceeded 100\,\% of \gls{OPC}\_\gls{WPM} eight, seven and six times respectively. Detailed results are presented in Figure \ref{fig:max_opc_wpm}.

\begin{table}[H]
\centering
\footnotesize
\ra{1.2}
\toprule
\parbox{.49\linewidth}{
\begin{tabular}{?r^l^l^l^l^l}
\multicolumn{6}{c}{\textbf{\gls{WPM}}} \\
\rowstyle{\itshape} Pseud. & Mean & Min & Max & SD & SE \\
\midrule
Athena & 51.47 & 17.96 & 73.86 & 14.21 & 2.90 \\
Aphrodite & 51.46 & 20.76 & 76.36 & 14.01 & 2.86 \\
Nyx & 49.39 & 20.80 & 74.26 & 13.28 & 2.71 \\
Hera & 51.87 & 18.10 & 76.06 & 14.55 & 2.97 \\
\end{tabular}
}
\parbox{.49\linewidth}{
\begin{tabular}{?r^l^l^l^l^l^l^l}
\multicolumn{6}{c}{\textbf{\gls{AdjWPM}}} \\
\rowstyle{\itshape} Pseud. & Mean & Min & Max & SD & SE \\
\midrule
Athena & 51.04 & 17.94 & 73.19 & 14.07 & 2.87 \\
Aphrodite & 50.97 & 20.76 & 75.78 & 13.95 & 2.85 \\
Nyx & 48.84 & 20.80 & 73.62 & 13.17 & 2.69 \\
Hera & 51.32 & 18.06 & 75.14 & 14.40 & 2.94 \\
\end{tabular}
}
\begin{tabular}{?r^l^l^l^l^l^l^l}
\\
\multicolumn{6}{c}{\textbf{\gls{KSPS}}} \\
\rowstyle{\itshape} Pseud.
& Mean & Min & Max & SD & SE \\ \midrule Athena & 5.23 & 1.68 & 7.94 & 1.54 & 0.31 \\ Aphrodite & 5.32 & 2.00 & 8.14 & 1.50 & 0.31 \\ Nyx & 5.31 & 1.95 & 8.15 & 1.48 & 0.30 \\ Hera & 5.37 & 1.72 & 8.15 & 1.57 & 0.32 \\ \end{tabular} \bottomrule \caption{Summaries for \glsfirst{WPM}, \glsfirst{AdjWPM} and \glsfirst{KSPS} for the test keyboards} \label{tbl:sum_tkbs_speed} \end{table} \begin{table}[H] \centering \small \ra{1.3} \begin{tabular}{?l^l^l^l^l^l^l^l} \toprule \rowstyle{\itshape} Y & Comparison & Statistic & p & Estimate & CI & Hypothesis \\ \midrule \multicolumn{6}{l}{\textbf{Parametric (Dependent T-test)}} \\ WPM & Athena - Nyx & 2.765 & 0.044^* & 2.083 & [0.52, 3.64] & two-tailed \\ WPM & Aphrodite - Nyx & 3.332 & 0.014^* & 2.069 & [0.78, 3.35] & two-tailed \\ WPM & Hera - Nyx & 3.541 & 0.010^* & 2.479 & [1.03, 3.93] & two-tailed \\ AdjWPM & Athena - Nyx & 2.868 & 0.035^* & 2.200 & [0.61, 3.79] & two-tailed \\ AdjWPM & Aphrodite - Nyx & 3.443 & 0.011^* & 2.132 & [0.85, 3.41] & two-tailed \\ AdjWPM & Hera - Nyx & 3.515 & 0.011^* & 2.475 & [1.02, 3.93] & two-tailed \\ KSPS & Athena - Hera & -2.834 & 0.056^\dagger & -0.145 & [-0.25, -0.04] & two-tailed \\ KSPS & Aphrodite - Athena & 2.566 & 0.086^\dagger & 0.095 & [0.02, 0.17] & two-tailed \\ \bottomrule \end{tabular} \caption{Relevant post-hoc results of speed related metrics for the test keyboards. Significant p values are denoted with * and p values indicating a trend towards significance are marked with $\dagger$. Confidence intervals are given for the estimate in the difference in means} \label{tbl:res_tkbs_speed} \end{table} \begin{figure}[H] \centering \includegraphics[width=1.0\textwidth]{images/max_opc_wpm} \caption{The left graph shows the fastest keyboard in terms of \gls{WPM} for each participant. 
The right graph shows which keyboards were even faster than the participant's own keyboard (\gls{OPC}\_\gls{WPM} $>$ 100\,\%)}
\label{fig:max_opc_wpm}
\end{figure}

\subsubsection{Error Rate}
\label{sec:res_error_rate}
\gls{GoTT} also automatically tracked various error-related metrics, of which we analyzed \glsfirst{UER}, \glsfirst{CER} and \glsfirst{TER}. Since we were interested in whether higher actuation forces lead to lower error rates than lower actuation forces, we conducted one-tailed post-hoc tests for the following statistical analyses. As in Section \ref{sec:res_typing_speed}, we used the means of the results from both typing tests for each keyboard to conduct the analysis. Friedman's Test for \gls{TER} ($\chi^2$(3) = 25.4, p = 0.00001) and the \gls{rmANOVA} for \gls{CER} (F(3, 69) = 13.355, p = 0.0000408 (\gls{GG})) revealed differences between at least two test keyboards. Friedman's Test for \gls{UER} ($\chi^2$(3) = 2.59, p = 0.46) yielded no statistically significant difference. It should be noted that the 90th percentile of \gls{UER} for all keyboards was still below 1\,\%. Summaries of the individual metrics and the results of all post-hoc tests are shown in Tables \ref{tbl:sum_tkbs_err} and \ref{tbl:res_tkbs_err}. Furthermore, we compared the \gls{TER} of all test keyboards for each participant and found that \textit{Athena} was the keyboard with which participants typed most accurately. Two participants scored identical \gls{TER}s with two test keyboards; therefore, the total number of ``1st-placed'' keyboards increased to twenty-six. Lastly, we compared the test keyboards to the subjects' own keyboards and found that eleven participants scored lower \gls{TER}s with \textit{Athena} than with \textit{Own} (\gls{OPC}). All data are shown in Figure \ref{fig:max_opc_ter}.

\begin{table}[H]
\centering
\footnotesize
\ra{1.2}
\toprule
\begin{tabular}{?r^l^l^l^l^l}
\multicolumn{6}{c}{\textbf{\gls{TER}}} \\
\rowstyle{\itshape} Pseud.
& Mean & Min & Max & SD & SE \\ \midrule Athena & 0.08 & 0.02 & 0.17 & 0.03 & 0.01 \\ Aphrodite & 0.09 & 0.02 & 0.20 & 0.04 & 0.01 \\ Nyx & 0.11 & 0.03 & 0.25 & 0.06 & 0.01 \\ Hera & 0.09 & 0.02 & 0.21 & 0.04 & 0.01 \\ \end{tabular} \\ \parbox{.49\linewidth}{ \begin{tabular}{?r^l^l^l^l^l^l^l} \multicolumn{6}{c}{\textbf{\gls{UER}}} \\ \rowstyle{\itshape} Pseud. & Mean & Min & Max & SD & SE \\ \midrule Athena & 0.01 & 0.00 & 0.14 & 0.03 & 0.01 \\ Aphrodite & 0.01 & 0.00 & 0.17 & 0.03 & 0.01 \\ Nyx & 0.01 & 0.00 & 0.21 & 0.04 & 0.01 \\ Hera & 0.01 & 0.00 & 0.18 & 0.04 & 0.01 \\ \end{tabular} } \parbox{.49\linewidth}{ \begin{tabular}{?r^l^l^l^l^l^l^l} \multicolumn{6}{c}{\textbf{\gls{CER}}} \\ \rowstyle{\itshape} Pseud. & Mean & Min & Max & SD & SE \\ \midrule Athena & 0.07 & 0.02 & 0.13 & 0.03 & 0.01 \\ Aphrodite & 0.08 & 0.02 & 0.18 & 0.04 & 0.01 \\ Nyx & 0.10 & 0.03 & 0.23 & 0.05 & 0.01 \\ Hera & 0.08 & 0.02 & 0.14 & 0.04 & 0.01 \\ \end{tabular} } \bottomrule \caption{Descriptive statistics for \glsfirst{TER}, \glsfirst{UER} and \glsfirst{CER} for the test keyboards} \label{tbl:sum_tkbs_err} \end{table} \begin{table}[H] \centering \small \ra{1.3} \begin{tabular}{?l^l^l^l^l^l^l^l} \toprule \rowstyle{\itshape} Y & Comparison & Statistic & p & Estimate & CI & Hypothesis \\ \midrule \multicolumn{6}{l}{\textbf{Non Parametric (Wilcoxon Signed Rank Test)}} \\ TER & Athena - Hera & 38.0 & 0.004^* & -0.011 & ]-Inf, -0.01] & less \\ TER & Athena - Aphrodite & 58.5 & 0.009^* & -0.012 & ]-Inf, 0] & less \\ TER & Athena - Nyx & 18.0 & 0.00009^* & -0.027 & ]-Inf, -0.02] & less \\ TER & Aphrodite - Nyx & 35.5 & 0.002^* & -0.018 & ]-Inf, -0.01] & less \\ TER & Hera - Aphrodite & 181.0 & 0.816 & 0.002 & ]-Inf, 0.01] & less \\ TER & Hera - Nyx & 29.5 & 0.002^* & -0.016 & ]-Inf, -0.01] & less \\ \multicolumn{6}{l}{\textbf{Parametric (Dependent T-test)}} \\ CER & Athena - Hera & -2.796 & 0.015^* & -0.011 & ]-Inf, 0] & less \\ CER & Athena - Aphrodite & -2.772 & 0.015^* & -0.011 & ]-Inf, 
0] & less \\
CER & Athena - Nyx & -4.356 & 0.0007^* & -0.030 & ]-Inf, -0.02] & less \\
CER & Aphrodite - Nyx & -3.821 & 0.002^* & -0.019 & ]-Inf, -0.01] & less \\
CER & Hera - Aphrodite & 0.050 & 0.520 & 0.000 & ]-Inf, 0.01] & less \\
CER & Hera - Nyx & -3.825 & 0.002^* & -0.019 & ]-Inf, -0.01] & less \\
\bottomrule
\end{tabular}
\caption{Post-hoc results of error rates for the test keyboards. Significant p values are denoted with *. Confidence intervals are given for the estimated difference in means (T-test) and the difference of the location parameter (Wilcoxon)}
\label{tbl:res_tkbs_err}
\end{table}

\begin{figure}[H]
\centering
\includegraphics[width=1.0\textwidth]{images/max_opc_ter}
\caption{The left graph shows the keyboard with the lowest \gls{TER} for each participant. The right graph shows which keyboards were more accurate than the participant's own keyboard (\gls{OPC}\_\gls{TER} $<$ 100\,\%)}
\label{fig:max_opc_ter}
\end{figure}

\subsection{Muscle Activity Measurements}
\label{sec:res_muscle_activity}
We used the \gls{EMG} device described in Section \ref{sec:main_design} to gather data about the muscle activity (\% of \glsfirst{MVC}) of the extensor and flexor muscles of both forearms during the typing tests. For our analysis, we used the mean values of the results for both typing tests with each keyboard. Note that we had to remove two erroneous measurements concerning the right flexor muscle (n = 22). We found no significant differences in \%\gls{MVC} between the test keyboards in either the flexor or the extensor \gls{EMG} measurements. Further, we analyzed the effect of the individual keyboards on \%\gls{MVC} separately for the first and second typing tests (Tn\_1 \& Tn\_2, n := 1, ..., 4), but did not find any statistically significant results either.
Additionally, we analyzed possible differences between the \%\gls{MVC} measurements of the first and second typing tests for each individual keyboard, using either dependent T-tests or Wilcoxon Signed Rank Tests. There were no statistically significant differences in \%\gls{MVC} between the first and the second typing test for any keyboard/muscle combination. The summaries for all test keyboards of the mean values for both typing tests combined are shown in Table \ref{tbl:sum_tkbs_emg}. Lastly, we created histograms (Figure \ref{fig:max_emg_tkbs}) for each of the observed muscle groups that show the number of times a keyboard yielded the highest \%\gls{MVC} out of all keyboards for each participant. We found that \textit{Athena} most frequently ($\approx$45\,\%) produced the highest extensor muscle activity for both arms. The highest muscle activity for both flexor muscle groups was evenly distributed among all test keyboards, with the slight exception of \textit{Nyx}, which produced the highest \%\gls{MVC} in only $\approx$14\,\% of participants.

\begin{figure}[H]
\centering
\includegraphics[width=1.0\textwidth]{images/max_emg_tkbs}
\caption{Histograms for all \gls{EMG} measurements that show the keyboard with the highest mean \% of \glsfirst{MVC} out of all four keyboards for each participant}
\label{fig:max_emg_tkbs}
\end{figure}

\begin{table}[H]
\centering
\footnotesize
\ra{1.2}
\toprule
\parbox{.49\linewidth}{
\begin{tabular}{?r^l^l^l^l^l}
\multicolumn{6}{c}{\textbf{Left Flexor \%\gls{MVC}}} \\
\rowstyle{\itshape} Pseud. & Mean & Min & Max & SD & SE \\
\midrule
Athena & 9.90 & 0.94 & 41.91 & 9.03 & 1.84 \\
Aphrodite & 8.82 & 0.26 & 23.10 & 6.37 & 1.30 \\
Nyx & 8.84 & 2.13 & 24.37 & 6.65 & 1.36 \\
Hera & 9.98 & 2.82 & 25.18 & 6.91 & 1.41 \\
\end{tabular}
}
\parbox{.49\linewidth}{
\begin{tabular}{?r^l^l^l^l^l^l^l}
\multicolumn{6}{c}{\textbf{Right Flexor \%\gls{MVC}} \textit{(n = 22)}} \\
\rowstyle{\itshape} Pseud.
& Mean & Min & Max & SD & SE \\ \midrule Athena & 9.69 & 2.13 & 23.88 & 5.67 & 1.21 \\ Aphrodite & 9.33 & 2.15 & 16.96 & 4.51 & 0.96 \\ Nyx & 8.60 & 1.68 & 16.16 & 4.43 & 0.94 \\ Hera & 9.26 & 1.42 & 20.39 & 5.75 & 1.23 \\ \end{tabular} } \\ \parbox{.49\linewidth}{ \begin{tabular}{?r^l^l^l^l^l^l^l} \multicolumn{6}{c}{\textbf{Left Extensor \%\gls{MVC}}} \\ \rowstyle{\itshape} Pseud. & Mean & Min & Max & SD & SE \\ \midrule Athena & 12.24 & 5.17 & 18.98 & 4.11 & 0.84 \\ Aphrodite & 11.60 & 4.80 & 16.86 & 3.67 & 0.75 \\ Nyx & 11.43 & 5.14 & 16.45 & 3.87 & 0.79 \\ Hera & 11.73 & 4.80 & 21.05 & 4.10 & 0.84 \\ \end{tabular} } \parbox{.49\linewidth}{ \begin{tabular}{?r^l^l^l^l^l^l^l} \multicolumn{6}{c}{\textbf{Right Extensor \%\gls{MVC}}} \\ \rowstyle{\itshape} Pseud. & Mean & Min & Max & SD & SE \\ \midrule Athena & 10.78 & 3.34 & 17.58 & 3.86 & 0.79 \\ Aphrodite & 10.66 & 3.56 & 19.05 & 4.41 & 0.90 \\ Nyx & 10.57 & 3.81 & 21.55 & 4.33 & 0.88 \\ Hera & 10.79 & 4.11 & 19.50 & 4.09 & 0.83 \\ \end{tabular} } \bottomrule \caption{Descriptive statistics for the \textit{mean values of} measured muscle activity (\% of \glsfirst{MVC}) in \textit{both typing tests} conducted with each keyboard.} \label{tbl:sum_tkbs_emg} \end{table} \pagebreak \subsection{Questionnaires} \label{sec:res_questionnaires} \subsubsection{Keyboard Comfort Questionnaire} \label{sec:res_kcq} The \glsfirst{KCQ} was filled out by the participants after each individual typing test. 
The questionnaire featured twelve questions regarding the previously used keyboard, which are labeled as follows:
\begin{table}[H]
\centering
\ra{0.8}
\small
\begin{tabular}{llll}
\textbf{KCQ1:} & \textit{``Required operating force during usage?''} & \textbf{KCQ7:} & \textit{``Ease of use?''} \\
\textbf{KCQ2:} & \textit{``Perceived uniformity during usage?''} & \textbf{KCQ8:} & \textit{``Fatigue of the fingers?''} \\
\textbf{KCQ3:} & \textit{``Effort required during usage?''} & \textbf{KCQ9:} & \textit{``Fatigue of the wrists?''} \\
\textbf{KCQ4:} & \textit{``Perceived accuracy?''} & \textbf{KCQ10:} & \textit{``Fatigue of the arms?''} \\
\textbf{KCQ5:} & \textit{``Acceptability of speed?''} & \textbf{KCQ11:} & \textit{``Fatigue of the shoulders?''} \\
\textbf{KCQ6:} & \textit{``Overall satisfaction?''} & \textbf{KCQ12:} & \textit{``Fatigue of the neck?''} \\
\end{tabular}
\end{table}
All questions featured a 7-point Likert scale where 1 always denoted the worst and 7 the best possible experience \cite{iso9241-411}. We conducted Friedman's Tests for all questions and found differences between at least two of the test keyboards in \textit{KCQ3} ($\chi^2$(3) = 9.49, p = 0.024), \textit{KCQ4} ($\chi^2$(3) = 18.4, p = 0.0004), \textit{KCQ6} ($\chi^2$(3) = 10.2, p = 0.017) and \textit{KCQ8} ($\chi^2$(3) = 12.0, p = 0.0075). Further, we noticed a trend towards significance for question \textit{KCQ1} ($\chi^2$(3) = 7.02, p = 0.071). The mean values for all answers are shown in Figure \ref{fig:kcq_tkbs_res} and the post-hoc tests for the relevant questions are shown in Table \ref{tbl:res_kcq}.
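
For orientation, the Friedman statistic reported above can be written in its textbook form. The following is only a sketch: it omits the correction for tied ranks, which statistical software typically applies to Likert-type responses, and the symbols $n$, $k$ and $R_j$ are introduced solely for this illustration.
\[
\chi^2_F = \frac{12}{n \, k \, (k + 1)} \sum_{j = 1}^{k} R_j^2 \; - \; 3 \, n \, (k + 1),
\]
where $n = 24$ is the number of participants, $k = 4$ the number of test keyboards and $R_j$ the sum of the within-participant ranks assigned to keyboard $j$. Under the null hypothesis, $\chi^2_F$ approximately follows a $\chi^2$ distribution with $k - 1 = 3$ degrees of freedom, which corresponds to the $\chi^2$(3) values given above.
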
\begin{figure}[H]
\centering
\includegraphics[width=1.0\textwidth]{images/kcq_tkbs_res}
\caption{Means of the responses for all questions of the \glsfirst{KCQ}}
\label{fig:kcq_tkbs_res}
\end{figure}

\begin{table}[H]
\centering
\small
\ra{1.3}
\begin{tabular}{?l^l^l^l^l^l^l^l}
\toprule
\rowstyle{\itshape} Y & Comparison & Statistic & p & Estimate & CI & Hypothesis \\
\midrule
\multicolumn{6}{l}{\textbf{Non Parametric (Wilcoxon Signed Rank Test)}} \\
KCQ1 & Aphrodite - Athena & 191.5 & 0.051^\dagger & 1.5 & [0.5, 2.5] & two-tailed \\
\midrule
KCQ3 & Aphrodite - Athena & 209.5 & 0.03^* & 1.25 & [0.25, 2] & two-tailed \\
KCQ3 & Athena - Hera & 37.0 & 0.022^* & -1.25 & [-2, -0.5] & two-tailed \\
KCQ3 & Athena - Nyx & 31.0 & 0.03^* & -1.5 & [-2.5, -0.5] & two-tailed \\
\midrule
KCQ4 & Aphrodite - Nyx & 161.5 & 0.038^* & 1.5 & [0.75, 2.5] & two-tailed \\
KCQ4 & Athena - Hera & 168.5 & 0.072^\dagger & 1.0 & [0.25, 1.5] & two-tailed \\
KCQ4 & Athena - Nyx & 193.5 & 0.006^* & 2.0 & [1, 2.75] & two-tailed \\
\midrule
KCQ6 & Aphrodite - Nyx & 240.000 & 0.061^\dagger & 1.0 & [0.25, 1.75] & two-tailed \\
\midrule
KCQ8 & Athena - Hera & 18.000 & 0.007^* & -1.25 & [-1.75, -0.75] & two-tailed \\
KCQ8 & Athena - Nyx & 12.500 & 0.007^* & -1.25 & [-2, -0.75] & two-tailed \\
\bottomrule
\end{tabular}
\caption{Post-hoc tests for questions from the \gls{KCQ}. Statistically significant differences ($p < 0.05$) are marked with an asterisk and p values indicating a trend towards significance are denoted with $\dagger$. Confidence intervals are given for the difference of the location parameter}
\label{tbl:res_kcq}
\end{table}

\subsubsection{User Experience Questionnaire (Short)}
\label{sec:res_ueqs}
In addition to the \gls{KCQ}, we utilized the \glsfirst{UEQ-S}. It featured eight questions on a 7-point Likert scale, which formed two scales (pragmatic, hedonic). Additionally, we added one extra question that was answered on a \glsfirst{VAS} from 0 to 100.
The survey was filled out after both tests with a keyboard had been completed. The questions of our modified \gls{UEQ-S} were labeled as follows:
\begin{table}[H]
\centering
\ra{0.8}
\small
\begin{tabular}{llll}
\multicolumn{2}{c}{Pragmatic Scale} & \multicolumn{2}{c}{Hedonic Scale} \\
\\
\textbf{PRA1:} & \textit{``Obstructive or Supportive?''} & \textbf{HED1:} & \textit{``Boring or Exciting?''} \\
\textbf{PRA2:} & \textit{``Complicated or Easy?''} & \textbf{HED2:} & \textit{``Not interesting or Interesting?''} \\
\textbf{PRA3:} & \textit{``Inefficient or Efficient?''} & \textbf{HED3:} & \textit{``Conventional or Inventive?''} \\
\textbf{PRA4:} & \textit{``Confusing or Clear?''} & \textbf{HED4:} & \textit{``Usual or Leading Edge?''} \\
\\
\multicolumn{4}{c}{Additional Question (\gls{VAS})} \\
\\
\textbf{SATI:} & \multicolumn{3}{l}{\textit{``How satisfied have you been with this keyboard?''}}
\end{tabular}
\end{table}
The 7-point Likert scale items (PRA1-4, HED1-4) were then transformed to a scale from -3 to +3, where -3 represented the left term and +3 the right term of the ``or'' questions. Both sub-scales, pragmatic ($\alpha$ = 0.90)\footnote{PRA: Athena ($\alpha$ = 0.83), Aphrodite ($\alpha$ = 0.95), Nyx ($\alpha$ = 0.90), Hera ($\alpha$ = 0.85)} and hedonic ($\alpha$ = 0.88)\footnote{HED: Athena ($\alpha$ = 0.89), Aphrodite ($\alpha$ = 0.89), Nyx ($\alpha$ = 0.91), Hera ($\alpha$ = 0.90)}, exceeded the recommended threshold for Cronbach's alpha of $\alpha > 0.7$ \cite{schrepp_ueq_handbook}. The mean values for all responses of the \gls{UEQ-S} are shown in Figure \ref{fig:ueq_tkbs_res} and the individual responses to the additional question (SATI) are presented in Figure \ref{fig:res_tkbs_sati}. We conducted \gls{rmANOVA}s for both sub-scales but found no statistically significant differences for either the pragmatic scale (F(3, 69) = 3.254, p = 0.06; post-hoc tests did not reveal any tendencies) or the hedonic scale (F(3, 69) = 0.425, p = 0.74).
In contrast, the \gls{rmANOVA} for the additional question \textit{SATI} indicated statistically significant differences (F(3, 69) = 3.254, p = 0.027). In this case, we decided to use Wilcoxon Signed Rank Tests for our post-hoc analysis because of our interest in the difference of medians and the relatively high power of this test in analyzing \gls{VAS} data \cite{heller_vas}. The results and summaries for the test keyboards are shown in Tables \ref{tbl:res_tkbs_sati} and \ref{tbl:sum_tkbs_sati}.

\begin{figure}[H]
\centering
\includegraphics[width=0.92\textwidth]{images/ueq_tkbs_res}
\caption{Means of the responses for all questions of the \glsfirst{UEQ-S}}
\label{fig:ueq_tkbs_res}
\end{figure}

\begin{table}[H]
\centering
\small
\ra{1.2}
\begin{tabular}{?l^l^l^l^l^l^l^l}
\toprule
\rowstyle{\itshape} Y & Comparison & Statistic & p & Estimate & CI & Hypothesis \\
\midrule
\multicolumn{6}{l}{\textbf{Non Parametric (Wilcoxon Signed Rank Test)}} \\
SATI & Aphrodite - Nyx & 217.0 & 0.046^* & 14.0 & [5, Inf[ & greater \\
SATI & Aphrodite - Athena & 201.5 & 0.046^* & 12.5 & [4.5, Inf[ & greater \\
SATI & Nyx - Athena & 125.5 & 1.0 & -3.0 & [-11.5, Inf[ & greater \\
SATI & Hera - Athena & 205.5 & 0.174 & 8.5 & [0, Inf[ & greater \\
SATI & Hera - Aphrodite & 118.5 & 1.0 & -2.5 & [-12.5, Inf[ & greater \\
SATI & Hera - Nyx & 223.5 & 0.074^\dagger & 12.5 & [2.5, Inf[ & greater \\
\bottomrule
\end{tabular}
\caption{Post-hoc tests for the additional question \textit{``How satisfied have you been with this keyboard?''}. Statistically significant differences ($p < 0.05$) are marked with an * and p values indicating a trend towards significance are denoted with $\dagger$. Confidence intervals are given for the difference of the location parameter. We only tested keyboards with lower actuation force against keyboards with higher actuation force.
The first comparison of Aphrodite (50\,g) and Nyx (35\,g) was added because of the noticeable differences in the visual assessment of Figure \ref{fig:res_tkbs_sati}}
\label{tbl:res_tkbs_sati}
\end{table}

\begin{table}[H]
\centering
\footnotesize
\ra{1.1}
\begin{tabular}{?r^l^l^l^l^l^l^l}
\toprule
\rowstyle{\itshape} Pseud. & Mean & Median & Min & Max & SD & SE \\
\midrule
Athena & 54.12 & 50.00 & 1.00 & 95.00 & 25.43 & 5.19 \\
Aphrodite & 65.08 & 71.50 & 10.00 & 94.00 & 22.56 & 4.61 \\
Nyx & 51.42 & 55.00 & 0.00 & 90.00 & 23.40 & 4.78 \\
Hera & 63.29 & 70.00 & 12.00 & 92.00 & 19.95 & 4.07 \\
\bottomrule
\end{tabular}
\caption{Descriptive statistics for the additional question \textit{``How satisfied have you been with this keyboard?''} for all four test keyboards}
\label{tbl:sum_tkbs_sati}
\end{table}

\begin{figure}[H]
\centering
\includegraphics[width=1.0\textwidth]{images/sati_tkbs_res}
\caption{Responses for the additional question \textit{``How satisfied have you been with this keyboard?''} with the means for all participants represented as horizontal lines}
\label{fig:res_tkbs_sati}
\end{figure}

\subsection{UX Curves and Semi-Structured Interviews}
\label{sec:res_uxc}
In order to give all participants the chance to recapitulate the whole experiment and give retrospective feedback about each individual keyboard, we conducted a semi-structured interview which included drawing \gls{UX Curve}s for perceived fatigue and perceived typing speed. We evaluated the curves by measuring the y position of the \gls{SP} of a curve and the y position of the respective \gls{EP} to determine the slope of that curve. Slopes are defined as improving if \gls{SP} $<$ \gls{EP}, deteriorating if \gls{SP} $>$ \gls{EP} and stable if \gls{SP} $=$ \gls{EP} (margin of $\pm$ 1 mm). One curve can either represent one typing test (C1 or C2) or the whole experience with one keyboard over the course of both typing tests (C12).
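
One way to make the $\pm$ 1 mm margin explicit is the following sketch. It assumes that the margin applies to the difference $d = y_{\mathrm{EP}} - y_{\mathrm{SP}}$ of the measured y positions; the symbols $d$, $y_{\mathrm{SP}}$ and $y_{\mathrm{EP}}$ are introduced solely for this illustration.
\[
\mathrm{slope}(d) =
\left\{
\begin{array}{ll}
\mathrm{improving} & \mathrm{if}\ d > 1\,\mathrm{mm}, \\
\mathrm{deteriorating} & \mathrm{if}\ d < -1\,\mathrm{mm}, \\
\mathrm{stable} & \mathrm{if}\ |d| \le 1\,\mathrm{mm}.
\end{array}
\right.
\]
Under this reading, a curve whose end point lies 0.5 mm above its starting point would be counted as stable.
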
All curves are shown in Appendix \ref{app:uxc} and the resulting slopes for all curve types are shown in Figure \ref{fig:res_uxc}. During the semi-structured interview, we asked the participants to rank the keyboards from 1 (favorite) to 5 (least favorite). If in doubt, participants were allowed to place two keyboards on the same rank. Further, we asked some participants (n = 19) to also rank the keyboards from lowest actuation force (1) to highest actuation force (5). The participants' own keyboard was placed first four times more often than any other keyboard. \textit{Hera} was the only keyboard that was never placed fifth and, except for \textit{Own}, was the most represented keyboard in the top three. The ranking of the perceived actuation force revealed that participants were able to identify \textit{Nyx} (35\,g) and \textit{Athena} (80\,g) as the keyboards with the lowest and highest actuation force respectively. All results for both rankings are visualized in Figure \ref{fig:res_interview}. Lastly, we analyzed the recordings of all interviews and found several similar statements about specific keyboards. Twelve participants noted that, because of the new form factor of the test keyboards, additional familiarization was required to feel comfortable. Nine of those specifically mentioned the height of the keyboard as the main difference. Fourteen subjects reported: \textit{``Because Nyx had such a low resistance, I kept making mistakes!''}. Four participants explicitly noted that \textit{Hera} felt very pleasant and two subjects mentioned \textit{``I had really good flow.''} and \textit{``It somehow just felt right''}. Ten participants reported that typing on \textit{Athena} was exhausting. \textit{Aphrodite} was not mentioned as often as the other keyboards, which could be related to a comment made by two subjects: \textit{``It felt very similar to my own Keyboard''}.
\begin{figure}[H] \centering \includegraphics[width=1.0\textwidth]{images/res_uxc} \caption{\centering Evaluation of \gls{UX Curve} slopes for perceived fatigue and perceived speed. \\ \textit{DE:} deteriorating, \textit{IM:} improving, \textit{ST:} stable} \label{fig:res_uxc} \end{figure} \begin{figure}[H] \centering \includegraphics[width=1.0\textwidth]{images/res_interview} \caption{Rankings for favorite keyboard and perceived required actuation force for all keyboards including \textit{Own}. The graphs show the number of times a keyboard was placed at a certain rank} \label{fig:res_interview} \end{figure}