@ -1,7 +1,7 @@
% A rapid method that creates many corrected errors, has efficient error correction, and leaves
% few uncorrected errors can still be considered a successful method, since it produces
% accurate text in relatively little time. pp. 56 MacKenzie
\section { Results}
\section { Results of the Main User Study }
\label { sec:results}
This section addresses the statistical analysis of the data obtained throughout
the main, within-subject, user study (n = 24) that consisted of five repeated
@ -141,7 +141,17 @@ significant differences between \textit{Aphrodite} (M = 51.5, sd = 14.0) and
3.53, p = 0.01). Further, the \gls { rmANOVA} for \gls { AdjWPM} yielded (F(3, 69) =
6.197, p = 0.0009) and for \gls { KSPS} (F(3, 69) = 3.566, p = 0.018). All
relevant results of the post-hoc tests and the summary of the performance data
can be observed in Tables \ref { tbl:sum_ tkbs_ speed} and \ref { tbl:res_ tkbs_ speed} .
can be observed in Tables \ref { tbl:sum_ tkbs_ speed} and
\ref { tbl:res_ tkbs_ speed} . We further examined which of the four test keyboards
was the fastest for each participant and found that \textit { Hera} was the
fastest keyboard in terms of \gls { WPM} for 46\% (11) of the twenty-four
subjects. Additionally, we analyzed the \gls { WPM} percentage of \textit { Own}
(\gls { OPC} ) for all test keyboards to determine which keyboards exceeded the
performance of the participant's own keyboard. We found that three subjects
reached \gls { OPC} \_ \gls { WPM} values greater than 100\% with all four test
keyboards. Also, \textit { Athena, Aphrodite} and \textit { Hera} exceeded 100\% of
\gls { OPC} \_ \gls { WPM} eight, seven and six times, respectively. Detailed results
are presented in Figure \ref { fig:max_ opc_ wpm} .
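Read as a formula, and assuming that OPC denotes a test keyboard's result
expressed as a percentage of the same participant's result with \textit{Own}
(the symbols below are illustrative notation), the comparison for participant
$p$ and test keyboard $k$ can be sketched as
\[
  \mathrm{OPC}_{\mathrm{WPM}}(p, k) =
  100\,\% \cdot \frac{\mathrm{WPM}_{p,k}}{\mathrm{WPM}_{p,\mathit{Own}}} ,
\]
so that a value above 100\% indicates that participant $p$ typed faster with
test keyboard $k$ than with their own keyboard.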
\begin { table} [H]
\centering
@ -215,6 +225,15 @@ can be observed in Tables \ref{tbl:sum_tkbs_speed} and \ref{tbl:res_tkbs_speed}.
\label { tbl:res_ tkbs_ speed}
\end { table}
\begin { figure} [H]
\centering
\includegraphics [width=1.0\textwidth] { images/max_ opc_ wpm}
\caption { The left graph shows the fastest keyboard in terms of \gls { WPM} for
each participant. The right graph shows which keyboards were faster
than the participant's own keyboard (\gls { OPC} \_ \gls { WPM} $>$ 100\% )}
\label { fig:max_ opc_ wpm}
\end { figure}
\subsubsection { Error Rate}
\label { sec:res_ error_ rate}
\gls { GoTT} also automatically tracked various error-related metrics from which
@ -230,7 +249,15 @@ Test for \gls{UER} ($\chi^2$(3) = 2.59, p = 0.46) yielded no statistical
significant difference. It should be noted that the 90th percentile of
\gls { UER} for all keyboards was still below 1\% . Summaries for the individual
metrics and results for all post-hoc tests can be seen in Tables
\ref { tbl:sum_ tkbs_ err} and \ref { tbl:res_ tkbs_ err} .
\ref { tbl:sum_ tkbs_ err} and \ref { tbl:res_ tkbs_ err} . Furthermore, we compared the
\gls { TER} of all test keyboards for each participant and found that
\textit { Athena} was most frequently the keyboard with which participants typed
most accurately. Two participants scored identical \gls { TER} s with two test keyboards;
therefore, the total number of ``1st-placed'' keyboards increased to twenty-six.
Lastly, we compared the test keyboards to the subjects' own keyboards and found
that eleven participants scored lower \gls { TER} s with \textit { Athena} than
with \textit { Own} (\gls { OPC} ). All data can be observed in Figure
\ref { fig:max_ opc_ ter} .
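The count of twenty-six follows from tallying every keyboard that attains a
participant's lowest TER. As a sketch in illustrative notation, with $K_p$
denoting the set of test keyboards tied for the lowest TER of participant $p$,
and assuming each of the two ties involved exactly two keyboards:
\[
  \sum_{p=1}^{24} |K_p| = 22 \cdot 1 + 2 \cdot 2 = 26 .
\]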
\begin { table} [H]
\centering
@ -309,7 +336,16 @@ metrics and results for all post-hoc tests can be seen in Table
\label { tbl:res_ tkbs_ err}
\end { table}
\subsection { Muscle Activity}
\begin { figure} [H]
\centering
\includegraphics [width=1.0\textwidth] { images/max_ opc_ ter}
\caption { The left graph shows the keyboard with the lowest \gls { TER} for each
participant. The right graph shows which keyboards were more accurate than
the participant's own keyboard (\gls { OPC} \_ \gls { TER} $<$ 100\% )}
\label { fig:max_ opc_ ter}
\end { figure}
\subsection { Muscle Activity Measurements}
\label { sec:res_ muscle_ activity}
We utilized the \gls { EMG} device described in Section \ref { sec:main_ design} to
gather data about the muscle activities (\% of \glsfirst { MVC} ) during typing
@ -327,7 +363,23 @@ using either dependent T-tests or Wilcoxon Signed Rank Tests. There were no
statistically significant differences in \% \gls { MVC} between the first and the
second typing test for any keyboard/muscle combination. The summaries for all
test keyboards of the mean values for both typing tests combined can be observed
in Table \ref { tbl:sum_ tkbs_ emg} .
in Table \ref { tbl:sum_ tkbs_ emg} . Lastly, we created histograms (Figure
\ref { fig:max_ emg_ tkbs} ) for each of the observed muscle groups that show the
number of times a keyboard yielded the highest \% \gls { MVC} out of all keyboards
for each participant. We found that \textit { Athena} most frequently (approximately 45\% )
produced the highest extensor muscle activity for both arms. The highest muscle
activity for both flexor muscle groups was evenly distributed among all test
keyboards, with the slight exception of \textit { Nyx} , which produced the highest
\% \gls { MVC} for only approximately 14\% of participants.
\begin { figure} [H]
\centering
\includegraphics [width=1.0\textwidth] { images/max_ emg_ tkbs}
\caption { Histograms for all \gls { EMG} measurements that show the keyboard with
the highest mean \% of \glsfirst { MVC} out of all four keyboards for each
participant}
\label { fig:max_ emg_ tkbs}
\end { figure}
\begin { table} [H]
\centering
@ -402,14 +454,14 @@ previously used keyboard which are labelled as follows:
\centering
\ra { 0.8}
\small
\begin { tabular} { llll}
\textbf { KCQ1:} & \textit { ``Required operating force during usage?''} & \textbf { KCQ7:} & \textit { ``Ease of use?''} \\
\textbf { KCQ2:} & \textit { ``Perceived uniformity during usage?''} & \textbf { KCQ8:} & \textit { ``Fatigue of the fingers?''} \\
\textbf { KCQ3:} & \textit { ``Effort required during usage?''} & \textbf { KCQ9:} & \textit { ``Fatigue of the wrists?''} \\
\textbf { KCQ4:} & \textit { ``Perceived accuracy?''} & \textbf { KCQ10:} & \textit { ``Fatigue of the arms?''} \\
\textbf { KCQ5:} & \textit { ``Acceptability of speed?''} & \textbf { KCQ11:} & \textit { ``Fatigue of the shoulders?''} \\
\textbf { KCQ6:} & \textit { ``Overall satisfaction?''} & \textbf { KCQ12:} & \textit { ``Fatigue of the neck?''} \\
\end { tabular}
\end { table}
All questions featured a 7-point Likert scale where 1 always denoted the worst
@ -461,3 +513,173 @@ Table \ref{tbl:res_kcq}.
\end { table}
\subsubsection { User Experience Questionnaire (Short)}
\label { sec:res_ ueqs}
In addition to the \gls { KCQ} , we utilized the \glsfirst { UEQ-S} . It featured
eight questions on a 7-point Likert scale, which formed two scales (pragmatic,
hedonic). Additionally, we added one extra question that could be answered on a
\glsfirst { VAS} from 0 to 100. The survey was filled out after both tests with a
keyboard had been completed. The questions of our modified \gls { UEQ-S} were
labelled as follows:
\begin { table} [H]
\centering
\ra { 0.8}
\small
\begin { tabular} { llll}
\multicolumn { 2} { c} { Pragmatic Scale} & \multicolumn { 2} { c} { Hedonic Scale} \\
\\
\textbf { PRA1:} & \textit { ``Obstructive or Supportive?''} & \textbf { HED1:} & \textit { ``Boring or Exciting?''} \\
\textbf { PRA2:} & \textit { ``Complicated or Easy?''} & \textbf { HED2:} & \textit { ``Not interesting or Interesting?''} \\
\textbf { PRA3:} & \textit { ``Inefficient or Efficient?''} & \textbf { HED3:} & \textit { ``Conventional or Inventive?''} \\
\textbf { PRA4:} & \textit { ``Confusing or Clear?''} & \textbf { HED4:} & \textit { ``Usual or Leading Edge?''} \\
\\
\multicolumn { 4} { c} { Additional Question (\gls { VAS} )} \\
\\
\textbf { SATI:} & \multicolumn { 3} { l} { \textit { ``How satisfied have you been with this keyboard?''} }
\end { tabular}
\end { table}
The 7-point Likert scale items (PRA1--4, HED1--4) were then transformed to
represent a scale from $-3$ to $+3$, where $-3$ represented the left term and $+3$ the
right term of the ``or'' questions. Both sub-scales, pragmatic ($ \alpha $ =
0.90)\footnote { PRA: Athena ($ \alpha $ = 0.83), Aphrodite ($ \alpha $ = 0.95), Nyx
($ \alpha $ = 0.90), Hera ($ \alpha $ = 0.85)} and hedonic ($ \alpha $ =
0.88)\footnote { HED: Athena ($ \alpha $ = 0.89), Aphrodite ($ \alpha $ = 0.89), Nyx
($ \alpha $ = 0.91), Hera ($ \alpha $ = 0.90)} , exceeded the recommended threshold
for Cronbach's alpha of $ \alpha > 0.7 $ \cite { schrepp_ ueq_ handbook} . The mean
values for all responses of the \gls { UEQ-S} can be seen in Figure
\ref { fig:ueq_ tkbs_ res} and the individual responses to the additional question
(SATI) are presented in Figure \ref { fig:sati_ tkbs_ res} . We conducted
\gls { rmANOVA} s for both sub-scales but found no statistically significant
differences for the pragmatic scale (F(3, 69) = 3.254, p = 0.06; post-hoc tests did not
reveal any tendencies) or the hedonic scale (F(3, 69) = 0.425, p =
0.74). In contrast, the \gls { rmANOVA} for the additional question \textit { SATI}
indicated statistically significant differences (F(3, 69) = 3.254, p =
0.027). In this case, we decided to use Wilcoxon Signed Rank Tests for our
post-hoc analysis because of our interest in the difference of medians and the
relatively high power of this test in analyzing \gls { VAS} data
\cite { heller_ vas} . The results and summaries for the test keyboards can be
observed in Tables \ref { tbl:res_ tkbs_ sati} and \ref { tbl:sum_ tkbs_ sati} .
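For reference, the rescaling and the reliability coefficient mentioned above
can be written out as follows. This is the standard textbook form of
Cronbach's alpha and not necessarily the exact computation used in the
analysis. We assume raw responses $x$ coded from 1 to 7; $k = 4$ is the number
of items per sub-scale, $\sigma^2_{i}$ the variance of item $i$ and
$\sigma^2_{X}$ the variance of the summed scale score:
\[
  x' = x - 4, \qquad
  \alpha = \frac{k}{k-1}
  \left( 1 - \frac{\sum_{i=1}^{k} \sigma^2_{i}}{\sigma^2_{X}} \right) .
\]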
\begin { figure} [H]
\centering
\includegraphics [width=1.0\textwidth] { images/ueq_ tkbs_ res}
\caption { Means of the responses for all questions of the \glsfirst { UEQ-S} }
\label { fig:ueq_ tkbs_ res}
\end { figure}
\begin { table} [H]
\centering
\small
\ra { 1.3}
\begin { tabular} { ?l^ l^ l^ l^ l^ l^ l^ l}
\toprule
\rowstyle { \itshape }
Y & Comparison & Statistic & p & Estimate & CI & Hypothesis \\
\midrule
\multicolumn { 6} { l} { \textbf { Non Parametric (Wilcoxon Signed Rank Test)} } \\
SATI & Aphrodite - Nyx & 217.0 & 0.046$^{*}$ & 14.0 & [5, Inf] & greater \\
SATI & Aphrodite - Athena & 201.5 & 0.046$^{*}$ & 12.5 & [4.5, Inf] & greater \\
SATI & Nyx - Athena & 125.5 & 1.0 & -3.0 & [-11.5, Inf] & greater \\
SATI & Hera - Athena & 205.5 & 0.174 & 8.5 & [0, Inf] & greater \\
SATI & Hera - Aphrodite & 118.5 & 1.0 & -2.5 & [-12.5, Inf] & greater \\
SATI & Hera - Nyx & 223.5 & 0.074$^{\dagger}$ & 12.5 & [2.5, Inf] & greater \\
\bottomrule
\end { tabular}
\caption { Post-hoc tests for the additional question \textit { ``How satisfied
have you been with this keyboard?''} . Statistically significant
differences ($p < 0.05$) are marked with an * and p values indicating a trend
towards significance are denoted with $ \dagger $ . Confidence intervals are
given for the difference of the location parameter. We only tested keyboards
with lower actuation force against keyboards with higher actuation
force. The first comparison of Aphrodite (50 g) and Nyx (35 g) was added
because of the noticeable differences in the visual assessment of Figure
\ref { fig:sati_ tkbs_ res} }
\label { tbl:res_ tkbs_ sati}
\end { table}
\begin { table} [H]
\centering
\small
\ra { 1.3}
\begin { tabular} { ?r^ l^ l^ l^ l^ l^ l^ l}
\toprule
\rowstyle { \itshape }
Pseud. & Mean & Median & Min & Max & SD & SE \\
\midrule
Athena & 54.12 & 50.00 & 1.00 & 95.00 & 25.43 & 5.19 \\
Aphrodite & 65.08 & 71.50 & 10.00 & 94.00 & 22.56 & 4.61 \\
Nyx & 51.42 & 55.00 & 0.00 & 90.00 & 23.40 & 4.78 \\
Hera & 63.29 & 70.00 & 12.00 & 92.00 & 19.95 & 4.07 \\
\bottomrule
\end { tabular}
\caption { Summaries for the additional question \textit { ``How satisfied have
you been with this keyboard?''} for all four test keyboards}
\label { tbl:sum_ tkbs_ sati}
\end { table}
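The standard errors in the summary table above are consistent with the usual
definition $\mathit{SE} = \mathit{SD} / \sqrt{n}$ for $n = 24$ participants,
which we assume is how they were derived. For \textit{Athena}, for example:
\[
  \mathit{SE} = \frac{25.43}{\sqrt{24}} \approx \frac{25.43}{4.90} \approx 5.19 .
\]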
\begin { figure} [H]
\centering
\includegraphics [width=1.0\textwidth] { images/sati_ tkbs_ res}
\caption { Responses for the additional question \textit { ``How satisfied have
you been with this keyboard?''} with the means across all participants
represented as horizontal lines}
\label { fig:sati_ tkbs_ res}
\end { figure}
\subsection { UX Curves and Semi-Structured Interviews}
\label { sec:res_ uxc}
In order to give all participants the chance to recapitulate the whole
experiment and give retrospective feedback about each individual keyboard, we
conducted a semi-structured interview, which included drawing UX-curves for
perceived fatigue and perceived typing speed. We evaluated the curves by
measuring the y position of the \gls { SP} of a curve and the y position of the
respective \gls { EP} to determine the slope of that curve. Slopes are defined as
improving if \gls { SP} $<$ \gls { EP} , deteriorating if \gls { SP} $>$ \gls { EP} and
stable if \gls { SP} $=$ \gls { EP} (margin of $ \pm $ 1 mm). One curve can represent
either one typing test (C1 or C2) or the whole experience with one keyboard
over the course of both typing tests (C12). All curves can be observed in
Appendix \ref { app:uxc} and the resulting slopes for all curve types are shown in
Figure \ref { fig:res_ uxc} .

During the semi-structured interview, we asked the
participants to rank the keyboards from 1 (favorite) to 5 (least favorite). If
in doubt, participants were allowed to place two keyboards on the same
rank. Further, we asked some participants (n = 19) to also rank the keyboards
from lowest actuation force (one) to highest actuation force (five). The
participant's own keyboard was placed first four times more often than any other
keyboard. \textit { Hera} was the only keyboard that was never placed fifth and,
except for \textit { Own} , was the most represented keyboard in the top three. The
ranking of the perceived actuation force revealed that participants were able
to identify \textit { Nyx} (35 g) and \textit { Athena} (80 g) as the keyboards with
the lowest and highest actuation force, respectively. All results for both
rankings are visualized in Figure \ref { fig:res_ interview} .

Lastly, we analyzed
the recordings of all interviews and found several similar statements about
specific keyboards. Twelve participants noted that, because of the new form
factor of the test keyboards, additional familiarization was required to feel
comfortable. Nine of those specifically mentioned the height of the keyboard as
the main difference. Fourteen subjects reported: \textit { ``Because Nyx had such a
low resistance, I kept making mistakes!''} . Four participants explicitly
noted that \textit { Hera} felt very pleasant, and two subjects mentioned
\textit { ``I had really good flow.''} and \textit { ``It somehow just felt
right''} . Ten participants reported that typing on \textit { Athena} was
exhausting. \textit { Aphrodite} was not mentioned as often as the other keyboards,
which could be related to a comment made by two subjects: \textit { ``It felt very
similar to my own keyboard''} .
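The slope classification of the UX-curves described at the beginning of this
section can be summarized compactly. This is a sketch in illustrative
notation, assuming that $\Delta y = y_{\mathrm{EP}} - y_{\mathrm{SP}}$ is
measured in millimetres and that the $\pm 1$\,mm margin applies to this
difference:
\[
  \mathrm{slope} = \left\{ \begin{array}{ll}
    \mbox{improving}     & \mbox{if } \Delta y > 1\,\mathrm{mm}, \\
    \mbox{deteriorating} & \mbox{if } \Delta y < -1\,\mathrm{mm}, \\
    \mbox{stable}        & \mbox{if } |\Delta y| \le 1\,\mathrm{mm}.
  \end{array} \right.
\]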
\begin { figure} [H]
\centering
\includegraphics [width=1.0\textwidth] { images/res_ uxc}
\caption { \centering Evaluation of UX-curve slopes for perceived fatigue and perceived
speed. \\
\textit { DE:} deteriorating, \textit { IM:} improving, \textit { ST:} stable}
\label { fig:res_ uxc}
\end { figure}
\begin { figure} [H]
\centering
\includegraphics [width=1.0\textwidth] { images/res_ interview}
\caption { Rankings for favorite keyboard and perceived required actuation force
for all keyboards including \textit { Own} . The graphs show the number of
times a keyboard was placed at a certain rank}
\label { fig:res_ interview}
\end { figure}