
% A rapid method that creates many corrected errors, has efficient error correction, and leaves
% few uncorrected errors can still be considered a successful method, since it produces
% accurate text in relatively little time. pp. 56 MacKenzie
\section{Results}
\label{sec:results}
This section presents the statistical analysis of the data obtained in the
main within-subject user study (n = 24), which consisted of five repeated
measurements. Because the data came from related, dependent groups, we used
\textit{Repeated Measures \gls{ANOVA}} if all required assumptions were met
and \textit{Friedman's Test} otherwise. To identify the specific pairs of
treatments that differed significantly, we ran either \textit{Dependent T-Tests}
or \textit{Wilcoxon Signed Rank Tests} (both with \textit{Holm correction
  (sequentially rejective Bonferroni test)} \cite{holm_correction}) as post-hoc
tests \cite{field_stats, downey_stats}. The reliability of the two sub-scales
(hedonic and pragmatic quality) in the \glsfirst{UEQ-S} was estimated using
\textit{Cronbach's alpha} \cite{tavakol_cronbachs_alpha}. Results are
reported as statistically significant at an $\alpha$-level of 0.05 ($p < 0.05$).
We used 95\% confidence intervals in visualizations of certain results.
Normality of data or residuals was checked by visual assessment of \gls{Q-Q}
plots and additionally with the \textit{Shapiro-Wilk} Test
\cite{field_stats, downey_stats}.
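
For reference, the Holm correction can be sketched as follows. This is the
standard formulation of the procedure; the symbols $m$ and $p_{(i)}$ are
introduced here only for illustration. Let $p_{(1)} \le p_{(2)} \le \dots \le
p_{(m)}$ be the ordered p-values of the $m$ pairwise post-hoc comparisons. The
adjusted p-values are
\[
  \tilde{p}_{(i)} = \max_{j \le i} \, \min\bigl(1,\; (m - j + 1)\, p_{(j)}\bigr),
  \qquad i = 1, \dots, m,
\]
and a comparison is considered significant if $\tilde{p}_{(i)} < \alpha$. The
smallest p-value is thus multiplied by $m$, as in the Bonferroni correction,
while each subsequent p-value is multiplied by a smaller factor, which makes
the procedure less conservative than a plain Bonferroni correction.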

\subsubsection{Own Keyboard \& Reference Values}
\label{sec:res_OPC}
As mentioned in Section \ref{sec:main_design}, the keyboard \textit{Own} was
used as a reference for some metrics captured during the experiment. Since the
measurements with \textit{Own} took place at the start (T0\_1) and end (T0\_2)
of the experiment, we compared the results of both typing tests to detect
possible variations in performance due to fatigue. Dependent T-tests showed no
significant difference in \glsfirst{KSPS} between T0\_1 (M = 5.39, sd = 1.49)
and T0\_2 (M = 5.47, sd = 1.48, t = -1.53, p = 0.139). \glsfirst{UER} was
negligible overall in both T0\_1 (M = 0.005, sd = 0.013, 85th percentile =
0.0051) and T0\_2 (M = 0.008, sd = 0.028, 85th percentile = 0.0052).
\glsfirst{WPM} approached but did not reach significance for T0\_1 (M = 54.2,
sd = 14.7) compared to T0\_2 (M = 53.0, sd = 14.5, t = 1.92, p = 0.067).
However, dependent T-tests revealed statistically significant differences in
\glsfirst{AdjWPM} between T0\_1 (M = 53.9, sd = 14.5) and T0\_2 (M = 52.5, sd =
14.3, t = 2.44, p = 0.023), in \glsfirst{CER} between T0\_1 (M = 0.057, sd =
0.028) and T0\_2 (M = 0.078, sd = 0.038, t = -3.54, p = 0.002), and in
\glsfirst{TER} between T0\_1 (M = 0.063, sd = 0.031) and T0\_2 (M = 0.086, sd =
0.039, t = -4.27, p = 0.0003). Because of these differences, we decided to use,
for each participant, the mean of each metric over T0\_1 and T0\_2 as the
reference value to compute the \textit{\gls{OPC}} for the test keyboards
(\textit{Athena, Aphrodite, Nyx} and \textit{Hera}).

Additionally, we compared the muscle activity (\% of \glsfirst{MVC}) using a
dependent T-test and found a significant difference in left flexor
(\glsfirst{FDP} \& \glsfirst{FDS}) \%\gls{MVC} between T0\_1 (M = 12.0, sd =
8.27) and T0\_2 (M = 8.53, sd = 7.16, t = 3.18, p = 0.004). Residuals of the
right flexor (\gls{FDP} \& \gls{FDS}) were not normally distributed; we
therefore used the Wilcoxon Signed Rank Test and found a significant difference
between T0\_1 (M = 10.8, sd = 8.18, Med = 9.52) and T0\_2 (M = 7.71, sd = 6.08,
Med = 5.32, p = 0.021). Note that two erroneous measurements had to be removed
for the right flexor (n = 22). No significant differences were found in left or
right extensor (\glsfirst{ED}) \%\gls{MVC} between T0\_1 and T0\_2.

\begin{table}[ht]
  \centering
  \ra{1.3}
  \begin{tabular}{?l^l^l^l^l^l^l^l}
    \toprule
    \rowstyle{\itshape}
    Y      & Comparison    & Statistic & p      & Estimate & CI             & Method & Alternative \\
    \midrule
    WPM    & T0\_1 - T0\_2 & 1.92      & 0.067   & 1.18     & [-0.09, 2.45]  & T-test & two.sided   \\
    AdjWPM & T0\_1 - T0\_2 & 2.44      & 0.023*  & 1.35     & [0.21, 2.50]   & T-test & two.sided   \\
    KSPS   & T0\_1 - T0\_2 & -1.53     & 0.139   & -0.08    & [-0.19, 0.03]  & T-test & two.sided   \\
    CER    & T0\_1 - T0\_2 & -3.54     & 0.002*  & -0.02    & [-0.03, -0.01] & T-test & two.sided   \\
    TER    & T0\_1 - T0\_2 & -4.27     & 0.0003* & -0.02    & [-0.03, -0.01] & T-test & two.sided   \\
    \%MVC$_{\mathrm{LF}}$ & T0\_1 - T0\_2 & 3.18 & 0.004* & 3.44  & [1.20, 5.68]  & T-test & two.sided \\
    \%MVC$_{\mathrm{LE}}$ & T0\_1 - T0\_2 & 1.44 & 0.163  & 0.956 & [-0.42, 2.33] & T-test & two.sided \\
    \%MVC$_{\mathrm{RF}}$ & T0\_1 - T0\_2 & -- & 0.021* & -- & -- & Wilcoxon & two.sided \\
    \%MVC$_{\mathrm{RE}}$ & T0\_1 - T0\_2 & -- & --     & -- & -- & --       & --        \\
    \bottomrule
  \end{tabular}
\end{table}
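
As a plausibility check, the confidence intervals in the preceding table can be
recovered from the reported estimate and test statistic. This sketch assumes a
paired T-test with $n = 24$ participants, i.e.\ 23 degrees of freedom; the
symbols $\bar{d}$ (mean paired difference) and $\mathit{SE}$ (its standard
error) are introduced here for illustration. Since $t = \bar{d} / \mathit{SE}$,
the 95\% interval is $\bar{d} \pm t_{0.975,\,23} \cdot \mathit{SE}$ with
$t_{0.975,\,23} \approx 2.069$. For the \gls{WPM} row this gives
\[
  \mathit{SE} = \frac{\bar{d}}{t} = \frac{1.18}{1.92} \approx 0.615,
  \qquad
  \bar{d} \pm 2.069 \cdot 0.615 \approx 1.18 \pm 1.27 = [-0.09,\, 2.45],
\]
which agrees with the reported interval. Rows based on fewer observations
would use correspondingly fewer degrees of freedom.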

\subsection{Performance Metrics}
\label{sec:res_perf}
\subsubsection{Typing Speed}
\label{sec:res_typing_speed}
The typing speed for each keyboard and typing test was captured automatically
by the typing test functionality of \glsfirst{GoTT}. We captured \gls{WPM},
\gls{AdjWPM} and \gls{KSPS} according to the formulas given in Section
\ref{sec:meas_perf}. The individual measurements were then converted into
percentages of the mean of the reference values gathered from the typing tests
with keyboard \textit{Own}. None of the data gathered for the individual
treatments was normally distributed; thus, Friedman's Test was applied.
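
Expressed as a formula, the conversion described above can be sketched as
follows. The notation is introduced here for illustration only: $x_{p,k}$
denotes the value of a metric for participant $p$ with test keyboard $k$, and
$x_{p,\mathrm{T0\_1}}$, $x_{p,\mathrm{T0\_2}}$ denote the two measurements of
the same participant with keyboard \textit{Own}.
\[
  \bar{x}_{p,\mathrm{Own}} = \frac{x_{p,\mathrm{T0\_1}} + x_{p,\mathrm{T0\_2}}}{2},
  \qquad
  \mathrm{OPC}_{p,k} = 100\,\% \cdot \frac{x_{p,k}}{\bar{x}_{p,\mathrm{Own}}}
\]
For a hypothetical participant who reached 50 and 52 \gls{WPM} in T0\_1 and
T0\_2, the reference value is 51 \gls{WPM}; a result of 45.9 \gls{WPM} on a test
keyboard then corresponds to $100\,\% \cdot 45.9 / 51 = 90\,\%$.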