
\section{Development and Implementation of Necessary Tools}
For this thesis, we developed our own typing test platform to gain better
control over the performance-related measurements and the text to be
transcribed. Participants had to fill out up to two questionnaires after each
typing test, and each questionnaire had to be linked to the corresponding typing
test or keyboard. With 24 subjects and five keyboards, resulting in ten typing
tests per subject or 240 typing tests in total, we decided to incorporate a
questionnaire feature into our platform to reduce the risk of incorrect mappings
between typing tests, surveys and participants. Additionally, because we wanted
to control the understandability of the text without introducing observer bias
into the text selection process, and also to save time, we implemented a
crowdsourcing feature where individuals could submit text snippets that were
automatically checked for an adequate \gls{FRE} score. Finally, we wanted to
release the platform as open source so that other researchers in the field of
text entry performance can use it at no additional cost.

Another challenge was to measure the maximum force each individual finger is
able to apply to any of the keyswitches on a keyboard. We therefore decided to
prototype a device that simulates the position of different keyswitches and
measures the force applied by the finger that is usually responsible for
actuating a specific key.

Both implementations are explained in more detail in the following two sections,
as outlined in Figure \ref{fig:s3_flow}.

\begin{figure}[H]
  \centering
  \includegraphics[width=1.0\textwidth]{images/section_3_flow}
  \caption{Overview of the topics covered in the following sections}
  \label{fig:s3_flow}
\end{figure}

\subsection{Typing Test Platform}
\label{sec:gott}
The platform we created is called \gls{GoTT} because the backend, i.e., the
server-side code, is programmed in Go, a programming language developed by a
team at Google \cite{golang}. We chose Go because its standard library offers
convenient packages to quickly set up a web server with simple routing and
templating functionality \cite{golang_std}. The backend and frontend communicate
through a \gls{REST} \gls{API} and exchange data in \gls{JSON} format.
\gls{GoTT} uses a document-based database to persistently store login
credentials, typing test results and all completed questionnaires. We decided to
use \gls{MongoDB} because it can directly store nested, \gls{JSON}-like data
without prior transformation \cite{mongodb}. The general functionality of
\gls{GoTT} can be seen in Figure \ref{fig:gott_arch}.

\begin{figure}[H]
  \centering
  \includegraphics[width=0.9\textwidth]{images/gott_arch.png}
  \caption{Overview of the general functionality of \gls{GoTT}}
  \label{fig:gott_arch}
\end{figure}
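
The exchange between frontend and backend can be illustrated with a minimal
sketch that relies only on Go's standard library. It is not taken from
\gls{GoTT}: the route \texttt{/api/results}, the type \textit{TestResult} and
its field names are assumptions made for illustration, and the database access
is omitted. The handler accepts a \gls{JSON} document via POST and echoes it
back. The \textit{main} function exercises the handler in-process instead of
opening a network port.

\begin{minted}[linenos,fontsize=\small]{go}
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// TestResult is a hypothetical payload; the field names are
// assumptions and do not mirror GoTT's actual schema.
type TestResult struct {
	Participant string  `json:"participant"`
	Keyboard    string  `json:"keyboard"`
	WPM         float64 `json:"wpm"`
}

// handleResult accepts a JSON encoded result via POST and echoes it.
// A real backend would persist the decoded value at this point.
func handleResult(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	var res TestResult
	if err := json.NewDecoder(r.Body).Decode(&res); err != nil {
		http.Error(w, "invalid JSON", http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusCreated)
	json.NewEncoder(w).Encode(res)
}

// newMux wires the handler to a hypothetical route.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/results", handleResult)
	return mux
}

func main() {
	// Exercise the handler in-process instead of opening a port;
	// a server would call http.ListenAndServe(":8080", newMux()).
	body := `{"participant":"p01","keyboard":"kb3","wpm":59.96}`
	req := httptest.NewRequest(http.MethodPost, "/api/results",
		strings.NewReader(body))
	rec := httptest.NewRecorder()
	newMux().ServeHTTP(rec, req)
	fmt.Println(rec.Code, strings.TrimSpace(rec.Body.String()))
}
\end{minted}

Keeping the route registration in \textit{newMux} separates the routing from the
decision of how the server is started, which makes the handler easy to test.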

The platform offers three major functionalities that are important for this thesis:

\begin{enumerate}
  \item \textbf{The typing test} itself was designed after evaluating various
  free typing test tools available online. One major issue almost all of them
  had in common was the inability to supply custom texts for transcription.
  Further, only a few documented how their performance metrics were calculated
  or offered a way to export results automatically. Since the time between
  typing tests was limited by the design of the experiment, as described in
  Section \ref{sec:methodology}, recording the results for multiple metrics by
  hand would have been error-prone and was therefore not a viable option.

  The typing test provided by \gls{GoTT} features a non-intrusive interface. The
  font size can be adjusted via the zoom functionality of the browser. The
  colors used to indicate correctly or incorrectly entered characters were
  adjusted to improve accessibility for people with vision-related
  disabilities. Figure \ref{fig:gott_colorblind} shows how the colors used in
  \gls{GoTT} are perceived by people with different color vision impairments.
  The views were simulated with a tool called
  \textit{Color Oracle}\footnote{\url{https://colororacle.org/index.html}}
  \cite{colororacle}.

  \begin{figure}[H]
    \centering
    \includegraphics[width=1.0\textwidth]{images/gott_colorblind.png}
    \caption{\gls{GoTT}'s text area as perceived with different kinds of
      colorblindness. The examples are ordered from the impairment most
      commonly found in the population (top) to the least common one (bottom)
      and were simulated with the tool \textit{Color Oracle} \cite{colororacle}}
    \label{fig:gott_colorblind}
  \end{figure}

  The typing test features an area that displays the text to be transcribed. As
  soon as the typist has transcribed half of the displayed text, the content of
  this area scrolls up by one line after each finished line of text. Further,
  two drop-down menus are used to select the text and the keyboard for the next
  typing test. Lastly, two buttons control the test: \textit{START} reveals the
  text, and \textit{RESET} allows the participant or researcher to interrupt the
  active typing test, e.g., in case of malfunctioning hardware (keyboard,
  \gls{EMG} device, computer, etc.) or if the subject experiences discomfort and
  wants to stop. The timer for the typing test starts when the participant
  enters the first character after the start button was pressed. The \gls{UI}
  for the typing test is shown in Figure \ref{fig:gott_text_area}.

  \begin{figure}[H]
    \centering
    \includegraphics[width=0.80\textwidth]{images/gott_text_area.jpg}
    \caption{\gls{GoTT}'s typing test. The \textit{START} button reveals the
      text selected with the drop-down menu labeled \textit{Text to
        transcribe}. The \textit{RESET} button interrupts the currently active
      typing test. The content scrolls up one line at a time once half of the
      text has been transcribed (marked by \textit{Scrolling begins here}), so
      that the relevant line always stays centered.}
    \label{fig:gott_text_area}
  \end{figure}

  \gls{GoTT} captures the metrics presented in Listing \ref{lst:meas_perf}
  according to the formulas given in Section \ref{sec:meas_perf}.

  \begin{listing}[H]
\caption{Implementation of performance related metrics in \gls{GoTT}.
The function \textit{roundToPrecision} takes the number of decimal places
to round to as the second argument.}
\label{lst:meas_perf}
\begin{minted}[linenos,fontsize=\small]{js}
// TEST_TIME is retrieved from backend and
// set in the config file in seconds
mins = TEST_TIME / 60;
// T is the transcribed text
TL = T.length;
// Input Stream Length = TL + Fixes (Backspace)
//                          + Incorrect Fixed (Fixed Errors)
ISL = TL + F + IF;
// Correct input = TL - Incorrect Not Fixed (Left errors)
C  = TL - INF;

// Error metrics
CER = roundToPrecision(IF / (TL + IF), 5);
UER = roundToPrecision(INF / (TL + IF), 5);
TER = roundToPrecision((INF + IF)/(TL + IF), 5);

// Speed metrics
// TL - 1 because the first char is entered at 0 seconds
WPM = roundToPrecision((TL - 1) / (5 * mins), 2);
// a: exponent of the AdjWPM formula, defined outside this excerpt
AdjWPM = roundToPrecision(WPM * Math.pow((1 - UER), a), 2);
KSPS = roundToPrecision((ISL - 1) / TEST_TIME, 5);
\end{minted}
\end{listing}
% // Correct / Any input char
% accuracy = roundToPrecision(C / (TL + IF) * 100, 2);
% KSPC = roundToPrecision(ISL / TL, 5);
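
The formulas of Listing \ref{lst:meas_perf} can be checked with a small worked
example. The following sketch is a port to Go written for illustration only and
is not part of \gls{GoTT}: the implementation of \textit{roundToPrecision} is an
assumption (the original helper is not shown), the input counts are
hypothetical, and AdjWPM is omitted because its exponent is defined outside the
listing.

\begin{minted}[linenos,fontsize=\small]{go}
package main

import (
	"fmt"
	"math"
)

// roundToPrecision rounds x to the given number of decimal places.
// Assumed implementation; the original helper is not shown.
func roundToPrecision(x float64, places int) float64 {
	scale := math.Pow(10, float64(places))
	return math.Round(x*scale) / scale
}

// Metrics holds a subset of the values computed in the listing.
type Metrics struct {
	CER, UER, TER, WPM, KSPS float64
}

// computeMetrics mirrors the formulas of the listing.
//   tl          length of the transcribed text (TL)
//   fixes       number of fixes, i.e. backspaces (F)
//   incFixed    incorrect characters that were fixed (IF)
//   incNotFixed incorrect characters left in the text (INF)
//   seconds     duration of the typing test (TEST_TIME)
func computeMetrics(tl, fixes, incFixed, incNotFixed int,
	seconds float64) Metrics {
	mins := seconds / 60
	isl := float64(tl + fixes + incFixed) // input stream length
	denom := float64(tl + incFixed)
	return Metrics{
		CER: roundToPrecision(float64(incFixed)/denom, 5),
		UER: roundToPrecision(float64(incNotFixed)/denom, 5),
		TER: roundToPrecision(float64(incNotFixed+incFixed)/denom, 5),
		// tl - 1 because the first character is entered at 0 seconds
		WPM:  roundToPrecision(float64(tl-1)/(5*mins), 2),
		KSPS: roundToPrecision((isl-1)/seconds, 5),
	}
}

func main() {
	// Hypothetical five-minute test: 1500 transcribed characters,
	// 40 backspaces, 30 fixed and 15 unfixed errors.
	fmt.Printf("%+v\n", computeMetrics(1500, 40, 30, 15, 300))
}
\end{minted}

For these hypothetical counts, the sketch yields CER = 0.01961, UER = 0.0098,
TER = 0.02941, WPM = 59.96 and KSPS = 5.23.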

For further implementation details on how input was captured and sent to the
backend, refer to the code in the online
repository\footnote{\url{https://github.com/qhga/GoTT}}.

To test the usability of the typing test, we asked five individuals to complete
multiple typing tests on their own computers. Based on the feedback we
received, we switched to another font to further improve readability and fixed
a bug related to the scrolling. All five volunteers reported that the typing
test was very intuitive and fun to use.

\item \textbf{The questionnaires} had to be linked to a specific participant,
typing test and keyboard. In total, each participant had to fill out three
different types of questionnaires at different times (more information in
Section \ref{sec:methodology}). The demographics questionnaire was completed
once at the start of the experiment; this could have been done with an existing
survey tool and then linked to the participant by hand. The \gls{PTTQ} and the
\gls{PKQ}, on the other hand, were required after each individual typing test
and after each keyboard, respectively. Because manually matching all completed
questionnaires to the corresponding typing tests and keyboards could have
introduced errors, we decided to integrate a survey tool into \gls{GoTT} that
performs this task automatically. The \gls{PTTQ} resembled the \gls{KCQ}
\cite[56]{iso9241-411}, and the questions for the \gls{PKQ} were taken from the
\gls{UEQ-S} \cite{schrepp_ueq_handbook}. All questionnaires can be found in
Appendix \ref{app:gott}.

\item \textbf{The text crowdsourcing platform} was required because of the
potential introduction of observer bias as described in Section
\ref{sec:bias}. Further, this part of \gls{GoTT} helped us gather 44639
characters, which exceeded the estimated requirement of 40000 characters for
ten non-overlapping texts. The goal was reached after only two days, which
showed crowdsourcing to be an efficient method to gather larger amounts of
text for our experiment. The estimate of 40000 characters was made according
to Eq. \ref{eq:chars}.

\begin{equation}
  \label{eq:chars}
  n_{kb} \cdot m_{ttkb} \cdot \frac{s}{60} \cdot |w| \cdot wpm_{max} = 5 \cdot 2 \cdot \frac{300}{60} \cdot 5 \cdot 160 = 40000
\end{equation}

with $n_{kb}$ the number of tested keyboards, $m_{ttkb}$ the number of typing
tests conducted with each keyboard, $\frac{s}{60}$ the duration of each typing
test in minutes (5\,min), $|w|$ the number of characters defining a word
(Section \ref{sec:meas_perf}) and $wpm_{max}$ the average \gls{WPM} of the top
100 typists. The latter was retrieved from a database released by the website
Typeracer\footnote{\url{https://docs.google.com/spreadsheets/d/18ZokmvjdzDypIr-Ayl1VWsRPOBa91qvgX3FgcsZtSAU/edit#gid=636312661}},
which included the top 25000 competitors in terms of average \gls{WPM}
\cite{typeracer}.

The text snippets provided by volunteers through our platform had to fulfill three
requirements:
\begin{enumerate}
  \item German language
  \item Fairly easy to understand (\gls{FRE} $>$ 70
  \cite{flesch_fre})
  \item Number of characters must be between 200 and 300
\end{enumerate}

To communicate what kind of text is appropriate, the platform provided an
example that contrasted fairly easy text with difficult text. Further, the
backend implemented a set of functions that calculated the \gls{FRE} of a
submitted text, counted the number of characters and either accepted or
rejected the text depending on whether the requirements were met. The
implementation of the algorithm that calculates the \gls{FRE} can be seen in
Listing \ref{lst:gott_fre}. The function \textit{countSyllables} uses
regex\footnote{\url{https://github.com/google/re2/wiki/Syntax}} matching to
determine the number of syllables in a given German string. The regex patterns
that identify syllables were derived from the hyphenation rules defined by
\textit{Duden Online}\footnote{\url{https://www.duden.de/sprachwissen/rechtschreibregeln/worttrennung}}
\cite{duden_hyphen}. The \gls{FRE} scores yielded by our function were verified
with multiple unit tests and compared to the scores obtained from another
website\footnote{\url{https://fleschindex.de/berechnen/}} that offers the
calculation for German texts. The \gls{UI} for the crowdsourcing page is shown
in Appendix \ref{app:gott}. The gathered text snippets were first checked for
typos and grammar using \textit{Duden Mentor}\footnote{\url{https://mentor.duden.de/}},
then randomized and finally aggregated into equally long texts with nearly
identical \gls{FRE} scores (mean = 80.10, SD = 0.48).

\begin{listing}[H]
\caption{Algorithm that calculates the \gls{FRE} score for a given string in German
language, utilizing regex pattern matching to count syllables, words and sentences.}
\label{lst:gott_fre}
\begin{minted}[linenos,fontsize=\small]{go}
func countSyllables(txt string) int {
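	// The pattern is split into concatenated raw strings, because a line
	// break inside a single raw string would become part of the regex.
	rx := regexp.MustCompile(`(?i)[^aeiouäöüßy\W][aeiouäöüßy]|` +
		`\b[aeiouäöüßy][^aeiouäöüßy\W]|\b[aeiouäöüy]{2,}|` +
		`u[aeuo]|(on|er)\b|\B(a|o|u|e)\B`)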
	extraConsonants := []string{"ck", "x", "ch", "x", "sch", "x",
                                    "st", "x", "gn", "x"}
	extraVowels := []string{"äu", "i", "ie", "i"}
	r := strings.NewReplacer(extraConsonants...)
	txt = r.Replace(txt)
	r = strings.NewReplacer(extraVowels...)
	txt = r.Replace(txt)
	syllableCount := len(rx.FindAllStringIndex(txt, -1))
	return syllableCount
}

func countWords(txt string) int {
	rx := regexp.MustCompile(`[\wäöüß]{2,}`)
	return len(rx.FindAllStringIndex(txt, -1))
}

func countSentences(txt string) int {
	rx := regexp.MustCompile(`[\wäöüß]{2,}[\?\.!;]`)
	return len(rx.FindAllStringIndex(txt, -1))
}

func calculateFRE(txt string) float64 {
	syc := countSyllables(txt)
	wc := countWords(txt)
	sec := countSentences(txt)
        // Average Sentence Length = Words / Sentence
	asl := float64(wc) / float64(sec)
        // Average Number of Syllables per Word = Syllables / Words
	asw := float64(syc) / float64(wc)
	fre := math.Round((180.-asl-(58.5*asw))*100) / 100
	// <0 and >100 is allowed, but not relevant in our case
	if fre > 100. { fre = 100. }
	if fre < 0. { fre = 0. }
	return fre
}
\end{minted}
\end{listing}
\end{enumerate}
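
The automatic acceptance check for crowdsourced snippets can be summarized in a
short sketch. It is a simplified illustration rather than the code used in
\gls{GoTT}: the function \textit{validateSnippet} and the error values are
hypothetical, the \gls{FRE} score is passed in as an argument so that the
example stays independent of the scoring function, and the length limits are
assumed to be inclusive. Characters are counted as Unicode code points, so that
German umlauts count as one character each. The language requirement is not
covered.

\begin{minted}[linenos,fontsize=\small]{go}
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode/utf8"
)

// Limits taken from the requirements for submitted snippets.
const (
	minChars = 200
	maxChars = 300
	minFRE   = 70.0
)

// Hypothetical error values used to report why a snippet is rejected.
var (
	errLength = errors.New("text must have 200 to 300 characters")
	errFRE    = errors.New("text is too difficult (FRE must exceed 70)")
)

// validateSnippet returns nil if the snippet satisfies the length and
// readability requirements, and a descriptive error otherwise.
func validateSnippet(txt string, fre float64) error {
	// Count code points instead of bytes: "ä" is one character.
	n := utf8.RuneCountInString(txt)
	if n < minChars || n > maxChars {
		return errLength
	}
	if fre <= minFRE {
		return errFRE
	}
	return nil
}

func main() {
	// 24 characters repeated ten times give 240 characters.
	txt := strings.Repeat("Der Hund läuft schnell. ", 10)
	fmt.Println(validateSnippet(txt, 80.1))
	fmt.Println(validateSnippet("Zu kurz.", 80.1))
	fmt.Println(validateSnippet(txt, 55.0))
}
\end{minted}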

\pagebreak
\subsection{Finger Strength Measurement Device}
\label{sec:force_meas_dev}

\begin{figure}[ht]
  \centering
  \includegraphics[width=0.8\textwidth]{images/force_master_1}
  \caption{Prototype of a measuring device that simulates the distance and
    finger position required to press different keys on a keyboard. The display
    shows the currently applied force in grams and the peak force applied
    throughout the current measurement in grams and \gls{N}}
  \label{fig:force_master}
\end{figure}

Since we required very specific data about the force each digit is able to
apply to keyswitches in different locations, we decided to prototype our own
measuring device. Following previous research in the field of finger strength
and force applied to keyboards, we used the same type of sensor, a load cell,
that was commonly utilized in those studies
\cite{gerard_keyswitch, rempel_ergo, bufton_typingforces}. A load cell capable
of measuring up to 5 kg $\approx$ 49.0 \gls{N}, in combination with the HX711
load cell amplifier shown in Figure \ref{fig:hx711} and the library
HX711\_ADC\footnote{\url{https://github.com/olkal/HX711_ADC}}, was used to
build the prototype shown in Figure \ref{fig:force_master}. Initial testing
revealed that the standard 10 Hz sample rate of the HX711 was too low to
capture the peak force in some measurements. Therefore, we resoldered the 0
$\Omega$ surface mount resistor to raise the sample rate to 80 Hz, which
yielded better results for fast keystrokes without deteriorating the overall
precision compared to the measurements conducted at 10 Hz. The apparatus used
an \gls{OLED} display to present the currently applied force in grams and the
peak force in grams and \gls{N}. The device was mainly controlled via two
terminal commands. One command initiated a re-calibration, which was performed
after each participant or in between measurements, while the other reset all
peak values shown on the display. The base of the device featured a scale along
which a wrist rest was moved and aligned with the marking corresponding to the
currently measured key. Each marking represents the distance and position of a
finger relative to the associated key, which is indicated by the label
underneath the marking. The measurement process is explained in more detail in
Section \ref{sec:meth_force}.

\begin{figure}[ht]
  \centering
  \includegraphics[width=0.5\textwidth]{images/hx711}
  \caption{HX711 amplifier module. The 0 $\Omega$ resistor had to be resoldered
    to achieve a sample rate of 80 Hz. This module is used in combination with
    the HX711\_ADC library to read the changes in resistance of the load cell
    and convert them into grams.}
  \label{fig:hx711}
\end{figure}
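
The values shown on the display follow from simple arithmetic, which the
following sketch illustrates. It is a model written in Go for illustration and
not the firmware of the device, which relies on the HX711\_ADC library. The
type \textit{PeakTracker}, the function names and the sample readings are
hypothetical. The conversion assumes standard gravity, so that 5 kg corresponds
to roughly 49.0 \gls{N}. The sample period shows why the higher rate helps with
fast keystrokes: at 10 Hz consecutive samples are 100 ms apart, at 80 Hz only
12.5 ms.

\begin{minted}[linenos,fontsize=\small]{go}
package main

import "fmt"

// standardGravity in m/s^2 converts a mass reading into a force.
const standardGravity = 9.80665

// gramsToNewton converts a load cell reading in grams into newtons.
func gramsToNewton(grams float64) float64 {
	return grams / 1000 * standardGravity
}

// samplePeriodMs returns the time between two samples in milliseconds.
func samplePeriodMs(rateHz float64) float64 {
	return 1000 / rateHz
}

// PeakTracker keeps the latest reading and the highest reading so far.
type PeakTracker struct {
	Current float64 // most recent reading in grams
	Peak    float64 // highest reading since the last reset in grams
}

// Update stores a new reading and raises the peak if necessary.
func (p *PeakTracker) Update(grams float64) {
	p.Current = grams
	if grams > p.Peak {
		p.Peak = grams
	}
}

// Reset clears the peak, comparable to the reset command of the device.
func (p *PeakTracker) Reset() {
	p.Peak = 0
}

func main() {
	var t PeakTracker
	// Hypothetical readings of a single keystroke in grams.
	for _, g := range []float64{0, 120, 480, 950, 610, 90} {
		t.Update(g)
	}
	fmt.Printf("current: %.0f g, peak: %.0f g (%.2f N)\n",
		t.Current, t.Peak, gramsToNewton(t.Peak))
	fmt.Printf("sample period: %.1f ms at 10 Hz, %.1f ms at 80 Hz\n",
		samplePeriodMs(10), samplePeriodMs(80))
	t.Reset()
}
\end{minted}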

\subsection{Summary}
By implementing our own typing test platform (\gls{GoTT}), we maximized the
control over one of the main measurement tools required for our experiment. We
were able to define exactly how each metric is computed, based on our research
in Section \ref{sec:meas_perf}. The crowdsourcing tool allowed us to gather a
large amount of text in very little time while avoiding observer bias in the
text selection, and the integration of questionnaires into \gls{GoTT} reduced
the risk of mapping errors. Both potentially improved the reliability of the
results acquired in our experiment. Further, the device we built to measure the
peak force each finger can produce while pressing certain keys on a keyboard
allowed us to base the design of our keyboard with non-uniform actuation forces
on more than anecdotal evidence. The exact procedure of our preliminary
experiment on peak force is addressed in the following section.