OVERVIEW

We conducted a usability test comparing users' performance on SwiftKey and the standard iOS type-entry keyboard. Our goals in this study were to:

  • Compare SwiftKey and the iOS keyboard for performance and satisfaction
  • Learn more about the experience of a novice gesture typist
  • Evaluate gesture as a text entry method

 


| ROLE |
User Researcher
Test Moderator

| MEASUREMENT METRICS |  
Time on task, Error Rate, Efficiency, Learnability,
Post-task self-report metrics,
After-scenario questionnaire

| TEAM |

Nathan Heep, Chung-Yu Ho
Caroline O'Meara, Xiaoyue Pang
Chen-Chun Shen, Zhenzhi Zhu


OBJECTIVE

PRIMARY RESEARCH QUESTIONS 

  • How does gesture-text entry performance compare between SwiftKey for iOS and the standard iOS type-entry keyboard?

SECONDARY RESEARCH QUESTIONS

  • Do users become faster at gesture-based typing over the duration of the test?
  • Do users become more accurate using gesture-based typing over the duration of the test?
  • Do users have a clear preference between SwiftKey and the standard iOS type-entry keyboard?

PRIOR GESTURE RESEARCH

  • Type entry: around 30-34 WPM
  • Gesture entry: around 24-30 WPM
  • Keyboard: 60 WPM
  • Users are able to learn 15 gestures over 45 minutes

 

 

SwiftKey for iOS gesture keyboard

Standard iOS text-entry keyboard

The timeline of this study


METHODOLOGY

RECRUITING PARTICIPANTS

  • iPhone Users
  • 5+ years of use
  • No previous experience with gesture keyboards
  • Personal phones

TESTING ENVIRONMENT

  • Tested in a classroom
  • Captured phone screen with Ziggi-HD document camera
  • Videotaped sessions
  • Moderator + Notetaker(s)

EQUIPMENT

  • 1 iPhone (running iOS 8.x or 9.x and SwiftKey)
  • 1 Ziggi-HD High-Definition USB Document Camera
  • 1 Desktop, connected to the Ziggi-HD
  • 1 Digital video camera
  • 1 Hard drive to store captured videos

TASK DESIGN

The tasks included 8 exact entries and 2 scenario entries.

  • For exact entry tasks, participants were given a sheet with exact sentences to type. To test learnability, we repeated the shark entry three times.
  • "There are almost 400 different kinds of sharks. There are sharks in all four oceans of the world. Some sharks are longer than a school bus, while others are so small they can live in fish tanks."

 

  • For scenario entry tasks, participants were given a scenario and one initial sentence, and the moderator then interacted with the participant in real time, following the instructions and the scenario description. This way, users did not have to look at a piece of paper and then back at the screen; they could communicate naturally with the moderator.
  • For example: "Imagine you’ve found a great deal on a couch on Craigslist. Text 512-555-1212 and ask if the couch is still available. If it is, arrange a time to come pick up the couch on Sunday, the only day you’ll have your cousin’s van. You will need their address."

METRICS FOR THE EVALUATION

The metrics we used in this study included quantitative metrics:

  • Time on task
    • Only measured for exact entry tasks. Timing started the moment the finger touched the keyboard to begin a gesture and ended when the user’s finger completed the last gesture.
  • Error
    • We calculated errors by looking at the final output of both kinds of typing tasks and counting the number of typing errors.
  • Efficiency
    • Only calculated for exact entry tasks. We counted every backspace users made to correct input during tasks.
  • Learnability
    • We asked participants to repeat one exact typing entry over the course of the session.
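
The error and efficiency counts above were made from the final output and the session recordings. As an illustration only, the sketch below shows one way such counts could be automated. It swaps in character-level Levenshtein edit distance as a stand-in for counting typing errors, which is an assumption and not the procedure used in the study. The function names and the `"BACKSPACE"` key-log label are hypothetical.

```python
def edit_distance(target: str, typed: str) -> int:
    """Levenshtein distance between the target sentence and the typed output.

    Stand-in for a manual error count (an assumption, not the study's method):
    the minimum number of single-character insertions, deletions, and
    substitutions needed to turn `typed` into `target`.
    """
    prev = list(range(len(typed) + 1))
    for i, target_char in enumerate(target, start=1):
        curr = [i]
        for j, typed_char in enumerate(typed, start=1):
            cost = 0 if target_char == typed_char else 1
            curr.append(min(prev[j] + 1,          # character missing from typed text
                            curr[j - 1] + 1,      # extra character in typed text
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def count_backspaces(key_log: list[str]) -> int:
    """Count corrections in a hypothetical key log where each backspace
    press is recorded as the string "BACKSPACE"."""
    return sum(1 for key in key_log if key == "BACKSPACE")


if __name__ == "__main__":
    target = "There are sharks in all four oceans of the world."
    typed = "There are sharls in all four oceans of the world."
    print(edit_distance(target, typed))                    # one substituted character
    print(count_backspaces(["T", "g", "BACKSPACE", "h"]))  # one correction
```

A word-level count would be a reasonable alternative for gesture input, since a failed gesture usually produces a whole wrong word, not a single wrong character.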

And qualitative metrics:
 

  • Post-task self-report metrics
    • We asked users to rate each task as they completed it, using the following scale: “Overall, this task was very difficult, difficult, neutral, easy, very easy.”
  • After-Scenario Questionnaire (ASQ)
    • To measure overall satisfaction, we asked users to respond to three statements at the end of each keyboard section:
      • I am satisfied with the ease of completing the tasks on this keyboard
      • I am satisfied with the amount of time it took to complete the tasks on this keyboard
      • I am satisfied with this typing method
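
As a rough sketch of how the three ASQ items could be combined into one satisfaction score per keyboard, the snippet below averages the three responses on a seven-point scale. The function name and the sample responses are hypothetical, and the direction of the scale (whether 7 means strong agreement or strong disagreement) is assumed to follow whatever the questionnaire sheet defined.

```python
def asq_score(responses: list[int]) -> float:
    """Average the three ASQ item responses for one participant and keyboard.

    Assumes each response is an integer from 1 to 7 (seven-point scale).
    """
    if len(responses) != 3:
        raise ValueError("ASQ expects exactly three item responses")
    if any(r < 1 or r > 7 for r in responses):
        raise ValueError("each response must be between 1 and 7")
    return sum(responses) / len(responses)


if __name__ == "__main__":
    # Hypothetical responses: ease, time, and typing-method items.
    print(asq_score([6, 5, 7]))
```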

STUDY FINDINGS

Our findings are based on the metrics described above. Beyond the quantitative data, we also identified several issues that came up during testing.

PERFORMANCE METRICS

For each exact entry task, we compared the metrics described above as the components of users' performance on the two types of keyboard. For time on task, we calculated CPS (characters per second) and WPM (words per minute). We also looked at the overall performance trend across our 6 participants and at the performance difference between the type and gesture keyboards.
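
A minimal sketch of the CPS and WPM calculation is shown below. It assumes the common text-entry convention that one "word" equals five characters, including spaces. The study does not state which word definition it used, so treat `chars_per_word` and the function names as hypothetical.

```python
def chars_per_second(text: str, seconds: float) -> float:
    """Characters entered per second for one exact entry task."""
    if seconds <= 0:
        raise ValueError("time on task must be positive")
    return len(text) / seconds


def words_per_minute(text: str, seconds: float, chars_per_word: int = 5) -> float:
    """WPM under the assumed convention of five characters per word."""
    return chars_per_second(text, seconds) * 60 / chars_per_word


if __name__ == "__main__":
    sentence = "There are sharks in all four oceans of the world."
    time_on_task = 20.0  # hypothetical timing in seconds
    print(round(chars_per_second(sentence, time_on_task), 2))
    print(round(words_per_minute(sentence, time_on_task), 1))
```

Counting actual space-separated words would be a valid alternative. The five-character convention simply keeps scores comparable across sentences with different word lengths.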

LEARNABILITY

Synthesizing the metrics for exact text entry tasks 2, 7, and 8 (the three repeated shark tasks), we found an interesting result: efficiency on the standard keyboard decreased across the repetitions, though we do not know the reason behind this.

The 1st Shark Task
User gestures; SwiftKey predicts words after each gesture completes

The 2nd Shark Task
On multiple occasions, SwiftKey correctly predicts words before the gesture begins, but the user does not click the prediction

The 3rd Shark Task
SwiftKey predicts every word; the user clicks to finish

 

SATISFACTION

For the self-report metrics, we summed the post-task ratings to compare the ease of use of the two keyboards in each task. The results showed that, overall, users found the gesture keyboard harder in each text-entry task. However, in task 8, the 3rd shark task, users rated SwiftKey as easier than the standard keyboard. We assume this was because of the predictions SwiftKey provided.
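
To make the summing step concrete, the sketch below codes the five rating labels as 1 to 5 and adds them up for one task on one keyboard. The numeric coding, the function name, and the sample ratings are assumptions for illustration, not data from the study.

```python
# Assumed numeric coding of the five-point post-task scale.
RATING_SCALE = {
    "very difficult": 1,
    "difficult": 2,
    "neutral": 3,
    "easy": 4,
    "very easy": 5,
}


def sum_ratings(ratings: list[str]) -> int:
    """Sum the coded ratings that participants gave one task on one keyboard."""
    return sum(RATING_SCALE[rating.strip().lower()] for rating in ratings)


if __name__ == "__main__":
    # Hypothetical ratings from six participants for a single task.
    gesture = ["difficult", "neutral", "difficult", "easy", "neutral", "difficult"]
    standard = ["easy", "very easy", "easy", "easy", "neutral", "very easy"]
    print(sum_ratings(gesture), sum_ratings(standard))
```

With this coding, a higher total means participants found the task easier on that keyboard.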

For the After-Scenario Questionnaire, which used a seven-point Likert scale, we found that the distribution was fairly consistent: overall, users gave more positive feedback on the iOS keyboard.

SPECIFIC ISSUES

In the post-test interview and feedback session, we asked users for their opinions and feelings about the two keyboards. Based on their responses, we identified five specific points about the gesture keyboard, supported by quotes from our users:

  • Double-letter words were very difficult for some users because gesturing them did not fit their mental model.
  • Users had already acquired muscle memory with the standard iOS entry keyboard. With longer words, we observed that users would swipe to a certain point and then could not find the next character they wanted.
  • From an efficiency perspective, it was hard for users to go back and correct a specific misspelled character within a word, and doing so took longer.
  • Small palms and short thumbs made the gesture keyboard harder to use.
  • Users also noted the better prediction feature of the gesture keyboard.


LIMITATIONS AND RECOMMENDATIONS

PROJECT LIMITATIONS

  • Task - We all agreed that there were gaps between real usage scenarios and the designed tasks, which limited the study. We reflected that 
  • Location - The classroom setting might have influenced participants' performance. The environment was not as natural as users' normal daily use, and being asked to sit in a classroom and be videotaped might have affected their emotions and performance. Also, to get the best quality from the document camera, we asked them to do the tasks within a framed area, which might have made users less comfortable than usual.
  • Learnability - From previous research, we knew that users are able to learn 15 gestures over 45 minutes and that performance improves after a period of time. Our test asked users to use two different types of keyboard. Although we did not impose a time limit on tasks, the sessions were still not long enough to capture their learning curves, and, given the task design and location, the context was very different from their real daily use.

RECOMMENDATIONS

  • Touch Accuracy - Touch accuracy varied in our study depending on phone size and finger size. Although smartphones are trending larger, SwiftKey still needs to consider users of smaller screens, such as the iPhone 4/5, and make sure touch accuracy is of the same quality across devices.
  • Auto Correction - Auto correction is sometimes useful, but at other times it is intrusive and introduces mistakes before users notice. We suggest that SwiftKey learn users' conversation context and typing patterns more carefully.
  • Prediction - SwiftKey is famous for its use of big data, collecting people's vocabulary and gestures to improve prediction. But our study showed there is still room to improve the prediction feature. To hold an advantage in the market, SwiftKey should address the prediction issues that users encountered.
  • Combined Typing - The double-letter issue that occurred during our testing did increase the difficulty of use. But just as with the QWERTY keyboard, people might get better if they enjoy the experience and are willing to learn. We are optimistic and confident about SwiftKey's market: as long as the prediction experience improves, promoting the gesture keyboard should not be a problem.