FUSION Replication Package

In this section we describe the studies conducted to evaluate FUSION and make all of the data available for replication purposes and further research. (Please note that this section contains results not explicitly discussed in our paper due to space limitations.)

Research Questions

  • RQ1: What types of information fields do developers/testers consider important to be available when reporting bugs in Android?

  • RQ2: Is FUSION easier to use for reporting bugs than traditional bug tracking systems?  

  • RQ3: Are FUSION Reports easier to use for reproducing bugs than traditional bug reports?

  • RQ4: Do reports generated with FUSION allow for faster bug reproduction compared to reports submitted using traditional bug tracking systems?

  • RQ5: Do developers using FUSION reproduce more bugs than developers using traditional bug tracking systems?

Study Descriptions

Study 1: Reporting Bugs with FUSION

The goal of the first study is to assess whether FUSION's features are useful when reporting bugs for Android apps, answering RQ1 and RQ2. In particular, we want to identify whether the step auto-completion and in-situ screenshot features are useful when reporting bugs. For this, we recruited eight students (four undergraduates, or non-experts, and four graduate students, or experts) at the College of William and Mary to construct bug reports using FUSION and the Google Code Issue Tracker (GCIT), as a representative of traditional bug tracking systems, for the real-world bugs shown in the table below. The four graduate participants had extensive programming backgrounds. Four participants constructed a bug report for each of the 15 bugs in Table 2 using the FUSION prototype, and four participants reported the same bugs using the GCIT interface. Participants were assigned to the systems so that both non-experts and experienced programmers evaluated each system. In total, the participants constructed 60 bug reports using FUSION and 60 using GCIT. Participants used a Nexus 7 tablet with Android 4.4.3 KitKat installed to reproduce the bugs. To avoid bias, participants watched videos that illustrated each bug being manifested on a Nexus 7 tablet. We collected the time it took each participant to report each bug, as well as the answers to the user preference and usability questions below.

Study 1 User Preference Questions

Question ID | Question
S1UP1 | What fields in the form did you find useful when reporting the bug?
S1UP2 | (FUSION only) Were the component suggestions accurate?
S1UP3 | (FUSION only) Were the screenshot suggestions accurate?
S1UP4 | What information, if any, were you not able to report?
S1UP5 | What elements do you like most from the system?
S1UP6 | What elements do you like least in the system?
S1UP7 | Please give any additional feedback about the bug reporting system.

Study 1 Usability Questions

Question ID | Question
S1UX1 | I think that I would like to use (system) frequently.
S1UX2 | I found (system) very cumbersome to use.
S1UX3 | I found the various functions in (system) were well integrated.
S1UX4 | I thought (system) was easy to use.
S1UX5 | I found (system) unnecessarily complex.
S1UX6 | I thought (system) was really useful for reporting a bug.

Study 2: Reproducibility of FUSION Bug Reports

The goal of Study 2 is to evaluate the ability of FUSION to improve the reproducibility of bug reports, in turn answering research questions RQ3-RQ5. In particular, we evaluated the following aspects of FUSION and traditional issue trackers: 1) usability of the bug tracking systems' GUIs for reading bug reports, 2) time required to reproduce real bugs using the bug reports, and 3) number of bugs that were successfully reproduced. The reports generated during Study 1 using FUSION and GCIT, in addition to the original bug reports (Table 2), were evaluated by a new set of participants, who attempted to reproduce the bugs on physical devices. Usability was evaluated using statements based on the SUS usability scale by John Brooke; these statements can be found in the attached data files.

For the evaluation we enlisted 20 new participants, none of whom participated in the first study. The participants were graduate students from the Computer Science Department at the College of William and Mary, all of whom were familiar with the Android platform. All participants were compensated $15 USD for their efforts. Each participant evaluated 15 bug reports: six from FUSION, six from GCIT, and three original reports. In total, 135 reports were evaluated (120 from Study 1 plus the 15 original bug reports), distributed to the 20 participants in such a way that each bug report was evaluated by two different participants (the full design matrix can be found in the attached files). Bug reports produced by both experienced [FUS(E), GCIT(E)] and inexperienced [FUS(I), GCIT(I)] participants from the first study were evenly distributed across the six reports evaluated by each participant for the FUSION and GCIT systems. (For more details on these studies, please refer to our paper.)

Study 2 User Preference Questions

Question ID | Question
S2UP1 | What information from this type of bug report did you find useful for reproducing the bug?
S2UP2 | What other information, if any, would you like to see in this type of bug report?
S2UP3 | What elements did you like the most from this type of bug report?
S2UP4 | What information did you like least from this type of bug report?

Study 2 Usability Questions

Question ID | Question
S2UX1 | I think that I would like to use this type of bug report frequently.
S2UX2 | I found this type of bug report unnecessarily complex.
S2UX3 | I thought this type of bug report was easy to read/understand.
S2UX4 | I found this type of bug report very cumbersome to read.
S2UX5 | I thought the bug report was really useful for reproducing the bug.

Graphs and Figures

Study 1 UX Question Responses

Study 1 Bug Creation Time Statistics (EX = experienced user, IEX = inexperienced user)

       | Participant 1 (EX) | Participant 2 (EX) | Participant 3 (IEX) | Participant 4 (IEX)
FUSION | 5:14 | 5:20 | 10:40 | 4:59
       | Participant 5 (EX) | Participant 6 (EX) | Participant 7 (IEX) | Participant 8 (IEX)
GCIT   | 3:17 | 6:39 | 1:14 | 1:46
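Per-system summary statistics for the creation times in the table above can be computed with a short script (the `to_seconds` helper name is ours, introduced for illustration):

```python
def to_seconds(t: str) -> int:
    """Convert an m:ss time string (e.g. '5:14') to total seconds."""
    minutes, seconds = t.split(":")
    return int(minutes) * 60 + int(seconds)

# Bug creation times from the table above (Participants 1-8).
times = {
    "FUSION": ["5:14", "5:20", "10:40", "4:59"],
    "GCIT": ["3:17", "6:39", "1:14", "1:46"],
}

for system, values in times.items():
    secs = [to_seconds(v) for v in values]
    mean = sum(secs) / len(secs)
    print(f"{system}: mean {mean // 60:.0f}:{mean % 60:05.2f}")
```

Note that the means (roughly 6:33 for FUSION versus 3:14 for GCIT) are pulled upward for FUSION by Participant 3's 10:40 outlier; see the paper for the full statistical treatment.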

Study 2 User Experience Responses

Study 2 Bug Reproduction Results

Answers to Research Questions

RQ1: While reporters generally felt that the opportunity to enter extra information in a bug report using FUSION increased the quality of their reports, inexperienced users would have preferred a simpler web UI.

RQ2: According to the usability scores, participants generally preferred FUSION over the original bug reports, but preferred GCIT to FUSION by a small margin. The most common reporter complaint regarding FUSION was the organization of information in the report.

RQ3: Developers using FUSION are able to reproduce more bugs than developers using traditional bug tracking systems such as GCIT.

RQ4: Bug reports generated with FUSION do not allow for faster bug reproduction compared to reports generated using traditional bug tracking systems such as GCIT.

Reproduction Data

In this section we provide all of the information collected during the user studies used in our evaluation of FUSION. Specifically, we provide the following items:

  • A list of all the real-world Android application bugs used in the studies extracted from apps hosted on the F-Droid open-source marketplace, with links to the original bug reports and videos of each bug.

  • All of the user responses and statistics from both studies conducted.

Applications and Bug Reports Used in Evaluation

Click on the ID # of a bug report to view the original bug report in the corresponding app's issue tracker, and click on the name of the application to view a video of the bug being exhibited on a Nexus 7 tablet.

  • The abbreviations in the Bug Type column are as follows: GDE = GUI Display Error, C = Crash, DIC = Data Input/Calculation Error, NE = Navigation Error

Application Name | Bug ID | Description | Minimum # of Steps | Bug Type
A Time Tracker | 24 | Dialog box is displayed three times in error. | 3 | GDE
Aarddict | 106 | Scroll position of previous pages is incorrect. | 4-5 | GDE
ACV | 11 | App crashes when long-pressing on the sdcard folder. | 5 | C
Car Report | 43 | Wrong information is displayed if two of the same values are entered subsequently. | 10 | DIC
Document Viewer | 48 | Go To Page # requires two entries before it works. | 4 | NE
DroidWeight | 38 | Weight graph has incorrectly displayed digits. | 7 | GDE
Eshotroid | 2 | Bus time page never loads. | 10 | GDE/NE
GNU Cash | 256 | Selecting from autocomplete suggestion doesn't allow modification of value. | 10 | DIC
GNU Cash | 247 | Cannot change a previously entered withdrawal to a deposit. | 10 | DIC
Mileage | 31 | Comment not displayed. | 5 | GDE/DIC
NetMBuddy | 3 | Some YouTube videos do not play. | 4 | GDE/NE
Notepad | 23 | Crash on trying to send note. | 6 | C
OI Notepad | 187 | Encrypted notes are sorted randomly when they should be ordered alphabetically. | 10 | GDE/DIC
Olam | 2 | App crashes when searching for a word with an apostrophe or just a "space" character. | 3 | C
QuickDic | 85 | Enter key does not hide keyboard. | 5 | GDE

User Study Dataset

Click the button below to download our dataset in either .csv or .xlsx format. If you need a viewer for the .xlsx version of the dataset, you can download LibreOffice (free), OpenOffice (free), or Microsoft Excel (paid).
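The .csv version of the dataset can be loaded with standard tooling; a minimal sketch in Python follows (the filename `study2_bug_reproduction.csv` is hypothetical, so substitute the actual name of the file you download):

```python
import csv
import os

def load_sheet(path: str) -> list[dict]:
    """Read one dataset sheet (.csv) into a list of row dictionaries,
    keyed by the column headers in the first row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

# Hypothetical filename; each .csv file corresponds to one sheet of the .xlsx dataset.
sheet = "study2_bug_reproduction.csv"
if os.path.exists(sheet):
    for row in load_sheet(sheet)[:3]:
        print(row)
```

The same sheets can equally be opened directly in LibreOffice or Excel via the .xlsx version.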

The spreadsheet is arranged in 10 different sheets/files, each corresponding to a different type of data, as follows:

  • Study 1 User Experience

  • Study 1 User Preference

  • Study 1 Bugs Created

  • Study 1 Bug Creation Time Results

  • Study 1 Participant Programming Experience

  • Study 2 User Experience

  • Study 2 User Preference

  • Study 2 Bug Reproduction

  • Aggregated Bug Reproduction Results

  • Study 2 Participant Programming Experience


To access any of the FUSION bug reports used in the study, simply navigate to our report lookup page and type in the ID # of the bug report you wish to view. For more information, see the above instructions on using FUSION.