GVT: Automated Reporting of GUI Design Violations for Mobile Apps

Team Members: Kevin Moran, Boyang Li, Carlos Bernal-Cárdenas, Dan Jelf, & Denys Poshyvanyk

College of William & Mary --- SEMERU



This project was created by the Software Engineering Maintenance and Evolution Research Unit (SEMERU) at the College of William & Mary, under the supervision of Dr. Denys Poshyvanyk.  The major goal of the GVT project is to help improve and automate the process of verifying whether the GUI of a mobile app was implemented according to its design specifications by automatically reporting instances where an app implementation differs from its design.  

Video Demonstration


Request Access to the GVT Tool!

Would you like to use GVT in your research or open source project? Click on the button below to request access to the tool. 


GVT Workflow Overview

Figure 1: GVT Workflow Overview (Click for more detail)

The GVT Approach

The workflow of GVT (Fig. 1) proceeds in three stages: First, in the GUI-Collection Stage, GUI-related information from both mock-ups and running apps is collected; Next, in the GUI-Comprehension Stage, leaf-level GUI-Components (GCs) are parsed from the trees and a KNN-based algorithm is used to match corresponding GCs using spatial information; Finally, in the Design Violation Detection Stage, Design Violations (DVs) are detected using a combination of methods that leverage spatial GC information and computer vision techniques.

Stage 1: GUI Collection

Mock-Up GUI Collection. Software UI/UX design professionals typically use professional-grade image editing software (such as Photoshop or Sketch) to create their mock-ups. Designers employed by our industrial partners at Huawei utilize the Sketch design software. Sketch is popular among mobile UI/UX designers due to its simple but powerful features, ease of use, and large library of extensions. When using these tools, designers often construct graphical representations of smartphone applications by placing objects representing GCs (which we refer to as mock-up GCs) on a canvas (representing a screen) that matches the typical display size of a target device. In order to capture the information encoded in these mock-ups, we decided to leverage an export format that was already in use by our industrial partner: an open-source Sketch extension called Marketch that exports mock-ups as an HTML page including a screenshot and a JavaScript file.

Thus, as input from the mock-up, GVT receives a screenshot (to be used later in the Design Violation Detection Stage) and a directory containing the Marketch information. The JavaScript file contains several pieces of information for each mock-up GC including: (i) the location of the mock-up GC on the canvas, (ii) the size of the bounding box, and (iii) the text/font displayed by the mock-up GC (if any). As shown in Figure 1-1.1, we built a parser to read this information. However, it should be noted that our approach is not tightly coupled to Sketch or Marketch files. For example, we could also parse information from mock-ups created in Photoshop and exported as .svg files. After the Marketch files have been parsed, GVT examines the extracted spatial information to build a GC hierarchy. The result can be logically represented as a rooted tree where leaf nodes contain the atomic UI-elements with which a typical user might interact. Non-leaf components typically represent containers that form logical groupings of leaf-node components and other containers. In certain cases, our approximation of using mock-up GCs to represent implementation GCs may not hold. For instance, an icon which should be represented as a single GC may consist of several mock-up GCs representing parts of the icon. GVT handles such cases in the GUI-Comprehension Stage.
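The hierarchy-building step described above can be sketched as follows. This is a simplified illustration, not GVT's actual data model: the `GC` class, its fields, and the area-ordered containment heuristic are all our own assumptions.

```python
# Sketch: building a rooted GC hierarchy from flat mock-up metadata by
# nesting components according to bounding-box containment.
# (Illustrative only; GVT's real parser works on Marketch output.)

class GC:
    """A GUI-component with a bounding box and optional text."""
    def __init__(self, x, y, w, h, text=None):
        self.x, self.y, self.w, self.h = x, y, w, h
        self.text = text
        self.children = []

    def contains(self, other):
        """True if `other`'s bounding box lies inside this GC's box."""
        return (self.x <= other.x and self.y <= other.y and
                self.x + self.w >= other.x + other.w and
                self.y + self.h >= other.y + other.h)

def build_hierarchy(components):
    """Nest components by spatial containment; returns the root nodes."""
    # Sort largest-first so containers are considered before their contents.
    ordered = sorted(components, key=lambda c: c.w * c.h, reverse=True)
    roots = []
    for comp in ordered:
        parent = None
        for candidate in ordered:
            if candidate is not comp and candidate.contains(comp):
                # Prefer the smallest enclosing container.
                if parent is None or parent.contains(candidate):
                    parent = candidate
        (parent.children if parent else roots).append(comp)
    return roots
```

Leaf nodes of the resulting tree (those with empty `children`) are the atomic UI-elements used in the GUI-Comprehension Stage.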

Dynamic App GUI-Collection. In order to compare the mock-up of an app to its implementation, GVT must extract GUI-related metadata from a running Android app. GVT uses Android's uiautomator framework, intended for UI testing, to capture XML files and screenshots for a target screen of an app running on a physical device or emulator. Each uiautomator file contains information related to the runtime GUI-hierarchy of the target app, including the following attributes utilized by GVT: (i) the Android component type (e.g., android.widget.ImageButton), (ii) the location on the screen, (iii) the size of the bounding box, (iv) the text displayed, and (v) a developer-assigned id. The hierarchical structure of components is encoded directly in the uiautomator file, and thus we built a parser to extract the GUI-hierarchy using this information directly (see Fig. 1-1.2).
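As a concrete illustration, a minimal parser for a uiautomator XML dump might look like the following. The `bounds`, `class`, `text`, and `resource-id` attributes are standard in uiautomator dumps; the keys of the returned dicts are our own naming.

```python
# Sketch: extracting the GC attributes GVT uses from a uiautomator dump.
# Bounds are encoded as "[x1,y1][x2,y2]" in each node's `bounds` attribute.
import re
import xml.etree.ElementTree as ET

BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")

def parse_uiautomator(xml_text):
    """Return a flat list of dicts describing every node in the dump."""
    gcs = []
    for node in ET.fromstring(xml_text).iter("node"):
        m = BOUNDS_RE.match(node.get("bounds", ""))
        if not m:
            continue
        x1, y1, x2, y2 = map(int, m.groups())
        gcs.append({
            "type": node.get("class"),       # e.g. android.widget.ImageButton
            "x": x1, "y": y1,
            "w": x2 - x1, "h": y2 - y1,
            "text": node.get("text", ""),
            "id": node.get("resource-id", ""),
        })
    return gcs
```

The nesting of `<node>` elements in the dump also gives the runtime hierarchy directly, which is why no containment heuristic is needed on the implementation side.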

Stage 2: GUI Comprehension

In order for GVT to find visual discrepancies between components existing in the mock-up and implementation of an app, it must determine which components correspond to one another. Unfortunately, the GUI-hierarchies parsed from the Marketch and uiautomator files tend to differ dramatically due to several factors, making tree-based GC matching difficult. First, since the hierarchy constructed from the Marketch files is generated using information from the Sketch mock-up of the app, it reflects information derived from designers. While designers have tremendous expertise in constructing visual representations of apps, they typically do not take the time to construct programmatically-oriented groupings of components. Furthermore, designers are typically not aware of the correct Android component types that should be attributed to different objects in a mock-up. Second, the uiautomator representation of the GUI-hierarchy contains the runtime hierarchical structure of GCs and correct GC types. This tree is typically far more complex, containing several levels of containers grouping GCs together, which is required for the responsive layouts typical of mobile apps.

To overcome this challenge, GVT instead forms two collections of leaf-node components from the mock-up and implementation GUI-hierarchies (Fig. 1-2), as this information can be easily extracted. The vast majority of DVs affect leaf-node components. Once the leaf-node components have been extracted from each hierarchy, GVT employs a K-Nearest-Neighbors (KNN) algorithm utilizing a similarity function based on the location and size of the GCs in order to perform matching. In this setting, an input leaf-node component from the mock-up is matched against its closest (i.e., K=1) neighbor from the implementation based upon the following similarity function:
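A minimal sketch of this K=1 matching step follows. GVT's exact similarity function is the one given in the equation; as a stand-in, this sketch uses the Euclidean distance over the (x, y, w, h) tuples, which captures the same intuition that smaller scores mean closer matches.

```python
# Sketch of K=1 nearest-neighbor GC matching. The similarity function
# here is an illustrative stand-in, not GVT's exact formula.
import math

def similarity(mock, impl):
    """Smaller values mean closer matches (distance over x, y, w, h)."""
    return math.sqrt(sum((mock[k] - impl[k]) ** 2
                         for k in ("x", "y", "w", "h")))

def match_components(mock_gcs, impl_gcs):
    """Match each mock-up GC to its closest implementation GC (K=1)."""
    matches = []
    for mock in mock_gcs:
        best = min(impl_gcs, key=lambda impl: similarity(mock, impl))
        matches.append((mock, best, similarity(mock, best)))
    return matches
```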

[Image: the GC similarity function equation]

The output is a similarity score where smaller values represent closer matches. The x, y, w, and h variables correspond to the x & y locations of the top and left-hand borders of the bounding box, and the width and height of the bounding boxes, for the mock-up and implementation GCs respectively. The result is a list of GCs that should logically correspond to one another (corresponding GCs).

It is possible that there exist instances of missing or extraneous components between the mock-up and implementation. To identify these cases, our KNN algorithm employs a GC-Matching Threshold (MT). If the similarity score of the nearest neighbor match for a given input mock-up GC exceeds this threshold, it is not matched with any component, and will be reported as a missing GC violation. If there are unmatched GCs from the implementation, they are later reported as extraneous GC violations.
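The thresholding logic described above can be sketched as follows; the function name and return structure are illustrative, not GVT's API.

```python
# Sketch: applying the GC-Matching Threshold (MT) to flag missing and
# extraneous components during KNN matching.

def detect_unmatched(mock_gcs, impl_gcs, similarity, mt):
    """Return (matched pairs, missing mock-up GCs, extraneous impl GCs)."""
    matched, missing, used = [], [], set()
    for mock in mock_gcs:
        best = min(impl_gcs, key=lambda impl: similarity(mock, impl),
                   default=None)
        if best is None or similarity(mock, best) > mt:
            missing.append(mock)        # reported as a missing-GC violation
        else:
            matched.append((mock, best))
            used.add(id(best))
    # Implementation GCs never matched become extraneous-GC violations.
    extraneous = [impl for impl in impl_gcs if id(impl) not in used]
    return matched, missing, extraneous
```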

Also, there may be cases where a logical GC in the implementation is represented as a small group of mock-up GCs. GVT is able to handle these cases using the similarity function outlined above. For each mock-up GC, GVT checks whether the neighboring GCs in the mock-up are closer than the closest corresponding GC in the implementation. If this is the case, they are merged, with the process repeating until a logical GUI-component is represented.
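One plausible reading of this merging loop is sketched below, assuming dict-based GCs and a caller-supplied similarity function; the merge order and union computation are our own simplifications.

```python
# Sketch: merging fragmented mock-up GCs (e.g. an icon drawn as several
# shapes) into one logical component. Merge criterion per the text:
# a neighboring mock-up GC closer than the closest implementation match.

def bounding_union(a, b):
    """Smallest box covering both GCs."""
    x = min(a["x"], b["x"])
    y = min(a["y"], b["y"])
    w = max(a["x"] + a["w"], b["x"] + b["w"]) - x
    h = max(a["y"] + a["h"], b["y"] + b["h"]) - y
    return {"x": x, "y": y, "w": w, "h": h}

def merge_fragments(mock_gcs, impl_gcs, similarity):
    """Repeatedly merge mock-up GC pairs closer to each other than to impl."""
    gcs = list(mock_gcs)
    merged = True
    while merged and len(gcs) > 1:
        merged = False
        for a in gcs:
            impl_dist = min(similarity(a, impl) for impl in impl_gcs)
            nbr = min((b for b in gcs if b is not a),
                      key=lambda b: similarity(a, b))
            if similarity(a, nbr) < impl_dist:
                gcs.remove(a)
                gcs.remove(nbr)
                gcs.append(bounding_union(a, nbr))
                merged = True
                break
    return gcs
```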

Stage 3: Design Violation Detection

In the Design Violation Detection stage of the GVT workflow, the approach uses a combination of computer vision techniques and heuristic checking in order to effectively detect and differentiate between orthogonal categories of DVs.

Perceptual Image Differencing. In order to determine corresponding GCs with visual discrepancies, GVT uses a technique called Perceptual Image Differencing (PID) that operates upon the mock-up and implementation screenshots. PID utilizes a model of the human visual system to compare two images and detect visual differences, and has been used to successfully identify visual discrepancies in web applications in previous work. We use this algorithm in conjunction with the GC information derived in the previous steps of GVT to achieve accurate violation detection. For a full description of the algorithm, we refer readers to the PID project. The PID algorithm uses several adjustable parameters, including: F, which corresponds to the visual field of view in degrees; L, which indicates the luminance or brightness of the image; and C, which adjusts sensitivity to color differences. The values used in our implementation are stipulated at the end of this section.

The output of the PID algorithm is a single difference image (Fig. 1-3) containing difference pixels, which are pixels considered to be perceptually different between the two images. After processing the difference image generated by PID, GVT extracts the implementation bounding box for each corresponding pair of GCs, and overlays the box on top of the generated difference image. It then calculates the number of difference pixels contained within the bounding box, where higher numbers of difference pixels indicate potential visual discrepancies. Thus, GVT collects all "suspicious" GC pairs with a percentage of difference pixels higher than a Difference Threshold (DT). This set of suspicious components is then passed to the Violation Manager (Fig. 1-3) so that specific instances of DVs can be detected.
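The difference-pixel counting step can be sketched as follows, representing the difference image as a simple 2-D grid (the real implementation operates on PID's output image; the function name is ours).

```python
# Sketch: counting PID difference pixels inside a GC's bounding box and
# flagging "suspicious" pairs against the Difference Threshold (DT).
# `diff_image` is a 2-D grid where truthy cells mark difference pixels.

def is_suspicious(diff_image, gc, dt):
    """True if the fraction of difference pixels in `gc`'s box exceeds dt."""
    total = gc["w"] * gc["h"]
    if total == 0:
        return False
    hits = sum(
        1
        for row in diff_image[gc["y"]:gc["y"] + gc["h"]]
        for px in row[gc["x"]:gc["x"] + gc["w"]]
        if px
    )
    return hits / total > dt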

Detecting Layout Violations. The first general category of DVs that GVT detects are Layout Violations. There are six specific layout DV categories that relate to two component properties: (i) screen location (i.e., <x,y> position) and (ii) size (i.e., <h,w> of the GC bounding box). GVT first checks for the three types of translation DVs utilizing a heuristic that measures the distance from the top and left-hand edges of matched components. If the difference between the components in either the x or y dimension is greater than a Layout Threshold (LT), then these components are reported as a Layout DV. Using the LT avoids trivial location discrepancies within design tolerances being reported as violations, and it can be set by a designer or developer using the tool. When detecting the three types of size DVs in the derived design violation taxonomy, GVT utilizes a heuristic that compares the width and height of the bounding boxes of corresponding components. If the width or height of the bounding boxes differ by more than the LT, then a layout violation is reported.
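The two layout heuristics above reduce to four threshold comparisons; a minimal sketch (violation labels and function name are illustrative):

```python
# Sketch of the layout checks: translation DVs compare top/left edges,
# size DVs compare bounding-box width and height, both against the
# Layout Threshold (LT).

def layout_violations(mock, impl, lt):
    """Return the layout DV kinds detected for one corresponding GC pair."""
    dvs = []
    if abs(mock["x"] - impl["x"]) > lt or abs(mock["y"] - impl["y"]) > lt:
        dvs.append("translation")
    if abs(mock["w"] - impl["w"]) > lt or abs(mock["h"] - impl["h"]) > lt:
        dvs.append("size")
    return dvs
```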

Detecting Text Violations. The next general type of DV that GVT detects are Text Violations, of which there are three specific types: (i) Font Color, (ii) Font Style, and (iii) Incorrect Text Content. These detection strategies are only applied to pairs of text-based components as determined by uiautomator information. To detect font color violations, GVT extracts cropped images for each pair of suspicious text components by cropping the mock-up and implementation screenshots according to the components' respective bounding boxes. Next, Color Quantization (CQ) is applied to accumulate instances of all unique RGB values expressed in the component-specific images. This quantization information is then used to construct a Color Histogram (CH) (Fig. 1-3). GVT computes the normalized Euclidean distance between the extracted Color Histograms for the corresponding GC pairs, and if the histograms do not match within a Color Threshold (CT), then a Font-Color DV is reported and the top-3 colors (i.e., centroids) from each CH are recorded in the GVT report. Conversely, if the colors do match, then the PID discrepancy identified earlier is due to the font style changing (provided no existing layout DVs), and thus a Font-Style Violation is reported. Finally, to detect incorrect text content, GVT utilizes the textual information, preprocessed to remove whitespace and normalize letter cases, and performs a string comparison. If the strings do not match, then an Incorrect Text Content DV is reported.
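The CH comparison can be sketched as follows. The histogram representation and the similarity normalization here are illustrative; GVT's exact CQ/CH computation may differ.

```python
# Sketch: color-quantization histograms and a normalized similarity used
# to decide Font-Color DVs against the Color Threshold (CT).
import math
from collections import Counter

def color_histogram(pixels):
    """Count occurrences of each RGB value in a flat list of pixels."""
    return Counter(pixels)

def histogram_similarity(hist_a, hist_b):
    """1.0 for identical histograms, approaching 0 as they diverge."""
    colors = set(hist_a) | set(hist_b)
    total_a = sum(hist_a.values()) or 1
    total_b = sum(hist_b.values()) or 1
    dist = math.sqrt(sum(
        (hist_a.get(c, 0) / total_a - hist_b.get(c, 0) / total_b) ** 2
        for c in colors))
    # Maximum distance between two normalized histograms is sqrt(2).
    return 1.0 - dist / math.sqrt(2)
```

A Font-Color DV would then be reported when `histogram_similarity` falls below the CT (e.g., 0.85).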

Detecting Resource Violations. GVT is able to detect the following resource DVs: (i) missing component, (ii) extraneous component, (iii) image color, (iv) incorrect images, and (v) component shape. The detection and distinction between Incorrect Image DVs and Image Color DVs requires an analysis that combines two different computer vision techniques. To perform this analysis, GVT extracts cropped images from the mock-up and implementation screenshots according to the corresponding GCs' respective bounding boxes. The goal of this analysis is to determine when the content of image-based GCs differs, as opposed to only the colors of the GCs differing. To accomplish this, GVT leverages PID applied to extracted GC images converted to a binary color space (B-PID) in order to detect differences in content, and CQ and CH analysis to determine differences in color (Sec. 4.4.3). To perform the B-PID procedure, cropped GC images are converted to a binary color space by extracting pixel intensities, and then applying a binary transformation to the intensity values (i.e., converting the images to intensity-independent black & white). Then PID is run on the color-neutral version of these images. If the images differ by more than an Image Difference Threshold (IDT), then an Incorrect Image DV (which encompasses the Component Shape DV) is reported. If the component passes the binary PID check, then GVT utilizes the same CQ and CH processing technique described above to detect image color DVs. Missing and extraneous components are detected as described earlier.
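The binary-transformation step of B-PID can be sketched as follows. The luma coefficients and the per-pixel diff stand-in for PID are our own assumptions; GVT runs the full PID algorithm on the binarized images.

```python
# Sketch of the binary color-space step in B-PID: pixels are reduced to
# grayscale intensity and thresholded to black & white, so that a
# subsequent diff detects content changes while ignoring color changes.

def to_binary(pixels, threshold=128):
    """Map RGB pixels to 0/1 via their grayscale intensity."""
    binary = []
    for (r, g, b) in pixels:
        # Standard luma approximation for intensity (assumed here).
        intensity = 0.299 * r + 0.587 * g + 0.114 * b
        binary.append(1 if intensity >= threshold else 0)
    return binary

def binary_diff_ratio(pixels_a, pixels_b):
    """Fraction of pixels whose binary values disagree (PID stand-in)."""
    diffs = sum(a != b for a, b in zip(to_binary(pixels_a),
                                       to_binary(pixels_b)))
    return diffs / max(len(pixels_a), 1)
```

Note how a pure hue change (e.g., a red icon rendered blue) yields a zero binary diff, which is exactly why the CQ/CH color check is still needed afterwards.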

Generating Violation Reports. In order to provide developers and designers with effective information regarding the detected DVs, GVT generates an html report that, for each detected violation, contains the following: (i) a natural language description of the design violation(s), (ii) an annotated screenshot of the app implementation, with the affected GUI-component highlighted, (iii) cropped screenshots of the affected GCs from both the design and implementation screenshots, (iv) links to affected lines of application source code, (v) color information extracted from the CH for GCs identified to have color mismatches, and (vi) the difference image generated by PID. The source code links are generated by matching the ids extracted from the uiautomator information back to their declarations in the layout XML files in the source code (e.g., those located in the /res/ directory of an app's source code).
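The id-to-source-line mapping can be sketched as follows; the regex and the single-file scope are illustrative (GVT scans the layout XML files under /res/).

```python
# Sketch: mapping a uiautomator resource-id back to its declaration in a
# layout XML file, for the source-code links in the GVT report.
import re

def find_id_declaration(layout_xml, resource_id):
    """Return the 1-based line of `android:id="@+id/<name>"`, or None."""
    name = resource_id.split("/")[-1]       # strip the "com.app:id/" prefix
    pattern = re.compile(r'android:id="@\+?id/' + re.escape(name) + '"')
    for lineno, line in enumerate(layout_xml.splitlines(), start=1):
        if pattern.search(line):
            return lineno
    return None
```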

GVT Parameters

Using the acceptance tests and feedback from our collaborators at Huawei, we tuned the various thresholds and parameters of the tool for best performance. The PID algorithm settings were tuned for sensitivity to capture subtle visual inconsistencies, which are then later filtered through additional CV techniques: F was set to 45°, L was set to 100 cd/m², and C was set to 1. The GC-Matching Threshold (MT) was set to 1/8th the screen width of a target device; the DT for determining suspicious GCs was set to 20%; the LT was set to 5 pixels (based on designer preference); the CT, which determines the degree to which colors must match for color-based DVs, was set to 85%; and finally, the IDT was set to 20%. GVT allows a user to change these settings if desired. Additionally, users can define areas of dynamic content (e.g., loaded from network activity) that should be ignored by the GVT analysis.
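For reference, the tuned defaults above can be gathered into one configuration; the key names are our own, and only MT derives from the device screen width.

```python
# The tuned GVT defaults described in the text, as a single configuration.
# (Key names are illustrative; values come from the section above.)

def default_gvt_config(screen_width):
    return {
        "pid_field_of_view_deg": 45,   # F
        "pid_luminance_cd_m2": 100,    # L
        "pid_color_sensitivity": 1,    # C
        "mt_px": screen_width / 8,     # GC-Matching Threshold
        "dt": 0.20,                    # Difference Threshold (20%)
        "lt_px": 5,                    # Layout Threshold
        "ct": 0.85,                    # Color Threshold (85%)
        "idt": 0.20,                   # Image Difference Threshold (20%)
    }
```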

Tools used for GVT Implementation

We provide the tools we used in our implementation of the GUI-Collection, Design Violation Detection, and Report Generation Stages of GVT.

Tools used to implement GUI-Collection:

Tools used to implement Design Violation Detection:

Tools used to implement Report Generation:

Accessing GVT & Documentation

We developed GVT in collaboration with Huawei, and thus we must control access to GVT. We are able to share both binaries and source code of the tool with researchers and open source developers. Please click the button below to fill out the request form, and we will share the corresponding materials with you.

GVT Study

Research Questions

RQ1: How well does GVT perform in terms of detecting and classifying design violations?

RQ2: What utility can GVT provide from the viewpoint of Android developers?

RQ3: What is the industrial applicability of GVT in terms of improving the mobile application development workflow?

Study Results

Results for RQ1

Results for RQ2


Results for RQ3