Performance testing on the web

This article was updated on Nov 25, 2020 to use the web_benchmarks package.

Overview

During development, we often want to test an app’s performance in the browser. Performance testing is useful because it reveals potential bugs that make an app slower. This article describes a way to test an app’s performance in Chrome. The method is similar to how we test the performance of the new Flutter Gallery.

Example app

We use a simple app that contains an app bar, a floating action button, and an infinite list of items. The list also shows the number of times the button has been pushed. The app has a second page containing some information.

You can clone the app here: https://github.com/material-components/material-components-flutter-experimental/tree/develop/web_benchmarks_example

What to test?

We want to test the app’s performance in Chrome under the following usage scenarios:

- The user scrolls through the infinite list.
- The user switches between the two pages.
- The user taps the floating action button.

Setting up the framework

Add the following to pubspec.yaml:

https://medium.com/media/ef0ca17199a2807e917e92cb92ffc891/href

This dependency pulls in web_benchmarks, a minimal package that implements performance testing in Chrome. The package is adapted from macrobenchmarks and devicelab, two packages used by Flutter for web performance testing on the Flutter Gallery. At the moment, those two packages are specialized for web performance testing within flutter/flutter, so it is easier to import the more general package, web_benchmarks.

Run flutter pub get to pull in this package.

Writing the first test

Add a benchmarks directory under lib, and add a new Dart file to it called runner.dart. The contents of the file are as follows:

https://medium.com/media/3a002aacd849a5e328e379e37450537c/href

What is this test doing?

When this app runs, a ScrollRecorder object is created, which drives the app by automatically making gestures.
In this case, shortly after the app starts, the recorder begins scrolling down the infinite list. The ScrollRecorder class extends the AppRecorder class, which in turn extends the WidgetRecorder class, so the recorder also records performance data as it drives the app.

runBenchmarks is a function defined in package:web_benchmarks/client.dart. It lets the user select which benchmark to run and displays the results in the browser.

The automate method uses the flutter_test package, which provides methods to make gestures and to find particular widgets in an app.

Running the first test

In the root directory of the project, run:

    flutter run -d chrome -t lib/benchmarks/runner.dart

This tells Flutter to use runner.dart as the entry point instead of main.dart.

We only have one benchmark so far, so click “scroll” to start it. The test begins, and the list automatically scrolls down. The test ends after a few seconds and shows a chart of the results.

The chart shows the time it took for the app to draw each recorded frame. The horizontal axis represents the flow of time; the vertical axis, the duration each frame took.

The first 2/3 of the chart has a gray background. These frames are considered “warm-up frames” and are omitted from the statistics. Warm-up frames typically give the JIT compiler time to compile the code and populate various caches, so that the measured frames produce numbers that reflect the “eventual” performance of the app rather than its first few seconds. The warm-up phase should not always be ignored, though: it can provide valuable information about your app’s performance during the first few seconds, which can still influence the perception of the app’s quality.

Red frames are “outliers”: frames that take significantly longer to draw than other frames. Some outliers can be nearly unnoticeable. For example, jank at the beginning or the end of an animation will, up to a certain point, not be visible.
However, a janky frame in the middle of an animation will be very noticeable. Outliers are a good indicator of the jankiness of the app. As you improve your app, the outliers should become smaller or fewer, which shows that the app has become smoother.

Collecting data from Chrome’s DevTools

So far, the benchmark has run entirely inside Chrome. To also collect data from Chrome’s DevTools, add the following file as test/run_benchmarks.dart:

https://medium.com/media/5ce2ae71e67c98c50eef78779e9f1d35/href

Then, run dart test/run_benchmarks.dart. After about one minute, you should see the following results:

https://medium.com/media/2e95cfd7eb847b13e4d86219ca683123/href

The exact benchmark values may vary depending on the machine.

What is this test doing?

Running test/run_benchmarks.dart builds the app for the web, starts a Chrome instance, and runs the app in it. The script then connects to Chrome’s DevTools port and collects the relevant performance data from it.

What do the results mean?

When rendering a frame, the layer tree is walked twice.

- “Preroll” is the first walk. It does not render anything, but it computes values that are later used for rendering, such as transform matrices, the inverses of transforms, and clips.
- “Apply frame” is the second walk, where the UI is actually rendered.
- “Draw frame” is the total time that the framework takes to render a frame. It includes “Preroll” and “Apply frame”, as well as the time spent building and laying out the widgets.
- “Total UI frame” includes everything in “Draw frame”, plus some hidden work that the browser performs, such as layer tree updates, style recalculations, and browser-side layout (not to be confused with Flutter’s own layout).

When a dataset (a list of durations) is collected, the algorithm removes outliers. First, the mean and standard deviation of the data are computed, and any data point that is higher than (mean + 1 standard deviation) is considered an outlier.
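
The outlier rule just described can be sketched in plain Dart. This is only an illustration of the rule as stated here, not the web_benchmarks implementation: the names OutlierSplit, mean, standardDeviation, and splitOutliers are hypothetical, the sample durations are made up, and using the population standard deviation (dividing by n) is an assumption.

```dart
import 'dart:math' as math;

/// Hypothetical container for the two halves of a dataset.
class OutlierSplit {
  OutlierSplit(this.clean, this.outliers, this.cutoff);

  final List<double> clean; // Durations at or below the cutoff.
  final List<double> outliers; // Durations above the cutoff.
  final double cutoff; // mean + 1 standard deviation.
}

/// Arithmetic mean. Throws on an empty list, because reduce needs an element.
double mean(List<double> values) =>
    values.reduce((a, b) => a + b) / values.length;

/// Population standard deviation (divides by n). This choice is an
/// assumption of the sketch; the article does not say which variant is used.
double standardDeviation(List<double> values) {
  final double m = mean(values);
  final double sumOfSquares =
      values.map((v) => (v - m) * (v - m)).reduce((a, b) => a + b);
  return math.sqrt(sumOfSquares / values.length);
}

/// Marks every duration higher than (mean + 1 standard deviation) as an outlier.
OutlierSplit splitOutliers(List<double> durations) {
  final double cutoff = mean(durations) + standardDeviation(durations);
  return OutlierSplit(
    durations.where((d) => d <= cutoff).toList(),
    durations.where((d) => d > cutoff).toList(),
    cutoff,
  );
}

void main() {
  // Made-up frame durations in milliseconds; one frame is clearly slower.
  final frames = <double>[10, 10, 10, 10, 50];
  final split = splitOutliers(frames);
  print('cutoff: ${split.cutoff}');
  print('clean: ${split.clean}');
  print('outliers: ${split.outliers}');
}
```

With these sample values the mean is 18 and the standard deviation is 16, so the cutoff is 34 and only the 50 ms frame is classified as an outlier.
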
The mean and standard deviation of the non-outliers (the clean data) are used to compute the average and noise of the dataset, which are then reported. The mean of all outliers, as well as the ratio of the “outlier mean” to the “non-outlier mean”, is also reported.

For each dataset, “outlierRatio” and “noise” are both good indicators of how much noise there is in the performance of the app. If the results are too noisy, it might indicate inconsistencies in performance (such as janky frames or GC pauses). By aiming to lower the noise, you can make your app perform more smoothly.

Add more tests

Edit lib/benchmarks/runner.dart to add two more tests. First, modify the main function:

https://medium.com/media/86e9a82223d28c5bb1beaf236dfb297a/href

Then, add two more classes that extend AppRecorder:

https://medium.com/media/4b6ad9219c3e3c13aba5cd2796cd6052/href

What are these tests doing?

We have added the two remaining benchmark tests: one for switching between pages, and one for tapping the floating action button.

- animationStops repeatedly checks whether an animation is happening, and finishes once all animation has stopped. This ensures, for example, a successful transition to the “about” page.
- In the “page” and “tap” benchmarks, the _completed boolean tracks whether the automated gestures have finished.
- In the “page” and “tap” benchmarks, overriding the shouldContinue method causes the AppRecorder to stop recording frames after all gestures have finished.

How to run these tests?

To run these tests (and see the animations) in Chrome, run:

    flutter run -d chrome -t lib/benchmarks/runner.dart --profile

To run these tests and collect DevTools data, run:

    dart test/run_benchmarks.dart

What next?

Once you have a way to collect performance data, you can use it however you want.

You can set up a job in CI that runs these benchmark tests whenever someone submits a PR, to avoid introducing performance-heavy changes.
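
One way such a CI job could decide whether to fail is to compare the reported averages against a stored baseline. The sketch below shows only that comparison step. The function findRegressions, the map-of-averages data shape, the 10% tolerance, and all the numbers are hypothetical and are not part of web_benchmarks; reading the real results from test/run_benchmarks.dart is left out.

```dart
/// Returns the names of benchmarks whose current average is more than
/// [tolerance] (a fraction, such as 0.10 for 10%) slower than the baseline.
/// Both maps are hypothetical: benchmark name -> average frame time in ms.
List<String> findRegressions(
  Map<String, double> baseline,
  Map<String, double> current, {
  double tolerance = 0.10,
}) {
  final regressions = <String>[];
  for (final entry in current.entries) {
    final base = baseline[entry.key];
    if (base == null) continue; // New benchmark: nothing to compare against.
    if (entry.value > base * (1 + tolerance)) {
      regressions.add(entry.key);
    }
  }
  return regressions;
}

void main() {
  // Made-up averages for the three benchmarks defined in this article.
  final baseline = <String, double>{'scroll': 4.0, 'page': 6.0, 'tap': 3.0};
  final current = <String, double>{'scroll': 4.1, 'page': 7.5, 'tap': 2.9};

  final regressions = findRegressions(baseline, current);
  if (regressions.isNotEmpty) {
    // A real CI script would also exit with a non-zero status here.
    print('Performance regressions: $regressions');
  } else {
    print('No regressions.');
  }
}
```

With these numbers only “page” exceeds its baseline by more than 10% (7.5 ms against a limit of 6.6 ms), so it is the only benchmark reported. A tolerance is used rather than a strict comparison because, as noted earlier, benchmark values vary from run to run and from machine to machine.
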
You can also set up a dashboard that keeps track of the trend of performance benchmarks. This is what we are doing for the Flutter Gallery (see Flutter Dashboard).

Performance testing on the web was originally published in Flutter on Medium.