Extracting Maximum Performance From Hardware

How to identify and eliminate software bottlenecks.

The Arm DS-5 Streamline performance analyzer provides system performance metrics, software tracing, and statistical profiling to help engineers get the most performance from hardware and find important bottlenecks in software.

The Raspberry Pi 3 is one of the easiest systems for learning Streamline, and its quad-core Cortex-A53 also makes it a good target for learning Linux development. Many of the Streamline articles on the Arm Community involve complex sequences to flash device bootloaders, patch and cross-compile Linux kernels, pack images with special utilities, root Android devices, disable security settings, and even rebuild the entire Android AOSP from scratch. Sometimes these steps are necessary to get the most from Streamline, but the Raspberry Pi 3, with its focus on education, is a great platform for learning how to set up a target system and use all of the Streamline features. The only real drawback is that the current user space of Raspbian is AArch32 rather than AArch64. Nevertheless, let’s see how to use Streamline with the Pi.

Target Preparation

DS-5 Streamline relies on the Arm gator Linux driver and gator daemon application.

There are three key things to consider for Streamline:

  • Linux kernel configuration
  • Compiling the gator driver
  • Compiling the gator daemon application

The first two are closely coupled since the gator driver needs the running kernel source tree to compile against, and the kernel source is needed to provide source level information about the performance of the running kernel.

The kernel configuration involves a few aspects:

  • Enable profiling features in the kernel configuration
  • Make PMU counters visible via PMU bindings in the device tree
  • Compile the kernel with debug enabled for source code visibility

The gator driver can be statically compiled into the Linux kernel or inserted as a dynamically loaded module. Either one works, but the loadable module is usually preferred. Building the driver into the kernel is preferred only when security features or something else blocks the usage of loadable kernel modules. For best results, the gator daemon must be run as root on the target system.

Streamline can work without the kernel module, and there are various other configurations that use it with reduced functionality. There is even an Android gator .apk which can be installed and used. For more information, refer to Streamline for developers. The aim here is to take no shortcuts and get everything working on the easy-to-use Raspberry Pi 3 board.

There is lots more documentation on target setup for Streamline, but this article is meant to avoid navigating through all the general documentation and focus on using Streamline on the Raspberry Pi 3.

Raspberry Pi 3 configuration
The starting point is the latest version of Raspbian from Raspberry Pi. Follow the instructions to create an SD card using whatever path is easiest. I typically use Linux and dd to copy the .img file to the SD card, but there are many other options for getting the image onto an SD card and lots of good information on how to do it. A 16 GB or larger SD card is required; a smaller card will run out of space when building the kernel.

Boot the system for the first time. Connect the wireless or wired network as needed. The default username is pi and the password is raspberry. I usually change the password to make sure nobody else logs in using the default. The use of sudo is enabled for user pi automatically. I also enable ssh so I can connect remotely. To do this, open Preferences -> Raspberry Pi configuration and click the Interfaces tab.

Once a working system is confirmed the kernel can be downloaded, configured, and compiled. To keep things as easy as possible I recommend compiling the kernel right on the Raspberry Pi 3 itself. It’s a little slower, but easier to do compared to cross-compiling.

The instructions are in the Raspberry Pi documentation. I recommend adding ncurses to the packages to install so make menuconfig can be used.

Install the needed extra tools. I found that git was already installed, but flex and bison are needed to build the Linux perf application.
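The preparation steps can be sketched as the shell functions below. This is a minimal sketch, not the official procedure: the package names, the shallow clone of github.com/raspberrypi/linux, the bcm2709_defconfig target, and the kernel7 image name are assumptions based on the 32-bit Pi 2/Pi 3 build instructions and may differ for other Raspbian releases. The function names are made up for this example.

```shell
#!/bin/sh
# Assumed package set: bc and ncurses for the kernel build and menuconfig,
# flex and bison for building perf later on.
install_build_tools() {
    sudo apt-get install -y git bc libncurses5-dev flex bison
}

# Shallow-clone the Raspberry Pi kernel tree into $1 (default: $HOME/linux),
# create the default Pi 2/Pi 3 32-bit configuration, and open menuconfig.
# The function changes into the source tree so later make commands run there.
fetch_and_configure_kernel() {
    src_dir="${1:-$HOME/linux}"
    git clone --depth=1 https://github.com/raspberrypi/linux "$src_dir" || return 1
    cd "$src_dir" || return 1
    KERNEL=kernel7
    export KERNEL
    make bcm2709_defconfig && make menuconfig
}
```

Nothing runs when the file is sourced. Call install_build_tools first and then fetch_and_configure_kernel from the pi user's shell.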

In make menuconfig, navigate to Kernel hacking -> Compile-time checks and compiler options. The path to this menu is shown along the top of the menuconfig screen. Select “Compile the kernel with debug info” and hit space to enable it. Then select Exit along the bottom three times and answer Yes to save a new kernel .config file.

The kernel compile takes some time, but be patient and it will complete. I didn’t time it, but it could be a couple of hours.
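For reference, the compile and install step might look like the sketch below. It assumes the 32-bit Pi 2/Pi 3 flow (zImage, kernel7.img, device tree blobs copied to /boot) and must be run from the top of the kernel source tree. The function name is hypothetical, and the copy commands overwrite the stock kernel image, so keep a backup of /boot if that matters.

```shell
#!/bin/sh
# Build the kernel image, modules, and device tree blobs, then install them.
# KERNEL defaults to kernel7, the assumed image name for the Pi 2/Pi 3.
build_and_install_kernel() {
    KERNEL="${KERNEL:-kernel7}"
    make -j4 zImage modules dtbs || return 1
    sudo make modules_install || return 1
    sudo cp arch/arm/boot/dts/*.dtb /boot/ || return 1
    sudo cp arch/arm/boot/dts/overlays/*.dtb* /boot/overlays/ || return 1
    sudo cp arch/arm/boot/zImage "/boot/$KERNEL.img"
}
```

The -j4 option matches the four cores of the Pi 3. Lower it if the board runs short of memory during the build.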

The good news is all of the configuration needed for Streamline is already enabled so there is no need to make changes to the kernel configuration beyond enabling debug.

Restart the system with the new kernel and use dmesg to confirm the new kernel is running.

Look at the Linux version line in the dmesg output and make sure the build time it reports matches the time the kernel was compiled.
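A small helper can pull the build stamp out of that line so it is easy to compare with the time of the build. This is a sketch that assumes the usual layout of the line, where the build stamp follows a "#N" build counter. The function name is made up for this example.

```shell
#!/bin/sh
# Read kernel log text on stdin and print whatever follows the "#N" build
# counter on the first "Linux version" line, e.g. "SMP Mon Jan 1 ...".
kernel_build_stamp() {
    sed -n '/Linux version/{s/.*#[0-9][0-9]* //p;q;}'
}
```

On the Pi, run it as dmesg | kernel_build_stamp and compare the printed date with the time the compile finished.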

Linux perf
Linux perf can be used to make sure all the performance and profiling features are enabled. It is part of the kernel source tree, but it is not compiled automatically. Build the perf executable from the tools/perf directory under linux/ and list the available events.

The output of perf list should contain a number of events as well as information about the PMU counters. Look for events marked [Kernel PMU event] to confirm the CPU counters will be available to Streamline.
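The perf check can be sketched as two small functions. The default source location ($HOME/linux) and the function names are assumptions for this example. The counting helper reads event listing text on stdin, so it can be tried on saved output as well.

```shell
#!/bin/sh
# Build perf inside the kernel tree given as $1 (default: $HOME/linux).
# The function leaves the shell in tools/perf so ./perf can be run next.
build_perf() {
    cd "${1:-$HOME/linux}/tools/perf" && make
}

# Count the lines tagged "[Kernel PMU event]" in event listing text on stdin.
count_pmu_events() {
    grep -c 'Kernel PMU event'
}
```

After build_perf, running ./perf list | count_pmu_events should print a number greater than zero if the PMU counters are visible.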

Next, set up the gator driver and daemon to enable the connection to Streamline.

Gator driver and daemon
There are a couple of places to get the gator software. The instructions below use GitHub.

The code can also be obtained from a DS-5 installation, in the $DS5_HOME/sw/streamline/gator directory. I found that the pre-compiled gatord at $DS5_HOME/sw/streamline/bin/arm/gatord works perfectly well on the Raspberry Pi 3, but it’s good to learn how to download and compile it anyway.

The README file for gator has plenty of information about the kernel configuration. Feel free to study it and confirm the Raspberry Pi 3 kernel config has these options enabled. Unfortunately, the kernel changes rapidly and the option names are not the same in every kernel version, so the details in the README may not match exactly.

Download the software from the ARM-software/gator repository on GitHub.

Compile the kernel module. The -C option given to make should be the path to the Linux source tree already used to build the kernel. This guarantees the module is compiled against the kernel that is actually running.

This will create gator.ko to be inserted into the kernel. Next compile the gator daemon.
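The download and the two builds can be sketched as one function. It assumes a gator checkout that still contains the driver/ directory, a kernel tree in $HOME/linux, and a native build on the Pi. The function name is hypothetical.

```shell
#!/bin/sh
# Clone gator into ./gator, build gator.ko against the kernel tree in $1
# (default: $HOME/linux), then build the gatord daemon.
build_gator() {
    kernel_src="${1:-$HOME/linux}"
    git clone https://github.com/ARM-software/gator.git || return 1
    # Kernel module: -C points at the source tree of the running kernel,
    # M points at the external module directory to build.
    make -C "$kernel_src" M="$PWD/gator/driver" modules || return 1
    # Daemon: a native build on the Pi needs no cross-compiler settings.
    make -C gator/daemon
}
```

If the build succeeds, the module is expected at gator/driver/gator.ko and the daemon at gator/daemon/gatord.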

Running gator is as simple as inserting the kernel module into the running kernel and starting the gator daemon.

Use the lsmod command to show the list of modules and confirm gator is now listed.
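A minimal sketch of starting gator and checking the module list follows. The paths assume the layout of a gator checkout built in the current directory, and both function names are made up for this example.

```shell
#!/bin/sh
# Insert the kernel module and start the daemon as root in the background.
start_gator() {
    sudo insmod gator/driver/gator.ko || return 1
    sudo gator/daemon/gatord &
}

# Succeed only if module list text on stdin has a module named exactly
# "gator" in its first column.
gator_listed() {
    awk '$1 == "gator" { found = 1 } END { exit !found }'
}
```

On the Pi, lsmod | gator_listed && echo "gator is loaded" gives a quick yes or no answer after start_gator.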

Now it’s time to connect Streamline. From a Windows or Linux machine, start the Streamline GUI using the menu or by running streamline from the command line.

Click the eye icon, Browse for a target. Sometimes the Raspberry Pi may show up automatically. If not, just click Setup Target. Either way, enter the IP address of the Raspberry Pi 3, pi as the username, and the password.

If you enter the IP address of the Pi but Streamline still cannot find it, ensure that Streamline and the Raspberry Pi 3 are on the same network. Only the last number of the IP addresses of the host and target machine should be different.
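The "only the last number differs" rule can be checked with a one-line helper. This sketch assumes a 255.255.255.0 netmask, which is what the rule implies, and the function name is hypothetical.

```shell
#!/bin/sh
# Succeed if two dotted IPv4 addresses share their first three numbers,
# i.e. they are in the same /24 network.
same_slash24() {
    [ "${1%.*}" = "${2%.*}" ]
}
```

For example, same_slash24 192.168.1.10 192.168.1.42 succeeds, while same_slash24 192.168.1.10 192.168.2.10 fails. The addresses are illustrative only.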

The pop-up displayed after the Install button is pressed is a little deceptive as it means to install the gatord on the target system. We don’t need this as it’s already running so you can click No to avoid this.

Now select the target.

Start a capture session, run a program, and end the capture session. This will confirm data collection is working. The red circle starts a capture session and the red Stop icon at the top left of the timeline ends it.

Without any source code or software images, the Call Paths, Functions, and Code tabs don’t provide much information, just process names and process ID values with a lot of blank screens and “unknown code” messages. To improve this, Streamline needs the software images and source code.

Kernel Source Information
The easiest way to provide the kernel software image and source code to Streamline is to copy the /home/pi/linux directory from the Raspberry Pi 3 to the host machine running Streamline. The details depend on whether the host machine runs Linux or Windows. The easiest way to copy from the Raspberry Pi 3 is scp, either the scp command on Linux or an scp client on Windows such as PuTTY (pscp.exe) or WinSCP.

For a Linux host, first tar up the source on the Raspberry Pi.

On the Linux host machine, use the IP address of the Raspberry Pi 3 to scp. Replace the IP with the correct address of the Raspberry Pi.
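The two halves of the copy can be sketched as follows. The archive name, the pi username, and the function names are assumptions for this example. The first function runs on the Pi and the second on the Linux host.

```shell
#!/bin/sh
# On the Pi: pack directory $1 into $1.tar.gz next to it, storing paths
# relative to the parent directory (e.g. linux/vmlinux).
pack_tree() {
    dir="${1%/}"
    tar -czf "$dir.tar.gz" -C "$(dirname "$dir")" "$(basename "$dir")"
}

# On the host: copy archive $2 from the Pi at address $1 into the current
# directory and unpack it there.
fetch_tree() {
    scp "pi@$1:$2" . && tar -xzf "$(basename "$2")"
}
```

For example, run pack_tree /home/pi/linux on the Pi, then fetch_tree 192.168.1.10 /home/pi/linux.tar.gz on the host, with the address replaced by that of your Pi.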

Once the source code and compiled objects are on the host machine, they can be added by clicking the gear icon, Capture & analysis options.

Enter the vmlinux from the top of the Linux source tree just copied to the host machine by clicking Add ELF image… in the bottom section of the dialog.

To enable source code profiling of other applications, compile them on the Raspberry Pi 3, copy the compiled source tree from the Pi to the host machine, and add the executables to the list of images in this same area.

An example application
To do a test with Streamline, the LMbench applications can be used. They are easy to download and build with -g so Streamline will be able to map the source code of the applications.

The build generates executables in the bin/armv7l-linux-gnu directory.

Go to the top directory and run a test.

Now enable the Streamline capture, run the test, and stop it to see the results. Make sure to copy the lmbench-3.0-a9 directory from the Raspberry Pi 3 to the host machine to enable source level profiling of the application. This can be done using scp in the same way the Linux kernel source was copied.

The Code tab in Streamline will not be able to find the source code because the path on the target is different from the path on the host. Use the “Click here to locate source” option and navigate to the file being referenced. If the window doesn’t show the message about missing source code, try clicking a process at the bottom of the Timeline tab for which the source code is available.

Inserting markers
Markers can be inserted in the source code of any application to make it easier to track progress on the Streamline timeline. Everything needed is in the annotate/ directory of the gator software. Inserting a marker requires a few steps:

  • Include streamline_annotate.h from the gator/annotate directory in the application source
  • Include streamline_annotate.c in the compilation of the application
  • Call the annotate setup function to initialize the interface
  • Call one of the annotate marker functions to create a marker

To add a marker to the LMbench source file bw_mem.c, add the include file at the top of the source file:

#include "streamline_annotate.h"

During the setup, somewhere in the main() function before the test is run, add:

ANNOTATE_SETUP;

Finally, put the markers where needed to track the test on the Streamline timeline, using one of the annotate marker functions from streamline_annotate.h.

Make sure to add the include path for the compiler to find streamline_annotate.h and add streamline_annotate.c to the list of source files to compile.

Marker entries show up in the Log tab, and the markers themselves appear on the timeline.

Markers can also be inserted in Java applications and in the Linux kernel. Refer to the gator/annotate/readme.txt for more information.

Conclusion
This covers all of the steps to set up and run Streamline with kernel and application tracing and full system profiling on an Arm target system. The Raspberry Pi 3 is one of the best systems to learn on since it’s very easy to gain root access, change configuration, and build software from scratch. There are also numerous resources and documentation for learning about the Raspberry Pi 3 and the Raspbian operating system. Learning this flow on the Raspberry Pi 3 will make it easier when it’s time to apply Streamline to an Android mobile device or other target system.

 


