Understanding Complex Systems
There aren’t more than a handful of people who can accurately explain how every component of a modern system of hardware and software works. We all depend on abstraction and subject matter experts to help fill in all the hazy areas in our understanding. The skills I have listed below will help you to know enough about a whole system to properly test and optimize it without knowing every detail.
1. Interpret and draw system diagrams.
2. Understand system environments: shared resources, components, and services, including CPU, memory, storage, network, and soft resources.
3. Understand the differences between production and test environments, including containers, cloud, virtualization, and configuration management.
Learning about distributed systems is a great way to acquire these skills. Using diagrams, you can identify separate resources, their roles, reactions, and interactions.
Designing Effective Tests
Thinking through the design of a test – what information it will gather, what that information will mean, and where it is fallible – is great fun and really interesting. Context is imperative in outlining the goals of testing. This context guides the types of tests to perform as well as the desired outcomes.
4. Identify your goals, requirements, desires, and your stakeholders.
5. Understand how to test concurrency, arrival rates, and scheduling.
6. Understand the roles of scalability, capacity, and reliability as quality attributes and requirements.
7. Understand how to test data and data management.
With these aspects in mind, you can design effective tests whose conclusions are both true and justified.
I always treat load tests as experiments, making the parameters of the load test the conditions of the experiment. In order for the experiment to be a valid predictor of future outcomes, the modeled conditions need to accurately project expected activity for the system’s workload. Describing these conditions requires counts and frequencies, which is more complex than the simple concurrency count most people think of when describing load amounts.
8. Identify transactions and workflows, and calculate workload TPS goals and rates.
9. Calculate think time and pacing.
10. Understand how to analyze log files, run queries, and monitor production.
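As a rough illustration of the workload arithmetic behind these skills, the sketch below turns an hourly workflow target into a transaction rate, a virtual user count, and a pacing interval. The `Workflow` fields, the `plan` function, and all of the numbers are assumptions invented for this example, not values from any real system; the user count follows from Little's Law (users in flight = arrival rate × time per iteration).

```python
import math
from dataclasses import dataclass


@dataclass
class Workflow:
    """One modeled business workflow. All field names are illustrative."""
    name: str
    target_per_hour: float   # completed workflows per hour at peak
    steps: int               # transactions (requests) per workflow
    avg_response_s: float    # assumed average response time per step
    think_time_s: float      # assumed user pause after each step


def plan(workflow: Workflow) -> dict:
    # Workflow arrival rate in iterations per second.
    rate = workflow.target_per_hour / 3600.0
    # Transaction rate the system must sustain for this workflow.
    tps = rate * workflow.steps
    # Time one virtual user needs to complete one iteration.
    iteration_s = workflow.steps * (workflow.avg_response_s + workflow.think_time_s)
    # Little's Law: users in flight = arrival rate * time per iteration.
    # round() guards against floating-point noise pushing ceil() up by one.
    users = math.ceil(round(rate * iteration_s, 9))
    # Pacing: interval between iteration starts for each virtual user.
    pacing_s = users / rate
    return {"tps": tps, "users": users, "pacing_s": pacing_s}


# Hypothetical peak-hour target for a checkout workflow.
checkout = Workflow("checkout", target_per_hour=1800, steps=5,
                    avg_response_s=1.0, think_time_s=7.0)
result = plan(checkout)
print(result)  # {'tps': 2.5, 'users': 20, 'pacing_s': 40.0}
```

Driving the test by pacing rather than by a fixed think time keeps the arrival rate close to the target even when response times drift during the run, which is usually what the workload model is meant to guarantee.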
Constructing these models requires distilling the guesses and projections of stakeholders and subject matter experts, and then supplementing these guesses and projections with data from monitoring systems and logs. Not every activity in the system can or should be part of a load test, but most of the critical and high-volume activities should be.
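One way to ground stakeholder estimates in data is to count what production actually did. The sketch below finds the busiest minute per URL path in a small access log. The log format (ISO timestamp, method, path) and the sample lines are hypothetical; a real log would need its own parsing.

```python
from collections import Counter
from datetime import datetime

# Hypothetical, simplified access log: "<ISO timestamp> <method> <path>".
SAMPLE_LOG = """\
2024-03-01T10:00:05 GET /search
2024-03-01T10:00:20 GET /search
2024-03-01T10:00:41 POST /checkout
2024-03-01T10:00:59 GET /search
2024-03-01T10:01:10 GET /search
2024-03-01T10:01:30 POST /checkout
"""


def peak_requests_per_minute(log_text: str) -> dict:
    """Return the highest one-minute request count seen for each path."""
    per_minute = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue
        stamp, _method, path = line.split()
        # Truncate the timestamp to its minute to form the bucket key.
        minute = datetime.fromisoformat(stamp).replace(second=0, microsecond=0)
        per_minute[(path, minute)] += 1

    peaks = {}
    for (path, _minute), count in per_minute.items():
        peaks[path] = max(peaks.get(path, 0), count)
    return peaks


peaks = peak_requests_per_minute(SAMPLE_LOG)
print(peaks)  # {'/search': 3, '/checkout': 1}
```

Peak counts like these can then be compared with the projected rates; large gaps in either direction are worth raising with the subject matter experts before the model is finalized.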
Automating Scripts and Tests
Some scripting skills are required for building load tests. There are many tools that provide rapid test development environments that can minimize the amount of coding necessary to get going, but it is helpful to know how to optimize source code yourself. However, if you do use load testing tools, be prepared to get creative to come up with workarounds for the inevitable limitations that load testing tools create.
11. Measure parameters and dynamic content.
12. Evaluate transaction measurement and naming conventions.
13. Understand the effect of proper validations.
Understanding these aspects of a system and how they affect processes helps to guide recommendations and further testing parameters. If you don’t have the scripting skills, you can always work with someone who does.
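To make the point about validations concrete, here is a minimal sketch in plain Python with no load testing tool involved. `fake_login` is a made-up stand-in for a real HTTP request, and the transaction name `T01_Login` is just one possible naming convention. It shows how a status-only check can report a failed login as a success, while a content check catches it.

```python
import time
from functools import partial


def fake_login(username):
    """Stand-in for a real HTTP call; returns (status_code, body)."""
    if username == "locked_user":
        # An application-level error delivered with a 200 status.
        return 200, "<html>Error: account locked</html>"
    return 200, f"<html>Welcome, {username}</html>"


def run_transaction(name, action, validate):
    """Time one named transaction and record whether validation passed."""
    start = time.perf_counter()
    response = action()
    elapsed_s = time.perf_counter() - start
    return {"name": name, "elapsed_s": elapsed_s, "passed": validate(response)}


def status_only(response):
    # Weak validation: trusts the status code alone.
    return response[0] == 200


def status_and_content(username, response):
    # Stronger validation: also checks for content only a real success returns.
    status, body = response
    return status == 200 and f"Welcome, {username}" in body


# Parameter data: one value per iteration instead of a hard-coded user.
test_users = ["alice", "bob", "locked_user"]
weak, strict = [], []
for user in test_users:
    action = partial(fake_login, user)
    weak.append(run_transaction("T01_Login", action, status_only))
    strict.append(run_transaction("T01_Login", action,
                                  partial(status_and_content, user)))

print([r["passed"] for r in weak])    # [True, True, True]
print([r["passed"] for r in strict])  # [True, True, False]
```

Without the content check, the response time of the error page would be averaged into the login transaction as though it were a successful login, which skews both the timing and the error rate.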
Interpreting Performance Test Results
Load tests have high stakes. It is critical to understand what a performance test does and does not tell us because these results guide business recommendations and have a deep effect on revenue.
14. Use consistent measurements and metrics.
15. Identify bottlenecks, and where they are occurring.
16. Effectively read results and interpret graphs.
17. Describe the relationship between queues and sub-systems (Little’s Law).
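Little's Law states that the average number of items in a system equals throughput multiplied by the average time each item spends in it. The sketch below uses it as a sanity check on a test report; the throughput, response time, think time, and user count are invented numbers, and the 5% tolerance is an arbitrary assumption.

```python
def in_flight(throughput_per_s: float, avg_time_s: float) -> float:
    """Little's Law: average number in a system = throughput * average time in it."""
    return throughput_per_s * avg_time_s


# Hypothetical figures taken from a load test report.
throughput = 40.0        # completed requests per second
avg_response_s = 0.25    # average time a request spends in the system
think_time_s = 2.25      # average pause between a user's requests
configured_users = 100   # virtual users the tool was told to run

# Requests being served at any moment, on average.
requests_in_system = in_flight(throughput, avg_response_s)

# Applied to the whole loop (request plus think time), the law gives the
# number of users needed to produce the measured throughput.
implied_users = in_flight(throughput, avg_response_s + think_time_s)

# If the implied and configured user counts disagree badly, something in
# the script, the tool, or the measurements deserves a closer look.
consistent = abs(implied_users - configured_users) / configured_users < 0.05

print(requests_in_system, implied_users, consistent)  # 10.0 100.0 True
```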
In all testing, there is the risk of being too invested in confirmation, and under-invested in investigation and exploration. There can be much more to load testing than applying as much load as you think you need, checking that the average response time is acceptable and the error rate is low enough, and noting that the system didn’t crash. That might describe a good portion of the load testing that takes place, but here are just a few more things that could be examined in a load test:
- Did response time degrade? When? By how much, and in what pattern?
- How did resource consumption look against load? Any degradation patterns?
- Can we make any projections about system capacity from what we’ve learned?
- When did errors occur? Were they related to changes in response time or resource consumption?
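The first of these questions can be approached with very little code. The sketch below scans the results of a stepped load test for the first load level at which response time exceeds the lightest-load baseline by a chosen factor. The data points, the `find_degradation_point` helper, and the 1.5× tolerance are all assumptions made for illustration.

```python
def find_degradation_point(steps, tolerance=1.5):
    """Return the first load level whose response time exceeds
    tolerance * baseline, or None if no step does.

    steps: list of (concurrent_users, p95_response_s), sorted by users.
    The baseline is the response time at the lightest load.
    """
    baseline = steps[0][1]
    for users, p95_s in steps:
        if p95_s > baseline * tolerance:
            return users
    return None


# Hypothetical stepped-load results: (users, 95th percentile response time).
results = [(10, 0.20), (20, 0.21), (40, 0.24), (80, 0.45), (160, 1.30)]

knee = find_degradation_point(results)
print(knee)  # 80

# Ratio of each step to the baseline shows the pattern, not just the point.
ratios = [(users, round(p95_s / results[0][1], 2)) for users, p95_s in results]
print(ratios)
```

A threshold like this only flags where to look. Plotting the same series against CPU, memory, or queue depth for each step is what usually explains why the curve bends where it does.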
Who becomes a performance engineer?
Performance engineering attracts people from diverse backgrounds. Some of us were functional testers who were assigned to performance. Others were developers who took an interest in optimization. Some, such as network administrators or database administrators, get interested in performance and optimization and migrate towards it. Fresh engineering graduates are another source of newcomers to the profession. Many highly talented people without a formal education seem to turn up in this specialty as well.
Modern thinking about agile development emphasizes multiple roles over rigid ones. As with other specialties, there is a gradient of degrees of effectiveness, based on experience and skill. It’s possible to dabble, but harder to excel without acquiring all or most of the above skills. Great performance engineers usually flesh out their skillsets with tuning expertise, development skills, and testing theory. As with most professions, there is more than one way to get the job done well.
A number of backgrounds can lead to performance engineering, but the diverse set of skills needed means that the best candidates are technology generalists with a healthy curiosity and a desire to learn.