Cheat Sheet – patterns & practices Performance Engineering

J.D. Meier, Alex Homer, David Hill, Jason Taylor, Prashant Bansode, Lonnie Wall, Rob Boucher, Akshay Bogawat


Objectives

  • Understand the concepts of performance engineering.
  • Learn the key activities and patterns related to performance engineering.
  • Understand the common performance architecture and design issues.
  • Learn the key process and design principles for performance engineering.


Overview

This cheat sheet summarizes the patterns & practices approach to Performance Engineering, with an emphasis on architecture and design. To design, build, and deploy better-performing applications, you must integrate performance engineering into your application development lifecycle and include specific performance-related activities in your current software engineering processes. Performance engineering should be a feature of your development process, not an afterthought. The key design-focused performance engineering activities include identifying performance objectives, applying performance design guidelines, conducting performance architecture and design reviews, and performing performance testing and tuning. Each activity improves the performance of your application; for best results you should implement them all, but you can incrementally adopt any of these activities as you see fit.

Performance Overlay

Figure 1 shows how performance engineering topics fit with the core activities of application design.

Figure 1 - Performance engineering topics as part of the core architectural design activities.

Key Activities in the Life Cycle

The performance engineering approach extends the proven core activities to define performance-specific activities. These activities ensure that all aspects of performance are tested, and that sufficient considerations are made for deployment and future capacity requirements. The core activities you should consider performing include the following:
  • Performance Objectives. Setting objectives helps you to scope and prioritize your work by setting boundaries and constraints. Setting performance objectives helps you to identify where to start, how to proceed, and when your application meets your performance goals.
  • Budgeting. Budgeting represents your constraints and enables you to specify how much you can spend (in terms of acquiring resources) and how you plan to spend it. It indicates the maximum cost that a particular feature or unit in your project can afford to pay against each of your key performance objectives.
  • Performance Modeling. Performance modeling is an engineering technique that provides a structured and repeatable approach to meeting your performance objectives. By building and analyzing models, you can evaluate tradeoffs before you actually build the solution.
  • Performance Design Guidelines. Applying design guidelines, patterns, and principles enables you to engineer for performance from an early stage.
  • Performance Design Inspections. Performance design inspections are an effective way to identify problems in your application design. By using pattern-based categories and a question-driven approach, you simplify the task of evaluating your design against root-cause performance issues.
  • Performance Code Inspections. Many performance defects are found during code reviews. Analyzing code for performance defects includes knowing what to look for and how to look for it. Performance code inspections identify inefficient coding practices that could lead to performance bottlenecks.
  • Performance Testing. Load and stress testing are used to generate metrics, and to verify application behavior and performance under normal and peak load conditions.
  • Performance Tuning. Performance tuning is an iterative process that you use to identify and eliminate bottlenecks until your application meets its performance objectives. You start by establishing a baseline. Then you collect data, analyze the results, and make configuration changes based on the analysis. After each set of changes, you re-test and re-measure to verify that your application has moved closer to its performance objectives.
  • Performance Health Metrics. These are the metrics (such as measures, values, and criteria) obtained by running performance tests that are relevant to your performance objectives, and help you to identify bottlenecks. The metrics allow you to evaluate the health of your application from a performance perspective, in relation to performance objectives such as throughput, response time, and resource utilization.
  • Performance Deployment Inspections. During the deployment phase, you validate your model by using production metrics. You can validate workload estimates, resource utilization levels, response time, and throughput.
  • Capacity Planning. You should continue to measure and monitor your application after it is deployed in the production environment. Changes that may affect system performance include increased user loads, deployment of new applications on shared infrastructure, system software revisions, and updates to your application to provide enhanced or new functionality. Use your performance metrics to guide your capacity and scaling plans.
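The tuning cycle described above (establish a baseline, collect data, make changes, re-measure against the objective) can be sketched as a small measurement harness. This is a minimal illustration, not a prescribed tool; the measured operation and the 1 ms objective are hypothetical.

```python
import time

def measure(operation, iterations=1000):
    """Run an operation repeatedly and return its average latency in seconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        operation()
    return (time.perf_counter() - start) / iterations

def meets_objective(avg_latency, objective_seconds):
    """Compare a measured metric against a performance objective."""
    return avg_latency <= objective_seconds

# Establish a baseline for a (hypothetical) operation.
baseline = measure(lambda: sum(range(100)))

# After each set of tuning changes, re-measure and compare against the
# baseline and the objective (here, an assumed 1 ms per call).
tuned = measure(lambda: sum(range(100)))
print(meets_objective(tuned, 0.001))
```

In a real project the measured operation would be a request or transaction, and the objective would come from your documented performance objectives rather than a constant in code.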

Performance Frame

The following performance frame defines a set of patterns-based categories organized around repeatable problems and solutions. You can use these categories to divide your application architecture for further analysis, and to help identify application performance issues. The categories within the frame represent the critical areas where mistakes are most often made.

Caching – What and where to cache? Caching refers to how your application stores frequently used data at a location close to the point of use to reduce the number of round trips. The main points to consider are per-user caching, application-wide caching, and data volatility.
Communication – How to communicate between layers? Communication refers to choices for transport mechanism, boundaries, remote interface design, round trips, serialization, and bandwidth.
Concurrency – How to handle concurrent user interactions? Concurrency refers to choices for transactions, locks, threading, and queuing.
Coupling / Cohesion – How to structure the application? Coupling refers to the relationship between components or sub-systems. Tight coupling leads to an architecture where changes ripple through components, making the code hard to understand and modify. Cohesion refers to the way components or classes are composed. If a component or class has a well-defined role within the entire system, it is said to be highly cohesive.
Data Access – How to access data? Data access refers to choices and approaches for schema design, paging, hierarchies, indexes, volume of data, and round trips.
Data Structures – How to handle data? Data structures and algorithms refer to the choice of code algorithms and of application entities (such as arrays or collections).
Exception Management – How to handle exceptions? Exception management refers to choices and approaches for catching, managing, and throwing exceptions.
Resource Management – How to manage resources? Resource management refers to approaches for allocating, creating, destroying, and pooling application resources.
State Management – What and where to maintain state? State management refers to how your application maintains state. The main points to consider are per-user state, application-wide state, state persistence, and state store location.
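To illustrate the Caching category, here is a minimal sketch of an application-wide cache with a time-based expiration policy. The `ExpiringCache` class and its TTL policy are assumptions for illustration, not a prescribed design; a production cache would also need a scavenging mechanism and a decision about per-user versus application-wide scope.

```python
import time

class ExpiringCache:
    """Application-wide cache with a simple time-based expiration policy."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}              # key -> (value, expiry time)

    def get(self, key, loader):
        """Return a cached value, reloading it when the entry has expired."""
        entry = self._store.get(key)
        now = time.monotonic()
        if entry is not None and entry[1] > now:
            return entry[0]           # cache hit: no round trip needed
        value = loader()              # cache miss: fetch from the point of origin
        self._store[key] = (value, now + self.ttl)
        return value
```

Usage would look like `cache.get("exchange_rates", load_rates)`, where `load_rates` is the expensive call the cache exists to avoid; volatile data argues for a short TTL, stable reference data for a long one.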

Architecture and Design Issues

To apply the performance frame to your application, it is useful to think about each category as it applies to your application scenarios and its specific deployment. For example, Figure 2 shows how you might analyze the architecture and performance design issues for a typical Web application.

Figure 2 - Typical Web application performance design issues.

Separate your performance concerns by application tier to get a clearer view of performance issues, potential bottlenecks, and mitigations. For example, the key areas of concern for each application tier in the diagram above are:
  • Browser. Large volumes of viewstate data, rendering the entire page output in one go, and unnecessary use of the HTTPS protocol for pages that do not require securing.
  • Web Server. Not caching reference data, poor resource management, blocking calls to services on the application tier, wrong data types, and requiring state affinity.
  • Application Server. Not pooling database connections, incorrect data structure choices, and chatty communication with the database server.
  • Database Server. Contention, isolation level, locking, and deadlocks.
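The "chatty communication with the database server" issue noted above can be made concrete by contrasting per-item calls with a single batched call. The fetch functions below are hypothetical stand-ins for remote operations, with a counter in place of real network round trips.

```python
# Hypothetical stand-ins for remote calls; each call to the "server"
# counts as one network round trip.
round_trips = 0

def fetch_one(item_id):
    global round_trips
    round_trips += 1
    return {"id": item_id}

def fetch_many(item_ids):
    global round_trips
    round_trips += 1              # one round trip regardless of batch size
    return [{"id": i} for i in item_ids]

# Chatty: N round trips for N items.
chatty = [fetch_one(i) for i in range(10)]

# Batched: the same logical operation in a single round trip.
batched = fetch_many(range(10))
```

With real network latency of even a few milliseconds per round trip, the chatty version pays that cost ten times for the same result.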


Principles

When carrying out performance engineering activities, you should follow well-defined principles. These principles fall into two categories. Design process principles help you to define the process that you will follow when designing your performance engineering approach. Design principles help to ensure that you consider all of the relevant areas where performance engineering can maximize application performance. The following sections describe these two sets of principles.

Design Process Principles

Design process principles help you to define the process that you will follow when designing your performance engineering approach. Consider the following principles to ensure that you design an appropriate performance engineering process:
  • Set objective goals. Avoid ambiguous or incomplete goals that cannot be measured, such as "the application must run fast" or "the application must load quickly". You must identify the performance and scalability goals of your application so that you can both design to meet them, and plan your tests around them. Make sure that your goals are measurable and verifiable. Requirements to consider for your performance objectives include response times, throughput, resource utilization, and workload. For example, how long should a particular request take? How many users must your application support? What is the peak load that the application must handle? How many transactions per second must it support? You must also consider resource utilization thresholds. How much CPU, memory, network I/O, and disk I/O is it acceptable for your application to consume?
  • Validate your architecture and design early. Identify, prototype, and validate your key design choices at the start of the process. Your goal is to evaluate whether your application architecture can support your performance goals. Some of the important decisions you must validate include deployment topology, load balancing, network bandwidth, authentication and authorization strategies, exception management, instrumentation, database design, data access strategies, state management, and caching. Be prepared to remove features and functionality, or rework areas that do not meet your performance goals. Know the cost of specific design choices and features.
  • Cut the deadwood. Often the greatest gains come from finding whole sections of work that can be removed because they are unnecessary. This often occurs when (well-tuned) functions are composed to perform some greater operation. It is common for interim results from the first function to go unused by the second and subsequent functions. Eliminating these "waste" paths can provide noticeable end-to-end performance improvements.
  • Tune end-to-end performance. Optimizing a single feature may take resources away from another feature and reduce overall performance. In the same way, a single bottleneck in a subsystem within your application can affect overall application performance regardless of how well the other subsystems are tuned. You obtain the most benefit from performance testing when you tune end-to-end, rather than spending considerable time and money on tuning one particular subsystem. Identify bottlenecks, and only then tune these specific parts of your application. Often performance engineering involves moving from one bottleneck to the next one.
  • Measure throughout the life cycle. You must identify whether your application's performance is moving towards or away from your performance objectives. Performance tuning is an iterative process of continuous improvement with hopefully steady gains, punctuated by unplanned losses, until you meet your objectives. Measure your application's performance against your performance objectives throughout the development lifecycle and make sure that performance is a core component of that lifecycle. Unit test the performance of specific pieces of code and verify that it meets the defined performance objectives before moving on to integrated performance testing. When your application is deployed in a production environment, continue to measure its performance. Factors such as the number of users, usage patterns, and data volumes change over time. Newly installed applications may also compete for shared resources.

Design Principles

Design principles help to ensure that you consider all of the relevant areas where performance engineering can maximize application performance. The following design principles are abstracted from architectures that have scaled and performed well over time:
  • Consider designing coarse-grained services. Coarse-grained services minimize the number of client-service interactions, and help you design cohesive units of work. Coarse-grained services also help to abstract service internals from the client and provide a looser coupling between the client and service. Loose coupling improves your ability to encapsulate change. If you already have fine-grained services, consider wrapping them with a facade layer to help achieve the benefits of a coarse-grained service.
  • Consider designing fine-grained services. If your services communicate with components on the same physical tier, and you do not use message-based communication, you can obtain a performance increase by using fine-grained methods where this will reduce the processing load and the volume of data transferred. However, this provides an advantage only when the caller can make one or a few calls to the fine-grained methods instead of calling a more coarse-grained method.
  • Minimize round trips by batching work. Minimize round trips to reduce call latency. For example, batch calls together and design coarse-grained services that allow you to perform a single logical operation by using a single round trip. Apply this principle to reduce communication across boundaries such as threads, processes, processors, or servers. This is particularly important when making remote server calls across a network. For more information about using batching when accessing data, see "Data Access Guidelines".
  • Acquire late and release early. Minimize the time that you hold shared and limited resources such as network and database connections. Releasing and re-acquiring such resources from the operating system can be expensive, so consider a recycling plan to support "acquire late and release early". This enables you to optimize the use of shared resources across requests.
  • Evaluate affinity with processing resources. When certain resources are only available from certain servers or processors, there is an affinity between the resource and the server or processor. While affinity can improve performance, it can also impact scalability. Carefully evaluate your scalability needs. Will you need to add more processors or servers? If application requests are bound by affinity to a particular processor or server, you may inhibit your application's ability to scale. As the load increases, the ability to distribute processing across processors or servers influences the potential capacity of your application.
  • Locate processing closer to the resources it needs. If your processing involves a lot of client-service interaction, you may need to move the processing code closer to the client. If the process interacts intensively with the data store, you may want to move the processing code closer to the data.
  • Pool shared resources. Pool shared resources that are scarce or expensive to create, such as database or network connections. Use pooling to help eliminate the performance overhead associated with establishing access to resources; and to improve scalability by sharing a limited number of resources amongst a much larger number of clients.
  • Avoid unnecessary work. Use techniques such as caching, avoiding round trips, and validating input early to reduce unnecessary processing. For more information, see "Cut the Deadwood", above.
  • Reduce contention. Blocking and hotspots are common sources of contention. Blocking is caused by long-running tasks, such as expensive I/O operations. Hotspots result from concentrated access to certain data that many other processes need to access. Avoid blocking while accessing resources, because resource contention leads to requests being queued. Contention can be subtle. Consider a database scenario: on one hand, large tables must be indexed very carefully to avoid blocking due to intensive I/O. However, many clients will be able to access different parts of the table with no difficulty. On the other hand, small tables are unlikely to have I/O problems but might be used so frequently by so many clients that they are heavily contended. Techniques for reducing contention include the efficient use of shared threads and minimizing the amount of time your code retains locks.
  • Use progressive processing. Use efficient practices for handling data changes. For example, perform incremental updates: when a portion of data changes, process the changed portion and not all of the data. In addition, consider rendering output progressively: do not block while retrieving the entire result set if you can send the user an initial subset of rows quickly and then follow up with further subsets of rows when required.
  • Process independent tasks concurrently. When you must process multiple independent tasks, you can asynchronously execute these tasks to perform them concurrently. Asynchronous processing provides the most benefit for I/O-bound tasks, and has limited benefit when the tasks are CPU-bound and restricted to a single processor. If you plan to deploy on single-CPU servers, using additional threads will simply cause increased context switching because there is no real multithreading capability, and so gains in performance are likely to be very limited. On a single processor, CPU-bound multithreaded tasks perform relatively slowly due to the overhead of thread switching.
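Two of the principles above, pool shared resources and acquire late/release early, can be sketched together. The `ConnectionPool` class is a simplified assumption for illustration; a real pool would also handle creation on demand, resource validation, and acquisition timeouts.

```python
import queue

class ConnectionPool:
    """Minimal pool for resources that are expensive to create, such as
    database or network connections."""

    def __init__(self, create, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(create())   # pay the creation cost once, up front

    def acquire(self):
        return self._idle.get()        # blocks until a pooled resource is free

    def release(self, resource):
        self._idle.put(resource)       # return it for the next client

# "Acquire late, release early": hold the resource only around the work
# that needs it, then hand it straight back to the pool.
pool = ConnectionPool(create=lambda: object(), size=2)
conn = pool.acquire()
try:
    pass   # do the (hypothetical) work that needs the connection
finally:
    pool.release(conn)
```

Because the pool is a thread-safe queue, a small fixed number of resources can be shared amongst a much larger number of concurrent clients, which is the scalability benefit pooling is meant to provide.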


Design Guidelines

The following guidelines, organized by performance frame category, are a set of performance design guidelines for application architects. Use them as a starting point for your design, and as a guide during performance design inspections.

Caching
  • Decide where to cache data.
  • Decide what data to cache.
  • Decide the expiration policy and scavenging mechanism.
  • Decide how to load the cache data.
  • Avoid distributed coherent caches.
Communication
  • Choose the appropriate remote communication mechanism.
  • Design chunky interfaces.
  • Consider how to pass data between layers.
  • Minimize the amount of data sent across the wire.
  • Batch work to reduce calls over the network.
  • Reduce transitions across boundaries.
  • Consider asynchronous communication.
  • Consider message queuing.
  • Consider a "fire and forget" invocation model.
Concurrency
  • Reduce contention by minimizing lock times.
  • Strike a balance between coarse-grained and fine-grained locks.
  • Choose an appropriate transaction isolation level.
  • Avoid long-running atomic transactions.
Coupling / Cohesion
  • Design for loose coupling.
  • Design for high cohesion.
  • Partition application functionality into logical layers.
  • Use early binding where possible.
  • Evaluate resource affinity.
Data Access
  • Open connections as late as possible and release them as early as possible.
  • Separate read-only and transactional requests.
  • Avoid returning unnecessary data.
  • Cache data to avoid unnecessary work.
Data Structures
  • Choose an appropriate data structure.
  • Pre-assign the size for large data types that grow dynamically.
  • Use value and reference types appropriately.
Exception Management
  • Do not use exceptions to control application flow.
  • Use validation code to avoid unnecessary exceptions.
  • Do not catch exceptions that you cannot handle.
  • Use a finally block to ensure resources are released.
Resource Management
  • Treat threads as a shared resource.
  • Pool shared or scarce resources.
  • Acquire late, release early.
  • Consider efficient object creation and destruction.
  • Consider resource throttling.
State Management
  • Evaluate a stateful design against a stateless design.
  • Consider your state store options.
  • Minimize session data.
  • Free session resources as soon as possible.
  • Avoid accessing session variables from business logic.
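Two of the Exception Management guidelines can be illustrated in a short sketch: validation code avoids using exceptions for control flow on the common bad-input path, and a finally block guarantees that a resource is released. The function names are hypothetical.

```python
import re

def parse_quantity(text):
    """Validate input instead of using exceptions for control flow."""
    # Checking the format up front avoids raising (and catching) an
    # exception on the common bad-input path.
    if re.fullmatch(r"-?\d+", text.strip()):
        return int(text)
    return None

def read_first_line(open_file):
    """Use finally to ensure the resource is released."""
    try:
        return open_file.readline()
    finally:
        open_file.close()   # released even if readline raises
```

Raising and handling an exception is far more expensive than a simple conditional check, so reserving exceptions for genuinely exceptional conditions keeps the common path fast.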


Pattern Solutions

Design patterns in this context refer to generic solutions that address commonly occurring application design problems. Some of the patterns identified below are well-known design patterns. The primary purpose of some of these patterns does not relate specifically to performance; however, in certain scenarios their use enables better performance as a secondary goal.

Following are the patterns that are most useful:
  • Data Transfer Object - Create an object that carries all the state it requires, combining multiple remote calls for state into a single call. This pattern is closely associated with the Remote Façade pattern: the Remote Façade aggregates numerous fine-grained calls into a single coarse-grained interface, and the Data Transfer Object is the element containing the data that is passed over the network. For more information, see Data Transfer Object in Enterprise Solution Patterns.
  • Remote Façade - Reduce the overhead for calls by wrapping fine-grained calls with more coarse-grained calls.
  • Fast Lane Reader - For read-only data that does not change often, avoid transactional overhead.
  • Flyweight - Reuse objects instead of creating new ones.
  • Message Façade - Create a façade to allow asynchronous execution when the client does not require a response before it can continue its processing.
  • Service Locator - Reduce expensive lookups by caching the connection details and locations of services.
  • Singleton - Limit the number of objects created for a given type, usually to a single instance. Use this pattern with caution because, while it reduces overhead, it can create contention. For more information, see Singleton in Enterprise Solution Patterns.

In addition to these well-known patterns, look for patterns that accomplish the following: reducing resource contention, reducing object creation and destruction overhead, distributing load, queuing work, batching work, improving cohesion, reducing chatty communication, and improving resource sharing.
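As one concrete illustration, the Flyweight pattern (reuse objects instead of creating new ones) can be sketched in a few lines. The `Glyph` class and its intrinsic state are hypothetical; the essential point is that requests for the same intrinsic state return the same shared instance rather than allocating a new object.

```python
class Glyph:
    """Flyweight: an immutable object shared across many contexts."""

    _instances = {}                      # intrinsic state -> shared instance

    def __new__(cls, char):
        # Reuse an existing object instead of creating a new one.
        if char not in cls._instances:
            instance = super().__new__(cls)
            instance.char = char
            cls._instances[char] = instance
        return cls._instances[char]

# Two requests for the same intrinsic state yield the same object,
# avoiding object creation and destruction overhead.
a = Glyph("x")
b = Glyph("x")
```

This trades a dictionary lookup for an allocation, which pays off when many logical occurrences share a small set of distinct states; like Singleton, the shared instances must be effectively immutable to avoid introducing contention.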

Additional Information

Last edited Nov 20, 2008 at 5:37 PM by prashantbansode, version 2

