Going beyond Power Usage Effectiveness (PUE) for datacenter efficiency
Today, many organizations are looking for ways to do more with less, whether by reducing IT budgets or by curtailing the incidental costs of datacenter expansion. In a rapidly changing market, datacenter managers need to create efficient operating environments that extend the life of existing datacenters. Efficiency can be improved in many ways, including increasing compute density, building cold aisle containment systems and making more effective use of outside air. Over time, however, the key requirement is an easily understood metric that shows how efficient the datacenter is and how much its efficiency has improved on an ongoing basis.
Power Usage Effectiveness or simply “PUE” is one of the basic and most effective metrics for measuring datacenter energy efficiency. It is calculated by taking into account the total power consumed by a datacenter facility and dividing it by the power consumed by the IT equipment. The resulting ratio provides the effective power overhead for a unit of IT load. For example, a PUE value of 2.0 means that for every watt used to power IT equipment, an additional watt is required to deliver the power and keep the equipment cool. There is increasing pressure being exerted on datacenter managers to take measures to reduce the PUE.
The Power Usage Effectiveness (PUE) metric was introduced by the Green Grid, an association of IT professionals focused on increasing the energy efficiency of datacenters. To manage and monitor energy efficiency in the datacenter effectively, it is essential to have metrics that measure the impact of changes. The Green Grid introduced two primary metrics, PUE and DCE (Data Center Efficiency). The latter was later renamed DCiE (Data Center Infrastructure Efficiency). Both metrics are based on the same two parameters: the total power to the datacenter and the IT equipment power.
A PUE value of 1 represents the optimal level of datacenter efficiency. In practical terms, a PUE value of 1 means that all power going into the datacenter is being used to power IT equipment. Anything above 1 means that overhead is required to support the IT load. Data Center Infrastructure Efficiency (DCiE) is the reciprocal of PUE, expressed as a percentage: the IT equipment power divided by the total power into the datacenter, multiplied by 100. A PUE value of 3.0 equates to a DCiE value of about 33%, meaning that the IT equipment consumes about 33% of the facility's power.
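The two definitions above can be written as a pair of small functions. This is only a minimal sketch: the function names and the sample readings are illustrative and are not taken from any Green Grid tooling.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power divided by IT power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def dcie_percent(total_facility_kw: float, it_equipment_kw: float) -> float:
    """DCiE as a percentage: IT power divided by total facility power, times 100."""
    if total_facility_kw <= 0:
        raise ValueError("total facility power must be positive")
    return it_equipment_kw / total_facility_kw * 100.0


if __name__ == "__main__":
    # Made-up readings chosen to give a PUE of 3.0.
    total_kw, it_kw = 300.0, 100.0
    print(f"PUE  = {pue(total_kw, it_kw):.2f}")
    print(f"DCiE = {dcie_percent(total_kw, it_kw):.1f}%")
```

Because DCiE is simply the reciprocal of PUE expressed as a percentage, either metric can be derived from the same two meter readings.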
Let us take a look at the way power is consumed in a datacenter.
In an ideal scenario, all the power entering the datacenter would be used to operate the IT load (servers, storage and network). If all the power entering the datacenter were consumed by the IT load, the resulting PUE would be 1. Realistically, however, some of this power is diverted to cooling, lighting and other support infrastructure. Some of the remaining power is lost in the power system, and only the rest goes to service the IT load.
Let us take an example to see how PUE is calculated. Suppose the power entering the datacenter (measured at the utility meter) is 100 kW and the power consumed by the IT load (measured at the output of the UPS) is 50 kW. PUE is then calculated as follows:
PUE = Total facility power / IT equipment power = 100 kW / 50 kW = 2.0
A PUE value of 2.0 is quite usual for a datacenter. What does this mean? It means that for every watt required to power a server, 2 watts are drawn from the utility. Since we pay for every watt of power entering the datacenter, every watt of overhead represents an additional cost. Reducing this overhead will reduce the overall operating costs of the datacenter.
The two ways in which we can bring about a change and improve datacenter energy efficiency include:
- Reducing the power going to the support infrastructure
- Reducing losses in the power system.
This way, more of the power entering the datacenter makes it to the IT load, which improves datacenter energy efficiency and reduces the PUE.
Are there drawbacks to using PUE as a measurement of datacenter efficiency?
Datacenter managers are under immense pressure to reduce costs and match the reported PUE with that of other companies. In other words, they are being asked to significantly reduce their PUE value. Unfortunately, this is not always the right approach and can have a negative impact. If datacenter managers focus only on reducing PUE, they may inadvertently use more energy and increase datacenter costs.
Let us explain this with the help of an example. Suppose we have a datacenter with an input power of 100 kW, 50 kW of which is being used to power IT equipment. As previously illustrated, this gives us an initial PUE value of 2.0.
Suppose the organization now decides to virtualize some servers. In fact, it is so successful with virtualization that it is able to reduce the power to IT equipment by 25 kW and the overall power to the datacenter by the same amount. What happens to the PUE in such a case?
PUE = (100 kW - 25 kW) / (50 kW - 25 kW) = 75 kW / 25 kW = 3.0
But isn’t a higher PUE exactly what we want to avoid? Well, not necessarily. Let us understand the reason behind the increase or decrease in the PUE value. It may seem counterintuitive, but with the overhead largely unchanged, any reduction in IT power usage will actually result in a higher PUE.
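Let us explain this with another formula for PUE:
PUE = 1 + (Power consumed by support infrastructure and power system losses / IT equipment power)
When the IT load is reduced and the overhead stays roughly the same, the ratio of overhead power to IT equipment power will always increase, thereby resulting in an increase in the PUE. Conversely, increasing the IT load will always decrease the PUE. So, if the PUE has gone up, does this mean the datacenter is now less energy efficient? On the contrary, the datacenter is now more energy efficient. We are able to do more with less, i.e., the same work with less energy at a lower cost.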
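The effect can be checked numerically with the figures from the virtualization example. The sketch below writes PUE as one plus the ratio of overhead power to IT power, which is algebraically the same as total power divided by IT power. It assumes the 50 kW of overhead stays fixed while the IT load drops from 50 kW to 25 kW; the function name is illustrative.

```python
def pue_from_overhead(it_kw: float, overhead_kw: float) -> float:
    """PUE written as 1 + overhead / IT, equivalent to (IT + overhead) / IT."""
    if it_kw <= 0:
        raise ValueError("IT power must be positive")
    return 1.0 + overhead_kw / it_kw


# Assumption: overhead (cooling, lighting, losses) stays at 50 kW.
OVERHEAD_KW = 50.0

pue_before = pue_from_overhead(it_kw=50.0, overhead_kw=OVERHEAD_KW)
pue_after = pue_from_overhead(it_kw=25.0, overhead_kw=OVERHEAD_KW)

total_before_kw = 50.0 + OVERHEAD_KW
total_after_kw = 25.0 + OVERHEAD_KW

print(f"Before: total {total_before_kw:.0f} kW, PUE {pue_before:.1f}")
print(f"After:  total {total_after_kw:.0f} kW, PUE {pue_after:.1f}")
```

The total draw falls from 100 kW to 75 kW even though the PUE rises from 2.0 to 3.0, which is why PUE on its own can be misleading.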
Still not convinced? Let us work through the numbers.
Before virtualization (IT load of 50 kW, PUE of 2.0):
Annual energy utilization = 100 kW x 8760 hrs/yr = 876000 kWh
Annual electricity cost = 876000 kWh x Rs. 3.10/kWh* = Rs. 27,15,600
After virtualization (IT load of 25 kW), assuming that the PUE goes up to 2.1 due to reduced capacity utilization:
Annual energy utilization = (25 + 25 x 1.1 = 52.5) kW x 8760 hrs/yr = 459900 kWh
Annual electricity cost = 459900 kWh x Rs. 3.10/kWh* = Rs. 14,25,690
*Base Tariff for HT I – Industries – Mahadiscom. Prices at the commercial consumer level, for example, are more than 2x these levels.
The above example shows that there are large savings in spite of the increase in PUE, which demonstrates that IT load management can deliver better results than PUE optimization alone.
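The same comparison can be reproduced in a few lines of code. This sketch uses only the figures quoted in the example (the Rs. 3.10/kWh tariff, 8760 hours per year, a 50 kW IT load at PUE 2.0 and a 25 kW IT load at PUE 2.1). The function and constant names are illustrative.

```python
HOURS_PER_YEAR = 8760
TARIFF_RS_PER_KWH = 3.10  # base tariff quoted in the example


def annual_cost_rs(it_load_kw: float, pue: float,
                   tariff: float = TARIFF_RS_PER_KWH) -> float:
    """Annual electricity cost for a given IT load and PUE."""
    total_kw = it_load_kw * pue              # power at the utility meter
    annual_kwh = total_kw * HOURS_PER_YEAR   # energy over a full year
    return annual_kwh * tariff


cost_before = annual_cost_rs(it_load_kw=50.0, pue=2.0)
cost_after = annual_cost_rs(it_load_kw=25.0, pue=2.1)
savings = cost_before - cost_after

# Note: Python groups digits in thousands, not in the Indian lakh style.
print(f"Before: Rs. {cost_before:,.0f}")
print(f"After:  Rs. {cost_after:,.0f}")
print(f"Saved:  Rs. {savings:,.0f} ({savings / cost_before:.0%})")
```

Under these figures the annual bill falls by a little under half, even though the PUE is worse after the change.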
PUE becomes a meaningless number if we do not know how to use it to measure the outcome of changes in the datacenter. Knowing that virtualization will eventually increase the PUE of our datacenter, should we avoid it? No. In fact, when we examine the PUE of our datacenter over a period of time, we should also take into account when the virtualization actually took place. In addition to tracking PUE, we must track any changes in the IT infrastructure or IT load, so that we can correlate those changes with the PUE value. There are many other factors that may impact PUE. Redundancy, for example, will increase PUE, so there will always be tradeoffs between availability and energy efficiency. Datacenter equipment, from cooling equipment to UPSs to server power supplies, runs more efficiently when it is heavily loaded.
The bottom line is that PUE, while an important piece of the energy efficiency puzzle, is just that – one piece of the energy efficiency puzzle. PUE constitutes only one component of a comprehensive energy management program that must consider both sides of the coin – the IT and the facility.
What else needs to be measured along with PUE?
PUE is best used for tracking the impact of changes made to the datacenter infrastructure. Let us revisit Exhibit 1.
Whilst it is important for an organization to reduce losses in the power system and the power used for the support infrastructure, it is also apparent that the bulk of the power consumption in the datacenter goes to the IT load itself. If the organization can reduce the IT load, it will reduce the overall power required for the datacenter.
As a matter of fact, reducing the IT load will have a compounding effect, as it will also reduce the losses in the power system and the power required for the support infrastructure. This can be termed as a cascading effect. Let us see how this works.
For example, if one watt of power can be saved at the IT load, it will reduce losses in the server power supply (AC to DC conversion), reduce losses in the power distribution (PDU transformers and the wiring itself), reduce power losses in the UPS, reduce the amount of cooling required and, finally, reduce power losses in the building transformer and switchgear. The end result of the cascade effect is that saving one watt at the IT load may actually result in two or more watts of overall energy savings.
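A rough way to see how the cascade adds up is to chain the stages together. The sketch below is a simplified model, and every efficiency and cooling figure in it is a hypothetical placeholder rather than a measured value. Real numbers vary widely between facilities and with load.

```python
def utility_watts_per_it_watt(
    psu_efficiency: float = 0.80,            # hypothetical server power supply
    distribution_efficiency: float = 0.97,   # hypothetical PDU and wiring
    ups_efficiency: float = 0.90,            # hypothetical UPS
    cooling_watts_per_watt: float = 0.50,    # hypothetical cooling overhead
    transformer_efficiency: float = 0.98,    # hypothetical transformer/switchgear
) -> float:
    """Watts drawn at the utility for one watt used by server components."""
    # Power entering the UPS to deliver one watt past the PSU.
    electrical_chain = 1.0 / (
        psu_efficiency * distribution_efficiency * ups_efficiency
    )
    # Simplification: all of that power ends up as heat to be removed.
    cooling = electrical_chain * cooling_watts_per_watt
    # Both the electrical chain and the cooling plant sit behind the
    # building transformer and switchgear.
    return (electrical_chain + cooling) / transformer_efficiency


if __name__ == "__main__":
    saved = utility_watts_per_it_watt()
    print(f"One watt saved at the IT load saves about {saved:.2f} W overall")
```

With these placeholder values the result comes out above two watts, in line with the cascade effect described above. Plugging in measured efficiencies for a specific facility would give a more meaningful figure.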
Powering the IT load forms a major chunk of the overall electricity cost in a datacenter, hence, for any energy efficiency initiative to be successful an organization should first look at reduction of the IT load.
Reducing the IT Load
There are a number of ways to reduce the IT load in the datacenter. These include:
- Decommission or repurpose servers which are no longer in use
- Power down servers when not in use
- Enable power management
- Replace inefficient servers
- Virtualize or consolidate servers
Decommission or repurpose servers
Datacenter managers always struggle with how to identify unused or lightly used or ‘ghost’ servers. One way of identifying a ‘ghost’ server is to use CPU utilization as a measure of whether or not a server is being actively used. However, this may not hold true every time. A server may appear to be busy when it is actually only performing secondary or tertiary processing not related directly to the primary services of the server.
For example, the primary service of an e-mail server is to provide e-mail. This server may also provide monitoring, backup and antivirus services, but those are secondary or tertiary services. If the e-mail server stops being accessed for e-mail, the monitoring, backup and antivirus services may no longer be necessary, yet the server may continue to run them. From a CPU-utilization standpoint, the unused server may therefore appear to be busy when it is only performing secondary or tertiary processing. In such cases, CPU utilization is an ineffective measure.
Another way of determining whether a server is actively being used or not is Server Compute Efficiency (ScE). The ScE metric measures CPU usage, disk and network I/O, incoming session-based connection requests and interactive logins to determine if the server is providing primary services. The ScE metric can provide datacenter managers with the ability to determine which servers are being actively used for primary services vis-à-vis ‘ghost’ servers which may be good candidates for virtualization or consolidation.
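To make the idea concrete, the sketch below scores a server by the share of monitoring samples in which it appears to be providing primary services. It is only an illustration: the thresholds, the field names and the rule for combining the signals are assumptions made here, not the Green Grid's published definition of ScE.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Sample:
    """One monitoring sample for a server (hypothetical field names)."""
    cpu_percent: float
    io_kbps: float            # disk plus network I/O
    incoming_sessions: int    # session-based connection requests
    interactive_logins: int


def is_primary_service_sample(sample: Sample,
                              min_cpu_percent: float = 5.0,
                              min_io_kbps: float = 10.0) -> bool:
    """Assumed rule: someone is using the server AND it is doing work."""
    has_users = sample.incoming_sessions > 0 or sample.interactive_logins > 0
    has_work = (sample.cpu_percent >= min_cpu_percent
                or sample.io_kbps >= min_io_kbps)
    return has_users and has_work


def sce_percent(samples: Iterable[Sample]) -> float:
    """Share of samples that look like primary-service activity."""
    samples = list(samples)
    if not samples:
        raise ValueError("at least one sample is required")
    active = sum(1 for s in samples if is_primary_service_sample(s))
    return 100.0 * active / len(samples)


# A 'ghost' server: busy CPU and I/O (backup, antivirus) but no users.
ghost = [Sample(40.0, 500.0, 0, 0) for _ in range(4)]
# An actively used server.
active_server = [Sample(30.0, 200.0, 12, 1) for _ in range(4)]

print(f"Ghost server score:  {sce_percent(ghost):.0f}%")
print(f"Active server score: {sce_percent(active_server):.0f}%")
```

The ghost server scores zero despite its high CPU utilization, which is the case that CPU utilization alone fails to catch.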
Power down servers when not in use
While the majority of servers in a datacenter may be utilized around the clock, some servers are used only during certain parts of the day or week. These servers should be turned off when not in use to save the power they would otherwise draw.
Enable power management
To reduce usage of power in servers, datacenter managers should employ Demand-Based Switching (DBS) to attain significant savings in the data center.
Replace inefficient servers
Once a server is purchased, it is considered a ‘sunk cost’. What is often not taken into account are the ongoing operational costs of running the server, which include power, cooling, software licensing and so on. A new multi-core server may replace as many as 15 single-core servers, saving as much as 93% of the power usage. In addition to the power savings, software licensing and other maintenance costs can also be considerably reduced. Additional savings include a reduction in datacenter cooling costs and the potential to reclaim valuable rack space.
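The 93% figure follows from simple arithmetic if the new server draws about as much power as one of the old ones. That equal-draw assumption is made here for illustration and is not stated in the text. A minimal sketch, with a placeholder wattage:

```python
def consolidation_saving_percent(old_count: int, new_count: int,
                                 old_watts_each: float,
                                 new_watts_each: float) -> float:
    """Percentage of server power saved by replacing old servers with new ones."""
    old_total = old_count * old_watts_each
    new_total = new_count * new_watts_each
    if old_total <= 0:
        raise ValueError("old servers must draw some power")
    return (old_total - new_total) / old_total * 100.0


# Assumption: every server, old or new, draws a placeholder 300 W.
saving = consolidation_saving_percent(
    old_count=15, new_count=1, old_watts_each=300.0, new_watts_each=300.0
)
print(f"Power saved by 15-to-1 consolidation: {saving:.1f}%")
```

If the new server draws more power than each server it replaces, the saving shrinks accordingly, so the wattage inputs are worth measuring rather than assuming.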
Virtualize or consolidate servers
There are various compelling reasons for virtualizing servers. From a business continuity viewpoint, virtual machines can be isolated from the physical system in the event of failures, which improves system availability. In addition, parallel virtual environments allow for an easier transition to a backup facility. From an energy efficiency viewpoint, virtualization provides numerous opportunities for energy savings.
Virtual machines provide fine-grained control over workloads and can be moved to additional active servers as demand increases. Overall, virtualization can increase server CPU usage by 40-60%. As CPU usage increases, the energy efficiency of the server power supply also increases.
The Power Usage Effectiveness (PUE) metric provides valuable information for measuring datacenter energy efficiency. But it represents only one component in a comprehensive energy management program. While datacenter managers are under tremendous pressure to reduce the PUE, doing so without a full understanding of power usage in the datacenter might actually be detrimental.
To achieve sustained reductions in energy usage, datacenter managers must consider other metrics in addition to PUE, such as energy usage at the IT device level and server compute efficiency.