How to Select a Server Room Monitoring System
Rising cooling requirements are a common problem for organisations as they adopt the latest server technologies, including high power computing. The higher the power draw of a server, the greater the need for air conditioning: virtually all of the electrical power drawn is released as heat, so each kW of IT load adds roughly an equal kW of demand on the cooling system. Edge Computing and micro data centres add another layer of complexity and strengthen the case for a dedicated remote environmental monitoring system.
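As a rough illustration of the relationship between power draw and cooling demand, the sketch below converts an IT load in kW into the heat load the air conditioning must remove. It assumes that essentially all electrical power drawn by IT equipment ends up as heat. The function name, the 10 kW example load and the optional overhead allowance are illustrative assumptions, not figures for any particular site.

```python
from __future__ import annotations

# Minimal sketch: estimate the heat load to be removed for a given IT load.
# Assumption: all electrical power drawn by the IT equipment becomes heat.

BTU_PER_HOUR_PER_KW = 3412.14  # 1 kW is approximately 3,412 BTU/h


def cooling_load(it_load_kw: float, overhead_fraction: float = 0.0) -> tuple[float, float]:
    """Return the heat load as (kW, BTU/h).

    overhead_fraction is a hypothetical allowance for other heat sources
    (lighting, people, UPS losses); 0.0 means IT load only.
    """
    if it_load_kw < 0 or overhead_fraction < 0:
        raise ValueError("loads must be non-negative")
    heat_kw = it_load_kw * (1.0 + overhead_fraction)
    return heat_kw, heat_kw * BTU_PER_HOUR_PER_KW


if __name__ == "__main__":
    kw, btu = cooling_load(10.0)  # example: a single 10 kW rack
    print(f"{kw:.1f} kW of IT load needs about {btu:,.0f} BTU/h of cooling")
```

A real sizing exercise would also account for redundancy and the efficiency of the cooling plant, which this sketch deliberately leaves out.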
Environment Monitoring Benefits
There are several ways to monitor temperature and other environmental factors within a server room or datacentre. Some critical infrastructure systems include temperature sensors that can detect rising temperatures within their cabinets or accessories. UPS systems and their battery sets are a typical example. Temperature probes can also be added to cabinet power distribution units (PDUs). Whilst these provide useful information, they are separate systems and do not in themselves provide a comprehensive overview of the environmental profile of the server room and its server racks.
A dedicated server room environmental monitoring system can provide this and help to:
- Prevent System Downtime: most electronic devices can work at up to 40°C without derating, but at temperatures above 25°C internal cooling fans have to work harder to force air over circuits and heatsinks. Over the long term this reduces overall reliability and can lead to intermittent faults and eventual downtime. Knowing when the temperature is rising within a server room or server rack allows action to be taken, whether that is reconfiguring the layout to improve air flow, replacing hardware or adding further cooling.
- Reduce Operating Costs: cooling has a financial cost. The higher the kW demand on an air conditioning system, the more electricity it uses to provide the required level of cooling. Maintaining a stable, temperature- and humidity-controlled environment helps to stabilise demand on local air conditioning systems and allows them to be optimised for air flow and cooling efficiency. A temperature-controlled environment can also prevent heat damage to local UPS batteries, which need an ambient temperature of 20-25°C. Whilst cooling fans have a high mean time between failures (MTBF) of approximately 70,000 hours, running at high speed increases wear and tear and can lead to failure in the long term.
- Employee Comfort: for health & safety and personnel wellness, it is important that any employee or visitor to the server room facility can work within it comfortably. An ambient temperature of 20-21°C is normally considered acceptable, with general recommendations running from 10 to 28°C.
For more information on suitable server room and datacentre temperatures see: https://tc0909.ashraetcs.org/documents/ASHRAE_TC0909_Power_White_Paper_22_June_2016_REVISED.pdf
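The temperature figures quoted in the benefits above can be turned into simple alert bands. The following is a minimal sketch rather than a vendor implementation: the function names and sample readings are hypothetical, and the 25°C and 40°C limits and the 20-25°C battery range are taken from the text above, not from any device datasheet.

```python
# Minimal sketch: classify temperature readings against the bands in the text.
# Assumptions: thresholds come from the article; sensor names are made up.

FAN_STRESS_C = 25.0             # above this, cooling fans have to work harder
DERATING_C = 40.0               # above this, most electronics need derating
BATTERY_RANGE_C = (20.0, 25.0)  # ambient range quoted for UPS batteries


def classify_equipment_temp(temp_c: float) -> str:
    """Return 'normal', 'warning' or 'critical' for an equipment reading."""
    if temp_c > DERATING_C:
        return "critical"
    if temp_c > FAN_STRESS_C:
        return "warning"
    return "normal"


def battery_temp_ok(temp_c: float) -> bool:
    """Return True if a UPS battery ambient reading is within the quoted range."""
    low, high = BATTERY_RANGE_C
    return low <= temp_c <= high


if __name__ == "__main__":
    readings = {
        "rack-1-top-front": 23.5,
        "rack-1-top-rear": 31.0,
        "rack-2-mid-rear": 41.2,
    }
    for sensor, temp_c in readings.items():
        print(f"{sensor}: {temp_c:.1f} C -> {classify_equipment_temp(temp_c)}")
    print("UPS battery at 27.0 C in range:", battery_temp_ok(27.0))
```

In practice the thresholds should be set from the equipment manufacturers' specifications and the guidance referenced above, and most monitoring devices let them be configured per sensor.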
Server Room and Datacentre Environmental Monitoring Checklist
Whilst temperature is probably the most critical environmental factor to monitor in a server room, datacentre or Edge Computing facility, it is not the only one. A dedicated server room monitoring system can monitor several others using a range of sensors, detectors and converter devices, giving a more complete picture and a safer environment for data processing and storage.
- Temperature: the most critical environmental monitoring factor for many IT environments. The principal reason is that the power drawn by each server places an equal demand on the cooling system. Within server racks, ‘hot-spots’ can develop, which can lead to equipment malfunction and the potential for thermal runaway, posing a fire risk. With the average server rack power demand rising from 5kW to 10-15kW or more, an internal ‘hot-spot’ poses an even greater risk to the overall IT facility. Most facilities therefore adopt a six-point temperature monitoring policy for each server cabinet, i.e. top, middle and bottom of the rack, at both the front and the rear.
- Relative Humidity (RH): humidity is a temperature-related environmental factor. It is a measure of the moisture content of the air, which is important within an IT facility. When air is cooled to its dew point, condensation forms, and this can cause corrosion and potential short-circuits within any sensitive electronics environment. If the air becomes too dry, there is a potential for electrostatic discharge. Most facilities aim for a relative humidity of 45-55% and a stable temperature. Humidity control is also important for those working within the facility, who should rehydrate regularly because the air conditioning removes moisture from the air as part of its cooling operation. Monitoring humidity is also a good check on how efficiently the air conditioning system is working.
- Air Flow: air flow is linked to cooling and therefore to temperature and humidity control. It can be especially important in high density computing server racks or hot/cold aisle containment systems. Plug-in air flow sensors can be connected to an environmental monitoring system to provide an additional ‘proxy’ check on cooling efficiency.
- Flood and Water Leakage: water leakage may only seem an issue for server rooms and datacentres with water-based fire suppression systems, but there are other potential sources of problems. Air conditioning systems can be poorly installed and break down, or a ceiling-mounted AC unit may be installed over server racks in error. Overhead or underfloor condensate pipes can crack or rupture. Whilst not recommended, the building may be remote or located on a flood plain, or the server room may be situated in a basement with a potential for flooding.
- Smoke and Fire Detection: this can be especially important if the server room does not have a dedicated fire suppression system. Smoke is a sign of rising heat and an indication that a cable, battery or electronic device may not yet have reached its full combustion temperature. Sensors that detect smoke, whether in a room or at the server rack level, can provide enough reaction time to prevent a catastrophic fire breaking out within a server rack or elsewhere in the server facility.
- Critical Power and Energy Usage Monitoring: the critical power path within an IT facility is the electricity distribution path from the building incomer to the PDU socket outlets from which IT servers and network peripherals draw their power. An environmental monitoring system should be capable of monitoring both single and three-phase electrical supplies and the systems installed along the critical power path. This includes PDU outlet sockets, uninterruptible power supplies and standby backup generators. Energy efficiency can be an important metric for many datacentres and organisations running server rooms. Specific datacentre metrics such as PUE (power usage effectiveness), which compares the total facility load to the IT load, require accurate energy usage monitoring in kWh. Colocation datacentres also require kWh measurements for client billing.
- Physical Access: there can be several aspects to monitor, including room access and server rack level access. The access control system operating within a building may be capable of monitoring and controlling access to a server room or IT space. Rack level access may require separate management by the dedicated server room environmental monitoring system. Server rack level access may be important within shared server rooms, with security enhanced using motion detection and IP-based CCTV cameras.
- Network Connection: consideration has to be given to how the environmental monitoring system is to be connected to the local network. The most common method is an IP/Ethernet connection, but where this is not possible WiFi may be an option offered by some environment monitors. For remote sites, mobile GSM may be the only way to track and control performance. GSM alerts can also be important for IT managers who are looking for an additional alert outside of an overloaded email inbox.
- Messaging Systems: most environment monitors offer a variety of alert and communication options. These can include email alerts, SMS text messages, phone calls, web interfaces (Cloud portals) and simple network management protocol (SNMP). Whichever method is chosen, it is important to put in place a robust system that meets the needs of the organisation, and to routinely check and test the system to ensure that alerts are received and can be acted upon.
- Environment Monitor Selection: factors to consider here include the number of environmental factors to be monitored and therefore the number of sensors the device must support. An environment monitor may have a built-in temperature sensor or require plug-in sensors and detectors. The device may have a built-in battery pack or need to be powered from a local UPS system so that it can operate whether or not a mains power supply is present. As well as digital inputs (DI) and analog sensor connections, a digital output (DO) may be required to drive an actuator or other device as part of a response to an alarm condition.
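Two of the checklist items above involve a calculation that a monitoring script could perform on sensor data: the dew point implied by a temperature and relative humidity reading, and PUE from kWh totals. The sketch below is illustrative only. It assumes the Magnus approximation for dew point (with one commonly used coefficient set) and the usual definition of PUE as total facility energy divided by IT equipment energy. The function names and sample figures are hypothetical.

```python
import math

# Magnus approximation coefficients (one commonly used set for typical
# room temperatures). This choice is an assumption, not from the article.
MAGNUS_A = 17.62
MAGNUS_B = 243.12  # degrees Celsius


def dew_point_c(temp_c: float, rh_percent: float) -> float:
    """Approximate dew point in degrees Celsius from air temperature and RH."""
    if not 0.0 < rh_percent <= 100.0:
        raise ValueError("relative humidity must be in the range (0, 100]")
    gamma = math.log(rh_percent / 100.0) + (MAGNUS_A * temp_c) / (MAGNUS_B + temp_c)
    return (MAGNUS_B * gamma) / (MAGNUS_A - gamma)


def condensation_margin_c(surface_temp_c: float, air_temp_c: float, rh_percent: float) -> float:
    """Degrees by which a surface sits above the dew point (negative means risk)."""
    return surface_temp_c - dew_point_c(air_temp_c, rh_percent)


def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    if total_facility_kwh < it_equipment_kwh:
        raise ValueError("total facility energy cannot be less than IT energy")
    return total_facility_kwh / it_equipment_kwh


if __name__ == "__main__":
    # Hypothetical readings: 22 C air at 50% RH, a chilled surface at 18 C.
    print(f"Dew point: {dew_point_c(22.0, 50.0):.1f} C")
    print(f"Margin above dew point: {condensation_margin_c(18.0, 22.0, 50.0):.1f} C")
    # Hypothetical meter totals over the same period.
    print(f"PUE: {pue(1500.0, 1000.0):.2f}")
```

With air at 22°C and 50% relative humidity the approximation gives a dew point of about 11°C, so surfaces kept well above that temperature should not attract condensation. Both kWh totals passed to the PUE function must cover the same measurement period.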
Summary
As organisations have increased the complexity of their computing facilities, they have come to rely on their environmental monitoring system to ensure availability and uptime. Additional drivers for its installation include industry and legal compliance as well as the need to meet insurance policy requirements. The benefits remain the same whether the server facility is on-premises, part of a hybrid-cloud facility or remote, as in an Edge Computing site or micro data centre. Monitoring temperature and humidity levels helps to keep servers running, reduce energy costs and provide enough time to react to a potentially catastrophic system problem or failure.