FAQs - Data Centre Maintenance Services
Data centre maintenance services to keep critical infrastructure systems in server rooms and data halls available.
Critical infrastructure systems require regular inspection and preventative maintenance to ensure their resilience and availability. The main infrastructure systems in a server room or data centre include power (UPS and generators), cooling (air conditioners, CRACs, CRAHs, chillers and liquid cooling), and fire suppression systems. These systems have consumables that require inspection and replacement on a regular basis. Their inspection, maintenance and service should be covered by an annual maintenance contract, which should also include 24/7 support and emergency call-out.
A single point of failure (SPoF) is a part of a system or product that will lead to downtime if it fails. Within a server room or data centre environment, critical infrastructure systems are designed and installed to remove single points of failure, using N+X configurations that provide redundancy in the system design. An N+1 configuration removes a single point of failure by providing one more unit than the N needed to support the load, so that the spare can take over if any one unit fails. Typical arrangements include modular UPS and cooling systems. However, if one of the ‘+X’ elements goes off-line, so that the system is reliant on its N units alone, the system once again has a single point of failure. Regular data centre inspection and system maintenance can identify and help to prevent single points of failure.
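As a rough illustration of the N+X idea, the short Python sketch below checks whether a group of units (for example UPS modules or cooling units) still has a spare. The function names and the module counts are hypothetical and chosen only for this example; they do not come from any monitoring product.

```python
def redundancy_margin(units_online: int, units_required: int) -> int:
    """Return X in N+X: the number of spare units beyond the N needed for the load."""
    return units_online - units_required


def has_single_point_of_failure(units_online: int, units_required: int) -> bool:
    """True when there is no spare, so the loss of any one unit leaves fewer than N."""
    return redundancy_margin(units_online, units_required) < 1


# Hypothetical modular UPS: 3 modules are needed for the load (N=3), 4 are installed (N+1).
print(has_single_point_of_failure(units_online=4, units_required=3))  # False

# One module goes off-line: the system is now running at N with no spare.
print(has_single_point_of_failure(units_online=3, units_required=3))  # True
```

The same check applies to any redundant group: once the margin drops to zero, every remaining unit is a single point of failure until the off-line unit is repaired or replaced.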
The primary source of data centre downtime is a failure within the critical power supply. Critical power is the pathway that connects the building incomer to the IT load (servers, storage, and networking devices) and consists of LV switchboards, sub-distribution switchgear, standby power generators, uninterruptible power supplies and power distribution units. Failure can occur within any of these systems if they are not regularly inspected and maintained as part of a data centre maintenance contract. The most common cause of UPS failure is batteries that cannot hold sufficient charge to support the load for long enough for the generator to start. The second most common cause of downtime in a data centre is failure of the generator to start (open circuit-breakers, a failed starter battery or fuel contamination). Regular testing, inspection (including thermal imaging), and maintenance will help to prevent failures and ensure system resilience.