Annual Server Room Audits and Inspections
Server rooms should be audited and inspected regularly to ensure that critical infrastructure systems are fully operational and ready to provide protection, monitoring and control of this most sensitive of environments. Typical systems include power, cooling, fire suppression and monitoring. In this article we cover some of the key questions we ask during an audit and inspection.
UPS System Checks
When was the last time you ran your UPS on battery power? Most uninterruptible power supplies have automatic battery test routines that run daily and alarm if they sense a loss of battery capacity. This testing is predictive only. The only way to fully evaluate whether the battery can deliver its full autonomy is to conduct a simulated mains supply failure and record how long the battery runs for.
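The result of a simulated mains failure can be compared against the rated autonomy with a few lines of Python. This is a minimal sketch only: the function names, the example timestamps and the 0.8 pass mark are illustrative assumptions, not figures from a UPS manufacturer.

```python
from datetime import datetime


def measured_runtime_minutes(mains_off: datetime, battery_end: datetime) -> float:
    """Minutes the UPS supported the load between mains removal and end of discharge."""
    if battery_end < mains_off:
        raise ValueError("battery_end must not be earlier than mains_off")
    return (battery_end - mains_off).total_seconds() / 60.0


def autonomy_ratio(measured_minutes: float, rated_minutes: float) -> float:
    """Measured runtime as a fraction of the rated (design) autonomy."""
    if rated_minutes <= 0:
        raise ValueError("rated_minutes must be positive")
    return measured_minutes / rated_minutes


def needs_investigation(ratio: float, min_ratio: float = 0.8) -> bool:
    """True when the measured runtime falls below the chosen pass mark.

    min_ratio is an assumed site policy value, not a manufacturer figure.
    """
    return ratio < min_ratio


if __name__ == "__main__":
    # Hypothetical timestamps recorded during a test
    start = datetime(2024, 5, 1, 10, 0, 0)
    end = datetime(2024, 5, 1, 10, 12, 0)
    runtime = measured_runtime_minutes(start, end)
    ratio = autonomy_ratio(runtime, rated_minutes=15)
    print(f"Runtime {runtime:.1f} min, {ratio:.0%} of rated autonomy")
    print("Investigate battery" if needs_investigation(ratio) else "Within pass mark")
```

Recording the ratio at each annual test gives a simple trend line, which may be more useful than any single reading.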
Valve-regulated lead-acid (VRLA) batteries have either a 5-year or a 10-year design life. This means that they will typically require replacement during years 3-4 or 7-8 respectively. Replacement UPS battery kits are available from Server Room Environments for users to replace their own battery cartridges, or we can provide a complete onsite UPS replacement service. All batteries returned to us are disposed of via our eco-friendly and managed waste streams.
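The replacement windows above can be turned into calendar dates for a maintenance register. The sketch below assumes the window opens on the third (or seventh) anniversary of installation and closes on the fourth (or eighth). That reading of "years 3-4" and "years 7-8", and the function names, are assumptions made for illustration.

```python
from datetime import date

# Design life in years -> (window opens, window closes), in years after installation
REPLACEMENT_YEARS = {5: (3, 4), 10: (7, 8)}


def add_years(d: date, years: int) -> date:
    """Add whole years to a date, mapping 29 February to 28 February when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def replacement_window(installed: date, design_life_years: int) -> tuple[date, date]:
    """Return the (earliest, latest) typical replacement dates for a VRLA battery set."""
    if design_life_years not in REPLACEMENT_YEARS:
        raise ValueError("design life must be 5 or 10 years")
    opens, closes = REPLACEMENT_YEARS[design_life_years]
    return add_years(installed, opens), add_years(installed, closes)


if __name__ == "__main__":
    # Hypothetical installation date
    earliest, latest = replacement_window(date(2021, 6, 15), 5)
    print(f"Plan replacement between {earliest} and {latest}")
```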
Simulating a mains power supply failure may form part of an annual desktop business continuity plan test. IT network uptime and connectivity are as important as battery capacity. Every critical IT network item must be UPS protected, and a simulated mains power supply failure can help to identify any routers and other network peripherals that also require backup power. It is also important to check that any UPS remote monitoring for alarms and server shutdown is active and configured correctly.
How long do you need to run for during a power outage? Power outages can be momentary, lasting milliseconds (noticeable as lights flickering during overhead storms), or longer. A UPS with little battery capacity will ride through most short-duration power outages, but longer outages will require a larger battery set. The accepted ‘standard’ is 10-30 minutes, but longer runtimes of 4 or 8 hours or more may be required for service and business continuity.
Any UPS system should be maintained, and the older the system, the more important this becomes. There are consumables to replace annually, and a maintenance contract should include an emergency response service level in case of an alarm condition. Consider whether the response time meets your planned needs for the coming period and raise this with your UPS maintenance provider.
Cooling System Checks
How cool does your server room have to be? The recommended ambient temperature for a server room is 18-27°C (64-81°F) and, for people working in the space, around 23°C or less. Most server rooms run their air conditioning systems on too low a setting. Consider increasing the room temperature by at least 1-2°C if you are running at a low temperature setting. The savings should help to reduce overall running costs and electricity bills.
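A room reading or set-point can be checked against the 18-27°C range with a short script. This is a sketch under stated assumptions: the constant and function names are hypothetical, and the range is simply the one quoted above.

```python
RECOMMENDED_MIN_C = 18.0  # lower bound of the recommended range quoted above
RECOMMENDED_MAX_C = 27.0  # upper bound of the recommended range quoted above


def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9 / 5 + 32


def classify_temperature(temp_c: float) -> str:
    """Describe where a reading sits relative to the recommended range."""
    if temp_c < RECOMMENDED_MIN_C:
        return "below range"
    if temp_c > RECOMMENDED_MAX_C:
        return "above range"
    return "within range"


if __name__ == "__main__":
    for reading in (16.0, 21.0, 29.5):  # hypothetical readings in degrees Celsius
        print(f"{reading:.1f} C ({celsius_to_fahrenheit(reading):.1f} F): "
              f"{classify_temperature(reading)}")
```

A "below range" result suggests the room is being overcooled and the set-point could be raised.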
An air conditioning system should be maintained annually. As with a UPS system, there are consumable items that need to be checked and replaced on a regular basis. As part of a planned preventative maintenance inspection, an air conditioning engineer will check and replace the consumables in line with the manufacturer's recommendations and check coolant pressure.
How long do you need your air conditioning system to run for on mains power failure? Most air conditioning systems are powered directly from the mains power supply. There may or may not be a local standby generator. We prefer not to run air conditioners as part of the UPS system load, as this can lead to oversizing. However, as soon as the air conditioning ‘fails’, heat can build up quickly within server cabinets and the room space itself, presenting a potential fire risk.
Either your IT servers and network are shut down in a controlled manner when there is a long power outage or, if you need to maintain their uptime for business continuity reasons, you need to ensure there is a suitable power protection strategy in place.
Server Cabinet Air Flow
How well your air conditioning systems cool your servers, storage and IT devices also depends on airflow, both around the room and within the server cabinets. All server cabinets should have sides and doors fitted. Blanking panels should be added to the front pillars where the U-slots are unused. Better airflow management improves overall cooling efficiency, from cool air intake to hot exhaust, and helps to avoid hot-spots within the cabinets.
Server Room Cleaning
It is sometimes surprising how much dust and dirt can build up in a server cabinet and around the IT devices. Debris can also build up outside the cabinets from onsite works, cabling and the general movement of people within the space. If your server room is rarely cleaned, consider an annual clean. Also remember to remove any packing, old cables and IT devices that have been left in the area, and consider how best to safely dispose of them if they will not be reused.
Fire Suppression Inspection
Your server room may or may not have a fire suppression system installed. If there is fire protection in place, this should be on your maintenance register. A room integrity test (RIT) may also be required as part of annual insurance renewals. If you do not have fire suppression installed, it may be worth considering. Also consider removing any unnecessary items from the room, including combustible materials. Fire needs fuel as well as oxygen and heat, and cardboard packaging is a ready source of fuel.
Environmental Monitoring Tests
The critical infrastructure systems within a server room may be installed with their own connectivity interfaces. An uninterruptible power supply may for example have an SNMP card to allow IP connection to the local network for use with UPS monitoring and shutdown software. Air conditioning and fire suppression systems may have an SNMP or relay contact card to provide a digital alarm signal.
Whilst this approach can monitor for alarm conditions, it does not provide a dashboard overview of the entire facility and the systems in use. This is where an environmental monitoring system has the advantage. As well as monitoring plug-in sensors and detectors installed throughout a room, including temperature, humidity, water leakage, AC power presence, battery health, smoke and door access, the system can pick up digital signals and Modbus comms from third-party systems.
As part of an annual business continuity simulation, it is important to assess the pre-configured alarm notification and alert (email, SMS, phone, SNMP) routines. As well as ensuring alerts are received, threshold and frequency levels can be checked.
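The outcome of an alert test can be logged in a structured way, so that missed notifications and out-of-band readings stand out. The following is a minimal sketch and does not represent any particular monitoring product: the sensor names, the humidity band and the helper functions are hypothetical, and only the temperature band comes from the cooling section of this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    low: float
    high: float


# Notification routes listed in the article
EXPECTED_CHANNELS = ("email", "sms", "phone", "snmp")


def breached(readings: dict[str, float], thresholds: dict[str, Threshold]) -> list[str]:
    """Return the names of sensors whose reading is outside its threshold band."""
    return sorted(
        name
        for name, value in readings.items()
        if name in thresholds
        and not (thresholds[name].low <= value <= thresholds[name].high)
    )


def undelivered(expected: tuple[str, ...], received: set[str]) -> list[str]:
    """Return the channels on which the test alert was not received."""
    return sorted(set(expected) - received)


if __name__ == "__main__":
    thresholds = {
        "temperature_c": Threshold(18.0, 27.0),
        "humidity_pct": Threshold(40.0, 60.0),  # assumed band, for illustration only
    }
    readings = {"temperature_c": 29.0, "humidity_pct": 50.0}  # hypothetical readings
    print("Out of band:", breached(readings, thresholds))
    print("No alert received on:", undelivered(EXPECTED_CHANNELS, {"email", "snmp"}))
```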
Structured Cabling
Over time, cables get added to patch panels, connecting routers and switches, servers and interfaces. All too often we see poorly labelled cables, and often a nest of cables with no identification of their connectivity and network importance. Often, the only way to solve the issue is to depopulate a server cabinet and re-rack it. Regularly updating a cable map and labelling cables prevents this type of problem. If you have this type of issue but cannot afford to power down the kit, consider tackling it in a staged approach, mapping out and identifying the most important connections before your next annual review.
Energy Usage and Sustainability
No annual review would be complete without a review of the running costs and budgets allocated for the coming year. Energy, and electricity in particular, is the biggest running cost for a server room. Over 40% of a typical operational budget is spent on electricity. As well as reviewing providers (traditional and renewably sourced), it is also important to consider what savings can be made from upgrading systems. Each generation of uninterruptible power supply and air conditioning shows improvements in energy efficiency, which should lower electricity usage and improve resilience.
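A rough estimate of the saving from a more efficient UPS can be worked out from the load, the operating efficiency and the electricity tariff. The sketch below assumes a constant load running all year and a flat tariff. Every figure in the example is a hypothetical placeholder, not measured data or a manufacturer rating.

```python
HOURS_PER_YEAR = 8760  # 24 hours x 365 days


def annual_energy_cost(load_kw: float, efficiency: float, tariff_per_kwh: float) -> float:
    """Annual cost of the electricity drawn by a UPS supplying a constant load.

    Input power is the load divided by efficiency, so losses are included.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be between 0 (exclusive) and 1 (inclusive)")
    input_kw = load_kw / efficiency
    return input_kw * HOURS_PER_YEAR * tariff_per_kwh


def annual_saving(load_kw: float, old_eff: float, new_eff: float, tariff_per_kwh: float) -> float:
    """Difference in annual cost between the existing and the replacement system."""
    return (annual_energy_cost(load_kw, old_eff, tariff_per_kwh)
            - annual_energy_cost(load_kw, new_eff, tariff_per_kwh))


if __name__ == "__main__":
    # Hypothetical example: 10 kW load, 90% vs 96% efficient UPS, 0.25 per kWh
    saving = annual_saving(load_kw=10, old_eff=0.90, new_eff=0.96, tariff_per_kwh=0.25)
    print(f"Estimated annual saving: {saving:.2f}")
```

The same arithmetic applies to cooling, as lower UPS losses also mean less heat for the air conditioning to remove, but that second-order saving is not modelled here.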
Summary
Over time, server room resilience can reduce as systems age and poor practices creep in. An annual inspection or audit provides an opportunity to review the entire facility and each operational system and aspect. The overall purpose is to ensure business continuity and uptime. Contact our Projects team to discuss how we can help improve the design, energy efficiency and resilience of your server room or data centre.