Archive Page 2


EMP Attacks

It’s fair to say that after the attacks on September 11th, 2001, our discussions on security changed forever.  I personally recall never having conceived of attacks of that nature prior to that day.  Since then, security has received a new and enthusiastic level of scrutiny.  Many people make a living thinking of scenarios that might seem unimaginable to the rest of us.  They look around and ask, ‘where are our soft spots as a country?’ and critical infrastructure always seems to fit the bill.

The concerns are straight out of a Tom Clancy novel: we are a technologically advanced nation that relies heavily on electronics and integrated circuitry, and some rogue force acquires an EMP device to cripple our technology and thrust us back into the Stone Age.  The topic of electromagnetic pulse (EMP) attacks has come up in data center design more than once, and it is a frequent topic of discussion at data center forums and consortiums.

First, some history.  The first noted EMP disturbance was actually a by-product of high-altitude nuclear detonation tests over Johnston Atoll in the Pacific.  A 1962 detonation named ‘Starfish Prime’ caused electrical disturbances in Hawaii, roughly 900 miles away.  The physics are complicated, but when a nuclear detonation occurs at high altitude, the Compton effect (gamma rays knocking electrons loose from air molecules) produces an intense electromagnetic field, which induces a kind of major power surge in equipment that usually exceeds the capacity of the conductor to handle.  The result is fried and non-functional circuitry.  Naturally, this effect got the attention of the Department of Defense, which saw several potential applications for it.  Several tests were conducted until 1963, when the Partial Test Ban Treaty prohibiting above-ground nuclear testing was signed due to concerns over radioactive pollution of the Earth’s atmosphere.  No EMP from nuclear ordnance has been created since.

In spite of the ban, the effects of high-altitude detonations were well understood by that time, so DoD standards and specifications were developed to protect sensitive electronics in critical buildings and war machines.  The DoD attempted to build gigantic testing facilities that would simulate this effect, the first being the Trestle at Kirtland Air Force Base, another being the EMPRESS system developed by the Navy.  From what I have read, these did simulate the effect, but could not create a power spike of the magnitude of a nuclear weapon.  They were better than nothing, but less than the real thing.

Fast forward to today, and the concern is fresh on the minds of anyone building a critical facility.  If the more robust electronics of the postwar era could not stand up to EMP, how could the delicate integrated circuitry of modern electronics ever stand a chance?  How can we protect our sensitive equipment from this kind of attack?  Well, the general consensus today is that a Faraday cage is the best way to protect systems from this effect.  This has manifested itself in everything from very sensible sheet metal rooms or computer cabinets to the questionable installation of chicken wire in the envelope of the building.  It’s here that I would like to make two arguments: 1) you can’t really guarantee that you can protect your equipment, for several reasons, and 2) with cloud computing taking off, this will probably matter less and less for end users.

Here are the problems with trying to harden a facility against EMP.  First, there really isn’t that much information available to the public about this kind of weapon.  Remember, there has not been a documented EMP event since before 1963, or nearly 50 years.  Second, there is no viable way to test or commission an installation of chicken wire (or any other protection scheme).  This is especially problematic because every penetration into a chicken wire cage is a potential conductor of electricity and could compromise the integrity of the cage.  This means every wire, pipe, duct or structural member.  DoD specs call for special line arresters and filters on all incoming power lines.  Finally, consider what would be required to generate this EMP.  A well-placed high-altitude nuclear detonation over Kansas City would affect most of the lower 48 states and substantial portions of Canada and Mexico.  The list of candidates who could accomplish this task is short, and using such a weapon flies in the face of current theories of nuclear deterrence, namely that a nation keeps these weapons in the hopes of not using them.  None of this addresses the much larger concerns of a society thrust into darkness, with power and infrastructure in ruin.

And here’s why it won’t really matter for end users in the years to come.  The best shield against EMP is actually the Earth itself.  The extent of the EMP is bounded by the sight lines to the horizon from the point of detonation; everything beyond is unaffected.  As companies migrate to the cloud, their information and processes will live redundantly across a wide physical geography.  If Google’s American data centers went down, its European, Asian and Scandinavian centers would still run, and processes would be backed up.  This kind of thinking is not new: companies already place redundant data centers a minimum distance from each other so that a single event is not likely to take out both.  Yes, physical infrastructure would be lost, and the costs would be devastating to a facility owner, but the real value of a data center is the business processes that occur in it, and those will surely live on and survive such an attack.
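To put a rough number on that line-of-sight limit, here is a minimal sketch. It assumes a perfectly spherical Earth with a mean radius of 6,371 km and ignores terrain and atmospheric effects. The function name and the sample burst altitudes are illustrative choices, not figures from any DoD specification.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical-Earth assumption)


def ground_range_to_horizon_km(burst_altitude_km: float) -> float:
    """Distance along the ground from the point under the burst to its horizon."""
    r = EARTH_RADIUS_KM
    # The angle at the Earth's center between "directly below" and "horizon"
    # satisfies cos(angle) = r / (r + h); ground distance is r * angle.
    return r * math.acos(r / (r + burst_altitude_km))


# Illustrative altitudes only; 400 km is the figure usually quoted for Starfish Prime.
for altitude_km in (30, 100, 400):
    reach_km = ground_range_to_horizon_km(altitude_km)
    print(f"{altitude_km:>4} km burst: horizon roughly {reach_km:,.0f} km away")
```

Under these assumptions a 400 km burst has a horizon on the order of 2,200 km away. That fits the continental-scale footprint described above, and it also shows why a redundant site on another continent sits well outside it.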


Moving Beyond the Tier Rating

This is an interesting article about an emerging resiliency strategy for large-scale IT operations.  If you read through the Tier guidelines from the Uptime Institute, you’ll note that for the two upper tiers (the more resilient ones with respect to downtime), generator plants are considered the primary power source for the building, and all utility feeds are just lagniappe.  Well, what happens when those utility feeds are more reliable than a generator plant?

There is a whole series of events that must occur in the proper order to ensure that IT processes are preserved from the time a utility feed is dropped until the generators are brought online.  This is a very complex process, and it is why we commission data centers.  We want to be sure that these backup systems come online without a hitch.  However, there are so many parts that must work properly that there exists a real possibility of failure.  To give you an idea in basic terms, the sequence might go something like this:

1. The utility feed goes down

2. A static switch at a UPS throws over to battery or flywheel power temporarily

3. Generators are brought online

4. Some kind of switch gear switches the power over to generator from failed utility

5. Static switch at UPS switches back over to primary feed
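One way to see why a long chain of hand-offs carries real risk is to multiply the odds of each step working. The sketch below treats steps 2 through 5 as independent events. The per-step success probabilities are invented purely for illustration and are not measured values for any real equipment.

```python
from math import prod

# Hypothetical chance that each step works on demand (illustrative only).
STEP_SUCCESS = {
    "UPS carries the load on battery or flywheel": 0.999,
    "generators start and come online": 0.99,
    "switchgear transfers the load to generators": 0.995,
    "UPS switches back to its primary feed": 0.999,
}


def chain_success(step_probabilities) -> float:
    """Probability that every step succeeds, assuming the steps are independent."""
    return prod(step_probabilities)


p_transfer = chain_success(STEP_SUCCESS.values())
print(f"Chance the whole sequence works: {p_transfer:.4f}")
print(f"Chance something in the chain fails: {1 - p_transfer:.4f}")
```

Even with every individual step looking very reliable, the chain as a whole is less reliable than its weakest link, which is the point of commissioning the sequence end to end.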

The equipment that is installed to make this happen is very, very expensive.  The generators can easily run into six figures for each set, and all of the required switchgear and UPS modules constitute a substantial part of the cost of the project.  They can also carry substantial maintenance costs.  The other factor here is that a company with redundant processes across the globe can afford to allow downtime at any given facility.  In this way, it’s a bit like a car rental business in that there is no need for insurance, because having a whole fleet of cars IS the insurance.  The most telling part of the article is the last section, where they rightly point out that this would be courting disaster for a smaller operation that is more critical to a company’s function.
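The fleet-as-insurance idea can be sketched with simple arithmetic. The example assumes each site is up a hypothetical 99% of the time and that outages at different sites are independent. That second assumption is a big one, since a shared software fault or a regional grid failure would break it.

```python
def chance_all_sites_down(site_availability: float, site_count: int) -> float:
    """Probability that every site is down at once, assuming independent outages."""
    return (1.0 - site_availability) ** site_count


HYPOTHETICAL_AVAILABILITY = 0.99  # illustrative figure, not taken from the article

for sites in (1, 2, 3, 4):
    p_down = chance_all_sites_down(HYPOTHETICAL_AVAILABILITY, sites)
    print(f"{sites} site(s): chance of total outage about {p_down:.0e}")
```

Each added site multiplies the chance of a total outage by the single-site outage rate, which is why a large operator can tolerate cheaper, less protected individual facilities while a one-site operation cannot.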

In the case of the power grid across the pond, to not have an outage in nearly 30 years is nothing short of amazing!  The Facebooks and Googles of the world appear to have transcended the world of tier ratings in a big way, and now they enjoy a competitive advantage with their lower cost facilities.


Life is Imperfection

Fly in the Ointment? Meet Cricket in the Epoxy.


Tornadoes and Other High Winds

It goes without saying that data center clients are always looking to locate in an area that fits certain favorable criteria, not the least of which is a lack of major weather events that might contribute to downtime.  If you look on the SuperNAP website, they are very proud of the fact that they are located in a zone of the country that has virtually no disruptive weather events (Las Vegas).  There is little rain, there are no hurricanes, earthquakes or volcanoes, and there are few tornadoes.  Of course, weather can’t be the only deciding factor; others might include tax incentives, the cost of utilities, or proximity or connectivity to regional or national locales.  For all the clients that cannot locate their entire IT operation in Las Vegas, weather events will eventually come up as part of the risk assessment.

I am by no means an expert on chance, nor am I an insurance adjuster, but it is still an interesting exercise to watch.  If you have ever played roulette, you have probably seen the sign above the wheel that shows the results of the last few spins.  It should be obvious to all of us that it is purely a game of chance, and that the numbers on the sign are not an indicator that a certain number or color is due because it keeps showing up or doesn’t show up at all.  If the last three spins were 18, then the odds that the next spin will be 18 will still be one in 38 (on an American wheel, for you gambling enthusiasts).  So it is with tornadoes.  We look at historical data, and to some degree we can say with certainty that particular areas of the country will receive more tornadoes than others.  Beyond that, it gets a little fuzzier, and this stems from an evolving process for studying and understanding tornadoes.
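The roulette point can be checked with exact arithmetic. This small sketch assumes a fair American wheel with 38 pockets and independent spins, and works out the chance of an 18 given that the previous three spins were all 18.

```python
from fractions import Fraction

POCKETS = 38  # American wheel: 1 through 36, plus 0 and 00

p_single = Fraction(1, POCKETS)

# On a fair wheel, spins are independent, so streak probabilities multiply.
p_three_in_a_row = p_single ** 3
p_four_in_a_row = p_single ** 4

# Conditional probability: P(fourth 18 | three 18s) = P(four 18s) / P(three 18s)
p_next_given_streak = p_four_in_a_row / p_three_in_a_row

print(p_next_given_streak)  # 1/38
```

The streak drops out of the calculation entirely, which is the caution to carry over when reading a short run of quiet or active tornado years in a particular county.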

The currently accepted measure of severity for tornadoes is the Enhanced Fujita Scale, based on research from the National Weather Service, the American Meteorological Society and the Wind Science and Engineering Research Center at Texas Tech.  With time and more exhaustive research, this scale will become more accurate and useful.  However, I see two major issues with the scale as it currently exists.  First, the scale is based on observed damage, not on direct measurement of a tornado event.  This means that they look at the damage, and then try to figure out what kind of wind would be required to cause that much damage.  Second, our records of tornado events are incomplete and inconsistent before 1950, when these events began to be tracked in a national registry.

I’ll address the wind speed first.  The wind speed factor is crucial because this is the end result that we are trying to understand.  When we design a data center, the wind speed that it should resist is among the first decisions made.  We look at the frequency of tornadoes of all scales and pick a design wind speed in excess of that which is required by local building code.  The danger here is that, since the Fujita scale is based on observed damage, it is conceivable that an owner either overbuilds, wasting resources that could have gone into critical infrastructure, or underestimates the risk, exposing the facility to more risk than was assumed based on historical data.

And this brings up the second issue, which is historical data.  Consider this: America is a large country and is not densely populated from coast to coast; there are gaps between population centers.  If a tornado were to strike one of these areas and not cause any noted damage, would there be anything from which to assess a wind speed, or would the tornado even be registered?  I suspect that the recorded data probably under-counts the number of tornadoes in a given state or county.  The other question is how accurately past accounts of damage can be assessed on the Fujita scale.

Ultimately, the decision rests with the owner.  Just like with investments, past performance is no guarantee of future returns, and the decision will come down to the owner’s tolerance for risk at that facility.  I have heard of people locating in bunkers, and of others who accept more risk at each facility by distributing processes across several of them.


Fascinating Discussion

This week, I was discussing cloud computing with an equipment vendor and an electrical engineer.  One of the product reps who had been in the room just minutes before had casually stated that ‘virtualization would eliminate or reduce the role of tier ratings in data center design moving forward.’  This was a very bold statement, but there is some merit to what was suggested.  Cloud computing means externalizing or outsourcing processing to a server or facility that is not local to the user, so the fact that a process could be sent to several locations simultaneously might suggest an increased level of redundancy for that process.

However, the designer in me still thinks that there is a physical connection to the cloud that might require more redundancy.  Think about it for a moment.  If you run an office and have only a single telecom line entry into the building, and that line or any part of its network responsible for delivering your data to a remote location for cloud computing is severed, then there is failure regardless of how redundant the cloud may be.  It might mean that we have now externalized the risk to areas that are no longer under our direct control.  A data center can bring multiple fiber providers into a facility, and multiple utility feeds.  These things help ensure that there is no disruption to the critical IT processes.  The other issue at work is who is going to provide the capital for multiple cloud sites, and at what cost?  Does the cost of redundancy for systems decrease with the prospect of spreading the processing around to multiple sites?

If we rely on the idea of ‘2N’ as provided by multiple data center sites, does that mean that currently we accept the possibility of a single point of failure in data delivery?  It’s a high level discussion that is probably dependent on the particulars of what kind of data processing is going on.  But I just can’t see a future where the tier ratings aren’t a factor in design anymore.


You Learn Something New Every Day

In data centers, it is common to deploy power and cooling systems in a redundant configuration.  This is to say, there will be more equipment installed than is actually required, so that if one system fails, there is at least one system there to pick up the load.  This is usually expressed as ‘N’ for the number of units required, ‘N+1’ for the required number plus one additional unit, ‘2N’ for twice the number required, and so on.  Well, I learned recently on a data center project that I am trying to wrap up that running all of the cooling systems at the same time actually produces energy savings because of the energy required for fans or equipment to start.  So even though they have N plus some number of redundant chillers or CRACs, all units would run at the same time and at a lower capacity, thus sharing the load.
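As a rough sketch of the notation and the load sharing, the snippet below counts installed units for each scheme and the share of full capacity each unit carries when all of them run. The last function goes a step beyond what was described for this project. It applies the idealized fan affinity law (fan power scaling with the cube of fan speed), which is one commonly cited reason part-load operation can save energy with variable-speed fans. Treat that part as an assumption, not as the project's engineering basis. All function names are hypothetical.

```python
def installed_units(required: int, scheme: str) -> int:
    """Units installed for a scheme written as 'N', 'N+1', 'N+2', ... or '2N'."""
    if scheme == "N":
        return required
    if scheme == "2N":
        return 2 * required
    if scheme.startswith("N+"):
        return required + int(scheme[2:])
    raise ValueError(f"unrecognized redundancy scheme: {scheme}")


def load_share(required: int, scheme: str) -> float:
    """Fraction of full capacity each unit carries when all units run together."""
    return required / installed_units(required, scheme)


def relative_fan_power(required: int, scheme: str) -> float:
    """Total fan power with all units running, relative to running only the
    required units at full speed. ASSUMPTION: ideal cube law, airflow
    proportional to fan speed, load split evenly across units."""
    share = load_share(required, scheme)
    return installed_units(required, scheme) * share ** 3 / required


for scheme in ("N", "N+1", "2N"):
    print(scheme, installed_units(4, scheme),
          round(load_share(4, scheme), 2),
          round(relative_fan_power(4, scheme), 2))
```

With four units required, an N+1 plant runs five units at 80% each. Real equipment has minimum speeds, and compressors behave differently from fans, so a mechanical engineer's numbers will differ from this idealized picture.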

I should caution that this is specific to a particular project, and the input of a mechanical engineer (which I am not) is required to make an ultimate determination.  I am told that this is a common energy savings strategy for larger data centers with chiller plants carrying a redundancy or with large CRAH units.  This comes as a surprise to me after meeting clients who advocated alternating equipment on a schedule to balance run time, which seems preferable to designating a primary unit, wearing that unit out, and then having to rely on the backup system while the primary is maintained or replaced.  I’ll be interested to learn more.


Smaller Scales Welcome the Proprietary

It seems that the smaller the overall design and deployment of a data center, the more focused and specialized the equipment and best practices can become.  I am currently working on a small data room (calling it a data center might not accurately convey the size of the project) that will utilize a proprietary cooling strategy, as well as a proprietary power delivery strategy for the racks that are being deployed.

The project was planned with 5 kW per rack, which to me seems like a lot of power and density, but in a couple of years, if current industry trends continue for data processing, it might just be a middle-of-the-pack deployment.  In a larger enterprise facility, this kind of power consumption would require a rather large chiller plant and some serious power delivery and backup systems.  But the solution that has been proposed is a packaged UPS/battery unit that deploys nicely within a data hall space, and is modular for growth if needed in the future.  The interface and manual switching are delightfully simple, so much so that anyone with enough patience to study the reductive one-line diagram could safely manage the use of this hardware.
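For a sense of what 5 kW per rack means on the cooling side, here is a back-of-the-envelope conversion. It assumes that essentially all of the power delivered to a rack ends up as heat in the room. The rack counts are hypothetical, since the size of this project is not stated.

```python
BTU_PER_HOUR_PER_KW = 3412.14    # 1 kW of heat expressed in BTU/h
BTU_PER_HOUR_PER_TON = 12000.0   # 1 ton of refrigeration


def cooling_tons(rack_count: int, kw_per_rack: float = 5.0) -> float:
    """Cooling load in tons of refrigeration, assuming all rack power becomes heat."""
    heat_kw = rack_count * kw_per_rack
    return heat_kw * BTU_PER_HOUR_PER_KW / BTU_PER_HOUR_PER_TON


# Hypothetical room sizes, for illustration only.
for racks in (1, 10, 200):
    print(f"{racks:>3} racks at 5 kW: about {cooling_tons(racks):.1f} tons of cooling")
```

A single rack works out to a little under a ton and a half of cooling. That is why in-cabinet units are plausible for a handful of racks, while the same density across an enterprise hall pushes toward a chiller plant.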

The cooling proposed is equally clever.  Small CRAC units that fit into standard cabinets provide cool air right where it is needed most, eliminating the need for a more traditional underfloor plenum pressurized by CRAC/CRAH units located remotely from the servers.  The space below the raised floor can now be reserved for refrigerant lines and whips, leaving this space clean and free of obstructions.  This is very efficient on a small scale with a small volume to cool.  To ‘top’ it all off, they have decided to deploy a hot air containment system to keep the data room at a pleasant, workable temperature.  It may be small, but it is a microcosm that reflects perfectly the current trends touted by larger data centers splashing the headlines of technology news outlets.