
Cisco on Cisco

Data Center IT Deployment in Progress: How Cisco IT Consolidates I/O in the Data Center


Cisco Nexus 7000 and 5000 Series Switches consolidate I/O, reducing cabling requirements and increasing application performance.

BUSINESS OPPORTUNITY

Cisco IT is transforming its data centers with solutions that help to realize the company’s Data Center 3.0 vision, which employs a unified network fabric to connect servers and storage devices in a way that is resilient, scalable, and easy to manage. The transformation occurs in three stages:

  • Consolidating I/O and increasing throughput by implementing unified I/O running on 10 Gigabit Ethernet (current stage)
  • Increasing the power available to compute resources by reducing the power consumed by the network infrastructure
  • Making applications location-independent, which will simplify changes and possibly eliminate the need for change requests

This deployment report focuses on the first stage.

Until now, Cisco has used a traditional Cisco Ethernet switching infrastructure and Fibre Channel switches at the distribution layer. Cisco IT is consolidating data center I/O from multiple 1 Gbps Ethernet connections and 4 Gbps Fibre Channel connections to a pair of 10 Gbps Ethernet connections through a lossless, high-performance, low-latency switching fabric.

“We wanted to consolidate to 10 Gbps and also increase port density throughout the network’s core and distribution layers,” says Mauricio Arregoces, engineering manager at Cisco.

Business drivers for this change include:

  • Meeting increased bandwidth demand while improving application performance. Cisco has adopted server virtualization technologies to consolidate servers and improve utilization. A single physical server that hosts multiple virtual machines requires greater bandwidth. Cisco IT wanted to increase bandwidth and support multiple application services while also increasing application performance.
  • Reducing hardware and operational costs. By consolidating from 1 Gbps to 10 Gbps I/O, Cisco IT will significantly reduce the number of ports that IT needs to purchase and manage. “Consolidating our access and distribution layer switches will reduce power consumption, cooling, cabling, and space requirements,” says Wilson Ng, Member of Technical Staff for IT Network and Data Center Services Engineering. “We can use the energy we save for additional servers, extending the life of the data center.”
  • Adopting Fibre Channel over Ethernet (FCoE). Currently, Cisco data center servers need multiple Ethernet and Fibre Channel I/O connections for different networks: data, storage area network (SAN), server management, and response and throughput performance. Maintaining separate fabrics for network and storage increases hardware and operational costs and complicates provisioning. By consolidating the SAN node into 10 Gbps network connections, FCoE simplifies management and reduces cost.
    “FCoE has the same potential for the data center that VoIP had for voice,” says Mike Norman, director of IT Network and Data Center Services Engineering. “Just as VoIP enabled Cisco WebEx and Cisco TelePresence, FCoE will enable new data center functions.”
    To adopt FCoE, Cisco needs greater throughput. “Cisco’s servers require 4 Gbps of Fibre Channel capacity for storage access, so we need a 10 Gbps connection to accommodate storage plus data,” says Kumar Ramachandra-Rao, technical staff member, IT Network and Data Center Services Engineering.
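
The sizing argument in the last quotation can be sketched as a simple bandwidth budget. This is a minimal illustration, not a Cisco IT planning tool: the 10 Gbps link and 4 Gbps storage figures come from the text, while the function names and the example data load of four 1 Gbps connections are assumptions, and FCoE encapsulation overhead is ignored.

```python
# Illustrative bandwidth budget for one converged 10 Gbps link.
LINK_CAPACITY_GBPS = 10.0  # 10 Gigabit Ethernet link (from the text)
STORAGE_GBPS = 4.0         # Fibre Channel capacity per server (from the text)


def data_headroom_gbps(link_gbps: float, storage_gbps: float) -> float:
    """Return the bandwidth left for LAN data after reserving storage traffic."""
    if storage_gbps > link_gbps:
        raise ValueError("storage demand exceeds link capacity")
    return link_gbps - storage_gbps


def fits_on_link(link_gbps: float, storage_gbps: float, data_gbps: float) -> bool:
    """Return True if storage plus data traffic fits on a single link."""
    return storage_gbps + data_gbps <= link_gbps


if __name__ == "__main__":
    # Hypothetical data load: four 1 Gbps Ethernet connections (assumed count).
    data_gbps = 4 * 1.0
    print(data_headroom_gbps(LINK_CAPACITY_GBPS, STORAGE_GBPS))
    print(fits_on_link(LINK_CAPACITY_GBPS, STORAGE_GBPS, data_gbps))
    # A 1 Gbps link cannot carry 4 Gbps of storage traffic on its own.
    print(fits_on_link(1.0, STORAGE_GBPS, 0.0))
```

Under these assumptions, reserving 4 Gbps for storage leaves 6 Gbps of each 10 Gbps connection for data traffic.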

To meet these business requirements, Cisco IT needed a new data center I/O architecture that would:

  • Consolidate I/O at the access layer
  • Increase bandwidth in the core layer without dropping packets or increasing application response time
  • Scale port density at the distribution layer

“Adding 10 Gbps connections is not new,” says Tom Settle, technical staff member, IT Network and Data Center Services Engineering. “What’s new is scaling port density at the distribution layer.”

When Cisco IT began planning for I/O consolidation in 2004, the only option was InfiniBand, a Layer 2 proprietary method of connecting hosts with storage, SAN switches, and networking devices. But InfiniBand requires proprietary copper cables and highly scalable gateways, both of which increase capital and operational expense. In addition, Cisco IT prefers to improve operational efficiency by standardizing on Ethernet and IP standards-based technologies whenever available, which made FCoE the preferred choice.

Before adopting FCoE, however, Cisco IT needed to ensure that faster I/O speeds would not result in dropped storage traffic. Dropped packets are not a major problem for network data traffic because the receiving node can either ask the sending node to retransmit or just ignore the missing data. But storage systems have less tolerance for dropped packets; therefore, Cisco needed an FCoE solution that would eliminate dropped packets, a “lossless” fabric.
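
The difference between a fabric that drops packets and a lossless one can be illustrated with a toy queue model. This sketch is not how the Nexus switches or any standard implement lossless behavior. It assumes, for illustration only, a single buffer where a lossy switch discards frames that do not fit and a lossless switch makes the sender hold them until space frees up. All names and numbers are hypothetical.

```python
def simulate(offered_per_tick, drain_per_tick, buffer_size, lossless):
    """Toy single-queue model; returns (delivered, dropped, ticks)."""
    if drain_per_tick <= 0 or buffer_size <= 0:
        raise ValueError("drain_per_tick and buffer_size must be positive")
    pending = list(offered_per_tick)  # frames the sender offers on each tick
    queue = 0      # frames currently buffered in the switch
    backlog = 0    # frames the sender is holding while paused
    delivered = dropped = ticks = 0
    while pending or backlog or queue:
        ticks += 1
        offered = backlog + (pending.pop(0) if pending else 0)
        accepted = min(offered, buffer_size - queue)
        overflow = offered - accepted
        queue += accepted
        if lossless:
            backlog = overflow   # sender waits; nothing is discarded
        else:
            backlog = 0
            dropped += overflow  # frames that do not fit are discarded
        sent = min(queue, drain_per_tick)
        queue -= sent
        delivered += sent
    return delivered, dropped, ticks


if __name__ == "__main__":
    burst = [8, 0, 0]  # hypothetical burst of 8 frames in the first tick
    print(simulate(burst, drain_per_tick=2, buffer_size=4, lossless=False))
    print(simulate(burst, drain_per_tick=2, buffer_size=4, lossless=True))
```

In this model the lossy run delivers 4 of the 8 frames and drops the other 4, while the lossless run delivers all 8 at the cost of one extra tick. That trade-off is acceptable for storage traffic, which tolerates a brief delay better than a missing frame.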

IT PROJECT

Cisco IT deployed Cisco Nexus 7000 and 5000 Series Switches as the platform for unified I/O. The Nexus 7000 Series meets Cisco’s needs for the distribution layer because of its scalability: up to 512 10 Gigabit Ethernet ports and up to 15 Tbps backplane capacity. A standards-based switch, the Cisco Nexus 7000 is built to support future 40 Gbps and 100 Gbps Ethernet. The Cisco Nexus 5000 Series meets Cisco’s needs for a top-of-rack access switch because of its unified I/O support, high port density, and low-latency (less than 3 microseconds) lossless fabric, which will improve application performance.

In September 2008, Cisco IT deployed the Nexus 7000 and 5000 Switches along with Cisco Catalyst 6500 Series Switches in a controlled production environment at its data center in Mountain View, California. The test environment includes the following components:

  • Nexus 7000 Series Switches are deployed at the Layer 2/3 distribution boundary to consolidate existing Cisco Catalyst 6500 Series Switches. For the initial deployment, Cisco IT is using a Nexus 7000 Switch with a 10-slot chassis. Each 10-slot Nexus 7000 Switch can replace two to four traditional Ethernet switch chassis, significantly reducing Cisco’s data center space, power, and cooling requirements. In the future, Cisco IT plans to use 18-slot Nexus 7000 Switches to accommodate growing density and bandwidth requirements.
  • Nexus 5000 Series Switches are deployed at the server-access layer to provide a low-latency, lossless fabric. As a top-of-rack switch, the Nexus 5000 consolidates network and storage hardware for the access layer and also reduces cabling costs.
  • Catalyst 6500 Series Switches are still used at the network edge to provide rich network services, such as load balancing and Secure Sockets Layer (SSL) acceleration services provided by the Cisco ACE Application Control Engine. “The Nexus 7000 is intended as a 10 Gbps switch that provides Layer 2 multipaths,” Ng explains. “Therefore, using both products, the Nexus 7000 at the distribution layer and the Catalyst 6500 as the service switch, will provide the best possible network capabilities for Cisco.”

Nexus Pod Deployment

Cisco IT is deploying the Cisco Nexus 7000 and 5000 Switches in four steps:

Figure 1. Nexus Pod Environment

  • Step 1. Deploy unified I/O pods in the Mountain View data center. Cisco adopted the pod concept in 2004. Each pod is a self-contained, standardized, modular computing block that is supported by its own LAN and SAN network substation. Pods incorporate the network access layer and distribution layer, which connect to common LAN and SAN core layers shared by all pods.
    “Pods are ideal for testing because they provide a repeatable network environment with predictable scalability,” says Arregoces. In the new pod configuration, the Cisco Nexus 5000 is the top-of-rack switch linking to the Cisco Nexus 7000 Switch for distribution and to the Cisco MDS Multilayer Director Switches for storage (Figure 1).
  • Step 2. Measure application performance within the pod to determine which applications experience the greatest throughput improvements from FCoE. “During testing, we will confirm the performance benefits of FCoE based on application profiles and incorporate the findings into our virtualization and load balancing rules,” says Norman. Cisco IT will compare throughput for the Nexus pod to throughput for the traditional separate LAN and SAN fabrics that are currently used in Cisco’s production data centers.
  • Step 3. Connect the pod to a production Oracle database in the Mountain View data center and repeat the performance testing.
  • Step 4. Use FCoE within the rack. The converged network adapters will connect to the top-of-rack Nexus 5000 Switch. “Over time, new features on the Nexus 7000 will let us extend Fibre Channel to the cloud,” Norman says.

Management

The Nexus 7000 and Nexus 5000 Series Switches use the NX-OS operating system, which closely resembles the Cisco IOS Software. Cisco IT staff can configure and implement NX-OS using their existing skills. “Staff who have solid experience with the Cisco IOS Software or the MDS SAN-OS software learn how to configure the NX-OS in one to two hours,” says Ng.

Cisco IT plans to integrate management of the Cisco Nexus 7000 and 5000 Switches into existing system management environments using Cisco Data Center Network Manager.

ANTICIPATED RESULTS

In October 2008, Cisco IT certified the Nexus-based pods as ready for production. Five production business applications are operating in the Nexus pod environment, including News@Cisco, a financial system, and a database used by the Office of the Chairman and CEO.

Cisco IT anticipates the following results from the initial deployment.

Low-Risk Experience with FCoE

I/O consolidation is the first step on the journey to a unified fabric, according to Norman. “This is a low-risk step because initially we are only using FCoE as an I/O access technology in a single rack, not across the whole data center fabric,” he says. “Our goal during this step is to expose the teams to FCoE so that we can fully understand the technology and its effects on the Cisco IT organization.”

Increased Scalability

The 18-slot Cisco Nexus 7000 Switch can scale up to 512 10 Gbps Ethernet ports. “Increased port density enables us to attach more access switches and gives us the bandwidth to scale aggregation at the switch layer,” says Settle.

Reduced Operational Costs

Cisco IT is calculating power savings at both the system level and data center level:

  • System level: “Based on our current pod designs, we anticipate that using the Nexus 7000 and Nexus 5000 Switches will enable us to reduce the number of access and distribution switches by up to 80 percent,” says Ng. “We expect this to reduce each server’s power consumption by about 40 watts and will verify our assumptions in the early adopter pod.” Cisco can either reduce its data center power consumption or use the saved capacity to power more servers.
  • Data center level: “Suppose that Cisco IT needs to support 9000 new servers over four years. If Nexus 7000 can increase the number of servers per megawatt from 1000 to 1500, then we can build one-third fewer facilities,” Norman explains. “Building three instead of five 2-megawatt facilities would increase our return on investment by 50 percent.” 
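
The data center-level arithmetic can be reproduced with a short sketch. The server count, servers-per-megawatt figures, and 2-megawatt facility size come from the quotation above. The function names and the rule of rounding up to whole facilities are assumptions made for illustration.

```python
import math


def megawatts_needed(servers: int, servers_per_mw: int) -> float:
    """Return the power capacity, in megawatts, needed for the servers."""
    return servers / servers_per_mw


def facilities_needed(servers: int, servers_per_mw: int, facility_mw: float) -> int:
    """Return the number of whole facilities needed (assumes rounding up)."""
    return math.ceil(megawatts_needed(servers, servers_per_mw) / facility_mw)


if __name__ == "__main__":
    servers = 9000      # new servers over four years (from the text)
    facility_mw = 2     # capacity of each facility (from the text)
    for density in (1000, 1500):  # servers per megawatt, before and after
        print(density,
              megawatts_needed(servers, density),
              facilities_needed(servers, density, facility_mw))
```

With these figures, the required capacity falls from 9 MW to 6 MW, which is the one-third reduction, and the number of 2-megawatt facilities falls from five to three.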

Lower Cabling Costs

Cisco IT expects to significantly lower cabling costs by reducing the number of connections to each server, as well as the connections from racks to the distribution switches. “We can just lay fiber once and then never again have to worry about separate cabling for data and storage traffic,” says Sidney Morgan, manager, Cisco on Cisco IT.

Faster Provisioning

When Cisco IT begins using the dual FCoE converged network adapters with Nexus 5000 deployments on a large scale, provisioning time for new servers will decrease substantially. “Today, provisioning a data center server requires opening multiple service requests, including racking the server, installing two HBAs [host bus adapters], and connecting two sets of cables,” says Ramachandra-Rao. “Adopting FCoE will improve our SLA [service-level agreement] because we’ll eliminate service requests for the second HBA and cable.”

Improved SAN Performance

The Cisco Nexus 7000 Switch increases bandwidth capacity for servers and clients that communicate across access and aggregation layers. “We can expect faster response times because fewer IP packets will be dropped due to congestion,” says Ramachandra-Rao.

More Efficient IT Organization

Previously, Cisco IT’s network and storage operations groups operated separately. Unified I/O and FCoE are helping them converge, increasing the efficiency of the Cisco IT organization. Norman likens the change to when Cisco converged its voice and data networks. “To adopt VoIP, we cross-trained our TDM voice engineers and networking engineers to collaborate,” he says. “Now, as we adopt FCoE, previously separate storage and networking skill sets will also converge.”

Cisco IT is already restructuring its server, storage, and networking teams according to the PDIO model: planning, design, implementation, and operations. The design team works on the end-to-end solution, including storage, server, and orchestration components. “We’ve begun establishing Role-Based Access Control procedures for our storage and networking teams, to avoid conflicts,” says Ng.

NEXT STEPS

Now that the Nexus-based pod has been validated for the production environments, Cisco IT will begin using the Nexus 5000 with dual FCoE converged network adapters on the servers to reduce costs for hardware, cabling, power, and cooling.

The next data centers to deploy Nexus pods will be the engineering and development data centers in San Jose, California; the new production data center in Richardson, Texas; and existing data centers in Research Triangle Park, North Carolina, and Boxborough, Massachusetts. Cisco IT will coordinate the pod deployments with Cisco’s Fleet upgrade program for refreshing network equipment at regularly scheduled intervals.