Jun 19, 2009

Oracle on VMware ESX

Running Oracle on VMware ESX is one of those long-standing controversial issues, not much different from prayer in schools, gun control and pro-life/pro-choice. Oracle’s current policy on supporting its database software under VMware reminds me of the tactics that Microsoft used for Exchange and SQL Server on ESX about four years ago. Back then, most people weighed the benefits and cost savings of running VMware against the risk of Microsoft saying “we aren’t going to support you,” and simply did it anyway. After too many customer complaints, the market spoke: Microsoft reversed course and officially supported Exchange and SQL Server on VMware ESX.

What happened here? Did Microsoft test their products on ESX and decide it was going to be OK? Did ESX come out with a feature that made it OK to run Exchange and SQL Server? None of the above. Microsoft realized that it could not force customers into using its hypervisor.

Customers want to have choices wherever possible. Hardware is the understandable exception. Disk drives from different manufacturers cannot all be expected to be interchangeable and still deliver quality and reliability. That’s why customers are not as upset when they are forced to purchase proprietary drives and drive shelves from EMC after purchasing a Clariion.

Software should not have to suffer the same fate. Customers should be free to use the operating systems, applications and management tools that they deem fit for their organization. Wanting to own more of the software stack does not justify threatening tactics (such as telling customers they will not be supported unless they adopt every piece of your stack) to gain market share. Certain companies, however, have tried to use a stick to push customers toward their products instead of such novel things as better products, better positioning and more value.

Oracle is falling into the same pothole as Microsoft did. Oracle has never had specific hardware manufacturer requirements for its single-instance database. It has never said that Oracle databases have to run on Dell, HP or IBM only. You can run Oracle on any server you want, as long as the CPU, disk and RAM requirements are met. Why should it matter whether that server is a Dell, an HP, an IBM or a VMware virtual machine? Larry is trying to own the entire stack through his latest Wall Street shopping trips (Oracle just agreed to buy Sun), his rebranding of Red Hat Linux as Oracle Unbreakable Linux and his “introduction” of an Oracle-branded hypervisor (Virtual Iron). He and his enormous ego are convinced that customers will believe OracleVM can run Oracle databases and Oracle RAC better and more reliably than VMware ESX can. C’mon Larry, what do you take your customers for, fools?

Let’s just say that Virtual Iron is a solid product, along with Citrix XenServer, Microsoft Hyper-V and VMware ESX. Many would consider these the top players in the server hypervisor space. Given the right architecture, design and hardware, I don’t think anyone could argue that Oracle wouldn’t run reliably on most of these platforms. So why would Oracle support its products only on its own hypervisor? Didn’t Oracle used to support VMware? Yes! In fact, early on Oracle was all about VMware, even providing a ready-made Oracle virtual appliance that could be downloaded.

Larry, you have a helluva company and a phenomenal product. Customers picked you because you have the best-in-class database. Why would you turn on them simply because they want to use the best-in-class, market-leading hypervisor platform? Suggestion: build a better hypervisor, market it effectively and put away the stick!

Written by Dan Weiss in: Uncategorized |
Jun 09, 2009

JRE Error when launching the Avamar Console

On most workstations, when you install and launch the Avamar Console, you will get the following error message:

[Figure: avamarerror]

This is a well-known issue at EMC Support, and here is the easy fix:

1. Remove all the Avamar Console Installations through Add/Remove Programs.

2. Next, uninstall all the Java installations on the system. (You will be able to install the latest Java afterwards, but you need to do it in a certain order.) You may have several installed, some dating back to Update 7 or earlier; remove all of them, including all the JRE runtime updates.

3. Once they are all uninstalled, type the name of the Avamar server into a web browser and click Documents and Downloads in the middle of the page. Next, click the full Windows client to list all the files in that directory. Select and download the Java ‘jre-1_5_0_09-windows-i586-p.exe’ file and install it. Then download and install the Avamar Admin Console from the same Documents and Downloads page. Once both of these versions, and only these versions, are installed (make sure there are no other Java versions around), follow the steps below to apply the tzupdater.

Apply the JRE patch by performing the steps listed below: 
1. Download tzupdater-1.1.0-2007c.zip.
Copy the URL below into your browser:
ftp://avamar_ftp:anonymous@ftp.avamar.com:/software/tzupdater-1.1.0-2007c.zip
to download the file. Save it in a convenient location such as C:\Temp.
C:\Temp will be used as an example for the remainder of this procedure. 

2. Unzip tzupdater-1.1.0-2007c.zip (using WinZip or another zip utility).

3. Open a command prompt and go to the tzupdater-1.1.0-2007c directory by typing: cd C:\Temp\tzupdater-1.1.0-2007c\

4. Now apply the JRE patch by typing: java -jar tzupdater.jar -f -bc

After this, launch the Administrator Console to make sure the DST error message no longer pops up. If it doesn’t, go to the Java website and update to the latest JRE version. As of November 5, 2007, the latest version is ‘Java 6 Update 3’. After Java 6 Update 3 has been applied, the Administrator Console will continue to work and the DST error message will not reappear.

 
Written by Dan Weiss in: Uncategorized |
Jun 03, 2009

EMC Celerra Datamover Routing

The EMC Celerra is an interesting piece of gear. What I mean by that is that the datamovers, which are the heart of the Celerra, have an entirely separate set of configuration parameters from the Control Station. Sometimes that’s hard to remember.

When you first log into the Celerra Control Station, you are on a modified Linux server, complete with its own routing table and IP interface stack. Note that this has absolutely nothing to do with the datamover’s view of the world. The datamover, which is yet another computer running the proprietary EMC DART OS, has its own separate and isolated IP stack and route table. This may not be news to some of you, but it is important to keep in mind when you are configuring, and especially when troubleshooting.

So here is my scenario. I am at a client in SC with a somewhat complex network infrastructure: lots of VLANs and routes, and nearly 50 remote sites. When their EMC Celerra was installed, we cabled it up, created its own VLAN and IP address space for both the network side and the iSCSI side, and were able to connect to the Control Station to configure the disks, file systems, etc. So far, so good.

It was finally time to configure the CIFS server so that we could collapse some NetWare file servers into it (I will write another blog post later about how we did this). Here is the command for creating the CIFS server and subsequently joining it to the domain:

[nasadmin@XXXXX ~]$ server_cifs server_2 -Join compname=cifsservernamehere,domain=domainhere,admin=administratoraccounthere,ou="Computers: EMC Celerra"

The first time I ran it, I got an obscure error along the lines of “Error 4020: Input/Output error, timeout” or some such. A quick Powerlink check turned up nothing. In the /nbsnas/log directory, I tailed cmd_log.err, and it showed “Input/output error Internal MAC Socket Error 10: Connection timeout”. Again, not much help. I issued the “server_cifs server_2” command and saw that the CIFS server had indeed been created but had failed to join the AD domain.

I then wondered if it was a networking or connectivity issue. I pinged the domain controller, DNS servers and gateways from the Control Station, which of course worked fine. After saying “doh!” almost audibly, I issued the following command:

[nasadmin@XXXXX ~]$ server_ping server_2 x.x.x.x    (x.x.x.x is the IP address of the DNS server)

server_2: No such device or address, no answer from x.x.x.x

I smirked to myself and felt like a complete fool. For those of you who are lost: the Control Station could see the DNS server and the rest of the network just fine, but the datamover, at this point, was unable to ping the DNS server.

I then issued the following command to see what the routing table of the datamover looked like:

[nasadmin@XXXXX ~]$ server_route server_2 -list

The output showed no default route. Without one, the datamover could not send anything, including its replies, to hosts outside its local subnet, such as the DC/DNS servers. That’s why the CIFS server domain join was failing.

The local IP address space created specifically for the project’s VMware and Celerra environment was 10.30.127.x/17 (part of the 10.30.0.0/17 network, mask 255.255.128.0). The gateway for that network is 10.30.127.254. I simply did this:

[nasadmin@XXXXX ~]$ server_route server_2 -add default 10.30.127.254

I checked the route table, and the route had been added. Then I went back and pinged the DNS server with the “server_ping server_2” command, and, well, I guess you know the rest of the story :)
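If you want to sanity-check addressing like this before touching the datamover, Python’s standard ipaddress module makes it quick. This is a minimal sketch: 10.30.127.10 is a hypothetical datamover interface address (the post only gives 10.30.127.x/17), and the check simply confirms that the gateway falls inside the interface’s subnet, which it must for a default route through it to work.

```python
import ipaddress

# Hypothetical datamover interface; the post only gives 10.30.127.x/17.
interface = ipaddress.ip_interface("10.30.127.10/17")
gateway = ipaddress.ip_address("10.30.127.254")

network = interface.network                     # 10.30.0.0/17
print("network:  ", network)
print("netmask:  ", network.netmask)            # 255.255.128.0
print("broadcast:", network.broadcast_address)  # 10.30.127.255

# A default gateway must be on-link, i.e. inside the interface's own subnet.
print("gateway on-link:", gateway in network)   # True
```

Note that 10.30.127.254 is the last usable host address in this /17, since 10.30.127.255 is the broadcast address.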

Written by Dan Weiss in: Uncategorized |
May 26, 2009

Miscellaneous Fun with EMC Celerra

I wanted to shut down our EMC Celerra the other day. Once all of the hosts are turned off, the proper way to do this is to halt the data mover. Here is the command:

server_cpu server_2 -halt

This shuts down the data mover altogether. You can verify this by running “/nas/sbin/getreason”.

In order to bring it back up you must issue the command:

/nas/sbin/t2reset pwron -s 2

The -s means slot, and 2 means slot_2. This will power the data mover back on. Running /nas/sbin/getreason repeatedly will show the various steps of the data mover’s power-up, such as reset, loading, configured and finally contacted. When it says “contacted”, it is completely up. This should take about 2 to 3 minutes.

The /nas/sbin/getreason should show that slot_2 is contacted with a reason code of 5.  This means normal operation.
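If you script the power-up, a small poller can wait for the “contacted” state instead of re-running getreason by hand. This is a sketch only: it assumes getreason prints one line per slot in the form “ 5 - slot_2 contacted” (check the output on your own Control Station first) and that Python 3.7 or later is available to run it, which may not be the case on the Control Station itself. parse_getreason and wait_for_contacted are hypothetical helper names.

```python
import re
import subprocess
import time

# ASSUMED line format of /nas/sbin/getreason, e.g. " 5 - slot_2 contacted".
LINE_RE = re.compile(r"^\s*(\d+)\s*-\s*(slot_\d+)\s+(.*?)\s*$")

def parse_getreason(text):
    """Map slot name -> (reason code, description) from getreason output."""
    status = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            status[match.group(2)] = (int(match.group(1)), match.group(3))
    return status

def wait_for_contacted(slot="slot_2", timeout=300, interval=15):
    """Poll getreason until the slot reports reason code 5 (contacted)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        out = subprocess.run(["/nas/sbin/getreason"],
                             capture_output=True, text=True, check=False).stdout
        code, desc = parse_getreason(out).get(slot, (None, "not listed"))
        print(f"{slot}: {code} {desc}")
        if code == 5:
            return True
        time.sleep(interval)
    return False
```

After issuing the t2reset command, calling wait_for_contacted() would print each intermediate state (reset, loading, configured) until the data mover reaches reason code 5 or the timeout expires.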

Another useful tip: when you are having difficulty getting into the GUI, run ps -ef | grep jserver to see if the Jserver is running. Running /nas/sbin/js_fresh_restart will restart the Jserver, which typically clears up any issues with the GUI not functioning well.

A weird issue occurred with our lab NS20: the GUI was reporting that the DM (data mover) was down and that the file systems were not up, although I clearly had access to NFS, CIFS and other storage resources. Somehow the GUI got disconnected from reality. The fix for this is to reinitialize the Control Station’s connection to the DM:

/nas/sbin/setup_enclosure -initSystem force

This sounds scary, but it simply goes back through the configuration and resets and remaps all of it into place. It takes 3 to 4 minutes and typically clears up any GUI inconsistency issues.

Written by Dan Weiss in: Celerra, EMC, Storage |
Apr 24, 2009

Celerra File System Access from the Control Station

I am a collector of (mostly) useless IT tips and tricks. On the EMC Celerra in our lab, we had a number of CIFS shares, and there were a few files that for some reason we couldn’t delete using the normal means. Somehow the files were locked. I was on the phone with an EMC support engineer about another problem and happened to ask if it was possible to access the file system from the Control Station. He said yes, although he admitted that it was not documented.

By typing the command “nas_server -l”, you can see the slot numbers of the data movers. Typically, data mover 2 (server_2) is in slot_2. If you do a “cd /nas/quota/slot_2”, you will see a listing of the file systems on the Celerra. You can change directory into any of the CIFS or NFS file systems and actually view, delete and rename the files in them. This is only true for CIFS and NFS data. For iSCSI LUNs, you can see the LUN name but cannot view the contents (and tampering with a block-based LUN could easily corrupt it).

Again, an interesting tidbit of mostly useless information, but you may find yourself needing this capability more often than you’d think.

Written by Dan Weiss in: Uncategorized |
Apr 20, 2009

HA, Network Style

Here at Varrow, we work hard at helping clients understand best practices and recommendations so they can get the most out of their infrastructure. As we see it, corporate IT staff want to Simplify Operations, Go Faster, Go Green (not just to save the environment but also to save money) and Have Peace of Mind. These are the tenets of what we offer clients, and they truly represent what we set out to do when engaging a client.

In most customer environments, the network access layer is made up of one or more stacks of switches that are uplinked to a distribution layer, or perhaps directly to the core layer, depending on the size of the client’s facilities. If the client uses distribution switches, these should be well connected to the core using redundant links. Likewise, if multiple stacks of access switches are linked to a distribution layer, these too should take advantage of high availability options. The question is, what methods and design alternatives should be used in each of these situations?

Imagine a hospital with 7 floors. The core switches are in the basement, and each floor has a wiring closet with a stack of layer 2 switches. Each wiring closet has 1 fiber optic cable that home-runs down to the basement and is attached to the top switch in the stack. The switches in each closet are daisy-chained together with single-port uplinks. If one switch goes down, every switch below it in the chain is orphaned (all of them, if it is the top switch), and all devices on the floor are potentially affected.

One upgrade option to consider in this situation would be to purchase a layer 3 switch for each floor and link each L2 switch to the L3 switch rather than to each other. Each L2 switch would collapse into the L3 switch, and in turn the L3 switch would home-run back to the basement. To add a higher degree of redundancy, you could run a fiber cable between every other floor so that each L3 switch has at least 2 physical paths. By employing port channeling on the L3 switches and using equal-cost paths with an interior routing protocol such as EIGRP or OSPF, you can achieve a higher degree of availability.
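The difference between the daisy chain and the collapsed design is easy to see with a quick reachability check. This Python sketch models one floor with five hypothetical access switches (sw1 to sw5) and runs a breadth-first search from the uplink with one switch failed; it is a toy graph model, not a simulation of any real switch behavior.

```python
from collections import deque

SWITCHES = [f"sw{i}" for i in range(1, 6)]  # hypothetical access switches on one floor

def reachable(links, root, failed):
    """Breadth-first search from root, skipping the failed switch."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen, queue = {root}, deque([root])
    while queue:
        for nxt in graph.get(queue.popleft(), ()):
            if nxt != failed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def orphaned(links, failed):
    """Access switches (other than the failed one) cut off from the uplink."""
    up = reachable(links, "uplink", failed)
    return [s for s in SWITCHES if s != failed and s not in up]

# Daisy chain: the fiber lands on sw1, then sw1 -> sw2 -> ... -> sw5.
chain = [("uplink", "sw1")] + list(zip(SWITCHES, SWITCHES[1:]))
# Collapsed design: every L2 switch links to the floor's L3 switch, which owns the uplink.
star = [("uplink", "l3")] + [("l3", s) for s in SWITCHES]

print(orphaned(chain, "sw2"))  # ['sw3', 'sw4', 'sw5']
print(orphaned(chain, "sw1"))  # ['sw2', 'sw3', 'sw4', 'sw5']
print(orphaned(star, "sw2"))   # []
```

In this model the floor’s L3 switch is still a single point of failure for that floor; the extra fiber paths protect the uplink side, not the L3 switch itself.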

Adding L3 switches does not come without some additional complexity. Another design alternative would be to use L2 switches in a design similar to the one above. Instead of using routing and equal-cost paths, you could employ Rapid Spanning Tree for high availability. The caveat here is that true load balancing, along with immediate convergence when the primary path fails, is not possible with STP: it blocks the redundant links, and there can be a delay of several seconds while the network fabric re-converges. Not so with routing. With equal-cost paths, traffic is already spread across every path, so the loss of one path can be handled immediately. The blocking/listening/learning state transitions and the STP election process can take quite a few seconds.
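To make the load-balancing point concrete: with equal-cost routing, each flow is typically pinned to one path by hashing header fields, so every path carries traffic at once, whereas STP leaves the redundant link blocked. The Python sketch below is a generic illustration using SHA-256 over a hypothetical 5-tuple; real switches (Cisco CEF included) use their own hardware hash functions, so don’t read it as how any particular platform picks a path.

```python
import hashlib

def pick_path(flow, paths):
    """Pin a flow to one equal-cost path by hashing its 5-tuple."""
    key = "|".join(str(field) for field in flow).encode()
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return paths[bucket % len(paths)]

# Hypothetical flows: (src IP, dst IP, protocol, src port, dst port)
flows = [("10.1.1.10", "10.2.2.20", 6, 50000 + i, 443) for i in range(8)]

paths = ["uplink-A", "uplink-B"]
before = {flow: pick_path(flow, paths) for flow in flows}

# If uplink-A fails, the routing table simply drops that next hop and the
# flows rehash onto the survivor; there is no blocking/listening/learning phase.
after = {flow: pick_path(flow, ["uplink-B"]) for flow in flows}

for flow in flows:
    print(flow[3], before[flow], "->", after[flow])
```

Because the hash is deterministic, packets of the same flow always take the same path, which keeps them in order while different flows spread across both uplinks.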

In the next 2 blog posts, I am going to drill down into the design and configuration of each of these 2 methods for introducing multi-pathing into your network.

Written by Dan Weiss in: Uncategorized |
Apr 10, 2009

Differences between Nexus 7000 and Cisco CAT 6500

In a previous post, I described some of the differences between the new Cisco Nexus 7000 series switches and the tried-and-true Catalyst 6500 series. In that post, I pondered the differences between the two and also tried to help the audience position the switches in the correct use cases.

My good friends Matt Free and Paul Michniak of Cisco Systems turned me on to a document that is a clear response to this legitimate concern over where each switch plays.  Although the document doesn’t come right out and spell out where you SHOULD use the Nexus and where you SHOULDN’T use the CAT (and vice-versa), it does give the reader a good indication of how the Nexus is different and what strengths it has.

The document is marked “Cisco Public Information” so I have included a link (note you may have to use your CCO login to get to it).

http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9402/ps9512/White_Paper_C17-449427.pdf

Written by Dan Weiss in: Cisco |
Apr 09, 2009

Convergence FCoE Style – Nexus 5000

At Varrow we work hard to help customers take advantage of emerging technologies without getting burned by solutions that are not quite ready for prime time. FCoE, although not yet fully ratified as a standard, will provide customers a way to merge their SAN and LAN fabrics in a cost-effective and efficient way.

Many people will ask, why bother with this kind of solution now?  The economy is tight, FC has never been easier to manage and LAN technologies are stable and relatively easy to support.  The additional complexity of consolidating all datacenter I/O doesn’t seem to make sense right now.

My response is to have Directors and CIOs take a broad look at their teams. Are “fiefdoms” forming between the SAN, LAN and server people? Is the LAN team considering investing in new 10G technology because they fear the SAN team’s hunger for bandwidth will exceed what is reasonable for 1GbE iSCSI? Are the storage guys and gals eyeing FC switch upgrades and 10GbE for iSCSI without having coffee with the LAN team? This kind of equipment is not cheap, nor are upgrades trivial. Why not discuss how the LAN and SAN teams can all get along, and in the process build a common I/O platform that will save power, rack units and money?

Many businesses are seeing the value of virtualizing server and desktop workloads. The ROI typically comes within 12 months, and even ultra-conservative shops are being forced into virtualization simply to keep providing services cost-effectively to their company. Typically, the hardware purchase required for a virtualization solution includes one or more storage arrays along with either FC or iSCSI switches, and in many cases both. This leaves additional FC switches to purchase and manage, additional LAN switches for iSCSI access, and possibly some 10GbE-capable LAN switches bought for future expansion. So another fiefdom is created, and another pod of captive FC and iSCSI storage ports belongs solely to that solution, even though other departments could potentially make use of the available infrastructure. Unfortunately, since you relied on the tried-and-true approach of separate switches for separate networks, you’ve got islands of captive capacity that are locked down and dedicated to that singular use.

Remember the reason you virtualized your servers?  Unused capacity on servers, when combined in a highly available and fault tolerant solution could actually provide a lower cost and more flexibility.  Now enter the zen of consolidated I/O…

Consolidated I/O simply allows you to connect your servers to a common fabric infrastructure that provides both SAN access for storage and LAN access for network connectivity. This is done through a new kind of host bus adapter called a “converged network adapter”, or CNA. The CNA is essentially a 10GbE adapter that also handles FCoE, so it can determine whether incoming frames are LAN or SAN traffic. Likewise, on egress, it works with the FCoE switch to properly tag the frames for LAN or SAN processing. By consolidating the server access layer into a centralized SAN/LAN switching fabric, you get a shared layer of switching that provides massive bandwidth, shared points of connectivity and fewer pieces of equipment to maintain.
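On the wire, the LAN/SAN split is visible in the Ethernet header: FCoE frames carry EtherType 0x8906, and FCoE Initialization Protocol (FIP) frames carry 0x8914. A CNA makes this decision in hardware; the Python sketch below only illustrates the classification logic on raw frame bytes, and the classify function, VLAN number and test frames are hypothetical.

```python
import struct

ETHERTYPE_VLAN = 0x8100  # 802.1Q tag (FCoE normally runs in a tagged VLAN)
ETHERTYPE_FCOE = 0x8906  # FCoE data frames
ETHERTYPE_FIP = 0x8914   # FCoE Initialization Protocol (discovery/login)

def classify(frame: bytes) -> str:
    """Return 'SAN' for FCoE/FIP frames and 'LAN' for everything else."""
    # Bytes 0-11 are the destination and source MACs; 12-13 hold the EtherType.
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype == ETHERTYPE_VLAN:
        # Skip the 4-byte 802.1Q tag (TPID + TCI) to reach the inner EtherType.
        (ethertype,) = struct.unpack("!H", frame[16:18])
    return "SAN" if ethertype in (ETHERTYPE_FCOE, ETHERTYPE_FIP) else "LAN"

macs = bytes(12)  # dummy destination + source MAC addresses
ipv4_frame = macs + b"\x08\x00" + bytes(46)
# Example tag: priority 3, VLAN 100 (TCI 0x6064), then the FCoE EtherType.
fcoe_frame = macs + b"\x81\x00" + b"\x60\x64" + b"\x89\x06" + bytes(46)

print(classify(ipv4_frame))  # LAN
print(classify(fcoe_frame))  # SAN
```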

The Cisco Nexus 5000 series is a 10GbE/FCoE switch that is chock-full of functionality. The Nexus 5020 is a 2U switch that is typically mounted at the top of a rack. It has 40 fixed 10GbE ports, capable of being configured as SAN or LAN ports. Now, if you have existing FC switches or don’t yet have a SAN array capable of 10GbE, don’t worry. The 5020 allows you to purchase an expansion module with four 1/2/4Gb FC ports. (We at Varrow are speculating that next week there may be an announcement from EMC of 10GbE hot-pluggable modules for the Clariion, for direct 10GbE connectivity.)

Transitioning from an FC SAN, a separate iSCSI LAN and a network LAN infrastructure to the Nexus is diabolically straightforward. First, interconnect your Nexus switches with your existing SAN fabric via the FC expansion ports (you can easily make the N5K the subordinate switch for zoning purposes). Connect your 10Gb LAN ports (or 1Gb ports, via the same expansion module) to your LAN core and voilà! Keep in mind that the N5K is also a LAN switch, so the LAN uplink ports can be trunked and all applicable VLANs will be enforced.

Here is a picture showing this setup:

[Figure: nexus5k-1]

 

Note the red box on the far left.  This is your existing FC switch infrastructure.  Although the picture does not show the links between the Nexus and the MDS, you can uplink them easily in a subordinate mode without affecting existing FC switch operations. 

The arrow shows the Nexus collapsing into the LAN core switches using either the 10GbE or 1GbE ports.

Note that all VLANs and STP settings will be honored on the Nexus 5K, as it is also a Layer 2 LAN switch.

Once your N5Ks are hog-tied to your MDS and your LAN core switches, you can start cutting servers over to them by replacing the existing HBAs and NICs with CNAs!

 

 

 

So here is the classic problem: you need six servers for your VMware server consolidation project. The much thinner and less expensive HP DL360 has as much power and RAM as you need, but it can only hold 2 PCI adapters. So you are forced into purchasing DL380 G5s for no other reason than to spend even more money on multiple HBAs and multi-port NICs. With the N5K, a diskless DL360 with 32GB of RAM and a pair of CNAs is all you need to get started.

If you have experience with the Cisco MDS and are handy with the basics of Cisco LAN switch configuration, you are practically ready to configure the Nexus 5K. The N5K supports Cisco Fabric Manager (with a necessary plug-in for FCoE) and is just as easy to configure using the CLI (IMHO). The LAN side of the N5K is no different from a CAT 6500 switch. It supports Rapid Per-VLAN Spanning Tree (Rapid PVST+), port channels, role-based access control (RBAC), QoS, etc. Configuring FCoE requires a few extra steps, as you have to create a virtual Fibre Channel (vfc) interface. With 3 simple commands, your port is up and operational for FCoE. With Cisco Fabric Manager, you just drag and drop to do the same thing in the GUI.

In summary, I/O consolidation is ready for prime-time and the Nexus 5K with FCoE gives you the technology that you need in order to make it a reality.

Written by Dan Weiss in: Cisco |
Apr 08, 2009

Cisco Nexus 7000 versus Catalyst 6500

I am taking a Nexus 7000 and 5000 class this week and am getting a technical deep dive into the capabilities of this exciting new line of Cisco gear. There has been a lot of chatter across the ’net about the Nexus 7000 and the perception that it will replace the Catalyst 6500 series. I wanted to post my take on this idea and also share what I have learned about the Nexus.

Positioning the Nexus 7000
[Figure: Cisco Nexus 7010 and 7018]
Our first 2 days were totally focused on the Nexus 7000 switch. The 7K is quite a bit different from the 5K in that it is classified as a data center aggregation switch. In short, it is a switch that servers and other switches connect to, providing a mechanism for fast and resilient switching and routing within the data center. The aggregation layer, as opposed to the core switch layer (such as the Catalyst 6500), is not new to Cisco. People have been using top-of-rack/end-of-row switches for years to provide access to devices in the data center. A typical scenario is to put Cisco departmental switches (3750/3560/4900) in each rack (or every other rack) and then collapse them back into the core. Another typical (yet slightly unwieldy) scenario is to home-run all racks back to the core. Either method has its issues with scalability, resiliency and cost. The 7K provides an end-of-row solution to aggregate both 1GbE and 10GbE connections within the data center. Ideally, the 7K would redundantly uplink to a CAT 6500 for core routing and switching.

Physical
Physically speaking, the 7K is a big switch. The 7010 has 10 slots: room for dual SUPs (supervisor modules) and 8 actual line card slots. There are only 2 line cards currently: 32 ports of 10GbE or 48 ports of 1GbE. The 7018 is even larger, with horizontally mounted line cards. Cable management is great on the 7010, as all cables come into the top portion of the unit in a vertical fashion. No more “trapping” line cards behind a wall of cables and having to unplug cables from other line cards when you need to get one of the middle line cards out.

[Figure: Nexus Cable Management]

The diagram above shows the cable paths coming into the cable management tray and feeding each card vertically. Cables don’t cross over other cards, so they don’t get in the way of card insertion and extraction.

What’s Different?
The Nexus 7000 has dual supervisor modules, hot-swappable line cards, a highly resilient, highly available architecture and an array of L2 and L3 features... wait a minute! This sounds like the 6500!!! Actually, according to the data sheets for both the 6500 and the 7000, they share many of the tried-and-true, “built like a tank” engineering attributes that Cisco is famous for. The main differences in the Nexus are throughput, line card density, and protocol and application functionality.
 
Throughput
The N7K has over 1.5 Tbps of throughput across the backplane. It uses virtual output queuing (VOQ) with a centralized arbitration mechanism, which prevents packet traffic jams by “pre-sorting” packets by destination and pre-queuing them for delivery. This gives the N7K a lossless switch fabric, unlike traditional Ethernet switches. With all 5 fabric modules installed, the back end delivers up to 230 Gbps of bandwidth per slot. The CAT 6500 can do up to 1.4 Tbps of throughput, but that’s with two units using the Virtual Switching System (VSS). The 6500 can deliver up to 32 Gbps of shared bus bandwidth in the standard config. Also, keep in mind that the CAT Ethernet line cards rely on the SUP module for forwarding decisions. The CAT 6500 can actually do more than 32 Gbps, but you have to purchase distributed forwarding cards for each I/O module and the 720Gb SUPs to get distributed Cisco Express Forwarding to work. This allows each line card to make its own forwarding decisions without conversing with the SUP. On the N7K, all forwarding decisions are made at the line card level by default. This results in faster packet forwarding, less latency and a true separation of the SUP’s control plane from the data plane.
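The idea behind virtual output queuing is that each ingress port keeps a separate queue per egress port, so a packet headed for a congested port cannot hold up a packet headed for an idle one (head-of-line blocking), and the arbiter only releases traffic the egress side can accept, so nothing has to be dropped inside the fabric. Here is a toy Python model of that idea; the class and function names are made up, and it says nothing about how Cisco actually implements its arbiter.

```python
from collections import deque

class IngressPort:
    """Toy ingress port holding one virtual output queue (VOQ) per egress port."""
    def __init__(self, egress_ports):
        self.voqs = {port: deque() for port in egress_ports}

    def enqueue(self, packet, egress):
        self.voqs[egress].append(packet)

def arbitrate(ingress, egress_ready):
    """Toy central arbiter: release at most one packet per VOQ, and only
    toward egress ports that can currently accept traffic."""
    sent = []
    for egress, queue in ingress.voqs.items():
        if queue and egress_ready.get(egress, False):
            sent.append((queue.popleft(), egress))
    return sent

port = IngressPort(["eth1", "eth2"])
port.enqueue("p1", "eth1")  # eth1 is congested right now
port.enqueue("p2", "eth2")  # eth2 is idle

# With a single FIFO, p2 would be stuck behind p1. With VOQs it goes now,
# and p1 waits in its own queue instead of being dropped.
round1 = arbitrate(port, {"eth1": False, "eth2": True})
round2 = arbitrate(port, {"eth1": True, "eth2": True})
print(round1)  # [('p2', 'eth2')]
print(round2)  # [('p1', 'eth1')]
```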
 
Line Card Density
The N7K has only 2 line card choices at present: a 32-port 10Gb card and a 48-port 10/100/1000 Mbps card. The 32-port module provides 10Gb connectivity with a maximum of 80Gb to the fabric. Doing the math (32 x 10 = 320Gb), you might surmise that this card uses oversubscription. This oversubscribed mode has its roots in the Cisco MDS Fibre Channel switches, which is where the Nexus’ NX-OS comes from. There are 8 ASICs on the card with 4 ports each. In each group, one designated port can be placed into “dedicated” mode, where it has access to the full line rate of 10Gb. Otherwise, the 4 ports share 10Gb in the lossless, centrally arbitrated fabric. The 48-port card is not oversubscribed, so all 48 x 1Gb ports run at full line rate. The Nexus 7010 (the smaller unit) can handle up to 256 x 10Gb ports or 384 x 10/100/1000 ports. The CAT, on the other hand, can only handle 130 x 10Gb ports, and even then you are limited to a 32Gb bus. For a pure 10Gb play, the Nexus has the 6500 beat by a mile. The largest 10Gb line card for the 6500 is 16 ports. I haven’t crunched the numbers on a per-port basis to see which is the smarter play, but I suspect that with the incentives available, the N7K would be less expensive for a moderate to heavy 10Gb design.
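The density and oversubscription figures above are easy to re-derive. This small Python calculation uses only numbers from this post (32 x 10Gb ports, 80Gb to the fabric, 8 ASIC port groups of 4, 8 line card slots in the 7010, 48 ports on the 1Gb card); treat it as back-of-the-envelope arithmetic, not a sizing tool.

```python
PORTS_PER_CARD = 32       # 32-port 10Gb module
PORT_SPEED_GB = 10
CARD_TO_FABRIC_GB = 80    # maximum the module gets to the fabric
PORTS_PER_GROUP = 4       # 8 ASICs, 4 ports each
SLOTS_7010 = 8            # line card slots in the 7010
PORTS_1G_CARD = 48

front_panel_gb = PORTS_PER_CARD * PORT_SPEED_GB        # 320
oversubscription = front_panel_gb / CARD_TO_FABRIC_GB  # 4.0, i.e. 4:1
port_groups = PORTS_PER_CARD // PORTS_PER_GROUP        # 8
per_group_gb = CARD_TO_FABRIC_GB / port_groups         # 10.0, shared by 4 ports
per_port_shared_gb = per_group_gb / PORTS_PER_GROUP    # 2.5 if all 4 are busy

# With one dedicated port per group, at most 8 ports per card run at line rate.
line_rate_ports_per_card = port_groups                 # 8

max_10g_ports = SLOTS_7010 * PORTS_PER_CARD            # 256
max_1g_ports = SLOTS_7010 * PORTS_1G_CARD              # 384

print(f"{front_panel_gb} Gb of ports into {CARD_TO_FABRIC_GB} Gb: {oversubscription:.0f}:1")
print(f"7010 maximums: {max_10g_ports} x 10Gb or {max_1g_ports} x 1Gb ports")
```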
 
Protocols and Application Functionality
The N7K supports only TCP/IP currently: no support for IPX, SAP traffic and such. HA protocols such as HSRP and VRRP are supported. Another HA protocol, GLBP, is also supported, though I have not confirmed whether it is available on the CAT as well. GLBP performs a similar function to HSRP and VRRP but spreads the load across the two switches. For the server guys, it is the difference between an active/passive cluster and an active/active NLB-type cluster.
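The HSRP/VRRP versus GLBP difference comes down to what hosts get back when they ARP for the default gateway. HSRP and VRRP answer every host with the one virtual MAC of the active router; GLBP’s active virtual gateway hands out different forwarder MACs (round-robin by default), so hosts spread across both switches. This Python model is a simplification; the MAC addresses and class names are illustrative, not taken from a real configuration.

```python
from itertools import cycle

class HsrpGroup:
    """HSRP/VRRP style: every host learns the active router's single virtual MAC."""
    def __init__(self, virtual_mac):
        self.virtual_mac = virtual_mac

    def arp_reply(self, host):
        return self.virtual_mac

class GlbpGroup:
    """GLBP style: the active virtual gateway answers ARPs round-robin
    with the virtual MACs of the available forwarders."""
    def __init__(self, forwarder_macs):
        self._next_mac = cycle(forwarder_macs)

    def arp_reply(self, host):
        return next(self._next_mac)

hosts = ["pc1", "pc2", "pc3", "pc4"]
hsrp = HsrpGroup("0000.0c07.ac01")
glbp = GlbpGroup(["0007.b400.0101", "0007.b400.0102"])  # one MAC per switch

hsrp_map = {h: hsrp.arp_reply(h) for h in hosts}
glbp_map = {h: glbp.arp_reply(h) for h in hosts}
print(hsrp_map)  # all four hosts send through one switch
print(glbp_map)  # hosts alternate between the two switches
```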
 
The Nexus doesn’t have nearly the choice of line cards that the 6500 has. There may be plans for application-specific modules such as SSL, firewall, NBAR, PoE, T1, SONET/OC, ATM, fiber, etc., but from the way it sounds, Cisco will keep the Nexus line from encroaching too much on the 6500 for the next year or two. I understand that the CAT 6500 is good until at least 2012, so that probably leaves Cisco the remainder of this year to see whether the Nexus could replace the 6500 series or continue to complement it.
 
Virtualization
Switches are not immune to the wave of virtualization that is sweeping the IT business. The N7K has something called Virtual Device Contexts, or VDCs. The switch can actually be divided into 4 separate virtual switches. Each VDC has its own configuration, VLANs, routes and physically assigned interfaces. In fact, the only way for one VDC to talk to another is to connect a physical cable between the designated ports on the front of the switch! You use the new command “switchto vdc <name of vdc>” to change between VDCs. Naturally, there is role-based security so that administrators on VDC2 cannot get into VDC3 and so on. The CAT 6500 has its own version of virtualization, the Virtual Switching System. I haven’t done a lot of research on this, but from my reading I can best describe the CAT 6500’s VSS as a “core cluster”. Dual 6500s with the 720 SUP can be “hog-tied” together to appear as a single, centralized switch. There is no need for HSRP/VRRP. Servers and switches alike are dual-homed into the VSS solution. The SUP modules in both 6500s share state, so failover/switchover is seamless and requires no L2 reconvergence. VSS is more of a resiliency feature than a virtualization feature. I vote they rename it “CATCLUSTER”. :) VSS is something that the N7K doesn’t have and probably doesn’t need. With VDCs, GLBP and no single point of failure, the Nexus wants for nothing in the way of resiliency.
 
Conclusion
At this juncture, I would say that the N7K serves a different space than the 6500; because of the stark differences in throughput, density and application capability, each has a different use case. Describing the 6500 and the N7K in terms of vehicles, the 6500 would be a highly customized conversion van, complete with hot tub, wet bar, TV, Wii and all the comforts of home. It may never get over 65 mph, but there is lots of fun stuff to do while toddling down the highway. The N7K is one of those gigantic 50-passenger charter buses with sturdy, utilitarian bench-style seats and dual jet engines. Perfect for you and 49 of your closest friends to get from Charlotte to Atlanta in an hour or less!

 

 

Written by Dan Weiss in: Cisco
Mar
15
2009

Symantec Storage Foundation and MSCS

I am getting ready to do a project with one of our clients to implement Symantec Storage Foundation for Windows (SFW) on a Microsoft SQL Server cluster.  Now, I am not normally an implementation engineer.  It just happens that the client is very important, none of our other engineers have experience with SFW, and I do.  We don’t want to disappoint one of our top clients, soooo you gotta do what you gotta do.

First of all, let me say that SFW has greatly improved over the years; it is currently at version 5.1.  The interface is very intuitive, with lots of wizards.  SFW has a rich heritage, coming from the Veritas side of Symantec, and it offers mounds of functionality.

What does it do?
SFW comes in a few different versions and has more than a few options.  Essentially, it wraps a software layer around all disks on a host, whether they are local, SAN, iSCSI, etc.  If your storage comes from a JBOD or a low-feature SAN array, SFW can provide advanced features such as RAID, data migration, replication and snapshots.  If you are using a feature-rich SAN array like EMC, NetApp or HP, this may seem a waste of money on the surface; however, there is more to this product than meets the eye.

Symantec is giving away the Basic version of SFW.  Here are a few features you get with the free version:

  • Centralized visibility over all storage in the datacenter (EMC, NetApp, local) under a single interface
  • Storage can be expanded, shrunk and reallocated dynamically
  • Dynamic Multipathing (DMP), which provides load balancing and seamless failover of I/O
  • Hot relocation, which moves data off failing disks
  • RAID support, including mirroring, RAID-5, etc.
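To illustrate the load-balancing and failover idea behind a multipathing layer like DMP, here is a generic round-robin path selector in Python. It is a hypothetical sketch of the general technique, not Symantec's DMP algorithm (DMP's own load-balancing policies may differ), and the path names are made up.

```python
import itertools


class RoundRobinMultipath:
    """Generic round-robin path selection with failover (illustration only)."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.healthy = set(self.paths)
        self._cycle = itertools.cycle(self.paths)

    def fail(self, path):
        # Mark a path (e.g. a failed HBA or switch port) as unusable.
        self.healthy.discard(path)

    def restore(self, path):
        self.healthy.add(path)

    def next_path(self):
        # Spread I/O across paths in turn, skipping failed ones.
        # Check at most one full rotation before giving up.
        for _ in range(len(self.paths)):
            path = next(self._cycle)
            if path in self.healthy:
                return path
        raise RuntimeError("all paths to the LUN have failed")


if __name__ == "__main__":
    mp = RoundRobinMultipath(["hba0", "hba1"])
    print([mp.next_path() for _ in range(4)])  # alternates between paths
    mp.fail("hba0")
    print([mp.next_path() for _ in range(2)])  # I/O continues on hba1 only
```

From the application's point of view, the I/O never stops when a path drops; the selector just stops handing out the failed path.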

Great feature set for free, if you ask me.  One fault I see in the Basic version is its odd licensing for VMs.  Although you can use it on an unlimited number of physical servers (with a limit of 4 file systems/4 volumes), you can only use it on one VM per ESX server.  This is according to Symantec’s FAQ, located at http://eval.symantec.com/mktginfo/enterprise/other_resources/ent-sf_basic_5%200_faq_01_2009.en-us.pdf
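The Basic licensing limits are easy to check mechanically before committing to the free edition. A minimal Python sketch, assuming the limits as summarized from the FAQ (4 volumes and 4 file systems per host, unlimited physical servers, one VM per ESX server) and counting only volumes under SFW control; the `Host` record and `basic_violations` helper are hypothetical. Anything it flags would push you toward the Standard edition.

```python
from dataclasses import dataclass
from typing import Optional

# SFW Basic limits as summarized from Symantec's FAQ.
BASIC_MAX_VOLUMES = 4
BASIC_MAX_FILE_SYSTEMS = 4


@dataclass
class Host:
    name: str
    volumes: int        # volumes under SFW control
    file_systems: int
    esx_server: Optional[str] = None  # None means a physical server


def basic_violations(hosts):
    """Return human-readable reasons the hosts would not fit SFW Basic."""
    problems = []
    vms_per_esx = {}
    for h in hosts:
        if h.volumes > BASIC_MAX_VOLUMES:
            problems.append(f"{h.name}: {h.volumes} volumes exceeds {BASIC_MAX_VOLUMES}")
        if h.file_systems > BASIC_MAX_FILE_SYSTEMS:
            problems.append(
                f"{h.name}: {h.file_systems} file systems exceeds {BASIC_MAX_FILE_SYSTEMS}"
            )
        if h.esx_server is not None:
            vms_per_esx.setdefault(h.esx_server, []).append(h.name)
    # Physical servers are unlimited; VMs are limited to one per ESX server.
    for esx, vms in vms_per_esx.items():
        if len(vms) > 1:
            problems.append(f"{esx}: {len(vms)} VMs, Basic allows one VM per ESX server")
    return problems


if __name__ == "__main__":
    fleet = [Host("sql01", 3, 3), Host("vm01", 2, 2, "esx01"), Host("vm02", 1, 1, "esx01")]
    for problem in basic_violations(fleet):
        print(problem)
```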

GUI
The GUI is nice to look at, although busy.  Note the Basic Group (disks not under SFW control) and the CLUSTER_GROUP and QUORUM groups (disks under SFW control):

[Screenshot: SFW GUI showing the Basic Group, CLUSTER_GROUP and QUORUM disk groups]

Version Differences
Per Symantec’s datasheet and website, SFW Standard Edition is best suited for medium workloads and applications that need more than 4 volumes (not counting the boot volume, unless you are controlling it with SFW too).  As an observation, “upgrading” to SFW Standard requires you to pay a hefty sum and then gives you the privilege of paying for DMP all over again.  I guess if you need more than 4 volumes under SFW, you would need to do this, at a minimum.

Here is a chart that shows off the differences:

[Chart: feature comparison of SFW editions]

Microsoft Cluster Server Option
Most people deploy MSCS for fault tolerance and to introduce high availability into their environment.  What is obvious to some (but not others) is that while MSCS protects you from host failure, it does not protect you from disk failure.  The SFW MSCS option lets you create cluster-aware disk groups that can be protected with snapshots (Flashsnap, licensed separately), mirroring, RAID-5 and replication with VVR (also licensed separately).  It is a small fee (roughly $295) to add.

Flashsnap
Flashsnap is SFW’s snapshot option.  It starts at $395, depending on your OS version.  Although it is entirely host-based, it is very fast and uses very little system overhead.  You can set up a schedule, take instant snapshots, break off snapshots and mount them, and even move snapshot volumes to other hosts.

Veritas Volume Replicator (VVR)
VVR allows for synchronous or asynchronous replication of data to a remote server.
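For readers unfamiliar with the sync/async distinction: with synchronous replication a write is acknowledged only after the remote copy has it, so the secondary never falls behind but every write pays the round trip; with asynchronous replication the write is acknowledged locally and shipped later, so latency stays low but the secondary can lag. The toy Python model below shows only that difference; it is a hypothetical illustration, not how VVR is implemented.

```python
from collections import deque


class Replicator:
    """Toy model of synchronous vs asynchronous replication (not VVR code)."""

    def __init__(self, mode):
        if mode not in ("sync", "async"):
            raise ValueError("mode must be 'sync' or 'async'")
        self.mode = mode
        self.primary = []
        self.secondary = []
        self.pending = deque()  # writes not yet applied at the remote site

    def write(self, block):
        self.primary.append(block)
        if self.mode == "sync":
            # Acknowledge only after the remote copy has the write.
            self.secondary.append(block)
        else:
            # Acknowledge immediately; the write is shipped later.
            self.pending.append(block)
        return "ack"

    def drain(self):
        # Background shipping of queued writes, preserving write order.
        while self.pending:
            self.secondary.append(self.pending.popleft())

    def lag(self):
        # Number of writes the secondary is behind the primary.
        return len(self.pending)


if __name__ == "__main__":
    r = Replicator("async")
    r.write("block-1")
    r.write("block-2")
    print(r.lag())      # secondary is behind until the queue drains
    r.drain()
    print(r.secondary)
```

If the primary site is lost, anything still in `pending` is the data an asynchronous configuration can lose; that is the trade-off against the per-write latency of synchronous mode.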

Installation
Installation is very simple, but configuration is another matter.  With so many options, careful planning of how you are going to use SFW is the key.  When installing into a cluster environment, Symantec best practice is to perform a rolling upgrade: move the cluster groups around the nodes and install on each node while it is passive.  The installation does not disrupt the cluster, and once SFW is installed, it does not affect failover.
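The rolling procedure can be written down as an ordered plan. This Python sketch assumes a simple model in which each step either moves a cluster group to another node (Move Group in Cluster Administrator) or installs SFW on a node once its groups have been moved away; the function and the node and group names are hypothetical.

```python
def rolling_install_plan(nodes, groups):
    """Return ordered steps that install on each node only while it is passive.

    Steps are ("move", group, target_node) or ("install", node).
    """
    if len(nodes) < 2:
        raise ValueError("a rolling install needs at least two nodes")
    steps = []
    for i, node in enumerate(nodes):
        # Pick the next node in the list as the temporary owner of all groups.
        standby = nodes[(i + 1) % len(nodes)]
        for group in groups:
            steps.append(("move", group, standby))
        steps.append(("install", node))
    return steps


if __name__ == "__main__":
    for step in rolling_install_plan(["NODE1", "NODE2"], ["Cluster Group", "SQL Group"]):
        print(step)
```

Writing the plan out first makes it easy to confirm that no node ever receives the install while it still owns a group.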

My project included building a lab with an MSCS cluster running SQL Server 2005, in which the Data and Log disks, along with the Quorum, were to be mirrored.  I installed SFW and went to work configuring the product.  The cluster was already built with a Data drive and a separate Log drive.  The first step was to take the physical disk resources offline in Cluster Administrator: just right-click the physical disk resources for Data and Log and click Take Offline.  I did this on the active node.

Then I used the SFW GUI (on the active node) to create a Dynamic Cluster-Enabled Disk Group and added the Data and Log disks to it.  Note that this process doesn’t affect any of the data on the disks; it removes the Data and Log disks from the Basic Group and moves them into the new dynamic disk group.  After this, I brought the physical disks back online and tested failover, which worked fine.

Then I added 2 new disks to act as mirror partners for the Data and Log disks and added them to the dynamic disk group.  Next I right-clicked the Data disk and chose Mirror.  A wizard walked me through picking the Data disk’s twin for mirroring.  When I clicked Finish, I received an error stating that the configuration was wrong or there wasn’t adequate disk space.  I had sized the twin exactly the same as the original, so, based on the error message, I assumed the twin needed to be a bit larger to accommodate volume information or mirroring metadata.  Instead of unwinding that on the SAN side, I just used SFW to shrink the original volume by 500 MB.  Neat feature, and it only took about 1 minute.

I retried the mirror wizard, and it failed again!  Having worked with SFW before, I wanted to retry the same operation from the CLI.  Consulting the very detailed Administration Guide (720 pages), I executed the command “vxassist S: mirror harddisk5”.  S: refers to the Data volume currently in use, and harddisk5 refers to the new disk.  This worked immediately, and the mirror was synchronized in less than 1 minute (there was only about 2 GB of data).  Weird issue with the GUI, but as I said, I have worked with SFW before and know that the CLI is the most powerful tool for managing it.  I did the same operation for L: (logs), and before I knew it, SFW had redundant disks that were clustered up.  I tested failover, and all worked as expected.
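Since the CLI is where mirroring actually worked here, the same step for both the Data and Log volumes could be scripted. A minimal Python sketch, assuming the argument order of the command exactly as quoted above (confirm it against the Administration Guide for your SFW version before running anything); `harddisk6` as the Log volume's twin is a hypothetical disk name. It defaults to a dry run that only prints the commands.

```python
import subprocess


def mirror_commands(pairs):
    """Build one vxassist mirror command per (drive letter, new disk) pair.

    The argument order copies the command quoted in the text; verify it
    against your version's Administration Guide before use.
    """
    return [["vxassist", drive, "mirror", disk] for drive, disk in pairs]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so a failed mirror stops the remaining commands.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # S: = Data volume, L: = Log volume; harddisk6 is hypothetical.
    run(mirror_commands([("S:", "harddisk5"), ("L:", "harddisk6")]))
```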

The next step was to mirror the Quorum.  This was a little trickier.  First I added a new 1 GB disk to the hosts and created a QUORUM Dynamic Cluster-Enabled Disk Group.  I added it to the cluster, not as a physical disk, but as a Volume Manager Disk Object with drive letter Y:.  Then I right-clicked the cluster name in Cluster Administrator and switched the Quorum to the new Y: drive.  I thought I was home free, but then MSDTC wouldn’t start.  After digging for a minute or two, I figured out that MSDTC wrote its logs to the Q: drive (the original Quorum drive letter).  I shut the cluster down again and used SFW to swap the drive letters so the new, SFW-protected Quorum would take over Q:.  Unfortunately, the cluster didn’t want to let go of the original Quorum, so it wouldn’t let me change the drive letters!  Going into Device Manager and disabling the Cluster device (and rebooting) did the trick.  I swapped the drive letters and re-enabled the Cluster device.  After buttoning the cluster back up, everything worked flawlessly.
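The MSDTC surprise came from something still pointing at the old Q: letter. Before swapping a drive letter on a cluster, it can help to inventory which services reference it. A minimal Python sketch, assuming you already have a hypothetical map of service names to the paths they use (the `Q:\MSDTC` entry below is illustrative, not taken from this cluster); the `stale_drive_refs` helper is hypothetical and simply flags paths on the old letter.

```python
def stale_drive_refs(service_paths, old_letter):
    """Return {service: [paths]} for paths that still start with old_letter."""
    # Accept "Q", "Q:" or a root such as "Q:\" and normalize to "Q:".
    prefix = old_letter.rstrip(":\\").upper() + ":"
    stale = {}
    for service, paths in service_paths.items():
        hits = [p for p in paths if p.upper().startswith(prefix)]
        if hits:
            stale[service] = hits
    return stale


if __name__ == "__main__":
    inventory = {
        "MSDTC": [r"Q:\MSDTC"],
        "SQL Server": [r"S:\Data", r"L:\Logs"],
    }
    print(stale_drive_refs(inventory, "Q:"))
```

Anything it reports either has to be reconfigured or, as happened here, the new volume has to inherit the old drive letter.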

Given the ability to mirror drives from different disk arrays (for free), free dynamic multipathing with load balancing and failover, and a low-cost MSCS option to mirror all disks and soup up your MSCS installs, SFW Basic is a great deal.  Adding enterprise features like Flashsnap and VVR takes a little more cogitation, but when you have a mission-critical server, they are nice to have.

Written by Dan Weiss in: Storage
