Apr 26, 2010

Business criteria for a NGDS (Next Generation Desktop Solution)

I regularly attend numerous webinars, training sessions and partner education events.  Likewise, I spend time reading whitepapers and viewing videos on all aspects of the solutions we provide here at Varrow.  Not surprisingly, however, the best opportunity to learn has always been from my clients.

I had a meeting with a client several weeks back and they had the foresight to put together a list of business requirements for a next generation desktop solution.  This is a nice, vendor agnostic list of attributes that they would want to be included in any solution that they would adopt to replace traditional fat clients.  So here goes…

1) Application Management – The solution must manage application installs and upgrades.  In other words, whether it uses server-based computing, application virtualization/packaging or automated deployment tools, the solution must automate and centrally manage the installation, metering and upgrades of applications across the enterprise.

2) Patching – The solution must automate the process of patching the desktop operating system.  Patching should be easy to administer and should complete during a low-usage window WITHOUT asking the user to leave the machine on and WITHOUT physically visiting each computer.

3) Creation of a homogeneous workstation environment – When I was an IT Manager, we had numerous workstation images depending on the model and user profile.  To make matters more complicated, we had to update the image almost every time we ordered a new batch of desktops because either the video or NIC driver changed.  This was often the case even when we ordered the exact same model.  To eliminate this challenge, the desktop solution should have a homogeneous desktop hardware profile.  Clearly desktop virtualization does this.

4) Efficient Replacements – When one of our users had a strange, unexplainable problem that couldn’t be easily resolved, the “fix” was often to re-image the box.  That in itself created problems because the user profile and “personality” information needed to be restored.  This often made for just as much pain except that it was over the course of the next several days as the user would repeatedly call the help desk because this setting or that setting didn’t make it back onto their newly imaged machine.  The ideal solution would be to separate the operating system, applications and user personality and easily bring these back together to seamlessly rebuild desktops without missing pieces.

5) Efficient Rollout – Given our global economy, companies desire more agility to quickly and easily deploy new teams and create workgroups of users that can work globally, in days not weeks.  Rolling out tens and hundreds of users that may be hundreds of miles from each other is becoming more common.  This requirement should be able to be met without visiting the sites or even going through a complicated support call to get the user set up.

6) Centralize Assets – As companies are searching for ways to reduce costs and consolidate sites, they are also looking for opportunities to reduce the number of datacenters and also centralize data inside secure facilities.  Likewise, they are looking for ways to reduce risk and adhere to compliance regulations.

7) Flex to a variety of user performance profiles – As we all understand, all users are not created equal.  Giving all users the highest common denominator of machine is inefficient.  Creating “buckets” of users (power, moderate, low-end) creates problems because users’ performance demands may change throughout a day, week, month or year.  The solution should be able to assign compute, memory, input/output and disk space as necessary to those users, on demand and, if possible, automatically.  A mid-level user should be able to get the same compute resources as a power user automatically, without having to resort to upgrading their device.

8) Reduction of Management Costs – PC engineers and help desk professionals work tirelessly to deploy, maintain and support users.  Unfortunately, they are working with a set of traditional tools that require a significant amount of interaction on their part.  The right solution should make it easier for them to deploy workstations, package and upgrade applications, patch operating systems and secure data.  Ideally, this solution should be administered from a centralized console with complete control over the process and without the necessity of visiting the user’s site.

9) Preserve User Profiles – Back when I worked at a law firm, we had attorneys that regularly visited other offices.  They would expect to login at a remote workstation and experience the same desktop that they would have if they were in their home office.  We tried server-based computing, roaming profiles and numerous other tools.  Finally, we settled on Citrix GoToMyPC.  These days, given the plethora of virtual desktop solutions, users should be able to go anywhere and on any device and have the same experience.

10) Management and re-use of end devices – Existing investments in “fat” clients should be preserved.  New devices should have mobility features so that users who travel can have an end device that will provide secure connectivity into the corporate network.  Using tools such as 2X and 10Zig’s Thin Desktop allows customers to leverage and prolong existing investments in desktop hardware.  Likewise, an ideal solution would be able to help centrally control end devices for BIOS updates and thin-client software refreshes.

I hope these are helpful for customers who are evaluating a Next Generation Desktop Solution.  You may find this list ideal as a sort of buyer’s guide for evaluating a desktop replacement solution.

Written by Dan Weiss in: Uncategorized |
Apr 25, 2010

/dev/dad

As a father and an IT professional, I find myself in situations like this, when I walk in the door after work:

Dad: “Hey guys, I’m home.  I really missed you today.”

Wife: “Hey honey, can you help me get logged onto the bank website?  I need to pay some bills and I couldn’t log in all day.  Also, can you call my sister?  She is having trouble with her laptop.  She said something about pop-up ads making it impossible to surf the web.”

Dad: “Did you try and reboot the wireless router?”

Wife: “What’s a router?”

Kid # 1 (in a sufficiently sassy 9 year old girl tone): “Dad, I am so glad you are finally home!  I have been trying to get on webkinz.com since I have been home from school and it keeps coming up with invalid password.  Why can’t you fix this thing?!  Are you REALLY a computer person?”

Kid #2 (in a typical teenage boy voice): “Dad, my homework is to research information on Lewis and Clark and every time I boot up my laptop, I get this blueish screen with a bunch of numbers and letters.  By the way, it’s due tomorrow and my 20 page report is stored on there.  Can you help me?”

Man!  These are the toughest customers I have come into contact with all day, and to top it all off, they are my family!  Being an IT Dad sure has its challenges.  Chief among them is providing technical support to your family, forever, on a 24×7 basis.

If this is anything like your house, here are a few suggestions to keep your SLAs intact and your customer sat at an all time high at home:

1) Never, ever, ever help any family member (outside of your immediate family) purchase a computer.  Once you do this, you are on the hook to support them for life.  Suggest that they contact the good people at Dell or better yet, go down to their local Best Buy and have those fine folks suggest a machine.

2) Use logmein.com to get remote access to each of the computers in your home.  If your spouse is having trouble during the day, you can remote-in and figure out the problem fairly easily.

3) Show your kids and spouse how to use Skype.  Skype is great for quick and easy video teleconferencing.  We use it all the time when I am out of town for “face time” with each other.

4) Purchase a router with great parental controls, like the Netgear series that support Live Parental Controls (LPC).  LPC is actually powered by OpenDNS, a web filtering and anti-phishing service that many educational, governmental and service providers use.  Best of all, it’s free.  Filtering is category based, and you can also whitelist up to 25 domains, view basic reporting and use the web-based dashboard.

5) Teach your kids to store their school papers on a thumb drive as well as on the computer.  Better yet, subscribe to Microsoft’s Skydrive (http://skydrive.live.com/).  It provides 25GB of cloud-based storage that is accessible from anywhere in the world, and uses a Windows Explorer-like interface to drag and drop files and to share certain files with other Windows Live users.

Written by Dan Weiss in: Uncategorized |
Mar 20, 2010

NetApp performance testing engineer tells it like it is

NetApp performance test engineers rock!  Yes, I said it and meant it.  You see, I am the CEO and Co-Founder of an EMC partner and EMC is the only storage platform we work with. I believe in proving out a solution from an engineering perspective to ensure that what we are selling will deliver the results necessary for our customer.  Last month, a NetApp engineer helped me to keep a long time customer on EMC storage.

I was working with a client who until now had been a staunch EMC customer.  It was storage refresh time for them and we did what we normally do for clients.  We re-examined their storage usage and performance profile and worked to understand their growth plans.  We discussed backup, replication, snapshots, clones and cache as well as lots of other aspects of their data storage requirements.  We developed a solution for them based on a combination of what their usage stats showed and what they felt they needed for overhead and peak times.  Our solution had approximately 420 drive spindles and over 65,000 IOPS, without taking into consideration things such as cache.  We regularly size our solutions with zero reliance on what cache might provide, simply because we cannot know for sure what we will actually achieve.  When we provide a client an IOPS number, we want that to be a number that we can guarantee.  They may achieve more performance, but we don’t know how much more, so we don’t estimate it.  Apparently, there are others that don’t feel the same.

Our customer wanted to do their due diligence and entertained a competitive solution from NetApp.  The NetApp sales team provided them with a solution proposal with about 40% fewer spindles than ours.  Also, their mix of disks was 65% SATA and 35% FC, while ours was 90% FC and 10% SATA.  Finally, they had approximately 30% less storage than us, justifying it by estimating that their deduplication would make up for the difference.  Unfortunately, their array was much less expensive and the customer told us that they were seriously considering replacing EMC with NetApp as their new storage platform due to the snazzy features, more efficient design and lower cost.

Prior to being a reseller, I was a long-time user of EMC solutions, and it’s the platform that I trust to put my customers on.  It’s not that I don’t like other storage platforms…I dislike when others position estimates of performance and storage savings as “gospel” to customers.  That goes for EMC as well as other storage companies.  It is a well-known rule of thumb that, on average, a 15k rpm fibre channel disk will provide 180 IOPS.  Likewise, a 10k rpm fibre channel disk will provide 140 IOPS.  Lastly, a 7200 rpm SATA disk will provide 80 IOPS.  These are not exact numbers but averages that leave out the application, number of hosts, relative utilization of the storage processors, cache hit ratio, etc.; even so, you can rely on them.  More importantly, most storage professionals understand these rules and have relied on them for configurations and sizing for nearly a decade.
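To make the rule of thumb concrete, here is a minimal Python sketch of spindle-based sizing.  The per-drive figures are the averages quoted above; the function name and the drive counts in the example are hypothetical and are not the actual configurations discussed in this post.

```python
# Rule-of-thumb average IOPS per spindle, as quoted in the text above.
IOPS_PER_DRIVE = {
    "fc_15k": 180,     # 15k rpm fibre channel
    "fc_10k": 140,     # 10k rpm fibre channel
    "sata_7200": 80,   # 7200 rpm SATA
}


def estimate_iops(drive_counts):
    """Sum spindle-level IOPS, deliberately ignoring cache and workload."""
    return sum(IOPS_PER_DRIVE[kind] * count
               for kind, count in drive_counts.items())


# Hypothetical drive counts, chosen only to illustrate the arithmetic.
example = {"fc_15k": 100, "sata_7200": 50}
print(estimate_iops(example))  # 100*180 + 50*80 = 22000
```

The point of sizing this way is that the result depends only on drive type and count, so it is a number you can stand behind regardless of how well the cache behaves.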

The other sales team told my client that they would actually get more IOPS from the NetApp solution than ours because of their Performance Accelerator Module II (PAM II).  They configured the 65% SATA and 35% FC solution with the 512GB PAM II card.  If you use the scale mentioned above and take into consideration the drive spindle count and drive type, we calculated that their IOPS would have been approximately 37,000, without the effect of the PAM II card.  However, the NetApp team told my customer that due to the large amount of cache on the PAM II, and the nature of the workload, obtaining 65000 IOPS or more would not be an issue.

They also told my customer that due to NetApp’s deduplication ratios, they could cut back on the amount of storage that had to be purchased.  My customer needed 147TB of storage for current needs and growth.  The NetApp solution only provided 102TB, with the majority of it SATA storage.  The justification was that due to NetApp’s ability to deduplicate everything on the box, including production data such as Oracle, SQL and Exchange, my customer would save lots of money because fewer disks needed to be purchased.

The NetApp solution seemed too good to be true.  It appeared to be equivalent to our solution but with less storage, fewer spindles and less cost.  With that, I set out to get more educated on the NetApp solution.  Instead of reaching for the competitive section of EMC Powerlink (EMC’s information portal for partners, customers and employees), I placed myself squarely in the customer’s shoes and went right for the NetApp documentation library.

I found a tremendous amount of great information on the PAM II.  First and foremost, it is a card that caches only reads.  While 512GB of read cache is nothing to scoff at, it is only helpful if the data that you need is actually in cache at the time you get a read request.  Given that the majority of my customer’s environment was Oracle, I knew that read cache hits would not be stellar.  You see, read cache utilizes an algorithm to read ahead and load blocks to cache, in anticipation that the next blocks requested would be close to where the last ones were requested.  There are a ton of different types of strategies to optimize read cache, however one thing is certain: it is difficult to obtain a high read cache hit rate with random reads in a large database.

I found the most interesting white paper at http://media.netapp.com/documents/tr-3753.pdf.  This paper, written by Dean Brock, a NetApp performance testing engineer, essentially states that during his test, the PAM card increased transactions by anywhere from 6% to as much as 35%.  The test simulated an Oracle OLTP environment with 75% reads and 25% writes.  The NetApp Filers were dual FAS3140s with 56 fibre channel disks each.  No other hosts were on the Filers and no SATA disks were in the configuration.  Also, the test used NetApp’s FlexShare, a quality of service application that helps to tune the PAM II card.

If you take the 37,000 IOPS that the NetApp configuration would have been able to provide without the PAM II card and uplift that number by 35% (the top-end performance boost from the NetApp PAM II whitepaper), the result is still only about 50,000 IOPS.  There is something else to understand, however.  The NetApp configuration for my customer had SATA disks and no FlexShare, the QoS application for tuning the PAM II card.  Without FlexShare, the PAM II card would have been quite busy trying to increase the throughput of the SATA disks, which made up roughly 65% of the configuration.  Because of this, we believe that the PAM II card may well have delivered a far lower performance improvement than 35%; nevertheless we wanted to use the most conservative approach in our comparisons.  Something else to consider was the test itself.  There were no other hosts in the configuration besides the test hosts.  In the consolidated storage solution that my customer wanted, the PAM II card would also have had to contend with workloads other than Oracle, such as SQL and Exchange.  Given that the NetApp configuration was mostly SATA, I truly believe that the overall performance improvement would have been nominal and that additional disks would have had to be added to keep up with the IO demand.
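The uplift arithmetic is easy to check.  The short Python sketch below applies the 6% and 35% boosts from the whitepaper to the 37,000 IOPS spindle-only estimate, then works out what boost would actually be needed to reach 65,000.  The function and variable names are mine, for illustration only.

```python
def uplift(base_iops, percent):
    """Apply a percentage uplift using integer math to avoid float noise."""
    return base_iops * (100 + percent) // 100


BASE_IOPS = 37_000     # spindle-only estimate quoted above
TARGET_IOPS = 65_000   # figure the customer was told to expect

low = uplift(BASE_IOPS, 6)     # low end of the whitepaper's range
high = uplift(BASE_IOPS, 35)   # top end of the whitepaper's range
required_percent = (TARGET_IOPS / BASE_IOPS - 1) * 100

print(low, high)                   # 39220 49950
print(round(required_percent, 1))  # 75.7
```

In other words, even under the best result in NetApp's own test, the cache would have needed to deliver roughly double that boost to reach the number that was promised.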

The other gem I found was on NetApp’s blog site.  I was looking for a real-world number for the deduplication ratio of Oracle data.  From our experience with Data Domain and EMC Avamar, we knew that deduplication ratio of Oracle wasn’t the most impressive, simply because of the lack of commonality of the data.  Wouldn’t you know it!  Good old Dean Brock again spoke the truth in a blog entry right here: http://communities.netapp.com/message/15066.

In this blog entry Dean says that Oracle data averages 15% deduplication savings and even questions whether deduplication is worth it for Oracle.  For my customer, given that the mainstay of their environment is Oracle, a 15% savings would have made it tough to fit their data into the storage specified in the NetApp configuration.  If the number ended up being lower, like 13% or 12%, it was clear that they would not have had enough storage.
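Here is the capacity math as a rough Python sketch.  It makes a simplifying assumption that is mine, not NetApp's: that the whole 147TB requirement deduplicates at the Oracle rate.  It also ignores RAID and other overhead.  The names are hypothetical.

```python
NEEDED_TB = 147     # capacity the customer required (current needs plus growth)
PROPOSED_TB = 102   # capacity in the competing proposal


def effective_capacity(physical_tb, savings):
    """Logical data that fits on physical_tb at a given dedup savings rate."""
    return physical_tb / (1 - savings)


# Savings rate needed for 147TB of data to fit on 102TB of disk.
required_savings = 1 - PROPOSED_TB / NEEDED_TB

# What 102TB would actually hold at the 15% average quoted for Oracle.
at_15_percent = effective_capacity(PROPOSED_TB, 0.15)

print(f"{required_savings:.1%}")                   # 30.6%
print(f"{at_15_percent:.1f} TB")                   # 120.0 TB
print(f"{NEEDED_TB - at_15_percent:.1f} TB short") # 27.0 TB short
```

Under that assumption the proposal would have needed about twice the savings rate that was quoted for Oracle data.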

So after we brought these points up to the customer and showed them that our solution relied on no estimates of performance or available disk space, they decided to keep their storage platform with us and EMC.  With that, I would again like to commend NetApp engineers who tell it like it is!

Written by Dan Weiss in: EMC, Storage |
Feb 19, 2010

BI for the TSA

Over the last 18 months, my air travel has increased considerably, to the point where I now give a lot of thought to how I will pack.  I make sure to keep checked baggage to a minimum and make certain that I don’t pack pocket knives, lighters or anything else that could possibly get confiscated during my trek through airport security.

I have been delayed several times by TSA, searching through my bags, inspecting my hair gel and toothpaste and hunting for bomb-making residue.  I even had to go through one of those special scanners once, where you stand in the middle and the apparatus scans from your head to your toes.  That was interesting.

I mean, why would TSA even take a second look at me?  In no way do I fit the profile of a terrorist.  If you inspected my credit record, buying habits, bank accounts and background, you would find that I am very boring.  The effort expended on folks like me, running computers through the X-ray machine twice, opening up the many bags and swabbing them with those residue-detector pads, is overwhelming.  I cannot fathom how this makes us any safer.  It’s kind of like looking for a needle in a haystack, except you are looking in all of the haystacks and not starting with the ones that have a huge sign that says “NEEDLES IN HERE”.

Our family loves to vacation and we often travel with my 22 year old niece, let’s call her Catie Jones.  Catie is a dental hygienist, college graduate, single and lives in Charlotte, NC.  Each time Catie goes with us on vacation, she gets stopped at the ticket counter and by TSA.  You see, Catie has been placed on the no-fly list, a special group of people who potentially pose a threat to air travel.  Catie Jones, my niece from NC, is not a member of any terrorist organization, but at one point someone who used “Catie Jones” as an alias for an otherwise harder-to-pronounce and less common name was.  This “Catie” is not from NC, but likely from a country where insurgents and terrorists are harbored.

Now, if the TSA, working with the airlines and with the money and resources of the United States government, cannot tell the difference between a 22 year old dental hygienist born in NC, who attended UNCW and lives in downtown Charlotte, and a foreign insurgent, I am very dubious as to how they are going to keep our skies safe!

I did some research on Google in an attempt to get a flavor of how the public perceives TSA.  Not surprisingly, the overwhelming number of posts say that we are no safer and many say we are less so.  Many of these posts are not just from the average person but from security experts and even Congressmen.  In my research I found no reports publishing their effectiveness or record of thwarting acts of terror.  Agreed, they may be more of a deterrent for underachieving would-be terrorists.  For the highly motivated, they have proven themselves to be easily circumvented…case in point the “Underwear Bomber” last Christmas as well as the “Shoe Bomber” and others.

EL AL Airlines in Israel has an impressive record.  Never has there been a terrorist act committed on their airlines.  Almost 100% of their flights have an armed air marshal on the plane.  All pilots hired by EL AL are former Israeli Air Force pilots, specially trained to react in terror-type situations.  EL AL has a short interview with each and every air traveler, asking about their business traveling, their background and their return flight.  The interviewers are specifically trained to look at the facial muscles of the interviewees to see if they are being deceptive.  Pat downs, in-depth searches and racial/nationality profiling are part and parcel of traveling with EL AL and play a large part in the airline’s safety record.

Now, I am not proposing that the FAA go-EL AL on all of us.  This country was built on certain freedoms and their tactics would not fly in this country, pun intended.  However, some of their strategies make a lot of sense.  For example, mining the vast databases of information available to determine likely suspects and more carefully scrutinizing those that fall out of the normal profile.

Now, what I am proposing is probably going to make privacy advocates go ballistic.  However, before you shout quotes out of George Orwell’s 1984, stop and think about it for a moment.  This information is used every day in our society.  Financial companies regularly purchase information from credit reporting agencies to determine who has a mortgage, has a certain amount of household income and pays their bills on time.  Criminal backgrounds of any person can be purchased for as little as $20 on the Internet.  Companies track our online preferences and in turn market those preferences right back to us.

Say, for example, the TSA had shared access to a database of people and their profiles.  Normal folks that own property, have jobs, have no major convictions and travel often for business might be given a terror threat level of, say, 5.  People who have ties to global terrorism, and perhaps whose fathers had called their own country’s intelligence agencies and warned the CIA that their sons may be plotting to bomb an airplane, might be flagged as a 1.  Of course, I am making reference here to Abdulmutallab (aka the “Underwear Bomber”).

The TSA should also ask if EL AL would be willing to come to the US to help train them in the art of facial deception detection.  When I went through TSA just this morning, the guy who checks my ID and my boarding pass spent the majority of the time talking to his co-worker about the odds on the Super Bowl, glancing up at me for approximately 2 seconds to see if I looked like the picture on my driver’s license.  Why didn’t he ask me where I was going, what my name was and when I would return?  I could have easily made a fake ID and fake boarding pass and been halfway through getting a bomb on the plane.

If the TSA simply hired some really smart computer people to help it build an information base of people who fly and build profiles based on this, they could easily reduce the useless interaction with the likes of Catie Jones from NC and myself, who are the least-likely to blow up an airplane.  Their efforts could be stepped up for those who are not profiled and who may be suspicious or likely terror suspects.

In summary, the IT industry has incredibly smart people who are practical and think efficiently.  The TSA should take advantage of this to build a distributed profile database of information of travelers.  Let the folks with normal profiles through quicker, say with a less restrictive process.  For those few who are unprofiled or are flagged, take more time and effort and use the face-to-face method of interviewing to determine deception.  I believe this would help keep us safer and make traveling less hectic for us all.

Written by Dan Weiss in: Applications |
Jun 19, 2009

Oracle on VMware ESX

Running Oracle on VMware ESX is one of those long-standing controversial issues not much different than prayer in schools, gun control and prolife/prochoice.  Oracle’s current policy for running its database software reminds me of the tactics that Microsoft used for Exchange and SQL on ESX, back about 4 years ago.  Back then, most people looked at the benefits and cost savings of running VMware and then looked at the risks of Microsoft saying we aren’t going to support you and simply did it anyway.  After too many customer complaints, the market spoke, and Microsoft reversed course and decided to officially support EX and SQL on VMware ESX.

What happened here?  Did Microsoft test their products on ESX and decide it was going to be ok?  Did ESX come out with a feature that made it ok to run EX and SQL?  None of the above.  What happened was that Microsoft realized that they could not force customers into using their hypervisor.

Customers want to have choices wherever possible.  Hardware is the understandable exception.  Disk drive manufacturers cannot be expected to all be interchangeable and still have quality and reliability.  That’s why customers are not as upset when forced to purchase proprietary drives and drive shelves from EMC, after purchasing a Clariion.

Software should not have to suffer the same fate.  Customers should be free to use the operating systems, applications and management tools that they deem fit for their organization.  Just because you want to own more of the software stack, you shouldn’t use threatening tactics (such as telling your customers they will not be supported if they don’t adopt all the pieces of your stack) to gain market share.  Certain companies, however, have tried to use a stick to entice customers to adopt their products, instead of such novel things as better products, better positioning and more value.

Oracle is falling into the same pothole as Microsoft did.  Oracle has never had specific hardware manufacturer requirements for their single-instance database application.  They have never said that Oracle databases have to be run on Dell, HP and IBM only.  You can run Oracle on any server you want, as long as CPU speed, disk and RAM requirements are met.  Why should it matter if it is Dell, HP, IBM or a VMware Virtual Machine?  Larry is trying to own the entire stack through his latest Wall Street shopping trips (they just purchased Sun), his rebranding of Red Hat Linux to Oracle Unbreakable Linux and his “introduction” of an Oracle-branded hypervisor (Virtual Iron).  He and his enormous ego are convinced that customers will believe that OracleVM can run Oracle databases and Oracle RAC better and more reliably than VMware ESX can.  C’mon Larry, what do you take your customers for, fools?

Let’s just say that Virtual Iron is a solid product, along with Citrix XenServer, Microsoft Hyper-V and VMware ESX.  These, many would consider, are the top players in the server hypervisor space.  I don’t think anyone would dispute that Oracle would run reliably on most of these platforms, given the right architecture, design and hardware.  Why would Oracle only support its products on Virtual Iron?  Didn’t Oracle used to support VMware?  Yes!  In fact, early on Oracle was all about VMware, even providing a ready-made Oracle Virtual Appliance that could be downloaded.

Larry, you have a helluva company and a product that is phenomenal.  Customers picked you because you have the best in class database.  Why would you turn on them simply because they want to use the best in class and the market leader hypervisor platform?  Suggestion:  Build a better hypervisor, market it effectively and put away the stick!

Written by Dan Weiss in: Uncategorized |
Jun 09, 2009

JRE Error when launching the Avamar Console

On most workstations, when you install the Avamar Console, you will get the following error message:

 

This is a well known issue at EMC Support and here is the easy fix:

1. Remove all the Avamar Console Installations through Add/Remove Programs.

2. Next, uninstall all the Java installations on the system (you will be able to install the latest Java later, but you need to do it in a certain order). You may have several installed.  Remove all of them.  Some may date back to Update 7 and before. You will also need to uninstall all the JRE runtime updates.

3. Once they are all uninstalled, type the name of the Avamar Server into a web browser and then click on Documents and Downloads in the middle of the page. Next, click on the full-windows client to list all the files in this directory. Now select and download the Java ‘jre-1_5_0_09-windows-i586-p.exe’ file. Install this file.  Once it is installed, download and install the Avamar Admin Console from the same Documents and Downloads page. Once both of these, and only these, are installed (make sure there are no other Java versions around), you can apply the tzupdater by following the steps below.

Apply the JRE patch by performing the steps listed below: 
1. Download tzupdater-1.1.0-2007c.zip.
Copy the URL below into your browser:
ftp://avamar_ftp:anonymous@ftp.avamar.com:/software/tzupdater-1.1.0-2007c.zip
to download the file. Save it in a convenient location such as C:\Temp.
C:\Temp will be used as an example for the remainder of this procedure. 

2.  Unzip tzupdater-1.1.0-2007c.zip (using WinZip or other zip utility).

3. Go to the tzupdater-1.1.0-2007c directory by typing:  cd C:\Temp\tzupdater-1.1.0-2007c\

4. Now apply the JRE patch by typing:  java -jar tzupdater.jar -f -bc

After this, load the Administrator Console to make sure the DST error message no longer pops up. If successful, proceed to the Java web site, where you will be able to update to the latest JRE version. As of November 5, 2007, the latest version is ‘Java 6 Update 3’. After Java 6 Update 3 has been applied, the Administrator Console will continue to work and no DST error message will pop up.
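Since this procedure depends on having exactly one known JRE version installed at each stage, it can help to confirm the version before and after patching.  The Python sketch below only parses the text that `java -version` typically prints.  The sample string is illustrative, not captured from an Avamar workstation, and the helper name is hypothetical.

```python
import re


def parse_java_version(version_output):
    """Extract the quoted version string from `java -version` style output."""
    match = re.search(r'version "([^"]+)"', version_output)
    return match.group(1) if match else None


# Illustrative sample of the first line of `java -version` output (assumed format).
sample = 'java version "1.5.0_09"'

print(parse_java_version(sample))               # 1.5.0_09
print(parse_java_version(sample) == "1.5.0_09") # True
```

To use it against a live machine you would capture the command's output yourself; note that `java -version` usually writes to standard error rather than standard output.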

 
Written by Dan Weiss in: Uncategorized |
Jun 03, 2009

EMC Celerra Datamover Routing

The EMC Celerra is an interesting piece of gear.  What I mean by that is the datamovers, which are the heart of the Celerra, have an entirely separate set of configuration parameters than the Control Station does.  Sometimes that’s hard to remember.

When you first log into the Celerra Control Station, you are in a modified Linux server, complete with its own routing table and IP interface stack.  Note that this has absolutely nothing to do with the state of the world of the datamover.  The datamover, which is yet another computer that runs the proprietary EMC DART O/S, has its own separate and isolated IP stack and route table.  Now, this may not be news to some of you but it is important to keep this in mind when you are configuring, and especially troubleshooting.

So, here is my scenario.  I am at a client in SC who has a somewhat complex network infrastructure: lots of VLANs and routes, with nearly 50 remote sites.  When their EMC Celerra was installed, we cabled it up, created its own VLAN and IP address space for both the network side and the iSCSI side, and were able to connect to the Control Station to configure the disks, file systems, etc.  So far, so good.

It was finally time to actually configure the CIFS server so that we could collapse some Netware file servers into it (I will write another blog post about how we did this, later).  Here is the command for creating the CIFS server and subsequently joining it to the domain:

[nasadmin@XXXXX ~]$ server_cifs server_2 -Join compname=cifsservernamehere,domain=domainhere,admin=administratoraccounthere,ou="Computers: EMC Celerra"

When I did this the first time, I got some obscure error like Error 4020: Input/Output error, timeout or somesuch.  A quick Powerlink check turned up nothing.  In the /nbsnas/log directory, I tailed the file cmd_log.err and it showed “Input/output error Internal MAC Socket Error 10: Connection timeout”.  Again, not much help here.  I issued the “server_cifs server_2” command and saw that the CIFS server had indeed been created but had failed to join the AD domain.

I then wondered if it was a networking or connectivity issue.  I pinged the domain controller, DNS servers and gateways from the Control Station, which obviously worked fine.  After saying doh!, almost audibly, I issued the following command:

[nasadmin@XXXXX ~]$ server_ping server_2 x.x.x.x (ip address of the DNS server)

server_2: No such device or address, no answer from x.x.x.x

I smirked to myself, feeling like a complete fool.  (For those of you who are lost: the Control Station saw the DNS server and the rest of the network fine, but the datamover, at this point, was unable to ping the DNS server.)

I then issued the following command to see what the routing table of the datamover looked like:

[nasadmin@XXXXX ~]$ server_route server_2 -list

The output showed no default route.  Without one, the datamover had no path off its own subnet, so it could not reach the DC/DNS server at all.  That’s why the CIFS server domain join was failing.
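If you want to script that check rather than eyeball the listing, here is a minimal sketch. It assumes that a default-route entry in the listing starts with the word "default"; I have not confirmed that against every DART release, so adjust the pattern to match what your datamover prints. The function name has_default_route is just an illustrative helper, not an EMC command.

```shell
#!/bin/sh
# Read a route listing on stdin and succeed (exit status 0) if any entry
# starts with the word "default".
# Assumption: default-route entries begin with "default"; change the pattern
# if your release formats the listing differently.
has_default_route() {
    grep -q '^[[:space:]]*default[[:space:]]'
}

# Example, run on the Control Station (left as a comment on purpose):
#   server_route server_2 -list | has_default_route \
#       || echo "server_2 has no default route"
```

Keeping the check as a filter on stdin means it can be tried against saved output before pointing it at a live datamover.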

The local IP address space for the VMware and Celerra environment, which was created specifically for the project, was 10.30.127.x/17 (mask of 255.255.128.0).  The gateway for that network is 10.30.127.254.  I simply did this:

[nasadmin@XXXXX ~]$ server_route server_2 -add default 10.30.127.254

I checked the route table and the default route had been added.  Then I went back and pinged the DNS server using the “server_ping server_2” command and, well, I guess you know the rest of the story :)
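One thing worth sanity-checking before adding a default route is that the gateway really sits on the datamover's local subnet. The sketch below does that arithmetic in plain shell for the /17 network and the 10.30.127.254 gateway from this post. The helper names ip_to_int and in_subnet are my own and hypothetical, and this runs on any Linux box, not just the Control Station.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    _old_ifs=$IFS
    IFS=.
    set -- $1            # split the address on dots into $1..$4
    IFS=$_old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit status 0) if address $1 is in the same network as
# address $2 for prefix length $3.
in_subnet() {
    _addr=$(ip_to_int "$1")
    _net=$(ip_to_int "$2")
    _mask=$(( (4294967295 << (32 - $3)) & 4294967295 ))
    [ $(( _addr & _mask )) -eq $(( _net & _mask )) ]
}

# Values from the post: a 10.30.127.x address, a /17 prefix, and the gateway.
if in_subnet 10.30.127.254 10.30.127.1 17; then
    echo "gateway is on the local /17"
else
    echo "gateway is NOT on the local /17"
fi
```

A /17 on 10.30.127.x covers 10.30.0.0 through 10.30.127.255, so the gateway passes and an address such as 10.30.128.1 would not.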

Written by Dan Weiss in: Uncategorized |
May
26
2009
0

Miscellaneous Fun with EMC Celerra

I wanted to shut down our EMC Celerra the other day, and the proper way to do this is to halt the data mover once all of the hosts are turned off.  Here is the command:

server_cpu server_2 -halt

This will shut down the data mover altogether.  You can verify this by running “/nas/sbin/getreason”.

In order to bring it back up you must issue the command:

/nas/sbin/t2reset pwron -s 2

The -s means slot and 2 means slot_2.  This will power it back on.  Issuing /nas/sbin/getreason commands will show you the various steps of power up on the data mover such as reset, loading, configured and finally contacted.  When it says “contacted”, it is completely up.  Should take about 2-3 minutes.

The /nas/sbin/getreason should show that slot_2 is contacted with a reason code of 5.  This means normal operation.
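Rather than re-running getreason by hand, you could poll it. This is a rough sketch, assuming only what is described above: that the getreason output has a line naming the slot together with the word "contacted" once the data mover is fully up. The function name wait_for_contacted and the GETREASON override variable are hypothetical conveniences of mine, not part of the Celerra tooling.

```shell
#!/bin/sh
# Poll a getreason-style command until the given slot reports "contacted".
#   $1 = slot name (for example slot_2)
#   $2 = maximum number of checks (default 30)
#   $3 = seconds to sleep between checks (default 10)
# GETREASON can be set to a different command; it defaults to the path
# used in this post.
wait_for_contacted() {
    _slot=$1
    _tries=${2:-30}
    _delay=${3:-10}
    _i=0
    while [ "$_i" -lt "$_tries" ]; do
        # Assumption: a status line mentions both the slot and its state.
        if ${GETREASON:-/nas/sbin/getreason} | grep "$_slot" | grep -q "contacted"; then
            echo "$_slot is contacted"
            return 0
        fi
        _i=$(( _i + 1 ))
        sleep "$_delay"
    done
    echo "$_slot did not reach contacted after $_tries checks" >&2
    return 1
}
```

With the defaults, `wait_for_contacted slot_2` gives the data mover up to five minutes, which leaves headroom over the 2-3 minutes it normally takes.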

Another useful tip: when you are having difficulty getting into the GUI, you can do a ps -ef | grep jserver to see if the Jserver is running.  Issuing a /nas/sbin/js_fresh_restart will restart the Jserver.  This typically clears up any issues with the GUI not functioning well.
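The check-then-restart routine can be wrapped up in a couple of lines. This is only a sketch: jserver_running is a hypothetical helper of mine that filters a process listing on stdin, and it drops the grep process itself so that the search does not match its own command line.

```shell
#!/bin/sh
# Succeed (exit status 0) if the process listing on stdin contains a
# jserver process, ignoring any line that belongs to grep itself.
jserver_running() {
    grep -v grep | grep -q jserver
}

# Example, run on the Control Station (left as a comment on purpose):
#   ps -ef | jserver_running || /nas/sbin/js_fresh_restart
```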

A weird issue occurred with our lab NS20 where the GUI was reporting that the DM was down and that the file systems were not up, although I clearly had access to NFS, CIFS and other storage resources.  Somehow the GUI got disconnected from reality.  The fix for this is to reinitialize the Control Station’s connection to the DM:

/nas/sbin/setup_enclosure -initSystem force

This sounds scary but simply goes back through the configuration and resets and remaps all of the configuration into place.  Takes 3-4 minutes and typically clears up any inconsistency issues with the GUI.

Written by Dan Weiss in: Celerra, EMC, Storage |
Apr
24
2009
0

Celerra File System Access from the Control Station

I am a collector of (mostly) useless IT tips and tricks.  On the EMC Celerra that we have in our lab, we had a number of CIFS shares and there were a few files that for some reason we couldn’t delete using the normal means.  Somehow the files were locked.  I was on the phone with an EMC support engineer about another problem and happened to ask if it was possible to access the file system from the control station.  He said yes, although admitted that it was not documented.

By typing the command “nas_server -l”, you can see the slot number of the data movers.  Typically, data mover 2 (server_2) is set to slot_2.  If you do a “cd /nas/quota/slot_2”, you will see a listing of the file systems on the Celerra.  You can change directory into any of the CIFS or NFS file systems and actually view, delete and rename the files in them.  This is only true for CIFS and NFS data.  For iSCSI LUNs, you can see the LUN name but cannot view the contents (obviously, tampering with a block-based LUN could easily corrupt it).
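If you just want the list of file systems without changing directory, a small loop does it. This is a sketch that assumes each file system shows up as a directory under /nas/quota/slot_2, as described above; since the path is undocumented, treat it as something to verify on your own system. The function name list_slot_filesystems is hypothetical.

```shell
#!/bin/sh
# Print the name of each directory directly under a slot's mount root.
#   $1 = root to scan; defaults to the slot_2 path mentioned in this post.
list_slot_filesystems() {
    _root=${1:-/nas/quota/slot_2}
    for _entry in "$_root"/*; do
        # Skip plain files and the unexpanded glob left by an empty directory.
        [ -d "$_entry" ] && basename "$_entry"
    done
    return 0
}
```

Calling `list_slot_filesystems` with no argument scans slot_2; pass a different path for another slot.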

Again, an interesting tidbit of mostly useless information, but one day you may find yourself needing this capability.

Written by Dan Weiss in: Uncategorized |
Apr
20
2009
0

HA, Network Style

Here at Varrow we work hard at helping clients understand the best practices and recommendations in order for them to get the most out of their infrastructure.  To us, Corporate IT Staff want to Simplify Operations, Go Faster, Go Green (not just save the environment but also save money) and Have Peace of Mind.  These are the tenets of what we offer clients and truly represent what we set out to do when engaging a client.

In most customer environments, the network access layer is comprised of one or more stackable switches that are uplinked to a distribution layer or perhaps directly to the core layer, depending on the size of the client’s facilities.  If the client uses distribution switches, these should be well-connected to the core by using redundant links.  Likewise, if multiple stacks of access switches are linked to a distribution layer, these too should take advantage of high availability options.  The question is, what methods and design alternatives should be used in each of these situations?

Imagine a hospital with 7 floors.  The core switches are in the basement and each floor has a wiring closet with a stack of layer 2 switches.  Each wiring closet has 1 fiber optic cable that home-runs down to the basement.  The fiber cable is attached to the top switch in the stack.  The stack of switches in each closet uses a single-port uplink in daisy-chain mode.  If one switch goes down, all of the other switches are orphaned and all devices on the floor are potentially affected.

One upgrade option to consider in this situation would be to purchase a layer 3 switch for each floor and to link each l2 switch into the l3 switch, and not to each other.  Each l2 switch could collapse into the l3 switch and in turn, the l3 switch could home-run back to the basement.  To add a higher degree of redundancy, you could run a fiber cable between every other floor so that each l3 switch has at least 2 physical paths.  Employing port channelling on the l3 switches and using equal cost paths with an interior routing protocol such as EIGRP or OSPF, a higher degree of availability can be achieved. 

Adding l3 switches does not come without some degree of additional complexity.  Another design alternative would be to use l2 switches in a design similar to the one above.  Instead of using routing and equal cost paths, you could employ rapid spanning tree for high availability.  Of course, the caveat here is that true load balancing, along with immediate convergence when the primary path fails, is not possible when using STP.  There can be a delay of several seconds while the network fabric re-converges.  Not so when using routing.  Routing decisions can be made immediately using equal cost paths.  The STP election process, plus the listening and learning port states that precede forwarding, can take quite a few seconds.

In the next 2 blog posts, I am going to drill down into the design and configuration of each of these 2 methods for introducing multi-pathing into your network.

Written by Dan Weiss in: Uncategorized |
