Sunday, December 18, 2011

Is Data Loss in Your Future?

Seagate and Western Digital recently announced that they are cutting the warranty on a large majority of their hard drives from 5 years to just 1 or 2. Granted, these are consumer-grade drives, but the concerning thing about the announcement is that it is more or less an acknowledgement that drive failures are probably more common than we'd like to think. I don't know if the floods in Thailand have had any influence on this (no, according to WD), but I suspect they are playing at least some role. The manufacturing of hard drives is such a sensitive practice, and the two largest drive makers in the world are now dealing with the residual effects of this...

Even though these are consumer drives we are talking about, and the support agreement you have with your enterprise array vendor won't change, one has to wonder what this says about the drives spinning on your datacenter floor today and the ones you're ordering for tomorrow. After all, this research paper from Carnegie Mellon University showed that the MTBF of enterprise and consumer-grade drives is roughly the same. A similar Google study on drive failures also confirmed that drive manufacturers' MTBF numbers were generously overstated.
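
To put an MTBF figure in perspective, it helps to convert it into an annualized failure rate (AFR). The sketch below assumes a constant failure rate (an exponential lifetime model) and drives powered on 24x7. The 1,000,000-hour MTBF and the 500-drive population are illustrative values of my own, not numbers from either study or from any vendor datasheet.

```python
import math

HOURS_PER_YEAR = 24 * 365  # assumes drives are powered on around the clock


def annualized_failure_rate(mtbf_hours: float) -> float:
    """Probability that a drive fails within one year of continuous use,
    assuming a constant failure rate (exponential lifetime model)."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def expected_failures_per_year(drive_count: int, mtbf_hours: float) -> float:
    """Expected number of failed drives per year across a population."""
    return drive_count * annualized_failure_rate(mtbf_hours)


# Illustrative inputs only (assumptions, not measured data).
datasheet_mtbf = 1_000_000  # hours
drive_population = 500

print(f"AFR: {annualized_failure_rate(datasheet_mtbf):.2%}")
print(f"Expected failures per year: "
      f"{expected_failures_per_year(drive_population, datasheet_mtbf):.1f}")
```

If the real-world MTBF is a fraction of the datasheet value, the expected number of failures per year rises accordingly, and that is the gap the two studies point at.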

Most enterprise storage vendors can provide you with a dual disk parity scheme, yet so many are reluctant to build systems that way. With all the things NetApp does well, this is something that seems to get lost in the mix. We've been shipping systems with RAID-DP (our dual disk parity scheme) for the better part of half a decade in order to guarantee the safety of our customers' data. As array vendors continue to add features and complexity to their offerings, the underlying data structure oftentimes gets glossed over, especially when data is being moved at the sub-LUN level and a single LUN may be spread across different RAID levels.
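
A toy example shows why a second parity matters. The sketch below uses plain XOR parity over three made-up data blocks. It is a simplified illustration of single parity in general, not RAID-DP's actual algorithm, and the block contents are arbitrary.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: List[bytes], parity: bytes) -> bytes:
    """XOR the survivors with the parity block to solve for what is missing."""
    return xor_blocks(surviving + [parity])


data = [b"AAAA", b"BBBB", b"CCCC"]  # one stripe across three data disks
parity = xor_blocks(data)            # single parity disk

# One failed disk: the lost block comes back exactly.
recovered = rebuild_missing([data[0], data[2]], parity)

# Two failed disks: single parity only yields the XOR of the two lost
# blocks, which is not enough to separate them again.
ambiguous = rebuild_missing([data[0]], parity)

print(recovered == data[1])   # the single loss is recoverable
print(ambiguous == data[1])   # the double loss is not
```

With one parity block there is one equation per stripe, so only one unknown can be solved for. A second, independently computed parity supplies the second equation, which is what lets a dual parity scheme survive a second drive failure during a rebuild.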

Although there is nothing particularly sexy or exciting about disk parity schemes, they are still a very, very important piece of the puzzle. If your solution providers and vendors aren't designing your systems with the highest levels of protection in mind, you could be unknowingly introducing the possibility of data loss in your environment. Safeguarding your business's information should be priority one, and with drive quality in doubt both now and in the future, why expose yourself to risk if you don't have to?

Sunday, October 2, 2011

Safely Virtualize Oracle on NetApp, VMware, and UCS


Virtualizing your Tier 1 applications is one of the last hurdles on the way to a truly dynamic and flexible datacenter. Large Oracle databases almost always fall into that category. In the past a lot of the concern revolved around performance, but with faster hardware and support for larger and larger virtual machines this worry is starting to fade away. The lingering question is what your software vendor will and won't support in a virtual environment.

Although Oracle has relaxed their stance on virtualization, they take the same approach that most do when it comes to support in virtual environments. Take for example the following excerpt from Oracle's database support matrix: Oracle will provide support for issues that are known to occur on the native OS, or can be demonstrated not to be a result of running on the server virtualization software. Oracle may request that the problem be reproduced on the native hardware.

That last part is the killer for most companies. How could you quickly re-create a multi-terabyte database on physical hardware once it is virtualized if there is a problem? Luckily NetApp, VMware, and Cisco UCS provide a very elegant solution to address this issue. Let's take a look at a simple diagram depicting a 10TB virtualized Oracle DB instance connected via 10GbE and utilizing Oracle's Direct NFS client.

The guest OS has been virtualized and resides on the VMFS datastore, the vSphere host is booting from SAN, and the database is directly hosted and accessed on the NetApp array using NFS. Each data volume in the picture is connected using a different technology to illustrate protocol independence (except for Oracle, where NFS is used for simplicity of setup).

As you can see from the diagram, the real challenge is re-creating that 10TB database in a way that is cost effective and fast. NetApp's FlexClone technology allows the instant creation of zero-space virtual copies. The process is similar to VMware's linked clones, but NetApp does it with LUNs or file data, and with no performance hit.
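
The idea behind a zero-space clone can be shown with a toy model in which a clone copies only the parent's block map and consumes new space only when a block is overwritten. This is a conceptual sketch of clone-by-reference in general, not how Data ONTAP implements FlexClone; the `Volume` class and `unique_blocks` helper are hypothetical names invented for the illustration.

```python
from typing import Dict, Optional


class Volume:
    """Toy volume: a map of block number -> reference to a block of data."""

    def __init__(self, blocks: Optional[Dict[int, bytes]] = None):
        self.blocks = dict(blocks or {})

    def clone(self) -> "Volume":
        # Copy only the block map (the pointers); the data itself is shared.
        return Volume(self.blocks)

    def write(self, block_no: int, payload: bytes) -> None:
        # New data lands in a new block; the other volume keeps the old one.
        self.blocks[block_no] = payload


def unique_blocks(volume: Volume, parent: Volume) -> int:
    """Count blocks in `volume` that are no longer shared with `parent`."""
    return sum(
        1 for block_no, block in volume.blocks.items()
        if parent.blocks.get(block_no) is not block
    )


prod = Volume({n: bytes([n % 256]) * 4096 for n in range(1000)})
test_copy = prod.clone()
space_after_clone = unique_blocks(test_copy, prod)   # nothing new stored yet

test_copy.write(7, b"\xff" * 4096)                    # change one block
space_after_write = unique_blocks(test_copy, prod)   # only that block is new

print(space_after_clone, space_after_write)
```

The clone is usable immediately because nothing is copied up front, and the parent never sees the clone's writes.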

To build your safety net follow the steps below.

  1. Create LUN on NetApp array
  2. Create UCS Service Profile Template
  3. Configure Service Profile Template and set to boot from LUN in step 1
  4. Deploy Service Profile from template
  5. Install same OS as virtualized instance (OEL 5.5 in this case)
  6. Create FlexClone of Oracle files/volumes
  7. Create exports and set permissions for newly created server
  8. Configure OS with mount points designed for FlexCloned file/volume

At this point you have a full physical environment for that 10TB virtualized Oracle database. The diagram below shows what this looks like.

The next step is to clean this up since you don't want this UCS blade occupied with the test environment.

  1. Shut down the OS
  2. Delete the Service Profile (not the template)
  3. Delete the FlexClone(s)

Now in the event you have some nasty database issue, and Oracle tells you to reproduce the issue on physical hardware, you can listen on the phone as the support guy's jaw hits the floor when you tell him to give you 5 minutes. The entire process can be scripted easily using the Data ONTAP and UCS PowerShell toolkits, or using an orchestration tool of your choice.
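
As a rough idea of what such a script could look like, here is a minimal runbook skeleton. It only encodes the order of operations from the two lists above. Every action is a hypothetical placeholder (`not_implemented_yet`) that you would replace with calls into your own tooling, and it assumes the boot LUN with the installed OS and the Service Profile Template were prepared once and kept around.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]


def not_implemented_yet() -> None:
    """Hypothetical placeholder: swap in a call to your own tooling."""


def run(steps: List[Step]) -> List[str]:
    """Run steps in order and return the names of those that completed."""
    completed = []
    for step in steps:
        step.action()  # an exception here stops the run at the failing step
        completed.append(step.name)
    return completed


# Assumption: the boot LUN (OS already installed) and the Service Profile
# Template already exist, so only the on-demand steps are listed here.
BUILD = [
    Step("Deploy Service Profile from template", not_implemented_yet),
    Step("Create FlexClone of Oracle files/volumes", not_implemented_yet),
    Step("Create exports and set permissions", not_implemented_yet),
    Step("Boot OS and mount the FlexCloned volumes", not_implemented_yet),
]

TEARDOWN = [
    Step("Shut down the OS", not_implemented_yet),
    Step("Delete the Service Profile (not the template)", not_implemented_yet),
    Step("Delete the FlexClone(s)", not_implemented_yet),
]

for name in run(BUILD) + run(TEARDOWN):
    print("done:", name)
```

Keeping the steps as data makes it easy to hand the same sequence to an orchestration tool later, or to add logging and error handling around each step.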

Reserving a blade or two for this unlikely scenario may seem wasteful to some, but because of the flexibility of UCS you can quickly spin that hardware up into production, whether to cover hardware maintenance without a performance hit or to add capacity on demand to your vSphere environment. With NetApp, VMware, and Cisco you can safely and efficiently take your company to a 100% virtualized private cloud environment.

Monday, May 16, 2011

Oooh Barracuda!

Earlier this year I was working on an issue setting up TLS encrypted email between my company and a new client using our Barracuda Spam Firewall. The process should have been a piece of cake, especially since we use STARTTLS commands on all outbound email. What we were running into with this particular client was that TLS would never negotiate, even when we forced encryption to their domain.

The Problem

I had opened and closed 2 separate tickets with Barracuda support, each time with the rep claiming the problem would be resolved if we implemented their suggestion. Each time I had to contact the client and schedule off-hours resources on their end to test the changes, and each time it failed. We were rapidly approaching the go-live date, and no encrypted email between our sites would effectively mean the agreed-upon workflows would need to be modified.

I once again called in, created a new ticket, and escalated with the prior 2 tickets as reference. After speaking with a top-tier tech at Barracuda, I was told that what we needed was not possible. The issue was that this client was running an ESMTP inspect process on their ASA, which was masking the banner information the Barracuda relied on to detect ESMTP support. As it was explained to me, the Barracuda was defaulting to non-extended commands and issuing a HELO as soon as it did not see ESMTP in the response. Because of this we never issued a STARTTLS, and the encryption negotiation would never take place. Not being able to force a STARTTLS when choosing to encrypt all email to a domain, regardless of banner reply, was a bug in my mind, and there was no way I was going back to this new (and very large) client to tell them we could not encrypt email between our sites in a seamless fashion.
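
Falling back to HELO matters because an SMTP server lists its extensions, STARTTLS included, only in its reply to EHLO. A HELO reply carries no extension list, so a sender that says HELO has nothing telling it that TLS is on offer. The sketch below parses the two kinds of reply to show the difference. The host name and reply text are made-up examples, and `parse_ehlo_extensions` is a hypothetical helper, not part of any Barracuda or Cisco product.

```python
from typing import Set


def parse_ehlo_extensions(reply: str) -> Set[str]:
    """Return the extension keywords advertised in a multi-line 250 reply.

    The first line is the server's greeting; each following line
    ("250-KEYWORD params" or the final "250 KEYWORD params") names one extension.
    """
    extensions = set()
    for line in reply.strip().splitlines()[1:]:
        if not line.startswith("250"):
            continue
        keyword = line[4:].split(" ", 1)[0].upper()
        if keyword:
            extensions.add(keyword)
    return extensions


# Made-up replies for illustration.
EHLO_REPLY = (
    "250-mail.example.com Hello\r\n"
    "250-SIZE 52428800\r\n"
    "250-STARTTLS\r\n"
    "250 8BITMIME\r\n"
)
HELO_REPLY = "250 mail.example.com\r\n"

print("STARTTLS" in parse_ehlo_extensions(EHLO_REPLY))  # offered after EHLO
print("STARTTLS" in parse_ehlo_extensions(HELO_REPLY))  # never seen after HELO
```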

The Solution

I learned a long time ago that engineers oftentimes see things only from the technical side, and in this case it was clear: the system as it was built today would not work the way we needed it to work. This is why I was getting nowhere fast with the standard support channel. In order to get a resolution I laid everything out in a lengthy email, explained the urgency of the issue, and sent it to the sales department. Involving the sales team brings in a different line of thinking (and motivation), and it was my hope they could escalate in a way that was not possible through standard methods.

What happened next was nothing short of amazing. I instantly received replies from members of the sales team, and within a few hours I received a call from the CTO and co-founder of the company, Zachary Levow. Mr. Levow wanted to understand the exact problem we were having, had me send him some logs (while working with an engineer on their side), and even emailed me his cell phone number, stating he would be my point of contact until the issue was resolved. Barracuda is a decent-sized company with over 130,000 organizations using their products, so working directly with their CTO certainly caught me off guard.

Needless to say, the work done in the next 48 hours far exceeded my expectations. Barracuda wrote custom code for our system, tested it without our client being involved, and promised to roll the changes into a permanent patch. Within a couple of weeks we saw this:

[BNSF-15994] Enhancement: If TLS Encryption is required per the DOMAINS > Manage Domain > ADVANCED > Email Protocol page, the Barracuda Spam & Virus Firewall will always issue an EHLO, regardless of welcome banner containing ESMTP.

So there you have it. Our little feature request made it into a full production release in a matter of weeks, so we could continue patching our system without fear of breaking the custom code. The client was very happy that we were able to work the issue on our side, as opposed to them making firewall changes or coming up with a different workflow. And it solved a very real issue that probably affected other users.
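
The change in behavior can be modelled as a small decision function. This is my own sketch of the rule as the release note words it, not Barracuda's code. The function name, the `always_ehlo_for_tls` flag, and the sample banners are all hypothetical.

```python
def choose_greeting(banner: str, tls_required: bool, always_ehlo_for_tls: bool) -> str:
    """Pick the SMTP greeting to send after reading the welcome banner.

    always_ehlo_for_tls=False models the banner-gated behavior we ran into;
    always_ehlo_for_tls=True models the enhancement in the release note.
    """
    if tls_required and always_ehlo_for_tls:
        return "EHLO"
    # Banner-gated fallback: only speak ESMTP if the banner advertises it.
    return "EHLO" if "ESMTP" in banner.upper() else "HELO"


# Sample banners (made up for illustration).
plain_banner = "220 mail.example.com ESMTP ready"
masked_banner = "220 ********************"

# Before the change, a masked banner led to HELO, so STARTTLS was never sent.
print(choose_greeting(masked_banner, tls_required=True, always_ehlo_for_tls=False))

# After the change, requiring TLS for the domain forces EHLO regardless.
print(choose_greeting(masked_banner, tls_required=True, always_ehlo_for_tls=True))
```

Domains that do not require TLS keep the old banner-gated fallback in this model, which matches the narrow wording of the enhancement.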

In the end we purchased 2 more Barracuda Spam Firewalls, and I'm sure we will continue to partner with this company long term. What could have been a bad experience was turned completely around by the hard work of the sales, technical, and executive teams at a great tech company. My hat's off to you, Barracuda; your product comes strongly recommended! For all my VMware tweeps, make sure to check out their new Spam Firewall Virtual Appliance.