Thursday, January 12, 2012

How to Cluster Windows Server 2003

Before you can install SQL Server 2005 clustering, you must first install Windows Server 2003 clustering services. Once it is successfully installed and tested, then you can install SQL Server 2005 clustering. In this article, we take a step-by-step approach to installing and configuring Windows 2003 clustering. In a later article, we will learn how to install SQL Server 2005 clustering.

Before Installing Windows 2003 Clustering

Before you install Windows 2003 clustering, you need to perform a series of important preparation steps. This is especially important if you didn’t build the cluster nodes yourself, as you want to ensure everything is working correctly before you begin the actual cluster installation. Once the steps are complete, you can install Windows 2003 clustering. Here are the steps you must take:
  • Double check to ensure that all the nodes are working properly and are configured identically (hardware, software, drivers, etc.).
  • Check to see that each node can see the data and Quorum drives on the shared array or SAN. Remember, only one node can be on at a time until Windows 2003 clustering is installed.
  • Verify that none of the nodes has been configured as a Domain Controller.
  • Check to verify that all drives are NTFS and are not compressed.
  • Ensure that the public and private networks are properly installed and configured.
  • Ping each node in the public and private networks to ensure that you have good network connections. Also ping the Domain Controller and DNS server to verify that they are available.
  • Verify that you have disabled NetBIOS for all private network cards.
  • Verify that there are no network shares on any of the shared drives.
  • If you intend to use SQL Server encryption, install the server certificate with the fully qualified DNS name of the virtual server on all nodes in the cluster.
  • Check all of the error logs to ensure there are no nasty surprises. If there are, resolve them before proceeding with the cluster installation.
  • Add the SQL Server and Clustering service accounts to the Local Administrators group of all the nodes in the cluster.
  • Check to verify that no antivirus software has been installed on the nodes. Antivirus software can reduce the availability of clusters and must not be installed on them. If you want to check for possible viruses on a cluster, you can always install the software on a non-node and then run scans on the cluster nodes remotely.
  • Check to verify that the Windows Cryptographic Service Provider is enabled on each of the nodes.
  • Check to verify that the Windows Task Scheduler service is running on each of the nodes.
  • If you intend to run SQL Server 2005 Reporting Services, you must install IIS 6.0 and ASP.NET 2.0 on each node of the cluster.
These are a lot of things you must check, but each of these is important. If skipped, any one of these steps could prevent your cluster from installing or working properly.
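The network checks in the list above are easy to script so that you can repeat them after every change. Here is a minimal Python sketch, not part of the original procedure, that pings a list of hosts from a Windows workstation. The host names SQL2005B and DC01 are hypothetical placeholders, and the sketch assumes the Windows ping command with its -n (count) switch. It treats a zero exit code as success, which is a simplification, so read the ping output yourself if a result looks suspicious.

```python
import subprocess


def ping_command(host, count=2):
    """Build a Windows ping command line (-n sets the number of echo requests)."""
    return ["ping", "-n", str(count), host]


def unreachable_hosts(hosts, run=subprocess.run):
    """Return the hosts whose ping command exits with a non-zero code.

    `run` is injectable so the logic can be exercised without a network.
    """
    failed = []
    for host in hosts:
        result = run(ping_command(host), capture_output=True)
        if result.returncode != 0:
            failed.append(host)
    return failed


# Example use (hypothetical names for the two nodes and the DC/DNS server):
#   for host in unreachable_hosts(["SQL2005A", "SQL2005B", "DC01"]):
#       print("No reply from", host)
```

Run it once against the public addresses and once against the private addresses, since both networks have to work.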

How to Install Windows Server 2003 Clustering

Now that all of your physical nodes and your shared array or SAN are ready, you can install Windows 2003 clustering. In this section, we take a look at the process, from beginning to end.
To begin, you must start the Microsoft Windows 2003 Clustering Wizard from one of the nodes. While it doesn’t make any difference to the software which physical node is used to begin the installation, I generally select one of the physical nodes to be my primary (active) node, and start working there. This way, I won’t potentially get confused when installing the software.
If you are using a SCSI shared array, and for many SAN shared arrays, you will want to make sure that the second physical node of your cluster is turned off when you install cluster services on the first physical node. This is because Windows 2003 doesn’t know how to deal with a shared disk until cluster services is installed. Once you have installed cluster services on the first physical node, you can turn on the second physical node, boot it, and then proceed with installing cluster services on the second node.

Installing the First Cluster Node

To begin your installation of Windows Server 2003 clustering, open Cluster Administrator. If this is the first cluster, then you will be presented with the following window.

From the Action drop-down box, select Create New Cluster and click OK. This brings up the New Server Cluster Wizard, as shown below.

Click Next to begin the wizard.

The next steps seem easy because of the nature of the wizard, but if you choose the wrong options, they can have negative consequences down the line. Because of this, it is important that you carefully think through each of your responses. Ideally, you will already have made these choices during your planning stage.
The first choice you must make is the domain the cluster will be in. If you have a single domain, this is an easy choice. If you have more than one domain, select the domain that all of your cluster nodes reside in.
The second choice is the name you will assign the virtual cluster. This is the name of the virtual cluster, not the name of the virtual SQL Server. About the only time you will use this name is when you connect to the cluster with Cluster Administrator. SQL Server 2005 clients will not connect to the cluster using this virtual name.
Once you enter the information, click Next to proceed.

Now, we have to tell the wizard the physical name of the node we want to install clustering on. Assuming that you are running the Cluster Wizard on the primary node of your cluster, then the computer name you see in the above screen will be the name of the physical node you are installing on. If you are installing from one node, but want to install clustering on a different node, you can, but it just gets confusing if you do. It is much easier to install on the same node.
Notice the Advanced button in the screen shot above. If you click on it, you will see the following.

Advanced Configuration Options allow you to choose between a Typical and an Advanced configuration. In almost all cases, the Typical configuration will work fine, and that is the option we use during this example. The Advanced configuration option is only needed for complex SAN configurations, and is beyond the scope of this article.
So click Cancel to return to the wizard, enter the correct physical node, if need be, and click Next.

This next step is very important. What the Cluster Wizard does is to verify that everything is in place before it begins the actual installation of the cluster service on the node. As you can see above, the wizard goes through many steps, and if you did all of your preparation correctly, when the testing is done, you will see a green bar under Tasks completed, and you will be ready to proceed. But if you have not done all the preliminary steps properly, you may see yellow or red icons next to one or more of the many tested steps, and a green or red bar under Tasks completed.
Ideally, you will want to see results similar to the figure above, with a green bar and no yellow icons next to the test steps. In some cases, you may see yellow warning icons next to one or more of the test steps, but still see a green bar at the bottom. While the green bar does indicate that you can proceed, it does not mean the cluster will be completed successfully or will be configured the way you want it to be. If you see any yellow warning icons, you can drill down into them and see exactly what the warning is. Read each warning very carefully. If the warning is something unimportant to you, it can be ignored. But in most cases, the yellow warnings need to be addressed. This may mean you will have to abort the cluster service installation at this time to fix the problem. Then you can try to install it again.
If you get any red warning icons next to any of the test steps, then you will also get a red bar at the bottom, which means that you have a major problem that needs to be corrected before you can proceed. Drill down to see the message and act accordingly. Most likely, you will have to abort the installation, fix the issue, and then try the installation again.
Assuming that the installation is green and you are ready to proceed, click Next.
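The decision rule described above is simple enough to write down. This small Python sketch is only my summary of it, with made-up color labels for the icons. It is not something the wizard exposes.

```python
def next_action(step_results):
    """Map the analysis icons to the action this article recommends.

    step_results: a list of "green", "yellow", or "red", one per tested step.
    """
    if "red" in step_results:
        # A red icon also turns the Tasks completed bar red.
        return "abort, fix the problem, and run the wizard again"
    if "yellow" in step_results:
        # The bar may still be green, but the warnings need to be read.
        return "read each warning before deciding whether to continue"
    return "click Next"
```

For example, next_action(["green", "yellow", "green"]) tells you to read the warnings, even though the wizard itself would let you continue.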

The next step is to enter the IP address of our virtual cluster. This is the IP address for the cluster, not the virtual SQL Server. The IP address must be on the same subnet as all of the nodes in the cluster. Click Next.
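If you want to double check the subnet requirement before you type the address into the wizard, Python's standard ipaddress module can do the arithmetic. This is a minimal sketch, and the addresses and subnet mask in the usage note are hypothetical examples, not values from my cluster.

```python
import ipaddress


def on_same_subnet(cluster_ip, node_ip, netmask):
    """Return True if cluster_ip falls inside the subnet the node's public NIC is on."""
    # strict=False lets us pass a host address and derive its network from the mask.
    network = ipaddress.ip_network(f"{node_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(cluster_ip) in network


def nodes_on_other_subnets(cluster_ip, node_ips, netmask):
    """List the node addresses that do not share a subnet with the cluster IP."""
    return [ip for ip in node_ips if not on_same_subnet(cluster_ip, ip, netmask)]
```

For instance, on_same_subnet("192.168.1.20", "192.168.1.11", "255.255.255.0") returns True, while a cluster address of 192.168.2.20 with the same node and mask returns False.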

Next you enter the name of the domain account you want to use as the cluster service account. You will also enter the account’s password and the name of the domain where the account was created. This account should have already been created in your domain and added to all of the cluster nodes in the Local Administrators Group. Click Next.

The next Cluster Wizard step is the Proposed Cluster Configuration. But before you click Next, be sure to click on the Quorum button and check which drive the Cluster Wizard has selected for the Quorum. In this case Drive Q has been chosen, which is correct. Most of the time, the Cluster Wizard will select the correct drive for the Quorum, but not always. This is why it is important to check to see if the correct drive was chosen. Because I named my Quorum drive “Q,” it is very easy for me to determine that the correct drive was chosen by the Cluster Wizard. That is why I earlier suggested that you name the Quorum drive “Q.”
Assuming everything is OK, click OK to accept the Quorum drive, and then click Next. At this time, the Cluster Wizard will reanalyze the cluster, again looking for any potential problems. If none is found, click Next, and then click Finish to complete the installation of Windows Server 2003 clustering on the first node.

Installing the Second Node of Your Cluster

Once you have installed the first node of your cluster, it is time to install the second node. Like the first node, the second node is installed from Cluster Administrator. Because the cluster already exists, we are just adding the second node to the currently existing cluster. You can install the second node from either the first node or the second node. Personally, I do it from the second node so that I don’t get confused.
To install the second node, turn it on (it should have been off while you installed the first node) and bring up Cluster Administrator. You will get the same window as you saw when you installed the first node. From here, select Add Nodes to Cluster. This brings up the Add Nodes Wizard, which is very similar to the previous New Server Cluster Wizard we just ran, except it has fewer options.
As the wizard proceeds, you will enter the name of the physical node to add to the current cluster, after which a series of tests will be automatically run to verify that the node is ready to be clustered. As before, if you run into any problems—yellow or red warnings—you should correct them first before continuing. Once all problems have been corrected, you are then asked to enter the password for the cluster service account (to prove that you have permission to add a node to the cluster) and the node is added to the cluster.


Verifying the Nodes With Cluster Administrator

Once you have successfully installed the two nodes of your cluster, it is a good idea to view the nodes from Cluster Administrator. When you bring up Cluster Administrator for the first time after creating a cluster, you may have to tell it to Open a Connection to Cluster, and type in the name of the virtual cluster you just created. Once you have done this, the next time you open Cluster Administrator it will automatically open this cluster for you by default.
After opening up Cluster Administrator, what you see will be very similar to the figure below.

Notice that two resource groups have been created for you: Cluster Group and Group 0. The Cluster Group includes three cluster resources: the Cluster IP Address, the Cluster Name, and the Quorum drive. These were all automatically created for you by the Cluster Wizard. We will talk more about Group 0 a little later.
When you look next to each cluster resource, the State for each resource should be Online. If not, then your cluster may have a problem that needs to be fixed. As a quick troubleshooting technique, if any of the resources are not Online, right-click on the resource and choose Bring Online. In some cases, this will bring the resource online and you will not experience any more problems. But if this does not work, then you need to begin troubleshooting your cluster.
Also, next to each resource is listed the Owner of the resource. All the resources in a resource group will always have the same owner. Essentially, the owner is the physical node where the cluster resources are currently running. In the example above, the physical node they are running on is SQL2005A, which is the first node in my two-node cluster. If a failover occurs, then all of the resources in the resource group will move to the other node in your cluster.


How to Configure Windows Server 2003 for Clustering

Before you install SQL Server clustering, there is one small step you need to perform, and that is to prepare a resource group for the SQL Server resources that will be created when SQL Server is installed.
Most likely, when you created the cluster, as above, you will see a resource group named Group 0. This resource group was created when the cluster was created, and it most likely includes the shared resource for your SQL Server databases to use. See below.

In my example, Disk F, the shared array for SQL Server, is in Group 0. If you like, you can leave the resource group with this name, but it is not very informative. I suggest that you rename Group 0 to SQL Server Group. You can do this by right-clicking on Group 0 and selecting Rename.
In some cases, the Cluster Wizard may put the SQL Server shared disk array in the Cluster Group resource group and not create a Group 0. If this is the case, then you will need to create a new resource group and then move the SQL Server shared disk array from the Cluster Group to the newly created SQL Server resource group.
Here’s how you create a new resource group using Cluster Administrator:
  • Start Cluster Administrator.
  • From the File menu, select New, then select Group. This starts the New Group Wizard.
  • For the Name of the group, enter “SQL Server Group.” Optionally, you can also enter a description of this group. Click Next.
  • Now, you must select which nodes of your cluster will be running SQL Server. This of course will be all of your nodes. The nodes are listed on the left side of the wizard. CTRL-click each of the nodes on the left and then select Add. This will move the selected nodes from the left side of the wizard to the right side. Click Finish.
The new SQL Server Group resource group has now been created.
Now that the group has been created, it must be brought online. Here’s how.
  • From Cluster Administrator, right-click on the SQL Server resource group (it will have a red dot next to it) and select Bring Online.
  • The red dot next to the resource group name goes away, and the SQL Server Group resource group is now online and ready for use.
Now, your next step is to move any disk resources from the Cluster Group (except the Quorum drive) to the SQL Server Group. This is a simple matter of dragging and dropping the disk resources from the Cluster Group to the SQL Server Group. Once you have done this, you are ready for the next step.
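If you prefer a script to the GUI, the same three steps (create the group, bring it online, move the data disks) can be expressed with the cluster.exe command-line tool that ships with Windows Server 2003. The Python sketch below only builds the command lines as text so you can review them. It does not run anything. The switches are written from my recollection of cluster.exe, so treat them as assumptions and confirm them with cluster /? first. The disk resource names ("Disk F:", "Disk Q:") are hypothetical, and the sketch does not set the node list that the New Group Wizard asks for.

```python
def sql_group_commands(group, disk_resources, quorum_resource):
    """Return cluster.exe command lines (as text) in the order they would be run.

    The /create, /online, and /moveto switches are assumed, not verified here.
    """
    commands = [
        f'cluster group "{group}" /create',
        f'cluster group "{group}" /online',
    ]
    for disk in disk_resources:
        if disk == quorum_resource:
            # The Quorum drive must stay in the Cluster Group.
            continue
        commands.append(f'cluster resource "{disk}" /moveto:"{group}"')
    return commands


# Example (hypothetical resource names):
#   for line in sql_group_commands("SQL Server Group", ["Disk F:", "Disk Q:"], "Disk Q:"):
#       print(line)
```

Filtering out the Quorum resource inside the function is deliberate. It is the one disk that should never leave the Cluster Group, so the script skips it even if you pass it in by mistake.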


Test, Test, and Test Again

Once you have installed Windows 2003 clustering on your nodes, you need to thoroughly test the installation before beginning the SQL Server 2005 cluster install. If you don’t, and problems arise later with Windows 2003 clustering, you may have to remove SQL Server 2005 clustering to fix it, so you might as well identify any potential problems and resolve them now.
Below are a series of tests you can perform to verify that your Windows 2003 cluster is working properly. After you perform each test, verify that you get the expected results (a successful failover). Also be sure to check the Windows event log files for any possible problems. If you find a problem during one test, resolve it before proceeding to the next test. Once you have performed all of these tests successfully, then you are ready to continue with the cluster installation.


Preparing for the Tests

Before you begin testing, identify a workstation that has Cluster Administrator on it, and use this copy of Cluster Administrator for interacting with your cluster during testing. You will get a better test using a remote copy of Cluster Administrator than trying to use a copy running on one of the cluster nodes.

Move Groups Between Nodes

The easiest test to perform is to use Cluster Administrator to manually move the Cluster Group and SQL Server resource groups from the active node to a passive node, and then back again. To do this, right-click on the Cluster Group and then select Move Group.
Once the group has been successfully moved from the active node to a passive node, then use the same procedure above to move the group back to the original node. The moves should be fairly quick and uneventful. Use Cluster Administrator to watch the failover and failback, and check the Event Logs for possible problems. After moving the groups, all of the resources in each group should be in the online state. If not, you have a problem that needs to be identified and corrected.
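If you repeat this test often, it helps to write down the pass condition. The Python sketch below does two small things: it builds the text of a cluster.exe move command, and it lists any resources that are not Online after the move. The /moveto switch is written from my recollection of cluster.exe and should be confirmed with cluster /?. The node name SQL2005B is hypothetical, and you supply the resource states yourself by reading them from Cluster Administrator.

```python
def move_group_command(group, target_node):
    """Build the text of a cluster.exe command that moves a group to another node."""
    return f'cluster group "{group}" /moveto:{target_node}'


def resources_not_online(states):
    """Return the names of resources whose state is anything other than Online.

    states: a mapping of resource name -> state as shown in Cluster Administrator.
    """
    return sorted(name for name, state in states.items() if state != "Online")


# Example: after moving the Cluster Group to the (hypothetical) node SQL2005B,
# record what Cluster Administrator shows and check it.
#   states = {"Cluster IP Address": "Online", "Cluster Name": "Online", "Disk Q:": "Online"}
#   assert resources_not_online(states) == []
```

An empty list means the test passed. Anything else is a resource you need to investigate before moving on.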

Manually Initiate a Failover in Cluster Administrator

This test is also performed from Cluster Administrator. Select any of the resources found in the Cluster Group resource group (not the cluster group itself), right-click on it, and select Initiate Failure. Because the cluster service always tries to recover up to three times from a failure, if it can, you will have to select this option four times before a test failover is initiated. Watch the failover from Cluster Administrator. After the failover, fail back using the same procedure as described above, again watching the activity from Cluster Administrator. Check the Event Logs for possible problems. After this test, all of the resources in each group should be in the online state. If not, you have a problem that needs to be identified and corrected.
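To see why it takes four clicks, here is a toy Python model of the behavior described above. It assumes exactly three in-place restart attempts before a failover. The real cluster service applies per-resource restart settings, so this is an illustration of the counting and not a description of its internals.

```python
def simulate_initiate_failure(clicks, restart_attempts=3):
    """Return what happens on each Initiate Failure click in this simplified model."""
    events = []
    restarts_used = 0
    for _ in range(clicks):
        if restarts_used < restart_attempts:
            # The cluster service first tries to recover the resource in place.
            restarts_used += 1
            events.append("restart on same node")
        else:
            # Restart attempts are used up, so the group fails over.
            events.append("failover to other node")
            restarts_used = 0
    return events
```

Calling simulate_initiate_failure(4) gives three "restart on same node" entries followed by one "failover to other node", which matches the four clicks needed in Cluster Administrator.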

Manually Failover Nodes by Turning Them Off

This time, we will only use Cluster Administrator to watch the failover activity, not to initiate it. First, power off the active node hard, without shutting Windows down normally. Once this happens, watch the failover in Cluster Administrator. Once the failover occurs, turn the former active node back on and wait until it fully boots. Then power off the now current active node the same way, and again watch the failover in Cluster Administrator. After the failover occurs, turn the powered-off node back on. Check the Event Logs for possible problems. After this test, all of the resources in each group should be in the online state. If not, you have a problem that needs to be identified and corrected.

Manually Failover Nodes by Breaking the Public Network Connections

In this test, we will see what happens if network connectivity fails. First, both nodes being tested should be on. Second, unplug the public network connection from the active node. This will cause a failover to a passive node, which you can watch in Cluster Administrator. Third, plug the public network connection back into the server. Fourth, unplug the public network connection from the now active node. This will cause a failover to the current passive node, which you can watch in Cluster Administrator. Once the testing is complete, plug the network connection back into the server. Check the Event Logs for possible problems. After this test, all of the resources in each group should be in the online state. If not, you have a problem that needs to be identified and corrected.

Manually Failover Nodes by Breaking the Shared Array Connection

This test is always exciting as it is the test that is most apt to identify potential problems. First, from the active node, remove the shared array connection. This will cause a failover that you can watch in Cluster Administrator. Now reconnect the broken connection. Second, from the now active node, remove the shared array connection. Watch the failover in Cluster Administrator. When done, reconnect the broken connection. Check the Event Logs for possible problems. After this test, all of the resources in each group should be in the online state. If not, you have a problem that needs to be identified and corrected.
As I mentioned before, if any particular test produces unexpected problems, such as failover not working or errors appearing in the Event Logs, identify and resolve them now before proceeding with the next test. Once you have resolved any problems, be sure to repeat the test that originally indicated the problem in order to verify that it has been fixed.
Now that you have completed the Windows 2003 cluster installation and have tested it, you are ready to install and configure the Microsoft Distributed Transaction Coordinator.

Configuring the Microsoft Distributed Transaction Coordinator

While not required, it is recommended that you install the Microsoft Distributed Transaction Coordinator (MS DTC) on each of the cluster nodes before installing SQL Server 2005 clustering. This is because SQL Server 2005 requires this service in order to perform some functions, including running distributed queries, two-phase commit transactions, and some aspects of replication. MS DTC must be installed after installing Windows 2003 clustering, but before installing SQL Server 2005 clustering.

Installing MS DTC Using Cluster Administrator

While MS DTC can be set up for clustering from the command line, it is much easier to use Cluster Administrator, as described below. This is because this procedure automatically configures MS DTC on all of the cluster nodes at the same time. Take your time to ensure that you do it right the first time.
  • Start Cluster Administrator.
  • Right-click on the Cluster Group resource group, select New, then Resource. This starts the New Resource Wizard.
  • In the first screen of the Resource Wizard, enter the name of the resource you are creating, which would be “MSDTC Resource.” If you like, you can also enter an optional description of this resource. Under Resource Type, select Distributed Transaction Coordinator. Under Group, Cluster Group should already be displayed. Click Next.
  • In the Possible Owners dialog box, you will see that all of the nodes of the cluster are listed under Possible Owners. This is correct and should not be changed. Click Next.
  • In the Dependencies dialog box, press and hold the CTRL key, click the Quorum disk resource and the Cluster Name, then click Add. Then click Finish.
At this time, the MSDTC Resource is created.
Now that the resource has been created, it must be brought online. Here’s how.
  • From Cluster Administrator, right-click on the MSDTC Resource (it will have a red dot next to it) and select Bring Online.
The red dot next to the resource name goes away, and the MSDTC Resource is now online and ready for use. If the new resource won’t come online, delete it and try again.
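For completeness, here is roughly what the command-line route mentioned earlier looks like. The Python sketch below only builds the cluster.exe command lines as text for review. It does not run them. The switches (/create, /group, /type, /adddep, /online) are written from my recollection of cluster.exe, so treat them as assumptions and confirm them with cluster /? before use. The Quorum disk resource name "Disk Q:" is hypothetical.

```python
def msdtc_commands(resource="MSDTC Resource", group="Cluster Group",
                   dependencies=("Disk Q:", "Cluster Name")):
    """Return cluster.exe command lines (as text) that mirror the wizard steps above.

    All switches are assumed from memory of cluster.exe and are not verified here.
    """
    commands = [
        f'cluster resource "{resource}" /create /group:"{group}" '
        f'/type:"Distributed Transaction Coordinator"'
    ]
    for dependency in dependencies:
        # The wizard steps add the Quorum disk and the Cluster Name as dependencies.
        commands.append(f'cluster resource "{resource}" /adddep:"{dependency}"')
    commands.append(f'cluster resource "{resource}" /online')
    return commands
```

Printing the result of msdtc_commands() gives you four lines that you can compare against the wizard steps one by one before deciding whether to run them.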

Ready to Install SQL Server 2005

Finally, you are ready to install SQL Server 2005 clustering. This topic will be covered in my next article.
