Tuesday, March 21, 2017

Cluster Continuous Replication - Solving the failover

What is CCR in MS Exchange Server?

Many Microsoft Exchange users run into problems with Cluster Continuous Replication (CCR). This post discusses common CCR problems, their causes, and their solutions. Microsoft Exchange Server has two fundamental storage elements: the transaction logs and the database. Information is first written to the Extensible Storage Engine (ESE) memory cache, then to a transaction log by a log writer, and finally to the database (.edb file). CCR is one of the high availability features of Microsoft Exchange Server 2007. It combines the asynchronous log shipping and replay technology in Exchange 2007 with the failover and management features provided by the Cluster service. It offers high availability for the Exchange 2007 mailbox server with:
  • no single point of failure
  • no requirement of shared storage
  • no special hardware requirements
  • reduced full backup frequency and a smaller total volume of backed-up data

However, users do encounter issues while running CCR and then have to search for a way to troubleshoot them. This post describes the most common CCR issues and explains how to troubleshoot each one.

Problems & Solutions of Cluster Continuous Replication

This section covers each CCR issue and its solution, based on the cause of the problem.

Problem 1

Get-StorageGroupCopyStatus reports “Failed”, and the FailedMessage value indicates that the database has not been seeded.

Reason:
The replication copy does not have a valid baseline database, or there is a configuration problem. The usual cause is that the storage group copy was not seeded when the passive node was added.
Solution:
  • Confirm that the storage for the copy is configured and working properly. If an error was found and corrected, initiate a new check of the copy by suspending and resuming the storage group copy.
  • Confirm that the storage group and database paths are configured correctly relative to the storage on the passive server. Use the Exchange Management Console, or run the Get-StorageGroup cmdlet in the Exchange Management Shell.
  • To seed the storage group copy, use the Update-StorageGroupCopy cmdlet.
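
The steps above might look like the following in the Exchange Management Shell. This is a minimal sketch, not a definitive procedure: CMS01 (the clustered mailbox server) and SG1 (the storage group) are hypothetical names, and the reseed is assumed to be run from the passive node.

```powershell
# Hypothetical identity: replace CMS01 and SG1 with your own names.
$sg = "CMS01\SG1"

# Check the current state of the copy and the reported failure text.
Get-StorageGroupCopyStatus -Identity $sg | Format-List SummaryCopyStatus, FailedMessage

# Review the configured log and system paths for the storage group.
Get-StorageGroup -Identity $sg | Format-List Name, LogFolderPath, SystemFolderPath

# Suspend replication, then reseed the passive copy.
# -DeleteExistingFiles removes the existing files of the copy before seeding.
Suspend-StorageGroupCopy -Identity $sg -SuspendComment "Reseeding passive copy"
Update-StorageGroupCopy -Identity $sg -DeleteExistingFiles

# Confirm that the copy is no longer reported as failed.
Get-StorageGroupCopyStatus -Identity $sg | Format-List SummaryCopyStatus, FailedMessage
```

Reseeding copies the whole database, so on a large database it is worth ruling out a storage or path problem first.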

Problem 2

Get-StorageGroupCopyStatus reports “Failed”, and the FailedMessage value indicates that the storage group copy has diverged.

Reason:
A failover occurred and so many logs were lost that the database on the currently active server cannot be resynchronized with the previously active server without a complete reseed.
Solution:
Reseed the storage group copy using the Update-StorageGroupCopy cmdlet.

Problem 3

Get-StorageGroupCopyStatus indicates that the database is “Failed”, and the FailedMessage value gives specific information about the source of the failure.

Reason:
A storage group copy can be reported as failed for many different reasons. The two cases above, a copy that was never seeded and a copy that has diverged, are examples. Here the FailedMessage value identifies the specific problem that was detected.
Solution:
  • Run the Get-StorageGroupCopyStatus cmdlet to get the full FailedMessage value, which describes the detected problem.
  • Analyze the FailedMessage text and try to resolve the reported condition.
  • If the reported condition is a missing or corrupted log, try to locate a non-corrupted log with the correct generation number.
  • If the correct log cannot be found, run the Update-StorageGroupCopy cmdlet to reseed the copy.
  • If there is no log, delete the share from the source’s log directory and restart the replication service on that node.
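
A short sketch of how the full FailedMessage might be retrieved and how the replication service could be restarted. The identity CMS01\SG1 is a hypothetical placeholder, and MSExchangeRepl is assumed to be the service name of the Microsoft Exchange Replication Service on the node.

```powershell
# Hypothetical identity: replace CMS01 and SG1 with your own names.
$sg = "CMS01\SG1"

# Format-List prints the complete FailedMessage text,
# which the default table view can truncate.
Get-StorageGroupCopyStatus -Identity $sg |
    Format-List Identity, SummaryCopyStatus, FailedMessage, CopyQueueLength, ReplayQueueLength

# Restart the replication service on the affected node (run locally on that node).
Restart-Service -Name MSExchangeRepl
```

The copy and replay queue lengths are included because a growing queue alongside a failure message can hint at whether log copying or log replay is the stage that stopped.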

Problem 4

Get-ClusteredMailboxServerStatus indicates that one or more databases failed. Alternatively, the failover succeeds, but some databases do not mount, either automatically or manually.

Reason:
The failover resulted in more lost logs than the automatic mount configuration allows. Another possible cause is that the passive copy was not healthy at the time of the failure.
Solution:
  • First, review the event log to find out why the database failed to mount.
  • One possible cause is corruption in the database or log files. In that case, access to the database can be restored by moving the clustered mailbox server to the other node.
  • The event log also shows which failure occurred.
  • After determining the state of the storage group copy’s database, make it mountable by running the Restore-StorageGroupCopy cmdlet in the Exchange Management Shell.
  • After that, run the Get-StorageGroupCopyStatus cmdlet and check the SummaryCopyStatus value to determine whether the state of the previously active copy is the reason for the mount failure.
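
The sequence above could be sketched as follows. CMS01, NODE2, and SG1 are hypothetical names, and filtering the Application log on sources that start with "MSExchange" is an assumption about where the relevant events appear. Treat this as an illustration, not as a prescribed recovery procedure.

```powershell
# Look at recent Exchange-related errors in the Application log.
Get-EventLog -LogName Application -Newest 200 |
    Where-Object { $_.Source -like "MSExchange*" -and $_.EntryType -eq "Error" } |
    Format-List TimeGenerated, Source, EventID, Message

# Move the clustered mailbox server to the other node (hypothetical names).
Move-ClusteredMailboxServer -Identity CMS01 -TargetMachine NODE2 -MoveComment "Database failed to mount"

# Make the storage group copy mountable, then check its status.
Restore-StorageGroupCopy -Identity "CMS01\SG1"
Get-StorageGroupCopyStatus -Identity "CMS01\SG1" | Format-List SummaryCopyStatus, FailedMessage
```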

Problem 5

At startup in a CCR environment, a database fails to mount.

Reason:
One possible reason an Exchange database fails to mount in a cluster continuous replication environment is an explicit administrative action. If the database was explicitly dismounted and the clustered mailbox server was then taken offline, the database will not be brought online at the next startup. Another possible reason is that an unacceptable number of logs was lost during a failover.
Solution:
  • In the Exchange Management Shell, run the Get-ClusteredMailboxServerStatus cmdlet to confirm that the store is running on the node.
  • Try to mount the affected database using either the Exchange Management Shell or the Exchange Management Console.
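
In the Exchange Management Shell, these two checks might look like the sketch below. CMS01, SG1, and "Mailbox Database" are hypothetical names used only for illustration.

```powershell
# Confirm the state of the clustered mailbox server and see which node is active.
Get-ClusteredMailboxServerStatus -Identity CMS01

# Try to mount the affected database (identity format: Server\StorageGroup\Database).
Mount-Database -Identity "CMS01\SG1\Mailbox Database"

# List the mount state of the mailbox databases on the server.
Get-MailboxDatabase -Server CMS01 -Status | Format-Table Name, StorageGroup, Mounted
```

If the mount still fails, the error returned by Mount-Database and the matching event log entries are the next things to check.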

Conclusion

In Exchange Server 2007, Cluster Continuous Replication (CCR) provides high availability for mailbox servers, but there are situations in which users run into problems with it. This post has covered the most common of those problems and how to troubleshoot them. Choose the solution that matches the cause of the error.
