Wednesday, August 31, 2005

Information about Bb5 Outage

David Carter-Tod (ITS - Client Services, Virginia Community College System) posted the following on Wednesday, August 31, 2005 - 4:17 pm

IF YOU ARE A FACULTY MEMBER WHO NEEDS ACCESS TO Bb5, READ THE FOLLOWING CAREFULLY AND CONTACT ME, RUTH SMITH (smithru@tncc.edu), 825-2807, BEFORE DOING ANYTHING!


The following message has just been sent to the Technology Council, Academic Vice-Presidents, and CSLs.

---
On August 23, the Blackboard 5 environment experienced a catastrophic failure. Since that time, VCCS engineers have been working diligently to restore service. The length of the outage, together with its severe impact on the students and faculty using the system, calls for a more detailed explanation than has been provided to date. I will therefore attempt to summarize events as best I can.

The cause appears to have been a hardware failure in the storage system behind the shared content file system, where all files attached to courses are stored. The storage system consists of an array of hard drives similar to those found in a desktop computer, though much faster in terms of read and write times. To guard against the failure of any one hard drive, these drives are configured to store all data redundantly across the drives in the array. In the event of a failure, the information stored on a failed disk can be reconstructed using the data on the remaining ones.
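As a rough illustration of how that reconstruction can work, here is a minimal Python sketch of single-parity redundancy using XOR. The message does not say which redundancy scheme the Blackboard 5 array used, so the parity layout, the three-drive example, and the helper name xor_blocks are assumptions made only for illustration. The sketch also shows why losing two drives at once defeats this kind of protection.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data drives, each holding one 4-byte block.
data = [b"QUIZ", b"DOCS", b"PPTS"]

# The parity block (assumed to live on a fourth drive) is the XOR of all data blocks.
parity = xor_blocks(data)

# One drive fails: XOR the surviving data blocks with parity to rebuild it.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# Two drives fail: the survivors only yield the XOR of the two lost blocks,
# which is not enough to recover either block on its own.
mixed = xor_blocks([data[0], parity])
assert mixed == xor_blocks([data[1], data[2]])
```

With one drive missing, the rebuilt block matches the original exactly. With two missing, only a combination of the lost blocks remains, which is when a restore from backup becomes necessary.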

It would appear that the Blackboard 5 system experienced multiple disk failures, resulting in a corrupt file system that is not recoverable. The system is, however, backed up onto tapes for just such an event. The VCCS uses a backup system from IBM called Tivoli Storage Manager. The last backup of the system was performed on August 13, just prior to the release of the new Blackboard environment, Blackboard 6. VCCS engineers are now working to restore the Blackboard system from these backup tapes. The immense size of the file system (approximately 12 million files), which serves all 23 colleges in the VCCS, has complicated the restore process. The restore has failed several times, with each attempt taking up to 14 hours. This has resulted in a much longer outage than was originally anticipated.

The problems experienced are complex and extremely difficult to overcome given the long time required for each attempt. At this point, we have completed a partial restore and can allow access to those users with an urgent need. It must be understood, however, that the file system is only partially restored at this point. Users will occasionally find broken links or missing content. This is to be expected at this stage.

Below are some guidelines for instructors using the partially restored system:

From a user's perspective, what may be missing are files that were attached to course elements, for example, uploaded .doc or .ppt files. These could be in regular content areas (e.g. Course Information, Syllabus) or in quizzes where images were used.

Faculty:

  • Do not remove broken links; the missing content will be restored in due time. If a link is removed, its content will not be accessible once the restore is complete.
  • Missing content may be available from courses copied to Blackboard 6 or from local college backups of Blackboard courses. If content that is critical to users is missing in Blackboard 5 but available in Blackboard 6, faculty may choose to upload it to Blackboard 5 for use during this restoration period. We recommend that it be uploaded as a new item rather than as a replacement for the missing content item.
  • Students must be informed of the limitations of the current Blackboard 5 environment and asked to refrain from calling the college help desk or their Blackboard administrator to report missing content or broken links.

The partially restored Blackboard 5 environment can be accessed by going to the link below.

http://164.106.66.46/, as well as the link from Blackboard 6, will continue to show the outage notice until the restore process is complete.

Engineers believe they have resolved the issues that caused previous restoration attempts to fail. The restoration of all missing content has begun, but the restored content will initially be written to another location so that users can access the system during the process. When the restoration is complete, a short outage will be required to add the restored content to the system. This outage will be announced in advance. It is anticipated that the restore process now underway will take at least two days to complete.

It is important to note that the vulnerabilities of the Blackboard 5 environment have been largely eliminated in the Blackboard 6 environment. Not only is a failure of this type much less likely, but restoration would also be easier and quicker. I will address this topic in more detail later.

Please feel free to contact me if you have questions.

Ralph Lucia
