RAID EE Technology

As hard disk capacity increases, the time required to rebuild RAID data has also increased dramatically, making rebuilds one of the most troublesome aspects of enterprise storage management today. In the days when hard disk capacities ranged from 10GB to 100GB, a RAID rebuild was a job that could be completed in ten minutes or a few tens of minutes, and it was not a particular concern. However, as disk capacity grows to hundreds of GB and even TB, RAID rebuild time has increased to hours or even days, and it has become a major problem in storage management.

 

Why RAID Rebuild is Time-Consuming

There are several factors that affect the RAID rebuild time:

• HDD Capacity: The larger the capacity of the HDDs that make up the disk group, the longer the rebuild time required.

• Quantity of Disk Drives: The quantity of disk drives in a disk group affects how long it takes the system to read data from the remaining healthy disk drives and write it to the hot spare disk drives. The more disk drives, the longer the rebuild time.

• Rebuild Job Priority: During a RAID rebuild, the system still has to service I/O from the front-end host. The higher the priority assigned to the rebuild job, the faster the rebuild, but the less I/O performance the front-end host receives.

• Fast Rebuild: With the fast rebuild function enabled, only the capacity actually used by volumes needs to be rebuilt; unused disk group space is skipped. If only part of the space in a disk group is used by volumes, the rebuild time is shortened.

• RAID Level: RAID 1 and RAID 10, which use direct block-to-block replication, rebuild faster than RAID 5 and RAID 6, which require parity calculations.

Given the potential for failure of each disk drive, the more disk drives a disk group contains, the greater the chance of cumulative failures, so there is an upper limit on the quantity of disk drives in a disk group. Compared with the other factors, the impact of growing disk drive capacity on rebuild speed has become the primary factor. Such long rebuild times are clearly not acceptable to users. To solve the problems of traditional RAID, we implement RAID EE technology.
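
To see why capacity dominates, consider a back-of-the-envelope estimate. The following Python sketch is our illustration (the rate, priority, and Fast Rebuild values are assumed, not measured figures): rebuild time is essentially the failed drive's used capacity divided by the effective write rate to a single hot spare.

```python
# Back-of-the-envelope rebuild time estimate (illustrative assumptions only).

def rebuild_hours(capacity_gb, rate_mb_s=100, priority=1.0, used_fraction=1.0):
    """Time to write a failed drive's (used) capacity to one hot spare.
    priority < 1.0 models host I/O competing with the rebuild job;
    used_fraction < 1.0 models Fast Rebuild skipping unused space."""
    effective_rate_mb_s = rate_mb_s * priority
    return (capacity_gb * used_fraction * 1024) / effective_rate_mb_s / 3600

print(f"{rebuild_hours(100):.1f} h")                 # ~0.3 h for a 100GB drive
print(f"{rebuild_hours(8000):.1f} h")                # ~22.8 h for an 8TB drive
print(f"{rebuild_hours(8000, priority=0.5):.1f} h")  # ~45.5 h with heavy host I/O
```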

Theory of RAID EE

RAID EE adds more spare disks to a disk group; we call them RAID EE spares to distinguish them from the original global, local, and dedicated spares. Spare areas are preserved in each stripe of the disk group and are distributed across the disk group by disk rotation. When a disk in the disk group fails, the missing data is rebuilt into the preserved spare areas. Since every disk in the set is a destination for the rebuilt data, the bottleneck of a traditional RAID rebuild is gone, and rebuild performance improves dramatically. If new disks are added, the data in the spare areas is copied back to the newly joined disks.
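
The scaling benefit can be sketched with a toy throughput model (ours, with assumed numbers; it ignores read-side and parity overhead): a traditional rebuild funnels every write into one hot spare, while RAID EE spreads the writes across the spare areas of all surviving disks.

```python
# Toy model: rebuild time is limited by how many disks can absorb
# the rebuild writes in parallel. All numbers are assumptions.

def rebuild_seconds(failed_mb, per_disk_mb_s, write_targets):
    return failed_mb / (per_disk_mb_s * write_targets)

FAILED_MB = 1_000_000  # 1TB of data on the failed disk
RATE_MB_S = 100        # assumed per-disk sustained write rate

# Traditional RAID: one hot spare is the single write destination.
print(rebuild_seconds(FAILED_MB, RATE_MB_S, write_targets=1))  # 10000.0 s
# RAID EE in a 10-disk group: 9 surviving disks share the writes.
print(rebuild_seconds(FAILED_MB, RATE_MB_S, write_targets=9))  # ~1111.1 s
```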

Four new RAID levels are provided for RAID EE; they are:

• RAID 5EE (E stands for Enhanced) requires a minimum of 4 disk drives with one RAID EE spare disk, and can tolerate two disk drive failures (one at a time, with a rebuild in between). Adding more RAID EE spares allows more disk drive failures to be tolerated.

• RAID 6EE requires a minimum of 5 disk drives.

• RAID 50EE requires a minimum of 7 drives.

• RAID 60EE requires a minimum of 9 drives.

 
The characteristics of the four levels are summarized below (N = drive quantity, M = drive capacity, G = number of subgroups, S = number of RAID EE spares):

RAID 5EE
• Min. drives: 4
• Data protection: one drive failure
• Read performance: very good
• Write performance: good
• Capacity: (N - 1 - S) x M (e.g., (10 drives - 1 - 2 spares) x 1TB = 7TB)
• Capacity utilization (min. to 26 drives): 18%~92% (e.g., 7/10 = 70%)
• Typical applications: data warehouse, web service, archive

RAID 6EE
• Min. drives: 5
• Data protection: two drive failures
• Read performance: very good
• Write performance: fair to good
• Capacity: (N - 2 - S) x M (e.g., (10 drives - 2 - 2 spares) x 1TB = 6TB)
• Capacity utilization (min. to 26 drives): 17%~88% (e.g., 6/10 = 60%)
• Typical applications: data archive, high availability solutions, servers with large capacity requirements

RAID 50EE
• Min. drives: 7
• Data protection: one drive failure per subgroup
• Read performance: very good
• Write performance: good
• Capacity: (N - G - S) x M (e.g., (10 drives - 2 subgroups - 2 spares) x 1TB = 6TB)
• Capacity utilization (min. to 26 drives): 29%~88% (e.g., 6/10 = 60%)
• Typical applications: large databases, file servers, application servers

RAID 60EE
• Min. drives: 9
• Data protection: two drive failures per subgroup
• Read performance: very good
• Write performance: fair to good
• Capacity: (N - 2 x G - S) x M (e.g., (10 drives - 2 x 2 subgroups - 2 spares) x 1TB = 4TB)
• Capacity utilization (min. to 26 drives): 25%~80% (e.g., 4/10 = 40%)
• Typical applications: data archive, high availability solutions, servers with large capacity requirements
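
As a quick check of the formulas above, a minimal Python sketch (ours; the helper names are not vendor APIs, and the "base RAID minimum plus spares" pattern for minimum drive counts is inferred from the list above) reproduces the example figures:

```python
# N = drive quantity, M = drive capacity (TB), G = subgroups, S = RAID EE spares.

PARITY_DRIVES = {"RAID 5EE": lambda g: 1, "RAID 6EE": lambda g: 2,
                 "RAID 50EE": lambda g: g, "RAID 60EE": lambda g: 2 * g}

BASE_MIN = {"RAID 5EE": 3, "RAID 6EE": 4,    # RAID 5 / RAID 6 minimums
            "RAID 50EE": 6, "RAID 60EE": 8}  # two subgroups of 3 / 4 drives

def min_drives(level, s=1):
    """Minimum drives = base RAID minimum + RAID EE spares."""
    return BASE_MIN[level] + s

def usable_capacity(level, n, m, g=1, s=1):
    """Usable capacity = (N - parity drives - S) x M."""
    return (n - PARITY_DRIVES[level](g) - s) * m

# The table's examples: 10 x 1TB drives, 2 subgroups, 2 RAID EE spares.
for level in PARITY_DRIVES:
    cap = usable_capacity(level, n=10, m=1, g=2, s=2)
    print(f"{level}: min {min_drives(level)} drives, "
          f"{cap}TB usable ({cap * 100 // 10}% of 10TB raw)")
# RAID 5EE: min 4 drives, 7TB usable (70% of 10TB raw)
# RAID 6EE: min 5 drives, 6TB usable (60% of 10TB raw)
# RAID 50EE: min 7 drives, 6TB usable (60% of 10TB raw)
# RAID 60EE: min 9 drives, 4TB usable (40% of 10TB raw)
```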

 


Example of RAID 5EE with 1 RAID EE spare

Now let us take an example to describe how it works. The following example is a RAID 5EE with 5 disks: 4 disks are RAID disks, and one additional disk is the RAID EE spare. After initialization, the data block distribution is as follows. P stands for parity, and S stands for a RAID EE spare area, which is empty for now.
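
Since the distribution is easiest to see laid out, here is a small Python sketch that prints one possible RAID 5EE layout (D for data, P for parity, S for spare area). The exact rotation pattern is our assumption; actual controllers may rotate the blocks differently.

```python
# Print a possible RAID 5EE block layout: 5 disks, rotating parity and spare.

def raid5ee_layout(disks=5, stripes=5):
    rows = []
    for stripe in range(stripes):
        row = ["D"] * disks
        row[(disks - 1 - stripe) % disks] = "S"  # spare area rotates each stripe
        row[(disks - 2 - stripe) % disks] = "P"  # parity rotates alongside it
        rows.append(row)
    return rows

for row in raid5ee_layout():
    print(" ".join(row))
# D D D P S
# D D P S D
# D P S D D
# P S D D D
# S D D D P
```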

Assume that disk 2 fails. The RAID 5EE enters degraded mode.

The spare areas are rebuilt with data from the failed disk drive. This action is called EE Rebuild. After the rebuild, the data distribution is like RAID 5, and it can tolerate another failed disk drive. As you can imagine, the more RAID EE spare disks there are, the faster the rebuild.

When a new disk drive joins the RAID EE disk group, the data rebuilt in the spare areas is copied back to the new disk. This action is called Copyback. After the copyback, the disk group returns to the RAID 5EE normal state.
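
Continuing the layout sketch above, both actions can be simulated in a few lines (again our illustration, not controller code):

```python
# Simulate EE Rebuild and Copyback on the layout from the previous sketch.

def ee_rebuild(layout, failed):
    """EE Rebuild: reconstruct each stripe's lost block into that stripe's
    spare area, so every surviving disk absorbs part of the writes."""
    rebuilt = []                              # (stripe, position) of rebuilt blocks
    for i, row in enumerate(layout):
        lost, row[failed] = row[failed], "-"  # the failed disk's block is gone
        if lost != "S":                       # a lost spare area holds no data
            pos = row.index("S")
            row[pos] = lost                   # rebuild D or P into the spare block
            rebuilt.append((i, pos))
    return rebuilt

def copyback(layout, rebuilt, new_disk):
    """Copyback: move rebuilt blocks to the newly joined disk and reopen
    the spare areas, restoring the normal RAID 5EE distribution."""
    for i, pos in rebuilt:
        layout[i][new_disk] = layout[i][pos]
        layout[i][pos] = "S"
    for row in layout:                        # stripes that lost only their
        if row[new_disk] == "-":              # spare block get a fresh spare
            row[new_disk] = "S"

layout = raid5ee_layout()               # from the previous sketch
rebuilt = ee_rebuild(layout, failed=2)  # disk 2 fails: degraded mode, then EE Rebuild
copyback(layout, rebuilt, new_disk=2)   # a new disk joins: back to the normal state
```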

Example of RAID 60EE with 2 RAID EE spares

Take another example: a RAID 60EE with 10 disks, where 8 disks are RAID disks and 2 disks are RAID EE spares. After initialization, the data block distribution is as follows. Rebuild and copyback for RAID 60EE are similar to the above and will not be repeated here.

 

Test Results

Test Case 1: RAID 5 vs. RAID 5EE

This test compares the rebuild time and copyback time of RAID 5 and RAID 5EE. We expect that more RAID EE spare disks will yield a shorter rebuild time. First we create a RAID 5 pool. After initialization, we unplug one disk drive and then plug it back in, and count the rebuild time under different I/O access patterns. We then create RAID 5EE pools with 1 / 2 / 4 / 8 RAID EE spare disks in sequence. After initialization, we unplug one disk drive and the RAID EE rebuild starts; again we count the rebuild time under different I/O access patterns. Then we plug in one disk drive and set it as a dedicated spare, which starts the copyback. Last, we count the copyback time.

Summary

• RAID EE can improve rebuild time by up to 48%.

• The more RAID EE spare disks are used, the shorter the rebuild time.

• The rebuild time improvement is greater under read access patterns.

Test Equipment & Configurations

Server
• Model: ASUS RS700 X7/PS4 (CPU: Intel Xeon E5-2600 v2 / RAM: 8GB)
• iSCSI HBA: Intel 82574L Gigabit Network Connection
• OS: Windows Server 2012 R2

Storage
• Model: JetStor 824FXD
• Memory: 16GB (2 x 8GB in banks 1 & 3) per controller
• Firmware: 1.3.0
• HDD: 24 x Seagate Constellation ES, ST500NM0001, 500GB, SAS 6Gb/s
• HDD Pools:
  • RAID 5 Pool with 16 x NL-SAS HDDs in Controller 1
  • RAID 5EE Pool with 17 (16 + 1 RAID EE spare) x NL-SAS HDDs in Controller 1
  • RAID 5EE Pool with 18 (16 + 2 RAID EE spares) x NL-SAS HDDs in Controller 1
  • RAID 5EE Pool with 20 (16 + 4 RAID EE spares) x NL-SAS HDDs in Controller 1
  • RAID 5EE Pool with 24 (16 + 8 RAID EE spares) x NL-SAS HDDs in Controller 1
• HDD Volume: 100GB in each pool

I/O Pattern
• Tool: IOmeter V1.1.0
• Workers: 1
• Outstanding I/Os (Queue Depth): 128
• Access Specifications:
  • Backup Pattern (Sequential Read / Write, 256KB)
  • Database Access Pattern (as defined by Intel/StorageReview.com: 8KB, 67% Read, 100% Random)
  • File Server Access Pattern (as defined by Intel)
  • Idle

 

Test Case 2: RAID 60 vs. RAID 60EE

This test compares the rebuild time and copyback time of RAID 60 and RAID 60EE. As before, we expect that more RAID EE spare disks will yield a shorter rebuild time and that RAID 60EE will be more efficient. First we create a RAID 60 pool. After initialization, we unplug one disk drive and then plug it back in, and count the rebuild time under different I/O access patterns. We then create RAID 60EE pools with 1 / 2 / 4 / 8 RAID EE spare disks in sequence. After initialization, we unplug one disk drive and the RAID EE rebuild starts; again we count the rebuild time under different I/O access patterns. Then we plug in one disk drive and set it as a dedicated spare, which starts the copyback. Last, we count the copyback time.

Summary

• RAID EE can improve rebuild time by up to 58%.
• The more RAID EE spare disks are used, the shorter the rebuild time.
• The rebuild time improvement is greater under read access patterns.

 

Test Equipment & Configurations

Server
• Model: ASUS RS700 X7/PS4 (CPU: Intel Xeon E5-2600 v2 / RAM: 8GB)
• iSCSI HBA: Intel 82574L Gigabit Network Connection
• OS: Windows Server 2012 R2

Storage
• Model: JetStor 824FXD
• Memory: 16GB (2 x 8GB in banks 1 & 3) per controller
• Firmware: 1.3.0
• HDD: 24 x Seagate Constellation ES, ST500NM0001, 500GB, SAS 6Gb/s
• HDD Pools:
  • RAID 5 Pool with 16 x NL-SAS HDDs in Controller 1
  • RAID 5EE Pool with 17 (16 + 1 RAID EE spare) x NL-SAS HDDs in Controller 1
  • RAID 5EE Pool with 18 (16 + 2 RAID EE spares) x NL-SAS HDDs in Controller 1
  • RAID 5EE Pool with 20 (16 + 4 RAID EE spares) x NL-SAS HDDs in Controller 1
  • RAID 5EE Pool with 24 (16 + 8 RAID EE spares) x NL-SAS HDDs in Controller 1
• HDD Volume: 100GB in each pool

I/O Pattern
• Tool: IOmeter V1.1.0
• Workers: 1
• Outstanding I/Os (Queue Depth): 128
• Access Specifications:
  • Backup Pattern (Sequential Read / Write, 256KB)
  • Database Access Pattern (as defined by Intel/StorageReview.com: 8KB, 67% Read, 100% Random)
  • File Server Access Pattern (as defined by Intel)
  • Idle

Conclusion

As drive capacity grows, RAID rebuild time grows linearly. The more disk drives a disk group contains, the greater the chance of cumulative failures, and the greater the impact of drive capacity on rebuild speed. Using RAID EE technology greatly reduces these risks.