Introduction to RAID, Part 2
Published: 2019-05-02


In this illustration, when block A1 is written to disk 0, the same block is also written to disk 1. Since the disks are independent of one another, the writes to disk 0 and disk 1 can happen at the same time. When the data is read, however, the RAID controller can read block A1 from disk 0 and block A2 from disk 1 at the same time, again because the disks are independent. So overall, the write performance of a RAID-1 array is about the same as that of a single disk, while the read performance is faster than that of a single disk.

 

The strength of RAID-1 lies in the fact that the disks contain copies of the data. So if you lose disk 0, the exact same data is also on disk 1. This greatly improves data reliability or availability.

The capacity of RAID-1 is the following:

Capacity = min(disk sizes)

meaning that the capacity of RAID-1 is limited by the smallest disk (you can use different size drives in RAID-1). For example, if you have a 500GB disk and a 400GB disk, then the maximum capacity would be 400GB (i.e. 400GB of the 500GB drive is used as a mirror, and the remaining 100GB is not used). RAID-1 has the lowest capacity utilization of any RAID configuration.
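
The capacity rule above can be written out as a small sketch. The function names `raid1_capacity` and `raid1_unused` are made up for this illustration; sizes are in GB, as in the example.

```python
def raid1_capacity(disk_sizes_gb):
    """Usable capacity of a RAID-1 mirror: the size of the smallest member."""
    if len(disk_sizes_gb) < 2:
        raise ValueError("RAID-1 needs at least two disks")
    return min(disk_sizes_gb)


def raid1_unused(disk_sizes_gb):
    """Space left unused on each disk once the mirror size is fixed."""
    usable = raid1_capacity(disk_sizes_gb)
    return [size - usable for size in disk_sizes_gb]


# The 500GB + 400GB example from the text
print(raid1_capacity([500, 400]))  # 400
print(raid1_unused([500, 400]))    # [100, 0]
```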

The reliability, expressed as a probability of failure, can also be estimated. Since the disks are mirrors of one another but still independent, the probability of both disks failing, leading to data loss, is the following:

P(dual failure) = P(single drive)^2

So the probability of failure of a RAID-1 configuration is the square of the failure probability of a single drive. Since the probability of failure of a single drive is less than 1, the probability of failure of a RAID-1 array is even smaller than that of a single drive. The probability of failure can be analyzed more extensively, but in general the probability is fairly low.
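
As a worked example of the squared-probability relationship, here is a minimal sketch. The 5% single-drive failure probability is an assumed number chosen only for illustration (it is not from the article), and the calculation assumes the two drives fail independently.

```python
def raid1_failure_probability(p_single, n_mirrors=2):
    """Probability that every mirror fails, assuming independent failures."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("a probability must lie between 0 and 1")
    return p_single ** n_mirrors


p_single = 0.05  # assumed value, for illustration only
p_array = raid1_failure_probability(p_single)
print(f"{p_array:.4f}")  # 0.0025
```

With these assumed numbers the mirrored pair is twenty times less likely to lose data than a single drive over the same period.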

 

One might be tempted to use RAID-1 for storing important data in place of backups of the data. While RAID-1 improves data reliability or availability, it does not replace backups. If the RAID controller fails, or if the unit containing the RAID-1 array suffers some sort of failure, then the data is not available and may even be lost. Without a backup you don’t have a copy of your data anymore. However, if you make a backup of the data, you would have a copy. The moral of the tale: make real backups and don’t rely on RAID-1.

Table 2 below is a quick summary of RAID-1 with a few highlights.

Table 2 - RAID-1 Highlights

Raid Level: RAID-1

Pros:
  • Great data redundancy/availability
  • Great MTTF

Cons:
  • Worst capacity utilization of single RAID levels
  • Good read performance, limited write performance

Storage Efficiency: 50% assuming two drives of the same size

Minimum Number of Disks: 2

RAID-2

This RAID level was one of the original five defined, but it is no longer really used. The basic concept is that RAID-2 stripes data at the bit level instead of the block level (remember that RAID-0 stripes at the block level) and uses a Hamming code for parity computations. In RAID-2, the first bit is written on the first drive, the second bit is written on the second drive, and so on. Then a Hamming-code parity is computed and stored either on the data disks or on a separate disk. With this approach you can get very high data throughput rates since the data is striped across several drives, but you also lose a little performance because you have to compute the parity and store it.

 

A cool feature of RAID-2 is that it can detect single-bit errors and recover from them. This prevents data errors or what some people call “bit rot”. An overall evaluation of RAID-2 is available in a separate article.

According to that article, hard drives added error correction that used Hamming codes, so using them at the RAID level became redundant and people stopped using RAID-2.
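
To make the single-bit recovery concrete, here is a minimal sketch of a Hamming(7,4) code, the classic small Hamming code. The article does not say which Hamming code a RAID-2 array uses, so the code size, the function names, and the idea that each of the seven bits sits on its own drive are assumptions made for illustration.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (bit positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (corrected codeword, error position); position 0 means no error."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # the syndrome spells out the bad position
    if position:
        c[position - 1] ^= 1  # flip the faulty bit back
    return c, position


def hamming74_data(codeword):
    """Extract the 4 data bits from a 7-bit codeword."""
    return [codeword[2], codeword[4], codeword[5], codeword[6]]


original = [1, 0, 1, 1]
stored = hamming74_encode(original)  # imagine one bit per drive
damaged = list(stored)
damaged[4] ^= 1                      # a single bit flips on the fifth drive
repaired, position = hamming74_correct(damaged)
print(position)                      # 5
print(hamming74_data(repaired) == original)  # True
```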

RAID-3

RAID-3 uses data striping at the byte level and also adds parity computations, storing the parity on a dedicated parity disk. Figure 3 from Wikipedia (image by Cburnett) illustrates how the data is written to four disks in RAID-3.

Figure 3: RAID-3 layout (from Cburnett at Wikipedia under the GFDL license)

This RAID-3 layout uses 4 disks, striping data across three of them and using the fourth disk for storing parity information. So a chunk of data “A” has byte A1 written to disk 0, byte A2 written to disk 1, and byte A3 written to disk 2. Then the parity of bytes A1, A2, and A3 is computed (this is labeled as Ap(1-3) in Figure 3) and written to disk 3. The process then repeats until the entire chunk of data “A” is written. Notice that the minimum number of disks you can have in RAID-3 is three (you need 2 data disks and a third disk to store the parity).
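
The write pattern for the four-disk layout can be sketched in a few lines. This is an illustration only: it assumes the parity is a bytewise XOR (the usual choice, though the article just says "parity"), pads the last stripe with zero bytes, and uses a made-up function name, `raid3_write`.

```python
from functools import reduce
from operator import xor


def raid3_write(chunk, n_data_disks=3):
    """Stripe a chunk byte by byte over the data disks plus one parity disk."""
    padding = (-len(chunk)) % n_data_disks      # zero-fill the last stripe
    padded = chunk + bytes(padding)
    data_disks = [bytearray() for _ in range(n_data_disks)]
    parity_disk = bytearray()
    for start in range(0, len(padded), n_data_disks):
        stripe = padded[start:start + n_data_disks]
        for disk, byte in zip(data_disks, stripe):
            disk.append(byte)                   # A1 -> disk 0, A2 -> disk 1, ...
        parity_disk.append(reduce(xor, stripe)) # Ap(1-3) -> parity disk
    return data_disks, parity_disk


data_disks, parity_disk = raid3_write(b"ABCDEF")
print([bytes(d) for d in data_disks])  # [b'AD', b'BE', b'CF']
print(parity_disk.hex())               # 4047
```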

 

RAID-3 is also capable of very high performance, while the addition of parity gives back some data reliability and availability compared to a pure striping model à la RAID-0. Since the number of disks in a stripe is likely to be smaller than the number of bytes in a block, all of the disks in a byte-level stripe are accessed at the same time, improving read and write performance. However, the RAID-3 configuration has some possible side effects.

 

In particular, RAID-3 cannot accommodate multiple requests at the same time. This results from the fact that a block of data will be spread across all members of the RAID-3 group (minus the parity disk) and the data has to reside in the same location on each drive. This means that the disks (spindles) have to be accessed at the same time, using the same stripe, which usually means that the spindles have to be synchronized. As a consequence, if an I/O request for data chunk A comes into the array (see Figure 3), all of the disks have to seek to the beginning of chunk A, read their specific bytes, and send them back to the RAID-3 controller. Any other data request, such as one for the data chunk labeled B in Figure 3, is blocked until the request for “A” has completed because all of the drives are being used.

The capacity of RAID-3 is the following:

Capacity = min(disk sizes) * (n-1)

meaning that the capacity of RAID-3 is the size of the smallest disk (you can use different size drives in RAID-3) multiplied by (n - 1), where n is the number of drives. The “minus one” part is because of the dedicated parity drive.
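
The same formula as a short sketch, together with the resulting storage efficiency. The function names and the disk sizes are made up for illustration.

```python
def raid3_capacity(disk_sizes_gb):
    """Usable capacity: smallest disk times (n - 1); one disk holds parity."""
    n = len(disk_sizes_gb)
    if n < 3:
        raise ValueError("RAID-3 needs at least three disks")
    return min(disk_sizes_gb) * (n - 1)


def raid3_efficiency(n_disks):
    """Fraction of the raw space that holds data when all disks match."""
    return (n_disks - 1) / n_disks


# Hypothetical four-disk array with one smaller member
print(raid3_capacity([500, 500, 400, 500]))  # 1200
print(raid3_efficiency(4))                   # 0.75
```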

 

RAID-3 has good performance since it is similar to RAID-0 (striping), but you have to assume some reduction in performance because of the parity computations (these are done by the RAID controller). If you lose the parity disk you will not lose data (the data remains on the other disks), and if you lose a data disk, you still have the parity disk so you can recover the data. So RAID-3 offers more data availability and reliability than RAID-0 but with some reduction in performance because of the parity computations and I/O.
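
The recovery of a lost data disk can be sketched as follows, again assuming XOR parity (an assumption, since the article only says "parity"). With XOR, the missing disk is simply the XOR of every surviving disk, parity included. The helper name `xor_bytes` and the disk contents are invented for the example.

```python
from functools import reduce
from operator import xor


def xor_bytes(disks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(xor, column) for column in zip(*disks))


# Three data disks and the parity computed across them
disks = [b"AD", b"BE", b"CF"]
parity = xor_bytes(disks)

# Disk 1 fails: rebuild it from the surviving data disks plus parity
survivors = [disks[0], disks[2], parity]
rebuilt = xor_bytes(survivors)
print(rebuilt)  # b'BE'
```

Losing the parity disk instead needs no recovery step for the data itself; the parity is just recomputed from the intact data disks.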

 

RAID-3 isn’t very popular in the real world, but from time to time you do see it used. It is used in situations where RAID-0 is totally unacceptable because it lacks data redundancy, and where the throughput reduction due to the parity computations is acceptable.

Table 3 below is a quick summary of RAID-3 with a few highlights.

Table 3 - RAID-3 Highlights

Raid Level: RAID-3

Pros:
  • Good data redundancy/availability (can tolerate the loss of 1 drive)
  • Good read performance since all of the drives are read at the same time
  • Reasonable write performance but parity computations cause some reduction in performance
  • Can lose one drive without losing data

Cons:
  • Spindles have to be synchronized
  • Data access can be blocked because all drives are accessed at the same time for read or write

Storage Efficiency: (n - 1) / n where n is the number of drives

Minimum Number of Disks: 3 (have to be identical)

RAID-4

RAID-3 improved data redundancy by adding a dedicated parity disk. In a similar fashion, RAID-4 builds on RAID-0 by adding a parity disk to block-level striping. Since the striping is now at the block level, each disk can be accessed independently to read or write data, allowing multiple data accesses to happen at the same time. Figure 4 below from Wikipedia (image by Cburnett) illustrates how the data is written to four disks in RAID-4.

Figure 4: RAID-4 layout (from Cburnett at Wikipedia under the GFDL license)
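
To show why block-level striping allows independent access, here is a rough sketch that maps a logical block number to a disk. It assumes the four-disk arrangement described for Figure 4 with the last disk dedicated to parity and blocks laid out round-robin across the data disks; the function name `raid4_locate` is made up for the example.

```python
def raid4_locate(logical_block, n_disks=4):
    """Map a logical block to (data disk, stripe row, parity disk)."""
    if n_disks < 2:
        raise ValueError("need at least one data disk plus the parity disk")
    n_data = n_disks - 1       # the last disk is reserved for parity
    parity_disk = n_disks - 1
    return logical_block % n_data, logical_block // n_data, parity_disk


# Blocks 0 and 4 sit on different data disks, so two reads can
# be serviced at the same time.
print(raid4_locate(0))  # (0, 0, 3)
print(raid4_locate(4))  # (1, 1, 3)
```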

Reposted from: http://xemkb.baihongyu.com/
