Types of RAID Storage for Databases in Public Cloud

Gursimran Singh | 13 August 2024

Introduction to RAID Storage

RAID stands for redundant array of independent disks. RAID storage combines multiple disks to provide fault tolerance, improve overall performance, and increase the storage capacity of a system. This contrasts with older storage setups that used a single disk drive to store data. RAID lays data out across the disks in a defined way, by striping, mirroring, or both, to achieve these goals. RAID is common on servers but is not required for personal computers. In this blog, you will explore how RAID can be used with the persistent block storage provided by public clouds, such as AWS EBS, Azure Disk, and Google Cloud Persistent Disk, for databases like Cassandra, MongoDB, MySQL, and Druid.

Why Do We Need RAID Storage

Every persistent block storage service has limitations. Each volume provides a limited number of IOPS, a limited throughput, and a limited capacity, depending on the disk type you choose. We hit a bottleneck if our IOPS needs increase or if we need more speed or redundancy. In addition, with a single disk, a disk crash brings down the whole database setup. We could create a cluster and add more servers, but that increases the cost. This is where RAID kicks in: depending on the level, it can provide speed, disk fault tolerance, or both.
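As a rough illustration of the bottleneck, the sketch below estimates how many equal volumes would have to be striped together to reach a target IOPS figure. It assumes ideal striping, where the aggregate IOPS is the sum of the member volumes. The function name and the numbers are hypothetical, not figures from any provider, and real instances usually also cap the total IOPS per instance.

```python
import math


def volumes_needed(target_iops: int, per_volume_iops: int) -> int:
    """Return how many equal volumes must be striped to reach target_iops.

    Assumes ideal striping: aggregate IOPS is the sum of the members.
    """
    if target_iops <= 0 or per_volume_iops <= 0:
        raise ValueError("IOPS values must be positive")
    return math.ceil(target_iops / per_volume_iops)


# Hypothetical figures: a volume type limited to 3,000 IOPS
# and a database workload that needs 10,000 IOPS.
print(volumes_needed(10_000, 3_000))
```

With these assumed numbers a single volume cannot serve the workload, which is the point at which striping several volumes becomes attractive.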

Types of RAID Storage in the Cloud

RAID 0

  • RAID 0 can be used in the public cloud.
  • The minimum number of disks required for RAID 0 is 2.
  • The full capacity of all disks is available.
  • It provides only speed, because it splits (stripes) the data between disks. There is no redundancy, so a single disk failure loses the whole array.
  • It should only be used in a cluster setup, not in a single-node setup.
  • It is recommended only for non-critical environments or multi-node setups.
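To make the striping idea concrete, here is a minimal sketch of how RAID 0 could map logical blocks onto disks in round-robin order. It is a simplification: it assumes one block per stripe unit, whereas real implementations stripe in larger chunks. The function name `locate_block` is hypothetical.

```python
def locate_block(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk)."""
    if num_disks < 2:
        raise ValueError("RAID 0 needs at least 2 disks")
    if logical_block < 0:
        raise ValueError("logical_block must be non-negative")
    # Consecutive blocks go to consecutive disks, wrapping around.
    return logical_block % num_disks, logical_block // num_disks


for block in range(6):
    disk, offset = locate_block(block, num_disks=2)
    print(f"logical block {block} -> disk {disk}, offset {offset}")
```

Because neighbouring blocks land on different disks, sequential reads and writes can be served by both disks in parallel. The same layout also shows why losing one disk loses every other block of the data.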

RAID 1

  • RAID 1 is used where redundancy is required. It does not offer a speed gain, because it mirrors the data between two disks.
  • The minimum number of disks required for RAID 1 is 2.
  • Storage efficiency is reduced to 50% of the total disk capacity.
  • It can be used in both single-node and multi-node clusters.

RAID 10

  • RAID 10 is a combination of RAID 1 and RAID 0.
  • It offers redundancy and speed, because it both stripes and mirrors data between disks.
  • The minimum number of disks required for RAID 10 is 4.
  • Storage efficiency is reduced to 50% of the total disk capacity.
  • It can be used in both single-node and multi-node clusters.
  • It is recommended for mission-critical environments where speed is required.
  • It generally offers more speed than RAID 1, RAID 5, and RAID 6.
  • It always survives 1 disk failure, and it survives 2 disk failures if the failed disks belong to different RAID 1 mirror pairs.
  • RAID 5 and RAID 6 are not recommended, because they compute and store parity across disks, which is very resource-intensive.
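The capacity and fault-tolerance rules for RAID 0, RAID 1, and RAID 10 listed above can be sketched as a small calculator. The function names and the level strings are hypothetical. The RAID 10 check assumes a layout in which disks 0 and 1 form the first mirror pair, disks 2 and 3 the second, and so on. Actual pairings depend on how the array was created.

```python
def usable_capacity_gb(level: str, num_disks: int, disk_gb: int) -> int:
    """Usable capacity of an array of equal-sized disks, in GB."""
    if level == "RAID0":
        if num_disks < 2:
            raise ValueError("RAID 0 needs at least 2 disks")
        return num_disks * disk_gb          # striping keeps all the space
    if level == "RAID1":
        if num_disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_gb                      # every disk holds the same copy
    if level == "RAID10":
        if num_disks < 4 or num_disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        return num_disks * disk_gb // 2     # half the space goes to mirrors
    raise ValueError(f"unsupported level: {level}")


def raid10_survives(failed: set[int], num_disks: int = 4) -> bool:
    """True if every mirror pair still has at least one healthy disk.

    Assumed layout (hypothetical): disks 2k and 2k+1 form mirror pair k.
    """
    pairs = [(i, i + 1) for i in range(0, num_disks, 2)]
    return not any(a in failed and b in failed for a, b in pairs)


for level, disks in (("RAID0", 2), ("RAID1", 2), ("RAID10", 4)):
    print(level, usable_capacity_gb(level, disks, disk_gb=100), "GB usable")

print(raid10_survives({0, 2}))  # failures in different mirror pairs
print(raid10_survives({0, 1}))  # both disks of the same mirror pair
```

The two final calls show the difference between losing one disk from each mirror pair, which the array survives, and losing both disks of the same pair, which it does not.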

RAID Storage Recommendations According to Databases

When it comes to databases, no RAID level will completely satisfy both your read and write needs. You have to choose wisely to avoid performance issues and high I/O wait times. Next, we discuss which RAID option best fits each of the following databases:

MySQL

MySQL is one of the most popular open-source SQL databases. It supports features such as clustering and replication, which provide high availability. It is especially popular among web-hosting companies, and it is easy to install and use. However, it scales poorly and does not handle very large databases efficiently.

RAID 0

RAID 0 is not recommended for MySQL, because the failure of a single disk fails the whole array. We can use RAID 0 in a cluster where performance is required and the failure of a single node does not affect the operation of MySQL.

RAID 1

RAID 1 is recommended for a single-node deployment where redundancy is required. RAID 1 will not provide any performance boost to MySQL.

RAID 10

RAID 10 provides both performance and redundancy. It is recommended for MySQL in either a single-node or a cluster deployment. It increases the storage cost, as more disks are required for the array.

MongoDB

MongoDB is a NoSQL database which supports a rich and expressive object model. In MongoDB, objects can have properties and objects can be nested in one another.

RAID 0

RAID 0 is not recommended for MongoDB. The failure of a single drive causes the whole array to fail. RAID 0 provides more performance but no redundancy.

RAID 1

We can use RAID 1 with MongoDB. It provides disk redundancy, protecting us from a disk failure. RAID 1 fits where only redundancy, not extra performance, is required.

RAID 10

RAID 10 increases disk performance and provides redundancy. It is recommended to use RAID 10 with MongoDB in either a single-node or a cluster deployment. It increases the disk cost, as more disks are required.

Cassandra

Cassandra is a distributed NoSQL database. It is self-healing and scalable, and it runs on commodity hardware without a single point of failure. It keeps copies of data on multiple servers, depending on the replication factor that we set, and thus protects us from node failure.

RAID 0

In Cassandra, we can use RAID 0 if we have a large cluster and can handle disk failure easily. RAID 0 increases cluster performance within budget, as no additional devices are needed. However, RAID 0 is not recommended for a small cluster.

RAID 1

We can use RAID 1 if we have a single-node or two-node cluster, to guard against disk failure in cases where we cannot afford a node failure. It provides no performance boost, only protection from disk failure. Disk cost increases with RAID 1.

RAID 10

RAID 10 can be used if we have a single node or a small cluster where more disk performance and redundancy are required. It increases disk performance, and it also increases storage costs, as more disks are required. It should be used only where disk and node failures cannot be tolerated. RAID 10 is not recommended in a large cluster.

Druid

Druid is an open-source data store designed to run queries on real-time and historical data. It is a distributed, in-memory OLAP data store that provides low-latency data ingestion, flexible data exploration, and fast data aggregation, and it is used primarily for business intelligence. It scales to trillions of events and petabytes of data. In Druid, the preferred backend for metadata is MySQL or PostgreSQL, and the preferred backend for deep storage is S3 or HDFS, although local file storage can also be used.

RAID 0

RAID 0 is not recommended with HDFS or local file storage for deep storage. A single disk failure will cause data loss.

RAID 1

RAID 1 is not recommended with HDFS for deep storage. If we run a single-node deployment with local storage, we can use RAID 1 for redundancy, but it will not increase performance.

RAID 10

We can use RAID 10 with local file storage for deep storage. It provides both redundancy and performance. It should be used with a single-node deployment. RAID 10 is not recommended with HDFS.

Concluding RAID Storage for Databases

To sum up, for distributed databases like Cassandra and Druid, using RAID is not the best decision unless you have a small cluster. For databases like MongoDB and MySQL, RAID 10 is the recommended choice for redundancy and performance. RAID should be chosen according to current and future workloads. This is where we can help you decide which solution to choose so that you get the required performance and redundancy.
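As a quick recap, the recommendations for MySQL, MongoDB, and Cassandra can be written down as a small lookup function. This is only a sketch of the guidance in this article, not a sizing tool. The function name and the cut-off for a "small" cluster are hypothetical, since the article does not define an exact node count. Druid is left out because its recommendation depends on the deep storage backend rather than on cluster size.

```python
# Hypothetical cut-off: the article does not say how many nodes count as "small".
SMALL_CLUSTER_MAX_NODES = 2


def recommend_raid(database: str, nodes: int, need_speed: bool = True) -> str:
    """Return the RAID level this article suggests for a deployment."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    db = database.lower()
    if db in ("mysql", "mongodb"):
        # RAID 10 for speed plus redundancy, RAID 1 for redundancy only.
        return "RAID 10" if need_speed else "RAID 1"
    if db == "cassandra":
        if nodes > SMALL_CLUSTER_MAX_NODES:
            # In a large cluster, replication covers the loss of a disk.
            return "RAID 0"
        return "RAID 10" if need_speed else "RAID 1"
    raise ValueError(f"no recommendation recorded for {database!r}")


print(recommend_raid("MySQL", nodes=1))
print(recommend_raid("Cassandra", nodes=12))
print(recommend_raid("Cassandra", nodes=2, need_speed=False))
```

Treat the result as a starting point only, and validate it against the actual workload and budget.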

How Can XenonStack Help You?

XenonStack offers Managed Cloud Services and DevOps Consulting Services and Solutions