Hello Friends! I hope you are enjoying this amazing journey of AWS complete tutorial series. We at Daily Code Buffer, always try to give you the quality content based on current industry standards and requirements. Following the same expectation, we are here to represent the next important chapter that is: Relational Databases Systems(RDS) and DynamoDB services in AWS. We highly recommend that if you are new here, you can always start from the beginning of this amazing tutorial series of AWS by clicking here.
In this chapter, we will cover the following topics:
- Introduction to Database
- Relational Database Management Services
- NoSQL databases
- AWS Database service overview
- Amazon Aurora DB
- Amazon Aurora Vs. MySQL
- Maria DB
- Microsoft SQL Server
- MySQL
- Oracle
- PostgreSQL
- Creating an Amazon RDS MySQL DB instance
- Monitoring RDS instances
- Creating a snapshot
- Introduction to DynamoDB
- Primary Keys
- Secondary indexes
- Create a DynamoDB table
- Setting auto-scaling while creating a table
- Query and Scan DynamoDB table
Introduction to Database:
- A database engine allows your application to access, manage, and search large volumes of data records.
- In a well-architected application, the database will need to meet the performance demands, the availability needs, and the recoverability characteristics of the system.
- Generally, database systems and engines can be grouped into two broad categories:
- Relational Database Management Systems (RDBMS)
- NoSQL (or non-relational) databases.
Relational Database Management Services:
- The relational database has roots going back to the 1970s when Edgar F. Codd, working for IBM, developed the concepts of the relational model.
- Today, relational databases power all types of applications from social media apps, e-commerce websites, and blogs to complex enterprise applications.
- Commonly used relational database software packages include MySQL, PostgreSQL, Microsoft SQL Server, and Oracle.
- Relational databases provide a common interface that lets users read and write from the database using commands or queries written using Structured Query Language (SQL). A relational database consists of one or more tables, and a table consists of columns and rows similar to a spreadsheet.
- A database column contains a specific attribute of the record, such as an empId, empName, empContactNo, etc…
- Each attribute is assigned a data type such as text, number, or date, and the database engine will reject invalid inputs.
NoSQL databases:
- NoSQL databases are often simpler to use, more flexible and can achieve performance levels that are difficult or impossible with traditional relational databases.
- Traditional relational databases are difficult to scale beyond a single server without significant engineering and cost, but a NoSQL architecture allows for horizontal scalability on commodity hardware.
- NoSQL databases are non-relational and do not have the same table and column semantics of a relational database.
- NoSQL databases are instead often key/value stores or document stores with flexible schemas that can evolve over time or vary.
- Contrast that to a relational database, which requires a very rigid schema.
- Today, many application teams use Hbase, MongoDB, Cassandra, CouchDB, Riak, and Amazon DynamoDB to store large volumes of data with high transaction rates. You can run any type of NoSQL database on AWS using Amazon EC2, or you can choose a managed service like Amazon DynamoDB to deal with the heavy lifting involved with building a distributed cluster spanning multiple data centers.
AWS Database services overview:
- AWS Relational Database Service (RDS) is a fully managed relational database service from Amazon.
- RDS makes it easier for enterprises and developers who want to use a relational database in the cloud without investing a lot of time and resources in managing the environment.
- AWS RDS supports six database engines: Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle, and Microsoft SQL Server.
- It provides easy-to-use, cost-effective, and scalable relational databases in the cloud.
- Amazon RDS charges are based on the instance type, running time, storage size, type and I/O requests, total backup storage size, and data in and out transfers.
The advantages of Amazon RDS are as follows:
- It’s a fully managed service that automatically manages backups, software and OS patching, automatic failover, and recovery.
- It also allows taking a manual backup of the database as a snapshot. Snapshots of a database can be used to restore a database as and when required.
- RDS provides fine-grained access control with the help of AWS IAM.
Now let’s see each SQL database which is supported by AWS:
Amazon Aurora DB:
- Amazon Aurora is a MySQL and PostgreSQL-compatible, fully managed Relational Database Management System (RDBMS).
- Amazon RDS also provides push-button migration tools to convert your existing Amazon RDS for MySQL applications to Amazon Aurora.
- It is also possible to use the code, tools, and applications you use today with your existing PostgreSQL databases with Aurora (PostgreSQL).
- Creating an Amazon Aurora DB instance will create a DB cluster.
- It may consist of one or more instances along with a cluster volume to manage the data.
- Each Aurora cluster grows automatically as the amount of data in the database grows.
- It can grow up to 64 TB. Table size is limited to the cluster volume size, hence, the table can grow up to 64 TB
These clusters consist of two types of instance:
Primary instance:
Performs read, writes, and modifies data to the cluster volume. Each Aurora DB cluster has one primary instance.
Aurora Replica:
Performs only read operations. Each Aurora DB cluster supports up to 15 Aurora Replicas plus one primary instance. Amazon RDS Aurora instance availability can be increased by spreading Aurora Replicas across multiple AZs.
Amazon Aurora Vs. MySQL:
Maria DB:
- MariaDB is a community version of MySQL RDBMS under
- GNU GPL license. It maintains a high level of compatibility with MySQL.
- Amazon RDS MariaDB manages versions as X.Y.Z, where X.Y denotes a major version and Z is the minor version.
- For example, a version change from 10.0 to 10.1 is considered a major version change, while a version change from 10.0.17 to 10.0.24 is a minor version change.
- In general, within three to five months, it will be introduced in Amazon RDS MariaDB.
- Minor version support may not be available in all AWS regions.
- Amazon RDS MariaDB supports multiple storage engines, but not all of them are optimized for recovery and durability.
- At present, it fully supports the XtraDB storage engine.
- It supports point-in-time restore and snapshot restore.
- Amazon RDS MariaDB is available in all regions except AWS GovCloud (US) (us-gov-west-1).
Microsoft SQL Server:
- It is possible to run Microsoft SQL Server as an RDS instance.
- It supports various versions of MS SQL such as from SQL Server 2008 R2 to SQL Server 2016.
- There are a few limitations for Microsoft SQL Server DB instances:
- Each Amazon RDS Microsoft SQL instance can have a maximum of 30 databases. Master and model databases are not counted as a database in this count.
- Some ports are reserved for internal purposes and cannot be used for general purposes.
- It is not possible to rename a database when an RDS instance with Microsoft SQL Server is deployed in Multi-AZ mirroring.
- The minimum storage required is 20 GB with a maximum of 400 GB for the Web and Express editions.
- For the Enterprise and Standard editions, a minimum of 200 GB and a maximum of 4 TB of storage is required. In case larger storage is required, with the help of sharding across multiple DB instances, this can be achieved.
- It is recommended to allocate storage based on future considerations. Once storage volume is allocated, it cannot be increased due to the extensibility limitations of striped storage attached to the Windows Server.
- It doesn’t support some of the features of SQL Server such as SQL Server Analysis Services, SQL Server Integration Services, SQL Server Reporting Services, Data Quality Services, and Master Data Services. To use these features it is required to configure Microsoft SQL Server on an Amazon EC2 instance.
- Due to the limitations of Microsoft SQL Server, the point-in-time restore may not work properly until the database has been dropped successfully.
MySQL:
- Amazon RDS supports various versions of MySQL.
- MySQL versions are organized as X.Y.Z, where X.Y indicates a major version and Z indicates a minor version.
- A new version of MySQL is available with the Amazon RDS MySQL instance usually within three to five months.
- It is essential to perform an OS update (if any are available) before upgrading Amazon RDS MySQL 5.5 DB instance to MySQL 5.6 or later.
- It is possible to create a snapshot for an Amazon RDS MySQL instance storage volume.
- Each snapshot is based on the MySQL instance engine version. It supports DB snapshot upgrades from MySQL 5.1 to MySQL 5.5.
Oracle:
- The following Oracle RDBMS versions are supported by the Amazon RDS Oracle engine:
- Oracle 12c, Version 12.1.0.2
- Oracle 11g, Version 11.2.0.4
- The Amazon RDS Oracle engine also supports the following Oracle RDBMS versions, but soon they will be deprecated:
- Oracle 12c, Version 12.1.0.1
- Oracle 11g, Version 11.2.0.3, and Version 11.2.0.2
- Oracle RDS can be deployed within VPC and can perform point-in-time recovery and scheduled or manual snapshots.
- At the time of creating a DB instance, the master user gets DBA privileges with some limitations. For example, SYS user, SYSTEM user, and other DB administrative user accounts cannot be used.
- Amazon allows you to change Oracle RDS instance types; however, if your DB instance uses a deprecated version of Oracle, you cannot change the instance type.
- Such RDS instances are automatically updated to a new version based on a cut-off date provided by Amazon.
- While upgrading an Amazon RDS Oracle instance from 11g to 12c is a major version upgrade, it has to be done manually and it requires downtime.
- Once the DB engine is successfully upgraded, it cannot be undone.
PostgreSQL:
- The Amazon RDS PostgreSQL engine supports various versions of PostgreSQL.
- It also supports point-in-time recovery using periodically or manually taken snapshots, Multi-AZ deployment, provisioned IOPS, Read Replicas, SSL connection to DB, and VPC.
Creating an Amazon RDS MySQL DB instance:
Monitoring RDS instance:
- Once an Amazon RDS instance is created as per the present need, it is very important to observe its performance with constantly changing business requirements and application loads.
- It is possible to monitor the instance’s CPU utilization, DB connections, free storage space, free memory, and many other parameters.
- By following the below steps, you can generate an alarm for DB instance:
- Go to the Amazon RDS dashboard and select the desired DB instance from the list of running DB instances.
- Click Show Monitoring to get the list of supported metrics.
- Create an alarm by specifying the threshold and other relevant details such as the SNS topic to use to send notifications, CPU utilization threshold, consecutive time period, and alarm name.
Creating and restoring a snapshot:
Creating a snapshot:
- Select the desired DB instance.
- Select Take Snapshot from the Instance Actions drop-down menu, available above the list of the running RDS instances
Restoring a snapshot:
- A snapshot can only be restored by creating a new instance. You cannot restore a snapshot to an existing instance.
- While restoring the snapshot to a new RDS instance, you can have a different storage volume type from the one used in the snapshot.
Introduction to DynamoDB:
- DynamoDB is an easy-to-use and fully-managed NoSQL database service provided by Amazon.
- Amazon DynamoDB significantly simplifies the hardware provisioning, setup and configuration, replication, software patching, and cluster scaling of NoSQL databases.
- Developers can create a table in Amazon DynamoDB and write an unlimited number of items with consistent latency.
- Amazon DynamoDB can provide consistent performance levels by automatically distributing the data and traffic for a table over multiple partitions.
- After you configure a certain read or write capacity, Amazon DynamoDB will automatically add enough infrastructure capacity to support the requested throughput levels.
Primary Keys:
- The primary key identifies each item within a table uniquely.
- DynamoDB supports two different kinds of primary keys:
Partition key / Hash attribute :
- The first type is a scalar value, i.e. an attribute that can hold only a single value of the type’s string, number and binary.
- DynamoDB computes a hash over the attribute’s value in order to determine the partition in which the corresponding item will be stored.
- A table is divided into different partitions, each partition stores a subset of all values. This way DynamoDB can work concurrently on different partitions and divide larger tasks into smaller ones.
Partition key and sort key / Hash and range attribute
- The second type of primary key consists of two values:
- Partition key – the partition key is used to determine the partition the current item is stored in
- Sort key – the items in a table with an additional sort key are stored in sorted order by the sort key.
- This way of storing the data provides more flexibility in querying the data as range queries can be performed more efficiently.
- If you would store for example the projects of an employee in one table and define the project as sort key, DynamoDB can serve queries that should deliver a subset of projects of an employee more easily. In such kind of tables, two items with the same partition key can exist, but they must have different sort keys.
Secondary indexes:
- Often it is required to query data not only by its primary key but additionally also by other attributes.
- To speed up such kind of queries, indexes can be created that contain a subset of attributes of the base table.
- These indexes can then be queried much faster.
DynamoDB distinguishes two different kinds of secondary indexes:
Global secondary index:
- A global secondary index has a partition key and sort key that can be different from the one of the base table. It is called a “global” index because queries on this index can span all the data in the base table.
Local secondary index:
- A local secondary index has the same partition key as the base table but a different sort key.
- As base table and index have the same partition key, all partitions of the base table have a corresponding partition in the index.
- Hence, the partitions of the index are “local” to the base table’s partitions.
Create a DynamoDB table:
Setting auto-scaling while creating a table:
Query and Scan DynamoDB table:
- A scan operation can read up to 1 MB of data from one partition.
- This does not only consume a lot of the available read capacity at once but also limits the usage of capacity to one partition.
- This may cause DynamoDB to throttle following read requests on the same table/partition. Instead of performing scans on the same table as queries, one can also think about replicating data to different tables. One table serves time-critical queries while another one is used to execute long-running scan operations.
In this chapter, we have gone through various relational and nonrelational databases that are supported by AWS. Also, we saw the difference between SQL and no SQL databases.
Congratulations on completing till here for your AWS journey. See you in the next chapter.