Paper Review: Adapting Microsoft SQL Server for Cloud Computing

Summary:

MSSQL is the first commercial distributed SQL store. It places primary and secondary replicas across datacenters, coordinated by a global partition manager. Operations are committed by quorum, and the Paxos consensus algorithm is used for replication and recovery.
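
To make the quorum idea concrete, here is a minimal sketch of a majority-acknowledged write. It is my own illustration, not the protocol from the paper: the `Replica` class, the `quorum_commit` function, and the simple majority rule are all assumptions, and a real system would also have to roll back or repair replicas after a failed commit.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica holding an in-memory update log."""
    name: str
    up: bool = True
    log: list = field(default_factory=list)

    def apply(self, update) -> bool:
        """Append the update and acknowledge; a replica that is down never acks."""
        if not self.up:
            return False
        self.log.append(update)
        return True


def quorum_commit(primary: Replica, secondaries: list, update) -> bool:
    """Send the update to every replica and commit only on a majority of acks.

    Assumption: the primary counts toward the quorum and a plain majority
    is enough. Cleanup after a failed commit is deliberately left out.
    """
    replicas = [primary] + list(secondaries)
    acks = sum(1 for replica in replicas if replica.apply(update))
    return acks >= len(replicas) // 2 + 1
```

With one primary and two secondaries, the write still commits when a single secondary is down, but not when both are.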

Strong points:

  1. One of the best things about Microsoft SQL Server is that it is a SQL-based cloud storage solution, which means standard, fast development for most small companies with common data models. It supports aggregation, full-text queries, referential constraints, views, and stored procedures, most of which are not supported by custom record stores.
  2. From the paper, I think the global partition manager is not a single machine but a highly available service made up of multiple nodes across the datacenters.
  3. The decoupled layer design enables upgrades without interfering with user operations. All cluster activities, including the two-phase upgrades, are handled in the infrastructure and deployment services layer, and users cannot use new features until the process has finished.

Weak points:

  1. The replica placement, where each server hosts a mix of primary and secondary partitions, is good for avoiding heavy traffic on any one server. Note that only primary partitions serve queries, updates, and other operations (although nearly up-to-date secondaries might be used as read-only copies). Could consistency be an issue because of the asynchronous updates? Since the read-only replicas are nearly up to date but come with no guarantee, responsibility for validating the data falls on the users. And what if a client wants to write something and the primary is far away? A primary replica might be a good way to coordinate operations, but it surely affects availability and consistency.
  2. Updates are propagated from the primary replica to the secondaries, which means that if the server storing the primary replica fails early in the propagation, the data in transit might be lost before a nearly up-to-date secondary replica becomes the primary.
  3. Since it's a SQL server, scalability could be worse than NoSQL stores such as MongoDB and Bigtable, since the data is stored in a hierarchical fashion. I also guess MSSQL doesn't offer a dynamic schema.
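
The failover worry in the second weak point can be sketched in a few lines. This is a toy model under my own assumptions, not the paper's recovery procedure: the helper names are hypothetical, the promotion rule ("longest log wins") is a guess, and each secondary's log is assumed to be a prefix of the primary's log. If a commit waits for a quorum of acknowledgements, only updates that were never acknowledged should fall into this window.

```python
def elect_new_primary(secondary_logs: dict) -> str:
    """Pick the secondary with the longest log (hypothetical promotion rule)."""
    return max(secondary_logs, key=lambda name: len(secondary_logs[name]))


def lost_updates(primary_log: list, new_primary_log: list) -> list:
    """Updates the failed primary held that the promoted secondary never received.

    Assumes the promoted log is a prefix of the failed primary's log.
    """
    return primary_log[len(new_primary_log):]


# The primary fails after sending "u3" to nobody.
primary_log = ["u1", "u2", "u3"]
secondary_logs = {"s1": ["u1", "u2"], "s2": ["u1"]}

winner = elect_new_primary(secondary_logs)
missing = lost_updates(primary_log, secondary_logs[winner])
```

Here `s1` is promoted and `u3` is the update that never reached any secondary.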