David Chappell

Opinari

Cloud Platform Storage: Relational vs. Scale-Out  
# Friday, February 27, 2009
 
All of today’s most visible cloud platforms provide a scalable storage mechanism. Google’s AppEngine has the Datastore, Amazon Web Services has SimpleDB, Microsoft’s Windows Azure has Tables, and Salesforce.com’s Force platform has an object store. All of them offer similar things: hierarchical rather than table-based storage, a straightforward schema-less approach, and a simple query language.

But why? Why don’t these platforms just provide ordinary relational storage? While their hierarchical approach has some strengths—it’s simple and flexible—it also has a host of weaknesses. Here are some of the big ones:
  • Because they don’t provide ordinary tables, these storage technologies are harder for developers to understand and use. It also requires work to move data between familiar on-premises relational databases and hierarchical cloud datastores. For example, achieving good performance probably means organizing your data hierarchy to optimize for your app’s most common queries, something that differs from the usual relational approach.
  • None of them support standard SQL. This adds to the unfamiliarity, and it also means that useful things like joins and aggregates aren’t generally available. And each platform has its own query language, making life more difficult for developers and increasing platform lock-in.
  • The lack of standard relational data means that existing tools for working with that data, such as reporting services, can’t easily be used.
  • Because there’s no schema, programs will contain more errors. Rather than relying on the database to catch attempts to, say, store a string in an integer field, you’ll have to find these yourself.
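
To make that last point concrete, here is a minimal sketch in Python. The `SchemalessStore` class is a hypothetical in-memory stand-in, not the API of the Datastore, SimpleDB, or Windows Azure Tables, and the `CUSTOMER_FIELDS` rules and `validate_customer` helper are likewise invented for illustration. The store accepts a string where an integer was intended, and the type check a relational schema would have performed ends up in application code.

```python
# Hypothetical in-memory stand-in for a schema-less store (not any vendor's API).
class SchemalessStore:
    def __init__(self):
        self._items = {}

    def put(self, key, item):
        # No schema: any attribute name and any value type is accepted.
        self._items[key] = dict(item)

    def get(self, key):
        return self._items[key]


# Assumed type rules that the application must now carry itself.
CUSTOMER_FIELDS = {"name": str, "age": int}


def validate_customer(item):
    """Return a list of problems; an empty list means the item looks well-formed."""
    problems = []
    for field, expected in CUSTOMER_FIELDS.items():
        if field not in item:
            problems.append(f"missing field: {field}")
        elif not isinstance(item[field], expected):
            actual = type(item[field]).__name__
            problems.append(f"{field} should be {expected.__name__}, got {actual}")
    return problems


store = SchemalessStore()
bad = {"name": "Ann", "age": "forty"}
store.put("c1", bad)           # the store accepts the bad value without complaint
print(validate_customer(bad))  # the check the application has to run on its own
```

Nothing stops a caller from skipping `validate_customer`, which is exactly the kind of error a relational schema would have caught at write time.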

These are significant limitations, and they raise a big question: Why would anybody use these things? Why don’t the cloud platform vendors just give us relational storage with SQL rather than these limited approaches?

The reason is that nobody seems to know how to make a relational DBMS scale to hold really massive amounts of data. You certainly can make a relational system handle more and more data by running it on ever-larger machines, but it’s much harder to do this by replicating relational data across multiple machines. In other words, traditional relational databases scale up, but they’re hard to scale out.

One way to scale out with a relational DBMS is to divide your data across multiple instances of the DBMS. Maybe all customers whose names start with “A” are in one instance, all whose names start with “B” are in the next instance, and so on. This approach, sometimes referred to as sharding, can work. But it’s hard to administer, and think about what it gives up: You lose the familiar all-in-one relational world, you can’t do SQL queries across different instances (which means you lose joins and aggregates), you can’t easily use reporting tools across different instances, and you no longer have an automatically maintained common schema across your entire database.
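
Here is a rough sketch of what that looks like, assuming two in-memory SQLite databases as stand-ins for separate DBMS instances. The `customers` table, the two letter ranges, and the helper names are all made up for illustration. Customers are routed to a shard by the first letter of their name, and a simple total across all customers can no longer be a single SQL statement: the application has to query each shard and combine the results itself.

```python
import sqlite3


def make_shard():
    # Each shard is an independent database with its own copy of the schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT PRIMARY KEY, balance INTEGER)")
    return conn


# Hypothetical split: names starting A-M in one instance, N-Z in the other.
shards = {"A-M": make_shard(), "N-Z": make_shard()}


def shard_for(name):
    # Route by the first letter of the customer's name.
    return shards["A-M"] if name[0].upper() <= "M" else shards["N-Z"]


def add_customer(name, balance):
    shard_for(name).execute("INSERT INTO customers VALUES (?, ?)", (name, balance))


def total_balance():
    # No single query spans both instances, so the aggregate runs once
    # per shard and the application adds up the partial results.
    total = 0
    for conn in shards.values():
        (subtotal,) = conn.execute(
            "SELECT COALESCE(SUM(balance), 0) FROM customers"
        ).fetchone()
        total += subtotal
    return total


for name, balance in [("Alice", 100), ("Bob", 250), ("Nina", 75), ("Zed", 25)]:
    add_customer(name, balance)

print(total_balance())
```

A sum is the easy case. A join between tables that live on different shards would need the same kind of hand-written merging, and each shard's schema has to be kept in step with the others by hand.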

Does this list of problems sound familiar? It should, since it mirrors what you lose with the hierarchical storage mechanisms provided by cloud platforms. These hierarchical stores are all focused on providing massively scalable storage, which means scale-out storage. Just like sharded databases, they trade off functionality for scalability.

Perhaps we’ll one day see cloud platforms offer scale-out storage that provides everything we now get in traditional relational databases. At the moment, however, the state of the art in scale-out storage seems to require giving up much of what we’re used to with SQL and relational databases.

So when should you use a cloud platform’s scale-out storage? In some cases, such as Google AppEngine, there’s no choice: All persistent data is kept in the Datastore. But other platforms have options. Amazon Web Services, for instance, offers both SimpleDB for scale-out storage and the ability to run a standard relational DBMS such as Oracle or SQL Server in a virtual machine. This latter option won’t scale as well, but it does give you a full relational system.

I’d argue that scale-out storage is so limited that it should be used only when your app requires enormous scalability. Giving up the advantages of relational databases makes sense only when the trade-off between functionality and scalability is worth it. So far, my sense is that not many cloud apps require this—using SimpleDB appears to be much less popular than running a relational database in a VM, for example.

Relational storage is a wonderful thing. It won’t always solve your problem, and so embracing the limitations of scale-out storage is sometimes necessary. But unless your cloud app needs massive scale, the relational option still makes plenty of sense.



 


Comments:

There is also the question of transport from the client application to the cloud data service. The familiar model that developers follow today is REST (GET/POST/PUT/DELETE) on documents in the data service. These operations can cover much of what a data-driven application needs, and in those cases a simple data service can meet the application's requirements. So I would add the Rich Internet Application use case to the list of applications that make sense to build with a simple data service approach. Use a simple data service approach when you want huge scalability and/or when you want to build Rich Internet Applications with technologies like AJAX, Flash, Silverlight, and JavaFX.
 

I think scale-out is only one axis of cloud-based storage. Another very important parameter is higher resilience compared to an RDBMS. With cloud-based storage you can distribute data quite easily across multiple geographical locations, with as many copies as you need to meet the resilience numbers in the SLA. On top of that, you can achieve all of this on commodity servers, which lowers the provider’s cost even further. I doubt this is achievable at even a fraction of that scale on current RDBMS systems, where higher resilience typically requires specialized and very costly hardware. The complexity of doing the same with an RDBMS would be simply enormous. The main question for me, therefore, is whether the majority of enterprise applications really need such support, enough to justify a hierarchical data model that usually isn’t a natural fit.
 

A couple of thoughts:

Jerry, you're right that a RESTful interface is useful. You don't have to give up a standard relational store to get this, however. On Windows, for instance, ADO.NET Data Services can put a RESTful head on a variety of different storage bodies, including relational stores. (In fact, Windows Azure tables use ADO.NET Data Services to expose their RESTful interface.) A cloud relational store that also provided a RESTful interface would address your concern without giving up the benefits of our familiar relational world.

And Libor, scale-out storage certainly does provide higher reliability than traditional relational databases. Still, the great majority of apps today are happy with the reliability they get from existing relational storage technologies. Why give up all of the benefits of relational storage for more of something that you don't really need? I'd be surprised if most developers do this.
 

Simple question: does the recent announcement from the Microsoft SDS team change your view on this?

http://blogs.msdn.com/ssds/archive/2009/03/10/9469228.aspx
 

No. It's a great thing for the Azure platform to include relational storage--it's an essential service. But I'd expect the SDS relational storage to be significantly less scalable than Windows Azure tables, Microsoft's scale-out storage offering.
 
