NoSQL on the Microsoft Platform

On August 10, 2010, in database, highscalability, by netoearth

NoSQL is a trend that is gaining steam primarily in the world of Open Source. There are numerous NoSQL solutions available for all levels of complexity: from queryable distributed solutions like MongoDB to simpler distributed key-value storage solutions like Cassandra. Then there’s Riak, Tokyo Cabinet, Voldemort, CouchDB, and Redis. However, very few of these packaged NoSQL products are available for the other end of the platform market: Microsoft Windows. I’m going to outline what’s available now and briefly touch on some opportunities that are still available to the daring Microsoft engineer.

What’s Available Now

A handful of NoSQL projects currently support Microsoft Windows well enough for practical use.

Memcached

Memcached is not traditionally considered a NoSQL solution, but as a distributed, in-memory key-value cache, it can hold a variety of transient datasets in a manner typical of other NoSQL data stores.

NorthScale offers a nicely packaged, freely downloadable version of Memcached that runs on both 32-bit and 64-bit versions of Windows. You can check it out here: http://www.northscale.com/products/memcached.html
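Clients talk to Memcached over a simple line-based text protocol on TCP, which is part of why it is easy to use from any platform, Windows included. The sketch below is written in Python for brevity. It only formats `set`/`get` commands and parses a `get` reply. It does no networking, and the helper names `build_set`, `build_get`, and `parse_get_response` are hypothetical. A real application would send these bytes to a Memcached server (port 11211 by default), and in practice it would normally use an existing client library.

```python
def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Format a storage command: 'set <key> <flags> <exptime> <bytes>\\r\\n<data>\\r\\n'.
    exptime is the item's time-to-live in seconds (0 means no expiry)."""
    if any(c.isspace() for c in key) or len(key) > 250:
        raise ValueError("keys may not contain whitespace and are limited to 250 characters")
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(*keys: str) -> bytes:
    """Format a retrieval command for one or more keys."""
    return ("get " + " ".join(keys) + "\r\n").encode("ascii")


def parse_get_response(resp: bytes) -> dict:
    """Parse 'VALUE <key> <flags> <bytes>\\r\\n<data>\\r\\n' blocks ending in 'END\\r\\n'.
    Keys that missed the cache are simply absent from the reply."""
    items = {}
    pos = 0
    while True:
        eol = resp.index(b"\r\n", pos)
        line = resp[pos:eol].decode("ascii")
        if line == "END":
            return items
        _, key, _flags, nbytes = line.split()[:4]
        start = eol + 2
        end = start + int(nbytes)
        items[key] = resp[start:end]
        pos = end + 2  # skip the \r\n that terminates the data block
```

For example, `build_set("greeting", b"hello", exptime=60)` produces `b"set greeting 0 60 5\r\nhello\r\n"`. The `exptime` field is what makes Memcached a natural home for transient data.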

MongoDB

MongoDB is a document-based (JSON-style) data store capable of scaling horizontally via its auto-sharding feature. It uses a simple but powerful query language based on JSON/JavaScript and is capable of fast inserts and updates, thanks largely to its low-overhead atomic modifiers. MongoDB also supports Map/Reduce for aggregation and data processing.
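Atomic modifiers avoid a round trip. The client does not read a document, change it locally, and write the whole thing back. Instead it sends an update document such as `{"$inc": {"views": 1}}`, and the server applies it in place. The toy function below is hypothetical and written in Python. It mimics the documented behavior of three real MongoDB operators, `$set`, `$inc`, and `$push`, on plain dictionaries. It handles only top-level fields and says nothing about how MongoDB implements them internally. With a current driver, the same update document would be passed to a call such as pymongo's `update_one(filter, update)`.

```python
from copy import deepcopy


def apply_update(doc: dict, update: dict) -> dict:
    """Return a copy of doc with MongoDB-style $set, $inc and $push applied.
    Toy model: top-level fields only, no dotted paths, no other operators."""
    result = deepcopy(doc)
    for op, fields in update.items():
        for field, value in fields.items():
            if op == "$set":
                result[field] = value
            elif op == "$inc":
                # A missing field is treated as 0, so $inc creates it.
                result[field] = result.get(field, 0) + value
            elif op == "$push":
                # A missing field becomes a new one-element array.
                result.setdefault(field, []).append(value)
            else:
                raise ValueError(f"unsupported operator: {op}")
    return result
```

For example, applying `{"$inc": {"views": 1}, "$push": {"tags": "windows"}}` to `{"views": 10, "tags": ["nosql"]}` yields `{"views": 11, "tags": ["nosql", "windows"]}`. Only the update document has to travel to the server.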

The team at 10Gen, the company behind MongoDB, officially supports the Windows platform and has done so since early in the development process. At the time of writing, MongoDB is at version 1.6.0 and is in use at a number of high-profile web companies.

You can find more information about setting up MongoDB on Windows here: http://www.mongodb.org/display/DOCS/Windows

And you can download the latest version here: http://www.mongodb.org/downloads

sones GraphDB

The sones GraphDB is an enterprise graph data store developed in managed .NET code using C#. It is open source and free to download for non-commercial use; commercial licenses are also available.

Graph databases in general are a different kind of beast than the typically referenced NoSQL storage examples. They excel at handling a specific class of problem: datasets that include a high number of relationships and require traversing those relationships quickly and efficiently.

A common use case for graph databases is storing social relationships, or “social graphs”. These social graphs are often made up of nodes that each have many individual relationships with other nodes. Traditional relational databases handle this problem domain poorly, because following each additional hop of relationships typically requires another join.
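Traversal suits this kind of data because each node stores a list of its neighbors. Finding everyone within N hops (friends, friends of friends, and so on) then becomes a breadth-first search in which each hop is a dictionary lookup rather than another join. The sketch below is a hypothetical Python example over an in-memory adjacency list with made-up names. It does not use the sones GraphDB API.

```python
from collections import deque


def within_hops(graph: dict, start: str, max_hops: int) -> dict:
    """Breadth-first traversal returning {node: hop_distance} for every node
    reachable from start in at most max_hops relationship hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:  # first visit is the shortest distance in BFS
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


social = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice"],
    "dave": ["bob", "erin"],
    "erin": ["dave"],
}
```

Calling `within_hops(social, "alice", 2)` returns alice's friends and friends of friends (`bob`, `carol`, `dave`) along with alice herself. `erin` is three hops away and is excluded.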

You can find more information about the sones GraphDB source code on GitHub here: http://github.com/sones/sones

And cost information and a feature breakdown of the various license options here: http://www.sones.com/produkte

Voldemort

Voldemort is a distributed key-value storage system used at LinkedIn for “certain high-scalability storage problems where simple functional partitioning is not sufficient.” It is written in Java, and because Java is cross-platform, it can be configured to run on Windows.

Check out this link on GoNoSQL.com to learn more about setting it up in a Windows environment: http://www.gonosql.com/how-to-install-voldemort-on-windows/

NoSQL Project Opportunities

This is an exciting time in the world of Microsoft. Partly because the Microsoft community has been slow to take up the NoSQL trend, opportunities abound for developers to begin implementing a host of NoSQL storage solutions.

Thinking through what there is to work with and build upon, I see a few interesting possibilities…

Managed ESENT-Backed Distributed Data Store

The best analogy I can think of for Managed ESENT is that it is the Berkeley DB of the Microsoft world. It is little known and rarely used by .NET developers, but its performance and reliability have been proven time and again through ESENT's use in major Microsoft products such as Active Directory and Exchange Server, among others.

More technically, ESENT is an “embeddable database engine native to Windows.” Managed ESENT is a CodePlex project that provides a .NET wrapper around esent.dll, which ships with every recent version of Windows.

I will say this: in the limited testing I have done, it is damn fast, on the order of 100,000 inserts per second. See the Performance section here for more rough stats: http://managedesent.codeplex.com/wikipage?title=ManagedEsentDocumentation&referringTitle=Documentation

I’m imagining a Microsoft NoSQL solution that uses Managed ESENT as the backing store for a simple, distributed, key-value or columnar data store. Use C# or F# with asynchronous TCP networking and consistent hashing or a lookup/routing instance and we could have something here. Makes me want to play around with that and see what comes of it – anyone else interested in thinking this through with me in the comments?
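For the routing piece, one common approach is a consistent hash ring. Each node is hashed to several points on a ring, and a key belongs to the first node point found clockwise from the key's own hash. Adding or removing a node then remaps only the keys in the neighboring arcs, instead of reshuffling everything as `hash(key) % node_count` would. The post suggests C# or F#, but this sketch uses Python for brevity. The class name, the use of MD5, and the default of 100 virtual points per node are illustrative assumptions. The sketch covers only key-to-node routing, not ESENT storage or networking.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Routes keys to nodes. Each node is placed at `vnodes` points on the
    ring so that keys spread more evenly across nodes."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        # MD5 is used only for its uniform spread, not for security.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove(self, node: str) -> None:
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            self._points.remove(point)
            del self._owner[point]

    def node_for(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First point clockwise from the key's hash, wrapping around to the start.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

When a node joins, every key that changes owner moves to the new node, and all other keys stay where they were. In a real deployment each node name would map to a server address, and every client (or the lookup/routing instance) would need the same ring membership to agree on where a key lives.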

In-Memory Dictionary-Backed Distributed Data Store

Another alternative, admittedly food for thought for a future post, is an in-memory dictionary-backed distributed data store. This is similar in concept to the Managed ESENT version above, but held entirely in volatile memory.

This could serve as the basis for a distributed cache, or it could be made durable by replicating data across a series of nodes: as long as at least a subset of the nodes is running at any one time, the data in the store survives. Amazon, or any other cloud-based non-persistent server solution, would host this perfectly. It’s “out there” as a general concept, but I’m a proponent in a big way; more on this in a later post.
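Here is a minimal sketch of the replication idea. It assumes a fixed set of nodes, and each node's storage is a plain dictionary standing in for a .NET `Dictionary<TKey, TValue>`. Each key is written to `replicas` consecutive nodes, starting at a hash-chosen one, so a read succeeds as long as at least one of those nodes is still up. The class and method names are hypothetical. Networking, re-replication after a node comes back, and consistency between replicas are all left out.

```python
import hashlib


class ReplicatedMemoryStore:
    """Toy in-memory store that writes every key to `replicas` nodes."""

    def __init__(self, node_count: int, replicas: int = 2):
        if not 1 <= replicas <= node_count:
            raise ValueError("replicas must be between 1 and node_count")
        self.nodes = [dict() for _ in range(node_count)]  # one dict per node
        self.up = [True] * node_count
        self.replicas = replicas

    def _targets(self, key: str) -> list:
        # Start at a hash-chosen node and use the following nodes as replicas.
        start = int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16) % len(self.nodes)
        return [(start + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key: str, value) -> None:
        written = 0
        for n in self._targets(key):
            if self.up[n]:
                self.nodes[n][key] = value
                written += 1
        if written == 0:
            raise RuntimeError(f"no replica available for {key!r}")

    def get(self, key: str):
        # Any live replica holding the key can answer the read.
        for n in self._targets(key):
            if self.up[n] and key in self.nodes[n]:
                return self.nodes[n][key]
        raise KeyError(key)

    def fail(self, n: int) -> None:
        # A crashed node loses everything it held in volatile memory.
        self.up[n] = False
        self.nodes[n].clear()

    def recover(self, n: int) -> None:
        # The node comes back empty; copying data back to it is out of scope.
        self.up[n] = True
```

With `replicas=2`, losing one of a key's two nodes leaves the value readable. Losing both loses it, which is why the number of replicas has to match how many simultaneous node failures you expect.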

Closing Thoughts

There are clearly limited options available to Microsoft/.NET developers as far as NoSQL solutions go. That is a shame, but it can and will change with time. As .NET developers, it is up to us to make that happen, and with some of the ideas I’ve presented above, it should be clear that opportunities abound.

I consider this an exciting time and being able to bring NoSQL to the Microsoft masses is an effort that I’m willing to get behind. If there are any volunteers that would like to discuss this further, please comment below!

UPDATE:

Matt Warren mentioned RavenDB.net in the comments. It looks like an interesting document database project written in .NET. Thanks, Matt!

UPDATE 2:

Faulkner mentioned in the comments that Cassandra works on Windows as well! Check out this resource for more information on getting it running and working with it from the client side: http://www.ronaldwidha.net/2010/06/22/resources-on-apache-cassandra-for-net-devs/

About the Author

Max Indelicato

Director of Infrastructure and Software Development at Stride & Associates

Max has worked at a variety of companies, including startups, growth-stage businesses, and established enterprises. He’s held roles such as Chief Software Architect and Director of Technology, in which he’s built and maintained mobile marketing platforms, large-scale public-facing e-commerce websites, and a series of financial applications supporting firms that deal in fixed-income securities.
