Top 4 Reasons Why a Shared Development Database is Evil.

by

It’s pretty common.  A development team will frequently have one instance of SQL Server — filled with data from production — that all the devs use to develop and test the next version of the application.  It’s one of those things in software development that feels like “common sense”.  Of *course* you’re going to have a single instance of the database that everyone uses.  How else would you keep all the schema changes synchronized across everyone’s development workstation?  How would you move all that test data around?  I mean, every time that you push a new version of the schema, you’d have to re-build the database and re-populate it with test data.  That’d just be a pain. 

The unfortunate thing is that “common sense” can sometimes be wrong.  “Common sense” says that the Waterfall project management approach (think project plans and Gantt charts) is obviously right but piles of empirical data says that Waterfall stinks and that there are tons of hidden and not so hidden problems.  Same thing for developing with a shared database.  Tons of hidden problems.

Reasons Why a Shared Database is Evil

Reason #1: Fuzzy Version Control

If you’re developing your application’s code (C#, VB.NET, Perl, etc etc), you wouldn’t dream of editing it without the help of version control.  Without version control, how would you share what you’d done with the rest of your team?  How would you roll back if you made a mistake?  How would you make sure that everyone else’s changes work with yours?  You also wouldn’t dream of developing your middle-tier (business object, Domain Model) code under source control but then skipping version control for your user interface (ASP.NET, WPF, Silverlight) code.  It wouldn’t make sense.  You’d never have a way to know what version of the UI code worked with the middle tier code.  Version control helps you sync various pieces of code together in order to form a coherent, compileable, deployable application.   You simply can’t develop a complex application without version control. 

Now think about that shared database code.  Without the appropriate version of your database schema (tables, stored procedures, etc), your application won’t work.  Chances are pretty high that your database code isn’t versioned at the same time as your application code.  If you’re using a shared database, it’s usually up to the developer to *remember* to make a copy of whatever he/she has changed and check it in to version control.  That’s crazy!  Why do developers think that that’s ok? 

What can go wrong?  Because it’s up to the developer to check in code, it’s difficult to feel truly confident that what you have in source control matches the application code.  If you don’t have the code in source control, it’s difficult to do branching & merging and it’s difficult to know what code is in test/stage/production, and it’s very difficult to reliably move your application (compiled code + db schema) from dev to QA to stage to production.  Any doubt that you have or any “fuzziness” that you have in your versioning shows up as production deployment problems — either you didn’t test the right code or you didn’t deploy the right code — there are a zillion ways that not exactly precisely totally knowing what code you have where can bite you. 

Reason #2: Shared databases rot your brain and encourage you to write bad unit tests.

A shared database makes it way too easy to write integration tests instead of unit tests.  In fact, shared databases often make it DIFFICULT or IMPOSSIBLE to write unit tests.  If it’s remotely difficult for your developers to do The Right Thing, they won’t.  We developers are lazy.  We know the best practices but, oddly, don’t follow them.

Your unit tests are probably not “unit” tests if you write to a database.  Anything that writes to a database as part of the test is should probably be considered an “Integration Test”.  The majority of your test code should be unit tests that test small units of functionality in isolation from other pieces of code in the application.  These unit tests are going to code against interfaces and use A LOT of mock objects — especially mocked versions of your database code.  The integration tests are also going to be narrowly scoped (meaning: they’re going to test as little as possible and the purposes of the tests are going to be very focused) and will make calls in to the database. 

Here’s the catch: in order to be good, valid and accurate, integration tests need to know the state of the database when they run.  This means that the integration test needs to make assumptions about which records are and are not in the database.  If you’re going to check to make sure that you can’t create a new user with a duplicate user name, you basically have to make sure that there is a user record in the database that has is using the user name that you want to attempt to duplicate.  You’re basically setting up the initial state of the test and if that duplicate user record isn’t already in the database, that test is going to give you a failure.  (Tricky, isn’t it?) 

Now that’s just one single integration test.  You’re going to have a ton of those integration tests.  If every test writes some data in to the database as it does its work, how are you going to make sure that these tests aren’t going to be stomping all over each other when you run your entire suite of integration tests together?  (Yah. Tricky, isn’t it?)  You’ve got to make sure that all your unit tests can “play nice” together.  This means that they’re going to be doing tons of INSERTs, UPDATEs, and DELETEs in order to make the database look the way the tests need. 

There’s no way that you can have two developers using a shared database and run all those unit tests at the same time.  1) You’d be constantly messing up what the other developer is working on and 2) whatever that other developer is working on is going to be constantly breaking your tests. 

So, fellow Lazy Developer, are you going to subject yourself to near constant broken unit tests and broken applications –OR– are you just going to skip the unit/integration tests.  Yah…that’s what I thought.  You’re lazy and therefore you’re going to skip writing the tests.  What could *possibly* go wrong?  <grin />

Reason #3: Developers are integrating on top of each other in real-time.

Even if you aren’t writing unit and integration tests, your developers are still going to be doing separate pieces of work against that shared database without any isolation from everyone else.  Personally, I think it’s important to be able to work on my code in isolation and only integrate with the rest of the team when I’m ready.  I signal to the other developers that I’m ready to integrate by checking my code in to version control.  At that point, the continuous integration build kicks off and verifies that my code does actually integrate.  (And since I’m using TFS2010 Gated Checkin, if it doesn’t integrate and the continuous integration build fails, my checkin is rejected.) 

Let’s say that you don’t have that developer isolation.  Let’s say that Dev #1 is working on implementing a customer management screen and Dev #2 is working on changes to the security system.  For Dev #1 to make the app run, he/she has to log in to the application.  Now, if Dev #2 is in the middle of making changes and the security tables and stored procedures are currently broken or otherwise doesn’t match the version of the security code that Dev #1 is running on his/her workstation, that login window is broken.  Now Dev #1 can’t log in to the application to test the customer management screens.  Dev #1 is now blocked by Dev #2.  In reality, if it’s a change like this, it’s probably not just Dev #1 who’s blocked but it’s also Dev #3, #4, #5, #6, #7, etc who are blocked, too. 

Reason #4: Everyone has DBO permissions.

If you need to make changes to the database schema, you need some fairly “beefy” permissions on that database.  You need CREATE, DROP, ALTER, EXECUTE, VIEW DDL, etc etc etc.  You need lots of permissions.  You need lots of permissions on things that haven’t even been created yet.  So, either you’re going to drive your DBAs batty with 60 trillion “hey…can you grant me permissions on xyz” requests or the DBA is just going to give all the developers DBO rights on that database.  Makes total sense. 

Ok.  So we developers are lazy, right?  We know the right thing to do but we don’t do the right thing unless we’re being watched.  So, we know that we should be running and testing our applications with the least amount of security permissions as possible.  We’re going to (hopefully) deploy our application in to production with “least privilege” (aka. the minimal amount of permissions required to run the app) so we should be doing the same thing in the DEV, QA, and STAGE environments, too.  How much do ya wanna bet that your run your unit tests with your “DBO” permissions?  I’d put the chances at about 0% that your run your unit/integration tests or the version of the application that’s on your local machine with that low privilege user account.

What this usually means is that you don’t test the application under the configuration that it’ll be running in in Production until fairly late in the development cycle.  This means that you’re going to miss security bugs and deployment bugs related to permissions late in the development cycle.  For my customers who insist on using a shared database, deploying to production is always a chore because they’re always deploying the wrong versions of schema objects and are almost always battling permissions problems. 

The Fix: It’s all about the Continuous Integration build.

Now, if you don’t have a shared database, it means that you’ve solved a lot of your deployment problems. You have a way to track and version control database changes easily and you probably have a tool that lets you easily deploy the database from scratch.  My tool of choice is the Visual Studio Database Project (formerly known as “Visual Studio Team System Database Developer Edition”, “DBPro”, or “Data Dude”.) 

If you’re using a DBPro project, you can easily hook this in to your automated builds.  So, when someone checks in code to the database project, a database build kicks off, DROPs the database and re-creates it from scratch using the latest code in version control.  Since we’re using a TFS2010 Gated Checkin, if this code cannot be deployed to create a valid instance of the database, the checkin gets rejected.  If the checkin passes, we have a completely accurate version of the database that is guaranteed to match what’s in version control.  Now if we run our integration tests against this fresh database, they should pass.  If they don’t pass, then we know that we don’t have everything correct in version control.  If there’s a problem, we catch it early.

We ensure that the integration tests work with “Least Privilege” by not running the unit tests from a non-privileged account and making sure that developers don’t have anything even remotely resembling DBO permissions on the build server’s database.  This ensures that the unit tests pass when running with the permissions we’ll use in production. 

Also as part of the build, we deploy our freshly-built application in to Internet Information Server (IIS).  We tear down the previously deployed version of the application and build it back again. This gives us something to use to verify that our application as checked in to version control still actually works.  No shared database.  We build everything from the ground up.  If we can’t do it in an automated fashion using the latest code that’s checked in to source control, our application is officially broken. 

It’s all about catching problems early.

Common Excuses Why You Can’t Use a Non-Shared Database

Excuse #1: We need test data.  How do we move all that test data around?

This is tricky.  You glib answer is that you shouldn’t be moving your test data around because you should be creating it from your integration tests and that your unit/integration tests are almost the only thing that matters.  In reality, that’s only mostly true.  Getting away from a shared database is going to take some actual work…but it is very rare for a company to TRULY need all that test data to do development.  I’m making a distinction here between DEVELOPMENT and TESTING.  In development, you probably only need a small subset of data.  When you move your application into a more “integration testing” or QA-centric environment, you’re probably going to want to test your application with a lot more data. 

Here’s a tip: For your development environment, try creating some SQL scripts that will move a small subset of data around.  Version control these scripts and write a batch file that will insert these records in to the developer database. 

Excuse #2: We need test data.  Without test data we can’t tell how our queries perform and how our application performs.

True.  It’s difficult to performance test an application when it doesn’t have enough data in the database.  The timings on everything is going to be totally wrong and totally off.  BUT!  Do you really performance test your applications on developer workstations?  Do all your developers have a NAS and a bunch of multi-core machines with 32 GB of memory on their desktop?  Not very likely.  This is why you’ll have a separate integration environment.  You’ll do this kind of performance testing when the code has been deployed in to a more representative environment. 

Excuse #3: We’ve got privacy concerns.  We can’t put production data on our developer laptops!  What if the machine is stolen?!

Yah.  Totally.  What if the machine is stolen.  Or what if the developer takes a backup of your production “test data”, zips it up, puts it on a USB thumb drive and walks it out the door?  You’re still screwed. 

In reality, if you have privacy concerns about your data, your developers probably shouldn’t be developing with that production data anyway. 

Summary

I’m sure that there are lots more reasons why a shared development database is evil and there are tons more common excuses.  In summary: using a shared database is pure evil and there are tons of little hidden costs and developer performance penalties. 

Do yourself a favor and start freeing yourself from the tyranny of the shared database!

-Ben

 

Got a shared database that you’d like to send to “shared database heaven”?  Problems figuring out what database code matches with what application code?  Issues figuring out what you’ve deployed to dev, QA, stage, and production?  We can help.  Drop us a line at info@benday.com.


6 Responses to "Top 4 Reasons Why a Shared Development Database is Evil."
  1. 100% agreement here. It’s all about failing early and loudly. With a shared DB there are too many factors that could influence why tests fail. Data dependent tests are a common symptom of failed tests in a shared dev DB. As developers loose faith in their tests the chance [Ignore] or deletion of a test increases.

    Also 100% agreement in the build DB being locked down so only a select few can write to it, for a slightly different reason. Developers will be developers, at 3 a.m., with a broken build, the temptation to “spike” the data so a test will pass is very great. With locked down permissions, this temptation is not there.

  2. I don’t think this is a resolved question, and I certainly don’t think you can say shared development databases are evil.

    Let me give a concrete counterexample. A developer builds a schema, creates procedures, and all that good stuff that is involved in database development. Let’s say it’s a small project and takes them two weeks.

    They now deploy all of the necessary changes into what’s been described in this article as an integration testing server. All that matters is that this is a server with a lot of data, and real data. Dirty data. Production or production-like data.

    Now we find out quite a few things. First of all, we find out that the schema and code are horrible. They perform at a crawl. The developer has scalar functions in their query predicates and check constraints. The datatypes they have defined are unable to integrate with horrible data that resides in some other database… a proprietary, third party system against which schema changes cannot be made.

    The DBA simply refuses to deploy such underperforming code, and the schema incompatibilities mean the system is non functional anyway. The developer has to start again from scratch. Two weeks lost.

    Yes, in a shared model there is a modest chance that two people alter the same procedure, and one overwrites the other. But if this happens it is noticed early, and the “merge conflict” needs to be resolved early. And it’s not hard to resolve… as long as you have the code in revision control (using, say, the red gate tools) you haven’t permanently lost one developer’s version of the code. You just need the developers involved to communicate and come up with an appropriate solution.

    Indeed, if you had a merge conflict using a non-shared model the developers involved will probably need to communicate to resolve the conflict anyway.

  3. I don’t think you covered another evil of shared databases. Everyone has to wait for the shared database to be restored. Due to timing issues, our data is near-duped and we can’t easily clean it up because of constraints. It is quicker to just restore the database, but first we have to ensure they everyone has scripts of any objects they need to create after the restore. While everything is getting sorted out to prepare for the restore, we can’t build anything else. If we had separate databases, this would not have been a problem.

  4. I agree 100% with this article. I know this is a very old post and so is Don’s comment but I wanted to respond to it anyway:

    Don – if a developer actually spends 2 weeks building a simple app with such rudimentary and fundamental problems that can’t be fixed with a few indexes and requires a total rewrite then either a) fire the developer because they suck or b) fire the manager that stuck an inexperienced junior dev with no database experience on a database app development project all by himself with nobody to review anything he was doing the whole time.

    Any mediocre developer with even a moderate amount of database development and performance optimization/testing experience should have a rough “order of magnitude” idea of the query performance characteristics of their application without ever running a single query. If they don’t then they shouldn’t be tasked to go develop a whole application by themselves.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.