
Reconsidering Version Control

Ever since I turned indie, version control just hasn’t been much of an issue. Gone are the days of hundreds of multi-GB files that changed multiple times per day. With small teams of one or two plus a few collaborators, Subversion hosted remotely worked fine. Of course, all the cool kids these days are going on about how their distributed version control systems solve world hunger, but I’ve been mostly ignoring it because I have better things to do with my time (like writing games [1]).

Yesterday things changed a bit. As a result of last week’s “growing” post, Manuel Freire is going to join me to help with Flower Garden development. That makes two of us banging on the same codebase from two different time zones, so we don’t get the benefit of sharing the same closet, as was the case with Power of Two Games. Since I was in my get-things-done mindset, I figured I would just set up a new svn repository for the project, move over the Flower Garden data, give us both access to it, and move on.

But no, it couldn’t be that easy. Can you guess what the first words out of his mouth were when I asked him about version control? “Oh, I love Git!”

That was the last straw. I had to do a mini-research session on version control systems, so I spent a couple of hours looking into it. If we were going to move over to something, now would be the time to do it.

Git and Mercurial

Git and Mercurial both look great. I was debating which one to go with until I realized that they’re two flavors of the same thing, so the choice comes down to personal preference and taste. The best description I found of the differences between them casts Git as MacGyver and Mercurial as James Bond. When it comes to computers, I’m totally a MacGyver guy (actually, that might be true when it comes to anything now that I think about it), so that made my decision easier.

The big feature everybody keeps talking about for distributed version control systems is effortless branching. That’s great, but I really have no intention of branching much. I haven’t created a single branch in the last four years, and I don’t expect to start doing that now. Next.

The other big feature is working disconnected from the network. That’s something I could use, but considering I’m only offline a handful of times a year, it really isn’t enough of an incentive to switch to a whole new system.

Git sounds like a great tool for large, distributed, open-source projects with hundreds of contributors, but frankly, I couldn’t find anything else that was worth mentioning for a small project and a handful of people. I feel like someone is trying to sell me a Porsche when my beat-up Hyundai is still perfectly functional for driving to the grocery store once a week. Am I missing something obvious?

Hosting

Hosting the actual repository was part of the consideration. This is for a private project, so all open-source sites are out. Ideally I wanted to host it just like I do with Subversion on Dreamhost, but the instruction pages on how to install Git and Mercurial there are enough to put most people off. Clearly, there’s a steep learning curve.

So I asked on Twitter for recommendations. I’ve learned that Git and Mercurial users are very vocal and always willing to help someone join their ranks. Within minutes I had a pile of suggestions, including Github, Kiln, and RepositoryHosting.

Prices varied a lot, from $25/month per user for Kiln to $6/month for RepositoryHosting (which gets you unlimited users and 2GB of storage). The Snappy Touch repository is already over 2GB, so it would end up costing a bit more than that, but not too bad.

Of those, Github was definitely the most recommended. I was starting to feel Git might not be the one for me, so I looked a bit more into RepositoryHosting because they had Subversion support. It turns out they also provide Trac, which is a great tool, although I already have that set up myself.

Wish List

Binary files

The one thing I really want in a version control system is good handling of large binary files. I put everything under version control: source code, assets, raw assets, and even the built executables for each version. I want to be able to throw multi-GB PSD files into the repository and have it work correctly (meaning fast, and using as little space as possible).

Perforce did an OK job with that. Git and Mercurial apparently are both horrible at it. So is Subversion, but at least it’s a tool I already know and I don’t have to spend time learning the ins and outs of how to optimize the Git database or how to make backups, or cull unused trees.

GUI

I love my command-line tools. I live in Terminal for a good part of the day, and having a real shell is one of the things that makes my life so much more pleasant under Mac OS than under Windows. But there are some things for which a GUI tool is a really useful addition, and version control is one of them.

On the Mac, I’ve been using Versions as a client for Subversion and it does everything I want. It’s fast, handles multiple repositories, lets me browse history, diff changes, etc. From my brief search and other people’s comments, there’s nothing quite like that for Git or Mercurial yet. That’s a pretty big, gaping hole.

Low-level access

Looking at all those hosting providers, I realized how much I want to have low-level access to the database. I want to be able to back it up myself, and run svnadmin when I want to. A lot of those hosting sites looked really pretty, but you were very limited in what you could do.

Coming To A Decision

If this were a thriller, you’d be disappointed. I’m afraid there are no plot twists and you can already guess the conclusion.

In the end, since neither Git nor Hg is built to address my biggest need (large binary files), I’ll stick with svn. It might be old, it might not be cool, but it serves my needs, I already know how to use it, I have the tools, and I can administer it and fix problems. I can concentrate on what matters instead: writing games.

I decided to continue hosting it myself on Dreamhost. I can easily have one repository for every major project. However, by default, Dreamhost creates svn repositories using htaccess security over plain HTTP. That’s OK, except that none of the data is encrypted in transit, as it would be if I were using ssh. I could use HTTPS, but then I would have to set up a certificate and pay for a fixed IP address, so instead I found another way to get a secure connection.

All Subversion repositories live in a single user account (svnuser). I create a new user group for every repository, and change all the files in the repository to belong to that group and be group-writable. Make sure you also set the SGID bit on the repository’s directories so any files created in them later still belong to the group. Then I can create a new shell user for every collaborator, and add him to the groups of the repositories I’d like him to have access to. At that point, he’ll be able to access the repository as svn+ssh://username@hostname.com/home/svnuser/repository. All safe and secure.
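For reference, here’s a rough shell sketch of the permission steps, assuming a Linux host, a repository directory that already exists, and that you’re allowed to change its group. The group name fg-devs, the collaborator manuel, the paths, and the function share_repo_with_group are all made up for illustration. Creating groups and adding users to them normally requires root, so on shared hosting you’d likely do that part through the host’s control panel; those steps appear only as comments.

```shell
#!/usr/bin/env bash
# Sketch: let several ssh users share one Subversion repository through a
# Unix group. Group, user, and path names are hypothetical.

# One-time steps that need root (or your host's control panel):
#   groupadd fg-devs                # one group per repository
#   usermod -a -G fg-devs manuel    # give a collaborator access

share_repo_with_group() {
    local repo=$1 group=$2
    [ -d "$repo" ] || { echo "no such directory: $repo" >&2; return 1; }

    # Hand every file and directory in the repository to the group...
    chgrp -R "$group" "$repo" || return 1
    # ...make it readable and writable by the group (X: execute only on
    # directories and files that are already executable)...
    chmod -R g+rwX "$repo" || return 1
    # ...and set SGID on each directory so files created later inherit
    # the group instead of the creating user's primary group.
    find "$repo" -type d -exec chmod g+s {} + || return 1
}

# Example:
#   share_repo_with_group /home/svnuser/flowergarden fg-devs
# after which a member of fg-devs can check out with:
#   svn checkout svn+ssh://manuel@hostname.com/home/svnuser/flowergarden
```

One thing the sketch doesn’t cover: each collaborator’s umask also has to leave newly created files group-writable (002 rather than the common 022), otherwise a commit by one user can create files the others can’t write to. A common fix is a small wrapper script that runs `umask 002` before launching `svnserve`.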

Bonus SVN-Fu

Here’s something that I learned yesterday while I was moving repositories around. It’s probably common knowledge for seasoned SVN admins, but it was new to me.

I had a repository that included a bunch of my projects. What I wanted was to create a new repository that still had all the history, but only for the FlowerGarden part of the tree. I knew about svnadmin dump for transferring whole repositories, but I didn’t know there was a very simple way to only transfer part of it.

First you need to dump it as usual:

svnadmin dump repository > repos-dumpfile

Now, it turns out you can process the dump file before adding it back to another repository. So we can do:

svndumpfilter include FlowerGarden --drop-empty-revs --renumber-revs < repos-dumpfile > fg-dumpfile

Finally, you can add the resulting dump file into a fresh repository and have all the history for that project and only that project:

svnadmin create flowergarden
svnadmin load --ignore-uuid flowergarden < fg-dumpfile

Amazingly, for a repository that was over 2GB, that only took a few minutes. Go Subversion!
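The three steps can also be chained into one pipeline, so the multi-GB dump never has to be written to disk. This sketch assumes bash (for pipefail) and Subversion’s command-line tools on the PATH. The function name extract_project and the example paths are hypothetical; the svnadmin and svndumpfilter options are the ones from the commands above, plus their --quiet flags.

```shell
#!/usr/bin/env bash
# Sketch: move one project, with its history, from a shared Subversion
# repository into a brand-new repository. Function name and paths are
# hypothetical; assumes svnadmin and svndumpfilter are on the PATH.

extract_project() {
    local src_repo=$1   # existing repository, e.g. /home/svnuser/repository
    local project=$2    # top-level path to keep, e.g. FlowerGarden
    local dest_repo=$3  # repository to create, e.g. /home/svnuser/flowergarden

    # Never clobber something that already exists.
    if [ -e "$dest_repo" ]; then
        echo "refusing to overwrite $dest_repo" >&2
        return 1
    fi

    svnadmin create "$dest_repo" || return 1

    # Run the pipeline in a subshell with pipefail so a failure in any stage
    # (not just the last one) becomes the function's exit status.
    (
        set -o pipefail
        svnadmin dump --quiet "$src_repo" \
            | svndumpfilter include "$project" \
                  --drop-empty-revs --renumber-revs --quiet \
            | svnadmin load --quiet --ignore-uuid "$dest_repo"
    )
}
```

Usage would look like `extract_project /home/svnuser/repository FlowerGarden /home/svnuser/flowergarden`. The trade-off of streaming is that if any stage fails you’re left with a half-loaded repository to delete before retrying, whereas keeping the separate dump files lets you rerun just the last step. Also, as far as I know, svndumpfilter stops with an error if something under the included path was originally copied from a path that’s being filtered out, so a project with that kind of history needs more than a single include.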

[1] Or reading. Or going for a walk. Heck, even sleeping would be a better use of my time than futzing around with a new tool.[2]

[2] And yes, I realize I sound like a grumpy old man. Getting there apparently. Now get off my lawn.

This post is part of iDevBlogADay, a group of indie iPhone development blogs featuring two posts per day. You can keep up with iDevBlogADay through the web site, RSS feed, or Twitter.

37 Comments

    • @SnappyTouch SVN is still good and if it does the job well for you why change it. At least you’re not doing by hand with floppy 😉

    • @SnappyTouch Yay for Subversion, still serves us well also. TortoiseSVN can be the only weak point with large numbers of files.

    • @SnappyTouch Interesting, I love git (and github) but didn’t realize it was bad at large files. Still learning git though, I guess.

  1. Side note: with a distributed VCS, there’s not much (or any?) need for backup/admin type of functionality. Your checkout (er… clone) of the repository _is_ a _full_ backup. You could clone from your laptop into your desktop, and it would be the same as cloning from the hosted repository. Hence there are no “admin” type of tools; you have everything locally.

    • Good point. I keep thinking of traditional, server-based repositories. I guess that makes it easier.

    • That’s not entirely true – rebuilding a central hub from the various versions people have on their laptop *sucks*. Sure, it’s great if there are only one or two people sharing.

      But when you start talking many branches, many people, there’s a good chance all users only have a subset of the central one. Which means getting things together piecemeal.

      So, PLEASE, do yourself the favor and backup the central hub. You’ll be happy you did.

      (And I haven’t even mentioned what happens if somebody gets rebase-happy and force-pushes that. You *so* want a backup 😉)

  2. As a grizzled vet of cvs then svn, GitHub is heaven. I have not looked back. It does help to be a bit of a Unix geek for the various odds and ends you need to do on your Mac but that set of tasks is well defined and not really a big deal.

    Go for it,
    Doug

    • I have no doubt I’ll get there one day, but that’s not going to be right away. Maybe by the time I dive in, someone will come up with a decent SCM that works with large binary files too. That would be awesome.

  3. Mercurial and git are extremely useful for throwaway branches that don’t pollute your subversion repository with unnecessary branches and commits (the subversion repository *will* grow in size, unfortunately). You can check out from your subversion repository, hg init, and off you go. Then clone the new mercurial repository, make lots of changes, revert, etc., and eventually copy the version you like back over the top of your subversion working copy.

  4. Well, we went through this discussion on our side. Our situation was a bit different: an 80GB SVN repository, some need for branching, and, as we are quite a few people, I prefer to run things internally. We tried both git and hg on some projects to draw some conclusions.

    1) Git is unsuitable for us as we are mainly a Windows-based shop, and the tool is really too Unix-oriented, with a lack of good Windows support. Perhaps somebody will fix that; until then I would not recommend using git on Windows, but on the Mac it is a completely different story. Hg is working well.

    2) Big binary files are a problem for sure, and graphic artists like big files.

    3) More difficult: teaching less technical people how to use TortoiseSVN was already a bit tricky (not that much, but nevertheless); Hg and Git are a whole different story. The workflow is much more complicated and some people had trouble with it, especially with Git, since its names for operations are really different.

    Our conclusion was to stay with SVN for all binary files and assets, and use Hg mostly for code. It works quite well, as the integration server is the bartender and does the mixing.

  5. The way you talk about branching was also the way I talked about branching before I learned Git and Hg. I did the occasional feature branch in SVN, but nothing major.

    After I started using the distributed systems, however, I branch like there’s no tomorrow. Pretty much everything I do is a branch, especially since it’s so easy to change between branches, and therefore jump to the new branch for this minor fix, and then back to the feature I was working on, and so forth and so on.

    The real beauty comes when committing branches. Since the distributed systems work with changes rather than revisions, they are very good at handling merges, and many times you don’t even notice that you have actually performed a merge.

    A small disclaimer, however, I have only worked with Git and Hg on small projects with 2-4 developers, and we have all been quite technically minded.

    • The reason I avoided branches before is because they were a major pain. Agreed. Oh, do I have the horror stories to tell about branching 🙂

      But now I don’t do branching because I’m mostly doing TDD and continuous integration. So I want to be as close to the head revision as possible all the time. I suppose I could do a branch if I’m doing something really fishy and I’m not sure of the outcome, but since I try not to break anything, I’m OK checking into the main branch each small change along the way.

      • Branching in Git/Hg should almost have a different name from its SVN counterpart. It’s not really anything to do with being distributed, either; they are just better at it. It’s so easy to branch, merge, and move between branches that people tend to create a branch for each new feature. With SVN there is a barrier to branching and merging: it’s easy, but not trivial. Removing that barrier completely changes the way you think about branches. At least it did for me.

      • Branching is super useful for when you want to still do bug fixes and get them out quickly while you are working on a feature.

        In git, this is so easy to do that you have trouble understanding how you lived without it. Git changes your workflow, so you feel happy committing more often. It’s incredibly fast, and if you’re on a laptop without network connectivity you can keep making commits and then just push them to the central server when you get network back.

        I highly recommend that you at least have a play with git. http://gitref.org is a great reference.

  6. I just went through this myself. I tried using Github but I had the same ho-hums as you did. On my very small team I do most or all of the coding, and what I wanted was for my graphics artist to be able to upload game assets and have a handy place to download new betas. Github worked well, just not for what I was hoping for. I am now using SVN and set up SFTP for my art repo / beta distribution.

    Good post.

  7. I used TortoiseSVN on windows for a long time connecting mostly to local network apache servers with the svn extensions, as opposed to remote svn services. On OSX it took a while to find clients I liked – TextMate has a bundle that uses the commandline but it’s easy to misuse the gui, some of our team used Coda but svn in that was quite painful/limited and we got into a few holes now and again, Versions was in beta (I think), it was ok at the time… I also tried an old OSX Finder integrated one from the TortoiseSVN developers, but it was a bit dodgy at the time as well (and wasn’t Snow Leopard compatible, from memory).

    So I’ve mostly ended up using SmartSVN (free version after the demo expired) and it’s been really good – plus some TextMate/commandline for fun 🙂

    I haven’t looked too much at Git – it didn’t really exist without fuss on Windows when I was still using it (I’ve been on OSX for about a year now) and svn seemed to do what we needed. I wouldn’t rule out trying it, but I think a few of everyone’s comments match up with my thinking – I’d prefer less voodoo and command-line magic (not afraid, but all that stuff is a timesink, and working with designers who want GUIs and not command lines is a really big consideration) and it would be great to know that large binary stuff is supported better. Also, if svn is working then there has to be a compelling enough reason to move, and at the moment there isn’t 🙂

  8. Your git branches are more than likely local. They’re a full clone of the repository with history etc. The primary benefit is they sandbox the change you’re making and at any time you can jump back to the master branch for bug fixes etc. The workflow takes a bit of getting used to and to be honest I still forget to branch sometimes. I used to avoid branching like the plague with p4 and svn. The simplicity and speed of branching in git doesn’t even feel like the same feature. Other than the basic concept they’re not really comparable.

    Still, like anything, if it’s not right for the job it’s not right for the job. The lack of a good visual merging tool is a bit of a pain, the best tools in that category I’ve found are diff viewers, not actual merge tools.

  9. Here is a Tortoise like plugin for OSX. I haven’t really looked at it for a while, so I’m not sure what it’s like. Last time I looked it was less than feature complete.

    🙂

    http://scplugin.tigris.org/

  10. Here is my simple setup for hosting all my projects under mercurial.
    I use slicehost to store all my projects.
    Each project is a user with its own home dir; I create the mercurial repo there and clone it as that user through ssh. It works great even with multiple users: either I give the password to the other dev, or I just create a new user, use Linux permissions to give him access, and link it to his home dir.
    It’s not the “proper” way to run a mercurial server:):) but works great on the budget:)

    lzantal

  11. I used hg locally for a while underneath a Perforce client. It was great, because I could branch a ton and not “pollute” Perforce with a ton of throwaway work.

    Consider the ability to trivially branch sort of like having multiple clientspecs under Perforce. It’s not so much that you want to be far away from the head as it is that you might have a couple of ideas cooking and you don’t want to cross-pollute.

    I really liked hg, but never used it with anyone else.

    Once I went to the Mac, I started using git to see what everyone was on about. Plus it seemed more straight forward to set up a server with the requirements that I had at the time. I use GitX (which was mentioned earlier), and it’s fine but it’s no P4V and it certainly doesn’t (apparently?) offer all of the cool git features of being able to track changes across files (e.g. sometimes it’d be nice to see the history of a line of code as it moved around the codebase).

    Anyway- neither of these are appropriate for (large or frequently changing) binary assets, simply because you have the entire history of those objects replicated on every machine. Not to mention, branching files like that doesn’t make any sense because you can’t reasonably merge them. However, I think a server-based solution for those files and a DVCS for text files wouldn’t be too big a stretch. Some folks use separate VCSes for these things anyway (say, Perforce for code, Alienbrain for data) and it works just fine.

  12. Mac GUI front ends…there’s MacHG and Murky (both Mercurial) – I use MacHg…

  13. I’ve been through just about every version control system in the last 15 years, and I now prefer hg/git over svn for 2 reasons:

    1. Performance. Operations like tree-wide diffs & logs are almost instant on both hg & git. Subversion is orders of magnitude slower – certainly when doing things that require network access, but even on local operations like diffs on large trees simply because there’s an .svn metadata folder at every level of the tree.

    2. No single point of failure. I replicate my repositories to a local server and to offsite backups, but I know that if my server isn’t available, it’s actually no big deal. In SVN, if the server’s down, work stops or is at least massively compromised. In hg/git, I just carry on and re-sync when things are back up, or sync via another mechanism instead (another ‘server’ – really just file space – email or God forbid, even a USB stick).

    I concur with your impression of the lack of good Mac GUI tools though – which is why I’m writing one that handles both Git and Mercurial in one tool.

  14. This sounds a little trolly, but you haven’t branched for years… because you’re using SVN?
    I wish I could branch more often. The new(to me) branch-tracking in SVN is great, but I’m tempted to try git for a while.

    I like branching. Accurev does it fantastically IMO (Though I’ve never used perforce)

  15. I think you missed a couple of points

    *) You want to run svnadmin.

    The point here is you’re thinking like someone coming from a centralized host system. In a DVCS there is no centralized host. Each machine holds a copy of all the data. That’s what DISTRIBUTED means. It’s up to you and your fellow programmers to decide which one of them is “official”, but as far as the DVCS itself is concerned there is no “official” copy. Today you can declare yours the official one, tomorrow your partner’s, the next day someone else’s. Think of it as if there were no version control and you each had a copy of a file. Whose is the official copy? The only difference is that a DVCS tracks enough info that, if you sync/merge with someone else, it can merge all the changes correctly regardless of how many times you’ve each edited the file locally.

    So the point is you don’t need svnadmin access. That’s a bit like you asking for admin access to your partners’ computers.

    *) You don’t branch

    As others have pointed out, that’s because you were using p4 or svn, where branching sucks. With git/hg, if you wanna try something, you just branch and immediately start trying it, switch back to your previous branch, switch to experiment #1, switch to bug fix #2, switch back to your main branch. All of this works in a way that p4/svn never could, so branching actually becomes second nature, not the exception.

    *) binary files.

    Neither git nor hg are worse at binary files than svn so that’s not really a valid argument against them if you’re sticking with svn.

    I’m sure you’ve been told this several times but one day you’ll make the switch. When you do you’ll wonder how you ever worked without a DVCS. The improvement is nearly as much as going from nothing to having your first version control system. It’s a huge win.

    • Sorry, not to get on your case, but I think you misunderstood what I said (maybe it was on other comments): I don’t branch not because it’s difficult, but because I have no desire to do so. Even if it were zero effort (which might well be in Git/Hg). I practice mostly TDD/continuous integration, and with a team of one, two, or a handful, I really have no intention of branching.

      As for binary files, yes, they’re just as bad. So my attitude is, why bother changing. The point is, there isn’t a compelling enough reason for me to switch now.

      And yes, you’re right. I’m sure one day in the next few years I’ll switch. Hopefully distributed version control systems will be more mature and robust by then. I doubt I’ll look back and wonder why I hadn’t switched earlier, though.

      • I think your aversion to branching comes from your previous experience.

        Consider this use case. You’re working on feature xyz, you’ve edited a few files. Your partner or an artist finds a bug and needs your help fixing it. He’s blocked until it’s fixed but your current edits are untested and are not ready for check in.

        in svn/p4 you could tell them you aren’t in a position to help them. That’s not so great.

        You could branch but as you know, branching in svn/p4 sucks.

        You could make a new client or copy of the repo. This works but if your repo is as big as mine that’s a 10-30 minute operation.

        In hg/git you type the branch command and 5ms later you’re branched back to state before your changes. You fix the bug, check it in, your partners are back to work, another 5ms later you’re back to the state you were in before they needed help with the bug.

        The same could be true for yourself. You’re in the middle of working on something. You have an idea you want to try, but your code is not in a good state. All the same options apply. You could skip trying your idea. You could copy the repo. You could branch in svn/p4 (yuck), but if you’re on hg/git it’s literally as fast as typing “echo done”. You’re now in a state where you can try your experiment. Once done, you check it in or not, and in less than a second you’re back to where you were before the experiment.

        Not a branching example, but useful for similar reasons: say you have 2 programmers and 1 artist. You are working on a feature that needs integration with work from the other programmer, but if you check in your work you’ll put the product in a state that will affect the artist badly. In svn/p4 you’re screwed. In git/hg you just share your changes directly with the other programmer. The artist never sees them until they’re ready.

        Once you get used to it you’ll wonder why you worked so long without it.

    • Exactly the same points as you. 
      Svnadmin is useless in a DVCS environment
      I never branched when I was on CVS and then SVN because it was such a hassle and the gain never really appeared (even worse, the merges were full of conflicts…)
      For binary files, I currently “mirror” an SVN repository to some git hoster with git-svn. The project is a video game and is 5.9GB full of raw assets like psd, 3ds, ms3d and max files. Everything is fine.
      Where git is bad, is with files which are bigger than the available ram. Other than that, the binary diff algorithm is pretty efficient and stores only the changes of a binary file between 2 versions.

      I think going from central VCS to DVCS is like going from no testing at all to TDD. It’s a new world to learn, but you never want to go back after the transition is done, especially when you like nothing better than good automation with a nice workflow. Continuous integration just seems so natural then.

  16. Games do seem to be one area where DVCS are way behind due to the binary assets and large teams working on them.

    For a while my Big Game Studio had a repository for code and one for the content. That kind of sucked to deal with because now everyone had 2 sets of things to keep in sync. When we switched to Perforce for everything it was a breath of fresh air – just sync to Changelist 224876 and you are good to go.

  17. Um.. I’m really just a novice with both TDD and Mercurial but I sort of wonder, is it possible to use the branching mechanism as a sort of, I don’t know, functionality slicer; that is have something like unit branching…? Wouldn’t it make it simpler [and cleaner] to isolate and test then integrate object interaction tests?

  18. Just wanted to bring this up in case you didn’t know about it: Joel Spolsky of “Joel on Software” wrote up a Mercurial tutorial that’s both very funny and very helpful for getting oriented on the benefits and differences in both workflow and mindset that a DVCS allows: http://hginit.com/

    First line is: “When the programmers at my company decided to switch from Subversion to Mercurial, boy was I confused.”

  19. My advice to you is this: Don’t switch to git for your work projects; but try git on new small repositories for relatively non-crucial things like documentation, for example for just a single document you are writing.

    It’s simple to initialize a git repo compared to svn for such small things. Over time, you will get used to git, and then may consider moving your important stuff to git.
