Basics of Source Control
As much fun as Git tutorials and comparisons between Mercurial and Bazaar are, they don’t really help if you don’t understand the basics. Hopefully this can quickly clear up some of the easy stuff.
Source control only really makes sense if you can see what it does, and how that maps into common and useful tasks.
What’s the problem?
Source code is easily misplaced, especially when it is emailed back and forth or kept on a Windows shared drive. Even worse, you can end up with two copies of the same file and not know which one is newer.
What can source control do?
- Provide the authoritative source of the code
- Match changes to their author
- Record commit messages with details of how and why the code changed
- Merge changes from two different authors on the same file
- Tag releases, important milestones, builds
- Branch to have multiple streams of development
- Get any version of the code, going back by date, tag, or version number
- Combine easily with a backup solution to keep code safe
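As a rough illustration of “get the code as of a date”, here is a minimal Python sketch. The history data and the `revision_as_of` helper are made up for this example; real tools store far more than a date and a revision number.

```python
from bisect import bisect_right
from datetime import date

# (commit date, revision number) pairs in chronological order (made-up data).
history = [
    (date(2008, 1, 10), 1),
    (date(2008, 3, 2), 2),
    (date(2008, 5, 9), 3),
]


def revision_as_of(day):
    """Return the newest revision committed on or before `day`, or None."""
    dates = [committed for committed, _ in history]
    index = bisect_right(dates, day)  # number of commits made on or before `day`
    return history[index - 1][1] if index else None


april_revision = revision_as_of(date(2008, 4, 1))    # revision 2
too_early = revision_as_of(date(2007, 12, 31))       # None, nothing committed yet
```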
How can two people work side-by-side on the same project?
Merging or locking. Visual SourceSafe locks files when you check them out and unlocks them when you check in. Since only one person holds the lock, there’s no risk of overlap. This can get annoying when you need a file whose lock is being held by a coworker.
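The locking approach can be sketched in a few lines of Python. This is a toy model, not how Visual SourceSafe is actually implemented; the `LockingRepository` class and the file and user names are invented for illustration.

```python
class LockingRepository:
    """Toy model of lock-based source control (hypothetical, not a real API)."""

    def __init__(self):
        self.locks = {}  # file path -> name of the user holding the lock

    def check_out(self, path, user):
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            raise PermissionError(f"{path} is locked by {holder}")
        self.locks[path] = user

    def check_in(self, path, user):
        if self.locks.get(path) != user:
            raise PermissionError(f"{user} does not hold the lock on {path}")
        del self.locks[path]


repo = LockingRepository()
repo.check_out("report.c", "alice")
try:
    repo.check_out("report.c", "bob")  # refused: alice still holds the lock
except PermissionError as err:
    blocked_message = str(err)
repo.check_in("report.c", "alice")
repo.check_out("report.c", "bob")      # succeeds once alice has checked in
```

Bob can do nothing with the file until Alice checks in, which is exactly the annoyance described above.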
Merging means the source control system reconciles changes made to the same file by two different people, and only when it has to. There’s no lock on the file, so concurrent changes can happen easily. Algorithms try to “merge” the two changed copies together. Most of the time this works great, and no human interaction is needed. If the two people’s changes overlap, the second person to check in is told about the overlap and has to reconcile it manually in a text editor before checking in the final version.
Most source control implementations take this merging approach, including Subversion, CVS, Borland’s StarTeam (which can also do locking), and others.
What is merging?
If I change line 5, and you change line 30, the software is smart enough to realize they don’t overlap, and shoves them together into a single file. If the changes do overlap, then it alerts you, and marks the locations in the file for you to fix manually. This sounds scary, but it happens far less often than you think, especially if you keep commits small.
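To make this concrete, here is a simplified three-way merge in Python. It assumes every edit changes a line in place, so all three versions have the same number of lines; real tools use diff algorithms that also cope with inserted and deleted lines. The `merge_lines` function is hypothetical, and its conflict markers only imitate the style many tools use.

```python
def merge_lines(base, mine, yours):
    """Three-way merge of equal-length line lists.

    Returns (merged_lines, conflict_line_numbers), with 1-based line numbers.
    """
    if not (len(base) == len(mine) == len(yours)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for number, (b, m, y) in enumerate(zip(base, mine, yours), start=1):
        if m == y:
            merged.append(m)   # both sides agree (untouched, or same change)
        elif m == b:
            merged.append(y)   # only "yours" changed this line
        elif y == b:
            merged.append(m)   # only "mine" changed this line
        else:
            # Both sides changed the line differently: mark it for a manual fix.
            merged.extend(["<<<<<<< mine", m, "=======", y, ">>>>>>> yours"])
            conflicts.append(number)
    return merged, conflicts


base = [f"line {i}" for i in range(1, 31)]
mine = list(base)
mine[4] = "my change to line 5"
yours = list(base)
yours[29] = "your change to line 30"

merged, conflicts = merge_lines(base, mine, yours)   # no overlap, clean merge

yours[4] = "your change to line 5"                   # now both touch line 5
_, overlapping = merge_lines(base, mine, yours)      # line 5 is reported
```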
What is tagging?
A tag labels a certain point in time with a name. This can be anything from “NightlyBuild_2008_05_09” to “Release_3.56” or “UAT_1”. It just marks a set of file versions with a name so you can refer back to them later.
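One way to picture a tag is as a named snapshot of file versions. The Python sketch below uses made-up file names and revision numbers, and the `tag` helper is hypothetical; it only shows that a tag stays fixed while work continues.

```python
# Current revision number of each file in the repository (made-up data).
current_versions = {"main.c": 14, "util.c": 9, "README": 3}

tags = {}  # tag name -> snapshot of file versions at the time of tagging


def tag(name):
    """Record the current version of every file under a label."""
    tags[name] = dict(current_versions)  # copy, so later commits don't alter it


tag("Release_3.56")
current_versions["main.c"] = 15          # work continues after the tag

tagged_main = tags["Release_3.56"]["main.c"]   # still 14
```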
Why would I want the code from last Tuesday or from a tag?
Several reasons:
- It gives confidence later that when you build the final release version, it’s exactly the same code that was signed off on by the users.
- You can go back in time to a previous version when the users accuse you of having introduced bugs into a new version
- You don’t have to refer to things by date
- You don’t have to remember that version 3 was built from the code on the 10th of January.
- Any other time you want a human readable label on your code
What is branching?
This is where we get to the tricky part of source control. Let me start with the problem that branching solves.
We’ve released AwesomeThing version 1.0, and are ramping up to move the whole thing to version 2.0. Doing so will require lots of code changes, and take a fair amount of time.
So, we’re halfway through the new 2.0 version, and a bug report comes in for 1.0. What do we do? We labeled the code, so we can get the code as it was at the release, but after we make the change, how do we keep track of it? We can’t insert it back in time, and it doesn’t make any sense to insert it at the head of the chain either, since that’s well on its way to version 2.0.
Branching is the solution. When you release a version of the code, you “branch” it into a support branch.
Looking at the timeline:
-------------- 1.0 Work
turns into
------------ 1.0 Release ---------- 2.0
                        \
                         \------ 1.0 Bug Fix Branch
Now you can get the newest deployed code from the maintenance branch, or the newest code for the next version from the main (“head”) line.
At the point you release 2.0, you just forget that the 1.0 branch exists and make another branch for 2.0 maintenance, so it would look like:
----- 1.0 Release ------------ 2.0 Release ----- New Features
                 \                        \
                  \------ 1.0 Branch       \---- 2.0 Bug Fix Branch
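The timelines above can be modeled as commits that each remember their parent, with a branch being nothing more than a name pointing at the newest commit on its line. The `History` class below is a hypothetical Python sketch of that idea, not the data model of any particular tool.

```python
class History:
    """Toy commit history: each commit knows its parent, branches are names."""

    def __init__(self):
        self.parent = {}     # commit id -> parent commit id (None for the first)
        self.message = {}    # commit id -> commit message
        self.branches = {}   # branch name -> id of the newest commit on it
        self._next_id = 1

    def commit(self, branch, message):
        commit_id = self._next_id
        self._next_id += 1
        self.parent[commit_id] = self.branches.get(branch)
        self.message[commit_id] = message
        self.branches[branch] = commit_id
        return commit_id

    def branch(self, new_branch, from_branch):
        # A new branch starts at the same commit as the branch it came from.
        self.branches[new_branch] = self.branches[from_branch]

    def log(self, branch):
        """Messages on a branch, newest first."""
        entries, commit_id = [], self.branches.get(branch)
        while commit_id is not None:
            entries.append(self.message[commit_id])
            commit_id = self.parent[commit_id]
        return entries


history = History()
history.commit("main", "1.0 work")
history.commit("main", "1.0 release")
history.branch("1.0-bugfix", from_branch="main")
history.commit("main", "start 2.0 work")
history.commit("1.0-bugfix", "fix bug reported against 1.0")
```

Both branches share everything up to the 1.0 release, then go their separate ways: the bug fix never appears on the main line, and the 2.0 work never appears on the bug fix branch.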
What else can you use branches for?
Branches are useful for all sorts of things other than just release management.
One of those uses is feature branches. Say you are working on the fancy new 2.0 release and need to rip out some of the guts of the code to get it how you need it. If this takes two weeks, and you are following the best practice of checking in code early and often, you’ll break the code for all of the other developers during that time, who rely on a working component to test their new work against.
The answer is a feature branch, which looks like:
------- Work --------- Other work ------- Merge Point
            \                                  /
             \-------- Disruptive feature ----/
This shows another feature of branches: you can merge between them. At the end of your feature branch, when you have a nice, tested chunk of code, you merge it back into mainline development and integrate it with the rest of the code. Conflicts are more likely here than in the stepping-on-toes case described under merging above, but it is exactly the same process.
Bug fixes can be merged across branches in the bug fix branch scenario as well, so a fix isn’t lost after you’ve made it. Basically, you fix and deploy off of the branch, then merge that fix back into mainline development. This keeps both branches in sync without duplicating work.
Best Practices
- Check in early and often. Keep the code in the repository for everybody to see. Each check-in should reflect only one thing: one bug fix, one change request (CR), or one very small feature.
- Check out (update) your code often. This gets you a fresh copy of what everybody else is working on.
- Give good commit messages. This lets other developers know what you changed without digging into the code.
- Tag everything that makes sense. Too many tags is much better than too few.
- Branch on releases to keep a bug fix branch handy
- Don’t duplicate what the repository does for you. It manages history very well, so there is no reason to leave “commented out code, since I might need it later” in the repository when you can pull up the old version at any time. Embrace the delete key. Don’t mark your changes with comments containing the CR number either; that belongs in the commit message.