
It’s been one year since we decided to switch to Git from Visual Source Safe. We’ve had up to 6 developers using Git with our .NET Visual Studio solution, which consists of 31 projects and 3000+ source files (C#, F#, ascx, etc.). Git is an amazing piece of software. If you are not using it, stop what you are doing, download it, and start learning it. I’ve used Perforce, MKS, Mercurial, and CVS/SVN, but it would be very sad for me to go back to using something other than Git.

Before I get more into Git, I want to get one thing straight: Source Safe is not source control software. At best, it’s a fat random byte generation layer on top of the file system or network stack. Calling it source control is the “biggest scam perpetrated on the American public since One Hour Martinizing”.

Git does not have the notion of a central repository built into it; it is distributed. You work directly on a repository. Usually this is your own local repository. You commit changes locally. You have all your history and branches locally. Branches can be shared across repositories by pushing and pulling. Git comes with its own network protocol for doing this, but other protocols such as SSH can be used too. By starting with this distributed architecture, the very notion of having your own repository means you are working in your own separate branch (a local branch can track a branch in a remote repository). The result: branching is EASY. It’s built in. It’s part of everything you do.

Just because it’s distributed doesn’t mean you can’t work with a repository that is considered the “central” or master repo. This is what we do. We all cloned from the master repo. Our master branch is “tracked” to the master branch in the master repo. We work locally, commit, and push our changes to the central repo (you might have to pull before doing a push if you’ve changed files that someone else has). We create other branches and push them to the master repo so that they are available to other developers when they do a pull. Merging is done on pulls automatically. Everything just works.

I’m not going to give you a course on Git here; there are other good sources for that. But I will give you some starter tips:

  • Get msysGit for Windows. Cygwin is evil.
  • Get Git Extensions. It adds Git functionality to Windows Explorer and Visual Studio.
  • Git, by default, considers all the bytes within all the files in the file system tree starting at the top level repository directory part of the repository. Any bytes in that file tree are fair game. This behavior can be changed. See below.
  • Don’t think of the repositories as files and directories. Think of a big blob of bytes. Changes that happen are not happening in files, they are happening somewhere in this big ball of bytes.
  • Tags, branches, and commits are all the same thing: references to some point in the repository. Branches have a different workflow.
  • For Visual Studio projects, we use a .gitignore file with these contents:
    *.pdb
    *.dll
    *.exe
    *.suo
    *.scc
    *.vspscc
    *.user
    *.cache
    *.log
    *.rdl.data
    bin/
    obj/
  • For merging:
    1. Download p4merge.
    2. Create a batch file called p4diff.bat with the following contents and put it in the c:\windows dir:

    p4merge %2 %5

    3. Add the following to the end of your .gitconfig file under c:\Documents and Settings\Username:
    [mergetool "p4merge"]
    cmd = p4merge $BASE $REMOTE $LOCAL $MERGED
    [merge]
    tool = p4merge
    [diff]
    external = p4diff.bat

    Now, when you call git mergetool, p4merge will be used.

Don’t let the name scare you. A monad is just something that represents one or more computations, a workflow. The need for monads came up in pure functional programming languages like Haskell where, given a set of inputs, a function should return the same value every time. But what happens when you have a function that takes a filename and line number and returns a byte array of the data on that line in the file? Might it not return a different byte array between subsequent calls if the file changed? The solution: instead of executing the operations to read the file and return the data, the function creates a workflow (monad) that will do that. So what comes back from the function is a workflow, something that represents the operations to read from the file. In typical pragmatic form, Microsoft decided to use a better name for this in F#: Workflows. The stuff in the Workflow is called the computation expression. These names make sense.

It turns out workflows are useful for other things like building Sequence Expressions and Async Workflows. The advantage is syntactic sugar for what is usually a gobbledygook of disparate pieces of code to accomplish the same thing. You can create your own workflows by creating a type with a couple of important methods: Bind, Delay, and Return. I really recommend watching Luca Bolognese’s presentation on F#. It’s worth watching for the section where he discusses Async Workflows. Pretty amazing stuff.
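
To make Bind, Delay, and Return a little more concrete, here is a rough C# sketch of the idea, not the real F# machinery. Workflow<T> and WorkflowBuilder are names made up for illustration; an actual F# builder has a different shape, and the compiler does the wiring for you.

```csharp
using System;

// Hypothetical types: a C# illustration of the workflow idea, not the F# API.
public sealed class Workflow<T>
{
    private readonly Func<T> run;

    public Workflow(Func<T> run) { this.run = run; }

    // Nothing is executed until Run is called.
    public T Run() { return run(); }
}

public static class WorkflowBuilder
{
    // Return: wrap a plain value in a workflow.
    public static Workflow<T> Return<T>(T value)
    {
        return new Workflow<T>(() => value);
    }

    // Delay: postpone even building the workflow until it is run.
    public static Workflow<T> Delay<T>(Func<Workflow<T>> build)
    {
        return new Workflow<T>(() => build().Run());
    }

    // Bind: run the first workflow, then feed its result to the next step.
    public static Workflow<R> Bind<T, R>(Workflow<T> first, Func<T, Workflow<R>> next)
    {
        return new Workflow<R>(() => next(first.Run()).Run());
    }
}
```

Composing steps with WorkflowBuilder.Bind does no work at all. The wrapped operations only execute when Run() is called on the final workflow, which is the whole point: the function hands back a description of the work instead of doing it.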

If you come from the Lisp world, F# Sequence Expressions will seem familiar to you. In Clojure and Lisp they are called List Comprehensions but are essentially a specialized Monad/Workflow.
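
For C# readers, the closest everyday relative is probably a LINQ query expression. The sketch below is only an analogy, and ComprehensionDemo and EvenSquares are made-up names: it filters and transforms a sequence in one declarative expression, much as a sequence expression or list comprehension would.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, used only to illustrate the comprehension style.
public static class ComprehensionDemo
{
    // Squares of the even numbers from 1 to max.
    public static List<int> EvenSquares(int max)
    {
        var query = from n in Enumerable.Range(1, max)
                    where n % 2 == 0
                    select n * n;

        // The query is lazy; ToList forces it to run.
        return query.ToList();
    }
}
```

Calling ComprehensionDemo.EvenSquares(6) gives a list containing 4, 16, and 36.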

Are you tired of doing the if(dict.ContainsKey(key))… pattern? Extend Dictionary:

public static R ValueOrSomethingElse<K, V, R>(this Dictionary<K, V> Col, K Key, Func<V, R> Transform, Func<R> SomethingElse)
{
   // TryGetValue does one lookup instead of ContainsKey plus the indexer.
   V value;
   if (Col.TryGetValue(Key, out value))
      return Transform(value);
   else
      return SomethingElse();
}

Example usage:

Dictionary<string, DateTime> BDays = new Dictionary<string, DateTime>();
...
BDays.ValueOrSomethingElse("Mark", d => d.ToString("MM/dd/yy"), () => "Unknown BirthDate");

In the call, the second argument is a lambda for transforming the value if the key is found. The third argument is a lambda to execute and return if the key is not found. I made the third one (something else) a lambda in case the “something else” is an expensive operation.

Over the years, I had heard much about Lisp. Articles I had read, or colleagues who knew it, would espouse the benefits of learning it or using it. Promised results ranged from making you a better imperative programmer to the highest intellectual enlightenment that could be achieved. So I dove into Lisp. Except it wasn’t diving. Well, it was like diving if the body of water you are diving into is covered with 18 inches of ice, and you need to use a sledgehammer to make a hole to dive into. And then your mind and body react to diving into close to freezing water. If you come from a strictly imperative programming background, learning Lisp and/or Functional Programming can be a battle. I’m pretty sure it’s easier to learn Lisp and FP if you don’t know anything about programming.

So I started reading bits and pieces about Lisp, but without actually coding in it, I really didn’t get it. I needed to code. But what were the options? Islands. Islands with libraries that promised this and that but still little islands. Then I met Rich Hickey and his language Clojure. Clojure is a much improved dialect of Lisp that compiles down to JVM or CLR byte code. Continents. Continents with lots of libraries and documentation. I could now easily connect to SQL Server and do some real world stuff. Yes, Clojure is a dynamic language, and I’ve mentioned I prefer static languages, but Clojure is filled with awesome.

Although Clojure and other Lisps are functional programming languages, Lisps have a very unique property that other common FP languages don’t have: Homoiconicity. That’s fancy talk for saying the code and data have the same shape. Data structures and code are represented the same way: s-expressions. This is what drives the powerful Lisp macro system. The macro system is what makes the language changeable. If there is a feature you want that’s not in the language, you can add it. This point was really driven home to me when I learned about the F# pipeline operator: |> . I thought, see, this is a nice feature that Clojure doesn’t have. Until I realized that someone had built a pipeline macro for Clojure. This is from core.clj:

(defmacro ->
  "Threads the expr through the forms. Inserts x as the
  second item in the first form, making a list of it if it is not a
  list already. If there are more forms, inserts the first form as the
  second item in second form, etc."
  ([x form] (if (seq? form)
              `(~(first form) ~x ~@(next form))
              (list form x)))
  ([x form & more] `(-> (-> ~x ~form) ~@more)))

If F# didn’t have |> there would be nothing you could do [Update: That is not true. As Brian pointed out in the comments, F# allows you to declare infix operators.]. Don’t get me wrong, I’m a HUGE fan of F#, it just doesn’t have a macro system. But it is statically typed :)
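
C# has neither macros nor user-defined operator symbols, but an extension method gets reasonably close to the same left-to-right reading. This is only a sketch: Pipe and PipeExtensions are hypothetical names, not part of the framework.

```csharp
using System;

public static class PipeExtensions
{
    // Hypothetical helper: feeds a value into a function so that
    // a chain of calls reads left to right, like F#'s |> operator.
    public static R Pipe<T, R>(this T value, Func<T, R> f)
    {
        return f(value);
    }
}
```

With that in place, "  hello ".Pipe(s => s.Trim()).Pipe(s => s.ToUpper()).Pipe(s => s.Length) evaluates to 5, with each step receiving the result of the one before it.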

You need a lazy sequence of something:

public static IEnumerable<T> Range<T>(T Start, T End, Func<T, T> inc) where T : IComparable<T>
{
   T result = Start;
   while (result.CompareTo(End) <= 0)
   {
      yield return result;
      result = inc(result);
   }
}

Need a sequence of integers up to a million:

Funcs.Range(0, 1000000, (j => j + 1));

Need an infinite list of uppercase letters:

Funcs.Range('A', '[', (c => c >= 'Z' ? 'A' : (char)(c + 1)));
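
Because the sequence is lazy, nothing is generated until something enumerates it, and an endless one has to be cut off by the consumer. A minimal sketch using LINQ's Take follows; Range is repeated so the snippet compiles on its own, and RangeDemo and FirstLetters are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Funcs
{
    // Same Range as in the post, repeated so this snippet is self-contained.
    public static IEnumerable<T> Range<T>(T Start, T End, Func<T, T> inc) where T : IComparable<T>
    {
        T result = Start;
        while (result.CompareTo(End) <= 0)
        {
            yield return result;
            result = inc(result);
        }
    }
}

// Hypothetical demo class.
public static class RangeDemo
{
    // Pulls only the first `count` letters from the endless A..Z cycle.
    public static string FirstLetters(int count)
    {
        var letters = Funcs.Range('A', '[', c => c >= 'Z' ? 'A' : (char)(c + 1));
        return new string(letters.Take(count).ToArray());
    }
}
```

RangeDemo.FirstLetters(28) returns "ABCDEFGHIJKLMNOPQRSTUVWXYZAB": the full alphabet, then the wrap back to 'A'. Calling ToArray() or Count() on the letter sequence without a Take would never finish.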

I’m a refactorer. My titles have had the words developer or programmer or engineer or architect. But really a lot of what I do is refactor. I mean even when I design and build something from the ground up, part of that process involves refactoring. You start with a design, you start coding with that design in mind, but inevitably you realize an hour later, maybe a day or two later, that part of your vision of the problem or solution domain was wrong. So you refactor. I can’t imagine doing this with a dynamically typed language. The very beginning of creating a thought about potentially having to do this immediately causes my brain to shut down.

You get two years into a large enterprise application with hundreds of source files. You realize you need to refactor a couple of the fundamental classes/libraries used in your system. It’s not because you were an idiot when you designed the things, it’s just that business has changed. The problem has changed. You need to adapt (refactor). It’s the right thing to do. It is good for the future. Are you a spineless ninny? You’re not. You refactor. It’s not that bad, 341 compiler errors, but it doesn’t compile.

And don’t give me the old: “but all I need to do is grep for…”