Multiple web servers simulation using nginx

Recently, I had to prepare a web application to run on multiple web servers (also known as a web farm). The goal of such a setup is to provide failover (another web server takes over if the primary one goes down) or to improve performance by load balancing across several web servers. I wanted to run most of the tests on a single machine, with several web sites sharing the same code, to get a quick feedback cycle without having to touch another machine or spin up a VM.

As we use Microsoft technologies, the solution seemed pretty obvious: use IIS Application Request Routing (ARR) to set up a web farm. On a local machine, however, it proved to be a difficult task. After a day and a half of trying and getting nothing but 503 errors, I decided to look for another solution.

That other solution turned out to be nginx, a widely used static/proxy web server that runs mostly on Unix machines. I was able to configure it in 40 minutes, and while the Windows version of the server does not perform as well, it turned out to be good enough for our purposes.

This is an example of a minimal config you might use to simulate a web farm on localhost:

worker_processes  1;

error_log  logs/error.log info;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    sendfile        on;
    client_max_body_size 100m;

    keepalive_timeout  65;

    upstream farm {
        server localhost:8003;
        server localhost:8004;
    }
    
    server {

        listen       8007 ssl;
        server_name  lb.localhost;
        
        ssl_certificate      cert.pem;
        ssl_certificate_key  cert.key;

        ssl_ciphers  HIGH:!aNULL:!MD5;
        ssl_prefer_server_ciphers  on;

        location / {
            proxy_pass https://farm;
            proxy_set_header X-Forwarded-Host $host;
            proxy_set_header X-Forwarded-Port 8007;
        }

        # redirect server error pages to the static page /50x.html
        #
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }
    }
}

In a nutshell, the listen directive specifies the port on which nginx listens for incoming requests, and proxy_pass defines how nginx forwards them to the set of servers specified in the upstream directive. By default, nginx routes requests round-robin: the first request lands on the first server listed in the upstream block, the next request lands on the following one, and so forth, starting over from the beginning once the server list is exhausted.
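To make the round-robin idea concrete, here is a minimal Python sketch of the rotation. It is only an illustration of the principle, not nginx's actual implementation (which also takes server weights and failed servers into account); the helper name make_round_robin is made up for this example.

```python
from itertools import cycle

def make_round_robin(servers):
    """Return a function that hands out the next server on each call."""
    pool = cycle(servers)  # endlessly repeats the list in order
    return lambda: next(pool)

# The two backends from the upstream block in the config above.
pick = make_round_robin(["localhost:8003", "localhost:8004"])

# Five consecutive requests alternate between the two servers.
first_five = [pick() for _ in range(5)]
print(first_five)
```

With two servers in the list, odd-numbered requests go to localhost:8003 and even-numbered ones to localhost:8004.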

Note that for HTTPS, you will need to acquire SSL certificates and put them into the nginx configuration directory (cert.pem is a certificate, and cert.key is a private key). The easiest (and free) way to do so is to use http://www.cert-depot.com/.

You can run nginx by typing start nginx at the command line.

Note that the ssl_session_cache directive present in the default nginx config is omitted here because it doesn’t work on Windows (if left in, it will cause nginx to fail miserably). Also, client_max_body_size is increased from its default value in case your site supports uploads of large files (otherwise, you will get unexpected errors from nginx). The proxy_set_header directives are needed in case your web site wants to know the outward-facing domain name and port of the website.
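As a rough illustration of how a backend might use the forwarded headers, here is a small Python sketch that rebuilds the outward-facing base URL. It assumes the headers arrive as a plain dict with exactly the names set in the config above; the function external_base_url and its parameters are hypothetical, and a real application would do the same thing in its own framework.

```python
# Ports that can be left out of a URL for each scheme.
DEFAULT_PORTS = {"http": "80", "https": "443"}

def external_base_url(headers, fallback_host, fallback_port, scheme="https"):
    """Build the public base URL from X-Forwarded-* headers set by the proxy."""
    # Fall back to the backend's own host and port when not behind the proxy.
    host = headers.get("X-Forwarded-Host", fallback_host)
    port = str(headers.get("X-Forwarded-Port", fallback_port))
    if port == DEFAULT_PORTS.get(scheme):
        return f"{scheme}://{host}"
    return f"{scheme}://{host}:{port}"

# A request that came through the load balancer configured above.
proxied = {"X-Forwarded-Host": "lb.localhost", "X-Forwarded-Port": "8007"}
print(external_base_url(proxied, "localhost", 8003))
```

Without these headers, a backend listening on port 8003 would generate links pointing at itself rather than at the load balancer.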

SQL Server Management Studio cannot edit rows with long text

Today, I found the reason why SQL Server Management Studio refused to edit a row in a table, failing with a “String or binary data would be truncated” error. I checked the contents of all columns, and they fit their column definitions perfectly; there were no triggers on that table at all.

Using a profiler, I found that the actual SQL it tried to execute was an UPDATE with some extra checks that looked like optimistic concurrency control:

UPDATE TOP(200) Table SET Field1 = @Field1 WHERE (Id = @Param1) AND … (Field2 LIKE @Param2)

Now, @Param2 was a 10 KB text that lived nicely in an ntext column, except that a LIKE pattern cannot be longer than 8,000 bytes according to the documentation!
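To see how quickly a text value runs into that limit, here is a small Python sketch. It assumes the value is passed as Unicode text at two bytes per character (as ntext stores it), so the limit is reached at 4,000 characters; the helper fits_like_pattern is made up for this illustration and is not part of any SQL Server tooling.

```python
LIKE_PATTERN_MAX_BYTES = 8000  # limit quoted from the documentation above

def fits_like_pattern(value, encoding="utf-16-le"):
    """Check whether a string is short enough to be used as a LIKE pattern.

    Assumption: the parameter is sent as Unicode text, two bytes per
    character for ordinary characters (hence the utf-16-le encoding).
    """
    return len(value.encode(encoding)) <= LIKE_PATTERN_MAX_BYTES

short_text = "x" * 4000   # 8,000 bytes: just fits
long_text = "x" * 5120    # 10,240 bytes: about the 10 KB from this post

print(fits_like_pattern(short_text), fits_like_pattern(long_text))
```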

Good work, SSMS 🙂

Naming

I always try to come up with better names for classes, methods and variables. Sometimes it makes me stop for quite a while and think, which at times draws sarcastic comments from my colleagues.

Still, I do believe good naming reduces confusion, decreasing the time needed to understand the code and increasing the probability of a successful modification. Here’s one of the most recent examples.

In the explorer-like document management system we were writing, there was an openedFolderChanged event. During a refactoring that involved handling this event in very special circumstances, I suddenly got the feeling I did not understand what the purpose of this event was. I asked the team what that name could mean, and got two very different answers.

One group thought this event represented the situation where the folder whose contents we currently show to the user is replaced by another folder (e.g. we were showing “Invoices September” and transitioning to “Invoices October”). The other group thought the name implied that the folder we currently show had changed somehow, for example because some new files were added to it.

Both groups were very sure their point of view was the only natural one. Somebody from the first group mentioned that the event should have been named “FolderContentChanged” if it meant what the second group thought. This convention was logical, but not sufficient for me. Quite expectedly, I found that most of the code followed meaning 1, except for a couple of places where the event was fired in situations that could be explained only by meaning 2.

We brainstormed for a better name for a while. We tried several variations with “selectedFolder”, which we didn’t like, as we also had checkboxes allowing the user to select files and folders. We tried some “currentFolder” options, which were better but didn’t resolve the folder/content ambiguity. It also became obvious that the “changed” part was not very clear: this event actually caused new data to be loaded, so for the user the folder was not really “changed” at the point when the event was fired. There were some “changing” combinations, and then there were a couple of options with the word “request”, like currentFolderChangeRequest. Finally, we came up with switchCurrentFolder, which everybody agreed completely eliminated any other meaning, and meant only “folder A is going to be replaced with folder B”.

I’m still not happy with the imperative “switch” here, but the overall relief was so great that the developer immediately went off to replace the name and clean up the places where it had been used incorrectly. The most important thing, though, was that we were now sure that if the switchCurrentFolder event was fired with the same folder id as the current folder, it was OK to ignore it, as it could never again mean “folder contents need to be updated”.
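The benefit of the unambiguous name can be sketched in a few lines of Python. This is not the code from our system; the FolderView class, the on_switch_current_folder handler and the folder ids are hypothetical, and only the rule "the same folder id means nothing to do" comes from the story above.

```python
class FolderView:
    """Tracks the folder currently shown to the user."""

    def __init__(self, load_folder):
        self.current_folder_id = None
        self._load_folder = load_folder  # callback that fetches the folder contents

    def on_switch_current_folder(self, folder_id):
        """Handle a switchCurrentFolder event; return True if a switch happened."""
        # The event only ever means "replace folder A with folder B",
        # so a request for the folder already shown can be ignored safely.
        if folder_id == self.current_folder_id:
            return False
        self.current_folder_id = folder_id
        self._load_folder(folder_id)
        return True

loaded = []
view = FolderView(loaded.append)
view.on_switch_current_folder("invoices-september")
view.on_switch_current_folder("invoices-september")  # ignored: same folder
view.on_switch_current_folder("invoices-october")
print(loaded)
```

Under the old, ambiguous name, the early return would have been a bug whenever somebody fired the event to mean "the contents changed".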

DRY is a heuristic; don’t follow it blindly

Programmers love simple and straightforward principles telling them what to do. Heard about SOLID, YAGNI or DRY before? Once acquainted with them, it is very hard to break free from these magic spells.

It is important to remember, though, that life is more complicated than any one-liner can be. Software design principles are just a means to some bigger goals, and one doesn’t need to follow them blindly.

As for DRY, which sometimes receives almost religious worship, the real goal is ease of maintenance. “Do it once or die propagating changes everywhere!” it says. In fact, DRY violations serve as a good heuristic, pointing to the places where maintenance problems may arise.

Still, making the same change in every place of usage is not the only maintenance concern. More often, complicated business requirements specify a number of special cases, describe similar but not identical behaviors, and eventually transform a solid unit of reuse into a web of numerous branches. At that stage, our main concern is not the sheer amount of work needed to keep features in sync, but the probability of breaking one feature as a result of modifying another. DRY becomes a bane instead of a boon.

No doubt, in most cases DRY is good. But sometimes, before applying DRY at a level higher than that of a single class, stop for a moment and think: will the benefits outweigh the shortcomings?

You can read more about that in posts by Udi Dahan http://www.udidahan.com/2009/06/07/the-fallacy-of-reuse/ and Ayende http://ayende.com/blog/4333/effectus-isolated-features.

CruiseControl.NET: Subversion Source Control Block doesn’t see changes

I stumbled upon a weird bug while setting up continuous integration with CruiseControl.NET and Subversion: source control just didn’t pick up changes after commits. It turned out to be caused by a time mismatch between the CCNET and SVN servers, and it’s a known issue mentioned in the documentation. What isn’t mentioned, though, is how to fix it if you can’t synchronize the time on the servers for some reason. The solution is to set the revisionNumbers parameter of the svn source control block to true.
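Why clock skew hides commits, and why revision numbers don't care, can be shown with a toy model in Python. This is only a simplified sketch of the two polling strategies, not CCNET's actual code; the function names, dates and the 10-minute skew are invented for the illustration.

```python
from datetime import datetime, timedelta

def changes_by_time(commits, last_check_time):
    """Time-based polling: keep commits stamped after the last check."""
    return [c for c in commits if c["time"] > last_check_time]

def changes_by_revision(commits, last_seen_revision):
    """Revision-based polling: keep commits with a higher revision number."""
    return [c for c in commits if c["revision"] > last_seen_revision]

# Hypothetical scenario: the SVN server's clock runs 10 minutes behind
# the build server's clock.
clock_skew = timedelta(minutes=10)
last_check_time = datetime(2024, 1, 1, 12, 0)  # by the build server's clock
last_seen_revision = 42

# A commit made 5 minutes after the last check, but timestamped by the
# lagging SVN server, so it appears to be 5 minutes older than the check.
commit_time = last_check_time + timedelta(minutes=5) - clock_skew
commits = [{"revision": 43, "time": commit_time}]

seen_by_time = changes_by_time(commits, last_check_time)
seen_by_revision = changes_by_revision(commits, last_seen_revision)
print(len(seen_by_time), len(seen_by_revision))  # prints: 0 1
```

The time-based check never sees revision 43, while the revision-based check finds it regardless of what either clock says.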

ccnet and git

CruiseControl.NET has built-in git integration via the Git Source Control Block. Still, when using msysgit on Windows to connect to a GitHub repository, you may find that ccnet hangs while getting the sources until the timeout kills the build. For me, the simplest solution that worked was to run the ccnet service under the user account I had previously used git with (I guess it has something to do with SSH keys). In a different situation you may want to create a dedicated account, set up git for it, including keys and all, and then use this account for the CruiseControl service.