Using screen scraping to expose legacy web pages in RSS

As part of my (almost) daily drive to and from one of my clients I pass through the sub-sea Oslofjord tunnel (Oslofjordtunnelen). Now what has driving got to do with screen scraping and RSS, you say?

Hang on, I’m getting there.

Below is a map extract that shows part of my route. The topmost pin is where I start out, the bottom-most pin is the Oslofjord tunnel. The pin in the middle is where, more often than not, a sign shows up stating that the tunnel is closed for maintenance. You can imagine my frustration when I’m forced to drive all the way back north to get around the Oslofjord!

Map picture

To avoid that pain I set out to find a feed with traffic status updates and ended up at this page published by the Norwegian Public Roads Administration (NPRA). The page has regularly updated traffic information (all in Norwegian, mind you), but to my frustration it is all served as static web pages. No feed in sight!

This is where screen scraping finally comes into play. You could write your own scraper in any modern runtime these days. But being a good/lazy developer, I know there are already quite good services out there that make it a breeze to set up feeds with data scraped off web pages. And lo and behold, I now burn a feed with the latest traffic updates!
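For the "write your own scraper" route, here is a minimal Python sketch using only the standard library. It is not what the hosted service does, and it assumes (hypothetically) that each traffic message sits in its own `<li>` element; the real NPRA markup may well differ. The names `ListItemParser`, `scrape_messages` and `build_rss` are made up for this illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ListItemParser(HTMLParser):
    """Collects the text of every <li> element.

    Assumption: one traffic message per <li>. Adjust for the real page.
    """

    def __init__(self):
        super().__init__()
        self.items = []
        self._buffer = None  # None while we are outside an <li>

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._buffer is not None:
            # Collapse runs of whitespace and skip empty items.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.items.append(text)
            self._buffer = None


def scrape_messages(html):
    """Return the list of message texts found in the HTML string."""
    parser = ListItemParser()
    parser.feed(html)
    parser.close()
    return parser.items


def build_rss(title, link, messages):
    """Wrap the messages in a bare-bones RSS 2.0 document (as a string)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for message in messages:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = message
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")
```

Fetching the page (for example with `urllib.request.urlopen`) and publishing the resulting XML somewhere a feed reader can reach it are left out; the sketch only covers the scrape-and-wrap step.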

So… basking in the glory of my genius for a couple of days I thought it a good idea to write up this blog post for the greater good of mankind. To make the story a bit shorter, what I discovered while rummaging around the NPRA site is that they indeed have great support for RSS!

Feeling a bit stupid I will now go and redirect my FeedBurner setup…and please let me know if there is a moral to this story.

Good night.


Going to Microsoft PDC 2009!

Only one year after the very successful PDC 08, I find myself (and my company) going to Los Angeles once more. PDC 09 looks promising with sessions covering the .NET Framework 4, Visual Studio and Team System 2010, Windows Azure, DirectX 11, Silverlight 3, and much more.


Team Build and drop location share permissions

I’ve recently banged my head against a simple yet annoying problem with Team Build and the way build result files are published to the so-called drop location. In my case, this location is a share on our file server. I knew that the build service executes under a TFSBUILD account, so I had given that account the Co-Owner permission level on the share.

Everything builds and the results are published, yet all builds are only partially successful. Digging around the somewhat unmanageable BuildLog.txt file, I see that test results from MSTest are published via a call to http://tfs:8080/Build/v1.0/PublishTestResultsBuildService2.asmx, which subsequently fails with this error:

  The results directory “\\fileserver\tfsbuilds\BuildFolder.1\TestResults” could not be created for publishing.

The solution is quite simple. Since the actual publishing of test results is done by the PublishTestResultsBuildService2 service, executing under the TFSSERVICE account, we also need to give that account Co-Owner permissions to the drop location share.

Obvious? I think not :-)


ASP.Net Profile performance or lack thereof

In a customer solution I’ve been working on we use ASP.Net Membership and Profile to store information about users. We use profile data extensively in various reports and listings throughout the solution.

Putting the solution under some regular user activity, though, revealed really poor performance when producing reports. Some of these are large reports, mind you, so I had to go digging to figure out what was going on. I always check SQL Server activity first, looking for waiting processes and locks. And yes, there it was: an exclusive lock on aspnet_Users. Why? We’re only doing reads in these reports!

Digging further into which stored procedures touch the aspnet_Users table, I discovered that aspnet_Profile_GetProfileProperties actually updates the aspnet_Users.LastActivityDate column. This stored procedure does not take any parameter to control this behavior. So a quick solution to the problem was to remove the updating part.

Of course, after figuring out what the problem was, I suspected that others had figured this out too. Go here for a view of the stored procedure before and after surgery.


SQL Server, WMI and PowerShell

So I started out on a quest: a quest for an overview of the installed SQL Server instances on my machine. With previous versions we had to crawl the tangled forest of the Registry to get this information. SQL Server 2005 introduced a WMI namespace that can be queried, and with this sample I could easily enumerate my 2005 instances (both full and Express versions). It was all there in the ROOT\Microsoft\SqlServer\ComputerManagement namespace!

Encouraged by the simplicity of getting instances through WMI I moved on to install the latest SQL Server 2008 bits. I wanted to try out the Express CTP, downloaded and installed it, and ran my code again.

…no sign of the 2008 instance. Thinking that they surely hadn’t ripped out WMI support in 2008, I needed a tool to browse the WMI namespace to find out what was going on. Luckily, the PowerShell Guy has made a handy PowerShell script that explores the WMI namespaces in a nice graphical UI.

But the script requires PowerShell V2, so now I had to go on a side-quest: a quest to get the WMI Explorer running. Installing the PowerShell V2 CTP on Windows XP requires installing Windows Remote Management (WinRM), the Microsoft implementation of the WS-Management protocol. For the record, WinRM is built into Windows Server 2008 (and probably Vista too) but needs to be installed separately on Windows Server 2003 and Windows XP. And yes, I had to throw out PowerShell 1.0 (what happened to side-by-side installations?). The quirk with removing PS 1.0 is that in Add/Remove Programs you must remember to tick Show Updates in order to see the “Windows PowerShell(TM) 1.0” entry under “Windows XP – Software Updates”.

The rest went well, and I’m now back on track browsing the WMI namespace. And lo and behold, they have indeed changed the WMI schema with SQL Server 2008. The location is now ROOT\Microsoft\SqlServer\ComputerManagement10.

But this wasn’t the treasure I set out to find! Instead of crawling the registry we now have to crawl through WMI. First, I will have to query the __NAMESPACE class in ROOT\Microsoft\SqlServer in order to discover which ComputerManagementXX namespaces are installed. OK, so it is a bit easier than the registry anyway. When I go into the ComputerManagement10 namespace and look up instances of the SqlService class, I get a nice list of all SQL Server instances, both 2005 and 2008.
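The discovery step can be sketched in a few lines of Python. This is only the selection logic: it assumes you have already fetched the child namespace names (the Name property of the __NAMESPACE instances) with whatever WMI binding you prefer, and it assumes the suffix is a plain version number, as in the two namespaces seen so far. The function name `management_namespaces` is made up for this illustration.

```python
import re

# Matches "ComputerManagement" (SQL Server 2005) and "ComputerManagement10"
# (SQL Server 2008). Assumption: any future suffix is also purely numeric.
_PATTERN = re.compile(r"^ComputerManagement(\d*)$", re.IGNORECASE)


def management_namespaces(child_names, parent=r"ROOT\Microsoft\SqlServer"):
    """Return full ComputerManagementXX namespace paths, oldest first.

    child_names: namespace names found under `parent`, e.g. the Name
    property of each __NAMESPACE instance.
    """
    found = []
    for name in child_names:
        match = _PATTERN.match(name)
        if match:
            # A missing suffix (the 2005 namespace) sorts before any number.
            version = int(match.group(1) or 0)
            found.append((version, parent + "\\" + name))
    return [path for _, path in sorted(found)]
```

Since the ComputerManagement10 namespace listed both my 2005 and 2008 instances, querying SqlService in the last path returned is probably the most useful starting point.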

Lessons learned? Microsoft warned us that “the Registry will change”. It seems that the same goes for WMI. 

Update: couple this with strongly typed classes for accessing WMI for extreme simplicity :-)