Notes on running spark-notebook

These days Docker makes it extremely easy to get started with virtually any application you like. At first I was a bit skeptical, but over the last couple of months I have changed my mind. Now I strongly believe this is a game changer, even more so when it comes to Windows. Anyway, these days Kitematic (a GUI to manage Docker images) allows you to simply pick the spark-notebook image by Andy Petrella.


When running your Docker host in VirtualBox, you still need to set up port forwarding for port 9000 (the notebook) and ports 4040 to 4050 (spark-ui). Assuming your Docker host VM is named default:

# forward the notebook port
VBoxManage modifyvm "default" --natpf1 "tcp-port9000,tcp,,9000,,9000"
# forward the spark-ui ports
for i in {4040..4050}; do
  VBoxManage modifyvm "default" --natpf1 "tcp-port$i,tcp,,$i,,$i";
done

Now you can browse to http://localhost:9000 and start using your new notebook:


You may want to copy the default set of notebooks to a local directory:

docker cp $containerName:/opt/docker/notebooks /Users/timvw/notebooks

Using that local copy is just a few clicks away with Kitematic:


Of course you want to use additional packages such as spark-csv. This can be achieved by editing your notebook metadata:


You simply need to add an entry to customDeps:


When your container did not shut down correctly, you may end up in the awkward situation that your container believes that it is still running. The following commands fix that:

docker start $containerName && docker exec -t -i $containerName /bin/rm /opt/docker/RUNNING_PID

ConcurrencyCheck with EF/Devart on Oracle

Earlier this week I was wondering how I could easily achieve optimistic concurrency in a system using EF/Devart targeting an Oracle database (not really my preferred technologies, but whatever… :P). Here is a potential solution:

Using a column for optimistic concurrency is documented on the devart website:

public class Test : IRequireConcurrencyCheck
{
  [ConcurrencyCheck] // <-- TELL EF to use this column as our "timestamp/logical version"
  public virtual int Version { get; protected set; } // protected, so users of this type can not touch this (easily)
}

By introducing an interface and some custom behaviour on SaveChanges we can now take away the burden of having to update the Version property correctly:

public interface IRequireConcurrencyCheck
{
  int Version { get; }
}

public class DataContext : DbContext
{
  public DataContext(DbConnection existingConnection)
    : base(existingConnection, true)
  {
  }

  public override int SaveChanges()
  {
    // Bump the version of every modified entity that takes part in the concurrency check
    var entitiesWhichHaveConcurrencyCheck = ChangeTracker
      .Entries<IRequireConcurrencyCheck>()
      .Where(x => x.State == EntityState.Modified);

    foreach (var entity in entitiesWhichHaveConcurrencyCheck)
    {
      entity.Property<int>(x => x.Version).CurrentValue++;
    }

    return base.SaveChanges();
  }

  public IDbSet<Test> Tests { get; set; }
}
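
To make the role of the version column a bit more concrete, here is a minimal sketch in Python with sqlite3 (so not EF, Devart or Oracle). The table `test`, the helper `update_name` and the in-memory database are all made up for illustration. The only point is that an update goes through when the version you read earlier still matches, and that a successful update bumps the version.

```python
import sqlite3

def update_name(conn, row_id, new_name, expected_version):
    """Update a row only if nobody changed it since we read expected_version."""
    cursor = conn.execute(
        "UPDATE test SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, row_id, expected_version),
    )
    # rowcount is 0 when the version no longer matches: a concurrency conflict
    return cursor.rowcount == 1

# Hypothetical table and data, only here to exercise the helper
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)")
conn.execute("INSERT INTO test (id, name, version) VALUES (1, 'original', 0)")

first_writer = update_name(conn, 1, "first", expected_version=0)    # version still matches
second_writer = update_name(conn, 1, "second", expected_version=0)  # stale version
stored = conn.execute("SELECT name, version FROM test WHERE id = 1").fetchone()
```

The second writer still holds version 0 while the row is already at version 1, so its update touches no rows. As far as I understand it, this is roughly what EF does for a [ConcurrencyCheck] property: the original value ends up in the WHERE clause and zero affected rows is reported as a concurrency exception.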

Failure to load mono-supplied .dylib (libgdiplus.dylib) when running from console

So earlier this week I was bit by the following bug: Bug 22140 – Failure to load mono-supplied .dylib when running from console.

The workaround that works for me is the following: Edit /Library/Frameworks/Mono.framework/Versions/3.8.0/etc/mono/config and
replace the entries for libgdiplus:

<dllmap dll="gdiplus"
    <dllmap dll="gdiplus.dll"
    <dllmap dll="gdi32"
    <dllmap dll="gdi32.dll"

Deploying a Cloud Service to Azure with Octopus

Currently Octopus has limited support for deploying a Cloud Service on Azure. A typical use case is that you need a different Web.Config file per environment. Simply add the Web.Environment.Config files to your NuGet package and use the following PreDeploy.ps1 script:

# Load unzip support
[Reflection.Assembly]::LoadWithPartialName("System.IO.Compression.FileSystem") | Out-Null

function Unzip($zipFile, $destination)
{
	# Start from an empty destination directory
	If (Test-Path $destination){
		Remove-Item $destination -Recurse | Out-Null
	}
	New-Item -ItemType directory -Force -Path $destination | Out-Null
	[System.IO.Compression.ZipFile]::ExtractToDirectory($zipFile, $destination) | Out-Null
}

# Unzip deployment package
$CsPkg = "Customer.Project.Api.Azure.cspkg"
Unzip $CsPkg "azurePackage"
Unzip (Get-Item (join-path -path "azurePackage" -childPath "*.cssx")) "website"

# Perform replacements, eg: replace Web.Config
$ConfigFileToUse = "Web." + $OctopusParameters["Octopus.Environment.Name"] + ".config"
Copy-Item -Path $ConfigFileToUse -Destination "website/sitesroot/0/Web.Config" -Force

# Repackage
$role = "Customer.Project.Api"
$contentPath = "website\approot"
$rolePath = "website/approot"
$webPath = "website/sitesroot/0"
$cspackPath = "C:\Program Files\Microsoft SDKs\Windows Azure\.NET SDK\v2.2\bin\cspack.exe"
& $cspackPath "ServiceDefinition.csdef" "/out:$CsPkg" "/role:$role;$rolePath;Customer.Project.Api.dll" "/sites:$role;Web;$webPath" "/sitePhysicalDirectories:$role;Web;$webPath" 

Cute sort implementation

For years I had been implementing my sort functions as follows:

(x,y) => {
   if (x.PartName == null && y.PartName == null) return 0;
   if (x.PartName == null) return -1;
   if (y.PartName == null) return 1;
   return x.PartName.CompareTo(y.PartName);
}

Earlier today I found the following cute variant while browsing through the ServiceStack codebase:

(x,y) => x.Priority - y.Priority
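
One caveat with the subtraction trick, sketched here in Python (the `items` list, `by_priority` and `wrap_int32` are made up for illustration): it only works as long as the difference fits in the integer type. Python integers never overflow, so the 32-bit wraparound you would get from an unchecked C# int is simulated by hand.

```python
from functools import cmp_to_key

def by_priority(x, y):
    # Negative, zero or positive, just like the C# lambda
    return x["priority"] - y["priority"]

items = [
    {"name": "c", "priority": 3},
    {"name": "a", "priority": 1},
    {"name": "b", "priority": 2},
]
ordered = [item["name"] for item in sorted(items, key=cmp_to_key(by_priority))]

def wrap_int32(value):
    """Simulate the wraparound of a 32-bit signed integer."""
    return (value + 2**31) % 2**32 - 2**31

# The smallest 32-bit int minus 1 wraps around to the largest 32-bit int,
# so the comparison result gets the wrong sign.
wrapped = wrap_int32(-2**31 - 1)
```

For small, non-negative priorities this never happens, which is probably why the variant is fine in practice. For arbitrary values, comparing instead of subtracting is the safer choice.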

Clone all your repositories on another machine

Recently I was configuring a new machine (God, I love Chocolatey) and I wanted to take all the repositories I have under c:/src and clone them on my new machine. Here is how I did that:

# write all remote fetch locations into repositories.txt
find /c/src -type d -mindepth 1 -maxdepth 1 -exec git --work-tree={} --git-dir={}/.git remote -v \; | grep fetch | awk '{print $2}' > repositories.txt

# clone each repository
cat repositories.txt | xargs -L 1 git clone

A parameterized active pattern to match the first elements of an array

Been writing code in F# for almost a year and never blogged about it. Time to change that 😉 Earlier today someone asked the following on twitter:

Is it possible to pattern match the first part of an array in #FSharp? Something like | 1::2::3::tail but for arrays? #lazyweb

I accepted the challenge 😉

let (|Array|_|) pattern toMatch =
    let patternLength = Array.length pattern
    let toMatchLength = Array.length toMatch
    let tailLength = toMatchLength - patternLength

    // a pattern longer than the input can never match
    if patternLength > toMatchLength then
        None
    else
        let firstElementsAreEqual = [ 0 .. (patternLength - 1) ] |> Seq.forall (fun i -> pattern.[i] = toMatch.[i])
        if firstElementsAreEqual then
            Some(Array.sub toMatch patternLength tailLength)
        else
            None

match [|1;2;3|] with
| Array [|1|] tail -> sprintf "bingo %i" (tail |> Array.sum) // the tail is [|2;3|]
| _ -> "no match"
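
For readers who do not speak F#, here is the same idea as a small Python sketch. The name `match_prefix` is made up for illustration: it returns the tail when the input starts with the pattern, and None otherwise.

```python
def match_prefix(pattern, to_match):
    """Return the tail of to_match if it starts with pattern, else None."""
    # A pattern longer than the input can never match
    if len(pattern) > len(to_match):
        return None
    # Compare the first elements of the input with the pattern
    if to_match[:len(pattern)] != pattern:
        return None
    return to_match[len(pattern):]

tail = match_prefix([1], [1, 2, 3])
too_long = match_prefix([1, 2, 3, 4], [1, 2, 3])
different = match_prefix([2], [1, 2, 3])
```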

The curious case of trailing spaces in SQL

A while ago I was quite surprised to see that the following query returns 1 instead of 0:

SELECT COUNT(*) WHERE N'Tim' = N'Tim '; -- notice the trailing space

Apparently this is just standard behaviour. Here is an extract from sql1992.txt (Section 8.2 Paragraph 3):

     3) The comparison of two character strings is determined as fol-
        lows:

            a) If the length in characters of X is not equal to the length
              in characters of Y, then the shorter string is effectively
              replaced, for the purposes of comparison, with a copy of
              itself that has been extended to the length of the longer
              string by concatenation on the right of one or more pad char-
              acters, where the pad character is chosen based on CS. If
              CS has the NO PAD attribute, then the pad character is an
              implementation-dependent character different from any char-
              acter in the character set of X and Y that collates less
              than any string under CS. Otherwise, the pad character is a
              <space>.

            b) The result of the comparison of X and Y is given by the col-
              lating sequence CS.

            c) Depending on the collating sequence, two strings may com-
              pare as equal even if they are of different lengths or con-
              tain different sequences of characters. When the operations
              MAX, MIN, DISTINCT, references to a grouping column, and the
              UNION, EXCEPT, and INTERSECT operators refer to character
              strings, the specific value selected by these operations from
              a set of such equal values is implementation-dependent.
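
Rule 3a is easier to follow with a tiny sketch. This is Python, not SQL, and `pad_space_equal` is a made-up helper that only mimics the padding step for a collation without the NO PAD attribute. It ignores everything else a real collating sequence does, such as case or accent handling.

```python
def pad_space_equal(x, y, pad=" "):
    """Compare two strings after padding the shorter one on the right."""
    width = max(len(x), len(y))
    # The shorter string is extended on the right to the length of the longer one
    return x.ljust(width, pad) == y.ljust(width, pad)

trailing = pad_space_equal("Tim", "Tim ")  # trailing space is padded away
leading = pad_space_equal("Tim", " Tim")   # leading space still matters
```

Only trailing spaces disappear in the comparison, which is exactly why the query above returns 1.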