Running a Blog on Azure Data Lake

Important Note: There is now a new service, Azure Static Web Apps, that is even better than Azure Data Lake for hosting websites. The blog has moved to this new server. Read more about it here: Azure Static Web Apps - the fast and secure way to run your blog.

Can you host a blog in an Azure Data Lake? Yes, absolutely! In fact, this website is an example of that. In this article, I will describe how you can host your website in Azure Data Lake and handle dynamic things such as a contact page.

First, a couple of words about why you would build your blog on Azure Data Lake:

  • It is very cheap. See the pricing here.
  • You get a very fast website.
  • Security is great.
  • It is cool.

You will certainly need a bit of technical skill and time, but if you enjoy creating things in Azure, you will have great fun!

Static Website Hosting in Azure

What makes this possible is a new feature in Azure Data Lake called Static website.

Azure Data Lake Static Website

Actually, you could already host a static website in Azure Data Lake before, but the static website feature gives you two very important capabilities:

  • An index (default) page. For example, a page at the root, http://www.yoursite.com/.
  • A user-friendly not-found page that is displayed when a visitor mistypes an address.

The way that the static website feature works is very simple. It creates a container called $web, which is where you put anything you wish to publish. Note that URLs will be case-sensitive, because blob names are always case-sensitive.
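The feature can also be switched on from code. The following is a minimal sketch, assuming serviceClient is a BlobServiceClient from the @azure/storage-blob package; the document names index.html and 404.html, and the lower-casing helper, are illustrative choices rather than anything the feature requires.

```javascript
// Sketch: enable the static website feature through the JavaScript SDK.
// Assumption: `serviceClient` is a BlobServiceClient from @azure/storage-blob.
async function enableStaticWebsite(serviceClient) {
  await serviceClient.setProperties({
    staticWebsite: {
      enabled: true,
      indexDocument: "index.html",      // the default page (illustrative name)
      errorDocument404Path: "404.html", // the not-found page (illustrative name)
    },
  });
}

// Because blob names are case-sensitive, one possible convention is to
// publish every page under a lower-case name and link to it the same way.
function toBlobName(urlPath) {
  return urlPath.replace(/^\/+/, "").toLowerCase();
}
```

With this convention, a page linked as /En/Contact/ would be stored under the blob name en/contact/.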

There are a couple of things to consider if you want to use it for a real-world scenario:

  • How to add a custom domain
  • How to handle a typical server-side thing like a contact form
  • How to edit/update your website

Custom Domain

You will probably want to use your own domain name for your website, like www.yoursite.com. The automatic address the static website feature gives you isn't very pretty. There is currently no setting for custom domains, so how do you solve this?

The solution is to use Azure CDN (Content Delivery Network).

Azure CDN

The purpose of Azure CDN is to increase speed and reduce the load on your webservers by caching your content on a global network of servers. When users visit your website, they will get the content from the nearest cache instead of reading directly from your webserver.

Azure CDN is available as different product offerings, with different features. I suggest selecting Premium Verizon, because it has a rules engine. In the rules engine, you can do things such as URL rewriting and URL redirection.

Azure CDN is cheap and easy to set up. It can even save you the cost of an HTTPS certificate for your domain, because it can request and manage your certificate for free.

URL rewriting and URL redirection

If you are not careful, your website could appear in search engines under multiple domains. This is because search engines could find your website on the origin (the autogenerated Azure Data Lake static website address).

To prevent search engines from indexing a website, there is robots.txt. On your custom domain, robots.txt should allow crawling. On any domain other than your custom domain, robots.txt should say Disallow.

But wait... How can you make robots.txt different for different domains?

This is where URL rewriting can do magic for you. Here's an example of how I use it for www.how2code.info:

Azure Cdn Verizon rewrite rule
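The rule itself is configured in the Verizon rules engine, but the decision it makes can be sketched in plain JavaScript. This is only an illustration under stated assumptions: the blob name robots-allow.txt and the domain www.yoursite.com are hypothetical, and the sketch assumes the blocking variant is stored as the regular robots.txt, since requests that reach the origin address directly never pass through the CDN.

```javascript
// The two robots.txt variants.
const ROBOTS_ALLOW = "User-agent: *\nDisallow:\n";      // empty Disallow = crawl everything
const ROBOTS_DISALLOW = "User-agent: *\nDisallow: /\n"; // block the whole site

// Hypothetical custom domain.
const CUSTOM_DOMAIN = "www.yoursite.com";

// Which blob should answer a request for /robots.txt on a given hostname?
// Assumption: robots.txt in $web holds ROBOTS_DISALLOW, and the CDN rewrites
// /robots.txt to /robots-allow.txt (hypothetical name) on the custom domain only.
function robotsBlobFor(hostname) {
  return hostname.toLowerCase() === CUSTOM_DOMAIN
    ? "robots-allow.txt"
    : "robots.txt";
}
```

Storing the blocking variant under the default name means that any address you did not explicitly allow stays out of the search index.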

In a similar way, you can do URL redirection. For instance redirect any request from http to https:

Azure Cdn Verizon redirect rule

Finding your "customer origin"

You should replace the "customer origin" /80103ADC/cdn-how2code-www with your own. The easiest way to find your customer origin is to create a new rule (that you later discard), select "Origin" and then "Customer Origin". Your customer origin will then appear in a drop-down.

Azure Cdn Verizon finding the Customer Origin

Cache-Control

If you use a CDN, caching becomes even more important. This is because the CDN needs to know if it needs to reload the content or not.

Without Cache-Control, the request will go from the CDN to the Data Lake each time. This is unnecessary and will slow down your website.

Azure Data Lake Cache-Control scenario

There are two ways to manage the Cache-Control. Either you can manage the caching directly on the blobs in Azure Data Lake, or you can manage it by rules in the CDN.

My recommendation is to manage the caching directly on the blobs. Cache-Control is available as a property on your blobs. Unfortunately it is not shown in the portal, but you can easily manage it through PowerShell, C#, REST API, etc. You can check the result in your web browser's developer tools, where the Cache-Control response header should appear on each request.
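Setting the property from JavaScript could look roughly like the sketch below. It assumes blobClient is a BlobClient from the @azure/storage-blob package. The max-age values are illustrative, not recommendations from this article.

```javascript
// Illustrative policy: short cache for pages, long cache for everything else.
function cacheControlFor(blobName) {
  return blobName.toLowerCase().endsWith(".html")
    ? "public, max-age=300"
    : "public, max-age=604800";
}

// Assumption: `blobClient` is a BlobClient from @azure/storage-blob.
async function setCacheControl(blobClient, cacheControl) {
  // setHTTPHeaders replaces all of these headers at once, so the current
  // values are read first and passed back unchanged.
  const props = await blobClient.getProperties();
  await blobClient.setHTTPHeaders({
    blobCacheControl: cacheControl,
    blobContentType: props.contentType,
    blobContentEncoding: props.contentEncoding,
    blobContentLanguage: props.contentLanguage,
    blobContentDisposition: props.contentDisposition,
    blobContentMD5: props.contentMD5,
  });
}
```

Reading the existing properties first matters: calling setHTTPHeaders with only the cache value would clear the content-type of the blob.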

One more thing... In Azure CDN, there is also a setting called Query-String Caching. Azure Data Lake static website will ignore any query strings, so for best cacheability you should choose "standard-cache" mode. This is the default mode.

Content-Type

The content-type is necessary for web browsers to display/handle content correctly. Usually it is managed by the web server, but with Azure Data Lake static websites you have to manage it yourself.

When uploading files in the portal, Azure will by default assign a content-type/MIME type based on the file extension. For example:

  • text/html for a .html file
  • image/jpeg for a .jpg file
  • audio/mpeg for a .mp3 file

When uploading through other means (such as PowerShell or C#), you will need to set the content-type yourself. It is available as a property on the blobs and you can even see it in the portal.
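A small lookup by file extension is usually enough for a blog. The sketch below is one way to do it in JavaScript; the list of extensions is deliberately short, and the blobHTTPHeaders option shape assumes the upload methods of the @azure/storage-blob package.

```javascript
// A deliberately short extension-to-MIME-type table; extend it as needed.
const CONTENT_TYPES = {
  ".html": "text/html",
  ".css": "text/css",
  ".js": "text/javascript",
  ".jpg": "image/jpeg",
  ".png": "image/png",
  ".mp3": "audio/mpeg",
};

// Look up the content-type from the file extension, ignoring case.
function contentTypeFor(fileName) {
  const dot = fileName.lastIndexOf(".");
  const ext = dot === -1 ? "" : fileName.slice(dot).toLowerCase();
  return CONTENT_TYPES[ext] || "application/octet-stream";
}

// Upload options carrying the content-type.
// Assumption: passed as the options argument of an @azure/storage-blob upload call.
function uploadOptionsFor(fileName) {
  return { blobHTTPHeaders: { blobContentType: contentTypeFor(fileName) } };
}
```

Unknown extensions fall back to application/octet-stream, which makes browsers download the file instead of displaying it, so it is worth checking that every file type on your site is in the table.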

Managing it in PowerShell is similar to managing other properties, such as cache-control. This article describes how to manage properties.

Adding a Contact Form

A contact form is a typical example of something that should be handled server side:

  • Email servers usually require credentials. These should not be exposed by including them in a client script.
  • Maybe the message is written to a database? Then it is even more important that the database credentials are not included in a client script.
  • You might be doing some anti-spam checks in your contact form. That also should not be handled in a client script.

Azure Data Lake Storage static websites are... static. They don't come with any server side code support. So you have two options:

  • Using a third-party contact form that is hosted on a third-party server. For example Formspree.
  • Adding an Azure Function, or similar, to execute your code server side.

Azure Functions (Serverless Compute) are easy to create, and very cheap. They can be created in the portal or by using for example Visual Studio Code. Here's an example.

Azure Function How2Code
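As a rough idea of what such a function can contain, here is a minimal sketch of an HTTP-triggered function in Node.js, using the classic module.exports programming model. It is not the function running on this site: sendMail and outbox are hypothetical stand-ins for a real mail client, and the validation is intentionally basic. The field names match the JSON posted by the contact form.

```javascript
// Hypothetical stand-in for a real mail client, so the sketch runs anywhere.
const outbox = [];
async function sendMail(message) {
  outbox.push(message);
}

// HTTP-triggered function: validates the posted form and sends the mail.
async function sendMailFunction(context, req) {
  const { email, name, subject, message } = req.body || {};

  // Reject incomplete submissions before touching the mail server.
  if (!email || !message) {
    context.res = { status: 400, body: "email and message are required" };
    return;
  }

  await sendMail({ from: email, name: name || "", subject: subject || "(no subject)", text: message });
  context.res = { status: 200, body: "OK" };
}

module.exports = sendMailFunction;
```

Because the function lives on a different domain than the website, the website's address has to be allowed in the CORS settings of the Function App before the browser will accept the response.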

Calling the Azure Function from JavaScript can be done in several ways, depending on browser compatibility. In this example, I use the XMLHttpRequest class for maximum browser compatibility.

// email, name, subject and body are assumed to be read from the form fields,
// and enableButton() is assumed to be defined elsewhere on the page.
var xhr = new XMLHttpRequest();
xhr.open('POST', 'https://func-how2code-web.azurewebsites.net/api/SendMail');
xhr.setRequestHeader('Content-Type', 'application/json;charset=UTF-8');
xhr.onreadystatechange = function () {
	// Act only when the request has completed.
	if (this.readyState === XMLHttpRequest.DONE) {
		if (this.status === 200) {
			window.location.href = '/en/contact/thankyou';
		} else {
			// Let the visitor try again.
			enableButton();
			alert('An error occurred. Please try again later.');
		}
	}
};
xhr.send(JSON.stringify({ "email": email, "name": name, "subject": subject, "message": body }));

Azure Functions are a good way to extend your static website with server-side code, and they can be written in your preferred language (C#, JavaScript/Node.js, PowerShell, etc.).

Editing/updating your website

Azure Data Lake static website doesn't come with any web editing functionality like WordPress. You will have to build and manage the HTML files yourself.

It's not super hard to make a decent editing environment for your website.

  • You could run something like WordPress locally on your computer. Then there are plugins available that can convert your website into static files. These files can be uploaded to your Azure Data Lake by using AzCopy, PowerShell or similar.
  • You could build your own editor that automatically updates the Azure Data Lake.

I built my own editor by creating an ASP.NET Core C# web application. The advantage of this is that you get all the libraries to easily render HTML pages. Basically, I have a few HTML templates that I apply to all my content. The content is stored in a dedicated Azure Data Lake container, and the HTML output is written to the $web container. Any changes to the website design are easy because I just need to change the templates. I run the editor locally, but it could easily be deployed as a web app if needed.
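The template idea is independent of the language. My editor is written in C#, but the same principle can be sketched in a few lines of JavaScript. The {{placeholder}} syntax and the function name renderPage are assumptions made for this illustration only.

```javascript
// Replace every {{key}} in the template with the matching value.
// Values are inserted as-is, so they must be trusted HTML written by the author.
// Placeholders without a matching value are left untouched.
function renderPage(template, values) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? String(values[key]) : match
  );
}

// One template, applied to every post.
const template = "<html><head><title>{{title}}</title></head><body>{{body}}</body></html>";
const page = renderPage(template, { title: "Hello", body: "<p>My first post</p>" });
```

Changing the design then only means changing the template and rendering all posts again.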

An even cooler way to design an editor would be to build it in Vue.js, React, Angular or similar framework. That way both the website and the editor could be hosted as Azure Data Lake static websites.

Azure even has an SDK for JavaScript! Using this SDK you can for example:

  • Integrate your web apps with Azure Active Directory Authentication (see for example here)
  • Access protected Azure functions.
  • Upload and update blobs
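As an example of the last point, uploading a rendered page could look like the sketch below. It assumes containerClient is a ContainerClient from the @azure/storage-blob package pointing at the $web container. The function name publishPage and the cache value are illustrative.

```javascript
// Upload (or overwrite) one HTML page in the $web container.
// Assumption: `containerClient` is a ContainerClient from @azure/storage-blob.
async function publishPage(containerClient, blobName, html) {
  const blockBlob = containerClient.getBlockBlobClient(blobName);

  // The content length is given in bytes, not characters.
  const byteLength = new TextEncoder().encode(html).byteLength;

  await blockBlob.upload(html, byteLength, {
    blobHTTPHeaders: {
      blobContentType: "text/html; charset=utf-8",
      blobCacheControl: "public, max-age=300", // illustrative value
    },
  });
}
```

With the real SDK, the container client would typically come from new BlobServiceClient(accountUrl, credential).getContainerClient("$web"), where accountUrl and credential are your own values.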

Another option is using Blazor WebAssembly. In my experience, it is easy to host in a static website and could be used to create an excellent editor. This would probably be my main option if I started from scratch with this blog today.
