Running a Blog on Azure Data Lake
by Johan Åhlén • Updated
Can you build a blog on Azure Data Lake/Azure Blob Storage? Yes, absolutely! In fact, this blog is an example of that. In this article, I will describe how to build a website on Azure Data Lake including dynamic content such as blog post and tag indexes.
First, a couple of words about why you would do such a thing:
- It is very cheap. See the pricing here.
- You get a super fast webserver.
- It is cool.
You will certainly need more technical skills and time than if you used something like WordPress, but if you enjoy creating things in Azure you will have great fun!
Static Website hosting in Azure
You can easily upload a file to an Azure Data Lake, where it becomes a blob and is assigned a URL. This makes it very easy to upload a complete static website. However, there are two features that you usually want from a website:
- Display a user friendly not-found-page when you mistype an address.
- Have a default page. For instance when you type
www.how2code.info, it actually goes to
Coincidentally, these two features are exactly what you get if you enable Static website hosting in Azure Storage. That feature is currently (as of August 2020) in preview for Azure Data Lake, but it is fully available for Azure Storage without hierarchical namespaces.
Be careful with mime-types. Azure will by default assign a mime-type based on the file extension of the files you upload. If your pages are stored in .html-files, it will be okay. For an address like
www.how2code.info/en, you will have to set the
Note that URLs will be case-sensitive. This is because blob names are case-sensitive.
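As a rough illustration of the mime-type point, here is a minimal Python sketch of how a deployment script might pick the content type to store with each blob. The function name and the choice of text/html as the fallback for extensionless pages are assumptions made for this example, not part of any Azure tooling.

```python
import mimetypes
from pathlib import PurePosixPath

# Assumption for this sketch: pages without a file extension are HTML.
DEFAULT_PAGE_TYPE = "text/html"


def content_type_for(blob_name: str) -> str:
    """Guess the content type to store with a blob, based on its name."""
    if PurePosixPath(blob_name).suffix == "":
        # An extensionless name such as "en" gives no hint about its type,
        # so the script has to choose one explicitly.
        return DEFAULT_PAGE_TYPE
    guessed, _encoding = mimetypes.guess_type(blob_name)
    # Fall back to a generic binary type when the extension is unknown.
    return guessed or "application/octet-stream"
```

A script like this would pass the returned value along as the blob's content type when uploading. Blob names are used exactly as given, since they are case-sensitive.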
To deploy your files to your Azure Data Lake, you could either use AzCopy or develop your own more sophisticated scripts.
Structuring your Data Lake
I suggest dividing your Data Lake/Azure Storage content into different containers:
- Web - for your .html-files
- Posts - for your blog posts in a structured format
- Img - for your images
You can build scripts that pre-transform your blog posts from the structured format to HTML files that are stored in "Web". The scripts can then also autogenerate index pages for your blog posts and tags, as well as a sitemap. Another option is to do the transformations dynamically in the end user's web client.
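To make the pre-transform idea more concrete, here is a small Python sketch that builds a tag index and a sitemap from a list of posts. The post fields (slug, title, tags), the sample posts, and the https scheme on the site root are all hypothetical and chosen only for illustration. They are not the structured format used on How2Code.info.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Domain taken from the article; the https scheme is an assumption.
SITE_ROOT = "https://www.how2code.info"

# Hypothetical structured posts, standing in for files in the "Posts" container.
POSTS = [
    {"slug": "en/blog/first-post", "title": "First post", "tags": ["azure", "blog"]},
    {"slug": "en/blog/second-post", "title": "Second post", "tags": ["azure"]},
]


def build_tag_index(posts):
    """Map each tag to the slugs of the posts that carry it."""
    index = defaultdict(list)
    for post in posts:
        for tag in post["tags"]:
            index[tag].append(post["slug"])
    return dict(index)


def build_sitemap(posts, site_root=SITE_ROOT):
    """Return a sitemap XML string with one <url> entry per post."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for post in posts:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"{site_root}/{post['slug']}"
    return ET.tostring(urlset, encoding="unicode")
```

The tag index could then be rendered into one HTML page per tag, and the sitemap string written to a file in the "Web" container.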
Here's an example of the structured format I use for my blog posts on How2Code.info:
Using a Custom Domain
The address your website will get is something like
https://something.blob.core.windows.net. Surely you want a nicer address?
Azure CDN is the solution. It gives you:
- Support for custom domains (like www.how2code.info).
- A certificate for your custom domain, created for free, so you can use HTTPS.
- Local caching of your content on a global network of servers. Your website visitors will be automatically sent to the nearest server.
Azure CDN is available as different product offerings, with different features. I suggest selecting Premium Verizon, because it has a rules engine. In the rules engine, you can easily set up things such as default pages and redirects.
Caching is very important for CDNs. If you don't enable any caching, the CDN will have to reload the files all the time. The caching could either be set up in the CDN, or on the blobs. This article describes how to set up caching on blobs.
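As one possible way to approach caching on the blobs, a deployment script could choose a Cache-Control value per file type and store it with each blob. The sketch below is only an assumption about how that might look. The lifetimes and the function name are hypothetical and should be tuned to how often your content changes.

```python
from pathlib import PurePosixPath

# Hypothetical cache lifetimes in seconds, keyed by file extension.
MAX_AGE_BY_SUFFIX = {
    ".html": 300,     # pages change most often
    ".css": 86400,    # one day
    ".js": 86400,     # one day
    ".png": 604800,   # one week
    ".jpg": 604800,   # one week
}
DEFAULT_MAX_AGE = 3600  # one hour for everything else


def cache_control_for(blob_name: str) -> str:
    """Return the Cache-Control header value to store with a blob."""
    suffix = PurePosixPath(blob_name).suffix.lower()
    max_age = MAX_AGE_BY_SUFFIX.get(suffix, DEFAULT_MAX_AGE)
    return f"public, max-age={max_age}"
```

Short lifetimes for pages and longer ones for images keep updated posts visible quickly, while still letting the CDN serve most of the traffic from its cache.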
Azure CDN is quick and easy to set up. It is useful for almost any website, not only blogs running on Azure Data Lake.
Making your Website dynamic
It is amazing how much you can now do client side on the web. There are frameworks like Angular and Vue.js that can be used to build incredible websites. You could build much more advanced things than this blog. Still, they can be deployed to an Azure Data Lake, since to the web server they are only static files. From the client side, you can even:
- Integrate your web apps with Azure Active Directory Authentication (see for example here)
- Access protected Azure functions.
- Upload and update blobs, which actually makes your static website not so static anymore.
Still, there are cases when you want to run things server side. For example, security is much easier to handle server side. You can easily run server side code that is not exposed to the end users. With client side code, you are basically open sourcing everything.
As a compromise, you could place code in Azure Functions (Serverless Compute). That will protect the code, but the functions have to be carefully written so they cannot be exploited.
Another thing that can't be done client side is permanent redirects. One possible way to solve it is through the CDN rules engine.
A few client side libraries that I find useful:
- Prism, to get nice syntax highlighting of your source code examples.
- CodeMirror, to get an HTML editor that you can build your blog post editor on.
- LazySizes, to speed up your website by loading images on demand.
Do you have any other favorite libraries? Feel free to let me know.
Finally, I must admit: this website has all its content in an Azure Data Lake, but it uses an Azure App Service (Web App) for some server side functionality, mainly for security reasons. That will change sometime in the future...