Processing microservices data is a relatively new but hotly discussed topic in the developer world. In our latest thought leadership piece, we examine the pros and cons of the available technological and architectural solutions and set out our best-practice guidance.
Over the past year, OCC (Oxford Computer Consultants) has frequently been tasked with development work on both medical diagnostic and research tools.
Interestingly, project requirements have typically followed the same pattern, stipulating that the data must be:
- Uploaded in raw form using a website or app
- Analysed using machine learning
- Displayed, as analysis results, on a website or app
This article will not focus on the data upload or display steps. Dare we say (with apologies to our UX designer readers) that there is a wealth of advice available on these – and they are in fact solved problems. We will instead focus on the processing steps.
Processing microservices data
In some cases, processing is sufficiently quick that it can be embedded within the website. This is not usually the best strategy: processing times can make web server responses unacceptably long. Ideally, data processing should be handed off to a separate process that does not interfere with web responsiveness.
OCC’s approach is to separate the processing step out completely. Persistent storage is used to communicate information between the processing pipeline and the web UI. System components can then be completely uncoupled.
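This decoupling can be illustrated without any cloud dependencies. The sketch below uses a local directory to stand in for cloud persistent storage (such as Azure Blob Storage); the function names and file layout are illustrative assumptions, not OCC's actual implementation:

```python
import json
from pathlib import Path

# A local directory stands in for cloud persistent storage (e.g. Azure Blob Storage).
STORAGE = Path("shared_storage")
STORAGE.mkdir(exist_ok=True)

def submit_job(job_id: str, payload: dict) -> None:
    """Called by the web tier: persist the raw upload and return immediately."""
    (STORAGE / f"{job_id}.pending.json").write_text(json.dumps(payload))

def run_worker_once() -> list[str]:
    """Called by the processing tier: pick up pending jobs, process them,
    and persist the results for the web UI to display later."""
    done = []
    for pending in sorted(STORAGE.glob("*.pending.json")):
        payload = json.loads(pending.read_text())
        result = {"sum": sum(payload["values"])}  # placeholder for the real analysis
        job_id = pending.name.split(".")[0]
        (STORAGE / f"{job_id}.result.json").write_text(json.dumps(result))
        pending.unlink()  # remove the job so it is not processed twice
        done.append(job_id)
    return done

submit_job("job1", {"values": [1, 2, 3]})
print(run_worker_once())  # the worker runs independently of the web process
```

Because the web tier only ever writes to storage and the worker only ever reads from it, either side can be redeployed, scaled or replaced without touching the other.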
This raises the question of how to handle the offline processing elements of these systems. Fortunately, there are several potential solutions, some architectural, some technological.
The technology choices revolve around platform and language. In this article, we will be exclusively discussing Azure hosting; similar options are available on AWS (Amazon Web Services) and Google Cloud. Language choice will be dictated by the problem domain. Many data processing problems can be solved using standard numerical and machine learning libraries.
Python’s NumPy, scikit-learn and PyTorch libraries are excellent in this regard, so for any problem revolving around machine learning, image processing or complex numerical analysis, Python will be the recommended language choice. Microsoft’s free IDE, Visual Studio Code, provides strong support for Azure and Python development, so tooling is not a significant problem where Python is required.
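As a flavour of what these libraries make trivial, the sketch below uses NumPy to standardise a small batch of raw measurements to zero mean and unit variance, a typical preprocessing step before feeding data to a machine learning model (the data here is purely illustrative):

```python
import numpy as np

# Toy "raw measurements": 4 samples x 3 features (illustrative data only).
raw = np.array([[1.0, 200.0, 0.5],
                [2.0, 180.0, 0.7],
                [3.0, 220.0, 0.2],
                [4.0, 240.0, 0.6]])

# Standardise each feature (column) to zero mean and unit variance.
standardised = (raw - raw.mean(axis=0)) / raw.std(axis=0)

print(standardised.mean(axis=0))  # each feature mean is now ~0
```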
OCC has also been successful with C# processing pipelines where existing Python library support is lacking. The tooling and support for C# Azure Functions in Visual Studio is first rate, and where C# is viable this can speed up development.
Linked to choices of language and platform is the choice of architecture. This can be crudely summarised as monolithic vs microservice.
Does the application consist of one large complex process, or several smaller ones, communicating via messages or shared persistent storage? Both approaches have their advantages.
Monolithic vs. Microservice
Monolithic apps are simpler to develop and deploy, and may perform better, as the overhead of inter-component communications is reduced. On the other hand, smaller services permit easier testing, debugging and monitoring. This can be of particular benefit in medical apps, where quality control is king.
Azure provides several promising platforms for both types of app. The simplest case would be the use of a VM. This will work in almost all cases, but is not cost efficient, either for hosting or for maintenance.
A better choice might be Azure Functions, Azure's serverless compute technology. In terms of advantages, they:
- Are quite powerful
- Offer many options for scaling, triggers and language choice
- Are cost efficient
- Are a good choice in many cases
On the downside, they are unsuitable:
- For very long-running calculations
- Where third-party binary components or other exotic code is required
Time limits can be partially worked around using Azure Durable Functions and premium hosting options. However, runtimes of over one hour can never be relied on. Azure Functions are ideal for microservice-based architectures with many small components passing data along a pipeline using shared storage. The mechanism by which an Azure Function is invoked and receives data is called a “trigger”, and Azure provides several trigger types: timer-based triggers, queue triggers (which launch a function on receipt of a message) and blob storage triggers.
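The queue-driven pipeline style can be sketched without any Azure dependencies. Below, two in-memory queues stand in for Azure Storage queues, and each stage function stands in for a separate queue-triggered Azure Function; all names and data are illustrative assumptions:

```python
import queue

# Two in-memory queues stand in for Azure Storage queues; in a real
# deployment each stage below would be a separate queue-triggered Function.
raw_queue: "queue.Queue[dict]" = queue.Queue()
clean_queue: "queue.Queue[dict]" = queue.Queue()
results: dict[str, float] = {}  # stands in for shared persistent storage

def clean_stage(msg: dict) -> None:
    """First microservice: drop invalid readings, forward the rest."""
    valid = [v for v in msg["readings"] if v >= 0]
    clean_queue.put({"id": msg["id"], "readings": valid})

def analyse_stage(msg: dict) -> None:
    """Second microservice: compute a summary and persist it."""
    readings = msg["readings"]
    results[msg["id"]] = sum(readings) / len(readings)

# Simulate the queue triggers by draining each queue in turn.
raw_queue.put({"id": "scan-42", "readings": [3.0, -1.0, 5.0]})
while not raw_queue.empty():
    clean_stage(raw_queue.get())
while not clean_queue.empty():
    analyse_stage(clean_queue.get())

print(results)  # {'scan-42': 4.0}
```

Each stage knows nothing about the others beyond the message format, which is what makes the individual components easy to test, debug and monitor in isolation.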
Blob storage triggers
One trigger option that is very appealing to new Azure Functions developers is the Blob Storage trigger: the appearance of a file in Azure Blob Storage automatically launches a function to process it. This option should be used with great caution, if at all, as it results in a lot of activity scanning files, especially where files are not deleted from storage immediately once processed. For large numbers of files this scanning can become significantly expensive. We would generally recommend queue-based triggers instead.
The third major option is a container-based solution. Containers are less convenient and require more development effort than Azure Functions. They have some considerable advantages, however, namely that they are:
- Much more flexible when it comes to content
- Not constrained to short runtimes
- Extremely portable
- Not necessarily restricted to the Azure platform
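Much of that flexibility and portability comes from the image definition itself. A minimal sketch of an image for a Python processing worker might look like the following (the base image and file names are illustrative assumptions, not a prescribed setup):

```dockerfile
# Illustrative only: a minimal image for a Python processing worker.
FROM python:3.11-slim

WORKDIR /app

# Install the processing dependencies (file names are hypothetical).
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the pipeline code and run the worker loop on startup.
COPY worker.py .
CMD ["python", "worker.py"]
```

The same image runs unchanged on a developer laptop, on Azure, or on any other cloud, and it can bundle whatever third-party binaries the analysis needs.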
If you do remain on Azure, there are many ways to use containers to your advantage. These include in particular:
- Azure Container Instances – a cheap and simple (but rather user-unfriendly) approach to container deployment.
- Azure Kubernetes Service – Kubernetes is a complex and difficult technology, but potentially very helpful for managing more complex scenarios. Its steep learning curve means it may be more sensible to consider other approaches unless you have existing Kubernetes experience, very complex deployments or heavy workloads.
- Azure Container Apps – a newer addition, combining the simplicity of Container Instances with inbuilt scaling support and support for Dapr-enhanced microservices. This option appears extremely promising, but it is quite new, so the pool of community knowledge is still relatively small.
All these options are viable in the appropriate circumstances, and our design choice is always shaped by the specifics of each project. However, our preference is for cost-effective Azure Functions, switching to container-based solutions for applications with longer runtimes or more unusual software requirements.