Until recently, building Machine Learning (ML) solutions relied on approaches that required a high level of effort and specialization to achieve passable results. However, the rise of out-of-the-box solutions has steadily lowered the barrier to entry for building custom ML solutions. The widespread introduction of cloud-based AI services has now accelerated this trend to new heights, almost eliminating the core machine learning skills and hardware that were previously required to create commonly used AI functions.
Developing an ML Solution from Scratch
Setting up even a simple Machine Learning solution first required knowledge of the different solution paradigms available. Typical choices were Genetic Algorithms, Fuzzy Logic, Artificial Neural Networks (ANNs), or a hybrid approach.
Let us look at an example. Here we will go through the high-level steps of creating an application that takes images containing text as input and outputs text files containing the text in those images.
First, we must settle on a strategy for translating the images into inputs for our neural network. This can be done by converting pixels into numerical representations, for example by mapping each pixel’s grayscale intensity to a number.
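As a minimal sketch of this step, assuming the image has already been decoded into a grid of 8-bit grayscale values (0 for black, 255 for white), each pixel can be scaled to the range 0 to 1 and the grid flattened row by row into a single input vector. The function name `image_to_inputs` and the sample image are illustrative, not part of any library.

```python
from typing import List

# Assumption: the image is already decoded into rows of 8-bit grayscale values.
Image = List[List[int]]


def image_to_inputs(image: Image) -> List[float]:
    """Flatten a grayscale image row by row and scale each pixel to [0, 1]."""
    inputs: List[float] = []
    for row in image:
        for pixel in row:
            if not 0 <= pixel <= 255:
                raise ValueError(f"pixel value out of range: {pixel}")
            inputs.append(pixel / 255.0)
    return inputs


# A hypothetical 3x3 image: a dark vertical stroke on a white background.
sample = [
    [255, 0, 255],
    [255, 0, 255],
    [255, 0, 255],
]
print(image_to_inputs(sample))
```

Scaling to a small, fixed range keeps all inputs on a comparable scale. A CNN would typically keep the 2D grid shape instead of flattening it, so that neighboring pixels stay neighbors.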
We must then decide on the structure of our neural network. This involves choosing the number of nodes and layers and how propagation is implemented. The architecture of a neural network heavily influences its effectiveness. A basic architecture might start with an input layer, followed by several hidden layers, and finally an output layer. For image processing tasks, Convolutional Neural Networks (CNNs) are typically used because they recognize patterns in images more effectively by preserving the spatial relationships between pixels.
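To illustrate why CNNs preserve spatial relationships, the sketch below implements the core operation of a convolutional layer in plain Python: a small kernel is slid across the image, so each output value depends only on a local neighborhood of pixels. The hand-picked kernel, which responds to thin vertical strokes, is an assumption for illustration. A real CNN learns its kernel weights during training and stacks many such layers with pooling and fully connected layers, typically using a library such as PyTorch or TensorFlow.

```python
from typing import List

Matrix = List[List[float]]


def convolve2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    Like most deep learning libraries, this computes cross-correlation,
    i.e. the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Each output depends only on the kh x kw patch at (i, j),
            # so the feature map keeps the spatial layout of the input.
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


def relu(matrix: Matrix) -> Matrix:
    """Element-wise ReLU activation, as commonly applied after a convolution."""
    return [[max(0.0, value) for value in row] for row in matrix]


# Hypothetical input: 1.0 marks "ink", forming a vertical stroke in column 1.
stroke = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
line_kernel = [[-1, 2, -1]]  # hand-picked detector for thin vertical strokes
print(relu(convolve2d(stroke, line_kernel)))
```

The strong responses line up with the stroke’s position, which is the spatial information a fully connected layer applied to a flattened image would have to learn from scratch.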
Finally, just as we translated the input into numerical representations, we must translate the network’s output back into text.
Once this setup is complete, we can begin training our neural network. Training can require many hours of processor time, as well as many person-hours if a supervised learning strategy is used, since the training data must then be labeled.
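How the output is translated back depends on the network design. One common choice for OCR, assumed here rather than prescribed by the text, is to train the network with Connectionist Temporal Classification (CTC), so that it emits a score for every character class (plus a special blank) at each horizontal position of a text line. The simplest way to decode such output is greedy (best-path) decoding: take the highest-scoring class at each step, merge consecutive repeats, and drop blanks. The alphabet and scores below are hypothetical.

```python
from typing import List, Sequence

# Hypothetical output classes; index 0 is reserved for the CTC blank symbol.
ALPHABET = ["<blank>", "a", "b", "c"]


def greedy_ctc_decode(scores: Sequence[Sequence[float]],
                      alphabet: Sequence[str] = ALPHABET) -> str:
    """Best-path CTC decoding: argmax at each step, merge repeats, drop blanks."""
    blank = alphabet[0]
    # 1. Pick the highest-scoring class at each time step.
    best_path = [alphabet[max(range(len(step)), key=lambda k: step[k])] for step in scores]
    # 2. Merge consecutive duplicates, then 3. remove blanks.
    decoded: List[str] = []
    previous = None
    for symbol in best_path:
        if symbol != previous and symbol != blank:
            decoded.append(symbol)
        previous = symbol
    return "".join(decoded)


# Hypothetical network output: 5 time steps x 4 classes.
scores = [
    [0.1, 0.7, 0.1, 0.1],    # a
    [0.1, 0.6, 0.2, 0.1],    # a again (merged with the previous step)
    [0.8, 0.1, 0.05, 0.05],  # blank separates two genuine "a" characters
    [0.1, 0.7, 0.1, 0.1],    # a
    [0.1, 0.1, 0.1, 0.7],    # c
]
print(greedy_ctc_decode(scores))
```

The decoded string can then be written to the output text file, for example with `pathlib.Path.write_text`.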
The process of training a neural network involves several iterations of:
- Training: Adjusting the weights of the network using backpropagation, based on the error between the network’s output and the actual label.
- Validation: Using a separate set of data to validate the accuracy of the model, ensuring that it generalizes well and isn’t just memorizing the training data.
- Re-training: Modifying the network’s architecture, tuning hyperparameters, or providing more training data based on validation results.
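The training and validation cycle above can be sketched, under heavy simplification, with a single sigmoid neuron trained by full-batch gradient descent on log loss; for a one-layer network, backpropagation reduces to this single gradient computation. The toy two-feature dataset and the hyperparameters are assumptions chosen for illustration, not realistic OCR data.

```python
import math
from typing import List, Tuple

Sample = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: List[float], bias: float, x: List[float]) -> float:
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def log_loss(weights: List[float], bias: float, data: List[Sample]) -> float:
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for x, y in data:
        p = predict(weights, bias, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def accuracy(weights: List[float], bias: float, data: List[Sample]) -> float:
    correct = sum(1 for x, y in data if (predict(weights, bias, x) >= 0.5) == (y == 1))
    return correct / len(data)


def train(data: List[Sample], epochs: int = 2000,
          learning_rate: float = 1.0) -> Tuple[List[float], float]:
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in data:
            # For sigmoid + log loss, dLoss/dz is simply (prediction - label).
            error = predict(weights, bias, x) - y
            for k, xi in enumerate(x):
                grad_w[k] += error * xi
            grad_b += error
        # Gradient descent step on the averaged gradient.
        weights = [w - learning_rate * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(data)
    return weights, bias


# Hypothetical toy data: label 1 when both features are large.
TRAIN: List[Sample] = [
    ([0.0, 0.0], 0), ([0.2, 0.3], 0), ([0.1, 0.6], 0),
    ([0.9, 0.9], 1), ([1.0, 0.6], 1), ([0.7, 1.0], 1),
]
VALIDATION: List[Sample] = [([0.1, 0.3], 0), ([0.85, 0.85], 1)]

weights, bias = train(TRAIN)
print("training accuracy:", accuracy(weights, bias, TRAIN))
print("validation accuracy:", accuracy(weights, bias, VALIDATION))
```

If validation accuracy were unsatisfactory, the re-training step would adjust the architecture, hyperparameters such as `epochs` and `learning_rate`, or the training data, and run the loop again.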
Once we are satisfied with the accuracy of our outputs, we can finally deploy the solution. Training is the most resource-intensive part of the process, but running the application against new inputs can also be costly.
While the sequence of tasks required to create a working ML solution may seem daunting, aspiring AI developers are highly encouraged to build such implementations themselves. Building a solution from scratch is an informative and rewarding endeavor that will deepen your understanding of AI. It can also serve as a stepping stone if you are interested in learning the theory behind Machine Learning.
[Figure: Visualization of an OCR Neural Network]
The Advent of Out-of-the-Box Solutions
Machine Learning-based solutions have been available in the wild for quite some time. The most widespread of these are Optical Character Recognition (OCR), speech transcription, and image analysis. More recently, we’ve also seen speech and image synthesis, as well as other generative solutions, being deployed. OCR and speech synthesis have many proprietary and open-source solutions available. However, many of these solutions required considerable skill to configure and set up. More importantly, they required powerful and sometimes specialized hardware to perform within acceptable parameters. Even when well-equipped and properly configured, training and deployment times could be astronomical, and a fair amount of trial and error was required, with costly retraining further slowing development.
The Shift to Cloud-Based Services
Cloud-based AI services eliminate most of these challenges. Platforms like Azure and AWS have begun offering products that meet the requirements of most AI-based solutions. In this section we will look at the services Azure offers under its Azure AI platform.
First, Azure offers a wide range of services covering both the most common and the most cutting-edge AI use cases. This catalog is constantly evolving and expanding as existing services are improved and new ones are added. The services include, but are not limited to:
- Classification and Anomaly Detection
- Content Moderation
- Text and Speech functionality
- Translation
- Computer Vision
Azure’s portals and user interfaces also make training these services simple. For image analysis with Computer Vision, images can be uploaded to the Azure Computer Vision portal, labeled using the GUI, and used to train image recognition and classification models. Similar portals exist for many of the other services. Alternatively, images can be uploaded through REST APIs or programmatically using the SDKs.
Azure offers flexibility when it comes to deploying solutions. Services can be grouped, isolated, and even containerized with Docker for on-premises deployment. Once deployed, these services can be consumed in whichever way suits your application. The most comprehensive option for this is the Azure SDKs, which are available for many of the most popular languages and frameworks, including .NET, Python, and Java.
Crucially, because these services are cloud-based (apart from on-premises Docker deployments), the burden of training and running them shifts to the cloud provider, taking a major weight off developers and their organizations.
What these services ultimately offer is improved developer ergonomics and a huge reduction in the barrier to entry for creating AI solutions. It is exciting that, in the future, many more developers and organizations will be able to create new and innovative AI solutions that leverage cloud computing.
Accelerated Deployment with Azure AI Services
Returning to the task of extracting text from images, Azure AI services simplify the process significantly, thanks to the suite of cognitive services that Microsoft provides. There are two ways to implement this, depending on our requirements.
To extract text from a predetermined set of documents, we can use Azure Cognitive Search with the OCR cognitive skill (see Extract text and information from images in AI enrichment). Alternatively, we can simply use the AI Vision Read API:
- Start by logging on to the Azure portal and provisioning an AI Vision resource: Create Computer Vision – Microsoft Azure. (You will require an active Azure subscription and an existing resource group for this.)
- For this demo, you can select a Pay-As-You-Go subscription and use the Free pricing tier. Choose a region close to your physical location to minimize latency.
- Ensure that all networks, including the internet, are allowed to access the resource.
- Select Review + create to proceed, review your information, and create the Computer Vision resource. This resource is what allows us to call the Read API, to which you can supply the URL of the image to be analyzed.
- You will be redirected to your resource dashboard. Click Go to resource to see your new Computer Vision service details.
- Copy one of the two provided keys; you will need it to call your endpoint. Also note down your endpoint from this page.
- Open your web API testing tool or terminal of choice and prepare your request:
curl -v -X POST 'https://westcentralus.api.cognitive.microsoft.com/vision/v3.2/read/analyze' \
  --header 'Content-Type: application/json' \
  --header 'Ocp-Apim-Subscription-Key: <subscription key>' \
  --data-ascii '{"url":"https://learn.microsoft.com/azure/ai-services/computer-vision/media/quickstarts/presentation.png"}'
- Replace <subscription key> with your API key and enter the URL of your image. Read more about Azure subscription keys in Subscriptions in Azure API Management. For testing purposes, you can use the example image URL provided. Make sure you use the endpoint for the region associated with your Vision resource.
- Execute the request. The service responds with 202 Accepted and an Operation-Location header; the final segment of that URL is the operation ID, which you use to retrieve the results of the read.
- Now call the Get Read Result API with the operation ID, once again including your Ocp-Apim-Subscription-Key. This is a GET request with no body; repeat it until the status field in the response is succeeded (or failed):
curl -v -X GET "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2/read/analyzeResults/{operationId}" -H "Ocp-Apim-Subscription-Key: {key}"
- This will return the following information:
  - Page Attributes
    - Page Number: Indicates the page number analyzed.
    - Angle: Specifies the rotation angle of the text.
    - Dimensions: Provides the width and height of the page in pixels.
    - Unit: Uses “pixel” as the unit of measurement.
  - Lines: Details the lines of text detected:
    - Bounding Box: Lists coordinates of the bounding box surrounding each line.
    - Text: Contains the text detected.
    - Appearance: Provides details about the text appearance, including:
      - Style Name: Specifies the style of the text.
      - Confidence Level: Indicates the confidence in the style detection.
    - Words: Breaks down each line into individual words:
      - Bounding Box: Lists coordinates of the bounding box surrounding each word.
      - Text: Contains the recognized word.
      - Confidence Level: Indicates the confidence in the word recognition.
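The same REST flow can be scripted. The sketch below uses only the Python standard library to submit an image URL, poll the Operation-Location URL until the status is `succeeded` or `failed`, and collect the text of each detected line from the `analyzeResult.readResults` section of the response. The function names are hypothetical, the v3.2 paths are taken from the curl examples above, and error handling is omitted; the official SDKs are the more robust option for production code.

```python
import json
import time
import urllib.request
from typing import Dict, List

READ_PATH = "/vision/v3.2/read/analyze"  # Read API v3.2, as in the curl example


def build_read_request(endpoint: str, key: str, image_url: str) -> urllib.request.Request:
    """Build the POST request that submits an image URL to the Read API."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint.rstrip("/") + READ_PATH,
        data=body,
        headers={"Content-Type": "application/json", "Ocp-Apim-Subscription-Key": key},
        method="POST",
    )


def operation_id_from(operation_location: str) -> str:
    """The Operation-Location header is a URL whose last path segment is the operation ID."""
    return operation_location.rstrip("/").rsplit("/", 1)[-1]


def read_image_text(endpoint: str, key: str, image_url: str, poll_seconds: float = 1.0) -> Dict:
    """Submit an image, then poll the result URL until the operation finishes (needs network)."""
    with urllib.request.urlopen(build_read_request(endpoint, key, image_url)) as response:
        result_url = response.headers["Operation-Location"]
    while True:
        poll = urllib.request.Request(result_url, headers={"Ocp-Apim-Subscription-Key": key})
        with urllib.request.urlopen(poll) as response:
            result = json.load(response)
        if result.get("status") in ("succeeded", "failed"):
            return result
        time.sleep(poll_seconds)  # status is still "notStarted" or "running"


def extract_lines(result: Dict) -> List[str]:
    """Collect the text of every detected line, page by page."""
    lines: List[str] = []
    for page in result.get("analyzeResult", {}).get("readResults", []):
        for line in page.get("lines", []):
            lines.append(line["text"])
    return lines
```

For example, `"\n".join(extract_lines(read_image_text(endpoint, key, image_url)))` yields the text to save to a file, where `endpoint` and `key` are the values noted down from the resource page.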
[You can also use the available SDKs to interact with the API programmatically. See: Quickstart: Optical character recognition (OCR) – Azure AI services on Microsoft Learn.]
This approach offers several advantages over building the solution yourself:
- Simplification of Development Process
Traditionally, building an ML solution required a deep understanding of the theory behind machine learning. Developers needed to manually configure the structure of neural networks, including the number of nodes and layers, and handle the intricate details of data preprocessing and feature extraction.
Cloud-based services, however, provide pre-built and pre-trained models that can be used with minimal setup. For example, Azure’s Computer Vision service simplifies the process by letting users upload and label images and use them to train image recognition models. This removes the need for extensive knowledge of neural network design and lowers the barrier to entry for developers.
- Reduction of Hardware Requirements
ML development traditionally demands significant computational resources, which can be cost-prohibitive. Training complex models requires powerful processors and sometimes specialized hardware such as GPUs. Cloud platforms mitigate this by providing on-demand access to scalable computing resources, shifting the computational burden from local machines to the cloud. This not only reduces investment in hardware but also allows capacity to scale as computational needs grow.
- Accelerated Deployment and Integration
Deploying and integrating ML solutions can be complex, involving setting up servers, managing dependencies, and ensuring the application scales efficiently. Cloud services streamline this process through managed deployments. Platforms like Azure allow AI services to be containerized with Docker and deployed on-premises, or run directly in the cloud. This flexibility simplifies deployment and provides robust integration options through well-documented APIs and SDKs that support various programming languages.
- Ease of Maintenance and Updates
Cloud platforms continuously update and improve their AI services. As a result, developers can benefit from the latest advancements without manually updating their systems or models. This continuous improvement cycle helps keep AI solutions current without requiring significant ongoing investment from developers.
- Comprehensive Service Offerings
Cloud platforms typically offer a broad range of AI services that address common and advanced use cases, from text and speech processing to translation and anomaly detection. This variety allows developers to easily find and integrate AI capabilities that meet their specific needs, often just by configuring existing services rather than building from scratch.
Ultimately, cloud AI services democratize access to powerful AI capabilities by abstracting complex processes and providing accessible, scalable, low-maintenance computing resources. This ushers in a paradigm shift: development teams can now look to cloud AI service providers to implement AI reliably and consistently in their applications without compromising on budget or developer ergonomics. Make sure to consider these approaches in your own applications.