
Neural Architecture Search (NAS) for Computer Vision Models

Dr. Jagreet Kaur Gill | 08 October 2024


Overview 

Deep learning, and neural networks in particular, has advanced rapidly in recent years, especially in computer vision. However, selecting the right neural network architecture for a given task can be tedious and time-consuming. This is where Neural Architecture Search (NAS) comes in: by automating the design of a network's structure, NAS minimizes the human effort needed to build top-performing computer vision models.

In this blog, we will learn how NAS works, why it is revolutionary for computer vision, and how it is utilized to design better models. 

What is Neural Architecture Search (NAS)? 


Fig 1.0: Neural Architecture Search (NAS)

 

Neural Architecture Search (NAS) is an automated procedure for finding a suitable neural network architecture. Given a defined search space, NAS generates candidate architectures and seeks out the most effective model for the task at hand. The goal is to find a near-optimal design that delivers high accuracy at low computational cost.

NAS has three main components: 


Search Space

Defines the possible neural architectures the algorithm explores, including aspects like the number of layers, layer types, filter sizes, and activation functions


Search Strategy

Specifies the method NAS uses to navigate the search space, typically involving reinforcement learning, evolutionary algorithms, or gradient-based techniques


Evaluation Strategy

Evaluates the selected architecture’s performance using indicators like error rate, often aiming for efficient estimation methods without fully training the model
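The three components above can be sketched as a toy search loop. This is a minimal illustration, not a production NAS system: the search space, the random-search strategy, and especially the `evaluate()` proxy score are all made up for the example (a real evaluation strategy would train or partially train each candidate model).

```python
import random

# Toy NAS loop illustrating the three components:
# search space, search strategy, and evaluation strategy.

# 1. Search space: the architectural choices the search may combine.
SEARCH_SPACE = {
    "num_layers": [2, 4, 6, 8],
    "filter_size": [3, 5, 7],
    "activation": ["relu", "gelu", "swish"],
}

def sample_architecture(rng):
    """Search strategy (here: plain random search) draws one candidate."""
    return {key: rng.choice(options) for key, options in SEARCH_SPACE.items()}

def evaluate(arch):
    """Evaluation strategy: a cheap, hypothetical stand-in score.

    In real NAS this would involve (partially) training the model.
    """
    return arch["num_layers"] / max(SEARCH_SPACE["num_layers"]) \
        - 0.05 * arch["filter_size"]

def random_search(num_trials=20, seed=0):
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture(rng)
        score = evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

best, score = random_search()
```

Even this naive random search captures the structure of NAS; the methods discussed below differ mainly in replacing `sample_architecture` with a smarter search strategy and `evaluate` with a cheaper performance estimator.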

NAS has been particularly impactful in computer vision, where neural networks are applied to tasks like object detection, image classification, and semantic segmentation.

Why is NAS Crucial for Computer Vision? 

Designing high-performance neural networks for computer vision is far from trivial, given the complexity of the tasks and the enormous range of architectural options. Conventional, manual architecture design demands substantial expertise, time, and trial and error.

Some reasons why NAS is crucial for computer vision include: 

  • Automation of Architecture Design: NAS eliminates the time-consuming, labor-intensive process of searching for architectures through manual experimentation. This is especially helpful in computer vision, where the best-performing architectures are usually also the deepest.

  • Optimizing for Accuracy and Efficiency: NAS can search for architectures that jointly maximize accuracy and computational efficiency. This matters for real-world applications where models must run on resource-constrained edge devices or smartphones.

  • Transferability: NAS can discover architectures that generalize across different computer vision tasks. For instance, an architecture discovered for image classification can be transferred to object detection or segmentation.

NAS Strategies for Computer Vision Models 

There are several strategies that researchers use to perform NAS. Each strategy varies in how it navigates the search space and evaluates performance. Below are some of the most common NAS methods used for computer vision: 

1. Reinforcement Learning-based NAS 

Among the first techniques used to address NAS was Reinforcement Learning (RL). In this approach, an agent (the controller) generates candidate architectures for a deep neural network, and each architecture's quality is measured by how well it performs on a task such as image classification. By maximizing a reward, most often model accuracy, the agent learns to generate progressively better architectures.

  • Example: In the NASNet framework developed by Google, RL was used to search for a neural network architecture that achieved state-of-the-art results on the ImageNet dataset. NASNet outperformed many hand-designed models like ResNet and Inception-v4. 
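The controller-and-reward loop described above can be sketched with a bare-bones REINFORCE-style policy gradient over a single architectural choice. This is a didactic toy, not NASNet's actual controller (which is an RNN emitting many decisions), and the `proxy_accuracy()` rewards are invented for the example.

```python
import math
import random

# Minimal REINFORCE-style sketch of RL-based NAS: a controller holds a
# logit per candidate operation, samples one, observes a reward, and
# nudges its logits toward high-reward choices.

CHOICES = ["conv3x3", "conv5x5", "maxpool", "identity"]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def proxy_accuracy(op):
    # Hypothetical reward table standing in for "train and measure accuracy".
    return {"conv3x3": 0.9, "conv5x5": 0.8, "maxpool": 0.6, "identity": 0.5}[op]

def train_controller(steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(CHOICES)  # the controller's parameters
    for _ in range(steps):
        probs = softmax(logits)
        idx = rng.choices(range(len(CHOICES)), weights=probs)[0]
        reward = proxy_accuracy(CHOICES[idx])
        # REINFORCE update: grad of log-prob of the sampled action, scaled
        # by the reward, pushes probability mass toward good choices.
        for i in range(len(logits)):
            grad = (1.0 if i == idx else 0.0) - probs[i]
            logits[i] += lr * reward * grad
    return logits

logits = train_controller()
best_op = CHOICES[max(range(len(CHOICES)), key=lambda i: logits[i])]
```

A real controller would sample an entire sequence of decisions (operations, filter sizes, connections) per architecture and use the trained model's validation accuracy as the reward, but the update rule has the same shape.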

2. Evolutionary Algorithms 

Evolutionary algorithms draw inspiration from biological evolution and genetics. This approach begins with a pool of randomly initialized neural architectures. These architectures are then "evolved" over successive generations through mechanisms such as mutation, in which some aspect of an architecture is changed, and crossover, in which parts of two architectures are combined. Only the best-performing architectures are retained, while those that perform poorly are eliminated.

  • Example: Google's AmoebaNet uses evolutionary algorithms to find optimal architectures. It has produced highly efficient architectures for image classification, outperforming RL-based methods in some cases. 
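The evolve-mutate-select cycle can be sketched as follows. This is a toy under invented assumptions: architectures are flat lists of operation names, and `fitness()` is a hypothetical proxy rather than trained-model accuracy as in AmoebaNet.

```python
import random

# Toy evolutionary NAS: a population of architectures (lists of layer ops)
# is evolved via selection, crossover, and mutation.

LAYER_OPS = ["conv3x3", "conv5x5", "maxpool", "identity"]

def fitness(arch):
    # Hypothetical proxy: pretend conv3x3 layers are what the task needs.
    return sum(1.0 if op == "conv3x3" else 0.2 for op in arch)

def mutate(arch, rng):
    """Mutation: change one randomly chosen layer's operation."""
    child = list(arch)
    child[rng.randrange(len(child))] = rng.choice(LAYER_OPS)
    return child

def crossover(a, b, rng):
    """Crossover: take a prefix from one parent and a suffix from the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(pop_size=20, arch_len=6, generations=30, seed=0):
    rng = random.Random(seed)
    population = [[rng.choice(LAYER_OPS) for _ in range(arch_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]  # keep the fittest half
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        population = survivors + children
    return max(population, key=fitness)

best = evolve()
```

Because the fittest individuals survive unchanged each generation, the best fitness never decreases; the mutation and crossover operators supply the exploration.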

3. Gradient-Based NAS 

While reinforcement learning and evolutionary approaches are powerful, they are computationally demanding, mainly because they require building and training a large number of candidate models. Gradient-based NAS, exemplified by differentiable architecture search, addresses this by relaxing the discrete search space into a continuous one, so that architecture choices can be optimized directly with gradient descent.

  • Example: The DARTS (Differentiable Architecture Search) framework is a gradient-based method that significantly reduces the time and resources required for architecture search. DARTS uses a continuous relaxation of the architecture search space, allowing architectures to be optimized using gradient descent. 

This method is particularly useful for computer vision tasks, where architectures tend to be deep and complex, and training each candidate model fully can be prohibitively expensive. 
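The core idea of DARTS's continuous relaxation can be shown with a scalar toy: instead of picking one operation per edge, a softmax over architecture parameters (alpha) mixes all candidate operations, and alpha is trained by gradient descent; the final architecture keeps the highest-weighted op. The candidate ops and target below are invented for illustration and are far simpler than real DARTS cells.

```python
import math

# Scalar sketch of a DARTS-style mixed operation: the output is a
# softmax-weighted sum of all candidate ops, so the architecture
# parameters (alpha) are differentiable.

OPS = [lambda x: 0.0,      # "zero" op (drop the connection)
       lambda x: x,        # identity / skip connection
       lambda x: 2.0 * x]  # stand-in for a learned transformation

def softmax(a):
    m = max(a)
    e = [math.exp(v - m) for v in a]
    s = sum(e)
    return [v / s for v in e]

def mixed_op(x, alpha):
    """Softmax-weighted combination of every candidate operation."""
    w = softmax(alpha)
    return sum(wi * op(x) for wi, op in zip(w, OPS))

def search(x=1.0, target=2.0, steps=200, lr=0.5):
    alpha = [0.0, 0.0, 0.0]
    for _ in range(steps):
        w = softmax(alpha)
        outs = [op(x) for op in OPS]
        y = sum(wi * o for wi, o in zip(w, outs))
        err = y - target
        # For loss = 0.5 * err**2, d(loss)/d(alpha_i) = err * w_i * (o_i - y)
        # via the softmax Jacobian.
        for i in range(len(alpha)):
            alpha[i] -= lr * err * w[i] * (outs[i] - y)
    return alpha

alpha = search()
best = max(range(len(alpha)), key=lambda i: alpha[i])  # discretization step
```

After training, the weight concentrates on the op whose output best matches the target, mirroring how DARTS discretizes its learned mixture back into a single architecture.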

 

Applications of NAS in Computer Vision 

NAS has been used to develop highly efficient and accurate architectures for various computer vision tasks. Below are some key applications: 

 

1. Image Classification 

Image classification is one of the most common tasks in computer vision, and NAS has been particularly successful in automating the design of classification models. Architectures like NASNet and EfficientNet, both developed using NAS, have set new benchmarks in terms of accuracy and efficiency. 

  • NASNet: NASNet was one of the first NAS-generated models to show superior performance on large-scale datasets, including ImageNet. It automated the discovery of convolutional cell structures that can be stacked and repeated to build deep networks.

  • EfficientNet: EfficientNet, another NAS-derived family of models, targets a balance between accuracy and speed. Using a principled compound-scaling approach, it scales the model's depth, width, and input resolution together, achieving excellent performance with fewer parameters and computations.
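EfficientNet's compound scaling can be written down concretely: depth, width, and resolution are each multiplied by a coefficient raised to a shared exponent phi. The alpha/beta/gamma values below are those reported in the EfficientNet paper; the baseline dimensions in the example are illustrative placeholders, not EfficientNet-B0's actual configuration.

```python
# Compound scaling: one coefficient phi grows depth, width, and input
# resolution together, instead of scaling any one dimension alone.

ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # per the EfficientNet paper's grid search

def compound_scale(base_depth, base_width, base_resolution, phi):
    """Scale a baseline network by compound coefficient phi."""
    depth = base_depth * (ALPHA ** phi)            # more layers
    width = base_width * (BETA ** phi)             # more channels per layer
    resolution = base_resolution * (GAMMA ** phi)  # larger input images
    return round(depth), round(width), round(resolution)

# Example: scaling a hypothetical baseline (18 layers, 32 channels, 224 px).
print(compound_scale(18, 32, 224, phi=3))
```

The constraint alpha * beta^2 * gamma^2 ≈ 2 ensures that each unit increase in phi roughly doubles the network's FLOPs, which is what makes the scaling predictable.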

2. Object Detection 

Object detection involves not only classifying objects in images but also localizing them. NAS is now being applied to design architectures that can perform object detection more efficiently. 

  • NAS-FPN: NAS-FPN (Feature Pyramid Networks) used NAS to discover architectures for feature extraction in object detection. It achieved state-of-the-art performance on COCO, a popular benchmark dataset for object detection, while being more computationally efficient than manually designed models.

3. Semantic Segmentation 

Semantic segmentation is the task of classifying each pixel in an image into a category. This is a challenging problem due to the need for fine-grained spatial understanding, and NAS has been used to design architectures that excel in this domain. 

  • Auto-DeepLab: Auto-DeepLab, developed using NAS, automatically discovers network architectures for semantic image segmentation. It was able to outperform many hand-designed segmentation models on benchmarks like Cityscapes, a dataset used for urban scene understanding. 

4. Edge Computing and Mobile Vision 

One of the most exciting applications of NAS is in designing lightweight models that can run efficiently on mobile devices and edge computing platforms. NAS can optimize architectures to meet the constraints of mobile hardware without sacrificing performance. 

  • MnasNet: MnasNet was specifically designed using NAS for mobile vision tasks. It balances accuracy and latency, making it ideal for real-time applications on smartphones or embedded systems. 

Challenges and Future Directions of NAS 

While NAS has proven to be a powerful tool for designing computer vision models, there are still challenges and areas for improvement: 

Computational Costs 

NAS is computationally expensive, especially when it relies on reinforcement learning or evolutionary algorithms. Training and evaluating many candidate architectures is resource-intensive, which limits the accessibility of NAS for organizations with modest computational budgets.

 

However, recent developments such as gradient-based NAS (e.g., DARTS) and the replacement of expensive full training with efficient proxy tasks for performance estimation are making NAS considerably more affordable and practical.

Search Space Design 

NAS relies heavily on the design of the search space for its effectiveness. If the search space is too small, NAS may fail to find sufficiently diverse architectures; if it is too large, the search may take far too long to find efficient ones. Designing the right search space therefore remains an open research problem.

Generalization to Different Tasks 

Although NAS has proven effective on specific tasks such as image classification and object detection, ensuring that discovered architectures generalize well to related tasks, such as video understanding or 3D vision, remains a challenge. Future work on NAS may focus on better ways to transfer NAS-generated architectures across a wider range of computer vision problems.

Conclusion

Neural Architecture Search (NAS) is transforming the way neural networks are designed, particularly in the field of computer vision. By automating the discovery of optimal architectures, NAS is helping researchers and engineers create models that are not only more accurate but also more efficient, especially for large-scale tasks like image classification, object detection, and semantic segmentation. 

 

While there are still challenges to overcome, including high computational costs and search space design, the future of NAS looks promising. As the technology matures, NAS will likely play a pivotal role in developing the next generation of computer vision models that can operate efficiently on both large-scale cloud systems and resource-constrained edge devices. 

 

NAS represents a major leap forward in neural network design, enabling more scalable, adaptable, and powerful solutions for complex computer vision tasks.