Neural architecture search (NAS) is an emerging area. Research activity is intense, and many different approaches to the task are being explored. There is no single best method in general, nor even a single best method for a specialized problem such as object identification in images. Neural architecture search is one aspect of AutoML, alongside feature engineering, transfer learning, and hyperparameter optimization. It is arguably among the hardest machine learning problems currently under active research; even evaluating neural architecture search methods is difficult. NAS research can also be expensive and time-consuming: the combined search and training time is often reported in GPU-days, sometimes thousands of GPU-days.
Recurrent neural networks (RNNs) take in sequential inputs and predict the next element in the sequence based on the data they were trained on. A vanilla recurrent network combines the current input with its previous hidden state to produce the next hidden state and a prediction for the next element of the sequence. This prediction is compared with the ground-truth values, and the weights are updated using backpropagation. We also know that RNNs are susceptible to vanishing and exploding gradients. In the context of neural architecture search, recurrent networks in one form or another come in handy because they can serve as controllers that produce sequential outputs. These sequential outputs are decoded into neural network architectures that we train and evaluate iteratively, moving toward better architectures with each step.
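For concreteness, here is a minimal sketch, assuming a PyTorch-style setup, of how an RNN can act as a controller: it autoregressively samples a sequence of tokens, and each token is decoded into a layer's hyperparameters. The class names and search-space values are illustrative assumptions, not taken from any particular paper.

```python
# Minimal sketch of an RNN controller for NAS (illustrative assumptions only).
import torch
import torch.nn as nn

# Hypothetical per-layer choices: (number of filters, kernel size) pairs.
SEARCH_SPACE = [(16, 3), (32, 3), (32, 5), (64, 5)]

class Controller(nn.Module):
    def __init__(self, hidden_size=64, num_choices=len(SEARCH_SPACE)):
        super().__init__()
        self.embed = nn.Embedding(num_choices, hidden_size)
        self.rnn = nn.LSTMCell(hidden_size, hidden_size)
        self.head = nn.Linear(hidden_size, num_choices)
        self.hidden_size = hidden_size

    def sample(self, num_layers=4):
        """Autoregressively sample one token per layer."""
        h = torch.zeros(1, self.hidden_size)
        c = torch.zeros(1, self.hidden_size)
        token = torch.zeros(1, dtype=torch.long)  # start token
        tokens, log_probs = [], []
        for _ in range(num_layers):
            h, c = self.rnn(self.embed(token), (h, c))
            dist = torch.distributions.Categorical(logits=self.head(h))
            token = dist.sample()
            tokens.append(token.item())
            log_probs.append(dist.log_prob(token))
        return tokens, torch.stack(log_probs).sum()

# Decode the sampled token sequence into a concrete architecture description.
controller = Controller()
tokens, log_prob = controller.sample()
architecture = [SEARCH_SPACE[t] for t in tokens]
print(architecture)  # e.g. [(32, 3), (64, 5), ...]
```

The summed log-probability is kept around because it is what a policy-gradient update would use once the decoded architecture has been trained and scored.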
NAS algorithms define a specific search space and hunt through it for better architectures. In the original NAS paper by Zoph and Le (2017), the search space for convolutional network design consists of per-layer choices such as filter height, filter width, stride, and number of filters, and the algorithm stops once the number of layers exceeds a maximum value. In later experiments they also added skip connections, batch normalization, and ReLU activations to the search space. Similarly, they create RNN architectures by searching over different recurrent cell designs. The biggest drawback of this approach was the time it took to navigate the search space before arriving at a final solution: they used 800 GPUs for 28 days to find the best architecture. There was clearly a need for controllers that could navigate the search space more intelligently.
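As a toy illustration of such a layer-wise search space (the value ranges below are assumptions, not the paper's exact settings), each candidate architecture can be described as a list of per-layer choices, and a baseline search can simply sample them at random:

```python
# Toy layer-wise convolutional search space with random sampling as a baseline.
import random

LAYER_SEARCH_SPACE = {
    "filter_height": [1, 3, 5, 7],
    "filter_width": [1, 3, 5, 7],
    "stride": [1, 2],
    "num_filters": [24, 36, 48, 64],
}
MAX_LAYERS = 8  # the search stops once this depth is reached

def sample_architecture(num_layers=MAX_LAYERS):
    """Draw one candidate architecture uniformly at random."""
    return [
        {name: random.choice(options) for name, options in LAYER_SEARCH_SPACE.items()}
        for _ in range(num_layers)
    ]

print(sample_architecture(3))
```

Even this tiny space contains tens of thousands of candidates per layer combination, which is why navigating it naively becomes so expensive.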
Most of the work in neural architecture search has focused on this part of the problem: finding out which optimization methods work best, and how they can be tweaked so that the search produces better results faster and more reliably. Several approaches have been tried, including Bayesian optimization, reinforcement learning, neuroevolution, network morphing, and game theory. We will look at these approaches one by one.
Reinforcement learning has been used successfully to drive the search for better architectures. The ability to navigate the search space efficiently, in order to save precious computational and memory resources, is typically the major bottleneck in a NAS algorithm. Models built with the sole objective of high validation accuracy often end up being highly complex, meaning a greater number of parameters, more memory required, and higher inference times.
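The sketch below shows, under simplifying assumptions, how a REINFORCE-style policy gradient can drive the search while penalizing complexity. `train_and_evaluate` is a hypothetical placeholder standing in for actually building and training each sampled candidate, and the penalty weight is an assumption.

```python
# REINFORCE-style sketch of RL-driven search with a complexity penalty.
import torch

# Hypothetical search space: number of filters for a single conv layer.
CHOICES = [16, 32, 64, 128]
logits = torch.zeros(len(CHOICES), requires_grad=True)  # policy parameters
optimizer = torch.optim.Adam([logits], lr=0.05)

def train_and_evaluate(num_filters):
    # Placeholder: a real system would train the candidate network here and
    # return (validation_accuracy, parameter_count).
    return 0.7 + 0.001 * num_filters, num_filters * 9

for step in range(100):
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    accuracy, num_params = train_and_evaluate(CHOICES[action.item()])
    # Reward high accuracy but penalize complexity (weight is an assumption).
    reward = accuracy - 1e-4 * num_params
    loss = -dist.log_prob(action) * reward  # REINFORCE gradient estimator
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Shaping the reward with a parameter or latency penalty is one common way to steer the controller away from needlessly complex models.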
Floreano et al. (2008) argue that gradient-based methods outperform evolutionary methods for optimizing neural network weights, and that evolutionary approaches should be used only to optimize the architecture itself. Beyond choosing the right evolutionary parameters, such as mutation rate and death rate, we also need to decide how the topology of a neural network is represented in the genotypes used for digital evolution.
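As a minimal sketch of these choices, the toy direct encoding below (an illustrative assumption, not NEAT or any published scheme) represents a genotype as a list of layer widths and applies mutation and selection using the kinds of parameters mentioned above; the values and fitness function are placeholders.

```python
# Toy direct encoding: genotype = list of layer widths, evolved by mutation.
import random

MUTATION_RATE = 0.3   # probability of mutating each gene
DEATH_RATE = 0.5      # fraction of the population removed each generation
WIDTH_CHOICES = [16, 32, 64, 128]

def random_genotype(num_layers=4):
    return [random.choice(WIDTH_CHOICES) for _ in range(num_layers)]

def mutate(genotype):
    """Resample each gene with probability MUTATION_RATE."""
    return [
        random.choice(WIDTH_CHOICES) if random.random() < MUTATION_RATE else gene
        for gene in genotype
    ]

def evolve(population, fitness):
    """Keep the fittest survivors and refill the population with mutants."""
    survivors = sorted(population, key=fitness, reverse=True)
    survivors = survivors[: int(len(population) * (1 - DEATH_RATE))]
    children = [mutate(random.choice(survivors))
                for _ in range(len(population) - len(survivors))]
    return survivors + children

# Dummy fitness: prefer wider networks (a stand-in for validation accuracy).
population = [random_genotype() for _ in range(10)]
for _ in range(5):
    population = evolve(population, fitness=sum)
```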
On the other hand, Compositional Pattern Producing Networks (CPPNs) provide a powerful indirect encoding that can be evolved with NEAT for better results. More background on CPPNs, along with an implementation and visualizations, can be found in an article by David Ha. Another variant of NEAT, known as HyperNEAT, also uses CPPNs as its encoding and evolves them with the NEAT algorithm. Irwin-Harris et al. (2019) propose an encoding method that uses directed acyclic graphs to represent different neural network architectures for evolution.
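The sketch below is an assumption loosely inspired by such graph-based encodings, not the exact scheme from any of these papers: an architecture is represented as a directed acyclic graph whose nodes carry operations and whose edges carry data flow, and mutation changes a node's operation.

```python
# Toy DAG-based architecture encoding (illustrative, not a published scheme).
import random

OPERATIONS = ["conv3x3", "conv5x5", "maxpool", "identity"]

def random_dag(num_nodes=5):
    """Each node picks an operation and connects only to earlier nodes,
    which keeps the graph acyclic by construction."""
    nodes = [random.choice(OPERATIONS) for _ in range(num_nodes)]
    edges = [
        (src, dst)
        for dst in range(1, num_nodes)
        for src in range(dst)
        if random.random() < 0.5
    ]
    return nodes, edges

def mutate(dag):
    """Flip one node's operation, leaving the edge structure intact."""
    nodes, edges = dag
    nodes = list(nodes)
    nodes[random.randrange(len(nodes))] = random.choice(OPERATIONS)
    return nodes, edges

genotype = random_dag()
child = mutate(genotype)
print(child)
```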