This blog is an introduction to Genetic Algorithms. We will be discussing the terminologies involved and how the algorithm has been put together to solve optimization problems.
As the algorithm's name suggests, there is a crucial link to biology. In fact it is more or less based on Charles Darwin's theory of evolution. This algorithm realizes the saying 'survival of the fittest' although Darwin argued against it.
So let's take you back to your biology class for a while to make you familiar with the concept of natural selection in the theory of evolution.
In natural selection, individuals in a species show variation in physical characteristics. This variation is because of differences in their genes. The individuals reproduce and pass on their characteristic to their offspring through their genes. The offspring of more fit parents would be obviously fitter and the one with less fit parents would be
expectedly less fit. A few mutations can happen to bring out some changes in the population in the course of evolution. Fit individuals contribute in making the population move forward. As this cycle continues the population keeps evolving into a better version of its previous self.
Now that you have had a refresher, lets start with Genetic Algorithms.
Genetic algorithms
It is a search-based heuristic algorithm used for solving optimization problems in machine learning. This algorithm works by simulating the process of natural selection. There are five phases in a genetic algorithm namely:
Population initialization
Fitness function
Selection
Crossover
Mutation
This algorithm helps in solving difficult problems that would take a long time to solve. It has been used in various real-life applications such as data centers, electronic circuit design, code-breaking, image processing and artificial creativity.
Let's discuss the various phases involved in the genetic algorithm.
Population Initialization
This algorithm starts with generation of a population which is often random but consists of all the possible solution to the problem. The individuals in the population are called as 'Chromosomes' and these chromosomes are further divided in to 'Genes'. Each gene corresponds to a specific characteristic.
Fitness Function
This is the function that determines the fitness of an individual. It assigns a fitness score (as per requirement) to every individual, which further determines the probability of being chosen for reproduction. The higher the fitness score, the higher the chances of being chosen for reproduction.
Selection
This is the phase which selects a number of individual based on their fitness score to reproduce and give rise to a new population.
The goal here is to maintain a population with high chances of generating the best solution to the problem (better than the previous generation). The genetic algorithm uses the fitness proportionate selection technique to ensure that useful solutions are used for recombination.
Crossover
This is the most crucial phase as this phase is responsible for giving rise to the new generation of population. In this phase, the selected individuals from the previous phase are paired to mate and give rise to an offspring. The offspring has characteristics of both the parents.
It should be noted that there are a number of ways to obtain the offspring. Some of them are listed below.
Single point crossover: It is one of the simplest way to perform crossover. In this case, a crossover point on the parent chromosomes is chosen randomly and all the data beyond that point in the parent chromosomes is swapped as shown below. Strings are characterized by Positional Bias.
Two point crossover: In this case instead of one, two crossover points are selected randomly and the genes are swapped as shown below.
Uniform Crossover: In this case, each gene (bit) is selected randomly from one of the corresponding genes of the parent chromosomes as shown below.
Mutation
This phase brings new genetic information to the offspring in the new generation with a very low probability. It can be achieved by flipping or shuffling of some bits in the child chromosome.
Mutation helps to solve the problem of getting stuck in the local minimum/maximum and enhances diversification in the succeeding populations.
In the image shown above a bit has been flipped randomly in the child chromosome altering its gene received from its parents, hence bringing mutation.
It should be noted that mutation is used to maintain and introduce diversity in the genetic population and its occurrence is usually kept very low in order to let the algorithm explore more possible solutions. If the mutation rate is high then the genetic algorithm gets reduced to a random search.
These phases keep on repeating in the same chronology and keeps on producing new population until a stopping criterion is reached. It will identify the solution provided by the current/last generation as the best solution.
Why is the current/last generation considered the best solution?
The initial population is randomly created and each individual is very different from one another. Some are fit while some others may be very unfit. When the next generation is produced, the newly created offspring have the genes of both the parents, both of whom were pretty fit to begin with due to 'selection'.
Therefore, the offspring inherited better genes. Thus, inherently the new population is better than the previous one leading to a better solution.
healthy parents = healthy kids
But it can be argued that its not always true. Sometimes, kids of fit parents can be unhealthy although the probability is very low. This occurs due to mutation. So, in the algorithm we introduce mutation with a very low probability by randomly changing the value of the genes. This brings diversity in the population. And in our algorithm it helps in not getting stuck in a local minima/maxima.
Still, the new population which is inherently better than the previous one and this population's less fit individuals are still better than the individuals of the previous
generation. Although there may be some exceptions but their number will be very low.
Each new generation promotes fitness which means each generation makes a better solution based on the fitness score. And hence, after a long cycle of generations we reach the best solution possible, incrementally.
In this algorithm, the only way forward is to be better than the previous self. It guarantees us that the fitness of the new generation wont be less than the previous one or at the least it will be comparable. With each successive generation the algorithm produces a better solution and therefore for best possible solution the algorithm needs to run for hundreds or thousands of generations.
Lastly, it should be noted that the no. of generations is not the only factor that directs the algorithm to a better solution. The algorithm depends on other factors as well such as:
size of initial population
how well the fitness function evaluates an individual
how often mutation is introduced in the population
the no. of elite individuals selected for mating
the type of crossover; etc.
If the above factors are assigned suitably, then the algorithm promotes the best solution possible.
If you need implementation for any of the topics mentioned above or assignment help on any of its variants, feel free to contact us.
Commentaires