The Theory of Evolution remains surprisingly controversial in the United States. According to a recent Gallop Poll, 38% of Americans believe God created human beings as-is. Nearly 2 in 5 of those polled reject evolution. It could be argued that our DNA is one of the most defining aspects of who we are, and yet, many modern people remain unconvinced.
Rather than look at evolution through the lens of biology, let us consider this theory in the context of computer science. In this article we're going to take a closer look at the mathematics behind natural selection. More specifically, we are going to use genetic algorithms to find a solution to the n-queens problem, and in doing so, gain a better understanding of what the theory of evolution is all about.
When it comes to machine learning, any supervised or unsupervised model begins with the data. This data is what "teaches" the machine to make decisions on its own. Whether using regression, classification, or neural networks, all machine learning methods share this demand for data in order to function.
Genetic algorithms take a very different approach to artificial intelligence. Rather than deriving answers from data, these algorithms mimic the rules of evolution instead. The solutions found by these algorithms do not result from a rigorous mathematical formula, but rather, the laws of nature, and a whole lot of trial and error.
Many who are skeptical about evolution argue that it cannot explain how life originated. They are absolutely correct. The study of abiogenesis seeks to explain how life originated. By contrast, evolution is only concerned with what happens to life once it exists, regardless of where or how or why it originated.
As Charles Darwin once explained:
It is not the strongest of the species that survives, not the most intelligent that survives. It is the one that is the most adaptable to change.
Evolution is a process of change. It works like this: all individual animals have a collection of traits. These traits are expressed from their genes, which are just little chunks of DNA in specific combinations. All animals live in environments. All environments present unique challenges and opportunities. A combination of genetic traits, free will, and some blind luck, determines if any individual animal will survive any given environment.
According to the theory, those animals best matched to the current environment are most likely to survive. That is, most likely to make babies with each other. The "winning" genes from these "winning" animals are more likely to pass down. Add the occasional beneficial mutation, keep repeating the process over hundreds or thousands of generations, and eventually, you should find some beneficial evolution has taken place.
Case in point: a giraffe with a longer neck can reach more food. A well-fed giraffe is more likely to make baby giraffes.
In order to measure the survival of the fittest we must first define our fitness metric. In the simplified example above, the length of neck is the fitness metric of giraffes. Those with longer necks are more likely to reproduce.
When it comes to our genetic algorithm, we will be using the n-queens problem as our fitness metric. However, whether trying to eat leaves, or play chess, the fundamental rules of evolution are the same.
In the game of chess, the queen is the most powerful piece. Like the king, she is able to move in all 8 directions. Unlike the king, who is limited to one square of movement, the queen is free to move as many squares as she likes.
The n-queens problem was born in 1848 when Max Bezzel asked: is it possible to place 8 queens on a chessboard so that none of them are in attacking range of any other? In other words, is it possible to place 8 queens on an 8x8 board so that no queen shares a row, column or diagonal with any other? Although it may sound trivial, try it for yourself and you will quickly discover this is a more challenging problem than one might expect.
In answer to Mr. Bezzel's question, there are 12 distinct solutions to this problem. Here is one of them:
The reason we call this the n-queens problem is that it works for any n greater than 3. In other words, it is possible to place 1,000,000 queens on a chessboard with a million rows and columns, so that no two queens are attacking each other. As we increase n the computational cost increases geometrically, which explains why this problem remains so popular in first year computer science classes.
This also means we can go the other way. By reducing n to 4, and placing only 4 queens on a 4x4 board, we simplify this problem considerably. When reduced to 4-queens there are only 2 distinct solutions to this puzzle:
We are going to use the above 4-queens problem and translate this challenge into something that can be represented "genetically". If we look at standard chess notation, we find our origin at the bottom left corner, with rows 1, 2, 3, 4 extending up, and columns A, B, C, D extending right. We also know that no two queens can be placed in the same row, or else they will be in conflict with each other.
We can therefore reduce any viable board configuration into a string of 4 letters, where each letter represents a column, one per row. This can be considered the "DNA" of any board. The above two solutions could be represented as "BDAC" and "CADB" respectively.
This puzzle also provides a good fitness metric. Consider the potential solution "CDCD":
This configuration is among the worst, with each queen in conflict with several others (10 total, marked with red arrows). Compared to one of the perfect solutions above, we can see this configuration is "less fit" according to our target metric. We already have everything we need to figure out a solution, using nothing more than the rules of evolution to find it.
Let us imagine a species of fictional animal called "the eims". These are extremely simple creatures, with DNA consisting of only 2 chromosome pairs, which corresponds to configurations on a 4x4 chess board, as explained above. Not much is known about these eims, so we decide to take a pair back to the lab and study them more.
Here we have Adam (AABB) and Eve (CCDD) of the eims. The first thing we notice is that these eims like to reproduce, with each mating resulting in a new baby eim. Each baby eim takes one chromosome pair from mom, and one from dad, and recombines them into either order to generate its own DNA.
Any pair of eims can produce up to 8 genetically distinct offspring, and Adam and Eve are no exception. They quickly produce the following 8 children before dying in a state of contented bliss:
The next thing we discover is that the eims do not share our disgust of incest; the children of Adam and Eve do not hesitate to mate with each other. We also discover that the eims are pansexual; any two eim can mate and create viable offspring. They are not restricted by any distinguishable gender.
These 8 eims in generation two can pair off in 28 different ways, and each of these pairs can produce up to 8 genetically distinct children. Our population explodes to 224 eims in generation three. However, only 16 of these eims have unique DNA. The rest are genetic clones of other eims.
In the third generation we find clones of all the second generation eims (top two rows), plus clones of our original Adam and Eve (circled in red). Given a total of four chromosome pairs (AA, BB, CC, DD) these 16 eims represent all possible combinations that will ever exist. Even if we use the same "breed all combinations" method to expand our population to 960, no new DNA will emerge. And none of these 16 configurations solves the issue of the 4-queens problem.
Before we can "evolve" we must add two new features to our experiment.
First, we want our eims to act with more discretion. Instead of mating every possible combination, the probability of mating will be directly proportionate to the fitness of any individual. In other words, just as giraffes with longer necks are more likely to mate, so too are eims with fewer queen conflicts. Let us consider these two eims from generation three:
The left eim (AACC) has a total of 8 queen conflicts. The right eim (AADD) has only 6. Therefore, the right eim is "more fit", and thus, more likely to reproduce. The probability of reproduction for any individual eim is equal to the fitness of that eim divided by the total fitness of the entire generation.
The final ingredient needed for evolution is mutation.
After generation three, every baby eim has a 1 in 3 chance of mutating. Every time a baby eim mutates, we randomly select one of the 4 pieces of DNA and replace it with a randomly selected chromosome. There is a 1 in 16 chance that a mutation will leave the original DNA unchanged (if we select a given letter and replace it with the same one).
In order to speed up our experiment we have intentionally selected a high mutation rate. We have also limited future generations to 256 for the sake of faster computation. Changing either of these parameters could radically alter how our genetic algorithm performs. Even with the same parameters, given the large amount of randomness, the same algorithm will still produce different results each time it runs.
Starting in generation four, our population of eims begins to evolve. Due to mutations, new chromosomes begin to appear in our gene pool, increasing the genetic diversity beyond clones of the 16 original combinations. Consider these two mutant eims that emerged in generation four:
Above we find the new "CD" and "DA" chromosomes which, prior to mutation, did not exist. However, both of these mutant eims still contain 6 queen conflicts, which is the same as "AADD" from the prior generation.
Not all mutations are directly beneficial. Most are benign, just like these two. These mutations did not directly increase the fitness metric of either individual eim, however, they did introduce new chromosomes into our gene pool. And each new chromosome introduced increases the amount of potential combinations geometrically.
We continue iterating through generations, using the rules of evolution as defined above. In the 28th generation, after witnessing the births and deaths of 6,634 individual eims, the chosen one emerges.
As we can see above, by the 26th generation (left), only one original chromosome remains, the rest are mutants. Oddly enough, two of the grandparents are identical (2nd and 4th from top). Despite this, they resulted in the children who birthed the chosen one. As we can see (right) the final eim matches one of the two valid solutions for the 4-queens problem (CADB). When it comes to our genetic algorithm, we consider either solution just as good as the other.
The challenge with genetic algorithms, and evolution in general, is that they rely on chaos (randomness) to produce their results. Our job then becomes to better measure and understand this chaos.
Consider this: if you randomly placed 4 queens on a 4x4 board, again and again and again, sooner or later you would stumble on a correct solution. More specifically, we have 4 rows, each of which can place a queen on any of 4 columns, resulting in 256 possible combinations (44). We know there are 2 distinct solutions to the 4-queens problem. Therefore you have a 1 in 128 chance of guessing a correct solution at random.
Each generation of eims contains 256 members. Based on the laws of probability, we would expect 2 solutions per generation of purely random eims. Meanwhile, it took our genetic algorithm 28 generations to derive a single solution. It appears as though our genetic algorithm is inferior to pure random chance. Can this be true?
Using the evolution method, it is impossible to create a solution in the first few generations, because the needed chromosome pairs do not exist yet. By comparison, each and every generation of 256 purely random eims is expected to find 2 solutions. However, once our genetic algorithm achieves the necessary level of genetic diversity, it starts outperforming random chance significantly.
To better measure this phenomenon we ran 100 experiments. In each of these experiments we created 100 generations of 256 purely random eims, and compared those to 100 generations of 256 eims created via our evolution method. We then counted the total number of solutions found in each experiment, and graphed the results below:
The random method (red) produced roughly 2 solutions per generation, just as we expected. The evolution method (green) found roughly 3x as many solutions. As we can see by the wide range in the green column, the results of our genetic algorithm are very chaotic, with some experiments strongly outperforming others. Despite this inherent chaos, taken in aggregate, the "survival of the fittest" results in more "fit" eims overall.
We can also see how the overall fitness of each population changes over time. Below we have averaged the results of these 100 experiments across each generation.
The random group sticks near an average fitness of 1600 per generation, no matter how many generations pass. The evolution group starts at a major disadvantage, but by generation 20 the improvement is outstanding. For the remaining generations the evolution group continues to get more fit, finding more solutions.
Given the simplified 4-queens puzzle, there are 44 (256) possible board configurations, 2 solutions, and a 1 in 128 chance of getting it right with a lucky guess. If you expand to a proper 8x8 board, there are now 88 possible board configurations, 12 solutions, and a 1 in 1,398,101 chance of guessing right. If you expand to the 16-queens version of this problem, your odds of a lucky guess plummet to less than 1 in 9,987,652,000,000.
As with the n-queens puzzle, the problems we find in the real world tend to get more challenging as we go along. As the probability of achieving your goal state through dumb luck decreases geometrically, the advanages to a genetic algorithm become crystal clear. While the random eims keep blindly chugging along, hoping to win the lottery, the evolution eims are slowly but surely achieving better fitness, and ultimately, a solution.
Evolution does not exist to solve the n-queens problem. Evolution, like any artificially intelligent machine, lacks motive. In the above experiments we provided motive by setting our fitness metric to solve the n-queens problem. In the real world, nature provides motive, but this motive is not always clear. Case in point: why does a giraffe have a long neck?
Both Darwin and Lamarck suggested that a giraffe's long neck must have something to do with reaching more food, and this certainly makes sense. However, this is not the only possible explanation. The "necks-for-sex" hypothesis suggests that long necks were sexually selected because male giraffes use their necks to fight over the females. In other words, a giraffe's neck may have less to do with getting leaves, and more to do with getting laid.
As far as evolution is concerned, it doesn't matter why female giraffes prefer men with long necks. Be it for food, or fighting off competing males, or wooing lady giraffes, long-necked giraffes make more babies, and the laws of probability do the rest.
In the real world, environments are constantly changing, and as environments change, so too do the challenges and opportunities for survival. Just because a form of life becomes dominant is no reason to expect it will stay dominant.
Evolution can produce longer necks, just as it can produce solutions to the n-queens problem, but this provides just a glimpse of evolution's true power. As Darwin explained, the true power of evolution lies in its ability to adapt to change.
Our eims have been very busy since last we checked in on them. As you recall, the eims have already split into two separate tribes. The "random" eims are created by randomly selecting one of the 256 possible combinations of DNA. The "evolution" eims breed based on probability according to their relative fitness, and each baby eim has a 1 in 3 chance of mutation.
Until now the "evolution" eims have acted like a rather simple animal, something like a bacteria. What if they were slightly more complicated? What if something like religion was introduced into our fictional eim world? A great debate rages among the eims. Is there a better way to organize society? Some of the eims think so, and split off into a new tribe based on "couples". Unlike the other eims, the "couples" eims strongly believe in the sanctity of marriage.
Starting from Adam and Eve, they form the same 16 genetically unique children, then make 16 clones of each, to produce the first generation of 256. From here they pair off, get married, and each of the "couples" produces two children with the same 1 in 3 chance of mutation. Eims marry other eims with the same fitness levels. In other words, the sexiest eims pair off first, then the next sexiest, and so on, until everyone has gotten married and had two children.
As before we ran 100 experiments of 100 generations to measure the results:
Sadly, the first attempt at the "couples" strategy is a flop. It is arguably the most fair system, since everyone gets married and has children. However, the average fitness never reaches that of the "random" group.
The elders of the "couples" are not deterred, they maintain faith in the sanctity of marriage. However, they concede to a few tweaks. From now on, only the top 50% of eims will get married, and they will have 4 children each. Although less fair, this is closer to the reality of our world, where not everyone reproduces.
In terms of average fitness per generation, this revised "couples" strategy is a success:
However, in terms of most solutions found, aside from a few outliers, "evolution" still does better overall:
Frustrated by this lack of progress, another new tribe of eims rises up, and rejects the dogma of the "couples". These are the "harem" eims, and they take survival of the fittest to the extreme. The "harem" eims are identical to "evolution" eims, except that "harem" eims have genders, and more importantly, strictly enforced (and extremely sexist) gender roles.
The top 5% of "harem" eims become male, with the remaining 95% becoming their concubines. Females with higher fitness scores still have a greater probability of being bred. This means that a female will move to a better "harem" as different males compete over her. This also means that the number of females in each "harem" is proportionate to the relative fitness of each male. Probability notwithstanding, the male is free to breed whichever females he fancies.
The results are visualized below:
In terms of fitness per generation, the "harem" strategy beats all others.
In terms of total solutions found, the "harem" strategy produces more than 5x as many solutions as the pure "evolution" method, and more than 18x as many solutions as the "random" method, after the same 100 generations.
Until now we have based our fitness score on the n-queens problem, specifically, the number of conflicts between queens in each eim. Eims with less conflicts are deemed "more fit", and thus, are more likely to reproduce. When we visualize average conflicts per generation, we see the opposite of the above graph:
Returning to the giraffe example, imagine there was some disaster that radically changed their environment. Imagine that, instead of being advantageous, long necks became detrimental for some reason. What would happen to them? According to the theory of evolution, we would expect shorter-necked giraffes to evolve, or for the species to die out due to an inability to adapt.
According to our fitness metric, eims with less queen conflicts are considered sexier, and are more likely to reproduce. We can easily imagine why a species such as the eim might benefit from reduced conflict, favoring a communal approach to limited resources.
However, we now have four sub-species of eims. Imagine they were all competing for the same space. In a situation such as that, embracing conflict may be the only way to survive. But if generation after generation of eim has evolved through natural selection to avoid conflict, what will happen to them when the key to their survival flips without warning? Can they still adapt, or will all the "conflict genes" have been bred out by then?
In the final simulation this is exactly what we tested. For the first 52 generations our fitness metric favors eims with the fewest conflicts. After that, our fitness metric favors the eims with the most conflicts. We can imagine generation 53 as the point where all eims come together, and begin fighting with each other to survive.
Here we witness the true power of evolution, not to converge on a specific solution, but to adapt to change. As expected, the "random" eims are not affected by this shift in environment, but all eims based on natural selection ("evolution", "couples", "harem") are able to change course with remarkable speed.
The "harem" eims still have the advantage when it comes to reducing conflicts, but the "couples" eims take the lead once the environment shifts. It appears the "couples" strategy is superior in a world where conflict is encouraged. Given that most modern societies are based on a strategy not too dissimilar from these "couples", it may be tempting to infer that the survival of modern humans favors conflict too.
Remember: changing any of our parameters could change the results of these experiments radically. It would be unwise to extrapolate too much from the world of our fictional eims, beyond the fundamental mechanics of evolution.
This article is not intended to provide the final say on the subject of evolution, nor should anything contained herein be misconstrued as advocacy of any form of eugenics (see also: Using Decision Trees to Identify White Nationalists). As always, we are happy to provide the code used in this project so that you are free to run experiments of your own. If you have the patience, you can easily extend these same simulations to 8-queens and beyond using this same code.
When it comes to the subject of artificial intelligence, the genetic algorithm is a powerful tool to have in your arsenal. It is especially useful for tasks where a simple brute-force approach would be too computationally expensive, assuming you can find a logical way to represent your real world problem "genetically". The different "breeding strategies" presented above may be useful in designing an algorithm that can converge on a winning solution more quickly.
When it comes to the subject of evolution, it is the author's sincere hope that even the most ardent skeptic takes pause having read this article. Remember: save your arguments regarding the origin of life for those discussing abiogenesis. Once life exists, real or simulated, no matter how or why it got here, the principles of evolution can be demonstrated to be mathematically true. I welcome your skepticism. Don't take my word for it. Run the simulations for yourself. Allow yourself to be duly impressed by the simple mathematics that drive all life on this planet, and beyond.
Did you find this article useful?
Connect with us on social media and let us know: