, you guide a small group of nomadic fairy-folk, the "Tronds," through a great and harrowing migration towards the mythical promised land called "Trondheim" (Norwegian for "Home of the Tronds"). The idea was that this journey would span several generations, so by the time you reached the final goal the original nomads would be long dead, but still live on in the memory and stories of their great-great-grandchildren.
Each person would have a unique name and job in the tribe, and you would watch over them as they worked, married, gave birth, grew old, and finally died.
As the tribe would be small enough to track individual relationships, a problem arose: how would I keep the computer from marrying close relatives? With a larger population, I could handle population growth abstractly, and with a population of anonymous animals I could mostly ignore the issue, but with a small group of individually named humanoid characters, I needed to make sure no player would see Leif marry his sister Sonya.
Since the computer would decide the relationships, I needed an algorithm designed to detect and prevent incest.
Attempt #1:
Family Tree Traversal
My first idea was to simply check two potential partners for family ties, and then disallow certain relationships. This required a list of "banned" pairings.
Though not universal, most cultures have incest taboos, the majority of which disallow marrying close blood-relatives, and often non-blood relations within the same family (aunts, step-relatives, etc) as well. Western taboos are largely derived from the Old Testament, so I figured Leviticus was as good a place to start as any.
Here's the list of marriage partners explicitly banned in the OT (assuming a male subject):
- Sister / Sister-in-law / Half-sister / Step-sister
- Mother / Mother-in-law / Step-mother
- Daughter / Daughter-in-law / Step-daughter
- Grandmother
- Grand-daughter
- Aunt
Using that as a starting point, I came up with the following "ban list" for my algorithm:
- Ancestors
- Descendants
- Siblings
- In-laws
- Step-relatives
- Aunts, Uncles, Nephews, Nieces
- First cousins

This broadly represents the incest taboos in modern western culture. It differs slightly from the OT laws - I'm disallowing first cousins (not explicitly banned in Leviticus), but allowing second cousins, which I personally find creepy, but is a reasonable concession for a small nomadic population on the edge of survival. Considering that marriage between first cousins is not unheard of in many parts of the western world (including the United States), our video game nomads are still adhering to a pretty conservative standard here. I should note that although these taboos are largely a mechanism for addressing inherited diseases, there are social concerns as well, since it would strike most of us as creepy if someone married their mother-in-law, step-sister, or adopted child, even though they aren't related by blood.
Defining these relationships in computable terms requires a "family tree" data structure that grows over time and tracks everyone in the society's relationship to one another. Whenever any two individuals are considered for marriage, the game runs an "incest check", which returns a value between 0 and 1, with 0 representing "no incest whatsoever," and 1 representing maximum incest, a sibling-sibling pairing. The idea is to run a bunch of test cases through the algorithm, and then determine what an acceptable maximum value will be.
To calculate the incest factor, I line up each character's ancestors and compare them, father vs father, mother vs mother, patneral grandmother vs paternal grandmother, etc, and count how many ancestors they have in common. This gets complicated fast. Let's start with the case of two first cousins:

These first cousins don't have any parents in common - so far, so good, but they do have the same maternal grandparents. They have 2/6 grandparents in common, and 0/4 parents in common, for a total of 2/10 ancestors, counting two generations back. If we just count the number of shared ancestors and divide by the total number, we get a value of 0.2.
However, this raises further questions - first, how far back do we need to check? Grandparents? Great-grandparents? Further? Second, the above method leaves out the fact that although the cousins don't share any parents, their mothers are sisters. So what do we do there? On the one hand, since we've already counted the shared grandparents, this could implicitly account for the related mothers. On the other hand, perhaps the algorithm should calculate the relative closeness of each ancestor and then use those as weighted values to affect the final outcome. At this point, some sort of recursive algorithm that cascades up and back down the family tree would be needed. As much as that sounds like a fascinating math problem, maybe there's an easier, more direct, way to do this.
Since incest is primarily (but not exclusively) a genetic concern, let's try a genetic approach. The easiest way is to just to assign each character a virtual "DNA" string at birth. For our initial population, let's start with a diverse gene pool, with each individual having completely unique DNA. To explain this in story terms, let's assume our society was not originally nomadic - they once lived in a great civilzation underground, but were chased out in a catastrophic invasion of subterranean trolls. This means that at the start of our game, our intitial "tribe" consists of unrelated refugees all thrown together by chance.
Let's take a look at the DNA of two of our nomads, the tribe chieftans:
Our incest-checker compares the DNA strings, and looks for matching "genes." There's zero matches here, an incest factor of 0, so this pairing is "legal." Trondfar and Trondmor have two children, Trondson and Tronddatter, and each inherits half their DNA from their father and half from their mother. (For simplicity's sake, we just inherit every other gene from each respective parent. The charts below sort the genes for easy comparison.)
The incest-checker would return a value of 1.0 (all genes match) for these two, so they can't get married. Instead, Trondson marries a girl named "Sonya," and Tronddatter marries a boy named "Leif," (neither of whom are related):
Then, Trondson and Sonya have a son, "Trondsonson", and Tronddatter and Leif have a daughter, "Tronddatterdatter", yielding these two first cousins:
If these two were considered, the incest-checker would return a value of 8/16, or an incest value of 0.5, which is too high, so they would look for other partners. What's useful about this approach is that we no longer have to track direct relationships, just check how close each person's "DNA" is to another. We can supplement this by also tracking immediate relationships like step-relatives, adoptees, and in-laws to catch "non-blood incest." All in all, the genetic approach works elegantly, quickly, and doesn't depend on laboriously checking geneology records.
It should be pretty easy at this point to run baseline checks on the relationships we want to disallow until we can determine what the maximum acceptable incest factor is. (For instance, this model gives second cousins an incest factor of 0.25). There are, however, a few short-comings to this method, which brings us to...
Attempt #3:
Virtual DNA 2.0
Virtual DNA is a good approach, but it could be improved. For one, if I choose to make character appearance variable, it seems natural that characteristics such as hair, skin, eye color, etc, could all be driven by individual genes in the DNA string. Even in a simple indie game with limited sprites, simple pallete-swapping and Mendelian inheritance could make it so children share characteristics with their parents. This requires some tweaks.
So far, we've only been using the virtual DNA to check against incest. If we're using it to drive physical features, we need to model real genetics more closely. With the old model, two children of the same sex from the same parents would have the exact same DNA, and thus the exact same appearance. In the new model, each person has two sets of DNA, one from their father, and one from their mother.
Here are our chieftans, then:
Trondson and Tronddatter would still get half their genes from Trondfar and half from Trondmor, but which genes they get in each slot is random. Trondfar passes on a DNA strand that contains a random mix of his two DNA strands, as does Trondmor:
This new system is more realistic, and accounts for inheritance better, but we've got a problem - with random inheritance, and such a low number of "genes", it might be easy for our incest checker to fail! Whereas the previous model would record a match on every gene for a sibling pair (since they are all essentially identical twins), the new model won't. Let's look:
That's 6 matches for the father's genes, and 7 matches for the mother's, or about a 40% match overall. In real life, siblings have just over 50% of their genes in common, on average (skewed high because of identical twins), so this result is pretty accurate. However, since the inherited genes are chosen randomly, it's still conceivable the algorithm could run into edge cases like this:
Here, only about 15% of the genes match, an incest factor of 0.15. The anti-incest algorithm would mistake these two for distant relatives instead of siblings. Now, the chances of the above case are pretty slim, but even with just a few thousand people playing the game it would not be uncommon for players to report brothers and sisters randomly pairing off. It wouldn't cost much to extend the length of the string to say, 50 genes from each parent (100 total), which would essentially eradicate the statistical likelihood of such flukes.
Even if I don't plan on using every single gene to drive some physical trait, the much longer DNA strand will ensure that the incest checker still works. Interestingly enough, this is a lot like DNA in real life, where most of it doesn't seem to code for anything (that we know of, at least). Long DNA strands, and a backup safety check to account for close, easily defined relatives (and non-blood relatives) should keep things from getting creepy even in the edge cases.
One final note - no fancy algorithm will protect our society from in-breeding in the long term with such a small population, because after only a few generations almost everyone will be related. To fix this, we can include an "immigrant" mechanic (similar to the one in Dwarf Fortress), where other Tronds who escaped the cataclysm will randomly run across your tribe and join you, providing fresh, unrelated genes.
So, there's a random page from my notebook. Maybe it will help someone in a project or something.
Any thoughts?
I'm curious if you could use a byte array and some fancy bitwise operation to handle it all though. Have each initial trod and each immigrant (if added) have a single unique bit set to 1 (so trod#1 = 0001, trod#2 = 0010, trod #3 = 0100, etc.).
To check for closeness and the 2 trod's history and count the # of 1s, then count the # of 1s in each trod. If either Trod has >=50% of their 1s in common with the other they're too closely related to be together.
For example:
Trod A = 1010 1011 (5 1 bits)
Trob B = 0100 0011 (3 1 bits)
And combination = 0000 0011 (2 1 bits)
# of 1 bits in combination / # of 1 bits for Trod A = 2/5
# of 1 bits in combination / # of 1 bits in Trod B = 2/3 <- This is over 50% so they are too close to get together. In this case Trod A must be Trod B's nephew through a 1/2 brother or a 1st cousin twice removed.
I still need to think it through more with all sorts of edge cases, but something like that seems like you should be able to get it to work with some additional rules/checks maybe. For adopted siblings and stuff you'd need to include something where you change the adopted person's history to a union of their existing one and their new parent's. But for a max pop of 200 you'd only have to store ~5,000 bytes of information (200 bits / 8) * 200 (one byte array for each person) and the comparisons would be fast. Then the DNA can be purely for traits and attributes without having to make it longer.
This might/would break down if the game was more like dwarf fortress where you may have thousands of immigrants/ancestors over the lifetime of a game even for a max population of 200.
In general, a match-making check is a pretty rare event in terms of game play time, as someone doesn't get married every second in a small population, it probably only happens once every few minutes of real-time at most. However, they would have to do a check against every other eligible bachelor/maiden in the tribe, and so the number of incest checks expands geometrically with the population. If this algorithm were to be used for a game with a larger society, you'd want to consider optimizations so it doesn't chug every time someone looks for a partner.
The first obvious thing to do would be to put in simple checks first - constrict "eligibility" to people within a similar age range, skip over immediate family (defined by pointers in each Trond object), etc, to lower the total number of DNA comparisons.
I do like your approach though, it removes the need for "junk" DNA that takes up memory. I guess I've just gotten lazy with these modern computers and their megs and megs of RAM :)
In this sense, there's no need for a whole second chromosome (yay less CPU and memory!) so long as you randomly assign genes from each parent--location matters and once you distinguish between all the A genes and B genes, ie. A1, A2,...,AN and B1, B2,...,BN.
The exact "incest prevention" you're talking about is the risk associated with recombining similar sets of DNA that result in expression of deleterious phenotypes (Hemophilia, Cystic Fibrosis), which generalizes to the common bans you listed in attempt #1. In the miraculous case in which two siblings would 100% produce a viable, healthy child, there's no reason for preventing them from pairing off.
I think this system I lightly outlined would work just as well, and avoid the issues you came across.
The principal motivation behind this system is to avoid creeping out the human beings who play the game. Since it's a video game and we make the rules, there's no reason we couldn't program it so that incestual relationships would produce 100% viable offspring ... except that as human beings, that's a pretty creepy game to play, watching brothers and sisters get married and have kids.
For instance, this is why the system has manual checks for non-blood relatives. If all we care about is genetically inherited diseases, there's no biological reason to keep you from marrying your step-mother. Except... that's pretty creepy.
Once we start implementing genetic algorithms that mimic what happens in real life, the more important it gets to take cues from biology, especially if we're going to build in genetically inheritable diseases, etc.
I was planning on implementing recessive genes, hence the double-strandedness. I kind of condensed all my reasoning into one breath, though.
However, it seems to me like you're overengineering this.
If the only goal is to prevent marriage between partners who share ancestors within X generations back (probably to the grandparent or great grandparent (second cousin) level if you want to be Puritan about it), then it seems to me all you have to do to check for incest is to remember who each person's grandparent is and see what percentage of them are shared by the two prospective partners.
It's essentially the same as your "incest value" calculation. The only difference is that instead of counting all the ancestors, we only count those in a specific generation. Counting more than one generation of ancestors is unnecessary. E.g. it's pointless to count Mom and Dad when you're already counted all four grandparents, because the grandparents' genes are in Mom and Dad.
Using this method you could just set a hard limit on the number of shared ancestors at a certain generational level. Probably one or zero grandparents, or two or three great grandparents.
Did I miss something? Of course, this doesn't help you implement actual gene inheritance with eye color, etc.
You also run into issues around immigrants/early settlers where people don't have a full set of grandparents (or possibly any at all). So the son of 2 1st generation immigrants would be able to marry his sister unless you do additional checks or assign the immigrants fake parents.
Offspring of Siblings: The younger's grandfather is the older's father.
Offspring of Offspring: The younger's grandfather is the older.
Siblings: Shared father!
Direct offspring: the younger's father is the older.
Cousins: Shared grandfather.
You check both sides to handle the case of polygamy, but I'm pretty sure this is the simplest way to do it and is guaranteed...
I've never implemented this algorithm in any setting, so I'm not sure if you would need the flexibility or not, but it seems like a strength to the approach. It probably doesn't save you too much memory, since you need to store gene strings for each Trond (unless you use a really efficient bitwise approach to keep it lean), but it does mean less book-keeping. If a Trond arrives with a set of genes, you can do blood-incest checks without needing to know who she's related to.
In fact, if the gene data is robust enough, you could reasonably INFER who her relatives are even after trashing the entirely family tree.
Specifically, you run into the following cases:
1. Two immigrants with no parents being incestuous
2. Two in-laws getting away with incest.
There are a bunch of tradeoffs and it's really a question of the constraints you're trying to enforce.
It seems to me you want to create a "web" of relationships between each Trond, then count the shortest relationship distance between any two Tronds, and if that number is too low, you deny the marriage. There are four basic relationships that seem to matter:
3 biological:
- parents
- siblings
- children
1 social:
- spouses
You would only have to search the end of 2 relationship lines for each of the cases you outline above... with the exception of first cousins. Cousin marriage is an interesting topic in and of itself (http://en.wikipedia.org/wiki/Cousin_marriage) and I find it a non-insignificant detail that, according to your Bio, you're in Texas, one of the "dark red" states on the first cousin marriage map of the US :) But from a code perspective, accounting for first cousins is a special case that requires considerably more relationship tracking than all other relationships. Alternatively you could blanket out all third-degree relations and sweep up first cousins, great grandparents, great aunts/uncles, and the like.
An added benefit of storing each of these for each Trond is that you can generate a complete relationship map of every Trond that lived in your game at the end of the game. It might be storing a lot of extra pointers, but it might pay off in the end.
I wasn't actually aware that Texas outright banned first-cousin marriages! Though I personally find first-cousin pairings taboo, my libertarian-leaning political views oppose enforcing those opinions through law. According to the article, the genetic risk associated with first-cousin marriages seems less than I'd originally assumed, so I might revisit my stance on this topic if further research bears this out (though it no longer affects me, since I'm already married and all my cousins live on the other side of the ocean).
As for other non-insignificant details, though I live in Texas, I'm also a Norwegian citizen, so there's probably a lot of first-cousin pairings in my ancestry. :)
As for the algorithm itself, the method I described does include both biological and social checks, primarily to catch so-called "non-blood" incest (assuming we choose to enforce those rules), though I only mentioned it in passing and didn't really detail the specifics.
The other consideration, is given that we've got a small population here, our nomadic tribe will eventually run into the same issue that is faced by the Amish and other small communities (also mentioned in your wiki article) - even if they avoid close relatives, the eventual shrinking of the gene pool means just tracking close relatives might not be enough.
But then, we've come full circle - if the only purpose of the algorithm is to keep from freaking out the audience, do we need to worry about it any further? I mean, I don't plan on implementing Cystic Fibrosis and other inherited diseases.
But I was thinking of a couple possible remedies.
1) What about making parents have a random number of children? That way instead of your population always staying the same, it can grow which reduces the chance of selecting two close relatives (assuming that you have a high enough starting population)
2) How about instead of selecting two people at random, you select to two best cases among the current generation?
And for multiple children, definitely. I imagine parents will have several children throughout their fertile years. Given waves of immigrations, harsh winters, the occasional raid by hostile trolls, and bountiful springs, I imagine the population would shrink and grow over time, but staying within certain bounds.
It's a scenario based on a real village model. It was noted that in one village- that as soon as someone married they moved away from the area- their family. So- as soon as you have marked a inhabitant as "married", you give them collision-detection. The bubble could increase as you keep them away from others in the family by an alpha-numeric designation. IE "collision-detection1", "collision-detection2"
1) Social pairings: individuals should not marry someone who's from their immediate family
2) Genetic pairings; individuals should not mate with someone who has a similar geneset
However, as you've found, the two checks are not guaranteed to produce the same results: two socially-distant people may share the same genetic recessives, and two family-related people may have completely different genes.
Equally, there's also a risk in real life that people may not be aware of their genetic background; people can change partners (and/or cheat on their partner, if in a marriage) and children can be adopted. If this happens - especially in a small society - there's a risk that two people will grow up while being unaware of the fact that they have one (or possibly even both) parents. But then we're getting into increasingly complex modelling, which I'm guessing is beyond the scope of what you're trying to achieve!
In truth, if you are trying to accurately model a (relatively) primitive nomadic society, I'd stick to checking family relationships only before permitting a relationship - after all, they're not going to be carting a DNA sequencer with them! I'd then perform the DNA check at the time of birth; if too many genes are common between the parents, the child would be assumed to have had a genetic fault which would cause it to be stillborn.
The latter sounds/feels a bit creepy, but it's as close as you can get to a realistic model...
Like Tynan, I feel like the approach could be simplified here - although the "virtual DNA" implementation is very interesting and seems to solve multiple problems at once, it's simulating something different from what you're actually trying to simulate.
What you're actually trying to simulate is the social taboos of certain types of marriages; society has "rules" about what is and isn't acceptable, and certain types of marriages are unacceptable based on these rules. This societal disapproval is what will be causing your players to find the game creepy too, so you should definitely be trying to make this your model.
This simplifies the problem referred to several times here where you're considering both "blood relations" and "social relations." In fact, all that really matters is social relations. (In real life, if a boy and girl fraternal twin are separated at birth and later meet, marry, and have children, there's no taboos broken so long as no one knows about it. There may be genetic problems - but that's not REALLY what you're trying to model here, and whether you want to model it would and should be a separate problem.)
I would therefore approach this as a setup of social rules. And if you think about the REAL social rules we're applying, they include things like:
"Siblings shouldn't marry."
"Cousins shouldn't marry."
"Adopted siblings shouldn't marry."
"A mother-in-law shouldn't marry her son-in-law."
In other words the rules very quickly stop being about genetics. What they're really about is ROLES. Roles are an important part of society, and when two people are in (or were in) certain role relations to each other, we become uncomfortable when they later break those roles and enter drastically new roles.
Essentially it starts to seem like this could be broken down into 1) a web of relations between characters, and 2) a series of rules of the form "if A & B have relationship X, then their entering into relationship Y would have an acceptability modifier of Z." When evaluating mates (or entering into other relationships!), these rules would first be checked (all of the relationships would be evaluated - note that A and B could have multiple different relationships) and an acceptability modifier would be generated. You could then weigh that acceptability against any threshold you wanted.
This system could also be used to later express social taboos such as "A professor shouldn't have a romantic relationship with their student" or "a pastor shouldn't have a romantic relationship with a member of his church." In other words it's flexible for any types of relationships and taboos about those relationships. Note that two people can have multiple different types of relationships, even in terms of family, and that this would check all of them and sum up a cumulative acceptability rating.
Interesting discussion!
I think the genetic approach is cool because it scratches a particular algorithm-design itch, but it might be better suited to game systems that involve more than superficial genetic metaphors.
Next thing I'd like to explore is the nitty-gritty algorithmic optimization of storing, traversing, and doing quick checks with this social web.
...It might be interesting to see if certain genetic traits end up dominating in the whole population after several generations (blond hair, brown eyes or whatever?) or if the social taboo rules do a good job promoting genetic diversity too.
It has been observed in Israeli kibbutz society, where unrelated children were raised together, and later were found to almost never have married anyone they had been raised with.
http://en.wikipedia.org/wiki/Imprinting_%28psychology%29#Westermarck_effect
Was choosing 16 genes an arbitrary number? Did you ever figure out what the "optimal" number of genes would be?
It makes sense in high tech pragmatic society, but lowtech nomadic tribes are antithesis of this (I can imagine high tech nomads use genetic approach, because its one of only two ways for survival with realy small population, second is: breed as much and discard unsucessfull individuals, which is much more resurce intensive and much more luck based).
But your stated goal is to avoid creepiness, so use rules that use as input what player can know.
It would be interesting to do genetics, but ban relaitons based on something else. And maybe even allow player to change a little these rules and allow him to see consequences.
Eg. allow him to switch off ALL bans, and see his tribe die in few generations (when healthy population is so small, that it does not overcome problems from outside world, like eg. attack of small group of goblins (insert anything from your world)).