Reconstructing Grammy’s DNA

For my latest guest post, I’m pleased to welcome Tanner Tolman, a professional genealogist based in Utah. Tanner has successfully achieved something that’s a holy grail for many genealogists: reconstructing someone’s DNA based on the DNA of their descendants. Tanner has written a detailed account of the steps he had to go through in the process of DNA reconstruction for his wife’s grandmother.

Jonny

Disclaimer

Just because you can create a raw data file for an ancestor and upload it to other websites, does not always mean you should. Before uploading your DNA reconstruction to any company, I recommend that you first read their terms and conditions to make sure your kit is not against their rules. I am not a lawyer and cannot give you advice about which companies are ok and which are not.

Meeting Grammy

I first became interested in DNA testing in May 2014. I was single and in college. Once I learned about it, I quickly tested myself, both of my parents, my three living grandparents and even my great-grandmother who was still living at the time. I also decided that once I met my future wife, I would do the same for her family.

Fast forward to February 2016. I had just started dating my beautiful wife Whitney and we went to her grandparents’ house for a family party. I remember meeting Whitney’s grandparents, Bob Allphin and Julie Danielle Anderson Allphin “Grammy.” Grammy was frail, in a wheelchair, and on oxygen. I knew I wanted to marry Whitney and I knew I wanted to DNA test both of her grandparents, but I felt it would be presumptuous to ask before Whitney and I got engaged. I proposed to Whitney on April 6th and then Grammy passed away on the 10th.

I always regretted the fact that I delayed sending Grammy a DNA test. Going forward it has helped me feel a sense of urgency to test the oldest living generation before it is too late. I have had some close calls since then. Whitney’s other grandmother ended up passing away later that same year. Fortunately, I managed to send her a DNA test first. Her results arrived on the day of her funeral.

For five years, I accepted that I would only be able to give my children DNA results for five of their eight great-grandparents. Then in June 2021, I discovered Borland Genetics and learned that it was possible to build DNA kits for the other three. I have actively used Borland Genetics and DNA Painter since then and on 24 March 2023, I successfully reconstructed 99% of Grammy’s DNA with high enough quality that I could upload the raw data into other websites. This article will explain how I achieved this so that you do not have to spend almost two years of trial and error like I did.

Subscriptions and Programs

To make an excellent DNA kit, you are going to need subscriptions to the following three websites:

  1. borlandgenetics.com
  2. gedmatch.com
  3. dnapainter.com

I am not affiliated with any of these companies. I am just telling you the truth:

  • You need a subscription to GEDmatch because you will need to upload several kits into GEDmatch. Without a subscription you can only upload five.
  • You need a subscription to DNA Painter so you can create chromosome maps for all the necessary relatives.
  • You need a subscription to Borland Genetics so you can use all of the tools available there–most importantly, the Phase Map Locker tool which you will use to store chromosome maps that you create on DNA Painter and link them to the kits you create.

Finally, you also need to download the desktop version of Borland Genetics as well as a program called DNA Kit Studio. Both can be downloaded for free at the following links:

Part 1: Testing Everyone

Grammy is survived by her husband, all ten of her children, and four of her siblings. The ideal situation would be to test all of them with the same company and at close to the same time so that they are all tested on the same microarray. As time passes, testing companies sometimes move to new versions of the “chip” or microarray. If some of the children are on different microarrays, it will increase the number of no calls in the final product. If the number of no calls gets too high, then your kit will be rejected by companies that accept uploads.

When doing your own reconstruction, it does not particularly matter which company you choose to test them all with. In my case, I chose AncestryDNA because several of Grammy’s children had already tested there. I asked those who had already tested for their raw data files, and I gave a free DNA test to those who had not. All of them consented to letting me use their DNA for this project.

  • Husband: “Papa.” Tested on 23andme v4 on his own. I also sent him an AncestryDNA test in December 2022 specifically for this project.
  • Child 1: Unable to obtain DNA results from. Not used for this project.
  • Child 2: Tested on 23andme v4 before the start of this project. Sent me her DNA for this project.
  • Child 3: Tested on AncestryDNA v2, before the start of this project. Sent me his DNA for this project.
  • Child 4: Unable to obtain DNA results from. Not used for this project.
  • Child 5: Unable to obtain DNA results from. Not used for this project.
  • Child 6: Tested on AncestryDNA v2 in December 2022 for this project.
  • Child 7: Tested on AncestryDNA v2, before the start of this project. Sent me his DNA for this project.
  • Child 8: Tested on AncestryDNA v2 in December 2022 for this project.
  • Child 9: Tested on FamilyTreeDNA v2, before the start of this project. Sent me her DNA for this project.
  • Child 10: Tested on AncestryDNA v2, in December 2022 for this project.
  • Sister: Tested on AncestryDNA v2, in December 2022 for this project.
  • Brother: Tested on AncestryDNA v2, in December 2022 for this project.

In summary, I obtained DNA results from seven of Grammy’s children (five of whom were on AncestryDNA v2), her husband, and two of her siblings. I uploaded all of the kits to GEDmatch, Borland Genetics, and MyHeritage.

Stereo vs. Mono

Two important terms need to be understood in DNA reconstruction: Mono and Stereo.

  • Stereo kits have two alleles at each tested SNP, each allele being an A, C, G, or T, with one allele always coming from the father and the other from the mother.
    • All people who take a DNA test in life have a stereo kit called a factory kit and the goal is to create a high-quality stereo kit for Grammy that resembles an AncestryDNA factory kit.
  • Mono kits only have one allele at each spot and if the kit was created correctly, each allele should just come from one parent.
    • Mono kits are created by splicing up stereo kits.
    • The goal of Borland Genetics is to splice up the DNA results from all of Grammy’s descendants into mono kits and then put them back together as a new stereo kit the way that DNA originally existed in Grammy.
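
For readers who think in code, the difference between the two kit types can be pictured with a minimal Python sketch. The dictionary layout, the "-" no-call marker, the made-up rsIDs, and the names `stereo_kit`, `mono_kit`, and `is_consistent` are all assumptions for illustration; this is not Borland Genetics' actual file format.

```python
# Hypothetical in-memory representation; not any vendor's real format.
# A stereo kit stores two alleles per SNP (one from each parent, unordered).
stereo_kit = {
    "rs0001": ("A", "G"),
    "rs0002": ("C", "C"),
    "rs0003": ("T", "G"),
}

# A mono kit stores a single allele per SNP, all from the same parent.
mono_kit = {
    "rs0001": "G",
    "rs0002": "C",
    "rs0003": "-",  # "-" marks a no call (allele could not be determined)
}

def is_consistent(mono, stereo):
    """A mono allele must be one of the two alleles in the stereo kit it was cut from."""
    return all(a == "-" or a in stereo[rsid] for rsid, a in mono.items())

print(is_consistent(mono_kit, stereo_kit))
```

A mono kit that was spliced correctly out of a stereo kit will always pass this kind of check, because every allele it contains is one of the two alleles in the source kit.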

Part 2: Missing Parent (Phasing)

In Borland Genetics, I first used the Missing Parent Tool to compare Papa’s DNA against Child 2. The missing parent tool compares Papa’s DNA against the child and creates a partial kit that only contains the SNPs that were unique to the child. Since a child gets exactly half of their DNA from each parent, all the DNA that does not match the tested parent (Papa) must come from the missing parent (Grammy). The resulting kit is a mono DNA kit and can be thought of as a 50% reconstruction of Grammy’s DNA.

I repeated this step on all of Grammy’s children. Each child inherited 50% of Grammy’s DNA, but each inherited a different random half: some segments were also inherited by other children, while others were unique to that child.
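
The reasoning behind this kind of phasing can be sketched in a few lines of Python. This is only an illustration of the general logic at a single SNP, not the Missing Parent Tool's actual implementation; the function name `missing_parent_allele` and the "-" no-call marker are assumptions.

```python
NO_CALL = "-"

def missing_parent_allele(child, known_parent):
    """Return the allele the child must have received from the untested parent,
    or NO_CALL if it cannot be determined at this SNP.

    child and known_parent are two-allele tuples such as ("A", "G").
    """
    if NO_CALL in child or NO_CALL in known_parent:
        return NO_CALL
    a, b = child
    candidates = set()
    if a in known_parent:  # a could have come from the tested parent, so b came from the other
        candidates.add(b)
    if b in known_parent:  # b could have come from the tested parent, so a came from the other
        candidates.add(a)
    # Exactly one possibility means the allele is determined; otherwise give up.
    return candidates.pop() if len(candidates) == 1 else NO_CALL

print(missing_parent_allele(("A", "G"), ("A", "A")))  # tested parent can only give A
print(missing_parent_allele(("A", "G"), ("A", "G")))  # ambiguous at this SNP
```

When the child and the tested parent are both heterozygous for the same two alleles, the SNP cannot be resolved from that pair alone, which is one reason a single child only ever yields a partial mono kit.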

Part 3: Chromosome Mapping

The next step was to create chromosome maps for all seven of Grammy’s participating children at DNA Painter showing which segments each child inherited from Grammy’s father Vern and which were inherited from Grammy’s mother Joyce. I did this using a combination of chromosome mapping, visual phasing, and inferred mapping. Transferring all of the siblings’ DNA into MyHeritage was particularly helpful for this because they have a large database and a chromosome browser that allowed me to see what segments each child shared with their relatives in the database. The one child that was also in 23andme was also helpful for the same reasons.

Collage of six chromosome maps, each showing the combination of DNA that one of Grammy’s children inherited from her parents

Part 4: Reconstructing Grammy’s Paternal Chromosomes

Once I had all the chromosome maps created, I downloaded the CSV files out of DNA Painter and linked them to the corresponding mono kits for each child in Borland Genetics using the Phase Map Locker.

Then I used the Extract Segments Utility tool and these chromosome maps to extract the DNA that each of Grammy’s children inherited from her father, Vern. The result was seven mono kits that each had about 20-30% of Vern’s DNA, or in other words Grammy’s paternal DNA. I then used Borland Genetics’ Humpty Dumpty Merge Utility Tool, chose Option 1: Mono to Mono, and merged all seven of those kits together. I would have preferred to merge all seven at once, but that caused the website to time out and crash, so I merged the DNA from the first two kits, then the next two, then the last three, which gave me three larger kits instead of seven smaller ones. After that, I merged those three larger kits together.
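
Conceptually, the extract-then-merge step looks something like the Python sketch below. The data layout, the function names `extract` and `merge_mono`, and the rule that conflicting alleles become a no call are assumptions made for illustration; Borland Genetics' real tools may work differently.

```python
from collections import defaultdict

NO_CALL = "-"

def extract(mono_kit, segments, grandparent):
    """Keep only SNPs that fall inside map segments assigned to `grandparent`.

    mono_kit: {(chrom, pos): allele}
    segments: [(chrom, start, end, label)] taken from a chromosome map
    """
    return {
        (chrom, pos): allele
        for (chrom, pos), allele in mono_kit.items()
        if any(c == chrom and s <= pos <= e and label == grandparent
               for c, s, e, label in segments)
    }

def merge_mono(kits):
    """Combine mono kits; conflicting alleles become a no call (an assumption)."""
    seen = defaultdict(set)
    for kit in kits:
        for key, allele in kit.items():
            alleles = seen[key]  # registers the position even for a no call
            if allele != NO_CALL:
                alleles.add(allele)
    return {k: (next(iter(v)) if len(v) == 1 else NO_CALL) for k, v in seen.items()}

# Made-up data: positions and alleles are illustrative only.
child_a = {(1, 100): "A", (1, 200): "C", (1, 900): "T"}
map_a = [(1, 1, 500, "Vern"), (1, 501, 1000, "Joyce")]
child_b = {(1, 200): "C", (1, 900): "G"}
map_b = [(1, 1, 1000, "Vern")]

vern = merge_mono([extract(child_a, map_a, "Vern"), extract(child_b, map_b, "Vern")])
print(sorted(vern.items()))
```

Because merging only ever combines calls position by position, merging in batches (two kits, then two, then three, then the results) gives the same coverage as merging everything in one pass.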

The result was a kit that contained 49.5% of Vern’s DNA or in other words 99% of the DNA that Grammy inherited from her father. It was only 99% because there was a small gap on chromosome 3 and another on chromosome 12 where all seven participating children inherited DNA from Joyce. At all other points, at least one of Grammy’s participating children inherited DNA from Vern. If some of the other three children later choose to participate in this project, then it is possible that one or more of them will have inherited the right DNA to fill in these spots.

Chromosome map showing the reconstructed and unreconstructed parts of Grammy’s paternal DNA

Part 5: Reconstructing Grammy’s Maternal Chromosomes

Next, I went back to the seven 50% mono kits that I had for Grammy and again used the linked chromosome maps and the Extract Segment Utility.

This time I extracted all the segments that each inherited from Grammy’s mother, Joyce. The result was seven mono kits that each had about 20-30% of Joyce’s DNA, or in other words Grammy’s maternal DNA. Again, using Humpty Dumpty Option 1: Mono to Mono I merged all of those kits together as well. This time I reconstructed 48% of Joyce’s DNA, or in other words 96% of Grammy’s maternal DNA.

There was a gap on chromosome 6, another on chromosome 10, and three small gaps on chromosome 8 where all seven participating children inherited Vern’s DNA. If one or more of the other three children chooses to participate in this project, then it is possible they will have the right DNA to fill in these gaps.

Chromosome map showing the reconstructed and unreconstructed parts of Grammy’s maternal DNA

Part 6: First Attempt is Unsuccessful

Next, I took the maps showing all the DNA that had been reconstructed for Grammy’s paternal and maternal chromosomes. Using the extract segment tool, I moved them over from Vern and Joyce’s profiles to Grammy’s profile. I then merged these two kits together using Humpty Dumpty Option 2: Mono to Stereo. Mono to Stereo is always the last step in Borland Genetics. I recommend that you always build a person’s paternal and maternal chromosomes separately and then mono to stereo merge them at the end.

The result was a kit that had 97% of Grammy’s DNA. It had all of her DNA except for the aforementioned gaps, and it matched all seven participating children with a parent-child relationship. This kit was accepted into GEDmatch, but it was not accepted into any other sites that accept uploads. There are three main reasons for this:

  1. The kit had too many no calls. An analysis of the kit in DNA Kit Studio showed that of the 677,436 SNPs that AncestryDNA tests, 139,902 (20%) of them were no calls. A factory kit will typically have less than 1% no calls so this is much too high.
  2. The kit was not formatted to look like an AncestryDNA kit. It had a lot of SNPs that AncestryDNA uses but it was formatted incorrectly.
  3. The kit contained no mitochondrial DNA. Borland Genetics does not incorporate mitochondrial DNA or the Y chromosome into reconstructed kits at this time, but they are necessary to make the kit work.

If you have ever tried and failed to upload a Borland Genetics kit somewhere else, these are most likely the same issues that caused your kit to fail.
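
If you want a rough no-call check of your own before uploading, a sketch like the following would do it. It assumes an AncestryDNA-style raw data layout (comment lines starting with "#", a header row, then tab-separated rsid, chromosome, position, allele1, allele2, with "0" marking a no call); verify that against your own file, since the layout is an assumption here. The function name `no_call_rate` and the sample rows are made up.

```python
def no_call_rate(raw_text, no_call="0"):
    """Fraction of SNP rows where either allele is a no call.

    Assumes tab-separated columns: rsid, chromosome, position, allele1, allele2.
    """
    total = missing = 0
    for line in raw_text.splitlines():
        # Skip blank lines, "#" comments, and the column header row.
        if not line.strip() or line.startswith("#") or line.startswith("rsid"):
            continue
        fields = line.split("\t")
        allele1, allele2 = fields[3], fields[4]
        total += 1
        if no_call in (allele1, allele2):
            missing += 1
    return missing / total if total else 0.0

# Made-up sample rows, not real data.
sample = "\n".join([
    "#AncestryDNA-style sample (made up)",
    "rsid\tchromosome\tposition\tallele1\tallele2",
    "rs0001\t1\t1000\tA\tG",
    "rs0002\t1\t2000\t0\t0",
    "rs0003\t1\t3000\tC\tC",
    "rs0004\t1\t4000\tT\tT",
])
print(no_call_rate(sample))         # 0.25
print(round(139_902 / 677_436, 3))  # 0.207, the first attempt's rate from the text
```

DNA Kit Studio reports this figure for you, so the sketch is only meant to show what the number measures.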

Part 7: Addressing the No Calls

The first of these issues was resolved using Borland Genetics.

Understanding why certain SNPs became no calls even after seven children had tested requires knowing how a mono to stereo merge works. No SNPs were lost up until that point. When Grammy’s paternal and maternal chromosomes were merged in the final stages, each position had to have an allele (A, C, G, or T) determined in both her paternal and maternal copies or the result would be a no call. If either mono kit had a no call, then it became a no call in the final output.

As a real example, the first no call in the raw data file was on chromosome 1 at position 835,499 (rs4422948). Five of the children had inherited Vern’s DNA at this point, and from them Borland Genetics’ software had figured out that Grammy’s paternal DNA had a G there. Only two of the children inherited Joyce’s DNA at this point, and from those two it was not possible to determine which allele belonged there. Rather than copy the G over twice, Borland Genetics made this a no call so it would not cause a problem with Grammy’s match list if the other allele was actually something else, such as A.
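
The rule described above can be written down as a short Python sketch. The function name `mono_to_stereo` and the "-" no-call marker are assumptions for illustration, and the second SNP is made up; only the rs4422948 values come from the example in the text.

```python
NO_CALL = "-"

def mono_to_stereo(paternal, maternal):
    """Pair up two mono kits; a no call on either side makes the whole SNP a no call."""
    stereo = {}
    for rsid in paternal.keys() | maternal.keys():
        p = paternal.get(rsid, NO_CALL)
        m = maternal.get(rsid, NO_CALL)
        stereo[rsid] = NO_CALL * 2 if NO_CALL in (p, m) else p + m
    return stereo

paternal = {"rs4422948": "G", "rs_example": "C"}  # rs_example is hypothetical
maternal = {"rs4422948": "-", "rs_example": "T"}

result = mono_to_stereo(paternal, maternal)
print(result["rs4422948"])   # "--": the paternal G alone is not enough
print(result["rs_example"])  # "CT"
```

This is why a stereo kit can end up with far more no calls than either mono kit on its own: the gaps from both sides add up.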

This is why it is important to have as many of the test takers on the same microarray as possible:

  • The more times each SNP can be examined in a relative, the more likely the SNP will be able to be determined instead of becoming a no call.

In this case, there turned out to be several areas where Child 9 who had tested with FamilyTreeDNA was the only one to have inherited Vern’s DNA. The other six had all inherited Joyce’s. This was a problem for the reconstruction because only 30% of the SNPs tested at AncestryDNA and FamilyTreeDNA today are the same. In those places, only the 30% that was the same across both companies could be used, the other 70% all became no calls. It’s not that FamilyTreeDNA is bad. They are an excellent company. But for DNA reconstruction, the more children that tested on the same microarray the better for this reason.

The way to reduce no calls is to get more DNA tests to work with. Retesting Child 2 (23andme) and Child 9 (FamilyTreeDNA) with AncestryDNA so that they are all on the same microarray would reduce the no calls further, and I might do this in the future. Additionally, DNA samples from children 1, 4, and 5, who are not participating, would also reduce the number of no calls. DNA from other relatives would have worked too. In this case, I reduced the no calls using DNA from two of Grammy’s siblings.

In this example, at that position (835,499 on chromosome 1), Grammy’s 97% DNA kit that was accepted into GEDmatch shows that her DNA was half-identical to her sister’s DNA there. By comparing her sister’s DNA to both Grammy’s paternal and maternal DNA separately, it became clear that it was their maternal side (Joyce) that matched. The sister had AA there. Therefore, Grammy must also have an A there on her maternal side. Now that it is known that Grammy inherited an A from her mother Joyce and a G from her father Vern, AG can be placed into her reconstructed kit instead of a no call.

To fill in the no calls quickly and efficiently, I waited for Grammy’s paternal and maternal mono kits to go through batch processing. This means they were compared against every other kit in the database and a match list was generated. Then I took Grammy’s paternal kit and ran it through a Borland Genetics tool called the Phoenix. The Phoenix generates a list of all the kits in the database that share some DNA with said paternal kit. I marked that I wanted the kit compared to Grammy’s brother and sister. The Phoenix created a new kit with all the DNA that the brother and sister had in common with Grammy’s paternal side and also filled in no calls in those matching areas wherever either the sister or brother’s SNPs were homozygous (AA, CC, GG, or TT).
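
The principle behind this fill-in step can be sketched as follows. This is not the Phoenix tool's actual algorithm, only an illustration of the logic described above; the function name `fill_from_relative`, the data layout, and the segment boundaries are assumptions. The position and the sister's AA genotype come from the example in the text.

```python
NO_CALL = "-"

def fill_from_relative(mono, relative, shared_segments):
    """Fill no calls in a mono kit using a relative who matches it on given segments.

    mono: {(chrom, pos): allele}
    relative: {(chrom, pos): (allele1, allele2)}
    shared_segments: [(chrom, start, end)] where the relative matches this mono kit
    Only homozygous genotypes are used, since both of the relative's alleles are
    then the same and the shared allele is unambiguous.
    """
    filled = dict(mono)
    for (chrom, pos), allele in mono.items():
        if allele != NO_CALL:
            continue
        in_segment = any(c == chrom and s <= pos <= e for c, s, e in shared_segments)
        genotype = relative.get((chrom, pos))
        if in_segment and genotype and genotype[0] == genotype[1] != NO_CALL:
            filled[(chrom, pos)] = genotype[0]
    return filled

maternal = {(1, 835499): "-"}
sister = {(1, 835499): ("A", "A")}
shared = [(1, 800000, 900000)]  # hypothetical segment boundaries

print(fill_from_relative(maternal, sister, shared))
```

A heterozygous genotype in the relative is skipped because there is no way to tell from it alone which of the two alleles sits on the shared segment.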

I then did a Humpty Dumpty Mono-to-Mono merge on the old version of Grammy’s paternal DNA and the new one that I just made. This brand-new third kit has all the same DNA segments and SNPs as the first one did, but with fewer no calls. I then repeated this process with the DNA that I had reconstructed from Grammy’s maternal side.

Next, I took the brand-new paternal and maternal kits and merged those together using Humpty Dumpty Merge Option 2: Mono to Stereo. The result was the same as the first time I tried it, but now I had only a 9% no call rate. 9% no calls is much better than 20%. I do not know how many no calls are allowed before most companies will reject the kit, but I suspect that the maximum amount allowed is about 10%. With GEDmatch it is a lot easier than most other companies. With GEDmatch you need to have 87.5% coverage and I think you can have as high as 40% no calls.

Most of you will probably not have living siblings of the person you are trying to reconstruct. I ran the Phoenix tool on siblings, but it is not necessary that you have siblings in particular. Anyone who shares DNA with Grammy would have worked, even if they were only a first or second cousin. The closer the relationship the better, because closer relatives share more segments, but use what you can.

This is as far as I was able to get using Borland Genetics. I had managed to solve the no call problem, but I still needed to include Grammy’s mtDNA and format the kit correctly. These two issues were solved using the desktop version of Borland Genetics and a program called DNA Kit Studio.

Quality Control: Possibly Redoing Everything

At this point I took the reconstruction and put it into GEDmatch to make sure it matched all of her relatives as expected. If perfect, the kit should be a half-match to all of her children across the whole genome. This was mostly the case, but there were a few little spots where Grammy’s DNA did not match her children. These were places where despite my best efforts, the chromosome maps I had made for Grammy’s children were slightly incorrect. Every time I have done a DNA reconstruction project, I have found slight little errors. It is normal and does not mean you did a bad job.
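
The underlying check is simple to state: a true parent and child must share at least one allele at every SNP. A simplified Python sketch of that idea is below. It works SNP by SNP, whereas GEDmatch compares segments and tolerates occasional mismatches, so treat it as an illustration only; the function name `mismatches`, the genotype strings, and the rsIDs are made up.

```python
NO_CALL = "-"

def mismatches(parent, child):
    """Return SNPs where parent and child share no allele (impossible for a true parent).

    Both kits map rsid -> two-letter genotype string such as "AG"; "--" is a no call.
    """
    bad = []
    for rsid, p in parent.items():
        c = child.get(rsid)
        if c is None or NO_CALL in p or NO_CALL in c:
            continue  # cannot judge this SNP
        if not set(p) & set(c):
            bad.append(rsid)
    return bad

# Made-up genotypes for illustration.
reconstruction = {"rs0001": "AG", "rs0002": "CC", "rs0003": "--", "rs0004": "TT"}
child = {"rs0001": "GG", "rs0002": "TT", "rs0003": "AA", "rs0004": "CT"}

print(mismatches(reconstruction, child))
```

Clusters of mismatching SNPs in one region are the signal that a chromosome map boundary was drawn in the wrong place there.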

Because I wanted my kit to be perfect, I chose to examine those segments and figure out what had gone wrong, and then redid my maps and all of the previous steps several times until everything was perfect. But that is a lot of work, and depending on how big and frequent the nonmatching segments are, you could choose to just move forward anyway.

The biggest headache for me in this case was the left side of chromosome 8. Child 2 has a tiny 1.1 cM segment from Joyce and Child 7 has a tiny 2.5 cM segment from Vern AND both are in nearly the exact same spot. It is extremely unusual for a child to have recombinations this close together. You will usually not inherit so small of a segment from your grandparents unless they are at the tip of the chromosome. This case was highly unusual.

The Chameleon Tool

Borland Genetics was originally created as a computer program. The online version launched in November 2019. There are a few tools that were present in the desktop version that are not in the online version. The most important of these is the Chameleon tool. By trial and error, I have found that the best thing to do is to use the Chameleon tool to create a blank copy of whatever template you want. In other words, a kit that has all the right positions listed, but has a no call at every single point. Like this:

A blank template created by the Chameleon tool in Borland Genetics

I made this by using the Chameleon tool on Child 10’s DNA kit and an empty file I made in Notepad. I mapped the empty file onto the template of Child 10. I chose Child 10 because by choosing one of the five children that tested most recently, I would have the lowest no call rate.

Ancestry has changed their microarray ten times over the years, but they have not published dates for when each one would have been in use. This means:

  • Not only were Child 2 who tested with 23andme and Child 9 who tested with FamilyTreeDNA on totally different microarrays than the others, but
  • Children 3 and 7 who both tested with AncestryDNA on their own were also on slightly different templates.

However, all of the participants who tested in December 2022 are on version 10 of the microarray. Since the most test takers were on that template, choosing any one of them was the path to producing the fewest no calls.

DNA Kit Studio: RAW Extract

Next, I went to DNA Kit Studio and used the RAW Extract tool to extract the Mitochondrial DNA from one of Grammy’s children. In other words, I created a RAW data file that only had Grammy’s mitochondrial DNA. I chose Child 10 again just for consistency but any of the children who tested with AncestryDNA would have worked.

If you are trying to simulate a FamilyTreeDNA kit, then you can skip this step because FamilyTreeDNA’s autosomal (Family Finder) test does not test the mitochondrial DNA at all. If you want that information from them, you need to buy their Mitochondrial DNA test separately.

DNA Kit Studio: RAW Merger

The RAW Merger was the last tool I used. I first selected the blank file that I had made with the Chameleon, then I selected the best version of Grammy’s DNA that I had made (the one with 9% no calls), and then the one that only had her mitochondrial DNA and then I merged all three together. Before clicking the merge button, I made sure the Output Format Dropdown was set to “Ancestry” and that the checkbox called “Add not common SNPs” was NOT checked.

Including the blank Chameleon kit first on the list and not checking that box are both really important. When this tool merges kits, the first one is used as a base. The no calls in that kit are filled in with the values from the second kit and then the next one.

  • Putting this kit first ensures that anything that is still a no call stays as a no call instead of getting discarded.
  • Making sure “Add not common SNPs” is not checked matters because it affects what happens to SNPs that were in the reconstruction but not Child 10’s DNA from whom the template was made.
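
The merge behaviour described above can be pictured with this Python sketch. It is a model of the behaviour as described in this article, not DNA Kit Studio's actual code; the function name `priority_merge`, the `add_not_common` parameter, and the rsIDs are assumptions.

```python
NO_CALL = "--"

def priority_merge(kits, add_not_common=False):
    """First kit is the base; later kits only fill its no calls, in list order.

    Each kit maps rsid -> genotype string. SNPs missing from the base kit are
    dropped unless add_not_common is True.
    """
    merged = dict(kits[0])
    for kit in kits[1:]:
        for rsid, genotype in kit.items():
            if rsid in merged:
                if merged[rsid] == NO_CALL:  # an earlier call is never overwritten
                    merged[rsid] = genotype
            elif add_not_common:
                merged[rsid] = genotype
    return merged

# Made-up kits for illustration.
blank_template = {"rs1": "--", "rs2": "--", "rs3": "--"}
reconstruction = {"rs1": "AG", "rs2": "--", "rs9": "CC"}  # rs9 is not on the template
mito_only = {"rs3": "TT"}

print(priority_merge([blank_template, reconstruction, mito_only]))
print(priority_merge([blank_template, reconstruction, mito_only], add_not_common=True))
```

With the box unchecked, rs9 is left out and rs2 stays in the file as a no call, so the output has exactly the same list of SNPs as the template.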

My reconstruction had 106,232 extra SNPs that had been determined from Child 2 and Child 9’s 23andme and FamilyTreeDNA tests but are not tested by AncestryDNA and therefore were not in the raw data files of those who had tested with AncestryDNA. After removing these SNPs, the kit will better resemble an AncestryDNA factory kit.

Additional Option if Working with Full Siblings

There is one more step I was able to take because I had DNA samples from two of Grammy’s full siblings. This step only works if you have full siblings of the deceased and was not necessary, but it did reduce the number of no calls down to only 6% which is even better.

I compared my reconstruction of Grammy’s DNA against her brother in a chromosome browser and found all the places where their DNA was fully identical. I also did visual phasing on the three kits and determined that Grammy’s DNA was most likely fully identical to her siblings in a few of the gaps.

In particular, I determined that Grammy’s DNA was almost certainly fully identical to her sister’s where there is a gap in her paternal DNA on chromosome 12, and that Grammy’s DNA was almost certainly fully identical to her brother’s in the gaps on chromosomes 6 and 8. I extracted these segments from their AncestryDNA kits and used them in the merge.

To ensure those SNPs were included, I made sure to list the extracted fully identical kits from Grammy’s brother and sister as the second and third kits in the merge, the first still being the blank one. I put them high on the list because the merger prioritizes the SNPs from the kits higher on the list, meaning that if two kits ever disagree, the first value is the one that is used. This increased my coverage from 97% to 99%. Only the gap on chromosome 3 in Grammy’s paternal DNA and the gap on chromosome 10 in her maternal DNA remain.

Conclusion

In the end, I had a high-quality kit for Grammy that has 99% coverage and 94% quality and resembles an AncestryDNA factory kit. The two gaps that remain are entirely homozygous because each only has DNA from one side. If the kit is run in GEDmatch’s “Are Your Parents Related?” tool, those two areas will light up and the tool will conclude that Vern and Joyce were distantly related when in reality they were not.

Despite this, the kit is extremely high quality. I have successfully uploaded it into FamilyTreeDNA, GEDmatch, Living DNA, and MyHeritage. In all of those databases it has a match list as if Grammy had tested in life. It matches all of her children with a parent-child relationship and shares about twice as much DNA with Grammy’s matches as any of her children do. MyHeritage even predicts that she is predominantly English and Scandinavian and also correctly puts her in the following genetic groups:

  • Mormons in USA (Utah and Idaho) and in Canada
  • Georgia, Florida, South Carolina and Alabama
  • Mormons in Utah and in Idaho
A photo of Grammy

Many thanks to Tanner for this detailed account. I’ve tried to put explanatory links in the text for technical terms. For a succinct overview of the Borland Genetics Tools, you can visit the FamilySearch wiki. They also have a Facebook user group.

Further reading on genetic reconstruction

Contact info: @dnapainter / jonny@dnapainter.com