
It’s Bucketing at DNA Painter…

What is it? Bucketing is a new tool that lets you filter your DNA segments by using segment or match files from other known relatives.

Why would I use it? The tool helps you to see shared segments and accelerate the process of dividing your matches up by common ancestor.

What do I need in order to do this? As a minimum, you need your shared segments list and the match or segments list of another individual of interest.

Bucketing is a new DNA Painter tool that can help you subdivide your segments based on match or segment lists of other relatives. I announced the tool at the inaugural East Coast Genetic Genealogy Conference on April 23rd, 2022.

What is bucketing DNA?

Bucketing is a catch-all term to describe the process of organizing your DNA match segments into groups or “buckets” for specific ancestors. The Bucketing tool is an attempt to automate this process. It does this using match or segment lists of other relatives.

Buckets are often just maternal and paternal but can be more specific depending on the files you use. Please bear in mind that just as shared matches can include relatives who are related on other lines, bucketing in this way can introduce a certain amount of noise. Later in this article I’ll discuss some important caveats.

What do you need?

The Bucketing tool uses two or more files:

  • A comma-separated values (CSV) file of matching segments for an individual, exactly as provided by the testing company
    • I refer to this as the Main File
  • One or more other CSV files of matching segments or matches for relatives of this individual
    • I refer to these as Other Files
    • These other files should be from the same testing site as the main file

It then uses match identifiers to keep only the segments in the main file that belong to matches who also appear in at least one other file. You can also choose to do the reverse and keep only the segments for matches who do not appear in any of the other files.
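To make the idea concrete, here is a minimal sketch of this kind of identifier-based filter. It is not the tool's actual code. It assumes each file has already been parsed into a list of dictionaries, and the `match_id` key and the `bucket_segments` function are hypothetical names chosen for illustration (real testing-company files use their own column headings).

```python
def bucket_segments(main_segments, other_files, mode="include"):
    """Filter main-file segments by whether their match appears in any other file.

    main_segments: list of dicts, one per segment, each with a "match_id" key
                   (hypothetical field name, not a real testing-company column).
    other_files:   list of lists of dicts, each dict with a "match_id" key.
    mode:          "include" keeps matches found in at least one other file,
                   "exclude" keeps matches found in none of them.
    """
    # Collect every match identifier seen in any of the other files.
    known_ids = set()
    for rows in other_files:
        known_ids.update(row["match_id"] for row in rows)

    if mode == "include":
        return [seg for seg in main_segments if seg["match_id"] in known_ids]
    if mode == "exclude":
        return [seg for seg in main_segments if seg["match_id"] not in known_ids]
    raise ValueError("mode must be 'include' or 'exclude'")


# Made-up data: four segments in the main file, two other files.
main = [
    {"match_id": "A", "chromosome": "1", "cm": 22.5},
    {"match_id": "A", "chromosome": "7", "cm": 11.0},
    {"match_id": "B", "chromosome": "3", "cm": 35.1},
    {"match_id": "C", "chromosome": "X", "cm": 14.2},
]
half_brother = [{"match_id": "A"}, {"match_id": "D"}]
first_cousin = [{"match_id": "C"}]

shared = bucket_segments(main, [half_brother, first_cousin])
unshared = bucket_segments(main, [half_brother, first_cousin], mode="exclude")
```

In this made-up data, `shared` holds the segments for matches A and C, while `unshared` holds only the segment for match B. Because the comparison works on identifiers alone, the other files only need to identify each match, which is why a match list can stand in for a segment list there.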

For example:

  • I can use my father’s segments file as the main file, and his maternal half-brother, maternal first cousin and maternal first cousin once-removed as the other files
  • The Bucketing tool returns a segments file filtered to include just segments for the matches that also appear in at least one of the other files

(The main file will typically be your own segments list from your testing company. But because he has a better crop of tested relatives, I’ve used my father as an example. You do not have to have a parent tested to use this tool.)

You also don’t have to have any close relatives tested. As a minimum, you need your segment list and the segment or match list of another person of interest.

Getting hold of the other files

To obtain segment or match lists to use for bucketing DNA, you will generally need relatives to send you their files. If you happen to manage their tests, you can just download them.

Alternatively, if a relative has transferred their DNA to Gedmatch.com, you can use the Segment Search Tier 1 tool to download the segment list for their kit ID.

Bucketing tool walkthrough

Step 1: Select Main File

Once you log in, you’ll see the opening screen asking you to select your main file.

The initial view where you select the main file

Click ‘Select Main File’ and choose your segments file. This will be a CSV file. The tool can accept files from 23andMe, FamilyTreeDNA, Gedmatch, Geneanet or MyHeritage, as well as CSV files downloaded from DNA Painter chromosome maps. For information on where to find these files, you can click on ‘Where do I download the CSV of segment data’ or visit this help page. The file the testing company supplies may be zipped; please unzip it before using this tool.
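If you would rather unzip a download with a script than by hand, the following is one possible sketch using Python's standard `zipfile` module. The function name `extract_csvs` and the file and folder names in the usage note are placeholders, not anything the testing companies or DNA Painter prescribe.

```python
import zipfile
from pathlib import Path


def extract_csvs(zip_path, dest_dir):
    """Extract only the .csv members of a zip archive and return their paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Skip anything that is not a CSV (readme files, folders, etc.).
            if name.lower().endswith(".csv"):
                extracted.append(Path(archive.extract(name, path=dest)))
    return extracted
```

For example, `extract_csvs("my_download.zip", "unzipped")` would place any CSV files from the hypothetical `my_download.zip` into an `unzipped` folder, ready to select in the tool.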

Step 2: Select Other Files

You’ll now see the ‘Select Other Files’ box highlighted on the right.

Step 2, with other files selected

In my example I’ve selected three close matches to my father. Once you’ve selected at least one file, the Filtering options box with the submit button will appear below.

Step 3: Filter and submit

After you click ‘Bucket Matches,’ the tool will by default produce a filtered copy of the main file that:

  • Includes only segments for matches found in at least one of the other files
  • Removes all segments smaller than 10cM and then removes all matches who share less than 15cM
Filtering options

You can modify the settings to meet your needs. For example, if you’re working with your segments and your mother’s match file, you can exclude rather than include all matches in order to get just matches who you can infer are likely paternal. And if you would prefer to include smaller 6-10cM segments, you can select ‘Include all segments.’
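The default size thresholds can be pictured as a two-pass filter. The sketch below is an illustration under stated assumptions, not the tool's implementation: the `match_id` and `cm` keys and the `apply_thresholds` function are hypothetical names, and I am assuming that the per-match total is calculated from the segments that survive the first pass.

```python
from collections import defaultdict


def apply_thresholds(segments, min_segment_cm=10.0, min_match_cm=15.0):
    """Drop small segments, then drop matches whose remaining total is too low."""
    # Pass 1: remove every segment smaller than the segment threshold.
    kept = [s for s in segments if s["cm"] >= min_segment_cm]

    # Total the remaining cM for each match
    # (assumption: totals use surviving segments only).
    totals = defaultdict(float)
    for s in kept:
        totals[s["match_id"]] += s["cm"]

    # Pass 2: remove every match sharing less than the match threshold.
    return [s for s in kept if totals[s["match_id"]] >= min_match_cm]


# Made-up segments for four matches.
segments = [
    {"match_id": "A", "cm": 22.5},
    {"match_id": "A", "cm": 7.0},   # too small, removed in pass 1
    {"match_id": "B", "cm": 12.0},  # survives pass 1, but total is under 15
    {"match_id": "C", "cm": 10.0},
    {"match_id": "C", "cm": 10.5},
    {"match_id": "D", "cm": 8.0},   # too small, removed in pass 1
    {"match_id": "D", "cm": 9.0},   # too small, removed in pass 1
]

filtered = apply_thresholds(segments)
```

With these made-up numbers, match A keeps its 22.5cM segment, match C keeps both segments (20.5cM in total), and matches B and D drop out entirely. Lowering `min_segment_cm` in this sketch is a rough analogue of choosing ‘Include all segments’.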

Results screen

If the tool finds matches from the main file in at least one of the other files, you’ll see this success screen. This confirms the files and filtering options used, and includes a button that lets you download the filtered file.

The success screen in the DNA Painter tool for bucketing DNA

What can you do with the filtered file?

If you now import this file into a DNA Painter chromosome map, you will be able to assign these segments to a more specific group based on how you’re related to the people whose lists you used to bucket them.

Looking at my example:

  • In my father’s map, I might tentatively assign the segments for matches that also appear in the lists of his maternal relatives as maternal
  • However, I can’t assume that every segment is maternal, since some of these shared matches will be related to his maternal relatives via different paths

Once you’ve imported the file, you’ll likely find some bucketed matches that don’t seem to make sense (a topic I discuss more below).

The overlay you see when you click on a segment in a DNA Painter chromosome map

If you made the file using segments from 23andMe or MyHeritage, you can click on a segment and click directly through to their match page at the testing site. Reviewing shared matches in detail on the testing site can help you clarify these confusing cases.

Observations and caveats

In my dream scenario, every bucketed match from my example would be maternal for my dad. I would be able to assume that every person who matches him and also matches a maternal relative must also match him on his maternal chromosomes. In reality, this won’t be the case.

The main cause of noise

Even if you have bucketed your segments with a match on a known line, you may connect to some of the bucketed matches via a completely different path.

Some clear examples jumped out when I bucketed my own segments with my father’s maternal relatives. I might have logically expected all these bucketed matches to be via my English paternal grandmother. However:

  • Some of the matches who turned up were matches to my mother, not my father!
    • These were people who were related to me via my Irish mother, but also to my English relatives via some other unknown path
    • This occurred on 5 different chromosomes, far more often than I intuitively expected
  • Various other bucketed matches have Jewish ancestry and match me via my Jewish grandfather while also matching my father’s maternal relatives

False matches

Alas, not everyone who appears on your match list is really related to you. Some false segments are created when DNA from both copies of a chromosome is woven together so that it looks like a single matching segment.

Endogamous ancestry

Testers with endogamous ancestry will not be surprised to hear that this tool is about as helpful as shared matches. By this I mean not very helpful at all, sorry! I’m hoping to add additional filters in future that could improve it slightly (see below under ‘potential future developments’).

For those of us with part-endogamous ancestry, bucketing can be helpful for teasing out matches to whom we’re related via ancestors who are not from endogamous communities.

Other observations

Completely unrelated people may have shared matches

In the process of testing the tool, I used my father-in-law’s file with the file of my 2nd cousin. I was surprised to find that there were 17 bucketed matches, even though they are not related to each other.

But should I have been surprised? Thinking about it, the match and segment lists I used each contain more than 10,000 matches. Some of these matches may be connections from more than 20 generations ago. So it’s not actually surprising if there are people within that timeframe who share DNA with both of them.

It helps identify pileups more clearly

I now have a clearer identification of some potential pileup areas in my father’s maternal DNA. In this position on chromosome 1, there are 20 matches sharing a very similar segment, and I can’t so far see any connection in any of their trees.

An apparent pileup on my father’s maternal chromosome 1

It could be that the connection is there and I haven’t found it, or it could be that it’s very far back.
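If you want to hunt for possible pileups in a bucketed file yourself, one rough approach is to find the stretch of a chromosome covered by the largest number of segments. The sketch below is only an illustration: the `chromosome`, `start` and `end` keys and the `deepest_overlap` function are hypothetical names, it assumes every segment has an end position greater than its start, and it counts segments rather than distinct matches.

```python
def deepest_overlap(segments, chromosome):
    """Return (depth, start, end) for the most heavily overlapped region.

    Returns (0, None, None) if there are no segments on the chromosome.
    """
    events = []
    for s in segments:
        if s["chromosome"] == chromosome:
            events.append((s["start"], 1))   # a segment opens here
            events.append((s["end"], -1))    # a segment closes here

    # Sort by position; at equal positions, closes (-1) come before opens (+1)
    # so that segments which merely touch are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))

    depth = best_depth = 0
    best_start = best_end = None
    for i, (pos, delta) in enumerate(events):
        depth += delta
        if delta == 1 and depth > best_depth:
            best_depth = depth
            best_start = pos
            # The current depth holds until the next event position.
            best_end = events[i + 1][0]
    return best_depth, best_start, best_end


# Made-up positions: three segments overlap on chromosome 1.
segments = [
    {"match_id": "M1", "chromosome": "1", "start": 10_000_000, "end": 20_000_000},
    {"match_id": "M2", "chromosome": "1", "start": 15_000_000, "end": 25_000_000},
    {"match_id": "M3", "chromosome": "1", "start": 16_000_000, "end": 18_000_000},
    {"match_id": "M4", "chromosome": "1", "start": 30_000_000, "end": 40_000_000},
    {"match_id": "M5", "chromosome": "2", "start": 15_000_000, "end": 17_000_000},
]

depth, region_start, region_end = deepest_overlap(segments, "1")
```

Here the deepest region on chromosome 1 runs from position 16,000,000 to 18,000,000, where three segments overlap. A high count only flags a region worth a closer look. It does not say whether the matches share a recent common ancestor.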

Multiple relationships

Another case of something happening more often than I intuitively expected: I seem to be connected to several people via multiple paths. I found these people when comparing bucketed matches to my existing map. This effect may be accentuated in my DNA due to my ancestral background. As someone who is part Jewish, when I match others who are part-Jewish, there’s sometimes one ‘Jewish’ segment, and one English.

A learning opportunity

Much as I’d like things to be more clear-cut, I’ve learned interesting and unexpected things by exploring bucketing in this way, and I’m sure there’s more to discover.

Other bucketing methods

There are other options available for bucketing. These are some I’m aware of:

FamilyTreeDNA

FamilyTreeDNA have a system whereby:

  • You can identify matches up to 3rd cousin in your tree
  • They then identify other matches in your list who are on that side based on triangulations with the identified relative
  • This should have less “noise” than the DNA Painter tool

Family Locket has a helpful overview of this feature. I would love it if the other testing platforms did this.

There’s also already a method to import the FamilyTreeDNA bucketing information directly into your DNA Painter map. Roberta Estes has an overview of this in her blog.

Excel

I will concede that my experience of Excel is on a ‘need-to-know’ basis. But it is almost certainly not hard to make comparisons that filter a list based on another list. I’m sure some people reading this have already been doing just that for years. I just prefer programming, and after all, not everyone has Excel!

MatchCompare

Kent Jaffa came up with MatchCompare, an Excel macro that helps speed up the process. You just need to format the data into a pre-set template.

One key difference from the DNA Painter tool:

  • MatchCompare works with match lists (where there’s one row for each match)
  • With Bucketing at DNA Painter, you can use match lists as the secondary ‘other’ files, but the main file currently has to be a file of segments

Double Match Triangulator

Louis Kessler is a Canadian developer who has created Behold Genealogy as well as the Double Match Triangulator (DMT).

A piece of Windows software, DMT goes further than the DNA Painter Bucketing tool by triangulating segments between the files. This should cut out some of the noise I refer to above! DMT can also output a file ready to be imported into DNA Painter.

Thanks / Potential future developments

This is an experimental tool and I’m very keen to receive your feedback. Many thanks to Leisa Byrne for helping to test this initial release.

Some possible additions and developments that come to mind:

  • Being able to apply the segment and match thresholds to the other files as well as the main file
  • Having an upper cM threshold for matches

Please visit https://dnapainter.com/tools/bucketing and let me know how you get on.

Contact info: @dnapainter / jonny@dnapainter.com