I’m pleased to introduce Cluster Auto Painter (CAP), an early step towards automated chromosome mapping. CAP aims to help you dig deeper into your DNA test results by letting you annotate and examine clusters of matches in a chromosome map.
What is a cluster of DNA matches?
A cluster is a set of DNA matches who share DNA with each other as well as with you. On your DNA testing site, these are the matches who appear:
- In the ‘shared matches’ tab (AncestryDNA)
- When the ‘In common with’ filter is used (FamilyTreeDNA)
- In the ‘Shared DNA Matches’ section (MyHeritage)
- Within the ‘People who match one or both of two kits’ report (Gedmatch)
In 2018, genealogist Dana Leeds developed an intuitive colour-based method for building clusters (read more about this). This was an instant hit, so developers and testing sites began to create ways to automate this process.
Where can I generate clusters of DNA matches?
While you can still collate matches manually, there are now automated systems that will generate the clusters for you.
Single-database cluster tools:
- MyHeritage now has an AutoClusters feature. This is available to subscribers, to early-adopter uploaders, and to those who have paid a one-off $29 unlock fee.
- Gedmatch now has an implementation of auto-clusters in its Tier 1 subscriber ($10 a month) package.
Sites that can scan different databases and generate clusters for each:
- Genetic Affairs has its own web-based AutoCluster functionality. To use this, you add your credentials for each site. The service supports Ancestry, 23andme and FamilyTreeDNA.
- The DNAGedcom client, an application available for Windows and Mac computers, features a cluster tool called Collins Leeds Method.
- Finally, Shared Clustering is an excellent tool for gathering clusters from AncestryDNA; however, since AncestryDNA doesn’t provide segment data, you cannot use clusters of AncestryDNA matches from any provider to generate a chromosome map.
I hope in future to be able to add a way to store and make notes on AncestryDNA and other clusters within DNA Painter in a way that doesn’t involve segments.
How to use CAP
- Create a cluster at MyHeritage, Gedmatch, Genetic Affairs or DNAGedcom (see Where can I… above), and visit CAP at http://dnapainter.com/tools/cap
- Optionally specify the gender of the test taker or adjust the cM threshold for segments (the default is 7cM)
- Browse for your cluster file (unzip it first!) by following the on-screen instructions for the site you used to generate your clusters
- If your cluster file contains the segment data, you’re all set. If not, you will be prompted to browse for your segments file (full instructions appear on-screen!)
- Click the button to generate your chromosome map!
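To give a feel for what the cM threshold in the second step does, here is a minimal Python sketch of filtering segments by size. It is an illustration only, not CAP's actual code: the `Segment` fields, the `filter_segments` name and the choice to keep segments exactly at the threshold are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One shared segment (hypothetical field names, for illustration only)."""
    match_name: str
    chromosome: str
    start: int   # start position in base pairs
    end: int     # end position in base pairs
    cm: float    # segment size in centimorgans


def filter_segments(segments, min_cm=7.0):
    """Keep segments at or above the threshold (inclusive is an assumption)."""
    return [s for s in segments if s.cm >= min_cm]


segments = [
    Segment("Match A", "1", 1_000_000, 9_000_000, 12.4),
    Segment("Match B", "1", 2_000_000, 4_000_000, 5.1),
    Segment("Match C", "X", 10_000_000, 30_000_000, 7.0),
]

kept = filter_segments(segments)
print([s.match_name for s in kept])
```

Raising the threshold leaves fewer, larger segments in the map; lowering it lets in more small segments, which are more likely to be false matches.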
Additional features
- CAP can detect if you’ve used a phased maternal or paternal cluster file from Gedmatch. If this is the case, it will label all cluster groups as maternal or paternal accordingly in the chromosome map.
- You may have information on whether matches are maternal or paternal within FamilyTreeDNA or 23andme (for example, if known relatives have also tested there). If your Genetic Affairs autocluster file for a FamilyTreeDNA or 23andme kit includes this maternal or paternal ‘bucket’ information, CAP will use it to label the cluster groups as maternal or paternal in the chromosome map.
- If you’ve selected Paternal or XY, CAP will infer that there’s only one X chromosome and will use this info to mark any clusters with X segments as maternal.
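The X chromosome rule works because a test taker with XY chromosomes inherits their single X from their mother, so any match on the X must come through the maternal side. A rough sketch of that reasoning follows. The function name, the `"XY"`/`"XX"` labels and the decision to let existing phasing or bucket information take priority are assumptions for this example, not a description of how CAP is written.

```python
def infer_side(chromosomes, test_taker_sex, known_side=None):
    """Return 'maternal', 'paternal' or None for one cluster.

    chromosomes    -- set of chromosome names on which the cluster has segments
    test_taker_sex -- 'XY' or 'XX' (hypothetical labels)
    known_side     -- side already known from a phased file or bucket info
    """
    # Assumption: information that is already known takes priority.
    if known_side in ("maternal", "paternal"):
        return known_side
    # With only one X chromosome, any X segment must be maternal.
    if test_taker_sex == "XY" and "X" in chromosomes:
        return "maternal"
    # Otherwise the side cannot be inferred from this rule.
    return None


print(infer_side({"3", "X"}, "XY"))
print(infer_side({"3", "X"}, "XX"))
```

With two X chromosomes the same inference isn't possible, so in this sketch the cluster is simply left unassigned.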
How can making a chromosome map of your clusters help you interpret your DNA test results?
It’s early days! But here are a couple of ways that jump out.
Annotate
Clusters tend to be output in read-only formats. CAP allows you to annotate your clusters and start to make sense of them in a way you can come back to.
Just as in any other DNA Painter chromosome mapping profile, you can make notes at a cluster or segment/match level.
Identify pileups
Inevitably, some of your clusters will be more useful than others. Some of them may turn out to be ‘pileups’. These are regions within your DNA where you seem to have far more DNA matches than would be expected.
This generally means that any relationship that you have with these matches is likely to pre-date the genealogical timeframe. You should therefore devote your research time to other matches.
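One simple way to picture a pileup is to count how many matching segments overlap at each point along a chromosome and flag the stretches where that count is unusually high. The sketch below does this with a sweep over segment start and end points. The tuple layout, the `find_pileups` name and the `min_matches` cut-off are assumptions for illustration; what counts as "far more than expected" will vary by region and by testing site.

```python
from collections import defaultdict


def find_pileups(segments, min_matches):
    """Return (chromosome, start, end, peak_depth) for crowded regions.

    segments    -- iterable of (chromosome, start, end, match_name) tuples
    min_matches -- overlapping-segment count treated as a pileup (assumption)
    """
    events = defaultdict(list)
    for chrom, start, end, _name in segments:
        events[chrom].append((start, 1))   # a segment begins
        events[chrom].append((end, -1))    # a segment ends

    pileups = []
    for chrom, evs in events.items():
        # At equal positions, ends (-1) sort before starts (+1), so segments
        # that merely touch are not counted as overlapping.
        evs.sort()
        depth = 0
        region_start = None
        peak = 0
        for pos, delta in evs:
            depth += delta
            if depth >= min_matches and region_start is None:
                region_start, peak = pos, depth
            elif region_start is not None:
                if depth >= min_matches:
                    peak = max(peak, depth)
                else:
                    pileups.append((chrom, region_start, pos, peak))
                    region_start = None
    return pileups


example = [
    ("15", 20_000_000, 30_000_000, "Match A"),
    ("15", 22_000_000, 28_000_000, "Match B"),
    ("15", 24_000_000, 32_000_000, "Match C"),
    ("15", 25_000_000, 27_000_000, "Match D"),
    ("2", 10_000_000, 50_000_000, "Match E"),
]
print(find_pileups(example, min_matches=3))
```

In a real chromosome map you would spot the same thing visually as a tall stack of segments over one short region.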
Thank you
Many thanks to the following people who have helped tremendously as I’ve developed CAP:
- The DNA Painter beta testing group
- Evert-Jan Blom of Genetic Affairs and Rob Warthen/John Collins of DNAGedcom, who kindly added the segment data into their cluster files.
If you’ve read this far, I hope this feature has piqued your interest and that you’ll find it helps you interpret your DNA test results. To get started, just go to https://dnapainter.com/tools/cap. There’s now also a simple FAQ on that page.
I’m always keen to make improvements based on user feedback. Please email me if you have any suggestions.
Thanks!
Update: I’ve written a new post on annotating CAP chromosome maps
Contact info: @dnapainter / jonny@dnapainter.com