Article Title

Health Care Systems Research Network Twin Cohort and its Potential Utility

Publication Date



genetics, genomics, epidemiology


Background: Family-based studies have historically been considered a powerful strategy for understanding the etiologies of human diseases, especially those influenced by genetics. Twin studies are a gold standard of family-based study design because of the unique genetic relationships between twin siblings, but twins are relatively rare and difficult to recruit. Herein we demonstrate that electronic health record (EHR) systems across multiple Health Care Systems Research Network (HCSRN) sites can be used to identify twin families. We further show the utility of such twin cohorts in research using twins identified in Marshfield Clinic’s EHR.

Methods: Twins were predicted by searching for patients who shared a common birthdate and last name along with a common home address, contact information or billing account. The twin prediction algorithm was applied at four HCSRN sites: Marshfield Clinic, Group Health Cooperative, Geisinger Health System and Meyers Primary Care Institute. For Marshfield Clinic twins, clinical phenotypes were defined by diagnostic ICD-9 codes. For each phenotype, familial aggregation and relative risk (RR) were calculated by assessing disease concordance within twin families. To further assess potential genetic etiologies, we compared familial aggregation in opposite-sex twins (dizygotic twins) with that in same-sex twins (enriched for monozygotic twins).
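The matching rule described in the Methods can be illustrated with a minimal sketch. This is not the study's actual implementation: the record layout, the field names (`id`, `birthdate`, `last_name`, `address`, `phone`, `billing_account`), the case-insensitive name normalization and the function name `predict_twin_pairs` are all assumptions made for illustration. The sketch flags two patients as a candidate twin pair when they share a birthdate and last name and also agree on at least one linking field.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical linking fields; the abstract names home address,
# contact information and billing account without giving a schema.
LINKING_FIELDS = ("address", "phone", "billing_account")


def predict_twin_pairs(patients):
    """Return (id, id) pairs sharing birthdate and last name plus
    at least one non-empty linking field."""
    # Group patients by birthdate and a normalized last name (assumption:
    # names are compared case-insensitively, ignoring surrounding spaces).
    groups = defaultdict(list)
    for p in patients:
        key = (p["birthdate"], p["last_name"].strip().lower())
        groups[key].append(p)

    pairs = []
    for members in groups.values():
        for a, b in combinations(members, 2):
            # Require agreement on at least one linking field; missing
            # values never count as a match.
            if any(a.get(f) and a.get(f) == b.get(f) for f in LINKING_FIELDS):
                pairs.append((a["id"], b["id"]))
    return pairs


# Made-up example records, not study data.
patients = [
    {"id": 1, "last_name": "Smith", "birthdate": "2001-04-05",
     "address": "12 Oak St", "phone": "555-0100", "billing_account": "A1"},
    {"id": 2, "last_name": "smith ", "birthdate": "2001-04-05",
     "address": "12 Oak St", "phone": None, "billing_account": "A1"},
    {"id": 3, "last_name": "Smith", "birthdate": "2001-04-05",
     "address": "9 Elm Rd", "phone": "555-0199", "billing_account": "B7"},
    {"id": 4, "last_name": "Jones", "birthdate": "2001-04-05",
     "address": "12 Oak St", "phone": "555-0100", "billing_account": "A1"},
]

twin_pairs = predict_twin_pairs(patients)
print(twin_pairs)
```

In this toy input only patients 1 and 2 are paired: patient 3 shares the name and birthdate but no linking field, and patient 4 shares the address but not the last name. A rule like this would also pair unrelated people who happen to share all of these attributes, and would emit several pairs for triplets, so any real use would need further checks.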

Results: A total of 21,699 families of twins (43,398 individuals) were identified across four HCSRN sites, including 8,242 families of twins from Marshfield Clinic’s EHR. Of the 5,598 phenotypes assessed by familial aggregation analysis, 1,222 showed statistically significant aggregation (P < 8.9E-6). Across all phenotypes, 91% had a relative risk > 1. For phenotypes with the largest relative risks, disease concordance was enriched 4.2-fold in same-sex twins compared to opposite-sex twins. Many of these phenotypes were likely influenced by genetic factors.
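The significance threshold above can be checked with a short calculation. The abstract does not say how the threshold was chosen; the sketch below assumes a Bonferroni correction at a family-wise alpha of 0.05 across the 5,598 phenotypes, which is an inference from the numbers rather than a stated method. The variable names are illustrative.

```python
# Assumption: family-wise alpha of 0.05, Bonferroni-corrected across
# the 5,598 phenotypes reported in the Results.
alpha = 0.05
n_phenotypes = 5598
n_significant = 1222

threshold = alpha / n_phenotypes
fraction_significant = n_significant / n_phenotypes

print(f"{threshold:.1e}")             # 8.9e-06
print(f"{fraction_significant:.1%}")  # 21.8%
```

Under that assumption the computed threshold matches the reported P < 8.9E-6, and the 1,222 significant phenotypes correspond to roughly 22% of those tested.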

Conclusion: This study has generated one of the world’s largest cohorts of twins. Unique to this population is the linkage to extensive phenotypic data through EHRs across multiple health care institutions. More broadly, because a substantial proportion of diseases aggregate in families of twins, these results highlight the potential benefit of incorporating family data when predicting, preventing and treating many diseases for the advancement of precision medicine.




June 26th, 2017


August 10th, 2017