'A hybrid of Big Tech and big medicine': Truveta’s plan to make health AI that actually works
Former Microsoft executive Terry Myerson's new health data venture aims to apply big data to patient care with "an alliance of health systems."
If health AI were a patient at a hospital, its chart up to this point wouldn't look too promising.
Its symptoms are long-standing and chronic: a lack of interoperability, a dearth of equitable data sets and a difficult-to-navigate relationship with patient privacy. And the specialists who have taken a crack at treating it, from Big Tech and insurance giants to AI and cloud companies, have largely come up short on patient-care tools with broad utility across the medical field.
But Terry Myerson, the CEO of the newly announced collaborative health data venture Truveta, said he believes the prognosis could be starting to turn the corner.
The former head of Windows at Microsoft, Myerson has assembled a team meshing Silicon Valley talent and health care expertise that, from the get-go, is approaching health data from a unique angle. Like its funding, Truveta's data will be sourced from 14 health care systems that span the country and account for tens of millions of patients.
With the scale and diversity of the data, along with buy-in from providers, Myerson sees Truveta as uniquely positioned to use AI to structure the data and build a database that has clinical and research value, all while protecting the privacy of patients by relying on the ethics and governance procedures already built into the health networks' policies.
"The most important thing to all the health systems that are in this really has come down to governance," Myerson said. "This is their patients' data. They don't want this to be used for marketing drugs to sell the wrong drug to the wrong patient."
In an interview with Protocol, Myerson explained Truveta's new approach to data collection and analysis in medicine, how the company addresses algorithmic bias, how it's protecting privacy in the data and how it's creating "an alliance of health systems."
This interview has been edited and condensed for clarity.
You've already managed a feat in bringing together 14 somewhat unlikely partners with Truveta. How did the company come to be?
How far back do you want to go? I was pre-med in college, but then fell in love with computers and ended up pursuing a career that was in computer software. In 1994, I remember I was actually working in the Washington, D.C. area, working with the National Institutes of Health on scientific visualization. I got exposure to the internet for the first time, fell in love, started a company to do internet work and then in 1997, Microsoft acquired that company. I thought I was going to stay a couple of weeks, but ended up staying 21 and a half years. When I left in 2018, I joined a local venture capital firm named Madrona Venture Group, which gave me the opportunity to really rekindle this interest, and I became absolutely fascinated by the intersection of data sciences and life sciences.
I saw just so much opportunity for innovation to bring computing and bring big data techniques to everything: to discovery of new biomarkers, the design of proteins, the diagnosis of diseases, reading X-rays. But there was this common trend in all of this big data work — nobody really had access to good data.
How did the pandemic play into it?
One of my old colleagues from Microsoft had become the chief information officer at Providence [one of the health systems working with Truveta] and said, "Hey Terry, I know you're super interested in this space. There's this little virus floating around. Why don't you come down and help us out and be part of the virtual team?" It was eye-opening to me. I saw that they were not able to ask and answer some really fundamental questions: Who should be intubated? How long should they be intubated for? The questions just went on and on, and the data was just not organized in a way for them to ask and answer those super important questions.
I watched health systems working so hard to provide data to regulatory authorities to create all these dashboards. Every city, state, county, federal agency, everyone was now creating a [coronavirus] dashboard, and they were all competing for the coolest controls and needing new data sets to come out for these. And there was this question for me. We've got all of these great COVID dashboards. My dad died of Crohn's disease, colon cancer; where's the dashboard on that? I had ACL surgery; where's the dashboard on that? My wife's best friend just had breast cancer; where's the dashboard on that? And the answer was, well, there isn't one.
That's when I learned about this project called Truveta, which the Providence team had actually written white papers on back in 2018 [saying] we need to build a national data alliance of health systems. We need to put our data together so we can learn about it at scale and govern the ethics of this data so that it's all used for the benefits of patients. And I'm like, "This is a really good idea. We should make this thing real."
You're talking about a ton of data, some of it surely structured better than others. What does the process of getting it to something workable in a clinical or research environment entail from a data science perspective?
We're actually quite fortunate. The academic community has done some amazing work to structure and model human health. It's not used in the production [electronic health record] systems today, but if you look at SNOMED or LOINC or RxNorm, just three very common ontologies you find broadly used in academia, there's some really great work out there to build a model for health concepts. And we're going to build off that amazing work.
I feel like we're standing on the shoulders of giants in the academic community that have done brilliant work, sometimes decades of brilliant work, to try and model human health and the relationships between health concepts. And we get to apply that work.
Apply it how?
In that case, I was talking about bringing structure to all of this unstructured data. And it has two purposes: One, it enables it all to be brought together in one normalized sense. But really you want to query it. If you want to look for people with certain side effects, certain outcomes, certain cohorts of a disease, you're going to be querying on these concepts.
We're not the first ones to try to model cancer. We want to build off the work in the academic community to model cancer. And our team consists of ex-Amazon, ex-Google, ex-Microsoft, ex-Facebook [people] and a bunch of clinically trained people. We're this beautiful blend of clinical training and technology, but we're bringing all the hard work we learned trying to build global scale unstructured data systems to this new domain. We're this interesting hybrid of Big Tech and big medicine.
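[For readers curious what "querying on these concepts" could look like in practice, here is a minimal Python sketch. It is an illustration only, not Truveta's system: the concept IDs are made-up placeholders rather than real SNOMED, LOINC or RxNorm codes, and the record layout and the names `TERM_TO_CONCEPT`, `normalize` and `find_cohort` are hypothetical.]

```python
# Hypothetical mapping from free-text terms to normalized concept IDs.
# Real ontologies have their own codes; these IDs are placeholders.
TERM_TO_CONCEPT = {
    "epilepsy": "C:EPILEPSY",
    "seizure disorder": "C:EPILEPSY",
    "covid": "C:COVID19",
    "covid-19": "C:COVID19",
    "sars-cov-2 infection": "C:COVID19",
}


def normalize(terms):
    """Map raw terms to concept IDs, dropping terms with no known mapping."""
    concepts = set()
    for term in terms:
        concept = TERM_TO_CONCEPT.get(term.strip().lower())
        if concept is not None:
            concepts.add(concept)
    return concepts


def find_cohort(records, required_concepts):
    """Return IDs of records whose concepts include every required concept."""
    required = set(required_concepts)
    return [r["id"] for r in records if required <= normalize(r["terms"])]


# Toy records: differently worded notes that describe the same conditions.
records = [
    {"id": "p1", "terms": ["COVID-19", "Seizure disorder"]},
    {"id": "p2", "terms": ["covid"]},
    {"id": "p3", "terms": ["Epilepsy", "SARS-CoV-2 infection", "asthma"]},
]

cohort = find_cohort(records, ["C:COVID19", "C:EPILEPSY"])
print(cohort)
```

[The point of the sketch is that once differently worded records map to the same concept, one query can find them all.]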
Health equity was a big part of your launch announcement. Health AI in the past has been the subject of criticism about algorithmic bias as it relates to gender, race, ethnicity and socioeconomic status. What's your plan to combat bias in the models you're building?
I think there's two things. First is just transparency. Truveta is being very transparent about where the data comes from, and we're going to be transparent about the demographics of any given data set anyone's doing work with and of the covariance of that data set versus the population. You'll know going in what the population you're reasoning against is, relative to the population.
Most of the data sets out there today are completely opaque as to where the data came from and the population inside the data set. To me, not only does that put unknown bias in, it fundamentally lacks scientific integrity. That transparency hopefully builds trust and helps us make reasoned decisions about whether we are unintentionally creating any bias by putting a data set in that's not demographically representative.
The second thing is we have scale. If you're going to create a data set to reason against, you want it to be statistically representative of the population, but unless that data set has scale, you can't finely cohort on it. Say I'm looking for people that have COVID and epilepsy. You better have enough data, such that when you look at that specific population, it can still be representative. That combination of scale and transparency about where the data comes from and the demographic population of it will shine a light on what we're reasoning against, what we're making policy decisions against, what we're creating algorithms for.
There was an article in STAT a couple of weeks ago about how all of these medicines have been approved with data sets where you don't know where they're from and don't know what populations are represented in them. And these are things that have been FDA-approved or CMS reimbursement-approved. To be fair, it may not be the fault of the innovator that they use those data sets because that's all they have today. But the inability to share information about the data and where it came from is undoubtedly creating unintended biases in these tools. With Truveta, we hope to move past that, where data sets can be reproduced, populations are representative and we have full transparency as to who we're reasoning about. It doesn't solve inequities, but I think it moves us really down the field in terms of having a better foundation to think through them.
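[The two checks Myerson describes, demographic transparency and sufficient cohort size, can be sketched roughly in Python. This is an assumption-laden illustration, not Truveta's method: the group labels, the population shares, the `min_size` threshold of 30 and the function names are all hypothetical.]

```python
from collections import Counter


def group_shares(values):
    """Return each group's share of the data set as a fraction of the total."""
    counts = Counter(values)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def representation_gaps(dataset_groups, population_shares):
    """Data set share minus population share for each population group.

    Positive means over-represented, negative means under-represented.
    """
    shares = group_shares(dataset_groups)
    return {g: shares.get(g, 0.0) - p for g, p in population_shares.items()}


def cohort_large_enough(cohort_size, min_size=30):
    """Crude size check; the threshold is an arbitrary placeholder."""
    return cohort_size >= min_size


# Made-up example: 100 records across three hypothetical groups.
dataset_groups = ["A"] * 70 + ["B"] * 20 + ["C"] * 10
population_shares = {"A": 0.60, "B": 0.25, "C": 0.15}

gaps = representation_gaps(dataset_groups, population_shares)
for group, gap in sorted(gaps.items()):
    print(group, round(gap, 2))

print(cohort_large_enough(12))
```

[Publishing gaps like these alongside a data set would let researchers see in advance which groups are over- or under-represented.]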
The flip side of transparency is privacy. You're dealing with a public that's generally uneasy about public health data, to the point that what we saw play out with Google and Ascension last year opened up new questions about that line between what's legal and what's publicly accepted. With that in mind, how does privacy play into Truveta's product design?
It's something where we need to keep learning. When you do something new, you're always going to have new things to learn. Foundationally, because we're health system-led, everything we're doing is compliant with all state and federal regulations for health policy rules. And there are a number of health data organizations out there that are not HIPAA compliant, but we are. In everything we're doing, from notifying patients through de-identification, being HIPAA compliant is top of mind.
Then, there is this notion of this being the patient's data, and we've got to reason about that and do the right thing. We're continuously thinking about what the right way to inform patients is. We're thinking about the greater good, and we recognize that trust with patients is paramount to all of our partners and, having done new things before, we're going to learn and continuously improve.
Looking ahead, what's the roadmap look like for the rest of the year for the company, and what part has you most excited?
We hope to bring up the platform this year. Calendar 2021 is our year of building and bringing this thing to life. Most of our team are actually engineers — we've got a whole lot of people building this platform to try and bring it to life.
And I just feel like we're going to shine the light on so many new things that just haven't been seen before. It's exciting to work on something that could have such a big impact on humanity, on the people I care about in my life. You just don't get the opportunity very often. When I worked on Windows, I thought this is impacting a billion people around the world, but this, at least to me, has become so much more meaningful when I think about what we could do.