Medical records remain a walled garden mostly because of HIPAA. Unlike the rest of AI development, which has managed to skirt copyright law to train large models, training a 1T+ parameter transformer on medical data would require a lot of consumers to give up their medical records.
I've noticed a funny trend where people will bring up an ethical concern and some dweeb will immediately roll out the Chinese strawman like it justifies their incredibly dubious position.
"We should violate medical privacy laws on an enormous scale because China" Okay buddy.
I wasn't suggesting that we should, but in my lifetime we went from hospital admissions and discharges being published in the local paper, to that information being some of the most closely guarded, with (theoretically) severe penalties for disclosure.
I think it was AIDS and people not wanting to be outed that changed things.
But whatever the reason, we now have HIPAA and it adds a huge amount of complexity to the management of medical records, and a huge amount of complexity to any research that needs access to a broad spectrum of medical case histories (such as training AI). Other countries don't have these concerns, and will outcompete the US on developing these capabilities.
Concerns about HIPAA as an obstacle to research are largely a red herring. Almost no clinically useful research can be accomplished by just processing large volumes of historical patient charts. In practice the researchers will have to gather additional data from study subjects so they're going to need to obtain consent anyway.
People keep claiming that other countries will outcompete us in medical research, and it keeps not happening.
Yes, but you can train relatively dumb AI on high volumes of historical data and presenting factors. This is typically just a "diagnostic assist", but it's often significantly better than a human.
I did work on a diagnostic assist tool developed by a large pharma that distinguished asthma from COPD. GPs get this right 52% of the time, specialists just over 60%, and the AI came in over 80%, using 12 relatively mundane input variables. I believe there are a lot of these situations, but no clear pathway to FDA approval.
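To make concrete what that kind of tool looks like: it's essentially a tabular classifier over a dozen routine presenting factors. The sketch below is an assumption-laden illustration, not the actual pharma tool; the feature names, the synthetic data, and the gradient-boosting model are all stand-ins for whatever the real system used.

```python
# Hypothetical sketch of a diagnostic-assist classifier (asthma vs COPD)
# trained on ~12 mundane tabular inputs. All data here is synthetic;
# real work would use de-identified historical charts.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Stand-in for 12 routine variables (age, pack-years, onset age,
# FEV1/FVC ratio, wheeze, etc. -- hypothetical feature set).
n = 5000
X = rng.normal(size=(n, 12))

# Fake label: COPD (1) vs asthma (0), loosely driven by a few features
# so the model has something to learn.
logits = 1.5 * X[:, 0] + 1.0 * X[:, 1] - 0.8 * X[:, 2]
y = (logits + rng.normal(scale=1.0, size=n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

clf = GradientBoostingClassifier(random_state=0)
clf.fit(X_train, y_train)

print("accuracy:", clf.score(X_test, y_test))
print("AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```

The point is that nothing here needs a frontier-scale model; the hard parts are getting clean labeled histories at volume and then getting the result through regulatory review.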
If we become worse off than China, it's because we squandered a massive decades long lead and actively moved critical industrial processes over there, not because we didn't let AI companies scrape HIPAA data.
The Red Scare never ended for a whole lot of Americans. And my foot is more communist than modern China is. But you know, being afraid of Communism is just as easy, if not easier, when you have no clue what Communism actually is.
If copyright stayed at 7+7, I would agree that AI companies were "skirting" copyright law, but as I see it life + 70 violates the "limited" terminology in the Constitution by denying people the right to create derivative works.
IP holders did this to themselves by extending the term beyond any reasonable interpretation of "limited".
Hell, Jack Valenti wanted to extend it to "forever minus a day"
IMO, copyright law in its current form is unconstitutional and unnatural.
Epic probably has enough breadth on their own to do it, although they don’t actually have an image display system that I know of, so perhaps they don’t have access to the raw images - only the written reports.
A summary on HHS.gov says: "De-Identified Health Information. There are no restrictions on the use or disclosure of de-identified health information." Maybe someone with more knowledge could expand on the limitations of what counts as "de-identified", but I think that might work. I followed the reference and nothing in the CFR jumps out at me, but I'm not a lawyer, so who knows.
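For what it's worth, 45 CFR 164.514 gives two routes: Expert Determination, or Safe Harbor, which means removing 18 categories of identifiers (names, geographic subdivisions smaller than a state, most date elements, contact details, record and account numbers, and so on). Here's a minimal sketch of what Safe Harbor-style field stripping looks like, assuming a hypothetical record layout and field names; real charts are full of free-text notes, which is where this gets much harder.

```python
# Minimal sketch of HIPAA Safe Harbor-style de-identification: drop the
# identifier categories listed in 45 CFR 164.514(b)(2) from a record.
# Field names and the record layout are illustrative assumptions.
from datetime import date

# Subset of Safe Harbor identifier categories, keyed to how this
# hypothetical record happens to name them.
DIRECT_IDENTIFIERS = {
    "name", "street_address", "phone", "fax", "email", "ssn",
    "mrn", "health_plan_id", "account_number", "license_number",
    "vehicle_id", "device_id", "url", "ip_address", "biometric_id",
    "photo",
}

def deidentify(record: dict) -> dict:
    out = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            continue  # direct identifiers are removed outright
        if key == "zip":
            # Safe Harbor allows only the first 3 digits, and only when
            # that zip3 area has a large enough population.
            out["zip3"] = value[:3]
        elif key == "birth_date":
            # Dates must be reduced to year; ages over 89 get aggregated.
            age = date.today().year - value.year
            out["age"] = "90+" if age > 89 else age
        else:
            out[key] = value
    return out

record = {
    "name": "Jane Doe",
    "mrn": "12345678",
    "zip": "90210",
    "birth_date": date(1950, 6, 1),
    "diagnosis": "COPD",
    "fev1_fvc": 0.62,
}
print(deidentify(record))
```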