The heart rate monitor built into the new Apple Watch has sparked sharp debate over its risks and benefits, even though the feature was cleared by the Food and Drug Administration.

But out of the spotlight, the FDA has been doing away with regulatory action altogether on many diagnostic health apps targeting consumers, seeking to accelerate digital health adoption by defining many of these as “low risk” medical devices.

As the number of mobile health apps surged to a record 325,000 in 2017, app performance has gone largely unpoliced, leading to what's been dubbed a "Wild West" situation. Unfortunately for health consumers, the public can't rely on the research community to play the role of sheriff.

When colleagues and I recently examined the medical literature on direct-to-consumer diagnostic apps in a study published in Diagnosis, we repeatedly found studies marred by bias, technological naïveté or a failure to provide crucial information for consumers. There was also a glaring lack of studies with actual consumers to see how they use these apps and what the impact on individual health, whether for better or worse, might be.

The app will see you now?

Interactive diagnostic apps now go well beyond "Dr. Google" keyword searches. They promise personalized information on whether a nagging symptom can likely be relegated to self-care or whether a visit to the doctor's office, or even the emergency room, may be needed. Some of these apps have become so popular that they have been downloaded tens of millions of times.

To understand whether the promise of these apps is backed up by the evidence, we searched both the peer-reviewed literature and nonacademic sources. The disturbing unreliability of that evidence for the average consumer is starkly visible when you consider apps that "advise" (a carefully chosen word) whether you might have skin cancer.

There are hundreds of cancer-related apps. Perhaps because melanoma rates have been rising for decades and it’s one of the most common young adult cancers, the largest group of articles we found focused on dermatology apps. One of the most prominent is Skin Scan.

If you're a physician or a reasonably savvy consumer, Google Scholar provides the easiest access to evidence-based information. One of the first results that pops up is a 2013 article entitled "Skin Scan: A demonstration of the need for FDA regulation of medical apps on iPhone." If that title suggests a certain lack of objectivity, the problem isn't limited to dermatology. We also found an orthopedist examining whether a symptom checker could "guess" the right diagnosis, and an ear, nose and throat doctor investigating whether an app could diagnose his own patients as well as he could.

That Skin Scan study sounding the alarm on regulation warned of a substantial potential for harm. Yet a separate study of the same app, published online two years later, was much more positive. Had the app's developers made improvements in the meantime, or did the difference arise because the first researchers used their own photos of skin growths while the second group used images captured with the smartphone?

The answer is unclear. More broadly, however, researchers often seemed unaware of the impact of basic technological distinctions, such as whether an app relied on users' answers to questions, on answers "crowdsourced" from other people, or on inputs from a smartphone's camera and sensors.

More troubling was researchers' lack of understanding of the public's pressing need for reliable information. For instance, a study of four smartphone apps found that their sensitivity in detecting malignant skin lesions ranged from 7 percent to 98 percent. Yet the researchers chose not to identify any of the apps by name. Similarly, few studies mentioned cost (CrowdMed, for example, charges users a minimum of $149 per month), and those that did sometimes gave only a price range for a group of apps.

With scientific evidence sparse, consumers are left to rely upon online reviews – which, as a just-published study of popular blood pressure apps warned, can be dangerously wrong.

Or there’s always a random web search.

In the case of Skin Scan, my search found that in July the company that developed the app reported a melanoma detection sensitivity of 96 percent. That “report,” however, was part of a trade publication interview with SkinVision CEO Erik de Heus as the company announced it had raised another $7.6 million from investors.

Three years ago, a National Academy of Medicine report on diagnostic error called upon professionals to direct patients to reliable online resources. However, we found that the search terms used by the National Library of Medicine's PubMed Life Sciences search engine have lagged behind the digital health revolution, and medical journals do a hit-or-miss job of even indexing every app mentioned in an article. England's National Health Service has launched an Apps Library to cut through the confusion, but there's no similar resource in the United States.

Is there a way to bring some order, if not law?

Some web-savvy researchers at sites like iMedicalApps are advising physicians about apps they can use themselves or can confidently recommend to their patients. Others trying to bring law and order to the wide-open health app field have suggested various frameworks, such as combining stakeholders' expertise in collaborative health app rating teams. The goal would be to get innovators, policymakers and evidence-generators to jointly help corral confusing and contradictory information.

And as the debate over using Apple Watch data to measure heart health shows, FDA approval alone doesn't remove the risk of consumers jumping to the wrong conclusion about what the information they're receiving actually means. Still, as health apps move out of their pioneering stage and into the medical mainstream, the health of the American public requires apps and devices we know we can trust.

Michael L. Millenson is Adjunct Associate Professor of Medicine, Feinberg School of Medicine, Northwestern University
This article first appeared on The Conversation