Dean Ball is a member of the board of directors of The Alexander Hamilton Institute for the Study of Western Civilization (AHI) and a research fellow at the Mercatus Center, George Mason University. He publishes "Hyperdimensional," an online newsletter on artificial intelligence (AI). In his most recent piece, he explores "Synthetic Data in AI: Implications for Policy." Mr. Ball observes that one "long-term bottleneck" that might limit advances in AI is simply that training these systems requires enormous amounts of data.

A potential solution, Mr. Ball notes, lies in using AI-generated data to train future AI systems. This way of expanding the data supply is an especially active research area in artificial intelligence, and it "has already shown great promise." Dario Amodei, CEO of Anthropic (a major AI company), said in a recent broadcast interview that "synthetic data" could be employed to create an "infinite data generation engine." Synthetic data has already been put to significant use, for example to simulate real-world situations in the field of robotics. Mr. Ball believes that major further progress in the use of synthetic data would have large implications both for our ability to forecast future developments in AI and for the policymaking related to it.

An especially serious challenge for synthetic data as a technique is a danger known as "model collapse," recently identified and widely noted in the mainstream media, which can result from training on too much synthetic data. In model collapse, AI produces highly repetitive or otherwise poor-quality results, much like "using your smartphone to take a picture of something on a computer monitor."

We do not yet know how serious an obstacle this will be. “The model collapse phenomenon is undoubtedly real,” Mr. Ball writes, “but what that means for the usefulness of synthetic data … is still an open question.”