James Gosling Says to Ask One Question About Generative AI

"What's my tolerance for error?"

Innovation depends on clarity. A well-articulated problem or an incisive question can lead to groundbreaking invention. And yet, according to James Gosling, a Distinguished Engineer at Amazon, our terminology for generative artificial intelligence is anything but clear.

I recently had the chance to discuss this emergent technology with Gosling — a seasoned computer scientist who, among many other accomplishments, invented the programming language Java — and found his thinking on the topic to be clear and practical.

Perhaps the most important question developers should be asking about generative AI, he says, is a surprisingly simple one: “What is my tolerance for error?”

What Generative AI Can’t Replace (Yet)

One key concern for Gosling is language. “[People] take that phrase, artificial intelligence,” he says, “and... just leap to the wrong conclusion.”

We use the word “intelligent” — which implies reasoning, rationality, ethics, and intellect — to characterize applications and models with few or none of those features. We say machines “learn,” when their data ingestion and synthesis processes hardly resemble the continuous, unstructured noticing, remembering, associating, and ideating that constitutes human learning.

In short, Gosling argues, “artificial intelligence [as a phrase] implies something that is false.”

More technical folks might feel immune to the confusion these terms create: laypeople depend on these high-level definitions, but surely we understand the machinery behind the magic. Do we really, though?

The hype over generative artificial intelligence in recent years has seemingly outpaced its real-world implementations. Large language models (LLMs) are helpful and interesting, certainly, but they make mistakes. Text-to-image models spit out pictures of elephants with three tusks or five legs. OpenAI’s recently announced Sora creates videos where human hands still look nothing like human hands. Chatbots simulate sentience, even as they imitate the most toxic personalities on the internet. In short, these tools can’t fully replace our current processes for learning, creating, and interacting online, at least not yet.

While we can theorize about the future, many of our misapprehensions about the present state of AI would be helped, Gosling argues, if instead of “artificial intelligence” or “machine learning,” we were to call these tools what they really are: advanced statistical techniques.

“That doesn’t give [people] any more information other than, ‘Oh, this is math.’ But that’s a good thing,” says Gosling, “because it doesn’t lead them down this garden path.”

Instead, it leads us to a fruitful line of thinking: statistical models have strengths and weaknesses. Regression models, a technique machine learning applies to great effect, may predict behavior at the population level with striking accuracy. But applying such models to individuals is more complicated; they won’t predict nearly so reliably if or when, say, a particular borrower will default on a particular loan. And they shouldn’t be expected to do so!
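A minimal sketch can make this concrete. The code below uses synthetic data and hypothetical borrower features (a credit score and a debt-to-income ratio, invented purely for illustration) to fit a logistic regression that matches the population default rate almost exactly while remaining genuinely uncertain about many individual borrowers:

```python
# Synthetic illustration: a regression model that nails the population
# default rate while staying uncertain about many individual borrowers.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
credit = rng.normal(650, 75, n)      # hypothetical credit scores
dti = rng.uniform(0.05, 0.60, n)     # hypothetical debt-to-income ratios

# "True" default probability: partly the features, partly plain luck.
logit = -0.02 * (credit - 650) + 4.0 * (dti - 0.30) - 1.5
p_true = 1.0 / (1.0 + np.exp(-logit))
defaulted = rng.random(n) < p_true

X = np.column_stack([(credit - 650) / 75, dti])  # scale features for the solver
model = LogisticRegression().fit(X, defaulted)
p_hat = model.predict_proba(X)[:, 1]

# Population level: the aggregate prediction is strikingly accurate.
print(f"observed default rate:  {defaulted.mean():.3f}")
print(f"predicted default rate: {p_hat.mean():.3f}")

# Individual level: many borrowers get a probability far from 0 or 1,
# i.e. the model cannot say whether *this* person will default.
print(f"borrowers scored between 20% and 80%: {((p_hat > 0.2) & (p_hat < 0.8)).mean():.1%}")
```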

This example might seem a far cry from LLMs (some of the most popular recent implementations of generative AI), but the two aren’t so different. LLMs develop a model of language based on billions of conversations they have observed, Gosling explains. Then they gather statistics on those phrases and use them to predict the next word in a sentence.
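To make that idea concrete, here is a toy sketch of next-word prediction (my illustration, not a description of any production LLM): a bigram model that counts which words follow which in a tiny corpus, then picks the statistically likeliest continuation. Real LLMs use neural networks trained on vastly more data, but the core task is the same.

```python
# A toy bigram "language model": count word pairs, predict the likeliest
# next word. An illustrative sketch, not how production LLMs are built.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the cat ate . the dog sat on the rug .".split()

# Gather statistics: how often does each word follow each other word?
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the word most often observed after `word`."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # cat  (follows "the" more often than mat/dog/rug)
print(predict_next("sat"))  # on
```

A model like this answers confidently about patterns it has seen many times, and poorly (or not at all) about anything rare, which previews the reliability problem below.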

What does this mean practically? Large language models can often give useful answers to broad questions — questions on topics they’ve seen articulated millions of times — but they become increasingly unreliable as you ask more specific questions. As Gosling puts it, “Generative AI is really good at generating sentences that [sound] plausible, but are fundamentally horseshit.”

Given this ever-present chance of error, we must ask the obvious question: what are LLMs really good for?

What Generative AI Can Replace (or Improve) Right Now

Perhaps the clearest value generative AI offers at present is speed. Take online search as an example. LLMs often do not fundamentally change the knowledge you might acquire while researching a topic, but they speed up the process of acquiring it. Rather than making you search various sources to find information, an LLM offers an instantaneous overview, with the fluff left out.

In that case, says Gosling, “the question to ask is, what is your tolerance for error?”

If you’re scanning a few online resources to familiarize yourself with a new topic, you may pick up dubious information with or without generative AI — and in either case, small inaccuracies may not significantly impact your broad grasp of the idea.

On the other hand, developers are typically hired to write code that actually works, just as aerospace engineers are hired to design airplanes that fly, and architects to design buildings that stand. Depending on generative AI alone for that kind of work probably isn’t a good idea: by their very nature, such models have a non-zero chance of error.

The good news is, it doesn’t have to be all or nothing: AI-driven services can help generate the repetitive, boilerplate code that might take programmers half an hour to churn out themselves (in fact, I recently worked with a developer who told me one tool made her 50% more productive). These tools may even inspire creative solutions to a coding problem you’re stuck on.
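For a sense of what that looks like, here is the kind of routine code an assistant can draft in seconds. The class and its fields are hypothetical, invented for illustration, and a developer would still review the result before shipping it:

```python
# Hypothetical boilerplate an AI assistant might draft: a small data class
# with validation. A developer still reviews it, since generated code
# carries a non-zero chance of being wrong.
from dataclasses import dataclass

@dataclass
class UserProfile:
    username: str
    email: str
    age: int

    def __post_init__(self) -> None:
        if not self.username:
            raise ValueError("username must be non-empty")
        if "@" not in self.email:
            raise ValueError("email must contain '@'")
        if self.age < 0:
            raise ValueError("age must be non-negative")
```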

Just don’t expect it to do all your coding for you — or do so reliably.

And this gets to Gosling’s main point, which he says he shares with developers whenever he’s asked about generative AI: these new tools have value; they’re just best applied in circumstances that can tolerate error. (In fact, if a specific project can’t tolerate any error, Gosling suggests you “run screaming in the opposite direction.”)

But let’s think about error-tolerant applications for a moment, because there are many of them.

AI image recognition is getting close to surpassing human accuracy, and it’s certainly more reliable than a tired or intoxicated human. So maybe people will begin to accept the non-zero rate of error for cars that drive us home after we blow over a given threshold on a built-in breathalyzer, because that non-zero rate of error is still much lower than the human rate of error while intoxicated.

Or imagine a student learning a new subject. Outside of the classroom setting, that student may try to improve their grasp of a topic by reading online sources, seeking the guidance of peers, or even meeting with a tutor. In all of those contexts, they face a non-zero chance of learning inaccurate information. And depending on the topic, that chance may be comparable to, or even higher than, the chance of learning something inaccurate from an LLM. At the very least, an LLM used alongside those other means of learning could make the student’s studying more effective.

Such lines of thinking, centered on our tolerance for error in a given problem, bring us closer to identifying the most exciting implementations for the advanced statistical techniques we call artificial intelligence.

To see what real-world engineers are building with generative AI tools, check out the community.aws generative AI space. And as always, if you have ideas about how generative AI could be used in the future, or on the topic in general, drop them in the comments below!
