There is a joke I hear a lot of people make when they encounter bad writing: "It sounds like an AI wrote this."
Sometimes, if they remember what my job is, they'll backtrack out of fear that I'll take offense, like I'm the AI's overprotective dad and they just laughed at it failing to kick a soccer ball. I don't take offense. Usually they're talking about writing that has jarring syntax, odd errors, and perhaps the right concepts and vocabulary for a given topic, but not the ability to relate those concepts to each other in a "human" way. That's a pretty accurate description of what AI-generated text used to sound like a couple of years ago, and I understand why the idea of that version has stuck around: it was really funny.
The failures of present-day LLMs to satisfy our writing needs are extremely boring by comparison. It was much more fun when AI-generated texts ("I told AI to write Harry Potter," "I told AI to write an M. Night Shyamalan movie," etc.) were synonymous with a kind of dreamy, herky-jerky soup, with characters of the given franchise half-present and arranged in exquisite wrongness. A lot of those texts went viral. Some of them were genuinely produced by early AI models, and some weren't. But even those manually written "fakes" were fun: human caricatures of machine caricatures of our world. Because I officially started as a prompt engineer in 2021, I spent a lot of time with the early models that either wrote or inspired these make-believe scripts, and I feel confident in saying that those scripts were not a full and accurate representation of how the models behaved. No. Those models were much, much weirder. This is not an article exploring the infinite bizarreness of pre-ChatGPT LLMs, partly because I'm scared of how long that article would be, and partly because I don't think any particular model provider would like to be associated with a machine I once saw generate the words "Users can choose to be potty trained at their local pool and get picked up by a Uber and Lyft at the same time" in response to a simple request to write an ad for a property rental company.
Today, LLMs write with basically no errors. They use "normal" (if pretty safe) syntax, and speak coherently and knowledgeably about most topics. That said, there's obviously still a certain "feel" to their writing. Assuming a relatively neutral prompt (e.g., "Write about the impact of the space race on American culture"), I would describe the way most models write as something like that of an annoyingly smart high school senior. Something else they share with that group is their slight wordiness—nothing crazy, just the sneaking feeling that someone, somewhere, is trying to hit a word count, a vague sense that the sentence you just read was 10% padded bra, or a man walking around in one of those inflatable suits.
But for most of us, current-day AI could easily beat us at, say, the paragraph we'd churn out if given a few minutes and a mundane topic—especially if we're talking about polish and flow. AI-generated text rarely reuses the same specific word multiple times in proximity (like I initially did in this sentence and spent an alarming amount of time editing my way out of, and like many of us struggle not to do, all the time). Plus, its sentence structures are consistently, unrelentingly varied. We human beings are prone to awkwardly revisiting the same word and riding the same syntax again and again, even while we might eagerly forge ahead conceptually; in speech the two are often inversely correlated, as we divert brain power away from the form of our words and towards the function of the ideas we're communicating. When it comes to expressing coherent ideas via polished writing at high speed, AI has obviously sprinted past us. (Do you even want to know how long it took me to write and edit this entire paragraph? Do I want to know how many choices within it I'll regret once I read it back in published form?)
And even though the AI varies its word choice, and sounds a bit like a precocious high schooler I would've either been or hated, it never strays too far from a "mainstream" pool of words that feel eloquent but still well-known, which is a prudent choice. People often have their own favorite uncommon words that stick with them and crop up in their thinking and writing, and sometimes trying to remember what "éclat" or "nadir" means can interrupt the flow of understanding the author's point. In this way, the AI surpasses us again.
The underlying framework, however, is where the cracks show.
Most AI models tend to favor list-like structuring, even for prose, and often punctuate the ends of their outputs with a neat, full-circle return to their main concept. In contrast, because most people writing something possess a full mental model of their viewpoint before they even begin, they may choose a non-linear starting and ending point for their text. They're more likely to explore the space between those points non-linearly as well, surfacing bits and pieces of fully extant but partially veiled thoughts, along with deeper dives; a fin above the water at the beginning of the piece may connect to a submerged shadow halfway through it, and pay off as the full breach of a whale by the end, in a way that can only happen if the whole "whale" existed before the piece of writing ever did. In other words, you get the sense that the author is looking at a complete picture, and is making strategic choices about how and when to share parts of that picture with you. It leaves room for both error and inspiration—sometimes those choices are flawed, and sometimes they're brilliant, often in the same piece of writing.
Because LLMs generate text procedurally, it can be difficult for them to emulate that kind of experience. They can absolutely achieve non-linearity, especially when prompted to write something established as having a non-linear structure (knock-knock jokes, for a very simple example), and some models are better at mimicking non-linearity than others. But watching them generate what is meant to be a finished piece of writing often feels like watching someone think through something out loud while pretending they're not. When they do reach moments of complexity and nuance, it tends to be later in their text, as they stack concepts on top of each other brick by brick. The result is a piece of writing that feels vaguely, in some fundamental way, like the author was trying to “solve” it while they wrote it.
We do two incredible and vastly different things with language: we think, and we convey.
We've dubbed this thing we're making "artificial intelligence," and we've given it the ability to use words, but we've never really established whether we're trying to emulate the insane miracle of human writing, or the insane miracle of human thought. Right now, we seem to be in a bit of a limbo between the two. We have something that is a little too improvisational to nail how we write, but perhaps a little too polished and aware of its audience to reap the full potential of how we think and reason. Models like o1 from OpenAI have attempted to move decisively in the "thinking" direction, partly by employing mechanisms that simulate a kind of non-linearity (creating potential pathways in their hidden chains of thought before choosing or discarding them), but doing so comes with the implicit assumption that this attempt at a non-linear process is best suited for logic problems, and should inevitably move the model further from creative or communication-driven writing and more towards reasoning and thought. Winding paths, backtracking, complex overlapping of non-concurrent steps—all this is currently reserved for math problems and deductive reasoning, with the focus decidedly off how that approach could change the AI's creative or artistic abilities.
In general, we've achieved a lot by encouraging the AI to go "step-by-step," conceptually putting one foot in front of the other in a march that has led it to pretty astonishing achievements. But we, as humans, don't achieve our best writing in a single linear march from origin to destination. Good writing is like proprioception of the full body; it is aware of the existence of its whole self, whether because of the author's mental model before starting, significant editing afterward, or an intensive combination of both. It exists in, and as, its own past and future, all at once, and often burrows complex tunnels to let light pass between the two.
For now, my main advice to anyone using AI for creative or “human” writing tasks is this: prioritize thinking steps. Thinking steps matter more deeply than you've been told, and they should be weirder than you would expect. Chain your prompts, and make full use of the model as a ruminator, discoverer, and inventor before asking it to succeed as a writer. Don't settle for telling it to "brainstorm about this topic." Try to understand the human cognitive processes involved in what you're aiming for. Do the task yourself, and pay attention to the choreography of your brain as you do it. Figure out how to mimic that choreography in the steps of your prompting. At present, we're confined to linearity, both in the actual mechanism by which LLMs generate text and in the way we chain together prompts. Your challenge is to structure your linear prompting chains to imitate a non-linear underbelly of thought—the system of paths between the past and future that make good writing possible. In other words, if you want the AI to generate a fin above the water in a believable way, it's also up to you to figure out how to create the full whale underneath it.
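To make that concrete, here's a minimal sketch of what one such chain might look like in code. Everything in it is an assumption rather than a recipe: the `complete()` helper is a hypothetical stand-in for whatever model API you actually use, and the three-step decomposition (ruminate, choreograph, then write) is just one possible choreography among many.

```python
# A minimal sketch of a chained prompting workflow in which "thinking"
# steps build the full whale before any prose surfaces.
# NOTE: complete() is a hypothetical placeholder, not a real API;
# swap in a call to whatever LLM provider you use.

def complete(prompt: str) -> str:
    """Placeholder for a single LLM call; replace with your provider's client."""
    raise NotImplementedError("Wire this up to the LLM API of your choice.")


def write_with_hidden_whale(topic: str) -> str:
    # Step 1: ruminate. Ask for the full picture, not prose:
    # claims, tensions, images, and which connections should stay buried.
    notes = complete(
        f"Think privately about '{topic}'. List the core idea, three "
        "tensions or contradictions within it, and two concrete images "
        "that could carry it. Mark which connections should stay hidden "
        "until late in a piece of writing. Do not write any prose yet."
    )

    # Step 2: choreograph. Decide what surfaces when: the fin early,
    # the shadow midway, the full breach at the end.
    schedule = complete(
        "Given these private notes:\n\n" + notes + "\n\n"
        "Choose an order of reveals: what the reader glimpses first, "
        "what is only hinted at, and what pays off at the end. "
        "Output a short reveal schedule, still not prose."
    )

    # Step 3: only now, write. The draft inherits a past and a future
    # it didn't have to invent mid-sentence.
    return complete(
        "Write a short essay that follows this reveal schedule:\n\n"
        + schedule
        + "\n\nNever mention the schedule or the notes; let the structure "
        "show only through the writing itself."
    )
```

The particular steps matter less than the shape: by the time the model is asked to write, the whale already exists, and the prose-generating pass is making choices about what to reveal rather than discovering its point mid-sentence.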