**Small, personalized language models as a meaningful step towards a less dystopian future**

Colin Raffel

June 3, 2025

[colinraffel.com/blog](http://colinraffel.com/blog)

*Disclaimer: Some of the topics discussed here are better studied and discussed elsewhere, by people who, unlike me, are experts. If you think my reasoning is wrong, please reach out and tell me so. But I feel I know at least enough to be worried and to try to brainstorm ways to make the future less bad.*

# Some worries

A possible goal of the major tech companies developing AI systems is the automation of labor. These tech companies have poured a huge amount of money into developing AI systems with steadily improving performance and steadily increasing costs. One way to justify this investment is the potential capture of a significant portion of the labor market. Doing so could result in the displacement of human labor and/or a decrease in human autonomy. Realizing this goal probably does not require superhuman AGI or even AGI at all; I suspect that many companies would choose to replace a human worker with an AI worker *even if the AI worker was less reliable or accurate than the human worker* if the AI worker can complete the (possibly subpar) labor more quickly and cheaply.

It would be nice if automation of labor would allow for most people to keep or improve their quality of life while working less, but I worry that it will instead largely concentrate more power, resources, and money with the companies providing AI systems. In a simplistic view, if a company automates a human worker with an AI worker, they are essentially displacing the compensation away from the human worker to the company providing the AI service. The economic implications of this displacement on a large scale seem worrisome on their face but are beyond me to try to predict or analyze.

A major factor driving improvement (and investment) in AI has been increased scale.
This has in turn led to a well-documented, ever-growing demand for data centers. Data centers are designed to provide a maximally efficient environment to concentrate huge amounts of computing power. This concentration, however, creates huge, localized draws on the power grid. Anecdotally, it seems increasingly rare that the heavily concentrated power usage of data centers can be met using clean energy sources. A potent example of this is the [use of methane generators](https://www.politico.com/news/2025/05/06/elon-musk-xai-memphis-gas-turbines-air-pollution-permits-00317582) at xAI's Colossus datacenter. It's hard not to speculate that the use of these generators is a downstream effect of the local power grid not being able to support the rapid construction of the data center (which in turn was spurred by the rapidly increasing demand for compute). I am not a climate scientist, but I believe methane generators are an especially "unclean" way to generate power. Notably, this is the datacenter where Anthropic is renting capacity to support their own ever-growing compute needs.

An additional side effect of the centralization of resources is the ceding of control to the AI providers. As an example, providers frequently deprecate model versions that are relied on by downstream users. Additionally, the inner workings of these models - for example, their thinking traces, the way they manage context across sessions, or how they incorporate feedback - are generally not visible. While issues from ceding control might not be as severe as, say, total upheaval of the global labor market, they are nevertheless inconvenient.

# A possibly better future

What if we shifted our AI usage away from the large AI providers to local, open-weight LMs? I think this would be a meaningful step towards a less dystopian future. On its face, this would give end-users a great deal more control over the models themselves, as well as more visibility into how they are used.
It additionally would likely create much more dispersed demand on power grids, potentially making it more feasible to rely on clean energy.

As things currently stand, it's not clear how local LMs would help avoid large-scale economic displacement. I'm not sure there's a good reason to suspect that a company would be more likely to retain a human worker with access to a local LM if it was possible to cheaply offload the work to an AI worker. To mitigate economic issues, I think we need to go a step further and make it possible to create *user-specific models*. In other words, I would maintain and improve a "Colin model" that was designed specifically to assist *me*, that could reliably reproduce my voice and the idiosyncrasies of my writing, and that had a good model of my tastes and preferences.[^workshop] Current proprietary AI systems get at this functionality by tracking and summarizing your interaction history, but I suspect that infrastructure could be built (mainly around fine-tuning and user-specific preference tuning) to make personalization work considerably better for a local model. Ideally this would *increase* each person's value, because working with an individual would be the only way to get access to that individual's model. Realizing this goal in a responsible way would require ways of preventing user-specific models from being stolen.

One factor limiting the adoption and usage of open-weight LMs is that their performance tends to lag behind the "frontier" set by proprietary models. Estimates of this lag vary, though there is some evidence that it tends to hover around [6 months](https://artificialanalysis.ai/models/open-source). Regardless of whether this gap grows or shrinks, I suspect that in the near future we will reach a level of capability that is simply "good enough" for LMs to be meaningfully helpful on most tasks.
In other words, having a local LM that significantly improves and speeds up a human's labor, for most kinds of labor, might not be too far off. For example, [ArtificialAnalysis' Intelligence Index puts Qwen3.6 35B-A3B at around the level of Claude Sonnet 4.5](https://artificialanalysis.ai/models?models=qwen3-6-35b-a3b%2Cclaude-4-5-sonnet-thinking). While benchmarks like the Intelligence Index are imperfect measurements of model utility, I think the fact that we are *already* at near-parity with an anecdotally highly capable model like Sonnet 4.5 gives credence to the possibility of "good enough" open-weight models in the near future. I intentionally used Qwen3.6 35B-A3B as an example because it's a model that is runnable at useful speeds on local commodity hardware (e.g. [an M5 MacBook Pro](https://simonwillison.net/2026/Apr/16/qwen-beats-opus/)). This is in part thanks to a happy accident: single-user inference is primarily memory bandwidth-bound, which makes it feasible on commodity hardware such as the Apple M-series processors, Nvidia's DGX/RTX Spark, or AMD's Strix Halo.

The previous paragraph paints an optimistic picture, but the high incidence of technical jargon also highlights a major issue: It is still nontrivial to run local models. What models fit on a given user's hardware? Among those models, which present the best speed/accuracy tradeoff? Should quantized variants be used, and if so, which one? Which serving backends support a given model, and among those, which are the most efficient? How much thinking effort should be allowed? Proprietary LMs are often exposed via a simple chat interface, sparing the user from this complexity. Even user-friendly GUIs like [LM Studio](https://lmstudio.ai/) require a user to choose from a dizzying array of models, among other factors.

For me personally, a greater concern with open-weight models is the increased possibility of malicious use.
While open-weight models typically undergo alignment and safety training, it is considerably easier to "jailbreak" them due to the lack of safety classifiers and the possibility of performing adversarial fine-tuning (made easy with tools like [heretic](https://github.com/p-e-w/heretic)). I think there is some credibility in the argument that LMs could make it easier for people to cause harm, for example via cyberattacks or bioweapons. However, I *don't* think the answer is to insist that LMs be kept closed and behind proprietary interfaces; this feels like throwing the baby out with the bathwater. As a first step, I support significantly more research into making open-weight LMs more robust to jailbreaking and adversarial fine-tuning (and this is an emerging research direction in my lab). I don't think that making LMs entirely robust is a feasible goal; instead, we should strive to make malicious use economically unattractive by virtue of requiring a cost-prohibitive amount of computation to execute.

Finally, state-of-the-art open-weight LMs are in a somewhat precarious position by virtue of the fact that they are expensive to develop. As a result, the best open-weight LMs tend to be trained and released by resource-rich companies (disproportionately many of which are Chinese). I worry that as costs increase (as they have tended to do), this altruism will start to diminish. Given the potential public benefit that local models can provide, I would hope for significantly more public investment in them. In a more utopian version of the future, one could imagine a robust ecosystem of publicly funded LM development, producing performant models that can easily be run and personalized to end-users, increasing human value rather than siphoning it away.

[^workshop]: This was roughly the original mission of [Workshop Labs](https://www.workshoplabs.ai/). They've been acquired by Thinking Machines; I'm not sure whether they are still working on this.