Right-Sized AI: Why Your Enterprise Deployment Isn't the Problem — and How to Keep It That Way
The headlines are alarming. Global data center electricity consumption reached an estimated 460 TWh in 2022 and is projected to more than double by 2026. A single ChatGPT query consumes about 2.9 watt-hours — roughly ten times what a traditional search uses. Training GPT-3 required an estimated 1,287 MWh and produced around 552 tons of CO₂. Tech giants are spending north of $1 trillion cumulatively on AI infrastructure, and data centers in some regions now consume more electricity than entire countries.
If you're a climate-focused organization — or any firm that takes its environmental footprint seriously — these numbers should give you pause before adopting AI internally. But they should also prompt a more careful question: where, exactly, is all that energy going?
The answer matters, because it's not going where most people think.
The Scale Gap Most People Miss
The energy consumption driving those headlines comes from a very specific category of AI activity: hyperscale model training and massive consumer-facing inference. When OpenAI trains a new frontier model, it runs thousands of GPUs continuously for months. When hundreds of millions of people use ChatGPT, Copilot, or Gemini every day, the aggregate inference load is enormous.
An enterprise AI deployment serving a team of 50–200 people is a fundamentally different animal.
Consider the math. A targeted retrieval-augmented generation (RAG) system — the kind that connects AI to your internal documents and databases — runs queries against a curated, indexed dataset. The compute required for each query is a fraction of what a general-purpose model needs, because the system isn't reasoning from scratch across the entirety of human knowledge. It's looking up relevant context from your data, then generating a focused response.
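The retrieval-then-generate pattern can be sketched in a few lines. This is a toy illustration of the retrieval step only — a real system would use embedding models, a vector index, and an LLM for generation — and the documents and query here are invented for the example:

```python
import re

def tokenize(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, documents, k=2):
    """Rank documents by token overlap with the query and return the top k.

    Toy stand-in for vector similarity search: the per-query work scales
    with the size of the curated index, not with a frontier model's
    full parameter count.
    """
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

# Invented internal documents for illustration.
docs = [
    "Q3 exposure to grid infrastructure across the fund",
    "Office travel policy for analysts",
    "Grid infrastructure holdings summary, updated monthly",
]
context = retrieve("What is our exposure to grid infrastructure?", docs)
```

The retrieved `context` — a couple of short, relevant chunks — is all the model has to process, which is what keeps per-query compute small.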
A firm running a few hundred queries a day against an internal knowledge base is consuming energy roughly equivalent to running a modest cloud application. It's not in the same category — or even the same order of magnitude — as the infrastructure buildout making headlines.
This isn't to say the energy cost is zero. It isn't. But the honest comparison is: what's the energy cost of the manual work this replaces? Analysts flying between offices for meetings that could be replaced by cross-system intelligence. Hours spent manually reconciling data from multiple sources. Redundant research performed because institutional knowledge isn't accessible. The net carbon math, for a well-designed system, almost certainly favors adoption.
Efficiency as a Design Choice, Not an Afterthought
That said, "almost certainly" isn't good enough for firms whose brand and mission depend on environmental credibility. The difference between a responsible AI deployment and a wasteful one comes down to engineering decisions that most vendors never discuss:
Right-size the model. Not every task requires the most powerful model available. A query that retrieves and summarizes internal documents can run on a smaller, more efficient model than one that needs to reason through complex multi-step analysis. Using the smallest model that achieves the desired quality for each task can reduce compute — and energy — by an order of magnitude.
Cache aggressively. Many enterprise queries are variations of the same questions. "What's our exposure to grid infrastructure?" might get asked by five different people in a week with slightly different phrasing. Intelligent caching and semantic deduplication mean the system doesn't need to re-run the full retrieval and generation pipeline every time.
Tune retrieval, not just generation. In a RAG architecture, the retrieval step — finding the right documents or data — is far less compute-intensive than the generation step. Better retrieval means less work for the model. Precise metadata tagging, thoughtful chunk sizing, and well-structured data reduce the number of tokens the model needs to process, which directly reduces energy consumption.
Choose your infrastructure deliberately. Not all cloud regions are equal. Azure's sustainability dashboard lets you see the carbon intensity of different regions, and choosing a region powered by a cleaner grid has a measurable impact. Microsoft has committed to matching 100% of its electricity consumption with renewable energy purchases and being carbon negative by 2030 — building on Azure means your AI workload inherits those commitments.
Batch when real-time isn't required. Automated reports, weekly summaries, and monitoring alerts don't need to run instantaneously. Scheduling batch processing during off-peak hours or periods of higher renewable generation further reduces the carbon intensity of your workload.
These aren't theoretical principles. They're engineering decisions that can be made at deployment time, and they compound. A system designed with all five in mind can be dramatically more efficient than one that simply defaults to the largest model and the nearest data center.
The Real Question for Climate-Conscious Firms
The question isn't "should we use AI?" — your competitors already are, and the productivity gap will widen. The question is: are we using AI in a way that's consistent with our values and our mission?
For firms focused on climate, energy, and sustainability, the answer should be yes — but only if you're deliberate about how you build. A well-designed enterprise AI system, scoped to your specific data and workflows, running on renewable-powered infrastructure, with right-sized models and efficient retrieval — that's not at odds with a climate mission. It's a reflection of it.
The firms that will get this right are the ones that apply the same rigor to their AI infrastructure that they apply to their investment decisions: understand the trade-offs, measure what matters, and build for efficiency by design.
Learn more about our approach to Efficient AI by Design.
West Stack builds AI solutions for wealth managers and asset managers — private, secure, and designed for efficiency. If your firm is thinking about AI adoption and wants to do it responsibly, let's talk.