7 min read · Jake Lee

Microsoft Just Released 3 AI Models Worth Knowing About. Here's the Practical Breakdown.

AI Tools · Microsoft · Automation · Small Business · 2026

Yesterday, Microsoft released three new AI models. They have names, they have prices, and they're available right now. No waitlist. No "coming soon." Live.

The models are MAI-Transcribe-1 (speech-to-text), MAI-Voice-1 (text-to-speech), and MAI-Image-2 (image generation). They're accessible through Microsoft Foundry — Microsoft's AI developer platform — and MAI-Image-2 is already rolling out inside Microsoft 365, Bing, and Copilot.

Most AI announcements are worth a skim and a bookmark. This one is worth a closer look if you run a service business with 2 to 50 people, because at least one of these tools is probably cheaper than whatever you're using now — or cheaper than the person you're paying to do the thing manually.

What Microsoft Actually Released

Here's the short version before we get into the details.

MAI-Transcribe-1 converts spoken audio to text. Microsoft is claiming the lowest word error rate across the top 25 languages tested. Price: $0.36 per hour of audio.

MAI-Voice-1 converts text to spoken audio. It generates 60 seconds of audio in about 1 second. Price: $22 per million characters.

MAI-Image-2 generates images from text prompts. It's at least twice as fast as Microsoft's previous image model and ranks in the top 3 on the Arena.ai image leaderboard. Price: $5 per million tokens for text input and $33 per million tokens for image output.

All three are available now through Microsoft Foundry and the MAI Playground. You don't need a developer to experiment with them — the playground is a browser interface — but you'll need a developer or an automation tool (Make, Zapier, n8n) to actually pipe these into your workflows.

MAI-Transcribe-1: The One I'd Pay Attention To First

Transcription is one of those things that most service businesses do badly, inconsistently, or not at all — even though every call they record is sitting there full of useful information.

At $0.36 per hour, the cost argument is basically gone. If you record 20 client calls a month and each runs about an hour, you're spending $7.20 on transcription. That's about the price of one fancy coffee.
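If your calls aren't an hour long, the number changes. Here's a minimal sketch of the arithmetic, using the $0.36 per hour rate quoted above. The function name and the call lengths are just illustrations, so plug in your own.

```python
# USD per hour of audio, as quoted in this article.
MAI_TRANSCRIBE_RATE_PER_HOUR = 0.36

def monthly_transcription_cost(calls_per_month: int,
                               avg_call_minutes: float,
                               rate_per_hour: float = MAI_TRANSCRIBE_RATE_PER_HOUR) -> float:
    """Estimate monthly transcription spend in USD, rounded to cents."""
    audio_hours = calls_per_month * avg_call_minutes / 60
    return round(audio_hours * rate_per_hour, 2)

# 20 calls at 60 minutes each (the example above), then the same 20 calls at 30 minutes.
print(monthly_transcription_cost(20, 60))
print(monthly_transcription_cost(20, 30))
```

Shorter calls simply scale the bill down, which is why the per-hour price matters less than what you do with the output.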

The question is: what do you do with the transcript once you have it?

Here's what actually works. You feed the transcript into an AI model (Claude, ChatGPT, whatever you use) and ask it to extract action items, summarize the conversation, identify follow-ups, and pull any information that belongs in your CRM. That whole process takes about 90 seconds and produces something you'd normally spend 20 minutes writing by hand after a call — if you wrote it at all.
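To make that concrete, here's a rough sketch of the prompt you might hand to whichever model you use. It only builds the text and doesn't call any API. The function name, the section wording, and the CRM field names are all assumptions for illustration, so adapt them to your own setup.

```python
def build_call_notes_prompt(transcript: str, crm_fields: list[str]) -> str:
    """Assemble one prompt asking for the four outputs described above."""
    fields = ", ".join(crm_fields)
    return (
        "You are reviewing a client call transcript.\n"
        "Return four sections:\n"
        "1. Summary (3-5 sentences)\n"
        "2. Action items (with owner and due date if stated)\n"
        "3. Follow-ups to send\n"
        f"4. CRM updates for these fields: {fields}\n"
        "Use only what the transcript says; write 'not mentioned' when unsure.\n\n"
        f"TRANSCRIPT:\n{transcript.strip()}"
    )

# Hypothetical CRM fields; swap in whatever your CRM actually tracks.
prompt = build_call_notes_prompt(
    "Client wants a quote by Friday. Budget is around 5k.",
    ["budget", "timeline", "decision maker"],
)
print(prompt)
```

Paste the result into a chat window to test it by hand, then move it into an automation tool once the output looks right.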

I've seen businesses run this on every sales call and cut their post-call admin from 25 minutes to 3 minutes per call. For a team doing 40 calls a month, that's 22 minutes saved per call, or 880 minutes in total. Nearly 15 hours. Gone.
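If you want to run the same math on your own numbers, a small sketch like this does it. The function name is made up for the example, and the inputs are the figures from the paragraph above.

```python
def monthly_minutes_saved(calls_per_month: int,
                          minutes_before: float,
                          minutes_after: float) -> float:
    """Minutes of post-call admin saved per month."""
    return calls_per_month * (minutes_before - minutes_after)

saved = monthly_minutes_saved(40, 25, 3)
print(saved, "minutes, or about", round(saved / 60, 1), "hours")
```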

Beyond call notes, cheap transcription opens up a few other doors:

  • Meeting minutes: Record your team meetings, transcribe them, and automatically distribute a summary and action item list. No one needs to take notes.
  • Content repurposing: If you record video content, a podcast, or any kind of audio, the transcript becomes a first draft of a blog post, an email, or a social caption. You're not starting from scratch.
  • Training material: Transcribe your best client calls and use them to train new team members on how to handle objections, explain your services, or close deals. Real conversations are better training material than anything you'd write in a handbook.
  • Pattern detection: If you accumulate 50 or 100 transcripts and feed them to an AI with the right prompt, you can spot patterns — objections that come up repeatedly, questions clients always have, moments where deals fall apart. That's data most businesses are sitting on and never using.
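For the pattern detection idea, you can get a crude first pass before involving an AI at all. The sketch below is a plain keyword tally, not the AI-prompt approach described in the list. The phrases and sample transcripts are invented for illustration.

```python
from collections import Counter

def tally_phrases(transcripts: list[str], phrases: list[str]) -> Counter:
    """Count how many transcripts mention each phrase at least once."""
    counts = Counter()
    for text in transcripts:
        lowered = text.lower()
        for phrase in phrases:
            if phrase.lower() in lowered:
                counts[phrase] += 1
    return counts

# Invented sample data; in practice these would be your real transcripts.
sample = [
    "Honestly it's too expensive for us right now.",
    "We need to think about it. Also, is there a contract?",
    "Too expensive, and we'd need to think about it.",
]
counts = tally_phrases(sample, ["too expensive", "think about it", "contract"])
for phrase, n in counts.most_common():
    print(f"{phrase}: mentioned in {n} of {len(sample)} calls")
```

A tally like this only finds phrases you already thought to look for. The AI pass is what surfaces the objections you didn't know you were hearing.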

$0.36 per hour is not a barrier. The barrier is building the habit and the workflow. That's fixable.

MAI-Voice-1: Where It Makes Sense and Where It Doesn't

Text-to-speech has been around for a while, but the quality gap between cheap TTS and human-sounding audio has been closing fast. MAI-Voice-1 is priced at $22 per million characters, which sounds like a lot until you do the math.

A typical business voicemail greeting is about 200 words — roughly 1,000 characters. At $22 per million characters, that's $0.02 per voicemail recording. You could refresh your entire phone tree every week for under a dollar.

A 5-minute explainer video script runs about 750 words — around 3,750 characters — which costs roughly $0.08 in voice generation. Compare that to hiring a voice actor on Fiverr, which runs $50 to $200 for a basic voiceover.
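The voice math is just characters times rate. Here's a minimal sketch using the $22 per million characters quoted above, with a hypothetical function name, so you can check the cost of your own scripts.

```python
# USD per million characters, as quoted in this article.
MAI_VOICE_RATE_PER_MILLION_CHARS = 22.00

def voice_cost(char_count: int,
               rate_per_million: float = MAI_VOICE_RATE_PER_MILLION_CHARS) -> float:
    """Estimated USD cost to generate speech for a script of char_count characters."""
    return char_count * rate_per_million / 1_000_000

# The two examples above: a 1,000-character greeting and a 3,750-character script.
for label, chars in [("voicemail greeting", 1_000), ("explainer script", 3_750)]:
    print(f"{label}: ${voice_cost(chars):.4f}")
```

To cost a real script, pass `len(your_script_text)` as the character count.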

Here's where voice AI actually makes sense for service businesses:

  • Phone systems and IVR: If you have an automated phone tree or after-hours recording, this is a no-brainer. Update your messages whenever you want, in seconds, for pennies.
  • Marketing videos: Product walkthroughs, explainer videos, social content. If you've been avoiding video because you don't want to be on camera, voice AI removes that obstacle. You write the script, the model reads it, you record a screen capture or use stock footage.
  • Training content: Internal training modules, onboarding videos, process walkthroughs. If you have text-based SOPs sitting in a document somewhere, you can turn them into narrated video training without touching a recording studio.
  • Multilingual content: MAI-Voice-1 supports multiple languages. If you serve Spanish-speaking customers or have a multilingual team, you can produce content in their language without hiring separate voice talent.

Where voice AI doesn't make sense: anything that requires emotional nuance, high-stakes client communication, or a personal relationship. Don't use AI voice for a call with your best client. Use it for your on-hold message script.

MAI-Image-2: Fast, Cheap, and Already in Your Microsoft Tools

Image generation is the most crowded space of the three. Midjourney, DALL-E 3, Stable Diffusion, Firefly — there's no shortage of options. So why pay attention to MAI-Image-2?

Two reasons. First, the pricing. At $5 per million input tokens and $33 per million output tokens, it's competitive with, or cheaper than, most subscription-based image tools once you do the math on volume. If you generate 50 images a month for marketing, social content, or presentations, you're probably spending $20 to $50 on a subscription somewhere. With pay-per-use pricing, that same output might cost you $5 to $10, depending on how many output tokens each image uses.

Second, if you're already in Microsoft 365, you get MAI-Image-2 through Copilot and Designer without doing anything extra. No new account, no new billing, no new tool to learn. It just shows up in the tools you're already using.
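On the pricing point, the real cost depends on how many tokens each image consumes, and that figure isn't stated here. The sketch below uses the two rates quoted above. The per-image token counts are placeholders I made up to show the calculation, not published numbers, so replace them with what your own usage reports.

```python
INPUT_RATE_PER_MILLION = 5.00    # USD per million text-input tokens, as quoted above
OUTPUT_RATE_PER_MILLION = 33.00  # USD per million image-output tokens, as quoted above

def image_batch_cost(images: int,
                     input_tokens_each: int,
                     output_tokens_each: int) -> float:
    """Estimated USD cost for a batch of generated images."""
    input_cost = images * input_tokens_each * INPUT_RATE_PER_MILLION / 1_000_000
    output_cost = images * output_tokens_each * OUTPUT_RATE_PER_MILLION / 1_000_000
    return input_cost + output_cost

# PLACEHOLDER token counts: assumptions for illustration only.
estimate = image_batch_cost(images=50, input_tokens_each=100, output_tokens_each=4_000)
print(f"Estimated monthly cost for 50 images: ${estimate:.2f}")
```

Almost all of the cost sits on the output side, so the output tokens per image is the number worth measuring before you cancel a subscription.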

The practical use cases for a service business:

  • Social media visuals: Branded images, quote cards, announcement graphics. You describe what you want, the model generates options, you pick one. 3 minutes instead of 30 in Canva — or 3 hours with a designer.
  • Proposal and presentation graphics: Illustrative images for slide decks or proposals. Instead of searching stock photo sites for something that's close enough, generate exactly what you need.
  • Website and marketing images: Blog post headers, service page visuals, ad creatives. Consistency and quality on a budget.

One honest caveat: AI image quality is good but still inconsistent, especially for anything involving human faces, text in images, or highly specific branded compositions. For anything client-facing or high-stakes, you'll want a human to review the output before it goes live. For internal use, social posts, and blog headers, the quality is more than adequate.

The Bigger Picture: Why Microsoft Did This

Microsoft spent billions investing in OpenAI early. The relationship has gotten complicated — OpenAI is now a competitor in several areas, and Microsoft has been building its own AI infrastructure in parallel for the past two years.

MAI is Microsoft's bet on owning more of its own AI stack rather than depending entirely on OpenAI's models. The aggressive pricing on these three models is partly about market share, partly about making the case that Azure and Microsoft Foundry are where enterprises should be building.

What that competition means for you: AI tools are getting cheaper faster than most people expect.

Twelve months ago, $0.36 per hour for high-accuracy transcription would have been a good deal. Today it's the new baseline. Eighteen months from now, it'll probably be lower. The same is happening with voice, image generation, and every other AI capability.

The businesses that win in this environment are not the ones waiting for prices to drop further. They're the ones building the workflows now, learning what works, and iterating — so when the next price drop happens, they can just upgrade the model and keep running.

Waiting is not a strategy. It's just falling further behind while tool costs keep dropping around you.

What You Should Actually Do This Week

Three concrete things, in order of impact:

1. Audit your current transcription situation. Are you recording calls? If yes, how are you transcribing them — manually, with Otter.ai, with Rev, with something else? What's it costing you? At $0.36/hour, MAI-Transcribe-1 is probably cheaper. More importantly, what are you doing with the transcripts? If the answer is "not much," that's the bigger problem to solve.

2. List the places you're paying for audio or image creation. Voice actors, stock photography subscriptions, design tools you barely use. Run the numbers. How much are you spending? What would it cost to do the same volume with pay-per-use pricing? In most cases, you'll find you're overpaying for capacity you're not using.

3. Pick one thing to test. Don't try to implement all three at once. Pick the one that maps to a real pain in your business right now (probably transcription) and build a simple workflow around it. Record a call. Transcribe it. Feed the transcript to your AI of choice and check whether the output is actually useful. Iterate from there.

That's it. Not a six-month implementation project. One test, one workflow, one result you can evaluate.

Most businesses overthink AI adoption and underinvest in the actual work of testing things. The tools are cheap now. The cost of a bad experiment is $2. Run the experiment.

One More Thing Worth Knowing

Microsoft is not the only company releasing tools at these price points. Google, Anthropic, and a wave of smaller players are all competing for the same market. The Chinese open-source ecosystem — Qwen, DeepSeek, and others — is releasing powerful models at near-zero cost, which is forcing Western providers to keep prices competitive.

The practical implication: if you're locked into any single AI vendor right now and paying full subscription rates, it's worth checking whether you're getting the best deal. The market is moving fast enough that prices you locked in 6 months ago may not reflect what's available today.

Don't be loyal to a tool just because it's the one you started with. Be loyal to results.

If you want to talk through which of these tools makes sense for your specific business — and how to actually wire them into your workflows without a developer on staff — book a free call. We can usually get you a working proof of concept in under an hour.
