There is a version of AI research that lives entirely inside benchmark tables. And there is another version — the one I find more interesting — that has to survive contact with the real world.
The gap between the two is where most of the hard work happens.
Scale changes the problem
When you are building models that will serve billions of people, the questions shift. It is no longer just "does this work?" but "does this work reliably, across languages, cultures, and contexts we did not anticipate?"
Leading Project Z-Code at Microsoft Research taught me that multilingual AI is not a translation problem — it is a representation problem. The model has to learn that meaning is not language-specific. That insight is what unlocked Z-Code's performance across 100+ languages.
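To make the representation idea concrete, here is a toy sketch — the vectors below are made up for illustration and come from no real model. In a shared multilingual embedding space, a sentence and its translation should land close together, while an unrelated sentence should not, regardless of which language any of them is written in:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means same direction, ~0 means unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings in a shared multilingual space (illustrative only).
emb = {
    "the cat sleeps":    np.array([0.90, 0.10, 0.00]),
    "el gato duerme":    np.array([0.88, 0.12, 0.05]),  # Spanish translation
    "stock prices fell": np.array([0.00, 0.20, 0.95]),  # unrelated meaning
}

# Same meaning, different languages -> nearly identical direction.
print(cosine(emb["the cat sleeps"], emb["el gato duerme"]))
# Different meaning, even in the same language -> nearly orthogonal.
print(cosine(emb["the cat sleeps"], emb["stock prices fell"]))
```

The design point is that nothing in the similarity computation knows which language a vector came from: if training has aligned meanings across languages, language becomes invisible at this layer.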
The deployment gap
Most research papers stop at the model. But getting to real-world impact means solving a second set of problems:
- How do you serve a model at a cost that makes it accessible?
- How do you evaluate quality in languages where human judgment is hard to source?
- How do you keep the model safe when you cannot enumerate every possible input?
These are not glamorous problems. They rarely produce papers. But they are the ones that determine whether your work matters.
What I believe about AI right now
The models we have today are already more capable than most of the world knows. The bottleneck is not intelligence — it is integration. Getting AI into the hands of people who can use it, in the languages they speak, in the workflows they already live in.
That is the work worth doing.