How AI Changes the Role of Applied Scientists
Levi Boxell, Tilman Drerup, Alexandr Lenk
The Economics Team at Instacart is an applied science team that operates at the intersection of machine learning engineering and economics. Similar to other applied science teams, our work involves a good chunk of engineering, steeped in statistics, math, theory, and strategy. And while that is still at the heart of what we do today, the surprisingly rapid emergence of artificial intelligence has also fundamentally altered our work in ways that we did not see coming.
With this post, we want to provide a brief check-in and share an analysis of the patterns we are seeing from a distinctly economic perspective. To do so, we analyze the empirical dynamics of our project portfolio between 2023 and today, looking at the evolution of both the nature and quantity of our work over time. To start, let’s have a quick refresher of what economists at Instacart do and provide a theoretical framework to think about the impact of technological change through AI.
Background & Theoretical Framework
At Instacart, economists spend their day-to-day on a diverse portfolio of tasks and activities. Similar to other applied science teams within the company, our work relies on a blend of skills, including economics, statistics, math, machine learning, data manipulation, coding, and AI. Due to this versatility in tasks, the team’s work provides a particularly rich testing ground for predictions derived from economic theories concerning the impact of technological change.
But what does economic theory actually tell us? A useful theoretical abstraction for an applied scientist’s role is to frame it as a bundle of tasks (Autor, Levy, and Murnane, 2003), with each task characterized by its own production function (Acemoglu and Autor, 2011). Slightly simplified, a production function tells us how much output we can produce for a given level of input in a specific task. Comparisons of production functions across tasks in turn determine how we allocate our time and effort. Recent advances in AI have meaningfully affected the production functions associated with a wide variety of tasks, with bigger effects on some (e.g., coding) and currently smaller effects on others (e.g., causal inference). In consequence, we should see meaningful shifts in the distribution of tasks in our portfolio that reflect the differential effect AI has had on different types of tasks (Acemoglu and Restrepo, 2018 and 2019). Notably, in certain types of tasks, AI may even have discontinuous effects as it raises an applied scientist’s competence above thresholds of minimum viable competence, making entirely new task categories feasible.
To make this more concrete, AI is already delivering large productivity gains in many pattern-matchable, high-volume tasks, including the development of standard ML pipelines, data transformations, and boilerplate code. In such tasks, applied scientists should be able to produce more output in less time. This should free up capacity that can be reallocated to other tasks. At the same time, entirely new tasks may enter an applied scientist’s portfolio as AI allows them to reach the minimum competence required to execute such tasks. For example, an applied scientist without experience in writing frontend code might now be able to produce functional web applications. Similarly, a scientist with no experience in configuring CI/CD pipelines should now be able to stand them up.
Let’s jump into the data to see whether we can find evidence that is consistent with such predictions.
The Data: GitHub Contributions
Our analysis is based on three years of our team’s GitHub activity between 2023 and 2025. While we had some turnover during this time, our hiring criteria did not materially change, so the differences in output or focal areas documented below are unlikely to reflect team composition or selection effects. Importantly, we did not explicitly recruit specialists in any newly emerging task categories. In our analysis, we focus on PRs and lines of code (LoC) written per person as our primary output metrics. Admittedly, these are far from perfect measures of productivity, as they do not take the quality of output into account. That said, we have plenty of anecdotal evidence of individuals reaching milestones at a faster pace and with comparable quality. So while these metrics capture some components of productivity, they are far from exhaustive, and there is much more to the job than what shows up in these patterns. At the same time, the distributional shifts we highlight below provide some evidence that what we see reflects true productivity adjustments in line with theoretical expectations.
To analyze our GitHub activity, we used an LLM to classify every pull request made by the team between 2023 and 2025. The classifier assigned each PR to one of eight categories (ML/Model Dev, Data Pipelines, Platform/Tooling, Analysis/Research, Infra/DevOps, Frontend/UI, Experimentation, and Other) based on the PR’s title, description, and changed files. All productivity metrics are reported relative to the 2023H1 baseline.
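The classification step described above could be sketched roughly as follows. This is a minimal illustration, not the team’s actual classifier: the prompt wording, the `build_prompt` and `classify_pr` helpers, the fallback to "Other", and the `fake_llm` stub are all assumptions made for the example. Only the eight category names and the three input fields come from the post.

```python
from typing import Callable

# The eight categories named in the post.
CATEGORIES = [
    "ML/Model Dev", "Data Pipelines", "Platform/Tooling", "Analysis/Research",
    "Infra/DevOps", "Frontend/UI", "Experimentation", "Other",
]


def build_prompt(title: str, description: str, changed_files: list[str]) -> str:
    """Assemble a classification prompt from the three PR fields (hypothetical wording)."""
    files = "\n".join(f"- {path}" for path in changed_files)
    return (
        "Classify this pull request into exactly one category.\n"
        f"Categories: {', '.join(CATEGORIES)}\n\n"
        f"Title: {title}\n"
        f"Description: {description}\n"
        f"Changed files:\n{files}\n\n"
        "Answer with the category name only."
    )


def classify_pr(
    title: str,
    description: str,
    changed_files: list[str],
    llm: Callable[[str], str],
) -> str:
    """Return one of CATEGORIES; fall back to 'Other' if the answer is not recognized."""
    answer = llm(build_prompt(title, description, changed_files)).strip()
    return answer if answer in CATEGORIES else "Other"


# Hypothetical stub standing in for a real model call, so the sketch runs offline.
def fake_llm(prompt: str) -> str:
    return "Frontend/UI" if ".tsx" in prompt else "Unknown"


print(classify_pr("Add dashboard page", "New UI view", ["app/Dashboard.tsx"], fake_llm))
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM provider, and validating the answer against the fixed category list guards against free-form responses.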
Productivity
Let’s first look at overall productivity in terms of code contributions, one of the most visible signs of an applied scientist’s raw output. As is evident from the two plots above, we saw a massive increase in the number of PRs and LoC, each essentially doubling relative to its respective 2023H1 baseline. This overall trend is consistent with a meaningful upward shift in the team’s general productivity. Notably, the effects seem to become larger over time, likely reflecting the release of increasingly powerful tooling. We see a first small increase in the second half of 2023, following the release of new AI-powered productivity tools like Ava and their growing popularity within the team. A much more pronounced jump shows up in the first half of 2025, coinciding with the team’s broader adoption of Cursor agents. In 2026 (not shown), we saw a further substantial acceleration as we began to integrate Claude into our workflows, with the number of monthly Claude usage days across the team increasing by more than 400% (!) between January 2026 and April 2026 alone.
Differential Skills Drive Redistribution and Task Diversification
When we look at the distribution of tasks performed by individuals on the team, one thing immediately stands out: The average number of unique categories of tasks per member increased by 33%, adding approximately 1.3 new categories per member. At the same time, the average share of work outside each person’s primary task category (the one they focus most of their effort on) rose by ~37%. These results suggest that the team has started to work on a more diverse and less concentrated set of tasks.
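To make the two diversification metrics above concrete, here is a minimal sketch of how they could be computed from classified PRs. It assumes that a person’s primary category is the one with the most PRs (the post defines it by effort, which we approximate by PR count here), and the toy data is made up for illustration.

```python
from collections import Counter, defaultdict


def diversification_metrics(prs):
    """prs: iterable of (author, category) pairs, one per pull request.

    Returns a tuple of
      (average number of unique categories per member,
       average share of PRs outside each member's primary category).
    """
    by_author = defaultdict(Counter)
    for author, category in prs:
        by_author[author][category] += 1

    unique_counts, outside_shares = [], []
    for counts in by_author.values():
        total = sum(counts.values())
        primary = max(counts.values())  # PR count of the most frequent category
        unique_counts.append(len(counts))
        outside_shares.append(1 - primary / total)

    n = len(by_author)
    return sum(unique_counts) / n, sum(outside_shares) / n


# Made-up toy data, not the team's actual PRs.
toy = [
    ("a", "ML/Model Dev"), ("a", "ML/Model Dev"),
    ("a", "Data Pipelines"), ("a", "Frontend/UI"),
    ("b", "Data Pipelines"), ("b", "Data Pipelines"),
]
avg_unique, avg_outside = diversification_metrics(toy)
print(avg_unique, avg_outside)
```

In the toy data, member "a" touches three categories with half of their PRs outside the primary one, while member "b" stays in a single category, so the averages come out to 2 categories and a 25% outside share.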
To better understand what’s happening here, the table below presents a breakdown of the distribution of specific tasks in the team’s portfolio and how it changed over time. What’s behind these changes and does economic theory provide an explanation?
As we mentioned above, we expect AI to have a highly heterogeneous effect on the production functions associated with different types of skills, raising productivity in particular in areas that are more boilerplate and pattern-matchable. If that is indeed the case, the share of time allocated to such tasks should drop, freeing up capacity to work on alternative tasks. This is exactly what we find: More standardized tasks under Core ML (Data Pipelines or ML/Model Dev) saw declining shares, falling from 73% to 64% of the team’s PRs. Similarly, we observe lower shares for traditional productionization infrastructure tasks, dropping from 7.8% to 4.1%. Importantly, these share decreases coincide with an increase in overall output associated with such tasks. In essence, it seems that more standardized tasks now require substantially less time, allowing the team to do more of them while also freeing up capacity to work on other tasks.
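A falling share and a rising volume can coexist because total output grew. The back-of-the-envelope sketch below illustrates this with the shares quoted above. It assumes that total PRs exactly doubled (the post says "essentially doubling") and that this growth factor applies to the same base the shares are measured on; both are simplifications for illustration, and `volume_growth` is a hypothetical helper.

```python
def volume_growth(share_before: float, share_after: float, total_growth: float) -> float:
    """Growth factor of a category's PR count, given its shares and total PR growth."""
    return total_growth * share_after / share_before


# Assumption: total PRs exactly doubled relative to the baseline.
TOTAL_GROWTH = 2.0

core_ml = volume_growth(0.73, 0.64, TOTAL_GROWTH)   # Core ML share: 73% -> 64%
infra = volume_growth(0.078, 0.041, TOTAL_GROWTH)   # Infra share: 7.8% -> 4.1%

print(f"Core ML PR volume: x{core_ml:.2f}")
print(f"Infra PR volume:   x{infra:.2f}")
```

Under these assumptions, Core ML volume grows by roughly 1.75x and infrastructure volume by roughly 1.05x, even though both categories lose share.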
And where did that freed capacity go? One particularly interesting theoretical prediction is that AI raises competence above what is minimally viable on tasks previously outside a worker’s skill set. Applied to our context, this means that scientists can now take on tasks outside of the core scientist skill set as the required specialized knowledge is becoming more easily accessible. And this is exactly what we see when we look at the categories that rise to prominence in our portfolio: The share of Platform/Tooling more than doubled from ~5.2% to ~12.1% of PRs per person and Frontend/UI emerged as an entirely new task. We are now seeing economists building full-stack web applications, something that would previously have required collaboration with separate engineering teams. Put differently, the data shows a redistribution across task types that is driven by differential shifts in productivity: the categories where AI compresses costs most lose share, while newly feasible categories absorb the freed capacity.
This lifting of competence above the minimum viable threshold in several new classes of tasks is particularly interesting because it allows teams like ours to avoid certain transaction and coordination costs. In projects that require a highly diverse set of skills (e.g., building a UI for a budget allocation system), a lot of time is spent describing what’s needed to different functions, generating alignment, and getting on a roadmap. AI massively lowers self-production costs in such scenarios. When the cost of doing it yourself with AI falls below the cost of delegating to a specialist plus the coordination tax, the task migrates. This applies both across teams (scientists absorbing frontend work) and within teams (individuals becoming more self-sufficient). The direction of task migration depends on where AI’s skill shift bites hardest: codifiable tasks are most susceptible to absorption. Frontend/UI is the clearest case of such task absorption for our team. In 2023, virtually no economist wrote frontend code. By 2025, economists were building full-stack web applications themselves, work that previously meant delegating to separate engineering teams with all the attendant coordination costs: getting on their roadmap, describing requirements, iterating on designs.
Importantly, it is not the case that economists are encroaching upon or replacing frontend or platform engineering. The applications being built are internal tools that would likely never have made it onto those teams’ roadmaps in the first place. Instead, AI has effectively unlocked a class of previously unfunded work, allowing us to self-fund and self-execute projects that previously fell below the line for any team to own because of the diversity of skills they require.
A concrete example is Apex, an experimentation dashboarding and NPV-grounded decisioning tool built end-to-end by the Economics team. Apex was scoped to fill a real gap in our stack to enable easier cross-experiment monitoring of key metrics to drive faster and better decision-making. Spinning up a cross-team project for this in advance would have been hard to justify: the use cases were unproven and the requirements were fuzzy. AI changed this calculus. With self-production cheap enough, we shipped a working end-to-end version, put it in front of users, and let real adoption decide which pieces deserved further investment. The pieces that proved their value have since started to graduate into the central experimentation platform.
What about the large increase in Experimentation (+45.2%)? Not every category tells a clean story, and with a relatively small team spread across eight task categories, some variance is expected. The cleaner signal is in the aggregate: a broad shift away from Core ML toward Systems Engineering and newly feasible task types. Individual category movements, particularly in smaller buckets like Experimentation and Infra/DevOps, should be interpreted with that in mind.
A Pitfall: Platformization Tensions
With implementation costs dropping, it becomes particularly tempting to think about platformizing certain types of solutions. By platformization, we are referring to the abstraction of similar tasks into a codified and standardized solution. For repeated tasks, building a platform solution may be worth it when `Fixed Cost < Expected Future Uses × Manual Cost Savings per Use`. While AI lowers the Fixed Cost, it also lowers the per-instance manual cost. The net effect is ambiguous and depends on which cost falls faster.
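The ambiguity can be illustrated with a small sketch of the break-even rule above. All numbers are hypothetical person-hours chosen for illustration, and the `platform_worth_it` helper and its optional `platform_cost_per_use` parameter are assumptions, not part of the team’s tooling.

```python
def platform_worth_it(
    fixed_cost: float,
    expected_uses: int,
    manual_cost: float,
    platform_cost_per_use: float = 0.0,
) -> bool:
    """Break-even rule from the text: Fixed Cost < Expected Future Uses x Savings per Use."""
    savings_per_use = manual_cost - platform_cost_per_use
    return fixed_cost < expected_uses * savings_per_use


# Hypothetical pre-AI numbers, in person-hours.
FIXED, USES, MANUAL = 400.0, 50, 6.0

# Before AI: 400 vs 50 * 6.0 = 300
before = platform_worth_it(FIXED, USES, MANUAL)

# Scenario A: AI cuts the fixed cost by 60% but the manual cost by only 20%.
scenario_a = platform_worth_it(FIXED * 0.4, USES, MANUAL * 0.8)

# Scenario B: AI cuts the fixed cost by only 20% but the manual cost by 60%.
scenario_b = platform_worth_it(FIXED * 0.8, USES, MANUAL * 0.4)

print(before, scenario_a, scenario_b)
```

With these made-up numbers, the platform is not worth building before AI, becomes worthwhile when the fixed cost falls faster (scenario A), and stays unattractive when the manual cost falls faster (scenario B).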
Returning to the category composition chart, the Platform/Tooling line tells a clear story: its share of PRs increased by roughly 132%, from ~5.2% to ~12.1%. This is consistent with AI lowering the fixed cost of platform-building faster than it lowered bespoke execution costs in some areas for our team. However, this only tells part of the story, and our own experience illustrates the tension between the two cost types. Early in the AI wave, we invested in a Causal Inference Platform, CIP. CIP was a UI-based tool that provided company-wide access to standard causal inference methods, such as synthetic controls. This seemed like the right platformization bet: automate repeated analytical workflows behind a clean interface.
But AI may have undermined the premise. As AI coding assistants made bespoke causal inference dramatically faster to execute, the per-instance savings from a standardized platform shrank. Rather than clicking through the UI and being constrained by our design choices, users can simply type “Use a synthetic control to estimate the impact of our investments in region A on GTV.” The agentic interface can leverage our internal MCPs to explore annotated data tables, write Python estimation and visualization code, and run the analysis with real data in one shot, with the user able to adjust and guide the analysis in ways that are not possible in the UI tooling. The complexity of real causal inference fits poorly into a constrained UI. Apex is a contrasting case, where context, analysis, and decision frameworks are more tightly controlled.
For some use cases like causal inference, the rise of AI-powered skills and agents suggests a different model entirely: machine-interactable tools, such as MCPs and Agentic Skills, that an AI can invoke flexibly, rather than human-interactable dashboards that lock in a particular workflow. Machine-interactable tools can still provide vetted standardization of causal inference techniques (e.g., leveraging synthetic difference-in-differences as the default rather than vanilla synthetic controls, or conducting specific falsification tests to verify key methodological assumptions), but they also let end users work in the agentic environments where they already do their data analysis, and with greater flexibility. We may have gotten the form of the platform wrong even as we correctly identified the impulse to platformize.
Organizations like ours will have to think thoroughly about which tasks belong in which roles and which workflows deserve platforms, and how those platforms should be constructed in an agentic-first work environment.
Conclusion
The analysis of our team’s GitHub data reveals patterns that are broadly consistent with theoretical arguments from the literature. The emergence of AI seems to have caused heterogeneous shifts in the distribution of skills, which in turn have meaningfully changed the types of tasks we work on. Where the productivity impact has been largest, cost-per-output has dropped and capacity was freed; where the capability floor rose, entirely new categories became feasible. For the team, this meant that a lot of exciting new areas opened up for exploration, adding entirely new facets to what we have historically worked on.
Looking ahead, we expect more changes to come our way.
AI-driven shifts in the industry may also alter what is demanded of applied scientist teams, such as increased prioritization of cutting edge methodology that improves experimentation velocity to keep up with the pace of new feature development.
And, paradoxically, the tasks that entered our portfolio most recently due to a lowering of the capabilities floor may also be the ones that exit again for the same reason. As the cost of performing these tasks with AI continues to fall, theory would predict a second reallocation wave towards judgment-intensive work where the productivity gap is largest: problem framing, modelling, causal identification, and deciding which results to trust. We are beginning to see exciting new directions opening up in this domain as we try to establish effective collaboration models with agentic partners.
Thank you for reading! This post is part of a series covering the Economics Team at Instacart and the areas we work on. If you would like to learn more about our work, be sure to check out our other posts on optimization via regression discontinuity designs or on using surrogate indices for measuring long-run treatment effects. You can also follow tech-at-instacart to be notified as we release new posts.