Model Behavior: MLOps Starts at Development
Learn how model development decisions cut MLOps risk, from business alignment to drift.
Most teams I talk to want the same thing: the benefits of machine learning (ML) at scale without becoming a miniature version of Google or Facebook.
Today, that’s a telco trying to meet FCC pressure to curb robocalls. Or a sportsbook getting ready for new state-by-state legalization and leaning hard on regression and classification to set lines and manage risk. The common pattern: you don’t have a giant MLOps platform team, but you still have to ship reliable models into production.
The key idea: model development is MLOps. Every decision you make before deployment either reduces or creates operational risk later. If you treat those decision points as product decisions, you get a lot of MLOps “for free,” even in constrained environments.
Start With Business Alignment, Not Algorithms
In non–Big Tech environments, the most important MLOps control lives outside the code: clear business objectives.
Before you touch a Jupyter notebook, you should be able to answer in plain language:
- What problem are we solving, for whom?
- How will a model change a decision, workflow, or outcome?
- What does “good enough” look like in business terms?
For a robocall model, the objective might be: “Reduce robocall complaints by 40% over 12 months while keeping false blocks of legitimate calls under 0.1%.” For a sportsbook, maybe: “Maintain target margin on in-play bets within a 1–2% band while handling latency spikes on Sundays.”
Those targets immediately drive MLOps concerns:
- What metrics do we monitor in production (complaints, margin, latency, false positive rate)?
- Who owns those metrics? Is it operations, product, a risk committee?
- At what thresholds do we roll back, retrain, or escalate?
If these decisions are made during model development, you’re not just “trying algorithms.” You’re defining the operational contract your future MLOps stack will have to enforce.
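One lightweight way to make that contract explicit is to write it down as versioned configuration and check observed metrics against it. The sketch below uses the telco targets from above; the metric names, owners, and actions are illustrative, not a standard schema.

```python
# A minimal, illustrative "operational contract" for the robocall model.
# Metric names, owners, thresholds, and actions are hypothetical examples.
OPERATIONAL_CONTRACT = {
    "complaint_reduction_pct": {"target": 40.0, "owner": "operations"},
    "false_block_rate": {"max": 0.001, "owner": "risk_committee", "action": "rollback"},
    "scoring_latency_p99_ms": {"max": 250, "owner": "platform", "action": "escalate"},
}

def breached_metrics(observed: dict) -> list[str]:
    """Return the metrics whose observed values violate the contract."""
    breaches = []
    for name, rule in OPERATIONAL_CONTRACT.items():
        value = observed.get(name)
        if value is None:
            continue  # metric not reported this period
        if "max" in rule and value > rule["max"]:
            breaches.append(name)
        if "target" in rule and value < rule["target"]:
            breaches.append(name)
    return breaches
```

Even a config this small forces the conversation about owners and thresholds to happen before deployment, not during an incident.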
Treat EDA as Product Discovery, Not Just Data Prep
Exploratory Data Analysis (EDA) is usually framed as a data science task: profiling distributions, correlations, missingness. That’s important. But I’d argue that EDA is also product discovery under constraints.
As you explore the data, ask:
- Can we legally use this data for this purpose, under GDPR, FCC rules, or internal policies?
- Will this data realistically be available in production, at the latency we need?
- Is the quality stable enough that we won’t be firefighting broken pipelines every week?
Imagine our telco. On paper, they have rich call detail records (CDRs), customer complaints, and network metadata. During EDA, the team discovers:
- Complaint labels arrive with a multi-day lag.
- Some network attributes are noisy or inconsistently populated.
- Real-time access to certain fields would require a major integration with a legacy switch.
Those findings should change the product scope. Maybe the first increment isn’t “real-time blocking,” but:
- Near real-time flagging of high-risk numbers for additional verification.
- Daily risk scores used to throttle suspicious traffic or trigger manual review.
By treating EDA as a joint product–data exercise, you design something you can actually run, instead of an idealized model that collapses when it hits production.
Engineer Features You Can Operate and Explain
Feature engineering is where data science creativity shines, but it’s also where future MLOps debt quietly accumulates.
In a telco robocall model, you might consider features like:
- Number of unique called destinations in the last hour.
- Ratio of answered vs. unanswered calls.
- Time-of-day patterns compared to typical behavior for that line.
Each feature has a lifecycle cost: it must be computed reliably, monitored, and explained to someone (regulators, ops, customers). A similar story plays out in sportsbooks, where you might use bet velocity, stake patterns, or market movements.
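As a sketch, two of the telco features above might be computed from raw call detail records like this, assuming a pandas DataFrame with hypothetical `caller`, `callee`, `answered`, and `ts` columns:

```python
import pandas as pd

def caller_features(cdrs: pd.DataFrame, window_end: pd.Timestamp) -> pd.DataFrame:
    """Per-caller features over the hour ending at window_end.

    Assumes columns: caller, callee, answered (bool), ts (timestamp).
    Column names are illustrative, not a real CDR schema.
    """
    window = cdrs[
        (cdrs["ts"] > window_end - pd.Timedelta(hours=1)) & (cdrs["ts"] <= window_end)
    ]
    return window.groupby("caller").agg(
        unique_destinations=("callee", "nunique"),  # distinct numbers called
        answer_rate=("answered", "mean"),           # share of calls answered
        call_count=("callee", "size"),              # total calls in the window
    )
```

Note that each line of this function is also an operational dependency: the `ts` clock must be trustworthy, and the hourly window must be computable at the latency production requires.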
Product thinking here means asking:
- Does this feature map to a real behavior or risk concept that the business understands?
- Can we explain it to non-technical stakeholders if a decision is challenged?
- Do we know who owns the underlying data pipeline when it breaks at 2 a.m.?
Sometimes the “best” model on a leaderboard isn’t the right choice because its features are too brittle or opaque. Choosing simpler, more interpretable features can dramatically reduce operational risk later.
Make Reproducibility a First-Class Requirement
Data scientists iterate. In a typical project, you might train dozens of model variants before selecting one to ship. If you can’t reproduce how you got to the winning model, you’re setting your future self (or team) up for pain.
Today, you don’t need a full-blown ML platform to improve this. You can:
- Version-control code, configuration, and training scripts.
- Snapshot or reference the exact training data used.
- Save model artifacts with clear metadata: training window, features, hyperparameters, evaluation metrics.
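These three practices can be combined in a few lines of plain Python, no platform required. The function and field names below are illustrative; the point is that the metadata travels with the artifact:

```python
import hashlib
import json
import pickle
from datetime import datetime, timezone
from pathlib import Path

def save_model_artifact(model, out_dir: str, *, features: list, hyperparams: dict,
                        training_window: tuple, metrics: dict, data_path: str) -> dict:
    """Persist a model alongside the metadata needed to reproduce it.

    Field names are illustrative, not a standard schema.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "model.pkl").write_bytes(pickle.dumps(model))
    metadata = {
        "created_at": datetime.now(timezone.utc).isoformat(),
        "training_window": list(training_window),
        "features": features,
        "hyperparameters": hyperparams,
        "evaluation_metrics": metrics,
        # Hash of the training data snapshot, so it can be verified later.
        "training_data_sha256": hashlib.sha256(Path(data_path).read_bytes()).hexdigest(),
    }
    (out / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return metadata
```

Committing the code that calls this, alongside the config it was called with, gives you a defensible trail from model version back to training run.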
This isn’t just about neatness. When a regulator asks, “Why did the model start blocking these calls?” or an executive asks, “What changed between last quarter and this one?” reproducibility is what allows you to answer without hand-waving.
From an MLOps perspective, reproducibility is the bridge between “experiment” and “service.” If it can’t be reproduced, it can’t be safely retrained, audited, or improved.
Embed Responsible AI Before Something Blows Up
Responsible AI can sound like a buzzword, but it’s really just structured risk management: legal, reputational, and ethical.
For our telco, over-aggressive blocking that hits small businesses disproportionately can quickly become a public relations and regulatory problem. For sportsbooks, models that systematically nudge certain vulnerable groups toward riskier bets can attract scrutiny.
During model development, that means explicitly deciding:
- Which attributes we will never include (or proxy) in models.
- What fairness or disparate impact checks we’ll run before shipping.
- How long we’ll retain data and under what consent.
These aren’t after-the-fact policy documents. They’re design constraints that shape features, model choices, and monitoring plans.
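One concrete pre-ship check for the telco case: compare the false block rate across customer segments and flag any segment that exceeds a tolerance relative to the overall rate. The segment names, record shape, and 2x tolerance below are all hypothetical starting points:

```python
def segment_false_block_rates(records: list[dict]) -> dict[str, float]:
    """False block rate per segment.

    Each record is assumed to have 'segment', 'blocked' (bool),
    and 'legitimate' (bool) keys -- an illustrative shape.
    """
    by_segment: dict[str, list[dict]] = {}
    for r in records:
        by_segment.setdefault(r["segment"], []).append(r)
    rates = {}
    for segment, rs in by_segment.items():
        legit = [r for r in rs if r["legitimate"]]
        false_blocks = sum(1 for r in legit if r["blocked"])
        rates[segment] = false_blocks / len(legit) if legit else 0.0
    return rates

def flag_disparate_impact(rates: dict[str, float], overall: float,
                          tolerance: float = 2.0) -> list[str]:
    """Flag segments whose false block rate exceeds tolerance x the overall rate."""
    return [s for s, r in rates.items() if overall > 0 and r > tolerance * overall]
```

A check like this, run before every release, turns "fairness" from a policy document into a gate with a named owner.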
Plan for Decay Before You Ship
Every model faces drift: the world changes, customers adapt, fraudsters evolve. Training, evaluation, and drift handling are where model development and MLOps meet in the middle.
Before deploying, define:
- The core performance metrics you’ll track (e.g., robocall detection rate, false block rate, sportsbook margin stability).
- The acceptable ranges for those metrics over time.
- The signals of data drift you’ll watch (changes in input distributions, shifts in complaint types, new betting patterns).
- The operational response: retrain schedule, thresholds for investigation, and rollback criteria.
You don’t need complex tooling to start. A few well-chosen dashboards and a simple “playbook” for what happens when metrics move can deliver most of the value.
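A minimal sketch of one such check is the population stability index (PSI) between a feature's training distribution and its recent production distribution. The 0.1 and 0.25 cut points below are conventional starting values, not universal thresholds; tune them per metric in your playbook:

```python
import math

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population stability index between two samples of one feature."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def proportions(values):
        counts = [0] * bins
        for v in values:
            counts[sum(v > e for e in edges)] += 1  # index of the bin v falls in
        # Small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(p, q))

def drift_action(psi_value: float) -> str:
    """Illustrative playbook thresholds; tune these per metric."""
    if psi_value < 0.1:
        return "ok"
    if psi_value < 0.25:
        return "watch"
    return "investigate"
```

Run nightly over a handful of key features and wired to the playbook above, this is already most of a drift-monitoring system.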
A Realistic Scenario: Mid-Sized Telco, Limited Stack
Let’s pull this together.
A mid-sized telco today wants to reduce robocall complaints and satisfy FCC expectations. They have a small data science team, partial cloud adoption, and several critical systems still on-prem.
Here’s how they might approach it:
- They define the objective: meaningful complaint reduction with strict limits on false positives, and identify a cross-functional owner spanning product, legal, and operations.
- During EDA, they discover latency and data quality constraints, so they scope the first release to risk scoring and throttling, not instant blocking.
- They design a compact set of interpretable features, aligned with call behavior, that support staff can understand and explain.
- They implement lightweight reproducibility: Git for code, stored training configs, and a clear mapping between model versions and deployment dates.
- They bake in responsible AI considerations by excluding certain fields, reviewing feature impacts by customer segment, and aligning with internal compliance.
- They define a monitoring plan with clear thresholds and escalation paths for when drift or performance issues appear.
No fancy platform, no army of MLOps engineers—just a series of explicit, product-minded decisions during model development that dramatically reduce operational risk.
Turn Model Development into Your MLOps Playbook
If you’re outside Big Tech, you probably won’t build a massive ML platform team. But you don’t need to.
You can get much of the benefit by treating model development as the front line of MLOps:
- Align early on business objectives, owners, and risk.
- Use EDA to discover constraints and adjust scope.
- Design features that are explainable and maintainable.
- Make reproducibility non-negotiable.
- Embed responsible AI from the start.
- Plan for drift and decay before you deploy.
Do that, and you’re not just building models; you’re building operationally viable products that can survive real-world conditions.
Further Reading
- Huyen, C. (2022). Designing machine learning systems: an iterative process for production-ready applications. O’Reilly.
- Majors, C., Fong-Jones, L., & Miranda, G. (2022). Observability engineering: achieving production excellence. O’Reilly.
- Mallari, M. (2019, June 3). Model packaging: ship shape ML for real-world impact. Fundamental Hybrid Thinking & Doing by Michael Mallari. https://www.michaelmallari.com/product/model-packaging-ship-shape-ml-for-real-world-impact/
- Mallari, M. (2019, June 1). MLOps, not magic: scale ML beyond big tech. Fundamental Hybrid Thinking & Doing by Michael Mallari. https://www.michaelmallari.com/product/mlops-not-magic-scale-ml-beyond-big-tech/
- Stenac, C., Dreyfus-Schmidt, L., Lefevre, K., Omont, N., & Treveil, M. (2020). Introducing MLOps: how to scale machine learning in the enterprise. O’Reilly.