Applying statistical and computational methods to prediction.
academically-rooted · research-adjacent · senior-skewed
Data scientists analyze data, build predictive models, and productionize machine learning systems using statistical methods, deep learning frameworks, and MLOps tooling. The work sits closer to research than to application development, with PyTorch, TensorFlow, and the Python data stack as standard tools. Mathematical foundations matter more here than in most engineering roles. Specialization paths fragment further by problem type, including NLP, computer vision, classical ML, and generative AI research.
Specializations
Share of postings · n=6 tracks
Classical ML & Statistical Modeling
~40%
Share of postings
Roles centered on traditional ML techniques such as classification, regression, clustering, recommendation systems, and feature engineering. Scikit-learn anchors the toolkit, alongside XGBoost and statistical libraries. Strong mathematical and statistical foundations matter more than deep learning expertise. The largest data science segment by hiring volume.
Roles focused on building applications with LLMs, including RAG, agents, prompt engineering, and fine-tuning workflows. LangChain, LlamaIndex, vector databases, and OpenAI APIs define the stack. The newest and fastest-growing data science track. Often overlaps with AI application engineering but with deeper modeling context.
LLM Applications · RAG Pipelines · Fine-Tuned Models · Generative AI Products
Natural Language Processing
~15%
Share of postings
Roles focused on natural language processing, including text analytics, named entity recognition, information retrieval, semantic search, and language understanding. spaCy, NLTK, BERT, and HuggingFace transformers anchor the toolkit. The classical NLP track, distinct from pure LLM application work.
Roles focused on image processing, video analytics, and visual recognition systems. OpenCV, YOLO, and Detectron2 anchor the toolkit, alongside CNN-based architectures. The work spans medical imaging, autonomous systems, and consumer applications. A specialist track requiring strong perception expertise.
Roles focused on speech recognition, audio processing, and signal processing at the model and algorithm level. Research-focused work on the underlying techniques rather than end-user voice products. A narrow specialist segment that does not substitute with NLP or computer vision.
ML roles targeting edge devices, embedded systems, RTOS, and model optimization for constrained environments. TFLite, ONNX, NVIDIA TensorRT, and CUDA define the deployment toolkit. The work emphasizes inference efficiency and on-device performance over training scale. A niche but fundamentally different deployment context.
Data science and ML hiring breaks into a Python-and-statistics core that defines the role and four problem-type tracks that shape it depending on whether the work leans toward classical methods, speech and audio, natural language processing, or computer vision. The two subsections below separate what hiring managers expect from what they value as a plus.
Core skillsets—what hiring managers expect
Python anchors the daily toolkit alongside Scikit-learn, Pandas, and NumPy as the canonical data-science stack. SQL, Git, and Linux form the engineering baseline that lets practitioners pull data and ship code outside the notebook. Transformers and Neural Networks lead the deep-learning architecture knowledge, with CNN appearing where vision work shows up. Reinforcement Learning, Supervised Learning, and Unsupervised Learning frame the paradigm vocabulary. The four tracks split the work: classical ML through Feature Engineering, Statistics, R, and Random Forest; speech and audio through Speech LLMs and Speech Processing; natural language processing through Information Retrieval, BERT, and Semantic Search; and computer vision through Computer Vision Algorithms, OpenCV, and Image Processing.
Java and C/C++ surface where data science models meet production backend code outside the Python ecosystem. PyTorch and TensorFlow anchor the deep-learning frameworks band, with Keras as the higher-level alternative. AWS, Azure, and GCP host training and inference infrastructure, paired with Docker and Kubernetes for portable workloads. Spark, PySpark, and Hadoop handle big-data processing for feature pipelines, while Airflow and Kafka orchestrate data movement upstream of the model. MLflow, SageMaker, Kubeflow, and Vertex AI define the MLOps tier where teams version, track, and deploy models at scale.
Backend Programming Languages
Java · C/C++
Deep Learning Frameworks
PyTorch · TensorFlow · Keras
Cloud Platforms & Containers
AWS · Azure · GCP · Docker · Kubernetes
Big Data & Pipelines
Data Pipelines · Spark · Data Processing · Hadoop · PySpark · Data Cleansing
Data Engineering Tools
Airflow · Kafka · Databricks
MLOps & ML Platforms
MLflow · SageMaker · Kubeflow · Vertex AI
Section 3 / Demand & Pay
Where the market sits and what it pays
Data Science and ML sits in the lower-volume tier of the snapshot, at ~21 postings per week across the window. MNCs and GCCs lead the company mix at ~39%, with MAANG and elite global tech second at ~19%. Median pay: fresher band sits at 15 LPA, mid at 32 LPA, senior at 52 LPA. The panels below cover volume and company mix, then a zoom into fresher-accessible roles.
MNCs & GCCs ~39% · Unicorns & Indian Product ~5% · MAANG & Elite Global Tech ~19% · Established SME ~12% · Funded Startups ~5% · Indian IT Services / WITCH ~13% · Lala Companies ~4% · Other ~4%
Window overall · ~21 / wk
Volume opened near ~35 per week in January, halved to ~17 in February, recovered to ~19 in March, then ran ~17 across April and ~18 across May. The mix carries the snapshot's strongest MAANG ramp: MAANG and elite global tech climbed from ~12% in January to ~23% by May, a gain of ~11 pp. Indian IT services dropped sharply, from ~22% in January to ~11% in May, tying for the lowest WITCH share in the field. MNCs and GCCs held in the ~34 to ~48% range across the window, peaking in April. The funded startups share is one of the highest in the snapshot at ~5%, and the FA p90 reaches ~42 LPA, the highest fresher upper tail in the field.
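The window-level figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the ~21 per week window figure is close to an unweighted mean of the five monthly rates (the snapshot does not state its averaging method), and the helper names are hypothetical.

```python
# Monthly per-week posting rates quoted in the paragraph above.
weekly_by_month = {"Jan": 35, "Feb": 17, "Mar": 19, "Apr": 17, "May": 18}


def mean_weekly_volume(by_month):
    """Unweighted mean of the monthly per-week rates (an assumption,
    since the snapshot's own weighting is not stated)."""
    return sum(by_month.values()) / len(by_month)


def pp_change(start_pct, end_pct):
    """Change between two percentage shares, in percentage points."""
    return end_pct - start_pct


window_avg = mean_weekly_volume(weekly_by_month)
maang_gain = pp_change(12, 23)   # MAANG share, January to May
witch_change = pp_change(22, 11)  # Indian IT services share, January to May

print(f"window average: ~{window_avg:.0f} / wk")
print(f"MAANG share change: {maang_gain:+d} pp")
print(f"Indian IT services share change: {witch_change:+d} pp")
```

Under that assumption the mean lands at roughly 21 per week, in line with the window-overall figure, and the two share changes are equal in size and opposite in sign.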
Demand by experience—weekly, January–May 2026
Postings per week, by experience band · window overall (January 2026 to May 2026) · ~21 / wk
Fresher (FA) ~13% · Mid ~48% · Senior ~29% · Staff ~9%
The experience mix is Mid-heavy with one of the strongest senior blocks: window-overall splits to ~48% Mid, ~29% Senior, ~13% FA, and ~9% Staff. The Staff share at ~9% is among the highest in the snapshot, reflecting the senior modeling and research-adjacent roles. FA share ranges ~9 to ~16% across populated months, placing Data Science and ML as one of the most fresher-accessible profiles in the snapshot.
Fresher-accessible cut—where entry-level roles sit
Data Science and ML is one of the most fresher-accessible profiles in the snapshot. Fresher-accessible here means roles open to ENTRY and JUNIOR LEVEL applicants, which make up ~17% of all postings on this profile and run at ~0 to 12 per week across the weekly buckets. Inside the fresher cut, MNCs and GCCs sit at ~30%, down from ~39% in the overall mix.
Share of total: ~17% of all postings
Volume / week: ~0 to 12 (weekly range)
Inside the fresher cut · company class distribution
MNCs & GCCs ~30% · Unicorns & Indian Product ~8% · MAANG & Elite Global Tech ~12% · Established SME ~10% · Funded Startups ~6% · Indian IT Services / WITCH ~13% · Lala Companies ~10% · Other ~12%
In the FA cut, MNCs & GCCs still leads at ~30%, down 9 pp from ~39% in the overall mix, and MAANG & Elite Global Tech drops 7 pp to ~12%. On the other side, Other rises 8 pp to ~12% and Lala Companies rises 6 pp to ~10%.
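The percentage-point shifts quoted above come from subtracting each company class's overall share from its share inside the fresher cut. A minimal sketch using the two distributions as published (the function name is illustrative, and the inputs are already rounded, so each shift can be off by about a point):

```python
# Company-class shares (%), as published for the overall mix and the FA cut.
overall = {
    "MNCs & GCCs": 39, "Unicorns & Indian Product": 5,
    "MAANG & Elite Global Tech": 19, "Established SME": 12,
    "Funded Startups": 5, "Indian IT Services / WITCH": 13,
    "Lala Companies": 4, "Other": 4,
}
fa_cut = {
    "MNCs & GCCs": 30, "Unicorns & Indian Product": 8,
    "MAANG & Elite Global Tech": 12, "Established SME": 10,
    "Funded Startups": 6, "Indian IT Services / WITCH": 13,
    "Lala Companies": 10, "Other": 12,
}


def pp_shift(baseline, cut):
    """Shift in percentage points per class: cut share minus baseline share."""
    return {name: cut[name] - baseline[name] for name in baseline}


shifts = pp_shift(overall, fa_cut)

# Print from the largest drop to the largest rise.
for name, delta in sorted(shifts.items(), key=lambda item: item[1]):
    print(f"{name}: {delta:+d} pp")
```

Sorting by the shift puts the classes that lose the most share in the fresher cut at the top and the classes that gain the most at the bottom.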
Entry-level pay distribution (LPA)
Mass anchors at 12 LPA (~61% of FA offers), followed by 4 LPA at ~24% and 8 LPA at ~10%; the distribution is mid-anchored. The 30+ LPA tail is negligible despite a MAANG presence of ~12%, which suggests MAANG hiring here is senior-tilted rather than aimed at fresher openings. The 20 LPA rung is thin at ~5% because Unicorns and funded startups together hold only ~14% of the FA cut. The 4 to 8 LPA entry mass at ~34% traces to Indian IT services at ~13% and Lala companies at ~10%.
Section 4 / Career Trajectory
Where this profile takes you once you're in
Data science & ML shows a fresher-leaning ladder, with Senior+Staff share slightly below the snapshot baseline. The IC pay band is unusually wide: the Staff p90 caps at 200 LPA and the long tail is the steepest in the snapshot. Pivot routes span generalist software, data engineering, and AI/LLM work. The MAANG path shows strong staff-level concentration, but the senior cohort is too thin for stable comparison. The four panels below answer the four questions most candidates ask: is the ladder real, does expertise pay, where can I pivot if I want out, and how do I get to MAANG.
IC PREMIUM: Staff p50 7.8x FA · tail tops out at 200 LPA at p75
PIVOT BREADTH: 3 adjacent profiles · 19 to 29% skill overlap
MAANG PATH: FA-skewed presence · ~11% at FA, ~6% at Senior, ~110% senior pay premium
Ladder health—this profile vs market average
Distribution of postings by seniority level (this profile vs the snapshot baseline of all 15 profiles, same window):
Seniority mix vs market average
Difference from market average, in points (profile − market average); axis runs from −10 to +10, where negative means the profile hires less than the market average and positive means it hires more
Fresher (FA): +8 pp
Mid: -6 pp
Senior: -3 pp
Staff: ±0 pp
The ladder is fresher-heavy. Fresher share at ~17% runs roughly 8 percentage points above the ~9% baseline, the largest fresher tilt in the snapshot, while Senior+Staff at ~35% sits a couple of points under the ~37% baseline. Mid at ~49% runs a few points below the ~54% baseline, and Staff at ~6% matches baseline. The shape suggests employers hire fresh data scientists actively, often to operationalize ML or run analyses, but the senior rung is comparatively thinner than in pure software engineering profiles. Verdict: not a dead-end, but a profile that rewards getting in early; senior-rung depth exists but is less common than the engineering norm.
IC pay premium—LPA quartiles, by seniority
Compensation progression along the IC track, in LPA, with quartiles at each seniority level:
IC pay quartiles by seniority
LPA · same profile · same window
Fresher (FA): p25 8 · p50 12 · p75 20 · p90 39
Mid: p25 15 · p50 32 · p75 38 · p90 58
Senior: p25 30 · p50 52 · p75 55 · p90 68
Staff: p25 70 · p50 93 · p75 200 · p90 200
Senior → Staff p50: 1.8x (multiple of medians)
FA → Staff p50: 7.8x (multiple of medians)
FA p50 → Staff p75: 16.7x (multiple of medians)
FA p50 → Staff p90: 16.7x (multiple of medians)
Pay carries the long-staff-tail, steep-climb, and wide-entry archetypes simultaneously. Senior median 52 LPA is roughly 4.3x the fresher median of 12 LPA, and Staff median 93 LPA is another 1.8x on top, putting Staff at ~7.8x entry. The tail then explodes: Staff p75 and p90 both cap at the dataset sentinel of 200 LPA (tied with Generalist SWE), and the FA p50 to Staff p90 multiple of ~16.7x is the highest in the snapshot. The FA-to-Mid step from 12 to 32 LPA is the steepest proportional climb between adjacent rungs at ~2.7x, and the 8 to 20 LPA fresher band underlines the wide-entry tag. Verdict: deep ML or research expertise compounds dramatically here, with the FA-to-Staff multiple uniquely steep across the snapshot.
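The multiples in this paragraph are plain ratios of the published quartiles. A small sketch that reproduces them, with illustrative variable and function names, and with the 200 LPA Staff p90 treated as the sentinel cap described above rather than an observed value:

```python
# Median (p50) pay in LPA per seniority level, as quoted above.
medians = {"FA": 12, "Mid": 32, "Senior": 52, "Staff": 93}
staff_p90 = 200  # dataset sentinel cap, per the text


def multiple(lower, higher):
    """How many times larger `higher` is than `lower`."""
    return higher / lower


levels = list(medians)
# Ratio of medians for each adjacent pair of rungs on the ladder.
adjacent_steps = {
    (a, b): multiple(medians[a], medians[b])
    for a, b in zip(levels, levels[1:])
}
steepest_step = max(adjacent_steps, key=adjacent_steps.get)

print(f"Senior vs FA median: {multiple(medians['FA'], medians['Senior']):.1f}x")
print(f"Staff vs Senior median: {multiple(medians['Senior'], medians['Staff']):.1f}x")
print(f"Staff vs FA median: {multiple(medians['FA'], medians['Staff']):.2f}x")
print(f"Staff p90 vs FA median: {multiple(medians['FA'], staff_p90):.1f}x")
print(f"Steepest adjacent step: {steepest_step[0]} to {steepest_step[1]}")
```

Because Staff p75 and p90 both sit at the cap, any multiple computed against them is a lower bound on the true spread rather than a precise figure.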
Pivot breadth—closest adjacent profiles by skill overlap
Closest profiles by SkillSet-level overlap (Jaccard similarity over the SkillSets cited in at least 10% of postings for each profile, same window). New SkillSets required is the count of SkillSets that appear in the adjacent profile's set but not in this profile's:
GENERALIST_SWE
~29%
5 shared · ~5 new required
Shared core skillsets
Programming Languages · Python for Data Science · Cloud Platforms · Relational Databases · Version Control Systems
New skillsets required (examples)
Java & Spring Core · .NET Backend · .NET & Desktop · NoSQL Databases · Core Web
DATA_ENGINEERING
~29%
7 shared · ~12 new required
Shared core skillsets
Programming Languages · Python for Data Science · Cloud Platforms · Relational Databases · Containers & Orchestration · Spark & Batch Processing · Version Control Systems
New skillsets required (examples)
Data Engineering Languages · Cloud Data Warehouses · ETL & Orchestration · NoSQL Databases · Messaging & Event Systems · CI/CD Platforms
AI_AND_LLM
~26%
6 shared · ~11 new required
Shared core skillsets
Python for Data Science · Deep Learning Frameworks · Cloud Platforms · Relational Databases · Containers & Orchestration · LLM Agents & Orchestration
Data Analytics & BI
~21%
Shared core skillsets
Programming Languages · Python for Data Science · Analytics Languages · Relational Databases
New skillsets required (examples)
Power BI Ecosystem · Microsoft Power Platform · BI Platforms · Cloud Data Warehouses · ETL & Orchestration · CI/CD Platforms
DOMAIN_SPECIFIC
~19%
4 shared · ~9 new required
Shared core skillsets
Cloud Platforms · Relational Databases · Containers & Orchestration · Version Control Systems
New skillsets required (examples)
Alternative Server-Side Languages · Java & Spring Core · Core Web · NoSQL Databases · Web Frontend Frameworks · CI/CD Platforms
Pivot paths are diverse but each requires real reskilling. The closest, Generalist SWE (~29%) and Data Engineering (~29%), share Programming Languages, Python for Data Science, Cloud Platforms, and Relational Databases, with generalist adding backend Java/.NET and data engineering adding ETL & Orchestration, Cloud Data Warehouses, and Messaging & Event Systems. AI & LLM Applications (~26%) is the natural in-domain pivot, sharing Python, Deep Learning Frameworks, and LLM Agents & Orchestration but adding LLM APIs and Vector Databases. Data Analytics & BI (~21%) and Domain-Specific (~19%) form a more distant tier. Verdict: lateral mobility into adjacent data and engineering profiles is real but always involves a deliberate ramp, with AI & LLM Applications as the lowest-friction in-domain step.
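The overlap percentages in this panel can be reproduced from the shared and new-required counts using the Jaccard definition given at the top of the panel. The sketch below rests on one loud assumption: that this profile's own set contains 12 SkillSets. That count is not stated anywhere in the source; it is inferred here because it is the value that makes all four published percentages line up. The function name is illustrative.

```python
def jaccard_from_counts(shared, new_required, own_size):
    """Jaccard similarity |A ∩ B| / |A ∪ B| from summary counts.

    A is this profile's SkillSet set (size own_size), B the adjacent
    profile's. B holds `shared` SkillSets in common with A plus
    `new_required` that A lacks, so |A ∪ B| = own_size + new_required.
    """
    return shared / (own_size + new_required)


OWN_SIZE = 12  # ASSUMPTION: inferred from the published figures, not stated

# (shared, new required) counts as published for each adjacent profile.
adjacent = {
    "GENERALIST_SWE": (5, 5),
    "DATA_ENGINEERING": (7, 12),
    "AI_AND_LLM": (6, 11),
    "DOMAIN_SPECIFIC": (4, 9),
}

overlap_pct = {
    name: round(100 * jaccard_from_counts(shared, new, OWN_SIZE))
    for name, (shared, new) in adjacent.items()
}

for name, pct in overlap_pct.items():
    print(f"{name}: ~{pct}%")
```

The new-required counts are themselves published as approximate, so agreement to the nearest percent is about as much as this check can show. It does illustrate why Data Engineering ties with Generalist SWE despite sharing more SkillSets: the larger number of new SkillSets widens the union and pulls the ratio back down.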
MAANG and elite global tech pathway—share of postings + senior pay
MAANG and elite global tech share of postings within this profile, broken out by seniority level:
MAANG and elite global tech share + senior pay · within Data Science and ML
Share by seniority: Fresher (FA) ~11% · Mid ~12% · Senior ~6% · Staff ~34%
Senior pay · same profile: MAANG senior [insufficient data] · Non-MAANG senior [insufficient data]
Skills that distinguish MAANG senior postings: [insufficient data]
MAANG presence is concentrated at the staff end of the ladder. The MAANG share at Staff is striking at ~34%, by far the highest of any seniority bucket in this profile, while Senior sits at ~6% and FA at ~11%. This staff concentration suggests MAANG actively bids for senior-IC research and applied-science talent in this profile but hires more sparingly at the conventional Senior title. The MAANG senior cohort is too thin to compute a stable senior pay premium or a distinguishing-skills list. Verdict: MAANG hiring in data science skews toward staff-level research and scale-ML roles rather than mainstream senior IC. Realistic pathway: build a strong applied-research or production-ML profile over 5 to 8 years, then aim directly for staff-level openings; the senior rung is not a reliable entry point at MAANG in this profile.