Section 1 / Overview

Applying statistical and computational methods to prediction.

academically-rooted · research-adjacent · senior-skewed

Data scientists analyze data, build predictive models, and productionize machine learning systems using statistical methods, deep learning frameworks, and MLOps tooling. The work sits closer to research than to application development, with PyTorch, TensorFlow, and the Python data stack as standard tools. Mathematical foundations matter more here than in most engineering roles. Specialization paths fragment further by problem type, including NLP, computer vision, classical ML, and generative AI research.

Specializations
Share of postings · n=6 tracks

Classical ML & Statistical Modeling

Roles centered on traditional ML techniques such as classification, regression, clustering, recommendation systems, and feature engineering. Scikit-learn anchors the toolkit, alongside XGBoost and statistical libraries. Strong mathematical and statistical foundations matter more than deep learning expertise. The largest data science segment by hiring volume.

Recommendation Systems · Fraud Detection · Demand Forecasting · Customer Segmentation

Generative AI & LLM Applications

Roles focused on building applications with LLMs, including RAG, agents, prompt engineering, and fine-tuning workflows. LangChain, LlamaIndex, vector databases, and OpenAI APIs define the stack. The newest and fastest-growing data science track. Often overlaps with AI application engineering but with deeper modeling context.

LLM Applications · RAG Pipelines · Fine-Tuned Models · Generative AI Products

Natural Language Processing

Roles focused on natural language processing, including text analytics, named entity recognition, information retrieval, semantic search, and language understanding. spaCy, NLTK, BERT, and HuggingFace transformers anchor the toolkit. The classical NLP track, distinct from pure LLM application work.

Search Systems · Text Classification · Sentiment Analytics · Knowledge Extraction

Computer Vision

Roles focused on image processing, video analytics, and visual recognition systems. OpenCV, YOLO, and Detectron2 anchor the toolkit, alongside CNN-based architectures. The work spans medical imaging, autonomous systems, and consumer applications. A specialist track requiring strong perception expertise.

Image Recognition · Object Detection · Video Analytics · Visual Inspection

Speech & Audio Processing

Roles focused on speech recognition, audio processing, and signal processing at the model and algorithm level. Research-focused work on the underlying techniques rather than end-user voice products. A narrow specialist segment whose skills are not interchangeable with NLP or computer vision.

Speech Recognition · Audio Classification · Voice Analytics

Edge & Embedded ML

ML roles targeting edge devices, embedded systems, RTOS, and model optimization for constrained environments. TFLite, ONNX, NVIDIA TensorRT, and CUDA define the deployment toolkit. The work emphasizes inference efficiency and on-device performance over training scale. A niche but fundamentally different deployment context.

On-Device AI · IoT Intelligence · Real-Time Inference · Edge Models
Section 2 / Skills

Skills at a Glance

Data science and ML hiring breaks into a Python-and-statistics core that defines the role and four problem-type tracks that shape it depending on whether the work leans toward classical methods, speech and audio, natural language processing, or computer vision. The two subsections below separate what hiring managers expect from what they value as a plus.

Core skillsets · what hiring managers expect

Python anchors the daily toolkit alongside Scikit-learn, Pandas, and NumPy as the canonical data-science stack. SQL, Git, and Linux form the engineering baseline that lets practitioners pull data and ship code outside the notebook. Transformers and Neural Networks lead the deep-learning architecture knowledge, with CNN appearing where vision work shows up. Reinforcement Learning, Supervised Learning, and Unsupervised Learning frame the paradigm vocabulary. The four tracks split the work: classical ML through Feature Engineering, Statistics, R, and Random Forest; speech and audio through Speech LLMs and Speech Processing; natural language processing through Information Retrieval, BERT, and Semantic Search; and computer vision through Computer Vision Algorithms, OpenCV, and Image Processing.

PREREQUISITE

Python for Data Science

Python · Scikit-learn · Pandas · NumPy · Matplotlib
PREREQUISITE

Engineering Baseline

SQL · Git · Linux
CORE

Deep Learning Architectures

Transformers · Neural Networks · CNN
CORE

ML Paradigms

Reinforcement Learning · Supervised Learning · Unsupervised Learning
TRACK

Classical ML & Statistics

Feature Engineering · Statistics · R · Classification · Statistical Modeling · Regression · Recommendation Systems · Numerical Optimization · Random Forest · Clustering
TRACK

Speech & Audio Processing

Speech LLMs · Speech Processing
TRACK

Natural Language Processing

Information Retrieval · BERT · Text Analytics · Knowledge Graphs · Semantic Search · NER · Vector Search
TRACK

Computer Vision

Computer Vision Algorithms · Image Processing · OpenCV · Video Analytics
Auxiliary skillsets · what they value as a plus

Java and C/C++ surface where data science models meet production backend code outside the Python ecosystem. PyTorch and TensorFlow anchor the deep-learning frameworks band, with Keras as the higher-level alternative. AWS, Azure, and GCP host training and inference infrastructure, paired with Docker and Kubernetes for portable workloads. Spark, PySpark, and Hadoop handle big-data processing for feature pipelines, while Airflow and Kafka orchestrate data movement upstream of the model. MLflow, SageMaker, Kubeflow, and Vertex AI define the MLOps tier where teams version, track, and deploy models at scale.

Backend Programming Languages

Java · C/C++

Deep Learning Frameworks

PyTorch · TensorFlow · Keras

Cloud Platforms & Containers

AWS · Azure · GCP · Docker · Kubernetes

Big Data & Pipelines

Data Pipelines · Spark · Data Processing · Hadoop · PySpark · Data Cleansing

Data Engineering Tools

Airflow · Kafka · Databricks

MLOps & ML Platforms

MLflow · SageMaker · Kubeflow · Vertex AI
Section 3 / Demand & Pay

Where the market sits and what it pays

Data Science and ML sits in the lower-volume tier of the snapshot, at ~21 postings per week across the window. MNCs and GCCs lead the company mix at ~39%, with MAANG and elite global tech at ~19%. Median pay runs 12 LPA at entry, 15 LPA at junior, 32 LPA at mid, and 52 LPA at senior. The panels below cover volume and company mix, then a zoom into fresher-accessible roles.

VOLUME · ~15 / week · recent average
PAY · ENTRY / JUNIOR / MID / SENIOR · 12 / 15 / 32 / 52 LPA · medians
TREND · ~20 / week · last 2 wks ~15 / wk
Demand by company class · weekly, January–May 2026

Postings per week, segmented by company class:

[Chart: postings per week, by company class. Window overall (January 2026 to May 2026), Jan W1 to May W3; y-axis postings / wk, 0 to 60. Series: MNCs & GCCs · Unicorns & Indian Product · MAANG & Elite Global Tech · Established SME · Funded Startups · Indian IT Services / WITCH · Lala Companies · Other. Window overall · ~21 / wk]

Volume opened at ~35 per week in January, halved to ~17 in February, recovered to ~19 in March, then ran ~17 across April and ~18 across May. The mix carries the snapshot's strongest MAANG ramp: MAANG and elite global tech climbed from ~12% in January to ~23% by May, a gain of ~11 pp. Indian IT services dropped sharply, from ~22% in January to ~11% in May, tying for the lowest WITCH share in the field. MNCs and GCCs held in the ~34 to ~48% range across the window, peaking in April. The funded startups share is one of the highest in the snapshot at ~5%, with the FA p90 reaching ~42 LPA, the highest fresher upper tail in the field.
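
The arithmetic behind the volume and mix figures above can be sketched as follows. This is a minimal illustration, assuming each month carries equal weight in the window average (the source does not say how the ~21 / wk figure is computed); the helper names `pp_change` and `relative_change` are hypothetical.

```python
from statistics import mean

# Approximate monthly averages of postings per week, as quoted in the text.
weekly_volume = {"Jan": 35, "Feb": 17, "Mar": 19, "Apr": 17, "May": 18}

# Assumption: every month is weighted equally in the window average.
window_average = mean(weekly_volume.values())


def pp_change(start_pct, end_pct):
    """Change between two shares, in percentage points."""
    return end_pct - start_pct


def relative_change(start_pct, end_pct):
    """Change between two shares, as a fraction of the starting share."""
    return (end_pct - start_pct) / start_pct


# MAANG and elite global tech: ~12% in January to ~23% in May.
maang_pp = pp_change(12, 23)
# Indian IT services / WITCH: ~22% in January to ~11% in May.
witch_pp = pp_change(22, 11)

print(f"window average: ~{window_average:.0f} postings / wk")
print(f"MAANG shift: {maang_pp:+d} pp")
print(f"WITCH shift: {witch_pp:+d} pp ({relative_change(22, 11):+.0%} relative)")
```

Under the equal-weight assumption the monthly figures average to about 21 per week, in line with the window-overall number. The percentage-point and relative views differ sharply: the WITCH share loses 11 pp, which is half of its starting share.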

Demand by experience · weekly, January–May 2026

Postings per week, segmented by experience:

[Chart: postings per week, by experience band. Window overall (January 2026 to May 2026), Jan W1 to May W3; y-axis postings / wk, 0 to 60. Series: Fresher (FA) · Mid · Senior · Staff. Window overall · ~21 / wk]

The experience mix is Mid-heavy with one of the strongest senior blocks: window-overall splits to ~48% Mid, ~29% Senior, ~13% FA, and ~9% Staff. The Staff share at ~9% is among the highest in the snapshot, reflecting the senior modeling and research-adjacent roles. FA share ranges ~9 to ~16% across populated months, placing Data Science and ML as one of the most fresher-accessible profiles in the snapshot.

Fresher-accessible cut · where entry-level roles sit

Data Science and ML is one of the most fresher-accessible profiles in the snapshot. Fresher-accessible here means roles open to ENTRY and JUNIOR LEVEL applicants, which make up ~17% of all postings on this profile and run at ~0 to 12 per week across the weekly buckets. Inside the fresher cut, MNCs and GCCs sit at ~30%, down from ~39% in the overall mix.

Share of total · ~17% · of all postings
Volume / week · ~0 to 12 · weekly range

Inside the fresher cut · company class distribution

MNCs & GCCs · Unicorns & Indian Product · MAANG & Elite Global Tech · Established SME · Funded Startups · Indian IT Services / WITCH · Lala Companies · Other

In the FA cut, MNCs & GCCs still leads at ~30%, down 9 pp from ~39% in the overall mix, and MAANG & Elite Global Tech drops 7 pp to ~12%. On the other side, Other rises 8 pp to ~12% and Lala Companies rises 6 pp to ~10%.

Entry-level pay distribution (LPA)

[Chart: entry-level pay distribution, share of FA offers by LPA rung; x-axis 1 to 40 LPA]

Mass anchors at 12 LPA (~61% of FA offers), followed by 4 LPA at ~24% and 8 LPA at ~10%; the distribution is mid-anchored. The 30+ LPA tail is negligible despite a MAANG presence of ~12%, suggesting that MAANG hiring tilts senior rather than toward fresher openings. The 20 LPA rung is thin at ~5% because Unicorns and funded startups together hold only ~14% of the FA cut. The 4 to 8 LPA entry mass at ~34% traces to Indian IT services at ~13% and Lala Companies at ~10%.
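
As a rough check on the rung shares quoted above, the sketch below treats the four rungs as a discrete distribution and reads off its median and mean. It is an illustration under stated assumptions: offers are taken to sit exactly on the 4, 8, 12, and 20 LPA rungs, and `weighted_percentile` is a hypothetical helper, not the method used to build the source chart.

```python
# Rung shares of FA offers as quoted in the text: LPA rung -> percent of offers.
fa_rungs = {4: 24, 8: 10, 12: 61, 20: 5}


def weighted_percentile(rungs, pct):
    """Return the smallest rung whose cumulative share reaches pct percent."""
    total = sum(rungs.values())
    cumulative = 0
    for lpa in sorted(rungs):
        cumulative += rungs[lpa]
        # Integer comparison avoids floating-point drift in the shares.
        if cumulative * 100 >= pct * total:
            return lpa
    raise ValueError("pct must be between 0 and 100")


fa_median = weighted_percentile(fa_rungs, 50)

# Share-weighted mean pay across the four rungs.
fa_mean = sum(lpa * share for lpa, share in fa_rungs.items()) / sum(fa_rungs.values())

# Combined share of the 4 and 8 LPA rungs.
low_rung_share = fa_rungs[4] + fa_rungs[8]

print(f"median rung: {fa_median} LPA")
print(f"mean pay: {fa_mean:.2f} LPA")
print(f"4 to 8 LPA mass: ~{low_rung_share}%")
```

The median lands on the 12 LPA rung because the two lower rungs together hold only ~34% of offers, while the mean sits lower, pulled down by the 4 LPA mass.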

Section 4 / Career Trajectory

Where this profile takes you once you're in

Data Science and ML shows a fresher-leaning ladder, with Senior+Staff share slightly below the snapshot baseline. The IC pay band is unusually wide: the Staff p90 caps at 200 LPA and the long tail is the steepest in the snapshot. Pivot routes span generalist software, data engineering, and AI/LLM work. The MAANG path shows strong staff-level concentration, but the senior cohort is too thin for stable comparison. The four panels below answer the four questions most candidates ask: is the ladder real, does expertise pay, where can I pivot if I want out, and how do I get to MAANG.

LADDER HEALTH · ~34% Senior+Staff · vs ~37% snapshot baseline
IC PREMIUM · Staff p50 7.8x FA · tail tops out at 200 LPA at p75
PIVOT BREADTH · 3 adjacent profiles · 19 to 29% skill overlap
MAANG PATH · FA-skewed presence · ~11% at FA, ~6% at Senior, ~110% senior pay premium
Ladder health · this profile vs market average

Distribution of postings by seniority level (this profile vs the snapshot baseline of all 15 profiles, same window):

Seniority mix vs market average

Difference from market average, in points (profile − market average). Negative values mean the profile hires less than the market average; positive values mean it hires more.

Fresher (FA): +8 pp
Mid: -6 pp
Senior: -3 pp
Staff: ±0 pp

The ladder is fresher-heavy. Fresher share at ~17% runs roughly 8 percentage points above the ~9% baseline, the largest fresher tilt in the snapshot, while Senior+Staff at ~35% sits a couple of points under the ~37% baseline. Mid at ~49% runs a few points below the ~54% baseline, and Staff at ~6% matches baseline. The shape suggests employers hire fresh data scientists actively, often to operationalize ML or run analyses, but the senior rung is comparatively thinner than in pure software engineering profiles. Verdict: not a dead-end, but a profile that rewards getting in early; senior-rung depth exists but is less common than the engineering norm.

IC pay premium · LPA quartiles, by seniority

Compensation progression along the IC track, in LPA, with quartiles at each seniority level:

IC pay quartiles by seniority

LPA · same profile · same window

FRESHER (FA): p25 8 · p50 12 · p75 20 · p90 39
MID: p25 15 · p50 32 · p75 38 · p90 58
SENIOR: p25 30 · p50 52 · p75 55 · p90 68
STAFF: p25 70 · p50 93 · p75 200 · p90 200

Senior → Staff p50: 1.8x (multiple of medians)
FA → Staff p50: 7.8x (multiple of medians)
FA p50 → Staff p75: 16.7x (multiple of medians)
FA p50 → Staff p90: 16.7x (multiple of medians)

Pay carries the long-staff-tail, steep-climb, and wide-entry archetypes simultaneously. The Senior median of 52 LPA is roughly 4.3x the fresher median of 12 LPA, and the Staff median of 93 LPA is another 1.8x on top, putting Staff at ~7.8x entry. The tail then stretches much further: Staff p75 and p90 both cap at the dataset sentinel of 200 LPA (tied with Generalist SWE), and the FA p50 to Staff p90 multiple of ~16.7x is the highest in the snapshot. The FA-to-Mid step from 12 to 32 LPA is the steepest proportional climb at ~2.7x, and the 8 to 20 LPA fresher band underlines the wide-entry tag. Verdict: deep ML or research expertise compounds dramatically here, with the FA-to-Staff multiple uniquely steep across the snapshot.
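
The multiples quoted above follow directly from the medians, and a short sketch makes the arithmetic explicit. The variable names are illustrative; the only inputs are the four p50 values and the 200 LPA Staff p90 cap stated in the text.

```python
# Median (p50) pay in LPA by seniority, as quoted in the text.
medians = {"FA": 12, "Mid": 32, "Senior": 52, "Staff": 93}
# Staff p90, which the text describes as capped at the dataset sentinel.
staff_p90 = 200

ladder = list(medians)  # dict order is FA, Mid, Senior, Staff

# Multiple of medians for each single step up the ladder.
step_multiples = {
    f"{lower} -> {higher}": medians[higher] / medians[lower]
    for lower, higher in zip(ladder, ladder[1:])
}

# The single step with the largest proportional jump.
steepest_step = max(step_multiples, key=step_multiples.get)

fa_to_senior = medians["Senior"] / medians["FA"]
fa_to_staff_p50 = medians["Staff"] / medians["FA"]
fa_to_staff_p90 = staff_p90 / medians["FA"]

for step, value in step_multiples.items():
    print(f"{step}: {value:.2f}x")
print(f"steepest single step: {steepest_step}")
print(f"FA -> Senior p50: {fa_to_senior:.2f}x")
print(f"FA -> Staff p50: {fa_to_staff_p50:.2f}x")
print(f"FA p50 -> Staff p90: {fa_to_staff_p90:.2f}x")
```

Among the three single steps, FA to Mid is the largest proportional jump. Because the Staff p90 is a sentinel cap, the FA p50 to Staff p90 multiple is best read as a lower bound on the true tail.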

Pivot breadth · closest adjacent profiles by skill overlap

Closest profiles by SkillSet-level overlap (Jaccard similarity over the SkillSets cited in at least 10% of postings for each profile, same window). New SkillSets required is the count of SkillSets that appear in the adjacent profile's set but not in this profile's:

GENERALIST_SWE

~29%

5 shared · ~5 new required

Shared core skillsets

Programming Languages · Python for Data Science · Cloud Platforms · Relational Databases · Version Control Systems

New skillsets required (examples)

Java & Spring Core · .NET Backend · .NET & Desktop · NoSQL Databases · Core Web

DATA_ENGINEERING

~29%

7 shared · ~12 new required

Shared core skillsets

Programming Languages · Python for Data Science · Cloud Platforms · Relational Databases · Containers & Orchestration · Spark & Batch Processing · Version Control Systems

New skillsets required (examples)

Data Engineering Languages · Cloud Data Warehouses · ETL & Orchestration · NoSQL Databases · Messaging & Event Systems · CI/CD Platforms

AI_AND_LLM

~26%

6 shared · ~11 new required

Shared core skillsets

Python for Data Science · Deep Learning Frameworks · Cloud Platforms · Relational Databases · Containers & Orchestration · LLM Agents & Orchestration

New skillsets required (examples)

Python Backend · Java & Spring Core · LLM APIs & Models · Core Web · NoSQL Databases · Web Frontend Frameworks

DATA_ANALYTICS_AND_BI

~21%

4 shared · ~7 new required

Shared core skillsets

Programming Languages · Python for Data Science · Analytics Languages · Relational Databases

New skillsets required (examples)

Power BI Ecosystem · Microsoft Power Platform · BI Platforms · Cloud Data Warehouses · ETL & Orchestration · CI/CD Platforms

DOMAIN_SPECIFIC

~19%

4 shared · ~9 new required

Shared core skillsets

Cloud Platforms · Relational Databases · Containers & Orchestration · Version Control Systems

New skillsets required (examples)

Alternative Server-Side Languages · Java & Spring Core · Core Web · NoSQL Databases · Web Frontend Frameworks · CI/CD Platforms

Pivot paths are diverse but each requires real reskilling. The closest, Generalist SWE (~29%) and Data Engineering (~29%), share Programming Languages, Python for Data Science, Cloud Platforms, and Relational Databases, with generalist adding backend Java/.NET and data engineering adding ETL & Orchestration, Cloud Data Warehouses, and Messaging & Event Systems. AI & LLM Applications (~26%) is the natural in-domain pivot, sharing Python for Data Science, Deep Learning Frameworks, and LLM Agents & Orchestration but adding LLM APIs & Models and Python Backend. Data Analytics & BI (~21%) and Domain-Specific (~19%) form a more distant tier. Verdict: lateral mobility into adjacent data and engineering profiles is real but always involves a deliberate ramp, with AI & LLM Applications as the lowest-friction in-domain step.
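
The Jaccard overlap used for the pivot cards can be sketched as below. This is a minimal illustration, not the snapshot's actual pipeline. The count-based check assumes this profile's own set holds 12 SkillSets, a figure inferred from the published percentages rather than stated in the source. The two demo sets are trimmed, hypothetical subsets, and all function names are illustrative.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def new_required(own, target):
    """SkillSets in the target profile's set that are missing from our own."""
    return set(target) - set(own)


def jaccard_from_counts(own_size, shared, new):
    """Same ratio from counts alone, since |A union B| = |A| + |B minus A|."""
    return shared / (own_size + new)


# Assumption: this profile's set holds 12 SkillSets (inferred, not sourced).
OWN_SIZE = 12

# (shared, new required) counts as shown on each pivot card.
cards = {
    "GENERALIST_SWE": (5, 5),
    "DATA_ENGINEERING": (7, 12),
    "AI_AND_LLM": (6, 11),
    "DATA_ANALYTICS_AND_BI": (4, 7),
    "DOMAIN_SPECIFIC": (4, 9),
}

overlaps = {
    name: round(100 * jaccard_from_counts(OWN_SIZE, shared, new))
    for name, (shared, new) in cards.items()
}

# Hypothetical, trimmed sets to show the set-based form of the same measure.
own_demo = {"Python for Data Science", "Cloud Platforms",
            "Relational Databases", "Deep Learning Frameworks"}
target_demo = {"Python for Data Science", "Cloud Platforms",
               "Relational Databases", "NoSQL Databases", "Core Web"}

for name, pct in overlaps.items():
    print(f"{name}: ~{pct}%")
print(f"demo overlap: {jaccard(own_demo, target_demo):.2f}")
print(f"demo new required: {sorted(new_required(own_demo, target_demo))}")
```

Under the 12-SkillSet assumption, the counts on all five cards reproduce the published overlap percentages after rounding. The measure penalizes a long list of new SkillSets: Data Engineering shares more SkillSets than Generalist SWE but lands at the same ~29% because it also requires more new ones.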

MAANG and elite global tech pathway · share of postings + senior pay

MAANG and elite global tech share of postings within this profile, broken out by seniority level:

MAANG and elite global tech share + senior pay

Within Data Science and ML

Share by seniority

Senior pay · same profile

MAANG senior: [insufficient data]
Non-MAANG senior: [insufficient data]

Skills that distinguish MAANG senior postings

MAANG presence is concentrated at the staff end of the ladder. The MAANG share at Staff is striking at ~34%, by far the highest staff-bucket figure in this profile, while Senior sits at ~6% and FA at ~11%. The distinct staff concentration suggests MAANG actively bids for senior-IC research and applied-science talent in this profile but hires more sparingly at the conventional Senior title. The MAANG senior cohort is too thin to compute a stable senior pay premium or a distinguishing-skills list. Verdict: MAANG hiring in data science skews toward staff-level research and scale-ML roles rather than mainstream senior IC. Realistic pathway: build a strong applied-research or production-ML profile over 5 to 8 years, then aim directly for staff-level openings; the senior rung is not a reliable entry point at MAANG in this profile.