Profile

Data Science & ML

Skills deep-dive

What to learn for Data Science & ML: the must-have floor, tracks, skill arc and company pools


Section 1 / Overview

Modeling real-world systems to predict, classify, and act.

academically-rooted · research-adjacent · senior-skewed

Data scientists analyze data, build predictive models, and put machine learning systems into production using statistical methods, deep learning frameworks, and MLOps tools. The work sits closer to research than to application development. PyTorch, TensorFlow, and the Python data stack are the standard tools. Mathematical foundations matter more here than in most engineering roles. The specializations divide further by problem type, including NLP, computer vision, classical ML, and generative AI research.

Specializations

Classical ML & Statistical Modeling

Share within role: ~38%

Weekly share chart, Jan W1 to now

Roles centered on traditional ML techniques such as classification, regression, clustering, recommendation systems, and feature engineering. Scikit-learn, XGBoost, and statistical libraries are the core tools. Strong mathematical and statistical foundations matter more than deep learning expertise.

Recommendation Systems · Fraud Detection · Demand Forecasting · Customer Segmentation

Generative AI & LLM Applications

Share within role: ~34%

Weekly share chart, Jan W1 to now

Roles focused on building applications with LLMs, including RAG, agents, prompt engineering, and fine-tuning workflows. LangChain, LlamaIndex, vector databases, and OpenAI APIs define the stack. Often overlaps with AI application engineering but with deeper modeling context.

LLM Applications · RAG Pipelines · Fine-Tuned Models · Generative AI Products

Natural Language Processing

Share within role: ~13%

Weekly share chart, Jan W1 to now

Roles focused on natural language processing, including text analytics, named entity recognition, information retrieval, semantic search, and language understanding. spaCy, NLTK, BERT, and HuggingFace transformers are the core tools. The classical NLP track, distinct from pure LLM application work.

Search Systems · Text Classification · Sentiment Analytics · Knowledge Extraction

Computer Vision

Share within role: ~10%

Weekly share chart, Jan W1 to now

Roles focused on image processing, video analytics, and visual recognition systems. OpenCV, YOLO, and Detectron2 are the core tools, alongside CNN-based architectures. The work spans medical imaging, autonomous systems, and consumer applications. A specialist track requiring strong perception expertise.

Image Recognition · Object Detection · Video Analytics · Visual Inspection

Speech & Audio Processing

Share within role: ~3%

Weekly share chart, Jan W1 to now

Roles focused on speech recognition, audio processing, and signal processing at the model and algorithm level. Research-focused work on the underlying techniques rather than end-user voice products. A narrow specialist segment whose skills are not interchangeable with NLP or computer vision.

Speech Recognition · Audio Classification · Voice Analytics

Edge & Embedded ML

Share within role: ~2%

Weekly share chart, Jan W1 to now

ML roles targeting edge devices, embedded systems, RTOS, and model optimization for constrained environments. TFLite, ONNX, NVIDIA TensorRT, and CUDA define the deployment tools. The work puts inference efficiency and on-device performance ahead of training scale. A niche but fundamentally different deployment context.

On-Device AI · IoT Intelligence · Real-Time Inference · Edge Models

Section 2 / Skills

Skills at a Glance

Data science and ML hiring requirements center on a Python and statistics core plus one of four problem-type tracks. The track depends on whether the work leans toward classical methods, speech and audio, natural language processing, or computer vision. The two subsections below separate what hiring managers expect from what they value as a plus.

Core skillsets: what hiring managers expect

The core skills are Python with Scikit-learn, Pandas, and NumPy. SQL, Git, and Linux cover the engineering side, used to pull in data and ship code beyond the notebook. Transformers and Neural Networks are the deep-learning building blocks to know, with CNN for vision work. Reinforcement Learning, Supervised Learning, and Unsupervised Learning are the names of the main learning approaches. Beyond these basics, the work specializes into one of four areas. Classical ML uses Feature Engineering, Statistics, R, and Random Forest, while speech and audio work uses Speech LLMs and Speech Processing. Natural language processing uses Information Retrieval, BERT, and Semantic Search, and computer vision uses Computer Vision Algorithms, OpenCV, and Image Processing.

PREREQUISITE

Python for Data Science

Python · Scikit-learn · Pandas · NumPy · Matplotlib · SciPy

PREREQUISITE

Engineering Baseline

SQL · Git · Linux

CORE

Deep Learning Architectures

Transformers · Neural Networks · CNN

CORE

ML Paradigms

Reinforcement Learning · Supervised Learning · Unsupervised Learning

TRACK

Classical ML & Statistics

Python · Feature Engineering · Statistics · R · Classification · Statistical Modeling · Regression · Recommendation Systems · Random Forest · Numerical Optimization · Clustering · Anomaly Detection · Probability · Time Series Forecasting · Time Series Analysis

TRACK

Speech & Audio Processing

Speech Processing

TRACK

Natural Language Processing

Information Retrieval

TRACK

Computer Vision

Computer Vision Algorithms · Image Processing · OpenCV · Video Analytics

Auxiliary skillsets: what sets you apart

Java and C/C++ show up when a model has to integrate with live backend code that is not written in Python. PyTorch and TensorFlow are the main frameworks for building deep-learning models, with Keras as a simpler option that hides more of the detail. Cloud platforms (AWS, Azure, and GCP) provide the computing power to train and run the models, and Docker and Kubernetes keep the work running the same way on any machine. Spark, PySpark, and Hadoop process large amounts of data to build the input features the model learns from. Airflow and Kafka move the data through the steps that come before the model. MLflow, SageMaker, Kubeflow, and Vertex AI are the MLOps tools teams use to version their models, track how they perform, and roll them out for large-scale use.

Backend Programming Languages

Python · Java · C/C++ · Go · JavaScript

Deep Learning Frameworks

PyTorch · TensorFlow · Keras

Cloud Platforms & Containers

AWS · Azure · GCP · Docker · Kubernetes

Big Data & Pipelines

Data Pipelines · Spark · Data Processing · Hadoop · Data Cleansing · PySpark · ETL

Data Engineering Tools

Spark · Airflow · Kafka · Databricks

MLOps & ML Platforms

MLflow · SageMaker · Kubeflow · Vertex AI

Section 3 / Demand & Pay

Where the market sits and what it pays

Data Science and ML is the rarest profile, last by volume, with only around 20 postings a week. The mix tilts toward MAANG and Tier-1 Global Tech at around a fifth, one of the highest such shares anywhere. MNCs and GCCs still lead outright at just under two in five. Senior pay sits at about 54 LPA and mid-level at 32 LPA, both among the highest of all the profiles, while the entry-level figure of 12 LPA rests on very few postings. The sections below trace the thin weekly volume and the company mix, then turn to the roles open to freshers.

Demand by company class: weekly

Postings per week, segmented by company class:

Postings per week, by company class. Window overall (January 2026 to July 2026), ~20 / wk.
Company class	Share
MNCs and Global Capability Centers	~40%
Indian Product Companies and Unicorns	~5%
MAANG and Tier-1 Global Tech	~20%
Established SME	~10%
Funded Startups	~5%
Indian IT Services / WITCH	~15%
Lala Companies	~3%
Other	~5%

This profile carries one of the heaviest MAANG and Tier-1 Global Tech weights of all the profiles, while overall demand has fallen from its January high on very thin weekly counts. The mix shifts week to week on small samples, so the takeaway is the high-end tilt itself rather than any one category draining out. Few other profiles combine this kind of global-tech concentration with pay that is among the highest at both mid-level and senior.

Demand by experience: weekly

Postings per week, segmented by experience:

Postings per week, by experience band. Window overall (January 2026 to July 2026), ~20 / wk.
Experience band	Share
Fresher (FA)	~10%
Mid	~50%
Senior	~30%
Staff	~9%

Mid-level roles make up the largest share at around half, with senior roles next at just under a third. Fresher postings hold a notably high share at a bit more than a tenth, among the broadest entry shares of all the profiles. Staff sit at just under a tenth. On such thin weekly counts the split moves around, but the fresher share stays unusually generous for a profile this specialized.
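To see why the weekly split moves around, the rounded shares can be turned into expected weekly counts. The sketch below is only illustrative: it takes the ~20 postings a week and the approximate band shares above at face value, and the function name is made up for this example.

```python
# Rounded experience-band shares from the breakdown above (they sum to ~99%).
BAND_SHARE = {"Fresher (FA)": 0.10, "Mid": 0.50, "Senior": 0.30, "Staff": 0.09}
WEEKLY_POSTINGS = 20  # approximate window-wide weekly average


def expected_weekly_counts(shares, total_per_week):
    """Return the expected number of postings per week for each band."""
    return {band: share * total_per_week for band, share in shares.items()}


counts = expected_weekly_counts(BAND_SHARE, WEEKLY_POSTINGS)
for band, n in counts.items():
    print(f"{band}: about {n:.1f} postings a week")
```

At roughly 20 postings a week, a single posting is worth about five percentage points, which is why the fresher and Staff shares in particular can swing from one week to the next.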

Fresher-accessible cut: where entry-level roles sit

Roles open to freshers, meaning entry and junior level applicants, make up just under a fifth of Data Science and ML postings, one of the highest fresher shares of any profile. Weekly fresher volume is small in plain numbers at around 0 to 12 a week, since the profile itself is the rarest. Within the fresher roles, MAANG and Tier-1 Global Tech climbs above its overall share while MNCs and GCCs ease back.

Inside the fresher cut · company class distribution
Company class	Share
MNCs and Global Capability Centers	~30%
Indian Product Companies and Unicorns	~10%
MAANG and Tier-1 Global Tech	~25%
Established SME	~10%
Funded Startups	~5%
Indian IT Services / WITCH	~8%
Lala Companies	~8%
Other	~3%

MNCs and GCCs lead the fresher roles at around three in ten, though that is well below their overall share, one of the largest drops in the mix. MAANG and Tier-1 Global Tech rises to around a quarter, with Indian Product Companies and Unicorns and Lala Companies each up a little. The fresher roles lean harder toward global-tech and product employers than the overall mix does.
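The shift between the overall mix and the fresher cut can be read off as percentage-point differences. The following is a small sketch using the approximate shares quoted in the two breakdowns above; the helper name is hypothetical and the figures are rounded, so the results are indicative only.

```python
# Approximate company-class shares (%), copied from the two breakdowns above.
OVERALL = {
    "MNCs and Global Capability Centers": 40,
    "Indian Product Companies and Unicorns": 5,
    "MAANG and Tier-1 Global Tech": 20,
    "Established SME": 10,
    "Funded Startups": 5,
    "Indian IT Services / WITCH": 15,
    "Lala Companies": 3,
    "Other": 5,
}
FRESHER = {
    "MNCs and Global Capability Centers": 30,
    "Indian Product Companies and Unicorns": 10,
    "MAANG and Tier-1 Global Tech": 25,
    "Established SME": 10,
    "Funded Startups": 5,
    "Indian IT Services / WITCH": 8,
    "Lala Companies": 8,
    "Other": 3,
}


def share_shift(overall, subset):
    """Percentage-point change per class (subset minus overall), largest drop first."""
    shift = {name: subset[name] - overall[name] for name in overall}
    return dict(sorted(shift.items(), key=lambda item: item[1]))


shifts = share_shift(OVERALL, FRESHER)
for name, delta in shifts.items():
    print(f"{name}: {delta:+d} pts")
```

On these rounded figures, MNCs and GCCs give up about 10 points in the fresher cut, while MAANG and Tier-1 Global Tech gains about 5.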

Entry-level pay distribution (LPA)


Median Rs 12 LPA · share of entry-level offers at each LPA value.

Entry pay concentrates sharply at 12 LPA, which is both the most common offer and the top of the p10–p90 spread. A smaller cluster sits at the 4 LPA floor, and the median also lands at 12 LPA, so the distribution is tight rather than broad. The strong MAANG and Tier-1 presence in fresher demand, among the highest of the profiles, likely explains why so many first offers land at the 12 LPA mark instead of the floor.

Share of entry-level offers at each pay level (LPA).
Salary (LPA)	Share (%)
0	0.1
1	0.5
2	2.2
3	5.8
4	8.0
5	6.0
6	3.1
7	2.8
8	3.4
9	3.7
10	7.0
11	15.5
12	20.6
13	14.7
14	5.6
15	1.1
16	0.1
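
The two clusters can be checked against the share table. This is a minimal sketch that treats the table values as given; the helper names are illustrative and not part of any tool behind the report.

```python
# Share of entry-level offers (%) at each LPA value, copied from the table above.
SHARE_BY_LPA = {
    0: 0.1, 1: 0.5, 2: 2.2, 3: 5.8, 4: 8.0, 5: 6.0, 6: 3.1, 7: 2.8, 8: 3.4,
    9: 3.7, 10: 7.0, 11: 15.5, 12: 20.6, 13: 14.7, 14: 5.6, 15: 1.1, 16: 0.1,
}


def most_common_offer(shares):
    """LPA value that carries the largest share of offers."""
    return max(shares, key=shares.get)


def share_between(shares, low, high):
    """Total share (%) of offers with low <= LPA <= high."""
    return sum(pct for lpa, pct in shares.items() if low <= lpa <= high)


peak = most_common_offer(SHARE_BY_LPA)
near_peak = share_between(SHARE_BY_LPA, peak - 1, peak + 1)
low_cluster = share_between(SHARE_BY_LPA, 3, 5)
print(f"peak at {peak} LPA; {near_peak:.1f}% within 1 LPA of it; "
      f"{low_cluster:.1f}% between 3 and 5 LPA")
```

On the table's figures, about half of entry-level offers fall within 1 LPA of the 12 LPA peak, and roughly a fifth sit between 3 and 5 LPA.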

Section 4 / Career Trajectory

Where this profile takes you once you're in

Data Science and ML sits close to the average on ladder shape, with Senior and Staff together slightly below the typical level across profiles. The pay story, though, is one of the most extreme of all the profiles. The climb runs to a typical Staff figure of roughly 90 LPA, the highest Staff median in the set, and the Staff top end runs the longest of all fifteen profiles. Switches are narrow, clustered among AI and LLM and the other data profiles. The defining feature is the pay pattern: the longest Staff tail and one of the largest senior pay gaps in the data appear together. The four sections below cover whether the climb to senior is real, whether going deep on the technical track pays, which sideways moves are within reach, and how to reach the top firms.

Seniority ladder: this profile vs others

Distribution of postings by seniority level (this profile vs the rest of the market, the other 14 profiles, all-time):

Seniority mix

Chart: share of postings by band (FA, Mid, Senior, Staff), this profile vs the rest of the market. Values approximate.

Mid leads at around half, below the usual just-over-half. Senior matches the average at around three in ten, and Staff holds even at a small share. Fresher roles run heavier here at well over a tenth, and Senior and Staff combined sit slightly below the typical level. Overall, the ladder is roughly the usual shape, fuller at entry and a touch lighter at the senior end.

IC pay premium: LPA spread (p10–p90), by seniority

Compensation progression along the individual-contributor (IC) track, in LPA, with the 10th percentile, median, and 90th percentile at each seniority level:

Pay distribution by seniority (LPA, this profile)

Chart: p10–p90 spread with the median marked, for Entry, Junior, Mid, Senior, and Staff.

Pay percentiles (LPA) by seniority level.
Seniority	p10	Median	p90
Entry	4	12	12
Junior	8	15	42
Mid	14	32	58
Senior	28	54	72
Staff	46	88	200

The ladder starts modestly, with medians of 12 LPA at entry and 15 at junior, then more than doubles into Mid at 32. Senior lands at 54 and Staff at 88, the highest Staff median in the set. Past the median the Staff tier runs very long, reaching 200 LPA at the 90th percentile. That is where this profile separates from the rest of the market. Staying deep is clearly rewarded, with the Staff median more than 7 times the entry median.
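
The step-ups between levels can be worked out directly from the percentile table. The sketch below uses the table values as printed; the two helper functions are illustrative names, and the ratios are only as reliable as the underlying figures.

```python
# Pay percentiles (LPA) by seniority as (p10, median, p90), from the table above.
PAY = {
    "Entry": (4, 12, 12),
    "Junior": (8, 15, 42),
    "Mid": (14, 32, 58),
    "Senior": (28, 54, 72),
    "Staff": (46, 88, 200),
}


def median_step_ups(pay):
    """Ratio of each level's median to the median of the level just below it."""
    levels = list(pay)
    return {
        f"{lower} -> {upper}": pay[upper][1] / pay[lower][1]
        for lower, upper in zip(levels, levels[1:])
    }


def upper_tail(pay):
    """Gap in LPA between the 90th percentile and the median at each level."""
    return {level: p90 - median for level, (_, median, p90) in pay.items()}


steps = median_step_ups(PAY)
tails = upper_tail(PAY)
for move, ratio in steps.items():
    print(f"{move}: x{ratio:.2f}")
for level, gap in tails.items():
    print(f"{level}: p90 is {gap} LPA above the median")
```

On the table's figures, the largest relative jump is Junior to Mid, while the widest gap between median and p90 is at Staff, at 112 LPA.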

Pivot breadth: closest adjacent profiles by skill overlap

Closest profiles by skill-set overlap, measured over the skill sets cited in at least one in ten postings for each profile in the same window. New skill sets required counts the skill sets that appear in the adjacent profile's set but not in this profile's:

AI_AND_LLM

~25%

5 shared · ~11 new required

Shared core skillsets

Python for Data Science · Deep Learning Frameworks · Cloud Platforms · Containers & Orchestration · LLM Agents & Orchestration

New skillsets required

Python Backend · Java & Spring Core · NoSQL Databases · LLM APIs & Models · Vector Databases

DATA_ENGINEERING

~25%

5 shared · ~12 new required

Shared core skillsets

Programming Languages · Python for Data Science · Cloud Platforms · Containers & Orchestration · Spark & Batch Processing

New skillsets required

Data Engineering Languages · Relational Databases · Cloud Data Warehouses · ETL & Orchestration · NoSQL Databases

DATA_ANALYTICS_AND_BI

~20%

3 shared · ~7 new required

Shared core skillsets

Programming Languages · Python for Data Science · Analytics Languages

New skillsets required

Power BI Ecosystem · Microsoft Power Platform · BI Platforms · Cloud Data Warehouses · CI/CD Platforms

DEVOPS_AND_PLATFORM

~15%

3 shared · ~14 new required

Shared core skillsets

Programming Languages · Cloud Platforms · Containers & Orchestration

New skillsets required

DevOps Languages · CI/CD Platforms · Infrastructure as Code · Monitoring & Observability · Shell & OS Environments

GENERALIST_SWE

~15%

2 shared · ~6 new required

Shared core skillsets

Programming Languages · Python for Data Science

New skillsets required

Java & Spring Core · Relational Databases · .NET Backend · Core Web · .NET & Desktop

The closest move is AI and LLM, sharing the Python, deep-learning, and orchestration core while asking for LLM APIs and backend skills. Data Engineering is just as close on shared Python and Spark, needing warehouse and ETL skills. Data Analytics and BI is a moderate reach on shared analytics languages. DevOps and Platform and Generalist SWE are far off: the first shares only programming-language, cloud, and container basics, the second only the programming and Python core. Overall, there is little scope to move sideways, with AI and LLM the natural step and the rest of the data family a reachable second group.
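
The overlap measure described at the top of this subsection can be sketched with plain set operations. Two things here are assumptions rather than facts from the report: the overlap score is computed as Jaccard similarity (the report does not state its formula), and the toy postings are hypothetical. The one-in-ten threshold and the definition of new skill sets as a set difference follow the description above.

```python
from collections import Counter


def common_skill_sets(postings, min_share=0.10):
    """Skill sets cited in at least `min_share` of a profile's postings."""
    counts = Counter(skill for posting in postings for skill in set(posting))
    cutoff = min_share * len(postings)
    return {skill for skill, n in counts.items() if n >= cutoff}


def compare_profiles(source, target):
    """Return shared skill sets, new ones the target requires, and an overlap score.

    The score is Jaccard similarity, which is an assumption: the report
    does not say which formula produces its percentages.
    """
    shared = source & target
    new_required = target - source
    union = source | target
    overlap = len(shared) / len(union) if union else 0.0
    return shared, new_required, overlap


# Hypothetical toy postings, each a list of skill-set labels.
ds_postings = [
    ["Python for Data Science", "Deep Learning Frameworks"],
    ["Python for Data Science", "Cloud Platforms"],
    ["Python for Data Science", "Deep Learning Frameworks", "Cloud Platforms"],
]
llm_postings = [
    ["Python for Data Science", "LLM APIs & Models"],
    ["Deep Learning Frameworks", "Vector Databases", "LLM APIs & Models"],
]

ds = common_skill_sets(ds_postings)
llm = common_skill_sets(llm_postings)
shared, new_required, overlap = compare_profiles(ds, llm)
print(sorted(shared), sorted(new_required), f"{overlap:.0%}")
```

The threshold matters on real data: a skill set that appears in only a handful of postings never enters either profile's set, so it counts neither as shared nor as new.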

MAANG and elite global tech pathway: share of postings + senior pay

MAANG and elite global tech share of postings within this profile, broken out by seniority level:

MAANG and elite global tech share + senior pay

Within Data Science and ML

Share by seniority
Seniority	Share
Fresher (FA)	~15%
Mid	~10%
Senior	~10%
Staff	~30%

Senior pay · this profile
Group	Senior pay
MAANG senior	~105 LPA
Non-MAANG senior	~50 LPA

Skills that distinguish MAANG senior postings

MAANG presence concentrates at the top here, around three in ten at Staff against around a tenth at Mid and Senior levels and roughly 15% at fresher level. That staff-heavy shape fits the top firms keeping their data-science roles for the most senior research and platform work. The senior pay gap is one of the largest of all the profiles: MAANG senior pay sits near 105 LPA against 50 LPA for senior roles elsewhere, a difference of roughly 55 LPA, or more than double. The data does not surface the skills that set MAANG senior roles apart. Overall, the MAANG and elite global tech tier is a top-level destination here, so the path runs through Staff-level research depth rather than early entry.
