Data Science

Data Science training in Vizag

Overview

Course Description

Data science applies the principles and techniques of data analysis, machine learning, and statistics to extract insight from data. Many people in today’s job market aspire to become data scientists, which has made data science training one of the most popular courses to pursue. Employers across industries are actively seeking skilled data scientists who can provide valuable business insights, and companies are willing to offer substantial salaries to people who have completed data science training. Data science is also used to analyze historical data and predict potential risks, enabling companies to mitigate them proactively.

Numerous websites and offline coaching centers offer data science training. Online training institutes like JNNC Technologies stand out for providing high-quality training aligned with industry requirements, experienced trainers, real-world industry projects, and certification. The training also covers visualization and reporting tools. Those who cannot commit to regular training sessions, including working professionals looking to change careers, can benefit from self-paced data science learning. The self-paced option features a curriculum updated in line with current industry needs and best practices, and it includes guidance from experienced data science professionals who troubleshoot real-world data science issues.

Can you acquire data science skills at your own pace through online training? Yes, you can certainly learn data science through self-study. There are online resources and videos that can provide foundational knowledge about data science. However, self-learning online can be a bit challenging. Fortunately, online courses like the one offered by JNNC Technologies provide a supportive environment with online trainers available to assist with any queries. The data science self-learning online course is designed especially for busy individuals who cannot attend physical data science classes. This self-paced program accelerates the learning process and is suitable for working professionals who wish to take charge of their learning journey.

Alternatively, individuals can utilize online training resources specifically designed to provide comprehensive knowledge about data science. These courses are in high demand and cater to individuals seeking a deep understanding of data science concepts. Contrary to popular belief, self-training can be just as effective as learning from an expert, provided the learner possesses the enthusiasm and determination to become a self-taught data science professional.

Given the importance of data science training, individuals who embark on this journey can fast-track their careers, securing lucrative job roles and advancing to the next level professionally. By enrolling in JNNC Technologies’ data science program, you can access top-notch training in this field.

Course Curriculum

Introduction

Overview of Data Science

Module 1: Statistics and Probability

a) Descriptive Statistics:

Central tendency: Mean, Median, Mode
Sample variance
Standard deviation
Random Variables: Discrete, Continuous
Probability density functions
Binomial distribution
Expected Value, E(X)
Poisson Process
Law of large numbers
Standard normal distribution and empirical rule
Z-score
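
As an illustration of the descriptive statistics listed above, the following minimal sketch uses Python's standard `statistics` module. The sample values and the `z_score` helper are made up for this example and are not part of the course material.

```python
import statistics

# Hypothetical sample: daily order counts for one week (made-up values).
data = [12, 15, 15, 18, 20, 22, 24]

mean = statistics.mean(data)            # central tendency: arithmetic mean
median = statistics.median(data)        # middle value of the sorted data
mode = statistics.mode(data)            # most frequent value
sample_var = statistics.variance(data)  # sample variance (divides by n - 1)
sample_std = statistics.stdev(data)     # square root of the sample variance


def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    return (x - mu) / sigma


# Z-score of every observation relative to the sample mean and deviation.
z_scores = [z_score(x, mean, sample_std) for x in data]

print(mean, median, mode, round(sample_var, 3), round(sample_std, 3))
```

A z-score of 0 marks a value equal to the mean; positive and negative values lie above and below it.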

b) Inferential Statistics:

Central limit theorem
Sampling distribution of the sample mean
Standard error of the mean
Mean and variance of Bernoulli distribution
Margin of error 1
Margin of error 2
Confidence interval
Hypothesis testing and p-value
One-tailed and two-tailed tests
Z-statistics and T-statistics
Type 1 error
Squared error of regression line
Coefficient of determination
Chi-square distribution
Pearson’s chi-square test (goodness of fit)
Correlation and causality
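
To make the standard error, margin of error and confidence interval topics concrete, here is a small sketch using only the Python standard library. The sample is hypothetical, and the normal (z) critical value is used for simplicity; for a small sample with unknown population variance a t-statistic would usually be the more appropriate choice.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample of 10 measurements (made-up values).
sample = [52, 48, 55, 50, 47, 53, 49, 51, 54, 46]

n = len(sample)
sample_mean = statistics.mean(sample)

# Standard error of the mean: sample standard deviation / sqrt(n).
standard_error = statistics.stdev(sample) / math.sqrt(n)

# Two-sided 95% critical value of the standard normal distribution (about 1.96).
z_critical = NormalDist().inv_cdf(0.975)

# Margin of error and the resulting confidence interval around the mean.
margin_of_error = z_critical * standard_error
confidence_interval = (sample_mean - margin_of_error,
                       sample_mean + margin_of_error)

print(sample_mean, confidence_interval)
```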

Module 2: Data Analysis using Numpy and Pandas

1. NumPy

NumPy Vector and Matrix
Functions – arange(), zeros(), ones(), linspace(), eye(), reshape(), random(), max(), min(), argmax(), argmin(), shape and dtype attributes
Indexing and Selection
NumPy Operations – Array with Array, Array with Scalars, Universal Array Functions
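
The NumPy topics above can be sketched in a few lines. This is only an illustrative example that assumes NumPy is installed; the variable names are arbitrary.

```python
import numpy as np

# Array creation functions.
a = np.arange(12)                   # integers 0..11
grid = a.reshape(3, 4)              # same data viewed as 3 rows x 4 columns
zeros = np.zeros((2, 3))            # 2 x 3 array of 0.0
ones = np.ones(3)                   # vector of 1.0
points = np.linspace(0.0, 1.0, 5)   # 5 evenly spaced values, endpoints included
identity = np.eye(3)                # 3 x 3 identity matrix

# shape and dtype attributes.
print(grid.shape, grid.dtype)

# Indexing and selection.
first_row = grid[0]                 # first row of the matrix
evens = a[a % 2 == 0]               # boolean-mask selection

# Array with array, and array with scalar (both element-wise).
summed = grid + grid
scaled = grid * 10

# A universal array function applied element-wise.
roots = np.sqrt(a)

# Position of the largest element.
largest_index = a.argmax()
```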

2. Pandas

Pandas Series
Pandas DataFrame
Missing Data (Imputation)
Group by Operations
Merging, Joining and Concatenating DataFrames
Pandas Operations
Data Input and Output in a wide variety of formats such as CSV, Excel, database and HTML

Module 3: Data Visualization using Matplotlib, Seaborn, Pandas Built-in, Plotly and Cufflinks

1. Matplotlib

plot() using Functional approach
multi-plot using subplot()
plt.figure() using OO API Methods
add_axes(), set_xlabel(), set_ylabel(), set_title() Methods
Customization – figure size, improving DPI, plot appearance, markers, control over axis appearance and special plot types

2. Seaborn

Distribution Plots using distplot(), jointplot(), pairplot(), rugplot(), kdeplot()
Categorical Plots using barplot(), countplot(), boxplot(), violinplot(), stripplot(), swarmplot(), factorplot()
Matrix Plots using heatmap(), clustermap()
Grid Plots using PairGrid(), FacetGrid()
Regression Plots using lmplot()
Styles and Colors customization

3. Plotly and Cufflinks

Interactive Plotting using Plotly and Cufflinks

4. Pandas Built-in

Histogram, Area Plot, Bar Plot, Scatter Plot, Box Plot, Hex Plot, KDE Plot, Density Plot
Choropleth Maps – Interactive World Map and US Map using Plotly and Cufflinks Module

Module 4: GIT

Distributed Version Control System
How Git internally manages version control on changesets
Creating a Repository
Basic commands such as git status, git add, git rm, git branch, git checkout, git log, git cat-file, git pull, git push, git commit
Managing Configuration – System Level, User Level, Repository level

Module 5: Jupyter Notebook

Introduction, Basic Commands, Keyboard Shortcut and Magic Functions

Module 6: Linear Algebra and Calculus

Vector and Matrix, basic operations
Trigonometry
Derivatives

Module 7: SQL

MySQL Server and Client Installation
SQL Queries
CRUD Operations
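
The four CRUD operations can be tried out without installing a database server. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL, which is an assumption made purely for convenience; the `students` table and its rows are hypothetical. The SQL statements themselves are standard, though placeholder syntax differs between database drivers.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table for the example.
cur.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, score REAL)"
)

# Create: insert new rows.
cur.executemany(
    "INSERT INTO students (name, score) VALUES (?, ?)",
    [("Asha", 82.5), ("Ravi", 74.0)],
)

# Read: query rows, highest score first.
cur.execute("SELECT name, score FROM students ORDER BY score DESC")
rows_after_insert = cur.fetchall()

# Update: change an existing row.
cur.execute("UPDATE students SET score = ? WHERE name = ?", (79.0, "Ravi"))

# Delete: remove a row.
cur.execute("DELETE FROM students WHERE name = ?", ("Asha",))

cur.execute("SELECT name, score FROM students")
rows_final = cur.fetchall()

conn.commit()
conn.close()

print(rows_after_insert)
print(rows_final)
```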

Module 8: Big Data

What is big data?
What is distributed computing?
What is parallel processing?
Why do data scientists require big data?

Module 9: Machine Learning Introduction

What is Machine Learning
Machine Learning Process Flow-Diagram
Different Categories of Machine Learning – Supervised, Unsupervised and Reinforcement
Scikit-Learn Overview
Scikit-Learn cheat-sheet

Module 10: Regression

Linear Regression
Robust Regression (RANSAC Algorithm)
Exploratory Data Analysis (EDA)
Correlation Analysis and Feature Selection
Performance Evaluation – Residual Analysis, Mean Square Error (MSE), Coefficient of Determination (R^2), Mean Absolute Error (MAE), Root Mean Square Error (RMSE)
Polynomial Regression
Regularized Regression – Ridge, Lasso and Elastic Net Regression
Bias-Variance Trade-Off
Cross Validation – Hold Out and K-Fold Cross Validation
Data Pre-Processing – Standardization, Min-Max, Normalization and Binarization
Gradient Descent
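
As a rough illustration of how gradient descent fits a linear regression by minimizing the mean square error, consider the sketch below. It is plain Python rather than Scikit-Learn, and the function names, learning rate, step count and data points are all assumptions chosen for the example.

```python
def mean_squared_error(xs, ys, slope, intercept):
    """Average squared difference between predictions and targets."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_line(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y = slope * x + intercept with batch gradient descent on the MSE."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every training point.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to each parameter.
        grad_slope = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2.0 / n * sum(errors)
        # Step against the gradient.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
mse = mean_squared_error(xs, ys, slope, intercept)
print(round(slope, 4), round(intercept, 4))
```

A learning rate that is too large makes the updates diverge, while one that is too small needs many more steps; the values here were picked to converge on this tiny data set.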

Projects

Predicting Boston House Prices – https://www.kaggle.com/schirmerchad/bostonhoustingmlnd
Ecommerce Project – a company wants to decide whether to focus its efforts on the mobile experience or the website experience.
USA Housing Prediction Project.
New York City Taxi Fare Prediction – https://www.kaggle.com/c/new-york-city-taxi-fareprediction
Emergency 911 Calls – https://www.kaggle.com/mchirico/montcoalert

Module 11: Classification – Logistic Regression

Sigmoid function
Logistic Regression learning using Stochastic Gradient Descent (SGD)
SGD Classifier
Measuring accuracy using Cross-Validation, Stratified k-fold
Confusion Matrix – True Positive (TP), False Positive (FP), False Negative (FN), True Negative (TN)
Precision, Recall, F1 Score, Precision/Recall Trade-Off
Receiver Operating Characteristics (ROC) Curve.
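
The sigmoid function and the confusion-matrix metrics listed above can be illustrated with a short plain-Python sketch. The labels and raw model scores are invented for the example, and the helper names are hypothetical.

```python
import math


def sigmoid(z):
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels encoded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1, returning 0.0 where undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical true labels and raw model scores (made-up values).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [2.0, -1.0, 0.5, -0.3, 0.2, -2.0, 1.5, -0.7]

# Predict class 1 when the sigmoid probability reaches the 0.5 threshold.
y_pred = [1 if sigmoid(s) >= 0.5 else 0 for s in scores]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(tp, fp, fn, tn, precision, recall, f1)
```

Raising the 0.5 threshold generally trades recall for precision, which is the precision/recall trade-off named in the list.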

Projects

Digit Recognizer – https://www.kaggle.com/c/digit-recognizer
Titanic: Machine Learning from Disaster – https://www.kaggle.com/c/titanic
Advertising Project – Indicating whether a particular internet user will click on an advertisement or not.
Project on classified data to predict the target class 0 or 1.
A second project on classified data to predict the target class 0 or 1.

Module 12: Classification – k-Nearest Neighbors (KNN)

Classification and Regression
Application, Advantages and Disadvantages
Distance Metric – Euclidean, Manhattan, Chebyshev, Minkowski
Measuring accuracy using Cross-Validation, Stratified k-fold, Confusion Matrix, Precision, Recall, F1-score.
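
The four distance metrics and the basic k-nearest-neighbor voting rule can be sketched as follows. This is a simplified plain-Python illustration, not the Scikit-Learn implementation; the function names and the toy points are assumptions.

```python
from collections import Counter


def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def euclidean(a, b):
    """Minkowski distance with p = 2 (straight-line distance)."""
    return minkowski(a, b, 2)


def manhattan(a, b):
    """Minkowski distance with p = 1 (sum of absolute differences)."""
    return minkowski(a, b, 1)


def chebyshev(a, b):
    """Largest absolute difference along any single coordinate."""
    return max(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_points, train_labels, query, k=3, distance=euclidean):
    """Classify query by majority vote among its k nearest training points."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: distance(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-cluster training set.
train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(euclidean((0, 0), (3, 4)), manhattan((0, 0), (3, 4)), chebyshev((0, 0), (3, 4)))
print(knn_predict(train_points, train_labels, (2, 2)))
```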

Projects

Breast Cancer Wisconsin (Diagnostic) Project using KNN – https://www.kaggle.com/uciml/breastcancer-wisconsin-data
Iris Species – https://www.kaggle.com/uciml/iris

Module 13: Classification – SVM (Support Vector Machine)

Classification and Regression
Separating line, Margin and Support Vectors
Linear SVC Classification
Polynomial Kernel – Kernel Trick
Gaussian Radial Basis Function (rbf)
Grid Search to tune hyper-parameters
Support Vector Regression

Projects

Breast Cancer Wisconsin (Diagnostic) Project using KNN – https://www.kaggle.com/uciml/breastcancer-wisconsin-data
Iris Species – https://www.kaggle.com/uciml/iris

Module 14: Classification – Decision Trees

CART (Classification and Regression Tree)
Advantages and Disadvantages and its applications.
Decision Tree Learning algorithms – ID3, C4.5, C5.0 and CART.
Gini Impurity, Entropy and Information Gain
Decision Tree Regression
Visualizing a Decision Tree using graphviz module.
Regularization by tuning hyper-parameters using GridSearchCV.
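
Gini impurity, entropy and information gain are the quantities a decision tree uses to choose a split. The sketch below computes them in plain Python for a hypothetical set of yes/no labels; the function names and data are assumptions for illustration only.

```python
import math
from collections import Counter


def gini_impurity(labels):
    """1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum(
        (count / n) * math.log2(count / n) for count in Counter(labels).values()
    )


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


# Hypothetical node with 4 "yes" and 4 "no" labels, split into two children.
parent = ["yes"] * 4 + ["no"] * 4
left = ["yes", "yes", "yes", "no"]
right = ["no", "no", "no", "yes"]

print(gini_impurity(parent), entropy(parent))
print(round(information_gain(parent, left, right), 4))
```

A split that separates the classes perfectly would reach the maximum gain for this node, 1 bit, while a split that leaves both children as mixed as the parent gains nothing.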

Projects

1. IBM HR Analytics Employee Attrition and Performance – https://www.kaggle.com/pavansubhasht/ibm-hranalytics-attritiondataset

2. Zomato Restaurants Data – https://www.kaggle.com/shrutimehta/zomatorestaurants-data

3. Predicting Bank Marketing Analysis – https://www.kaggle.com/kevalm/bankmarketingdataset

4. FIFA 18 Complete Player Dataset – https://www.kaggle.com/thec03u5/fifa-18demo-playerdataset

Module 15: Classification – Ensemble Methods

Bootstrap Aggregating or Bagging
Random Forest algorithm
Extremely Randomized (Extra-Trees) Ensemble
Boosting – AdaBoost (Adaptive Boosting), Gradient Boosting Machine (GBM), XGBoost (Extreme Gradient Boosting)

Module 16: Unsupervised Learning – Clustering

Connectivity-based Clustering using Hierarchical Clustering.
Ward’s Agglomerative Hierarchical Clustering
K-Means Clustering
Elbow Method and Silhouette Analysis

Projects

1. Lending Club Loan Data Analysis – https://www.kaggle.com/wendykan/lending-club-loan-data

2. U.S. News And World Report’s College Data – https://www.kaggle.com/flyingwombat/us-newsand-world-reports-college-data

3. Credit Card Dataset for Clustering – https://www.kaggle.com/arjunbhasin2013/ccdata

Module 17: Unsupervised Learning – Dimensionality Reduction

Linear Principal Component Analysis (PCA)
Kernel PCA
Linear Discriminant Analysis (LDA) on Supervised Data.

Projects

1. Breast Cancer Wisconsin (Diagnostic) Analysis using PCA – https://www.kaggle.com/uciml/breast-cancerwisconsin-data

2. Predicting Abalone’s Sex – https://www.kaggle.com/yuridias/abalonedataset

3. Wine Project – https://www.kaggle.com/zynicide/wine-reviews

4. SMS Spam Collection Dataset Analysis – https://www.kaggle.com/uciml/sms-spam-collection-dataset

5. Auto Summarizing Text using Rule Based Model.

6. Yelp Business Rating Prediction – https://www.kaggle.com/c/yelprecsys-2013

Module 18: Model Deployment on AWS Cloud

What is cloud computing?
What is AWS?
How to store data in AWS S3?
Creating a deep learning instance on EC2.
Amazon SageMaker to build, train, tune and deploy models to production.
