Organizations across the world are inundated with huge amounts of data from all directions, and to get the most out of it they must be able to harness all relevant data and analyze it to make better decisions for their business. With this explosion in data, Hadoop has gained significance as organizations worldwide have found it to be the best platform for managing and processing big data.
To use the Hadoop platform effectively, and to analyze and utilize every bit of data for maximum productivity, training is of paramount importance. Trained Hadoop Data Analysts are in high demand because they can apply best practices to work with big data faster and more effectively.
Our Hadoop Data Analyst course is for those who wish to access, manipulate, and analyze massive data sets using SQL and familiar scripting languages on Hadoop. Learn how to transform data using Apache Pig, Apache Hive, and Cloudera Impala and analyze it using filters, joins, and user-defined functions familiar from other technologies.
1.1 Big Data Introduction
- What is Big Data
- Data Analytics
- Big Data Challenges
- Technologies supported by big data
1.2 Hadoop Introduction
- What is Hadoop?
- History of Hadoop
- Basic Concepts
- Future of Hadoop
- The Hadoop Distributed File System
- Anatomy of a Hadoop Cluster
- Breakthroughs of Hadoop
- Hadoop Distributions:
- Apache Hadoop
- Cloudera Hadoop
- Hortonworks Hadoop
- MapR Hadoop
2. Hadoop Daemon Processes
- Name Node
- Data Node
- Secondary Name Node
- Job Tracker
- Task Tracker
3. HDFS (Hadoop Distributed File System)
- Blocks and Input Splits
- Data Replication
- Hadoop Rack Awareness
- Cluster Architecture and Block Placement
- Accessing HDFS
- Java Approach
- CLI Approach
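As a rough illustration of the concepts in this section (blocks, replication, and rack awareness), here is a small Python sketch — not Hadoop code — that splits a file size into 128 MB blocks and picks three replica locations following the default placement policy: first replica on the writer's node, second on a node in a different rack, third on another node in that same remote rack. The node and rack names are made up for illustration.

```python
# Conceptual sketch of HDFS block splitting and rack-aware replica
# placement. This is an illustration, not actual Hadoop code.

BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size (Hadoop 2.x)

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the list of block sizes a file of file_size bytes occupies."""
    blocks = []
    remaining = file_size
    while remaining > 0:
        blocks.append(min(block_size, remaining))
        remaining -= block_size
    return blocks

def place_replicas(writer_node, racks):
    """Pick 3 replica nodes following the default HDFS policy:
    1st on the writer's own node, 2nd on a node in a different rack,
    3rd on a different node in the same rack as the 2nd."""
    # racks: dict mapping rack name -> list of node names
    local_rack = next(r for r, nodes in racks.items() if writer_node in nodes)
    remote_rack = next(r for r in racks if r != local_rack)
    second = racks[remote_rack][0]
    third = next(n for n in racks[remote_rack] if n != second)
    return [writer_node, second, third]
```

For example, a 300 MB file occupies three blocks (128 MB + 128 MB + 44 MB), and each block gets three replicas spread across two racks, which balances write cost against fault tolerance.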
4. Hadoop Installation Modes and HDFS
- Local Mode
- Pseudo-distributed Mode
- Fully Distributed Mode
- Pseudo Mode installation and configurations
- HDFS basic file operations
5. Hadoop Developer Tasks
5.1 Writing a MapReduce Program
- Basic API Concepts
- The Driver Class
- The Mapper Class
- The Reducer Class
- The Combiner Class
- The Partitioner Class
- Examining a sample MapReduce program, with several worked examples
- Hadoop’s Streaming API
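The Streaming API lets any executable that reads lines from stdin and writes lines to stdout serve as the mapper or reducer. The word-count sketch below shows the idea in Python; the local `sorted()` call stands in for Hadoop's shuffle-and-sort phase, and the `hadoop jar` invocation in the comment is indicative of how the same scripts would run on a cluster.

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit (word, 1) for every word, as tab-separated text."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_pairs):
    """Reduce phase: sum the counts for each word. The input arrives
    sorted by key, which is what Hadoop's shuffle/sort guarantees."""
    keyed = (pair.split("\t") for pair in sorted_pairs)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"

if __name__ == "__main__":
    # Local simulation of map -> shuffle/sort -> reduce. On a cluster
    # the same logic would be submitted along the lines of:
    #   hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py ...
    text = ["big data big hadoop", "hadoop big"]
    for line in reducer(sorted(mapper(text))):
        print(line)
```

Because the contract is just lines of text over stdin/stdout, Streaming jobs can also be tested entirely offline with a shell pipeline (`cat input | mapper | sort | reducer`) before touching the cluster.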
6. Hadoop Ecosystem
- PIG concepts
- Install and configure PIG on a cluster
- PIG vs MapReduce and SQL
- Write sample PIG Latin scripts
- Modes of running PIG
- PIG UDFs
- Hive concepts
- Hive architecture
- Installing and configuring HIVE
- Managed tables and external tables
- Joins in HIVE
- Multiple ways of inserting data in HIVE tables
- CTAS, views, alter tables
- User defined functions in HIVE
- Hive UDF
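HiveQL's CTAS (CREATE TABLE ... AS SELECT) and view syntax follows standard SQL closely, so the idea can be sketched locally using Python's built-in sqlite3 module as a stand-in. The `orders` table and its columns are invented for illustration; on a real cluster you would run equivalent statements in the HIVE shell.

```python
import sqlite3

# Stand-in demo of CTAS and views using SQLite (ships with Python).
# HiveQL uses the same shape for both statements; names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "acme", 120.0), (2, "acme", 80.0), (3, "globex", 300.0)])

# CTAS: materialize the result of a query as a new table.
conn.execute("CREATE TABLE big_orders AS SELECT * FROM orders WHERE amount > 100")

# View: a named query, re-evaluated each time it is read.
conn.execute("""CREATE VIEW order_totals AS
                SELECT customer, SUM(amount) AS total
                FROM orders GROUP BY customer""")

print(conn.execute("SELECT COUNT(*) FROM big_orders").fetchone()[0])  # 2
print(conn.execute(
    "SELECT total FROM order_totals WHERE customer = 'acme'").fetchone()[0])  # 200.0
```

The practical difference carries over to HIVE: a CTAS table is a snapshot that occupies storage, while a view stays current with its base tables but costs a query each time it is read.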
- SQOOP concepts
- SQOOP architecture
- Install and configure SQOOP
- Connecting to RDBMS
- Internal mechanism of import/export
- Import data from Oracle/MySQL to HIVE
- Export data to Oracle/MySQL
- Other SQOOP commands
- HBASE concepts
- ZOOKEEPER concepts
- HBASE and Region server architecture
- File storage architecture
- NoSQL vs SQL
- Defining Schema and basic operations
- HBASE use cases
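Conceptually, an HBASE table is a sorted, sparse map from row key to column family to column qualifier to value, and the sorted row keys are what make range scans cheap. The Python sketch below illustrates that data model — it is not an HBASE client, and the table schema and `user#` row-key convention are invented for illustration.

```python
from collections import defaultdict

# Conceptual sketch of the HBASE data model (not an HBASE client):
# a table is a sorted map of
#   row key -> column family -> column qualifier -> value.
class SketchTable:
    def __init__(self, families):
        self.families = set(families)  # column families are fixed at schema time
        self.rows = {}                 # row key -> {family: {qualifier: value}}

    def put(self, row, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row, defaultdict(dict))[family][qualifier] = value

    def get(self, row, family, qualifier):
        return self.rows[row][family][qualifier]

    def scan(self, start, stop):
        """Range scan over sorted row keys, like HBASE's Scan."""
        for key in sorted(self.rows):
            if start <= key < stop:
                yield key, self.rows[key]

t = SketchTable(families=["info"])
t.put("user#001", "info", "name", "alice")
t.put("user#002", "info", "name", "bob")
t.put("admin#001", "info", "name", "carol")
print([k for k, _ in t.scan("user#", "user#~")])  # only the "user#" rows
```

This is also why row-key design is the central schema decision in HBASE: rows you want to scan together must sort together.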
- OOZIE concepts
- OOZIE architecture
- Workflow engine
- Job coordinator
- Installing and configuring OOZIE
- HPDL and XML for creating Workflows
- Nodes in OOZIE
- Action nodes and Control nodes
- Accessing OOZIE jobs through CLI, and web console
- Develop and run sample workflows in OOZIE
- Run MapReduce programs
- Run HIVE scripts/jobs
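An OOZIE workflow is written in HPDL, an XML dialect in which control nodes (start, kill, end) steer execution between action nodes such as a map-reduce action. A minimal sketch of such a workflow file is shown below; the workflow name, paths, and `${...}` parameters are placeholders that would normally be supplied through a job.properties file.

```xml
<!-- Minimal OOZIE workflow sketch: one MapReduce action.
     Names, paths, and property values are placeholders. -->
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="mr-node"/>
    <action name="mr-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>MapReduce action failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

A workflow like this is packaged into an HDFS application directory and submitted from the CLI with `oozie job -config job.properties -run`; the `ok`/`error` transitions on each action are what let you chain MapReduce and HIVE steps with explicit failure handling.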
- FLUME Concepts
- FLUME Architecture
- Installation and configurations
- Executing FLUME jobs
7. Data Analytics using Pentaho as an ETL tool
- MapReduce and HIVE integration
- MapReduce and HBASE integration
- Java and HIVE integration
- HIVE – HBASE Integration
From this course, successful candidates will learn:
- Basics of Apache Hadoop and data ETL (extract, transform, load), ingestion, and processing with Hadoop tools
- How to join multiple data sets and analyze disparate data with Pig
- How to organize data into tables, perform transformations, and simplify complex queries with Hive
- How to perform real-time interactive analyses on massive data sets stored in HDFS or HBase using SQL with Impala
- How to pick the best tool for a given task in Hadoop, achieve interoperability, and manage repetitive workflows
No certification exam is offered for this course. On successful completion, you will receive a Course Completion Certificate from Bacancy Trainings.
This course is best suited to data analysts, business analysts, developers, and administrators who have experience with SQL and basic UNIX or Linux commands.
Q. Can you tell me about the training?
Our Hadoop Data Analyst training is for those who wish to access, manipulate, and analyze massive data sets using SQL and familiar scripting languages on Hadoop. You will learn how to transform data using Apache Pig, Apache Hive, and Cloudera Impala, and how to analyze it using filters, joins, and user-defined functions familiar from other technologies.
Q. Who can benefit from this course?
This training is aimed at data analysts, business analysts, developers, and administrators who have experience with SQL and basic UNIX or Linux commands.
Q. How can I register for the training?
You can register online; we will provide a registration link, which you can use to complete your registration.
Q. Is there a group discount?
Yes, groups of 5 or more receive a 10% discount.
Q. What are the training timings and venue?
Training runs from 9:30 AM to 5:30 PM; the venue will be communicated based on your location.