Summer Industrial Training for Information Technology Engineering Students.

Information Technology is one of the most evergreen branches of engineering, with ample employment opportunities. But simply studying the syllabus textbooks will not get you your dream career; you also need to learn the practical aspects of the field.

In the present world, Cloud Computing and Big Data Hadoop are the main job-oriented fields (the lifeline) of Information Technology. Information technology (IT) is a broad field that encompasses many careers, including computer programming, technical support and systems analysis.

The mission of the LinuxWorld Informatics Pvt. Ltd. learning project is to identify bright, serious and dedicated students/interns from a college and give them extraordinarily challenging projects that will significantly elevate their learning to new levels.

We are also not biased towards any college: interns can come from an IIT, NIT, REC or a top private college, and we will evaluate their suitability and potential to perform. There are certain qualities that we look for in interns.


Summer interns will work under the guidance of our Founder & CTO and will create extremely useful artifacts such as tutorials and programming assignments on these topics.

After completing the training program, candidates will feel confident: their skills will be enhanced and they can start their career as a System Admin, Software Developer, Project Engineer, Database Admin, Linux Admin or Cloud Admin.

Our industrial training program is backed by project implementation, which helps students gain industrial experience, and our experts have 13 years of training experience.

For more information about the summer internship project, please visit the link below:

http://www.linuxworldindia.org/linuxworldindia-summer-industrial-training.php

Two certificates are provided to all participants at the end of the training:

1. A course certificate for the technology opted for.
2. A certificate for the project work (after successful completion of the project).

Accommodation Facility
There are many P.G. accommodations nearby (within 1-2 km of the LinuxWorld office). We do not have a tie-up with any P.G. accommodation, but we can suggest a few; you will have to talk to them directly about your stay. The cost of stay ranges from 3,500 INR to 5,000 INR per month, depending on your choice of stay and/or food.


45 Days Project Based Summer Training in Jaipur

LinuxWorld Informatics Pvt. Ltd. is an ISO 9001:2008 certified Information Technology and Computer Science training and development company, working towards the best career prospects for growing engineers.

LinuxWorld Informatics Pvt. Ltd. specializes in providing training on various technologies, namely:

  1. DevOps
  2. Docker
  3. Chef
  4. Big Data Hadoop Deployment
  5. OpenStack Cloud Computing
  6. Big Data using Hadoop
  7. OpenStack
  8. PHP
  9. RedHat Linux using PHP
  10. Cisco Networking Design and Deployment
  11. Cisco and RedHat Linux
  12. Hacking
  13. Develop Hacking & Linux
  14. Python
  15. Android Apps Development
  16. Oracle
  17. Java Core
  18. .NET
  19. JSP
  20. Speech Control and Voice
  21. Configure RedHat Linux


Platforms offered during BTech Summer Training for Students: Big Data Hadoop | Cloud Computing | RHCSA | RHCE | RHCSS | RHCVA | RHCA | PHP | CCNA | CCNP | Firewall | Perl | Shell Scripting, and many more.

Duration of Summer Training in Jaipur: 4 weeks / 6 weeks / 45 days / 60 days / 6 months, or according to students' summer vacation.
Deliverables from LinuxWorld Informatics Pvt. Ltd. for summer interns in India:

A. Technical Benefits:

1. Project Certificate from LinuxWorld Informatics Pvt Ltd.
2. Learn from Industry Experts having 13+ years of experience.
3. Lifetime Membership Card – Lifetime Support
4. 24 x 7 Lab Facility
5. Practical exposure through hands-on experience in our well-equipped labs during Summer Training in Jaipur

B. Management Benefits:

1. CV Building during Summer Training at Jaipur
2. Assistance in preparing Summer Training Project Report
3. Guidance for Presentation to be submitted at college level (PPT)
4. Familiarization with tips and techniques to overcome the fear of facing interviews and group discussions
5. Mock group discussions will be conducted
6. Grooming sessions and much more

Contact details:

LinuxWorld Informatics Pvt. Ltd.

Online Application form – http://www.linuxworldindia.org/summer-training-2016-application-form.php

Mobile: 09351788883 / 09351009002


It’s not just you: Azure Active Directory is down for Europe

A configuration error in Microsoft’s Azure Active Directory service is preventing customers from accessing a wide range of Microsoft services hosted in Europe, including Office 365 and Visual Studio Team Services.


The most recent problems began around 9 a.m. UTC Thursday, and were still ongoing shortly after midday UTC, Microsoft reported on its Azure status page.

However, they first showed up in Visual Studio Team Services on Wednesday, between 9.44 p.m. and 11.44 p.m. UTC, Microsoft said. Customers using Microsoft’s West Europe, South Central U.S., North Central U.S., and Australia East data centres may have run into HTTP 500 Internal server errors during this time, the company said.

Staff traced those errors back to a recent configuration change in Azure Active Directory — but rolling back the change did not eliminate the errors.

“Some of the roles in the farm across our Scale Units hit a caching bug that was triggered by the earlier outage. At this moment, we do not understand root cause of the caching bug, however we have taken the required dumps to do final root cause analysis and get to the bottom of the issue,” Microsoft staff explained shortly after midnight UTC.

The problems Thursday morning affected a wider range of services depending on Azure Active Directory, including Stream Analytics, Azure management portals, Azure Data Catalogue, Operational Insights, Remote App and SQL databases.

Some Office 365 customers were also unable to log in or access the service.

In preparing a failover to working servers, “The Azure Active Directory team identified an issue with the failover mitigation path, which would have blocked the mitigation,” Microsoft reported.

With that path ruled out, the team has been forced to take a more laborious one: updating Azure Active Directory front ends to call a known good configuration in the hope that this will improve performance.

Microsoft promised another status update at around 1.10 p.m. UTC (8.10 a.m. ET).


Major IT Players Form R Consortium to Strengthen Data Analysis

The Linux Foundation announced the formation of R Consortium, with the intention of strengthening technical and user communities around the R language, the open source programming language for statistical data analysis.

The new organization, the R Consortium, became an official project of the Linux Foundation and is designed to strengthen the community of R language users. It is expected that the R Consortium will complement the existing R Foundation, focusing on expanding the user base of R and on improving the interaction between users and developers.

Representatives of the R Foundation and of industry are behind the new consortium. Microsoft and RStudio have joined the consortium as platinum members, TIBCO Software is a gold member, and Alteryx, Google, HP, Mango Solutions, Ketchum Trading and Oracle have joined as silver members.

The R Consortium will complement the work of the R Foundation, establishing communication with user groups and supporting projects related to the creation and maintenance of R mirror sites, testing, quality-control resources, and the financial support and promotion of the language. The consortium will also assist in creating support packages for R and in organizing other related software projects.

R is a programming language and development environment for scientific calculations and graphics that originated at the University of Auckland (New Zealand). The R language has enjoyed significant growth and now supports more than two million users. A wide range of industries has adopted the R language, including biotech, finance, research and high tech. The R language is frequently integrated into analysis, visualization, and reporting applications.

Having acquired Revolution Analytics (a company that makes heavy use of the language), Microsoft announced that it is joining founding members such as Google, Oracle, HP, TIBCO, RStudio and Alteryx to finance the new consortium.

A Microsoft official said that “the R Consortium will complement the work of the R Foundation, a nonprofit organization that maintains the language, and will focus on user outreach and other projects designed to assist the R user and developer communities. This includes both technical and infrastructure projects such as building and maintaining mirrors for downloading R, testing, QA resources, financial support for the annual useR! Conference and promotion and support of worldwide user groups.”

Google also says that thousands of its users and its own developers use R, so the language is crucial for many of its products, and that it is happy to join the other companies in continuing to maintain the infrastructure of open source R.

Microsoft’s real-time analytics for Apache Hadoop in Azure HDInsight and its machine learning offerings in the Azure Marketplace use the R language for services such as anomaly detection for preventive maintenance and fraud detection.


Apache Spark 1.5.2 and new versions of Ganglia monitoring, Presto, Zeppelin, and Oozie now available in Amazon EMR

You can now deploy new applications on your Amazon EMR cluster. Amazon EMR release 4.2.0 now offers Ganglia 3.6, an upgraded version of Apache Spark (1.5.2), and upgraded sandbox releases of Apache Oozie (4.2.0), Presto (0.125), and Apache Zeppelin (0.5.5). Ganglia provides resource utilization monitoring for Hadoop and Spark. Oozie 4.2.0 includes several new features, such as adding Spark actions and HiveServer2 actions in your Oozie workflows. Spark 1.5.2, Presto 0.125, and Zeppelin 0.5.5 are maintenance releases, and contain bug fixes and other optimizations.

You can create an Amazon EMR cluster with release 4.2.0 by choosing release label “emr-4.2.0” from the AWS Management Console, AWS CLI, or SDK. You can specify Ganglia, Spark, Oozie-Sandbox, Presto-Sandbox, and Zeppelin-Sandbox to install these applications on your cluster. To view metrics in Ganglia or create a Zeppelin notebook, you can connect to the web-based UIs for these applications on the master node of your cluster. Please visit the Amazon EMR documentation for more information about Ganglia 3.6, Spark 1.5.2, Oozie 4.2.0, Presto 0.125, and Zeppelin 0.5.5


How a Cloud Antivirus Works

[Image: Panda Cloud Antivirus scans your computer at regular intervals and checks it against the latest malware threats in its database. Screenshot by Stephanie Crawford for HowStuffWorks.]

Whether you have years of computing behind you, or you’ve just bought your first laptop or desktop, you’re probably familiar with the need to protect computers from viruses. A virus is a software program that installs itself on your computer and makes undesirable changes to the data on your computer. Though there are rare viruses designed to target offline computers, we’re talking about malicious software (malware) you can pick up from the Internet.

To prevent malware from attacking your data, you can use antivirus software. One antivirus option is a technology called cloud antivirus. Cloud antivirus software does most of its processing elsewhere on the Internet rather than on your computer’s hard drive. Internet technology like cloud computing has made such innovations both possible and affordable.

Cloud antivirus software consists of client and Web service components working together. The client is a small program running on your local computer, which scans the system for malware. Full locally installed antivirus applications are notorious resource hogs, but cloud antivirus clients require only a small amount of processing power.

The Web service behind cloud antivirus is software running on one or more servers somewhere on the Internet. The Web service handles most of the data processing so your computer doesn’t have to process and store massive amounts of virus information. At regular intervals, the client will scan your computer for any malware listed in the Web service’s database.

Here’s a summary of the advantages cloud antivirus has over traditional, locally installed antivirus software:

  • You have access to the latest data about malware within minutes of the cloud antivirus Web service learning about it. There’s no need to continually update your antivirus software to ensure you’re protected from the latest threats.
  • The cloud antivirus client is small, and it requires little processing power as you go on with your day-to-day activities online.
  • It’s free! You can get an impressive level of virus protection from the free versions of cloud antivirus software. You can also purchase upgrades for additional utilities and support, for prices that are competitive with popular local-only antivirus applications.

Now that you know what cloud antivirus is, let’s look at the features of cloud antivirus software and how you can use them to keep your system clean.


What is Infrastructure as a Service?

The definition of infrastructure as a service (IaaS) is pretty simple. You rent cloud infrastructure (servers, storage and networking) on demand, in a pay-as-you-go model.

Since you don’t need to invest in your own hardware, IaaS is perfect for start-ups or businesses testing out a new idea.

Also, since the infrastructure scales on demand, it’s great for workloads that fluctuate rapidly.

Public IaaS

Your business rents infrastructure (networking, servers, storage and virtualisation) from the cloud provider, and accesses that infrastructure over the Internet in order to create or use applications.

IaaS is the fastest growing area of cloud computing, and enterprise public cloud spending is expected to reach $207 billion by 2016.

Common public IaaS workloads: dev/test, website hosting, storage, simple application development.

Managed IaaS

But some workloads require an advanced solution. Managed IaaS is suited for large enterprises running production workloads.


Introduction to MapReduce

MapReduce is a programming model suitable for processing huge amounts of data. Hadoop is capable of running MapReduce programs written in various languages: Java, Ruby, Python, and C++. MapReduce programs are parallel in nature and are thus very useful for performing large-scale data analysis using multiple machines in a cluster.

MapReduce programs work in two phases:

  1. Map phase
  2. Reduce phase.

The input to each phase is key-value pairs. In addition, every programmer needs to specify two functions: a map function and a reduce function.

The whole process goes through three phases of execution: mapping, shuffling, and reducing.

How MapReduce works

Let's understand this with an example.

Consider that you have the following input data for your MapReduce program:

Welcome to Hadoop Class

Hadoop is good

Hadoop is bad

The final output of the MapReduce task is

bad 1
Class 1
good 1
Hadoop 3
is 2
to 1
Welcome 1

The data goes through the following phases:

Input Splits:

The input to a MapReduce job is divided into fixed-size pieces called input splits. An input split is a chunk of the input that is consumed by a single map task.

Mapping

This is the very first phase in the execution of a MapReduce program. In this phase, the data in each split is passed to a mapping function to produce output values. In our example, the job of the mapping phase is to count the number of occurrences of each word from the input splits (described above) and prepare a list in the form of <word, frequency>.

Shuffling

This phase consumes the output of the Mapping phase. Its task is to consolidate the relevant records from the Mapping phase output. In our example, the same words are clubbed together along with their respective frequencies.

Reducing

In this phase, output values from the Shuffling phase are aggregated. This phase combines the values from the Shuffling phase and returns a single output value. In short, this phase summarizes the complete dataset.

In our example, this phase aggregates the values from the Shuffling phase, i.e., it calculates the total occurrences of each word.
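
To make the phases concrete, here is a minimal Java word-count mapper and reducer written against Hadoop's org.apache.hadoop.mapreduce API. This is only a sketch: the class names are our own, and a complete job would also need a small driver class that sets these classes on a Job along with the input and output paths.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapping: for every input line, emit <word, 1> for each word found in it.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        for (String token : line.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);          // e.g. <Hadoop, 1>
            }
        }
    }
}

// Reducing: for every word, add up the 1s collected by shuffling.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();
        }
        context.write(word, new IntWritable(sum)); // e.g. <Hadoop, 3>
    }
}

Fed the three input lines above, the mappers emit pairs such as <Welcome, 1> and three separate <Hadoop, 1> pairs; after shuffling, the reducer receives <Hadoop, [1, 1, 1]> and writes <Hadoop, 3>, which matches the final output shown earlier.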

The overall process in detail

  • One map task is created for each split, which then executes the map function for each record in the split.
  • It is always beneficial to have multiple splits, because the time taken to process a split is small compared to the time taken to process the whole input. When the splits are smaller, the processing is better load-balanced, since we are processing the splits in parallel.
  • However, it is also not desirable to have splits that are too small. When splits are too small, the overhead of managing the splits and of map task creation begins to dominate the total job execution time.
  • For most jobs, it is better to make the split size equal to the size of an HDFS block (which is 64 MB by default).
  • Execution of map tasks results in output being written to a local disk on the respective node and not to HDFS.
  • The reason for choosing the local disk over HDFS is to avoid the replication which takes place in the case of an HDFS store operation.
  • Map output is intermediate output, which is processed by reduce tasks to produce the final output.
  • Once the job is complete, the map output can be thrown away, so storing it in HDFS with replication would be overkill.
  • In the event of node failure before the map output is consumed by the reduce task, Hadoop reruns the map task on another node and re-creates the map output.
  • Reduce tasks don't work on the concept of data locality. The output of every map task is fed to the reduce task, and map output is transferred to the machine where the reduce task is running.
  • On this machine, the output is merged and then passed to the user-defined reduce function.
  • Unlike map output, reduce output is stored in HDFS (the first replica is stored on the local node and other replicas are stored on off-rack nodes). So, writing the reduce output consumes network bandwidth.

How Does MapReduce Organize Work?

Hadoop divides the job into tasks. There are two types of tasks:

  1. Map tasks (Splits & Mapping)
  2. Reduce tasks (Shuffling, Reducing)

as mentioned above.

The complete execution process (execution of both Map and Reduce tasks) is controlled by two types of entities:

  1. Jobtracker: acts like a master (responsible for complete execution of the submitted job)
  2. Multiple Task Trackers: act like slaves, each of them performing the part of the job assigned to it

For every job submitted for execution in the system, there is one Jobtracker that resides on Namenode and there are multiple tasktrackers which reside on Datanode.

  • A job is divided into multiple tasks which are then run onto multiple data nodes in a cluster.
  • It is the responsibility of jobtracker to coordinate the activity by scheduling tasks to run on different data nodes.
  • Execution of individual tasks is then looked after by the tasktracker, which resides on every data node executing part of the job.
  • Tasktracker’s responsibility is to send the progress report to the jobtracker.
  • In addition, the tasktracker periodically sends a ‘heartbeat’ signal to the Jobtracker so as to notify it of the current state of the system.
  • Thus jobtracker keeps track of overall progress of each job. In the event of task failure, the jobtracker can reschedule it on a different tasktracker.

Article Source – http://www.guru99.com/introduction-to-mapreduce.html


Create Your First Hadoop Program

Find out Number of Products Sold in Each Country.

Input: Our input data set is a CSV file, SalesJan2009.csv

Prerequisites:

  • This tutorial is developed on Linux – Ubuntu operating System.
  • You should have Hadoop (version 2.2.0 used for this tutorial) already installed.
  • You should have Java (version 1.8.0 used for this tutorial) already installed on the system.

Before we start with the actual process, change user to ‘hduser’ (the user used for Hadoop).

su - hduser_

Steps:

1. Create a new directory with the name MapReduceTutorial

sudo mkdir MapReduceTutorial

Give permissions

sudo chmod -R 777 MapReduceTutorial

Copy the files SalesMapper.java, SalesCountryReducer.java and SalesCountryDriver.java into this directory.


If you want to understand the code in these files, refer to the accompanying guide.
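
The downloaded source files are not reproduced here, but the core idea of the mapper is simply to emit <country, 1> for every sales record, which the reducer then sums per country. Purely as an illustration, a simplified mapper might look like the sketch below; the class name is our own, it uses the newer org.apache.hadoop.mapreduce API, and it assumes the country sits in the eighth comma-separated column of SalesJan2009.csv, so the actual SalesMapper.java from the download may differ (for example, it may use the older mapred API).

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emit <country, 1> for every sales record; a reducer then sums the 1s per country.
public class CountrySalesMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable offset, Text record, Context context)
            throws IOException, InterruptedException {
        String[] fields = record.toString().split(",");
        if (fields.length > 7) {
            context.write(new Text(fields[7]), ONE); // assumption: country is field index 7
        }
    }
}

The matching reducer is analogous to the word-count reducer shown earlier, summing the 1s for each country key.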

Check the file permissions of all these files, and if ‘read’ permissions are missing, grant them.

2. Export classpath

export CLASSPATH="$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.2.0.jar:$HADOOP_HOME/share/hadoop/common/hadoop-common-2.2.0.jar:~/MapReduceTutorial/SalesCountry/*:$HADOOP_HOME/lib/*"

3. Compile the Java files (these files are present in the directory Final-MapReduceHandsOn). The class files will be put in the package directory.

javac -d . SalesMapper.java SalesCountryReducer.java SalesCountryDriver.java

The compiler may print a warning during this step; it can be safely ignored.

This compilation will create a directory in the current directory named after the package specified in the Java source files (i.e., SalesCountry in our case) and put all the compiled class files in it.

4. Create a new file Manifest.txt

sudo gedit Manifest.txt

and add the following line to it:

Main-Class: SalesCountry.SalesCountryDriver

SalesCountry.SalesCountryDriver is the name of the main class. Please note that you have to hit the Enter key at the end of this line.

5. Create a jar file

jar cfm ProductSalePerCountry.jar Manifest.txt SalesCountry/*.class

Check that the jar file is created

6. Start Hadoop

$HADOOP_HOME/sbin/start-dfs.sh

$HADOOP_HOME/sbin/start-yarn.sh

7. Copy the File SalesJan2009.csv into ~/inputMapReduce

Now use the command below to copy ~/inputMapReduce to HDFS.

$HADOOP_HOME/bin/hdfs dfs -copyFromLocal ~/inputMapReduce /

Any warning printed by this command can be safely ignored.

Verify whether the file was actually copied:

$HADOOP_HOME/bin/hdfs dfs -ls /inputMapReduce

8. Run MapReduce job

$HADOOP_HOME/bin/hadoop jar ProductSalePerCountry.jar /inputMapReduce /mapreduce_output_sales

This will create an output directory named mapreduce_output_sales on HDFS. Contents of this directory will be a file containing product sales per country.

9. The result can be seen through the command-line interface as follows:

$HADOOP_HOME/bin/hdfs dfs -cat /mapreduce_output_sales/part-00000


            OR

The results can also be seen via the web interface:

Open the HDFS NameNode web interface (http://localhost:50070 by default) in a web browser.

Now select ‘Browse the filesystem’ and navigate to /mapreduce_output_sales.

Open part-r-00000 to view the result.


Introduction To Flume and Sqoop

Before we learn more about Flume and Sqoop, let's first study the issues with loading data into Hadoop.

Issues with Data Load into Hadoop

Analytical processing using Hadoop requires loading of huge amounts of data from diverse sources into Hadoop clusters.

This process of bulk data load into Hadoop from heterogeneous sources, and then processing it, comes with a certain set of challenges.

Maintaining data consistency and ensuring efficient utilization of resources are some of the factors to consider before selecting the right approach for data load.

Major Issues:

1. Data load using Scripts

The traditional approach of using scripts to load data is not suitable for bulk data load into Hadoop; it is inefficient and very time-consuming.

2. Direct access to external data via Map-Reduce application

Providing map-reduce applications with direct access to data residing in external systems (without loading it into Hadoop) complicates these applications. So, this approach is not feasible.

3. In addition to its ability to work with enormous amounts of data, Hadoop can work with data in several different forms. To load such heterogeneous data into Hadoop, different tools have been developed. Sqoop and Flume are two such data loading tools.

Introduction to SQOOP

Apache Sqoop (SQL-to-Hadoop) is designed to support bulk import of data into HDFS from structured data stores such as relational databases, enterprise data warehouses, and NoSQL systems. Sqoop is based upon a connector architecture which supports plugins to provide connectivity to new external systems.

An example use case of Sqoop is an enterprise that runs a nightly Sqoop import to load the day's data from a production transactional RDBMS into a Hive data warehouse for further analysis.

Sqoop Connectors

All existing Database Management Systems are designed with the SQL standard in mind. However, each DBMS differs to some extent in its dialect, and this difference poses challenges when it comes to data transfer across systems. Sqoop connectors are components which help overcome these challenges.

Data transfer between Sqoop and external storage system is made possible with the help of Sqoop’s connectors.

Sqoop has connectors for working with a range of popular relational databases, including MySQL, PostgreSQL, Oracle, SQL Server, and DB2. Each of these connectors knows how to interact with its associated DBMS. There is also a generic JDBC connector for connecting to any database that supports Java’s JDBC protocol. In addition, Sqoop provides optimized MySQL and PostgreSQL connectors that use database-specific APIs to perform bulk transfers efficiently.

In addition to this, Sqoop has various third-party connectors for data stores, ranging from enterprise data warehouses (including Netezza, Teradata, and Oracle) to NoSQL stores (such as Couchbase). However, these connectors do not come with the Sqoop bundle; they need to be downloaded separately and can easily be added to an existing Sqoop installation.

Introduction to FLUME

Apache Flume is a system used for moving massive quantities of streaming data into HDFS. Collecting log data present in log files from web servers and aggregating it in HDFS for analysis, is one common example use case of Flume.

Flume supports multiple sources, such as:

  • ‘tail’ (which pipes data from a local file and writes it into HDFS via Flume, similar to the Unix command ‘tail’)
  • System logs
  • Apache log4j (which enables Java applications to write events to files in HDFS via Flume)

Data Flow in Flume

A Flume agent is a JVM process with three components – Flume Source, Flume Channel and Flume Sink – through which events propagate after being initiated at an external source.

  1. The events generated by an external source (such as a web server) are consumed by the Flume Source. The external source sends events to the Flume source in a format that is recognized by the target source (a Java sketch of this handoff follows this list).
  2. The Flume Source receives an event and stores it into one or more channels. The channel acts as a store which keeps the event until it is consumed by the Flume sink. This channel may use the local file system in order to store these events.
  3. The Flume sink removes the event from the channel and stores it in an external repository such as HDFS. Alternatively, there could be multiple Flume agents, in which case the Flume sink forwards the event to the Flume source of the next Flume agent in the flow.
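
To make step 1 concrete, the sketch below shows how an external Java application could hand an event to a Flume agent using Flume's client SDK. It assumes the agent is already running with an Avro source listening on the host and port used here; the host name, port and payload string are made-up values for illustration only.

import java.nio.charset.StandardCharsets;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class FlumeEventSender {
    public static void main(String[] args) throws EventDeliveryException {
        // Connect to a Flume agent whose Avro source listens on this (hypothetical) host and port.
        RpcClient client = RpcClientFactory.getDefaultInstance("flume-agent.example.com", 41414);
        try {
            // Build an event from a raw byte payload and send it to the Flume source.
            Event event = EventBuilder.withBody("sample log line", StandardCharsets.UTF_8);
            client.append(event);
        } finally {
            client.close();
        }
    }
}

From there, the event travels through the channel to the sink exactly as described in steps 2 and 3.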

Some Important features of FLUME

  • Flume has a flexible design based upon streaming data flows. It is fault tolerant and robust, with multiple failover and recovery mechanisms. Flume offers different levels of reliability, including ‘best-effort delivery’ and ‘end-to-end delivery’. Best-effort delivery does not tolerate any Flume node failure, whereas ‘end-to-end delivery’ mode guarantees delivery even in the event of multiple node failures.
  • Flume carries data between sources and sinks. This gathering of data can either be scheduled or event-driven. Flume has its own query processing engine which makes it easy to transform each new batch of data before it is moved to the intended sink.
  • Possible Flume sinks include HDFS and HBase. Flume can also be used to transport event data including, but not limited to, network traffic data, data generated by social-media websites, and email messages.

Since July 2012, Flume has been released as Flume NG (New Generation), as it differs significantly from its original release, known as Flume OG (Original Generation).

Sqoop vs. Flume vs. HDFS

  • Purpose: Sqoop is used for importing data from structured data sources such as an RDBMS. Flume is used for moving bulk streaming data into HDFS. HDFS is the distributed file system used by the Hadoop ecosystem to store data.
  • Architecture: Sqoop has a connector-based architecture; connectors know how to connect to the respective data source and fetch the data. Flume has an agent-based architecture; the code written to fetch data is called an ‘agent’. HDFS has a distributed architecture in which data is distributed across multiple data nodes.
  • Role of HDFS: For Sqoop, HDFS is the destination for data import. In Flume, data flows to HDFS through zero or more channels. HDFS itself is the final destination for data storage.
  • Data load: Sqoop data load is not event driven. Flume data load can be driven by events. HDFS simply stores whatever data is delivered to it.
  • When to use: To import data from structured data sources, use Sqoop, because its connectors know how to interact with structured data sources and fetch data from them. To load streaming data, such as tweets generated on Twitter or the log files of a web server, use Flume; Flume agents are built for fetching streaming data. HDFS has its own built-in shell commands to store data into it, but HDFS cannot itself import streaming data.