

Welcome. In this tutorial we will explore the Spark environment, install it on Windows 10, and run some tests with Apache Spark to see what the framework offers and learn how to use it. The code for this lab is written in Java and in Scala, which for our purposes is much lighter than Java. Do not worry if you do not know Scala: we will use only very simple features of the language, and basic knowledge of functional languages is all you need. If that is not enough, Google is your friend.



Spark needs the Java Development Kit (JDK) to run locally, so installing it is the first step described below.
Installing Scala is optional.

Note: These instructions apply to Windows. If you use a different operating system, you will have to adapt the system variables and directory paths to your environment.

Installing the JDK

Download the JDK from Oracle's site. Version 1.7 is the one used in the steps below; version 1.8 also works with Spark 2.0.1. Check the installation from the bin directory under the JDK directory by typing java -version. If the installation is correct, this command displays the version of Java installed.

Add the JAVA_HOME variable to the system environment variables with the value: C:\Program Files\Java\jdk1.7.x

Add the following value to the system PATH variable: C:\Program Files\Java\jdk1.7.x\bin

Test from cmd by running java -version.

Installing Scala

Download Scala from: http://downloads.lightbend.com/scala/2.11.8/scala-2.11.8.msi

Define the following system environment variables:

Variable: SCALA_HOME Value: C:\Program Files (x86)\scala
Variable: PATH (system) Value: C:\Program Files (x86)\scala\bin


Check the installation from cmd by running scala.

Download the latest version from the Spark website. The most recent version at the time of this writing is 2.0.1. You can also select a package built for a specific version of Hadoop. I downloaded Spark for Hadoop 2.7, and the file name is spark-2.0.1-bin-hadoop2.7.tgz. Unzip the file to a local directory, such as D:\spark.


Then download the Windows utilities (winutils.exe) from the GitHub repo https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1 and paste the file into D:\spark\spark-2.0.1-bin-hadoop2.7\bin. Then add the following system environment variables.

Variable: HADOOP_HOME Value: D:\spark\spark-2.0.1-bin-hadoop2.7
Variable: SPARK_HOME Value: D:\spark\spark-2.0.1-bin-hadoop2.7 (the installation root, not its bin subdirectory)
Variable: PATH (system) Value: D:\spark\spark-2.0.1-bin-hadoop2.7\bin
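
The variables above can be sanity-checked before launching Spark. The following is a minimal sketch in plain Java, assuming the directory layout used in this tutorial (winutils.exe placed in the bin folder under HADOOP_HOME). The class name SparkEnvCheck and the method findProblems are hypothetical helpers written for this illustration; they are not part of Spark or Hadoop.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class SparkEnvCheck {

    // Returns a list of human-readable problems. An empty list means the
    // variables look consistent with the layout assumed in this tutorial.
    // The "exists" predicate is injected so the check can be tried without
    // touching the real file system.
    static List<String> findProblems(Map<String, String> env, Predicate<Path> exists) {
        List<String> problems = new ArrayList<>();

        // Variables this tutorial asks you to define (SCALA_HOME is optional).
        for (String name : new String[] {"JAVA_HOME", "HADOOP_HOME", "SPARK_HOME"}) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                problems.add(name + " is not set");
            }
        }

        // Assumption from this tutorial: winutils.exe was pasted into <HADOOP_HOME>\bin.
        String hadoopHome = env.get("HADOOP_HOME");
        if (hadoopHome != null && !hadoopHome.trim().isEmpty()) {
            Path winutils = Paths.get(hadoopHome, "bin", "winutils.exe");
            if (!exists.test(winutils)) {
                problems.add("winutils.exe not found at " + winutils);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = findProblems(System.getenv(), Files::exists);
        if (problems.isEmpty()) {
            System.out.println("Environment looks ready for Spark.");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

Compile and run it from a freshly opened cmd window (javac SparkEnvCheck.java, then java SparkEnvCheck), since windows opened before the variables were changed do not see the new values.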

To verify the Spark installation, change to the Spark directory and start the shell by running bin\spark-shell.

Once this step is done, you can exit the shell with :quit.


Scala application via the shell

Once Spark is installed and running, you can use its API to run analysis queries. Simple commands are available to read data from a text file and process it. We will look at more advanced use cases in future articles in the series. Let's start by using the API to run the well-known word count example.

Move to the directory that contains your file, for example:

Open a Scala shell, then run the following Scala commands:

The cache function is called to keep the newly created RDD in the cache, so that Spark does not have to recompute it on each subsequent request. Note that cache() is a lazy operation: Spark does not store the data in memory immediately; this happens only when an action is invoked on the RDD. Now we can call the count function to see how many lines the text file contains: txtData.count()
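
The lazy behaviour described above can be illustrated without Spark. The following is a plain-Java analogy, not Spark's implementation: a hypothetical LazyCache wrapper (a name invented for this sketch) that computes nothing when it is created and stores its result only on first access, much as cache() only marks the RDD and the first action materializes it.

```java
import java.util.function.Supplier;

public class LazyCache<T> implements Supplier<T> {
    private final Supplier<T> source;   // the expensive computation
    private T value;                    // stored result, once computed
    private boolean computed = false;
    private int computations = 0;       // how many times the source actually ran

    public LazyCache(Supplier<T> source) {
        // Nothing is computed here: wrapping is cheap, like calling cache().
        this.source = source;
    }

    @Override
    public synchronized T get() {
        // The first call plays the role of the first action: it runs the
        // computation and keeps the result. Later calls reuse it.
        if (!computed) {
            value = source.get();
            computed = true;
            computations++;
        }
        return value;
    }

    public synchronized int computations() {
        return computations;
    }

    public static void main(String[] args) {
        LazyCache<Integer> lineCount = new LazyCache<>(() -> {
            System.out.println("computing...");
            return 3; // stand-in for an expensive line count
        });
        System.out.println("cache created, nothing computed yet");
        System.out.println(lineCount.get()); // triggers the computation
        System.out.println(lineCount.get()); // served from the stored value
    }
}
```

In the sketch, "computing..." is printed only once, on the first get() call, which mirrors why the first count on a cached RDD is slower than the ones that follow.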

The following commands perform the word count and display the count next to each word found in the file.
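
As a side illustration of what the word count computes, separate from the Spark shell commands themselves, here is a minimal sketch using plain Java streams rather than the Spark API. It follows the same split, group and count idea on an in-memory list of lines. The class LocalWordCount, the method countWords and the sample sentences are assumptions made for this example, as is the choice to split on single spaces.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class LocalWordCount {

    // Splits every line into words and counts how often each word occurs.
    // A TreeMap is used so the words come out in alphabetical order.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" "))) // one line -> many words
                .filter(word -> !word.isEmpty())                  // skip blanks from repeated spaces
                .collect(Collectors.groupingBy(
                        word -> word,                             // group identical words
                        TreeMap::new,
                        Collectors.counting()));                  // count each group
    }

    public static void main(String[] args) {
        // Sample input standing in for the lines of a text file.
        List<String> lines = Arrays.asList(
                "to be or not to be",
                "that is the question");
        countWords(lines).forEach((word, count) ->
                System.out.println(word + ": " + count));
    }
}
```

The difference from Spark is where the work happens: this sketch runs in one JVM over a list held in memory, while Spark distributes the same steps across partitions of an RDD.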

Other examples of using the API can be found on the Spark website, in the documentation.

Java application with Eclipse and Maven

To use the Spark API in Java, we will use Eclipse as the IDE together with Maven. Start by downloading Apache Maven 3.3.9 from http://apache.mivzakim.net/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.zip and extract it, for example, to D:\apache-maven-3.3.9.
Then add the following environment variables:

MAVEN_HOME value: D:\apache-maven-3.3.9
Path value: D:\apache-maven-3.3.9\bin

Check the installation from cmd, for example by running mvn -version.

Next comes the installation of the Eclipse IDE. We will use a lightweight edition of Eclipse Luna and then add Maven support to it.
Download Eclipse from https://eclipse.org/downloads/packages/eclipse-ide-java-developers/lunasr2
Extract the content of the archive into a directory and start Eclipse.
Now integrate Maven into Eclipse by installing the m2e plugin from the following update site:

http://download.eclipse.org/technology/m2e/releases

M2e plugin configuration

Make Eclipse use your Maven installation by clicking Add and choosing your Maven directory.

Now that the installation steps are complete, we will create our first Spark project in Java.

Open Eclipse and choose File => New Project => Maven Project.

Now add External JARs from D:\spark\spark-2.0.1-bin-hadoop2.7\jars (Spark 2.x keeps its JARs in the jars folder; the lib folder belongs to older 1.x releases).

Edit pom.xml and paste the following code:

Write your code or simply paste the following code:

Build the project: go to the directory where the project is stored and run mvn package from cmd.

Test your application with the following Spark command:

Result

In this article, we saw how the Apache Spark framework, with its standard API, helps us process and analyze data. Spark works with the same file storage system as Hadoop, so Spark and Hadoop can be used together where significant investments have already been made in Hadoop.

You can also combine Spark's different processing types, Spark SQL, Spark Machine Learning and Spark Streaming, as we shall see in future articles. Through its integration modes and adapters, Spark can be combined with other technologies. For example, you can use Spark, Kafka and Apache Cassandra together: Kafka for streaming incoming data, Spark for the processing, and the Cassandra NoSQL database for storing the results.


However, keep in mind that the Spark ecosystem is not yet fully mature and needs improvement in some areas, such as security and integration with BI tools.
