Running Spark on Linux is one thing; running Spark on Windows is a very different experience. The last few days have been frustrating for me: I had been trying hard to set up Apache Spark on my desktop and run a very simple example, and it finally worked today. In this post I document my experience and how anyone else can avoid these problems.
First let me explain my environment:
OS: Windows 7 64 Bit
Processor: i5
RAM: 8 GB
Based on a project requirement I wanted to test, I chose the following version of Spark, which I downloaded from the Spark website:
spark-1.6.0-bin-hadoop2.6.tgz
As a pre-requisite I had the following version of Oracle Java, with JAVA_HOME set up appropriately:
java version "1.8.0_25"
I use a batch script for the setup, which is very handy:
jdk1.8.bat
@echo off
echo Setting JAVA_HOME
set JAVA_HOME=C:\jdk1.8.0_25-windows\java-windows
echo setting PATH
set PATH=%JAVA_HOME%\bin;%PATH%
echo Display java version
java -version
And then I set up Scala & SBT, which I downloaded from the following links:
scala version 2.11.0-M8
sbt 0.13.13
I downloaded winutils.exe based on the advice of this Stack Overflow answer:
http://stackoverflow.com/questions/25481325/how-to-set-up-spark-on-windows
winutils.exe link
I then set up the necessary access for c:\tmp\hive based on advice from this blog.
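For reference, the permission fix those posts describe is applied with winutils itself. A minimal sketch, assuming winutils.exe sits in the Hadoop bin directory on the PATH and that Spark's scratch directory is c:\tmp\hive:
>winutils.exe chmod -R 777 \tmp\hive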
Then I created a batch script to set it all up:
envscala.bat
@echo off
REM set SPARK & Scala related Dirs
set USERNAME=pridash4
set HADOOP_HOME=c:\rcs\hadoop-2.6.5
set SCALA_HOME=C:\scala-2.11.0-M8\scala-2.11.0-M8
set SPARK_HOME=C:\spark-1.6.0-bin-hadoop2.6
set SBT_HOME=C:\sbt-launcher-packaging-0.13.13
REM Put the Hadoop, Scala, SBT, and Spark bin directories on the PATH
set PATH=%HADOOP_HOME%\bin;%SCALA_HOME%\bin;%SBT_HOME%\bin;%SPARK_HOME%\bin;%PATH%
Then I ran the following commands:
>jdk1.8.bat
>envscala.bat
>spark-shell.bat
Everything started, but then it all stopped at one error:
- The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
This wasted almost a full day; despite trying all the steps above, I still got the error. As far as I can tell, spark-shell hits this because Hive checks that its scratch directory /tmp/hive is world-writable, and on Windows those permissions can only be changed through Hadoop's own tooling. Then I re-read and found this Stack Overflow post, http://stackoverflow.com/questions/40409838/the-root-scratch-dir-tmp-hive-on-hdfs-should-be-writable-current-permissions, which gave me the idea to install the Hadoop binaries themselves and run the command below.
hadoop fs -chmod -R 777 /tmp/hive/
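To verify the change took effect, listing /tmp should show the hive directory with drwxrwxrwx permissions (assuming the hadoop command is on the PATH):
>hadoop fs -ls /tmp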
Thus started my new adventure of installing Hadoop 2.6, based on the Apache documentation below:
https://wiki.apache.org/hadoop/Hadoop2OnWindows
I downloaded the binaries from the Apache website, extracted them, and copied winutils.exe into the Hadoop bin directory. I ran the above hadoop command, but when I ran spark-shell again I started getting new errors. After a lot of searching I went back to the Hadoop 2.6 binaries below and installed the Microsoft Visual C++ 2010 Redistributable Package (x86) so that winutils.exe could bind to the correct Microsoft DLLs. Then I re-ran the steps from the Apache documentation above.
Though Hadoop itself did not start, spark-shell did, and I was able to use it.
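As a quick sanity check (my own minimal example, not from any particular tutorial), a small RDD computation in spark-shell confirms that the SparkContext is working:
scala> val numbers = sc.parallelize(1 to 100) // sc is the SparkContext that spark-shell creates
scala> numbers.sum() // should return 5050.0, the sum of 1 through 100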
I know this is not as detailed or as concrete a write-up as one would expect, but it was helpful in that I did not have to rebuild Spark & Hadoop from scratch for my system.
Hoping this will be of help to others. Bye, and have a great day!