!pip install pyspark
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
In [2]:
import pyspark
from pyspark.rdd import RDD
from pyspark.sql import Row
from pyspark.sql import DataFrame
from pyspark.sql import SparkSession
from pyspark.sql import SQLContext
from pyspark.sql import functions
from pyspark.sql.functions import (lit, desc, col, size, array_contains,
                                   isnan, udf, hour, array_min, array_max,
                                   countDistinct, mean, split, regexp_extract, when)
from pyspark.sql.types import *
from pyspark.ml import Pipeline
For this exercise, I will use the purchases data. Let us take a look at this data using the Unix head command. We can run Unix commands in a Python Jupyter notebook by putting ! in front of the command.
In [3]:
!head -1 purchases.csv
12-29 11:06 Fort Wayne Sporting Goods 199.82 Cash
First, we need to create a Spark session by calling SparkSession. This step is necessary before doing anything else.
In [4]:
from pyspark.sql import SparkSession
from pyspark.sql.types import *

# create a Spark session, the entry point for the DataFrame API
spark = SparkSession.builder.appName("purchases").getOrCreate()