Tuesday, April 1, 2014

Spark Stream

http://ampcamp.berkeley.edu/wp-content/uploads/2013/07/Spark-Streaming-AMPCamp-3.pptx
http://spark.incubator.apache.org/docs/latest/streaming-programming-guide.html

Unifying Batch and Stream Processing Models
- Spark program on a Twitter log file, using RDDs
val tweets = sc.textFile("hdfs://...")
val hashTags = tweets.flatMap(status => getTags(status))  // getTags: user-defined hashtag extractor
hashTags.saveAsTextFile("hdfs://...")
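- Spark Streaming program on a Twitter stream, using DStreams
// needs the spark-streaming-twitter artifact: import org.apache.spark.streaming.twitter.TwitterUtils
val tweets = TwitterUtils.createStream(ssc, None)
val hashTags = tweets.flatMap(status => getTags(status))
hashTags.saveAsTextFiles("hdfs://...")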
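
The two snippets share the same flatMap logic because a DStream is processed as a series of small batches, each of which is an RDD. The sketch below models that idea in plain Scala without Spark, so it is an illustration rather than Spark's API: a Seq stands in for one RDD, and a Seq of Seqs stands in for a DStream's micro-batches. BatchStreamModel, batchJob, streamJob, MicroBatches and this getTags body are hypothetical names chosen for illustration; the post does not show the real getTags.

```scala
// Plain-Scala model of "unified batch and stream processing" (not Spark code).
object BatchStreamModel {
  type Batch[A] = Seq[A]                  // stands in for RDD[A]: one data set
  type MicroBatches[A] = Seq[Batch[A]]    // stands in for DStream[A]: one Batch per interval

  // Hypothetical getTags: whitespace-separated tokens that start with '#'.
  def getTags(status: String): Seq[String] =
    status.split("\\s+").toSeq.filter(t => t.startsWith("#") && t.length > 1)

  // Batch job: a single flatMap over the whole data set (mirrors tweets.flatMap on an RDD).
  def batchJob(tweets: Batch[String]): Batch[String] =
    tweets.flatMap(getTags)

  // Streaming job: the very same batch function applied to every micro-batch,
  // which is how a DStream operation is defined in terms of RDD operations.
  def streamJob(tweets: MicroBatches[String]): MicroBatches[String] =
    tweets.map(batchJob)
}
```

In Spark itself, the closest real counterpart is DStream.transform, which applies an RDD => RDD function to each batch, so a batch function written once can be reused on a stream.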
