# My test SparkR program - mySparkR.R
require(SparkR)
# this does not work since I don't have a cluster setup
# sc <- sparkR.init(master="spark://david-centos6:7077", sparkEnvir=list(spark.executor.memory="1g"))
sc <- sparkR.init(master="local[2]", sparkEnvir=list(spark.executor.memory="1g"))
lines <- textFile(sc, "hdfs://david-centos6:8020/user/david/data/result.txt")
# split each line into individual words
words <- flatMap(lines,
                 function(line) {
                   strsplit(line, " ")[[1]]
                 })
# map each word to a (word, 1) pair, then sum the counts per word
wordCount <- lapply(words, function(word) { list(word, 1L) })
counts <- reduceByKey(wordCount, "+", 2L)
output <- collect(counts)
# print each word and its count
for (wordcount in output) {
  cat(wordcount[[1]], ": ", wordcount[[2]], "\n")
}
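The script reads its input from HDFS. If you only have a plain local setup, textFile should also accept an ordinary local path, so a minimal sketch of the read step would be (the path is just a placeholder, adjust it to wherever your copy of result.txt lives):
# lines <- textFile(sc, "file:///home/david/result.txt")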
# Put the input file into HDFS
> hadoop fs -put result.txt data
# Run SparkR to test it
> ./sparkR examples/mySparkR.R
# Or, to increase the memory used by the driver, run:
> SPARK_MEM=1g ./sparkR examples/mySparkR.R
NOTE: SparkUI is at http://david-centos6:4040
reference: http://stackoverflow.com/questions/21677142/running-a-job-on-spark-0-9-0-throws-error
An ERROR you may sometimes run into:
"Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory"
A simple way to test whether you have memory or other problems is to use the Spark shell. Run
MASTER="local[2]" spark-shell
on the same machine where you are trying to run the code, then run a trivial job in the Spark console: sc.parallelize(1 to 100).count
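If you would rather do the same sanity check from SparkR instead of the Scala shell, a rough equivalent (assuming the sc created with sparkR.init above) is:
rdd <- parallelize(sc, 1:100)
count(rdd)   # should print 100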
If the insufficient-memory problem persists, you might want to try adding SPARK_WORKER_MEMORY=2g to tools/spark-0.9.0-incubating-bin-hadoop2/conf/spark-env.sh (not sure yet whether this helps).