-rw-r--r-- README.md 21
1 files changed, 21 insertions, 0 deletions
diff --git a/README.md b/README.md
index b31bfbd..407bd48 100644
--- a/README.md
+++ b/README.md
@@ -67,3 +67,24 @@ To test on Google Cloud, execute the following shell scripts in the given order:
`04-dataproc-create-cluster.sh` and `06-dataproc-update-cluster.sh` accept one
argument: the workers number. It can be 1, 2, 3 or 4.
+
+Using `06-dataproc-update-cluster.sh` is not recommended if you want to test
+with another machine type. Instead, it is better to delete the cluster:
+
+```
+gcloud dataproc clusters delete ${CLUSTER} --region=${REGION}
+```
+
+Then run `scripts/04-dataproc-create-cluster.sh` again, followed by `scripts/05-dataproc-submit.sh`.
+
+If you want to check the output on your local machine, execute:
+
+```
+gsutil -m cp -r "gs://${BUCKET_NAME}/output" .
+```
+
+After downloading the data, you can find the maximum counter from inside `output/` using:
+
+```
+cat part-000* | cut -d ',' -f 3 | awk '{if($1>max){max=$1}} END{print max}'
+```