Add output check on readme

author: Santo Cariotti <santo@dcariotti.me> 2025-01-02 11:50:02 +0100
committer: Santo Cariotti <santo@dcariotti.me> 2025-01-02 11:50:02 +0100
commit: f4682cd0f7f54d6bcd57913a347d873e2d394d4e (patch)
tree: 7c21736e0a5003c354a33ad87a6cc9f65a7a55dc
parent: d6ee6dae981d99f339f80ed6e65deea91eeff2ce (diff)
1 files changed, 21 insertions, 0 deletions
diff --git a/README.md b/README.md
index b31bfbd..407bd48 100644
--- a/README.md
+++ b/README.md
@@ -67,3 +67,24 @@ To test on Google Cloud, execute the following shell scripts in the given order:
 
 `04-dataproc-create-cluster.sh` and `06-dataproc-update-cluster.sh` accept one
 argument: the workers number. It can be 1, 2, 3 or 4.
+
+Using `06-dataproc-update-cluster.sh` is not recommended if you want to test
+with another machine type. Instead, it is better to run:
+
+```
+gcloud dataproc clusters delete ${CLUSTER} --region=${REGION}
+```
+
+Then run `scripts/04-dataproc-create-cluster.sh` and `scripts/05-dataproc-submit.sh` again.
+
+If you want to check the output on your local machine, execute:
+
+```
+gsutil -m cp -r "gs://${BUCKET_NAME}/output" .
+```
+
+After downloading the data, you can find the maximum counter value using:
+
+```
+cat part-000* | cut -d ',' -f 3 | awk '{if($1>max){max=$1}} END{print max}'
+```
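The max-counter pipeline added above could also be wrapped in a small helper. This is only a sketch and not part of the commit: the function name `max_counter`, the default `output` directory, and the broader `part-*` glob are assumptions, as is the idea that every line is plain comma-separated text with a numeric third field. It differs from the one-liner in that the first row seeds the maximum, so inputs containing only negative values would still print a result.

```shell
#!/bin/sh
# Hypothetical helper (not in the repository): print the largest value found
# in the third comma-separated column across all part files in a directory.
max_counter() {
    dir="${1:-output}"   # assumed default: the directory created by gsutil cp
    # "$3 + 0" forces a numeric comparison; the first row seeds max.
    cat "$dir"/part-* | awk -F ',' '
        NR == 1 || $3 + 0 > max { max = $3 + 0 }
        END { if (NR > 0) print max }
    '
}
```

After sourcing the function, it would be called as `max_counter output` from the directory where the `gsutil` download was run.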