author | Santo Cariotti <santo@dcariotti.me> | 2025-01-02 11:50:02 +0100 |
---|---|---|
committer | Santo Cariotti <santo@dcariotti.me> | 2025-01-02 11:50:02 +0100 |
commit | f4682cd0f7f54d6bcd57913a347d873e2d394d4e | |
tree | 7c21736e0a5003c354a33ad87a6cc9f65a7a55dc | |
parent | d6ee6dae981d99f339f80ed6e65deea91eeff2ce |
Add output check on readme
-rw-r--r-- | README.md | 21 |
1 files changed, 21 insertions, 0 deletions
@@ -67,3 +67,24 @@ To test on Google Cloud, execute the following shell scripts in the given order:
 
 `04-dataproc-create-cluster.sh` and `06-dataproc-update-cluster.sh` accept one
 argument: the workers number. It can be 1, 2, 3 or 4.
+
+Using `06-dataproc-update-cluster.sh` is not recommended if you want to test
+with another machine type. Instead, is better to run:
+
+```
+gcloud dataproc clusters delete ${CLUSTER} --region=${REGION}
+```
+
+Then, run again `scripts/04-dataproc-create-cluster.sh` + `scripts/05-dataproc-submit.sh`.
+
+If you want to check the output on your local machine, execute:
+
+```
+gsutil -m cp -r "gs://${BUCKET_NAME}/output" .
+```
+
+And downloading the data, you can find the max counter using:
+
+```
+cat part-000* | cut -d ',' -f 3 | awk '{if($1>max){max=$1}} END{print max}'
+```
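
Note on the last command in the diff: the `cat | cut | awk` pipeline can be folded into a single `awk` call. The sketch below is one possible variant, not the author's method. It assumes, as the README command implies, that the counter is the third comma-separated field of each `part-000*` file. The function name `max_counter` is hypothetical and only used for illustration.

```shell
# Sketch: print the largest value found in the third comma-separated
# field across all given files. Assumes the field is numeric.
max_counter() {
    # "$@": one or more downloaded part files (e.g. output/part-000*)
    awk -F ',' '
        # Seed max from the first record, then keep the larger value.
        NR == 1 || $3 + 0 > max { max = $3 + 0 }
        # Print nothing at all when the input is empty.
        END { if (NR > 0) print max }
    ' "$@"
}

# Hypothetical usage after the gsutil download shown in the diff:
#   max_counter output/part-000*
```

Seeding `max` from the first record means the result is still correct if every value is negative, a case in which the original one-liner would print an empty line. For non-negative counters both forms should give the same answer.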