author    | Santo Cariotti <santo@dcariotti.me> | 2025-01-26 15:23:46 +0100
committer | Santo Cariotti <santo@dcariotti.me> | 2025-01-26 15:23:46 +0100
commit    | 344d250b74ab667687ffe5c820114eb4deea871a (patch)
tree      | 537bc483f9415ea88322345fc0fd1c67311d682a
parent    | f1d310658f8f8d7b1c1c7cf802cb98a451a61ed1 (diff)
Add readme for weak-scaling
-rw-r--r-- | README.md | 27
1 file changed, 27 insertions(+), 0 deletions(-)
@@ -149,3 +149,30 @@ $ for JOB in `gcloud dataproc jobs list --region="${REGION}" --format="table(ref
     gcloud dataproc jobs delete --region="${REGION}" $JOB --quiet; \
   done
 ```
+
+### Test weak scaling efficiency
+
+A good way to test weak scaling is to grow the input n-fold along with the
+number of nodes. For instance, for 2 nodes we can use a doubled copy of the
+exam's input file.
+
+```
+$ cat order_products.csv order_products.csv >> order_products_twice.csv
+$ ls -l
+.rw-r--r-- santo santo 417 MB Fri Nov 22 12:43:07 2024 📄 order_products.csv
+.rw-r--r-- santo santo 834 MB Mon Jan 13 15:12:13 2025 📄 order_products_twice.csv
+$ wc -l *.csv
+  32434489 order_products.csv
+  64868978 order_products_twice.csv
+$ egrep -n "^2,33120" order_products_twice.csv
+1:2,33120
+32434490:2,33120
+$ scripts/00-create-service-account.sh; \
+  scripts/01-create-bucket.sh ./order_products_twice.csv; \
+  scripts/02-dataproc-copy-jar.sh; \
+  scripts/03-update-network-for-dataproc.sh; \
+  scripts/04-dataproc-create-cluster.sh 2 n1-standard-4 n1-standard-4; \
+  scripts/05-dataproc-submit.sh 200
+```
+
+The resulting runtime is what we obtain using 2 work units on 2 nodes, giving
+the weak-scaling efficiency $W(2) = \frac{T_1}{T_2}$, where $T_1$ is the
+runtime of 1 work unit on 1 node and $T_2$ that of 2 work units on 2 nodes.
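The efficiency formula above is simple enough to compute by hand, but a small sketch makes the calculation explicit. The timings below are placeholders, not measurements from this repository; only the formula $W(N) = T_1/T_N$ comes from the README.

```python
# Weak scaling: workload grows proportionally to the node count, so the
# ideal runtime stays constant and efficiency is W(N) = T_1 / T_N.
def weak_scaling_efficiency(t1: float, tn: float) -> float:
    """Return W(N) = T1 / TN for runtimes in seconds."""
    if tn <= 0:
        raise ValueError("runtime must be positive")
    return t1 / tn

# Hypothetical runtimes (seconds) -- placeholders for illustration only.
t1 = 300.0  # 1 work unit on 1 node
t2 = 330.0  # 2 work units on 2 nodes
print(f"W(2) = {weak_scaling_efficiency(t1, t2):.2f}")  # → 0.91
```

A value close to 1.0 means the cluster absorbs the doubled input with no slowdown; values well below 1.0 indicate overhead from coordination or data shuffling.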