Author's Latest Posts


Spark On AWS Graviton2 Best Practices: K-Means Clustering Case Study


This report focuses on how to tune a Spark application to run on a cluster of instances. We define the cluster and Spark parameters and explain how to configure them for a given set of resources. Using a K-Means machine learning algorithm as a case study, we analyze and tune these parameters to achieve the required performance while making optimal use of the available resources. W...