Using Hive and Presto on Amazon EMR

Introduction

In this post, I walk through a simple project running on Amazon EMR. I use the dataset “Baby Names from Social Security Card Applications In The US”, which covers 109 years (1910-2018). I transformed the data to make it compatible with this project and made it available on GitHub. I also converted the CSV files to Parquet format so that I could compare performance across the two formats.
