I am troubleshooting query performance issues when running large queries against an AWS Redshift cluster. The majority of the time is spent transferring the data over the network and processing the data in the application. I am not concerned with the amount of time it takes the Redshift cluster to execute the query and start returning data.
I am trying to devise a test where I can measure and quantify the maximum throughput from the Redshift cluster over the AWS network to the EC2 instance I'm testing on. I want to discard the result set as quickly as possible so as not to add any additional overhead.
The end result will be to compare the maximum throughput of the cluster and network against the performance I am seeing in the application processing the data. Ultimately, this will allow me to better understand where my bottlenecks are and what maximum throughput I can realistically target from Redshift.
I already know that the end-to-end processing speed of the data (33 MB/s) seems well below the potential of the network and Redshift. I want to quantify the upper limit of that throughput.
Is there a mechanism or tool for doing this? I can use either Windows or Linux. I would prefer an ODBC connection to closely simulate the application and its existing driver, but I'm open to any other method that measures what I'm asking about.
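For illustration, something like the following sketch is what I have in mind: stream the entire result set over ODBC, discard the rows, and time only fetch plus transfer. This uses pyodbc; the DSN name `redshift` and the table `big_table` are placeholders for my actual setup, and `sys.getsizeof` is only a rough proxy for bytes on the wire.

```python
import sys
import time


def mb_per_sec(total_bytes: int, elapsed_s: float) -> float:
    """Convert a byte count and a duration into MB/s."""
    return total_bytes / elapsed_s / 1e6


def drain(cursor, batch: int = 10_000) -> int:
    """Fetch every row in batches, tally an approximate payload size,
    and keep nothing, so client-side processing overhead stays minimal."""
    total = 0
    while True:
        rows = cursor.fetchmany(batch)
        if not rows:
            return total
        # getsizeof per field is only a rough stand-in for wire bytes
        total += sum(sum(sys.getsizeof(col) for col in row) for row in rows)


if __name__ == "__main__":
    import pyodbc  # only needed for the live run, not the helpers above

    conn = pyodbc.connect("DSN=redshift")  # placeholder DSN
    cur = conn.cursor()
    start = time.perf_counter()
    cur.execute("SELECT * FROM big_table")  # placeholder table
    nbytes = drain(cur)
    elapsed = time.perf_counter() - start
    print(f"{mb_per_sec(nbytes, elapsed):.1f} MB/s over {elapsed:.1f}s")
```

This still measures through the ODBC driver, so it would give an upper bound for the driver the application actually uses rather than the raw network ceiling.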
I've considered JMeter, but I'm not sure it's the most direct and simplest approach. I also tried the SQL Server Import and Export Wizard, but it's painfully slow, clearly not pushing the network or the cluster, and it adds the overhead of saving the data to disk.