Wednesday 19 November 2014

R for processing JMeter output CSV files

So, I'm working as a performance engineer, and I run a lot of tests in JMeter. Of course most of those tests I run in non-GUI mode. To get results out of non-GUI mode there are two basic ways:
  1. Configure Aggregate report to save results to a file. Afterwards load that file to a GUI JMeter to see aggregated results.
  2. Use a special JMeter plugin to save aggregated results to a database.
I already wrote about the second way, so today I'll write about the first one. There are probably better ways to do it, but until recently this was how I processed results:
  1. Run a test, have it save results to a file.
  2. Open that file in Agregate Report component in GUI JMeter to get aggregated results.
  3. Click "Save Table Data" to get new csv with aggregated results.
  4. Edit that new CSV to get rid of samplers I am not interested in (mostly the ones that I didn't bother to name - e.g. separate URLs that compose a page), and to also get rid of the columns I am not interested in.
  5. Sort the data in the CSV by Sampler - this is because I run many tests, and I need to compare results between runs. For that reason I create a spreadsheet and copy response times and throughput data to that spreadsheet, adding more and more columns for the table with rows labeled as samplers. Whatever, works for me.
  6. Copy results from csv to a big spreadsheet and graph the results.
At some point I used macros and regexps in Notepad++ to do stage 4. Then my laptop died and I lost it, couldn't be bothered to write it again, even though it was big help. Still, even with the macro there were a lot of manual steps just to get to meaningful results.

But hey, guess what, I've been learning stuff recently - in particular Data Science and programming in R. So I used little I know and created this little script in R to do steps 2-5 above for me.

Now all I have to do is to place JMeter output files in a folder, start R Studio (which is a free tool, and I have it anyway) (you can probably do it with pure R, no need in R Studio even), set working directory to the folder with files and run the script. Script goes through all csv files in the folder (or you can setup whatever filenames list you want in the script), and for each file:
  • Calculates Median, 90% Line and Throughput per minute for each sampler
  • Removes the samplers starting with "/" - i.e. samplers I didn't bother to give proper names, so I am probably not very interested in their individual results.
  • Removes delays (that's the thing with our scripts - we use Debug sampler usually named as "User Delay" to set up a realistic load model).
  • Orders results by sample name.
  • Saves results to a separate file.
One button and voilĂ  - all the processing done. Now I only need to copy data to my big spreadsheet and graph it as I choose.

Script is in githubfeel free to grab and use.