Saturday 8 June 2019

On neurodiversity in testing community

Disclaimer: this is quite a personal post for me, albeit on testing. It is quite possible that it will make some people uncomfortable, and while I hope it does not damage my career, I am aware that it might. I am compelled to write it anyway. The thing is, I am allergic to any kind of closet. I spent a good part of my twenties living in a closet about my sexuality in homophobic Russia, and those who've been through a similar experience know how toxic it is. Once I got out, I promised myself to never do that to myself again.

This is a very weird preface to a blog post about software testing, but I promise it will make sense if you bear with me.

I have been active in the testing community on and off, and those amongst you who know me at all probably do so from my participation in testing meetups and conferences in New Zealand in 2013-2015. After that, for a combination of reasons (moving away from functional testing after switching to performance testing, going on maternity leave, and turning off twitter to avoid all the political stuff that was giving me strong anxiety), I took a long break from the testing community. Mind you, it wasn't because I didn't like the people in it, quite the opposite: the future seemed bright, awesome things were happening at conferences and meetups, and I enjoyed the atmosphere very much. The last thing that was a biggie for me was a WeTest conference, where Aaron Hodder talked about his struggles with social anxiety and about making life easier for neurodiverse people, because of how valuable that diversity is for society in general and for the testing profession specifically. The audience was receptive, I certainly enjoyed the talk very much myself because I could relate to a lot of it, and I thought how wonderful it was that we were talking about these things out loud.

Fast forward to 2019, when I came back to twitter and quite suddenly discovered that the testing community was hating on James Bach for a reason unknown to me. People wouldn't even say "exploratory testing" much anymore; I remember reading a tweet along the lines of "don't throw the baby out with the bathwater", meaning don't throw away the idea of exploratory testing along with James Bach. To say I was puzzled would be an understatement: why would exploratory testing even need defending in this manner, and who would want to throw it away? I tried carefully asking questions about what happened, but the best I got was screenshots of some innocent-looking slides and mentions of bullying. While I usually side with the victim by default, in this case it just didn't make any sense to me. Blunt and honest is one thing; intentionally mean and a bully is another thing altogether, and it did not match the profile, so to say. To this day I have not been able to get any specifics on the bullying part, so I am left thinking that the slides and James' usual bluntness were misread. At the time I thought that was quite silly, and it was mind-boggling to me how a community of people who are supposed to be big on critical thinking and objectivity could come to far-fetched conclusions and declare someone a bully without a good reason. It also made me very sad, because I myself have been in situations where my words were wildly misinterpreted as attacks, or as having a hidden agenda.

Now, the purpose of this post is not to defend James Bach, as I am sure he is perfectly capable of that himself, and moreover I am way too late to that particular fight.
The purpose of this post is to be a reminder that there are all kinds of people in the testing community, with different communication approaches and different struggles. And it is simply not fair, nor is it realistic, to expect everyone to be perfectly sensitive and able to craft their phrasing to a non-offensive perfection. There will be miscommunications, such as what I understand happened with the slides James is still vilified for, and feelings will be hurt as long as we take every disagreement as a personal attack.

I know it can be very hard not to take it as a personal attack when someone talks to you less nicely than you would've preferred, especially if that someone has authority in your eyes. But I think we could all benefit from pausing before drawing conclusions, asking an honest clarifying question at least once, and then maybe trusting that the answer is not a lie.

Now back to the disclaimer. I am now, at the age of 33, discovering that I am quite possibly autistic. I am still waiting for confirmation from a second specialist, but it would certainly explain a lot. And whether I am autistic or not, there are definitely a lot of autistic people amongst us in the testing community (and in the population in general, for that matter). There are also, as Aaron said years ago, a lot of adults with various other types of brain wiring. The point is, we are a very diverse bunch. I can't speak for everyone, but I can say for myself that I am making a lot of adjustments to be successful in this world. I have been doing it for so long (at least since what you'd call middle school) that I didn't even question it until coming face to face with the possibility that I am autistic, but that does not mean the adjustments come easy, or that I am not paying the price in elevated levels of anxiety, exhaustion and depression. I am trying very hard to conform to what is expected of a software testing professional, which includes tolerating open spaces with their constant sensory overload and participating to the best of my abilities in non-essential social interactions to keep a team running smoothly. I smile and nod at jokes I don't understand, I keep my own remarks to myself where they are not strictly work-related, and I wrap whatever I have to say into layers of learned niceties instead of just stating facts as I would for myself. I also meditate, hide away on empty floors, stim in barely-visible ways, pretend to be super busy to discourage chit-chat when I cannot handle communication, and find myself quiet corners at conferences to take breathers. Still, sometimes I have panic attacks on the train home. All in all, it is extra work on top of actually, you know, doing my work, and I am not complaining; I am lucky to work in an area I love and am good at.

But it would certainly be nice if the community in return made some adjustments for people like me. For example, not misinterpreting our words (or lack thereof, in the case of non-participation in social rituals) as coldhearted attacks, and not insisting we are lying when we try to explain that our intentions were far from it. Give your fellow human the benefit of the doubt, maybe? We are worth it.

P.S.: To be absolutely clear, I am not trying to say that James Bach is autistic, I am merely using the snowball of misunderstanding around his slides as an example of people jumping to conclusions.
P.P.S.: Also linking to Aaron's excellent list of resources, which I found while trying to reference his talk of years ago in some way: Inclusive Collaboration Resources

Tuesday 26 September 2017

Automation in Performance Testing: resources

Sooo, I had an opportunity to speak at the Testing Automation Summit yesterday in Auckland. The event was organized and hosted by the Testing Mind, and they did a good job of it. I am certainly grateful to have had a chance to be a speaker once again - I haven't done it in a few years, and it was nice (although nerve-racking, hi, social anxiety) to be back.

As promised at the conference, here I am sharing some resources from the talk for those interested:

  1. Slides with notes and all in pptx format, or the same slides on speakerdeck.com.
  2. Datapot and Pritcel report, as well as a few other R scripts I regularly use in performance testing, are on github: https://github.com/hali/Performance-data-processing
Please let me know if there are any questions you didn't get a chance to ask at the conference!


Monday 12 June 2017

Concurrency graphs

Just last week we were discussing one of the JMeter test scripts in our team, and it occurred to me that it would be useful to see the concurrency of the requests, meaning which requests run in parallel. That should very roughly represent which threads on the server are doing the same thing at the same time, and it is interesting as another way to look at what's going on during the test.

I knew there wasn't such a chart in JMeter, because I had thought about it briefly a couple of times before and couldn't find it, but I was surprised to find that there wasn't any ready-to-use tool to make that kind of chart. The closest I found was Gantt charts, but they are not designed to deal with thousands of requests and hundreds of threads. So I set out to make my own tool. In R, of course.

Here’s how the result looks (generated from one of the internal tests, legend cut off not to disclose anything):
The chart above shows which requests are currently running on each of the concurrent JMeter threads. It gives a very high-level picture of what's going on with the concurrency (each request is shown in a different color, so you can assess the concurrency just by looking at it). It is kinda hard to read though, especially in a case like this, where we have a lot of requests, a long test duration and long user think times in between. So here's another way to look at the same data:
This chart shows the calculated concurrency level at different points of time (the points of time being request start times). Concurrency is calculated as the number of "overlaps" - i.e. if in the first graph you were to draw a vertical line and count the number of same-colored horizontal blocks intersecting that line, that would be the concurrency level for the corresponding request, and that value is displayed on the chart as (request_start_time, concurrency). The calculation is pretty rough; for more precision we'd want to calculate concurrency at regular time intervals, not just at request start times - but that means a longer running time and an even harder to read chart, so I decided I'm happy with what I've got here.
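To make the "overlaps" idea concrete, here is a rough Python sketch (the real scripts are in R; the function name and the (start, elapsed) input format are my own, and this variant counts all in-flight requests rather than only same-colored ones):

```python
# Rough sketch of the "overlaps" calculation: for each request's start
# time, count how many requests were in flight at that moment. Input
# rows are (start_ms, elapsed_ms) pairs, as in JMeter's timeStamp and
# elapsed columns. Quadratic, but fine for a sketch.

def concurrency_at_starts(samples):
    """Return (start, concurrency) pairs, where concurrency is the number
    of samples whose [start, start + elapsed) interval covers that start."""
    return [
        (start, sum(1 for s, e in samples if s <= start < s + e))
        for start, _ in samples
    ]

samples = [(0, 100), (50, 100), (120, 30)]
print(concurrency_at_starts(samples))  # [(0, 1), (50, 2), (120, 2)]
```

For the per-request variant from the chart, you would additionally filter the inner sum to samples with the same label.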
Of course, even this chart isn't easy to read, so if you want to drill into the data, I suggest running the R scripts in interactive mode and either looking at the text data, or charting only the specific request you are interested in. The code is there, and is hopefully easy to understand.
It's all good and nice to look at the data from the tests, but for it to be really useful, you want to know whether it resembles production. You can check that by producing similar graphs from production usage data, e.g. from access.log. Please note that access.log can be much noisier than JMeter output files, since in JMeter we get to name the samplers, whereas access.log will have each request with all the URL parameters included. So before graphing or calculating concurrency, you'll need to clean the data and filter out requests you are not interested in (e.g. you might want to remove static resources and/or remove dynamic user- or session-dependent parameters from other urls). An example of such filtering is in the script.
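As a sketch of that kind of cleaning (illustrative Python, not the actual script; the extension list and URLs are made up):

```python
# Clean access.log request paths before graphing: drop static resources
# and strip query strings, so dynamic URLs that differ only in user- or
# session-dependent parameters collapse into one name.
import re

STATIC = re.compile(r'\.(css|js|png|gif|jpg|ico)$')

def clean_request(url):
    path = url.split('?', 1)[0]   # strip query parameters
    if STATIC.search(path):
        return None               # not interested in static resources
    return path

urls = ["/app/report?session=abc123", "/static/logo.png", "/app/report?session=xyz"]
print([clean_request(u) for u in urls])  # ['/app/report', None, '/app/report']
```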

Feel free to grab the scripts, and please let me know in the comments if they were useful, or if you see some error in the calculations.

Monday 13 July 2015

Security ramble

All the recent hacks in which massive amounts of personal data have been exposed made me wonder whether, in time, the public perception of privacy and data security will change.
What I mean is, people nowadays seem very surprised and distressed whenever their data gets stolen, be it photos from iCloud, or SSN and address info from governmental databases, or PHI from health and insurance providers. It's almost like your average Joe or Jane does not expect it to ever happen... but data gets stolen all the time. And I don't see any reason for these hacks to stop in the near future.

Wouldn't it be more reasonable to assume that every information storage system will be hacked, and any data will be stolen? This assumption gives you the state of mind and the tools to concentrate on active monitoring and a mitigation plan, whereas nowadays it looks like people mostly concentrate on preventing the hack (some big hacks went unnoticed for many months!).

I would much prefer if people responsible for the systems where my personal data is stored:

  • Assumed they are gonna be hacked.
  • Made sure when it happens they will notice (automated smart monitoring systems).
  • Made sure it is complicated and/or expensive to use stolen data to harm me (block bank accounts, make it possible to cancel an ID easily, make it hard to make sense of my PHI without some key that is also easy to cancel/revoke, make sure devices that can physically harm me have inbuilt protection against that physical harm - e.g. it shouldn't be possible to program a heart pacemaker to murder its wearer).
  • Worked on making the attack expensive (we are gonna be hacked, but it will be an annoying, frustrating and expensive process for the hacker) and long (store unrelated data in different disconnected places, so that a separate hack is needed for each of the pieces).
And I myself am assuming my data can be stolen at any point, so I am trying to behave with that assumption in mind:
  • There are no private emails or photos that, if made public, would harm me - I do not put stuff that can harm me on the internet. I don't say shitty things about people behind their backs. I do not lie. Not that I naturally feel the need to do all that stuff, but assuming you can get exposed at any moment does provide additional motivation to refrain from being a dick.
  • My money is stored in different places, and my cards are not connected to my savings.
  • My most important email account is behind 2FA, and it is connected to my phone, so if it is compromised, I will notice, and I can block it fast.
  • And last but not least, I am mentally prepared that it can all fail me. If that happens, it will mess me up a bit and create some hassle to block/change/restore cards, accounts and IDs, but it will not be the end of the world.

Thursday 9 April 2015

Oracle troubles and findings: tuning experience

I am not an Oracle DBA. But with performance testing I more often than not end up creating the whole environment, which means setting up Oracle server as well. Since I am not testing Oracle server specifically, I usually only tune it enough for it to not be the bottleneck. Lucky for me Oracle server is actually pretty cool compared to applications I test, and only gets to be a bottleneck in scalability testing where it serves multiple application servers. Still... recently I've bumped into a set of correlated problems that led me to tuning effort on the Oracle server itself.
Sponsored by the Internet - meaning that everything I've done was googled on the internet and applied with fingers crossed.

So, here it goes.

The first problem was the following: when I went from 4 application server nodes to 6, response times skyrocketed. It was obviously an infrastructural problem, and after a while Oracle was the only suspect left. Unusually, CPU usage wasn't that high on the Oracle server, so I had to dig a bit deeper, and guess what I found: concurrency issues such as "cursor pin S on X"!

To avoid doing hard parses, Oracle puts any new query and its execution plan into a shared cursors tree. Access to that tree is controlled by a mutex pin algorithm. Only one session can grab a mutex pin for a specific cursor at a time. Also, similar queries are stored as leaves with a common root, and my understanding is that the whole root gets pinned during any updates in the tree...

Anyway, it was happening for two reasons:
1. Application under test was using queries with literals where it should've been using prepared statements.
2. My Oracle version (11.2.0.1) had known issues around the shared cursors tree.
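The first point is easy to see at the SQL-text level. Here's a toy Python illustration (not Oracle code; the table and column names are made up) of why literals defeat cursor sharing while bind variables don't:

```python
# Every distinct literal value produces a distinct SQL text, so the
# server must hard-parse and cache a cursor for each one. A bind
# variable keeps a single shared text; the value travels separately.
literal_queries = {f"SELECT * FROM orders WHERE id = {i}" for i in range(1000)}
bound_queries = {"SELECT * FROM orders WHERE id = :id" for i in range(1000)}

print(len(literal_queries), len(bound_queries))  # 1000 1
```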

So I updated to 11.2.0.4 and set CURSOR_SHARING=FORCE. What this option does is replace all literals in all queries with system bind variables, which effectively means that all the queries that differ only in literals are now treated as the same query: they share the same cursor, and the cursors tree doesn't need to be constantly updated. This took care of the concurrency issues. It also created another problem.

Suddenly one of my other queries which was never a problem before, a very simple and well behaved query, became a huge bottleneck. It would take thousands of CPU cycles to execute where before it was tens of cycles! This one took days of my time, numerous experiments that slightly improved the situation but didn't solve the main issue, and in the end I had to go to DBAs for help.

Turned out that the innocent "1=2" in that query (which was there because the query was dynamically generated with optional conditions) was replaced by something like ":SYS_0=:SYS_1", and that meant Oracle was grabbing those variables and evaluating the clause again and again for each row of a huge table (you would think it would do it once, see that it's FALSE and leave it at that - but no).
This was of course the result of CURSOR_SHARING=FORCE. I'll say in advance that I got exactly the same behaviour with CURSOR_SHARING=SIMILAR.

The suggested fix in my case was either to switch to prepared statements everywhere, so that we don't need to use CURSOR_SHARING=FORCE/SIMILAR, or to remove "1=2" from the query that suffered from that setting. Can't have both.
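A sketch of the second half of that fix - generating the optional conditions without a dummy seed predicate (illustrative Python; the query, table and bind names are made up):

```python
# Instead of seeding a dynamically generated WHERE clause with a dummy
# "1=2" literal (which CURSOR_SHARING=FORCE turns into a per-row
# bind-variable comparison), build the condition list and only emit a
# WHERE clause when there is something to put in it.
def build_query(conditions):
    if not conditions:
        return "SELECT * FROM items"
    return "SELECT * FROM items WHERE " + " OR ".join(conditions)

print(build_query([]))
print(build_query(["status = :status", "owner = :owner"]))
```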

Other useful tuning:

  • Increasing shared_pool_size.
  • Increasing session_cached_cursors.
  • Weirdly enough, locking statistics on selected columns helped.

Sunday 1 February 2015

HAProxy balancing https backends

Recently I needed to configure load balancing in my environment, where I needed to balance between a few https servers with sticky sessions enabled. I looked in the haproxy manual, I googled, I asked - and for days there was no making it work.

Most of the haproxy configuration examples out there are for the case where the client connects to haproxy via https, and haproxy then decrypts the traffic and balances requests between http backends. The few examples around https backends assumed that no sticky sessions were needed, so they all sit on top of tcp. To this day I have not found a guide or an example of how to configure what I needed, so once I figured out how to do it, I thought I'd share.

So the way you do it is:
0) You need haproxy 1.5+; earlier versions did not support https on their own.
1) A client connects to haproxy via https. There needs to be a certificate+private key combination (that the client would trust) on the haproxy server.
2) HAProxy decrypts the traffic and attaches a session cookie. If the cookie is already there, it knows where to send the request further.
3) HAProxy encrypts the traffic again before sending it to backend (where backend can decrypt it).
4) and the other way around.

And the configuration for that is:

  • For both backend and frontend you should have mode http.
  • In the bind line you need to add ssl cert <path to haproxy certificate + private key file>.
  • In the backend section you need to set load balancing algorithm - e.g. roundrobin or leastconn.
  • In the backend section you also need to set a cookie - e.g. cookie JSESSIONID insert indirect nocache.
  • For each server you need to say "ssl" after the ip, and then also set a cookie.
For me the one part I couldn't find in any guides was to put "ssl" in the server line (as well as in the bind line). I might have missed it somewhere in the not-so-helpful haproxy manual.

One thing I didn't go into was setting up proper certificates on the backend servers in my environment, because of course in a test environment they are self-signed and all that. To work around that, just add another global setting to the haproxy settings: ssl-server-verify none.

And here's the example of the config file frontend & backend sections to make it work:

frontend  main
    mode http
    bind :443 ssl crt /etc/haproxy/cert.pem
    default_backend app

#---------------------------------------------------------------------
# round robin balancing between the various backends
#---------------------------------------------------------------------
backend app
    mode http
    balance     roundrobin
    option httpchk GET /concerto/Ping
    cookie JSESSIONID insert indirect nocache
    server  app1 10.0.1.11:443 ssl check cookie app1
    server  app2 10.0.1.12:443 ssl check cookie app2
    server  app3 10.0.1.13:443 ssl check cookie app3
    server  app4 10.0.1.14:443 ssl check cookie app4

Monday 15 December 2014

We suck in security

And by “we” I mean humanity, at least the part of humanity that uses computers to create, store and share information. This is my main take from this year’s kiwicon 8. And this is the story of how I got there.
Disclaimer: I’m just gonna assume that presenters knew what they were talking about and use what they said shamelessly, and then give details in the further blog posts (or for now you can probably look at the posts of other attendees).
According to Rich Smith, Computer Security appears where Technology intersects with People. Makes sense. So let's look at the technology and at the people separately.

1. Technology.
Just at this two-day conference, vulnerabilities were demonstrated in the very software that we rely on to keep us safe: it turns out a well-established Cisco firewall and most antiviruses are pretty easy to hack (for people who do that professionally). That's like a wall with many holes. You get relaxed because wow, wall, and the next thing you know all your sheep are stolen.

Firewalls and antiviruses are software, but the problem runs deeper. Protocols and languages! The internet was not designed for security, and it looks like all the crutches we've built for it since don't help as much as you would hope. JavaScript with its modern capabilities is always a ticking bomb in your living room, and now there is also WebRTC, which was designed for peer-to-peer browser communications, and which helps tools like BeEF hide themselves. BeEF is stealthy as it is, so maybe it doesn't make an awful lot of difference, but you can see the potential: where earlier a BeEF server would control a bunch of browsers directly, now it can also make browsers control other browsers. Thank you, WebRTC. You would think that technologies arising these days would try to be secure by design, but oh well…

Wanna go even deeper? Ian "MCP" Latter presented a proof of concept for new protocols that allow transferring information through the screen and through a programmable keyboard. He also demonstrated how an exploitation framework built on top of these protocols allows a perpetrator to steal information, bypassing all the secure infrastructure around the target. The idea is that you are not passing files: you are showing temporary pictures on the monitor screen, you capture this stream of pictures, and you decipher the information from those pictures. Sounds like something out of "Chuck", yet this is a very real technology. As its creator said, "By the nature of the component protocols, TCXf remains undetected and unmitigated by existing enterprise security architectures."
Then there are also the internet-wide vulnerabilities that became known in the last year, which to me as a bystander sound mostly like this: a lot of people build their products on top of some third-party libraries. When those libraries get compromised, half of the internet is compromised. And they do get compromised.
Then there is also the encryption problem, where random numbers aren't really as random as the people using them think. But compared to all of the above, it sounds like the least of our problems.
Okay, so technology isn’t as secure as we want, what about people?

2. People
People are the weakest link. Forget technology: even if we were to make it perfect, people would still get their security compromised. And according to many speakers at kiwicon, so far the security area sucks at dealing with people. There is a widespread default blame culture: when someone falls victim to social engineering, they get blamed and fired. That is hardly how people learn, but it is exactly how you create an atmosphere in which no one will go to the security team when in doubt, for fear of getting fired. Moreover, we don't test people. We don't measure their "security", and we don't know how to train them so that the training sticks - because we don't know what works and what doesn't.
So, we have problems with technology and with people. What else is bad?

There are plenty of potential attackers out there. Governments, enforcement agencies, corporations, individuals with various goals from getting money to getting information to personal revenge… they have motivation, they have skills and tools, and it is so much cheaper to attack than it is to defend (so called “Asymmetric defence”).
To make it even easier for attackers, targets don't talk to each other. They don't share information about when they were attacked; to report such a thing is seen as compromising yourself. And even when information is willingly shared, we don't have good mechanisms to do it, so we do it the slowest way: manually. That might be easy enough in simple cases, but as @hypatia and @hashoctothorpe said, complex systems often mean complex problems, which makes them hard to describe and to share.

So, we suck in security. This is quite depressing. To make it a bit less depressing, let's talk about the solutions that were also presented at kiwicon (in some cases).

Most solutions were for the “People” part of the problem. Not one but three speakers talked about that.
The short answer is (in the words of Etsy's Rich Smith): ComSec should be Enabling, Transparent and Blameless.
The slightly longer answer is:
  • Build culture that encourages people to seek assistance from the Security specialists and to report breaches (don’t blame people, don’t try to fix people - fix the system).
  • Share information between departments and between organizations.
  • Proactive reach: the security team should reach out to development and help them build secure products.
  • Build trust.
  • Recognise that complex system will have complex problems.
  • Do realistic drills and training, measure the impact of training and adjust it.
People are reward-driven and trusting by default. What makes that a problem is that people are thus highly susceptible to the many methods of social engineering. This can't be fixed (do we even want it fixed?), but at least we can make it super easy to ask professionals for help without feeling threatened.

Okay, so situation in the People area can be improved (significantly if everyone were to follow Etsy’s culture guidelines) - at least for some organizations. What about the Technology area? Well… this is what I found in the presentations:
  • Use good random numbers.
  • Compartmentalize (don’t keep all eggs in one basket, don’t use flat networks, don’t give one user permissions to all servers, etc.).
  • Make it as expensive as possible for attackers to hack you: anti-kaizen for attackers, put bumps and huge rolling stones in their way, make it not worth the effort.
  • Know what you are doing (e.g. don’t just use third-party libraries for your product without verifying how secure they are).
  • …?
This is depressing, okay. In fact, I’m gonna stop here and let you feel how depressing it is. And then in the next posts I’ll write about more cheerful things. Kiwicon was really a lot of fun and epicness (I was in a room full of my childhood heroes, yeeey!). And there was a DeLorean. Doesn’t get much more fun than that. :-D

Wednesday 19 November 2014

R for processing JMeter output CSV files

So, I'm working as a performance engineer, and I run a lot of tests in JMeter. Of course most of those tests I run in non-GUI mode. To get results out of non-GUI mode there are two basic ways:
  1. Configure Aggregate report to save results to a file. Afterwards load that file to a GUI JMeter to see aggregated results.
  2. Use a special JMeter plugin to save aggregated results to a database.
I already wrote about the second way, so today I'll write about the first one. There are probably better ways to do it, but until recently this was how I processed results:
  1. Run a test, have it save results to a file.
  2. Open that file in the Aggregate Report component in GUI JMeter to get aggregated results.
  3. Click "Save Table Data" to get new csv with aggregated results.
  4. Edit that new CSV to get rid of samplers I am not interested in (mostly the ones that I didn't bother to name - e.g. separate URLs that compose a page), and to also get rid of the columns I am not interested in.
  5. Sort the data in the CSV by sampler - this is because I run many tests and need to compare results between runs. For that I keep a spreadsheet and copy response times and throughput data into it, adding more and more columns to a table whose rows are labeled with sampler names. Whatever, works for me.
  6. Copy results from csv to a big spreadsheet and graph the results.
At some point I used macros and regexps in Notepad++ to do step 4. Then my laptop died and I lost the macro, and I couldn't be bothered to write it again, even though it was a big help. Still, even with the macro there were a lot of manual steps just to get to meaningful results.

But hey, guess what, I've been learning stuff recently - in particular Data Science and programming in R. So I used the little I know and created a little script in R to do steps 2-5 above for me.

Now all I have to do is place the JMeter output files in a folder, start R Studio (which is a free tool, and I have it anyway; you can probably do it with pure R, no need for R Studio even), set the working directory to the folder with the files and run the script. The script goes through all the csv files in the folder (or you can set up whatever filename list you want in the script), and for each file:
  • Calculates Median, 90% Line and Throughput per minute for each sampler
  • Removes the samplers starting with "/" - i.e. samplers I didn't bother to give proper names, so I am probably not very interested in their individual results.
  • Removes delays (that's the thing with our scripts - we use Debug sampler usually named as "User Delay" to set up a realistic load model).
  • Orders results by sample name.
  • Saves results to a separate file.
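For the curious, the per-file processing boils down to something like this rough Python equivalent (the real script is in R; the column names follow the standard JMeter CSV header, the 90% line formula here is my approximation, and throughput per minute is omitted for brevity):

```python
# Aggregate a JMeter results CSV per sampler: compute the median and a
# rough "90% line", skipping unnamed samplers (labels starting with "/")
# and the "User Delay" think-time samplers.
import csv
import io
import math
import statistics

def aggregate(csv_text):
    rows = csv.DictReader(io.StringIO(csv_text))
    by_label = {}
    for r in rows:
        label = r["label"]
        if label.startswith("/") or label == "User Delay":
            continue  # unnamed samplers and think-time delays
        by_label.setdefault(label, []).append(int(r["elapsed"]))
    result = {}
    for label in sorted(by_label):
        times = sorted(by_label[label])
        idx = math.ceil(len(times) * 0.9) - 1  # rough 90% line index
        result[label] = {"median": statistics.median(times),
                         "p90": times[idx]}
    return result

data = "label,elapsed\nLogin,120\nLogin,100\n/img/x.png,5\nUser Delay,5000\n"
print(aggregate(data))  # {'Login': {'median': 110.0, 'p90': 120}}
```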
One button and voilà - all the processing is done. Now I only need to copy the data to my big spreadsheet and graph it as I choose.

The script is on github, feel free to grab it and use it.

Sunday 14 September 2014

Clouds for performance testing

Cloud computing was a buzzword in IT a few years ago, and now it is rapidly becoming an industry standard rather than some new thing. Clouds have matured quite a lot since they went public. Now, I do not know the full history of clouds, but in the last few months I had an opportunity to work with a few of them and to assess them for my very particular purpose: performance testing in the cloud. What you need for performance testing is consistent performance (CPU, memory, disks, network) and, if it's a cloud, also an opportunity to quickly and easily bring an environment up and down.

These are the clouds I tried out:
  • HP Cloud
  • Rackspace
  • AWS
  • MS Azure
Without going much into details (to avoid breaking any NDAs), let's just say that in each cloud I deployed a multi-tiered web application using either puppet master or (in the case of Azure, where I only really looked at the vanilla Cassandra database, see explanation below) internal cloud tooling. Then I loaded the solution, monitored resource utilisation, response times and throughput, and noted down any problems that got in the way.

And here’s what I’ve got.

HP Cloud. The worst cloud I’ve seen. The biggest problems I’ve encountered are the following:
  • Unstable VM hosts: twice, a VM we used suddenly lost the ability to attach disks, which practically caused the DB server to die, and us to lose an extra day on creating and configuring a new DB server.
  • Unstable network: ping time between VMs inside the cloud would occasionally jump from 1ms to 16-30ms.
  • High steal CPU time - which means that VM would not get the requested CPU time from the host. During testing it got as high as 80% on the load generating nodes, 69% at the database server, 15% on the application servers.
There were also minor inconveniences such as:
  • It is impossible to resize a live VM: you need to destroy and recreate it if you want to add RAM or CPU.
  • There is no option to get dedicated resources for a VM.
  • The latest OS versions were not available in the image library, which means that if you need a newer OS you have to install it manually, create a customised VM image, and pay separately for each licence.
  • Sometimes HP Cloud would run maintenance that put VMs offline for several hours.
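For reference, steal time is exposed on Linux in /proc/stat: it is the 8th counter on the aggregate "cpu" line. A minimal sketch of how one could compute the steal percentage between two snapshots - this is my own illustration, not part of any monitoring tooling we actually used:

```python
def steal_percent(before, after):
    """Compute the % of CPU time stolen by the hypervisor between
    two snapshots of the aggregate "cpu" line from /proc/stat.
    Fields after the "cpu" label are: user nice system idle iowait
    irq softirq steal [guest guest_nice] - steal is the 8th field."""
    def parse(line):
        fields = [int(x) for x in line.split()[1:]]
        return fields[7], sum(fields)  # (steal ticks, total ticks)
    steal0, total0 = parse(before)
    steal1, total1 = parse(after)
    return 100.0 * (steal1 - steal0) / (total1 - total0)
```

On a live box you would read /proc/stat twice with a short sleep in between; anything consistently above a few percent suggests the host is overcommitted.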

HP Cloud was the starting point of my cloud investigation, and it was obvious we could not use it for performance testing. So next I moved to Rackspace - another OpenStack provider, more mature and powerful than HP Cloud. More expensive, as well. In Rackspace I didn't have any problems with steal CPU time, nor with resizing VMs on the fly. It was a stable environment that allowed us to do benchmarking and load testing. However, it also had a bunch of problems:
  • Sometimes a newly provisioned VM wouldn't have any network connectivity except through the Rackspace web console. Far more often, a new VM would lack network connectivity for a limited time (2-5 minutes) after provisioning, which caused our Puppet scripts to fail and thus a lot of trouble in provisioning test environments. Rackspace tech support was aware of the issue, but they weren't able to fix it in the time I was on the project (if they fixed it later, I wouldn't know).
  • There were occasional spikes in ping times, up to 32 ms.
  • The hardware in Rackspace wasn't up to our standards: the CPUs we got didn't have much cache, so our application stressed the CPU far more than on the hardware we used "at home". In practice that meant that to get the performance we wanted we'd need at least twice as much hardware, which was quite expensive.
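The 2-5 minute connectivity gap can in principle be worked around by making the provisioning pipeline wait for the network before kicking off Puppet. A hypothetical sketch (the host, port, and timeouts are made up for illustration - in our setup you would poll the new VM's SSH port before starting the Puppet run):

```python
import socket
import time

def wait_for_port(host, port, timeout=300, interval=5):
    """Poll until a TCP connection to host:port succeeds, or give up
    after `timeout` seconds. Returns True once reachable."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            # create_connection also honours a per-attempt timeout
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

With something like `wait_for_port(new_vm_ip, 22)` gating the Puppet run, the transient post-provisioning outage becomes a delay instead of a failure.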

After Rackspace we moved on to AWS (my colleagues did more of the AWS work than me, thus "we"), and we were amazed at how good it was. In AWS we didn't have any of the problems we had on OpenStack. AWS runs on good hardware (including SSD disks), lets you pay for dedicated resources (though we didn't have to, because even non-dedicated VMs gave consistent results with zero steal time!), shows consistently small ping times between VMs, and has a quite cool RDS service for running Amazon-managed, easy-to-control relational database servers.

Yet AWS is not cheap. So we thought we'd quickly try MS Azure to see if it could provide comparable results for a lower price. Because I was to compare Azure vs AWS in a few specific performance-related areas (mostly CPU steal times, disk and network performance), I ran a few scalability tests against the Cassandra database. Cassandra is a NoSQL database that is quite easy to install and start using. Conveniently for my purposes, it has a built-in performance measuring tool named cassandra-stress, which is fast to set up and extremely easy to run. Also, Puppet just wouldn't work with Azure, so instead of the multi-tiered web application I went with a Cassandra scalability test.

MS Azure wasn’t actually that bad, but it is nowhere near AWS as an environment for running high loads:
  • The biggest problem seemed to be network latency. Where AWS was doing perfectly fine, Azure had about 40% of requests failing on timeouts under high load. Ping times between nodes during tests were as high as 74 ms at times (compared to 0.3 ms in AWS under similar load). From time to time my SSH connection to one VM or another would break for no apparent reason.
  • Concurrently provisioning VMs from the same image is tricky: part of the resources is actually locked during VM creation, and no other thread can use it. That caused a few "The operation cannot be performed at this time because a conflicting operation is underway. Please retry later." errors while I was creating my environment.
  • Unlike AWS, Azure doesn't let you use SSDs, which means lower disk IO performance. Azure also limits the number of IOPS per storage account (though, to be fair, there is no practical limit on how many storage accounts you can have in your environment). Even a RAID-0 of 8 disks didn't let me reach the performance we easily had in AWS without any RAID.
  • For some reason (I am not entirely sure it was MS Azure's fault) CPU usage was very uneven across the Cassandra nodes, even though the load on each node was pretty much the same.
  • I was not able to use Puppet because the Puppet module for MS Azure was out of sync with the Azure API.
That being said, Azure is somewhere near Rackspace (if not better) in terms of performance, and it is quite easy to use. For a non-technical person who wants a VM in the cloud for personal use, I'd recommend Azure.

For running performance tests in the cloud, AWS is so far the best I've seen. I also went through a few of Amazon's courses, and it looks to me like the best way to use AWS's power is to write an application that uses AWS services (such as queues and messaging) for communication between nodes.

To summarise from my experience: stay away from HP Cloud, use MS Azure for simple tasks, and use AWS for complicated, time-critical tasks. And if you are a fan of OpenStack - the Internet says Rackspace is considered the most mature of the OpenStack providers and runs the best hardware.

Sunday 20 July 2014

An introvert's ramble on the trap of open spaces and office spaces in general


Hi, my name is Viktoriia, and I’m an introvert.

The weekend before last I spent two awesome days socializing with some of the best testers in New Zealand. After that I spent another three days trying to recover from all the joy. I was exhausted emotionally and physically, and had to spend the whole Sunday being sick and miserable, because that's how my body reacts to over-socialization: it goes into hibernation. Humans are not built to spend time in hibernation. That got me thinking…

Every day working in the office I get a bit more socialization than I would voluntarily choose. And then when I get one little spike (like a testing conference), it becomes the butterfly that broke the camel's back.

Don’t get me wrong, co-location is awesome and critical for agile teams and all that. But there are also problems that come from the way we implement it (by placing everyone into these huge open spaces), and not all of them are relevant to introverts exclusively:
  • The constant humming noise. Even if we forget about people who talk loudly because that’s how they talk - the typing, and moving, and clicking, and talking, and what else is always there. Noise is stress. We even covered this at school and university in Russia: even though the human brain is pretty good at filtering out unchanging signals, complicated human-produced noise still makes it do a lot of work to maintain those filters. The nervous system is always working extra hard just to preserve your ability to concentrate.
  • The cold going around. When someone is sick, everyone is sick. Someone is always sick. The sneezing and coughing never really stop. It’s like a kindergarten for IT - if you don’t have an iron immune system, you are bound to go in and out of colds non-stop. Nothing serious, but pretty annoying.
  • The temperature. Since we are all sharing the same space, we cannot possibly set the temperature so that it’s good for everyone. For me it’s always freezing in the office. Judging by the number of people in jackets around me, I guess I’m not the only one.
  • The socialization itself. For introverts like me it’s additional stress just to be around this many people all the time. It makes it harder to concentrate, and it means I’m always under just a little bit of extra stress. The immune system works badly under stress, which feeds into constantly being in and out of the sickbay, which feeds back into the concentration problems.
  • Commuting. This one applies to working from an office in general, not just to open spaces. Every day so much time is lost getting from home to the office and back. It overloads the roads, makes the air worse, and makes us all spend our precious time doing what really isn’t necessary. It would be cool to free up the roads for people who actually have a good reason to be on them. In IT it can often be avoided - we have enough collaboration tools to go from 5 days a week working side by side to 1 day when everyone is physically in the office to align their actions and adjust plans as necessary, and 4 days when everyone is wherever they choose to be, online and connected via the internet.
  • Multitasking. There has actually been research done* on the efficiency of office workers in different settings. It showed that even extroverts work more efficiently and more creatively when they have a little bit of privacy (even if that’s a cubicle or a smaller room with just your team - but not the open space). We also all know that exploratory testing recommends uninterrupted test sessions. The thing is, humans suck at multitasking. We can only really do one thing at a time. We can switch between tasks quickly, that’s true, but imagine the overhead! When part of your resources is spent on ignoring the noise (I guess headphones somewhat help, but in my experience you just get tapped on the shoulder a lot when people want to talk to you), part on fighting the cold, and part on switching between tasks (passersby wanting to chat, for example) - you cannot possibly work at your fullest.
While having separate rooms for each team instead of an open space would make things much better, I personally would still prefer to have the choice to work from home. I found that the few days when I was sick and worked from home turned out to be no less productive than an average day in the office - most times even more productive, and always more comfortable.

It would be awesome to have the opportunity to work from home and be judged by results, not by hours in the chair. Especially since many IT companies already seem to evaluate performance by results. The company I currently work for has a thorough system for logging and evaluating successes and results, and as far as I know no one really counts hours. Yet it is not common practice to allow employees to work from home, aside from emergencies and special cases. I wish it were. One of the reasons I want to go contracting in a few years is to have the opportunity to live outside Auckland in a nice house with good internet and do all the work from there. In my book that beats both living in the centre of Auckland to be near the office and living outside Auckland while spending a few hours every work day commuting. I'd rather work 9 hours from home than work 8 hours in the office and spend another hour getting there and back.

*About that research and more: there is an awesome book, “Quiet: The Power of Introverts” by Susan Cain. It quotes and references quite a lot of scientific research in the area. I highly recommend it to anyone who’s interested in how people work.