Monday, March 4, 2013

The perils of not testing properly, and making the largest web page ever in the process

While rooting around in some old documentation at work, I came across a screenshot of a website I had worked on as part of a very large project.

[Screenshot of the page request in question]
What is this?  I'll zoom in on the highlighted part of the image, as that's the important part:

[Zoomed-in screenshot: the page size field]
In case you aren't familiar with debugging web applications, this is the size of the web page being generated for a user request.  It indicates that the page size is over 8 MB.  To put that in perspective, if a typical web page were a slice of pizza, this would be about 4 trays of pizza.
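If you want to eyeball this yourself, here's a minimal sketch of how you might measure a page's response size (in Java, with a hypothetical URL standing in for the real course search page):

    import java.io.InputStream;
    import java.net.URL;

    public class PageSizeCheck {
        public static void main(String[] args) throws Exception {
            // Hypothetical URL -- substitute the page you want to measure.
            URL url = new URL("http://example.com/courses/search?zip=12345&radius=20");
            long totalBytes = 0;
            try (InputStream in = url.openStream()) {
                byte[] buffer = new byte[8192];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    totalBytes += read;
                }
            }
            // Anything in the megabytes for a plain HTML page is a red flag.
            System.out.println("Response size: " + totalBytes + " bytes");
        }
    }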

The Back Story


Looking at this, I vividly remember where I was when I captured it.  We were at the tail end of a massive 18-month project to convert this particular company's systems and sales processes, involving hundreds of employees and contractors across the globe.  Each component of the system had large sub-components, all of which had been tweaked and re-tweaked countless times as the project owners and champions continually changed their vision and requirements while refusing to budge on the project deadline.  After many cups of coffee and few hours of sleep in the final weeks, many of the developers and system engineers involved found a way to stay on target, and we were on the eve of the launch, which had cost the parent company tens of millions to complete.

On this particular night, a majority of those involved in the launch were on a late conference call with various V.P.s and executives who made more money while sitting on that call than I make in a month.  Those higher-ups all wanted a final public blessing from everyone that the launch event would occur, that we were prepared to deal with any issues, and that we would be live with all new systems by the end of the weekend.  Of course, no one had the guts to actually "raise their hands" and say we weren't ready.  And to be honest, nobody was comfortable launching that weekend.  We all had loads of defects logged, and for the last month before launch we had focused solely on issues that were considered "mission critical" (which is another tale entirely, as I had a previous freak-out when someone declared a misspelling of the word "the" to be "revenue impacting" and therefore a mission-critical defect... but I digress).

So What Happened?


So, would you believe that, with all the time spent developing and testing these new systems, we were never able to prepare and test with real, live data until the actual go-live process was about to unfold?  And would you believe that nothing could go wrong with this model?  Of course something did.  Actually, there were many hiccups during the launch process.  From a web perspective, it was disheartening that the first of them occurred within MINUTES of the launch process starting.  It was also scary: with normal web requests this large, the traffic could have crippled our web servers within hours of launching our new systems, which would have looked awful for the team I was on.

The page in the screenshot above is the output of a course search for that particular company: a simple request for all courses within a 20-mile radius of a zip code.  The end result was a HUGE number of courses!  And it wasn't just the course names; a variety of data was sent down with each course offering.  In the end, hundreds of courses came back from that one search.

So you're asking yourself... why didn't you test this?  We did, for months even.  What we DIDN'T test was the actual real-to-life dataset that would launch during go-live.  We were testing with a tenth of the real data.  For business reasons, we weren't able to get the real dataset until go-live, because having users start entering the data any earlier would have been too disruptive.

And so here we were, on the eve of launch, and while our V.P. of IT was telling very important men and women with several cars, apartments, and computers, and probably a yacht or two among the group, that all would be OK when the systems went live, our team was discovering a huge, huge issue.

So What Did We Learn?


A few things, actually.  The most important was that we can change our shorts and think of workarounds at the same time.  In the end the fix was actually simple, enough so that we were able to implement it within an hour or so and get it into Quality Assurance testing that evening: it basically involved reducing the number of courses returned to the screen at a time, making the new page size much, much smaller.
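To give a feel for the shape of that fix, here's a minimal paging sketch.  The class and names are hypothetical (the real code lived in our server-side search layer), and the page size is an assumed value:

    import java.util.Collections;
    import java.util.List;

    public class CoursePager {
        // Assumed cap per response; the real value would be tuned in QA.
        private static final int PAGE_SIZE = 25;

        // Returns one page of results instead of the whole list,
        // so the generated HTML stays small no matter how many
        // courses match the search.
        public static <T> List<T> page(List<T> allResults, int pageNumber) {
            int from = (pageNumber - 1) * PAGE_SIZE;
            int to = Math.min(allResults.size(), from + PAGE_SIZE);
            if (from < 0 || from >= to) {
                return Collections.emptyList();  // out-of-range page
            }
            return allResults.subList(from, to);
        }
    }

In practice you'd push the limit down into the database query itself so the full result set is never loaded at all, but even in-memory paging like this shrinks the rendered page dramatically.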

The real lesson, however, and the reason for this post, is that projects of all shapes and sizes need to be tested appropriately.  When developing a new system, or on top of new hardware, or even performing upgrades, it's so easy to focus on what's new or what's changing.  Those aspects are important, of course, but they're not the entire picture; it's not a real test unless you use real stuff!  For my story above, I can't count how many times we enrolled in fake courses called "blah blah" taking place in zip code "12345".  What we really needed was a real data dump, and trust me, we had asked for that, for months!  We didn't get it, and by the grace of God we were able to work around our particular issues.  If we hadn't, or if the website issue had been too large to fix in time, it would have been our team on the hook... not all the people who failed to provide the accurate data we wanted to test with.
