Applying Risk Management to Agile
It all started with horribly misquoting a software project for a high-profile client. The requirements were clear: connect to an existing API (Application Programming Interface), obtain the text content, look for specific phrases, and count them. During the discovery phase, a connection was made to the API and results sent back. In reading the API documentation, it was found that all files returned were scanned images, not strings.
After adding time to perform OCR (Optical Character Recognition), we increased the estimate of work. We met with the stakeholders, the time frame was extended, and development commenced. So far, all was well and rapid progress ensued. That was until the second day.
Architect or Engineer
The Agile Manifesto begins by placing individuals and interactions over processes and tools. What is amazing about Agile is that it clarifies a truth: software is a collection of people, ideas, and goals, not simply a means to automate a task.
It goes back to the difference between Architects and Engineers. Both apply scientific principles to design problems. However, Engineers deal mostly with clearly defined requirements. For example, the voltage required to move a 3000 pound car at 60 miles per hour. Architects deal with mostly unknowns, such as how to add a fire escape without affecting the lobby space.
Software Developers act as either Architect or Engineer based upon the clarity of the requirements present. Our high-profile client project seemed like a job for an Engineer. However, on day two of development, the API call to obtain the documents was failing. It turns out that the MIME (Multipurpose Internet Mail Extensions) type, the attribute used to notify software of the expected data type, was wrong.
Because the type was set as an octet stream, we were expecting a binary dump of data. Instead, the response was a multi-part delivery of both a zip file and an XML description. This moved the requirements from known into unknown. Now the developers had to act as Architects in order to determine the best design decision for the application.
The solution, it seemed, was to intercept the API response, change the MIME type in its header, and then resolve it as a multi-part response. This worked.
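A minimal sketch of that kind of workaround in Python, assuming the body is a standard MIME multipart payload whose boundary can be read from its first line. The function name and the sample payload are hypothetical; the project's actual API and client code are not shown in this article.

```python
from email import message_from_bytes


def parse_mislabeled_multipart(body: bytes) -> list[tuple[str, bytes]]:
    """Re-read a body labeled application/octet-stream as multipart.

    Assumption: the body starts with a "--<boundary>" delimiter line.
    """
    stripped = body.lstrip()
    lines = stripped.splitlines()
    first_line = lines[0].rstrip() if lines else b""
    if not first_line.startswith(b"--"):
        raise ValueError("body does not look like a multipart payload")
    boundary = first_line[2:].decode("ascii")

    # Replace the wrong header with the one the body actually needs.
    corrected_header = (
        f'Content-Type: multipart/mixed; boundary="{boundary}"\r\n\r\n'
    ).encode("ascii")
    message = message_from_bytes(corrected_header + stripped)
    if not message.is_multipart():
        raise ValueError("body could not be parsed as multipart")

    # Each part keeps its own Content-Type (for example, zip or XML).
    return [
        (part.get_content_type(), part.get_payload(decode=True))
        for part in message.get_payload()
    ]


# Hypothetical sample: a fake zip part followed by an XML description.
SAMPLE_BODY = (
    b"--frontier\r\n"
    b"Content-Type: application/zip\r\n\r\n"
    b"PK\x03\x04demo\r\n"
    b"--frontier\r\n"
    b"Content-Type: text/xml\r\n\r\n"
    b'<doc id="1"/>\r\n'
    b"--frontier--\r\n"
)

parts = parse_mislabeled_multipart(SAMPLE_BODY)
```

Real binary attachments may need a more robust parser than this; the point is only that the header, not the body, was what had to change.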
Requirements
Describing what the software is supposed to accomplish seems easy. Our high-profile client project had very clear requirements: take some text, count the occurrences of certain phrases, and save the results. However, they did not represent the complexity required to reach those objectives.
After solving the MIME issue with our API response, we converted each image file to a string via OCR software. Thus, we could scan the text for each document. However, a new issue arose: the strings produced by OCR had serious errors. Now finding specific phrases in the text was significantly more difficult than anticipated. For example, the word “credit” was spelled with many variations of characters: “c$ed1”, “0r3dt”, etc.
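One way to tolerate this kind of OCR noise is approximate matching. The sketch below uses Python's standard `difflib` to count tokens that are merely similar to the target word. The function name and the 0.5 cutoff are illustrative assumptions, not the matching logic the project actually used.

```python
from difflib import SequenceMatcher


def count_fuzzy_matches(text: str, phrase: str, cutoff: float = 0.5) -> int:
    """Count whitespace-separated tokens whose similarity to `phrase`
    is at least `cutoff` (0.0 means nothing in common, 1.0 means identical)."""
    target = phrase.lower()
    count = 0
    for token in text.lower().split():
        similarity = SequenceMatcher(None, target, token).ratio()
        if similarity >= cutoff:
            count += 1
    return count


# The two garbled spellings quoted above both clear the 0.5 cutoff.
garbled_count = count_fuzzy_matches(
    "the c$ed1 was applied as a 0r3dt today", "credit"
)
```

A low cutoff catches heavily garbled tokens but also admits look-alikes such as “edit”, so any threshold trades missed phrases against false counts.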
It turns out that no one thought it prudent to test the response from the API. Instead, the team made a call and got back a WSDL (Web Service Description Language) file, a document explaining the various methods available on the service. However, it usually gives only basic interface-level descriptions. Had we made one additional call to the method needed to pull documents, most of the issues would have been exposed, resulting in a different client expectation.
Agile states, “Customer collaboration over contract negotiation.” Therefore, we went back to the client and explained the problem. While not happy with the discovery, they did offer some reduction in the accuracy requirements, allowing a 95% success rate for the counts instead of the original 100%.
Since the original expectation had already been set, the added uncertainty based on new information was not well received. While Agile methods are designed to bring the stakeholders (client, boss, or company) into the development process, there is still much left over from the waterfall days. The biggest challenge is buy-in from stakeholders when an earlier part of the process was done incorrectly or a large issue is discovered.
That is one of the biggest problems for development teams: mismanaged expectations.
Estimations
The place where uncertainty is most evident is in the estimations. Typically, software projects start with a set of requirements in the form of user stories that read something like, “As a user I would like to…” The problem is that the process is deceptively simple.
Agile states, “Responding to change over following a plan.” Here is where Agile needs more clarification. If all we do is respond to change, how can we determine an accurate estimate?
Consider that estimation starts with the Project Manager asking each member of the development team to give an estimate of the time required for each story. The response is something like grade-school arithmetic. “We need an endpoint to return the result, that’s three. Then a button to make the request, that’s one. With testing, I would say a five.”
The problem is that everyone is just making a guess. The units used are arbitrary, ranging from t-shirt sizes of small, medium, and large, to Fibonacci numbers, to a number of hours. Regardless of the method, there is always a discrepancy between the estimate and the actual. Project Managers try to reduce this by padding the numbers. Thus a small becomes a medium, a 3 becomes a 5, and so on.
With our high-profile client, we padded the estimates but were still way off due to unforeseen problems. The word “unforeseen” here points to the real issue. That is, everything about a new project is unforeseen. Even if the requirements match up to past experience, they reflect multiple unknowns.
In our high-profile application, we discovered hundreds of patterns for one word. Thus finding the term in the text became a monstrous task, not the simple few-hour job we anticipated. Had we gone further with initial tests, the complexity required would have become known before the work started. Thus, estimates should be based on actual data and not assumptions.
For example, suppose the requirement is to connect to an API and do something with the result. First, ensure the API can be connected to. Then be certain the desired result can be obtained from the data, noting any problems encountered.
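Such a pre-estimate check can be as small as the following Python sketch. It takes the connection step and the extraction step as plain callables so it can be tried without a live service. All names here are hypothetical and stand in for whatever client code a project really has.

```python
from typing import Callable


def probe_requirement(
    fetch: Callable[[], bytes],
    extract_text: Callable[[bytes], str],
    expected_phrase: str,
) -> list[str]:
    """Return the problems found; an empty list means the requirement
    looks feasible as stated."""
    # Step 1: can the API be connected to at all?
    try:
        payload = fetch()
    except Exception as error:
        return [f"could not connect: {error}"]

    # Step 2: can the desired result be obtained from the data?
    try:
        text = extract_text(payload)
    except Exception as error:
        return [f"could not extract text: {error}"]

    problems: list[str] = []
    if expected_phrase.lower() not in text.lower():
        problems.append(f"phrase {expected_phrase!r} not found in extracted text")
    return problems


# Stand-in for an API that returns image bytes instead of text.
image_problems = probe_requirement(
    fetch=lambda: b"\x89PNG",
    extract_text=lambda data: data.decode("utf-8"),
    expected_phrase="credit",
)
```

Here the probe reports an extraction problem because the payload is not text, which is the kind of finding that belongs in the estimate before work starts.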
Had we done this, the original estimate would have been much closer to the four additional weeks of development required. The sad part is this project was not an estimation anomaly. On the contrary, the project with an accurate estimate is the outlier.
Agile wants the requirements and, by extension, the estimations to be fluid. This is good. However, it must be fully committed to by all involved. Thus, it is recommended to have a step before user stories: phrase each requirement as a question. Then set out to answer each question with quantifiable data to give a binary answer of yes or no.
Consider the user story, “As a website user, I need to enter my name, email address, and phone number to access the special content.” Instead, write “Does a website user need to enter their name, email address, and phone to access special content?” The reason is to ensure that everyone thinks through the requirements before committing to them.
Using risk management techniques, the job is now to calculate costs for each decision in man hours and expenses. Development time equates to complexity, quality, and cost. However, other expenses such as storage cost and user adoption are separate from man hours. For example, it may be determined that requiring a user’s phone number as part of their registration will take 5 hours of development time, add $10 per month in storage fees, and, based on marketing data, cause a 20% reduction in user sign-ups. Thus the decision for this requirement can be answered with insight, not just a best guess.
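The arithmetic behind such a decision is simple enough to sketch in Python. The 5 hours, $10 per month, and 20% figures come from the example above. The $100 hourly rate, the 12-month horizon, and the baseline of 1,000 sign-ups are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RequirementCost:
    development_hours: float
    monthly_storage_fee: float
    signup_reduction_percent: int

    def total_cost(self, hourly_rate: float, months: int) -> float:
        """Development cost plus storage fees over the chosen horizon."""
        return (
            self.development_hours * hourly_rate
            + self.monthly_storage_fee * months
        )

    def expected_signups(self, baseline_signups: int) -> float:
        """Sign-ups remaining after the projected reduction."""
        return baseline_signups * (100 - self.signup_reduction_percent) / 100


phone_number = RequirementCost(
    development_hours=5,
    monthly_storage_fee=10.0,
    signup_reduction_percent=20,
)

# Assumed inputs: $100 per hour, 12 months, 1,000 baseline sign-ups.
first_year_cost = phone_number.total_cost(hourly_rate=100, months=12)  # 620.0
remaining_signups = phone_number.expected_signups(1000)  # 800.0
```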
The only issue with this is the large amount of time spent in planning. However, the time used can be relative to the scope of the project being considered, perhaps 2 hours of research per 50 man hours of development.
Once in the habit of doing this, the time taken becomes minuscule. A Project Manager with good metrics at the ready can provide data to answer most of these requirement questions during the Sprint Planning Meeting. This turns the scenario into, “Should we require a phone number as part of registration?” followed by the almost instant answer of, “It will increase the project by 5 man hours and our marketing department says it will reduce user registrations by 20%. Do you want to include it?”
Measuring development time comes down to taking regular measurements of the team’s ability to deliver. A Project Manager may have some idea based on past experience, but few have quantifiable data to use for this process. Most just ask, “How long will it take this team to make a user security system?” instead of working from something like, “Dave makes a user front-end from designs in 34 hours on average. John completes most designs in 26 hours. Terry can build a back-end in 30 hours.”
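A small Python sketch of how such averages could feed an estimate. The logged hours below are invented sample data, chosen only so the averages match the 34, 26, and 30 hours quoted above. Real figures would come from the team's own tracking.

```python
from statistics import mean

# Hypothetical history: hours logged per person and task type on past projects.
logged_hours = {
    ("Dave", "front-end"): [32, 36, 34],
    ("John", "design"): [24, 28, 26],
    ("Terry", "back-end"): [28, 30, 32],
}


def average_hours(history: dict) -> dict:
    """Average the logged hours for each (person, task type) pair."""
    return {key: mean(hours) for key, hours in history.items()}


def estimate_feature(history: dict, tasks: list) -> float:
    """Sum the historical averages for the tasks a feature needs."""
    averages = average_hours(history)
    return sum(averages[task] for task in tasks)


security_system_estimate = estimate_feature(
    logged_hours,
    [("John", "design"), ("Dave", "front-end"), ("Terry", "back-end")],
)  # 26 + 34 + 30 = 90 hours
```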
The reason this is rarely done is that it requires data on each team member that spans multiple projects. Add to that a team’s ability to become more efficient over time, and the complexity of this task becomes evident. However, when using the Agile method of continuous deployment, many of the modern tools can formulate such metrics on the fly by recording data from each developer’s code check-outs and merges.
Quantified team measures coupled with accurate pre-project tests of requirements serve well to reduce overall project risk. They make the estimates much more accurate for the initial set of requirements while helping to quickly estimate new levels of work due to requirement changes along the way.
Communication
Since software is a combination of people, ideas, and goals, communication is paramount to the process. The most difficult part is for those building the system to fully understand the vision of those needing it. One of the ways to do this is with a design-first strategy, meaning that each user screen is created in a graphical application and walked through with the stakeholders.
Not only does it give a visual reference for the completed product, it allows for multiple requirement questions to be answered. For example, “Here is the screen for user registration with the phone number and here is an example without the phone requirement.” Thus the stakeholders will give their opinion on which way to move forward.
For projects without a user interface, such as our high-profile client project, the task of clear communication is more difficult. In this scenario, where the end product is a back-end service, it is best to provide a user interface for the stakeholder to see. Not only does it clearly communicate progress, it allows non-technical individuals to become part of the development process.
Doing so follows the “Customer collaboration over contract negotiation” tenet of Agile. Having a way for a stakeholder to enter input and view human-readable output in a web browser will save hours of meeting time by better clarifying the system. Such a system makes it easy to communicate an issue to the stakeholder and gives Quality Assurance a source to run tests against. For example, a developer may say, “Enter xyz123 as the product number. See where it says number of credits as 12? It should be 11,” instead of trying to explain something like, “For some of the product numbers we are off by 1 or 2.”
Another aspect of communication for software projects is team dialog. In the past, front-end teams and back-end teams often blocked one another from working by not clearly communicating. The reason is that they largely considered time spent conversing a hindrance to completing their work, while each tried to place the burden of investigation on the other.
Instead, team members should be accountable to each other. Agile only counts working code as its measure of progress. A partial feature is not progress. Therefore, if one side of the development is completed while the other is waiting, the team is no further along. However, this must be communicated: it is all or nothing for the team, with reward or consequence for each user story completed or remaining.
Last in communication are the various meetings. There is some argument among team members as to which meetings make sense and which are a waste of time. Among them are Morning Scrums, Sprint Planning Meetings, and Retrospectives.
Morning Scrums are short stand-up meetings for each member to explain three things: what they are working on, what they did yesterday, and any blockers preventing them from doing their work. This can be done via chat for remote teams. However, as stated by Agile, face-to-face is best.
Sprint Planning Meetings are among the most important because all work for a one- to two-week period is planned in them. Stakeholders are welcome (or should be) to attend. It is especially useful for the team to showcase what they did, realistically plan the next sprint, and communicate with one another.
Retrospectives are to be held just before Sprint Planning Meetings so as to walk through three things: what should the team keep doing, what should it stop doing, and what should it start doing. The basic premise is that everyone has a say in the continued improvement of the team, and by default, the process.
The risk management aspect of communication is handled by the Chain of Command and Chain of Custody. The Chain of Command defines the layers of communication and responsibility for a project and is a supplement to the standard Agile process. The Chain of Custody is covered later in this document.
A Chain of Command consists of various levels of responsibility. Starting at the top, we have the C-level (CTO, CIO, etc.). This level may or may not be relevant to the process, depending on the size of the organization. However, we always begin with the most senior technical leader on the team. This may be more than one person, such as a Project Manager and a Technical Manager. Regardless, the point is that this person or these persons are responsible for communicating the project status to stakeholders while working to remove barriers from the team.
Below them are the Lead or Senior Engineers. This role should be responsible for code reviews, mentoring less experienced members, and reporting issues up the chain to Technical Management. The reason for the layers is accountability and assistance. Consider an inexperienced Engineer unable to complete their task. The Lead Engineer can assist them with teaching moments while giving examples, instead of the entire team waiting on that person to be done.
Next in line is the middle-level Engineer, usually referred to as Software Engineer or Developer. This level is responsible for being a reliable developer. He or she is tasked with completing work at a high level of quality. They report to the Lead or Senior Engineer, asking for assistance when needed, giving help when asked, and being one of the core workers of the team.
Below them is the Junior Level Engineer. This role reports to others on the Engineering Team as a student. They must be teachable and willing to learn while being willing to work on tasks at and slightly above their competence level.
The Chain of Command places responsibility with each member of the team at different levels based on a hierarchy of competency. By formally recognizing the level of competency for each member, individuals can have a clear set of requirements needed to advance. Thus, the good ones will quickly work to advance in their career. This, by default, exposes those unable or unwilling to perform.
Quality
One of the side effects of the old deadline model of “Have this on my desk by Monday morning” is the developer’s temptation to cut corners. Sure, they may make the deadline, but with a slew of buggy code riddled with technical debt. For a developer, impossible deadlines are soul killing. No longer can one think through problems or deliver good work. Instead, it is a race against the clock to deliver just enough to keep the paychecks coming.
Issues from deadlines include: hard-coded values, the absence of proper testing, disregard of coding standards, and the use of techniques chosen solely for being the fastest way to complete the work. It is like trying to make a 4-hour drive in 2 hours: both are just as reckless.
Agile gives a great alternative with defined sprints of work. This ensures that only the amount of work realistic for a team is attempted. It must include time for the team to think through problems, discuss issues, and hold one another accountable for the pass or fail of the entire work scheduled.
At the start of each sprint, have the team create a series of automated tests for each user story, testing the user interface, input, output, and undesired states that could occur. Run them on each repository check-in. Code that passes is auto-merged into development while code that fails is refused. It is also important to have tests that ensure proper code formatting. Couple these with peer-to-peer and management code reviews.
The management code reviews are expected to be random, covering any section of code (front-end, back-end, mobile, etc.) at any time. The reason is to give a high-level overview of the overall understanding of the team, ensuring both the architecture and team standards are being followed by all.
Second, require each task from a user story to be approved or denied by the person working upstream from it. For example, the infrastructure team builds the database in the development environment and creates the schema. The corresponding ticket is moved from the “in-progress” column to the “ready for testing” column. Next, the back-end developer, who will write the code that communicates with the database, checks that the schema and database are the same as what has been planned. Only after the approval from the back-end developer is the infrastructure task marked completed.
Another aspect of this communication is when a change needs to be made in something downstream. For example, the back-end developer just realizes he or she needs a time-stamp column in the users table. They can update the database ticket and move it back to in-progress. After a successful change, the back-end developer moves the database ticket back to complete. For risk management, we call this the Chain of Custody.
An improvement on this concept is to include a new column in your Kanban board titled “Ready for Upstream” just before the much-loved “Completed” column. That way, the upstream worker can move their downstream colleague’s work to “Ready for Upstream” instead of “Completed”, signifying that there may be additional work required once more upstream work is accomplished.
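The Chain of Custody rules described above can be sketched as a small state machine in Python. The column names come from the text. The `Ticket` class, its fields, and its method names are hypothetical and not tied to any particular Kanban tool.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    title: str
    owner: str              # who does the work
    upstream_reviewer: str  # who consumes the work and signs off on it
    column: str = "in-progress"
    custody_log: list = field(default_factory=list)

    def _move(self, actor: str, column: str) -> None:
        # Record who moved the ticket, from where, and to where.
        self.custody_log.append((actor, self.column, column))
        self.column = column

    def _require_reviewer(self, actor: str) -> None:
        if actor != self.upstream_reviewer:
            raise PermissionError("only the upstream reviewer can do this")

    def submit(self, actor: str) -> None:
        if actor != self.owner:
            raise PermissionError("only the owner can submit the work")
        self._move(actor, "ready for testing")

    def approve(self, actor: str, more_upstream_work: bool = False) -> None:
        self._require_reviewer(actor)
        if self.column != "ready for testing":
            raise ValueError("ticket is not ready for testing")
        target = "Ready for Upstream" if more_upstream_work else "Completed"
        self._move(actor, target)

    def reopen(self, actor: str) -> None:
        # The upstream reviewer sends the work back, e.g. for a new column.
        self._require_reviewer(actor)
        self._move(actor, "in-progress")


schema = Ticket("Create users schema", owner="infrastructure",
                upstream_reviewer="back-end")
schema.submit("infrastructure")
schema.approve("back-end", more_upstream_work=True)
```

The design choice worth noting is that the owner can never mark their own ticket completed. Only the person who depends on the work can.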
Having the development run from the bottom up in a tree-like fashion brings the finish line from the infrastructure to the user interface. The last step, and the one connected to all previous steps, is the intersection of user and machine. All the back-end systems, storage, computations, and services terminate in the user experience. Thus the UI/UX developers represent the last step of the process.
Please note that each member of the team, regardless of their position in the stream, should have work they can do during each phase. For example, UI/UX developers can convert graphical design into HTML and CSS while the Infrastructure team creates the database and development server. When there is no work for one part of the team, those members can assist with other team tasks. This is crucial since the team, not a sole individual, is responsible for pass or fail.
Third is the use of Quality Assurance testers. With all the automation of tests, code compliance, and developer behavior modeling, it may seem there is no longer a need for human testers. However, finding new issues once people are introduced to a system is the norm. Not to mention, it is human testing that ultimately gives a project the stamp of success or failure.
In our high-profile client project, testing was completed using the user interface developed. The stakeholder and Quality Assurance were able to test results against multiple items, comparing the counts to known values. As a result, multiple changes were made to the code to make it more accurate.
It is important to realize that 100% testing is not possible. The decision to mark as complete is one that must accept a certain degree of risk. After testing for requirements, then common misuse, and then a check that changes made do not affect the existing system, it is ready for some level of production.
The bulk of testing occurs in production because it has a far larger number of users than development. This means using a canary deployment into production once the team and stakeholders are satisfied it meets the minimum standards. However, the team must have application monitoring tools for this and a means to quickly revert to previous versions if problems arise. If an issue is discovered, production is rolled back and the code updated.
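The promote-or-roll-back decision for a canary can be reduced to a comparison of error rates reported by monitoring. The Python sketch below is one possible rule, not a standard. The function name, the 1% tolerance, and the minimum of 100 requests are assumptions chosen for illustration.

```python
def canary_decision(
    canary_errors: int,
    canary_requests: int,
    baseline_error_rate: float,
    tolerance: float = 0.01,
    min_requests: int = 100,
) -> str:
    """Return "wait", "rollback", or "promote" for a canary release."""
    # Too little traffic to judge the new version either way.
    if canary_requests < min_requests:
        return "wait"
    canary_error_rate = canary_errors / canary_requests
    # Roll back when the canary is clearly worse than the current version.
    if canary_error_rate > baseline_error_rate + tolerance:
        return "rollback"
    return "promote"


# Baseline of 1% errors: a canary at 3% is rolled back, one at 1.2% is promoted.
bad_release = canary_decision(30, 1000, baseline_error_rate=0.01)
good_release = canary_decision(12, 1000, baseline_error_rate=0.01)
```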
Delivery
Agile tells us, “Working software over comprehensive documentation.” While there needs to be documentation in the form of user stories, screen mock-ups, and descriptions, in the end, the stakeholders do not care about the process. They care about the result. “Is the system in-budget, on-time, and ready for production?” is the only question they will ask.
Sure, the Agile Methodology delivers things as quickly as possible. Code is created and deployed as soon as possible. Stakeholders utilize the system while under development. Thus, the issue of late delivery is almost removed. However, there is still risk. Consider a User Story that is more complex than anticipated. Therefore, it is important to test each requirement using quantifiable data before even making an estimate of work.
In our high-profile client delivery, we were late. That was week two. However, by week three, the project was converted from Waterfall to Agile and the issue of time frame vanished. The deadline went away because we had a user interface the stakeholders could use. Therefore, we were free to work on the system without the dreaded, “When will it be ready?”
In reality, delivery solves most problems. That is, unless it is riddled with problems or more limited in scope than expected. Therefore, it is important to only deliver code that meets a specific standard while managing realistic expectations.
Stakeholder expectations are paramount to successful projects. When an exceptional developer was asked, “Can you have this done by Friday?” he replied, “No and neither can anyone else.” While not what one wants to hear, it is the truth, something that will come out during the project anyway. Therefore, it is better to start with it than let it be a surprise later in the process.
The Project Manager needs to actively manage stakeholder expectations with data, not conjecture. That way, the expectation is always in line with the actual process. If a certain user story is going to take 5 days instead of 2, sharing that first will eliminate the risk of trying to quickly deliver a sub-par product.
Conclusion
Software is about people, ideas, and machines. People are uncertain, ideas are hard to clarify, and machines break. Therefore the act of developing software is full of risk. Agile works out the day-to-day issues, but additional measures are required to fully manage the uncertainty.
Agile is a great method for development, but not a complete process. The process must include a layer of management, chain of command, and chain of custody. In so doing, it includes risk management techniques to ensure delivery of software that is on-time, in-budget, and ready for production.
For more on including risk management in your Agile process, see http://toddmoses.com