Friday, August 27, 2010

One Metric to Rule Them All and In the Darkness Bind Them

I think metrics and measurements are good when used in the correct way based on the context and team I am working with. For each team I am working with I use metrics to help them see what their issues are. Once they see their issues then we use metrics to help us determine as early as possible if changes we are making are having a positive or negative impact on those issues and the rest of the system.

Measurements ARE necessary to know we are headed in the right direction.

There are plenty of articles out there about abusing metrics. I thought it was well known that all metrics need to be balanced (e.g. code coverage going up while complexity goes down). And of course they need to be trended to be useful.

Now I have a request to find 1-2 metrics to apply to all teams to determine how effective agile and coaching are at improving the teams. Can someone really think that 1-2 metrics can be used to determine effectiveness?

Not all teams have the same highest priority issue(s). A team with terrible user stories and acceptance criteria does not need the same metrics as a team trying to fix highly coupled code.

Ok, enough complaining! To help me, and I hope others, I want to write about 1) what the goals of specific metrics are, 2) what the dangers and abuses of those metrics are, and 3) how to balance those metrics against each other.

Average velocity trend

* Predictability!! What can be done by a specific date, or when can something be completed?
* Velocity is a *capacity* measure *NOT* a productivity measure.
* Velocity allows a team to know how much business value they can deliver over time.
* Developing a consistent velocity allows for more accurate (i.e. predictable) release and iteration planning.
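As a rough illustration of the predictability point, here is a minimal Python sketch of a rolling average velocity and a simple "how many iterations until this is done" forecast. The function names, the three-iteration window, and the sample story points are all assumptions made up for this example, not part of any particular tool.

```python
from math import ceil
from statistics import mean


def rolling_average_velocity(velocities, window=3):
    """Average the most recent `window` iterations at each point in time."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    for i in range(len(velocities)):
        # Early iterations use whatever history exists so far.
        recent = velocities[max(0, i - window + 1): i + 1]
        averages.append(mean(recent))
    return averages


def iterations_to_finish(remaining_points, velocities, window=3):
    """Estimate how many iterations the remaining backlog needs."""
    if not velocities:
        raise ValueError("need at least one completed iteration")
    current = rolling_average_velocity(velocities, window)[-1]
    if current <= 0:
        raise ValueError("average velocity must be positive to forecast")
    return ceil(remaining_points / current)


# Hypothetical story points completed in six iterations.
history = [18, 22, 20, 24, 21, 21]
trend = rolling_average_velocity(history)
forecast = iterations_to_finish(110, history)
```

With this made-up history the latest rolling average is 22 points, so a 110-point backlog comes out at about 5 iterations. The forecast is only as good as the stability of the team and its estimates.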

*Possible abuses:*
* Calling this a measure of productivity. If velocity is the only number focused on, it could even hurt productivity. Teams can artificially increase velocity in many ways: stop writing unit tests or acceptance tests, inflate estimates, stop fixing story defects, and reduce customer collaboration, just to name a few.
* Comparing velocity between teams. Velocity is a team value and not a global value. Many variables affect a team's velocity including relative estimating base, support requirements, number of defects, political environment of the product or project and more.
* Calculating velocity by individual. This leads to a focus on individual performance vs. team performance (i.e. sub-optimization).
* Using velocity to commit to the content of an iteration when the value is not valid. Velocity is a simple concept and a lightweight measure, but it depends on maturity. To be useful it requires mature estimation, applied consistently over a period of time by a stable team. Without these elements it can be abused by management or by the team itself; the latter happens when a team assumes the metric is valid even though, without that maturity in place, it is not usable at all.

*Balancing metrics:*
* Percentage of rework versus stories done on average each iteration. This can help a team see how much of their work each iteration is delivering new value to the team's customers.
* Planned work versus unplanned work trend. A lot of unplanned work makes a team's velocity less valuable because it hinders the team's ability to plan. Keeping unplanned work low makes the team's planning more consistent and accurate.
* Code quality metrics such as code test coverage, cyclomatic complexity, static error checking and performance. A team that is increasing its velocity by not focusing on code quality is making a short term decision that will have a negative impact over time.

Delivered Features vs. Rework Resolution trend

* Makes _waste_ visible so that it can be eliminated.
* Gives the team a good understanding of how much of their iteration capacity is consumed by rework (i.e. _waste_).
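One way to make that rework share visible, sketched in Python. The `rework_share` helper and the per-iteration numbers are hypothetical, and the sketch assumes the team records feature points and rework points separately for each iteration.

```python
def rework_share(feature_points, rework_points):
    """Fraction of an iteration's completed points spent on rework."""
    total = feature_points + rework_points
    if total == 0:
        # Nothing was completed, so there is no rework to report.
        return 0.0
    return rework_points / total


# Hypothetical (feature points, rework points) for three iterations.
iterations = [(16, 8), (18, 6), (20, 4)]
rework_trend = [round(rework_share(f, r), 2) for f, r in iterations]
```

A falling series like this one suggests less capacity is going to waste, but only if defects are not being hidden as stories or deferred to a regression period.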

*Possible abuses/issues:*
* Lagging indicator of the team quality.
* Story defects are not worked on until a regression period, giving a short term indication of fewer defects.
* Increasing story estimates and/or reducing defect estimates.
* Hiding defects as stories.

*Balancing metrics:*
* An inconsistent velocity. Delaying defect correction until later will make the velocity trend erratic with large spikes.
* Planned versus unplanned scope. A team that is delaying defect correction will tend to have more unplanned work due to poor quality issues.
* Number of defects in the backlog. Ideally this number should be on a downward trend. An upward trend of the number of defects in the backlog could indicate the team is delaying defect correction.
* Increasingly long regression periods at the end of each release.

Completed work vs. Carryover trend

* Shows how well the team executes the iteration (i.e. delivers on its commitments).
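A minimal Python sketch of the carryover calculation, assuming the team tracks committed and completed points per iteration. The `carryover_rate` name and the sample data are made up for illustration.

```python
def carryover_rate(committed_points, completed_points):
    """Fraction of committed work that rolled into the next iteration."""
    if committed_points <= 0:
        raise ValueError("committed_points must be positive")
    # Finishing more than was committed counts as zero carryover.
    carried = max(committed_points - completed_points, 0)
    return carried / committed_points


# Hypothetical (committed, completed) points for three iterations.
commitments = [(25, 20), (24, 21), (22, 22)]
carryover_trend = [carryover_rate(c, d) for c, d in commitments]
```

Note that in this made-up data the carryover drops to zero while the commitment also shrinks from 25 to 22, which is exactly the kind of under-planning the velocity trend should be used to catch.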

*Possible abuses:*
* Planning less work than the team is capable of to allow for interruptions or poor estimating.
* Delaying refactoring (or skipping other good practices such as TDD/unit testing) to complete work, instead of keeping the code at a level that makes change cheaper and easier in the future.

*Balancing metrics:*
* A velocity trend that is not improving or is going down could be caused by planning less than the real capacity of the team.
* Planned versus unplanned work can indicate whether the team is being interrupted and whether the resulting task switching could be the cause of the carryover.
* Downward test coverage trend and/or upward cyclomatic complexity trend could indicate that the code is becoming more difficult to change and much more difficult to estimate accurately.

Planned vs. Unplanned Scope trend

* Shows how well the team plans.
* Shows how often the team is interrupted within the iteration to work on something that wasn't originally planned.

*Possible abuses:*
* Large placeholders that allow unplanned work to come in and appear to be part of the planned work.

*Balancing metrics:*
* Delivered Features vs. Rework Resolution trend.
* Completed work vs. Carryover trend.

Code coverage vs. Cyclomatic Complexity trends

* Reduce the cost of change. Clean code tends to make the application easier to understand and safer to change.
* Indicates that the system is being tested at an appropriate level.
* Indicates that the code quality is good; loosely coupled, simple as possible, etc.

*Possible abuses:*
* Focusing only on one code metric, e.g. 100% code coverage with generated tests will not make the code easier to understand or change.
* Focusing on code quality alone and not on the business goals of the customer.

*Balancing metrics:*
* Velocity trend
* Delivered Features vs. Rework Resolution trend
* Afferent and efferent coupling trends
* Abstractness trend
* Package dependency cycles
* Number of changes per class
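To show what reading two code metrics together might look like, here is a hedged Python sketch that fits a simple least-squares slope to each trend and only reports "improving" when coverage is rising and complexity is falling. The `slope` and `code_health` helpers, the labels, and the sample numbers are assumptions for illustration; a real team would pull these values from its own build tools.

```python
def slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points for a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator


def code_health(coverage, complexity):
    """Classify the combined direction of coverage and complexity trends."""
    coverage_slope = slope(coverage)
    complexity_slope = slope(complexity)
    if coverage_slope > 0 and complexity_slope < 0:
        return "improving"
    if coverage_slope < 0 and complexity_slope > 0:
        return "degrading"
    # Anything else deserves a closer look rather than a verdict.
    return "mixed"


# Hypothetical per-iteration readings.
balanced = code_health([62, 65, 70, 74], [9.1, 8.7, 8.2, 7.9])
gamed = code_health([70, 80, 90], [8, 9, 11])
```

The second case is the interesting one: coverage climbs quickly while complexity also climbs, which is what a pile of generated tests over increasingly tangled code could look like.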

This is far from an exhaustive list of metrics! But I hope the idea comes through: think about each metric, what your goal is in measuring it, and how you can stop yourself or others from gaming it by balancing it with other metrics.

** I started this article based on a set of metrics that my colleague Mike Stout uses, so thanks for the ideas Mike. Several other coaches I work with gave me feedback on this as well. Thanks!

Not Laughing Anymore

I used to read all the blog posts about the year of Linux on the desktop and laugh. Not because I do not like Linux, but because it was simply too complex for the average user. But my opinion is changing.

Out of absolute frustration with the poor performance of my new work laptop running XP, I decided to install the latest Ubuntu version for dual boot. Wow, it was easy and fast. I have been able to do everything on Ubuntu 10.04 that I do on a daily basis. It works within my corporate network seamlessly. It worked at home even better. It is super fast, but I am sure this is partially because I get the full 64 bit support that I do not get with XP.

Another great thing is how it feels the same whether I am at home or at the office. Windows XP behaved better outside the corporate network. I do not think this is all Windows' fault. I am sure the corporate installed tools are a big part of the problem. But now I am running the Evolution email client and it works the same in and out of the office. All the development tools do as well.

I have not been able to connect to our VPN, which I used to do with the Nortel VPN client on Windows. However, I only used it to connect to Outlook and I do not need it now.

I am still not 100% convinced, and it has only been a week, but so far Ubuntu is doing great as my corporate and at-home OS.