PHP refactoring in legacy code

http://www.tomslabs.com/index.php/2012/01/php-refactoring-in-legacy-code/

The story we’ll talk about is a true story. It happened to be challenging and helped the team keep testing its beliefs in XP, iterative developments and code quality.

Product elevator statement

Imagine a well legac”ied” project you don’t know.
  • Product is a web forum with millions of messages.
  • We want to rebuild the categorization mechanism (messages are “categorized” meaning they are assigned to a category that best describes their content).
  • Mission : fix all bugs
  • “Short delay” and “no regression” are the words.
  • Only few people share the knowledge of the categories system to be refactored.
  • Numerous bugs (useless to mention that several generations of developers brought contributions to the project).
  • 20 commiters.
Background
From the team’s point of view, here are the goals we anticipated we needed to achieve:
  • Understand the expected behavior of the categorization mechanism
  • Bring no regression to the actual behavior
  • Replace the old mechanism by a new one
First decision we took was to use Git to work on that project. We won’t explain in details that choice (20 commiters, we wanted to avoid working in a dedicated branch for weeks and commit in the HEAD trunk of the project…). It has already been discussed here.

Refactoring strategy

As a Team, we decided to do the refactoring as follow:
  1. With the Product Owner, write BDD scenarii describing how the categories mechanism works
  2. Switch on the “Test Harness” by automating (implementing) the BDD scenarii
  3. Encapsulate ALL calls to the old categories mechanism behind an API (adding Unit Tests to that new API aswell)
  4. Based on the API contract, build the new mechanism relying on a new categories data model

1. Write BDD scenarii to describe the categorization behavior

First two weeks were spent “extracting” all the possible knowledge from the Product Owner about the product and translate it into BDD scenarii.
Example:

Given I am a visitor
When I go to url "http://www.infos-du-net.com.sf/forum/"
Then below the meta-category "Multimédia", I have the following sub-categories with content
| cat name                 | decrypted url                                      |
| Image et son             | http://www.infos-du-net.com.sf/forum/forum-20.html |
| Appareils photo, cameras | http://www.infos-du-net.com.sf/forum/forum-47.html |
| Consoles                 | http://www.infos-du-net.com.sf/forum/forum-29.html |
At the end of this step:
  • 100 BDD scenarii written
  • Shared knowledge of the expected application behavior

2. Switch on the “Test Harness”

We used Behat (PHP based) to implement the scenarii.
Some of the scenarii written with the Product Owner describe a behavior involving integration with third-party systems. They were not implemented because such tests, seen as “integration tests”, were seen as complicated and hard to maintain. We preferred to invest on Unit Tests by Contract (as well explained byJBrains).
Some scenarii were implemented but not automated because describing a behavior that highlights a bugor describing the future behavior. They got RED at the time of the implementation and would go GREEN by the end of the project.
At the end of this step:
  • The “Test Harness” is switched on !
  • Thanks to the Continuous Integration Platform, we are able to frequently test the categories behavior and ensure we will not break anything during the refactoring.

3. Encapsulate old categorization mechanism behind an API

Example of code BEFORE encapsulation (old DAO was FrmCategoryTable)

public function executeIndex(sfWebRequest $request) {

$categoryList = FrmCategoryTable::getForumList($idSite, $culture, $user);

}
In order to better test and avoid perturbation with other commiters, we’ve encapsulated all calls to the old category mechanism behind a new API.
We keep the calls to the old category mechanism, but we isolate them into a dedicated API.
Example of code AFTER encapsulation (new API is categoryProvider)

public function executeIndex(sfWebRequest $request) {

$categoryList = $this->categoryProvider->getAllCategories($culture, $brand, $country, ICategoryProvider::SERVICE_FORUM, $user);

}
Code that implements the new API

class CategoryProvider implements ICategoryProvider {
public function getAllCategories($culture, $brand, $country, $service, $user) {
$categoryList =
CatBrandAndCountryTable::getInstance()
->getAllCategories
($culture, $brand, $country, $service, $user);
return $categoryList;
}
}
At the end of this step:
  • The old mechanism is isolated behind an API
  • The “Test Harness” is still switched on !

4. Based on the API contract, build the new mechanism relying on the new categories data model

During the encapsulation step we’ve created the API that is the CONTRACT of our categories mechanism.
At this time we made the choice to start the implementation of the new API. It was probably not the best choice because for several days the new behavior was only partly implemented. We should have worked on another implementation of the API based on the CONTRACT we had extracted from the previous step.
Only once this is done, we should have switched from one implementation of the API to the other.
Code that implements the new API

class CategoryProvider implements ICategoryProvider {
public function getAllCategories($culture, $brand, $country, $service, $user) {

$categoryList = FrmCategoryTable::getForumList(
$siteId, $culture, $user, $categoryLevel);

}
}
At the end of this step:
  • The new mechanism is plugged (new DAO CatBrandAndCountryTable)
  • The “Test Harness” is still switched on !

Conclusion

  • Quite a big system was refactored without service interruption
  • No merge conflicts because we always committed in the trunk/HEAD
  • No projects conflicts because we isolated the pieces of code that were aimed to be re-factored
  • The writing of BDD scenarii WITH THE Product Owner helped implementing the right behavior and sharing the knowledge.

Comments

Popular posts from this blog

Navigating the Jungle of Web Traffic: A Technical Team Lead's Guide to "I'm a Celebrity, Get Me Out of Here"

TCP Handshake over IPv6

The Vital Importance of Secure Wi-Fi Networks