Monday, 24 December 2012

Testing Existing Code

When beginning unit testing it is hard to find a good starting point. There is plenty of theory and some very good points of reference, but what about actually getting down and dirty and hands-on, as so many of us love to do? Here are some ideas and pointers on how you might think about starting out with TDD and unit testing on an existing system. I am a PHP developer, primarily working on an Apache/MySQL stack, but I would expect these principles to carry over to any project if you substitute the appropriate technologies.

I will assume you have been through a few basic tutorials on PHPUnit, or are at least familiar with a unit testing framework of some description. I won't go through the set-up step by step, so I may assume you know a few things already.

1. I would begin by getting a local development copy set up and running. Using XAMPP and MySQL, get a copy of your web application running off localhost. Whether it only has a few stripped-down test rows or lots of data isn't really relevant; the point is that I prefer to have a local copy to work on.

2. Obtain your PHP testing framework and set it up in a directory above your public htdocs folder. I would begin by placing my PHPUnit files somewhere not publicly accessible, be that above your web directory (htdocs/public_html/www) or within a secure members or administration area. Not that I intend to have anything secretive in there; I just feel it's good practice. I would then create a test class that does the very basics - it doesn't even interact with my application, and simply asserts that true is true - to verify I have it up and running.
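
A minimal sanity-check test of that kind might look like the sketch below. The class and method names are my own; in a real install PHPUnit supplies the base class, and the small stub at the top only exists so the sketch can run standalone without PHPUnit present.

```php
<?php
// Stub so this sketch runs without PHPUnit installed; a real PHPUnit
// install provides PHPUnit_Framework_TestCase for you.
if (!class_exists('PHPUnit_Framework_TestCase')) {
    class PHPUnit_Framework_TestCase
    {
        public function assertTrue($condition)
        {
            if ($condition !== true) {
                throw new Exception('Failed asserting that condition is true.');
            }
        }
    }
}

class SanityTest extends PHPUnit_Framework_TestCase
{
    public function testTrueIsTrue()
    {
        // Deliberately trivial: proves the framework is wired up and running.
        $this->assertTrue(true);
    }
}
```

With PHPUnit installed, pointing the phpunit command at this file should report a single passing test, which is all we want at this stage.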

3. Next it is important to understand what unit testing is testing: single units of your code, in isolation. For example, a test for a method that counts the number of items in a shopping basket should focus only on that count, and not worry about asserting that the add or update item methods work. (You should have separate tests that verify those, and you should be confident enough in them to say, "This works, and I can use it in another test." The focus needs to stay on the counting method.)
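
To make the isolation point concrete, here is a minimal sketch (the Cart class and its method names are invented for illustration): the counting test arranges its own known state and asserts only the count, trusting that addItem() is verified by its own separate tests.

```php
<?php
// A hypothetical cart class, invented purely for illustration.
class Cart
{
    private $items = array();

    public function addItem($sku)
    {
        $this->items[] = $sku;
    }

    public function countItems()
    {
        return count($this->items);
    }
}

// The counting test: set up a known state, then assert ONLY the count.
// addItem() is covered by its own tests, so we trust it here.
$cart = new Cart();
$cart->addItem('sku-1');
$cart->addItem('sku-2');
assert($cart->countItems() === 2);
```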

4. I would now look to touch the application ever so slightly. Basically, we want to write an autoload function so we don't have to include all our classes for testing every time. The sneaky bit here is that we are going to override the database connection later. Depending on how that interaction works, this can get messy: if the connection code is inline it certainly will be, and even if it comes from a class that handles your database connections and queries (e.g. some sort of PDO or mysql_ wrapper), it might still get messy.
We want an autoload function that will search our application and include the class automatically, so we hope some form of naming convention is in place to help with this. If there are a few different places to search within the file system - say some classes live in a /core/ folder, while others sit within an individual module - you'll probably want to loop through these folders and check for the class file to include when the class is first instantiated in the code. The plan is to add a new location to this list, a place checked before anywhere else: a dummy or testing folder. This allows us to write a test database wrapper and, by giving it the same name as the live one, have it included before the autoloader finds the live class, so the tests use the test wrapper instead.
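
A sketch of that autoloader (the folder names are my own assumptions; adjust them to your layout). Note that spl_autoload_register() is generally preferable to defining __autoload() directly, since it lets you stack several loaders. The key trick is that the test-doubles folder is searched first, so a stub Database class placed there shadows the live one.

```php
<?php
// Search a list of folders for ClassName.php and include the first match.
// The tests/doubles folder is deliberately FIRST in the list, so a stub
// class with the same name as a live one wins during test runs.
function project_autoload($class)
{
    $folders = array(
        'tests/doubles/', // test stand-ins are found before anything else
        'core/',
        'modules/',
    );
    foreach ($folders as $folder) {
        $file = $folder . $class . '.php';
        if (file_exists($file)) {
            require $file;
            return;
        }
    }
}
spl_autoload_register('project_autoload');
```

With this in place, dropping a file like tests/doubles/Database.php into the doubles folder is all it takes to swap the live database class out from under the code being tested.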

5. If I have lots of files which don't rely on the database class, or any class at all, I will try to leave those for a bit and find one that does, to get me started. I move away from the MySQL database towards a file-based structure, or in one simple instance I used an in-memory PHP array. Understanding this was important: when I began instantiating my classes for testing, information was being returned from the database, or checks were being made that certain information was set or not set, and having at least a simple layer of abstraction in there from the start helped me get my head around how much the database was being queried.

6. Now it's a case of slowly building up the unit tests. For each class, I worked through each method, doing a bit of set-up (e.g. create a shopping cart), making my assertion (e.g. assert that the cart has no items), then destroying the cart. The clean-up is important to avoid any interference between tests. You want each test to have a clean start; you should know the state of the system before any test, no matter which tests have run before it. For example, a test for adding an item to the cart should empty and destroy the cart before the next test counts the number of items present - if it doesn't, the two could interfere.
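
The clean-slate discipline can be sketched with plain functions (the ShoppingCart class is invented for illustration): each test builds its own fresh cart and tears it down afterwards, so no test depends on whatever ran before it.

```php
<?php
// Invented example class for illustration.
class ShoppingCart
{
    private $items = array();

    public function addItem($sku)  { $this->items[] = $sku; }
    public function countItems()   { return count($this->items); }
    public function destroy()      { $this->items = array(); }
}

function test_add_item_increases_count()
{
    $cart = new ShoppingCart();          // fresh state: the "set-up"
    $cart->addItem('sku-1');
    assert($cart->countItems() === 1);
    $cart->destroy();                    // clean-up: the "tear-down"
}

function test_new_cart_is_empty()
{
    $cart = new ShoppingCart();          // unaffected by the previous test
    assert($cart->countItems() === 0);
    $cart->destroy();
}

test_add_item_increases_count();
test_new_cart_is_empty();
```

PHPUnit formalises this with setUp() and tearDown() methods, which it runs automatically before and after every test method in a test class.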

7. While working through legacy code, I would focus on building a base of tests for existing classes and models, with a view to going through any code which doesn't use classes later. I definitely think having all your model code in classes is the best route towards testing; your controllers and script pages should be as simple as possible to read and follow.

8. Now you have a suite of tests for your code, and can think about refactoring and rewriting it. You can make changes to your system in the knowledge that you can run these tests at any time, and know within 10 or 20 seconds whether you have broken any existing functionality.

Disclaimer: I am not saying this is a foolproof or perfect way to get into testing, but it is the way I got into it myself. I am by no means an expert on this subject, and would class myself as very much a novice in the area of testing. However, I wanted to get some information out there for anyone new to, or thinking about looking at, unit testing.

Comments welcome.

Monday, 17 December 2012

More Than a Dot

No one knows if it was a man, or a woman, or a child that first did it, but we do know that about 40,000 years ago, someone put a faint red dot on the wall of a cave in Spain.

Humankind has always felt a burning desire to record information and transmit it into the future. What that dot meant, no one really knows. Perhaps it was the first expression of binary?

Perhaps it was just a dot.

7,000 years later, and our ancestors were expressing themselves in the form of horses, panthers, cave bears, mammoths, and much more inside different caves, this time in the south of France. It’s the first example of recorded history, the first transmission into the future.

It took many thousands more years until we discovered writing. The Sumerians started scraping sigils into clay tablets and baking them — an act which preserved them almost forever, and because of it we know so much about their society and the way it worked.

Now, of course, we have the Web, and the world has been transformed. So much information is available that our biggest problems are sorting, classifying, filtering, and absorbing, and the rate at which information is growing is staggering. In 2010, we broke the zettabyte barrier, and in 2011, nearly 1.8 zettabytes of information was created, stored, and replicated. Numbers aren’t available for 2012 yet, but we can safely assume that it will have been greater, and that 2013 will be greater still.

The scale is phenomenal. If a penny represented every gigabyte of storage that was consumed, you could use nothing but pennies to construct a 1:1 replica of the Empire State Building in New York and still have some change left over for a cup of coffee when you were done with your labors.

As we are creating all this information, the way in which we are realizing we can use it is changing. We’re structuring the unstructured, and turning the unknowable into knowledge. We’re making tools for navigating the data more effectively, and we’re learning that sometimes raw information is knowledge — you just have to know to ask the right questions.

Improvements in browser technology are allowing web developers to push the boundaries of what was thought possible. Our interfaces are richer, and the interactivity is far greater than it has ever been before.

It’s not just the browser — far from it. As our processing spills out of our homes and offices and onto our smart phones and smart devices, the Web has ceased to be just about the browser. The Web has become a message bus. A conduit by which information is ferried back and forth, not just from server to client, but from client to client as well. I can share my thoughts and feelings with friends on the other side of the world instantly — to the extent that sometimes it doesn't feel as though there is any distance at all.

Each development spurs a rush of new apps and new ways of looking at older data, which creates new data that pushes us forward again. It’s not that our time to market is shrinking, it’s that our time to market has become instantaneous, and our tools and techniques are adapting to cope with that.

However, as the end of the year approaches, I get reflective, and I find myself looking at the Web and worrying about one thing. 40,000 years from now, when our descendants cast their gaze back on the Web, will they understand what we were saying?

Have we preserved the knowledge we’re creating for future generations, or are we just doing the best we can to keep up with the rate that we’re creating it? Can we be sure that we’re doing the equivalent of writing on clay tablets, baked in the sun and made to last forever?

And, once we are sure of that, will our children’s children’s children still be able to make sense of what we have to say, or have we become so myopic that we no longer care about transmissions further into the future than our own lifetimes? Will our apps still serve them as they have served us? Are we the architects of a library, or of Babylon?

I’d certainly like to think we’re creating more than just a dot.


http://webadvent.org/2012/more-than-a-dot-by-james-duncan

Writing Readable Code

http://annafilina.com/blog/writing-readable-code/

Wednesday, 24 October 2012

Introduction to TDD


1. What is TDD?

The steps of test-first development (TFD) are overviewed in the UML activity diagram of Figure 1.  The first step is to quickly add a test, basically just enough code to fail.  Next you run your tests (often the complete test suite, although for the sake of speed you may decide to run only a subset) to ensure that the new test does in fact fail.  You then update your functional code to make it pass the new test.  The fourth step is to run your tests again.  If they fail, you need to update your functional code and retest.  Once the tests pass, the next step is to start over (you may first need to refactor any duplication out of your design as needed, turning TFD into TDD).


I like to describe TDD with this simple formula:
   TDD = Refactoring + TFD.
TDD completely turns traditional development around. When you first go to implement a new feature, the first question you ask is whether the existing design is the best design possible to enable you to implement that functionality.  If so, you proceed via a TFD approach.  If not, you refactor it locally to change the portion of the design affected by the new feature, enabling you to add that feature as easily as possible.  As a result you will always be improving the quality of your design, thereby making it easier to work with in the future.
Instead of writing functional code first and then your testing code as an afterthought, if you write it at all, you instead write your test code before your functional code.  Furthermore, you do so in very small steps – one test and a small bit of corresponding functional code at a time.  A programmer taking a TDD approach refuses to write a new function until there is first a test that fails because that function isn’t present.  In fact, they refuse to add even a single line of code until a test exists for it.  Once the test is in place they then do the work required to ensure that the test suite now passes (your new code may break several existing tests as well as the new one).  This sounds simple in principle, but when you are first learning to take a TDD approach it requires great discipline, because it is easy to “slip” and write functional code without first writing a new test.  One of the advantages of pair programming is that your pair helps you to stay on track.
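
The cycle is easier to see in miniature. Here is a hedged sketch (the function and test names are mine, not from any particular codebase): the test was written first and failed because slugify() did not yet exist; then just enough code was written to make it pass.

```php
<?php
// Step 2 (green): just enough functional code to satisfy the test below.
// When the test was first written this function did not exist, and running
// the test failed with an undefined-function error -- that was the "red".
function slugify($title)
{
    return strtolower(str_replace(' ', '-', $title));
}

// Step 1 (red): the test came first.
function test_slugify_lowercases_and_hyphenates()
{
    assert(slugify('Hello World') === 'hello-world');
}

// Step 3: run the suite again; it passes, so refactor or move on.
test_slugify_lowercases_and_hyphenates();
```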
There are two levels of TDD:
  1. Acceptance TDD (ATDD).  With ATDD you write a single acceptance test, or behavioral specification depending on your preferred terminology, and then just enough production functionality/code to fulfill that test.  The goal of ATDD is to specify detailed, executable requirements for your solution on a just in time (JIT) basis. ATDD is also called Behavior Driven Development (BDD).
  2. Developer TDD. With developer TDD you write a single developer test, sometimes inaccurately referred to as a unit test, and then just enough production code to fulfill that test.  The goal of developer TDD is to specify a detailed, executable design for your solution on a JIT basis.  Developer TDD is often simply called TDD.
Figure 2 depicts a UML activity diagram showing how ATDD and developer TDD fit together.  Ideally, you'll write a single acceptance test, then to implement the production code required to fulfill that test you'll take a developer TDD approach. This in turn requires you to iterate several times through the write a test, write production code, get it working cycle at the developer TDD level.
Figure 2. How acceptance TDD and developer TDD work together.
Note that Figure 2 assumes that you're doing both, although it is possible to do either one without the other.  In fact, some teams will do developer TDD without doing ATDD (see survey results below), although if you're doing ATDD then it's pretty much certain you're also doing developer TDD.  The challenge is that both forms of TDD require practitioners to have technical testing skills, skills that many requirements professionals often don't have (yet another reason why generalizing specialists are preferable to specialists).
An underlying assumption of TDD is that you have a testing framework available to you.  For acceptance TDD people will use tools such as FitNesse or RSpec, and for developer TDD agile software developers often use the xUnit family of open source tools, such as JUnit or VBUnit, although commercial tools are also viable options.  Without such tools TDD is virtually impossible.  Figure 3 presents a UML state chart diagram for how people typically work with such tools.  This diagram was suggested to me by Keith Ray.

Figure 3. Testing via the xUnit Framework.
Kent Beck, who popularized TDD in eXtreme Programming (XP) (Beck 2000), defines two simple rules for TDD (Beck 2003).  First, you should write new business code only when an automated test has failed.  Second, you should eliminate any duplication that you find.  Beck explains how these two simple rules generate complex individual and group behavior:
  • You develop organically, with the running code providing feedback between decisions.
  • You write your own tests because you can't wait 20 times per day for someone else to write them for you. 
  • Your development environment must provide rapid response to small changes (e.g. you need a fast compiler and regression test suite).
  • Your designs must consist of highly cohesive, loosely coupled components (e.g. your design is highly normalized) to make testing easier (this also makes evolution and maintenance of your system easier too).
For developers, the implication is that they need to learn how to write effective unit tests.  Beck’s experience is that good unit tests:
  • Run fast (they have short setups, run times, and breakdowns).
  • Run in isolation (you should be able to reorder them).
  • Use data that makes them easy to read and to understand.
  • Use real data (e.g. copies of production data) when they need to.
  • Represent one step towards your overall goal.

2. TDD and Traditional Testing

TDD is primarily a specification technique with a side effect of ensuring that your source code is thoroughly tested at a confirmatory level.  However, there is more to testing than this.  Particularly at scale you'll still need to consider other agile testing techniques such as pre-production integration testing and investigative testing.  Much of this testing can also be done early in your project if you choose to do so (and you should). 
With traditional testing a successful test finds one or more defects.  It is the same with TDD; when a test fails you have made progress because you now know that you need to resolve the problem.  More importantly, you have a clear measure of success when the test no longer fails. TDD increases your confidence that your system actually meets the requirements defined for it, that your system actually works and therefore you can proceed with confidence. 
As with traditional testing, the greater the risk profile of the system, the more thorough your tests need to be. With both traditional testing and TDD you aren't striving for perfection; instead, you are testing to the importance of the system.  To paraphrase Agile Modeling (AM), you should "test with a purpose" and know why you are testing something and to what level it needs to be tested.  An interesting side effect of TDD is that you achieve 100% test coverage – every single line of code is tested – something that traditional testing doesn’t guarantee (although it does recommend it).  In general I think it’s fairly safe to say that although TDD is a specification technique, a valuable side effect is that it results in significantly better code testing than do traditional techniques.
 
If it's worth building, it's worth testing.
If it's not worth testing, why are you wasting your time working on it?

3. TDD and Documentation

Like it or not, most programmers don’t read the written documentation for a system; instead they prefer to work with the code.  And there’s nothing wrong with this.  When trying to understand a class or operation most programmers will first look for sample code that already invokes it.  Well-written unit tests do exactly this – they provide a working specification of your functional code – and as a result unit tests effectively become a significant portion of your technical documentation. The implication is that the expectations of the pro-documentation crowd need to reflect this reality.  Similarly, acceptance tests can form an important part of your requirements documentation.  This makes a lot of sense when you stop and think about it.  Your acceptance tests define exactly what your stakeholders expect of your system, therefore they specify your critical requirements.  Your regression test suite, particularly with a test-first approach, effectively becomes detailed executable specifications.
Are tests sufficient documentation?  Very likely not, but they do form an important part of it.  For example, you are likely to find that you still need user, system overview, operations, and support documentation.  You may even find that you require summary documentation overviewing the business process that your system supports.  When you approach documentation with an open mind, I suspect that you will find that these two types of tests cover the majority of your documentation needs for developers and business stakeholders.  Furthermore, they are a wonderful example of AM's Single Source Information practice and an important part of your overall efforts to remain as agile as possible regarding documentation.

4. Test-Driven Database Development

At the time of this writing an important question being asked within the agile community is “can TDD work for data-oriented development?”  When you look at the process depicted in Figure 1, it is important to note that none of the steps specify object-oriented programming languages, such as Java or C#, even though those are the environments TDD is typically used in.  Why couldn't you write a test before making a change to your database schema?  Why couldn't you make the change, run the tests, and refactor your schema as required?  It seems to me that you only need to choose to work this way.
My guess is that in the near term database TDD, or perhaps Test-Driven Database Design (TDDD), won't work as smoothly as application TDD.  The first challenge is tool support.  Although unit-testing tools, such as DBUnit, are now available, they are still an emerging technology at the time of this writing. Some DBAs are improving the quality of the testing they are doing, but I haven’t yet seen anyone take a TDD approach to database development.  One challenge is that unit testing tools are still not well accepted within the data community, although that is changing, so my expectation is that over the next few years database TDD will grow.  Second, the concept of evolutionary development is new to many data professionals and as a result the motivation to take a TDD approach has yet to take hold.  This issue affects the nature of the tools available to data professionals – because a serial mindset still dominates within the traditional data community, most tools do not support evolutionary development.  My hope is that tool vendors will catch on to this shift in paradigm, but my expectation is that we'll need to develop open source tools instead.  Third, my experience is that most people who do data-oriented work seem to prefer a model-driven, not a test-driven, approach.  One likely cause is that a test-driven approach hasn't been widely considered until now; another is that many data professionals are visual thinkers and therefore prefer a modeling-driven approach.
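
To illustrate that it really is just a choice, here is a sketch of a test-first schema change using an in-memory SQLite database via PDO (a stand-in for whatever DBMS you actually use; the table and the column_exists() helper are invented for this sketch). The test asserts that a column exists: run it before the ALTER and it fails; apply the change, and it passes.

```php
<?php
// In-memory SQLite stands in for the real database. Table name, column
// names, and the column_exists() helper are all invented for this sketch.
$pdo = new PDO('sqlite::memory:');
$pdo->exec('CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)');

// Check the schema for a named column using SQLite's table_info pragma.
function column_exists(PDO $pdo, $table, $column)
{
    foreach ($pdo->query("PRAGMA table_info($table)") as $row) {
        if ($row['name'] === $column) {
            return true;
        }
    }
    return false;
}

assert(column_exists($pdo, 'customers', 'email') === false); // red: test fails
$pdo->exec('ALTER TABLE customers ADD COLUMN email TEXT');   // the schema change
assert(column_exists($pdo, 'customers', 'email') === true);  // green: test passes
```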

5. Scaling TDD via Agile Model-Driven Development (AMDD)

TDD is very good at detailed specification and validation, but not so good at thinking through bigger issues such as the overall design, how people will use the system, or the UI design (for example).  Modeling, or more to the point agile model-driven development (AMDD) (the lifecycle for which is captured in Figure 4), is better suited for this.  AMDD addresses the agile scaling issues that TDD does not.
Figure 4. The Agile Model Driven Development (AMDD) lifecycle.
Comparing TDD and AMDD:
  • TDD shortens the programming feedback loop whereas AMDD shortens the modeling feedback loop.
  • TDD provides detailed specification (tests) whereas AMDD is better for thinking through bigger issues.
  • TDD promotes the development of high-quality code whereas AMDD promotes high-quality communication with your stakeholders and other developers.
  • TDD provides concrete evidence that your software works whereas AMDD supports your team, including stakeholders, in working toward a common understanding.
  • TDD “speaks” to programmers whereas AMDD speaks to business analysts, stakeholders, and data professionals.
  • TDD provides very finely grained concrete feedback on the order of minutes, whereas AMDD enables verbal feedback on the order of minutes (concrete feedback requires developers to follow the practice Prove It With Code, and thus becomes dependent on non-AM techniques).
  • TDD helps to ensure that your design is clean by focusing on creation of operations that are callable and testable whereas AMDD provides an opportunity to think through larger design/architectural issues before you code.
  • TDD is non-visually oriented whereas AMDD is visually oriented.
  • Both techniques are new to traditional developers and therefore may be threatening to them.
  • Both techniques support evolutionary development.
Which approach should you take?  The answer depends on your, and your teammates', cognitive preferences.  Some people are primarily "visual thinkers", also called spatial thinkers, and they may prefer to think things through via drawing.  Other people are primarily text-oriented, non-visual or non-spatial thinkers, who don't work well with drawings and therefore may prefer a TDD approach.  Of course most people land somewhere in the middle of these two extremes, and as a result they prefer to use each technique when it makes the most sense.  In short, the answer is to use the two techniques together so as to gain the advantages of both.
How do you combine the two approaches?  AMDD should be used to create models with your project stakeholders to help explore their requirements and then to explore those requirements sufficiently in architectural and design models (often simple sketches).  TDD should be used as a critical part of your build efforts to ensure that you develop clean, working code.  The end result is that you will have a high-quality, working system that meets the actual needs of your project stakeholders.

6. Why TDD?

A significant advantage of TDD is that it enables you to take small steps when writing software.  This is a practice that I have promoted for years because it is far more productive than attempting to code in large steps.  For example, assume you add some new functional code, compile, and test it.  Chances are pretty good that your tests will be broken by defects that exist in the new code.  It is much easier to find, and then fix, those defects if you've written two new lines of code than two thousand. The implication is that the faster your compiler and regression test suite, the more attractive it is to proceed in smaller and smaller steps.  I generally prefer to add a few new lines of functional code, typically less than ten, before I recompile and rerun my tests.
I think Bob Martin says it well “The act of writing a unit test is more an act of design than of verification.  It is also more an act of documentation than of verification.  The act of writing a unit test closes a remarkable number of feedback loops, the least of which is the one pertaining to verification of function”.
The first reaction that many people have to agile techniques is that they're OK for small projects, perhaps involving a handful of people for several months, but that they wouldn't work for "real" projects that are much larger.  That’s simply not true.  Beck (2003) reports on a Smalltalk system built with a completely test-driven approach, which took 4 years and 40 person-years of effort, resulting in 250,000 lines of functional code and 250,000 lines of test code.  There are 4,000 tests running in under 20 minutes, with the full suite being run several times a day.  Although there are larger systems out there (I've personally worked on systems where several hundred person-years of effort were involved), it is clear that TDD works for good-sized systems.

7. Myths and Misconceptions

There are several common myths and misconceptions which people have regarding TDD which I would like to clear up if possible. Table 1 lists these myths and describes the reality.
Table 1. Addressing the myths and misconceptions surrounding TDD.
Myth: You create a 100% regression test suite.
Reality: Although this sounds like a good goal, and it is, it unfortunately isn't realistic, for several reasons:
  • I may have some reusable components/frameworks/... which I've downloaded or purchased which do not come with a test suite, nor perhaps even with source code.  Although I can, and often do, create black-box tests which validate the interface of the component, these tests won't completely validate the component.
  • The user interface is really hard to test.  Although user interface testing tools do in fact exist, not everyone owns them and sometimes they are difficult to use.  A common strategy is to not automate user interface testing but instead to hope that user testing efforts cover this important aspect of your system.  Not an ideal approach, but still a common one.
  • Some developers on the team may not have adequate testing skills.
  • Database regression testing is a fairly new concept and not yet well supported by tools.
  • I may be working on a legacy system and may not yet have gotten around to writing tests for some of the legacy functionality.

Myth: The unit tests form 100% of your design specification.
Reality: People new to agile software development, people claiming to be agile but who really aren't, or perhaps people who have never been involved with an actual agile project, will sometimes say this.  The reality is that the unit tests form a fair bit of the design specification, and similarly acceptance tests form a fair bit of your requirements specification, but there's more to it than this.  As Figure 4 indicates, agilists do in fact model (and document, for that matter); it's just that we're very smart about how we do it.  Because you think about the production code before you write it, you effectively perform detailed design.  I highly suggest reading my Single Source Information: An Agile Practice for Effective Documentation article.

Myth: You only need to unit test.
Reality: For all but the simplest systems this is completely false.  The agile community is very clear about the need for a host of other testing techniques.

Myth: TDD is sufficient for testing.
Reality: TDD, at the unit/developer test level as well as at the customer test level, is only part of your overall testing efforts.  At best it comprises your confirmatory testing efforts, but as Figure 5 shows you must also be concerned about independent testing efforts which go beyond this.  See Agile Testing and Quality Strategies: Reality over Rhetoric for details about agile testing strategies.

Myth: TDD doesn't scale.
Reality: This is partly true, although easy to overcome.  TDD scalability issues include:
  1. Your test suite takes too long to run.  This is a common problem with equally common solutions.  First, separate your test suite into two or more components.  One test suite contains the tests for the new functionality that you're currently working on; the other contains all tests.  You run the first test suite regularly, migrating older tests for mature portions of your production code to the overall test suite as appropriate.  The overall test suite is run in the background, often on a separate machine (or machines), and/or at night.  At scale, I've seen several levels of test suite -- development sandbox tests which run in 5 minutes or less, project integration tests which run in a few hours or less, and a test suite that runs for many hours or even several days and is run less often.  On one project I have seen a test suite that runs for several months (the focus is on load/stress testing and availability).  Second, throw some hardware at the problem.
  2. Not all developers know how to test.  That's often true, so get them some appropriate training and get them pairing with people with unit testing skills.  Anybody who complains about this issue more often than not seems to be looking for an excuse not to adopt TDD.
  3. Everyone might not be taking a TDD approach.  Taking a TDD approach to development is something that everyone on the team needs to agree to do.  If some people aren't doing so, then in order of preference: they either need to start, they need to be motivated to leave the team, or your team should give up on TDD.

Figure 5. Overview of testing on agile project teams.


8. Who is Actually Doing This?

Unfortunately the adoption rate of TDD isn't as high as I would hope.  Figure 6, which summarizes results from the 2010 How Agile Are You? survey, provides insight into which validation strategies are being followed by the teams claiming to be agile.  I suspect that the adoption rates reported for developer TDD and acceptance TDD, 53% and 44% respectively, are much more realistic than those reported in my 2008 Test Driven Development (TDD) Survey.

Figure 6. How agile teams validate their own work.
Agile Criterion: Validation

9. Summary

Test-driven development (TDD) is a development technique where you must first write a test that fails before you write new functional code.  TDD is being quickly adopted by agile software developers for development of application source code and is even being adopted by Agile DBAs for database development.  TDD should be seen as complementary to Agile Model Driven Development (AMDD) approaches and the two can and should be used together. TDD does not replace traditional testing, instead it defines a proven way to ensure effective unit testing.  A side effect of TDD is that the resulting tests are working examples for invoking the code, thereby providing a working specification for the code. My experience is that TDD works incredibly well in practice and it is something that all software developers should consider adopting.

10. Tools

The following is a representative list of TDD tools available to you.  Please email me with suggestions.  I also maintain a list of agile database development tools.
.Net developers may find this comparison of .Net TDD tools interesting.

11. References and Suggested Online Readings

http://www.agiledata.org/essays/tdd.html

Wednesday, 15 August 2012

Validating Emails in PHP


A very quick snippet today because I've told two people to use this approach in the last few days and both of them told me they didn't know about it. How to check if an email address is valid in PHP: use one of the Filter functions, like this:
 
$email1 = "nonsense.something@dottiness";  // not a valid email
$email2 = "dotty@something.whatever";  // valid email
 
$clean_email1 = filter_var($email1, FILTER_VALIDATE_EMAIL); // $clean_email1 = false
$clean_email2 = filter_var($email2, FILTER_VALIDATE_EMAIL); // $clean_email2 = dotty@something.whatever
The Filter extension was new in PHP 5.2, but is one of the unsung heroes of the language. It's rare for me to ever describe one approach as the "right" way to do something - but for validating data, Filter really is excellent, offering both validating and sanitising filters and generally making it super-easy to clean up incoming variables. Many of the frameworks offer equivalents, and I'm sure many of those are wrapping this too.
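The sanitising side mentioned above deserves a quick sketch too. Here's a hedged example (the variable names and the messy input are invented for illustration): FILTER_SANITIZE_EMAIL strips characters that aren't legal in an address, but it doesn't guarantee the result is valid, so always validate afterwards.

```php
<?php
// An address pasted in with stray whitespace (invented example).
$raw = " dotty @example.com\n";

// FILTER_SANITIZE_EMAIL removes characters not permitted in an
// email address, including the spaces and trailing newline here.
$sanitised = filter_var($raw, FILTER_SANITIZE_EMAIL);
// $sanitised is now "dotty@example.com"

// Sanitising is not validating: run the validate filter afterwards.
$clean = filter_var($sanitised, FILTER_VALIDATE_EMAIL);
// $clean is "dotty@example.com", or false had it still been invalid
```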

Wednesday, 8 August 2012

Coding Standards

I've recently come across http://pear.php.net/manual/en/standards.php, which got a few ideas jumping about my head, and reminded me of a tool I use from time to time for curiosity purposes more than anything - phpcs or PHP CodeSniffer (http://pear.php.net/package/PHP_CodeSniffer/).

I occasionally run phpcs against some of the codebases I work on, just to see if there seems to be any structure or standard being followed throughout the system. The results are usually verbose and stressful to even begin trying to fix, and I sometimes wonder if coding standards are used effectively anywhere.

With the many techniques and ways of completing a task in PHP, and with PHP developers becoming more and more OOP-oriented, developers are beginning to play around with inheritance and abstraction, namespaces and interfaces, getters and setters: all the buzzwords and terms which lead to more and more overly complex code, for no reason other than "we used all of the above", and as a result "our code must be cutting edge".

But the issue here comes at the cost of maintenance. I guess this can be classed as Technical Debt and it isn't necessarily the fault of the developer either.

In some cases developers may walk into a team where no standards are currently in place, and it is up to that developer to set a standard. Some developers will be happy to struggle along and hack and slash code until it works and leave satisfied at that, but there will come a point when they suddenly realise a standard is the way forward. But what should a general coding standard cover?

I definitely don't think a coding standard document should be hundreds of pages long, but nor should it be a couple of sentences. Somewhere in the middle, I guess, is between 8 and 10 pages: just long enough to set the foundations for new developers, and not so long that you lose them and they wonder what they're walking into.

I think a common coding standard should cover 6 key issues:

1.  Naming Conventions
2.  File Naming and Organization
3.  Formatting and Indentation
4.  Comments and Documentation
5.  Classes, Functions and Interfaces
6.  Testing

I'll discuss each of these in a few lines.

1. Naming Conventions
Naming conventions here should cover pretty general variable naming standards. Although some of the standard may be obvious to programmers of, say, C or Java, as they would be used to explicitly declaring the types of their variables, it might be good practice to adopt a similar standard even within PHP, where things are a little more flexible. For example, would you like string variables to be prefixed with str (strName), or integer values to be prefixed by int (intAge)? Should variables be all lowercase with underscores (str_name) to separate "words", or camelcase (strFirstName) to represent new words?
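To make that concrete, here is a tiny invented sketch of the two styles (nothing here is from a real standard; it's purely illustrative):

```php
<?php
// Option A: type prefixes with camelCase, as a C or Java
// programmer might expect.
$strFirstName = "Ada";
$intAge       = 36;

// Option B: all lowercase with underscores separating words.
$first_name = "Ada";
$age        = 36;

// Either works; the standard's job is simply to pick one, so a
// reader can infer a variable's type and meaning from its name.
```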

2. File naming and organisation
File naming and organisation should specify how files are named and the folder structure they follow. For example, are all your class files in a folder called classes, or a sub-folder structure like application/core/classes? Or are your files grouped into modules, with a classes folder inside each module? Where do images go, both structural and customer-uploaded content if applicable? Would you rather everything was above the public_html directory and accessed through a controller file? Specify the site and file organisation at this point.

3. Formatting and Indentation
Some developers swear by tabs, some swear by 4 spaces. Some just want to cram as much as possible on one line! Decide on a standard that you would like everyone to follow that is easy to read and maintain. If developers are using different IDEs, tabs might not be a good idea, as some IDEs render tabs at different widths, whereas 4 (or 6 or 8, whatever you like) spaces are pretty easy to replicate no matter what IDE is used.

4. Comments and Documentation
I remember this from as far back as High School: "Documentation, documentation, documentation". The bane of my computing life back then, but now I'm in the real world, how wrong I was to hate it. If I could have documentation for every piece of code I've used it would be amazing. The bottom line here is that developers don't like documenting, but set it out in your standard: at least document the purpose of classes or files in the header (I'm thinking in phpDoc style here), and comment complex pieces of code using a consistent comment style. If developers can see this across the rest of the code, they'll be more inclined to follow suit.

5. Classes, Functions and Interfaces
A standard here could be that concrete classes are never passed into an object (type-hint against an interface instead), and that every class must implement an interface. How your classes and methods are named should also fall under this category. For example, class names should always be capitalised and singular (e.g. Cart or Post), and all methods should be private except getters and setters, which are prefixed with 'get' or 'set' and camelCased (e.g. getPostTitle() or setDeliveryCharge()).
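As a rough illustration of what such a standard might produce (the BasketInterface and Cart names are invented for this sketch, not taken from any real codebase):

```php
<?php
// Every class implements an interface, and collaborators are
// type-hinted against the interface, never the concrete class.
interface BasketInterface
{
    public function setDeliveryCharge(float $charge): void;
    public function getDeliveryCharge(): float;
}

class Cart implements BasketInterface
{
    private float $deliveryCharge = 0.0;

    // Setter prefixed with 'set', camelCased.
    public function setDeliveryCharge(float $charge): void
    {
        $this->deliveryCharge = $charge;
    }

    // Getter prefixed with 'get', camelCased.
    public function getDeliveryCharge(): float
    {
        return $this->deliveryCharge;
    }
}

// Accepts any BasketInterface, not just Cart.
function totalWithDelivery(BasketInterface $basket, float $subtotal): float
{
    return $subtotal + $basket->getDeliveryCharge();
}
```

A caller would then do $cart = new Cart(); $cart->setDeliveryCharge(4.99); and totalWithDelivery($cart, 20.00) would give 24.99, while remaining free to swap Cart for any other BasketInterface implementation.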

6. Testing
How is your code tested? Do you have a test suite that is run on check-in, or do you manually write tests for your methods as and when you create them, showing inputs and outputs? A lot of organisations these days are moving towards CI (Continuous Integration) solutions where test suites are automatically run (a blog post I'm looking forward to writing). Some have unit or functional tests which are manually run before code is checked into a repository, and some just look at the code and go 'ok, yeah, that works'; many of us have been pretty good at the latter in the past. You may also have a QA team, person or phase of development to go through and sign off checklists before work is passed back to the client. Whatever method works for you, document it and make sure everyone does the same.

Crikey, I haven't half gone on there! Thank you if you're still with me. The bottom line, I guess, is: are coding standards a good thing or a waste of time and money? I believe they are a good thing. If implemented correctly, they set the foundations for any new developers joining your team and maintain high standards of code quality. It also helps with maintenance: if a piece of work is picked up by another team, they're not completely in the dark. They know that if a variable is all in capitals, it's a constant defined in the configuration file, and that this configuration file is always in the same place on every project. Also, if they're working on a specific area, they know to check out the unit tests or certain areas of documentation for details on that area of code.

Thanks for reading.

Monday, 6 August 2012

include, require, include_once or require_once

Well, the question today is: which is the best or correct one to use?

There are a few reasons to choose between include and require, I suppose, based on the importance of the file you are 'including'.  If I was including a config or settings file with critical information in it, or a bootstrap loader file which all my files need, I'd be inclined to use require over include, as the file is Required, not an optional file that the site can load without.  On the other hand, if I was pulling in a file with some adverts on it, or a banner from another template file, I'd be more inclined to use include, as this isn't mission-critical to the site loading.  OK, I'd really like it there, but if there is an issue for whatever reason, I still want the rest of the page to continue loading.

Next comes the question of include/require vs include_once/require_once.
I guess the age-old discussion of performance comes to mind. If I know for a fact (and I should, it's my project) that the file I'm pulling in is only read in once, I shouldn't need to use the *_once statements at all. But if for whatever reason I decide I do, as far as I'm aware, the process of calling:

require_once 'require_me.inc.php';

The system is going to go:

  • Ok, we require this file 'require_me.inc.php'
  • has it been required already? Let's check..
  • [check previous list of required files in this execution]
  • if it has been, don't include it again
  • [continue script]
If I choose to just require it:

require 'require_me.inc.php';

The system should go:

  1. Ok we require this file 'require_me.inc.php'
  2. pull it in now
  3. [continue script]
OK, now that process takes microseconds (if not faster) on server set-ups these days, but scale this up to a situation where you may have huge numbers of files included: is this going to cause performance issues?  I'm not sure. It would be interesting to benchmark the exact differences in performance of the statements, and I might in fact do that someday, just not today.

If anyone is prepared to benchmark these statements for me, I would be interested to see the results.
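In the meantime, here's a rough sketch of the sort of micro-benchmark I have in mind. The file name and iteration count are arbitrary, and real numbers will vary wildly with opcode caching and the filesystem, so treat it as a starting point rather than a verdict:

```php
<?php
// Create a trivial file to pull in repeatedly.
$file = sys_get_temp_dir() . '/require_me.inc.php';
file_put_contents($file, "<?php\n// nothing to see here\n");

$iterations = 10000;

// Time require_once: every call re-checks the already-included list.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    require_once $file;
}
$onceTime = microtime(true) - $start;

// Time include: note plain require isn't used here, because pulling
// in a file 10,000 times with require would redeclare any functions
// it contains (ours is empty, so include is safe).
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    include $file;
}
$includeTime = microtime(true) - $start;

printf("require_once: %.4fs, include: %.4fs\n", $onceTime, $includeTime);
unlink($file);
```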

Also, you may have noticed that in the code snippets above I used require 'file.php'; as opposed to require('file.php');. This is because include and require are statements (language constructs), as opposed to functions which one passes arguments into, in the same way that I'd call new MyClass(); as opposed to new(MyClass());

Thanks for reading.

Saturday, 4 August 2012

Portable PHPUnit

So, looking into PHPUnit again this morning, it would be great to have a "portable" set-up.  The usual way to set up PHPUnit is to install the PEAR package.  However, it turns out it is possible to have a more portable set-up which can be moved between my XAMPP USB pen, WAMP laptop setup and the LAMP stack at work, without needing PHPUnit installed under each separate hosting account.

Found this post the other day: http://stackoverflow.com/questions/4801183/php-is-there-a-portable-version-of-phpunit.  Well, in actual fact I found the process on another site, but that one sums it up pretty well I think.  And I don't think you'd need to use the git clones either; you could, I suppose, download the source from GitHub as an archive and extract it.
The main thing to remember, though, is to update your include_path (via set_include_path()) to include the paths to all the modules you've downloaded.
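Something along these lines at the top of a bootstrap or test-runner script would do it. The folder names here are placeholders for wherever you extracted the downloaded modules, so adjust to taste:

```php
<?php
// Prepend the extracted modules to the include path so their
// internal require statements can find each other.
$base = __DIR__ . '/vendor-portable';   // hypothetical extraction folder

set_include_path(implode(PATH_SEPARATOR, array(
    $base . '/phpunit',
    $base . '/php-code-coverage',
    $base . '/php-token-stream',
    get_include_path(),                  // keep the existing path too
)));
```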


I will have to have a play around with this, as I have PHPUnit PEAR packages all installed on my localhost already, but am keen to have a portable solution.

Friday, 3 August 2012

Me and TDD

OK, so I'm looking into Test Driven Development (TDD).

Now, I've looked into working with it before within the PHP environment I use daily, but sometimes feel a bit overwhelmed by it all.
Every time I read up about it I get a different slant on things, and always learn something new from the time before; I feel I am getting to grips with the basic principles now.

The big question is whether to start from scratch or attempt to write tests for the existing codebases I have.

I have a personal codebase, albeit pretty dated, and then there are the (many) styles and codebases used at work for me to choose from.  Part of the dilemma is that if I use any work code, I can't post it online or use it in my own work.  Although, to be honest, when work finishes, usually the last thing I want to do is come home and be glued to the laptop writing code.  So that explains why I haven't made much progress over the past 3 years, and only dip into blog articles and posts on TDD and Unit Testing from time to time.

Then there's the choice of PHPUnit or SimpleTest, and whether to use a framework or not; and if I do use a framework, most of them have testing classes and modules integrated anyway.

Along with beginning to look into TDD, a lot of the examples I am finding are for calculator and string classes, and are very specific to testing the logic of those classes.  When I think about applying these to the monolithic grouping of PHP files and classes that comprise bespoke and custom-built ecommerce and CMS systems, I suddenly become overwhelmed by the sheer amount of code I'm going to have to write.  I mean, OK, I can write a unit test to verify an add() method works correctly, returns all the expected results given this input and that input, and throws exceptions or whatever.  But what about scaling that up to handle items with stock levels, sizes and prices that can all change, mixed in with a checkout or buying process, and whether an order was successfully completed?  Including discounts and shipping costs and members being logged in and all sorts: it just blows my mind trying to wrap all that up in tests.
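To ground that, here's the sort of small, isolated test I mean. The Basket class and its methods are invented for illustration; I've used plain assert() calls so the sketch is self-contained, but in PHPUnit each check would become a test method using $this->assertSame() and expectException():

```php
<?php
// A tiny unit under test, invented for illustration.
class Basket
{
    private array $items = [];

    public function add(string $sku, int $qty): void
    {
        if ($qty < 1) {
            throw new InvalidArgumentException('Quantity must be at least 1');
        }
        $this->items[$sku] = ($this->items[$sku] ?? 0) + $qty;
    }

    public function countItems(): int
    {
        return array_sum($this->items);
    }
}

// Each check focuses on one behaviour in isolation.
$basket = new Basket();
$basket->add('SKU-1', 2);
$basket->add('SKU-2', 1);
assert($basket->countItems() === 3);   // counting, given known adds

try {
    $basket->add('SKU-3', 0);
    assert(false);                     // should not be reached
} catch (InvalidArgumentException $e) {
    // expected: invalid quantity rejected
}
```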

I guess the key is to do it in little bits at a time.  Focus on possibly a member, then items, then the process of building a basket, then the ordering process, then the checkout and confirm stages.  Tying that all in with a database, while trying not to make so drastic a change that I may as well have rewritten it from scratch, isn't easy.

So it continues, my research into Unit Testing and TDD, and Acceptance Testing and ATDD.  I am leaning towards PHPUnit, but when I start to think about the Mocks and Stubs needed to isolate the tests, I feel I'm going to end up writing so much more code trying to refactor my existing codebase that I keep leaning towards a rewrite!

And why don't I just do it? Time, I guess. When customers need a feature, they want it now, and they want to pay for that feature, not for me going back and rewriting the existing system.

I would love to write more posts about my journey into actually beginning to follow TDD a lot more strictly, but I don't think that will happen today.  Although, it might!

Comments welcome :)

Monday, 16 April 2012

MySQL SELECT COUNT(*) FROM issue with LIMIT

Recently, while working on my core database class, I found an issue with my num_of_rows() method.

I was parsing my query to do various things, which basically boiled down to running a query like:
[code]SELECT COUNT(*) FROM `table_name`[/code]
to return the number of rows generated by this query.

Now, when I started introducing pagination in my front end, I wanted to utilise LIMIT/OFFSET in my queries, like so:
[code]SELECT COUNT(*) FROM `table_name` LIMIT 30, 10[/code]
which I would hope would return 10 (or fewer, if the table has fewer than 40 rows).

Anyway, this was throwing a wobbler in my PDO class. I wasn't getting any error or feedback that there was an issue, and in phpMyAdmin I was simply getting "MySQL returned an empty result set (i.e. zero rows)". (It turns out COUNT(*) aggregates the whole table into a single row, and the LIMIT is then applied to that one-row result, so an offset of 30 skips straight past it.)

Baffled, I took to Google and found this workaround:

[code]SELECT COUNT(*) FROM (SELECT * FROM `projects` LIMIT 30,10) AS subquery[/code]
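In PDO that workaround looks something like the sketch below. I'm using an in-memory SQLite database here purely so the example is self-contained; the same subquery works against MySQL:

```php
<?php
// In-memory SQLite stands in for MySQL; the SQL pattern is identical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE projects (id INTEGER PRIMARY KEY)');
for ($i = 0; $i < 45; $i++) {
    $pdo->exec('INSERT INTO projects DEFAULT VALUES');
}

// COUNT(*) over the LIMITed subquery, not the other way round.
$stmt = $pdo->query(
    'SELECT COUNT(*) AS n FROM (SELECT * FROM projects LIMIT 30, 10) AS subquery'
);
$count = (int) $stmt->fetchColumn();
// $count is 10: rows 31-40 of the 45 in the table
```

Against MySQL you'd just swap the DSN and credentials in the PDO constructor; the SELECT itself is unchanged.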

Happy coding

Friday, 23 March 2012

How to Promote a New Product on Your Ecommerce Site

Retailers frequently add new product lines, brands, or even seasonal items. When these fresh items are introduced, marketers call it a product launch. Product launches are important and frequent events in ecommerce marketing that may be managed with a promotional recipe, removing uncertainty and ensuring measurable results. This sort of marketing is common and, therefore, is best managed in a uniform way within each retailer.
This product launch recipe does not imply some lack of creativity. Rather, it suggests the marketing tactics that each of these events should include, allows for consistent and comparable data from launch to launch, and makes it possible to improve common resources like email lists.

A Common Thread

A good product launch recipe begins with a common strategic goal for each launch event. This should be a standard goal — or type of goals — for all new product or brand introductions. For example, a certain number of sales of the product, the acquisition of some number of new customers, or an increase in registrations associated with the launch would all be reasonable product launch goals.
If this goal is similar for each new product launch, these successive launches may be compared and contrasted, identifying the types of products that sell best so that future lines may be chosen based on past successes.

Common Measurement

While marketing measurement should evolve with new technologies, new data, and new channels, product launch marketing should include a set of common data points that may be charted and analyzed from campaign to campaign.
These common measurements can be used collectively to improve product launch marketing overall and individually to compare specific launch events.

Common Resources

Product launches should draw on a common set of resources like email lists, Facebook followers, or even pay-per-click advertising experience. In this way, each new launch both makes use of existing resources and helps to increase the quality and size of those resources.

Tactics Are the Ingredients

A cooking recipe seeks to make some known dish, and a product launch marketing recipe should likewise aim for a common result, like the aforementioned goals.
Marketing tactics are the ingredients that make up the product launch recipe. When combined in a given way they should achieve the product launch goals. What follows is a particular product launch recipe that is currently in use. It can be modified to meet particular goals or situations.

Most webmasters don't know how their websites got hacked, report says

According to a post on networkworld.com, 63% of webmasters whose websites get hacked don't know how the compromise occurred!

The leading cause of website compromises appears to be outdated content management software (CMS). This was indicated as a reason for their websites being hacked by 20 percent of respondents.
Twelve percent of webmasters said that a computer used to update their website was infected with malware, 6 percent said that their credentials were stolen, and 2 percent admitted logging in while using wireless networks or public PCs. However, 63 percent of respondents didn't know how their websites got compromised.
The CMS platform most commonly installed on compromised websites was WordPress, as indicated by 28 percent of respondents. However, WordPress accounts for over 50 percent of the entire CMS market according to data from w3techs.com, so the ratio of hacked WordPress websites to the platform's actual install base is better than that of other CMS platforms like Joomla or osCommerce.
Almost half of respondents -- 49 percent -- learned that their websites had been compromised through browser or search engine alerts. Eighteen percent were notified by their colleagues or friends, 10 percent by a security organization and 7 percent by their hosting provider. Only 6 percent of respondents discovered the compromise on their own by noticing suspicious or increased activity on their websites, the report said.
A third of respondents didn't know how their websites had been abused after being hacked. Those that did know pointed to the hosting of malware and rogue redirect scripts as being the most common forms of abuse -- 25 percent and 18 percent, respectively.
Many webmasters -- 46 percent -- managed to fix the compromise on their own, using information available on help forums and other websites. Twenty percent fixed the problem by following instructions received from security companies and 14 percent with the help of their hosting providers.
However, more than a quarter of respondents indicated that their websites remained compromised after trying several approaches.
Overall, 28 percent of webmasters said that they are considering switching Web hosting providers after their hacking experience. The survey found that webmasters were three times more likely to consider leaving Web hosting providers that charged extra for helping them address the problem or refused to provide support.
The report concluded that many webmasters are not fully aware of the threats their websites are exposed to and how to deal with possible compromises. Taking basic security precautions like keeping CMS software and plug-ins up to date, using strong and varied passwords that aren't stored on local machines, and regularly scanning computers for malware can go a long way to prevent website hacking incidents, the report said.
