My Object Mentor Apprenticeship

Tuesday, July 28, 2009

Day 44

Yesterday and today were both spent working with my Dad on changing the way FitNesse deals with suites. In the past, FitNesse would save a suite page (a page that runs many test pages) as one massive file with all of the results from each page inside of it, and it would display a suite page as a summary of all the test page results followed by all the test results themselves. So anytime you ran a fairly large suite you would end up with this massive page with tons and tons of results on it. You would also get a history file for just the suite, and it would contain this same huge mass of information.
Now, we have changed FitNesse so that when it saves the suite page's history, it actually saves the history for all the individual test pages and only saves the links to those pages in the suite history file. It also only displays the suite's results as a summary of the individual test pages' results, with links to each of those pages. This strategy makes more sense, and saves memory since the browser no longer needs to hold this super huge suite page.

Friday, July 24, 2009

Day 42

Today is the ultimate day. Today is Day 42 of my Object Mentor Apprenticeship, and oh what a day it is.

I must say, I have thought I was done with the HistoryComparer time and time again, but I keep needing to improve it. All that work and refactoring that I had done before for the findBestMatch stuff, I deleted all of it... Turns out, it makes more sense to save all of the matches, no matter their score or which tables are matched, then order the matches by score (highest to lowest), and then remove any match that uses the same table as a higher-scoring match. This way each table has only one match, and it is always matched with its best match. The reason for this change was a bug we found: if lefthand table6 was matched with righthand table6 with a .8, then lhtable6 was matched with rhtable7 with a .9, and then lhtable7 was matched with rhtable7 with a 1.0, the only remaining match would be the last one made. Each new match had a better score than the previous one and shared a table with it, and since a table can only have one match, the old match had to be deleted. The new strategy preserves the lhtable6-rhtable6 match: the lhtable6-rhtable7 match sits above it in the sorted list because of its higher score, but below the table7-table7 match, so it has already been deleted (it uses rhtable7) by the time the lhtable6-rhtable6 match is considered.
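
Here is a minimal sketch of that strategy in Java. The names (TableMatch, BestMatchSelector, keepBestMatches) are made up for illustration and are not the actual FitNesse code; the sketch only assumes that a match is a pair of table indices plus a score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a match between a left-hand and a right-hand table.
class TableMatch {
  final int leftTable;
  final int rightTable;
  final double score;

  TableMatch(int leftTable, int rightTable, double score) {
    this.leftTable = leftTable;
    this.rightTable = rightTable;
    this.score = score;
  }
}

class BestMatchSelector {
  // Keeps every match first, sorts by score (highest first), then drops any
  // match whose left or right table is already used by a better match.
  static List<TableMatch> keepBestMatches(List<TableMatch> allMatches) {
    List<TableMatch> sorted = new ArrayList<>(allMatches);
    sorted.sort(Comparator.comparingDouble((TableMatch m) -> m.score).reversed());

    Set<Integer> usedLeft = new HashSet<>();
    Set<Integer> usedRight = new HashSet<>();
    List<TableMatch> kept = new ArrayList<>();
    for (TableMatch match : sorted) {
      if (usedLeft.contains(match.leftTable) || usedRight.contains(match.rightTable)) {
        continue; // a higher-scoring match already claimed one of these tables
      }
      usedLeft.add(match.leftTable);
      usedRight.add(match.rightTable);
      kept.add(match);
    }
    return kept;
  }
}
```

With the three matches from the bug above (6-6 at .8, 6-7 at .9, 7-7 at 1.0), this keeps the 7-7 match and the 6-6 match and drops the 6-7 match.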

I also added a few more features, like displaying the score for every match, displaying whether the two pages were a complete match, and other things like that.

Wednesday, July 22, 2009

Day 40

I finished the HistoryComparer today, except for maybe a little more refactoring and cleaning. The comparer will now compare every table of one page history to every table of the other page history, search for the most likely matches, and then line those tables up next to one another in the comparison. This way if an old page history used to have just two testing tables, but the newer page has two new tables, one at the top and one at the bottom, then the comparer will match up the two old tables with the two middle tables of the new page and then leave blank spaces for the other two tables. It's actually pretty cool, because it is now very easy to see the differences between older pages and new ones, and to find which tables were recently added, which ones have been just slightly changed, or even moved around.

To do this I created a rather arbitrary point system that I could use to make a sort of best fit between tables. First, the comparer would compare every cell of the two tables, and give a percentage based on how many of them matched (# of matching cells / average # of cells between the two tables). This was actually done row by row and then column by column, and it would add the percentages from each column and then divide by the total number of columns. Then it would award bonus points if the two tables had the same number of columns and the same number of cells. This way, if two tables were the same, but in one of the histories the table had failed all its tests (leading the cell values to be different), the tables could still be matched together since they had the same number of cells and columns. So the comparer would generate a score for each of the table comparisons, and any score that was over a certain amount would be considered a match.
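
A simplified sketch of this kind of scoring, in Java. It is not the real comparer: it counts matching cells in a single pass instead of averaging per-column percentages, and the bonus values (0.1 each) and the names TableScorer, COLUMN_BONUS, and CELL_BONUS are assumptions made up for the example.

```java
import java.util.List;

class TableScorer {
  // Assumed bonus sizes; the real point values are not given in this post.
  static final double COLUMN_BONUS = 0.1;
  static final double CELL_BONUS = 0.1;

  // A table is modeled as a list of rows, each row a list of cell strings.
  static double score(List<List<String>> left, List<List<String>> right) {
    int leftCells = countCells(left);
    int rightCells = countCells(right);
    if (leftCells == 0 || rightCells == 0) {
      return 0.0;
    }

    // Count cells that hold the same text at the same row and column.
    int matching = 0;
    int rows = Math.min(left.size(), right.size());
    for (int r = 0; r < rows; r++) {
      int columns = Math.min(left.get(r).size(), right.get(r).size());
      for (int c = 0; c < columns; c++) {
        if (left.get(r).get(c).equals(right.get(r).get(c))) {
          matching++;
        }
      }
    }

    // # of matching cells / average # of cells between the two tables
    double score = matching / ((leftCells + rightCells) / 2.0);

    // Bonus points for having the same shape, even if the cell values differ.
    if (columnCount(left) == columnCount(right)) {
      score += COLUMN_BONUS;
    }
    if (leftCells == rightCells) {
      score += CELL_BONUS;
    }
    return score;
  }

  static int countCells(List<List<String>> table) {
    int cells = 0;
    for (List<String> row : table) {
      cells += row.size();
    }
    return cells;
  }

  static int columnCount(List<List<String>> table) {
    int columns = 0;
    for (List<String> row : table) {
      columns = Math.max(columns, row.size());
    }
    return columns;
  }
}
```

Under these assumed bonuses, two tables with the same shape but completely different cell values still score above zero, which is the point of the bonus.
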
The next issue was that I would then have situations where one table would match with two or three other tables, and the program would go nuts trying to line them all up. It was actually really funny the first time I saw this because it ended up matching the setup table with the teardown table, and thus added like 30 blank spaces to line them up. Probably had to be there though... :) So to solve this, I made a best-fit checker. For two tables to be matched up, either neither could have another match, or the new match would have to have a better score than the existing one, in which case the old match would be replaced with the new. This turned out to be rather mind bending. If we are comparing table1 and table2, and they have a score of 1.1, but table1 is already matched with table3 with a score of 1.05, then table1 and table2 will be matched, and table3 left unmatched. But if table2 already had a match with table4 with a score of 1.15, then there couldn't be a new match. Actually when I write it out it seems rather straightforward, and I suppose it was in the end; it just required a lot of tests and postulating to think of all the possible scenarios. The issue with the solution I arrived at was that I had a ton of if statements, and ifs inside other ifs and so forth. The whole thing just got ugly.

While trying to refactor this code, I discovered that a huge portion of the code had outstayed its welcome and needed to be moved to a new class. I called this the TableComparer, and it managed all the code having to do with making the matches between the tables. This made it much easier to make the code readable because now I had the group of semi ugly functions all in one class with one goal, so I could pull out a lot of arguments and turn them into fields, thus making it easier to extract new and descriptive methods. The other thing that was great about this refactoring was keeping the tests passing the whole time, and slowly transitioning the code into its new form. I would pull a chunk of code from the HistoryComparer and move it to the TableComparer, get everything to pass, and then remove the code from the HC. As I did this, the tests also informed me that I needed to make a new test file for the new class, and needed to move the appropriate tests into it. It was quite nice.

Tuesday, July 21, 2009

Day 39

Today I enhanced my sorting algorithm, which lines up all the tables. First I made the function that compares two tables a little more forgiving. Ultimately I will need to make this comparison a percentage fitting algorithm, so that it will still match tables that are just a tiny bit different. Since FitNesse tests often have collapsible sections that reveal some test specifics, I also had to make the comparison ignore those collapsed sections.

The biggest task today was to refactor the whole HistoryComparer class. Most of this I just did myself, extracting method after method, giving names that described the true intent of the code, and abstracting certain repetitive concepts and variables. There was one method I needed help refactoring because it was a bit tricky. This was the same method with the three-level loop that I described briefly yesterday. The method was actually one algorithm repeated twice, once for each list: once for the firstTableList and again for its complement, the secondTableList. Once we noticed this, we began to implement the Strategy Pattern.
First we created an interface which described what methods we were going to need. Then we implemented that interface in two derivative classes, one for the firstTableList and another for the second.
By extracting the details of the algorithm into two subclasses, we had the power to remove all the duplication and generalize the algorithm such that it worked for both lists. This process made the code much easier to understand and follow, removed duplication and rigidity, and freed the algorithm to be used elsewhere if needed.
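
To give a rough idea of the shape this takes, here is a small Java sketch of the Strategy Pattern applied to two table lists. It is not the code we wrote: the names (MatchedPair, TableListStrategy, BlankInserter) and the choice of "insert a blank before a matched table" as the shared algorithm are assumptions for illustration.

```java
import java.util.List;

// Hypothetical pair of indices: one into the first list, one into the second.
class MatchedPair {
  int first;
  int second;

  MatchedPair(int first, int second) {
    this.first = first;
    this.second = second;
  }
}

// The interface names the only things that differ between the two lists.
interface TableListStrategy {
  List<String> tables();
  int indexIn(MatchedPair pair);
  void setIndexIn(MatchedPair pair, int index);
}

class FirstTableListStrategy implements TableListStrategy {
  private final List<String> tables;
  FirstTableListStrategy(List<String> tables) { this.tables = tables; }
  public List<String> tables() { return tables; }
  public int indexIn(MatchedPair pair) { return pair.first; }
  public void setIndexIn(MatchedPair pair, int index) { pair.first = index; }
}

class SecondTableListStrategy implements TableListStrategy {
  private final List<String> tables;
  SecondTableListStrategy(List<String> tables) { this.tables = tables; }
  public List<String> tables() { return tables; }
  public int indexIn(MatchedPair pair) { return pair.second; }
  public void setIndexIn(MatchedPair pair, int index) { pair.second = index; }
}

class BlankInserter {
  static final String BLANK = "blank";

  // Written once, works for either list: insert a blank before the matched
  // table, then shift every match index at or after that position.
  static void insertBlankBefore(MatchedPair target, List<MatchedPair> allPairs,
                                TableListStrategy list) {
    int position = list.indexIn(target);
    list.tables().add(position, BLANK);
    for (MatchedPair pair : allPairs) {
      if (list.indexIn(pair) >= position) {
        list.setIndexIn(pair, list.indexIn(pair) + 1);
      }
    }
  }
}
```

The algorithm in BlankInserter never asks which list it is working on; the strategy object it is handed answers that.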

The really cool part was that we kept the tests passing the entire time. At any point during our refactoring process we could have released the code and it would have worked. This is an essential skill that I must master. The refactoring was slow and careful, but vastly increased my understanding of both my own code and the power of the tests I wrote. The Strategy Pattern was pretty cool too.

Monday, July 20, 2009

Day 38

Today I worked on improving the HistoryComparer some more. I needed a way to take two lists of tables, and match them up in length and in value. So if I had a list of three tables and a list of four tables, where table 1 of the first list matched table 2 of the second list, I would then add a blank table before table 1 of the first list. This would line up the matching tables as well as the ends of both lists (they would both be four tables long). But then I also needed to guarantee that if two tables didn't match each other, they would not be in the same row/lined up. The process would look something like this:

List1  List2   --->   List1  List2   --->   List1  List2
A      X              blank  X              blank  X
B      Y              A      Y              A      blank
C      B              B      B              blank  Y
D      Z              C      Z              B      B
--     D              D      D              C      blank
                                            blank  Z
                                            D      D

This way, when comparing the tables of two pages' test histories, the user would be able to easily see which tables have changed, which are new, which were deleted, and so forth.
The solution was to first put the tables in lists like above, then compare the tables, then line them up, and finally add blanks where needed. This sounds simple enough, but what ended up happening was that I had loops inside loops inside of loops.
The process would start by comparing each table to every table in the other list. Every time the comparison found a match, it would add the indices of those two tables to a list of pairs, to keep track of the matches. Once all the matches were found, it would loop through the matches trying to line them up by adding in blank tables. But every time a blank table was added to one of the lists, the indices of the matches had to be updated, so it would have to loop through the matches again inside the first loop, updating all the indices. After all the matches were lined up, it would loop through the lists adding blank tables anywhere there were two tables in a row that didn't match, and then again update all the match indices. Finally, it would make sure both lists had the same number of tables by adding blanks to the end of either list if needed.
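
For comparison, here is a sketch in Java of a simpler way to get the same guarantees (matched tables share a row, unmatched tables sit next to a blank, both lists end up the same length). It is not the nested-loop code described above: it is a single pass with two cursors, it builds rows instead of editing the lists in place, and it assumes the matches are already sorted and never cross each other. The name TableAligner is made up, and the order of the unmatched rows can differ from the diagram above.

```java
import java.util.ArrayList;
import java.util.List;

class TableAligner {
  static final String BLANK = "blank";

  // Each match is {leftIndex, rightIndex}. Assumed: sorted ascending by both
  // indices, with no table appearing in more than one match.
  static List<String[]> align(List<String> left, List<String> right, List<int[]> matches) {
    List<String[]> rows = new ArrayList<>();
    int l = 0;
    int r = 0;
    for (int[] match : matches) {
      // Unmatched tables before the match each get a row with a blank partner.
      while (l < match[0]) {
        rows.add(new String[] {left.get(l++), BLANK});
      }
      while (r < match[1]) {
        rows.add(new String[] {BLANK, right.get(r++)});
      }
      // The matched tables share a row.
      rows.add(new String[] {left.get(l++), right.get(r++)});
    }
    // Whatever is left over after the last match is unmatched as well.
    while (l < left.size()) {
      rows.add(new String[] {left.get(l++), BLANK});
    }
    while (r < right.size()) {
      rows.add(new String[] {BLANK, right.get(r++)});
    }
    return rows;
  }
}
```

For List1 = A, B, C, D and List2 = X, Y, B, Z, D with matches B-B and D-D, this produces seven rows, with B next to B in the fourth row and D next to D in the last.
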
Now I have a lot of refactoring to do to make the code explain itself. This is, I think, the most important part of coding, and it's a good thing I have tests.

Friday, July 17, 2009

Day 37

This week my Dad taught a four-day class to a couple of students, including myself, which was focused on the Agile Principles, Patterns, and Practices. The first day was a very quick run through on TDD so that the students would have the context under which all agile practices are performed. The most important information from that first day was the three rules of TDD: You may not write any production code until you have a test for it. You can only write as much of a test as is needed to fail, including compiler errors. You can only write as much production code as it takes to make the test pass. If you are using these three rules you are practicing TDD, and will experience the wealth of benefits that come along with extensive testing. What is interesting is how just following those three rules may actually lead you to many of the other agile practices like the Open/Closed Principle. A plethora of tests will force you to keep your code very flexible, because the act of writing a test forces you to access your production code from a separate path, often decoupling tight and nasty bonds.

The three days following were all about the agile PPP. The five principles are:
Single Responsibility Principle
Open/Closed Principle
Liskov Substitution Principle
Interface Segregation Principle
Dependency Inversion Principle

SOLID

The SRP says that a class or a method can only have one reason to change. Every class needs to be based around one idea, and one idea only. Every method of that class should be geared to accomplish, manage, or add behavior to that one idea. A class named House should not, for example, have a method that tells the user what other houses or condos the owner of the house has. A method named makeGoldenFish should not set the fishing boat's x and y location. By using this principle we can avoid unexpected changes or failures when we create a certain class or use a certain method. It also makes sure that if we happened to want to change the type of fishing boat, we wouldn't have to go into the makeGoldenFish method and change the boat type in there. It guarantees a certain degree of flexibility in your code, and it helps another coder understand what your code's intentions are.

The OCP says that classes or components should be open for extension but closed for modification. It should be easy to add new features to an existing framework, and you shouldn't have to change any of the existing code to do it. Making changes should be a low cost and low risk operation. Adding new code is cheaper than changing old code, and by following the OCP you guarantee that adding new features is cheap, or at least cheaper than it would have been otherwise. Part of the process of writing good code is forgetting that details exist and writing very general solutions to problems, and then adding the details later. The OCP encourages you to write very abstract classes, methods, and solutions so that details can be added, interchanged, or removed without having to change much code at all. An example would be if I had some pets that I wanted to be able to command to move. I have a dog that runs, a fish that swims, and a bird that flies. A poor way to write the code to do this would be to have one Pet class, with some switch statement or if-else chain that checked what type of pet it was and then said dog.run or bird.fly. This would violate the OCP because to add a new type of pet, like a sloth that can only walk, I would have to add another if-else or case statement in every spot I wished to tell this pet to move. That's a lot of changes, and very expensive, just to add a new pet. A better way to do this would be to use polymorphism. I would create an interface named Pet, and have a Dog, a Bird, and a Fish class which were derivatives of the Pet interface. I would then have a method in Pet, which each derivative would implement, called move(). This way I could avoid all if and switch statements, and replace them all with pet.move(), and at runtime the appropriate pet would get the correct move call. This makes my code far more dynamic and flexible. To add a new pet, I would merely create a new derivative of Pet. That's all.
If the pet happened to be a sloth, then pet.move() would call the sloth's move function. This makes adding features cheap, too.
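
The pet example, sketched in Java. The return strings and the PetCommander class are just made up for the example; the point is that the Sloth class is added without touching Pet, the other pets, or the code that calls move().

```java
import java.util.ArrayList;
import java.util.List;

interface Pet {
  String move();
}

class Dog implements Pet {
  public String move() { return "run"; }
}

class Fish implements Pet {
  public String move() { return "swim"; }
}

class Bird implements Pet {
  public String move() { return "fly"; }
}

// Added later: no existing class had to change.
class Sloth implements Pet {
  public String move() { return "walk"; }
}

class PetCommander {
  // No switch and no if-else chain: each pet supplies its own move().
  static List<String> moveAll(List<Pet> pets) {
    List<String> moves = new ArrayList<>();
    for (Pet pet : pets) {
      moves.add(pet.move());
    }
    return moves;
  }
}
```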

The LSP says that a derivative must be a suitable substitute for the base class. If I have a base class Hero with a method saveLife(Person), then my derivative Batman must have a saveLife(Person) method that will suffice in any situation where a user might call Hero.saveLife(Person). One must be careful with this however, for the IS-A relationship may apply in the real situation, but not in the coded representation. The coded representation might not always match the reality. Some symptoms of an LSP violation are when low level details are forced to become visible, when bases know of their derivatives, and when not all abstract methods are implemented in derivatives. A violation of the LSP often is, or leads to, a violation of the OCP. For example, let's say you have a Driver base class and a Vehicle base class. Their derivatives are CabDriver and Pilot, and Car and Jet, respectively. When driver.operate(vehicle) is called, you may have a CabDriver trying to operate a Jet, which simply won't work, and thus you have violated the LSP. To solve this you might use the Intelligent Children Pattern, which is useful when you have a dual hierarchy (such as the current example), and you don't want to have to use ifs or downcast your types. You can have the children, or the derivatives, hold the types of their respective needs. So the CabDriver would know his vehicle is a Car. This adds a dependency, but there is a trade-off either way.
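
A small Java sketch of the Intelligent Children idea for the driver example. The method names and return strings are assumptions for illustration. Note that operate() no longer takes a Vehicle argument, so there is no way to hand a Jet to a CabDriver.

```java
interface Vehicle {
  String start();
}

class Car implements Vehicle {
  public String start() { return "car engine on"; }
}

class Jet implements Vehicle {
  public String start() { return "jet turbines on"; }
}

interface Driver {
  String operate();
}

// Each child holds the concrete vehicle type it knows how to operate,
// so no ifs or downcasts are needed.
class CabDriver implements Driver {
  private final Car car;
  CabDriver(Car car) { this.car = car; }
  public String operate() { return "cab driver: " + car.start(); }
}

class Pilot implements Driver {
  private final Jet jet;
  Pilot(Jet jet) { this.jet = jet; }
  public String operate() { return "pilot: " + jet.start(); }
}
```

The cost is the one mentioned above: CabDriver now depends directly on Car, and Pilot on Jet.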

The ISP says that an interface must be cohesive with respect to the classes that implement it. When an interface begins to acquire more and more methods, and the classes that implement the interface don't use all of the methods, then the interface has gotten too fat. An interface should be specific to a general idea, and the classes that implement that interface should all fit into that idea. The example in Agile Software Development is a Door class, which is abstract, and has Door derivatives like a TimedDoor (a door that can only be open for a certain period of time). This TimedDoor uses the methods of a Door, but also inherits from a Timer class, to get the functionality it needs to know when it has been open too long. The example then explains a solution to this problem in which an interface is made for a TimerClient, which Door and then TimedDoor would implement. But why should Door implement a TimerClient? It shouldn't. A general Door has nothing to do with the timer, but in that solution it would implement the TimerClient just to get the timer functionality. This demonstrates at least part of the ISP. The ISP also outlines a strategy to write maintainable code. If you have some base class with a TON of methods, which a whole bunch of other classes use by extending the base class, then every time you make a change to that base class you will need to recompile all of those child classes as well (even if they only use one of the base class methods). To solve this, you can stick an interface in between the base class and some of the child classes, so that the child classes will call the methods on the interface, and the base class will implement the methods from the interface. This way you can free up some dependencies and only have to recompile small chunks of code if you make a change to the super base class.
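
One way to keep the timer out of Door, sketched in Java. This is my own rough version and not the book's code; PlainDoor, DoorTimer, and the alarm flag are made-up names. Only TimedDoor implements TimerClient, so a general door never sees the timer methods.

```java
import java.util.ArrayList;
import java.util.List;

interface Door {
  void open();
  void close();
  boolean isOpen();
}

// Kept separate from Door so that only doors that care about time implement it.
interface TimerClient {
  void timeOut();
}

class PlainDoor implements Door {
  private boolean open;
  public void open() { open = true; }
  public void close() { open = false; }
  public boolean isOpen() { return open; }
}

class TimedDoor extends PlainDoor implements TimerClient {
  private boolean alarmSounded;

  // Called by the timer; the alarm only matters if the door is still open.
  public void timeOut() {
    if (isOpen()) {
      alarmSounded = true;
    }
  }

  boolean alarmSounded() { return alarmSounded; }
}

// Hypothetical timer that only knows about TimerClient, never about Door.
class DoorTimer {
  private final List<TimerClient> clients = new ArrayList<>();

  void register(TimerClient client) { clients.add(client); }

  void fire() {
    for (TimerClient client : clients) {
      client.timeOut();
    }
  }
}
```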

Friday, July 10, 2009

Day 32

Yesterday turned out pretty excellent I thought. We got the new FitNesse release out! And now, it's so cool, all you need to do to update is download and run the .jar file and FitNesse will do the rest! I spent most of the morning working on adding a button and some user friendly tools to the compareHistory function. I had a long debate with myself on how to actually accomplish making it user friendly. In the end I decided to use check boxes next to the test history files, and the user would select two and click the compare button I added. For a while I was considering either doing the check boxes in JavaScript, which I don't know much of and which probably would have taken me a day to learn well enough to get the functionality done, or making two parallel sets of option circle things (radio buttons), thus making sure the user could only select two. Instead I just send an error page if the user attempts to compare history with any number of boxes checked other than exactly two. I am starting to get the hang of making new features, and finally I have a decent idea of how FitNesse works.
My next task has already been given to me, and that is to improve the compare history feature. Right now it merely displays the two pages side by side, and shows whether or not they match. Now I need to make this comparison table by table, showing which tables match and which don't. Even more than that, and this will be the challenging part, if an old page had, let's say, one more table than the new one, then of course all the tables won't line up right. So I need to make the comparison just a little smarter so that it will actually look for matching tables and line those up and just attach any extra ones to the end.