Wednesday, July 22, 2009

Day 40

I finished the HistoryComparer today, except perhaps for a little more refactoring and cleanup. The comparer now compares every table in one page history against every table in the other page history, searches for the most likely matches, and then lines those tables up next to one another in the comparison. So if an old page history had just two test tables, and the newer page has those same two tables plus two new ones, one at the top and one at the bottom, the comparer will match the two old tables with the two middle tables of the new page and leave blank spaces opposite the other two. It's actually pretty cool, because it is now very easy to see the differences between older and newer versions of a page, and to find which tables were recently added, which ones have only been slightly changed, and which have been moved around.
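
To make the "line them up with blank spaces" step concrete, here is a minimal sketch. It is not the actual HistoryComparer code: the class name `TableAligner`, the use of table indices, and the `-1` marker for a blank space are all assumptions for illustration, and it assumes the matched pairs are already sorted and never cross.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not FitNesse code: lines up the tables of two page histories
// given the matched pairs, leaving a blank (-1) wherever a table has no partner.
class TableAligner {
    // matches[i] = {oldIndex, newIndex}. Assumes the pairs are sorted and never cross,
    // i.e. both indices increase from one pair to the next.
    static List<int[]> align(int oldCount, int newCount, int[][] matches) {
        List<int[]> rows = new ArrayList<>();
        int o = 0, n = 0; // next unplaced table on each side
        for (int[] m : matches) {
            // Old tables before this match have no partner: blank on the new side.
            while (o < m[0]) rows.add(new int[] {o++, -1});
            // New tables before this match have no partner: blank on the old side.
            while (n < m[1]) rows.add(new int[] {-1, n++});
            rows.add(new int[] {m[0], m[1]});
            o = m[0] + 1;
            n = m[1] + 1;
        }
        // Anything left over after the last match is unmatched as well.
        while (o < oldCount) rows.add(new int[] {o++, -1});
        while (n < newCount) rows.add(new int[] {-1, n++});
        return rows;
    }

    public static void main(String[] args) {
        // Old page: two tables. New page: the same two plus one at the top and one at the bottom.
        for (int[] row : align(2, 4, new int[][] {{0, 1}, {1, 2}})) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

For the example in the paragraph, this prints `[-1, 0]`, `[0, 1]`, `[1, 2]`, `[-1, 3]`: a blank opposite the new top table, the two old tables beside the two middle ones, and a blank opposite the new bottom table.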

To do this I created a rather arbitrary point system that I could use to find a sort of best fit between tables. First, the comparer compared every cell of the two tables and computed a percentage based on how many of them matched (# of matching cells / average # of cells in the two tables). In practice this was done row by row and then column by column: the comparer added up the percentages from each column and then divided by the total number of columns. Then it awarded bonus points if the two tables had the same number of columns and the same number of cells. That way, if two tables were identical except that in one of the histories the table had failed all its tests (so the cell values differed), the tables could still be matched, since they had the same number of cells and columns. The comparer generated a score for each pair of tables, and any score above a certain threshold was considered a match.
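
Here is one possible reading of that point system as code. It is a sketch under stated assumptions, not the real comparer: tables are modeled as lists of rows of cell strings, cells are compared by position, each row contributes (matching cells / average cells in that row), the column count is taken as the width of the widest row, and the bonus sizes and threshold (`COLUMN_BONUS`, `CELL_BONUS`, `MATCH_THRESHOLD`) are made-up values.

```java
import java.util.List;

// Hypothetical sketch of the scoring idea; names, bonus sizes, and threshold are invented.
class TableScorer {
    static final double COLUMN_BONUS = 0.1;    // hypothetical value
    static final double CELL_BONUS = 0.1;      // hypothetical value
    static final double MATCH_THRESHOLD = 0.8; // hypothetical value

    static double score(List<List<String>> t1, List<List<String>> t2) {
        double rowPercentSum = 0.0;
        int sharedRows = Math.min(t1.size(), t2.size());
        for (int r = 0; r < sharedRows; r++) {
            List<String> row1 = t1.get(r), row2 = t2.get(r);
            int matches = 0;
            // Compare cells position by position within the row.
            for (int c = 0; c < Math.min(row1.size(), row2.size()); c++) {
                if (row1.get(c).equals(row2.get(c))) matches++;
            }
            double avgCells = (row1.size() + row2.size()) / 2.0;
            if (avgCells > 0) rowPercentSum += matches / avgCells;
        }
        // Average the per-row percentages over the average number of rows.
        double avgRows = (t1.size() + t2.size()) / 2.0;
        double score = avgRows > 0 ? rowPercentSum / avgRows : 0.0;
        // Bonuses let structurally identical tables match even when their results changed.
        if (columnCount(t1) == columnCount(t2)) score += COLUMN_BONUS;
        if (cellCount(t1) == cellCount(t2)) score += CELL_BONUS;
        return score;
    }

    static boolean isMatch(double score) {
        return score > MATCH_THRESHOLD;
    }

    // Assumption: the column count is the width of the widest row.
    static int columnCount(List<List<String>> t) {
        int max = 0;
        for (List<String> row : t) max = Math.max(max, row.size());
        return max;
    }

    static int cellCount(List<List<String>> t) {
        int n = 0;
        for (List<String> row : t) n += row.size();
        return n;
    }

    public static void main(String[] args) {
        // Same table before and after a test started failing: only the last cell differs.
        List<List<String>> passed = List.of(List.of("Calc"), List.of("a", "b", "sum?"), List.of("1", "2", "3"));
        List<List<String>> failed = List.of(List.of("Calc"), List.of("a", "b", "sum?"), List.of("1", "2", "4"));
        double s = score(passed, failed);
        System.out.println(s + " match=" + isMatch(s));
    }
}
```

With these invented numbers, identical tables score about 1.2, and the passed/failed pair in `main` scores about 1.09 (8/9 for content plus both bonuses), so it is still reported as a match.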

The next issue was that one table would sometimes match two or three other tables, and the program would go nuts trying to line them all up. It was actually really funny the first time I saw this, because it ended up matching the setup table with the teardown table and added something like 30 blank spaces to line them up. You probably had to be there... :) To solve this, I made a best-fit checker. For two tables to be matched, either neither of them could already have a match, or the new match had to score better than any existing one, in which case the old match would be replaced by the new one. This turned out to be rather mind-bending. If we are comparing table1 and table2 and they have a score of 1.1, but table1 is already matched with table3 with a score of 1.05, then table1 and table2 will be matched and table3 left unmatched. But if table2 already has a match with table4 with a score of 1.15, then there can't be a new match. Actually, when I write it out it seems rather straightforward, and I suppose it was in the end; it just took a lot of tests and postulating to think through all the possible scenarios. The problem with the solution I arrived at was that it had a ton of if statements, ifs inside other ifs, and so forth. The whole thing just got ugly.
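
The best-fit rule itself can be written without nested ifs by keeping one lookup per side. This is a hypothetical sketch, not the actual FitNesse code: `BestFitMatcher`, `offer`, and the two maps are invented names, and tables are identified by index on the old and new page.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the best-fit checker: each table on either side holds at most one match.
class BestFitMatcher {
    private static final class Match {
        final int oldTable;
        final int newTable;
        final double score;

        Match(int oldTable, int newTable, double score) {
            this.oldTable = oldTable;
            this.newTable = newTable;
            this.score = score;
        }
    }

    private final Map<Integer, Match> byOld = new HashMap<>();
    private final Map<Integer, Match> byNew = new HashMap<>();

    // Accepts the match only if it beats any existing match on either side.
    // Displaced matches are dropped, leaving their other tables unmatched.
    boolean offer(int oldTable, int newTable, double score) {
        Match oldSide = byOld.get(oldTable);
        Match newSide = byNew.get(newTable);
        if (oldSide != null && oldSide.score >= score) return false;
        if (newSide != null && newSide.score >= score) return false;
        if (oldSide != null) byNew.remove(oldSide.newTable);
        if (newSide != null) byOld.remove(newSide.oldTable);
        Match match = new Match(oldTable, newTable, score);
        byOld.put(oldTable, match);
        byNew.put(newTable, match);
        return true;
    }

    Integer partnerOfOld(int oldTable) {
        Match m = byOld.get(oldTable);
        return m == null ? null : m.newTable;
    }

    Integer partnerOfNew(int newTable) {
        Match m = byNew.get(newTable);
        return m == null ? null : m.oldTable;
    }

    public static void main(String[] args) {
        // table1 (old) is already matched with table3 (new) at 1.05.
        BestFitMatcher matcher = new BestFitMatcher();
        matcher.offer(1, 3, 1.05);
        System.out.println(matcher.offer(1, 2, 1.10)); // true: 1.10 beats 1.05
        System.out.println(matcher.partnerOfNew(3));   // null: table3 is left unmatched
        // If table2 (new) is already matched with table4 (old) at 1.15, the new match is refused.
        BestFitMatcher other = new BestFitMatcher();
        other.offer(4, 2, 1.15);
        System.out.println(other.offer(1, 2, 1.10));   // false
    }
}
```

One design caveat with this greedy approach: the final matches can depend on the order in which pairs are offered, and a displaced table is not automatically re-offered to its next-best candidate.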

While trying to refactor this code, I discovered that a huge portion of it had outstayed its welcome in the HistoryComparer and needed to be moved to a new class. I called this the TableComparer, and it managed all the code having to do with matching tables. This made it much easier to make the code readable, because now the group of semi-ugly functions lived in one class with one goal, so I could turn a lot of arguments into fields, which in turn made it easier to extract new, descriptive methods. The other great thing about this refactoring was keeping the tests passing the whole time while slowly transitioning the code into its new form. I would pull a chunk of code from the HistoryComparer into the TableComparer, get everything passing, and then remove the old code from the HistoryComparer. As I did this, the tests also showed me that I needed a new test file for the new class and needed to move the appropriate tests into it. It was quite nice.
