Indeed. BTW for maximum clarity I'd suggest that step 1 explicitly say "Delete the abstraction". It's implied, but it may be easy to forget to do it – in which case a new programmer may see the abstraction and think they're supposed to use it.
We were unable to load Disqus. If you are a moderator please see our troubleshooting guide.
Favoriting means this is a discussion worth sharing. It gets shared to your followers' Disqus feeds, and gives the creator kudos!
Indeed. BTW for maximum clarity I'd suggest that step 1 explicitly say "Delete the abstraction". It's implied, but it may be easy to forget to do it – in which case a new programmer may see the abstraction and think they're supposed to use it.
Seems like a key part of this is whether it truly was duplication to begin with. A question that keeps coming up for me when I run into this situation is "do I really want these two places to move in lock-step, or do they just cosmetically look duplicated?"
We can open this up again. I remember reading this two years ago, and having then been working with a legacy code base, agreeing totally.
I do question the refactoring approach. Instead of reintroducing the duplicated code (looking at the parameters and conditionals to decide which parts), I would try to refactor the abstraction as a composition. Under condition a do:
a(g(x))
under condition b, do:
b(g(x))
A bad abstraction as described in the above can definitely be resolved via a composition. And as one finds other use cases, it's easy to extend the model:
c(b(g(x)))
In general, I say if a problem can be expressed as a composition, it should be. So much of the code I see every day hides one inside of it. Devs should be taught to look for them.
I would notice that the wrong abstraction began not by programmer A who first removed duplication but with programmer B or C who misunderstood to be "honor-bound" and instead of continuing refactoring and improving the code quality just added the conditional. This process also shows off a lack of code review practices that would have identified this problem. So good by programmer A.
I agree with this entirely. If some code has become (or was designed to be) over-complex, time-consuming to amend, overly confusing or obfuscated, it's probably time to look into replacing it with something better. The investment in time will be rewarded many times over in time saved later and employee morale.
It doesn't have to be just for abstractions of duplicated code. Maybe you've got a data structure spread over three classes, with broken encapsulation. Maybe a complex and fragile persistence system could be replaced with something simpler and more robust. We all come across sections of code that are over-engineered, hurriedly slapped together for a deadline or built for a set of conditions that may not be relevant any more. We shouldn't be afraid to re-think, re-engineer, refactor. (There may be muttering as you do so. This is normal.)
When it comes to duplication, sometimes it's a good idea to leave it in in the first place. (Gasp!) Except in trivial cases, a single duplication doesn't give much information about the shape of a general solution. Once you're doing the same thing three times you might have a bit of a better idea, but always be prepared to back out and change the abstraction when more requirements come in...
I don't see this as a failure of abstraction. It's caused by some specific other failures.
1. Failure of people to understand data modeling, and the difference between interface and implementation. "A new requirement appears for which the current abstraction is almost perfect." Well, is it or isn't it? Just because the code is the same doesn't mean it should use the same abstraction. The data model is wrong, not the abstraction.
2. Failure of people to think of the entire system, when making a change. The rookie mistake is to think that adding one more conditional is always harmless, but clearly at some point you've got 3 tons of straw on that poor camel's back. Adding just one more case may have been the correct fix last week, but not this week.
3. Failure of programming languages and frameworks to support the necessary abstractions. In a language like Java, for example, it's common for the easiest way to be "just add another conditional". In a language that supports multiple dispatch, though, you can easily add more situations that are similar to existing cases, without writing any conditionals at all, and it stays easy to maintain because you're not writing a dispatcher by hand along with your business logic.
After all the hemming and hawing about this, I'd really like to see a real case study on it. It's never been an issue for me. I've had functions where I added parameters later on for new use cases to fit the function to them, yes, but I never felt like I was suffering for it. Maybe DRY isn't so bad after all.
I have also seen this pattern, and used the same general inline technique to get out of the mess. One additional method I use is to examine runtime behavior to trim down unneeded branches. By the time I get to see the code the combinations of inputs are in the 100s, and side effects in dozens. Fortunately, runtime behavior is a lot simpler. I think that's because the code serves business processes which would have long collapsed on their own weight if they were too complicated.
It seems to me that if parameters are added to abstractions to enable optional features, yes it is the wrong abstraction, but more specifically, the abstraction lacks cohesion. Good abstractions do only one thing. There is nothing wrong with using a bunch of small cohesive abstractions in your implementation to reduce code duplication.
For example the set of functional list abstractions like foldl, map, and filter all generalize the operations over a list. The foldl function traverses a list from left to right and applies an operation , accumulating an output. Sometimes this doesn't work for my case, and I need to traverse the list from right to left. The library developers didn't add a parameter to foldl; they created a new abstraction called foldr.
In the OO world, adhering to the SOLID principals should keep your abstractions cohesive and coherent. A good library will almost always have a public interface consisting of a layer of small cohesive interfaces.
I think may people get frustrated with abstraction because they see so many "bad" abstractions, and so assume getting the right abstraction is hard.
If you play out the duplication scenario as you did the DRY scenario, you will see the problem with repetition. Programmer 2 alters the duplicated code, but also something changes that is common to dupe1 and dupe2, but they only change dupe2. Repeat. Now your code, instead of being complicated, is riddled with bugs because people keep forgetting to change dupe1, dupe2, dupe3, etc.
I believe you’re talking about the challenge of legacy code, rather than the challenge of abstraction.
Abstraction is the wrong target, it is the effect rather than the cause of the problem here. People won’t understand this though, and the internet will catch fire with an “all abstraction is bad” movement. I’m retiring.
This is a nice write-up. I wrote about the same topic with more abstract
analysis in my post The Fallacy of DRY:
https://www.entropywins.wtf...
This is not a newly realized problem, but DRY has made it a monster to combat, and this article is the best armor I've found. In my eyes, this is your magnum opus Sandi.
You've identified the problem but come up with a bad solution. If the abstracted class/function can't be modified without side effects, it's doing too much and should be broken up into smaller functions which can be called and extended as needed.
ah classes, you got to love'em... code duplication by necessity, if only there was a way to get away from them, say some way to compose ideal object from small functional pieces... wouldn't that be great? yeah. Anyways back to topic, I totally agree with the “prefer duplication over the wrong abstraction”.
Imagine an IDE plugin called UNABSTRACT that created the duplicated classes!
The rule is: THERE ARE NOT CONDITIONS IN ABSTRACTIONS.
If you find yourself adding a condition in one, refactor it.
Design is the king. Not the code. Going into the code head first is like going into the coal mine which has no light.
I agree with the sunk cost comments.
The cost of code duplication verse the wrong abstraction is also a comment on the value of delaying using DRY until the algorithm is better understood. The RailsConf 2014 talk and the notion of that the fastest way forward is back imply using this too soon can take things in the wrong direction.