> Data-driven journalism gave us Buzzfeed
> Data-driven music gave us X-Factor and Pop Idol
> Data-driven movies gave us 25 Hollywood sequels planned for this year
> Data-driven education gave us Key Performance Indicators and Teaching to the test
These first three are awful examples. A ton of people love all of those things. The only failure of data-drivenness here is the failure to generate content that the author wants. Now, the author can attempt to make some argument about how websites, TV shows, and movies have a moral obligation to strive for whatever objectives the author prefers, but that's a separate issue to settle.
The fourth example is a little different, because we're talking about mandatory education programs for children, rather than products that people choose to pay for or consume. Also, I don't think that data-drivenness itself is a significant contributor to those problems in education.
Enough people might reliably go see 25 sequels, but none of those films will be memorable. None will advance the art of film-making. None will change anyone's life or mentality, or affect the culture in any meaningful way.
Being data driven means chasing the biggest, loudest signal in your data set. It means pandering to that signal, because it swamps all others. A data-driven approach is not going to lead you anywhere new.
In machine learning / AI we refer to such algorithms as "greedy." A classic example would be simple hill climbing, a.k.a. gradient descent. These algorithms are known to be very good at optimizing within the bounds of a simple, well-behaved, regular fitness landscape, but they readily become stuck at local maxima when presented with a solution space that has any complex structure.
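To make the failure mode concrete, here is a minimal sketch (my own illustration; the polynomial landscape and step size are made up) of greedy hill climbing stalling on the smaller of two peaks:

```python
# A 1-D fitness landscape with two peaks: a local maximum near x = -0.69
# and the global maximum near x = 2.19 (the polynomial is arbitrary).
def fitness(x):
    return -(x ** 4) + 2 * x ** 3 + 3 * x ** 2

def hill_climb(x, step=0.01, iters=10_000):
    """Greedy search: move to a neighbour only if it is strictly better."""
    for _ in range(iters):
        best = max((x - step, x + step), key=fitness)
        if fitness(best) <= fitness(x):
            return x  # stuck: neither neighbour improves on where we are
        x = best
    return x

print(round(hill_climb(-3.0), 2))  # -0.69: trapped on the smaller peak
print(round(hill_climb(+3.0), 2))  #  2.19: found the global maximum
```

Which peak you end up on depends entirely on where you start; the greedy rule itself never looks past the nearest slope.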
We're living in what I'm tempted to call the dark age of the local maximum, the age of gradient descent.
This reminds me of how one time, a professor of mine was discussing a pie-chart. He pointed towards the smallest sliver and said "Know what that is? That's opportunity!" He was most interested in the smallest signal because it represented untapped potential. (IIRC the topic was related to replacing oil with renewable energy.)
Perhaps you value novelty more than the average person. What might not be "adequate" for you might very well be adequate for a huge portion of people.
Throughout this comment you mention several potential goals of content producers (being memorable, advancing the art, changing someone's life, etc.), but you make no argument for why those goals ought to be prioritized over other goals, other than the implication that you personally prefer those goals.
If you're talking about money-making sequels, then Toy Story 2 and 3 were memorable and interesting.
Historical data on sales of paintings would never have told Picasso to pursue Cubism, or Salinger to write 'Catcher in the Rye'.
It was data that caused so many publishing houses to reject future bestsellers. Sales figures for past children's books told publishers that "Harry Potter" would never work.
We all know how that played out.
As for business, existing data would have never told Jobs to build an iPhone - simply because there really was no data on touchscreen devices.
* Back to the Future 2
* Indiana Jones and the Temple of Doom
* Army of Darkness
* Goldeneye
* Evil Dead 2
* Terminator 2
* Ghostbusters 2
* Captain America: Winter Soldier
* Batman: The Dark Knight
Yeah, none of those sequels are memorable. We should stop making sequels and only make original films so we don't waste money on travesties such as these.
The implications of the hypothetical extension of this trend are clear to a lot of people. As long as some percent of people are pushing the boundary, culture will be fine. Let everyone else enjoy their sequels and reality shows.
Although the Star Wars prequels were not great IMO, they certainly brought a ton of technological innovations which have had an enduring impact on filmmaking in general.
He hired a company to research the direct-to-video market and tell him what types of product had the best return per dollar. After a lot of time and money, they came back with a report that said, in essence, that the two top performers in the DTV space were science fiction and vampire horror.
The research was sound. The data were irrefutable. He immediately sank millions of dollars into producing a direct-to-video trilogy about vampires from space.
A few years later, I saw him running one of those hot dog concessions outside a Home Depot.[1]
The commonplace version of this is SMB executives who, primed by HBR articles about data-driven decision-making, ascribe magical powers to simple linear regressions, believe a single model can tell them exactly how to run heterogeneous business lines uniformly, think "overfit" just means you can make the model even better, and try to force-feed explanatory variables down the throats of their stats consultants. Analytics is a powerful and necessary toolset, but it's become harder to keep the C-suite from spiking the sauce.
[1] Hell of a work ethic, though. Not long after that he had a few dozen of those carts operating throughout the region. Never did see him again, but you've got to respect a guy who doesn't understand the meaning of "abject failure."
This is exactly the problem with completely data-driven decisions.
First, do people really love those things, or, given an ever-increasing trend toward formulaic content, are those simply the particular formulaic things they happened to choose from what was on offer?
Obviously, it's important to use whatever data is available, but isn't the entire notion of a startup to throw away _ALL_ of the prior data and do something that entrenched players won't tend to risk? I think the idea is not to throw away what's fundamentally important to whatever you do because numbers.
BuzzFeed is doing much better at _terrible_ and mostly worthless journalism than they would be if they hadn't invented, or at least largely propagated, clickbait, sure. But following this course, will they ever reach the value of real journalists? They're a distraction.
Sometimes the data just tells us what the easiest ways to distract people are, but following that principle alone is creating a world with lots of cruft and very little actual value.
So what? A ton of people loved smoking cigarettes, segregation, creationism, and tons of other BS stuff too.
The point about culture is also a value judgement, not just deferring to whatever "people like," as if mere enjoyment were the be-all and end-all of arguments.
Especially since what people "like" is not the objective "own choice" we make it out to be; it can be manipulated by advertising, conditioning, lack (or lesser coverage) of alternatives, and, finally, lack of education.
But isn't that what the article is about? The song it points to as the thing to strive for is Bohemian Rhapsody (and not, say, Enter, Evening) exactly because people loved it.
I don't think all this sort of stuff is bad. I'm looking forward to the next Avengers movie, in fact I love a good superhero yarn. At the same time, it's a sad fact that big-budget sequels crowd out investment in small ($10 million or less) indie movies, and there's general downward pressure on budgets at the low end of the market.
But while he argues that focusing on data is probably not the most effective way of running a startup, the main point of the article is stated in the title: that even if a data-driven startup is successful, it'll have no character.
Obviously not every startup needs character, so take from that what you will.
You mean, data-driven movie investment. Movie producers don't want the same thing as cinephiles.
However, I agree that for early-stage startups, data-driven decision making can be difficult. My experience is it's expensive and you often have very little data.
The other issue is this kind of uncanny valley of false rigor. At one extreme you have very informal analysis: for example, we tweaked our blog post template and newsletter signups went up, but I can't tell you exact percentages because at this stage we don't track them. We seem to be getting a lot more signups, but perhaps it's illusory. That's OK; at our level of traffic it really isn't important. At the other extreme, you model non-stationary processes and all that, with rigorous control over sources of error. In the middle is where I see many companies with, say, A/B testing: they believe they have a high level of statistical rigor but don't actually achieve it in practice, due to many uncontrolled sources of error. This middle spot, where you have too much faith in faulty reasoning, is where I believe bad data-driven decision making resides.
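As a sketch of the kind of check that middle group often skips, here's a plain two-proportion z-test (all numbers invented for illustration): the same relative lift that looks impressive can be statistically meaningless at low volume:

```python
from math import sqrt, erf

def ab_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for 'variant B's rate differs from A's'."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))  # standard normal CDF
    return 2 * (1 - phi)

# A 20% relative lift (5% -> 6% signup rate) on 1,000 visitors per arm:
print(ab_p_value(50, 1000, 60, 1000) > 0.05)     # True: not significant
# The same lift on 10,000 visitors per arm:
print(ab_p_value(500, 10000, 600, 10000) < 0.05)  # True: now it is
```

And this test still assumes the uncontrolled-error problems away: no peeking, no multiple comparisons, stationary traffic. Violating any of those invalidates the p-value, which is exactly the false-rigor trap.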
Oh, and on turd polishing: http://www.dorodango.com/create.html
Just like startups...
Whenever I see statistics and data science, I see tons of ad hoc bullshit masquerading as science/knowledge. It is always easy to come up with a hypothesis to explain a set of chosen facts; for that hypothesis to be non-ad-hoc, it has to predict surprising facts.
As the fad continues, we may hear about robots replacing scientists to produce knowledge about various phenomena. For a good critique of AI, check the book by UC Berkeley philosopher Hubert Dreyfus: What Computers Can't Do: A Critique of Artificial Reason.
Using data is good, but “based on” offers a lot of wiggle room. A 10% increase in CTR is nothing to sneeze at, but it does not answer the question of the best use of your engineers’ (or designers’, or marketers’) time. Should they have instead been working on the thing that has a 50% chance of a 20% improvement? How do we account for all the data we didn’t bring to bear?
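To put numbers on that comparison, using the figures from the paragraph above (a deliberately crude expected-value model that ignores variance, option value, and opportunity cost):

```python
# Certain 10% improvement vs. a 50% shot at a 20% improvement.
sure_thing = 1.00 * 0.10  # probability x payoff
risky_bet = 0.50 * 0.20

print(sure_thing == risky_bet)  # True: identical expected value
# So the CTR data alone can't rank the two projects; risk appetite
# and everything you didn't measure have to break the tie.
```

That's the wiggle room: the decision is "based on" data either way, yet the data doesn't determine it.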
The data is small, the interpretation is big.
There is also the problem that philosophers call “regress”: every rational decision has to trace back to a premise that one simply assumes. Should we be in the business we are in, compared with all the other potential uses of our talent? We can't know that empirically, at root.
One of the better takeaways from the article was the notion that being data-driven means you're aiming for average, and you might not even hit it. Aim for the moon, you might only achieve orbit instead.
I watched a CEO make arbitrary layoff decisions based on what the numbers said should be the size of a development organization and the ratio of developers to QA. The actual software being built was irrelevant to his figures. He used numbers to justify grinding the dev organization into the ground.
Every circuit, every program is based on a principle: comparators (analogue) = NAND gates (digital) = if statements (software). Machines choose their answers by taking a huge amount of information and sorting it. By design, this leads to some monstrous conclusions. For example, eugenics might be logically "efficient," but it is morally abhorrent.
Taking risks, making mistakes: these are not flaws, they are the very essence of being human.
Test yourself! I guess that everyone on here is very rational (as I am). I only discovered this problem in my character after a conversation with an artist, a good friend from high school. She makes all her decisions based on the heart, rather than the mind. Try to do something totally random! When things make no logical sense, the emotions wake up again. You'll "feel" again. It doesn't matter if that's a good or bad feeling - acting like a machine makes you feel nothing at all. A machine can defend every action it takes, because it's never wrong. But machines can't apologise.
There will be data-driven businesses. They're not actually run by humans (whatever the management says), they're run by machines. Those companies could ultimately be fully automated away. It's far better, as a human, to be creative (even if the most creative thing you can do, like me, is teaching machines how to talk to other machines).
Maybe I'm reading the wrong thing into that. I make decisions with my heart and mind.
Heart = I care about my kids and want them to be healthy
Mind = To care for them I vaccinate them and don't use homeopathy
Maybe I'm misinterpreting this but my general experience is someone who "makes all decisions based on the heart, rather than the mind" generally makes some very poor decisions that actually don't lead to the results they want.
While machines are presently lacking in certain capabilities such as empathy and conceptualization, they are still very useful as tools and extensions of the human brain as long as the limitations and pitfalls are understood.
With your example of artificial intelligence, the software running on top of silicon circuits still comes from human intelligence. Where does that ultimately come from?
If an early-stage startup tries growth hacking before it reaches product market fit, it will likely end in disaster.
Gut reactions can take us only so far: they break down as we move away from simple, human-scale, familiar problems (ones the brain has some built-in, evolved capacity for handling, such as reading other people's facial expressions).
As great as data is, it is also limited. Why? Because we can never gather enough data. All our data is just a simplification of what's actually going on.
Essentially we're just grabbing data generated by black-box tests on systems that are astronomically more complex than we can comprehend. In many cases the data tells only a fraction of the story. It's akin to an alien race trying to understand a computer desktop by measuring the electrical inputs on a USB port and seeing how that affects the voltage output of the HDMI port.
--
Here's a telling example of the power of intuition, in a quote from Steve Jobs responding to Marc Andreessen asking about the "critical" problem of the iPhone not having a physical keyboard:
‘They’ll get used to it.’
Any data point you gathered on keyboards back in the day would have told you otherwise!
This is not even remotely true in my experience working at and advising startups. Sure, data is important, but more so for tactical matters like ad performance and A/B testing. Big strategic decisions typically employ far less data, pretty much by definition, since no data exists for "big strategic decisions".
Data shouldn't be used to set goals; it should be used to achieve them. It may also tell you when it is not currently possible to achieve your goal. That doesn't mean you should throw up your hands and set only the goals that the data seems to indicate are achievable, because (among other things) that is tantamount to believing we can actually predict the future.
While data always provides more information, the weaker your prior beliefs, the less informative your experiment will be. If you believe something and it turns out to be majorly false, you get a large shift in expectations. If you believe something and it turns out to be very true, you gain lots of information quantifying the effect you are looking into.
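One way to make that concrete (my own illustration, with made-up conversion numbers): measure how surprising the observed data is under a given belief. A confident belief that turns out to be badly wrong yields a huge surprise, i.e. a large update:

```python
from math import comb, log2

def surprise_bits(p_believed, successes, trials):
    """Self-information of the data under a belief: -log2 P(data | belief)."""
    likelihood = (comb(trials, successes)
                  * p_believed ** successes
                  * (1 - p_believed) ** (trials - successes))
    return -log2(likelihood)

# Observed: 10 conversions in 100 trials (empirical rate 0.10).
print(surprise_bits(0.80, 10, 100))  # confident and wrong: huge surprise
print(surprise_bits(0.10, 10, 100))  # matches reality: only a few bits
```

With no definite prior belief, there's nothing specific to be surprised about, so the experiment mostly just describes rather than corrects.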
If you are Google, looking to eke out every last 1/1000th of a penny on ads, then yeah, maybe A/B testing the shade of blue of a button can be justified.
The more other companies are "Data Driven" [like the somewhat unfortunate examples the author chose], as opposed to "Hypothesis Driven", the more there is room for somebody else to fry bigger fish.
In other words, it's not the "data's" fault, it is ours.
I'm working within several different businesses right now, and the consistent theme I'm trying to relate to the folks I work with is that data-driven techniques can take you right up to the edge of what is known to be possible. It's the people who work with the ambiguity there and take leaps into the unknown that ultimately change things. It's fine to want to be part of the pack, but for the really ambitious folks being at the front-edge of the pack is still being part of the pack. Learning to make the move out in front is the hard part.
A bit off-topic, but this explains why we got Windows 8.x and the upcoming Windows 10: data-driven metrics.
When will Microsoft learn that developers and advanced users turned off the "phone-home" metrics gathering functions in Windows XP, Vista, 7 and Office?
People want Windows 10 to be Windows 7.5. It would be nice to get some lost Windows XP functionality back, and to have shell bugs fixed that have been there since Vista.
Before the internet (and being able to track every single action), successful companies were built. It can be done. Using data to drive decisions has some value, but it's not the end-all solution, it's merely a piece of the puzzle.
"Bipsync provides a research automation platform to maximize the productivity of professional investors. Founded in Silicon Valley in 2012 by experienced investors and software developers at Stanford University, the company uses modern technologies and user-centered design to speed up data capture, automate research maintenance and identify insights that drive better decisions for investors and funds." [1]
I mean, maybe I shouldn't be looking for patterns, because, y'know, data. But it seems oddly conflicting to be pitching a product that encourages the use of data to drive decisions and then publicly condemning... the use of data to drive decisions.
Aside from that contradiction, the company just got seed funding four months ago. It's probably far too early to make decisions about the efficacy of being "data-driven." From personal experience, trying to manage people by telling them, "I'm right, let's do it my way," is terribly demotivating (and very prone to error). Conversely, trying to weigh everyone's input equally and sift out good ideas is an organizational nightmare that creates a ton of complexity. Complexity slows down execution. And who decides on the best ideas?
Creating a mental framework for hypothesis testing and building a product based on optimizing for specific metrics is, in my mind, what being data-driven actually means. There are no inconsistencies or personal biases. It's scalable. You can teach the entire team how to approach the design of a feature as a problem with a testable hypothesis. Politics go out the window as execution strategy is determined by return on investment of engineering resources. Being data-driven doesn't discourage creativity, it just allows you to reframe problems.
Buzzfeed clickbait titles are but a small (and, well, effective) subset of a vast array of largely positive things that come from being "data-driven." Attempting to demonize patterns of logical, rational decision-making because you (personally) don't like one outcome is... well, an anti-pattern. (It happens all of the time. See: The history of the scientific method. ;))
Sure, it's not sexy. But it doesn't need to be. It just needs to work.
It seems like micro-decisions are best made after looking at the data, but macro decisions are not.
"But fail often and fast," the cliché goes, but what about the opposite? It should be true by simple logical negation, right?
"Win seldom and slow": suddenly, collecting every useless piece of data becomes futile. You are not focused on winning, and without the burden of speed and the pressure to screw things up, you are absolutely calm and able to think things through.