Archives for posts with tag: information

The Atlantic, by Sara M. Watson
July 1, 2014 11:39 AM

Facebook has always “manipulated” the results shown in its users’ News Feeds by filtering and personalizing for relevance. But this weekend the social giant seemed to cross a line when it emerged that, two years ago, it had deliberately engineered users’ emotional responses in an “emotional contagion” experiment, published in the Proceedings of the National Academy of Sciences (PNAS).

As a society, we haven’t fully established how we ought to think about data science in practice. It’s time to start hashing that out.

Before the Data Was Big…

Data by definition is something that is taken as “given,” but somehow we’ve taken for granted the terms under which we came to agree on that fact. Once, the professional practice of “data science” was called business analytics. The field has now rebranded as a science in the context of buzzwordy “Big Data,” but unlike other scientific disciplines, most data scientists don’t work in academia. Instead, they’re employed in commercial or governmental settings.

The Facebook Data Science team is a prototypical data science operation. In the company’s own words, it collects, manages, and analyzes data to “drive informed decisions in areas critical to the success of the company, and conduct social science research of both internal and external interest.” Last year, for example, it studied self-censorship—when users input but do not post status updates. Facebook’s involvement with data research goes beyond its in-house team. The company is actively recruiting social scientists with the promise of conducting research on “recording social interaction in real time as it occurs completely naturally.” So what does it mean for Facebook to have a Core Data Science Team that describes its work—on its own product—as data science?

Contention about just what constitutes science has been around since the start of scientific practice. By claiming that what it does is data science, Facebook benefits from the imprimatur of an established body of knowledge. It looks objective, authoritative, and legitimate, built on the backs of the scientific method and peer review. Publishing in a prestigious journal, Facebook legitimizes its data collection and analysis activities by demonstrating their contribution to scientific discourse as if to say, “this is for the good of society.”

So it may be true that Facebook offers one of the largest samples of social and behavioral data ever compiled, but all of its studies—and this one, on emotional contagion—only describe things that happen on Facebook. The data is structured by Facebook, entered in a status update field created by Facebook, produced by users of Facebook, analyzed by Facebook researchers, with outputs that will affect Facebook’s future News Feed filters, all to build the business of Facebook. As research, it is an over-determined and completely constructed object of study, and its outputs are not generalizable.

Ultimately, Facebook has only learned something about Facebook.

The Wide World of Corporate Applied Science

For-profit companies have long conducted applied science research. But the reaction to this study seems to suggest there is something materially different in the way we perceive commercial data science research’s impacts. Why is that?

At GE or Boeing, two long-time applied science leaders, the incentives for research scientists are the same as they are for those at Facebook. Employee-scientists at all three companies hope to produce research that directly informs product development and leads to revenue. However, the outcomes of their research are very different. When Boeing does research, it contributes to humanity’s ability to fly. When Facebook does research, it serves its own ideological agenda and perpetuates Facebooky-ness.

Facebook is now more forthright about this. In a response to the recent controversy, Facebook data scientist Adam Kramer wrote, “The goal of all of our research at Facebook is to learn how to provide a better service…We were concerned that exposure to friends’ negativity might lead people to avoid visiting Facebook. We didn’t clearly state our motivations in the paper.”

Facebook’s former head of data science Cameron Marlow offers, “Our goal is not to change the pattern of communication in society. Our goal is to understand it so we can adapt our platform to give people the experience that they want.”

But data scientists don’t just produce knowledge about observable, naturally occurring phenomena; they shape outcomes. A/B testing and routinized experimentation in real time are done on just about every major website in order to optimize for certain desired behaviors and interactions. Google designers infamously tested 41 shades of blue. Facebook has already experimented with the effects of social pressure in getting out the vote, raising concerns about selective digital gerrymandering. What might Facebook do with its version of this research? Perhaps it could design the News Feed to show us positive posts from our friends, making us happier and encouraging us to spend more time on the site. Or might it show us more sad posts, encouraging us to spend more time on the site because we have more to complain about?
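To make the mechanics concrete, here is a minimal, purely illustrative sketch of the routinized A/B testing described above: users are deterministically bucketed into a control group and a treatment group, each group is served a different variant, and an engagement metric is compared afterward. The user IDs, the simulated effect size, and the metric are all invented for the example, not drawn from any company’s actual pipeline.

```python
import hashlib
import random
from statistics import mean

def assign_variant(user_id: str) -> str:
    """Deterministically bucket a user into 'control' or 'treatment'."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 else "control"

def simulate_session(variant: str) -> float:
    """Stand-in for a logged engagement metric (e.g., minutes on site)."""
    base = 12.0 if variant == "treatment" else 11.0  # made-up effect for illustration
    return random.gauss(base, 3.0)

# Assign 10,000 hypothetical users and compare the two groups.
sessions = {"control": [], "treatment": []}
for i in range(10_000):
    variant = assign_variant(f"user-{i}")
    sessions[variant].append(simulate_session(variant))

for variant, values in sessions.items():
    print(f"{variant}: mean engagement = {mean(values):.2f}")
```

The point of the sketch is only that the machinery is mundane: a bucketing rule, a variant, and a metric to optimize.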

Should we think of commercial data science as science? When we conflate the two, we assume companies are accountable for producing generalizable knowledge and we risk according their findings undue weight and authority. Yet when we don’t, we risk absolving practitioners from the rigor and ethical review that grants authority and power to scientific knowledge.

Facebook has published a paper in an attempt to contribute to the larger body of social science knowledge. But researchers today cannot possibly replicate Facebook’s experiment without Facebook’s cooperation. The worst outcome of this debacle would be for Facebook to retreat and avoid further public relations fiascos by keeping all its data science research findings internal. Instead, if companies like Facebook, Google, and Twitter are to support an open stance toward contributing knowledge, we need researchers with non-commercial interests who can run and replicate this research outside of the platform’s influence.

Facebook sees its users not as a population of human subjects, but as a consumer public. Therefore, we—that public and those subjects—must ask the bigger questions. What are the claims that data science makes both in industry and academia? What do they say about the kinds of knowledge that our society values?

We need to be more critical of the production of data science, especially in commercial settings. The firms that use our data have asymmetric power over us. We do them a further favor by unquestioningly accepting their claims to the prestige, expertise, and authority of science.

Ultimately, society’s greatest concerns with science and technology are ethical: Do we accept or reject the means by which knowledge is produced and the ends to which it is applied? It’s a question we ask of nuclear physics, genetic modification—and one we should ask of data science.

By Adam Frank

June 11, 2013 2:41 PM ET
Big Data may not be much to look at, but it can be powerful stuff. For instance, this is what the new National Security Agency (NSA) data center in Bluffdale, Utah, looks like. (Photo: George Frey/Getty Images)

New technologies are not all equal. Some do nothing more than add a thin extra layer to the topsoil of human behavior (e.g., Teflon and the invention of non-stick frying pans). Some technologies, however, dig deeper, uprooting the norms of human behavior and replacing them with wholly new possibilities. For the last few months I have been arguing that Big Data — the machine-based collection and analysis of astronomical quantities of information — represents such a turn. And, for the most part, I have painted this transformation in a positive light. But last week’s revelations about the NSA’s PRISM program have put the potential dangers of Big Data front and center. So, let’s take a peek at Big Data’s dark side.

The central premise of Big Data is that all the digital breadcrumbs we leave behind as we go about our everyday lives create a trail of behavior that can be followed, captured, stored and “mined” en masse, providing the miners with fundamental insights into both our personal and collective behavior.

The initial “ick” factor from Big Data is the loss of privacy, as pretty much every aspect of your life (location records via mobile phones, purchases via credit cards, interests via web-surfing behavior) has been recorded — and, possibly, shared — by some entity somewhere. Big Data moves from “ick” to potentially harmful when all of those breadcrumbs are thrown in a machine for processing.

This is the “data-mining” part of Big Data and it happens when algorithms are used to search for statistical correlations between one kind of behavior and another. This is where things can get really tricky and really scary.
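As a toy illustration of that data-mining step, the sketch below takes two columns of logged behavior and measures how strongly they co-vary. The behaviors and the records are invented; the point is only that a bare statistical association like this is what an algorithm surfaces, with no notion of fairness or causation attached.

```python
from math import sqrt

# Invented records: (late-night purchases per month, missed payments per year)
records = [
    (2, 0), (5, 1), (9, 2), (1, 0), (12, 3), (7, 1), (3, 0), (10, 2),
]

def pearson(pairs):
    """Pearson correlation coefficient between the two columns."""
    n = len(pairs)
    xs, ys = [p[0] for p in pairs], [p[1] for p in pairs]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# A high value is exactly the kind of pattern a scoring model might seize on,
# whether or not the link is fair, causal, or within the person's control.
print(f"correlation = {pearson(records):.2f}")
```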

Consider, for example, the age-old activity of securing a loan. Back in the day you went to a bank and they looked at your application, the market and your credit history. Then they said “yes” or “no.” End of story. In the world of Big Data, banks now have more ways to assess your credit worthiness.

“We feel like all data is credit data,” former Google CIO Douglas Merrill said last year in The New York Times. “We just don’t know how to use it yet.” Merrill is CEO of ZestCash, one of a host of start-up companies using information from sources such as social networks to determine the probability that an applicant will repay their loan.

Your contacts on LinkedIn can be used to assess your “character and capacity” when it comes to loans. Facebook friends can also be useful. Have rich friends? That’s good. Know some deadbeats, not so much. Companies will argue they are only trying to sort out the good applicants from the bad. But there is also a real risk that you will be unfairly swept into an algorithm’s dead zone and disqualified from a loan, with devastating consequences for your life.

Jay Stanley of the ACLU says being judged based on the actions of others is not limited to your social networks:

Credit card companies sometimes lower a customer’s credit limit based on the repayment history of the other customers of stores where a person shops. Such “behavioral scoring” is a form of economic guilt-by-association based on making statistical inferences about a person that go far beyond anything that person can control or be aware of.

The link between behavior, health and health insurance is another gray (or dark) area for Big Data. Consider the case of Walter and Paula Shelton of Gilbert, Louisiana. Back in 2008, BusinessWeek reported how the Sheltons were denied health insurance when records of their prescription drug purchases were pulled. Even though their blood pressure and anti-depression medications were for relatively minor conditions, the Sheltons had fallen into another algorithmic dead zone in which certain kinds of purchases trigger red flags that lead to denial of coverage.

Since 2008 the use of Big Data by the insurance industry has only become more entrenched. As The Wall Street Journal reports:

Companies also have started scrutinizing employees’ other behavior more discreetly. Blue Cross and Blue Shield of North Carolina recently began buying spending data on more than 3 million people in its employer group plans. If someone, say, purchases plus-size clothing, the health plan could flag him for potential obesity—and then call or send mailings offering weight-loss solutions.

Of course no one will argue with helping folks get healthier. But with insurance costs dominating company spreadsheets, it’s not hard to imagine how that data about plus-size purchases might someday factor into employment decisions.

And then there’s the government’s use, or misuse, of Big Data. For years critics have pointed to no-fly lists as an example of where Big Data can go wrong.

No-fly lists are meant to keep people who might be terrorists off of planes. It has long been assumed that data harvesting and mining are part of the process for determining who is on a no-fly list. So far, so good.

But the stories of folks unfairly listed are manifold: everything from disabled Marine Corps veterans to (at one point) the late Sen. Ted Kennedy. Because the methods used in placing people on the list are secret, getting off the list can, according to Conor Friedersdorf of The Atlantic, be a Kafkaesque exercise in frustration.

A 2008 National Academy of Sciences report exploring the use of Big Data techniques for national security made the dangers explicit:

The rich digital record that is made of people’s lives today provides many benefits to most people in the course of everyday life. Such data may also have utility for counterterrorist and law enforcement efforts. However, the use of such data for these purposes also raises concerns about the protection of privacy and civil liberties. Improperly used, programs that do not explicitly protect the rights of innocent individuals are likely to create second-class citizens whose freedoms to travel, engage in commercial transactions, communicate, and practice certain trades will be curtailed—and under some circumstances, they could even be improperly jailed.

So where do we go from here?

From credit to health insurance to national security, the technologies of Big Data raise real concerns about far more than just privacy (though those privacy concerns are real, legitimate and pretty scary). The debate opening up before us is an essential one for a culture dominated by science and technology.

Who decides how we go forward? Who determines if a technology is adopted? Who determines when and how it will be deployed? Who has the rights to your data? Who speaks for us? How do we speak for ourselves?

These are the Big Questions that Big Data is forcing us to confront.

Editor’s note: This is a guest post from Scott Young of ScottYoung.com.

In high school, I rarely studied. Despite that, I graduated second in my class. In university, I generally studied less than an hour or two before major exams. However, over four years, my GPA always sat between an A and an A+.

Recently I had to write a law exam worth 100% of my final grade. Unfortunately, I was out of the country and didn’t get back by plane until late Sunday night. I had to write the test at 9 am Monday morning. I got an A after just one hour of review on the plane.

Right now, I’m guessing most of you think I’m just an arrogant jerk. And, if the story ended there, you would probably be right.

Why do Some People Learn Quickly?

The fact is most of my feats are relatively mundane. I’ve had a chance to meet polyglots who speak 8 languages, people who have mastered triple course loads and students who went from C or B averages to straight A+ grades while studying less than before.

The story isn’t about how great I am (I’m certainly not) or even about the fantastic accomplishments of other learners. The story is about an insight: that smart people don’t just learn better, they also learn differently.

It’s this different strategy, not just blind luck and arrogance, that separates rapid learners from those who struggle.

Most sources say that the difference in IQ scores across a group is roughly half genes and half environment. I definitely won’t discount that. Some people got a larger sip of the genetic cocktail. Some people’s parents read their kids Chaucer and tutored them in quantum mechanics.

However, despite those gifts, if rapid learners had a different strategy for learning than ordinary students, wouldn’t you want to know what it was?

The Strategy that Separates Rapid Learners

The best way to understand the strategy of rapid learners is to look at its opposite, the approach most people take: rote memorization.

Rote memorization is based on the theory that if you look at information enough times it will magically be stored inside your head.

This wouldn’t be a terrible theory if your brain were like a computer. Computers need just one attempt to store information perfectly. However, in practice rote memorization means reading information over and over again. If you had to save a file 10 times on a computer to be sure it was stored, you’d probably throw that computer in the garbage.

The strategy of rapid learners is different. Instead of memorizing by rote, rapid learners store information by linking ideas together. Instead of repetition, they find connections. These connections create a web of knowledge that can succeed even when you forget one part.

When you think about it, the idea that successful learners create a web has intuitive appeal. The brain isn’t a computer hard drive, with millions of bits and bytes in a linear sequence. It is an interwoven network of trillions of neurons.

Why not adopt the strategy that makes sense with the way your brain actually works?

Not a New Idea, But an Incredibly Underused Idea

This isn’t a new idea, and I certainly didn’t invent it.

Polymath, cognitive scientist and AI researcher Marvin Minsky once said:

“If you understand something in only one way, then you don’t really understand it at all. The secret of what anything means to us depends on how we’ve connected it to all other things we know. Well-connected representations let you turn ideas around in your mind, to envision things from many perspectives until you find one that works for you. And that’s what we mean by thinking!” [emphasis mine]

Benny Lewis, polyglot and speaker of 8 languages, recently took up the task of learning Thai in two months. One of his first jobs was to memorize a phonetic script (Thai has a different alphabet than English). How did he do it?

“I saw [a Thai symbol] and needed to associate it with ‘t’, I thought of a number of common words starting with t. None of the first few looked anything like it, but then I got to toe! The symbol looks pretty much like your big toe, with the circle representing the nail of the second toe (if looking at your left foot). It’s very easy to remember and very hard to forget! Now I think of t instantly when I see that symbol.

It took time, but I’ve come up with such an association for all [75] symbols. Some are funny, or nerdy, or related to sex, or something childish. Some require a ridiculous stretch of the imagination to make it work. Whatever did the job best to help me remember.”

The famous British savant Daniel Tammet has the ability to multiply five-digit numbers in his head. He explains that he can do this because each number, to him, has a color and a texture; he doesn’t just do the straight calculation, he feels it.

All of these people believe in the power of connecting ideas. Connecting ideas together, as Minsky describes. Linking ideas with familiar pictures, like Lewis. Or even blending familiar shapes and sensations with the abstract to make it more tangible, as Tammet does.

How Can You Become a Rapid Learner?

So all this sounds great, but how do you actually do it?

I’m not going to suggest you can become a Tammet, Lewis or Minsky overnight. They have spent years working on their method. And no doubt, some of their success is owed to their genetic or environmental quirks early in life.

However, after writing about these ideas for a couple of years I have seen people make drastic improvements in their learning method. It takes practice, but students have contacted me to let me know they are now getting better grades with less stress; one person even credited the method with allowing him to get an exam exemption for a major test.

Some Techniques for Learning by Connections

Here are some of the most popular tactics I’ve experimented with and suggested to other students:

1. Metaphors and Analogy

Create your own metaphors for different ideas. Differential calculus doesn’t need to be just an equation; it can be the relationship between a car’s odometer and its speedometer. Functions in computer programming can be like pencil sharpeners. The balance sheet for a corporation can be like the circulatory system.
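In fact, the odometer/speedometer image is more than a metaphor; it is literally the definition of the derivative, with speed as the instantaneous rate of change of distance travelled:

```latex
% Speed is the derivative of distance: the speedometer reads the
% instantaneous rate of change of the odometer.
\[
v(t) = \frac{\mathrm{d}s}{\mathrm{d}t}
\]
```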

Shakespeare used metaphor prolifically to create vivid imagery for his audience. Your professor might not be the bard, but you can step in and try them yourself.

2. Visceralization

Visceralization is a portmanteau of visceral and visualization. The goal here is to envision an abstract idea as something more tangible. Not just by imagining a picture, but by integrating sounds, textures and feelings (like Tammet does).

When learning how to find the determinant of a matrix, I visualized my hands scooping through one axis of the matrix and dropping through the other, to represent the addition and subtraction of the elements.
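For reference, the addition-and-subtraction pattern being visualized there is the familiar expansion of a 3×3 determinant (the rule of Sarrus): the “scooping” diagonals contribute positive terms and the opposite diagonals contribute negative ones.

```latex
% Rule of Sarrus: downward diagonals add, upward diagonals subtract.
\[
\det\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}
  = aei + bfg + cdh - ceg - bdi - afh
\]
```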

Realize you already do this, just maybe not to the same degree. Whenever you see a graph or pie chart for an idea, you are taking something abstract and making it more tangible. Just be creative in pushing that a step further.

3. The 5-Year-Old Method

Imagine you had to explain your toughest subject to a 5-year-old. Now practice that.

It may be impossible to explain thermodynamics to a first grader, but the process of explanation forces you to link ideas. How would you explain the broader concepts in simpler terms a child would understand?

4. Diagramming

Mind-mapping is becoming increasingly popular as a way of retaining information. That’s the process of starting with a central idea and brainstorming adjacent connections. But mind-mapping is just the skin of the onion.

Creating diagrams or pictures can allow you to connect ideas together on paper. Instead of having linear notes, organized in a hierarchy, what if you had notes that showed the relationships between all the ideas you were learning?

5. Storytelling to Remember Numbers and Facts

Pegging is a method people have been using for years to memorize large amounts of numbers or facts. What makes it unique isn’t just that it allows people to perform amazing mental feats (although it can), but the way it allows people to remember information: by connecting the numbers to a story.

Pegging is a bit outside the scope of this article, but the basic idea is that each digit is represented by the sound of a consonant (for example: 0=c, 3=t, 4=d…). This allows you to convert any number into a string of consonants (4304 = d-t-c-d).

The system allows you to add any number of vowels in between the consonants to make nouns (d-t-c-d = dot code). You can then turn this list of nouns into a story (The dot was a code that the snake used…). Then all you need to do is remember the story to recover the nouns, then the consonants, and finally the original numbers.
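Here is a rough sketch of just the encoding step, using only the article’s example mapping (0 = c, 3 = t, 4 = d). A full peg system assigns a sound to every digit from 0 to 9; the mapping below is deliberately left incomplete rather than inventing a standard scheme.

```python
# Pegging encoding sketch using only the article's example mapping
# (0 -> c, 3 -> t, 4 -> d). A complete peg system covers every digit 0-9.
DIGIT_TO_CONSONANT = {"0": "c", "3": "t", "4": "d"}

def number_to_consonants(number: int) -> str:
    """Convert a number into its consonant skeleton, e.g. 4304 -> 'd-t-c-d'."""
    try:
        return "-".join(DIGIT_TO_CONSONANT[d] for d in str(number))
    except KeyError as missing:
        raise ValueError(f"no consonant assigned to digit {missing}") from None

print(number_to_consonants(4304))  # d-t-c-d -> pad with vowels to get "dot code"
```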

The Way We Were Taught to Learn is Broken

Children are imaginative, creative and, in many ways, the epitome of this rapid learning strategy. Maybe it’s the current school system, or maybe it’s just a consequence of growing up, but most people eventually suppress this instinct.

The sad truth is that the formal style of learning makes learning less enjoyable. Chemistry, mathematics, computer science or classic literature should spawn new ideas, connections in the mind, exciting possibilities, not just the right answers for a standardized test.

The irony is that if that childlike, informal way of learning came back, even just in part, perhaps more people would succeed on those very tests. Or at least enjoy the process of learning.

Scott Young is a university student, author and head of an online service designed to teach you rapid learning tactics. The program is currently sold out, but you can sign up here to get announcements when it reopens.
