These assessments help all kinds of businesses make decisions, such as where to take risks and how to improve their operations. But how do ratings affect the people being rated? New research from Kellogg’s Hatim Rahman suggests that despite the opacity of these algorithms (indeed, in large part because of it) they shape people’s behavior in unexpected ways.
Rahman, assistant professor of management and organizations, investigated the impact of algorithms on an online job platform for freelancers. “There’s a lot of talk about the future of work and what it’s going to look like,” says Rahman. “These platforms are part of the next frontier.”
Like many sites that promise to connect freelancers with paying clients (among them Upwork, TopCoder, and TaskRabbit), the platform Rahman studied used a complex algorithm to score freelancers. Clients could sort and select potential hires based on this score.
To find out how this opaque rating system was affecting freelancers, Rahman logged onto the platform (which he nicknames “TalentFinder”) and interviewed freelancers and the clients who hired them. He also analyzed written notices from TalentFinder and postings on freelance discussion boards.
All of the workers he spoke with experienced ongoing paranoia about the possibility of a sudden, inexplicable downgrade. How they responded to this fear depended less on the strength of their rating than on whether they had previously experienced declines in their ratings—and, crucially, how dependent they were on the platform for their income.
Traditionally, Rahman explains, scholars have described workplace evaluations as helping to cement an “iron cage” around workers because they allow employers to constrain behavior and set the standards for success. Algorithmic ratings have a different, potentially undermining impact.
“Opaque third-party evaluations can create an ‘invisible cage’ for employees,” writes Rahman, “because they experience such evaluations as a form of control and yet cannot decipher or learn from the success criteria.”
Of course, as Rahman points out, not only do many of us live in some version of this invisible cage—we also play a small part in shaping its bars. Every time we rate an Amazon purchase or a Lyft driver, we’re potentially impacting the livelihoods of others.
“The people who use these platforms are largely unaware of the role they play in influencing these systems and their algorithms,” he says. “It just feels like a very transactional relationship.”
Making the Evaluation Criteria a Mystery
“TalentFinder” is one of the largest platforms of its kind. By 2015, more than 12 million freelancers had registered on the site, along with 5 million clients based in over 100 countries. Clients could choose from a wide range of gig workers, from assistants to marketers to software engineers.
When Rahman signed up for TalentFinder to begin his research in 2013, the platform rated freelancers according to a transparent system of project scores and overall scores. Upon completion of a project, clients would rate freelancers on a scale of 1 to 5 on a number of attributes, including “Skills,” “Quality of Work,” and “Stick to Schedule.” Aggregating these scores produced an overall project score, and combining project scores (weighted by the dollar value of each project) yielded an overall rating out of five stars, which was displayed on the freelancer’s profile.
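To make the mechanics concrete, here is a minimal sketch of how such a two-stage, dollar-weighted aggregation could work, based only on the description above. The attribute names, function names, and figures are illustrative assumptions, not TalentFinder’s actual code.

```python
# Illustrative sketch of the pre-2015 rating scheme described in the article.
# Attribute names, weights, and numbers are assumptions for demonstration only.

def project_score(attribute_ratings):
    """Average the client's 1-5 ratings across attributes for one project."""
    return sum(attribute_ratings.values()) / len(attribute_ratings)

def overall_score(projects):
    """Combine project scores, weighted by each project's dollar value.

    `projects` is a list of (project_score, dollar_value) pairs.
    """
    total_value = sum(value for _, value in projects)
    return sum(score * value for score, value in projects) / total_value

# Example: two completed projects.
p1 = project_score({"Skills": 5, "Quality of Work": 4, "Stick to Schedule": 5})
p2 = project_score({"Skills": 3, "Quality of Work": 4, "Stick to Schedule": 4})

# A $2,000 project influences the overall rating twice as much as a $1,000 one.
print(round(overall_score([(p1, 2000), (p2, 1000)]), 2))  # -> 4.33
```

One consequence of this design, visible even in the toy example, is score compression: averaging several generous 1-to-5 ratings pulls almost everyone toward the top of the scale, which is exactly the differentiation problem described next.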
As simple as this rating system was, it presented a problem for TalentFinder: freelancer ratings were too high across the board and lacked the differentiation that would make them useful to clients. At one point, more than 90 percent of freelancers had at least four out of five stars, and 80 percent had a near-perfect rating.
The solution: an algorithm. Beginning in 2015, freelancers were rated on a scale of 1 to 100, based on intentionally mysterious criteria.
“We do not disclose the exact calculation for your score,” TalentFinder wrote in a public blog post three months after the algorithm was introduced. “Doing so would make it easier for some users to artificially inflate their ratings.” After implementing the new algorithm, only about 5 percent of freelancers received a score of 90 or higher.
To study the effects of the new rating system on freelancers, Rahman collected data between 2015 and 2018 from three sources: 80 interviews with freelancers and 18 with clients; written communications, including more than two thousand TalentFinder community discussion-board messages related to the algorithm and all of TalentFinder’s public posts on the subject; and his own observations as a registered user.
Widespread Paranoia
As Rahman sifted through his interviews and written sources, he was struck by the consistency of the complaints he heard. All of the freelancers he spoke to experienced paranoia about possible sudden drops in their scores and constant frustration at their inability to learn and improve based on score fluctuations.
“What surprised me the most was that the more experienced freelancers on the platform didn’t necessarily gain any advantage in terms of understanding how the algorithm worked,” he says. “Generally, those who do well in a system are able to figure out what’s going on to some extent. But in this context, even people whose scores hadn’t changed were very much in the dark.”
Rahman observed two distinct reactions to this paranoia and frustration. The first is what he calls “experimental reactivity”: freelancers tried, through trial and error, to boost their ratings, for example by taking on only short-term projects or proactively asking clients to leave feedback.
The second reaction was to protect one’s rating through what Rahman calls “constrained reactivity.” These freelancers tried various means of limiting their exposure to the rating algorithm, sometimes asking clients they met on TalentFinder to take communication and payment off the platform so their ratings wouldn’t be affected. Others simply did nothing, in the hope that inactivity would preserve their rating.
Rahman isolated two key factors that determined which freelancers experimented and which pulled back from the platform or simply did nothing: the extent of a freelancer’s reliance on the platform for income, and whether their rating had recently dropped.
How these factors played out depended on whether a freelancer had a high or low rating.
Highly rated freelancers who relied heavily on the platform chose their tactics based on whether they had seen a recent drop in their rating. Those whose score had dropped experimented with tactics to raise it; those whose score had held steady limited their activity on the platform in an attempt to protect it. Highly rated freelancers with low platform dependence reduced their time on TalentFinder regardless of whether their score had dropped.
For freelancers with lower scores, reliance on the platform determined the path they took. Those who depended on it for income kept experimenting even as their ratings continued to fluctuate, while those who did not gradually reduced their activity.
Rahman explains that workers’ positions feel more precarious on these platforms than in traditional work environments because, well, they are. While typical employer reviews are largely intended to help an employee improve, algorithm-driven reviews on a site like TalentFinder are primarily intended to help the platform automate the job of picking the “best” workers from a vast pool, thereby satisfying its clients.
“For platforms, it’s about optimizing their overall dynamics; their primary goal is not to help employees improve,” says Rahman. “For people’s day-to-day experiences, especially when they rely on the platform for work, that can be very frustrating and difficult.”
Living in the Cage, and Shaping It
Since conducting this research, Rahman says, he has gained more and more insight into the various invisible cages most of us live in. He points, for example, to recent reports detailing how everything from our TVs and vacuum cleaners to the smartphone apps we use for medical prescriptions and insurance collects our data and uses it to train proprietary algorithms in ways that are largely invisible to us.
“I think the metaphor of the invisible cage applies more broadly as we enter this system where everything we do and say, and how we interact, feeds algorithms that we don’t necessarily know about,” he says.
He points out that some people are freer to withdraw from these platforms than others: it all comes down to their level of dependence on them. A fruitful area for future research, he says, could be examining how characteristics such as race, gender, and income correlate with reliance on platforms and their algorithmic ratings. For example, it is important to understand whether people from certain racial groups are more likely to be “rated” (and potentially blacklisted) by algorithms before they can rent an apartment, open a credit card, or sign up for health insurance.
“The hope in bringing this metaphor of the invisible cage to the fore is to raise awareness of this phenomenon, hopefully in a way that people can connect with,” says Rahman. “Of course, even when we realize this, it’s hard to know what to do, given the complexity of these systems and the rate at which their algorithms change.”
Legislation is beginning to provide some oversight in an area that remains largely unregulated. The California Consumer Privacy Act, which took effect in 2020 and is the strongest such legislation in the country, establishes the rights of online users to know about, delete, and opt out of the collection of their personal data. In 2018, the European Union’s even more aggressive legislation toward the same end, the General Data Protection Regulation, took effect. “It’s an encouraging sign,” says Rahman, “but regulation alone is far from a panacea.”