What’s on Top of TikTok?

TikTok videos can easily reach billions of views. But because the app won’t share what’s popular, we don’t know just what the world is watching.

We are not the cultural consumers we used to be. Data, streaming, and Web 2.0 have remade how we read and how we watch. Platforms are the new publishers. But although we consume culture differently now, much of how we talk about and study it remains lodged in the analog world of the 20th century. It’s time for our methods to catch up with our objects. Born digital culture requires a born digital approach. Hacking the Culture Industries showcases the power of data-driven cultural criticism, and reinvigorates cultural studies for the 21st century. These five new essays move between book culture, streaming TV, social media, and online writing platforms: Squid Game and streaming hits; Goodreads and romance fiction; Twitter and hive-critique; TikTok and cultural attention; and who gets to decide who wins book prizes in the age of social networks. This series takes up a call we issued a year ago: to hack the culture industries. To challenge their dominance by using their data to study them and their stranglehold on cultural production. To tell new stories about culture in a time of ubiquitous data.

—Laura B. McGrath, Dan Sinykin, and Richard Jean So


First, watch this very brief video. Or, if you prefer, read this description: For five seconds, an adult man, dressed up as the boy-wizard Harry Potter, appears to fly through the air on a broomstick, while twinkly music plays (a parodic riff on the Potterverse theme). Next, he exposes the ostensible mechanism behind his illusion: a contraption (concealed beneath his midsection) involving a mirror, a prosthetic leg, and a longboard. Finally, he engages with his cameraman in a verbal, slapstick squabble over who owns the longboard. “Come back here!” the cameraman yells, while the faux-wizard rolls away.

Still from “Zach King’s Magic Ride”

The sequence, at only 18 seconds long, is almost vanishingly slight. Yet it has done anything but escape notice. According to multiple sources, “Zach King’s Magic Ride”—as it is sometimes called—which was first posted to TikTok on December 9, 2019, is currently the “Most Viewed TikTok of all Time.”1 That makes it, arguably, one of the most visible new aesthetic creations in the contemporary world.

During only the first four days of its digital life, the video earned over 2 billion views.2 (King’s channel has since attracted 75 million “followers” and 1.1 billion “likes”). Even where such view counts are inflated—as they may often be—they are formidable, relatively speaking. Comparable figures are hard to find. But here are some: according to IMDb, the highest grossing movies of all time have sold around 400 million tickets worldwide during their initial box office run (these metrics, too, have been charged with hyperbole);3 in the US, most New York Times bestsellers sell only 10,000 to 100,000 copies during their first years in print.4

For decades, humanist “cultural critics” like myself—academics working in disciplines like literature or art history—have shown how new popular media both reflect and affect the cultures from which they issue. But we have barely begun (and I will spare you the lit review)5 to broach the questions raised by the web’s most megaviral content (content like King’s “Magic Ride,” which captivates millions, or even billions, of users). For example: Do many of the most popular TikToks, we might ask, also involve illusions or magic? (They do.)6 And what is the significance of this pattern—especially given that more than half of TikTok’s fanbase, despite the platform’s Gen Z affiliations, are now over the age of 30?7 Is there a reason why contemporary citizens might suddenly fixate, so overwhelmingly, on visual trickery? (Think “fake news” or “alternative facts.”) And how might that preoccupation affect how they think? (For clues: see art-historical treatments of the trompe l’oeil.)

In my own work of the past few years, I’ve pursued these types of cultural-critical questions. I’ve identified pervasive strains of viral content across multiple major platforms—from Facebook and Twitter (now “X”) to YouTube and Instagram—and considered what their ubiquity reveals about our collective, historical moment.

In the process, I’ve had the occasion to learn this surprising fact: that it is both unexpectedly and especially difficult to find comprehensive data about the most popular social media content. At most, there are a few brief top 10- or 20-style lists (sometimes crowdsourced; sometimes released by the platforms themselves). But further catalogs are challenging, if not impossible, to locate outside of industry settings (and especially without paying high premiums).

Put another way: today, billions of global citizens peruse popular platforms for hours per day.8 And yet, as academics—and by extension, the general public—we don’t know what they are actually consuming: the types of content that they are viewing, liking, sharing, or commenting on in the largest quantities.

At first, this may seem a familiar story. It’s well-known—at least among tech-critical types—that the platforms are stingy with their data. And it’s generally agreed that this is a problem, not only for scholarly inquiry, but for democratic society. When Elon Musk, for example, recently announced that he would close down free access to Twitter’s academic and public data source (or “API”) he did not simply disrupt multiple active scholarly research projects (and by extension, academic careers). He made it easier for the platform, moving forward, to conceal its role in ongoing crises, like misinformation or cyberharassment.

But the issue that I describe is more specific, and less regularly discussed in public fora. Even where social media companies have been more forthcoming with their data—opening their APIs and sharing their data with researchers more freely—they have still made it especially difficult to find information about the most popular digital content. And though this opacity has caused similar problems for democratic society, it has also bred its own unique types of troubling effects, warping collective culture toward the interests of the few.

Here, therefore, I’d like to take you through the details of this transparency problem: what it is, why it’s worrisome, and what we can do about it. We can’t easily compel the platforms to release more data, although that political fight rages on, with at least a few potential victories on the horizon.9 But we can more deliberately attempt to discover and consider the web’s most widespread content, and we can scrounge together the relevant data through our own devices. This project is essential for the health of our media cultures; in the year 2023, it’s some of the most crucial work that cultural critics can do.

 

Good Data About (Popular) Content is Hard to Find

To grasp the problem, it’s necessary, first, to understand a few basics of social media research. Currently, this research emerges from a broad array of mostly social-scientific disciplines (political science, communications, rhetoric, and many more), in a near-infinite variety of forms (network analyses, ethnographies, et cetera).

For our purposes, two facts are relevant: First, only some of these scholars take a general sort of interest in the most popular or viral content. Instead, most—insofar as they address content at all—focus in on case studies involving narrower subtypes of content, like fake news, COVID misinformation, #MeToo tweets, or makeup tutorials.

Second, where these scholars do more data-driven research, they often find it particularly useful to use the above-mentioned tools that are called APIs (or “Application Programming Interfaces”). These programmatically accessible tools can be hosted by the platforms in academically accessible versions (there are also other, less research-oriented types of APIs). Moreover, they curate “back-end” platform data—that is, data beyond what you can see on your tailored feed—in easily manipulable formats (think spreadsheets, not newsfeeds).

But those who want to use these APIs face a simple but pernicious obstacle: the platforms, one by one, have shut them down. Traditionally, Meta was the major villain in this story. In 2021, the company suddenly disbanded the team that ran its research-oriented API, which was called CrowdTangle; in 2022, they then paused new user applications for the tool. Since then, CrowdTangle has been reportedly glitchy for those who already had access, and unavailable to those—like myself—who did not. Researchers in search of free, back-end Facebook data, therefore, have often had to work with the platform directly, usually by demonstrating some type of political urgency (e.g., working on fake news). But things change quickly. Even as this essay was being prepared for publication, Meta released information about the beta version of a new academic API for Facebook and Instagram data—though it remains to be seen how widely accessible and functional this emerging tool will be.

Twitter and Reddit, meanwhile, throughout the late 2010s, were often called “Petri dishes” for social media research, because they hosted freely accessible research-oriented APIs (YouTube did, too; but researchers often found it easier to work with text-based platforms). Recently, however, as mentioned—and thanks to Elon Musk’s takeover of Twitter—they have both cut off access to these APIs and announced that they will charge expensive fees for their future use. TikTok has recently opened a new academic API; but the tool remains largely untested in published research, due to forbidding terms of service.

However, for scholars like myself—that is, of a more cultural-critical cast—the problem with API data is slightly more specific, and, arguably, more acute. Such scholars, at least in theory, have more reason to ask general questions about the most popular content, rather than focusing on narrower case studies (e.g., tracking the spread of vaccine misinformation or conducting ethnographies of Estonian teens involved in Game of Thrones fan cultures). But this type of general data can be still harder to acquire. Even where the APIs have been open—making it easy to collect data concerning predefined subtypes of content (e.g., by user or keyword)—they’ve made it harder to survey content more unboundedly.

Take, for example, Twitter’s recent academic API. In its official documentation, the API offered clear methods for searching through past tweets by hashtag, keyword, or creator;10 but it provided no simple function for conducting wider searches of content, from a particular past-tense period, sorted by numbers of likes or retweets (there were workarounds, but none that could allow users to produce the desired “top” datasets without first surpassing their tweet quotas).11 Or take YouTube’s Data API, which remains accessible and free of charge. The documentation for this API does include one function that allows users to call up lists of vaguely defined “most popular” videos in a region, at the present moment.12 But when it comes to searching through larger stores of content from past periods, the suggested queries are narrowed down, for example by keyword, channel, or other subcategories (again, workarounds prove imperfect).13 Instagram’s regular tier Graph API, meanwhile, though it’s not oriented toward academics, does curate some data. But while it allows users to search for photos tagged with a particular hashtag, for example, it doesn’t allow them to turn up all top hashtags more broadly.14 (TikTok’s new academic API, if it becomes commonly used in published research, could, like Reddit’s, prove a partial exception to these rules, but this remains unclear—as will be discussed).15
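To make the YouTube case concrete, here is a minimal sketch of the one “most popular” request described above. It only builds the request URL for the Data API’s videos.list method with chart=mostPopular and does not send it. The helper name `most_popular_url`, the placeholder key, and the default region are illustrative assumptions, not anything taken from YouTube’s documentation.

```python
from urllib.parse import urlencode

# Endpoint for the YouTube Data API v3 videos.list method.
VIDEOS_ENDPOINT = "https://www.googleapis.com/youtube/v3/videos"


def most_popular_url(api_key, region_code="US", max_results=50, page_token=None):
    """Build (but do not send) a request for the current 'most popular' chart.

    The helper name and defaults are hypothetical; the query parameters
    (part, chart, regionCode, maxResults, pageToken, key) belong to videos.list.
    """
    params = {
        "part": "snippet,statistics",
        "chart": "mostPopular",     # a YouTube-selected chart, not a strict view ranking
        "regionCode": region_code,
        "maxResults": max_results,  # videos.list returns at most 50 items per page
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token
    return f"{VIDEOS_ENDPOINT}?{urlencode(params)}"


url = most_popular_url("YOUR_API_KEY", region_code="US")
print(url)
```

What the request leaves out is the point: none of the parameters used here names a past date range, so the call describes the present moment only.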

This doesn’t mean, of course, that academics can never find data on the most popular content—especially if they are willing to go pricey industry routes. Twitter, for example, though it has restricted the capacities of its academic API, offers a far more expensive “Engagement API” to businesses, which allows them to rank tweet output by popularity metrics.16 Facebook, meanwhile, has allowed certain social media analytics services—like Newswhip and Buzzsumo—to build apps using its API data, which then curate that data for (high-) paying customers. (A team of Michigan researchers, for example, has used the Newswhip app to do research surrounding Facebook’s top-performing link posts).17

By and large, however, the basic knowledge gap persists: academics don’t have the tools and resources necessary to study popular content, top-down. And so they continue, by dint not only of choice—but also of subtle technical necessity—to produce more bottom-up case studies, analyzing content that falls into categories of predefined concern (like content hashtagged MeToo, mentioning makeup, or published by nytimes.com).

 

It’s Not Just a Problem for Researchers

These are steep hurdles for academic progress. But they are also much broader social problems.

When it comes to the industry-wide closing of APIs, this fact is already abundantly clear. Throughout the past half decade, social scientists have warned that when a company like Meta stops sharing its back-end data, it threatens democracy. By withholding such data, the company can obscure how its platforms have disseminated QAnon’s anti-vax content, or facilitated Russian suppression of the African American vote. These scholars have also worked to free social data from its corporate siloes. They’ve joined members of Congress, for example, in calling for the platforms to embrace more transparency; and they’ve begun collective projects—like Documenting the Now—that preserve social media datasets concerning politically urgent themes. Today, faced with the new problem of Twitter and Reddit’s embargoed APIs, these scholars are already helping raise the alarm in the popular press.

Such general threats are concerning. But, when it comes specifically to the issue of obscuring top data, they are worrisome in a different way. Certainly, when platforms hide what their most popular content is, they shield themselves from political scrutiny.

For example, in the late 2010s, the journalist Kevin Roose started to publish weekly tweets listing the most-linked domains on Facebook (using what he called a “kludgey workaround” in CrowdTangle). In so doing, he revealed that new, alt-right news sources like Breitbart were more prevalent on the platform than many had imagined.18 Facebook, soon after, disbanded the CrowdTangle team. Around the same time, the company publicly critiqued Roose’s lists, arguing that their own internal metric of “reach” was a better measure of content’s visibility than his share counts (even though those, ironically, came directly from the API). Today, Facebook more fully controls the top content narrative by publishing their own quarterly “Widely Viewed Content Reports”; these include lists of the platform’s twenty most “widely viewed” posts or links, by their own private calculations.19 Their recently announced academic API could, in its way, continue to serve this aim. Its current documentation includes special discussion of the tool’s methods of calculating view counts, while stating that these metrics are “not available” for “posts or reels made before January 1, 2017.”20

But even beyond politics, platforms have more unique reasons for hiding the top data. Such control allows them to steer cultural production in their favor.

Where new streaming services like Netflix, for example, refuse to release their shows’ ratings, they do so because it generates hype: better for potential advertisers, in some cases, to guess—rather than know—the audience numbers. (In the first installment of this series, “Hacking the Culture Industries,” Melanie Walsh exposed similar issues in the book industry). Social media platforms, meanwhile, use the same sorts of tactics. TikTok, as Cory Doctorow has recently described, may juice certain videos’ view counts to encourage more lucrative advertainment.21 And the platform’s parallel efforts are elsewhere apparent: visit the company’s so-called Creative Center, for example, and you’ll encounter flashy, visual displays of the current most-engaged content (though not in large, manipulable quantities).22 Before you see the most-viewed or -shared content, however, you must first peruse the platform’s own custom category of “hot”; here, you’ll find a suspiciously large amount of brand-oriented content.

The platforms, in other words, like to encourage advertising by telling their own stories about what’s trending. And this, as Doctorow puts it, keeps their content “shitty.”

 

There Are Some (Imperfect) Solutions

All hope, however, is not lost. While we can’t access lengthy, platform-provided top datasets, we can still, at the very least, create facsimiles.

Let’s say, for example, that I want to compile a catalog of tens of thousands of the most-engaged TikToks produced during the last completed year, 2022. If it were early 2024, I might be able to generate these lists using the platform’s by-then-tested academic API—which, based on its documentation, could be more flexible than some of its predecessors (though this remains uncertain, for various reasons).23 If it were late 2021, I could plan to archive daily feeds of trending content on the platform’s front page (prelogin), once or twice per day during the following year. But I am slightly too early and too late for those options, respectively. So, like many other recent TikTok researchers, I need to rely on front-end data collection, using tools that have not been outlawed by the platform’s terms of service.24 I can also lean on prior work by analytics companies, and past efforts at crowdsourcing top data.

Here, I choose this method: First, I use the Internet Archive to create a compendium of Wikipedia and the analytics company Social Blade’s most-followed TikTok influencer lists, as regularly updated throughout 2022.25 Next, I manually collect metadata for all TikToks included on the most-followed creators’ pages from the front-end of the website (with help from a tool called zeeschuimer). In this way, I compile a compendium of metadata concerning 14,837 TikToks posted by 61 of the most followed accounts (all of which, except 15, have earned 100,000-plus plays). These TikToks, of course, do not simply stand in for the platform’s most popular content, which would include more one-hit wonders by unknown creators. Nor do they organically reflect what TikTok users favor, or “want”—especially given that, in some cases, TikTok has potentially inflated the numbers. But they do serve as one type of sample of what the platform makes especially visible (a sample, however, that is nonetheless skewed toward TikToks by famous creators).
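The assembly step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: the field names (`id`, `author`, `plays`, `posted`), the sample rows, and the helper `build_sample` are all hypothetical and do not reflect the export format of any particular tool.

```python
from datetime import date

# Hypothetical records: the field names and values are invented for illustration.
records = [
    {"id": "a1", "author": "creator_one", "plays": 2_500_000, "posted": date(2022, 3, 1)},
    {"id": "a1", "author": "creator_one", "plays": 2_500_000, "posted": date(2022, 3, 1)},  # duplicate capture
    {"id": "b2", "author": "creator_two", "plays": 80_000, "posted": date(2022, 7, 9)},
    {"id": "c3", "author": "creator_two", "plays": 900_000, "posted": date(2021, 12, 30)},
    {"id": "d4", "author": "someone_else", "plays": 5_000_000, "posted": date(2022, 5, 5)},
]
top_creators = {"creator_one", "creator_two"}  # stand-in for the most-followed lists


def build_sample(records, creators, year):
    """Keep one row per video posted by a listed creator in the given year."""
    seen, sample = set(), []
    for row in records:
        if row["id"] in seen:
            continue  # skip videos captured more than once
        if row["author"] in creators and row["posted"].year == year:
            seen.add(row["id"])
            sample.append(row)
    # Sort so the most-played videos come first.
    return sorted(sample, key=lambda r: r["plays"], reverse=True)


sample = build_sample(records, top_creators, 2022)
below_threshold = [r for r in sample if r["plays"] < 100_000]
print(len(sample), "videos;", len(below_threshold), "under 100,000 plays")
```

In this toy data the hypothetical video `d4` has the most plays but drops out because its creator is not on the most-followed list. That is the skew toward famous creators acknowledged above.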

Of the 61 creators whose work, produced in 2022, is here represented, the two most followed—consistently vying for the top spot—are Khaby Lame and Charli D’Amelio. (Zach King, meanwhile, hovers around sixth place). Lame is an Italian creator (Senegalese-born), who makes videos that mock other popular TikToks, usually by revealing that their “lifehacks” actually overcomplicate simple processes: in one, typical of the genre, he first replays another creator’s performance of removing the skin from a banana, elaborately; he then peels a banana normally, wearing his signature deflationary expression. D’Amelio, meanwhile, is a typically-crop-top-clad teen, who first became famous for a video of herself performing a dance called “Renegade.” (This dance was originally choreographed and posted on the platform by a Black teenager named Jalaiah Harmon; Harmon has since been awarded varied cultural accolades—like prizes from Ellen DeGeneres—as compensation for the act of appropriation; but she remains less well known, and well paid, than D’Amelio.)

Lame, in other words, is one of TikTok’s many bubble bursters: creators who comment, in meta-fashion, on other TikToks, often via the “stitch” function, which allows them to fuse their own videos with prior TikToks (WonJeong is another especially popular creator who specializes in these videos). D’Amelio, meanwhile, is one of the platform’s many sirens: scantily clad dancers who wink and wiggle their way through its frenetic feeds (Kimberly Loaiza is another, with a particularly devoted following). Both Lame and D’Amelio now have 150 million plus separate followers: a group about half the size of the US population.

 

Still from a TikTok by Khaby Lame (@khaby.lame)

Still from a TikTok by Charli D’Amelio (@charlidamelio).

Here are some of the trends that I’ve encountered, across the popular TikToks in my collection. Collectively, the most commonly used caption-words included (with or without hashtags): “tiktokfood,” “ASMR,” “love,” “mom,” “girlfriend,” “sister,” “dance,” “happy,” “think,” “time,” “today,” and the tearful laughter emoji (many other top words were in Spanish, Korean, or platform-specific lingo. Though 50 out of 61 of the accounts were entirely in English, some of the non-English accounts produced particularly large quantities of videos). Their most commonly featured songs included XOTeam’s original song “Reason,” The King Khan and BBQ Show’s “Love You So,” and Lil Nas X’s “Old Town Road.” A random sample of five hundred of the TikToks, as I hand-labeled them, could be sorted into the following, most-overarching typologies:

 

  • music performance videos, including dance, lip-synch, or scenes playacted along with music (42% of the TikToks);
  • autobiographical, or more ostensibly “real” content, presenting scenes from life or personal opinions, or communicating with fans (36%);
  • stunt, mime, or illusion videos, featuring physical feats, and bodily altering performances, potentially involving animation or cosplay (22%);
  • skit videos, presenting fictional scenes (20%);
  • materially oriented content, foregrounding physical, object-centric processes, like crafts, cooking, makeup application, et cetera (15%);
  • media-commentary TikToks, responding to other TikToks or entertainment (10%);
  • overt brand advertisements, often as enfolded into other categories (1%);
  • (and then, of course, other miscellany).

(Note that these percentages do not add up to 100, since many videos belong to multiple categories.)26
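The overlap can be illustrated with a short sketch. The hand labels below are invented for the example, and `category_shares` is a hypothetical helper. Only the final list of figures comes from the typology above, whose listed shares sum to 146.

```python
from collections import Counter

# Hypothetical hand labels: each video can carry more than one category.
labels = [
    {"music performance"},
    {"music performance", "skit"},
    {"autobiographical"},
    {"stunt or illusion", "music performance"},
]


def category_shares(labelled_videos):
    """Return the percentage of videos carrying each label."""
    counts = Counter(label for video in labelled_videos for label in video)
    total = len(labelled_videos)
    return {label: 100 * n / total for label, n in counts.items()}


shares = category_shares(labels)
print(shares)
print(sum(shares.values()))  # exceeds 100 here because labels overlap

# The shares reported in the list above (miscellany excluded), summed the same way.
reported = [42, 36, 22, 20, 15, 10, 1]
print(sum(reported))
```

Each percentage is computed against the full sample of videos, not against the total number of labels, which is why the column is not expected to total 100.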

 

Much more might be said about this data—it might also be collected more effectively, in due time (e.g., via the new API). For my purposes, however, the upshot is this: that a broad, top-oriented dataset can still be generated, despite existing limitations. And that doing so can serve important purposes.
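As one illustration of how caption-word tallies like those reported earlier might be drawn from such a dataset, here is a minimal sketch. The captions, the tiny stopword list, and the helper `caption_word_counts` are all assumptions made for the example, and the tokenizer keeps letters only, so it would miss emoji such as the tearful laughter face.

```python
import re
from collections import Counter

# Hypothetical captions, invented for illustration.
captions = [
    "Love this #dance with my sister",
    "#ASMR #tiktokfood time",
    "happy today! #dance",
]

# Small illustrative stopword list; a real analysis would need a fuller one.
STOPWORDS = {"this", "with", "my", "the", "a"}


def caption_word_counts(captions):
    """Count lowercase caption words, treating '#dance' and 'dance' alike."""
    counts = Counter()
    for caption in captions:
        # Match runs of letters only, which drops '#', digits, punctuation, and emoji.
        for word in re.findall(r"[^\W\d_]+", caption.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts


counts = caption_word_counts(captions)
print(counts.most_common(3))
```

Counting words with and without hashtags together matches how the tallies are described above. Multilingual captions would call for language-specific stopword lists.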

On the level of scholarly research, such data opens doors for cultural critical analysis. The TikToks in this collection, for example, confirm what is already obvious to casual onlookers about the platform: TikTok foregrounds dance, pranks, and snacks; TikTok is ruled by tweens; TikTok is, uniformly, almost alarmingly hypnotic, the expression of a COVID-addled culture’s collective death drive.

But these TikToks also lend support to less common characterizations: TikTok is—as discussed—pervaded by magic show–style visual illusions; TikTok is inundated with ASMR; TikTok is overwhelmingly meta, favoring content about content; TikTok’s most pervasive genres include not only depictions of dances, pranks, stunts, and recipes, but also videos in which creators modulate their faces with filters or animations (10%), perform exaggerated reactions to other media objects (10%), or dramatize relationships with nuclear family members (often, yes, “moms” or “sisters”) (7%). Cultural critics have their work cut out for them, deciphering what all of this means.

On a broader social level, however, a dataset like this can begin to create or confirm counternarratives about what the apps are showing us. (Even where the data, itself, may already be partially doctored by the platform). TikTok, for example, is eager to confirm its “good-for-kids” credentials; but the cultural critic might be more troubled by how uniformly tweenage bodies seem to serve as the major source of its top content’s visual appeal. The site’s Creative Center may push the idea that the “hottest” content tells brand-centric stories; but a homespun dataset can test what type of creativity more organically prevails. Indeed, the answers to these questions are the only narratives that stand a chance of rivaling the platform’s own story, on both political and commercial levels.

 

A Bigger Picture

For the past few years, I’ve been collecting these types of datasets, at small and large scales—not just for TikTok, but for other platforms, too. And I’m not alone.

Currently, I know of no large-scale, collective effort—akin to CommonCrawl or the Internet Archive—that has attempted to collect the web’s most “viral” or popular content, and then make that data publicly available for collective use. Indeed, large-scale projects aimed at archiving social media, conducted by national libraries or major universities, have been hindered by many of the same structures that restrict individual researchers, confining them to focus on content belonging to topical subcategories. As a recent, scholarly survey of social media archiving efforts concludes: “While most archiving institutions use a twofold approach for archiving regular web content—combining broad crawls (covering top-level domains) and selective crawls (for thematic or events-based collections) … the archiving institutions in our sample only use selective crawls to archive social media content.”27

Nor do I know of a crowdsourced resource, akin to Documenting the Now, which collects individually assembled viral datasets—as opposed to datasets concerning major political events—for public use.

But I have come across countless groups of scholars who, for the purposes of different studies, have cobbled together archives that—whether deliberately or not—aggregate samples of especially highly visible content. In one recent study, for example, a researcher saved all the videos appearing on his TikTok “For You” feed, over a period of six months (Though this feed is personally tailored, it tends to include highly viewed content, as well as content that the platform is promoting; a similar method could be applied to the website’s prelogin feed).28 Studies of different types of virality, meanwhile (which have largely taken place in fields like business and marketing) have compiled samples that in some way capture digital popularity across different platforms, from collections of articles from popular publications (like the New York Times) and sets of highly viewed YouTube videos, to more comprehensive samples of platform content at particular moments in time, like for a single day, week, or even month (since APIs have often placed fewer parameters on real time content collection).

Internet chat forums, meanwhile, are littered with the records of researchers, both amateur and professional, who have attempted to discover the “kludgey workarounds”—as Kevin Roose put it—through which APIs, and other broadly accessible tools, can be used to produce more unbounded collections of content or longer top-style lists. These forums bear witness, for example, to one scholar’s29 determination to use the Twitter academic API to conduct blank searches of tweets, or to one coder’s compulsion to make the YouTube API produce longer “top” video catalogues—as well as his imperfect solution (collecting videos from a channel that aggregates trending content, possibly through more crowdsourced mechanisms).

What if we were to pool these efforts? What if we were to more deliberately and decidedly collect datasets concerning the most visible or viral content—and then work with digital archivists to combine and convey the data, responsibly, for public use?

This isn’t an undertaking for every scholar. Political scientists have good cause to continue to focus their attentions on fake news; public health practitioners have every reason to home in on COVID-related content. But for humanist critics, at least—in an era of declining enrollments and waning cultural relevance—the project has multiple motivations: comprehend the truly central currents of contemporary culture; and, in the process, help contribute to the health, transparency, and quality of our collective media.

When Web 2.0 first arrived, in the early 2000s, it heralded democracy: we the people—so the platforms promised—would now control and create our own popular media. Today, each new platform—conscious that this ideal has become corrupted—makes a new claim to restore its reality. TikTok loudly boasts that its platform, more than any other, favors ordinary creators; new competitors, like BeReal, vie to claim this mantle of populist authenticity. But power over a platform requires knowledge of what it purveys. And where we collectively lack that information, we lose the capacity to steer creation’s course.

Social media companies have been happy to show us narrow subsets of posts, videos, and tweets. But we have a right to the view from the top.

  1. Countless digital sources cite this fact, though typically without attribution, beyond the collective folk wisdom of the internet. According to Guinness World Records, however, TikTok verified King’s video’s status as “most viewed” on March 15, 2022.
  2. As documented here, and on King’s page. Today, nearly four years later, the original TikTok’s view count is 2.1 billion. The video, however, has earned countless more views, through reproductions in other forms, including: two later TikTok repostings, by King (here, and pinned to the top of his page), this YouTube video concerning the video’s Guinness World Record (here), and other countdowns of top TikToks, like this one by the influencer MrBeast.
  3. According to Wikipedia, via varied sources. These commonly reported metrics, too, have often been questioned.
  4. Again, this type of data, as many have noted, remains opaque. Wikipedia offers these metrics, by way of this peer-reviewed source.
  5. For a full survey of where humanists have, and have not, attended to new popular digital “content” see the first two sections of my article “Content’s Forms.”
  6. This claim will be substantiated later in this article. For now, though, note that popular countdowns of the top TikToks, like this one, typically include multiple illusions by Zach King.
  7. TikTok’s user base, since 2022, is more than 50 percent over the age of 30.
  8. See studies of consumption from Pew. The results are most striking for American teens, 35% of whom, according to a recent, commonly cited Pew study, report using platforms “almost constantly.”
  9. The currently proposed Platform Accountability and Transparency Act (PATA) calls, among other things, for platforms to disclose “highly disseminated” content.
  10. Documentation for the now defunct Twitter academic API can still be accessed via the Internet Archive’s Wayback Machine, here. This documentation has mostly now migrated to the current v2 endpoint documentation, which offers access at the levels of Basic and Pro (closer to the prior academic version, and costing $100 or $5,000 a month), or Enterprise (a business-tier level with many further functionalities, including engagement ranking—as noted later in this piece—and higher prices, which are not prelisted). As the documentation explains, for the API v2 (not the enterprise Engagement API) ranking tweets is part of “post-processing.”
  11. Blank searches, as some discovered, could be engineered through a search term like the one translating to “does and does not contain the word ‘the’”; these, however, would call up so many tweets that they could only be conducted for single milliseconds, and doing so too many times would overwhelm user tweet quotas. Because the results could not be presorted or filtered by metrics including like or share counts, it was not possible to simply call up the top 20 percent most-liked tweets, for example, from the brief period of the blank search. Today, this is still true—the regular v2 API does not allow users to presort results by like or share counts. As I will soon mention, however, this is a possible functionality in the more expensive, business-oriented “Enterprise” versions of the API. It’s also worth noting that workaround methods of creating blank searches, though they cannot produce true “top-tweet” datasets for long periods, can still be used, at least, to create samples of top tweets from a particular moment, and are therefore mentioned among the “workaround” methods of creating top archives described later in this piece.
  12. This tutorial demonstrates this search method. Note that “most popular,” as per the “chart=mostPopular” parameter in the API, is not clearly defined. The documentation describes the function as follows: “retrieve a list of YouTube’s most popular videos, which are selected using an algorithm that combines many different signals to determine overall popularity.” The metric therefore could refer to a vaguer sense of “trending” videos, as selected by YouTube, as opposed to a harder metric of “most-viewed,” for example.
  13. Documentation of YouTube API search functions can be found here. For past-tense searches, there is a channels.list method that prompts users to search by channel (instructions here). There is also a search.list method that allows users to search through past videos, but YouTube’s provided examples of the function in the documentation (here and here) all involve searching through videos by keyword (this search function also uses up more of users’ daily quotas). Though it’s possible to leave the query or “q=” parameter blank, and then sort results by view counts, this workaround method does not appear to produce a reliable list of top-ranked videos for past-tense periods (like, say, post-2010) by available benchmarks (e.g., producing a list topped by any entries commonly cited as the platform’s top-viewed videos, like “Baby Shark Dance” or “Despacito”). See here a review of efforts to use the API to produce top content lists, posted on the coding website Stack Exchange.
  14. Here is the documentation for the hashtag search function, which gives instructions for searching for specific hashtags (up to 30 at a time).
  15. Reddit already aggregates top posts in the r/top listings, so records of the posts on this page can be downloaded via the API (though it’s not entirely clear whether they represent a comprehensive list). The documentation for TikTok’s new research API does seem to indicate the possibility of a largely blank, regional search of TikToks (narrowing TikToks only by a region_code, in an initial search). However, it’s unclear, as yet, whether these results can be sorted by engagement and/or narrowed down by engagement numbers; if not, then it could, as with the prior Twitter API, be necessary to download very large numbers of videos (for example, as produced in a single millisecond) before ranking them by engagement. This could result in users quickly surpassing quotas, again as with the Twitter API, though this too remains unclear, as no quotas are currently listed in the API documentation. In other words, users must discover, by using this API, whether the “unofficial documentation” (that is, the unwritten rules) will allow for the production of long-term, top-TikTok archives. Even so, there are also other complications currently wrapped up with using this new API, as previously noted.
  16. The Engagement API is included with the Enterprise Plan (documentation here). Though preset prices aren’t listed, and appear to be negotiable by tier, they are presumably even higher than the Pro Plan of $5,000 per month.
  17. For a description of the Michigan research, using NewsWhip, see here.
  18. See Roose’s account of all this here.
  19. Meta, Widely Viewed Content Report: What People See on Facebook, Q2 2023 Report.
  20. See documentation data dictionary here. The term “views” under the Facebook Post heading includes more editorializing than for the terms of other metrics like “shares” or “care reactions.” It states: “The number of times the post or reel was on screen, not including times it appeared on the post owner’s screen. View counts for Facebook posts or reels made before January 1, 2017 are not available. View counts are not available for Facebook posts created in the last 10–17 days.”
  21. See Doctorow’s piece here.
  22. The embedded streams of the top videos include no clickthrough links; the HTML source code for the webpage also includes no links for the videos, or relevant metadata like video IDs. This makes the data on this webpage virtually impossible to scrape or extract, except by painstaking manual means (using the video streams to manually locate the links on TikTok, and then extract the data).
  23. See note 11.
  24. See this library guide to research methods; I use a tool called Zeeschuimer to assist manual data collection.
  25. Because the first lists draw mostly on the second, they are virtually identical; sadly, there are no corresponding lists for top TikToks.
  26. All percentages were also taken out of a total of 492 as opposed to 500, since 8 videos were dead links. Percentages are rounded to the nearest whole number.
  27. Eveline Vlassenroot, Sally Chambers, Sven Lieber, Alejandra Michel, Friedel Geeraert, Jessica Pranger, Julie Birkholz, and Peter Mechant, “Web-archiving and Social Media: An Exploratory Analysis,” International Journal of Digital Humanities, vol. 2 (2021), 107–128.
  28. Andreas Schellewald, “Communicative Forms on TikTok: Perspectives From Digital Ethnography,” International Journal of Communication, vol. 15 (2021).
  29. Because this post included identifying information, the hyperlink has been omitted, as per a common practice in Internet research, aimed at protecting the identity of everyday posters in published work.
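
A supplementary sketch for notes 10 and 11: since the regular API leaves ranking to “post-processing,” the step of calling up, say, the top 20 percent most-liked posts has to happen after download. The Python below is a minimal illustration of that step only, not the method used for this piece. The field name like_count, the function name, and the sample records are all hypothetical, and the numbers are invented for illustration.

```python
from typing import Dict, List


def top_fraction_by_likes(posts: List[Dict], fraction: float = 0.2) -> List[Dict]:
    """Rank already-downloaded posts by like count and keep the top share.

    `posts` is assumed to be a list of dicts with a numeric "like_count"
    key (a hypothetical field name, not taken from any platform's schema).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    # Sort from most-liked to least-liked.
    ranked = sorted(posts, key=lambda p: p["like_count"], reverse=True)
    # Keep at least one post whenever the list is non-empty.
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


# Invented sample records standing in for one brief "blank search" download.
sample = [
    {"id": "a", "like_count": 12},
    {"id": "b", "like_count": 340},
    {"id": "c", "like_count": 7},
    {"id": "d", "like_count": 95},
    {"id": "e", "like_count": 51},
]

top = top_fraction_by_likes(sample, fraction=0.4)
print([p["id"] for p in top])  # ['b', 'd']
```

The sketch makes the limitation in note 11 concrete: every post from the search window has to be downloaded before any of them can be ranked, which is why quotas, rather than the ranking itself, are the bottleneck.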
This article was commissioned by Laura B. McGrath, Dan Sinykin, and Richard Jean So. Featured image: 3 people filming TikToks in Berlin, Germany, by Lear 21 / Wikimedia (CC 4.0)