
R1ppedWarrior t1_j68xt9j wrote

It seems weird to have data on a chart that is never labeled.


PartisanPlayground OP t1_j699ivt wrote

That's fair. "Prevalence" means share of articles that cover each story. This chart looks at the top ten stories on any given day. I can add a clearer explanation to the chart in the future. Thanks for the feedback.


sohosurf t1_j69fd06 wrote

May I ask what the four news articles that were in the top ten in the past few days but not yesterday were?


PartisanPlayground OP t1_j69go80 wrote

Good question! Here they are:

- Oscar nominations

- Ron Klain resigning as Chief of Staff

- Biden and the US-Mexico border

- Abortion

I'm thinking of adding labels to the stories that fell out of the news cycle along the bottom of the chart.


sohosurf t1_j69hrch wrote

Sorry to do this but what represents each on your graph, thanks for the information.


PartisanPlayground OP t1_j69ikqe wrote

No worries, here you go:

- 3rd place on Tuesday: Oscar nominations
- 10th place on Monday: Ron Klain resigning as Chief of Staff
- 8th place on Wednesday and Thursday: Biden and the US-Mexico border
- 10th place on Friday: Abortion


sohosurf t1_j69jco9 wrote

Thanks for being so helpful!


PartisanPlayground OP t1_j69jf1h wrote

Of course, thanks for your interest!


i_build_minds t1_j69ncsb wrote

I would regularly check a website with this data - particularly as a news site aggregator (x topic prev, y claims common in all or more than 75% of sources, etc)


PartisanPlayground OP t1_j69r2qg wrote

I'm producing this daily in a Substack newsletter:


i_build_minds t1_j6a0fmt wrote

Nice. Can I recommend this is just a big landing page with the top 10-15 articles or something?

It's easier to critique than to create - so no discourtesy intended - but that's basically my new homepage.


PartisanPlayground OP t1_j6a0zlr wrote

I've had the same idea. It'd be pretty cool to have this as a big landing page, where hovering over each story on the plot gave you details about the story and links to articles.


i_build_minds t1_j6aoh0g wrote

Yes! I'll continue to check it out but I hope your idea comes to fruition. It's fantastic.


PD216ohio t1_j6by3tg wrote

I noticed that from Jan 24-25 the title of the top story changed from Biden Documents to Classified documents. Was this because the subject of the discussion shifted in the media, or was the title changed for another reason? It is still the same "line" of the graph, just renamed.


PartisanPlayground OP t1_j6cktep wrote

That's right, the labels can change over time as the discussion shifts. GPT-3 does the labeling and I manually adjust it, if necessary. Occasionally, the stories themselves can split or combine.


iiioiia t1_j6a7iwl wrote

> I'm thinking of adding labels to the stories that fell out of the news cycle along the bottom of the chart.

I think it would be interesting to manually tag various events and then see if there is any temporal correlation between tags of certain types over long periods of time - for example: political scandals may coincidentally be commonly followed shortly by "social" scandals.

LOTS more could be done in this space, especially if one isn't too concerned for their health if you know what I mean.
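
A rough sketch of the tagging idea above, assuming a hypothetical hand-tagged list of `(date, tag)` events (the dates and tags below are made up for illustration). It only counts how often one tag type is followed by a different one within a window; a real analysis would also need a baseline for how often that happens by chance.

```python
from collections import Counter
from datetime import date


def lagged_follow_counts(events, max_lag_days=7):
    """Count how often an event of one tag is followed, within
    max_lag_days, by an event carrying a different tag."""
    ordered = sorted(events)  # sort by date, then tag
    counts = Counter()
    for i, (day_a, tag_a) in enumerate(ordered):
        for day_b, tag_b in ordered[i + 1:]:
            lag = (day_b - day_a).days
            if lag > max_lag_days:
                break  # later events are even further away
            if lag > 0 and tag_a != tag_b:
                counts[(tag_a, tag_b)] += 1
    return counts


# Made-up events, for illustration only
events = [
    (date(2023, 1, 2), "political"),
    (date(2023, 1, 4), "social"),
    (date(2023, 1, 20), "political"),
    (date(2023, 1, 23), "social"),
    (date(2023, 2, 15), "social"),
]
follows = lagged_follow_counts(events)
print(follows[("political", "social")])
```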


merc08 t1_j6ab510 wrote

Discovering something like that seems like the kind of thing that would drive a person to commit suicide by shooting themselves three times in the back of the head.


ajt9000 t1_j69hqjz wrote

What was the data source?


PartisanPlayground OP t1_j69i6ji wrote

Articles from 64 news outlets. I included some explanation in another comment but was downvoted, I think for mentioning my Substack. Lesson learned.


Jeepcomplex t1_j68lrgf wrote

Interesting how a gang of cops publicly executing a citizen via torture can dominate the news


End3rWi99in t1_j69k1a0 wrote

What's more interesting is how it'll exit the news cycle about as fast as it entered it. We'll have the same Trump/DeSantis articles for months and months though.


Laxwarrior1120 t1_j6bglbl wrote

Well yeah the people who did it got arrested immediately what more is there to report on, especially since the media can't make it a race thing either.


Tomoromo9 t1_j6c3ni8 wrote

And they’re not allowed to make it an abuse of power/structural issue thing


handsomehares t1_j68pzp9 wrote

I’m not sure interesting is the word I’d use


iiioiia t1_j6a840h wrote

I think it is immensely interesting, because the consequences will follow the same general script as always and nothing will change.

That nothing ever changes is interesting!


OhGodNotAnotherOne t1_j68y9g9 wrote

Crazier how it seems it's almost gone completely from reddit already.

I've seen exactly ONE post on it.

Just one.


Ztaxas t1_j6913z9 wrote

With 60k upvotes? Using the general rule that only 1% of people interact with content other than viewing it (ie rating), it means it reached over 6m people.
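
The arithmetic behind that estimate can be sketched as below. The 1% figure is the commenter's rule of thumb, not a measured value, and `estimated_reach` is a name invented for this example.

```python
def estimated_reach(interactions, interacting_percent=1):
    """Estimate viewers from interactions, assuming a fixed percentage
    of viewers interact (a rule of thumb, not a measurement)."""
    return interactions * 100 / interacting_percent


print(f"{estimated_reach(60_000):,.0f}")       # at a 1% interaction rate
print(f"{estimated_reach(60_000, 0.5):,.0f}")  # at a 0.5% interaction rate
```

The estimate is very sensitive to the assumed rate: halving it doubles the implied reach.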


OhGodNotAnotherOne t1_j696sax wrote

Sure but stuff like this usually ends up plastered all over. Memes, serious discussions, etc.

I was surprised to just see one post.

Apparently I'm alone in thinking it's an interesting development, perhaps it's better to forget the whole thing and just move on to something important, like changing how M&Ms dress?


Vic287 t1_j69ayr5 wrote

It's because the cops weren't white.


TightEntry t1_j69t5xs wrote

It’s also because the cops were fired and charged with murder. What is the impulse to protest/riot?

Yeah what happened is bad, and it casts yet another harsh light on the realities of police in America, but at least it is being recognized for the heinous act that it is.

There is no “following my training/feared for my life” defense. Police unions aren’t lining up to defend the murderers and the police chief called the act out for being particularly bad.

I guess this is progress since the Rodney King beating.


dtreth t1_j6aou1n wrote

If the officers were white we wouldn't have seen any of that


Valhallapeenyo t1_j69myte wrote

Yeah, there’s no denying it at this point. These cops absolutely brutalized this kid, 110% murdered him. The lack of coverage this story has gotten compared to…. Similar incidents… is absolutely fuckin mind blowing.


caroticum t1_j69tylg wrote

Huh you were quite right. I checked my posts and they all seem to have 0.5-1% of interactions


1337haXXor t1_j69tymh wrote

1%? Wow. I would've guessed maybe 5-10% or something. Interesting.


elastic_psychiatrist t1_j69xbp3 wrote

Lol what? Half of r/all was posts related to it last night, all with over 50k upvotes.


OhGodNotAnotherOne t1_j6ajsu3 wrote

I'm speaking about my experience, not yours.

I don't even know who you are or your reddit habits to even begin to tell why you saw hundreds of posts and I only saw one at the time (3 now, since I originally posted, including the mugshot post that's at the top now).


anubus72 t1_j69pwg1 wrote

So many subreddits have rules that cause posts about it to be deleted because it’s "political" (see r/videos)


Dravous t1_j6ahc3c wrote

there's a reason, but that's all I can say.


DutchNotSleeping t1_j694239 wrote

I'm sorry what? What did I miss?


myasterism t1_j69msvs wrote

5 black cops in the same city where civil rights activist Martin Luther king jr was killed, beat to death a 140lb black man who was unarmed and crying out for his mother, after they detained him for no good reason. The five black cops were unceremoniously denounced by the police department (the only “good” thing to happen in this whole mess), while the white officer whose body-cam footage has been all over the place was not disciplined, despite the lack of any attempt on his part to intervene in the cold-blooded killing his coworkers were committing.

It’s heinous.

ETA: I grew up in Memphis (where this happened) and wish I could say that I’m even slightly surprised by it, but it’s heartbreakingly and infuriatingly predictable.


GenerikDavis t1_j6a3bvj wrote

As the other people said, a 29 year old was beat to death.

Skip to ~38 minutes for a camera on a street light (I assume) that shows the whole thing. The first 38 minutes have the initial encounter before Tyre runs from them since he's scared for his life, and then 2 perspectives of them catching up to him and beating him. To warn you ahead of time though, it's literally them beating someone to the point that they died 3 days later. Like a soccer kick to the head, a cop breaking his baton from hitting him so hard, and 2 cops holding Tyre up while another punches him in the face. It's a top contender for the worst police behavior I've seen on camera.


grubas t1_j69yvpx wrote

It's the video.

It was always a story, fairly high up, but it wasn't THE story until the video dropped. The fact that they fired and went after the cops FAST got a lot of attention. Then the police, feds, and family asking people to remain calm and not riot 24 hours before release meant it was going to be bad.

Now it's going to be a dragged out legal process, which is much less easy to push compared to a 10 minute execution.


hearmenowboi t1_j68q0ke wrote

They only have the bandwidth for 10 stories at a time. 10 stories on rotation over 24 hours.


MEMENARDO_DANK_VINCI t1_j68wp23 wrote

I realistically have the bandwidth for 7 stories at a time so they do better than me


cnorw00d t1_j68y2hn wrote

Yes but your brain is not a multi billion dollar network supposedly filled with "journalists"


MEMENARDO_DANK_VINCI t1_j68yhf5 wrote

You’re looking at it from your brain and not the population’s brain. If it were more likely to make money with a million different stories, it would. All of humanity only has so much room in their heads for the news, and the news knows that if they want any two people to be discussing and thinking the same things at the same time (which is what makes it valuable as “news”), then they’ll have to focus their attention.

I’d also like to point out that these are the stories that are the top 10 not the thousands of other stories made every single day and fighting for these slots


cnorw00d t1_j69b16g wrote

I'd get that for a news show like ABC or CBS but fox and CNN are 24 hour news networks. They have the opportunity to introduce new conversations instead of focusing on the same few ( and dumb most of the time) stories and they mostly do it to keep people watching for advertising. They don't even come at it with differing perspectives beyond what is considered the mainstream left and right.

In the age of the internet we should know that people are capable of focusing on multiple things. Tik tok should show you that there are worldwide superstars you may have never even heard of


MEMENARDO_DANK_VINCI t1_j69lla9 wrote

The individual is but for the news to have any weight they have to run the same thing more than once and talk about the same thing more than twice


ianhillmedia t1_j6cyn6f wrote

We have a saying in broadcast news. When you’re tired of talking about it is usually when the audience is first hearing about it.


MEMENARDO_DANK_VINCI t1_j6a7za9 wrote

I think you think you made a point, but you did not


iiioiia t1_j6a8wx2 wrote

What's interesting is that "made a point" does not have one single implementation, but this tends to be not how it appears to a person performing an implementation.

I mean come on, this is a data science subreddit, not /r/politics.


MEMENARDO_DANK_VINCI t1_j6abvab wrote

No, you come on and use your words to describe the point YOU want to make. Arguing by allusion is not arguing in good faith. Any Interpretor would have to parse their thoughts through you not being forthcoming with yours.


iiioiia t1_j6acsoo wrote

> No, you come on and use your words to describe the point YOU want to make.

I've made it above, you are welcome to do with it as you please.

> Arguing by allusion is not arguing in good faith.

lol, memes are not effective on me, though I suspect they'll be rather influential on 3rd party observers (which is the point perhaps?).

> Any Interpretor would have to parse their thoughts through you not being forthcoming with yours.

Oh do you know how people you've never met would experience the situation?

Sir: are you putting me on?


MEMENARDO_DANK_VINCI t1_j6ad9ck wrote

What? How do I know that the human psyche changes when met with unknowns vs knowns?

Making a point to yourself is what you did, troll me is also what you did. I didn’t state a meme I informed you that your method of argument is lackluster in a polite discussion, though I didn’t use so many words.

It’s okay if you can’t articulate your point well just try your best! And I’ll help you refine what you actually wanted to say :)


iiioiia t1_j6aedge wrote

> What? How do I know that the human psyche changes when met with unknowns vs knowns?

You very well may not know; it's not exactly common knowledge!

> Making a point to yourself is what you did, troll me is also what you did.

Here you are describing your experience. The experiences of others (more commonly known as "reality", or what "is") are not necessarily the same.

> I didn’t state a meme I informed you that your method of argument is lackluster in a polite discussion, though I didn’t use so many words.

"Not arguing in good faith" is a meme.


  • an image, video, piece of text, etc., typically humorous in nature, that is copied and spread rapidly by internet users, often with slight variations.

  • an element of a culture or system of behavior passed from one individual to another by imitation or other nongenetic means.

> It’s okay if you can’t articulate your point well just try your best! And I’ll help you refine what you actually wanted to say :)

Haha, I love it!! 🙏 You just earned yourself an updoot, partner!


Cash907 t1_j6953m1 wrote

Wow this style of chart sucks. Thanks for making my eyes hurt first thing in the morning.


epomzo t1_j69wra2 wrote

Exactly. There are 5 subtly different greens, 3 different mauve-purples, 2 of orange-red, and one light blue. These are different pastel tints in the same washed-out shade. Guaranteed this is not ADA compliant.


punania t1_j6bstvl wrote

The worst is that the topics are only at the end. It’d be easier to follow if the stories were listed at the front, too.


Frank2484 t1_j69g7i6 wrote

Wow this style of comment sucks. Thanks for making my eyes hurt first thing in the morning.


janellthegreat t1_j697355 wrote

What does the thickness of the color band indicate?


PartisanPlayground OP t1_j69ai2i wrote

The thickness indicates the share of articles that cover each story. It sounds like I need to make this explicit on the chart.


Segamaike t1_j69jg10 wrote

I mean. It looks lovely, but how is it surprising to you that a chart needs both axes to be labeled? Data is pointless if it’s not measurable and your most important metric isn’t even in the presentation


PartisanPlayground OP t1_j69js3w wrote

Glad it looks good. I figured the title of the chart took care of the y-axis, but I'll label it explicitly in the future. I think the x-axis is pretty clear.


Magmagan t1_j69mhw6 wrote

I think it's pretty obvious, I hope the extra labels don't detract from the viz. Love this style, can't wait for your post next week!


Tomoromo9 t1_j6c3t5i wrote

Why assume what is obvious when we can be purposefully obtuse?


chasing_the_wind t1_j6a2mc2 wrote

The title took care of everything for me. I was able to make the correct assumptions on everything you did with a little hand waving at the details I didn’t care about. What else would the axes be?


justennn t1_j6c3nhf wrote

You still need a scale for this to mean anything. Otherwise it’s just artistic representation of data.


dtreth t1_j6aok4o wrote

It's a proportion graph, they're not often labeled


Usernametaken112 t1_j69cc3i wrote

Take a guess


RealTendy t1_j69ut79 wrote

Data is the opposite of guessing


Usernametaken112 t1_j6a6naz wrote

If you can't understand the context of news stories and what bigger/smaller bars mean by date, you weren't really capable of adding anything to the convo even if it was spelled out for you


wagonmaker85 t1_j69pjkf wrote

*in the United States

I’ve seen only one of those news stories at all in the past week. A fleeting mention of three others. And never even heard of the rest.


dtreth t1_j6aomc8 wrote

Reddit is an American website


wagonmaker85 t1_j6apimo wrote

So? Its reach, and intended audience, is clearly global. And this sub is not US-specific.


I_eat_dookies t1_j695epj wrote

This is the most garbage format to present data. It isn't easy to read or digest, just use a different chart cause this shit is bunk af.


Count_baklava t1_j69w3mu wrote

Get with the times, it’s stupid simple to understand.


Danny_ODevin t1_j6a2esr wrote

Stupid simple to misinterpret. Complete with unlabeled mystery data pointlessly taking up space


Kazko25 t1_j69w8xn wrote

Pretty unreadable, the colors are too similar to each other


sakezx t1_j695h8d wrote

Which news? Where? Another prime example of r/usdefaultism


snerp t1_j696z9m wrote

that sub is so stupid. This is an american website, of course references are going to default to american point of view.


wagonmaker85 t1_j69pum7 wrote

This sub is about data. Well-presented data should be clear and unambiguous. The reader should have no doubt about what they are reading. Leaving out the fact that this is American, anywhere on the post, is simply sub-par data analysis.


snerp t1_j6b7yj8 wrote

Can't argue with that, good take.


KrozJr_UK t1_j69nbag wrote

I don’t mind a preponderance of American-centric views. What I ask for is two simple letters. Compare: “How news stories evolve in the news cycle” to “How news stories evolve in the US news cycle”. Real big difference, right?


Powerzap t1_j69jv38 wrote

No, it’s the American defaultists who are stupid. They are completely ignorant of the fact that there are people on the internet who are not in America. This mentality extends way beyond Reddit.

Re Reddit being an ‘American website’. Sure, it’s hosted in the US - but a considerable 46% of users are not there.


rramosbaez t1_j68wn5s wrote

Nothing about the guy killed protesting Cop City? That's all over my feeds, but I guess it makes sense big media won't run it


vberl t1_j69r2dy wrote

Talk about an America-centric post…


victory-or-death t1_j69ljyx wrote

This is so so difficult to process, I barely know what I’m looking at let alone how to understand this


Count_baklava t1_j69wftv wrote

I see what you’re saying but reading this chart is common sense.


victory-or-death t1_j69yqie wrote

The idea behind r/dataisbeautiful is to provide visually appealing and user friendly data, not having to trawl through the comments for tips and tricks


Danny_ODevin t1_j6a47zi wrote

Not really. "Prevalence" seems like common sense, but it's easy to interpret differently from person to person, and easy to misrepresent on a chart that is built on quantification yet doesn't actually quantify anything. But sure, it's pretty.


Jeb_Kerman1 t1_j69i7s9 wrote

It’s fucked up that a shooting isn’t in the top three for a week


gucci_gucci_gu t1_j69lh52 wrote

America is seething over police brutality


Denziloe t1_j69m92p wrote

Whatever insight this is supposed to convey, I'm not getting it.


thepancakehouse t1_j69p58c wrote

And not a fucking drop of information about the Pfizer employee


Rinzern t1_j69udzy wrote

You wanna give another drop then cause that's vague


thepancakehouse t1_j6a0l9t wrote

Part of the problem is there aren't many reliable sources even discussing it. Look up Jordan T. Walker - Pfizer

That's a Twitter link to prove he works for Pfizer


FixSwords t1_j6ao9yj wrote

I looked it up and it appears the only news on it has come from Tucker Carlson. Any reputable stories to get me caught up?


thepancakehouse t1_j6b1jsc wrote

Oh God, it's gotten worse in just 24hrs. Of course some whacked-out conservative source (Fox) is the only source. I know Newsweek (not crazy reputable) had an article. If nothing else, the 10min video of the Pfizer guy, Walker, absolutely spilling all of the beans made its rounds on Twitter. I'll see what I can do. This is my point though. They are burying it; making it out to be nothing. This is cahoots on a legitimately high level. Why can't it be discussed?!?! Why can't we ask questions and get answers from the source and not simply a lack of existence of information?! There's no reason the Pelosi tape should have more coverage than this. That attack is OLD NEWS and there is a metric ton of stuff news media will talk at length about even though they know literally nothing about it.


FixSwords t1_j6cd16z wrote

Is it possible there’s not really much of a story there? Or at least not enough verified info/identities for journalists to report on?

I’m not trying to discount the story, it’s just this reads a little like some conspiracy claims can sound where the lack of evidence is cited as proof of the conspiracy.

I don’t know how ‘they’ would bury it from the likes of journalists if it was all over Twitter?


frozen_tuna t1_j69q67e wrote

That was the first thing I noticed too. He wasn't just an employee, he was a director. Pfizer is a company with 79k employees and there's only a few dozen people above him. He's pretty darn high up.


tudorcat t1_j69rdvq wrote

Can you clarify whether you're just looking at the US, worldwide, or what?


despejado t1_j6a72jr wrote

Monterey Park shooting… wow tells me all I need to know about this society.


Dry_Inflation_861 t1_j6bhyvc wrote

TIL this sub is a bunch of prima donnas and cynical assholes. This chart is nice and interesting but definitely has its flaws. I think the color palette makes it a little difficult to follow backwards, and I don't quite understand the Trump/DeSantis/GOP/future one; the other topics don't seem to be combined like that. But man, you don't deserve the hate you're getting.


ianhillmedia t1_j6cydkh wrote

Hey there, journalist here with 20+ years experience in the news industry, including 10+ years in digital news. To fairly consider the “prevalence” of news stories and how they progress in the news cycle you’d need a much bigger dataset. Right now, as you’ve described it in other comments, a more accurate title for this chart is: “How the top 10 articles were curated and ranked on homepages of 64 U.S. general interest national(?) news sites when the researcher looked at them.” Correct?

That can still be interesting data, but it doesn’t truly reflect which stories are most “prevalent” in the news.

Know that news website homepages are one of a few (many?) places people consume news online. Google Analytics, which many news organizations use to track success online, reports acquisition channels as Search, Social, Referral and Direct. The percentage of users that see or are delivered (“prevalence”) and consume news via each channel varies by news organization, but a news org with a legacy brand will get a healthy percentage of traffic from each. People that come to news homepages are a subset of Direct traffic, which also includes users who just type in, for example.

Direct traffic also can include visitors to news organization mobile apps, which can be curated differently from website homepages. Direct traffic does not include people who read push alerts from mobile apps but don’t click through, and what those folks see also should be considered when determining the “prevalence” of news stories online. Referrals, meanwhile, can include visitors who click through from news organization email newsletters, which are often written and curated differently from homepages.

So the stories “prevalent” to U.S. news consumers can be different based on the platform on which they’re delivered news.

That brings us to Social and Search, both of which send healthy traffic to U.S. news sites and play a noteworthy role in determining the “prevalence” of news for Americans. Pew research reported in September indicated that 50% of U.S. adults get news from social media sometimes or often. 82% of American adults use YouTube; 25% of those users say they regularly get news on the site. 70% of American adults use Facebook, and 31% of those users regularly get news on the site. 30% of American adults use TikTok, and 10% of those users regularly get news on the site.
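
To put those platform numbers on one scale, the two percentages can be multiplied to get the share of all U.S. adults who regularly get news on each platform. This sketch uses only the figures quoted above; `share_of_all_adults` is a name made up for the example.

```python
def share_of_all_adults(usage_pct, news_pct_of_users):
    """Percent of all adults who regularly get news on a platform:
    (percent who use it) x (percent of those users who get news there)."""
    return usage_pct * news_pct_of_users / 100


# Figures as quoted in the comment above
platforms = {"YouTube": (82, 25), "Facebook": (70, 31), "TikTok": (30, 10)}
for name, (usage, news) in platforms.items():
    print(f"{name}: {share_of_all_adults(usage, news):.1f}% of U.S. adults")
```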

So to really track and report the “prevalence” of news stories, you’d also need to track and report which stories are delivered and consumed on social, and that delivery is determined in large part by social network algorithms powered by user behavior. Which is in part how we get vertical communities on social (“BookTok,” “Black Twitter.”) The prevalence of news stories in those communities can be community-dependent.

For Search, the good news is that data on what news stories people seek out and are delivered is available from Google Trends. That said, I’d suggest reading the Google Trends help docs before digging into and reporting that data. You need to know what relevance means to Google when looking at those numbers.

Those are just the differences in digital formats that need to be considered when researching the “prevalence” of news stories. We haven’t even discussed that to really measure “prevalence” you’d need to consider what’s in print editions and broadcast newscasts, both of which still help determine the news agenda for the country. We haven’t discussed the role that consumers play on setting the agenda - the number of people clicking on a story on other acquisition channels helps determine if that story is ranked on a homepage, and for how long. And we didn’t discuss the fact that some news organizations are testing personalization of homepages powered by machine learning. What you see on a news homepage can be unique to you and based on how cookies tracked your habits across the web. It might be different from what other visitors see.

It’s also worth noting that 64 news sites may not constitute a useful sample. At a minimum, in the top 100 DMAs in the U.S., there are typically at least four broadcast news websites and one newspaper of record website. That’s 500 local sites that determine the prevalence of news in the U.S. just in the top 100 DMAs. There are 210 Nielsen DMAs in the U.S. Many of those DMAs also are home to hyperlocal startups and alts which also should be considered when tracking and reporting the “prevalence” of news stories. What’s prevalent to people in Cleveland will be different from what’s prevalent to people in Memphis, which will be different from what’s prevalent to people in L.A., etc.

And that’s just the U.S.

That’s not to say that there aren’t worthwhile data-based stories to tell about news consumption and delivery in the U.S. It’s always interesting to learn more about how specific stories are presented by different news organizations on a specific market. You also could subscribe to a bunch of different newsletters and report on what they present, given that newsletters are static. Google Trends data from the previous day also is static.

Here’s the source of the Pew data about news consumption on social:

Hope that’s helpful!


PartisanPlayground OP t1_j6d2gwe wrote

This is an excellent comment, thank you for this!

I think I need a clearer way of describing "prevalence". This chart is showing the top ten stories by the share of articles written about them, not by the amount that they are consumed. I take articles from 64 sources every day, cluster them together into "stories", then calculate each story's share based on the number of articles written about it. For example, if there are 1000 articles for a day, and one story has 100 articles written about it, then its share is 10%. Does that make sense?
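
A minimal sketch of that share calculation, assuming each article has already been assigned a story label. The labels and the `story_shares` helper are hypothetical and are not the actual pipeline.

```python
from collections import Counter


def story_shares(story_labels, top_n=10):
    """Return the top_n stories with the share of the day's articles
    assigned to each, largest first."""
    counts = Counter(story_labels)
    total = len(story_labels)
    return [(story, n / total) for story, n in counts.most_common(top_n)]


# Hypothetical day: 1000 articles, 100 of them about one story,
# the other 900 each about something different
labels = ["Story A"] * 100 + [f"misc-{i}" for i in range(900)]
shares = story_shares(labels)
print(shares[0])
```

With 100 of 1000 articles, the top story's share comes out to 0.1, matching the 10% in the example.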

I've explored measuring consumption of news in the past, and found it to be very difficult! (Facebook's Graph API used to be wide open, so I was able to get likes/engagement on news stories there, but it has since been locked down) Your comment does a great job of explaining the complexity in measuring consumption. You would need to combine:

- GA data from news outlets (which they don't publish)

- Cable news data (sources exist for this, but you would need to make a lot of assumptions to combine this with articles)

- Social media data

And you would need to make a lot of assumptions about what weights to use on each of those. As a result, I'm keeping this simple and focusing on article shares.

I do publish a daily automated Twitter thread on which news outlet gets the most engagement on Twitter. It includes the most liked and ratioed tweets from each "side" of the media. This is limited to Twitter, so does not cover all the channels you described. See an example here:

The other thing I've been doing is cutting articles by which "side" of the media they're on using media bias ratings from AllSides. Again, this involves some simplifying assumptions so it's not perfect but gives a good high-level view. You can see examples here:

Thanks again for your comment. This is exactly the sort of thing I was looking for when I posted.


ianhillmedia t1_j6d3dqb wrote

Happy to help! And I think you’re spot on when you say you need to clarify the definition of prevalence. Just because a news org puts resources into a topic doesn’t mean it’s prevalent to the user. That said, the number of stories a news org efforts on a subject is an interesting data point.

As someone on the other side of this, I hear you on the challenges associated with getting useful data. How are you currently tracking all articles published by those news orgs? And how are you parsing that data to identify specific stories - what search terms are you using to filter the data?


PartisanPlayground OP t1_j6d6ebz wrote

I'm getting the data from the Google News API. I've used RSS feeds in the past with similar results.

And actually I'm using a clustering algorithm to identify the specific stories. I have an automated process that pulls all articles from the past five days, clusters them into stories, then produces a bunch of analysis. This saves me a lot of time and brings some objectivity to the process.
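
The clustering algorithm itself isn't named here, so the following is only a stand-in to show the general shape of the step: a greedy single-pass grouping of headlines by word overlap (Jaccard similarity). The headlines, the threshold, and every function name are assumptions made for illustration.

```python
import re


def tokens(headline):
    """Lowercase word set for a headline, dropping words of 1-2 letters."""
    return {w for w in re.findall(r"[a-z0-9]+", headline.lower()) if len(w) > 2}


def jaccard(a, b):
    """Size of the intersection over size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_headlines(headlines, threshold=0.3):
    """Greedy single pass: each headline joins the first cluster whose
    seed headline is similar enough, otherwise it starts a new cluster."""
    clusters = []  # each cluster: {"seed": token set, "items": [headlines]}
    for headline in headlines:
        words = tokens(headline)
        for cluster in clusters:
            if jaccard(words, cluster["seed"]) >= threshold:
                cluster["items"].append(headline)
                break
        else:
            clusters.append({"seed": words, "items": [headline]})
    return [cluster["items"] for cluster in clusters]


# Made-up headlines, for illustration only
headlines = [
    "Debt ceiling talks stall in Congress",
    "Congress debt ceiling talks continue",
    "Oscar nominations announced",
    "Full list of Oscar nominations",
]
stories = cluster_headlines(headlines)
for story in stories:
    print(story)
```

The threshold is where the subjectivity lives: raising it splits stories apart, lowering it merges them.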


ianhillmedia t1_j6db30j wrote

Got it thanks for the reply! I know not everyone supports RSS, and it’s a challenge when folks format RSS in different ways, but as they’re a primary source from the publisher I’d encourage you to use RSS over APIs from Google.

I was curious about the signals in your algorithm as well. One of the challenges with automating taxonomies for news stories is the inexactitude of language and differences in style. A story might mention DeSantis and books in the headline and description but might actually be about GOP primaries; a story might emphasize DeSantis in the primaries in the headline and title but it might actually be about book banning.

Or a better example: a story that mentions Tyre Nichols may be about the actual incident, police violence or defunding the police.

Digging in even further, a local news organization might use colloquialisms for place names that can make it difficult for folks from outside that market to categorize those stories.


PartisanPlayground OP t1_j6eo5hz wrote

You're hitting on the most subjective part of this whole process. I've run into all of the issues you describe, and the question is ultimately: how do you define a story?

Your GOP primaries example is a good one. Let's say we have articles on Trump's legal issues, other articles on Pence's classified documents, and other articles on DeSantis and books. Now let's say all of these articles describe these things in the context of the 2024 GOP primaries. Is this one story called "GOP primaries"? Or three separate stories? You could make a case either way.

I've tuned the algorithm to split stories in a way that "looks about right" to me. That's subjective, but there's no way around it. This is an issue whether you're using an algorithm or doing this manually.

A related challenge is that story definitions may change over time. The classified documents story is a good example for this. Right now there are articles on Trump, Biden, and Pence all mishandling classified documents. The algorithm is categorizing all of them as the same story (fair enough).

But let's say that next week (just making this up), Trump gets indicted for it. Is that a separate story now? If so, how do you treat that? Do you retroactively split out the "Trump" portion of the "classified documents" story as though they were not the same story before? Do you show the classified documents story splitting into two? Do you just create a new story on the day the indictment happens? Currently, the algorithm is set up to do the first of these, but again, you could make a case for any of them.

All of this is to say that there is subjectivity involved in this process.


nervousengrish t1_j69kqk0 wrote

Label on the left and group items that emerge later and don’t grab as much share (Israel, Paul Pelosi, etc). It’s almost unreadable otherwise.


PartisanPlayground OP t1_j69kyys wrote

What do you mean by grouping the items that emerge later? Thanks for the feedback.


nervousengrish t1_j6a09hf wrote

The Paul Pelosi and Israel stories only appear for one day and don’t make much of an impact on the story you’re trying to tell about longevity in the news cycle (there’s just not enough data).

I would group them into a single bucket labeled ‘Emerging Stories’ or something.
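
One way that grouping could look, assuming the chart data is a mapping of day to `{story: share}`. The story names, shares, and the `bucket_short_lived` helper are invented for this sketch.

```python
from collections import defaultdict


def bucket_short_lived(daily_shares, min_days=2, bucket="Emerging Stories"):
    """Merge stories seen on fewer than min_days days into one bucket,
    summing their shares within each day."""
    days_seen = defaultdict(int)
    for shares in daily_shares.values():
        for story in shares:
            days_seen[story] += 1

    result = {}
    for day, shares in daily_shares.items():
        merged = defaultdict(float)
        for story, share in shares.items():
            key = story if days_seen[story] >= min_days else bucket
            merged[key] += share
        result[day] = dict(merged)
    return result


# Made-up shares, for illustration only
daily = {
    "Thu": {"Story A": 0.25, "Story B": 0.125},
    "Fri": {"Story A": 0.25, "Story C": 0.125, "Story D": 0.0625},
}
grouped = bucket_short_lived(daily)
print(grouped["Fri"])
```

One caveat with this approach: a story that first appears on the last day of the window lands in the bucket even if it goes on to dominate the following week.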


Astronaut-Frost t1_j69mno7 wrote

I subscribed. I really enjoy this. I'd love for it to become even more in depth with filters for specific topics.


DirtyGoo t1_j6a77e1 wrote

Really hope the lies from George Santos keep surfacing. It just gets more bonkers every time and is immensely entertaining.


levoniust t1_j6aa85t wrote

Is there a website for this? I would love to go back in history even a few months back and take a look at what was super popular in a chart like this. Or just every day see a snapshot of what is being popular. Having something that is arguably unbiased and simply showing the topics that are most popular I find quite intriguing and useful.


Plokmijn27 t1_j6aak6e wrote

ukraine tanks and debt ceiling look interesting


LurkingChessplayer t1_j6azw39 wrote

Nothing about the project veritas video? Kinda shocked


hellopomelo t1_j6byuis wrote

Am I the only one who thinks the colors are, for once, pretty good?


justennn t1_j6c3iib wrote

This is meaningless without a y axis label and scale


Siom_one t1_j69tb1q wrote

And not one word about the VA girl who got trafficked in Baltimore because the judge "couldn't return the child to her parents" due to them knowing what a boy and a girl is.


ar243 t1_j6922r3 wrote

It's crazy that Ukraine isn't at least #2.

I hope we don't run out of steam supporting them.


PartisanPlayground OP t1_j68fge6 wrote

Hi Reddit! I've been producing analysis on news stories for a while now, so I figured I should post to Reddit to get some feedback.

You can find this visualization daily along with others at my Substack:

My intention is to add more GPT-driven content in the future.

You can also find these visualizations and analysis of tweets at my Twitter account:

The source for the data in this visualization is news articles taken from 64 news outlets. Articles are clustered into stories on a daily basis and plotted with ggplot2. This particular visualization uses the ggsankey package.


Xystrel t1_j691ot8 wrote

What is this type of chart/visualization called?


beansAnalyst t1_j6busyd wrote

Don't know why this is getting so much hate. Assuming viewers have read one page of news in the last week, this graph makes it clear what the media is prioritising.


qazarqaz t1_j68qdgj wrote

Wasn't the Veritas-Pfizer leak covered anywhere? I expected it to at least be picked up by conservative outlets