UCalgary podcasts feature interviews with experts from our community on the COVID-19 situation.
Episode 26: Fake News Detector
June 5, 2020
"Fake news" has been an issue of public interest for years, with the Internet flooded with blatantly false information that gets widely shared on social media. We talk to Dr. Ray Patterson, a professor of business technology management. Ray and a team of fellow scholars have discovered a method to detect fake news sites by analyzing their digital supply chains.
Ray Patterson (RP): All of this happens within the first 200 milliseconds of when you touch a website, so there's an auction that takes place. It's put up for bid. Hey, Ray has shown up to my website. Do I hear a bidder? You know? And they actually hold this auction in the first 200 milliseconds, and the ad is placed on that site and I see it.
Deborah Yedlin (DY): That was Ray Patterson from the Haskayne School of Business, and this is UCalgary COVIDcast. I'm Deborah Yedlin, and thanks for joining us. The term fake news entered public consciousness during the 2016 U.S. presidential campaign, but lately the internet has been flooded with information from questionable sources. Think about the conspiracy theory that linked the novel coronavirus to 5G cell phone towers in China. This blatantly false information spread like wildfire over the internet. In fact, false stories are spreading 10 times faster than real news, and the problem of fake news threatens our society. Call it a virus of a different sort. Today on COVIDcast, we're talking to Dr. Raymond Patterson, a professor of Business Technology Management in the Haskayne School of Business. Ray and a team of academics recently published a paper on a method to detect fake news sites by analyzing their so-called digital supply chains. He joins us today to talk about his findings. Welcome, Ray. Thanks for joining us.
RP: Thank you Deborah. It's a pleasure to be here.
DY: So let's start with you telling all of us what a digital supply chain is.
RP: So the digital supply chain is the entire host of third parties that a website interacts with the moment you touch it. In the first couple of seconds after you click on a website and open it up, the website is going to share your data with dozens and dozens of third parties, and these third parties are essential to creating what a website is. They could be for targeting and advertising. They might be for functionality. They might be for performance. But all of these third parties come together and form what you see as the website when you touch it. Now, that digital supply chain is unique: each and every website has its own set of suppliers, if you will, or third parties. These third parties are brought together by the website to essentially pull off what the website does.
DY: Right. So they're functional.
RP: In many cases, it might be advertising. They're often functional, but there are also some advertisers, or some third parties, I should say, that are purely there to take your data as a user. I mean, they're not all benevolent. There are many that are non-benevolent, or can we call them bad guys?
DY: You can call them viruses. They jump onto the website and they take data and that's not a good thing.
RP: But the website makes money off of doing that, and in our previous research, we showed that users sometimes actually prefer to get a free website experience rather than having to pay for it. Maybe you've been to news sites where you've turned on Do Not Track and you block things, and they say, hey, if you want to see this news article, you're going to have to open up to us; you have to let our third parties in. What we showed in some previous research is that sometimes users want to be, let's call it, spied on in order to get free stuff.
DY: Right. So tell me what your research uncovered that might have surprised you and led you in a direction that you might not have expected.
RP: Right. So what we've done in this research is we've taken those third parties and simply asked the question: based on the set of third parties, the footprint that each website has in the third-party space or in its digital supply chain, could we use that set of third parties to tell whether a news site in particular was real, clickbait or fake? Is it trustworthy or untrustworthy? It turns out that the digital footprint of each website is similar to those of other websites that are doing the same thing. What we've got going here is that there are different industries, if you will: the fake news industry and the clickbait industry essentially call upon a different set of third parties than the real news industry would. So here's my analogy, and I don't know if you have time to watch Netflix, but have you ever watched "Breaking Bad" on Netflix? You have to admit it now.
DY: No. I haven't actually.
RP: You haven't?
DY: No. My kids have watched it. I haven't. I feel like I'm still trying to catch up with everything that everybody's been watching for the last several years. I know all about it. I've heard all about the storyline, but I have not seen it.
RP: Okay. So have you ever heard of his lawyer Saul?
DY: No. I have not heard of his lawyer Saul. All I hear about is Bryan Cranston playing the character. Yeah.
RP: Okay. So there is a spinoff called "Better Call Saul", and Saul is a lawyer who specializes in seedy clients such as the drug dealer that Bryan Cranston plays, and the idea is simply this: if I knew the client list for Saul, the seedy lawyer, we might be able to use that information and predict, oh, wait a second, you're a client of Saul's too? Does that mean that maybe you are also ... Now I'm sure that Saul also had some divorces and other things thrown into his client list, so not all of his clients were bad, but we might be able to use that, and all of the other contexts each client had, to make a prediction about whether or not those clients were in fact drug dealers themselves. The same idea applies here. We can look not only at the clients of the third parties but also at the combination of third parties used by different websites to predict whether your digital supply chain footprint looks more like that of a trustworthy source or an untrustworthy source.
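The "client list" idea can be illustrated with a toy nearest-neighbour sketch in Python. It is not the method from the paper, which the conversation does not spell out; it assumes each website is reduced to the set of third parties it contacts, scores the overlap between two sites with Jaccard similarity, and lets a new site take the majority label of its k most similar labelled sites. The site names, third-party names, and k=3 are all hypothetical.

```python
from __future__ import annotations

from collections import Counter


def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of third parties two sites share (0.0 = none, 1.0 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict(site: set[str], labelled: dict[str, tuple[set[str], str]], k: int = 3) -> str:
    """Majority label among the k labelled sites whose third parties overlap `site` most."""
    ranked = sorted(labelled.values(), key=lambda item: jaccard(site, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: site -> (third parties observed, label).
labelled = {
    "real-a": ({"cdn", "analytics", "premium-ads", "paywall"}, "trustworthy"),
    "real-b": ({"cdn", "analytics", "premium-ads", "comments"}, "trustworthy"),
    "real-c": ({"cdn", "paywall", "premium-ads"}, "trustworthy"),
    "fake-a": ({"cdn", "analytics", "popunder-ads", "clickfarm"}, "untrustworthy"),
    "fake-b": ({"cdn", "popunder-ads", "clickfarm", "redirector"}, "untrustworthy"),
    "fake-c": ({"analytics", "popunder-ads", "redirector"}, "untrustworthy"),
}

new_site = {"cdn", "analytics", "popunder-ads", "redirector"}
print(predict(new_site, labelled))  # untrustworthy
```

Here the new site shares most of its third parties with the labelled untrustworthy sites, so it is flagged. Note that "cdn" and "analytics" appear on both sides and contribute little on their own; a real system would need far more labelled sites and careful validation.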
DY: So you're kind of talking about impersonators in a way.
RP: In many ways. Deception is one of those ... So that was where we started this whole thing. We said ... Most of the literature looks at these impersonators, as you call them, and they're very good at deception, and our initial response, our first run at this, was that this is a fool's errand. There is no way that we're going to be able to look at the text of any particular news article and add anything to the literature from that end. But what we had was access to the third-party digital supply chain, and we can actually go out and collect that, and it turns out that ... Well, there's an old Spanish proverb, and I'm going to quote it just for a second. The Spanish proverb says "Tell me with whom you walk and I will tell you who you are." It's guilt by association, and that's all we're doing here. We're picking up ... If your business model is to be false or untrustworthy, do you fundamentally do things differently than if you were the New York Times or the Calgary Herald? And the answer is yes.
DY: The answer's yes and so I guess the question is ... I guess maybe I'm naive but I'm surprised that people who deal in fake news and incorrect information actually make money at it and do have a business model that people somehow give some credibility to. What am I missing?
RP: So there is a lot of advertising that goes on even with unsavory characters and so this business model would perhaps be different for different types of news outlets. So let me give you an example. There might be advertisers who don't want anything to do with fake news.
DY: And how do they know that? Like how can they determine for sure that they will have nothing to do with fake news?
RP: So there are controls that you can place. And it's not only fake news: in our research, we found that there were other types of websites that advertisers stay away from in droves, for example, adult websites. Procter & Gamble would not like to see its advertising show up on an adult website. So advertisers are very aware of where their advertisements are showing up, and they're very concerned about the image that is portrayed, so it's not random. There is an ecosystem. It's an entire multibillion-dollar industry of placing ads on particular websites, and the people who are advertising are extremely cognizant of where their ads end up. And if your brand, for example, is all about honesty and trustworthiness, let's say it's Disney, which is all about a very clean brand, they probably would not like to see their advertisements show up on untrustworthy or nefarious websites.
DY: So how can your digital supply chain research help users to tell fake news sources from a trustworthy news source?
RP: Okay. So this research is a proof of concept. It's very early research that illustrated very clearly that you could use the digital supply chain to very, very accurately predict whether the next website you touch is real or fake, trustworthy or untrustworthy, but it's not a commercial product at this point.
DY: Are there particular elements within the digital supply chain that are more relevant than others that will tell you whether you're going in the right direction or not? What would they be?
RP: Absolutely. So we actually incorporated that information into our algorithms and so we looked at the portions of the digital supply chain that only dealt with real news sites or only dealt with fake news sites and that was part of our algorithm. Now if you're asking me to name names, we held back on that. We didn't name names for fear of someone suing us.
DY: Fair, but I'm just wondering, is there a certain piece in that supply chain? Let's say there are 10 actors. Could you absolutely point to two that propagate this fake news chain or this misinformation?
RP: Absolutely, and that was one of our algorithms, and it was an amazingly accurate component. The problem with doing that is that it only let us see into about 74% of the websites, because not all websites will have dealt with either the completely pure or the completely nefarious third parties. The analogy that I use for this is buying toilet paper at Costco. Every business, everybody, buys toilet paper at Costco, so I don't necessarily get any information from whether or not you bought toilet paper at Costco. Going back to the "Better Call Saul" analogy from "Breaking Bad", I would suspect that Saul also has a membership at Costco, and knowing that Saul buys his toilet paper at Costco doesn't tell me anything unique about Saul. But the constellation of third parties, and of the websites those third parties do business with, does, and we can't tell with the naked eye whether a website is good or bad. It's not just one component. So we had to use machine learning, SVM classifiers, and other artificial intelligence methods to allow the algorithms to see better than we could as humans. So if you're asking me what the magic elixir is, you can't see it with the naked eye. Now sometimes you can. Like I said, about 75% of the time you can actually just look and ask: do you deal with the people on the good list, or with the people who show up on Santa's naughty list? It's kind of like when you were a kid: if you played with someone who was on Santa's naughty list, maybe you were more likely to be naughty, and that's all we're doing.
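The "good list / naughty list" shortcut, and why Costco-style third parties carry no signal, can be sketched in Python as follows. This is a hypothetical illustration, not the published algorithm: it assumes a set of labelled training sites, keeps only the third parties seen exclusively with real or exclusively with fake sites, and abstains (returns `None`) when a new site touches neither list, which is where a trained machine-learning classifier would have to take over. The toy data is made up and does not reproduce the 74% coverage figure.

```python
from __future__ import annotations


def build_lists(labelled: dict[str, tuple[set[str], str]]) -> tuple[set[str], set[str]]:
    """Split third parties into a 'good list' and a 'naughty list'.

    A third party lands on a list only if it was seen exclusively with real
    sites or exclusively with fake ones; the "toilet paper at Costco" ones
    used by both kinds of site end up on neither list.
    """
    seen_real: set[str] = set()
    seen_fake: set[str] = set()
    for parties, label in labelled.values():
        (seen_real if label == "real" else seen_fake).update(parties)
    return seen_real - seen_fake, seen_fake - seen_real


def classify(site: set[str], good: set[str], naughty: set[str]) -> str | None:
    """Label a site from the lists alone, or return None (abstain)."""
    on_good = bool(site & good)
    on_naughty = bool(site & naughty)
    if on_naughty and not on_good:
        return "fake"
    if on_good and not on_naughty:
        return "real"
    return None  # neither list (or both) applies: leave it to a trained classifier


# Hypothetical labelled sites: name -> (third parties observed, label).
training = {
    "real-a": ({"cdn", "analytics", "paywall"}, "real"),
    "real-b": ({"cdn", "analytics", "premium-ads"}, "real"),
    "fake-a": ({"cdn", "analytics", "popunder-ads"}, "fake"),
    "fake-b": ({"cdn", "redirector"}, "fake"),
}
good, naughty = build_lists(training)

new_sites = [{"cdn", "paywall"}, {"analytics", "redirector"}, {"cdn", "analytics"}]
labels = [classify(s, good, naughty) for s in new_sites]
coverage = sum(label is not None for label in labels) / len(labels)
print(labels, f"coverage={coverage:.0%}")  # ['real', 'fake', None] coverage=67%
```

The third site uses only "cdn" and "analytics", which both real and fake sites use, so the lists say nothing about it; that abstention is the sketch's version of the gap Ray describes between list lookups and the full machine-learning approach.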
DY: Yeah. But as users, what should we be looking for? Like we obviously don't have the depth that you have but as a user, what should we be looking for beyond the obvious? I mean you look at some websites and you say there's no way this makes any sense whatsoever so I'm not even going to click on it because I don't know where it's going to take me.
RP: Well, exactly, and until these algorithms are commercialized by someone ... That would be a cool thing, and I would love to work with a company that would like to do this. We'd love to. But as a user, I've got two points of advice. First, be careful what you read: question what you read and fact-check it. Most importantly, perhaps, important decisions should never be made on the basis of rumors, speculation or someone's opinion. I think it's a phenomenal idea to go ahead and buy a subscription to legitimate news sources. The New York Times. The Calgary Herald. It's a very, very good idea to spend the 20, 30, 50, 60 bucks for a verified, legitimate news source. Second, be careful with what you repost. The lesson here is that if you're reposting unscrupulous, untrustworthy websites and resources, you're part of the problem.
DY: Sure, and there's confirmation bias that sort of plays into this as well, right? Sometimes people are looking to confirm their opinion, and if a story fits, it gets reposted.
RP: Well, I want to tell you a story about a guy by the name of Jestin Coler. Jestin's from Denver, and Jestin started the Denver Guardian, NationalReport.net, USAToday.com.co, WashingtonPost.com.co, and I don't know how many others. Jestin was a purveyor of fake news around the time of the last U.S. presidential election, and some of his articles have been rated false by Snopes. Clearly they were fake news. Jestin did an interview with Colorado Public Radio where he laid out why he was doing this: at first it was just to prove how stupid and gullible people were, and then it turned into money. Jestin described a whole host of third parties that were there to help him advertise when he got knocked off of Google. When Google started to crack down, Jestin described the third-party infrastructure that existed in this nefarious space. I mean, there are a lot of people waiting to help you advertise and make money off of ... So you have a conglomeration or a ... Now I'm going to refer to the research done by Dr. Kate Starbird at the University of Washington. She studies how people use fake news after disasters, and in this article that I'm reading from, Kate talks about how she's trying to unravel the insurgents, the hustlers, and the foreign agents as they come together to upend the nation's political discourse. So basically what Kate was studying was this: yes, you have some really big, bad actors, which have been documented recently in the news. We know that the Russians, as documented by the Mueller report, were clearly part of this fake news entourage. Interestingly enough, there are some signs currently, back to the reason for this podcast, that COVID misinformation is also being driven by some ... We don't know who this is, but let's suspect who it might be: some very, very, very large, well-moneyed actor, probably a nation state, and it's generating up to 40% of the fake news related to COVID.
Maybe more. We don't know, but it's a massive issue. So you've got these nation states that have big operations to create fake news, but then you also have little guys who are circulating it, trying to make money off of it, and it's just a mess.
DY: So we talk about the state of the media a lot and it seems to me that you have challenging times with media outlets. So how come the media outlets themselves aren't focusing on doing the research and constructing systems, maybe using your digital supply chain information, to deal with this competitor that's taking money away from them and that's a problem?
RP: It's so true, and you might've seen this week that the Australian government has asked Google for several hundred million dollars for the reposting of Australian news sites, so it's in the news. Now in terms of our research, no one had done anything like this before, so it is cutting edge. It's a rethink of how you would look at fake news, and that is part of what we do here at the university. Our team, and I know many, many other teams around campus, are trying to push the boundaries of what is possible in this world. We're pushing the boundaries of the assumptions that constrain us, and this is part of that whole process that we go through all the time to try to create new knowledge.
DY: So knowing what you know now, how can we solve this issue of fake news or are we still running to catch the train that's left the station, so to speak?
RP: Well, I think part of it is labeling things as fake. We saw big strides taken this week by Twitter.
RP: Actually, that was quite a...
DY: And you know what's interesting is that Facebook employees are now going to start to speak out against what the company is not doing in this regard as well. So that grassroots movement is definitely starting to take root, so to speak.
RP: And we've seen employers in the tech industry then fire the workers who speak out.
DY: Yes or they resign on their own as we saw with Amazon.
RP: What's that old curse? May you live in interesting times?
DY: Yes. We're definitely living in interesting times right now, but I guess ... Your method, your research, the tools that could arise from what you've uncovered: do you see them being commercialized?
RP: It's commercializable, for sure, but that is not what I do. As a researcher, I try to push, as I said before, the boundary of knowledge with new methods, new mousetraps, new ways of thinking about the issue. Looking at the third parties instead of the articles is a very, very unique perspective, and here's why I think that what we've done can actually last longer. It has some legs, as they say. When you look at the cat-and-mouse game that goes on with deception at the article level, the untrustworthy news sites are pretty good, and sometimes it's actually very hard to tell whether something is real or fake.
DY: Do you have an example that you could share?
RP: I have an example from a new technology: deepfakes with videos. Deepfake videos are now so good that you almost have to be an expert, looking at the lighting and the angles of the lighting, to tell whether or not they're real. You have to spend a lot of time to figure out if a deepfake video or photograph is actually real, and that's brought on by artificial intelligence as well. So for every measure, there's a countermeasure. It's a big cat-and-mouse game, and what's different about what we've done is that it looks at the ecosystem of everybody that you deal with, and it's very hard for a tiger to change its stripes.
DY: So where does the regulatory framework fit into this? Or is this a classic example of technology and systems getting ahead of the regulatory framework, where we have to deal with the fallout until the regulatory piece catches up?
RP: We've been getting that question a lot, and the question essentially comes down to: are you advocating to outlaw or regulate fake news? I haven't heard anybody on our team go that far. Take clickbait, for example. I don't know about you, but there are times of the day when I actually like and enjoy clickbait for the entertainment value. Who doesn't want to see the 50 craziest pictures from World War Two that you won't believe, or the you'll-never-guess-what-Loni-Anderson-looks-like-now kind of clickbait?
DY: The National Enquirer has come to your desktop, basically if you're doing that. Right?
RP: Yeah, and I wouldn't advocate outlawing that, but perhaps labeling it so that I know, hey, wait a second, this is probably not real, or this is clickbait, or this is fake. Certainly let me know. I do believe in people's wisdom if they're knowledgeable.
DY: If they're informed.
RP: If they're informed.
DY: That's the challenge. I think that it's getting harder and harder to figure out what's real and what's not. So do you have any specific tips for people in terms of what they need to do to spot fake news and stay away from it?
RP: Know your source, first of all. I often check the URL. I want to make sure that I'm not looking at WashingtonPost.com.co; I want to make sure that it looks right, even when I click on the Washington Post. Since doing this research, I've picked up subscriptions to several very legitimate websites: the L.A. Times, the New York Times, the Washington Post, whether it be the Calgary Herald or whatever it is. And we're not talking about bias. Our research does not try to determine whether a website is biased left or right. We treated all of the real news websites as real, whether they're biased left or biased right. Fox News we treated as real news. I have friends and relatives who might make a counterargument to that, but either way, we treated it as real, and CNN and MSNBC also as real. Bias is not what we're talking about. We're talking about truly nefarious websites that have a different business model.
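Ray's "check the URL" tip is easy to get subtly wrong, because a lookalike such as WashingtonPost.com.co contains the real domain as a substring. A minimal Python sketch of a stricter check, assuming a small hand-made allow-list (the list here is hypothetical and far from complete, and this is a reader's tip, not part of the research method), accepts a link only if its host is a trusted domain or a subdomain of one:

```python
from urllib.parse import urlsplit

# Hypothetical allow-list for illustration; a real one would be curated and maintained.
TRUSTED_DOMAINS = {"washingtonpost.com", "nytimes.com", "latimes.com", "calgaryherald.com"}


def is_trusted(url: str) -> bool:
    """Accept a URL only if its host is a trusted domain or a subdomain of one.

    A plain substring test would be fooled: "washingtonpost.com.co" contains
    "washingtonpost.com" but is a different domain under ".co".
    """
    host = (urlsplit(url).hostname or "").rstrip(".")  # .hostname is already lower-cased
    return any(host == d or host.endswith("." + d) for d in TRUSTED_DOMAINS)


for link in ("https://www.washingtonpost.com/politics/", "http://WashingtonPost.com.co/story"):
    print(link, is_trusted(link))
```

`urlsplit` only finds a host when the link includes a scheme such as `https://`, so a bare `washingtonpost.com` is rejected by this sketch; failing closed like that is a deliberate, conservative choice.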
DY: And what is their end game? Do you want to speculate on that?
RP: So it depends on who you're talking about amongst the players, as I said. For some of them, it is to change the political discourse. That might be some of the nation-state actors that we are pretty sure are engaged in propagating misleading information. But there are also a lot of other players like Jestin Coler, who is basically just a regular guy who fell into this business and figured out that he could make a lot of coin with advertising, and that there was an industry for advertising on these untrustworthy websites.
DY: Now you're a professor in the Haskayne School of Business. So what are the implications of these kinds of websites from a business context and for businesses? Like how should they manage the risk of potentially inadvertently becoming associated with some of these websites? What's the business risk?
RP: Yeah. As a business that's advertising on the web, you need to be very aware of where your advertisements are being placed and which website they ultimately end up on. All of this happens within the first 200 milliseconds of when you touch a website, so there's an auction that takes place. It's put up for bid. Hey, Ray has shown up to my website. Who would like to ... Do I hear a bidder? You know? And they actually hold this auction in the first 200 milliseconds, and the ad is placed on that site and I see it.
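The split-second auction Ray describes, combined with the brand-safety controls he mentioned earlier, can be pictured with a deliberately simplified Python sketch. Everything here is assumed for illustration: the bidder names, bid values, and site categories are hypothetical, each advertiser's controls are reduced to a set of excluded site categories, and the highest eligible bid simply wins. Real ad exchanges use far richer data and their own pricing rules.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Bidder:
    name: str
    bid: float  # hypothetical price offered for this single ad impression
    # Brand-safety control: site categories this advertiser refuses to appear on.
    excluded_categories: set[str] = field(default_factory=set)


def run_auction(site_category: str, bidders: list[Bidder]) -> str | None:
    """Return the winning bidder's name, or None if nobody is willing to bid."""
    eligible = [b for b in bidders if site_category not in b.excluded_categories]
    if not eligible:
        return None
    return max(eligible, key=lambda b: b.bid).name


bidders = [
    Bidder("soap-brand", 2.50, {"adult", "untrustworthy-news"}),
    Bidder("family-brand", 3.00, {"adult", "untrustworthy-news"}),
    Bidder("anything-goes", 0.40),
]
print(run_auction("news", bidders))                # family-brand
print(run_auction("untrustworthy-news", bidders))  # anything-goes
```

The toy output echoes Ray's point: when careful brands exclude untrustworthy sites, the bidders left on those sites are a different crowd, which is part of why a site's digital supply chain can say something about what kind of site it is.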
DY: And is that based on your previous history?
RP: Absolutely. It's based not only on my own previous history but also on their knowledge of other things that I've done. So...
DY: On the web, your browser history, et cetera.
RP: Not only on the web but your real-life history. There are third-party companies that actually connect your real-life footprint to your digital footprint and to where you're located right now to place a particular ad. For myself, if I'm moving around on my mobile phone, they won't place the same ad as if I'm in my house, because they have a geolocation set of third parties whose whole job is to figure out where a request is coming from.
DY: So how do we protect ourselves then if we want to sort of somehow stay insulated from this and have more control?
RP: Oh, this is a cool question, and it's the subject of our next research. We're about to submit papers looking at the impacts of regulations such as GDPR or other alternatives...
DY: Hey. What's GDPR? You have to explain that.
RP: Oh. That's the General Data Protection Regulation, the European privacy protection law. That law is what we call opt-in, where you have to say yes, and you're starting to see a lot of ... especially now that California has come up with ... On January 1, California instituted its law, which is essentially very similar to the European one in terms of forcing websites to allow you to either opt in or opt out. Now I will give you a preview. We did a very large study: we tested 100,000 websites over a period of time and saw the impact. The big preview is that when you create a regulation like GDPR that's supposed to help reduce the intrusion, our next study has found that you're tracked more.
RP: Yeah. So that's future news so I'll come back to you in a couple years after we've got that paper all sorted out but our initial findings are that it doesn't always go the way you think it would when you try and regulate this stuff.
DY: Okay. So we're now in an election year in the United States. What are you looking for in terms of these actors and their influence on political outcomes?
RP: Everything that I've seen says that we are in for one wild ride from here to November in the United States, and you're seeing a summer of discontent that is being fueled in part not only by legitimate concerns, and they're definitely legitimate concerns, but also by fake news and misinformation that's deliberately trying to divide people.
DY: Divide people and in some cases give them the confirmation bias they're looking for even though it's not correct.
RP: Yes, and we can all agree. I mean, let me just state right now that any form of violence, including police violence, is inappropriate. Completely beyond the pale. We would all agree with that. But that's not the fake news angle that we're talking about.
DY: That's very disconcerting. Ray, I want to thank you very much for this really interesting conversation.
RP: Thanks for your time. Thank you.
DY: This has been UCalgary COVIDcast. To subscribe or to listen to past episodes or to get more online resources for coping with the coronavirus pandemic, visit UCalgary.ca/COVIDsupport. Thanks to Dr. Ray Patterson for taking the time to chat with us today. I'm Deborah Yedlin. Thanks for listening.