Matt Gemmell

Comments & Identity

In reference partly to my recent articles on switching comments off, and also the ‘Identity’ section of my article on attribution of writing on the web, several people have sent me links to Disqus’ research on comments and identity.

For unfathomable reasons, the aforementioned research is presented as a gigantic JPEG in the guise of a web page, but let’s nervously ignore that eccentricity. The chosen title of the piece is “Pseudonyms drive communities!” - to which I can presumably reply “My blog isn’t a community!”, and participate no further.

Naturally, I won’t do that. The analysis is summarised (by Disqus) as follows:

The most important contributors to online communities are those using pseudonyms. In our data, they accounted for 61% of total comments! These contributors also comment more frequently - 6.5 times more frequently than anonymous commenters and 4.7 times more frequently than commenters using a real name (via Facebook).

Our first question must be: what’s the difference between a pseudonymous and an anonymous commenter? Disqus doesn’t make this at all clear, which troubles me; their only allusion to this surely key aspect of their own data is that pseudonyms “better represent one’s persona”, and “don’t sacrifice personality”. I’m thus going to assume, generously, that “anonymous” here means those who are not registered (or logged in) with Disqus when using one of their comment forms. This leaves the question of how to account for those who are signed in but whose username is either “Anonymous” or some variant or equivalent thereof. If the latter group is counted as pseudonymous, the analysis becomes more skewed in favour of its conclusion.

Now, putting aside the further false assumption that everyone authenticating via Facebook is using their real name (in my experience, most people who do so are indeed using what appears to be a real name, so we’ll let this one slide), it seems to me that Disqus have summarised precisely the least interesting part of their analysis.

Disqus’ data shows that just over a third (35%) of the sample were anonymous, and that only 4% were identified (i.e. were using real names). Pseudonyms accounted for 61%, which is an accurate indication of how most people view the issue of identity online. The analysis of course compensates for the numerical difference by calculating the commenting ‘rate’ by identity, giving the summary previously mentioned.
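To make the ‘rate’ normalisation concrete, here is a minimal sketch in Python. It assumes the rate is simply comments divided by distinct commenters within each identity group; Disqus doesn’t spell out its exact method, and every count below is invented purely for illustration (none of it is Disqus’ data).

```python
# Hypothetical counts, invented for illustration only (NOT Disqus' data).
groups = {
    "pseudonym": {"comments": 6100, "commenters": 1000},
    "anonymous": {"comments": 3500, "commenters": 3700},
    "real_name": {"comments": 400, "commenters": 300},
}


def comment_rate(group):
    """Assumed definition: comments per distinct commenter in the group."""
    return group["comments"] / group["commenters"]


# Per-group commenting rate.
rates = {name: comment_rate(g) for name, g in groups.items()}

# How many times more frequently pseudonymous users comment than each group.
relative = {name: rates["pseudonym"] / rate for name, rate in rates.items()}

for name in groups:
    print(f"{name}: rate={rates[name]:.2f}, pseudonym/this={relative[name]:.2f}")
```

The point of the sketch is only that a group’s share of total comments and its per-person rate are different quantities: the second depends on how many people are in each group, which is exactly the detail that matters when comparing identities.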

Of course it’s true that those using pseudonyms will comment more often, and that there will be more such comments overall, than either anonymous or identified visitors. The primary effect of masking one’s identity is disinhibition, and an (almost complete) reduction in accountability. Anyone who, if forced to use a real identity, would choose not to post a given remark, may indeed post it if they can do so under a false identity. The conclusion is trivially true.

The issue of the disparity between pseudonymous and truly anonymous comments can be readily explained by:

  1. Our conditioning towards signing up for, and signing into, online services.
  2. Our browsers’ ability to keep us logged into services without intervention.
  3. Our enormous egos, craving recognition for some identity even if not our real one.

So far, so boring. The really interesting and controversial question is raised by Disqus’ conclusion that:

Pseudonyms are the most valuable contributors to communities because they contribute the highest quantity and quality of comments.

Our question must of course then be: what is quality? Disqus has a definition ready, of course. The study rated the quality of a comment using two positive signals and three negative ones, as follows:

  1. Positive: The number of times a comment is ‘liked’.
  2. Positive: The number of times a comment is replied to.
  3. Negative: The number of times a comment is flagged (presumably as unsuitable).
  4. Negative: The number of times a comment is marked as spam.
  5. Negative: The number of times a comment is deleted (presumably by moderators).
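As a rough sketch of how five such signals might be folded into a single number, consider the Python below. The equal weighting, the `CommentSignals` type and the `quality_score` function are all assumptions made for illustration; nothing here is taken from Disqus’ actual method.

```python
from dataclasses import dataclass


@dataclass
class CommentSignals:
    """Per-comment counts for the five signals (hypothetical structure)."""
    likes: int = 0
    replies: int = 0
    flags: int = 0
    spam_marks: int = 0
    deletions: int = 0


def quality_score(s: CommentSignals) -> int:
    # Assumption: every signal carries equal weight (+1 or -1 per event).
    positive = s.likes + s.replies
    negative = s.flags + s.spam_marks + s.deletions
    return positive - negative


example = CommentSignals(likes=3, replies=2, flags=1)
print(quality_score(example))
```

Whatever weights are chosen, a score of this kind can only ever reflect how other people reacted to a comment, which is worth keeping in mind for what follows.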

Of these five metrics, we can safely discard the fourth (spam). If we assume the integrity of the system, with accurate and non-malicious use of spam-reporting facilities (as Disqus implies that we should by its very inclusion as a basis for conclusion), then those data points are relevant only to a discussion of how spammers behave online as regards identity. Since I view “comments” and “spam” as mutually exclusive (in that if a given remark is spam, then it doesn’t actually constitute a comment - and certainly not one to which a non-zero “quality” can be applied), the fourth result is superfluous and not germane to this discussion.

The remaining four measures all seem reasonable, on the surface. Explicit peer approval, response generation, explicit peer disapproval, and moderator removal are surely relevant for inspection. The question is, do they measure what Disqus is trying to make them measure? My argument is that, no, they don’t - at least not with remotely sufficient definitiveness to drive the actual conclusion drawn. The problem is that these factors are being used in a way that defies much of my own (and I very much imagine, also your) experience with comments on the web. Let’s take each factor individually.


Liking

The two situations in which I’ve consistently seen by far the most “likes” (or equivalent votes) are on Facebook and on YouTube. In each case, the following holds true:

  • The liked comments are almost exclusively brief sentences or single paragraphs.
  • The liked comments are almost always sarcastic (or at best, pithy) rejoinders.
  • At least half of the time, there’s a discernible element of mean-spiritedness and/or ultra-partisanship to either the liked comment itself, or the act of liking it.

Twitter also has a voting system (the ‘favorite’ function), to which the above observations also seem to generally apply - brevity being of course enforced by the service itself.

It’s not at all clear to me that such voting systems are ever a measure of actual quality (contribution, extension of discussion, introduction of additional perspective, insightful commentary, etc) in comments.


Replies

An open question is whether Disqus’ analysis counted only direct replies to comments, or whether it instead counted an entire reply-thread as all being replies to the root comment of that thread; the latter approach would of course be debatable, since not all comments in a thread semantically respond to the spawning comment. Let’s assume the former position.
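The difference between the two counting approaches can be sketched in a few lines of Python. The `Comment` type and both function names are hypothetical, chosen only to illustrate the distinction; they don’t describe how Disqus stores or counts anything.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    """A comment and the replies made directly to it (hypothetical model)."""
    author: str
    replies: List["Comment"] = field(default_factory=list)


def direct_reply_count(comment: Comment) -> int:
    # Counts only the comments that respond to this one directly.
    return len(comment.replies)


def thread_reply_count(comment: Comment) -> int:
    # Counts every descendant, crediting the whole thread to the root.
    return sum(1 + thread_reply_count(r) for r in comment.replies)


# A root comment with two direct replies, one of which has its own reply.
root = Comment("root", [
    Comment("a", [Comment("c")]),
    Comment("b"),
])

print(direct_reply_count(root), thread_reply_count(root))
```

Here the root comment has two direct replies but three comments in its thread; on a long, meandering thread the gap between the two figures could be far larger.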

The universally-acknowledged state of comments on the web is that they’re of extremely low quality on average, and are wildly slanted towards poor thinking and/or triviality. Even the word “comments”, in the context of the web, draws a weary groan. Here are a few observations of my own regarding comment-threads; I’d imagine that they largely gel with your own experience. I must also remind the reader that we’re talking about a sample set of Disqus comments, and thus we’re excluding web forums (one of the few places, at least for technical help purposes, that often have threads of value).

  • Complex comments usually generate few replies, for obvious reasons of the effort required to assimilate them.
  • Trolling, partisan or otherwise unpleasant comments generate substantial backlash.
  • Similarly, crowd-pleasing remarks generate copious essentially disposable expressions of agreement.
  • Comments containing errors of fact or language reliably attract corrections and/or mockery.
  • Duplicate comments (not exact duplicates of the same author, but rather functional duplicates of another’s remark) are extremely commonplace.
  • Responses which constitute either straw men or critiques of points incidental to the actual discussion are extremely commonplace.
  • As with articles themselves, a significant percentage of responses are dictated by the topic of a comment, rather than its specific content.

The reply-generation value of a comment is based at least as much on controversiality, audience, topic and orthogonal characteristics as its content. It seems ludicrous to me, given our experience of web comments on a daily basis, to claim that it’s a measure of quality in the conventional sense. Reply-generation value is only a measure of what it claims to be: the number of replies generated.

Disliking, or unsuitability

This is the opposite situation to liking, and is governed by the same rules (and those of reply-generation; voting is of course a legitimate and succinct form of reply). For those very same reasons, it can’t convincingly be said that voting (in either direction) is a measure of objective quality of the content, but merely of subjective value to the voter.

In no way is it clear how (or if) that value-assessment correlates with the feelings of the blog owner (presumably one of the groups at whom the research write-up is aimed).


Deletion

This metric is slightly more interesting, being the will of the actual owner of the blog (or at least a moderator trusted by that person), but ultimately it falls foul of the implicit fallacy that the blog owner is just, abhors censorship, practises a staunchly scientific attitude towards pushback and correction, and is a better arbiter of quality than the writer of a given comment.

None of those things are true in any absolute sense, so the metric is inherently subjective. The only thing that the deletion of a comment indicates is that its subjective value to the moderator was low; it says nothing of its inherent quality, much less does it give any indication of a correlation with the comment author’s preferences regarding online identity.

Final thoughts

In closing, I’ll briefly note that the entire piece of research, being about identity of commenters rather than the content of comments themselves, naturally addresses only one of the arguments I’ve presented against allowing comments on blogs.

On its own, I think it makes many (very quotable) assumptions about what the data actually shows, and generalises too far to be meaningful. I’m skeptical of the metrics being used, and I think that their interpretations are over-reaching.

There’s an unfortunate social psychology of comments that tends to place an undeserved implicit value on the conversational aspect, without considering whether they actually enhance the discussion of the original topic. Indeed, it’s almost taboo to even raise the issue, immediately drawing inappropriate yells about egalitarianism and right to reply.

The core issue with lack of online identity, for me, is the unpleasant and all-too-visible underside of the disinhibition it creates. One of the reasons I strongly prefer real names is precisely because that policy tends to filter people’s output, due to their words then being personally attributable. We’re egocentric creatures, and we all have reputations and self-respect to maintain.

Anonymity online produces interactions that are free from the interpersonal protocols that regulate society, and I don’t think that’s a good thing. It will remain so, and the ‘constraint’ of personal accountability will remain necessary, until we’re substantially more evolved creatures.