A response to: A Rant (about R)

R
Programming
Published

April, 2026

Recently, I stumbled across this blog post: A Rant. The rant covers all the things R is, according to the author, bad at, and I will comment on (some of) the points made. In case the original rant becomes unavailable or is changed, here is a Wayback Machine link to the version I am responding to: http://web.archive.org/web/20260217152619/https://www.hendrik-erz.de/post/a-rant.

Each section below is named after the corresponding section in the original rant; I quote the relevant part of the rant and then respond to it. Without further ado, let’s get into it.

R Cannot Produce flexible Descriptives and Regression Tables

Relevant section of the author’s rant

Yes, there are packages like stargazer out there. But none of them work reliably. Whenever I have to produce such a table, I more often than not find myself having to manually clean out some errors that these tables have. This is fine if you do that once, but this is not how science works. We run a regression dozens of times, and it is not feasible having to re-apply the same patches to the erroneous code just to get a proper LaTeX export.

Science is a collaborative effort, so if you discover an issue with a package you use regularly, why not reach out to the author? Or, instead of stargazer, consider one of the more modern and more actively maintained alternatives, such as modelsummary.
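As a minimal sketch of the modelsummary workflow (assuming the package is installed; the models and labels here are my own toy example, not from the rant):

```r
library(modelsummary)

# Two toy regressions on a built-in dataset
fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)

# One call renders both models side by side; output = "latex" produces a
# LaTeX table, and other formats (html, markdown, docx) work the same way,
# so re-running the regressions requires no manual patching of the export.
modelsummary(list("Base" = fit1, "Extended" = fit2), output = "latex")
```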

R is not Scalable

Relevant section of the author’s rant

This is not least due to the fact that data types in R have blurry demarcations. A data.frame is also a matrix, and a matrix is really just a two-dimensional vector.

This is just false, and we can quickly verify this:

df <- data.frame(x = 1:5, y = 6:10)
is.matrix(df)
[1] FALSE

Moving between the various data types is often an implicit action because a function may expect a matrix, but happily accepts a data.frame and silently converts it without you knowing.

This has nothing to do with the language. I can name plenty of Python functions that will happily convert to NumPy arrays or pandas DataFrames without you knowing.

Also, there are many tasks that require multi-processing because they’re CPU-bound. And with parallel, R even features a base package for that! Great! … is what I would say if it would actually work as expected. You know, multi-processing works by spawning a series of threads that run on different cores of your CPU. However, due to security considerations, each thread must have its own separate memory block. This means that all data that a thread needs to have access to must be copied for each thread.

See e.g. the newly released package mori, which looks like a very promising way to reduce memory use when doing parallel processing in R (it was released after the author wrote the rant).
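It is also worth noting that the claim about mandatory copying is not quite right for base parallel either: on Unix-alikes, fork-based workers share the parent’s memory copy-on-write, so data that is only read is not physically duplicated. A small sketch:

```r
library(parallel)

x <- rnorm(1e7)  # roughly 80 MB of data in the parent process

# On Linux/macOS, mclapply() forks the R process: the children share the
# parent's memory copy-on-write, so merely reading x does not copy it.
# (On Windows forking is unavailable and mc.cores must be 1.)
res <- mclapply(1:4, function(i) mean(x) + i, mc.cores = 2)
```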

R cannot handle text

Relevant section of the author’s rant

This has been a long-time pet peeve of myself. Basically all programming languages have a somewhat convenient way of working with text … except R.

Take just a few examples: Printing out interpolated text with two variables in various languages:

JavaScript: Some text ${var1} and other text: ${var2}
PHP: "Some text $var1 and other text $var2"
Python: f"Some text {var1} and other text {var2}"
Rust: println!("Some text {} and other text {}", var1, var2)

R has sprintf():

sprintf("Some text %s and other text %s", var1, var2)

Another option is the glue package, where you can then write the above like this:

glue::glue("Some text {var1} and other text {var2}")

“Everything is an Object”

Relevant section of the author’s rant

But in addition, do you know what data type a vector has? Exactly: It depends on the data type of the values it stores! Effectively, this means that a vector “hides” itself from your own code. You yourself may know which variable is a vector and which is a primitive value, but your code has little way of knowing. Here’s an example:

class(2.3)
[1] "numeric"
class(c(2.3, 1.2))
[1] "numeric"
typeof(2.3)
[1] "double"
typeof(c(2.3, 1.2))
[1] "double"

Have fun writing code that distinguishes between vectors and primitives.

Primitives in R have a specific meaning that has nothing to do with the above; from the R documentation on primitives: “C code compiled into R at build time can be called directly in what are termed primitives”.

But anyway, for checking objects I can recommend the checkmate package, which provides many useful functions for checking an object’s type, length, and so on.
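More to the point, in R a “primitive value” in the author’s sense is simply a vector of length one, so distinguishing the two cases is a length check. A small illustration (the checkmate calls assume that package is installed):

```r
# A "scalar" in R is just a length-one vector, so length() already
# distinguishes the author's two cases:
length(2.3)          # 1
length(c(2.3, 1.2))  # 2

# With checkmate the intent reads even more clearly:
checkmate::test_number(2.3)          # TRUE:  exactly one numeric value
checkmate::test_number(c(2.3, 1.2))  # FALSE: more than one value
```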

Let me rephrase the section’s heading: The problem is less that everything is an Object in R, it is more that R is extremely sloppy with its data types. Do you have a list but need a vector? No problem: as.vector(list). Or do you need a matrix? Just do it. And if you don’t even care, R will care for you. This makes it extremely difficult to track down problems with your variables, because more often than not it does matter whether something is a vector or a list.

What? Is the author now claiming that conversion functions are bad? Conversion functions between data types exist in every programming language and are not a problem (on the contrary, they are a feature).

Lastly, accessing an element inside a list requires you to use double brackets (my_list[[2]]), but the same syntax also works for vectors without being a syntax error, even though vectors only require single brackets for indexing. This also makes you yourself very sloppy. Do you remember which of your elements in R are lists, and which are vectors? Since both accept pretty much the same syntax, it’s hard to keep track.

This is a misunderstanding about what [ and [[ do in R. [ is used for subsetting, and it returns an object of the same class as the original object. [[ is used for extracting elements, and it returns an object of the class of the extracted element. So if you have a list of vectors, my_list[[2]] will return a vector, while my_list[2] will return a list containing that vector.

Since variables of length one are also vectors, it is (almost) equivalent to use y[1] and y[[1]] to access the first element of a vector y, which is probably the source of the author’s confusion.
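The difference is easy to demonstrate with a toy list and vector of my own:

```r
my_list <- list(a = 1:3, b = letters)

class(my_list[2])    # "list":      [ subsets, keeping the list wrapper
class(my_list[[2]])  # "character": [[ extracts the element itself

# On an atomic vector the two mostly coincide, which is why the confusion
# is understandable; one visible difference is that [[ drops names:
y <- c(first = 10, second = 20)
y[1]    # named element: first 10
y[[1]]  # bare value: 10
```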

The R Garbage Collector is Broken

Relevant section of the author’s rant

However, as I had to find out the hard way, calling gc() doesn’t necessarily free up all unused memory. I don’t know where the bug exactly originates, and I have more important things to do than track down obscure bugs, but it could be anything from just faulty memory size estimation to an issue with the reference counter or just a faulty implementation of gc. I don’t know.

This is likely an issue with the author’s IDE, RStudio, and not with R itself. It would probably go away with a different IDE, or when running R from the terminal.

R doesn’t know pointers

Relevant section of the author’s rant

This problem is even exacerbated by something that I deem a very big hindrance to a better efficiency of the language: R doesn’t know pointers. Specifically, you cannot pass-by-reference in R. This means when you have some big matrix x and you want to extract some chunk of it, calling y <- x[:10000], will copy that entire chunk.

Just use e.g. data.table (or its tidy-syntax variant tidytable), which modifies data by reference, and you won’t have to worry about this.
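A minimal sketch of data.table’s reference semantics (assuming the package is installed; the column names are my own example):

```r
library(data.table)

dt <- data.table(x = rnorm(1e6))

# := adds the column by reference: dt itself is modified in place, no
# copy of the existing table is made, only the new column is allocated.
dt[, y := x * 2]

head(dt)
```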

R Scopes are a Joke

Relevant section of the author’s rant

And R? Well. First, it doesn’t have a global scope, it has environments. The “global” scope is just the default environment (the thing you see in RStudio to the top-right in the default layout). Any variable you declare anywhere in your code will just be slapped onto that. A few days ago, I wanted to better compartmentalize my data frame build module by using environments, in the hopes that variables declared in one script file won’t bleed into other code. And guess what: Exactly, it did not work.

This is just a misunderstanding of how R works. You can create new environments and use them to compartmentalize your code. For example, if the author split the code across several files and included them using source(), passing an environment of their choice to the local argument of source() would keep variables from one script file from bleeding into other code.
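Concretely, it might look like this (the script name build_frames.R is hypothetical, chosen for illustration):

```r
# Give the script its own environment instead of the global one:
script_env <- new.env()
source("build_frames.R", local = script_env)

# Everything the script defines now lives in script_env, not globally:
ls(script_env)               # names defined by the script
get("some_var", script_env)  # access a (hypothetical) script variable
```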

A variable defined somewhere deep inside a function will afterward pop up in your global environment and there is little you can do against this. It turns out, R is really littering its data everywhere.

This is just plain false, which we can quickly verify:

foo <- function() {
  x <- 1
}
foo()
exists("x")
[1] FALSE

Of course, x is not defined in the global environment, because it is defined inside the function foo(), which has its own environment.

R’s Syntax is Inconsistent

Relevant section of the author’s rant

I think this is something where an actual argument could be had for whether that is bad or not. But remember the list/vector argument from above: Lists require double brackets for accessing its contents whereas vectors only require single brackets. But vectors also accept double brackets and pretend that it’s just single brackets. This reduces the mental load that you bear, but this also nudges you to become sloppy.

Again, this is just the author not understanding the difference between [ and [[ in R, which I explained above.

RStudio Sucks

Relevant section of the author’s rant

Here the author raises some complaints about RStudio, where again some of the points are wrong. And maybe the author should look into using e.g. Positron instead.

The R Community Has no Standards

Verbosity Where no Verbosity is Needed

Relevant section of the author’s rant

Here the author complains about the R package philentropy: the wrapper function philentropy::KL() prints a message with no way to turn it off (while the main function philentropy::distance() does have a mute.message argument). Upon seeing this, the author writes:

So I was convinced this was a bug and went to their GitHub repository. And indeed, I found a discussion on this issue. It was closed and the dead-serious answer by the maintainer was “just use suppress.messages() and the message will go away”.

Here is the actual GitHub issue: https://github.com/drostlab/philentropy/issues/25, and the relevant part of the maintainer’s response:

When you install the developer version of philentropy directly from GitHub, then you will find a new argument mute_message that you can set to TRUE.

philentropy::distance(rbind(1:10 / sum(1:10), 20:29 / sum(20:29)),
                     method = "euclidean",
                     mute_message = TRUE)
euclidean 
0.1280713 

Alternatively, you can also always use the suppressMessages(distance(...)) function to suppress message calls. Does this solve your issue?

To me, that does not align at all with what the author wrote above; it even feels like the author is being disingenuous. Additionally, had the author asked the maintainer to add the mute_message argument to wrapper functions such as philentropy::KL(), it likely would have been done. And indeed, the current development version has an option to set it persistently for calls to philentropy::distance(), as seen here: https://github.com/drostlab/philentropy/blob/master/R/distance.R#L224.

Implicit Type Conversions Will Screw You

Relevant section of the author’s rant

Now the author complains about the package seededlda, namely that the documentation of the batch_size argument was weird/bad (the section was originally about something else, but the author later realized they were wrong, hence the non-matching title): the author is used to seeing batch size given as an integer, while the documentation uses it as a proportion.

It is very unusual to define batch size parameters as “proportions of something”. And this correctly made me suspicious. The reason they do this is because of a 2009 paper that they did not reference in the documentation, but rather somewhere in the source code.

The author links to the following .Rd documentation, where the batch_size argument is documented as follows:

\item{batch_size}{split the corpus into the smaller batches (specified in
proportion) for distributed computing; it is disabled when a batch include
all the documents \code{batch_size = 1.0}. See details.}

That .Rd file has no details section, but the argument is inherited from textmodel_lda() (textmodel_seededlda() is just a thin wrapper around textmodel_lda()), so we check that function’s documentation instead. And indeed, there is a details section that explains the batch_size argument in more detail and mentions the paper (it did so at the time the author wrote the rant, too): https://github.com/koheiw/seededlda/blob/master/man/textmodel_lda.Rd#L86-L92.

This took me a minute to find. And once again, if you have an issue with a package, consider opening an issue or PR on GitHub. The maintainer is likely to be responsive and helpful, and will probably (try to) fix any issues you have.

Painting Code With magrittR

Relevant section of the author’s rant

The last problem is more meta and concerns the naming practices. R users like to give their packages fancy names as opposed to names that convey what they’re doing.

Yes, this is true, but it is hardly unique to R. Here I like e.g. Julia’s package naming guidelines, which say:

Package names should be sensible to most Julia users, even to those who are not domain experts. The following rules apply to the General registry but may be useful for other package registries as well.

Reticulate has no reason to exist

Relevant section of the author’s rant

A final problem I have with R is the existence of reticulate. reticulate’s job is pretty easy to summarize: Offer a way to call Python code from within R. Now, I understand that some functions are obviously only implemented in Python and not in R. Then, there are two options. The reasonable idea would be to just learn Python, write a few lines of code to accomplish what you want, and load the data back into R to continue where you left off. Or you can be completely unreasonable and develop a package to interface with Python.

What? Does the author also think that packages like Rcpp (for calling C++), JuliaCall (Julia), rJava (Java), rextendr (Rust), and so on, have no reason to exist? There are many valid reasons to call code from another language without switching to it completely. So yes, reticulate has a reason to exist, and similar bridges to other languages exist in virtually every programming language; they are not unique to R.
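To show why the bridge beats the “write a Python script and load the data back” workflow, here is a small sketch (it assumes reticulate is installed along with a Python environment that has numpy; the choice of numpy is my own example):

```r
# Import a Python module directly into the R session:
np <- reticulate::import("numpy")

x <- c(1, 2, 3, 4)

# Call numpy.median on an R vector; reticulate converts the arguments
# and the result automatically, with no manual export/import round-trip.
np$median(x)
```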

Conclusion

All in all, it seems like (almost?) all the points the author makes in the rant are either misunderstandings of how R works, issues with specific packages that could easily be fixed by opening an issue or PR on GitHub, or a lack of familiarity with the R ecosystem. So I would recommend that the author learn more about how R works, and consider contributing to the packages they use when they find issues with them, instead of writing a rant about it.