Does Julia have any hope of sticking in the statistical community?



I recently read a post from R-Bloggers that linked to this blog post from John Myles White about a new language called Julia. Julia takes advantage of a just-in-time compiler that gives it wicked fast run times and puts it on the same order of magnitude of speed as C/C++ (the same order, not equally fast). Furthermore, it uses the orthodox looping mechanisms that those of us who started programming on traditional languages are familiar with, instead of R's apply statements and vector operations.

R is not going away by any means, even with such awesome timings from Julia. It has extensive support in industry, and numerous wonderful packages to do just about anything.

My interests are Bayesian in nature, where vectorizing is often not possible. Certain serial tasks must be done using loops and involve heavy computation at each iteration. R can be very slow at these serial looping tasks, and C/C++ is not a walk in the park to write. Julia seems like a great alternative to writing in C/C++, but it's in its infancy and lacks a lot of the functionality I love about R. It would only make sense to learn Julia as a computational statistics workbench if it garners enough support from the statistics community and people start writing useful packages for it.
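To make the "serial loop" point concrete, here is a minimal sketch of a random-walk Metropolis sampler, written in Python purely for illustration. The target density, step size, and function names are assumptions chosen for the example and do not come from any particular package. Each iteration proposes a move from the current state, so the iterations cannot be computed all at once as a vector operation.

```python
import math
import random


def log_target(x):
    """Log-density of a standard normal, up to an additive constant (assumed target)."""
    return -0.5 * x * x


def metropolis(n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis sampler; illustrative only, not tuned."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_steps):
        # The proposal is centred on the current state, so iteration t
        # cannot start until iteration t-1 has finished.
        proposal = x + rng.gauss(0.0, step_size)
        log_accept = log_target(proposal) - log_target(x)
        # Accept with probability min(1, exp(log_accept)).
        if rng.random() < math.exp(min(0.0, log_accept)):
            x = proposal
        samples.append(x)
    return samples


samples = metropolis(20_000)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
```

In an interpreted language the cost of this kind of loop is dominated by per-iteration overhead, which is the situation where a compiled or JIT-compiled language is expected to help most.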

My questions follow:

  1. What features does Julia need to have in order to have the allure that made R the de facto language of statistics?

  2. What are the advantages and disadvantages of learning Julia to do computationally-heavy tasks, versus learning a low-level language like C/C++?

Christopher Aden

Posted 2012-04-01T22:56:09.113

Reputation: 1 095

1@whuber, could you point me to more information regarding the »easily and automatically parallelizable functional style of coding «? As someone who has struggled with CUDA and MPI, that sounds very interesting! – trolle3000 – 2013-04-19T19:16:44.887

@trolle3000 Good, practical references include books on R (such as many of those written by one of its founding fathers, John Chambers) and Mathematica (which supports several programming paradigms but favors functional programming and offers automatic parallelization in a number of ways through its Parallel* commands and, more recently, CUDA support). See for example.

– whuber – 2013-04-19T20:12:29.120

@trolle3000: Automatically parallelizable is one of the nice features that comes out of some languages. Matlab, for one, has a lot of their functions parallelized in the Statistics Toolbox. While a lot of parallelization won't be as difficult (explicit) as CUDA, it'll still require some effort. See: and

– Christopher Aden – 2013-04-19T20:50:54.797

1@ChristopherAden, @whuber; parallelization doesn't just automagically fall out of 'some languages'...! The MATLAB parfor command, for example, will only parallelize an already embarrassingly parallel problem, just like the #omp pragma parfor will do in C. My real question is, what's the inherent advantage in functional programming with respect to parallelization? – trolle3000 – 2013-04-21T03:24:40.853

That's obviously #pragma omp parallel for – trolle3000 – 2013-04-21T03:31:40.693

1@trolle3000 I don't think anybody is claiming that parallelization is so automatic. However, when (if) you have written a functional version of a program, you have already undertaken much of the effort needed to parallelize it, which is why applications like Mathematica can automate the parallelization, often quite effectively. If instead you have coded an algorithm in a procedural manner, it will usually be much more difficult to parallelize it. – whuber – 2013-04-21T20:06:16.303
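A small sketch of the point about functional code, in Python for illustration only: once the work is expressed as a pure function mapped over independent inputs, switching from a serial map to a parallel one changes a single line. The function `simulate` and its toy generator are hypothetical. Threads are used here only to keep the example self-contained; for CPU-bound work in CPython a process pool would be the usual choice.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(seed):
    """Pure function: the result depends only on its argument."""
    # Toy linear congruential generator, so no state is shared between calls.
    state = seed
    total = 0
    for _ in range(1000):
        state = (1103515245 * state + 12345) % 2**31
        total += state % 100
    return total / 1000


seeds = range(8)

# Serial, functional style: map a pure function over the inputs.
serial = list(map(simulate, seeds))

# Parallel version: only the "map" changes, not the function.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(simulate, seeds))
```

Because `simulate` touches no shared state, the two result lists are identical. A procedural version that accumulated into a shared variable inside the loop would need restructuring before it could be distributed this way.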

1I am astonished that the current discussion regarding 'functional programming' seems to omit real functional languages like Haskell, Clojure and Scala. Also note that imperative languages are not necessarily worse than FP at concurrency (take Go for instance). – Marc Claesen – 2014-06-22T13:20:26.270

A great new MCMC package just came out for Julia! You guys should check it out :)

– bdeonovic – 2014-06-22T13:43:38.410

"the orthodox looping mechanisms that those of us who started programming on traditional languages are familiar with, instead of R's apply statements and vector operations." - I have a problem with this statement. You probably mean programmers who started and got stuck in imperative programming. What are traditional programming languages anyway? R's apply statements are in line with very traditional functional language paradigm and with generic programming style like in ML. Both approaches are at least 40 years old. I first saw C++ STL in mid 1990s. "apply" is as traditonal as it gets in IT – Aksakal – 2015-10-12T17:31:52.757


How is Julia better than Incanter and other similar projects?

– Wayne – 2012-04-02T14:39:27.163

23Re procedural constructs (e.g. looping): that sounds like a giant step backwards. We are at the cusp of a change from single and small-CPU platforms to massively parallel platforms. As this evolution occurs over the next decade or so, the easily and automatically parallelizable functional style of coding will reap huge advantages over procedural code. Many other considerations intervene in one's choice of a statistical platform, of course, but this one is worth bearing in mind as a long term strategy. – whuber – 2012-04-03T16:14:15.247

@whuber: I've trimmed off the first ``question''. I'm interested in informed speculation: thoughtful responses like Harlan's and NeilG's. My familiarity with Julia is less than a week old, so their responses are able to shed some light on the pros/cons of both Julia and R. I'm interested in further discussion on the topic, so if you could recommend ways to make it more congruent with the CW format, I'd be grateful. – Christopher Aden – 2012-04-03T18:29:21.787

11Christopher, a good approach is to frame questions in a manner designed to solicit reasons and evidence. E.g., instead of "Does Julia have the necessary allure...," try something like "What elements of Julia might give it a chance of gaining traction and why"; instead of "Is it worth learning," ask "Why might Julia be worth learning now? What are its potential advantages?" You could further refine that question by specifying what kinds of uses of Julia you may be interested in, such as software development, solving one-off problems, biostatistics, data mining, etc. – whuber – 2012-04-03T18:41:17.627

1@Whuber: I appreciate the suggestions and have implemented them. Thank you! – Christopher Aden – 2012-04-03T19:02:23.240

1Christopher, well done! (And +1 for the question.) And now I think @naught could make a good case for this not to be CW, but I still agree with chl that multiple good answers are likely to emerge--that has become abundantly obvious--and so prefer to maintain the CW status of this thread. – whuber – 2012-04-03T19:16:33.103

1@whuber: the question seems to have changed substantially. I don't have a very good case against CW anymore :) – naught101 – 2012-04-03T21:56:18.253

My fears about CW giving me low-quality answers are unfounded. I'm quite satisfied with the responses I've received so far. – Christopher Aden – 2012-04-04T00:24:51.393

1The biggest hurdle a new design faces is in the old design somehow sustaining (more or less efficiently) another adaptation away from its original intent. The biggest competitor to Julia ain't R. It's more about HP-computing adaptation to base-R. Here I'm thinking of the byte-code initiative such as the "compiler" package. compiler got off to a slow start, but eventually it'll catch up with JavaScript's speed for non-vectorized operations, and do that way before Julia will have a code depot as deep as R. The result will probably push R further away from elegance.... – user603 – 2012-04-04T15:28:01.313


@whuber I just thought you might be interested to know that the assertion that Julia code needs to be de-vectorized to be fast is a myth (albeit a strangely popular one). I answered a StackOverflow question about exactly this point here. The short summary is: Julia is fast and supportive of parallel constructs, whether you use a vectorized or de-vectorized style of coding.

– Colin T Bowers – 2018-01-16T05:11:11.763

1Thank you, @Colin. I believe you mischaracterize my comments: I don't think I said or implied vectorization is required for code to be "fast." Vectorization is helpful for scalability and extreme speed and therefore is an important consideration for code that will be applied to large problems. What is interesting is your claim that Julia vectorizes code that is not explicitly written in a vectorized form. If that's the case, we certainly can expect many algorithms implemented in Julia to be fast and scalable. – whuber – 2018-01-16T15:55:02.667

@whuber Apologies, poor language choice by me. I conflated "fast" and "scalability". However, I should clarify what I am claiming. I am not claiming that Julia always vectorizes non-vectorized code - I'm not aware of any compiler smart enough to do that. What I am claiming is difficult to summarize in a comment, but here goes: For non-vectorized Julia code, performance is close to C or Fortran. For vectorized Julia code, performance is close to R/Matlab and better in some ways, and support for parallelism is similarly close to or better.

– Colin T Bowers – 2018-01-16T22:54:30.810

1@Colin Thanks. That is exactly in the sense I understood your comment--I wouldn't expect an interpreter or compiler to recognize all vectorization opportunities. – whuber – 2018-01-16T23:35:32.393



I think the key will be whether or not libraries start being developed for Julia. It's all well and good to see toy examples (even if they are complicated toys) showing that Julia blows R out of the water at tasks R is bad at.

But poorly done loops and hand coded algorithms are not why many of the people I know who use R use R. They use it because for nearly any statistical task under the sun, someone has written R code for it. R is both a programming language and a statistics package - at present Julia is only the former.

I think it's possible to get there, but there are much more established languages (Python) that still struggle with being usable statistical toolkits.


Posted 2012-04-01T22:56:09.113

Reputation: 15 705

2One reason why R took off was its compatibility to S-PLUS. People were able to use a lot of old code. Old heavily used code has fewer bugs. With new things like Julia, which are not compatible with old code, you need a "killer app" situation: something that justifies all the trouble of moving to a new platform. It's similar to Google's new language Go - nice try, but why would I learn it? – Aksakal – 2015-10-12T17:37:56.957

Julia has RCall and PyCall to call R and Python libraries without syntactic fuss. In fact, these are so robust that people have written wrappers around many of the common libraries that people are used to, so the whole R core library is wrapped for use via Rmath.jl (and was the standard RNG during early Julia), and many people use PyPlot, which uses matplotlib. There are ccall functions so you can call C, and the same for Fortran. Packages are wrapped for these as well. So Julia solved the library problem by making it easy to use just about any language's libraries! – Chris Rackauckas – 2016-07-22T05:21:08.203

Have you actually looked at the benchmark code (or the benchmarks) to know that the R methods are poorly written? I am trying to find it myself to see how the various languages were used... – Josh Hemann – 2012-04-04T14:01:07.923

10@JoshHemann I've looked at enough to know that across the board R is "slow-ish". It doesn't necessarily lose every time, and it does on occasion blow Python out of the water, but in all of those cases the "who wins" ribbon seems to go to which Python or R programmer actually wrote most of their stuff in C. – Fomite – 2012-04-04T17:02:23.233


The benchmark code is terrible. 2000x speed gains are possible for their R examples. See , especially the comments.

– Ari B. Friedman – 2012-05-15T13:44:29.887


You're right, @gsk. E.g., pisum takes 7.76 seconds while a simple rewrite using idiomatic R (replicate(500, sum((1 / (10000:1))^2))[500]) takes 0.137 seconds, more than a fifty-fold speedup.

– whuber – 2012-05-15T14:21:06.457
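For readers who do not have the benchmark to hand, the two styles being contrasted look roughly like this. This is a Python sketch, not the benchmark's actual code: the function names are made up for the example, and the timings quoted above are for R and are not reproduced here.

```python
def pisum_loop(repeats=500, n=10_000):
    """Explicit nested loops, in the style of a naive benchmark translation."""
    total = 0.0
    for _ in range(repeats):
        total = 0.0
        for k in range(1, n + 1):
            total += 1.0 / (k * k)
    return total


def pisum_builtin(repeats=500, n=10_000):
    """Same quantity, with the inner loop handed to the built-in sum()."""
    total = 0.0
    for _ in range(repeats):
        # Mirrors the R expression sum((1 / (10000:1))^2): k runs from n down to 1.
        total = sum(1.0 / (k * k) for k in range(n, 0, -1))
    return total


# A small number of repeats is enough to check that both agree.
loop_value = pisum_loop(repeats=3)
builtin_value = pisum_builtin(repeats=3)
```

Both functions return the partial sum of 1/k^2 up to n. The only difference is whether the inner loop is written by hand or delegated to the language's own summation routine, which is the distinction the comments above are drawing for R.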

@whuber Exactly. And compiling probably buys you another "x" or two. – Ari B. Friedman – 2012-05-15T17:52:21.030


I agree with a lot of the other comments. "Hope"? Sure. I think Julia has learned a lot from what R and Python/NumPy/Pandas and other systems have done right and wrong over the years. If I were smarter than I am, and wanted to write a new programming language that would be the substrate for a statistical development environment in the future, it would look very much like Julia.

This said, it'll be 5 years before this question could possibly be answered in hindsight. As of right now, Julia lacks the following critical aspects of a statistical programming system that could compete with R for day-to-day users:

(list updated over time...)

  • optionally-ordered factor types
  • most statistical tests and statistical models
  • literate programming/reproducible analysis support
  • R-class, or even Matlab-class plotting

To compete with R, Julia and add-on stats packages will need to be clean enough and complete enough that smart non-programmers, say grad students in the social sciences, could reasonably use them. There's a heck of a lot of work to get there. Maybe it'll happen, maybe it'll fizzle, maybe something else (R 3.0?) will supersede it.


Julia now supports DataFrames with missing data/NAs, modules/namespaces, formula types and model.matrix infrastructure, plotting (sorta), database support (but not to DataFrames yet), and passing arguments by keywords. There is also now an IDE (Julia Studio), Windows support, some statistical tests, and some date/time support.


Posted 2012-04-01T22:56:09.113

Reputation: 562

literate programming/reproducible analysis support -> see IJulia. – Piotr Migdal – 2015-01-27T22:32:06.087

1Add the IJulia kernel for the IPython/Jupyter notebook ecosystem. – thecity2 – 2015-05-15T20:07:22.580

2Julia Studio is being phased out, and Juno is now the IDE – Antony – 2015-06-29T12:47:35.977

22.5 years after this answer was first posted, two-thirds of the items on the list of "must haves" are now implemented. I think that's the best evidence you could find that Julia has real promise. – senderle – 2015-12-11T12:43:30.770

4+1 for Windows support. – Owe Jessen – 2012-04-03T19:49:47.110

5 years must have passed. Are we there yet, @Harlan? – StasK – 2017-11-16T03:31:51.843

@stask Hah, good question! Plotting, statistical packages, etc., are in pretty good shape. The weakest place is actually DataFrames themselves, which have been undergoing rapid change to support a missing-data approach that's type-stable and fast. Sounds like that'll be finalized in the next few weeks, and Julia 0.7 (aka 1.0 Beta) should be released in the next couple months. I'd definitely re-visit Julia after the 0.7 release and the DataFrames updates, if you haven't used it in a while! – Harlan – 2017-11-16T15:00:08.240

Also, for reproducible statistical analysis, Weave.jl is worth a look. :) – Dr. Mike – 2018-01-10T23:11:36.380


For me, one very important thing for a data analysis language is to have query/relational algebra functionality with reasonable defaults and interactively-oriented design, and ideally this should be a built-in of the language. IMO, no FOSS language that I've used does this effectively, not even R.

data.frame is very clunky to work with interactively - for example, it prints the whole data structure on invocation, the $ syntax is hard to work with programmatically, querying requires redundant self-reference (i.e., DF[DF$x < 10, ]), joins and aggregation are awkward. data.table solves most of these annoyances, but as it is not part of the core implementation, most R code does not make use of its facilities.

Pandas in Python suffers from the same faults.

These gripes may seem nitpicky, but these faults accumulate and in the end are significant in aggregate as they end up costing a lot of time.

I believe if Julia is to succeed as a data analysis environment, effort must be devoted to implementing SQL type operators (without the baggage of SQL syntax) on a user friendly table data type.
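As a rough illustration of what "SQL type operators without the SQL syntax" could look like, here is a minimal Python sketch. The `Table` class and its `where`, `group_by`, and `join` methods are entirely hypothetical; they are not the API of Julia, R, pandas, or any existing package, and a real design would need typed columns, missing-value handling, and far better performance.

```python
from collections import defaultdict


class Table:
    """Hypothetical minimal table type: a list of dict rows."""

    def __init__(self, rows):
        self.rows = [dict(r) for r in rows]

    def where(self, **conditions):
        """Keep rows for which every column predicate is true."""
        kept = [r for r in self.rows
                if all(pred(r[col]) for col, pred in conditions.items())]
        return Table(kept)

    def group_by(self, key, **aggregates):
        """Aggregate per group; each aggregate is name=(column, function)."""
        groups = defaultdict(list)
        for r in self.rows:
            groups[r[key]].append(r)
        out = []
        for k, members in groups.items():
            row = {key: k}
            for name, (col, fn) in aggregates.items():
                row[name] = fn([m[col] for m in members])
            out.append(row)
        return Table(out)

    def join(self, other, on):
        """Inner join on a shared column."""
        index = defaultdict(list)
        for r in other.rows:
            index[r[on]].append(r)
        return Table([{**left, **right}
                      for left in self.rows
                      for right in index[left[on]]])


df = Table([
    {"id": 1, "group": "a", "x": 3},
    {"id": 2, "group": "a", "x": 12},
    {"id": 3, "group": "b", "x": 7},
])
labels = Table([{"group": "a", "label": "alpha"}])

small = df.where(x=lambda v: v < 10)             # the table is not named twice
totals = df.group_by("group", total=("x", sum))  # one row per group
joined = df.join(labels, on="group")             # only group "a" has a label
```

The point of the sketch is only the calling style: filtering refers to the column by name rather than repeating the table, and joins and aggregation are single method calls that return another table.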

Yike Lu

Posted 2012-04-01T22:56:09.113

Reputation: 111

1+1--An interesting point, thoughtfully explained. Welcome to our community! – whuber – 2012-06-06T13:03:23.307

4To be nit-picky, large Pandas DataFrames actually don't print out all of their contents when invoked, as happens in R. They switch to displaying column headers along with a count of null/non-null values. Also, while I agree the syntax isn't ideal, scoping issues make it hard to eliminate the self-reference for comprehension-style filtering. It is wordier, but it's also resistant to namespace collisions if a DataFrame has extra columns at runtime you didn't expect. – goodside – 2012-10-13T17:57:44.240


I can sign under what Dirk and EpiGrad said; yet there is one more thing that makes R a unique language in its niche: its data-oriented type system.

R was designed specifically for handling data; that's why it is vector-centered and has stuff like data.frames, factors, NAs and attributes.
Julia's types, on the other hand, are oriented toward numerical performance, so we have scalars, well-defined storage modes, unions and structs.

This may look benign, but everyone who has ever tried to do stats with MATLAB knows that it really hurts.

So, at least for me, Julia offers nothing that I cannot fix with a few-line C chunk, and it kills a lot of really useful expressiveness.


Posted 2012-04-01T22:56:09.113

Reputation: 19 511


(+1) Good point. Some further thoughts: The lack of data.frame-like facilities in Python has long bothered me, but now Pandas seems to have resolved this issue. Formulas are among some of the planned extensions of statsmodels (well, we know that sometimes it's better to avoid the formula interface in R). There's a data.frame proposal for Julia (pretty quick compared to Python!), (...)

– chl – 2012-04-02T16:55:27.223

(Cont'd) and Doug Bates has started playing with Julia--as well as Shane, John Myles White, or Vince Buffalo--which certainly reflects the interest of the statistical, ML and bioinformatics communities. So, let's wait and see, as @Dirk said.

– chl – 2012-04-02T16:55:34.463

5I think @mbq also has a point about C. If I need speed on the same order of magnitude as C/C++...I can use C/C++ with R. – Fomite – 2012-04-02T19:14:43.670

4@EpiGrad, yes, you can write C/C++ and interface cleanly with R. But that's a weakness, not a strength of the language. With Julia, end users will never need to write C to get speed. – Harlan – 2012-04-03T14:07:08.290

One of the interesting things about Julia is that ALL of the type system, except for hardware-level blocks of bits and floating-points, is implemented in Julia itself. That means that you could write a parallel type system for "Data" that looks much more R-like, including NA support. – Harlan – 2012-04-03T14:08:55.997

2@Harlan It's only a weakness if you already know both Julia and C. I'd assert time spent in C < time spent learning a new language and reimplementing everything from scratch. – Fomite – 2012-04-03T16:52:38.747

2@EpiGrad, right. There are probably millions of scientists and analysts who know zero C and shouldn't need to learn it to do their jobs quickly. If they need to learn to program in one language, probably poorly, there's a lot to be said for a language that's designed for their use cases rather than for system-level development. – Harlan – 2012-04-03T17:10:01.557

8@Harlan And to be blunt, those people aren't going to be rewriting their stuff in Julia. R as a statistics package, not a programming language is their use case. – Fomite – 2012-04-03T17:22:21.253


The Julia language is pretty new; its time in the spotlight can be measured in weeks (even though its development time can of course be measured in years). Now those weeks in the spotlight were very exciting weeks---see for example the recent talk at Stanford where "it had just started"---but what you ask for in terms of broader infrastructure and package support will take much longer to materialize.

So I'd keep using R, and be mindful of the developing alternatives. Last year a lot of people went gaga over Clojure; this year Julia is the reigning new flavour. We'll see if it sticks.

Dirk Eddelbuettel

Posted 2012-04-01T22:56:09.113

Reputation: 7 282

Julia can't be compared to things like Incanter or Clojure. Julia is a really new paradigm, based on JIT compiling to machine code! It's a completely different game. – kjetil b halvorsen – 2014-06-22T14:24:50.333

@DirkEddelbuettel: Has it stuck? If anything, you as a lead R developer should be able to make a more educated guess now than the rest of us, after almost 3 years of checking its stickiness! – usεr11852 – 2015-03-21T23:50:14.900


I have no idea, but I gladly update the R side: As of today over 6400 packages on CRAN, and now over 350 of those using Rcpp. Still works for me. Julia folks seem active, and happy---and having a choice is a good thing. There is no one language for all problems: sorry, Python.

– Dirk Eddelbuettel – 2015-03-21T23:53:35.203

Thank you for the insight. I figured it might be a little early to tell on Julia. You're probably a bit biased, but do you see Rcpp as a valid alternative in the interim, or is there too much programming knowledge needed to make it worth the hassle? – Christopher Aden – 2012-04-02T00:19:05.343

16Because of what I have seen via Rcpp, I am even more impressed by Julia---about 50, 60, 70 fold increases for simple looping as in MCMC, and several hundred fold for "degenerate" examples like Fibonacci are essentially the same as Rcpp got! But I also know that with Rcpp I still get access to the 3700 CRAN packages---as well as countless C++ libraries---whereas Julia right now has almost nothing. That said, the promise of Julia is huge. But maybe there is a "then" as well as a "now". Time will tell. – Dirk Eddelbuettel – 2012-04-02T00:25:15.890

2And don't forget Incanter, which is supposed to become a statistical environment based on Clojure. How is Julia superior to that? – Wayne – 2012-04-03T15:35:18.540

2@Wayne, let's not muddy the waters here. Open a new question for that (perhaps one that asks for comparison between multiple languages) – naught101 – 2012-04-03T22:01:17.450

2@naught101: I'm simply echoing Dirk's point that Clojure was flavor of the month, then specifically Incanter, now Julia. I don't think that Julia or Incanter (or Clojure) stand a chance of being generalized statistical platforms. – Wayne – 2012-04-04T00:40:40.153


I can see Julia replacing Matlab, which would be a huge service for humanity.

To replace R, you'd need to consider all of the things that Neil G, Harlan, and others have mentioned, plus one big factor that I don't believe has been addressed: easy installation of the application and its libraries.

Right now, you can download a binary of R for Mac, Windows, or Linux. It works out of the box with a large selection of statistical methods. If you want to download a package, it's a simple command or mouse click. It just works.

I went to download Julia and it's not simple. Even if you download the binary, you have to have gfortran installed in order to get the proper libraries. I downloaded the source and tried to make and it failed with no really useful message. I have an undergraduate and a graduate degree in computer science, so I could poke around and get it to work if I was so inclined. (I'm not.) Will Joe Statistician do that?

R not only has a huge selection of packages, it has a fairly sophisticated system that makes binaries of the application and almost all packages, automatically. If, for some reason, you need to compile a package from source, that's not really any more difficult (as long as you have an appropriate compiler, etc, installed on your system). You can't ignore this infrastructure, do everything via github, and expect wide adoption.

EDIT: I wanted to fool around with Julia -- it looks exciting. Two problems:

1) When I tried installing additional packages (forget what they're called in Julia), it failed with obscure errors. Evidently my Mac doesn't have a make-like tool that they expected. Not only does it fail, but it leaves stuff lying around that I have to manually delete or other installs will fail.

2) They force certain spacing in a line of code. I don't have the details in front of me, but it has to do with macros and not having a space between the macro and the parenthesis opening its arguments. That kind of restriction really bugs me, since I've developed my code formatting over many years and languages and I do actually put a space between a function/macro name and the opening parenthesis. Some code formatting restrictions I understand, but whitespace within a line?


Posted 2012-04-01T22:56:09.113

Reputation: 14 651

9The day an open-source language replaces MATLAB will be the best day for the engineering world. – Royi – 2013-08-02T13:07:38.143

8"I can see Julia replacing Matlab, which would be a huge service for humanity." I couldn't agree more. – davidav – 2013-10-03T11:16:37.853

5Julia's still VERY much in its infancy. I'm no historian, but I'd bet that clean binaries of R didn't come out in the first few months, either. Your point about the distribution system is something I haven't seen mentioned much thus far. Then again, I would also wager that CRAN did not sprout up the same time as R. A "CJAN" would definitely be nice for large-scale adoption. – Christopher Aden – 2012-04-03T18:57:39.727

7You might be interested then to know, @Christopher, that R is really an independently developed clone of a package (S, then S-Plus) that had been a (mild) commercial success and was under development ten years previously. That gave it a significant head start that Julia (and most other such efforts) never have. – whuber – 2012-04-03T19:19:21.743

3@ChristopherAden: I agree that Julia is yet young. But I would strenuously disagree that "a 'CJAN' would definitely be nice for large-scale adoption": it's an absolute necessity. The only tools I can think of that don't have a CRAN-like infrastructure are highly specialized -- like JAGS. But Julia, like R, is general purpose. – Wayne – 2012-04-04T00:46:17.167


Bruce Tate here, author of Seven Languages in Seven Weeks. Here are a few thoughts. I am working on Julia for the followup book. The following is just my opinion after a few weeks of play.

There are two fundamental forces at play. First, all languages have a lifespan. R will be replaced some day. We don't know when. Second, new languages have an extremely difficult time evolving. When a new language does evolve, it usually solves some overwhelming pain point.

These two things are related. To me, we're starting to see a theme taking shape around languages like R. It's not fast enough, and it's harder than it needs to be. Those who can live within a certain performance envelope and stay within established libraries are fine. Those who can't need more, and they're starting to look for more.

The thing is, computer architectures are changing, and to take advantage of them, the language and its constructs need to be constructed in a certain way. Julia's take on concurrency is interesting. It optimizes the right thing for such a language: transparent distribution and the efficient movement of data between processes. When I use Julia for typical tasks, maps and transforms and the like, I am just calling functions. I don't have to worry about the plumbing.

To me, the fact that Julia is faster on one processor is interesting, but not overly damning for R. The thing that is interesting to me is that as processors depend more and more on multicore for performance, technical computing problems are just about ideally positioned to take the best possible advantage, given the right language.

The other feature that will help that happen is indeed macros. The pace of the language is just intense right now. Macros let you build with bigger, cleaner building blocks. Looking at libraries is interesting but doesn't tell the whole picture. You need to look at the growth of libraries. Julia's trajectory is pretty much spot on here.

Clojure is interesting to some because there's no technical language that does what R can, so some look to a general purpose language to fill that void. I am actually a huge fan. But Clojure is a pretty serious brain warp. Clojure will be there for programmers who need to do technical computing. It won't be for engineers and scientists. There's just too much to learn.

So to me, Julia or something like it will absolutely replace R some day. It's a matter of time.


Posted 2012-04-01T22:56:09.113

Reputation: 1

There aren't many new languages that provide both templated types and a first class lisp-derived macro ecosystem - Julia does. This capability, along with its concurrency features and speed (that will likely improve in future versions), gives it a strong competitive position against other languages, in my view. I rarely use R but frequently use C++ (w/templates) and Lisp (w/macros). Julia can do both, cleanly and efficiently in a single, clear language. I am convinced that Julia will prove to be a major language in the future. – AsymLabs – 2018-01-10T21:51:08.743


Every time I see a new language, I ask myself why an existing language can't be improved instead.

Python's big advantages are

  • a rich set of modules (not just statistics, but plotting libraries, output to pdf, etc.)
  • language constructs that you end up needing in the long run (object-oriented constructs you need in a big project; decorators, closures, etc. that simplify development)
  • many tutorials and a large support community
  • access to mapreduce, if you have a lot of data to process and don't mind paying a few pennies to run it on a cluster.

In order to overtake R, Julia, etc., Python could use

  • development of just-in-time compilation for restricted Python to give you more speed on a single machine (but mapreduce is still better if you can stand the latency)
  • a richer statistical library

Neil G

Posted 2012-04-01T22:56:09.113

Reputation: 8 477

3This may be true, but for a very-casual user, Python's language design may be a little harder to use than something like Matlab, or Julia, which has an even more math-like syntax. You can say y = 3x+2 in Julia and it works! – Harlan – 2012-04-03T16:20:58.443

@Harlan: How many statisticians are "very-casual users"? – Neil G – 2012-04-03T16:28:49.750

6That's funny: when I first saw Python some 10+ years ago I had exactly the same reaction (why is this needed? Why not just improve what's out there already? Why learn a whole new set of bizarre syntactic quirks, names of classes, methods, and procedures, and all the rest?). :-) – whuber – 2012-04-03T16:43:03.753

2@NeilG Not professional statisticians so much as non-programmer researchers in especially the sciences. Python's great for programmers, but if all you want to do is load your psychology data and fit some models (quickly), a very simple math-like syntax might be preferable to Python's elegant object-based design. – Harlan – 2012-04-03T17:05:53.197

@Harlan - I doubt these users would need Julia, because R fits that bill perfectly. – Owe Jessen – 2012-04-03T19:53:39.010

3@NeilG Keep in mind part of the success of R is that it's not just used by statisticians. It's used by people who do statistics. And social scientists, clinicians and first-year science graduate students are absolutely very casual users. – Fomite – 2012-04-04T17:07:18.780

@OweJessen Per my answer, they actually tend to follow where the statisticians go, because that means a preexisting code base. If the stats people all move to Julia, unless they start doing joint development with R, the users will move to where the people who make the code they need are. – Fomite – 2012-04-04T17:08:42.163

@EpiGrad: You are certainly right there, but the existing code is already in R, so Python and Julia have to either replicate the gigantic amount of work done for R (reinvent the wheel), or produce large benefits in places where R is lacking, which to me sounds like niche applications with regard to the casual useR (really casual users will continue to suffer Excel). Are there any ideas how Julia ties in with Ross Ihaka's sentiment to just start over?

– Owe Jessen – 2012-04-04T18:19:03.937

@OweJessen: Well, Python and R are open source, so the code can be ported without reinventing anything. There are significant advantages to Python as soon as you are doing anything complicated, which is partially driving the development of numpy, pandas, scipy, etc. – Neil G – 2012-04-04T18:29:45.843


I think (CrossValidated member) John D Cook's blog post is spot on: I'd much rather program math in a general purpose language than try to code math and systems problems in a math language. If the Julia community can keep this in mind, there is a good chance the language will stick for analytic programming in general (stats being only one part of that).

– Josh Hemann – 2012-04-06T04:35:14.737


I think it's unlikely that Julia will ever replace R, for a lot of the reasons previously mentioned. Julia is a Matlab replacement, not an R replacement; they have different goals. Even after Julia has a fully fleshed-out statistics library, no one would ever teach an Intro to Statistics class in it.

However, an area in which it could be incredible is as a speed-optimized programming language that's less painful than C/C++. If it were seamlessly linked to R (in the style of Rcpp), then it would see a ton of use in writing speed-critical segments of code. Unfortunately, no such link exists currently.

Ari B. Friedman

Posted 2012-04-01T22:56:09.113

Reputation: 2 010

But now there is one: didn't try it (yet).

– kjetil b halvorsen – 2014-06-22T14:34:45.050


I am a Julia newbie, and am R competent. The reasons I find Julia interesting so far are performance and compatibility oriented.

GPU tools. I'd like to use CUSPARSE for a statistical application. CRAN results indicate there's not much out there. Julia has bindings available which seem to work smoothly so far.

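using CUSPARSE    #assumed: the CUSPARSE.jl bindings named in the calls below

N = 1000
M = 1000
hA = sprand(N, M, .01)    #random sparse matrix on the host
hA = hA' * hA    #make it symmetric
dA = CudaSparseMatrixCSR(hA)    #copy to the device in CSR format
dC = CUSPARSE.csric02(dA, 'O') #incomplete Cholesky decomp
hC = CUSPARSE.to_host(dC)    #copy the factor back to the host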

HPC tools. One can use a cluster interactively with multiple compute nodes.

using ClusterManagers    #provides SlurmManager

nnodes = 2
ncores = 12    #ask for all cores on the nodes we control
procs = addprocs(SlurmManager(nnodes*ncores), partition="tesla", nodes=nnodes)
for worker in procs
    println(remotecall_fetch(readall, worker, `hostname`))
end

Python compatibility. There's access to the Python ecosystem. E.g., it was straightforward to find out how to read brain imaging data:

using PyCall    #`using` (not `import`) brings the @pyimport macro into scope
@pyimport nibabel

fp = "foo_BOLD.nii.gz"
res = nibabel.load(fp)
data = res[:get_data]();

C compatibility. The following generates a random integer using the C standard library.

ccall( (:rand, "libc"), Int32, ())

Speed. Thought I would see how the Distributions.jl package performed against R's rnorm - which I assume is optimised.

julia> using Distributions

julia> F = Normal(3,1)
Distributions.Normal(μ=3.0, σ=1.0)

julia> @elapsed rand(F, 1000000)

In R:

> system.time(rnorm(1000000, mean=3, sd=1))
   user  system elapsed 
  0.262   0.003   0.266 


Posted 2012-04-01T22:56:09.113

Reputation: 2 657

1@NickCox, as there are more than a dozen answers already, I thought it may be interesting to highlight an alternate angle. Also, I posted an early draft accidentally :) – conjectures – 2015-10-12T17:05:05.033

1The question was why Julia might stick in the statistical community, my answer centres on apparently good support for hpc + gpu, which many people with compute intensive work may find interesting. – conjectures – 2015-10-12T17:28:33.510


Julia will not take over R very soon. Check out Microsoft R Open.

This is an enhanced version of R that automatically uses all the cores of your computer. It is the same R, same language, same packages. When you install it, RStudio will also use it in the console. In my experience, MRO is even faster than Julia. I do a lot of heavy-duty computing and have used Julia for more than a year. I switched to R recently because R has better support and RStudio is an awesome editor. Julia is still at an early stage and will probably not catch up with Python or R very soon.

Milton Mai

Posted 2012-04-01T22:56:09.113

Reputation: 1


The following probably does not deserve to be an answer, but it is too important to be buried as a comment to someone else's response...

I have not heard much said about memory consumption, just speed. R's entire semantics being pass-by-value can be painful, and this has been one criticism of the language (which is a separate issue from how many great packages already exist). Good memory management is important, as is having ways of dealing with out-of-core processing (e.g. numpy's memory-mapped arrays or PyTables, or Revolution Analytics' xdf format). While PyPy's JIT compiler allows for some striking Python benchmarks, memory consumption can be quite high. So, does anyone have experience with Julia and memory usage yet? Sounds like there are memory leaks on the Windows "alpha" version that will no doubt be addressed, and I am still waiting on access to a Linux box to play with the language myself.
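To make the out-of-core idea concrete, here is a minimal sketch of memory-mapped access using only the Python standard library (not numpy's own memmap interface). The helper names `write_doubles` and `mapped_sum` are hypothetical, and the file is assumed to hold raw native 8-byte doubles; the operating system pages data in on demand, so only one chunk is materialised as Python objects at a time.

```python
import mmap
import struct
from array import array

ITEM_SIZE = struct.calcsize("d")  # bytes per native double


def write_doubles(path, values):
    """Write an iterable of floats to `path` as raw native doubles."""
    with open(path, "wb") as f:
        array("d", values).tofile(f)


def mapped_sum(path, chunk_items=4096):
    """Sum the doubles stored in `path`, decoding one chunk at a time.

    Assumes a non-empty file whose size is a multiple of ITEM_SIZE.
    """
    total = 0.0
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are loaded lazily by the OS.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            n_items = len(mm) // ITEM_SIZE
            for start in range(0, n_items, chunk_items):
                count = min(chunk_items, n_items - start)
                # Decode only this chunk of the mapping into Python floats.
                chunk = struct.unpack_from(f"{count}d", mm, start * ITEM_SIZE)
                total += sum(chunk)
    return total
```

Calling `write_doubles(path, data)` once and then `mapped_sum(path)` keeps peak memory roughly proportional to `chunk_items` rather than to the file size, which is the property the out-of-core tools mentioned above are after.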

Josh Hemann

Posted 2012-04-01T22:56:09.113

Reputation: 2 686

1And R isn't really strictly pass-by-value. Lazy evaluation and some clever optimization means that often data isn't copied unless it has to be. – Ari B. Friedman – 2013-05-07T18:42:18.170

True, but there are ways to use pass-by-reference in R (Reference Classes, for one). – Ari B. Friedman – 2012-05-15T13:46:04.627


I am interested in the promise of better speed and easy parallelisation using different architectures. For that reason I will certainly watch Julia development, but I am unlikely to use it until it can handle generalised linear mixed models and has a good generic bootstrap package, a simple model language for building design matrices, capability equivalent to ggplot2, and a wide range of machine learning algorithms.

No statistician can afford to have a fundamentalist attitude to the choice of tools. We will use whatever enables us to get the job done most efficiently. My guess is I will be sticking with R for a few years yet, but it would be nice to be pleasantly surprised.

Mervyn thomas

Posted 2012-04-01T22:56:09.113

Reputation: 1

Hi Mervyn, and welcome to Stats.SE! Julia has made some substantial improvements in the time since I created this post (almost a year ago!). Douglas Bates ported some of his GLM (maybe GLMM?) code to Julia, and the main GitHub page has seen many updates in the past year. My take on Julia thus far (I've used it on and off since last year) has been that it's a nice tool for speed, which I use for some crude MCMC, but it hasn't replaced R in my toolchain yet. Can't wait for either R to get faster, or Julia to be more widespread!

– Christopher Aden – 2013-04-15T22:05:23.350

Doug hasn't ported GLMMs yet. If someone wants to help with that I'm sure he would be happy ... – Ben Bolker – 2014-01-25T17:24:06.047


I will be up front: I have no experience with R, but I work with plenty of people who think it is an excellent tool for statistical analysis. My background is in data warehousing, and because Julia offers an easily distributed but more standard programming model, I think it could be a very interesting substitute for the transform portion of traditional ETL tools, which generally do the job very poorly. Most have no way of easily creating a standardized transform, or of re-using the results of a transform already performed on a prior data set. The support for tightly defined and typed tuples stands out. If I want to build an OLAP cube that needs to build more detailed tuples (fact tables) out of tuples already calculated, today's ETL tools have no 'building blocks' to speak of that can help. The industry has worked around this issue through various means in the past, but there are trade-offs. Traditional programming languages can help by providing centrally defined transformations, and Julia could potentially simplify the non-standard aggregations and distributions common in more complex data warehouse systems.


Posted 2012-04-01T22:56:09.113

Reputation: 1


You can also use Julia and R together. There is a Julia-to-R interface. With this package you can play with Julia while calling R whenever it has a library you need.


Posted 2012-04-01T22:56:09.113

Reputation: 207


The luxury of NA's in R does not come without performance penalties. If Julia supports NA's with a smaller performance penalty then it becomes interesting to a segment of the stats community, but NA's also impose considerable extra work when using compiled code with R.

Many of the packages in R rely on routines written in legacy languages (C, Fortran, or C++). In some cases the compiled routines were developed outside R and later used as the basis for R library packages. In others the routines were first implemented in R and then critical segments translated to a compiled language when performance was found lacking. Julia will be attractive if it can be used to implement equivalent routines. There is an opportunity to design low-level support for NA's in a way that simplifies NA handling over what we have now when using R with compiled code.

The massive number of R libraries represents the efforts of many, many users. This was possible because R provided capabilities that weren't otherwise available/affordable. If Julia is to become widely used, it needs a group of users who find it does what they need so much better than the alternatives that it is worth the effort needed to supply very basic things (e.g., graphics, date classes, NA's, etc.) available from existing languages.

George N. White III

Posted 2012-04-01T22:56:09.113

Reputation: 1


Julia has, without doubt, every chance of becoming a statistics power user's dream come true. Take SAS, for example: its power lies in the numerous procs written in C. What Julia can do is give you the procs with the source code, with matrices as a built-in data type, dispensing with SAS/IML. I have no doubt that statisticians will flock to Julia once they get a handle on just what this puppy can do.

Jimbo He

Posted 2012-04-01T22:56:09.113

Reputation: 1

1Welcome to Stats.SE, Jimbo. I disagree with your assertion. I think we've seen what Julia is able to do, but the problem at this point is that there aren't nearly as many domain-specific packages for it as there are in R. R will continue to reign supreme in open source statistics as long as researchers see more benefit to using the numerous packages in the R universe. That's my take, at least. – Christopher Aden – 2013-05-06T16:07:38.483


Oh yes, Julia will overtake R quite quickly. The primary reasons will be macros, the fact that 95% of the language is implemented in Julia itself, and its noise-free, parsimonious syntax. If you don't have experience with Lisp-type languages you might not understand it yet, but you will see pretty quickly how R's formula interface will become an obsolete and ugly mechanism, and will be replaced by specialized modeling micro-languages akin to the CL loop macro. Access to low-level references of an object is also a big plus. I think R still hasn't understood that hiding internals from the user actually complicates rather than simplifies things.

As I see it now (having years of heavy R use behind me, and having just finished reading the Julia manual), Julia's main drawback with respect to R is the lack of support for structural inheritance (this was intentional). Julia's type system is less ambitious than S4; it also supports multiple dispatch and multiple inheritance, but with a catch - there is only one level of concrete classes. On the other hand, I rarely see class hierarchies in R deeper than 3 levels.

Time will tell, but it will be sooner than most R users think:)


Posted 2012-04-01T22:56:09.113

Reputation: 167

If Julia really becomes a MATLAB replacement, then there will be huge benefits in using the same language for engineering and statistics! The overlapping areas (such as time series) are huge. – kjetil b halvorsen – 2014-06-22T14:44:35.883

2You make a good point about macros: decades later people still underestimate how powerful Lisp really is. However, as you imply in point #1, this language is essentially a Matlab replacement, not an R replacement. I think you also ignore the fact that it is language plus libraries (packages) that people use and Julia doesn't even have 1% of what it needs there. – Wayne – 2012-04-06T14:32:14.463

2@Wayne, I don't ignore anything, the OP was about the future and not about what is now. In 5 years, we might see many more libraries for stats in Julia than there are now for R. And this, just because Julia has a good chance to be a much better language. – VitoshKa – 2012-04-12T18:44:55.037


Julia's first target use cases are numerical problems. Basically, you can break these analysis and computational science fields into data science (data driven) and simulation science (model driven). Julia is dealing with the simulation science use cases first. They are also dealing with the data science cases, but more slowly. R will never be very useful for simulation science, but Julia will be very useful for both in a couple of years.

Jamie Lawson

Posted 2012-04-01T22:56:09.113

Reputation: 1


It needs to be able to apply any function to large datasets that don't fit in memory, transparently to the user.
That includes at least running mixed effects models, survival models or MCMC on datasets that fit on disk but not in memory, and, if possible, on datasets distributed across several computers.
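As a rough illustration of what "transparent" out-of-memory computation has to do under the hood, here is a minimal Python sketch for the easiest case only: a mean and variance accumulated chunk by chunk, with a merge step so that partial results from several machines can be combined. The names `RunningStats` and `chunked` are hypothetical, and this says nothing about the much harder cases listed above (mixed effects models, survival models, MCMC).

```python
from dataclasses import dataclass
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


@dataclass
class RunningStats:
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, chunk):
        """Fold one chunk of values into the summary (Welford's update)."""
        for x in chunk:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)

    def merge(self, other):
        """Combine two summaries, e.g. computed on different machines."""
        if self.n == 0:
            return RunningStats(other.n, other.mean, other.m2)
        if other.n == 0:
            return RunningStats(self.n, self.mean, self.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    @property
    def variance(self):
        """Sample variance (n - 1 denominator); NaN for fewer than 2 points."""
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")
```

In use, each chunk read from disk goes through `update`, and summaries built on different workers are joined with `merge`; only the three numbers in `RunningStats` ever need to travel between machines. Models that need repeated passes over the data or the full likelihood do not decompose this neatly, which is why the requirement above is a demanding one.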


Posted 2012-04-01T22:56:09.113

Reputation: 344