Faster alternatives for DayOfWeek

26

10

It has been noticed on several occasions that the DayOfWeek function is rather slow when applied to a large list of dates, e.g. in this recent question. What faster alternatives do we have in such situations?

Leonid Shifrin

Posted 2012-06-20T19:32:56.347

Reputation: 108 027

7Here comes the self-answer :) – Rojo – 2012-06-20T19:39:10.457

1

Related http://en.wikipedia.org/wiki/Determination_of_the_day_of_the_week

– Dr. belisarius – 2012-06-20T19:47:46.247

Answers

25

Just a literal implementation of a formula for the day of the week:

Clear[dow];
dow[{year_, month_, day_, _ : 0, _ : 0, _ : 0}] :=
  Module[{Y = If[month == 1 || month == 2, year - 1, year], 
    m = Mod[month + 9, 12] + 1, y, c, 
    s = {Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday}},
    y = Mod[Y, 100];
    c = Quotient[Y, 100];
    s[[Mod[day + Floor[2.6 m - 0.2] + y + Quotient[y, 4] + Quotient[c, 4] - 2 c, 7] + 1]]];

Seems to give a 5-fold speed increase:

d = RandomDates[100000];
DayOfWeek /@ d // Short // AbsoluteTiming
dow /@ d // Short // AbsoluteTiming

{19.5781250,{Thursday,Thursday,Sunday,Friday,<<99992>>,Tuesday,Saturday,Saturday,Thursday}}

{3.7968750,{Thursday,Thursday,Sunday,Friday,<<99992>>,Tuesday,Saturday,Saturday,Thursday}}
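The month term Floor[2.6 m - 0.2] can also be computed in exact integer arithmetic, since it equals Quotient[13 m - 1, 5] for m = 1, ..., 12. Below is a minimal sketch of that variant. It is not part of the original answer, the name dowInt is made up for illustration, and it folds in the QuotientRemainder suggestion from the comments.

```mathematica
Clear[dowInt];
(* dowInt is a hypothetical name; same formula as dow, but without machine reals *)
dowInt[{year_Integer, month_Integer, day_Integer, ___}] :=
  Module[{Y, m, c, y},
   (* January and February are treated as months 11 and 12 of the previous year *)
   Y = If[month <= 2, year - 1, year];
   m = Mod[month + 9, 12] + 1;
   (* century and year-within-century in one call *)
   {c, y} = QuotientRemainder[Y, 100];
   {Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday}[[
     Mod[day + Quotient[13 m - 1, 5] + y + Quotient[y, 4] + Quotient[c, 4] - 2 c, 7] + 1]]]

dowInt[{2012, 6, 20}]
(* Wednesday *)

dowInt[{2000, 1, 1, 12, 30, 0}]
(* Saturday *)
```

Whether this is measurably faster than the Floor version has not been timed here; the point is only that the formula needs no floating-point arithmetic.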

Addition

Your function is readily compilable:

dowc = Compile[{{year, _Integer}, {month, _Integer}, {day, _Integer}},
    Module[
        {Y, m, y, c},
        (* January and February count as months 11 and 12 of the previous year *)
        Y = If[month == 1 || month == 2, year - 1, year];
        m = Mod[month + 9, 12] + 1;
        y = Mod[Y, 100]; c = Quotient[Y, 100];
        (* returns an integer code: 1 = Sunday, ..., 7 = Saturday *)
        Mod[day + Floor[2.6 m - 0.2] + y + Quotient[y, 4] + Quotient[c, 4] - 2 c, 7] + 1
    ],
    CompilationTarget -> "C",
    RuntimeAttributes -> {Listable},
    Parallelization -> True
];

In[286]:= dowc @@@ d[[All, 1 ;; 3]] // Short // AbsoluteTiming
Out[286]= {0.136741,{6,4,5,2,4,5,3,7,<<99984>>,5,4,2,4,3,2,5,6}}
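The compiled function returns integer codes rather than day names. Assuming the convention implied by the formula (1 = Sunday, ..., 7 = Saturday), the codes can be turned back into names with a single Part lookup. The sketch below uses the first four codes from the output above; dayNames and codes are illustrative names.

```mathematica
dayNames = {Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday};

(* first four codes from the output shown above *)
codes = {6, 4, 5, 2};

(* Part accepts a list of indices, so no explicit loop is needed *)
dayNames[[codes]]
(* {Friday, Wednesday, Thursday, Monday} *)

(* With dowc and d defined as above, the Listable attribute should also allow
   passing whole columns at once instead of using @@@:
   dayNames[[dowc[d[[All, 1]], d[[All, 2]], d[[All, 3]]]]] *)
```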

wxffles

Posted 2012-06-20T19:32:56.347

Reputation: 13 711

5+1. It is then really a shame that the Calendar` package function can be beaten several times by a completely top-level implementation. – Leonid Shifrin – 2012-06-20T22:15:46.530

1Floor[c/4] is better written as Quotient[c, 4], I'd say. – J. M.'s ennui – 2012-06-20T23:17:15.500

1Also, you could compact things a bit with {c, y} = QuotientRemainder[Y, 100] – J. M.'s ennui – 2012-06-20T23:19:44.250

1I see you guys always use AbsoluteTiming, which gives you the elapsed real world time. Isn't Timing more appropriate, since it only seems to count the time spent by the kernel? – stevenvh – 2012-10-10T18:52:26.500

22

I will provide a solution that uses Java together with a simple Java reloader I recently introduced. It gives up to a 100-fold speed-up for large lists of dates.

Preparation

I will borrow @Mike's functions to generate a random list of dates, from his code in his recent question

RandomDateList[] := {
   RandomInteger[{1800, 2100}], 
   RandomInteger[{1, 12}], RandomInteger[{1, 28}], 
   RandomInteger[{0, 23}], RandomInteger[{0, 59}], 
   RandomInteger[{0, 59}]
};

RandomDates[n_] := Table[RandomDateList[], {n}]

Implementation

  1. Load the Java reloader

  2. Compile the following class:

    JCompileLoad@
      "import java.util.*;
       public class DayOfWeekCalculator {
           public static int[] getDaysOfWeek(int[][] dateDataList){
              Calendar calendar = new GregorianCalendar();
              int[] result = new int[dateDataList.length];
              int ctr = 0;
              for(int[] date: dateDataList){                        
                 calendar.set(date[0],date[1],date[2]);
                 result[ctr++]=calendar.get(Calendar.DAY_OF_WEEK);
              }
              return result;    
           }    
       }"
    
  3. The actual function is then:

    Clear[dayOfWeek];
    dayOfWeek[dates_List] :=
       DayOfWeekCalculator`getDaysOfWeek@Transpose@
           {#[[All, 1]], #[[All, 2]] - 1, #[[All, 3]]} &@dates;
    

The input is a nested list of the type we construct randomly, which is a natural date format as it appears in Mathematica. I subtract 1 from month, to comply with the Java conventions.
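To make the month shift concrete, here is a small sketch on a two-element sample list (sample is an illustrative name). It shows what is actually sent to Java: only the first three fields of each date, with the month moved to Java's 0-based convention.

```mathematica
sample = {{2012, 6, 20, 12, 0, 0}, {1999, 12, 31, 23, 59, 59}};

(* the same column-wise transformation used inside dayOfWeek *)
Transpose@{#[[All, 1]], #[[All, 2]] - 1, #[[All, 3]]} &@sample
(* {{2012, 5, 20}, {1999, 11, 31}} *)

(* an equivalent row-wise formulation *)
(# - {0, 1, 0}) & /@ sample[[All, 1 ;; 3]]
(* {{2012, 5, 20}, {1999, 11, 31}} *)
```

Note that Java's Calendar.DAY_OF_WEEK codes run from 1 (Sunday) to 7 (Saturday), so the integers returned by dayOfWeek follow that convention.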

Use and benchmarks

d=RandomDates[100000];

dayOfWeek[d]//Short//AbsoluteTiming

(*
   {0.1259765,{6,6,1,3,6,6,3,5,3,2,2,4,4,5,6,3,4,2,5,6,7,2,4,
     <<99954>>,2,2,3,1,1,6,5,7,6,7,5,1,6,3,7,4,6,4,5,7,4,1,3}}
*)

DayOfWeek/@d//Short//AbsoluteTiming

(*
    {14.0732422,{Friday,Friday,Sunday,Tuesday,Friday, 
     <<99990>>,Thursday,Saturday,Wednesday,Sunday,Tuesday}}
*)

There is a roughly 100-fold speedup for this example. Note that there is a small constant overhead of calling Java, so the larger your list of dates, the more you gain.

Remarks

I think that this can be one of the "canonical" examples of a situation where the use of Java is more than appropriate. Generally, this happens when some of the following is true:

  • You have a large collection of Mathematica objects, which you want to process somehow.
  • The top-level overhead of explicit looping is (very) large, but the problem is not easily amenable to Compile
  • The functionality you seek is readily available via Java libraries, or can be easily implemented using those.

Effective use of Java / JLink implies that loops are outsourced to Java. Only then will the overhead of Java / JLink not play a big role. Looping in Mathematica while invoking Java functions is unlikely to be faster, and will often be slower, than doing it all in Mathematica.

A big thanks goes to @Mike for spotting a bug in the reloader (which has been now fixed).

Leonid Shifrin

Posted 2012-06-20T19:32:56.347

Reputation: 108 027

item 3, indeed, it is, see my own answer. – Andreas Lauschke – 2012-07-06T00:38:44.580

12

I've shown off Larsen's method before (and see this as well), but here it is as a formal answer:

larsen[{yr_Integer, mo_Integer, da_Integer, ___}] := Module[{y = yr, m = mo, d = da, q},
  If[m < 3, y--; m += 12];
  q = d + 2 m + 1 + Quotient[3 (m + 1), 5] + y + Quotient[y, 4] +
      Quotient[y, 400] - Quotient[y, 100];
  {Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday}[[Mod[q, 7] + 1]]]

This assumes the Gregorian system, so it will need some modification if you work with dates older than the switching date of September 14, 1752 (before which the Julian system was still in use).


Here's how to adapt larsen[] for both Julian and Gregorian systems:

Options[larsen] = {"Calendar" -> "Gregorian"};

larsen[{yr_Integer, mo_Integer, da_Integer, ___}, OptionsPattern[]] :=
Module[{y = yr, m = mo, d = da, q, f},
       If[m < 3, y--; m += 12];
       f = Switch[OptionValue["Calendar"],
                  "Gregorian", Quotient[y, 400] - Quotient[y, 100],
                  "Julian", 5,
                  _, Return[]];
       q = d + 2 m + 1 + Quotient[3 (m + 1), 5] + y + Quotient[y, 4] + f;
       {Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday}[[Mod[q, 7] + 1]]]
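As a quick usage check, assuming the definitions of larsen above have been evaluated, the two sides of the 1752 switch can be compared. In the British calendar, Wednesday, September 2, 1752 (Julian) was followed directly by Thursday, September 14, 1752 (Gregorian), and the function reproduces that:

```mathematica
(* last Julian date before the British switch *)
larsen[{1752, 9, 2}, "Calendar" -> "Julian"]
(* Wednesday *)

(* first Gregorian date after the switch; the default option applies *)
larsen[{1752, 9, 14}]
(* Thursday *)

(* a modern date, extra time fields are ignored *)
larsen[{2012, 6, 20, 19, 32, 56}]
(* Wednesday *)
```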

J. M.'s ennui

Posted 2012-06-20T19:32:56.347

Reputation: 115 520

3

That switching date is valid only for weird island dwellers (and their sphere of influence) who always want to do things the unusual and inconvenient way :P

– Szabolcs – 2012-06-21T15:15:02.687

10

This recent post reminded me that AbsoluteTime is a fast kernel function.

Using the RandomDates function from Leonid's post:

dates = RandomDates[500000];

Needs["Calendar`"]

rls = Thread[
       Range[0, 6] -> 
        {Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday}
      ];

Timing[result1 = DayOfWeek /@ dates;]

Timing[result2 = Mod[Quotient[AbsoluteTime /@ dates, 60^2 24], 7] /. rls;]

result1 === result2
{37.783, Null}

{0.921, Null}

True

~ 41X speed-up.
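The reason the replacement rules start at Monday is that AbsoluteTime counts seconds from the beginning of January 1, 1900, which was a Monday. A minimal sketch of the idea for a single date follows; secondsPerDay and dayIndex are illustrative names, not part of the answer.

```mathematica
secondsPerDay = 24*60*60;

(* whole days elapsed since January 1, 1900, reduced modulo 7 *)
dayIndex[date_List] := Mod[Quotient[AbsoluteTime[date], secondsPerDay], 7]

AbsoluteTime[{1900, 1, 1}]
(* 0 *)

dayIndex[{1900, 1, 1}]
(* 0, i.e. Monday *)

dayIndex[{2012, 6, 20}]
(* 2, i.e. Wednesday *)
```

This only works for dates the kernel's calendar arithmetic handles directly, so the Julian/Gregorian caveats discussed in the other answers still apply for old dates.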

Mr.Wizard

Posted 2012-06-20T19:32:56.347

Reputation: 259 163

Very cool, pretty fast indeed, +1. – Leonid Shifrin – 2012-06-22T21:54:12.287

6

Needs["JLink`"];
AddToClassPath[ToFileName[{$HomeDirectory,"javafiles","joda-time-2.1"}]];
JavaNew["org.joda.time.DateTime",2012,4,17,0,0]@dayOfWeek[]@getAsText[]

Super-fast. You need the Joda Time library for that.

If you're a hardcore JLink user, you have the first two lines in your init.m anyway, so the problem reduces to 71 characters, with an amazing speed.

Joda Time is ISO 8601-compliant.

Andreas Lauschke

Posted 2012-06-20T19:32:56.347

Reputation: 3 600

3+1. I thought about using Joda library in my answer as well, but did not want to introduce external dependencies. The reason I provided a vectorized (longer) version in my answer is to outsource loops to Java. Without doing this, it would not really matter how fast is individual day of week computation in Java, since the call to Java method via JLink has a constant overhead, which is in fact much larger than this computation time. I avoid this overhead by calling the function only once, and of course also Java loops are very much faster than top-level M loops. – Leonid Shifrin – 2012-07-06T08:25:19.113

@Leonid: Yepp, fully agreed. I like short code and have no qualms about external dependencies. – Andreas Lauschke – 2012-07-06T13:26:36.647

Although my solution is 8 months old, I probably should add that Joda Time was upgraded to 2.2 on March 8. The download link is http://sourceforge.net/projects/joda-time/files/joda-time/2.2/

– Andreas Lauschke – 2013-03-12T18:47:29.557

Oh boy, it's been 8 months already. The time is flying. Can't say I am too happy about that, I could have been more productive... – Leonid Shifrin – 2013-03-12T18:57:24.003

5

I will provide a solution that uses ANSI C and LibraryLink. Needless to say, it is fast. (Platform: Mac OS X, gcc 4.2)

The preparations are the same as in Leonid's answer.

Implementation

dayofweek = "
#include \"WolframLibrary.h\"

DLLEXPORT mint WolframLibrary_getVersion(){
   return WolframLibraryVersion;
}

DLLEXPORT int WolframLibrary_initialize( WolframLibraryData \
  libData) {
return 0;
}

DLLEXPORT void WolframLibrary_uninitialize( WolframLibraryData \
  libData) {
return;
}

#define _LEAP_YEAR(year)  (((year) > 0) && !((year) % 4) && \
    (((year) % 100) || !((year) % 400)))

#define _LEAP_COUNT(year) ((((year) - 1) / 4) - (((year) - 1) / \
    100) + (((year) - 1) / 400))

const int yeardays[2][13] = {
  { -1, 30, 58, 89, 119, 150, 180, 211, 242, 272, 303, 333, 364 },
  { -1, 30, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365 }
};

const int monthdays[2][13] = {
  { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 },
  { 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 }
};

int weekday(int year, int month, int day)
{
  int ydays, mdays, base_dow;
  /* Correct out of range months by shifting them into range (in the same year) */
  month = (month < 1) ? 1 : month;
  month = (month > 12) ? 12 : month;
  mdays = monthdays[_LEAP_YEAR(year)][month - 1];
  /* Correct out of range days by shifting them into range (in the same month) */
  day = (day < 1) ? 1 : day;
  day = (day > mdays) ? mdays : day;
  /* Find the number of days up to the requested date */
  ydays = yeardays[_LEAP_YEAR(year)][month - 1] + day;
  /* Find the day of the week for January 1st */
  base_dow = (year * 365 + _LEAP_COUNT(year)) % 7;
  return (base_dow + ydays) % 7;
}

DLLEXPORT int dayOfWeek(WolframLibraryData libData,
        mint Argc, MArgument *Args, MArgument Res) {
  mint I0, I1, I2;
  I0 = MArgument_getInteger(Args[0]);
  I1 = MArgument_getInteger(Args[1]);
  I2 = MArgument_getInteger(Args[2]);

  MArgument_setInteger(Res, weekday(I0, I1, I2));
  return LIBRARY_NO_ERROR;
}
";

Create the Library and load it

Needs["CCompilerDriver`"]  (* provides CreateLibrary *)
lib = CreateLibrary[dayofweek, "dayOfWeek", "CompileOptions" -> "-O3 -funroll-loops"];
dow = LibraryFunctionLoad[lib, "dayOfWeek", {Integer, Integer, Integer}, Integer];

Microsoft's compiler (CL) has similar options with just different naming...

The dayOfWeek function

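Clear[dayOfWeek];
(* weekday() in the C code indexes its tables with month - 1, so it expects
   1-based months; unlike the Java version, the month must not be shifted here.
   The result is an integer code with 0 = Sunday, ..., 6 = Saturday. *)
dayOfWeek[dates_List] := dow @@@ dates[[All, 1 ;; 3]]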

Timing

dayOfWeek[RandomDates[100000]] // Short // AbsoluteTiming
{0.067380,{6,5,6,6,3,2,0,0,4,6,4,3,5,3,4,6,6,<<99966>>,...}}

Conclusion

The argument for using Java rests on its simple interface. I think I've shown that the same holds for C/C++ with LibraryLink, and that the result is unbeatably fast.

Stefan

Posted 2012-06-20T19:32:56.347

Reputation: 5 207

+1. I should try using Joda library to see if I can beat that, but even if it so happens that pure Java code is somehow faster, I will probably lose anyway due to JLink/MathLink data transfer overhead - not to mention that this overhead is huge for smaller lists, which is not the case for LibraryLink. – Leonid Shifrin – 2013-03-12T18:40:56.263

@LeonidShifrin That could be very interesting. I was astonished how fast your JLink/MathLink implementation was anyways and your JavaReloader is just simply genius. Why do I not come to an idea like that...sigh.... – Stefan – 2013-03-12T18:46:55.787

@Leonid: but it requires programming in C and an external compiler. Java works out of the box. – Andreas Lauschke – 2013-03-12T18:50:20.930

@Andreas Yes, true. Re: Stefan: There are tons of similar things to be done for Mathematica. One reason I like Mathematica as a platform is that so little has been done yet in terms of seamless interops with other languages / tools, and development tools for all this. So, I can learn by making some of these things (if only I had enough time for that ...:-)). By the way, Andreas has made big progress in this area with his JVM tools, they should get more traction. – Leonid Shifrin – 2013-03-12T18:55:13.413

@AndreasLauschke admitted, albeit in a Unix environment C is omnipresent and Compile with Target->"C" does use an external compiler as well.

I don't know. Does Mma ship with a Java compiler out of the box? – Stefan – 2013-03-12T18:55:56.320

@Stefan, fully agreed. A C compiler is omnipresent these days. M does not ship with a Java compiler, but my point is that you don't NEED one when you just call an external library. Joda Time is an external library that is already compiled for you. – Andreas Lauschke – 2013-03-12T18:58:22.760

@Andreas Re: Java compiler - it should be inside. My reloader uses that. The JRE shipped with Mathematica is a bit non-standard, exactly in that it includes the compiler, while standard JREs don't. But I did not check for version 9. – Leonid Shifrin – 2013-03-12T18:59:38.010

@Andreas is the Joda library your JVM tools that Leonid mentioned before? – Stefan – 2013-03-12T19:01:20.760

@Leonid: This may turn into a separate discussion ... the reason I'd ALWAYS speak against any JRE or compiler that ships with M is that it's always out of date. The last Java updates were all security updates (we're now on J7U17), so people should ALWAYS d/l and install the latest and use that. That's what ReinstallJava[] is there for. Any Java that ships with M will ALWAYS be outdated, and security updates are a valid reason to upgrade. – Andreas Lauschke – 2013-03-12T19:02:10.437

@Stefan, no Joda is separate, it's at http://sourceforge.net/projects/joda-time/files/joda-time/2.2/, version 2.2 is brand new, as of March 8. But I mull incorporating it in my product JVMTools if I see enough need to provide better date and time functions.

– Andreas Lauschke – 2013-03-12T19:03:25.070

@Andreas I know a bit about the latest security breaches in Java, but have these an effect on Mathematica especially the environment where Mma is used for? The new language features and concurrency is worth an update from a M/JLink point of view. Am I wrong? – Stefan – 2013-03-12T19:05:49.707

@Stefan: M9 ships already with Java 7. But not with update 17, and the last half dozen or so updates were all an update to Oracle's security baseline as well. We should NEVER be using anything less than Java 7. Security is just one of the reasons, performance is another major reason to only use Java 7. And security-conscious people should NEVER use anything less than the current security baseline, and that is J7U17 at this point. – Andreas Lauschke – 2013-03-12T19:08:47.053

Stefan and @Leonid: there are other reasons I prefer Java over LibraryLink. One is resiliency. If anything goes wrong with the code and it crashes, LibraryLink takes down THE WHOLE KERNEL. It kills your session. MathLink, JLink, and NETLink don't crash the kernel, and thus are more resilient. You can continue working in M then. I don't want to lose my entire session, which also means I can't save my work anymore, just because an external library crashes. It's one of the disadvantages I see with LibraryLink over MathLink (and thus, JLink and NETLink). – Andreas Lauschke – 2013-03-12T19:11:58.853

@AndreasLauschke Yep, all your points make perfect sense to me. – Leonid Shifrin – 2013-03-12T19:15:57.380

@AndreasLauschke I even had a version where I created an executable and a MathLink version instead of a library with LibraryLink, but wanted to squeeze out the last performance.

My opinion on that matter is, if you're really sure what you're doing there is nothing wrong with down to the metal C/C++.

From a computer science point of view I don't like Java much. I like its clean interface but I really dislike some of the engineering decisions made by Gosling's folk. I write native code all day; the only non-native code I write are cshell scripts. I think I'm just used to it... – Stefan – 2013-03-12T19:27:40.903

for me it is the most natural choice. And, if my program crashes then there is a reason for that and it is good that it crashes, because something is really wrong. I don't like forgiving environments when it comes to system development.

But honestly. I accept and appreciate your argumentation. You are absolutely right in your domain, with what you expect within. – Stefan – 2013-03-12T19:32:46.153

2part1: @Stefan, indeed, if the programmer knows what he/she is doing, then native code is not a problem. Hell, we can crash M with kernel commands alone. Regarding Java's legacies, indeed, Java is a pretty bad language by today's standards, but the performance is in the JVM. That's one of the reasons my product JVMTools supports Scala, because Scala is the "professional Java" nowadays, and also runs on the JVM. Performance-wise, there is no difference between Java-compiled and Scala-compiled code. If you want the speed of the JVM but a professional language for it, look no further than Scala. – Andreas Lauschke – 2013-03-12T19:35:28.680

part2: @Stefan: the Scala programmers sometimes call Scala "Java without the noise and ceremony", and I have to wholeheartedly agree. The JVM is very modern, Java is old and inefficient (as a language! not the JVM!), so Scala is the solution. And my product JVMTools fully supports it. – Andreas Lauschke – 2013-03-12T19:36:40.350

@Andreas and if I understood that correct JLink is not really JLink it is more JVMLink, since it is not a big thing to link Scala code to M Right? So if a language is able to produce JVM bytecode the classloader inside JLink can successfully load it. Correct? – Stefan – 2013-03-12T19:40:49.897

@Stefan: dead on. I should probably do a complete write-up about this, the comment functionality here is extremely limited and hardly anybody reads it. A "JLink variable" "merely" provides a link into a memory address in the JVM (see JLink documentation, almost verbatim). And neither the JVM nor JLink care if you compiled the bytecode with Java or Scala. Hell, you could even use JRuby or Clojure to generate bytecode for the JVM. Now, using Scala code directly from M is another story. My product JVMTools allows you to do that, but it's different from using Scala-compiled byte code with JLink! – Andreas Lauschke – 2013-03-12T19:46:49.787

2You should! @Leonid would that be something for the blog? I find that really interesting and I'm really curious in interfacing with mathematica, especially from a professional point of view. By professional I mean really serious stuff not just mumbo-jumbo...I think decent interfacing with the system could lead it into other environments where M doesn't play a role...but...sigh...Matlab (sorry for being religious...). High performance secure interfaces. That would be a go I presume... – Stefan – 2013-03-12T19:57:19.840

@Stefan Completely agree on the importance of interfaces. This is also of a professional interest to me, as well as being interested as a user. Re: blog: I owe at least a few blog posts for SE blog - I promised many. Have been swamped in the last few months though, recently haven't been much free time. Hope things will change. – Leonid Shifrin – 2013-03-12T20:03:10.517

@Leonid What has Stephen what we don't have? ;) Andreas should write about it and you as well and later on I join and do silly things, just to hardening your both propositions. :)

No really. That would be wonderful! – Stefan – 2013-03-12T20:09:42.753

1Yes, that sounds good :-) – Leonid Shifrin – 2013-03-12T20:17:55.953

@Andreas would it make sense to you if something similar to your JVMTools exists in C++ as well? I mean why not? Or even parts of it. I find that really interesting and I'd like to thank Leonid for bringing us together. – Stefan – 2013-03-12T21:56:51.563

part 1: @Stefan: geez, that really becomes a totally separate discussion, completely detached from the original issue. a) JVMTools is more than just compilation and usage of Java, Scala, C#, F# code from a M session. That's just one feature of many. JVMTools contains many other algorithms that beat M in terms of speed or problem size, for example the extensive Travelling Salesman functions that allow you to solve TSP problems with thousands of nodes in seconds, which are problems where M can only throw in the towel, or several concurrency methods. – Andreas Lauschke – 2013-03-12T22:12:40.303

part 2:@Stefan: b) if you want C compilation from M, there's already the ability to do that since M8, with the compile tools and symbolic C compilation. c) No, I generally believe in the advantages of virtual machines such as the JVM or .Net for various reasons, too many to write down here. C compilation has many problems, and many of them are documented here on m.SE as well as on my website. Keep in mind that Java by now is about as fast as C code, and the JVM actually compiles many methods (most "active" methods that are used frequently) to native code as well, and I'd rather have ... – Andreas Lauschke – 2013-03-12T22:16:42.470

part 3: @Stefan: ... native code generated by a virtual machine than my own. That way I keep it o/s-independent, I get optimized code (believing Java writes better native code than I could), I get garbage-collection and don't have to do my own memory management, etc. I see the JVM as the "comfortable" version of getting the speed of native code, and there are so many things that can go wrong with C or Fortran code that I'd prefer keeping it with the JVM or .Net. The ONLY advantage of C or Fortran has been speed, and that's no longer a valid argument as the JVM by now is just as fast. – Andreas Lauschke – 2013-03-12T22:21:47.573

Well, accept your rebuke. Since this became emotionally. Well then beat me! Your JVM is a hammer. I've just showed interest. – Stefan – 2013-03-12T22:43:47.120

@Stefan: contact me at info@lauschkeconsulting.net or jvmtools@lauschkeconsulting.net, I'll send you a free trial version that is not feature-capped. – Andreas Lauschke – 2013-03-12T22:51:50.893

I'll do. I'm really interested in this. Thank you. – Stefan – 2013-03-12T23:17:13.937

And the argument pro C++11 is speed and rich interfaces....I'm eager to get your product. That sounds really interesting to me. – Stefan – 2013-03-12T23:23:12.897