Analyzing citations with Mathematica

I need to analyze the citations of a paper. Since there are around 500 of them, I want to import the list of authors, titles, etc. into Mathematica and analyze the raw data there.

I can easily export the citations to a BibTeX file from Web of Science. However, I was wondering whether there is already some interface to get the data into Mathematica.

Edit: The following is an example BibTeX file:

@article{feynman1948space,
  title={Space-time approach to non-relativistic quantum mechanics},
  author={Feynman, Richard Phillips},
  journal={Reviews of Modern Physics},
  volume={20},
  number={2},
  pages={367},
  year={1948},
  publisher={APS}
}


@article{einstein1905theory,
  title={The theory of the brownian movement},
  author={Einstein, Albert},
  journal={Ann. der Physik},
  volume={17},
  pages={549},
  year={1905}
}

physicsGuy

Posted 2017-01-10T16:17:37.833

Reputation: 507

There is a tutorial tutorial/CitationManagement that works in version 9. But the weblink tutorial/CitationManagement doesn't work anymore.

– kglr – 2017-01-10T17:42:11.837

Can you export to some XML-based format? That will be much easier to import and parse. – Szabolcs – 2017-01-10T18:05:15.713

I'm sure there are utilities to convert to CSV. From experience, developing a parser will be more of a chore than you might expect, because you will encounter all manner of weird variations in format. – george2079 – 2017-01-10T18:21:17.473

@kglr here's the fixed link.

– rcollyer – 2017-01-10T18:29:28.033

I think @george2079 is generally speaking right, but in this case we (1) can easily convert the BibTeX records to JSON and then use ImportString, and (2) can relatively quickly write a parser that works most of the time. – Anton Antonov – 2017-01-10T21:27:36.863

Answers

This answer proposes two solutions: one with a dedicated parser, the other using conversion to JSON. Programming the JSON conversion was three times faster. (But it was also done earlier in the day, when I was properly caffeinated...)

Making a dedicated parser

We can make a dedicated parser in a manner similar to the approach explained in this answer to "How to parse a clojure expression?".

Load the FunctionalParsers.m package:

Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/FunctionalParsers.m"]

Write the BibTeX grammar in EBNF:

ebnfBibTeXCode = "
  <bt-database> = { <bt-record> } <@ BTDatabase ;
  <bt-record> = ( '@' &> <bt-record-type> ) , ( '{' &> <bt-id> <& ',' ) , <bt-record-entry-list> <& '}' <@ BTRecord ;
  <bt-record-entry-list> = <bt-record-entry> , [ { ',' &> <bt-record-entry> } ] ;
  <bt-record-entry> = ( <bt-key> <& '=' ) , ( <bt-num-val> | '{' &> { <bt-val> } <& '}' ) <@ BTEntry ;
  <bt-key> = '_WordString' ;
  <bt-num-val> = '_?NumberQ' ;
  <bt-val> = <bt-num-val> | '_WordString' ;
  <bt-record-type> = 'article' | 'Article' | 'book' | 'Book' |  'incollection' | 'InCollection' <@ BTType ;
  <bt-id> = '_WordString' <@ BTID ;
  ";

res = GenerateParsersFromEBNF[ParseToEBNFTokens[ebnfBibTeXCode]];
LeafCount /@ res[[1]]

(* {411} *)

After generating the parsers we can (and have to) modify some of them so that they handle more general objects than the EBNF rules specify:

pBTVAL = (ToExpression\[CircleDot]ParsePredicate[
      StringMatchQ[#, NumberString] &])\[CirclePlus]ParsePredicate[
    StringMatchQ[#, (Except[{"{", "}"}] ..)] &];

pBTKEY = ParsePredicate[StringMatchQ[#, (Except[{"{", "}"}] ..)] &];

pBTID = ParsePredicate[
   StringMatchQ[#, (WordCharacter | PunctuationCharacter) ..] &];

pBTRECORDENTRYLIST = 
  ParseShortest[ParseListOf[pBTRECORDENTRY, ParseSymbol[","]]];

Here is a dedicated tokenizer:

ToBibTeXTokens[s_String] := 
  ParseToTokens[s, {"=", "{", "}", "@", ","}];

We get this output over the example citations in the question:

pBTDATABASE[ToBibTeXTokens[textCitations]]

(* {{{}, BTDatabase[{BTRecord[{BTType[
       "article"], {"feynman1948space", {BTEntry[{"title", \
{"Space-time", "approach", "to", "non-relativistic", "quantum", 
           "mechanics"}}], 
        BTEntry[{"author", {"Feynman", ",", "Richard", "Phillips"}}], 
        BTEntry[{"journal", {"Reviews", "of", "Modern", "Physics"}}], 
        BTEntry[{"volume", {20}}], BTEntry[{"number", {2}}], 
        BTEntry[{"pages", {367}}], BTEntry[{"year", {1948}}], 
        BTEntry[{"publisher", {"APS"}}]}}}], 
    BTRecord[{BTType[
       "article"], {"einstein1905theory", {BTEntry[{"title", {"The", 
           "theory", "of", "the", "brownian", "movement"}}], 
        BTEntry[{"author", {"Einstein", ",", "Albert"}}], 
        BTEntry[{"journal", {"Ann.", "der", "Physik"}}], 
        BTEntry[{"volume", {17}}], BTEntry[{"pages", {549}}], 
        BTEntry[{"year", {1905}}]}}}]}]}} *)

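It is probably better, though, to split the text into individual records and use Map[pBTRECORD, ___] rather than pBTDATABASE, so that one malformed record does not spoil the whole parse. The JSON conversion below works record by record in the same way.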
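
For analysis it is usually handier to have one association per record than nested BTRecord/BTEntry expressions. The following is only a minimal sketch, assuming the parser output has exactly the shape shown above. The names parsed, joinTokens, recordToAssociation and parsedRecords are hypothetical and not part of FunctionalParsers.m; parsed is a literal copy of the first record (trimmed to four fields), so the snippet runs without the package.

```mathematica
(* Literal copy of part of the parser output above. BTDatabase, BTRecord,
   BTType and BTEntry are inert heads, so no package is needed here. *)
parsed = {{{}, BTDatabase[{BTRecord[{BTType["article"],
       {"feynman1948space",
        {BTEntry[{"title", {"Space-time", "approach", "to",
            "non-relativistic", "quantum", "mechanics"}}],
         BTEntry[{"author", {"Feynman", ",", "Richard", "Phillips"}}],
         BTEntry[{"volume", {20}}],
         BTEntry[{"year", {1948}}]}}}]}]}};

(* Rejoin the tokens of a field value into a single string;
   numeric tokens are converted to strings along the way *)
joinTokens[vals_List] :=
  StringReplace[StringRiffle[ToString /@ vals, " "], " ," -> ","];

(* Hypothetical helper: one association per BTRecord *)
recordToAssociation[BTRecord[{BTType[type_], {id_, entries_List}}]] :=
  Join[<|"type" -> type, "label" -> id|>,
   Association[
    entries /. BTEntry[{key_, vals_List}] :> (key -> joinTokens[vals])]];

(* Collect every BTRecord, however deeply it is nested in the result *)
parsedRecords =
  Cases[parsed, r_BTRecord :> recordToAssociation[r], Infinity];

parsedRecords[[1]]["author"]

(* Feynman, Richard Phillips *)
```

Joining the tokens back with single spaces loses the original spacing of the field, which should not matter for counting authors, journals or years.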

Conversion to JSON

Here is an implementation of my first suggestion in the comments: we can easily convert the BibTeX records to JSON and then use ImportString. As discussed, YMMV with different sets of records.

t1 = StringReplace[textCitations, 
   StartOfLine ~~ "@" ~~ 
     x : (LetterCharacter ..) ~~ (Whitespace | "") ~~ 
     "{" ~~ (Whitespace | "") ~~ y : (WordCharacter ..) :> 
    "@" <> "\"" <> x <> "\"" <> ":{\"label\":" <> "\"" <> y <> "\""];

t2 = StringReplace[t1, 
   StartOfLine ~~ (Whitespace | "") ~~ 
     x : (LetterCharacter ..) ~~ (Whitespace | "") ~~ 
     "=" ~~ (Whitespace | "") ~~ "{" ~~ y : (Except["}"] ..) ~~ "}" :>
     "\"" <> x <> "\"" <> ":" <> 
     "[" <> (StringJoin @@ 
       Riffle["\"" <> # <> "\"" & /@ 
         StringSplit[y, {Whitespace, ","}], ","]) <> "]"];

t3 = StringSplit[t2, "@"];

t4 = Select[t3, 
   StringLength[StringReplace[#, Whitespace -> ""]] > 0 &];

ImportString["{" <> # <> "}", "JSON"] & /@ t4

(* {{"article" -> {"author" -> {"Feynman", "", "Richard", 
      "Phillips"}, "label" -> "feynman1948space", 
    "journal" -> {"Reviews", "of", "Modern", "Physics"}, 
    "title" -> {"Space-time", "approach", "to", "non-relativistic", 
      "quantum", "mechanics"}, "volume" -> {"20"}, "number" -> {"2"}, 
    "pages" -> {"367"}, "year" -> {"1948"}, 
    "publisher" -> {"APS"}}}, 

    {"article" -> {"author" -> {"Einstein", 
      "", "Albert"}, "label" -> "einstein1905theory", 
    "journal" -> {"Ann.", "der", "Physik"}, 
    "title" -> {"The", "theory", "of", "the", "brownian", "movement"},
     "volume" -> {"17"}, "pages" -> {"549"}, "year" -> {"1905"}}}} *)

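Once the records are imported, they can be flattened into associations and summarized. This is a rough sketch, assuming the rule-list form shown above; jsonRecords, flattenRecord and citations are hypothetical names, and jsonRecords is a literal copy of the result trimmed to three fields so that the snippet runs on its own.

```mathematica
(* Literal copy of the imported result above, trimmed to a few fields *)
jsonRecords = {
   {"article" -> {"author" -> {"Feynman", "", "Richard", "Phillips"},
      "label" -> "feynman1948space", "year" -> {"1948"}}},
   {"article" -> {"author" -> {"Einstein", "", "Albert"},
      "label" -> "einstein1905theory", "year" -> {"1905"}}}};

(* Hypothetical helper: flatten {type -> fields} into one association,
   rejoining word lists and dropping the empty strings left by the commas *)
flattenRecord[{type_String -> fields_List}] :=
  Join[<|"type" -> type|>,
   Map[If[ListQ[#], StringRiffle[DeleteCases[#, ""], " "], #] &,
    Association[fields]]];

citations = flattenRecord /@ jsonRecords;

(* Number of citing papers per publication year *)
Counts[Lookup[citations, "year"]]

(* <|"1948" -> 1, "1905" -> 1|> *)

(* Tabular view for further queries *)
Dataset[citations]
```

Note that dropping the empty strings also drops the comma between family and given names, so author names come out as "Feynman Richard Phillips".
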
Assignment of textCitations (evaluate this before the code above)

textCitations = "
  @article{feynman1948space,
    title={Space-time approach to non-relativistic quantum \
mechanics},
    author={Feynman, Richard Phillips},
    journal={Reviews of Modern Physics},
    volume={20},
    number={2},
    pages={367},
    year={1948},
    publisher={APS}
  }

  @article{einstein1905theory,
    title={The theory of the brownian movement},
    author={Einstein, Albert},
    journal={Ann. der Physik},
    volume={17},
    pages={549},
    year={1905}
  }
  ";

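For a real export it is probably safer to read the BibTeX file directly than to paste its contents into a notebook. A minimal sketch, assuming a plain-text file; the file name citations.bib and the variables bibFile and textFromFile are hypothetical, and the file is written to the temporary directory first only so that the snippet is runnable as it stands.

```mathematica
(* Hypothetical file; written here only to make the example self-contained *)
bibFile = FileNameJoin[{$TemporaryDirectory, "citations.bib"}];
Export[bibFile,
  "@article{einstein1905theory,\n  title={The theory of the brownian movement},\n  year={1905}\n}\n",
  "Text"];

(* Read the whole file as a single string *)
textFromFile = Import[bibFile, "Text"];

(* Quick sanity check: number of lines that start a record *)
StringCount[textFromFile, StartOfLine ~~ "@"]
```

The resulting string can then be assigned to textCitations in place of the literal above.
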
Anton Antonov

Posted 2017-01-10T16:17:37.833

Reputation: 32 565

Thanks, that should be what I am looking for. However, trying to run the 'Conversion to JSON' code, I get the error Import::fmterr: Cannot import data as JSON format. Any idea on that? – physicsGuy – 2017-01-11T17:00:01.583

What version of Mathematica are you using? I find JSON conversion in Mathematica very capricious. You might consider doing the equivalent of the proposed JSON conversion in another system, say, R / RStudio. – Anton Antonov – 2017-01-11T17:03:56.647

I am using Mathematica 11.0; I could also try an older version. – physicsGuy – 2017-01-11T17:05:02.637

So maybe it is the data. Are you using a different text with citations than the ones included in your question? – Anton Antonov – 2017-01-11T17:07:22.623

I simply pasted the textCitations code from your answer, followed by the 'Conversion to JSON' code. When I run it, the error pops up. – physicsGuy – 2017-01-11T17:12:56.170

It seems that some whitespace added by MSE breaks the conversion. Use your original text data for the assignment to textCitations. The JSON conversion is a rather weak and hacky approach; you should be better off using the dedicated parsers. – Anton Antonov – 2017-01-11T17:22:46.203

@PhysicsGuy :) I am glad it finally worked! – Anton Antonov – 2017-01-11T17:40:39.943