How do you convert a string containing a number in C scientific notation to a Mathematica number?

72

30

Suppose I have a string containing the C-representation of a floating point number; for example

s = "1.23e-5"

and I want to convert this to a Mathematica number. How can I do this?

ToExpression[s] gives Plus[-5, Times[1.23`, e]].

Ian Hinder

Posted 2012-02-14T13:11:04.443

Reputation: 2 045

Amazing that the language doesn't include a simple straightforward function to do this! – becko – 2016-04-03T20:10:13.870

2The only way I know how to do this is ImportString["1.23e-5", "Table"][[1, 1]] which seems like rather a large hack! – Ian Hinder – 2012-02-14T13:13:53.550

Answers

79

I think probably the cleanest way to do this (at least, if you have only a single string, or are faced with a separate string for each number you wish to convert as a result of some other process) is to use the undocumented function Internal`StringToDouble, i.e.:

s = "1.23e-5";
Internal`StringToDouble[s]

which gives:

0.0000123

However, if you are trying to convert many such numbers at once, the standard, documented methods (Import, Read, etc.), are likely to represent better approaches.
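For the many-numbers case, a minimal sketch that uses only documented functions might look like the following. The names text, stream, and numbers are illustrative and not from the original answer, and the sketch assumes the numbers are separated by whitespace:

```mathematica
(* Illustrative input: several C-style numbers separated by whitespace *)
text = "1.23e-5 4.5E3 -7e2";

(* Open the string as an input stream and read every number in it *)
stream = StringToStream[text];
numbers = ReadList[stream, Number];

(* Streams opened with StringToStream should be closed explicitly *)
Close[stream];

numbers
```

Read and ReadList accept the Number type for integers and approximate numbers written in E notation, so no string rewriting should be needed before parsing.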

Oleksandr R.

Posted 2012-02-14T13:11:04.443

Reputation: 22 073

1@FJRA Why should ToExpression be avoided? – George Wolfe – 2012-10-17T16:19:21.153

10@GeorgeWolfe because it might lead you to a code leak. What if there is dangerous code within the string? Even something innocent like an equals sign (=) may set one of your variables. – FJRA – 2012-10-18T20:34:58.340

16Another one from Mr.Undocumented! – rm -rf – 2012-02-14T15:32:39.367

9Always try to avoid ToExpression! – FJRA – 2012-02-14T17:37:42.587

2Keep in mind that Internal`StringToDouble[] can produce unexpected results: "-1" is parsed as -1.0 but " -1" (extra leading space) is parsed as 1.0 (sign dropped). Also "1.0000000000000000" is parsed as 1. but if you add one more zero it returns $Failed["Bignum"] and no message is generated. – Gustavo Delfino – 2020-06-30T15:09:49.087

5This seems like functionality that should be available (officially) in Mathematica. Will Wolfram accept feature-requests? – Ian Hinder – 2012-02-23T16:06:45.897

21

s = "1.23e-5"

# &[Read[#, Number], Close@#]&[ StringToStream@s ]

Which is not as good as what you started with. Note that it is important to close the stream.


Szabolcs says this is difficult to read. That was surely not my intention. You could also write it verbosely like this:

fromC =
    Module[{output, stream},
      stream = StringToStream[#];
      output = Read[stream, Number];
      Close[stream];
      output
    ] &;

fromC[s]

Mr.Wizard

Posted 2012-02-14T13:11:04.443

Reputation: 259 163

Making the Close in the second argument of a function that never uses it is a nice trick. I did some benchmarking and found the more readable version to be about 55% slower. An alternative way of writing it as a composition of functions is fromC = StringToStream /* {Read[#, Number] &, Close} /* Through /* First which is just about 10% slower. In any case it doesn't matter because Internal`StringToDouble is much faster than any of these. – Gustavo Delfino – 2020-05-22T21:41:41.823

16

Another solution would be to use SemanticImportString (new in 10).

Borrowing some code from Mr.Wizard so that I can compare my solution to his:

strings =
  ToString @ Row[RandomChoice /@ {{"-", ""}, {#}, {"e"}, {"-", ""}, Range@12}] & /@ 
    RandomReal[{0, 10}, 15000];

Needs["GeneralUtilities`"]

Internal`StringToDouble /@ strings // AccurateTiming

System`Convert`TableDump`ParseTable[
  {strings}, {{{}, {}}, {"-", "+"}, "."}, False
] // AccurateTiming

Interpreter["Number"][strings]   // AccurateTiming

SemanticImportString[
     StringJoin[Riffle[strings, ";"]],
     {"Number"}, 
     "List",
     Delimiters -> ";"
] // AccurateTiming

0.00671892

0.00504799

12.980645

0.0426966

Now, as you can see, there is still an order of magnitude difference, but at least SemanticImportString is strict with things that are not numbers, whereas Internal`StringToDouble["foo"] returns 0. rather than failing.

Some of the types in Interpreter will benefit from using SemanticImport internally when called on lists of strings in the future.

As for the current speed of Interpreter, there is only so much you can gain if you want to support things like

Interpreter[
    Restricted["Number", {0, 10, 0.5}],
    NumberPoint -> "baz",
    NumberSigns -> {"foo", "bar"}
]["bar5baz5"]

5.5

Carlo

Posted 2012-02-14T13:11:04.443

Reputation: 1 151

1Thank you Carlo, this is much appreciated! – Szabolcs – 2014-08-21T17:13:19.103

16

On version 7 Internal`StringToDouble fails on long strings, and fails to recognize exponents:

Internal`StringToDouble["3.1415926535897932385"]

Internal`StringToDouble /@ {"3.14159", "3.14159e-02", "3.14159e+02"}

$Failed["Bignum"]

{3.14159, 3.14159, 3.14159}

This sent me looking for another way to convert numeric strings. Using Trace on ImportString I found another internal function that does what I need: System`Convert`TableDump`ParseTable.

Being an internal function, it is not error-tolerant, and if fed bad arguments it will crash the kernel. The syntax is as follows:

System`Convert`TableDump`ParseTable[
  table,
  {{pre, post}, {neg, pos}, dot},
  False
]

table  :   table of strings, depth = 2; need not be rectangular.
pre    :   list of literal strings to ignore if preceding the digits (only first match tried).
post   :   list of literal strings to ignore if following the digits (only first match tried).
neg    :   literal string to interpret as a negative sign (`-`).
pos    :   literal string to interpret as a positive sign (`+`).
dot    :   literal string to interpret as decimal point.

(Using True in place of False causes a call to System`Convert`TableDump`TryDate that I do not yet understand.)

Example:

System`Convert`TableDump`ParseTable[
  {{"-£1,234.141592653589793e+007"}, {"0.97¢", "140e2kg"}},
  {{{"£"}, {"kg", "¢"}}, {"-", "+"}, "."},
  False
]

{{-1.2341415926535898*^10}, {0.97, 14000.}}

Mr.Wizard

Posted 2012-02-14T13:11:04.443

Reputation: 259 163

ParseTable is great. Very fast and handles integers as well as reals. I've used it a bunch of times when I know my input is clean, thanks a bunch! – ssch – 2013-12-07T14:30:30.173

@ssch Glad I could help. :-) – Mr.Wizard – 2013-12-07T18:25:22.453

Ah, kernel spelunking, +1. – rcollyer – 2012-08-20T14:49:49.360

3Nice work! I'm sure this function will be useful. Regarding Internal`StringToDouble on 7: exponents are recognised if you first use StringReplace[nums, "e"|"E" -> "*^"]. – Oleksandr R. – 2012-08-21T23:54:50.947

@Oleksandr thanks, and good to know about StringToDouble! – Mr.Wizard – 2012-08-22T04:49:18.740

15

First[ImportString["1.23e-5", "List"]] might be slightly less hack-y than your suggestion in the comments...

J. M.'s ennui

Posted 2012-02-14T13:11:04.443

Reputation: 115 520

What about a string like "2.12e"? You can see in MMA examples where such strings are generated as CForm/FortranForm ScientificForm[2.12, NumberFormat -> (Row[{#1, "e", #3}] &)] – PlatoManiac – 2012-02-14T13:47:28.503

@Plato is that a standard form? It looks like an error. – Mr.Wizard – 2012-02-14T13:53:47.807

@Plato: That's funny; I don't think I've ever seen a superfluous e being added to numbers between $1$ and $10$, I must say. Neither CForm[] nor FortranForm[] do this, and ScientificForm[] will only do that if you mess with options like you have. – J. M.'s ennui – 2012-02-14T13:54:17.870

@J.M. You are right! It does not generate numbers that end with such an "e" as I wrote. Actually I got misled by the documentation of ScientificForm. You can also check there the NumberFormat example in the Options section of the documentation for ScientificForm. There they show how to produce Fortran-like forms. Test with a number like "2.12" and see the foolish "e" appear. But it is indeed not a general truth about CForm or FortranForm. – PlatoManiac – 2012-02-14T14:07:50.123

@Plato: Okay, but I think that's a rather contrived example. I don't think I've seen an entity like 2.12e in applications... – J. M.'s ennui – 2012-02-14T14:19:36.923

11

Version 10 introduced Interpreter which would seem suited to this task:

Interpreter[form]
represents an interpreter object that can be applied to a string to try to interpret it as an object of the specified form.

Interpreter["Number"]["1.23e-5"]

0.0000123

Unfortunately it seems that like many new-in-10 functions this is far from optimized. In fact I would say its performance is nothing short of abysmal for this particular task.

Some string data to test with:

strings =
  ToString @ Row[RandomChoice /@ {{"-", ""}, {#}, {"e"}, {"-", ""}, Range@12}] & /@ 
    RandomReal[{0, 10}, 15000];

Timings for Interpreter against StringToDouble and ParseTable (see the other answers):

Needs["GeneralUtilities`"]

Internal`StringToDouble /@ strings // AccurateTiming

System`Convert`TableDump`ParseTable[
  {strings}, {{{}, {}}, {"-", "+"}, "."}, False
] // AccurateTiming

Interpreter["Number"] /@ strings   // AccurateTiming

0.0052075

0.00645107

10.625608

At more than three orders of magnitude slower than the old methods the new function is simply not appropriate for general use. Hopefully it will be improved in a future release.

Mr.Wizard

Posted 2012-02-14T13:11:04.443

Reputation: 259 163

1I really need a useful function that can parse a number or tell me that the input is not a number (and won't execute arbitrary code like ToExpression). I also tried Interpreter and found it to be unusably slow, unfortunately (I can't wait for half a minute for a file to import). I'm reporting the problem now and hoping for a fix ... – Szabolcs – 2014-08-09T15:32:15.400

@Szabolcs I think Read is still your best bet. What is the format or structure of the files you need to import? – Mr.Wizard – 2014-08-09T16:42:17.520

Read is useful when the precise format is known in advance, i.e. you know what type to expect for the next token. Take for example a mixture of strings and numbers. If reading as a number fails, read as a string. This comment wasn't motivated by the need to read a single file type only. – Szabolcs – 2014-08-09T16:58:14.377

@Szabolcs I understand that; this is still applicable, as is StringReplace, depending on the specifics. I'd welcome a Question from you on the subject. – Mr.Wizard – 2014-08-09T17:42:50.410

@Szabolcs I've added an answer to this question that might serve your purpose. – Carlo – 2014-08-21T16:32:04.640

6

Maybe one can try the following:

convert[inp_?StringQ] := ToExpression@StringReplace[inp, "e" -> "*10^"];
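
Because ToExpression evaluates whatever it is handed, one hedged variant is to accept only strings that look like a C-style number before rewriting the exponent. The helper names cNumberPattern and safeConvert below are hypothetical, and the pattern is an assumption about which inputs should count as valid rather than a full grammar of C floating-point literals:

```mathematica
(* Hypothetical pattern: optional sign, digits with an optional decimal point,
   and an optional exponent introduced by e or E *)
cNumberPattern =
  RegularExpression["[+-]?(\\d+\\.?\\d*|\\.\\d+)([eE][+-]?\\d+)?"];

(* Convert only when the whole string matches the pattern;
   the exponent marker is rewritten to Mathematica's *^ syntax *)
safeConvert[inp_String] /; StringMatchQ[inp, cNumberPattern] :=
  ToExpression[StringReplace[inp, "e" | "E" -> "*^"]];

(* Anything else is rejected without being evaluated *)
safeConvert[_] := $Failed;

safeConvert["1.23e-5"]
safeConvert["x = 1"]
```

With this guard, the second call should return $Failed instead of assigning to x, since the string never reaches ToExpression.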

PlatoManiac

Posted 2012-02-14T13:11:04.443

Reputation: 13 888

Still, this is not fully correct if a number like 2.12 is represented as "2.12e" rather than the expected "2.12e0". MMA does so, as I mentioned in the above comment on the answer given by @J.M – PlatoManiac – 2012-02-14T13:51:10.183

8It works, but let me give one comment: whenever you use ToExpression on data read from a file, you make it possible to inject code into a program even inadvertently (one can never tell what sort of erroneous input the program might get by mistake). I generally try not to use ToExpression for just reading in data (as opposed to converting code) – Szabolcs – 2012-02-14T13:55:29.983

@Szabolcs thanks for explaining the issue with ToExpression. Your implementation is pretty cool. I did not know about the function StringToStream thanks for introducing... – PlatoManiac – 2012-02-14T14:01:03.990

You can actually replace "*10^" with "*^", which is Mathematica's syntax for floating-point exponents. E.g. InputForm[N[5^-9]] will give you 5.12*^-7 as the output. – Ruslan – 2019-08-03T08:24:38.627

6

updated based on comment feedback

One more approach, using LibraryLink. Create a C++ file called wolfram_strto.cpp as follows:

#include <cstdlib>
#include "WolframLibrary.h"

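EXTERN_C DLLEXPORT int wolfram_strtol(WolframLibraryData libData, mint Argc, MArgument *Args, MArgument Res) {
  char *string;
  mint base;
  mint result;
  string = MArgument_getUTF8String(Args[0]);
  base = MArgument_getInteger(Args[1]);
  /* parse an integer; a base of 0 lets strtol detect a 0x or 0 prefix */
  result = strtol(string, NULL, (int) base);
  /* release the string argument that LibraryLink handed to the library */
  libData->UTF8String_disown(string);
  MArgument_setInteger(Res, result);
  return LIBRARY_NO_ERROR;
}

EXTERN_C DLLEXPORT int wolfram_strtod(WolframLibraryData libData, mint Argc, MArgument *Args, MArgument Res) {
  char *string;
  mreal result;
  string = MArgument_getUTF8String(Args[0]);
  /* parse a floating-point number; strtod takes no base argument */
  result = strtod(string, NULL);
  /* release the string argument that LibraryLink handed to the library */
  libData->UTF8String_disown(string);
  MArgument_setReal(Res, result);
  return LIBRARY_NO_ERROR;
}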

This is a very thin wrapper for the C++ strtol and strtod standard library functions.

Create the library:

Needs["CCompilerDriver`"];
lib = CreateLibrary[{"wolfram_strto.cpp"}, "wolfram_strto"]

Load the two library functions:

strtol = LibraryFunctionLoad[lib, "wolfram_strtol", {"UTF8String", Integer}, Integer];
strtod = LibraryFunctionLoad[lib, "wolfram_strtod", {"UTF8String"}, Real];

Test the basics:

strtol["104", 10]

This should return the integer 104

strtod["10e4"]

This should return the real 100000.

Check some harder cases:

strtod /@ {"3.14159", "3.14159e-02", "3.14159e+02", "1.23e-5", "1E6", "1.734E-003", "2.12e1"}

Try a hex number:

strtol["0x2AF3", 0]

This should return 10995 (i.e. the same as 16^^2AF3)

Measure the elapsed time to convert 15,000 randomly generated reals:

strings = ToString @ Row[ RandomChoice /@ {{"-", ""}, {#}, {"e"}, {"-", ""}, Range@12}] & /@ RandomReal[{0, 10}, 15000];
First@AbsoluteTiming[ strtod /@ strings]

Returns in about 0.017 seconds on my machine.

For big numbers, there is another difference:

Internal`StringToDouble["1e4000"]
strtod["1e4000"]

The StringToDouble function gives $Failed["IEEE Exception"] and the strtod function gives DirectedInfinity[1].

In the case of underflow you get, respectively, $Failed["IEEE Underflow"] and 0.

Also, StringToDouble recognizes WL notation (e.g. 6.022*^23) and strtod does not recognize this format.

Arnoud Buzing

Posted 2012-02-14T13:11:04.443

Reputation: 9 213

1My C compiler's strtod is a two-argument function (no base argument). (+1) – Michael E2 – 2019-08-02T23:45:19.230

It indeed has only two arguments in any standard-conforming C compiler, see cppreference. – Ruslan – 2019-08-03T08:28:08.560

It's correct in the github code (I had the same problem, but I did find some version of it which wanted three arguments...) – Arnoud Buzing – 2019-08-03T17:34:48.337

I guess there is more than one variant here: https://linux.die.net/man/3/strtol – Arnoud Buzing – 2019-08-03T17:35:47.143

ok, I've updated the post to use a (hopefully) more compliant #include <cstdlib> on Windows (together with a switch to the C++ compiler, which is more compliant on Windows) – Arnoud Buzing – 2019-08-03T18:30:29.590

1I have another LibraryLink implementation here: https://mathematica.stackexchange.com/a/118402/12 The biggest problem with StringToDouble is that it cannot indicate that the string does not represent a number (not that it's internal). – Szabolcs – 2019-08-04T08:38:06.717

2

Here is a Mathematica function that accepts a string and returns either a number or a string containing an error message.

ConvertScientificNumberStringToNumber[string_String] := Block[
   {regexSciNum, regexNumOnly, regexNumEOnly},
   regexSciNum = "^ *(\\+|-)?(\\d+(\\.\\d+)?|\\.\\d+)((e|E)((\\+|-)?\\d+)?)? *$";
   regexNumOnly = "^ *(\\+|-)?(\\d+(\\.\\d+)?|\\.\\d+) *$";
   regexNumEOnly = "^ *(\\+|-)?(\\d+(\\.\\d+)?|\\.\\d+)(e|E) *$";
   If[! StringMatchQ[string, RegularExpression[regexSciNum]],
     Return["String is not a valid Scientific Format Number"];
   ];
   If[ StringMatchQ[string, RegularExpression[regexNumOnly]],
     Return[ToExpression[string]];
   ];
   If[ StringMatchQ[string, RegularExpression[regexNumEOnly]],
     (* If nothing appears after e|E then We need to strip everything after e|E *)
     Return[ToExpression[StringReplace[string, RegularExpression["(e|E)(.+)?$"] -> ""]]]
   ,
     Return[ ToExpression[StringReplace[string, RegularExpression["(e|E)"] -> "*^"]]]
   ];
   Return["Error we should not reach this point in the function."];
];

Steven Siew

Posted 2012-02-14T13:11:04.443

Reputation: 121

1

This works for me with large data (1E6 points) in Ver 8.0.1:

test = Import["scope_29_1.csv", "Data"];
test2 = ToExpression[Drop[test, 2]];

"Data" forces Mathematica to convert 1.734E-003 into 0.001734, but keeps it as a string because the first 2 lines contain names. Drop removes those first non-numerical lines.

Leo

Posted 2012-02-14T13:11:04.443

Reputation: 11

0

ToExpression@StringReplace[s, "e" -> "*10^"]

JL AP

Posted 2012-02-14T13:11:04.443

Reputation: 1