path: root/src/libexpr/lexer.l
Age | Commit message | Author
2022-05-25 | Handle EOFs in string literals correctly | Eelco Dolstra
We can't return a STR token without setting a valid StringToken, otherwise the parser will crash. Fixes #6562.
2022-04-21 | replace most Pos objects/ptrs with indexes into a position table | pennae
Pos objects are somewhat wasteful as they duplicate the origin file name and input type for each object. on files that produce more than one Pos when parsed this is a sizeable waste of memory (one pointer per Pos). the same goes for ptr<Pos> on 64 bit machines: parsing enough source to require 8 bytes to locate a position would need at least 8GB of input and 64GB of expression memory. it's not likely that we'll hit that any time soon, so we can use a uint32_t index to locate positions instead.
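The idea in this commit message can be sketched as follows. This is a minimal illustration, not the Nix implementation: the names `Origin`, `PosIdx`, `PosTable`, `addOrigin` and `fileOf` are hypothetical, and the only points taken from the message are that the file name is stored once rather than per position and that a position is referred to by a uint32_t index.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: one entry per parsed input, so the file name is stored once.
struct Origin {
    std::string file;
};

// A position refers to its origin by index instead of duplicating the name.
struct Pos {
    uint32_t origin;
    uint32_t line;
    uint32_t column;
};

// 32-bit handle stored in AST nodes in place of a Pos object or pointer.
struct PosIdx {
    uint32_t id = 0;  // 0 is reserved for "no position"
    explicit operator bool() const { return id != 0; }
};

class PosTable {
    std::vector<Origin> origins;
    std::vector<Pos> positions;

public:
    uint32_t addOrigin(std::string file) {
        origins.push_back({std::move(file)});
        return static_cast<uint32_t>(origins.size() - 1);
    }

    PosIdx add(uint32_t origin, uint32_t line, uint32_t column) {
        assert(origin < origins.size());
        positions.push_back({origin, line, column});
        // Handles start at 1 so that the default PosIdx means "none".
        return PosIdx{static_cast<uint32_t>(positions.size())};
    }

    const Pos & operator[](PosIdx idx) const {
        assert(idx && idx.id <= positions.size());
        return positions[idx.id - 1];
    }

    const std::string & fileOf(PosIdx idx) const {
        return origins[(*this)[idx].origin].file;
    }
};
```

With this layout an AST node carries four bytes per position, and the full line, column and file name are only looked up when an error has to be printed.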
2022-03-24 | lexer: add error location to lexer errors | Sergei Trofimovich
Before the change lexer errors did not report the location:

    $ nix build -f. mc
    error: path has a trailing slash
    (use '--show-trace' to show detailed location information)

Note that it's not clear what file generates the error.

After the change location is reported:

    $ src/nix/nix --extra-experimental-features nix-command build -f ~/nm mc
    error: path has a trailing slash

           at .../pkgs/development/libraries/glib/default.nix:54:18:

           53| };
           54| src = /tmp/foo/;
             | ^
           55|

    (use '--show-trace' to show detailed location information)

Here we see both problematic file and the string itself.
2022-01-19 | remove ExprIndStr | pennae
it can be replaced with StringToken if we add another bit of information to StringToken, namely whether this string should take part in indentation scanning or not. since all escaping terminates indentation scanning we need to set this bit only for the non-escaped IND_STRING rule. this improves performance by about 1%.

before

    nix search --no-eval-cache --offline ../nixpkgs hello
      Time (mean ± σ): 8.880 s ± 0.048 s [User: 6.809 s, System: 1.643 s]
      Range (min … max): 8.781 s … 8.993 s 20 runs

    nix eval -f ../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix
      Time (mean ± σ): 375.0 ms ± 2.2 ms [User: 339.8 ms, System: 35.2 ms]
      Range (min … max): 371.5 ms … 379.3 ms 20 runs

    nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
      Time (mean ± σ): 2.831 s ± 0.040 s [User: 2.536 s, System: 0.225 s]
      Range (min … max): 2.769 s … 2.912 s 20 runs

after

    nix search --no-eval-cache --offline ../nixpkgs hello
      Time (mean ± σ): 8.832 s ± 0.048 s [User: 6.757 s, System: 1.657 s]
      Range (min … max): 8.743 s … 8.921 s 20 runs

    nix eval -f ../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix
      Time (mean ± σ): 367.4 ms ± 3.2 ms [User: 332.7 ms, System: 34.7 ms]
      Range (min … max): 364.6 ms … 374.6 ms 20 runs

    nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
      Time (mean ± σ): 2.810 s ± 0.030 s [User: 2.517 s, System: 0.225 s]
      Range (min … max): 2.742 s … 2.854 s 20 runs
2022-01-13 | optimize unescapeStr | pennae
mainly to avoid an allocation and a copy of a string that can be modified in place (ever since EvalState holds on to the buffer, not the generated parser itself).

# before

    Benchmark 1: nix search --offline nixpkgs hello
      Time (mean ± σ): 571.7 ms ± 2.4 ms [User: 563.3 ms, System: 8.0 ms]
      Range (min … max): 566.7 ms … 579.7 ms 50 runs

    Benchmark 2: nix eval -f ../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix
      Time (mean ± σ): 376.6 ms ± 1.0 ms [User: 345.8 ms, System: 30.5 ms]
      Range (min … max): 374.5 ms … 379.1 ms 50 runs

    Benchmark 3: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
      Time (mean ± σ): 2.922 s ± 0.006 s [User: 2.707 s, System: 0.215 s]
      Range (min … max): 2.906 s … 2.934 s 50 runs

# after

    Benchmark 1: nix search --offline nixpkgs hello
      Time (mean ± σ): 570.4 ms ± 2.8 ms [User: 561.3 ms, System: 8.6 ms]
      Range (min … max): 564.6 ms … 578.1 ms 50 runs

    Benchmark 2: nix eval -f ../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix
      Time (mean ± σ): 375.4 ms ± 1.3 ms [User: 343.2 ms, System: 31.7 ms]
      Range (min … max): 373.4 ms … 378.2 ms 50 runs

    Benchmark 3: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
      Time (mean ± σ): 2.925 s ± 0.006 s [User: 2.704 s, System: 0.219 s]
      Range (min … max): 2.910 s … 2.942 s 50 runs
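To illustrate what "modified in place" can look like, here is a rough sketch of unescaping inside the caller's buffer. It is not the actual unescapeStr: the function name `unescapeInPlace` is hypothetical and the set of escapes handled (`\n`, `\r`, `\t`, everything else passed through) is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Rewrites escape sequences in buf[0, len) in place and returns a view of
// the result. The output is never longer than the input, so the write
// cursor can never overtake the read cursor and no second buffer is needed.
std::string_view unescapeInPlace(char * buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    char * out = buf;
    for (std::size_t i = 0; i < len; ) {
        char c = buf[i++];
        if (c == '\\' && i < len) {
            char e = buf[i++];
            switch (e) {
                case 'n': c = '\n'; break;
                case 'r': c = '\r'; break;
                case 't': c = '\t'; break;
                default:  c = e;    break;  // e.g. \" or \\ : keep the escaped character
            }
        }
        *out++ = c;
    }
    return std::string_view(buf, static_cast<std::size_t>(out - buf));
}
```

The returned view aliases the input buffer, which is only safe because, as the message notes, something else keeps that buffer alive for as long as the tokens are used.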
2022-01-13 | don't strdup tokens in the lexer | pennae
every stringy token the lexer returns is turned into a Symbol and not used further, so we don't have to strdup. using a string_view is sufficient, but due to limitations of the current parser we have to use a POD type that holds the same information.

gives ~2% on system build, 6% on search, 8% on parsing alone

# before

    Benchmark 1: nix search --offline nixpkgs hello
      Time (mean ± σ): 610.6 ms ± 2.4 ms [User: 602.5 ms, System: 7.8 ms]
      Range (min … max): 606.6 ms … 617.3 ms 50 runs

    Benchmark 2: nix eval -f hackage-packages.nix
      Time (mean ± σ): 430.1 ms ± 1.4 ms [User: 393.1 ms, System: 36.7 ms]
      Range (min … max): 428.2 ms … 434.2 ms 50 runs

    Benchmark 3: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
      Time (mean ± σ): 3.032 s ± 0.005 s [User: 2.808 s, System: 0.223 s]
      Range (min … max): 3.023 s … 3.041 s 50 runs

# after

    Benchmark 1: nix search --offline nixpkgs hello
      Time (mean ± σ): 574.7 ms ± 2.8 ms [User: 566.3 ms, System: 8.0 ms]
      Range (min … max): 569.2 ms … 580.7 ms 50 runs

    Benchmark 2: nix eval -f hackage-packages.nix
      Time (mean ± σ): 394.4 ms ± 0.8 ms [User: 361.8 ms, System: 32.3 ms]
      Range (min … max): 392.7 ms … 395.7 ms 50 runs

    Benchmark 3: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system'
      Time (mean ± σ): 2.976 s ± 0.005 s [User: 2.757 s, System: 0.218 s]
      Range (min … max): 2.966 s … 2.990 s 50 runs
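A POD type "that holds the same information" as a string_view could look roughly like the sketch below. The field names `p` and `l`, the `SemanticValue` union and the `makeToken` helper are assumptions for illustration; only the name StringToken and the pointer-plus-length idea come from the log.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <type_traits>

// Hypothetical layout: a pointer and a length into the lexer's input buffer.
struct StringToken {
    const char * p;
    std::size_t l;
    // Converts to a string_view on demand, without copying any characters.
    operator std::string_view() const { return {p, l}; }
};

// The struct has no constructors, so it can sit in a plain C-style union.
static_assert(std::is_trivially_default_constructible_v<StringToken>);
static_assert(std::is_trivially_copyable_v<StringToken>);

// Stand-in for the parser's semantic value union.
union SemanticValue {
    long n;
    StringToken str;
};

// Hypothetical lexer helper: marks buf[start, start + len) as a token
// instead of strdup()-ing it.
inline StringToken makeToken(const char * buf, std::size_t start, std::size_t len) {
    assert(buf != nullptr);
    return StringToken{buf + start, len};
}
```

The token is only valid while the input buffer is alive, which is the trade-off made in exchange for dropping the per-token allocation.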
2021-11-04 | Optimize primop calls | Eelco Dolstra
We now parse function applications as a vector of arguments rather than as a chain of binary applications, e.g. 'substring 1 2 "foo"' is parsed as

    ExprCall { .fun = <substring>, .args = [ <1>, <2>, <"foo"> ] }

rather than

    ExprApp (ExprApp (ExprApp <substring> <1>) <2>) <"foo">

This allows primops to be called immediately (if enough arguments are supplied) without having to allocate intermediate tPrimOpApp values.

On

    $ nix-instantiate --dry-run '<nixpkgs/nixos/release-combined.nix>' -A nixos.tests.simple.x86_64-linux

this gives a substantial performance improvement:

    user CPU time: median = 0.9209 mean = 0.9218 stddev = 0.0073 min = 0.9086 max = 0.9340 [rejected, p=0.00000, Δ=-0.21433±0.00677]
    elapsed time: median = 1.0585 mean = 1.0584 stddev = 0.0024 min = 1.0523 max = 1.0623 [rejected, p=0.00000, Δ=-0.20594±0.00236]

because it reduces the number of tPrimOpApp allocations from 551990 to 42534 (i.e. only a small minority of primop calls are partially applied) which in turn reduces time spent in the garbage collector.
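The relationship between the two shapes can be shown with a small sketch. The commit has the parser build the argument vector directly; the code below instead converts a chain of binary applications into the flat form, purely to show that both carry the same information. All type and function names here (`Expr`, `Call`, `atom`, `app`, `flatten`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy AST: a node is either a leaf (fun == nullptr) or a binary application.
struct Expr {
    std::string atom;
    std::shared_ptr<Expr> fun;
    std::shared_ptr<Expr> arg;
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr atom(std::string s) {
    auto e = std::make_shared<Expr>();
    e->atom = std::move(s);
    return e;
}

ExprPtr app(ExprPtr f, ExprPtr a) {
    auto e = std::make_shared<Expr>();
    e->fun = std::move(f);
    e->arg = std::move(a);
    return e;
}

// Flat form: one function and all of its arguments in call order.
struct Call {
    ExprPtr fun;
    std::vector<ExprPtr> args;
};

// Walks down the left spine of the application chain, collecting arguments.
// They are met last-to-first, so the vector is reversed at the end.
Call flatten(ExprPtr e) {
    assert(e);
    Call c;
    while (e->fun) {
        c.args.push_back(e->arg);
        e = e->fun;
    }
    std::reverse(c.args.begin(), c.args.end());
    c.fun = e;
    return c;
}
```

With the flat form an evaluator can compare `args.size()` against the primop's arity up front and call it once, instead of producing one partial-application value per argument.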
2021-09-29 | reset yylloc when yyless(0) is called | Taeer Bar-Yam
2021-08-06 | add antiquotations to paths | Taeer Bar-Yam
2021-07-14 | libexpr: Fix read out-of-bound on the heap | Pamplemousse
Signed-off-by: Pamplemousse <xav.maso@gmail.com>
2020-12-02 | Remove an `unknown pragma` gcc warning | regnat
2020-12-01 | shut up clang warnings | regnat
- Fix some class/struct discrepancies
- Make the overloading of `run` in the `Cmd*` classes explicit
- Ignore a warning in the generated lexer
2020-06-15 | Remove trailing whitespace | Eelco Dolstra
2020-04-22 | a few more 'format's removed | Ben Burdette
2018-10-27 | simplify handling of extra '}' | Guillaume Maudoux
2018-08-29 | libexpr: Use int64_t for NixInt | aszlig
Using a 64bit integer on 32bit systems will come with a bit of a performance overhead, but given that Nix doesn't use a lot of integers compared to other types, I think the overhead is negligible also considering that 32bit systems are in decline.

The biggest advantage however is that when we use a consistent integer size across all platforms it's less likely that we miss things that we break due to that. One example would be: https://github.com/NixOS/nixpkgs/pull/44233

On Hydra it will evaluate, because the evaluator runs on a 64bit machine, but when evaluating the same on a 32bit machine it will fail, so using 64bit integers should make that consistent.

While the change of the type in value.hh is rather easy to do, we have a few more options available for doing the conversion in the lexer:

* Via an #ifdef on the architecture and using strtol() or strtoll() accordingly depending on which architecture we are. For the #ifdef we would need another AX_COMPILE_CHECK_SIZEOF in configure.ac.
* Using istringstream, which would involve copying the value.
* As we're already using boost, lexical_cast might be a good idea.

Spoiler: I went for the last option, first of all because lexical_cast does have an overload for const char* and second of all, because it doesn't involve copying around the input string. Also, because istringstream seems to come with a bigger overhead than boost::lexical_cast: https://www.boost.org/doc/libs/release/doc/html/boost_lexical_cast/performance.html

The first method (still using strtol/strtoll) also wasn't something I pursued further, because it is also locale-aware which I doubt is what we want, given that the regex for int is [0-9]+.

Signed-off-by: aszlig <aszlig@nix.build>
Fixes: #2339
2018-05-11 | Don't return negative numbers from the flex tokenizer | Eelco Dolstra
Fixes #1374. Closes #2129.
2018-05-11 | Revert "Throw a specific error for incomplete parse errors." | Eelco Dolstra
This reverts commit 6498adb002bcf7e715afe46c23b8635d4592c156. We don't actually use IncompleteParseError in 'nix repl'.
2018-03-02 | libexpr: Recognize newline in more places in lexer | Tuomas Tynkkynen
Flex's regexes have an annoying feature: the dot matches everything except a newline. This causes problems for expressions like:

    "${0}\
    "

where the backslash-newline combination matches this rule instead of the intended one mentioned in the comment:

    <STRING>\$|\\|\$\\ {
      /* This can only occur when we reach EOF, otherwise the above
         (...|\$[^\{\"\\]|\\.|\$\\.)+ would have triggered.
         This is technically invalid, but we leave the problem to the
         parser who fails with exact location. */
      return STR;
    }

However, the parser actually accepts the resulting token sequence ('"' DOLLAR_CURLY 0 '}' STR '"'), which is a problem because the lexer rule didn't assign anything to yylval. Ultimately this leads to a crash when dereferencing a NULL pointer in ExprConcatStrings::bindVars().

The fix does change the syntax of the language in some corner cases but I think it's only turning previously invalid (or crashing) syntax to valid syntax. E.g.

    "a\
    b"

and

    ''a''\
    b''

were previously syntax errors but now both result in "a\nb".

Found by afl-fuzz.
2018-02-16 | libexpr: Pre-reserve space in string in unescapeStr() | Tuomas Tynkkynen
Avoids some malloc() traffic.
2017-11-14 | Revert "Don't parse "x:x" as a URI" | Eelco Dolstra
This reverts commit f90f660b243866b8860eeb24cc4a345d32cc7ce7. This broke Hydra's release.nix, which contained preCheck = ''export LOGNAME=${LOGNAME:-foo}'';
2017-10-30 | Don't parse "x:x" as a URI | Eelco Dolstra
URIs now have to contain "://" or start with "channel:".
2017-07-30 | Replace Unicode quotes in user-facing strings by ASCII | Jörg Thalheim
Relevant RFC: NixOS/rfcs#4

    $ ag -l | xargs sed -i -e "/\"/s/’/'/g;/\"/s/‘/'/g"
2017-05-01 | lexer: remove catch-all rules hiding real errors | Guillaume Maudoux
With catch-all rules, we hide potential errors. It turns out that a4744254 made one catch-all useless. Flex detected that it was impossible to reach. The other is more subtle, as it can only trigger on unfinished escapes in unfinished strings, which only occurs at EOF.
2017-05-01 | Fix lexer to support `$'` in multiline strings. | Guillaume Maudoux
2016-12-06 | Tweak error message | Eelco Dolstra
2016-11-27 | Improve error message on trailing path slashes | Guillaume Maudoux
2016-11-13 | Fix comments parsing | Guillaume Maudoux
Fixed the parsing of multiline comments ending with an even number of stars, like /** this **/. Added test cases for comments.
2016-02-24 | Throw a specific error for incomplete parse errors. | Scott Olson
`nix-repl` will use this for deciding whether to keep waiting for input or error out right away.
2016-02-12 | Merge pull request #762 from ctheune/ctheune-floats | Eelco Dolstra
Implement floats
2016-01-20 | Revert "Revert "next try for "don't abort when given unmatched '}' with 'start-condition stack underflow'. This fixes #751""" | Eelco Dolstra
This reverts commit b669d3d2e83d3c50238751b57cff3ed0ca39bc8a.
2016-01-20 | Revert "next try for "don't abort when given unmatched '}' with 'start-condition stack underflow'. This fixes #751"" | Eelco Dolstra
This reverts commit ed23c8568e10d15196bb4ff2b79fc14191d28109. Let's merge this *after* the 1.11.1 release.
2016-01-19 | next try for "don't abort when given unmatched '}' with 'start-condition stack underflow'. This fixes #751" | Fabian Schmitthenner
This reverts commit 8120b6fb8a4924f8ae717bba9bbda4a2f89e2141 and fixes the regression introduced in 8d22b26448a091c76ab972c0b0603daac5e255e4.
2016-01-19 | Revert "don't abort when given unmatched '}' with 'start-condition stack underflow'. This fixes #751" | Eelco Dolstra
This reverts commit 8d22b26448a091c76ab972c0b0603daac5e255e4. It breaks Nixpkgs:

    $ nix-env -qa
    error: syntax error, unexpected IND_STR, expecting '}', at /home/eelco/Dev/nixpkgs-stable/pkgs/top-level/python-packages.nix:7605:8
2016-01-12 | don't abort when given unmatched '}' with 'start-condition stack underflow'. This fixes #751 | Fabian Schmitthenner
2016-01-05 | Edge condition: parser did not pick up floats starting exactly with 0. | Christian Theune
2016-01-05 | Fix up float parsing. | Christian Theune
2016-01-05 | Try a simplified version of float lexing that didn't work. | Christian Theune
The last one I tried was botched anyway ...
2016-01-05 | First hit at providing support for floats in the language. | Christian Theune
2015-07-03 | Fix the parsing of "$"'s in strings. | Guillaume Maudoux
2015-07-03 | Fix the hack that resets the scanner state. | Guillaume Maudoux
2015-02-19 | Allow the leading component of a path to be a ~ | Shea Levy
2014-08-20 | Use proper quotes everywhere | Eelco Dolstra
2014-01-14 | Allow "bare" dynamic attrs | Shea Levy
Now, in addition to a."${b}".c, you can write a.${b}.c (applicable wherever dynamic attributes are valid). Signed-off-by: Shea Levy <shea@shealevy.com>
2013-09-02 | Fix whitespace | Eelco Dolstra
2013-08-19 | Store Nix integers as longs | Eelco Dolstra
So on 64-bit systems, integers are now 64-bit. Fixes #158.
2013-08-02 | Add comparison operators ‘<’, ‘<=’, ‘>’ and ‘>=’ | Eelco Dolstra
2013-03-14 | Fix building against Bison 2.6 | Eelco Dolstra
2012-09-27 | Allow dashes in identifiers | Eelco Dolstra
In Nixpkgs, the attribute in all-packages.nix corresponding to a package is usually equal to the package name. However, this doesn't work if the package contains a dash, which is fairly common. The convention is to replace the dash with an underscore (e.g. "dbus-glib" becomes "dbus_glib"), but that's annoying. So now dashes are valid in variable / attribute names, allowing you to write:

    dbus-glib = callPackage ../development/libraries/dbus-glib { };

and

    buildInputs = [ dbus-glib ];

Since we don't have a negation or subtraction operation in Nix, this is unambiguous.
2011-08-06 | * Add a Nix expression search path feature. | Eelco Dolstra
Paths between angle brackets, e.g.

    import <nixpkgs/pkgs/lib>

are resolved by looking them up relative to the elements listed in the search path. This allows us to get rid of hacks like

    import "${builtins.getEnv "NIXPKGS_ALL"}/pkgs/lib"

The search path can be specified through the ‘-I’ command-line flag and through the colon-separated ‘NIX_PATH’ environment variable, e.g.,

    $ nix-build -I /etc/nixos ...

If a file is not found in the search path, an error message is lazily thrown.
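The lookup described in this message can be approximated as below. This is a sketch under stated assumptions, not Nix's resolver: `splitSearchPath` and `findInSearchPath` are hypothetical names, each search path element is treated as a plain directory, and the first element containing the requested relative path wins. Returning an empty optional leaves it to the caller to decide when to raise the error, loosely mirroring the "lazily thrown" behaviour.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

namespace fs = std::filesystem;

// Splits a colon-separated list such as the value of NIX_PATH into
// directories, skipping empty elements.
std::vector<fs::path> splitSearchPath(std::string_view s) {
    std::vector<fs::path> out;
    while (!s.empty()) {
        auto colon = s.find(':');
        auto elem = s.substr(0, colon);
        if (!elem.empty()) out.emplace_back(std::string(elem));
        if (colon == std::string_view::npos) break;
        s.remove_prefix(colon + 1);
    }
    return out;
}

// Resolves a path written between angle brackets (passed here without the
// brackets) against each search path element in order.
std::optional<fs::path> findInSearchPath(const std::vector<fs::path> & searchPath,
                                         const fs::path & rel) {
    assert(rel.is_relative());
    for (const auto & dir : searchPath) {
        fs::path candidate = dir / rel;
        if (fs::exists(candidate)) return candidate;
    }
    return std::nullopt;
}
```

For example, with a search path of `/etc/nixos`, a lookup for `nixpkgs/pkgs/lib` would check `/etc/nixos/nixpkgs/pkgs/lib` and report no result if that path does not exist.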