This proposal was discussed at this pull request.
Change the semantics of -fhpc
If I compile a program with the -fhpc option and run it, I expect to obtain a .tix file containing information about the code covered during that run.
Surprisingly, this is not the behaviour that is implemented.
Instead, the generated .tix file contains the accumulated coverage of this execution and of all previous runs.
I propose to change the semantics so that the .tix file only contains the coverage information of a single run of the program.
Motivation
The -fhpc option compiles a program with additional instrumentation to generate coverage information.
When the instrumented executable is run, the collected coverage information is written to a .tix file.
Tools such as hpc can read the information contained in a .tix file and turn it into something useful, such as an HTML file in which the covered parts of the program are marked up. Other tools convert it into formats consumed by various CI systems.
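The cat Example.tix output in the examples below corresponds to a `Tix` value as defined in the `Trace.Hpc.Tix` module of the hpc library. As a minimal sketch of what such a tool does, assuming that module's `Tix`/`TixModule` types and its `readTix` function, the following code counts, per module, how many tick boxes were entered at least once. `coverageSummary` and `summariseTixFile` are hypothetical helpers, not hpc functions, and real reports (such as those from hpc report) are considerably richer.

```haskell
import Trace.Hpc.Tix (Tix (..), TixModule (..), readTix)

-- | Per module: (module name, boxes ticked at least once, total boxes).
-- Hypothetical helper for illustration; it is not part of the hpc library.
coverageSummary :: Tix -> [(String, Int, Int)]
coverageSummary (Tix mods) =
  [ (name, length (filter (> 0) ticks), length ticks)
  | TixModule name _hash _len ticks <- mods
  ]

-- | Read a .tix file and print one summary line per module.
-- readTix yields Nothing if the file cannot be read.
summariseTixFile :: FilePath -> IO ()
summariseTixFile path = do
  mtix <- readTix path
  case mtix of
    Nothing  -> putStrLn ("no readable .tix file at " ++ path)
    Just tix -> mapM_ report (coverageSummary tix)
  where
    report (name, hit, total) =
      putStrLn (name ++ ": " ++ show hit ++ "/" ++ show total ++ " boxes ticked")
```

For example, in GHCi, coverageSummary (read "Tix [ TixModule \"Main\" 2243069736 3 [1,1,1]]") evaluates to [("Main",3,3)].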
Surprisingly, an instrumented executable not only writes a .tix file automatically at the end of its execution, it also automatically tries to read a .tix file at the beginning of its execution.
If it successfully reads a .tix file at startup, the data structures that the RTS uses to track coverage are initialized with the data from that file. The user guide describes this behaviour as follows:
The program may be run multiple times (e.g. with different test data), and the coverage data from the separate runs is accumulated in the .tix file.
If a .tix file already exists but was created for a different version of the program, the program crashes when the RTS starts up.
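The current startup logic can be summarised in a small pure model. This is a sketch under simplifying assumptions, not the code in rts/Hpc.c: `ModTix`, `currentStartup` and `runOnce` are hypothetical names, the record passed in is assumed to be the one stored under the module's name, and a run is modelled as entering every box exactly once, which is what happens in the examples below.

```haskell
import Data.Word (Word32)

-- Hypothetical, simplified model of one module's coverage record,
-- mirroring the fields of a TixModule entry: name, hash and tick counters.
data ModTix = ModTix
  { modName  :: String
  , modHash  :: Word32
  , modTicks :: [Integer]
  } deriving (Eq, Show)

-- | Model of the current startup logic: start from the record found in an
-- existing .tix file if there is one, and abort if its hash does not match
-- the hash of the running program's module.
currentStartup :: String -> Word32 -> Int -> Maybe ModTix -> Either String ModTix
currentStartup name h boxes existing =
  case existing of
    Nothing -> Right (ModTix name h (replicate boxes 0))
    Just old
      | modHash old /= h ->
          Left ("in module '" ++ name ++ "'\nHpc failure: module mismatch with .tix/.mix file hash number")
      | otherwise -> Right old

-- | Model of one run in which every box is entered exactly once,
-- as in the examples: each counter increases by one.
runOnce :: ModTix -> ModTix
runOnce m = m { modTicks = map (+ 1) (modTicks m) }
```

Starting from Nothing and applying runOnce gives counters [1,1,1]; passing that record back in as the existing file and running again gives [2,2,2], as in Example 1. Passing a record with a different hash yields a Left carrying the error seen in Example 2.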
Both behaviours are demonstrated in the example section below.
This behaviour is only useful when interacting with instrumented executables completely by hand.
It is a nuisance for tool developers, who must always remember to remove stale .tix files, and for users who are not familiar with this behaviour.
For example, here is a question on Stack Overflow from a user bitten by the fact that the program crashes if a .tix file is still around. None of the other users was able to identify the underlying problem.
And here is an issue showing that even cabal did not manage to implement the logic for enabling code coverage in a project correctly: Cabal Issue #7384
Users who want to combine the results of multiple runs can implement this behaviour in userland.
The hpc binary provides the subcommand hpc combine for doing this on the command line, and the HPC library on Hackage provides the tools to do it programmatically from within Haskell programs.
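For the programmatic case, here is a minimal sketch assuming the `Tix`, `TixModule`, `readTix` and `writeTix` definitions exported by `Trace.Hpc.Tix` in the hpc library. `addTix` and `combineTixFiles` are hypothetical helpers written for illustration, not the implementation of hpc combine. Counters of matching modules are added pointwise, and modules whose hash or size differ are rejected, because their tick boxes refer to different versions of the code.

```haskell
import Trace.Hpc.Tix (Tix (..), TixModule (..), readTix, writeTix)

-- | Add the tick counters of two Tix values module by module. Modules that
-- occur in only one input are kept unchanged; a module whose hash or size
-- differs between the inputs is reported as an error.
addTix :: Tix -> Tix -> Either String Tix
addTix (Tix xs) (Tix ys) = Tix <$> sequence (map merge xs ++ onlyInYs)
  where
    namesInXs = [n | TixModule n _ _ _ <- xs]
    onlyInYs  = [Right m | m@(TixModule n _ _ _) <- ys, n `notElem` namesInXs]
    merge m@(TixModule n h len ticks) =
      case [m' | m'@(TixModule n' _ _ _) <- ys, n' == n] of
        [] -> Right m
        (TixModule _ h' len' ticks' : _)
          | h /= h' || len /= len' -> Left ("mismatched hash or size for module " ++ n)
          | otherwise -> Right (TixModule n h len (zipWith (+) ticks ticks'))

-- | Combine two .tix files into a third one (hypothetical helper).
combineTixFiles :: FilePath -> FilePath -> FilePath -> IO ()
combineTixFiles a b out = do
  ma <- readTix a
  mb <- readTix b
  case (ma, mb) of
    (Just ta, Just tb) -> either fail (writeTix out) (addTix ta tb)
    _                  -> fail "could not read one of the input .tix files"
```

Adding the .tix data of a single run of Example 1 to itself yields [2,2,2] for module Main, the same counters the RTS currently produces after two runs.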
Proposed Change Specification
The GHC user guide currently specifies the effect of -fhpc as follows:
The program may be run multiple times (e.g. with different test data), and the coverage data from the separate runs is accumulated in the .tix file. To reset the coverage data and start again, just remove the .tix file.
This paragraph will be removed.
Instead, a new RTS flag --read-tix-file=<yes|no> will be introduced to guard this behaviour.
The old behaviour will remain available via --read-tix-file=yes, but by default the RTS will not read the .tix file and will always initialize the tix data structures with zeroes.
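A minimal model of the proposed semantics, using hypothetical names (`ReadTixFile`, `parseReadTixFile`, `initialTicks`) that do not correspond to actual RTS code: an existing .tix record is only consulted when the flag is yes, so the hash check, and with it the crash from Example 2, can only happen in that case.

```haskell
import Data.Word (Word32)

-- | Hypothetical model of the proposed RTS flag --read-tix-file=<yes|no>.
data ReadTixFile = ReadTixYes | ReadTixNo
  deriving (Eq, Show)

-- | Recognise the two accepted spellings of the flag.
parseReadTixFile :: String -> Maybe ReadTixFile
parseReadTixFile "--read-tix-file=yes" = Just ReadTixYes
parseReadTixFile "--read-tix-file=no"  = Just ReadTixNo
parseReadTixFile _                     = Nothing

-- | The proposed default when the flag is not given.
defaultReadTixFile :: ReadTixFile
defaultReadTixFile = ReadTixNo

-- | Initial tick counters for a module with hash h and the given number of
-- boxes. The record from an existing .tix file (its hash and counters) is
-- only consulted with --read-tix-file=yes; otherwise counters start at zero.
initialTicks :: ReadTixFile -> Word32 -> Int -> Maybe (Word32, [Integer]) -> Either String [Integer]
initialTicks ReadTixYes h _ (Just (h', ticks))
  | h == h'   = Right ticks
  | otherwise = Left "Hpc failure: module mismatch with .tix/.mix file hash number"
initialTicks _ _ boxes _ = Right (replicate boxes 0)
```

With the proposed default, a stale .tix file such as the one in Example 2 no longer matters: initialTicks defaultReadTixFile h 3 (Just (2243069736, [1,1,1])) is Right [0,0,0] for any hash h.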
Examples
I give two examples of the current behaviour, illustrating the surprising accumulation of coverage across runs and the failure mode this feature entails.
Example 1: Coverage of multiple runs is accumulated
> cat Example.hs
module Main where
main = print "hello"
> ghc -fhpc Example.hs
[1 of 2] Compiling Main ( Example.hs, Example.o )
[2 of 2] Linking Example
> ./Example
"hello"
> cat Example.tix
Tix [ TixModule "Main" 2243069736 3 [1,1,1]]
> ./Example
"hello"
> cat Example.tix
Tix [ TixModule "Main" 2243069736 3 [2,2,2]]
Example 2: Tix File with Different Hash Crashes Program
> cat Example.hs
module Main where
main = print "hello"
> ghc -fhpc Example.hs
[1 of 2] Compiling Main ( Example.hs, Example.o )
[2 of 2] Linking Example
> ./Example
"hello"
> cat Example.tix
Tix [ TixModule "Main" 2243069736 3 [1,1,1]]
If I now change the program and recompile it, the next run crashes:
> cat Example.hs
module Main where
main = print "world"
> ghc -fhpc Example.hs
[1 of 2] Compiling Main ( Example.hs, Example.o ) [Source file changed]
[2 of 2] Linking Example [Objects changed]
> ./Example
in module 'Main'
Hpc failure: module mismatch with .tix/.mix file hash number
(perhaps remove Example.tix file?)
The crash occurs during RTS startup: the RTS tries to initialize its tix data structures from the .tix file in the current directory, but finds that the hashes don’t match.
Effect and Interactions
The behaviour of programs compiled with -fhpc becomes more predictable, and we get rid of a failure mode that perplexes users and hinders better integration into tools.
Costs and Drawbacks
People relying on the aggregation of multiple runs will have to pass the --read-tix-file=yes RTS option explicitly to get the old behaviour. Alternatively, they can combine multiple .tix files by hand using the hpc combine command.
Backward Compatibility
This is a breaking change with respect to the semantics of -fhpc.
It only affects users of -fhpc who rely on the described functionality, namely that the coverage collected in the .tix file accumulates over multiple program runs. It is very hard to be sure, but my guess is that very few people currently use -fhpc in this way. They will notice the change only because their .tix files contain fewer ticks than they expected. The deprecation strategy outlined in the next section will warn them that this behaviour is going to change.
Deprecation Strategy
The old behaviour will be changed over two consecutive releases:
1. In the first release, GHC keeps the current behaviour, but if an instrumented program finds an existing .tix file (of any kind, even one in the wrong format), it emits a warning before attempting to read it, saying:
   I am reading in the existing tix file, and will add hpc info from this run to the existing data in that file. GHC 9.12 will cease looking for an existing tix file. If you positively want to add hpc info to the current tix file, use --read-tix-file=yes
2. In the next release, it stops reading the file by default.
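The two-release plan can be summarised as a small decision table. This sketch uses hypothetical names (`Stage`, `startupAction`) and one explicitly labelled assumption: that an explicitly passed --read-tix-file flag overrides the stage default and suppresses the warning, a case the plan does not spell out.

```haskell
-- | The two stages of the deprecation plan (hypothetical names).
data Stage = FirstRelease | NextRelease
  deriving (Eq, Show)

-- | Warning text quoted from the deprecation plan.
deprecationWarning :: String
deprecationWarning =
  "I am reading in the existing tix file, and will add hpc info from this run "
    ++ "to the existing data in that file. GHC 9.12 will cease looking for an "
    ++ "existing tix file. If you positively want to add hpc info to the current "
    ++ "tix file, use --read-tix-file=yes"

-- | Startup decision given the stage, the explicit --read-tix-file value
-- (Nothing if the flag was not passed) and whether a .tix file exists.
-- Result: the warning to print, if any, and whether the file is read.
-- ASSUMPTION: an explicit flag overrides the stage default and suppresses
-- the warning; the proposal does not spell this case out.
startupAction :: Stage -> Maybe Bool -> Bool -> (Maybe String, Bool)
startupAction _            _          False = (Nothing, False)
startupAction _            (Just yes) True  = (Nothing, yes)
startupAction FirstRelease Nothing    True  = (Just deprecationWarning, True)
startupAction NextRelease  Nothing    True  = (Nothing, False)
```

In the first release a leftover .tix file is still read but announced; in the next release the same situation silently starts from zero unless --read-tix-file=yes is given.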
Alternatives
One alternative is to do nothing and leave the semantics as they are. It is also possible to remove the tix file parser from the runtime system entirely. This would simplify the RTS codebase, but the old behaviour would then no longer be available.
Unresolved Questions
None.
Implementation Plan
I will implement this change. It is mostly localized to the file rts/Hpc.c and to the files related to RTS flags.
The startup logic in the function startupHpc() will be modified to take the --read-tix-file flag into account.
Endorsements
None.