MaxSAT Evaluation 2022
Affiliated with SAT 2022   ·   August 2-5   ·   Haifa, Israel

Rules

General

    Withdrawal

    A solver can be withdrawn from MaxSAT Evaluation 2022 only before the deadline for the submission of the final versions. After this deadline no further changes or withdrawals of the solvers are possible.

    Number of solver submissions

    The evaluation organizers reserve the right to restrict the number of solver submissions originating from the same author(s) based on computation resource limitations.

    Per-instance solver tuning

    Embedding strategies specific to particular instances in the solver is considered to be against the spirit of the evaluation which aims to encourage the advancement of MaxSat solvers with wide coverage and general applicability. Hence solvers are not allowed to employ triggers to modify their behaviour, that are deemed to be specific to particular instances. The final ruling on whether or not a solver is using instance specific tuning will be made on a case-by-case basis by the organizers, but only after further consultation with the solver developers. Participants are encouraged to consult the organizers in case of doubt.

    Disqualification

    A solver will not be immediate disqualified if it displays a wrong solution during the execution of the evaluation. The organizers will aim to provide all participants a fair chance of submitting bug fixes to their solvers in a timely manner based on feedback on wrong results. However, the organizers reserve the right to limit the number of resubmissions of a buggy solver based on resource limitations. A solver is considered buggy under the following circumstances.

    • The solver in the main tracks reports a truth assignment in a "v" line that does not satisfy the hard clauses.
    • The solver in the main tracks for complete solvers reports "s OPTIMUM FOUND" but the truth assignment reported in its "v" line has cost higher than another known solution.
    • The solver in the main track reports a cost on its "o" that does not match the actual cost of the truth assignment reported on the "v" line.
    • The solver in the incremental track does not correctly implement ipamir_solve, ipamir_val_lit or ipamir_val_obj as specified in the IPAMIR header.
    • The solver crashes on a significant number of instances.
    If the solver's bug cannot be resolved, the submitters will still have the option of having the solver's correct results tabulated and reported in the table of results. However, the results will be marked to indicate that the solver also generated some incorrect results.

    Use of Processor Cores

    In each track, solvers much use only a single core of the processor on the computing node it is run on. Solvers making use of multiple cores (for parallel computations) will be disqualified.

    Open source requirement

    Solver source code originating from the authors (including modifications to third-party source code such as SAT solvers as part of a solver) must be submitted together with a corresponding solver binary that can, for main track solvers, be directly run on StarExec. The source code package of each solver participating in the evaluation will be made available at the time when MaxSAT Evaluation 2022 results are presented at the SAT 2022 conference. In case bug fixes are submitted during the evaluation (i.e., if requested by the evaluation organizers), the bug-fixed source package must also be submitted together with a bug-fixed solver binary. A solver is allowed to use external binaries of unmodified third-party libraries. Source code need not be submitted for such libraries. However, if the library is non-standard (i.e., not STL, Boost, etc.) its identity and usage in the solver should be clearly specified in the solver description.

    Solver description

    Each entrant to MaxSAT Evaluation 2022 must include a short (at least 1, at most 2 pages) description of the system. This should include a list of all authors of the system and their present institutional affiliations. It should also provide details of any non-standard algorithmic techniques (e.g., heuristics, simplification/learning techniques, etc.) and data structures, as well as references to relevant literature (be they by the authors themselves or by others). The system description must be formatted in IEEE Proceedings style, and submitted as PDF. The system descriptions will be posted on the MaxSAT Evaluation 2022 website. Furthermore, given that the quality and number of system descriptions is high enough, the organizers are considering publishing the collection of system description as a report under the report series of Department of Computer Science, University of Helsinki (with an ISSN number).

    Participation of the organizers

    The organizers are allowed to enter their own solvers into the evaluation. To avoid providing them with advantage over other participants, details of the benchmark selection process will be made public after the submission deadline and it will ensured that no individual organizers nor subsets of the organizers can impose that a spefici seed is used for the randomized process applied for benchmark selection.

    Rankings

    Main tracks for complete solvers

    Solvers participating in the complete tracks will be ranked based on the number of solved instances within predefined per-instance resource limitations. In the main tracks for complete solvers solving an instance means finding an optimal solution.

    Main tracks for incomplete solvers

    Solvers participating in the incomplete tracks will be ranked based on the average of the so called incomplete scores. The incomplete score, score(s,i), of a solver s on instance i is the ratio between the best solution found by s and the best known solution for i:
    score(s, i) =  (1+ cost of best known cost for i) / (1 +cost of solution for i found by s)

  • For each instance we consider the best solution found by all incomplete solvers within 300 seconds.
  • For an instance i score is 0 if no solution was found by that solver.
  • For each instance the incomplete score is a value in [0, 1].

The organizers reserve the right to use additional scoring metrics for the evaluation results, such as ones which take into account search progress in terms of improvements to the cost of best solution found as a function of time.

Special track on incremental MaxSAT

In the incremental track, all solvers will be ranked on each benchmark application independently. For a specific benchmark application, the rank of a solver is determined by the number of solved instances for that benchmark. Solving an instance means that all invocations of the solver are correct, i.e. correctly return either OPTIMUM FOUND or UNSAT as specified in the IPAMIR header.

Solver Requirements

Main Tracks

Each solver submission must adhere to the following specification.

  • Input and output format: Solvers must accept as input MaxSAT instances in the standard WCNF format, and conform to the output format specified here.
  • SIGTERM: At the time limit solvers will receive a SIGTERM signal shortly before being forcibly terminated by a SIGKILL signal.

    Incomplete solvers must catch the SIGTERM signal at which point they can output the best solution they have found so far with a "o" and "v" line (if a solution has been found), and then terminate.

    Exact solvers in the main track, only need to output an optimal solution, and can terminate after finding and outputting one. So there is no real need for them to catch the SIGTERM signal as no score is associated with outputting the best solution found if that solution is not optimal.

    Catching the SIGTERM signal on StarExec
    StarExec is currently sending a SIGTERM followed by a SIGKILL signal with minimal delay. The time gap between SIGTERM and SIGKILL might not be sufficient for your program to output its best solution. But we suggest you try that first making sure to assemble all output before finally flushing the output stream (multiple flush commands will take more time). To address this issue we are in the process of incorporating a longer delay to the StarExec platform that should be sufficient for your program to complete its output. In the meantime, if you want to test that your solver is properly catching SIGTERM and then outputting its best solution, and have found that StarExec's delay is too short then we can suggest two other methods.

    1. Include the "runsolver" tool in your solver package. runsolver is a tool to control the memory and runtime of a solver. After downloading and statically compiling "runsolver", you can include "runsolver" in your bin directory when uploading your solver to StarExed. Then you can change your StarExec configuration script (see solver submission) so that runs your solver under "runsolver"'s control as follows:
        #!/bin/bash
        delay=3
        cpu=$(($STAREXEC_CPU_LIMIT-$delay-1))
        wl=$(($STAREXEC_WALLCLOCK_LIMIT-$delay-1))
      
        ./runsolver -d $delay -C $cpu -W $wl -v out.v -w out.w ./<your_solver_binary> $1
        

      which will send a SIGTERM signal and after the "delay" (in this case 3 seconds) will send a SIGKILL signal to your solver. To verify if your solver is receiving the SIGTERM signal properly, we suggest that you try using "runsolver" offline and verify that your solver is receiving and outputting a solution when a SIGTERM signal is sent.

    2. Run your solver using the linux "timeout" command. Again you would modify your StarExec configuration script, but this time there is no need to upload runsolver. Instead you invoke your solver as follows:
      #!/bin/bash
      delay=3
      wl=$(($STAREXEC_WALLCLOCK_LIMIT-$delay))
      timeout -s 15 <wl> ./<your_solver_binary> $1 
      

      Where <wl> is the runtime for your solver minus the delay in seconds that you want to have between SIGTERM and SIGKILL (when you create a StarExec job set the job's timeout to be greater than <delay>). This will cause the system to send your solver a SIGTERM signal after <delay> seconds, after which StarExec will terminate your solver at its timeout.

Special Track on Incremental MaxSAT

Solvers participating in the incremental track are required to implement a subset of functions from the IPAMIR interface. See details here.

Input and Output Requirements

Main Tracks

Input format

Note that the input format is changed for 2022. The changes (hard clauses marked with "h" and the removal of the p-line) will require minor modifications to existing solvers.

The input file must be read from the file given as a parameter to your solver. For example:

./mysolver <input_file_name>

In MaxSAT Evaluation 2022, the same input format is used for all benchmark instances in the main tracks. Variables as indexed with positive integers. A clause is a listing of variables indices, with "-" used to denote negation. "0" marks the end of the declaration of a clause. We associate a weight with each soft clause, which is the first integer in the soft clause. Weights must be greater than or equal to 1, and smaller than 263. The sum of the weights must be less than 264-1. Hard clauses are declared by "h" instead of a weight. See below for an example.

Note: With respect to the input format used in earlier (pre-2022) MaxSAT Evaluations, changes introduced in 2022 to the input format are the following:

  • No more p-line. The p-line is a source of many problems when constructing benchmark files. There is some value for knowing how many variables and clauses there are in the formula before reading the clauses (e.g., for allocating fixed sized arrays). But even in this case the space required for storing the clauses would have to be dynamically reallocated as the clauses are read from the file.
  • h lines. In the new format hard clauses will be indicated with an in initial ‘h’ before the list of literals in the clause.

Example 1.
Instance in the old (pre-2022) format: c This is a comment c Example 1...another comment p wcnf 7 4 12 12 1 2 3 4 0 1 -3 -5 6 7 0 6 -1 -2 0 4 1 6 -7 0 In new format used starting 2022: c This is a comment c Example 1...another comment h 1 2 3 4 0 1 -3 -5 6 7 0 6 -1 -2 0 4 1 6 -7 0

You might find some of the software in the benchmark code base to be useful. This software includes std_wcnf, a new version of the standardizer, which is able to convert old format MaxSat instances into the new format, and to_old_fmt a program to convert from the new format back to the old format.

Output format

The solvers must output messages to standard output that will be used to check the results. The output format is inspired by the DIMACS output specification of the SAT competition and will be used to check some results.

The solver cannot write to any files except standard output and standard error (only standard output will be parsed for results, but both output and error will be memorized during the whole evaluation process, for all executions).

Messages

  • Comments ("c " lines):
    These lines start by the two characters: lower case 'c' followed by a space (ASCII code 32).
    These lines are optional and may appear anywhere in the solver output.
    They contain any information that authors want to emphasize, such as #backtracks, #flips,... or internal cpu-time.
    Submitters are advised to avoid outputting comment lines which may be useful in an interactive environment but otherwise useless in a batch environment.

  • Solution Status ("s " line):
    This line starts by the two characters: lower case 's' followed by a space (ASCII code 32).
    Only one such line is allowed.
    This line gives the answer of the solver. It must be one of the following answers:
    • s OPTIMUM FOUND
      This line is to output when the solver has checked that the last "o" and "v" lines specify an optimal solution.
    • s UNSATISFIABLE
      This line is to be output when the solves has checked that the set of hard clauses is unsatisfiable.
    • s UNKNOWN
      This line must be output in any other case. Note that solvers in the incomplete track will typically output "s UNKNOWN" unless they can verify that the best solution they have found is optimal, in which case they can output "s OPTIMUM FOUND". However, no extra score will be associated in the incomplete tracks for proving optimality. Exact solvers in the main track can output "s UNKNOWN" if they are unable to find a optimal solution (note that not outputting an "s" line is the same as outputting "s UNKNOWN")

    It is of uttermost importance to respect the exact spelling of these answers. Any mistake in the writing of these lines will cause the answer to be disregarded.

    If the solver does not display a "s" line (or if the "s" is not valid), then UNKNOWN will be assumed.

  • Solution Cost Line ("o " lines):
    These lines start by the two characters: lower case 'o' followed by a space (ASCII code 32).
    An "o " line must contain the lower case 'o' followed by a space and then by an integer which represents the cost of the best solution found, i.e., the sum of the weights of clauses falsified by the solution.
    Incomplete solvers should field the SIGTERM signal so as to output an "o" (and "v") line specifying the best solution found just before termination. There is no need for exact solvers in the main track to output any solution other than an optimal one. When they find an optimal solution they can output an "o" and "v" line specifying the solution they found.


  • Solution Values (Truth Assignment) ("v " lines):
    These lines start by the two characters: lower case 'v' followed by a space (ASCII code 32).
    More than one "v " line is allowed but the evaluation environment will act as if their content was merged.
    When the solver reports a solution it must provide "s", "o" and "v" lines. The "v" line provides a truth assignment to the variables of the instance that will be used to check the correctness of the answer, i.e., it must provide a list of non-complementary literals which, when interpreted as true, achieves a sum of weights of unsatisfied clauses as specified in the "o" line.
    A literal is denoted by an integer that identifies the variable and the negation of a literal is denoted by a minus sign immediately followed by the integer of the variable.
    The solution line must define the value of each variable. The order of literals does not matter.
    If the solver does not output a value line, or if the value line does not match the specification, then UNKNOWN will be assumed.
All the lines must be ended by a standard Unix end of line character ('\n');

Multiple "o" lines
Although the solver can output a new "o" line every time it finds a better solution, there is no score associated with doing so. Only one "s" line is allowed and all "v" lines are merged, hence only one truth assignment can be specified. Therefore, the intermediate "o" lines cannot be verified by information in the output and will be ignored in the evaluation. Only the last "o" line will be used (and it must match the data specified in the "v" line).

Example

c --------------------------
c My MaxSAT Solver
c --------------------------
o 143
s OPTIMUM FOUND
v -1 2 3 -4 -5 6 -7 8 9 10 -11 -12 13 -14 -15 16 -17 18 19 20

Note: Consider changes to the v-line for a more compressed certificate. Details are available in the v-line section. These changes are optional for this year but can have reduce time for printing and will significantly reduce log sizes.

Special Track on Incremental MaxSAT

Solvers participating in the incremental track perform their I/O via the IPAMIR interface, and are required to implement a subset of functions from this generic interface. See details here.

Benchmarks

  • The benchmark sets used in the evaluation will consists of both instances appearing in the earlier MaxSAT evaluation benchmark sets as well as new unseen benchmarks.
  • The set of benchmarks used in the evaluation will be revealed only after the results from the evaluation are presented at the SAT 2022 conference
  • The number of benchmarks used in the evaluation is not predetermined.
  • The benchmark selection procedures used consists of two steps. 1. The organizers will divide accessible benchmarks into buckets consisting of "similar" benchmark instances (e.g., originating from the same or similar problem domain). 2. Benchmarks will be drawn randomly one-by-one from each of the buckets, so that a balanced number of benchmarks over the buckets is obtained.