1 Introduction
Gerrymandering, namely the deliberate creation of district maps with highly asymmetric electoral outcomes in order to disenfranchise voters, has long been a curse to the fairness of electoral systems in the USA, in spite of general public disdain for it. This type of voter disenfranchisement has a long history, going back at least to the time when the term “gerrymandering” was coined, after a redistricting of the senate election map of the state of Massachusetts resulted in a South Essex district whose shape resembled a salamander (see Fig. 1). There is an elaborate history of litigation involving gerrymandering as well. The US Supreme Court (SCOTUS) ruled that gerrymandering is justiciable [10], but the justices could not agree on an effective way of estimating it. In another ruling, SCOTUS opined that a measure of partisan symmetry may be a helpful tool to understand and remedy gerrymandering [18], but again a precise quantification of partisan symmetry acceptable to the courts was left undecided. Indeed, formulating precise and computationally efficient measures of partisan bias (i.e., lack of partisan symmetry) that are acceptable in courts may be considered critical to the removal of gerrymandering. [Footnote 1: Even though measuring partisan bias is a non-trivial issue, it has nonetheless been observed that two frequent indicators of partisan bias are cracking [26] (dividing supporters of a specific party between two or more districts when they could be a majority in a single district) and packing [26] (filling a district with more supporters of a specific party as long as this does not make this specific party the winner in that district). Other partisan bias indicators include hijacking [26] (redistricting to force two incumbents to run against each other in one district) and kidnapping [26] (moving an incumbent’s home address into another district).] [Footnote 2: See Section 1.2 regarding the impact of the SCOTUS gerrymandering ruling of 06/27/2019 on future gerrymandering studies.]

Although there is no dearth of legal briefs filed in courts involving gerrymandering over many years, it is only more recently that mathematicians and applied computational researchers have started to investigate this topic, perhaps due to the tremendous progress in high-speed computation in the last two decades. For example, researchers in [4, 32, 31, 23, 5, 22, 16, 1, 15] have made conceptual or empirical attempts at quantifying gerrymandering and devising redistricting methods to optimize such quantifications using well-known notions such as compactness and symmetry, whereas researchers in [6, 1, 8, 19, 31, 7] have investigated the design of efficient heuristics and other computer simulation approaches for this purpose.
Two recent research directions deserve specific mention here. In the first direction, Stephanopoulos and McGhee, in several papers such as [21, 30], introduced a new gerrymandering measure called the efficiency gap, which quantifies the absolute difference of total wasted votes between the parties in a two-party electoral system. Very importantly, at least from a legal point of view, this measure was found legally convincing by a US appeals court in a case claiming that the legislative map of the state of Wisconsin is gerrymandered. In another direction, and perhaps of considerable interest to algorithmic game theory researchers, the authors of a recent paper [25] formulated the redistricting process as a two-person game and analyzed the performance of two kinds of protocols for such games.
1.1 Why write this article, and why should theoretical computer science researchers care?
Somewhat unfortunately, even though the science of gerrymandering has received varying degrees of attention from legal researchers, mathematicians and applied computational researchers, it has so far received relatively little attention from theoretical computer science (TCS) researchers (where by “TCS researchers” we mean researchers dealing with the theoretical analysis of computational complexity issues of these problems, such as polynomial-time solvability, fixed-parameter tractability, approximability, etc.), except for a few recent results such as [6]. In our opinion, there are several reasons for this. Often, some of these problems are described in “non-CS non-math” journals in a way that may not be very precise and may not be very easy for TCS researchers to follow. Another possible reason is the lack of effective collaboration between TCS researchers and other (perhaps non-CS) researchers working on these problems, perhaps accentuated by the lack of coverage of these topics in TCS publication venues. One of our goals in writing this article is to improve this situation. To this effect, the article is motivated by the following two high-level aims:

 (I) Formalization of models and problem statements:

Our formal definitions and descriptions need to satisfy two (perhaps mutually conflicting) goals: the levels of abstraction should be as close to their real-world applications as possible, but should still make the problems sufficiently interesting to attract the attention of TCS researchers.
 (II) Computational complexity analysis:

We provide computational complexity analysis of some versions of these problems, leaving other versions for future research.
Task (I) may not necessarily be as straightforward as it seems, especially since descriptions of some of the problem variations may come from non-CS-theory (perhaps legal) venues. Regarding Task (II), one may wonder why computational complexity analysis (including computational hardness results) may be of practical interest at all. To this, we point out a few reasons.


When a particular type of gerrymandering solution is found acceptable in courts, one would eventually need to develop and implement software for this solution, especially for large US states such as California and Texas, where manual calculations may take too long or may not provide the best result. Any exact or approximation algorithms designed by TCS researchers would be a valuable asset in that respect. Conversely, appropriate computational hardness results can be used to convince a court not to apply that measure for specific US states due to practical infeasibility.

Beyond their scientific implications, TCS research works may also be expected to have a beneficial impact on the US judicial system. Some justices, whether at the Supreme Court level or in lower courts, seem reluctant to take mathematics, statistics and computing seriously [29, 12]. TCS research may help show that theoretical methods, whether complicated or not (depending on one’s background), can in fact yield fast and accurate computational methods that can be applied to “ungerrymander” the currently gerrymandered maps.
1.2 Remarks on the impact of the SCOTUS gerrymandering ruling
As this article was being written, SCOTUS issued a ruling on 06/27/2019 on two gerrymandering cases [28]. However, the ruling does not eliminate the need for future gerrymandering studies. While SCOTUS agreed that gerrymandering was antidemocratic, it decided that the issue is best settled at the legislative and political level, and it encouraged solving the problem at the state court level and delegating legislative redistricting to independent commissions via referendums. Both of these last two remedies require further scientific studies of gerrymandering. It is also possible that a future SCOTUS might overturn this recent ruling.
2 Precise formulations of several gerrymandering problems
We assume for the rest of the paper that our political system consists of only two parties, namely Party A and Party B. This means that we ignore negligible third-party votes, as is commonly done by researchers interested in two-party systems. Although some of our concepts can be extended to three or more major parties, we urge caution, since gerrymandering for multi-party systems may need different definitions.
2.1 Input data and its granularity levels
The topological part of an input is generically referred to as a “map”, which is partitioned into atomic elements or cells (e.g., subdivisions of counties or voting tabulation districts in the legal gerrymandering literature). The following two types of maps may be considered.

 Rectilinear polygon without holes (Fig. 2 (a)):

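For this case, the map $\mathcal{P}$ is placed on a unit grid of size $m \times n$. Then, the atomic elements (cells) of $\mathcal{P}$ are identified with the individual unit squares of the grid inside $\mathcal{P}$. We will refer to the cell on the $i$-th row and $j$-th column by $c_{i,j}$ for $1 \le i \le m$ and $1 \le j \le n$.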
 Arbitrary polygon without holes (Fig. 2 (b)):

For this case, the map $\mathcal{P}$ is an arbitrary simple polygon, and the atomic elements (cells) of $\mathcal{P}$ are arbitrary sub-polygons (without holes) inside $\mathcal{P}$. Such a map can also be thought of as a planar graph $G$ whose nodes are the cells, where an edge connects two cells if they share a portion of the boundary of nonzero measure. Note that although the planar graph for a given polygonal map is unique, a given planar graph corresponds to many polygonal maps.
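A minimal sketch of the planar-graph view of a rectilinear map, assuming (hypothetically) that the map is given as a list of grid cells identified by (row, column) pairs. Two cells become adjacent exactly when they share a unit-length side; cells meeting only at a corner share a boundary of zero measure and are therefore not joined. The function name `grid_adjacency` is illustrative and not taken from the text.

```python
def grid_adjacency(cells):
    """Build the cell-adjacency graph of a rectilinear map.

    `cells` is an iterable of (row, col) pairs, one per unit square of the
    grid that lies inside the map. Returns a dict mapping each cell to the
    set of cells sharing a full unit-length side with it.
    """
    cell_set = set(cells)
    adj = {cell: set() for cell in cell_set}
    for (i, j) in cell_set:
        # Only the four side-sharing neighbours count; diagonal neighbours
        # touch at a single point and are therefore not adjacent.
        for (di, dj) in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            neighbour = (i + di, j + dj)
            if neighbour in cell_set:
                adj[(i, j)].add(neighbour)
    return adj


# Example: three cells forming an "L"; (0, 1) and (1, 0) meet only at a corner.
adj = grid_adjacency([(0, 0), (0, 1), (1, 0)])
print(sorted(adj[(0, 0)]))    # [(0, 1), (1, 0)]
print((1, 0) in adj[(0, 1)])  # False
```

The resulting dict is an ordinary adjacency-list representation, so any graph algorithm written for the planar-graph input can be reused unchanged on rectilinear inputs.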
In either case, the size of the map is the number of cells (resp., nodes) in it, and, for a cell (resp., a node) $c$ and a sub-polygon $\mathcal{Q}$ inside the polygonal map $\mathcal{P}$ (resp., a subgraph $H$ of $G$), the notation $c \in \mathcal{Q}$ (resp., $c \in H$) will indicate that $c$ is inside $\mathcal{Q}$ (resp., $c$ is a node of $H$). Every cell or node $c$ of a map has the following numbers associated with it (see Fig. 2 (b)):


A strictly positive integer $p_c$ indicating the “total population” inside $c$.

Two non-negative integers $a_c$ and $b_c$ such that $a_c + b_c \le p_c$, where $a_c$ and $b_c$ denote the total numbers of voters for Party A and Party B, respectively.
In addition to the above numbers, we are also given a positive integer $\kappa$ that denotes the required (legally mandated) number of districts. [Footnote 3: This is a hard constraint, since a map with a different value of $\kappa$ would be illegal. This precludes one from designing an approximation algorithm in which the value of $\kappa$ changes even by just one, and conversely, a computational hardness result for one value of $\kappa$ does not necessarily imply a similar result for another value of $\kappa$.] Based on the existing literature, three types of granularity of these numbers in the input data can be formalized:

 Coarse granularity:
 Fine granularity:

For this case, for every cell or node $c$ we have $p_c \le \Delta$ for some fixed constant $\Delta$. This kind of data is obtained, for example, when one uses data at the “Voting Tabulation District” (VTD) level [Footnote 4: VTDs are often the smallest units in a US state for which election data are available.] or at the “census block” level.
 Ultrafine granularity:

For this case, $p_c = \Delta$ for some fixed constant $\Delta$ for every cell or node $c$. If the different $p_c$’s in the fine granularity case do not differ from each other too much, then, depending on the optimization objective, it may be possible to approximate the fine granularity by an ultrafine granularity.
2.2 Legal requirements for valid redistricting plans
Let $C$ denote the set of all cells (resp., all nodes) in the given polygonal map $\mathcal{P}$ (resp., the planar graph $G$). A districting scheme is a partition of $C$ into $\kappa$ subsets of cells (resp., nodes), say $\mathcal{P}_1, \mathcal{P}_2, \dots, \mathcal{P}_\kappa$. One absolutely legally required condition is the following:
“every $\mathcal{P}_i$ must be a connected polygon (resp., a connected subgraph)”. [Footnote 5: For our purpose, two polygons sharing a single point are assumed to be disconnected from each other.]
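The hard constraints stated so far (the mandated number of districts, a partition of all cells or nodes, and a connected region for every district) can be checked with a breadth-first search. The sketch below works on the planar-graph view and assumes the map is given as a Python dict from each node to the set of its neighbours; `is_connected` and `is_valid_districting` are hypothetical helper names, not part of the text.

```python
from collections import deque


def is_connected(nodes, adj):
    """Return True if `nodes` induces a connected subgraph of `adj`.

    `adj` maps every node of the map to the set of its neighbours.
    """
    nodes = set(nodes)
    if not nodes:
        return False  # an empty district is never a valid district
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            # Only walk along edges that stay inside the district.
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes


def is_valid_districting(districts, adj, kappa):
    """Check the hard constraints: exactly kappa districts that partition all
    nodes of the map, each inducing a connected subgraph."""
    if len(districts) != kappa:
        return False
    members = [n for d in districts for n in d]
    # A partition covers every node exactly once.
    if len(members) != len(set(members)) or set(members) != set(adj):
        return False
    return all(is_connected(d, adj) for d in districts)


# Example on a path 1 - 2 - 3 - 4.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(is_valid_districting([[1, 2], [3, 4]], path, 2))  # True
print(is_valid_districting([[1, 3], [2, 4]], path, 2))  # False: {1, 3} is disconnected
```

Each district is checked in time linear in its size plus the degrees of its nodes, so validating a whole plan is linear in the size of the map.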
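For convenience, we define the following quantities for each $\mathcal{P}_i$:

 Party affiliations in $\mathcal{P}_i$:

$a_{\mathcal{P}_i} = \sum_{c \in \mathcal{P}_i} a_c$ and $b_{\mathcal{P}_i} = \sum_{c \in \mathcal{P}_i} b_c$.
 Population of $\mathcal{P}_i$:

$p_{\mathcal{P}_i} = \sum_{c \in \mathcal{P}_i} p_c$.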
Then, another legally mandated condition in its two forms can be stated as follows.

 Strict partitioning criteria:

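Ideally, one would like $\mathcal{P}_1, \dots, \mathcal{P}_\kappa$ to be an (exact) equipartition of $C$, i.e., $p_{\mathcal{P}_1} = p_{\mathcal{P}_2} = \cdots = p_{\mathcal{P}_\kappa}$.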
 Approximately strict partitioning criteria:

In practice, it is nearly impossible to satisfy the strict partitioning criteria. To alleviate this difficulty, the exactness of equipartition is relaxed by allowing , , to differ from each other within an acceptable range. To this effect, we define an approximate equipartition of for a given to be one that satisfies . Rulings such as [33] seem to suggest that the courts may allow a maximum value of in the range of to . Another possibility is to have an additive approximation to the strict partitioning criterion by allowing .
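The per-district quantities and the population-balance criteria can be sketched as follows, assuming each cell is stored with a tuple (population, Party A voters, Party B voters). For the two relaxed criteria the code assumes a ratio form (largest district population at most (1 + eps) times the smallest) and an additive form (largest minus smallest at most delta); these are illustrative assumptions rather than necessarily the exact inequalities intended above, and all function names are hypothetical.

```python
def district_totals(district, cell_data):
    """Return (p, a, b): population, Party A votes and Party B votes of a district.

    `cell_data` maps each cell to a tuple (population, a_votes, b_votes).
    """
    p = sum(cell_data[c][0] for c in district)
    a = sum(cell_data[c][1] for c in district)
    b = sum(cell_data[c][2] for c in district)
    return p, a, b


def district_populations(districts, cell_data):
    return [district_totals(d, cell_data)[0] for d in districts]


def is_exact_equipartition(districts, cell_data):
    """Strict criterion: all district populations are equal."""
    pops = district_populations(districts, cell_data)
    return max(pops) == min(pops)


def is_ratio_equipartition(districts, cell_data, eps):
    """ASSUMED multiplicative relaxation: max population <= (1 + eps) * min population."""
    pops = district_populations(districts, cell_data)
    return max(pops) <= (1 + eps) * min(pops)


def is_additive_equipartition(districts, cell_data, delta):
    """ASSUMED additive relaxation: max population - min population <= delta."""
    pops = district_populations(districts, cell_data)
    return max(pops) - min(pops) <= delta


cells = {1: (100, 60, 40), 2: (105, 50, 50), 3: (110, 30, 70), 4: (95, 45, 45)}
print(district_totals([1, 2], cells))                        # (205, 110, 90)
print(is_exact_equipartition([[1, 2], [3, 4]], cells))       # True: 205 and 205
print(is_ratio_equipartition([[1, 4], [2, 3]], cells, 0.1))  # False: 215 > 1.1 * 195
```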
2.3 Optimization objectives to eliminate partisan bias
We describe a few objective functions for optimization to remove partisan bias (in TCS frameworks) that have been proposed in the existing literature or in court documents. [Footnote 6: We remind the reader that there is no single objective function that has been universally accepted in all or most court cases, and it is likely that new objectives will be proposed in the coming years.] Let $\mathcal{P}_1, \dots, \mathcal{P}_\kappa$ be the set of districts (partitions) of the set of all cells (resp., nodes) in the given polygonal (resp., planar graph) map. We first define a few related useful notations and concepts.

 Winner of a district $\mathcal{P}_i$:

Clearly, if $a_{\mathcal{P}_i} > b_{\mathcal{P}_i}$ then Party A should be the winner, and if $a_{\mathcal{P}_i} < b_{\mathcal{P}_i}$ then Party B should be the winner. What if $a_{\mathcal{P}_i} = b_{\mathcal{P}_i}$? Most existing research works always assign the district to a specific preferred party (e.g., Party A) in this case, so we will assume this by default. However, in reality, a (fair) coin toss is often used to decide the outcome. [Footnote 7: Please do not underestimate the power of a coin toss. The election for one district of the house of delegates in the state of Virginia was decided by a coin toss, and in fact this also decided the legislative control of one of the chambers of the state.]
 Normalized seat counts and seat margins of the two parties:

 Normalized vote counts and vote margins of the two parties:

 Wasted votes:
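The district-level notions above can be sketched as follows, taking each district as its pair of vote totals (a, b) for Party A and Party B. Ties go to Party A by default, matching the convention adopted in the text, with an optional fair coin toss. For the remaining quantities the code assumes the usual conventions: Party A's normalized seat count is the fraction of districts it wins, its normalized vote count is its share of the two-party vote, and in each district the losing party wastes all of its votes while the winner wastes the votes exceeding half of the district's two-party total. Function names are illustrative.

```python
import random


def winner(a, b, tie_break="A", rng=None):
    """Winner ('A' or 'B') of a district with a votes for A and b votes for B.

    Ties go to Party A by default; tie_break="coin" uses a fair coin toss.
    """
    if a != b:
        return "A" if a > b else "B"
    if tie_break == "coin":
        return (rng or random).choice("AB")
    return "A"


def seat_and_vote_shares(totals):
    """Return (sigma_A, nu_A) for a plan given as a list of (a_i, b_i) pairs.

    sigma_A: fraction of districts won by Party A (ties to A).
    nu_A: Party A's share of the two-party vote.
    """
    seats_a = sum(1 for a, b in totals if winner(a, b) == "A")
    votes_a = sum(a for a, _ in totals)
    votes_all = sum(a + b for a, b in totals)
    return seats_a / len(totals), votes_a / votes_all


def wasted_votes(a, b):
    """(w_A, w_B) under the assumed convention: the loser wastes every vote,
    the winner wastes the votes above half of the district's two-party total."""
    half = (a + b) / 2
    if winner(a, b) == "A":
        return a - half, b
    return a, b - half


plan = [(60, 40), (30, 70)]
print(seat_and_vote_shares(plan))  # (0.5, 0.45)
print(wasted_votes(60, 40))        # (10.0, 40)
```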
Without loss of generality, assume that . Based on the above notions, we can now describe a few optimization objectives:

 Seat-vote equation:

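For the decision version of this problem, we are required to produce a redistricting plan that exactly satisfies a relationship between the normalized seat counts and the normalized vote counts of the two parties. The relationship was stated by [32] as
$$\frac{\sigma_A}{\sigma_B} \simeq \left(\frac{\nu_A}{\nu_B}\right)^{\beta} \qquad (1)$$
where $\sigma_A, \sigma_B$ (resp., $\nu_A, \nu_B$) are the normalized seat counts (resp., normalized vote counts) of Party A and Party B, $\beta$ is a positive number, and $\simeq$ denotes almost equality. Kendall and Stuart [17] argued in favor of using some stochastic models. Some special cases of Equation (1) are as follows:
Proportional representation: $\beta = 1$; winner-take-all: $\beta = \infty$.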
In practice, a value of is considered to be a reasonable choice. For an optimization version of this problem, assuming and assuming Party A has the responsibility to do the redistricting [Footnote 8: In other words, Party A chooses the districts in an attempt to achieve its desired value.], we define an (asymptotic) approximation () as a solution that satisfies
(2)
Equation (2) is obviously ill-defined when , which may indeed happen in practice for smaller values of such as . We introduce appropriate modifications to Equation (2) to avoid this in the following manner. If then and thus an exact version of the seat-vote equation would intuitively want no matter what is. Thus, when , we consider such a solution as an approximation where
(2)
 Efficiency gap:

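The goal here is to minimize the absolute difference of total wasted votes between the parties, i.e., we need to find a partition $\mathcal{P}_1, \dots, \mathcal{P}_\kappa$ that minimizes
$$\mathrm{Eff}(\mathcal{P}_1, \dots, \mathcal{P}_\kappa) = \left| \sum_{i=1}^{\kappa} w_A(\mathcal{P}_i) - \sum_{i=1}^{\kappa} w_B(\mathcal{P}_i) \right|,$$
where $w_A(\mathcal{P}_i)$ and $w_B(\mathcal{P}_i)$ denote the wasted votes of Party A and Party B in district $\mathcal{P}_i$.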
 Partisan bias:

Partisan bias is a deviation from bipartisan symmetry that favors one party over the other. The underlying assumption in using this very popular measure is that both parties should expect to receive the same number of seats given the same vote proportion; for example, if Party A has a normalized vote count of $x$ and the redistricting plan gives it a normalized seat count of $y$, then, were Party A's normalized vote count $1-x$ instead, the same redistricting plan should give it a normalized seat count of $1-y$. However, since the precise distribution of voters under such a hypothetical vote proportion is not known, the distribution is generated artificially, possibly based on some assumptions (which may not always be acceptable to a court). Mathematically, a measure of partisan bias can be computed in the following manner.

Let . Note that .

Select such that . These choices depend upon the population shift model being used.

For every district $\mathcal{P}_i$, we create a district $\mathcal{P}_i'$ that corresponds to the same region (sub-polygon or subgraph) but with the following parameter changes:
Note that is another legally valid redistricting plan for but for this new plan the normalized vote count for Party A is given by

Recalculate the normalized seat count for Party A for this new partition .

Define the measure of bias as .
The goal is then to find a partition to minimize .

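 Geometric compactness of a polygonal district $\mathcal{P}_i$:

The primary goal of using this measure is to ensure that polygonal districts do not have “unusually weird” shapes (cf. Fig. 1). One of the most commonly used compactness measures is the so-called “Polsby-Popper compactness measure” [27], given by $\gamma\,\mathcal{A}(\mathcal{P}_i)/\mathcal{L}(\mathcal{P}_i)^2$, where $\mathcal{A}(\mathcal{P}_i)$ is the area and $\mathcal{L}(\mathcal{P}_i)$ is the length of the perimeter of $\mathcal{P}_i$, and $\gamma$ is a suitable constant (a specific value of $\gamma$ was used in [24]). The computational problem is then to find a redistricting plan such that $\ell \le \gamma\,\mathcal{A}(\mathcal{P}_i)/\mathcal{L}(\mathcal{P}_i)^2 \le u$ for all $i$, for two given bounds $\ell$ and $u$.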
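The seat-vote, efficiency gap and partisan bias objectives above can be prototyped on a plan given as a list of per-district vote totals (a, b). This is a sketch under stated assumptions, not necessarily the exact formulas intended in the text: wasted votes follow the usual convention (the loser wastes all of its votes, the winner those above half of the two-party total), ties go to Party A, the seat-vote check assumes the classical form σ_A/σ_B ≈ (ν_A/ν_B)^β and reports the ratio of the two sides, and partisan bias uses a *uniform swing* population-shift model, which is one common choice among several. All function names are hypothetical.

```python
def _seat_share_a(totals):
    """Fraction of districts won by Party A; ties go to Party A."""
    return sum(1 for a, b in totals if a >= b) / len(totals)


def _vote_share_a(totals):
    """Party A's share of the two-party vote."""
    return sum(a for a, _ in totals) / sum(a + b for a, b in totals)


def efficiency_gap(totals):
    """|wasted votes of A - wasted votes of B| summed over districts (unnormalized).

    Assumed convention: the loser wastes all its votes, the winner wastes the
    votes above half of the district's two-party total.
    """
    diff = 0.0
    for a, b in totals:
        half = (a + b) / 2
        if a >= b:  # A wins: A wastes a - half, B wastes b
            diff += (a - half) - b
        else:       # B wins: A wastes a, B wastes b - half
            diff += a - (b - half)
    return abs(diff)


def seat_vote_ratio(totals, beta):
    """(sigma_A / sigma_B) / (nu_A / nu_B) ** beta for the assumed classical
    seat-vote relation; 1.0 means exact agreement. Returns None when a ratio
    is undefined (sigma_B == 0, nu_B == 0 or nu_A == 0)."""
    sigma_a, nu_a = _seat_share_a(totals), _vote_share_a(totals)
    if sigma_a == 1 or nu_a in (0, 1):
        return None
    return (sigma_a / (1 - sigma_a)) / (nu_a / (1 - nu_a)) ** beta


def partisan_bias_uniform_swing(totals):
    """sigma_A - (1 - sigma_A'), where sigma_A' is A's seat share after a uniform
    swing moves A's statewide vote share from nu_A to 1 - nu_A; 0 means symmetric.

    Assumed shift model: every district's A-share moves by the same amount
    d = 1 - 2 * nu_A, clamped to [0, 1] (clamping can slightly perturb the
    statewide share). Every district must have at least one vote.
    """
    d = 1 - 2 * _vote_share_a(totals)
    swung = []
    for a, b in totals:
        t = a + b
        share = min(1.0, max(0.0, a / t + d))
        swung.append((share * t, (1 - share) * t))
    return _seat_share_a(totals) - (1 - _seat_share_a(swung))


plan = [(55, 45), (55, 45), (10, 90)]
print(efficiency_gap(plan))                         # 110.0
print(round(partisan_bias_uniform_swing(plan), 4))  # 0.3333
```

In the example, Party A wins two of three seats with 40% of the vote, but under the swung 60%/40% split Party B would win only one of three, so the sketch reports a bias of about 1/3 in Party A's favour.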
In addition to what is discussed above, there are other constraints and optimization criteria, such as responsiveness (also called swing ratio), equal vote weight and declination, that we do not discuss; the reader is referred to references such as [34, 20, 3] for informal discussions of them.
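Returning to the Polsby-Popper compactness measure described above: for rectilinear districts made of unit grid cells it is easy to evaluate, since the area is the number of cells and the perimeter is the number of unit cell sides not shared with another cell of the same district. The sketch assumes the standard Polsby-Popper constant 4π (with which a disc would score 1), whereas the text leaves the constant as a parameter, and it assumes the two bounds act as a lower and an upper bound on every district's score. Function names are hypothetical.

```python
import math


def polsby_popper_cells(cells, gamma=4 * math.pi):
    """gamma * area / perimeter**2 for a district given as unit grid cells (row, col).

    gamma defaults to 4*pi (assumed); with it a disc would score exactly 1.
    """
    cells = set(cells)
    area = len(cells)
    perimeter = 0
    for (i, j) in cells:
        for neighbour in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if neighbour not in cells:
                perimeter += 1  # this unit side lies on the district boundary
    return gamma * area / perimeter ** 2


def all_districts_compact(districts, lower, upper, gamma=4 * math.pi):
    """True if every district's score lies in [lower, upper] (assumed form)."""
    return all(lower <= polsby_popper_cells(d, gamma) <= upper for d in districts)


square = [(0, 0), (0, 1), (1, 0), (1, 1)]  # area 4, perimeter 8
strip = [(0, 0), (0, 1), (0, 2), (0, 3)]   # area 4, perimeter 10
print(round(polsby_popper_cells(square), 3))  # 0.785 (= pi/4)
print(round(polsby_popper_cells(strip), 3))   # 0.503
```

The elongated strip scores lower than the square of the same area, which is exactly the behaviour the measure is meant to capture.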
2.4 Prior relevant computational complexity research
To our knowledge, the most relevant prior non-trivial computational complexity article (i.e., on approximation hardness, approximation algorithms, etc.) regarding gerrymandering is [6]. The article [6] dealt exclusively with the efficiency gap measure, and provided some non-trivial approximation hardness results and approximation algorithms, in addition to designing and implementing a practical algorithm for this case that works well on real maps. In the terminology of this article, [6] showed that minimization of the efficiency gap measure for rectilinear polygonal maps with coarse-grain inputs and strict partitioning criteria does not admit any non-trivial polynomial-time approximation in the worst case, but does admit polynomial-time approximation algorithms when further constraints are added to the problem. In addition, [6] and [30, p. 853] also observed that .
3 Our computational complexity results
Before stating our technical results, we remind the reader of the following obvious but important observations. Consider the following combinations for a pair $(X, Y)$ of input types:

$X$ is a rectilinear polygonal input and $Y$ is an arbitrary polygonal input (equivalently, a planar graph), or

$X$ is a fine or ultrafine granular input and $Y$ is a coarse input.
Then, the following statements hold:

Any computational hardness result for $X$ also implies the same result for $Y$.

Any approximation or exact algorithmic result for $Y$ also implies the same result for $X$.
In the statements of our theorems or lemmas, we will use the following convention. $\kappa$ will denote the number of districts. For polygonal maps (resp., planar graph maps), $\mathcal{P}$ (resp., $G$) will denote the polygon as a collection of all cells (resp., the graph), and $\mathcal{P}_1, \dots, \mathcal{P}_\kappa$ (resp., $G_1, \dots, G_\kappa$) will denote an arbitrary valid (not necessarily optimal) solution. Since every US state has a valid current districting partition (sometimes subject to litigation), we assume that our problem already has at least one valid (but not necessarily optimal) solution that can be found in polynomial time (thus, for example, for our computational hardness results we are required to exhibit a valid solution computable in polynomial time).
In the following two subsections, we state our computational complexity results along with some relevant discussion, leaving the actual proofs to Sections 4–6.
3.1 Rectilinear polygonal coarse granularity input
Theorem 1 (Hardness of seat-vote equation computation).
Let be two arbitrary finite rational numbers, and be any two constants arbitrarily close to and , respectively. Suppose that we are allowed a (reasonably loose) additive approximate strict partitioning criteria (i.e., the partitioning satisfies ).
(a) (Hardness when ). It is NP-hard to compute an approximation of the seat-vote-equation optimization problem.
(b) (Hardness when ). Let for some two integers and . Then, it is NP-hard to distinguish between the following two cases:


if the seat-vote-equation has an approximation where

or, if the seat-vote-equation has an approximation where .
Moreover, a valid solution that is a approximation always exists, irrespective of which definition of an approximately strict partitioning criterion is used.
Remark 1.
The hardness result in (b) is tight if since a we have a approximation. For there is a factor gap between the two bounds that may be worthy of further investigation. Note that .
Chatterjee et al. [6] showed that the efficiency gap computation does not admit any non-trivial approximation at all under the strict partitioning criterion if the input is given at the rectilinear polygonal coarse granularity level. The following theorem shows that the same result holds even if the strict partitioning criterion is relaxed arbitrarily.
Theorem 2 (Hardness of efficiency gap computation).
Let be any two numbers. Then, it is NP-hard to compute an approximation of even when we are allowed to use an approximate equipartition of .
3.2 Arbitrary polygonal fine granularity input
For this case, it is clearer to present our proofs if we assume the planar graph format of our input, i.e., our input is a planar graph whose nodes are the cells, and whose edges connect pairs of cells that share a portion of the boundary of nonzero measure.
Chatterjee et al. [6] left open the complexity of the efficiency gap computation at the fine granularity level of inputs using either exact or approximate partitioning criteria. Here we show that computing the efficiency gap is NP-complete for arbitrary polygonal fine granularity input, even under approximately strict partitioning criteria.
Theorem 3 (Hardness of efficiency gap computation).
Computing the efficiency gap is NP-complete even when we are allowed to use an approximate equipartition of the map, for any constant approximation parameter.
Remark 2.
The hardness reduction in Theorem 3 does not provide any non-trivial inapproximability ratio. In fact, for the specific hard instances of the gerrymandering problem constructed in the proof of Theorem 3, it is possible to design a polynomial-time approximation scheme (PTAS) for the efficiency gap computation using the approach in [2]; the proof of such a PTAS is relatively straightforward, and therefore we do not provide an explicit proof.
3.3 What do the results and proofs in Theorem 1 and Theorem 3 imply in the context of gerrymandering in the US?
Our results are computational hardness results, so one obvious question concerns the implications of these results and the associated proofs for gerrymandering in the US. To this effect, we offer the following motivations and insights that might be of independent interest.

 On following the seat-vote equation:

Theorem 1 indicates that efficient computation of even a modest approximation to the seat-vote equation may be difficult. Thus, unless further research indicates otherwise, it may not be a good idea to closely follow the seat-vote equation for computationally efficient elimination of gerrymandering (fortunately, many courts also do not recommend following the seat-vote proportion too closely, though not for computational complexity reasons).
 On relaxing the exact equipartition criteria:

Relaxing the exact equipartition criteria even beyond the percentage margin that has traditionally been allowed by courts does not seem to make the removal of gerrymandering computationally any easier.
 On accurate census data at the fine granularity level:

Accurate census data at the fine granularity level may make a difference to an independent commission seeking fair districts (such as in California). As stated in Remark 2, while it is difficult to even approximately optimize the absolute difference of the wasted votes at the coarse granularity level of inputs, the situation at the fine granularity level of inputs may not be so hopeless.
 On cracking and packing: how far can one push?

It is well known that cracking and packing may result in large partisan bias. For example, based on election data for the election of the (federal) House of Representatives in the state of Virginia, the Democratic party had a normalized vote count of about % but, due to cracking/packing, held only of the house seats [35, 36]. This observation, coupled with the knowledge that Virginia is one of the most gerrymandered states in the US at both the congressional and state levels [38], leads to the following natural question: “could the Virginia lawmakers have disadvantaged the Democratic party more by an even more careful execution of cracking and packing?” As one lawmaker put it quite bluntly, they would have liked to gerrymander more if only they could.
We believe a partial answer to this is provided by the proof structures of Theorems 2 and 3. A careful inspection of the proofs of Theorems 2 and 3 reveals that they do use cracking and packing [Footnote 9: For example, packing is used in the proof of Theorem 3 when a node with extra supporters for Party A is packed in the same district with three nodes, each having extra supporters for Party B (see Fig. 4).] to create hard instances of the efficiency gap minimization problem that are computationally intractable to solve optimally, certainly at the coarse granularity input level and even at the fine granularity input level [Footnote 10: The proofs of Theorems 2 and 3, however, do not make much use of hijacking or kidnapping.]. Perhaps computational complexity issues did save the Democratic party from further electoral disadvantages.
4 Proof of Theorem 1
(a) We reduce from the NP-complete PARTITION problem [13], which is defined as follows:
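given a set $\mathcal{X} = \{x_1, \dots, x_n\}$ of $n$ positive integers, decide if there exists a subset $\mathcal{X}' \subset \mathcal{X}$ such that $\sum_{x \in \mathcal{X}'} x = \sum_{x \in \mathcal{X} \setminus \mathcal{X}'} x = W/2$, where $W = \sum_{i=1}^{n} x_i$ is an even number.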
Note that we can assume without loss of generality that is sufficiently large, and each of is a multiple of any fixed positive integer (in particular, multiple of ), , no two integers in are equal and .
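For experimenting with small instances of the reduction, a PARTITION instance can be decided with the textbook subset-sum dynamic program, sketched below with a Python integer used as a bitset. Its running time grows with the magnitude of the integers rather than only with their bit length (it is pseudo-polynomial), so it does not contradict the NP-completeness of PARTITION. The function name `has_partition` is illustrative.

```python
def has_partition(nums):
    """Decide PARTITION: can the positive integers in `nums` be split into two
    parts with equal sums? Classic subset-sum DP; bit s of `reachable` is set
    iff some subset of the numbers processed so far sums to s."""
    total = sum(nums)
    if total % 2:
        return False  # an odd total W can never be split evenly
    reachable = 1  # only the empty sum 0 is reachable initially
    for x in nums:
        reachable |= reachable << x  # either skip x or add it to every reachable sum
    return bool((reachable >> (total // 2)) & 1)


print(has_partition([3, 1, 1, 2, 2, 1]))  # True: 3 + 2 = 1 + 1 + 2 + 1
print(has_partition([2, 4, 10]))          # False
```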
Proof for .
Multiplying and by , and denoting them by the same notations we can therefore assume that the minimum absolute difference between any two distinct numbers in is at least and . Our rectilinear polygon is a rectangle of size (see Fig. 3 (a)) with the following numbers for various cells:
Note that:


.

.

.
First, as required, we show that has a valid solution satisfying all the constraints. Consider the following solution (refer to Fig. 3 (b)):
We can now verify the following:
and thus the partitioning constraint is satisfied since . Since , the proof is complete once the following claims are shown.
 (completeness)

If the PARTITION problem has a solution then .
 (soundness)

If the PARTITION problem does not have a solution then .
Proof of completeness (refer to Fig. 3 (c))
Suppose that there is a valid solution of PARTITION and consider the two polygons
One can now verify the following:


, , and thus the partitioning constraint is satisfied since .

, , and thus since .

, , and thus since .
Proof of soundness
Let and be the two partitions in any valid solution of the redistricting problem. For convenience, let us define the following sets:
The following chain of arguments proves the desired claim.

 (i)

Both the cells in cannot be together in the same partition, say , with any cell, say , from since in that case
 (ii)

At least one of and must be empty. To see this, assume that both are nonempty. By (i), we may suppose that and . Since the PARTITION problem does not have a solution, . Assume, without loss of generality, that . Then, , and therefore , thus violating the partitioning constraints.
 (iii)

Since both and cannot be empty, by (ii) assume that but . Then, by (i), both and are in . We can now verify that as follows:

, , and thus .

, , and thus .

Proof for .
Let for some two integers and . For this case, we will use copies, say , of the rectangle used for the previous case, connected via connector cells, say , plus one or two additional cells, say and , depending on whether the value of is or , respectively (refer to Fig. 3 (d)). We now multiply and by , and, again denoting them by the same notations, we can therefore assume that the minimum absolute difference between any two distinct numbers in is at least and . We assign the required numbers to the connector and additional cells as follows: and for all . Letting denote the actual number of connector cells, we now have the following updated calculations:
Claim 1.
Any of the connector or additional cells cannot appear in the same partition with a cell from for any .
Proof. Suppose that the connector cell is together with at least one of two cells from in a partition, say . Then, . Note that
and thus there exists a partition , , such that . Consequently, it follows that
which violates the partitioning constraint. ❑
It is possible to generalize the proof for to . Intuitively, if there is a solution to the PARTITION problem then one of the two seats in each copy is won by Party A but otherwise Party A wins no seat at all. The correspondingly modified completeness and soundness claims are as follows:
 (completeness for )

If the PARTITION problem has a solution then .
 (soundness for )

If the PARTITION problem does not have a solution then .
(b) We can use a proof similar to that in (a) for , but we need to change some of the numbers. More precisely, the cell in the very first copy has the following new number (instead of the previous value of ) corresponding to the total number of voters for Party A: where is the smallest integer such that . Note that since . A relevant calculation is:
and therefore , as required. The only difference in the proofs comes from the fact that now, in the first copy, Party A always wins one seat by default but wins two seats if PARTITION has a solution. The correspondingly modified completeness and soundness claims are as follows:
 (modified completeness claim for )

If the PARTITION problem has a solution then .
 (modified soundness claim for )

If the PARTITION problem does not have a solution then .
To see that these completeness and soundness claims indeed prove the desired bounds, note the following:


and .

If and , then , and thus this gives a approximation since .

If and , then , and thus this gives a approximation since .

For any , if then and thus this gives a approximation.
For the existence of a approximation when , note that for any valid solution for , , and thus there must exist a district such that .
5 Proof sketch of Theorem 2
The proof is obtained by carefully modifying the proof of Theorem 4 in [6] in the following manner:


We remove all cells with zero population. As a result, the rectangle in [6] now becomes a rectilinear polygon (without holes).

We multiply all the nonzero values of ’s and ’s by . It is possible to verify that as a result the following claim holds:
for any two districts and , implies either or .
This ensures that for any valid partition of the rectilinear polygon.

The new soundness and completeness claims now become as follows:
 (soundness)

If the PARTITION problem does not have a solution then .
 (completeness)

If the PARTITION problem has a solution then .
where is exactly as defined in [6]
6 Proof of Theorem 3
The problem is trivially in NP, so wil