undefined | Better HN

0 pointstzs8y ago0 comments

(Note: this comment contains spoilers for the original problem and for my proposed variation)

(Note: I'm going to use Python3-based pseudo code, except I'm going to assume 1-based array numbers, so the input is in a[1], a[2], a[3], ..., a[N].)

Suppose the input input list has N integers. Then the smallest positive integer not in the list must be one of 1, 2, 3, ..., N+1. (Prove by noting that all positive integers less than the smallest missing positive integer must be in the list, and then apply the pigeonhole principle).

In the classic problem, where you can use the input list as scratch memory and do not have to restore it, there are two approaches I've seen.

1. The sorting/permuting approach. With this approach you go through the list, and for a[k] you do

  while 1 <= a[k] <= N and a[a[k]] != a[k]:
    a[k], a[a[k]] = a[a[k]], a[k]

The result is that if a 1 is in the list, a[1] ends up with 1. If 2 is in the list, a[2] ends up with 2, and so on. Then you can scan the list looking for the first k such that a[k] != k, and k is the smallest missing positive. If a[k] == k for all k, then k+1 is the fist missing positive.

If one takes this approach, I don't see much hope in restoring the input back to its original state after identifying the missing integer.

2. The flagging approach. Conceptually you have an array, b, of N Boolean values, initialized to all False, and do this:

  for k in range(1,N+1):
    if 1 <= a[k] <= N:
      b[k] = True

Then you scan b looking for the first a[k] that is False. k is the missing integer. If all of is True, N+1 is the missing integer.

Note that if all of the input integers were known to be positive, we could use the sign bits of a[] as out b[].

  for k in range(1,N+1):
    if 1 <= abs(a[k]) <= N:
      a[a[k]] = -abs(a[a[k]])

(The abs is there on the assignment to deal with duplicate values in the input).

Then we can scan a[] looking for the first positive a[k], and k is the missing integer, or N+1 if all a[k] < 0.

In this special case, where we know all of the input is positive, we can restore the input back to its original value after finding the missing by simply setting a[k] = abs(a[k]) for each k.

The flagging using sign bit approach also works if not all the input is positive. Just add this at the start:

  for k in range(1,N+1):
    if a[k] <= 0:
      a[k] = N+2

(If N+2 ≥ the maximum positive integer your language allows, then you are going to have to do some special cases that I'm not going to go into).

But now we cannot reverse this after we find the number. If we had a way to get rid of the non-positive entries that we could later reverse, we'd be good. One approach would be to find the most negative value, m, in the list, and then add abs(m) + 1 to everything. Then we could do the prior positive-only sign flagging approach, remembering that when we want the value at position k it is a[k] - m - 1, not a[k]. At the end, we then just subtract m + 1 from a[].

This works as long as M + m + 1, where M is the most positive value in the list, is not greater than the maximum positive integer.

Here's some working Python3 code for this (so arrays are back to 0-based, and so there are "- 1"s here and there):

  def firstMissingPositive(a):
      s = min(a)
      if s > 1:
          return 1
      s = abs(s) + 1
      for i in range(len(a)):
          a[i] += s
      for i in range(len(a)):
          v = abs(a[i]) - s
          if 1 <= v <= len(a):
              a[v-1] = -abs(a[v-1])
      missing = len(a) + 1
      for i in range(len(a)):
          if a[i] > 0:
              missing = i+1
              break
      for i in range(len(a)):
          a[i] = abs(a[i]) - s
      return missing

For Python3, which automatically does big integers, so M + n + 1 will not overflow, this should always solve my proposed restoring variant of the problem.

For systems where that can overflow, we need another approach. I've had some ideas, but they all would also run into overflow problems, so I'm currently stumped.

I think this subproblem is interesting enough on its own to state as a separate problem:

Write a function that given a list S of integers, returns an invertible integer function f and its inverse f', such that for all k in S, f(k) exists, is > 0, and INT_MIN <= -f(k) <= f(k) <= INT_MAX where INT_MIN and INT_MAX are the smallest and largest integers, respectively, your language handles. This should run in O(N) or better time and O(1) space.

Then to solve the restoring variant of missing integer, find f for your input, apply f to the input, and then do the positive-only flagging version of missing integer (using f' to get to the original values of a[]). Use f' to restore the input at the end.

This can be further generalized, because flagging by sign is not the only way to flag. For instance, if we happened to know that all of the input was even, we could use a[k] |= 1 to flag it, and we could clear the flag at the end with a[k] &= ~1. If not all of the input is even, we could start with a pass of a[k] *= 2 and undo it at the end with a[k] //= 2.

That gives us this problem to contemplate. Write a function that given a list S of integers, returns using O(N) time and O(1) space an invertible integer function f and its inverse f', invertible integer function m and its inverse m', and integer function t, such that.

1. For all k in S, f(k) exists

2. For all f(k), m(f(k)) exists

3. t(f(k)) = 0, t(m(f(k)) = 1

f, f', m, m', and t should all be O(1) time and O(1) space.

def firstMissingPositive(a): s = min(a) if s > 1: return 1 s = abs(s) + 1 for i in range(len(a)): a[i] += s for i in range(len(a)): v = abs(a[i]) - s if 1 <= v <= len(a): a[v-1] = -abs(a[v-1]) missing = len(a) + 1 for i in range(len(a)): if a[i] > 0: missing = i+1 break for i in range(len(a)): a[i] = abs(a[i]) - s return missing

0 comments

1 comments · 1 top-level

JoachimSchipper8y ago

I was indeed thinking of (1), albeit just using radix sort - your permutation trick is nicer.

But (2) doesn't work in general, because information theory may not allow storing more information in the input array than is already there; I'm convinced myself, but I haven't come up with a better argument than "obvious" in the time I had available. ;-)

j / k navigate · click thread line to collapse