8 Binary Search

9 Python에서 이진 검색을 수행하는 방법

Python에서 이진 검색을 수행하는 방법 기사를 보완하는 코드 스니펫입니다.

9.1 사용법

9.1.1 샘플 데이터셋 다운로드

벤치마킹용 샘플 데이터셋을 다운로드하려면 다음 스크립트를 실행하십시오.

$ python download_imdb.py
IMDb에서 데이터 가져오기...
"names.txt" 및 "sorted_names.txt" 생성됨

9.1.2 벤치마크

참고: 해시 기반 검색 알고리즘은 별도의 데이터 구조가 필요하므로 여기에 포함되지 않았습니다. 자유롭게 연습 문제로 취급하십시오.

임의 검색을 사용하여 정렬되지 않은 이름에서 Arnold Schwarzenegger의 인덱스 찾기:

$ python benchmark.py -a random -f names.txt 'Arnold Schwarzenegger'
이름 로드 중... 확인
[1/10] 인덱스 215에서 찾음 (21.68초)
[2/10] 인덱스 215에서 찾음 (148.10밀리초)
[3/10] 인덱스 215에서 찾음 (63.49초)
[4/10] 인덱스 215에서 찾음 (15.93초)
[5/10] 인덱스 215에서 찾음 (23.38초)
[6/10] 인덱스 215에서 찾음 (60.98초)
[7/10] 인덱스 215에서 찾음 (10.14초)
[8/10] 인덱스 215에서 찾음 (15.81초)
[9/10] 인덱스 215에서 찾음 (5.10초)
[10/10] 인덱스 215에서 찾음 (13.36초)
최고=148.10밀리초, 최악=63.49초, 평균=23.00초, 중앙값=15.87초

선형 검색을 사용하여 정렬되지 않은 이름에서 Arnold Schwarzenegger의 인덱스 찾기:

$ python benchmark.py -a linear -f names.txt 'Arnold Schwarzenegger'
이름 로드 중... 확인
[1/10] 인덱스 215에서 찾음 (30.95마이크로초)
[2/10] 인덱스 215에서 찾음 (26.86마이크로초)
[3/10] 인덱스 215에서 찾음 (26.26마이크로초)
[4/10] 인덱스 215에서 찾음 (26.55마이크로초)
[5/10] 인덱스 215에서 찾음 (26.24마이크로초)
[6/10] 인덱스 215에서 찾음 (25.88마이크로초)
[7/10] 인덱스 215에서 찾음 (25.77마이크로초)
[8/10] 인덱스 215에서 찾음 (25.89마이크로초)
[9/10] 인덱스 215에서 찾음 (25.91마이크로초)
[10/10] 인덱스 215에서 찾음 (25.99마이크로초)
최고=25.77마이크로초, 최악=30.95마이크로초, 평균=26.63마이크로초, 중앙값=26.11마이크로초

정렬된 이름에서 이진 검색을 사용하여 Arnold Schwarzenegger의 인덱스 찾기:

$ python benchmark.py -a binary -f sorted_names.txt 'Arnold Schwarzenegger'
이름 로드 중... 확인
[1/10] 인덱스 782431에서 찾음 (18.82마이크로초)
[2/10] 인덱스 782431에서 찾음 (9.19마이크로초)
[3/10] 인덱스 782431에서 찾음 (11.08마이크로초)
[4/10] 인덱스 782431에서 찾음 (9.70마이크로초)
[5/10] 인덱스 782431에서 찾음 (10.14마이크로초)
[6/10] 인덱스 782431에서 찾음 (9.35마이크로초)
[7/10] 인덱스 782431에서 찾음 (9.42마이크로초)
[8/10] 인덱스 782431에서 찾음 (8.79마이크로초)
[9/10] 인덱스 782431에서 찾음 (8.66마이크로초)
[10/10] 인덱스 782431에서 찾음 (8.79마이크로초)
최고=8.66마이크로초, 최악=18.82마이크로초, 평균=10.39마이크로초, 중앙값=9.38마이크로초

9.2 파일: `benchmark.py`

검색 알고리즘의 성능을 벤치마킹합니다.

요구사항: * 파이썬 3.7+

사용법: $ python benchmark.py -a 무작위 -f names.txt ‘아놀드 슈워제네거’ $ python benchmark.py -a 선형 -f names.txt ‘아놀드 슈워제네거’ $ python benchmark.py -a 바이너리 -f sorted_names.txt ‘아놀드 슈워제네거’

"""
Benchmark the performance of a search algorithm.

Requirements:
 * Python 3.7+

Usage:
$ python benchmark.py -a random -f names.txt 'Arnold Schwarzenegger'
$ python benchmark.py -a linear -f names.txt 'Arnold Schwarzenegger'
$ python benchmark.py -a binary -f sorted_names.txt 'Arnold Schwarzenegger'
"""

import argparse
import time
from statistics import median
from typing import List

from search.binary import find_index as binary_search
from search.linear import find_index as linear_search
from search.random import find_index as random_search


def main(args: argparse.Namespace) -> None:
    """Script entry point."""

    algorithms = {
        "random": random_search,
        "linear": linear_search,
        "binary": binary_search,
    }

    benchmark(
        algorithms[args.algorithm], load_names(args.path), args.search_term
    )


def parse_args() -> argparse.Namespace:
    """Parse command line arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-a", "--algorithm", choices=("random", "linear", "binary")
    )
    parser.add_argument("-f", "--file", dest="path")
    parser.add_argument("search_term")
    return parser.parse_args()


def load_names(path: str) -> List[str]:
    """Return a list of names from the given file."""
    print("Loading names...", end="", flush=True)
    with open(path) as text_file:
        names = text_file.read().splitlines()
        print("ok")
        return names


def convert(nano: int) -> str:
    """Convert nano seconds to a formatted string."""

    kilo, mega, giga = 1e3, 1e6, 1e9

    if nano < kilo:
        return f"{nano} ns"

    if nano < mega:
        return f"{nano / kilo:.2f} µs"

    if nano < giga:
        return f"{nano / mega:.2f} ms"

    return f"{nano / giga:.2f} s"


def benchmark(
    algorithm, elements: List[str], value: str, repeat: int = 10
) -> None:
    """Search for a value in elements using the given algorithm."""

    times: List[int] = []
    for i in range(repeat):
        print(f"[{i + 1}/{repeat}] Searching...", end="", flush=True)
        start_time = time.perf_counter_ns()
        index = algorithm(elements, value)
        elapsed_time = time.perf_counter_ns() - start_time
        times.append(elapsed_time)
        print("\b" * 12, end="")
        if index is None:
            print(f"Not found ({convert(elapsed_time)})")
        else:
            print(f"Found at index={index} ({convert(elapsed_time)})")

    print(
        f"best={convert(min(times))}",
        f"worst={convert(max(times))}",
        f"avg={convert(int(sum(times) / len(times)))}",
        f"median={convert(int(median(times)))}",
        sep=", ",
    )


if __name__ == "__main__":
    try:
        main(parse_args())
    except KeyboardInterrupt:
        print("Aborted")

9.3 파일: `download_imdb.py`

IMDb에서 사람 이름을 가져와서 구문 분석합니다.

사용법: $ 파이썬 다운로드_imdb.py

#!/usr/bin/env python

"""
Fetch and parse people names from the IMDb.

Usage:
$ python download_imdb.py
"""

import csv
import gzip
import shutil
import tempfile
import urllib.request


def main():
    """Script entry point."""

    print("Fetching data from IMDb...")

    with open("names.txt", "w", encoding="utf-8") as destination:
        destination.writelines(names())

    with (
        open("names.txt", encoding="utf-8") as source,
        open("sorted_names.txt", "w", encoding="utf-8") as destination,
    ):
        destination.writelines(sorted(source.readlines()))

    print('Created "names.txt" and "sorted_names.txt"')


def names():
    """Return a generator of names with a trailing newline."""
    url = "https://datasets.imdbws.com/name.basics.tsv.gz"
    with urllib.request.urlopen(url) as response:
        with tempfile.NamedTemporaryFile(mode="w+b") as archive:
            shutil.copyfileobj(response, archive)
            archive.seek(0)
            with gzip.open(archive, mode="rt", encoding="utf-8") as tsv_file:
                tsv = csv.reader(tsv_file, delimiter="\t")
                next(tsv)  # Skip the header
                for record in tsv:
                    full_name = record[1]
                    yield f"{full_name}\n"


if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt:
        print("Aborted")

9.4 파일: `search/init.py`

from typing import Callable, TypeVar, Union

T = TypeVar("T")
S = TypeVar("S")

Key = Callable[[T], Union[T, S]]


def identity(element: T) -> Union[T, S]:
    """Identity function serving as a default key provider."""
    return element

9.5 파일: `search/binary.py`

이진 검색 알고리즘.

"""
The binary search algorithm.
"""

from typing import Optional, Sequence, Set

from search import Key, S, T, identity


def find_index(
    elements: Sequence[T], value: S, key: Key = identity
) -> Optional[int]:
    """Return the index of value in elements or None."""

    left, right = 0, len(elements) - 1

    while left <= right:
        middle = (left + right) // 2

        middle_element = key(elements[middle])

        if middle_element == value:
            return middle

        if middle_element < value:
            left = middle + 1
        elif middle_element > value:
            right = middle - 1

    return None


def find_leftmost_index(
    elements: Sequence[T], value: S, key: Key = identity
) -> Optional[int]:
    """Return the leftmost index of value in elements or None."""

    index = find_index(elements, value, key)

    if index is not None:
        while index >= 0 and key(elements[index]) == value:
            index -= 1
        index += 1

    return index


def find_rightmost_index(
    elements: Sequence[T], value: S, key: Key = identity
) -> Optional[int]:
    """Return the rightmost index of value in elements or None."""

    index = find_index(elements, value, key)

    if index is not None:
        while index < len(elements) and key(elements[index]) == value:
            index += 1
        index -= 1

    return index


def find_all_indices(
    elements: Sequence[T], value: S, key: Key = identity
) -> Set[int]:
    """Return a set of indices of elements with matching key."""

    left = find_leftmost_index(elements, value, key)
    right = find_rightmost_index(elements, value, key)

    if left and right:
        return set(range(left, right + 1))

    return set()


def find(elements: Sequence[T], value: S, key: Key = identity) -> Optional[T]:
    """Return an element with matching key or None."""
    return _get(elements, find_index(elements, value, key))


def find_leftmost(
    elements: Sequence[T], value: S, key: Key = identity
) -> Optional[T]:
    """Return the leftmost element or None."""
    return _get(elements, find_leftmost_index(elements, value, key))


def find_rightmost(
    elements: Sequence[T], value: S, key: Key = identity
) -> Optional[T]:
    """Return the rightmost element or None."""
    return _get(elements, find_rightmost_index(elements, value, key))


def find_all(elements: Sequence[T], value: S, key: Key = identity) -> Set[T]:
    """Return a set of elements with matching key."""
    return {elements[i] for i in find_all_indices(elements, value, key)}


def contains(elements: Sequence[T], value: S, key: Key = identity) -> bool:
    """Return True if value is present in elements."""
    return find_index(elements, value, key) is not None


def _get(elements: Sequence[T], index: Optional[int]) -> Optional[T]:
    """Return element at the given index or None."""
    return None if index is None else elements[index]

9.6 파일: `search/linear.py`

선형 검색 알고리즘.

"""
The linear search algorithm.
"""

from typing import Optional, Sequence

from search import Key, S, T, identity


def find_index(
    elements: Sequence[T], value: S, key: Key = identity
) -> Optional[int]:
    """Return the index of value in elements or None."""
    for i, element in enumerate(elements):
        if key(element) == value:
            return i
    return None


def find(elements: Sequence[T], value: S, key: Key = identity) -> Optional[T]:
    """Return an element with matching key or None."""
    index = find_index(elements, value, key)
    return None if index is None else elements[index]


def contains(elements: Sequence[T], value: S, key: Key = identity) -> bool:
    """Return True if value is present in elements."""
    return find_index(elements, value, key) is not None

9.7 파일: `search/random.py`

무작위 검색 알고리즘.

"""
The random search algorithm.
"""

import random
from typing import Optional, Sequence, Set

from search import Key, S, T, identity


def find_index(
    elements: Sequence[T], value: S, key: Key = identity
) -> Optional[int]:
    """Return the index of value in elements or None."""
    visited: Set[int] = set()
    while len(visited) < len(elements):
        random_index = random.randint(0, len(elements) - 1)
        visited.add(random_index)
        if key(elements[random_index]) == value:
            return random_index
    return None


def find(elements: Sequence[T], value: S, key: Key = identity) -> Optional[T]:
    """Return an element with matching key or None."""
    index = find_index(elements, value, key)
    return None if index is None else elements[index]


def contains(elements: Sequence[T], value: S, key: Key = identity) -> bool:
    """Return True if value is present in elements."""
    return find_index(elements, value, key) is not None

9 Python에서 이진 검색을 수행하는 방법

9.1 사용법

9.1.1 샘플 데이터셋 다운로드

9.1.2 벤치마크

9.2 파일: benchmark.py

9.3 파일: download_imdb.py

9.4 파일: search/__init__.py

9.5 파일: search/binary.py

9.6 파일: search/linear.py

9.7 파일: search/random.py