# Copyright 2025 The mediapy Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""`mediapy`: Read/write/show images and videos in an IPython/Jupyter notebook.

[**[GitHub source]**](https://github.com/google/mediapy)
[**[API docs]**](https://google.github.io/mediapy/)
[**[PyPI package]**](https://pypi.org/project/mediapy/)
[**[Colab
example]**](https://colab.research.google.com/github/google/mediapy/blob/main/mediapy_examples.ipynb)

See the [example
notebook](https://github.com/google/mediapy/blob/main/mediapy_examples.ipynb),
or better yet, [**open it in
Colab**](https://colab.research.google.com/github/google/mediapy/blob/main/mediapy_examples.ipynb).
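
The examples below assume `numpy` is imported as `np` and that the `mediapy`
functions are in scope (e.g., via `from mediapy import *`; the library is also
commonly imported as `import mediapy as media`). As an illustrative aside (not
part of the original examples), the checkerboard used below can be understood
from the `np.kron` tiling alone, independent of `mediapy`:

```python
import numpy as np

# Sketch: np.kron expands each 0/1 entry of the 32x32 pattern into a 4x4
# block, yielding a 128x128 checkerboard with 4-pixel squares.
pattern = np.array([[0, 1] * 16, [1, 0] * 16] * 16)  # 32x32 alternating 0/1.
checkerboard = np.kron(pattern, np.ones((4, 4)))
print(checkerboard.shape)  # -> (128, 128)
```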

## Image examples

Display an image (2D or 3D `numpy` array):
```python
checkerboard = np.kron([[0, 1] * 16, [1, 0] * 16] * 16, np.ones((4, 4)))
show_image(checkerboard)
```

Read and display an image (either local or from the Web):
```python
IMAGE = 'https://github.com/hhoppe/data/raw/main/image.png'
show_image(read_image(IMAGE))
```

Read and display an image from a local file:
```python
!wget -q -O /tmp/burano.png {IMAGE}
show_image(read_image('/tmp/burano.png'))
```

Show titled images side-by-side:
```python
images = {
    'original': checkerboard,
    'darkened': checkerboard * 0.7,
    'random': np.random.rand(32, 32, 3),
}
show_images(images, vmin=0.0, vmax=1.0, border=True, height=64)
```

Compare two images using an interactive slider:
```python
compare_images([checkerboard, np.random.rand(128, 128, 3)])
```

## Video examples

Display a video (an iterable of images, e.g., a 3D or 4D array):
```python
video = moving_circle((100, 100), num_images=10)
show_video(video, fps=10)
```

Show the video frames side-by-side:
```python
show_images(video, columns=6, border=True, height=64)
```

Show the frames with their indices:
```python
show_images({f'{i}': image for i, image in enumerate(video)}, width=32)
```

Read and display a video (either local or from the Web):
```python
VIDEO = 'https://github.com/hhoppe/data/raw/main/video.mp4'
show_video(read_video(VIDEO))
```

Create and display a looping two-frame GIF video:
```python
image1 = resize_image(np.random.rand(10, 10, 3), (50, 50))
show_video([image1, image1 * 0.8], fps=2, codec='gif')
```

Darken a video frame-by-frame:
```python
output_path = '/tmp/out.mp4'
with VideoReader(VIDEO) as r:
  darken_image = lambda image: to_float01(image) * 0.5
  with VideoWriter(output_path, shape=r.shape, fps=r.fps, bps=r.bps) as w:
    for image in r:
      w.add_image(darken_image(image))
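# Note (explanatory comment, not part of the original example): passing
# shape=r.shape, fps=r.fps, and bps=r.bps from the VideoReader to the
# VideoWriter makes the output match the input video's resolution, frame
# rate, and bitrate; any of these could instead be overridden.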
```
"""

from __future__ import annotations

__docformat__ = 'google'
__version__ = '1.2.6'
__version_info__ = tuple(int(num) for num in __version__.split('.'))

import base64
from collections.abc import Callable, Iterable, Iterator, Mapping, Sequence
import contextlib
import functools
import importlib
import io
import itertools
import math
import numbers
import os  # Package only needed for typing.TYPE_CHECKING.
import pathlib
import re
import shlex
import shutil
import subprocess
import sys
import tempfile
import typing
from typing import Any
import urllib.request
import warnings

import IPython.display
import matplotlib.pyplot
import numpy as np
import numpy.typing as npt
import PIL.Image
import PIL.ImageOps


if not hasattr(PIL.Image, 'Resampling'):  # Allow Pillow<9.0.
  PIL.Image.Resampling = PIL.Image  # type: ignore

# Selected and reordered here for pdoc documentation.
__all__ = [
    'show_image',
    'show_images',
    'compare_images',
    'show_video',
    'show_videos',
    'read_image',
    'write_image',
    'read_video',
    'write_video',
    'VideoReader',
    'VideoWriter',
    'VideoMetadata',
    'compress_image',
    'decompress_image',
    'compress_video',
    'decompress_video',
    'html_from_compressed_image',
    'html_from_compressed_video',
    'resize_image',
    'resize_video',
    'to_rgb',
    'to_type',
    'to_float01',
    'to_uint8',
    'set_output_height',
    'set_max_output_height',
    'color_ramp',
    'moving_circle',
    'set_show_save_dir',
    'set_ffmpeg',
    'video_is_available',
]

if typing.TYPE_CHECKING:
  _ArrayLike = npt.ArrayLike
  _DTypeLike = npt.DTypeLike
  _NDArray = npt.NDArray[Any]
  _DType = np.dtype[Any]
else:
  # Create named types for use in the `pdoc` documentation.
  _ArrayLike = typing.TypeVar('_ArrayLike')
  _DTypeLike = typing.TypeVar('_DTypeLike')
  _NDArray = typing.TypeVar('_NDArray')
  _DType = typing.TypeVar('_DType')  # pylint: disable=invalid-name

_IPYTHON_HTML_SIZE_LIMIT = 10**10  # Unlimited seems to be OK now.
_T = typing.TypeVar('_T')
_Path = typing.Union[str, 'os.PathLike[str]']

_IMAGE_COMPARISON_HTML = """\
<script
  defer
  src="https://unpkg.com/img-comparison-slider@7/dist/index.js"
></script>
<link
  rel="stylesheet"
  href="https://unpkg.com/img-comparison-slider@7/dist/styles.css"
/>

<img-comparison-slider>
  <img slot="first" src="data:image/png;base64,{b64_1}" />
  <img slot="second" src="data:image/png;base64,{b64_2}" />
</img-comparison-slider>
"""

# ** Miscellaneous.


class _Config:
  ffmpeg_name_or_path: _Path = 'ffmpeg'
  show_save_dir: _Path | None = None


_config = _Config()


def _open(path: _Path, *args: Any, **kwargs: Any) -> Any:
  """Opens the file; this is a hook for the built-in `open()`."""
  return open(path, *args, **kwargs)


def _path_is_local(path: _Path) -> bool:
  """Returns True if the path is in the filesystem accessible by `ffmpeg`."""
  del path
  return True


def _search_for_ffmpeg_path() -> str | None:
  """Returns a path to the ffmpeg program, or None if not found."""
  if filename := shutil.which(_config.ffmpeg_name_or_path):
    return str(filename)
  return None


def _print_err(*args: str, **kwargs: Any) -> None:
  """Prints arguments to stderr immediately."""
  kwargs = {**dict(file=sys.stderr, flush=True), **kwargs}
  print(*args, **kwargs)


def _chunked(
    iterable: Iterable[_T], n: int | None = None
) -> Iterator[tuple[_T, ...]]:
  """Returns elements collected as tuples of length at most `n` if not None."""

  def take(n: int | None, iterable: Iterable[_T]) -> tuple[_T, ...]:
    return tuple(itertools.islice(iterable, n))

  return iter(functools.partial(take, n, iter(iterable)), ())


def _peek_first(iterator: Iterable[_T]) -> tuple[_T, Iterable[_T]]:
  """Given an iterator, returns first element and re-initialized iterator.

  >>> first_image, images = _peek_first(moving_circle())

  Args:
    iterator: An input iterator or iterable.

  Returns:
    A tuple (first_element, iterator_reinitialized) containing:
    first_element: The first element of the input.
    iterator_reinitialized: A clone of the original iterator/iterable.
  """
  # Inspired from https://stackoverflow.com/a/12059829/1190077
  peeker, iterator_reinitialized = itertools.tee(iterator)
  first = next(peeker)
  return first, iterator_reinitialized


def _check_2d_shape(shape: tuple[int, int]) -> None:
  """Checks that `shape` is of the form (height, width) with two integers."""
  if len(shape) != 2:
    raise ValueError(f'Shape {shape} is not of the form (height, width).')
  if not all(isinstance(i, numbers.Integral) for i in shape):
    raise ValueError(f'Shape {shape} contains non-integers.')


def _run(args: str | Sequence[str]) -> None:
  """Executes command, printing output from stdout and stderr.

  Args:
    args: Command to execute, which can be either a string or a sequence of word
      strings, as in `subprocess.run()`.  If `args` is a string, the shell is
      invoked to interpret it.

  Raises:
    RuntimeError: If the command's exit code is nonzero.
  """
  proc = subprocess.run(
      args,
      shell=isinstance(args, str),
      stdout=subprocess.PIPE,
      stderr=subprocess.STDOUT,
      check=False,
      universal_newlines=True,
  )
  print(proc.stdout, end='', flush=True)
  if proc.returncode:
    raise RuntimeError(
        f"Command '{proc.args}' failed with code {proc.returncode}."
    )


def _display_html(text: str, /) -> None:
  """In a Jupyter notebook, display the HTML `text`."""
  IPython.display.display(IPython.display.HTML(text))  # type: ignore


def set_ffmpeg(name_or_path: _Path) -> None:
  """Specifies the name or path for the `ffmpeg` external program.

  The `ffmpeg` program is required for compressing and decompressing video.
  (It is used in `read_video`, `write_video`, `show_video`, `show_videos`,
  etc.)

  Args:
    name_or_path: Either a filename within a directory of `os.environ['PATH']`
      or a filepath.  The default setting is 'ffmpeg'.
  """
  _config.ffmpeg_name_or_path = name_or_path


def set_output_height(num_pixels: int) -> None:
  """Overrides the height of the current output cell, if using Colab."""
  try:
    # We want to fail gracefully for non-Colab IPython notebooks.
    output = importlib.import_module('google.colab.output')
    s = f'google.colab.output.setIframeHeight("{num_pixels}px")'
    output.eval_js(s)
  except (ModuleNotFoundError, AttributeError):
    pass


def set_max_output_height(num_pixels: int) -> None:
  """Sets the maximum height of the current output cell, if using Colab."""
  try:
    # We want to fail gracefully for non-Colab IPython notebooks.
    output = importlib.import_module('google.colab.output')
    s = (
        'google.colab.output.setIframeHeight('
        f'0, true, {{maxHeight: {num_pixels}}})'
    )
    output.eval_js(s)
  except (ModuleNotFoundError, AttributeError):
    pass


# ** Type conversions.


def _as_valid_media_type(dtype: _DTypeLike) -> _DType:
  """Returns validated media data type."""
  dtype = np.dtype(dtype)
  if not issubclass(dtype.type, (np.unsignedinteger, np.floating)):
    raise ValueError(
        f'Type {dtype} is not a valid media data type (uint or float).'
    )
  return dtype


def _as_valid_media_array(x: _ArrayLike) -> _NDArray:
  """Converts to ndarray (if not already), and checks validity of data type."""
  a = np.asarray(x)
  if a.dtype == bool:
    a = a.astype(np.uint8) * np.iinfo(np.uint8).max
  _as_valid_media_type(a.dtype)
  return a


def to_type(array: _ArrayLike, dtype: _DTypeLike) -> _NDArray:
  """Returns media array converted to specified type.

  A "media array" is one in which the dtype is either a floating-point type
  (np.float32 or np.float64) or an unsigned integer type.  The array values are
  assumed to lie in the range [0.0, 1.0] for floating-point values, and in the
  full range for unsigned integers, e.g., [0, 255] for np.uint8.

  Conversion between integers and floats maps uint(0) to 0.0 and uint(MAX) to
  1.0.  The input array may also be of type bool, whereby True maps to
  uint(MAX) or 1.0.  The values are scaled and clamped as appropriate during
  type conversions.

  Args:
    array: Input array-like object (floating-point, unsigned int, or bool).
    dtype: Desired output type (floating-point or unsigned int).

  Returns:
    Array `a` if it is already of the specified dtype, else a converted array.
  """
  a = np.asarray(array)
  dtype = np.dtype(dtype)
  del array
  if a.dtype != bool:
    _as_valid_media_type(a.dtype)  # Verify that 'a' has a valid dtype.
  if a.dtype == bool:
    result = a.astype(dtype)
    if np.issubdtype(dtype, np.unsignedinteger):
      result = result * dtype.type(np.iinfo(dtype).max)
  elif a.dtype == dtype:
    result = a
  elif np.issubdtype(dtype, np.unsignedinteger):
    if np.issubdtype(a.dtype, np.unsignedinteger):
      src_max: float = np.iinfo(a.dtype).max
    else:
      a = np.clip(a, 0.0, 1.0)
      src_max = 1.0
    dst_max = np.iinfo(dtype).max
    if dst_max <= np.iinfo(np.uint16).max:
      scale = np.array(dst_max / src_max, dtype=np.float32)
      result = (a * scale + 0.5).astype(dtype)
    elif dst_max <= np.iinfo(np.uint32).max:
      result = (a.astype(np.float64) * (dst_max / src_max) + 0.5).astype(dtype)
    else:
      # https://stackoverflow.com/a/66306123/
      a = a.astype(np.float64) * (dst_max / src_max) + 0.5
      dst = np.atleast_1d(a)
      values_too_large = dst >= np.float64(dst_max)
      with np.errstate(invalid='ignore'):
        dst = dst.astype(dtype)
      dst[values_too_large] = dst_max
      result = dst if a.ndim > 0 else dst[0]
  else:
    assert np.issubdtype(dtype, np.floating)
    result = a.astype(dtype)
    if np.issubdtype(a.dtype, np.unsignedinteger):
      result = result / dtype.type(np.iinfo(a.dtype).max)
  return result


def to_float01(a: _ArrayLike, dtype: _DTypeLike = np.float32) -> _NDArray:
  """If array has unsigned integers, rescales them to the range [0.0, 1.0].

  Scaling is such that uint(0) maps to 0.0 and uint(MAX) maps to 1.0.  See
  `to_type`.

  Args:
    a: Input array.
    dtype: Desired floating-point type if rescaling occurs.

  Returns:
    A new array of dtype values in the range [0.0, 1.0] if the input array `a`
    contains unsigned integers; otherwise, array `a` is returned unchanged.
  """
  a = np.asarray(a)
  dtype = np.dtype(dtype)
  if not np.issubdtype(dtype, np.floating):
    raise ValueError(f'Type {dtype} is not floating-point.')
  if np.issubdtype(a.dtype, np.floating):
    return a
  return to_type(a, dtype)


def to_uint8(a: _ArrayLike) -> _NDArray:
  """Returns array converted to uint8 values; see `to_type`."""
  return to_type(a, np.uint8)


# ** Functions to generate example image and video data.


def color_ramp(
    shape: tuple[int, int] = (64, 64), *, dtype: _DTypeLike = np.float32
) -> _NDArray:
  """Returns an image of a red-green color gradient.

  This is useful for quick experimentation and testing.  See also
  `moving_circle` to generate a sample video.

  Args:
    shape: 2D spatial dimensions (height, width) of generated image.
    dtype: Type (uint or floating) of resulting pixel values.
  """
  _check_2d_shape(shape)
  dtype = _as_valid_media_type(dtype)
  yx = (np.moveaxis(np.indices(shape), 0, -1) + 0.5) / shape
  image = np.insert(yx, 2, 0.0, axis=-1)
  return to_type(image, dtype)


def moving_circle(
    shape: tuple[int, int] = (256, 256),
    num_images: int = 10,
    *,
    dtype: _DTypeLike = np.float32,
) -> _NDArray:
  """Returns a video of a circle moving in front of a color ramp.

  This is useful for quick experimentation and testing.  See also `color_ramp`
  to generate a sample image.

  >>> show_video(moving_circle((480, 640), 60), fps=60)

  Args:
    shape: 2D spatial dimensions (height, width) of generated video.
    num_images: Number of video frames.
    dtype: Type (uint or floating) of resulting pixel values.
  """
  _check_2d_shape(shape)
  dtype = np.dtype(dtype)

  def generate_image(image_index: int) -> _NDArray:
    """Returns a video frame image."""
    image = color_ramp(shape, dtype=dtype)
    yx = np.moveaxis(np.indices(shape), 0, -1)
    center = shape[0] * 0.6, shape[1] * (image_index + 0.5) / num_images
    radius_squared = (min(shape) * 0.1) ** 2
    inside = np.sum((yx - center) ** 2, axis=-1) < radius_squared
    white_circle_color = 1.0, 1.0, 1.0
    if np.issubdtype(dtype, np.unsignedinteger):
      white_circle_color = to_type([white_circle_color], dtype)[0]
    image[inside] = white_circle_color
    return image

  return np.array([generate_image(i) for i in range(num_images)])


# ** Color-space conversions.

# Same matrix values as in two sources:
# https://github.com/scikit-image/scikit-image/blob/master/skimage/color/colorconv.py#L377
# https://github.com/tensorflow/tensorflow/blob/r1.14/tensorflow/python/ops/image_ops_impl.py#L2754
_YUV_FROM_RGB_MATRIX = np.array(
    [
        [0.299, -0.14714119, 0.61497538],
        [0.587, -0.28886916, -0.51496512],
        [0.114, 0.43601035, -0.10001026],
    ],
    dtype=np.float32,
)
_RGB_FROM_YUV_MATRIX = np.linalg.inv(_YUV_FROM_RGB_MATRIX)
_YUV_CHROMA_OFFSET = np.array([0.0, 0.5, 0.5], dtype=np.float32)


def yuv_from_rgb(rgb: _ArrayLike) -> _NDArray:
  """Returns the RGB image/video mapped to YUV [0,1] color space.

  Note that the "YUV" color space used by video compressors is actually YCbCr!

  Args:
    rgb: Input image in sRGB space.
  """
  rgb = to_float01(rgb)
  if rgb.shape[-1] != 3:
    raise ValueError(f'The last dimension in {rgb.shape} is not 3.')
  return rgb @ _YUV_FROM_RGB_MATRIX + _YUV_CHROMA_OFFSET


def rgb_from_yuv(yuv: _ArrayLike) -> _NDArray:
  """Returns the YUV image/video mapped to RGB [0,1] color space."""
  yuv = to_float01(yuv)
  if yuv.shape[-1] != 3:
    raise ValueError(f'The last dimension in {yuv.shape} is not 3.')
  return (yuv - _YUV_CHROMA_OFFSET) @ _RGB_FROM_YUV_MATRIX


# Same matrix values as in
# https://github.com/scikit-image/scikit-image/blob/master/skimage/color/colorconv.py#L1654
# and https://en.wikipedia.org/wiki/YUV#Studio_swing_for_BT.601
_YCBCR_FROM_RGB_MATRIX = np.array(
    [
        [65.481, 128.553, 24.966],
        [-37.797, -74.203, 112.0],
        [112.0, -93.786, -18.214],
    ],
    dtype=np.float32,
).transpose()
_RGB_FROM_YCBCR_MATRIX = np.linalg.inv(_YCBCR_FROM_RGB_MATRIX)
_YCBCR_OFFSET = np.array([16.0, 128.0, 128.0], dtype=np.float32)
# Note that _YCBCR_FROM_RGB_MATRIX =~ _YUV_FROM_RGB_MATRIX * [219, 256, 182];
# https://en.wikipedia.org/wiki/YUV: "Y' values are conventionally shifted and
# scaled to the range [16, 235] (referred to as studio swing or 'TV levels')";
# "studio range of 16-240 for U and V".  (Where does value 182 come from?)


def ycbcr_from_rgb(rgb: _ArrayLike) -> _NDArray:
  """Returns the RGB image/video mapped to YCbCr [0,1] color space.

  The YCbCr color space is the one called "YUV" by video compressors.

  Args:
    rgb: Input image in sRGB space.
  """
  rgb = to_float01(rgb)
  if rgb.shape[-1] != 3:
    raise ValueError(f'The last dimension in {rgb.shape} is not 3.')
  return (rgb @ _YCBCR_FROM_RGB_MATRIX + _YCBCR_OFFSET) / 255.0


def rgb_from_ycbcr(ycbcr: _ArrayLike) -> _NDArray:
  """Returns the YCbCr image/video mapped to RGB [0,1] color space."""
  ycbcr = to_float01(ycbcr)
  if ycbcr.shape[-1] != 3:
    raise ValueError(f'The last dimension in {ycbcr.shape} is not 3.')
  return (ycbcr * 255.0 - _YCBCR_OFFSET) @ _RGB_FROM_YCBCR_MATRIX


# ** Image processing.


def _pil_image(image: _ArrayLike, mode: str | None = None) -> PIL.Image.Image:
  """Returns a PIL image given a numpy matrix (either uint8 or float [0,1])."""
  image = _as_valid_media_array(image)
  if image.ndim not in (2, 3):
    raise ValueError(f'Image shape {image.shape} is neither 2D nor 3D.')
  pil_image: PIL.Image.Image = PIL.Image.fromarray(image, mode=mode)
  return pil_image


def resize_image(image: _ArrayLike, shape: tuple[int, int]) -> _NDArray:
  """Resizes image to specified spatial dimensions using a Lanczos filter.

  Args:
    image: Array-like 2D or 3D object, where dtype is uint or floating-point.
    shape: 2D spatial dimensions (height, width) of output image.

  Returns:
    A resampled image whose spatial dimensions match `shape`.
  """
  image = _as_valid_media_array(image)
  if image.ndim not in (2, 3):
    raise ValueError(f'Image shape {image.shape} is neither 2D nor 3D.')
  _check_2d_shape(shape)

  # A PIL image can be multichannel only if it has 3 or 4 uint8 channels,
  # and it can be resized only if it is uint8 or float32.
  supported_single_channel = (
      np.issubdtype(image.dtype, np.floating) or image.dtype == np.uint8
  ) and image.ndim == 2
  supported_multichannel = (
      image.dtype == np.uint8 and image.ndim == 3 and image.shape[2] in (3, 4)
  )
  if supported_single_channel or supported_multichannel:
    return np.array(
        _pil_image(image).resize(
            shape[::-1], resample=PIL.Image.Resampling.LANCZOS
        ),
        dtype=image.dtype,
    )
  if image.ndim == 2:
    # We convert to floating-point for resizing and convert back.
    return to_type(resize_image(to_float01(image), shape), image.dtype)
  # We resize each image channel individually.
  return np.dstack(
      [resize_image(channel, shape) for channel in np.moveaxis(image, -1, 0)]
  )


# ** Video processing.


def resize_video(video: Iterable[_NDArray], shape: tuple[int, int]) -> _NDArray:
  """Resizes `video` to specified spatial dimensions using a Lanczos filter.

  Args:
    video: Iterable of images.
    shape: 2D spatial dimensions (height, width) of output video.

  Returns:
    A resampled video whose spatial dimensions match `shape`.
  """
  _check_2d_shape(shape)
  return np.array([resize_image(image, shape) for image in video])


# ** General I/O.


def _is_url(path_or_url: _Path) -> bool:
  return isinstance(path_or_url, str) and path_or_url.startswith(
      ('http://', 'https://', 'file://')
  )


def read_contents(path_or_url: _Path) -> bytes:
  """Returns the contents of the file specified by either a path or URL."""
  data: bytes
  if _is_url(path_or_url):
    assert isinstance(path_or_url, str)
    headers = {'User-Agent': 'Chrome'}
    request = urllib.request.Request(path_or_url, headers=headers)
    with urllib.request.urlopen(request) as response:
      data = response.read()
  else:
    with _open(path_or_url, 'rb') as f:
      data = f.read()
  return data


@contextlib.contextmanager
def _read_via_local_file(path_or_url: _Path) -> Iterator[str]:
  """Context to copy a remote file locally to read from it.

  Args:
    path_or_url: File, which may be remote.

  Yields:
    The name of a local file which may be a copy of a remote file.
  """
  if _is_url(path_or_url) or not _path_is_local(path_or_url):
    suffix = pathlib.Path(path_or_url).suffix
    with tempfile.TemporaryDirectory() as directory_name:
      tmp_path = pathlib.Path(directory_name) / f'file{suffix}'
      tmp_path.write_bytes(read_contents(path_or_url))
      yield str(tmp_path)
  else:
    yield str(path_or_url)


@contextlib.contextmanager
def _write_via_local_file(path: _Path) -> Iterator[str]:
  """Context to write a temporary local file and subsequently copy it remotely.

  Args:
    path: File, which may be remote.

  Yields:
    The name of a local file which may be subsequently copied remotely.
  """
  if _path_is_local(path):
    yield str(path)
  else:
    suffix = pathlib.Path(path).suffix
    with tempfile.TemporaryDirectory() as directory_name:
      tmp_path = pathlib.Path(directory_name) / f'file{suffix}'
      yield str(tmp_path)
      with _open(path, mode='wb') as f:
        f.write(tmp_path.read_bytes())


class set_show_save_dir:  # pylint: disable=invalid-name
  """Save all titled output from `show_*()` calls into files.

  If the specified `directory` is not None, all titled images and videos
  displayed by `show_image`, `show_images`, `show_video`, and `show_videos` are
  also saved as files within the directory.

  It can be used either to set the state or as a context manager:

  >>> set_show_save_dir('/tmp')
  >>> show_image(color_ramp(), title='image1')  # Creates /tmp/image1.png.
  >>> show_video(moving_circle(), title='video2')  # Creates /tmp/video2.mp4.
  >>> set_show_save_dir(None)

  >>> with set_show_save_dir('/tmp'):
  ...   show_image(color_ramp(), title='image1')  # Creates /tmp/image1.png.
  ...   show_video(moving_circle(), title='video2')  # Creates /tmp/video2.mp4.
  """

  def __init__(self, directory: _Path | None):
    self._old_show_save_dir = _config.show_save_dir
    _config.show_save_dir = directory

  def __enter__(self) -> None:
    pass

  def __exit__(self, *_: Any) -> None:
    _config.show_save_dir = self._old_show_save_dir


# ** Image I/O.


def read_image(
    path_or_url: _Path,
    *,
    apply_exif_transpose: bool = True,
    dtype: _DTypeLike = None,
) -> _NDArray:
  """Returns an image read from a file path or URL.

  Decoding is performed using `PIL`, which supports `uint8` images with 1, 3,
  or 4 channels and `uint16` images with a single channel.

  Args:
    path_or_url: Path of input file.
    apply_exif_transpose: If True, rotate image according to EXIF orientation.
    dtype: Data type of the returned array.
      If None, `np.uint8` or `np.uint16` is inferred automatically.
  """
  data = read_contents(path_or_url)
  return decompress_image(data, dtype, apply_exif_transpose)


def write_image(
    path: _Path, image: _ArrayLike, fmt: str = 'png', **kwargs: Any
) -> None:
  """Writes an image to a file.

  Encoding is performed using `PIL`, which supports `uint8` images with 1, 3,
  or 4 channels and `uint16` images with a single channel.

  The file format is explicitly specified by `fmt` and is not inferred from
  `path`.

  Args:
    path: Path of output file.
    image: Array-like object.  If its type is float, it is converted to np.uint8
      using `to_uint8` (thus clamping the input to the range [0.0, 1.0]).
      Otherwise it must be np.uint8 or np.uint16.
    fmt: Desired compression encoding, e.g. 'png'.
    **kwargs: Additional parameters for `PIL.Image.save()`.
  """
  image = _as_valid_media_array(image)
  if np.issubdtype(image.dtype, np.floating):
    image = to_uint8(image)
  with _open(path, 'wb') as f:
    _pil_image(image).save(f, format=fmt, **kwargs)


def to_rgb(
    array: _ArrayLike,
    *,
    vmin: float | None = None,
    vmax: float | None = None,
    cmap: str | Callable[[_ArrayLike], _NDArray] = 'gray',
) -> _NDArray:
  """Maps scalar values to RGB using value bounds and a color map.

  Args:
    array: Scalar values, with arbitrary shape.
    vmin: Explicit min value for remapping; if None, it is obtained as the
      minimum finite value of `array`.
    vmax: Explicit max value for remapping; if None, it is obtained as the
      maximum finite value of `array`.
    cmap: A `pyplot` color map or callable, to map from 1D value to 3D or 4D
      color.

  Returns:
    A new array in which each element is affinely mapped from [vmin, vmax]
    to [0.0, 1.0] and then color-mapped.
  """
  a = _as_valid_media_array(array)
  del array
  # With numpy >= 1.17.0, these could instead use the reduction `where=`
  # parameter (together with `initial=`):
  #   vmin = np.amin(a, where=np.isfinite(a), initial=np.inf)
  #   vmax = np.amax(a, where=np.isfinite(a), initial=-np.inf)
  vmin = np.amin(np.where(np.isfinite(a), a, np.inf)) if vmin is None else vmin
  vmax = np.amax(np.where(np.isfinite(a), a, -np.inf)) if vmax is None else vmax
  a = (a.astype('float') - vmin) / (vmax - vmin + np.finfo(float).eps)
  if isinstance(cmap, str):
    if hasattr(matplotlib, 'colormaps'):
      rgb_from_scalar: Any = matplotlib.colormaps[cmap]  # Newer version.
    else:
      rgb_from_scalar = matplotlib.pyplot.cm.get_cmap(cmap)  # pylint: disable=no-member
  else:
    rgb_from_scalar = cmap
  a = typing.cast(_NDArray, rgb_from_scalar(a))
  # If there is a fully opaque alpha channel, remove it.
  if a.shape[-1] == 4 and np.all(to_float01(a[..., 3]) == 1.0):
    a = a[..., :3]
  return a


def compress_image(
    image: _ArrayLike, *, fmt: str = 'png', **kwargs: Any
) -> bytes:
  """Returns a buffer containing a compressed image.

  Args:
    image: Array in a format supported by `PIL`, e.g. np.uint8 or np.uint16.
    fmt: Desired compression encoding, e.g. 'png'.
    **kwargs: Options for `PIL.Image.save()`, e.g. `optimize=True` for greater
      compression.
  """
  image = _as_valid_media_array(image)
  with io.BytesIO() as output:
    _pil_image(image).save(output, format=fmt, **kwargs)
    return output.getvalue()


def decompress_image(
    data: bytes, dtype: _DTypeLike = None, apply_exif_transpose: bool = True
) -> _NDArray:
  """Returns an image from a compressed data buffer.

  Decoding is performed using `PIL`, which supports `uint8` images with 1, 3,
  or 4 channels and `uint16` images with a single channel.

  Args:
    data: Buffer containing compressed image.
    dtype: Data type of the returned array.
      If None, `np.uint8` or `np.uint16` is inferred automatically.
    apply_exif_transpose: If True, rotate image according to EXIF orientation.
  """
  pil_image: PIL.Image.Image = PIL.Image.open(io.BytesIO(data))
  if apply_exif_transpose:
    tmp_image = PIL.ImageOps.exif_transpose(pil_image)  # Future: in_place=True.
    assert tmp_image
    pil_image = tmp_image
  if dtype is None:
    dtype = np.uint16 if pil_image.mode.startswith('I') else np.uint8
  return np.array(pil_image, dtype=dtype)


def html_from_compressed_image(
    data: bytes,
    width: int,
    height: int,
    *,
    title: str | None = None,
    border: bool | str = False,
    pixelated: bool = True,
    fmt: str = 'png',
) -> str:
  """Returns an HTML string with an image tag containing encoded data.

  Args:
    data: Compressed image bytes.
    width: Width of HTML image in pixels.
    height: Height of HTML image in pixels.
    title: Optional text shown centered above image.
    border: If `bool`, whether to place a black boundary around the image, or if
      `str`, the boundary CSS style.
    pixelated: If True, sets the CSS style to 'image-rendering: pixelated;'.
    fmt: Compression encoding.
920 """ 921 b64 = base64.b64encode(data).decode('utf-8') 922 if isinstance(border, str): 923 border = f'{border}; ' 924 elif border: 925 border = 'border:1px solid black; ' 926 else: 927 border = '' 928 s_pixelated = 'pixelated' if pixelated else 'auto' 929 s = ( 930 f'<img width="{width}" height="{height}"' 931 f' style="{border}image-rendering:{s_pixelated}; object-fit:cover;"' 932 f' src="data:image/{fmt};base64,{b64}"/>' 933 ) 934 if title is not None: 935 s = f"""<div style="display:flex; align-items:left;"> 936 <div style="display:flex; flex-direction:column; align-items:center;"> 937 <div>{title}</div><div>{s}</div></div></div>""" 938 return s 939 940 941def _get_width_height( 942 width: int | None, height: int | None, shape: tuple[int, int] 943) -> tuple[int, int]: 944 """Returns (width, height) given optional parameters and image shape.""" 945 assert len(shape) == 2, shape 946 if width and height: 947 return width, height 948 if width and not height: 949 return width, int(width * (shape[0] / shape[1]) + 0.5) 950 if height and not width: 951 return int(height * (shape[1] / shape[0]) + 0.5), height 952 return shape[::-1] 953 954 955def _ensure_mapped_to_rgb( 956 image: _ArrayLike, 957 *, 958 vmin: float | None = None, 959 vmax: float | None = None, 960 cmap: str | Callable[[_ArrayLike], _NDArray] = 'gray', 961) -> _NDArray: 962 """Ensure image is mapped to RGB.""" 963 image = _as_valid_media_array(image) 964 if not (image.ndim == 2 or (image.ndim == 3 and image.shape[2] in (1, 3, 4))): 965 raise ValueError( 966 f'Image with shape {image.shape} is neither a 2D array' 967 ' nor a 3D array with 1, 3, or 4 channels.' 
    )
  if image.ndim == 3 and image.shape[2] == 1:
    image = image[:, :, 0]
  if image.ndim == 2:
    image = to_rgb(image, vmin=vmin, vmax=vmax, cmap=cmap)
  return image


def show_image(
    image: _ArrayLike, *, title: str | None = None, **kwargs: Any
) -> str | None:
  """Displays an image in the notebook and optionally saves it to a file.

  See `show_images`.

  >>> show_image(np.random.rand(100, 100))
  >>> show_image(np.random.randint(0, 256, size=(80, 80, 3), dtype='uint8'))
  >>> show_image(np.random.rand(10, 10) - 0.5, cmap='bwr', height=100)
  >>> show_image(read_image('/tmp/image.png'))
  >>> url = 'https://github.com/hhoppe/data/raw/main/image.png'
  >>> show_image(read_image(url))

  Args:
    image: 2D array-like, or 3D array-like with 1, 3, or 4 channels.
    title: Optional text shown centered above the image.
    **kwargs: See `show_images`.

  Returns:
    HTML string if `return_html` is `True`.
  """
  return show_images([np.asarray(image)], [title], **kwargs)


def show_images(
    images: Iterable[_ArrayLike] | Mapping[str, _ArrayLike],
    titles: Iterable[str | None] | None = None,
    *,
    width: int | None = None,
    height: int | None = None,
    downsample: bool = True,
    columns: int | None = None,
    vmin: float | None = None,
    vmax: float | None = None,
    cmap: str | Callable[[_ArrayLike], _NDArray] = 'gray',
    border: bool | str = False,
    ylabel: str = '',
    html_class: str = 'show_images',
    pixelated: bool | None = None,
    return_html: bool = False,
) -> str | None:
  """Displays a row of images in the IPython/Jupyter notebook.

  If a directory has been specified using `set_show_save_dir`, also saves each
  titled image to a file in that directory based on its title.

  >>> image1, image2 = np.random.rand(64, 64, 3), color_ramp((64, 64))
  >>> show_images([image1, image2])
  >>> show_images({'random image': image1, 'color ramp': image2}, height=128)
  >>> show_images([image1, image2] * 5, columns=4, border=True)

  Args:
    images: Iterable of images, or dictionary of `{title: image}`.  Each image
      must be either a 2D array or a 3D array with 1, 3, or 4 channels.
    titles: Optional strings shown above the corresponding images.
    width: Optional, overrides displayed width (in pixels).
    height: Optional, overrides displayed height (in pixels).
    downsample: If True, each image whose width or height is greater than the
      specified `width` or `height` is resampled to the display resolution.
      This improves antialiasing and reduces the size of the notebook.
    columns: Optional, maximum number of images per row.
    vmin: For single-channel image, explicit min value for display.
    vmax: For single-channel image, explicit max value for display.
    cmap: For single-channel image, `pyplot` color map or callable to map 1D to
      3D color.
    border: If `bool`, whether to place a black boundary around the image, or
      if `str`, the boundary CSS style.
    ylabel: Text (rotated by 90 degrees) shown on the left of each row.
    html_class: CSS class name used in definition of HTML element.
    pixelated: If True, sets the CSS style to 'image-rendering: pixelated;'; if
      False, sets 'image-rendering: auto'; if None, uses pixelated rendering
      only on images for which `width` or `height` introduces magnification.
    return_html: If `True`, returns the raw HTML `str` instead of displaying.

  Returns:
    HTML string if `return_html` is `True`.
1053 """ 1054 if isinstance(images, Mapping): 1055 if titles is not None: 1056 raise ValueError('Cannot have images dictionary and titles parameter.') 1057 list_titles, list_images = list(images.keys()), list(images.values()) 1058 else: 1059 list_images = list(images) 1060 list_titles = [None] * len(list_images) if titles is None else list(titles) 1061 if len(list_images) != len(list_titles): 1062 raise ValueError( 1063 'Number of images does not match number of titles' 1064 f' ({len(list_images)} vs {len(list_titles)}).' 1065 ) 1066 1067 list_images = [ 1068 _ensure_mapped_to_rgb(image, vmin=vmin, vmax=vmax, cmap=cmap) 1069 for image in list_images 1070 ] 1071 1072 def maybe_downsample(image: _NDArray) -> _NDArray: 1073 shape = image.shape[0], image.shape[1] 1074 w, h = _get_width_height(width, height, shape) 1075 if w < shape[1] or h < shape[0]: 1076 image = resize_image(image, (h, w)) 1077 return image 1078 1079 if downsample: 1080 list_images = [maybe_downsample(image) for image in list_images] 1081 png_datas = [compress_image(to_uint8(image)) for image in list_images] 1082 1083 for title, png_data in zip(list_titles, png_datas): 1084 if title is not None and _config.show_save_dir: 1085 path = pathlib.Path(_config.show_save_dir) / f'{title}.png' 1086 with _open(path, mode='wb') as f: 1087 f.write(png_data) 1088 1089 def html_from_compressed_images() -> str: 1090 html_strings = [] 1091 for image, title, png_data in zip(list_images, list_titles, png_datas): 1092 w, h = _get_width_height(width, height, image.shape[:2]) 1093 magnified = h > image.shape[0] or w > image.shape[1] 1094 pixelated2 = pixelated if pixelated is not None else magnified 1095 html_strings.append( 1096 html_from_compressed_image( 1097 png_data, w, h, title=title, border=border, pixelated=pixelated2 1098 ) 1099 ) 1100 # Create single-row tables each with no more than 'columns' elements. 
    table_strings = []
    for row_html_strings in _chunked(html_strings, columns):
      td = '<td style="padding:1px;">'
      s = ''.join(f'{td}{e}</td>' for e in row_html_strings)
      if ylabel:
        style = 'writing-mode:vertical-lr; transform:rotate(180deg);'
        s = f'{td}<span style="{style}">{ylabel}</span></td>' + s
      table_strings.append(
          f'<table class="{html_class}"'
          f' style="border-spacing:0px;"><tr>{s}</tr></table>'
      )
    return ''.join(table_strings)

  s = html_from_compressed_images()
  while len(s) > _IPYTHON_HTML_SIZE_LIMIT * 0.5:
    warnings.warn('mediapy: subsampling images to reduce HTML size')
    list_images = [image[::2, ::2] for image in list_images]
    png_datas = [compress_image(to_uint8(image)) for image in list_images]
    s = html_from_compressed_images()
  if return_html:
    return s
  _display_html(s)
  return None


def compare_images(
    images: Iterable[_ArrayLike],
    *,
    vmin: float | None = None,
    vmax: float | None = None,
    cmap: str | Callable[[_ArrayLike], _NDArray] = 'gray',
) -> None:
  """Compares two images using an interactive slider.

  Displays an HTML slider component to interactively swipe between two images.
  The slider functionality requires that the web browser have Internet access.
  See additional info at `https://github.com/sneas/img-comparison-slider`.

  >>> image1, image2 = np.random.rand(64, 64, 3), color_ramp((64, 64))
  >>> compare_images([image1, image2])

  Args:
    images: Iterable of images.  Each image must be either a 2D array or a 3D
      array with 1, 3, or 4 channels.  There must be exactly two images.
    vmin: For single-channel image, explicit min value for display.
    vmax: For single-channel image, explicit max value for display.
    cmap: For single-channel image, `pyplot` color map or callable to map 1D to
      3D color.
1149 """ 1150 list_images = [ 1151 _ensure_mapped_to_rgb(image, vmin=vmin, vmax=vmax, cmap=cmap) 1152 for image in images 1153 ] 1154 if len(list_images) != 2: 1155 raise ValueError('The number of images must be 2.') 1156 png_datas = [compress_image(to_uint8(image)) for image in list_images] 1157 b64_1, b64_2 = [ 1158 base64.b64encode(png_data).decode('utf-8') for png_data in png_datas 1159 ] 1160 s = _IMAGE_COMPARISON_HTML.replace('{b64_1}', b64_1).replace('{b64_2}', b64_2) 1161 _display_html(s) 1162 1163 1164# ** Video I/O. 1165 1166 1167def _filename_suffix_from_codec(codec: str) -> str: 1168 if codec == 'gif': 1169 return '.gif' 1170 if codec == 'vp9': 1171 return '.webm' 1172 1173 return '.mp4' 1174 1175 1176def _get_ffmpeg_path() -> str: 1177 path = _search_for_ffmpeg_path() 1178 if not path: 1179 raise RuntimeError( 1180 f"Program '{_config.ffmpeg_name_or_path}' is not found;" 1181 " perhaps install ffmpeg using 'apt install ffmpeg'." 1182 ) 1183 return path 1184 1185 1186@typing.overload 1187def _run_ffmpeg( 1188 ffmpeg_args: Sequence[str], 1189 stdin: int | None = None, 1190 stdout: int | None = None, 1191 stderr: int | None = None, 1192 encoding: None = None, # No encoding -> bytes 1193 allowed_input_files: Sequence[str] | None = None, 1194 allowed_output_files: Sequence[str] | None = None, 1195) -> subprocess.Popen[bytes]: 1196 ... 1197 1198 1199@typing.overload 1200def _run_ffmpeg( 1201 ffmpeg_args: Sequence[str], 1202 stdin: int | None = None, 1203 stdout: int | None = None, 1204 stderr: int | None = None, 1205 encoding: str = ..., # Encoding -> str 1206 allowed_input_files: Sequence[str] | None = None, 1207 allowed_output_files: Sequence[str] | None = None, 1208) -> subprocess.Popen[str]: 1209 ... 


def _run_ffmpeg(
    ffmpeg_args: Sequence[str],
    stdin: int | None = None,
    stdout: int | None = None,
    stderr: int | None = None,
    encoding: str | None = None,
    allowed_input_files: Sequence[str] | None = None,
    allowed_output_files: Sequence[str] | None = None,
) -> subprocess.Popen[bytes] | subprocess.Popen[str]:
  """Runs ffmpeg with the given args.

  Args:
    ffmpeg_args: The args to pass to ffmpeg.
    stdin: Same as in `subprocess.Popen`.
    stdout: Same as in `subprocess.Popen`.
    stderr: Same as in `subprocess.Popen`.
    encoding: Same as in `subprocess.Popen`.
    allowed_input_files: The input files to allow for ffmpeg.
    allowed_output_files: The output files to allow for ffmpeg.

  Returns:
    The `subprocess.Popen` object for the running ffmpeg process.
  """
  argv = []
  # In open source, keep env=None to preserve default behavior.
  # Context: https://github.com/google/mediapy/pull/62
  env: Any = None  # pylint: disable=unused-variable
  ffmpeg_path = _get_ffmpeg_path()

  # Allowed input and output files are not supported in open source.
  del allowed_input_files
  del allowed_output_files

  argv.append(ffmpeg_path)
  argv.extend(ffmpeg_args)

  return subprocess.Popen(
      argv,
      stdin=stdin,
      stdout=stdout,
      stderr=stderr,
      encoding=encoding,
      env=env,
  )


def video_is_available() -> bool:
  """Returns True if the program `ffmpeg` is found.

  See also `set_ffmpeg`.
  """
  return _search_for_ffmpeg_path() is not None


class VideoMetadata(NamedTuple):
  """Represents the data stored in a video container header.

  Attributes:
    num_images: Number of frames that is expected from the video stream.  This
      is estimated from the framerate and the duration stored in the video
      header, so it might be inexact.
      We set the value to -1 if the number of frames is not found in the
      header.
    shape: The dimensions (height, width) of each video frame.
    fps: The framerate in frames per second.
    bps: The estimated bitrate of the video stream in bits per second,
      retrieved from the video header.
  """

  num_images: int
  shape: tuple[int, int]
  fps: float
  bps: int | None


def _get_video_metadata(path: _Path) -> VideoMetadata:
  """Returns attributes of video stored in the specified local file."""
  if not pathlib.Path(path).is_file():
    raise RuntimeError(f"Video file '{path}' is not found.")

  command = [
      '-nostdin',
      '-i',
      str(path),
      '-acodec',
      'copy',
      # Necessary to get "frame= *(\d+)" using newer ffmpeg versions.
      # Previously, this was `'-vcodec', 'copy'`.
      '-vf',
      'select=1',
      '-vsync',
      '0',
      '-f',
      'null',
      '-',
  ]
  with _run_ffmpeg(
      command,
      allowed_input_files=[str(path)],
      stderr=subprocess.PIPE,
      encoding='utf-8',
  ) as proc:
    _, err = proc.communicate()
  bps = fps = num_images = width = height = rotation = None
  before_output_info = True
  for line in err.split('\n'):
    if line.startswith('Output '):
      before_output_info = False
    if match := re.search(r', bitrate: *([\d.]+) kb/s', line):
      bps = int(match.group(1)) * 1000
    if matches := re.findall(r'frame= *(\d+) ', line):
      num_images = int(matches[-1])
    if 'Stream #0:' in line and ': Video:' in line and before_output_info:
      if not (match := re.search(r', (\d+)x(\d+)', line)):
        raise RuntimeError(f'Unable to parse video dimensions in line {line}')
      width, height = int(match.group(1)), int(match.group(2))
      if match := re.search(r', ([\d.]+) fps', line):
        fps = float(match.group(1))
      elif str(path).endswith('.gif'):
        # Some GIF files lack a framerate attribute; use a reasonable default.
        fps = 10
      else:
        raise RuntimeError(f'Unable to parse video framerate in line {line}')
    if match := re.fullmatch(r'\s*rotate\s*:\s*(\d+)', line):
      rotation = int(match.group(1))
    if match := re.fullmatch(r'.*rotation of (-?\d+).*\sdegrees', line):
      rotation = int(match.group(1))
  if not num_images:
    num_images = -1
  if not width:
    raise RuntimeError(f'Unable to parse video header: {err}')
  # By default, ffmpeg enables "-autorotate"; we just fix the dimensions.
  if rotation in (90, 270, -90, -270):
    width, height = height, width
  assert height is not None and width is not None
  shape = height, width
  assert fps is not None
  return VideoMetadata(num_images, shape, fps, bps)


class _VideoIO:
  """Base class for `VideoReader` and `VideoWriter`."""

  def _get_pix_fmt(self, dtype: _DType, image_format: str) -> str:
    """Returns ffmpeg pix_fmt given data type and image format."""
    native_endian_suffix = {'little': 'le', 'big': 'be'}[sys.byteorder]
    return {
        np.uint8: {
            'rgb': 'rgb24',
            'yuv': 'yuv444p',
            'gray': 'gray',
        },
        np.uint16: {
            'rgb': 'rgb48' + native_endian_suffix,
            'yuv': 'yuv444p16' + native_endian_suffix,
            'gray': 'gray16' + native_endian_suffix,
        },
    }[dtype.type][image_format]


class VideoReader(_VideoIO):
  """Context to read a compressed video as an iterable over its images.

  >>> with VideoReader('/tmp/river.mp4') as reader:
  ...   print(f'Video has {reader.num_images} images with shape={reader.shape},'
  ...         f' at {reader.fps} frames/sec and {reader.bps} bits/sec.')
  ...   for image in reader:
  ...     print(image.shape)

  >>> with VideoReader('/tmp/river.mp4') as reader:
  ...   video = np.array(tuple(reader))

  >>> url = 'https://github.com/hhoppe/data/raw/main/video.mp4'
  >>> with VideoReader(url) as reader:
  ...   show_video(reader)

  Attributes:
    path_or_url: Location of input video.
    output_format: Format of output images (default 'rgb').  If 'rgb', each
      image has shape=(height, width, 3) with R, G, B values.  If 'yuv', each
      image has shape=(height, width, 3) with Y, U, V values.  If 'gray', each
      image has shape=(height, width).
    dtype: Data type for output images.  The default is `np.uint8`.  Use of
      `np.uint16` allows reading 10-bit or 12-bit data without precision loss.
    metadata: Object storing the information retrieved from the video header.
      Its attributes are copied as attributes in this class.
    num_images: Number of frames that is expected from the video stream.  This
      is estimated from the framerate and the duration stored in the video
      header, so it might be inexact.
    shape: The dimensions (height, width) of each video frame.
    fps: The framerate in frames per second.
    bps: The estimated bitrate of the video stream in bits per second,
      retrieved from the video header.
    stream_index: The stream index to read from.  The default is 0.
  """

  path_or_url: _Path
  output_format: str
  dtype: _DType
  metadata: VideoMetadata
  num_images: int
  shape: tuple[int, int]
  fps: float
  bps: int | None
  stream_index: int
  _num_bytes_per_image: int

  def __init__(
      self,
      path_or_url: _Path,
      *,
      stream_index: int = 0,
      output_format: str = 'rgb',
      dtype: _DTypeLike = np.uint8,
  ):
    if output_format not in {'rgb', 'yuv', 'gray'}:
      raise ValueError(
          f'Output format {output_format} is not rgb, yuv, or gray.'
      )
    self.path_or_url = path_or_url
    self.output_format = output_format
    self.stream_index = stream_index
    self.dtype = np.dtype(dtype)
    if self.dtype.type not in (np.uint8, np.uint16):
      raise ValueError(f'Type {dtype} is not np.uint8 or np.uint16.')
    self._read_via_local_file: Any = None
    self._popen: subprocess.Popen[bytes] | None = None
    self._proc: subprocess.Popen[bytes] | None = None

  def __enter__(self) -> 'VideoReader':
    try:
      self._read_via_local_file = _read_via_local_file(self.path_or_url)
      # pylint: disable-next=no-member
      tmp_name = self._read_via_local_file.__enter__()

      self.metadata = _get_video_metadata(tmp_name)
      self.num_images, self.shape, self.fps, self.bps = self.metadata
      pix_fmt = self._get_pix_fmt(self.dtype, self.output_format)
      num_channels = {'rgb': 3, 'yuv': 3, 'gray': 1}[self.output_format]
      bytes_per_channel = self.dtype.itemsize
      self._num_bytes_per_image = (
          math.prod(self.shape) * num_channels * bytes_per_channel
      )

      command = [
          '-v',
          'panic',
          '-nostdin',
          '-i',
          tmp_name,
          '-vcodec',
          'rawvideo',
          '-f',
          'image2pipe',
          '-map',
          f'0:v:{self.stream_index}',
          '-pix_fmt',
          pix_fmt,
          '-vsync',
          'vfr',
          '-',
      ]
      self._popen = _run_ffmpeg(
          command,
          stdout=subprocess.PIPE,
          stderr=subprocess.PIPE,
          allowed_input_files=[tmp_name],
      )
      self._proc = self._popen.__enter__()
    except Exception:
      self.__exit__(None, None, None)
      raise
    return self

  def __exit__(self, *_: Any) -> None:
    self.close()

  def read(self) -> _NDArray | None:
    """Reads a video image frame (or None if at end of file).

    Returns:
      A numpy array in the format specified by `output_format`, i.e., a 3D
      array with 3 color channels, except for format 'gray' which is 2D.
1494 """ 1495 assert self._proc, 'Error: reading from an already closed context.' 1496 stdout = self._proc.stdout 1497 assert stdout is not None 1498 data = stdout.read(self._num_bytes_per_image) 1499 if not data: # Due to either end-of-file or subprocess error. 1500 self.close() # Raises exception if subprocess had error. 1501 return None # To indicate end-of-file. 1502 assert len(data) == self._num_bytes_per_image 1503 image = np.frombuffer(data, dtype=self.dtype) 1504 if self.output_format == 'rgb': 1505 image = image.reshape(*self.shape, 3) 1506 elif self.output_format == 'yuv': # Convert from planar YUV to pixel YUV. 1507 image = np.moveaxis(image.reshape(3, *self.shape), 0, 2) 1508 elif self.output_format == 'gray': # Generate 2D rather than 3D ndimage. 1509 image = image.reshape(*self.shape) 1510 else: 1511 raise AssertionError 1512 return image 1513 1514 def __iter__(self) -> Iterator[_NDArray]: 1515 while True: 1516 image = self.read() 1517 if image is None: 1518 return 1519 yield image 1520 1521 def close(self) -> None: 1522 """Terminates video reader. (Called automatically at end of context.)""" 1523 if self._popen: 1524 self._popen.__exit__(None, None, None) 1525 self._popen = None 1526 self._proc = None 1527 if self._read_via_local_file: 1528 # pylint: disable-next=no-member 1529 self._read_via_local_file.__exit__(None, None, None) 1530 self._read_via_local_file = None 1531 1532 1533class VideoWriter(_VideoIO): 1534 """Context to write a compressed video. 1535 1536 >>> shape = 480, 640 1537 >>> with VideoWriter('/tmp/v.mp4', shape, fps=60) as writer: 1538 ... for image in moving_circle(shape, num_images=60): 1539 ... writer.add_image(image) 1540 >>> show_video(read_video('/tmp/v.mp4')) 1541 1542 1543 Bitrate control may be specified using at most one of: `bps`, `qp`, or `crf`. 1544 If none are specified, `qp` is set to a default value. 
  See https://slhck.info/video/2017/03/01/rate-control.html

  If codec is 'gif', the args `bps`, `qp`, `crf`, and `encoded_format` are
  ignored.

  Attributes:
    path: Output video.  Its suffix (e.g. '.mp4') determines the video
      container format.  The suffix must be '.gif' if the codec is 'gif'.
    shape: 2D spatial dimensions (height, width) of video image frames.  The
      dimensions must be even if `encoded_format` has subsampled chroma (e.g.,
      'yuv420p' or 'yuv420p10le').
    codec: Compression algorithm as defined by `ffmpeg -codecs` (e.g., 'h264',
      'hevc', 'vp9', or 'gif').
    metadata: Optional VideoMetadata object whose `fps` and `bps` attributes
      are used if not specified as explicit parameters.
    fps: Frames-per-second framerate (default is 60.0, except 25.0 for 'gif').
    bps: Requested average bits-per-second bitrate (default None).
    qp: Quantization parameter for video compression quality (default None).
    crf: Constant rate factor for video compression quality (default None).
    ffmpeg_args: Additional arguments for `ffmpeg` command, e.g. '-g 30' to
      introduce I-frames, or '-bf 0' to omit B-frames.
    input_format: Format of input images (default 'rgb').  If 'rgb', each
      image has shape=(height, width, 3) or (height, width).  If 'yuv', each
      image has shape=(height, width, 3) with Y, U, V values.  If 'gray', each
      image has shape=(height, width).
    dtype: Expected data type for input images (any float input images are
      converted to `dtype`).  The default is `np.uint8`.  Use of `np.uint16`
      is necessary when encoding >8 bits/channel.
    encoded_format: Pixel format as defined by `ffmpeg -pix_fmts`, e.g.,
      'yuv420p' (2x2-subsampled chroma), 'yuv444p' (full-res chroma),
      'yuv420p10le' (10-bit per channel), etc.  The default (None) selects
      'yuv420p' if all shape dimensions are even, else 'yuv444p'.
1577 """ 1578 1579 def __init__( 1580 self, 1581 path: _Path, 1582 shape: tuple[int, int], 1583 *, 1584 codec: str = 'h264', 1585 metadata: VideoMetadata | None = None, 1586 fps: float | None = None, 1587 bps: int | None = None, 1588 qp: int | None = None, 1589 crf: float | None = None, 1590 ffmpeg_args: str | Sequence[str] = '', 1591 input_format: str = 'rgb', 1592 dtype: _DTypeLike = np.uint8, 1593 encoded_format: str | None = None, 1594 ) -> None: 1595 _check_2d_shape(shape) 1596 if fps is None and metadata: 1597 fps = metadata.fps 1598 if fps is None: 1599 fps = 25.0 if codec == 'gif' else 60.0 1600 if fps <= 0.0: 1601 raise ValueError(f'Frame-per-second value {fps} is invalid.') 1602 if bps is None and metadata: 1603 bps = metadata.bps 1604 bps = int(bps) if bps is not None else None 1605 if bps is not None and bps <= 0: 1606 raise ValueError(f'Bitrate value {bps} is invalid.') 1607 if qp is not None and (not isinstance(qp, int) or qp < 0): 1608 raise ValueError( 1609 f'Quantization parameter {qp} cannot be negative. It must be a' 1610 ' non-negative integer.' 1611 ) 1612 num_rate_specifications = sum(x is not None for x in (bps, qp, crf)) 1613 if num_rate_specifications > 1: 1614 raise ValueError( 1615 f'Must specify at most one of bps, qp, or crf ({bps}, {qp}, {crf}).' 
      )
    ffmpeg_args = (
        shlex.split(ffmpeg_args)
        if isinstance(ffmpeg_args, str)
        else list(ffmpeg_args)
    )
    if input_format not in {'rgb', 'yuv', 'gray'}:
      raise ValueError(f'Input format {input_format} is not rgb, yuv, or gray.')
    dtype = np.dtype(dtype)
    if dtype.type not in (np.uint8, np.uint16):
      raise ValueError(f'Type {dtype} is not np.uint8 or np.uint16.')
    self.path = pathlib.Path(path)
    self.shape = shape
    all_dimensions_are_even = all(dim % 2 == 0 for dim in shape)
    if encoded_format is None:
      encoded_format = 'yuv420p' if all_dimensions_are_even else 'yuv444p'
    if not all_dimensions_are_even and encoded_format.startswith(
        ('yuv42', 'yuvj42')
    ):
      raise ValueError(
          f'With encoded_format {encoded_format}, video dimensions must be'
          f' even, but shape is {shape}.'
      )
    self.fps = fps
    self.codec = codec
    self.bps = bps
    self.qp = qp
    self.crf = crf
    self.ffmpeg_args = ffmpeg_args
    self.input_format = input_format
    self.dtype = dtype
    self.encoded_format = encoded_format
    if num_rate_specifications == 0 and not ffmpeg_args:
      qp = 20 if math.prod(self.shape) <= 640 * 480 else 28
    self._bitrate_args = (
        (['-vb', f'{bps}'] if bps is not None else [])
        + (['-qp', f'{qp}'] if qp is not None else [])
        + (['-vb', '0', '-crf', f'{crf}'] if crf is not None else [])
    )
    if self.codec == 'gif':
      if self.path.suffix != '.gif':
        raise ValueError(f"File '{self.path}' does not have a .gif suffix.")
      self.encoded_format = 'pal8'
      self._bitrate_args = []
      video_filter = 'split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse'
      # Less common (and likely less useful) is a per-frame color palette:
      # video_filter = ('split[s0][s1];[s0]palettegen=stats_mode=single[p];'
      #                 '[s1][p]paletteuse=new=1')
      self.ffmpeg_args = ['-vf', video_filter, '-f', 'gif'] + self.ffmpeg_args

    self._write_via_local_file: Any = None
    self._popen: subprocess.Popen[bytes] | None = None
    self._proc: subprocess.Popen[bytes] | None = None

  def __enter__(self) -> 'VideoWriter':
    input_pix_fmt = self._get_pix_fmt(self.dtype, self.input_format)
    try:
      self._write_via_local_file = _write_via_local_file(self.path)
      # pylint: disable-next=no-member
      tmp_name = self._write_via_local_file.__enter__()

      # Writing to stdout using ('-f', 'mp4', '-') would require
      # ('-movflags', 'frag_keyframe+empty_moov') and the result is nonportable.
      height, width = self.shape
      command = (
          [
              '-v',
              'error',
              '-f',
              'rawvideo',
              '-vcodec',
              'rawvideo',
              '-pix_fmt',
              input_pix_fmt,
              '-s',
              f'{width}x{height}',
              '-r',
              f'{self.fps}',
              '-i',
              '-',
              '-an',
              '-vcodec',
              self.codec,
              '-pix_fmt',
              self.encoded_format,
          ]
          + self._bitrate_args
          + self.ffmpeg_args
          + ['-y', tmp_name]
      )
      self._popen = _run_ffmpeg(
          command,
          stdin=subprocess.PIPE,
          stderr=subprocess.PIPE,
          allowed_output_files=[tmp_name],
      )
      self._proc = self._popen.__enter__()
    except Exception:
      self.__exit__(None, None, None)
      raise
    return self

  def __exit__(self, *_: Any) -> None:
    self.close()

  def add_image(self, image: _NDArray) -> None:
    """Writes a video frame.

    Args:
      image: Array whose dtype and first two dimensions must match the `dtype`
        and `shape` specified in `VideoWriter` initialization.  If
        `input_format` is 'gray', the image must be 2D.  For the 'rgb'
        input_format, the image may be either 2D (interpreted as grayscale) or
        3D with three (R, G, B) channels.  For the 'yuv' input_format, the
        image must be 3D with three (Y, U, V) channels.

    Raises:
      RuntimeError: If there is an error writing to the output file.
1733 """ 1734 assert self._proc, 'Error: writing to an already closed context.' 1735 if issubclass(image.dtype.type, (np.floating, np.bool_)): 1736 image = to_type(image, self.dtype) 1737 if image.dtype != self.dtype: 1738 raise ValueError(f'Image type {image.dtype} != {self.dtype}.') 1739 if self.input_format == 'gray': 1740 if image.ndim != 2: 1741 raise ValueError(f'Image dimensions {image.shape} are not 2D.') 1742 else: 1743 if image.ndim == 2 and self.input_format == 'rgb': 1744 image = np.dstack((image, image, image)) 1745 if not (image.ndim == 3 and image.shape[2] == 3): 1746 raise ValueError(f'Image dimensions {image.shape} are invalid.') 1747 if image.shape[:2] != self.shape: 1748 raise ValueError( 1749 f'Image dimensions {image.shape[:2]} do not match' 1750 f' those of the initialized video {self.shape}.' 1751 ) 1752 if self.input_format == 'yuv': # Convert from per-pixel YUV to planar YUV. 1753 image = np.moveaxis(image, 2, 0) 1754 data = image.tobytes() 1755 stdin = self._proc.stdin 1756 assert stdin is not None 1757 if stdin.write(data) != len(data): 1758 self._proc.wait() 1759 stderr = self._proc.stderr 1760 assert stderr is not None 1761 s = stderr.read().decode('utf-8') 1762 raise RuntimeError(f"Error writing '{self.path}': {s}") 1763 1764 def close(self) -> None: 1765 """Finishes writing the video. (Called automatically at end of context.)""" 1766 if self._popen: 1767 assert self._proc, 'Error: closing an already closed context.' 
      stdin = self._proc.stdin
      assert stdin is not None
      stdin.close()
      if self._proc.wait():
        stderr = self._proc.stderr
        assert stderr is not None
        s = stderr.read().decode('utf-8')
        raise RuntimeError(f"Error writing '{self.path}': {s}")
      self._popen.__exit__(None, None, None)
      self._popen = None
      self._proc = None
    if self._write_via_local_file:
      # pylint: disable-next=no-member
      self._write_via_local_file.__exit__(None, None, None)
      self._write_via_local_file = None


class _VideoArray(npt.NDArray[Any]):
  """Wrapper to add a VideoMetadata `metadata` attribute to a numpy array."""

  metadata: VideoMetadata | None

  def __new__(
      cls: Type['_VideoArray'],
      input_array: _NDArray,
      metadata: VideoMetadata | None = None,
  ) -> '_VideoArray':
    obj: _VideoArray = np.asarray(input_array).view(cls)
    obj.metadata = metadata
    return obj

  def __array_finalize__(self, obj: Any) -> None:
    if obj is None:
      return
    self.metadata = getattr(obj, 'metadata', None)


def read_video(path_or_url: _Path, **kwargs: Any) -> _VideoArray:
  """Returns an array containing all images read from a compressed video file.

  >>> video = read_video('/tmp/river.mp4')
  >>> print(f'The framerate is {video.metadata.fps} frames/s.')
  >>> show_video(video)

  >>> url = 'https://github.com/hhoppe/data/raw/main/video.mp4'
  >>> show_video(read_video(url))

  Args:
    path_or_url: Input video file.
    **kwargs: Additional parameters for `VideoReader`.

  Returns:
    A 4D `numpy` array with dimensions (frame, height, width, channel), or a 3D
    array if `output_format` is specified as 'gray'.  The returned array has an
    attribute `metadata` containing `VideoMetadata` information.  This enables
    `show_video` to retrieve the framerate in `metadata.fps`.
Note that the 1824 metadata attribute is lost in most subsequent `numpy` operations. 1825 """ 1826 with VideoReader(path_or_url, **kwargs) as reader: 1827 return _VideoArray(np.array(tuple(reader)), metadata=reader.metadata) 1828 1829 1830def write_video(path: _Path, images: Iterable[_NDArray], **kwargs: Any) -> None: 1831 """Writes images to a compressed video file. 1832 1833 >>> video = moving_circle((480, 640), num_images=60) 1834 >>> write_video('/tmp/v.mp4', video, fps=60, qp=18) 1835 >>> show_video(read_video('/tmp/v.mp4')) 1836 1837 Args: 1838 path: Output video file. 1839 images: Iterable over video frames, e.g. a 4D array or a list of 2D or 3D 1840 arrays. 1841 **kwargs: Additional parameters for `VideoWriter`. 1842 """ 1843 first_image, images = _peek_first(images) 1844 shape = first_image.shape[0], first_image.shape[1] 1845 dtype = first_image.dtype 1846 if dtype == bool: 1847 dtype = np.dtype(np.uint8) 1848 elif np.issubdtype(dtype, np.floating): 1849 dtype = np.dtype(np.uint16) 1850 kwargs = {'metadata': getattr(images, 'metadata', None), **kwargs} 1851 with VideoWriter(path, shape=shape, dtype=dtype, **kwargs) as writer: 1852 for image in images: 1853 writer.add_image(image) 1854 1855 1856def compress_video( 1857 images: Iterable[_NDArray], *, codec: str = 'h264', **kwargs: Any 1858) -> bytes: 1859 """Returns a buffer containing a compressed video. 1860 1861 The video container is 'gif' for 'gif' codec, 'webm' for 'vp9' codec, 1862 and mp4 otherwise. 1863 1864 >>> video = read_video('/tmp/river.mp4') 1865 >>> data = compress_video(video, bps=10_000_000) 1866 >>> print(len(data)) 1867 1868 >>> data = compress_video(moving_circle((100, 100), num_images=10), fps=10) 1869 1870 Args: 1871 images: Iterable over video frames. 1872 codec: Compression algorithm as defined by `ffmpeg -codecs` (e.g., 'h264', 1873 'hevc', 'vp9', or 'gif'). 1874 **kwargs: Additional parameters for `VideoWriter`. 
1875 1876 Returns: 1877 A bytes buffer containing the compressed video. 1878 """ 1879 suffix = _filename_suffix_from_codec(codec) 1880 with tempfile.TemporaryDirectory() as directory_name: 1881 tmp_path = pathlib.Path(directory_name) / f'file{suffix}' 1882 write_video(tmp_path, images, codec=codec, **kwargs) 1883 return tmp_path.read_bytes() 1884 1885 1886def decompress_video(data: bytes, **kwargs: Any) -> _NDArray: 1887 """Returns video images from an MP4-compressed data buffer.""" 1888 with tempfile.TemporaryDirectory() as directory_name: 1889 tmp_path = pathlib.Path(directory_name) / 'file.mp4' 1890 tmp_path.write_bytes(data) 1891 return read_video(tmp_path, **kwargs) 1892 1893 1894def html_from_compressed_video( 1895 data: bytes, 1896 width: int, 1897 height: int, 1898 *, 1899 title: str | None = None, 1900 border: bool | str = False, 1901 loop: bool = True, 1902 autoplay: bool = True, 1903) -> str: 1904 """Returns an HTML string with a video tag containing H264-encoded data. 1905 1906 Args: 1907 data: MP4-compressed video bytes. 1908 width: Width of HTML video in pixels. 1909 height: Height of HTML video in pixels. 1910 title: Optional text shown centered above the video. 1911 border: If `bool`, whether to place a black boundary around the image, or if 1912 `str`, the boundary CSS style. 1913 loop: If True, the playback repeats forever. 1914 autoplay: If True, video playback starts without having to click. 1915 """ 1916 b64 = base64.b64encode(data).decode('utf-8') 1917 if isinstance(border, str): 1918 border = f'{border}; ' 1919 elif border: 1920 border = 'border:1px solid black; ' 1921 else: 1922 border = '' 1923 options = ( 1924 f'controls width="{width}" height="{height}"' 1925 f' style="{border}object-fit:cover;"' 1926 f'{" loop" if loop else ""}' 1927 f'{" autoplay muted" if autoplay else ""}' 1928 ) 1929 s = f"""<video {options}> 1930 <source src="data:video/mp4;base64,{b64}" type="video/mp4"/> 1931 This browser does not support the video tag. 
1932 </video>""" 1933 if title is not None: 1934 s = f"""<div style="display:flex; align-items:left;"> 1935 <div style="display:flex; flex-direction:column; align-items:center;"> 1936 <div>{title}</div><div>{s}</div></div></div>""" 1937 return s 1938 1939 1940def show_video( 1941 images: Iterable[_NDArray], *, title: str | None = None, **kwargs: Any 1942) -> str | None: 1943 """Displays a video in the IPython notebook and optionally saves it to a file. 1944 1945 See `show_videos`. 1946 1947 >>> video = read_video('https://github.com/hhoppe/data/raw/main/video.mp4') 1948 >>> show_video(video, title='River video') 1949 1950 >>> show_video(moving_circle((80, 80), num_images=10), fps=5, border=True) 1951 1952 >>> show_video(read_video('/tmp/river.mp4')) 1953 1954 Args: 1955 images: Iterable of video frames (e.g., a 4D array or a list of 2D or 3D 1956 arrays). 1957 title: Optional text shown centered above the video. 1958 **kwargs: See `show_videos`. 1959 1960 Returns: 1961 html string if `return_html` is `True`. 1962 """ 1963 return show_videos([images], [title], **kwargs) 1964 1965 1966def show_videos( 1967 videos: Iterable[Iterable[_NDArray]] | Mapping[str, Iterable[_NDArray]], 1968 titles: Iterable[str | None] | None = None, 1969 *, 1970 width: int | None = None, 1971 height: int | None = None, 1972 downsample: bool = True, 1973 columns: int | None = None, 1974 fps: float | None = None, 1975 bps: int | None = None, 1976 qp: int | None = None, 1977 codec: str = 'h264', 1978 ylabel: str = '', 1979 html_class: str = 'show_videos', 1980 return_html: bool = False, 1981 **kwargs: Any, 1982) -> str | None: 1983 """Displays a row of videos in the IPython notebook. 1984 1985 Creates HTML with `<video>` tags containing embedded H264-encoded bytestrings. 1986 If `codec` is set to 'gif', we instead use `<img>` tags containing embedded 1987 GIF-encoded bytestrings. 
Note that the resulting GIF animations skip frames 1988 when the `fps` period is not a multiple of 10 ms units (GIF frame delay 1989 units). Encoding at `fps` = 20.0, 25.0, or 50.0 works fine. 1990 1991 If a directory has been specified using `set_show_save_dir`, also saves each 1992 titled video to a file in that directory based on its title. 1993 1994 Args: 1995 videos: Iterable of videos, or dictionary of `{title: video}`. Each video 1996 must be an iterable of images. If a video object has a `metadata` 1997 (`VideoMetadata`) attribute, its `fps` field provides a default framerate. 1998 titles: Optional strings shown above the corresponding videos. 1999 width: Optional, overrides displayed width (in pixels). 2000 height: Optional, overrides displayed height (in pixels). 2001 downsample: If True, each video whose width or height is greater than the 2002 specified `width` or `height` is resampled to the display resolution. This 2003 improves antialiasing and reduces the size of the notebook. 2004 columns: Optional, maximum number of videos per row. 2005 fps: Frames-per-second framerate (default is 60.0 except 25.0 for GIF). 2006 bps: Bits-per-second bitrate (default None). 2007 qp: Quantization parameter for video compression quality (default None). 2008 codec: Compression algorithm; must be either 'h264' or 'gif'. 2009 ylabel: Text (rotated by 90 degrees) shown on the left of each row. 2010 html_class: CSS class name used in definition of HTML element. 2011 return_html: If `True` return the raw HTML `str` instead of displaying. 2012 **kwargs: Additional parameters (`border`, `loop`, `autoplay`) for 2013 `html_from_compressed_video`. 2014 2015 Returns: 2016 html string if `return_html` is `True`. 2017 """ 2018 if isinstance(videos, Mapping): 2019 if titles is not None: 2020 raise ValueError( 2021 'Cannot have both a video dictionary and a titles parameter.' 
2022 ) 2023 list_titles = list(videos.keys()) 2024 list_videos = list(videos.values()) 2025 else: 2026 list_videos = list(cast('Iterable[_NDArray]', videos)) 2027 list_titles = [None] * len(list_videos) if titles is None else list(titles) 2028 if len(list_videos) != len(list_titles): 2029 raise ValueError( 2030 'Number of videos does not match number of titles' 2031 f' ({len(list_videos)} vs {len(list_titles)}).' 2032 ) 2033 if codec not in {'h264', 'gif'}: 2034 raise ValueError(f'Codec {codec} is neither h264 or gif.') 2035 2036 html_strings = [] 2037 for video, title in zip(list_videos, list_titles): 2038 metadata: VideoMetadata | None = getattr(video, 'metadata', None) 2039 first_image, video = _peek_first(video) 2040 w, h = _get_width_height(width, height, first_image.shape[:2]) 2041 if downsample and (w < first_image.shape[1] or h < first_image.shape[0]): 2042 # Not resize_video() because each image may have different depth and type. 2043 video = [resize_image(image, (h, w)) for image in video] 2044 first_image = video[0] 2045 data = compress_video( 2046 video, metadata=metadata, fps=fps, bps=bps, qp=qp, codec=codec 2047 ) 2048 if title is not None and _config.show_save_dir: 2049 suffix = _filename_suffix_from_codec(codec) 2050 path = pathlib.Path(_config.show_save_dir) / f'{title}{suffix}' 2051 with _open(path, mode='wb') as f: 2052 f.write(data) 2053 if codec == 'gif': 2054 pixelated = h > first_image.shape[0] or w > first_image.shape[1] 2055 html_string = html_from_compressed_image( 2056 data, w, h, title=title, fmt='gif', pixelated=pixelated, **kwargs 2057 ) 2058 else: 2059 html_string = html_from_compressed_video( 2060 data, w, h, title=title, **kwargs 2061 ) 2062 html_strings.append(html_string) 2063 2064 # Create single-row tables each with no more than 'columns' elements. 
2065 table_strings = [] 2066 for row_html_strings in _chunked(html_strings, columns): 2067 td = '<td style="padding:1px;">' 2068 s = ''.join(f'{td}{e}</td>' for e in row_html_strings) 2069 if ylabel: 2070 style = 'writing-mode:vertical-lr; transform:rotate(180deg);' 2071 s = f'{td}<span style="{style}">{ylabel}</span></td>' + s 2072 table_strings.append( 2073 f'<table class="{html_class}"' 2074 f' style="border-spacing:0px;"><tr>{s}</tr></table>' 2075 ) 2076 s = ''.join(table_strings) 2077 if return_html: 2078 return s 2079 _display_html(s) 2080 return None 2081 2082 2083# Local Variables: 2084# fill-column: 80 2085# End:
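The `_VideoArray` pattern above (a `__new__` that views the input array and an `__array_finalize__` that copies the attribute) explains both why slicing preserves `metadata` and why converting back to a plain array loses it. A minimal standalone sketch, using a hypothetical `MetadataArray` name in place of the private class:

```python
import numpy as np


class MetadataArray(np.ndarray):
  """Hypothetical re-implementation of the `_VideoArray` pattern."""

  def __new__(cls, input_array, metadata=None):
    obj = np.asarray(input_array).view(cls)
    obj.metadata = metadata
    return obj

  def __array_finalize__(self, obj):
    if obj is None:
      return
    # Copy the attribute whenever numpy derives a view or result from this array.
    self.metadata = getattr(obj, 'metadata', None)


video = MetadataArray(np.zeros((4, 8, 8, 3)), metadata={'fps': 30.0})
assert video.metadata == {'fps': 30.0}
assert video[0].metadata == {'fps': 30.0}  # Views keep the attribute.
assert not hasattr(np.array(video), 'metadata')  # Plain copies lose it.
```

This is why `write_video` retrieves the default framerate via `getattr(images, 'metadata', None)`: the attribute survives only if the array has not been copied into a base `ndarray` in the meantime.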
def show_image(
    image: _ArrayLike, *, title: str | None = None, **kwargs: Any
) -> str | None:
  """Displays an image in the notebook and optionally saves it to a file.

  See `show_images`.

  >>> show_image(np.random.rand(100, 100))
  >>> show_image(np.random.randint(0, 256, size=(80, 80, 3), dtype='uint8'))
  >>> show_image(np.random.rand(10, 10) - 0.5, cmap='bwr', height=100)
  >>> show_image(read_image('/tmp/image.png'))
  >>> url = 'https://github.com/hhoppe/data/raw/main/image.png'
  >>> show_image(read_image(url))

  Args:
    image: 2D array-like, or 3D array-like with 1, 3, or 4 channels.
    title: Optional text shown centered above the image.
    **kwargs: See `show_images`.

  Returns:
    html string if `return_html` is `True`.
  """
  return show_images([np.asarray(image)], [title], **kwargs)
def show_images(
    images: Iterable[_ArrayLike] | Mapping[str, _ArrayLike],
    titles: Iterable[str | None] | None = None,
    *,
    width: int | None = None,
    height: int | None = None,
    downsample: bool = True,
    columns: int | None = None,
    vmin: float | None = None,
    vmax: float | None = None,
    cmap: str | Callable[[_ArrayLike], _NDArray] = 'gray',
    border: bool | str = False,
    ylabel: str = '',
    html_class: str = 'show_images',
    pixelated: bool | None = None,
    return_html: bool = False,
) -> str | None:
  """Displays a row of images in the IPython/Jupyter notebook.

  If a directory has been specified using `set_show_save_dir`, also saves each
  titled image to a file in that directory based on its title.

  >>> image1, image2 = np.random.rand(64, 64, 3), color_ramp((64, 64))
  >>> show_images([image1, image2])
  >>> show_images({'random image': image1, 'color ramp': image2}, height=128)
  >>> show_images([image1, image2] * 5, columns=4, border=True)

  Args:
    images: Iterable of images, or dictionary of `{title: image}`. Each image
      must be either a 2D array or a 3D array with 1, 3, or 4 channels.
    titles: Optional strings shown above the corresponding images.
    width: Optional, overrides displayed width (in pixels).
    height: Optional, overrides displayed height (in pixels).
    downsample: If True, each image whose width or height is greater than the
      specified `width` or `height` is resampled to the display resolution. This
      improves antialiasing and reduces the size of the notebook.
    columns: Optional, maximum number of images per row.
    vmin: For single-channel image, explicit min value for display.
    vmax: For single-channel image, explicit max value for display.
    cmap: For single-channel image, `pyplot` color map or callable to map 1D to
      3D color.
    border: If `bool`, whether to place a black boundary around the image, or if
      `str`, the boundary CSS style.
    ylabel: Text (rotated by 90 degrees) shown on the left of each row.
    html_class: CSS class name used in the definition of the HTML element.
    pixelated: If True, sets the CSS style to 'image-rendering: pixelated;'; if
      False, sets 'image-rendering: auto;'; if None, uses pixelated rendering
      only on images for which `width` or `height` introduces magnification.
    return_html: If `True`, return the raw HTML `str` instead of displaying.

  Returns:
    html string if `return_html` is `True`.
  """
  if isinstance(images, Mapping):
    if titles is not None:
      raise ValueError('Cannot have images dictionary and titles parameter.')
    list_titles, list_images = list(images.keys()), list(images.values())
  else:
    list_images = list(images)
    list_titles = [None] * len(list_images) if titles is None else list(titles)
    if len(list_images) != len(list_titles):
      raise ValueError(
          'Number of images does not match number of titles'
          f' ({len(list_images)} vs {len(list_titles)}).'
      )

  list_images = [
      _ensure_mapped_to_rgb(image, vmin=vmin, vmax=vmax, cmap=cmap)
      for image in list_images
  ]

  def maybe_downsample(image: _NDArray) -> _NDArray:
    shape = image.shape[0], image.shape[1]
    w, h = _get_width_height(width, height, shape)
    if w < shape[1] or h < shape[0]:
      image = resize_image(image, (h, w))
    return image

  if downsample:
    list_images = [maybe_downsample(image) for image in list_images]
  png_datas = [compress_image(to_uint8(image)) for image in list_images]

  for title, png_data in zip(list_titles, png_datas):
    if title is not None and _config.show_save_dir:
      path = pathlib.Path(_config.show_save_dir) / f'{title}.png'
      with _open(path, mode='wb') as f:
        f.write(png_data)

  def html_from_compressed_images() -> str:
    html_strings = []
    for image, title, png_data in zip(list_images, list_titles, png_datas):
      w, h = _get_width_height(width, height, image.shape[:2])
      magnified = h > image.shape[0] or w > image.shape[1]
      pixelated2 = pixelated if pixelated is not None else magnified
      html_strings.append(
          html_from_compressed_image(
              png_data, w, h, title=title, border=border, pixelated=pixelated2
          )
      )
    # Create single-row tables each with no more than 'columns' elements.
    table_strings = []
    for row_html_strings in _chunked(html_strings, columns):
      td = '<td style="padding:1px;">'
      s = ''.join(f'{td}{e}</td>' for e in row_html_strings)
      if ylabel:
        style = 'writing-mode:vertical-lr; transform:rotate(180deg);'
        s = f'{td}<span style="{style}">{ylabel}</span></td>' + s
      table_strings.append(
          f'<table class="{html_class}"'
          f' style="border-spacing:0px;"><tr>{s}</tr></table>'
      )
    return ''.join(table_strings)

  s = html_from_compressed_images()
  while len(s) > _IPYTHON_HTML_SIZE_LIMIT * 0.5:
    warnings.warn('mediapy: subsampling images to reduce HTML size')
    list_images = [image[::2, ::2] for image in list_images]
    png_datas = [compress_image(to_uint8(image)) for image in list_images]
    s = html_from_compressed_images()
  if return_html:
    return s
  _display_html(s)
  return None
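The row layout in `show_images` (and `show_videos`) relies on a `_chunked` helper that is not part of this excerpt. From how it is called with `columns`, its behavior is presumably to split a sequence into rows of at most `n` items, with a single row when `n` is None. A hypothetical sketch:

```python
import itertools


def chunked(iterable, n=None):
  """Yields tuples of up to `n` items; one tuple of everything if `n` is None."""
  it = iter(iterable)
  # itertools.islice(it, None) consumes the whole iterator in one chunk.
  while chunk := tuple(itertools.islice(it, n)):
    yield chunk


print(list(chunked('abcde', 2)))  # [('a', 'b'), ('c', 'd'), ('e',)]
```

Each chunk then becomes one single-row HTML table, which is how `columns=4` wraps ten images into rows of four.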
def compare_images(
    images: Iterable[_ArrayLike],
    *,
    vmin: float | None = None,
    vmax: float | None = None,
    cmap: str | Callable[[_ArrayLike], _NDArray] = 'gray',
) -> None:
  """Compare two images using an interactive slider.

  Displays an HTML slider component to interactively swipe between two images.
  The slider functionality requires that the web browser have Internet access.
  See additional info in `https://github.com/sneas/img-comparison-slider`.

  >>> image1, image2 = np.random.rand(64, 64, 3), color_ramp((64, 64))
  >>> compare_images([image1, image2])

  Args:
    images: Iterable of images. Each image must be either a 2D array or a 3D
      array with 1, 3, or 4 channels. There must be exactly two images.
    vmin: For single-channel image, explicit min value for display.
    vmax: For single-channel image, explicit max value for display.
    cmap: For single-channel image, `pyplot` color map or callable to map 1D to
      3D color.
  """
  list_images = [
      _ensure_mapped_to_rgb(image, vmin=vmin, vmax=vmax, cmap=cmap)
      for image in images
  ]
  if len(list_images) != 2:
    raise ValueError('The number of images must be 2.')
  png_datas = [compress_image(to_uint8(image)) for image in list_images]
  b64_1, b64_2 = [
      base64.b64encode(png_data).decode('utf-8') for png_data in png_datas
  ]
  s = _IMAGE_COMPARISON_HTML.replace('{b64_1}', b64_1).replace('{b64_2}', b64_2)
  _display_html(s)
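Both `compare_images` and the `html_from_compressed_*` helpers inline the compressed image or video bytes as base64 inside the generated HTML, so the notebook output is self-contained. The embedding step amounts to building a data URI, sketched here (the `data_uri` helper name is illustrative, not part of the library):

```python
import base64


def data_uri(data: bytes, mime: str = 'image/png') -> str:
  # Encode raw compressed bytes so they can be placed in an HTML attribute,
  # e.g. <img src="..."> or <source src="...">.
  b64 = base64.b64encode(data).decode('utf-8')
  return f'data:{mime};base64,{b64}'


print(data_uri(b'abc'))  # data:image/png;base64,YWJj
```

The trade-off is output size: base64 inflates the payload by about one third, which is why `show_images` subsamples when the HTML exceeds the IPython size limit.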
def read_image(
    path_or_url: _Path,
    *,
    apply_exif_transpose: bool = True,
    dtype: _DTypeLike = None,
) -> _NDArray:
  """Returns an image read from a file path or URL.

  Decoding is performed using `PIL`, which supports `uint8` images with 1, 3,
  or 4 channels and `uint16` images with a single channel.

  Args:
    path_or_url: Path of input file.
    apply_exif_transpose: If True, rotate image according to EXIF orientation.
    dtype: Data type of the returned array. If None, `np.uint8` or `np.uint16`
      is inferred automatically.
  """
  data = read_contents(path_or_url)
  return decompress_image(data, dtype, apply_exif_transpose)
def write_image(
    path: _Path, image: _ArrayLike, fmt: str = 'png', **kwargs: Any
) -> None:
  """Writes an image to a file.

  Encoding is performed using `PIL`, which supports `uint8` images with 1, 3,
  or 4 channels and `uint16` images with a single channel.

  The file format is explicitly provided by `fmt` and not inferred from `path`.

  Args:
    path: Path of output file.
    image: Array-like object. If its type is float, it is converted to np.uint8
      using `to_uint8` (thus clamping the input to the range [0.0, 1.0]).
      Otherwise it must be np.uint8 or np.uint16.
    fmt: Desired compression encoding, e.g. 'png'.
    **kwargs: Additional parameters for `PIL.Image.save()`.
  """
  image = _as_valid_media_array(image)
  if np.issubdtype(image.dtype, np.floating):
    image = to_uint8(image)
  with _open(path, 'wb') as f:
    _pil_image(image).save(f, format=fmt, **kwargs)
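The float-to-`np.uint8` conversion that `write_image` performs via `to_uint8` clamps the input to [0.0, 1.0] before scaling to [0, 255]. A rough sketch of that conversion (the exact rounding used by the library's `to_uint8` may differ):

```python
import numpy as np


def to_uint8_sketch(image):
  # Clamp floats to [0.0, 1.0], scale to [0, 255], and round to nearest.
  return (np.clip(image, 0.0, 1.0) * 255.0 + 0.5).astype(np.uint8)
```

So values below 0.0 map to 0 and values above 1.0 map to 255 rather than wrapping around, e.g. `to_uint8_sketch(np.array([-0.5, 0.5, 2.0]))` yields 0, 128, 255.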
1806def read_video(path_or_url: _Path, **kwargs: Any) -> _VideoArray: 1807 """Returns an array containing all images read from a compressed video file. 1808 1809 >>> video = read_video('/tmp/river.mp4') 1810 >>> print(f'The framerate is {video.metadata.fps} frames/s.') 1811 >>> show_video(video) 1812 1813 >>> url = 'https://github.com/hhoppe/data/raw/main/video.mp4' 1814 >>> show_video(read_video(url)) 1815 1816 Args: 1817 path_or_url: Input video file. 1818 **kwargs: Additional parameters for `VideoReader`. 1819 1820 Returns: 1821 A 4D `numpy` array with dimensions (frame, height, width, channel), or a 3D 1822 array if `output_format` is specified as 'gray'. The returned array has an 1823 attribute `metadata` containing `VideoMetadata` information. This enables 1824 `show_video` to retrieve the framerate in `metadata.fps`. Note that the 1825 metadata attribute is lost in most subsequent `numpy` operations. 1826 """ 1827 with VideoReader(path_or_url, **kwargs) as reader: 1828 return _VideoArray(np.array(tuple(reader)), metadata=reader.metadata)
Returns an array containing all images read from a compressed video file.

>>> video = read_video('/tmp/river.mp4')
>>> print(f'The framerate is {video.metadata.fps} frames/s.')
>>> show_video(video)

>>> url = 'https://github.com/hhoppe/data/raw/main/video.mp4'
>>> show_video(read_video(url))

Arguments:
- path_or_url: Input video file.
- **kwargs: Additional parameters for `VideoReader`.

Returns:
A 4D `numpy` array with dimensions (frame, height, width, channel), or a 3D array if `output_format` is specified as 'gray'. The returned array has an attribute `metadata` containing `VideoMetadata` information. This enables `show_video` to retrieve the framerate in `metadata.fps`. Note that the metadata attribute is lost in most subsequent `numpy` operations.
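The caveat that the `metadata` attribute is lost in subsequent `numpy` operations can be illustrated with a minimal ndarray subclass. The `VideoArray` class below is our own sketch of the idea behind the library's `_VideoArray`, not its actual implementation:

```python
import numpy as np

class VideoArray(np.ndarray):
    """Minimal ndarray subclass carrying a `metadata` attribute.

    Without an `__array_finalize__` method, the attribute does not
    survive ordinary array operations.
    """
    def __new__(cls, arr, metadata=None):
        obj = np.asarray(arr).view(cls)
        obj.metadata = metadata
        return obj

video = VideoArray(np.zeros((10, 4, 4, 3)), metadata={'fps': 30.0})
print(video.metadata['fps'])          # the attribute is present initially
darkened = video * 0.5                # an ordinary numpy operation
print(hasattr(darkened, 'metadata'))  # False: the attribute is lost
```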
def write_video(path: _Path, images: Iterable[_NDArray], **kwargs: Any) -> None:
  """Writes images to a compressed video file.

  >>> video = moving_circle((480, 640), num_images=60)
  >>> write_video('/tmp/v.mp4', video, fps=60, qp=18)
  >>> show_video(read_video('/tmp/v.mp4'))

  Args:
    path: Output video file.
    images: Iterable over video frames, e.g. a 4D array or a list of 2D or 3D
      arrays.
    **kwargs: Additional parameters for `VideoWriter`.
  """
  first_image, images = _peek_first(images)
  shape = first_image.shape[0], first_image.shape[1]
  dtype = first_image.dtype
  if dtype == bool:
    dtype = np.dtype(np.uint8)
  elif np.issubdtype(dtype, np.floating):
    dtype = np.dtype(np.uint16)
  kwargs = {'metadata': getattr(images, 'metadata', None), **kwargs}
  with VideoWriter(path, shape=shape, dtype=dtype, **kwargs) as writer:
    for image in images:
      writer.add_image(image)
Writes images to a compressed video file.

>>> video = moving_circle((480, 640), num_images=60)
>>> write_video('/tmp/v.mp4', video, fps=60, qp=18)
>>> show_video(read_video('/tmp/v.mp4'))

Arguments:
- path: Output video file.
- images: Iterable over video frames, e.g. a 4D array or a list of 2D or 3D arrays.
- **kwargs: Additional parameters for `VideoWriter`.
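The implementation peeks at the first frame (via the internal `_peek_first`) to infer the frame shape and dtype without consuming the iterable. A minimal sketch of that pattern, with our own hypothetical helper name:

```python
import itertools

def peek_first(iterable):
    """Return the first item plus an iterator that replays all items."""
    it = iter(iterable)
    first = next(it)
    return first, itertools.chain([first], it)

frames = (x * x for x in range(4))   # a one-shot generator of "frames"
first, frames = peek_first(frames)
print(first)                         # first frame, available for inspection
print(list(frames))                  # the full sequence is still intact
```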
class VideoReader(_VideoIO):
  """Context to read a compressed video as an iterable over its images.

  >>> with VideoReader('/tmp/river.mp4') as reader:
  ...   print(f'Video has {reader.num_images} images with shape={reader.shape},'
  ...         f' at {reader.fps} frames/sec and {reader.bps} bits/sec.')
  ...   for image in reader:
  ...     print(image.shape)

  >>> with VideoReader('/tmp/river.mp4') as reader:
  ...   video = np.array(tuple(reader))

  >>> url = 'https://github.com/hhoppe/data/raw/main/video.mp4'
  >>> with VideoReader(url) as reader:
  ...   show_video(reader)

  Attributes:
    path_or_url: Location of input video.
    output_format: Format of output images (default 'rgb').  If 'rgb', each
      image has shape=(height, width, 3) with R, G, B values.  If 'yuv', each
      image has shape=(height, width, 3) with Y, U, V values.  If 'gray',
      each image has shape=(height, width).
    dtype: Data type for output images.  The default is `np.uint8`.  Use of
      `np.uint16` allows reading 10-bit or 12-bit data without precision loss.
    metadata: Object storing the information retrieved from the video header.
      Its attributes are copied as attributes in this class.
    num_images: Number of frames expected from the video stream.  This is
      estimated from the framerate and the duration stored in the video
      header, so it might be inexact.
    shape: The dimensions (height, width) of each video frame.
    fps: The framerate in frames per second.
    bps: The estimated bitrate of the video stream in bits per second,
      retrieved from the video header.
    stream_index: The stream index to read from.  The default is 0.
  """

  path_or_url: _Path
  output_format: str
  dtype: _DType
  metadata: VideoMetadata
  num_images: int
  shape: tuple[int, int]
  fps: float
  bps: int | None
  stream_index: int
  _num_bytes_per_image: int

  def __init__(
      self,
      path_or_url: _Path,
      *,
      stream_index: int = 0,
      output_format: str = 'rgb',
      dtype: _DTypeLike = np.uint8,
  ):
    if output_format not in {'rgb', 'yuv', 'gray'}:
      raise ValueError(
          f'Output format {output_format} is not rgb, yuv, or gray.'
      )
    self.path_or_url = path_or_url
    self.output_format = output_format
    self.stream_index = stream_index
    self.dtype = np.dtype(dtype)
    if self.dtype.type not in (np.uint8, np.uint16):
      raise ValueError(f'Type {dtype} is not np.uint8 or np.uint16.')
    self._read_via_local_file: Any = None
    self._popen: subprocess.Popen[bytes] | None = None
    self._proc: subprocess.Popen[bytes] | None = None

  def __enter__(self) -> 'VideoReader':
    try:
      self._read_via_local_file = _read_via_local_file(self.path_or_url)
      # pylint: disable-next=no-member
      tmp_name = self._read_via_local_file.__enter__()

      self.metadata = _get_video_metadata(tmp_name)
      self.num_images, self.shape, self.fps, self.bps = self.metadata
      pix_fmt = self._get_pix_fmt(self.dtype, self.output_format)
      num_channels = {'rgb': 3, 'yuv': 3, 'gray': 1}[self.output_format]
      bytes_per_channel = self.dtype.itemsize
      self._num_bytes_per_image = (
          math.prod(self.shape) * num_channels * bytes_per_channel
      )

      command = [
          '-v', 'panic',
          '-nostdin',
          '-i', tmp_name,
          '-vcodec', 'rawvideo',
          '-f', 'image2pipe',
          '-map', f'0:v:{self.stream_index}',
          '-pix_fmt', pix_fmt,
          '-vsync', 'vfr',
          '-',
      ]
      self._popen = _run_ffmpeg(
          command,
          stdout=subprocess.PIPE,
          stderr=subprocess.PIPE,
          allowed_input_files=[tmp_name],
      )
      self._proc = self._popen.__enter__()
    except Exception:
      self.__exit__(None, None, None)
      raise
    return self

  def __exit__(self, *_: Any) -> None:
    self.close()

  def read(self) -> _NDArray | None:
    """Reads a video image frame (or None if at end of file).

    Returns:
      A numpy array in the format specified by `output_format`, i.e., a 3D
      array with 3 color channels, except for format 'gray' which is 2D.
    """
    assert self._proc, 'Error: reading from an already closed context.'
    stdout = self._proc.stdout
    assert stdout is not None
    data = stdout.read(self._num_bytes_per_image)
    if not data:  # Due to either end-of-file or subprocess error.
      self.close()  # Raises exception if subprocess had error.
      return None  # To indicate end-of-file.
    assert len(data) == self._num_bytes_per_image
    image = np.frombuffer(data, dtype=self.dtype)
    if self.output_format == 'rgb':
      image = image.reshape(*self.shape, 3)
    elif self.output_format == 'yuv':  # Convert from planar YUV to pixel YUV.
      image = np.moveaxis(image.reshape(3, *self.shape), 0, 2)
    elif self.output_format == 'gray':  # Generate 2D rather than 3D ndimage.
      image = image.reshape(*self.shape)
    else:
      raise AssertionError
    return image

  def __iter__(self) -> Iterator[_NDArray]:
    while True:
      image = self.read()
      if image is None:
        return
      yield image

  def close(self) -> None:
    """Terminates video reader.  (Called automatically at end of context.)"""
    if self._popen:
      self._popen.__exit__(None, None, None)
      self._popen = None
      self._proc = None
    if self._read_via_local_file:
      # pylint: disable-next=no-member
      self._read_via_local_file.__exit__(None, None, None)
      self._read_via_local_file = None
Context to read a compressed video as an iterable over its images.

>>> with VideoReader('/tmp/river.mp4') as reader:
...   print(f'Video has {reader.num_images} images with shape={reader.shape},'
...         f' at {reader.fps} frames/sec and {reader.bps} bits/sec.')
...   for image in reader:
...     print(image.shape)

>>> with VideoReader('/tmp/river.mp4') as reader:
...   video = np.array(tuple(reader))

>>> url = 'https://github.com/hhoppe/data/raw/main/video.mp4'
>>> with VideoReader(url) as reader:
...   show_video(reader)

Attributes:
- path_or_url: Location of input video.
- output_format: Format of output images (default 'rgb'). If 'rgb', each image has shape=(height, width, 3) with R, G, B values. If 'yuv', each image has shape=(height, width, 3) with Y, U, V values. If 'gray', each image has shape=(height, width).
- dtype: Data type for output images. The default is `np.uint8`. Use of `np.uint16` allows reading 10-bit or 12-bit data without precision loss.
- metadata: Object storing the information retrieved from the video header. Its attributes are copied as attributes in this class.
- num_images: Number of frames expected from the video stream. This is estimated from the framerate and the duration stored in the video header, so it might be inexact.
- shape: The dimensions (height, width) of each video frame.
- fps: The framerate in frames per second.
- bps: The estimated bitrate of the video stream in bits per second, retrieved from the video header.
- stream_index: The stream index to read from. The default is 0.
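Internally, the reader computes how many bytes to pull from the `ffmpeg` pipe per frame from the frame shape, channel count, and dtype. A small sketch of that arithmetic (the function name is ours):

```python
import math
import numpy as np

def bytes_per_frame(shape, output_format='rgb', dtype=np.uint8):
    """Bytes of raw data per decoded frame."""
    num_channels = {'rgb': 3, 'yuv': 3, 'gray': 1}[output_format]
    return math.prod(shape) * num_channels * np.dtype(dtype).itemsize

print(bytes_per_frame((480, 640)))                     # 480*640*3*1 = 921600
print(bytes_per_frame((480, 640), 'gray', np.uint16))  # 480*640*1*2 = 614400
```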
def __init__(
    self,
    path_or_url: _Path,
    *,
    stream_index: int = 0,
    output_format: str = 'rgb',
    dtype: _DTypeLike = np.uint8,
):
  if output_format not in {'rgb', 'yuv', 'gray'}:
    raise ValueError(
        f'Output format {output_format} is not rgb, yuv, or gray.'
    )
  self.path_or_url = path_or_url
  self.output_format = output_format
  self.stream_index = stream_index
  self.dtype = np.dtype(dtype)
  if self.dtype.type not in (np.uint8, np.uint16):
    raise ValueError(f'Type {dtype} is not np.uint8 or np.uint16.')
  self._read_via_local_file: Any = None
  self._popen: subprocess.Popen[bytes] | None = None
  self._proc: subprocess.Popen[bytes] | None = None
def read(self) -> _NDArray | None:
  """Reads a video image frame (or None if at end of file).

  Returns:
    A numpy array in the format specified by `output_format`, i.e., a 3D
    array with 3 color channels, except for format 'gray' which is 2D.
  """
  assert self._proc, 'Error: reading from an already closed context.'
  stdout = self._proc.stdout
  assert stdout is not None
  data = stdout.read(self._num_bytes_per_image)
  if not data:  # Due to either end-of-file or subprocess error.
    self.close()  # Raises exception if subprocess had error.
    return None  # To indicate end-of-file.
  assert len(data) == self._num_bytes_per_image
  image = np.frombuffer(data, dtype=self.dtype)
  if self.output_format == 'rgb':
    image = image.reshape(*self.shape, 3)
  elif self.output_format == 'yuv':  # Convert from planar YUV to pixel YUV.
    image = np.moveaxis(image.reshape(3, *self.shape), 0, 2)
  elif self.output_format == 'gray':  # Generate 2D rather than 3D ndimage.
    image = image.reshape(*self.shape)
  else:
    raise AssertionError
  return image
Reads a video image frame (or None if at end of file).

Returns:
A numpy array in the format specified by `output_format`, i.e., a 3D array with 3 color channels, except for format 'gray' which is 2D.
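The planar-to-pixelwise conversion performed for the 'yuv' output format amounts to a reshape plus `np.moveaxis`, which we can demonstrate on a tiny synthetic frame:

```python
import numpy as np

height, width = 2, 3
# ffmpeg emits 'yuv' frames in planar order: all Y samples, then U, then V.
raw = np.arange(3 * height * width, dtype=np.uint8)
planar = raw.reshape(3, height, width)
pixelwise = np.moveaxis(planar, 0, 2)  # -> shape (height, width, 3)
print(pixelwise.shape)
print(pixelwise[0, 0])  # the (Y, U, V) triple of the top-left pixel
```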
def close(self) -> None:
  """Terminates video reader.  (Called automatically at end of context.)"""
  if self._popen:
    self._popen.__exit__(None, None, None)
    self._popen = None
    self._proc = None
  if self._read_via_local_file:
    # pylint: disable-next=no-member
    self._read_via_local_file.__exit__(None, None, None)
    self._read_via_local_file = None
Terminates video reader. (Called automatically at end of context.)
class VideoWriter(_VideoIO):
  """Context to write a compressed video.

  >>> shape = 480, 640
  >>> with VideoWriter('/tmp/v.mp4', shape, fps=60) as writer:
  ...   for image in moving_circle(shape, num_images=60):
  ...     writer.add_image(image)
  >>> show_video(read_video('/tmp/v.mp4'))

  Bitrate control may be specified using at most one of: `bps`, `qp`, or
  `crf`.  If none are specified, `qp` is set to a default value.
  See https://slhck.info/video/2017/03/01/rate-control.html

  If codec is 'gif', the args `bps`, `qp`, `crf`, and `encoded_format` are
  ignored.

  Attributes:
    path: Output video.  Its suffix (e.g. '.mp4') determines the video
      container format.  The suffix must be '.gif' if the codec is 'gif'.
    shape: 2D spatial dimensions (height, width) of video image frames.  The
      dimensions must be even if `encoded_format` has subsampled chroma
      (e.g., 'yuv420p' or 'yuv420p10le').
    codec: Compression algorithm as defined by `ffmpeg -codecs` (e.g.,
      'h264', 'hevc', 'vp9', or 'gif').
    metadata: Optional `VideoMetadata` object whose `fps` and `bps`
      attributes are used if not specified as explicit parameters.
    fps: Frames-per-second framerate (default is 60.0, except 25.0 for
      'gif').
    bps: Requested average bits-per-second bitrate (default None).
    qp: Quantization parameter for video compression quality (default None).
    crf: Constant rate factor for video compression quality (default None).
    ffmpeg_args: Additional arguments for the `ffmpeg` command, e.g. '-g 30'
      to introduce I-frames, or '-bf 0' to omit B-frames.
    input_format: Format of input images (default 'rgb').  If 'rgb', each
      image has shape=(height, width, 3) or (height, width).  If 'yuv', each
      image has shape=(height, width, 3) with Y, U, V values.  If 'gray',
      each image has shape=(height, width).
    dtype: Expected data type for input images (any float input images are
      converted to `dtype`).  The default is `np.uint8`.  Use of `np.uint16`
      is necessary when encoding >8 bits/channel.
    encoded_format: Pixel format as defined by `ffmpeg -pix_fmts`, e.g.,
      'yuv420p' (2x2-subsampled chroma), 'yuv444p' (full-res chroma),
      'yuv420p10le' (10 bits per channel), etc.  The default (None) selects
      'yuv420p' if all shape dimensions are even, else 'yuv444p'.
  """

  def __init__(
      self,
      path: _Path,
      shape: tuple[int, int],
      *,
      codec: str = 'h264',
      metadata: VideoMetadata | None = None,
      fps: float | None = None,
      bps: int | None = None,
      qp: int | None = None,
      crf: float | None = None,
      ffmpeg_args: str | Sequence[str] = '',
      input_format: str = 'rgb',
      dtype: _DTypeLike = np.uint8,
      encoded_format: str | None = None,
  ) -> None:
    _check_2d_shape(shape)
    if fps is None and metadata:
      fps = metadata.fps
    if fps is None:
      fps = 25.0 if codec == 'gif' else 60.0
    if fps <= 0.0:
      raise ValueError(f'Frame-per-second value {fps} is invalid.')
    if bps is None and metadata:
      bps = metadata.bps
    bps = int(bps) if bps is not None else None
    if bps is not None and bps <= 0:
      raise ValueError(f'Bitrate value {bps} is invalid.')
    if qp is not None and (not isinstance(qp, int) or qp < 0):
      raise ValueError(
          f'Quantization parameter {qp} cannot be negative. It must be a'
          ' non-negative integer.'
      )
    num_rate_specifications = sum(x is not None for x in (bps, qp, crf))
    if num_rate_specifications > 1:
      raise ValueError(
          f'Must specify at most one of bps, qp, or crf ({bps}, {qp}, {crf}).'
      )
    ffmpeg_args = (
        shlex.split(ffmpeg_args)
        if isinstance(ffmpeg_args, str)
        else list(ffmpeg_args)
    )
    if input_format not in {'rgb', 'yuv', 'gray'}:
      raise ValueError(f'Input format {input_format} is not rgb, yuv, or gray.')
    dtype = np.dtype(dtype)
    if dtype.type not in (np.uint8, np.uint16):
      raise ValueError(f'Type {dtype} is not np.uint8 or np.uint16.')
    self.path = pathlib.Path(path)
    self.shape = shape
    all_dimensions_are_even = all(dim % 2 == 0 for dim in shape)
    if encoded_format is None:
      encoded_format = 'yuv420p' if all_dimensions_are_even else 'yuv444p'
    if not all_dimensions_are_even and encoded_format.startswith(
        ('yuv42', 'yuvj42')
    ):
      raise ValueError(
          f'With encoded_format {encoded_format}, video dimensions must be'
          f' even, but shape is {shape}.'
      )
    self.fps = fps
    self.codec = codec
    self.bps = bps
    self.qp = qp
    self.crf = crf
    self.ffmpeg_args = ffmpeg_args
    self.input_format = input_format
    self.dtype = dtype
    self.encoded_format = encoded_format
    if num_rate_specifications == 0 and not ffmpeg_args:
      qp = 20 if math.prod(self.shape) <= 640 * 480 else 28
    self._bitrate_args = (
        (['-vb', f'{bps}'] if bps is not None else [])
        + (['-qp', f'{qp}'] if qp is not None else [])
        + (['-vb', '0', '-crf', f'{crf}'] if crf is not None else [])
    )
    if self.codec == 'gif':
      if self.path.suffix != '.gif':
        raise ValueError(f"File '{self.path}' does not have a .gif suffix.")
      self.encoded_format = 'pal8'
      self._bitrate_args = []
      video_filter = 'split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse'
      # Less common (and likely less useful) is a per-frame color palette:
      # video_filter = ('split[s0][s1];[s0]palettegen=stats_mode=single[p];'
      #                 '[s1][p]paletteuse=new=1')
      self.ffmpeg_args = ['-vf', video_filter, '-f', 'gif'] + self.ffmpeg_args

    self._write_via_local_file: Any = None
    self._popen: subprocess.Popen[bytes] | None = None
    self._proc: subprocess.Popen[bytes] | None = None

  def __enter__(self) -> 'VideoWriter':
    input_pix_fmt = self._get_pix_fmt(self.dtype, self.input_format)
    try:
      self._write_via_local_file = _write_via_local_file(self.path)
      # pylint: disable-next=no-member
      tmp_name = self._write_via_local_file.__enter__()

      # Writing to stdout using ('-f', 'mp4', '-') would require
      # ('-movflags', 'frag_keyframe+empty_moov') and the result is
      # nonportable.
      height, width = self.shape
      command = (
          [
              '-v', 'error',
              '-f', 'rawvideo',
              '-vcodec', 'rawvideo',
              '-pix_fmt', input_pix_fmt,
              '-s', f'{width}x{height}',
              '-r', f'{self.fps}',
              '-i', '-',
              '-an',
              '-vcodec', self.codec,
              '-pix_fmt', self.encoded_format,
          ]
          + self._bitrate_args
          + self.ffmpeg_args
          + ['-y', tmp_name]
      )
      self._popen = _run_ffmpeg(
          command,
          stdin=subprocess.PIPE,
          stderr=subprocess.PIPE,
          allowed_output_files=[tmp_name],
      )
      self._proc = self._popen.__enter__()
    except Exception:
      self.__exit__(None, None, None)
      raise
    return self

  def __exit__(self, *_: Any) -> None:
    self.close()

  def add_image(self, image: _NDArray) -> None:
    """Writes a video frame.

    Args:
      image: Array whose dtype and first two dimensions must match the
        `dtype` and `shape` specified in `VideoWriter` initialization.  If
        `input_format` is 'gray', the image must be 2D.  For the 'rgb'
        input_format, the image may be either 2D (interpreted as grayscale)
        or 3D with three (R, G, B) channels.  For the 'yuv' input_format, the
        image must be 3D with three (Y, U, V) channels.

    Raises:
      RuntimeError: If there is an error writing to the output file.
    """
    assert self._proc, 'Error: writing to an already closed context.'
    if issubclass(image.dtype.type, (np.floating, np.bool_)):
      image = to_type(image, self.dtype)
    if image.dtype != self.dtype:
      raise ValueError(f'Image type {image.dtype} != {self.dtype}.')
    if self.input_format == 'gray':
      if image.ndim != 2:
        raise ValueError(f'Image dimensions {image.shape} are not 2D.')
    else:
      if image.ndim == 2 and self.input_format == 'rgb':
        image = np.dstack((image, image, image))
      if not (image.ndim == 3 and image.shape[2] == 3):
        raise ValueError(f'Image dimensions {image.shape} are invalid.')
    if image.shape[:2] != self.shape:
      raise ValueError(
          f'Image dimensions {image.shape[:2]} do not match'
          f' those of the initialized video {self.shape}.'
      )
    if self.input_format == 'yuv':  # Convert from per-pixel YUV to planar YUV.
      image = np.moveaxis(image, 2, 0)
    data = image.tobytes()
    stdin = self._proc.stdin
    assert stdin is not None
    if stdin.write(data) != len(data):
      self._proc.wait()
      stderr = self._proc.stderr
      assert stderr is not None
      s = stderr.read().decode('utf-8')
      raise RuntimeError(f"Error writing '{self.path}': {s}")

  def close(self) -> None:
    """Finishes writing the video.  (Called automatically at end of context.)"""
    if self._popen:
      assert self._proc, 'Error: closing an already closed context.'
      stdin = self._proc.stdin
      assert stdin is not None
      stdin.close()
      if self._proc.wait():
        stderr = self._proc.stderr
        assert stderr is not None
        s = stderr.read().decode('utf-8')
        raise RuntimeError(f"Error writing '{self.path}': {s}")
      self._popen.__exit__(None, None, None)
      self._popen = None
      self._proc = None
    if self._write_via_local_file:
      # pylint: disable-next=no-member
      self._write_via_local_file.__exit__(None, None, None)
      self._write_via_local_file = None
Context to write a compressed video.

>>> shape = 480, 640
>>> with VideoWriter('/tmp/v.mp4', shape, fps=60) as writer:
...   for image in moving_circle(shape, num_images=60):
...     writer.add_image(image)
>>> show_video(read_video('/tmp/v.mp4'))

Bitrate control may be specified using at most one of: `bps`, `qp`, or `crf`. If none are specified, `qp` is set to a default value. See https://slhck.info/video/2017/03/01/rate-control.html

If codec is 'gif', the args `bps`, `qp`, `crf`, and `encoded_format` are ignored.

Attributes:
- path: Output video. Its suffix (e.g. '.mp4') determines the video container format. The suffix must be '.gif' if the codec is 'gif'.
- shape: 2D spatial dimensions (height, width) of video image frames. The dimensions must be even if `encoded_format` has subsampled chroma (e.g., 'yuv420p' or 'yuv420p10le').
- codec: Compression algorithm as defined by `ffmpeg -codecs` (e.g., 'h264', 'hevc', 'vp9', or 'gif').
- metadata: Optional `VideoMetadata` object whose `fps` and `bps` attributes are used if not specified as explicit parameters.
- fps: Frames-per-second framerate (default is 60.0, except 25.0 for 'gif').
- bps: Requested average bits-per-second bitrate (default None).
- qp: Quantization parameter for video compression quality (default None).
- crf: Constant rate factor for video compression quality (default None).
- ffmpeg_args: Additional arguments for the `ffmpeg` command, e.g. '-g 30' to introduce I-frames, or '-bf 0' to omit B-frames.
- input_format: Format of input images (default 'rgb'). If 'rgb', each image has shape=(height, width, 3) or (height, width). If 'yuv', each image has shape=(height, width, 3) with Y, U, V values. If 'gray', each image has shape=(height, width).
- dtype: Expected data type for input images (any float input images are converted to `dtype`). The default is `np.uint8`. Use of `np.uint16` is necessary when encoding >8 bits/channel.
- encoded_format: Pixel format as defined by `ffmpeg -pix_fmts`, e.g., 'yuv420p' (2x2-subsampled chroma), 'yuv444p' (full-res chroma), 'yuv420p10le' (10 bits per channel), etc. The default (None) selects 'yuv420p' if all shape dimensions are even, else 'yuv444p'.
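The default `encoded_format` selection described in the last bullet can be sketched in a few lines (the helper name is ours): 4:2:0 chroma subsampling stores one chroma sample per 2x2 pixel block, so it requires even dimensions.

```python
def default_encoded_format(shape):
    """Default pixel format: subsampled chroma only when dimensions allow it."""
    all_dimensions_are_even = all(dim % 2 == 0 for dim in shape)
    return 'yuv420p' if all_dimensions_are_even else 'yuv444p'

print(default_encoded_format((480, 640)))  # 'yuv420p': chroma can be subsampled
print(default_encoded_format((481, 640)))  # 'yuv444p': odd height forbids 4:2:0
```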
def __init__(
    self,
    path: _Path,
    shape: tuple[int, int],
    *,
    codec: str = 'h264',
    metadata: VideoMetadata | None = None,
    fps: float | None = None,
    bps: int | None = None,
    qp: int | None = None,
    crf: float | None = None,
    ffmpeg_args: str | Sequence[str] = '',
    input_format: str = 'rgb',
    dtype: _DTypeLike = np.uint8,
    encoded_format: str | None = None,
) -> None:
  _check_2d_shape(shape)
  if fps is None and metadata:
    fps = metadata.fps
  if fps is None:
    fps = 25.0 if codec == 'gif' else 60.0
  if fps <= 0.0:
    raise ValueError(f'Frame-per-second value {fps} is invalid.')
  if bps is None and metadata:
    bps = metadata.bps
  bps = int(bps) if bps is not None else None
  if bps is not None and bps <= 0:
    raise ValueError(f'Bitrate value {bps} is invalid.')
  if qp is not None and (not isinstance(qp, int) or qp < 0):
    raise ValueError(
        f'Quantization parameter {qp} cannot be negative. It must be a'
        ' non-negative integer.'
    )
  num_rate_specifications = sum(x is not None for x in (bps, qp, crf))
  if num_rate_specifications > 1:
    raise ValueError(
        f'Must specify at most one of bps, qp, or crf ({bps}, {qp}, {crf}).'
    )
  ffmpeg_args = (
      shlex.split(ffmpeg_args)
      if isinstance(ffmpeg_args, str)
      else list(ffmpeg_args)
  )
  if input_format not in {'rgb', 'yuv', 'gray'}:
    raise ValueError(f'Input format {input_format} is not rgb, yuv, or gray.')
  dtype = np.dtype(dtype)
  if dtype.type not in (np.uint8, np.uint16):
    raise ValueError(f'Type {dtype} is not np.uint8 or np.uint16.')
  self.path = pathlib.Path(path)
  self.shape = shape
  all_dimensions_are_even = all(dim % 2 == 0 for dim in shape)
  if encoded_format is None:
    encoded_format = 'yuv420p' if all_dimensions_are_even else 'yuv444p'
  if not all_dimensions_are_even and encoded_format.startswith(
      ('yuv42', 'yuvj42')
  ):
    raise ValueError(
        f'With encoded_format {encoded_format}, video dimensions must be'
        f' even, but shape is {shape}.'
    )
  self.fps = fps
  self.codec = codec
  self.bps = bps
  self.qp = qp
  self.crf = crf
  self.ffmpeg_args = ffmpeg_args
  self.input_format = input_format
  self.dtype = dtype
  self.encoded_format = encoded_format
  if num_rate_specifications == 0 and not ffmpeg_args:
    qp = 20 if math.prod(self.shape) <= 640 * 480 else 28
  self._bitrate_args = (
      (['-vb', f'{bps}'] if bps is not None else [])
      + (['-qp', f'{qp}'] if qp is not None else [])
      + (['-vb', '0', '-crf', f'{crf}'] if crf is not None else [])
  )
  if self.codec == 'gif':
    if self.path.suffix != '.gif':
      raise ValueError(f"File '{self.path}' does not have a .gif suffix.")
    self.encoded_format = 'pal8'
    self._bitrate_args = []
    video_filter = 'split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse'
    # Less common (and likely less useful) is a per-frame color palette:
    # video_filter = ('split[s0][s1];[s0]palettegen=stats_mode=single[p];'
    #                 '[s1][p]paletteuse=new=1')
    self.ffmpeg_args = ['-vf', video_filter, '-f', 'gif'] + self.ffmpeg_args

  self._write_via_local_file: Any = None
  self._popen: subprocess.Popen[bytes] | None = None
  self._proc: subprocess.Popen[bytes] | None = None
def add_image(self, image: _NDArray) -> None:
  """Writes a video frame.

  Args:
    image: Array whose dtype and first two dimensions must match the `dtype`
      and `shape` specified in `VideoWriter` initialization.  If
      `input_format` is 'gray', the image must be 2D.  For the 'rgb'
      input_format, the image may be either 2D (interpreted as grayscale) or
      3D with three (R, G, B) channels.  For the 'yuv' input_format, the
      image must be 3D with three (Y, U, V) channels.

  Raises:
    RuntimeError: If there is an error writing to the output file.
  """
  assert self._proc, 'Error: writing to an already closed context.'
  if issubclass(image.dtype.type, (np.floating, np.bool_)):
    image = to_type(image, self.dtype)
  if image.dtype != self.dtype:
    raise ValueError(f'Image type {image.dtype} != {self.dtype}.')
  if self.input_format == 'gray':
    if image.ndim != 2:
      raise ValueError(f'Image dimensions {image.shape} are not 2D.')
  else:
    if image.ndim == 2 and self.input_format == 'rgb':
      image = np.dstack((image, image, image))
    if not (image.ndim == 3 and image.shape[2] == 3):
      raise ValueError(f'Image dimensions {image.shape} are invalid.')
  if image.shape[:2] != self.shape:
    raise ValueError(
        f'Image dimensions {image.shape[:2]} do not match'
        f' those of the initialized video {self.shape}.'
    )
  if self.input_format == 'yuv':  # Convert from per-pixel YUV to planar YUV.
    image = np.moveaxis(image, 2, 0)
  data = image.tobytes()
  stdin = self._proc.stdin
  assert stdin is not None
  if stdin.write(data) != len(data):
    self._proc.wait()
    stderr = self._proc.stderr
    assert stderr is not None
    s = stderr.read().decode('utf-8')
    raise RuntimeError(f"Error writing '{self.path}': {s}")
Writes a video frame.

Arguments:
- image: Array whose dtype and first two dimensions must match the `dtype` and `shape` specified in `VideoWriter` initialization. If `input_format` is 'gray', the image must be 2D. For the 'rgb' input_format, the image may be either 2D (interpreted as grayscale) or 3D with three (R, G, B) channels. For the 'yuv' input_format, the image must be 3D with three (Y, U, V) channels.

Raises:
- RuntimeError: If there is an error writing to the output file.
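The promotion of a 2D grayscale frame to three channels for the 'rgb' input format is a simple `np.dstack`, as the implementation shows:

```python
import numpy as np

gray = np.full((2, 2), 100, dtype=np.uint8)  # a 2D grayscale frame
rgb = np.dstack((gray, gray, gray))          # replicate into R, G, B channels
print(rgb.shape)   # (2, 2, 3)
print(rgb[0, 0])   # [100 100 100]
```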
def close(self) -> None:
  """Finishes writing the video.  (Called automatically at end of context.)"""
  if self._popen:
    assert self._proc, 'Error: closing an already closed context.'
    stdin = self._proc.stdin
    assert stdin is not None
    stdin.close()
    if self._proc.wait():
      stderr = self._proc.stderr
      assert stderr is not None
      s = stderr.read().decode('utf-8')
      raise RuntimeError(f"Error writing '{self.path}': {s}")
    self._popen.__exit__(None, None, None)
    self._popen = None
    self._proc = None
  if self._write_via_local_file:
    # pylint: disable-next=no-member
    self._write_via_local_file.__exit__(None, None, None)
    self._write_via_local_file = None
Finishes writing the video. (Called automatically at end of context.)
class VideoMetadata(NamedTuple):
  """Represents the data stored in a video container header.

  Attributes:
    num_images: Number of frames expected from the video stream.  This is
      estimated from the framerate and the duration stored in the video
      header, so it might be inexact.  We set the value to -1 if the number
      of frames is not found in the header.
    shape: The dimensions (height, width) of each video frame.
    fps: The framerate in frames per second.
    bps: The estimated bitrate of the video stream in bits per second,
      retrieved from the video header.
  """

  num_images: int
  shape: tuple[int, int]
  fps: float
  bps: int | None
Represents the data stored in a video container header.

Attributes:
- num_images: Number of frames expected from the video stream. This is estimated from the framerate and the duration stored in the video header, so it might be inexact. The value is set to -1 if the number of frames is not found in the header.
- shape: The dimensions (height, width) of each video frame.
- fps: The framerate in frames per second.
- bps: The estimated bitrate of the video stream in bits per second, retrieved from the video header.
def compress_image(
    image: _ArrayLike, *, fmt: str = 'png', **kwargs: Any
) -> bytes:
  """Returns a buffer containing a compressed image.

  Args:
    image: Array in a format supported by `PIL`, e.g. np.uint8 or np.uint16.
    fmt: Desired compression encoding, e.g. 'png'.
    **kwargs: Options for `PIL.Image.save()`, e.g. `optimize=True` for
      greater compression.
  """
  image = _as_valid_media_array(image)
  with io.BytesIO() as output:
    _pil_image(image).save(output, format=fmt, **kwargs)
    return output.getvalue()
Returns a buffer containing a compressed image.

Arguments:
- image: Array in a format supported by `PIL`, e.g. np.uint8 or np.uint16.
- fmt: Desired compression encoding, e.g. 'png'.
- **kwargs: Options for `PIL.Image.save()`, e.g. `optimize=True` for greater compression.
def decompress_image(
    data: bytes, dtype: _DTypeLike = None, apply_exif_transpose: bool = True
) -> _NDArray:
  """Returns an image from a compressed data buffer.

  Decoding is performed using `PIL`, which supports `uint8` images with 1, 3,
  or 4 channels and `uint16` images with a single channel.

  Args:
    data: Buffer containing compressed image.
    dtype: Data type of the returned array. If None, `np.uint8` or `np.uint16`
      is inferred automatically.
    apply_exif_transpose: If True, rotate image according to EXIF orientation.
  """
  pil_image: PIL.Image.Image = PIL.Image.open(io.BytesIO(data))
  if apply_exif_transpose:
    tmp_image = PIL.ImageOps.exif_transpose(pil_image)  # Future: in_place=True.
    assert tmp_image
    pil_image = tmp_image
  if dtype is None:
    dtype = np.uint16 if pil_image.mode.startswith('I') else np.uint8
  return np.array(pil_image, dtype=dtype)
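The compress/decompress pair is an in-memory round trip through `PIL`; a sketch of the underlying mechanism using Pillow and NumPy directly (assuming Pillow is installed):

```python
import io

import numpy as np
import PIL.Image

# A tiny uint8 test image: 2x2 pixels, 3 channels.
image = np.array(
    [[[255, 0, 0], [0, 255, 0]], [[0, 0, 255], [255, 255, 255]]],
    dtype=np.uint8,
)

# Compress to an in-memory PNG, as compress_image does via PIL.save().
with io.BytesIO() as output:
  PIL.Image.fromarray(image).save(output, format='png')
  data = output.getvalue()

# Decompress back to an array, as decompress_image does via PIL.Image.open().
decoded = np.array(PIL.Image.open(io.BytesIO(data)), dtype=np.uint8)
assert np.array_equal(decoded, image)  # PNG is lossless.
```

A lossy format such as `'jpeg'` would not round-trip exactly, so the equality check above holds only for lossless encodings.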
def compress_video(
    images: Iterable[_NDArray], *, codec: str = 'h264', **kwargs: Any
) -> bytes:
  """Returns a buffer containing a compressed video.

  The video container is 'gif' for 'gif' codec, 'webm' for 'vp9' codec,
  and mp4 otherwise.

  >>> video = read_video('/tmp/river.mp4')
  >>> data = compress_video(video, bps=10_000_000)
  >>> print(len(data))

  >>> data = compress_video(moving_circle((100, 100), num_images=10), fps=10)

  Args:
    images: Iterable over video frames.
    codec: Compression algorithm as defined by `ffmpeg -codecs` (e.g., 'h264',
      'hevc', 'vp9', or 'gif').
    **kwargs: Additional parameters for `VideoWriter`.

  Returns:
    A bytes buffer containing the compressed video.
  """
  suffix = _filename_suffix_from_codec(codec)
  with tempfile.TemporaryDirectory() as directory_name:
    tmp_path = pathlib.Path(directory_name) / f'file{suffix}'
    write_video(tmp_path, images, codec=codec, **kwargs)
    return tmp_path.read_bytes()
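The documented codec-to-container mapping ('gif' → GIF, 'vp9' → WebM, everything else → MP4) can be sketched as a small helper; this is a hypothetical re-implementation for illustration, not the library's private `_filename_suffix_from_codec`:

```python
def filename_suffix_from_codec(codec: str) -> str:
  """Maps a codec name to the container suffix described above."""
  # 'gif' -> .gif container, 'vp9' -> .webm, anything else -> .mp4.
  return {'gif': '.gif', 'vp9': '.webm'}.get(codec, '.mp4')


assert filename_suffix_from_codec('h264') == '.mp4'
assert filename_suffix_from_codec('vp9') == '.webm'
assert filename_suffix_from_codec('gif') == '.gif'
```

Choosing the suffix before writing matters because ffmpeg infers the container format from the output filename extension.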
def decompress_video(data: bytes, **kwargs: Any) -> _NDArray:
  """Returns video images from an MP4-compressed data buffer."""
  with tempfile.TemporaryDirectory() as directory_name:
    tmp_path = pathlib.Path(directory_name) / 'file.mp4'
    tmp_path.write_bytes(data)
    return read_video(tmp_path, **kwargs)
def html_from_compressed_image(
    data: bytes,
    width: int,
    height: int,
    *,
    title: str | None = None,
    border: bool | str = False,
    pixelated: bool = True,
    fmt: str = 'png',
) -> str:
  """Returns an HTML string with an image tag containing encoded data.

  Args:
    data: Compressed image bytes.
    width: Width of HTML image in pixels.
    height: Height of HTML image in pixels.
    title: Optional text shown centered above image.
    border: If `bool`, whether to place a black boundary around the image, or
      if `str`, the boundary CSS style.
    pixelated: If True, sets the CSS style to 'image-rendering: pixelated;'.
    fmt: Compression encoding.
  """
  b64 = base64.b64encode(data).decode('utf-8')
  if isinstance(border, str):
    border = f'{border}; '
  elif border:
    border = 'border:1px solid black; '
  else:
    border = ''
  s_pixelated = 'pixelated' if pixelated else 'auto'
  s = (
      f'<img width="{width}" height="{height}"'
      f' style="{border}image-rendering:{s_pixelated}; object-fit:cover;"'
      f' src="data:image/{fmt};base64,{b64}"/>'
  )
  if title is not None:
    s = f"""<div style="display:flex; align-items:left;">
      <div style="display:flex; flex-direction:column; align-items:center;">
      <div>{title}</div><div>{s}</div></div></div>"""
  return s
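The core of this function is embedding the compressed bytes as a base64 data URI inside an `<img>` tag; a minimal sketch (the `data` bytes here are a placeholder, not a real image):

```python
import base64

# Placeholder bytes standing in for real PNG-compressed data.
data = b'placeholder-image-bytes'

# Base64-encode and embed as a data URI, as html_from_compressed_image does.
b64 = base64.b64encode(data).decode('utf-8')
html = (
    f'<img width="32" height="32"'
    f' style="image-rendering:pixelated; object-fit:cover;"'
    f' src="data:image/png;base64,{b64}"/>'
)
```

A data URI keeps the notebook output self-contained: the image is stored inline in the HTML rather than referenced from a separate file.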
def html_from_compressed_video(
    data: bytes,
    width: int,
    height: int,
    *,
    title: str | None = None,
    border: bool | str = False,
    loop: bool = True,
    autoplay: bool = True,
) -> str:
  """Returns an HTML string with a video tag containing H264-encoded data.

  Args:
    data: MP4-compressed video bytes.
    width: Width of HTML video in pixels.
    height: Height of HTML video in pixels.
    title: Optional text shown centered above the video.
    border: If `bool`, whether to place a black boundary around the image, or
      if `str`, the boundary CSS style.
    loop: If True, the playback repeats forever.
    autoplay: If True, video playback starts without having to click.
  """
  b64 = base64.b64encode(data).decode('utf-8')
  if isinstance(border, str):
    border = f'{border}; '
  elif border:
    border = 'border:1px solid black; '
  else:
    border = ''
  options = (
      f'controls width="{width}" height="{height}"'
      f' style="{border}object-fit:cover;"'
      f'{" loop" if loop else ""}'
      f'{" autoplay muted" if autoplay else ""}'
  )
  s = f"""<video {options}>
      <source src="data:video/mp4;base64,{b64}" type="video/mp4"/>
      This browser does not support the video tag.
      </video>"""
  if title is not None:
    s = f"""<div style="display:flex; align-items:left;">
      <div style="display:flex; flex-direction:column; align-items:center;">
      <div>{title}</div><div>{s}</div></div></div>"""
  return s
def resize_image(image: _ArrayLike, shape: tuple[int, int]) -> _NDArray:
  """Resizes image to specified spatial dimensions using a Lanczos filter.

  Args:
    image: Array-like 2D or 3D object, where dtype is uint or floating-point.
    shape: 2D spatial dimensions (height, width) of output image.

  Returns:
    A resampled image whose spatial dimensions match `shape`.
  """
  image = _as_valid_media_array(image)
  if image.ndim not in (2, 3):
    raise ValueError(f'Image shape {image.shape} is neither 2D nor 3D.')
  _check_2d_shape(shape)

  # A PIL image can be multichannel only if it has 3 or 4 uint8 channels,
  # and it can be resized only if it is uint8 or float32.
  supported_single_channel = (
      np.issubdtype(image.dtype, np.floating) or image.dtype == np.uint8
  ) and image.ndim == 2
  supported_multichannel = (
      image.dtype == np.uint8 and image.ndim == 3 and image.shape[2] in (3, 4)
  )
  if supported_single_channel or supported_multichannel:
    return np.array(
        _pil_image(image).resize(
            shape[::-1], resample=PIL.Image.Resampling.LANCZOS
        ),
        dtype=image.dtype,
    )
  if image.ndim == 2:
    # We convert to floating-point for resizing and convert back.
    return to_type(resize_image(to_float01(image), shape), image.dtype)
  # We resize each image channel individually.
  return np.dstack(
      [resize_image(channel, shape) for channel in np.moveaxis(image, -1, 0)]
  )
def resize_video(video: Iterable[_NDArray], shape: tuple[int, int]) -> _NDArray:
  """Resizes `video` to specified spatial dimensions using a Lanczos filter.

  Args:
    video: Iterable of images.
    shape: 2D spatial dimensions (height, width) of output video.

  Returns:
    A resampled video whose spatial dimensions match `shape`.
  """
  _check_2d_shape(shape)
  return np.array([resize_image(image, shape) for image in video])
def to_rgb(
    array: _ArrayLike,
    *,
    vmin: float | None = None,
    vmax: float | None = None,
    cmap: str | Callable[[_ArrayLike], _NDArray] = 'gray',
) -> _NDArray:
  """Maps scalar values to RGB using value bounds and a color map.

  Args:
    array: Scalar values, with arbitrary shape.
    vmin: Explicit min value for remapping; if None, it is obtained as the
      minimum finite value of `array`.
    vmax: Explicit max value for remapping; if None, it is obtained as the
      maximum finite value of `array`.
    cmap: A `pyplot` color map or callable, to map from 1D value to 3D or 4D
      color.

  Returns:
    A new array in which each element is affinely mapped from [vmin, vmax]
    to [0.0, 1.0] and then color-mapped.
  """
  a = _as_valid_media_array(array)
  del array
  # For future numpy version 1.7.0:
  # vmin = np.amin(a, where=np.isfinite(a)) if vmin is None else vmin
  # vmax = np.amax(a, where=np.isfinite(a)) if vmax is None else vmax
  vmin = np.amin(np.where(np.isfinite(a), a, np.inf)) if vmin is None else vmin
  vmax = np.amax(np.where(np.isfinite(a), a, -np.inf)) if vmax is None else vmax
  a = (a.astype('float') - vmin) / (vmax - vmin + np.finfo(float).eps)
  if isinstance(cmap, str):
    if hasattr(matplotlib, 'colormaps'):
      rgb_from_scalar: Any = matplotlib.colormaps[cmap]  # Newer version.
    else:
      rgb_from_scalar = matplotlib.pyplot.cm.get_cmap(cmap)  # pylint: disable=no-member
  else:
    rgb_from_scalar = cmap
  a = cast(_NDArray, rgb_from_scalar(a))
  # If there is a fully opaque alpha channel, remove it.
  if a.shape[-1] == 4 and np.all(to_float01(a[..., 3]) == 1.0):
    a = a[..., :3]
  return a
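The affine remapping at the heart of `to_rgb` can be sketched in plain NumPy, with a trivial grayscale callable standing in for a matplotlib color map; the `gray_cmap` helper here is hypothetical:

```python
import numpy as np

# Scalar values, including a non-finite entry that should be ignored
# when computing the value bounds.
a = np.array([[0.0, 5.0], [10.0, np.inf]])

# Bounds over finite values only, as in to_rgb.
vmin = np.amin(np.where(np.isfinite(a), a, np.inf))  # 0.0
vmax = np.amax(np.where(np.isfinite(a), a, -np.inf))  # 10.0

# Affine map [vmin, vmax] -> [0.0, 1.0]; eps avoids division by zero.
normalized = (a - vmin) / (vmax - vmin + np.finfo(float).eps)

# Trivial stand-in for a color map: scalar -> identical R, G, B channels.
gray_cmap = lambda x: np.stack([x, x, x], axis=-1)
rgb = gray_cmap(np.clip(normalized, 0.0, 1.0))
assert rgb.shape == (2, 2, 3)
```

The `np.where(np.isfinite(a), ...)` trick is what lets arrays containing `inf` or `nan` still yield sensible bounds.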
def to_type(array: _ArrayLike, dtype: _DTypeLike) -> _NDArray:
  """Returns media array converted to specified type.

  A "media array" is one in which the dtype is either a floating-point type
  (np.float32 or np.float64) or an unsigned integer type. The array values are
  assumed to lie in the range [0.0, 1.0] for floating-point values, and in the
  full range for unsigned integers, e.g. [0, 255] for np.uint8.

  Conversion between integers and floats maps uint(0) to 0.0 and uint(MAX) to
  1.0. The input array may also be of type bool, whereby True maps to
  uint(MAX) or 1.0. The values are scaled and clamped as appropriate during
  type conversions.

  Args:
    array: Input array-like object (floating-point, unsigned int, or bool).
    dtype: Desired output type (floating-point or unsigned int).

  Returns:
    Array `a` if it is already of the specified dtype, else a converted array.
  """
  a = np.asarray(array)
  dtype = np.dtype(dtype)
  del array
  if a.dtype != bool:
    _as_valid_media_type(a.dtype)  # Verify that 'a' has a valid dtype.
  if a.dtype == bool:
    result = a.astype(dtype)
    if np.issubdtype(dtype, np.unsignedinteger):
      result = result * dtype.type(np.iinfo(dtype).max)
  elif a.dtype == dtype:
    result = a
  elif np.issubdtype(dtype, np.unsignedinteger):
    if np.issubdtype(a.dtype, np.unsignedinteger):
      src_max: float = np.iinfo(a.dtype).max
    else:
      a = np.clip(a, 0.0, 1.0)
      src_max = 1.0
    dst_max = np.iinfo(dtype).max
    if dst_max <= np.iinfo(np.uint16).max:
      scale = np.array(dst_max / src_max, dtype=np.float32)
      result = (a * scale + 0.5).astype(dtype)
    elif dst_max <= np.iinfo(np.uint32).max:
      result = (a.astype(np.float64) * (dst_max / src_max) + 0.5).astype(dtype)
    else:
      # https://stackoverflow.com/a/66306123/
      a = a.astype(np.float64) * (dst_max / src_max) + 0.5
      dst = np.atleast_1d(a)
      values_too_large = dst >= np.float64(dst_max)
      with np.errstate(invalid='ignore'):
        dst = dst.astype(dtype)
      dst[values_too_large] = dst_max
      result = dst if a.ndim > 0 else dst[0]
  else:
    assert np.issubdtype(dtype, np.floating)
    result = a.astype(dtype)
    if np.issubdtype(a.dtype, np.unsignedinteger):
      result = result / dtype.type(np.iinfo(a.dtype).max)
  return result
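The float-to-uint mapping above — clamp to [0.0, 1.0], scale by the integer range, and round by adding 0.5 before truncation — can be checked with a few values in plain NumPy:

```python
import numpy as np

# Float values in [0.0, 1.0], as to_type assumes for floating-point input.
floats = np.array([0.0, 0.5, 1.0], dtype=np.float32)

# Clamp, scale to the uint8 range, and round via +0.5 before truncation.
as_uint8 = (np.clip(floats, 0.0, 1.0) * 255 + 0.5).astype(np.uint8)
# 0.5 * 255 + 0.5 = 128.0, so the midpoint rounds to 128.
assert as_uint8.tolist() == [0, 128, 255]

# The reverse mapping: uint(0) -> 0.0 and uint(MAX) -> 1.0.
back = as_uint8.astype(np.float32) / 255
assert back[0] == 0.0 and back[2] == 1.0
```

Note the round trip is not exact for intermediate values (0.5 comes back as 128/255 ≈ 0.502), which is the inherent quantization of the uint8 representation.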
def to_float01(a: _ArrayLike, dtype: _DTypeLike = np.float32) -> _NDArray:
  """If array has unsigned integers, rescales them to the range [0.0, 1.0].

  Scaling is such that uint(0) maps to 0.0 and uint(MAX) maps to 1.0. See
  `to_type`.

  Args:
    a: Input array.
    dtype: Desired floating-point type if rescaling occurs.

  Returns:
    A new array of dtype values in the range [0.0, 1.0] if the input array `a`
    contains unsigned integers; otherwise, array `a` is returned unchanged.
  """
  a = np.asarray(a)
  dtype = np.dtype(dtype)
  if not np.issubdtype(dtype, np.floating):
    raise ValueError(f'Type {dtype} is not floating-point.')
  if np.issubdtype(a.dtype, np.floating):
    return a
  return to_type(a, dtype)
def to_uint8(a: _ArrayLike) -> _NDArray:
  """Returns array converted to uint8 values; see `to_type`."""
  return to_type(a, np.uint8)
def set_output_height(num_pixels: int) -> None:
  """Overrides the height of the current output cell, if using Colab."""
  try:
    # We want to fail gracefully for non-Colab IPython notebooks.
    output = importlib.import_module('google.colab.output')
    s = f'google.colab.output.setIframeHeight("{num_pixels}px")'
    output.eval_js(s)
  except (ModuleNotFoundError, AttributeError):
    pass
def set_max_output_height(num_pixels: int) -> None:
  """Sets the maximum height of the current output cell, if using Colab."""
  try:
    # We want to fail gracefully for non-Colab IPython notebooks.
    output = importlib.import_module('google.colab.output')
    s = (
        'google.colab.output.setIframeHeight('
        f'0, true, {{maxHeight: {num_pixels}}})'
    )
    output.eval_js(s)
  except (ModuleNotFoundError, AttributeError):
    pass
def color_ramp(
    shape: tuple[int, int] = (64, 64), *, dtype: _DTypeLike = np.float32
) -> _NDArray:
  """Returns an image of a red-green color gradient.

  This is useful for quick experimentation and testing. See also
  `moving_circle` to generate a sample video.

  Args:
    shape: 2D spatial dimensions (height, width) of generated image.
    dtype: Type (uint or floating) of resulting pixel values.
  """
  _check_2d_shape(shape)
  dtype = _as_valid_media_type(dtype)
  yx = (np.moveaxis(np.indices(shape), 0, -1) + 0.5) / shape
  image = np.insert(yx, 2, 0.0, axis=-1)
  return to_type(image, dtype)
def moving_circle(
    shape: tuple[int, int] = (256, 256),
    num_images: int = 10,
    *,
    dtype: _DTypeLike = np.float32,
) -> _NDArray:
  """Returns a video of a circle moving in front of a color ramp.

  This is useful for quick experimentation and testing. See also `color_ramp`
  to generate a sample image.

  >>> show_video(moving_circle((480, 640), 60), fps=60)

  Args:
    shape: 2D spatial dimensions (height, width) of generated video.
    num_images: Number of video frames.
    dtype: Type (uint or floating) of resulting pixel values.
  """
  _check_2d_shape(shape)
  dtype = np.dtype(dtype)

  def generate_image(image_index: int) -> _NDArray:
    """Returns a video frame image."""
    image = color_ramp(shape, dtype=dtype)
    yx = np.moveaxis(np.indices(shape), 0, -1)
    center = shape[0] * 0.6, shape[1] * (image_index + 0.5) / num_images
    radius_squared = (min(shape) * 0.1) ** 2
    inside = np.sum((yx - center) ** 2, axis=-1) < radius_squared
    white_circle_color = 1.0, 1.0, 1.0
    if np.issubdtype(dtype, np.unsignedinteger):
      white_circle_color = to_type([white_circle_color], dtype)[0]
    image[inside] = white_circle_color
    return image

  return np.array([generate_image(i) for i in range(num_images)])
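The gradient construction behind `color_ramp` is compact enough to sketch in plain NumPy: pixel-center coordinates, normalized to [0, 1], drive the red (y) and green (x) channels, with blue held at zero:

```python
import numpy as np

shape = (4, 8)  # (height, width)

# Pixel-center coordinates: np.indices gives integer (y, x) grids; +0.5
# moves to pixel centers, and dividing by shape normalizes to [0, 1].
yx = (np.moveaxis(np.indices(shape), 0, -1) + 0.5) / shape

# Insert a zero blue channel at index 2 along the last axis: (y, x) -> RGB.
image = np.insert(yx, 2, 0.0, axis=-1).astype(np.float32)

assert image.shape == (4, 8, 3)
assert np.all(image[..., 2] == 0.0)  # Blue channel is zero everywhere.
```

The top-left pixel center maps to (0.5/4, 0.5/8, 0) = (0.125, 0.0625, 0.0), so the ramp never quite reaches 0.0 or 1.0 at the edges.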
class set_show_save_dir:  # pylint: disable=invalid-name
  """Save all titled output from `show_*()` calls into files.

  If the specified `directory` is not None, all titled images and videos
  displayed by `show_image`, `show_images`, `show_video`, and `show_videos`
  are also saved as files within the directory.

  It can be used either to set the state or as a context manager:

  >>> set_show_save_dir('/tmp')
  >>> show_image(color_ramp(), title='image1')  # Creates /tmp/image1.png.
  >>> show_video(moving_circle(), title='video2')  # Creates /tmp/video2.mp4.
  >>> set_show_save_dir(None)

  >>> with set_show_save_dir('/tmp'):
  ...   show_image(color_ramp(), title='image1')  # Creates /tmp/image1.png.
  ...   show_video(moving_circle(), title='video2')  # Creates /tmp/video2.mp4.
  """

  def __init__(self, directory: _Path | None):
    self._old_show_save_dir = _config.show_save_dir
    _config.show_save_dir = directory

  def __enter__(self) -> None:
    pass

  def __exit__(self, *_: Any) -> None:
    _config.show_save_dir = self._old_show_save_dir
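The dual set-or-context-manager pattern works because the constructor applies the new state immediately, while `__exit__` restores the previous one; a self-contained sketch with a stand-in module-level config variable (the names here are hypothetical):

```python
# Stand-in for the module-level configuration state.
_config_show_save_dir = None


class set_save_dir:  # Hypothetical mirror of set_show_save_dir.
  def __init__(self, directory):
    global _config_show_save_dir
    self._old = _config_show_save_dir  # Remember the previous state.
    _config_show_save_dir = directory  # Apply the new state immediately.

  def __enter__(self):
    pass  # State was already applied in __init__.

  def __exit__(self, *_):
    global _config_show_save_dir
    _config_show_save_dir = self._old  # Restore on context exit.


with set_save_dir('/tmp'):
  inside = _config_show_save_dir  # '/tmp'
after = _config_show_save_dir  # None (restored)
```

Applying the state in `__init__` rather than `__enter__` is what makes the bare call `set_save_dir('/tmp')` (with no `with`) also work as a plain setter.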
def set_ffmpeg(name_or_path: _Path) -> None:
  """Specifies the name or path for the `ffmpeg` external program.

  The `ffmpeg` program is required for compressing and decompressing video.
  (It is used in `read_video`, `write_video`, `show_video`, `show_videos`,
  etc.)

  Args:
    name_or_path: Either a filename within a directory of `os.environ['PATH']`
      or a filepath. The default setting is 'ffmpeg'.
  """
  _config.ffmpeg_name_or_path = name_or_path
def video_is_available() -> bool:
  """Returns True if the program `ffmpeg` is found.

  See also `set_ffmpeg`.
  """
  return _search_for_ffmpeg_path() is not None
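The kind of check performed here — is an `ffmpeg` executable reachable? — can be sketched with the standard library's `shutil.which`, which searches `os.environ['PATH']` (the library's private `_search_for_ffmpeg_path` may differ in detail, e.g. honoring `set_ffmpeg`):

```python
import shutil

# Look for an `ffmpeg` executable on the PATH; None if not found.
ffmpeg_path = shutil.which('ffmpeg')
video_ok = ffmpeg_path is not None
```

When this returns False, video-related calls (`read_video`, `write_video`, `show_video`, ...) cannot work until ffmpeg is installed or its location is supplied via `set_ffmpeg`.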