# Copyright 2026 The mediapy Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""`mediapy`: Read/write/show images and videos in an IPython/Jupyter notebook.

[**[GitHub source]**](https://github.com/google/mediapy)
[**[API docs]**](https://google.github.io/mediapy/)
[**[PyPI package]**](https://pypi.org/project/mediapy/)
[**[Colab
example]**](https://colab.research.google.com/github/google/mediapy/blob/main/mediapy_examples.ipynb)

See the [example
notebook](https://github.com/google/mediapy/blob/main/mediapy_examples.ipynb),
or better yet, [**open it in
Colab**](https://colab.research.google.com/github/google/mediapy/blob/main/mediapy_examples.ipynb).
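
The examples below assume that `numpy` and the `mediapy` functions are already
in scope; one possible setup (adjust to taste) is:

```python
import numpy as np
from mediapy import *  # Or: `import mediapy as media` and prefix each call.
```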

## Image examples

Display an image (2D or 3D `numpy` array):
```python
checkerboard = np.kron([[0, 1] * 16, [1, 0] * 16] * 16, np.ones((4, 4)))
show_image(checkerboard)
```

Read and display an image (either local or from the Web):
```python
IMAGE = 'https://github.com/hhoppe/data/raw/main/image.png'
show_image(read_image(IMAGE))
```

Read and display an image from a local file:
```python
!wget -q -O /tmp/burano.png {IMAGE}
show_image(read_image('/tmp/burano.png'))
```

Show titled images side-by-side:
```python
images = {
    'original': checkerboard,
    'darkened': checkerboard * 0.7,
    'random': np.random.rand(32, 32, 3),
}
show_images(images, vmin=0.0, vmax=1.0, border=True, height=64)
```

Compare two images using an interactive slider:
```python
compare_images([checkerboard, np.random.rand(128, 128, 3)])
```

## Video examples

Display a video (an iterable of images, e.g., a 3D or 4D array):
```python
video = moving_circle((100, 100), num_images=10)
show_video(video, fps=10)
```

Show the video frames side-by-side:
```python
show_images(video, columns=6, border=True, height=64)
```

Show the frames with their indices:
```python
show_images({f'{i}': image for i, image in enumerate(video)}, width=32)
```

Read and display a video (either local or from the Web):
```python
VIDEO = 'https://github.com/hhoppe/data/raw/main/video.mp4'
show_video(read_video(VIDEO))
```

Create and display a looping two-frame GIF video:
```python
image1 = resize_image(np.random.rand(10, 10, 3), (50, 50))
show_video([image1, image1 * 0.8], fps=2, codec='gif')
```

Darken a video frame-by-frame:
```python
output_path = '/tmp/out.mp4'
with VideoReader(VIDEO) as r:
  darken_image = lambda image: to_float01(image) * 0.5
  with VideoWriter(output_path, shape=r.shape, fps=r.fps, bps=r.bps) as w:
    for image in r:
      w.add_image(darken_image(image))
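# One possible follow-up: read back and display the darkened result.
show_video(read_video(output_path), title='darkened')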
```
"""

from __future__ import annotations

__docformat__ = 'google'
__version__ = '1.2.6'
__version_info__ = tuple(int(num) for num in __version__.split('.'))

import base64
from collections.abc import Callable, Iterable, Iterator, Mapping, Sequence
import contextlib
import functools
import importlib
import io
import itertools
import math
import numbers
import os  # Package only needed for typing.TYPE_CHECKING.
import pathlib
import re
import shlex
import shutil
import subprocess
import sys
import tempfile
import typing
from typing import TYPE_CHECKING, Any, TypeVar, Union, cast
import urllib.request
import warnings

import IPython.display
import matplotlib.pyplot
import numpy as np
import numpy.typing as npt
import PIL.Image
import PIL.ImageOps


if not hasattr(PIL.Image, 'Resampling'):  # Allow Pillow<9.0.
  PIL.Image.Resampling = PIL.Image  # type: ignore

# Selected and reordered here for pdoc documentation.
__all__ = [
    'show_image',
    'show_images',
    'compare_images',
    'show_video',
    'show_videos',
    'read_image',
    'write_image',
    'read_video',
    'write_video',
    'VideoReader',
    'VideoWriter',
    'VideoMetadata',
    'compress_image',
    'decompress_image',
    'compress_video',
    'decompress_video',
    'html_from_compressed_image',
    'html_from_compressed_video',
    'resize_image',
    'resize_video',
    'to_rgb',
    'to_type',
    'to_float01',
    'to_uint8',
    'set_output_height',
    'set_max_output_height',
    'color_ramp',
    'moving_circle',
    'set_show_save_dir',
    'set_ffmpeg',
    'video_is_available',
]

if TYPE_CHECKING:
  _ArrayLike = npt.ArrayLike
  _DTypeLike = npt.DTypeLike
  _NDArray = npt.NDArray[Any]
  _DType = np.dtype[Any]
else:
  # Create named types for use in the `pdoc` documentation.
  _ArrayLike = TypeVar('_ArrayLike')
  _DTypeLike = TypeVar('_DTypeLike')
  _NDArray = TypeVar('_NDArray')
  _DType = TypeVar('_DType')  # pylint: disable=invalid-name

_IPYTHON_HTML_SIZE_LIMIT = 10**10  # Unlimited seems to be OK now.
_T = TypeVar('_T')
_Path = Union[str, 'os.PathLike[str]']

_IMAGE_COMPARISON_HTML = """\
<script
  defer
  src="https://unpkg.com/img-comparison-slider@7/dist/index.js"
></script>
<link
  rel="stylesheet"
  href="https://unpkg.com/img-comparison-slider@7/dist/styles.css"
/>

<img-comparison-slider>
  <img slot="first" src="data:image/png;base64,{b64_1}" />
  <img slot="second" src="data:image/png;base64,{b64_2}" />
</img-comparison-slider>
"""

# ** Miscellaneous.


class _Config:
  ffmpeg_name_or_path: _Path = 'ffmpeg'
  show_save_dir: _Path | None = None


_config = _Config()


def _open(path: _Path, *args: Any, **kwargs: Any) -> Any:
  """Opens the file; this is a hook for the built-in `open()`."""
  return open(path, *args, **kwargs)


def _path_is_local(path: _Path) -> bool:
  """Returns True if the path is in the filesystem accessible by `ffmpeg`."""
  del path
  return True


def _search_for_ffmpeg_path() -> str | None:
  """Returns a path to the ffmpeg program, or None if not found."""
  if filename := shutil.which(_config.ffmpeg_name_or_path):
    return str(filename)
  return None


def _print_err(*args: str, **kwargs: Any) -> None:
  """Prints arguments to stderr immediately."""
  kwargs = {**dict(file=sys.stderr, flush=True), **kwargs}
  print(*args, **kwargs)


def _chunked(
    iterable: Iterable[_T], n: int | None = None
) -> Iterator[tuple[_T, ...]]:
  """Returns elements collected as tuples of length at most `n` if not None."""

  def take(n: int | None, iterable: Iterable[_T]) -> tuple[_T, ...]:
    return tuple(itertools.islice(iterable, n))

  return iter(functools.partial(take, n, iter(iterable)), ())


def _peek_first(iterator: Iterable[_T]) -> tuple[_T, Iterable[_T]]:
  """Given an iterator, returns first element and re-initialized iterator.

  >>> first_image, images = _peek_first(moving_circle())

  Args:
    iterator: An input iterator or iterable.

  Returns:
    A tuple (first_element, iterator_reinitialized) containing:
      first_element: The first element of the input.
      iterator_reinitialized: A clone of the original iterator/iterable.
  """
  # Inspired from https://stackoverflow.com/a/12059829/1190077
  peeker, iterator_reinitialized = itertools.tee(iterator)
  first = next(peeker)
  return first, iterator_reinitialized


def _check_2d_shape(shape: tuple[int, int]) -> None:
  """Checks that `shape` is of the form (height, width) with two integers."""
  if len(shape) != 2:
    raise ValueError(f'Shape {shape} is not of the form (height, width).')
  if not all(isinstance(i, numbers.Integral) for i in shape):
    raise ValueError(f'Shape {shape} contains non-integers.')


def _run(args: str | Sequence[str]) -> None:
  """Executes command, printing output from stdout and stderr.

  Args:
    args: Command to execute, which can be either a string or a sequence of word
      strings, as in `subprocess.run()`. If `args` is a string, the shell is
      invoked to interpret it.

  Raises:
    RuntimeError: If the command's exit code is nonzero.
  """
  proc = subprocess.run(
      args,
      shell=isinstance(args, str),
      stdout=subprocess.PIPE,
      stderr=subprocess.STDOUT,
      check=False,
      universal_newlines=True,
  )
  print(proc.stdout, end='', flush=True)
  if proc.returncode:
    raise RuntimeError(
        f"Command '{proc.args}' failed with code {proc.returncode}."
    )


def _display_html(text: str, /) -> None:
  """In a Jupyter notebook, display the HTML `text`."""
  IPython.display.display(IPython.display.HTML(text))  # type: ignore


def set_ffmpeg(name_or_path: _Path) -> None:
  """Specifies the name or path for the `ffmpeg` external program.

  The `ffmpeg` program is required for compressing and decompressing video.
  (It is used in `read_video`, `write_video`, `show_video`, `show_videos`,
  etc.)

  Args:
    name_or_path: Either a filename within a directory of `os.environ['PATH']`
      or a filepath. The default setting is 'ffmpeg'.
  """
  _config.ffmpeg_name_or_path = name_or_path


def set_output_height(num_pixels: int) -> None:
  """Overrides the height of the current output cell, if using Colab."""
  try:
    # We want to fail gracefully for non-Colab IPython notebooks.
    output = importlib.import_module('google.colab.output')
    s = f'google.colab.output.setIframeHeight("{num_pixels}px")'
    output.eval_js(s)
  except (ModuleNotFoundError, AttributeError):
    pass


def set_max_output_height(num_pixels: int) -> None:
  """Sets the maximum height of the current output cell, if using Colab."""
  try:
    # We want to fail gracefully for non-Colab IPython notebooks.
    output = importlib.import_module('google.colab.output')
    s = (
        'google.colab.output.setIframeHeight('
        f'0, true, {{maxHeight: {num_pixels}}})'
    )
    output.eval_js(s)
  except (ModuleNotFoundError, AttributeError):
    pass


# ** Type conversions.


def _as_valid_media_type(dtype: _DTypeLike) -> _DType:
  """Returns validated media data type."""
  dtype = np.dtype(dtype)
  if not issubclass(dtype.type, (np.unsignedinteger, np.floating)):
    raise ValueError(
        f'Type {dtype} is not a valid media data type (uint or float).'
    )
  return dtype


def _as_valid_media_array(x: _ArrayLike) -> _NDArray:
  """Converts to ndarray (if not already), and checks validity of data type."""
  a = np.asarray(x)
  if a.dtype == bool:
    a = a.astype(np.uint8) * np.iinfo(np.uint8).max
  _as_valid_media_type(a.dtype)
  return a


def to_type(array: _ArrayLike, dtype: _DTypeLike) -> _NDArray:
  """Returns media array converted to specified type.

  A "media array" is one in which the dtype is either a floating-point type
  (np.float32 or np.float64) or an unsigned integer type. The array values are
  assumed to lie in the range [0.0, 1.0] for floating-point values, and in the
  full range for unsigned integers, e.g. [0, 255] for np.uint8.

  Conversion between integers and floats maps uint(0) to 0.0 and uint(MAX) to
  1.0. The input array may also be of type bool, whereby True maps to
  uint(MAX) or 1.0. The values are scaled and clamped as appropriate during
  type conversions.

  Args:
    array: Input array-like object (floating-point, unsigned int, or bool).
    dtype: Desired output type (floating-point or unsigned int).

  Returns:
    Array `a` if it is already of the specified dtype, else a converted array.
  """
  a = np.asarray(array)
  dtype = np.dtype(dtype)
  del array
  if a.dtype != bool:
    _as_valid_media_type(a.dtype)  # Verify that 'a' has a valid dtype.
  if a.dtype == bool:
    result = a.astype(dtype)
    if np.issubdtype(dtype, np.unsignedinteger):
      result = result * dtype.type(np.iinfo(dtype).max)
  elif a.dtype == dtype:
    result = a
  elif np.issubdtype(dtype, np.unsignedinteger):
    if np.issubdtype(a.dtype, np.unsignedinteger):
      src_max: float = np.iinfo(a.dtype).max
    else:
      a = np.clip(a, 0.0, 1.0)
      src_max = 1.0
    dst_max = np.iinfo(dtype).max
    if dst_max <= np.iinfo(np.uint16).max:
      scale = np.array(dst_max / src_max, dtype=np.float32)
      result = (a * scale + 0.5).astype(dtype)
    elif dst_max <= np.iinfo(np.uint32).max:
      result = (a.astype(np.float64) * (dst_max / src_max) + 0.5).astype(dtype)
    else:
      # https://stackoverflow.com/a/66306123/
      a = a.astype(np.float64) * (dst_max / src_max) + 0.5
      dst = np.atleast_1d(a)
      values_too_large = dst >= np.float64(dst_max)
      with np.errstate(invalid='ignore'):
        dst = dst.astype(dtype)
      dst[values_too_large] = dst_max
      result = dst if a.ndim > 0 else dst[0]
  else:
    assert np.issubdtype(dtype, np.floating)
    result = a.astype(dtype)
    if np.issubdtype(a.dtype, np.unsignedinteger):
      result = result / dtype.type(np.iinfo(a.dtype).max)
  return result


def to_float01(a: _ArrayLike, dtype: _DTypeLike = np.float32) -> _NDArray:
  """If array has unsigned integers, rescales them to the range [0.0, 1.0].

  Scaling is such that uint(0) maps to 0.0 and uint(MAX) maps to 1.0. See
  `to_type`.

  Args:
    a: Input array.
    dtype: Desired floating-point type if rescaling occurs.

  Returns:
    A new array of dtype values in the range [0.0, 1.0] if the input array `a`
    contains unsigned integers; otherwise, array `a` is returned unchanged.
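
  For example, `uint8` values scale by 1/255, so 255 maps to exactly 1.0:

  >>> to_float01(np.array([0, 255], np.uint8))  # -> array([0., 1.], dtype=float32)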
  """
  a = np.asarray(a)
  dtype = np.dtype(dtype)
  if not np.issubdtype(dtype, np.floating):
    raise ValueError(f'Type {dtype} is not floating-point.')
  if np.issubdtype(a.dtype, np.floating):
    return a
  return to_type(a, dtype)


def to_uint8(a: _ArrayLike) -> _NDArray:
  """Returns array converted to uint8 values; see `to_type`."""
  return to_type(a, np.uint8)


# ** Functions to generate example image and video data.


def color_ramp(
    shape: tuple[int, int] = (64, 64), *, dtype: _DTypeLike = np.float32
) -> _NDArray:
  """Returns an image of a red-green color gradient.

  This is useful for quick experimentation and testing. See also
  `moving_circle` to generate a sample video.

  Args:
    shape: 2D spatial dimensions (height, width) of generated image.
    dtype: Type (uint or floating) of resulting pixel values.
  """
  _check_2d_shape(shape)
  dtype = _as_valid_media_type(dtype)
  yx = (np.moveaxis(np.indices(shape), 0, -1) + 0.5) / shape
  image = np.insert(yx, 2, 0.0, axis=-1)
  return to_type(image, dtype)


def moving_circle(
    shape: tuple[int, int] = (256, 256),
    num_images: int = 10,
    *,
    dtype: _DTypeLike = np.float32,
) -> _NDArray:
  """Returns a video of a circle moving in front of a color ramp.

  This is useful for quick experimentation and testing. See also `color_ramp`
  to generate a sample image.

  >>> show_video(moving_circle((480, 640), 60), fps=60)

  Args:
    shape: 2D spatial dimensions (height, width) of generated video.
    num_images: Number of video frames.
    dtype: Type (uint or floating) of resulting pixel values.
  """
  _check_2d_shape(shape)
  dtype = np.dtype(dtype)

  def generate_image(image_index: int) -> _NDArray:
    """Returns a video frame image."""
    image = color_ramp(shape, dtype=dtype)
    yx = np.moveaxis(np.indices(shape), 0, -1)
    center = shape[0] * 0.6, shape[1] * (image_index + 0.5) / num_images
    radius_squared = (min(shape) * 0.1) ** 2
    inside = np.sum((yx - center) ** 2, axis=-1) < radius_squared
    white_circle_color = 1.0, 1.0, 1.0
    if np.issubdtype(dtype, np.unsignedinteger):
      white_circle_color = to_type([white_circle_color], dtype)[0]
    image[inside] = white_circle_color
    return image

  return np.array([generate_image(i) for i in range(num_images)])


# ** Color-space conversions.

# Same matrix values as in two sources:
# https://github.com/scikit-image/scikit-image/blob/master/skimage/color/colorconv.py#L377
# https://github.com/tensorflow/tensorflow/blob/r1.14/tensorflow/python/ops/image_ops_impl.py#L2754
_YUV_FROM_RGB_MATRIX = np.array(
    [
        [0.299, -0.14714119, 0.61497538],
        [0.587, -0.28886916, -0.51496512],
        [0.114, 0.43601035, -0.10001026],
    ],
    dtype=np.float32,
)
_RGB_FROM_YUV_MATRIX = np.linalg.inv(_YUV_FROM_RGB_MATRIX)
_YUV_CHROMA_OFFSET = np.array([0.0, 0.5, 0.5], dtype=np.float32)


def yuv_from_rgb(rgb: _ArrayLike) -> _NDArray:
  """Returns the RGB image/video mapped to YUV [0,1] color space.

  Note that the "YUV" color space used by video compressors is actually YCbCr!

  Args:
    rgb: Input image in sRGB space.
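
  For example, white maps to full luma with centered chroma (approximately,
  given float32 rounding):

  >>> yuv_from_rgb([1.0, 1.0, 1.0])  # -> approximately [1.0, 0.5, 0.5]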
  """
  rgb = to_float01(rgb)
  if rgb.shape[-1] != 3:
    raise ValueError(f'The last dimension in {rgb.shape} is not 3.')
  return rgb @ _YUV_FROM_RGB_MATRIX + _YUV_CHROMA_OFFSET


def rgb_from_yuv(yuv: _ArrayLike) -> _NDArray:
  """Returns the YUV image/video mapped to RGB [0,1] color space."""
  yuv = to_float01(yuv)
  if yuv.shape[-1] != 3:
    raise ValueError(f'The last dimension in {yuv.shape} is not 3.')
  return (yuv - _YUV_CHROMA_OFFSET) @ _RGB_FROM_YUV_MATRIX


# Same matrix values as in
# https://github.com/scikit-image/scikit-image/blob/master/skimage/color/colorconv.py#L1654
# and https://en.wikipedia.org/wiki/YUV#Studio_swing_for_BT.601
_YCBCR_FROM_RGB_MATRIX = np.array(
    [
        [65.481, 128.553, 24.966],
        [-37.797, -74.203, 112.0],
        [112.0, -93.786, -18.214],
    ],
    dtype=np.float32,
).transpose()
_RGB_FROM_YCBCR_MATRIX = np.linalg.inv(_YCBCR_FROM_RGB_MATRIX)
_YCBCR_OFFSET = np.array([16.0, 128.0, 128.0], dtype=np.float32)
# Note that _YCBCR_FROM_RGB_MATRIX =~ _YUV_FROM_RGB_MATRIX * [219, 256, 182];
# https://en.wikipedia.org/wiki/YUV: "Y' values are conventionally shifted and
# scaled to the range [16, 235] (referred to as studio swing or 'TV levels')";
# "studio range of 16-240 for U and V". (Where does value 182 come from?)


def ycbcr_from_rgb(rgb: _ArrayLike) -> _NDArray:
  """Returns the RGB image/video mapped to YCbCr [0,1] color space.

  The YCbCr color space is the one called "YUV" by video compressors.

  Args:
    rgb: Input image in sRGB space.
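
  For example, black maps to the "studio swing" black level 16/255
  (approximately):

  >>> ycbcr_from_rgb([0.0, 0.0, 0.0])  # -> approximately [0.0627, 0.502, 0.502]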
  """
  rgb = to_float01(rgb)
  if rgb.shape[-1] != 3:
    raise ValueError(f'The last dimension in {rgb.shape} is not 3.')
  return (rgb @ _YCBCR_FROM_RGB_MATRIX + _YCBCR_OFFSET) / 255.0


def rgb_from_ycbcr(ycbcr: _ArrayLike) -> _NDArray:
  """Returns the YCbCr image/video mapped to RGB [0,1] color space."""
  ycbcr = to_float01(ycbcr)
  if ycbcr.shape[-1] != 3:
    raise ValueError(f'The last dimension in {ycbcr.shape} is not 3.')
  return (ycbcr * 255.0 - _YCBCR_OFFSET) @ _RGB_FROM_YCBCR_MATRIX


# ** Image processing.


def _pil_image(image: _ArrayLike, mode: str | None = None) -> PIL.Image.Image:
  """Returns a PIL image given a numpy matrix (either uint8 or float [0,1])."""
  image = _as_valid_media_array(image)
  if image.ndim not in (2, 3):
    raise ValueError(f'Image shape {image.shape} is neither 2D nor 3D.')
  pil_image: PIL.Image.Image = PIL.Image.fromarray(image, mode=mode)
  return pil_image


def resize_image(image: _ArrayLike, shape: tuple[int, int]) -> _NDArray:
  """Resizes image to specified spatial dimensions using a Lanczos filter.

  Args:
    image: Array-like 2D or 3D object, where dtype is uint or floating-point.
    shape: 2D spatial dimensions (height, width) of output image.

  Returns:
    A resampled image whose spatial dimensions match `shape`.
  """
  image = _as_valid_media_array(image)
  if image.ndim not in (2, 3):
    raise ValueError(f'Image shape {image.shape} is neither 2D nor 3D.')
  _check_2d_shape(shape)

  # A PIL image can be multichannel only if it has 3 or 4 uint8 channels,
  # and it can be resized only if it is uint8 or float32.
  supported_single_channel = (
      np.issubdtype(image.dtype, np.floating) or image.dtype == np.uint8
  ) and image.ndim == 2
  supported_multichannel = (
      image.dtype == np.uint8 and image.ndim == 3 and image.shape[2] in (3, 4)
  )
  if supported_single_channel or supported_multichannel:
    return np.array(
        _pil_image(image).resize(
            shape[::-1], resample=PIL.Image.Resampling.LANCZOS
        ),
        dtype=image.dtype,
    )
  if image.ndim == 2:
    # We convert to floating-point for resizing and convert back.
    return to_type(resize_image(to_float01(image), shape), image.dtype)
  # We resize each image channel individually.
  return np.dstack(
      [resize_image(channel, shape) for channel in np.moveaxis(image, -1, 0)]
  )


# ** Video processing.


def resize_video(video: Iterable[_NDArray], shape: tuple[int, int]) -> _NDArray:
  """Resizes `video` to specified spatial dimensions using a Lanczos filter.

  Args:
    video: Iterable of images.
    shape: 2D spatial dimensions (height, width) of output video.

  Returns:
    A resampled video whose spatial dimensions match `shape`.
  """
  _check_2d_shape(shape)
  return np.array([resize_image(image, shape) for image in video])


# ** General I/O.
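
# For example (hypothetical local path), the helpers below read raw bytes from
# either a local file or a URL, which can then be decoded:
#
# >>> data = read_contents('/tmp/burano.png')
# >>> image = decompress_image(data)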


def _is_url(path_or_url: _Path) -> bool:
  return isinstance(path_or_url, str) and path_or_url.startswith(
      ('http://', 'https://', 'file://')
  )


def read_contents(path_or_url: _Path) -> bytes:
  """Returns the contents of the file specified by either a path or URL."""
  data: bytes
  if _is_url(path_or_url):
    assert isinstance(path_or_url, str)
    headers = {'User-Agent': 'Chrome'}
    request = urllib.request.Request(path_or_url, headers=headers)
    with urllib.request.urlopen(request) as response:
      data = response.read()
  else:
    with _open(path_or_url, 'rb') as f:
      data = f.read()
  return data


@contextlib.contextmanager
def _read_via_local_file(path_or_url: _Path) -> Iterator[str]:
  """Context to copy a remote file locally to read from it.

  Args:
    path_or_url: File, which may be remote.

  Yields:
    The name of a local file which may be a copy of a remote file.
  """
  if _is_url(path_or_url) or not _path_is_local(path_or_url):
    suffix = pathlib.Path(path_or_url).suffix
    with tempfile.TemporaryDirectory() as directory_name:
      tmp_path = pathlib.Path(directory_name) / f'file{suffix}'
      tmp_path.write_bytes(read_contents(path_or_url))
      yield str(tmp_path)
  else:
    yield str(path_or_url)


@contextlib.contextmanager
def _write_via_local_file(path: _Path) -> Iterator[str]:
  """Context to write a temporary local file and subsequently copy it remotely.

  Args:
    path: File, which may be remote.

  Yields:
    The name of a local file which may be subsequently copied remotely.
  """
  if _path_is_local(path):
    yield str(path)
  else:
    suffix = pathlib.Path(path).suffix
    with tempfile.TemporaryDirectory() as directory_name:
      tmp_path = pathlib.Path(directory_name) / f'file{suffix}'
      yield str(tmp_path)
      with _open(path, mode='wb') as f:
        f.write(tmp_path.read_bytes())


class set_show_save_dir:  # pylint: disable=invalid-name
  """Save all titled output from `show_*()` calls into files.

  If the specified `directory` is not None, all titled images and videos
  displayed by `show_image`, `show_images`, `show_video`, and `show_videos` are
  also saved as files within the directory.

  It can be used either to set the state or as a context manager:

  >>> set_show_save_dir('/tmp')
  >>> show_image(color_ramp(), title='image1')  # Creates /tmp/image1.png.
  >>> show_video(moving_circle(), title='video2')  # Creates /tmp/video2.mp4.
  >>> set_show_save_dir(None)

  >>> with set_show_save_dir('/tmp'):
  ...   show_image(color_ramp(), title='image1')  # Creates /tmp/image1.png.
  ...   show_video(moving_circle(), title='video2')  # Creates /tmp/video2.mp4.
  """

  def __init__(self, directory: _Path | None):
    self._old_show_save_dir = _config.show_save_dir
    _config.show_save_dir = directory

  def __enter__(self) -> None:
    pass

  def __exit__(self, *_: Any) -> None:
    _config.show_save_dir = self._old_show_save_dir


# ** Image I/O.


def read_image(
    path_or_url: _Path,
    *,
    apply_exif_transpose: bool = True,
    dtype: _DTypeLike = None,
) -> _NDArray:
  """Returns an image read from a file path or URL.

  Decoding is performed using `PIL`, which supports `uint8` images with 1, 3,
  or 4 channels and `uint16` images with a single channel.

  Args:
    path_or_url: Path of input file.
    apply_exif_transpose: If True, rotate image according to EXIF orientation.
    dtype: Data type of the returned array.
      If None, `np.uint8` or `np.uint16` is inferred automatically.
  """
  data = read_contents(path_or_url)
  return decompress_image(data, dtype, apply_exif_transpose)


def write_image(
    path: _Path, image: _ArrayLike, fmt: str = 'png', **kwargs: Any
) -> None:
  """Writes an image to a file.

  Encoding is performed using `PIL`, which supports `uint8` images with 1, 3,
  or 4 channels and `uint16` images with a single channel.

  The file format is explicitly specified by `fmt` and not inferred from
  `path`.

  Args:
    path: Path of output file.
    image: Array-like object. If its type is float, it is converted to np.uint8
      using `to_uint8` (thus clamping the input to the range [0.0, 1.0]).
      Otherwise it must be np.uint8 or np.uint16.
    fmt: Desired compression encoding, e.g. 'png'.
    **kwargs: Additional parameters for `PIL.Image.save()`.
  """
  image = _as_valid_media_array(image)
  if np.issubdtype(image.dtype, np.floating):
    image = to_uint8(image)
  with _open(path, 'wb') as f:
    _pil_image(image).save(f, format=fmt, **kwargs)


def to_rgb(
    array: _ArrayLike,
    *,
    vmin: float | None = None,
    vmax: float | None = None,
    cmap: str | Callable[[_ArrayLike], _NDArray] = 'gray',
) -> _NDArray:
  """Maps scalar values to RGB using value bounds and a color map.

  Args:
    array: Scalar values, with arbitrary shape.
    vmin: Explicit min value for remapping; if None, it is obtained as the
      minimum finite value of `array`.
    vmax: Explicit max value for remapping; if None, it is obtained as the
      maximum finite value of `array`.
    cmap: A `pyplot` color map or callable, to map from 1D value to 3D or 4D
      color.

  Returns:
    A new array in which each element is affinely mapped from [vmin, vmax]
    to [0.0, 1.0] and then color-mapped.
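
  For example, with the default 'gray' color map:

  >>> to_rgb([[0.0, 1.0]])  # Black and white pixels; result shape (1, 2, 3).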
  """
  a = _as_valid_media_array(array)
  del array
  # For future numpy version 1.7.0:
  # vmin = np.amin(a, where=np.isfinite(a)) if vmin is None else vmin
  # vmax = np.amax(a, where=np.isfinite(a)) if vmax is None else vmax
  vmin = np.amin(np.where(np.isfinite(a), a, np.inf)) if vmin is None else vmin
  vmax = np.amax(np.where(np.isfinite(a), a, -np.inf)) if vmax is None else vmax
  a = (a.astype('float') - vmin) / (vmax - vmin + np.finfo(float).eps)
  if isinstance(cmap, str):
    if hasattr(matplotlib, 'colormaps'):
      rgb_from_scalar: Any = matplotlib.colormaps[cmap]  # Newer version.
    else:
      rgb_from_scalar = matplotlib.pyplot.cm.get_cmap(cmap)  # pylint: disable=no-member
  else:
    rgb_from_scalar = cmap
  a = cast(_NDArray, rgb_from_scalar(a))
  # If there is a fully opaque alpha channel, remove it.
  if a.shape[-1] == 4 and np.all(to_float01(a[..., 3]) == 1.0):
    a = a[..., :3]
  return a


def compress_image(
    image: _ArrayLike, *, fmt: str = 'png', **kwargs: Any
) -> bytes:
  """Returns a buffer containing a compressed image.

  Args:
    image: Array in a format supported by `PIL`, e.g. np.uint8 or np.uint16.
    fmt: Desired compression encoding, e.g. 'png'.
    **kwargs: Options for `PIL.save()`, e.g. `optimize=True` for greater
      compression.
  """
  image = _as_valid_media_array(image)
  with io.BytesIO() as output:
    _pil_image(image).save(output, format=fmt, **kwargs)
    return output.getvalue()


def decompress_image(
    data: bytes, dtype: _DTypeLike = None, apply_exif_transpose: bool = True
) -> _NDArray:
  """Returns an image from a compressed data buffer.

  Decoding is performed using `PIL`, which supports `uint8` images with 1, 3,
  or 4 channels and `uint16` images with a single channel.

  Args:
    data: Buffer containing compressed image.
    dtype: Data type of the returned array.
      If None, `np.uint8` or `np.uint16` is inferred automatically.
    apply_exif_transpose: If True, rotate image according to EXIF orientation.
  """
  pil_image: PIL.Image.Image = PIL.Image.open(io.BytesIO(data))
  if apply_exif_transpose:
    tmp_image = PIL.ImageOps.exif_transpose(pil_image)  # Future: in_place=True.
    assert tmp_image
    pil_image = tmp_image
  if dtype is None:
    dtype = np.uint16 if pil_image.mode.startswith('I') else np.uint8
  return np.array(pil_image, dtype=dtype)


def html_from_compressed_image(
    data: bytes,
    width: int,
    height: int,
    *,
    title: str | None = None,
    border: bool | str = False,
    pixelated: bool = True,
    fmt: str = 'png',
) -> str:
  """Returns an HTML string with an image tag containing encoded data.

  Args:
    data: Compressed image bytes.
    width: Width of HTML image in pixels.
    height: Height of HTML image in pixels.
    title: Optional text shown centered above image.
    border: If `bool`, whether to place a black boundary around the image, or if
      `str`, the boundary CSS style.
    pixelated: If True, sets the CSS style to 'image-rendering: pixelated;'.
    fmt: Compression encoding.
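
  For example (illustrative):

  >>> data = compress_image(to_uint8(color_ramp((2, 2))))
  >>> html_from_compressed_image(data, 2, 2).startswith('<img width="2"')
  True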
  """
  b64 = base64.b64encode(data).decode('utf-8')
  if isinstance(border, str):
    border = f'{border}; '
  elif border:
    border = 'border:1px solid black; '
  else:
    border = ''
  s_pixelated = 'pixelated' if pixelated else 'auto'
  s = (
      f'<img width="{width}" height="{height}"'
      f' style="{border}image-rendering:{s_pixelated}; object-fit:cover;"'
      f' src="data:image/{fmt};base64,{b64}"/>'
  )
  if title is not None:
    s = f"""<div style="display:flex; align-items:left;">
      <div style="display:flex; flex-direction:column; align-items:center;">
      <div>{title}</div><div>{s}</div></div></div>"""
  return s


def _get_width_height(
    width: int | None, height: int | None, shape: tuple[int, int]
) -> tuple[int, int]:
  """Returns (width, height) given optional parameters and image shape."""
  assert len(shape) == 2, shape
  if width and height:
    return width, height
  if width and not height:
    return width, int(width * (shape[0] / shape[1]) + 0.5)
  if height and not width:
    return int(height * (shape[1] / shape[0]) + 0.5), height
  return shape[::-1]


def _ensure_mapped_to_rgb(
    image: _ArrayLike,
    *,
    vmin: float | None = None,
    vmax: float | None = None,
    cmap: str | Callable[[_ArrayLike], _NDArray] = 'gray',
) -> _NDArray:
  """Ensure image is mapped to RGB."""
  image = _as_valid_media_array(image)
  if not (image.ndim == 2 or (image.ndim == 3 and image.shape[2] in (1, 3, 4))):
    raise ValueError(
        f'Image with shape {image.shape} is neither a 2D array'
        ' nor a 3D array with 1, 3, or 4 channels.'
    )
  if image.ndim == 3 and image.shape[2] == 1:
    image = image[:, :, 0]
  if image.ndim == 2:
    image = to_rgb(image, vmin=vmin, vmax=vmax, cmap=cmap)
  return image


def show_image(
    image: _ArrayLike, *, title: str | None = None, **kwargs: Any
) -> str | None:
  """Displays an image in the notebook and optionally saves it to a file.

  See `show_images`.

  >>> show_image(np.random.rand(100, 100))
  >>> show_image(np.random.randint(0, 256, size=(80, 80, 3), dtype='uint8'))
  >>> show_image(np.random.rand(10, 10) - 0.5, cmap='bwr', height=100)
  >>> show_image(read_image('/tmp/image.png'))
  >>> url = 'https://github.com/hhoppe/data/raw/main/image.png'
  >>> show_image(read_image(url))

  Args:
    image: 2D array-like, or 3D array-like with 1, 3, or 4 channels.
    title: Optional text shown centered above the image.
    **kwargs: See `show_images`.

  Returns:
    html string if `return_html` is `True`.
  """
  return show_images([np.asarray(image)], [title], **kwargs)


def show_images(
    images: Iterable[_ArrayLike] | Mapping[str, _ArrayLike],
    titles: Iterable[str | None] | None = None,
    *,
    width: int | None = None,
    height: int | None = None,
    downsample: bool = True,
    columns: int | None = None,
    vmin: float | None = None,
    vmax: float | None = None,
    cmap: str | Callable[[_ArrayLike], _NDArray] = 'gray',
    border: bool | str = False,
    ylabel: str = '',
    html_class: str = 'show_images',
    pixelated: bool | None = None,
    return_html: bool = False,
) -> str | None:
  """Displays a row of images in the IPython/Jupyter notebook.

  If a directory has been specified using `set_show_save_dir`, also saves each
  titled image to a file in that directory based on its title.

  >>> image1, image2 = np.random.rand(64, 64, 3), color_ramp((64, 64))
  >>> show_images([image1, image2])
  >>> show_images({'random image': image1, 'color ramp': image2}, height=128)
  >>> show_images([image1, image2] * 5, columns=4, border=True)

  Args:
    images: Iterable of images, or dictionary of `{title: image}`. Each image
      must be either a 2D array or a 3D array with 1, 3, or 4 channels.
    titles: Optional strings shown above the corresponding images.
    width: Optional, overrides displayed width (in pixels).
    height: Optional, overrides displayed height (in pixels).
    downsample: If True, each image whose width or height is greater than the
      specified `width` or `height` is resampled to the display resolution.
      This improves antialiasing and reduces the size of the notebook.
    columns: Optional, maximum number of images per row.
    vmin: For single-channel image, explicit min value for display.
    vmax: For single-channel image, explicit max value for display.
    cmap: For single-channel image, `pyplot` color map or callable to map 1D to
      3D color.
    border: If `bool`, whether to place a black boundary around the image, or if
      `str`, the boundary CSS style.
    ylabel: Text (rotated by 90 degrees) shown on the left of each row.
    html_class: CSS class name used in definition of HTML element.
    pixelated: If True, sets the CSS style to 'image-rendering: pixelated;'; if
      False, sets 'image-rendering: auto'; if None, uses pixelated rendering
      only on images for which `width` or `height` introduces magnification.
    return_html: If `True`, return the raw HTML `str` instead of displaying.

  Returns:
    html string if `return_html` is `True`.
  """
  if isinstance(images, Mapping):
    if titles is not None:
      raise ValueError('Cannot have images dictionary and titles parameter.')
    list_titles, list_images = list(images.keys()), list(images.values())
  else:
    list_images = list(images)
    list_titles = [None] * len(list_images) if titles is None else list(titles)
    if len(list_images) != len(list_titles):
      raise ValueError(
          'Number of images does not match number of titles'
          f' ({len(list_images)} vs {len(list_titles)}).'
      )

  list_images = [
      _ensure_mapped_to_rgb(image, vmin=vmin, vmax=vmax, cmap=cmap)
      for image in list_images
  ]

  def maybe_downsample(image: _NDArray) -> _NDArray:
    shape = image.shape[0], image.shape[1]
    w, h = _get_width_height(width, height, shape)
    if w < shape[1] or h < shape[0]:
      image = resize_image(image, (h, w))
    return image

  if downsample:
    list_images = [maybe_downsample(image) for image in list_images]
  png_datas = [compress_image(to_uint8(image)) for image in list_images]

  for title, png_data in zip(list_titles, png_datas):
    if title is not None and _config.show_save_dir:
      path = pathlib.Path(_config.show_save_dir) / f'{title}.png'
      with _open(path, mode='wb') as f:
        f.write(png_data)

  def html_from_compressed_images() -> str:
    html_strings = []
    for image, title, png_data in zip(list_images, list_titles, png_datas):
      w, h = _get_width_height(width, height, image.shape[:2])
      magnified = h > image.shape[0] or w > image.shape[1]
      pixelated2 = pixelated if pixelated is not None else magnified
      html_strings.append(
          html_from_compressed_image(
              png_data, w, h, title=title, border=border, pixelated=pixelated2
          )
      )
    # Create single-row tables each with no more than 'columns' elements.
    table_strings = []
    for row_html_strings in _chunked(html_strings, columns):
      td = '<td style="padding:1px;">'
      s = ''.join(f'{td}{e}</td>' for e in row_html_strings)
      if ylabel:
        style = 'writing-mode:vertical-lr; transform:rotate(180deg);'
        s = f'{td}<span style="{style}">{ylabel}</span></td>' + s
      table_strings.append(
          f'<table class="{html_class}"'
          f' style="border-spacing:0px;"><tr>{s}</tr></table>'
      )
    return ''.join(table_strings)

  s = html_from_compressed_images()
  while len(s) > _IPYTHON_HTML_SIZE_LIMIT * 0.5:
    warnings.warn('mediapy: subsampling images to reduce HTML size')
    list_images = [image[::2, ::2] for image in list_images]
    png_datas = [compress_image(to_uint8(image)) for image in list_images]
    s = html_from_compressed_images()
  if return_html:
    return s
  _display_html(s)
  return None


def compare_images(
    images: Iterable[_ArrayLike],
    *,
    vmin: float | None = None,
    vmax: float | None = None,
    cmap: str | Callable[[_ArrayLike], _NDArray] = 'gray',
) -> None:
  """Compares two images using an interactive slider.

  Displays an HTML slider component to interactively swipe between two images.
  The slider functionality requires that the web browser have Internet access.
  See additional info in `https://github.com/sneas/img-comparison-slider`.

  >>> image1, image2 = np.random.rand(64, 64, 3), color_ramp((64, 64))
  >>> compare_images([image1, image2])

  Args:
    images: Iterable of images. Each image must be either a 2D array or a 3D
      array with 1, 3, or 4 channels. There must be exactly two images.
    vmin: For single-channel image, explicit min value for display.
    vmax: For single-channel image, explicit max value for display.
    cmap: For single-channel image, `pyplot` color map or callable to map 1D to
      3D color.
  """
  list_images = [
      _ensure_mapped_to_rgb(image, vmin=vmin, vmax=vmax, cmap=cmap)
      for image in images
  ]
  if len(list_images) != 2:
    raise ValueError('The number of images must be 2.')
  png_datas = [compress_image(to_uint8(image)) for image in list_images]
  b64_1, b64_2 = [
      base64.b64encode(png_data).decode('utf-8') for png_data in png_datas
  ]
  s = _IMAGE_COMPARISON_HTML.replace('{b64_1}', b64_1).replace('{b64_2}', b64_2)
  _display_html(s)


# ** Video I/O.


def _filename_suffix_from_codec(codec: str) -> str:
  """Returns the filename suffix for the container implied by `codec`."""
  if codec == 'gif':
    return '.gif'
  if codec == 'vp9':
    return '.webm'
  return '.mp4'


def _get_ffmpeg_path() -> str:
  path = _search_for_ffmpeg_path()
  if not path:
    raise RuntimeError(
        f"Program '{_config.ffmpeg_name_or_path}' is not found;"
        " perhaps install ffmpeg using 'apt install ffmpeg'."
    )
  return path


@typing.overload
def _run_ffmpeg(
    ffmpeg_args: Sequence[str],
    stdin: int | None = None,
    stdout: int | None = None,
    stderr: int | None = None,
    encoding: None = None,  # No encoding -> bytes.
    allowed_input_files: Sequence[str] | None = None,
    allowed_output_files: Sequence[str] | None = None,
    sandbox_max_run_time_secs: int | None = None,
) -> subprocess.Popen[bytes]:
  ...


@typing.overload
def _run_ffmpeg(
    ffmpeg_args: Sequence[str],
    stdin: int | None = None,
    stdout: int | None = None,
    stderr: int | None = None,
    encoding: str = ...,  # Encoding -> str.
    allowed_input_files: Sequence[str] | None = None,
    allowed_output_files: Sequence[str] | None = None,
    sandbox_max_run_time_secs: int | None = None,
) -> subprocess.Popen[str]:
  ...
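

# Illustrative sketch (a comment-only example, not part of the API;
# `_run_ffmpeg` is module-private): per the overloads above, supplying an
# `encoding` selects the `Popen[str]` overload, so the subprocess pipes yield
# text rather than bytes:
#
#   with _run_ffmpeg(['-version'], stdout=subprocess.PIPE,
#                    encoding='utf-8') as proc:
#     output, _ = proc.communicate()  # `output` is `str`; without
#                                     # `encoding`, it would be `bytes`.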


def _run_ffmpeg(
    ffmpeg_args: Sequence[str],
    stdin: int | None = None,
    stdout: int | None = None,
    stderr: int | None = None,
    encoding: str | None = None,
    allowed_input_files: Sequence[str] | None = None,
    allowed_output_files: Sequence[str] | None = None,
    sandbox_max_run_time_secs: int | None = None,
) -> subprocess.Popen[bytes] | subprocess.Popen[str]:
  """Runs ffmpeg with the given args.

  Args:
    ffmpeg_args: The args to pass to ffmpeg.
    stdin: Same as in `subprocess.Popen`.
    stdout: Same as in `subprocess.Popen`.
    stderr: Same as in `subprocess.Popen`.
    encoding: Same as in `subprocess.Popen`.
    allowed_input_files: The input files to allow for ffmpeg.
    allowed_output_files: The output files to allow for ffmpeg.
    sandbox_max_run_time_secs: The maximum time in seconds to run the sandbox.
      If None, the default limit is 30 minutes.

  Returns:
    The `subprocess.Popen` object for the running ffmpeg process.
  """
  argv = []
  # In open source, keep env=None to preserve default behavior.
  # Context: https://github.com/google/mediapy/pull/62
  env: Any = None  # pylint: disable=unused-variable
  ffmpeg_path = _get_ffmpeg_path()

  # Sandbox max runtime and allowed input and output files are not supported
  # in open source.
  del allowed_input_files
  del allowed_output_files
  del sandbox_max_run_time_secs

  argv.append(ffmpeg_path)
  argv.extend(ffmpeg_args)

  return subprocess.Popen(
      argv,
      stdin=stdin,
      stdout=stdout,
      stderr=stderr,
      encoding=encoding,
      env=env,
  )


def video_is_available() -> bool:
  """Returns True if the program `ffmpeg` is found.

  See also `set_ffmpeg`.
  """
  return _search_for_ffmpeg_path() is not None


class VideoMetadata(NamedTuple):
  """Represents the data stored in a video container header.

  Attributes:
    num_images: Number of frames expected from the video stream. This is
      estimated from the framerate and the duration stored in the video header,
      so it might be inexact. The value is set to -1 if the number of frames is
      not found in the header.
    shape: The dimensions (height, width) of each video frame.
    fps: The framerate in frames per second.
    bps: The estimated bitrate of the video stream in bits per second, retrieved
      from the video header.
  """

  num_images: int
  shape: tuple[int, int]
  fps: float
  bps: int | None


def _get_video_metadata(path: _Path) -> VideoMetadata:
  """Returns attributes of video stored in the specified local file."""
  if not pathlib.Path(path).is_file():
    raise RuntimeError(f"Video file '{path}' is not found.")

  command = [
      '-nostdin',
      '-i',
      str(path),
      '-acodec',
      'copy',
      # Necessary to get "frame= *(\d+)" using newer ffmpeg versions.
      # Previously, was `'-vcodec', 'copy'`.
      '-vf',
      'select=1',
      '-vsync',
      '0',
      '-f',
      'null',
      '-',
  ]
  with _run_ffmpeg(
      command,
      allowed_input_files=[str(path)],
      stderr=subprocess.PIPE,
      encoding='utf-8',
  ) as proc:
    _, err = proc.communicate()
  bps = fps = num_images = width = height = rotation = None
  before_output_info = True
  for line in err.split('\n'):
    if line.startswith('Output '):
      before_output_info = False
    if match := re.search(r', bitrate: *([\d.]+) kb/s', line):
      bps = int(match.group(1)) * 1000
    if matches := re.findall(r'frame= *(\d+) ', line):
      num_images = int(matches[-1])
    if 'Stream #0:' in line and ': Video:' in line and before_output_info:
      if not (match := re.search(r', (\d+)x(\d+)', line)):
        raise RuntimeError(f'Unable to parse video dimensions in line {line}')
      width, height = int(match.group(1)), int(match.group(2))
      if match := re.search(r', ([\d.]+) fps', line):
        fps = float(match.group(1))
      elif str(path).endswith('.gif'):
        # Some GIF files lack a framerate attribute; use a reasonable default.
        fps = 10
      else:
        raise RuntimeError(f'Unable to parse video framerate in line {line}')
    if match := re.fullmatch(r'\s*rotate\s*:\s*(\d+)', line):
      rotation = int(match.group(1))
    if match := re.fullmatch(r'.*rotation of (-?\d+).*\sdegrees', line):
      rotation = int(match.group(1))
  if not num_images:
    num_images = -1
  if not width:
    raise RuntimeError(f'Unable to parse video header: {err}')
  # By default, ffmpeg enables "-autorotate"; we just fix the dimensions.
  if rotation in (90, 270, -90, -270):
    width, height = height, width
  assert height is not None and width is not None
  shape = height, width
  assert fps is not None
  return VideoMetadata(num_images, shape, fps, bps)


class _VideoIO:
  """Base class for `VideoReader` and `VideoWriter`."""

  def _get_pix_fmt(self, dtype: _DType, image_format: str) -> str:
    """Returns ffmpeg pix_fmt given data type and image format."""
    native_endian_suffix = {'little': 'le', 'big': 'be'}[sys.byteorder]
    return {
        np.uint8: {
            'rgb': 'rgb24',
            'yuv': 'yuv444p',
            'gray': 'gray',
        },
        np.uint16: {
            'rgb': 'rgb48' + native_endian_suffix,
            'yuv': 'yuv444p16' + native_endian_suffix,
            'gray': 'gray16' + native_endian_suffix,
        },
    }[dtype.type][image_format]


class VideoReader(_VideoIO):
  """Context to read a compressed video as an iterable over its images.

  >>> with VideoReader('/tmp/river.mp4') as reader:
  ...   print(f'Video has {reader.num_images} images with shape={reader.shape},'
  ...         f' at {reader.fps} frames/sec and {reader.bps} bits/sec.')
  ...   for image in reader:
  ...     print(image.shape)

  >>> with VideoReader('/tmp/river.mp4') as reader:
  ...   video = np.array(tuple(reader))

  >>> url = 'https://github.com/hhoppe/data/raw/main/video.mp4'
  >>> with VideoReader(url) as reader:
  ...   show_video(reader)

  Attributes:
    path_or_url: Location of input video.
    output_format: Format of output images (default 'rgb'). If 'rgb', each
      image has shape=(height, width, 3) with R, G, B values. If 'yuv', each
      image has shape=(height, width, 3) with Y, U, V values. If 'gray', each
      image has shape=(height, width).
    dtype: Data type for output images. The default is `np.uint8`. Use of
      `np.uint16` allows reading 10-bit or 12-bit data without precision loss.
    metadata: Object storing the information retrieved from the video header.
      Its attributes are copied as attributes in this class.
    num_images: Number of frames expected from the video stream. This is
      estimated from the framerate and the duration stored in the video header,
      so it might be inexact.
    shape: The dimensions (height, width) of each video frame.
    fps: The framerate in frames per second.
    bps: The estimated bitrate of the video stream in bits per second, retrieved
      from the video header.
    stream_index: The stream index to read from. The default is 0.
    sandbox_max_run_time_secs: The maximum time in seconds to run the sandbox.
      If None, the default limit is 30 minutes. Unused in open source.
  """

  path_or_url: _Path
  output_format: str
  dtype: _DType
  metadata: VideoMetadata
  num_images: int
  shape: tuple[int, int]
  fps: float
  bps: int | None
  stream_index: int
  _num_bytes_per_image: int

  def __init__(
      self,
      path_or_url: _Path,
      *,
      stream_index: int = 0,
      output_format: str = 'rgb',
      dtype: _DTypeLike = np.uint8,
      sandbox_max_run_time_secs: int | None = None,
  ):
    if output_format not in {'rgb', 'yuv', 'gray'}:
      raise ValueError(
          f'Output format {output_format} is not rgb, yuv, or gray.'
      )
    self.path_or_url = path_or_url
    self.output_format = output_format
    self.stream_index = stream_index
    self.dtype = np.dtype(dtype)
    if self.dtype.type not in (np.uint8, np.uint16):
      raise ValueError(f'Type {dtype} is not np.uint8 or np.uint16.')
    self.sandbox_max_run_time_secs = sandbox_max_run_time_secs
    self._read_via_local_file: Any = None
    self._popen: subprocess.Popen[bytes] | None = None
    self._proc: subprocess.Popen[bytes] | None = None

  def __enter__(self) -> 'VideoReader':
    try:
      self._read_via_local_file = _read_via_local_file(self.path_or_url)
      # pylint: disable-next=no-member
      tmp_name = self._read_via_local_file.__enter__()

      self.metadata = _get_video_metadata(tmp_name)
      self.num_images, self.shape, self.fps, self.bps = self.metadata
      pix_fmt = self._get_pix_fmt(self.dtype, self.output_format)
      num_channels = {'rgb': 3, 'yuv': 3, 'gray': 1}[self.output_format]
      bytes_per_channel = self.dtype.itemsize
      self._num_bytes_per_image = (
          math.prod(self.shape) * num_channels * bytes_per_channel
      )

      command = [
          '-v',
          'panic',
          '-nostdin',
          '-i',
          tmp_name,
          '-vcodec',
          'rawvideo',
          '-f',
          'image2pipe',
          '-map',
          f'0:v:{self.stream_index}',
          '-pix_fmt',
          pix_fmt,
          '-vsync',
          'vfr',
          '-',
      ]
      self._popen = _run_ffmpeg(
          command,
          stdout=subprocess.PIPE,
          stderr=subprocess.PIPE,
          allowed_input_files=[tmp_name],
          sandbox_max_run_time_secs=self.sandbox_max_run_time_secs,
      )
      self._proc = self._popen.__enter__()
    except Exception:
      self.__exit__(None, None, None)
      raise
    return self

  def __exit__(self, *_: Any) -> None:
    self.close()

  def read(self) -> _NDArray | None:
    """Reads a video image frame (or None if at end of file).

    Returns:
      A numpy array in the format specified by `output_format`, i.e., a 3D
      array with 3 color channels, except for format 'gray' which is 2D.

    Raises:
      RuntimeError: If there is an error reading from the output file.
    """
    assert self._proc, 'Error: reading from an already closed context.'
    stdout = self._proc.stdout
    assert stdout is not None
    data = stdout.read(self._num_bytes_per_image)
    if not data:  # Due to either end-of-file or subprocess error.
      self.close()  # Raises exception if subprocess had error.
      return None  # To indicate end-of-file.
    if len(data) != self._num_bytes_per_image:
      self._proc.wait()
      stderr = self._proc.stderr
      stderr_output = ''
      if stderr is not None:
        stderr_output = stderr.read().decode('utf-8', errors='replace').strip()
      raise RuntimeError(
          f'ffmpeg exited with code {self._proc.returncode}.\nIncomplete'
          f' frame read: expected {self._num_bytes_per_image} bytes, but got'
          f' {len(data)}.\nffmpeg stderr:\n{stderr_output}'
      )
    image = np.frombuffer(data, dtype=self.dtype)
    if self.output_format == 'rgb':
      image = image.reshape(*self.shape, 3)
    elif self.output_format == 'yuv':  # Convert from planar YUV to pixel YUV.
      image = np.moveaxis(image.reshape(3, *self.shape), 0, 2)
    elif self.output_format == 'gray':  # Generate 2D rather than 3D ndimage.
      image = image.reshape(*self.shape)
    else:
      raise AssertionError
    return image

  def __iter__(self) -> Iterator[_NDArray]:
    while True:
      image = self.read()
      if image is None:
        return
      yield image

  def close(self) -> None:
    """Terminates video reader.
    (Called automatically at end of context.)"""
    if self._popen:
      self._popen.__exit__(None, None, None)
      self._popen = None
      self._proc = None
    if self._read_via_local_file:
      # pylint: disable-next=no-member
      self._read_via_local_file.__exit__(None, None, None)
      self._read_via_local_file = None


class VideoWriter(_VideoIO):
  """Context to write a compressed video.

  >>> shape = 480, 640
  >>> with VideoWriter('/tmp/v.mp4', shape, fps=60) as writer:
  ...   for image in moving_circle(shape, num_images=60):
  ...     writer.add_image(image)
  >>> show_video(read_video('/tmp/v.mp4'))

  Bitrate control may be specified using at most one of: `bps`, `qp`, or `crf`.
  If none is specified, `qp` is set to a default value.
  See https://slhck.info/video/2017/03/01/rate-control.html

  If the codec is 'gif', the args `bps`, `qp`, `crf`, and `encoded_format` are
  ignored.

  Attributes:
    path: Output video. Its suffix (e.g. '.mp4') determines the video container
      format. The suffix must be '.gif' if the codec is 'gif'.
    shape: 2D spatial dimensions (height, width) of video image frames. The
      dimensions must be even if `encoded_format` has subsampled chroma (e.g.,
      'yuv420p' or 'yuv420p10le').
    codec: Compression algorithm as defined by `ffmpeg -codecs` (e.g., 'h264',
      'hevc', 'vp9', or 'gif').
    metadata: Optional VideoMetadata object whose `fps` and `bps` attributes are
      used if not specified as explicit parameters.
    fps: Frames-per-second framerate (default is 60.0, except 25.0 for 'gif').
    bps: Requested average bits-per-second bitrate (default None).
    qp: Quantization parameter for video compression quality (default None).
    crf: Constant rate factor for video compression quality (default None).
    ffmpeg_args: Additional arguments for the `ffmpeg` command, e.g. '-g 30' to
      introduce I-frames, or '-bf 0' to omit B-frames.
    input_format: Format of input images (default 'rgb'). If 'rgb', each image
      has shape=(height, width, 3) or (height, width). If 'yuv', each image has
      shape=(height, width, 3) with Y, U, V values. If 'gray', each image has
      shape=(height, width).
    dtype: Expected data type for input images (any float input images are
      converted to `dtype`). The default is `np.uint8`. Use of `np.uint16` is
      necessary when encoding >8 bits/channel.
    encoded_format: Pixel format as defined by `ffmpeg -pix_fmts`, e.g.,
      'yuv420p' (2x2-subsampled chroma), 'yuv444p' (full-res chroma),
      'yuv420p10le' (10-bit per channel), etc. The default (None) selects
      'yuv420p' if all shape dimensions are even, else 'yuv444p'.
    sandbox_max_run_time_secs: The maximum time in seconds to run the sandbox.
      If None, the default limit is 30 minutes. Unused in open source.
  """

  def __init__(
      self,
      path: _Path,
      shape: tuple[int, int],
      *,
      codec: str = 'h264',
      metadata: VideoMetadata | None = None,
      fps: float | None = None,
      bps: int | None = None,
      qp: int | None = None,
      crf: float | None = None,
      ffmpeg_args: str | Sequence[str] = '',
      input_format: str = 'rgb',
      dtype: _DTypeLike = np.uint8,
      encoded_format: str | None = None,
      sandbox_max_run_time_secs: int | None = None,
  ) -> None:
    _check_2d_shape(shape)
    if fps is None and metadata:
      fps = metadata.fps
    if fps is None:
      fps = 25.0 if codec == 'gif' else 60.0
    if fps <= 0.0:
      raise ValueError(f'Frames-per-second value {fps} is invalid.')
    if bps is None and metadata:
      bps = metadata.bps
    bps = int(bps) if bps is not None else None
    if bps is not None and bps <= 0:
      raise ValueError(f'Bitrate value {bps} is invalid.')
    if qp is not None and (not isinstance(qp, int) or qp < 0):
      raise ValueError(
          f'Quantization parameter {qp} cannot be negative. It must be a'
          ' non-negative integer.'
      )
    num_rate_specifications = sum(x is not None for x in (bps, qp, crf))
    if num_rate_specifications > 1:
      raise ValueError(
          f'Must specify at most one of bps, qp, or crf ({bps}, {qp}, {crf}).'
      )
    ffmpeg_args = (
        shlex.split(ffmpeg_args)
        if isinstance(ffmpeg_args, str)
        else list(ffmpeg_args)
    )
    if input_format not in {'rgb', 'yuv', 'gray'}:
      raise ValueError(f'Input format {input_format} is not rgb, yuv, or gray.')
    dtype = np.dtype(dtype)
    if dtype.type not in (np.uint8, np.uint16):
      raise ValueError(f'Type {dtype} is not np.uint8 or np.uint16.')
    self.path = pathlib.Path(path)
    self.shape = shape
    all_dimensions_are_even = all(dim % 2 == 0 for dim in shape)
    if encoded_format is None:
      encoded_format = 'yuv420p' if all_dimensions_are_even else 'yuv444p'
    if not all_dimensions_are_even and encoded_format.startswith(
        ('yuv42', 'yuvj42')
    ):
      raise ValueError(
          f'With encoded_format {encoded_format}, video dimensions must be'
          f' even, but shape is {shape}.'
      )
    self.fps = fps
    self.codec = codec
    self.bps = bps
    self.qp = qp
    self.crf = crf
    self.ffmpeg_args = ffmpeg_args
    self.input_format = input_format
    self.dtype = dtype
    self.encoded_format = encoded_format
    self.sandbox_max_run_time_secs = sandbox_max_run_time_secs
    if num_rate_specifications == 0 and not ffmpeg_args:
      qp = 20 if math.prod(self.shape) <= 640 * 480 else 28
    self._bitrate_args = (
        (['-vb', f'{bps}'] if bps is not None else [])
        + (['-qp', f'{qp}'] if qp is not None else [])
        + (['-vb', '0', '-crf', f'{crf}'] if crf is not None else [])
    )
    if self.codec == 'gif':
      if self.path.suffix != '.gif':
        raise ValueError(f"File '{self.path}' does not have a .gif suffix.")
      self.encoded_format = 'pal8'
      self._bitrate_args = []
      video_filter = 'split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse'
      # Less common (and likely less useful) is a per-frame color palette:
      # video_filter = ('split[s0][s1];[s0]palettegen=stats_mode=single[p];'
      #                 '[s1][p]paletteuse=new=1')
      self.ffmpeg_args = ['-vf', video_filter, '-f', 'gif'] + self.ffmpeg_args
    self._write_via_local_file: Any = None
    self._popen: subprocess.Popen[bytes] | None = None
    self._proc: subprocess.Popen[bytes] | None = None

  def __enter__(self) -> 'VideoWriter':
    input_pix_fmt = self._get_pix_fmt(self.dtype, self.input_format)
    try:
      self._write_via_local_file = _write_via_local_file(self.path)
      # pylint: disable-next=no-member
      tmp_name = self._write_via_local_file.__enter__()

      # Writing to stdout using ('-f', 'mp4', '-') would require
      # ('-movflags', 'frag_keyframe+empty_moov') and the result is nonportable.
      height, width = self.shape
      command = (
          [
              '-v',
              'error',
              '-f',
              'rawvideo',
              '-vcodec',
              'rawvideo',
              '-pix_fmt',
              input_pix_fmt,
              '-s',
              f'{width}x{height}',
              '-r',
              f'{self.fps}',
              '-i',
              '-',
              '-an',
              '-vcodec',
              self.codec,
              '-pix_fmt',
              self.encoded_format,
          ]
          + self._bitrate_args
          + self.ffmpeg_args
          + ['-y', tmp_name]
      )
      self._popen = _run_ffmpeg(
          command,
          stdin=subprocess.PIPE,
          stderr=subprocess.PIPE,
          allowed_output_files=[tmp_name],
          sandbox_max_run_time_secs=self.sandbox_max_run_time_secs,
      )
      self._proc = self._popen.__enter__()
    except Exception:
      self.__exit__(None, None, None)
      raise
    return self

  def __exit__(self, *_: Any) -> None:
    self.close()

  def add_image(self, image: _NDArray) -> None:
    """Writes a video frame.

    Args:
      image: Array whose dtype and first two dimensions must match the `dtype`
        and `shape` specified in `VideoWriter` initialization. If
        `input_format` is 'gray', the image must be 2D. For the 'rgb'
        input_format, the image may be either 2D (interpreted as grayscale) or
        3D with three (R, G, B) channels. For the 'yuv' input_format, the image
        must be 3D with three (Y, U, V) channels.

    Raises:
      RuntimeError: If there is an error writing to the output file.
    """
    assert self._proc, 'Error: writing to an already closed context.'
    if issubclass(image.dtype.type, (np.floating, np.bool_)):
      image = to_type(image, self.dtype)
    if image.dtype != self.dtype:
      raise ValueError(f'Image type {image.dtype} != {self.dtype}.')
    if self.input_format == 'gray':
      if image.ndim != 2:
        raise ValueError(f'Image dimensions {image.shape} are not 2D.')
    else:
      if image.ndim == 2 and self.input_format == 'rgb':
        image = np.dstack((image, image, image))
      if not (image.ndim == 3 and image.shape[2] == 3):
        raise ValueError(f'Image dimensions {image.shape} are invalid.')
    if image.shape[:2] != self.shape:
      raise ValueError(
          f'Image dimensions {image.shape[:2]} do not match'
          f' those of the initialized video {self.shape}.'
      )
    if self.input_format == 'yuv':  # Convert from per-pixel YUV to planar YUV.
      image = np.moveaxis(image, 2, 0)
    data = image.tobytes()
    stdin = self._proc.stdin
    assert stdin is not None
    if stdin.write(data) != len(data):
      self._proc.wait()
      stderr = self._proc.stderr
      assert stderr is not None
      s = stderr.read().decode('utf-8')
      raise RuntimeError(f"Error writing '{self.path}': {s}")

  def close(self) -> None:
    """Finishes writing the video. (Called automatically at end of context.)"""
    if self._popen:
      assert self._proc, 'Error: closing an already closed context.'
1798 stdin = self._proc.stdin 1799 assert stdin is not None 1800 stdin.close() 1801 if self._proc.wait(): 1802 stderr = self._proc.stderr 1803 assert stderr is not None 1804 s = stderr.read().decode('utf-8') 1805 raise RuntimeError(f"Error writing '{self.path}': {s}") 1806 self._popen.__exit__(None, None, None) 1807 self._popen = None 1808 self._proc = None 1809 if self._write_via_local_file: 1810 # pylint: disable-next=no-member 1811 self._write_via_local_file.__exit__(None, None, None) 1812 self._write_via_local_file = None 1813 1814 1815class _VideoArray(npt.NDArray[Any]): 1816 """Wrapper to add a VideoMetadata `metadata` attribute to a numpy array.""" 1817 1818 metadata: VideoMetadata | None 1819 1820 def __new__( 1821 cls: Type['_VideoArray'], 1822 input_array: _NDArray, 1823 metadata: VideoMetadata | None = None, 1824 ) -> '_VideoArray': 1825 obj: _VideoArray = np.asarray(input_array).view(cls) 1826 obj.metadata = metadata 1827 return obj 1828 1829 def __array_finalize__(self, obj: Any) -> None: 1830 if obj is None: 1831 return 1832 self.metadata = getattr(obj, 'metadata', None) 1833 1834 1835def read_video(path_or_url: _Path, **kwargs: Any) -> _VideoArray: 1836 """Returns an array containing all images read from a compressed video file. 1837 1838 >>> video = read_video('/tmp/river.mp4') 1839 >>> print(f'The framerate is {video.metadata.fps} frames/s.') 1840 >>> show_video(video) 1841 1842 >>> url = 'https://github.com/hhoppe/data/raw/main/video.mp4' 1843 >>> show_video(read_video(url)) 1844 1845 Args: 1846 path_or_url: Input video file. 1847 **kwargs: Additional parameters for `VideoReader`. 1848 1849 Returns: 1850 A 4D `numpy` array with dimensions (frame, height, width, channel), or a 3D 1851 array if `output_format` is specified as 'gray'. The returned array has an 1852 attribute `metadata` containing `VideoMetadata` information. This enables 1853 `show_video` to retrieve the framerate in `metadata.fps`. 
Note that the 1854 metadata attribute is lost in most subsequent `numpy` operations. 1855 """ 1856 with VideoReader(path_or_url, **kwargs) as reader: 1857 return _VideoArray(np.array(tuple(reader)), metadata=reader.metadata) 1858 1859 1860def write_video(path: _Path, images: Iterable[_NDArray], **kwargs: Any) -> None: 1861 """Writes images to a compressed video file. 1862 1863 >>> video = moving_circle((480, 640), num_images=60) 1864 >>> write_video('/tmp/v.mp4', video, fps=60, qp=18) 1865 >>> show_video(read_video('/tmp/v.mp4')) 1866 1867 Args: 1868 path: Output video file. 1869 images: Iterable over video frames, e.g. a 4D array or a list of 2D or 3D 1870 arrays. 1871 **kwargs: Additional parameters for `VideoWriter`. 1872 """ 1873 first_image, images = _peek_first(images) 1874 shape = first_image.shape[0], first_image.shape[1] 1875 dtype = first_image.dtype 1876 if dtype == bool: 1877 dtype = np.dtype(np.uint8) 1878 elif np.issubdtype(dtype, np.floating): 1879 dtype = np.dtype(np.uint16) 1880 kwargs = {'metadata': getattr(images, 'metadata', None), **kwargs} 1881 with VideoWriter(path, shape=shape, dtype=dtype, **kwargs) as writer: 1882 for image in images: 1883 writer.add_image(image) 1884 1885 1886def compress_video( 1887 images: Iterable[_NDArray], *, codec: str = 'h264', **kwargs: Any 1888) -> bytes: 1889 """Returns a buffer containing a compressed video. 1890 1891 The video container is 'gif' for 'gif' codec, 'webm' for 'vp9' codec, 1892 and mp4 otherwise. 1893 1894 >>> video = read_video('/tmp/river.mp4') 1895 >>> data = compress_video(video, bps=10_000_000) 1896 >>> print(len(data)) 1897 1898 >>> data = compress_video(moving_circle((100, 100), num_images=10), fps=10) 1899 1900 Args: 1901 images: Iterable over video frames. 1902 codec: Compression algorithm as defined by `ffmpeg -codecs` (e.g., 'h264', 1903 'hevc', 'vp9', or 'gif'). 1904 **kwargs: Additional parameters for `VideoWriter`. 
1905 1906 Returns: 1907 A bytes buffer containing the compressed video. 1908 """ 1909 suffix = _filename_suffix_from_codec(codec) 1910 with tempfile.TemporaryDirectory() as directory_name: 1911 tmp_path = pathlib.Path(directory_name) / f'file{suffix}' 1912 write_video(tmp_path, images, codec=codec, **kwargs) 1913 return tmp_path.read_bytes() 1914 1915 1916def decompress_video(data: bytes, **kwargs: Any) -> _NDArray: 1917 """Returns video images from an MP4-compressed data buffer.""" 1918 with tempfile.TemporaryDirectory() as directory_name: 1919 tmp_path = pathlib.Path(directory_name) / 'file.mp4' 1920 tmp_path.write_bytes(data) 1921 return read_video(tmp_path, **kwargs) 1922 1923 1924def html_from_compressed_video( 1925 data: bytes, 1926 width: int, 1927 height: int, 1928 *, 1929 title: str | None = None, 1930 border: bool | str = False, 1931 loop: bool = True, 1932 autoplay: bool = True, 1933) -> str: 1934 """Returns an HTML string with a video tag containing H264-encoded data. 1935 1936 Args: 1937 data: MP4-compressed video bytes. 1938 width: Width of HTML video in pixels. 1939 height: Height of HTML video in pixels. 1940 title: Optional text shown centered above the video. 1941 border: If `bool`, whether to place a black boundary around the image, or if 1942 `str`, the boundary CSS style. 1943 loop: If True, the playback repeats forever. 1944 autoplay: If True, video playback starts without having to click. 1945 """ 1946 b64 = base64.b64encode(data).decode('utf-8') 1947 if isinstance(border, str): 1948 border = f'{border}; ' 1949 elif border: 1950 border = 'border:1px solid black; ' 1951 else: 1952 border = '' 1953 options = ( 1954 f'controls width="{width}" height="{height}"' 1955 f' style="{border}object-fit:cover;"' 1956 f'{" loop" if loop else ""}' 1957 f'{" autoplay muted" if autoplay else ""}' 1958 ) 1959 s = f"""<video {options}> 1960 <source src="data:video/mp4;base64,{b64}" type="video/mp4"/> 1961 This browser does not support the video tag. 
1962 </video>""" 1963 if title is not None: 1964 s = f"""<div style="display:flex; align-items:left;"> 1965 <div style="display:flex; flex-direction:column; align-items:center;"> 1966 <div>{title}</div><div>{s}</div></div></div>""" 1967 return s 1968 1969 1970def show_video( 1971 images: Iterable[_NDArray], *, title: str | None = None, **kwargs: Any 1972) -> str | None: 1973 """Displays a video in the IPython notebook and optionally saves it to a file. 1974 1975 See `show_videos`. 1976 1977 >>> video = read_video('https://github.com/hhoppe/data/raw/main/video.mp4') 1978 >>> show_video(video, title='River video') 1979 1980 >>> show_video(moving_circle((80, 80), num_images=10), fps=5, border=True) 1981 1982 >>> show_video(read_video('/tmp/river.mp4')) 1983 1984 Args: 1985 images: Iterable of video frames (e.g., a 4D array or a list of 2D or 3D 1986 arrays). 1987 title: Optional text shown centered above the video. 1988 **kwargs: See `show_videos`. 1989 1990 Returns: 1991 html string if `return_html` is `True`. 1992 """ 1993 return show_videos([images], [title], **kwargs) 1994 1995 1996def show_videos( 1997 videos: Iterable[Iterable[_NDArray]] | Mapping[str, Iterable[_NDArray]], 1998 titles: Iterable[str | None] | None = None, 1999 *, 2000 width: int | None = None, 2001 height: int | None = None, 2002 downsample: bool = True, 2003 columns: int | None = None, 2004 fps: float | None = None, 2005 bps: int | None = None, 2006 qp: int | None = None, 2007 codec: str = 'h264', 2008 ylabel: str = '', 2009 html_class: str = 'show_videos', 2010 return_html: bool = False, 2011 **kwargs: Any, 2012) -> str | None: 2013 """Displays a row of videos in the IPython notebook. 2014 2015 Creates HTML with `<video>` tags containing embedded H264-encoded bytestrings. 2016 If `codec` is set to 'gif', we instead use `<img>` tags containing embedded 2017 GIF-encoded bytestrings. 
Note that the resulting GIF animations skip frames 2018 when the `fps` period is not a multiple of 10 ms units (GIF frame delay 2019 units). Encoding at `fps` = 20.0, 25.0, or 50.0 works fine. 2020 2021 If a directory has been specified using `set_show_save_dir`, also saves each 2022 titled video to a file in that directory based on its title. 2023 2024 Args: 2025 videos: Iterable of videos, or dictionary of `{title: video}`. Each video 2026 must be an iterable of images. If a video object has a `metadata` 2027 (`VideoMetadata`) attribute, its `fps` field provides a default framerate. 2028 titles: Optional strings shown above the corresponding videos. 2029 width: Optional, overrides displayed width (in pixels). 2030 height: Optional, overrides displayed height (in pixels). 2031 downsample: If True, each video whose width or height is greater than the 2032 specified `width` or `height` is resampled to the display resolution. This 2033 improves antialiasing and reduces the size of the notebook. 2034 columns: Optional, maximum number of videos per row. 2035 fps: Frames-per-second framerate (default is 60.0 except 25.0 for GIF). 2036 bps: Bits-per-second bitrate (default None). 2037 qp: Quantization parameter for video compression quality (default None). 2038 codec: Compression algorithm; must be either 'h264' or 'gif'. 2039 ylabel: Text (rotated by 90 degrees) shown on the left of each row. 2040 html_class: CSS class name used in definition of HTML element. 2041 return_html: If `True` return the raw HTML `str` instead of displaying. 2042 **kwargs: Additional parameters (`border`, `loop`, `autoplay`) for 2043 `html_from_compressed_video`. 2044 2045 Returns: 2046 html string if `return_html` is `True`. 2047 """ 2048 if isinstance(videos, Mapping): 2049 if titles is not None: 2050 raise ValueError( 2051 'Cannot have both a video dictionary and a titles parameter.' 
2052 ) 2053 list_titles = list(videos.keys()) 2054 list_videos = list(videos.values()) 2055 else: 2056 list_videos = list(cast('Iterable[_NDArray]', videos)) 2057 list_titles = [None] * len(list_videos) if titles is None else list(titles) 2058 if len(list_videos) != len(list_titles): 2059 raise ValueError( 2060 'Number of videos does not match number of titles' 2061 f' ({len(list_videos)} vs {len(list_titles)}).' 2062 ) 2063 if codec not in {'h264', 'gif'}: 2064 raise ValueError(f'Codec {codec} is neither h264 or gif.') 2065 2066 html_strings = [] 2067 for video, title in zip(list_videos, list_titles): 2068 metadata: VideoMetadata | None = getattr(video, 'metadata', None) 2069 first_image, video = _peek_first(video) 2070 w, h = _get_width_height(width, height, first_image.shape[:2]) 2071 if downsample and (w < first_image.shape[1] or h < first_image.shape[0]): 2072 # Not resize_video() because each image may have different depth and type. 2073 video = [resize_image(image, (h, w)) for image in video] 2074 first_image = video[0] 2075 data = compress_video( 2076 video, metadata=metadata, fps=fps, bps=bps, qp=qp, codec=codec 2077 ) 2078 if title is not None and _config.show_save_dir: 2079 suffix = _filename_suffix_from_codec(codec) 2080 path = pathlib.Path(_config.show_save_dir) / f'{title}{suffix}' 2081 with _open(path, mode='wb') as f: 2082 f.write(data) 2083 if codec == 'gif': 2084 pixelated = h > first_image.shape[0] or w > first_image.shape[1] 2085 html_string = html_from_compressed_image( 2086 data, w, h, title=title, fmt='gif', pixelated=pixelated, **kwargs 2087 ) 2088 else: 2089 html_string = html_from_compressed_video( 2090 data, w, h, title=title, **kwargs 2091 ) 2092 html_strings.append(html_string) 2093 2094 # Create single-row tables each with no more than 'columns' elements. 
2095 table_strings = [] 2096 for row_html_strings in _chunked(html_strings, columns): 2097 td = '<td style="padding:1px;">' 2098 s = ''.join(f'{td}{e}</td>' for e in row_html_strings) 2099 if ylabel: 2100 style = 'writing-mode:vertical-lr; transform:rotate(180deg);' 2101 s = f'{td}<span style="{style}">{ylabel}</span></td>' + s 2102 table_strings.append( 2103 f'<table class="{html_class}"' 2104 f' style="border-spacing:0px;"><tr>{s}</tr></table>' 2105 ) 2106 s = ''.join(table_strings) 2107 if return_html: 2108 return s 2109 _display_html(s) 2110 return None 2111 2112 2113# Local Variables: 2114# fill-column: 80 2115# End:
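The frame-skipping caveat in the `show_videos` docstring follows from the GIF format storing each frame delay in centiseconds (10 ms units), so the requested inter-frame period is quantized. A standalone sketch of that effect (the helper name is my own, not part of mediapy):

```python
def gif_effective_fps(fps: float) -> float:
  # GIF frame delays are integer centiseconds, so the requested period
  # 1/fps is rounded to the nearest multiple of 10 ms.
  delay_cs = max(1, round(100.0 / fps))
  return 100.0 / delay_cs

# Periods that are exact multiples of 10 ms are preserved:
assert gif_effective_fps(25.0) == 25.0  # 40 ms period.
assert gif_effective_fps(50.0) == 50.0  # 20 ms period.
# Other rates are quantized, so playback drifts and frames get dropped:
assert abs(gif_effective_fps(30.0) - 100.0 / 3) < 1e-9  # 33.3 ms -> 30 ms.
```

This is why the docstring singles out `fps` = 20.0, 25.0, and 50.0 as safe choices.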
def show_image(
    image: _ArrayLike, *, title: str | None = None, **kwargs: Any
) -> str | None:
  """Displays an image in the notebook and optionally saves it to a file.

  See `show_images`.

  >>> show_image(np.random.rand(100, 100))
  >>> show_image(np.random.randint(0, 256, size=(80, 80, 3), dtype='uint8'))
  >>> show_image(np.random.rand(10, 10) - 0.5, cmap='bwr', height=100)
  >>> show_image(read_image('/tmp/image.png'))
  >>> url = 'https://github.com/hhoppe/data/raw/main/image.png'
  >>> show_image(read_image(url))

  Args:
    image: 2D array-like, or 3D array-like with 1, 3, or 4 channels.
    title: Optional text shown centered above the image.
    **kwargs: See `show_images`.

  Returns:
    html string if `return_html` is `True`.
  """
  return show_images([np.asarray(image)], [title], **kwargs)
def show_images(
    images: Iterable[_ArrayLike] | Mapping[str, _ArrayLike],
    titles: Iterable[str | None] | None = None,
    *,
    width: int | None = None,
    height: int | None = None,
    downsample: bool = True,
    columns: int | None = None,
    vmin: float | None = None,
    vmax: float | None = None,
    cmap: str | Callable[[_ArrayLike], _NDArray] = 'gray',
    border: bool | str = False,
    ylabel: str = '',
    html_class: str = 'show_images',
    pixelated: bool | None = None,
    return_html: bool = False,
) -> str | None:
  """Displays a row of images in the IPython/Jupyter notebook.

  If a directory has been specified using `set_show_save_dir`, also saves each
  titled image to a file in that directory based on its title.

  >>> image1, image2 = np.random.rand(64, 64, 3), color_ramp((64, 64))
  >>> show_images([image1, image2])
  >>> show_images({'random image': image1, 'color ramp': image2}, height=128)
  >>> show_images([image1, image2] * 5, columns=4, border=True)

  Args:
    images: Iterable of images, or dictionary of `{title: image}`. Each image
      must be either a 2D array or a 3D array with 1, 3, or 4 channels.
    titles: Optional strings shown above the corresponding images.
    width: Optional, overrides displayed width (in pixels).
    height: Optional, overrides displayed height (in pixels).
    downsample: If True, each image whose width or height is greater than the
      specified `width` or `height` is resampled to the display resolution.
      This improves antialiasing and reduces the size of the notebook.
    columns: Optional, maximum number of images per row.
    vmin: For single-channel image, explicit min value for display.
    vmax: For single-channel image, explicit max value for display.
    cmap: For single-channel image, `pyplot` color map or callable to map 1D to
      3D color.
    border: If `bool`, whether to place a black boundary around the image, or if
      `str`, the boundary CSS style.
    ylabel: Text (rotated by 90 degrees) shown on the left of each row.
    html_class: CSS class name used in definition of HTML element.
    pixelated: If True, sets the CSS style to 'image-rendering: pixelated;'; if
      False, sets 'image-rendering: auto'; if None, uses pixelated rendering
      only on images for which `width` or `height` introduces magnification.
    return_html: If `True` return the raw HTML `str` instead of displaying.

  Returns:
    html string if `return_html` is `True`.
  """
  if isinstance(images, Mapping):
    if titles is not None:
      raise ValueError('Cannot have images dictionary and titles parameter.')
    list_titles, list_images = list(images.keys()), list(images.values())
  else:
    list_images = list(images)
    list_titles = [None] * len(list_images) if titles is None else list(titles)
  if len(list_images) != len(list_titles):
    raise ValueError(
        'Number of images does not match number of titles'
        f' ({len(list_images)} vs {len(list_titles)}).'
    )

  list_images = [
      _ensure_mapped_to_rgb(image, vmin=vmin, vmax=vmax, cmap=cmap)
      for image in list_images
  ]

  def maybe_downsample(image: _NDArray) -> _NDArray:
    shape = image.shape[0], image.shape[1]
    w, h = _get_width_height(width, height, shape)
    if w < shape[1] or h < shape[0]:
      image = resize_image(image, (h, w))
    return image

  if downsample:
    list_images = [maybe_downsample(image) for image in list_images]
  png_datas = [compress_image(to_uint8(image)) for image in list_images]

  for title, png_data in zip(list_titles, png_datas):
    if title is not None and _config.show_save_dir:
      path = pathlib.Path(_config.show_save_dir) / f'{title}.png'
      with _open(path, mode='wb') as f:
        f.write(png_data)

  def html_from_compressed_images() -> str:
    html_strings = []
    for image, title, png_data in zip(list_images, list_titles, png_datas):
      w, h = _get_width_height(width, height, image.shape[:2])
      magnified = h > image.shape[0] or w > image.shape[1]
      pixelated2 = pixelated if pixelated is not None else magnified
      html_strings.append(
          html_from_compressed_image(
              png_data, w, h, title=title, border=border, pixelated=pixelated2
          )
      )
    # Create single-row tables each with no more than 'columns' elements.
    table_strings = []
    for row_html_strings in _chunked(html_strings, columns):
      td = '<td style="padding:1px;">'
      s = ''.join(f'{td}{e}</td>' for e in row_html_strings)
      if ylabel:
        style = 'writing-mode:vertical-lr; transform:rotate(180deg);'
        s = f'{td}<span style="{style}">{ylabel}</span></td>' + s
      table_strings.append(
          f'<table class="{html_class}"'
          f' style="border-spacing:0px;"><tr>{s}</tr></table>'
      )
    return ''.join(table_strings)

  s = html_from_compressed_images()
  while len(s) > _IPYTHON_HTML_SIZE_LIMIT * 0.5:
    warnings.warn('mediapy: subsampling images to reduce HTML size')
    list_images = [image[::2, ::2] for image in list_images]
    png_datas = [compress_image(to_uint8(image)) for image in list_images]
    s = html_from_compressed_images()
  if return_html:
    return s
  _display_html(s)
  return None
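When the generated HTML exceeds half of `_IPYTHON_HTML_SIZE_LIMIT`, `show_images` falls back to repeatedly halving each image's resolution with `image[::2, ::2]`. A standalone sketch of that loop, using a raw pixel budget as a stand-in for the HTML length check (the helper name and budget are illustrative, not mediapy API):

```python
import numpy as np


def subsample_until(image: np.ndarray, max_pixels: int) -> np.ndarray:
  # Mirror the fallback in show_images: keep every other row and column
  # (image[::2, ::2]) until the image fits the stand-in size budget.
  while image.shape[0] * image.shape[1] > max_pixels:
    image = image[::2, ::2]
  return image


image = np.zeros((256, 256, 3), np.uint8)
small = subsample_until(image, max_pixels=10_000)
assert small.shape[:2] == (64, 64)  # 256 -> 128 -> 64 per dimension.
```

Each iteration quarters the pixel count, so the loop terminates quickly even for very large inputs.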
def compare_images(
    images: Iterable[_ArrayLike],
    *,
    vmin: float | None = None,
    vmax: float | None = None,
    cmap: str | Callable[[_ArrayLike], _NDArray] = 'gray',
) -> None:
  """Compare two images using an interactive slider.

  Displays an HTML slider component to interactively swipe between two images.
  The slider functionality requires that the web browser have Internet access.
  See additional info in `https://github.com/sneas/img-comparison-slider`.

  >>> image1, image2 = np.random.rand(64, 64, 3), color_ramp((64, 64))
  >>> compare_images([image1, image2])

  Args:
    images: Iterable of images. Each image must be either a 2D array or a 3D
      array with 1, 3, or 4 channels. There must be exactly two images.
    vmin: For single-channel image, explicit min value for display.
    vmax: For single-channel image, explicit max value for display.
    cmap: For single-channel image, `pyplot` color map or callable to map 1D to
      3D color.
  """
  list_images = [
      _ensure_mapped_to_rgb(image, vmin=vmin, vmax=vmax, cmap=cmap)
      for image in images
  ]
  if len(list_images) != 2:
    raise ValueError('The number of images must be 2.')
  png_datas = [compress_image(to_uint8(image)) for image in list_images]
  b64_1, b64_2 = [
      base64.b64encode(png_data).decode('utf-8') for png_data in png_datas
  ]
  s = _IMAGE_COMPARISON_HTML.replace('{b64_1}', b64_1).replace('{b64_2}', b64_2)
  _display_html(s)
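Both `show_images` and `compare_images` route inputs through `_ensure_mapped_to_rgb`, whose implementation is not shown here. A plausible sketch of just the `vmin`/`vmax` normalization step it would apply to a single-channel image (a hypothetical helper, not the actual mediapy code):

```python
import numpy as np


def normalize_for_display(image, vmin=None, vmax=None):
  # Hypothetical sketch: absent explicit bounds, use the image's own
  # min/max, then map linearly to [0, 1] and clip out-of-range values.
  image = np.asarray(image, np.float64)
  vmin = image.min() if vmin is None else vmin
  vmax = image.max() if vmax is None else vmax
  scale = vmax - vmin if vmax > vmin else 1.0
  return np.clip((image - vmin) / scale, 0.0, 1.0)


ramp = np.array([[-0.5, 0.0, 0.5]])
assert np.allclose(normalize_for_display(ramp), [[0.0, 0.5, 1.0]])
assert np.allclose(
    normalize_for_display(ramp, vmin=0.0, vmax=0.5), [[0.0, 0.0, 1.0]]
)
```

The normalized result would then be passed through the `cmap` color map to obtain RGB values.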
def read_image(
    path_or_url: _Path,
    *,
    apply_exif_transpose: bool = True,
    dtype: _DTypeLike = None,
) -> _NDArray:
  """Returns an image read from a file path or URL.

  Decoding is performed using `PIL`, which supports `uint8` images with 1, 3,
  or 4 channels and `uint16` images with a single channel.

  Args:
    path_or_url: Path of input file.
    apply_exif_transpose: If True, rotate image according to EXIF orientation.
    dtype: Data type of the returned array. If None, `np.uint8` or `np.uint16`
      is inferred automatically.
  """
  data = read_contents(path_or_url)
  return decompress_image(data, dtype, apply_exif_transpose)
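A common follow-up to `read_image` is mapping the returned integer array to floats in [0.0, 1.0], as mediapy's `to_float01` does (used in the darkening example at the top of this module). A minimal sketch of that conversion for the `uint8` case, under the assumption that 255 maps to 1.0:

```python
import numpy as np


def to_float01_uint8(image: np.ndarray) -> np.ndarray:
  # Map uint8 pixel values 0..255 linearly onto floats in [0.0, 1.0].
  assert image.dtype == np.uint8
  return image.astype(np.float32) / 255.0


pixels = np.array([[0, 128, 255]], np.uint8)
out = to_float01_uint8(pixels)
assert out.dtype == np.float32
assert np.allclose(out, [[0.0, 128 / 255, 1.0]])
```

The library's actual `to_float01` also accepts `uint16` and float inputs; this sketch covers only the `uint8` path.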
Returns an image read from a file path or URL.
Decoding is performed using PIL, which supports uint8 images with 1, 3,
or 4 channels and uint16 images with a single channel.
Arguments:
- path_or_url: Path of input file.
- apply_exif_transpose: If True, rotate image according to EXIF orientation.
- dtype: Data type of the returned array. If None, `np.uint8` or `np.uint16` is inferred automatically.
def write_image(
    path: _Path, image: _ArrayLike, fmt: str = 'png', **kwargs: Any
) -> None:
  """Writes an image to a file.

  Encoding is performed using `PIL`, which supports `uint8` images with 1, 3,
  or 4 channels and `uint16` images with a single channel.

  File format is explicitly provided by `fmt` and not inferred by `path`.

  Args:
    path: Path of output file.
    image: Array-like object. If its type is float, it is converted to np.uint8
      using `to_uint8` (thus clamping the input to the range [0.0, 1.0]).
      Otherwise it must be np.uint8 or np.uint16.
    fmt: Desired compression encoding, e.g. 'png'.
    **kwargs: Additional parameters for `PIL.Image.save()`.
  """
  image = _as_valid_media_array(image)
  if np.issubdtype(image.dtype, np.floating):
    image = to_uint8(image)
  with _open(path, 'wb') as f:
    _pil_image(image).save(f, format=fmt, **kwargs)
Writes an image to a file.
Encoding is performed using PIL, which supports uint8 images with 1, 3,
or 4 channels and uint16 images with a single channel.
File format is explicitly provided by fmt and not inferred by path.
Arguments:
- path: Path of output file.
- image: Array-like object. If its type is float, it is converted to np.uint8 using `to_uint8` (thus clamping the input to the range [0.0, 1.0]). Otherwise it must be np.uint8 or np.uint16.
- fmt: Desired compression encoding, e.g. 'png'.
- **kwargs: Additional parameters for `PIL.Image.save()`.
def read_video(path_or_url: _Path, **kwargs: Any) -> _VideoArray:
  """Returns an array containing all images read from a compressed video file.

  >>> video = read_video('/tmp/river.mp4')
  >>> print(f'The framerate is {video.metadata.fps} frames/s.')
  >>> show_video(video)

  >>> url = 'https://github.com/hhoppe/data/raw/main/video.mp4'
  >>> show_video(read_video(url))

  Args:
    path_or_url: Input video file.
    **kwargs: Additional parameters for `VideoReader`.

  Returns:
    A 4D `numpy` array with dimensions (frame, height, width, channel), or a 3D
    array if `output_format` is specified as 'gray'. The returned array has an
    attribute `metadata` containing `VideoMetadata` information. This enables
    `show_video` to retrieve the framerate in `metadata.fps`. Note that the
    metadata attribute is lost in most subsequent `numpy` operations.
  """
  with VideoReader(path_or_url, **kwargs) as reader:
    return _VideoArray(np.array(tuple(reader)), metadata=reader.metadata)
Returns an array containing all images read from a compressed video file.
>>> video = read_video('/tmp/river.mp4')
>>> print(f'The framerate is {video.metadata.fps} frames/s.')
>>> show_video(video)
>>> url = 'https://github.com/hhoppe/data/raw/main/video.mp4'
>>> show_video(read_video(url))
Arguments:
- path_or_url: Input video file.
- **kwargs: Additional parameters for `VideoReader`.
Returns:
A 4D `numpy` array with dimensions (frame, height, width, channel), or a 3D array if `output_format` is specified as 'gray'. The returned array has an attribute `metadata` containing `VideoMetadata` information. This enables `show_video` to retrieve the framerate in `metadata.fps`. Note that the metadata attribute is lost in most subsequent `numpy` operations.
def write_video(path: _Path, images: Iterable[_NDArray], **kwargs: Any) -> None:
  """Writes images to a compressed video file.

  >>> video = moving_circle((480, 640), num_images=60)
  >>> write_video('/tmp/v.mp4', video, fps=60, qp=18)
  >>> show_video(read_video('/tmp/v.mp4'))

  Args:
    path: Output video file.
    images: Iterable over video frames, e.g. a 4D array or a list of 2D or 3D
      arrays.
    **kwargs: Additional parameters for `VideoWriter`.
  """
  first_image, images = _peek_first(images)
  shape = first_image.shape[0], first_image.shape[1]
  dtype = first_image.dtype
  if dtype == bool:
    dtype = np.dtype(np.uint8)
  elif np.issubdtype(dtype, np.floating):
    dtype = np.dtype(np.uint16)
  kwargs = {'metadata': getattr(images, 'metadata', None), **kwargs}
  with VideoWriter(path, shape=shape, dtype=dtype, **kwargs) as writer:
    for image in images:
      writer.add_image(image)
Writes images to a compressed video file.
>>> video = moving_circle((480, 640), num_images=60)
>>> write_video('/tmp/v.mp4', video, fps=60, qp=18)
>>> show_video(read_video('/tmp/v.mp4'))
Arguments:
- path: Output video file.
- images: Iterable over video frames, e.g. a 4D array or a list of 2D or 3D arrays.
- **kwargs: Additional parameters for `VideoWriter`.
class VideoReader(_VideoIO):
  """Context to read a compressed video as an iterable over its images.

  >>> with VideoReader('/tmp/river.mp4') as reader:
  ...   print(f'Video has {reader.num_images} images with shape={reader.shape},'
  ...         f' at {reader.fps} frames/sec and {reader.bps} bits/sec.')
  ...   for image in reader:
  ...     print(image.shape)

  >>> with VideoReader('/tmp/river.mp4') as reader:
  ...   video = np.array(tuple(reader))

  >>> url = 'https://github.com/hhoppe/data/raw/main/video.mp4'
  >>> with VideoReader(url) as reader:
  ...   show_video(reader)

  Attributes:
    path_or_url: Location of input video.
    output_format: Format of output images (default 'rgb'). If 'rgb', each
      image has shape=(height, width, 3) with R, G, B values. If 'yuv', each
      image has shape=(height, width, 3) with Y, U, V values. If 'gray', each
      image has shape=(height, width).
    dtype: Data type for output images. The default is `np.uint8`. Use of
      `np.uint16` allows reading 10-bit or 12-bit data without precision loss.
    metadata: Object storing the information retrieved from the video header.
      Its attributes are copied as attributes in this class.
    num_images: Number of frames that is expected from the video stream. This
      is estimated from the framerate and the duration stored in the video
      header, so it might be inexact.
    shape: The dimensions (height, width) of each video frame.
    fps: The framerate in frames per second.
    bps: The estimated bitrate of the video stream in bits per second, retrieved
      from the video header.
    stream_index: The stream index to read from. The default is 0.
    sandbox_max_run_time_secs: The maximum time in seconds to run the sandbox.
      If None, the default limit is 30 minutes. Unused in open source.
  """

  path_or_url: _Path
  output_format: str
  dtype: _DType
  metadata: VideoMetadata
  num_images: int
  shape: tuple[int, int]
  fps: float
  bps: int | None
  stream_index: int
  _num_bytes_per_image: int

  def __init__(
      self,
      path_or_url: _Path,
      *,
      stream_index: int = 0,
      output_format: str = 'rgb',
      dtype: _DTypeLike = np.uint8,
      sandbox_max_run_time_secs: int | None = None,
  ):
    if output_format not in {'rgb', 'yuv', 'gray'}:
      raise ValueError(
          f'Output format {output_format} is not rgb, yuv, or gray.'
      )
    self.path_or_url = path_or_url
    self.output_format = output_format
    self.stream_index = stream_index
    self.dtype = np.dtype(dtype)
    if self.dtype.type not in (np.uint8, np.uint16):
      raise ValueError(f'Type {dtype} is not np.uint8 or np.uint16.')
    self.sandbox_max_run_time_secs = sandbox_max_run_time_secs
    self._read_via_local_file: Any = None
    self._popen: subprocess.Popen[bytes] | None = None
    self._proc: subprocess.Popen[bytes] | None = None

  def __enter__(self) -> 'VideoReader':
    try:
      self._read_via_local_file = _read_via_local_file(self.path_or_url)
      # pylint: disable-next=no-member
      tmp_name = self._read_via_local_file.__enter__()

      self.metadata = _get_video_metadata(tmp_name)
      self.num_images, self.shape, self.fps, self.bps = self.metadata
      pix_fmt = self._get_pix_fmt(self.dtype, self.output_format)
      num_channels = {'rgb': 3, 'yuv': 3, 'gray': 1}[self.output_format]
      bytes_per_channel = self.dtype.itemsize
      self._num_bytes_per_image = (
          math.prod(self.shape) * num_channels * bytes_per_channel
      )

      command = [
          '-v',
          'panic',
          '-nostdin',
          '-i',
          tmp_name,
          '-vcodec',
          'rawvideo',
          '-f',
          'image2pipe',
          '-map',
          f'0:v:{self.stream_index}',
          '-pix_fmt',
          pix_fmt,
          '-vsync',
          'vfr',
          '-',
      ]
      self._popen = _run_ffmpeg(
          command,
          stdout=subprocess.PIPE,
          stderr=subprocess.PIPE,
          allowed_input_files=[tmp_name],
          sandbox_max_run_time_secs=self.sandbox_max_run_time_secs,
      )
      self._proc = self._popen.__enter__()
    except Exception:
      self.__exit__(None, None, None)
      raise
    return self

  def __exit__(self, *_: Any) -> None:
    self.close()

  def read(self) -> _NDArray | None:
    """Reads a video image frame (or None if at end of file).

    Returns:
      A numpy array in the format specified by `output_format`, i.e., a 3D
      array with 3 color channels, except for format 'gray' which is 2D.

    Raises:
      RuntimeError: If there is an error reading from the output file.
    """
    assert self._proc, 'Error: reading from an already closed context.'
    stdout = self._proc.stdout
    assert stdout is not None
    data = stdout.read(self._num_bytes_per_image)
    if not data:  # Due to either end-of-file or subprocess error.
      self.close()  # Raises exception if subprocess had error.
      return None  # To indicate end-of-file.
    if len(data) != self._num_bytes_per_image:
      self._proc.wait()
      stderr = self._proc.stderr
      stderr_output = ''
      if stderr is not None:
        stderr_output = stderr.read().decode('utf-8', errors='replace').strip()
      raise RuntimeError(
          f'ffmpeg exited with code {self._proc.returncode}.\nIncomplete'
          f' frame read: expected {self._num_bytes_per_image} bytes, but got'
          f' {len(data)}.\nffmpeg stderr:\n{stderr_output}'
      )
    image = np.frombuffer(data, dtype=self.dtype)
    if self.output_format == 'rgb':
      image = image.reshape(*self.shape, 3)
    elif self.output_format == 'yuv':  # Convert from planar YUV to pixel YUV.
      image = np.moveaxis(image.reshape(3, *self.shape), 0, 2)
    elif self.output_format == 'gray':  # Generate 2D rather than 3D ndimage.
      image = image.reshape(*self.shape)
    else:
      raise AssertionError
    return image

  def __iter__(self) -> Iterator[_NDArray]:
    while True:
      image = self.read()
      if image is None:
        return
      yield image

  def close(self) -> None:
    """Terminates video reader. (Called automatically at end of context.)"""
    if self._popen:
      self._popen.__exit__(None, None, None)
      self._popen = None
      self._proc = None
    if self._read_via_local_file:
      # pylint: disable-next=no-member
      self._read_via_local_file.__exit__(None, None, None)
      self._read_via_local_file = None
Context to read a compressed video as an iterable over its images.
>>> with VideoReader('/tmp/river.mp4') as reader:
... print(f'Video has {reader.num_images} images with shape={reader.shape},'
... f' at {reader.fps} frames/sec and {reader.bps} bits/sec.')
... for image in reader:
... print(image.shape)
>>> with VideoReader('/tmp/river.mp4') as reader:
... video = np.array(tuple(reader))
>>> url = 'https://github.com/hhoppe/data/raw/main/video.mp4'
>>> with VideoReader(url) as reader:
... show_video(reader)
Attributes:
- path_or_url: Location of input video.
- output_format: Format of output images (default 'rgb'). If 'rgb', each image has shape=(height, width, 3) with R, G, B values. If 'yuv', each image has shape=(height, width, 3) with Y, U, V values. If 'gray', each image has shape=(height, width).
- dtype: Data type for output images. The default is `np.uint8`. Use of `np.uint16` allows reading 10-bit or 12-bit data without precision loss.
- metadata: Object storing the information retrieved from the video header. Its attributes are copied as attributes in this class.
- num_images: Number of frames that is expected from the video stream. This is estimated from the framerate and the duration stored in the video header, so it might be inexact.
- shape: The dimensions (height, width) of each video frame.
- fps: The framerate in frames per second.
- bps: The estimated bitrate of the video stream in bits per second, retrieved from the video header.
- stream_index: The stream index to read from. The default is 0.
- sandbox_max_run_time_secs: The maximum time in seconds to run the sandbox. If None, the default limit is 30 minutes. Unused in open source.
def __init__(
    self,
    path_or_url: _Path,
    *,
    stream_index: int = 0,
    output_format: str = 'rgb',
    dtype: _DTypeLike = np.uint8,
    sandbox_max_run_time_secs: int | None = None,
):
def read(self) -> _NDArray | None:
Reads a video image frame (or None if at end of file).
Returns:
A numpy array in the format specified by `output_format`, i.e., a 3D array with 3 color channels, except for format 'gray' which is 2D.
Raises:
- RuntimeError: If there is an error reading from the output file.
def close(self) -> None:
Terminates video reader. (Called automatically at end of context.)
class VideoWriter(_VideoIO):
  """Context to write a compressed video.

  >>> shape = 480, 640
  >>> with VideoWriter('/tmp/v.mp4', shape, fps=60) as writer:
  ...   for image in moving_circle(shape, num_images=60):
  ...     writer.add_image(image)
  >>> show_video(read_video('/tmp/v.mp4'))

  Bitrate control may be specified using at most one of: `bps`, `qp`, or `crf`.
  If none are specified, `qp` is set to a default value.
  See https://slhck.info/video/2017/03/01/rate-control.html

  If codec is 'gif', the args `bps`, `qp`, `crf`, and `encoded_format` are
  ignored.

  Attributes:
    path: Output video. Its suffix (e.g. '.mp4') determines the video container
      format. The suffix must be '.gif' if the codec is 'gif'.
    shape: 2D spatial dimensions (height, width) of video image frames. The
      dimensions must be even if 'encoded_format' has subsampled chroma (e.g.,
      'yuv420p' or 'yuv420p10le').
    codec: Compression algorithm as defined by "ffmpeg -codecs" (e.g., 'h264',
      'hevc', 'vp9', or 'gif').
    metadata: Optional VideoMetadata object whose `fps` and `bps` attributes are
      used if not specified as explicit parameters.
    fps: Frames-per-second framerate (default is 60.0 except 25.0 for 'gif').
    bps: Requested average bits-per-second bitrate (default None).
    qp: Quantization parameter for video compression quality (default None).
    crf: Constant rate factor for video compression quality (default None).
    ffmpeg_args: Additional arguments for `ffmpeg` command, e.g. '-g 30' to
      introduce I-frames, or '-bf 0' to omit B-frames.
    input_format: Format of input images (default 'rgb'). If 'rgb', each image
      has shape=(height, width, 3) or (height, width). If 'yuv', each image has
      shape=(height, width, 3) with Y, U, V values. If 'gray', each image has
      shape=(height, width).
    dtype: Expected data type for input images (any float input images are
      converted to `dtype`). The default is `np.uint8`. Use of `np.uint16` is
      necessary when encoding >8 bits/channel.
    encoded_format: Pixel format as defined by `ffmpeg -pix_fmts`, e.g.,
      'yuv420p' (2x2-subsampled chroma), 'yuv444p' (full-res chroma),
      'yuv420p10le' (10-bit per channel), etc. The default (None) selects
      'yuv420p' if all shape dimensions are even, else 'yuv444p'.
    sandbox_max_run_time_secs: The maximum time in seconds to run the sandbox.
      If None, the default limit is 30 minutes. Unused in open source.
  """

  def __init__(
      self,
      path: _Path,
      shape: tuple[int, int],
      *,
      codec: str = 'h264',
      metadata: VideoMetadata | None = None,
      fps: float | None = None,
      bps: int | None = None,
      qp: int | None = None,
      crf: float | None = None,
      ffmpeg_args: str | Sequence[str] = '',
      input_format: str = 'rgb',
      dtype: _DTypeLike = np.uint8,
      encoded_format: str | None = None,
      sandbox_max_run_time_secs: int | None = None,
  ) -> None:
    _check_2d_shape(shape)
    if fps is None and metadata:
      fps = metadata.fps
    if fps is None:
      fps = 25.0 if codec == 'gif' else 60.0
    if fps <= 0.0:
      raise ValueError(f'Frame-per-second value {fps} is invalid.')
    if bps is None and metadata:
      bps = metadata.bps
    bps = int(bps) if bps is not None else None
    if bps is not None and bps <= 0:
      raise ValueError(f'Bitrate value {bps} is invalid.')
    if qp is not None and (not isinstance(qp, int) or qp < 0):
      raise ValueError(
          f'Quantization parameter {qp} cannot be negative. It must be a'
          ' non-negative integer.'
      )
    num_rate_specifications = sum(x is not None for x in (bps, qp, crf))
    if num_rate_specifications > 1:
      raise ValueError(
          f'Must specify at most one of bps, qp, or crf ({bps}, {qp}, {crf}).'
      )
    ffmpeg_args = (
        shlex.split(ffmpeg_args)
        if isinstance(ffmpeg_args, str)
        else list(ffmpeg_args)
    )
    if input_format not in {'rgb', 'yuv', 'gray'}:
      raise ValueError(f'Input format {input_format} is not rgb, yuv, or gray.')
    dtype = np.dtype(dtype)
    if dtype.type not in (np.uint8, np.uint16):
      raise ValueError(f'Type {dtype} is not np.uint8 or np.uint16.')
    self.path = pathlib.Path(path)
    self.shape = shape
    all_dimensions_are_even = all(dim % 2 == 0 for dim in shape)
    if encoded_format is None:
      encoded_format = 'yuv420p' if all_dimensions_are_even else 'yuv444p'
    if not all_dimensions_are_even and encoded_format.startswith(
        ('yuv42', 'yuvj42')
    ):
      raise ValueError(
          f'With encoded_format {encoded_format}, video dimensions must be'
          f' even, but shape is {shape}.'
      )
    self.fps = fps
    self.codec = codec
    self.bps = bps
    self.qp = qp
    self.crf = crf
    self.ffmpeg_args = ffmpeg_args
    self.input_format = input_format
    self.dtype = dtype
    self.encoded_format = encoded_format
    self.sandbox_max_run_time_secs = sandbox_max_run_time_secs
    if num_rate_specifications == 0 and not ffmpeg_args:
      qp = 20 if math.prod(self.shape) <= 640 * 480 else 28
    self._bitrate_args = (
        (['-vb', f'{bps}'] if bps is not None else [])
        + (['-qp', f'{qp}'] if qp is not None else [])
        + (['-vb', '0', '-crf', f'{crf}'] if crf is not None else [])
    )
    if self.codec == 'gif':
      if self.path.suffix != '.gif':
        raise ValueError(f"File '{self.path}' does not have a .gif suffix.")
      self.encoded_format = 'pal8'
      self._bitrate_args = []
      video_filter = 'split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse'
      # Less common (and likely less useful) is a per-frame color palette:
      # video_filter = ('split[s0][s1];[s0]palettegen=stats_mode=single[p];'
      #                 '[s1][p]paletteuse=new=1')
      self.ffmpeg_args = ['-vf', video_filter, '-f', 'gif'] + self.ffmpeg_args
    self._write_via_local_file: Any = None
    self._popen: subprocess.Popen[bytes] | None = None
    self._proc: subprocess.Popen[bytes] | None = None

  def __enter__(self) -> 'VideoWriter':
    input_pix_fmt = self._get_pix_fmt(self.dtype, self.input_format)
    try:
      self._write_via_local_file = _write_via_local_file(self.path)
      # pylint: disable-next=no-member
      tmp_name = self._write_via_local_file.__enter__()

      # Writing to stdout using ('-f', 'mp4', '-') would require
      # ('-movflags', 'frag_keyframe+empty_moov') and the result is nonportable.
      height, width = self.shape
      command = (
          [
              '-v',
              'error',
              '-f',
              'rawvideo',
              '-vcodec',
              'rawvideo',
              '-pix_fmt',
              input_pix_fmt,
              '-s',
              f'{width}x{height}',
              '-r',
              f'{self.fps}',
              '-i',
              '-',
              '-an',
              '-vcodec',
              self.codec,
              '-pix_fmt',
              self.encoded_format,
          ]
          + self._bitrate_args
          + self.ffmpeg_args
          + ['-y', tmp_name]
      )
      self._popen = _run_ffmpeg(
          command,
          stdin=subprocess.PIPE,
          stderr=subprocess.PIPE,
          allowed_output_files=[tmp_name],
          sandbox_max_run_time_secs=self.sandbox_max_run_time_secs,
      )
      self._proc = self._popen.__enter__()
    except Exception:
      self.__exit__(None, None, None)
      raise
    return self

  def __exit__(self, *_: Any) -> None:
    self.close()

  def add_image(self, image: _NDArray) -> None:
    """Writes a video frame.

    Args:
      image: Array whose dtype and first two dimensions must match the `dtype`
        and `shape` specified in `VideoWriter` initialization. If
        `input_format` is 'gray', the image must be 2D. For the 'rgb'
        input_format, the image may be either 2D (interpreted as grayscale) or
        3D with three (R, G, B) channels. For the 'yuv' input_format, the image
        must be 3D with three (Y, U, V) channels.

    Raises:
      RuntimeError: If there is an error writing to the output file.
    """
    assert self._proc, 'Error: writing to an already closed context.'
    if issubclass(image.dtype.type, (np.floating, np.bool_)):
      image = to_type(image, self.dtype)
    if image.dtype != self.dtype:
      raise ValueError(f'Image type {image.dtype} != {self.dtype}.')
    if self.input_format == 'gray':
      if image.ndim != 2:
        raise ValueError(f'Image dimensions {image.shape} are not 2D.')
    else:
      if image.ndim == 2 and self.input_format == 'rgb':
        image = np.dstack((image, image, image))
      if not (image.ndim == 3 and image.shape[2] == 3):
        raise ValueError(f'Image dimensions {image.shape} are invalid.')
    if image.shape[:2] != self.shape:
      raise ValueError(
          f'Image dimensions {image.shape[:2]} do not match'
          f' those of the initialized video {self.shape}.'
      )
    if self.input_format == 'yuv':  # Convert from per-pixel YUV to planar YUV.
      image = np.moveaxis(image, 2, 0)
    data = image.tobytes()
    stdin = self._proc.stdin
    assert stdin is not None
    if stdin.write(data) != len(data):
      self._proc.wait()
      stderr = self._proc.stderr
      assert stderr is not None
      s = stderr.read().decode('utf-8')
      raise RuntimeError(f"Error writing '{self.path}': {s}")

  def close(self) -> None:
    """Finishes writing the video. (Called automatically at end of context.)"""
    if self._popen:
      assert self._proc, 'Error: closing an already closed context.'
      stdin = self._proc.stdin
      assert stdin is not None
      stdin.close()
      if self._proc.wait():
        stderr = self._proc.stderr
        assert stderr is not None
        s = stderr.read().decode('utf-8')
        raise RuntimeError(f"Error writing '{self.path}': {s}")
      self._popen.__exit__(None, None, None)
      self._popen = None
      self._proc = None
    if self._write_via_local_file:
      # pylint: disable-next=no-member
      self._write_via_local_file.__exit__(None, None, None)
      self._write_via_local_file = None
Context to write a compressed video.
>>> shape = 480, 640
>>> with VideoWriter('/tmp/v.mp4', shape, fps=60) as writer:
... for image in moving_circle(shape, num_images=60):
... writer.add_image(image)
>>> show_video(read_video('/tmp/v.mp4'))
Bitrate control may be specified using at most one of: `bps`, `qp`, or `crf`.
If none are specified, `qp` is set to a default value.
See https://slhck.info/video/2017/03/01/rate-control.html
If codec is 'gif', the args `bps`, `qp`, `crf`, and `encoded_format` are ignored.
Attributes:
- path: Output video. Its suffix (e.g. '.mp4') determines the video container format. The suffix must be '.gif' if the codec is 'gif'.
- shape: 2D spatial dimensions (height, width) of video image frames. The dimensions must be even if 'encoded_format' has subsampled chroma (e.g., 'yuv420p' or 'yuv420p10le').
- codec: Compression algorithm as defined by "ffmpeg -codecs" (e.g., 'h264', 'hevc', 'vp9', or 'gif').
- metadata: Optional VideoMetadata object whose `fps` and `bps` attributes are used if not specified as explicit parameters.
- fps: Frames-per-second framerate (default is 60.0 except 25.0 for 'gif').
- bps: Requested average bits-per-second bitrate (default None).
- qp: Quantization parameter for video compression quality (default None).
- crf: Constant rate factor for video compression quality (default None).
- ffmpeg_args: Additional arguments for `ffmpeg` command, e.g. '-g 30' to introduce I-frames, or '-bf 0' to omit B-frames.
- input_format: Format of input images (default 'rgb'). If 'rgb', each image has shape=(height, width, 3) or (height, width). If 'yuv', each image has shape=(height, width, 3) with Y, U, V values. If 'gray', each image has shape=(height, width).
- dtype: Expected data type for input images (any float input images are converted to `dtype`). The default is `np.uint8`. Use of `np.uint16` is necessary when encoding >8 bits/channel.
- encoded_format: Pixel format as defined by `ffmpeg -pix_fmts`, e.g., 'yuv420p' (2x2-subsampled chroma), 'yuv444p' (full-res chroma), 'yuv420p10le' (10-bit per channel), etc. The default (None) selects 'yuv420p' if all shape dimensions are even, else 'yuv444p'.
- sandbox_max_run_time_secs: The maximum time in seconds to run the sandbox. If None, the default limit is 30 minutes. Unused in open source.
def __init__(
    self,
    path: _Path,
    shape: tuple[int, int],
    *,
    codec: str = 'h264',
    metadata: VideoMetadata | None = None,
    fps: float | None = None,
    bps: int | None = None,
    qp: int | None = None,
    crf: float | None = None,
    ffmpeg_args: str | Sequence[str] = '',
    input_format: str = 'rgb',
    dtype: _DTypeLike = np.uint8,
    encoded_format: str | None = None,
    sandbox_max_run_time_secs: int | None = None,
) -> None:
def add_image(self, image: _NDArray) -> None:
  """Writes a video frame.

  Args:
    image: Array whose dtype and first two dimensions must match the `dtype`
      and `shape` specified in `VideoWriter` initialization. If
      `input_format` is 'gray', the image must be 2D. For the 'rgb'
      input_format, the image may be either 2D (interpreted as grayscale) or
      3D with three (R, G, B) channels. For the 'yuv' input_format, the image
      must be 3D with three (Y, U, V) channels.

  Raises:
    RuntimeError: If there is an error writing to the output file.
  """
  assert self._proc, 'Error: writing to an already closed context.'
  if issubclass(image.dtype.type, (np.floating, np.bool_)):
    image = to_type(image, self.dtype)
  if image.dtype != self.dtype:
    raise ValueError(f'Image type {image.dtype} != {self.dtype}.')
  if self.input_format == 'gray':
    if image.ndim != 2:
      raise ValueError(f'Image dimensions {image.shape} are not 2D.')
  else:
    if image.ndim == 2 and self.input_format == 'rgb':
      image = np.dstack((image, image, image))
    if not (image.ndim == 3 and image.shape[2] == 3):
      raise ValueError(f'Image dimensions {image.shape} are invalid.')
  if image.shape[:2] != self.shape:
    raise ValueError(
        f'Image dimensions {image.shape[:2]} do not match'
        f' those of the initialized video {self.shape}.'
    )
  if self.input_format == 'yuv':  # Convert from per-pixel YUV to planar YUV.
    image = np.moveaxis(image, 2, 0)
  data = image.tobytes()
  stdin = self._proc.stdin
  assert stdin is not None
  if stdin.write(data) != len(data):
    self._proc.wait()
    stderr = self._proc.stderr
    assert stderr is not None
    s = stderr.read().decode('utf-8')
    raise RuntimeError(f"Error writing '{self.path}': {s}")
Writes a video frame.
Arguments:
- image: Array whose dtype and first two dimensions must match the `dtype` and `shape` specified in `VideoWriter` initialization. If `input_format` is 'gray', the image must be 2D. For the 'rgb' input_format, the image may be either 2D (interpreted as grayscale) or 3D with three (R, G, B) channels. For the 'yuv' input_format, the image must be 3D with three (Y, U, V) channels.
Raises:
- RuntimeError: If there is an error writing to the output file.
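The planar conversion that `add_image` applies for the 'yuv' input format can be sketched with NumPy alone: an interleaved (H, W, 3) frame is rearranged so that all Y samples come first, then all U, then all V, which is the layout that a raw-video pipe expects.

```python
import numpy as np

# Interleaved frame: each pixel holds its (Y, U, V) triple contiguously.
frame = np.arange(2 * 2 * 3, dtype=np.uint8).reshape(2, 2, 3)  # (H, W, C)

# Planar layout: move the channel axis to the front, giving one full plane
# per channel, exactly as `np.moveaxis(image, 2, 0)` does in add_image.
planar = np.moveaxis(frame, 2, 0)  # (C, H, W)

assert planar.shape == (3, 2, 2)
# The first plane collects channel 0 (Y) of every pixel.
assert (planar[0] == frame[..., 0]).all()
```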
def close(self) -> None:
  """Finishes writing the video. (Called automatically at end of context.)"""
  if self._popen:
    assert self._proc, 'Error: closing an already closed context.'
    stdin = self._proc.stdin
    assert stdin is not None
    stdin.close()
    if self._proc.wait():
      stderr = self._proc.stderr
      assert stderr is not None
      s = stderr.read().decode('utf-8')
      raise RuntimeError(f"Error writing '{self.path}': {s}")
    self._popen.__exit__(None, None, None)
    self._popen = None
    self._proc = None
  if self._write_via_local_file:
    # pylint: disable-next=no-member
    self._write_via_local_file.__exit__(None, None, None)
    self._write_via_local_file = None
Finishes writing the video. (Called automatically at end of context.)
class VideoMetadata(NamedTuple):
  """Represents the data stored in a video container header.

  Attributes:
    num_images: Number of frames expected from the video stream. This is
      estimated from the framerate and the duration stored in the video
      header, so it might be inexact. We set the value to -1 if the number
      of frames is not found in the header.
    shape: The dimensions (height, width) of each video frame.
    fps: The framerate in frames per second.
    bps: The estimated bitrate of the video stream in bits per second,
      retrieved from the video header.
  """

  num_images: int
  shape: tuple[int, int]
  fps: float
  bps: int | None
Represents the data stored in a video container header.
Attributes:
- num_images: Number of frames expected from the video stream. This is estimated from the framerate and the duration stored in the video header, so it might be inexact. We set the value to -1 if the number of frames is not found in the header.
- shape: The dimensions (height, width) of each video frame.
- fps: The framerate in frames per second.
- bps: The estimated bitrate of the video stream in bits per second, retrieved from the video header.
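A small illustration of working with these fields, using a hypothetical stand-in NamedTuple that mirrors the attributes above (the real class is returned by mediapy's video-reading functions):

```python
from typing import NamedTuple, Optional

# Hypothetical mirror of the VideoMetadata fields described above.
class VideoMetadata(NamedTuple):
  num_images: int
  shape: tuple[int, int]
  fps: float
  bps: Optional[int]

metadata = VideoMetadata(num_images=120, shape=(480, 640), fps=30.0, bps=500_000)

# Estimated duration follows from the (possibly inexact) frame count.
duration_secs = metadata.num_images / metadata.fps
assert duration_secs == 4.0
```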
def compress_image(
    image: _ArrayLike, *, fmt: str = 'png', **kwargs: Any
) -> bytes:
  """Returns a buffer containing a compressed image.

  Args:
    image: Array in a format supported by `PIL`, e.g. np.uint8 or np.uint16.
    fmt: Desired compression encoding, e.g. 'png'.
    **kwargs: Options for `PIL.save()`, e.g. `optimize=True` for greater
      compression.
  """
  image = _as_valid_media_array(image)
  with io.BytesIO() as output:
    _pil_image(image).save(output, format=fmt, **kwargs)
    return output.getvalue()
Returns a buffer containing a compressed image.
Arguments:
- image: Array in a format supported by `PIL`, e.g. np.uint8 or np.uint16.
- fmt: Desired compression encoding, e.g. 'png'.
- **kwargs: Options for `PIL.save()`, e.g. `optimize=True` for greater compression.
def decompress_image(
    data: bytes, dtype: _DTypeLike = None, apply_exif_transpose: bool = True
) -> _NDArray:
  """Returns an image from a compressed data buffer.

  Decoding is performed using `PIL`, which supports `uint8` images with 1, 3,
  or 4 channels and `uint16` images with a single channel.

  Args:
    data: Buffer containing compressed image.
    dtype: Data type of the returned array. If None, `np.uint8` or
      `np.uint16` is inferred automatically.
    apply_exif_transpose: If True, rotate image according to EXIF orientation.
  """
  pil_image: PIL.Image.Image = PIL.Image.open(io.BytesIO(data))
  if apply_exif_transpose:
    tmp_image = PIL.ImageOps.exif_transpose(pil_image)  # Future: in_place=True.
    assert tmp_image
    pil_image = tmp_image
  if dtype is None:
    dtype = np.uint16 if pil_image.mode.startswith('I') else np.uint8
  return np.array(pil_image, dtype=dtype)
Returns an image from a compressed data buffer.
Decoding is performed using `PIL`, which supports `uint8` images with 1, 3, or 4 channels and `uint16` images with a single channel.
Arguments:
- data: Buffer containing compressed image.
- dtype: Data type of the returned array. If None, `np.uint8` or `np.uint16` is inferred automatically.
- apply_exif_transpose: If True, rotate image according to EXIF orientation.
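Because PNG is lossless, the compress/decompress pair round-trips exactly. A minimal sketch of that round trip using `PIL` directly (which both functions wrap internally); the random test image is just an illustration:

```python
import io

import numpy as np
import PIL.Image

# An arbitrary uint8 RGB test image.
image = np.random.default_rng(0).integers(0, 256, (8, 8, 3), dtype=np.uint8)

# Compress: encode the array into an in-memory PNG buffer.
with io.BytesIO() as output:
  PIL.Image.fromarray(image).save(output, format='png')
  data = output.getvalue()

# Decompress: decode the buffer back into a numpy array.
decoded = np.array(PIL.Image.open(io.BytesIO(data)), dtype=np.uint8)
assert (decoded == image).all()  # PNG compression is lossless.
```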
def compress_video(
    images: Iterable[_NDArray], *, codec: str = 'h264', **kwargs: Any
) -> bytes:
  """Returns a buffer containing a compressed video.

  The video container is 'gif' for 'gif' codec, 'webm' for 'vp9' codec,
  and mp4 otherwise.

  >>> video = read_video('/tmp/river.mp4')
  >>> data = compress_video(video, bps=10_000_000)
  >>> print(len(data))

  >>> data = compress_video(moving_circle((100, 100), num_images=10), fps=10)

  Args:
    images: Iterable over video frames.
    codec: Compression algorithm as defined by `ffmpeg -codecs` (e.g.,
      'h264', 'hevc', 'vp9', or 'gif').
    **kwargs: Additional parameters for `VideoWriter`.

  Returns:
    A bytes buffer containing the compressed video.
  """
  suffix = _filename_suffix_from_codec(codec)
  with tempfile.TemporaryDirectory() as directory_name:
    tmp_path = pathlib.Path(directory_name) / f'file{suffix}'
    write_video(tmp_path, images, codec=codec, **kwargs)
    return tmp_path.read_bytes()
Returns a buffer containing a compressed video.
The video container is 'gif' for 'gif' codec, 'webm' for 'vp9' codec, and mp4 otherwise.
>>> video = read_video('/tmp/river.mp4')
>>> data = compress_video(video, bps=10_000_000)
>>> print(len(data))
>>> data = compress_video(moving_circle((100, 100), num_images=10), fps=10)
Arguments:
- images: Iterable over video frames.
- codec: Compression algorithm as defined by `ffmpeg -codecs` (e.g., 'h264', 'hevc', 'vp9', or 'gif').
- **kwargs: Additional parameters for `VideoWriter`.
Returns:
A bytes buffer containing the compressed video.
def decompress_video(data: bytes, **kwargs: Any) -> _NDArray:
  """Returns video images from an MP4-compressed data buffer."""
  with tempfile.TemporaryDirectory() as directory_name:
    tmp_path = pathlib.Path(directory_name) / 'file.mp4'
    tmp_path.write_bytes(data)
    return read_video(tmp_path, **kwargs)
Returns video images from an MP4-compressed data buffer.
def html_from_compressed_image(
    data: bytes,
    width: int,
    height: int,
    *,
    title: str | None = None,
    border: bool | str = False,
    pixelated: bool = True,
    fmt: str = 'png',
) -> str:
  """Returns an HTML string with an image tag containing encoded data.

  Args:
    data: Compressed image bytes.
    width: Width of HTML image in pixels.
    height: Height of HTML image in pixels.
    title: Optional text shown centered above image.
    border: If `bool`, whether to place a black boundary around the image,
      or if `str`, the boundary CSS style.
    pixelated: If True, sets the CSS style to 'image-rendering: pixelated;'.
    fmt: Compression encoding.
  """
  b64 = base64.b64encode(data).decode('utf-8')
  if isinstance(border, str):
    border = f'{border}; '
  elif border:
    border = 'border:1px solid black; '
  else:
    border = ''
  s_pixelated = 'pixelated' if pixelated else 'auto'
  s = (
      f'<img width="{width}" height="{height}"'
      f' style="{border}image-rendering:{s_pixelated}; object-fit:cover;"'
      f' src="data:image/{fmt};base64,{b64}"/>'
  )
  if title is not None:
    s = f"""<div style="display:flex; align-items:left;">
      <div style="display:flex; flex-direction:column; align-items:center;">
      <div>{title}</div><div>{s}</div></div></div>"""
  return s
Returns an HTML string with an image tag containing encoded data.
Arguments:
- data: Compressed image bytes.
- width: Width of HTML image in pixels.
- height: Height of HTML image in pixels.
- title: Optional text shown centered above image.
- border: If `bool`, whether to place a black boundary around the image, or if `str`, the boundary CSS style.
- pixelated: If True, sets the CSS style to 'image-rendering: pixelated;'.
- fmt: Compression encoding.
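The core of this function is to base64-encode the compressed bytes and inline them in a `data:` URI. A hedged sketch of that embedding, with a simplified helper (names here are illustrative, not mediapy's internals):

```python
import base64

def html_img(data: bytes, width: int, height: int, fmt: str = 'png') -> str:
  """Builds an <img> tag whose source is an inline base64 data URI."""
  b64 = base64.b64encode(data).decode('utf-8')
  return (
      f'<img width="{width}" height="{height}"'
      f' style="image-rendering:pixelated; object-fit:cover;"'
      f' src="data:image/{fmt};base64,{b64}"/>'
  )

# Any compressed bytes work; here we use the PNG magic prefix as a stand-in.
html = html_img(b'\x89PNG', 32, 32)
assert html.startswith('<img width="32"')
assert 'data:image/png;base64,' in html
```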
def html_from_compressed_video(
    data: bytes,
    width: int,
    height: int,
    *,
    title: str | None = None,
    border: bool | str = False,
    loop: bool = True,
    autoplay: bool = True,
) -> str:
  """Returns an HTML string with a video tag containing H264-encoded data.

  Args:
    data: MP4-compressed video bytes.
    width: Width of HTML video in pixels.
    height: Height of HTML video in pixels.
    title: Optional text shown centered above the video.
    border: If `bool`, whether to place a black boundary around the image,
      or if `str`, the boundary CSS style.
    loop: If True, the playback repeats forever.
    autoplay: If True, video playback starts without having to click.
  """
  b64 = base64.b64encode(data).decode('utf-8')
  if isinstance(border, str):
    border = f'{border}; '
  elif border:
    border = 'border:1px solid black; '
  else:
    border = ''
  options = (
      f'controls width="{width}" height="{height}"'
      f' style="{border}object-fit:cover;"'
      f'{" loop" if loop else ""}'
      f'{" autoplay muted" if autoplay else ""}'
  )
  s = f"""<video {options}>
      <source src="data:video/mp4;base64,{b64}" type="video/mp4"/>
      This browser does not support the video tag.
      </video>"""
  if title is not None:
    s = f"""<div style="display:flex; align-items:left;">
      <div style="display:flex; flex-direction:column; align-items:center;">
      <div>{title}</div><div>{s}</div></div></div>"""
  return s
Returns an HTML string with a video tag containing H264-encoded data.
Arguments:
- data: MP4-compressed video bytes.
- width: Width of HTML video in pixels.
- height: Height of HTML video in pixels.
- title: Optional text shown centered above the video.
- border: If `bool`, whether to place a black boundary around the image, or if `str`, the boundary CSS style.
- loop: If True, the playback repeats forever.
- autoplay: If True, video playback starts without having to click.
def resize_image(image: _ArrayLike, shape: tuple[int, int]) -> _NDArray:
  """Resizes image to specified spatial dimensions using a Lanczos filter.

  Args:
    image: Array-like 2D or 3D object, where dtype is uint or floating-point.
    shape: 2D spatial dimensions (height, width) of output image.

  Returns:
    A resampled image whose spatial dimensions match `shape`.
  """
  image = _as_valid_media_array(image)
  if image.ndim not in (2, 3):
    raise ValueError(f'Image shape {image.shape} is neither 2D nor 3D.')
  _check_2d_shape(shape)

  # A PIL image can be multichannel only if it has 3 or 4 uint8 channels,
  # and it can be resized only if it is uint8 or float32.
  supported_single_channel = (
      np.issubdtype(image.dtype, np.floating) or image.dtype == np.uint8
  ) and image.ndim == 2
  supported_multichannel = (
      image.dtype == np.uint8 and image.ndim == 3 and image.shape[2] in (3, 4)
  )
  if supported_single_channel or supported_multichannel:
    return np.array(
        _pil_image(image).resize(
            shape[::-1], resample=PIL.Image.Resampling.LANCZOS
        ),
        dtype=image.dtype,
    )
  if image.ndim == 2:
    # We convert to floating-point for resizing and convert back.
    return to_type(resize_image(to_float01(image), shape), image.dtype)
  # We resize each image channel individually.
  return np.dstack(
      [resize_image(channel, shape) for channel in np.moveaxis(image, -1, 0)]
  )
Resizes image to specified spatial dimensions using a Lanczos filter.
Arguments:
- image: Array-like 2D or 3D object, where dtype is uint or floating-point.
- shape: 2D spatial dimensions (height, width) of output image.
Returns:
A resampled image whose spatial dimensions match `shape`.
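When PIL cannot handle the dtype/channel combination, the code above resizes each channel separately and restacks them. A minimal sketch of that per-channel pattern, using simple 2x2 block averaging as a stand-in resampler (illustration only, not mediapy's Lanczos filter):

```python
import numpy as np

def downsample2x(channel: np.ndarray) -> np.ndarray:
  """Halves a 2D channel's resolution by averaging 2x2 pixel blocks."""
  h, w = channel.shape
  return channel.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

image = np.random.default_rng(1).random((4, 4, 3))  # float64 RGB image.

# Split into channels, resize each, and restack along the last axis,
# mirroring the np.dstack / np.moveaxis fallback in resize_image.
small = np.dstack([downsample2x(c) for c in np.moveaxis(image, -1, 0)])

assert small.shape == (2, 2, 3)
assert np.isclose(small[0, 0, 0], image[:2, :2, 0].mean())
```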
def resize_video(video: Iterable[_NDArray], shape: tuple[int, int]) -> _NDArray:
  """Resizes `video` to specified spatial dimensions using a Lanczos filter.

  Args:
    video: Iterable of images.
    shape: 2D spatial dimensions (height, width) of output video.

  Returns:
    A resampled video whose spatial dimensions match `shape`.
  """
  _check_2d_shape(shape)
  return np.array([resize_image(image, shape) for image in video])
Resizes video to specified spatial dimensions using a Lanczos filter.
Arguments:
- video: Iterable of images.
- shape: 2D spatial dimensions (height, width) of output video.
Returns:
A resampled video whose spatial dimensions match `shape`.
def to_rgb(
    array: _ArrayLike,
    *,
    vmin: float | None = None,
    vmax: float | None = None,
    cmap: str | Callable[[_ArrayLike], _NDArray] = 'gray',
) -> _NDArray:
  """Maps scalar values to RGB using value bounds and a color map.

  Args:
    array: Scalar values, with arbitrary shape.
    vmin: Explicit min value for remapping; if None, it is obtained as the
      minimum finite value of `array`.
    vmax: Explicit max value for remapping; if None, it is obtained as the
      maximum finite value of `array`.
    cmap: A `pyplot` color map or callable, to map from 1D value to 3D or 4D
      color.

  Returns:
    A new array in which each element is affinely mapped from [vmin, vmax]
    to [0.0, 1.0] and then color-mapped.
  """
  a = _as_valid_media_array(array)
  del array
  # For future numpy version 1.7.0:
  # vmin = np.amin(a, where=np.isfinite(a)) if vmin is None else vmin
  # vmax = np.amax(a, where=np.isfinite(a)) if vmax is None else vmax
  vmin = np.amin(np.where(np.isfinite(a), a, np.inf)) if vmin is None else vmin
  vmax = np.amax(np.where(np.isfinite(a), a, -np.inf)) if vmax is None else vmax
  a = (a.astype('float') - vmin) / (vmax - vmin + np.finfo(float).eps)
  if isinstance(cmap, str):
    if hasattr(matplotlib, 'colormaps'):
      rgb_from_scalar: Any = matplotlib.colormaps[cmap]  # Newer version.
    else:
      rgb_from_scalar = matplotlib.pyplot.cm.get_cmap(cmap)  # pylint: disable=no-member
  else:
    rgb_from_scalar = cmap
  a = cast(_NDArray, rgb_from_scalar(a))
  # If there is a fully opaque alpha channel, remove it.
  if a.shape[-1] == 4 and np.all(to_float01(a[..., 3]) == 1.0):
    a = a[..., :3]
  return a
Maps scalar values to RGB using value bounds and a color map.
Arguments:
- array: Scalar values, with arbitrary shape.
- vmin: Explicit min value for remapping; if None, it is obtained as the minimum finite value of `array`.
- vmax: Explicit max value for remapping; if None, it is obtained as the maximum finite value of `array`.
- cmap: A `pyplot` color map or callable, to map from 1D value to 3D or 4D color.
Returns:
A new array in which each element is affinely mapped from [vmin, vmax] to [0.0, 1.0] and then color-mapped.
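The affine remapping step can be sketched with NumPy alone: values are shifted by `vmin` and scaled by the range, with an epsilon in the denominator guarding against division by zero when `vmin == vmax`.

```python
import numpy as np

a = np.array([2.0, 4.0, 6.0])
vmin, vmax = a.min(), a.max()

# Affinely map [vmin, vmax] to [0.0, 1.0], as to_rgb does before applying
# the color map; eps avoids a zero denominator for constant arrays.
normalized = (a - vmin) / (vmax - vmin + np.finfo(float).eps)

assert np.allclose(normalized, [0.0, 0.5, 1.0])
```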
def to_type(array: _ArrayLike, dtype: _DTypeLike) -> _NDArray:
  """Returns media array converted to specified type.

  A "media array" is one in which the dtype is either a floating-point type
  (np.float32 or np.float64) or an unsigned integer type. The array values
  are assumed to lie in the range [0.0, 1.0] for floating-point values, and
  in the full range for unsigned integers, e.g. [0, 255] for np.uint8.

  Conversion between integers and floats maps uint(0) to 0.0 and uint(MAX)
  to 1.0. The input array may also be of type bool, whereby True maps to
  uint(MAX) or 1.0. The values are scaled and clamped as appropriate during
  type conversions.

  Args:
    array: Input array-like object (floating-point, unsigned int, or bool).
    dtype: Desired output type (floating-point or unsigned int).

  Returns:
    Array `a` if it is already of the specified dtype, else a converted array.
  """
  a = np.asarray(array)
  dtype = np.dtype(dtype)
  del array
  if a.dtype != bool:
    _as_valid_media_type(a.dtype)  # Verify that 'a' has a valid dtype.
  if a.dtype == bool:
    result = a.astype(dtype)
    if np.issubdtype(dtype, np.unsignedinteger):
      result = result * dtype.type(np.iinfo(dtype).max)
  elif a.dtype == dtype:
    result = a
  elif np.issubdtype(dtype, np.unsignedinteger):
    if np.issubdtype(a.dtype, np.unsignedinteger):
      src_max: float = np.iinfo(a.dtype).max
    else:
      a = np.clip(a, 0.0, 1.0)
      src_max = 1.0
    dst_max = np.iinfo(dtype).max
    if dst_max <= np.iinfo(np.uint16).max:
      scale = np.array(dst_max / src_max, dtype=np.float32)
      result = (a * scale + 0.5).astype(dtype)
    elif dst_max <= np.iinfo(np.uint32).max:
      result = (a.astype(np.float64) * (dst_max / src_max) + 0.5).astype(dtype)
    else:
      # https://stackoverflow.com/a/66306123/
      a = a.astype(np.float64) * (dst_max / src_max) + 0.5
      dst = np.atleast_1d(a)
      values_too_large = dst >= np.float64(dst_max)
      with np.errstate(invalid='ignore'):
        dst = dst.astype(dtype)
      dst[values_too_large] = dst_max
      result = dst if a.ndim > 0 else dst[0]
  else:
    assert np.issubdtype(dtype, np.floating)
    result = a.astype(dtype)
    if np.issubdtype(a.dtype, np.unsignedinteger):
      result = result / dtype.type(np.iinfo(a.dtype).max)
  return result
Returns media array converted to specified type.
A "media array" is one in which the dtype is either a floating-point type (np.float32 or np.float64) or an unsigned integer type. The array values are assumed to lie in the range [0.0, 1.0] for floating-point values, and in the full range for unsigned integers, e.g. [0, 255] for np.uint8.
Conversion between integers and floats maps uint(0) to 0.0 and uint(MAX) to 1.0. The input array may also be of type bool, whereby True maps to uint(MAX) or 1.0. The values are scaled and clamped as appropriate during type conversions.
Arguments:
- array: Input array-like object (floating-point, unsigned int, or bool).
- dtype: Desired output type (floating-point or unsigned int).
Returns:
Array `a` if it is already of the specified dtype, else a converted array.
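The core float-to-uint8 and uint8-to-float conversions can be sketched as follows: floats are clipped to [0.0, 1.0], scaled to the full uint range, and rounded by adding 0.5 before truncation; uints are scaled back down by the type's maximum.

```python
import numpy as np

def float_to_uint8(a: np.ndarray) -> np.ndarray:
  """Clip to [0, 1], scale to [0, 255], and round to nearest integer."""
  return (np.clip(a, 0.0, 1.0) * 255.0 + 0.5).astype(np.uint8)

def uint8_to_float(a: np.ndarray) -> np.ndarray:
  """Map uint8 0..255 back to floats in [0.0, 1.0]."""
  return a.astype(np.float32) / 255.0

a = np.array([0.0, 0.5, 1.0, 1.5])  # 1.5 is out of range and gets clipped.
assert (float_to_uint8(a) == [0, 128, 255, 255]).all()
assert np.allclose(uint8_to_float(np.array([0, 255], np.uint8)), [0.0, 1.0])
```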
def to_float01(a: _ArrayLike, dtype: _DTypeLike = np.float32) -> _NDArray:
  """If array has unsigned integers, rescales them to the range [0.0, 1.0].

  Scaling is such that uint(0) maps to 0.0 and uint(MAX) maps to 1.0. See
  `to_type`.

  Args:
    a: Input array.
    dtype: Desired floating-point type if rescaling occurs.

  Returns:
    A new array of dtype values in the range [0.0, 1.0] if the input array
    `a` contains unsigned integers; otherwise, array `a` is returned
    unchanged.
  """
  a = np.asarray(a)
  dtype = np.dtype(dtype)
  if not np.issubdtype(dtype, np.floating):
    raise ValueError(f'Type {dtype} is not floating-point.')
  if np.issubdtype(a.dtype, np.floating):
    return a
  return to_type(a, dtype)
If array has unsigned integers, rescales them to the range [0.0, 1.0].
Scaling is such that uint(0) maps to 0.0 and uint(MAX) maps to 1.0. See `to_type`.
Arguments:
- a: Input array.
- dtype: Desired floating-point type if rescaling occurs.
Returns:
A new array of dtype values in the range [0.0, 1.0] if the input array `a` contains unsigned integers; otherwise, array `a` is returned unchanged.
def to_uint8(a: _ArrayLike) -> _NDArray:
  """Returns array converted to uint8 values; see `to_type`."""
  return to_type(a, np.uint8)
Returns array converted to uint8 values; see to_type.
def set_output_height(num_pixels: int) -> None:
  """Overrides the height of the current output cell, if using Colab."""
  try:
    # We want to fail gracefully for non-Colab IPython notebooks.
    output = importlib.import_module('google.colab.output')
    s = f'google.colab.output.setIframeHeight("{num_pixels}px")'
    output.eval_js(s)
  except (ModuleNotFoundError, AttributeError):
    pass
Overrides the height of the current output cell, if using Colab.
def set_max_output_height(num_pixels: int) -> None:
  """Sets the maximum height of the current output cell, if using Colab."""
  try:
    # We want to fail gracefully for non-Colab IPython notebooks.
    output = importlib.import_module('google.colab.output')
    s = (
        'google.colab.output.setIframeHeight('
        f'0, true, {{maxHeight: {num_pixels}}})'
    )
    output.eval_js(s)
  except (ModuleNotFoundError, AttributeError):
    pass
Sets the maximum height of the current output cell, if using Colab.
def color_ramp(
    shape: tuple[int, int] = (64, 64), *, dtype: _DTypeLike = np.float32
) -> _NDArray:
  """Returns an image of a red-green color gradient.

  This is useful for quick experimentation and testing. See also
  `moving_circle` to generate a sample video.

  Args:
    shape: 2D spatial dimensions (height, width) of generated image.
    dtype: Type (uint or floating) of resulting pixel values.
  """
  _check_2d_shape(shape)
  dtype = _as_valid_media_type(dtype)
  yx = (np.moveaxis(np.indices(shape), 0, -1) + 0.5) / shape
  image = np.insert(yx, 2, 0.0, axis=-1)
  return to_type(image, dtype)
Returns an image of a red-green color gradient.
This is useful for quick experimentation and testing. See also `moving_circle` to generate a sample video.
Arguments:
- shape: 2D spatial dimensions (height, width) of generated image.
- dtype: Type (uint or floating) of resulting pixel values.
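The ramp construction can be sketched with NumPy alone: pixel-center (y, x) coordinates are normalized to [0, 1] and used directly as the red and green channels, with a zero blue channel appended.

```python
import numpy as np

shape = (4, 4)

# Pixel-center coordinates normalized to [0, 1]: (index + 0.5) / extent.
yx = (np.moveaxis(np.indices(shape), 0, -1) + 0.5) / shape  # (H, W, 2)

# Append a zero blue channel: red = y, green = x, blue = 0.
image = np.insert(yx, 2, 0.0, axis=-1)  # (H, W, 3)

assert image.shape == (4, 4, 3)
assert np.allclose(image[0, 0], [0.125, 0.125, 0.0])  # Top-left pixel center.
assert np.allclose(image[..., 2], 0.0)  # Blue channel is zero everywhere.
```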
def moving_circle(
    shape: tuple[int, int] = (256, 256),
    num_images: int = 10,
    *,
    dtype: _DTypeLike = np.float32,
) -> _NDArray:
  """Returns a video of a circle moving in front of a color ramp.

  This is useful for quick experimentation and testing. See also `color_ramp`
  to generate a sample image.

  >>> show_video(moving_circle((480, 640), 60), fps=60)

  Args:
    shape: 2D spatial dimensions (height, width) of generated video.
    num_images: Number of video frames.
    dtype: Type (uint or floating) of resulting pixel values.
  """
  _check_2d_shape(shape)
  dtype = np.dtype(dtype)

  def generate_image(image_index: int) -> _NDArray:
    """Returns a video frame image."""
    image = color_ramp(shape, dtype=dtype)
    yx = np.moveaxis(np.indices(shape), 0, -1)
    center = shape[0] * 0.6, shape[1] * (image_index + 0.5) / num_images
    radius_squared = (min(shape) * 0.1) ** 2
    inside = np.sum((yx - center) ** 2, axis=-1) < radius_squared
    white_circle_color = 1.0, 1.0, 1.0
    if np.issubdtype(dtype, np.unsignedinteger):
      white_circle_color = to_type([white_circle_color], dtype)[0]
    image[inside] = white_circle_color
    return image

  return np.array([generate_image(i) for i in range(num_images)])
Returns a video of a circle moving in front of a color ramp.
This is useful for quick experimentation and testing. See also `color_ramp` to generate a sample image.
>>> show_video(moving_circle((480, 640), 60), fps=60)
Arguments:
- shape: 2D spatial dimensions (height, width) of generated video.
- num_images: Number of video frames.
- dtype: Type (uint or floating) of resulting pixel values.
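The circle rasterization inside each frame can be sketched with NumPy alone: a boolean mask marks the pixels whose squared distance to the center is below a radius threshold, and those pixels are painted white.

```python
import numpy as np

shape = (32, 32)
yx = np.moveaxis(np.indices(shape), 0, -1)  # (H, W, 2) pixel coordinates.
center = (16.0, 16.0)
radius_squared = (min(shape) * 0.1) ** 2  # Radius of 3.2 pixels.

# True for pixels inside the circle (squared distance below threshold).
inside = np.sum((yx - center) ** 2, axis=-1) < radius_squared

image = np.zeros((*shape, 3), dtype=np.float32)
image[inside] = (1.0, 1.0, 1.0)  # Paint the circle white.

assert inside[16, 16] and not inside[0, 0]
assert (image[16, 16] == 1.0).all()
```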
class set_show_save_dir:  # pylint: disable=invalid-name
  """Save all titled output from `show_*()` calls into files.

  If the specified `directory` is not None, all titled images and videos
  displayed by `show_image`, `show_images`, `show_video`, and `show_videos`
  are also saved as files within the directory.

  It can be used either to set the state or as a context manager:

  >>> set_show_save_dir('/tmp')
  >>> show_image(color_ramp(), title='image1')  # Creates /tmp/image1.png.
  >>> show_video(moving_circle(), title='video2')  # Creates /tmp/video2.mp4.
  >>> set_show_save_dir(None)

  >>> with set_show_save_dir('/tmp'):
  ...   show_image(color_ramp(), title='image1')  # Creates /tmp/image1.png.
  ...   show_video(moving_circle(), title='video2')  # Creates /tmp/video2.mp4.
  """

  def __init__(self, directory: _Path | None):
    self._old_show_save_dir = _config.show_save_dir
    _config.show_save_dir = directory

  def __enter__(self) -> None:
    pass

  def __exit__(self, *_: Any) -> None:
    _config.show_save_dir = self._old_show_save_dir
Save all titled output from show_*() calls into files.
If the specified `directory` is not None, all titled images and videos displayed by `show_image`, `show_images`, `show_video`, and `show_videos` are also saved as files within the directory.
It can be used either to set the state or as a context manager:
>>> set_show_save_dir('/tmp')
>>> show_image(color_ramp(), title='image1') # Creates /tmp/image1.png.
>>> show_video(moving_circle(), title='video2') # Creates /tmp/video2.mp4.
>>> set_show_save_dir(None)
>>> with set_show_save_dir('/tmp'):
... show_image(color_ramp(), title='image1') # Creates /tmp/image1.png.
... show_video(moving_circle(), title='video2') # Creates /tmp/video2.mp4.
def set_ffmpeg(name_or_path: _Path) -> None:
  """Specifies the name or path for the `ffmpeg` external program.

  The `ffmpeg` program is required for compressing and decompressing video.
  (It is used in `read_video`, `write_video`, `show_video`, `show_videos`,
  etc.)

  Args:
    name_or_path: Either a filename within a directory of
      `os.environ['PATH']` or a filepath. The default setting is 'ffmpeg'.
  """
  _config.ffmpeg_name_or_path = name_or_path
Specifies the name or path for the ffmpeg external program.
The `ffmpeg` program is required for compressing and decompressing video. (It is used in `read_video`, `write_video`, `show_video`, `show_videos`, etc.)
Arguments:
- name_or_path: Either a filename within a directory of `os.environ['PATH']` or a filepath. The default setting is 'ffmpeg'.
def video_is_available() -> bool:
  """Returns True if the program `ffmpeg` is found.

  See also `set_ffmpeg`.
  """
  return _search_for_ffmpeg_path() is not None
Returns True if the program ffmpeg is found.
See also set_ffmpeg.
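An availability check like this can be sketched with the standard library: search the directories of `os.environ['PATH']` for the configured executable name. (`ffmpeg_is_available` is an illustrative helper, not mediapy's internal function.)

```python
import shutil

def ffmpeg_is_available(name_or_path: str = 'ffmpeg') -> bool:
  """Returns True if the given program name resolves on the PATH."""
  return shutil.which(name_or_path) is not None

result = ffmpeg_is_available()
assert isinstance(result, bool)  # True only if ffmpeg is installed on PATH.
```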