[Devlog] Coordinates and scale

Adrien Foucart, PhD in biomedical engineering.

Get sparse and irregular email updates by subscribing to https://adfoucart.substack.com. This website is guaranteed 100% human-written, ad-free and tracker-free.

Part 1 - why it’s complicated

When working with digital pathology images, particularly in a context where we have to use multiple libraries, compare to different modalities, and work at different scale, it’s very easy to get lost in the coordinates systems. It’s (for me at least) an extremely common source of error in my code, where I mix my axes or screw up in some way the indexing of the region I’m trying to adress.

Let’s take a quick example to see what I mean. To open whole-slide images (WSI), I use the Openslide library:

import openslide

slide = openslide.OpenSlide("path/to/slide.ndpi")

These WSI are multi-scale, meaning that they have been scanned at different levels of magnifications in the microscope. From the OpenSlide object, we can get information about the available levels, magnification, and corresponding resolution.

max_mag = float(slide.properties[openslide.PROPERTY_NAME_OBJECTIVE_POWER])
mpp_x = float(slide.properties[openslide.PROPERTY_NAME_MPP_X])
mpp_y = float(slide.properties[openslide.PROPERTY_NAME_MPP_Y])

print(f"Maximum magnification: {max_mag}x")
print(f"Maximum resolution: {mpp_x:.2f}x{mpp_y:.2f}µm/px")
print(f"Number of levels: {slide.level_count}")
for level in range(slide.level_count):
    print(f"[{level}] {slide.level_dimensions[level]}px\t" +
          f"@{max_mag/slide.level_downsamples[level]:.2f}x\t" +
          f"{mpp_x*slide.level_downsamples[level]:.2f}x{mpp_y*slide.level_downsamples[level]:.2f}µm/px (" +
Maximum magnification: 20.0x
Maximum resolution: 0.46x0.46µm/px
Number of levels: 8
[0] (76800, 38016)px    @20.00x 0.46x0.46µm/px (downsample=1.0)
[1] (38400, 19008)px    @10.00x 0.92x0.92µm/px (downsample=2.0)
[2] (19200, 9504)px @5.00x  1.84x1.84µm/px (downsample=4.0)
[3] (9600, 4752)px  @2.50x  3.67x3.67µm/px (downsample=8.0)
[4] (4800, 2376)px  @1.25x  7.35x7.35µm/px (downsample=16.0)
[5] (2400, 1188)px  @0.62x  14.69x14.69µm/px (downsample=32.0)
[6] (1200, 594)px   @0.31x  29.38x29.38µm/px (downsample=64.0)
[7] (600, 297)px    @0.16x  58.76x58.77µm/px (downsample=128.0)

At this point, we have not yet accessed the image data. If we want to do that, we need to call the read_region method, which takes as input three parameters: the location (of the top-left corner), the level and the dimensions. Here, we get a first common source of error, which is that the location needs to be expressed in “pixels at the highest available resolutions” (i.e. level 0), while the dimension is expressed at the target level. In both cases, the convention used for the axis order is (x, y), with x being the “width” direction and y the “height”.

So, let’s first look at the whole image at the lowest level of magnification:

from matplotlib import pyplot as plt
%matplotlib inline

im_low = slide.read_region((0, 0), 7, (600, 297))

Full image at lowest available magnification (~59x59µm/px)

Now if we want to look more specifically at the tissue region on the left at a higher magnification (say, 1.25x, corresponding to “level 4” in this image), we have some computations to do. We can easily determine good coordinates and dimensions at the level that we’re currently looking at. Something like (10, 130) for the top-left corner, (120, 150) for the dimensions should work. But we need to convert those values to the right coordinates systems. For the location, it’s the highest available resolution, which is 128x higher than the one we were looking at.

So we need:

location = (10*128, 130*128)

And for the dimensions, we need to look at level 4, which is 8x more magnification than the current view, so we need:

dimensions = (120*8, 150*8)

Let’s check if our math was correct:

tissue_region = slide.read_region(location, 4, dimensions)

Tissue region @1.25x magnification (~7x7µm/px)

Perfect… but as you can see, it requires some thinking to make sure we get the correct region. And it gets worse. Because read_region returns an Image object from the PIL library, but very often when we do some actual processing (or to display it like we just did with matplotlib) we want to have access directly to the pixel data, which we will typically store in a numpy array. The conversion is easily done, but we can see something interesting when we print the shape of this array:

import numpy as np
tissue_region_array = np.array(tissue_region)
(1200, 960, 4)

The dimensions are now expressed with a different convention: the first axis is the “rows” (or height), the second the “columns” (width), and then we have 4 channels, as the image has been opened as an RGBA image, with an Alpha channel. So if we want to adress a region in this array, we have to use a (y, x) convention instead of (x, y).

That’s not all. For registration tasks, a very useful library is SimpleITK. Which has, of course, its own image format, which can cause some trouble if we’re not careful:

import SimpleITK as sitk

tissue_region_sitk = sitk.GetImageFromArray(tissue_region_array)
print(tissue_region_sitk.GetWidth(), tissue_region_sitk.GetHeight(), tissue_region_sitk.GetDepth())
(4, 960, 1200)
4 960 1200

As we can see here, the method to read an image from a numpy array interpreted the numpy dimensions as (depth, height, width), which is not ideal if we want to process the image in SimpleITK… If we want SimpleITK to give the expected results, we need to specify that the pixel values are vectors (here, RGBA vectors):

tissue_region_sitk = sitk.GetImageFromArray(tissue_region_array, isVector=True)
print(tissue_region_sitk.GetWidth(), tissue_region_sitk.GetHeight(), tissue_region_sitk.GetDepth())
(960, 1200)
960 1200 0

And we are back to the (x, y) convention, like PIL. So if we want to access the same pixel in the three representations of the tissue region, we have:

px_pil = tissue_region.getpixel((400, 300))
px_np = tissue_region_array[300, 400]
px_sitk = tissue_region_sitk.GetPixel(400, 300)
(181, 84, 151, 255)
[181  84 151 255]
(181, 84, 151, 255)

Another interesting observation is that, in sitk, we can bring back some information about the location and the “real-world” resolution of the image, informations that are completely lost in the PIL and Numpy representations. With the “Origin” and “Spacing”, we can put the image into a world representation. Let’s do it here using µm for our dimensions.

tissue_region_sitk.SetOrigin((10*128*mpp_x, 130*128*mpp_y)) # offset in µm from the top-left corner of the image
tissue_region_sitk.SetSpacing((16*mpp_x, 16*mpp_y)) # resolution in µm at the level 4 (which is 16x downsampled from the maximum)

And we can see that all these informations are contained into the sitk.Image object.

VectorImage (0000028FFFD0E2E0)
  RTTI typeinfo:   class itk::VectorImage<unsigned char,2>
  Reference Count: 1
  Modified Time: 1676
  Debug: Off
  Object Name: 
  Source: (none)
  Source output name: (none)
  Release Data: Off
  Data Released: False
  Global Release Data: Off
  PipelineMTime: 0
  UpdateMTime: 0
  RealTimeStamp: 0 seconds 
    Dimension: 2
    Index: [0, 0]
    Size: [960, 1200]
    Dimension: 2
    Index: [0, 0]
    Size: [960, 1200]
    Dimension: 2
    Index: [0, 0]
    Size: [960, 1200]
  Spacing: [7.34518, 7.34585]
  Origin: [587.614, 7639.69]
1 0
0 1

7.34518 0
0 7.34585

0.136144 0
0 0.136131

  Inverse Direction: 
1 0
0 1

  VectorLength: 4
    ImportImageContainer (0000028F825273F0)
      RTTI typeinfo:   class itk::ImportImageContainer<unsigned __int64,unsigned char>
      Reference Count: 1
      Modified Time: 1673
      Debug: Off
      Object Name: 
      Pointer: 0000028F883B1040
      Container manages memory: true
      Size: 4608000
      Capacity: 4608000

Part 2 - how can we manage it more easily?

Clearly, this needs to be managed in some way. Here’s what I’ve been doing at the moment. First, I want to make sure that I can use the coordinates system more explicitly. So I want to define a “vector” class for coordinates and dimensions representation, using dataclasses (if you want more on dataclasses and why they’re useful, I recommend ArjanCodes’ video on YouTube on the topic):

from dataclasses import dataclass
from typing import Union

class Vector2D:
    x: Union[int, float]
    y: Union[int, float]
    def xy(self):
        return self.x, self.y
    def yx(self):
        return self.y, self.x

coordinates = Vector2D(x=20, y=130)
print(coordinates.xy, coordinates.yx)
(20, 130) (130, 20)

Next, I want to make the “navigation” in the WSI more transparent. For that, I’ll wrap the OpenSlide class into a WholeSlide class. There are several “helper” functions that I want to add there:

  • Being able to find the “level” for a target magnification.
  • Being able to extract regions based on relative coordinates (so that I can easily find the same region at different scales).
class WholeSlide:
    def __init__(self, path: str):
        self.path = path
        self.slide = openslide.OpenSlide(path)
        if openslide.PROPERTY_NAME_OBJECTIVE_POWER in self.slide.properties:
            self.mag = float(self.slide.properties[openslide.PROPERTY_NAME_OBJECTIVE_POWER])
            self.mag = 1
        if openslide.PROPERTY_NAME_MPP_X in self.slide.properties \
                and openslide.PROPERTY_NAME_MPP_Y in self.slide.properties:
            self.mpp = Vector2D(x=float(self.slide.properties[openslide.PROPERTY_NAME_MPP_X]),
            self.mpp = Vector2D(x=1., y=1.)

        self.dimensions = Vector2D(x=self.slide.level_dimensions[0][0], y=self.slide.level_dimensions[0][1]) 
    def get_best_level_for_magnification(self, magnification: float):
        return self.slide.get_best_level_for_downsample(self.mag/magnification)
    def get_absolute_position(self, location: Vector2D):
        """Returns location in µm from the top-left corner"""
        return Vector2D(x=location.x*self.dimensions.x*self.mpp.x,
    def read_region(self, location: Vector2D, magnification: float, dimensions: Vector2D):
        level = self.get_best_level_for_magnification(magnification)
        dimensions_at_level = Vector2D(x=int(dimensions.x*self.slide.level_dimensions[level][0]),
        location_abs = Vector2D(x=int(location.x*self.dimensions.x), y=int(location.y*self.dimensions.y))
        # check if we need to further rescale to get to the target magnification or if we can use the level as is
        if self.slide.level_downsamples[level] == self.mag/magnification:
            return self.slide.read_region(location_abs.xy, level, dimensions_at_level.xy)
        print(f"Resizing from {self.mag/self.slide.level_downsamples[level]}x")
        region = self.slide.read_region(location_abs.xy, level, dimensions_at_level.xy)
        factor = self.slide.level_downsamples[level]/(self.mag/magnification)
        target_dimensions = Vector2D(x=int(dimensions_at_level.x*factor), y=int(dimensions_at_level.y*factor))
        return region.resize(target_dimensions.xy)

Let’s check that we can get the full image, and the same region at different scales using our relative coordinates:

wsi = WholeSlide("path/to/slide.ndpi")

Full image:

plt.imshow(wsi.read_region(location=Vector2D(0, 0), magnification=20/128, dimensions=Vector2D(x=1, y=1)))
Full image at lowest available magnification (~59x59µm/px)

Region @1.25x, 1x, 2.5x

region_location = Vector2D(x=0.015, y=0.43)
region_dimensions = Vector2D(x=0.2, y=0.5)


plt.subplot(1, 3, 1)
plt.imshow(wsi.read_region(location=region_location, magnification=1.25, dimensions=region_dimensions))
plt.subplot(1, 3, 2)
plt.imshow(wsi.read_region(location=region_location, magnification=1., dimensions=region_dimensions))
plt.subplot(1, 3, 3)
plt.imshow(wsi.read_region(location=region_location, magnification=2.5, dimensions=region_dimensions))
Loaction=Vector2D(x=528.8527750998485, y=7505.109958220467)
Resizing from 1.25x
Tissue region @1.25x, 1x and 2.5x magnification

That’s already a lot more convenient. Now we still have to deal with the multi-library situation. What I really want is to make sure that, when I extract a region, I can keep all the relevant information (offset, magnification) along the way, and I can get the image in any format I want. So for that I’m going to create a “SlideRegion” class, and move the read_region code here into a _load method. I’ll also add as_pil, as_np and as_sitk methods to get the images in the requested formats.

class SlideRegion:
    def __init__(self, *,
                 wsi: WholeSlide,
                 location: Vector2D,
                 dimensions: Vector2D,
                 magnification: float):
        self.wsi = wsi
        self.location = location
        self.dimensions = dimensions
        self.magnification = magnification
        self.pil = None

    def _load(self):
        level = self.wsi.get_best_level_for_magnification(self.magnification)
        dimensions_at_level = Vector2D(x=int(self.dimensions.x*self.wsi.slide.level_dimensions[level][0]),
        location_abs = Vector2D(x=int(self.location.x*self.wsi.dimensions.x), 
        # check if we need to further rescale to get to the target magnification or if we can use the level as is
        if self.wsi.slide.level_downsamples[level] == self.wsi.mag/self.magnification:
            self.pil = self.wsi.slide.read_region(location_abs.xy, level, dimensions_at_level.xy)
        print(f"Resizing from {self.wsi.mag/self.wsi.slide.level_downsamples[level]}x")
        region = self.wsi.slide.read_region(location_abs.xy, level, dimensions_at_level.xy)
        factor = self.wsi.slide.level_downsamples[level]/(self.wsi.mag/self.magnification)
        target_dimensions = Vector2D(x=int(dimensions_at_level.x*factor), y=int(dimensions_at_level.y*factor))
        self.pil = region.resize(target_dimensions.xy)

    def as_pil(self):
        if self.pil is None:
        return self.pil

    def as_np(self):
        if self.pil is None:
        return np.array(self.pil)

    def as_sitk(self, floating = False):
        """Floating parameters indicates if the spacing / offset should be set (if floating is False)."""
        if self.pil is None:
        image = sitk.GetImageFromArray(np.array(self.pil), isVector=True)
        if not floating:
            downsample = self.wsi.mag/self.magnification       
            spacing = Vector2D(x=float(self.wsi.mpp.x) * downsample,
                               y=float(self.wsi.mpp.y) * downsample)
            origin = self.wsi.get_absolute_position(self.location)
            image.SetSpacing((spacing.x, spacing.y))
            image.SetOrigin((origin.x, origin.y))

        return image
    def absolute_position(self):
        return self.wsi.get_absolute_position(self.location)

And we can now modify the WholeSlide class so that read_region returns a SlideRegion:

class WholeSlide:
    def read_region(self, location: Vector2D, magnification: float, dimensions: Vector2D):
        return SlideRegion(wsi=self, location=location, magnification=magnification, dimensions=dimensions)

So now we can get our region in the format that we want, while keeping the relevant information:

wsi = WholeSlide("path/to/slide.ndpi")

region_location = Vector2D(x=0.015, y=0.43)
region_dimensions = Vector2D(x=0.2, y=0.5)

region = wsi.read_region(location=region_location, dimensions=region_dimensions, magnification=1.25) 
pos = region.absolute_position()
plt.title(f"Location=(x={pos.x:.2f},y={pos.y:.2f})µm @{region.magnification}x")
Tissue region @1.25x magnification, with location information

We can check that if we get sitk images at different resolutions, we have the correct information about their origin & spacing:

region_sitk = region.as_sitk()
print(region_sitk.GetOrigin(), region_sitk.GetSpacing())
region_lowres = wsi.read_region(location=region_location, dimensions=region_dimensions, magnification=0.05).as_sitk()
print(region_lowres.GetOrigin(), region_lowres.GetSpacing())
(528.8527750998485, 7505.109958220467) (7.34517743194234, 7.345851889261283)
Resizing from 0.15625x
(528.8527750998485, 7505.109958220467) (183.62943579855852, 183.64629723153206)

This should make it a lot less likely for me to screw up in the future !