astrokit.datasets

This module provides several python classes to deal with large datasets including: Subaru HSC survey, DESI Legacy Imaging Surveys, NED database. More datasets will be added in the future when I use them :)

HSCRetriever

LegacySurvey

LegacySurvey provides utilities for working with local DESI Legacy Imaging Surveys data, especially DR9 North and DR10 South tractor catalogs. It helps you locate bricks, read local tractor files, collect catalogs around a source list, and download coadd images.

Data Preparation

Download the Legacy Survey data from https://www.legacysurvey.org and organize the files in a local directory. The expected directory tree is:

<DIR_LEGACYSURVEY>/
        dr9_north/
            survey-bricks-dr9-north.fits.gz
            tractor/<AAA>/tractor-<brickname>.fits
        dr10_south/
            survey-bricks-dr10-south.fits.gz
            tractor/<AAA>/tractor-<brickname>.fits

Here <AAA> is the first three characters of a brick name. For example, tractor-3375m395.fits should be placed under tractor/337/.

Initialization

from pathlib import Path
from astrokit.datasets import LegacySurvey

dir_legacysurvey = Path("/path/to/legacysurvey")
legacysurvey = LegacySurvey(dir_legacysurvey)

During initialization, LegacySurvey loads a combined brick table from DR9 North and DR10 South:

bricksinfo = legacysurvey.bricksinfo
bricksinfo.head()

The bricksinfo table contains one row per brick, with columns such as:

Column

Description

release

Data release, either dr9 or dr10

AAA

First three characters of the brick name

brickname

Legacy Survey brick name

ra, dec

Brick center coordinate in degrees

ra1, ra2, dec1, dec2

Brick boundary in degrees

The combined table is cached as legacysurvey_bricksinfo.parquet in dir_legacysurvey after it is created for the first time.

Find Bricks Around One Position

Use find_brickname() when you only need the brick containing a coordinate:

brickname_dict = legacysurvey.find_brickname(ra=337.55, dec=-39.45)
print(brickname_dict)

Use find_bricks() when you need all bricks overlapping a circular search region around one sky position:

bricks = legacysurvey.find_bricks(
    ra=337.55,
    dec=-39.45,
    search_radius=60,  # arcsec
    show=True,
)
bricks

search_radius is in arcseconds. If show=True, the method plots the target, the search circle, and the overlapping bricks.

Find Bricks Around Many Positions

For a large source list, use find_bricks_from_list(). This method first uses a KD-tree on the sphere to select candidate bricks and then performs a fast local plane circle-brick overlap check. It is much faster than looping over find_bricks() source by source and is intended for arcmin-scale search radii.

bricks = legacysurvey.find_bricks_from_list(
    ra=source_table["ra"],
    dec=source_table["dec"],
    search_radius="5 arcmin",
    max_workers=8,
    chunksize=5000,
)
bricks.head()

The returned object is a pandas.DataFrame with the same style as legacysurvey.bricksinfo, but it only keeps the bricks overlapping at least one input source search region. Duplicated bricks are removed.

search_radius accepts:

  • a number, interpreted as arcseconds, for consistency with find_bricks();

  • a string such as "5 arcmin" or "300 arcsec";

  • an astropy.units.Quantity, such as 5*u.arcmin.

If you also need to know which bricks belong to each input source, use return_per_source=True:

bricks, bricks_per_source = legacysurvey.find_bricks_from_list(
    ra=source_table["ra"],
    dec=source_table["dec"],
    search_radius="5 arcmin",
    max_workers=8,
    return_per_source=True,
)

bricksinfo_indexed = legacysurvey.bricksinfo.reset_index(drop=True)
first_source_bricks = bricksinfo_indexed.iloc[bricks_per_source[0]]
first_source_bricks[["release", "brickname"]]

Useful options:

Option

Description

max_workers

Number of parallel workers used during exact filtering. Use 1 to disable parallel execution.

chunksize

Number of input sources in each parallel task. A value like 5000 or 10000 is usually reasonable for large catalogs.

return_per_source

If True, return (bricks, bricks_per_source).

check_files

If True, add file_is_ready, indicating whether the local tractor file exists.

show_progress

If True, show a progress bar.

Locate and Read Tractor Catalogs

Use find_tractor_file() to locate the local tractor file for one brick:

path = legacysurvey.find_tractor_file(
    release="dr10",
    brickname="3375m395",
)
print(path)

Use load_tractor_catalog() to read a tractor catalog into a pandas.DataFrame:

cat = legacysurvey.load_tractor_catalog(
    brickname="3375m395",
    release="dr10",
    columns=["ra", "dec", "type", "flux_g", "flux_r", "flux_z"],
)
cat.head()

Multi-dimensional FITS columns are expanded into separate columns with 1-based suffixes, for example flux_ivar_1, flux_ivar_2, etc.

Query Catalogs Around a Source List

query_catalogs_around_source() is a higher-level workflow for source-list matching. It:

  1. writes the input source table to FITS;

  2. finds all relevant bricks;

  3. combines the local tractor files for DR9 and DR10 separately;

  4. cross-matches the input sources with the combined tractor catalogs using STILTS.

from pathlib import Path

legacysurvey.query_catalogs_around_source(
    input_cat=source_table,
    output_dir=Path("output/legacy_survey_match"),
    task_name="my_sources",
    search_radius=5,       # arcsec, passed to STILTS matching
    col_ra="ra",
    col_dec="dec",
    max_workers=16,
)

Expected outputs include:

Output

Description

<task_name>_input_catalog.fits

Input source catalog

<task_name>_bricksinfo.pkl

Bricks found for each input source

<task_name>_dr9_all.fits, <task_name>_dr10_all.fits

Combined tractor catalogs

<task_name>_dr9.fits, <task_name>_dr10.fits

Final matched catalogs

This method requires STILTS to be available through astrokit.wrapper.stilts.

Make Legacy Survey IDs

make_ls_id() creates string identifiers from release, brickid, and objid:

ls_id = legacysurvey.make_ls_id(
    release=cat["release"],
    brickid=cat["brickid"],
    objid=cat["objid"],
)

The output format is:

<release>_<brickid>_<objid>

Download Coadd Images

Use download_image() to download one coadd image file from the Legacy Survey portal:

legacysurvey.download_image(
    release="dr10",
    brickname="3375m395",
    band="r",
    dir_output=Path("images"),
)

query_NED