astrokit.datasets
This module provides several python classes to deal with large datasets including: Subaru HSC survey, DESI Legacy Imaging Surveys, NED database. More datasets will be added in the future when I use them :)
HSCRetriever
LegacySurvey
LegacySurvey provides utilities for working with local DESI Legacy Imaging Surveys
data, especially DR9 North and DR10 South tractor catalogs. It helps you locate
bricks, read local tractor files, collect catalogs around a source list, and
download coadd images.
Data Preparation
Download the Legacy Survey data from https://www.legacysurvey.org and organize the files in a local directory. The expected directory tree is:
<DIR_LEGACYSURVEY>/
dr9_north/
survey-bricks-dr9-north.fits.gz
tractor/<AAA>/tractor-<brickname>.fits
dr10_south/
survey-bricks-dr10-south.fits.gz
tractor/<AAA>/tractor-<brickname>.fits
Here <AAA> is the first three characters of a brick name. For example,
tractor-3375m395.fits should be placed under tractor/337/.
Initialization
from pathlib import Path
from astrokit.datasets import LegacySurvey
dir_legacysurvey = Path("/path/to/legacysurvey")
legacysurvey = LegacySurvey(dir_legacysurvey)
During initialization, LegacySurvey loads a combined brick table from DR9 North
and DR10 South:
bricksinfo = legacysurvey.bricksinfo
bricksinfo.head()
The bricksinfo table contains one row per brick, with columns such as:
Column |
Description |
|---|---|
|
Data release, either |
|
First three characters of the brick name |
|
Legacy Survey brick name |
|
Brick center coordinate in degrees |
|
Brick boundary in degrees |
The combined table is cached as legacysurvey_bricksinfo.parquet in
dir_legacysurvey after it is created for the first time.
Find Bricks Around One Position
Use find_brickname() when you only need the brick containing a coordinate:
brickname_dict = legacysurvey.find_brickname(ra=337.55, dec=-39.45)
print(brickname_dict)
Use find_bricks() when you need all bricks overlapping a circular search region
around one sky position:
bricks = legacysurvey.find_bricks(
ra=337.55,
dec=-39.45,
search_radius=60, # arcsec
show=True,
)
bricks
search_radius is in arcseconds. If show=True, the method plots the target,
the search circle, and the overlapping bricks.
Find Bricks Around Many Positions
For a large source list, use find_bricks_from_list(). This method first uses a
KD-tree on the sphere to select candidate bricks and then performs a fast local
plane circle-brick overlap check. It is much faster than looping over
find_bricks() source by source and is intended for arcmin-scale search radii.
bricks = legacysurvey.find_bricks_from_list(
ra=source_table["ra"],
dec=source_table["dec"],
search_radius="5 arcmin",
max_workers=8,
chunksize=5000,
)
bricks.head()
The returned object is a pandas.DataFrame with the same style as
legacysurvey.bricksinfo, but it only keeps the bricks overlapping at least one
input source search region. Duplicated bricks are removed.
search_radius accepts:
a number, interpreted as arcseconds, for consistency with
find_bricks();a string such as
"5 arcmin"or"300 arcsec";an
astropy.units.Quantity, such as5*u.arcmin.
If you also need to know which bricks belong to each input source, use
return_per_source=True:
bricks, bricks_per_source = legacysurvey.find_bricks_from_list(
ra=source_table["ra"],
dec=source_table["dec"],
search_radius="5 arcmin",
max_workers=8,
return_per_source=True,
)
bricksinfo_indexed = legacysurvey.bricksinfo.reset_index(drop=True)
first_source_bricks = bricksinfo_indexed.iloc[bricks_per_source[0]]
first_source_bricks[["release", "brickname"]]
Useful options:
Option |
Description |
|---|---|
|
Number of parallel workers used during exact filtering. Use |
|
Number of input sources in each parallel task. A value like |
|
If |
|
If |
|
If |
Locate and Read Tractor Catalogs
Use find_tractor_file() to locate the local tractor file for one brick:
path = legacysurvey.find_tractor_file(
release="dr10",
brickname="3375m395",
)
print(path)
Use load_tractor_catalog() to read a tractor catalog into a
pandas.DataFrame:
cat = legacysurvey.load_tractor_catalog(
brickname="3375m395",
release="dr10",
columns=["ra", "dec", "type", "flux_g", "flux_r", "flux_z"],
)
cat.head()
Multi-dimensional FITS columns are expanded into separate columns with
1-based suffixes, for example flux_ivar_1, flux_ivar_2, etc.
Query Catalogs Around a Source List
query_catalogs_around_source() is a higher-level workflow for source-list
matching. It:
writes the input source table to FITS;
finds all relevant bricks;
combines the local tractor files for DR9 and DR10 separately;
cross-matches the input sources with the combined tractor catalogs using STILTS.
from pathlib import Path
legacysurvey.query_catalogs_around_source(
input_cat=source_table,
output_dir=Path("output/legacy_survey_match"),
task_name="my_sources",
search_radius=5, # arcsec, passed to STILTS matching
col_ra="ra",
col_dec="dec",
max_workers=16,
)
Expected outputs include:
Output |
Description |
|---|---|
|
Input source catalog |
|
Bricks found for each input source |
|
Combined tractor catalogs |
|
Final matched catalogs |
This method requires STILTS to be available through astrokit.wrapper.stilts.
Make Legacy Survey IDs
make_ls_id() creates string identifiers from release, brickid, and objid:
ls_id = legacysurvey.make_ls_id(
release=cat["release"],
brickid=cat["brickid"],
objid=cat["objid"],
)
The output format is:
<release>_<brickid>_<objid>
Download Coadd Images
Use download_image() to download one coadd image file from the Legacy Survey
portal:
legacysurvey.download_image(
release="dr10",
brickname="3375m395",
band="r",
dir_output=Path("images"),
)