ferc_xbrl_extractor.cli

A command line interface (CLI) to the xbrl extractor.

Functions

parse()

Process base commands from the CLI.

write_to_sqlite(sqlite_engine, table_name, table_data)

Write one table to a SQLite database.

write_to_duckdb(duckdb_path, table_name, table_data)

Write one table to a DuckDB database.

load_extracted(extracted, sqlite_uri, duckdb_path) → None

Write extracted data to SQLite/DuckDB databases.

run_main(filings, taxonomy, output_dir, sqlite_path, ...)

Log setup, taxonomy finding, and SQL IO.

convert_duckdb_into_parquet(duckdb_path, parquet_dir)

Convert the DuckDB database into a directory of Parquet files.

convert_and_validate_datapackage_sqlite_to_parquet(datapackage_path) → dict

Convert the SQLite datapackage into one that points at Parquet files.

write_datapackage(datapackage, output_dir)

Write a datapackage to <output_dir>/datapackage.json.

main()

Parse arguments and pass to run_main.

Module Contents

ferc_xbrl_extractor.cli.parse()

Process base commands from the CLI.

ferc_xbrl_extractor.cli.write_to_sqlite(sqlite_engine: sqlalchemy.engine.Engine, table_name: str, table_data: pandas.DataFrame)

Write one table to a SQLite database.

ferc_xbrl_extractor.cli.write_to_duckdb(duckdb_path: str, table_name: str, table_data: pandas.DataFrame)

Write one table to a DuckDB database.

ferc_xbrl_extractor.cli.load_extracted(extracted: ferc_xbrl_extractor.xbrl.ExtractOutput, sqlite_uri: str, duckdb_path: str | None) → None

Write extracted data to SQLite/DuckDB databases.

ferc_xbrl_extractor.cli.run_main(filings: list[pathlib.Path] | list[io.BytesIO], taxonomy: str | pathlib.Path | io.BytesIO, output_dir: pathlib.Path, sqlite_path: pathlib.Path, duckdb_path: pathlib.Path, form_number: int, workers: int | None, batch_size: int | None, loglevel: str, logfile: pathlib.Path | None, requested_tables: list[str] | None = None, instance_pattern: str = '')

Log setup, taxonomy finding, and SQL IO.

ferc_xbrl_extractor.cli.convert_duckdb_into_parquet(duckdb_path: pathlib.Path, parquet_dir: pathlib.Path)

Convert the DuckDB database into a directory of Parquet files.

This uses COPY. We tried EXPORT DATABASE, but it sanitizes the table names, which strips the schedule numbers from them, so it can't be used here.
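As a rough illustration of the COPY approach, the sketch below only builds the SQL text for one table; it does not open a database or reproduce the function's actual code. The helper names, the example table name, and the one-file-per-table naming scheme are assumptions made for illustration. Quoting the identifier keeps the table name, schedule number included, exactly as given.

```python
from pathlib import Path


def quote_identifier(name: str) -> str:
    """Double-quote a SQL identifier, escaping any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_copy_statement(table_name: str, parquet_dir: Path) -> str:
    """Build a COPY statement that writes one table to one Parquet file.

    Hypothetical helper: the <table_name>.parquet naming is an assumption.
    """
    target = (parquet_dir / f"{table_name}.parquet").as_posix()
    escaped_target = target.replace("'", "''")  # escape quotes in the path literal
    return (
        f"COPY {quote_identifier(table_name)} "
        f"TO '{escaped_target}' (FORMAT PARQUET)"
    )


# Hypothetical table name that carries a schedule number.
statement = build_copy_statement(
    "example_schedule_110_duration", Path("parquet_out")
)
print(statement)
```

Each statement produced this way could then be executed against an open DuckDB connection, once per table.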

ferc_xbrl_extractor.cli.convert_and_validate_datapackage_sqlite_to_parquet(datapackage_path: pathlib.Path) → dict

Convert the SQLite datapackage into one that points at Parquet files.

  • point path at individual Parquet files instead of at the monolithic SQLite db

  • update format/metadata fields

  • remove irrelevant dialect field
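The three steps above can be sketched on a plain dict. This is a minimal sketch, not the function's implementation: the function name is hypothetical, the resource layout (`resources`, `name`, `path`, `format`, `dialect`) is assumed from common Data Package conventions, the `<name>.parquet` file naming is an assumption, and the validation step is left out entirely.

```python
import copy


def sqlite_resources_to_parquet(datapackage: dict) -> dict:
    """Return a copy of the descriptor whose resources point at Parquet files.

    Hypothetical helper; field names are assumptions, not the library's API.
    """
    converted = copy.deepcopy(datapackage)  # leave the caller's dict untouched
    for resource in converted.get("resources", []):
        # Point at one Parquet file per table instead of the shared SQLite db.
        resource["path"] = f"{resource['name']}.parquet"
        # Update the format field to match the new file type.
        resource["format"] = "parquet"
        # The dialect only made sense for the SQLite-backed resource.
        resource.pop("dialect", None)
    return converted


# Made-up descriptor used only to exercise the sketch.
example = {
    "name": "example_package",
    "resources": [
        {
            "name": "table_a",
            "path": "example.sqlite",
            "format": "sqlite",
            "dialect": {"table": "table_a"},
        }
    ],
}
result = sqlite_resources_to_parquet(example)
print(result["resources"][0])
```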

ferc_xbrl_extractor.cli.write_datapackage(datapackage: dict, output_dir: pathlib.Path)

Write a datapackage to <output_dir>/datapackage.json.

output_dir must exist.
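To illustrate the precondition, here is a minimal stand-in that writes a dict to <output_dir>/datapackage.json with the standard library. The function name is hypothetical, and raising FileNotFoundError for a missing directory is an assumption; the documentation only states that output_dir must exist.

```python
import json
import tempfile
from pathlib import Path


def write_datapackage_sketch(datapackage: dict, output_dir: Path) -> Path:
    """Write the descriptor to <output_dir>/datapackage.json and return the path."""
    if not output_dir.is_dir():
        # Assumed behavior: fail early rather than create the directory.
        raise FileNotFoundError(f"output_dir does not exist: {output_dir}")
    target = output_dir / "datapackage.json"
    target.write_text(json.dumps(datapackage, indent=2), encoding="utf-8")
    return target


# Use a temporary directory so the precondition holds.
with tempfile.TemporaryDirectory() as tmp:
    written = write_datapackage_sketch({"name": "example_package"}, Path(tmp))
    round_tripped = json.loads(written.read_text(encoding="utf-8"))
```

A caller that cannot guarantee the directory exists could run `output_dir.mkdir(parents=True, exist_ok=True)` first.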

ferc_xbrl_extractor.cli.main()

Parse arguments and pass to run_main.