ferc_xbrl_extractor.xbrl
========================

.. py:module:: ferc_xbrl_extractor.xbrl

.. autoapi-nested-parse::

   XBRL extractor.


Classes
-------

.. autoapisummary::

   ferc_xbrl_extractor.xbrl.ExtractOutput


Functions
---------

.. autoapisummary::

   ferc_xbrl_extractor.xbrl.extract
   ferc_xbrl_extractor.xbrl.table_data_from_instances
   ferc_xbrl_extractor.xbrl.process_batch
   ferc_xbrl_extractor.xbrl.process_instance
   ferc_xbrl_extractor.xbrl.get_fact_tables


Module Contents
---------------

.. py:class:: ExtractOutput

   Bases: :py:obj:`tuple`

   .. py:attribute:: table_defs

   .. py:attribute:: table_data

   .. py:attribute:: stats


.. py:function:: extract(filings: list[pathlib.Path] | list[io.BytesIO], taxonomy_source: pathlib.Path | io.BytesIO, form_number: int, db_uri: str, datapackage_path: pathlib.Path | None = None, metadata_path: pathlib.Path | None = None, requested_tables: set[str] | None = None, instance_pattern: str = '', workers: int | None = None, batch_size: int | None = None) -> ExtractOutput

   Extract fact tables from instance documents as pandas DataFrames.

   :param filings: list of filings or zip files with filings.
   :param taxonomy_source: either a URL/path to the taxonomy or an in-memory archive of the taxonomy.
   :param form_number: the FERC form number (1, 2, 6, 60, or 714).
   :param db_uri: the location of the database this form is written to.
   :param datapackage_path: where to write a Frictionless datapackage descriptor, if at all. Defaults to None, i.e., do not write one.
   :param metadata_path: where to write XBRL metadata, if at all. Defaults to None, i.e., do not write any.
   :param requested_tables: only attempt to ingest data for these tables.
   :param instance_pattern: only ingest data for instances matching this regex. Defaults to the empty string, which matches all instances.
   :param workers: max number of workers to use.
   :param batch_size: max number of instances to parse for each worker.


.. py:function:: table_data_from_instances(instance_builders: list[ferc_xbrl_extractor.instance.InstanceBuilder], table_defs: dict[str, ferc_xbrl_extractor.datapackage.FactTable], batch_size: int | None = None, workers: int | None = None) -> tuple[dict[str, pandas.DataFrame], dict[str, list]]

   Turn FactTables into DataFrames by ingesting facts from instances.

   To handle large numbers of instances, the instances are split into batches.

   :param instance_builders: A list of InstanceBuilder objects, each of which can be parsed into an Instance representing an XBRL filing.
   :param table_defs: the tables defined in the taxonomy that facts will be matched to.
   :param batch_size: Number of filings to process before writing to DB.
   :param workers: Number of workers to create for parsing filings.


.. py:function:: process_batch(instance_builders: collections.abc.Iterable[ferc_xbrl_extractor.instance.InstanceBuilder], table_defs: dict[str, ferc_xbrl_extractor.datapackage.FactTable]) -> tuple[dict[str, pandas.DataFrame], set[str]]

   Extract data from one batch of instances.

   Splitting instances into batches significantly improves multiprocessing
   performance. This is done explicitly rather than using ProcessPoolExecutor's
   ``chunksize`` option so that the dataframes within a batch can be
   concatenated before they are returned to the parent process.

   :param instance_builders: Iterable of instance builders which can be parsed into instances.
   :param table_defs: Dictionary mapping table names to FactTable objects describing table structure.


.. py:function:: process_instance(instance: ferc_xbrl_extractor.instance.Instance, table_defs: dict[str, ferc_xbrl_extractor.datapackage.FactTable]) -> dict[str, pandas.DataFrame]

   Extract data from a single XBRL filing.

   This function uses the Instance object to parse a single XBRL filing. It
   then iterates through the requested tables and populates them with data
   extracted from the filing.

   :param instance: A single Instance object that represents an XBRL filing.
   :param table_defs: Dictionary mapping table names to FactTable objects describing table structure.


.. py:function:: get_fact_tables(taxonomy_source: pathlib.Path | io.BytesIO, form_number: int, db_uri: str, filter_tables: set[str] | None = None, datapackage_path: str | None = None, metadata_path: str | None = None) -> dict[str, ferc_xbrl_extractor.datapackage.FactTable]

   Parse taxonomy from URL.

   XBRL defines 'fact tables' that group related facts. These fact tables
   are used directly to create the tables that make up the output SQLite
   database, and to define their structure. The output of this function is a
   dictionary that maps table names to 'FactTable' objects that specify their
   structure.

   If requested, this function also writes a JSON file containing a
   Frictionless Data Package descriptor that describes the output database.

   :param taxonomy_source: Zipfile with archived taxonomies for form.
   :param form_number: FERC Form number (can be 1, 2, 6, 60, or 714).
   :param db_uri: URI of database used for constructing datapackage descriptor.
   :param filter_tables: Optionally specify the set of tables to extract. If None, all possible tables will be extracted.
   :param datapackage_path: Create a Frictionless datapackage descriptor and write it to the specified path as a JSON file. If the path is None, no datapackage descriptor will be saved.
   :param metadata_path: Path of the JSON file to write taxonomy metadata to.

   :returns: Dictionary mapping table names to FactTable objects describing table structure.
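
The batching approach described for ``process_batch`` can be illustrated with a minimal, standard-library-only sketch. It is not the extractor's implementation: every name in it (``split_into_batches``, ``parse_one``, ``process_one_batch``, ``rows_from_filings``) is hypothetical, plain lists of dicts stand in for DataFrames, and a thread pool is swapped in for the process pool so the snippet runs anywhere. The point it shows is that each worker merges the results for its whole batch before returning, so the parent combines one result per batch rather than one per filing.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def split_into_batches(items, batch_size):
    """Split items into consecutive lists of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    batches = []
    while batch := list(islice(iterator, batch_size)):
        batches.append(batch)
    return batches


def parse_one(filing):
    """Hypothetical stand-in for parsing one filing into rows per table."""
    return {"demo_table": [{"filing": filing, "value": len(filing)}]}


def process_one_batch(batch):
    """Parse every filing in a batch and merge rows per table before returning."""
    merged = defaultdict(list)
    for filing in batch:
        for table_name, rows in parse_one(filing).items():
            merged[table_name].extend(rows)
    return dict(merged)


def rows_from_filings(filings, batch_size=2, workers=2):
    """Fan batches out to workers, then merge the per-batch results."""
    batches = split_into_batches(filings, batch_size)
    combined = defaultdict(list)
    # Assumption: threads are used here for portability; the documented
    # function describes a process pool instead.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # executor.map yields results in the order the batches were submitted.
        for batch_result in executor.map(process_one_batch, batches):
            for table_name, rows in batch_result.items():
                combined[table_name].extend(rows)
    return dict(combined)


result = rows_from_filings(["a.xbrl", "bb.xbrl", "ccc.xbrl"])
```

With a process pool, merging inside the worker would mean fewer and larger objects are pickled back to the parent, which is the motivation the ``process_batch`` description gives for explicit batching.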