FeatureGenerator module

class Generators.FeatureGenerator.Feature(feature_sql_file, feature_sql_params, temporal=True)

Bases: object

A single time-series-based feature to be used for modelling.

This class wraps a SQL script that collects a single set of features (such as a time series of diagnoses) for each member of a cohort. The cohort is not specified here; it only needs to be specified just before the data is actually collected.

Parameters
  • feature_sql_file (str) – A path to a SQL file containing a script to generate the feature. This script may contain templated terms in the form {param}.

  • feature_sql_params (dict) – Keyword arguments to format any templated terms in the SQL script found at feature_sql_file.
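The templating described above can be illustrated with a minimal sketch. It assumes the {param} terms are filled in with Python's built-in str.format, which the parameter description suggests but does not confirm. The SQL text, the cdm_schema parameter name, and its value are hypothetical and not taken from this module.

```python
# Hypothetical template text; a real script would live in the file
# named by feature_sql_file.
sql_template = """
SELECT person_id, condition_start_date AS feature_start_date, concept_name
FROM {cdm_schema}.condition_occurrence
JOIN {cdm_schema}.concept ON condition_concept_id = concept_id
"""

# Hypothetical parameters, playing the role of feature_sql_params.
feature_sql_params = {"cdm_schema": "cdm"}

# Assumption: str.format replaces each {param} with the matching value.
rendered_sql = sql_template.format(**feature_sql_params)
print(rendered_sql)
```

If str.format is indeed the mechanism, any literal braces in the SQL script would need to be doubled ({{ and }}) so that they are not mistaken for templated terms.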

class Generators.FeatureGenerator.FeatureSet(db, dtcols=('feature_start_date', 'person_start_date', 'person_end_date'), id_col='person_id', time_col='feature_start_date', feature_col='concept_name')

Bases: object

A collection of features to be used for modelling.

This class contains a group of Feature objects, and coordinates gathering each of these features for a cohort of patients and formatting the collected data into an efficient format.

Parameters

db (Utils.dbutils.Database) – The database from which the features are to be extracted.

add(feature)

Add a new Feature object to the FeatureSet. Only temporal features are supported for now.

Parameters

feature (Feature) – The feature to add

Returns

None

add_default_features(default_features, schema_name=None, cohort_name=None)

build(cohort, cache_file='/tmp/store.csv', from_cached=False)

Populate the feature set using the features in self._temporal_features. This call actually collects the dataset and therefore incurs significant runtime.

Parameters
  • cohort (Generators.CohortGenerator.Cohort) – The cohort for whose members data will be collected.

  • cache_file (str) – A location to store a raw dump of the data generated by the feature SQL queries.

  • from_cached (bool) – Try to load data directly from cache_file rather than first populating it by running the SQL scripts.

Returns

None
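The interplay between cache_file and from_cached might look roughly like the following sketch. It is not the actual implementation of build(): the load_or_collect helper, the run_queries callable, and the sample rows are all hypothetical, and only the standard library is used so that the pattern can be run without a database.

```python
import csv
import os
import tempfile


def load_or_collect(cache_file, from_cached, run_queries):
    """Return rows from cache_file, refreshing the file first unless from_cached is set.

    run_queries is a hypothetical zero-argument callable standing in for the
    feature SQL scripts; it returns a list of row tuples.
    """
    if not from_cached:
        # Expensive step: run the queries and dump the raw result to disk.
        rows = run_queries()
        with open(cache_file, "w", newline="") as handle:
            csv.writer(handle).writerows(rows)
    # In both modes the data is then read back from the cache file.
    with open(cache_file, newline="") as handle:
        return [tuple(row) for row in csv.reader(handle)]


def fake_queries():
    # Stand-in for the database round trip.
    return [("1", "2020-01-01", "Hypertension"), ("2", "2020-02-01", "Asthma")]


def must_not_run():
    raise AssertionError("queries should be skipped when from_cached=True")


with tempfile.TemporaryDirectory() as tmp_dir:
    cache_path = os.path.join(tmp_dir, "store.csv")
    # First call populates the cache; second call reads it without querying.
    fresh_rows = load_or_collect(cache_path, from_cached=False, run_queries=fake_queries)
    cached_rows = load_or_collect(cache_path, from_cached=True, run_queries=must_not_run)
```

Under this reading, from_cached=True is only useful once an earlier call has written cache_file, since the file is otherwise missing or stale.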

get_feature_names()

get_num_features()

get_sparr_rep()

Get the sparse array representation (if it is populated by self.build()).

Returns

A sparse array of data if populated, else None
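To give a sense of what a sparse representation of temporal features can look like, here is a small pure-Python sketch in coordinate (COO) style. The layout is an assumption, not the documented format of the returned array: it indexes cells by (person, time, feature), mirroring the id_col, time_col and feature_col arguments of FeatureSet, and stores event counts. The sample rows and the index_of helper are hypothetical.

```python
# Hypothetical long-format rows: (person_id, feature_start_date, concept_name).
rows = [
    (101, "2020-01-01", "Hypertension"),
    (101, "2020-03-01", "Asthma"),
    (202, "2020-01-01", "Asthma"),
    (101, "2020-01-01", "Hypertension"),  # repeated event on the same day
]


def index_of(values):
    """Map each distinct value to a position, in sorted order."""
    return {value: position for position, value in enumerate(sorted(set(values)))}


person_index = index_of(row[0] for row in rows)
time_index = index_of(row[1] for row in rows)
feature_index = index_of(row[2] for row in rows)

# Coordinate-style storage: only non-zero cells are kept, keyed by position.
sparse_counts = {}
for person_id, date, concept in rows:
    key = (person_index[person_id], time_index[date], feature_index[concept])
    sparse_counts[key] = sparse_counts.get(key, 0) + 1

# Dense shape the sparse cells would occupy, and feature names by position.
shape = (len(person_index), len(time_index), len(feature_index))
feature_names = sorted(feature_index, key=feature_index.get)
```

Because most patients have no record for most concepts on most dates, keeping only the non-zero cells is far more compact than a dense array of the same shape.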