FeatureGenerator module
- class Generators.FeatureGenerator.Feature(feature_sql_file, feature_sql_params, temporal=True)
Bases: object
A single time-series based feature to be used for modelling.
This class serves as a wrapper for a SQL script used to collect a single set of features (such as a time series of diagnoses) for each of the members of a cohort. Note that the cohort is not specified here, and only needs to be specified just before actually collecting data.
- Parameters
feature_sql_file (str) – A path to a SQL file containing a script to generate the feature. This script may contain templated terms in the form {param}.
feature_sql_params (dict) – Keyword arguments to format any templated terms in the SQL script found at feature_sql_file.
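As a minimal sketch of how the templated terms and feature_sql_params fit together, the example below fills a small SQL template using Python's built-in str.format. It assumes the {param} terms are substituted with str.format-style keyword formatting, which this page does not confirm. The template text, the table and column names, and the parameter names cdm_schema and start_date are all hypothetical.

```python
# Hypothetical SQL template standing in for the contents of feature_sql_file.
# The table, columns, and parameter names are illustrative only.
sql_template = """
SELECT person_id, condition_start_date AS feature_start_date
FROM {cdm_schema}.condition_occurrence
WHERE condition_start_date >= '{start_date}'
"""

# Plays the role of feature_sql_params: one entry per templated term.
feature_sql_params = {"cdm_schema": "cdm", "start_date": "2015-01-01"}

# Assumption: templated terms are filled in with str.format-style substitution.
rendered_sql = sql_template.format(**feature_sql_params)
print(rendered_sql)
```

Values substituted this way are inserted as plain text, so the parameters should come from trusted configuration rather than user input.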
- class Generators.FeatureGenerator.FeatureSet(db, dtcols=('feature_start_date', 'person_start_date', 'person_end_date'), id_col='person_id', time_col='feature_start_date', feature_col='concept_name')
Bases: object
A collection of features to be used for modelling.
This class contains a group of Feature objects, and coordinates gathering each of these features for a cohort of patients and formatting the collected data into an efficient format.
- Parameters
db (Utils.dbutils.Database) – The database from which the features are to be extracted.
- add(feature)
Add a new Feature object to the FeatureSet (only temporal features supported for now).
- Parameters
feature (Feature) – The feature to add
- Returns
None
- add_default_features(default_features, schema_name=None, cohort_name=None)
- build(cohort, cache_file='/tmp/store.csv', from_cached=False)
Populate a feature set using the features in self._temporal_features. This call actually collects the dataset and can therefore take significant time to run.
- Parameters
cohort (Generators.CohortGenerator.Cohort) – The cohort for whose members data will be collected.
cache_file (str) – A location to store a raw dump of data generated by feature SQL queries
from_cached (bool) – Try to load data directly from cache_file rather than first populating it by running SQL scripts
- Returns
None
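The interplay of cache_file and from_cached can be illustrated with a generic load-or-build pattern. This is a sketch under stated assumptions, not the implementation of build: the helper load_or_build, the stand-in fake_queries, and the CSV row layout are hypothetical, and falling back to the queries when the cache file is missing is only one reading of "try to load".

```python
import csv
import os
import tempfile


def load_or_build(cache_file, run_queries, from_cached=False):
    """Return rows from cache_file if requested and present; otherwise run
    the queries and write their raw output to cache_file."""
    if from_cached and os.path.exists(cache_file):
        with open(cache_file, newline="") as fh:
            return [tuple(row) for row in csv.reader(fh)]
    # Store everything as strings so cached and fresh rows compare equal.
    rows = [tuple(str(v) for v in row) for row in run_queries()]
    with open(cache_file, "w", newline="") as fh:
        csv.writer(fh).writerows(rows)
    return rows


def fake_queries():
    # Hypothetical stand-in for running the feature SQL scripts.
    return [(1, "2015-01-01", "diabetes"), (2, "2015-02-03", "asthma")]


cache_path = os.path.join(tempfile.mkdtemp(), "store.csv")
first = load_or_build(cache_path, fake_queries)  # runs queries, writes cache
second = load_or_build(cache_path, fake_queries, from_cached=True)  # reads cache
```

Reading from the cache skips the slow query step, which is useful when iterating on downstream modelling code against an unchanged cohort.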
- get_feature_names()
- get_num_features()
- get_sparr_rep()
Get the sparse array representation, if it has been populated by self.build().
- Returns
A sparse array of data if populated, else None
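This page does not describe the layout of the sparse array. As a rough illustration of the general idea only, the sketch below turns long-format rows keyed by the default id_col, time_col and feature_col names into a coordinate-style sparse structure. The person × time × feature axis order, the count values, and the helper index_of are assumptions made for this example, and the rows are invented.

```python
# Hypothetical long-format rows: (person_id, feature_start_date, concept_name).
rows = [
    (1, "2015-01-01", "diabetes"),
    (1, "2015-03-10", "asthma"),
    (2, "2015-01-01", "asthma"),
]


def index_of(values):
    """Map each distinct value to an integer position, in sorted order."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


person_idx = index_of(r[0] for r in rows)
time_idx = index_of(r[1] for r in rows)
feature_idx = index_of(r[2] for r in rows)

# Coordinate-style storage: only nonzero cells are kept, keyed by
# (person, time, feature) position, with an occurrence count as the value.
sparse = {}
for person, date, concept in rows:
    key = (person_idx[person], time_idx[date], feature_idx[concept])
    sparse[key] = sparse.get(key, 0) + 1

# Dense shape the sparse structure stands in for.
shape = (len(person_idx), len(time_idx), len(feature_idx))
```

Only three of the eight possible cells are stored here, and the saving grows quickly when most patients have only a few of the available features.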