API Reference
This page is automatically generated from the source code docstrings.
kegger.kegg_tools
clean_entry(entry)
Standardizes and cleans raw KEGG record fields.
This internal helper takes a dictionary of raw tags and values and applies specific parsing logic based on the KEGG field type (e.g., splitting GENE identifiers from their descriptions or parsing PATHWAY_MAP strings).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `entry` | `dict` | A dictionary where keys are KEGG tags (e.g., 'GENE') and values are lists of raw string lines. | required |
Returns:
| Type | Description |
|---|---|
| `dict` | A cleaned dictionary where values are strings or lists of strings, structured for easier data analysis. |
Note
Special handling is applied to 'GENE' fields to separate gene IDs from their associated 'ORTHOLOG' identifiers.
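As a rough illustration of the GENE handling described in the note, the sketch below splits one raw GENE line into its gene ID, description, and KO identifiers. It assumes lines shaped like `<gene id>  <description> [KO:<id>]`; the helper name `split_gene_line`, the output keys, and the sample line are hypothetical and are not taken from kegger's source.

```python
import re

# Hypothetical helper, not part of kegger: split one raw GENE line into
# its gene ID, free-text description, and any KO (ortholog) identifiers.
KO_PATTERN = re.compile(r"\[KO:([^\]]+)\]")

def split_gene_line(line):
    gene_id, _, rest = line.strip().partition(" ")
    orthologs = []
    for match in KO_PATTERN.finditer(rest):
        orthologs.extend(match.group(1).split())
    # Drop every bracketed annotation (e.g. [KO:...], [EC:...]) from the description.
    description = re.sub(r"\s*\[[^\]]*\]", "", rest).strip()
    return {"gene": gene_id, "description": description, "orthologs": orthologs}

# Made-up sample line in the assumed layout.
raw = ["Shewana3_0001  dnaA; chromosomal replication initiator protein DnaA [KO:K02313]"]
cleaned = [split_gene_line(line) for line in raw]
```

The actual `clean_entry` may structure its output differently; the point is only that the ID, the description, and the ortholog reference end up in separate fields.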
Source code in src/kegger/kegg_tools.py
genes_to_pathways(org)
Retrieves the mapping between genes and their associated pathways for an organism.
Queries the KEGG 'link' endpoint to produce a many-to-many map of pathways and gene identifiers. This is useful for enrichment analysis or finding all genes within a specific biological process.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `org` | `str` | The KEGG organism code (e.g., 'shn' or 'eco'). | required |
Returns:
| Type | Description |
|---|---|
| `pd.DataFrame` | A two-column DataFrame: 'pathid', the KEGG pathway identifier (e.g., 'shn00010'), and 'gene', the specific gene identifier (e.g., 'Shewana3_0001'). |
Example
df_map = genes_to_pathways('shn')
To find all genes in a specific pathway:
glycolysis = df_map[df_map['pathid'] == 'shn00010']
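The many-to-many relationship can be sketched without a network call. The example below assumes a link-style response with one `<org>:<gene>` and `path:<pathid>` pair per tab-delimited line; the helper `parse_links` and the sample text are illustrative assumptions, and the standard library is used instead of pandas so the snippet stays self-contained.

```python
from collections import defaultdict

# Illustrative sample of a tab-delimited link response (assumed layout:
# "<org>:<gene>\tpath:<pathid>"); not real KEGG output.
sample = (
    "shn:Shewana3_0001\tpath:shn00010\n"
    "shn:Shewana3_0001\tpath:shn01100\n"
    "shn:Shewana3_0002\tpath:shn00010\n"
)

def parse_links(text):
    """Return (pathid, gene) pairs with the 'path:' and '<org>:' prefixes removed."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        gene, pathid = line.split("\t")
        pairs.append((pathid.split(":", 1)[-1], gene.split(":", 1)[-1]))
    return pairs

pairs = parse_links(sample)

# Many-to-many: one gene can sit in several pathways and vice versa.
genes_by_pathway = defaultdict(set)
for pathid, gene in pairs:
    genes_by_pathway[pathid].add(gene)
```

Grouping by pathway in this way produces the gene sets that an enrichment test would take as input.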
Source code in src/kegger/kegg_tools.py
get_org(org)
Retrieve and parse a KEGG organism genome list.
Connects to the KEGG REST API 'list' endpoint to retrieve gene-level metadata and converts the tab-delimited response into a structured DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `org` | `str` | The three- or four-letter KEGG organism identifier. | required |
Returns:
| Type | Description |
|---|---|
| `pandas.DataFrame` | DataFrame with the following columns: 'gene', the KEGG gene identifier (e.g., 'Shewana3_0001'); 'feature', the biological category (e.g., 'CDS', 'RNA'); 'position', the chromosomal coordinates; and 'annotation', the functional description or gene name. |
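To make the tab-delimited conversion concrete, here is a minimal sketch that parses a sample response into records carrying the four columns listed above. It assumes four tab-separated fields per line and an `<org>:` prefix on the gene ID; `parse_gene_list` and the sample rows are hypothetical, and the real function returns a DataFrame rather than a list of dicts.

```python
import csv
import io

COLUMNS = ["gene", "feature", "position", "annotation"]

# Illustrative sample (assumed four tab-separated fields per line); not real KEGG output.
sample = (
    "shn:Shewana3_0001\tCDS\t1..1386\tdnaA; chromosomal replication initiator protein\n"
    "shn:Shewana3_0002\tCDS\t1411..2511\tDNA polymerase III subunit beta\n"
)

def parse_gene_list(text):
    """Turn tab-delimited text into a list of dicts keyed by COLUMNS."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        record = dict(zip(COLUMNS, fields))
        record["gene"] = record["gene"].split(":", 1)[-1]  # drop the "shn:" prefix
        rows.append(record)
    return rows

rows = parse_gene_list(sample)
```

Passing `rows` to `pandas.DataFrame(rows)` would give a DataFrame with the same four columns.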
Source code in src/kegger/kegg_tools.py
initialize_kegger(cache_path=None, expire_days=30)
Sets up a persistent local cache for KEGG API requests.
This function initializes a SQLite database to store API responses. Subsequent calls to the same KEGG URL will pull data from the local cache instead of the internet, significantly speeding up data processing and reducing server load.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `cache_path` | `str \| None` | The filename or path for the SQLite cache. If None, "kegg_cache" is used (which creates 'kegg_cache.sqlite'). | `None` |
| `expire_days` | `int` | How many days a cached response remains valid before a fresh request is forced. | `30` |
Note
If a cache file already exists at the specified path, this function will automatically load and reuse it.
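The caching behaviour (reuse until expiry, then refetch) can be pictured with a small standalone sketch built on the standard-library `sqlite3` module. This is not kegger's implementation: the class `TinyCache`, its table layout, the `fake_fetch` stand-in, and the URL used as a cache key are all assumptions made for illustration.

```python
import sqlite3
import time

class TinyCache:
    """Hypothetical URL -> response cache with expiry; not kegger's implementation."""

    def __init__(self, path=":memory:", expire_days=30):
        self.expire_seconds = expire_days * 86400
        self.db = sqlite3.connect(path)
        # IF NOT EXISTS lets an existing cache file be reopened and reused.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS responses "
            "(url TEXT PRIMARY KEY, body TEXT, stored_at REAL)"
        )

    def get(self, url, fetch, now=None):
        """Return the cached body for url, calling fetch(url) only on a miss or expiry."""
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT body, stored_at FROM responses WHERE url = ?", (url,)
        ).fetchone()
        if row is not None and now - row[1] < self.expire_seconds:
            return row[0]
        body = fetch(url)
        self.db.execute(
            "INSERT OR REPLACE INTO responses VALUES (?, ?, ?)", (url, body, now)
        )
        self.db.commit()
        return body

calls = []

def fake_fetch(url):
    # Stand-in for a network request so the sketch runs offline.
    calls.append(url)
    return f"response #{len(calls)}"

url = "https://rest.kegg.jp/list/pathway/eco"  # used only as a cache key here
cache = TinyCache(expire_days=30)
first = cache.get(url, fake_fetch, now=0)            # miss: fetches
second = cache.get(url, fake_fetch, now=86400)       # one day later: served from cache
third = cache.get(url, fake_fetch, now=31 * 86400)   # past expiry: fetches again
```

Passing a file path instead of `":memory:"` makes the cache persist across sessions, which mirrors the reuse behaviour described in the note.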
Source code in src/kegger/kegg_tools.py
kegg_parser(request_text)
Parses a raw KEGG REST API response into a structured dictionary.
This function reads the flat-file format used by KEGG, identifying tags (like ENTRY, NAME, PATHWAY) and capturing their associated data. It utilizes a temporary file for memory-efficient processing of large records.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `request_text` | `str` | The raw text response from a KEGG REST API call. | required |
Returns:
| Type | Description |
|---|---|
| `dict` | A processed dictionary containing the parsed and cleaned fields of the KEGG record. |
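The tag-and-continuation structure of the flat-file format can be sketched as follows. The sketch assumes that a non-indented line opens a new tag, that indented lines continue the current tag, and that `///` ends the record. The function `parse_flat_file` and the sample record are hypothetical; unlike the real parser, this version works in memory and does no field cleaning.

```python
def parse_flat_file(text):
    """Group the lines of one flat-file record under their tags."""
    record = {}
    current = None
    for line in text.splitlines():
        if line.startswith("///"):
            break  # end-of-record marker
        if not line.strip():
            continue
        if not line[0].isspace():
            # A non-indented line opens a new tag; the rest is its first value.
            current, _, value = line.partition(" ")
        else:
            value = line  # indented line: continuation of the current tag
        if current is not None:
            record.setdefault(current, []).append(value.strip())
    return record

# Made-up record in the assumed layout.
sample = """\
ENTRY       shn00010                    Pathway
NAME        Glycolysis / Gluconeogenesis
GENE        Shewana3_0001  geneA; first example protein [KO:K00001]
            Shewana3_0002  geneB; second example protein [KO:K00002]
///
"""

parsed = parse_flat_file(sample)
```

The result maps each tag to a list of raw lines, which is the input shape that `clean_entry` is documented to accept.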
Source code in src/kegger/kegg_tools.py
list_all_pathways(org)
Retrieves a list of all KEGG pathways for a specific organism.
This function queries the KEGG REST API to find every metabolic and signaling pathway associated with the provided organism code. It automatically cleans the 'path:' prefix from the results to simplify downstream data merging.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `org` | `str` | The 3-4 letter KEGG organism code (e.g., 'eco' for E. coli, 'hsa' for humans, or 'mmu' for mouse). | required |
Returns:
| Type | Description |
|---|---|
| `pd.DataFrame` | A DataFrame with two columns: 'pathid', the unique KEGG pathway identifier (e.g., 'eco00010'), and 'description', the human-readable name of the pathway. |
Example
import kegger
pathways = kegger.list_all_pathways('eco')
print(pathways.head())
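A short sketch shows why stripping the 'path:' prefix simplifies merging. It assumes each response line has the form `path:<pathid>` followed by a tab and the description; `parse_pathway_list`, the sample text, and the gene names are made up for illustration.

```python
# Illustrative sample (assumed layout: "path:<pathid>\t<description>"); not real KEGG output.
sample = (
    "path:eco00010\tGlycolysis / Gluconeogenesis\n"
    "path:eco00020\tCitrate cycle (TCA cycle)\n"
)

def parse_pathway_list(text):
    """Return {pathid: description} with any leading 'path:' removed."""
    pathways = {}
    for line in text.splitlines():
        if "\t" not in line:
            continue
        pathid, description = line.split("\t", 1)
        if pathid.startswith("path:"):
            pathid = pathid[len("path:"):]
        pathways[pathid] = description
    return pathways

pathways = parse_pathway_list(sample)

# With the prefix gone, IDs match those used elsewhere (e.g. a gene-to-pathway table).
links = [("eco00010", "geneA"), ("eco00020", "geneB")]  # hypothetical (pathid, gene) pairs
named = [(gene, pathways[pathid]) for pathid, gene in links]
```

With DataFrames, the equivalent step would be a merge on the shared 'pathid' column.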
Source code in src/kegger/kegg_tools.py