# Lemma Graph
This project introduces the notion of a _lemma graph_ as an intermediate representation.
Effectively, this provides a kind of cache during the processing of each "chunk" of text.
Think of the end result as "enhanced tokenization" for text used to generate graph data elements.
Other projects might call this by different names:
an "evidence graph" in [#wen2023mindmap](biblio.md#wen2023mindmap),
or a "dynamically growing local KG" in [#loganlpgs19](biblio.md#loganlpgs19).
The lemma graph collects metadata from NLP parsing, entity linking, and so on, which many applications generally discard.
Therefore the lemma graph becomes rather "noisy", and in most cases would be too large to store across the analysis of a large corpus.
Instead, leveraging this intermediate form, the valuable information from each chunk (nodes, edges, properties, probabilities, etc.) gets collected and aggregated into the overall document analysis.
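
For illustration, here is a minimal sketch of this per-chunk collection step, assuming spaCy with the `en_core_web_sm` model and `networkx`; the function names and attribute schema (`build_lemma_graph`, `aggregate`, `count`, `rel`) are hypothetical, not this project's actual API:

```python
# A minimal sketch, not the project's actual implementation: collect parse
# metadata per chunk into a noisy lemma graph, then fold it into a running
# document-level graph. Assumes spaCy "en_core_web_sm" is installed.
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")

def build_lemma_graph(chunk: str) -> nx.MultiDiGraph:
    """Parse one text chunk, keeping lemmas, dependencies, and entities."""
    doc = nlp(chunk)
    graph = nx.MultiDiGraph()

    for token in doc:
        node = token.lemma_.lower()
        if graph.has_node(node):
            graph.nodes[node]["count"] += 1
        else:
            graph.add_node(node, pos=token.pos_, count=1)

    for token in doc:
        if token.dep_ != "ROOT":
            # dependency arcs become labeled edges between lemma nodes
            graph.add_edge(token.head.lemma_.lower(), token.lemma_.lower(), rel=token.dep_)

    for ent in doc.ents:
        # entity spans get linked to their member lemmas
        graph.add_node(ent.text, kind="entity", label=ent.label_)
        for token in ent:
            graph.add_edge(ent.text, token.lemma_.lower(), rel="member")

    return graph

def aggregate(doc_graph: nx.MultiDiGraph, chunk_graph: nx.MultiDiGraph) -> None:
    """Fold one chunk's lemma graph into the document-level accumulator."""
    for node, attrs in chunk_graph.nodes(data=True):
        if doc_graph.has_node(node):
            doc_graph.nodes[node]["count"] = (
                doc_graph.nodes[node].get("count", 0) + attrs.get("count", 1)
            )
        else:
            doc_graph.add_node(node, **attrs)

    doc_graph.add_edges_from(chunk_graph.edges(data=True))
```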
Consequently, this project explores the use of topological transforms on graphs to enhance representations for [_graph levels of detail_](https://blog.derwen.ai/graph-levels-of-detail-ea4226abba55), i.e., being able to understand a graph at varying levels of abstraction.
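
As one hedged example of such a transform, the following sketch contracts Louvain communities into single nodes to produce a coarser view of a lemma graph; it assumes `networkx` 3.x and is not tied to any particular level-of-detail scheme used by this project:

```python
# A rough sketch of one coarsening step for "graph levels of detail":
# each detected community collapses into a single node, giving a more
# abstract view of the lemma graph. Assumes networkx >= 3.0.
import networkx as nx

def coarsen(graph: nx.Graph, seed: int = 42) -> nx.Graph:
    """Return a coarser graph where each Louvain community becomes one node."""
    communities = nx.community.louvain_communities(graph, seed=seed)

    # map every original node to the id of its community
    membership = {
        node: idx
        for idx, community in enumerate(communities)
        for node in community
    }

    coarse = nx.Graph()

    for idx, community in enumerate(communities):
        coarse.add_node(idx, size=len(community), members=sorted(map(str, community)))

    for u, v in graph.edges():
        cu, cv = membership[u], membership[v]
        if cu != cv:
            # accumulate the number of inter-community edges as a weight
            weight = coarse.edges[cu, cv]["weight"] + 1 if coarse.has_edge(cu, cv) else 1
            coarse.add_edge(cu, cv, weight=weight)

    return coarse
```

Applying `coarsen` repeatedly yields successively more abstract views; a `MultiDiGraph` from the earlier sketch could first be collapsed with `nx.Graph(doc_graph.to_undirected())`.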
Note that adjacent areas of interest include emerging work on:

- _graph of relations_
- _foundation models for KGs_
Means for "bootstrapping" a _lemma graph_ with initial semantic relations allow for "sampling" from a curated KG to enhance the graph algorithms used, e.g., through _semantic random walks_, which can incorporate heterogeneous sources and relatively large-scale external KGs.
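
A hypothetical sketch of that biased sampling step follows; the `source="kg"` edge tag and the bias factor are assumptions made for illustration, not details drawn from this project:

```python
# A hypothetical sketch of a "semantic random walk": walks over the lemma
# graph prefer edges sampled from a curated external KG (tagged here with
# source="kg", an assumed attribute), so external knowledge shapes which
# nodes co-occur in walk sequences.
import random
import networkx as nx

def semantic_random_walk(graph: nx.MultiDiGraph, start, length: int = 10,
                         kg_bias: float = 3.0, rng: random.Random | None = None) -> list:
    """Return one walk, weighting KG-derived edges kg_bias times more heavily."""
    rng = rng or random.Random()
    walk = [start]
    node = start

    for _ in range(length - 1):
        out_edges = list(graph.out_edges(node, data=True))
        if not out_edges:
            break
        weights = [
            kg_bias if data.get("source") == "kg" else 1.0
            for _, _, data in out_edges
        ]
        _, node, _ = rng.choices(out_edges, weights=weights, k=1)[0]
        walk.append(node)

    return walk
```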
This mechanism also creates opportunities for distributed processing, because the "chunks" of text can follow a _task parallel_ pattern, accumulating the extracted results from each lemma graph into a graph database.
Augmenting a KG iteratively over time follows a similar pattern.
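
A minimal sketch of that task-parallel pattern, reusing the hypothetical `build_lemma_graph` and `aggregate` helpers from above and standing in a `networkx` accumulator for an actual graph database:

```python
# A sketch of the task-parallel pattern: chunks are parsed independently
# in worker processes, and each resulting lemma graph gets folded into a
# single accumulator (in practice, upserts into a graph database).
from concurrent.futures import ProcessPoolExecutor
import networkx as nx

def analyze_document(chunks: list, max_workers: int = 4) -> nx.MultiDiGraph:
    """Process text chunks in parallel, aggregating their lemma graphs."""
    doc_graph = nx.MultiDiGraph()

    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # chunk tasks are independent; aggregation stays in the parent process
        for chunk_graph in pool.map(build_lemma_graph, chunks):
            aggregate(doc_graph, chunk_graph)

    return doc_graph
```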