---
language: en
license: mit
model_name: tbert-siamese-encoder
---

# Model Card for tbert-siamese-encoder
This repository contains the embedding model used to embed software artifacts for traceability link prediction.


## Model Details

### Model Description
This embedding model is the encoder portion of the siamese model from the cited paper. That siamese model used a relational classifier to produce similarity scores between text pairs, resembling a cross-encoder, and consistently ranked almost as high as the top performer.



- **Developed by:** Jinfeng Lin (translated by Alberto Rodriguez)
- **Model type:** RoBERTa encoder trained on automatic traceability link prediction.
- **Language(s) (NLP):** en
- **License:** mit
- **Finetuned from model [optional]:** See cited paper.

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** https://github.com/jinfenglin/TraceBERT
- **Paper:** https://arxiv.org/abs/2102.04411

## Uses
This model embeds software artifacts so that they can be compared via cosine similarity.
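
For reference, cosine similarity is the dot product of two vectors divided by the product of their norms. The sketch below is a minimal pure-Python illustration. The helper name `cosine` and the toy vectors are assumptions made for this example; they are not part of the model or its repository, and real embeddings would come from the encoder.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)

# Toy vectors standing in for artifact embeddings (illustrative only)
requirement = [1.0, 0.0, 1.0]
source_file = [1.0, 0.0, 0.0]
score = cosine(requirement, source_file)
print(round(score, 4))
```

Scores close to 1 indicate artifacts whose embeddings point in nearly the same direction, which is the signal used to propose a trace link.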

### Direct Use
Software traceability link prediction, Retrieval Augmented Generation, Artifact Clustering.

### Downstream Use [optional]
This model is intended to be used within a traceability link prediction pipeline, to retrieve software artifacts for an LLM prompt, and for clustering.
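
As a rough sketch of the retrieval use, the snippet below ranks stored artifact embeddings against a query embedding and keeps the top `k` as prompt context. Everything here is hypothetical: the artifact IDs, the toy vectors, and the helpers `normalize` and `top_k` are assumptions for illustration, and a real pipeline would obtain the vectors from the encoder.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]

def top_k(query_vec, candidates, k=2):
    """Return the k (artifact_id, score) pairs most similar to query_vec."""
    q = normalize(query_vec)
    scored = []
    for art_id, vec in candidates.items():
        # Dot product of unit vectors equals cosine similarity
        score = sum(x * y for x, y in zip(q, normalize(vec)))
        scored.append((art_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy embeddings (illustrative only; real vectors would come from the model)
artifact_vecs = {
    "REQ-1": [0.9, 0.1, 0.0],
    "REQ-2": [0.0, 1.0, 0.0],
    "REQ-3": [0.7, 0.7, 0.0],
}
query = [1.0, 0.0, 0.0]

hits = top_k(query, artifact_vecs, k=2)
# Build a simple context block to place in an LLM prompt
prompt_context = "\n".join(f"- {art_id}" for art_id, _ in hits)
print(prompt_context)
```

Normalizing once and then taking dot products keeps the ranking step cheap when the same stored embeddings are queried repeatedly.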

### Out-of-Scope Use
Although it is not the intended use, this model could provide a good set of starting weights for requirements classification.

## Bias, Risks, and Limitations
This model was trained on open-source Git data, which can be inaccurate and may lead to unexpected results.

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

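```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Assumption: the model loads with sentence-transformers. Replace the
# placeholder below with this repository's Hub ID or a local path.
model = SentenceTransformer("<hub-id-or-path>/tbert-siamese-encoder")

texts = [
    "Display Artifacts",  # parent artifact
    "A table view should be provided to display all project artifacts.",  # child 1
    "The system should be able to generate documentation for a set of artifacts.",  # child 2
]
# Encode all texts at once; returns one embedding row per text
embeddings = model.encode(texts, convert_to_tensor=False)

parent_embedding = embeddings[0:1]   # keep 2-D shape: (1, dim)
children_embeddings = embeddings[1:]  # shape: (2, dim)

# Compute cosine similarity between the parent and each child (shape: 1 x 2)
sim_matrix = cosine_similarity(parent_embedding, children_embeddings)
print(sim_matrix)
```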
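
One possible way to turn a similarity matrix like `sim_matrix` into candidate trace links is to keep the pairs whose score clears a threshold and sort them best first. This is only a sketch: the `predict_links` helper, the example scores, and the `0.5` threshold are arbitrary assumptions for illustration, not values or methods taken from the paper.

```python
def predict_links(sim_matrix, parent_ids, child_ids, threshold=0.5):
    """Return (parent, child, score) triples with score >= threshold, best first."""
    links = [
        (parent_ids[i], child_ids[j], score)
        for i, row in enumerate(sim_matrix)
        for j, score in enumerate(row)
        if score >= threshold
    ]
    return sorted(links, key=lambda link: link[2], reverse=True)

# Made-up scores with the same 1 x 2 shape as sim_matrix (illustrative only)
example_scores = [[0.82, 0.41]]
links = predict_links(example_scores, ["parent"], ["child 1", "child 2"])
print(links)
```

In practice the threshold would need to be tuned on labeled links from the target project.
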
## Training, Evaluation, and Results Details
Please see the cited paper for more information on the training method, evaluation, and results.