Jing

hij

AI & ML interests

None yet

authored a paper about 2 months ago

Blackbox Model Provenance via Palimpsestic Membership Inference

Paper • 2510.19796 • Published Oct 22 • 3

updated a dataset 2 months ago

hij/test_generation

Viewer • Updated Oct 27 • 860k • 9

published a dataset 2 months ago

hij/test_generation

Viewer • Updated Oct 27 • 860k • 9

updated a dataset 4 months ago

hij/sequence_samples

Viewer • Updated Sep 6 • 5.1M • 26

published a dataset 4 months ago

hij/sequence_samples

Viewer • Updated Sep 6 • 5.1M • 26

authored 3 papers 4 months ago

AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders

Paper • 2501.17148 • Published Jan 28 • 1

LLMs Encode Harmfulness and Refusal Separately

Paper • 2507.11878 • Published Jul 16 • 1

Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors

Paper • 2505.11770 • Published May 17 • 2

updated a dataset 11 months ago

hij/ravel

Viewer • Updated Feb 5 • 5.35k • 86

published a dataset 11 months ago

hij/ravel

Viewer • Updated Feb 5 • 5.35k • 86

liked a Space about 1 year ago

infini-gram

📖

117

Search and analyze n-grams in large datasets

upvoted a collection about 1 year ago

TOFU Unlearned Models

Collection

Collection of Phi TOFU models with various configurations • 17 items • Updated Oct 8, 2024 • 6

authored 4 papers over 1 year ago

Rigorously Assessing Natural Language Explanations of Neurons

Paper • 2309.10312 • Published Sep 19, 2023

A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments

Paper • 2401.12631 • Published Jan 23, 2024

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

Paper • 2403.07809 • Published Mar 12, 2024 • 1

RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations

Paper • 2402.17700 • Published Feb 27, 2024 • 2