Update README.md

README.md CHANGED
@@ -1,5 +1,5 @@
 ---
-title:
+title: DeepResearchEvaluator
 emoji: 🏆🏆🏆
 colorFrom: red
 colorTo: purple
@@ -8,14 +8,66 @@ sdk_version: 1.41.1
 app_file: app.py
 pinned: true
 license: mit
-short_description:
+short_description: Deep Research Evaluator for Long Horizon Learning Tasks
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
-
-
-
+A Deep Research Evaluator is a conceptual AI system designed to analyze and synthesize information from extensive research literature, such as arXiv papers, to learn about specific topics and generate code for long-horizon AI tasks. This involves understanding complex subjects, identifying relevant methodologies, and implementing solutions that require planning and execution over extended sequences.
+
+Key Topics and Related Papers:
+
+Long-Horizon Task Planning in Robotics:
+
+"MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language Model"
+Authors: Yike Wu, Jiatao Zhang, Nan Hu, LanLing Tang, Guilin Qi, Jun Shao, Jie Ren, Wei Song
+This paper introduces a method that decomposes complex tasks at multiple levels to enhance planning with open-source large language models.
+ARXIV
+
+"ISR-LLM: Iterative Self-Refined Large Language Model for Long-Horizon Sequential Task Planning"
+Authors: Zhehua Zhou, Jiayang Song, Kunpeng Yao, Zhan Shu, Lei Ma
+The study presents a framework that improves LLM-based planning through iterative self-refinement, enhancing the feasibility and correctness of task plans.
+ARXIV
+
+Skill-Based Reinforcement Learning:
+
+"Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks"
+Authors: Haoqi Yuan, Chi Zhang, Hongcheng Wang, Feiyang Xie, Penglin Cai, Hao Dong, Zongqing Lu
+This research builds multi-task agents in open-world environments by learning basic skills and planning over them to accomplish long-horizon tasks efficiently.
+ARXIV
+
+"SkillTree: Explainable Skill-Based Deep Reinforcement Learning for Long-Horizon Control Tasks"
+Authors: Yongyan Wen, Siyuan Li, Rongchang Zuo, Lei Yuan, Hangyu Mao, Peng Liu
+The paper proposes a framework that integrates a differentiable decision tree into the high-level policy to generate skill embeddings, improving the explainability of decision-making in complex tasks.
+ARXIV
+
+Neuro-Symbolic Approaches:
+
+"Learning for Long-Horizon Planning via Neuro-Symbolic Abductive Imitation"
+Authors: Jie-Jing Shao, Hao-Ran Hao, Xiao-Wen Yang, Yu-Feng Li
+This work combines data-driven learning with symbolic reasoning to enable long-horizon planning through abductive imitation learning.
+ARXIV
+
+"CaStL: Constraints as Specifications through LLM Translation for Long-Horizon Task and Motion Planning"
+Authors: [Authors not specified]
+The study presents a method that uses large language models to translate constraints into formal specifications, facilitating long-horizon task and motion planning.
+ARXIV
+
+Evaluation Frameworks for AI Models:
+
+"ASI: Accuracy-Stability Index for Evaluating Deep Learning Models"
+Authors: Wei Dai, Daniel Berleant
+The paper introduces the Accuracy-Stability Index (ASI), a quantitative measure that accounts for both accuracy and stability when assessing deep learning models.
+ARXIV
+
+"Benchmarks for Deep Off-Policy Evaluation"
+Authors: Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R. Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Tom Le Paine
+This research provides a collection of policies that, together with existing offline datasets, can be used to benchmark off-policy evaluation in deep learning.
+ARXIV
+
+These topics and papers contribute to AI systems that can understand research literature and apply the acquired knowledge to complex, long-horizon tasks.
+
+---
+
 
 Features:
 
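The README above describes analyzing arXiv papers; a natural first step for such an evaluator is fetching paper metadata from the arXiv API, which returns Atom XML. A minimal sketch, assuming the public `http://export.arxiv.org/api/query` endpoint — the function names here are illustrative, and the parsing works on any Atom feed, so the example below needs no network access:

```python
import urllib.parse
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # namespace used by arXiv API responses

def arxiv_query_url(terms, max_results=5):
    """Build an arXiv API query URL for the given search terms."""
    params = urllib.parse.urlencode({
        "search_query": "all:" + terms,
        "max_results": max_results,
    })
    return "http://export.arxiv.org/api/query?" + params

def parse_arxiv_feed(xml_text):
    """Extract (title, author list) pairs from an arXiv Atom feed."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.iter(ATOM + "entry"):
        # Titles may wrap across lines in the feed; collapse the whitespace.
        title = " ".join(entry.findtext(ATOM + "title", default="").split())
        authors = [a.findtext(ATOM + "name", default="").strip()
                   for a in entry.iter(ATOM + "author")]
        papers.append((title, authors))
    return papers

# Offline example using a trimmed-down feed of the shape the API returns:
sample = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>ISR-LLM: Iterative Self-Refined Large Language Model
      for Long-Horizon Sequential Task Planning</title>
    <author><name>Zhehua Zhou</name></author>
    <author><name>Jiayang Song</name></author>
  </entry>
</feed>"""
papers = parse_arxiv_feed(sample)
```

In a live Space, `urllib.request.urlopen(arxiv_query_url(...))` would supply the feed text; the summaries of the fetched papers could then feed the evaluation step.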