LRU1 committed on
Commit
34365ef
·
1 Parent(s): 3eb1399

fix the language problem

README.md CHANGED
@@ -11,3 +11,64 @@ short_description: Using for automatically generating notes from video lectures
 ---
 
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+ ### Project Overview
+ Lec2Note is an **automatic lecture-to-note generator**. Upload a lecture video (MP4/MKV/AVI) and receive a well-formatted Markdown study note containing:
+
+ - **ASR transcription** powered by OpenAI Whisper.
+ - **Video segmentation** using semantic & visual cues.
+ - **LLM summarisation** (e.g. GPT-4) for each segment, extracting key points, formulas and insights.
+ - **Image extraction** of key frames to illustrate the note.
+ - **Markdown assembly** into a single readable document.
+
+ ### Installation
+ ```bash
+ # Requires Python ≥ 3.10
+ git clone https://github.com/your-name/Lec2Note.git
+ cd Lec2Note
+ python -m venv .venv && source .venv/bin/activate
+ pip install -r requirements.txt
+ ```
+ For GPU inference, ensure CUDA and a matching PyTorch build are installed.
+
+ ### Quick Start
+ #### 1. Web UI
+ Navigate to `https://huggingface.co/spaces/LRU1/lec2note`.
+
+ #### 2. CLI
+ ```bash
+ python -m lec2note.scripts.run_pipeline --video path/to/lecture.mp4 --output notes.md
+ ```
+
+ #### 3. Required Environment Variables
+ ```bash
+ export OPENAI_API_KEY=your_openai_api_key
+ export REPLICATE_API_TOKEN=your_replicate_api_token
+ export LOG_LEVEL=DEBUG              # optional
+ export AUDIO2TEXT_LOCAL=true        # optional: true or false
+ ```
+
+ ### Directory Structure
+ ```text
+ Lec2Note/
+ ├── app.py               # Streamlit front-end
+ ├── lec2note/
+ │   ├── ingestion/       # Audio/video preprocessing & ASR
+ │   ├── segmentation/    # Semantic + visual segmentation
+ │   ├── processing/      # LLM summarisation & note generation
+ │   ├── synthesis/       # Markdown assembly
+ │   └── scripts/         # CLI entry points
+ └── tests/               # Test suite
+ ```
+
+ ### Environment Variables
+ Some modules require the following environment variables:
+ - `OPENAI_API_KEY`: OpenAI access token.
+ - `WHISPER_MODEL`: Whisper model name, default `base`.
+
+ ### Contributing
+ Pull requests and issues are welcome! See `DEVELOPER_GUIDE.md` for code conventions and workflow.
+
+ ### License
+ Released under the Apache-2.0 license.
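The optional environment variables documented in the README above use shell-style `true|false` values. A minimal sketch of how such flags might be parsed on the Python side (the `env_flag` helper is an illustration, not part of the repository):

```python
import os

def env_flag(name: str, default: bool = False) -> bool:
    """Interpret a true|false-style environment variable as a bool."""
    return os.environ.get(name, str(default)).strip().lower() in {"1", "true", "yes"}

# Defaults mirroring the README: WHISPER_MODEL falls back to "base".
whisper_model = os.environ.get("WHISPER_MODEL", "base")
run_asr_locally = env_flag("AUDIO2TEXT_LOCAL", default=False)
```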
app.py CHANGED
@@ -2,11 +2,16 @@ import streamlit as st
 from pathlib import Path
 import tempfile, subprocess, threading, queue
 import textwrap
+import streamlit.components.v1 as components
 
 st.set_page_config(page_title="Lec2Note2 – Lecture-to-Notes", layout="wide")
 
 st.title("📝 Lec2Note – Automatic Lecture Notes Generator")
 
+# Inject MathJax once for LaTeX rendering
+MATHJAX = "<script src='https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js'></script>"
+components.html(MATHJAX, height=0)
+
 st.markdown(
     textwrap.dedent(
         """
@@ -85,7 +90,7 @@ if run_btn and video_file:
     st.success("✅ Notes generated!")
     md_content = output_md.read_text()
     with st.container(border=True):
-        st.markdown(md_content)
+        st.markdown(md_content, unsafe_allow_html=True)
     st.download_button(
         label="💾 Download notes.md",
         data=md_content,
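The two `app.py` changes work together: the MathJax script is injected once per page, and the note body is then rendered with `unsafe_allow_html=True` so raw HTML in the generated Markdown is not stripped. A sketch of the injected snippet factored into a standalone helper (the helper name is an assumption; the CDN URL is the one from the diff):

```python
MATHJAX_CDN = "https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js"

def mathjax_script_tag(src: str = MATHJAX_CDN) -> str:
    """Build the <script> tag that app.py passes to components.html."""
    return f"<script src='{src}'></script>"

# app.py would then call: components.html(mathjax_script_tag(), height=0)
```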
lec2note/ingestion/__pycache__/whisper_runner.cpython-310.pyc CHANGED
Binary files a/lec2note/ingestion/__pycache__/whisper_runner.cpython-310.pyc and b/lec2note/ingestion/__pycache__/whisper_runner.cpython-310.pyc differ
 
lec2note/ingestion/whisper_runner.py CHANGED
@@ -17,10 +17,10 @@ __all__ = ["WhisperRunner"]
 
 
 class WhisperRunner:  # noqa: D101
-    model_name: str = "base"
+    model_name: str = "large-v3"
 
     @classmethod
-    def transcribe(cls, audio_fp: str | Path, lang: str = "zh") -> List[Dict[str, Any]]:
+    def transcribe(cls, audio_fp: str | Path, lang: str | None = None) -> List[Dict[str, Any]]:
         """Transcribe ``audio_fp`` and return list with start/end/text.
 
         Notes
@@ -32,7 +32,7 @@ class WhisperRunner:  # noqa: D101
         sub_path = audio_path.with_suffix(".json")
         if sub_path.exists():
             logger.info("[Whisper] loading existing subtitles.")
-            with open(sub_path, "r") as f:
+            with open(sub_path, "r", encoding="utf-8") as f:
                 return json.load(f)
         if not audio_path.exists():
             raise FileNotFoundError(audio_path)
@@ -92,6 +92,6 @@ class WhisperRunner:  # noqa: D101
             }
             for seg in segments
         ]
-        with open(sub_path, "w") as f:
-            json.dump(results, f, indent=2)
+        with open(sub_path, "w", encoding="utf-8") as f:
+            json.dump(results, f, ensure_ascii=False, indent=2)
         return results
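The encoding arguments above are the core of the "language problem" fix: with the defaults, `json.dump` escapes every non-ASCII character to a `\uXXXX` sequence, and opening the subtitle cache without an explicit encoding depends on the platform default. A minimal illustration of the `ensure_ascii` behaviour (the sample segment is invented):

```python
import json

# A sample non-ASCII subtitle segment, as WhisperRunner would cache it.
seg = {"start": 0.0, "end": 2.5, "text": "机器学习"}

escaped = json.dumps(seg)                       # default ensure_ascii=True: \u673a...
verbatim = json.dumps(seg, ensure_ascii=False)  # keeps the characters verbatim

# Both round-trip to the same data; only the on-disk form differs.
assert json.loads(escaped) == json.loads(verbatim) == seg
```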
lec2note/processing/__pycache__/processor.cpython-310.pyc CHANGED
Binary files a/lec2note/processing/__pycache__/processor.cpython-310.pyc and b/lec2note/processing/__pycache__/processor.cpython-310.pyc differ
 
lec2note/processing/processor.py CHANGED
@@ -69,7 +69,7 @@ class Processor:  # noqa: D101
         " - For **tables**, recreate them using Markdown table syntax.\n"
         " - For **code**, use Markdown code blocks with appropriate language identifiers.\n\n"
         "3. **Structure and Format**: Organize the notes logically. Use headings, subheadings, lists, and bold text to create a clear, readable, and well-structured document.\n\n"
-        "4. **Language**: The notes should align with the subtitles.\n\n"
+        "4. **Language**: The notes' language must match the language of the subtitles.\n\n"
         "5. **Image Mapping**: Stop referencing the images and try to use formulas, tables, code snippets, or important diagrams to describe the images.\n\n"
         "---BEGIN LECTURE MATERIALS---\n"
         f"**Subtitles (placeholders inserted)**:\n{placeholder_subs}"
lec2note/synthesis/__pycache__/assembler.cpython-310.pyc CHANGED
Binary files a/lec2note/synthesis/__pycache__/assembler.cpython-310.pyc and b/lec2note/synthesis/__pycache__/assembler.cpython-310.pyc differ
 
lec2note/synthesis/assembler.py CHANGED
@@ -52,6 +52,7 @@ class Assembler:  # noqa: D101
         "1. **De-duplicate and Consolidate:** Identify all repetitive definitions and explanations. Merge them into a single, comprehensive section for each core concept.\n"
         "2. **Synthesize and Enhance:** Where different fragments explain the same concept with slightly different examples or details (e.g., one note uses a 'cheetah' example, another uses a 'robot'), synthesize these details to create a richer, more complete explanation under a single heading.\n"
         "3. **Polish and Format:** Ensure the final text is grammatically correct, flows naturally, and uses consistent, clean Markdown formatting (e.g., for tables, code blocks, and mathematical notation).\n\n"
+        "4. **Language:** The notes' language must match the language of the subtitles.\n\n"
         "**Constraint:** Ensure all unique concepts and key details from the original notes are preserved in the final document. The goal is to lose redundancy, not information.\n\n"
         "Here are the fragmented notes to process:\n\n"
         f"{raw_md}"