**INTRODUCTION**

It is our team's pleasure to work with you and to offer our latest cutting-edge Large Language Model (LLM), Enteli-49B, for your business needs.
This collaboration marks a significant step in utilizing advanced Natural Language Processing (NLP) to enhance your business operations.

This Hugging Face repository is divided into five sections: **Model Architecture**, **Model Usage**, **Immediate Integration**, **Deployment** and **Future Work**.
**Please check out our demo for the model: https://huggingface.co/spaces/arhanovich/Enteli-49B_Demo**

**Key Features of Enteli-49B:**

- **SOTA Performance**: As can be discerned from the benchmarks, Enteli-49B significantly outperforms other language models such as GPT-3.5.
Our LLM excels at understanding and generating human-like text, with advanced reasoning, coding and math abilities.

- **Customization and Scalability**: Tailor the model to your specific industry needs, ensuring relevance and efficiency in a plethora of tasks.

- **Computational Efficiency**: Despite its high performance, our LLM has a relatively small parameter count and requires less compute for inference.

- **Seamless Integration**: Easy integration with your existing systems and workflows.

**Choosing HuggingFace for Delivery and Demonstration:**

Our choice of HuggingFace as the platform for demonstrating and delivering our LLM to you is strategic and deliberate. HuggingFace is well-known for its robust,
user-friendly, and versatile environment. The platform not only simplifies the integration and deployment of advanced AI models but also ensures that you stay at the
forefront of AI technology with continuous updates and community support. Prominent firms in the field of AI, such as Google, Meta, OpenAI and Microsoft, take advantage of
HuggingFace for sharing LLMs safely and easily.

**MODEL ARCHITECTURE**

It is widely accepted that successful LLMs like GPT-4 have been trained using a method called Mixture of Experts (MoE), owing to its strong performance and higher
efficiency. Thus, we at EnteliMind trained Enteli-49B using the Mixture of Experts algorithm.

When it comes to improving the quality of machine learning models, scale is key. Given a fixed compute budget, training a larger model for fewer steps is better
than training a smaller model for more steps. An intriguing approach to achieving better scale with limited computational resources is the Mixture of Experts (MoE) model. This method allows larger models or datasets to be pre-trained with the same compute budget as traditional dense models, but significantly faster. Instead of training a single monolithic language model, whose training would be a "black box" with no visibility into its domain-specific abilities, expert models can be trained separately, with each expert dedicated to a single ability.

At its core, a MoE model comprises two primary components:

**Sparse MoE Layers**: These replace the usual dense feed-forward network (FFN) layers. A MoE layer consists of several "experts", each being a separate neural network.
Typically, these experts are FFNs themselves, but they can also be more intricate, even forming hierarchical structures.

**Gate Network/Router:** This component directs specific tokens to specific experts. For instance, one token might be routed to one expert while another goes
to a different one. The routing process is critical in MoE models and is based on learned parameters that are pre-trained alongside the network.

![MoE Layer](https://cdn.discordapp.com/attachments/1016188040978370650/1205151002278232154/image.png?ex=65d75355&is=65c4de55&hm=af4ad6ee9923e999cf3fbd7f6cdc23afdfd093accee59aa1d77d8edab1507daa& "MoE Layer")

**Gating Network Mechanics:**

The gating network's function is to efficiently distribute input across various experts. It's mathematically defined as:

![Gating Network Mechanics](https://cdn.discordapp.com/attachments/1016188040978370650/1205151493603332216/image.png?ex=65d753ca&is=65c4deca&hm=72690eaa46fa311e912a8e7a666424b084ca8377db2f85fb9ce13de3eae693e9& "Gating Network Mechanics")

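For reference, in the standard MoE literature this gating function is a softmax over a learned weight matrix $W_g$:

$$G_{\sigma}(x) = \mathrm{Softmax}(x \cdot W_g)$$
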
**Sparsity and Conditional Computation:**

Sparsity in MoE models is about conditional computation: activating only parts of the network for specific inputs. This approach enables scaling up the
model size without a proportional increase in computation. This is mathematically represented as:

![Sparsity and Conditional Computation](https://media.discordapp.net/attachments/1016188040978370650/1205152780776513588/image.png?ex=65d754fd&is=65c4dffd&hm=ce342d802d162d2617c2d7e56f96cde31f997707eeed5fab4bded77a5396b169&=&format=webp&quality=lossless&width=558&height=158 "Sparsity and Conditional Computation")

Where y is the output, G(x) is the gating function, E_i(x) is the operation of the i-th expert, and n is the number of experts.

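Written out, this is the conditional-computation sum, with the terms as defined above:

$$y = \sum_{i=1}^{n} G(x)_i \, E_i(x)$$

Whenever $G(x)_i = 0$ for an expert, $E_i(x)$ need not be evaluated at all, which is where the computational savings come from.
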
**Innovative Gating and Load Balancing:**

Beyond traditional gating, techniques like Noisy Top-k Gating add noise to the gating process and keep only the top k values. While
introducing some complexity, this method aids in faster training and inference by activating fewer experts. Additionally, the noise helps with load balancing, ensuring an equitable
distribution of tokens among experts and preventing any single expert from becoming a bottleneck. Here is its mathematical representation:

![Innovative Gating](https://media.discordapp.net/attachments/1016188040978370650/1205153528146956288/image.png?ex=65d755af&is=65c4e0af&hm=2fc8c56b6ae066b65124d1cde5177466f58379321a7d0763f3e9b87cd788bf10&=&format=webp&quality=lossless&width=1384&height=684 "Innovative Gating")

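For reference, Noisy Top-k Gating as formulated in the MoE literature (Shazeer et al., 2017) reads:

$$H(x)_i = (x \cdot W_g)_i + \mathrm{StandardNormal}() \cdot \mathrm{Softplus}\big((x \cdot W_{noise})_i\big)$$

$$\mathrm{KeepTopK}(v, k)_i = \begin{cases} v_i & \text{if } v_i \text{ is among the top } k \text{ elements of } v \\ -\infty & \text{otherwise} \end{cases}$$

$$G(x) = \mathrm{Softmax}\big(\mathrm{KeepTopK}(H(x), k)\big)$$

To make the routing mechanics concrete, here is a minimal PyTorch sketch of a top-k-routed sparse MoE layer. It is a generic illustration of the scheme described above (noise term omitted for brevity), not Enteli-49B's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Generic sparse MoE layer with top-k softmax routing (illustrative only)."""

    def __init__(self, dim: int, hidden: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        # Each expert is a small feed-forward network, as described above
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(dim, n_experts, bias=False)  # W_g
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        logits = self.gate(x)                              # H(x), noise omitted
        topk_vals, topk_idx = logits.topk(self.k, dim=-1)  # KeepTopK
        weights = F.softmax(topk_vals, dim=-1)             # G(x) over the kept experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e              # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```
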
**Our Research Findings:**

We have simplified our entire model architecture to the transformers library's Mixture of Experts implementation, known as "MixtralForCausalLM". This allows for easy integration
with Hugging Face and the transformers library, which will certainly facilitate future work such as Supervised Fine-tuning.

However, it is best to acknowledge that the difference between the original implementation and the simplified version is quite small, and we would like to share our
extra research findings from training Enteli-49B.

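As a quick sanity check, the exposed architecture can be inspected from the model configuration. This is an illustrative snippet; `auth_token` stands in for your Hugging Face access token:

```python
from transformers import AutoConfig

auth_token = "There goes the auth token"  # your Hugging Face access token

# Per the text above, the repository exposes the model through the
# MixtralForCausalLM architecture of the transformers library.
config = AutoConfig.from_pretrained("arhanovich/Enteli-49B", use_auth_token=auth_token)
print(config.architectures)  # expected: ['MixtralForCausalLM']
```
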
**1-) Exponential Mean Absolute Deviation Normalization (EMADNorm):**

Enteli-49B incorporates EMADNorm to normalize the data: each element is divided by an exponential factor that depends on the dataset's mean absolute deviation (MAD).
The MAD and EMADNorm are defined as:

![EMADNorm](https://media.discordapp.net/attachments/1016188040978370650/1205154287857041418/image.png?ex=65d75664&is=65c4e164&hm=477e6af7478bb36d6eb999af3fd6442c54102ccb313cbda1b9ce31a27b57c855&=&format=webp&quality=lossless&width=490&height=160 "EMAD Norm")

Where N is the number of elements, x_i is each individual element, μ is the mean of all elements, and e is the base of the natural logarithm.

EMADNorm focuses on the spread of the data by considering the mean absolute deviation. This is particularly beneficial for datasets where dispersion is an
important feature and needs to be emphasized or normalized differently from the mean. By using an exponential function of the MAD, EMADNorm adapts the degree of
normalization to the characteristics of the dataset. This adaptability can be crucial for datasets with varying levels of volatility or dispersion. Moreover, by normalizing the input data effectively, EMADNorm can contribute to more stable and efficient model training. It ensures that the scale of the inputs does not adversely affect the learning process, which can be critical for the convergence and performance of deep learning models.

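A minimal PyTorch sketch of EMADNorm, read directly off the definition above. The exact placement inside Enteli-49B (per-layer versus per-batch) is not specified here, so the sketch simply normalizes a whole tensor:

```python
import torch

def emad_norm(x: torch.Tensor) -> torch.Tensor:
    # MAD = (1/N) * sum(|x_i - mu|), with mu the mean over all elements
    mad = (x - x.mean()).abs().mean()
    # Divide each element by an exponential factor of the MAD: x_i / e^MAD
    return x / torch.exp(mad)

# Example: a widely dispersed tensor is scaled down more aggressively
x = torch.tensor([10.0, -20.0, 30.0, -40.0])
print(emad_norm(x))
```
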
**2-) CurveLu Activation Function**

The feed-forward network in Enteli-49B utilizes the CurveLu activation function, a blend of ReLU and Tanh that allows sensitivity to both positive and negative inputs.
The network can be represented as:

![CurveLu Activation Function](https://media.discordapp.net/attachments/1016188040978370650/1205154792293277787/image.png?ex=65d756dd&is=65c4e1dd&hm=9c8115ea86e746da19ee05860e3aead63b2f4be13c395c446b950bba21fe0405&=&format=webp&quality=lossless&width=798&height=1440 "CurveLu Activation Function")

where the CurveLu activation function is defined as:
![CurveLu Activation Function](https://media.discordapp.net/attachments/1016188040978370650/1205155203079348284/image.png?ex=65d7573f&is=65c4e23f&hm=8057acfc15ae0863bbeb93ba7a47cddde13a9678ee01c0584ff36f3e6fe727e7&=&format=webp&quality=lossless&width=608&height=112 "CurveLu Activation Function")

and _k_ is a hyper-parameter that dictates the steepness of the tanh component; alternatively, it can simply be set to the constant 1.

This novel activation function is both smooth and more forgiving to positive values, as can be discerned from its graph.

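The exact closed form is given in the image above. As an illustration only, here is one piecewise reading that is consistent with the description (a ReLU-like, hence "forgiving", response on positive inputs and a bounded tanh response on negative ones). Treat this as an assumption, not the verified definition:

```python
import torch

def curvelu(x: torch.Tensor, k: float = 1.0) -> torch.Tensor:
    # Illustrative reading of CurveLu: identity (ReLU-like) for positive
    # inputs, tanh(k*x) for non-positive inputs. With k = 1 the two pieces
    # join smoothly at zero (equal value and slope).
    return torch.where(x > 0, x, torch.tanh(k * x))
```
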
**More Details:**

Enteli-49B is pre-trained on data extracted from the open Web, with experts and routers trained simultaneously, on over **1.9 trillion tokens.**

# Benchmarks

|                          | Enteli-49B (EnteliMind) | GPT 3.5 (OpenAI) | LLaMa 70B (Meta AI) |
|--------------------------|-------------------------|------------------|--------------------|
| MMLU                     | 73.6%                   | 70%              | 69.9%              |
| HellaSwag (10-shot)      | 90.6%                   | 85.5%            | 87.1%              |
| ARC Challenge (25-shot)  | 87.9%                   | 85.2%            | 85.1%              |
| WinoGrande (5-shot)      | 83.2%                   | 81.6%            | 83.2%              |
| GSM-8K (5-shot)          | 61.1%                   | 57.1%            | 53.6%              |

These benchmarks indicate that our model **outperforms** models like GPT-3.5 and LLaMa 2 70B despite having fewer parameters.

**Model Usage**

Our model can be easily used with the transformers Python library.

The chat template, which must be strictly followed, is as follows:
```
<s> [INST] There goes the prompt [/INST] There goes the answer</s> [INST] Follow-up prompt [/INST]
```

- `<s>` is the BOS (beginning-of-sequence) token
- `</s>` is the EOS (end-of-sequence) token

Here is example code for using the model in Python on a GPU:

158
+ #pip install transformers accelerate bitsandbytes
159
+ import torch
160
+ from transformers import AutoTokenizer, AutoModelForCausalLM
161
+
162
+ model_name = "arhanovich/Enteli-49B"
163
+ auth_token = "There goes the auth token" #Since this a private model, you must use that auth token to access the model and the tokenizer
164
+
165
+ tokenizer = AutoTokenizer.from_pretrained(model_name, use_default_system_prompt=False, use_auth_token=auth_token)
166
+ model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32, device_map='auto',local_files_only=False, load_in_4bit=True, use_auth_token=auth_token)
167
+
168
+ prompt = input("Query: ")
169
+ full_prompt = f"<s>[INST] You are a helpful AI called Enteli trained by the AI company EnteliMind.[/INST]\nUser: {prompt}\nAssistant:"
170
+ input_ids = tokenizer(full_prompt, return_tensors="pt").input_ids.to("cuda")
171
+ generation_output = model.generate(
172
+ input_ids=input_ids, max_new_tokens=500)
173
+ answer = str(tokenizer.decode(generation_output[0], skip_special_tokens=True)).replace(full_prompt, "")
174
+ print(f"Answer: {answer}")
175
+ ```
176
+
177
+ **Important Notes:**
178
+
179
+ - This chat template must be strictly used
180
+ - In this code **torch.float32** has been used however, alternatively, torch.float16 could also be used which can lead to faster computations and lower memory usage
181
+ but at the cost of precision.
182
+ - In this code model has been loaded with **4-bit** which refers to a form of model quantization where the weights of a neural network are represented
183
+ using only 4 bits per weight. Quantization reduces the model size and can speed up inference. However, for the sake of precision, it can be replaced with
184
+ for example 32 bit which would require more memory and hardware like GPU accelerator.
185
+ - Other parameters of the model.generate() such as temperature, top_p, top_k or max_new_tokens can also be altered upon request
186
+
187
+
188
+
189
+
190
+ **Immediate Integration**
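For example, the half-precision alternative mentioned above would change only the loading call (a sketch; all other lines stay as in the example above):

```python
# Half precision instead of 4-bit quantization: higher numerical precision
# than 4-bit, lower memory than float32, but more VRAM than the 4-bit load.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    use_auth_token=auth_token,
)
```
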
191
+
192
+ In the dynamic landscape of artificial intelligence, the fusion of Enteli-49B with external functions heralds a groundbreaking era of innovation and utility.
193
+ This integration is not just an advancement; it's a revolution, poised to redefine the boundaries of technology and human interaction.
194
+
195
+ To exemplify, here are some potential use cases of the combination of Enteli-49B with external functions:
196
+
197
+ - Combining it with a calculator function to enable it carry out flawless calculations
198
+ - Combining it with a web browser or a search engine, making it aware of the current data
199
+ - Combining it with complex financial calculation tools like market analysis or investment portfolio.
200
+
201
+ Thus, any API or function in a coding environment can be integrated with Enteli-49B. Things get very interesting when you combine multiple
202
+ Enteli-49B with each one having its tools, enabling it to carry out complex tasks that humans are not able to perform efficiently. This can be potentailly
203
+ be the dawn of a new form of intelligence.
204
+
205
+ We, as EnteliMind team, have written two examplar scripts that will be a starting-point of that journey:
206
+
207
+ **Example1: Integration with functions of Single Paramter**
208
+
209
+ In the first example script, we are combining Enteli-49B with a **calculator tool** and a **webbrowser**.
210
+ Here is the code:
211
+
212
+
213
+ ```python
214
+ pip install transformers accelerate bitsandbytes duckduckgo_search
215
+
216
+ import torch
217
+ import transformers
218
+
219
+ model_name = "arhanovich/Enteli-49B"
220
+
221
+ auth_token = "There goes the auth token" #Since this a private model, you must use that auth token to access the model and the tokenizer
222
+
223
+ tokenizer = transformers.AutoTokenizer.from_pretrained(model_name, use_default_system_prompt=False, use_auth_token=auth_token)
224
+ model = transformers.AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32, device_map='auto',local_files_only=False, load_in_4bit=True, use_auth_token=auth_token)
225
+
226
+ generate_text = transformers.pipeline(
227
+ model=model, tokenizer=tokenizer,
228
+ return_full_text=False,
229
+ task="text-generation",
230
+ temperature=0.1, # 'randomness' of outputs, 0.0 is the min and 1.0 the max
231
+ top_p=0.15, # select from top tokens whose probability add up to 15%
232
+ top_k=0, # select from top 0 tokens (because zero, relies on top_p)
233
+ max_new_tokens=512, # max number of tokens to generate in the output
234
+ repetition_penalty=1.1
235
+ )
236
+
237
+
238
+ def instruction_format(sys_message: str, query: str):
239
+ return f'<s> [INST] {sys_message} [/INST]\nUser: {query}\nAssistant: ```json\n{{\n"tool_name": '
240
+
241
+ system_message= """You are a helpful AI assistant, you are an agent capable of using a variety of tools to answer a question. Here are a few of the tools available to you:
242
+
243
+ - Calculator: the calculator should be used whenever you need to perform a calculation, no matter how simple. It uses Python so make sure to write complete Python code required to perform the calculation required and make sure the Python returns your answer to the `output` variable.
244
+ - Search: the search tool should be used whenever you need to find information. It can be used to find information about everything
245
+ - Final Answer: the final answer tool must be used to respond to the user. You must use this when you have decided on an answer.
246
+
247
+ TOOL USAGE
248
+
249
+ Let's get started. The users query is as follows.
250
+ """
251
+
252
+ import json
253
+
254
+ def format_output(text: str):
255
+ full_json_str = '{\n"tool_name": '+text
256
+ full_json_str = full_json_str.strip()
257
+ if full_json_str.endswith("```"):
258
+ full_json_str = full_json_str[:-3]
259
+ return json.loads(full_json_str)
260
+
261
+ from duckduckgo_search import DDGS
262
+
263
+ def use_tool(action: dict):
264
+ tool_name = action["tool_name"]
265
+ if tool_name == "Final Answer":
266
+ return "Assistant: "+action["input"]
267
+ elif tool_name == "Calculator":
268
+ exec(action["input"])
269
+ return f"Tool Output: {output}"
270
+ elif tool_name == "Search":
271
+ contexts = []
272
+ with DDGS() as ddgs:
273
+ results = ddgs.text(
274
+ action["input"],
275
+ region="wt-wt", safesearch="on",
276
+ max_results=3
277
+ )
278
+ for r in results:
279
+ contexts.append(r['body'])
280
+ info = "\n---\n".join(contexts)
281
+ return f"Tool Output: {info}"
282
+ else:
283
+ # otherwise just assume final answer
284
+ return "Assistant: "+action["input"]
285
+
286
+
287
+ def run_agent(query: str):
288
+ res = generate_text(query)
289
+ action_dict = format_output(res[0]["generated_text"])
290
+ response = use_tool(action_dict)
291
+ full_text = f"{query}{res[0]['generated_text']}\n{response}"
292
+ return response, full_text
293
+
294
+
295
+ query = input(">: ")
296
+
297
+ input_prompt = instruction_format(system_message, query)
298
+
299
+ out = run_agent(input_prompt)
300
+ print(out)
301
+
302
+ second_step = out[1]+"""
303
+ Assistant: ```json
304
+ {
305
+ "tool_name": """
306
+
307
+ out = run_agent(second_step)
308
+
309
+ print(out[0])
310
+ ```
311
+
312
+ This code sets up a basic AI agent. Note that, python libraries such as Langchain or LlamaIndex could also be utilised for building the agent.
313
+ Also, the custom cools (and the corresponding system prompt) can be altered for different functionalities.
314
+
315
+ Also replace the TOOL USAGE part with:
316
+
317
+
318
+ To use these tools you must always respond in JSON format containing `"tool_name"` and `"input"` key-value pairs. For example, to answer the question, "what is the square root of 51?" you must use the calculator tool like so:
319
+
320
+ ```json
321
+ {
322
+ "tool_name": "Calculator",
323
+ "input": "from math import sqrt; output = sqrt(51)"
324
+ }
325
+ ```
326
+
327
+ Or to answer the question "who is the current president of the USA?" you must respond:
328
+
329
+ ```json
330
+ {
331
+ "tool_name": "Search",
332
+ "input": "current president of USA"
333
+ }
334
+ ```
335
+
336
+ Remember, even when answering to the user, you must still use this JSON format! If you'd like to ask how the user is doing you must write:
337
+
338
+ ```json
339
+ {
340
+ "tool_name": "Final Answer",
341
+ "input": "How are you today?"
342
+ }
343
+ ```
344
+
345
+
346
+
**Example 2: Integration with Functions of Multiple Parameters**

In this example, we build a Finance Agent with tools for Compound Interest, Present Value of an Annuity, and Capital Asset Pricing (CAPM) calculations.

```python
# pip install transformers accelerate bitsandbytes
import torch
import transformers

auth_token = "There goes the auth token"  # Since this is a private model, you must use this auth token to access the model and the tokenizer
model_name = "arhanovich/Enteli-49B"

tokenizer = transformers.AutoTokenizer.from_pretrained(model_name, use_default_system_prompt=False, use_auth_token=auth_token)
model = transformers.AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32, device_map='auto', local_files_only=False, load_in_4bit=True, use_auth_token=auth_token)


def generate_text(query):
    system_message = """
<s>[INST]You are a helpful AI assistant, you are an agent capable of using a variety of tools to answer a question. Here are a few of the tools available to you:

- Compound Interest: Calculate the future value of an investment with compound interest. :param principal: Initial amount of money invested (principal) :param rate: Annual interest rate (as a decimal) :param periods: Number of periods the money is invested for :return: Future value of the investment.
- Present Value Annuity: Calculate the present value of an annuity. :param payment: The fixed payment amount per period :param rate: Discount rate per period (as a decimal) :param periods: Total number of periods :return: Present value of the annuity.
- Capital Asset Pricing: Calculate the expected return of an asset using the Capital Asset Pricing Model (CAPM). :param expected_market_return: Expected return of the market :param risk_free_rate: Risk-free rate of return :param beta: Beta of the asset :return: Expected return of the asset.
- Final Answer: the final answer tool must be used to respond to the user. You must use this when you have decided on an answer. :param answer: Your final answer

To use these tools you must always respond in JSON format containing `"tool_name"` and `"input"` key-value pairs.

For example, to answer the question, "Suppose you invest $5,000 in a savings account offering an annual interest rate of 4%. How much money will be in the account after 10 years if the interest is compounded annually?" you must use the tool like so:

```json
{
"tool_name": "Compound Interest",
"input": "principal=5000, rate=0.04, periods=10"
}
```

Or to answer the question "You are considering an investment that will pay you $1,000 per year for the next 5 years. If your discount rate is 3%, what is the present value of these future payments?" you must respond:

```json
{
"tool_name": "Present Value Annuity",
"input": "payment=1000, rate=0.03, periods=5"
}
```

To answer the question "An asset has a beta of 1.2. The risk-free rate is 2%, and the expected market return is 8%. What is the expected return on this asset according to the CAPM?" use the tool like so:
```json
{
"tool_name": "Capital Asset Pricing",
"input": "expected_market_return=0.08, risk_free_rate=0.02, beta=1.2"
}
```

Remember, even when answering the user, you must still use this JSON format! For example, if the Present Value Annuity tool gave an output like 4987.76:

```json
{
"tool_name": "Final Answer",
"input": "answer=The Present Value of the Annuity is 4987.76"
}
```

Let's get started. The user's query is as follows. You must always give your answer in JSON format!!!
User: """

    full_prompt = system_message + query + "[/INST]"

    input_ids = tokenizer(full_prompt, return_tensors="pt").input_ids.to("cuda")

    generation_output = model.generate(input_ids=input_ids, max_new_tokens=1024, temperature=0.6, top_p=0.9, top_k=50)
    answer = str(tokenizer.decode(generation_output[0], skip_special_tokens=True))
    answer = answer.split("[/INST]")[-1].strip()
    return answer

import json
import re

def format_output(text: str):
    # Find the JSON part in the text
    start = text.find("{")
    end = text.rfind("}") + 1
    if start == -1 or end == 0:
        raise ValueError("JSON string not found in the text")

    # Extract the JSON string
    json_str = text[start:end]

    # Parse the JSON string
    try:
        json_obj = json.loads(json_str)
    except json.JSONDecodeError:
        # Fall back to pulling a final answer straight out of the raw text
        match = re.search(r'"answer":\s*"([^"]+)"', text)
        if match:
            return "Final Answer", {"answer": match.group(1)}
        else:
            raise ValueError("Answer not found")

    # Ensure the necessary keys are present
    if "tool_name" not in json_obj or "input" not in json_obj:
        raise ValueError("Required keys ('tool_name', 'input') are missing in the JSON")

    # Extract and parse the parameters
    try:
        parameters_str = json_obj["input"]
        params = dict(param.split("=", 1) for param in parameters_str.split(", "))

        # Convert parameter values to the appropriate type (int, float, or leave as string)
        def convert_value(v):
            try:
                return float(v) if '.' in v else int(v)
            except ValueError:
                return v  # If conversion to int or float fails, return the string as is

        params = {k: convert_value(v) for k, v in params.items()}
    except Exception as e:
        raise ValueError(f"Error parsing parameters: {e}")

    return json_obj["tool_name"], params


def compound_interest(principal, rate, periods):
    """
    Calculate the future value of an investment with compound interest.
    :param principal: Initial amount of money invested (principal).
    :param rate: Annual interest rate (as a decimal).
    :param periods: Number of periods the money is invested for.
    :return: Future value of the investment.
    """
    return principal * (1 + rate) ** periods

def present_value_annuity(payment, rate, periods):
    """
    Calculate the present value of an annuity.
    :param payment: The fixed payment amount per period.
    :param rate: Discount rate per period (as a decimal).
    :param periods: Total number of periods.
    :return: Present value of the annuity.
    """
    return payment * ((1 - (1 + rate) ** -periods) / rate)

def capm(expected_market_return, risk_free_rate, beta):
    """
    Calculate the expected return of an asset using the Capital Asset Pricing Model (CAPM).
    :param expected_market_return: Expected return of the market.
    :param risk_free_rate: Risk-free rate of return.
    :param beta: Beta of the asset.
    :return: Expected return of the asset.
    """
    return risk_free_rate + beta * (expected_market_return - risk_free_rate)


def final_answer(answer):
    return answer


def use_tool(tool_name, params):
    if tool_name == "Final Answer":
        result = final_answer(**params)
        return "Assistant: " + str(result)
    elif tool_name == "Capital Asset Pricing":
        result = capm(**params)
        return "Tool Output: " + str(result)
    elif tool_name == "Present Value Annuity":
        result = present_value_annuity(**params)
        return "Tool Output: " + str(result)
    elif tool_name == "Compound Interest":
        result = compound_interest(**params)
        return "Tool Output: " + str(result)
    else:
        return "Assistant: An error occurred"


def run_agent(query: str):
    res = generate_text(query)
    print(res)
    tool_name, params = format_output(res)
    response = use_tool(tool_name, params)
    full_text = f"{query}{res}\n{response}"
    return response, full_text


query = input(">: ")
out = run_agent(query)
print(f"Result: {out[0]}")

# You can run the second step and get the final result using the same logic as in the previous example
```

**Deployment**

Enteli-49B, a sophisticated AI model, necessitates a minimum of 95GB of VRAM for optimal operation. It functions efficiently on **dual
A100 80GB** systems, where each A100 is equipped with 80GB of VRAM, 117GB of RAM, and 12 vCPUs.

The model is compatible with virtual machines, with affordable options available through runpod.io. On average,
the model processes and outputs a total of 500 tokens in approximately 35 seconds when utilizing a dual A100 80GB setup.

In the context of this model, 'tokens' represent fragments of words. During the initial processing phase, the input is segmented into these tokens,
which may consist of partial words, spaces, or even sub-words. For the English language, a single token is roughly equivalent to three-quarters of a word.

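To get a feel for this segmentation, you can count tokens with the tokenizer loaded in the usage example above (a quick illustrative check):

```python
# Roughly 3/4 of a word per token for English text
text = "Enteli-49B segments input text into sub-word tokens."
token_ids = tokenizer(text).input_ids
print(len(token_ids))                              # number of tokens
print(tokenizer.convert_ids_to_tokens(token_ids))  # the tokens themselves
```
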
The cost of one hour of a dual A100 80GB system on Runpod is approximately 4 USD. At roughly 500 tokens per 35 seconds, the system handles about 51,000 tokens per hour,
so processing 1,000,000 tokens (equivalent to around 750,000 words) takes close to 20 GPU-hours and would incur a cost of about 75 USD. However, this approach allows for
processing only one prompt at a time and presents challenges in GPU management. Additionally, time-based GPU rental can lead to inefficiencies, as the model may not be in
constant use. Thus, employing services like Runpod might not be the most user-friendly option for consumers.

**Fortunately**, at EnteliMind, we have access to extensive, dedicated computational servers equipped with numerous GPUs.
We aim to offer an API service where you are billed based on token usage. Last month, our usage amounted to approximately 948,750,000 tokens, costing us 7,590 USD.
From this data, we deduce that the cost of processing 1,000,000 tokens (about 750,000 words) is 8 USD. Therefore, we are **prepared to offer you our API service**
at a rate of 8 USD per 1 million tokens, subsequent to the purchase of our Enteli-49B AI model.

**Future Work**

The abilities of Enteli-49B can be amplified with several methods.

**1-) Building More Complex AI Agents and Swarms**

This is probably the cheapest and easiest method, yet it can produce the best results. The abilities of Enteli-49B can be expanded with many custom tools
(functions) integrated into it. This opens wide avenues for innovation in **finance**, for example, by combining the AI with any imaginable tool. Another advanced method is
building an AI agent swarm, where different AIs, each with its own tools, talk and negotiate with each other to solve intricate problems. This may sound hard to implement,
but the projects CrewAI and Autogen have simplified the process immensely.

Langchain: https://python.langchain.com/docs/get_started/introduction
CrewAI: https://github.com/joaomdmoura/crewAI
Autogen: https://github.com/microsoft/autogen

**2-) Fine-Tuning**

Fine-tuning is a crucial step in enhancing the capabilities of pre-trained large language models (LLMs) for specific tasks or domains.
Initially, these models are trained on vast and diverse datasets, equipping them with a broad understanding of language and its various applications.
However, this general training doesn't provide the model with deep expertise in particular areas or specialized tasks.

To address this, fine-tuning comes into play. It involves adjusting the model's parameters further, but this time using a smaller, domain-specific dataset.
This process is akin to giving the model a "mini-education" in a particular field or task, allowing it to become more adept and efficient in that area.

During fine-tuning, the model is exposed to examples that are closely related to the specific task at hand. This exposure helps the model to grasp the
subtleties and nuances of the domain, which might not have been covered during its initial training. For instance, a model trained on a general dataset may
have a basic understanding of medical terminology, but through fine-tuning with medical texts, it can develop a much more refined and accurate understanding of this
domain.

The result of fine-tuning is a more specialized version of the language model, tailored to perform better in specific applications.
It effectively narrows the gap between a general-purpose model and a specialized tool, unlocking new possibilities and enhancing the model's performance in targeted tasks.
This makes fine-tuning an invaluable process for realizing the full potential of LLMs in various domains and applications.

One of the most well-known techniques is PEFT (Parameter-Efficient Fine-Tuning), a library for efficiently adapting large pretrained models to various
downstream applications without fine-tuning all of a model's parameters, which would be prohibitively costly. PEFT methods fine-tune only a small
number of (extra) model parameters - significantly decreasing computational and storage costs - while yielding performance comparable to a fully fine-tuned model.
This makes it more feasible to train and store large language models (LLMs) on consumer hardware.

PEFT: https://huggingface.co/docs/peft/index

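As a starting point, here is a minimal, illustrative LoRA setup with the PEFT library. The hyper-parameters and `target_modules` below are assumptions for illustration, not tested values for Enteli-49B:

```python
# pip install peft transformers
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

auth_token = "There goes the auth token"  # your Hugging Face access token

model = AutoModelForCausalLM.from_pretrained("arhanovich/Enteli-49B", use_auth_token=auth_token)

lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices (assumed value)
    lora_alpha=16,                         # scaling factor (assumed value)
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of parameters is trainable
```
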