Srinath Tummala
Built a question-answering chatbot using Retrieval-Augmented Generation (RAG) that lets users query a document in natural language, tested on my own resume. This project showcases practical skills in integrating OpenAI GPT-4 with FAISS-based semantic search to build scalable, context-aware document assistants.
Uses text-embedding-ada-002 to generate embeddings from chunked document sections.
Python, OpenAI GPT-4, LangChain, FAISS, Streamlit, FastAPI, PyMuPDF, text-embedding-ada-002
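The chunk, embed, retrieve, and prompt flow described above can be sketched without any external services. This is a minimal illustration and not the project's actual code: `toy_embed` is a hypothetical bag-of-words stand-in for text-embedding-ada-002, the brute-force cosine search stands in for a FAISS index, and the chunk size, overlap, and sample document are assumed values.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping word windows (sizes are assumed values)."""
    words = text.split()
    step = chunk_size - overlap  # assumes chunk_size > overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question (brute force)."""
    q = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Assemble the text that would be sent to the chat model."""
    context = "\n---\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Invented sample text, used only to exercise the functions.
document = (
    "Education: studied computer science at university. "
    "Skills: Python, machine learning, regression and classification. "
    "Projects: built a URL classifier using ensemble methods."
)
question = "What machine learning skills are listed?"
chunks = chunk_text(document, chunk_size=8, overlap=2)
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a real pipeline the embedding call and the vector index would replace `toy_embed` and `retrieve`, while the chunking and prompt assembly would keep roughly the same shape.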
Surprise Housing, a US-based housing company, has decided to enter the Australian market. The company uses data analytics to purchase houses below their actual value and sell them at a higher price. To support this, it has collected a dataset of house sales in Australia, provided as a CSV file. The company is evaluating prospective properties to buy as it enters the market. The task is to build a regression model with regularisation that predicts the actual value of prospective properties, so the company can decide whether to invest in them. The company wants to know which variables are significant in predicting the price of a house and how well those variables describe the price. The task also requires determining the optimal value of lambda for ridge and lasso regression.
Steps Involved:
- Importing modules, Reading the data
- Analyzing Numerical Features
- Outlier Treatment
- Correlation Analysis
- Missing Value Treatment
- Univariate and Bivariate Analysis
- Data Visualization
- Encoding Categorical Features
- Splitting data into Train and Test data
- Transformation of Target Variable
- Feature Scaling
- Primary Feature Selection using RFE
- Ridge Regression
- Lasso Regression
- Comparing model coefficients
- Model Evaluation
- Choosing the final model and most significant features.
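To make the role of lambda in the ridge and lasso steps concrete, here is a minimal single-feature sketch. It is not the project's code: the toy data and the lambda grid are invented, the feature is assumed to be centred (no intercept), and the helper names are hypothetical. Ridge shrinks the coefficient smoothly, lasso can set it exactly to zero, and the "optimal" lambda is taken to be the one with the lowest error on held-out data.

```python
def ridge_coef(x, y, lam):
    """Minimise sum((y - b*x)^2) + lam * b^2 for one centred feature."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


def lasso_coef(x, y, lam):
    """Minimise 0.5 * sum((y - b*x)^2) + lam * |b| via soft-thresholding."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    shrunk = max(abs(sxy) - lam, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / sxx


def mse(x, y, b):
    """Mean squared error of the prediction b * x."""
    return sum((yi - b * xi) ** 2 for xi, yi in zip(x, y)) / len(x)


def best_lambda(fit, train, valid, grid):
    """Pick the lambda whose fitted coefficient has the lowest validation MSE."""
    return min(grid, key=lambda lam: mse(valid[0], valid[1], fit(train[0], train[1], lam)))


# Invented toy data: the training targets are noisy, validation follows y = 2x.
x_train, y_train = [-2, -1, 0, 1, 2], [-5, -2, 0, 2, 5]
train = (x_train, y_train)
valid = ([-1, 1], [-2, 2])
grid = [0, 1, 2, 5, 10, 30]  # assumed candidate lambdas

for lam in grid:
    print(lam, round(ridge_coef(x_train, y_train, lam), 3),
          round(lasso_coef(x_train, y_train, lam), 3))

print("best ridge lambda:", best_lambda(ridge_coef, train, valid, grid))
print("best lasso lambda:", best_lambda(lasso_coef, train, valid, grid))
```

With many features the same idea is usually applied through cross-validation over a lambda grid. Libraries scale the penalty term differently, so the numeric value of the best lambda is not comparable across implementations.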
This project is based on the rental bike dataset from a company called Boom Bikes.
Work Flow
- Data Loading and Understanding
- Data Visualization
- Data Preparation - Split into train and test sets, then rescale
- Data Modelling
- Residual Analysis, Checking Assumptions of Linear Regression
- Prediction and Evaluation on the Test Dataset
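The workflow above can be sketched end to end in plain Python. This is an illustrative outline under stated assumptions, not the project's notebook: the `(temp, cnt)` rows are synthetic, a single predictor replaces the full feature set, and the 70/30 split and helper names are hypothetical. Note that the scaler is fitted on the training data only and then reused on the test data.

```python
import random


def train_test_split(rows, test_ratio=0.3, seed=42):
    """Shuffle a copy of the rows and hold out a share for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def fit_min_max(values):
    """Learn min-max scaling from the training values only."""
    lo, hi = min(values), max(values)
    return lambda v: (v - lo) / (hi - lo)


def fit_line(x, y):
    """Ordinary least squares for one predictor: y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(y_true, y_pred):
    """Share of variance in y_true explained by y_pred."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Synthetic (temp, cnt) rows, invented for illustration only.
rows = [(t, 1000 + 150 * t + 100 * ((t % 3) - 1)) for t in range(5, 35)]

train, test = train_test_split(rows)
scale = fit_min_max([t for t, _ in train])

x_train = [scale(t) for t, _ in train]
y_train = [c for _, c in train]
intercept, slope = fit_line(x_train, y_train)

# Residual analysis: with an intercept, OLS residuals should average to zero.
residuals = [y - (intercept + slope * x) for x, y in zip(x_train, y_train)]
mean_residual = sum(residuals) / len(residuals)

# Evaluate on the held-out rows using the scaler fitted on the training set.
x_test = [scale(t) for t, _ in test]
y_test = [c for _, c in test]
test_r2 = r_squared(y_test, [intercept + slope * x for x in x_test])
print(f"mean residual: {mean_residual:.6f}, test R^2: {test_r2:.3f}")
```

A fuller residual analysis would also look at the distribution of the residuals and at their spread across fitted values, which this sketch leaves out.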
This case study is based on the Lending Club dataset. Using the available data, we are required to draw insights that help in categorizing a new customer. The objective is to analyze the data to identify customers who might default on their loan.
Table of Contents
- Data Cleaning
- Data Standardization
- Missing value treatment
- Outliers Check
- Univariate Analysis and Segmented Univariate Analysis
- Observations from Univariate and Segmented Univariate Analysis
- Bivariate or Multivariate Analysis
- Observations from Bivariate or Multivariate Analysis
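A segmented univariate analysis of the kind listed above often comes down to comparing default rates across the values of one variable. The sketch below is a hedged illustration only: the records are invented, and the column names `grade`, `int_rate`, and `loan_status`, as well as the "Charged Off" label for a default, are assumptions about the dataset's layout.

```python
from collections import defaultdict


def parse_percent(text):
    """Standardise a string such as '10.65%' into the float 10.65."""
    return float(text.strip().rstrip("%"))


def default_rate_by(records, segment_key, status_key="loan_status",
                    bad_status="Charged Off"):
    """Return the share of defaulted loans within each segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [defaults, total]
    for rec in records:
        segment = rec.get(segment_key)
        if segment is None:
            continue  # missing value treatment: skip rows without a segment
        counts[segment][1] += 1
        if rec[status_key] == bad_status:
            counts[segment][0] += 1
    return {seg: bad / total for seg, (bad, total) in sorted(counts.items())}


# Invented records, used only to exercise the functions.
loans = [
    {"grade": "A", "int_rate": "7.90%", "loan_status": "Fully Paid"},
    {"grade": "A", "int_rate": "8.10%", "loan_status": "Fully Paid"},
    {"grade": "A", "int_rate": "7.50%", "loan_status": "Charged Off"},
    {"grade": "A", "int_rate": "6.90%", "loan_status": "Fully Paid"},
    {"grade": "B", "int_rate": "11.20%", "loan_status": "Charged Off"},
    {"grade": "B", "int_rate": "10.65%", "loan_status": "Fully Paid"},
    {"grade": None, "int_rate": "12.00%", "loan_status": "Fully Paid"},
]

# Data standardization: convert percentage strings to numbers in place.
for loan in loans:
    loan["int_rate"] = parse_percent(loan["int_rate"])

rates = default_rate_by(loans, "grade")
print(rates)
```

The same function can be reused for any categorical column, which makes it easy to compare segments side by side before moving on to bivariate analysis.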
The aim of this project is to classify a URL as malicious or benign. Collected a public dataset of 450,176 rows with two classes of URLs, malicious and benign. Extracted 17 features in total, comprising lexical features, count-based features, and two binary features. Trained AdaBoost and Random Forest classifiers and combined them with the Voting Classifier ensembling method to get the best result from the two classifiers.
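As a rough illustration of the feature types mentioned above, the sketch below derives a few lexical, count-based, and binary features from a URL string and combines two classifier probabilities by averaging. The specific features, their names, and the soft-voting rule are assumptions made for this example. The source does not list the 17 features or say whether hard or soft voting was used.

```python
import re
from urllib.parse import urlparse

# Matches hosts that look like a dotted IPv4 address (a loose check).
IP_PATTERN = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_features(url):
    """Return a dict of hypothetical lexical, count-based and binary features."""
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return {
        # Lexical features
        "url_length": len(url),
        "hostname_length": len(host),
        "path_length": len(parsed.path),
        # Count-based features
        "count_dots": url.count("."),
        "count_hyphens": url.count("-"),
        "count_digits": sum(ch.isdigit() for ch in url),
        "count_at": url.count("@"),
        # Binary features
        "has_ip_host": int(bool(IP_PATTERN.match(host))),
        "uses_https": int(parsed.scheme == "https"),
    }


def soft_vote(malicious_probs, threshold=0.5):
    """Average the per-classifier malicious probabilities (assumed soft voting)."""
    mean_prob = sum(malicious_probs) / len(malicious_probs)
    return "Malicious" if mean_prob >= threshold else "Benign"


features = extract_features("http://192.168.0.1/login.php")
print(features)

# Hypothetical probabilities from two trained classifiers for one URL.
print(soft_vote([0.9, 0.4]))
```

The feature dictionaries would be turned into a numeric matrix before training, and the probabilities passed to `soft_vote` would come from the fitted AdaBoost and Random Forest models.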
This project was submitted as a thesis at GITAM University, Hyderabad.
Authors: Srinath Tummala, Shivani Donthi