Category: Machine Learning

RankFlow plot for retriever visual evaluation

2024-07-08

RAG systems depend on high-quality retrieval to surface relevant information. Analyzing how document rankings evolve through multiple re-ranking steps is complex. This article explores ways to collect ranking data and visualize rank changes to optimize retriever effectiveness.

How Agile Can Kill Creativity in Data Science team?

2023-09-29

Discover the delicate balance between Agile methodologies and imagination in the domain of data science and analytics. Uncover the impact of Agile approaches on creativity within data science teams. Explore how these practices shape the innovative landscape of data science and analytics.

From Fixed-Size to NLP Chunking - A Deep Dive into Text Chunking Techniques

2023-09-11

Discover text chunking - the secret sauce behind accurate search results and smarter language models! By understanding how to effectively chunk text, we can improve the way we index documents, handle user queries, and utilize search results. Ready to uncover the secrets of text chunking?

Harnessing the Power of Dependency Injection for Improved Testability in Python

2023-06-21

Learn how to use dependency injection to decouple dependencies from our functions, methods, or classes, making it easier to test and maintain our code.

Rethinking the Link Between Speech and Expertise

2023-02-23

We often associate eloquent speech with intelligence and knowledge. But what if I told you that this assumption is not always true?

Understanding AI with ELI5 - Demystifying Decisions (tutorial)

2023-02-20

Want to know why your AI model made that decision? ELI5 has got you covered. Let's dive into Explainable AI with ELI5.

"Comprehensive Guide to Interpreting R\xB2, MSE, and RMSE for Regression Models."

2023-02-13

Don't let misleading metrics fool you. Master the art of analyzing regression model performance and make smarter decisions.

Libraries for Automated Exploratory Data Analysis (EDA)

2023-02-12

EDA Made Easy - Discover Top-10 Python Libraries That Will Take Your Data Analysis to the Next Level! Learn the Secrets of Automated EDA!

Is the Game Theory Any Useful for Data Science?

2023-02-09

Exploring the intersection of game theory and data science - insights into decision-making, network behavior, and optimization algorithms.

Beat Overfitting in Kaggle Competitions - Proven Techniques

2023-02-08

Ready to take your Kaggle competition game to the next level? Learn how to recognize and prevent overfitting for top-notch results.

The Impact of Search Engines and AI Generative Models on Mental and Cognitive Capabilities

2023-02-01

Understand the effects of search engines and AI on our mental and cognitive capabilities. Equip yourself with the knowledge you need to make informed decisions about your own usage of these technologies.

Becoming a Data Wizard - The Benefits of Learning Databricks

2023-01-30

Learn how Databricks can help you master big data, improve data processing and machine learning skills and excel in your career. Boost your career with this powerful platform.

Common Types of Data Science Projects

2023-01-19

Learn about common types of data science projects and best practices for approaching them. From end-to-end individual work to production-ready projects, this guide covers it all.

How to Detect ChatGPT-Generated Text?

2023-01-11

Discover the latest methods for distinguishing machine-generated text from human-written text. Learn about statistical, syntactic, semantic, and neural network-based approaches. Stay up-to-date with the latest research in NLP and AI.

Visual Text Exploration as Part of Preprocessing Before Classification

2022-10-11

This post discusses the importance of visual text exploration in preprocessing for classification, covers techniques (word clouds, sentiment analysis, topic modeling, data cleaning), and shows how to use them with popular libraries. Readers are encouraged to try them on their own projects.

Discovering Hidden Gems - Popular and Lesser-Known Dataset Sharing Platforms

2022-06-09

"Looking for the key to unlocking valuable datasets? Dive into the world of Kaggle, UCI, and more as we unveil the best platforms for data enthusiasts."

Pro Tips for Diagnosing Regression Model Errors

2022-02-22

Improve your regression model's accuracy and predictability by uncovering hidden errors with these essential plots.

15 Tools for Document Deskewing and Dewarping

2022-02-11

Input for document processing tasks such as OCR, table detection, or text segmentation is sometimes a scan or a hand-held photo with a less-than-ideal perspective: the document is rotated or spatially distorted in some way (a warped document). If you are looking for my recommendations, go straight to the last section of this article, "Summary and recommendations".

Understanding Micro and Macro Averages in Multiclass Multilabel Problems

2021-12-22

Learn about micro and macro averages in multiclass multilabel problems, the difference between multiclass and multilabel problems and when to use micro and macro averages.

Unleashing the Power of T-Sne for Dimensionality Reduction in Python

2021-03-15

Want to create beautiful visualizations from complex data? Discover the power of T-SNE for dimensionality reduction in Python.

Kurtosis in Simple Terms, Interpretation and Gotchas

2021-02-18

Statistics can be tricky, but understanding kurtosis is a must for anyone who wants to avoid making common mistakes in statistical analyses. Learn how to interpret it in this comprehensive guide.

Finding Errors in Data - Data Validation

2021-01-31

Explore methods to detect and fix errors in data, including validation, visualizations, statistical tests, cleaning techniques, machine learning, and data quality tools. Get concise, easy-to-understand information with examples and links to external resources.

Pandas Schema Validation

2021-01-16

An overview of the available tools and methods for schema validation in pandas, with example code snippets and recommendations on when to use each tool.

Metrics Used to Compare Histograms

2020-01-19

Learn about metrics used to compare histograms, with examples of how to calculate them in Python, from Chi-Squared distance to Kullback-Leibler divergence and Earth Mover's distance. A comprehensive guide.

Kaggle Evaluation Metrics Used for Regression Problems

2019-02-16

"This post describe evaluation metrics used in Kaggle competitions where problem to solve is has regression nature. Eight different metrics are described, namely - Absolute Error (AE), Mean Absolute Error (MAE), Weighted Mean Absolute Error (WMAE), Pearson Correlation Coefficient, Spearman\u2019s Rank Correlation, Root Mean Squared Error (RMSE), Root Mean Squared Logarithmic Error (RMSLE), Mean Columnwise Root Mean Squared Error (MCRMSE)."

What's Cooking

2018-04-05

Exploratory Data Analysis of Kaggle's "What's Cooking" competition dataset, to understand what kind of data we are dealing with and build intuition about the dependencies in it.