
Data Science Weekly – Issue 595 by sebg
Hello!
Once a week, we write this email to share the links we thought were worth sharing in the Data Science, ML, AI, Data Visualization, and ML/Data Engineering worlds.
And now…let’s dive into some interesting links from this week.
-
Things Statisticians Claim to Hate But Secretly Love
There Are Liars, Damn Liars, and Statisticians. This Is the Truth…
-
Decomposing Transactional Systems
Every transactional system does four things:
- It executes transactions
- It orders transactions
- It validates transactions
- It persists transactions
All four of these things must be done before the system may acknowledge a transaction’s result to a client. However, these steps can be done in any order. They can be done concurrently. Different systems achieve different tradeoffs by reordering these steps…
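The sketch below is a toy, single-threaded Python version of the four steps. It assumes one particular design: optimistic concurrency, where transactions execute first against shared state and are then ordered, validated against what they read, and persisted. The names `Store`, `TxnContext`, `execute`, `order`, `validate`, `persist`, and `commit` are hypothetical and do not come from the article, and the in-memory `log` only stands in for durable storage.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Store:
    """In-memory key-value state; `log` stands in for durable storage."""
    data: Dict[str, int] = field(default_factory=dict)
    versions: Dict[str, int] = field(default_factory=dict)
    log: List[Tuple[int, Dict[str, int]]] = field(default_factory=list)
    next_seq: int = 0


class TxnContext:
    """Records reads (with the version seen) and buffers writes."""

    def __init__(self, store: Store) -> None:
        self.store = store
        self.reads: Dict[str, int] = {}
        self.writes: Dict[str, int] = {}

    def get(self, key: str) -> int:
        if key in self.writes:  # read-your-own-writes
            return self.writes[key]
        self.reads[key] = self.store.versions.get(key, 0)
        return self.store.data.get(key, 0)

    def put(self, key: str, value: int) -> None:
        self.writes[key] = value


def execute(store: Store, txn: Callable[[TxnContext], None]) -> TxnContext:
    """Step 1: run the transaction logic without touching shared state."""
    ctx = TxnContext(store)
    txn(ctx)
    return ctx


def order(store: Store) -> int:
    """Step 2: assign a position in a single global order."""
    seq = store.next_seq
    store.next_seq += 1  # an aborted transaction still consumes a slot here
    return seq


def validate(store: Store, ctx: TxnContext) -> bool:
    """Step 3: fail if anything the transaction read has changed since."""
    return all(store.versions.get(k, 0) == v for k, v in ctx.reads.items())


def persist(store: Store, seq: int, ctx: TxnContext) -> None:
    """Step 4: append to the log, then apply writes and bump versions."""
    store.log.append((seq, dict(ctx.writes)))
    for k, v in ctx.writes.items():
        store.data[k] = v
        store.versions[k] = store.versions.get(k, 0) + 1


def commit(store: Store, ctx: TxnContext) -> bool:
    """Only after ordering, validating, and persisting may we acknowledge."""
    seq = order(store)
    if not validate(store, ctx):
        return False
    persist(store, seq, ctx)
    return True


def incr(ctx: TxnContext) -> None:
    ctx.put("x", ctx.get("x") + 1)


store = Store()
t1 = execute(store, incr)
t2 = execute(store, incr)  # both execute against the same initial state
print(commit(store, t1))  # True
print(commit(store, t2))  # False: "x" changed after t2 read it
print(store.data)         # {'x': 1}
```

Because execution comes before validation in this arrangement, two transactions can execute concurrently, and the conflict is only detected at commit time. A design that orders transactions before executing them would shift that cost to coordination up front, which is the kind of tradeoff the article describes.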
-
Keeping it boring (and relevant) with BM25F
First introduced in 1994, BM25 eventually made its way into popular search engines like Apache Lucene and has been powering search bars across the internet for decades. Because it works well with keyword search engines, BM25 is efficient to compute over massive datasets. It also performs well in diverse settings, providing good out-of-the-box ranking in a variety of domains like site search, legal search, and more…Given BM25’s success in text search, it’s natural to wonder if it could work well for keyword-style code searches. After implementing it in Sourcegraph’s recent 6.2 release, our answer is “yes”!…
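BM25F extends BM25 to documents with several fields. For code, those fields might be a file name and the file contents. The following is a minimal Python sketch of one common "simple BM25F" formulation: per-field term frequencies are length-normalized, weighted, and summed into a single pseudo-frequency before the usual BM25 saturation is applied. The field names, weights, `b` values, and `k1 = 1.2` are illustrative assumptions, not Sourcegraph's actual configuration. The IDF follows the Lucene-style `log(1 + (N - n + 0.5) / (n + 0.5))`.

```python
import math
from collections import Counter
from typing import Dict, List

Doc = Dict[str, List[str]]  # field name -> tokens in that field

# Hypothetical field weights and per-field length-normalization strengths.
FIELD_WEIGHTS = {"filename": 3.0, "content": 1.0}
FIELD_B = {"filename": 0.5, "content": 0.75}
K1 = 1.2  # term-frequency saturation


def bm25f_scores(query: List[str], docs: List[Doc]) -> List[float]:
    """Score every document against the query with simple BM25F."""
    if not docs:
        return []
    n_docs = len(docs)

    # Average length of each field over the corpus.
    avg_len: Dict[str, float] = {}
    for f in FIELD_WEIGHTS:
        total = sum(len(d.get(f, [])) for d in docs)
        avg_len[f] = total / n_docs if total else 1.0

    # Document frequency: number of docs containing the term in any field.
    df: Counter = Counter()
    for d in docs:
        terms = set()
        for f in FIELD_WEIGHTS:
            terms.update(d.get(f, []))
        df.update(terms)

    scores: List[float] = []
    for d in docs:
        tf_by_field = {f: Counter(d.get(f, [])) for f in FIELD_WEIGHTS}
        score = 0.0
        for term in set(query):
            n = df[term]
            if n == 0:
                continue
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            # Merge weighted, length-normalized field frequencies first...
            pseudo_tf = 0.0
            for f, w in FIELD_WEIGHTS.items():
                b = FIELD_B[f]
                norm = 1 - b + b * len(d.get(f, [])) / avg_len[f]
                pseudo_tf += w * tf_by_field[f][term] / norm
            # ...then saturate once, so repeats across fields don't add up linearly.
            score += idf * pseudo_tf / (K1 + pseudo_tf)
        scores.append(score)
    return scores


docs = [
    {"filename": ["parser", "go"], "content": ["func", "parse", "tokens"]},
    {"filename": ["main", "go"], "content": ["call", "parser", "here", "parser"]},
    {"filename": ["readme", "md"], "content": ["install", "instructions"]},
]
scores = bm25f_scores(["parser"], docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 1, 2]: the filename match outranks the content-only match
```

Summing a separate BM25 score per field would saturate each field independently, so a term that repeats across fields could be rewarded more than once. Merging the fields before saturation is the main idea that distinguishes BM25F from that approach.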
-
Unlock the full potential of your data with Conjointly’s Insights Explorer. The Insights Explorer is a free, browser-based IDE that includes an AI assistant, allowing you to generate analysis without writing additional code or installing software.
Simply ask the AI assistant questions about your data in plain English, and it will generate executable code ready to help you transform, visualise, and explore your data. Insights Explorer helps you simplify your workflow and streamline the path from data to decision.
Whether you need quick analysis or deep data exploration, the Insights Explorer helps you focus on the insights that really matter, rather than getting bogged down in technicalities and common pain points of data programming. Spend less time searching for the right libraries, looking up syntax, restructuring data, or debugging code, and more time interpreting results and extracting meaningful insights for business decisions.
* Want to sponsor the newsletter? Email us for details –> team@datascienceweekly.org
-
Generative modelling in latent space
Most contemporary generative models of images, sound and video do not operate directly on pixels or waveforms. They consist of two stages: first, a compact, higher-level latent representation is extracted, and then an iterative generative process operates on this representation instead. How does this work, and why is this approach so popular?…
-
Diffusion Generative Model, Non-Euclidean Data, and How the Algebraic/Geometric Structure of Lie Groups Helps
It is challenging but beneficial to create diffusion generative models on manifolds. By algorithmically introducing an auxiliary momentum variable and mathematically “trivializing” it, this task becomes, for a useful class of manifolds, as easy as building diffusion models in Euclidean spaces, so that many great existing developments can be used…
-
How to Build an Agent
It’s not that hard to build a fully functioning, code-editing agent…It seems like it would be. When you look at an agent editing files, running commands, wriggling itself out of errors, retrying different strategies – it seems like there has to be a secret behind it…There isn’t. It’s an LLM, a loop, and enough tokens…
-
Python TARIFF
The GREATEST, most TREMENDOUS Python package that makes importing great again!…TARIFF is a fantastic tool that lets you impose import tariffs on Python packages. We’re going to bring manufacturing BACK to your codebase by making foreign imports more EXPENSIVE!…
-
Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models
Representation Engineering (RepE) is a novel paradigm for controlling the behavior of LLMs. Unlike traditional approaches that modify inputs or fine-tune the model, RepE directly manipulates the model’s internal representations. As a result, it may offer more effective, interpretable, data-efficient, and flexible control o