GNOME Calculator freezes instantly

Well, the calculator that ships with GNOME would launch instantly, then freeze just as instantly. I would then have to keep clicking the (x) in the corner and wait for the OS to offer to kill the app.

No matter how long I waited, the app wouldn't budge; you can't even move the window around the screen.

Turns out the calculator works fine if I am not connected to the internet (pull the Ethernet cable out).

The cause was simple: the calculator freezes waiting for data from the internet, and when there is no connection at all, it skips that step. Specifically, it goes online to fetch currency conversion rates, something I never use, so the fix is to execute the following command to stop it from doing that:

dconf write /org/gnome/calculator/refresh-interval 0
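To verify it stuck, read the key back; it should now return 0:

dconf read /org/gnome/calculator/refresh-interval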

Yup, that is it. The calculator works perfectly now.

All about hard drive cache

How does a hard drive cache work, EXACTLY?

The short answer is: EXACTLY? No one knows. How a hard drive's cache works is a manufacturer secret and differs from drive to drive depending on the drive's purpose. BUT, we have a lot of clues: some come through the SATA (and PATA) specifications, others through industry-standard commands, and it is also not that hard to get what we want from black-box reverse engineering. We might not recover the actual algorithm (or variant of the algorithm) from such an endeavor, but we can learn enough to predict how it will behave.

Hard drives are not simple machines in any sense of the word. Once you are familiar with them, and if you are familiar with computer science, specifically algorithms, you will start to see where the complexities lie, and it is not all in the hardware; much of it is in the hard drive's software (firmware).

The hard drive’s raison d’être

You see, a hard drive spins at a certain speed (most commonly 5400 or 7200 RPM, some spin even faster), and it has to do whatever it is asked in the most efficient way possible. For example, it allows the OS (through the controller's driver) to tell it in advance about all the data it wants, so that it can plan the heads' shortest path to fetching it all (Native Command Queuing, and before it, Tagged Command Queuing). But let us not get carried away here, we are here to find out how the cache works! NCQ is a topic for a different day (or is it?)

I'm here for the recipes

There are very few recipes and interactions you can actually make use of, but let me try to cover the most common ones you will probably want.

IMPORTANT: all these settings are lost when you switch your computer off. To make them permanent, you will need to add them to /etc/rc.local or use udev rules.
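For the udev route, here is a minimal sketch; the rule file name and the hdparm path are assumptions, adjust them for your distro. It re-applies write caching to every rotational SATA disk as it appears:

# /etc/udev/rules.d/69-hdparm.rules (hypothetical file name)
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", RUN+="/usr/sbin/hdparm -W1 /dev/%k"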

Write caching

First, here are the commands to probe for the current state, then enable or disable:

# Check status (=0 means disabled)
sudo hdparm -W /dev/sdX
# Enable
sudo hdparm -W1 /dev/sdX
# Disable
sudo hdparm -W0 /dev/sdX
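Related, and worth knowing before you yank a power cable: you can ask the drive to flush whatever is sitting in its write cache on demand (not all older drives implement this):

# Flush pending OS buffers, then ask the drive itself to empty its write cache
sync
sudo hdparm -F /dev/sdX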

Read-ahead caching

First, here are the commands to probe for the current state, then enable or disable:

# Check the drive's own read-lookahead feature (1 = enabled, 0 = disabled)
sudo hdparm -A /dev/sdX
# Enable
sudo hdparm -A1 /dev/sdX
# Disable
sudo hdparm -A0 /dev/sdX
# Note: lowercase -a is a different thing, it reads/sets the KERNEL's read-ahead for the device, in 512-byte sectors
sudo hdparm -a 256 /dev/sdX

Operating system level caching for a device

# Set the kernel's read-ahead for a disk into RAM (unit: 512-byte sectors)
blockdev --setra xxx /dev/sda
# Cap write caching in system memory (percentage of RAM that may hold dirty pages)
echo 10 > /proc/sys/vm/dirty_ratio
# fstab entry to mount a filesystem that lives entirely in RAM (size is a percentage or an absolute value, e.g. 20G)
tmpfs /mnt/tmpfs tmpfs size=50%,rw,nosuid,nodev 0 0
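The dirty_ratio line above suffers the same reboot amnesia as the hdparm settings; the sysctl route makes it stick (the .conf file name below is an assumption, anything under /etc/sysctl.d works):

# Apply now
sudo sysctl -w vm.dirty_ratio=10
# Survive reboots
echo "vm.dirty_ratio = 10" | sudo tee /etc/sysctl.d/99-writecache.conf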

In this day and age, do we still need spinning hard drives anyway?

Well, yes and no. In my case, I burn through hard drives and SSDs very quickly, but with a little tweaking, hard drives live a bit longer (that can only be achieved by also managing the vibration of multiple disks with a heavy computer case, but that is a topic for a different post). My use case is all about continuous writing, and SSDs don't seem to like that.

If this does not apply to you, and SSD cost is what is stopping you from going all-in on SSDs, then maybe you would be interested in a post about adding an SSD caching layer in front of your inexpensive spinning disk.

Why this is important to me (and you)

It is important to me because I have a MySQL database spread across a big bunch of spinning disks. Those disks are being written to ALL THE TIME, and this is precisely why using SSDs here is a bad idea: the data is short-lived, but the drive is hammered with writes continuously!

I am not saying that hard drives don't take a considerable hit when hammered with continuous writes, but a disk constantly seeking while writing and a disk writing sequentially do not pay the same kind of penalty. In fact, from my experiments, a hard disk under a write load designed to destroy it will last much less than an SSD! The hit on SSDs also depends on the workload (look up write amplification), so yeah, this subject can get out of hand quickly.

Is a hard drive's cache used for reading or writing?

Both. You will be told online (in some very authoritative, popular places) that it is mostly for reading, but I fail to see what that means; it is mostly for whatever you are doing more of! Here is a bad example: it's as if you asked whether a dolly is more concerned with sending goods to the truck or bringing them from the truck to the warehouse; it depends on whether you are loading or unloading the truck.

Why is this a bad example, you ask? Because a hard drive is not a dolly being used to unload a truck; operating systems, database engines, and hard drives are not a sheet of metal on 4 wheels (more like a sheet of oxidized metal on one bearing, but that is beside the point). A database operation will typically require many reads before it does any writes, and those reads are also handled by the database engine's cache and the operating system's cache; you get the idea and the complexity. But this still doesn't mean the cache is concerned with reads more than writes or the other way around; it will depend on your workload, and on picking the right disk firmware for that workload (e.g. WD Purple vs. WD Blue vs. WD Black).

The firmware will always determine the disk's caching priorities, so certain firmwares will lean towards caching writes over reads, while others will do the opposite.

NCQ already!

Well, since me and my big mouth already got us into NCQ, let me start with it and get it out of the way.

NCQ is not possible without a cache; the cache is used to:

  • Store the operating system's requests, reorder them according to their physical locations on the disk, and fetch them in that order
  • Serve some requests immediately from the cache, before that part of the cache gets overwritten
  • Coalesce and defer writes: writes can be "acknowledged" before being physically written, wait their turn, and only hit the platters once combined into a larger, optimized write (NCQ has a feature that lets the OS know whether data reached the disk or just the cache, but you don't need that in your applications, you shouldn't care). A quick way to peek at NCQ on your own drives follows this list.
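The kernel exposes the NCQ queue depth through sysfs; a depth of 1 effectively means NCQ is off:

# Check the current queue depth (31 or 32 is typical with NCQ active)
cat /sys/block/sdX/device/queue_depth
# Lower it, or set it to 1 to effectively turn NCQ off
echo 1 | sudo tee /sys/block/sdX/device/queue_depth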

Okay, so let us get back to what we were saying….

Hard drive cache for reading

Hard drive designers are certainly well aware of the operating system's cache in RAM, so what good could come from caching in a measly 64MB on the disk?

This is a very good question. You see, the operating system will not read neighboring areas of the disk just because they happen to have zero overhead, but the disk will; it is free potential prefetch, so why wouldn't it fill its cache with it?

There are many reasons why it would and why it would not. The cache size is limited, so there are priorities to what gets done with it, and the required processing is not trivial either; you don't want to push the hard drive's processor into becoming a bottleneck. Remember when Western Digital came out with their Black series and promoted them as having 2 processors (micro-controllers is probably the correct term, but why complicate the jargon)? That is because there is plenty of processing to be done.

So let us get to the reading business. If you ask AI, you will get very outdated or irrelevant data; when I asked, it kept returning advantages that are nulled by the operating system's disk-to-RAM caching. So let me tell you what is still true and what is not:

  1. Prefetching and read-ahead optimization, also known as the read-lookahead feature or read-ahead caching: since the hard drive knows its own physical layout and access patterns, it can intelligently prefetch adjacent data into cache. Unlike the operating system, which only caches frequently used files or blocks, the drive itself can anticipate sequential reads and load data preemptively at very little to no overhead (because it is mostly reading data that is in the head's way anyway). This is particularly useful for sequential (mostly contiguous) reads. The drive can detect whether a read is sequential or not from the request addresses, SO TO AVOID LOST SPINS, DON'T COMPLETELY DISABLE IT. MAKE IT LOWER IF YOU MUST; EXPERIMENTATION ON THE BEST SIZE IS KEY (a quick timing test follows this list).
  2. Interaction with OS-Level Caching: While the operating system also caches data in RAM, the drive’s internal cache is the first line of defense against performance bottlenecks. The OS might not always know the drive’s specific access patterns, whereas the drive’s firmware can optimize for known workloads in real-time.
  3. Adaptive Algorithms: Some hard drives (probably all modern ones) employ adaptive caching techniques, where they analyze access patterns over time and adjust caching strategies accordingly. For example, a drive may increase its read-ahead buffer if it detects frequent sequential reads but prioritize different caching strategies when dealing with random access patterns.
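A crude but classic way to watch the read path in action is hdparm's timing test; -T times reads satisfied from caches, -t times reads that actually hit the device, so comparing runs with the look-ahead on (-A1) and off (-A0) can make the effect visible:

# Timed cache reads vs timed device reads
sudo hdparm -tT /dev/sdX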

Hard drive cache for writing

Writing to a hard drive is not as straightforward as it might seem. The cache plays a crucial role in optimizing write performance and improving the overall lifespan of the drive. When data is written to a hard drive, it doesn’t necessarily go straight to the platters. Instead, the cache temporarily holds this data before it is written in an optimized manner.

This is beneficial for a few reasons:

  1. Write Coalescing: The hard drive can combine multiple small write requests into a single, larger, more efficient write operation. This reduces the number of disk rotations required to complete a task.
  2. Reducing Latency: If an application writes small amounts of data frequently, the cache allows the drive to acknowledge the write operation almost instantly before the data is physically committed to the disk.
  3. Deferring Writes: Some writes can be held in cache temporarily, allowing the drive to prioritize more urgent tasks before actually writing the data to disk.

However, this raises an important issue: data integrity. Since data is often held in volatile cache before being written permanently, there is always a risk of data loss in the event of a power failure or unexpected system shutdown. To mitigate this, many enterprise-grade drives implement write-through caching or battery-backed cache systems that ensure data is not lost before it is written.

Does Cache Improve Write Speed?

Yes, but only under certain conditions. For bursty, short writes, the cache significantly improves performance because the hard drive doesn’t have to immediately seek and rotate to a specific position on the disk. Instead, it temporarily holds the data and commits it at an optimal time. However, for sustained, sequential writes that exceed the cache size, the drive eventually has to flush the cache and write directly to disk, which means the cache offers diminishing returns.
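You can feel this for yourself with a quick dd test (the target path is a placeholder, point it at a mount on the spinning disk): the first command may return while data is still sitting in caches, while the second forces everything to the platters before reporting its timing:

# Burst write, timing flattered by the caches
dd if=/dev/zero of=/mnt/spinner/testfile bs=4k count=25000
# Same write, but fdatasync before dd exits, so the timing is honest
dd if=/dev/zero of=/mnt/spinner/testfile bs=4k count=25000 conv=fdatasync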

Another critical aspect to consider is firmware tuning. Some manufacturers optimize their firmware for different workloads. Consumer drives often prioritize read-heavy workloads, while enterprise drives optimize caching strategies for sustained writes and improved data integrity.

Cache Eviction and Management

Since cache size is limited (typically between 8MB and 256MB on modern drives), the firmware must decide what stays in cache and what gets discarded. The general approach follows:

  • Least Recently Used (LRU): Frequently accessed data is kept in cache, while older, less-used data is replaced.
  • Write Prioritization: If a large sequential write is detected, the drive may flush other cache contents to prioritize this operation.
  • Predictive Read-Ahead: The drive may determine patterns in disk access and prefetch data into cache for anticipated future reads.

The Role of the OS in Caching

The operating system also plays a major role in caching, with its own layer of RAM-based disk caching. It can reorder and batch disk operations before passing them to the hard drive. This means that even if a hard drive’s cache is relatively small, the OS can compensate by managing frequently accessed data in RAM, which is significantly faster than any onboard hard drive cache.
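You can watch the OS side of this at any time; the buff/cache column below is exactly that RAM-based disk cache, and dropping it is handy when you want cold-cache benchmark numbers:

# How much RAM is currently acting as disk cache
free -h
# Flush dirty pages, then drop the page cache, dentries and inodes
sync; echo 3 | sudo tee /proc/sys/vm/drop_caches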

When Cache Doesn’t Help

While cache is incredibly useful for many workloads, there are scenarios where it does little to nothing:

  • Purely Sequential Writes: If you are writing large files that exceed the cache size, the drive will quickly bypass the cache and write directly to disk.
  • Heavy Random Workloads: If your workload is entirely random writes that do not benefit from coalescing or deferred writes, the cache provides minimal advantage.
  • Database Applications (Like MySQL): Many database engines already perform their own caching and optimizations, sometimes making CERTAIN TYPES of hard drive caching redundant and other caching mechanisms more valuable (which is why I research hard drive caching); see the my.cnf sketch below.
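For the MySQL case, this is why InnoDB is commonly told to skip the OS page cache and rely on its own buffer pool instead; a minimal my.cnf sketch, with placeholder values you would size to your own server:

# my.cnf sketch (assumed values, tune for your machine)
[mysqld]
innodb_flush_method = O_DIRECT    # bypass the OS page cache; the buffer pool does the caching
innodb_buffer_pool_size = 8G      # InnoDB's own cache in RAM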

Final Thoughts

Hard drive cache is a critical but often misunderstood component. It plays a dynamic role in both read and write operations, helping to bridge the performance gap between slow spinning platters and fast system memory. While the actual caching algorithms remain proprietary, we can infer their behavior from real-world testing and performance characteristics.

For database-heavy workloads like MySQL, tuning both the database and disk caching mechanisms can lead to significant performance gains. Understanding when and how a hard drive’s cache is utilized can help in selecting the right drive for your specific use case.

12TB disk does not show up

I have been using an Intel D525MW (Intel Atom) system as network-attached storage for some time now, with an extra PCIe SATA card (Silicon Image SiI 3132) so that I can connect 4 disks. When the 12TB Western Digital disk (HGST HUH721212AL) is connected to that SATA card, it does not show up, meaning an "fdisk -l" does not bring it up!
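Before swapping anything around, it is worth checking whether the kernel saw the drive at all; two quick probes:

# Does the kernel list the block device at all ?
lsblk
# Any detection or link errors on the SATA/ATA layer ?
sudo dmesg | grep -iE "sata|ata[0-9]"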

So the next thing to do was to swap the SATA connection with a different disk connected to the motherboard, and suddenly it works. Amazing, but I need to know where the problem comes from.

My first theory is that disks that are SFF-8447 compliant (rather than the old IDEMA standard) are not supported by this controller!

Ancient Gigabyte GA-X58A-UD3R V2 troubles

One reason for writing this post is that I cannot seem to find a way to update the backup BIOS on this board; this means that if my main BIOS ever gets corrupted, the system will revert to a very old BIOS and I will have to do all of this again. This post should save me a lot of time if that happens.

This is a computer as old as time that has always given me trouble. It was used for a few years, then went and found a home in the basement with all the other electronic garbage... but lately I have been short on RAM, and this machine had 24 gigs of it, so I thought I might move some processes to it. I fired it up and it fired up just fine, then I noticed a problem with the RAM speed!

The CPU on this board is a first-gen Core i7-980 (the 6-core one), and the RAM is all identical (Kingston KVR1333D3N9/4G).

BIOS Update

So before I go about bashing Gigabyte (they deserve it, as you will see through this post), I decided to start by updating the BIOS. I had version FA, and on the website there were 3 newer versions: FF, FG1, and FH. FH is obviously the one we are aiming for.

The motherboard is advertised as having a built-in BIOS update function (Q-Flash), so I started with that. Trying to update straight to FH failed (an error about an invalid BIOS file), so I decided to take them one by one: updating to FF worked, but FG1 failed. It turns out FG1 is where EFI was added to the BIOS, effectively doubling its size.

I am mentioning EVERYTHING here to spare you the hassle of trying unnecessary steps if you have the same motherboard

So I created a DOS boot flash disk and tried their update utility; that too failed, with an error complaining that the image is of the wrong size...

This is sort of a problem: I now need a copy of Windows 7, and the only copy of Windows I have is on my laptop, Windows 11, and that will surely not boot on a motherboard without UEFI!

Eventually, I found another very old computer with Windows, and what do you know, the update was only successful with Gigabyte's @BIOS program.

Before using the tool, I followed the instructions about disabling hyper threading on the CPU from the BIOS.

So now I have an up-to-date BIOS (and by up to date, I mean 2012). Let us get to the actual problem!

RAM issues

Ever since day 1, the RAM would report 1333 in the BIOS, but 400 MT/s in the operating system (CPU-Z on Windows, and "lshw -short -C memory" and "inxi -mxxx" on Linux).

The CPU itself only handles RAM up to 1066, so I was expecting the motherboard to fall back to that!

But when running any of the commands above, it reports 400MHz / 400MT/s!

400 is a very low number! Especially when another computer from the same era reports 1600 MT/s.

Not only that, but I can't seem to read the RAM manufacturer's name, for example.
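For what it is worth, dmidecode reads the SMBIOS/SPD tables from a different angle, which helps separate a misreporting tool from a misreporting BIOS:

# What the SMBIOS tables claim about each DIMM
sudo dmidecode -t memory | grep -iE "speed|manufacturer|part number"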

After hours of trying to convince the system to use different values, I was surprised when I ran sysbench ("sysbench memory run"); it did not yield the expected low results, but instead reported a transfer rate of 5GB/s.

So, up to now, I am under the impression that when the CPU slows down, it takes the reported RAM speed down with it. I will test this theory very soon, but for now, who cares; it is working and merely misreporting, and that is all I am interested in this very minute.

Which SSD should I buy

This is a question I get all the time from people who think I know my way around computers, so I will try to make this as simple as possible, because the people who ask are usually not the savviest when it comes to technology.

A table comparing DRAM-less SSDs to SSDs with DRAM is available here (nothing extraordinary, just to give you a feel for which variables get compared).

Storage space

There is honestly very little I can say about storage space that you don't already know; your workload and what you store on your computer are things you know all about. If you keep plenty of video content, you need a big disk; if you just browse the web, a very small 128GB disk will do. One more thing to keep note of is durability: in the world of SSDs, the TBW (terabytes written) endurance rating increases as the disk grows bigger, simply because flash cells have a limited number of write cycles; the larger the disk, the more flash cells, and the more data you can write before it goes bad. Some performance parameters also change slightly between different sizes of the same model, but I would not make a choice based on those numbers. So the space consideration boils down to "what are you planning to store on the disk, and how much space will that need".

Speed

The last time we spotted people to whom speed meant freedom of the soul was in 1997; ever since, speed has come to mean "time savings" more than anything. This is the area where things get a bit complicated; the list of keywords below is what this post covers:

  • Max transfer rate
  • IOPS
  • Max sustained rate
  • Burst rate
  • DRAM
  • SLC Cache
  • 4K reads/writes
  • HMB (Host Memory Buffer)
  • NAND flash memory for caching

Whether we are talking about spinning disks or SSDs, there is the maximum speed, which is basically a sequential read or write to the disk, and which, with NVMe (PCIe-connected SSDs), has reached crazy numbers; and then there is IOPS (input/output operations per second).
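If you want to put your own numbers on those two, fio is the usual tool; a sketch for 4K random reads (device name is a placeholder; reads are safe, but never point a write test at a disk holding data you care about):

# 30 seconds of 4K random reads, bypassing the page cache
sudo fio --name=rand4k --filename=/dev/sdX --rw=randread --bs=4k \
    --iodepth=32 --direct=1 --runtime=30 --time_based --ioengine=libaio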

Cheap SSDs don't come with DRAM, and for the average user there is no problem with that; most data is there for reading, and the SSD is hardly ever pressed into a workload where the difference is noticeable. So, do I need a disk with DRAM?

If you are just someone who plays games, runs a web browser, and boots Windows, the short answer is NO; the difference in time is not worth it.

Gradio

Gradio is all about user interfaces; it started as a startup that got acquired by Hugging Face!

In your Python code, you execute the following line:

import gradio as gr # The way people usually call it

Now, here is an example function that turns your text to uppercase:

def to_uppercase(text):
    return text.upper()

Now, to get a user interface, run the following code:

gr.Interface(fn=to_uppercase, inputs="textbox", outputs="textbox").launch()

You should now get 2 boxes: one for your input, and the other displaying your text in uppercase.

Now, imagine the to_uppercase function being a call to an AI, and there you have it.

Here is a variant with bigger boxes:

# Inputs and Outputs

view = gr.Interface(
    fn=to_uppercase,
    inputs=[gr.Textbox(label="Your message:", lines=6)],
    # outputs=[gr.Textbox(label="Response:", lines=8)],
    outputs=[gr.Markdown(label="Response:")],
    flagging_mode="never"
)
view.launch()

At this point, when you run this, you should get a link to the URL serving that user interface!

LM Studio

LM Studio is a great tool for running models locally on your machine, but it is somewhat more than that.

According to their intro, it is…

  • A desktop application for running local LLMs
  • A familiar chat interface
  • Search & download functionality (via Hugging Face 🤗)
  • A local server that can listen on OpenAI-like endpoints
  • Systems for managing local models and configurations
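That last point is the one that makes LM Studio more than a chat window: once the local server is running, anything that speaks the OpenAI API can talk to it. A sketch, assuming the default port of 1234 and whatever model name you have loaded:

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "local-model", "messages": [{"role": "user", "content": "Hello"}]}'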

stable-diffusion-webui from Automatic1111

This is a great front end for Stable Diffusion

The official Git repo for this is here. To install it alongside Stable Diffusion and everything else you need, make sure you are inside your local conda or pip (venv) environment, then run the following command:

wget -q https://raw.githubusercontent.com/AUTOMATIC1111/stable-diffusion-webui/master/webui.sh

Then make it executable and run it with the following commands:

chmod +x webui.sh
./webui.sh --listen --api

You can add Automatic1111 to your OpenWebUI (go to Settings/Images).

OpenWebUI

1- Installing as a Docker container

  • Install Docker like you would, by adding its repositories or however you are used to installing it
  • sudo apt install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
  • docker run -d --network=host -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 --name open-webui --restart always ghcr.io/open-webui/open-webui:main
  • To check whether it is running, run sudo docker ps
  • Now, localhost:8080 should have your OpenWebUI running
  • Sign up / create an account (bound to the local instance); the first account you create automatically becomes an admin account!

At this stage, you should be good to go.
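If the page does not come up, the container's logs are the first place to look:

# Follow the container logs to catch startup errors
sudo docker logs -f open-webui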