BeautifulSoup

BeautifulSoup is a Python package that lets you extract data from HTML files; it is very easy and intuitive to use.

Let us assume you have an HTML page.

First, suppose you want the title of that page:

from bs4 import BeautifulSoup
# response is assumed to be the result of requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
title = soup.title.string if soup.title else "No title found"

Now, suppose you want to drop everything that is not readable text (scripts, styles, images and form inputs). This short snippet removes those tags and puts whatever is left in a variable called text:

# Remove tags that carry no readable text
for irrelevant in soup.body(["script", "style", "img", "input"]):
    irrelevant.decompose()
text = soup.body.get_text(separator="\n", strip=True)
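To try these steps without fetching a real page, here is a minimal, self-contained sketch that runs the same extraction on a small made-up HTML string. The sample markup and the helper name extract_title_and_text are purely illustrative, and it assumes the bs4 package is installed (pip install beautifulsoup4). It also guards against pages without a <body>, in which case soup.body is None.

```python
from bs4 import BeautifulSoup

# Made-up sample page standing in for response.content
html = """<html><head><title>Demo page</title>
<style>p {color: red;}</style></head>
<body><h1>Hello</h1><script>var x = 1;</script>
<p>First paragraph.</p><img src="a.png"><input type="text">
<p>Second paragraph.</p></body></html>"""

def extract_title_and_text(markup):
    """Return (title, visible_text) for an HTML string."""
    soup = BeautifulSoup(markup, "html.parser")
    title = soup.title.string if soup.title else "No title found"
    body = soup.body
    if body is None:
        # html.parser does not invent a <body>, so fragments may lack one
        return title, ""
    # Drop tags that carry no readable text
    for irrelevant in body(["script", "style", "img", "input"]):
        irrelevant.decompose()
    # strip=True discards whitespace-only strings between tags
    text = body.get_text(separator="\n", strip=True)
    return title, text

title, text = extract_title_and_text(html)
print(title)
print(text)
```

The title goes on the first line, followed by the remaining text with one fragment per line; the script contents never appear because that tag was decomposed first.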

Python virtual environment with pip

If you are familiar with Python, you are probably also familiar with pip, and very likely with venv (virtual environments).

There is nothing AI-specific about this; it works exactly as it would for any other installation.

For AI, I would strongly recommend Anaconda, but if for some reason that is not an option, a plain venv will do the job.

So, start by installing Python 3. On Debian, I would just use the repository version with apt. At the time of writing, the repository ships 3.11 rather than the latest 3.13, but that is absolutely fine.

sudo apt update
# The following command should do
sudo apt install python3
# But I would much rather install everything Python-related in one go
sudo apt install build-essential wget git python3-pip python3-dev python3-venv \
    python3-wheel libfreetype6-dev libxml2-dev libzip-dev libsasl2-dev \
    python3-setuptools

With that out of the way, navigate to the project’s folder (assuming, for example, that you have downloaded a project) and create a virtual environment:

python3 -m venv venv
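The same environment can also be created from Python itself through the standard-library venv module, which is what python3 -m venv uses. A minimal sketch; the helper name make_env is hypothetical, and it creates the environment in a throwaway temporary directory with with_pip=False so it runs quickly:

```python
import os
import tempfile
import venv

def make_env(parent_dir, name="venv", with_pip=False):
    """Create a virtual environment at parent_dir/name and return its path."""
    env_dir = os.path.join(parent_dir, name)
    # Like `python3 -m venv venv`, except pip is only bootstrapped when with_pip=True
    venv.create(env_dir, with_pip=with_pip)
    return env_dir

with tempfile.TemporaryDirectory() as tmp:
    env_dir = make_env(tmp)
    # Every venv has a pyvenv.cfg file at its root
    has_cfg = os.path.isfile(os.path.join(env_dir, "pyvenv.cfg"))
    # Activation scripts live in bin/ on Linux/macOS and Scripts\ on Windows
    scripts_dir = "Scripts" if os.name == "nt" else "bin"
    script_files = sorted(os.listdir(os.path.join(env_dir, scripts_dir)))
    print(has_cfg, script_files)
```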

Now you can activate that environment with:

source venv/bin/activate
# On Windows, the above should look something like this instead
venv\Scripts\activate
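If you are ever unsure whether a venv is active, Python itself can tell you: inside a virtual environment, sys.prefix points at the venv while sys.base_prefix still points at the base installation. A small sketch (the helper name in_virtualenv is just for illustration):

```python
import sys

def in_virtualenv():
    """True when the running interpreter belongs to a venv."""
    return sys.prefix != sys.base_prefix

print("inside a venv" if in_virtualenv() else "using the base Python")
print("prefix:", sys.prefix)
```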

That is basically it. From within the activated venv, you now need to install the dependencies, either one by one with pip, or by pointing pip at a file that lists them, for example:

pip install -r requirements.txt
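When installing from a script, or when several Pythons live on the same machine, a common safeguard is to run pip as a module of the exact interpreter you want (python -m pip), so the packages land in that environment. A sketch assuming a requirements.txt in the current directory; the helper names pip_install_cmd and install_requirements are hypothetical, and nothing is installed unless you call install_requirements():

```python
import subprocess
import sys

def pip_install_cmd(requirements="requirements.txt", python=sys.executable):
    """Build the argument list for 'python -m pip install -r <file>'."""
    return [python, "-m", "pip", "install", "-r", requirements]

def install_requirements(requirements="requirements.txt"):
    """Run pip with the current interpreter; raises CalledProcessError on failure."""
    subprocess.run(pip_install_cmd(requirements), check=True)

# Only show the command; call install_requirements() to actually run it
print(" ".join(pip_install_cmd()))
```

Using sys.executable means that, when the script runs inside the activated venv, pip installs into that venv rather than into the system Python.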

You should now be good to go!

Setting up Anaconda for AI

What is Anaconda?

Conda, like pip, is a package manager for Python, but it is arguably the more thorough of the two: it has much better support for non-Python packages (pip’s support for those is very limited) and handles more complex dependency trees.

To clarify things: conda is the package manager, while Anaconda is a much bigger bundle. If you want conda on its own, you are probably looking for Miniconda. Anaconda is a large collection of packages that includes conda, NumPy, SciPy, Jupyter (formerly IPython Notebook), and so on.

So, let us go through installing and using Anaconda on all three platforms: Windows, Linux and Mac.

Linux

On Debian, there is no Anaconda package in the official repositories. To install Anaconda (or conda, or Miniconda for that matter), you download the installer script from Anaconda and run it. You can also install conda through apt (apt install conda) if you are willing to add Anaconda’s repository at “https://repo.anaconda.com”, but here I will assume you want the full Anaconda, and the standard way to get that is the installation script.

Download the Anaconda installer from the Anaconda website (there is a button that reads “Skip registration” if you don’t want to give them your email address):

https://www.anaconda.com/download

Navigate to the Downloads folder and run the script you just downloaded. In my case, the script was called Anaconda3-2024.10-1-Linux-x86_64.sh, so I ran the following:

cd /home/qworqs/Downloads
# Make the installer executable (no need for 0777, which also makes it world-writable)
chmod +x Anaconda3-2024.10-1-Linux-x86_64.sh
./Anaconda3-2024.10-1-Linux-x86_64.sh
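Before running a 1 GB installer, it can be worth confirming the download is intact by comparing its SHA-256 hash with the one Anaconda publishes for that installer (on Linux, sha256sum Anaconda3-2024.10-1-Linux-x86_64.sh does the same from the shell). A minimal Python sketch with hashlib; the helper names sha256_of and matches, and the script name check_installer.py, are hypothetical:

```python
import hashlib
import sys

def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Reading in chunks keeps memory use flat even for a 1 GB installer
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches(path, expected_hex):
    """True if the file's hash equals the published hex string (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()

if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python check_installer.py <installer.sh> <published-sha256>
    print("OK" if matches(sys.argv[1], sys.argv[2]) else "MISMATCH")
```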

The installer first asks you to review the license agreement: hit Enter, read through the agreement, and press q to exit it. You will then be asked whether you accept the terms; assuming you do, you will next be presented with:

Anaconda3 will now be installed into this location:
/home/qworqs/anaconda3

I accepted the suggested location.

I chose to keep the installer in the Downloads directory in case something goes wrong, but you can safely delete the roughly 1 GB installer if you like!

At the end of the installation, the installer offers to update your shell profile. I opted not to; if you opted otherwise, you can always set auto_activate_base to false, as the prompt explains:

Do you wish to update your shell profile to automatically initialize conda?
This will activate conda on startup and change the command prompt when activated.
If you'd prefer that conda's base environment not be activated on startup,
run the following command when conda is activated:

conda config --set auto_activate_base false

You can undo this by running `conda init --reverse $SHELL`? [yes|no]

Once I answered no, I was presented with the following message:

You have chosen to not have conda modify your shell scripts at all.
To activate conda's base environment in your current shell session:

eval "$(/home/voodoo/anaconda3/bin/conda shell.YOUR_SHELL_NAME hook)"

To install conda's shell functions for easier access, first activate, then:

conda init

Thank you for installing Anaconda3!

The environment

For convenience, let us start by adding our Anaconda3 installation to the system PATH.

First, edit either ~/.bashrc or ~/.bash_profile (depending on which one you have) and add the following line to the bottom of the file:

export PATH=~/anaconda3/bin:$PATH
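Putting ~/anaconda3/bin in front of $PATH matters because the shell scans PATH from left to right and uses the first matching executable it finds. The sketch below demonstrates that lookup rule with Python’s shutil.which and two throwaway directories, each holding a dummy conda script; the directories, the dummy files and the helper make_dummy_exe are purely illustrative, and a POSIX system is assumed:

```python
import os
import shutil
import stat
import tempfile

def make_dummy_exe(directory, name="conda"):
    """Create a tiny executable script called `name` inside `directory`."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path

with tempfile.TemporaryDirectory() as anaconda_bin, \
        tempfile.TemporaryDirectory() as system_bin:
    ours = make_dummy_exe(anaconda_bin)
    theirs = make_dummy_exe(system_bin)
    # Same order as: export PATH=~/anaconda3/bin:$PATH
    first_wins = shutil.which("conda", path=os.pathsep.join([anaconda_bin, system_bin]))
    # Reversed order: the other copy is found first
    last_wins = shutil.which("conda", path=os.pathsep.join([system_bin, anaconda_bin]))
    print(first_wins == ours, last_wins == theirs)
```

Appending instead (export PATH=$PATH:~/anaconda3/bin) would let any other conda or python already on the PATH take precedence.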

To apply the change, either close the terminal window and re-open it, or run “source ~/.bashrc” (or “source ~/.bash_profile”).

To check whether the magic happened, run “conda --version”; in my case, that returned “conda 24.9.2”.

From this stage on, conda is installed, but to make use of it you need a project! You can move on now (back to the index page), and I will explain how to run each project where needed. For completeness, though, I will assume here that you already have a project and put the instructions below. You know you are in the project directory when you see a YAML file, commonly called environment.yml.

Every project comes with a YAML file that lists its dependencies. Assuming that file is called environment.yml, cd into your project’s directory (the one containing the YAML file) and run the following command. It creates a separate Python environment for the project and installs every dependency listed in the file:

conda env create -f environment.yml
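The name you later pass to conda activate comes from the name: field at the top of environment.yml (unless you override it with -n). If you want to look it up from a script, here is a deliberately simple sketch that scans for that top-level key without a YAML library; the helper names env_name_from_yml and read_env_name and the sample file contents are hypothetical, and a real YAML parser such as PyYAML would be more robust:

```python
# Hypothetical example of an environment.yml file
SAMPLE_YML = """\
name: projectName
channels:
  - conda-forge
dependencies:
  - python=3.11
  - numpy
"""

def env_name_from_yml(text):
    """Return the value of the top-level 'name:' key, or None if absent."""
    for line in text.splitlines():
        # Top-level keys start at column 0, so indented 'name:' entries are ignored
        if line.startswith("name:"):
            value = line.split(":", 1)[1].split("#", 1)[0].strip()
            return value.strip("'\"") or None
    return None

def read_env_name(path="environment.yml"):
    """Read the environment name from a file on disk."""
    with open(path, encoding="utf-8") as f:
        return env_name_from_yml(f.read())

print(env_name_from_yml(SAMPLE_YML))  # → projectName
```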

Once it is done downloading and installing, you should get a message like the one below:

#
# To activate this environment, use
#
# $ conda activate projectName
#
# To deactivate an active environment, use
#
# $ conda deactivate

From then on, whenever you open a terminal and want to activate the environment:

1- conda init (you only need to do this once; conda init --reverse undoes it)
2- open a new shell
3- conda activate projectName (and conda deactivate to exit)
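Once an environment is activated, conda exports a couple of environment variables, notably CONDA_DEFAULT_ENV (the environment’s name) and CONDA_PREFIX (its directory), which gives scripts an easy way to check which environment they are running in. A small sketch (the helper name active_conda_env is illustrative; it accepts a plain dict so it can be tried without conda):

```python
import os

def active_conda_env(environ=None):
    """Return (name, prefix) of the active conda environment, or None."""
    env = os.environ if environ is None else environ
    name = env.get("CONDA_DEFAULT_ENV")
    if not name:
        return None
    return name, env.get("CONDA_PREFIX")

current = active_conda_env()
print(f"active conda env: {current[0]}" if current else "no conda environment active")
```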