This blog has plenty of posts about AI, some are about AI tools, others are about installing AI locally, so this post is where I am putting all the AI stuff I have ever blogged about in one place !
The section Local AI is about creating your own AI server using freely available sources, the API section lists all the services that provide an API but can not be installed locally, and the Online Services is where you can get things done via AI online (That can’t be installed locally or accessed programmatically via API)
How to use: Keep this page open, and open the links within the steps as you go…. resources are on separate pages and linked to from this page to keep this page manageable in size.
Here are the steps to get you up to speed with AI, if you follow the simple steps in this guide, you will be playing with all the AI stuff and creating projects based on AI in no time, IT IS MUCH EASIER THAN YOU THINK, All you need is some shallow understanding of python
Step By Step
The following steps are mostly not prerequisites for each other, I am forced to pick a sequence for learning on your behalf, but feel free to jump around.
- Environment: The first thing you need to do is install a python environment, this is either through Anaconda or venv (Anaconda preferred, venv is the alternative)
- Run locally with Ollama: To run AI models locally, you may want to start with Ollama – Ollama makes it amazingly easy to run so many models with one line on smaller less capable systems (Like your PC that has no GPU), this is because Ollama compiles those models from C++ ! but that comes at a price, as you you don’t control things the way you would in HF transformers library. Well, you can, but it is a lot of work.
- Cloud/API pay per use Models: For learning purposes, or for production, you may want to use Frontier AI solutions, ChatGPT and Claude chat may be amazing tools for chatting with a very high tech AI, but those chat services are completely separate from the API that you will need to programmatically access those AI engines, so at this stage, you may want to obtain some API keys from OpenAI (ChatGPT), or Anthropic (Claude), you can also get keys for many other systems, one very cheap option is DeepSeek (3% of the cost of ChatGPT), or in other words, thirty times cheaper, it is also amazing that you can run DeepSeek V3 locally if you have the hardware for it !, there are many more and I can not mention them all in one paragraph, So i will be compiling them here
- ENV File: Once you have created API keys from one of the systems above, you would want to incorporate them into your .env file, for a .env file, all you need to do is add a .env file to your project’s directory, and the file should be as explained here (your .env file)
- Jupyter Notebooks: Run code inside a Jupyter Notebook, learn to use it, and learn to use Google Colab (hosted Jupyter Notebook)
- Types of prompts: Prompting through API
Dev & Local AI
Environment setup
- Hugging Face transformers library – For running LLMs locally, you get access to the python code and the pytorch code which has the model,
- LangChain: A framework, an abstraction layer so you can use the same code with multiple APIs
- Gradio
- Weights and biases
LLM & Frontier
- Meta / Llama
- Google / Gemma: (Open source variant of gemeni)
- Mistral / Mixtral: A mix of experts
- Alibaba Cloud: Qwen
- Microsoft : Phi
Creating images with AI (Local)
- VQGAN (Vector Quantized Generative Adversarial Network / neural network) : The software that generates the image
- CLIP (Contrastive Language-Image Pre-training / neural network) : Software to influence a generated image based on input text (User prompt)
- VQGAN+CLIP : Two neural network pieces of software that work in tandem.
- CLIP-Guided-Diffusion: A technique for doing text-to-image synthesis cheaply using pre-trained CLIP and diffusion models.
- Google colab notebook: A tool made by google where you can run python code and utilize google’s GPUs, both paid and free exist
Transcribe Audio
- OpenAI’s whisper, the undesputed champion of transcribing audio to text
Text To Speech
- Tortoise and Bark for Voice Synthesis
Online
Only stuff that I have tried or know about, and it only gets its own blog post if i have enough to say about it, if all i have to say are a couple of lines, The entry will be explained in place
LLM & Frontier
- OpenAI / ChatGPT: Available through website and API, the world’s most popular LLM
- Anthropic / Claude:Available through website and API, the second most popular LLM
- Google / Gemeni
- Cohere / Command R
- Perplexity: (A search engine that can either use other models, or its own model)
Managed Cloud
- Amazon Bedrock
- Google vertex
- Azure ML
Direct chat with frontier models
- ChatGPT: OpenAI’s LLM
- Claude AI: Anthropic’s LLM
- llama : Meta’s AI
- Gemeni: Google’s model
- Command R: cohere’s model
- Perplexity: Perplexity’s model
Audio
- Turboscribe: https://turboscribe.ai
Tried to feed it a file with 2 people, one speaking with a Jordanian accent, and another with a Saudi accent, the results were 8/10, the system seems to allow 3 files 30 minutes each for free users. Probably Powered by OpenAI Whisper engine.
Other relevant resources
- Selenium: Web browser automation, good for data scraping
- playwright: End to end testing for web apps
The best model for the job
Qwen 3.5 seems to be the best model for multi-lingual applications
Terms
- Agentic AI: A number of agents, each tuned to play a role, working together to solve a problem
- Parameters of a model: Also known as model weights, the number of parameters in a model is the number of decision “nodes”Switches”, as a model trains, new weights are added and weigt values are changed
- A token: In the early days, a token was 1 charracter, later, words became tokens, but that was problematic in terms of dictionary size, so today, tokens are chunks of letters that are commonly found in words, so a word may consist of 2 tokens for example, as a rule of thumb, 100 tokens are around 70 words. there is a tool called tokenizer to show you how many tokens are in a sentence, the tokenizers differ from provider to provider since there is no standard set of tokens
- Context window: the number of tokens that can be used in a conversation, it is basically the sum of all the prompts up to that point, added to all the output of the LLM that is being passed as input in the next request, and by prompts, we mean both system and user prompts