26 Feb 2025 4 min read

Can I Build My Own JARVIS? Let’s Find Out.

For years, I’ve been fascinated by AI assistants. They are useful, sure, but they always seem to be missing something. We all want a JARVIS from Iron Man or the computer from Star Trek that just gets us and responds naturally. Of course, I fully realize how absurdly difficult this is. But thinking like a scientist, my goal isn’t success or failure—it’s running experiments, seeing what works, and learning from the outcomes. We’ve all seen the sci-fi dream of a true AI companion, something more like JARVIS from Iron Man than a glorified voice remote. That’s the inspiration behind this project: I want to build a home AI that goes beyond simple automation and becomes a real, intelligent assistant.

This series will document my journey in creating a home AI system from scratch, covering everything from hardware selection to AI model customization and real-world testing. There will be plenty of trial and error, unexpected challenges, and exciting breakthroughs along the way. Whether you're an AI enthusiast, a developer, or just someone curious about where this technology is heading, I hope this series gives you insight into the process of building a smarter assistant.

The Vision: What I Want This AI to Do

Most voice assistants today are limited—they can turn on your lights, play music, or answer basic questions, but they lack real context, memory, and adaptability. My goal is to create an AI that can:

Understand and remember context across conversations
Control smart home devices intelligently, not just on command
Use only local LLMs and prioritize privacy
Proactively engage in conversations while remaining unobtrusive—this is more than just a technology challenge, and I have a lot of thoughts on it
I want to build this entirely custom for my home, which will help me cut a lot of corners.

This isn’t about reinventing the wheel. It’s about running experiments, analyzing the results, and learning from them. Whether this works or not, I’ll gain valuable insights into the application of LLMs while using them to build the very thing they power.

Capturing the Vision: My Project Notes

Before diving into the technical details, I outlined my core goals and priorities:

In the spirit of sharing, here are my early notes. Be kind:

Core AI Model and Local Processing

JARVIS with Home Assistant (HA) integration and a quantized large language model running locally, starting with Llama 3.3:70B, but I’m open to experimenting with other large LLMs (also, if anyone has a spare RTX 5090 lying around, let’s talk)
100% local processing for LLMs, with online access only for external data like weather and news

Context Awareness & Interaction

AI should handle multi-step conversations with continuity
Example:
- Me: “What is the temp in the kitchen?”
- JARVIS: “It’s 69 in the kitchen”
- Me: “Living room?”
- JARVIS: “71”
No "hardcoded" responses—except for error handling (e.g., “I’m unable to respond right now”) and a few edge cases. Responses should remain dynamic.

Secondary AI Persona: DRADIS (Direction, Range, and Distance)

A separate persona for direct, fast-response queries
Example commands: “Who is home?”, “Are any windows open?”, “Turn on the security system”, “Who is at the front door?”
Terse responses, optimized for speed and responsiveness
Conditional logic is acceptable for efficiency

Additional Features & Future Enhancements

Voice Training & Recognition – AI will recognize family members based on their voice (already working). It will also handle edge cases, such as recognizing guests or unknown voices, and determine appropriate responses based on context. – AI will recognize family members based on their voice (already working)
Contextual Addressing – Distinguishing between users to provide appropriate responses (e.g., addressing individuals properly in conversations)
Proactive Engagement – The AI won’t just respond to queries but will initiate conversations based on context (e.g., notifying when someone is at the front door, letting you know it's about to rain if there is a window open)

Guiding Principles

Privacy-first – Even if the system remains entirely local, security and privacy are key.
Good enough code is good enough – Priority is usability, with quality code as a secondary step. (The nicest way to say it is that it will be messy.) There are a few things to my benefit that will help with performance, like the speed of my local network.
Use existing infrastructure – No special phone apps or reliance on GPS. I have home security cameras, but they will remain as such. The system can leverage them, but no additional cameras will be added to support this. They remain, first and foremost, security cameras—not creepy home spies.
Visual processing – Utilize OpenCV and MotionEye for camera-based analysis
Share progress – Blog, LinkedIn, YouTube, and potential open-source contributions
Seamless experience – The AI should feel like it’s everywhere, always listening and responding naturally
I will fully leverage LLMs' code-writing ability to generate the code.

What’s Coming in This Series

Each post will focus on a different aspect of development, including:

Choosing the right hardware for an always-on AI and the mistakes I’ve made so far.
Selecting and augmenting LLMs for natural conversations and the interesting problems that poses, like streaming the text responses back over as spoken audio.
Building an AI “workflow" that integrates speech, memory, and automation
Testing real-world interactions and improving performance
Exploring the challenges and limitations of today’s AI tools

This is a passion project, and like any real innovation, it won’t be a straight path. I’ll be sharing my wins, my failures, and the lessons learned as I push this idea forward.

I’d love to hear your thoughts as this project evolves. What would you want a home AI to do? Let’s explore the possibilities together.

Next up, I’ll dive into defining the vision and key challenges of building an AI assistant that actually feels intelligent, and some work I've already done that made me confident enough to start a blog...

Finally, it won’t be called J.A.R.V.I.S. nor will it sound like Paul Bettany. Don't sue me, Disney.

OK, maybe it will sound a little like Paul...