I have been using OpenClaw recently, and my main concern with building this kind of application is: how do we make sure no action, execution or command goes unchecked?
Here is an example of how I would go about building a compliant assistant that is like Jarvis but doesn't create chaos:
Say I'm building an agent that can book travel, send emails and make purchases up to INR 10,000/-. How do I make sure it executes plans safely while control stays with me?
Safety Framework :
- Tier 1 : Read-only (no approval needed), search flights, get travel recommendations, check prices etc.
- Tier 2 : Low-risk actions (implicit approval), send drafts to user folders, add items to cart, create calendar holds.
- Tier 3 : Medium-risk actions (explicit approval), send emails on the user's behalf, book refundable travel, purchases of INR 2,000/- to 5,000/-
- Tier 4 : High-risk actions (multi-step approval), non-refundable bookings, purchases up to INR 10,000/-, access to sensitive data like cards etc.
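A rough Python sketch of how the four tiers could be encoded. The action labels (`search_flights`, `save_draft` and so on), the `Action` fields and the `classify` function are all hypothetical names made up for illustration. The tiers don't say where purchases under INR 2,000 land, so this sketch treats anything unrecognised as Tier 3, and it does not enforce the overall INR 10,000 cap.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    READ_ONLY = 1      # no approval needed
    LOW_RISK = 2       # implicit approval
    MEDIUM_RISK = 3    # explicit approval
    HIGH_RISK = 4      # multi-step approval


@dataclass
class Action:
    kind: str                        # hypothetical label, e.g. "search_flights"
    amount_inr: float = 0.0
    refundable: bool = True
    touches_sensitive_data: bool = False


# Assumed action labels; a real agent would have its own tool names.
READ_ONLY_KINDS = {"search_flights", "get_recommendations", "check_prices"}
LOW_RISK_KINDS = {"save_draft", "add_to_cart", "create_calendar_hold"}


def classify(action: Action) -> Tier:
    """Map an action to a tier; unknown actions escalate instead of slipping through."""
    if action.touches_sensitive_data or not action.refundable:
        return Tier.HIGH_RISK
    if action.amount_inr > 5000:
        return Tier.HIGH_RISK
    if action.kind in READ_ONLY_KINDS:
        return Tier.READ_ONLY
    if action.kind in LOW_RISK_KINDS:
        return Tier.LOW_RISK
    # Assumption: everything else (emails, refundable bookings, small purchases)
    # needs explicit approval.
    return Tier.MEDIUM_RISK
```

The high-risk checks run first on purpose, so a non-refundable or card-touching action can never be downgraded by its label.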
Example workflow :
User : Book me a flight to New Delhi next week
Agent Plan -
- Search flights [Tier 1 - proceed]
- Select best option based on preferences
- Book flight for INR 5,000/- [Tier 3 - request approval]
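The plan above can be sketched as a loop that pauses at any step of Tier 3 or higher. This is only an illustration: `run_plan` and the `approve` callback are hypothetical, the "select best option" step is assumed to be Tier 1 because the plan gives it no tier, and the steps are only logged, not executed.

```python
from typing import Callable, List, Tuple

# Each step is (description, tier number).
Plan = List[Tuple[str, int]]


def run_plan(plan: Plan, approve: Callable[[str], bool]) -> List[str]:
    """Walk the plan in order and stop at the first step the user declines."""
    log = []
    for description, tier in plan:
        # Tier 3 and above need an explicit yes from the user.
        if tier >= 3 and not approve(description):
            log.append(f"BLOCKED: {description}")
            break  # never continue past a rejected step
        log.append(f"DONE: {description}")
    return log


plan = [
    ("Search flights", 1),
    ("Select best option based on preferences", 1),  # assumed Tier 1
    ("Book flight for INR 5,000", 3),
]

# Example: a user who rejects the booking still gets the search results.
print(run_plan(plan, approve=lambda step: False))
```

Passing the approval prompt in as a callback keeps the gate testable without a real UI.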
Safeguard : Daily limit of INR 3,000/-, a weekly limit, and a monthly limit of INR 10,000/-
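One way such limits could be enforced is with rolling-window caps, sketched below. `SpendingLimiter` is a hypothetical class. The figures 3,000 and 10,000 come from the safeguard line, but which window each applies to, and treating a month as 30 days, are assumptions; a weekly cap can be added as another entry in the dict.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


class SpendingLimiter:
    """Tracks spend and rejects anything that would cross a rolling-window cap."""

    def __init__(self, caps_by_days: Dict[int, float]):
        # {1: 3000, 30: 10000} means INR 3,000 per day and INR 10,000 per 30 days.
        self.caps_by_days = caps_by_days
        self.history: List[Tuple[date, float]] = []

    def spent_in_last(self, days: int, today: date) -> float:
        start = today - timedelta(days=days - 1)
        return sum(amount for day, amount in self.history if start <= day <= today)

    def try_spend(self, amount: float, today: date) -> bool:
        """Record the spend and return True only if every cap still holds."""
        for days, cap in self.caps_by_days.items():
            if self.spent_in_last(days, today) + amount > cap:
                return False
        self.history.append((today, amount))
        return True


limiter = SpendingLimiter({1: 3000, 30: 10000})
day = date(2025, 1, 1)
print(limiter.try_spend(2000, day))  # True
print(limiter.try_spend(1500, day))  # False, would take the day to 3,500
```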
Transaction validation :
- Check against budget
- Flag unusual spending patterns
- Require extra approval for large purchases
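The three validation checks could look something like this. `validate`, `ValidationResult`, the 5,000 "large purchase" threshold and the "3x the average of past amounts" rule are all assumptions picked for illustration, not fixed parts of the framework.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class ValidationResult:
    allowed: bool
    needs_extra_approval: bool = False
    flags: List[str] = field(default_factory=list)


def validate(amount: float, remaining_budget: float, past_amounts: List[float],
             large_threshold: float = 5000.0,
             unusual_factor: float = 3.0) -> ValidationResult:
    # 1. Check against budget: a hard stop, no approval can override it here.
    if amount > remaining_budget:
        return ValidationResult(allowed=False, flags=["over_budget"])
    result = ValidationResult(allowed=True)
    # 2. Flag unusual spending: far above what this user normally spends.
    if past_amounts and amount > unusual_factor * mean(past_amounts):
        result.flags.append("unusual_amount")
    # 3. Large purchases need an extra approval step.
    if amount > large_threshold:
        result.needs_extra_approval = True
    return result
```

Flags do not block the transaction by themselves in this sketch; they are meant to be shown to the user alongside the approval request.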
Anomaly detection : Pattern monitoring
- Booking from unusual locations
- Purchases outside normal categories
- Time of day outside normal pattern
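A simple rule-based version of these three checks is sketched below. The `UserProfile` fields and `detect_anomalies` are hypothetical; a real system would probably learn the profile from history instead of hard-coding it.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class UserProfile:
    usual_locations: Set[str]
    usual_categories: Set[str]
    active_hours: range  # local hours the user normally transacts in


def detect_anomalies(profile: UserProfile, location: str,
                     category: str, hour: int) -> List[str]:
    """Return the reasons a transaction looks unusual (empty list means normal)."""
    reasons = []
    if location not in profile.usual_locations:
        reasons.append("unusual_location")
    if category not in profile.usual_categories:
        reasons.append("unusual_category")
    if hour not in profile.active_hours:
        reasons.append("unusual_time")
    return reasons


profile = UserProfile(
    usual_locations={"Mumbai", "New Delhi"},
    usual_categories={"travel", "groceries"},
    active_hours=range(8, 23),
)
print(detect_anomalies(profile, "Mumbai", "electronics", 3))
# ['unusual_category', 'unusual_time']
```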
User Control Mechanism :
Users can toggle :
- Can search and recommend
- Can create drafts
- Can send emails
- Can make purchases
- Can book refundable travel
- Can book non-refundable travel
- Spending controls (max per transaction, categories allowed or blocked)
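These toggles map naturally onto a small settings object. In the sketch below, the `Permissions` class, its field names and its defaults are assumptions; the defaults are deny-by-default for anything that sends or spends.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Permissions:
    can_search_and_recommend: bool = True
    can_create_drafts: bool = True
    can_send_emails: bool = False
    can_make_purchases: bool = False
    can_book_refundable_travel: bool = False
    can_book_non_refundable_travel: bool = False
    max_per_transaction_inr: float = 0.0
    allowed_categories: Optional[Set[str]] = None   # None means no allow-list
    blocked_categories: Set[str] = field(default_factory=set)

    def may_purchase(self, amount_inr: float, category: str) -> bool:
        """Apply the purchase toggle and the spending controls together."""
        if not self.can_make_purchases:
            return False
        if amount_inr > self.max_per_transaction_inr:
            return False
        if category in self.blocked_categories:
            return False  # a block always wins over an allow
        if self.allowed_categories is not None and category not in self.allowed_categories:
            return False
        return True


prefs = Permissions(can_make_purchases=True, max_per_transaction_inr=5000,
                    blocked_categories={"gambling"})
print(prefs.may_purchase(1200, "travel"))    # True
print(prefs.may_purchase(1200, "gambling"))  # False
```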
Smart Defaults : The first time the agent does something, always require approval; after 5 successful transactions, allow with notification; after 20 successful transactions, become fully autonomous (unless high risk).
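The trust ladder can be written as one small function. This is a sketch: `approval_mode` and `Mode` are hypothetical names, and two things are assumed, namely that approval stays required for the whole stretch before the 5th success, and that high-risk actions require approval at every stage.

```python
from enum import Enum


class Mode(Enum):
    REQUIRE_APPROVAL = "require_approval"
    NOTIFY = "allow_with_notification"
    AUTONOMOUS = "autonomous"


def approval_mode(successful_count: int, high_risk: bool) -> Mode:
    """Pick how much freedom the agent gets for one kind of action."""
    # Assumption: high-risk actions never graduate, however good the track record.
    if high_risk or successful_count < 5:
        return Mode.REQUIRE_APPROVAL
    if successful_count < 20:
        return Mode.NOTIFY
    return Mode.AUTONOMOUS


for count in (0, 5, 20):
    print(count, approval_mode(count, high_risk=False).value)
```

Counting successes per action type (for example, emails separately from bookings) would keep trust earned in one area from leaking into another.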
Keep an audit trail of all agent actions for, say, 90 days : spending summary, savings vs manual (time + money) and errors/corrections.
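A minimal in-memory sketch of such an audit trail with a 90-day window. `AuditTrail` and `AuditEntry` are hypothetical, there is no persistence, and the "savings vs manual" figure is left out because it needs a baseline that is not defined here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class AuditEntry:
    timestamp: datetime
    action: str
    amount_inr: float = 0.0
    corrected: bool = False  # True if the user had to fix or undo the action


class AuditTrail:
    def __init__(self, retention_days: int = 90):
        self.retention = timedelta(days=retention_days)
        self.entries: List[AuditEntry] = []

    def record(self, entry: AuditEntry) -> None:
        self.entries.append(entry)

    def prune(self, now: datetime) -> None:
        """Drop entries older than the retention window."""
        cutoff = now - self.retention
        self.entries = [e for e in self.entries if e.timestamp >= cutoff]

    def summary(self, now: datetime) -> Dict[str, float]:
        self.prune(now)
        return {
            "actions": len(self.entries),
            "total_spent_inr": sum(e.amount_inr for e in self.entries),
            "corrections": sum(1 for e in self.entries if e.corrected),
        }
```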
Metrics for safety :
Leading Indicators :
- Approval rejection rate (target : less than ~5%)
- User overrides (target : less than ~10%)
- Rollbacks / cancellations (target : less than ~2%)
Lagging Indicators :
- User complaints about agent actions
- Unauthorized actions (target : 0%)
- Financial disputes (target : 0%)
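To show how the leading indicators could be checked against their targets, here is a small sketch. The function name and the denominators are assumptions: rejections are divided by approvals requested, while overrides and rollbacks are divided by total actions.

```python
from typing import Dict, Tuple


def rate(count: int, total: int) -> float:
    return count / total if total else 0.0


def leading_indicators(approvals_requested: int, approvals_rejected: int,
                       actions_total: int, user_overrides: int,
                       rollbacks: int) -> Dict[str, Tuple[float, bool]]:
    """Return each indicator as (value, within_target)."""
    targets = {
        "approval_rejection_rate": 0.05,
        "user_override_rate": 0.10,
        "rollback_rate": 0.02,
    }
    values = {
        "approval_rejection_rate": rate(approvals_rejected, approvals_requested),
        "user_override_rate": rate(user_overrides, actions_total),
        "rollback_rate": rate(rollbacks, actions_total),
    }
    return {name: (value, value < targets[name]) for name, value in values.items()}


report = leading_indicators(approvals_requested=100, approvals_rejected=4,
                            actions_total=200, user_overrides=30, rollbacks=2)
for name, (value, ok) in report.items():
    print(f"{name}: {value:.1%} {'OK' if ok else 'ABOVE TARGET'}")
```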
Success Criteria :
- User Trust score > 80%
- Task success rate > 95%
- Zero unauthorized purchases
- Average time saved : 2 hrs/week
I'll try this as my framework to build my OpenClaw ripoff, use local LLM models and have my own safe, compliant Jarvis.
