Alibaba Cloud Log Service (SLS): A Practical Guide
So, you’ve decided to use Alibaba Cloud Log Service (SLS). Congratulations! You’re about to bring order to the chaotic universe of logs: those mysterious text lines that appear when something breaks, disappear when you need them most, and somehow grow legs and run into the dark when you’re trying to debug at 2 a.m.
Don’t worry. This guide is here to keep your logs from escaping. We’ll take you from the basics (what SLS is and what pieces you’ll see in the console) to real setup steps (collecting, parsing, indexing, exploring, alerting), and we’ll finish with troubleshooting and practical advice that saves time, money, and at least a few gray hairs.
What Is Alibaba Cloud Log Service (SLS)?
Alibaba Cloud Log Service, often shortened to SLS, is a managed platform for collecting, storing, searching, and analyzing logs. It helps you centralize logs from applications, servers, containers, gateways, or even middleware. Instead of hopping between machines, searching local files, and praying to the log gods, you send your logs to SLS where you can query them quickly using built-in tools.
Think of SLS as a well-organized library for your telemetry. You can:
- Ingest logs from sources (agents, APIs, SDKs, or other services).
- Parse and structure logs (so “random text soup” turns into fields like timestamp, status code, userId, requestId).
- Index key fields to speed up searching.
- Query logs using a powerful search language.
- Create dashboards and alerts to monitor systems continuously.
SLS doesn’t just store logs. It gives you an investigation workflow: find, filter, analyze, and respond. And when your production system sneezes, you’ll hear about it before the rest of the city catches a cold.
Core Concepts You’ll See in SLS
Before you click buttons, it helps to understand the building blocks. SLS has a few important terms that sound simple until you’re staring at an interface wondering which one you’re supposed to create first.
Project
A project is a top-level container in SLS. You can group related log stores under a project. Depending on how you structure your organization, you might create separate projects per environment (dev, staging, prod), per team, or per product.
If you’re using cost controls or access policies, projects are often where you apply them at a broad level.
Logstore
A logstore is where your actual logs land. Each logstore has its own settings for data retention, indexing configuration, and ingestion details. You can create multiple logstores under one project, separating different types of logs (for example: application logs, access logs, audit logs, and error logs).
Logstores are like “folders” that still behave like “databases” because they store data and support querying.
Log topic / Data ingestion channel
SLS uses the concept of topics to classify incoming data and route it correctly. Depending on the ingestion method, your configuration includes topic-related settings so SLS knows how to interpret your incoming messages.
You might not think about topics in early setups, but eventually you’ll notice your agent or sender uses a topic parameter. It’s basically the address label on your data shipment.
Index and fields
Indexes make searching faster. But indexing everything is like cataloging every item in your kitchen: it works, but you might regret it when the bill arrives. The trick is to index what you commonly filter on: serviceName, environment, requestId, statusCode, or errorCode.
Parsing and indexing work together: parsing extracts fields; indexing makes those fields searchable efficiently.
Dashboards and alerts
Dashboards help you visualize log-derived metrics (for example, request rate, error counts, latency distributions if you have enough data). Alerts trigger notifications when certain conditions occur (for example, “error rate exceeded 2% for 5 minutes”).
In other words: you turn logs into early warning signals. Very satisfying. Slightly addictive.
Planning Your SLS Setup (Before You Ingest a Single Byte)
It’s tempting to rush straight into “create logstore, install agent, send logs, hope for the best.” Please don’t. Your future self will thank you if you plan a little first.
Decide what logs you’ll collect
Common choices:
- Application logs (INFO/WARN/ERROR)
- HTTP access logs (method, path, status, latency)
- Audit logs (who did what)
- Infrastructure logs (system events, container logs)
- Custom event logs (business events like checkout failure)
If you’re unsure, start with a focused set: error logs and request logs. You can expand later.
Choose environments and retention
Retention determines how long SLS keeps the data. Longer retention is convenient, but it increases cost. A practical approach is:
- Dev/staging: shorter retention
- Prod: longer retention (at least enough to cover incident investigation cycles)
Also decide how long you need logs for compliance or troubleshooting. If you don’t know, pick a starting point and refine after you see usage patterns.
Pick parsing strategy
Logs usually come in two forms:
- Structured logs (JSON): easier parsing, less pain.
- Unstructured logs (plain text): requires parsing rules (regex, key=value patterns, etc.).
If you can control your application, JSON logs are a gift. If not, SLS can still parse, but you’ll spend more time building patterns. Which is fine—unless your patterns turn into spaghetti.
Step-by-Step: Creating Your First Project and Logstore
Let’s do the setup in a clean order. Your exact UI might vary slightly, but the workflow is generally consistent.
Step 1: Create a Project
Go to the SLS console and create a new project. Give it a meaningful name. For example:
- prod-web
- staging-api
- payments-service
The project should match how you want to manage permissions and costs.
Step 2: Create a Logstore
Under your project, create a logstore. You’ll usually choose settings like:
- Logstore name (for example: access_logs, app_logs, error_logs)
- Time zone and ingestion/data format (depending on options)
- Retention period
- Indexing and parsing settings (often configured later too)
If you’re collecting access logs, you may want fields like statusCode, requestPath, and responseTime. For application logs, you might want logLevel, errorCode, serviceName, and requestId.
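If you’d rather script these two steps than click through the console, the open-source aliyun-log Python SDK exposes equivalents. Here’s a minimal sketch, with the endpoint, credentials, and project/logstore names as placeholders you’d replace:

```python
# pip install aliyun-log-python-sdk
from aliyun.log import LogClient

# Placeholder endpoint and credentials; use your region's endpoint and
# keys from a securely stored RAM user, never hard-coded in real code.
client = LogClient("cn-hangzhou.log.aliyuncs.com",
                   "YOUR_ACCESS_KEY_ID", "YOUR_ACCESS_KEY_SECRET")

client.create_project("prod-web", "Production web logs")
# ttl is retention in days; shard_count affects ingestion parallelism.
client.create_logstore("prod-web", "app_logs", ttl=30, shard_count=2)
```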
Step 3: Configure Log Topic / Ingestion settings
SLS requires a mapping between incoming data and the logstore. Depending on your collection method, you’ll set up a topic, choose where data comes from, and generate credentials or configuration for your agent or sender.
This is the part where people sometimes wonder, “Why are my logs not showing up?”—which usually boils down to the sender pointing to the wrong logstore or topic. We’ll discuss troubleshooting later, so don’t panic yet.
Ingesting Logs: The Practical Methods
There are a few ways to get logs into SLS. The best method depends on where your logs originate and how your infrastructure is built (VMs, Kubernetes, ECS, on-prem, etc.).
Method A: Using an SLS Log Agent
An agent is the most common approach for VM or container environments. You install it on the machines that produce logs. The agent reads log files or container stdout, then ships them to SLS.
Typical workflow:
- Install the agent.
- Configure input paths (log files) or container log sources.
- Configure parsing rules or assume JSON.
- Set SLS credentials and endpoint.
- Start the agent and validate ingestion.
Tip: keep your log file naming consistent and avoid “mystery” locations. The agent config should be boring and reliable, like a toaster.
Method B: Sending logs via API/SDK
If your application can send log events directly, you might use an SDK or API to push logs to SLS. This approach is useful when:
- You want to send structured events only (no file tailing).
- You’re in serverless environments.
- You want to control batching and format precisely.
However, be careful with volume: sending too many events directly from app threads can create overhead. Most teams handle this with buffering or async pipelines.
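As an illustration, here is a minimal sketch using the open-source aliyun-log Python SDK; the endpoint, credentials, and project/logstore names are placeholders, and real code would batch and send asynchronously as noted above:

```python
import time
from aliyun.log import LogClient, LogItem, PutLogsRequest

client = LogClient("cn-hangzhou.log.aliyuncs.com",
                   "YOUR_ACCESS_KEY_ID", "YOUR_ACCESS_KEY_SECRET")

# One structured event; keys become searchable fields once indexed.
item = LogItem()
item.set_time(int(time.time()))
item.set_contents([
    ("logLevel", "ERROR"),
    ("serviceName", "checkout"),
    ("errorCode", "E123"),
    ("message", "payment gateway timeout"),
])

# In production, accumulate items and ship them off the request path.
req = PutLogsRequest("prod-web", "app_logs", topic="", source="",
                     logitems=[item])
client.put_logs(req)
```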
Method C: Integrations with other Alibaba Cloud services
Depending on your environment, there may be built-in integrations with other services that produce logs. You can route those logs to SLS without writing custom collection code.
If you’re already using the Alibaba Cloud ecosystem, integrations can save time and reduce operational burden. Still, always verify that the logs arrive with the fields you expect.
Parsing Logs: Turning Noise into Usable Fields
Once logs arrive in SLS, you’ll want them to be searchable. That’s where parsing comes in. Parsing extracts fields from raw log lines.
Think of parsing as turning your messy desk into labeled drawers. Your future investigation self will feel like they’re living in a calm productivity montage.
Structured (JSON) logs
If your logs are JSON, parsing is straightforward. You configure SLS to treat the log line as JSON, then map JSON keys into searchable fields.
Practical recommendations:
- Use stable key names (avoid changing “userID” vs “userId” across releases).
- Include a timestamp field if you can control it.
- Include requestId or traceId to connect events.
- Keep log messages consistent so dashboards and alerts don’t silently break.
JSON is the “easy mode.” Even if you’re still debugging, it’s usually the kind of debugging where the answer is “your JSON wasn’t actually valid.” That’s easier than regex purgatory.
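If you control the application, stable JSON is mostly a formatter concern. A minimal Python sketch; the serviceName value and the requestId plumbing are assumptions for illustration:

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line, with stable keys."""
    def format(self, record):
        return json.dumps({
            "timestamp": int(record.created),
            "logLevel": record.levelname,
            "serviceName": "checkout",  # assumed; set per service
            "requestId": getattr(record, "requestId", None),
            "message": record.getMessage(),
        })

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.error("payment failed", extra={"requestId": "req-42"})
# {"timestamp": ..., "logLevel": "ERROR", "serviceName": "checkout", ...}
```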
Unstructured text logs
If your logs look like:
“2026-04-30 12:34:56 ERROR Something failed: code=E123 msg=timeout user=42”
You can parse key=value segments or use regex patterns. In SLS, you’ll configure parsing rules such as regular expressions, key-value extraction, or grok-like patterns depending on available features.
Regex is powerful, but also capable of producing artful chaos. Here are some rules of thumb:
- Start small: parse only the fields you truly need.
- Validate with a handful of log samples.
- Write patterns that tolerate variability (optional spaces, variable digits).
- Avoid capturing too much with “.*” unless you really mean it.
When parsing goes wrong, SLS might still ingest logs, but fields may be missing or empty. That can make your searches look like nothing happened, when actually your fields didn’t get extracted.
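Before committing a pattern in SLS, it pays to dry-run it against real samples. Here’s a quick offline check of the example line above, using plain Python regex rather than SLS’s own rule syntax:

```python
import re

LINE = "2026-04-30 12:34:56 ERROR Something failed: code=E123 msg=timeout user=42"

# Timestamp, severity, then the free-form remainder.
PATTERN = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<rest>.*)"
)
# key=value pairs anywhere in the remainder.
KV = re.compile(r"(\w+)=(\S+)")

m = PATTERN.match(LINE)
if m:
    fields = m.groupdict()
    fields.update(dict(KV.findall(fields.pop("rest"))))
    print(fields)
    # {'ts': '2026-04-30 12:34:56', 'level': 'ERROR',
    #  'code': 'E123', 'msg': 'timeout', 'user': '42'}
```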
Indexing: Make Queries Fast (Without Paying for Everything Everywhere)
Indexing determines which fields SLS can search efficiently. If you know you’ll frequently query by certain fields, index them.
For example, you might index:
- logLevel
- serviceName
- statusCode
- errorCode
- requestId / traceId
- environment
But maybe you don’t index:
- full message text (unless you query it often)
- high-cardinality fields like random IDs you never filter by
- entire payload blobs
High-cardinality indexes can increase storage and cost. The goal is to index what’s useful for filtering and grouping, not to index every last detail because “it might be important later.” (Spoiler: later is usually a different filter entirely.)
Querying Logs in SLS
Now the fun part: searching. Once logs are parsed and indexed, you can run queries to filter and inspect events. The interface typically offers a query editor with options to search by time range, fields, and message patterns.
Even without knowing every exact query syntax detail, you should get comfortable with a few patterns:
- Time filtering: “Show me the logs from the last 15 minutes.”
- Field filtering: “Show me where statusCode is 500.”
- Free text search: “Find logs containing ‘timeout’.”
- Combining conditions: “statusCode=500 AND serviceName=checkout.”
- Sorting and limiting results: “Show the most recent errors.”
If you’re building queries for dashboards, you’ll often aggregate counts by time buckets (for example, count errors per minute) and by categories (like errorCode).
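If you’d rather script queries than use the console, the same search-then-analyze pattern can run through the aliyun-log Python SDK. A hedged sketch, assuming the field names from the earlier parsing examples are indexed with analytics enabled; endpoint, credentials, and names are placeholders:

```python
import time
from aliyun.log import LogClient, GetLogsRequest

client = LogClient("cn-hangzhou.log.aliyuncs.com",
                   "YOUR_ACCESS_KEY_ID", "YOUR_ACCESS_KEY_SECRET")

# Search clause before the pipe filters; SQL after it aggregates.
query = ("statusCode: 500 and serviceName: checkout "
         "| SELECT errorCode, count(*) AS cnt "
         "GROUP BY errorCode ORDER BY cnt DESC")

now = int(time.time())
req = GetLogsRequest(project="prod-web", logstore="app_logs",
                     fromTime=now - 900, toTime=now, query=query)
res = client.get_logs(req)
print(res.get_count(), "rows")
```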
Pro tip: When investigating incidents, always anchor your queries with requestId/traceId if you have them. Otherwise, you’re basically debugging in the dark while wearing a blindfold made of alerts.
Dashboards: Turning Logs into Visible Truth
Dashboards help you understand trends without manually running queries. Many SLS setups include dashboards that plot:
- Request rate over time
- Error count and error rate
- Top paths or endpoints by traffic
- Top error codes
- Latency percentiles (if you log latency fields)
A good dashboard answers questions quickly:
- Are we currently having errors?
- Did errors start after a deployment?
- Which service/endpoint is responsible?
- Is it a specific error code or a general failure?
When building dashboards, use consistent field names and ensure parsing is correct. Otherwise, your graphs will insist that something is happening without giving you a readable reason why.
Alerting: Don’t Let Problems Wait for Morning
Alerts trigger when log-derived conditions are met. Common alert triggers:
- Error rate exceeds threshold
- Specific error code appears frequently
- ERROR-level log volume rises above its baseline
- Latency crosses a threshold
- No logs received from a critical service (yes, silence is sometimes a problem)
Good alert design is about reducing noise. A useless alert is just a confident announcement of your system’s failure to communicate.
Practical tips:
- Use time windows (evaluate over 5 minutes or 10 minutes).
- Avoid single-event alerts unless the event is truly catastrophic.
- Include actionable context in notifications (serviceName, environment, errorCode).
- Test alerts in staging so you don’t wake up everyone for a false positive.
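As a concrete (and hypothetical) example of the “error rate exceeds threshold” trigger above, an alert can be driven by a query that computes the rate; the field names follow the earlier examples, and the threshold lives in the alert rule, not in the query itself:

```python
# Hypothetical alert query: ERROR share of all logs, evaluated by the
# alert schedule over its configured window (field names assumed from
# the earlier parsing examples).
ALERT_QUERY = (
    "* | SELECT "
    "sum(CASE WHEN logLevel = 'ERROR' THEN 1 ELSE 0 END) * 100.0 / count(*) "
    "AS error_rate"
)
# Condition configured in the alert rule, not the query:
# fire when error_rate > 2 for two consecutive 5-minute evaluations.
```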
Security and Access Control
Logs can contain sensitive information. That means you must treat SLS access like you treat passwords: carefully, and ideally not by writing them on sticky notes.
Use least privilege
Create roles and policies that grant only the necessary permissions. Typically:
- Developers might need read/query permissions.
- Operators might need dashboard/alert management.
- Ingestion agents need write permissions to specific logstores.
Protect credentials used by agents
If your ingestion method requires access keys or token credentials, store them securely and rotate them when needed. Avoid embedding secrets directly in source code repositories.
Be mindful of personally identifiable information (PII)
Before shipping logs to SLS, consider whether your logs might contain:
- Usernames/emails
- Phone numbers
- IP addresses
- Session tokens
- Payment-related data
If yes, consider masking or redacting sensitive fields at the source or during ingestion/parsing. Many teams implement a “log hygiene” policy so that debugging doesn’t accidentally become data leakage.
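If redaction has to happen at the source, even a small scrubbing pass helps. A minimal sketch, assuming a Python emitter; the patterns are illustrative, not exhaustive:

```python
import re

# Hypothetical redaction pass applied before logs leave the host.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SECRET = re.compile(r"\b(token|session)=\S+")

def scrub(line: str) -> str:
    line = EMAIL.sub("<email>", line)
    line = SECRET.sub(r"\1=<redacted>", line)
    return line

print(scrub("login ok user=alice@example.com session=abc123"))
# login ok user=<email> session=<redacted>
```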
Cost Considerations: How Not to Accidentally Buy a Small Planet
SLS cost is influenced by factors like log volume, retention, indexing, and query patterns. While exact pricing can vary, the general strategies are stable:
- Reduce ingestion volume: avoid sending extremely verbose logs.
- Use appropriate retention: keep only what you need.
- Index only necessary fields.
- Prefer structured logs: parsing can reduce wasted indexing and improve efficiency.
- Regularly review and clean up unused logstores.
Also, if you have debug logs that you only need temporarily, consider enabling them for limited periods or using sampling strategies.
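One way to implement sampling is at the emitter, so the noisiest level never leaves the host. A trivial sketch; the rate and level handling are assumptions to adapt:

```python
import random

SAMPLE_RATE = 0.1  # keep roughly 10% of DEBUG lines

def should_emit(level: str) -> bool:
    """Drop most DEBUG logs at the source; always keep INFO and above."""
    if level == "DEBUG":
        return random.random() < SAMPLE_RATE
    return True
```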
Costs sometimes rise when people keep everything forever “just in case.” Just in case is how budgets vanish like socks in a dryer.
Troubleshooting: When Logs Don’t Show Up (Or Show Up Like Ghosts)
Let’s address the classic incidents:
Problem 1: No logs appear in SLS
Common causes:
- Agent is not running or misconfigured.
- Wrong logstore or topic is configured.
- Time parsing is incorrect (timestamp field mismatch or time zone confusion).
- Network or authentication failure.
- File path or log rotation behavior means your agent isn’t reading the current files.
Fix approach:
- Check agent logs (the agent should tell you ingestion status).
- Verify the destination logstore name and topic.
- In the SLS UI, search around the expected time range (allow for buffering delays).
- Confirm the logs match the format your parsing rules expect.
Problem 2: Logs arrive, but fields are empty
Usually this means the parsing rules aren’t matching the log format:
- Regex patterns might not match due to slight differences in logs.
- JSON logs might not be valid JSON (trailing commas, truncated messages).
- Field names might differ between environments or versions.
Fix approach:
- Grab raw log samples from the source.
- Test parsing rules against those samples.
- Confirm timestamp extraction works (otherwise time-based queries look empty).
Problem 3: Queries are slow or expensive
Typically this happens because indexes aren’t set up for the fields you filter by, or because you’re running queries over large, unbounded time ranges.
Fix approach:
- Index the fields you filter frequently.
- Use narrow time ranges.
- Reduce query complexity for dashboards (pre-aggregate when possible).
Problem 4: Alert triggers too often (aka “Alert Fatigue”)
If alerts are noisy, you’ll ignore them. That’s not a moral failing; it’s just physics.
Fix approach:
- Increase time windows and thresholds.
- Alert on rates or counts rather than single log lines.
- Include additional filters (serviceName, environment, errorCode).
- Consider anomaly detection if supported, so you alert on meaningful deviations.
Best Practices: Make Your SLS Experience Pleasant
Here are habits that will save you time and make SLS feel like an ally rather than a mysterious machine that eats logs and burps graphs.
Standardize log formats
Use consistent JSON schemas (or consistent text patterns). Standardization makes parsing easier and dashboards stable across releases.
Include correlation identifiers
Make sure you log requestId or traceId. Debugging without correlation IDs is like trying to find a specific book in a library where everything is stored in “miscellaneous.”
Log levels should mean something
INFO should be informative but not flood the system. WARN indicates something worth noticing. ERROR indicates failure or conditions requiring attention.
If your application logs everything as ERROR, congratulations: you have invented a permanent outage alert.
Document your log fields
Maintain a simple internal doc listing your log fields, their types, and when they appear. This helps when new developers join and suddenly they’re like, “What is errorCode? Where is it set?”
Test parsing and dashboards with sample data
Before going live, use test logs to validate parsing rules, indexing, dashboards, and alerts. Nothing hurts like launching “production-ready” parsing rules that quietly extract the wrong field name.
Example Workflow: From Zero to “We Can See What’s Happening”
Let’s imagine a typical journey for a web service team.
Stage 1: Collect app logs
The team creates a project named prod-web and a logstore named app_logs with a reasonable retention. They configure an agent to tail application log files, and they ensure logs are either JSON or can be parsed using key=value patterns.
They validate ingestion by running a simple query: search for “ERROR” in the last 15 minutes.
Stage 2: Add parsing for key fields
They parse:
- timestamp
- logLevel
- serviceName
- requestId
- errorCode
Then they index logLevel, serviceName, errorCode, and requestId.
Stage 3: Create an error dashboard
They build a dashboard showing error counts per minute grouped by errorCode. Now they can answer quickly: what’s failing, and how badly?
Stage 4: Add alerts
They add an alert: “If error count for errorCode=E123 exceeds 20 in 5 minutes, notify on-call.”
They test by triggering that error code in staging.
Stage 5: Tune cost and noise
If logs are too verbose, they reduce log volume or sample less important messages. If alerts are too noisy, they adjust thresholds and filters.
The team ends up with a system that’s useful, not merely loud.
FAQ: Common Questions About SLS
Do I need to parse logs?
Not always, but parsing is strongly recommended if you want meaningful search and dashboards. Without parsing, you can still search by raw text, but queries become slower, less precise, and far more annoying.
Should I index every field?
No. Index fields you filter on or group by frequently. Indexing everything can increase cost and complexity. Index what your humans actually use during debugging.
How long should I keep logs?
Depends on your incident response needs and compliance requirements. A common approach is shorter retention for dev/staging and longer retention for prod. Start with a reasonable value and adjust based on how often you need older logs.
What if my logs change format after a deployment?
That’s real life. If your parsing rules depend on a specific format, log schema changes can break parsing. Use backward-compatible changes, or update parsing rules in coordination with application releases.
Conclusion: You’re Now Equipped to Tame the Log Beast
SLS can feel like a lot at first—projects, logstores, topics, parsing rules, indexing, dashboards, alerts—but it’s ultimately a consistent workflow: collect, structure, search, visualize, and respond.
The key to success is not rushing. Set up a clear project and logstore structure. Start with essential logs. Parse and index what you need. Build dashboards that answer real questions. Add alerts that notify you when something meaningful is happening. Then, iterate: refine parsing rules, adjust indexing, and tune alerts to avoid chaos.
In the end, the goal is simple: when something goes wrong, your logs shouldn’t be a mystery novel. They should be an instruction manual, with the villain clearly labeled and the exit route conveniently highlighted.

