23 June 2008

Drawing Lines

Posted by sanford under: Log Standards.

Developing a logging standard will start by sorting out relationships, making distinctions, setting boundaries, and creating working assumptions. A jigsaw puzzle solver might advise flipping over the pieces, sorting them by color or pattern, and starting with the border first. Commonalities lead to other relationships. Assembling a subset of pieces can be done in isolation from the rest of the puzzle. After a while, the growing subsets can be joined until the picture is complete. Same here.

On to the flipping. The first step is to separate fact from desire.

A log message is a record of an event or about events. It presumes no importance, and does not know where it fits in the world. It is information, clear or incomplete, verbose or cryptic. Do with it what you will, but do not presume it to be anything more. Not looking too closely, it is a statement that must be presumed to be fact.

When identifying an event by terms such as “security”, “auditing”, or “operations”, a use case is being applied. A comfortable classification. It can be based on convention, the purpose of a product consuming the log, the perspective of the user in need of the log, or other grouping or abstraction. As such, it is based in judgment and objectives, and subject to change.

Security folks will see security in any event. Type is relevant only to security risk evaluation. The security industry does have general agreement about some events. Still, new threats are constantly discovered. The overall level of risk changes from day to day.

Messages used for auditing have a high dependence on use. Security is one, general policy is one, and business practice monitoring and policy is still another. The sets of messages that satisfy each are not identical. The repercussions from one use can lead to litigation; from another, it’s just another message tossed into the basement file cabinet.

Operations has a mixed perspective. It can be network traffic through an enterprise or the state of the product delivering the traffic. The message type depends on the user, the use, and the part of an infrastructure where it is being applied.

This is fodder for use standards, not a logging standard.

It’s natural for early efforts to mix facts with results. Subject matter experts (SMEs) may be the initial drivers or heavily involved. Without an objective foundation the available references are experience, best practices, and professional agreement. They are important references though. As use cases, they define how a logging standard will be used. In other disciplines, a third party may be brought in, with the stakeholders and SMEs, to sort out the various classifications and perceptions, without losing perspective. A logging standard must be objective in structure and form, and respect the areas where it will be applied.

Now for sorting.

With a starting point that removes opinion from the mix, other points rise to the top. One example is division of associations.

Any message may contain information where valuation beyond the event is only possible with knowledge of the product. (I use “product” to define anything that can record an accessible log.) A way to start down this path is through the definition of the three basic event types.

Application

Any event about the product doing its job, satisfying its purpose, and generally the only message of concern throughout the various user, IT, and company levels. A firewall rejecting traffic, a ticketing system creating a ticket, a sensor recording the internal pressure of an oil pipeline, a user adding a PDF file to a documentation catalog, a user login.

Operational

The state of the product, the state of the product’s relationship with its environment, and any internal processing supporting the product’s application. Open connection threshold warnings, replicating directory server contents, start up, shut down, the steps to close a user session.

Administrative

Change to the product’s operational or application state or configuration. Adding firewall rules, adding a user, setting the minimum memory threshold, adding a DNS zone.
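For anyone who prefers to see the pieces laid out on the table, here is a minimal sketch of the three types as a tag on an event record. The class and field names are my own illustration, not part of any standard, and the product names attached to the examples are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    APPLICATION = "application"        # the product doing its job
    OPERATIONAL = "operational"        # product state and internal processing
    ADMINISTRATIVE = "administrative"  # changes to state or configuration


@dataclass(frozen=True)
class LogEvent:
    product: str          # hypothetical field: what recorded the log
    description: str      # hypothetical field: free-text event summary
    event_type: EventType


# Examples taken from the lists above; the product labels are assumed.
EXAMPLES = [
    LogEvent("firewall", "rejected traffic", EventType.APPLICATION),
    LogEvent("pipeline sensor", "recorded internal pressure", EventType.APPLICATION),
    LogEvent("directory server", "replicated contents", EventType.OPERATIONAL),
    LogEvent("firewall", "start up", EventType.OPERATIONAL),
    LogEvent("firewall", "added rule", EventType.ADMINISTRATIVE),
    LogEvent("DNS server", "added zone", EventType.ADMINISTRATIVE),
]


def by_type(events, event_type):
    """Return the events carrying the given type tag."""
    return [e for e in events if e.event_type is event_type]
```

The tag says nothing about security, auditing, or operations use. Those judgments would be layered on by whoever consumes the events.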

To restate a classic problem, how are these events known to be what they are? An administrative event requires a user with administrative privileges. So, if an administrator is involved, it must be that type of event. Hmmm. Okay, answer these questions:

1. How does one identify an administrator?

It’s standard advice to not use default administrative accounts. How does one know if “tretiak” is an administrator or a user? Well, by the operation of course!

2. By the operation?

There may be common operations known to require administrative privileges. List them. Then list operations that any user can perform except when administrative super strength is needed. Then list the operations in terms of the oil pipeline sensor.

Okay, then by the objects that are changed!

3. By what is changed?

A set of unknowns just as large as at the previous points. Sit two OS SMEs in a room. One has depth in i5/OS. The other in Windows. Ask them to list the objects touched in common administrative practices. Start a stopwatch. Count the puzzled expressions that happen in 60 seconds.

What’s left? A choice, that does not need to be made now. Only acknowledged. Either the product generating the log has to state what type of message it is, or the log standard must allow for the application of domain specific information, outside the standard.

Which way, in terms of the standard, is best? Either or both?
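To make the “both” option concrete, here is a rough sketch, not a proposal. It assumes a message is a plain dictionary, that a product may or may not declare its own type in a field I have called `event_type`, and that domain knowledge lives in a lookup table kept outside the standard. All of those names are hypothetical.

```python
def resolve_event_type(message, domain_rules):
    """Pick an event type for a message.

    Prefers the type the product declared; otherwise falls back to
    domain-specific rules supplied from outside the standard.
    Returns None when neither source can say.
    """
    declared = message.get("event_type")
    if declared is not None:
        return declared
    key = (message.get("product"), message.get("operation"))
    return domain_rules.get(key)


# Hypothetical domain knowledge kept outside the standard.
DOMAIN_RULES = {
    ("firewall", "add_rule"): "administrative",
    ("pipeline_sensor", "record_pressure"): "application",
}

messages = [
    {"product": "firewall", "operation": "add_rule"},
    {"product": "dns", "operation": "add_zone", "event_type": "administrative"},
    {"product": "ticketing", "operation": "close_session"},
]

resolved = [resolve_event_type(m, DOMAIN_RULES) for m in messages]
```

The third message resolves to nothing at all, which is the honest answer when neither the product nor the domain rules can say what it is.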


17 June 2008

Why standards?

Posted by sanford under: Log Standards.

I’ve often wondered about the viability of broad vendor adoption of a log standard. There are missing pieces. An organic aspect of adoption that isn’t in the makeup of logging. There’s hardly a person dealing with systems and software that hasn’t imagined how logs could be used, or wished the logs they’re dealing with were useful. What’s missing?

I won’t go into the behavioral profiles relevant to selling security. Security is out. Business needs win, anyway. If it doesn’t help make money, why spend money on it?

Compliance? Most governmental regulations are so many characters on a page. The rest are so many sentences. None seem to state that a product vendor must log in a useable form. Many companies and auditors have found ways to make them work without reliance on standards.

Industry mandates? We’re getting closer. Still, they press what a user of their service must do to be up to speed. Nothing about log standards, vendor responsibility, etc.

Maybe general policy enforcement. We’ve lived a long time with “thou shalt and thou shan’t” followed by “and we better not catch you.” Doesn’t seem to be a big winner.

Operational monitoring? A lot of companies working on that. Plenty of products that will help watch the egg basket, as long as the basket is bought from them or a close partner. No push there. They have the solution. Just ask.

Administrators? Sure. Log standards will help them pull off miracles at 3:00 AM in their 32nd hour, sitting in a dark closet, with no documentation and the tin support plan, either doing it from 12 time zones away or afraid they’ll have to move that far to keep their job. Wake them up first before asking if a log standard would be good.

The vendors creating the logs? Is that one worth discussion? The folks that sell reporting options based on their log formats? That use messages as inter-system communications and monitoring, or make 20% of revenue from support plans?

On top of all this is a sense that log standards are too big of a problem. Too many pieces, too many complexities. If one existed today, it would take years, perhaps a decade or so, to come up to a broad enough implementation to mean anything. That’s too long. I’m taking vacation in a few months. Can I have something before then?

Let me know if anyone was missed.

So … why? There’s a practical reason, and a why-can’t-everyone-see-it-reason.

Compliance and industry mandates have amplified a scenario played out in companies big and small. They have placed the company’s computing-based business infrastructure in one person’s lap. Traditional attempts to maintain policy had to span administrative, business, and product domains. Each executive responsible for such an area was directly responsible for understanding and implementation, and communicating policy, risk, and violations. Today those areas belong to a company officer.

This position doesn’t particularly care who made the firewall, or the brand of database. The state of the company is inherently product agnostic. He will ask one person the risk of failing the next audit. That person, or group, cannot afford to worry about how this OS or that OS reports logins. There’s too much disparity and too much information. Bottom line, only. This will never be achieved without standards.

From the company view, a standard is assurance that compliance management can reach a reasonable level. Time, effort, money, fines, and bad quarterly reports can be saved or avoided. How well a product can integrate into the compliance/policy infrastructure will become part of the purchase decision matrix. If a product does not integrate, and the company must purchase equally costly professional services to integrate, how far down the list of pros and cons will the buyer go? Eventually, product vendors must comply or risk market loss.

Now for the fun side.

A favorite analogy for logs is aluminum. Aluminum is the most abundant metal in the Earth’s crust. It’s the third most common element, after oxygen and silicon. Like aluminum, logs are everywhere. It can be argued that even the smallest change to data or a system can create a message many times the size of the data. Given an appropriate level of silliness, one can generate more logs about an event than the sum of the elements of the event. Even to the point of bringing a system to the ground. Logs can be common and abundant.

Bring the silliness down a bit.

At one time aluminum was a precious metal, more valuable than silver, or even gold. The value came from the difficulty of extracting it. At the end of the 19th century, processes were developed that drastically lowered the cost. Ask an aluminum vendor and you will receive a list of uses that add up to how the world as we know it would not exist without that metal.

A cost effective method for extraction of the data contained in logs, implicit and explicit, would enable any number of capabilities in a company. Take your pick. Instead of 30 different console applications for a security product infrastructure, there can be one primary console that works at the event level. Cross domain and product activities can be tracked. IT and similar costs could be justified. The person given company responsibility for compliance/policy conformance could work within that domain, and not the domains of a hundred tribes.

There are dozens of others. If you’ve read this far, my guess is it wouldn’t take long to create a list of issues that might be solved, or capabilities that might be useful. People who take advantage of the normalization and APIs available in some log management products are working these areas, today. If you’re someone more into research and analysis, there is a rich set of behavioral information that today is guessed at.

Standardization is one of the steps to make this happen. Standards must address presentation, content, classification, the broad range of products and product types, vendor logging efficiency, ease of transition, and maintainability. They must consider or include transport, use by legacy systems, community support, multiple involved log consumer types, perspective structure, and some level of future growth.

Standards will move forward if customers press vendors and organizations to develop them. While it may not look like it, product vendors and vendors that use logs are taking interest. This is not a time to watch what happens and accept what comes.


15 June 2008

Yeah, I did

Posted by sanford under: Where It Will.

Once, in a misspent other life, I bought a scanner. Visions of old family photos to share, and of getting rid of a box of old documents, danced in my head. Preservation of family treasures and efficiency were good enough reasons. Ultimately, I gave my sister the pictures and shredded the documents. Kept the scanner, though. I bought a new bookcase to set it on. The bookcase would be used and the scanner wasn’t taking up space. It worked.

The scanner came with software. Copying, image editing, and optical character recognition. Copying was obvious and good. Something needed to be copied at least every other day. It was fun using the image editor, flipping colors, looking at a scanned picture of my sister facing left, then right, then upside down. Blurring, sharpening, painting her a red eye, then removing it. That lasted for a few hours. I kept some of the experiments to press my sister to get her own scanner. She could have fun enhancing the years of family history in those photos.

Character recognition was new to me. It was fed a document, one destined for shredding, and little yellow squares and red lines filled the scanned page. Tweaks and corrections, more tweaks and corrections, and some “a”s still came out as “*”. In the quiet hours of early morning desperation slipped toward a weary acceptance. Some good reason for the scanner would come up. In the meantime, a bookcase is always useful, sooner or later.

Standing on the back patio, working the last cigarette before bed, a great idea came, as they do in the early morning, after a number of beers, or when love strikes. I could use it to save money.

Prior failures at justification were pushed aside by a sputtering glow of enthusiasm. I dug up a grocery receipt, smoothed it, and fit it carefully on the glass. The data from different stores could be scanned, converted, and imported into a spreadsheet. A price history could be created. The most economical stores could be mapped. My sister would love it.

So what were the issues? Stores named items differently. That was okay. A skate in the park. After enough data had been captured and aligned, the list would plateau and the cost to manage would become small. The nickels and dimes would add up. The scanner would be paid for. Pathetic, perhaps. One can learn something from anything. After another hour or two, the first receipt was on disk.

The receipt from the next store killed the idea. Item names were frustratingly different. So was the layout. To align them, items and prices had to be flipped. Then the items, old and new, were sorted. The sort didn’t quite work. Naming was different enough to make the order relationships an uncomfortable approximation. The names had to be normalized. Aligning the same items took more time than anticipated. The two receipts were not identical, of course. Normalizing had to happen again whenever new items were added. The number of steps for a few pieces of paper was numbing.
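For the curious, the chore looks roughly like this in code. It is a toy sketch under generous assumptions: the OCR text is already clean, there are only two layouts (item then price, price then item), and every item name appears in a hand-built alias table. The item names, prices, and store labels are all invented for illustration.

```python
import re

# Hypothetical alias table: store-specific item names -> one canonical name.
ALIASES = {
    "MLK 2% GAL": "milk 2% gallon",
    "2% MILK 1GAL": "milk 2% gallon",
    "BANANAS": "bananas",
    "BANANA YEL": "bananas",
}

PRICE = r"\d+\.\d{2}"
# Two made-up layouts: item then price, and price then item.
ITEM_FIRST = re.compile(rf"^(?P<item>.+?)\s+(?P<price>{PRICE})$")
PRICE_FIRST = re.compile(rf"^(?P<price>{PRICE})\s+(?P<item>.+)$")


def parse_line(line):
    """Return (canonical_item, price) or None if the line is not understood."""
    line = line.strip()
    match = PRICE_FIRST.match(line) or ITEM_FIRST.match(line)
    if match is None:
        return None
    raw = match.group("item").strip().upper()
    canonical = ALIASES.get(raw)
    if canonical is None:
        return None  # an unknown name needs a human to extend ALIASES
    return canonical, float(match.group("price"))


store_a = ["MLK 2% GAL   3.49", "BANANAS   1.20"]
store_b = ["3.29 2% MILK 1GAL", "1.35 BANANA YEL"]

# Price history keyed by canonical item, then by store.
history = {}
for store, lines in (("A", store_a), ("B", store_b)):
    for line in lines:
        parsed = parse_line(line)
        if parsed:
            item, price = parsed
            history.setdefault(item, {})[store] = price
```

Every new store means another layout pattern and another batch of alias entries, which is where the hours went.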

A view of the future had me hunched over tiny pieces of paper for hours, once a week. I could see fighting the OCR, making tiny adjustments to the data, working to fit in a new receipt type, and perhaps writing scripts that had to be tweaked for each difference. The receipts ended up in a box in the closet for some future bored ambition.

This scene, reduced to a few points, and without reference to my sister, has been used to describe one of the challenges in log management and analysis. Variations in log content, completeness, and form for the same set of data or data types require detailed effort that may not survive the next need. Every business has a variation. For this simple example, a focused effort might provide a solution. There are more programs than businesses, I suspect. Software has no restrictions on creativity and complexity. Logs and the messages they contain can be as varied as the people that create them. I’ve sometimes thought working with logs is akin to a pre-computer information management system, before training on how to construct, manage, and file was commonplace.

That comparison only introduces a few of the issues with log management. Even today, a broad range of people presume a simplicity and directness in logs. Knowledgeable folk have brought solutions to the problem, then found more interesting things to work on, leaving the challenge unfulfilled. The complexity of large scale log management is a reflection of the complexity and growth of computing in the last 40 years.

The purpose of this flyspeck on the blogosphere is to put down some of the thoughts a solution, or set of solutions, might be built on. At the least, I hope it’s useful to those trying to make sense of it.
