All Input is Evil

The principle All Input is Evil originates from the field of software development and computer security. It is a mindset that encourages developers to treat all input as potentially unsafe or hostile.

In the early years of software development, from the 1960s to the 1980s, software was written primarily for scientific systems and the first office computers. The importance of computer security only became apparent with the growing spread of the Internet in the 1990s. So-called hackers exploited flaws in software to penetrate other people’s systems, to extract, manipulate, or delete data, or simply to gain free access to paid services. Intelligence services around the world discovered the potential of the Internet for themselves.

In the mid-1990s, companies, authorities, and governments became increasingly security-conscious. Security experts developed the principle All Input is Evil to raise awareness that all input sources, not just user input (e.g. databases or files from the Internet), must be validated and checked.

In the 2000s, software development practices were expanded to include security aspects, and security guidelines and training materials were systematically developed and published.

The principle All Input is Evil assumes that any input can be harmful and must therefore always be validated and handled accordingly. While user input is the most obvious source of malicious input, many other types of input can pose a threat as well. So let’s think a bit about where we find inputs that need to be checked.

User input

The most obvious and most frequently mentioned input is user input via keyboard, mouse, and touch gestures. The first thought immediately leads to mandatory fields in forms, which are all too often put to the test with SQL injection or XSS attacks.

In conversations with software developers, I have repeatedly found that business parameters are often checked less rigorously than mandatory fields. Positive tests for limit values are common, but negative tests are less so. The internal consistency of business requirements is a particular challenge for both software developers and business experts.

Food for thought: Human lives depend on the correct dosage of ingredients in a medical prescription. If medically unacceptable deviations are permitted, life and limb may be harmed.

From my point of view it is crystal clear, but I would like to emphasize it once again: fundamentally, you cannot trust data transmitted as user input via web frontends or apps. The input must therefore be validated on the server and rejected if necessary. Attackers will not necessarily use the frontends we provide…
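To make this concrete, here is a minimal sketch of server-side validation in Python (the language I will use for all sketches in this article), assuming a hypothetical prescription service; the dosage limits are purely illustrative:

```python
# Minimal sketch: server-side validation of a dosage field from a form.
# The field name and the limits are illustrative assumptions.
from decimal import Decimal, InvalidOperation

MIN_DOSE_MG = Decimal("0.5")   # assumed business limits
MAX_DOSE_MG = Decimal("50")

def validate_dose(raw: str) -> Decimal:
    """Reject anything that is not a plausible dosage, regardless of
    what the frontend claims to have checked already."""
    try:
        dose = Decimal(raw.strip())
    except (InvalidOperation, AttributeError):
        raise ValueError("dose is not a number")
    if not (MIN_DOSE_MG <= dose <= MAX_DOSE_MG):
        raise ValueError(f"dose {dose} mg outside permitted range")
    return dose
```

Note that the check runs on the server: whatever the frontend validated (or didn't) is irrelevant here.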

Incidentally, user input also includes input via copy & paste. A manipulated clipboard can accept supposedly correctly copied values, output altered, malicious values instead, and feed them into downstream processing.

Files

It is well known that file uploads can contain malicious code or be manipulated in such a way that they damage the target system, e.g. via buffer overflows or by exploiting vulnerabilities in file parsers. Malware uploads are often caught by endpoint protection systems. However, files that “get through” enter the processing pipeline of the software system and must be checked for validity there as well.
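A minimal sketch of re-validating an upload on the server, assuming (purely for illustration) that only PNG images are accepted:

```python
# Minimal sketch: never trust the file extension or the client-sent
# Content-Type; check size and the actual file signature instead.
MAX_UPLOAD_BYTES = 5 * 1024 * 1024          # assumed size limit
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"            # PNG file signature

def check_upload(data: bytes) -> None:
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("upload exceeds size limit")
    if not data.startswith(PNG_MAGIC):
        raise ValueError("file is not a PNG image")
```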

Less commonly considered in development processes are intentionally modified or faulty configuration files that negatively influence the behavior of the software. Checks for functional and technical validity must be implemented here as well.
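A minimal sketch, assuming a JSON configuration with a hypothetical timeout_seconds field:

```python
# Minimal sketch: validate a loaded configuration instead of trusting it.
# The field name and the permitted range are illustrative assumptions.
import json

def load_config(path: str) -> dict:
    with open(path, encoding="utf-8") as fh:
        cfg = json.load(fh)              # may raise json.JSONDecodeError
    timeout = cfg.get("timeout_seconds")
    if not isinstance(timeout, int) or not (1 <= timeout <= 300):
        raise ValueError("timeout_seconds must be an int between 1 and 300")
    return cfg
```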

Log files were a worthwhile target for attacks long before the Log4Shell vulnerability. If log files are manipulated, these manipulations can be fed into analysis systems. Log parsing tools can themselves be affected by vulnerabilities, which can then be exploited with the appropriate privileges. Suitable (functional) error handling must therefore be implemented for log files.
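A minimal sketch of one such measure: neutralizing control characters in untrusted values before they reach the log, so an attacker cannot forge extra log lines for downstream parsers:

```python
# Minimal sketch: strip CR/LF and other control characters from
# untrusted input before logging it.
import logging

def log_safe(value: str) -> str:
    return "".join(ch if ch.isprintable() else "?" for ch in value)

logging.basicConfig(level=logging.INFO)
username = "alice\nINFO fake entry"      # hostile input with a newline
logging.info("login attempt by %s", log_safe(username))
```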

Databases

Databases are organized collections of data, stored in a structured way so that they can be efficiently retrieved, managed, and updated. I like to differentiate between the data that is written to a database and the data that is read from it. Attacks on this data are known as SQL injection and can trigger unwanted and malicious behavior directly on the database side. The classic '; DROP DATABASE datenbankname; -- can have catastrophic effects on a company’s business operations. Reading from a database can be no less exciting, raising the blood pressure of an organization’s entire security and development departments: as with direct user input, business parameters must be checked for validity and plausibility when data is read, before it is processed or output (cf. XSS attacks).
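As a minimal sketch of the standard countermeasure, here is a parameterized query using Python’s built-in sqlite3 module; the table and values are illustrative:

```python
# Minimal sketch: parameterized queries instead of string concatenation,
# which defuses the classic '; DROP ... injection shown above.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES (?)", ("alice",))

user_input = "'; DROP TABLE users; --"
# The placeholder treats the input strictly as data, never as SQL.
rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()
print(rows)  # [] - the hostile string matched nothing and executed nothing
```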

Cache and cookies

Caches are fast storage mechanisms that speed up access to data. Hardware caches can be found in CPUs; web caches live in browsers, proxy servers, or content delivery networks (CDNs). Databases maintain query caches, and in-memory caches such as Redis or Memcached keep frequently used data in RAM. Hard disk caches are small memory areas that speed up access to relatively slow disks and smooth out access patterns. Data stored in a cache or in cookies can be modified by attackers to deliver malicious content or manipulate the behavior of the software.
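One common defense is to sign such values so that tampering becomes detectable. A minimal sketch with Python’s standard hmac module, assuming a hypothetical "value.mac" cookie layout:

```python
# Minimal sketch: HMAC-sign a cookie value so manipulation is detectable.
# The key handling and cookie layout are illustrative assumptions.
import hashlib
import hmac

SECRET = b"server-side-secret"  # in practice: from a secrets manager

def sign(value: str) -> str:
    mac = hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}.{mac}"

def verify(cookie: str) -> str:
    value, _, mac = cookie.rpartition(".")
    expected = hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, expected):
        raise ValueError("cookie was tampered with")
    return value
```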

Environment variables

Environment variables are indispensable both in software development and in operating environments. They are often used to set environment-specific configurations, for example paths to data or services. Keys and secrets stored in variables are also not uncommon. Environment variables can be easily queried in any programming language and often contain important parameters for the correct functioning of the software. If these parameters are manipulated via environment variables, attackers can inject malicious content, falsify data, disrupt calculations or even manipulate the behavior of the software.
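A minimal sketch of treating an environment variable as untrusted input, assuming a hypothetical SERVICE_URL variable that must be a valid HTTPS URL:

```python
# Minimal sketch: validate an environment variable before using it.
# SERVICE_URL and the https-only policy are illustrative assumptions.
import os
from urllib.parse import urlparse

def get_service_url() -> str:
    raw = os.environ.get("SERVICE_URL", "")
    parsed = urlparse(raw)
    if parsed.scheme != "https" or not parsed.hostname:
        raise RuntimeError("SERVICE_URL must be a valid https:// URL")
    return raw
```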

Interprocess communication (IPC) and shared memory

Pipes are unidirectional communication channels that enable one process to send data to another. Data transferred (sequentially) to a process via pipes must be checked for validity. Common mistakes on the receiving side are missing size checks, which can lead to buffer overflows and crashes. Malicious inputs lead to malicious behavior, so such harmful content must be identified and handled.
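A minimal sketch of a size-checked read, assuming a hypothetical length-prefixed message protocol over a pipe:

```python
# Minimal sketch: read length-prefixed messages with a hard upper bound,
# so a hostile sender cannot force unbounded buffering.
import struct

MAX_MSG = 64 * 1024  # assumed protocol limit

def read_message(stream) -> bytes:
    header = stream.read(4)
    if len(header) != 4:
        raise EOFError("truncated header")
    (length,) = struct.unpack("!I", header)      # big-endian uint32
    if length > MAX_MSG:
        raise ValueError(f"declared length {length} exceeds limit")
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("truncated payload")
    return payload
```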

When shared memory is used for data transfer, different processes access the same memory area. The speed advantage comes at the cost of more complex access logic: race conditions can arise, which must be prevented with semaphores or mutexes.

| Property | Pipes | Shared memory |
|---|---|---|
| Direction of communication | Unidirectional (mostly) | Bidirectional (multiple processes) |
| Speed | Slower (OS involvement) | Faster (direct memory access) |
| Data buffering | Volatile, deleted after reading | Data persists |
| Synchronization | Normally not necessary | Required (e.g. semaphores) |
| Intended use | Sequential communication | Parallel access to large amounts of data |
| Resource consumption | Higher (due to OS interactions) | Lower (direct memory access) |
| Security aspect: access rights | By default only usable by related processes (e.g. parent and child); external processes require extended rights | Processes must be explicitly granted access rights, which calls for strict access control |
| Security aspect: data isolation | Good isolation, as data between processes is volatile and not stored long-term; buffer overflows are possible, but the channel is harder to compromise | Higher risk, as multiple processes access the same memory simultaneously; incorrect synchronization can lead to data leaks or unauthorized access |
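To illustrate the synchronization column of the table, here is a minimal sketch using Python’s multiprocessing module, where a lock (a mutex) guards a counter in shared memory:

```python
# Minimal sketch: two processes incrementing a shared counter; the lock
# prevents the race condition that unsynchronized access would cause.
from multiprocessing import Process, Value

def worker(counter):
    for _ in range(100_000):
        with counter.get_lock():   # mutex guarding the shared memory
            counter.value += 1

if __name__ == "__main__":
    counter = Value("i", 0)        # a C int in shared memory
    procs = [Process(target=worker, args=(counter,)) for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)           # reliably 200000 thanks to the lock
```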

Communication between two processes does not have to take place on one and the same computer; inter-process communication also includes network connections. Sockets are a method of connecting processes on the same computer or across a network. They are often used to transfer data between two processes that may run on different machines, which leaves them vulnerable to remote attacks. Security measures such as firewalls, TLS (Transport Layer Security), and authentication mechanisms are important to protect these connections and to preserve data integrity and confidentiality. Inputs that reach your own software via sockets require, you guessed it, verification.

Signals are an asynchronous method of communication between processes. They are used to send events to other processes, such as requesting a specific action or terminating a process. It is important for your own software to know how to handle incoming signals. For example, an unhandled SIGTERM can lead to unwanted data loss, which is why signal handlers should be implemented to process such signals safely and react appropriately. (SIGKILL, by contrast, cannot be caught at all, so regularly persisting state is the only defense there.)
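A minimal sketch of such a signal handler in Python (SIGTERM only; as noted, SIGKILL cannot be caught):

```python
# Minimal sketch: handle SIGTERM gracefully so in-flight data can be
# persisted before the process exits.
import signal
import sys

def handle_sigterm(signum, frame):
    # flush buffers, close files, persist state ...
    print("SIGTERM received, shutting down cleanly")
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)
```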

Sessions

A session is a temporary interaction between a user (or client) and a server that is maintained over a certain period of time. It makes it possible to store the state and data of a user across multiple requests, even if the underlying protocol, e.g. HTTP(S), is stateless by nature. Sessions can be hijacked and manipulated through man-in-the-middle attacks or XSS. Session data should therefore be encrypted and protected by additional measures, e.g. Secure flags for cookies, session timeouts, CSRF protection (against Cross-Site Request Forgery), cryptographically secure generation of session IDs, and automatic renewal after each authentication. Understanding session (data) as input helps to protect it appropriately.
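A minimal sketch of two of these measures, a cryptographically secure session ID and hardened cookie flags, using Python’s secrets module; the cookie attributes follow common hardening advice:

```python
# Minimal sketch: unpredictable session ID plus hardened cookie flags.
import secrets

session_id = secrets.token_urlsafe(32)   # 256 bits of CSPRNG entropy
cookie = (
    f"session={session_id}; "
    "Secure; HttpOnly; SameSite=Strict; Max-Age=1800"
)
print(cookie)
```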

Libraries

Today, software is generally no longer developed entirely from scratch, and the wheel is not reinvented every time. Third-party libraries introduce dependencies into the software that are sometimes difficult to understand and penetrate. It is therefore advisable to obtain libraries only from trustworthy sources and to check them thoroughly for vulnerabilities. Even open source libraries may contain vulnerabilities or backdoors that allow access to your own systems. Libraries are therefore not a “classic” user input, but rather fall under the heading of “supply chain security” as an input into the development process.

API requests

Hardly any modern application today can do without an Internet-exposed backend. API gateways, for example, provide access to REST APIs whose functions and data are consumed by your own systems. API calls can deliver manipulated data and thus compromise your own system, cause it to crash, or exploit vulnerabilities. The same applies here: all input is evil.
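A minimal sketch of validating an incoming payload before use, assuming a hypothetical order endpoint with item_id and quantity fields:

```python
# Minimal sketch: validate a JSON request body against the expected
# schema. The field names and limits are illustrative assumptions.
import json

def parse_order(body: bytes) -> dict:
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        raise ValueError("body is not valid JSON")
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    item_id = payload.get("item_id")
    quantity = payload.get("quantity")
    if not isinstance(item_id, str) or not item_id.isalnum():
        raise ValueError("item_id must be alphanumeric")
    if not isinstance(quantity, int) or not (1 <= quantity <= 1000):
        raise ValueError("quantity must be between 1 and 1000")
    return {"item_id": item_id, "quantity": quantity}
```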

Emails

Let’s pause for a moment and think about where manipulated emails can cause damage. The first thing that comes to mind is email clients. But other programs such as CRM systems also receive and process emails, not to mention mail servers. In-house developments with built-in email processing must likewise treat content, links, and attachments as input and check them for validity and harmlessness. Have you ever seen a ZIP bomb delivered by email explode in your own program?
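A minimal sketch of defusing that ZIP bomb with Python’s zipfile module; the limits and the compression-ratio heuristic are illustrative assumptions:

```python
# Minimal sketch: inspect a ZIP attachment before extraction. A huge
# declared size or an extreme compression ratio is a strong bomb signal.
import zipfile

MAX_TOTAL = 100 * 1024 * 1024   # assumed uncompressed-size limit
MAX_RATIO = 100                 # assumed compression-ratio threshold

def check_zip(path: str) -> None:
    with zipfile.ZipFile(path) as zf:
        total = sum(info.file_size for info in zf.infolist())
        if total > MAX_TOTAL:
            raise ValueError("archive expands beyond the size limit")
        for info in zf.infolist():
            if (info.compress_size
                    and info.file_size / info.compress_size > MAX_RATIO):
                raise ValueError(f"suspicious ratio: {info.filename}")
```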

Sensors and IoT devices

Manipulating sensors is often easier than you might think. For example, temperature sensors can be manipulated with simple heat sources (e.g. lighters) or by masking them. Therefore, a professional analysis of sensor inputs should be carried out and threshold values for normal and abnormal behavior should be defined. Software tests should take into account defective or failed sensors. Surveillance cameras (= image sensors), heating control systems, cars, traffic lights, escalator controls, elevators, pumps and many more are popular targets. IoT devices, which usually read these sensors, can also be manipulated and deliver falsified data. Some unpatched or unpatchable IoT devices are quickly taken over and remotely controlled by botnets. Inputs such as over-the-air updates (OTA) must therefore also be checked for integrity and authenticity. All functions of IoT devices should be secured according to their threat scenarios.
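A minimal sketch of such threshold checks, with purely illustrative values for range and rate of change:

```python
# Minimal sketch: plausibility checks on a temperature reading.
MIN_C, MAX_C = -40.0, 85.0      # assumed physical sensor range
MAX_DELTA_C = 5.0               # assumed max plausible change per reading

def check_reading(value: float, previous: float | None) -> float:
    if not (MIN_C <= value <= MAX_C):
        raise ValueError(f"{value} °C outside sensor range")
    if previous is not None and abs(value - previous) > MAX_DELTA_C:
        raise ValueError("implausible jump: sensor fault or manipulation?")
    return value
```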

Machine learning (ML) and artificial intelligence (AI)

I’m not telling you anything new: data is an essential component for training machine learning models and artificial intelligence. It must be ensured that this data is tamper-free (e.g. encryption, signed records, versioning) so that the models can be steered in the right direction. In order to validate model inputs, it must be ensured that no incorrect or illogical values are contained (data cleansing) and that they correspond to the intended data types and limits. Unusual data constellations can be an indication of manipulated data.
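A minimal sketch of such input checks, assuming a hypothetical record schema with age and income fields:

```python
# Minimal sketch: data cleansing before training. The schema, types,
# and plausible ranges are illustrative assumptions.
def validate_record(record: dict) -> dict:
    age = record.get("age")
    income = record.get("income")
    if not isinstance(age, int) or not (0 <= age <= 120):
        raise ValueError("age out of plausible range")
    if not isinstance(income, (int, float)) or income < 0:
        raise ValueError("income must be non-negative")
    return record
```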

Conclusion: Secure software does not come easy.

All the types of input shown illustrate that the All Input is Evil principle is aimed at much more than just user input. Careful validation, sanitization and control of all inputs can significantly reduce the risk posed by potentially malicious data.

Companies often assume that software developers write secure software “on their own”. The truth is that requirements for secure software are often not formulated at all, and yet it is expected that there will be no vulnerabilities. No software developer should be booked out 100% on project work or the like. It is utopian to expect security to come easy. A rethink must take place here.

In order for input data to be validated in accordance with business requirements, software developers must be enabled to take suitable and appropriate protective measures. To do this, developers only need three things:

  1. time to learn software-side protection mechanisms and how to apply them securely. Companies should give their development departments enough freedom to acquire and expand skills for secure software development: starting with awareness measures tailored to the technologies in use, through challenges in the respective programming language and hackathons, to job shadowing with software testers and pentesters. Time, technical “playgrounds”, and access to (external) expertise are the key to gaining experience.
  2. software security requirements to be implemented. Compile requirements for the security of the software to be developed. Functional requirements, non-functional requirements and security requirements must be implemented together - no requirement stands alone. Software developers build exactly what they are told to build. If no security requirements are formulated, none will be implemented.
  3. time to implement the protective measures during development. Even when experience in implementing security requirements has been consolidated, implementing security measures still takes time. Listen to your developers when they give you their effort estimates.

In my opinion, All Input is Evil is a powerful, albeit often neglected, principle in IT security. It is sometimes pushed into the background by the strong presence of network security and security management products, but it is a sharp sword in a company’s security concept: wherever software is found, exploitable vulnerabilities can be found too. Consequently, this is exactly where you have to start in order to fundamentally improve the security of software: in software development. And to achieve this, companies must be prepared to invest in the training and continuous professional development of their software development departments. Secure software is increasingly becoming a critical success factor, and the sooner you start implementing security requirements, the more you stand out from the competition. The skills required to develop secure software are therefore business-critical.