What Happens During a Penetration Test (Step-by-Step)
What a good engagement actually looks like, end to end.
For organisations buying penetration testing for the first time — or for those whose last test left them with a 200-page PDF and a vague sense of unease — the process can feel opaque. What does a tester actually do? Where does the testing happen? What can go wrong? And how do you know whether you got value?
A properly run penetration test follows a defined methodology, sets expectations up front, and produces evidence you can act on. This guide walks through what a good engagement looks like end-to-end, with specifics on what should happen at each stage, what your team should expect to do, and what questions to ask if something feels off.
Scoping and authorisation
Every credible penetration test starts before any testing begins. The scoping conversation determines what is being tested, what techniques are in and out of bounds, when testing will happen, and who is authorised to receive findings. This stage is not paperwork — it is the foundation of a safe, legal and useful engagement.
Expect to agree:
- In-scope assets — specific IP ranges, applications, domains or environments. “Test the network” is not a scope; “test the 10.50.0.0/16 production network and the customer portal at app.example.com” is.
- Test type and depth — black-box (no information), grey-box (limited information, perhaps a low-privileged account) or white-box (full information including source code or architecture diagrams). Grey-box is the most common because it balances realism with value.
- Rules of engagement — testing windows, denial-of-service rules, social engineering boundaries, handling of any genuinely sensitive data encountered, and escalation paths if something breaks.
- Authorisation in writing. A signed letter of authorisation, sometimes called a “get out of jail” letter, is mandatory. Without authorisation, testing is unauthorised access under the Computer Misuse Act 1990 in the UK, regardless of intent, and the signed letter is the evidence that authorisation was given.
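A precise scope can be checked mechanically before any traffic is sent. The following is a minimal sketch, assuming the scope is the example from the list above (the 10.50.0.0/16 network and app.example.com); the names `IN_SCOPE_NETWORKS`, `IN_SCOPE_HOSTNAMES` and `is_in_scope` are illustrative, not part of any standard tool.

```python
import ipaddress

# Hypothetical scope, copied from the example wording in the scoping list.
IN_SCOPE_NETWORKS = [ipaddress.ip_network("10.50.0.0/16")]
IN_SCOPE_HOSTNAMES = {"app.example.com"}


def is_in_scope(target: str) -> bool:
    """Return True only if the target is explicitly listed in the agreed scope."""
    try:
        address = ipaddress.ip_address(target)
    except ValueError:
        # Not an IP literal, so treat it as a hostname and require an exact match.
        return target.lower().rstrip(".") in IN_SCOPE_HOSTNAMES
    # An IP address is in scope only if it falls inside an agreed network range.
    return any(address in network for network in IN_SCOPE_NETWORKS)


for candidate in ["10.50.12.7", "10.51.0.1", "app.example.com", "admin.example.com"]:
    print(candidate, "in scope" if is_in_scope(candidate) else "OUT OF SCOPE")
```

The check deliberately defaults to "out of scope": anything not named in the agreement, such as a sibling subdomain, is rejected rather than assumed.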
Reputable testers will push back on under-scoped or under-resourced engagements. If a provider agrees to test a complex application in two days for a fixed low fee without asking probing questions, that is a sign the testing will be shallow at best.
Discovery and analysis
Once authorisation is in place, the tester begins discovery. The goal is to build an accurate picture of the target as it really exists — not as it was documented in the original scope. Modern environments drift, and the gap between the asset register and reality is usually where the most interesting findings live.
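The gap between documentation and reality can be pictured as a simple set comparison. This is a sketch under stated assumptions: the hostnames are invented, and `scope_drift` is a hypothetical helper rather than a feature of any discovery tool.

```python
from typing import Dict, List, Set

# Hypothetical data: hosts named in the asset register vs hosts seen during discovery.
documented: Set[str] = {"app.example.com", "api.example.com", "vpn.example.com"}
discovered: Set[str] = {"app.example.com", "api.example.com", "staging.example.com"}


def scope_drift(documented: Set[str], discovered: Set[str]) -> Dict[str, List[str]]:
    """Compare the asset register with what discovery actually observed."""
    return {
        # Seen in discovery but missing from the register.
        "undocumented": sorted(discovered - documented),
        # Listed in the register but not observed during discovery.
        "not_observed": sorted(documented - discovered),
    }


drift = scope_drift(documented, discovered)
print(drift)
```

Anything in the "undocumented" bucket is a question for the scoping contact, not an automatic target: it still needs to be covered by the written authorisation before it is tested.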
Discovery typically combines:
- Passive reconnaissance — DNS records, certificate transparency logs, public code repositories, leaked credentials in breach datasets, and search engine artefacts. None of this sends traffic to the in-scope systems.
- Active enumeration — port scanning, service fingerprinting, directory brute-forcing, API endpoint discovery, and authenticated crawling where credentials are provided.
- Manual investigation — reading responses, exploring application logic, mapping authentication and authorisation flows, and looking for the seams between systems.
A scanner produces a list of services and CVEs at this stage. A penetration tester uses that as raw material, then asks the questions a scanner cannot: which of these issues actually matter here? Which of these endpoints behave differently for different user roles? What assumptions has this application made about its callers? The output of this phase is a prioritised list of hypotheses to validate, not a list of findings to report. Reporting comes after exploitation.
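One of the questions above, whether endpoints behave differently for different user roles, can be turned into hypotheses with a small comparison. The sketch below assumes the tester has already recorded HTTP status codes per role; the endpoints, roles, and the `authorisation_hypotheses` function are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical recorded status codes, keyed by endpoint and then by requesting role.
observations: Dict[str, Dict[str, int]] = {
    "/api/invoices/1001": {"admin": 200, "standard_user": 200, "anonymous": 401},
    "/api/admin/users": {"admin": 200, "standard_user": 200, "anonymous": 401},
    "/api/profile": {"admin": 200, "standard_user": 200, "anonymous": 401},
}

# Hypothetical expectation: the roles each endpoint is supposed to allow.
expected_allowed: Dict[str, Set[str]] = {
    "/api/invoices/1001": {"admin", "standard_user"},
    "/api/admin/users": {"admin"},
    "/api/profile": {"admin", "standard_user"},
}


def authorisation_hypotheses(
    observations: Dict[str, Dict[str, int]],
    expected_allowed: Dict[str, Set[str]],
) -> List[str]:
    """List cases where a role got a success response it was not expected to get."""
    hypotheses = []
    for endpoint, by_role in sorted(observations.items()):
        allowed = expected_allowed.get(endpoint, set())
        for role, status in sorted(by_role.items()):
            if 200 <= status < 300 and role not in allowed:
                hypotheses.append(
                    f"{endpoint}: {role} received {status} but is not expected to have access"
                )
    return hypotheses


for hypothesis in authorisation_hypotheses(observations, expected_allowed):
    print(hypothesis)
```

The output is a hypothesis, not a finding: a 200 response may still return an empty or filtered body, which is why manual validation follows.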
Exploitation and validation
Where authorised, the tester now safely exploits the most promising weaknesses to demonstrate impact. The point is not to break things — a good tester actively avoids that. The point is to remove ambiguity. A finding that says “this might allow privilege escalation” is much less useful than one that says “this allowed us to obtain Domain Admin within three hours, and here is the proof”.
Exploitation can include:
- Chaining several low-severity issues into a single high-impact path.
- Lateral movement from a foothold to assets of real value (databases, identity infrastructure, source code).
- Demonstrating authorisation bypasses with concrete evidence (one user accessing another user’s data, for example).
- Validating whether real-world exploitability in your specific environment matches the theoretical CVSS score; often it does not.
This is also where the tester confirms or rules out the false positives that almost always survive earlier phases. The difference between a 50-finding report and a 12-finding report is rarely the quality of testing; it is usually how rigorously findings have been validated.
Throughout this phase, the tester stays in contact with your security or technical lead. If anything genuinely critical is found — for example, a live attacker is already present, or a finding is so severe that production stability is at risk — the engagement pauses and you are notified immediately. This is sometimes called a “stop the test” finding.
Reporting and retesting
The report is the deliverable, and it is where many penetration tests quietly fail. A good report should be useful to three different audiences at once: the board, the security team, and the engineers who will fix the issues.
Expect to see:
- An executive summary in plain English, with a clear narrative of what was tested, what was found, and what it means for the business.
- Prioritised findings with severity ratings that reflect real impact in your environment, not raw CVSS base scores.
- Reproducible evidence for each finding — exact requests, screenshots or steps — so engineers can verify the issue, understand the fix and test the remediation.
- Specific remediation guidance that is actionable in your stack, not generic platitudes pulled from a knowledge base.
- Strategic observations about patterns or root causes that span multiple findings.
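The difference between ranking by raw CVSS base score and ranking by impact in your environment can be illustrated with a short sketch. Everything here is an assumption for illustration: the `Finding` fields, the severity labels, and the example scores are invented and do not come from a real report.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical ordering of contextual severity labels, most urgent first.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3, "informational": 4}


@dataclass
class Finding:
    title: str
    cvss_base: float          # generic score, independent of your environment
    contextual_severity: str  # tester's rating after validation in your environment
    evidence: str             # pointer to the reproducible proof


findings = [
    Finding("Outdated TLS configuration on isolated test host", 7.5, "low",
            "scan output; host unreachable from user networks"),
    Finding("Portal lets one customer read another customer's invoices", 6.5, "critical",
            "request/response pair showing cross-account read"),
    Finding("Verbose error messages", 5.3, "medium", "screenshot of stack trace"),
]


def prioritise(findings: List[Finding]) -> List[Finding]:
    """Order by contextual severity first, using CVSS only to break ties."""
    return sorted(
        findings,
        key=lambda f: (SEVERITY_ORDER[f.contextual_severity], -f.cvss_base),
    )


for finding in prioritise(findings):
    print(finding.contextual_severity, finding.cvss_base, finding.title)
```

In this invented data the finding with the highest base score ends up last, because it was validated as having little real impact.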
The engagement does not end at delivery. A reputable provider includes retesting in the cost. Once remediation work is complete, the tester re-verifies each finding to confirm it has actually been fixed. This is the single most undervalued part of the process: it converts “we patched it” into “it is provably patched”. If a provider charges separately for retesting, or provides only a written reassessment without actually re-running the tests, treat that as a warning sign.
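Retest outcomes are easy to track per finding. The following sketch assumes a simple mapping of finding identifiers to outcomes; the identifiers, the outcome labels, and both helper functions are hypothetical.

```python
from typing import Dict, List

# Hypothetical retest outcomes, keyed by finding identifier.
retest_results: Dict[str, str] = {
    "F-01": "fixed",
    "F-02": "still_exploitable",
    "F-03": "fixed",
    "F-04": "not_retested",
}


def retest_summary(results: Dict[str, str]) -> Dict[str, List[str]]:
    """Group finding identifiers by their retest outcome."""
    summary: Dict[str, List[str]] = {}
    for finding_id, outcome in sorted(results.items()):
        summary.setdefault(outcome, []).append(finding_id)
    return summary


def remediation_verified(results: Dict[str, str]) -> bool:
    """True only when every finding was re-run and confirmed fixed."""
    return all(outcome == "fixed" for outcome in results.values())


print(retest_summary(retest_results))
print("All fixes verified:", remediation_verified(retest_results))
```

Treating "not_retested" as unverified mirrors the point above: a written reassurance is not the same as re-running the test.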
Frequently asked questions
How long does a penetration test take?
Most engagements involve one to three weeks of active testing, followed by a further one to two weeks for reporting. Larger, more complex environments take longer.
Is penetration testing legal in the UK?
Yes, with written authorisation. Without it, testing falls foul of the Computer Misuse Act 1990. Reputable providers will not begin testing without a signed authorisation letter.
What qualifications should I look for?
For UK-regulated work, CREST and NCSC CHECK are the gold standards at the company level. At the individual level, OSCP, CRTO and OSWE are reasonable indicators of hands-on capability.
Will the test break our systems?
Properly scoped testing rarely causes outages. Risky activities such as denial-of-service testing are excluded by default and only carried out where you have explicitly asked for them.
