How to Decommission On-Premise Private Networks

I have been having some interesting customer discussions lately around the future of enterprise networking. For most customers today, hybrid cloud is the reality and the new norm: traditional on-premises private networking interconnected to various cloud data centers through traditional methods such as VPN.

Some customers, though, are interested in going 100% cloud and are curious about how that might be accomplished. Particularly in the enterprise space, there is a need for a global network backbone, a network core, and remote access / VPN, in addition to various service layers. There are a lot of network services and complex configurations that either need to remain in a co-lo data center or be ported to the cloud.

These days it is possible to lift and shift these last components to the cloud. Nearly every cloud provider has all the analogs needed or can deploy a virtual appliance version of your major network boxes. Platform network services are available, as are VSAs from every manufacturer. Whether you use Cisco, Palo Alto, or anyone else, there are virtual models of your favorite appliances available so you can mirror your on-prem configuration in the cloud. This is essentially the traditional “lift and shift,” but with networking equipment in scope rather than workloads.

As we all know, lift and shift isn’t ideal. It doesn’t take advantage of modern services, doesn’t usually map cleanly to the strengths of the cloud data center, and is generally very expensive. It also holds on to legacy architecture, which will continue to require legacy skill sets in addition to the new cloud skill sets that are required.

So what is the new way? Or rather, what is a better way to provide private network services and all of their important features without keeping a physical presence and without doing a legacy lift and shift? Everybody has some ideas, and for the most part they still have one foot in legacy, trying to keep vendor relationships and skill sets relevant.

I recently had a potential customer challenge me to solve this problem as an academic exercise / B2B interview.

Defining the Business Requirement

The Internet Provides:

• A resilient network infrastructure
• Optimized routing between resources
• Support for all protocols and standards
• Ability to move / relocate resources

So why use a complex and expensive Private Network?

This is by no means a complete list, but here are some of the core features provided by a private network. These all map to common business requirements and generally provide services that are not available on the public internet.

Global Network Backbone
• Provides private communication
• Provides geographical security boundaries

Regional Network Core
• Provides network segmentation and routing
• Provides insertion point for policy and security

Network Perimeter
• Provides access to the internet
• Provides security boundary for the internal network

Remote Access / VPN
• Allows remote endpoints to participate in the internal network
• Provides network redundancy / resilience
• Provides cost-based / optimal routing
• Provides various levels of access control
• Provides physical autonomy for services
• Enables Zero-Trust models
• Enables principal-based access control

Identity Provider
• Provides authentication
• Provides federation and SSO
• Provides MFA
• Provides security principals and groups

Visibility
• NetFlow

Net Properties of a Private Network





The goal of a private network is to provide connectivity to private resources, privacy of data and communications, control over access, and in many cases, visibility into access patterns, failed events, auditing, etc.

A Modern Approach

So how can we achieve these same core capabilities and even some/all of their granular sub-capabilities without investing in these big network appliances and service layers?

An ideal solution would have these qualities:

  • Use the public internet as a global backbone
  • Utilize a Zero-Trust / no access default
  • Provide the optimal route between resources
  • Provide private communication over the wire at all times
  • Allow for centralized and discrete Access Control to resources
  • Provide an audit trail for compliance and security purposes
  • Not require proprietary or expensive hardware

Thought Process

The internet itself provides a resilient, performant, and inexpensive access layer. Anyone has access to this default fabric from just about anywhere and over a wide variety of mediums.

The best path between any two devices is generally via their public IP addresses or the public IP address of their local internet gateway.

Various technologies exist for authenticating, authorizing, and securing access between resources.

Zero Trust, not just at the authentication layer, but also at the access layer would prevent brute force or exploit attack vectors on internet accessible resources.

Point to Point VPNs would allow secure communication between any two devices but also allow access to services that are not presented over public IP addresses.

A persistent private ID for each resource would support full mobility of resources between remote offices, cloud regions, etc.

An installable OS / network extension would be ideal, as opposed to a physical-appliance-based approach.

A solution that could utilize existing cloud-based identity providers would be simple, would facilitate central management, and would take advantage of native features in that IDP such as MFA, risk-based conditional access, and SSO.

A solution with a simple and central management interface could easily take the place of countless devices and the administrative burden and complexity that they bring.

So where to start?

It occurred to me that a point to point / mesh VPN with central management would knock most of these requirements out easily. I did some research and found a solution that I thought would work well. I did some testing and came back with this:

  • Point to Point communication between clients over the public net
  • Superior NAT traversal
  • Strong key-based encryption
  • Persistent private IP addresses with a variety of DNS options
  • Zero-Trust default
  • Centrally managed Access Control and Endpoint visibility
  • Makes use of common IDPs such as Azure AD or Google
  • Inherits the advanced security features from these IDPs
  • A variety of ACL types including user and group based ACLs
  • Software only / agent based on all major OSs including mobile devices
  • Ability to create proxies to bastioned resources or resources that don’t support the client
  • Ability to create routes between remote data centers and maintain ACLs for those address spaces
  • Audit logging at top of feature roadmap
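
The power of combining a zero-trust default with principal (user/group) based ACLs can be sketched in a few lines. This is a toy model in Python with a hypothetical rule format of my own invention, not any vendor's actual policy syntax; real products express this in their own policy language:

```python
# Toy model of zero-trust, group-based access control.
# Group names, resource names, and the rule format are all illustrative.

GROUPS = {
    "alice": {"vsphere-admins"},
    "bob": {"developers"},
}

# Default deny: a resource is reachable only if a rule explicitly grants it.
ACL_RULES = [
    {"group": "vsphere-admins", "resource": "vcenter.internal", "port": 443},
]

def is_allowed(user: str, resource: str, port: int) -> bool:
    """Zero-trust check: no matching rule means no network access at all."""
    user_groups = GROUPS.get(user, set())
    return any(
        rule["resource"] == resource
        and rule["port"] == port
        and rule["group"] in user_groups
        for rule in ACL_RULES
    )

print(is_allowed("alice", "vcenter.internal", 443))  # True
print(is_allowed("bob", "vcenter.internal", 443))    # False
```

The important property is the shape of the check, not the code: an unknown user or an unlisted resource falls through to deny, so adding a user to a group is the only way to open a path.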


I am seriously impressed. The first thing that jumped out at me is how powerful Zero-Trust is when combined with Principal (user/group) based ACLs. Being able to control resources based on group membership is hugely beneficial for a large number of reasons.

I took this another step further. Most resources support AD-integrated or LDAP-based authentication. Making use of groups, I was able to secure resources such that not only is login prohibited without the right group membership, but all network access is prohibited as well. I tested a scenario where a network resource was unreachable and login would not have been allowed even if it were reachable. Then, simply by adding a user to an Active Directory group, I was able to grant both network access and login access. Furthermore, the access mechanism (the client) requires MFA to log in to the network. An extra data point: the network resource is bastioned and has ZERO internet access at all. This creates several layers of security that are difficult to achieve with a legacy network.


As you can see, there is no network access to my Virtual Center server. It is on a bastioned private network and I am on the public internet. The security group that I will use to grant access is empty.

Granting Access

As you can see in the top right, I’m logged in via SSO


In conclusion, I think it’s possible to create enterprise services equivalent to the ones we use on-prem today. I think we can do it without a lot of complexity, and I think we can do it with a smaller team than ever before. Taking a completely bastioned service and exposing access over the public net through an Active Directory group membership is something I have never personally experienced using complex and expensive private networking products. I know it’s possible, but the implementation overhead and capital expenses are formidable. I think that this approach could be a win.

Wrapping Up

• Challenge traditional approaches to infrastructure, security, and access
• Cloud and hybrid cloud infrastructures can be significantly simpler, but must be embraced
• The answer may be throwing out attachment to legacy services
• Leverage modern authentication and IDPs wherever possible
• Network with people who are learning and doing interesting things


Installing an Azure Stack PoC in a VMware Virtual Machine


Azure Stack (rebranded as Azure Stack Hub, not to be confused with Azure Stack HCI) is offered as an HCI solution by Microsoft OEM hardware partners and is supported by Microsoft. In short, Azure Stack is an on-premises Azure region. I won’t go into detail about that here; keep an eye out for a separate article on that.

Azure Stack requires a minimum of 4 hosts with identical storage configurations. You are able to install an Azure Stack PoC on a single host provided that it has enough resources. This is called the ASDK (Azure Stack Development Kit). There are some firm MINIMUM requirements for installing the ASDK, and they should not be neglected.

Minimum Hardware Requirements

  • Physical Computer
  • 96GB of RAM, 16 CPU cores
  • 450GB of storage across 4 physical drives.
  • Drives must be identified as SAS connected
  • Drives must be identified as NVMe, SSD, or HDD

We’re not just setting up a hypervisor here. A fully deployed ASDK includes many virtual machines all of which are running cloud fabric services. It can take a full day to deploy all of these resources so please, take the HW requirements seriously.

I happen to have a server in my lab that can provide this level of resources; however, I didn’t want to bother with reconfiguring the host outside of my vSphere cluster, so I was determined to deploy it as a VM. There are several other articles on the net about this topic, but no single one of them provided everything that I needed to be successful, and none of them that I found directly solved the last problem I had to overcome. I didn’t try this on VMware Workstation, so those tutorials may be adequate for that purpose, but this one covers getting the ASDK up and running on ESXi.


I’m not going to go through the setup procedure step-by-step. I am assuming that you have read the documentation and are aware of the process for deploying Azure Stack. Perhaps I will do a post specifically on that but there are plenty out there already. The items below will only address overcoming the barriers, not the whole process of pre-work, staging, or actual installation steps.


My first recommendation is to read this article in its entirety before you start installing things. Some of the more critical items are covered last but need to be taken into account from the beginning.

Inside your ASDK Virtual Disk environment, disable the Windows Update Service. Installation of Azure Stack can take 24 hours. You don’t want Windows deciding to update itself during this time.

I do recommend following the resource recommendations. If you are able, use RDMs or VMDKs that aren’t sharing a drive. If you’ve got SSDs or SAN-backed storage with plenty of performance, that is fine too; just keep in mind that there will be a nested hypervisor and a lot of nested VMs, which will all need resources to perform adequately. You don’t have to have a full 96GB of RAM or 16 cores, but you want to. Give it more if you can.

Last, I recommend that you research anything that doesn’t just make sense. I usually go into detail and fully explain things but I’m short on time today. If you don’t know how to do these things, look them up, nothing is mysterious or proprietary. If you can’t find a file or setting referenced here then first familiarize yourself with the whole installation process using the most recent guides.

I’m also not going to specify line numbers for modifying files because they change somewhat frequently.

Requirement 1 – Physical Computer

Yes, the installer checks for this. I didn’t find a satisfactory way to hide from the installer that the installation was inside of a virtual machine. The trick is to modify the installation scripts to disregard the fact that it’s installing in a VM.

  • Enable CPU Virtualization on the VM, you will need to run a nested Hypervisor so don’t forget this step
  • Modify the asdk-installer.ps1 script. Search for “Physical” and identify the code block with an if statement referring to this being a physical host. Comment out the block or modify it in your own way to make sure that installation continues if the host is detected as a VM
  • Note: this file doesn’t exist until you have already failed an install or manually pre-staged all of the dependencies. Inside the ASDK boot VMDK, modify C:\CloudDeployment\Roles\PhysicalMachines\Tests\BareMetal.Tests.ps1. The approach is the same, and there are tutorials on this already: find the check blocking installation on a VM and alter it. Just change the $true/$false flag in the if statement.
  • Note: I have read that HW Version 11 is required. I have not tested with prior versions.

After performing these steps, installation should continue on a Virtual Machine. You can always rerun the InstallAzureStackPoC.ps1 with the -rerun flag and install will pick up where it left off provided there are no truly serious errors.

Requirement 2 – Resource Minimums

I do recommend having the minimum number of resources. Yes, I know, broken record. There are reasons! But we can’t always make that a reality so you can alter the minimum resource requirements in these files. Note: Sometimes these paths change.

  • C:\CloudDeployment\Configuration\Roles\Infrastructure\BareMetal\OneNodeRole.xml
  • C:\CloudDeployment\Configuration\Roles\Fabric\VirtualMachines\OneNodeRole.xml

Requirement 3 – Physical Hard Drives

The physical hard drive requirement isn’t a performance issue. The drives will be used to make a Storage Spaces Direct storage pool, so they will need to pass the Failover Cluster health checks for Storage Spaces Direct, which include a check for multi-writer capability. There are also drive minimums for Storage Spaces Direct. If you provide 8 devices with 2TB of storage you will get a resilient storage pool; otherwise you will get a non-resilient storage pool (no parity or mirroring). If you have a mix of HDD and SSD, then provide a couple of small SSD devices (minimum of 2) as a cache tier.

  • Use RDMs or make your VMDK file Eager Zero Thick (There is no way around this)
  • Do not enable VirtualSSD options if the devices are HDD
  • Select “Multi-Writer” sharing for the VMDK files
  • Use the LSI Logic SAS controller
  • Select Virtual SCSI Bus Sharing

This will allow your disks to pass the Cluster Validation Checks

Requirement 4 – SAS Bus

If you’re using the LSI Logic SAS Virtual SCSI Adapter then this is a non-issue. I don’t recommend fighting with the other device types.

This will probably not apply to a virtual environment, but for completeness, you can add acceptable bus types here (you must still conform to Storage Spaces Direct requirements). Note: the original instructions are deprecated and this should be a non-issue for vSphere, so I have removed them. I may update this later.

Requirement 5 – Drives must be identified as NVMe, SSD, or HDD

In my ESXi 6.7 environment, SSDs are properly identified in the Windows guest as SSD drives. HDD drives are detected as SSD or “unspecified.” If HDDs are improperly detected as SSDs, there can be a performance issue, as Storage Spaces Direct is an intelligent storage system and may try to use native SSD command sets. Unspecified drive types are not allowed and fail the health checks. Note: if all of your drives are SSDs and they are all detected as SSDs, then this section is probably unnecessary.

Here is where this gets problematic. As of this post we can manually specify the media type of a drive using PowerShell. However, that cannot be done until the disk is a member of a non-primordial storage pool. As soon as they are removed from the storage pool, they lose their custom attributes. This creates a chicken and egg dilemma. Luckily if we create the proper Storage Pool for the ASDK installer, it will use it and successfully install. The trick here is just knowing what to call it.

  • Create a storage pool called SU1_Pool and include all of your drives
  • Make sure that all HDDs are properly marked as HDDs and all SSDs are properly marked as SSDs
  • Change any FriendlyNames to your preference
  • See this guide here for examples: Managing Storage Spaces Direct with PowerShell

This should be enough to get you through any of the hurdles that I hit along the way. Feel free to ping me with questions or assistance.

How to Manage Storage Spaces Direct with PowerShell

Mini Guide

I need to get some examples documented here as reference for another post so this will be skinny but I will circle back and make it more complete later.

Getting Physical Disk Information

Get-PhysicalDisk |fl *Friendly*,*Media*,*Size*,*Serial*

Changing Physical Disk Properties

Note that Physical Disk properties can only be changed for disks that are in a non-Primordial storage pool. Once the disks are removed from a storage pool, the manually assigned properties revert.

$disk = Get-PhysicalDisk -SerialNumber 600029212*
Set-PhysicalDisk -InputObject $disk -NewFriendlyName HDD1 -MediaType HDD

Azure NetApp Files

As of this blog, ANF is generally available; however, it is not yet visible in the Azure Portal to all users. There is a waitlist to get whitelisted for provisioning, but if you would like that expedited, please contact me and I will get you provisioned.

What is Azure NetApp Files?

Microsoft Azure and NetApp have partnered to bring a first-class NAS environment to Azure, utilizing NetApp’s ONTAP as the backing storage service.

Azure NetApp Files, henceforth referred to as ANF, provides a resilient and performant NFS or CIFS environment for general file shares, HPC, or database environments. ANF also provides ransomware protection via snapshots. ANF is not available in every region, but it is available in at least one region in most geographies.

ANF is a 1st-party service. This means that it is provided by Microsoft Azure. There is no NetApp Account team, no reseller (unless you are buying Azure through a reseller) and all ANF usage is billed directly from Azure.

How can I use ANF?

ANF is provisioned and administered directly in the Azure Portal, however, it is not available there until you have been whitelisted. There is a Waitlist for getting your ANF consumption approved and enabled.

How am I charged for ANF?

Getting whitelisted for ANF does not obligate you to buy ANF and if you decide to remove ANF from your Azure environment, you are no longer charged. It is truly subscription and consumption based like any other Azure service.

The base unit of ANF consumption is a Capacity Pool. Capacity Pools are a minimum of 4TiB and a maximum of 100TiB. You can provision more than one Capacity Pool. Upon creating your first Capacity Pool, you will be charged an hourly rate for the consumption of that pool. The Capacity Pool will autogrow if you exceed its capacity. There are different performance levels available for a Capacity Pool, and each of these is billed at a different rate. Keep your eye out for a more in-depth page on the technical details of the different logical constructs within ANF and how they are used.

How does ANF compare to CVO?

Cloud Volumes ONTAP (the software previously known as ONTAP Cloud) is a virtual machine deployed into your cloud environment. The virtual machine runs ONTAP and is able to provide all features and functionality of an ONTAP hardware appliance (other than Fibre Channel). CVO can run stand-alone or as an HA pair spanning Availability Zones. There is some flexibility with CVO, particularly if you want to use it for test/dev or as an ONTAP replication target.

Today ANF is a secure multi-tenant hardware appliance sitting adjacent to Azure. Because of this, it has substantial performance advantages over CVO. Because it is a first-generation 1st-party service, not every ONTAP feature is currently available for ANF.

In short, use CVO for all things ONTAP and use ANF for all things performance, whether it be a clustered HPC environment or an Oracle database.

Basic ANF pricing at MSRP
Note: I can provide better pricing if you contact me

Pricing details (West US 2)

Standard Storage: $0.000202/GiB/hour
Premium Storage: $0.000403/GiB/hour
Ultra Storage: $0.000538/GiB/hour
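
Since billing is per GiB per hour, it helps to translate the rates into a monthly figure. A quick back-of-the-envelope calculation for the minimum 4TiB pool, assuming roughly 730 billable hours per month (rates as listed above; check the Azure pricing page for current numbers):

```python
# Rough monthly cost of a minimum-size Capacity Pool at the listed
# West US 2 rates. Assumes ~730 billable hours per month.

POOL_GIB = 4 * 1024          # minimum Capacity Pool: 4 TiB = 4096 GiB
HOURS_PER_MONTH = 730

rates = {                    # $/GiB/hour, from the table above
    "Standard": 0.000202,
    "Premium": 0.000403,
    "Ultra": 0.000538,
}

for tier, rate in rates.items():
    monthly = POOL_GIB * rate * HOURS_PER_MONTH
    print(f"{tier}: ${monthly:,.2f}/month")
```

At those rates the minimum pool works out to roughly $604, $1,205, and $1,609 per month for Standard, Premium, and Ultra respectively.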

Getting Started

Microsoft Azure NetApp Files Waitlist Request
Note: I can get you provisioned quicker than the Waitlist

Persistent Memory 101

I’ve written a few guides for Persistent Memory recently and slipped in bits and pieces of information here and there. I decided to consolidate the little things, like nomenclature, in one place. So if things in another post aren’t clear just because you haven’t read extensively, the answers should be here. Moving those things here will make them easier to find and keep them consistent. Hopefully it’s also less distracting from the content in the other posts.

Persistent Memory is different from traditional non-volatile storage in that it is actual memory-addressable memory. We’re talking about DIMMs located in DIMM sockets on a motherboard. This is actual DRAM that has an added mechanism or mechanisms for making it persistent across reboots or power outages.

While being incredibly fast, RAM does not easily lend itself to being a storage media. Most applications function by accessing a block device, not a memory region and these constructs just aren’t available without some assistance. DIMMs are not inherently all that serviceable. You have a pool of RAM and one stick goes bad. How can this be tolerated? How can I perform a hot-memory replacement with minimal impact? To address all of these things, CPU and Motherboard manufacturers have had to extend some specifications in order to do things like partition and group the DIMMs in intelligent and/or configurable ways.

Operating Systems have had to add device types and features to support an array of different access methods. Do we want to access our Persistent Memory as a traditional block device, a PMEM aware block device, or a character device that applications can access natively? All of these are possible but require different configuration steps and have different performance profiles.
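
The difference between these access methods can be illustrated with an ordinary memory map. The Python sketch below maps a file into the process address space and stores bytes with plain memory writes; this goes through the page cache, whereas the same style of access against a DAX-mounted file system would hit persistent memory directly. The file name is just a placeholder for the demo:

```python
import mmap
import os

# Illustrative only: an ordinary file-backed mmap. On a DAX-mounted
# EXT4/XFS file system the same style of access reaches persistent
# memory directly, with no page cache in between.
path = "pmem-demo.bin"
with open(path, "wb") as f:
    f.truncate(4096)                 # reserve one page

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 4096)
    mm[0:5] = b"hello"               # store bytes with plain memory writes
    mm.flush()                       # push the dirty page to the backing store
    mm.close()

with open(path, "rb") as f:
    data = f.read(5)
print(data)                          # b'hello'
os.remove(path)
```

Applications with native PMEM support do essentially this against a character device or DAX file, skipping the block layer entirely.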

Persistent Memory Nomenclature

PMEM refers to DRAM memory spaces which have been backed by a persistence mechanism such as a battery and/or NAND directly attached for de-staging.

An NVDIMM is the physical component of Persistent Memory. NVDIMMs fit into a RAM socket on an NVDIMM compatible motherboard.

A PMEM Region is a logical unit of PMEM. A region could be a single NVDIMM or a partition on a single NVDIMM. It could be all of the space on all of your NVDIMMs collectively, or it could be a partition sliced across multiple NVDIMMs. As a gross generalization, Regions are configured in BIOS/EFI and are constructed before the OS boots. I like to think of NVDIMMs as physical disks and a Region as a logical drive.

PMEM Driver
The Linux kernel PMEM driver. This driver is required for initializing NVDIMM Regions and constructing Namespaces.

Namespace
Much like NVMe, PMEM makes use of namespaces. A Namespace can be an entire Region or a piece of a Region. The Namespace is the basic construct the operating system will work against.

Page Cache or Buffer Cache
Read-after-write cache where blocks written to persistent media are tracked and cached in memory. This speeds up IO for disk-based media, but it is unnecessary for PMEM and can actually slow things down. When PMEM is used as a block device hosting a file system, the page cache is in use.

DAX (Direct Access)
Capability that allows a file system to bypass the page cache and write directly to PMEM. Currently EXT4 and XFS have DAX support if mounted with the dax mount option.

vPMEM
VMware nomenclature for passing an NVDIMM directly to a VM. When using vPMEM, PMEM capacity is passed directly through to a VM as a virtual NVDIMM. The guest operating system must support NVDIMMs.

vPMEMDisk
VMware nomenclature for presenting PMEM capacity to a VM as a vmdk file connected to a virtual SCSI controller.

ndctl
User-space CLI tool for configuring PMEM namespaces.

PMDK
The Persistent Memory Development Kit provides additional tools and libraries for managing PMEM.

Character Device
Devices where the driver communicates by sending and receiving a single character at a time rather than a whole block of data. This is the type of device used by applications with native PMEM support.

How to break a hung lock in NetApp ONTAP 9

If the reason you need to close locked files is to stop the whole CIFS server, then there are different instructions.  If you are just trying to recover a file that has a stuck lock, this should help.

If you are familiar with Windows file servers, the NetApp CIFS server works the same way. You can connect to it with the Shared Folders MMC snap-in in Windows. The user you are logged into Windows as needs to have Administrator rights on the file share.


In the MMC

File->Add/Remove Snap-In->Shared Folders->Add

Select “Another Computer” and enter the name of the vServer.

You can leave “All” selected.

Click “Okay”

Navigate to “Open Files”

Find the file that is locked

Right-Click on the file and choose “Close Open File”

This will fix a locked file issue most of the time.

Closing the file from the ONTAP command line is a lot more complicated.

You need to know the vServer Name (file server)

You need to know the Volume Name (usually the share name)

It helps to know the whole path to the file.

Login to the ONTAP Command Line

To show ALL locks (lots of output)

vserver locks show -protocol cifs

To show all locks on one vserver

vserver locks show -protocol cifs -vserver [vservername]

To show all locks on a specific volume and or path

vserver locks show -protocol cifs -vserver [vservername] -volume [volumeName] -path [ontapPathToFile]


vserver locks show -vserver wdl-svm-management -volume wdl_files

wdl-ontap1::> vserver locks show -vserver wdl-svm-management -volume wdl_files  -protocol cifs                                                       
Vserver: wdl-svm-management
Volume   Object Path               LIF         Protocol  Lock Type   Client
-------- ------------------------- ----------- --------- ----------- ----------
         /wdl_files/               wdl-svm-management_cifs_lif1
                                               cifs      share-level
                Sharelock Mode: read-deny_none
                Sharelock Mode: read-deny_none
                                               cifs      share-level
                Sharelock Mode: read-deny_none
                                               cifs      share-level
                Sharelock Mode: read-deny_none
4 entries were displayed.

To break a lock, use the break command and the full path from the output above

vserver locks break -vserver [vservername] -volume [volumename] -path [full_path]


wdl-ontap1::> vserver locks break -vserver wdl-svm-management -volume wdl_files -path /wdl_files/home/Administrator/Security/Certificates/CAs/LAB-PDX-DC-01-CA

Warning: Breaking file locks can cause applications to become unsynchronized and may lead to data corruption.
Do you want to continue? {y|n}: y
1 entry was acted on.
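
If you have many locks to clear, the show output can be parsed to collect the paths for the break command. A rough sketch, assuming output shaped like the capture above; for anything serious, prefer the CLI's own field selectors or the ONTAP REST API:

```python
# Rough sketch: pull object paths out of captured `vserver locks show`
# output so they can be fed to `vserver locks break`. The sample below
# is an abbreviated version of the capture above.
sample = """\
Vserver: wdl-svm-management
Volume   Object Path               LIF         Protocol  Lock Type   Client
-------- ------------------------- ----------- --------- ----------- ----------
         /wdl_files/               wdl-svm-management_cifs_lif1
                                               cifs      share-level
4 entries were displayed."""

def lock_paths(output: str) -> list[str]:
    """Collect unique tokens that look like ONTAP object paths."""
    paths = []
    for line in output.splitlines():
        for token in line.split():
            if token.startswith("/") and token not in paths:
                paths.append(token)
    return paths

print(lock_paths(sample))  # ['/wdl_files/']
```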

vSphere 6.7 – Resetting the SSL state back to zero

I have a few other tutorials on here regarding vSphere SSL certificates. I found that there were a variety of issues which led to a problematic SSL state that was difficult to recover from.

This guide will show you how to get back to a stable starting point so that, once you understand the process, you can install custom SSL certificates without any problems.

This guide only covers a VCSA environment. If you have a Windows environment, the tools and paths will be different, however, the concepts are the same.

I have created some scripts to make this process simpler. When I have a moment, I will upload them to github and link it HERE. (If you would find these helpful before I get that done, please contact me and I’ll get them uploaded sooner).

Step 1:
Unregister any 3rd-Party Extensions. These will often block successfully installing / updating the PSC certificates. Here are a couple of useful example links, or refer to the documentation for your 3rd-Party Extension Provider.
Remove Extensions using SSH
Remove Extensions using the MOB Browser

Step 2:
Attempt to use Certificate Manager to revert to default / self-signed certificates. This may not work if you are having other SSL related issues but try.

Step 3:
Identify and remove all non-VMware Root CAs registered in the certificate store. This can feel complicated the first time. You will need to get familiar with a few tools; hopefully you are comfortable with the Linux CLI. This was tedious enough for me that I wrote some scripts, which I will reference in addition to showing you the command-line utilities. The instructions for doing this are included below.

Step 4:
Assuming that Step 2 was not successful before, attempt Step 2 again. If you can’t get Step 2 working then installing your own certs won’t go any better.

Step 5:
If you can’t get Step 2 working, then you are going to have to parse through your logs for warnings or errors. I recommend backing up or deleting your Certificate Manager log file and running Step 2 again. This way you will only have to parse data from one run of the process.

rm /var/log/vmware/vmafd/certificate-manager.log
Or use my script
cat /var/log/vmware/vmafd/certificate-manager.log |grep -i 'warning \|error \|fail' |more

I recommend starting with your favorite search engine for errors, but feel free to reach out to me if you can’t find a solution.

Step 6:
If you can’t get to the bottom of this, then I recommend upgrading to the latest update available and installing all available patches then trying again from Step 2.

If this doesn’t work then you might need to reinstall your PSC. This isn’t too difficult actually. Just do a backup, run the installer (Install the latest version) and redeploy using the backup files.

Viewing the Contents of the Root Certificate Store

vecs-cli usage:
/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store TRUSTED_ROOTS --text

This will make it easier to read
/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store TRUSTED_ROOTS --text |grep -i 'Alias\|Subject:\|Before\|After\|issuer'

Or use my script

Example Output
Alias : f052cf63552bc9cb365c199b3320fa383415979f
        Issuer: CN=CA, DC=vsphere, DC=local, C=US, ST=California, O=wdl-psc-00.lab.local, OU=VMware Engineering
            Not Before: Feb 21 20:31:25 2019 GMT
            Not After : Feb 18 20:31:25 2029 GMT
        Subject: CN=CA, DC=vsphere, DC=local, C=US, ST=California, O=wdl-psc-00.lab.local, OU=VMware Engineering
Alias : 5ab252164061b935c22128f875a264fec8efd1d0
        Issuer: CN=CA, DC=vsphere, DC=local, C=US, ST=California, O=pdx-psc-00.lab.local, OU=VMware Engineering
            Not Before: Feb 27 16:20:59 2019 GMT
            Not After : Feb 24 16:20:59 2029 GMT
        Subject: CN=CA, DC=vsphere, DC=local, C=US, ST=California, O=pdx-psc-00.lab.local, OU=VMware Engineering
Alias : ff1f984a104a7c265ab6a3bd98c5b9a22c809b70
        Issuer: DC=local, DC=lab, CN=lab-PDX-DC-01-CA
            Not Before: Feb 20 17:08:18 2019 GMT
            Not After : Feb 20 17:18:18 2039 GMT
        Subject: DC=local, DC=lab, CN=lab-PDX-DC-01-CA
Alias : e6575bb7c6e3486bd4355e236e8dbefb0ddfb013
        Issuer: DC=local, DC=lab, CN=lab-PDX-DC-01-CA
            Not Before: Mar  2 18:51:15 2019 GMT
            Not After : Mar  2 19:01:15 2021 GMT
        Subject: C=US, ST=OR, L=PDX, O=Local Lab, OU=Engineering, CN=PDX-PSC-00-CA
                CA Issuers - URI:ldap:///CN=lab-PDX-DC-01-CA,CN=AIA,CN=Public%20Key%20Services,CN=Services,CN=Configuration,DC=lab,DC=local?cACertificate?base?objectClass=certificationAuthority
Alias : e56a6f43a38003e101c2abfc35f0ad50de7218b9
        Issuer: DC=local, DC=lab, CN=lab-PDX-DC-01-CA
            Not Before: Feb 25 05:23:12 2019 GMT
            Not After : Feb 25 05:33:12 2021 GMT
        Subject: C=US, ST=WA, L=WDL, O=Lab.local, OU=Engineering, CN=WDL-PSC-00-CA
                CA Issuers - URI:ldap:///CN=lab-PDX-DC-01-CA,CN=AIA,CN=Public%20Key%20Services,CN=Services,CN=Configuration,DC=lab,DC=local?cACertificate?base?objectClass=certificationAuthority

In the case above you would want to identify (and delete) the aliases for everything that isn’t a VMware self-signed cert.

The process is below or you can use my script.

Backing up the Aliases
Backing up the aliases is part of deleting them. I will assume that you have a folder called /certs on your PSC host.

vecs-cli usage:
/usr/lib/vmware-vmafd/bin/vecs-cli entry getcert --store TRUSTED_ROOTS --alias $ALIAS --output /certs/$ALIAS.crt

/usr/lib/vmware-vmafd/bin/vecs-cli entry getcert --store TRUSTED_ROOTS --alias ff1f984a104a7c265ab6a3bd98c5b9a22c809b70 --output /certs/ff1f984a104a7c265ab6a3bd98c5b9a22c809b70.crt

Un-publishing the Alias
The alias needs to be unpublished before it is deleted or there is some risk that the certificate will be restored to the certificate store. The backup copy of the cert is used for this process.

dir-cli usage:
/usr/lib/vmware-vmafd/bin/dir-cli trustedcert unpublish --cert "/certs/$ALIAS.crt"

/usr/lib/vmware-vmafd/bin/dir-cli trustedcert unpublish --cert "/certs/ff1f984a104a7c265ab6a3bd98c5b9a22c809b70.crt"

Deleting the Alias

vecs-cli usage:
/usr/lib/vmware-vmafd/bin/vecs-cli entry delete --store TRUSTED_ROOTS --alias $ALIAS

/usr/lib/vmware-vmafd/bin/vecs-cli entry delete --store TRUSTED_ROOTS --alias ff1f984a104a7c265ab6a3bd98c5b9a22c809b70
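My scripts just wrap the three commands above (backup, unpublish, delete) in a function you can call per alias. A minimal sketch, assuming the stock VCSA binary paths and the /certs folder from earlier (cleanup_alias is my name, not a VMware tool):

```shell
#!/bin/sh
VECS=${VECS:-/usr/lib/vmware-vmafd/bin/vecs-cli}
DIRCLI=${DIRCLI:-/usr/lib/vmware-vmafd/bin/dir-cli}
CERT_DIR=${CERT_DIR:-/certs}

# Back up, unpublish, then delete a single TRUSTED_ROOTS alias.
cleanup_alias() {
    a="$1"
    "$VECS" entry getcert --store TRUSTED_ROOTS --alias "$a" \
        --output "$CERT_DIR/$a.crt" || return 1
    "$DIRCLI" trustedcert unpublish --cert "$CERT_DIR/$a.crt" || return 1
    "$VECS" entry delete --store TRUSTED_ROOTS --alias "$a"
}

# Usage: cleanup_alias ff1f984a104a7c265ab6a3bd98c5b9a22c809b70
```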

Afterward list the aliases again to make sure the one you deleted is gone.

vSphere 6.7 – Custom SSL Certificates

Note: There are 20 tutorials for installing custom SSL certificates out there on the net, so I’m not going to cover that in detail. What I will cover here are all of the little things that you will need in order to be successful following one of those tutorials. They aren’t comprehensive, and they assume everything goes according to plan. In my experience that just isn’t a fair representation of what happens, especially after upgrades or after not getting it right the first time. So read this first and then go try one of the tutorials. Or if you’re stuck on one of those tutorials, hopefully this will get you out of the muck. I’ll post links to some useful tutorials and documentation at the end of this post.

Custom SSL certificates prior to vSphere 6.x were a frustrating proposition. It wasn’t particularly easy to do, and once done, it caused a lot of miscellaneous operational issues. It almost always made troubleshooting more difficult and it could cause communication issues between 3rd-party components. The way it’s done now is consistent and convenient. It’s even easy once it’s in place.

In 6.x, the Platform Services Controller issues SSL certificates to every component participating in the SSO domain. Even 3rd party plug-ins are issued an SSL cert directly from the PSC. This is done by default with a self-signed root CA certificate. All we need to do in order to have the PSC issue valid SSL certificates for our own environment is to authorize it as a signing authority in our SSL signing chain.

Check out my other blog posts and security pages on SSL basics and 80/20 rules for success.

The process is the same for 6.5 and 6.7. I have had more success with these tools the newer the version. If you’re considering an upgrade to 6.7, then do that first. If you’re staying on 6.5 or 6.7, install all available updates and patches first.

First of all, before you start, or if you’re having trouble: install the latest patches. Before you get into the meat of installing your own certificates, install the latest versions, updates, and patches. Did I say that enough times? There are bugs in the SSL tools in pretty much every version; one of them is even related to creating your CSRs, so upgrade before you even get started.

I am only covering the process for the VCSA and not a Windows VC server. I have a distributed VCSA environment in my lab with multiple sites, PSCs, and VCs. The important pieces covered here should apply equally to a Windows install, however, some of the commands and paths referenced will be different on Windows.

Step 0: Install all available upgrades and patches (one last time).
Then make sure that the forward (A/CNAME) and reverse (PTR) DNS records for these things are correct:
Every PSC, every vCenter Server, and every ESXi host

Step 1: Verify that you have an Enterprise Certificate Authority in your environment, that you are able to request certificates, and that you know how to contact the CA administrator. Also make sure that your CA configuration is up to date and using SHA256 instead of SHA1; SHA1-signed certificates will not be considered valid by most clients. This shouldn’t be an issue unless your CA has been around a long time or unless you’re starting out with an older OS to provide your CA, like Server 2012. Note that even SHA256 certificates will have an SHA1 thumbprint. Don’t worry about that while troubleshooting; it’s normal.

Step 2: Follow the instructions (in other tutorials) for creating a VCSA Signing Certificate Template for 6.5 and higher. You will need this template to correctly fill your CSR.

Step 3: Download and install OpenSSL on your workstation, whether Windows or Linux. And look up the instructions for converting PKCS12 certificates to PEM format. You might need this for Step 8.

Step 4: If you have already done custom SSL certificates on your VCSA and are having trouble, or if you are replacing existing custom SSL certificates (i.e., you don’t have a blank SSL slate), then you will want to follow this tutorial to reset your SSL environment back to zero.

Step 5: Log in to your VCSA all-in-one, or to your external Platform Services Controller, and run the certificate manager program.

Step 6: Choose “Replace SSL Certificate with Custom Signing Certificate and replace certificates.” The actual number for this varies depending on vSphere version and install type. You will need to enter an SSO admin credential.
Then choose to create a Certificate Signing Request.
When asked if you want to replace all certs, answer yes.
When asked if you want to configure the SSL configuration file, choose yes.
What you enter here is important. Enter the typical answers for Location, State, etc.
When asked for the Common Name, DO NOT ENTER THE FQDN OF YOUR PSC HOST. If you do, the whole process will fail several steps later.
This will be the name of the CA that is created. Ask your CA admin or examine an existing SSL certificate to determine if there is a naming convention. If you’re not sure, use HOSTNAME-CA.
Don’t enter an IP Address for your PSC; it’s unnecessary. For the last question, asking the name of your CA, use what you entered for the CN above. I will refer to this as CA_NAME for the rest of the guide.
Save your CSR and key someplace that is easy to find, like /root or /tmp.
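Certificate Manager writes the CSR for you, but before you submit it, it’s worth checking the Subject it encoded, since a wrong CN is the most common failure here. A demonstration with a throwaway CSR (all names here are made up):

```shell
# Generate a stand-in CSR with the CN set to a CA name, NOT the PSC FQDN:
openssl req -new -newkey rsa:2048 -nodes \
        -subj '/C=US/ST=OR/L=PDX/O=Local Lab/CN=PDX-PSC-00-CA' \
        -keyout demo.key -out demo.csr 2>/dev/null

# Read the subject back out of the CSR to verify it before signing:
openssl req -in demo.csr -noout -subject
```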

Step 7: Get your CSR signed with the VCSA Template created in Step 2 and export it as Base64.

Step 8: Add the whole certificate chain to your certificate.
Check out my guide for doing this, “How to include the whole Certificate Chain in a PEM SSL Certificate,” below.

Step 9: Copy your new certificate chain file back to the PSC host

Step 10: If Certificate Manager is still up on your PSC then continue to import Custom SSL Certificate. If it’s not, rerun Certificate Manager. Choose the custom SSL option again, and then choose to import your custom certificate chain.
Provide the full path to your certificate
Provide the full path to your key file
Watch the prompts on the above two entries carefully, and notice whether it accepts your cert before you enter the key.

If you immediately encounter an error with the certificates, check Steps 0, 4, 6, and 8.

It will go through a lengthy process of generating and replacing keys and restarting services. If this process fails, refer to Steps 0, 4, and 6 as the problem will almost always be in one of those places.

Step 11: If you are on 6.7u1, this step is optional. Run certificate manager again. Choose to replace the Machine Certificate.
When asked if you want to reconfigure the SSL configuration, choose yes.
When asked for the CN, go ahead and use the FQDN now.
When asked for the CA name (last question), use the CA_NAME identified in Step 6.

Step 12: If you are on 6.7u1, this step is optional. Run certificate manager again. Choose to replace the Web Services Certificate.
When asked if you want to reconfigure the SSL configuration, choose yes.
When asked for the CN, use something different than you used for step 11. Maybe web-FQDN.
When asked for the CA name (last question), use the CA_NAME identified in Step 6.

Step 13: If you have an external PSC, then login to your vCenter Server and perform steps 11 and 12 for the vCenter Server.

Step 14: Restart the services on your vCenter Server (especially if you have an external PSC)

Step 15: Navigate to the FQDN of your vCenter Server. If you don’t have a clean SSL state, then inspect the site’s SSL certificate. If you see the whole chain, then it’s probably a caching / cookie issue. Clear your cookies or restart your browser. If it still isn’t working, look at the specific error message in your browser for a clue to the problem.

Step 16: Navigate to the appliance management (VAMI) URL for your PSC and VCSA appliance(s). You MIGHT find that you have a valid SSL certificate there. You MIGHT find that you don’t.
If you don’t, then follow these instructions to fix it.

Step 17: ESXi Host Certificates. Your ESXi hosts won’t accept a new certificate from the PSC until that certificate is 24 hours old. If you don’t want to wait 24 hours, you can adjust this. It is an advanced vCenter Server setting, expressed in minutes. Just change it to something like 5 minutes.
When you are done adding hosts you can change it back to the default.
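I believe the setting in question is vpxd.certmgmt.certs.minutesBefore (default 1440 minutes, i.e. 24 hours); verify the name in your build before relying on it:

```
# vCenter Server advanced setting (values are just an illustration):
vpxd.certmgmt.certs.minutesBefore = 1440   # default: 24 hours
vpxd.certmgmt.certs.minutesBefore = 5      # temporary, while adding hosts
```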

Step 18: Re-register your 3rd-party plug-ins. You may have had to disable or remove 3rd-party plug-ins in order to get this far. If you did, now is when you can re-register them. Keep in mind that the certificates issued by the PSC are only for inter-service communication. The actual server management URLs of your extension servers will need their own Custom SSL certificates to secure front-end management traffic. I will hopefully be providing enough examples of those to make securing whichever ones you have a piece of cake.

If you found this article helpful, take a look at my other vSphere or SSL related posts and pages. Especially Breaking Bad SSL Habits.

I wrote this up a bit after I did the process so if anything isn’t quite right, feel free to let me know and I’ll fix it.

VMware Documentation
Creating a Signing CA Template
Replacing Default SSL Certificates with Custom Certificates

Excellent Example Video

How to include the whole Certificate Chain in a PEM SSL Certificate

There are a few reasons that your application server might require access to a full certificate chain.  In most cases we are uploading and importing certificates in PEM format.  For the purposes of this article we will consider PEM, x.509, and Base64 synonymous.  They are overlapping standards (think JSON vs YAML).  Different tools in the same process chain will refer to the same data by each of these conventions so for this article, just think of them as the same thing. With all this in mind, when given the choice, choose Base64 as your export format.

If you have certificates or key files that are not in PEM format then you may need to convert them. This is pretty simple using OpenSSL. If you are doing a lot with SSL, make sure you have OpenSSL configured on your security workstation. I may show examples of using OpenSSL, but documenting its use is out of scope for this article.
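For reference, the conversions you are most likely to need, demonstrated end to end with a throwaway self-signed cert (every file name here is made up for the demo):

```shell
# Throwaway self-signed cert and key, just to have something to convert:
openssl req -x509 -newkey rsa:2048 -nodes -subj '/CN=demo' \
        -keyout demo.key -out demo.pem -days 1 2>/dev/null

# PEM -> DER and back:
openssl x509 -in demo.pem -outform der -out demo.der
openssl x509 -inform der -in demo.der -out demo-from-der.pem

# PKCS#12 bundle -> PEM certificates (empty password for the demo):
openssl pkcs12 -export -in demo.pem -inkey demo.key -passout pass: -out demo.p12
openssl pkcs12 -in demo.p12 -passin pass: -nokeys -out demo-certs.pem 2>/dev/null
```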

Some nomenclature:
Root Certificate Authority:  The top level of the certificate signing chain.  (Often kept offline for security purposes)
Trusted Root Authority:  A CA that has been configured as “Trusted” on an SSL client.  It doesn’t matter whether a cert is signed, or by whom, if the client doesn’t trust the source.
Intermediate / Subordinate / Signing Authority:  A Certificate Authority which is authorized by a higher-level authority to sign certificates.  There can be multiple levels of Authorities.
Certificate Signing Request (CSR):  A request generated by a user or application that is encoded with the host details that are required by the certificate.  A private key is also generated at the time a CSR is created.
Certificate Key:  An encrypted Private Key file that is required to unlock an SSL certificate for use.

Certificate: A PEM formatted SSL certificate text looks like this:
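Structurally, a PEM certificate is just a Base64 body between BEGIN/END markers (truncated illustration, not a real cert):

```
-----BEGIN CERTIFICATE-----
MIIDXTCCAkWgAwIBAgIJ...
...Base64-encoded certificate data...
-----END CERTIFICATE-----
```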


There, with all of that out of the way… Your application has requested that the certificate you provide contains the entire signing chain.  So what do you do?  In some cases you might be asked to supply the certificate and the chain separately.  In this case, you will still need to build the chain.  In most cases, you will be asked to provide the certificate and the chain in one PEM certificate file.

First you need to identify your certificate chain.  You can sometimes download the whole chain from your CA.  That chain may or may not be in PEM format and may need to be converted using OpenSSL.  You may have an easier method to get YOUR chain, but I’ll show how to build the chain by hand.

Above we see the certificate chain for the SSL certificate issued for mysite.lab.local. The certificate was signed by lab-WDL-DC1-CA, which is subordinate to lab-PDX-DC-01-CA. You can also call lab-WDL-DC1-CA an Intermediate CA.

Most of the time, an application like a web server will only need the certificate itself and the associated private key file. Sometimes the application will require a full chain. There are different reasons. The SSL certificate might be used for bi-directional communication and needs the full chain so it knows to trust other servers signed in the chain. Or the application might act as a signing authority itself and needs knowledge of the whole chain.

In any case, if you have to provide the whole chain, you are generally only given the option of uploading one PEM file. In that case, you will want to structure it in this way.

If you are including the server cert in the chain, it goes first
The issuing (lowest-level) CA goes next
Intermediate / Subordinate CAs follow, one after another, in ascending order
The Root CA Certificate goes last

So based on the image of the certificate chain above, a valid chain including the certificate would look like this.
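Building it is just concatenation; only the order matters. With made-up file names matching the lab chain described above:

```shell
# Placeholder files standing in for real PEM certs, to show the order:
printf '%s\n' '(mysite.lab.local server cert)'   > mysite.crt
printf '%s\n' '(lab-WDL-DC1-CA issuing CA cert)' > wdl-dc1-ca.crt
printf '%s\n' '(lab-PDX-DC-01-CA root CA cert)'  > pdx-dc-01-ca.crt

# Server cert first, then the issuing CA, then the root CA last:
cat mysite.crt wdl-dc1-ca.crt pdx-dc-01-ca.crt > mysite-chain.crt
```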


vSphere 6.7 – Fake Persistent Memory

No NVDIMMs? With ESXi 6.7, you can create fake PMEM devices.
This is unsupported and requires a host reboot. Sometimes more than one…

This will provide a PMEM pool for POC or skills development, but the memory backing these devices is not persistent. For virtual PMEM disks, just migrate them to standard datastores when not in use. For virtual NVDIMM devices, you will lose the contents if the host the VM is on loses power; there is no mechanism to persist them to disk.
If there is important data, perhaps dd it to a file on the OS disk of the VM before shutting down.

The -v flag indicates the percentage of DRAM memory that should be allocated to virtual PMEM.
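Based on community write-ups of this trick, the percentage is set through an esxcli kernel setting; the fakePmemPct name comes from those write-ups, so verify it against your build (the ESXCLI variable and function name are mine, for illustration only):

```shell
#!/bin/sh
ESXCLI=${ESXCLI:-esxcli}

# Set the fake-PMEM percentage of DRAM (0 disables). Unsupported; reboot required.
set_fake_pmem() {
    "$ESXCLI" system settings kernel set -s fakePmemPct -v "$1"
}

# On the host: set_fake_pmem 33   ...and later: set_fake_pmem 0
```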
