PCIe for Hackers: The Diffpair Prelude (hackaday.com)
124 points by zdw on March 16, 2023 | 48 comments


From the comments: "Protip: don’t try to be fancy and provide a local REFCLK from an independent oscillator. Always use the host’s REFCLK as it may be spread spectrum modulated for EMI compliance in most PC systems."

Someone handed me a non-functional prototype system a few years ago (the first time I'd encountered PCIe), and it had a local REFCLK and an upstream Android-style host which, indeed, had enabled spread-spectrum clocking. Took me two miserable weeks to figure that one out.


Also, it's good to understand just how these SERDES blocks work. There is a CDR circuit, but it is fully dependent on the refclk, which can only be off by a tiny amount from the one used on the transmit side; it's not like there's a wide-bandwidth PLL that recovers the embedded clock from the encoded data.

This is unfortunate, because for transporting video you would like to also transport the pixel clock, but you don't get it for free from the SERDES CDR.
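
A rough numeric illustration of why independent refclks bite (a sketch only; the +/-300 ppm figure is the commonly quoted PCIe refclk tolerance and the 0.5% down-spread is typical SSC, both assumptions here rather than anything from the comment):

    # Offset between two independent 100 MHz reference clocks vs. SSC deviation
    REFCLK_HZ = 100e6

    def offset_hz(ppm: float) -> float:
        return REFCLK_HZ * ppm / 1e6

    print(offset_hz(300))      # 30000.0 Hz: crystal-to-crystal offset at 300 ppm
    print(REFCLK_HZ * 0.005)   # 500000.0 Hz: peak deviation from a 0.5% SSC down-spread

With spread spectrum enabled on the host side, the instantaneous frequency difference dwarfs what a refclk-assisted CDR is designed to track.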

DisplayPort uses SERDES and has to transport the pixel clock. It does so by sending a message with the fractional relationship between the pixel clock and the SERDES clock to the receive side, which derives the pixel clock from the recovered SERDES CDR clock using a fractional-N PLL.

I had the idea of transporting video over PCIe at one point, so I was interested in this. The reason is that PCIe is sometimes available for free, so why not use it? I wanted to use a local reference clock, so I would have been forced to use the DisplayPort scheme (but with no fractional-N PLL available).

SERDES with actual PLL CDRs in them for video do exist, but they are different from the generic SERDES used for PCIe and networking.

[edit: removed HDMI, only DisplayPort works as I said above]
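
As a rough sketch of the M/N scheme described above (the link rate and M/N values below are illustrative assumptions, not real DisplayPort register values):

    def recovered_pixel_clock(link_rate_gbps: float, m: int, n: int) -> float:
        """Pixel clock (Hz) rebuilt from the SERDES-recovered link symbol clock."""
        link_symbol_clock = link_rate_gbps * 1e9 / 10  # 8b/10b: 10 bits per symbol
        return link_symbol_clock * m / n

    # Example: a 2.7 Gb/s lane carrying a ~148.5 MHz (1080p60) pixel clock
    print(recovered_pixel_clock(2.7, m=22, n=40))  # 148500000.0

The sink's fractional-N PLL effectively implements that m/n multiplication in hardware, phase-locked to the clock the CDR recovers.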


HDMI/DVI has a dedicated pixel clock pair, as do the various LVDS LCD panel interfaces (and CameraLink, which is a related PHY). These interfaces are more or less "dumb" and simply serialize the display data onto three pairs with the same timing as VGA/DPI (discrete SERDES for these interfaces tend to have an internal x10 PLL for generating the bit clock).


The newer versions of these links don't need the clock lane: FPD-Link II and above, or Channel Link II and above.


For the hardware I was using, the downstream CDR was actually doing a reasonable job locking on the spread-spectrum clock, such that PCIe negotiations would sometimes work through quite a few states without error (I was watching the state machine in Xilinx/Vivado's ChipScope), before inevitably failing when the CDR lost lock.


iirc that’s why only very few GPUs supported more than two HDMI/DVI outputs: you need a complicated clock generator for them, as the bit clock is a multiple of the pixel clock. Meanwhile, DisplayPort only has like four possible link clocks.
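
For concreteness, here's the clocking difference being described (a sketch; the 10x TMDS ratio and the DisplayPort link rates are the commonly cited figures):

    def tmds_bit_clock_mhz(pixel_clock_mhz: float) -> float:
        # HDMI/DVI (TMDS): the serial bit clock tracks the pixel clock at 10x,
        # so every video mode needs its own PLL frequency.
        return 10 * pixel_clock_mhz

    # DisplayPort instead runs the link at one of a few fixed rates (Gb/s per
    # lane), independent of the video timing:
    DP_LINK_RATES_GBPS = [1.62, 2.70, 5.40, 8.10]

    print(tmds_bit_clock_mhz(148.5))  # 1485.0 MHz bit clock for 1080p60
    print(DP_LINK_RATES_GBPS)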


> First off, you want to keep both of the pair’s signals close to each other throughout their length. The closer the two signals are, the better external interference cancellation works, and the less noise they radiate (…)

Wrong! This is a myth that's stubbornly difficult to weed out. The coupling between the wires of a pair is only ~12% of the coupling to the reference plane.

On a PCB it's best to treat each signal on its own and route it as a single-ended, coplanar waveguide surrounded by reference planes. This lets you use far wider traces, which reduces inductance, which in turn reduces coupling to radiated EM fields.

If you want a longer talk on the topic: https://www.youtube.com/watch?v=QG0Apol-oj0
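
For a back-of-the-envelope feel for how weak that trace-to-trace coupling is, here's the old IPC-2141 approximation for edge-coupled microstrip (an approximation, with assumed example geometries, not anything from the article or the talk):

    import math

    def zdiff_microstrip(z0_single: float, gap: float, height: float) -> float:
        # Approximate differential impedance of an edge-coupled microstrip pair:
        #   Zdiff ~= 2 * Z0 * (1 - 0.48 * exp(-0.96 * s / h))
        # where s is the trace-to-trace gap and h the height above the plane.
        return 2 * z0_single * (1 - 0.48 * math.exp(-0.96 * gap / height))

    z0 = 50.0
    for s_over_h in (0.5, 1.0, 2.0, 5.0):
        zd = zdiff_microstrip(z0, s_over_h, 1.0)
        print(f"s/h = {s_over_h}: Zdiff ~ {zd:.1f} ohm (2*Z0 = {2 * z0:.0f} ohm)")

Even with the traces only one dielectric height apart, the pair comes out within roughly 20% of simply being two independent 50-ohm lines, which is the plane-dominated-coupling point being made.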


Rick Hartley cracks me up. He's like a PCB design cross between R. Lee Ermey and Tony Robbins.


The talk was great! Thanks for sharing.


Here is one thing I've learned about PCIe: the 0.1 uF series capacitors must be 0.1 uF; 0.01 uF will not work. The reason is that link detection depends on the RC charging rate of these capacitors. Can you guess why I know this?
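
A hedged sketch of the failure mode (the 50-ohm detect resistance below is an assumption for illustration, not a spec value): receiver detection judges whether something is attached from how fast the line charges through the AC-coupling cap, i.e. from an RC time constant, so shrinking C by 10x shifts that timing by 10x.

    def charge_time_constant_ns(c_farads: float, r_ohms: float) -> float:
        return r_ohms * c_farads * 1e9

    R_DETECT = 50.0               # assumed effective resistance during detect
    for c in (100e-9, 10e-9):     # 0.1 uF vs 0.01 uF coupling caps
        print(f"C = {c * 1e9:.0f} nF -> tau = {charge_time_constant_ns(c, R_DETECT):.0f} ns")

A time constant that lands outside the window the detect logic expects means the transmitter never believes a receiver is present, so the link never trains.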


I can and I am so sorry.


Their point about diff pairs being more resistant to noise is correct, but it’s not the primary reason for using diff pairs.

Differential signals, at a physical layer level, are faster than single ended IO. It’s because you have double the current drive/sinking capability with two drivers. You’re also getting capacitive coupling between each leg of the pair working in your favor, which keeps your edge transition nice and fast.

Important stuff when your unit interval is under 500ps!


> Their point about diff pairs being more resistant to noise is correct, but it’s not the primary reason for using diff pairs.

The article is correct. This comment is wrong.

> Differential signals, at a physical layer level, are faster than single ended IO. It’s because you have double the current drive/sinking capability with two drivers.

The load is also double in a diff pair compared to a single wire, so the net effect is a wash compared to a single wire.

> You’re also getting capacitive coupling between each leg of the pair working in your favor, which keeps your edge transition nice and fast.

The opposite is true. Differential capacitance effectively appears 2X higher than the nominal capacitance to differential signals, making it a drawback of differential signaling rather than a benefit.


When you see a differential pair, simply imagine two independent lines that are each referenced to ground. (That's close to what happens on the PCB anyway, if a ground plane is present. Most of the return current ends up ground-referenced.) Two lines that would have exhibited 50-ohm characteristic impedance in a single-ended circuit will form a ~100-ohm diff pair.

In other words, the capacitance isn't doubled, since the capacitance is split by the imaginary ground between the two lines. It looks like two caps in series, not in parallel. Same is true for the load resistance.
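
A quick numeric restatement of that imaginary-ground picture (my own sketch; the 3 pF per-line capacitance is an assumed example value):

    def caps_in_series(c1: float, c2: float) -> float:
        return c1 * c2 / (c1 + c2)

    C_LINE = 3e-12    # assumed capacitance of each line to the virtual ground, 3 pF
    Z0_SINGLE = 50.0

    c_diff = caps_in_series(C_LINE, C_LINE)  # 1.5 pF: halved, not doubled
    z_diff = 2 * Z0_SINGLE                   # 100 ohm: the two 50-ohm terminations in series

    print(c_diff, z_diff)  # 1.5e-12 100.0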


I have designed PCIe-compatible transceivers, and this will be the last comment I make about it, because correcting hardware nonsense on HN is only of transient interest to me.

Everything I wrote is 100% correct and in fact incontrovertible. Whether there is a little or a lot of differential capacitance does not change the fact that the differential portion of the capacitance has a 2X effect on the differential signal (as opposed to the common-mode signal, which it has no effect on). This is supported by basic math.

If capacitance is to ground then it is not differential capacitance, so it is not relevant to this discussion. It may be true that differential capacitance is not a significant contributor to the impedance of PCB differential traces, but that does not change the fundamental result (similarly, the principle of photovoltaic conversion still holds true in the dark even though there is little light to convert).

And PCB traces are not the only kinds of differential pairs. Diff pairs exist inside the integrated circuits that drive the PCBs, where they operate less like transmission lines and more like lumped capacitances due to the frequencies of interest compared to the dimensions of the conductors. In these circuit and conductor structures, differential capacitance can be significant, and this is what OP was talking about since he was talking about the legs of the driver (transistors). OP was just wrong about the differential capacitance being good for speed. It's bad for speed.
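
To make the 2X claim concrete, here's a tiny numerical sketch (my own illustration, not the parent's math): a capacitance C bridging the two legs of a pair driven with +v and -v carries i = C * d(2v)/dt, so each driver works against an effective 2C to ground.

    import numpy as np

    C_BRIDGE = 1e-12                # assumed 1 pF leg-to-leg capacitance
    t = np.linspace(0, 1e-9, 1001)  # 1 ns window
    v = 0.4 * t / t[-1]             # each leg ramps 0 -> 0.4 V (0.8 V differential swing)

    dv_dt = np.gradient(v, t)
    i_bridge = C_BRIDGE * np.gradient(2 * v, t)   # current through the bridging cap

    # Effective capacitance seen by one driver: i / (dv/dt) equals 2C everywhere
    print(np.allclose(i_bridge / dv_dt, 2 * C_BRIDGE))  # True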


Meh. There isn't a whole heck of a lot of capacitance between two parallel traces. Not compared to a single-ended trace that is (almost necessarily) referenced to one or two planes on adjacent layer(s).


I think they're a bit off on the noise explanation, as they invoke a common misunderstanding of differential pairs on circuit boards (free common mode rejection).

You're right that differential signals work much better at high speed than single-ended! However one major reason is that you can get a much higher data rate using slower edges by comparing two signals. Diff pair signals don't look more like square waves due to more drive power - they look more like sinusoids. Which uses way less power!


For anyone interested in the topic, I can highly recommend "Signal and Power Integrity Simplified" by Eric Bogatin.

I found it to be absolutely the best resource on the topic for somebody with no formal electronics education like me.


I like this book a lot. Has a heavy focus on physics and an intuitive understanding of physics, rather than just giving a list of heuristics to abide by.


Yes, that's exactly the point. Heuristics are good until they aren't, and any heuristic will eventually fail at a high enough frequency...

With intuitive understanding you start seeing how various design choices affect performance, and you can focus on checking what might be the cause of a problem and fixing the things that have the biggest potential for improvement. It allows you to learn from your mistakes, whereas with heuristics you feel like a dog being beaten by its owner without any idea why.


One of the most fascinating things about working with hardware vs. software is that your designs are influenced by the laws of physics, but more importantly, by how we choose to implement our understanding of those laws. I've gotten away with quick-and-dirty prototypes with no length matching, always keeping the two traces as close to each other's path as possible with as few vias as possible (even avoiding them when able), but other times you need to get it right; I suspect some chips are more robust than others in their signal processing. For CAN bus and PCIe, you can use garbage routing and tin foil to connect your diff pairs, but for HDMI, you'd better get it right or suffer random, intermittent problems forever.


How do I get into the layers between differential pairs and the drivers running on my computer? I think we need at least a mid-range FPGA board in order to play with all the high-speed goodness. The crypto crash gave us a bunch of those boards, but now supply is drying up.


Judging from https://www.fpgadeveloper.com/list-of-fpga-dev-boards-for-pc..., your best bet is one that plugs into an M.2 slot.


Sounds like this might be the start of a series of articles on HaD; hopefully they'll cover making a custom endpoint. That's also my impression, that you need a beefy FPGA and possibly even proprietary IP cores to play with PCIe, but it'd be cool to be proven wrong there.


This is my understanding too. Is it possible to use an FPGA without built-in diff pairs and SERDES, and use the IOs that have a suitably fast clock domain as a "DIY" differential pair? Maybe not as fast, but still faster than any other IO on the FPGA?


Get yourself a mid-range FPGA before it's too late then. :)

Tinkering with anything physical is always going to need appropriate tools, so there is probably no better way than to find some of the tools as cheaply as possible.


Isn't Thunderbolt also "just" PCIe over USB-C cabling? :)

PCIe seems to be a well designed standard all things considered - very fast and very robust.


In college I read (parts of) the big computer standards -- USB, PCIe, SATA (rip), and Bluetooth. PCIe was consistently head and shoulders above the rest.

This was true on an editorial level and on a structural level. They defined terms before they used them, they didn't use the same term to mean two different things, they foresaw and sidestepped problems that other standards blundered into, they didn't go on architecture astronaut tangents, they weren't stingy with bits when they could buy simplicity/clarity/separation of concerns but they spared no effort when performance was on the line. These were all frequent sins in the other standards.

I've always wondered what human factors led to this. Good leadership? Fewer cooks in the kitchen? More time? There is probably a lesson in there, but I didn't stay in computer engineering long enough to gain industry contacts and figure it out.


I suspect it was a combination of technical limitations and the nature of the developers. The USB 1.0 standard was finalized in 1996, seven years before PCIe 1.0 (2003), and was one of the first (maybe the first?) consumer protocols to fully support automatically-enumerated hot-swappable devices. (Having worked on USB drivers a bit, I still kind of hate the protocol, though.) Keep in mind too that there is a massive difference in computing power, memory, etc. between 1996 and 2003. Similarly, I get the impression that "low-power wireless" was a herculean effort when Bluetooth was developed in ~1999.

It looks like PCIe 1.0 was designed solely by Intel, which probably accounts for a lot of the coherence and editing. SATA (2003) was designed by committee.


This is something I would be very interested in reading in more detail, with examples from multiple specs pointing out pitfalls or successes. Dammit I'm supposed to be working not diving down a rabbit hole.


Thunderbolt is confusing because of the many roles it can adopt. When Thunderbolt is used to connect a PCIe peripheral to a host, it is tunneling PCIe inside packets on Thunderbolt's lower layers. It gets confusing because Thunderbolt can also encapsulate DisplayPort traffic, and it can encapsulate DisplayPort and PCIe traffic at the same time; meanwhile, a Thunderbolt port can also just be a DisplayPort port, electrically, but it cannot be a plain PCIe port. If you plug a Thunderbolt port into a DisplayPort display, the Thunderbolt interface will switch modes and simply become a DisplayPort interface. It has the same ability to become a USB port if it thinks it is connected to a USB peer. But there is no way for Thunderbolt to be connected to a PCIe device and act like it is just pure PCIe.


Thunderbolt is quite different at the lower layers. It's more accurate to think of Thunderbolt as its own connection technology that can carry both PCIe and DisplayPort data packets multiplexed on one cable, with the DisplayPort packets getting priority.


I've heard that Thunderbolt and USB4 "merged" -- does this mean it's Thunderbolt at the lowest layers and USB4 is encapsulated? Or are there two different signaling standards sharing the same PHY?


When USB is involved, you can always assume they chose the more complicated option.

USB4 is a competing, incompatible method of encapsulating and multiplexing PCIe, DisplayPort and USB data packets over one cable. But the USB4 spec incorporates Thunderbolt as an alternate mode. Thunderbolt support is required for USB4 hubs but optional for USB4 hosts and devices. So you will often (maybe even always, in practice) be able to use a USB4 host port as a Thunderbolt port.


It's worth mentioning that Microsoft seems to demand support for Thunderbolt mode for Win11 certification.


Don't let USB4 hear you saying that, it'll be mad. USB4 is USB4, which... very closely resembles Thunderbolt. Thunderbolt 4 is PCIe tunneling over USB4.

Note also that DisplayPort now works similarly: instead of switching to an alt mode (dedicating half the lanes to a video signal), in USB4 DisplayPort packets are encapsulated in USB4 packets. So now you can attach a variety of displays to a tree of hubs, provided you have the bandwidth, whereas this was a cluster fuck for a while... it's part of the reason USB-C output hubs were so rare: no one wanted to explain to consumers why a couple of monitors with USB-C cables wouldn't work with their hub. Now they will (provided there's sufficient bandwidth).


From the article:

> any interference affects the [diff pair] signals equally – as signals are compared to each other to receive information, this means that the information received is not affected by noise overlaid onto both of the signals.

This is not true - a common myth about differential pairs. Don't rely on it for your designs. More info - see page 15 https://www.speedingedge.com/PDF-Files/DiffSigDesign.pdf


To expand - the way most relevant noise propagates on a board is often assumed to couple equally into both sides of the differential pair. It doesn't.


Someday, I wish I could understand electrical impedance.

I understand DC resistance. After designing and building my own tube-based electric guitar amplifier, I had to learn the relationship between DC resistance and current _intimately_.

What I still don't have a grasp on is the concept of impedance. While I was able to calculate correct values for my guitar amp, I didn't know _why_ I was doing it, which is a bit frustrating. For instance, I still don't understand the output transformer that "matches the impedance of the speaker to the output tube". Le sigh.


I'm no expert but FWIW, I find thinking about it as a resistance unhelpful. It's much closer to a sort of "resonance coefficient". Sure, it's measured in ohms, but a 50-ohm cable will not "resist" the flow of signal like a 50-ohm resistor, because a resistor is a load (it can simulate an antenna) while the cable is just resonant with the load. What makes a given line/cable "50 ohm" is largely capacitance-driven, because it needs to be resonant with a 50-ohm load at one end and a 50-ohm source at the other.

I find amateur radio literature to be pretty good for this kind of stuff. It's a very central issue to them.


I think the OP is talking about input/output impedances of amplifiers/guitar pickups, while you're talking about transmission line characteristic impedance.

Impedance of a transmission line is weird: in some ways, it acts exactly like a resistor, and in some ways it doesn't.

In contrast to a resistor, an ideal transmission line doesn't convert electrical energy to heat. Ideally, 100% of electrical energy put in one end of the line will make it to the other end of the line intact.

However, just like an ideal resistor, an ideal transmission line will have a real-valued impedance, not a capacitive (negative imaginary values) or inductive (positive imaginary values) impedance; nor will an ideal transmission line have any frequency dependence in its impedance: 50 ohms is 50 ohms.

One way I like to think of characteristic impedance is that it's the temporary impedance a change in signal will see until current/voltage wave reflections make it back from the other end:

For a 50-ohm ideal transmission line one light-second long (pretending we have velocity factor 1.0 to make the math easy), your ohmmeter would read 50 ohms for two seconds, and after that it would read whatever resistance is connected to the other end. If the other end is an open circuit, the ohmmeter would measure 50 ohms then infinite resistance; if the other end is a short circuit, the ohmmeter would see 50 ohms then 0 ohms; if the other end is terminated with a 50-ohm resistor, the ohmmeter would measure 50 ohms indefinitely.

If you have an infinitely long ideal transmission line with 50-ohm impedance and hook up an ohmmeter to it, it will measure 50 ohms. If you hook up an LCR meter, it will show zero inductance and zero capacitance: just a purely resistive 50 ohms.

This indistinguishability between an infinitely long transmission line and a resistor is why a matched termination resistor prevents signal reflections: If you have some finite length of transmission line, and you attach either a matched resistor or an infinitely long transmission line with the same impedance to the end, the first length of transmission line cannot tell which you have attached - its behavior will be the same in either case: all the energy is passed into the next section.
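
A little sketch of the reflection bookkeeping behind that (the standard formula, with example loads assumed):

    Z0 = 50.0

    def reflection_coefficient(z_load: float) -> float:
        # gamma = (Zload - Z0) / (Zload + Z0): 0 means matched, +1 open, -1 short
        return (z_load - Z0) / (z_load + Z0)

    for name, z_load in [("matched 50 ohm", 50.0),
                         ("open circuit", 1e12),     # effectively infinite resistance
                         ("short circuit", 1e-9)]:   # effectively 0 ohm
        print(f"{name}: gamma = {reflection_coefficient(z_load):+.3f}")

Gamma of zero is exactly the "cannot tell the matched resistor from more line" case: nothing comes back, so the source keeps seeing 50 ohms forever.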


Do you mean impedance the concept (it's just resistance, in the frequency domain), the fact that amplifiers have an "output" impedance, maximum power transfer, or how a transformer works to convert output impedance?

There's a lot of inter-layered things behind "why do tube amps need an output transformer." The concept of impedance is pretty simple and doesn't really deal with that, other than being the 0th layer.


Impedance is basically resistance to changes in current, not just to steady current. In DC systems you almost never deal with this; you deal with it constantly in AC systems and LC circuits.

https://pediaa.com/difference-between-impedance-and-resistan...
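
If it helps, here's what "resistance in the frequency domain" looks like numerically (component values are arbitrary examples):

    import math

    def impedance_series_rlc(r: float, l: float, c: float, f_hz: float) -> complex:
        # Z = R + j(wL - 1/(wC)): the resistive part is frequency-independent,
        # the reactive part is not.
        w = 2 * math.pi * f_hz
        return complex(r, w * l - 1 / (w * c))

    # A 50-ohm resistor in series with 10 nH and 100 pF:
    for f in (1e6, 100e6, 1e9):
        z = impedance_series_rlc(50.0, 10e-9, 100e-12, f)
        phase = math.degrees(math.atan2(z.imag, z.real))
        print(f"{f / 1e6:7.0f} MHz: |Z| = {abs(z):8.1f} ohm, phase = {phase:6.1f} deg")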


"Impedance" as used by audio people is completely different from "impedance" as used by RF people.

In audio, it's about the equivalent resistance of an input or load. If you put a dummy resistor in place of a load, how would the output/driver treat it? How do you choose what value of resistor to use, to stand in for a given load/input? That's the impedance.

In RF, it's about the equivalent resistance of the transmission line itself, because at high enough frequencies the speed-of-light prevents you from even seeing the far end. That phenomenon is well treated here:

https://www.ibiblio.org/kuphaldt/electricCircuits/AC/AC_14.h...


I don't believe this is true: AF and RF have significant overlap in the frequency domain, and AF is just baseband RF when it's on the wire. RF very much can work with a resistive load; a 50-ohm dummy load will ohm out as 50 ohms DC on any cheap multimeter.


The distinction is whether the transmission line is "electrically short" relative to the wavelength of the highest harmonic of interest (typical in audio, unless your speaker wires are 40 km long), in which case its impedance doesn't play much of a role and you "see" the load directly, or "electrically long" (typically the case in RF), in which case your source is driving the line rather than the load directly.
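
A rough feel for where that boundary sits (the lambda/10 rule of thumb and the velocity factor are assumptions for illustration):

    C_LIGHT = 3e8            # m/s
    VELOCITY_FACTOR = 0.66   # assumed, typical-ish for cable

    def electrically_long_above_m(f_hz: float, fraction: float = 0.1) -> float:
        # Common rule of thumb: treat the line as a transmission line once its
        # length exceeds about a tenth of a wavelength at the frequency of interest.
        wavelength = C_LIGHT * VELOCITY_FACTOR / f_hz
        return fraction * wavelength

    print(f"20 kHz audio: ~{electrically_long_above_m(20e3):.0f} m")        # hundreds of metres
    print(f"1 GHz RF:     ~{electrically_long_above_m(1e9) * 100:.1f} cm")  # about 2 cm

Which is why speaker cable never behaves like a transmission line, but a 1 GHz trace a few centimetres long very much does.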


I hope to see more about this. One of the things I would like to see is adapters that let you bond multiple Thunderbolt cables to get a full x16 PCIe interface: the reverse of a bifurcator.

I'd like an external GPU for a high-end laptop with no compromises.


Someone figure out eGPUs for Apple Silicon, maybe a kickstarter?



