To understand NPCs, you need to consider IOCs. I'll try to explain:
Each IOC binds to exactly one NPC
Multiple IOCs can be bound to one NPC
Multiple NPCs cannot bind to one IOC; each NPC binds to a separate IOC
The built-in ports count as one IOC
Some numbers:
Each IOC has a 10 Gb full duplex connection to the fabric
Each SPC has a 10 Gb full duplex connection to the fabric
Each NPC has two 10 Gb full duplex connections to the fabric: one towards the IOCs, and one towards the SPCs
Basic traffic flow:
Traffic enters an IOC and crosses the fabric to the associated ingress NPC
The ingress NPC load-balances to an SPC (*); the traffic reaches that SPC via the fabric
The SPC does its processing and sends the traffic back across the fabric to the NPC of the egress IOC
The egress NPC sends the traffic to the egress IOC via the fabric, and the traffic leaves that IOC
Making yourself a diagram of that flow will likely be really useful.
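As a stand-in for a diagram, here is the flow sketched in a few lines of Python. The hop labels are just illustrative names I've picked, not Juniper terminology:

```python
# One-way path of a packet through the chassis, per the steps above.
# Hop names are illustrative labels, not official Juniper terms.
hops = [
    "ingress IOC",   # traffic enters here
    "fabric",
    "ingress NPC",   # load-balances to an SPC
    "fabric",
    "SPC",           # flow processing happens here
    "fabric",
    "egress NPC",    # the NPC bound to the egress IOC
    "fabric",
    "egress IOC",    # traffic leaves here
]
print(" -> ".join(hops))
```

Note the packet crosses the fabric four times; that's why the per-card fabric links matter so much below.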
What does that mean in practice? Let's consider the 2x10G IOC.
It means that the 2x10G IOC is 2:1 oversubscribed: its two 10G ports share a single 10 Gb full duplex fabric connection, so it can only handle 10G full duplex in total.
Say you have one 2x10G IOC. You are using both ports, and, for the sake of argument, you have SPCs sufficient to handle 5G of traffic. You want to get to 10G. In that case, just add SPCs. Adding an additional IOC and NPC isn't going to help; your bottleneck is the SPCs.
Case 2: You now want to get beyond 10G throughput. You'd add more SPCs (maximum 7 on an SRX 3600, by the way), then add an additional IOC and NPC. You now have two 2x10G IOCs, but only use one port on each. You have two NPCs, which means each IOC can handle its full 10 Gb, and, SPC throughput willing, you'll get 20 Gb of traffic through the unit: 10G in each direction (IOC A to IOC B, and vice versa).
Case 2 and a half: you have 7 SPCs and two 2x10G IOCs with one port used each, but only one NPC. You'd be stuck at 10 Gb throughput: the fabric connections of the single NPC become the bottleneck.
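You can do all three cases as back-of-envelope arithmetic: throughput is the minimum of the IOC port capacity in use, the NPC fabric capacity, and the aggregate SPC capacity. Here's a sketch in Python; the 10 Gb per NPC fabric link is from the numbers above, while the per-SPC rates are made-up figures for illustration, not Juniper specs:

```python
def chassis_throughput_gbps(npcs: int, spcs: int, gbps_per_spc: float,
                            ioc_ports_gbps: float) -> float:
    """Aggregate throughput is the minimum of three limits:
    in-use IOC port capacity, NPC fabric capacity, and SPC capacity."""
    npc_limit = npcs * 10.0          # each NPC: one 10 Gb fabric link toward the IOCs
    spc_limit = spcs * gbps_per_spc  # assumed real-world per-SPC rate (illustrative)
    return min(ioc_ports_gbps, npc_limit, spc_limit)

# Case 1: one 2x10G IOC, both ports in use, SPCs good for 5 Gb -> SPCs limit you
print(chassis_throughput_gbps(npcs=1, spcs=2, gbps_per_spc=2.5, ioc_ports_gbps=20))  # -> 5.0

# Case 2: two IOCs (one port each), two NPCs, 7 SPCs -> full 20 Gb
print(chassis_throughput_gbps(npcs=2, spcs=7, gbps_per_spc=3.5, ioc_ports_gbps=20))  # -> 20.0

# Case 2.5: same, but only one NPC -> its fabric link caps you at 10 Gb
print(chassis_throughput_gbps(npcs=1, spcs=7, gbps_per_spc=3.5, ioc_ports_gbps=20))  # -> 10.0
```

The point of the min() is that adding capacity anywhere other than the current bottleneck buys you nothing.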
Okay, you have an SRX 3400. That means your real-world SPC performance (IMIX, or HTTP) will top out at around 10 Gb with three SPCs. In that case, adding an extra IOC and NPC isn't going to get you past that 10 Gb limit. The good news is that all of these modules will work in an SRX 3600, so if you need to get beyond 10 Gb, you can move to a new chassis and keep your investment in IOCs, NPCs and SPCs.
To get beyond 20 Gb of real-world throughput, you'd move to an SRX 5000-series box, space allowing an SRX 5800. It's a different architecture, where IOCs come with built-in NPCs, so this oversubscription issue does not exist. Next-gen SPCs mean you have a lot of performance headroom on that platform. All of which comes at a considerably higher price point than the SRX 3000 series.
Out of curiosity, what throughput are you aiming at on the SRX 3400?
(*) simplified - there are flow lookups and first SPC considerations here, but that's largely irrelevant for this discussion